feat(docs): Add comprehensive documentation for Vexer, Vulnerability Explorer, and Zastava modules

- Introduced AGENTS.md, README.md, TASKS.md, and implementation_plan.md for Vexer, detailing mission, responsibilities, key components, and operational notes.
- Established similar documentation structure for Vulnerability Explorer and Zastava modules, including their respective workflows, integrations, and observability notes.
- Created risk scoring profiles documentation outlining the core workflow, factor model, governance, and deliverables.
- Ensured all modules adhere to the Aggregation-Only Contract and maintain determinism and provenance in outputs.
This commit is contained in:
2025-10-30 00:09:39 +02:00
parent 3154c67978
commit 7b5bdcf4d3
503 changed files with 16136 additions and 54638 deletions

View File

@@ -0,0 +1,77 @@
# Concelier Apple Security Update Connector Operations
This runbook covers staging and production rollout for the Apple security updates connector (`source:vndr-apple:*`), including observability checks and fixture maintenance.
## 1. Prerequisites
- Network egress (or mirrored cache) for `https://gdmf.apple.com/v2/pmv` and the Apple Support domain (`https://support.apple.com/`).
- Optional: corporate proxy exclusions for the Apple hosts if outbound traffic is normally filtered.
- Updated configuration (environment variables or `concelier.yaml`) with an `apple` section. Example baseline:
```yaml
concelier:
sources:
apple:
softwareLookupUri: "https://gdmf.apple.com/v2/pmv"
advisoryBaseUri: "https://support.apple.com/"
localeSegment: "en-us"
maxAdvisoriesPerFetch: 25
initialBackfill: "120.00:00:00"
modifiedTolerance: "02:00:00"
failureBackoff: "00:05:00"
```
> `softwareLookupUri` and `advisoryBaseUri` must stay absolute and aligned with the HTTP allow-list; Concelier automatically adds both hosts to the connector HttpClient.
## 2. Staging Smoke Test
1. Deploy the configuration and restart the Concelier workers to ensure the Apple connector options are bound.
2. Trigger a full connector cycle:
- CLI: `stella db jobs run source:vndr-apple:fetch --and-then source:vndr-apple:parse --and-then source:vndr-apple:map`
- REST: `POST /jobs/run { "kind": "source:vndr-apple:fetch", "chain": ["source:vndr-apple:parse", "source:vndr-apple:map"] }`
3. Validate metrics exported under meter `StellaOps.Concelier.Connector.Vndr.Apple`:
- `apple.fetch.items` (documents fetched)
- `apple.fetch.failures`
- `apple.fetch.unchanged`
- `apple.parse.failures`
- `apple.map.affected.count` (histogram of affected package counts)
4. Cross-check the shared HTTP counters:
- `concelier.source.http.requests_total{concelier_source="vndr-apple"}` should increase for both index and detail phases.
- `concelier.source.http.failures_total{concelier_source="vndr-apple"}` should remain flat (0) during a healthy run.
5. Inspect the info logs:
- `Apple software index fetch … processed=X newDocuments=Y`
- `Apple advisory parse complete … aliases=… affected=…`
- `Mapped Apple advisory … pendingMappings=0`
6. Confirm MongoDB state:
- `raw_documents` store contains the HT article HTML with metadata (`apple.articleId`, `apple.postingDate`).
- `dtos` store has `schemaVersion="apple.security.update.v1"`.
- `advisories` collection includes keys `HTxxxxxx` with normalized SemVer rules.
- `source_states` entry for `apple` shows a recent `cursor.lastPosted`.
## 3. Production Monitoring
- **Dashboards** Add the following expressions to your Concelier Grafana board (OTLP/Prometheus naming assumed):
- `rate(apple_fetch_items_total[15m])` vs `rate(concelier_source_http_requests_total{concelier_source="vndr-apple"}[15m])`
- `rate(apple_fetch_failures_total[5m])` for error spikes (`severity=warning` at `>0`)
- `histogram_quantile(0.95, rate(apple_map_affected_count_bucket[1h]))` to watch affected-package fan-out
- `increase(apple_parse_failures_total[6h])` to catch parser drift (alerts at `>0`)
- **Alerts** Page if `rate(apple_fetch_items_total[2h]) == 0` during business hours while other connectors are active. This often indicates lookup feed failures or misconfigured allow-lists.
- **Logs** Surface warnings `Apple document {DocumentId} missing GridFS payload` or `Apple parse failed`—repeated hits imply storage issues or HTML regressions.
- **Telemetry pipeline** `StellaOps.Concelier.WebService` now exports `StellaOps.Concelier.Connector.Vndr.Apple` alongside existing Concelier meters; ensure your OTEL collector or Prometheus scraper includes it.
## 4. Fixture Maintenance
Regression fixtures live under `src/Concelier/__Tests/StellaOps.Concelier.Connector.Vndr.Apple.Tests/Apple/Fixtures`. Refresh them whenever Apple reshapes the HT layout or when new platforms appear.
1. Run the helper script matching your platform:
- Bash: `./scripts/update-apple-fixtures.sh`
- PowerShell: `./scripts/update-apple-fixtures.ps1`
2. Each script exports `UPDATE_APPLE_FIXTURES=1`, updates the `WSLENV` passthrough, and touches `.update-apple-fixtures` so WSL+VS Code test runs observe the flag. The subsequent test execution fetches the live HT articles listed in `AppleFixtureManager`, sanitises the HTML, and rewrites the `.expected.json` DTO snapshots.
3. Review the diff for localisation or nav noise. Once satisfied, re-run the tests without the env var (`dotnet test src/Concelier/__Tests/StellaOps.Concelier.Connector.Vndr.Apple.Tests/StellaOps.Concelier.Connector.Vndr.Apple.Tests.csproj`) to verify determinism.
4. Commit fixture updates together with any parser/mapping changes that motivated them.
## 5. Known Issues & Follow-up Tasks
- Apple occasionally throttles anonymous requests after bursts. The connector backs off automatically, but persistent `apple.fetch.failures` spikes might require mirroring the HT content or scheduling wider fetch windows.
- Rapid Security Responses may appear before the general patch notes surface in the lookup JSON. When that happens, the fetch run will log `detailFailures>0`. Collect sample HTML and refresh fixtures to confirm parser coverage.
- Multi-locale content is still under regression sweep (`src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Vndr.Apple/TASKS.md`). Capture non-`en-us` snapshots once the fixture tooling stabilises.

View File

@@ -0,0 +1,72 @@
# Concelier CCCS Connector Operations
This runbook covers daytoday operation of the Canadian Centre for Cyber Security (`source:cccs:*`) connector, including configuration, telemetry, and historical backfill guidance for English/French advisories.
## 1. Configuration Checklist
- Network egress (or mirrored cache) for `https://www.cyber.gc.ca/` and the JSON API endpoints under `/api/cccs/`.
- Set the Concelier options before restarting workers. Example `concelier.yaml` snippet:
```yaml
concelier:
sources:
cccs:
feeds:
- language: "en"
uri: "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=en&content_type=cccs_threat"
- language: "fr"
uri: "https://www.cyber.gc.ca/api/cccs/threats/v1/get?lang=fr&content_type=cccs_threat"
maxEntriesPerFetch: 80 # increase temporarily for backfill runs
maxKnownEntries: 512
requestTimeout: "00:00:30"
requestDelay: "00:00:00.250"
failureBackoff: "00:05:00"
```
> The `/api/cccs/threats/v1/get` endpoint returns thousands of records per language (≈5100 rows each as of 20251014). The connector honours `maxEntriesPerFetch`, so leave it low for steadystate and raise it for planned backfills.
## 2. Telemetry & Logging
- **Metrics (Meter `StellaOps.Concelier.Connector.Cccs`):**
- `cccs.fetch.attempts`, `cccs.fetch.success`, `cccs.fetch.failures`
- `cccs.fetch.documents`, `cccs.fetch.unchanged`
- `cccs.parse.success`, `cccs.parse.failures`, `cccs.parse.quarantine`
- `cccs.map.success`, `cccs.map.failures`
- **Shared HTTP metrics** via `SourceDiagnostics`:
- `concelier.source.http.requests{concelier.source="cccs"}`
- `concelier.source.http.failures{concelier.source="cccs"}`
- `concelier.source.http.duration{concelier.source="cccs"}`
- **Structured logs**
- `CCCS fetch completed feeds=… items=… newDocuments=… pendingDocuments=…`
- `CCCS parse completed parsed=… failures=…`
- `CCCS map completed mapped=… failures=…`
- Warnings fire when GridFS payloads/DTOs go missing or parser sanitisation fails.
Suggested Grafana alerts:
- `increase(cccs.fetch.failures_total[15m]) > 0`
- `rate(cccs.map.success_total[1h]) == 0` while other connectors are active
- `histogram_quantile(0.95, rate(concelier_source_http_duration_bucket{concelier_source="cccs"}[1h])) > 5s`
## 3. Historical Backfill Plan
1. **Snapshot the source** the API accepts `page=<n>` and `lang=<en|fr>` query parameters. `page=0` returns the full dataset (observed earliest `date_created`: 20180608 for EN, 20180608 for FR). Mirror those responses into Offline Kit storage when operating airgapped.
2. **Stage ingestion**:
- Temporarily raise `maxEntriesPerFetch` (e.g. 500) and restart Concelier workers.
- Run chained jobs until `pendingDocuments` drains:
`stella db jobs run source:cccs:fetch --and-then source:cccs:parse --and-then source:cccs:map`
- Monitor `cccs.fetch.unchanged` growth; once it approaches dataset size the backfill is complete.
3. **Optional pagination sweep** for incremental mirrors, iterate `page=<n>` (0…N) while `response.Count == 50`, persisting JSON to disk. Store alongside metadata (`language`, `page`, SHA256) so repeated runs detect drift.
4. **Language split** keep EN/FR payloads separate to preserve canonical language fields. The connector emits `Language` directly from the feed entry, so mixed ingestion simply produces parallel advisories keyed by the same serial number.
5. **Throttle planning** schedule backfills during maintenance windows; the API tolerates burst downloads but respect the 250ms request delay or raise it if mirrored traffic is not available.
## 4. Selector & Sanitiser Notes
- `CccsHtmlParser` now parses the **unsanitised DOM** (via AngleSharp) and only sanitises when persisting `ContentHtml`.
- Product extraction walks headings (`Affected Products`, `Produits touchés`, `Mesures recommandées`) and consumes nested lists within `div/section/article` containers.
- `HtmlContentSanitizer` allows `<h1>…<h6>` and `<section>` so stored HTML keeps headings for UI rendering and downstream summarisation.
## 5. Fixture Maintenance
- Regression fixtures live in `src/Concelier/__Tests/StellaOps.Concelier.Connector.Cccs.Tests/Fixtures`.
- Refresh via `UPDATE_CCCS_FIXTURES=1 dotnet test src/Concelier/__Tests/StellaOps.Concelier.Connector.Cccs.Tests/StellaOps.Concelier.Connector.Cccs.Tests.csproj`.
- Fixtures capture both EN/FR advisories with nested lists to guard against sanitiser regressions; review diffs for heading/list changes before committing.

View File

@@ -0,0 +1,146 @@
# Concelier CERT-Bund Connector Operations
_Last updated: 2025-10-17_
Germanys Federal Office for Information Security (BSI) operates the Warn- und Informationsdienst (WID) portal. The Concelier CERT-Bund connector (`source:cert-bund:*`) ingests the public RSS feed, hydrates the portals JSON detail endpoint, and maps the result into canonical advisories while preserving the original German content.
---
## 1. Configuration Checklist
- Allow outbound access (or stage mirrors) for:
- `https://wid.cert-bund.de/content/public/securityAdvisory/rss`
- `https://wid.cert-bund.de/portal/` (session/bootstrap)
- `https://wid.cert-bund.de/portal/api/securityadvisory` (detail/search/export JSON)
- Ensure the HTTP client reuses a cookie container (the connectors dependency injection wiring already sets this up).
Example `concelier.yaml` fragment:
```yaml
concelier:
sources:
cert-bund:
feedUri: "https://wid.cert-bund.de/content/public/securityAdvisory/rss"
portalBootstrapUri: "https://wid.cert-bund.de/portal/"
detailApiUri: "https://wid.cert-bund.de/portal/api/securityadvisory"
maxAdvisoriesPerFetch: 50
maxKnownAdvisories: 512
requestTimeout: "00:00:30"
requestDelay: "00:00:00.250"
failureBackoff: "00:05:00"
```
> Leave `maxAdvisoriesPerFetch` at 50 during normal operation. Raise it only for controlled backfills, then restore the default to avoid overwhelming the portal.
---
## 2. Telemetry & Logging
- **Meter**: `StellaOps.Concelier.Connector.CertBund`
- **Counters / histograms**:
- `certbund.feed.fetch.attempts|success|failures`
- `certbund.feed.items.count`
- `certbund.feed.enqueued.count`
- `certbund.feed.coverage.days`
- `certbund.detail.fetch.attempts|success|not_modified|failures{reason}`
- `certbund.parse.success|failures{reason}`
- `certbund.parse.products.count`, `certbund.parse.cve.count`
- `certbund.map.success|failures{reason}`
- `certbund.map.affected.count`, `certbund.map.aliases.count`
- Shared HTTP metrics remain available through `concelier.source.http.*`.
**Structured logs** (all emitted at information level when work occurs):
- `CERT-Bund fetch cycle: … truncated {Truncated}, coverageDays={CoverageDays}`
- `CERT-Bund parse cycle: parsed {Parsed}, failures {Failures}, …`
- `CERT-Bund map cycle: mapped {Mapped}, failures {Failures}, …`
Alerting ideas:
1. `increase(certbund.detail.fetch.failures_total[10m]) > 0`
2. `rate(certbund.map.success_total[30m]) == 0`
3. `histogram_quantile(0.95, rate(concelier_source_http_duration_bucket{concelier_source="cert-bund"}[15m])) > 5s`
The WebService now registers the meter so metrics surface automatically once OpenTelemetry metrics are enabled.
---
## 3. Historical Backfill & Export Strategy
### 3.1 Retention snapshot
- RSS window: ~250 advisories (≈90days at current cadence).
- Older advisories are accessible through the JSON search/export APIs once the anti-CSRF token is supplied.
### 3.2 JSON search pagination
```bash
# 1. Bootstrap cookies (client_config + XSRF-TOKEN)
curl -s -c cookies.txt "https://wid.cert-bund.de/portal/" > /dev/null
curl -s -b cookies.txt -c cookies.txt \
-H "X-Requested-With: XMLHttpRequest" \
"https://wid.cert-bund.de/portal/api/security/csrf" > /dev/null
XSRF=$(awk '/XSRF-TOKEN/ {print $7}' cookies.txt)
# 2. Page search results
curl -s -b cookies.txt \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "X-XSRF-TOKEN: ${XSRF}" \
-X POST \
--data '{"page":4,"size":100,"sort":["published,desc"]}' \
"https://wid.cert-bund.de/portal/api/securityadvisory/search" \
> certbund-page4.json
```
Iterate `page` until the response `content` array is empty. Pages 09 currently cover 2014→present. Persist JSON responses (plus SHA256) for Offline Kit parity.
> **Shortcut** run `python src/Tools/certbund_offline_snapshot.py --output seed-data/cert-bund`
> to bootstrap the session, capture the paginated search responses, and regenerate
> the manifest/checksum files automatically. Supply `--cookie-file` and `--xsrf-token`
> if the portal requires a browser-derived session (see options via `--help`).
### 3.3 Export bundles
```bash
python src/Tools/certbund_offline_snapshot.py \
--output seed-data/cert-bund \
--start-year 2014 \
--end-year "$(date -u +%Y)"
```
The helper stores yearly exports under `seed-data/cert-bund/export/`,
captures paginated search snapshots in `seed-data/cert-bund/search/`,
and generates the manifest + SHA files in `seed-data/cert-bund/manifest/`.
Split ranges according to your compliance window (default: one file per
calendar year). Concelier can ingest these JSON payloads directly when
operating offline.
> When automatic bootstrap fails (e.g. portal introduces CAPTCHA), run the
> manual `curl` flow above, then rerun the helper with `--skip-fetch` to
> rebuild the manifest from the existing files.
### 3.4 Connector-driven catch-up
1. Temporarily raise `maxAdvisoriesPerFetch` (e.g. 150) and reduce `requestDelay`.
2. Run `stella db jobs run source:cert-bund:fetch --and-then source:cert-bund:parse --and-then source:cert-bund:map` until the fetch log reports `enqueued=0`.
3. Restore defaults and capture the cursor snapshot for audit.
---
## 4. Locale & Translation Guidance
- Advisories remain in German (`language: "de"`). Preserve wording for provenance and legal accuracy.
- UI localisation: enable the translation bundles documented in `docs/15_UI_GUIDE.md` if English UI copy is required. Operators can overlay machine or human translations, but the canonical database stores the source text.
- Docs guild is compiling a CERT-Bund terminology glossary under `docs/locale/certbund-glossary.md` so downstream teams can reference consistent English equivalents without altering the stored advisories.
---
## 5. Verification Checklist
1. Observe `certbund.feed.fetch.success` and `certbund.detail.fetch.success` increments after runs; `certbund.feed.coverage.days` should hover near the observed RSS window.
2. Ensure summary logs report `truncated=false` in steady state—`true` indicates the fetch cap was hit.
3. During backfills, watch `certbund.feed.enqueued.count` trend to zero.
4. Spot-check stored advisories in Mongo to confirm `language="de"` and reference URLs match the portal detail endpoint.
5. For Offline Kit exports, validate SHA256 hashes before distribution.

View File

@@ -0,0 +1,94 @@
# Concelier Cisco PSIRT Connector OAuth Provisioning SOP
_Last updated: 2025-10-14_
## 1. Scope
This runbook describes how Ops provisions, rotates, and distributes Cisco PSIRT openVuln OAuth client credentials for the Concelier Cisco connector. It covers online and air-gapped (Offline Kit) environments, quota-aware execution, and escalation paths.
## 2. Prerequisites
- Active Cisco.com (CCO) account with access to the Cisco API Console.
- Cisco PSIRT openVuln API entitlement (visible under “My Apps & Keys” once granted).citeturn3search0
- Concelier configuration location (typically `/etc/stella/concelier.yaml` in production) or Offline Kit secret bundle staging directory.
## 3. Provisioning workflow
1. **Register the application**
- Sign in at <https://apiconsole.cisco.com>.
- Select **Register a New App** → Application Type: `Service`, Grant Type: `Client Credentials`, API: `Cisco PSIRT openVuln API`.citeturn3search0
- Record the generated `clientId` and `clientSecret` in the Ops vault.
2. **Verify token issuance**
- Request an access token with:
```bash
curl -s https://id.cisco.com/oauth2/default/v1/token \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "grant_type=client_credentials" \
-d "client_id=${CLIENT_ID}" \
-d "client_secret=${CLIENT_SECRET}"
```
- Confirm HTTP 200 and an `expires_in` value of 3600 seconds (tokens live for one hour).citeturn3search0turn3search7
- Preserve the response only long enough to validate syntax; do **not** persist tokens.
3. **Authorize Concelier runtime**
- Update `concelier:sources:cisco:auth` (or the module-specific secret template) with the stored credentials.
- For Offline Kit delivery, export encrypted secrets into `offline-kit/secrets/cisco-openvuln.json` using the platforms sealed secret format.
4. **Connectivity validation**
- From the Concelier control plane, run `stella db jobs run source:vndr-cisco:fetch --dry-run`.
- Ensure the Source HTTP diagnostics record `Bearer` authorization headers and no 401/403 responses.
## 4. Rotation SOP
| Step | Owner | Notes |
| --- | --- | --- |
| 1. Schedule rotation | Ops (monthly board) | Rotate every 90 days or immediately after suspected credential exposure. |
| 2. Create replacement app | Ops | Repeat §3.1 with “-next” suffix; verify token issuance. |
| 3. Stage dual credentials | Ops + Concelier On-Call | Publish new credentials to secret store alongside current pair. |
| 4. Cut over | Concelier On-Call | Restart connector workers during a low-traffic window (<10 min) to pick up the new secret. |
| 5. Deactivate legacy app | Ops | Delete prior app in Cisco API Console once telemetry confirms successful fetch/parse cycles for 2 consecutive hours. |
**Automation hooks**
- Rotation reminders are tracked in OpsRunbookOps board (`OPS-RUN-KEYS` swim lane); add checklist items for Concelier Cisco when opening a rotation task.
- Use the secret management pipeline (`ops/secrets/rotate.sh --connector cisco`) to template vault updates; the script renders a redacted diff for audit.
## 5. Offline Kit packaging
1. Generate the credential bundle using the Offline Kit CLI:
`offline-kit secrets add cisco-openvuln --client-id … --client-secret …`
2. Store the encrypted payload under `offline-kit/secrets/cisco-openvuln.enc`.
3. Distribute via the Offline Kit channel; update `offline-kit/MANIFEST.md` with the credential fingerprint (SHA256 of plaintext concatenated with metadata).
4. Document validation steps for the receiving site (token request from an air-gapped relay or cached token mirror).
## 6. Quota and throttling guidance
- Cisco enforces combined limits of 5 requests/second, 30 requests/minute, and 5000 requests/day per application.citeturn0search0turn3search6
- Concelier fetch jobs must respect `Retry-After` headers on HTTP 429 responses; Ops should monitor for sustained quota saturation and consider paging window adjustments.
- Telemetry to watch: `concelier.source.http.requests{concelier.source="vndr-cisco"}`, `concelier.source.http.failures{...}`, and connector-specific metrics once implemented.
## 7. Telemetry & Monitoring
- **Metrics (Meter `StellaOps.Concelier.Connector.Vndr.Cisco`)**
- `cisco.fetch.documents`, `cisco.fetch.failures`, `cisco.fetch.unchanged`
- `cisco.parse.success`, `cisco.parse.failures`
- `cisco.map.success`, `cisco.map.failures`, `cisco.map.affected.packages`
- **Shared HTTP metrics** via `SourceDiagnostics`:
- `concelier.source.http.requests{concelier.source="vndr-cisco"}`
- `concelier.source.http.failures{concelier.source="vndr-cisco"}`
- `concelier.source.http.duration{concelier.source="vndr-cisco"}`
- **Structured logs**
- `Cisco fetch completed date=… pages=… added=…` (info)
- `Cisco parse completed parsed=… failures=…` (info)
- `Cisco map completed mapped=… failures=…` (info)
- Warnings surface when DTO serialization fails or GridFS payload is missing.
- Suggested alerts: non-zero `cisco.fetch.failures` in 15m, or `cisco.map.success` flatlines while fetch continues.
## 8. Incident response
- **Token compromise** revoke the application in the Cisco API Console, purge cached secrets, rotate immediately per §4.
- **Persistent 401/403** confirm credentials in vault, then validate token issuance; if unresolved, open a Cisco DevNet support ticket referencing the application ID.
- **429 spikes** inspect job scheduler cadence and adjust connector options (`maxRequestsPerWindow`) before requesting higher quotas from Cisco.
## 9. References
- Cisco PSIRT openVuln API Authentication Guide.citeturn3search0
- Accessing the openVuln API using curl (token lifetime).citeturn3search7
- openVuln API rate limit documentation.citeturn0search0turn3search6

View File

@@ -0,0 +1,151 @@
{
"title": "Concelier CVE & KEV Observability",
"uid": "concelier-cve-kev",
"schemaVersion": 38,
"version": 1,
"editable": true,
"timezone": "",
"time": {
"from": "now-24h",
"to": "now"
},
"refresh": "5m",
"templating": {
"list": [
{
"name": "datasource",
"type": "datasource",
"query": "prometheus",
"refresh": 1,
"hide": 0
}
]
},
"panels": [
{
"type": "timeseries",
"title": "CVE fetch success vs failure",
"gridPos": { "h": 9, "w": 12, "x": 0, "y": 0 },
"fieldConfig": {
"defaults": {
"unit": "ops",
"custom": {
"drawStyle": "line",
"lineWidth": 2,
"fillOpacity": 10
}
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "rate(cve_fetch_success_total[5m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "success"
},
{
"refId": "B",
"expr": "rate(cve_fetch_failures_total[5m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "failure"
}
]
},
{
"type": "timeseries",
"title": "KEV fetch cadence",
"gridPos": { "h": 9, "w": 12, "x": 12, "y": 0 },
"fieldConfig": {
"defaults": {
"unit": "ops",
"custom": {
"drawStyle": "line",
"lineWidth": 2,
"fillOpacity": 10
}
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "rate(kev_fetch_success_total[30m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "success"
},
{
"refId": "B",
"expr": "rate(kev_fetch_failures_total[30m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "failure"
},
{
"refId": "C",
"expr": "rate(kev_fetch_unchanged_total[30m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "unchanged"
}
]
},
{
"type": "table",
"title": "KEV parse anomalies (24h)",
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 9 },
"fieldConfig": {
"defaults": {
"unit": "short"
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "sum by (reason) (increase(kev_parse_anomalies_total[24h]))",
"format": "table",
"datasource": { "type": "prometheus", "uid": "${datasource}" }
}
],
"transformations": [
{
"id": "organize",
"options": {
"renameByName": {
"Value": "count"
}
}
}
]
},
{
"type": "timeseries",
"title": "Advisories emitted",
"gridPos": { "h": 8, "w": 12, "x": 12, "y": 9 },
"fieldConfig": {
"defaults": {
"unit": "ops",
"custom": {
"drawStyle": "line",
"lineWidth": 2,
"fillOpacity": 10
}
},
"overrides": []
},
"targets": [
{
"refId": "A",
"expr": "rate(cve_map_success_total[15m])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "CVE"
},
{
"refId": "B",
"expr": "rate(kev_map_advisories_total[24h])",
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"legendFormat": "KEV"
}
]
}
]
}

View File

@@ -0,0 +1,143 @@
# Concelier CVE & KEV Connector Operations
This playbook equips operators with the steps required to roll out and monitor the CVE Services and CISA KEV connectors across environments.
## 1. CVE Services Connector (`source:cve:*`)
### 1.1 Prerequisites
- CVE Services API credentials (organisation ID, user ID, API key) with access to the JSON 5 API.
- Network egress to `https://cveawg.mitre.org` (or a mirrored endpoint) from the Concelier workers.
- Updated `concelier.yaml` (or the matching environment variables) with the following section:
```yaml
concelier:
sources:
cve:
baseEndpoint: "https://cveawg.mitre.org/api/"
apiOrg: "ORG123"
apiUser: "user@example.org"
apiKeyFile: "/var/run/secrets/concelier/cve-api-key"
seedDirectory: "./seed-data/cve"
pageSize: 200
maxPagesPerFetch: 5
initialBackfill: "30.00:00:00"
requestDelay: "00:00:00.250"
failureBackoff: "00:10:00"
```
> Store the API key outside source control. When using `apiKeyFile`, mount the secret file into the container/host; alternatively supply `apiKey` via `CONCELIER_SOURCES__CVE__APIKEY`.
> 🪙 When credentials are not yet available, configure `seedDirectory` to point at mirrored CVE JSON (for example, the repos `seed-data/cve/` bundle). The connector will ingest those records and log a warning instead of failing the job; live fetching resumes automatically once `apiOrg` / `apiUser` / `apiKey` are supplied.
### 1.2 Smoke Test (staging)
1. Deploy the updated configuration and restart the Concelier service so the connector picks up the credentials.
2. Trigger one end-to-end cycle:
- Concelier CLI: `stella db jobs run source:cve:fetch --and-then source:cve:parse --and-then source:cve:map`
- REST fallback: `POST /jobs/run { "kind": "source:cve:fetch", "chain": ["source:cve:parse", "source:cve:map"] }`
3. Observe the following metrics (exported via OTEL meter `StellaOps.Concelier.Connector.Cve`):
- `cve.fetch.attempts`, `cve.fetch.success`, `cve.fetch.documents`, `cve.fetch.failures`, `cve.fetch.unchanged`
- `cve.parse.success`, `cve.parse.failures`, `cve.parse.quarantine`
- `cve.map.success`
4. Verify Prometheus shows matching `concelier.source.http.requests_total{concelier_source="cve"}` deltas (list vs detail phases) while `concelier.source.http.failures_total{concelier_source="cve"}` stays flat.
5. Confirm the info-level summary log `CVEs fetch window … pages=X detailDocuments=Y detailFailures=Z` appears once per fetch run and shows `detailFailures=0`.
6. Verify the MongoDB advisory store contains fresh CVE advisories (`advisoryKey` prefix `cve/`) and that the source cursor (`source_states` collection) advanced.
### 1.3 Production Monitoring
- **Dashboards** Plot `rate(cve_fetch_success_total[5m])`, `rate(cve_fetch_failures_total[5m])`, and `rate(cve_fetch_documents_total[5m])` alongside `concelier_source_http_requests_total{concelier_source="cve"}` to confirm HTTP and connector counters stay aligned. Keep `concelier.range.primitives{scheme=~"semver|vendor"}` on the same board for range coverage. Example alerts:
- `rate(cve_fetch_failures_total[5m]) > 0` for 10minutes (`severity=warning`)
- `rate(cve_map_success_total[15m]) == 0` while `rate(cve_fetch_success_total[15m]) > 0` (`severity=critical`)
- `sum_over_time(cve_parse_quarantine_total[1h]) > 0` to catch schema anomalies
- **Logs** Monitor warnings such as `Failed fetching CVE record {CveId}` and `Malformed CVE JSON`, and surface the summary info log `CVEs fetch window … detailFailures=0 detailUnchanged=0` on dashboards. A non-zero `detailFailures` usually indicates rate-limit or auth issues on detail requests.
- **Grafana pack** Import `docs/modules/concelier/operations/connectors/cve-kev-grafana-dashboard.json` and filter by panel legend (`CVE`, `KEV`) to reuse the canned layout.
- **Backfill window** Operators can tighten or widen `initialBackfill` / `maxPagesPerFetch` after validating throughput. Update config and restart Concelier to apply changes.
### 1.4 Staging smoke log (2025-10-15)
While Ops finalises long-lived CVE Services credentials, we validated the connector end-to-end against the recorded CVE-2024-0001 payloads used in regression tests:
- Command: `dotnet test src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Cve.Tests/StellaOps.Concelier.Connector.Cve.Tests.csproj -l "console;verbosity=detailed"`
- Summary log emitted by the connector:
```
CVEs fetch window 2024-09-01T00:00:00Z->2024-10-01T00:00:00Z pages=1 listSuccess=1 detailDocuments=1 detailFailures=0 detailUnchanged=0 pendingDocuments=0->1 pendingMappings=0->1 hasMorePages=False nextWindowStart=2024-09-15T12:00:00Z nextWindowEnd=(none) nextPage=1
```
- Telemetry captured by `Meter` `StellaOps.Concelier.Connector.Cve`:
| Metric | Value |
|--------|-------|
| `cve.fetch.attempts` | 1 |
| `cve.fetch.success` | 1 |
| `cve.fetch.documents` | 1 |
| `cve.parse.success` | 1 |
| `cve.map.success` | 1 |
The Grafana pack `docs/modules/concelier/operations/connectors/cve-kev-grafana-dashboard.json` has been imported into staging so the panels referenced above render against these counters once the live API keys are in place.
## 2. CISA KEV Connector (`source:kev:*`)
### 2.1 Prerequisites
- Network egress (or mirrored content) for `https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json`.
- No credentials are required, but the HTTP allow-list must include `www.cisa.gov`.
- Confirm the following snippet in `concelier.yaml` (defaults shown; tune as needed):
```yaml
concelier:
sources:
kev:
feedUri: "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"
requestTimeout: "00:01:00"
failureBackoff: "00:05:00"
```
### 2.2 Schema validation & anomaly handling
The connector validates each catalog against `Schemas/kev-catalog.schema.json`. Failures increment `kev.parse.failures_total{reason="schema"}` and the document is quarantined (status `Failed`). Additional failure reasons include `download`, `invalidJson`, `deserialize`, `missingPayload`, and `emptyCatalog`. Entry-level anomalies are surfaced through `kev.parse.anomalies_total` with reasons:
| Reason | Meaning |
| --- | --- |
| `missingCveId` | Catalog entry omitted `cveID`; the entry is skipped. |
| `countMismatch` | Catalog `count` field disagreed with the actual entry total. |
| `nullEntry` | Upstream emitted a `null` entry object (rare upstream defect). |
Treat repeated schema failures or growing anomaly counts as an upstream regression and coordinate with CISA or mirror maintainers.
### 2.3 Smoke Test (staging)
1. Deploy the configuration and restart Concelier.
2. Trigger a pipeline run:
- CLI: `stella db jobs run source:kev:fetch --and-then source:kev:parse --and-then source:kev:map`
- REST: `POST /jobs/run { "kind": "source:kev:fetch", "chain": ["source:kev:parse", "source:kev:map"] }`
3. Verify the metrics exposed by meter `StellaOps.Concelier.Connector.Kev`:
- `kev.fetch.attempts`, `kev.fetch.success`, `kev.fetch.unchanged`, `kev.fetch.failures`
- `kev.parse.entries` (tag `catalogVersion`), `kev.parse.failures`, `kev.parse.anomalies` (tag `reason`)
- `kev.map.advisories` (tag `catalogVersion`)
4. Confirm `concelier.source.http.requests_total{concelier_source="kev"}` increments once per fetch and that the paired `concelier.source.http.failures_total` stays flat (zero increase).
5. Inspect the info logs `Fetched KEV catalog document … pendingDocuments=…` and `Parsed KEV catalog document … entries=…`—they should appear exactly once per run and `Mapped X/Y… skipped=0` should match the `kev.map.advisories` delta.
6. Confirm MongoDB documents exist for the catalog JSON (`raw_documents` & `dtos`) and that advisories with prefix `kev/` are written.
### 2.4 Production Monitoring
- Alert when `rate(kev_fetch_success_total[8h]) == 0` during working hours (daily cadence breach) and when `increase(kev_fetch_failures_total[1h]) > 0`.
- Page the on-call if `increase(kev_parse_failures_total{reason="schema"}[6h]) > 0`—this usually signals an upstream payload change. Treat repeated `reason="download"` spikes as networking issues to the mirror.
- Track anomaly spikes through `sum_over_time(kev_parse_anomalies_total{reason="missingCveId"}[24h])`. Rising `countMismatch` trends point to catalog publishing bugs.
- Surface the fetch/mapping info logs (`Fetched KEV catalog document …` and `Mapped X/Y KEV advisories … skipped=S`) on dashboards; absence of those logs while metrics show success typically means schema validation short-circuited the run.
### 2.5 Known good dashboard tiles
Add the following panels to the Concelier observability board:
| Metric | Recommended visualisation |
|--------|---------------------------|
| `rate(kev_fetch_success_total[30m])` | Single-stat (last 24h) with warning threshold `>0` |
| `rate(kev_parse_entries_total[1h])` by `catalogVersion` | Stacked area highlights daily release size |
| `sum_over_time(kev_parse_anomalies_total[1d])` by `reason` | Table anomaly breakdown (matches dashboard panel) |
| `rate(cve_map_success_total[15m])` vs `rate(kev_map_advisories_total[24h])` | Comparative timeseries for advisories emitted |
## 3. Runbook updates
- Record staging/production smoke test results (date, catalog version, advisory counts) in your teams change log.
- Add the CVE/KEV job kinds to the standard maintenance checklist so operators can manually trigger them after planned downtime.
- Keep this document in sync with future connector changes (for example, new anomaly reasons or additional metrics).
- Version-control dashboard tweaks alongside `docs/modules/concelier/operations/connectors/cve-kev-grafana-dashboard.json` so operations can re-import the observability pack during restores.

View File

@@ -0,0 +1,123 @@
# Concelier GHSA Connector Operations Runbook
_Last updated: 2025-10-16_
## 1. Overview
The GitHub Security Advisories (GHSA) connector pulls advisory metadata from the GitHub REST API `/security/advisories` endpoint. GitHub enforces both primary and secondary rate limits, so operators must monitor usage and configure retries to avoid throttling incidents.
## 2. Rate-limit telemetry
The connector now surfaces rate-limit headers on every fetch and exposes the following metrics via OpenTelemetry:
| Metric | Description | Tags |
|--------|-------------|------|
| `ghsa.ratelimit.limit` (histogram) | Samples the reported request quota at fetch time. | `phase` = `list` or `detail`, `resource` (e.g., `core`). |
| `ghsa.ratelimit.remaining` (histogram) | Remaining requests returned by `X-RateLimit-Remaining`. | `phase`, `resource`. |
| `ghsa.ratelimit.reset_seconds` (histogram) | Seconds until `X-RateLimit-Reset`. | `phase`, `resource`. |
| `ghsa.ratelimit.headroom_pct` (histogram) | Percentage of the quota still available (`remaining / limit * 100`). | `phase`, `resource`. |
| `ghsa.ratelimit.headroom_pct_current` (observable gauge) | Latest headroom percentage reported per resource. | `phase`, `resource`. |
| `ghsa.ratelimit.exhausted` (counter) | Incremented whenever GitHub returns a zero remaining quota and the connector delays before retrying. | `phase`. |
### Dashboards & alerts
- Plot `ghsa.ratelimit.remaining` as the latest value to watch the runway. Alert when the value stays below **`RateLimitWarningThreshold`** (default `500`) for more than 5 minutes.
- Use `ghsa.ratelimit.headroom_pct_current` to visualise remaining quota % — paging once it sits below **10%** for longer than a single reset window helps avoid secondary limits.
- Raise a separate alert on `increase(ghsa.ratelimit.exhausted[15m]) > 0` to catch hard throttles.
- Overlay `ghsa.fetch.attempts` vs `ghsa.fetch.failures` to confirm retries are effective.
## 3. Logging signals
When `X-RateLimit-Remaining` falls below `RateLimitWarningThreshold`, the connector emits:
```
GHSA rate limit warning: remaining {Remaining}/{Limit} for {Phase} {Resource} (headroom {Headroom}%)
```
When GitHub reports zero remaining calls, the connector logs and sleeps for the reported `Retry-After`/`X-RateLimit-Reset` interval (falling back to `SecondaryRateLimitBackoff`).
After the quota recovers above the warning threshold the connector writes an informational log with the refreshed remaining/headroom, letting operators clear alerts quickly.
## 4. Configuration knobs (`concelier.yaml`)
```yaml
concelier:
sources:
ghsa:
apiToken: "${GITHUB_PAT}"
pageSize: 50
requestDelay: "00:00:00.200"
failureBackoff: "00:05:00"
rateLimitWarningThreshold: 500 # warn below this many remaining calls
secondaryRateLimitBackoff: "00:02:00" # fallback delay when GitHub omits Retry-After
```
### Recommendations
- Increase `requestDelay` in air-gapped or burst-heavy deployments to smooth token consumption.
- Lower `rateLimitWarningThreshold` only if your dashboards already page on the new histogram; never set it negative.
- For bots using a low-privilege PAT, keep `secondaryRateLimitBackoff` at ≥60 seconds to respect GitHubs secondary-limit guidance.
#### Default job schedule
| Job kind | Cron | Timeout | Lease |
|----------|------|---------|-------|
| `source:ghsa:fetch` | `1,11,21,31,41,51 * * * *` | 6 minutes | 4 minutes |
| `source:ghsa:parse` | `3,13,23,33,43,53 * * * *` | 5 minutes | 4 minutes |
| `source:ghsa:map` | `5,15,25,35,45,55 * * * *` | 5 minutes | 4 minutes |
These defaults spread GHSA stages across the hour so fetch completes before parse/map fire. Override them via `concelier.jobs.definitions[...]` when coordinating multiple connectors on the same runner.
## 5. Provisioning credentials
Concelier requires a GitHub personal access token (classic) with the **`read:org`** and **`security_events`** scopes to pull GHSA data. Store it as a secret and reference it via `concelier.sources.ghsa.apiToken`.
### Docker Compose (stack operators)
```yaml
services:
concelier:
environment:
CONCELIER__SOURCES__GHSA__APITOKEN: /run/secrets/ghsa_pat
secrets:
- ghsa_pat
secrets:
ghsa_pat:
file: ./secrets/ghsa_pat.txt # contains only the PAT value
```
### Helm values (cluster operators)
```yaml
concelier:
extraEnv:
- name: CONCELIER__SOURCES__GHSA__APITOKEN
valueFrom:
secretKeyRef:
name: concelier-ghsa
key: apiToken
extraSecrets:
concelier-ghsa:
apiToken: "<paste PAT here or source from external secret store>"
```
After rotating the PAT, restart the Concelier workers (or run `kubectl rollout restart deployment/concelier`) to ensure the configuration reloads.
When enabling GHSA the first time, run a staged backfill:
1. Trigger `source:ghsa:fetch` manually (CLI or API) outside of peak hours.
2. Watch `concelier.jobs.health` for the GHSA jobs until they report `healthy`.
3. Allow the scheduled cron cadence to resume once the initial backlog drains (typically < 30 minutes).
## 6. Runbook steps when throttled
1. Check `ghsa.ratelimit.exhausted` for the affected phase (`list` vs `detail`).
2. Confirm the connector is delayinglogs will show `GHSA rate limit exhausted...` with the chosen backoff.
3. If rate limits stay exhausted:
- Verify no other jobs are sharing the PAT.
- Temporarily reduce `MaxPagesPerFetch` or `PageSize` to shrink burst size.
- Consider provisioning a dedicated PAT (GHSA permissions only) for Concelier.
4. After the quota resets, reset `rateLimitWarningThreshold`/`requestDelay` to their normal values and monitor the histograms for at least one hour.
## 7. Alert integration quick reference
- Prometheus: `ghsa_ratelimit_remaining_bucket` (from histogram) use `histogram_quantile(0.99, ...)` to trend capacity.
- VictoriaMetrics: `LAST_over_time(ghsa_ratelimit_remaining_sum[5m])` for simple last-value graphs.
- Grafana: stack remaining + used to visualise total limit per resource.
## 8. Canonical metric fallback analytics
When GitHub omits CVSS vectors/scores, the connector now assigns a deterministic canonical metric id in the form `ghsa:severity/<level>` and publishes it to Merge so severity precedence still resolves against GHSA even without CVSS data.
- Metric: `ghsa.map.canonical_metric_fallbacks` (counter) with tags `severity`, `canonical_metric_id`, `reason=no_cvss`.
- Monitor the counter alongside Merge parity checks; a sudden spike suggests GitHub is shipping advisories without vectors and warrants cross-checking downstream exporters.
- Because the canonical id feeds Merge, parity dashboards should overlay this metric to confirm fallback advisories continue to merge ahead of downstream sources when GHSA supplies more recent data.

View File

@@ -0,0 +1,122 @@
# Concelier CISA ICS Connector Operations
This runbook documents how to provision, rotate, and validate credentials for the CISA Industrial Control Systems (ICS) connector (`source:ics-cisa:*`). Follow it before enabling the connector in staging or offline installations.
## 1. Credential Provisioning
1. **Create a service mailbox** reachable by the Ops crew (shared mailbox recommended).
2. Browse to `https://public.govdelivery.com/accounts/USDHSCISA/subscriber/new` and subscribe the mailbox to the following GovDelivery topics:
- `USDHSCISA_16` — ICS-CERT advisories (legacy numbering: `ICSA-YY-###`).
- `USDHSCISA_19` — ICS medical advisories (`ICSMA-YY-###`).
- `USDHSCISA_17` — ICS alerts (`IR-ALERT-YY-###`) for completeness.
3. Complete the verification email. After confirmation, note the **personalised subscription code** included in the “Manage Preferences” link. It has the shape `code=AB12CD34EF`.
4. Store the code in the shared secret vault (or Offline Kit secrets bundle) as `concelier/sources/icscisa/govdelivery/code`.
> GovDelivery does not expose a one-time API key; the personalised code is what authenticates the RSS pull. Never commit it to git.
## 2. Feed Validation
Use the following command to confirm the feed is reachable before wiring it into Concelier (substitute `<CODE>` with the personalised value):
```bash
curl -H "User-Agent: StellaOpsConcelier/ics-cisa" \
"https://content.govdelivery.com/accounts/USDHSCISA/topics/ICS-CERT/feed.rss?format=xml&code=<CODE>"
```
If the endpoint returns HTTP 200 and an RSS payload, record the sample response under `docs/artifacts/icscisa/` (see Task `FEEDCONN-ICSCISA-02-007`). HTTP 403 or 406 usually means the subscription was not confirmed or the code was mistyped.
## 3. Configuration Snippet
Add the connector configuration to `concelier.yaml` (or equivalent environment variables):
```yaml
concelier:
sources:
icscisa:
govDelivery:
code: "${CONCELIER_ICS_CISA_GOVDELIVERY_CODE}"
topics:
- "USDHSCISA_16"
- "USDHSCISA_19"
- "USDHSCISA_17"
rssBaseUri: "https://content.govdelivery.com/accounts/USDHSCISA"
requestDelay: "00:00:01"
failureBackoff: "00:05:00"
```
Environment variable example:
```bash
export CONCELIER_SOURCES_ICSCISA_GOVDELIVERY_CODE="AB12CD34EF"
```
Concelier automatically register the host with the Source.Common HTTP allow-list when the connector assembly is loaded.
Optional tuning keys (set only when needed):
- `proxyUri` — HTTP/HTTPS proxy URL used when Akamai blocks direct pulls.
- `requestVersion` / `requestVersionPolicy` — override HTTP negotiation when the proxy requires HTTP/1.1.
- `enableDetailScrape` — toggle HTML detail fallback (defaults to true).
- `captureAttachments` — collect PDF attachments from detail pages (defaults to true).
- `detailBaseUri` — alternate host for detail enrichment if CISA changes their layout.
## 4. Seeding Without GovDelivery
If credentials are still pending, populate the connector with the community CSV dataset before enabling the live fetch:
1. Run `./scripts/fetch-ics-cisa-seed.sh` (or `.ps1`) to download the latest `CISA_ICS_ADV_*.csv` files into `seed-data/ics-cisa/`.
2. Copy the CSVs (and the generated `.sha256` files) into your Offline Kit staging area so they ship alongside the other feeds.
3. Import the kit as usual. The connector can parse the seed data for historical context, but **live GovDelivery credentials are still required** for fresh advisories.
4. Once credentials arrive, update `concelier:sources:icscisa:govDelivery:code` and re-trigger `source:ics-cisa:fetch` so the connector switches to the authorised feed.
> The CSVs are licensed under ODbL1.0 by the ICS Advisory Project. Preserve the attribution when redistributing them.
## 4. Integration Validation
1. Ensure secrets are in place and restart the Concelier workers.
2. Run a dry-run fetch/parse/map chain against an Akamai-protected topic:
```bash
CONCELIER_SOURCES_ICSCISA_GOVDELIVERY_CODE=... \
CONCELIER_SOURCES_ICSCISA_ENABLEDETAILSCRAPE=1 \
stella db jobs run source:ics-cisa:fetch --and-then source:ics-cisa:parse --and-then source:ics-cisa:map
```
3. Confirm logs contain `ics-cisa detail fetch` entries and that new documents/DTOs include attachments (see `docs/artifacts/icscisa`). Canonical advisories should expose PDF links as `references.kind == "attachment"` and affected packages should surface `primitives.semVer.exactValue` for single-version hits.
4. If Akamai blocks direct fetches, set `concelier:sources:icscisa:proxyUri` to your allow-listed egress proxy and rerun the dry-run.
## 4. Rotation & Incident Response
- Review GovDelivery access quarterly. Rotate the personalised code whenever Ops changes the service mailbox password or membership.
- Revoking the subscription in GovDelivery invalidates the code immediately; update the vault and configuration in the same change.
- If the code leaks, remove the subscription (`https://public.govdelivery.com/accounts/USDHSCISA/subscriber/manage_preferences?code=<CODE>`), resubscribe, and distribute the new value via the vault.
## 5. Offline Kit Handling
Include the personalised code in `offline-kit/secrets/concelier/icscisa.env`:
```
CONCELIER_SOURCES_ICSCISA_GOVDELIVERY_CODE=AB12CD34EF
```
The Offline Kit deployment script copies this file into the container secret directory mounted at `/run/secrets/concelier`. Ensure permissions are `600` and ownership matches the Concelier runtime user.
## 6. Telemetry & Monitoring
The connector emits metrics under the meter `StellaOps.Concelier.Connector.Ics.Cisa`. They allow operators to track Akamai fallbacks, detail enrichment health, and advisory fan-out.
- `icscisa.fetch.*` counters for `attempts`, `success`, `failures`, `not_modified`, and `fallbacks`, plus histogram `icscisa.fetch.documents` showing documents added per topic pull (tags: `concelier.source`, `icscisa.topic`).
- `icscisa.parse.*` counters for `success`/`failures` and histograms `icscisa.parse.advisories`, `icscisa.parse.attachments`, `icscisa.parse.detail_fetches` to monitor enrichment workload per feed document.
- `icscisa.detail.*` counters `success` / `failures` per advisory (tagged with `icscisa.advisory`) to alert when Akamai blocks detail pages.
- `icscisa.map.*` counters for `success`/`failures` and histograms `icscisa.map.references`, `icscisa.map.packages`, `icscisa.map.aliases` capturing canonical fan-out.
Suggested alerts:
- `increase(icscisa.fetch.failures_total[15m]) > 0` or `increase(icscisa.fetch.fallbacks_total[15m]) > 5` — sustained Akamai or proxy issues.
- `increase(icscisa.detail.failures_total[30m]) > 0` — detail enrichment breaking (potential HTML layout change).
- `histogram_quantile(0.95, rate(icscisa.map.references_bucket[1h]))` trending sharply higher — sudden advisory reference explosion worth investigating.
- Keep an eye on shared HTTP metrics (`concelier.source.http.*{concelier.source="ics-cisa"}`) for request latency and retry patterns.
## 6. Related Tasks
- `FEEDCONN-ICSCISA-02-009` (GovDelivery credential onboarding) — completed once this runbook is followed and secrets are placed in the vault.
- `FEEDCONN-ICSCISA-02-007` (document inventory) — archive the first successful RSS response and any attachment URL schema under `docs/artifacts/icscisa/`.

View File

@@ -0,0 +1,74 @@
# Concelier KISA Connector Operations
Operational guidance for the Korea Internet & Security Agency (KISA / KNVD) connector (`source:kisa:*`). Pair this with the engineering brief in `docs/dev/kisa_connector_notes.md`.
## 1. Prerequisites
- Outbound HTTPS (or mirrored cache) for `https://knvd.krcert.or.kr/`.
- Connector options defined under `concelier:sources:kisa`:
```yaml
concelier:
sources:
kisa:
feedUri: "https://knvd.krcert.or.kr/rss/securityInfo.do"
detailApiUri: "https://knvd.krcert.or.kr/rssDetailData.do"
detailPageUri: "https://knvd.krcert.or.kr/detailDos.do"
maxAdvisoriesPerFetch: 10
requestDelay: "00:00:01"
failureBackoff: "00:05:00"
```
> Ensure the URIs stay absolute—Concelier adds the `feedUri`/`detailApiUri` hosts to the HttpClient allow-list automatically.
## 2. Staging Smoke Test
1. Restart the Concelier workers so the KISA options bind.
2. Run a full connector cycle:
- CLI: `stella db jobs run source:kisa:fetch --and-then source:kisa:parse --and-then source:kisa:map`
- REST: `POST /jobs/run { "kind": "source:kisa:fetch", "chain": ["source:kisa:parse", "source:kisa:map"] }`
3. Confirm telemetry (Meter `StellaOps.Concelier.Connector.Kisa`):
- `kisa.feed.success`, `kisa.feed.items`
- `kisa.detail.success` / `.failures`
- `kisa.parse.success` / `.failures`
- `kisa.map.success` / `.failures`
- `kisa.cursor.updates`
4. Inspect logs for structured entries:
- `KISA feed returned {ItemCount}`
- `KISA fetched detail for {Idx} … category={Category}`
- `KISA mapped advisory {AdvisoryId} (severity={Severity})`
- Absence of warnings such as `document missing GridFS payload`.
5. Validate MongoDB state:
- `raw_documents.metadata` has `kisa.idx`, `kisa.category`, `kisa.title`.
- DTO store contains `schemaVersion="kisa.detail.v1"`.
- Advisories include aliases (`IDX`, CVE) and `language="ko"`.
- `source_states` entry for `kisa` shows recent `cursor.lastFetchAt`.
## 3. Production Monitoring
- **Dashboards** Add the following Prometheus/OTEL expressions:
- `rate(kisa_feed_items_total[15m])` versus `rate(concelier_source_http_requests_total{concelier_source="kisa"}[15m])`
- `increase(kisa_detail_failures_total{reason!="empty-document"}[1h])` alert at `>0`
- `increase(kisa_parse_failures_total[1h])` for storage/JSON issues
- `increase(kisa_map_failures_total[1h])` to flag schema drift
- `increase(kisa_cursor_updates_total[6h]) == 0` during active windows → warn
- **Alerts** Page when `rate(kisa_feed_success_total[2h]) == 0` while other connectors are active; back off for maintenance windows announced on `https://knvd.krcert.or.kr/`.
- **Logs** Watch for repeated warnings (`document missing`, `DTO missing`) or errors with reason tags `HttpRequestException`, `download`, `parse`, `map`.
## 4. Localisation Handling
- Hangul categories (for example `취약점정보`) flow into telemetry tags (`category=…`) and logs. Dashboards must render UTF8 and avoid transliteration.
- HTML content is sanitised before storage; translation teams can consume the `ContentHtml` field safely.
- Advisory severity remains as provided by KISA (`High`, `Medium`, etc.). Map-level failures include the severity tag for filtering.
## 5. Fixture & Regression Maintenance
- Regression fixtures: `src/Concelier/__Tests/StellaOps.Concelier.Connector.Kisa.Tests/Fixtures/kisa-feed.xml` and `kisa-detail.json`.
- Refresh via `UPDATE_KISA_FIXTURES=1 dotnet test src/Concelier/__Tests/StellaOps.Concelier.Connector.Kisa.Tests/StellaOps.Concelier.Connector.Kisa.Tests.csproj`.
- The telemetry regression (`KisaConnectorTests.Telemetry_RecordsMetrics`) will fail if counters/log wiring drifts—treat failures as gating.
## 6. Known Issues
- RSS feeds only expose the latest 10 advisories; long outages require replay via archived feeds or manual IDX seeds.
- Detail endpoint occasionally throttles; the connector honours `requestDelay` and reports failures with reason `HttpRequestException`. Consider increasing delay for weekend backfills.
- If `kisa.category` tags suddenly appear as `unknown`, verify KISA has not renamed RSS elements; update the parser fixtures before production rollout.

View File

@@ -0,0 +1,86 @@
# Concelier MSRC Connector Azure AD Onboarding Brief
_Drafted: 2025-10-15_
## 1. App registration requirements
- **Tenant**: shared StellaOps production Azure AD.
- **Application type**: confidential client (web/API) issuing client credentials.
- **API permissions**: `api://api.msrc.microsoft.com/.default` (Application). Admin consent required once.
- **Token audience**: `https://api.msrc.microsoft.com/`.
- **Grant type**: client credentials. Concelier will request tokens via `POST https://login.microsoftonline.com/{tenantId}/oauth2/v2.0/token`.
## 2. Secret/credential policy
- Maintain two client secrets (primary + standby) rotating every 90 days.
- Store secrets in the Concelier secrets vault; Offline Kit deployments must mirror the secret payloads in their encrypted store.
- Record rotation cadence in Ops runbook and update Concelier configuration (`CONCELIER__SOURCES__VNDR__MSRC__CLIENTSECRET`) ahead of expiry.
## 3. Concelier configuration sample
```yaml
concelier:
sources:
vndr.msrc:
tenantId: "<azure-tenant-guid>"
clientId: "<app-registration-client-id>"
clientSecret: "<pull from secret store>"
apiVersion: "2024-08-01"
locale: "en-US"
requestDelay: "00:00:00.250"
failureBackoff: "00:05:00"
cursorOverlapMinutes: 10
downloadCvrf: false # set true to persist CVRF ZIP alongside JSON detail
```
## 4. CVRF artefacts
- The MSRC REST payload exposes `cvrfUrl` per advisory. Current connector persists the link as advisory metadata and reference; it does **not** download the ZIP by default.
- Ops should mirror CVRF ZIPs when preparing Offline Kits so air-gapped deployments can reconcile advisories without direct internet access.
- Once Offline Kit storage guidelines are finalised, extend the connector configuration with `downloadCvrf: true` to enable automatic attachment retrieval.
### 4.1 State seeding helper
Use `src/Tools/SourceStateSeeder` to queue historical advisories (detail JSON + optional CVRF artefacts) for replay without manual Mongo edits. Example seed file:
```json
{
"source": "vndr.msrc",
"cursor": {
"lastModifiedCursor": "2024-01-01T00:00:00Z"
},
"documents": [
{
"uri": "https://api.msrc.microsoft.com/sug/v2.0/vulnerability/ADV2024-0001",
"contentFile": "./seeds/adv2024-0001.json",
"contentType": "application/json",
"metadata": { "msrc.vulnerabilityId": "ADV2024-0001" },
"addToPendingDocuments": true
},
{
"uri": "https://download.microsoft.com/msrc/2024/ADV2024-0001.cvrf.zip",
"contentFile": "./seeds/adv2024-0001.cvrf.zip",
"contentType": "application/zip",
"status": "mapped",
"addToPendingDocuments": false
}
]
}
```
Run the helper:
```bash
dotnet run --project src/Tools/SourceStateSeeder -- \
--connection-string "mongodb://localhost:27017" \
--database concelier \
--input seeds/msrc-backfill.json
```
Any documents marked `addToPendingDocuments` will appear in the connector cursor; `DownloadCvrf` can remain disabled if the ZIP artefact is pre-seeded.
## 5. Outstanding items
- Ops to confirm tenant/app names and provide client credentials through the secure channel.
- Connector team monitors token cache health (already implemented); validate instrumentation once Ops supplies credentials.
- Offline Kit packaging: add encrypted blob containing client credentials with rotation instructions.

View File

@@ -0,0 +1,48 @@
# NKCKI Connector Operations Guide
## Overview
The NKCKI connector ingests JSON bulletin archives from cert.gov.ru, expanding each `*.json.zip` attachment into per-vulnerability DTOs before canonical mapping. The fetch pipeline now supports cache-backed recovery, deterministic pagination, and telemetry suitable for production monitoring.
## Configuration
Key options exposed through `concelier:sources:ru-nkcki:http`:
- `maxBulletinsPerFetch` limits new bulletin downloads in a single run (default `5`).
- `maxListingPagesPerFetch` maximum listing pages visited during pagination (default `3`).
- `listingCacheDuration` minimum interval between listing fetches before falling back to cached artefacts (default `00:10:00`).
- `cacheDirectory` optional path for persisted bulletin archives used during offline or failure scenarios.
- `requestDelay` delay inserted between bulletin downloads to respect upstream politeness.
When operating in offline-first mode, set `cacheDirectory` to a writable path (e.g. `/var/lib/concelier/cache/ru-nkcki`) and pre-populate bulletin archives via the offline kit.
## Telemetry
`RuNkckiDiagnostics` emits the following metrics under meter `StellaOps.Concelier.Connector.Ru.Nkcki`:
- `nkcki.listing.fetch.attempts` / `nkcki.listing.fetch.success` / `nkcki.listing.fetch.failures`
- `nkcki.listing.pages.visited` (histogram, `pages`)
- `nkcki.listing.attachments.discovered` / `nkcki.listing.attachments.new`
- `nkcki.bulletin.fetch.success` / `nkcki.bulletin.fetch.cached` / `nkcki.bulletin.fetch.failures`
- `nkcki.entries.processed` (histogram, `entries`)
Integrate these counters into standard Concelier observability dashboards to track crawl coverage and cache hit rates.
## Archive Backfill Strategy
Bitrix pagination surfaces archives via `?PAGEN_1=n`. The connector now walks up to `maxListingPagesPerFetch` pages, deduplicating bulletin IDs and maintaining a rolling `knownBulletins` window. Backfill strategy:
1. Enumerate pages from newest to oldest, respecting `maxListingPagesPerFetch` and `listingCacheDuration` to avoid refetch storms.
2. Persist every `*.json.zip` attachment to the configured cache directory. This enables replay when listing access is temporarily blocked.
3. During archive replay, `ProcessCachedBulletinsAsync` enqueues missing documents while respecting `maxVulnerabilitiesPerFetch`.
4. For historical HTML-only advisories, collect page URLs and metadata while offline (future work: HTML and PDF extraction pipeline documented in `docs/concelier-connector-research-20251011.md`).
For large migrations, seed caches with archived zip bundles, then run fetch/parse/map cycles in chronological order to maintain deterministic outputs.
## Failure Handling
- Listing failures mark the source state with exponential backoff while attempting cache replay.
- Bulletin fetches fall back to cached copies before surfacing an error.
- Mongo integration tests rely on bundled OpenSSL 1.1 libraries (`src/Tools/openssl/linux-x64`) to keep `Mongo2Go` operational on modern distros.
Refer to `ru-nkcki` entries in `src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Ru.Nkcki/TASKS.md` for outstanding items.

View File

@@ -0,0 +1,24 @@
# Concelier OSV Connector Operations Notes
_Last updated: 2025-10-16_
The OSV connector ingests advisories from OSV.dev across OSS ecosystems. This note highlights the additional merge/export expectations introduced with the canonical metric fallback work in Sprint 4.
## 1. Canonical metric fallbacks
- When OSV omits CVSS vectors (common for CVSS v4-only payloads) the mapper now emits a deterministic canonical metric id in the form `osv:severity/<level>` and normalises the advisory severity to the same `<level>`.
- Metric: `osv.map.canonical_metric_fallbacks` (counter) with tags `severity`, `canonical_metric_id`, `ecosystem`, `reason=no_cvss`. Watch this alongside merge parity dashboards to catch spikes where OSV publishes severity-only advisories.
- Merge precedence still prefers GHSA over OSV; the shared severity-based canonical id keeps Merge/export parity deterministic even when only OSV supplies severity data.
## 2. CWE provenance
- `database_specific.cwe_ids` now populates provenance decision reasons for every mapped weakness. Expect `decisionReason="database_specific.cwe_ids"` on OSV weakness provenance and confirm exporters preserve the value.
- If OSV ever attaches `database_specific.cwe_notes`, the connector will surface the joined note string in `decisionReason` instead of the default marker.
## 3. Dashboards & alerts
- Extend existing merge dashboards with the new counter:
- Overlay `sum(osv.map.canonical_metric_fallbacks{ecosystem=~".+"})` with Merge severity overrides to confirm fallback advisories are reconciling cleanly.
- Alert when the 1-hour sum exceeds 50 for any ecosystem; baseline volume is currently <5 per day (mostly GHSA mirrors emitting CVSS v4 only).
- Exporters already surface `canonicalMetricId`; no schema change is required, but ORAS/Trivy bundles should be spot-checked after deploying the connector update.
## 4. Runbook updates
- Fixture parity suites (`osv-ghsa.*`) now assert the fallback id and provenance notes. Regenerate via `dotnet test src/Concelier/StellaOps.Concelier.PluginBinaries/StellaOps.Concelier.Connector.Osv.Tests/StellaOps.Concelier.Connector.Osv.Tests.csproj`.
- When investigating merge severity conflicts, include the fallback counter and confirm OSV advisories carry the expected `osv:severity/<level>` id before raising connector bugs.