Initial commit (history squashed)

2025-10-07 10:14:21 +03:00
commit b97fc7685a
1132 changed files with 117842 additions and 0 deletions
--- a/docs/ops/feedser-cve-kev-operations.md
+++ b/docs/ops/feedser-cve-kev-operations.md
@@ -0,0 +1,104 @@
+# Feedser CVE & KEV Connector Operations
+
+This playbook equips operators with the steps required to roll out and monitor the CVE Services and CISA KEV connectors across environments.
+
+## 1. CVE Services Connector (`source:cve:*`)
+
+### 1.1 Prerequisites
+
+- CVE Services API credentials (organisation ID, user ID, API key) with read access to the CVE Services API, which serves records in the CVE JSON 5 format.
+- Network egress to `https://cveawg.mitre.org` (or a mirrored endpoint) from the Feedser workers.
+- Updated `feedser.yaml` (or the matching environment variables) with the following section:
+
+```yaml
+feedser:
+  sources:
+    cve:
+      baseEndpoint: "https://cveawg.mitre.org/api/"
+      apiOrg: "ORG123"
+      apiUser: "user@example.org"
+      apiKeyFile: "/var/run/secrets/feedser/cve-api-key"
+      pageSize: 200
+      maxPagesPerFetch: 5
+      initialBackfill: "30.00:00:00"
+      requestDelay: "00:00:00.250"
+      failureBackoff: "00:10:00"
+```
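+
+The duration values appear to follow the .NET `TimeSpan` constant format (`[d.]hh:mm:ss[.fffffff]`). Under that format, `30.00:00:00` is 30 days and `00:00:00.250` is 250 ms. To sanity-check values before a rollout, a sketch like the following can help. `parse_timespan` is a hypothetical helper and is not part of Feedser. It assumes that format and accepts non-negative values only.
+
+```python
+import re
+from datetime import timedelta
+
+# Assumption: .NET TimeSpan constant format [d.]hh:mm:ss[.fffffff]
+_TIMESPAN = re.compile(r"(?:(\d+)\.)?(\d{1,2}):(\d{2}):(\d{2})(?:\.(\d{1,7}))?")
+
+
+def parse_timespan(value: str) -> timedelta:
+    """Convert a TimeSpan-style string such as '30.00:00:00' into a timedelta."""
+    match = _TIMESPAN.fullmatch(value)
+    if match is None:
+        raise ValueError(f"unrecognised duration: {value!r}")
+    days, hours, minutes, seconds, fraction = match.groups()
+    if int(hours) > 23 or int(minutes) > 59 or int(seconds) > 59:
+        raise ValueError(f"out-of-range component in: {value!r}")
+    # The fraction is decimal seconds: '250' -> 0.250 s -> 250000 microseconds.
+    micros = int((fraction or "").ljust(6, "0")[:6])
+    return timedelta(days=int(days or 0), hours=int(hours), minutes=int(minutes),
+                     seconds=int(seconds), microseconds=micros)
+
+
+for key, raw in {"initialBackfill": "30.00:00:00",
+                 "requestDelay": "00:00:00.250",
+                 "failureBackoff": "00:10:00"}.items():
+    print(f"{key}: {parse_timespan(raw)}")
+```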
+
+> ℹ️  Store the API key outside source control. When using `apiKeyFile`, mount the secret file into the container/host; alternatively supply `apiKey` via `FEEDSER_SOURCES__CVE__APIKEY`.
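+
+In the environment-variable form, keys are nested with a double underscore (`__`), which is the usual .NET configuration convention. The sketch below assumes that the `FEEDSER_` prefix from the example above applies to every setting. Under that assumption, override names can be derived as follows; `env_var_for` is hypothetical.
+
+```python
+def env_var_for(*path: str, prefix: str = "FEEDSER_") -> str:
+    """Build the environment-variable name for a setting nested under `feedser:`."""
+    if not path:
+        raise ValueError("at least one key segment is required")
+    # '__' marks each nesting level; upper-casing is cosmetic because
+    # .NET configuration keys are case-insensitive.
+    return prefix + "__".join(segment.upper() for segment in path)
+
+
+print(env_var_for("sources", "cve", "apiKey"))  # FEEDSER_SOURCES__CVE__APIKEY
+print(env_var_for("sources", "kev", "requestTimeout"))
+```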
+
+### 1.2 Smoke Test (staging)
+
+1. Deploy the updated configuration and restart the Feedser service so the connector picks up the credentials.
+2. Trigger one end-to-end cycle:
+   - Feedser CLI: `stella db jobs run source:cve:fetch --and-then source:cve:parse --and-then source:cve:map`
+   - REST fallback: `POST /jobs/run { "kind": "source:cve:fetch", "chain": ["source:cve:parse", "source:cve:map"] }`
+3. Observe the following metrics (exported via OTEL meter `StellaOps.Feedser.Source.Cve`):
+   - `cve.fetch.attempts`, `cve.fetch.success`, `cve.fetch.failures`, `cve.fetch.unchanged`
+   - `cve.parse.success`, `cve.parse.failures`, `cve.parse.quarantine`
+   - `cve.map.success`
+4. Verify the MongoDB advisory store contains fresh CVE advisories (`advisoryKey` prefix `cve/`) and that the source cursor (`source_states` collection) advanced.
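+
+On hosts without the CLI, the REST fallback can be scripted. The following is a minimal sketch that uses only the Python standard library. The base URL is a placeholder, and any authentication headers your deployment requires are not shown. Only the request body mirrors the documented `POST /jobs/run` example; `build_job_request` and `run_pipeline` are hypothetical helpers.
+
+```python
+import json
+import urllib.request
+
+
+def build_job_request(source: str) -> dict:
+    """Body for POST /jobs/run: fetch first, then chain parse and map."""
+    return {
+        "kind": f"source:{source}:fetch",
+        "chain": [f"source:{source}:parse", f"source:{source}:map"],
+    }
+
+
+def run_pipeline(base_url: str, source: str, timeout: float = 30.0) -> int:
+    """POST the job request and return the HTTP status code.
+
+    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
+    """
+    request = urllib.request.Request(
+        f"{base_url.rstrip('/')}/jobs/run",
+        data=json.dumps(build_job_request(source)).encode("utf-8"),
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+    with urllib.request.urlopen(request, timeout=timeout) as response:
+        return response.status
+
+
+print(json.dumps(build_job_request("cve")))
+# run_pipeline("https://feedser.staging.example", "cve")  # placeholder URL
+```
+
+The same helpers also cover the KEV pipeline: call them with `source="kev"`.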
+
+### 1.3 Production Monitoring
+
+- **Dashboards** – Add the counters above plus `feedser.range.primitives` (filtered by `scheme=semver` or `scheme=vendor`) to the Feedser overview board. Alert when:
+  - `rate(cve.fetch.failures[5m]) > 0`
+  - `rate(cve.map.success[15m]) == 0` while `cve.fetch.attempts` is still increasing
+  - `increase(cve.parse.quarantine[1h]) > 0`
+- **Logs** – Watch for `CveConnector` warnings such as `Failed fetching CVE record` or schema validation errors (`Malformed CVE JSON`). These are emitted with the CVE ID and document identifier for triage.
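+- **Backfill window** – Once baseline throughput is validated, operators can adjust two values. `initialBackfill` sets how far back the first run reaches, and `maxPagesPerFetch` sets how many pages each fetch may request. With the sample values (`pageSize: 200`, `maxPagesPerFetch: 5`), a single fetch should retrieve at most 1,000 records. Update the config and restart the worker to apply changes.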
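+
+The alert expressions above use the OpenTelemetry instrument names. When the meter is scraped through a Prometheus exporter, these names are usually translated. Characters outside `[a-zA-Z0-9_:]` become `_`, and monotonic counters gain a `_total` suffix. The exact result depends on the exporter and its settings. Treat this sketch as a starting point and confirm against the names your backend actually shows; `to_prometheus_name` is hypothetical.
+
+```python
+import re
+
+
+def to_prometheus_name(otel_name: str, monotonic_counter: bool = True) -> str:
+    """Approximate the usual OTel -> Prometheus metric-name translation.
+
+    Invalid characters become '_', runs of '_' collapse to one, and monotonic
+    counters gain a '_total' suffix. Unit suffixes are ignored in this sketch.
+    """
+    name = re.sub(r"[^a-zA-Z0-9_:]", "_", otel_name)
+    name = re.sub(r"_+", "_", name)
+    if monotonic_counter and not name.endswith("_total"):
+        name += "_total"
+    return name
+
+
+for metric in ("cve.fetch.failures", "cve.map.success", "cve.parse.quarantine"):
+    print(f"{metric} -> {to_prometheus_name(metric)}")
+```
+
+With that translation, `rate(cve.fetch.failures[5m]) > 0` would typically become `rate(cve_fetch_failures_total[5m]) > 0`.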
+
+## 2. CISA KEV Connector (`source:kev:*`)
+
+### 2.1 Prerequisites
+
+- Network egress (or mirrored content) for `https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json`.
+- No credentials are required, but the HTTP allow-list must include `www.cisa.gov`.
+- Confirm the following snippet in `feedser.yaml` (defaults shown; tune as needed):
+
+```yaml
+feedser:
+  sources:
+    kev:
+      feedUri: "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"
+      requestTimeout: "00:01:00"
+      failureBackoff: "00:05:00"
+```
+
+### 2.2 Schema validation & anomaly handling
+
+As of this sprint, the connector validates the KEV JSON payload against `Schemas/kev-catalog.schema.json`. Malformed documents are quarantined, and entries missing a CVE ID are dropped with a warning (`reason=missingCveId`). Treat repeated schema failures as an upstream regression and coordinate with CISA or the mirror maintainers.
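+
+The sketch below illustrates the two outcomes. A malformed document is rejected as a whole (the quarantine path). Individual entries without a CVE ID are dropped and counted under `missingCveId`. The sketch checks only a few top-level fields as a stand-in for `kev-catalog.schema.json`. `triage_catalog` and `REQUIRED_TOP_LEVEL` are hypothetical; `catalogVersion`, `dateReleased`, `vulnerabilities`, and `cveID` are field names used by the public KEV feed. The sample values are illustrative.
+
+```python
+import json
+from collections import Counter
+
+# Stand-in for Schemas/kev-catalog.schema.json: only a few top-level checks.
+REQUIRED_TOP_LEVEL = ("catalogVersion", "dateReleased", "vulnerabilities")
+
+
+def triage_catalog(payload: str):
+    """Return (catalogVersion, accepted CVE IDs, anomaly counts by reason).
+
+    Raises ValueError for a malformed document (the quarantine path);
+    json.JSONDecodeError is a subclass of ValueError.
+    """
+    document = json.loads(payload)
+    if not isinstance(document, dict):
+        raise ValueError("KEV catalog must be a JSON object")
+    missing = [key for key in REQUIRED_TOP_LEVEL if key not in document]
+    if missing:
+        raise ValueError(f"KEV catalog is missing fields: {missing}")
+    if not isinstance(document["vulnerabilities"], list):
+        raise ValueError("'vulnerabilities' must be an array")
+
+    accepted, anomalies = [], Counter()
+    for entry in document["vulnerabilities"]:
+        cve_id = entry.get("cveID") if isinstance(entry, dict) else None
+        if not cve_id:
+            anomalies["missingCveId"] += 1  # dropped with a warning, not quarantined
+            continue
+        accepted.append(cve_id)
+    return document["catalogVersion"], accepted, anomalies
+
+
+sample = json.dumps({
+    "catalogVersion": "2025.10.06",  # illustrative values only
+    "dateReleased": "2025-10-06T00:00:00Z",
+    "vulnerabilities": [{"cveID": "CVE-2021-44228"}, {"vendorProject": "Example"}],
+})
+version, cves, anomalies = triage_catalog(sample)
+print(version, cves, dict(anomalies))
+```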
+
+### 2.3 Smoke Test (staging)
+
+1. Deploy the configuration and restart Feedser.
+2. Trigger a pipeline run:
+   - CLI: `stella db jobs run source:kev:fetch --and-then source:kev:parse --and-then source:kev:map`
+   - REST: `POST /jobs/run { "kind": "source:kev:fetch", "chain": ["source:kev:parse", "source:kev:map"] }`
+3. Verify the metrics exposed by meter `StellaOps.Feedser.Source.Kev`:
+   - `kev.fetch.attempts`, `kev.fetch.success`, `kev.fetch.unchanged`, `kev.fetch.failures`
+   - `kev.parse.entries` (tag `catalogVersion`), `kev.parse.failures`, `kev.parse.anomalies` (tag `reason`)
+   - `kev.map.advisories` (tag `catalogVersion`)
+4. Confirm that MongoDB holds the catalog JSON (in the `raw_documents` and `dtos` collections) and that advisories with the `kev/` prefix were written.
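+
+Step 3 can be made repeatable by capturing the counters once before the run and once after it (for example, from your metrics backend). Any stage whose counter did not move is then flagged. The following is a minimal sketch with illustrative numbers; `stalled_stages` is hypothetical.
+
+```python
+def stalled_stages(before: dict, after: dict, stages: list) -> list:
+    """Return the counters in `stages` that did not increase between two snapshots."""
+    return [name for name in stages if after.get(name, 0) <= before.get(name, 0)]
+
+
+KEV_STAGES = ["kev.fetch.success", "kev.parse.entries", "kev.map.advisories"]
+
+# Illustrative snapshots taken before and after one smoke-test run.
+before = {"kev.fetch.success": 3, "kev.parse.entries": 2400, "kev.map.advisories": 2400}
+after = {"kev.fetch.success": 4, "kev.parse.entries": 3600, "kev.map.advisories": 2400}
+
+print(stalled_stages(before, after, KEV_STAGES))  # ['kev.map.advisories']
+```
+
+If `kev.fetch.unchanged` increased instead of `kev.fetch.success`, the upstream catalog was probably not modified. In that case, flat parse and map counters are likely expected rather than a failure.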
+
+### 2.4 Production Monitoring
+
+- Alert when `kev.fetch.success` stops increasing for longer than the expected daily cadence (suggested rule: `rate(kev.fetch.success[8h]) == 0` during business hours).
+- Track anomaly spikes via `kev.parse.anomalies{reason="missingCveId"}`. A sustained non-zero rate means the upstream catalog contains unexpected records.
+- The connector logs each validated catalog as `Parsed KEV catalog document … entries=X`. If `kev.fetch.success` keeps increasing but that log line stops appearing, schema validation is the likely cause. Correlate with warning-level events from the `StellaOps.Feedser.Source.Kev` logger.
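+
+Some alerting systems cannot express business-hours windows directly. In that case, the same staleness rule can be evaluated in a small job. This sketch assumes a weekday 09:00 to 17:00 window in the caller's time zone and an 8-hour threshold matching the suggested rule. `kev_fetch_stale` is hypothetical, and `last_success` would come from wherever you record successful fetches.
+
+```python
+from datetime import datetime, time, timedelta
+
+
+def kev_fetch_stale(last_success: datetime, now: datetime,
+                    window: timedelta = timedelta(hours=8),
+                    start: time = time(9, 0), end: time = time(17, 0)) -> bool:
+    """True when `now` is within weekday business hours and the last success is older than `window`."""
+    in_business_hours = now.weekday() < 5 and start <= now.time() < end
+    return in_business_hours and now - last_success > window
+
+
+now = datetime(2025, 10, 7, 10, 0)  # a Tuesday
+print(kev_fetch_stale(now - timedelta(hours=9), now))  # True
+print(kev_fetch_stale(now - timedelta(hours=2), now))  # False
+```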
+
+### 2.5 Known good dashboard tiles
+
+Add the following panels to the Feedser observability board:
+
+| Metric | Recommended visualisation |
+|--------|---------------------------|
+| `increase(kev.fetch.success[24h])` | Single-stat (last 24 h) with threshold alert |
+| `rate(kev.parse.entries[1h])` by `catalogVersion` | Stacked area – highlights daily release size |
+| `increase(kev.parse.anomalies[1d])` by `reason` | Table – anomaly breakdown |
+
+## 3. Runbook updates
+
+- Record staging/production smoke test results (date, catalog version, advisory counts) in your team’s change log.
+- Add the CVE/KEV job kinds to the standard maintenance checklist so operators can manually trigger them after planned downtime.
+- Keep this document in sync with future connector changes (for example, new anomaly reasons or additional metrics).