# Feedser CVE & KEV Connector Operations
This playbook equips operators with the steps required to roll out and monitor the CVE Services and CISA KEV connectors across environments.
## 1. CVE Services Connector (`source:cve:*`)
### 1.1 Prerequisites
- CVE Services API credentials (organisation ID, user ID, API key) with access to the JSON 5 API.
- Network egress to `https://cveawg.mitre.org` (or a mirrored endpoint) from the Feedser workers.
- Updated `feedser.yaml` (or the matching environment variables) with the following section:

```yaml
feedser:
  sources:
    cve:
      baseEndpoint: "https://cveawg.mitre.org/api/"
      apiOrg: "ORG123"
      apiUser: "user@example.org"
      apiKeyFile: "/var/run/secrets/feedser/cve-api-key"
      seedDirectory: "./seed-data/cve"
      pageSize: 200
      maxPagesPerFetch: 5
      initialBackfill: "30.00:00:00"
      requestDelay: "00:00:00.250"
      failureBackoff: "00:10:00"
```
> ℹ️ Store the API key outside source control. When using `apiKeyFile`, mount the secret file into the container/host; alternatively supply `apiKey` via `FEEDSER_SOURCES__CVE__APIKEY`.

> 🪙 When credentials are not yet available, configure `seedDirectory` to point at mirrored CVE JSON (for example, the repo’s `seed-data/cve/` bundle). The connector will ingest those records and log a warning instead of failing the job; live fetching resumes automatically once `apiOrg` / `apiUser` / `apiKey` are supplied.
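
The duration settings in the configuration above (`initialBackfill`, `requestDelay`, `failureBackoff`) look like .NET `TimeSpan` strings of the form `[d.]hh:mm:ss[.fffffff]`. That reading is an assumption drawn from the sample values, not from the connector source. Under that assumption, a minimal Python sketch for sanity-checking a value before deploying it could look like this; `parse_timespan` is a hypothetical helper, not part of Feedser.

```python
import re
from datetime import timedelta

# Assumed layout: optional "d." day prefix, then hh:mm:ss, then optional fraction.
_TIMESPAN = re.compile(
    r"^(?:(?P<days>\d+)\.)?(?P<hours>\d{1,2}):(?P<minutes>\d{2}):(?P<seconds>\d{2})"
    r"(?:\.(?P<fraction>\d{1,7}))?$"
)


def parse_timespan(text: str) -> timedelta:
    """Convert a value such as "30.00:00:00" or "00:00:00.250" to a timedelta."""
    match = _TIMESPAN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    fraction = match.group("fraction") or ""
    # Right-pad the fraction to microseconds; a seventh digit is truncated.
    microseconds = int(fraction.ljust(6, "0")[:6]) if fraction else 0
    return timedelta(
        days=int(match.group("days") or 0),
        hours=int(match.group("hours")),
        minutes=int(match.group("minutes")),
        seconds=int(match.group("seconds")),
        microseconds=microseconds,
    )
```

With the sample configuration, `initialBackfill` reads as 30 days, `requestDelay` as 250 milliseconds, and `failureBackoff` as 10 minutes.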
### 1.2 Smoke Test (staging)
1. Deploy the updated configuration and restart the Feedser service so the connector picks up the credentials.
2. Trigger one end-to-end cycle:
   - Feedser CLI: `stella db jobs run source:cve:fetch --and-then source:cve:parse --and-then source:cve:map`
   - REST fallback: `POST /jobs/run { "kind": "source:cve:fetch", "chain": ["source:cve:parse", "source:cve:map"] }`
3. Observe the following metrics (exported via OTEL meter `StellaOps.Feedser.Source.Cve`):
   - `cve.fetch.attempts`, `cve.fetch.success`, `cve.fetch.documents`, `cve.fetch.failures`, `cve.fetch.unchanged`
   - `cve.parse.success`, `cve.parse.failures`, `cve.parse.quarantine`
   - `cve.map.success`
4. Verify Prometheus shows matching `feedser.source.http.requests_total{feedser_source="cve"}` deltas (list vs detail phases) while `feedser.source.http.failures_total{feedser_source="cve"}` stays flat.
5. Confirm the info-level summary log `CVEs fetch window … pages=X detailDocuments=Y detailFailures=Z` appears once per fetch run and shows `detailFailures=0`.
6. Verify the MongoDB advisory store contains fresh CVE advisories (`advisoryKey` prefix `cve/`) and that the source cursor (`source_states` collection) advanced.
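
The REST fallback in step 2 can be scripted when the CLI is not at hand. The sketch below builds the `POST /jobs/run` request with the body shown in that step, using only the Python standard library. The base URL is a placeholder, authentication is left out, and `build_job_request` is a hypothetical helper; adjust all three to match the deployment.

```python
import json
import urllib.request


def build_job_request(base_url: str, source: str) -> urllib.request.Request:
    """Build the POST /jobs/run request for a fetch -> parse -> map chain."""
    payload = {
        "kind": f"source:{source}:fetch",
        "chain": [f"source:{source}:parse", f"source:{source}:map"],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/jobs/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(request, timeout=30)` sends it. The same helper covers the KEV chain by passing `"kev"` as the source.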
### 1.3 Production Monitoring
- **Dashboards** – Plot `rate(cve_fetch_success_total[5m])`, `rate(cve_fetch_failures_total[5m])`, and `rate(cve_fetch_documents_total[5m])` alongside `feedser_source_http_requests_total{feedser_source="cve"}` to confirm HTTP and connector counters stay aligned. Keep `feedser.range.primitives{scheme=~"semver|vendor"}` on the same board for range coverage. Example alerts:
  - `rate(cve_fetch_failures_total[5m]) > 0` for 10 minutes (`severity=warning`)
  - `rate(cve_map_success_total[15m]) == 0` while `rate(cve_fetch_success_total[15m]) > 0` (`severity=critical`)
  - `increase(cve_parse_quarantine_total[1h]) > 0` to catch schema anomalies (use `increase` rather than `sum_over_time`, because the metric is a cumulative counter)
- **Logs** – Monitor warnings such as `Failed fetching CVE record {CveId}` and `Malformed CVE JSON`, and surface the summary info log `CVEs fetch window … detailFailures=0 detailUnchanged=0` on dashboards. A non-zero `detailFailures` usually indicates rate-limit or auth issues on detail requests.
- **Grafana pack** – Import `docs/ops/feedser-cve-kev-grafana-dashboard.json` and filter by panel legend (`CVE`, `KEV`) to reuse the canned layout.
- **Backfill window** – Operators can tighten or widen `initialBackfill` / `maxPagesPerFetch` after validating throughput. Update the configuration and restart Feedser to apply changes.
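
The example alerts above all compare how far the connector counters moved over a window. For an ad hoc check (for instance, after a deploy and before the Prometheus rules exist), the same comparisons can be sketched in Python over two counter snapshots. This is an illustration only: `CveCounters` and `evaluate_alerts` are hypothetical names, the "for 10 minutes" hold time is not modelled, and counter resets are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CveCounters:
    """Cumulative connector counters sampled at one point in time."""
    fetch_success: int
    fetch_failures: int
    map_success: int
    parse_quarantine: int


def evaluate_alerts(before: CveCounters, after: CveCounters) -> list[str]:
    """Return the alert conditions that hold between two samples."""
    alerts = []
    if after.fetch_failures - before.fetch_failures > 0:
        alerts.append("warning: fetch failures increased")
    fetched = after.fetch_success - before.fetch_success
    mapped = after.map_success - before.map_success
    # Fetches succeed but nothing reaches the map stage: pipeline is stalled.
    if fetched > 0 and mapped == 0:
        alerts.append("critical: fetch succeeds but map is idle")
    if after.parse_quarantine - before.parse_quarantine > 0:
        alerts.append("warning: documents quarantined during parse")
    return alerts
```

An empty result means none of the three conditions held between the two samples.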
### 1.4 Staging smoke log (2025-10-15)
While Ops finalises long-lived CVE Services credentials, we validated the connector end-to-end against the recorded CVE-2024-0001 payloads used in regression tests:

- Command: `dotnet test src/StellaOps.Feedser.Source.Cve.Tests/StellaOps.Feedser.Source.Cve.Tests.csproj -l "console;verbosity=detailed"`
- Summary log emitted by the connector:
  ```
  CVEs fetch window 2024-09-01T00:00:00Z->2024-10-01T00:00:00Z pages=1 listSuccess=1 detailDocuments=1 detailFailures=0 detailUnchanged=0 pendingDocuments=0->1 pendingMappings=0->1 hasMorePages=False nextWindowStart=2024-09-15T12:00:00Z nextWindowEnd=(none) nextPage=1
  ```
- Telemetry captured by `Meter` `StellaOps.Feedser.Source.Cve`:

  | Metric | Value |
  |--------|-------|
  | `cve.fetch.attempts` | 1 |
  | `cve.fetch.success` | 1 |
  | `cve.fetch.documents` | 1 |
  | `cve.parse.success` | 1 |
  | `cve.map.success` | 1 |

The Grafana pack `docs/ops/feedser-cve-kev-grafana-dashboard.json` has been imported into staging so the panels referenced above render against these counters once the live API keys are in place.
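
The summary log captured in this section is a run of `key=value` pairs after the window, so checks such as "`detailFailures` must be 0" are easy to automate. The sketch below assumes the line keeps the shape of the captured sample; the helper names are hypothetical and the parser is not taken from the connector.

```python
import re

# Sample copied from the staging smoke log in this section.
SAMPLE = (
    "CVEs fetch window 2024-09-01T00:00:00Z->2024-10-01T00:00:00Z pages=1 "
    "listSuccess=1 detailDocuments=1 detailFailures=0 detailUnchanged=0 "
    "pendingDocuments=0->1 pendingMappings=0->1 hasMorePages=False "
    "nextWindowStart=2024-09-15T12:00:00Z nextWindowEnd=(none) nextPage=1"
)

_FIELD = re.compile(r"(\w+)=(\S+)")


def parse_fetch_summary(line: str) -> dict[str, str]:
    """Return the key=value fields of a CVE fetch summary line as strings."""
    if not line.startswith("CVEs fetch window "):
        raise ValueError("not a CVE fetch summary line")
    return dict(_FIELD.findall(line))


def detail_failures(line: str) -> int:
    """Extract the detailFailures count that the smoke test expects to be 0."""
    return int(parse_fetch_summary(line)["detailFailures"])
```

Values are kept as strings because some fields, such as `pendingDocuments=0->1` and `nextWindowEnd=(none)`, are not plain numbers.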
## 2. CISA KEV Connector (`source:kev:*`)
### 2.1 Prerequisites
- Network egress (or mirrored content) for `https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json`.
- No credentials are required, but the HTTP allow-list must include `www.cisa.gov`.
- Confirm the following snippet in `feedser.yaml` (defaults shown; tune as needed):

```yaml
feedser:
  sources:
    kev:
      feedUri: "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"
      requestTimeout: "00:01:00"
      failureBackoff: "00:05:00"
```
### 2.2 Schema validation & anomaly handling
The connector validates each catalog against `Schemas/kev-catalog.schema.json`. Failures increment `kev.parse.failures_total{reason="schema"}` and the document is quarantined (status `Failed`). Additional failure reasons include `download`, `invalidJson`, `deserialize`, `missingPayload`, and `emptyCatalog`. Entry-level anomalies are surfaced through `kev.parse.anomalies_total` with reasons:

| Reason | Meaning |
| --- | --- |
| `missingCveId` | Catalog entry omitted `cveID`; the entry is skipped. |
| `countMismatch` | Catalog `count` field disagreed with the actual entry total. |
| `nullEntry` | Upstream emitted a `null` entry object (rare upstream defect). |

Treat repeated schema failures or growing anomaly counts as an upstream regression and coordinate with CISA or mirror maintainers.
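
To reproduce the anomaly counts offline (for example, when triaging a mirrored catalog before escalating), the three reasons above can be tallied with a few lines of Python. This is a sketch under assumptions, not the connector's logic: it reads entries from the `vulnerabilities` array of the published feed, compares `count` against the raw array length (null entries included), and `count_anomalies` is a hypothetical helper.

```python
from collections import Counter
from typing import Any


def count_anomalies(catalog: dict[str, Any]) -> Counter:
    """Tally entry-level anomalies using the reason names from the table above."""
    anomalies: Counter = Counter()
    entries = catalog.get("vulnerabilities") or []
    for entry in entries:
        if entry is None:
            anomalies["nullEntry"] += 1
        elif not entry.get("cveID"):
            anomalies["missingCveId"] += 1
    declared = catalog.get("count")
    # Assumption: the declared count is compared with the raw array length.
    if declared is not None and declared != len(entries):
        anomalies["countMismatch"] += 1
    return anomalies
```

A clean catalog yields an empty counter; any non-empty result is worth comparing against the `kev.parse.anomalies_total` deltas for the same catalog version.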
### 2.3 Smoke Test (staging)
1. Deploy the configuration and restart Feedser.
2. Trigger a pipeline run:
   - CLI: `stella db jobs run source:kev:fetch --and-then source:kev:parse --and-then source:kev:map`
   - REST: `POST /jobs/run { "kind": "source:kev:fetch", "chain": ["source:kev:parse", "source:kev:map"] }`
3. Verify the metrics exposed by meter `StellaOps.Feedser.Source.Kev`:
   - `kev.fetch.attempts`, `kev.fetch.success`, `kev.fetch.unchanged`, `kev.fetch.failures`
   - `kev.parse.entries` (tag `catalogVersion`), `kev.parse.failures`, `kev.parse.anomalies` (tag `reason`)
   - `kev.map.advisories` (tag `catalogVersion`)
4. Confirm `feedser.source.http.requests_total{feedser_source="kev"}` increments once per fetch and that the paired `feedser.source.http.failures_total` stays flat (zero increase).
5. Inspect the info logs `Fetched KEV catalog document … pendingDocuments=…` and `Parsed KEV catalog document … entries=…`. Each should appear exactly once per run, and the `Mapped X/Y… skipped=0` summary should match the `kev.map.advisories` delta.
6. Confirm MongoDB documents exist for the catalog JSON (`raw_documents` & `dtos`) and that advisories with prefix `kev/` are written.
### 2.4 Production Monitoring
- Alert when `rate(kev_fetch_success_total[8h]) == 0` during working hours (daily cadence breach) and when `increase(kev_fetch_failures_total[1h]) > 0`.
- Page the on-call if `increase(kev_parse_failures_total{reason="schema"}[6h]) > 0`; this usually signals an upstream payload change. Treat repeated `reason="download"` spikes as networking issues to the mirror.
- Track anomaly spikes through `increase(kev_parse_anomalies_total{reason="missingCveId"}[24h])` (use `increase` rather than `sum_over_time`, because the metric is a cumulative counter). Rising `countMismatch` trends point to catalog publishing bugs.
- Surface the fetch/mapping info logs (`Fetched KEV catalog document …` and `Mapped X/Y KEV advisories … skipped=S`) on dashboards; absence of those logs while metrics show success typically means schema validation short-circuited the run.
### 2.5 Known good dashboard tiles
Add the following panels to the Feedser observability board:

| Metric | Recommended visualisation |
|--------|---------------------------|
| `rate(kev_fetch_success_total[30m])` | Single-stat (last 24 h) with warning threshold `>0` |
| `rate(kev_parse_entries_total[1h])` by `catalogVersion` | Stacked area – highlights daily release size |
| `sum_over_time(kev_parse_anomalies_total[1d])` by `reason` | Table – anomaly breakdown (matches dashboard panel) |
| `rate(cve_map_success_total[15m])` vs `rate(kev_map_advisories_total[24h])` | Comparative timeseries for advisories emitted |
## 3. Runbook updates
- Record staging/production smoke test results (date, catalog version, advisory counts) in your team’s change log.
- Add the CVE/KEV job kinds to the standard maintenance checklist so operators can manually trigger them after planned downtime.
- Keep this document in sync with future connector changes (for example, new anomaly reasons or additional metrics).
- Version-control dashboard tweaks alongside `docs/ops/feedser-cve-kev-grafana-dashboard.json` so operations can re-import the observability pack during restores.