Files
git.stella-ops.org/docs/dev/EXCITITOR_STATEMENT_BACKFILL.md
root 68da90a11a
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
Restructure solution layout by module
2025-10-28 15:10:40 +02:00

87 lines
3.8 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Excititor Statement Backfill Runbook
Last updated: 2025-10-19
## Overview
Use this runbook when you need to rebuild the `vex.statements` collection from historical raw documents. Typical scenarios:
- Upgrading the statement schema (e.g., adding severity/KEV/EPSS signals).
- Recovering from a partial ingest outage where statements were never persisted.
- Seeding a freshly provisioned Excititor deployment from an existing raw archive.
Backfill operates server-side via the Excititor WebService and reuses the same pipeline that powers the `/excititor/statements` ingestion endpoint. Each raw document is normalized, signed metadata is preserved, and duplicate statements are skipped unless the run is forced.
## Prerequisites
1. **Connectivity to Excititor WebService** the CLI uses the backend URL configured in `stellaops.yml` or the `--backend-url` argument.
2. **Authority credentials** the CLI honours the existing Authority client configuration; ensure the caller has permission to invoke admin endpoints.
3. **Mongo replica set** (recommended) causal consistency guarantees rely on majority read/write concerns. Standalone deployment works but skips cross-document transactions.
## CLI command
```
stellaops excititor backfill-statements \
[--retrieved-since <ISO8601>] \
[--force] \
[--batch-size <int>] \
[--max-documents <int>]
```
| Option | Description |
| ------ | ----------- |
| `--retrieved-since` | Only process raw documents fetched on or after the specified timestamp (UTC by default). |
| `--force` | Reprocess documents even if matching statements already exist (useful after schema upgrades). |
| `--batch-size` | Number of raw documents pulled per batch (default `100`). |
| `--max-documents` | Optional hard limit on the number of raw documents to evaluate. |
Example replay the last 48 hours of Red Hat ingest while keeping existing statements:
```
stellaops excititor backfill-statements \
--retrieved-since "$(date -u -d '48 hours ago' +%Y-%m-%dT%H:%M:%SZ)"
```
Example full replay with forced overwrites, capped at 2,000 documents:
```
stellaops excititor backfill-statements --force --max-documents 2000
```
The command returns a summary similar to:
```
Backfill completed: evaluated 450, backfilled 180, claims written 320, skipped 270, failures 0.
```
## Behaviour
- Raw documents are streamed in ascending `retrievedAt` order.
- Each document is normalized using the registered VEX normalizers (CSAF, CycloneDX, OpenVEX).
- Statements are appended through the same `IVexClaimStore.AppendAsync` path that powers `/excititor/statements`.
- Duplicate detection compares `Document.Digest`; duplicates are skipped unless `--force` is specified.
- Failures are logged with the offending digest and continue with the next document.
## Observability
- CLI logs aggregate counts and the backend logs per-digest warnings or errors.
- Mongo writes carry majority write concern; expect backfill throughput to match ingest baselines (≈5 seconds warm, 30 seconds cold).
- Monitor the `excititor.storage.backfill` log scope for detailed telemetry.
## Post-run verification
1. Inspect the `vex.statements` collection for the targeted window (check `InsertedAt`).
2. Re-run the Excititor storage test suite if possible:
```
dotnet test src/Excititor/__Tests/StellaOps.Excititor.Storage.Mongo.Tests/StellaOps.Excititor.Storage.Mongo.Tests.csproj
```
3. Optionally, call `/excititor/statements/{vulnerabilityId}/{productKey}` to confirm the expected statements exist.
## Rollback
If a forced run produced incorrect statements, use the standard Mongo rollback procedure:
1. Identify the `InsertedAt` window for the backfill run.
2. Delete affected records from `vex.statements` (and any downstream exports if applicable).
3. Rerun the backfill command with corrected parameters.