# Excititor Statement Backfill Runbook

Last updated: 2025-10-19

## Overview

Use this runbook when you need to rebuild the `vex.statements` collection from historical raw documents. Typical scenarios:

- Upgrading the statement schema (e.g., adding severity/KEV/EPSS signals).
- Recovering from a partial ingest outage where statements were never persisted.
- Seeding a freshly provisioned Excititor deployment from an existing raw archive.

Backfill operates server-side via the Excititor WebService and reuses the same pipeline that powers the `/excititor/statements` ingestion endpoint. Each raw document is normalized, signed metadata is preserved, and duplicate statements are skipped unless the run is forced.

## Prerequisites

1. **Connectivity to Excititor WebService** – the CLI uses the backend URL configured in `stellaops.yml` or the `--backend-url` argument.
2. **Authority credentials** – the CLI honours the existing Authority client configuration; ensure the caller has permission to invoke admin endpoints.
3. **Mongo replica set** (recommended) – causal consistency guarantees rely on majority read/write concerns. A standalone deployment works but skips cross-document transactions.

## CLI command

```
stellaops excititor backfill-statements \
  [--retrieved-since <timestamp>] \
  [--force] \
  [--batch-size <count>] \
  [--max-documents <count>]
```

| Option | Description |
| ------ | ----------- |
| `--retrieved-since` | Only process raw documents fetched on or after the specified timestamp (UTC by default). |
| `--force` | Reprocess documents even if matching statements already exist (useful after schema upgrades). |
| `--batch-size` | Number of raw documents pulled per batch (default `100`). |
| `--max-documents` | Optional hard limit on the number of raw documents to evaluate. |

Example – replay the last 48 hours of Red Hat ingest while keeping existing statements:

```
stellaops excititor backfill-statements \
  --retrieved-since "$(date -u -d '48 hours ago' +%Y-%m-%dT%H:%M:%SZ)"
```

Example – full replay with forced overwrites, capped at 2,000 documents:

```
stellaops excititor backfill-statements --force --max-documents 2000
```

The command returns a summary similar to:

```
Backfill completed: evaluated 450, backfilled 180, claims written 320, skipped 270, failures 0.
```

## Behaviour

- Raw documents are streamed in ascending `retrievedAt` order.
- Each document is normalized using the registered VEX normalizers (CSAF, CycloneDX, OpenVEX).
- Statements are appended through the same `IVexClaimStore.AppendAsync` path that powers `/excititor/statements`.
- Duplicate detection compares `Document.Digest`; duplicates are skipped unless `--force` is specified.
- Failures are logged with the offending digest, and processing continues with the next document.

## Observability

- The CLI logs aggregate counts, and the backend logs per-digest warnings or errors.
- Mongo writes carry majority write concern; expect backfill throughput to match ingest baselines (≈5 seconds warm, 30 seconds cold).
- Monitor the `excititor.storage.backfill` log scope for detailed telemetry.

## Post-run verification

1. Inspect the `vex.statements` collection for the targeted window (check `InsertedAt`).
2. Re-run the Excititor storage test suite if possible:

   ```
   dotnet test src/Excititor/__Tests/StellaOps.Excititor.Storage.Mongo.Tests/StellaOps.Excititor.Storage.Mongo.Tests.csproj
   ```

3. Optionally, call `/excititor/statements/{vulnerabilityId}/{productKey}` to confirm the expected statements exist.

## Rollback

If a forced run produced incorrect statements, use the standard Mongo rollback procedure:

1. Identify the `InsertedAt` window for the backfill run.
2. Delete affected records from `vex.statements` (and any downstream exports if applicable).
3. Rerun the backfill command with corrected parameters.
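
When a rerun is scripted, it can help to check the summary line automatically instead of reading it by eye. The following is a minimal sketch, assuming the CLI prints the summary in the format shown under "CLI command" and that the output can be piped; the function names `backfill_failures` and `backfill_succeeded` are hypothetical helpers, not part of the `stellaops` CLI.

```shell
#!/usr/bin/env bash
# Sketch: gate a backfill rerun on the failure count in the summary line.
# Assumption: the summary matches the format shown in this runbook, e.g.
#   Backfill completed: evaluated 450, ..., skipped 270, failures 0.
# Adjust the pattern if the actual CLI output differs.

# Print the failure count from backfill output read on stdin.
# Prints nothing when no summary line is found.
backfill_failures() {
  sed -n 's/.*Backfill completed:.*failures \([0-9][0-9]*\)\..*/\1/p' | tail -n 1
}

# Return 0 only when a summary line was found and it reports zero failures.
backfill_succeeded() {
  local failures
  failures="$(backfill_failures)"
  [ -n "$failures" ] && [ "$failures" -eq 0 ]
}

# Example usage (not executed here):
#   stellaops excititor backfill-statements --force --max-documents 2000 2>&1 \
#     | tee backfill.log \
#     | backfill_succeeded || echo "backfill reported failures or no summary" >&2
```

A missing summary line is treated as a failure on purpose, so a run that crashed before printing its summary does not pass the check. Per-digest details still have to be read from the backend logs.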