Here's a simple, practical idea to make your scans provably repeatable over time and catch drift fast.

# Replay Fidelity (what, why, how)

**What it is:** the share of historical scans that reproduce **bit-for-bit** when re-run using their saved manifests (inputs, versions, rules, seeds). Higher = more deterministic system.

**Why you want it:** it exposes hidden nondeterminism (feed drift, time-dependent rules, race conditions, unstable dependency resolution) and proves auditability for customers/compliance.

---

## The metric

* **Per-scan:** `replay_match = 1` if SBOM/VEX/findings + hashes are identical; else `0`.
* **Windowed:** `Replay Fidelity = (Σ replay_match) / (# historical replays in window)`.
* **Breakdown:** also track by scanner, language, base image, feed version, and environment.

---

## What must be captured in the scan manifest

* Exact source refs (image digest / repo SHA), container layer digests
* Scanner build ID + config (flags, rules, lattice/policy sets, seeds)
* Feed snapshots (CVE DB, OVAL, vendor advisories) as **content-addressed** bundles
* Normalization/version of the SBOM schema (e.g., CycloneDX 1.6 vs SPDX 3.0.1)
* Platform facts (OS/kernel, tz, locale), toolchain versions, clock policy

---

## Pass/Fail rules you can ship

* **Green:** fidelity ≥ 0.98 over 30 days, and no bucket < 0.95
* **Warn:** any bucket drops by ≥ 2% week-over-week
* **Fail the pipeline:** fidelity < 0.90 overall, or any regulated project < 0.95

---

## Minimal replay harness (outline)

1. Pick N historical scans (e.g., the last 200, or a sample stratified by language/ecosystem).
2. Restore their **frozen** manifest (scanner binary, feed bundle, policy lattice, seeds).
3. Re-run in a pinned runtime (OCI digest, pinned kernel in a VM, fixed TZ/locale).
4. Compare artifacts: SBOM JSON, VEX JSON, findings list, evidence blobs → SHA-256.
5. Emit: pass/fail, a diff summary, and a "cause" tag on mismatch (feed, policy, runtime, code).

---

## Dashboard (what to show)

* Fidelity % (30/90-day) + sparkline
* Top offenders (by language/scanner/policy set)
* "Cause of mismatch" histogram (feed vs runtime vs code vs policy)
* Click-through: deterministic diff (e.g., which CVEs flipped and why)

---

## Quick wins for Stella Ops

* Treat **feeds as immutable snapshots** (content-addressed tar.zst) and record their digest in each scan.
* Run the scanner in a **repro shell** (OCI image digest + fixed TZ/locale + no network).
* Normalize SBOM/VEX (key order, whitespace, float precision) before hashing (see the sketch below).
* Add a `stella replay --from MANIFEST.json` command + a nightly cron job to sample replays.
* Store `replay_result` rows; expose `/metrics` for Prometheus and a CI badge: `Replay Fidelity: 99.2%`.
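As a starting point for the normalize-before-hash quick win and the per-scan/windowed metric, here is a minimal Python sketch. The volatile key names and artifact shapes are assumptions, and artifact-specific list ordering is omitted:

```python
import hashlib
import json

# Keys assumed to be volatile; swap in the real schema's fields.
VOLATILE_KEYS = {"timestamp", "scan_duration_ms", "hostname"}

def _strip_volatile(node):
    """Recursively drop volatile keys so only stable content gets hashed."""
    if isinstance(node, dict):
        return {k: _strip_volatile(v) for k, v in node.items() if k not in VOLATILE_KEYS}
    if isinstance(node, list):
        return [_strip_volatile(v) for v in node]
    return node

def canonical_sha256(artifact: dict) -> str:
    """Canonical JSON: sorted keys, compact separators, UTF-8, volatile fields stripped."""
    canonical = json.dumps(_strip_volatile(artifact), sort_keys=True,
                           separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def replay_match(original: dict, replayed: dict) -> int:
    """Per-scan metric: 1 if the canonical hashes are identical, else 0."""
    return int(canonical_sha256(original) == canonical_sha256(replayed))

def replay_fidelity(matches: list[int]) -> float:
    """Windowed metric: sum(replay_match) / number of replays in the window."""
    return sum(matches) / len(matches) if matches else 1.0  # empty window → nothing drifted
```

Storing one canonical hash per artifact (SBOM, findings, VEX) turns the bitwise comparison in the replay harness into a simple string-equality check.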
Want me to draft the `stella replay` CLI spec and the DB table (DDL) you can drop into Postgres?

---

Below is an extended "Replay Fidelity" design **plus a concrete development implementation plan** you can hand to engineering. I'm assuming Stella Ops is doing container/app security scans that output SBOM + findings (and optionally VEX), and uses vulnerability "feeds" and policy/lattice/rules.

---

## 1) Extend the concept: Replay Fidelity as a product capability

### 1.1 Fidelity levels (so you can be strict without being brittle)

Instead of a single yes/no, define **tiers** that you can report and gate on:

1. **Bitwise Fidelity (BF)**
   * *Definition:* All primary artifacts (SBOM, findings, VEX, evidence) match **byte-for-byte** after canonicalization.
   * *Use:* strongest auditability, catches ordering/nondeterminism.
2. **Semantic Fidelity (SF)**
   * *Definition:* The *meaning* matches even if formatting differs (e.g., key order, whitespace, timestamps).
   * *How:* compare normalized objects: same packages, versions, CVEs, fix versions, severities, policy verdicts.
   * *Use:* protects you from "cosmetic diffs" and helps triage.
3. **Policy Fidelity (PF)**
   * *Definition:* Final policy decision (pass/fail + reason codes) matches.
   * *Use:* useful when outputs may evolve but the governance outcome must remain stable.

**Recommended reporting:**

* Dashboard shows BF, SF, PF together.
* Default engineering SLO: **BF ≥ 0.98**; compliance SLO: **BF ≥ 0.95** for regulated projects; PF should be ~1.0 unless policy changed intentionally.

---

### 1.2 "Why did it drift?" — mismatch classification taxonomy

When a replay fails, auto-tag the cause so humans don't diff JSON by hand.

**Primary mismatch classes**

* **Feed drift:** CVE/OVAL/vendor advisory snapshot differs.
* **Policy drift:** policy/lattice/rules differ (or the default rule set changed).
* **Runtime drift:** base image / libc / kernel / locale / tz / CPU arch differences.
* **Scanner drift:** scanner binary build differs or dependency versions changed.
* **Nondeterminism:** ordering instability, concurrency race, unseeded RNG, time-based logic.
* **External IO:** network calls, "latest" resolution, remote package registry changes.

**Output:** a `mismatch_reason` plus a short `diff_summary`.

---

### 1.3 Deterministic "scan envelope" design

A replay only works if the scan is fully specified.

**Scan envelope components**

* **Inputs:** image digest, repo commit SHA, build provenance, layer digests.
* **Scanner:** scanner OCI image digest (or binary digest), config flags, feature toggles.
* **Feeds:** content-addressed feed bundle digests (see §2.3).
* **Policy/rules:** git commit SHA + content digest of compiled rules.
* **Environment:** OS/arch, tz/locale, "clock mode", network mode, CPU count.
* **Normalization:** "canonicalization version" for SBOM/VEX/findings.

---

### 1.4 Canonicalization so "bitwise" is meaningful

To make BF achievable:

* Canonical JSON serialization (sorted keys, stable array ordering, normalized floats)
* Strip/normalize volatile fields (timestamps, "scan_duration_ms", hostnames)
* Stable ordering for lists: packages sorted by `(purl, version)`, vulnerabilities by `(cve_id, affected_purl)`
* Deterministic IDs: if you generate internal IDs, derive them from stable hashes of content (not UUID4)

---

### 1.5 Sampling strategy

You don't need to replay everything.

**Nightly sample:** stratified by:

* language ecosystem (npm, pip, maven, go, rust…)
* scanner engine
* base OS
* "regulatory tier"
* image size/complexity

**Plus:** always replay "golden canaries" (a fixed set of reference images) after every scanner release and feed ingestion pipeline change.
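A minimal sketch of that nightly sampler, assuming each historical scan is available as a dict carrying its stratum fields; the field names, per-stratum quota, and seed are hypothetical:

```python
import random
from collections import defaultdict

# Stratum fields assumed to be recorded on each historical scan record.
STRATUM_FIELDS = ("ecosystem", "scanner_engine", "base_os", "regulatory_tier")

def nightly_sample(scans: list[dict], canaries: list[dict],
                   per_stratum: int = 5, seed: int = 0) -> list[dict]:
    """Pick up to `per_stratum` scans per stratum, plus every golden canary."""
    rng = random.Random(seed)  # seeded so the sampler itself is reproducible
    buckets = defaultdict(list)
    for scan in scans:
        buckets[tuple(scan.get(f) for f in STRATUM_FIELDS)].append(scan)
    sample = list(canaries)  # canaries are always replayed
    for bucket in buckets.values():
        sample.extend(rng.sample(bucket, min(per_stratum, len(bucket))))
    return sample
```

Seeding the sampler keeps the replay workload itself reproducible, which makes week-over-week fidelity comparisons cleaner.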
---

## 2) Technical architecture blueprint

### 2.1 System components

1. **Manifest Writer (in the scan pipeline)**
   * Produces `ScanManifest v1` JSON
   * Records all digests and versions
2. **Artifact Store**
   * Stores SBOM, findings, VEX, evidence blobs
   * Stores canonical hashes for BF checks
3. **Feed Snapshotter**
   * Periodically builds immutable feed bundles
   * Content-addressed (digest-keyed)
   * Stores metadata (source URLs, generation timestamp, signature)
4. **Replay Orchestrator**
   * Chooses historical scans to replay
   * Launches "replay executor" jobs
5. **Replay Executor**
   * Runs the scanner in a pinned container image
   * Network off, tz fixed, clock policy applied
   * Produces new artifacts + hashes
6. **Diff & Scoring Engine**
   * Computes BF/SF/PF
   * Generates mismatch classification + diff summary
7. **Metrics + UI Dashboard**
   * Prometheus metrics
   * UI for drill-down diffs

---

### 2.2 Data model (Postgres-friendly)

**Core tables**

* `scan_manifests`
  * `scan_id (pk)`
  * `manifest_json`
  * `manifest_sha256`
  * `created_at`
* `scan_artifacts`
  * `scan_id (fk)`
  * `artifact_type` (sbom|findings|vex|evidence)
  * `artifact_uri`
  * `canonical_sha256`
  * `schema_version`
* `feed_snapshots`
  * `feed_digest (pk)`
  * `bundle_uri`
  * `sources_json`
  * `generated_at`
  * `signature`
* `replay_runs`
  * `replay_id (pk)`
  * `original_scan_id (fk)`
  * `status` (queued|running|passed|failed)
  * `bf_match bool`, `sf_match bool`, `pf_match bool`
  * `mismatch_reason`
  * `diff_summary_json`
  * `started_at`, `finished_at`
  * `executor_env_json` (arch, tz, cpu, image digest)

**Indexes**

* `(created_at)` for sampling windows
* `(mismatch_reason, finished_at)` for triage
* `(scanner_version, ecosystem)` for breakdown dashboards

---

### 2.3 Feed Snapshotting (the key to long-term replay)

**Feed bundle format**

* `feeds///...` inside a tar.zst
* manifest file inside the bundle: `feed_bundle_manifest.json` containing:
  * source URLs
  * retrieval commit/etag (if any)
  * file hashes
  * generated_by version

**Content addressing**

* Digest of the entire bundle (`sha256(tar.zst)`) is the reference.
* Scans record only the digest + URI.

**Immutability**

* Store bundles in object storage with WORM / retention if you need compliance.

---

### 2.4 Replay execution sandbox

For determinism, enforce:

* **No network** (K8s NetworkPolicy, firewall rules, or container runtime flags)
* **Fixed TZ/locale**
* **Pinned container image digest**
* **Clock policy**
  * Either "real time but recorded" or "frozen time at original scan timestamp"
  * If scanner logic uses the current date for severity windows, freeze time

---

## 3) Development implementation plan

I'll lay this out as **workstreams** plus **a sprint-by-sprint plan**. You can compress/expand depending on team size.

### Workstream A — Scan Manifest & Canonical Artifacts

**Goal:** every scan is replayable on paper, even before replays run.

**Deliverables**

* `ScanManifest v1` schema + writer integrated into the scan pipeline
* Canonicalization library + canonical hashing for all artifacts

**Acceptance criteria**

* Every scan stores: input digests, scanner digest, policy digest, feed digest placeholders
* Artifact hashes are stable across repeated runs in the same environment

---

### Workstream B — Feed Snapshotting & Policy Versioning

**Goal:** eliminate "feed drift" by pinning immutable inputs.

**Deliverables**

* Feed bundle builder + signer + uploader
* Policy/rules bundler (compiled rules bundle, digest recorded)

**Acceptance criteria**

* New scans reference feed bundle digests (not "latest")
* A scan can be re-run with the same feed bundle and policy bundle

---

### Workstream C — Replay Runner & Diff Engine

**Goal:** execute historical scans and score BF/SF/PF with actionable diffs.

**Deliverables**

* `stella replay --from manifest.json`
* Orchestrator job to schedule replays
* Diff engine + mismatch classifier (sketched below)
* Storage of replay results

**Acceptance criteria**

* Replay produces deterministic artifacts in a pinned environment
* Dashboard/CLI shows BF/SF/PF + a diff summary for failures
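A minimal sketch of the mismatch classifier deliverable, following the §1.2 taxonomy and the priority rules spelled out in Sprint 4; the manifest keys mirror the `ScanManifest v1` skeleton in §5.1, and the returned tags are illustrative rather than a fixed enum:

```python
def classify_mismatch(original_manifest: dict, replay_manifest: dict) -> str:
    """Coarse mismatch_reason for a failed replay, checked in priority order."""
    o, r = original_manifest, replay_manifest
    if o.get("feeds") != r.get("feeds"):
        return "feed_drift"
    if o.get("policy") != r.get("policy"):
        return "policy_drift"
    if o.get("scanner", {}).get("scanner_image_digest") != \
            r.get("scanner", {}).get("scanner_image_digest"):
        return "scanner_drift"
    if o.get("environment") != r.get("environment"):
        return "runtime_drift"
    # Same pinned inputs and environment but different outputs: nondeterminism
    # (ordering, concurrency race, unseeded RNG, time-based logic) or hidden external IO.
    return "nondeterminism"
```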
---

### Workstream D — Observability, Dashboard, and CI Gates

**Goal:** make fidelity visible and enforceable.

**Deliverables**

* Prometheus metrics: `replay_fidelity_bf`, `replay_fidelity_sf`, `replay_fidelity_pf`
* Breakdown labels (scanner, ecosystem, policy_set, base_os)
* Alerts for drop thresholds
* CI gate option: "block release if BF < threshold on canary set"

**Acceptance criteria**

* Engineering can see drift within 24h
* Releases are blocked when fidelity regressions occur

---

## 4) Suggested sprint plan with concrete tasks

### Sprint 0 — Design lock + baseline

**Tasks**

* Define manifest schema: `ScanManifest v1` fields + versioning rules
* Decide canonicalization rules (what is normalized vs preserved)
* Choose the initial "golden canary" scan set (10–20 representative targets)
* Add a "replay-fidelity" epic with ownership & SLIs/SLOs

**Exit criteria**

* Approved schema + canonicalization spec
* Canary set stored and tagged

---

### Sprint 1 — Manifest writer + artifact hashing (MVP)

**Tasks**

* Implement the manifest writer in the scan pipeline
* Store `manifest_json` + `manifest_sha256`
* Implement canonicalization + hashing for:
  * findings list (sorted)
  * SBOM (normalized)
  * VEX (if present)
* Persist canonical hashes in `scan_artifacts`

**Exit criteria**

* Two identical scans in the same environment yield identical artifact hashes
* A "manifest export" endpoint/CLI works:
  * `stella scan --emit-manifest out.json`

---

### Sprint 2 — Feed snapshotter + policy bundling

**Tasks**

* Build the feed bundler job:
  * pull raw sources
  * normalize layout
  * generate `feed_bundle_manifest.json`
  * tar.zst + sha256
  * upload + record in `feed_snapshots`
* Update the scan pipeline:
  * resolve the feed bundle digest at scan start
  * record the digest in the scan manifest
* Bundle policy/lattice:
  * compile rules into an immutable artifact
  * record the policy bundle digest in the manifest

**Exit criteria**

* Scans reference immutable feed + policy digests
* You can fetch a feed bundle by digest and reproduce the same feed inputs

---

### Sprint 3 — Replay executor + "no network" sandbox

**Tasks**

* Create the replay container image / runtime wrapper
* Implement `stella replay --from MANIFEST.json`:
  * pulls the scanner image by digest
  * mounts the feed bundle + policy bundle
  * runs in network-off mode
  * applies tz/locale + clock mode
* Store replay outputs as artifacts (`replay_scan_id` or `replay_id` linkage)

**Exit criteria**

* Replay runs end-to-end for canary scans
* Deterministic runtime controls verified (no DNS egress, fixed tz)

---

### Sprint 4 — Diff engine + mismatch classification

**Tasks**

* Implement BF compare (canonical hashes)
* Implement SF compare (semantic JSON/object comparison)
* Implement PF compare (policy decision equivalence)
* Implement mismatch classification rules:
  * if the feed digest differs → feed drift
  * if the scanner digest differs → scanner drift
  * if the environment differs → runtime drift
  * else → nondeterminism (with sub-tags for ordering/time/RNG)
* Generate `diff_summary_json`:
  * top N changed CVEs
  * packages added/removed
  * policy verdict changes

**Exit criteria**

* Every failed replay has a cause tag and a diff summary that is useful in <2 minutes
* Engineers can reproduce failures locally with the manifest

---

### Sprint 5 — Dashboard + alerts + CI gate

**Tasks**

* Expose Prometheus metrics from the replay service
* Build the dashboard:
  * BF/SF/PF trends
  * breakdown by ecosystem/scanner/policy
  * mismatch cause histogram
* Add alerting rules (drop threshold, bucket regression)
* Add a CI gate mode:
  * "run replays on the canary set for this release candidate"
  * block merge if BF < target

**Exit criteria**

* Fidelity visible to leadership and engineering
* Release process is protected by canary replays
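To make the Sprint 5 metrics-and-gate tasks concrete, here is a minimal sketch using the `prometheus_client` Python library; only the BF gauge is shown (SF/PF are analogous), and the port, label values, and threshold are placeholders tied to the pass/fail rules at the top of this note:

```python
from prometheus_client import Gauge, start_http_server

# Gauge name + labels follow the Workstream D deliverables; SF/PF gauges would be analogous.
REPLAY_FIDELITY_BF = Gauge(
    "replay_fidelity_bf",
    "Bitwise replay fidelity over the reporting window",
    ["scanner", "ecosystem", "policy_set", "base_os"],
)

def publish_fidelity(bucket: dict, bf: float) -> None:
    """Record the windowed BF value for one breakdown bucket."""
    REPLAY_FIDELITY_BF.labels(**bucket).set(bf)

def ci_gate(canary_bf: float, threshold: float = 0.98) -> None:
    """Sprint 5 CI gate: block the release candidate if canary-set BF misses the target."""
    if canary_bf < threshold:
        raise SystemExit(f"replay fidelity gate failed: BF={canary_bf:.3f} < {threshold}")

if __name__ == "__main__":
    start_http_server(9300)  # hypothetical port; serves /metrics for Prometheus to scrape
    publish_fidelity({"scanner": "stella", "ecosystem": "npm",
                      "policy_set": "prod-default", "base_os": "debian"}, 0.992)
    ci_gate(0.992)  # passes; a long-running replay service would keep serving /metrics
```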
---

### Sprint 6 — Hardening + compliance polish

**Tasks**

* Backward compatible manifest upgrades:
  * `manifest_version` bump rules
  * migration support
* Artifact signing / integrity:
  * sign the manifest hash
  * optional transparency log later
* Storage & retention policies (cost controls)
* Runbook + oncall playbook

**Exit criteria**

* Audit story is complete: "show me exactly how scan X was produced"
* Operational load is manageable and cost-bounded

---

## 5) Engineering specs you can start implementing immediately

### 5.1 `ScanManifest v1` skeleton (example)

```json
{
  "manifest_version": "1.0",
  "scan_id": "scan_123",
  "created_at": "2025-12-12T10:15:30Z",
  "input": {
    "type": "oci_image",
    "image_ref": "registry/app@sha256:...",
    "layers": ["sha256:...", "sha256:..."],
    "source_provenance": {"repo_sha": "abc123", "build_id": "ci-999"}
  },
  "scanner": {
    "engine": "stella",
    "scanner_image_digest": "sha256:...",
    "scanner_version": "2025.12.0",
    "config_digest": "sha256:...",
    "flags": ["--deep", "--vex"]
  },
  "feeds": {
    "vuln_feed_bundle_digest": "sha256:...",
    "license_db_digest": "sha256:..."
  },
  "policy": {
    "policy_bundle_digest": "sha256:...",
    "policy_set": "prod-default"
  },
  "environment": {
    "arch": "amd64",
    "os": "linux",
    "tz": "UTC",
    "locale": "C",
    "network": "disabled",
    "clock_mode": "frozen",
    "clock_value": "2025-12-12T10:15:30Z"
  },
  "normalization": {
    "canonicalizer_version": "1.2.0",
    "sbom_schema": "cyclonedx-1.6",
    "vex_schema": "cyclonedx-vex-1.0"
  }
}
```

---

### 5.2 CLI spec (minimal)

* `stella scan ... --emit-manifest MANIFEST.json --emit-artifacts-dir out/`
* `stella replay --from MANIFEST.json --out-dir replay_out/`
* `stella diff --a out/ --b replay_out/ --mode bf|sf|pf --json`

---

## 6) Testing strategy (to prevent determinism regressions)

### Unit tests

* Canonicalization: same object → same bytes
* Sorting stability: randomized input order → stable output
* Hash determinism

### Integration tests

* Golden canaries:
  * run the scan twice in the same runner → BF match
  * replay from the manifest → BF match
* "Network leak" test:
  * DNS requests must be zero
* "Clock leak" test:
  * freeze time; ensure outputs do not include real timestamps

### Chaos tests

* Vary CPU count, run concurrency, run order → still BF match
* Randomized scheduling / thread interleavings to find races

---

## 7) Operational policies (so it stays useful)

### Retention & cost controls

* Keep full artifacts for regulated scans (e.g., 1–7 years)
* For non-regulated scans:
  * keep manifests + canonical hashes long-term
  * expire heavy evidence blobs after N days
* Compress large artifacts and dedupe by digest

### Alerting examples

* BF drops by ≥2% week-over-week (any major bucket) → warn
* BF < 0.90 overall or regulated BF < 0.95 → page / block release

### Triage workflow

* A failed replay auto-creates a ticket with:
  * manifest link
  * mismatch_reason
  * diff_summary
  * reproduction command

---

## 8) What "done" looks like (definition of success)

* Any customer/auditor can pick a scan from 6 months ago and you can:
  1. retrieve the manifest + feed bundle + policy bundle by digest
  2. replay in a pinned sandbox
  3. show BF/SF/PF results and diffs
* Engineering sees drift quickly and can attribute it to feed vs scanner vs runtime.

---

If you want, I can also provide:

* a **Postgres DDL** for the tables above,
* a **Prometheus metrics contract** (names + labels + example queries),
* and a **diff_summary_json schema** that supports a UI "diff view" without reprocessing artifacts.