Here’s a simple, practical idea to make your scans provably repeatable over time and catch drift fast.
Replay Fidelity (what, why, how)
What it is: the share of historical scans that reproduce bit‑for‑bit when re‑run using their saved manifests (inputs, versions, rules, seeds). Higher = more deterministic system.
Why you want it: it exposes hidden nondeterminism (feed drift, time‑dependent rules, race conditions, unstable dependency resolution) and proves auditability for customers/compliance.
The metric
- Per‑scan: `replay_match = 1` if SBOM/VEX/findings + hashes are identical; else `0`.
- Windowed: Replay Fidelity = (Σ replay_match) / (# historical replays in window).
- Breakdown: also track by scanner, language, image base, feed version, and environment.
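For concreteness, here is a minimal sketch (Python, illustrative field names, not Stella Ops code) of computing the windowed metric and its per-bucket breakdown from stored replay results:

```python
from collections import defaultdict

def replay_fidelity(replays):
    """Windowed fidelity plus per-bucket breakdown from replay-result rows.

    Each row is assumed to carry a boolean `match` and breakdown keys such as
    `scanner` and `ecosystem` (illustrative names).
    """
    if not replays:
        return 0.0, {}
    overall = sum(r["match"] for r in replays) / len(replays)

    buckets = defaultdict(list)
    for r in replays:
        buckets[(r["scanner"], r["ecosystem"])].append(r["match"])
    per_bucket = {k: sum(v) / len(v) for k, v in buckets.items()}
    return overall, per_bucket

# Example: one npm replay failed to reproduce.
overall, per_bucket = replay_fidelity([
    {"match": True,  "scanner": "stella", "ecosystem": "pip"},
    {"match": True,  "scanner": "stella", "ecosystem": "npm"},
    {"match": False, "scanner": "stella", "ecosystem": "npm"},
])
print(overall)     # 0.666...
print(per_bucket)  # {('stella', 'pip'): 1.0, ('stella', 'npm'): 0.5}
```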
What must be captured in the scan manifest
- Exact source refs (image digest / repo SHA), container layers’ digests
- Scanner build ID + config (flags, rules, lattice/policy sets, seeds)
- Feed snapshots (CVE DB, OVAL, vendor advisories) as content‑addressed bundles
- Normalization/version of SBOM schema (e.g., CycloneDX 1.6 vs SPDX 3.0.1)
- Platform facts (OS/kernel, tz, locale), toolchain versions, clock policy
Pass/Fail rules you can ship
- Green: Fidelity ≥ 0.98 over 30 days, and no bucket < 0.95
- Warn: Any bucket drops by ≥ 2% week‑over‑week
- Fail the pipeline: If fidelity < 0.90 or any regulated project < 0.95
Minimal replay harness (outline)
- Pick N historical scans (e.g., last 200 or stratified by image language).
- Restore their frozen manifest (scanner binary, feed bundle, policy lattice, seeds).
- Re‑run in a pinned runtime (OCI digest, pinned kernel in VM, fixed TZ/locale).
- Compare artifacts: SBOM JSON, VEX JSON, findings list, evidence blobs → SHA‑256.
- Emit: pass/fail, diff summary, and the “cause” tag if mismatch (feed, policy, runtime, code).
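To make the harness concrete, here is a minimal Python sketch of the comparison step; the artifact filenames and manifest fields are illustrative assumptions, not Stella Ops contracts:

```python
import hashlib
import pathlib

ARTIFACTS = ("sbom.json", "vex.json", "findings.json")  # illustrative filenames

def sha256_of(path: pathlib.Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def compare_replay(original_dir: str, replay_dir: str,
                   original_manifest: dict, replay_manifest: dict) -> dict:
    """Compare historical vs replayed artifacts and tag a coarse mismatch cause."""
    mismatched = [
        name for name in ARTIFACTS
        if sha256_of(pathlib.Path(original_dir, name)) != sha256_of(pathlib.Path(replay_dir, name))
    ]
    if not mismatched:
        cause = None
    elif original_manifest.get("feeds") != replay_manifest.get("feeds"):
        cause = "feed"
    elif original_manifest.get("environment") != replay_manifest.get("environment"):
        cause = "runtime"
    else:
        cause = "policy_or_code"  # real logic would split these further
    return {"pass": not mismatched, "mismatched_artifacts": mismatched, "cause": cause}

# Example:
# result = compare_replay("scans/123", "replays/123", orig_manifest, replay_manifest)
```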
Dashboard (what to show)
- Fidelity % (30/90‑day) + sparkline
- Top offenders (by language/scanner/policy set)
- “Cause of mismatch” histogram (feed vs runtime vs code vs policy)
- Click‑through: deterministic diff (e.g., which CVEs flipped and why)
Quick wins for Stella Ops
- Treat feeds as immutable snapshots (content‑addressed tar.zst) and record their digest in each scan.
- Run scanner in a repro shell (OCI image digest + fixed TZ/locale + no network).
- Normalize SBOM/VEX (key order, whitespace, float precision) before hashing.
- Add a `stella replay --from MANIFEST.json` command + nightly cron to sample replays.
- Store `replay_result` rows; expose `/metrics` for Prometheus and a CI badge: `Replay Fidelity: 99.2%`.
Want me to draft the stella replay CLI spec and the DB table (DDL) you can drop into Postgres?
Below is an extended “Replay Fidelity” design plus a concrete development implementation plan you can hand to engineering. I’m assuming Stella Ops is doing container/app security scans that output SBOM + findings (and optionally VEX), and uses vulnerability “feeds” and policy/lattice/rules.
1) Extend the concept: Replay Fidelity as a product capability
1.1 Fidelity levels (so you can be strict without being brittle)
Instead of a single yes/no, define tiers that you can report and gate on:
- Bitwise Fidelity (BF)
  - Definition: All primary artifacts (SBOM, findings, VEX, evidence) match byte-for-byte after canonicalization.
  - Use: strongest auditability, catch ordering/nondeterminism.
- Semantic Fidelity (SF)
  - Definition: The meaning matches even if formatting differs (e.g., key order, whitespace, timestamps).
  - How: compare normalized objects: same packages, versions, CVEs, fix versions, severities, policy verdicts.
  - Use: protects you from “cosmetic diffs” and helps triage.
- Policy Fidelity (PF)
  - Definition: Final policy decision (pass/fail + reason codes) matches.
  - Use: useful when outputs may evolve but governance outcome must remain stable.
Recommended reporting:
- Dashboard shows BF, SF, PF together.
- Default engineering SLO: BF ≥ 0.98; compliance SLO: BF ≥ 0.95 for regulated projects; PF should be ~1.0 unless policy changed intentionally.
1.2 “Why did it drift?”—Mismatch classification taxonomy
When a replay fails, auto-tag the cause so humans don’t diff JSON by hand.
Primary mismatch classes
- Feed drift: CVE/OVAL/vendor advisory snapshot differs.
- Policy drift: policy/lattice/rules differ (or default rule set changed).
- Runtime drift: base image / libc / kernel / locale / tz / CPU arch differences.
- Scanner drift: scanner binary build differs or dependency versions changed.
- Nondeterminism: ordering instability, concurrency race, unseeded RNG, time-based logic.
- External IO: network calls, “latest” resolution, remote package registry changes.
Output: a mismatch_reason plus a short diff_summary.
1.3 Deterministic “scan envelope” design
A replay only works if the scan is fully specified.
Scan envelope components
- Inputs: image digest, repo commit SHA, build provenance, layers digests.
- Scanner: scanner OCI image digest (or binary digest), config flags, feature toggles.
- Feeds: content-addressed feed bundle digests (see §2.3).
- Policy/rules: git commit SHA + content digest of compiled rules.
- Environment: OS/arch, tz/locale, “clock mode”, network mode, CPU count.
- Normalization: “canonicalization version” for SBOM/VEX/findings.
1.4 Canonicalization so “bitwise” is meaningful
To make BF achievable:
- Canonical JSON serialization (sorted keys, stable array ordering, normalized floats)
- Strip/normalize volatile fields (timestamps, “scan_duration_ms”, hostnames)
- Stable ordering for lists: packages sorted by `(purl, version)`, vulnerabilities by `(cve_id, affected_purl)`
- Deterministic IDs: if you generate internal IDs, derive from stable hashes of content (not UUID4)
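As an illustration of these rules, a minimal canonicalization-and-hash sketch in Python (the volatile-field list and sort keys are assumptions; the real canonicalizer would be versioned as described above):

```python
import hashlib
import json

VOLATILE_KEYS = {"timestamp", "scan_duration_ms", "hostname"}  # assumed volatile fields

def canonicalize(obj):
    """Drop volatile fields and give lists of records a stable order."""
    if isinstance(obj, dict):
        return {k: canonicalize(v) for k, v in obj.items() if k not in VOLATILE_KEYS}
    if isinstance(obj, list):
        items = [canonicalize(v) for v in obj]
        if items and all(isinstance(v, dict) for v in items):
            # e.g. packages by (purl, version), vulnerabilities by (cve_id, affected_purl)
            return sorted(items, key=lambda d: (str(d.get("purl", "")), str(d.get("version", "")),
                                                str(d.get("cve_id", "")), str(d.get("affected_purl", ""))))
        return items
    return obj

def canonical_sha256(obj) -> str:
    """Canonical JSON: sorted keys, no insignificant whitespace, UTF-8 bytes."""
    blob = json.dumps(canonicalize(obj), sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```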
1.5 Sampling strategy
You don’t need to replay everything.
Nightly sample: stratified by:
- language ecosystem (npm, pip, maven, go, rust…)
- scanner engine
- base OS
- “regulatory tier”
- image size/complexity
Plus: always replay “golden canaries” (a fixed set of reference images) after every scanner release and feed ingestion pipeline change.
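A minimal sketch of the stratified nightly sampler (Python, illustrative field names); the sampler itself is seeded so the sample selection is reproducible:

```python
import random
from collections import defaultdict

def stratified_sample(scans, per_stratum=5, seed=2025):
    """Pick up to `per_stratum` scans from each (ecosystem, scanner, base_os, tier) bucket."""
    rng = random.Random(seed)  # seeded so the nightly sample is itself reproducible
    strata = defaultdict(list)
    for s in scans:
        strata[(s["ecosystem"], s["scanner"], s["base_os"], s["tier"])].append(s)
    picked = []
    for bucket in strata.values():
        rng.shuffle(bucket)
        picked.extend(bucket[:per_stratum])
    return picked
```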
2) Technical architecture blueprint
2.1 System components
- Manifest Writer (in the scan pipeline)
  - Produces `ScanManifest v1` JSON
  - Records all digests and versions
- Artifact Store
  - Stores SBOM, findings, VEX, evidence blobs
  - Stores canonical hashes for BF checks
- Feed Snapshotter
  - Periodically builds immutable feed bundles
  - Content-addressed (digest-keyed)
  - Stores metadata (source URLs, generation timestamp, signature)
- Replay Orchestrator
  - Chooses historical scans to replay
  - Launches “replay executor” jobs
- Replay Executor
  - Runs scanner in pinned container image
  - Network off, tz fixed, clock policy applied
  - Produces new artifacts + hashes
- Diff & Scoring Engine
  - Computes BF/SF/PF
  - Generates mismatch classification + diff summary
- Metrics + UI Dashboard
  - Prometheus metrics
  - UI for drill-down diffs
2.2 Data model (Postgres-friendly)
Core tables
- `scan_manifests`: `scan_id` (pk), `manifest_json`, `manifest_sha256`, `created_at`
- `scan_artifacts`: `scan_id` (fk), `artifact_type` (sbom|findings|vex|evidence), `artifact_uri`, `canonical_sha256`, `schema_version`
- `feed_snapshots`: `feed_digest` (pk), `bundle_uri`, `sources_json`, `generated_at`, `signature`
- `replay_runs`: `replay_id` (pk), `original_scan_id` (fk), `status` (queued|running|passed|failed), `bf_match` bool, `sf_match` bool, `pf_match` bool, `mismatch_reason`, `diff_summary_json`, `started_at`, `finished_at`, `executor_env_json` (arch, tz, cpu, image digest)
Indexes
- `(created_at)` for sampling windows
- `(mismatch_reason, finished_at)` for triage
- `(scanner_version, ecosystem)` for breakdown dashboards
2.3 Feed Snapshotting (the key to long-term replay)
Feed bundle format
- `feeds/<source>/<date>/...` inside a tar.zst
- Manifest file inside bundle: `feed_bundle_manifest.json` containing:
  - source URLs
  - retrieval commit/etag (if any)
  - file hashes
  - generated_by version
Content addressing
- Digest of the entire bundle (`sha256(tar.zst)`) is the reference.
- Scans record only the digest + URI.
Immutability
- Store bundles in object storage with WORM / retention if you need compliance.
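Putting §2.3 together, a minimal Python sketch of building a content-addressed bundle (assumes the third-party `zstandard` package; for byte-identical rebuilds you would also normalize tar metadata such as mtimes and owners):

```python
import hashlib
import io
import tarfile

import zstandard  # third-party package, assumed available for .zst compression

def build_feed_bundle(src_dir: str, out_path: str) -> str:
    """Tar a feed directory, compress with zstd, return the bundle's sha256 digest."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as tar:
        tar.add(src_dir, arcname="feeds")
    compressed = zstandard.ZstdCompressor().compress(raw.getvalue())
    with open(out_path, "wb") as f:
        f.write(compressed)
    # This digest is what scan manifests record; the bundle itself stays immutable.
    return hashlib.sha256(compressed).hexdigest()
```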
2.4 Replay execution sandbox
For determinism, enforce:
- No network (K8s NetworkPolicy, firewall rules, or container runtime flags)
- Fixed TZ/locale
- Pinned container image digest
- Clock policy
  - Either “real time but recorded” or “frozen time at original scan timestamp”
  - If scanner logic uses current date for severity windows, freeze time
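A minimal sketch of launching such a sandboxed replay via Docker from Python; the image name, entrypoint, and mount paths are illustrative, and a frozen-clock mode (e.g., via libfaketime) is left out:

```python
import subprocess

def run_replay(scanner_digest: str, feed_bundle_dir: str, policy_bundle_dir: str, out_dir: str) -> None:
    """Run the pinned scanner image with no network and a fixed tz/locale."""
    cmd = [
        "docker", "run", "--rm",
        "--network", "none",                    # no egress during replay
        "-e", "TZ=UTC", "-e", "LC_ALL=C",       # fixed tz/locale
        "-v", f"{feed_bundle_dir}:/feeds:ro",
        "-v", f"{policy_bundle_dir}:/policy:ro",
        "-v", f"{out_dir}:/out",
        f"registry.example.com/scanner@{scanner_digest}",  # pinned by digest, never by tag
        "stella-scan", "--feeds", "/feeds", "--policy", "/policy", "--out", "/out",
    ]
    subprocess.run(cmd, check=True)
```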
3) Development implementation plan
I’ll lay this out as workstreams plus a sprint-by-sprint plan. You can compress or expand it depending on team size.
Workstream A — Scan Manifest & Canonical Artifacts
Goal: every scan is replayable on paper, even before replays run.
Deliverables
- `ScanManifest v1` schema + writer integrated into scan pipeline
- Canonicalization library + canonical hashing for all artifacts
Acceptance criteria
- Every scan stores: input digests, scanner digest, policy digest, feed digest placeholders
- Artifact hashes are stable across repeated runs in the same environment
Workstream B — Feed Snapshotting & Policy Versioning
Goal: eliminate “feed drift” by pinning immutable inputs.
Deliverables
- Feed bundle builder + signer + uploader
- Policy/rules bundler (compiled rules bundle, digest recorded)
Acceptance criteria
- New scans reference feed bundle digests (not “latest”)
- A scan can be re-run with the same feed bundle and policy bundle
Workstream C — Replay Runner & Diff Engine
Goal: execute historical scans and score BF/SF/PF with actionable diffs.
Deliverables
- `stella replay --from manifest.json`
- Orchestrator job to schedule replays
- Diff engine + mismatch classifier
- Storage of replay results
Acceptance criteria
- Replay produces deterministic artifacts in a pinned environment
- Dashboard/CLI shows BF/SF/PF + diff summary for failures
Workstream D — Observability, Dashboard, and CI Gates
Goal: make fidelity visible and enforceable.
Deliverables
- Prometheus metrics: `replay_fidelity_bf`, `replay_fidelity_sf`, `replay_fidelity_pf` (see the exporter sketch after this list)
- Breakdown labels (scanner, ecosystem, policy_set, base_os)
- Alerts for drop thresholds
- CI gate option: “block release if BF < threshold on canary set”
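A minimal exporter sketch using the Python `prometheus_client` library (metric names match the deliverable above; the port and label values are illustrative):

```python
import time
from prometheus_client import Gauge, start_http_server

LABELS = ["scanner", "ecosystem", "policy_set", "base_os"]

replay_fidelity_bf = Gauge("replay_fidelity_bf", "Bitwise replay fidelity (0-1)", LABELS)
replay_fidelity_sf = Gauge("replay_fidelity_sf", "Semantic replay fidelity (0-1)", LABELS)
replay_fidelity_pf = Gauge("replay_fidelity_pf", "Policy replay fidelity (0-1)", LABELS)

def publish(bucket: dict, bf: float, sf: float, pf: float) -> None:
    replay_fidelity_bf.labels(**bucket).set(bf)
    replay_fidelity_sf.labels(**bucket).set(sf)
    replay_fidelity_pf.labels(**bucket).set(pf)

if __name__ == "__main__":
    start_http_server(9105)  # arbitrary port; Prometheus scrapes /metrics here
    publish({"scanner": "stella", "ecosystem": "npm",
             "policy_set": "prod-default", "base_os": "debian-12"},
            bf=0.99, sf=1.0, pf=1.0)
    while True:
        time.sleep(60)  # keep the exporter process alive
```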
Acceptance criteria
- Engineering can see drift within 24h
- Releases are blocked when fidelity regressions occur
4) Suggested sprint plan with concrete tasks
Sprint 0 — Design lock + baseline
Tasks
- Define manifest schema: `ScanManifest v1` fields + versioning rules
- Decide canonicalization rules (what is normalized vs preserved)
- Choose initial “golden canary” scan set (10–20 representative targets)
- Add “replay-fidelity” epic with ownership & SLIs/SLOs
Exit criteria
- Approved schema + canonicalization spec
- Canary set stored and tagged
Sprint 1 — Manifest writer + artifact hashing (MVP)
Tasks
- Implement manifest writer in scan pipeline
- Store `manifest_json` + `manifest_sha256`
- Implement canonicalization + hashing for:
  - findings list (sorted)
  - SBOM (normalized)
  - VEX (if present)
- Persist canonical hashes in `scan_artifacts`
Exit criteria
- Two identical scans in the same environment yield identical artifact hashes
- A “manifest export” endpoint/CLI works: `stella scan --emit-manifest out.json`
Sprint 2 — Feed snapshotter + policy bundling
Tasks
- Build feed bundler job:
  - pull raw sources
  - normalize layout
  - generate `feed_bundle_manifest.json`
  - tar.zst + sha256
  - upload + record in `feed_snapshots`
- Update scan pipeline:
  - resolve feed bundle digest at scan start
  - record digest in scan manifest
- Bundle policy/lattice:
  - compile rules into an immutable artifact
  - record policy bundle digest in manifest
Exit criteria
- Scans reference immutable feed + policy digests
- You can fetch feed bundle by digest and reproduce the same feed inputs
Sprint 3 — Replay executor + “no network” sandbox
Tasks
- Create replay container image / runtime wrapper
- Implement `stella replay --from MANIFEST.json`:
  - pulls scanner image by digest
  - mounts feed bundle + policy bundle
  - runs in network-off mode
  - applies tz/locale + clock mode
- Store replay outputs as artifacts (`replay_scan_id` or `replay_id` linkage)
Exit criteria
- Replay runs end-to-end for canary scans
- Deterministic runtime controls verified (no DNS egress, fixed tz)
Sprint 4 — Diff engine + mismatch classification
Tasks
- Implement BF compare (canonical hashes)
- Implement SF compare (semantic JSON/object comparison)
- Implement PF compare (policy decision equivalence)
- Implement mismatch classification rules (see the sketch after this list):
  - if feed digest differs → feed drift
  - if scanner digest differs → scanner drift
  - if environment differs → runtime drift
  - else → nondeterminism (with sub-tags for ordering/time/RNG)
- Generate `diff_summary_json`:
  - top N changed CVEs
  - packages added/removed
  - policy verdict changes
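A minimal sketch of the classifier implementing the rules above (Python; assumes both manifests follow the §5.1 shape):

```python
def classify_mismatch(original: dict, replay: dict) -> str:
    """Tag the cause of a failed replay, following the rules listed above.

    Both arguments are scan manifests shaped like the §5.1 skeleton.
    """
    if original.get("feeds") != replay.get("feeds"):
        return "feed_drift"
    if original.get("scanner", {}).get("scanner_image_digest") != \
            replay.get("scanner", {}).get("scanner_image_digest"):
        return "scanner_drift"
    if original.get("environment") != replay.get("environment"):
        return "runtime_drift"
    # Inputs were pinned and identical, so the diff must come from nondeterminism;
    # sub-tags (ordering/time/RNG) need artifact-level inspection.
    return "nondeterminism"
```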
Exit criteria
- Every failed replay has a cause tag and a diff summary that’s useful in <2 minutes
- Engineers can reproduce failures locally with the manifest
Sprint 5 — Dashboard + alerts + CI gate
Tasks
- Expose Prometheus metrics from replay service
- Build dashboard:
  - BF/SF/PF trends
  - breakdown by ecosystem/scanner/policy
  - mismatch cause histogram
- Add alerting rules (drop threshold, bucket regression)
- Add CI gate mode:
  - “run replays on canary set for this release candidate”
  - block merge if BF < target
Exit criteria
- Fidelity visible to leadership and engineering
- Release process is protected by canary replays
Sprint 6 — Hardening + compliance polish
Tasks
- Backward compatible manifest upgrades:
  - `manifest_version` bump rules
  - migration support
- Artifact signing / integrity:
  - sign manifest hash
  - optional transparency log later
- Storage & retention policies (cost controls)
- Runbook + oncall playbook
Exit criteria
- Audit story is complete: “show me exactly how scan X was produced”
- Operational load is manageable and cost-bounded
5) Engineering specs you can start implementing immediately
5.1 ScanManifest v1 skeleton (example)
{
"manifest_version": "1.0",
"scan_id": "scan_123",
"created_at": "2025-12-12T10:15:30Z",
"input": {
"type": "oci_image",
"image_ref": "registry/app@sha256:...",
"layers": ["sha256:...", "sha256:..."],
"source_provenance": {"repo_sha": "abc123", "build_id": "ci-999"}
},
"scanner": {
"engine": "stella",
"scanner_image_digest": "sha256:...",
"scanner_version": "2025.12.0",
"config_digest": "sha256:...",
"flags": ["--deep", "--vex"]
},
"feeds": {
"vuln_feed_bundle_digest": "sha256:...",
"license_db_digest": "sha256:..."
},
"policy": {
"policy_bundle_digest": "sha256:...",
"policy_set": "prod-default"
},
"environment": {
"arch": "amd64",
"os": "linux",
"tz": "UTC",
"locale": "C",
"network": "disabled",
"clock_mode": "frozen",
"clock_value": "2025-12-12T10:15:30Z"
},
"normalization": {
"canonicalizer_version": "1.2.0",
"sbom_schema": "cyclonedx-1.6",
"vex_schema": "cyclonedx-vex-1.0"
}
}
5.2 CLI spec (minimal)
- `stella scan ... --emit-manifest MANIFEST.json --emit-artifacts-dir out/`
- `stella replay --from MANIFEST.json --out-dir replay_out/`
- `stella diff --a out/ --b replay_out/ --mode bf|sf|pf --json`
6) Testing strategy (to prevent determinism regressions)
Unit tests
- Canonicalization: same object → same bytes
- Sorting stability: randomized input order → stable output
- Hash determinism
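A minimal pytest-style sketch of the determinism unit tests (the inline `canonical_bytes` helper is a stand-in for the real canonicalization library):

```python
import json
import random

def canonical_bytes(obj) -> bytes:
    """Stand-in for the real canonicalizer: sorted keys, no extra whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")

def test_same_object_same_bytes():
    finding = {"purl": "pkg:npm/lodash@4.17.21", "cves": ["CVE-2021-23337"]}
    roundtripped = json.loads(json.dumps(finding))
    assert canonical_bytes(finding) == canonical_bytes(roundtripped)

def test_randomized_input_order_is_stable():
    pkgs = [{"purl": f"pkg:pypi/p{i}", "version": "1.0"} for i in range(20)]
    shuffled = pkgs[:]
    random.shuffle(shuffled)
    sort_key = lambda p: (p["purl"], p["version"])
    assert sorted(pkgs, key=sort_key) == sorted(shuffled, key=sort_key)
```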
Integration tests
- Golden canaries:
  - run scan twice in same runner → BF match
  - replay from manifest → BF match
- “Network leak” test:
  - DNS requests must be zero
- “Clock leak” test:
  - freeze time; ensure outputs do not include real timestamps
Chaos tests
- Vary CPU count, run concurrency, run order → still BF match
- Randomized scheduling / thread interleavings to find races
7) Operational policies (so it stays useful)
Retention & cost controls
- Keep full artifacts for regulated scans (e.g., 1–7 years)
- For non-regulated:
  - keep manifests + canonical hashes long-term
  - expire heavy evidence blobs after N days
- Compress large artifacts and dedupe by digest
Alerting examples
- BF drops by ≥2% week-over-week (any major bucket) → warn
- BF < 0.90 overall or regulated BF < 0.95 → page / block release
Triage workflow
- Failed replay auto-creates a ticket with:
- manifest link
- mismatch_reason
- diff_summary
- reproduction command
8) What “done” looks like (definition of success)
- Any customer/auditor can pick a scan from 6 months ago and you can:
  - retrieve manifest + feed bundle + policy bundle by digest
  - replay in a pinned sandbox
  - show BF/SF/PF results and diffs
- Engineering sees drift quickly and can attribute it to feed vs scanner vs runtime.
If you want, I can also provide:
- a Postgres DDL for the tables above,
- a Prometheus metrics contract (names + labels + example queries),
- and a diff_summary_json schema that supports a UI “diff view” without reprocessing artifacts.