Replay Fidelity as a Proof Metric

Here's a simple, practical idea to make your scans provably repeatable over time and catch drift fast.

Replay Fidelity (what, why, how)

What it is: the share of historical scans that reproduce bit-for-bit when rerun using their saved manifests (inputs, versions, rules, seeds). Higher = more deterministic system.

Why you want it: it exposes hidden nondeterminism (feed drift, time-dependent rules, race conditions, unstable dependency resolution) and proves auditability for customers/compliance.


The metric

  • Per-scan: replay_match = 1 if the SBOM/VEX/findings and their hashes are identical; otherwise 0.
  • Windowed: Replay Fidelity = (Σ replay_match) / (# historical replays in window); see the sketch below.
  • Breakdown: also track by scanner, language, image base, feed version, and environment.
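
A minimal sketch of the metric computation, assuming each replay outcome is a small record with a replay_match flag and a couple of breakdown labels (field names here are illustrative, not an existing StellaOps schema):

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class ReplayResult:
    scan_id: str
    replay_match: bool          # True if all artifact hashes were identical
    scanner: str                # breakdown labels (illustrative)
    ecosystem: str

def replay_fidelity(results):
    """Windowed fidelity: sum(replay_match) / number of replays in the window."""
    if not results:
        return None             # no replays in the window -> metric undefined
    return sum(r.replay_match for r in results) / len(results)

def fidelity_by(results, key):
    """Per-bucket fidelity, e.g. key=lambda r: r.scanner."""
    buckets = defaultdict(list)
    for r in results:
        buckets[key(r)].append(r)
    return {k: replay_fidelity(v) for k, v in buckets.items()}

# Example: overall and per-scanner fidelity for one window of replays.
window = [
    ReplayResult("s1", True, "stella-core", "npm"),
    ReplayResult("s2", True, "stella-core", "pip"),
    ReplayResult("s3", False, "stella-java", "maven"),
]
print(replay_fidelity(window))                      # 0.666...
print(fidelity_by(window, lambda r: r.scanner))     # {'stella-core': 1.0, 'stella-java': 0.0}
```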

What must be captured in the scan manifest

  • Exact source refs (image digest / repo SHA) and container layer digests
  • Scanner build ID + config (flags, rules, lattice/policy sets, seeds)
  • Feed snapshots (CVE DB, OVAL, vendor advisories) as content-addressed bundles
  • Normalization/version of SBOM schema (e.g., CycloneDX 1.6 vs SPDX 3.0.1)
  • Platform facts (OS/kernel, tz, locale), toolchain versions, clock policy

Pass/Fail rules you can ship

  • Green: Fidelity ≥ 0.98 over 30 days, and no bucket < 0.95
  • Warn: Any bucket drops by ≥ 2% week-over-week
  • Fail the pipeline: If fidelity < 0.90 or any regulated project < 0.95

Minimal replay harness (outline)

  1. Pick N historical scans (e.g., the last 200, or a set stratified by image and language).
  2. Restore their frozen manifest (scanner binary, feed bundle, policy lattice, seeds).
  3. Rerun in a pinned runtime (OCI digest, pinned kernel in VM, fixed TZ/locale).
  4. Compare artifacts: SBOM JSON, VEX JSON, findings list, evidence blobs → SHA256.
  5. Emit: pass/fail, diff summary, and the “cause” tag if mismatch (feed, policy, runtime, code).
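
A minimal sketch of this harness loop, with the restore/rerun/classify steps left as injected placeholders (only the SHA-256 comparison in step 4 is concrete):

```python
import hashlib

def sha256_bytes(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def compare_artifacts(original: dict, replayed: dict) -> dict:
    """Compare artifact-name -> bytes maps; return pass/fail plus a short diff summary."""
    mismatched = [
        name for name in original
        if sha256_bytes(original[name]) != sha256_bytes(replayed.get(name, b""))
    ]
    return {"match": not mismatched, "mismatched_artifacts": mismatched}

def replay_one(manifest: dict, restore, rerun, classify_cause) -> dict:
    """One harness iteration: restore frozen inputs, rerun, compare, tag the cause.

    `restore`, `rerun`, and `classify_cause` are injected callables standing in for
    whatever StellaOps already has; only the comparison logic is shown here.
    """
    original = restore(manifest)            # step 2: frozen scanner/feeds/policy + stored artifacts
    replayed = rerun(manifest)              # step 3: pinned runtime, no network, fixed TZ/locale
    result = compare_artifacts(original, replayed)           # step 4: SHA-256 comparison
    if not result["match"]:
        result["cause"] = classify_cause(manifest, result)   # step 5: feed/policy/runtime/code tag
    return result
```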

Dashboard (what to show)

  • Fidelity % (30/90-day) + sparkline
  • Top offenders (by language/scanner/policy set)
  • “Cause of mismatch” histogram (feed vs runtime vs code vs policy)
  • Click-through: deterministic diff (e.g., which CVEs flipped and why)

Quick wins for StellaOps

  • Treat feeds as immutable snapshots (content-addressed tar.zst) and record their digest in each scan.
  • Run scanner in a repro shell (OCI image digest + fixed TZ/locale + no network).
  • Normalize SBOM/VEX (key order, whitespace, float precision) before hashing.
  • Add a stella replay --from MANIFEST.json command + nightly cron to sample replays.
  • Store replay_result rows; expose /metrics for Prometheus and a CI badge: Replay Fidelity: 99.2% (see the sketch below).
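
A sketch of the /metrics side, assuming the third-party prometheus-client package; the metric name and labels are suggestions, not an existing contract:

```python
import time

from prometheus_client import Gauge, start_http_server  # third-party: pip install prometheus-client

REPLAY_FIDELITY = Gauge(
    "replay_fidelity",
    "Share of sampled historical scans that replay bit-for-bit",
    ["window", "scanner", "ecosystem"],   # breakdown labels are suggestions only
)

def publish(fidelity_by_bucket: dict) -> None:
    """fidelity_by_bucket maps (window, scanner, ecosystem) -> a value in [0, 1]."""
    for (window, scanner, ecosystem), value in fidelity_by_bucket.items():
        REPLAY_FIDELITY.labels(window=window, scanner=scanner, ecosystem=ecosystem).set(value)

if __name__ == "__main__":
    start_http_server(9108)                               # serves /metrics on :9108
    publish({("30d", "stella-core", "npm"): 0.992})       # the same value can feed the CI badge
    while True:
        time.sleep(60)                                    # keep the exporter alive
```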

Want me to draft the stella replay CLI spec and a DB table (DDL) you can drop into Postgres? Below is an extended “Replay Fidelity” design plus a concrete development implementation plan you can hand to engineering. I'm assuming StellaOps does container/app security scans that output an SBOM + findings (and optionally VEX) and uses vulnerability “feeds” plus policy/lattice/rules.


1) Extend the concept: Replay Fidelity as a product capability

1.1 Fidelity levels (so you can be strict without being brittle)

Instead of a single yes/no, define tiers that you can report and gate on:

  1. Bitwise Fidelity (BF)

    • Definition: All primary artifacts (SBOM, findings, VEX, evidence) match byte-for-byte after canonicalization.
    • Use: strongest auditability, catch ordering/nondeterminism.
  2. Semantic Fidelity (SF)

    • Definition: The meaning matches even if formatting differs (e.g., key order, whitespace, timestamps).
    • How: compare normalized objects: same packages, versions, CVEs, fix versions, severities, policy verdicts.
    • Use: protects you from “cosmetic diffs” and helps triage.
  3. Policy Fidelity (PF)

    • Definition: Final policy decision (pass/fail + reason codes) matches.
    • Use: useful when outputs may evolve but governance outcome must remain stable.

Recommended reporting:

  • Dashboard shows BF, SF, PF together.
  • Default engineering SLO: BF ≥ 0.98; compliance SLO: BF ≥ 0.95 for regulated projects; PF should be ~1.0 unless policy changed intentionally.
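
A sketch of how the three tiers could be checked for a single scan, assuming canonical artifact bytes plus parsed findings and policy decisions are at hand (field names are illustrative):

```python
import hashlib

def bitwise_match(canonical_a: bytes, canonical_b: bytes) -> bool:
    """BF: byte-for-byte equality of the canonicalized artifacts."""
    return hashlib.sha256(canonical_a).digest() == hashlib.sha256(canonical_b).digest()

def semantic_match(findings_a: list[dict], findings_b: list[dict]) -> bool:
    """SF: same set of (purl, cve, severity, fix) tuples; formatting and order are ignored."""
    def key_set(findings):
        return {(f["purl"], f["cve_id"], f["severity"], f.get("fix_version")) for f in findings}
    return key_set(findings_a) == key_set(findings_b)

def policy_match(decision_a: dict, decision_b: dict) -> bool:
    """PF: final verdict and reason codes match; everything else may differ."""
    return (decision_a["verdict"], sorted(decision_a["reason_codes"])) == \
           (decision_b["verdict"], sorted(decision_b["reason_codes"]))
```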

1.2 “Why did it drift?”—Mismatch classification taxonomy

When a replay fails, auto-tag the cause so humans don't diff JSON by hand.

Primary mismatch classes

  • Feed drift: CVE/OVAL/vendor advisory snapshot differs.
  • Policy drift: policy/lattice/rules differ (or default rule set changed).
  • Runtime drift: base image / libc / kernel / locale / tz / CPU arch differences.
  • Scanner drift: scanner binary build differs or dependency versions changed.
  • Nondeterminism: ordering instability, concurrency race, unseeded RNG, time-based logic.
  • External IO: network calls, “latest” resolution, remote package registry changes.

Output: a mismatch_reason plus a short diff_summary.


1.3 Deterministic “scan envelope” design

A replay only works if the scan is fully specified.

Scan envelope components

  • Inputs: image digest, repo commit SHA, build provenance, layers digests.
  • Scanner: scanner OCI image digest (or binary digest), config flags, feature toggles.
  • Feeds: content-addressed feed bundle digests (see §2.3).
  • Policy/rules: git commit SHA + content digest of compiled rules.
  • Environment: OS/arch, tz/locale, “clock mode”, network mode, CPU count.
  • Normalization: “canonicalization version” for SBOM/VEX/findings.

1.4 Canonicalization so “bitwise” is meaningful

To make BF achievable:

  • Canonical JSON serialization (sorted keys, stable array ordering, normalized floats)
  • Strip/normalize volatile fields (timestamps, “scan_duration_ms”, hostnames)
  • Stable ordering for lists: packages sorted by (purl, version), vulnerabilities by (cve_id, affected_purl)
  • Deterministic IDs: if you generate internal IDs, derive from stable hashes of content (not UUID4)
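
A sketch of a canonicalizer along these lines; the volatile-field list and the generic sort key are illustrative choices:

```python
import hashlib
import json

VOLATILE_FIELDS = {"timestamp", "scan_duration_ms", "hostname"}   # illustrative list

def canonicalize(obj):
    """Recursively drop volatile fields and sort list entries by a stable key."""
    if isinstance(obj, dict):
        return {k: canonicalize(v) for k, v in sorted(obj.items()) if k not in VOLATILE_FIELDS}
    if isinstance(obj, list):
        cleaned = [canonicalize(v) for v in obj]
        # Sort by each entry's canonical JSON so array order is stable regardless of input order.
        return sorted(cleaned, key=lambda v: json.dumps(v, sort_keys=True))
    return obj

def canonical_bytes(obj) -> bytes:
    """Canonical JSON: sorted keys, compact separators, ASCII-only output."""
    return json.dumps(canonicalize(obj), sort_keys=True, separators=(",", ":"),
                      ensure_ascii=True).encode()

def content_id(obj, prefix: str) -> str:
    """Deterministic internal ID derived from content instead of UUID4."""
    return f"{prefix}-{hashlib.sha256(canonical_bytes(obj)).hexdigest()[:16]}"
```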

1.5 Sampling strategy

You don't need to replay everything.

Nightly sample, stratified by:

  • language ecosystem (npm, pip, maven, go, rust…)
  • scanner engine
  • base OS
  • “regulatory tier”
  • image size/complexity

Plus: always replay “golden canaries” (a fixed set of reference images) after every scanner release and feed ingestion pipeline change.
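
A sketch of the stratified nightly sample, assuming historical scans carry the stratum labels above; a fixed RNG seed keeps the sample selection itself reproducible:

```python
import random
from collections import defaultdict

def stratified_sample(scans, strata_key, per_stratum, canaries, seed=20251212):
    """Pick up to `per_stratum` scans from each stratum, plus the fixed golden-canary set."""
    rng = random.Random(seed)                 # fixed seed -> same sample for the same history
    by_stratum = defaultdict(list)
    for scan in scans:
        by_stratum[strata_key(scan)].append(scan)
    sample = list(canaries)                   # golden canaries are always replayed
    for bucket in by_stratum.values():
        sample.extend(rng.sample(bucket, min(per_stratum, len(bucket))))
    return sample

# Example: stratify by (ecosystem, base_os); the field names are illustrative.
# sample = stratified_sample(history, lambda s: (s["ecosystem"], s["base_os"]), 5, canary_scans)
```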


2) Technical architecture blueprint

2.1 System components

  1. Manifest Writer (in the scan pipeline)

    • Produces ScanManifest v1 JSON
    • Records all digests and versions
  2. Artifact Store

    • Stores SBOM, findings, VEX, evidence blobs
    • Stores canonical hashes for BF checks
  3. Feed Snapshotter

    • Periodically builds immutable feed bundles
    • Content-addressed (digest-keyed)
    • Stores metadata (source URLs, generation timestamp, signature)
  4. Replay Orchestrator

    • Chooses historical scans to replay
    • Launches “replay executor” jobs
  5. Replay Executor

    • Runs scanner in pinned container image
    • Network off, tz fixed, clock policy applied
    • Produces new artifacts + hashes
  6. Diff & Scoring Engine

    • Computes BF/SF/PF
    • Generates mismatch classification + diff summary
  7. Metrics + UI Dashboard

    • Prometheus metrics
    • UI for drill-down diffs

2.2 Data model (Postgres-friendly)

Core tables

  • scan_manifests

    • scan_id (pk)
    • manifest_json
    • manifest_sha256
    • created_at
  • scan_artifacts

    • scan_id (fk)
    • artifact_type (sbom|findings|vex|evidence)
    • artifact_uri
    • canonical_sha256
    • schema_version
  • feed_snapshots

    • feed_digest (pk)
    • bundle_uri
    • sources_json
    • generated_at
    • signature
  • replay_runs

    • replay_id (pk)
    • original_scan_id (fk)
    • status (queued|running|passed|failed)
    • bf_match bool, sf_match bool, pf_match bool
    • mismatch_reason
    • diff_summary_json
    • started_at, finished_at
    • executor_env_json (arch, tz, cpu, image digest)

Indexes

  • (created_at) for sampling windows
  • (mismatch_reason, finished_at) for triage
  • (scanner_version, ecosystem) for breakdown dashboards

2.3 Feed Snapshotting (the key to long-term replay)

Feed bundle format

  • feeds/<source>/<date>/... inside a tar.zst

  • manifest file inside bundle: feed_bundle_manifest.json containing:

    • source URLs
    • retrieval commit/etag (if any)
    • file hashes
    • generated_by version

Content addressing

  • Digest of the entire bundle (sha256(tar.zst)) is the reference.
  • Scans record only the digest + URI.

Immutability

  • Store bundles in object storage with WORM / retention if you need compliance.
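
A sketch of building and content-addressing a bundle, assuming the third-party zstandard package for .zst compression; only the file-hash part of feed_bundle_manifest.json is shown:

```python
import hashlib
import io
import json
import tarfile
from pathlib import Path

import zstandard  # third-party package, assumed available: pip install zstandard

def _normalize(ti: tarfile.TarInfo) -> tarfile.TarInfo:
    ti.mtime = 0                      # zero timestamps/ownership so repeated builds hash identically
    ti.uid = ti.gid = 0
    ti.uname = ti.gname = ""
    return ti

def build_feed_bundle(feeds_dir: Path, out_path: Path) -> str:
    """Tar the feeds/ tree plus its manifest, compress with zstd, return the bundle digest."""
    file_hashes = {
        str(p.relative_to(feeds_dir)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(feeds_dir.rglob("*")) if p.is_file()
    }
    # Real bundles would also record source URLs, etags, and generated_by here.
    manifest = json.dumps({"files": file_hashes}, sort_keys=True).encode()

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(feeds_dir, arcname="feeds", filter=_normalize)
        info = _normalize(tarfile.TarInfo("feed_bundle_manifest.json"))
        info.size = len(manifest)
        tar.addfile(info, io.BytesIO(manifest))

    compressed = zstandard.ZstdCompressor().compress(buf.getvalue())
    out_path.write_bytes(compressed)
    return "sha256:" + hashlib.sha256(compressed).hexdigest()   # scans record only digest + URI
```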

2.4 Replay execution sandbox

For determinism, enforce:

  • No network (K8s NetworkPolicy, firewall rules, or container runtime flags)

  • Fixed TZ/locale

  • Pinned container image digest

  • Clock policy

    • Either “real time but recorded” or “frozen time at original scan timestamp”
    • If scanner logic uses current date for severity windows, freeze time
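
A sketch of how the replay executor could enforce these controls when wrapping a plain docker run (the image name and the SCAN_CLOCK variable are illustrative; substitute your actual runtime or K8s equivalents):

```python
import subprocess

def run_pinned_replay(scanner_digest: str, manifest_path: str, bundles_dir: str,
                      frozen_time: str | None = None) -> int:
    """Run the scanner image by digest with network off, fixed TZ/locale, read-only mounts."""
    cmd = [
        "docker", "run", "--rm",
        "--network", "none",                        # no network
        "-e", "TZ=UTC", "-e", "LC_ALL=C",           # fixed tz + locale
        "-v", f"{bundles_dir}:/bundles:ro",         # feed + policy bundles, read-only
        "-v", f"{manifest_path}:/manifest.json:ro",
    ]
    if frozen_time:
        # Actually freezing the clock needs extra tooling (e.g. libfaketime inside the image);
        # here the original timestamp is only passed through (SCAN_CLOCK is a made-up name).
        cmd += ["-e", f"SCAN_CLOCK={frozen_time}"]
    cmd += [f"registry.example/stella-scanner@{scanner_digest}",   # illustrative image reference
            "replay", "--from", "/manifest.json"]
    return subprocess.run(cmd, check=False).returncode
```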

3) Development implementation plan

I'll lay this out as workstreams plus a sprint-by-sprint plan. You can compress or expand it depending on team size.

Workstream A — Scan Manifest & Canonical Artifacts

Goal: every scan is replayable on paper, even before replays run.

Deliverables

  • ScanManifest v1 schema + writer integrated into scan pipeline
  • Canonicalization library + canonical hashing for all artifacts

Acceptance criteria

  • Every scan stores: input digests, scanner digest, policy digest, feed digest placeholders
  • Artifact hashes are stable across repeated runs in the same environment

Workstream B — Feed Snapshotting & Policy Versioning

Goal: eliminate “feed drift” by pinning immutable inputs.

Deliverables

  • Feed bundle builder + signer + uploader
  • Policy/rules bundler (compiled rules bundle, digest recorded)

Acceptance criteria

  • New scans reference feed bundle digests (not “latest”)
  • A scan can be re-run with the same feed bundle and policy bundle

Workstream C — Replay Runner & Diff Engine

Goal: execute historical scans and score BF/SF/PF with actionable diffs.

Deliverables

  • stella replay --from manifest.json
  • Orchestrator job to schedule replays
  • Diff engine + mismatch classifier
  • Storage of replay results

Acceptance criteria

  • Replay produces deterministic artifacts in a pinned environment
  • Dashboard/CLI shows BF/SF/PF + diff summary for failures

Workstream D — Observability, Dashboard, and CI Gates

Goal: make fidelity visible and enforceable.

Deliverables

  • Prometheus metrics: replay_fidelity_bf, replay_fidelity_sf, replay_fidelity_pf
  • Breakdown labels (scanner, ecosystem, policy_set, base_os)
  • Alerts for drop thresholds
  • CI gate option: “block release if BF < threshold on canary set”

Acceptance criteria

  • Engineering can see drift within 24h
  • Releases are blocked when fidelity regressions occur

4) Suggested sprint plan with concrete tasks

Sprint 0 — Design lock + baseline

Tasks

  • Define manifest schema: ScanManifest v1 fields + versioning rules
  • Decide canonicalization rules (what is normalized vs preserved)
  • Choose initial “golden canary” scan set (10–20 representative targets)
  • Add “replay-fidelity” epic with ownership & SLIs/SLOs

Exit criteria

  • Approved schema + canonicalization spec
  • Canary set stored and tagged

Sprint 1 — Manifest writer + artifact hashing (MVP)

Tasks

  • Implement manifest writer in scan pipeline

  • Store manifest_json + manifest_sha256

  • Implement canonicalization + hashing for:

    • findings list (sorted)
    • SBOM (normalized)
    • VEX (if present)
  • Persist canonical hashes in scan_artifacts

Exit criteria

  • Two identical scans in the same environment yield identical artifact hashes

  • A “manifest export” endpoint/CLI works:

    • stella scan --emit-manifest out.json

Sprint 2 — Feed snapshotter + policy bundling

Tasks

  • Build feed bundler job:

    • pull raw sources
    • normalize layout
    • generate feed_bundle_manifest.json
    • tar.zst + sha256
    • upload + record in feed_snapshots
  • Update scan pipeline:

    • resolve feed bundle digest at scan start
    • record digest in scan manifest
  • Bundle policy/lattice:

    • compile rules into an immutable artifact
    • record policy bundle digest in manifest

Exit criteria

  • Scans reference immutable feed + policy digests
  • You can fetch feed bundle by digest and reproduce the same feed inputs

Sprint 3 — Replay executor + “no network” sandbox

Tasks

  • Create replay container image / runtime wrapper

  • Implement stella replay --from MANIFEST.json

    • pulls scanner image by digest
    • mounts feed bundle + policy bundle
    • runs in network-off mode
    • applies tz/locale + clock mode
  • Store replay outputs as artifacts (replay_scan_id or replay_id linkage)

Exit criteria

  • Replay runs end-to-end for canary scans
  • Deterministic runtime controls verified (no DNS egress, fixed tz)

Sprint 4 — Diff engine + mismatch classification

Tasks

  • Implement BF compare (canonical hashes)

  • Implement SF compare (semantic JSON/object comparison)

  • Implement PF compare (policy decision equivalence)

  • Implement mismatch classification rules (a sketch follows this task list):

    • if feed digest differs → feed drift
    • if scanner digest differs → scanner drift
    • if environment differs → runtime drift
    • else → nondeterminism (with sub-tags for ordering/time/RNG)
  • Generate diff_summary_json:

    • top N changed CVEs
    • packages added/removed
    • policy verdict changes
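
A sketch of a classifier implementing those rules (manifest fields follow the ScanManifest skeleton in §5.1; the nondeterminism sub-tag heuristics and diff keys are placeholders):

```python
def classify_mismatch(original_manifest: dict, replay_manifest: dict, diff: dict) -> str:
    """Return a mismatch_reason tag by comparing the two manifests, in rule order."""
    o, r = original_manifest, replay_manifest
    if o["feeds"] != r["feeds"]:
        return "feed_drift"
    if o["policy"]["policy_bundle_digest"] != r["policy"]["policy_bundle_digest"]:
        return "policy_drift"                                    # from the §1.2 taxonomy
    if o["scanner"]["scanner_image_digest"] != r["scanner"]["scanner_image_digest"]:
        return "scanner_drift"
    if o["environment"] != r["environment"]:
        return "runtime_drift"
    # Same inputs, scanner, and environment -> nondeterminism; sub-tag heuristics are placeholders.
    if diff.get("only_ordering_changed"):
        return "nondeterminism:ordering"
    if diff.get("timestamp_like_fields_changed"):
        return "nondeterminism:time"
    return "nondeterminism:unknown"
```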

Exit criteria

  • Every failed replay has a cause tag and a diff summary that's useful in <2 minutes
  • Engineers can reproduce failures locally with the manifest

Sprint 5 — Dashboard + alerts + CI gate

Tasks

  • Expose Prometheus metrics from replay service

  • Build dashboard:

    • BF/SF/PF trends
    • breakdown by ecosystem/scanner/policy
    • mismatch cause histogram
  • Add alerting rules (drop threshold, bucket regression)

  • Add CI gate mode:

    • “run replays on canary set for this release candidate”
    • block merge if BF < target

Exit criteria

  • Fidelity visible to leadership and engineering
  • Release process is protected by canary replays

Sprint 6 — Hardening + compliance polish

Tasks

  • Backward-compatible manifest upgrades:

    • manifest_version bump rules
    • migration support
  • Artifact signing / integrity:

    • sign manifest hash
    • optional transparency log later
  • Storage & retention policies (cost controls)

  • Runbook + on-call playbook

Exit criteria

  • Audit story is complete: “show me exactly how scan X was produced”
  • Operational load is manageable and cost-bounded

5) Engineering specs you can start implementing immediately

5.1 ScanManifest v1 skeleton (example)

```json
{
  "manifest_version": "1.0",
  "scan_id": "scan_123",
  "created_at": "2025-12-12T10:15:30Z",

  "input": {
    "type": "oci_image",
    "image_ref": "registry/app@sha256:...",
    "layers": ["sha256:...", "sha256:..."],
    "source_provenance": {"repo_sha": "abc123", "build_id": "ci-999"}
  },

  "scanner": {
    "engine": "stella",
    "scanner_image_digest": "sha256:...",
    "scanner_version": "2025.12.0",
    "config_digest": "sha256:...",
    "flags": ["--deep", "--vex"]
  },

  "feeds": {
    "vuln_feed_bundle_digest": "sha256:...",
    "license_db_digest": "sha256:..."
  },

  "policy": {
    "policy_bundle_digest": "sha256:...",
    "policy_set": "prod-default"
  },

  "environment": {
    "arch": "amd64",
    "os": "linux",
    "tz": "UTC",
    "locale": "C",
    "network": "disabled",
    "clock_mode": "frozen",
    "clock_value": "2025-12-12T10:15:30Z"
  },

  "normalization": {
    "canonicalizer_version": "1.2.0",
    "sbom_schema": "cyclonedx-1.6",
    "vex_schema": "cyclonedx-vex-1.0"
  }
}
```

5.2 CLI spec (minimal)

  • stella scan ... --emit-manifest MANIFEST.json --emit-artifacts-dir out/
  • stella replay --from MANIFEST.json --out-dir replay_out/
  • stella diff --a out/ --b replay_out/ --mode bf|sf|pf --json

6) Testing strategy (to prevent determinism regressions)

Unit tests

  • Canonicalization: same object → same bytes
  • Sorting stability: randomized input order → stable output
  • Hash determinism
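
A sketch of these unit tests in pytest style, written against the canonicalizer sketched in §1.4 (the import path is illustrative):

```python
import random

from canonicalizer import canonical_bytes  # illustrative import path (see the §1.4 sketch)

SAMPLE = {
    "packages": [{"purl": "pkg:npm/a", "version": "1.0"},
                 {"purl": "pkg:npm/b", "version": "2.0"}],
    "scan_duration_ms": 1234,   # volatile field, expected to be stripped
}

def test_same_object_same_bytes():
    # Canonicalization: serializing the same object twice yields identical bytes.
    assert canonical_bytes(SAMPLE) == canonical_bytes(SAMPLE)

def test_randomized_input_order_is_stable():
    # Sorting stability: shuffling list order must not change the canonical output.
    shuffled = dict(SAMPLE)
    shuffled["packages"] = random.sample(SAMPLE["packages"], k=len(SAMPLE["packages"]))
    assert canonical_bytes(shuffled) == canonical_bytes(SAMPLE)

def test_volatile_fields_do_not_affect_hash():
    # Hash determinism: volatile fields must not leak into the canonical form.
    assert canonical_bytes(dict(SAMPLE, scan_duration_ms=9999)) == canonical_bytes(SAMPLE)
```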

Integration tests

  • Golden canaries:

    • run scan twice in same runner → BF match
    • replay from manifest → BF match
  • “Network leak” test:

    • DNS requests must be zero
  • “Clock leak” test:

    • freeze time; ensure outputs do not include real timestamps

Chaos tests

  • Vary CPU count, run concurrency, run order → still BF match
  • Randomized scheduling / thread interleavings to find races

7) Operational policies (so it stays useful)

Retention & cost controls

  • Keep full artifacts for regulated scans (e.g., 1–7 years)

  • For non-regulated:

    • keep manifests + canonical hashes long-term
    • expire heavy evidence blobs after N days
  • Compress large artifacts and dedupe by digest

Alerting examples

  • BF drops by ≥2% week-over-week (any major bucket) → warn
  • BF < 0.90 overall or regulated BF < 0.95 → page / block release

Triage workflow

  • Failed replay auto-creates a ticket with:

    • manifest link
    • mismatch_reason
    • diff_summary
    • reproduction command

8) What “done” looks like (definition of success)

  • Any customer/auditor can pick a scan from 6 months ago and you can:

    1. retrieve manifest + feed bundle + policy bundle by digest
    2. replay in a pinned sandbox
    3. show BF/SF/PF results and diffs
  • Engineering sees drift quickly and can attribute it to feed vs scanner vs runtime.


If you want, I can also provide:

  • a Postgres DDL for the tables above,
  • a Prometheus metrics contract (names + labels + example queries),
  • and a diff_summary_json schema that supports a UI “diff view” without reprocessing artifacts.