Verifiable Proof Spine: Receipts + Benchmarks

Compiled: 2025-12-01 (UTC)

Why this matters

Move from “trust the scanner” to “prove every verdict.” Each finding and every “not affected” claim must carry cryptographic, replayable evidence that anyone can verify offline or online.

Differentiators to build in

  • Graph Revision ID on every verdict: stable Merkle root over SBOM nodes/edges, policies, feeds, scan params, and tool versions. Any data change → new graph hash → new revisioned verdicts; surface the ID in UI/API.
  • Machine-verifiable receipts (DSSE): emit one DSSE-wrapped in-toto statement per verdict (predicate stellaops.dev/verdict@v1) that includes graphRevisionId, artifact digests, rule id/version, inputs, and timestamps. Sign with Authority keys (offline mode supported), keep receipts queryable and exportable, and mirror them to a Rekor-compatible ledger when online.
  • Reachability evidence: attach call-stack slices (entry→sink, symbols, file:line) for code-level cases; for binaries, include symbol presence proofs (bitmap/offsets) hashed and referenced from DSSE payloads.
  • Deterministic replay manifests: publish replay.manifest.json with inputs, feeds, rule/tool/container digests so auditors can recompute the same graph hash and verdicts offline.
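
As a rough illustration of the Graph Revision ID idea, the sketch below hashes each input record into a leaf and folds the leaves into a Merkle root. The canonical-JSON encoding, the sorted leaf order, the SHA-256 choice, the domain-separation prefixes, and every field name in the sample data are assumptions made for this example, not the StellaOps design.

```python
import hashlib
import json


def leaf_hash(kind: str, item: dict) -> bytes:
    # Canonical JSON (sorted keys, no extra whitespace) so equal data hashes equally.
    body = json.dumps(item, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # The 0x00 prefix marks a leaf so it cannot be confused with an interior node.
    return hashlib.sha256(b"\x00" + kind.encode("utf-8") + b"\x00" + body).digest()


def merkle_root(leaves: list) -> bytes:
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = sorted(leaves)  # sorting makes the root independent of input order
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                # The 0x01 prefix marks an interior node.
                nxt.append(hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest())
            else:
                nxt.append(level[i])  # an unpaired node is promoted unchanged
        level = nxt
    return level[0]


def graph_revision_id(inputs: dict) -> str:
    # inputs maps a category name (hypothetical) to a list of records.
    leaves = [leaf_hash(kind, item) for kind, items in inputs.items() for item in items]
    return "sha256:" + merkle_root(leaves).hex()


# Hypothetical scan inputs, for illustration only.
scan = {
    "sbom_nodes": [{"purl": "pkg:npm/express@4.18.2"}, {"purl": "pkg:npm/lodash@4.17.21"}],
    "sbom_edges": [{"from": "pkg:npm/express@4.18.2", "to": "pkg:npm/lodash@4.17.21"}],
    "policies": [{"id": "policy-default", "version": "1"}],
    "tool_versions": [{"name": "scanner", "version": "1.0.0"}],
}
print(graph_revision_id(scan))
```

Under these assumptions, reordering the records leaves the ID unchanged, while changing any single field (for example a tool version) produces a different ID.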

Benchmarks to publish (headline KPIs)

  • False-positive reduction vs baseline scanners: run a public corpus across 3-4 popular scanners; label ground truth once; report mean and p95 FP reduction.
  • Proof coverage: percentage of findings/VEX items carrying valid DSSE receipts; break out reachable vs unreachable and “not affected.”
  • Triage time saved: analyst minutes from alert to final disposition with receipts visible vs hidden; publish p50/p95 deltas.
  • Determinism stability: re-run identical scans across nodes; publish the percentage of identical graph hashes and explain the cause of any drift.
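
The KPIs above could be computed along the following lines. This is a minimal sketch: the nearest-rank percentile method, the per-item definition of FP reduction, and the record field `receipt_valid` are assumptions for the example rather than an agreed measurement protocol.

```python
from collections import Counter
from statistics import mean


def percentile(values, p: int):
    """Nearest-rank percentile for an integer p in 1..100."""
    if not values:
        raise ValueError("percentile of empty data")
    ordered = sorted(values)
    rank = max(1, -(-p * len(ordered) // 100))  # ceil(p * n / 100) in integer math
    return ordered[rank - 1]


def fp_reduction(baseline_fp: int, ours_fp: int) -> float:
    """Relative false-positive reduction against one baseline measurement."""
    if baseline_fp == 0:
        return 0.0
    return (baseline_fp - ours_fp) / baseline_fp


def summarize_fp(reductions) -> dict:
    return {"mean": mean(reductions), "p95": percentile(reductions, 95)}


def proof_coverage(findings) -> float:
    """Share of findings whose DSSE receipt verified (field name is hypothetical)."""
    return sum(1 for f in findings if f["receipt_valid"]) / len(findings)


def determinism_rate(graph_hashes) -> float:
    """Share of re-runs that produced the most common graph hash."""
    most_common_count = Counter(graph_hashes).most_common(1)[0][1]
    return most_common_count / len(graph_hashes)
```

For instance, four re-runs that yield hashes `a, a, a, b` give a determinism rate of 0.75, and the odd run is the one whose drift cause needs explaining.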

Minimal implementation plan (week-by-week)

  • Week 1 (Primitives): add Graph Revision ID generator in scanner.webservice (Merkle over normalized SBOM+edges+policies+toolVersions); define VerdictReceipt schema (protobuf/JSON) and DSSE envelope types.
  • Week 2 (Signing + storage): wire DSSE signing via Authority with offline key support/rotation; persist receipts in Receipts table keyed by (graphRevisionId, verdictId); enable JSONL export and ledger mirror.
  • Week 3 (Reachability proofs): capture call-stack slices in reachability engine; serialize and hash; add binary symbol proof module (ELF/PE bitmap + digest) and reference from receipts.
  • Week 4 (Replay + UX): emit replay.manifest per scan (inputs, tool digests); UI shows “Verified” badge, graph hash, signature issuer, and one-click “Copy receipt”; API: GET /verdicts/{id}/receipt, GET /graphs/{rev}/replay.
  • Week 5 (Benchmarks harness): create bench/ fixtures and a runner with baseline scanner adapters, ground-truth labels, metrics export for FP% and proof coverage, and triage-time capture hooks.
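
To make the Week 1 and Week 2 receipt work more concrete, here is a hedged sketch of a verdict receipt as a DSSE envelope around an in-toto statement. The pre-authentication encoding (PAE) and the envelope fields follow the public DSSE specification; the predicate layout, the helper names, and above all the HMAC "signature" are assumptions. HMAC is only a standard-library stand-in: a real deployment would sign with the asymmetric Authority keys mentioned above.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

PAYLOAD_TYPE = "application/vnd.in-toto+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the exact bytes that get signed."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(payload), payload)


def verdict_statement(artifact: str, sha256: str, graph_revision_id: str,
                      rule_id: str, rule_version: str, status: str) -> dict:
    # Predicate fields are hypothetical; only predicateType comes from the advisory.
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": artifact, "digest": {"sha256": sha256}}],
        "predicateType": "stellaops.dev/verdict@v1",
        "predicate": {
            "graphRevisionId": graph_revision_id,
            "rule": {"id": rule_id, "version": rule_version},
            "status": status,
        },
    }


def sign_envelope(statement: dict, key: bytes, keyid: str) -> dict:
    payload = json.dumps(statement, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # Stand-in signature: HMAC-SHA256 over the PAE bytes (NOT an asymmetric signature).
    sig = hmac.new(key, pae(PAYLOAD_TYPE, payload), hashlib.sha256).digest()
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }


def verify_envelope(envelope: dict, key: bytes) -> Optional[dict]:
    """Return the decoded statement if any signature checks out, else None."""
    payload = base64.b64decode(envelope["payload"])
    expected = hmac.new(key, pae(envelope["payloadType"], payload), hashlib.sha256).digest()
    for s in envelope["signatures"]:
        if hmac.compare_digest(base64.b64decode(s["sig"]), expected):
            return json.loads(payload)
    return None
```

Because the signature covers the PAE bytes rather than the raw JSON, a verifier never has to re-canonicalize the payload: it checks the signature over the stored bytes and only then parses them.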

Developer guardrails (non-negotiable)

  • No receipt, no ship: any surfaced verdict must carry a DSSE receipt; fail closed otherwise.
  • Schema freeze windows: any change to rule inputs or policy logic must bump the rule version, which in turn changes the graph hash.
  • Replay-first CI: PRs touching scanning/rules must pass a replay test that reproduces prior graph hashes on gold fixtures.
  • Clock safety: use a monotonic clock for ordering and elapsed-time fields in receipts; record UTC wall-clock time in a separate field.
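
Two of these guardrails translate directly into code. The sketch below shows one possible shape for a fail-closed gate and for the separate clock fields; the function names, the exception type, and the field names `monotonicNs` and `issuedAtUtc` are hypothetical, and the lookup key simply mirrors the (graphRevisionId, verdictId) key proposed for the Receipts table.

```python
import time
from datetime import datetime, timezone


class MissingReceiptError(RuntimeError):
    """Raised instead of surfacing a verdict that has no valid receipt."""


def receipt_timestamps() -> dict:
    return {
        # Monotonic reading: only meaningful for ordering/durations on one host.
        "monotonicNs": time.monotonic_ns(),
        # Wall-clock time, always UTC, kept as a separate field.
        "issuedAtUtc": datetime.now(timezone.utc).isoformat(),
    }


def surface_verdict(verdict: dict, receipts: dict, is_valid) -> dict:
    """Fail closed: return the verdict only when a valid receipt is attached."""
    key = (verdict["graphRevisionId"], verdict["verdictId"])
    receipt = receipts.get(key)
    if receipt is None or not is_valid(receipt):
        raise MissingReceiptError(f"no valid receipt for verdict {key!r}")
    return {**verdict, "receipt": receipt}
```

Raising rather than returning the bare verdict means a missing or invalid receipt shows up as an error in the API or UI layer instead of as an unverified finding.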

What to show buyers/auditors

  • Audit kit: sample container + receipts + replay manifest + one command to reproduce the same graph hash.
  • One-page benchmark readout: FP reduction, proof coverage, triage time saved (p50/p95), corpus description.
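
The "one command to reproduce" step in the audit kit depends on the auditor holding exactly the inputs the manifest names. A minimal sketch of that pre-replay check follows; the manifest layout, the `schema` string, and the helper names are assumptions for illustration, not the actual replay.manifest.json format.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large SBOMs or feeds do not need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return "sha256:" + h.hexdigest()


def build_manifest(root: Path, input_paths: list, tool_digests: dict,
                   graph_revision_id: str) -> dict:
    return {
        "schema": "replay.manifest/v0 (hypothetical)",
        "graphRevisionId": graph_revision_id,
        "inputs": {p: sha256_file(root / p) for p in sorted(input_paths)},
        "toolDigests": dict(sorted(tool_digests.items())),
    }


def check_inputs(root: Path, manifest: dict) -> list:
    """Return the inputs that are missing or whose digest no longer matches."""
    drift = []
    for rel, expected in manifest["inputs"].items():
        p = root / rel
        if not p.is_file() or sha256_file(p) != expected:
            drift.append(rel)
    return drift
```

An empty list from `check_inputs` means the replay can proceed and the recomputed graph hash can be compared with `graphRevisionId`; a non-empty list names the inputs that explain a mismatch before any scanning is done.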

Optional follow-ons

  • Provide DSSE predicate schema, Postgres DDL for Receipts and Graphs, and a minimal .NET verification CLI (stellaops-verify) that replays a manifest and validates signatures.