feat: Add UI benchmark driver and scenarios for graph interactions

- Introduced `ui_bench_driver.mjs` to read scenarios and fixture manifest, generating a deterministic run plan.
- Created `ui_bench_plan.md` outlining the purpose, scope, and next steps for the benchmark.
- Added `ui_bench_scenarios.json` containing various scenarios for graph UI interactions.
- Implemented tests for CLI commands, ensuring bundle verification and telemetry defaults.
- Developed schemas for orchestrator components, including replay manifests and event envelopes.
- Added mock API for risk management, including listing and statistics functionalities.
- Implemented models for risk profiles and query options to support the new API.
2025-12-02 01:28:17 +02:00
parent 909d9b6220
commit 44171930ff
94 changed files with 3606 additions and 271 deletions
--- a/docs/product-advisories/01-Dec-2025
+++ b/docs/product-advisories/01-Dec-2025
@@ -0,0 +1,38 @@
+# Verifiable Proof Spine: Receipts + Benchmarks
+
+Compiled: 2025-12-01 (UTC)
+
+## Why this matters
+Move from “trust the scanner” to “prove every verdict.” Each finding and every “not affected” claim must carry cryptographic, replayable evidence that anyone can verify offline or online.
+
+## Differentiators to build in
+- **Graph Revision ID on every verdict**: stable Merkle root over SBOM nodes/edges, policies, feeds, scan params, and tool versions. Any data change → new graph hash → new revisioned verdicts; surface the ID in UI/API.
+- **Machine-verifiable receipts (DSSE)**: emit a DSSE-wrapped in-toto statement per verdict (predicate `stellaops.dev/verdict@v1`) including graphRevisionId, artifact digests, rule id/version, inputs, and timestamps; sign with Authority keys (offline mode supported) and keep receipts queryable/exportable; mirror to Rekor-compatible ledger when online.
+- **Reachability evidence**: attach call-stack slices (entry→sink, symbols, file:line) for code-level cases; for binaries, include symbol presence proofs (bitmap/offsets) hashed and referenced from DSSE payloads.
+- **Deterministic replay manifests**: publish `replay.manifest.json` with inputs, feeds, rule/tool/container digests so auditors can recompute the same graph hash and verdicts offline.
+
+## Benchmarks to publish (headline KPIs)
+- **False-positive reduction vs baseline scanners**: run public corpus across 3–4 popular scanners; label ground truth once; report mean and p95 FP reduction.
+- **Proof coverage**: percentage of findings/VEX items carrying valid DSSE receipts; break out reachable vs unreachable and “not affected.”
+- **Triage time saved**: analyst minutes from alert to final disposition with receipts visible vs hidden; publish p50/p95 deltas.
+- **Determinism stability**: re-run identical scans across nodes; publish the percentage of identical graph hashes and explain the cause of drift wherever hashes differ.
+
+## Minimal implementation plan (week-by-week)
+- **Week 1 – Primitives**: add Graph Revision ID generator in scanner.webservice (Merkle over normalized SBOM+edges+policies+toolVersions); define `VerdictReceipt` schema (protobuf/JSON) and DSSE envelope types.
+- **Week 2 – Signing + storage**: wire DSSE signing via Authority with offline key support/rotation; persist receipts in `Receipts` table keyed by (graphRevisionId, verdictId); enable JSONL export and ledger mirror.
+- **Week 3 – Reachability proofs**: capture call-stack slices in reachability engine; serialize and hash; add binary symbol proof module (ELF/PE bitmap + digest) and reference from receipts.
+- **Week 4 – Replay + UX**: emit replay.manifest per scan (inputs, tool digests); UI shows “Verified” badge, graph hash, signature issuer, and one-click “Copy receipt”; API: `GET /verdicts/{id}/receipt`, `GET /graphs/{rev}/replay`.
+- **Week 5 – Benchmarks harness**: create `bench/` fixtures and a runner with baseline scanner adapters, ground-truth labels, metrics export (FP%, proof coverage), and triage-time capture hooks.
+
+## Developer guardrails (non-negotiable)
+- **No receipt, no ship**: any surfaced verdict must carry a DSSE receipt; fail closed otherwise.
+- **Schema freeze windows**: changes to rule inputs or policy logic must bump rule version and therefore graph hash.
+- **Replay-first CI**: PRs touching scanning/rules must pass a replay test that reproduces prior graph hashes on gold fixtures.
+- **Clock safety**: use a monotonic clock for ordering and duration measurements in receipts; record UTC wall-clock time as a separate field.
+
+## What to show buyers/auditors
+- Audit kit: sample container + receipts + replay manifest + one command to reproduce the same graph hash.
+- One-page benchmark readout: FP reduction, proof coverage, triage time saved (p50/p95), corpus description.
+
+## Optional follow-ons
+- Provide DSSE predicate schema, Postgres DDL for `Receipts` and `Graphs`, and a minimal .NET verification CLI (`stellaops-verify`) that replays a manifest and validates signatures.
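
Reviewer sketch (not part of the commit): the Graph Revision ID and DSSE receipt primitives in the advisory above could look roughly like the Python below. It is a minimal sketch under stated assumptions, not the StellaOps implementation. The function names, the canonical-JSON normalization, the `0x00`/`0x01` leaf and interior prefixes, the sorted-leaf ordering, and the predicate field names `ruleId`/`ruleVersion` are all hypothetical. The `dsse_pae` helper follows the pre-authentication encoding defined by the DSSE specification; actual signing with Authority keys is left out.

```python
import hashlib
import json


def canonical(obj) -> bytes:
    """Deterministic JSON: sorted keys, no insignificant whitespace (assumed normalization)."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def merkle_root(leaves: list) -> bytes:
    """Merkle root over leaf hashes; leaves are sorted so input order is irrelevant."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = sorted(leaves)
    while len(level) > 1:
        # Hash adjacent pairs; the 0x01 prefix marks an interior node.
        nxt = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            nxt.append(level[-1])  # promote an unpaired hash unchanged
        level = nxt
    return level[0]


def graph_revision_id(inputs: dict) -> str:
    """inputs maps a category (e.g. 'sbomNodes', 'policies') to a list of entries."""
    leaves = [
        # The 0x00 prefix marks a leaf; the category is hashed with the entry.
        hashlib.sha256(b"\x00" + canonical({"kind": kind, "value": entry})).digest()
        for kind, entries in inputs.items()
        for entry in entries
    ]
    return "sha256:" + merkle_root(leaves).hex()


def verdict_statement(graph_rev, artifact_name, artifact_sha256, rule_id, rule_version):
    """in-toto Statement carrying the verdict predicate (predicate fields are hypothetical)."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": artifact_name, "digest": {"sha256": artifact_sha256}}],
        "predicateType": "stellaops.dev/verdict@v1",
        "predicate": {
            "graphRevisionId": graph_rev,
            "ruleId": rule_id,
            "ruleVersion": rule_version,
        },
    }


def dsse_pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: these are the bytes that get signed."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(t), t, len(payload), payload)


if __name__ == "__main__":
    rev = graph_revision_id({
        "sbomNodes": ["pkg:npm/left-pad@1.3.0"],
        "sbomEdges": [["app", "pkg:npm/left-pad@1.3.0"]],
        "toolVersions": [{"scanner": "0.0.0-example"}],
    })
    payload = canonical(verdict_statement(rev, "example-image", "0" * 64, "RULE-1", "1"))
    to_sign = dsse_pae("application/vnd.in-toto+json", payload)
    print(rev, len(to_sign))
```

Because leaves are hashed from canonical JSON and sorted before pairing, reordering nodes or dictionary keys leaves the ID unchanged, while any change to an entry produces a new ID. That is the property the replay-first CI guardrail relies on.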