feat: Add UI benchmark driver and scenarios for graph interactions
Some checks failed
AOC Guard CI / aoc-guard (push) Has been cancelled
AOC Guard CI / aoc-verify (push) Has been cancelled
Docs CI / lint-and-preview (push) Has been cancelled
Policy Lint & Smoke / policy-lint (push) Has been cancelled
devportal-offline / build-offline (push) Has been cancelled
- Introduced `ui_bench_driver.mjs` to read scenarios and fixture manifest, generating a deterministic run plan.
- Created `ui_bench_plan.md` outlining the purpose, scope, and next steps for the benchmark.
- Added `ui_bench_scenarios.json` containing various scenarios for graph UI interactions.
- Implemented tests for CLI commands, ensuring bundle verification and telemetry defaults.
- Developed schemas for orchestrator components, including replay manifests and event envelopes.
- Added mock API for risk management, including listing and statistics functionalities.
- Implemented models for risk profiles and query options to support the new API.
@@ -0,0 +1,38 @@
# Verifiable Proof Spine: Receipts + Benchmarks
Compiled: 2025-12-01 (UTC)
## Why this matters
Move from “trust the scanner” to “prove every verdict.” Each finding and every “not affected” claim must carry cryptographic, replayable evidence that anyone can verify offline or online.
## Differentiators to build in
- **Graph Revision ID on every verdict**: stable Merkle root over SBOM nodes/edges, policies, feeds, scan params, and tool versions. Any data change → new graph hash → new revisioned verdicts; surface the ID in UI/API.
- **Machine-verifiable receipts (DSSE)**: emit a DSSE-wrapped in-toto statement per verdict (predicate `stellaops.dev/verdict@v1`) including graphRevisionId, artifact digests, rule id/version, inputs, and timestamps; sign with Authority keys (offline mode supported) and keep receipts queryable/exportable; mirror to Rekor-compatible ledger when online.
- **Reachability evidence**: attach call-stack slices (entry→sink, symbols, file:line) for code-level cases; for binaries, include symbol presence proofs (bitmap/offsets) hashed and referenced from DSSE payloads.
- **Deterministic replay manifests**: publish `replay.manifest.json` with inputs, feeds, rule/tool/container digests so auditors can recompute the same graph hash and verdicts offline.
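
As an illustration of the Graph Revision ID idea above, here is a minimal sketch of a Merkle root over normalized scan inputs. Everything in it is an assumption made for the example: the function names, the input categories, canonical JSON as the normalization step, sorting leaves to make the root order-independent, and the `0x00`/`0x01` prefixes used to keep leaf and interior hashes apart. It is not the scanner's actual algorithm.

```python
import hashlib
import json


def leaf_hash(kind: str, value: object) -> bytes:
    """Hash one input item after canonical JSON encoding (sorted keys, no whitespace)."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # The 0x00 prefix marks a leaf so leaf and interior hashes cannot collide.
    return hashlib.sha256(b"\x00" + kind.encode("utf-8") + b"\x00" + canonical).digest()


def merkle_root(leaves: list) -> bytes:
    """Fold leaf hashes pairwise into one root; an unpaired node is promoted unchanged."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = sorted(leaves)  # sorting makes the root independent of input order
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            # The 0x01 prefix marks an interior node.
            nxt.append(hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest())
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def graph_revision_id(inputs: dict) -> str:
    """Map {category: [items]} to a stable 'sha256:<hex>' revision identifier."""
    leaves = [leaf_hash(kind, item) for kind, items in inputs.items() for item in items]
    return "sha256:" + merkle_root(leaves).hex()


# Hypothetical inputs; real scans would feed normalized SBOM, policy, and feed data.
example_inputs = {
    "sbom_node": [{"purl": "pkg:npm/a@1.0.0"}, {"purl": "pkg:npm/b@2.0.0"}],
    "sbom_edge": [{"from": "pkg:npm/a@1.0.0", "to": "pkg:npm/b@2.0.0"}],
    "policy": [{"id": "policy-1", "version": "1"}],
    "tool_version": [{"name": "scanner", "version": "0.1.0"}],
}
print(graph_revision_id(example_inputs))
```

Because every item is a leaf, changing any node, edge, policy, or tool version yields a different root, which is the property the "any data change → new graph hash" rule relies on.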
## Benchmarks to publish (headline KPIs)
- **False-positive reduction vs baseline scanners**: run public corpus across 3–4 popular scanners; label ground truth once; report mean and p95 FP reduction.
- **Proof coverage**: percentage of findings/VEX items carrying valid DSSE receipts; break out reachable vs unreachable and “not affected.”
- **Triage time saved**: analyst minutes from alert to final disposition with receipts visible vs hidden; publish p50/p95 deltas.
- **Determinism stability**: re-run identical scans across nodes; publish % identical graph hashes and explain drift causes when different.
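
The KPIs above reduce to a few small calculations. The sketch below shows one plausible way to compute them; the record fields (`status`, `receipt_valid`), the nearest-rank percentile method, and the choice to measure determinism as the share of runs matching the most common hash are all assumptions for illustration, not a defined reporting format.

```python
import math
from collections import Counter


def percentile(values: list, pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def proof_coverage(findings: list) -> dict:
    """Percentage of findings carrying a valid receipt, broken out by status."""
    total, covered = Counter(), Counter()
    for finding in findings:
        total[finding["status"]] += 1
        if finding["receipt_valid"]:
            covered[finding["status"]] += 1
    return {status: 100.0 * covered[status] / total[status] for status in total}


def determinism_stability(graph_hashes: list) -> float:
    """Percentage of re-runs whose graph hash equals the most common hash."""
    if not graph_hashes:
        raise ValueError("no runs")
    _, count = Counter(graph_hashes).most_common(1)[0]
    return 100.0 * count / len(graph_hashes)


# Hypothetical triage times in minutes, with receipts visible vs hidden.
visible = [4, 5, 6, 7, 9]
hidden = [8, 10, 12, 15, 30]
print("p50 delta:", percentile(hidden, 50) - percentile(visible, 50))
print("p95 delta:", percentile(hidden, 95) - percentile(visible, 95))
```

Whichever percentile definition is chosen, it should be stated in the published readout so the p50/p95 deltas can be reproduced.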
## Minimal implementation plan (week-by-week)
- **Week 1 – Primitives**: add Graph Revision ID generator in scanner.webservice (Merkle over normalized SBOM+edges+policies+toolVersions); define `VerdictReceipt` schema (protobuf/JSON) and DSSE envelope types.
- **Week 2 – Signing + storage**: wire DSSE signing via Authority with offline key support/rotation; persist receipts in `Receipts` table keyed by (graphRevisionId, verdictId); enable JSONL export and ledger mirror.
- **Week 3 – Reachability proofs**: capture call-stack slices in reachability engine; serialize and hash; add binary symbol proof module (ELF/PE bitmap + digest) and reference from receipts.
- **Week 4 – Replay + UX**: emit `replay.manifest.json` per scan (inputs, tool digests); UI shows “Verified” badge, graph hash, signature issuer, and one-click “Copy receipt”; API: `GET /verdicts/{id}/receipt`, `GET /graphs/{rev}/replay`.
- **Week 5 – Benchmarks harness**: create `bench/` fixtures and a runner with baseline scanner adapters, ground-truth labels, metrics export for FP% and proof coverage, and triage-time capture hooks.
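
To make the Week 1 and Week 2 items more concrete, the following sketch wraps an in-toto statement in a DSSE envelope and verifies it. The envelope layout and the pre-authentication encoding (PAE) follow the DSSE specification; the predicate type is the one named earlier in this note. The verdict fields, helper names, and key id are hypothetical, and HMAC-SHA256 is used only as a standard-library stand-in for signing: a real deployment would sign with the Authority's asymmetric keys.

```python
import base64
import hashlib
import hmac
import json

PAYLOAD_TYPE = "application/vnd.in-toto+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the exact bytes that get signed."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(t), t, len(payload), payload)


def build_statement(verdict: dict, artifact_name: str, artifact_sha256: str) -> bytes:
    """Serialize an in-toto v1 statement whose predicate is the verdict (fields are illustrative)."""
    statement = {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": artifact_name, "digest": {"sha256": artifact_sha256}}],
        "predicateType": "stellaops.dev/verdict@v1",
        "predicate": verdict,
    }
    return json.dumps(statement, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_envelope(payload: bytes, key: bytes, keyid: str) -> dict:
    """Wrap the payload in a DSSE envelope. HMAC is a placeholder for a real signature."""
    sig = hmac.new(key, pae(PAYLOAD_TYPE, payload), hashlib.sha256).digest()
    return {
        "payload": base64.b64encode(payload).decode("ascii"),
        "payloadType": PAYLOAD_TYPE,
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }


def verify_envelope(envelope: dict, key: bytes) -> bool:
    """Recompute the PAE over the decoded payload and compare signatures in constant time."""
    payload = base64.b64decode(envelope["payload"])
    expected = hmac.new(key, pae(envelope["payloadType"], payload), hashlib.sha256).digest()
    return any(
        hmac.compare_digest(expected, base64.b64decode(s["sig"]))
        for s in envelope["signatures"]
    )


# Hypothetical verdict record and key material, for demonstration only.
verdict = {"graphRevisionId": "sha256:" + "0" * 64, "ruleId": "rule-1", "status": "not_affected"}
receipt = sign_envelope(build_statement(verdict, "example-image", "ab" * 32), b"demo-key", "demo")
print(verify_envelope(receipt, b"demo-key"))
```

Signing the PAE rather than the raw payload binds the payload type into the signature, so a receipt cannot be reinterpreted under a different type without invalidating it.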
## Developer guardrails (non-negotiable)
- **No receipt, no ship**: any surfaced verdict must carry a DSSE receipt; fail closed otherwise.
- **Schema freeze windows**: changes to rule inputs or policy logic must bump rule version and therefore graph hash.
- **Replay-first CI**: PRs touching scanning/rules must pass a replay test that reproduces prior graph hashes on gold fixtures.
- **Clock safety**: use monotonic time for receipts; include UTC wall-time separately.
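
Two of the guardrails above lend themselves to a short sketch: failing closed when a receipt is missing or invalid, and a replay check that compares recomputed graph hashes against recorded ones. The names `surface_verdict`, `replay_check`, and `MissingReceiptError`, and the shape of the gold-fixture mapping, are hypothetical and chosen only for this example.

```python
from typing import Callable, Optional


class MissingReceiptError(Exception):
    """Raised when a verdict would be surfaced without a valid receipt."""


def surface_verdict(verdict: dict, receipt: Optional[dict], verify: Callable[[dict], bool]) -> dict:
    """Fail closed: return the verdict only when a receipt exists and verifies."""
    if receipt is None or not verify(receipt):
        raise MissingReceiptError("verdict %r has no valid receipt" % verdict.get("id"))
    return {**verdict, "receipt": receipt}


def replay_check(gold: dict, recompute: Callable[[str], str]) -> list:
    """Return the fixture names whose recomputed graph hash differs from the recorded one."""
    return [name for name, expected in sorted(gold.items()) if recompute(name) != expected]


# Hypothetical gold fixtures: fixture name -> previously recorded graph hash.
gold_hashes = {"fixture-a": "sha256:aaa", "fixture-b": "sha256:bbb"}
recomputed = {"fixture-a": "sha256:aaa", "fixture-b": "sha256:ccc"}
drifted = replay_check(gold_hashes, lambda name: recomputed[name])
print("drifted fixtures:", drifted)
```

In CI, a non-empty result from the replay check would fail the job, and the listed fixture names give a starting point for explaining the drift.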
## What to show buyers/auditors
- Audit kit: sample container + receipts + replay manifest + one command to reproduce the same graph hash.
- One-page benchmark readout: FP reduction, proof coverage, triage time saved (p50/p95), corpus description.
## Optional follow-ons
- Provide DSSE predicate schema, Postgres DDL for `Receipts` and `Graphs`, and a minimal .NET verification CLI (`stellaops-verify`) that replays a manifest and validates signatures.