feat: Add UI benchmark driver and scenarios for graph interactions
Some checks failed
AOC Guard CI / aoc-guard (push) Has been cancelled
AOC Guard CI / aoc-verify (push) Has been cancelled
Docs CI / lint-and-preview (push) Has been cancelled
Policy Lint & Smoke / policy-lint (push) Has been cancelled
devportal-offline / build-offline (push) Has been cancelled

- Introduced `ui_bench_driver.mjs` to read scenarios and fixture manifest, generating a deterministic run plan.
- Created `ui_bench_plan.md` outlining the purpose, scope, and next steps for the benchmark.
- Added `ui_bench_scenarios.json` containing various scenarios for graph UI interactions.
- Implemented tests for CLI commands, ensuring bundle verification and telemetry defaults.
- Developed schemas for orchestrator components, including replay manifests and event envelopes.
- Added mock API for risk management, including listing and statistics functionalities.
- Implemented models for risk profiles and query options to support the new API.
This commit is contained in:
StellaOps Bot
2025-12-02 01:28:17 +02:00
parent 909d9b6220
commit 44171930ff
94 changed files with 3606 additions and 271 deletions

View File

@@ -0,0 +1,38 @@
# Verifiable Proof Spine: Receipts + Benchmarks
Compiled: 2025-12-01 (UTC)
## Why this matters
Move from “trust the scanner” to “prove every verdict.” Each finding and every “not affected” claim must carry cryptographic, replayable evidence that anyone can verify offline or online.
## Differentiators to build in
- **Graph Revision ID on every verdict**: stable Merkle root over SBOM nodes/edges, policies, feeds, scan params, and tool versions. Any data change → new graph hash → new revisioned verdicts; surface the ID in UI/API.
- **Machine-verifiable receipts (DSSE)**: emit a DSSE-wrapped in-toto statement per verdict (predicate `stellaops.dev/verdict@v1`) including graphRevisionId, artifact digests, rule id/version, inputs, and timestamps; sign with Authority keys (offline mode supported) and keep receipts queryable/exportable; mirror to Rekor-compatible ledger when online.
- **Reachability evidence**: attach call-stack slices (entry→sink, symbols, file:line) for code-level cases; for binaries, include symbol presence proofs (bitmap/offsets) hashed and referenced from DSSE payloads.
- **Deterministic replay manifests**: publish `replay.manifest.json` with inputs, feeds, rule/tool/container digests so auditors can recompute the same graph hash and verdicts offline.
## Benchmarks to publish (headline KPIs)
- **False-positive reduction vs baseline scanners**: run public corpus across 34 popular scanners; label ground truth once; report mean and p95 FP reduction.
- **Proof coverage**: percentage of findings/VEX items carrying valid DSSE receipts; break out reachable vs unreachable and “not affected.”
- **Triage time saved**: analyst minutes from alert to final disposition with receipts visible vs hidden; publish p50/p95 deltas.
- **Determinism stability**: re-run identical scans across nodes; publish % identical graph hashes and explain drift causes when different.
## Minimal implementation plan (week-by-week)
- **Week 1 Primitives**: add Graph Revision ID generator in scanner.webservice (Merkle over normalized SBOM+edges+policies+toolVersions); define `VerdictReceipt` schema (protobuf/JSON) and DSSE envelope types.
- **Week 2 Signing + storage**: wire DSSE signing via Authority with offline key support/rotation; persist receipts in `Receipts` table keyed by (graphRevisionId, verdictId); enable JSONL export and ledger mirror.
- **Week 3 Reachability proofs**: capture call-stack slices in reachability engine; serialize and hash; add binary symbol proof module (ELF/PE bitmap + digest) and reference from receipts.
- **Week 4 Replay + UX**: emit replay.manifest per scan (inputs, tool digests); UI shows “Verified” badge, graph hash, signature issuer, and one-click “Copy receipt”; API: `GET /verdicts/{id}/receipt`, `GET /graphs/{rev}/replay`.
- **Week 5 Benchmarks harness**: create `bench/` fixtures and runner with baseline scanner adapters, ground-truth labels, metrics export for FP%, proof coverage, triage time capture hooks.
## Developer guardrails (non-negotiable)
- **No receipt, no ship**: any surfaced verdict must carry a DSSE receipt; fail closed otherwise.
- **Schema freeze windows**: changes to rule inputs or policy logic must bump rule version and therefore graph hash.
- **Replay-first CI**: PRs touching scanning/rules must pass a replay test that reproduces prior graph hashes on gold fixtures.
- **Clock safety**: use monotonic time for receipts; include UTC wall-time separately.
## What to show buyers/auditors
- Audit kit: sample container + receipts + replay manifest + one command to reproduce the same graph hash.
- One-page benchmark readout: FP reduction, proof coverage, triage time saved (p50/p95), corpus description.
## Optional follow-ons
- Provide DSSE predicate schema, Postgres DDL for `Receipts` and `Graphs`, and a minimal .NET verification CLI (`stellaops-verify`) that replays a manifest and validates signatures.