up

2025-11-26 20:23:28 +02:00
parent 4831c7fcb0
commit d63af51f84
139 changed files with 8010 additions and 2795 deletions
--- a/docs/benchmarks/signals/bench-determinism.md
+++ b/docs/benchmarks/signals/bench-determinism.md
@@ -1,4 +1,4 @@
-# Determinism Benchmark (cross-scanner) — Draft
+# Determinism Benchmark (cross-scanner) — Stable (2025-11)

 Source: advisory “23-Nov-2025 - Benchmarking Determinism in Vulnerability Scoring”. This doc captures the runnable harness pattern and expected outputs for task BENCH-DETERMINISM-401-057.

@@ -34,6 +34,7 @@ for sbom, vex in zip(SBOMS, VEXES):
 - 3–5 SBOMs (CycloneDX 1.6 / SPDX 3.0.1) + matching VEX docs covering affected/not_affected/fixed.
 - Feeds bundle: vendor DBs (NVD, GHSA, OVAL) hashed and frozen.
 - Policy: single normalization profile (e.g., prefer vendor scores, CVSS v3.1).
+- Reachability dataset (optional combined run): `tests/reachability/samples-public` corpus; graphs produced via `stella graph lift` for each language sample; runtime traces optional.

 ## Metrics
 - Determinism rate = identical_hash_runs / total_runs (20 per scanner/SBOM).
@@ -42,8 +43,51 @@ for sbom, vex in zip(SBOMS, VEXES):

 ## Deliverables
 - `bench/determinism/` with harness, hashed inputs, and `results.csv`.
- CI target `bench:determinism` producing determinism% and σ per scanner.
+- `bench/determinism/inputs.sha256` listing SBOM, VEX, feed bundle hashes (deterministic ordering).
+- `bench/reachability/dataset.sha256` listing reachability corpus inputs (graphs, runtime traces) when running combined bench.
+- CI target `bench:determinism` producing determinism% and σ per scanner; optional `bench:reachability` to recompute graph hash and runtime hit stability.

 ## Links
 - Source advisory: `docs/product-advisories/23-Nov-2025 - Benchmarking Determinism in Vulnerability Scoring.md`
 - Sprint task: BENCH-DETERMINISM-401-057 (SPRINT_0401_0001_0001_reachability_evidence_chain.md)
+
+---
+
+## How to run (local)
+
+```sh
+cd bench/determinism
+python3 -m venv .venv && . .venv/bin/activate
+pip install -r requirements.txt
+
+# Freeze feeds and policy hashes
+./freeze_feeds.sh ../feeds/bundle.tar.gz > inputs.sha256
+
+# Run determinism bench
+python run_bench.py --sboms inputs/sboms/*.json --vex inputs/vex/*.json \
+  --scanners configs/scanners.yaml --runs 20 --shuffle
+
+# Reachability dataset (optional)
+python run_reachability.py --graphs ../reachability/graphs/*.json \
+  --runtime ../reachability/runtime/*.ndjson.gz --output results-reach.csv
+```
+
+Outputs are written to `results.csv` (determinism) and `results-reach.csv` (reachability stability) plus SHA manifests.
+
+## How to run (CI)
+
+- Target `bench:determinism` in CI (see `.gitea/workflows/bench-determinism.yml`) runs the harness with frozen feeds and uploads `results.csv` + `inputs.sha256` as artifacts.
+- The optional `bench:reachability` target replays the reachability corpus, recomputes graph hashes, and compares them against the expected `dataset.sha256`.
+- CI must fail if the determinism rate is below 0.95 or if any graph hash does not match.
+
+## Offline/air-gap workflow
+
+1. Place the feeds bundle, SBOMs, VEX documents, and reachability corpus under `offline/inputs/` with matching `inputs.sha256` and `dataset.sha256`.
+2. Run `./offline_run.sh --inputs offline/inputs --outputs offline/results` to execute both benches without network.
+3. Verify hashes: `sha256sum -c offline/inputs/inputs.sha256` and `sha256sum -c offline/inputs/dataset.sha256`.
+4. Store the outputs and manifests in the Offline Kit; include a DSSE envelope if signing is enabled (`./sign_results.sh`).
+
+## Notes
+- Keep file ordering deterministic (lexicographic) when generating manifests.
+- Do not pull live feeds during bench runs; use frozen bundles only.
+- Reachability runs require a symbol manifest; if it is unavailable, mark the symbols as missing and fail the run.
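
The determinism rate from the Metrics section (identical_hash_runs / total_runs) and the CI gate (fail below 0.95) can be sketched as follows. This is a minimal illustration, not the harness's actual code: the names `output_hash`, `determinism_rate`, and `gate` are hypothetical, and treating the most frequent output hash as the reference for "identical" runs is an assumption, since the doc does not say how the reference hash is chosen.

```python
import hashlib
from collections import Counter

# Gate value taken from the "How to run (CI)" section.
THRESHOLD = 0.95


def output_hash(payload: bytes) -> str:
    """Return the SHA-256 hex digest of one scanner run's raw output."""
    return hashlib.sha256(payload).hexdigest()


def determinism_rate(hashes: list[str]) -> float:
    """Compute identical_hash_runs / total_runs.

    Assumption: the reference hash is the most frequent one, so
    identical_hash_runs is the size of the largest group of equal hashes.
    """
    if not hashes:
        raise ValueError("no runs recorded")
    _, identical = Counter(hashes).most_common(1)[0]
    return identical / len(hashes)


def gate(rate: float, threshold: float = THRESHOLD) -> bool:
    """Return True when the run passes, i.e. the rate is not below the threshold."""
    return rate >= threshold


if __name__ == "__main__":
    # 20 runs per scanner/SBOM, one of which produced a different output.
    runs = [b'{"score": 7.5}'] * 19 + [b'{"score": 7.4}']
    rate = determinism_rate([output_hash(r) for r in runs])
    print(f"determinism rate = {rate:.2f}, pass = {gate(rate)}")
```

With 20 runs per scanner/SBOM pair, a 0.95 gate tolerates at most one divergent run; a second divergent run drops the rate to 0.90 and fails the build.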