# Determinism Benchmark (cross-scanner) — Stable (2025-11)

Source: advisory “23-Nov-2025 - Benchmarking Determinism in Vulnerability Scoring”. This doc captures the runnable harness pattern and expected outputs for task BENCH-DETERMINISM-401-057.

## Goal

- Measure determinism rate, order-invariance, and CVSS delta σ across scanners when they are fed identical SBOM+VEX inputs with frozen feeds.

## Minimal harness (Python excerpt)

```python
# run_bench.py (excerpt): deterministic JSON hashing
import hashlib
import json


def canon(obj):
    # Canonical JSON: sorted keys, no insignificant whitespace, so equal findings hash equally.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


def shas(b):
    return hashlib.sha256(b).hexdigest()


# SBOMS, VEXES, SCANNERS, run, normalize and shuffle are defined elsewhere in run_bench.py (not shown).
results = []
for sbom, vex in zip(SBOMS, VEXES):
    for scanner, tmpl in SCANNERS.items():
        for mode in ("canonical", "shuffled"):
            for i in range(10):  # 10 runs per mode -> 20 runs per scanner/SBOM
                # Canonical runs use the inputs as-is; shuffled runs permute element order.
                sb, vx = (shuffle(sbom), shuffle(vex)) if mode == "shuffled" else (sbom, vex)
                out = run(tmpl.format(sbom=sb, vex=vx))
                norm = normalize(out)  # purl, vuln id, base_cvss, effective
                # The hashed blob excludes mode/run, so canonical and shuffled outputs are comparable.
                blob = canon({"scanner": scanner, "sbom": sbom, "vex": vex, "findings": norm})
                results.append({
                    "hash": shas(blob), "mode": mode, "run": i,
                    "scanner": scanner, "sbom": sbom,
                })
```

## Inputs

- 3–5 SBOMs (CycloneDX 1.6 / SPDX 3.0.1) plus matching VEX docs covering the affected / not_affected / fixed statuses.
- Feeds bundle: vendor DBs (NVD, GHSA, OVAL), hashed and frozen.
- Policy: a single normalization profile (e.g., prefer vendor scores, CVSS v3.1).
- Reachability dataset (optional combined run): the `tests/reachability/samples-public` corpus, with graphs produced via `stella graph lift` for each language sample; runtime traces are optional.

## Metrics

- Determinism rate = identical_hash_runs / total_runs (20 runs per scanner/SBOM pair: 10 canonical + 10 shuffled).
- Order-invariance failures: number of distinct hashes between canonical and shuffled runs.
- CVSS delta σ vs. reference; VEX stability (σ_after ≤ σ_before).

## Deliverables

- Harness at `src/Bench/StellaOps.Bench/Determinism` (offline-friendly mock scanner included).
- `results/*.csv` with per-run hashes, plus `summary.json` with the determinism rate.
- `results/inputs.sha256` listing SBOM, VEX, and config hashes (deterministic ordering).
- `bench/reachability/dataset.sha256` listing reachability corpus inputs (graphs, runtime traces) when running the combined bench.
- CI target `bench:determinism` producing determinism % and σ per scanner; optional `bench:reachability` to recompute the graph hash and runtime hit stability.

## Links

- Source advisory: `docs/product-advisories/23-Nov-2025 - Benchmarking Determinism in Vulnerability Scoring.md`
- Sprint task: BENCH-DETERMINISM-401-057 (SPRINT_0401_0001_0001_reachability_evidence_chain.md)

---

## How to run (local)

```sh
cd src/Bench/StellaOps.Bench/Determinism

# Run determinism bench (uses built-in mock scanner by default; defaults to 10 runs)
python run_bench.py --sboms inputs/sboms/*.json --vex inputs/vex/*.json \
  --config configs/scanners.json --shuffle --output results

# Reachability dataset (optional)
python run_reachability.py --graphs ../reachability/graphs/*.json \
  --runtime ../reachability/runtime/*.ndjson.gz --output results-reach.csv
```

Outputs are written to `results/` (determinism: per-run CSV and `summary.json`) and `results-reach.csv` (reachability stability), plus SHA manifests.

## How to run (CI)

- Workflow `.gitea/workflows/bench-determinism.yml` calls `scripts/bench/determinism-run.sh`, which runs the harness with the bundled mock scanner and uploads `out/bench-determinism/**` (results, manifests, summary). Set `DET_EXTRA_INPUTS` to include frozen feed bundles in `inputs.sha256`.
- The optional `bench:reachability` target (future) will replay the reachability corpus, recompute graph hashes, and compare them against the expected `dataset.sha256`.
- CI fails when `determinism_rate` < `BENCH_DETERMINISM_THRESHOLD` (defaults to 0.95; set via env in the workflow).

## Offline/air-gap workflow

1. Place the feeds bundle, SBOMs, VEX, and reachability corpus under `offline/inputs/` with matching `inputs.sha256` and `dataset.sha256`.
2. Run `./offline_run.sh --inputs offline/inputs --outputs offline/results` to execute both benches without network access.
3. Verify hashes: `sha256sum -c offline/inputs/inputs.sha256` and `sha256sum -c offline/inputs/dataset.sha256`.
4. Store the outputs plus manifests in the Offline Kit; include a DSSE envelope if signing is enabled (`./sign_results.sh`).

## Notes

- Keep file ordering deterministic (lexicographic) when generating manifests.
- Do not pull live feeds during bench runs; use frozen bundles only.
- For reachability runs, require the symbol manifest to be available; if it is missing, mark the missing symbols and fail the run.
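
The Metrics definitions do not spell out how "identical" runs are counted or how the CI threshold is applied. The following is a minimal sketch of a summary step over the rows the harness excerpt appends (`hash`, `mode`, `run`, `scanner`, `sbom`). It is not the harness's own code: the helper names `determinism_summary`, `passes_threshold`, and `cvss_delta_sigma` are hypothetical, and it assumes that a run counts as identical when its hash matches the most common hash for its (scanner, SBOM) pair, that order-invariance failures are hashes seen in only one of the two modes, and that the 0.95 gate applies to every pair.

```python
# Hypothetical post-processing of the per-run rows produced by the run_bench.py excerpt.
# Assumptions (not confirmed by the harness): a run is "identical" when its hash equals the
# most common hash for its (scanner, sbom) pair; order-invariance failures are the hashes
# seen in only one of the two modes; the CI gate applies per pair.
from collections import Counter, defaultdict
from statistics import pstdev


def determinism_summary(results):
    """Group rows by (scanner, sbom) and compute the two determinism metrics."""
    groups = defaultdict(list)
    for row in results:
        groups[(row["scanner"], row["sbom"])].append(row)

    summary = {}
    for key in sorted(groups):
        rows = groups[key]
        # Count of the modal hash = number of runs that agree with the most common output.
        _, modal_count = Counter(r["hash"] for r in rows).most_common(1)[0]
        canonical = {r["hash"] for r in rows if r["mode"] == "canonical"}
        shuffled = {r["hash"] for r in rows if r["mode"] == "shuffled"}
        summary[key] = {
            "determinism_rate": modal_count / len(rows),
            # Symmetric difference: hashes that appear in one mode but not the other.
            "order_invariance_failures": len(canonical ^ shuffled),
        }
    return summary


def passes_threshold(summary, threshold=0.95):
    """Mirror the CI rule: fail if any pair's determinism_rate is below the threshold."""
    return all(m["determinism_rate"] >= threshold for m in summary.values())


def cvss_delta_sigma(scores, reference):
    """Population standard deviation of (score - reference) over vuln IDs in both maps."""
    deltas = [scores[v] - reference[v] for v in sorted(scores.keys() & reference.keys())]
    return pstdev(deltas) if deltas else 0.0
```

With 10 canonical runs that all hash to `a` and 10 shuffled runs of which one hashes to `b`, this sketch reports a determinism rate of 0.95 and one order-invariance failure, which still passes the default threshold. Using the modal hash rather than the first run as the reference avoids penalizing a pair just because its first run was the outlier.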
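
The lexicographic-ordering note and the `sha256sum -c` verification step fit together if manifests are written in the GNU `sha256sum` line format (hex digest, two spaces, path) with paths sorted before hashing. Below is a minimal sketch of such a writer; `write_manifest` and `sha256_file` are hypothetical helpers, not part of the harness, and the sketch assumes paths are recorded exactly as passed (relative to the directory `sha256sum -c` will later run from) and contain no newlines or backslashes.

```python
# Hypothetical manifest writer producing `sha256sum -c`-compatible lines in lexicographic order.
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large feed bundles do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(paths, manifest_path):
    # Normalize to POSIX separators, de-duplicate, and sort so the manifest bytes are stable.
    entries = sorted({Path(p).as_posix() for p in paths})
    lines = [f"{sha256_file(p)}  {p}\n" for p in entries]
    # newline="\n" keeps line endings identical across platforms.
    with open(manifest_path, "w", encoding="utf-8", newline="\n") as fh:
        fh.writelines(lines)
    return lines
```

Sorting the de-duplicated path strings means the same input set always yields a byte-identical manifest regardless of the order in which a shell glob or directory listing returned the files.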