# Determinism Benchmark (cross-scanner): Draft

Source: advisory “23-Nov-2025 - Benchmarking Determinism in Vulnerability Scoring”. This doc captures the runnable harness pattern and expected outputs for task BENCH-DETERMINISM-401-057.

## Goal

- Measure determinism rate, order-invariance, and CVSS delta σ across scanners when they are fed identical SBOM+VEX inputs with frozen feeds.

## Minimal harness (Python excerpt)

```python
# run_bench.py (excerpt): deterministic JSON hashing
import hashlib
import json

# SBOMS, VEXES, SCANNERS, shuffle, run, and normalize are not shown in this excerpt.

def canon(obj):
    """Serialize to canonical JSON bytes (sorted keys, compact separators)."""
    return json.dumps(obj, sort_keys=True, separators=(',', ':')).encode()

def shas(b):
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(b).hexdigest()

results = []
for sbom, vex in zip(SBOMS, VEXES):
    for scanner, tmpl in SCANNERS.items():
        for mode in ("canonical", "shuffled"):
            for i in range(10):
                # Canonical runs use the inputs as-is; shuffled runs reorder them.
                if mode == "shuffled":
                    sb, vx = shuffle(sbom), shuffle(vex)
                else:
                    sb, vx = sbom, vex
                out = run(tmpl.format(sbom=sb, vex=vx))
                norm = normalize(out)  # purl, vuln id, base_cvss, effective
                # Hash against the original input names so both modes are comparable.
                blob = canon({"scanner": scanner, "sbom": sbom, "vex": vex, "findings": norm})
                results.append({
                    "hash": shas(blob), "mode": mode, "run": i,
                    "scanner": scanner, "sbom": sbom,
                })
```

## Inputs

- 3–5 SBOMs (CycloneDX 1.6 / SPDX 3.0.1) plus matching VEX docs covering the affected/not_affected/fixed statuses.
- Feeds bundle: vendor DBs (NVD, GHSA, OVAL), hashed and frozen.
- Policy: a single normalization profile (e.g., prefer vendor scores, CVSS v3.1).

## Metrics

- Determinism rate = identical_hash_runs / total_runs (20 runs per scanner/SBOM pair: 10 canonical + 10 shuffled).
- Order-invariance failures: number of distinct hashes between canonical and shuffled runs.
- CVSS delta σ vs. reference; VEX stability (σ_after ≤ σ_before).

## Deliverables

- `bench/determinism/` with harness, hashed inputs, and `results.csv`.
- CI target `bench:determinism` producing determinism % and σ per scanner.

## Links

- Source advisory: `docs/product-advisories/23-Nov-2025 - Benchmarking Determinism in Vulnerability Scoring.md`
- Sprint task: BENCH-DETERMINISM-401-057 (SPRINT_0401_0001_0001_reachability_evidence_chain.md)
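
## Metrics sketch (illustrative)

The metrics listed under Metrics could be derived from the harness rows roughly as follows. This is a minimal sketch, not the harness's actual implementation, and it rests on several assumptions that the advisory does not confirm: `identical_hash_runs` is taken to mean the runs that match the most common hash in a scanner/SBOM group; an order-invariance failure is counted as a hash seen in only one of the two modes; and σ is the population standard deviation of per-vulnerability score deltas. The function names, scanner name, file name, and vulnerability ids are all hypothetical.

```python
from collections import Counter, defaultdict
from statistics import pstdev


def determinism_rate(rows):
    """Share of runs matching the modal hash, per (scanner, sbom). Assumed definition."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["scanner"], r["sbom"])].append(r["hash"])
    return {
        key: Counter(hashes).most_common(1)[0][1] / len(hashes)
        for key, hashes in groups.items()
    }


def order_invariance_failures(rows):
    """Count of hashes seen in only one mode, per (scanner, sbom). Assumed definition."""
    seen = defaultdict(lambda: {"canonical": set(), "shuffled": set()})
    for r in rows:
        seen[(r["scanner"], r["sbom"])][r["mode"]].add(r["hash"])
    return {key: len(m["canonical"] ^ m["shuffled"]) for key, m in seen.items()}


def cvss_delta_sigma(scores, reference):
    """Population std dev of (scanner score - reference score) over shared vuln ids."""
    deltas = [scores[v] - reference[v] for v in scores if v in reference]
    return pstdev(deltas) if deltas else 0.0


# Hypothetical rows shaped like the harness output: 10 canonical + 10 shuffled runs,
# where two shuffled runs produce a different hash.
key = ("scanner-a", "app.cdx.json")
rows = [
    {"scanner": key[0], "sbom": key[1], "mode": "canonical", "run": i, "hash": "h1"}
    for i in range(10)
] + [
    {"scanner": key[0], "sbom": key[1], "mode": "shuffled", "run": i,
     "hash": "h1" if i < 8 else "h2"}
    for i in range(10)
]

rate = determinism_rate(rows)[key]                 # 18 of 20 runs share "h1"
failures = order_invariance_failures(rows)[key]    # "h2" appears only when shuffled
sigma = cvss_delta_sigma(
    {"VULN-1": 7.5, "VULN-2": 5.0},   # hypothetical scanner scores
    {"VULN-1": 7.5, "VULN-2": 6.0},   # hypothetical reference scores
)
print(f"determinism={rate:.2f} order_failures={failures} cvss_delta_sigma={sigma:.2f}")
```

Using the modal hash as the baseline keeps the rate meaningful even when no single run is designated as the reference; if the harness instead compares every run against the first canonical run, the numerator would need to change accordingly.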