Determinism Benchmark (cross-scanner) — Stable (2025-11)
Source: advisory “23-Nov-2025 - Benchmarking Determinism in Vulnerability Scoring”. This doc captures the runnable harness pattern and expected outputs for task BENCH-DETERMINISM-401-057.
Goal
- Measure determinism rate, order-invariance, and CVSS delta σ across scanners when fed identical SBOM+VEX inputs with frozen feeds.
Minimal harness (Python excerpt)
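```python
# run_bench.py (excerpt): deterministic JSON hashing
import hashlib
import json

# SBOMS, VEXES, SCANNERS, run(), normalize() and shuffle() are defined
# elsewhere in run_bench.py and are not shown in this excerpt.

def canon(obj):
    """Canonical JSON bytes: sorted keys, no insignificant whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(',', ':')).encode()

def shas(b):
    return hashlib.sha256(b).hexdigest()

results = []
for sbom, vex in zip(SBOMS, VEXES):
    for scanner, tmpl in SCANNERS.items():
        for mode in ("canonical", "shuffled"):
            for i in range(10):
                # Canonical runs use the inputs as-is; shuffled runs permute order.
                if mode == "shuffled":
                    sb, vx = shuffle(sbom), shuffle(vex)
                else:
                    sb, vx = sbom, vex
                out = run(tmpl.format(sbom=sb, vex=vx))
                norm = normalize(out)  # purl, vuln id, base_cvss, effective
                # Hash the original input names (not the shuffled copies) so
                # canonical and shuffled runs are directly comparable.
                blob = canon({"scanner": scanner, "sbom": sbom,
                              "vex": vex, "findings": norm})
                results.append({
                    "hash": shas(blob), "mode": mode,
                    "run": i, "scanner": scanner, "sbom": sbom,
                })
```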
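The excerpt calls `normalize(out)` but does not show it. The sketch below is hypothetical: it assumes the scanner output has already been parsed into a list of dicts with `purl`, `vuln_id`, `base_cvss`, and `effective` keys (field names guessed from the inline comment in the excerpt). Sorting the reduced rows on a total key is what lets the canonical hash stay the same regardless of the order in which a scanner reports findings.

```python
import json

def normalize(findings):
    """Hypothetical normalize(): keep only the compared fields and sort them.

    `findings` is assumed to be a list of dicts parsed from scanner output.
    """
    rows = [
        {
            "purl": f["purl"],
            "vuln_id": f["vuln_id"],
            "base_cvss": f["base_cvss"],
            "effective": f["effective"],
        }
        for f in findings
    ]
    # Primary order by package and vulnerability; the full canonical JSON of
    # the row breaks any remaining ties, so the result is a total order.
    return sorted(
        rows,
        key=lambda r: (str(r["purl"]), str(r["vuln_id"]),
                       json.dumps(r, sort_keys=True)),
    )
```

Any extra fields a scanner emits (timestamps, run IDs) are dropped here, which is the point: they would otherwise make every run hash differently.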
Inputs
- 3–5 SBOMs (CycloneDX 1.6 / SPDX 3.0.1) + matching VEX docs covering affected/not_affected/fixed.
- Feeds bundle: vendor DBs (NVD, GHSA, OVAL) hashed and frozen.
- Policy: single normalization profile (e.g., prefer vendor scores, CVSS v3.1).
- Reachability dataset (optional combined run): `tests/reachability/samples-publiccorpus`; graphs produced via `stella graph lift` for each language sample; runtime traces optional.
Metrics
- Determinism rate = identical_hash_runs / total_runs (20 runs per scanner/SBOM pair: 10 canonical + 10 shuffled).
- Order-invariance failures: number of distinct hashes between canonical and shuffled runs.
- CVSS delta σ: standard deviation of score deltas against the reference; VEX stability: σ_after ≤ σ_before.
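A minimal sketch of how these metrics could be computed from the `results` rows the harness excerpt builds (`hash`, `mode`, `run`, `scanner`, `sbom`). Two readings here are assumptions rather than taken from the harness: "identical_hash_runs" is interpreted as the number of runs matching the most common hash for a scanner/SBOM pair, and an order-invariance failure is counted once per distinct shuffled-run hash that never appears among the canonical runs. `cvss_delta_sigma` assumes per-finding scores keyed by a hypothetical finding identifier such as `(purl, vuln_id)`.

```python
from collections import Counter, defaultdict
from statistics import pstdev

def determinism_metrics(results):
    """Per (scanner, sbom) pair: determinism rate and order-invariance failures."""
    groups = defaultdict(list)
    for r in results:
        groups[(r["scanner"], r["sbom"])].append(r)
    summary = {}
    for key, rows in sorted(groups.items()):
        hashes = [r["hash"] for r in rows]
        # Assumed reading: runs agreeing with the most common (modal) hash.
        modal_count = Counter(hashes).most_common(1)[0][1]
        canonical = {r["hash"] for r in rows if r["mode"] == "canonical"}
        shuffled = {r["hash"] for r in rows if r["mode"] == "shuffled"}
        summary[key] = {
            "determinism_rate": modal_count / len(rows),
            "order_invariance_failures": len(shuffled - canonical),
        }
    return summary

def cvss_delta_sigma(scores, reference):
    """Population std dev of (scanner score - reference score) over shared findings."""
    deltas = [scores[k] - reference[k] for k in sorted(scores.keys() & reference.keys())]
    return pstdev(deltas) if deltas else 0.0
```

For the VEX stability check, the same `cvss_delta_sigma` could be evaluated on scores before and after VEX is applied and the two values compared.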
Deliverables
- Harness at `src/Bench/StellaOps.Bench/Determinism` (offline-friendly mock scanner included).
- `results/*.csv` with per-run hashes plus `summary.json` with the determinism rate.
- `results/inputs.sha256` listing SBOM, VEX, and config hashes (deterministic ordering).
- `bench/reachability/dataset.sha256` listing reachability corpus inputs (graphs, runtime traces) when running the combined bench.
- CI target `bench:determinism` producing determinism % and σ per scanner; optional `bench:reachability` to recompute the graph hash and runtime hit stability.
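The manifests above must list hashes in a deterministic (lexicographic) order. A minimal sketch of writing such a manifest, assuming a `sha256sum`-style line format (`<hex>  <relative path>`); the format and the `write_manifest` name are assumptions for illustration, not the harness's confirmed output.

```python
import hashlib
from pathlib import Path

def sha256_file(path):
    """Stream a file through SHA-256 so large feed bundles are not loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(rel_paths, manifest_path, root="."):
    """Hypothetical helper: write '<sha256>  <path>' lines sorted by path."""
    root = Path(root)
    entries = sorted(Path(p).as_posix() for p in rel_paths)  # lexicographic
    lines = [f"{sha256_file(root / p)}  {p}\n" for p in entries]
    # newline="\n" keeps line endings identical across operating systems.
    with open(manifest_path, "w", encoding="utf-8", newline="\n") as fh:
        fh.writelines(lines)
    return lines
```

Sorting the relative POSIX paths (rather than relying on directory listing order) is what keeps the manifest byte-identical between machines.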
Links
- Source advisory: `docs/product-advisories/23-Nov-2025 - Benchmarking Determinism in Vulnerability Scoring.md`
- Sprint task: BENCH-DETERMINISM-401-057 (`SPRINT_0401_0001_0001_reachability_evidence_chain.md`)
How to run (local)
```bash
cd src/Bench/StellaOps.Bench/Determinism
# Run determinism bench (uses built-in mock scanner by default; defaults to 10 runs)
python run_bench.py --sboms inputs/sboms/*.json --vex inputs/vex/*.json \
  --config configs/scanners.json --shuffle --output results
# Reachability dataset (optional)
python run_reachability.py --graphs inputs/graphs/*.json \
  --runtime inputs/runtime/*.ndjson --output results
```
Outputs are written to `results.csv` (determinism), `results-reach.csv`/`results-reach.json` (reachability hashes), and the manifests `inputs.sha256` and `dataset.sha256` (the latter only for reachability runs). Frozen feed-bundle hashes are added to `inputs.sha256` when supplied via `DET_EXTRA_INPUTS`.
How to run (CI)
- Workflow `.gitea/workflows/bench-determinism.yml` calls `scripts/bench/determinism-run.sh`, which runs the harness with the bundled mock scanner and uploads `out/bench-determinism/**` (results, manifests, summary). Set `DET_EXTRA_INPUTS` to include frozen feed bundles in `inputs.sha256`; the optional `DET_REACH_GRAPHS`/`DET_REACH_RUNTIME` add reachability hashes and `dataset.sha256`.
- An optional `bench:reachability` target (future) will replay the reachability corpus, recompute graph hashes, and compare them against the expected `dataset.sha256`.
- CI fails when `determinism_rate` < `BENCH_DETERMINISM_THRESHOLD` (defaults to 0.95; set via env in the workflow).
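A minimal sketch of the threshold gate, assuming `summary.json` carries a top-level `determinism_rate` number (the key name is taken from the failure condition above; the file layout and the `gate` function are assumptions). It passes when the rate meets or exceeds the threshold and returns a non-zero exit code otherwise.

```python
import json
import os

def gate(summary_path, env=os.environ):
    """Hypothetical CI gate: 0 if determinism_rate >= threshold, else 1."""
    threshold = float(env.get("BENCH_DETERMINISM_THRESHOLD", "0.95"))
    with open(summary_path, encoding="utf-8") as fh:
        rate = float(json.load(fh)["determinism_rate"])  # assumed key name
    ok = rate >= threshold
    print(f"determinism_rate={rate:.4f} threshold={threshold:.2f} -> {'PASS' if ok else 'FAIL'}")
    return 0 if ok else 1
```

A CI step could call `sys.exit(gate("results/summary.json"))` so the job status follows the return value.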
Offline/air-gap workflow
- Place the feeds bundle (see `src/Bench/StellaOps.Bench/Determinism/inputs/feeds/README.md`), SBOMs, VEX, and the optional reachability corpus under `offline/inputs/` with a matching `inputs.sha256` and (if reachability) `dataset.sha256`. A sample `inputs/inputs.sha256` is provided for the bundled demo SBOM/VEX/config.
- Run `./offline_run.sh --inputs offline/inputs --output offline/results` (the script lives under `src/Bench/StellaOps.Bench/Determinism`) to execute the benches without network access (defaults: runs=10, threshold=0.95; manifest verification on). Use `--no-verify` to skip hash checks if manifests are absent.
- Store outputs plus manifests in the Offline Kit; include a DSSE envelope if signing is enabled (`./sign_results.sh`).
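Manifest verification (on by default, skipped with `--no-verify`) amounts to re-hashing each listed file and comparing. A minimal sketch, assuming the `sha256sum`-style format `<hex>  <relative path>` (optionally `*path` for binary mode); `verify_manifest` is a hypothetical name, not the script's actual implementation.

```python
import hashlib
from pathlib import Path

def verify_manifest(manifest_path, root="."):
    """Hypothetical check: return listed paths that are missing or whose hash differs."""
    root = Path(root)
    mismatches = []
    for line in Path(manifest_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        expected, rel = line.split(maxsplit=1)
        rel = rel.strip()
        if rel.startswith("*"):  # sha256sum binary-mode marker
            rel = rel[1:]
        target = root / rel
        if not target.is_file():
            mismatches.append(rel)
            continue
        if hashlib.sha256(target.read_bytes()).hexdigest() != expected.lower():
            mismatches.append(rel)
    return mismatches
```

An empty list means the offline inputs match their manifest; anything else should stop the run before benches execute.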
Notes
- Keep file ordering deterministic (lexicographic) when generating manifests.
- Do not pull live feeds during bench runs; use frozen bundles only.
- For reachability runs, a symbol manifest is required; if it is unavailable, mark the missing symbols and fail the run.