# Determinism Benchmark (cross-scanner) — Stable (2025-11)

Source: advisory “23-Nov-2025 - Benchmarking Determinism in Vulnerability Scoring”. This doc captures the runnable harness pattern and expected outputs for task BENCH-DETERMINISM-401-057.

## Goal

- Measure determinism rate, order-invariance, and CVSS delta σ across scanners when fed identical SBOM+VEX inputs with frozen feeds.

## Minimal harness (Python excerpt)

```python
# run_bench.py (excerpt): deterministic JSON hashing
import hashlib
import json

# SBOMS, VEXES, SCANNERS, results, shuffle(), run(), and normalize() are
# assumed to be defined elsewhere in run_bench.py (not shown in this excerpt).

def canon(obj):
    # Canonical JSON bytes: sorted keys, no insignificant whitespace.
    return json.dumps(obj, sort_keys=True, separators=(',', ':')).encode()

def shas(b):
    # Hex SHA-256 digest of a byte string.
    return hashlib.sha256(b).hexdigest()

for sbom, vex in zip(SBOMS, VEXES):
    for scanner, tmpl in SCANNERS.items():
        for mode in ("canonical", "shuffled"):
            for i in range(10):
                # Canonical runs use the inputs as-is; shuffled runs reorder them.
                sb, vx = sbom, vex
                if mode == "shuffled":
                    sb, vx = shuffle(sbom), shuffle(vex)
                out = run(tmpl.format(sbom=sb, vex=vx))
                norm = normalize(out)  # purl, vuln id, base_cvss, effective
                blob = canon({"scanner": scanner, "sbom": sbom,
                              "vex": vex, "findings": norm})
                results.append({
                    "hash": shas(blob), "mode": mode,
                    "run": i, "scanner": scanner, "sbom": sbom
                })
```

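The excerpt above calls a `shuffle` helper that it does not show. The sketch below is one possible way to produce a reordered input, assuming the SBOM or VEX document is available as parsed JSON and that array order carries no meaning for the scanner. The name `shuffle_doc` and the `seed` parameter are hypothetical and are not taken from the harness.

```python
import random

def shuffle_doc(doc, seed=0):
    """Return a copy of a parsed JSON document with every list reordered.

    Hypothetical helper: a seeded RNG keeps the shuffled variant
    reproducible, so a failing run can be replayed with the same ordering.
    """
    rng = random.Random(seed)

    def walk(node):
        if isinstance(node, dict):
            # Key order is preserved; only list contents are permuted.
            return {k: walk(v) for k, v in node.items()}
        if isinstance(node, list):
            items = [walk(v) for v in node]
            rng.shuffle(items)
            return items
        return node  # scalars are immutable and can be shared

    return walk(doc)

# Example input: a tiny CycloneDX-like document (illustrative only).
sbom = {"bomFormat": "CycloneDX",
        "components": [{"name": n} for n in "abcdef"]}
shuffled = shuffle_doc(sbom, seed=42)
```

Shuffling every array is a blunt choice. If some arrays in a given format are order-sensitive, a real helper would need to restrict the permutation to the arrays that are safe to reorder.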
## Inputs

- 3–5 SBOMs (CycloneDX 1.6 / SPDX 3.0.1) + matching VEX docs covering affected/not_affected/fixed.
- Feeds bundle: vendor DBs (NVD, GHSA, OVAL) hashed and frozen.
- Policy: single normalization profile (e.g., prefer vendor scores, CVSS v3.1).
- Reachability dataset (optional combined run): `tests/reachability/samples-public` corpus; graphs produced via `stella graph lift` for each language sample; runtime traces optional.

## Metrics

- Determinism rate = identical_hash_runs / total_runs (20 runs per scanner/SBOM pair: 10 canonical + 10 shuffled).
- Order-invariance failures: number of distinct hashes between canonical and shuffled runs.
- CVSS delta σ vs. reference; VEX stability (σ_after ≤ σ_before).

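The metrics above can be sketched from the per-run records that the harness appends to `results`. This is a minimal illustration, not the harness's actual summary code. It rests on two assumptions: a run counts as "identical" when its hash equals the most common hash for its scanner/SBOM pair, and an order-invariance failure is a hash seen in only one of the two modes. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import pstdev

def determinism_summary(results):
    """Group per-run records by (scanner, sbom) and compute both metrics."""
    groups = defaultdict(list)
    for r in results:
        groups[(r["scanner"], r["sbom"])].append(r)

    summary = {}
    for key, runs in groups.items():
        counts = Counter(r["hash"] for r in runs)
        # Assumption: "identical" means matching the most common hash.
        modal_count = counts.most_common(1)[0][1]
        canonical = {r["hash"] for r in runs if r["mode"] == "canonical"}
        shuffled = {r["hash"] for r in runs if r["mode"] == "shuffled"}
        summary[key] = {
            "determinism_rate": modal_count / len(runs),
            # Assumption: count hashes that appear in only one mode.
            "order_invariance_failures": len(canonical ^ shuffled),
        }
    return summary

def cvss_delta_sigma(scores, reference):
    """Population standard deviation of (score - reference) over shared keys."""
    shared = scores.keys() & reference.keys()
    deltas = [scores[k] - reference[k] for k in shared]
    return pstdev(deltas) if deltas else 0.0
```

Under these assumptions, the VEX stability check σ_after ≤ σ_before would compare two `cvss_delta_sigma` values, one per score set.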
## Deliverables

- Harness at `src/Bench/StellaOps.Bench/Determinism` (offline-friendly mock scanner included).
- `results/*.csv` with per-run hashes, plus `summary.json` with the determinism rate.
- `results/inputs.sha256` listing SBOM, VEX, and config hashes (deterministic ordering).
- `bench/reachability/dataset.sha256` listing reachability corpus inputs (graphs, runtime traces) when running the combined bench.
- CI target `bench:determinism` producing determinism% and σ per scanner; optional `bench:reachability` to recompute graph hash and runtime hit stability.

## Links

- Source advisory: `docs/product-advisories/23-Nov-2025 - Benchmarking Determinism in Vulnerability Scoring.md`
- Sprint task: BENCH-DETERMINISM-401-057 (SPRINT_0401_0001_0001_reachability_evidence_chain.md)

---

## How to run (local)

```sh
cd src/Bench/StellaOps.Bench/Determinism

# Run determinism bench (uses built-in mock scanner by default; defaults to 10 runs)
python run_bench.py --sboms inputs/sboms/*.json --vex inputs/vex/*.json \
  --config configs/scanners.json --shuffle --output results

# Reachability dataset (optional)
python run_reachability.py --graphs inputs/graphs/*.json \
  --runtime inputs/runtime/*.ndjson --output results
```

Outputs are written to `results.csv` (determinism), `results-reach.csv`/`results-reach.json` (reachability hashes), and the manifests `inputs.sha256` and `dataset.sha256` (the latter only for reachability runs). Feed bundle hashes are recorded in `inputs.sha256` when provided via `DET_EXTRA_INPUTS`.

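A manifest such as `inputs.sha256` could be produced along the following lines. This is a sketch under assumptions, not the harness's own writer: the `sha256sum`-style line layout (`<digest>  <relative path>`) and the function names are hypothetical, since this doc does not specify the manifest format.

```python
import hashlib
from pathlib import Path

def sha256_file(path):
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Return manifest text, one '<digest>  <relative path>' line per file.

    Paths are sorted lexicographically in POSIX form so the manifest is
    byte-identical across platforms and directory listing orders.
    """
    root = Path(root)
    rel_paths = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
    return "".join(f"{sha256_file(root / rel)}  {rel}\n" for rel in rel_paths)
```

Sorting the relative paths before hashing is what keeps two runs over the same inputs from producing differently ordered manifests.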
## How to run (CI)

- Workflow `.gitea/workflows/bench-determinism.yml` calls `scripts/bench/determinism-run.sh`, which runs the harness with the bundled mock scanner and uploads `out/bench-determinism/**` (results, manifests, summary). Set `DET_EXTRA_INPUTS` to include frozen feed bundles in `inputs.sha256`; the optional `DET_REACH_GRAPHS`/`DET_REACH_RUNTIME` settings add reachability hashes and `dataset.sha256`.
- The optional `bench:reachability` target (future) will replay the reachability corpus, recompute graph hashes, and compare them against the expected `dataset.sha256`.
- CI fails when `determinism_rate` < `BENCH_DETERMINISM_THRESHOLD` (defaults to 0.95; set via env in the workflow).

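The threshold gate described above might look roughly like this. It is a sketch only: it assumes `summary.json` exposes a top-level `determinism_rate` number, which this doc does not confirm, and the function name `check_threshold` is hypothetical.

```python
import os

def check_threshold(summary, env=os.environ, default=0.95):
    """Return (passed, rate, threshold) for a parsed summary.

    Assumption: `summary` is the parsed content of summary.json and has a
    top-level "determinism_rate" value. The threshold comes from the
    BENCH_DETERMINISM_THRESHOLD environment variable, falling back to 0.95.
    """
    threshold = float(env.get("BENCH_DETERMINISM_THRESHOLD", default))
    rate = float(summary["determinism_rate"])
    # The run fails only when the rate is strictly below the threshold.
    return rate >= threshold, rate, threshold
```

A CI script could load the summary with `json.load`, call `check_threshold`, and exit non-zero when the first element of the result is `False`.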
## Offline/air-gap workflow

1. Place the feeds bundle (see `src/Bench/StellaOps.Bench/Determinism/inputs/feeds/README.md`), SBOMs, VEX, and optional reachability corpus under `offline/inputs/` with matching `inputs.sha256` and (if reachability) `dataset.sha256`. A sample `inputs/inputs.sha256` is provided for the bundled demo SBOM/VEX/config.
2. Run `./offline_run.sh --inputs offline/inputs --output offline/results` (the script lives under `src/Bench/StellaOps.Bench/Determinism`) to execute the benches without network access (defaults: runs=10, threshold=0.95; manifest verification on). Use `--no-verify` to skip hash checks if manifests are absent.
3. Store the outputs plus manifests in the Offline Kit; include a DSSE envelope if signing is enabled (`./sign_results.sh`).

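The manifest verification mentioned in step 2 can be illustrated with a short check. This is not the logic of `offline_run.sh`; it is a sketch that assumes `sha256sum`-style manifest lines (`<hex digest>  <relative path>`), and `verify_manifest` is a hypothetical name.

```python
import hashlib
from pathlib import Path

def verify_manifest(manifest_path, root):
    """Return a list of problems; an empty list means every entry matched.

    Assumes each non-empty manifest line is '<hex digest>  <relative path>',
    with paths relative to `root`.
    """
    problems = []
    root = Path(root)
    text = Path(manifest_path).read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue
        expected, _, rel = line.partition("  ")
        target = root / rel
        if not target.is_file():
            problems.append(f"missing: {rel}")
        elif hashlib.sha256(target.read_bytes()).hexdigest() != expected:
            problems.append(f"hash mismatch: {rel}")
    return problems
```

Collecting every problem before failing, rather than stopping at the first mismatch, gives a complete picture of which inputs drifted.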
## Notes

- Keep file ordering deterministic (lexicographic) when generating manifests.
- Do not pull live feeds during bench runs; use frozen bundles only.
- For reachability runs, require symbol manifest availability; otherwise mark missing symbols and fail the run.