# Determinism Benchmark (cross-scanner) — Stable (2025-11)
Source: advisory “23-Nov-2025 - Benchmarking Determinism in Vulnerability Scoring”. This doc captures the runnable harness pattern and expected outputs for task BENCH-DETERMINISM-401-057.
## Goal
- Measure determinism rate, order-invariance, and CVSS delta σ across scanners when fed identical SBOM+VEX inputs with frozen feeds.
## Minimal harness (Python excerpt)
```python
# run_bench.py (excerpt): deterministic JSON hashing
import hashlib
import json

def canon(obj):
    # Sorted keys and compact separators give byte-stable JSON.
    return json.dumps(obj, sort_keys=True, separators=(',', ':')).encode()

def shas(b):
    return hashlib.sha256(b).hexdigest()

# SBOMS, VEXES, SCANNERS, shuffle, run and normalize are not shown in this excerpt.
results = []
for sbom, vex in zip(SBOMS, VEXES):
    for scanner, tmpl in SCANNERS.items():
        for mode in ("canonical", "shuffled"):
            for i in range(10):  # 10 runs per mode, 20 per scanner/SBOM
                sb, vx = sbom, vex  # canonical mode uses the inputs unchanged
                if mode == "shuffled":
                    sb, vx = shuffle(sbom), shuffle(vex)
                out = run(tmpl.format(sbom=sb, vex=vx))
                norm = normalize(out)  # purl, vuln id, base_cvss, effective
                blob = canon({"scanner": scanner, "sbom": sbom,
                              "vex": vex, "findings": norm})
                results.append({
                    "hash": shas(blob), "mode": mode,
                    "run": i, "scanner": scanner, "sbom": sbom
                })
```
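The excerpt calls `shuffle` and `normalize` without showing them. A minimal sketch of what such helpers might look like follows; it assumes inputs are already parsed into dicts and that findings carry the keys `purl`, `vuln_id`, `base_cvss`, and `effective`. Both the signatures and the key names are hypothetical, and the real harness may work on file paths instead.

```python
import copy
import random

def shuffle(doc, seed=0):
    """Return a copy of a parsed document with its top-level lists reordered.

    Hypothetical helper: only top-level list order is perturbed, and the
    RNG is seeded so the bench itself stays repeatable.
    """
    rng = random.Random(seed)
    out = copy.deepcopy(doc)  # never mutate the caller's input
    for value in out.values():
        if isinstance(value, list):
            rng.shuffle(value)
    return out

def normalize(findings):
    """Project findings onto the compared fields and sort them.

    Sorting by (purl, vuln_id) removes scanner output order from the hash.
    """
    rows = [
        {
            "purl": f["purl"],
            "vuln_id": f["vuln_id"],
            "base_cvss": f["base_cvss"],
            "effective": f["effective"],
        }
        for f in findings
    ]
    return sorted(rows, key=lambda r: (r["purl"], r["vuln_id"]))
```

Dropping every field outside the compared set keeps scanner-specific metadata such as timestamps out of the hash, so only scoring differences register as non-determinism.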
## Inputs
- 35 SBOMs (CycloneDX 1.6 / SPDX 3.0.1) + matching VEX docs covering affected/not_affected/fixed.
- Feeds bundle: vendor DBs (NVD, GHSA, OVAL) hashed and frozen.
- Policy: single normalization profile (e.g., prefer vendor scores, CVSS v3.1).
- Reachability dataset (optional combined run): `tests/reachability/samples-public` corpus; graphs produced via `stella graph lift` for each language sample; runtime traces optional.
## Metrics
- Determinism rate = identical_hash_runs / total_runs (20 runs per scanner/SBOM: 10 canonical + 10 shuffled).
- Order-invariance failures: number of distinct hashes that differ between canonical and shuffled runs.
- CVSS delta σ vs reference; VEX stability (σ_after ≤ σ_before).
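The metrics above can be sketched over the per-run `results` records. This is one possible reading, not the harness's confirmed definition: it assumes "identical" means matching the most common hash within a scanner/SBOM group, counts an order-invariance failure for each shuffled-mode hash never seen in canonical mode, and uses the population standard deviation for σ. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import pstdev

def determinism_rate(results):
    """Share of runs whose hash equals the modal hash of their scanner/SBOM group."""
    groups = defaultdict(list)
    for r in results:
        groups[(r["scanner"], r["sbom"])].append(r["hash"])
    identical = sum(Counter(h).most_common(1)[0][1] for h in groups.values())
    total = sum(len(h) for h in groups.values())
    return identical / total if total else 0.0

def order_invariance_failures(results):
    """Count shuffled-mode hashes that never occur in canonical mode."""
    canonical, shuffled = defaultdict(set), defaultdict(set)
    for r in results:
        bucket = canonical if r["mode"] == "canonical" else shuffled
        bucket[(r["scanner"], r["sbom"])].add(r["hash"])
    return sum(len(shuffled[k] - canonical[k]) for k in shuffled)

def cvss_delta_sigma(scores, reference):
    """Population std. dev. of (scanner score - reference score) over shared keys."""
    deltas = [scores[k] - reference[k] for k in scores if k in reference]
    return pstdev(deltas) if deltas else 0.0
```

Under the same assumptions, the VEX stability check amounts to calling `cvss_delta_sigma` once on base scores and once on effective scores, then asserting that the second value does not exceed the first.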
## Deliverables
- Harness at `src/Bench/StellaOps.Bench/Determinism` (offline-friendly mock scanner included).
- `results/*.csv` with per-run hashes plus `summary.json` determinism rate.
- `results/inputs.sha256` listing SBOM, VEX, and config hashes (deterministic ordering).
- `bench/reachability/dataset.sha256` listing reachability corpus inputs (graphs, runtime traces) when running combined bench.
- CI target `bench:determinism` producing determinism% and σ per scanner; optional `bench:reachability` to recompute graph hash and runtime hit stability.
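A manifest such as `results/inputs.sha256` could be produced along the following lines. This is a sketch, assuming the `<digest>  <path>` line format that `sha256sum` emits in text mode; `write_manifest` is a hypothetical name, not a function confirmed to exist in the harness.

```python
import hashlib
from pathlib import Path

def write_manifest(paths, manifest_path):
    """Write '<sha256>  <path>' lines, sorted lexicographically by path."""
    lines = []
    # Sorting the POSIX-style path strings makes the manifest independent
    # of filesystem enumeration order.
    for p in sorted(Path(x).as_posix() for x in paths):
        digest = hashlib.sha256(Path(p).read_bytes()).hexdigest()
        lines.append(f"{digest}  {p}")
    text = "".join(line + "\n" for line in lines)
    Path(manifest_path).write_text(text, encoding="utf-8")
    return lines
```

Because the entries are sorted before writing, two runs over the same inputs yield byte-identical manifests, which lets the manifest itself be hashed or signed.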
## Links
- Source advisory: `docs/product-advisories/23-Nov-2025 - Benchmarking Determinism in Vulnerability Scoring.md`
- Sprint task: BENCH-DETERMINISM-401-057 (SPRINT_0401_0001_0001_reachability_evidence_chain.md)
---
## How to run (local)
```sh
cd src/Bench/StellaOps.Bench/Determinism
# Run determinism bench (uses built-in mock scanner by default; defaults to 10 runs)
python run_bench.py --sboms inputs/sboms/*.json --vex inputs/vex/*.json \
  --config configs/scanners.json --shuffle --output results
# Reachability dataset (optional)
python run_reachability.py --graphs ../reachability/graphs/*.json \
  --runtime ../reachability/runtime/*.ndjson.gz --output results-reach.csv
```
Determinism outputs are written under `results/` (per-run CSVs and `summary.json`) and reachability stability results to `results-reach.csv`, plus SHA manifests.
## How to run (CI)
- Workflow `.gitea/workflows/bench-determinism.yml` calls `scripts/bench/determinism-run.sh`, which runs the harness with the bundled mock scanner and uploads `out/bench-determinism/**` (results, manifests, summary). Set `DET_EXTRA_INPUTS` to include frozen feed bundles in `inputs.sha256`.
- Optional `bench:reachability` target (future) will replay the reachability corpus, recompute graph hashes, and compare them against the expected `dataset.sha256`.
- CI fails when `determinism_rate` < `BENCH_DETERMINISM_THRESHOLD` (defaults to 0.95; set via env in the workflow).
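The threshold check in the last bullet could look roughly like this. It assumes `summary.json` exposes a top-level `determinism_rate` key; that layout is not confirmed by this document, and the `gate` function is hypothetical.

```python
import json
import os

def gate(summary_path, default_threshold=0.95):
    """Return 0 if determinism_rate meets the threshold, 1 otherwise."""
    # The environment variable overrides the default, mirroring the workflow setting.
    threshold = float(os.environ.get("BENCH_DETERMINISM_THRESHOLD", default_threshold))
    with open(summary_path, encoding="utf-8") as fh:
        rate = float(json.load(fh)["determinism_rate"])  # assumed key layout
    print(f"determinism_rate={rate:.4f} threshold={threshold:.4f}")
    return 0 if rate >= threshold else 1
```

A wrapper script would pass the return value to `sys.exit` so that the CI job fails on a non-zero status.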
## Offline/air-gap workflow
1. Place feeds bundle, SBOMs, VEX, and reachability corpus under `offline/inputs/` with matching `inputs.sha256` and `dataset.sha256`.
2. Run `./offline_run.sh --inputs offline/inputs --outputs offline/results` to execute both benches without network.
3. Verify hashes: `sha256sum -c offline/inputs/inputs.sha256` and `sha256sum -c offline/inputs/dataset.sha256`.
4. Store outputs plus manifests in Offline Kit; include DSSE envelope if signing is enabled (`./sign_results.sh`).
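Where `sha256sum` is unavailable on the air-gapped host, step 3 can be approximated in Python. The sketch below assumes manifests use the `<digest>  <path>` format that `sha256sum` writes; `verify_manifest` and its `base_dir` parameter are hypothetical and not part of the harness as described here.

```python
import hashlib
from pathlib import Path

def verify_manifest(manifest_path, base_dir="."):
    """Return the entries whose file is missing or whose digest differs."""
    failures = []
    for line in Path(manifest_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with a leading '*'
        target = Path(base_dir) / name
        if not target.is_file():
            failures.append(name)
            continue
        if hashlib.sha256(target.read_bytes()).hexdigest() != expected.lower():
            failures.append(name)
    return failures
```

An empty return value corresponds to `sha256sum -c` reporting every entry as OK; any returned name should stop the run before results are stored in the Offline Kit.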
## Notes
- Keep file ordering deterministic (lexicographic) when generating manifests.
- Do not pull live feeds during bench runs; use frozen bundles only.
- For reachability runs, require symbol manifest availability; otherwise mark missing symbols and fail the run.