StellaOps Bot d63af51f84

2025-11-26 20:23:28 +02:00

Determinism Benchmark (cross-scanner) — Stable (2025-11)

Source: advisory “23-Nov-2025 - Benchmarking Determinism in Vulnerability Scoring”. This doc captures the runnable harness pattern and expected outputs for task BENCH-DETERMINISM-401-057.

Goal

Measure the determinism rate, order-invariance, and CVSS delta σ (the standard deviation of score deltas) across scanners that are fed identical SBOM+VEX inputs with frozen feeds.

Minimal harness (Python excerpt)

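# run_bench.py (excerpt): deterministic JSON hashing
import hashlib
import json

# SBOMS, VEXES, SCANNERS, shuffle(), run(), and normalize() are assumed to be
# defined elsewhere in run_bench.py; they are outside this excerpt.
results = []

def canon(obj):
    # Canonical JSON: sorted keys and compact separators give stable bytes.
    return json.dumps(obj, sort_keys=True, separators=(',', ':')).encode()

def shas(b):
    return hashlib.sha256(b).hexdigest()

for sbom, vex in zip(SBOMS, VEXES):
    for scanner, tmpl in SCANNERS.items():
        for mode in ("canonical", "shuffled"):
            for i in range(10):
                # Canonical runs use the inputs as-is; shuffled runs reorder them.
                sb, vx = sbom, vex
                if mode == "shuffled":
                    sb, vx = shuffle(sbom), shuffle(vex)
                out = run(tmpl.format(sbom=sb, vex=vx))
                norm = normalize(out)  # purl, vuln id, base_cvss, effective
                # Hash against the original input names so both modes are comparable.
                blob = canon({"scanner": scanner, "sbom": sbom,
                              "vex": vex, "findings": norm})
                results.append({
                    "hash": shas(blob), "mode": mode,
                    "run": i, "scanner": scanner, "sbom": sbom
                })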
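To illustrate why the harness hashes canonical JSON rather than raw scanner output, the sketch below shows that key order never affects the hash, while list order does unless findings are sorted first. The `sort_findings` helper and the sample findings are hypothetical and are not taken from run_bench.py; the real `normalize()` may order findings differently.

```python
import hashlib
import json


def canon(obj):
    # Same canonical form as the harness: sorted keys, compact separators.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


def shas(b):
    return hashlib.sha256(b).hexdigest()


def sort_findings(findings):
    # Hypothetical helper: order findings by (purl, vuln id) so that the
    # list order of scanner output cannot change the hash.
    return sorted(findings, key=lambda f: (f["purl"], f["id"]))


# Made-up sample findings, for illustration only.
a = {"purl": "pkg:npm/example-a@1.0.0", "id": "VULN-1", "base_cvss": 7.5}
b = {"id": "VULN-2", "base_cvss": 5.3, "purl": "pkg:npm/example-b@2.0.0"}

# Key order inside an object is irrelevant after canonicalization.
a_reversed = dict(reversed(list(a.items())))
same_keys = shas(canon(a)) == shas(canon(a_reversed))

# List order still matters unless findings are sorted first.
raw_equal = shas(canon([a, b])) == shas(canon([b, a]))
sorted_equal = (shas(canon(sort_findings([a, b])))
                == shas(canon(sort_findings([b, a]))))

print(same_keys, raw_equal, sorted_equal)  # True False True
```

If a scanner emits findings in input order, a sort step of this kind inside normalization is what keeps the shuffled runs comparable with the canonical ones.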

Inputs

3–5 SBOMs (CycloneDX 1.6 / SPDX 3.0.1) + matching VEX docs covering affected/not_affected/fixed.
Feeds bundle: vendor DBs (NVD, GHSA, OVAL) hashed and frozen.
Policy: single normalization profile (e.g., prefer vendor scores, CVSS v3.1).
Reachability dataset (optional combined run): tests/reachability/samples-public corpus; graphs produced via stella graph lift for each language sample; runtime traces optional.

Metrics

Determinism rate = identical_hash_runs / total_runs (20 per scanner/SBOM).
Order-invariance failures: number of distinct hashes between canonical and shuffled runs.
CVSS delta σ vs reference; VEX stability (σ_after ≤ σ_before).
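The three metrics above could be computed from the harness's result rows roughly as follows. This is a sketch under stated assumptions, not the project's implementation: "identical_hash_runs" is taken to mean the runs that share the most common hash in a scanner/SBOM group, an order-invariance failure is counted as a shuffled-run hash that never appears among the canonical runs, and σ is the population standard deviation. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import pstdev


def determinism_rate(rows):
    # Assumption: identical_hash_runs = runs sharing the most common hash
    # within one (scanner, sbom) group.
    groups = defaultdict(list)
    for r in rows:
        groups[(r["scanner"], r["sbom"])].append(r["hash"])
    return {
        key: Counter(hashes).most_common(1)[0][1] / len(hashes)
        for key, hashes in groups.items()
    }


def order_invariance_failures(rows):
    # Assumption: a failure is a hash seen in shuffled runs that never
    # appears among the canonical runs of the same group.
    canonical, shuffled = defaultdict(set), defaultdict(set)
    for r in rows:
        target = canonical if r["mode"] == "canonical" else shuffled
        target[(r["scanner"], r["sbom"])].add(r["hash"])
    return {key: len(shuffled[key] - canonical[key]) for key in canonical}


def cvss_delta_sigma(scores, reference):
    # Population standard deviation of per-finding score deltas.
    return pstdev([s - r for s, r in zip(scores, reference)])


# Made-up rows in the shape the harness appends to `results`.
rows = [
    {"scanner": "s1", "sbom": "a.json", "mode": "canonical", "run": 0, "hash": "h1"},
    {"scanner": "s1", "sbom": "a.json", "mode": "canonical", "run": 1, "hash": "h1"},
    {"scanner": "s1", "sbom": "a.json", "mode": "shuffled", "run": 0, "hash": "h1"},
    {"scanner": "s1", "sbom": "a.json", "mode": "shuffled", "run": 1, "hash": "h2"},
]

print(determinism_rate(rows))           # {('s1', 'a.json'): 0.75}
print(order_invariance_failures(rows))  # {('s1', 'a.json'): 1}
```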

Deliverables

bench/determinism/ with harness, hashed inputs, and results.csv.
bench/determinism/inputs.sha256 listing SBOM, VEX, feed bundle hashes (deterministic ordering).
bench/reachability/dataset.sha256 listing reachability corpus inputs (graphs, runtime traces) when running combined bench.
CI target bench:determinism producing determinism% and σ per scanner; optional bench:reachability to recompute graph hash and runtime hit stability.
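A manifest such as inputs.sha256 or dataset.sha256 could be produced along the following lines. This is a minimal sketch that assumes the manifests use the `sha256sum` text format (hex digest, two spaces, relative path) and that "deterministic ordering" means lexicographic order of relative POSIX paths; `file_sha256` and `build_manifest` are hypothetical names, not part of the repository.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    # Stream the file so large feed bundles are not loaded into memory at once.
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    # Sort relative POSIX paths lexicographically so the manifest bytes do
    # not depend on filesystem enumeration order.
    root = Path(root)
    files = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
    # Assumed line format: "<hex digest>  <relative path>".
    return "".join(f"{file_sha256(root / rel)}  {rel}\n" for rel in files)
```

Writing the returned text to a file outside `root` (or excluding the manifest itself from the walk) avoids the manifest hashing its own previous contents.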

How to run (local)

cd bench/determinism
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Freeze feeds and policy hashes
./freeze_feeds.sh ../feeds/bundle.tar.gz > inputs.sha256

# Run determinism bench
python run_bench.py --sboms inputs/sboms/*.json --vex inputs/vex/*.json \
  --scanners configs/scanners.yaml --runs 20 --shuffle

# Reachability dataset (optional)
python run_reachability.py --graphs ../reachability/graphs/*.json \
  --runtime ../reachability/runtime/*.ndjson.gz --output results-reach.csv

Outputs are written to results.csv (determinism) and results-reach.csv (reachability stability) plus SHA manifests.

How to run (CI)

The bench:determinism CI target (see .gitea/workflows/bench-determinism.yml) runs the harness with frozen feeds and uploads results.csv + inputs.sha256 as artifacts.
The optional bench:reachability target replays the reachability corpus, recomputes graph hashes, and compares them against the expected dataset.sha256.
CI must fail if the determinism rate is below 0.95 or if any graph hash does not match.
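The determinism threshold could be enforced by a small gate script like the one sketched here. The actual schema of results.csv is not given in this document, so the column names `scanner`, `sbom`, and `hash` (one row per run) are assumptions, as are the function names; the rate is computed per scanner/SBOM group against the most common hash.

```python
import csv
import io
from collections import Counter, defaultdict

THRESHOLD = 0.95  # minimum determinism rate required by the CI rule


def failing_groups(csv_text, threshold=THRESHOLD):
    # Assumed columns: scanner, sbom, hash (one row per run).
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[(row["scanner"], row["sbom"])].append(row["hash"])
    failures = {}
    for key, hashes in groups.items():
        rate = Counter(hashes).most_common(1)[0][1] / len(hashes)
        if rate < threshold:
            failures[key] = rate
    return failures


def main(path):
    # Returns a process exit code: 1 if any group is below the threshold.
    with open(path, newline="") as fh:
        bad = failing_groups(fh.read())
    for (scanner, sbom), rate in sorted(bad.items()):
        print(f"FAIL {scanner} {sbom}: determinism rate {rate:.2f}")
    return 1 if bad else 0
```

A CI step could then call `sys.exit(main("results.csv"))` so that a non-zero exit code fails the job.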

Offline/air-gap workflow

Place feeds bundle, SBOMs, VEX, and reachability corpus under offline/inputs/ with matching inputs.sha256 and dataset.sha256.
Run ./offline_run.sh --inputs offline/inputs --outputs offline/results to execute both benches without network.
Verify hashes: sha256sum -c offline/inputs/inputs.sha256 and sha256sum -c offline/inputs/dataset.sha256.
Store outputs plus manifests in Offline Kit; include DSSE envelope if signing is enabled (./sign_results.sh).

Notes

Keep file ordering deterministic (lexicographic) when generating manifests.
Do not pull live feeds during bench runs; use frozen bundles only.
For reachability runs, require symbol manifest availability; otherwise mark missing symbols and fail the run.
