
Commit ea970ead2a (parent d63af51f84) by StellaOps Bot, 2025-11-27 07:46:56 +02:00
302 changed files with 43161 additions and 1534 deletions

View File

@@ -0,0 +1,45 @@
# Determinism Benchmark Harness (BENCH-DETERMINISM-401-057)

Location: `src/Bench/StellaOps.Bench/Determinism`

## What it does

- Runs a deterministic, offline-friendly benchmark that hashes scanner outputs for paired SBOM/VEX inputs.
- Produces `results.csv`, `inputs.sha256`, and `summary.json`, capturing the determinism rate.
- Ships with a built-in mock scanner so CI and offline runs do not need external tools.

## Quick start

```sh
cd src/Bench/StellaOps.Bench/Determinism
python3 run_bench.py --shuffle --runs 3 --output out
```

Outputs land in `out/`:

- `results.csv`: per-run hashes (mode/run/scanner)
- `inputs.sha256`: deterministic manifest of the SBOM/VEX/config inputs
- `summary.json`: aggregate determinism rate

## Inputs

- SBOMs: `inputs/sboms/*.json` (sample SPDX provided)
- VEX: `inputs/vex/*.json` (sample OpenVEX provided)
- Scanner config: `configs/scanners.json` (defaults to the built-in mock scanner; see the CLI example below for overriding paths)
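
All of these locations, along with the run count and output directory, can be overridden from the CLI. A fuller invocation, sketched from the flags `run_bench.py` defines (the `feeds/*.tar.gz` glob is only a hypothetical example of a frozen-feed bundle), might look like:

```sh
python3 run_bench.py \
  --sboms 'inputs/sboms/*.json' \
  --vex 'inputs/vex/*.json' \
  --config configs/scanners.json \
  --runs 5 \
  --shuffle \
  --output out \
  --manifest-extra 'feeds/*.tar.gz'
```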
## Adding real scanners

1. Add an entry to `configs/scanners.json` with `kind: "command"` and a command array, e.g.:

   ```json
   {
     "name": "scannerX",
     "kind": "command",
     "command": ["python", "../../scripts/scannerX_wrapper.py", "{sbom}", "{vex}"]
   }
   ```

2. Commands must write JSON to stdout with a top-level `findings` array; each finding should include `purl`, `vulnerability`, `status`, and `base_score`.
3. Keep commands offline and deterministic; pin any feeds to local bundles before running. A wrapper sketch follows this list.
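
For orientation, the sketch below shows what such a wrapper could look like. It is illustrative only: `scannerX_wrapper.py` is the hypothetical path already used in the example above, the argument order (`{sbom}` then `{vex}`) mirrors that command array, and the findings are derived straight from the VEX statements; a real wrapper would run the actual scanner and map its report into this shape.

```python
#!/usr/bin/env python3
"""Hypothetical command-scanner wrapper emitting harness-compatible findings."""
import json
import sys


def main() -> None:
    # The harness substitutes {sbom} and {vex} with the input file paths.
    sbom_path, vex_path = sys.argv[1], sys.argv[2]
    with open(vex_path, encoding="utf-8") as handle:
        vex = json.load(handle)

    findings = []
    for stmt in vex.get("statements", []):
        for product in stmt.get("products", []):
            findings.append(
                {
                    "purl": product,
                    "vulnerability": stmt.get("vulnerability", ""),
                    "status": stmt.get("status", "unknown"),
                    # A real wrapper would take the score from the scanner's own
                    # report and would also feed sbom_path to the scanner.
                    "base_score": 0.0,
                }
            )

    # The harness parses this process's stdout as a single JSON document.
    json.dump({"scanner": "scannerX", "findings": findings}, sys.stdout, sort_keys=True)


if __name__ == "__main__":
    main()
```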
## Determinism expectations

- Canonical and shuffled runs should yield identical hashes for each scanner/SBOM/VEX tuple.
- CI should treat a `determinism_rate` below 0.95 as a failure once the harness is wired into workflows. A possible gate is sketched below.
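
A minimal gate along those lines, assuming the workflow step runs from the harness directory and that `out/summary.json` was produced by the preceding command (the exact workflow wiring is not defined here), might be:

```sh
python3 run_bench.py --shuffle --runs 10 --output out
python3 - <<'PY'
import json
import sys

rate = json.load(open("out/summary.json", encoding="utf-8"))["determinism_rate"]
print(f"determinism_rate={rate:.3f}")
sys.exit(0 if rate >= 0.95 else 1)
PY
```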
## Maintenance

- Tests live in `tests/` and cover shuffle stability and manifest generation.
- Update `docs/benchmarks/signals/bench-determinism.md` when inputs or outputs change.
- Mirror task status in `docs/implplan/SPRINT_0512_0001_0001_bench.md` and `src/Bench/StellaOps.Bench/TASKS.md`.

View File

@@ -0,0 +1,12 @@
{
  "scanners": [
    {
      "name": "mock",
      "kind": "mock",
      "description": "Deterministic mock scanner used for CI/offline parity",
      "parameters": {
        "severity_bias": 0.25
      }
    }
  ]
}

View File

@@ -0,0 +1,16 @@
{
  "spdxVersion": "SPDX-3.0",
  "documentNamespace": "https://stellaops.local/spdx/sample-spdx",
  "packages": [
    {
      "name": "demo-lib",
      "versionInfo": "1.0.0",
      "purl": "pkg:pypi/demo-lib@1.0.0"
    },
    {
      "name": "demo-cli",
      "versionInfo": "0.4.2",
      "purl": "pkg:generic/demo-cli@0.4.2"
    }
  ]
}

View File

@@ -0,0 +1,19 @@
{
  "version": "1.0",
  "statements": [
    {
      "vulnerability": "CVE-2024-0001",
      "products": ["pkg:pypi/demo-lib@1.0.0"],
      "status": "affected",
      "justification": "known_exploited",
      "timestamp": "2025-11-01T00:00:00Z"
    },
    {
      "vulnerability": "CVE-2023-9999",
      "products": ["pkg:generic/demo-cli@0.4.2"],
      "status": "not_affected",
      "justification": "vulnerable_code_not_present",
      "timestamp": "2025-10-28T00:00:00Z"
    }
  ]
}

View File

@@ -0,0 +1,3 @@
38453c9c0e0a90d22d7048d3201bf1b5665eb483e6682db1a7112f8e4f4fa1e6 configs/scanners.json
577f932bbb00dbd596e46b96d5fbb9561506c7730c097e381a6b34de40402329 inputs/sboms/sample-spdx.json
1b54ce4087800cfe1d5ac439c10a1f131b7476b2093b79d8cd0a29169314291f inputs/vex/sample-openvex.json

View File

@@ -0,0 +1,21 @@
scanner,sbom,vex,mode,run,hash,finding_count
mock,sample-spdx.json,sample-openvex.json,canonical,0,d1cc5f0d22e863e457af589fb2c6c1737b67eb586338bccfe23ea7908c8a8b18,2
mock,sample-spdx.json,sample-openvex.json,shuffled,0,d1cc5f0d22e863e457af589fb2c6c1737b67eb586338bccfe23ea7908c8a8b18,2
mock,sample-spdx.json,sample-openvex.json,canonical,1,d1cc5f0d22e863e457af589fb2c6c1737b67eb586338bccfe23ea7908c8a8b18,2
mock,sample-spdx.json,sample-openvex.json,shuffled,1,d1cc5f0d22e863e457af589fb2c6c1737b67eb586338bccfe23ea7908c8a8b18,2
mock,sample-spdx.json,sample-openvex.json,canonical,2,d1cc5f0d22e863e457af589fb2c6c1737b67eb586338bccfe23ea7908c8a8b18,2
mock,sample-spdx.json,sample-openvex.json,shuffled,2,d1cc5f0d22e863e457af589fb2c6c1737b67eb586338bccfe23ea7908c8a8b18,2
mock,sample-spdx.json,sample-openvex.json,canonical,3,d1cc5f0d22e863e457af589fb2c6c1737b67eb586338bccfe23ea7908c8a8b18,2
mock,sample-spdx.json,sample-openvex.json,shuffled,3,d1cc5f0d22e863e457af589fb2c6c1737b67eb586338bccfe23ea7908c8a8b18,2
mock,sample-spdx.json,sample-openvex.json,canonical,4,d1cc5f0d22e863e457af589fb2c6c1737b67eb586338bccfe23ea7908c8a8b18,2
mock,sample-spdx.json,sample-openvex.json,shuffled,4,d1cc5f0d22e863e457af589fb2c6c1737b67eb586338bccfe23ea7908c8a8b18,2
mock,sample-spdx.json,sample-openvex.json,canonical,5,d1cc5f0d22e863e457af589fb2c6c1737b67eb586338bccfe23ea7908c8a8b18,2
mock,sample-spdx.json,sample-openvex.json,shuffled,5,d1cc5f0d22e863e457af589fb2c6c1737b67eb586338bccfe23ea7908c8a8b18,2
mock,sample-spdx.json,sample-openvex.json,canonical,6,d1cc5f0d22e863e457af589fb2c6c1737b67eb586338bccfe23ea7908c8a8b18,2
mock,sample-spdx.json,sample-openvex.json,shuffled,6,d1cc5f0d22e863e457af589fb2c6c1737b67eb586338bccfe23ea7908c8a8b18,2
mock,sample-spdx.json,sample-openvex.json,canonical,7,d1cc5f0d22e863e457af589fb2c6c1737b67eb586338bccfe23ea7908c8a8b18,2
mock,sample-spdx.json,sample-openvex.json,shuffled,7,d1cc5f0d22e863e457af589fb2c6c1737b67eb586338bccfe23ea7908c8a8b18,2
mock,sample-spdx.json,sample-openvex.json,canonical,8,d1cc5f0d22e863e457af589fb2c6c1737b67eb586338bccfe23ea7908c8a8b18,2
mock,sample-spdx.json,sample-openvex.json,shuffled,8,d1cc5f0d22e863e457af589fb2c6c1737b67eb586338bccfe23ea7908c8a8b18,2
mock,sample-spdx.json,sample-openvex.json,canonical,9,d1cc5f0d22e863e457af589fb2c6c1737b67eb586338bccfe23ea7908c8a8b18,2
mock,sample-spdx.json,sample-openvex.json,shuffled,9,d1cc5f0d22e863e457af589fb2c6c1737b67eb586338bccfe23ea7908c8a8b18,2

View File

@@ -0,0 +1,3 @@
{
  "determinism_rate": 1.0
}

View File

@@ -0,0 +1,309 @@
#!/usr/bin/env python3
"""
Determinism benchmark harness for BENCH-DETERMINISM-401-057.
- Offline by default; uses a built-in mock scanner that derives findings from
SBOM and VEX documents without external calls.
- Produces deterministic hashes for canonical and (optionally) shuffled inputs.
- Writes `results.csv` and `inputs.sha256` to the chosen output directory.
"""
from __future__ import annotations
import argparse
import csv
import hashlib
import json
import shutil
import subprocess
from copy import deepcopy
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, Iterable, List, Sequence
import random
@dataclass(frozen=True)
class Scanner:
name: str
kind: str # "mock" or "command"
command: Sequence[str] | None = None
parameters: Dict[str, Any] | None = None
# ---------- utility helpers ----------
def sha256_bytes(data: bytes) -> str:
return hashlib.sha256(data).hexdigest()
def load_json(path: Path) -> Any:
with path.open("r", encoding="utf-8") as f:
return json.load(f)
def dump_canonical(obj: Any) -> bytes:
return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
def shuffle_obj(obj: Any, rng: random.Random) -> Any:
if isinstance(obj, list):
shuffled = [shuffle_obj(item, rng) for item in obj]
rng.shuffle(shuffled)
return shuffled
if isinstance(obj, dict):
items = list(obj.items())
rng.shuffle(items)
return {k: shuffle_obj(v, rng) for k, v in items}
return obj # primitive
def stable_int(value: str, modulo: int) -> int:
digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
return int(digest[:16], 16) % modulo
# ---------- mock scanner ----------
def run_mock_scanner(sbom: Dict[str, Any], vex: Dict[str, Any], parameters: Dict[str, Any] | None) -> Dict[str, Any]:
severity_bias = float(parameters.get("severity_bias", 0.0)) if parameters else 0.0
packages = sbom.get("packages", [])
statements = vex.get("statements", [])
findings: List[Dict[str, Any]] = []
for stmt in statements:
vuln = stmt.get("vulnerability")
status = stmt.get("status", "unknown")
for product in stmt.get("products", []):
score_seed = stable_int(f"{product}:{vuln}", 600)
score = (score_seed / 10.0) + severity_bias
findings.append(
{
"purl": product,
"vulnerability": vuln,
"status": status,
"base_score": round(score, 1),
}
)
# Add packages with no statements as informational rows
seen_products = {f["purl"] for f in findings}
for pkg in packages:
purl = pkg.get("purl")
if purl and purl not in seen_products:
findings.append(
{
"purl": purl,
"vulnerability": "NONE",
"status": "unknown",
"base_score": 0.0,
}
)
findings.sort(key=lambda f: (f.get("purl", ""), f.get("vulnerability", "")))
return {"scanner": "mock", "findings": findings}
# ---------- runners ----------
def run_scanner(scanner: Scanner, sbom_path: Path, vex_path: Path, sbom_obj: Dict[str, Any], vex_obj: Dict[str, Any]) -> Dict[str, Any]:
if scanner.kind == "mock":
return run_mock_scanner(sbom_obj, vex_obj, scanner.parameters)
if scanner.kind == "command":
if scanner.command is None:
raise ValueError(f"Scanner {scanner.name} missing command")
cmd = [part.format(sbom=sbom_path, vex=vex_path) for part in scanner.command]
result = subprocess.run(cmd, check=True, capture_output=True, text=True)
return json.loads(result.stdout)
raise ValueError(f"Unsupported scanner kind: {scanner.kind}")
def canonical_hash(scanner_name: str, sbom_path: Path, vex_path: Path, normalized_findings: List[Dict[str, Any]]) -> str:
payload = {
"scanner": scanner_name,
"sbom": sbom_path.name,
"vex": vex_path.name,
"findings": normalized_findings,
}
return sha256_bytes(dump_canonical(payload))
def normalize_output(raw: Dict[str, Any]) -> List[Dict[str, Any]]:
findings = raw.get("findings", [])
normalized: List[Dict[str, Any]] = []
for entry in findings:
normalized.append(
{
"purl": entry.get("purl", ""),
"vulnerability": entry.get("vulnerability", ""),
"status": entry.get("status", "unknown"),
"base_score": float(entry.get("base_score", 0.0)),
}
)
normalized.sort(key=lambda f: (f["purl"], f["vulnerability"]))
return normalized
def write_results(results: List[Dict[str, Any]], output_csv: Path) -> None:
output_csv.parent.mkdir(parents=True, exist_ok=True)
fieldnames = ["scanner", "sbom", "vex", "mode", "run", "hash", "finding_count"]
with output_csv.open("w", encoding="utf-8", newline="") as f:
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
for row in results:
writer.writerow(row)
def write_inputs_manifest(inputs: List[Path], manifest_path: Path) -> None:
manifest_path.parent.mkdir(parents=True, exist_ok=True)
lines: List[str] = []
for path in sorted(inputs, key=lambda p: str(p)):
digest = sha256_bytes(path.read_bytes())
try:
rel_path = path.resolve().relative_to(Path.cwd().resolve())
except ValueError:
rel_path = path.resolve()
lines.append(f"{digest} {rel_path.as_posix()}\n")
with manifest_path.open("w", encoding="utf-8") as f:
f.writelines(lines)
def load_scanners(config_path: Path) -> List[Scanner]:
cfg = load_json(config_path)
scanners = []
for entry in cfg.get("scanners", []):
scanners.append(
Scanner(
name=entry.get("name", "unknown"),
kind=entry.get("kind", "mock"),
command=entry.get("command"),
parameters=entry.get("parameters", {}),
)
)
return scanners
def run_bench(
sboms: Sequence[Path],
vexes: Sequence[Path],
scanners: Sequence[Scanner],
runs: int,
shuffle: bool,
output_dir: Path,
manifest_extras: Sequence[Path] | None = None,
) -> List[Dict[str, Any]]:
if len(sboms) != len(vexes):
raise ValueError("SBOM/VEX counts must match for pairwise runs")
results: List[Dict[str, Any]] = []
for sbom_path, vex_path in zip(sboms, vexes):
sbom_obj = load_json(sbom_path)
vex_obj = load_json(vex_path)
for scanner in scanners:
for run in range(runs):
for mode in ("canonical", "shuffled" if shuffle else ""):
if not mode:
continue
sbom_candidate = deepcopy(sbom_obj)
vex_candidate = deepcopy(vex_obj)
if mode == "shuffled":
seed = sha256_bytes(f"{sbom_path}:{vex_path}:{run}:{scanner.name}".encode("utf-8"))
rng = random.Random(int(seed[:16], 16))
sbom_candidate = shuffle_obj(sbom_candidate, rng)
vex_candidate = shuffle_obj(vex_candidate, rng)
raw_output = run_scanner(scanner, sbom_path, vex_path, sbom_candidate, vex_candidate)
normalized = normalize_output(raw_output)
results.append(
{
"scanner": scanner.name,
"sbom": sbom_path.name,
"vex": vex_path.name,
"mode": mode,
"run": run,
"hash": canonical_hash(scanner.name, sbom_path, vex_path, normalized),
"finding_count": len(normalized),
}
)
output_dir.mkdir(parents=True, exist_ok=True)
return results
def compute_determinism_rate(results: List[Dict[str, Any]]) -> float:
by_key: Dict[tuple, List[str]] = {}
for row in results:
key = (row["scanner"], row["sbom"], row["vex"], row["mode"])
by_key.setdefault(key, []).append(row["hash"])
stable = 0
total = 0
for hashes in by_key.values():
total += len(hashes)
if len(set(hashes)) == 1:
stable += len(hashes)
return stable / total if total else 0.0
# ---------- CLI ----------
def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(description="Determinism benchmark harness")
parser.add_argument("--sboms", nargs="*", default=["inputs/sboms/*.json"], help="Glob(s) for SBOM inputs")
parser.add_argument("--vex", nargs="*", default=["inputs/vex/*.json"], help="Glob(s) for VEX inputs")
parser.add_argument("--config", default="configs/scanners.json", help="Scanner config JSON path")
parser.add_argument("--runs", type=int, default=10, help="Runs per scanner/SBOM pair")
parser.add_argument("--shuffle", action="store_true", help="Enable shuffled-order runs")
parser.add_argument("--output", default="results", help="Output directory")
parser.add_argument(
"--manifest-extra",
nargs="*",
default=[],
help="Extra files (or globs) to include in inputs.sha256 (e.g., frozen feeds)",
)
return parser.parse_args()
def expand_globs(patterns: Iterable[str]) -> List[Path]:
paths: List[Path] = []
for pattern in patterns:
if not pattern:
continue
for path in sorted(Path().glob(pattern)):
if path.is_file():
paths.append(path)
return paths
def main() -> None:
args = parse_args()
sboms = expand_globs(args.sboms)
vexes = expand_globs(args.vex)
manifest_extras = expand_globs(args.manifest_extra)
output_dir = Path(args.output)
if not sboms or not vexes:
raise SystemExit("No SBOM or VEX inputs found; supply --sboms/--vex globs")
scanners = load_scanners(Path(args.config))
if not scanners:
raise SystemExit("Scanner config has no entries")
results = run_bench(sboms, vexes, scanners, args.runs, args.shuffle, output_dir, manifest_extras)
results_csv = output_dir / "results.csv"
write_results(results, results_csv)
manifest_inputs = sboms + vexes + [Path(args.config)] + (manifest_extras or [])
write_inputs_manifest(manifest_inputs, output_dir / "inputs.sha256")
determinism = compute_determinism_rate(results)
summary_path = output_dir / "summary.json"
summary_path.write_text(json.dumps({"determinism_rate": determinism}, indent=2), encoding="utf-8")
print(f"Wrote {results_csv} (determinism_rate={determinism:.3f})")
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,61 @@
import sys
from pathlib import Path
from tempfile import TemporaryDirectory
import unittest
# Allow direct import of run_bench from the harness folder
HARNESS_DIR = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(HARNESS_DIR))
import run_bench # noqa: E402
class DeterminismBenchTests(unittest.TestCase):
    def setUp(self) -> None:
        self.base = HARNESS_DIR
        self.sboms = [self.base / "inputs" / "sboms" / "sample-spdx.json"]
        self.vexes = [self.base / "inputs" / "vex" / "sample-openvex.json"]
        self.scanners = run_bench.load_scanners(self.base / "configs" / "scanners.json")

    def test_canonical_and_shuffled_hashes_match(self):
        with TemporaryDirectory() as tmp:
            out_dir = Path(tmp)
            results = run_bench.run_bench(
                self.sboms,
                self.vexes,
                self.scanners,
                runs=3,
                shuffle=True,
                output_dir=out_dir,
            )
            rate = run_bench.compute_determinism_rate(results)
            self.assertAlmostEqual(rate, 1.0)
            hashes = {(r["scanner"], r["mode"]): r["hash"] for r in results}
            self.assertEqual(len(hashes), 2)

    def test_inputs_manifest_written(self):
        with TemporaryDirectory() as tmp:
            out_dir = Path(tmp)
            extra = Path(tmp) / "feeds.tar.gz"
            extra.write_bytes(b"feed")
            results = run_bench.run_bench(
                self.sboms,
                self.vexes,
                self.scanners,
                runs=1,
                shuffle=False,
                output_dir=out_dir,
                manifest_extras=[extra],
            )
            run_bench.write_results(results, out_dir / "results.csv")
            manifest = out_dir / "inputs.sha256"
            run_bench.write_inputs_manifest(self.sboms + self.vexes + [extra], manifest)
            text = manifest.read_text(encoding="utf-8")
            self.assertIn("sample-spdx.json", text)
            self.assertIn("sample-openvex.json", text)
            self.assertIn("feeds.tar.gz", text)


if __name__ == "__main__":
    unittest.main()

View File

@@ -0,0 +1,5 @@
# Tasks (Benchmarks Guild)
| ID | Status | Sprint | Notes | Evidence |
| --- | --- | --- | --- | --- |
| BENCH-DETERMINISM-401-057 | DONE (2025-11-26) | SPRINT_0512_0001_0001_bench | Determinism harness and mock scanner added under `src/Bench/StellaOps.Bench/Determinism`; manifests + sample inputs included. | `src/Bench/StellaOps.Bench/Determinism/results` (generated) |