up

This commit is contained in:
StellaOps Bot
2025-12-01 21:16:22 +02:00
parent c11d87d252
commit 909d9b6220
208 changed files with 860954 additions and 832 deletions

@@ -0,0 +1,48 @@
# Reachability Benchmark Launch (BENCH-LAUNCH-513-017)
## Audience
- Security engineering and platform teams evaluating reachability analysis tools.
- Benchmark participants (vendors, OSS maintainers) who need deterministic scoring.
## Positioning
- **Deterministic by default:** fixed seeds, SOURCE_DATE_EPOCH builds, sorted outputs.
- **Offline ready:** no registry pulls or telemetry; baselines run without network.
- **Explainable:** truth sets include static/dynamic evidence; scorer rewards path + guards.
- **Vendor-neutral:** Semgrep / CodeQL / Stella baselines provided for comparison.
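
The three determinism measures named above (fixed seeds, `SOURCE_DATE_EPOCH`, sorted outputs) can be illustrated with a minimal Python sketch. The helper names `build_timestamp`, `canonical_json`, and `sample_cases` are hypothetical and are not taken from the benchmark's tooling; they only show one way such guarantees are commonly achieved.

```python
import json
import os
import random
from datetime import datetime, timezone


def build_timestamp() -> str:
    """Return a UTC timestamp pinned by SOURCE_DATE_EPOCH (assumed default: 0)."""
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))
    return datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat()


def canonical_json(payload: dict) -> str:
    """Serialize with sorted keys and fixed separators so the bytes are stable."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=True)


def sample_cases(case_ids: list[str], k: int, seed: int = 0) -> list[str]:
    """Pick k cases reproducibly: sort the input, then draw from a local seeded RNG."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    return sorted(rng.sample(sorted(case_ids), k))
```

Sorting the input before sampling matters: without it, the same seed could select different cases when directory listing order changes between machines.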
## What's included
- Cases across JS, Python, C (Java pending JDK availability).
- Schemas for cases, entrypoints, truth, and submissions.
- Baselines: Semgrep, CodeQL, Stella (offline).
- Tooling: scorer (`rb-score`), leaderboard (`rb-compare`), deterministic CI script (`ci/run-ci.sh`).
- Static site (`website/`) for quick start + leaderboard view.
## How to try it
```bash
# Build and validate
python tools/build/build_all.py --cases cases
python tools/validate.py --schemas schemas
# Run baselines (offline)
bash baselines/semgrep/run_all.sh cases /tmp/semgrep
bash baselines/stella/run_all.sh cases /tmp/stella
bash baselines/codeql/run_all.sh cases /tmp/codeql
# Score your submission
tools/scorer/rb_score.py --truth benchmark/truth/<aggregate>.json --submission submission.json --format json
```
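
One way to check the determinism claim locally is to run a baseline twice into separate directories and compare a digest of each output tree. The sketch below assumes nothing about the baseline output layout; `digest_tree` is a hypothetical helper, not part of the benchmark's tooling.

```python
import hashlib
from pathlib import Path


def digest_tree(root: str) -> str:
    """Hash every file under root, visiting relative paths in sorted order."""
    base = Path(root)
    files = [p for p in base.rglob("*") if p.is_file()]
    hasher = hashlib.sha256()
    for path in sorted(files, key=lambda p: p.relative_to(base).as_posix()):
        # Include the relative path so renames change the digest too.
        hasher.update(path.relative_to(base).as_posix().encode("utf-8"))
        hasher.update(b"\0")
        hasher.update(path.read_bytes())
        hasher.update(b"\0")
    return hasher.hexdigest()
```

For example, after running the Semgrep baseline into two output directories of your choosing, `digest_tree(dir_a) == digest_tree(dir_b)` should hold if the run is byte-for-byte reproducible.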
## Key dates
- 2025-12-01: Public beta (v1.0.0 schemas, JS/PY/C cases, offline baselines).
- 2025-12-15 (target): Add Java track once JDK available in CI.
- Quarterly: hidden set rotation + leaderboard refresh.
## Calls to action
- Vendors: submit an offline-reproducible `submission.json` for inclusion on the public leaderboard.
- Practitioners: run baselines locally to benchmark internal pipelines.
- OSS: propose new cases via PR; follow determinism checklist in `docs/submission-guide.md`.
## Risks & mitigations
- **Java track blocked (JDK)** — provide runner with JDK>=17; until then Java is excluded from CI.
- **Hidden set leakage** — governed by rotation policy in `docs/governance.md`; no public release of hidden cases.
- **Telemetry drift** — all runner scripts disable telemetry by env; reviewers verify no network calls.