# Reachability Benchmark Launch (BENCH-LAUNCH-513-017)

## Audience

- Security engineering and platform teams evaluating reachability analysis tools.
- Benchmark participants (vendors, OSS maintainers) who need deterministic scoring.

## Positioning

- **Deterministic by default:** fixed seeds, `SOURCE_DATE_EPOCH` builds, sorted outputs.
- **Offline-ready:** no registry pulls or telemetry; baselines run without network access.
- **Explainable:** truth sets include static and dynamic evidence; the scorer rewards both path and guard evidence.
- **Vendor-neutral:** Semgrep, CodeQL, and Stella baselines are provided for comparison.

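The determinism knobs above can be approximated with standard environment variables. This is a minimal sketch under assumptions, not the benchmark's own `ci/run-ci.sh`: the helper name `pin_determinism` is hypothetical, and `PYTHONHASHSEED`, `LC_ALL`, and `TZ` are common choices that the repository may or may not set. Only `SOURCE_DATE_EPOCH` comes from the text above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: export environment knobs commonly used for reproducible runs.
# Assumption: the benchmark's own ci/run-ci.sh may set a different or larger set.
pin_determinism() {
  # Reuse a caller-provided SOURCE_DATE_EPOCH, else the last commit time, else 0.
  SOURCE_DATE_EPOCH="${SOURCE_DATE_EPOCH:-$(git log -1 --format=%ct 2>/dev/null || echo 0)}"
  export SOURCE_DATE_EPOCH
  export PYTHONHASHSEED=0  # disable Python's per-process str hash randomization
  export LC_ALL=C          # byte-wise collation so `sort` ignores the host locale
  export TZ=UTC            # timestamps render the same on every host
}

pin_determinism

# Sorted output: in the C locale, sort orders by byte value (uppercase first).
printf '%s\n' "b.js" "A.py" "a.c" | sort   # → A.py, a.c, b.js (one per line)
```

Exporting rather than merely assigning matters here: child processes such as the Python build scripts and the baseline runners only see exported variables.
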
## What’s included

- Cases in JavaScript, Python, and C (Java pending JDK availability).
- Schemas for cases, entrypoints, truth, and submissions.
- Baselines: Semgrep, CodeQL, Stella (offline).
- Tooling: scorer (`rb-score`), leaderboard (`rb-compare`), deterministic CI script (`ci/run-ci.sh`).
- Static site (`website/`) for the quick start and leaderboard view.

## How to try it

```bash
# Build and validate
python tools/build/build_all.py --cases cases
python tools/validate.py --schemas schemas

# Run baselines (offline)
bash baselines/semgrep/run_all.sh cases /tmp/semgrep
bash baselines/stella/run_all.sh cases /tmp/stella
bash baselines/codeql/run_all.sh cases /tmp/codeql

# Score your submission
python tools/scorer/rb_score.py --truth benchmark/truth/<aggregate>.json --submission submission.json --format json
```

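The three baseline invocations share one shape (`run_all.sh <cases> <out>`), so they can be driven from a single loop. A minimal sketch, assuming it is run from the repository root with the layout shown in the quick start; `run_baselines` and the `/tmp/rb-baselines` output root are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Sketch: drive every offline baseline with the same <cases> <out> arguments.
# Assumes the quick-start layout: baselines/<tool>/run_all.sh <cases> <out>,
# run from the repository root. run_baselines itself is a hypothetical helper.
run_baselines() {
  local cases="$1" out_root="$2" tool script
  mkdir -p "$out_root"
  for tool in semgrep stella codeql; do
    script="baselines/${tool}/run_all.sh"
    # Fail loudly rather than silently dropping a baseline from the comparison.
    if [[ ! -f "$script" ]]; then
      echo "missing baseline runner: ${script}" >&2
      return 1
    fi
    echo "==> ${tool}"
    bash "$script" "$cases" "${out_root}/${tool}"
  done
}

# Usage (hypothetical output root):
# run_baselines cases /tmp/rb-baselines
```

Failing on a missing runner, instead of skipping it, keeps a leaderboard comparison from quietly losing a column.
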
## Key dates

- 2025-12-01: Public beta (v1.0.0 schemas, JS/PY/C cases, offline baselines).
- 2025-12-15 (target): Add the Java track once a JDK is available in CI.
- Quarterly: hidden-set rotation and leaderboard refresh.

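Adding the Java track depends on CI detecting a suitable JDK. The following is a sketch of such a gate, using the JDK >= 17 floor listed under Risks & mitigations; `parse_java_major` and `java_track_enabled` are hypothetical helpers, and parsing the quoted version from `java -version` output is an assumption about how the runner would detect the JDK.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Extract the major version from a `java -version` string such as
# "17.0.9", "21", "17-ea", or the legacy "1.8.0_392" (major 8).
parse_java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;  # legacy 1.x scheme: drop the "1." prefix
  esac
  echo "${v%%[!0-9]*}"   # keep only the leading run of digits
}

# Succeeds only when a java binary is on PATH and reports major version >= 17.
java_track_enabled() {
  command -v java >/dev/null 2>&1 || return 1
  local raw major
  # `java -version` writes to stderr, e.g.: openjdk version "17.0.9" 2023-10-17
  raw="$(java -version 2>&1 | awk -F'"' '/version/ && !seen {print $2; seen=1}')"
  major="$(parse_java_major "$raw")"
  [[ -n "$major" && "$major" -ge 17 ]]
}

if java_track_enabled; then
  echo "Java track: enabled"
else
  echo "Java track: skipped (needs JDK >= 17)"
fi
```
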
## Calls to action

- Vendors: submit an offline-reproducible `submission.json` for inclusion on the public leaderboard.
- Practitioners: run the baselines locally to benchmark internal pipelines.
- OSS maintainers: propose new cases via PR, following the determinism checklist in `docs/submission-guide.md`.

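One simple way to check that a `submission.json` is offline-reproducible is to generate it twice and compare the bytes. A minimal sketch, assuming your tool can write its submission JSON to stdout; `check_reproducible` is a hypothetical helper and is not part of the checklist in `docs/submission-guide.md`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Sketch: a submission is reproducible if two clean runs emit identical bytes.
# check_reproducible is a hypothetical helper; pass the full command that
# prints the submission JSON to stdout as its arguments.
check_reproducible() {
  local first second status=0
  first="$(mktemp)"
  second="$(mktemp)"
  "$@" > "$first"
  "$@" > "$second"
  if cmp -s "$first" "$second"; then
    # Print a digest so the result can be pinned alongside the submission.
    echo "reproducible: $(sha256sum "$first" | cut -d' ' -f1)"
  else
    echo "outputs differ between runs:" >&2
    diff "$first" "$second" >&2 || true
    status=1
  fi
  rm -f "$first" "$second"
  return "$status"
}
```

Typical nondeterminism sources this catches are unsorted keys, embedded timestamps, and hash-order iteration.
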
## Risks & mitigations

- **Java track blocked (JDK):** provide a runner with JDK >= 17; until then, Java is excluded from CI.
- **Hidden-set leakage:** governed by the rotation policy in `docs/governance.md`; hidden cases are never released publicly.
- **Telemetry drift:** all runner scripts disable telemetry via environment variables; reviewers verify that no network calls are made.

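Part of the reviewer check for network calls can be automated with a static scan of the runner scripts. This is a rough sketch: `audit_offline` is a hypothetical helper, and its pattern list is an assumption that only catches obvious fetches (it would also flag an offline `pip install --no-index`). It complements, rather than replaces, running the baselines with networking disabled.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical reviewer helper: flag shell scripts that appear to reach the network.
# The pattern list is an assumption; a static grep is a smoke test, not proof.
audit_offline() {
  local dir="$1"
  local pattern='(curl|wget|pip install|npm install|docker pull|git clone)'
  [[ -d "$dir" ]] || { echo "not a directory: ${dir}" >&2; return 2; }
  # -r: recurse, -n: show line numbers, -E: extended regex, --include: *.sh only.
  if grep -rnE --include='*.sh' "$pattern" "$dir"; then
    echo "possible network access under ${dir}" >&2
    return 1
  fi
  echo "no obvious network calls under ${dir}"
}

# Usage: audit_offline baselines
```
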