Files
git.stella-ops.org/docs/marketing/reachability-benchmark-launch.md
StellaOps Bot 909d9b6220
Some checks failed
AOC Guard CI / aoc-guard (push) Has been cancelled
AOC Guard CI / aoc-verify (push) Has been cancelled
Docs CI / lint-and-preview (push) Has been cancelled
Policy Lint & Smoke / policy-lint (push) Has been cancelled
up
2025-12-01 21:16:22 +02:00

49 lines
2.2 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Reachability Benchmark Launch (BENCH-LAUNCH-513-017)
## Audience
- Security engineering and platform teams evaluating reachability analysis tools.
- Benchmark participants (vendors, OSS maintainers) who need deterministic scoring.
## Positioning
- **Deterministic by default:** fixed seeds, SOURCE_DATE_EPOCH builds, sorted outputs.
- **Offline ready:** no registry pulls or telemetry; baselines run without network.
- **Explainable:** truth sets include static/dynamic evidence; scorer rewards path + guards.
- **Vendor-neutral:** Semgrep / CodeQL / Stella baselines provided for comparison.
## Whats included
- Cases across JS, Python, C (Java pending JDK availability).
- Schemas for cases, entrypoints, truth, and submissions.
- Baselines: Semgrep, CodeQL, Stella (offline).
- Tooling: scorer (`rb-score`), leaderboard (`rb-compare`), deterministic CI script (`ci/run-ci.sh`).
- Static site (`website/`) for quick start + leaderboard view.
## How to try it
```bash
# Build and validate
python tools/build/build_all.py --cases cases
python tools/validate.py --schemas schemas
# Run baselines (offline)
bash baselines/semgrep/run_all.sh cases /tmp/semgrep
bash baselines/stella/run_all.sh cases /tmp/stella
bash baselines/codeql/run_all.sh cases /tmp/codeql
# Score your submission
tools/scorer/rb_score.py --truth benchmark/truth/<aggregate>.json --submission submission.json --format json
```
## Key dates
- 2025-12-01: Public beta (v1.0.0 schemas, JS/PY/C cases, offline baselines).
- 2025-12-15 (target): Add Java track once JDK available in CI.
- Quarterly: hidden set rotation + leaderboard refresh.
## Calls to action
- Vendors: submit offlinereproducible `submission.json` for inclusion on the public leaderboard.
- Practitioners: run baselines locally to benchmark internal pipelines.
- OSS: propose new cases via PR; follow determinism checklist in `docs/submission-guide.md`.
## Risks & mitigations
- **Java track blocked (JDK)** — provide runner with JDK>=17; until then Java is excluded from CI.
- **Hidden set leakage** — governed by rotation policy in `docs/governance.md`; no public release of hidden cases.
- **Telemetry drift** — all runner scripts disable telemetry by env; reviewers verify no network calls.