StellaOps Reachability Benchmark (Public)
Deterministic, reproducible benchmark for reachability analysis tools.
Goals
- Provide open cases with ground truth for reachable/unreachable sinks.
- Enforce determinism (hash-stable builds, fixed seeds, pinned deps).
- Enable fair scoring via the rb-score CLI and published schemas.
Layout
- cases/<lang>/<project>/: benchmark cases with deterministic Dockerfiles, pinned deps, oracle tests.
- schemas/: JSON/YAML schemas for cases, entrypoints, truth, submissions.
- benchmark/truth/: ground-truth labels (hidden/internal split optional).
- benchmark/submissions/: sample submissions and format reference.
- tools/scorer/: rb-score CLI and tests.
- baselines/: reference runners (Semgrep, CodeQL, Stella) with normalized outputs.
- ci/: deterministic CI workflows and scripts.
- website/: static site (leaderboard/docs/downloads).
Determinism & Offline Rules
- No network during build/test; pin images/deps; set SOURCE_DATE_EPOCH.
- Sort file lists; stable JSON/YAML emitters; fixed RNG seeds.
- All scripts must succeed on a clean machine with cached toolchain tarballs only.
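The rules above can be illustrated with a minimal Python sketch. It is not part of the repository: the helper names (build_epoch, sorted_files, stable_json, manifest_digest), the manifest layout, and the default epoch of 0 are all assumptions made for illustration. It shows sorted file listing, a seeded local RNG, byte-stable JSON output, and a hash over the result.

```python
import hashlib
import json
import os
import random
from pathlib import Path


def build_epoch() -> int:
    """Read SOURCE_DATE_EPOCH; the fallback of 0 is an assumption, not a project rule."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", "0"))


def sorted_files(root: Path) -> list[str]:
    """List files under root as POSIX-style relative paths, in sorted order."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


def stable_json(obj) -> bytes:
    """Serialize with sorted keys and fixed separators so the bytes are reproducible."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return text.encode("utf-8")


def manifest_digest(root: Path, seed: int = 0) -> str:
    """Hash a hypothetical manifest built only from deterministic inputs."""
    rng = random.Random(seed)  # local seeded RNG; the global RNG is never touched
    files = sorted_files(root)
    manifest = {
        "epoch": build_epoch(),
        "files": files,
        "sample": rng.sample(files, k=min(2, len(files))),
    }
    return hashlib.sha256(stable_json(manifest)).hexdigest()
```

Calling manifest_digest twice on the same tree with the same seed and the same SOURCE_DATE_EPOCH should give the same digest, which is the kind of hash a contributor could attach to a report.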
Licensing
- Apache-2.0 for all benchmark assets. Third-party snippets must be license-compatible and attributed.
Quick Start (once populated)
# schema sanity checks (offline)
python tools/validate.py all schemas/examples
# score a submission (coming in task 513-008)
cd tools/scorer
./rb-score --cases ../cases --truth ../benchmark/truth --submission ../benchmark/submissions/sample.json
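This README does not specify how rb-score computes its metrics or what the truth and submission files look like. As a rough illustration only, the sketch below assumes truth is a mapping from a sink ID to a reachable/unreachable label and a submission is the set of sink IDs a tool reports as reachable, then computes precision, recall, and F1. The function name score, the sink IDs, and the data shapes are all hypothetical.

```python
def score(truth: dict[str, bool], predicted_reachable: set[str]) -> dict[str, float]:
    """Compare predicted-reachable sink IDs against ground-truth labels.

    Assumed shapes (not the real rb-score format): truth maps sink ID -> reachable?,
    predicted_reachable is the set of sink IDs the tool flagged as reachable.
    Predictions for sinks that are absent from truth are ignored.
    """
    tp = sum(1 for sink, reachable in truth.items() if reachable and sink in predicted_reachable)
    fp = sum(1 for sink, reachable in truth.items() if not reachable and sink in predicted_reachable)
    fn = sum(1 for sink, reachable in truth.items() if reachable and sink not in predicted_reachable)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example: one true positive, one false positive, one false negative.
example_truth = {"sink-a": True, "sink-b": False, "sink-c": True}
example_prediction = {"sink-a", "sink-b"}
print(score(example_truth, example_prediction))
# prints {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

Iterating over the truth labels rather than over the predictions keeps the result independent of set ordering, in line with the determinism rules.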
Contributing
See CONTRIBUTING.md. Open issues/PRs welcome; please provide hashes and logs for reproducibility.