StellaOps Reachability Benchmark (Public)

Deterministic, reproducible benchmark for reachability analysis tools.

Goals

Provide open cases with ground truth for reachable/unreachable sinks.
Enforce determinism (hash-stable builds, fixed seeds, pinned deps).
Enable fair scoring via the rb-score CLI and published schemas.

Layout

cases/<lang>/<project>/ — benchmark cases with deterministic Dockerfiles, pinned deps, oracle tests.
schemas/ — JSON/YAML schemas for cases, entrypoints, truth, submissions.
benchmark/truth/ — ground-truth labels (hidden/internal split optional).
benchmark/submissions/ — sample submissions and format reference.
tools/scorer/ — rb-score CLI and tests.
baselines/ — reference runners (Semgrep, CodeQL, Stella) with normalized outputs.
ci/ — deterministic CI workflows and scripts.
website/ — static site (leaderboard/docs/downloads).

Determinism & Offline Rules

No network during build/test; pin images/deps; set SOURCE_DATE_EPOCH.
Sort file lists; use stable JSON/YAML emitters; fix RNG seeds.
All scripts must succeed on a clean machine with cached toolchain tarballs only.
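
The rules above can be sketched in Python as follows. This is a minimal illustration, not code from this repository: the helper names (`build_timestamp`, `sorted_files`, `stable_json`, `seeded_rng`) are hypothetical, and the repository's own scripts may apply these rules differently.

```python
import json
import os
import random
from pathlib import Path


def build_timestamp(default: int = 0) -> int:
    """Take the build time from SOURCE_DATE_EPOCH rather than the wall clock."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", default))


def sorted_files(root: Path) -> list[str]:
    """List files under root as sorted, POSIX-style relative paths."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


def stable_json(obj) -> str:
    """Serialize with sorted keys and fixed separators so the bytes are stable."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")) + "\n"


def seeded_rng(seed: int = 0) -> random.Random:
    """Return a private RNG with a fixed seed (hypothetical default of 0)."""
    return random.Random(seed)
```

Using a private `random.Random` instance rather than the module-level functions keeps one script's seed from being disturbed by other code that draws random numbers.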

Licensing

Apache-2.0 for all benchmark assets. Third-party snippets must be license-compatible and attributed.

Quick Start (once populated)

# schema sanity checks (offline)
python tools/validate.py all schemas/examples

# score a submission (coming in task 513-008)
cd tools/scorer
./rb-score --cases ../cases --truth ../benchmark/truth --submission ../benchmark/submissions/sample.json
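
For orientation only, the kind of comparison a reachability scorer makes can be sketched as below. This is not the rb-score implementation, which the comment above marks as not yet available. It assumes a hypothetical simplified format in which truth and submission are both mappings from a sink ID to a reachable flag; the real formats are defined by the files under schemas/.

```python
def score(truth: dict[str, bool], predicted: dict[str, bool]) -> dict[str, float]:
    """Compare predicted reachability against ground truth (hypothetical format).

    Sinks missing from the submission are treated as predicted unreachable.
    """
    tp = fp = fn = 0
    for sink, reachable in sorted(truth.items()):
        claimed = predicted.get(sink, False)
        if reachable and claimed:
            tp += 1  # reachable sink correctly reported
        elif reachable and not claimed:
            fn += 1  # reachable sink missed
        elif claimed:
            fp += 1  # unreachable sink reported as reachable
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    truth = {"a": True, "b": True, "c": False, "d": False}
    submission = {"a": True, "c": True}
    print(score(truth, submission))
```

In this example one reachable sink is found, one is missed, and one unreachable sink is wrongly reported, so precision, recall, and F1 all come out to 0.5.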

Contributing

See CONTRIBUTING.md. Open issues/PRs welcome; please provide hashes and logs for reproducibility.
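
One way to produce a hash for a report is to digest a directory tree in sorted path order, as sketched below. The function name `tree_digest` and the choice of SHA-256 are assumptions made for illustration; CONTRIBUTING.md remains the authority on what to attach.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> str:
    """Return a SHA-256 hex digest over relative paths and file contents.

    Files are visited in sorted order of their POSIX-style relative path,
    so the result does not depend on filesystem enumeration order.
    """
    files = sorted(
        (p.relative_to(root).as_posix(), p) for p in root.rglob("*") if p.is_file()
    )
    digest = hashlib.sha256()
    for rel, path in files:
        digest.update(rel.encode("utf-8") + b"\0")  # path, NUL-terminated
        digest.update(hashlib.sha256(path.read_bytes()).digest())  # content hash
    return digest.hexdigest()
```

Two checkouts with identical file names and contents yield the same digest, which gives a maintainer a quick way to confirm they are looking at the same inputs.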
