# StellaOps Reachability Benchmark (Public)

Deterministic, reproducible benchmark for reachability analysis tools.

## Goals

- Provide open cases with ground truth for reachable/unreachable sinks.
- Enforce determinism (hash-stable builds, fixed seeds, pinned deps).
- Enable fair scoring via the `rb-score` CLI and published schemas.

## Layout

- `cases///`: benchmark cases with deterministic Dockerfiles, pinned deps, oracle tests.
- `schemas/`: JSON/YAML schemas for cases, entrypoints, truth, submissions.
- `benchmark/truth/`: ground-truth labels (hidden/internal split optional).
- `benchmark/submissions/`: sample submissions and format reference.
- `tools/scorer/`: `rb-score` CLI and tests.
- `tools/build/`: `build_all.py` (run all cases) and `validate_builds.py` (run twice and compare hashes).
- `baselines/`: reference runners (Semgrep, CodeQL, Stella) with normalized outputs.
- `ci/`: deterministic CI workflows and scripts.
- `website/`: static site (leaderboard/docs/downloads).

Sample cases added (JS track):

- `cases/js/unsafe-eval` (reachable sink) → `benchmark/truth/js-unsafe-eval.json`.
- `cases/js/guarded-eval` (unreachable by default) → `benchmark/truth/js-guarded-eval.json`.
- `cases/js/express-eval` (admin eval reachable) → `benchmark/truth/js-express-eval.json`.
- `cases/js/express-guarded` (admin eval gated by env) → `benchmark/truth/js-express-guarded.json`.
- `cases/js/fastify-template` (template rendering reachable) → `benchmark/truth/js-fastify-template.json`.

Sample cases added (Python track):

- `cases/py/unsafe-exec` (reachable eval) → `benchmark/truth/py-unsafe-exec.json`.
- `cases/py/guarded-exec` (unreachable when `FEATURE_ENABLE != 1`) → `benchmark/truth/py-guarded-exec.json`.
- `cases/py/flask-template` (template rendering reachable) → `benchmark/truth/py-flask-template.json`.
- `cases/py/fastapi-guarded` (unreachable unless `ALLOW_EXEC=true`) → `benchmark/truth/py-fastapi-guarded.json`.
- `cases/py/django-ssti` (template rendering reachable, autoescape off) → `benchmark/truth/py-django-ssti.json`.

Sample cases added (Java track):

- `cases/java/spring-deserialize` (reachable Java deserialization) → `benchmark/truth/java-spring-deserialize.json`.
- `cases/java/spring-guarded` (deserialization unreachable unless `ALLOW_DESER=true`) → `benchmark/truth/java-spring-guarded.json`.
- `cases/java/micronaut-deserialize` (reachable Micronaut-style deserialization) → `benchmark/truth/java-micronaut-deserialize.json`.
- `cases/java/micronaut-guarded` (unreachable unless `ALLOW_MN_DESER=true`) → `benchmark/truth/java-micronaut-guarded.json`.
- `cases/java/spring-reflection` (reflection sink reachable via `Class.forName`) → `benchmark/truth/java-spring-reflection.json`.

## Determinism & Offline Rules

- No network during build/test; pin images/deps; set `SOURCE_DATE_EPOCH`.
- Sort file lists; use stable JSON/YAML emitters; fix RNG seeds.
- All scripts must succeed on a clean machine with cached toolchain tarballs only.
- Java builds auto-use vendored Temurin 21 via `tools/java/ensure_jdk.sh` when `JAVA_HOME`/`javac` are absent.

## Licensing

- Apache-2.0 for all benchmark assets. Third-party snippets must be license-compatible and attributed.

## Quick Start (once populated)

```bash
# schema sanity checks (offline)
python tools/validate.py all schemas/examples

# score a submission (coming in task 513-008)
./tools/scorer/rb-score --cases cases --truth benchmark/truth --submission benchmark/submissions/sample.json

# deterministic case builds (skip a language when a toolchain is unavailable)
python tools/build/build_all.py --cases cases --skip-lang js
```

## Contributing

See CONTRIBUTING.md. Issues and PRs are welcome; please provide hashes and logs for reproducibility.
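
To illustrate the determinism rules and the "run twice and compare hashes" idea described for `tools/build/validate_builds.py`, here is a minimal Python sketch. It is not the benchmark's actual implementation: `tree_digest`, `stable_json`, `fake_build`, and `builds_match` are hypothetical names, and the "build" is a stand-in that writes small JSON files. It only shows how sorted traversal and a stable JSON emitter make two runs hash to the same value.

```python
import hashlib
import json
import os
import tempfile


def tree_digest(root: str) -> str:
    """Hash relative paths and file bytes in sorted order, ignoring mtimes."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort fixes the traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            digest.update(rel.encode("utf-8") + b"\0")
            with open(path, "rb") as handle:
                digest.update(hashlib.sha256(handle.read()).digest())
    return digest.hexdigest()


def stable_json(obj) -> str:
    """Emit JSON with sorted keys and fixed separators so the bytes repeat."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")) + "\n"


def fake_build(out_dir: str, order) -> None:
    """Hypothetical stand-in for a case build; writes files in the given order."""
    for name in order:
        path = os.path.join(out_dir, name)
        with open(path, "w", encoding="utf-8", newline="\n") as handle:
            handle.write(stable_json({"name": name, "reachable": True}))


def builds_match() -> bool:
    """Run the stand-in build twice and compare the two tree digests."""
    with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
        fake_build(first, ["a.json", "b.json"])
        fake_build(second, ["b.json", "a.json"])  # different write order
        return tree_digest(first) == tree_digest(second)


if __name__ == "__main__":
    print("deterministic" if builds_match() else "drift detected")
```

The digest covers only relative paths and file contents, so timestamps and write order do not affect it. A real validator would likely also need to normalize archive metadata, which is where `SOURCE_DATE_EPOCH` comes in.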