git.stella-ops.org/bench/reachability-benchmark/README.md

# StellaOps Reachability Benchmark (Public)

Deterministic, reproducible benchmark for reachability analysis tools.

## Goals
- Provide open cases with ground truth for reachable/unreachable sinks.
- Enforce determinism (hash-stable builds, fixed seeds, pinned deps).
- Enable fair scoring via the `rb-score` CLI and published schemas.

## Layout
- `cases/<lang>/<project>/` — benchmark cases with deterministic Dockerfiles, pinned deps, oracle tests.
- `schemas/` — JSON/YAML schemas for cases, entrypoints, truth, submissions.
- `benchmark/truth/` — ground-truth labels (hidden/internal split optional).
- `benchmark/submissions/` — sample submissions and format reference.
- `tools/scorer/` — `rb-score` CLI and tests.
- `baselines/` — reference runners (Semgrep, CodeQL, Stella) with normalized outputs.
- `ci/` — deterministic CI workflows and scripts.
- `website/` — static site (leaderboard/docs/downloads).

Sample cases added (JS track):
- `cases/js/unsafe-eval` (reachable sink) → `benchmark/truth/js-unsafe-eval.json`.
- `cases/js/guarded-eval` (unreachable by default) → `benchmark/truth/js-guarded-eval.json`.
- `cases/js/express-eval` (admin eval reachable) → `benchmark/truth/js-express-eval.json`.
- `cases/js/express-guarded` (admin eval gated by env) → `benchmark/truth/js-express-guarded.json`.
- `cases/js/fastify-template` (template rendering reachable) → `benchmark/truth/js-fastify-template.json`.

Sample cases added (Python track):
- `cases/py/unsafe-exec` (reachable eval) → `benchmark/truth/py-unsafe-exec.json`.
- `cases/py/guarded-exec` (unreachable when FEATURE_ENABLE != 1) → `benchmark/truth/py-guarded-exec.json`.
- `cases/py/flask-template` (template rendering reachable) → `benchmark/truth/py-flask-template.json`.
- `cases/py/fastapi-guarded` (unreachable unless ALLOW_EXEC=true) → `benchmark/truth/py-fastapi-guarded.json`.
- `cases/py/django-ssti` (template rendering reachable, autoescape off) → `benchmark/truth/py-django-ssti.json`.

Sample cases added (Java track):
- `cases/java/spring-deserialize` (reachable Java deserialization) → `benchmark/truth/java-spring-deserialize.json`.
- `cases/java/spring-guarded` (deserialization unreachable unless ALLOW_DESER=true) → `benchmark/truth/java-spring-guarded.json`.

## Determinism & Offline Rules
- No network during build/test; pin images/deps; set `SOURCE_DATE_EPOCH`.
- Sort file lists; stable JSON/YAML emitters; fixed RNG seeds.
- All scripts must succeed on a clean machine with cached toolchain tarballs only.

## Licensing
- Apache-2.0 for all benchmark assets. Third-party snippets must be license-compatible and attributed.

## Quick Start (once populated)
```bash
# schema sanity checks (offline)
python tools/validate.py all schemas/examples

# score a submission (coming in task 513-008)
cd tools/scorer
./rb-score --cases ../cases --truth ../benchmark/truth --submission ../benchmark/submissions/sample.json
```

## Contributing
See CONTRIBUTING.md. Open issues/PRs welcome; please provide hashes and logs for reproducibility.