up
39
bench/reachability-benchmark/README.md
Normal file
@@ -0,0 +1,39 @@
# StellaOps Reachability Benchmark (Public)

Deterministic, reproducible benchmark for reachability analysis tools.

## Goals
- Provide open cases with ground truth for reachable/unreachable sinks.
- Enforce determinism (hash-stable builds, fixed seeds, pinned deps).
- Enable fair scoring via the `rb-score` CLI and published schemas.

## Layout
- `cases/<lang>/<project>/`: benchmark cases with deterministic Dockerfiles, pinned deps, oracle tests.
- `schemas/`: JSON/YAML schemas for cases, entrypoints, truth, submissions.
- `benchmark/truth/`: ground-truth labels (hidden/internal split optional).
- `benchmark/submissions/`: sample submissions and format reference.
- `tools/scorer/`: `rb-score` CLI and tests.
- `baselines/`: reference runners (Semgrep, CodeQL, Stella) with normalized outputs.
- `ci/`: deterministic CI workflows and scripts.
- `website/`: static site (leaderboard/docs/downloads).
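
As a minimal sketch, the layout above can be checked mechanically before running anything else. The helper name `missing_dirs` and the `EXPECTED_DIRS` list are illustrative assumptions built from the directory names in this README; they are not part of the benchmark tooling.

```python
from pathlib import Path

# Directories named in the Layout section of this README.
# (The per-language subtree under cases/ is not checked here.)
EXPECTED_DIRS = [
    "cases",
    "schemas",
    "benchmark/truth",
    "benchmark/submissions",
    "tools/scorer",
    "baselines",
    "ci",
    "website",
]


def missing_dirs(root: Path) -> list[str]:
    """Return the expected directories that do not exist under root, in listed order."""
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

Calling `missing_dirs(Path("."))` from the benchmark root should return an empty list once the repository is populated.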

## Determinism & Offline Rules
- No network during build/test; pin images/deps; set `SOURCE_DATE_EPOCH`.
- Sort file lists; stable JSON/YAML emitters; fixed RNG seeds.
- All scripts must succeed on a clean machine with cached toolchain tarballs only.
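
The rules above could be applied in a script roughly as follows. This is a sketch using only the Python standard library; the function names, the fallback timestamp of `0`, and the default seed are assumptions made for illustration, not the benchmark's actual helpers.

```python
import hashlib
import json
import os
import random
from pathlib import Path


def stable_json(obj) -> str:
    """Serialize with sorted keys and fixed separators so equal data yields equal text."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=True)


def sorted_files(root: Path) -> list[str]:
    """List files under root as POSIX-style relative paths in sorted order."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


def tree_digest(root: Path) -> str:
    """SHA-256 over relative paths and file contents, visited in sorted order."""
    h = hashlib.sha256()
    for rel in sorted_files(root):
        h.update(rel.encode("utf-8"))
        h.update(b"\0")
        h.update((root / rel).read_bytes())
        h.update(b"\0")
    return h.hexdigest()


def build_timestamp() -> int:
    """Read SOURCE_DATE_EPOCH instead of the wall clock (fallback 0 is an assumption)."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", "0"))


def seeded_rng(seed: int = 0) -> random.Random:
    """Return a private RNG with a fixed seed rather than using global random state."""
    return random.Random(seed)
```

Running `tree_digest` twice on the same build output and comparing the hex strings is one simple way to produce the hashes requested for reproducibility reports.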

## Licensing
- Apache-2.0 for all benchmark assets. Third-party snippets must be license-compatible and attributed.

## Quick Start (once populated)
```bash
# validate schemas
npm test ./schemas # or python -m pytest schemas

# score a submission
cd tools/scorer
./rb-score --cases ../cases --truth ../benchmark/truth --submission ../benchmark/submissions/sample.json
```
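
To give a feel for what scoring a submission against ground truth involves, here is a hypothetical sketch. It does not describe how `rb-score` works: the flat `sink id -> reachable` mapping, the choice of precision/recall/F1, and the treatment of missing sinks are all assumptions, and the real formats are whatever the files in `schemas/` define.

```python
def score(truth: dict[str, bool], predicted: dict[str, bool]) -> dict[str, float]:
    """Precision, recall and F1 for the 'reachable' label.

    Assumption: sinks absent from the submission count as predicted unreachable.
    """
    tp = fp = fn = 0
    for sink_id in sorted(truth):  # sorted iteration keeps any logging deterministic
        actual = truth[sink_id]
        guess = predicted.get(sink_id, False)
        if guess and actual:
            tp += 1
        elif guess and not actual:
            fp += 1
        elif actual and not guess:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, with three labelled sinks where the submission finds one reachable sink, flags one unreachable sink, and omits another reachable one, all three metrics come out to 0.5.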

## Contributing
See CONTRIBUTING.md. Open issues/PRs welcome; please provide hashes and logs for reproducibility.