up
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
Findings Ledger CI / build-test (push) Has been cancelled
Findings Ledger CI / migration-validation (push) Has been cancelled
Scanner Analyzers / Discover Analyzers (push) Has been cancelled
Signals Reachability Scoring & Events / reachability-smoke (push) Has been cancelled
AOC Guard CI / aoc-guard (push) Has been cancelled
Concelier Attestation Tests / attestation-tests (push) Has been cancelled
cryptopro-linux-csp / build-and-test (push) Has been cancelled
Scanner Analyzers / Validate Test Fixtures (push) Has been cancelled
Signals CI & Image / signals-ci (push) Has been cancelled
sm-remote-ci / build-and-test (push) Has been cancelled
Findings Ledger CI / generate-manifest (push) Has been cancelled
AOC Guard CI / aoc-verify (push) Has been cancelled
Scanner Analyzers / Build Analyzers (push) Has been cancelled
Scanner Analyzers / Test Language Analyzers (push) Has been cancelled
Scanner Analyzers / Verify Deterministic Output (push) Has been cancelled
Signals Reachability Scoring & Events / sign-and-upload (push) Has been cancelled
@@ -8,38 +8,42 @@ Deterministic, reproducible benchmark for reachability analysis tools.
- Enable fair scoring via the `rb-score` CLI and published schemas.

## Layout
- `cases/<lang>/<project>/` — benchmark cases with deterministic Dockerfiles, pinned deps, oracle tests.
- `schemas/` — JSON/YAML schemas for cases, entrypoints, truth, submissions.
- `benchmark/truth/` — ground-truth labels (hidden/internal split optional).
- `benchmark/submissions/` — sample submissions and format reference.
- `tools/scorer/` — `rb-score` CLI and tests.
- `tools/build/` — `build_all.py` (run all cases) and `validate_builds.py` (run twice and compare hashes).
- `baselines/` — reference runners (Semgrep, CodeQL, Stella) with normalized outputs.
- `ci/` — deterministic CI workflows and scripts.
- `website/` — static site (leaderboard/docs/downloads).

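The "run twice and compare hashes" check listed for `validate_builds.py` can be pictured roughly as follows. This is a minimal sketch, not the script's actual code: the function names `hash_tree` and `builds_match` are hypothetical, and it assumes each build writes its output to a directory that can be hashed file by file.

```python
import hashlib
from pathlib import Path


def hash_tree(root: Path) -> str:
    """Return one SHA-256 digest covering every file under root (hypothetical helper)."""
    digest = hashlib.sha256()
    # Sort paths so the digest does not depend on filesystem listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        data = path.read_bytes()
        # Mix in the relative path and the length so renames and moved bytes change the digest.
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def builds_match(first_output: Path, second_output: Path) -> bool:
    """True when two build output directories have identical paths and contents."""
    return hash_tree(first_output) == hash_tree(second_output)
```
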
Sample cases added (JS track):
- `cases/js/unsafe-eval` (reachable sink) → `benchmark/truth/js-unsafe-eval.json`.
- `cases/js/guarded-eval` (unreachable by default) → `benchmark/truth/js-guarded-eval.json`.
- `cases/js/express-eval` (admin eval reachable) → `benchmark/truth/js-express-eval.json`.
- `cases/js/express-guarded` (admin eval gated by env) → `benchmark/truth/js-express-guarded.json`.
- `cases/js/fastify-template` (template rendering reachable) → `benchmark/truth/js-fastify-template.json`.

Sample cases added (Python track):
- `cases/py/unsafe-exec` (reachable eval) → `benchmark/truth/py-unsafe-exec.json`.
- `cases/py/guarded-exec` (unreachable when FEATURE_ENABLE != 1) → `benchmark/truth/py-guarded-exec.json`.
- `cases/py/flask-template` (template rendering reachable) → `benchmark/truth/py-flask-template.json`.
- `cases/py/fastapi-guarded` (unreachable unless ALLOW_EXEC=true) → `benchmark/truth/py-fastapi-guarded.json`.
- `cases/py/django-ssti` (template rendering reachable, autoescape off) → `benchmark/truth/py-django-ssti.json`.

Sample cases added (Java track):
- `cases/java/spring-deserialize` (reachable Java deserialization) → `benchmark/truth/java-spring-deserialize.json`.
- `cases/java/spring-guarded` (deserialization unreachable unless ALLOW_DESER=true) → `benchmark/truth/java-spring-guarded.json`.
- `cases/java/micronaut-deserialize` (reachable Micronaut-style deserialization) → `benchmark/truth/java-micronaut-deserialize.json`.
- `cases/java/micronaut-guarded` (unreachable unless ALLOW_MN_DESER=true) → `benchmark/truth/java-micronaut-guarded.json`.
- `cases/java/spring-reflection` (reflection sink reachable via Class.forName) → `benchmark/truth/java-spring-reflection.json`.

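The guarded cases in each track pair a dangerous sink with an environment check that keeps it unreachable by default. As a rough illustration of that pattern, modeled on the description of `cases/py/guarded-exec`, here is a minimal sketch. It is not the case's actual source: the function name `run_expression` and the `"disabled"` return value are assumptions made for the example.

```python
import os


def run_expression(expression: str) -> str:
    """Evaluate expression only when the feature flag is switched on (illustrative)."""
    # Guard: with FEATURE_ENABLE unset or not equal to "1", the sink below never runs.
    if os.environ.get("FEATURE_ENABLE") != "1":
        return "disabled"
    # Sink: eval on caller-supplied input, reachable only past the guard.
    return str(eval(expression))
```

A tool that ignores the guard would likely flag the `eval` call as reachable, whereas the case description treats it as unreachable unless the flag is set.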
## Determinism & Offline Rules
- No network during build/test; pin images/deps; set `SOURCE_DATE_EPOCH`.
- Sort file lists; stable JSON/YAML emitters; fixed RNG seeds.
- All scripts must succeed on a clean machine with cached toolchain tarballs only.
- Java builds auto-use vendored Temurin 21 via `tools/java/ensure_jdk.sh` when `JAVA_HOME`/`javac` are absent.

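Several of the rules above translate directly into small habits in tooling code. The sketch below shows one possible way to apply them in Python; every helper name (`build_timestamp`, `list_files`, `emit_json`, `sample_cases`) and the default seed are hypothetical and not taken from the repository's scripts.

```python
import json
import os
import random
from pathlib import Path


def build_timestamp(default: int = 0) -> int:
    """Read the build time from SOURCE_DATE_EPOCH instead of the wall clock."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", default))


def list_files(root: Path) -> list[str]:
    """Return relative file paths in sorted order, independent of listing order."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


def emit_json(payload: dict) -> str:
    """Serialize with sorted keys and fixed separators so the output is byte-stable."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=True)


def sample_cases(cases: list[str], count: int, seed: int = 1234) -> list[str]:
    """Pick a reproducible sample using a locally seeded RNG (seed value is arbitrary)."""
    rng = random.Random(seed)
    # Sort first so the sample does not depend on the caller's input order.
    return rng.sample(sorted(cases), count)
```

Using a local `random.Random(seed)` rather than the module-level functions keeps the sample independent of any other code that touches the global RNG state.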
## Licensing
- Apache-2.0 for all benchmark assets. Third-party snippets must be license-compatible and attributed.
@@ -50,8 +54,10 @@ Sample cases added (Java track):
python tools/validate.py all schemas/examples

# score a submission (coming in task 513-008)
./tools/scorer/rb-score --cases cases --truth benchmark/truth --submission benchmark/submissions/sample.json

# deterministic case builds (skip a language when a toolchain is unavailable)
python tools/build/build_all.py --cases cases --skip-lang js
```
## Contributing