stella-ops.org/git.stella-ops.org

StellaOps Bot 108d1c64b3

2025-12-09 09:38:09 +02:00

StellaOps Reachability Benchmark (Public)

Deterministic, reproducible benchmark for reachability analysis tools.

Goals

Provide open cases with ground truth for reachable/unreachable sinks.
Enforce determinism (hash-stable builds, fixed seeds, pinned deps).
Enable fair scoring via the rb-score CLI and published schemas.

Layout

cases/<lang>/<project>/ - benchmark cases with deterministic Dockerfiles, pinned deps, oracle tests.
schemas/ - JSON/YAML schemas for cases, entrypoints, truth, submissions.
benchmark/truth/ - ground-truth labels (hidden/internal split optional).
benchmark/submissions/ - sample submissions and format reference.
tools/scorer/ - rb-score CLI and tests.
tools/build/ - build_all.py (run all cases) and validate_builds.py (run twice and compare hashes).
baselines/ - reference runners (Semgrep, CodeQL, Stella) with normalized outputs.
ci/ - deterministic CI workflows and scripts.
website/ - static site (leaderboard/docs/downloads).
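
The "run twice and compare hashes" idea behind validate_builds.py can be illustrated with a minimal sketch. This is not the script's actual implementation; the function names tree_digest and builds_match are hypothetical, and the sketch assumes each build writes its outputs into a directory that can be hashed file by file.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> str:
    """Return one SHA-256 digest covering every file under root.

    Hypothetical helper: files are visited in sorted relative-path order so
    the digest does not depend on filesystem enumeration order.
    """
    digest = hashlib.sha256()
    entries = sorted(
        (p.relative_to(root).as_posix(), p)
        for p in root.rglob("*")
        if p.is_file()
    )
    for rel, path in entries:
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")  # separator between file name and content hash
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()


def builds_match(first: Path, second: Path) -> bool:
    """True when two build output trees have identical names and contents."""
    return tree_digest(first) == tree_digest(second)
```

Two builds of the same case into separate output directories should then satisfy builds_match(out1, out2); a mismatch points at a nondeterministic step such as an embedded timestamp or unsorted file list.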

Sample cases added (JS track):

cases/js/unsafe-eval (reachable sink) → benchmark/truth/js-unsafe-eval.json.
cases/js/guarded-eval (unreachable by default) → benchmark/truth/js-guarded-eval.json.
cases/js/express-eval (admin eval reachable) → benchmark/truth/js-express-eval.json.
cases/js/express-guarded (admin eval gated by env) → benchmark/truth/js-express-guarded.json.
cases/js/fastify-template (template rendering reachable) → benchmark/truth/js-fastify-template.json.

Sample cases added (Python track):

cases/py/unsafe-exec (reachable eval) → benchmark/truth/py-unsafe-exec.json.
cases/py/guarded-exec (unreachable when FEATURE_ENABLE != 1) → benchmark/truth/py-guarded-exec.json.
cases/py/flask-template (template rendering reachable) → benchmark/truth/py-flask-template.json.
cases/py/fastapi-guarded (unreachable unless ALLOW_EXEC=true) → benchmark/truth/py-fastapi-guarded.json.
cases/py/django-ssti (template rendering reachable, autoescape off) → benchmark/truth/py-django-ssti.json.

Sample cases added (Java track):

cases/java/spring-deserialize (reachable Java deserialization) → benchmark/truth/java-spring-deserialize.json.
cases/java/spring-guarded (deserialization unreachable unless ALLOW_DESER=true) → benchmark/truth/java-spring-guarded.json.
cases/java/micronaut-deserialize (reachable Micronaut-style deserialization) → benchmark/truth/java-micronaut-deserialize.json.
cases/java/micronaut-guarded (unreachable unless ALLOW_MN_DESER=true) → benchmark/truth/java-micronaut-guarded.json.
cases/java/spring-reflection (reflection sink reachable via Class.forName) → benchmark/truth/java-spring-reflection.json.
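
The sample cases listed above all follow the same naming pattern: cases/<lang>/<project> pairs with benchmark/truth/<lang>-<project>.json. A minimal sketch of that mapping is shown here; the helper name truth_path_for_case is hypothetical, and the pattern is inferred from the listed samples only, so future cases may not follow it.

```python
from pathlib import PurePosixPath


def truth_path_for_case(case_dir: str, truth_root: str = "benchmark/truth") -> str:
    """Map 'cases/<lang>/<project>' to '<truth_root>/<lang>-<project>.json'.

    Hypothetical helper; the naming rule is inferred from the sample cases.
    """
    parts = PurePosixPath(case_dir).parts
    if len(parts) != 3 or parts[0] != "cases":
        raise ValueError(f"expected cases/<lang>/<project>, got {case_dir!r}")
    _, lang, project = parts
    return f"{truth_root}/{lang}-{project}.json"


print(truth_path_for_case("cases/js/unsafe-eval"))
```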

Determinism & Offline Rules

No network during build/test; pin images/deps; set SOURCE_DATE_EPOCH.
Sort file lists; stable JSON/YAML emitters; fixed RNG seeds.
All scripts must succeed on a clean machine with cached toolchain tarballs only.
Java builds auto-use vendored Temurin 21 via tools/java/ensure_jdk.sh when JAVA_HOME/javac are absent.
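
As an illustration of the rules above, here is a minimal Python sketch of a stable JSON emitter, a fixed-seed RNG, and a timestamp taken from SOURCE_DATE_EPOCH. The function names are hypothetical and are not taken from the repository's tooling; the default epoch of 0 is likewise an assumption.

```python
import json
import os
import random


def build_timestamp(default: int = 0) -> int:
    """Read the build time from SOURCE_DATE_EPOCH instead of the wall clock."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", default))


def stable_json(data) -> str:
    """Serialize with sorted keys and fixed separators so bytes are repeatable."""
    return json.dumps(data, sort_keys=True, separators=(",", ":"), ensure_ascii=True) + "\n"


def seeded_rng(seed: int = 0) -> random.Random:
    """Return a private RNG with a fixed seed; avoids the shared global RNG."""
    return random.Random(seed)


def manifest(file_names) -> str:
    """Emit a manifest whose file list is sorted before serialization."""
    return stable_json({"built_at": build_timestamp(), "files": sorted(file_names)})
```

With SOURCE_DATE_EPOCH set to the same value, manifest(["b.txt", "a.txt"]) and manifest(["a.txt", "b.txt"]) produce byte-identical output, which is the property that hash comparison across repeated builds relies on.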

Licensing

Apache-2.0 for all benchmark assets. Third-party snippets must be license-compatible and attributed.

Quick Start (once populated)

# schema sanity checks (offline)
python tools/validate.py all schemas/examples

# score a submission (coming in task 513-008)
./tools/scorer/rb-score --cases cases --truth benchmark/truth --submission benchmark/submissions/sample.json

# deterministic case builds (skip a language when a toolchain is unavailable)
python tools/build/build_all.py --cases cases --skip-lang js

Contributing

See CONTRIBUTING.md. Open issues/PRs welcome; please provide hashes and logs for reproducibility.
