# Reachability Benchmark · Submission Guide
This guide explains how to produce a compliant submission for the Stella Ops reachability benchmark. It is fully offline-friendly.
## Prerequisites
- Python 3.11+
- Your analyzer toolchain (no network calls during analysis)
- Schemas from `schemas/` and truth data from `benchmark/truth/`
## Steps
- **Build cases deterministically**

  ```sh
  python tools/build/build_all.py --cases cases
  ```

  - Sets `SOURCE_DATE_EPOCH`.
  - Uses vendored Temurin 21 via `tools/java/ensure_jdk.sh` when `JAVA_HOME`/`javac` are missing; pass `--skip-lang` if another toolchain is unavailable on your runner.
- **Run your analyzer**
  - For each case, produce sink predictions in memory-safe JSON.
  - Do not reach out to the internet, package registries, or remote APIs.
- **Emit `submission.json`**
  - Must conform to `schemas/submission.schema.json` (version: 1.0.0).
  - Sort cases and sinks alphabetically to ensure determinism.
  - Include optional runtime stats under `run` (`time_s`, `peak_mb`) if available.
- **Validate**

  ```sh
  python tools/validate.py --submission submission.json --schema schemas/submission.schema.json
  ```
- **Score locally**

  ```sh
  tools/scorer/rb_score.py --truth benchmark/truth/<aggregate>.json --submission submission.json --format json
  ```
- **Compare (optional)**

  ```sh
  tools/scorer/rb_compare.py --truth benchmark/truth/<aggregate>.json \
    --submissions submission.json baselines/*/submission.json \
    --output leaderboard.json --text
  ```
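The "sort cases and sinks alphabetically" requirement from the emit step can be sketched in Python. The field names below (`cases`, `id`, `sinks`) are illustrative assumptions only; the authoritative shape is defined by `schemas/submission.schema.json`.

```python
import json

def write_submission(results, path):
    """Serialize sink predictions with stable ordering.

    `results` maps case id -> list of predicted sinks. Field names are
    hypothetical; validate the real output against the schema.
    """
    payload = {
        "version": "1.0.0",
        "cases": sorted(
            ({"id": case_id, "sinks": sorted(sinks)}
             for case_id, sinks in results.items()),
            key=lambda c: c["id"],
        ),
    }
    with open(path, "w", encoding="utf-8") as fh:
        # sort_keys plus a trailing newline keeps output byte-identical
        # across runs, which is what the validator's determinism check wants
        json.dump(payload, fh, sort_keys=True, indent=2)
        fh.write("\n")

write_submission({"case-b": ["sinkZ", "sinkA"], "case-a": ["sink1"]},
                 "submission.json")
```

Running the writer twice over the same `results` dict produces identical bytes regardless of dict insertion order, since both the case list and each sink list are sorted before serialization.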
## Determinism checklist
- Set `SOURCE_DATE_EPOCH` for all builds.
- Disable telemetry/version checks in your analyzer.
- Avoid nondeterministic ordering (sort file and sink lists).
- No network access; use vendored toolchains only.
- Use fixed seeds for any sampling.
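A minimal sketch of the checklist items in code, assuming a Python-based analyzer; the epoch value and seed below are arbitrary examples, not benchmark-mandated values.

```python
import os
import random

# Pin a fixed build timestamp (arbitrary example: 2021-01-01T00:00:00Z).
os.environ["SOURCE_DATE_EPOCH"] = "1609459200"

# Fixed seed for any sampling, so repeated runs draw identical samples.
random.seed(42)
first = random.sample(range(1000), 10)
random.seed(42)
second = random.sample(range(1000), 10)
assert first == second  # identical across runs with the same seed

# Sort sink lists before emitting them to avoid nondeterministic ordering.
sinks = sorted({"deserialize", "exec", "fileWrite"})
```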
## Packaging
- Submit a zip/tar with:
  - `submission.json`
  - Tool version & configuration (README)
  - Optional logs and runtime metrics
- For production submissions, sign `submission.json` with DSSE and record the envelope under `signatures` in the manifest (see `benchmark/manifest.sample.json`).
- Do not include binaries that require network access or licenses we cannot redistribute.
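The archive itself can be made reproducible by pinning entry order, timestamps, and ownership at packaging time. This sketch assumes GNU tar (`--sort` and `--mtime` are GNU extensions), and the bundle contents are placeholders.

```sh
mkdir -p bundle
printf '{}\n' > bundle/submission.json
printf 'analyzer v1.0, default configuration\n' > bundle/README

# Fixed name order, fixed timestamp, neutral ownership =>
# byte-identical archives on repeated runs.
tar --sort=name --mtime='@1609459200' \
    --owner=0 --group=0 --numeric-owner \
    -cf submission-bundle.tar bundle
```

Re-running the `tar` command over the same inputs yields a byte-identical archive, so reviewers can verify the bundle hash independently.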
## Provenance & Manifest
- Reference kit manifest: `benchmark/manifest.sample.json` (schema: `benchmark/schemas/benchmark-manifest.schema.json`).
- Validate your bundle offline:

  ```sh
  python tools/verify_manifest.py benchmark/manifest.sample.json --root bench/reachability-benchmark
  ```

- Determinism templates: `benchmark/templates/determinism/*.env` can be sourced by build scripts per language.
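A build script can pick up a determinism template by exporting every variable it assigns. The template body written below is a hypothetical example created locally for illustration; real templates live under `benchmark/templates/determinism/`.

```sh
# Hypothetical template contents, written locally for this example.
cat > example.env <<'EOF'
SOURCE_DATE_EPOCH=1609459200
TZ=UTC
LC_ALL=C
EOF

set -a           # auto-export every variable the template assigns
. ./example.env
set +a

echo "$SOURCE_DATE_EPOCH $TZ $LC_ALL"
```

With `set -a` in effect, plain `VAR=value` assignments in the sourced file become exported environment variables visible to the build tools invoked afterwards.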
## Support
- Open issues in the public repo (once live) or provide a reproducible script that runs fully offline.