# Reachability Benchmark · Submission Guide

This guide explains how to produce a compliant submission for the Stella Ops reachability benchmark. It is fully offline-friendly.

## Prerequisites

- Python 3.11+
- Your analyzer toolchain (no network calls during analysis)
- Schemas from `schemas/` and truth from `benchmark/truth/`

## Steps

1) **Build cases deterministically**
```bash
python tools/build/build_all.py --cases cases
```
- Sets `SOURCE_DATE_EPOCH`.
- Uses vendored Temurin 21 via `tools/java/ensure_jdk.sh` when `JAVA_HOME`/`javac` are missing; pass `--skip-lang` if another toolchain is unavailable on your runner.

2) **Run your analyzer**
- For each case, produce sink predictions in memory-safe JSON.
- Do not reach out to the internet, package registries, or remote APIs.

3) **Emit `submission.json`**
- Must conform to `schemas/submission.schema.json` (`version: 1.0.0`).
- Sort cases and sinks alphabetically to ensure determinism.
- Include optional runtime stats under `run` (`time_s`, `peak_mb`) if available.

4) **Validate**
```bash
python tools/validate.py --submission submission.json --schema schemas/submission.schema.json
```

5) **Score locally**
```bash
tools/scorer/rb_score.py --truth benchmark/truth/<aggregate>.json --submission submission.json --format json
```

6) **Compare (optional)**
```bash
tools/scorer/rb_compare.py --truth benchmark/truth/<aggregate>.json \
  --submissions submission.json baselines/*/submission.json \
  --output leaderboard.json --text
```
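
The sorting rule in step 3 can be sketched as follows. This is only an illustration: the field names `case_id`, `sink_id`, and `prediction`, the helper names, and the placement of `version` as a top-level key are assumptions, not taken from `schemas/submission.schema.json`, so check the schema before adapting it. Only `run`, `time_s`, and `peak_mb` come from this guide.

```python
import json


def build_submission(predictions, run_stats=None):
    """Build a submission dict with cases and sinks in alphabetical order.

    `predictions` maps a case id to a mapping of sink id -> prediction.
    All field names below except `run`, `time_s`, `peak_mb` are hypothetical.
    """
    cases = []
    for case_id in sorted(predictions):
        sinks = [
            {"sink_id": sink_id, "prediction": predictions[case_id][sink_id]}
            for sink_id in sorted(predictions[case_id])
        ]
        cases.append({"case_id": case_id, "sinks": sinks})

    submission = {"version": "1.0.0", "cases": cases}
    if run_stats is not None:
        # Optional runtime stats, as described in step 3.
        submission["run"] = {
            "time_s": run_stats["time_s"],
            "peak_mb": run_stats["peak_mb"],
        }
    return submission


def dump_submission(submission):
    """Serialize with sorted keys and fixed indentation for byte-stable output."""
    return json.dumps(submission, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


# Two inputs with the same content but different insertion order.
first = {
    "case-b": {"sink-2": "unreachable", "sink-1": "reachable"},
    "case-a": {"sink-1": "reachable"},
}
second = {
    "case-a": {"sink-1": "reachable"},
    "case-b": {"sink-1": "reachable", "sink-2": "unreachable"},
}
stats = {"time_s": 1.5, "peak_mb": 128}
text_first = dump_submission(build_submission(first, stats))
text_second = dump_submission(build_submission(second, stats))
```

Both inputs serialize to the same text, which is the property the sorting rule is meant to guarantee. When writing the result to disk, opening the file with `encoding="utf-8"` and `newline="\n"` keeps the bytes identical across platforms.
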

## Determinism checklist

- Set `SOURCE_DATE_EPOCH` for all builds.
- Disable telemetry/version checks in your analyzer.
- Avoid nondeterministic ordering (sort file and sink lists).
- No network access; use vendored toolchains only.
- Use fixed seeds for any sampling.
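
For an analyzer driven from Python, several items in the checklist above can be enforced in-process. The sketch below is one possible approach, not part of the benchmark tooling: the function names, the epoch value, and the seed are all hypothetical. The socket guard only affects Python code in the current process; subprocesses and native code need OS-level isolation, and disabling telemetry remains analyzer-specific.

```python
import os
import random
import socket


def apply_determinism_guards(source_date_epoch, seed):
    """Pin the build timestamp, seed the global RNG, and refuse socket connects."""
    # Keep a value already exported by the build scripts; otherwise set one
    # so that child processes inherit it.
    os.environ.setdefault("SOURCE_DATE_EPOCH", str(source_date_epoch))
    random.seed(seed)

    def _refuse(*_args, **_kwargs):
        raise RuntimeError("network access is disabled during benchmark runs")

    # Patch the Python-level socket class so any connect attempt fails loudly.
    socket.socket.connect = _refuse
    socket.socket.connect_ex = _refuse


def pick_sample(items, k, seed):
    """Sample k items reproducibly, independent of the input's iteration order."""
    rng = random.Random(seed)  # local RNG, unaffected by other callers
    ordered = sorted(items)    # fix the order before sampling
    return sorted(rng.sample(ordered, k))


# Hypothetical values chosen only for the example.
apply_determinism_guards(source_date_epoch=0, seed=1234)
sample_from_set = pick_sample({"a.java", "b.java", "c.java", "d.java"}, 2, seed=7)
sample_from_list = pick_sample(["d.java", "c.java", "b.java", "a.java"], 2, seed=7)
```

Sorting before sampling matters because a seeded RNG only gives repeatable results when it is fed the same sequence; sets and directory listings do not guarantee one.
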

## Packaging

- Submit a zip/tar with:
  - `submission.json`
  - Tool version & configuration (README)
  - Optional logs and runtime metrics
- For production submissions, sign `submission.json` with DSSE and record the envelope under `signatures` in the manifest (see `benchmark/manifest.sample.json`).
- Do **not** include binaries that require network access or licenses we cannot redistribute.
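
If the archive itself should be reproducible, the tar can be built with normalized metadata. The following is a minimal sketch using Python's standard `tarfile` module; the helper name `build_bundle`, the flat archive layout, and the file names in the demo are assumptions for illustration, and the guide does not state that archives must be byte-identical.

```python
import os
import tarfile
import tempfile


def build_bundle(paths, output_path, mtime=0):
    """Write an uncompressed tar whose bytes depend only on file names and contents."""

    def normalize(info):
        # Drop host-specific metadata so two machines produce the same archive.
        info.mtime = mtime
        info.uid = info.gid = 0
        info.uname = info.gname = ""
        info.mode = 0o644
        return info

    with tarfile.open(output_path, mode="w") as tar:
        # Sort by archive name so the entry order never depends on the caller.
        for path in sorted(paths, key=os.path.basename):
            tar.add(path, arcname=os.path.basename(path),
                    recursive=False, filter=normalize)


# Demo with throwaway files: different input order and different file
# timestamps still give identical archives.
workdir = tempfile.mkdtemp()
submission_path = os.path.join(workdir, "submission.json")
readme_path = os.path.join(workdir, "README.md")
with open(submission_path, "w", encoding="utf-8") as handle:
    handle.write("{}\n")
with open(readme_path, "w", encoding="utf-8") as handle:
    handle.write("tool version and configuration\n")

bundle_one = os.path.join(workdir, "one.tar")
bundle_two = os.path.join(workdir, "two.tar")
build_bundle([submission_path, readme_path], bundle_one)
os.utime(submission_path, (1000, 1000))  # change the file's mtime on disk
build_bundle([readme_path, submission_path], bundle_two)

with open(bundle_one, "rb") as handle:
    bytes_one = handle.read()
with open(bundle_two, "rb") as handle:
    bytes_two = handle.read()
with tarfile.open(bundle_one) as tar:
    entry_names = tar.getnames()
```

The sketch leaves the archive uncompressed on purpose; if compression is added, the compressor's own header timestamp would need to be pinned as well.
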

## Provenance & Manifest

- Reference kit manifest: `benchmark/manifest.sample.json` (schema: `benchmark/schemas/benchmark-manifest.schema.json`).
- Validate your bundle offline:
```bash
python tools/verify_manifest.py benchmark/manifest.sample.json --root bench/reachability-benchmark
```
- Determinism templates: `benchmark/templates/determinism/*.env` can be sourced by build scripts per language.

## Support

- Open issues in the public repo (once live) or provide a reproducible script that runs fully offline.