# Reachability Benchmark · Submission Guide
This guide explains how to produce a compliant submission for the Stella Ops reachability benchmark. The entire workflow is offline-friendly.
## Prerequisites
- Python 3.11+
- Your analyzer toolchain (no network calls during analysis)
- Schemas from `schemas/` and truth from `benchmark/truth/`
## Steps
1. Build cases deterministically
   - Run `python tools/build/build_all.py --cases cases`.
   - Sets `SOURCE_DATE_EPOCH`.
   - Skips Java by default if a JDK is unavailable (pass `--skip-lang` as needed).
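The deterministic-build step above can be sketched as a small wrapper. `deterministic_env` is a hypothetical helper, not part of the benchmark tooling; the flag and script path come from the step itself:

```python
import os
import subprocess

def deterministic_env(epoch: int = 0) -> dict:
    """Return a copy of the environment with SOURCE_DATE_EPOCH pinned.

    A fixed epoch keeps embedded timestamps identical across runs,
    which is what makes the case builds byte-for-byte reproducible.
    """
    env = dict(os.environ)
    env["SOURCE_DATE_EPOCH"] = str(epoch)
    return env

# Usage (commented out: requires the benchmark checkout):
# subprocess.run(["python", "tools/build/build_all.py", "--cases", "cases"],
#                env=deterministic_env(), check=True)
```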
2. Run your analyzer
   - For each case, produce sink predictions in memory-safe JSON.
   - Do not reach out to the internet, package registries, or remote APIs.
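One way to enforce the no-network rule during analysis is to disable socket creation at analyzer startup. This `block_network` helper is an illustrative sketch, not part of the toolchain:

```python
import socket

def block_network() -> None:
    """Make any socket creation raise, so accidental network use fails fast.

    Call once at analyzer startup, before any analysis runs. Any code path
    that tries to open a connection gets a RuntimeError instead of traffic.
    """
    def _deny(*args, **kwargs):
        raise RuntimeError("network access is forbidden during analysis")

    socket.socket = _deny              # blocks raw socket creation
    socket.create_connection = _deny   # blocks the common high-level path
```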
3. Emit `submission.json`
   - Must conform to `schemas/submission.schema.json` (version: 1.0.0).
   - Sort cases and sinks alphabetically to ensure determinism.
   - Include optional runtime stats under `run` (time_s, peak_mb) if available.
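A minimal sketch of deterministic emission, assuming a simple `case id -> sink list` mapping. The field names (`cases`, `id`, `sinks`) are illustrative; `schemas/submission.schema.json` remains the authoritative shape:

```python
import json

def write_submission(cases, path="submission.json"):
    """Emit predictions deterministically.

    Cases are sorted by id, sinks are deduplicated and sorted, and keys
    are sorted, so repeated runs produce identical bytes.
    """
    doc = {
        "version": "1.0.0",
        "cases": [
            {"id": case_id, "sinks": sorted(set(sinks))}
            for case_id, sinks in sorted(cases.items())
        ],
    }
    text = json.dumps(doc, indent=2, sort_keys=True) + "\n"
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
    return text
```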
4. Validate
   - Run `python tools/validate.py --submission submission.json --schema schemas/submission.schema.json`.
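If you want to gate packaging on validation, a hypothetical wrapper around the provided script can check its exit code; the `--submission`/`--schema` flags are from the command above:

```python
import subprocess
import sys

def validate(submission="submission.json",
             schema="schemas/submission.schema.json",
             script="tools/validate.py"):
    """Return True when the schema validator exits cleanly (code 0)."""
    result = subprocess.run(
        [sys.executable, script, "--submission", submission, "--schema", schema]
    )
    return result.returncode == 0
```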
5. Score locally
   - Run `tools/scorer/rb_score.py --truth benchmark/truth/<aggregate>.json --submission submission.json --format json`.
6. Compare (optional)

   ```
   tools/scorer/rb_compare.py --truth benchmark/truth/<aggregate>.json \
     --submissions submission.json baselines/*/submission.json \
     --output leaderboard.json --text
   ```
## Determinism checklist
- Set `SOURCE_DATE_EPOCH` for all builds.
- Disable telemetry/version checks in your analyzer.
- Avoid nondeterministic ordering (sort file and sink lists).
- No network access; use vendored toolchains only.
- Use fixed seeds for any sampling.
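The sampling and ordering items on the checklist can be made concrete: sort inputs first, then draw with a local fixed-seed RNG. The helper name is illustrative:

```python
import random

def sample_cases(case_ids, k, seed=1234):
    """Deterministic sampling: sort inputs, then draw with a fixed seed.

    Sorting removes any dependence on filesystem enumeration order, and
    the seeded local RNG (never the global one) makes the draw itself
    reproducible across runs.
    """
    rng = random.Random(seed)
    ordered = sorted(case_ids)
    return rng.sample(ordered, k)
```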
## Packaging
- Submit a zip/tar with:
  - `submission.json`
  - Tool version & configuration (README)
  - Optional logs and runtime metrics
- Do not include binaries that require network access or licenses we cannot redistribute.
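The archive itself can be made byte-for-byte reproducible by pinning member order and timestamps. This `pack_submission` sketch (hypothetical helper) relies on `zipfile.ZipInfo` defaulting to a fixed 1980-01-01 timestamp when no date is given:

```python
import zipfile

def pack_submission(paths, archive="submission.zip"):
    """Write a zip whose bytes are stable across runs.

    Member order is pinned by sorting, and each entry gets ZipInfo's
    default 1980-01-01 timestamp instead of the file's mtime, so the
    archive hash does not change between rebuilds.
    """
    with zipfile.ZipFile(archive, "w") as zf:
        for p in sorted(paths):
            with open(p, "rb") as fh:
                data = fh.read()
            info = zipfile.ZipInfo(p)   # fixed timestamp, fixed attributes
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, data)
```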
## Support
- Open issues in the public repo (once live) or provide a reproducible script that runs fully offline.