Reachability Benchmark · Submission Guide

This guide explains how to produce a compliant submission for the Stella Ops reachability benchmark. It is fully offline-friendly.

Prerequisites

  • Python 3.11+
  • Your analyzer toolchain (no network calls during analysis)
  • Schemas from schemas/ and truth from benchmark/truth/

Steps

  1. Build cases deterministically

    python tools/build/build_all.py --cases cases
    
    • Sets SOURCE_DATE_EPOCH.
    • Uses vendored Temurin 21 via tools/java/ensure_jdk.sh when JAVA_HOME/javac are missing; pass --skip-lang if some other language toolchain is unavailable on your runner.
  2. Run your analyzer

    • For each case, produce sink predictions as JSON that you will assemble into submission.json.
    • Do not reach out to the internet, package registries, or remote APIs.
  3. Emit submission.json

    • Must conform to schemas/submission.schema.json (version: 1.0.0).
    • Sort cases and sinks alphabetically to ensure determinism (see the emission sketch after this list).
    • Include optional runtime stats under run (time_s, peak_mb) if available.
  4. Validate

    python tools/validate.py --submission submission.json --schema schemas/submission.schema.json
    
  5. Score locally

    tools/scorer/rb_score.py --truth benchmark/truth/<aggregate>.json --submission submission.json --format json
    
  6. Compare (optional)

    tools/scorer/rb_compare.py --truth benchmark/truth/<aggregate>.json \
      --submissions submission.json baselines/*/submission.json \
      --output leaderboard.json --text
    

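The field names in the sketch below (case, sinks, run) are assumptions for illustration only; schemas/submission.schema.json is the authoritative definition. It shows one way to emit step 3's submission.json with deterministic ordering:

    # emit_submission.py: hedged sketch. Field names are assumptions;
    # schemas/submission.schema.json is the authoritative definition.
    import json

    def emit_submission(results, out_path="submission.json"):
        # results: {case_id: [{"sink": "...", "reachable": true/false}, ...]}
        cases = []
        for case_id in sorted(results):                              # cases in alphabetical order
            sinks = sorted(results[case_id], key=lambda s: s["sink"])  # sinks sorted too
            cases.append({"case": case_id, "sinks": sinks})
        doc = {"version": "1.0.0", "cases": cases}
        with open(out_path, "w", encoding="utf-8") as fh:
            json.dump(doc, fh, indent=2, sort_keys=True)             # stable key order
            fh.write("\n")

Run tools/validate.py (step 4) on the result before scoring; if validation flags a field, trust the schema over this sketch.
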
Determinism checklist

  • Set SOURCE_DATE_EPOCH for all builds.
  • Disable telemetry/version checks in your analyzer.
  • Avoid nondeterministic ordering (sort file and sink lists).
  • No network access; use vendored toolchains only.
  • Use fixed seeds for any sampling.
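
A minimal sketch of applying several of these items around the build from step 1; SOURCE_DATE_EPOCH and PYTHONHASHSEED are real environment variables, while the chosen values and the TZ pin are assumptions about a typical offline runner:

    # run_deterministic.py: hedged sketch of pinning a reproducible environment.
    import os, random, subprocess

    os.environ["SOURCE_DATE_EPOCH"] = "1700000000"  # fixed timestamp for all builds
    os.environ["PYTHONHASHSEED"] = "0"              # stable hashing in child Python tools
    os.environ["TZ"] = "UTC"                        # avoid timezone-dependent output
    random.seed(0)                                  # fixed seed for any sampling here

    # Child processes inherit the pinned environment; no network access is made.
    subprocess.run(["python", "tools/build/build_all.py", "--cases", "cases"], check=True)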

Packaging

  • Submit a zip/tar with:
    • submission.json
    • Tool version & configuration (README)
    • Optional logs and runtime metrics
  • For production submissions, sign submission.json with DSSE and record the envelope under signatures in the manifest (see benchmark/manifest.sample.json and the envelope sketch after this list).
  • Do not include binaries that require network access or licenses we cannot redistribute.
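
The envelope shape below follows the public DSSE v1 specification; the generated key, keyid, and payload type are placeholders (and the third-party cryptography package is used only as an example), so substitute your real signing key and whatever payload type the manifest expects:

    # dsse_sign.py: hedged sketch of wrapping submission.json in a DSSE envelope.
    import base64, json
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    def pae(payload_type: bytes, payload: bytes) -> bytes:
        # DSSE v1 Pre-Authentication Encoding: the bytes that actually get signed.
        return b"DSSEv1 %d %s %d %s" % (len(payload_type), payload_type,
                                        len(payload), payload)

    payload = open("submission.json", "rb").read()
    payload_type = b"application/vnd.in-toto+json"      # assumed payload type

    key = Ed25519PrivateKey.generate()                  # placeholder; use your real key
    sig = key.sign(pae(payload_type, payload))

    envelope = {
        "payload": base64.b64encode(payload).decode(),
        "payloadType": payload_type.decode(),
        "signatures": [{"keyid": "example-key", "sig": base64.b64encode(sig).decode()}],
    }
    with open("submission.dsse.json", "w", encoding="utf-8") as fh:
        json.dump(envelope, fh, indent=2)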

Provenance & Manifest

  • Reference kit manifest: benchmark/manifest.sample.json (schema: benchmark/schemas/benchmark-manifest.schema.json).
  • Validate your bundle offline:
    python tools/verify_manifest.py benchmark/manifest.sample.json --root bench/reachability-benchmark
    
  • Determinism templates: benchmark/templates/determinism/*.env can be sourced by build scripts per language.
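
A hedged sketch of applying one of those templates before a build, assuming the .env files hold plain KEY=VALUE lines (the java.env filename is only an example):

    # apply_determinism_env.py: hedged sketch; assumes KEY=VALUE lines.
    import os, subprocess

    def apply_env_template(path):
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue                              # skip blanks and comments
                key, _, value = line.partition("=")
                os.environ[key.strip()] = value.strip().strip('"')

    apply_env_template("benchmark/templates/determinism/java.env")  # example filename
    subprocess.run(["python", "tools/build/build_all.py", "--cases", "cases"], check=True)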

Support

  • Open issues in the public repo (once live) or provide a reproducible script that runs fully offline.