# Reachability Benchmark · Submission Guide
This guide explains how to produce a compliant submission for the Stella Ops reachability benchmark. The entire workflow is offline-friendly.
## Prerequisites
- Python 3.11+
- Your analyzer toolchain (no network calls during analysis)
- Schemas from `schemas/` and ground-truth data from `benchmark/truth/`
## Steps
1) **Build cases deterministically**
```bash
python tools/build/build_all.py --cases cases
```
- The script sets `SOURCE_DATE_EPOCH` so build timestamps are reproducible (see the wrapper sketch below).
- It skips Java by default if no JDK is available (pass `--skip-lang` as needed).
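If you drive the build from your own harness, a minimal sketch like the following keeps the timestamp pin explicit (the epoch value `0` is an arbitrary assumption; any fixed value works):

```python
import os
import subprocess

# Pin SOURCE_DATE_EPOCH before launching the build so any embedded
# timestamps are reproducible across machines and runs.
env = dict(os.environ)
env.setdefault("SOURCE_DATE_EPOCH", "0")

subprocess.run(
    ["python", "tools/build/build_all.py", "--cases", "cases"],
    env=env,
    check=True,
)
```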
2) **Run your analyzer**
- For each case, produce sink predictions as JSON.
- Do not reach out to the internet, package registries, or remote APIs.
3) **Emit `submission.json`**
- Must conform to `schemas/submission.schema.json` (`version: 1.0.0`).
- Sort cases and sinks alphabetically to ensure determinism (illustrated in the sketch after this step).
- Include optional runtime stats under `run` (`time_s`, `peak_mb`) if available.
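The authoritative field list lives in `schemas/submission.schema.json`; the sketch below only illustrates the determinism requirements. Apart from `version` and `run` (named in this guide), the per-case field names are hypothetical.

```python
import json

# Hypothetical predictions: case id -> predicted sink identifiers.
predictions = {
    "case-002": ["sink-b", "sink-a"],
    "case-001": ["sink-c"],
}

submission = {
    "version": "1.0.0",
    "cases": sorted(
        ({"id": cid, "sinks": sorted(sinks)} for cid, sinks in predictions.items()),
        key=lambda c: c["id"],
    ),
    "run": {"time_s": 42.0, "peak_mb": 512},  # optional runtime stats
}

with open("submission.json", "w", encoding="utf-8") as f:
    # sort_keys plus a fixed indent keeps the byte output stable.
    json.dump(submission, f, indent=2, sort_keys=True)
    f.write("\n")
```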
4) **Validate**
```bash
python tools/validate.py --submission submission.json --schema schemas/submission.schema.json
```
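If you would rather validate inside your own pipeline, an equivalent check with the `jsonschema` package looks like this (an assumption: the repository's `tools/validate.py` may use a different validator, and the package must be vendored for offline use):

```python
import json

from jsonschema import validate

with open("submission.json", encoding="utf-8") as f:
    instance = json.load(f)
with open("schemas/submission.schema.json", encoding="utf-8") as f:
    schema = json.load(f)

# Raises jsonschema.ValidationError with a pointer to the offending field.
validate(instance=instance, schema=schema)
print("submission.json conforms to the schema")
```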
5) **Score locally**
```bash
python tools/scorer/rb_score.py --truth benchmark/truth/<aggregate>.json --submission submission.json --format json
```
6) **Compare (optional)**
```bash
python tools/scorer/rb_compare.py --truth benchmark/truth/<aggregate>.json \
--submissions submission.json baselines/*/submission.json \
--output leaderboard.json --text
```
## Determinism checklist
- Set `SOURCE_DATE_EPOCH` for all builds.
- Disable telemetry/version checks in your analyzer.
- Avoid nondeterministic ordering (sort file and sink lists).
- No network access; use vendored toolchains only.
- Use fixed seeds for any sampling (see the sketch below).
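For the seed item, a minimal illustration (the value `0` is arbitrary; what matters is that it never changes between runs):

```python
import random

# Seed every RNG library your analyzer touches (random, numpy, ...).
# Note: PYTHONHASHSEED must be fixed in the launch environment, not here.
random.seed(0)
```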
## Packaging
- Submit a zip or tar archive (a reproducible-packaging sketch follows this list) containing:
  - `submission.json`
  - A README noting the tool version and configuration
  - Optional logs and runtime metrics
- For production submissions, sign `submission.json` with DSSE and record the envelope under `signatures` in the manifest (see `benchmark/manifest.sample.json`).
- Do **not** include binaries that require network access or licenses we cannot redistribute.
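A hedged sketch of reproducible packaging (file names are assumptions; plain tar is used because gzip embeds its own timestamp in the compressed stream):

```python
import os
import tarfile

epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))

def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    # Strip machine-specific metadata so the archive bytes are stable.
    info.mtime = epoch
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info

with tarfile.open("submission.tar", "w") as tar:
    for path in sorted(["README.md", "submission.json"]):  # stable order
        tar.add(path, filter=normalize)
```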
## Provenance & Manifest
- Reference kit manifest: `benchmark/manifest.sample.json` (schema: `benchmark/schemas/benchmark-manifest.schema.json`).
- Validate your bundle offline:
```bash
python tools/verify_manifest.py benchmark/manifest.sample.json --root bench/reachability-benchmark
```
- Determinism templates: per-language build scripts can source `benchmark/templates/determinism/*.env`.
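If your build harness is Python rather than shell, a minimal loader for those templates might look like the following (the file name and the plain `KEY=VALUE` format are assumptions):

```python
import os
from pathlib import Path

def load_env(path: str) -> None:
    # Apply a determinism template: skip comments/blanks, honor an
    # optional leading "export ", and inject KEY=VALUE into the env.
    for line in Path(path).read_text().splitlines():
        line = line.strip().removeprefix("export ")
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        os.environ[key.strip()] = value.strip()

load_env("benchmark/templates/determinism/python.env")  # hypothetical name
```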
## Support
- Open an issue in the public repo (once it is live), or provide a reproducible script that runs fully offline.