up
41
bench/reachability-benchmark/docs/governance.md
Normal file
@@ -0,0 +1,41 @@
# Reachability Benchmark · Governance & Maintenance

## Roles
- **TAC (Technical Advisory Committee):** approves material changes to schemas, truth sets, and scoring rules; rotates quarterly.
- **Maintainers:** curate cases, review submissions, run determinism checks, and publish baselines.
- **Observers:** may propose cases and review reports; no merge rights.

## Release cadence
- **Quarterly update window:** publish new/updated cases and hidden test set refreshes once per quarter.
- **Patch releases:** critical fixes to schemas or the scorer may ship off-cycle; they must remain backward compatible within `version: 1.x`.
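One way to read the `version: 1.x` constraint is as a same-major-version rule. The sketch below assumes versions are plain `MAJOR.MINOR.PATCH` strings; `parse_version` and `is_backward_compatible` are hypothetical helpers for illustration, not part of the benchmark tooling.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_backward_compatible(released: str, candidate: str) -> bool:
    """True if candidate keeps the major version and does not move backwards."""
    old = parse_version(released)
    new = parse_version(candidate)
    return new[0] == old[0] and new >= old
```

Under this reading, `1.0.0` to `1.0.1` passes as an off-cycle patch, while `1.4.2` to `2.0.0` is rejected and would go through the breaking-change path.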

## Hidden test set
- A reserved set of cases is held back to prevent overfitting.
- Rotation policy: replace at least 25% of hidden cases each quarter; keep prior versions for audit.
- Hidden cases follow the same determinism rules; hashes and schema versions are documented internally.
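The 25% rotation floor is simple arithmetic, but rounding matters for small sets. The sketch below rounds up so the floor is always met and picks the cases to replace with a fixed seed. Both function names and the seed-based selection are assumptions for illustration; the source does not say how cases are chosen.

```python
import random


def min_rotation_count(hidden_total: int, percent: int = 25) -> int:
    """Smallest whole number of cases that is at least `percent`% of the set."""
    return (hidden_total * percent + 99) // 100  # integer ceiling


def pick_cases_to_rotate(case_ids: list[str], seed: int) -> list[str]:
    """Choose the cases to replace, reproducibly for a given seed."""
    ordered = sorted(case_ids)  # sort first so input order cannot change the pick
    count = min_rotation_count(len(ordered))
    return sorted(random.Random(seed).sample(ordered, count))
```

For a hidden set of 40 cases this yields 10 replacements per quarter; for 10 cases it yields 3, since 2 would fall below the 25% floor.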

## Change control
- All changes require:
  - Schema validation (`tools/validate.py`).
  - Deterministic rebuild (`tools/build/build_all.py` with `SOURCE_DATE_EPOCH`).
  - Updated truth files and baselines.
  - Execution log entry in `docs/implplan/SPRINT_0513_...` with date/owner.
- Breaking changes to schemas or scoring rules require TAC approval and a new major schema version.

## Determinism rules (global)
- No network access during build, analysis, or scoring.
- Fixed seeds and sorted outputs.
- Stable timestamps via `SOURCE_DATE_EPOCH`.
- Telemetry disabled for all tools.
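Two of these rules translate directly into code: take timestamps from `SOURCE_DATE_EPOCH` instead of the wall clock, and serialize with a fixed key order. The following is a minimal sketch with hypothetical helper names; it is not taken from the benchmark's own tools.

```python
import json
import os
from datetime import datetime, timezone


def build_timestamp() -> str:
    """ISO-8601 UTC timestamp taken from SOURCE_DATE_EPOCH, never the wall clock."""
    epoch = int(os.environ["SOURCE_DATE_EPOCH"])  # raises KeyError if unset
    return datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat()


def canonical_json(payload: dict) -> str:
    """Serialize with sorted keys and fixed separators so reruns are byte-identical."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))
```

Note that `sort_keys` orders object keys only; lists keep their given order, so any list whose order is not meaningful should be sorted before serialization.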

## Licensing & provenance
- All public artifacts are Apache-2.0.
- Third-party snippets must retain attribution and be license-compatible.
- Each release captures toolchain hashes (compilers, runners) in the release notes.
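Capturing toolchain hashes could look like the sketch below, which assumes SHA-256 over each tool binary and a `name  digest` line format. Both choices, and the helper names, are assumptions; the source does not specify the hash algorithm or the layout used in release notes.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large binaries are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def toolchain_manifest(tools: dict[str, Path]) -> list[str]:
    """One 'name  sha256' line per tool, sorted by name for stable release notes."""
    return [f"{name}  {sha256_of(path)}" for name, path in sorted(tools.items())]
```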

## Incident handling
- If a nondeterminism or licensing issue is found:
  1) Freeze new submissions.
  2) Reproduce with `ci/run-ci.sh`.
  3) Issue a hotfix release of truth/baselines; bump patch version.
  4) Announce in release notes and mark superseded artifacts.
59
bench/reachability-benchmark/docs/submission-guide.md
Normal file
@@ -0,0 +1,59 @@
# Reachability Benchmark · Submission Guide

This guide explains how to produce a compliant submission for the Stella Ops reachability benchmark. It is fully offline-friendly.

## Prerequisites
- Python 3.11+
- Your analyzer toolchain (no network calls during analysis)
- Schemas from `schemas/` and truth from `benchmark/truth/`

## Steps
1) **Build cases deterministically**
   ```bash
   python tools/build/build_all.py --cases cases
   ```
   - Sets `SOURCE_DATE_EPOCH`.
   - Skips Java by default if JDK is unavailable (pass `--skip-lang` as needed).

2) **Run your analyzer**
   - For each case, produce sink predictions in memory-safe JSON.
   - Do not reach out to the internet, package registries, or remote APIs.

3) **Emit `submission.json`**
   - Must conform to `schemas/submission.schema.json` (`version: 1.0.0`).
   - Sort cases and sinks alphabetically to ensure determinism.
   - Include optional runtime stats under `run` (`time_s`, `peak_mb`) if available.

4) **Validate**
   ```bash
   python tools/validate.py --submission submission.json --schema schemas/submission.schema.json
   ```

5) **Score locally**
   ```bash
   tools/scorer/rb_score.py --truth benchmark/truth/<aggregate>.json --submission submission.json --format json
   ```

6) **Compare (optional)**
   ```bash
   tools/scorer/rb_compare.py --truth benchmark/truth/<aggregate>.json \
     --submissions submission.json baselines/*/submission.json \
     --output leaderboard.json --text
   ```
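The sorting requirement in step 3 can be sketched as follows. The field names `cases`, `case_id`, `sinks`, and `sink_id` are assumptions made for illustration only; the authoritative names are whatever `schemas/submission.schema.json` defines, which is not reproduced here.

```python
import json


def normalize_submission(submission: dict) -> dict:
    """Return a copy with cases and sinks in alphabetical order.

    Field names (cases, case_id, sinks, sink_id) are hypothetical; take the
    real ones from schemas/submission.schema.json.
    """
    cases = []
    for case in submission["cases"]:
        sinks = sorted(case["sinks"], key=lambda sink: sink["sink_id"])
        cases.append({**case, "sinks": sinks})
    cases.sort(key=lambda case: case["case_id"])
    return {**submission, "cases": cases}


def write_submission(submission: dict, path: str = "submission.json") -> None:
    """Write the normalized submission with stable key order and line endings."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        json.dump(normalize_submission(submission), handle, indent=2, sort_keys=True)
        handle.write("\n")
```

Normalizing into a copy rather than sorting in place keeps the analyzer's raw output available for debugging, and the result should still be run through the validation command in step 4.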

## Determinism checklist
- Set `SOURCE_DATE_EPOCH` for all builds.
- Disable telemetry/version checks in your analyzer.
- Avoid nondeterministic ordering (sort file and sink lists).
- No network access; use vendored toolchains only.
- Use fixed seeds for any sampling.
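A quick self-check for this list is to produce the same output twice and compare digests. The sketch below assumes the analyzer output can be obtained as bytes from a callable; `is_deterministic` and `seeded_sample` are hypothetical names, and the sampling function only illustrates the fixed-seed item.

```python
import hashlib
import random


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def is_deterministic(produce, runs: int = 2) -> bool:
    """Call `produce()` several times and require identical bytes each time."""
    digests = {digest(produce()) for _ in range(runs)}
    return len(digests) == 1


def seeded_sample() -> bytes:
    """Sample with a fixed seed and a local generator, then sort the result."""
    rng = random.Random(1234)  # local generator, so global random state is untouched
    picked = sorted(rng.sample(range(100), 5))
    return ",".join(map(str, picked)).encode()
```

In practice `produce` might run the analyzer on one case and return the serialized predictions; passing such a check is evidence of determinism, not proof.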

## Packaging
- Submit a zip/tar with:
  - `submission.json`
  - Tool version & configuration (README)
  - Optional logs and runtime metrics
- Do **not** include binaries that require network access or licenses we cannot redistribute.
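If the archive itself should be reproducible, member order and file metadata need to be normalized as well. The guide does not require this; the sketch below is one possible approach using Python's standard `tarfile` module, with `pack_submission` as a hypothetical helper and a fixed mode of `0o644` chosen arbitrarily.

```python
import tarfile
from pathlib import Path


def pack_submission(files: list[Path], archive: Path, epoch: int = 0) -> None:
    """Write an uncompressed tar with sorted members and normalized metadata."""
    with tarfile.open(archive, "w", format=tarfile.GNU_FORMAT) as tar:
        for path in sorted(files, key=lambda p: p.name):
            info = tar.gettarinfo(str(path), arcname=path.name)
            info.mtime = epoch  # e.g. the value of SOURCE_DATE_EPOCH
            info.mode = 0o644   # do not leak local permissions or umask
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            with path.open("rb") as handle:
                tar.addfile(info, handle)
```

The archive is left uncompressed because compressed containers can embed their own timestamps; if compression is needed, that layer has to be made reproducible separately.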

## Support
- Open issues in the public repo (once live) or provide a reproducible script that runs fully offline.