Reachability Benchmark · Governance & Maintenance
Roles
- TAC (Technical Advisory Committee): approves material changes to schemas, truth sets, and scoring rules; membership rotates quarterly.
- Maintainers: curate cases, review submissions, run determinism checks, and publish baselines.
- Observers: may propose cases and review reports; no merge rights.
Release cadence
- Quarterly update window: publish new/updated cases and hidden test set refreshes once per quarter.
- Patch releases: critical fixes to schemas or the scorer may ship off-cycle, but they must remain backward compatible within the current major version (version: 1.x).
Hidden test set
- A reserved set of cases is held back to prevent overfitting.
- Rotation policy: replace at least 25% of hidden cases each quarter; keep prior versions for audit.
- Hidden cases follow the same determinism rules; hashes and schema versions are documented internally.
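One way to make the quarterly rotation itself reproducible is sketched below in Python, the language of the repository's tools/ scripts. It assumes hidden cases are identified by string IDs and that each quarter gets its own seed. The helper name select_cases_to_rotate and the seed format are hypothetical and not part of the benchmark tooling. Rounding up with ceil keeps the rotation at or above the 25% floor.

```python
import math
import random


def select_cases_to_rotate(hidden_ids, quarter_seed, fraction=0.25):
    """Pick which hidden cases to retire this quarter.

    Hypothetical helper: the ID format and quarter_seed are illustrative
    assumptions, not part of the benchmark's actual tooling.
    """
    ids = sorted(hidden_ids)             # sort first so input order cannot change the result
    k = math.ceil(fraction * len(ids))   # "at least 25%": round up, never down
    rng = random.Random(quarter_seed)    # fixed per-quarter seed -> same picks on every run
    return sorted(rng.sample(ids, k))    # sorted output for stable logs and diffs
```

With 10 hidden cases, 25% is 2.5, so 3 cases are rotated. Shuffling the input list does not change the selection, because IDs are sorted before sampling.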
Change control
- All changes require:
  - Schema validation (tools/validate.py).
  - Deterministic rebuild (tools/build/build_all.py with SOURCE_DATE_EPOCH).
  - Updated truth files and baselines.
  - Execution log entry in docs/implplan/SPRINT_0513_... with date/owner.
- Breaking changes to schemas or scoring rules require TAC approval and a new major schema version.
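A deterministic rebuild can be checked by building twice with the same SOURCE_DATE_EPOCH and comparing a digest of each output tree. The sketch below covers only the comparison step. tree_digest is a hypothetical helper, and how tools/build/build_all.py is pointed at separate output directories is not specified here.

```python
import hashlib
import os


def tree_digest(root):
    """Return one SHA-256 over every file under root (relative paths and contents).

    Hypothetical helper. Directories and files are visited in sorted order,
    so the digest does not depend on filesystem listing order.
    """
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # os.walk respects an in-place sort of dirnames
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            h.update(rel.encode("utf-8") + b"\0")  # path first, NUL-separated
            with open(path, "rb") as f:
                h.update(hashlib.sha256(f.read()).digest())  # fixed-length content digest
    return h.hexdigest()
```

If tree_digest("out-a") differs from tree_digest("out-b") for two builds of the same commit, the deterministic-rebuild requirement is not met.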
Determinism rules (global)
- No network access during build, analysis, or scoring.
- Fixed seeds and sorted outputs.
- Stable timestamps via SOURCE_DATE_EPOCH.
- Telemetry disabled for all tools.
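The rules on seeds, ordering, and timestamps might look like this inside a Python report writer. This is a minimal sketch: the seed value 1337, the report fields, and the function names are illustrative assumptions. Network isolation and disabled telemetry are not shown; they are typically enforced by the build environment and tool configuration rather than in report code.

```python
import json
import os
import random
from datetime import datetime, timezone


def build_timestamp():
    """Derive the report timestamp from SOURCE_DATE_EPOCH, never the wall clock."""
    epoch = int(os.environ["SOURCE_DATE_EPOCH"])  # raises KeyError if unset instead of using now()
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def render_report(findings, seed=1337):
    """Render findings as JSON that is byte-identical across runs.

    Hypothetical report layout; seed=1337 is an illustrative fixed seed.
    """
    ordered = sorted(findings)                  # sorted output, independent of discovery order
    rng = random.Random(seed)                   # fixed seed: any sampling is repeatable
    spot_check = sorted(rng.sample(ordered, min(3, len(ordered))))
    report = {
        "generated_at": build_timestamp(),
        "findings": ordered,
        "spot_check": spot_check,
    }
    return json.dumps(report, sort_keys=True, indent=2) + "\n"  # sorted keys, fixed layout
```

Two runs that discover the same findings in different orders produce the same bytes, which is what makes output hashes comparable across rebuilds.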
Licensing & provenance
- All public artifacts are licensed under Apache-2.0.
- Third-party snippets must retain attribution and be license-compatible.
- Each release captures toolchain hashes (compilers, runners) in the release notes.
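Toolchain hashes could be recorded as sha256sum-style lines (hex digest, two spaces, path), sorted by path so that the block pasted into the release notes is itself stable. The helper below is a hypothetical sketch; which binaries count as the toolchain is left to the maintainers.

```python
import hashlib


def toolchain_manifest(paths):
    """Return sha256sum-style lines ("<hex>  <path>"), one per file, sorted by path.

    Hypothetical helper for release notes; the caller chooses which
    compiler and runner binaries to include.
    """
    lines = []
    for path in sorted(paths):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            # Hash in 64 KiB chunks so large compiler binaries are not read into memory at once.
            for chunk in iter(lambda: f.read(1 << 16), b""):
                h.update(chunk)
        lines.append(f"{h.hexdigest()}  {path}")
    return "\n".join(lines) + "\n"
```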
Incident handling
- If a nondeterminism or licensing issue is found:
  - Freeze new submissions.
  - Reproduce with ci/run-ci.sh.
  - Issue a hotfix release of truth/baselines; bump patch version.
  - Announce in release notes and mark superseded artifacts.
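Bumping the patch version for a hotfix is mechanical if releases use plain MAJOR.MINOR.PATCH strings. That format is an assumption here, so pre-release or build suffixes are rejected. The helper name bump_patch is hypothetical.

```python
import re

# Assumed version format: plain MAJOR.MINOR.PATCH, no pre-release/build suffix.
_SEMVER = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def bump_patch(version):
    """Return the next patch version for a hotfix of truth files/baselines."""
    m = _SEMVER.fullmatch(version)
    if m is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(g) for g in m.groups())
    return f"{major}.{minor}.{patch + 1}"  # MAJOR and MINOR are left untouched
```

Because only PATCH changes, a hotfix stays within the same major version, consistent with the rule that patch releases remain backward compatible within 1.x.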