# Reachability Benchmark Gaps (G1–G12, RD1–RD10, RB1–RB10) — Remediation
Date: 2025-12-03

Status: IMPLEMENTED

This note closes BENCH-GAPS-513-018, DATASET-GAPS-513-019, and REACH-FIXTURE-GAPS-513-020 by defining manifest and schema updates, verification tooling, and operational guardrails.

## What changed
- **Benchmark kit manifest + schema**: `benchmark/schemas/benchmark-manifest.schema.json` defines signed/hashed entries for cases, truth, baselines, schemas, and tools. A sample manifest is at `benchmark/manifest.sample.json`.
- **Offline verifier**: `tools/verify_manifest.py` validates the manifest against local files (hashes, required entries, DSSE envelope presence) to keep runs deterministic and tamper-evident.
- **Coverage/trace schemas**: `schemas/coverage.schema.json` and `schemas/trace.schema.json` govern the oracle outputs referenced by manifest hashes.
- **Submission provenance checks**: the manifest requires SHA-256 hashes for the submission schema, the scorer package, and each baseline submission; a DSSE path is optional but encouraged.
- **Determinism env templates**: the manifest captures `sourceDateEpoch` and pinned per-tool versions; cases must provide build seeds in their case metadata.
- **Unreachability oracles**: truth files must include an explicit rationale for each unreachable case; the manifest enforces the presence of a `truth` artifact per case.
- **Sandbox/redaction guidance**: case metadata must declare `sandbox` and `redaction` policy fields (schema updated) to ensure PII removal and constrained execution.
- **Resource normalization**: the manifest records build/runtime resource limits (CPU/memory) for repeatable benchmarking.
- **Offline kit & checklist**: a dataset safety checklist lives at `benchmark/checklists/dataset-safety.md`; deterministic packaging is done via `tools/package_offline_kit.sh`.
- **Frozen baselines**: the Semgrep rulepack hash is pinned at `baselines/semgrep/rules.sha256`; the manifest supports hashed baseline submissions.

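The determinism and safety requirements above can be spot-checked before a run. This is a minimal sketch, not the schema-driven check in `tools/verify_manifest.py`: `sandbox`, `redaction`, `redaction.policy`, and `sourceDateEpoch` come from this note, while the `buildSeed` key, the `resources.cpu`/`resources.memory` layout, and the placement of `sourceDateEpoch` at the manifest's top level are hypothetical names chosen for illustration.

```python
from typing import Any


def missing_case_fields(case_meta: dict[str, Any]) -> list[str]:
    """Return required case-metadata fields that are absent, in a fixed order."""
    problems = []
    if "sandbox" not in case_meta:
        problems.append("sandbox")
    redaction = case_meta.get("redaction")
    if redaction is None:
        problems.append("redaction")
    elif not isinstance(redaction, dict) or "policy" not in redaction:
        # The gap mapping records licensing/PII redaction under `redaction.policy`.
        problems.append("redaction.policy")
    if "buildSeed" not in case_meta:  # hypothetical key for the per-case build seed
        problems.append("buildSeed")
    return problems


def missing_manifest_fields(manifest: dict[str, Any]) -> list[str]:
    """Return required determinism/resource fields missing from the manifest."""
    problems = []
    epoch = manifest.get("sourceDateEpoch")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(epoch, int) or isinstance(epoch, bool) or epoch < 0:
        problems.append("sourceDateEpoch")
    resources = manifest.get("resources") or {}  # hypothetical key for CPU/memory limits
    for key in ("cpu", "memory"):
        if key not in resources:
            problems.append(f"resources.{key}")
    return problems
```

For example, `missing_case_fields({"sandbox": {}})` returns `['redaction', 'buildSeed']`, so a pre-run hook can report every gap at once instead of stopping at the first.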
## How to use
```bash
python tools/verify_manifest.py benchmark/manifest.sample.json --root benchmark
```

- Fails on hash mismatches, missing artifacts, or schema violations.
- When the optional `--pubkey` is provided, DSSE envelopes are verified as well.

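A minimal sketch of the hash and presence checks listed above, assuming a hypothetical flat `entries` list of `path`/`sha256` objects rather than the real layout defined by `benchmark-manifest.schema.json`. It skips schema validation and DSSE verification, and its argument handling only mimics the invocation shown above; it is not the actual `tools/verify_manifest.py`.

```python
import argparse
import hashlib
import json
import sys
from pathlib import Path
from typing import Optional, Sequence


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large artifacts are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_entries(manifest: dict, root: Path) -> list:
    """Return one message per problem; an empty list means every entry matched."""
    errors = []
    # Hypothetical layout: {"entries": [{"path": "...", "sha256": "..."}]}
    for entry in manifest.get("entries", []):
        rel = entry["path"]
        target = root / rel
        if not target.is_file():
            errors.append(f"missing: {rel}")
        elif sha256_file(target) != entry["sha256"].lower():
            errors.append(f"hash mismatch: {rel}")
    return errors


def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Sketch of an offline manifest hash check")
    parser.add_argument("manifest")
    parser.add_argument("--root", default=".")
    args = parser.parse_args(argv)
    manifest = json.loads(Path(args.manifest).read_text(encoding="utf-8"))
    errors = verify_entries(manifest, Path(args.root))
    for err in errors:
        print(err, file=sys.stderr)
    # A non-zero status makes a CI step fail on any missing or tampered artifact.
    return 1 if errors else 0
```

Calling `sys.exit(main())` from a script entry point turns the returned status into the process exit code. Checking existence before hashing keeps "missing" and "tampered" distinct in the report.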
## Gap mapping (summary)
- **G1–G12 (benchmark gaps)**: addressed via manifest schema fields (attestations, submission provenance, determinism templates, coverage/trace schema refs), the offline verifier, and required resource/sandbox metadata.
- **RD1–RD10 (dataset gaps)**: a lockfile-style manifest with hashes for SBOMs, datasets, truth, and binaries; licensing/PII redaction captured via `redaction.policy`; a semantic version and changelog are required.
- **RB1–RB10 (fixture gaps)**: per-case truth and evidence entries are mandatory; the manifest enforces their presence and hashes; DSSE is optional but recorded; coverage/trace schema references are included.

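The RB1–RB10 rule that every case carries hashed truth and evidence entries reduces to a presence check per case. This sketch assumes a hypothetical manifest shape (a `cases` list whose items have an `id` and an `artifacts` map keyed by artifact kind); optional DSSE entries are not checked.

```python
from typing import Any

# Artifact kinds every case must carry, per the gap mapping; the key names are assumed.
REQUIRED_KINDS = ("truth", "evidence")


def cases_missing_artifacts(manifest: dict[str, Any]) -> dict[str, list[str]]:
    """Map each case id to the required artifact kinds that are absent or unhashed."""
    report: dict[str, list[str]] = {}
    for case in manifest.get("cases", []):
        artifacts = case.get("artifacts") or {}
        missing = [
            kind
            for kind in REQUIRED_KINDS
            # An entry without a sha256 value counts as missing: hashes are mandatory.
            if not (artifacts.get(kind) or {}).get("sha256")
        ]
        if missing:
            report[case["id"]] = missing
    return report
```

Returning a per-case report rather than a boolean lets CI list every incomplete case in one pass.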
## Follow-ups
- When new cases land, regenerate the manifest and rerun `tools/verify_manifest.py` in CI.
- For production releases, sign the manifest's DSSE envelope and set `signatures[]` accordingly.
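Both follow-ups can be sketched briefly. The example regenerates hash entries for files under an assumed `cases/` directory and serializes the manifest deterministically (sorted paths, sorted keys, fixed indentation), so rerunning it on unchanged inputs yields identical bytes. It then builds an unsigned DSSE envelope and the pre-authentication encoding (PAE) a signer would sign, following the DSSE v1 protocol. The `entries` layout, the `cases/` location, and the `application/json` payload type are assumptions; the signing call itself depends on the key tooling and is not shown.

```python
import base64
import hashlib
import json
from pathlib import Path


def build_entries(root: Path, subdir: str = "cases") -> list:
    """Hash every file under root/subdir; sort by relative POSIX path for stable output."""
    entries = []
    for path in (root / subdir).rglob("*"):
        if path.is_file():
            entries.append({
                "path": path.relative_to(root).as_posix(),
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            })
    entries.sort(key=lambda e: e["path"])
    return entries


def render_manifest(entries: list, source_date_epoch: int) -> bytes:
    """Serialize with sorted keys and a trailing newline so reruns are byte-identical."""
    manifest = {"sourceDateEpoch": source_date_epoch, "entries": entries}
    return (json.dumps(manifest, sort_keys=True, indent=2) + "\n").encode("utf-8")


def dsse_pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 pre-authentication encoding: the exact bytes a signer signs."""
    type_bytes = payload_type.encode("utf-8")
    return b" ".join([
        b"DSSEv1",
        str(len(type_bytes)).encode("ascii"),
        type_bytes,
        str(len(payload)).encode("ascii"),
        payload,
    ])


def unsigned_envelope(payload_type: str, payload: bytes) -> dict:
    """DSSE envelope skeleton; a signer appends {"keyid": ..., "sig": base64(sign(PAE))}."""
    return {
        "payloadType": payload_type,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [],
    }
```

Signing the PAE bytes rather than the raw manifest binds the payload type into the signature, which is the reason DSSE defines that encoding.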