Reachability Benchmark Gaps (G1–G12, RD1–RD10, RB1–RB10) — Remediation
Date: 2025-12-03 Status: IMPLEMENTED
This note closes BENCH-GAPS-513-018, DATASET-GAPS-513-019, and REACH-FIXTURE-GAPS-513-020 by defining manifest/schema updates, verification tooling, and operational guardrails.
What changed
- Benchmark kit manifest + schema: `benchmark/schemas/benchmark-manifest.schema.json` with signed/hashed entries for cases, truth, baselines, schemas, and tools. Sample at `benchmark/manifest.sample.json`.
- Offline verifier: `tools/verify_manifest.py` validates the manifest against local files (hashes, required entries, DSSE envelope presence) to keep runs deterministic and tamper-evident.
- Coverage/trace schemas: `schemas/coverage.schema.json` and `schemas/trace.schema.json` govern oracle outputs referenced by manifest hashes.
- Submission provenance checks: manifest requires SHA-256 for submission schema, scorer package, and each baseline submission; DSSE path optional but encouraged.
- Determinism env templates: manifest captures `sourceDateEpoch` and per-tool pinned versions; cases must provide build seeds in case metadata.
- Unreachability oracles: truth files must include explicit rationale for unreachable cases; manifest enforces presence of `truth` artifact per case.
- Sandbox/redaction guidance: case metadata must declare `sandbox` and `redaction` policy fields (schema updated) to ensure PII removal and constrained execution.
- Resource normalization: manifest records build/runtime resource limits (cpu/memory) for repeatable benchmarking.
- Offline kit & checklist: dataset safety checklist at `benchmark/checklists/dataset-safety.md`; deterministic packaging via `tools/package_offline_kit.sh`.
- Frozen baselines: Semgrep rulepack hash pinned at `baselines/semgrep/rules.sha256`; manifest supports hashed baseline submissions.
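To illustrate what deterministic packaging with a pinned `sourceDateEpoch` can look like, here is a minimal Python sketch. It is not the contents of `tools/package_offline_kit.sh`; the function name `package_deterministic` and its parameters are hypothetical. The idea is that archive bytes should depend only on file paths and contents, so entries are sorted and all timestamps and ownership fields are fixed.

```python
import gzip
import io
import tarfile
from pathlib import Path


def package_deterministic(root: Path, out_path: Path, source_date_epoch: int) -> None:
    """Write a .tar.gz whose bytes depend only on relative paths and file contents.

    Hypothetical helper: names and layout are assumptions, not the real kit script.
    """
    # Sort so that filesystem enumeration order cannot change the archive.
    files = sorted(p for p in root.rglob("*") if p.is_file())

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for path in files:
            data = path.read_bytes()
            info = tarfile.TarInfo(name=path.relative_to(root).as_posix())
            info.size = len(data)
            info.mtime = source_date_epoch  # pinned instead of the on-disk mtime
            info.mode = 0o644               # normalized permissions
            info.uid = info.gid = 0         # no builder-specific ownership
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))

    # filename="" and a fixed mtime keep the gzip header free of varying fields.
    with open(out_path, "wb") as raw:
        with gzip.GzipFile(filename="", fileobj=raw, mode="wb",
                           mtime=source_date_epoch) as gz:
            gz.write(buf.getvalue())
```

Packaging the same tree twice with the same epoch should then yield byte-identical archives, which is what allows a single SHA-256 in the manifest to cover the whole kit.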
How to use
python tools/verify_manifest.py benchmark/manifest.sample.json --root benchmark
- Fails on hash mismatch, missing artifacts, or schema violations.
- Optional `--pubkey` will verify DSSE envelopes when provided.
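The hash-mismatch and missing-artifact failures can be pictured with a short sketch. This is an illustration only, not the code of `tools/verify_manifest.py`: the manifest layout used here (an `artifacts` list whose entries carry `path` and `sha256`) is an assumption, and the real schema may name these fields differently.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Stream a file through SHA-256 and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_artifacts(manifest: dict, root: Path) -> list:
    """Return a list of problems; an empty list means every entry matched.

    Assumes a hypothetical manifest shape:
    {"artifacts": [{"path": "...", "sha256": "..."}]}
    """
    problems = []
    for entry in manifest.get("artifacts", []):
        target = root / entry["path"]
        if not target.is_file():
            problems.append(f"missing: {entry['path']}")
        elif sha256_file(target) != entry["sha256"].lower():
            problems.append(f"hash mismatch: {entry['path']}")
    return problems
```

A verifier built this way would exit non-zero whenever the returned list is non-empty, which matches the fail-closed behavior described above.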
Gap mapping (summary)
- G1–G12 (benchmark gaps): addressed via manifest schema fields (attestations, submission provenance, determinism templates, coverage/trace schema refs), offline verifier, and required resource/sandbox metadata.
- RD1–RD10 (dataset gaps): lockfile-style manifest with hashes for SBOMs, datasets, truth, binaries; licensing/PII redaction captured via `redaction.policy`; semantic version + changelog required.
- RB1–RB10 (fixtures gaps): per-case truth + evidence entries mandatory; manifest enforces presence and hashes; DSSE optional but recorded; coverage/trace schema references included.
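The "required entries" side of the mapping can be sketched as a structural check, separate from hashing. The field names below (`version`, `changelog`, `cases`, `id`, `truth`, `evidence`, `sandbox`, `redaction.policy`) follow the terms used in this note, but their exact placement in the schema is an assumption, and the semantic-version pattern is deliberately simplified.

```python
import re

# Simplified MAJOR.MINOR.PATCH check; pre-release and build tags are ignored.
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")


def check_case(case: dict) -> list:
    """Check one case entry for mandatory truth/evidence hashes and policies."""
    case_id = case.get("id", "<no id>")
    problems = []
    for key in ("truth", "evidence"):
        entry = case.get(key)
        if not (isinstance(entry, dict) and entry.get("sha256")):
            problems.append(f"{case_id}: {key} entry missing or unhashed")
    if not case.get("sandbox"):
        problems.append(f"{case_id}: sandbox policy not declared")
    redaction = case.get("redaction")
    if not (isinstance(redaction, dict) and redaction.get("policy")):
        problems.append(f"{case_id}: redaction.policy not declared")
    return problems


def check_manifest(manifest: dict) -> list:
    """Check manifest-level requirements, then every case (hypothetical layout)."""
    problems = []
    if not SEMVER.match(str(manifest.get("version", ""))):
        problems.append("version is not MAJOR.MINOR.PATCH")
    if not manifest.get("changelog"):
        problems.append("changelog missing")
    for case in manifest.get("cases", []):
        problems.extend(check_case(case))
    return problems
```

In practice a JSON Schema validator run against `benchmark-manifest.schema.json` would likely cover most of this; the sketch only makes the individual requirements explicit.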
Follow-ups
- When new cases land, regenerate manifest and rerun `tools/verify_manifest.py` in CI.
- For production releases, sign the manifest DSSE and set `signatures[]` accordingly.
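For the signing follow-up, the sketch below shows the general shape of a DSSE envelope as defined by the DSSE specification: the signature covers the pre-authentication encoding (PAE) of the payload type and payload, not the raw payload. The payload type string, the `keyid`, and the `sign` callable are placeholders, and how the resulting envelope maps onto the manifest's `signatures[]` field is an assumption this note does not spell out.

```python
import base64
from typing import Callable


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the exact bytes that get signed."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (
        len(type_bytes), type_bytes, len(payload), payload
    )


def make_envelope(payload: bytes, payload_type: str,
                  sign: Callable[[bytes], bytes], keyid: str) -> dict:
    """Build a DSSE envelope dict using a caller-supplied signing function.

    `sign` is a placeholder for a real signer (for example an Ed25519 key
    held by the release pipeline); no particular crypto library is implied.
    """
    signature = sign(pae(payload_type, payload))
    return {
        "payload": base64.b64encode(payload).decode("ascii"),
        "payloadType": payload_type,
        "signatures": [
            {"keyid": keyid, "sig": base64.b64encode(signature).decode("ascii")}
        ],
    }
```

A verifier reverses the process: it decodes `payload`, recomputes `pae(payloadType, payload)`, and checks each `sig` against the trusted public key before trusting any hash inside the manifest.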