# Reachability Benchmark Gaps (G1-G12, RD1-RD10, RB1-RB10): Remediation
Date: 2025-12-03
Status: IMPLEMENTED
This note closes BENCH-GAPS-513-018, DATASET-GAPS-513-019, and REACH-FIXTURE-GAPS-513-020 by defining manifest/schema updates, verification tooling, and operational guardrails.
## What changed
- **Benchmark kit manifest + schema**: `benchmark/schemas/benchmark-manifest.schema.json` with signed/hashed entries for cases, truth, baselines, schemas, and tools. Sample at `benchmark/manifest.sample.json`.
- **Offline verifier**: `tools/verify_manifest.py` validates the manifest against local files (hashes, required entries, DSSE envelope presence) to keep runs deterministic and tamper-evident.
- **Coverage/trace schemas**: `schemas/coverage.schema.json` and `schemas/trace.schema.json` govern oracle outputs referenced by manifest hashes.
- **Submission provenance checks**: manifest requires SHA-256 for submission schema, scorer package, and each baseline submission; DSSE path optional but encouraged.
- **Determinism env templates**: manifest captures `sourceDateEpoch` and per-tool pinned versions; cases must provide build seeds in case metadata.
- **Unreachability oracles**: truth files must include explicit rationale for unreachable cases; manifest enforces presence of `truth` artifact per case.
- **Sandbox/redaction guidance**: case metadata must declare `sandbox` and `redaction` policy fields (schema updated) to ensure PII removal and constrained execution.
- **Resource normalization**: manifest records build/runtime resource limits (cpu/memory) for repeatable benchmarking.
- **Offline kit & checklist**: dataset safety checklist at `benchmark/checklists/dataset-safety.md`; deterministic packaging via `tools/package_offline_kit.sh`.
- **Frozen baselines**: Semgrep rulepack hash pinned at `baselines/semgrep/rules.sha256`; manifest supports hashed baseline submissions.
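
The hash checks described above can be illustrated with a minimal sketch. It is not the code in `tools/verify_manifest.py`; it assumes a hypothetical manifest layout in which `artifacts` is a list of `{"path", "sha256"}` objects, and the names `sha256_file` and `check_artifacts` are illustrative only.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_artifacts(manifest: dict, root: Path) -> list[str]:
    """Compare every listed artifact with the file on disk.

    Assumption: manifest["artifacts"] is a list of {"path", "sha256"}
    objects. The real schema may name or nest these fields differently.
    """
    problems = []
    for entry in manifest.get("artifacts", []):
        target = root / entry["path"]
        if not target.is_file():
            problems.append(f"missing: {entry['path']}")
        elif sha256_file(target) != entry["sha256"].lower():
            problems.append(f"hash mismatch: {entry['path']}")
    return problems
```

An empty return value means every listed file exists and matches its recorded digest; collecting all problems instead of stopping at the first one makes a failed run easier to diagnose.
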
## How to use
```bash
python tools/verify_manifest.py benchmark/manifest.sample.json --root benchmark
```
- Fails on hash mismatch, missing artifacts, or schema violations.
- The optional `--pubkey` flag verifies DSSE envelopes against the supplied public key.
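
For readers unfamiliar with DSSE, the sketch below shows what envelope verification operates on. A DSSE signature covers the pre-authentication encoding (PAE) of the payload type and payload, not the raw payload. The helper names `pae` and `signed_bytes` are hypothetical, the sketch assumes standard (not URL-safe) base64, and it stops before the cryptographic check because the key type used by the verifier is not stated here.

```python
import base64


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the bytes that are actually signed.

    Format: "DSSEv1" SP LEN(type) SP type SP LEN(payload) SP payload,
    where LEN is the ASCII decimal byte length.
    """
    type_bytes = payload_type.encode("utf-8")
    return b" ".join([
        b"DSSEv1",
        str(len(type_bytes)).encode("ascii"),
        type_bytes,
        str(len(payload)).encode("ascii"),
        payload,
    ])


def signed_bytes(envelope: dict) -> bytes:
    """Check the envelope shape and return the bytes a signature must cover."""
    for field in ("payload", "payloadType", "signatures"):
        if field not in envelope:
            raise ValueError(f"DSSE envelope missing field: {field}")
    if not envelope["signatures"]:
        raise ValueError("DSSE envelope carries no signatures")
    # Assumption: standard base64; DSSE also permits the URL-safe alphabet.
    payload = base64.b64decode(envelope["payload"])
    return pae(envelope["payloadType"], payload)
```

A verifier would pass the result of `signed_bytes` and each decoded `sig` value to the signature scheme matching the public key.
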
## Gap mapping (summary)
- **G1-G12 (benchmark gaps)**: addressed via manifest schema fields (attestations, submission provenance, determinism templates, coverage/trace schema refs), offline verifier, and required resource/sandbox metadata.
- **RD1-RD10 (dataset gaps)**: lockfile-style manifest with hashes for SBOMs, datasets, truth, binaries; licensing/PII redaction captured via `redaction.policy`; semantic version + changelog required.
- **RB1-RB10 (fixtures gaps)**: per-case truth + evidence entries mandatory; manifest enforces presence and hashes; DSSE optional but recorded; coverage/trace schema references included.
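
The per-case requirements in this mapping can be pictured as a required-field check. The sketch below is only an illustration: `sandbox`, `redaction.policy`, and `truth` are named in this note, but their exact placement in case metadata is assumed, and `buildSeed` is a hypothetical key standing in for the build seed.

```python
# Assumption: dotted paths into a case-metadata dict; the real schema may differ.
REQUIRED_CASE_FIELDS = ("sandbox", "redaction.policy", "buildSeed", "truth")


def lookup(data: dict, dotted: str):
    """Follow a dotted key path; return None if any segment is absent."""
    current = data
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def missing_fields(case: dict) -> list[str]:
    """List required fields that are absent or empty in one case's metadata."""
    empty_values = (None, "", [], {})
    return [
        name for name in REQUIRED_CASE_FIELDS
        if lookup(case, name) in empty_values
    ]
```

A build seed of `0` counts as present here, since only `None` and empty containers or strings are treated as missing.
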
## Follow-ups
- When new cases land, regenerate manifest and rerun `tools/verify_manifest.py` in CI.
- For production releases, sign the manifest with a DSSE envelope and populate `signatures[]` accordingly.
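
Regenerating the manifest is only useful in CI if the output is byte-stable across machines. The following is a minimal sketch of one way to get that property, not the project's generator: the `artifacts` layout and the function names `build_entries` and `render` are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def build_entries(root: Path) -> list[dict]:
    """Hash every file under root, ordered by POSIX-style relative path."""
    files = [p for p in root.rglob("*") if p.is_file()]
    # Sort on the string form so ordering does not depend on the host OS.
    files.sort(key=lambda p: p.relative_to(root).as_posix())
    return [
        {
            "path": p.relative_to(root).as_posix(),
            "sha256": hashlib.sha256(p.read_bytes()).hexdigest(),
        }
        for p in files
    ]


def render(entries: list[dict]) -> str:
    """Serialize with sorted keys and a trailing newline for stable diffs."""
    return json.dumps({"artifacts": entries}, indent=2, sort_keys=True) + "\n"
```

Running this twice over an unchanged tree should yield identical text, so a CI job can fail when the regenerated output differs from the committed manifest.
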