# Reachability Benchmark Gaps (G1–G12, RD1–RD10, RB1–RB10): Remediation
Date: 2025-12-03
Status: IMPLEMENTED
This note closes BENCH-GAPS-513-018, DATASET-GAPS-513-019, and REACH-FIXTURE-GAPS-513-020 by defining manifest/schema updates, verification tooling, and operational guardrails.
## What changed
- **Benchmark kit manifest + schema**: `benchmark/schemas/benchmark-manifest.schema.json` defines signed/hashed entries for cases, truth, baselines, schemas, and tools. A sample manifest is at `benchmark/manifest.sample.json`.
- **Offline verifier**: `tools/verify_manifest.py` validates the manifest against local files (hashes, required entries, DSSE envelope presence) to keep runs deterministic and tamper-evident.
- **Submission provenance checks**: manifest requires SHA-256 for submission schema, scorer package, and each baseline submission; DSSE path optional but encouraged.
- **Determinism env templates**: manifest captures `sourceDateEpoch` and per-tool pinned versions; cases must provide build seeds in case metadata.
- **Unreachability oracles**: truth files must include explicit rationale for unreachable cases; manifest enforces presence of `truth` artifact per case.
- **Sandbox/redaction guidance**: case metadata must declare `sandbox` and `redaction` policy fields (schema updated) to ensure PII removal and constrained execution.
- **Resource normalization**: manifest records build/runtime resource limits (cpu/memory) for repeatable benchmarking.
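The per-case requirements above (a `truth` artifact, `sandbox` and `redaction` policy fields, and a build seed) could be checked along the following lines. This is a minimal sketch, not the logic of `tools/verify_manifest.py`: the flat case layout and the `buildSeed` key name are assumptions made for illustration, and the real schema may name or nest these fields differently.

```python
from typing import Any

# Assumed field names: "truth", "sandbox", and "redaction" come from the note
# above; "buildSeed" is a hypothetical key for the required build seed.
REQUIRED_CASE_FIELDS = ("truth", "sandbox", "redaction", "buildSeed")

# Values treated as "not provided". A seed of 0 is deliberately NOT in this
# list, so a zero seed still counts as present.
EMPTY_VALUES = (None, "", {}, [])


def missing_case_fields(case: dict[str, Any]) -> list[str]:
    """Return the required fields that are absent or empty in one case entry."""
    return [name for name in REQUIRED_CASE_FIELDS if case.get(name) in EMPTY_VALUES]


# Hypothetical case entry that lacks a build seed.
example_case = {
    "id": "case-001",
    "truth": {"path": "cases/case-001/truth.json", "sha256": "0" * 64},
    "sandbox": {"network": "none"},
    "redaction": {"policy": "strip-pii"},
}

print(missing_case_fields(example_case))  # → ['buildSeed']
```

Returning the list of missing names, rather than raising on the first one, lets a caller report every gap in a case in a single pass.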
## How to use
```bash
python tools/verify_manifest.py benchmark/manifest.sample.json --root benchmark
```
- Fails on hash mismatch, missing artifacts, or schema violations.
- Pass the optional `--pubkey` flag to also verify DSSE envelopes.
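The hash-mismatch and missing-artifact failures can be pictured with a short sketch. It assumes each manifest entry is a dictionary with `path` and `sha256` keys resolved relative to the `--root` directory; that shape and the function names are illustrative assumptions, not the actual internals of `tools/verify_manifest.py`.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large binaries do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_entries(entries: list[dict], root: Path) -> list[str]:
    """Compare each assumed {"path", "sha256"} entry against the file under root."""
    problems = []
    for entry in entries:
        target = root / entry["path"]
        if not target.is_file():
            problems.append(f"missing: {entry['path']}")
        elif sha256_file(target) != entry["sha256"].lower():
            problems.append(f"hash mismatch: {entry['path']}")
    return problems
```

A caller would typically exit non-zero when `check_entries` returns a non-empty list, which matches the fail-on-mismatch behavior described above.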
## Gap mapping (summary)
- **G1–G12 (benchmark gaps)**: addressed via manifest schema fields (attestations, submission provenance, determinism templates, coverage/trace schema refs), offline verifier, and required resource/sandbox metadata.
- **RD1–RD10 (dataset gaps)**: lockfile-style manifest with hashes for SBOMs, datasets, truth, binaries; licensing/PII redaction captured via `redaction.policy`; semantic version + changelog required.
- **RB1–RB10 (fixtures gaps)**: per-case truth + evidence entries mandatory; manifest enforces presence and hashes; DSSE optional but recorded; coverage/trace schema references included.
## Follow-ups
- When new cases land, regenerate manifest and rerun `tools/verify_manifest.py` in CI.
- For production releases, sign the manifest with a DSSE envelope and populate `signatures[]` accordingly.
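Regenerating the manifest when new cases land might look roughly like the sketch below. The repository does not document a generator here, so the function names and the top-level `artifacts` key are hypothetical. The point it illustrates is that sorted paths and sorted JSON keys make the output byte-identical across reruns over the same tree.

```python
import hashlib
import json
from pathlib import Path


def build_entries(root: Path) -> list[dict]:
    """List every file under root with its SHA-256, sorted by POSIX-style path."""
    entries = [
        {
            "path": path.relative_to(root).as_posix(),
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        }
        for path in root.rglob("*")
        if path.is_file()
    ]
    # Sort on the string path so ordering does not depend on filesystem order.
    return sorted(entries, key=lambda entry: entry["path"])


def render_manifest(root: Path) -> str:
    """Serialize with sorted keys so reruns over the same tree are byte-identical."""
    manifest = {"artifacts": build_entries(root)}  # "artifacts" is a hypothetical key
    return json.dumps(manifest, sort_keys=True, indent=2) + "\n"
```

In CI, the rendered text could be written to the manifest path and then checked with `tools/verify_manifest.py` as shown in the usage section.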