up

2025-11-27 07:46:56 +02:00
parent d63af51f84
commit ea970ead2a
302 changed files with 43161 additions and 1534 deletions
--- a/docs/replay/TEST_STRATEGY.md
+++ b/docs/replay/TEST_STRATEGY.md
@@ -1,59 +1,57 @@
-# Replay Test Strategy (Draft)
+# Replay Test Strategy

-> **Ownership:** Docs Guild · Scanner Guild · Evidence Locker Guild · QA Guild  
-> **Related:** `docs/replay/DETERMINISTIC_REPLAY.md`, `docs/replay/DEVS_GUIDE_REPLAY.md`, `docs/modules/platform/architecture-overview.md`, `docs/implplan/SPRINT_186_record_deterministic_execution.md`, `docs/implplan/SPRINT_187_evidence_locker_cli_integration.md`
+> **Imposed rule:** Replay tests must use frozen inputs (SBOM, advisories, VEX, feeds, policy, tools) and fixed seeds/clocks; any non-determinism is a test failure.

-This playbook enumerates the deterministic replay validation suite. It guides the work tracked under Sprints 186–187 so every guild ships the same baseline before enabling `scan --record`.
+This strategy defines how we validate replayability of Scanner outputs and attestations across tool/definition updates and environments.

----
+## 1. Goals
+- Prove that a recorded scan bundle (inputs + manifests) replays bit-for-bit across environments.
+- Detect drift from feeds, policy, or tooling changes before shipping releases.
+- Provide auditors with evidence (hashes, DSSE bundles) that replays are deterministic.

-## 1 · Test matrix
+## 2. Test layers
+1) **Golden replay**: take a recorded bundle (SBOM/VEX/feeds/policy/tool hashes) and rerun; assert hash equality for SBOM, findings, VEX, logs. Fail on any difference.
+2) **Feed drift guard**: rerun bundle after feed update; expect differences; ensure drift is surfaced (hash mismatch, diff report) not silently masked.
+3) **Tool upgrade**: rerun with new scanner version; expect stable outputs if no functional change, otherwise require documented diffs.
+4) **Policy change**: rerun with updated policy; expect explain trace to show changed rules and hash delta; diff must be recorded.
+5) **Offline**: replay in sealed mode using only bundle contents; no network access permitted.

-| ID | Scenario | Purpose | Modules | Required Artifacts |
-|----|----------|---------|---------|--------------------|
-| T-STRICT-001 | **Golden Replay** | Re-run a recorded scan and expect byte-identical outputs. | Scanner.WebService, Scanner.Worker, CLI | `manifest.json`, input/output bundles, DSSE signatures |
-| T-FEED-002 | **Feed Drift What-If** | Re-run with updated feeds (`--what-if feeds`) to ensure only feed hashes change. | Scanner.Worker, Concelier, CLI | Feed snapshot bundles, policy bundle, diff report |
-| T-TOOL-003 | **Toolchain Upgrade Guard** | Attempt replay with newer scanner binary; expect rejection with `ToolHashMismatch`. | Scanner.Worker, Replay.Core | Tool hash catalog, error log |
-| T-POLICY-004 | **Policy Variation Diff** | Re-run with alternate lattice bundle; expect deterministic diff, not failure. | Policy Engine, CLI | Policy bundle(s), diff output |
-| T-LEDGER-005 | **Ledger Verification** | Verify Rekor inclusion proof and DSSE signatures offline. | Attestor, Signer, Authority, CLI | DSSE envelopes, Rekor proof, RootPack |
-| T-RETENTION-006 | **Retention Sweep** | Ensure Evidence Locker prunes hot CAS after SLA while preserving cold storage copies. | Evidence Locker, Ops | Replay retention config, audit logs |
-| T-OFFLINE-007 | **Offline Kit Replay** | Execute `stella replay` using only Offline Kit artifacts. | CLI, Evidence Locker | Offline kit bundle, local RootPack |
-| T-OPA-008 | **Runbook Drill** | Simulate replay-driven incident response per `docs/runbooks/replay_ops.md`. | Ops Guild, Scanner, Authority | Runbook checklist, incident notes |
-| T-REACH-009 | **Reachability Replay** | Rehydrate reachability graphs/traces from replay bundles and compare against reachbench fixtures. | Scanner, Signals, Replay | `reachbench-2025-expanded`, reachability CAS references |
+## 3. Inputs
+- Replay bundle contents: `sbom`, `feeds.tar.gz`, `policy.tar.gz`, `scanner-image`, `reachability.graph`, `runtime-trace` (optional), `replay.yaml`.
+- Hash manifest: SHA-256 for every file; top-level Merkle root.
+- DSSE attestations (optional): for replay manifest and artifacts.

----
+## 4. Determinism settings
+- Fixed clock (`--fixed-clock` ISO-8601), RNG seed (`RNG_SEED`), single-threaded mode (`SCANNER_MAX_CONCURRENCY=1`), stable ordering (sorted inputs), log filtering (strip timestamps/PIDs).
+- Disable network/egress; rely on bundled feeds/policy.

-## 2 · Execution guidelines
+## 5. Assertions
+- Hash equality for outputs: SBOMs, findings, VEX, logs (canonicalised), `determinism.json` (if present).
+- Verify DSSE signatures and Rekor proofs when available; fail if mismatched or missing.
+- Report diff summary when hashes differ (feed/tool/policy drift).

-1. **Deterministic environment** — Freeze clock, locale, timezone, and random seed per manifest. See `docs/replay/DETERMINISTIC_REPLAY.md` §4.  
-2. **Canonical verification** — Use `StellaOps.Replay.Core` JSON serializer; reject non-canonical payloads before diffing.  
-3. **Data sources** — Replay always consumes `replay_runs` + CAS bundles, never live feeds/policies.  
-4. **CI integration** —  
-   - Scanner repo: add pipeline stage `ReplayStrict` running T-STRICT-001 on fixture images (x64 + arm64).  
-   - CLI repo: smoke test `scan --record`, `verify`, `replay`, `diff` using generated fixtures.  
-   - Evidence Locker repo: nightly retention test (T-RETENTION-006) with dry-run mode.  
-5. **Observability** — Emit metrics `replay_verify_total{result}`, `replay_diff_total{mode}`, `replay_bundle_size_bytes`. Structured logs require `replay.scan_id`, `subject.digest`, `manifest.hash`.
+## 6. Tooling
+- CLI: `stella replay run --bundle <path> --fixed-clock 2025-11-01T00:00:00Z --seed 1337 --single-threaded`.
+- Scripts: `scripts/replay/verify_bundle.sh` (hash/manifest check), `scripts/replay/run_replay.sh` (orchestrates fixed settings), `scripts/replay/diff_outputs.py` (canonical diffs).
+- CI: `bench:determinism` target executes golden replay on reference bundles; fails on hash delta.

----
+## 7. Outputs
+- `replay-results.json` with per-artifact hashes, pass/fail, diff counts.
+- `replay.log` filtered (no timestamps/PIDs), `replay.hashes` (sha256sum of outputs).
+- Optional DSSE attestation for replay results.

-## 3 · Fixtures and tooling
+## 8. Reporting
+- Publish results to CI artifacts; store in Evidence Locker for audit.
+- Add summary to release notes when replay is part of a release gate.

-- **Fixture catalog** lives under `tools/replay-fixtures/`. Include `README.md` describing update workflow and deterministic compression command.  
-- **Generation script** (`./tools/replay-fixtures/build.sh`) orchestrates recording, verifying, and packaging fixtures.  
-- **Checksum manifest** (`fixtures/checksums.json`) lists CAS digests and DSSE hashes for quick sanity checks.  
-- **CI secrets** must provide offline RootPack and replay signing keys; use sealed secrets in air-gapped pipelines.
+## 9. Checklists
+- [ ] Bundle verified (hash manifest, DSSE if present).
+- [ ] Fixed clock/seed/concurrency applied.
+- [ ] Network disabled; feeds/policy/tooling from bundle only.
+- [ ] Outputs hashed and compared to baseline; diffs recorded.
+- [ ] Replay results stored + (optionally) attested.

----
-
-## 4 · Acceptance checklist
-
-- [ ] All test scenarios executed on x64 and arm64 runners.  
-- [ ] Replay verification metrics ingested into Telemetry Stack dashboards.  
-- [ ] Evidence Locker retention job validated against hot/cold tiers.  
-- [ ] CLI documentation updated with troubleshooting steps observed during tests.  
-- [ ] Runbook drill logged with timestamp and owners in `docs/runbooks/replay_ops.md`.  
-- [ ] Reachability replay drill captured (`T-REACH-009`) with fixture references and Signals verification logs.  
-
----
-
-*Drafted: 2025-11-03. Update statuses in Sprint 186/187 boards when this checklist is satisfied.*
+## References
+- `docs/modules/scanner/determinism-score.md`
+- `docs/replay/DETERMINISTIC_REPLAY.md`
+- `docs/modules/scanner/entropy.md`
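
Reviewer note: the hash manifest ("SHA-256 for every file; top-level Merkle root") and the hash-equality assertion introduced by this change could be sketched roughly as below. This is an illustration only, not the contents of `scripts/replay/verify_bundle.sh`. The function names are hypothetical, and the Merkle construction (leaves bound to their relative path, pairwise SHA-256, last node duplicated on odd levels) is an assumption, since the diff does not specify one.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(bundle_dir: Path) -> dict[str, str]:
    """Map each file's POSIX-style relative path to its SHA-256 digest."""
    files = sorted(p for p in bundle_dir.rglob("*") if p.is_file())
    return {p.relative_to(bundle_dir).as_posix(): sha256_file(p) for p in files}


def merkle_root(manifest: dict[str, str]) -> str:
    """Fold the manifest into one root digest (assumed construction)."""
    # Each leaf binds a path to its digest; leaves are ordered by path.
    level = [
        hashlib.sha256(f"{name}\n{manifest[name]}".encode("utf-8")).digest()
        for name in sorted(manifest)
    ]
    if not level:
        return hashlib.sha256(b"").hexdigest()
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


def diff_manifests(baseline: dict[str, str], actual: dict[str, str]) -> list[str]:
    """List paths whose digest differs or that exist on one side only."""
    names = baseline.keys() | actual.keys()
    return sorted(n for n in names if baseline.get(n) != actual.get(n))
```

Under these assumptions, a golden replay would pass only when `diff_manifests(baseline, actual)` returns an empty list, while the feed, tool, and policy drift layers would record the returned paths in the diff summary instead of failing silently.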