up
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
sdk-generator-smoke / sdk-smoke (push) Has been cancelled
SDK Publish & Sign / sdk-publish (push) Has been cancelled
api-governance / spectral-lint (push) Has been cancelled
oas-ci / oas-validate (push) Has been cancelled
Mirror Thin Bundle Sign & Verify / mirror-sign (push) Has been cancelled
@@ -1,59 +1,57 @@
# Replay Test Strategy (Draft)
# Replay Test Strategy

> **Ownership:** Docs Guild · Scanner Guild · Evidence Locker Guild · QA Guild
> **Related:** `docs/replay/DETERMINISTIC_REPLAY.md`, `docs/replay/DEVS_GUIDE_REPLAY.md`, `docs/modules/platform/architecture-overview.md`, `docs/implplan/SPRINT_186_record_deterministic_execution.md`, `docs/implplan/SPRINT_187_evidence_locker_cli_integration.md`
> **Imposed rule:** Replay tests must use frozen inputs (SBOM, advisories, VEX, feeds, policy, tools) and fixed seeds/clocks; any non-determinism is a test failure.

This playbook enumerates the deterministic replay validation suite. It guides the work tracked under Sprints 186–187 so every guild ships the same baseline before enabling `scan --record`.
This strategy defines how we validate replayability of Scanner outputs and attestations across tool/definition updates and environments.

---
## 1. Goals
- Prove that a recorded scan bundle (inputs + manifests) replays bit-for-bit across environments.
- Detect drift from feeds, policy, or tooling changes before shipping releases.
- Provide auditors with evidence (hashes, DSSE bundles) that replays are deterministic.

## 1 · Test matrix
## 2. Test layers
1) **Golden replay**: take a recorded bundle (SBOM/VEX/feeds/policy/tool hashes) and rerun; assert hash equality for SBOM, findings, VEX, logs. Fail on any difference.
2) **Feed drift guard**: rerun bundle after feed update; expect differences; ensure drift is surfaced (hash mismatch, diff report) rather than silently masked.
3) **Tool upgrade**: rerun with new scanner version; expect stable outputs if no functional change, otherwise require documented diffs.
4) **Policy change**: rerun with updated policy; expect explain trace to show changed rules and hash delta; diff must be recorded.
5) **Offline**: replay in sealed mode using only bundle contents; no network access permitted.

| ID | Scenario | Purpose | Modules | Required Artifacts |
|----|----------|---------|---------|--------------------|
| T-STRICT-001 | **Golden Replay** | Re-run a recorded scan and expect byte-identical outputs. | Scanner.WebService, Scanner.Worker, CLI | `manifest.json`, input/output bundles, DSSE signatures |
| T-FEED-002 | **Feed Drift What-If** | Re-run with updated feeds (`--what-if feeds`) to ensure only feed hashes change. | Scanner.Worker, Concelier, CLI | Feed snapshot bundles, policy bundle, diff report |
| T-TOOL-003 | **Toolchain Upgrade Guard** | Attempt replay with newer scanner binary; expect rejection with `ToolHashMismatch`. | Scanner.Worker, Replay.Core | Tool hash catalog, error log |
| T-POLICY-004 | **Policy Variation Diff** | Re-run with alternate lattice bundle; expect deterministic diff, not failure. | Policy Engine, CLI | Policy bundle(s), diff output |
| T-LEDGER-005 | **Ledger Verification** | Verify Rekor inclusion proof and DSSE signatures offline. | Attestor, Signer, Authority, CLI | DSSE envelopes, Rekor proof, RootPack |
| T-RETENTION-006 | **Retention Sweep** | Ensure Evidence Locker prunes hot CAS after SLA while preserving cold storage copies. | Evidence Locker, Ops | Replay retention config, audit logs |
| T-OFFLINE-007 | **Offline Kit Replay** | Execute `stella replay` using only Offline Kit artifacts. | CLI, Evidence Locker | Offline kit bundle, local RootPack |
| T-OPA-008 | **Runbook Drill** | Simulate replay-driven incident response per `docs/runbooks/replay_ops.md`. | Ops Guild, Scanner, Authority | Runbook checklist, incident notes |
| T-REACH-009 | **Reachability Replay** | Rehydrate reachability graphs/traces from replay bundles and compare against reachbench fixtures. | Scanner, Signals, Replay | `reachbench-2025-expanded`, reachability CAS references |

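
The golden replay check (T-STRICT-001) comes down to comparing per-artifact digests between the recorded run and the replayed run. A minimal Python sketch follows, assuming each run writes its outputs to a directory and that SHA-256 over the raw bytes is the comparison key. The helper names (`hash_tree`, `compare_runs`, `is_golden`) and the directory layout are hypothetical and are not part of the Stella tooling.

```python
import hashlib
from pathlib import Path


def hash_tree(root: Path) -> dict[str, str]:
    """Map each file path (relative to root, POSIX style) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_runs(recorded: dict[str, str], replayed: dict[str, str]) -> dict[str, list[str]]:
    """Classify artifacts as changed, missing from the replay, or unexpected."""
    shared = recorded.keys() & replayed.keys()
    return {
        "changed": sorted(k for k in shared if recorded[k] != replayed[k]),
        "missing": sorted(recorded.keys() - replayed.keys()),
        "unexpected": sorted(replayed.keys() - recorded.keys()),
    }


def is_golden(diff: dict[str, list[str]]) -> bool:
    """A golden replay passes only when every category is empty."""
    return not any(diff.values())
```

In a drift scenario (feed or policy change) the same `compare_runs` output can serve as the raw material for the diff report, with `is_golden` expected to return `False`.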
## 3. Inputs
- Replay bundle contents: `sbom`, `feeds.tar.gz`, `policy.tar.gz`, `scanner-image`, `reachability.graph`, `runtime-trace` (optional), `replay.yaml`.
- Hash manifest: SHA-256 for every file; top-level Merkle root.
- DSSE attestations (optional): for replay manifest and artifacts.
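
A hash manifest of this shape can be sketched in a few lines of Python. The per-file SHA-256 step follows the text above; the Merkle construction is an assumption made for illustration only (leaf encoding, sorted path order, and carrying an unpaired node upward are not taken from the replay specification), and the names `merkle_root` and `build_manifest` are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def merkle_root(file_digests: dict[str, str]) -> str:
    """Fold per-file digests into a single root.

    Assumed scheme (illustrative, not the replay spec): leaves are
    sha256("<path>:<digest>") in sorted path order; each level hashes
    adjacent pairs and carries an unpaired last node up unchanged.
    """
    if not file_digests:
        raise ValueError("manifest must list at least one file")
    level = [sha256_hex(f"{path}:{digest}".encode()) for path, digest in sorted(file_digests.items())]
    while len(level) > 1:
        paired = [sha256_hex((level[i] + level[i + 1]).encode()) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])
        level = paired
    return level[0]


def build_manifest(files: dict[str, bytes]) -> dict:
    """Hash every file, then add the top-level root over all digests."""
    digests = {path: sha256_hex(content) for path, content in files.items()}
    return {"files": digests, "merkle_root": merkle_root(digests)}
```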
---
## 4. Determinism settings
- Fixed clock (`--fixed-clock` ISO-8601), RNG seed (`RNG_SEED`), single-threaded mode (`SCANNER_MAX_CONCURRENCY=1`), stable ordering (sorted inputs), log filtering (strip timestamps/PIDs).
- Disable network/egress; rely on bundled feeds/policy.
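
Log filtering is the setting most easily gotten subtly wrong, so a small sketch may help. The Python below assumes logs carry ISO-8601 timestamps and `pid=<n>` tokens; those shapes are guesses about the log format, and `canonicalise_log` is a hypothetical helper rather than an existing Scanner function.

```python
import re

# Assumed log shapes: ISO-8601 timestamps and "pid=<n>" tokens. Real Scanner
# logs may differ; adjust the patterns to the actual format.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?")
PID = re.compile(r"\bpid=\d+\b")


def canonicalise_log(text: str) -> str:
    """Replace volatile fields with fixed placeholders, line by line."""
    lines = []
    for line in text.splitlines():
        line = TIMESTAMP.sub("<ts>", line)
        line = PID.sub("pid=<pid>", line)
        lines.append(line.rstrip())
    return "\n".join(lines) + "\n"
```

Two runs that differ only in wall-clock time and process ID then produce identical canonical logs, while any change in the message text still changes the hash.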

## 2 · Execution guidelines
## 5. Assertions
- Hash equality for outputs: SBOMs, findings, VEX, logs (canonicalised), `determinism.json` (if present).
- Verify DSSE signatures and Rekor proofs when available; fail if mismatched or missing.
- Report diff summary when hashes differ (feed/tool/policy drift).

1. **Deterministic environment**: Freeze clock, locale, timezone, and random seed per manifest. See `docs/replay/DETERMINISTIC_REPLAY.md` §4.
2. **Canonical verification**: Use `StellaOps.Replay.Core` JSON serializer; reject non-canonical payloads before diffing.
3. **Data sources**: Replay always consumes `replay_runs` + CAS bundles, never live feeds/policies.
4. **CI integration**:
   - Scanner repo: add pipeline stage `ReplayStrict` running T-STRICT-001 on fixture images (x64 + arm64).
   - CLI repo: smoke test `scan --record`, `verify`, `replay`, `diff` using generated fixtures.
   - Evidence Locker repo: nightly retention test (T-RETENTION-006) with dry-run mode.
5. **Observability**: Emit metrics `replay_verify_total{result}`, `replay_diff_total{mode}`, `replay_bundle_size_bytes`. Structured logs require `replay.scan_id`, `subject.digest`, `manifest.hash`.
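
The "reject non-canonical payloads before diffing" rule can be illustrated independently of the real serializer. The Python sketch below is not the `StellaOps.Replay.Core` implementation; it assumes a canonical form of sorted keys, no insignificant whitespace, and UTF-8 output, which may differ from the rules that serializer actually applies. The function names are hypothetical.

```python
import json


def canonical_bytes(value) -> bytes:
    """Assumed canonical form: sorted keys, compact separators, UTF-8."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def is_canonical(payload: bytes) -> bool:
    """True when re-serialising the parsed payload reproduces it byte for byte."""
    try:
        return canonical_bytes(json.loads(payload)) == payload
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return False
```

Checking canonical form first keeps the later diff meaningful: two payloads that pass `is_canonical` differ in bytes only if they differ in content.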
## 6. Tooling
- CLI: `stella replay run --bundle <path> --fixed-clock 2025-11-01T00:00:00Z --seed 1337 --single-threaded`.
- Scripts: `scripts/replay/verify_bundle.sh` (hash/manifest check), `scripts/replay/run_replay.sh` (orchestrates fixed settings), `scripts/replay/diff_outputs.py` (canonical diffs).
- CI: `bench:determinism` target executes golden replay on reference bundles; fails on hash delta.

---
## 7. Outputs
- `replay-results.json` with per-artifact hashes, pass/fail, diff counts.
- `replay.log` filtered (no timestamps/PIDs), `replay.hashes` (sha256sum of outputs).
- Optional DSSE attestation for replay results.
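
For orientation, here is one way a results file with per-artifact hashes, pass/fail, and diff counts might be assembled. The field names (`status`, `diff_count`, `artifacts`, `expected`, `actual`, `match`) are assumptions for illustration; the actual `replay-results.json` schema is not defined in this document.

```python
import json


def build_results(expected: dict[str, str], actual: dict[str, str]) -> dict:
    """Summarise a replay: per-artifact hashes, pass/fail, and a diff count."""
    artifacts = []
    for name in sorted(expected.keys() | actual.keys()):
        exp, act = expected.get(name), actual.get(name)
        # An artifact missing on either side counts as a mismatch.
        artifacts.append({"name": name, "expected": exp, "actual": act, "match": exp is not None and exp == act})
    mismatches = sum(1 for a in artifacts if not a["match"])
    return {
        "status": "pass" if mismatches == 0 else "fail",
        "diff_count": mismatches,
        "artifacts": artifacts,
    }


def render_results(results: dict) -> str:
    """Serialise with sorted keys so the report itself is reproducible."""
    return json.dumps(results, sort_keys=True, indent=2) + "\n"
```

Rendering with sorted keys and a fixed indent means the report can be hashed or attested like any other replay output.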

## 3 · Fixtures and tooling
## 8. Reporting
- Publish results to CI artifacts; store in Evidence Locker for audit.
- Add summary to release notes when replay is part of a release gate.

- **Fixture catalog** lives under `tools/replay-fixtures/`. Include `README.md` describing update workflow and deterministic compression command.
- **Generation script** (`./tools/replay-fixtures/build.sh`) orchestrates recording, verifying, and packaging fixtures.
- **Checksum manifest** (`fixtures/checksums.json`) lists CAS digests and DSSE hashes for quick sanity checks.
- **CI secrets** must provide offline RootPack and replay signing keys; use sealed secrets in air-gapped pipelines.
## 9. Checklists
- [ ] Bundle verified (hash manifest, DSSE if present).
- [ ] Fixed clock/seed/concurrency applied.
- [ ] Network disabled; feeds/policy/tooling from bundle only.
- [ ] Outputs hashed and compared to baseline; diffs recorded.
- [ ] Replay results stored and (optionally) attested.

---

## 4 · Acceptance checklist

- [ ] All test scenarios executed on x64 and arm64 runners.
- [ ] Replay verification metrics ingested into Telemetry Stack dashboards.
- [ ] Evidence Locker retention job validated against hot/cold tiers.
- [ ] CLI documentation updated with troubleshooting steps observed during tests.
- [ ] Runbook drill logged with timestamp and owners in `docs/runbooks/replay_ops.md`.
- [ ] Reachability replay drill captured (`T-REACH-009`) with fixture references and Signals verification logs.

---

*Drafted: 2025-11-03. Update statuses in Sprint 186/187 boards when this checklist is satisfied.*
## References
- `docs/modules/scanner/determinism-score.md`
- `docs/replay/DETERMINISTIC_REPLAY.md`
- `docs/modules/scanner/entropy.md`