# Replay Test Strategy

> **Imposed rule:** Replay tests must use frozen inputs (SBOM, advisories, VEX, feeds, policy, tools) and fixed seeds/clocks; any non-determinism is a test failure.

This strategy defines how we validate replayability of Scanner outputs and attestations across tool/definition updates and environments.

## 1. Goals
- Prove that a recorded scan bundle (inputs + manifests) replays bit-for-bit across environments.
- Detect drift from feeds, policy, or tooling changes before shipping releases.
- Provide auditors with evidence (hashes, DSSE bundles) that replays are deterministic.

## 2. Test layers
1) **Golden replay**: take a recorded bundle (SBOM/VEX/feeds/policy/tool hashes) and rerun; assert hash equality for SBOM, findings, VEX, logs. Fail on any difference.
2) **Feed drift guard**: rerun the bundle after a feed update; expect differences, and ensure the drift is surfaced (hash mismatch, diff report) rather than silently masked.
3) **Tool upgrade**: rerun with new scanner version; expect stable outputs if no functional change, otherwise require documented diffs.
4) **Policy change**: rerun with updated policy; expect explain trace to show changed rules and hash delta; diff must be recorded.
5) **Offline**: replay in sealed mode using only bundle contents; no network access permitted.

## 3. Inputs
- Replay bundle contents: `sbom`, `feeds.tar.gz`, `policy.tar.gz`, `scanner-image`, `reachability.graph`, `runtime-trace` (optional), `replay.yaml`.
- Hash manifest: SHA-256 for every file; top-level Merkle root.
- DSSE attestations (optional): for replay manifest and artifacts.
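
As a rough illustration of the hash manifest, the sketch below hashes every file in a bundle directory and folds the digests into a single root. The function names are hypothetical, and the Merkle construction (pairwise SHA-256 over path-sorted leaves, duplicating the last node on odd levels) is an assumption; this document does not specify the actual tree layout.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Return the hex SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(bundle_dir: Path) -> dict[str, str]:
    """Map each file's bundle-relative POSIX path to its SHA-256, sorted by path."""
    entries = {
        p.relative_to(bundle_dir).as_posix(): sha256_file(p)
        for p in bundle_dir.rglob("*")
        if p.is_file()
    }
    return dict(sorted(entries.items()))


def merkle_root(leaf_hashes: list[str]) -> str:
    """Fold hex leaf digests into one root (assumed layout, see lead-in)."""
    if not leaf_hashes:
        return hashlib.sha256(b"").hexdigest()
    level = [bytes.fromhex(h) for h in leaf_hashes]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()
```

Sorting by relative path before hashing keeps the root independent of filesystem enumeration order, which is the property a replay needs.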

## 4. Determinism settings
- Fixed clock (`--fixed-clock` ISO-8601), RNG seed (`RNG_SEED`), single-threaded mode (`SCANNER_MAX_CONCURRENCY=1`), stable ordering (sorted inputs), log filtering (strip timestamps/PIDs).
- Disable network/egress; rely on bundled feeds/policy.
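
The log filtering step can be sketched as follows. This is a minimal example, assuming ISO-8601 timestamps and `pid=<n>` / `pid: <n>` tokens; the real log format and the placeholder strings `<ts>` and `<pid>` are assumptions, not taken from the Scanner.

```python
import re

# Assumed log shapes: ISO-8601 timestamps and "pid=<n>" / "pid: <n>" tokens.
TIMESTAMP = re.compile(
    r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?"
)
PID = re.compile(r"\b(pid[=:]\s*)\d+", re.IGNORECASE)


def canonicalise_log(text: str) -> str:
    """Replace volatile fields with placeholders so equal runs hash equally."""
    lines = []
    for line in text.splitlines():
        line = TIMESTAMP.sub("<ts>", line)
        line = PID.sub(r"\1<pid>", line)
        lines.append(line.rstrip())
    return "\n".join(lines) + "\n"
```

Two runs that differ only in wall-clock time and process ID then produce identical canonical logs, so their hashes can be compared directly.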

## 5. Assertions
- Hash equality for outputs: SBOMs, findings, VEX, logs (canonicalised), `determinism.json` (if present).
- Verify DSSE signatures and Rekor proofs when available; fail if a signature or proof does not verify, or if one that the bundle declares is missing.
- Report diff summary when hashes differ (feed/tool/policy drift).
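
The hash-equality assertion and diff summary might look like the sketch below. The function name and the summary keys (`changed`, `missing`, `unexpected`) are hypothetical; they only illustrate classifying a mismatch rather than stopping at the first difference.

```python
def compare_hashes(baseline: dict[str, str], actual: dict[str, str]) -> dict:
    """Classify differences between baseline and replayed artifact hashes."""
    shared = baseline.keys() & actual.keys()
    changed = sorted(name for name in shared if baseline[name] != actual[name])
    missing = sorted(baseline.keys() - actual.keys())      # expected but not produced
    unexpected = sorted(actual.keys() - baseline.keys())   # produced but not expected
    return {
        "passed": not (changed or missing or unexpected),
        "changed": changed,
        "missing": missing,
        "unexpected": unexpected,
    }
```

For a golden replay any non-empty list is a failure; for the drift layers the same summary can feed the recorded diff report.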

## 6. Tooling
- CLI: `stella replay run --bundle <path> --fixed-clock 2025-11-01T00:00:00Z --seed 1337 --single-threaded`.
- Scripts: `scripts/replay/verify_bundle.sh` (hash/manifest check), `scripts/replay/run_replay.sh` (orchestrates fixed settings), `scripts/replay/diff_outputs.py` (canonical diffs).
- CI: `bench:determinism` target executes golden replay on reference bundles; fails on hash delta.
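
A wrapper around the CLI call could be sketched like this. The flags are the ones shown in the CLI line above and the environment variables are those listed under the determinism settings; the helper names are hypothetical, and whether the CLI honours the flags and the environment variables together is an assumption.

```python
import os
import subprocess
from datetime import datetime, timezone
from typing import Optional


def build_replay_command(bundle: str, fixed_clock: datetime, seed: int) -> list[str]:
    """Build the argv for `stella replay run` with fixed clock and seed."""
    if fixed_clock.tzinfo is None:
        raise ValueError("fixed_clock must be timezone-aware")
    clock = fixed_clock.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return [
        "stella", "replay", "run",
        "--bundle", bundle,
        "--fixed-clock", clock,
        "--seed", str(seed),
        "--single-threaded",
    ]


def build_replay_env(seed: int, base: Optional[dict[str, str]] = None) -> dict[str, str]:
    """Copy the environment and pin the determinism variables."""
    env = dict(os.environ if base is None else base)
    env["RNG_SEED"] = str(seed)
    env["SCANNER_MAX_CONCURRENCY"] = "1"
    return env


def run_replay(bundle: str, fixed_clock: datetime, seed: int) -> int:
    """Run the replay and return the process exit code."""
    proc = subprocess.run(
        build_replay_command(bundle, fixed_clock, seed),
        env=build_replay_env(seed),
        check=False,
    )
    return proc.returncode
```

Keeping command construction separate from execution lets CI log the exact argv and environment used for each replay.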

## 7. Outputs
- `replay-results.json` with per-artifact hashes, pass/fail, diff counts.
- `replay.log` filtered (no timestamps/PIDs), `replay.hashes` (sha256sum of outputs).
- Optional DSSE attestation for replay results.
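
For illustration, the two hash-bearing outputs could be rendered as below. The `replay.hashes` lines follow the `sha256sum` text format (digest, two spaces, file name); the JSON field names (`artifacts`, `expected`, `actual`, `pass`, `diff_count`) are assumptions, since this document does not define the `replay-results.json` schema.

```python
import json


def render_hashes(hashes: dict[str, str]) -> str:
    """Render sha256sum-style lines, sorted by artifact name."""
    return "".join(f"{digest}  {name}\n" for name, digest in sorted(hashes.items()))


def render_results(baseline: dict[str, str], actual: dict[str, str]) -> str:
    """Render a results document with per-artifact hashes and pass/fail."""
    artifacts = {}
    for name in sorted(baseline.keys() | actual.keys()):
        expected = baseline.get(name)
        observed = actual.get(name)
        artifacts[name] = {
            "expected": expected,
            "actual": observed,
            "pass": expected is not None and expected == observed,
        }
    diff_count = sum(1 for entry in artifacts.values() if not entry["pass"])
    results = {"artifacts": artifacts, "diff_count": diff_count, "pass": diff_count == 0}
    # Sorted keys and fixed indentation keep the file itself byte-stable.
    return json.dumps(results, sort_keys=True, indent=2) + "\n"
```

Serialising with sorted keys means the results file can itself be hashed or attested without introducing ordering noise.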

## 8. Reporting
- Publish results to CI artifacts; store in Evidence Locker for audit.
- Add summary to release notes when replay is part of a release gate.

## 9. Checklists
- [ ] Bundle verified (hash manifest, DSSE if present).
- [ ] Fixed clock/seed/concurrency applied.
- [ ] Network disabled; feeds/policy/tooling from bundle only.
- [ ] Outputs hashed and compared to baseline; diffs recorded.
- [ ] Replay results stored and (optionally) attested.

## References
- `docs/modules/scanner/determinism-score.md`
- `docs/replay/DETERMINISTIC_REPLAY.md`
- `docs/modules/scanner/entropy.md`