# Replay Verification

_Last updated: 2025-12-22. Owner: Scanner Guild._

This document describes the **replay verification** workflow that ensures reachability slices are reproducible and tamper-evident.

---

## 1. Overview

Replay verification answers: *"Given the same inputs, do we get the exact same slice?"*

This is critical for:

- **Audit trails**: Prove analysis results are genuine
- **Tamper detection**: Detect modified inputs or results
- **Debugging**: Identify sources of non-determinism
- **Compliance**: Demonstrate reproducible security analysis

---

## 2. Replay Workflow

```
┌─────────────────┐     ┌──────────────────┐     ┌───────────────────┐
│    Original     │     │    Rehydrate     │     │     Recompute     │
│      Slice      │────►│      Inputs      │────►│       Slice       │
│  (with digest)  │     │     from CAS     │     │      (fresh)      │
└─────────────────┘     └──────────────────┘     └───────────────────┘
                                                           │
                                                           ▼
                                                 ┌───────────────────┐
                                                 │      Compare      │
                                                 │   byte-for-byte   │
                                                 └───────────────────┘
                                                           │
                                             ┌─────────────┴─────────────┐
                                             ▼                           ▼
                                        ┌──────────┐                ┌──────────┐
                                        │  MATCH   │                │ MISMATCH │
                                        │    ✓     │                │  + diff  │
                                        └──────────┘                └──────────┘
```

---

## 3. API Reference

### 3.1 Replay Endpoint

```http
POST /api/slices/replay
Content-Type: application/json

{
  "sliceDigest": "blake3:a1b2c3d4..."
}
```

### 3.2 Response Format

**Match Response (200 OK)**:

```json
{
  "match": true,
  "originalDigest": "blake3:a1b2c3d4...",
  "recomputedDigest": "blake3:a1b2c3d4...",
  "replayedAt": "2025-12-22T10:00:00Z",
  "inputsVerified": true
}
```

**Mismatch Response (200 OK)**:

```json
{
  "match": false,
  "originalDigest": "blake3:a1b2c3d4...",
  "recomputedDigest": "blake3:e5f6g7h8...",
  "replayedAt": "2025-12-22T10:00:00Z",
  "diff": {
    "missingNodes": ["node:5"],
    "extraNodes": ["node:6"],
    "missingEdges": [{"from": "node:1", "to": "node:5"}],
    "extraEdges": [{"from": "node:1", "to": "node:6"}],
    "verdictDiff": {
      "original": "unreachable",
      "recomputed": "reachable"
    },
    "confidenceDiff": {
      "original": 0.95,
      "recomputed": 0.72
    }
  },
  "possibleCauses": [
    "Input graph may have been modified",
    "Analyzer version mismatch: 1.2.0 vs 1.2.1",
    "Feed version changed: nvd-2025-12-20 vs nvd-2025-12-22"
  ]
}
```

**Error Response (404 Not Found)**:

```json
{
  "error": "slice_not_found",
  "message": "Slice with digest blake3:a1b2c3d4... not found in CAS",
  "sliceDigest": "blake3:a1b2c3d4..."
}
```

---

## 4. Input Rehydration

Every input must be CAS-addressed so that replay can fetch exactly the bytes used by the original run.

### 4.1 Required Inputs

| Input | CAS Key | Description |
|-------|---------|-------------|
| Graph | `cas://graphs/{digest}` | Full RichGraph JSON |
| Binaries | `cas://binaries/{digest}` | Binary file hashes |
| SBOM | `cas://sboms/{digest}` | CycloneDX/SPDX document |
| Policy | `cas://policies/{digest}` | Policy DSL |
| Feeds | `cas://feeds/{version}` | Advisory feed snapshot |

### 4.2 Manifest Contents

```json
{
  "manifest": {
    "analyzerVersion": "scanner.native:1.2.0",
    "rulesetHash": "sha256:abc123...",
    "feedVersions": {
      "nvd": "2025-12-20",
      "osv": "2025-12-20",
      "ghsa": "2025-12-20"
    },
    "createdAt": "2025-12-22T10:00:00Z",
    "toolchain": "iced-x86:1.21.0",
    "environment": {
      "os": "linux",
      "arch": "x86_64"
    }
  }
}
```

---

## 5. Determinism Requirements

Byte-for-byte reproducibility depends on the following conventions.

### 5.1 JSON Canonicalization

```
1. Keys sorted alphabetically at all levels
2. No whitespace (compact JSON)
3. UTF-8 encoding
4. Lowercase hex for all hashes
5. Numbers: no trailing zeros, scientific notation for large values
```

### 5.2 Graph Ordering

```
Nodes: sorted by symbolId (lexicographic)
Edges: sorted by (from, to) tuple (lexicographic)
Paths: sorted by first node, then path length
```

### 5.3 Timestamp Handling

```
All timestamps: UTC, ISO-8601, with 'Z' suffix
Example: "2025-12-22T10:00:00Z"
No milliseconds unless significant
```

### 5.4 Floating Point

```
Confidence values: round to 6 decimal places
Example: 0.950000, not 0.95 or 0.9500001
```

---

## 6. Diff Computation

When the original and recomputed slices do not match, a structural diff is computed and likely causes are listed.

### 6.1 Diff Algorithm

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SliceDiff:
    # Minimal container for illustration; the real type may carry more fields.
    missing_nodes: list = field(default_factory=list)
    extra_nodes: list = field(default_factory=list)
    missing_edges: list = field(default_factory=list)
    extra_edges: list = field(default_factory=list)
    verdict_diff: Optional[dict] = None


def edge_key(e):
    # "from" is a reserved word in Python, so it must be read with getattr.
    return (getattr(e, "from"), e.to)


def compute_diff(original, recomputed):
    diff = SliceDiff()

    # Node diff (sorted so the diff output itself is deterministic)
    orig_nodes = {n.id for n in original.subgraph.nodes}
    new_nodes = {n.id for n in recomputed.subgraph.nodes}
    diff.missing_nodes = sorted(orig_nodes - new_nodes)
    diff.extra_nodes = sorted(new_nodes - orig_nodes)

    # Edge diff
    orig_edges = {edge_key(e) for e in original.subgraph.edges}
    new_edges = {edge_key(e) for e in recomputed.subgraph.edges}
    diff.missing_edges = sorted(orig_edges - new_edges)
    diff.extra_edges = sorted(new_edges - orig_edges)

    # Verdict diff
    if original.verdict.status != recomputed.verdict.status:
        diff.verdict_diff = {
            "original": original.verdict.status,
            "recomputed": recomputed.verdict.status,
        }

    return diff
```

### 6.2 Cause Analysis

```python
def analyze_causes(original, recomputed, manifest):
    # current_version(), current_feed_versions(), and fetch_graph_digest()
    # are assumed to be provided by the surrounding service.
    causes = []

    current = current_version()
    if manifest.analyzerVersion != current:
        causes.append(
            f"Analyzer version mismatch: {manifest.analyzerVersion} vs {current}"
        )

    if manifest.feedVersions != current_feed_versions():
        causes.append("Feed version changed")

    if original.inputs.graphDigest != fetch_graph_digest():
        causes.append("Input graph may have been modified")

    return causes
```

---

## 7. CLI Usage

### 7.1 Replay Command

```bash
# Replay and verify a slice
stella slice replay --digest blake3:a1b2c3d4...

# Output:
# ✓ Slice verified: digest matches
#   Original:   blake3:a1b2c3d4...
#   Recomputed: blake3:a1b2c3d4...
```

### 7.2 Verbose Mode

```bash
stella slice replay --digest blake3:a1b2c3d4... --verbose

# Output:
# Fetching slice from CAS...
# Rehydrating inputs:
#   - Graph: cas://graphs/blake3:xyz... ✓
#   - SBOM: cas://sboms/sha256:abc... ✓
#   - Policy: cas://policies/sha256:def... ✓
# Recomputing slice...
# Comparing results...
# ✓ Match confirmed
```

### 7.3 Mismatch Handling

```bash
stella slice replay --digest blake3:a1b2c3d4...

# Output:
# ✗ Slice mismatch detected!
#
# Differences:
#   Nodes: 1 missing, 0 extra
#   Edges: 1 missing, 1 extra
#   Verdict: unreachable → reachable
#
# Possible causes:
#   - Input graph may have been modified
#   - Analyzer version: 1.2.0 → 1.2.1
#
# Run with --diff-file to export detailed diff
```

---

## 8. Error Handling

| Error | Cause | Resolution |
|-------|-------|------------|
| `slice_not_found` | Slice not in CAS | Check digest, verify upload |
| `input_not_found` | Referenced input missing | Reupload inputs |
| `version_mismatch` | Analyzer version differs | Pin version or accept drift |
| `feed_stale` | Feed snapshot unavailable | Use latest or pin version |

---

## 9. Security Considerations

1. **Input integrity**: Verify CAS digests before replay
2. **Audit logging**: Log all replay attempts
3. **Rate limiting**: Prevent replay DoS
4. **Access control**: Same permissions as slice access

---

## 10. Performance Targets

| Metric | Target |
|--------|--------|
| Replay latency | <5s for typical slice |
| Input fetch | <2s (parallel CAS fetches) |
| Comparison | <100ms |

---

## 11. Related Documentation

- [Slice Schema](./slice-schema.md)
- [Binary Reachability Schema](./binary-reachability-schema.md)
- [Determinism Requirements](../contracts/determinism.md)
- [CAS Architecture](../modules/platform/cas.md)

---

_Created: 2025-12-22. See Sprint 3820 for implementation details._
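
---

## 12. Appendix: Canonicalization Sketch

The ordering and canonicalization rules in section 5 can be illustrated with a short Python sketch. It is not the Scanner implementation, only a minimal example under stated assumptions: the function names `canonical_bytes`, `digest`, and `replay_matches` are hypothetical, the slice is assumed to be a plain dict with a `subgraph` holding `nodes` (each with a `symbolId`) and `edges` (each with `from` and `to`), and SHA-256 stands in for BLAKE3 because the Python standard library does not include BLAKE3. Path ordering and the number-formatting rules of 5.1 and 5.4 are left out.

```python
import copy
import hashlib
import json


def canonical_bytes(slice_doc: dict) -> bytes:
    """Serialize a slice dict with sorted keys, compact separators, UTF-8."""
    doc = copy.deepcopy(slice_doc)  # never mutate the caller's data
    subgraph = doc.get("subgraph", {})

    # Graph ordering (5.2): nodes by symbolId, edges by (from, to).
    # The field names are assumptions made for this sketch.
    if "nodes" in subgraph:
        subgraph["nodes"] = sorted(subgraph["nodes"], key=lambda n: n["symbolId"])
    if "edges" in subgraph:
        subgraph["edges"] = sorted(
            subgraph["edges"], key=lambda e: (e["from"], e["to"])
        )

    # JSON canonicalization (5.1): sorted keys, no whitespace, UTF-8.
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def digest(slice_doc: dict) -> str:
    """Lowercase hex digest of the canonical bytes (SHA-256 as a stand-in)."""
    return "sha256:" + hashlib.sha256(canonical_bytes(slice_doc)).hexdigest()


def replay_matches(original: dict, recomputed: dict) -> bool:
    """Byte-for-byte comparison of the two canonical serializations."""
    return canonical_bytes(original) == canonical_bytes(recomputed)
```

With this sketch, two slices that differ only in key order or in node and edge order serialize to identical bytes and therefore share a digest, while any change to a verdict, node, or edge produces a mismatch.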