# Replay Verification
_Last updated: 2025-12-22. Owner: Scanner Guild._

This document describes the **replay verification** workflow that ensures reachability slices are reproducible and tamper-evident.

---
## 1. Overview
Replay verification answers: *"Given the same inputs, do we get the exact same slice?"*
This is critical for:
- **Audit trails**: Prove analysis results are genuine
- **Tamper detection**: Detect modified inputs or results
- **Debugging**: Identify sources of non-determinism
- **Compliance**: Demonstrate reproducible security analysis
---
## 2. Replay Workflow
```
┌─────────────────┐     ┌──────────────────┐     ┌───────────────────┐
│ Original        │     │ Rehydrate        │     │ Recompute         │
│ Slice           │────►│ Inputs           │────►│ Slice             │
│ (with digest)   │     │ from CAS         │     │ (fresh)           │
└─────────────────┘     └──────────────────┘     └───────────────────┘
                                                 ┌───────────────────┐
                                                 │ Compare           │
                                                 │ byte-for-byte     │
                                                 └───────────────────┘
                                             ┌─────────────┴─────────────┐
                                             ▼                           ▼
                                        ┌──────────┐                ┌──────────┐
                                        │  MATCH   │                │ MISMATCH │
                                        │    ✓     │                │  + diff  │
                                        └──────────┘                └──────────┘
```
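
The workflow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Scanner implementation: the in-memory `cas` dictionary, the `recompute` callback, and the `replay` and `toy_analysis` functions are hypothetical, and SHA-256 from the standard library stands in for BLAKE3, which Python does not ship.

```python
import hashlib
from typing import Callable, Dict, List


def digest_of(data: bytes) -> str:
    # Stand-in digest: sha256 replaces blake3 so the sketch needs no extra package.
    return "sha256:" + hashlib.sha256(data).hexdigest()


def replay(
    slice_digest: str,
    input_digests: List[str],
    cas: Dict[str, bytes],
    recompute: Callable[[List[bytes]], bytes],
) -> dict:
    """Rehydrate inputs from CAS, recompute the slice, compare byte-for-byte."""
    original = cas[slice_digest]                  # original slice (with digest)
    inputs = [cas[d] for d in input_digests]      # rehydrate inputs from CAS
    recomputed = recompute(inputs)                # recompute slice (fresh)
    return {
        "match": recomputed == original,          # byte-for-byte comparison
        "originalDigest": slice_digest,
        "recomputedDigest": digest_of(recomputed),
    }


def toy_analysis(inputs: List[bytes]) -> bytes:
    # Hypothetical deterministic "analysis": prefix and concatenate the inputs.
    return b"slice:" + b"".join(inputs)


graph = b'{"nodes":[]}'
slice_bytes = toy_analysis([graph])
cas = {digest_of(graph): graph, digest_of(slice_bytes): slice_bytes}
result = replay(digest_of(slice_bytes), [digest_of(graph)], cas, toy_analysis)
```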
---
## 3. API Reference
### 3.1 Replay Endpoint
```http
POST /api/slices/replay
Content-Type: application/json

{
  "sliceDigest": "blake3:a1b2c3d4..."
}
```
### 3.2 Response Format
**Match Response (200 OK)**:
```json
{
  "match": true,
  "originalDigest": "blake3:a1b2c3d4...",
  "recomputedDigest": "blake3:a1b2c3d4...",
  "replayedAt": "2025-12-22T10:00:00Z",
  "inputsVerified": true
}
```
**Mismatch Response (200 OK)**:
```json
{
  "match": false,
  "originalDigest": "blake3:a1b2c3d4...",
  "recomputedDigest": "blake3:e5f6a7b8...",
  "replayedAt": "2025-12-22T10:00:00Z",
  "diff": {
    "missingNodes": ["node:5"],
    "extraNodes": ["node:6"],
    "missingEdges": [{"from": "node:1", "to": "node:5"}],
    "extraEdges": [{"from": "node:1", "to": "node:6"}],
    "verdictDiff": {
      "original": "unreachable",
      "recomputed": "reachable"
    },
    "confidenceDiff": {
      "original": 0.95,
      "recomputed": 0.72
    }
  },
  "possibleCauses": [
    "Input graph may have been modified",
    "Analyzer version mismatch: 1.2.0 vs 1.2.1",
    "Feed version changed: nvd-2025-12-20 vs nvd-2025-12-22"
  ]
}
```
**Error Response (404 Not Found)**:
```json
{
  "error": "slice_not_found",
  "message": "Slice with digest blake3:a1b2c3d4... not found in CAS",
  "sliceDigest": "blake3:a1b2c3d4..."
}
```
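
A client might build the request and interpret the three documented responses as follows. This is a sketch only: the base URL is a placeholder, `build_replay_request` and `summarize_replay` are hypothetical helper names, and only fields shown in the examples above are read. The request is constructed but never sent.

```python
import json
import urllib.request


def build_replay_request(base_url: str, slice_digest: str) -> urllib.request.Request:
    """Build (but do not send) the POST /api/slices/replay request."""
    body = json.dumps({"sliceDigest": slice_digest}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/slices/replay",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_replay(status: int, payload: dict) -> str:
    """Turn one of the documented responses into a one-line summary."""
    if status == 404:
        return f"error: {payload['error']} ({payload['sliceDigest']})"
    if payload["match"]:
        return f"match: {payload['originalDigest']}"
    diff = payload.get("diff", {})
    return (
        f"mismatch: {len(diff.get('missingNodes', []))} missing nodes, "
        f"{len(diff.get('extraNodes', []))} extra nodes; "
        f"causes: {'; '.join(payload.get('possibleCauses', []))}"
    )


# Placeholder host; substitute the real service address.
request = build_replay_request("https://stella.example", "blake3:a1b2c3d4...")
```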
---
## 4. Input Rehydration
All inputs must be CAS-addressed for replay:
### 4.1 Required Inputs
| Input | CAS Key | Description |
|-------|---------|-------------|
| Graph | `cas://graphs/{digest}` | Full RichGraph JSON |
| Binaries | `cas://binaries/{digest}` | Binary file hashes |
| SBOM | `cas://sboms/{digest}` | CycloneDX/SPDX document |
| Policy | `cas://policies/{digest}` | Policy DSL |
| Feeds | `cas://feeds/{version}` | Advisory feed snapshot |
### 4.2 Manifest Contents
```json
{
  "manifest": {
    "analyzerVersion": "scanner.native:1.2.0",
    "rulesetHash": "sha256:abc123...",
    "feedVersions": {
      "nvd": "2025-12-20",
      "osv": "2025-12-20",
      "ghsa": "2025-12-20"
    },
    "createdAt": "2025-12-22T10:00:00Z",
    "toolchain": "iced-x86:1.21.0",
    "environment": {
      "os": "linux",
      "arch": "x86_64"
    }
  }
}
```
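
Rehydration of the digest-addressed inputs can be sketched as a parallel fetch followed by a digest check on every blob. The `fetch` callback, the `rehydrate` and `verify_blob` functions, and the restriction to `sha256:` digests are assumptions made for this illustration; the real CAS client and BLAKE3 digests are not modelled, and feeds (keyed by version rather than digest) are left out.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def verify_blob(digest: str, data: bytes) -> None:
    """Raise if the content does not hash to the digest it is addressed by."""
    algo, _, expected = digest.partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported digest algorithm in this sketch: {algo}")
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected:
        raise ValueError(f"digest mismatch: expected {expected}, got {actual}")


def rehydrate(keys: Dict[str, str], fetch: Callable[[str], bytes]) -> Dict[str, bytes]:
    """Fetch every input concurrently, then verify each against its digest.

    `keys` maps an input name to a CAS key such as cas://graphs/{digest}.
    """
    with ThreadPoolExecutor(max_workers=8) as pool:
        # pool.map preserves input order, so names and blobs stay aligned.
        blobs = dict(zip(keys, pool.map(fetch, keys.values())))
    for name, key in keys.items():
        verify_blob(key.rsplit("/", 1)[1], blobs[name])
    return blobs


# Toy usage with an in-memory store standing in for CAS.
graph = b'{"nodes":[]}'
graph_key = "cas://graphs/sha256:" + hashlib.sha256(graph).hexdigest()
store = {graph_key: graph}
inputs = rehydrate({"graph": graph_key}, store.__getitem__)
```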
---
## 5. Determinism Requirements
For byte-for-byte reproducibility:
### 5.1 JSON Canonicalization
```
1. Keys sorted alphabetically at all levels
2. No whitespace (compact JSON)
3. UTF-8 encoding
4. Lowercase hex for all hashes
5. Numbers: no trailing zeros, scientific notation for large values (confidence values use the fixed 6-decimal form in 5.4 instead)
```
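
In Python, rules 1 to 3 map directly onto `json.dumps` options. The sketch below is an assumption about how a canonicalizer could look, not the project's serializer: `canonical_json` is a hypothetical name, `sort_keys` orders keys by Unicode code point (assumed here to be what "alphabetically" means), and the number formatting in rule 5 is not handled.

```python
import json


def canonical_json(value) -> bytes:
    """Serialize with sorted keys, no insignificant whitespace, UTF-8 output."""
    text = json.dumps(
        value,
        sort_keys=True,          # rule 1: keys sorted at every nesting level
        separators=(",", ":"),   # rule 2: compact, no whitespace
        ensure_ascii=False,      # emit non-ASCII characters directly, not as \u escapes
        allow_nan=False,         # NaN and Infinity are not valid JSON
    )
    return text.encode("utf-8")  # rule 3: UTF-8 bytes


# Two differently ordered dicts produce identical bytes.
first = canonical_json({"b": 1, "a": {"d": 2, "c": 3}})
second = canonical_json({"a": {"c": 3, "d": 2}, "b": 1})
```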
### 5.2 Graph Ordering
```
Nodes: sorted by symbolId (lexicographic)
Edges: sorted by (from, to) tuple (lexicographic)
Paths: sorted by first node, then path length
```
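
The three orderings can be expressed as sort keys. The sketch assumes nodes and edges are dictionaries with `symbolId`, `from`, and `to` fields and that a path is a non-empty list of node identifiers; `order_graph` is a hypothetical name. Python compares strings by code point, and `sorted` is stable, so items that tie on these keys keep their input order.

```python
from typing import Dict, List, Tuple


def order_graph(
    nodes: List[Dict[str, str]],
    edges: List[Dict[str, str]],
    paths: List[List[str]],
) -> Tuple[list, list, list]:
    """Return nodes, edges, and paths in the deterministic order described above."""
    sorted_nodes = sorted(nodes, key=lambda n: n["symbolId"])
    sorted_edges = sorted(edges, key=lambda e: (e["from"], e["to"]))
    # p[0] is the first node of the path; empty paths are assumed not to occur.
    sorted_paths = sorted(paths, key=lambda p: (p[0], len(p)))
    return sorted_nodes, sorted_edges, sorted_paths


nodes, edges, paths = order_graph(
    nodes=[{"symbolId": "sym:b"}, {"symbolId": "sym:a"}],
    edges=[{"from": "sym:b", "to": "sym:a"}, {"from": "sym:a", "to": "sym:b"}],
    paths=[["sym:a", "sym:b", "sym:c"], ["sym:a", "sym:b"]],
)
```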
### 5.3 Timestamp Handling
```
All timestamps: UTC, ISO-8601, with 'Z' suffix
Example: "2025-12-22T10:00:00Z"
No milliseconds unless significant
```
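
A formatter following these rules might look like the sketch below. `format_timestamp` is a hypothetical helper, and treating "significant" as "non-zero at millisecond precision" is an interpretation, not something the rules above state.

```python
from datetime import datetime, timedelta, timezone


def format_timestamp(moment: datetime) -> str:
    """Render an aware datetime as UTC ISO-8601 with a 'Z' suffix."""
    if moment.tzinfo is None:
        raise ValueError("naive datetimes are ambiguous; attach a timezone first")
    utc = moment.astimezone(timezone.utc)
    # Emit milliseconds only when they are non-zero.
    timespec = "milliseconds" if utc.microsecond >= 1000 else "seconds"
    return utc.isoformat(timespec=timespec).replace("+00:00", "Z")


# 12:00 at UTC+2 is 10:00 UTC.
local = datetime(2025, 12, 22, 12, 0, 0, tzinfo=timezone(timedelta(hours=2)))
stamp = format_timestamp(local)
```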
### 5.4 Floating Point
```
Confidence values: round to 6 decimal places
Example: 0.950000, not 0.95 or 0.9500001
```
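
Fixed six-decimal output can be produced with the `decimal` module. The sketch is illustrative: `format_confidence` is a hypothetical name, and round-half-even is an assumed rounding mode, since the rule above does not name one.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def format_confidence(value: float) -> str:
    """Render a confidence value with exactly six decimal places."""
    # Going through repr() uses the shortest decimal form of the float,
    # which avoids surprises from its binary expansion.
    quantized = Decimal(repr(value)).quantize(
        Decimal("0.000001"), rounding=ROUND_HALF_EVEN
    )
    return f"{quantized:.6f}"


formatted = [format_confidence(v) for v in (0.95, 0.9500001, 0.72)]
```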
---
## 6. Diff Computation
When slices don't match:
### 6.1 Diff Algorithm
```python
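from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SliceDiff:
    missing_nodes: list = field(default_factory=list)
    extra_nodes: list = field(default_factory=list)
    missing_edges: list = field(default_factory=list)
    extra_edges: list = field(default_factory=list)
    verdict_diff: Optional[dict] = None


def compute_diff(original, recomputed):
    diff = SliceDiff()

    # Node diff (sorted so the diff itself is deterministic)
    orig_nodes = {n.id for n in original.subgraph.nodes}
    new_nodes = {n.id for n in recomputed.subgraph.nodes}
    diff.missing_nodes = sorted(orig_nodes - new_nodes)
    diff.extra_nodes = sorted(new_nodes - orig_nodes)

    # Edge diff ("from" is a Python keyword, so the attribute is assumed to be from_)
    orig_edges = {(e.from_, e.to) for e in original.subgraph.edges}
    new_edges = {(e.from_, e.to) for e in recomputed.subgraph.edges}
    diff.missing_edges = sorted(orig_edges - new_edges)
    diff.extra_edges = sorted(new_edges - orig_edges)

    # Verdict diff
    if original.verdict.status != recomputed.verdict.status:
        diff.verdict_diff = {
            "original": original.verdict.status,
            "recomputed": recomputed.verdict.status,
        }

    return diff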
```
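
The mismatch response in section 3.2 also carries a `confidenceDiff`, which the algorithm above does not compute. One way it could be derived is sketched below; the `confidence_diff` helper and the comparison at six decimal places (borrowed from section 5.4) are assumptions, not documented behavior.

```python
from typing import Optional


def confidence_diff(original: float, recomputed: float) -> Optional[dict]:
    """Return a confidenceDiff entry, or None when both values agree to 6 decimals."""
    if round(original, 6) == round(recomputed, 6):
        return None
    return {"original": original, "recomputed": recomputed}


changed = confidence_diff(0.95, 0.72)
unchanged = confidence_diff(0.95, 0.9500001)
```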
### 6.2 Cause Analysis
```python
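def analyze_causes(original, recomputed, manifest):
    """List likely reasons for a mismatch, based on the replay manifest."""
    causes = []

    # Analyzer drift: the slice was produced by a different analyzer build
    current = current_version()
    if manifest.analyzerVersion != current:
        causes.append(
            f"Analyzer version mismatch: {manifest.analyzerVersion} vs {current}"
        )

    # Feed drift: advisory feeds have moved since the slice was created
    feeds = current_feed_versions()
    if manifest.feedVersions != feeds:
        causes.append(f"Feed version changed: {manifest.feedVersions} vs {feeds}")

    # Input drift: the graph now in CAS differs from the recorded digest
    if original.inputs.graphDigest != fetch_graph_digest():
        causes.append("Input graph may have been modified")

    return causes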
```
---
## 7. CLI Usage
### 7.1 Replay Command
```bash
# Replay and verify a slice
stella slice replay --digest blake3:a1b2c3d4...
# Output:
# ✓ Slice verified: digest matches
# Original: blake3:a1b2c3d4...
# Recomputed: blake3:a1b2c3d4...
```
### 7.2 Verbose Mode
```bash
stella slice replay --digest blake3:a1b2c3d4... --verbose
# Output:
# Fetching slice from CAS...
# Rehydrating inputs:
# - Graph: cas://graphs/blake3:xyz... ✓
# - SBOM: cas://sboms/sha256:abc... ✓
# - Policy: cas://policies/sha256:def... ✓
# Recomputing slice...
# Comparing results...
# ✓ Match confirmed
```
### 7.3 Mismatch Handling
```bash
stella slice replay --digest blake3:a1b2c3d4...
# Output:
# ✗ Slice mismatch detected!
#
# Differences:
# Nodes: 1 missing, 0 extra
# Edges: 1 missing, 1 extra
# Verdict: unreachable → reachable
#
# Possible causes:
# - Input graph may have been modified
# - Analyzer version: 1.2.0 → 1.2.1
#
# Run with --diff-file to export detailed diff
```
---
## 8. Error Handling
| Error | Cause | Resolution |
|-------|-------|------------|
| `slice_not_found` | Slice not in CAS | Check digest, verify upload |
| `input_not_found` | Referenced input missing from CAS | Re-upload the missing inputs |
| `version_mismatch` | Analyzer version differs | Pin version or accept drift |
| `feed_stale` | Feed snapshot unavailable | Use latest or pin version |
---
## 9. Security Considerations
1. **Input integrity**: Verify CAS digests before replay
2. **Audit logging**: Log all replay attempts
3. **Rate limiting**: Prevent replay DoS
4. **Access control**: Same permissions as slice access
---
## 10. Performance Targets
| Metric | Target |
|--------|--------|
| Replay latency | <5s for typical slice |
| Input fetch | <2s (parallel CAS fetches) |
| Comparison | <100ms |
---
## 11. Related Documentation
- [Slice Schema](./slice-schema.md)
- [Binary Reachability Schema](./binary-reachability-schema.md)
- [Determinism Requirements](../contracts/determinism.md)
- [CAS Architecture](../modules/platform/cas.md)
---
_Created: 2025-12-22. See Sprint 3820 for implementation details._