Score Replay Operations Runbook
Version: 1.0.0
Sprint: 3500.0004.0004
Last Updated: 2025-12-20
This runbook covers operational procedures for Score Replay, including deterministic score computation verification, proof bundle validation, and troubleshooting replay discrepancies.
Table of Contents
- Overview
- Score Replay Operations
- Determinism Verification
- Proof Bundle Management
- Troubleshooting
- Monitoring & Alerting
- Escalation Procedures
1. Overview
What is Score Replay?
Score Replay is the ability to re-execute a vulnerability score computation using the exact same inputs (SBOM, rules, policies, feeds) that were used in the original scan. This provides:
- Auditability: Prove that a score was computed correctly
- Determinism verification: Confirm that identical inputs produce identical outputs
- Compliance evidence: Generate proof bundles for regulatory requirements
- Dispute resolution: Verify contested scan results
Key Concepts
| Term | Definition |
|---|---|
| Manifest | Content-addressed record of all scoring inputs (SBOM hash, rules hash, policy hash, feed hash) |
| Proof Bundle | Signed attestation containing manifest, score, and Merkle proof |
| Root Hash | Merkle tree root computed from all input hashes |
| DSSE Envelope | Dead Simple Signing Envelope containing the signed proof |
| Freeze Timestamp | Optional timestamp to replay scoring at a specific point in time |
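To make the Root Hash concept concrete, here is a minimal sketch of folding the four manifest hashes into a single Merkle-style root. The pairing scheme (hashing the concatenated hex text, pairing an odd node with itself) and the function names `hash_pair` and `merkle_root` are assumptions for illustration; the Attestor library may construct its tree differently.

```shell
#!/bin/sh
# Sketch only: fold hex digests into one Merkle-style root.
# ASSUMPTIONS (not taken from the Attestor library): nodes are combined by
# hashing their concatenated hex text, and an odd node is paired with itself.

hash_pair() {
  # SHA-256 over the two digests written back to back.
  printf '%s%s' "$1" "$2" | sha256sum | cut -d' ' -f1
}

merkle_root() {
  if [ "$#" -eq 0 ]; then
    echo "merkle_root: need at least one digest" >&2
    return 1
  fi
  # Each pass of the outer loop collapses one tree level.
  while [ "$#" -gt 1 ]; do
    count=$#
    i=0
    while [ "$i" -lt "$count" ]; do
      left=$1
      if [ $((i + 1)) -lt "$count" ]; then
        right=$2
        shift 2
      else
        right=$1
        shift 1
      fi
      # Append the parent node; it belongs to the next level.
      set -- "$@" "$(hash_pair "$left" "$right")"
      i=$((i + 2))
    done
  done
  printf 'sha256:%s\n' "$1"
}

# Usage (placeholder digests, passed in a fixed order):
#   merkle_root "$SBOM_HASH" "$RULES_HASH" "$POLICY_HASH" "$FEED_HASH"
```

Input order changes the result in this sketch, so the hashes would have to be passed in the same fixed order on every run for the root to be reproducible.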
Architecture Components
| Component | Purpose | Location |
|---|---|---|
| Score Engine | Computes vulnerability scores | Scanner Worker |
| Manifest Store | Persists scoring manifests | scanner.manifest table |
| Proof Chain | Generates Merkle proofs | Attestor library |
| Signer | Signs proof bundles (DSSE) | Signer service |
2. Score Replay Operations
2.1 Triggering a Score Replay
Via CLI
# Basic replay
stella score replay --scan <scan-id>
# Replay with specific manifest
stella score replay --scan <scan-id> --manifest-hash sha256:abc123...
# Replay with frozen timestamp (for determinism testing)
stella score replay --scan <scan-id> --freeze 2025-01-15T00:00:00Z
# Output as JSON
stella score replay --scan <scan-id> --output json
Via API
# POST /api/v1/scanner/score/{scanId}/replay
curl -X POST "https://scanner.stellaops.local/api/v1/scanner/score/scan-123/replay" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"manifestHash": "sha256:abc123...",
"freezeTimestamp": "2025-01-15T00:00:00Z"
}'
Expected Response
{
"scanId": "scan-123",
"score": 7.5,
"rootHash": "sha256:def456...",
"bundleUri": "/api/v1/scanner/scans/scan-123/proofs/sha256:def456...",
"manifestHash": "sha256:abc123...",
"replayedAt": "2025-01-16T10:30:00Z",
"deterministic": true
}
2.2 Retrieving Proof Bundles
Via CLI
# Get bundle for a scan
stella score bundle --scan <scan-id>
# Download bundle to file
stella score bundle --scan <scan-id> --output bundle.tar.gz
Via API
# GET /api/v1/scanner/score/{scanId}/bundle
curl "https://scanner.stellaops.local/api/v1/scanner/score/scan-123/bundle" \
-H "Authorization: Bearer $TOKEN" \
-o bundle.tar.gz
2.3 Verifying Score Integrity
Via CLI
# Verify against expected root hash
stella score verify --scan <scan-id> --root-hash sha256:def456...
# Verify downloaded bundle
stella proof verify --bundle bundle.tar.gz
Via API
# POST /api/v1/scanner/score/{scanId}/verify
curl -X POST "https://scanner.stellaops.local/api/v1/scanner/score/scan-123/verify" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"expectedRootHash": "sha256:def456..."}'
3. Determinism Verification
3.1 What Affects Determinism?
Score computation is deterministic when every input below is held fixed:
| Input | Requirement |
|---|---|
| SBOM | Identical content (same hash) |
| Rules | Same rule version and configuration |
| Policy | Same policy document |
| Feeds | Same feed snapshot (freeze timestamp) |
| Ordering | Findings sorted deterministically |
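The ordering requirement can be illustrated with a small sketch: if findings are sorted bytewise before hashing, the digest no longer depends on the order in which they were emitted. The one-finding-per-line text file and the function name `findings_digest` are hypothetical; this is not the Scanner's actual serialization.

```shell
#!/bin/sh
# Sketch only: digest a findings list so that line order cannot change the result.
# ASSUMPTION: one finding per line in a plain-text export (hypothetical format).

findings_digest() {
  # LC_ALL=C forces bytewise collation, so the sort order is identical on every host.
  LC_ALL=C sort "$1" | sha256sum | cut -d' ' -f1
}

# Usage:
#   findings_digest findings.txt
```

Pinning the locale matters here: a locale-aware sort can order the same lines differently on two hosts, which would reintroduce the non-determinism the sort is meant to remove.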
3.2 Running Determinism Checks
# Run replay twice and compare
REPLAY1=$(stella score replay --scan "$SCAN_ID" --output json)
REPLAY2=$(stella score replay --scan "$SCAN_ID" --output json)
# Extract root hashes (quote the variables so the JSON is not word-split)
HASH1=$(echo "$REPLAY1" | jq -r '.rootHash')
HASH2=$(echo "$REPLAY2" | jq -r '.rootHash')
# Compare
if [ "$HASH1" = "$HASH2" ]; then
  echo "✓ Determinism verified: $HASH1"
else
  echo "✗ Non-deterministic! $HASH1 != $HASH2"
  exit 1
fi
3.3 Common Determinism Issues
| Issue | Cause | Resolution |
|---|---|---|
| Different root hash | Feed data changed between replays | Use --freeze timestamp |
| Score drift | Rule version mismatch | Pin rules version in manifest |
| Ordering differences | Non-stable sort in findings | Check Scanner version (fixed in v2.1+) |
| Timestamp in output | Current time in computation | Ensure frozen time mode |
3.4 Feed Freeze for Reproducibility
# Replay with feed state frozen to original scan time
stella score replay --scan "$SCAN_ID" \
  --freeze "$(stella scan show "$SCAN_ID" --output json | jq -r '.scannedAt')"
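One pitfall with the command above: `jq -r` prints the literal string `null` when `.scannedAt` is absent, and that string would then be passed to `--freeze`. A minimal guard, assuming `--freeze` expects the UTC form used in this runbook's examples (`YYYY-MM-DDTHH:MM:SSZ`), could look like the sketch below. The function name `is_utc_timestamp` is hypothetical, and the CLI may accept other timestamp forms.

```shell
#!/bin/sh
# Sketch only: refuse to call replay with a malformed freeze timestamp.
# ASSUMPTION: --freeze takes UTC timestamps shaped like 2025-01-15T00:00:00Z,
# as in this runbook's examples; other accepted forms are not covered.

is_utc_timestamp() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]Z)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Usage (requires the stella CLI and jq):
#   TS=$(stella scan show "$SCAN_ID" --output json | jq -r '.scannedAt')
#   if is_utc_timestamp "$TS"; then
#     stella score replay --scan "$SCAN_ID" --freeze "$TS"
#   else
#     echo "unexpected scannedAt value: $TS" >&2
#   fi
```

The check is purely syntactic: it rejects `null`, empty strings, and local-time formats, but it does not validate calendar ranges such as month 13.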
4. Proof Bundle Management
4.1 Bundle Contents
A proof bundle (.tar.gz) contains:
bundle/
├── manifest.json # Input hashes and metadata
├── score.json # Computed score and findings summary
├── merkle-proof.json # Merkle tree with inclusion proofs
├── dsse-envelope.json # Signed attestation (DSSE format)
└── certificate.pem # Signing certificate (optional)
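Before running a full verification, it can help to confirm that a downloaded archive has the layout listed above. The following sketch checks only for the four non-optional files; the function name `check_bundle_layout` is hypothetical, and the check says nothing about whether the contents are valid.

```shell
#!/bin/sh
# Sketch only: confirm a proof bundle archive contains the required files.
# ASSUMPTION: entries are stored under bundle/ as in the layout above;
# certificate.pem is optional and therefore not checked.

check_bundle_layout() {
  bundle=$1
  # tar -tzf lists archive members without extracting them.
  listing=$(tar -tzf "$bundle") || return 2
  missing=0
  for name in manifest.json score.json merkle-proof.json dsse-envelope.json; do
    # Accept both "bundle/<name>" and "./bundle/<name>" member paths.
    if ! printf '%s\n' "$listing" | grep -qxF -e "bundle/$name" -e "./bundle/$name"; then
      echo "missing: bundle/$name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_bundle_layout bundle.tar.gz && stella proof verify --bundle bundle.tar.gz
```

The function returns 0 when all four files are present, 1 when at least one is missing, and 2 when the archive cannot be read.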
4.2 Inspecting Bundles
# Extract and view manifest
tar -xzf bundle.tar.gz
cat bundle/manifest.json | jq .
# Verify DSSE signature
stella proof verify --bundle bundle.tar.gz --verbose
# Check Merkle proof
stella proof spine --bundle bundle.tar.gz
4.3 Bundle Retention Policy
| Environment | Retention | Notes |
|---|---|---|
| Production | 7 years | Regulatory compliance |
| Staging | 90 days | Testing purposes |
| Development | 30 days | Cleaned up automatically |
4.4 Archiving Bundles
# Export bundle to long-term storage
stella score bundle --scan $SCAN_ID --output /archive/proofs/$SCAN_ID.tar.gz
# Bulk export for compliance audit
stella score bundle-export \
--since 2024-01-01 \
--until 2024-12-31 \
--output /archive/2024-proofs/
5. Troubleshooting
5.1 Replay Returns Different Score
Symptoms: Replayed score differs from original scan score.
Diagnostic Steps:
- Check manifest integrity:
  stella scan show $SCAN_ID --output json | jq '.manifest'
- Verify feed state:
  # Compare feed hashes
  stella score replay --scan $SCAN_ID --freeze $ORIGINAL_TIME --output json | jq '.manifestHash'
- Check for rule updates:
  stella rules show --version --output json
Resolution:
- Use a --freeze timestamp matching the original scan
- Pin rule versions in policy
- Regenerate manifest if inputs changed legitimately
5.2 Proof Verification Fails
Symptoms: stella proof verify returns validation errors.
Diagnostic Steps:
- Check DSSE signature:
  stella proof verify --bundle bundle.tar.gz --verbose 2>&1 | grep -i signature
- Verify certificate validity:
  openssl x509 -in bundle/certificate.pem -noout -dates
- Check Merkle proof:
  stella proof spine --bundle bundle.tar.gz --verify
Common Errors:
| Error | Cause | Fix |
|---|---|---|
| SIGNATURE_INVALID | Bundle tampered or wrong key | Re-download bundle |
| CERTIFICATE_EXPIRED | Signing cert expired | Check signing key rotation |
| MERKLE_MISMATCH | Root hash doesn't match | Verify correct bundle version |
| MANIFEST_MISSING | Incomplete bundle | Re-export from API |
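For scripted triage, the error table can be encoded as a simple lookup. This sketch only restates the table; the function name `suggest_fix` is hypothetical, and how the error code is obtained from the CLI output is left open because the output format is not specified here.

```shell
#!/bin/sh
# Sketch only: map a proof verification error code to the fix from the table.
# ASSUMPTION: the caller has already extracted the error code; this helper
# does not parse stella output.

suggest_fix() {
  case "$1" in
    SIGNATURE_INVALID)   echo "Re-download bundle" ;;
    CERTIFICATE_EXPIRED) echo "Check signing key rotation" ;;
    MERKLE_MISMATCH)     echo "Verify correct bundle version" ;;
    MANIFEST_MISSING)    echo "Re-export from API" ;;
    *)
      echo "Unknown error code: $1" >&2
      return 1 ;;
  esac
}

# Usage:
#   suggest_fix MERKLE_MISMATCH
```

Returning a non-zero status for unknown codes lets a calling script fall through to manual escalation instead of printing a misleading suggestion.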
5.3 Replay Timeout
Symptoms: Replay request times out or takes too long.
Diagnostic Steps:
- Check scan size:
  stella scan show $SCAN_ID --output json | jq '.findingsCount'
- Monitor replay progress:
  stella score replay --scan $SCAN_ID --verbose
Resolution:
- For large scans (>10k findings), increase timeout
- Check Scanner Worker health
- Consider async replay for very large scans
5.4 Missing Manifest
Symptoms: Manifest not found error on replay.
Diagnostic Steps:
- Verify scan exists:
  stella scan show $SCAN_ID
- Check manifest table:
  SELECT * FROM scanner.manifest WHERE scan_id = 'scan-123';
Resolution:
- Manifest may have been purged (check retention policy)
- Restore from backup if available
- Re-run scan if original inputs available
6. Monitoring & Alerting
6.1 Key Metrics
| Metric | Description | Alert Threshold |
|---|---|---|
| score_replay_duration_ms | Time to complete replay | p99 > 30s |
| score_replay_determinism_failures | Non-deterministic replays | > 0 |
| proof_verification_failures | Failed verifications | > 5/hour |
| manifest_storage_size_bytes | Manifest table size | > 100GB |
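When Prometheus is unavailable (for example, in an air-gapped environment), the p99 latency threshold can be sanity-checked from raw timings. The sketch below computes a nearest-rank p99 over a file with one integer duration in milliseconds per line. The input file and the function names `p99_ms` and `exceeds_latency_threshold` are hypothetical, and because Prometheus estimates quantiles from histogram buckets, its figure can differ slightly.

```shell
#!/bin/sh
# Sketch only: nearest-rank p99 over replay durations (one integer ms value per line).
# ASSUMPTION: the input file is a hypothetical export of raw timings.

p99_ms() {
  sort -n "$1" | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((NR * 99 + 99) / 100)   # integer form of ceil(0.99 * NR)
      print v[rank]
    }'
}

exceeds_latency_threshold() {
  # 30000 ms corresponds to the "p99 > 30s" alert threshold.
  p99=$(p99_ms "$1") || return 2
  [ "$p99" -gt 30000 ]
}

# Usage:
#   exceeds_latency_threshold durations.txt && echo "p99 above 30s"
```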
6.2 Grafana Dashboard Queries
# Replay latency
histogram_quantile(0.99,
rate(score_replay_duration_ms_bucket[5m])
)
# Determinism failure rate
rate(score_replay_determinism_failures_total[1h])
# Proof verification success rate
sum(rate(proof_verification_success_total[1h])) /
sum(rate(proof_verification_total[1h]))
6.3 Alert Rules
groups:
- name: score-replay
rules:
- alert: ScoreReplayLatencyHigh
expr: histogram_quantile(0.99, rate(score_replay_duration_ms_bucket[5m])) > 30000
for: 5m
labels:
severity: warning
annotations:
summary: Score replay latency exceeds 30s at p99
- alert: DeterminismFailure
expr: increase(score_replay_determinism_failures_total[1h]) > 0
for: 1m
labels:
severity: critical
annotations:
summary: Non-deterministic score replay detected
7. Escalation Procedures
7.1 Escalation Matrix
| Severity | Condition | Response Time | Escalate To |
|---|---|---|---|
| P1 - Critical | Determinism failure in production | 15 minutes | Platform Team Lead |
| P2 - High | Proof verification failures > 10/hour | 1 hour | Scanner Team |
| P3 - Medium | Replay latency degradation | 4 hours | Scanner Team |
| P4 - Low | Single replay failure | Next business day | Support Queue |
7.2 P1: Determinism Failure Response
- Immediate Actions (0-15 min):
  - Capture affected scan IDs
  - Preserve original manifest data
  - Check for recent deployments
- Investigation (15-60 min):
  - Compare input hashes between replays
  - Check feed synchronization status
  - Review rule engine logs
- Remediation:
  - Roll back if deployment-related
  - Freeze feeds if data drift
  - Hotfix if code bug identified
7.3 Contacts
| Role | Contact | Availability |
|---|---|---|
| Scanner Team Lead | scanner-lead@stellaops.io | Business hours |
| Platform On-Call | platform-oncall@stellaops.io | 24/7 |
| Security Team | security@stellaops.io | Business hours |
Appendix A: SQL Queries
Check Manifest History
SELECT
scan_id,
manifest_hash,
sbom_hash,
rules_hash,
policy_hash,
feed_hash,
created_at
FROM scanner.manifest
WHERE scan_id = 'scan-123'
ORDER BY created_at DESC;
Find Non-Deterministic Replays
SELECT
scan_id,
COUNT(DISTINCT root_hash) as unique_hashes,
MIN(replayed_at) as first_replay,
MAX(replayed_at) as last_replay
FROM scanner.replay_log
GROUP BY scan_id
HAVING COUNT(DISTINCT root_hash) > 1;
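The same check can be run without database access, for instance against an exported log on an offline host. This sketch assumes a hypothetical CSV export of scanner.replay_log with lines of the form `scan_id,root_hash` and no header row; the function name `nondeterministic_scans` is likewise illustrative.

```shell
#!/bin/sh
# Sketch only: list scans whose replays produced more than one root hash.
# ASSUMPTION: input is a hypothetical CSV export with "scan_id,root_hash"
# lines and no header row.

nondeterministic_scans() {
  # 1) drop duplicate (scan, hash) pairs, 2) keep only the scan column,
  # 3) count the distinct hashes left per scan, 4) print scans with more than one.
  LC_ALL=C sort -u "$1" | cut -d, -f1 | uniq -c | awk '$1 > 1 { print $2 }'
}

# Usage:
#   nondeterministic_scans replay_log.csv
```

An empty result corresponds to the SQL query returning no rows: every scan in the export replayed to a single root hash.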
Proof Bundle Statistics
SELECT
DATE_TRUNC('day', created_at) as day,
COUNT(*) as bundles_created,
AVG(bundle_size_bytes) as avg_size,
SUM(bundle_size_bytes) as total_size
FROM scanner.proof_bundle
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY DATE_TRUNC('day', created_at)
ORDER BY day DESC;
Appendix B: CLI Quick Reference
# Score Replay Commands
stella score replay --scan <id> # Replay score computation
stella score replay --scan <id> --freeze <ts> # Replay with frozen time
stella score bundle --scan <id> # Get proof bundle
stella score verify --scan <id> --root-hash <hash> # Verify score
# Proof Commands
stella proof verify --bundle <path> # Verify bundle file
stella proof verify --bundle <path> --offline # Offline verification
stella proof spine --bundle <path> # Show Merkle spine
# Output Formats
--output json # JSON output
--output table # Table output (default)
--output yaml # YAML output
Revision History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2025-12-20 | Agent | Initial release |