Score Replay Operations Runbook

Version: 1.0.0
Sprint: 3500.0004.0004
Last Updated: 2025-12-20

This runbook covers operational procedures for Score Replay, including deterministic score computation verification, proof bundle validation, and troubleshooting replay discrepancies.


Table of Contents

  1. Overview
  2. Score Replay Operations
  3. Determinism Verification
  4. Proof Bundle Management
  5. Troubleshooting
  6. Monitoring & Alerting
  7. Escalation Procedures

1. Overview

What is Score Replay?

Score Replay is the ability to re-execute a vulnerability score computation using the exact same inputs (SBOM, rules, policies, feeds) that were used in the original scan. This provides:

  • Auditability: Prove that a score was computed correctly
  • Determinism verification: Confirm that identical inputs produce identical outputs
  • Compliance evidence: Generate proof bundles for regulatory requirements
  • Dispute resolution: Verify contested scan results

Key Concepts

| Term | Definition |
|------|------------|
| Manifest | Content-addressed record of all scoring inputs (SBOM hash, rules hash, policy hash, feed hash) |
| Proof Bundle | Signed attestation containing manifest, score, and Merkle proof |
| Root Hash | Merkle tree root computed from all input hashes |
| DSSE Envelope | Dead Simple Signing Envelope containing the signed proof |
| Freeze Timestamp | Optional timestamp to replay scoring at a specific point in time |
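
To make the Root Hash idea concrete, the sketch below folds a list of input hashes into a single root by pairwise SHA-256. It is an illustration only: the leaf order, hashing the concatenated hex strings, and pairing an odd last node with itself are assumptions, and the function name `merkle_root` is hypothetical. This runbook does not specify how the Attestor library actually builds its tree.

```shell
# Illustrative Merkle fold over hex-encoded hashes passed as arguments.
# Assumptions (not taken from the Attestor library): leaves are used in the
# order given, a parent is SHA-256 over the two concatenated hex strings,
# and an unpaired last node is paired with itself.
merkle_root() {
  local count left right
  [ "$#" -gt 0 ] || return 1
  while [ "$#" -gt 1 ]; do
    count=$#
    # Consume the current level from the front, append parents at the back.
    while [ "$count" -gt 0 ]; do
      left=$1
      if [ "$count" -ge 2 ]; then
        right=$2; shift 2; count=$((count - 2))
      else
        right=$1; shift 1; count=$((count - 1))
      fi
      set -- "$@" "$(printf '%s%s' "$left" "$right" | sha256sum | cut -d' ' -f1)"
    done
  done
  printf '%s\n' "$1"
}
```

The useful property is that changing any single input hash, or the order of the inputs, changes the root, which is why comparing root hashes is enough to detect drift in any scoring input.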

Architecture Components

| Component | Purpose | Location |
|-----------|---------|----------|
| Score Engine | Computes vulnerability scores | Scanner Worker |
| Manifest Store | Persists scoring manifests | scanner.manifest table |
| Proof Chain | Generates Merkle proofs | Attestor library |
| Signer | Signs proof bundles (DSSE) | Signer service |

2. Score Replay Operations

2.1 Triggering a Score Replay

Via CLI

# Basic replay
stella score replay --scan <scan-id>

# Replay with specific manifest
stella score replay --scan <scan-id> --manifest-hash sha256:abc123...

# Replay with frozen timestamp (for determinism testing)
stella score replay --scan <scan-id> --freeze 2025-01-15T00:00:00Z

# Output as JSON
stella score replay --scan <scan-id> --output json

Via API

# POST /api/v1/scanner/score/{scanId}/replay
curl -X POST "https://scanner.stellaops.local/api/v1/scanner/score/scan-123/replay" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "manifestHash": "sha256:abc123...",
    "freezeTimestamp": "2025-01-15T00:00:00Z"
  }'

Expected Response

{
  "scanId": "scan-123",
  "score": 7.5,
  "rootHash": "sha256:def456...",
  "bundleUri": "/api/v1/scanner/scans/scan-123/proofs/sha256:def456...",
  "manifestHash": "sha256:abc123...",
  "replayedAt": "2025-01-16T10:30:00Z",
  "deterministic": true
}
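
When scripting against this response, it helps to gate on both the `deterministic` flag and the `rootHash` rather than on the HTTP status alone. The helper below is a minimal sketch that assumes the response has the shape shown above and that `jq` is installed; the function name `replay_matches` is hypothetical.

```shell
# Succeeds only when the replay response on stdin reports
# "deterministic": true and its rootHash equals the expected value in $1.
replay_matches() {
  local expected
  expected=$1
  # jq -e sets the exit status from the result: 0 for true, 1 for false/null.
  jq -e --arg expected "$expected" \
    '.deterministic == true and .rootHash == $expected' >/dev/null
}
```

For example, piping the body returned by the replay endpoint into `replay_matches "$EXPECTED_ROOT_HASH"` lets a pipeline step fail as soon as the replayed root hash drifts from the recorded one.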

2.2 Retrieving Proof Bundles

Via CLI

# Get bundle for a scan
stella score bundle --scan <scan-id>

# Download bundle to file
stella score bundle --scan <scan-id> --output bundle.tar.gz

Via API

# GET /api/v1/scanner/score/{scanId}/bundle
curl "https://scanner.stellaops.local/api/v1/scanner/score/scan-123/bundle" \
  -H "Authorization: Bearer $TOKEN" \
  -o bundle.tar.gz

2.3 Verifying Score Integrity

Via CLI

# Verify against expected root hash
stella score verify --scan <scan-id> --root-hash sha256:def456...

# Verify downloaded bundle
stella proof verify --bundle bundle.tar.gz

Via API

# POST /api/v1/scanner/score/{scanId}/verify
curl -X POST "https://scanner.stellaops.local/api/v1/scanner/score/scan-123/verify" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"expectedRootHash": "sha256:def456..."}'

3. Determinism Verification

3.1 What Affects Determinism?

Score computation is deterministic when:

| Input | Requirement |
|-------|-------------|
| SBOM | Identical content (same hash) |
| Rules | Same rule version and configuration |
| Policy | Same policy document |
| Feeds | Same feed snapshot (freeze timestamp) |
| Ordering | Findings sorted deterministically |
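
The ordering requirement is the least obvious one, so here is a small sketch of why it matters: hashing the same findings in a different order gives a different digest unless they are sorted first. The function name `findings_digest` and the one-finding-per-line input are assumptions for illustration; the Scanner's real sort key and serialization are not described in this runbook.

```shell
# Hash a list of findings (one per line on stdin) independently of input order.
# LC_ALL=C forces byte-wise sorting so the result does not depend on the locale.
findings_digest() {
  LC_ALL=C sort | sha256sum | cut -d' ' -f1
}
```

Without the `sort` step, two replays that emit identical findings in different orders would produce different hashes even though every input was the same.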

3.2 Running Determinism Checks

# Run replay twice and compare
REPLAY1=$(stella score replay --scan "$SCAN_ID" --output json)
REPLAY2=$(stella score replay --scan "$SCAN_ID" --output json)

# Extract root hashes (quote the variables so the JSON reaches jq unmodified)
HASH1=$(echo "$REPLAY1" | jq -r '.rootHash')
HASH2=$(echo "$REPLAY2" | jq -r '.rootHash')

# Compare
if [ "$HASH1" = "$HASH2" ]; then
  echo "✓ Determinism verified: $HASH1"
else
  echo "✗ Non-deterministic! $HASH1 != $HASH2"
  exit 1
fi
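
Two runs can agree by chance when the source of non-determinism is intermittent, so it can be worth repeating the comparison more times. The sketch below generalizes the check to N runs of any command; `same_output_n_times` and `replay_root_hash` are hypothetical helper names, not part of the stella CLI.

```shell
# Run a command N times; succeed only if every run prints identical output.
# Usage: same_output_n_times <runs> <command> [args...]
# Returns 0 if all outputs match, 1 on a mismatch, 2 if the command fails.
same_output_n_times() {
  local runs first current i
  runs=$1
  shift
  first=$("$@") || return 2
  i=1
  while [ "$i" -lt "$runs" ]; do
    current=$("$@") || return 2
    [ "$current" = "$first" ] || return 1
    i=$((i + 1))
  done
  return 0
}

# Example wiring (not executed here):
#   replay_root_hash() {
#     stella score replay --scan "$1" --output json | jq -r '.rootHash'
#   }
#   same_output_n_times 5 replay_root_hash "$SCAN_ID" || echo "Non-deterministic"
```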

3.3 Common Determinism Issues

| Issue | Cause | Resolution |
|-------|-------|------------|
| Different root hash | Feed data changed between replays | Use --freeze timestamp |
| Score drift | Rule version mismatch | Pin rules version in manifest |
| Ordering differences | Non-stable sort in findings | Check Scanner version (fixed in v2.1+) |
| Timestamp in output | Current time in computation | Ensure frozen time mode |

3.4 Feed Freeze for Reproducibility

# Replay with feed state frozen to original scan time
stella score replay --scan "$SCAN_ID" \
  --freeze "$(stella scan show "$SCAN_ID" --output json | jq -r '.scannedAt')"
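
One caveat with the one-liner above: `jq -r` prints the literal string `null` when the field is absent, which would then be passed to `--freeze`. A small guard can catch that before the replay runs. The sketch assumes `--freeze` expects a UTC timestamp in the same shape as the examples in this runbook (optionally with fractional seconds); `is_utc_timestamp` is a hypothetical helper name.

```shell
# Accept only UTC timestamps shaped like 2025-01-15T00:00:00Z
# (fractional seconds allowed). Checks the shape, not calendar validity.
is_utc_timestamp() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?Z$'
}

# Example wiring (not executed here):
#   ts=$(stella scan show "$SCAN_ID" --output json | jq -r '.scannedAt')
#   is_utc_timestamp "$ts" || { echo "No usable scannedAt: $ts" >&2; exit 1; }
#   stella score replay --scan "$SCAN_ID" --freeze "$ts"
```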

4. Proof Bundle Management

4.1 Bundle Contents

A proof bundle (.tar.gz) contains:

bundle/
├── manifest.json       # Input hashes and metadata
├── score.json          # Computed score and findings summary
├── merkle-proof.json   # Merkle tree with inclusion proofs
├── dsse-envelope.json  # Signed attestation (DSSE format)
└── certificate.pem     # Signing certificate (optional)
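
Before running a full verification, a quick structural check can confirm that an archive contains the required members listed above (`certificate.pem` is optional and therefore not checked). This is a sketch that assumes the archive paths start with `bundle/` as in the tree shown; `missing_bundle_files` is a hypothetical helper name.

```shell
# Print each required member that is absent from a proof bundle archive.
# Returns 0 if nothing is missing, 1 if something is, 2 if tar cannot read it.
missing_bundle_files() {
  local bundle listing required missing
  bundle=$1
  missing=0
  listing=$(tar -tzf "$bundle") || return 2
  for required in manifest.json score.json merkle-proof.json dsse-envelope.json; do
    # Accept both "bundle/x" and "./bundle/x", depending on how tar was invoked.
    if ! printf '%s\n' "$listing" |
      grep -Fqx -e "bundle/$required" -e "./bundle/$required"; then
      echo "bundle/$required"
      missing=1
    fi
  done
  return "$missing"
}
```

This only checks that the files are present. It says nothing about signatures or the Merkle proof, which still require `stella proof verify`.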

4.2 Inspecting Bundles

# Extract and view manifest
tar -xzf bundle.tar.gz
cat bundle/manifest.json | jq .

# Verify DSSE signature
stella proof verify --bundle bundle.tar.gz --verbose

# Check Merkle proof
stella proof spine --bundle bundle.tar.gz

4.3 Bundle Retention Policy

| Environment | Retention | Notes |
|-------------|-----------|-------|
| Production | 7 years | Regulatory compliance |
| Staging | 90 days | Testing purposes |
| Development | 30 days | Cleaned up automatically |

4.4 Archiving Bundles

# Export bundle to long-term storage
stella score bundle --scan $SCAN_ID --output /archive/proofs/$SCAN_ID.tar.gz

# Bulk export for compliance audit
stella score bundle-export \
  --since 2024-01-01 \
  --until 2024-12-31 \
  --output /archive/2024-proofs/
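
For bundles kept for years, it can be useful to record checksums at archive time so that later storage corruption is distinguishable from a genuine verification failure. The sketch below uses GNU `sha256sum`; the `SHA256SUMS` file name and both function names are assumptions, not part of the stella tooling.

```shell
# Record SHA-256 checksums for every archived bundle in a directory.
write_bundle_checksums() {
  local dir
  dir=$1
  (cd "$dir" && sha256sum -- *.tar.gz > SHA256SUMS)
}

# Re-check archived bundles against the recorded checksums.
# Prints only failures; returns non-zero if any file no longer matches.
verify_bundle_checksums() {
  local dir
  dir=$1
  (cd "$dir" && sha256sum --check --quiet SHA256SUMS)
}
```

Running `verify_bundle_checksums /archive/2024-proofs` periodically, or before handing bundles to an auditor, gives an early signal that an archive copy has changed since export.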

5. Troubleshooting

5.1 Replay Returns Different Score

Symptoms: Replayed score differs from original scan score.

Diagnostic Steps:

  1. Check manifest integrity:

    stella scan show $SCAN_ID --output json | jq '.manifest'
    
  2. Verify feed state:

    # Replay with the original feed state and compare the resulting manifest hash with the original
    stella score replay --scan "$SCAN_ID" --freeze "$ORIGINAL_TIME" --output json | jq '.manifestHash'
    
  3. Check for rule updates:

    stella rules show --version --output json
    

Resolution:

  • Use --freeze timestamp matching original scan
  • Pin rule versions in policy
  • Regenerate manifest if inputs changed legitimately

5.2 Proof Verification Fails

Symptoms: stella proof verify returns validation errors.

Diagnostic Steps:

  1. Check DSSE signature:

    stella proof verify --bundle bundle.tar.gz --verbose 2>&1 | grep -i signature
    
  2. Verify certificate validity:

    openssl x509 -in bundle/certificate.pem -noout -dates
    
  3. Check Merkle proof:

    stella proof spine --bundle bundle.tar.gz --verify
    

Common Errors:

| Error | Cause | Fix |
|-------|-------|-----|
| SIGNATURE_INVALID | Bundle tampered or wrong key | Re-download bundle |
| CERTIFICATE_EXPIRED | Signing cert expired | Check signing key rotation |
| MERKLE_MISMATCH | Root hash doesn't match | Verify correct bundle version |
| MANIFEST_MISSING | Incomplete bundle | Re-export from API |
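
For scripted triage, the table above can be encoded as a lookup. The sketch assumes the error codes appear verbatim somewhere in the verifier's output, which this runbook does not confirm; `first_error_code` and `suggest_fix` are hypothetical helper names.

```shell
# Print the first known error code found in the text on stdin, if any.
first_error_code() {
  grep -oE 'SIGNATURE_INVALID|CERTIFICATE_EXPIRED|MERKLE_MISMATCH|MANIFEST_MISSING' |
    head -n 1
}

# Map an error code to the fix listed in the Common Errors table.
suggest_fix() {
  case $1 in
    SIGNATURE_INVALID)   echo "Re-download bundle" ;;
    CERTIFICATE_EXPIRED) echo "Check signing key rotation" ;;
    MERKLE_MISMATCH)     echo "Verify correct bundle version" ;;
    MANIFEST_MISSING)    echo "Re-export from API" ;;
    *)                   echo "Unknown error code: $1" >&2; return 1 ;;
  esac
}

# Example wiring (not executed here):
#   code=$(stella proof verify --bundle bundle.tar.gz --verbose 2>&1 | first_error_code)
#   suggest_fix "$code"
```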

5.3 Replay Timeout

Symptoms: Replay request times out or takes too long.

Diagnostic Steps:

  1. Check scan size:

    stella scan show $SCAN_ID --output json | jq '.findingsCount'
    
  2. Monitor replay progress:

    stella score replay --scan $SCAN_ID --verbose
    

Resolution:

  • For large scans (>10k findings), increase timeout
  • Check Scanner Worker health
  • Consider async replay for very large scans
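
If the timeout is enforced on the client side, one option is to wrap the replay in the coreutils `timeout` command and scale the limit with the findings count. The sketch below uses the 10k-findings boundary from the resolution above; the 120 and 900 second values are placeholders rather than documented defaults, and `replay_time_limit` is a hypothetical helper name.

```shell
# Pick a client-side time limit in seconds from the findings count.
# 120 and 900 are placeholder values, not documented defaults.
replay_time_limit() {
  local findings
  findings=$1
  if [ "$findings" -gt 10000 ]; then
    echo 900
  else
    echo 120
  fi
}

# Example wiring (not executed here). GNU timeout exits with status 124
# when the limit is reached:
#   count=$(stella scan show "$SCAN_ID" --output json | jq '.findingsCount')
#   timeout "$(replay_time_limit "$count")" stella score replay --scan "$SCAN_ID"
```

Note that this bounds only the CLI process; it does not change any server-side timeout on the Scanner Worker.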

5.4 Missing Manifest

Symptoms: Replay fails with a "Manifest not found" error.

Diagnostic Steps:

  1. Verify scan exists:

    stella scan show $SCAN_ID
    
  2. Check manifest table:

    SELECT * FROM scanner.manifest WHERE scan_id = 'scan-123';
    

Resolution:

  • Manifest may have been purged (check retention policy)
  • Restore from backup if available
  • Re-run scan if original inputs available

6. Monitoring & Alerting

6.1 Key Metrics

| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| score_replay_duration_ms | Time to complete replay | p99 > 30s |
| score_replay_determinism_failures | Non-deterministic replays | > 0 |
| proof_verification_failures | Failed verifications | > 5/hour |
| manifest_storage_size_bytes | Manifest table size | > 100GB |

6.2 Grafana Dashboard Queries

# Replay latency
histogram_quantile(0.99, 
  rate(score_replay_duration_ms_bucket[5m])
)

# Determinism failure rate
rate(score_replay_determinism_failures_total[1h])

# Proof verification success rate
sum(rate(proof_verification_success_total[1h])) /
sum(rate(proof_verification_total[1h]))

6.3 Alert Rules

groups:
  - name: score-replay
    rules:
      - alert: ScoreReplayLatencyHigh
        expr: histogram_quantile(0.99, rate(score_replay_duration_ms_bucket[5m])) > 30000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Score replay latency exceeds 30s at p99

      - alert: DeterminismFailure
        expr: increase(score_replay_determinism_failures_total[1h]) > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Non-deterministic score replay detected

7. Escalation Procedures

7.1 Escalation Matrix

| Severity | Condition | Response Time | Escalate To |
|----------|-----------|---------------|-------------|
| P1 - Critical | Determinism failure in production | 15 minutes | Platform Team Lead |
| P2 - High | Proof verification failures > 10/hour | 1 hour | Scanner Team |
| P3 - Medium | Replay latency degradation | 4 hours | Scanner Team |
| P4 - Low | Single replay failure | Next business day | Support Queue |

7.2 P1: Determinism Failure Response

  1. Immediate Actions (0-15 min):

    • Capture affected scan IDs
    • Preserve original manifest data
    • Check for recent deployments

  2. Investigation (15-60 min):

    • Compare input hashes between replays
    • Check feed synchronization status
    • Review rule engine logs

  3. Remediation:

    • Roll back if deployment-related
    • Freeze feeds if data drift
    • Hotfix if code bug identified

7.3 Contacts

| Role | Contact | Availability |
|------|---------|--------------|
| Scanner Team Lead | scanner-lead@stellaops.io | Business hours |
| Platform On-Call | platform-oncall@stellaops.io | 24/7 |
| Security Team | security@stellaops.io | Business hours |

Appendix A: SQL Queries

Check Manifest History

SELECT 
  scan_id,
  manifest_hash,
  sbom_hash,
  rules_hash,
  policy_hash,
  feed_hash,
  created_at
FROM scanner.manifest
WHERE scan_id = 'scan-123'
ORDER BY created_at DESC;

Find Non-Deterministic Replays

SELECT 
  scan_id,
  COUNT(DISTINCT root_hash) as unique_hashes,
  MIN(replayed_at) as first_replay,
  MAX(replayed_at) as last_replay
FROM scanner.replay_log
GROUP BY scan_id
HAVING COUNT(DISTINCT root_hash) > 1;
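
When database access is not available, for example when working from an exported replay log, the same check can be done with standard text tools. The sketch assumes the export has one `scan_id,root_hash` pair per line with no header row and no commas inside the values; `nondeterministic_scans` is a hypothetical helper name.

```shell
# Read "scan_id,root_hash" lines on stdin and print each scan_id that has
# more than one distinct root hash, mirroring the HAVING clause above.
nondeterministic_scans() {
  # sort -u keeps one line per distinct (scan_id, root_hash) pair, so a
  # scan_id that still appears twice afterwards has at least two root hashes.
  LC_ALL=C sort -u | cut -d, -f1 | uniq -d
}
```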

Proof Bundle Statistics

SELECT 
  DATE_TRUNC('day', created_at) as day,
  COUNT(*) as bundles_created,
  AVG(bundle_size_bytes) as avg_size,
  SUM(bundle_size_bytes) as total_size
FROM scanner.proof_bundle
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY DATE_TRUNC('day', created_at)
ORDER BY day DESC;

Appendix B: CLI Quick Reference

# Score Replay Commands
stella score replay --scan <id>              # Replay score computation
stella score replay --scan <id> --freeze <ts> # Replay with frozen time
stella score bundle --scan <id>              # Get proof bundle
stella score verify --scan <id> --root-hash <hash>  # Verify score

# Proof Commands
stella proof verify --bundle <path>          # Verify bundle file
stella proof verify --bundle <path> --offline # Offline verification
stella proof spine --bundle <path>           # Show Merkle spine

# Output Formats
--output json                                # JSON output
--output table                               # Table output (default)
--output yaml                                # YAML output

Revision History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-12-20 | Agent | Initial release |