# Score Proofs Operations Runbook

> **Version**: 1.0.0
> **Sprint**: 3500.0004.0004
> **Last Updated**: 2025-12-20

This runbook covers operational procedures for Score Proofs, including score replay, proof verification, and troubleshooting.

---

## Table of Contents

1. [Overview](#1-overview)
2. [Score Replay Operations](#2-score-replay-operations)
3. [Proof Verification Operations](#3-proof-verification-operations)
4. [Proof Bundle Management](#4-proof-bundle-management)
5. [Troubleshooting](#5-troubleshooting)
6. [Monitoring & Alerting](#6-monitoring--alerting)
7. [Escalation Procedures](#7-escalation-procedures)

---

## 1. Overview

### What are Score Proofs?

Score Proofs provide cryptographically verifiable audit trails for vulnerability scoring decisions. Each proof:

- **Records inputs**: SBOM, feed snapshots, VEX data, policy hashes
- **Traces computation**: Every scoring rule application
- **Signs results**: DSSE envelopes with configurable trust anchors
- **Enables replay**: Same inputs → same outputs (deterministic)

### Key Components

| Component | Purpose | Location |
|-----------|---------|----------|
| Scan Manifest | Records all inputs deterministically | `scanner.scan_manifest` table |
| Proof Ledger | DAG of scoring computation nodes | `scanner.proof_bundle` table |
| DSSE Envelope | Cryptographic signature wrapper | In proof bundle JSON |
| Proof Bundle | ZIP archive for offline verification | Stored in object storage |

### Prerequisites

- Access to the Scanner WebService API
- `scanner.proofs` OAuth scope
- CLI access with `stella` configured
- Trust anchor public keys (for verification)

---

## 2. Score Replay Operations

### 2.1 When to Replay Scores

Score replay is needed when:

- **Feed updates**: New advisories arrive from Concelier
- **VEX updates**: New VEX statements arrive from Excititor
- **Policy changes**: Scoring policy rules are updated
- **Audit requests**: Historical scores must be verified
- **Investigation**: You need to analyze why a score changed

### 2.2 Manual Score Replay (API)

```bash
# Get the current scan manifest
curl -s "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/manifest" \
  -H "Authorization: Bearer $TOKEN" | jq '.manifest'

# Replay with current feeds (uses the latest snapshots)
curl -s -X POST "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/score/replay" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}' | jq '.scoreProof.rootHash'

# Replay with a specific feed snapshot
curl -X POST "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/score/replay" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "overrides": {
      "concelierSnapshotHash": "sha256:specific-feed-snapshot..."
    }
  }'
```

### 2.3 Manual Score Replay (CLI)

```bash
# Replay with current feeds
stella score replay --scan-id $SCAN_ID

# Replay with a specific snapshot
stella score replay --scan-id $SCAN_ID \
  --feed-snapshot sha256:specific-feed-snapshot...

# Replay and compare with the original
stella score replay --scan-id $SCAN_ID --diff

# Replay in offline mode (air-gap)
stella score replay --scan-id $SCAN_ID \
  --offline \
  --bundle /path/to/offline-bundle.zip
```

### 2.4 Batch Score Replay

For bulk replay (e.g., after a major feed update):

```bash
# List all scans from the last 7 days
stella scan list --since 7d --format json > scans.json

# Replay each scan sequentially
jq -r '.[].scanId' scans.json | while read -r SCAN_ID; do
  echo "Replaying $SCAN_ID..."
  stella score replay --scan-id "$SCAN_ID" --quiet
done

# Or use the batch API endpoint (more efficient)
curl -X POST "https://scanner.example.com/api/v1/scanner/batch/replay" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "scanIds": ["scan-1", "scan-2", "scan-3"],
    "parallel": true,
    "maxConcurrency": 10
  }'
```

### 2.5 Nightly Replay Job

The Scheduler replays scores automatically through the `nightly-score-replay` job, which is configured with both a daily schedule and a trigger that fires when Concelier publishes a new snapshot:

```yaml
# Job configuration in Scheduler
job:
  name: nightly-score-replay
  schedule: "0 3 * * *"  # 3 AM daily
  trigger:
    type: concelier-snapshot-published
  action:
    type: batch-replay
    config:
      maxAge: 30d
      parallel: true
      maxConcurrency: 20
```

**Monitoring the nightly job**:

```bash
# Check job status
stella scheduler job status nightly-score-replay

# View recent runs
stella scheduler job runs nightly-score-replay --last 7

# Check for failures
stella scheduler job runs nightly-score-replay --status failed
```

---

## 3. Proof Verification Operations

### 3.1 Online Verification

```bash
# Verify via API
curl -X POST "https://scanner.example.com/api/v1/proofs/verify" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "proofBundleId": "sha256:proof123...",
    "checkRekor": true,
    "anchorIds": ["anchor-001"]
  }'

# Verify via CLI
stella proof verify --bundle-id sha256:proof123... --check-rekor
```

### 3.2 Offline Verification (Air-Gap)

For air-gapped environments:

```bash
# 1. Download the proof bundle (on a connected system)
curl -o proof-bundle.zip \
  -H "Authorization: Bearer $TOKEN" \
  "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/proofs/sha256:proof123..."

# 2. Transfer the bundle to the air-gapped system (USB, etc.)

# 3. Verify offline (on the air-gapped system)
stella proof verify --bundle proof-bundle.zip \
  --offline \
  --trust-anchor /path/to/trust-anchor.pem

# 4. Alternatively, verify with an explicit public key
stella proof verify --bundle proof-bundle.zip \
  --offline \
  --public-key /path/to/public-key.pem \
  --skip-rekor  # No network access
```

### 3.3 Verification Checks

| Check | Description | Can Skip? |
|-------|-------------|-----------|
| Signature Valid | DSSE signature matches payload | No |
| ID Recomputed | Content-addressed ID matches | No |
| Merkle Path Valid | Merkle tree construction is correct | No |
| Rekor Inclusion | Transparency log entry exists | Yes (offline) |
| Timestamp Valid | Proof created within the valid window | Configurable |

### 3.4 Failed Verification Troubleshooting

```bash
# Get a detailed verification report
stella proof verify --bundle-id sha256:proof123... --verbose

# Check specific failures
stella proof verify --bundle-id sha256:proof123... --check signatureValid
stella proof verify --bundle-id sha256:proof123... --check idRecomputed
stella proof verify --bundle-id sha256:proof123... --check merklePathValid

# Dump proof bundle contents for inspection
stella proof inspect --bundle proof-bundle.zip --output-dir ./inspection/
```

---

## 4. Proof Bundle Management

### 4.1 Download Proof Bundles

```bash
# Download a single bundle
stella proof download --scan-id $SCAN_ID --output proof.zip

# Download the bundle with a specific root hash
stella proof download --scan-id $SCAN_ID \
  --root-hash sha256:proof123... \
  --output proof.zip

# Download all bundles for a scan
stella proof download --scan-id $SCAN_ID --all --output-dir ./proofs/
```

### 4.2 Bundle Contents

```bash
# List bundle contents
unzip -l proof-bundle.zip

# Expected contents:
#   manifest.json         - Scan manifest (canonical JSON)
#   manifest.dsse.json    - DSSE signature of the manifest
#   score_proof.json      - Proof ledger (ProofNode array under "nodes")
#   proof_root.dsse.json  - DSSE signature of the proof root
#   meta.json             - Metadata (timestamps, versions)

# Extract and inspect
unzip proof-bundle.zip -d ./proof-contents/
jq . ./proof-contents/manifest.json
jq '.nodes | length' ./proof-contents/score_proof.json
```

### 4.3 Proof Retention

Proof bundles are retained according to a tiered policy:

| Tier | Retention | Description |
|------|-----------|-------------|
| Hot | 30 days | Recent proofs, fast access |
| Warm | 1 year | Archived proofs, slower access |
| Cold | 7 years | Compliance archive, retrieval required |

**Check retention status**:

```bash
stella proof status --scan-id $SCAN_ID
# Output: tier=hot, expires=2025-01-17, retrievable=true
```

**Retrieve from cold storage**:

```bash
# Request retrieval (asynchronous; may take hours)
stella proof retrieve --scan-id $SCAN_ID --root-hash sha256:proof123...

# Check retrieval status
stella proof retrieve-status --request-id req-001
```

### 4.4 Export for Audit

```bash
# Export a proof bundle with the full chain
stella proof export --scan-id $SCAN_ID \
  --include-chain \
  --include-anchors \
  --output audit-bundle.zip

# Export multiple scans for an audit period
stella proof export-batch \
  --from 2025-01-01 \
  --to 2025-01-31 \
  --output-dir ./audit-jan-2025/
```

---

## 5. Troubleshooting

### 5.1 Score Mismatch After Replay

**Symptom**: The replayed score differs from the original.

**Diagnosis**:

```bash
# Compare manifests
stella score diff --scan-id $SCAN_ID --original --replayed

# Check for feed changes
stella score manifest --scan-id $SCAN_ID | jq '.concelierSnapshotHash'

# Compare input hashes
stella score inputs --scan-id $SCAN_ID --hash
```

**Common causes**:

1. **Feed snapshot changed**: The original run used different advisory data
2. **Policy updated**: Scoring rules changed between runs
3. **VEX statements added**: New VEX data affects scores
4. **Non-deterministic seed**: Check that the manifest sets `deterministic: true`

**Resolution**:

```bash
# Replay with the exact original snapshots
stella score replay --scan-id $SCAN_ID --use-original-snapshots
```

### 5.2 Proof Verification Failed

**Symptom**: Verification returns `verified: false`.
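When verification runs inside automation (a CI gate or a post-replay check, for example), it can help to detect this symptom from the JSON response rather than by reading CLI output. The sketch below is illustrative only: it assumes the response of `POST /api/v1/proofs/verify` carries a top-level boolean `verified` plus a hypothetical `checks` object mapping check names (here the `--check` values `signatureValid`, `idRecomputed`, `merklePathValid`) to booleans. The `checks` field and the `failed_checks` helper are assumptions, not the documented API; confirm the real response schema in the Score Proofs API Reference first.

```python
import json
from typing import List

# Checks that section 3.3 lists as non-skippable; the names follow the
# `--check` values used in section 3.4. The response field names are assumed.
MANDATORY_CHECKS = ("signatureValid", "idRecomputed", "merklePathValid")


def failed_checks(response_body: str) -> List[str]:
    """Return the sorted names of failed checks in a verify response.

    Assumes a hypothetical response shape such as:
        {"verified": false, "checks": {"signatureValid": true, ...}}
    A mandatory check that is absent from "checks" is reported as failed.
    """
    result = json.loads(response_body)
    if result.get("verified") is True:
        return []
    checks = result.get("checks", {})
    # Any check whose value is not literally true counts as a failure.
    failed = [name for name, ok in checks.items() if ok is not True]
    for name in MANDATORY_CHECKS:
        if name not in checks:
            failed.append(name)
    return sorted(failed)


if __name__ == "__main__":
    sample = (
        '{"verified": false, "checks": {"signatureValid": false, '
        '"idRecomputed": true, "merklePathValid": true}}'
    )
    print(failed_checks(sample))  # prints ['signatureValid']
```

A non-empty result narrows the diagnosis: for example, `signatureValid` in the list points at a trust anchor or key rotation problem rather than a corrupted bundle.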
**Diagnosis**:

```bash
# Get the detailed error
stella proof verify --bundle-id sha256:proof123... --verbose 2>&1 | head -50

# Common errors:
# - "Signature verification failed": Key mismatch or tampering
# - "ID recomputation failed": Canonical JSON issue
# - "Merkle path invalid": Proof chain corrupted
# - "Rekor entry not found": Not logged to transparency log
```

**Resolution by error type**:

| Error | Cause | Resolution |
|-------|-------|------------|
| Signature failed | Key rotated | Use the correct trust anchor |
| ID mismatch | Content modified | Regenerate the proof |
| Merkle invalid | Partial upload | Re-download the bundle |
| Rekor missing | Log lag or logging skipped | Wait, or verify offline |

### 5.3 Missing Proof Bundle

**Symptom**: Proof bundle not found.

**Diagnosis**:

```bash
# Check if the scan exists
stella scan status --scan-id $SCAN_ID

# Check proof generation status
stella proof status --scan-id $SCAN_ID

# Check if a proof was generated
stella proof list --scan-id $SCAN_ID
```

**Common causes**:

1. **Scan still in progress**: Proofs are generated only after the scan completes
2. **Proof generation failed**: Check the worker logs
3. **Archived to cold storage**: Retrieval is required
4. **Retention expired**: Proof deleted per retention policy

### 5.4 Replay Performance Issues

**Symptom**: Replay takes too long.

**Diagnosis**:

```bash
# Check replay queue depth
stella scheduler queue status replay

# Check worker health
stella scanner workers status

# Check for resource constraints
kubectl top pods -l app=scanner-worker
```

**Optimization**:

```bash
# Reduce parallelism during peak hours
stella scheduler job update nightly-score-replay \
  --config.maxConcurrency=5

# Skip unchanged scans
stella score replay --scan-id $SCAN_ID --skip-unchanged
```

---

## 6. Monitoring & Alerting

### 6.1 Key Metrics

| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| `score_replay_duration_seconds` | Time to replay a score | p95 > 30s |
| `proof_verification_success_rate` | % of successful verifications | < 99% |
| `proof_bundle_size_bytes` | Size of proof bundles | > 100MB |
| `replay_queue_depth` | Pending replay jobs | > 1000 |
| `proof_generation_failures` | Failed proof generations | > 0/hour |

### 6.2 Grafana Dashboard

```
Dashboard: Score Proofs Operations
Panels:
  - Replay throughput (replays/minute)
  - Replay latency (p50, p95, p99)
  - Verification success rate
  - Proof bundle storage usage
  - Queue depth over time
```

### 6.3 Alerting Rules

```yaml
# Prometheus alerting rules
groups:
  - name: score-proofs
    rules:
      - alert: ReplayLatencyHigh
        # histogram_quantile() needs per-bucket rates aggregated by `le`
        expr: histogram_quantile(0.95, sum by (le) (rate(score_replay_duration_seconds_bucket[5m]))) > 30
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Score replay latency is high"

      - alert: ProofVerificationFailures
        expr: increase(proof_verification_failures_total[1h]) > 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Multiple proof verification failures detected"

      - alert: ReplayQueueBacklog
        expr: replay_queue_depth > 1000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Score replay queue backlog is growing"
```

---

## 7. Escalation Procedures

### 7.1 Escalation Matrix

| Severity | Condition | Response Time | Escalation Path |
|----------|-----------|---------------|-----------------|
| P1 | Proof verification failing for all scans | 15 min | On-call → Team Lead → VP Eng |
| P2 | Replay failures > 10% | 1 hour | On-call → Team Lead |
| P3 | Replay latency > 60s p95 | 4 hours | On-call |
| P4 | Queue backlog > 5000 | 24 hours | Ticket |

### 7.2 P1 Response Procedure

1. **Acknowledge** the alert in PagerDuty
2. **Triage**:

   ```bash
   # Check service health
   stella health check --service scanner
   stella health check --service attestor

   # Check recent changes
   kubectl rollout history deployment/scanner-worker
   ```

3. **Mitigate**:

   ```bash
   # If a recent deployment caused the issue, roll back
   kubectl rollout undo deployment/scanner-worker

   # If a key rotation caused the issue, restore the previous anchor
   stella anchor restore --anchor-id anchor-001 --revision previous
   ```

4. **Communicate**: Update the status page and notify stakeholders
5. **Resolve**: Fix the root cause and verify the fix
6. **Postmortem**: Document the incident within 48 hours

### 7.3 Contact Information

| Role | Contact | Availability |
|------|---------|--------------|
| On-Call Engineer | PagerDuty `scanner-oncall` | 24/7 |
| Scanner Team Lead | @scanner-lead | Business hours |
| Security Team | security@stellaops.local | Business hours |
| VP Engineering | @vp-eng | Escalation only |

---

## Related Documentation

- [Score Proofs API Reference](../api/score-proofs-reachability-api-reference.md)
- [Proof Chain Architecture](../modules/attestor/architecture.md)
- [CLI Reference](./cli-reference.md)
- [Air-Gap Operations](../airgap/operations.md)

---

**Last Updated**: 2025-12-20
**Version**: 1.0.0
**Sprint**: 3500.0004.0004