Score Proofs Operations Runbook
Version: 1.0.0
Sprint: 3500.0004.0004
Last Updated: 2025-12-20
This runbook covers operational procedures for Score Proofs, including score replay, proof verification, and troubleshooting.
Table of Contents
- Overview
- Score Replay Operations
- Proof Verification Operations
- Proof Bundle Management
- Troubleshooting
- Monitoring & Alerting
- Escalation Procedures
1. Overview
What are Score Proofs?
Score Proofs provide cryptographically verifiable audit trails for vulnerability scoring decisions. Each proof:
- Records inputs: SBOM, feed snapshots, VEX data, policy hashes
- Traces computation: Every scoring rule application
- Signs results: DSSE envelopes with configurable trust anchors
- Enables replay: Same inputs → same outputs (deterministic)
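The "same inputs → same outputs" property depends on hashing inputs in a canonical form, so that key order or whitespace cannot change an ID. The following is a minimal sketch of that idea, assuming canonical JSON means sorted keys, compact separators, and UTF-8; the runbook does not state the exact canonicalization scheme, and the `policyHash` field name is hypothetical.

```python
import hashlib
import json


def canonical_digest(manifest: dict) -> str:
    """Return a sha256 digest over a canonical JSON encoding of `manifest`.

    Assumption: canonical form = sorted keys, no insignificant whitespace,
    UTF-8 bytes. The scheme actually used by Score Proofs may differ.
    """
    encoded = json.dumps(
        manifest, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return "sha256:" + hashlib.sha256(encoded).hexdigest()


# Same content, different key order (field values are placeholders).
a = {"concelierSnapshotHash": "sha256:aaa", "policyHash": "sha256:bbb"}
b = {"policyHash": "sha256:bbb", "concelierSnapshotHash": "sha256:aaa"}

print(canonical_digest(a) == canonical_digest(b))  # True
```

Any change to an input value produces a different digest, which is what lets a replay detect that it ran against different data.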
Key Components
| Component | Purpose | Location |
|---|---|---|
| Scan Manifest | Records all inputs deterministically | scanner.scan_manifest table |
| Proof Ledger | DAG of scoring computation nodes | scanner.proof_bundle table |
| DSSE Envelope | Cryptographic signature wrapper | In proof bundle JSON |
| Proof Bundle | ZIP archive for offline verification | Stored in object storage |
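For orientation, a DSSE signature is computed over a pre-authentication encoding (PAE) of the payload type and payload, not over the raw payload. The sketch below follows the PAE layout from the public DSSE specification; how Score Proofs populates `payloadType` is not stated in this runbook, so the value used in the example is only the one from the DSSE specification.

```python
import base64


def dsse_pae(payload_type: str, payload: bytes) -> bytes:
    """Build the DSSE pre-authentication encoding:
    "DSSEv1" SP LEN(type) SP type SP LEN(payload) SP payload
    where LEN is the ASCII decimal byte length.
    """
    type_bytes = payload_type.encode("utf-8")
    return b" ".join([
        b"DSSEv1",
        str(len(type_bytes)).encode("ascii"),
        type_bytes,
        str(len(payload)).encode("ascii"),
        payload,
    ])


def signed_bytes(envelope: dict) -> bytes:
    """Return the bytes a verifier must check the signature against,
    given a DSSE envelope with base64 `payload` and `payloadType`."""
    payload = base64.b64decode(envelope["payload"])
    return dsse_pae(envelope["payloadType"], payload)


# Example taken from the DSSE specification.
print(dsse_pae("http://example.com/HelloWorld", b"hello world"))
```

Signature verification itself (key lookup, algorithm) depends on the configured trust anchor and is left out here.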
Prerequisites
- Access to Scanner WebService API
scanner.proofsOAuth scope- CLI access with
stellaconfigured - Trust anchor public keys (for verification)
2. Score Replay Operations
2.1 When to Replay Scores
Score replay is needed when:
- Feed updates: New advisories from Concelier
- VEX updates: New VEX statements from Excititor
- Policy changes: Updated scoring policy rules
- Audit requests: Need to verify historical scores
- Investigation: Analyze why a score changed
2.2 Manual Score Replay (API)
# Get current scan manifest
curl -s "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/manifest" \
-H "Authorization: Bearer $TOKEN" | jq '.manifest'
# Replay with current feeds (uses latest snapshots)
curl -X POST "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/score/replay" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{}' | jq '.scoreProof.rootHash'
# Replay with specific feed snapshot
curl -X POST "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/score/replay" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"overrides": {
"concelierSnapshotHash": "sha256:specific-feed-snapshot..."
}
}'
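The same replay call can be scripted. The following sketch builds the request shown above without sending it; the endpoint path and the `overrides.concelierSnapshotHash` field come from the curl examples, while the helper name `build_replay_request` is hypothetical.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://scanner.example.com/api/v1/scanner"


def build_replay_request(
    scan_id: str, token: str, feed_snapshot: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST to the score replay endpoint.

    With no `feed_snapshot`, the body is `{}` and the service uses the
    latest snapshots; otherwise the snapshot hash is passed as an override.
    """
    body: dict = {}
    if feed_snapshot is not None:
        body["overrides"] = {"concelierSnapshotHash": feed_snapshot}
    return urllib.request.Request(
        url=f"{BASE_URL}/scans/{scan_id}/score/replay",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_replay_request("scan-1", "example-token", "sha256:abc")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the response shape beyond `scoreProof.rootHash` is not documented here.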
2.3 Manual Score Replay (CLI)
# Replay with current feeds
stella score replay --scan-id $SCAN_ID
# Replay with specific snapshot
stella score replay --scan-id $SCAN_ID \
--feed-snapshot sha256:specific-feed-snapshot...
# Replay and compare with original
stella score replay --scan-id $SCAN_ID --diff
# Replay in offline mode (air-gap)
stella score replay --scan-id $SCAN_ID \
--offline \
--bundle /path/to/offline-bundle.zip
2.4 Batch Score Replay
For bulk replay (e.g., after a major feed update):
# List all scans from last 7 days
stella scan list --since 7d --format json > scans.json
# Replay each scan
jq -r '.[].scanId' scans.json | while read -r SCAN_ID; do
  echo "Replaying $SCAN_ID..."
  stella score replay --scan-id "$SCAN_ID" --quiet
done
# Or use the batch API endpoint (more efficient)
curl -X POST "https://scanner.example.com/api/v1/scanner/batch/replay" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"scanIds": ["scan-1", "scan-2", "scan-3"],
"parallel": true,
"maxConcurrency": 10
}'
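When the list of scans is long, it may help to split it into several batch requests. This is a sketch only: the payload fields mirror the batch request above, but the chunk size is an assumption, since the runbook does not state a per-request limit for the batch endpoint.

```python
from typing import Dict, Iterator, List


def batch_payloads(
    scan_ids: List[str], chunk_size: int = 100, max_concurrency: int = 10
) -> Iterator[Dict]:
    """Yield one batch-replay request body per chunk of scan IDs.

    `chunk_size` is a hypothetical client-side limit, not a documented one.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(scan_ids), chunk_size):
        yield {
            "scanIds": scan_ids[start:start + chunk_size],
            "parallel": True,
            "maxConcurrency": max_concurrency,
        }


ids = [f"scan-{i}" for i in range(5)]
payloads = list(batch_payloads(ids, chunk_size=2))
print(len(payloads))  # 3
```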
2.5 Nightly Replay Job
The Scheduler automatically replays scores when Concelier publishes new snapshots:
# Job configuration in Scheduler
job:
  name: nightly-score-replay
  schedule: "0 3 * * *"  # 3 AM daily
  trigger:
    type: concelier-snapshot-published
  action:
    type: batch-replay
    config:
      maxAge: 30d
      parallel: true
      maxConcurrency: 20
Monitoring the nightly job:
# Check job status
stella scheduler job status nightly-score-replay
# View recent runs
stella scheduler job runs nightly-score-replay --last 7
# Check for failures
stella scheduler job runs nightly-score-replay --status failed
3. Proof Verification Operations
3.1 Online Verification
# Verify via API
curl -X POST "https://scanner.example.com/api/v1/proofs/verify" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"proofBundleId": "sha256:proof123...",
"checkRekor": true,
"anchorIds": ["anchor-001"]
}'
# Verify via CLI
stella proof verify --bundle-id sha256:proof123... --check-rekor
3.2 Offline Verification (Air-Gap)
For air-gapped environments:
# 1. Download proof bundle (on connected system)
curl -o proof-bundle.zip \
"https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/proofs/sha256:proof123..."
# 2. Transfer to air-gapped system (USB, etc.)
# 3. Verify offline (on air-gapped system)
stella proof verify --bundle proof-bundle.zip \
--offline \
--trust-anchor /path/to/trust-anchor.pem
# 4. Verify with explicit public key
stella proof verify --bundle proof-bundle.zip \
--offline \
--public-key /path/to/public-key.pem \
--skip-rekor # No network access
3.3 Verification Checks
| Check | Description | Can Skip? |
|---|---|---|
| Signature Valid | DSSE signature matches payload | No |
| ID Recomputed | Content-addressed ID matches | No |
| Merkle Path Valid | Merkle tree construction correct | No |
| Rekor Inclusion | Transparency log entry exists | Yes (offline) |
| Timestamp Valid | Proof created within valid window | Configurable |
3.4 Failed Verification Troubleshooting
# Get detailed verification report
stella proof verify --bundle-id sha256:proof123... --verbose
# Check specific failures
stella proof verify --bundle-id sha256:proof123... --check signatureValid
stella proof verify --bundle-id sha256:proof123... --check idRecomputed
stella proof verify --bundle-id sha256:proof123... --check merklePathValid
# Dump proof bundle contents for inspection
stella proof inspect --bundle proof-bundle.zip --output-dir ./inspection/
4. Proof Bundle Management
4.1 Download Proof Bundles
# Download single bundle
stella proof download --scan-id $SCAN_ID --output proof.zip
# Download with specific root hash
stella proof download --scan-id $SCAN_ID \
--root-hash sha256:proof123... \
--output proof.zip
# Download all bundles for a scan
stella proof download --scan-id $SCAN_ID --all --output-dir ./proofs/
4.2 Bundle Contents
# List bundle contents
unzip -l proof-bundle.zip
# Expected contents:
# manifest.json - Scan manifest (canonical JSON)
# manifest.dsse.json - DSSE signature of manifest
# score_proof.json - Proof ledger (ProofNode array)
# proof_root.dsse.json - DSSE signature of proof root
# meta.json - Metadata (timestamps, versions)
# Extract and inspect
unzip proof-bundle.zip -d ./proof-contents/
jq . ./proof-contents/manifest.json
jq '.nodes | length' ./proof-contents/score_proof.json
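A quick completeness check before verification can catch partial downloads. The sketch below compares a bundle's file list against the expected contents listed above; it assumes the files sit at the archive root, which the listing suggests but does not guarantee.

```python
import io
import zipfile

# File names taken from the expected bundle contents above.
EXPECTED_FILES = frozenset({
    "manifest.json",
    "manifest.dsse.json",
    "score_proof.json",
    "proof_root.dsse.json",
    "meta.json",
})


def missing_bundle_files(bundle) -> set:
    """Return the expected files absent from `bundle` (a path or file object)."""
    with zipfile.ZipFile(bundle) as zf:
        present = set(zf.namelist())
    return set(EXPECTED_FILES - present)


# Demo with an in-memory bundle that contains only two of the five files.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("manifest.json", "{}")
    zf.writestr("meta.json", "{}")

print(sorted(missing_bundle_files(buf)))
# ['manifest.dsse.json', 'proof_root.dsse.json', 'score_proof.json']
```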
4.3 Proof Retention
Proof bundles are retained based on policy:
| Tier | Retention | Description |
|---|---|---|
| Hot | 30 days | Recent proofs, fast access |
| Warm | 1 year | Archived proofs, slower access |
| Cold | 7 years | Compliance archive, retrieval required |
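As a rough aid for estimating where a proof currently lives, the tiers can be expressed as age thresholds. This sketch assumes each retention period is measured from the proof's creation date (hot up to 30 days, warm up to 1 year, cold up to 7 years); the table does not say whether the periods are cumulative, so treat `stella proof status` as authoritative.

```python
from datetime import date


def retention_tier(created: date, today: date) -> str:
    """Map a proof's age to a storage tier.

    Assumption: thresholds are counted from creation and a year is 365 days.
    """
    age_days = (today - created).days
    if age_days < 0:
        raise ValueError("created date is in the future")
    if age_days <= 30:
        return "hot"
    if age_days <= 365:
        return "warm"
    if age_days <= 7 * 365:
        return "cold"
    return "expired"


print(retention_tier(date(2025, 1, 1), date(2025, 1, 15)))  # hot
print(retention_tier(date(2025, 1, 1), date(2025, 6, 1)))   # warm
```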
Check retention status:
stella proof status --scan-id $SCAN_ID
# Output: tier=hot, expires=2025-01-17, retrievable=true
Retrieve from cold storage:
# Request retrieval (async, may take hours)
stella proof retrieve --scan-id $SCAN_ID --root-hash sha256:proof123...
# Check retrieval status
stella proof retrieve-status --request-id req-001
4.4 Export for Audit
# Export proof bundle with full chain
stella proof export --scan-id $SCAN_ID \
--include-chain \
--include-anchors \
--output audit-bundle.zip
# Export multiple scans for audit period
stella proof export-batch \
--from 2025-01-01 \
--to 2025-01-31 \
--output-dir ./audit-jan-2025/
5. Troubleshooting
5.1 Score Mismatch After Replay
Symptom: Replayed score differs from original.
Diagnosis:
# Compare manifests
stella score diff --scan-id $SCAN_ID --original --replayed
# Check for feed changes
stella score manifest --scan-id $SCAN_ID | jq '.concelierSnapshotHash'
# Compare input hashes
stella score inputs --scan-id $SCAN_ID --hash
Common causes:
- Feed snapshot changed: Original used different advisory data
- Policy updated: Scoring rules changed between runs
- VEX statements added: New VEX data affects scores
- Non-deterministic seed: Check that deterministic: true is set in the manifest
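Most of the causes above show up as a changed input hash. A small helper like the one below can list which manifest fields differ between the original and replayed runs. It is a sketch: `concelierSnapshotHash` appears in this runbook, while `policyHash` and `vexSnapshotHash` are hypothetical field names used only for illustration.

```python
def changed_inputs(original: dict, replayed: dict) -> dict:
    """Return {field: (original_value, replayed_value)} for every top-level
    manifest field whose value differs or is present on only one side."""
    keys = sorted(set(original) | set(replayed))
    return {
        key: (original.get(key), replayed.get(key))
        for key in keys
        if original.get(key) != replayed.get(key)
    }


original = {
    "concelierSnapshotHash": "sha256:feed-v1",
    "policyHash": "sha256:policy-v1",        # hypothetical field
    "deterministic": True,
}
replayed = {
    "concelierSnapshotHash": "sha256:feed-v2",
    "policyHash": "sha256:policy-v1",
    "vexSnapshotHash": "sha256:vex-v1",      # hypothetical field
    "deterministic": True,
}

for field, (before, after) in changed_inputs(original, replayed).items():
    print(f"{field}: {before} -> {after}")
```

An empty result with a differing score would point away from input drift and toward non-determinism in the scoring run itself.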
Resolution:
# Replay with exact original snapshots
stella score replay --scan-id $SCAN_ID --use-original-snapshots
5.2 Proof Verification Failed
Symptom: Verification returns verified: false.
Diagnosis:
# Get detailed error
stella proof verify --bundle-id sha256:proof123... --verbose 2>&1 | head -50
# Common errors:
# - "Signature verification failed": Key mismatch or tampering
# - "ID recomputation failed": Canonical JSON issue
# - "Merkle path invalid": Proof chain corrupted
# - "Rekor entry not found": Not logged to transparency log
Resolution by error type:
| Error | Cause | Resolution |
|---|---|---|
| Signature failed | Key rotated | Use correct trust anchor |
| ID mismatch | Content modified | Re-generate proof |
| Merkle invalid | Partial upload | Re-download bundle |
| Rekor missing | Log lag or skip | Wait or verify offline |
5.3 Missing Proof Bundle
Symptom: Proof bundle not found.
Diagnosis:
# Check if scan exists
stella scan status --scan-id $SCAN_ID
# Check proof generation status
stella proof status --scan-id $SCAN_ID
# Check if proof was generated
stella proof list --scan-id $SCAN_ID
Common causes:
- Scan still in progress: Proof generated after completion
- Proof generation failed: Check worker logs
- Archived to cold storage: Needs retrieval
- Retention expired: Proof deleted per policy
5.4 Replay Performance Issues
Symptom: Replay taking too long.
Diagnosis:
# Check replay queue depth
stella scheduler queue status replay
# Check worker health
stella scanner workers status
# Check for resource constraints
kubectl top pods -l app=scanner-worker
Optimization:
# Reduce parallelism during peak hours
stella scheduler job update nightly-score-replay \
--config.maxConcurrency=5
# Skip unchanged scans
stella score replay --scan-id $SCAN_ID --skip-unchanged
6. Monitoring & Alerting
6.1 Key Metrics
| Metric | Description | Alert Threshold |
|---|---|---|
| score_replay_duration_seconds | Time to replay a score | > 30s |
| proof_verification_success_rate | % of successful verifications | < 99% |
| proof_bundle_size_bytes | Size of proof bundles | > 100MB |
| replay_queue_depth | Pending replay jobs | > 1000 |
| proof_generation_failures | Failed proof generations | > 0/hour |
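The thresholds in the table can be checked against a set of sampled values, for example in a smoke test of the monitoring setup. This sketch makes several assumptions that the table leaves open: the success rate is expressed as a fraction between 0 and 1, "100MB" is taken as 100 × 1024 × 1024 bytes, and `proof_generation_failures` is the count over the last hour.

```python
import operator

# (comparison, limit) per metric; a sample breaches when comparison(value, limit).
THRESHOLDS = {
    "score_replay_duration_seconds": (operator.gt, 30.0),
    "proof_verification_success_rate": (operator.lt, 0.99),
    "proof_bundle_size_bytes": (operator.gt, 100 * 1024 * 1024),
    "replay_queue_depth": (operator.gt, 1000),
    "proof_generation_failures": (operator.gt, 0),
}


def breached(samples: dict) -> list:
    """Return the sorted names of metrics whose sample crosses its threshold.
    Metrics without a known threshold are ignored."""
    result = []
    for name, value in samples.items():
        if name in THRESHOLDS:
            compare, limit = THRESHOLDS[name]
            if compare(value, limit):
                result.append(name)
    return sorted(result)


print(breached({
    "score_replay_duration_seconds": 45,
    "proof_verification_success_rate": 0.995,
    "replay_queue_depth": 10,
}))  # ['score_replay_duration_seconds']
```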
6.2 Grafana Dashboard
Dashboard: Score Proofs Operations
Panels:
- Replay throughput (replays/minute)
- Replay latency (p50, p95, p99)
- Verification success rate
- Proof bundle storage usage
- Queue depth over time
6.3 Alerting Rules
# Prometheus alerting rules
groups:
  - name: score-proofs
    rules:
      - alert: ReplayLatencyHigh
        # histogram_quantile operates on per-bucket rates; assumes the metric is exported as a histogram
        expr: histogram_quantile(0.95, sum by (le) (rate(score_replay_duration_seconds_bucket[5m]))) > 30
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Score replay latency is high"
      - alert: ProofVerificationFailures
        expr: increase(proof_verification_failures_total[1h]) > 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Multiple proof verification failures detected"
      - alert: ReplayQueueBacklog
        expr: replay_queue_depth > 1000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Score replay queue backlog is growing"
7. Escalation Procedures
7.1 Escalation Matrix
| Severity | Condition | Response Time | Escalation Path |
|---|---|---|---|
| P1 | Proof verification failing for all scans | 15 min | On-call → Team Lead → VP Eng |
| P2 | Replay failures > 10% | 1 hour | On-call → Team Lead |
| P3 | Replay latency > 60s p95 | 4 hours | On-call |
| P4 | Queue backlog > 5000 | 24 hours | Ticket |
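The matrix can be read as an ordered set of rules in which the highest matching severity wins. The sketch below encodes it that way; the function name and input parameters are hypothetical, and the replay failure rate is assumed to be a fraction between 0 and 1.

```python
from typing import Optional


def severity(
    all_verifications_failing: bool,
    replay_failure_rate: float,
    replay_p95_seconds: float,
    queue_backlog: int,
) -> Optional[str]:
    """Return the highest severity whose condition holds, or None."""
    if all_verifications_failing:
        return "P1"  # 15 min response, On-call -> Team Lead -> VP Eng
    if replay_failure_rate > 0.10:
        return "P2"  # 1 hour response, On-call -> Team Lead
    if replay_p95_seconds > 60:
        return "P3"  # 4 hour response, On-call
    if queue_backlog > 5000:
        return "P4"  # 24 hour response, ticket
    return None


print(severity(False, 0.12, 10.0, 0))     # P2
print(severity(False, 0.01, 61.0, 6000))  # P3
```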
7.2 P1 Response Procedure
- Acknowledge alert in PagerDuty
- Triage:
# Check service health
stella health check --service scanner
stella health check --service attestor
# Check recent changes
kubectl rollout history deployment/scanner-worker
- Mitigate:
# If recent deployment, rollback
kubectl rollout undo deployment/scanner-worker
# If key rotation issue, restore previous anchor
stella anchor restore --anchor-id anchor-001 --revision previous
- Communicate: Update status page, notify stakeholders
- Resolve: Fix root cause, verify fix
- Postmortem: Document incident within 48 hours
7.3 Contact Information
| Role | Contact | Availability |
|---|---|---|
| On-Call Engineer | PagerDuty scanner-oncall | 24/7 |
| Scanner Team Lead | @scanner-lead | Business hours |
| Security Team | security@stellaops.local | Business hours |
| VP Engineering | @vp-eng | Escalation only |
Related Documentation