Add integration tests for Proof Chain and Reachability workflows

- Implement ProofChainTestFixture for PostgreSQL-backed integration tests.
- Create StellaOps.Integration.ProofChain project with necessary dependencies.
- Add ReachabilityIntegrationTests to validate call graph extraction and reachability analysis.
- Introduce ReachabilityTestFixture for managing corpus and fixture paths.
- Establish StellaOps.Integration.Reachability project with required references.
- Develop UnknownsWorkflowTests to cover the unknowns lifecycle: detection, ranking, escalation, and resolution.
- Create StellaOps.Integration.Unknowns project with dependencies for unknowns workflow.
docs/operations/score-proofs-runbook.md
@@ -0,0 +1,544 @@
# Score Proofs Operations Runbook

> **Version**: 1.0.0
> **Sprint**: 3500.0004.0004
> **Last Updated**: 2025-12-20

This runbook covers operational procedures for Score Proofs, including score replay, proof verification, and troubleshooting.

---

## Table of Contents

1. [Overview](#1-overview)
2. [Score Replay Operations](#2-score-replay-operations)
3. [Proof Verification Operations](#3-proof-verification-operations)
4. [Proof Bundle Management](#4-proof-bundle-management)
5. [Troubleshooting](#5-troubleshooting)
6. [Monitoring & Alerting](#6-monitoring--alerting)
7. [Escalation Procedures](#7-escalation-procedures)

---

## 1. Overview

### What are Score Proofs?

Score Proofs provide cryptographically verifiable audit trails for vulnerability scoring decisions. Each proof:

- **Records inputs**: SBOM, feed snapshots, VEX data, policy hashes
- **Traces computation**: Every scoring rule application
- **Signs results**: DSSE envelopes with configurable trust anchors
- **Enables replay**: Same inputs → same outputs (deterministic)

### Key Components

| Component | Purpose | Location |
|-----------|---------|----------|
| Scan Manifest | Records all inputs deterministically | `scanner.scan_manifest` table |
| Proof Ledger | DAG of scoring computation nodes | `scanner.proof_bundle` table |
| DSSE Envelope | Cryptographic signature wrapper | In proof bundle JSON |
| Proof Bundle | ZIP archive for offline verification | Stored in object storage |

### Prerequisites

- Access to Scanner WebService API
- `scanner.proofs` OAuth scope
- CLI access with `stella` configured
- Trust anchor public keys (for verification)

---

## 2. Score Replay Operations

### 2.1 When to Replay Scores

Replay scores in any of these situations:

- **Feed updates**: New advisories from Concelier
- **VEX updates**: New VEX statements from Excititor
- **Policy changes**: Updated scoring policy rules
- **Audit requests**: Historical scores need to be verified
- **Investigation**: You need to analyze why a score changed

### 2.2 Manual Score Replay (API)

```bash
# Get current scan manifest
curl -s "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/manifest" \
  -H "Authorization: Bearer $TOKEN" | jq '.manifest'

# Replay with current feeds (uses latest snapshots)
curl -s -X POST "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/score/replay" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}' | jq '.scoreProof.rootHash'

# Replay with specific feed snapshot
curl -X POST "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/score/replay" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "overrides": {
      "concelierSnapshotHash": "sha256:specific-feed-snapshot..."
    }
  }'
```
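The two request bodies above differ only in whether a snapshot override is present. As a minimal sketch, the helper below builds either body; `build_replay_payload` is a hypothetical name introduced for illustration, and it assumes the snapshot hash contains no characters that need JSON escaping.

```bash
# Print the JSON body for a replay request.
# With no argument: replay against the latest snapshots ('{}').
# With one argument: pin the Concelier snapshot via "overrides".
# Assumption: the hash contains no quotes or backslashes.
build_replay_payload() {
  local snapshot_hash="${1:-}"
  if [ -z "$snapshot_hash" ]; then
    printf '{}\n'
  else
    printf '{"overrides": {"concelierSnapshotHash": "%s"}}\n' "$snapshot_hash"
  fi
}
```

The output can be piped to `curl` with `-d @-`, which reads the request body from standard input.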

### 2.3 Manual Score Replay (CLI)

```bash
# Replay with current feeds
stella score replay --scan-id $SCAN_ID

# Replay with specific snapshot
stella score replay --scan-id $SCAN_ID \
  --feed-snapshot sha256:specific-feed-snapshot...

# Replay and compare with original
stella score replay --scan-id $SCAN_ID --diff

# Replay in offline mode (air-gap)
stella score replay --scan-id $SCAN_ID \
  --offline \
  --bundle /path/to/offline-bundle.zip
```

### 2.4 Batch Score Replay

For bulk replay (e.g., after a major feed update):

```bash
# List all scans from last 7 days
stella scan list --since 7d --format json > scans.json

# Replay each scan (read -r keeps backslashes in IDs intact)
jq -r '.[].scanId' scans.json | while read -r SCAN_ID; do
  echo "Replaying $SCAN_ID..."
  stella score replay --scan-id "$SCAN_ID" --quiet
done

# Or use the batch API endpoint (more efficient)
curl -X POST "https://scanner.example.com/api/v1/scanner/batch/replay" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "scanIds": ["scan-1", "scan-2", "scan-3"],
    "parallel": true,
    "maxConcurrency": 10
  }'
```
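The batch request above hard-codes three scan IDs. A rough sketch of generating that body from a list of IDs follows; `build_batch_payload` is a hypothetical helper, not part of the `stella` CLI, and it assumes scan IDs contain no quotes, backslashes, or whitespace.

```bash
# Build the batch replay body from scan IDs read on stdin (one per line).
# Usage: build_batch_payload [maxConcurrency]   (default 10, as in the example above)
# Assumption: IDs need no JSON escaping.
build_batch_payload() {
  local max_concurrency="${1:-10}"
  local ids="" id
  while IFS= read -r id; do
    [ -n "$id" ] || continue            # skip blank lines
    ids="${ids:+$ids, }\"$id\""         # comma-separate after the first ID
  done
  printf '{"scanIds": [%s], "parallel": true, "maxConcurrency": %s}\n' \
    "$ids" "$max_concurrency"
}
```

For example, `jq -r '.[].scanId' scans.json | build_batch_payload 10` would produce a body suitable for `curl -d @-`.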

### 2.5 Nightly Replay Job

The Scheduler automatically replays scores when Concelier publishes new snapshots:

```yaml
# Job configuration in Scheduler
job:
  name: nightly-score-replay
  schedule: "0 3 * * *"  # 3 AM daily
  trigger:
    type: concelier-snapshot-published
  action:
    type: batch-replay
    config:
      maxAge: 30d
      parallel: true
      maxConcurrency: 20
```

**Monitoring the nightly job**:

```bash
# Check job status
stella scheduler job status nightly-score-replay

# View recent runs
stella scheduler job runs nightly-score-replay --last 7

# Check for failures
stella scheduler job runs nightly-score-replay --status failed
```

---

## 3. Proof Verification Operations

### 3.1 Online Verification

```bash
# Verify via API
curl -X POST "https://scanner.example.com/api/v1/proofs/verify" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "proofBundleId": "sha256:proof123...",
    "checkRekor": true,
    "anchorIds": ["anchor-001"]
  }'

# Verify via CLI
stella proof verify --bundle-id sha256:proof123... --check-rekor
```

### 3.2 Offline Verification (Air-Gap)

For air-gapped environments:

```bash
# 1. Download proof bundle (on connected system)
curl -o proof-bundle.zip \
  "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/proofs/sha256:proof123..."

# 2. Transfer to air-gapped system (USB, etc.)

# 3. Verify offline (on air-gapped system)
stella proof verify --bundle proof-bundle.zip \
  --offline \
  --trust-anchor /path/to/trust-anchor.pem

# 4. Verify with explicit public key
stella proof verify --bundle proof-bundle.zip \
  --offline \
  --public-key /path/to/public-key.pem \
  --skip-rekor  # No network access
```
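Step 2 moves the bundle across an air gap, where a truncated or corrupted copy is a common cause of later verification failures. One way to catch that early, sketched here under the assumption that GNU coreutils `sha256sum` is available on both systems, is to record a checksum before transfer and re-check it afterwards. This checks transfer integrity only; it is not a substitute for `stella proof verify`. Both function names are hypothetical.

```bash
# Record a SHA-256 checksum file next to the bundle (run on the connected system).
record_checksum() {
  local bundle="$1"
  ( cd "$(dirname "$bundle")" &&
    sha256sum "$(basename "$bundle")" > "$(basename "$bundle").sha256" )
}

# Re-check the bundle against its checksum file (run on the air-gapped system).
# Returns non-zero if the bundle changed in transit.
check_checksum() {
  local bundle="$1"
  ( cd "$(dirname "$bundle")" &&
    sha256sum --check --quiet "$(basename "$bundle").sha256" )
}
```

Transfer `proof-bundle.zip` and `proof-bundle.zip.sha256` together, then run `check_checksum proof-bundle.zip` before step 3.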

### 3.3 Verification Checks

| Check | Description | Can Skip? |
|-------|-------------|-----------|
| Signature Valid | DSSE signature matches payload | No |
| ID Recomputed | Content-addressed ID matches | No |
| Merkle Path Valid | Merkle tree construction correct | No |
| Rekor Inclusion | Transparency log entry exists | Yes (offline) |
| Timestamp Valid | Proof created within valid window | Configurable |

### 3.4 Failed Verification Troubleshooting

```bash
# Get detailed verification report
stella proof verify --bundle-id sha256:proof123... --verbose

# Check specific failures
stella proof verify --bundle-id sha256:proof123... --check signatureValid
stella proof verify --bundle-id sha256:proof123... --check idRecomputed
stella proof verify --bundle-id sha256:proof123... --check merklePathValid

# Dump proof bundle contents for inspection
stella proof inspect --bundle proof-bundle.zip --output-dir ./inspection/
```

---

## 4. Proof Bundle Management

### 4.1 Download Proof Bundles

```bash
# Download single bundle
stella proof download --scan-id $SCAN_ID --output proof.zip

# Download with specific root hash
stella proof download --scan-id $SCAN_ID \
  --root-hash sha256:proof123... \
  --output proof.zip

# Download all bundles for a scan
stella proof download --scan-id $SCAN_ID --all --output-dir ./proofs/
```

### 4.2 Bundle Contents

```bash
# List bundle contents
unzip -l proof-bundle.zip

# Expected contents:
# manifest.json         - Scan manifest (canonical JSON)
# manifest.dsse.json    - DSSE signature of manifest
# score_proof.json      - Proof ledger (ProofNode array)
# proof_root.dsse.json  - DSSE signature of proof root
# meta.json             - Metadata (timestamps, versions)

# Extract and inspect
unzip proof-bundle.zip -d ./proof-contents/
jq . ./proof-contents/manifest.json
jq '.nodes | length' ./proof-contents/score_proof.json
```
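After extraction, a quick structural check can confirm that all five expected files are present before deeper inspection. The sketch below uses the file names listed above; `check_bundle_layout` is a hypothetical helper, and it only checks that the files exist, not that their contents or signatures are valid.

```bash
# Report any expected bundle file that is missing from an extracted directory.
# Returns 0 if all five files exist, 1 otherwise.
check_bundle_layout() {
  local dir="$1" missing=0 name
  for name in manifest.json manifest.dsse.json score_proof.json \
              proof_root.dsse.json meta.json; do
    if [ ! -f "$dir/$name" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_bundle_layout ./proof-contents/` prints one `missing:` line per absent file and exits non-zero if any are absent.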

### 4.3 Proof Retention

Proof bundles are retained based on policy:

| Tier | Retention | Description |
|------|-----------|-------------|
| Hot | 30 days | Recent proofs, fast access |
| Warm | 1 year | Archived proofs, slower access |
| Cold | 7 years | Compliance archive, retrieval required |
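To estimate which tier a proof is likely in, the table can be read as age thresholds. The sketch below assumes every window is counted from the proof's creation date and that a year is 365 days; the table does not state either, so treat the output as a rough guide and use the CLI status command for the authoritative answer. `retention_tier` is a hypothetical helper.

```bash
# Map a proof's age in whole days to a retention tier.
# Assumptions (not confirmed by the policy table): windows are measured
# from creation, 1 year = 365 days, 7 years = 2555 days.
retention_tier() {
  local age_days="$1"
  if   [ "$age_days" -le 30 ];   then echo hot
  elif [ "$age_days" -le 365 ];  then echo warm
  elif [ "$age_days" -le 2555 ]; then echo cold
  else echo expired
  fi
}
```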

**Check retention status**:

```bash
stella proof status --scan-id $SCAN_ID
# Output: tier=hot, expires=2025-01-17, retrievable=true
```

**Retrieve from cold storage**:

```bash
# Request retrieval (async, may take hours)
stella proof retrieve --scan-id $SCAN_ID --root-hash sha256:proof123...

# Check retrieval status
stella proof retrieve-status --request-id req-001
```

### 4.4 Export for Audit

```bash
# Export proof bundle with full chain
stella proof export --scan-id $SCAN_ID \
  --include-chain \
  --include-anchors \
  --output audit-bundle.zip

# Export multiple scans for audit period
stella proof export-batch \
  --from 2025-01-01 \
  --to 2025-01-31 \
  --output-dir ./audit-jan-2025/
```

---

## 5. Troubleshooting

### 5.1 Score Mismatch After Replay

**Symptom**: Replayed score differs from original.

**Diagnosis**:

```bash
# Compare manifests
stella score diff --scan-id $SCAN_ID --original --replayed

# Check for feed changes
stella score manifest --scan-id $SCAN_ID | jq '.concelierSnapshotHash'

# Compare input hashes
stella score inputs --scan-id $SCAN_ID --hash
```

**Common causes**:

1. **Feed snapshot changed**: The original run used different advisory data
2. **Policy updated**: Scoring rules changed between runs
3. **VEX statements added**: New VEX data affects scores
4. **Non-deterministic seed**: Check whether the manifest sets `deterministic: true`

**Resolution**:

```bash
# Replay with exact original snapshots
stella score replay --scan-id $SCAN_ID --use-original-snapshots
```

### 5.2 Proof Verification Failed

**Symptom**: Verification returns `verified: false`.

**Diagnosis**:

```bash
# Get detailed error
stella proof verify --bundle-id sha256:proof123... --verbose 2>&1 | head -50

# Common errors:
# - "Signature verification failed": Key mismatch or tampering
# - "ID recomputation failed": Canonical JSON issue
# - "Merkle path invalid": Proof chain corrupted
# - "Rekor entry not found": Not logged to transparency log
```

**Resolution by error type**:

| Error | Cause | Resolution |
|-------|-------|------------|
| Signature failed | Key rotated | Use correct trust anchor |
| ID mismatch | Content modified | Re-generate proof |
| Merkle invalid | Partial upload | Re-download bundle |
| Rekor missing | Log lag or skip | Wait or verify offline |
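The table above can be turned into a small lookup for triage scripts. This is a sketch only: it assumes the verbose output contains the error strings exactly as quoted in the "Common errors" list, which should be confirmed against real output, and `suggest_resolution` is a hypothetical helper.

```bash
# Map a verification error message to the resolution from the table above.
# Assumption: the message contains one of the quoted error strings verbatim.
suggest_resolution() {
  case "$1" in
    *"Signature verification failed"*) echo "Use correct trust anchor" ;;
    *"ID recomputation failed"*)       echo "Re-generate proof" ;;
    *"Merkle path invalid"*)           echo "Re-download bundle" ;;
    *"Rekor entry not found"*)         echo "Wait or verify offline" ;;
    *) echo "No documented resolution"; return 1 ;;
  esac
}
```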

### 5.3 Missing Proof Bundle

**Symptom**: Proof bundle not found.

**Diagnosis**:

```bash
# Check if scan exists
stella scan status --scan-id $SCAN_ID

# Check proof generation status
stella proof status --scan-id $SCAN_ID

# Check if proof was generated
stella proof list --scan-id $SCAN_ID
```

**Common causes**:

1. **Scan still in progress**: Proofs are generated only after the scan completes
2. **Proof generation failed**: Check worker logs
3. **Archived to cold storage**: The bundle must be retrieved first (see 4.3)
4. **Retention expired**: Proof deleted per policy

### 5.4 Replay Performance Issues

**Symptom**: Replay taking too long.

**Diagnosis**:

```bash
# Check replay queue depth
stella scheduler queue status replay

# Check worker health
stella scanner workers status

# Check for resource constraints
kubectl top pods -l app=scanner-worker
```

**Optimization**:

```bash
# Reduce parallelism during peak hours
stella scheduler job update nightly-score-replay \
  --config.maxConcurrency=5

# Skip unchanged scans
stella score replay --scan-id $SCAN_ID --skip-unchanged
```

---

## 6. Monitoring & Alerting

### 6.1 Key Metrics

| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| `score_replay_duration_seconds` | Time to replay a score | > 30s |
| `proof_verification_success_rate` | % of successful verifications | < 99% |
| `proof_bundle_size_bytes` | Size of proof bundles | > 100MB |
| `replay_queue_depth` | Pending replay jobs | > 1000 |
| `proof_generation_failures` | Failed proof generations | > 0/hour |

### 6.2 Grafana Dashboard

```
Dashboard: Score Proofs Operations
Panels:
- Replay throughput (replays/minute)
- Replay latency (p50, p95, p99)
- Verification success rate
- Proof bundle storage usage
- Queue depth over time
```

### 6.3 Alerting Rules

```yaml
# Prometheus alerting rules
groups:
  - name: score-proofs
    rules:
      - alert: ReplayLatencyHigh
        # histogram_quantile operates on per-bucket rates, so query the
        # _bucket series (assumes the metric is exported as a histogram)
        expr: histogram_quantile(0.95, sum by (le) (rate(score_replay_duration_seconds_bucket[5m]))) > 30
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Score replay latency is high"

      - alert: ProofVerificationFailures
        expr: increase(proof_verification_failures_total[1h]) > 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Multiple proof verification failures detected"

      - alert: ReplayQueueBacklog
        expr: replay_queue_depth > 1000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Score replay queue backlog is growing"
```

---

## 7. Escalation Procedures

### 7.1 Escalation Matrix

| Severity | Condition | Response Time | Escalation Path |
|----------|-----------|---------------|-----------------|
| P1 | Proof verification failing for all scans | 15 min | On-call → Team Lead → VP Eng |
| P2 | Replay failures > 10% | 1 hour | On-call → Team Lead |
| P3 | Replay latency > 60s p95 | 4 hours | On-call |
| P4 | Queue backlog > 5000 | 24 hours | Ticket |
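The matrix can be read as a highest-severity-wins rule. The sketch below encodes that reading; the ordering and the integer inputs are assumptions, since the matrix does not say how overlapping conditions are resolved, and `classify_severity` is a hypothetical helper.

```bash
# Classify an incident from current readings, highest severity first.
# Args: all_verifications_failing (0|1), replay_failure_pct,
#       p95_latency_seconds, queue_depth (all integers)
classify_severity() {
  local all_failing="$1" fail_pct="$2" p95="$3" queue="$4"
  if   [ "$all_failing" -eq 1 ]; then echo P1
  elif [ "$fail_pct" -gt 10 ];   then echo P2
  elif [ "$p95" -gt 60 ];        then echo P3
  elif [ "$queue" -gt 5000 ];    then echo P4
  else echo none
  fi
}
```

For example, `classify_severity 0 12 45 800` reports `P2`, because replay failures exceed 10% while the other readings stay under their thresholds.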

### 7.2 P1 Response Procedure

1. **Acknowledge** alert in PagerDuty
2. **Triage**:
   ```bash
   # Check service health
   stella health check --service scanner
   stella health check --service attestor

   # Check recent changes
   kubectl rollout history deployment/scanner-worker
   ```
3. **Mitigate**:
   ```bash
   # If recent deployment, rollback
   kubectl rollout undo deployment/scanner-worker

   # If key rotation issue, restore previous anchor
   stella anchor restore --anchor-id anchor-001 --revision previous
   ```
4. **Communicate**: Update status page, notify stakeholders
5. **Resolve**: Fix root cause, verify fix
6. **Postmortem**: Document incident within 48 hours

### 7.3 Contact Information

| Role | Contact | Availability |
|------|---------|--------------|
| On-Call Engineer | PagerDuty `scanner-oncall` | 24/7 |
| Scanner Team Lead | @scanner-lead | Business hours |
| Security Team | security@stellaops.local | Business hours |
| VP Engineering | @vp-eng | Escalation only |

---

## Related Documentation

- [Score Proofs API Reference](../api/score-proofs-reachability-api-reference.md)
- [Proof Chain Architecture](../modules/attestor/architecture.md)
- [CLI Reference](./cli-reference.md)
- [Air-Gap Operations](../airgap/operations.md)

---

**Last Updated**: 2025-12-20
**Version**: 1.0.0
**Sprint**: 3500.0004.0004