# Score Proofs Operations Runbook

> **Version**: 1.0.0  
> **Sprint**: 3500.0004.0004  
> **Last Updated**: 2025-12-20

This runbook covers operational procedures for Score Proofs, including score replay, proof verification, and troubleshooting.

---

## Table of Contents

1. [Overview](#1-overview)
2. [Score Replay Operations](#2-score-replay-operations)
3. [Proof Verification Operations](#3-proof-verification-operations)
4. [Proof Bundle Management](#4-proof-bundle-management)
5. [Troubleshooting](#5-troubleshooting)
6. [Monitoring & Alerting](#6-monitoring--alerting)
7. [Escalation Procedures](#7-escalation-procedures)

---

## 1. Overview

### What are Score Proofs?

Score Proofs provide cryptographically verifiable audit trails for vulnerability scoring decisions. Each proof:

- **Records inputs**: SBOM, feed snapshots, VEX data, policy hashes
- **Traces computation**: Every scoring rule application
- **Signs results**: DSSE envelopes with configurable trust anchors
- **Enables replay**: Same inputs → same outputs (deterministic)

### Key Components

| Component | Purpose | Location |
|-----------|---------|----------|
| Scan Manifest | Records all inputs deterministically | `scanner.scan_manifest` table |
| Proof Ledger | DAG of scoring computation nodes | `scanner.proof_bundle` table |
| DSSE Envelope | Cryptographic signature wrapper | In proof bundle JSON |
| Proof Bundle | ZIP archive for offline verification | Stored in object storage |

### Prerequisites

- Access to Scanner WebService API
- `scanner.proofs` OAuth scope
- CLI access with `stella` configured
- Trust anchor public keys (for verification)

---

## 2. Score Replay Operations

### 2.1 When to Replay Scores

Score replay is needed when:

- **Feed updates**: New advisories from Concelier
- **VEX updates**: New VEX statements from Excititor
- **Policy changes**: Updated scoring policy rules
- **Audit requests**: Need to verify historical scores
- **Investigation**: Analyze why a score changed

### 2.2 Manual Score Replay (API)

```bash
# Get current scan manifest
curl -s "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/manifest" \
  -H "Authorization: Bearer $TOKEN" | jq '.manifest'

# Replay with current feeds (uses latest snapshots)
curl -X POST "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/score/replay" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}' | jq '.scoreProof.rootHash'

# Replay with specific feed snapshot
curl -X POST "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/score/replay" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "overrides": {
      "concelierSnapshotHash": "sha256:specific-feed-snapshot..."
    }
  }'
```
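
The two replay requests above differ only in the optional `overrides` object. A minimal sketch of a wrapper that builds the body and posts it, assuming the scanner base URL is held in a hypothetical `SCANNER_URL` variable and that snapshot hashes contain no characters needing JSON escaping. `build_replay_body` and `replay_scan` are illustrative names, not part of the `stella` tooling:

```bash
# Sketch only: SCANNER_URL and both function names are assumptions.

# Emit the JSON body for a replay request.
#   no argument  -> '{}' (replay with the latest snapshots)
#   one argument -> pin the Concelier feed snapshot via "overrides"
build_replay_body() {
  local snapshot="${1:-}"
  if [ -z "$snapshot" ]; then
    printf '{}\n'
  else
    printf '{"overrides":{"concelierSnapshotHash":"%s"}}\n' "$snapshot"
  fi
}

# POST the replay request; -f makes curl exit non-zero on HTTP errors.
replay_scan() {
  local scan_id="$1" snapshot="${2:-}"
  curl -fsS -X POST "$SCANNER_URL/api/v1/scanner/scans/$scan_id/score/replay" \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d "$(build_replay_body "$snapshot")"
}
```

With these defined, `replay_scan "$SCAN_ID" | jq '.scoreProof.rootHash'` would mirror the first replay call, and passing a snapshot hash as the second argument would mirror the pinned one.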

### 2.3 Manual Score Replay (CLI)

```bash
# Replay with current feeds
stella score replay --scan-id $SCAN_ID

# Replay with specific snapshot
stella score replay --scan-id $SCAN_ID \
  --feed-snapshot sha256:specific-feed-snapshot...

# Replay and compare with original
stella score replay --scan-id $SCAN_ID --diff

# Replay in offline mode (air-gap)
stella score replay --scan-id $SCAN_ID \
  --offline \
  --bundle /path/to/offline-bundle.zip
```

### 2.4 Batch Score Replay

For bulk replay (e.g., after a major feed update):

```bash
# List all scans from last 7 days
stella scan list --since 7d --format json > scans.json

# Replay each scan
jq -r '.[].scanId' scans.json | while read -r SCAN_ID; do
  echo "Replaying $SCAN_ID..."
  # </dev/null keeps the CLI from consuming the remaining scan IDs on stdin
  stella score replay --scan-id "$SCAN_ID" --quiet </dev/null
done

# Or use the batch API endpoint (more efficient)
curl -X POST "https://scanner.example.com/api/v1/scanner/batch/replay" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "scanIds": ["scan-1", "scan-2", "scan-3"],
    "parallel": true,
    "maxConcurrency": 10
  }'
```
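
A per-scan loop like the one above prints progress but does not record which replays failed. A minimal sketch of a variant that collects failed scan IDs for a retry pass. `replay_one` and `replay_all` are hypothetical helper names; only the `stella score replay --scan-id ... --quiet` invocation comes from this runbook:

```bash
# Sketch only: replay_one and replay_all are illustrative helpers.

# Replay a single scan. Redefine this function to dry-run the loop.
replay_one() {
  stella score replay --scan-id "$1" --quiet
}

# Read scan IDs from stdin (one per line), replay each, and write the IDs
# that failed to the file named by $1. Prints the failure count.
replay_all() {
  local failed_file="$1" id failed=0
  : > "$failed_file"
  while IFS= read -r id; do
    [ -n "$id" ] || continue
    # </dev/null keeps the replay command from consuming the ID list
    if ! replay_one "$id" </dev/null; then
      printf '%s\n' "$id" >> "$failed_file"
      failed=$((failed + 1))
    fi
  done
  printf '%s\n' "$failed"
}
```

A possible invocation is `jq -r '.[].scanId' scans.json | replay_all failed-scans.txt`, after which `failed-scans.txt` can be fed back in for a second attempt.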

### 2.5 Nightly Replay Job

The Scheduler replays scores automatically. The job below combines a nightly 03:00 schedule with a `concelier-snapshot-published` trigger:

```yaml
# Job configuration in Scheduler
job:
  name: nightly-score-replay
  schedule: "0 3 * * *"  # 3 AM daily
  trigger:
    type: concelier-snapshot-published
  action:
    type: batch-replay
    config:
      maxAge: 30d
      parallel: true
      maxConcurrency: 20
```

**Monitoring the nightly job**:

```bash
# Check job status
stella scheduler job status nightly-score-replay

# View recent runs
stella scheduler job runs nightly-score-replay --last 7

# Check for failures
stella scheduler job runs nightly-score-replay --status failed
```

---

## 3. Proof Verification Operations

### 3.1 Online Verification

```bash
# Verify via API
curl -X POST "https://scanner.example.com/api/v1/proofs/verify" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "proofBundleId": "sha256:proof123...",
    "checkRekor": true,
    "anchorIds": ["anchor-001"]
  }'

# Verify via CLI
stella proof verify --bundle-id sha256:proof123... --check-rekor
```

### 3.2 Offline Verification (Air-Gap)

For air-gapped environments:

```bash
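# 1. Download proof bundle (on connected system)
#    -f fails on HTTP errors instead of saving the error body as the ZIP
curl -fsS -o proof-bundle.zip \
  -H "Authorization: Bearer $TOKEN" \
  "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/proofs/sha256:proof123..."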

# 2. Transfer to air-gapped system (USB, etc.)

# 3. Verify offline (on air-gapped system)
stella proof verify --bundle proof-bundle.zip \
  --offline \
  --trust-anchor /path/to/trust-anchor.pem

# 4. Verify with explicit public key
stella proof verify --bundle proof-bundle.zip \
  --offline \
  --public-key /path/to/public-key.pem \
  --skip-rekor  # No network access
```
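
Step 2 moves the bundle across the air gap on removable media, where truncation or silent corruption is possible. One way to catch that before running verification is to record a SHA-256 digest on the connected side and re-check it after transfer. This is a sketch using `sha256sum` from GNU coreutils; the helper names are hypothetical, and this is a transport check only, not a substitute for `stella proof verify`:

```bash
# Sketch only: record_digest and check_digest are illustrative helpers.

# On the connected system: write <bundle>.sha256 next to the bundle.
record_digest() {
  local bundle="$1" dir name
  dir="$(dirname "$bundle")"
  name="$(basename "$bundle")"
  ( cd "$dir" && sha256sum "$name" > "$name.sha256" )
}

# On the air-gapped system: succeed only if the bundle still matches
# the recorded digest (both files must be in the same directory).
check_digest() {
  local bundle="$1" dir name
  dir="$(dirname "$bundle")"
  name="$(basename "$bundle")"
  ( cd "$dir" && sha256sum -c "$name.sha256" >/dev/null 2>&1 )
}
```

Copy the `.sha256` file alongside the ZIP, then run `check_digest proof-bundle.zip` before the offline verification step.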

### 3.3 Verification Checks

| Check | Description | Can Skip? |
|-------|-------------|-----------|
| Signature Valid | DSSE signature matches payload | No |
| ID Recomputed | Content-addressed ID matches | No |
| Merkle Path Valid | Merkle tree construction correct | No |
| Rekor Inclusion | Transparency log entry exists | Yes (offline) |
| Timestamp Valid | Proof created within valid window | Configurable |
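
The "Can Skip?" column can be read as a small decision rule. A minimal sketch, assuming check results arrive as `name status` lines with a status of `pass`, `fail`, or `skipped`. That line format and the names `rekorInclusion` and `timestampValid` are assumptions for illustration (the other three names match the `--check` values used in section 3.4), and the configurable timestamp check is simply treated as mandatory here:

```bash
# Sketch only: the input format and evaluate_checks are assumptions.
# Usage: evaluate_checks <offline:true|false> < results.txt
# Returns 0 when every check passed or was legitimately skipped.
evaluate_checks() {
  local offline="${1:-false}" name status verdict=0
  while read -r name status; do
    [ -n "$name" ] || continue
    case "$status" in
      pass)
        ;;
      skipped)
        # Only the Rekor inclusion check may be skipped, and only offline
        if [ "$name" != "rekorInclusion" ] || [ "$offline" != "true" ]; then
          echo "check may not be skipped: $name" >&2
          verdict=1
        fi
        ;;
      *)
        echo "check failed: $name" >&2
        verdict=1
        ;;
    esac
  done
  return "$verdict"
}
```

The point of the sketch is that a skipped Rekor check is acceptable only in offline mode, while a skipped signature, ID, or Merkle check is always treated as a failure.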

### 3.4 Failed Verification Troubleshooting

```bash
# Get detailed verification report
stella proof verify --bundle-id sha256:proof123... --verbose

# Check specific failures
stella proof verify --bundle-id sha256:proof123... --check signatureValid
stella proof verify --bundle-id sha256:proof123... --check idRecomputed
stella proof verify --bundle-id sha256:proof123... --check merklePathValid

# Dump proof bundle contents for inspection
stella proof inspect --bundle proof-bundle.zip --output-dir ./inspection/
```

---

## 4. Proof Bundle Management

### 4.1 Download Proof Bundles

```bash
# Download single bundle
stella proof download --scan-id $SCAN_ID --output proof.zip

# Download with specific root hash
stella proof download --scan-id $SCAN_ID \
  --root-hash sha256:proof123... \
  --output proof.zip

# Download all bundles for a scan
stella proof download --scan-id $SCAN_ID --all --output-dir ./proofs/
```

### 4.2 Bundle Contents

```bash
# List bundle contents
unzip -l proof-bundle.zip

# Expected contents:
#   manifest.json        - Scan manifest (canonical JSON)
#   manifest.dsse.json   - DSSE signature of manifest
#   score_proof.json     - Proof ledger (ProofNode array)
#   proof_root.dsse.json - DSSE signature of proof root
#   meta.json            - Metadata (timestamps, versions)

# Extract and inspect
unzip proof-bundle.zip -d ./proof-contents/
jq . ./proof-contents/manifest.json
jq '.nodes | length' ./proof-contents/score_proof.json
```
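
Before inspecting individual files, it can help to confirm that an extracted bundle is complete. A small sketch that checks for the five files listed under "Expected contents" above; `check_bundle_dir` is a hypothetical helper, not a `stella` command:

```bash
# Sketch only: check_bundle_dir is an illustrative helper.
# Returns non-zero and lists what is missing if the extracted bundle
# directory lacks any of the expected files.
check_bundle_dir() {
  local dir="$1" missing=0 f
  for f in manifest.json manifest.dsse.json score_proof.json \
           proof_root.dsse.json meta.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_bundle_dir ./proof-contents/ || echo "bundle incomplete, re-download"` flags a partial extraction before any signature checks are attempted.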

### 4.3 Proof Retention

Proof bundles are retained based on policy:

| Tier | Retention | Description |
|------|-----------|-------------|
| Hot | 30 days | Recent proofs, fast access |
| Warm | 1 year | Archived proofs, slower access |
| Cold | 7 years | Compliance archive, retrieval required |

**Check retention status**:

```bash
stella proof status --scan-id $SCAN_ID
# Example output: tier=hot, expires=2026-01-17, retrievable=true
```
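
When scripting around the status command, the `key=value` pairs can be split without extra tooling. A minimal sketch, assuming the command prints a single line in the `tier=..., expires=..., retrievable=...` shape shown above; `status_field` and `needs_retrieval` are hypothetical helpers:

```bash
# Sketch only: assumes a single "k=v, k=v, ..." status line.

# Print the value for one key, or return 1 if the key is absent.
status_field() {
  local line="$1" key="$2" pair rest
  rest="$line,"
  while [ -n "$rest" ]; do
    pair="${rest%%,*}"   # text up to the first comma
    rest="${rest#*,}"    # remainder after that comma
    pair="${pair# }"     # drop the space that follows ", "
    if [ "${pair%%=*}" = "$key" ]; then
      printf '%s\n' "${pair#*=}"
      return 0
    fi
  done
  return 1
}

# Per the retention table, cold-tier proofs need a retrieval request.
needs_retrieval() {
  [ "$(status_field "$1" tier)" = "cold" ]
}
```

A caller could capture the status line in a variable and run `needs_retrieval "$status"` to decide whether to issue `stella proof retrieve` before attempting a download.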

**Retrieve from cold storage**:

```bash
# Request retrieval (async, may take hours)
stella proof retrieve --scan-id $SCAN_ID --root-hash sha256:proof123...

# Check retrieval status
stella proof retrieve-status --request-id req-001
```

### 4.4 Export for Audit

```bash
# Export proof bundle with full chain
stella proof export --scan-id $SCAN_ID \
  --include-chain \
  --include-anchors \
  --output audit-bundle.zip

# Export multiple scans for audit period
stella proof export-batch \
  --from 2025-01-01 \
  --to 2025-01-31 \
  --output-dir ./audit-jan-2025/
```

---

## 5. Troubleshooting

### 5.1 Score Mismatch After Replay

**Symptom**: Replayed score differs from original.

**Diagnosis**:

```bash
# Compare manifests
stella score diff --scan-id $SCAN_ID --original --replayed

# Check for feed changes
stella score manifest --scan-id $SCAN_ID | jq '.concelierSnapshotHash'

# Compare input hashes
stella score inputs --scan-id $SCAN_ID --hash
```

**Common causes**:

1. **Feed snapshot changed**: Original used different advisory data
2. **Policy updated**: Scoring rules changed between runs
3. **VEX statements added**: New VEX data affects scores
4. **Non-deterministic seed**: Confirm that the manifest sets `deterministic: true`

**Resolution**:

```bash
# Replay with exact original snapshots
stella score replay --scan-id $SCAN_ID --use-original-snapshots
```
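
Most of the causes listed above come down to one input hash differing between the original run and the replay. A sketch for spotting which one, assuming each run's input hashes have been saved to a file as `name hash` lines. That format is an assumption, since the output of `stella score inputs --hash` is not specified in this runbook, and `changed_inputs` is a hypothetical helper:

```bash
# Sketch only: the "name hash" file format is an assumption.
# Print the name of every input whose hash differs between the two
# listings, or that appears only in the replayed listing.
changed_inputs() {
  local original="$1" replayed="$2"
  awk '
    FILENAME == ARGV[1] { orig[$1] = $2; next }   # first file: remember hashes
    !($1 in orig) || orig[$1] != $2 { print $1 }  # second file: report changes
  ' "$original" "$replayed"
}
```

An empty result means the inputs match, which points away from feed, policy, or VEX drift and toward the determinism setting.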

### 5.2 Proof Verification Failed

**Symptom**: Verification returns `verified: false`.

**Diagnosis**:

```bash
# Get detailed error
stella proof verify --bundle-id sha256:proof123... --verbose 2>&1 | head -50

# Common errors:
# - "Signature verification failed": Key mismatch or tampering
# - "ID recomputation failed": Canonical JSON issue
# - "Merkle path invalid": Proof chain corrupted
# - "Rekor entry not found": Not logged to transparency log
```

**Resolution by error type**:

| Error | Cause | Resolution |
|-------|-------|------------|
| Signature failed | Key rotated | Use correct trust anchor |
| ID mismatch | Content modified | Re-generate proof |
| Merkle invalid | Partial upload | Re-download bundle |
| Rekor missing | Log lag, or Rekor logging was skipped | Wait or verify offline |
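
For scripted triage, the table can be expressed as a lookup from the error text to the suggested resolution. A minimal sketch that matches on the four error strings listed under "Common errors" above; `suggest_resolution` is a hypothetical helper, and the fallback message is an assumption:

```bash
# Sketch only: suggest_resolution is an illustrative helper.
# Maps a verification error message to the resolution from the table.
suggest_resolution() {
  case "$1" in
    *"Signature verification failed"*) echo "Use correct trust anchor" ;;
    *"ID recomputation failed"*)       echo "Re-generate proof" ;;
    *"Merkle path invalid"*)           echo "Re-download bundle" ;;
    *"Rekor entry not found"*)         echo "Wait or verify offline" ;;
    *)
      echo "Unrecognized error, follow the escalation procedures"
      return 1
      ;;
  esac
}
```

The non-zero return on an unrecognized message lets a wrapper script distinguish known failure modes from ones that need a human.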

### 5.3 Missing Proof Bundle

**Symptom**: Proof bundle not found.

**Diagnosis**:

```bash
# Check if scan exists
stella scan status --scan-id $SCAN_ID

# Check proof generation status
stella proof status --scan-id $SCAN_ID

# Check if proof was generated
stella proof list --scan-id $SCAN_ID
```

**Common causes**:

1. **Scan still in progress**: The proof is generated only after the scan completes
2. **Proof generation failed**: Check worker logs
3. **Archived to cold storage**: Needs retrieval
4. **Retention expired**: Proof deleted per policy

### 5.4 Replay Performance Issues

**Symptom**: Replay taking too long.

**Diagnosis**:

```bash
# Check replay queue depth
stella scheduler queue status replay

# Check worker health
stella scanner workers status

# Check for resource constraints
kubectl top pods -l app=scanner-worker
```

**Optimization**:

```bash
# Reduce parallelism during peak hours
stella scheduler job update nightly-score-replay \
  --config.maxConcurrency=5

# Skip unchanged scans
stella score replay --scan-id $SCAN_ID --skip-unchanged
```

---

## 6. Monitoring & Alerting

### 6.1 Key Metrics

| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| `score_replay_duration_seconds` | Time to replay a score | > 30s |
| `proof_verification_success_rate` | % of successful verifications | < 99% |
| `proof_bundle_size_bytes` | Size of proof bundles | > 100MB |
| `replay_queue_depth` | Pending replay jobs | > 1000 |
| `proof_generation_failures` | Failed proof generations | > 0/hour |

### 6.2 Grafana Dashboard

```text
Dashboard: Score Proofs Operations
Panels:
- Replay throughput (replays/minute)
- Replay latency (p50, p95, p99)
- Verification success rate
- Proof bundle storage usage
- Queue depth over time
```

### 6.3 Alerting Rules

```yaml
# Prometheus alerting rules
groups:
  - name: score-proofs
    rules:
      - alert: ReplayLatencyHigh
        expr: histogram_quantile(0.95, sum by (le) (rate(score_replay_duration_seconds_bucket[5m]))) > 30
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Score replay latency is high"
          
      - alert: ProofVerificationFailures
        expr: increase(proof_verification_failures_total[1h]) > 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Multiple proof verification failures detected"
          
      - alert: ReplayQueueBacklog
        expr: replay_queue_depth > 1000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Score replay queue backlog is growing"
```

---

## 7. Escalation Procedures

### 7.1 Escalation Matrix

| Severity | Condition | Response Time | Escalation Path |
|----------|-----------|---------------|-----------------|
| P1 | Proof verification failing for all scans | 15 min | On-call → Team Lead → VP Eng |
| P2 | Replay failures > 10% | 1 hour | On-call → Team Lead |
| P3 | Replay latency > 60s p95 | 4 hours | On-call |
| P4 | Queue backlog > 5000 | 24 hours | Ticket |
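
The matrix can be read top-down, with the first matching condition deciding the severity. A sketch of that evaluation order, assuming the four inputs have already been gathered as a `true`/`false` flag and whole-number values; `classify_severity` and its argument order are hypothetical:

```bash
# Sketch only: classify_severity is an illustrative helper.
# Args: <all-verifications-failing:true|false> <replay-failure-%>
#       <p95-replay-latency-seconds> <queue-depth>   (integers)
# Prints the highest matching severity, or "none".
classify_severity() {
  local all_failing="$1" failure_pct="$2" p95_latency_s="$3" queue_depth="$4"
  if [ "$all_failing" = "true" ]; then
    echo "P1"
  elif [ "$failure_pct" -gt 10 ]; then
    echo "P2"
  elif [ "$p95_latency_s" -gt 60 ]; then
    echo "P3"
  elif [ "$queue_depth" -gt 5000 ]; then
    echo "P4"
  else
    echo "none"
  fi
}
```

For example, `classify_severity false 12 45 800` would print `P2`, because the replay failure rate exceeds 10% even though latency and queue depth are within bounds.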

### 7.2 P1 Response Procedure

1. **Acknowledge** alert in PagerDuty
2. **Triage**:
   ```bash
   # Check service health
   stella health check --service scanner
   stella health check --service attestor
   
   # Check recent changes
   kubectl rollout history deployment/scanner-worker
   ```
3. **Mitigate**:
   ```bash
   # If a recent deployment is the cause, roll back
   kubectl rollout undo deployment/scanner-worker
   
   # If key rotation issue, restore previous anchor
   stella anchor restore --anchor-id anchor-001 --revision previous
   ```
4. **Communicate**: Update status page, notify stakeholders
5. **Resolve**: Fix root cause, verify fix
6. **Postmortem**: Document incident within 48 hours

### 7.3 Contact Information

| Role | Contact | Availability |
|------|---------|--------------|
| On-Call Engineer | PagerDuty `scanner-oncall` | 24/7 |
| Scanner Team Lead | @scanner-lead | Business hours |
| Security Team | security@stellaops.local | Business hours |
| VP Engineering | @vp-eng | Escalation only |

---

## Related Documentation

- [Score Proofs API Reference](../api/score-proofs-reachability-api-reference.md)
- [Proof Chain Architecture](../modules/attestor/architecture.md)
- [CLI Reference](./cli-reference.md)
- [Air-Gap Operations](../airgap/operations.md)
