Score Proofs Operations Runbook

Version: 1.0.0
Sprint: 3500.0004.0004
Last Updated: 2025-12-20

This runbook covers operational procedures for Score Proofs, including score replay, proof verification, and troubleshooting.

Overview
Score Replay Operations
Proof Verification Operations
Proof Bundle Management
Troubleshooting
Monitoring & Alerting
Escalation Procedures

1. Overview

What are Score Proofs?

Score Proofs provide cryptographically verifiable audit trails for vulnerability scoring decisions. Each proof:

Records inputs: SBOM, feed snapshots, VEX data, policy hashes
Traces computation: Every scoring rule application
Signs results: DSSE envelopes with configurable trust anchors
Enables replay: Same inputs → same outputs (deterministic)

Key Components

Component	Purpose	Location
Scan Manifest	Records all inputs deterministically	`scanner.scan_manifest` table
Proof Ledger	DAG of scoring computation nodes	`scanner.proof_bundle` table
DSSE Envelope	Cryptographic signature wrapper	In proof bundle JSON
Proof Bundle	ZIP archive for offline verification	Stored in object storage

Prerequisites

Access to the Scanner WebService API
`scanner.proofs` OAuth scope
`stella` CLI installed and configured
Trust anchor public keys (for verification)

2. Score Replay Operations

2.1 When to Replay Scores

Score replay is needed when:

Feed updates: New advisories from Concelier
VEX updates: New VEX statements from Excititor
Policy changes: Updated scoring policy rules
Audit requests: Need to verify historical scores
Investigation: Analyze why a score changed

2.2 Manual Score Replay (API)

# Get current scan manifest
curl -s "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/manifest" \
  -H "Authorization: Bearer $TOKEN" | jq '.manifest'

# Replay with current feeds (uses latest snapshots)
curl -X POST "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/score/replay" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}' | jq '.scoreProof.rootHash'

# Replay with specific feed snapshot
curl -X POST "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/score/replay" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "overrides": {
      "concelierSnapshotHash": "sha256:specific-feed-snapshot..."
    }
  }'

2.3 Manual Score Replay (CLI)

# Replay with current feeds
stella score replay --scan-id $SCAN_ID

# Replay with specific snapshot
stella score replay --scan-id $SCAN_ID \
  --feed-snapshot sha256:specific-feed-snapshot...

# Replay and compare with original
stella score replay --scan-id $SCAN_ID --diff

# Replay in offline mode (air-gap)
stella score replay --scan-id $SCAN_ID \
  --offline \
  --bundle /path/to/offline-bundle.zip

2.4 Batch Score Replay

For bulk replay (e.g., after a major feed update):

# List all scans from last 7 days
stella scan list --since 7d --format json > scans.json

# Replay each scan (stdin redirected so stella cannot consume the ID list)
jq -r '.[].scanId' scans.json | while IFS= read -r SCAN_ID; do
  echo "Replaying $SCAN_ID..."
  stella score replay --scan-id "$SCAN_ID" --quiet </dev/null
done

# Or use the batch API endpoint (more efficient)
curl -X POST "https://scanner.example.com/api/v1/scanner/batch/replay" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "scanIds": ["scan-1", "scan-2", "scan-3"],
    "parallel": true,
    "maxConcurrency": 10
  }'
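The sequential loop above keeps going after a failed replay without reporting it. Below is a minimal sketch that reads scan IDs on stdin and replays each one with the documented `stella score replay --scan-id … --quiet` command. If any replay failed, it returns non-zero and lists the failed IDs on stderr. It assumes `stella` exits non-zero when a replay fails. `replay_scans` is a hypothetical helper name, not part of the CLI.

```bash
# Hypothetical wrapper around the documented replay command.
# Reads one scan ID per line on stdin; returns 1 if any replay failed.
replay_scans() {
  local failed=0 scan_id
  while IFS= read -r scan_id; do
    [ -n "$scan_id" ] || continue            # skip blank lines
    echo "Replaying $scan_id..." >&2
    # Assumption: stella exits non-zero on a failed replay.
    if ! stella score replay --scan-id "$scan_id" --quiet </dev/null; then
      echo "FAILED: $scan_id" >&2
      failed=$((failed + 1))
    fi
  done
  echo "$failed replay(s) failed" >&2
  [ "$failed" -eq 0 ]
}
```

Usage: `jq -r '.[].scanId' scans.json | replay_scans || echo "some replays failed"`. The non-zero exit status lets a cron job or CI step flag the run.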

2.5 Nightly Replay Job

The Scheduler replays scores automatically through the `nightly-score-replay` job, which is configured with both a nightly `schedule` and a `concelier-snapshot-published` trigger:

# Job configuration in Scheduler
job:
  name: nightly-score-replay
  schedule: "0 3 * * *"  # 3 AM daily
  trigger:
    type: concelier-snapshot-published
  action:
    type: batch-replay
    config:
      maxAge: 30d
      parallel: true
      maxConcurrency: 20

Monitoring the nightly job:

# Check job status
stella scheduler job status nightly-score-replay

# View recent runs
stella scheduler job runs nightly-score-replay --last 7

# Check for failures
stella scheduler job runs nightly-score-replay --status failed

3. Proof Verification Operations

3.1 Online Verification

# Verify via API
curl -X POST "https://scanner.example.com/api/v1/proofs/verify" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "proofBundleId": "sha256:proof123...",
    "checkRekor": true,
    "anchorIds": ["anchor-001"]
  }'

# Verify via CLI
stella proof verify --bundle-id sha256:proof123... --check-rekor

3.2 Offline Verification (Air-Gap)

For air-gapped environments:

# 1. Download proof bundle (on connected system)
curl -fsS -o proof-bundle.zip \
  -H "Authorization: Bearer $TOKEN" \
  "https://scanner.example.com/api/v1/scanner/scans/$SCAN_ID/proofs/sha256:proof123..."

# 2. Transfer to air-gapped system (USB, etc.)

# 3. Verify offline (on air-gapped system)
stella proof verify --bundle proof-bundle.zip \
  --offline \
  --trust-anchor /path/to/trust-anchor.pem

# 4. Verify with explicit public key
stella proof verify --bundle proof-bundle.zip \
  --offline \
  --public-key /path/to/public-key.pem \
  --skip-rekor  # No network access
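Transfers over removable media can truncate or corrupt a bundle. That damage later shows up as a confusing signature or Merkle failure. Below is a minimal sketch that uses only coreutils `sha256sum`. It records a checksum on the connected system and checks it on the air-gapped side before `stella proof verify` runs. The function names and the `.sha256` sidecar file are illustrative assumptions, not part of the `stella` CLI.

```bash
# Hypothetical helpers (not part of the stella CLI) for checking that a proof
# bundle survived the air-gap transfer byte-for-byte.

# Connected system: write "<hex digest>  <file name>" next to the bundle.
write_bundle_checksum() {
  local bundle="$1"
  (cd "$(dirname "$bundle")" &&
    sha256sum "$(basename "$bundle")" > "$(basename "$bundle").sha256")
}

# Air-gapped system: returns 0 only if the bundle matches the recorded digest.
check_bundle_checksum() {
  local bundle="$1"
  (cd "$(dirname "$bundle")" &&
    sha256sum --check --quiet "$(basename "$bundle").sha256")
}
```

Usage:

1. After step 1, run `write_bundle_checksum proof-bundle.zip`.
2. Transfer both `proof-bundle.zip` and `proof-bundle.zip.sha256`.
3. On the air-gapped system, run `check_bundle_checksum proof-bundle.zip && stella proof verify --bundle proof-bundle.zip --offline --trust-anchor /path/to/trust-anchor.pem`.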

3.3 Verification Checks

Check	Description	Can Skip?
Signature Valid	DSSE signature matches payload	No
ID Recomputed	Content-addressed ID matches	No
Merkle Path Valid	Merkle tree construction correct	No
Rekor Inclusion	Transparency log entry exists	Yes (offline)
Timestamp Valid	Proof created within valid window	Configurable
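The ID Recomputed check relies on content addressing: an ID is derived from the bytes it names, so any change to those bytes changes the ID. Below is a minimal sketch of the idea. It assumes an ID is `sha256:` followed by the hex SHA-256 digest of content that is already in canonical form. This runbook does not describe Scanner's exact canonicalization or which bytes it hashes, so the sketch illustrates the principle and does not replace `stella proof verify`.

```bash
# Hypothetical helper: compute a "sha256:<hex>" content ID for a file whose
# bytes are assumed to already be in canonical form.
content_id() {
  local digest
  digest=$(sha256sum "$1" | cut -d' ' -f1)
  printf 'sha256:%s\n' "$digest"
}

# Returns 0 if the recomputed ID of file $1 equals the expected ID $2.
id_matches() {
  [ "$(content_id "$1")" = "$2" ]
}
```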

3.4 Failed Verification Troubleshooting

# Get detailed verification report
stella proof verify --bundle-id sha256:proof123... --verbose

# Check specific failures
stella proof verify --bundle-id sha256:proof123... --check signatureValid
stella proof verify --bundle-id sha256:proof123... --check idRecomputed
stella proof verify --bundle-id sha256:proof123... --check merklePathValid

# Dump proof bundle contents for inspection
stella proof inspect --bundle proof-bundle.zip --output-dir ./inspection/

4. Proof Bundle Management

4.1 Download Proof Bundles

# Download single bundle
stella proof download --scan-id $SCAN_ID --output proof.zip

# Download with specific root hash
stella proof download --scan-id $SCAN_ID \
  --root-hash sha256:proof123... \
  --output proof.zip

# Download all bundles for a scan
stella proof download --scan-id $SCAN_ID --all --output-dir ./proofs/

4.2 Bundle Contents

# List bundle contents
unzip -l proof-bundle.zip

# Expected contents:
#   manifest.json        - Scan manifest (canonical JSON)
#   manifest.dsse.json   - DSSE signature of manifest
#   score_proof.json     - Proof ledger (ProofNode array)
#   proof_root.dsse.json - DSSE signature of proof root
#   meta.json            - Metadata (timestamps, versions)

# Extract and inspect
unzip proof-bundle.zip -d ./proof-contents/
jq . ./proof-contents/manifest.json
jq '.nodes | length' ./proof-contents/score_proof.json
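Before deeper inspection, it helps to confirm that an extracted bundle contains every file listed above. Below is a minimal sketch that assumes those five file names are the complete required set. `check_bundle_files` is a hypothetical helper. A complete file set says nothing about signatures or hashes, which only `stella proof verify` checks.

```bash
# Hypothetical helper: report any expected proof bundle file missing from an
# extracted bundle directory. Returns 0 only if all files are present.
check_bundle_files() {
  local dir="$1" missing=0 f
  for f in manifest.json manifest.dsse.json score_proof.json proof_root.dsse.json meta.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
}
```

Usage: `check_bundle_files ./proof-contents/ || echo "incomplete bundle: re-download"`.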

4.3 Proof Retention

Proof bundles are retained based on policy:

Tier	Retention	Description
Hot	30 days	Recent proofs, fast access
Warm	1 year	Archived proofs, slower access
Cold	7 years	Compliance archive, retrieval required
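To estimate which tier a proof should be in, the sketch below maps a proof's age in days to a tier. It assumes the tiers cascade: hot for the first 30 days, warm until 1 year, cold until 7 years, then deleted. It also counts a year as 365 days. The real transition rules come from the retention policy, so confirm a specific proof with `stella proof status`.

```bash
# Hypothetical helper: expected retention tier for a proof that is $1 days old,
# assuming cascading limits of 30 days (hot), 365 days (warm), 7*365 days (cold).
tier_for_age() {
  local age_days="$1"
  if [ "$age_days" -le 30 ]; then
    echo hot
  elif [ "$age_days" -le 365 ]; then
    echo warm
  elif [ "$age_days" -le $((7 * 365)) ]; then
    echo cold
  else
    echo expired
  fi
}
```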

Check retention status:

stella proof status --scan-id $SCAN_ID
# Example output: tier=hot, expires=2025-01-17, retrievable=true

Retrieve from cold storage:

# Request retrieval (async, may take hours)
stella proof retrieve --scan-id $SCAN_ID --root-hash sha256:proof123...

# Check retrieval status (use the request ID returned by the retrieve command)
stella proof retrieve-status --request-id req-001

4.4 Export for Audit

# Export proof bundle with full chain
stella proof export --scan-id $SCAN_ID \
  --include-chain \
  --include-anchors \
  --output audit-bundle.zip

# Export all scans in an audit period
stella proof export-batch \
  --from 2025-01-01 \
  --to 2025-01-31 \
  --output-dir ./audit-jan-2025/

5. Troubleshooting

5.1 Score Mismatch After Replay

Symptom: Replayed score differs from original.

Diagnosis:

# Compare manifests
stella score diff --scan-id $SCAN_ID --original --replayed

# Check for feed changes
stella score manifest --scan-id $SCAN_ID | jq '.concelierSnapshotHash'

# Compare input hashes
stella score inputs --scan-id $SCAN_ID --hash

Common causes:

Feed snapshot changed: Original used different advisory data
Policy updated: Scoring rules changed between runs
VEX statements added: New VEX data affects scores
Non-deterministic run: Check that the manifest contains `deterministic: true`

Resolution:

# Replay with exact original snapshots
stella score replay --scan-id $SCAN_ID --use-original-snapshots

5.2 Proof Verification Failed

Symptom: Verification returns `verified: false`.

Diagnosis:

# Get detailed error
stella proof verify --bundle-id sha256:proof123... --verbose 2>&1 | head -50

# Common errors:
# - "Signature verification failed": Key mismatch or tampering
# - "ID recomputation failed": Canonical JSON issue
# - "Merkle path invalid": Proof chain corrupted
# - "Rekor entry not found": Not logged to transparency log

Resolution by error type:

Error	Cause	Resolution
Signature failed	Key rotated	Use correct trust anchor
ID mismatch	Content modified	Re-generate proof
Merkle invalid	Partial upload	Re-download bundle
Rekor missing	Log lag or skip	Wait or verify offline
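When many failures need triage, the table above can be applied mechanically. The sketch below reads `stella proof verify --verbose` output on stdin, matches the four error strings listed in this section, and prints the table's suggested resolution. It assumes those messages appear verbatim in the output. If several errors appear, the first matching case wins. `suggest_resolution` is a hypothetical helper.

```bash
# Hypothetical triage helper: map the verification error messages listed in
# this runbook to the table's resolution. Reads verifier output on stdin.
suggest_resolution() {
  local output
  output=$(cat)
  case "$output" in
    *"Signature verification failed"*) echo "Use correct trust anchor" ;;
    *"ID recomputation failed"*)       echo "Re-generate proof" ;;
    *"Merkle path invalid"*)           echo "Re-download bundle" ;;
    *"Rekor entry not found"*)         echo "Wait or verify offline" ;;
    *) echo "No known error matched: inspect verbose output"; return 1 ;;
  esac
}
```

Usage: `stella proof verify --bundle-id sha256:proof123... --verbose 2>&1 | suggest_resolution`.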

5.3 Missing Proof Bundle

Symptom: Proof bundle not found.

Diagnosis:

# Check if scan exists
stella scan status --scan-id $SCAN_ID

# Check proof generation status
stella proof status --scan-id $SCAN_ID

# Check if proof was generated
stella proof list --scan-id $SCAN_ID

Common causes:

Scan still in progress: Proofs are generated only after the scan completes
Proof generation failed: Check worker logs
Archived to cold storage: Needs retrieval
Retention expired: Proof deleted per policy

5.4 Replay Performance Issues

Symptom: Replay taking too long.

Diagnosis:

# Check replay queue depth
stella scheduler queue status replay

# Check worker health
stella scanner workers status

# Check for resource constraints
kubectl top pods -l app=scanner-worker

Optimization:

# Reduce parallelism during peak hours
stella scheduler job update nightly-score-replay \
  --config.maxConcurrency=5

# Skip unchanged scans
stella score replay --scan-id $SCAN_ID --skip-unchanged

6. Monitoring & Alerting

6.1 Key Metrics

Metric	Description	Alert Threshold
`score_replay_duration_seconds`	Time to replay a score	> 30s
`proof_verification_success_rate`	% of successful verifications	< 99%
`proof_bundle_size_bytes`	Size of proof bundles	> 100MB
`replay_queue_depth`	Pending replay jobs	> 1000
`proof_generation_failures`	Failed proof generations	> 0/hour

6.2 Grafana Dashboard

Dashboard: Score Proofs Operations
Panels:
- Replay throughput (replays/minute)
- Replay latency (p50, p95, p99)
- Verification success rate
- Proof bundle storage usage
- Queue depth over time

6.3 Alerting Rules

# Prometheus alerting rules
groups:
  - name: score-proofs
    rules:
      - alert: ReplayLatencyHigh
        expr: histogram_quantile(0.95, sum by (le) (rate(score_replay_duration_seconds_bucket[5m]))) > 30
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Score replay latency is high"
          
      - alert: ProofVerificationFailures
        expr: increase(proof_verification_failures_total[1h]) > 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Multiple proof verification failures detected"
          
      - alert: ReplayQueueBacklog
        expr: replay_queue_depth > 1000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Score replay queue backlog is growing"

7. Escalation Procedures

7.1 Escalation Matrix

Severity	Condition	Response Time	Escalation Path
P1	Proof verification failing for all scans	15 min	On-call → Team Lead → VP Eng
P2	Replay failures > 10%	1 hour	On-call → Team Lead
P3	Replay latency > 60s p95	4 hours	On-call
P4	Queue backlog > 5000	24 hours	Ticket

7.2 P1 Response Procedure

Acknowledge alert in PagerDuty

Triage:

# Check service health
stella health check --service scanner
stella health check --service attestor

# Check recent changes
kubectl rollout history deployment/scanner-worker

Mitigate:

# If recent deployment, rollback
kubectl rollout undo deployment/scanner-worker

# If key rotation issue, restore previous anchor
stella anchor restore --anchor-id anchor-001 --revision previous

Communicate: Update status page, notify stakeholders
Resolve: Fix root cause, verify fix
Postmortem: Document incident within 48 hours
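During triage, the sketch below runs the documented `stella health check --service` command for every service passed to it. It reports all unhealthy services together rather than stopping at the first failure. It assumes `stella health check` exits non-zero for an unhealthy service. `check_services` is a hypothetical helper.

```bash
# Hypothetical P1 triage helper: run the documented health check for each
# service argument and report every failure in one line.
# Assumption: `stella health check` exits non-zero for an unhealthy service.
check_services() {
  local unhealthy="" svc
  for svc in "$@"; do
    if ! stella health check --service "$svc" >/dev/null 2>&1; then
      unhealthy="$unhealthy $svc"
    fi
  done
  if [ -n "$unhealthy" ]; then
    echo "unhealthy:$unhealthy"
    return 1
  fi
  echo "all healthy"
}
```

Usage: `check_services scanner attestor`.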

7.3 Contact Information

Role	Contact	Availability
On-Call Engineer	PagerDuty `scanner-oncall`	24/7
Scanner Team Lead	@scanner-lead	Business hours
Security Team	security@stellaops.local	Business hours
VP Engineering	@vp-eng	Escalation only
