docs(ops): Complete operations runbooks for Epic 3500
Sprint 3500.0004.0004 (Documentation & Handoff) - T2 DONE

Operations Runbooks Added:
- score-replay-runbook.md: Deterministic replay procedures
- proof-verification-runbook.md: DSSE/Merkle verification ops
- airgap-operations-runbook.md: Offline kit management

CLI Reference Docs:
- reachability-cli-reference.md
- score-proofs-cli-reference.md
- unknowns-cli-reference.md

Air-Gap Guides:
- score-proofs-reachability-airgap-runbook.md

Training Materials:
- score-proofs-concept-guide.md

UI API Clients:
- proof.client.ts
- reachability.client.ts
- unknowns.client.ts

All 5 operations runbooks now complete (reachability, unknowns-queue,
score-replay, proof-verification, airgap-operations).
688
docs/operations/airgap-operations-runbook.md
Normal file
@@ -0,0 +1,688 @@
# Air-Gap Operations Runbook

> **Version**: 1.0.0
> **Sprint**: 3500.0004.0004
> **Last Updated**: 2025-12-20

This runbook covers operational procedures for running StellaOps in air-gapped (offline) environments, including offline kit management, feed updates, and isolated verification workflows.

---

## Table of Contents

1. [Overview](#1-overview)
2. [Offline Kit Management](#2-offline-kit-management)
3. [Feed Updates](#3-feed-updates)
4. [Scanning in Air-Gap Mode](#4-scanning-in-air-gap-mode)
5. [Verification in Air-Gap Mode](#5-verification-in-air-gap-mode)
6. [Troubleshooting](#6-troubleshooting)
7. [Monitoring & Health Checks](#7-monitoring--health-checks)
8. [Escalation Procedures](#8-escalation-procedures)

---

## 1. Overview

### What is Air-Gap Mode?

Air-gap mode allows StellaOps to operate in environments with no external network connectivity. This is required for:

- Classified or sensitive environments
- High-security facilities
- Regulatory compliance (certain industries)
- Disaster recovery scenarios

### Air-Gap Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                    Connected Environment                    │
│    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    │
│    │ Feed Sync   │───►│   Bundle    │───►│ Offline Kit │    │
│    │ Service     │    │  Generator  │    │  (.tar.gz)  │    │
│    └─────────────┘    └─────────────┘    └──────┬──────┘    │
└─────────────────────────────────────────────────┼───────────┘
                                                  │ Physical
                                                  │ Transfer
┌─────────────────────────────────────────────────┼───────────┐
│             Air-Gapped Environment              │           │
│   ┌──────────────┐    ┌─────────────┐    ┌──────▼──────┐    │
│   │  StellaOps   │◄───│   Offline   │◄───│   Import    │    │
│   │   Scanner    │    │ Data Store  │    │   Service   │    │
│   └──────────────┘    └─────────────┘    └─────────────┘    │
└─────────────────────────────────────────────────────────────┘
```

### Offline Kit Contents

| Component | Description | Update Frequency |
|-----------|-------------|------------------|
| Vulnerability Database | NVD, OSV, vendor advisories | Daily/Weekly |
| Advisory Feeds | CVE details, EPSS scores | Daily |
| Trust Bundles | CA certificates, signing keys | Quarterly |
| Rules Engine | Scoring rules and policies | Monthly |
| Offline Binaries | CLI tools, extractors | Per release |

---

## 2. Offline Kit Management

### 2.1 Generating an Offline Kit

On the connected system:

```bash
# Generate full offline kit
stella offline-kit create \
  --output /path/to/offline-kit.tar.gz \
  --include-all

# Generate minimal kit (feeds only)
stella offline-kit create \
  --output /path/to/offline-kit-feeds.tar.gz \
  --feeds-only

# Generate with specific components
stella offline-kit create \
  --output /path/to/offline-kit.tar.gz \
  --include vuln-db \
  --include advisories \
  --include trust-bundles \
  --include rules
```

### 2.2 Kit Manifest

Each kit includes a manifest for verification:

```json
{
  "version": "1.0.0",
  "createdAt": "2025-01-15T00:00:00Z",
  "expiresAt": "2025-02-15T00:00:00Z",
  "components": {
    "vulnerability-database": {
      "hash": "sha256:abc123...",
      "size": 1073741824,
      "records": 245000
    },
    "advisory-feeds": {
      "hash": "sha256:def456...",
      "size": 536870912,
      "lastUpdate": "2025-01-15T00:00:00Z"
    },
    "trust-bundles": {
      "hash": "sha256:ghi789...",
      "size": 65536,
      "certificates": 12
    },
    "signing-keys": {
      "hash": "sha256:jkl012...",
      "keyIds": ["key-001", "key-002"]
    }
  },
  "signature": "base64-signature..."
}
```
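
The `createdAt` and `expiresAt` fields bound the kit's validity window, so the remaining lifetime can be cross-checked without any StellaOps tooling. The following is a minimal sketch under stated assumptions: it needs GNU `date`, and the function name `days_between` is illustrative and not part of the `stella` CLI.

```bash
# Hypothetical helper (not part of the stella CLI).
# Prints the whole number of days from timestamp $1 to timestamp $2.
# Both arguments are ISO-8601 UTC strings such as the manifest's
# createdAt / expiresAt values. Requires GNU date (-d).
days_between() {
  local from_s to_s
  from_s=$(date -u -d "$1" +%s) || return 2
  to_s=$(date -u -d "$2" +%s) || return 2
  # Integer division truncates toward zero, so partial days are dropped.
  echo $(( (to_s - from_s) / 86400 ))
}
```

With the sample manifest above, `days_between 2025-01-15T00:00:00Z 2025-02-15T00:00:00Z` prints `31`. To measure from the present, pass `"$(date -u +%Y-%m-%dT%H:%M:%SZ)"` as the first argument; a negative result means the second timestamp lies at least a full day in the past.
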

### 2.3 Transferring to Air-Gapped Environment

#### Physical Media Transfer

```bash
# On connected system - write to media
cp offline-kit.tar.gz /media/secure-usb/
sha256sum offline-kit.tar.gz > /media/secure-usb/offline-kit.tar.gz.sha256

# On air-gapped system - verify and import
cd /media/secure-usb/
sha256sum -c offline-kit.tar.gz.sha256
stella offline-kit import --kit offline-kit.tar.gz
```
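
When the commands above are pasted as a sequence, the import still runs if the checksum step fails. One way to make the import conditional on the checksum is sketched below. The helper name `import_if_verified` is hypothetical; it assumes the `.sha256` file sits next to the kit, as written in the transfer step, and it relies only on GNU `sha256sum`.

```bash
# Hypothetical helper: run a command only if <kit>.sha256 verifies.
# Exit status: 1 = checksum mismatch, 2 = checksum file missing,
# otherwise the exit status of the command that was run.
import_if_verified() {
  local kit="$1" dir base
  shift
  dir=$(dirname -- "$kit")
  base=$(basename -- "$kit")
  if [ ! -f "$kit.sha256" ]; then
    echo "ERROR: missing checksum file $kit.sha256" >&2
    return 2
  fi
  # sha256sum -c resolves file names relative to the current directory,
  # so run it from the kit's directory in a subshell.
  if ! (cd -- "$dir" && sha256sum -c --quiet "$base.sha256"); then
    echo "ERROR: checksum mismatch for $kit; not importing" >&2
    return 1
  fi
  # Checksum verified: run the remaining arguments as the import command.
  "$@"
}
```

A call would look like `import_if_verified /media/secure-usb/offline-kit.tar.gz stella offline-kit import --kit /media/secure-usb/offline-kit.tar.gz`. The checksum only detects transfer corruption; the kit signature check (`--verify`) is still needed to establish authenticity.
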

#### Secure File Transfer (if available)

```bash
# Via a one-way transfer gateway. scp itself needs a return channel,
# so a true data diode requires its own unidirectional transfer tooling.
scp offline-kit.tar.gz airgap-gateway:/incoming/
```

### 2.4 Installing Offline Kit

```bash
# Import and install kit
stella offline-kit import \
  --kit /path/to/offline-kit.tar.gz \
  --verify \
  --install

# Verify installation
stella offline-kit status

# List installed components
stella offline-kit list
```

Expected output:
```
Offline Kit Status
══════════════════════════════════════════
Mode:         AIR-GAP
Kit Version:  1.0.0
Installed At: 2025-01-15T10:30:00Z
Expires At:   2025-02-15T00:00:00Z

Components:
  ✓ vulnerability-database  2025-01-15  245,000 records
  ✓ advisory-feeds          2025-01-15  Active
  ✓ trust-bundles           2025-01-15  12 certificates
  ✓ signing-keys            2025-01-15  2 keys
  ✓ rules-engine            2025-01-15  v2.3.0

Health: HEALTHY
Days Until Expiry: 31
```

---

## 3. Feed Updates

### 3.1 Update Workflow

```
┌────────────────────────────────────────────────────────────┐
│                 Weekly Feed Update Process                 │
├────────────────────────────────────────────────────────────┤
│  Day 1: Generate kit on connected system                   │
│  Day 2: Security review and approval                       │
│  Day 3: Transfer to air-gapped environment                 │
│  Day 4: Import and verify                                  │
│  Day 5: Activate new feeds                                 │
└────────────────────────────────────────────────────────────┘
```

### 3.2 Generating Delta Updates

For faster updates, generate delta kits:

```bash
# On connected system
stella offline-kit create \
  --output delta-kit.tar.gz \
  --delta-from 2025-01-08 \
  --delta-to 2025-01-15

# Delta kit is smaller, contains only changes
```

### 3.3 Applying Updates

```bash
# Import delta update
stella offline-kit import \
  --kit delta-kit.tar.gz \
  --delta \
  --verify

# Verify feed freshness
stella feeds status
```

### 3.4 Rollback Procedure

If an update causes issues:

```bash
# List available snapshots
stella offline-kit snapshots

# Rollback to previous version
stella offline-kit rollback --to 2025-01-08

# Verify rollback
stella feeds status
```

---

## 4. Scanning in Air-Gap Mode

### 4.1 Enabling Air-Gap Mode

```bash
# Enable air-gap mode
stella config set mode air-gap

# Verify mode
stella config get mode
# Output: air-gap
```

### 4.2 Running Scans

```bash
# Scan a local image (no registry pull)
stella scan image --local /path/to/image.tar

# Scan from local registry
stella scan image localhost:5000/myapp:v1.0

# Scan with offline feeds explicitly
stella scan image myapp:v1.0 --offline-feeds
```

### 4.3 Local Image Preparation

For images that need to be scanned:

```bash
# On connected system - save image
docker save myapp:v1.0 -o myapp-v1.0.tar
sha256sum myapp-v1.0.tar > myapp-v1.0.tar.sha256

# Transfer to air-gapped system
# ... physical transfer ...

# On air-gapped system - verify and load
sha256sum -c myapp-v1.0.tar.sha256
docker load -i myapp-v1.0.tar
stella scan image myapp:v1.0 --local
```

### 4.4 SBOM Generation

```bash
# Generate SBOM for local image
stella sbom generate --image myapp:v1.0 --output sbom.json

# Scan existing SBOM
stella scan sbom --file sbom.json
```

---

## 5. Verification in Air-Gap Mode

### 5.1 Offline Proof Verification

```bash
# Verify proof bundle offline
stella proof verify --bundle bundle.tar.gz --offline

# Verify with explicit trust store
stella proof verify --bundle bundle.tar.gz \
  --offline \
  --trust-store /etc/stellaops/offline/trust-roots.json
```

### 5.2 Preparing Trust Store

Before air-gapped deployment:

```bash
# On connected system - export trust configuration
stella trust export \
  --output trust-roots.json \
  --include-ca \
  --include-signing-keys

# Transfer to air-gapped system
# ... physical transfer ...

# On air-gapped system - import trust
stella trust import --file trust-roots.json
```

### 5.3 Score Replay Offline

```bash
# Replay score using offline data
stella score replay --scan $SCAN_ID --offline

# Replay with frozen time
stella score replay --scan $SCAN_ID \
  --offline \
  --freeze 2025-01-15T00:00:00Z
```

---

## 6. Troubleshooting

### 6.1 Kit Import Fails

**Symptoms**: `Failed to import offline kit`

**Diagnostic Steps**:

1. Verify kit integrity:
   ```bash
   sha256sum offline-kit.tar.gz
   # Compare with the checksum recorded in offline-kit.tar.gz.sha256
   ```

2. Check kit signature:
   ```bash
   stella offline-kit verify --kit offline-kit.tar.gz
   ```

3. Check disk space:
   ```bash
   df -h /var/lib/stellaops/
   ```

**Common Causes**:

| Cause | Resolution |
|-------|------------|
| Corrupted transfer | Re-transfer, verify checksum |
| Invalid signature | Regenerate kit with valid signing key |
| Insufficient space | Free disk space or expand volume |
| Expired kit | Generate fresh kit |

### 6.2 Stale Feed Data

**Symptoms**: Scans report old vulnerabilities, miss new CVEs

**Diagnostic Steps**:

1. Check feed age:
   ```bash
   stella feeds status
   ```

2. Verify last update:
   ```bash
   stella offline-kit status | grep "Installed At"
   ```

**Resolution**:
- Generate and import fresh offline kit
- Establish regular update schedule
- Set up expiry alerts

### 6.3 Trust Verification Fails

**Symptoms**: `Certificate chain verification failed` in offline mode

**Diagnostic Steps**:

1. Check trust store:
   ```bash
   stella trust list
   ```

2. Verify CA bundle:
   ```bash
   openssl verify -CAfile /etc/stellaops/offline/ca-bundle.pem \
     /path/to/certificate.pem
   ```

3. Check for expired roots:
   ```bash
   stella trust check-expiry
   ```

**Resolution**:
- Update trust bundle in offline kit
- Import new CA certificates
- Rotate expired signing keys

### 6.4 Network Access Attempted

**Symptoms**: Air-gapped system attempts network connection

**Diagnostic Steps**:

1. Check mode configuration:
   ```bash
   stella config get mode
   ```

2. Audit network attempts:
   ```bash
   journalctl -u stellaops | grep -i "network\|connect\|http"
   ```

3. Verify no external URLs in config:
   ```bash
   grep -r "http" /etc/stellaops/
   ```
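
A plain `grep -r "http"` also matches loopback endpoints such as the local registry at `localhost:5000`. As a rough refinement, the sketch below lists only URLs whose host is not a loopback name. The function name `find_external_urls` and the loopback list are assumptions made for illustration; extend the filter with any internal hosts that are legitimately reachable.

```bash
# Hypothetical helper: print unique http(s) URLs found under a directory,
# excluding loopback hosts. Empty output means nothing external was found.
find_external_urls() {
  local dir="$1"
  # -r recurse, -E extended regex, -h no file names, -o matched text only
  grep -rEho 'https?://[^[:space:]"'\'']+' -- "$dir" 2>/dev/null \
    | grep -vE '^https?://(localhost|127\.0\.0\.1|\[::1\])([:/]|$)' \
    | sort -u
}
```

Running `find_external_urls /etc/stellaops/` and reviewing each remaining line narrows the audit to entries that could actually leave the host.
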

**Resolution**:
- Ensure `mode: air-gap` in configuration
- Remove any hardcoded URLs
- Block outbound traffic at firewall level

---

## 7. Monitoring & Health Checks

### 7.1 Air-Gap Health Checks

```bash
# Run comprehensive health check
stella health check --air-gap

# Output
Air-Gap Health Check
══════════════════════════════════════════
✓ Mode: air-gap
✓ Feed Freshness: 7 days old (OK)
✓ Trust Store: Valid
✓ Signing Keys: 2 active
✓ Disk Space: 45% used
✓ Database: Healthy
⚠ Kit Expiry: 24 days remaining

Overall: HEALTHY (1 warning)
```

### 7.2 Automated Monitoring Script

```bash
#!/bin/bash
# /etc/stellaops/scripts/airgap-health.sh
# Exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL

set -e

# Check feed age (critical)
FEED_AGE=$(stella feeds age --days)
if [ "$FEED_AGE" -gt 14 ]; then
  echo "CRITICAL: Feeds are $FEED_AGE days old"
  exit 2
fi

# Check trust store (critical). Runs before the warning-level check
# so that a warning cannot mask a critical failure.
if ! stella trust verify --quiet; then
  echo "CRITICAL: Trust store verification failed"
  exit 2
fi

# Check kit expiry (warning)
DAYS_LEFT=$(stella offline-kit days-until-expiry)
if [ "$DAYS_LEFT" -lt 7 ]; then
  echo "WARNING: Kit expires in $DAYS_LEFT days"
  exit 1
fi

echo "OK: Air-gap health check passed"
exit 0
```
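
The script signals its result through exit codes 0, 1, and 2, which matches the common Nagios plugin convention (OK, WARNING, CRITICAL). A scheduler or wrapper can translate those codes into labels. The sketch below is illustrative only; `status_label` and `run_check` are hypothetical names, not StellaOps tooling.

```bash
# Map a health-check exit code to a status label.
status_label() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    *) echo "UNKNOWN" ;;
  esac
}

# Run a check command, capture its output, and print one summary line:
#   <LABEL> (exit <code>)[: <captured output>]
# Returns the check's own exit code so callers can still branch on it.
run_check() {
  local msg rc=0
  msg=$("$@" 2>&1) || rc=$?
  echo "$(status_label "$rc") (exit $rc)${msg:+: $msg}"
  return "$rc"
}
```

For example, `run_check /etc/stellaops/scripts/airgap-health.sh` prints a single line suitable for a log or a monitoring agent, and any code outside 0 to 2 (such as a crash of the script itself) is reported as `UNKNOWN` rather than silently treated as healthy.
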

### 7.3 Metrics for Air-Gap

| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| `offline_kit_age_days` | Days since kit import | > 14 days |
| `offline_kit_expiry_days` | Days until kit expires | < 7 days |
| `feed_freshness_days` | Age of vulnerability feeds | > 7 days |
| `trust_store_valid` | Trust store validity (0/1) | = 0 |
| `disk_usage_percent` | Data store disk usage | > 80% |

### 7.4 Alert Configuration

```yaml
# /etc/stellaops/alerts/airgap.yaml
alerts:
  - name: offline_kit_expiring
    condition: offline_kit_expiry_days < 7
    severity: warning
    message: "Offline kit expires in {{ .Value }} days"

  - name: offline_kit_expired
    condition: offline_kit_expiry_days <= 0
    severity: critical
    message: "Offline kit has expired"

  # Warning threshold matches the metrics table and feeds.warnAgeDays (7)
  - name: feeds_stale
    condition: feed_freshness_days > 7
    severity: warning
    message: "Vulnerability feeds are {{ .Value }} days old"
```

---

## 8. Escalation Procedures

### 8.1 Escalation Matrix

| Severity | Condition | Response Time | Action |
|----------|-----------|---------------|--------|
| P1 - Critical | Offline kit expired | 4 hours | Emergency kit transfer |
| P1 - Critical | Trust store invalid | 4 hours | Restore from backup |
| P2 - High | Feeds > 14 days old | 24 hours | Schedule kit update |
| P3 - Medium | Kit expiring in < 7 days | 48 hours | Plan kit update |
| P4 - Low | Minor health check warnings | Next maintenance | Review and address |

### 8.2 Emergency Kit Update Process

When a kit expires before the scheduled update:

1. **On Connected System** (0-2 hours):
   ```bash
   stella offline-kit create --output emergency-kit.tar.gz --include-all
   ```

2. **Security Review** (2-4 hours):
   - Verify kit signature
   - Check for known vulnerabilities in kit
   - Get approval for transfer

3. **Transfer** (4-6 hours):
   - Physical media preparation
   - Chain of custody documentation
   - Transfer to air-gapped environment

4. **Import** (6-8 hours):
   ```bash
   stella offline-kit import --kit emergency-kit.tar.gz --verify --install
   ```

### 8.3 Contacts

| Role | Contact | Availability |
|------|---------|--------------|
| Air-Gap Operations | airgap-ops@stellaops.io | Business hours |
| Security Team | security@stellaops.io | Business hours |
| Platform On-Call | platform-oncall@stellaops.io | 24/7 |

---

## Appendix A: Configuration Reference

### Air-Gap Mode Configuration

```yaml
# /etc/stellaops/config.yaml
mode: air-gap

offline:
  dataDir: /var/lib/stellaops/offline
  feedsDir: /var/lib/stellaops/offline/feeds
  trustStore: /etc/stellaops/offline/trust-roots.json
  caBundle: /etc/stellaops/offline/ca-bundle.pem

  # Disable all external network calls
  disableNetworking: true

  # Kit expiry settings
  kit:
    expiryWarningDays: 7
    maxAgeDays: 30

  # Feed freshness settings
  feeds:
    maxAgeDays: 14
    warnAgeDays: 7
```
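
The two feed thresholds imply three states: fresh, warning, and stale. How StellaOps itself evaluates them is not stated in this runbook, so the sketch below is an assumption: it uses strict greater-than comparisons, consistent with the `-gt 14` test in the monitoring script, and the function name `classify_feed_age` is hypothetical.

```bash
# Hypothetical helper: classify a feed age in days against two thresholds.
# Defaults mirror feeds.warnAgeDays (7) and feeds.maxAgeDays (14) above.
#   usage: classify_feed_age <age_days> [warn_days] [max_days]
classify_feed_age() {
  local age="$1" warn="${2:-7}" max="${3:-14}"
  if [ "$age" -gt "$max" ]; then
    echo "CRITICAL"
  elif [ "$age" -gt "$warn" ]; then
    echo "WARNING"
  else
    echo "OK"
  fi
}
```

With the defaults, an age of 7 days is still `OK`, 8 through 14 days is `WARNING`, and 15 days or more is `CRITICAL`. The age itself can come from `stella feeds age --days`, as in section 7.2.
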

### Environment Variables

```bash
# Air-gap specific
export STELLAOPS_MODE=air-gap
export STELLAOPS_OFFLINE_DATA_DIR=/var/lib/stellaops/offline
export STELLAOPS_DISABLE_NETWORKING=true

# Trust configuration
export STELLAOPS_TRUST_STORE=/etc/stellaops/offline/trust-roots.json
export STELLAOPS_CA_BUNDLE=/etc/stellaops/offline/ca-bundle.pem
```

---

## Appendix B: CLI Quick Reference

```bash
# Offline Kit Commands
stella offline-kit create --output <path>      # Generate kit
stella offline-kit import --kit <path>         # Import kit
stella offline-kit status                      # Show status
stella offline-kit verify --kit <path>         # Verify kit
stella offline-kit rollback --to <date>        # Rollback feeds

# Feed Commands (Air-Gap)
stella feeds status                            # Show feed status
stella feeds age --days                        # Get feed age

# Trust Commands
stella trust list                              # List trust roots
stella trust import --file <path>              # Import trust config
stella trust verify                            # Verify trust store
stella trust export --output <path>            # Export trust config

# Scanning (Air-Gap)
stella scan image --local <path>               # Scan local image
stella scan sbom --file <path>                 # Scan SBOM file

# Verification (Air-Gap)
stella proof verify --bundle <path> --offline  # Offline verification
stella score replay --scan <id> --offline      # Offline replay

# Health
stella health check --air-gap                  # Air-gap health check
stella config get mode                         # Check current mode
```

---

## Appendix C: Update Schedule Template

| Week | Activity | Owner | Notes |
|------|----------|-------|-------|
| 1 | Generate kit | Connected Ops | Monday |
| 1 | Security review | Security Team | Tuesday |
| 1 | Approval | Security Lead | Wednesday |
| 2 | Transfer | Air-Gap Ops | Monday |
| 2 | Import & verify | Air-Gap Ops | Tuesday |
| 2 | Activate | Air-Gap Ops | Wednesday |
| 2 | Validate scans | QA Team | Thursday |

---

## Revision History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-12-20 | Agent | Initial release |

630
docs/operations/proof-verification-runbook.md
Normal file
@@ -0,0 +1,630 @@
# Proof Verification Operations Runbook

> **Version**: 1.0.0
> **Sprint**: 3500.0004.0004
> **Last Updated**: 2025-12-20

This runbook covers operational procedures for Proof Verification, including DSSE signature validation, Merkle tree verification, transparency log checks, and offline verification workflows.

---

## Table of Contents

1. [Overview](#1-overview)
2. [Verification Operations](#2-verification-operations)
3. [Offline Verification](#3-offline-verification)
4. [Transparency Log Integration](#4-transparency-log-integration)
5. [Troubleshooting](#5-troubleshooting)
6. [Monitoring & Alerting](#6-monitoring--alerting)
7. [Escalation Procedures](#7-escalation-procedures)

---

## 1. Overview

### What is Proof Verification?

Proof Verification is the process of cryptographically validating that a scan result has not been tampered with and was produced by an authorized StellaOps instance. It involves:

- **DSSE Signature Verification**: Validate the signing envelope
- **Merkle Tree Verification**: Confirm the root hash matches the proof
- **Certificate Chain Validation**: Verify the signing certificate
- **Transparency Log Check**: Optional Rekor/Sigstore verification

### Verification Components

| Component | Purpose | Verification Type |
|-----------|---------|-------------------|
| DSSE Envelope | Contains signed payload | Signature validation |
| Merkle Proof | Cryptographic proof of inclusion | Hash verification |
| Certificate | Signing identity | Chain validation |
| Rekor Entry | Transparency log record | Log inclusion proof |

### Trust Model

```
┌─────────────────────────────────────────────────────────────┐
│                       Trust Hierarchy                       │
├─────────────────────────────────────────────────────────────┤
│  Root CA (Offline)                                          │
│    └── Intermediate CA                                      │
│          └── Signing Certificate (Scanner Instance)         │
│                └── DSSE Envelope                            │
│                      └── Proof Bundle                       │
│                            └── Manifest + Score             │
└─────────────────────────────────────────────────────────────┘
```

---

## 2. Verification Operations

### 2.1 Basic Proof Verification

#### Via CLI

```bash
# Verify a proof bundle file
stella proof verify --bundle bundle.tar.gz

# Verify with verbose output
stella proof verify --bundle bundle.tar.gz --verbose

# Verify and output as JSON
stella proof verify --bundle bundle.tar.gz --output json
```

#### Expected Output (Success)

```
Proof Verification Result
══════════════════════════════════════════
✓ DSSE Signature      VALID
✓ Merkle Root         VALID
✓ Certificate Chain   VALID
✓ Not Expired         VALID
──────────────────────────────────────────
Overall: VERIFIED

Root Hash:   sha256:abc123...
Signed By:   scanner-prod-01.stellaops.local
Signed At:   2025-01-15T10:30:00Z
Valid Until: 2026-01-15T10:30:00Z
```

#### Expected Output (Failure)

```
Proof Verification Result
══════════════════════════════════════════
✓ DSSE Signature      VALID
✗ Merkle Root         INVALID
✓ Certificate Chain   VALID
✓ Not Expired         VALID
──────────────────────────────────────────
Overall: FAILED

Error: Merkle root mismatch
  Expected: sha256:abc123...
  Actual:   sha256:def456...
```

### 2.2 Verification via API

```bash
# Verify by scan ID
curl -X POST "https://scanner.stellaops.local/api/v1/scanner/scans/$SCAN_ID/proofs/$ROOT_HASH/verify" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json"

# Response
{
  "valid": true,
  "rootHash": "sha256:abc123...",
  "checks": [
    {"name": "dsse_signature", "passed": true, "message": "Signature valid"},
    {"name": "merkle_root", "passed": true, "message": "Root hash matches"},
    {"name": "certificate_chain", "passed": true, "message": "Chain valid"},
    {"name": "not_expired", "passed": true, "message": "Certificate not expired"}
  ],
  "verifiedAt": "2025-01-16T10:30:00Z"
}
```

### 2.3 Viewing Merkle Spine

The Merkle spine shows the path from leaf nodes to the root:

```bash
stella proof spine --bundle bundle.tar.gz
```

Output:
```
Merkle Tree Spine
══════════════════════════════════════════
Root: sha256:abc123...
├── sha256:node1... (sbom_hash)
├── sha256:node2... (rules_hash)
├── sha256:node3... (policy_hash)
└── sha256:node4... (feed_hash)

Depth: 3
Leaves: 4
Algorithm: SHA-256
```
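
To make the idea of a Merkle root concrete, the sketch below folds a list of leaf digests pairwise until one digest remains. It is a teaching aid under loud assumptions, not the StellaOps construction: this runbook does not specify leaf ordering, domain-separation prefixes (such as the `0x00`/`0x01` bytes used by RFC 6962), whether raw bytes or hex text are hashed, or how an odd node is handled. Here the concatenated hex text is hashed and an odd node is promoted unchanged. For real bundles, rely on `stella proof compute-root`.

```bash
# Parent digest of two hex digests (hashes the concatenated hex text).
hash_pair() {
  printf '%s%s' "$1" "$2" | sha256sum | cut -d' ' -f1
}

# Illustrative Merkle root over the hex digests given as arguments.
# Each pass replaces the current level with its parents; an unpaired
# last node is carried up unchanged.
merkle_root() {
  local count i left right last
  [ "$#" -gt 0 ] || return 1
  while [ "$#" -gt 1 ]; do
    count=$#
    i=0
    while [ "$i" -lt "$count" ]; do
      if [ $(( i + 1 )) -lt "$count" ]; then
        left=$1; right=$2; shift 2
        set -- "$@" "$(hash_pair "$left" "$right")"
        i=$(( i + 2 ))
      else
        last=$1; shift
        set -- "$@" "$last"
        i=$(( i + 1 ))
      fi
    done
  done
  printf '%s\n' "$1"
}
```

Changing or reordering any leaf changes the result, which is why a single modified input (SBOM, rules, policy, or feed hash) surfaces as a root mismatch during verification.
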

### 2.4 Certificate Inspection

```bash
# Extract and inspect certificate
tar -xzf bundle.tar.gz
openssl x509 -in bundle/certificate.pem -noout -text

# Check validity period
openssl x509 -in bundle/certificate.pem -noout -dates

# Verify against CA bundle
openssl verify -CAfile /etc/stellaops/ca-bundle.pem bundle/certificate.pem
```

---

## 3. Offline Verification

### 3.1 When to Use Offline Verification

- Air-gapped environments
- Network-restricted systems
- Compliance audits without API access
- Disaster recovery scenarios

### 3.2 Prerequisites for Offline Verification

Required files:
- Proof bundle (`.tar.gz`)
- CA certificate bundle (`ca-bundle.pem`)
- Trust root configuration (`trust-roots.json`)

```bash
# Prepare offline verification kit
stella proof offline-kit create \
  --output /path/to/offline-kit/ \
  --include-ca \
  --include-trust-roots
```

Kit contents:
```
offline-kit/
├── ca-bundle.pem      # Certificate authority chain
├── trust-roots.json   # Trusted signing keys
├── verify.sh          # Standalone verification script
└── README.md          # Instructions
```

### 3.3 Running Offline Verification

```bash
# Using CLI with offline flag
stella proof verify --bundle bundle.tar.gz --offline

# Using standalone script
./verify.sh bundle.tar.gz

# Using standalone script with an explicit CA bundle
./verify.sh bundle.tar.gz --ca-bundle ./ca-bundle.pem
```

### 3.4 Offline Verification Checks

| Check | Online | Offline | Notes |
|-------|--------|---------|-------|
| DSSE Signature | ✓ | ✓ | Local crypto |
| Merkle Root | ✓ | ✓ | Local hash computation |
| Certificate Chain | ✓ | ✓ | Requires CA bundle |
| Certificate Revocation | ✓ | ✗ | Needs CRL/OCSP |
| Rekor Transparency | ✓ | ✗ | Needs network |

### 3.5 Air-Gap Considerations

For fully air-gapped environments:

1. **Pre-stage CA bundle**:
   ```bash
   # On connected system
   stella ca export --output ca-bundle.pem

   # Transfer to air-gapped system
   scp ca-bundle.pem airgap:/etc/stellaops/
   ```

2. **Pre-stage CRL (optional)**:
   ```bash
   # Download latest CRL
   curl -o crl.pem https://ca.stellaops.io/crl/latest.pem

   # Transfer and use
   stella proof verify --bundle bundle.tar.gz --offline --crl crl.pem
   ```

---

## 4. Transparency Log Integration

### 4.1 Rekor Overview

StellaOps optionally publishes proof attestations to Sigstore Rekor for immutable transparency logging.

```
┌─────────────────────────────────────────────────────────────┐
│                      Transparency Flow                      │
├─────────────────────────────────────────────────────────────┤
│  Proof Bundle                                               │
│       │                                                     │
│       ▼                                                     │
│  DSSE Envelope ──────► Rekor ──────► Inclusion Proof        │
│       │                  │                                  │
│       ▼                  ▼                                  │
│  Local Verify      Log Entry ID                             │
└─────────────────────────────────────────────────────────────┘
```

### 4.2 Checking Rekor Entry

```bash
# Verify with Rekor check
stella proof verify --bundle bundle.tar.gz --check-rekor

# Get Rekor entry details
stella proof rekor-entry --bundle bundle.tar.gz
```

Output:
```
Rekor Entry
══════════════════════════════════════════
Log Index:  12345678
Entry UUID: 24296fb24b8ad77a...
Log ID:     c0d23d6ad406973f...
Integrated: 2025-01-15T10:30:05Z

Inclusion Proof:
  Root Hash: sha256:rekor-root...
  Tree Size: 98765432
  Hashes:    [sha256:a1b2..., sha256:c3d4...]

Verification: ✓ INCLUDED
```

### 4.3 Manual Rekor Verification

```bash
# Using rekor-cli (the key material is an X.509 certificate,
# so the PKI format is set explicitly instead of the pgp default)
rekor-cli verify --artifact bundle.tar.gz \
  --signature bundle/dsse-envelope.json \
  --pki-format x509 \
  --public-key bundle/certificate.pem

# Search for entries
rekor-cli search --sha sha256:abc123...
```

### 4.4 When Rekor is Unavailable

If Rekor is temporarily unavailable:

1. Verification still succeeds for DSSE and Merkle checks
2. Rekor check is marked as "SKIPPED"
3. Re-verify later when Rekor is available

```bash
# Skip Rekor check
stella proof verify --bundle bundle.tar.gz --skip-rekor
```

---

## 5. Troubleshooting

### 5.1 DSSE Signature Invalid

**Symptoms**: `DSSE signature verification failed`

**Diagnostic Steps**:

1. Extract and inspect envelope:
   ```bash
   tar -xzf bundle.tar.gz
   cat bundle/dsse-envelope.json | jq .
   ```

2. Check payload type:
   ```bash
   cat bundle/dsse-envelope.json | jq -r '.payloadType'
   # Expected: application/vnd.stellaops.proof+json
   ```

3. Verify signature format (`-r` strips the JSON quotes so that `base64 -d` receives raw Base64):
   ```bash
   cat bundle/dsse-envelope.json | jq -r '.signatures[0].sig' | base64 -d | xxd | head
   ```

**Common Causes**:

| Cause | Resolution |
|-------|------------|
| Corrupted bundle | Re-download from API |
| Wrong public key | Check trust roots configuration |
| Signature algorithm mismatch | Verify ECDSA-P256 or RSA support |
| Encoding issue | Check Base64 encoding |
|
||||
|
||||
### 5.2 Merkle Root Mismatch
|
||||
|
||||
**Symptoms**: `Merkle root does not match expected value`
|
||||
|
||||
**Diagnostic Steps**:
|
||||
|
||||
1. Recompute Merkle root locally:
|
||||
```bash
|
||||
stella proof compute-root --bundle bundle.tar.gz
|
||||
```
|
||||
|
||||
2. Compare manifest hashes:
|
||||
```bash
|
||||
cat bundle/manifest.json | jq '.hashes'
|
||||
```
|
||||
|
||||
3. Check for trailing whitespace or encoding:
|
||||
```bash
|
||||
sha256sum bundle/manifest.json
|
||||
```
|
||||
|
||||
**Resolution**:
|
||||
- Bundle may have been modified after signing
|
||||
- Re-export bundle from source system
|
||||
- If legitimate change, re-sign bundle
|
||||
|
||||
### 5.3 Certificate Chain Validation Failed
|
||||
|
||||
**Symptoms**: `Certificate chain verification failed`
|
||||
|
||||
**Diagnostic Steps**:
|
||||
|
||||
1. Check certificate expiry:
|
||||
```bash
|
||||
openssl x509 -in bundle/certificate.pem -noout -dates
|
||||
```
|
||||
|
||||
2. Verify chain:
|
||||
```bash
|
||||
openssl verify -verbose -CAfile /etc/stellaops/ca-bundle.pem bundle/certificate.pem
|
||||
```
|
||||
|
||||
3. Check for missing intermediates:
|
||||
```bash
|
||||
openssl x509 -in bundle/certificate.pem -noout -issuer
|
||||
```
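The expiry check in step 1 can be turned into a number that is comparable with the 30-day alert threshold used for monitoring. The following sketch parses the text printed by `openssl x509 -noout -dates` and returns the days remaining. The helper name is hypothetical; it relies only on the standard library's `ssl.cert_time_to_seconds`.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(openssl_dates: str, now: Optional[float] = None) -> float:
    """Parse `openssl x509 -noout -dates` output and return days remaining.

    A negative result means the certificate has already expired.
    """
    for line in openssl_dates.splitlines():
        if line.startswith("notAfter="):
            not_after = ssl.cert_time_to_seconds(line.split("=", 1)[1].strip())
            current = time.time() if now is None else now
            return (not_after - current) / 86400
    raise ValueError("no notAfter= line found")
```

Typical use is to pipe the `openssl` output into a small script and warn when the returned value drops below 30.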
|
||||
|
||||
**Common Errors**:
|
||||
|
||||
| Error | Cause | Fix |
|
||||
|-------|-------|-----|
|
||||
| `certificate has expired` | Cert past validity | Re-sign with valid cert |
|
||||
| `unable to get issuer certificate` | Missing intermediate | Update CA bundle |
|
||||
| `certificate revoked` | Key compromised | Use new signing key |
|
||||
| `self-signed certificate` | Wrong trust root | Import correct CA |
|
||||
|
||||
### 5.4 Bundle Extraction Fails
|
||||
|
||||
**Symptoms**: `Failed to extract bundle` or `Invalid archive format`
|
||||
|
||||
**Diagnostic Steps**:
|
||||
|
||||
1. Check file type:
|
||||
```bash
|
||||
file bundle.tar.gz
|
||||
```
|
||||
|
||||
2. Test archive integrity:
|
||||
```bash
|
||||
gzip -t bundle.tar.gz
|
||||
tar -tzf bundle.tar.gz
|
||||
```
|
||||
|
||||
3. Check for truncation:
|
||||
```bash
|
||||
ls -la bundle.tar.gz
|
||||
# Compare with expected size from API
|
||||
```
|
||||
|
||||
**Resolution**:
|
||||
- Re-download if corrupted
|
||||
- Check network transfer (use checksums)
|
||||
- Verify sufficient disk space
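The integrity and checksum checks above can also be scripted, which is useful when many bundles arrive over an unreliable transfer. This is a sketch under the assumption that bundles are gzip-compressed tar archives as shown in this runbook; both helper names are hypothetical.

```python
import hashlib
import tarfile
import zlib


def sha256_of(path: str) -> str:
    """Stream a file through SHA-256 so large bundles need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_readable(path: str) -> bool:
    """Read every member to the end; truncation or corruption raises."""
    try:
        with tarfile.open(path, "r:gz") as archive:
            for member in archive:
                if member.isfile():
                    extracted = archive.extractfile(member)
                    while extracted.read(1 << 20):
                        pass
        return True
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        return False
```

Compare the result of `sha256_of` with the checksum published by the source system before attempting verification; a readable archive with the wrong checksum was still modified in transit.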
|
||||
|
||||
---
|
||||
|
||||
## 6. Monitoring & Alerting
|
||||
|
||||
### 6.1 Key Metrics
|
||||
|
||||
| Metric | Description | Alert Threshold |
|
||||
|--------|-------------|-----------------|
|
||||
| `proof_verification_total` | Total verifications | Baseline |
|
||||
| `proof_verification_failures` | Failed verifications | > 5/hour |
|
||||
| `proof_verification_duration_ms` | Verification latency | p99 > 5s |
|
||||
| `certificate_expiry_days` | Days until cert expiry | < 30 days |
|
||||
| `rekor_verification_failures` | Rekor check failures | > 0 (warning) |
|
||||
|
||||
### 6.2 Grafana Queries
|
||||
|
||||
```promql
|
||||
# Verification success rate
|
||||
sum(rate(proof_verification_success_total[1h])) /
|
||||
sum(rate(proof_verification_total[1h])) * 100
|
||||
|
||||
# Verification latency
|
||||
histogram_quantile(0.99, rate(proof_verification_duration_ms_bucket[5m]))
|
||||
|
||||
# Certificate expiry countdown
|
||||
min(certificate_expiry_days) by (certificate_id)
|
||||
|
||||
# Failures by type
|
||||
sum by (failure_reason) (rate(proof_verification_failures_total[1h]))
|
||||
```
|
||||
|
||||
### 6.3 Alert Rules
|
||||
|
||||
```yaml
|
||||
groups:
  - name: proof-verification
    rules:
      - alert: ProofVerificationFailuresHigh
        expr: increase(proof_verification_failures_total[1h]) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: High proof verification failure rate

      - alert: SigningCertificateExpiringSoon
        expr: certificate_expiry_days < 30
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: Signing certificate expires in {{ $value }} days

      - alert: SigningCertificateExpired
        expr: certificate_expiry_days <= 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Signing certificate has expired
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 7. Escalation Procedures
|
||||
|
||||
### 7.1 Escalation Matrix
|
||||
|
||||
| Severity | Condition | Response Time | Escalate To |
|
||||
|----------|-----------|---------------|-------------|
|
||||
| P1 - Critical | Signing certificate expired | Immediate | Security Team + Platform Lead |
|
||||
| P1 - Critical | Mass verification failures | 15 minutes | Platform Team |
|
||||
| P2 - High | Rekor unavailable | 1 hour | Platform Team |
|
||||
| P3 - Medium | Single verification failure | 4 hours | Support Queue |
|
||||
| P4 - Low | Certificate expiring (>7 days) | Next sprint | Security Team |
|
||||
|
||||
### 7.2 P1: Certificate Expired Response
|
||||
|
||||
1. **Immediate Actions** (0-15 min):
   - Stop accepting new scans (if signing required)
   - Notify stakeholders
   - Begin emergency certificate rotation

2. **Certificate Rotation** (15-60 min):
   ```bash
   # Generate new certificate
   stella signer cert rotate --emergency

   # Verify new certificate
   stella signer cert show --current

   # Resume operations
   stella signer status
   ```

3. **Post-Incident**:
   - Implement certificate expiry monitoring
   - Schedule proactive rotations
   - Update runbooks
|
||||
|
||||
### 7.3 Contacts
|
||||
|
||||
| Role | Contact | Availability |
|
||||
|------|---------|--------------|
|
||||
| Security Team | security@stellaops.io | Business hours |
|
||||
| Platform On-Call | platform-oncall@stellaops.io | 24/7 |
|
||||
| Attestor Team | attestor-team@stellaops.io | Business hours |
|
||||
|
||||
---
|
||||
|
||||
## Appendix A: DSSE Envelope Format
|
||||
|
||||
```json
|
||||
{
|
||||
"payloadType": "application/vnd.stellaops.proof+json",
|
||||
"payload": "<base64-encoded-proof>",
|
||||
"signatures": [
|
||||
{
|
||||
"keyid": "sha256:signing-key-fingerprint",
|
||||
"sig": "<base64-encoded-signature>"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Payload Structure
|
||||
|
||||
```json
|
||||
{
|
||||
"_type": "https://stellaops.io/proof/v1",
|
||||
"subject": [
|
||||
{
|
||||
"name": "scan-123",
|
||||
"digest": {
|
||||
"sha256": "abc123..."
|
||||
}
|
||||
}
|
||||
],
|
||||
"predicateType": "https://stellaops.io/attestation/score/v1",
|
||||
"predicate": {
|
||||
"manifest": {
|
||||
"sbomHash": "sha256:...",
|
||||
"rulesHash": "sha256:...",
|
||||
"policyHash": "sha256:...",
|
||||
"feedHash": "sha256:..."
|
||||
},
|
||||
"score": 7.5,
|
||||
"rootHash": "sha256:...",
|
||||
"timestamp": "2025-01-15T10:30:00Z"
|
||||
}
|
||||
}
|
||||
```
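Before debugging a failed verification, it can save time to confirm that the decoded payload has the shape shown above. The sketch below decodes the envelope payload and lists structural problems. Treating all four manifest hashes and `rootHash` as required is an assumption drawn from this example, not a published schema, and the function name is hypothetical.

```python
import base64
import json

# Assumption: every manifest carries these four input hashes.
REQUIRED_MANIFEST_KEYS = ("sbomHash", "rulesHash", "policyHash", "feedHash")


def payload_problems(envelope: dict) -> list[str]:
    """Return a list of structural problems; an empty list means none found."""
    try:
        statement = json.loads(base64.b64decode(envelope["payload"]))
    except (KeyError, ValueError) as exc:
        return [f"payload is not base64-encoded JSON: {exc}"]
    if not isinstance(statement, dict):
        return ["payload is not a JSON object"]
    problems = []
    predicate = statement.get("predicate", {})
    manifest = predicate.get("manifest", {})
    for key in REQUIRED_MANIFEST_KEYS:
        if not str(manifest.get(key, "")).startswith("sha256:"):
            problems.append(f"manifest.{key} missing or not a sha256 digest")
    if "rootHash" not in predicate:
        problems.append("predicate.rootHash missing")
    if not statement.get("subject"):
        problems.append("subject list is empty")
    return problems
```

This only checks structure. It says nothing about whether the signature or the Merkle root is valid.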
|
||||
|
||||
---
|
||||
|
||||
## Appendix B: CLI Quick Reference
|
||||
|
||||
```bash
|
||||
# Verification Commands
|
||||
stella proof verify --bundle <path> # Verify bundle
|
||||
stella proof verify --bundle <path> --offline # Offline verification
|
||||
stella proof verify --bundle <path> --verbose # Detailed output
|
||||
stella proof verify --bundle <path> --check-rekor # Include Rekor check
|
||||
|
||||
# Inspection Commands
|
||||
stella proof spine --bundle <path> # Show Merkle tree
|
||||
stella proof show --bundle <path> # Show bundle contents
|
||||
stella proof rekor-entry --bundle <path> # Show Rekor entry
|
||||
|
||||
# Offline Kit
|
||||
stella proof offline-kit create --output <dir> # Create offline kit
|
||||
stella proof offline-kit verify --kit <dir> --bundle <path> # Use kit
|
||||
|
||||
# Certificate Commands
|
||||
stella signer cert show # Show current cert
|
||||
stella signer cert rotate # Rotate certificate
|
||||
stella signer cert export --output <path> # Export public cert
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Revision History
|
||||
|
||||
| Version | Date | Author | Changes |
|
||||
|---------|------|--------|---------|
|
||||
| 1.0.0 | 2025-12-20 | Agent | Initial release |
|
||||
518
docs/operations/score-replay-runbook.md
Normal file
@@ -0,0 +1,518 @@
|
||||
# Score Replay Operations Runbook
|
||||
|
||||
> **Version**: 1.0.0
|
||||
> **Sprint**: 3500.0004.0004
|
||||
> **Last Updated**: 2025-12-20
|
||||
|
||||
This runbook covers operational procedures for Score Replay, including deterministic score computation verification, proof bundle validation, and troubleshooting replay discrepancies.
|
||||
|
||||
---
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Overview](#1-overview)
|
||||
2. [Score Replay Operations](#2-score-replay-operations)
|
||||
3. [Determinism Verification](#3-determinism-verification)
|
||||
4. [Proof Bundle Management](#4-proof-bundle-management)
|
||||
5. [Troubleshooting](#5-troubleshooting)
|
||||
6. [Monitoring & Alerting](#6-monitoring--alerting)
|
||||
7. [Escalation Procedures](#7-escalation-procedures)
|
||||
|
||||
---
|
||||
|
||||
## 1. Overview
|
||||
|
||||
### What is Score Replay?
|
||||
|
||||
Score Replay is the ability to re-execute a vulnerability score computation using the exact same inputs (SBOM, rules, policies, feeds) that were used in the original scan. This provides:
|
||||
|
||||
- **Auditability**: Prove that a score was computed correctly
|
||||
- **Determinism verification**: Confirm that identical inputs produce identical outputs
|
||||
- **Compliance evidence**: Generate proof bundles for regulatory requirements
|
||||
- **Dispute resolution**: Verify contested scan results
|
||||
|
||||
### Key Concepts
|
||||
|
||||
| Term | Definition |
|
||||
|------|------------|
|
||||
| **Manifest** | Content-addressed record of all scoring inputs (SBOM hash, rules hash, policy hash, feed hash) |
|
||||
| **Proof Bundle** | Signed attestation containing manifest, score, and Merkle proof |
|
||||
| **Root Hash** | Merkle tree root computed from all input hashes |
|
||||
| **DSSE Envelope** | Dead Simple Signing Envelope containing the signed proof |
|
||||
| **Freeze Timestamp** | Optional timestamp to replay scoring at a specific point in time |
|
||||
|
||||
### Architecture Components
|
||||
|
||||
| Component | Purpose | Location |
|
||||
|-----------|---------|----------|
|
||||
| Score Engine | Computes vulnerability scores | Scanner Worker |
|
||||
| Manifest Store | Persists scoring manifests | `scanner.manifest` table |
|
||||
| Proof Chain | Generates Merkle proofs | Attestor library |
|
||||
| Signer | Signs proof bundles (DSSE) | Signer service |
|
||||
|
||||
---
|
||||
|
||||
## 2. Score Replay Operations
|
||||
|
||||
### 2.1 Triggering a Score Replay
|
||||
|
||||
#### Via CLI
|
||||
|
||||
```bash
|
||||
# Basic replay
|
||||
stella score replay --scan <scan-id>
|
||||
|
||||
# Replay with specific manifest
|
||||
stella score replay --scan <scan-id> --manifest-hash sha256:abc123...
|
||||
|
||||
# Replay with frozen timestamp (for determinism testing)
|
||||
stella score replay --scan <scan-id> --freeze 2025-01-15T00:00:00Z
|
||||
|
||||
# Output as JSON
|
||||
stella score replay --scan <scan-id> --output json
|
||||
```
|
||||
|
||||
#### Via API
|
||||
|
||||
```bash
|
||||
# POST /api/v1/scanner/score/{scanId}/replay
|
||||
curl -X POST "https://scanner.stellaops.local/api/v1/scanner/score/scan-123/replay" \
|
||||
-H "Authorization: Bearer $TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"manifestHash": "sha256:abc123...",
|
||||
"freezeTimestamp": "2025-01-15T00:00:00Z"
|
||||
}'
|
||||
```
|
||||
|
||||
#### Expected Response
|
||||
|
||||
```json
|
||||
{
|
||||
"scanId": "scan-123",
|
||||
"score": 7.5,
|
||||
"rootHash": "sha256:def456...",
|
||||
"bundleUri": "/api/v1/scanner/scans/scan-123/proofs/sha256:def456...",
|
||||
"manifestHash": "sha256:abc123...",
|
||||
"replayedAt": "2025-01-16T10:30:00Z",
|
||||
"deterministic": true
|
||||
}
|
||||
```
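For automation that cannot shell out to `curl`, the same request can be assembled with the Python standard library. This is a sketch only: it mirrors the endpoint and body fields shown above, and it assumes that both body fields are optional because the CLI allows a basic replay without them. The function name is hypothetical.

```python
import json
import urllib.request
from typing import Optional


def build_replay_request(base_url: str, scan_id: str, token: str,
                         manifest_hash: Optional[str] = None,
                         freeze_timestamp: Optional[str] = None
                         ) -> urllib.request.Request:
    """Build (but do not send) a POST request for the replay endpoint."""
    body = {}
    if manifest_hash:
        body["manifestHash"] = manifest_hash
    if freeze_timestamp:
        body["freezeTimestamp"] = freeze_timestamp
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/scanner/score/{scan_id}/replay",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(...)` and parsing the JSON body yields the response fields listed above, including `rootHash` and `deterministic`.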
|
||||
|
||||
### 2.2 Retrieving Proof Bundles
|
||||
|
||||
#### Via CLI
|
||||
|
||||
```bash
|
||||
# Get bundle for a scan
|
||||
stella score bundle --scan <scan-id>
|
||||
|
||||
# Download bundle to file
|
||||
stella score bundle --scan <scan-id> --output bundle.tar.gz
|
||||
```
|
||||
|
||||
#### Via API
|
||||
|
||||
```bash
|
||||
# GET /api/v1/scanner/score/{scanId}/bundle
|
||||
curl "https://scanner.stellaops.local/api/v1/scanner/score/scan-123/bundle" \
|
||||
-H "Authorization: Bearer $TOKEN" \
|
||||
-o bundle.tar.gz
|
||||
```
|
||||
|
||||
### 2.3 Verifying Score Integrity
|
||||
|
||||
#### Via CLI
|
||||
|
||||
```bash
|
||||
# Verify against expected root hash
|
||||
stella score verify --scan <scan-id> --root-hash sha256:def456...
|
||||
|
||||
# Verify downloaded bundle
|
||||
stella proof verify --bundle bundle.tar.gz
|
||||
```
|
||||
|
||||
#### Via API
|
||||
|
||||
```bash
|
||||
# POST /api/v1/scanner/score/{scanId}/verify
|
||||
curl -X POST "https://scanner.stellaops.local/api/v1/scanner/score/scan-123/verify" \
|
||||
-H "Authorization: Bearer $TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"expectedRootHash": "sha256:def456..."}'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Determinism Verification
|
||||
|
||||
### 3.1 What Affects Determinism?
|
||||
|
||||
Score computation is deterministic when:
|
||||
|
||||
| Input | Requirement |
|
||||
|-------|-------------|
|
||||
| SBOM | Identical content (same hash) |
|
||||
| Rules | Same rule version and configuration |
|
||||
| Policy | Same policy document |
|
||||
| Feeds | Same feed snapshot (freeze timestamp) |
|
||||
| Ordering | Findings sorted deterministically |
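The ordering requirement is the least obvious of these, so a small illustration may help. The sketch below hashes a list of findings in a way that ignores both list order and key order. It is not StellaOps's actual canonicalization, and the finding fields (`id`, `severity`) are hypothetical; it only shows why a stable sort and a canonical serialization are needed before hashing.

```python
import hashlib
import json


def findings_digest(findings: list[dict]) -> str:
    """Hash findings so that input order and key order do not matter."""
    canonical = sorted(
        # sort_keys and fixed separators give one byte form per finding.
        json.dumps(item, sort_keys=True, separators=(",", ":"))
        for item in findings
    )
    joined = "\n".join(canonical).encode("utf-8")
    return "sha256:" + hashlib.sha256(joined).hexdigest()
```

Without the `sorted(...)` step, two workers that emit the same findings in a different order would produce different digests, which is the "ordering differences" failure mode.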
|
||||
|
||||
### 3.2 Running Determinism Checks
|
||||
|
||||
```bash
|
||||
# Run replay twice and compare
REPLAY1=$(stella score replay --scan "$SCAN_ID" --output json)
REPLAY2=$(stella score replay --scan "$SCAN_ID" --output json)

# Extract root hashes
HASH1=$(echo "$REPLAY1" | jq -r '.rootHash')
HASH2=$(echo "$REPLAY2" | jq -r '.rootHash')

# Compare (an empty or null hash means a replay failed, not that it matched)
if [ -n "$HASH1" ] && [ "$HASH1" != "null" ] && [ "$HASH1" = "$HASH2" ]; then
  echo "✓ Determinism verified: $HASH1"
else
  echo "✗ Non-deterministic! $HASH1 != $HASH2"
  exit 1
fi
|
||||
```
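The same comparison extends to many scans at once. As a sketch, assuming each replay result has been collected as a JSON object with the `scanId` and `rootHash` fields shown in the expected response, the following hypothetical helper reports every scan whose replays disagree.

```python
from collections import defaultdict


def nondeterministic_scans(replays: list[dict]) -> dict[str, set[str]]:
    """Map each scan ID to its root hashes, keeping only scans with more than one."""
    seen = defaultdict(set)
    for replay in replays:
        seen[replay["scanId"]].add(replay["rootHash"])
    return {scan: hashes for scan, hashes in seen.items() if len(hashes) > 1}
```

An empty result means every scan produced a single root hash across its replays.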
|
||||
|
||||
### 3.3 Common Determinism Issues
|
||||
|
||||
| Issue | Cause | Resolution |
|
||||
|-------|-------|------------|
|
||||
| Different root hash | Feed data changed between replays | Use `--freeze` timestamp |
|
||||
| Score drift | Rule version mismatch | Pin rules version in manifest |
|
||||
| Ordering differences | Non-stable sort in findings | Check Scanner version (fixed in v2.1+) |
|
||||
| Timestamp in output | Current time in computation | Ensure frozen time mode |
|
||||
|
||||
### 3.4 Feed Freeze for Reproducibility
|
||||
|
||||
```bash
|
||||
# Replay with feed state frozen to original scan time
|
||||
stella score replay --scan "$SCAN_ID" \
  --freeze "$(stella scan show "$SCAN_ID" --output json | jq -r '.scannedAt')"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. Proof Bundle Management
|
||||
|
||||
### 4.1 Bundle Contents
|
||||
|
||||
A proof bundle (`.tar.gz`) contains:
|
||||
|
||||
```
|
||||
bundle/
|
||||
├── manifest.json # Input hashes and metadata
|
||||
├── score.json # Computed score and findings summary
|
||||
├── merkle-proof.json # Merkle tree with inclusion proofs
|
||||
├── dsse-envelope.json # Signed attestation (DSSE format)
|
||||
└── certificate.pem # Signing certificate (optional)
|
||||
```
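A quick completeness check against this layout catches incomplete exports before verification is attempted. The sketch below assumes the file names listed above and treats `certificate.pem` as optional, as noted in the listing. The function name is hypothetical.

```python
import tarfile

# Required file names from the bundle layout; certificate.pem is optional.
EXPECTED = {"manifest.json", "score.json", "merkle-proof.json", "dsse-envelope.json"}


def missing_bundle_files(path: str) -> set[str]:
    """Return required file names that the bundle archive does not contain."""
    with tarfile.open(path, "r:gz") as archive:
        present = {name.split("/")[-1] for name in archive.getnames()}
    return EXPECTED - present
```

An empty set means all required files are present; it does not mean their contents are valid.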
|
||||
|
||||
### 4.2 Inspecting Bundles
|
||||
|
||||
```bash
|
||||
# Extract and view manifest
|
||||
tar -xzf bundle.tar.gz
|
||||
cat bundle/manifest.json | jq .
|
||||
|
||||
# Verify DSSE signature
|
||||
stella proof verify --bundle bundle.tar.gz --verbose
|
||||
|
||||
# Check Merkle proof
|
||||
stella proof spine --bundle bundle.tar.gz
|
||||
```
|
||||
|
||||
### 4.3 Bundle Retention Policy
|
||||
|
||||
| Environment | Retention | Notes |
|
||||
|-------------|-----------|-------|
|
||||
| Production | 7 years | Regulatory compliance |
|
||||
| Staging | 90 days | Testing purposes |
|
||||
| Development | 30 days | Cleaned up automatically |
|
||||
|
||||
### 4.4 Archiving Bundles
|
||||
|
||||
```bash
|
||||
# Export bundle to long-term storage
|
||||
stella score bundle --scan $SCAN_ID --output /archive/proofs/$SCAN_ID.tar.gz
|
||||
|
||||
# Bulk export for compliance audit
|
||||
stella score bundle-export \
|
||||
--since 2024-01-01 \
|
||||
--until 2024-12-31 \
|
||||
--output /archive/2024-proofs/
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Troubleshooting
|
||||
|
||||
### 5.1 Replay Returns Different Score
|
||||
|
||||
**Symptoms**: Replayed score differs from original scan score.
|
||||
|
||||
**Diagnostic Steps**:
|
||||
|
||||
1. Check manifest integrity:
|
||||
```bash
|
||||
stella scan show $SCAN_ID --output json | jq '.manifest'
|
||||
```
|
||||
|
||||
2. Verify feed state:
|
||||
```bash
|
||||
# Replay with feeds frozen at the original scan time, then compare the manifest hash with the original
stella score replay --scan "$SCAN_ID" --freeze "$ORIGINAL_TIME" --output json | jq -r '.manifestHash'
|
||||
```
|
||||
|
||||
3. Check for rule updates:
|
||||
```bash
|
||||
stella rules show --version --output json
|
||||
```
|
||||
|
||||
**Resolution**:
|
||||
- Use `--freeze` timestamp matching original scan
|
||||
- Pin rule versions in policy
|
||||
- Regenerate manifest if inputs changed legitimately
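To see which input drifted, compare the input hashes of the original manifest with those of the replay. The sketch below assumes the manifest exposes the four hash fields named in the Key Concepts table under the camelCase names `sbomHash`, `rulesHash`, `policyHash`, and `feedHash`; those field names and the function name are assumptions.

```python
# Assumed manifest field names for the four scoring inputs.
INPUT_HASH_FIELDS = ("sbomHash", "rulesHash", "policyHash", "feedHash")


def changed_inputs(original: dict, replayed: dict) -> list[str]:
    """List the input hash fields whose values differ between two manifests."""
    return [field for field in INPUT_HASH_FIELDS
            if original.get(field) != replayed.get(field)]
```

A result of `["feedHash"]` points at feed drift and suggests replaying with `--freeze`; `["rulesHash"]` points at a rule version change.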
|
||||
|
||||
### 5.2 Proof Verification Fails
|
||||
|
||||
**Symptoms**: `stella proof verify` returns validation errors.
|
||||
|
||||
**Diagnostic Steps**:
|
||||
|
||||
1. Check DSSE signature:
|
||||
```bash
|
||||
stella proof verify --bundle bundle.tar.gz --verbose 2>&1 | grep -i signature
|
||||
```
|
||||
|
||||
2. Verify certificate validity:
|
||||
```bash
|
||||
openssl x509 -in bundle/certificate.pem -noout -dates
|
||||
```
|
||||
|
||||
3. Check Merkle proof:
|
||||
```bash
|
||||
stella proof spine --bundle bundle.tar.gz --verify
|
||||
```
|
||||
|
||||
**Common Errors**:
|
||||
|
||||
| Error | Cause | Fix |
|
||||
|-------|-------|-----|
|
||||
| `SIGNATURE_INVALID` | Bundle tampered or wrong key | Re-download bundle |
|
||||
| `CERTIFICATE_EXPIRED` | Signing cert expired | Check signing key rotation |
|
||||
| `MERKLE_MISMATCH` | Root hash doesn't match | Verify correct bundle version |
|
||||
| `MANIFEST_MISSING` | Incomplete bundle | Re-export from API |
|
||||
|
||||
### 5.3 Replay Timeout
|
||||
|
||||
**Symptoms**: Replay request times out or takes too long.
|
||||
|
||||
**Diagnostic Steps**:
|
||||
|
||||
1. Check scan size:
|
||||
```bash
|
||||
stella scan show $SCAN_ID --output json | jq '.findingsCount'
|
||||
```
|
||||
|
||||
2. Monitor replay progress:
|
||||
```bash
|
||||
stella score replay --scan $SCAN_ID --verbose
|
||||
```
|
||||
|
||||
**Resolution**:
|
||||
- For large scans (>10k findings), increase timeout
|
||||
- Check Scanner Worker health
|
||||
- Consider async replay for very large scans
|
||||
|
||||
### 5.4 Missing Manifest
|
||||
|
||||
**Symptoms**: `Manifest not found` error on replay.
|
||||
|
||||
**Diagnostic Steps**:
|
||||
|
||||
1. Verify scan exists:
|
||||
```bash
|
||||
stella scan show $SCAN_ID
|
||||
```
|
||||
|
||||
2. Check manifest table:
|
||||
```sql
|
||||
SELECT * FROM scanner.manifest WHERE scan_id = 'scan-123';
|
||||
```
|
||||
|
||||
**Resolution**:
|
||||
- Manifest may have been purged (check retention policy)
|
||||
- Restore from backup if available
|
||||
- Re-run scan if original inputs available
|
||||
|
||||
---
|
||||
|
||||
## 6. Monitoring & Alerting
|
||||
|
||||
### 6.1 Key Metrics
|
||||
|
||||
| Metric | Description | Alert Threshold |
|
||||
|--------|-------------|-----------------|
|
||||
| `score_replay_duration_ms` | Time to complete replay | p99 > 30s |
|
||||
| `score_replay_determinism_failures` | Non-deterministic replays | > 0 |
|
||||
| `proof_verification_failures` | Failed verifications | > 5/hour |
|
||||
| `manifest_storage_size_bytes` | Manifest table size | > 100GB |
|
||||
|
||||
### 6.2 Grafana Dashboard Queries
|
||||
|
||||
```promql
|
||||
# Replay latency
|
||||
histogram_quantile(0.99,
|
||||
rate(score_replay_duration_ms_bucket[5m])
|
||||
)
|
||||
|
||||
# Determinism failure rate
|
||||
rate(score_replay_determinism_failures_total[1h])
|
||||
|
||||
# Proof verification success rate
|
||||
sum(rate(proof_verification_success_total[1h])) /
|
||||
sum(rate(proof_verification_total[1h]))
|
||||
```
|
||||
|
||||
### 6.3 Alert Rules
|
||||
|
||||
```yaml
|
||||
groups:
  - name: score-replay
    rules:
      - alert: ScoreReplayLatencyHigh
        expr: histogram_quantile(0.99, rate(score_replay_duration_ms_bucket[5m])) > 30000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Score replay latency exceeds 30s at p99

      - alert: DeterminismFailure
        expr: increase(score_replay_determinism_failures_total[1h]) > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Non-deterministic score replay detected
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 7. Escalation Procedures
|
||||
|
||||
### 7.1 Escalation Matrix
|
||||
|
||||
| Severity | Condition | Response Time | Escalate To |
|
||||
|----------|-----------|---------------|-------------|
|
||||
| P1 - Critical | Determinism failure in production | 15 minutes | Platform Team Lead |
|
||||
| P2 - High | Proof verification failures > 10/hour | 1 hour | Scanner Team |
|
||||
| P3 - Medium | Replay latency degradation | 4 hours | Scanner Team |
|
||||
| P4 - Low | Single replay failure | Next business day | Support Queue |
|
||||
|
||||
### 7.2 P1: Determinism Failure Response
|
||||
|
||||
1. **Immediate Actions** (0-15 min):
   - Capture affected scan IDs
   - Preserve original manifest data
   - Check for recent deployments

2. **Investigation** (15-60 min):
   - Compare input hashes between replays
   - Check feed synchronization status
   - Review rule engine logs

3. **Remediation**:
   - Roll back if deployment-related
   - Freeze feeds if data drift
   - Hotfix if code bug identified
|
||||
|
||||
### 7.3 Contacts
|
||||
|
||||
| Role | Contact | Availability |
|
||||
|------|---------|--------------|
|
||||
| Scanner Team Lead | scanner-lead@stellaops.io | Business hours |
|
||||
| Platform On-Call | platform-oncall@stellaops.io | 24/7 |
|
||||
| Security Team | security@stellaops.io | Business hours |
|
||||
|
||||
---
|
||||
|
||||
## Appendix A: SQL Queries
|
||||
|
||||
### Check Manifest History
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
scan_id,
|
||||
manifest_hash,
|
||||
sbom_hash,
|
||||
rules_hash,
|
||||
policy_hash,
|
||||
feed_hash,
|
||||
created_at
|
||||
FROM scanner.manifest
|
||||
WHERE scan_id = 'scan-123'
|
||||
ORDER BY created_at DESC;
|
||||
```
|
||||
|
||||
### Find Non-Deterministic Replays
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
scan_id,
|
||||
COUNT(DISTINCT root_hash) as unique_hashes,
|
||||
MIN(replayed_at) as first_replay,
|
||||
MAX(replayed_at) as last_replay
|
||||
FROM scanner.replay_log
|
||||
GROUP BY scan_id
|
||||
HAVING COUNT(DISTINCT root_hash) > 1;
|
||||
```
|
||||
|
||||
### Proof Bundle Statistics
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
DATE_TRUNC('day', created_at) as day,
|
||||
COUNT(*) as bundles_created,
|
||||
AVG(bundle_size_bytes) as avg_size,
|
||||
SUM(bundle_size_bytes) as total_size
|
||||
FROM scanner.proof_bundle
|
||||
WHERE created_at > NOW() - INTERVAL '30 days'
|
||||
GROUP BY DATE_TRUNC('day', created_at)
|
||||
ORDER BY day DESC;
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Appendix B: CLI Quick Reference
|
||||
|
||||
```bash
|
||||
# Score Replay Commands
|
||||
stella score replay --scan <id> # Replay score computation
|
||||
stella score replay --scan <id> --freeze <ts> # Replay with frozen time
|
||||
stella score bundle --scan <id> # Get proof bundle
|
||||
stella score verify --scan <id> --root-hash <hash> # Verify score
|
||||
|
||||
# Proof Commands
|
||||
stella proof verify --bundle <path> # Verify bundle file
|
||||
stella proof verify --bundle <path> --offline # Offline verification
|
||||
stella proof spine --bundle <path> # Show Merkle spine
|
||||
|
||||
# Output Formats
|
||||
--output json # JSON output
|
||||
--output table # Table output (default)
|
||||
--output yaml # YAML output
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Revision History
|
||||
|
||||
| Version | Date | Author | Changes |
|
||||
|---------|------|--------|---------|
|
||||
| 1.0.0 | 2025-12-20 | Agent | Initial release |
|
||||