409 lines
8.4 KiB
Markdown
409 lines
8.4 KiB
Markdown
# Sprint: SPRINT_20260117_029_Runbook_coverage_expansion
|
|
# Task: RUN-003 - Evidence Locker Runbook
|
|
# Evidence Locker Operations Runbook
|
|
|
|
Status: PRODUCTION-READY (2026-01-17 UTC)
|
|
|
|
## Scope
|
|
Evidence locker operations including storage management, integrity verification, attestation management, provenance chain maintenance, and disaster recovery procedures.
|
|
|
|
---
|
|
|
|
## Pre-flight Checklist
|
|
|
|
### Environment Verification
|
|
```bash
|
|
# Check evidence locker health
|
|
stella doctor --category evidence
|
|
|
|
# Verify storage accessibility
|
|
stella evidence status
|
|
|
|
# Check index health
|
|
stella evidence index status
|
|
|
|
# Verify anchor chain
|
|
stella evidence anchor verify --latest
|
|
```
|
|
|
|
### Metrics to Watch
|
|
- `stella_evidence_artifacts_total` - Total artifacts stored
|
|
- `stella_evidence_retrieval_latency_seconds` - Retrieval latency P99
|
|
- `stella_evidence_storage_bytes` - Storage consumption
|
|
- `stella_merkle_anchor_age_seconds` - Time since last anchor
|
|
|
|
---
|
|
|
|
## Standard Procedures
|
|
|
|
### SP-001: Daily Integrity Check
|
|
|
|
**Frequency:** Daily (automated) or on-demand
|
|
**Duration:** Varies by locker size (typically 5-30 minutes)
|
|
|
|
1. Run integrity verification:
|
|
```bash
|
|
# Quick check (sample-based)
|
|
stella evidence verify --mode quick
|
|
|
|
# Full check (all artifacts)
|
|
stella evidence verify --mode full
|
|
```
|
|
|
|
2. Review results:
|
|
```bash
|
|
stella evidence verify-report --latest
|
|
```
|
|
|
|
3. Address any failures:
|
|
```bash
|
|
# List failed artifacts
|
|
stella evidence verify-report --latest --filter failed
|
|
```
|
|
|
|
### SP-002: Index Maintenance
|
|
|
|
**Frequency:** Weekly or after large ingestion
|
|
**Duration:** ~10 minutes
|
|
|
|
1. Check index health:
|
|
```bash
|
|
stella evidence index status
|
|
```
|
|
|
|
2. Refresh index if needed:
|
|
```bash
|
|
# Incremental refresh
|
|
stella evidence index refresh
|
|
|
|
# Full rebuild (if corruption suspected)
|
|
stella evidence index rebuild
|
|
```
|
|
|
|
3. Optimize index:
|
|
```bash
|
|
stella evidence index optimize
|
|
```
|
|
|
|
### SP-003: Merkle Anchoring
|
|
|
|
**Frequency:** Per policy (default: every 6 hours)
|
|
**Duration:** ~2 minutes
|
|
|
|
1. Create new anchor:
|
|
```bash
|
|
stella evidence anchor create
|
|
```
|
|
|
|
2. Verify anchor chain:
|
|
```bash
|
|
stella evidence anchor verify --all
|
|
```
|
|
|
|
3. Export anchor for external archival:
|
|
```bash
|
|
stella evidence anchor export --latest --output anchor-$(date +%Y%m%dT%H%M%S).json
|
|
```
|
|
|
|
### SP-004: Storage Cleanup
|
|
|
|
**Frequency:** Monthly or when storage alerts trigger
|
|
**Duration:** Varies
|
|
|
|
1. Review storage usage:
|
|
```bash
|
|
stella evidence storage stats
|
|
```
|
|
|
|
2. Apply retention policy:
|
|
```bash
|
|
# Dry run first
|
|
stella evidence cleanup --apply-retention --dry-run
|
|
|
|
# Execute cleanup
|
|
stella evidence cleanup --apply-retention
|
|
```
|
|
|
|
3. Archive old evidence (if required):
|
|
```bash
|
|
stella evidence archive --older-than 365d --output /archive/evidence-$(date +%Y).tar
|
|
```
|
|
|
|
---
|
|
|
|
## Incident Procedures
|
|
|
|
### INC-001: Integrity Verification Failure
|
|
|
|
**Symptoms:**
|
|
- Alert: `StellaEvidenceIntegrityFailure`
|
|
- Verification reports hash mismatch
|
|
|
|
**Investigation:**
|
|
```bash
|
|
# Get failure details
|
|
stella evidence verify-report --latest --filter failed --format json > /tmp/integrity-failures.json
|
|
|
|
# Check specific artifact
|
|
stella evidence inspect <artifact-id>
|
|
|
|
# Check provenance
|
|
stella evidence provenance show <artifact-id>
|
|
```
|
|
|
|
**Resolution:**
|
|
|
|
1. **Isolated corruption:**
|
|
```bash
|
|
# Attempt recovery from replica (if available)
|
|
stella evidence recover --id <artifact-id> --source replica
|
|
|
|
# If no replica, mark as corrupted
|
|
stella evidence mark-corrupted --id <artifact-id> --reason "hash-mismatch"
|
|
```
|
|
|
|
2. **Widespread corruption:**
|
|
- Stop evidence ingestion
|
|
- Identify corruption extent
|
|
- Restore from backup if necessary
|
|
- Escalate to L3
|
|
|
|
3. **False positive (software bug):**
|
|
- Verify with multiple hash implementations
|
|
- Check for recent software updates
|
|
- Report bug if confirmed
|
|
|
|
### INC-002: Evidence Retrieval Failure
|
|
|
|
**Symptoms:**
|
|
- Alert: `StellaEvidenceRetrievalFailed`
|
|
- API returning 404 for known artifacts
|
|
|
|
**Investigation:**
|
|
```bash
|
|
# Check if artifact exists
|
|
stella evidence exists <artifact-id>
|
|
|
|
# Check index
|
|
stella evidence index lookup <artifact-id>
|
|
|
|
# Check storage backend
|
|
stella evidence storage check <artifact-id>
|
|
```
|
|
|
|
**Resolution:**
|
|
|
|
1. **Index corruption:**
|
|
```bash
|
|
# Rebuild index
|
|
stella evidence index rebuild
|
|
```
|
|
|
|
2. **Storage backend issue:**
|
|
```bash
|
|
# Check storage health
|
|
stella doctor --check check.storage.evidencelocker
|
|
|
|
# Verify storage connectivity
|
|
stella evidence storage test
|
|
```
|
|
|
|
3. **File system issue:**
|
|
- Check disk health
|
|
- Verify file permissions
|
|
- Check mount status
|
|
|
|
### INC-003: Anchor Chain Break
|
|
|
|
**Symptoms:**
|
|
- Alert: `StellaMerkleAnchorChainBroken`
|
|
- Anchor verification fails
|
|
|
|
**Investigation:**
|
|
```bash
|
|
# Check anchor chain
|
|
stella evidence anchor verify --all --verbose
|
|
|
|
# Find break point
|
|
stella evidence anchor list --show-links
|
|
|
|
# Inspect specific anchor
|
|
stella evidence anchor inspect <anchor-id>
|
|
```
|
|
|
|
**Resolution:**
|
|
|
|
1. **Single broken link:**
|
|
```bash
|
|
# Attempt to recover from backup
|
|
stella evidence anchor recover --id <anchor-id> --source backup
|
|
```
|
|
|
|
2. **Multiple breaks:**
|
|
- Stop new anchoring
|
|
- Assess extent of damage
|
|
- Restore from backup or rebuild chain
|
|
|
|
3. **Create new chain segment:**
|
|
```bash
|
|
# Start new chain (preserves old chain as archived)
|
|
stella evidence anchor new-chain --reason "chain-break-recovery"
|
|
```
|
|
|
|
### INC-004: Storage Full
|
|
|
|
**Symptoms:**
|
|
- Alert: `StellaEvidenceStorageFull`
|
|
- Ingestion failing
|
|
|
|
**Immediate Actions:**
|
|
```bash
|
|
# Check storage usage
|
|
stella evidence storage stats
|
|
|
|
# Emergency cleanup of temporary files
|
|
stella evidence cleanup --temp-only
|
|
|
|
# Find large/old artifacts
|
|
stella evidence storage analyze --sort size --limit 20
|
|
```
|
|
|
|
**Resolution:**
|
|
|
|
1. **Apply retention policy:**
|
|
```bash
|
|
stella evidence cleanup --apply-retention --aggressive
|
|
```
|
|
|
|
2. **Archive old evidence:**
|
|
```bash
|
|
stella evidence archive --older-than 180d --compress
|
|
```
|
|
|
|
3. **Expand storage:**
|
|
- Follow cloud provider procedure
|
|
- Or add additional storage volume
|
|
|
|
---
|
|
|
|
## Disaster Recovery
|
|
|
|
### DR-001: Full Evidence Locker Recovery
|
|
|
|
**Prerequisites:**
|
|
- Backup available
|
|
- Target storage provisioned
|
|
- Recovery environment ready
|
|
|
|
**Procedure:**
|
|
|
|
1. Provision new storage:
|
|
```bash
|
|
stella evidence storage provision --size <size>
|
|
```
|
|
|
|
2. Restore from backup:
|
|
```bash
|
|
# List available backups
|
|
stella backup list --type evidence-locker
|
|
|
|
# Restore
|
|
stella evidence restore --backup-id <backup-id> --target /var/lib/stellaops/evidence
|
|
```
|
|
|
|
3. Verify restoration:
|
|
```bash
|
|
stella evidence verify --mode full
|
|
stella evidence anchor verify --all
|
|
```
|
|
|
|
4. Update service configuration:
|
|
```bash
|
|
stella config set EvidenceLocker:Path /var/lib/stellaops/evidence
|
|
stella service restart
|
|
```
|
|
|
|
### DR-002: Point-in-Time Recovery
|
|
|
|
For recovering to a specific point in time:
|
|
|
|
1. Identify target anchor:
|
|
```bash
|
|
stella evidence anchor list --before <timestamp>
|
|
```
|
|
|
|
2. Restore to that point:
|
|
```bash
|
|
stella evidence restore --to-anchor <anchor-id>
|
|
```
|
|
|
|
3. Verify integrity:
|
|
```bash
|
|
stella evidence verify --mode full --to-anchor <anchor-id>
|
|
```
|
|
|
|
---
|
|
|
|
## Offline Mode Operations
|
|
|
|
### Preparing Offline Evidence Pack
|
|
|
|
```bash
|
|
# Export evidence for specific artifact
|
|
stella evidence export --digest <artifact-digest> --output evidence-pack.tar.gz
|
|
|
|
# Export with all dependencies
|
|
stella evidence export --digest <artifact-digest> --include-deps --output evidence-full.tar.gz
|
|
```
|
|
|
|
### Verifying Evidence Offline
|
|
|
|
```bash
|
|
# Verify evidence pack without network
|
|
stella evidence verify --offline --input evidence-pack.tar.gz
|
|
|
|
# Replay verdict using evidence
|
|
stella replay --evidence evidence-pack.tar.gz --output verdict.json
|
|
```
|
|
|
|
---
|
|
|
|
## Monitoring Dashboard
|
|
|
|
Access: Grafana → Dashboards → Stella Ops → Evidence Locker
|
|
|
|
Key panels:
|
|
- Artifact ingestion rate
|
|
- Retrieval latency
|
|
- Storage utilization trend
|
|
- Integrity check status
|
|
- Anchor chain health
|
|
|
|
---
|
|
|
|
## Evidence Capture
|
|
|
|
For any incident:
|
|
```bash
|
|
stella evidence diagnostics --output /tmp/evidence-diag-$(date +%Y%m%dT%H%M%S).tar.gz
|
|
```
|
|
|
|
Bundle includes:
|
|
- Index status
|
|
- Storage stats
|
|
- Recent anchor chain
|
|
- Integrity check results
|
|
- Operation audit log
|
|
|
|
---
|
|
|
|
## Escalation Path
|
|
|
|
1. **L1 (On-call):** Standard procedures, cleanup operations
|
|
2. **L2 (Platform team):** Index rebuild, anchor issues
|
|
3. **L3 (Architecture):** Chain recovery, DR procedures
|
|
|
|
---
|
|
|
|
_Last updated: 2026-01-17 (UTC)_
|