Sprint: SPRINT_20260117_029_Runbook_coverage_expansion
Task: RUN-003 - Evidence Locker Runbook
Evidence Locker Operations Runbook
Status: PRODUCTION-READY (2026-01-17 UTC)
Scope
This runbook covers evidence locker operations: storage management, integrity verification, attestation management, provenance chain maintenance, and disaster recovery procedures.
Pre-flight Checklist
Environment Verification
# Check evidence locker health
stella doctor --category evidence
# Verify storage accessibility
stella evidence status
# Check index health
stella evidence index status
# Verify anchor chain
stella evidence anchor verify --latest
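The four checks above can be chained so that the first failure is reported by name. This is a minimal sketch, not part of the shipped tooling: the function name `preflight` is hypothetical, and it assumes each `stella` command exits non-zero when its check fails.

```shell
# Run the pre-flight checks in order and stop at the first failure.
# Assumption: each stella command exits non-zero when its check fails.
preflight() {
    stella doctor --category evidence      || { echo "FAILED: locker health" >&2; return 1; }
    stella evidence status                 || { echo "FAILED: storage accessibility" >&2; return 1; }
    stella evidence index status           || { echo "FAILED: index health" >&2; return 1; }
    stella evidence anchor verify --latest || { echo "FAILED: anchor chain" >&2; return 1; }
    echo "pre-flight checks passed"
}
```

Calling `preflight` before any procedure in this runbook gives a single pass/fail gate.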
Metrics to Watch
- stella_evidence_artifacts_total - Total artifacts stored
- stella_evidence_retrieval_latency_seconds - Retrieval latency P99
- stella_evidence_storage_bytes - Storage consumption
- stella_merkle_anchor_age_seconds - Time since last anchor
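As an illustration of how the anchor-age metric might be interpreted: SP-003 gives a default anchoring interval of 6 hours, so an age above 21600 seconds suggests a missed anchor. The sketch below is an assumption about a reasonable threshold, not a shipped alert rule; `anchor_is_stale` is a hypothetical helper, and how the metric value is fetched is left out.

```shell
# Succeed (exit 0) when the anchor age exceeds the allowed interval.
# $1: age in whole seconds, e.g. the current value of stella_merkle_anchor_age_seconds
# $2: maximum allowed age in seconds (assumed default: 21600 = 6 hours)
anchor_is_stale() {
    [ "$1" -gt "${2:-21600}" ]
}
```

For example, `anchor_is_stale 30000 && echo "anchor overdue"` prints the warning, because 30000 is greater than 21600.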
Standard Procedures
SP-001: Daily Integrity Check
Frequency: Daily (automated) or on-demand
Duration: Varies by locker size (typically 5-30 minutes)
- Run integrity verification:
  # Quick check (sample-based)
  stella evidence verify --mode quick
  # Full check (all artifacts)
  stella evidence verify --mode full
- Review results:
  stella evidence verify-report --latest
- Address any failures:
  # List failed artifacts
  stella evidence verify-report --latest --filter failed
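For the automated daily run, the steps of SP-001 could be wrapped as follows. This is a sketch under one assumption: that `stella evidence verify` exits non-zero when any artifact fails verification. The function name `daily_integrity_check` is hypothetical.

```shell
# Quick integrity check; on failure, print the failed artifacts from the latest report.
# Assumption: `stella evidence verify` exits non-zero when any artifact fails.
daily_integrity_check() {
    if stella evidence verify --mode quick; then
        echo "integrity check passed"
        return 0
    fi
    echo "integrity check failed; failed artifacts:" >&2
    stella evidence verify-report --latest --filter failed
    return 1
}
```

The non-zero return value lets a scheduler or CI job flag the run and hand off to INC-001.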
SP-002: Index Maintenance
Frequency: Weekly or after large ingestion
Duration: ~10 minutes
- Check index health:
  stella evidence index status
- Refresh index if needed:
  # Incremental refresh
  stella evidence index refresh
  # Full rebuild (if corruption suspected)
  stella evidence index rebuild
- Optimize index:
  stella evidence index optimize
SP-003: Merkle Anchoring
Frequency: Per policy (default: every 6 hours)
Duration: ~2 minutes
- Create new anchor:
  stella evidence anchor create
- Verify anchor chain:
  stella evidence anchor verify --all
- Export anchor for external archival:
  stella evidence anchor export --latest --output anchor-$(date +%Y%m%dT%H%M%S).json
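The export file name above uses `date` without `-u`, so the timestamp is in the host's local time zone. Since this runbook records its own dates in UTC, a UTC timestamp may be less ambiguous when archives from several hosts are collected. The helper below is a hypothetical sketch; the trailing `Z` is a naming choice, not something the tooling requires.

```shell
# Print the current time as a compact UTC timestamp, e.g. for export file names.
utc_stamp() {
    date -u +%Y%m%dT%H%M%SZ
}
```

It would be used as `stella evidence anchor export --latest --output "anchor-$(utc_stamp).json"`.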
SP-004: Storage Cleanup
Frequency: Monthly or when storage alerts trigger
Duration: Varies
- Review storage usage:
  stella evidence storage stats
- Apply retention policy:
  # Dry run first
  stella evidence cleanup --apply-retention --dry-run
  # Execute cleanup
  stella evidence cleanup --apply-retention
- Archive old evidence (if required):
  stella evidence archive --older-than 365d --output /archive/evidence-$(date +%Y).tar
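Because cleanup deletes data, the "dry run first" step in SP-004 is worth enforcing rather than leaving to memory. One possible way to do that is sketched below; `cleanup_with_confirmation` is a hypothetical wrapper, and the prompt text and return codes are illustrative choices.

```shell
# Show the dry-run result, then apply retention only after explicit confirmation.
cleanup_with_confirmation() {
    stella evidence cleanup --apply-retention --dry-run || return 1
    printf 'Apply retention policy for real? [y/N] '
    read -r answer || answer=""
    case "$answer" in
        y|Y) stella evidence cleanup --apply-retention ;;
        *)   echo "aborted; nothing deleted"; return 2 ;;
    esac
}
```

Any answer other than `y` or `Y`, including end of input, leaves the locker untouched.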
Incident Procedures
INC-001: Integrity Verification Failure
Symptoms:
- Alert: StellaEvidenceIntegrityFailure
- Verification reports hash mismatch
Investigation:
# Get failure details
stella evidence verify-report --latest --filter failed --format json > /tmp/integrity-failures.json
# Check specific artifact
stella evidence inspect <artifact-id>
# Check provenance
stella evidence provenance show <artifact-id>
Resolution:
- Isolated corruption:
  # Attempt recovery from replica (if available)
  stella evidence recover --id <artifact-id> --source replica
  # If no replica, mark as corrupted
  stella evidence mark-corrupted --id <artifact-id> --reason "hash-mismatch"
- Widespread corruption:
  - Stop evidence ingestion
  - Identify corruption extent
  - Restore from backup if necessary
  - Escalate to L3
- False positive (software bug):
  - Verify with multiple hash implementations
  - Check for recent software updates
  - Report bug if confirmed
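For the isolated-corruption case, the "recover from replica, otherwise mark as corrupted" decision can be expressed as a single fallback. This is a sketch only: `recover_or_mark` is a hypothetical helper, and it assumes `stella evidence recover` exits non-zero when no usable replica exists.

```shell
# Try to recover one artifact from a replica; if that fails, mark it as corrupted.
# Assumption: `stella evidence recover` exits non-zero when no usable replica exists.
recover_or_mark() {
    artifact_id=$1
    if stella evidence recover --id "$artifact_id" --source replica; then
        echo "recovered $artifact_id from replica"
    else
        stella evidence mark-corrupted --id "$artifact_id" --reason "hash-mismatch" || return 1
        echo "marked $artifact_id as corrupted"
    fi
}
```

This is only appropriate once widespread corruption and a false positive have been ruled out.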
INC-002: Evidence Retrieval Failure
Symptoms:
- Alert: StellaEvidenceRetrievalFailed
- API returning 404 for known artifacts
Investigation:
# Check if artifact exists
stella evidence exists <artifact-id>
# Check index
stella evidence index lookup <artifact-id>
# Check storage backend
stella evidence storage check <artifact-id>
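The three investigation commands above probe different layers, so running them in order narrows down which resolution path applies. A minimal sketch, assuming each command exits non-zero when its check fails; `retrieval_triage` and its return codes are hypothetical.

```shell
# Run the investigation checks in order and name the first one that fails.
# Return codes (illustrative): 1 = exists check, 2 = index lookup, 3 = storage check.
retrieval_triage() {
    artifact_id=$1
    stella evidence exists "$artifact_id"        || { echo "exists check failed"; return 1; }
    stella evidence index lookup "$artifact_id"  || { echo "index lookup failed"; return 2; }
    stella evidence storage check "$artifact_id" || { echo "storage check failed"; return 3; }
    echo "all checks passed for $artifact_id"
}
```

A failed index lookup points toward the index rebuild, and a failed storage check toward the storage backend or file system steps.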
Resolution:
- Index corruption:
  # Rebuild index
  stella evidence index rebuild
- Storage backend issue:
  # Check storage health
  stella doctor --check check.storage.evidencelocker
  # Verify storage connectivity
  stella evidence storage test
- File system issue:
  - Check disk health
  - Verify file permissions
  - Check mount status
INC-003: Anchor Chain Break
Symptoms:
- Alert: StellaMerkleAnchorChainBroken
- Anchor verification fails
Investigation:
# Check anchor chain
stella evidence anchor verify --all --verbose
# Find break point
stella evidence anchor list --show-links
# Inspect specific anchor
stella evidence anchor inspect <anchor-id>
Resolution:
- Single broken link:
  # Attempt to recover from backup
  stella evidence anchor recover --id <anchor-id> --source backup
- Multiple breaks:
  - Stop new anchoring
  - Assess extent of damage
  - Restore from backup or rebuild chain
- Create new chain segment:
  # Start new chain (preserves old chain as archived)
  stella evidence anchor new-chain --reason "chain-break-recovery"
INC-004: Storage Full
Symptoms:
- Alert: StellaEvidenceStorageFull
- Ingestion failing
Immediate Actions:
# Check storage usage
stella evidence storage stats
# Emergency cleanup of temporary files
stella evidence cleanup --temp-only
# Find large/old artifacts
stella evidence storage analyze --sort size --limit 20
Resolution:
- Apply retention policy:
  stella evidence cleanup --apply-retention --aggressive
- Archive old evidence:
  stella evidence archive --older-than 180d --compress
- Expand storage:
  - Follow cloud provider procedure
  - Or add additional storage volume
Disaster Recovery
DR-001: Full Evidence Locker Recovery
Prerequisites:
- Backup available
- Target storage provisioned
- Recovery environment ready
Procedure:
- Provision new storage:
  stella evidence storage provision --size <size>
- Restore from backup:
  # List available backups
  stella backup list --type evidence-locker
  # Restore
  stella evidence restore --backup-id <backup-id> --target /var/lib/stellaops/evidence
- Verify restoration:
  stella evidence verify --mode full
  stella evidence anchor verify --all
- Update service configuration:
  stella config set EvidenceLocker:Path /var/lib/stellaops/evidence
  stella service restart
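The ordering in DR-001 matters: the service should only be pointed at the restored data after both verification steps pass. The sketch below enforces that ordering for the restore, verify, and configuration steps (provisioning is left manual). `restore_and_switch` and its return codes are hypothetical, and it assumes the `stella` commands exit non-zero on failure.

```shell
# Restore a backup, verify it, and only then repoint the service at the restored path.
# $1: backup id (from `stella backup list --type evidence-locker`)
# $2: target path (defaults to the path used in this runbook)
restore_and_switch() {
    backup_id=$1
    target=${2:-/var/lib/stellaops/evidence}
    stella evidence restore --backup-id "$backup_id" --target "$target" || return 1
    stella evidence verify --mode full  || { echo "verification failed; config unchanged" >&2; return 2; }
    stella evidence anchor verify --all || { echo "anchor chain invalid; config unchanged" >&2; return 2; }
    stella config set EvidenceLocker:Path "$target" || return 3
    stella service restart
}
```

If either verification fails, the function stops before touching the service configuration, so the running service keeps its previous path.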
DR-002: Point-in-Time Recovery
For recovering to a specific point in time:
- Identify target anchor:
  stella evidence anchor list --before <timestamp>
- Restore to that point:
  stella evidence restore --to-anchor <anchor-id>
- Verify integrity:
  stella evidence verify --mode full --to-anchor <anchor-id>
Offline Mode Operations
Preparing Offline Evidence Pack
# Export evidence for specific artifact
stella evidence export --digest <artifact-digest> --output evidence-pack.tar.gz
# Export with all dependencies
stella evidence export --digest <artifact-digest> --include-deps --output evidence-full.tar.gz
Verifying Evidence Offline
# Verify evidence pack without network
stella evidence verify --offline --input evidence-pack.tar.gz
# Replay verdict using evidence
stella replay --evidence evidence-pack.tar.gz --output verdict.json
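In an offline workflow it seems sensible to replay a verdict only from a pack that has just passed verification. The sketch below chains the two commands above; `verify_then_replay` is a hypothetical helper, and it assumes `stella evidence verify --offline` exits non-zero for an invalid pack.

```shell
# Verify an evidence pack offline and replay the verdict only if verification passes.
# $1: path to the evidence pack
# $2: output file for the verdict (default: verdict.json)
verify_then_replay() {
    pack=$1
    verdict=${2:-verdict.json}
    [ -f "$pack" ] || { echo "no such evidence pack: $pack" >&2; return 1; }
    stella evidence verify --offline --input "$pack" || return 2
    stella replay --evidence "$pack" --output "$verdict"
}
```

For example, `verify_then_replay evidence-pack.tar.gz` writes `verdict.json` only when the pack verifies.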
Monitoring Dashboard
Access: Grafana → Dashboards → Stella Ops → Evidence Locker
Key panels:
- Artifact ingestion rate
- Retrieval latency
- Storage utilization trend
- Integrity check status
- Anchor chain health
Evidence Capture
For any incident:
stella evidence diagnostics --output /tmp/evidence-diag-$(date +%Y%m%dT%H%M%S).tar.gz
Bundle includes:
- Index status
- Storage stats
- Recent anchor chain
- Integrity check results
- Operation audit log
Escalation Path
- L1 (On-call): Standard procedures, cleanup operations
- L2 (Platform team): Index rebuild, anchor issues
- L3 (Architecture): Chain recovery, DR procedures
Last updated: 2026-01-17 (UTC)