synergy moats product advisory implementations
This commit is contained in:
408
docs/operations/runbooks/evidence-locker-ops.md
Normal file
408
docs/operations/runbooks/evidence-locker-ops.md
Normal file
@@ -0,0 +1,408 @@
|
||||
# Sprint: SPRINT_20260117_029_Runbook_coverage_expansion
|
||||
# Task: RUN-003 - Evidence Locker Runbook
|
||||
# Evidence Locker Operations Runbook
|
||||
|
||||
Status: PRODUCTION-READY (2026-01-17 UTC)
|
||||
|
||||
## Scope
|
||||
Evidence locker operations including storage management, integrity verification, attestation management, provenance chain maintenance, and disaster recovery procedures.
|
||||
|
||||
---
|
||||
|
||||
## Pre-flight Checklist
|
||||
|
||||
### Environment Verification
|
||||
```bash
|
||||
# Check evidence locker health
|
||||
stella doctor --category evidence
|
||||
|
||||
# Verify storage accessibility
|
||||
stella evidence status
|
||||
|
||||
# Check index health
|
||||
stella evidence index status
|
||||
|
||||
# Verify anchor chain
|
||||
stella evidence anchor verify --latest
|
||||
```
|
||||
|
||||
### Metrics to Watch
|
||||
- `stella_evidence_artifacts_total` - Total artifacts stored
|
||||
- `stella_evidence_retrieval_latency_seconds` - Retrieval latency P99
|
||||
- `stella_evidence_storage_bytes` - Storage consumption
|
||||
- `stella_merkle_anchor_age_seconds` - Time since last anchor
|
||||
|
||||
---
|
||||
|
||||
## Standard Procedures
|
||||
|
||||
### SP-001: Daily Integrity Check
|
||||
|
||||
**Frequency:** Daily (automated) or on-demand
|
||||
**Duration:** Varies by locker size (typically 5-30 minutes)
|
||||
|
||||
1. Run integrity verification:
|
||||
```bash
|
||||
# Quick check (sample-based)
|
||||
stella evidence verify --mode quick
|
||||
|
||||
# Full check (all artifacts)
|
||||
stella evidence verify --mode full
|
||||
```
|
||||
|
||||
2. Review results:
|
||||
```bash
|
||||
stella evidence verify-report --latest
|
||||
```
|
||||
|
||||
3. Address any failures:
|
||||
```bash
|
||||
# List failed artifacts
|
||||
stella evidence verify-report --latest --filter failed
|
||||
```
|
||||
|
||||
### SP-002: Index Maintenance
|
||||
|
||||
**Frequency:** Weekly or after large ingestion
|
||||
**Duration:** ~10 minutes
|
||||
|
||||
1. Check index health:
|
||||
```bash
|
||||
stella evidence index status
|
||||
```
|
||||
|
||||
2. Refresh index if needed:
|
||||
```bash
|
||||
# Incremental refresh
|
||||
stella evidence index refresh
|
||||
|
||||
# Full rebuild (if corruption suspected)
|
||||
stella evidence index rebuild
|
||||
```
|
||||
|
||||
3. Optimize index:
|
||||
```bash
|
||||
stella evidence index optimize
|
||||
```
|
||||
|
||||
### SP-003: Merkle Anchoring
|
||||
|
||||
**Frequency:** Per policy (default: every 6 hours)
|
||||
**Duration:** ~2 minutes
|
||||
|
||||
1. Create new anchor:
|
||||
```bash
|
||||
stella evidence anchor create
|
||||
```
|
||||
|
||||
2. Verify anchor chain:
|
||||
```bash
|
||||
stella evidence anchor verify --all
|
||||
```
|
||||
|
||||
3. Export anchor for external archival:
|
||||
```bash
|
||||
stella evidence anchor export --latest --output anchor-$(date +%Y%m%dT%H%M%S).json
|
||||
```
|
||||
|
||||
### SP-004: Storage Cleanup
|
||||
|
||||
**Frequency:** Monthly or when storage alerts trigger
|
||||
**Duration:** Varies
|
||||
|
||||
1. Review storage usage:
|
||||
```bash
|
||||
stella evidence storage stats
|
||||
```
|
||||
|
||||
2. Apply retention policy:
|
||||
```bash
|
||||
# Dry run first
|
||||
stella evidence cleanup --apply-retention --dry-run
|
||||
|
||||
# Execute cleanup
|
||||
stella evidence cleanup --apply-retention
|
||||
```
|
||||
|
||||
3. Archive old evidence (if required):
|
||||
```bash
|
||||
stella evidence archive --older-than 365d --output /archive/evidence-$(date +%Y).tar
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Incident Procedures
|
||||
|
||||
### INC-001: Integrity Verification Failure
|
||||
|
||||
**Symptoms:**
|
||||
- Alert: `StellaEvidenceIntegrityFailure`
|
||||
- Verification reports hash mismatch
|
||||
|
||||
**Investigation:**
|
||||
```bash
|
||||
# Get failure details
|
||||
stella evidence verify-report --latest --filter failed --format json > /tmp/integrity-failures.json
|
||||
|
||||
# Check specific artifact
|
||||
stella evidence inspect <artifact-id>
|
||||
|
||||
# Check provenance
|
||||
stella evidence provenance show <artifact-id>
|
||||
```
|
||||
|
||||
**Resolution:**
|
||||
|
||||
1. **Isolated corruption:**
|
||||
```bash
|
||||
# Attempt recovery from replica (if available)
|
||||
stella evidence recover --id <artifact-id> --source replica
|
||||
|
||||
# If no replica, mark as corrupted
|
||||
stella evidence mark-corrupted --id <artifact-id> --reason "hash-mismatch"
|
||||
```
|
||||
|
||||
2. **Widespread corruption:**
|
||||
- Stop evidence ingestion
|
||||
- Identify corruption extent
|
||||
- Restore from backup if necessary
|
||||
- Escalate to L3
|
||||
|
||||
3. **False positive (software bug):**
|
||||
- Verify with multiple hash implementations
|
||||
- Check for recent software updates
|
||||
- Report bug if confirmed
|
||||
|
||||
### INC-002: Evidence Retrieval Failure
|
||||
|
||||
**Symptoms:**
|
||||
- Alert: `StellaEvidenceRetrievalFailed`
|
||||
- API returning 404 for known artifacts
|
||||
|
||||
**Investigation:**
|
||||
```bash
|
||||
# Check if artifact exists
|
||||
stella evidence exists <artifact-id>
|
||||
|
||||
# Check index
|
||||
stella evidence index lookup <artifact-id>
|
||||
|
||||
# Check storage backend
|
||||
stella evidence storage check <artifact-id>
|
||||
```
|
||||
|
||||
**Resolution:**
|
||||
|
||||
1. **Index corruption:**
|
||||
```bash
|
||||
# Rebuild index
|
||||
stella evidence index rebuild
|
||||
```
|
||||
|
||||
2. **Storage backend issue:**
|
||||
```bash
|
||||
# Check storage health
|
||||
stella doctor --check check.storage.evidencelocker
|
||||
|
||||
# Verify storage connectivity
|
||||
stella evidence storage test
|
||||
```
|
||||
|
||||
3. **File system issue:**
|
||||
- Check disk health
|
||||
- Verify file permissions
|
||||
- Check mount status
|
||||
|
||||
### INC-003: Anchor Chain Break
|
||||
|
||||
**Symptoms:**
|
||||
- Alert: `StellaMerkleAnchorChainBroken`
|
||||
- Anchor verification fails
|
||||
|
||||
**Investigation:**
|
||||
```bash
|
||||
# Check anchor chain
|
||||
stella evidence anchor verify --all --verbose
|
||||
|
||||
# Find break point
|
||||
stella evidence anchor list --show-links
|
||||
|
||||
# Inspect specific anchor
|
||||
stella evidence anchor inspect <anchor-id>
|
||||
```
|
||||
|
||||
**Resolution:**
|
||||
|
||||
1. **Single broken link:**
|
||||
```bash
|
||||
# Attempt to recover from backup
|
||||
stella evidence anchor recover --id <anchor-id> --source backup
|
||||
```
|
||||
|
||||
2. **Multiple breaks:**
|
||||
- Stop new anchoring
|
||||
- Assess extent of damage
|
||||
- Restore from backup or rebuild chain
|
||||
|
||||
3. **Create new chain segment:**
|
||||
```bash
|
||||
# Start new chain (preserves old chain as archived)
|
||||
stella evidence anchor new-chain --reason "chain-break-recovery"
|
||||
```
|
||||
|
||||
### INC-004: Storage Full
|
||||
|
||||
**Symptoms:**
|
||||
- Alert: `StellaEvidenceStorageFull`
|
||||
- Ingestion failing
|
||||
|
||||
**Immediate Actions:**
|
||||
```bash
|
||||
# Check storage usage
|
||||
stella evidence storage stats
|
||||
|
||||
# Emergency cleanup of temporary files
|
||||
stella evidence cleanup --temp-only
|
||||
|
||||
# Find large/old artifacts
|
||||
stella evidence storage analyze --sort size --limit 20
|
||||
```
|
||||
|
||||
**Resolution:**
|
||||
|
||||
1. **Apply retention policy:**
|
||||
```bash
|
||||
stella evidence cleanup --apply-retention --aggressive
|
||||
```
|
||||
|
||||
2. **Archive old evidence:**
|
||||
```bash
|
||||
stella evidence archive --older-than 180d --compress
|
||||
```
|
||||
|
||||
3. **Expand storage:**
|
||||
- Follow cloud provider procedure
|
||||
- Or add additional storage volume
|
||||
|
||||
---
|
||||
|
||||
## Disaster Recovery
|
||||
|
||||
### DR-001: Full Evidence Locker Recovery
|
||||
|
||||
**Prerequisites:**
|
||||
- Backup available
|
||||
- Target storage provisioned
|
||||
- Recovery environment ready
|
||||
|
||||
**Procedure:**
|
||||
|
||||
1. Provision new storage:
|
||||
```bash
|
||||
stella evidence storage provision --size <size>
|
||||
```
|
||||
|
||||
2. Restore from backup:
|
||||
```bash
|
||||
# List available backups
|
||||
stella backup list --type evidence-locker
|
||||
|
||||
# Restore
|
||||
stella evidence restore --backup-id <backup-id> --target /var/lib/stellaops/evidence
|
||||
```
|
||||
|
||||
3. Verify restoration:
|
||||
```bash
|
||||
stella evidence verify --mode full
|
||||
stella evidence anchor verify --all
|
||||
```
|
||||
|
||||
4. Update service configuration:
|
||||
```bash
|
||||
stella config set EvidenceLocker:Path /var/lib/stellaops/evidence
|
||||
stella service restart
|
||||
```
|
||||
|
||||
### DR-002: Point-in-Time Recovery
|
||||
|
||||
For recovering to a specific point in time:
|
||||
|
||||
1. Identify target anchor:
|
||||
```bash
|
||||
stella evidence anchor list --before <timestamp>
|
||||
```
|
||||
|
||||
2. Restore to that point:
|
||||
```bash
|
||||
stella evidence restore --to-anchor <anchor-id>
|
||||
```
|
||||
|
||||
3. Verify integrity:
|
||||
```bash
|
||||
stella evidence verify --mode full --to-anchor <anchor-id>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Offline Mode Operations
|
||||
|
||||
### Preparing Offline Evidence Pack
|
||||
|
||||
```bash
|
||||
# Export evidence for specific artifact
|
||||
stella evidence export --digest <artifact-digest> --output evidence-pack.tar.gz
|
||||
|
||||
# Export with all dependencies
|
||||
stella evidence export --digest <artifact-digest> --include-deps --output evidence-full.tar.gz
|
||||
```
|
||||
|
||||
### Verifying Evidence Offline
|
||||
|
||||
```bash
|
||||
# Verify evidence pack without network
|
||||
stella evidence verify --offline --input evidence-pack.tar.gz
|
||||
|
||||
# Replay verdict using evidence
|
||||
stella replay --evidence evidence-pack.tar.gz --output verdict.json
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Monitoring Dashboard
|
||||
|
||||
Access: Grafana → Dashboards → Stella Ops → Evidence Locker
|
||||
|
||||
Key panels:
|
||||
- Artifact ingestion rate
|
||||
- Retrieval latency
|
||||
- Storage utilization trend
|
||||
- Integrity check status
|
||||
- Anchor chain health
|
||||
|
||||
---
|
||||
|
||||
## Evidence Capture
|
||||
|
||||
For any incident:
|
||||
```bash
|
||||
stella evidence diagnostics --output /tmp/evidence-diag-$(date +%Y%m%dT%H%M%S).tar.gz
|
||||
```
|
||||
|
||||
Bundle includes:
|
||||
- Index status
|
||||
- Storage stats
|
||||
- Recent anchor chain
|
||||
- Integrity check results
|
||||
- Operation audit log
|
||||
|
||||
---
|
||||
|
||||
## Escalation Path
|
||||
|
||||
1. **L1 (On-call):** Standard procedures, cleanup operations
|
||||
2. **L2 (Platform team):** Index rebuild, anchor issues
|
||||
3. **L3 (Architecture):** Chain recovery, DR procedures
|
||||
|
||||
---
|
||||
|
||||
_Last updated: 2026-01-17 (UTC)_
|
||||
Reference in New Issue
Block a user