Sprint: SPRINT_20260117_029_Runbook_coverage_expansion

Task: RUN-003 - Evidence Locker Runbook

Evidence Locker Operations Runbook

Status: PRODUCTION-READY (2026-01-17 UTC)

Scope

This runbook covers Evidence Locker operations: storage management, integrity verification, attestation management, provenance chain maintenance, and disaster recovery.


Pre-flight Checklist

Environment Verification

# Check evidence locker health
stella doctor --category evidence

# Verify storage accessibility
stella evidence status

# Check index health
stella evidence index status

# Verify anchor chain
stella evidence anchor verify --latest
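
The four checks above can be chained so that one failing check does not hide the others. This is a minimal sketch, assuming each `stella` command exits non-zero when it reports a problem (this runbook does not document exit codes); the function name `preflight` is illustrative only.

```shell
# Run every pre-flight check and count failures instead of stopping at the first.
# Assumption: each stella command returns a non-zero exit code when unhealthy.
preflight() {
    failures=0
    stella doctor --category evidence      || failures=$((failures + 1))
    stella evidence status                 || failures=$((failures + 1))
    stella evidence index status           || failures=$((failures + 1))
    stella evidence anchor verify --latest || failures=$((failures + 1))
    echo "pre-flight failures: $failures"
    # Succeed only when every check passed
    [ "$failures" -eq 0 ]
}
```

Calling `preflight` before any procedure in this runbook gives a single pass/fail result while still printing the output of each individual check.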

Metrics to Watch

  • stella_evidence_artifacts_total - Total artifacts stored
  • stella_evidence_retrieval_latency_seconds - Retrieval latency P99
  • stella_evidence_storage_bytes - Storage consumption
  • stella_merkle_anchor_age_seconds - Time since last anchor

Standard Procedures

SP-001: Daily Integrity Check

Frequency: Daily (automated) or on demand
Duration: Varies by locker size (typically 5-30 minutes)

  1. Run integrity verification:

    # Quick check (sample-based)
    stella evidence verify --mode quick
    
    # Full check (all artifacts)
    stella evidence verify --mode full
    
  2. Review results:

    stella evidence verify-report --latest
    
  3. Address any failures:

    # List failed artifacts
    stella evidence verify-report --latest --filter failed
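
For the automated daily run, the three steps can be wrapped in one job that only lists failures when the check does not pass. A hedged sketch: it assumes `stella evidence verify` exits non-zero when any artifact fails verification, and `daily_integrity_check` is a hypothetical name.

```shell
# Daily integrity job: quick check first, print the failure list on error.
# Assumption: `stella evidence verify` exits non-zero when any artifact fails.
daily_integrity_check() {
    if stella evidence verify --mode quick; then
        echo "integrity check passed"
        return 0
    fi
    echo "integrity check FAILED; failed artifacts follow" >&2
    stella evidence verify-report --latest --filter failed
    # Propagate the failure so a scheduler or alerting wrapper can react
    return 1
}
```

The non-zero return lets cron or a systemd timer treat the run as failed, at which point INC-001 applies.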
    

SP-002: Index Maintenance

Frequency: Weekly or after large ingestion
Duration: ~10 minutes

  1. Check index health:

    stella evidence index status
    
  2. Refresh index if needed:

    # Incremental refresh
    stella evidence index refresh
    
    # Full rebuild (if corruption suspected)
    stella evidence index rebuild
    
  3. Optimize index:

    stella evidence index optimize
    

SP-003: Merkle Anchoring

Frequency: Per policy (default: every 6 hours)
Duration: ~2 minutes

  1. Create new anchor:

    stella evidence anchor create
    
  2. Verify anchor chain:

    stella evidence anchor verify --all
    
  3. Export anchor for external archival:

    stella evidence anchor export --latest --output anchor-$(date +%Y%m%dT%H%M%S).json
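
The three anchoring steps depend on each other, so a scripted version should stop as soon as one fails rather than export an unverified anchor. The sketch below assumes non-zero exit codes on failure; `anchor_cycle` and its output-directory argument are illustrative. It also passes `-u` to `date` so the file name is in UTC, consistent with the timestamps in this runbook.

```shell
# Create, verify, and export an anchor; stop at the first failing step.
# Assumption: each stella command exits non-zero on failure.
anchor_cycle() {
    out_dir="${1:-.}"
    # -u gives a UTC timestamp; the trailing Z marks it as such
    stamp=$(date -u +%Y%m%dT%H%M%SZ)
    stella evidence anchor create || return 1
    stella evidence anchor verify --all || return 1
    stella evidence anchor export --latest --output "$out_dir/anchor-$stamp.json"
}
```

For example, `anchor_cycle /archive/anchors` would write a file such as `anchor-<UTC timestamp>.json` into that directory, provided the directory exists.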
    

SP-004: Storage Cleanup

Frequency: Monthly or when storage alerts trigger
Duration: Varies

  1. Review storage usage:

    stella evidence storage stats
    
  2. Apply retention policy:

    # Dry run first
    stella evidence cleanup --apply-retention --dry-run
    
    # Execute cleanup
    stella evidence cleanup --apply-retention
    
  3. Archive old evidence (if required):

    stella evidence archive --older-than 365d --output /archive/evidence-$(date +%Y).tar
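
Because cleanup deletes evidence, it helps to force the dry run and an explicit confirmation into the same invocation. A minimal sketch, assuming the dry run prints what would be removed and exits non-zero on error; `retention_cleanup` is a hypothetical wrapper name.

```shell
# Show the dry-run output, then delete only after an explicit "yes".
retention_cleanup() {
    stella evidence cleanup --apply-retention --dry-run || return 1
    printf 'Apply retention cleanup? Type "yes" to continue: '
    read -r answer
    if [ "$answer" != "yes" ]; then
        echo "cleanup skipped"
        return 0
    fi
    stella evidence cleanup --apply-retention
}
```

Any answer other than `yes`, including an empty line, leaves the locker untouched.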
    

Incident Procedures

INC-001: Integrity Verification Failure

Symptoms:

  • Alert: StellaEvidenceIntegrityFailure
  • Verification reports hash mismatch

Investigation:

# Get failure details
stella evidence verify-report --latest --filter failed --format json > /tmp/integrity-failures.json

# Check specific artifact
stella evidence inspect <artifact-id>

# Check provenance
stella evidence provenance show <artifact-id>

Resolution:

  1. Isolated corruption:

    # Attempt recovery from replica (if available)
    stella evidence recover --id <artifact-id> --source replica
    
    # If no replica, mark as corrupted
    stella evidence mark-corrupted --id <artifact-id> --reason "hash-mismatch"
    
  2. Widespread corruption:

    • Stop evidence ingestion
    • Identify corruption extent
    • Restore from backup if necessary
    • Escalate to L3
  3. False positive (software bug):

    • Verify with multiple hash implementations
    • Check for recent software updates
    • Report bug if confirmed
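
The "multiple hash implementations" check for a suspected false positive can be done with standard tools. The sketch below assumes artifacts are addressed by SHA-256, which this runbook does not state; substitute the algorithm your deployment uses. The function name and arguments are illustrative, and the artifact file path and recorded digest must be obtained separately (for example from `stella evidence inspect`).

```shell
# Hash one file with every SHA-256 tool found on the host and compare each
# result with the digest recorded for the artifact.
# Assumption: artifacts are addressed by SHA-256 (not stated in this runbook).
cross_check_sha256() {
    file="$1"
    expected="$2"
    checked=0
    mismatches=0
    for tool in sha256sum shasum openssl; do
        command -v "$tool" >/dev/null 2>&1 || continue
        case "$tool" in
            sha256sum) actual=$(sha256sum "$file" | awk '{print $1}') ;;
            shasum)    actual=$(shasum -a 256 "$file" | awk '{print $1}') ;;
            openssl)   actual=$(openssl dgst -sha256 "$file" | awk '{print $NF}') ;;
        esac
        checked=$((checked + 1))
        if [ "$actual" = "$expected" ]; then
            echo "$tool: match"
        else
            echo "$tool: MISMATCH ($actual)"
            mismatches=$((mismatches + 1))
        fi
    done
    # Succeed only if at least one tool ran and none disagreed
    [ "$checked" -gt 0 ] && [ "$mismatches" -eq 0 ]
}
```

If independent tools all match the recorded digest while the locker still reports a hash mismatch, the verifier itself is the likelier suspect. If they all disagree with the recorded digest, treat the artifact as genuinely corrupted.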

INC-002: Evidence Retrieval Failure

Symptoms:

  • Alert: StellaEvidenceRetrievalFailed
  • API returning 404 for known artifacts

Investigation:

# Check if artifact exists
stella evidence exists <artifact-id>

# Check index
stella evidence index lookup <artifact-id>

# Check storage backend
stella evidence storage check <artifact-id>
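
The three investigation commands can be ordered so that their results point at one of the resolution branches. This is only a sketch: it assumes each command exits non-zero when the artifact is not found or the check fails, which this runbook does not document, and `triage_retrieval` is a hypothetical helper.

```shell
# Map the investigation commands onto the likely cause.
# Assumption: each command exits non-zero when the lookup or check fails.
triage_retrieval() {
    id="$1"
    if ! stella evidence storage check "$id"; then
        echo "storage: artifact not readable from the backend"
    elif ! stella evidence index lookup "$id"; then
        echo "index: artifact is stored but missing from the index"
    elif ! stella evidence exists "$id"; then
        echo "inconclusive: storage and index are fine but the artifact does not resolve"
    else
        echo "ok: artifact resolves; retry the retrieval"
    fi
}
```

Checking storage first separates backend and file system problems from index corruption, since an index rebuild cannot help when the underlying object is unreadable.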

Resolution:

  1. Index corruption:

    # Rebuild index
    stella evidence index rebuild
    
  2. Storage backend issue:

    # Check storage health
    stella doctor --check check.storage.evidencelocker
    
    # Verify storage connectivity
    stella evidence storage test
    
  3. File system issue:

    • Check disk health
    • Verify file permissions
    • Check mount status
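
The permission and mount checks for a file system issue can be sketched with standard POSIX tools. The default path below is the one used in DR-001 and is an assumption about where the locker lives on your host; `check_evidence_fs` is an illustrative name. Disk health (for example SMART data) depends on the hardware and is not covered here.

```shell
# Basic file system checks for the evidence directory.
# Assumption: the locker lives at the DR-001 path unless another path is passed.
check_evidence_fs() {
    path="${1:-/var/lib/stellaops/evidence}"
    [ -d "$path" ] || { echo "missing directory: $path"; return 1; }
    [ -r "$path" ] && [ -x "$path" ] || { echo "not readable/traversable: $path"; return 1; }
    # df -P prints the backing file system, usage, and mount point in POSIX format
    df -P "$path" | tail -n 1
    # Owner, group, and mode of the directory itself
    ls -ld "$path"
}
```

Run it as the service account rather than root, since root can usually read directories that the service cannot.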

INC-003: Anchor Chain Break

Symptoms:

  • Alert: StellaMerkleAnchorChainBroken
  • Anchor verification fails

Investigation:

# Check anchor chain
stella evidence anchor verify --all --verbose

# Find break point
stella evidence anchor list --show-links

# Inspect specific anchor
stella evidence anchor inspect <anchor-id>

Resolution:

  1. Single broken link:

    # Attempt to recover from backup
    stella evidence anchor recover --id <anchor-id> --source backup
    
  2. Multiple breaks:

    • Stop new anchoring
    • Assess extent of damage
    • Restore from backup or rebuild chain
  3. Create new chain segment:

    # Start new chain (preserves old chain as archived)
    stella evidence anchor new-chain --reason "chain-break-recovery"
    

INC-004: Storage Full

Symptoms:

  • Alert: StellaEvidenceStorageFull
  • Ingestion failing

Immediate Actions:

# Check storage usage
stella evidence storage stats

# Emergency cleanup of temporary files
stella evidence cleanup --temp-only

# Find large/old artifacts
stella evidence storage analyze --sort size --limit 20

Resolution:

  1. Apply retention policy:

    stella evidence cleanup --apply-retention --aggressive
    
  2. Archive old evidence:

    stella evidence archive --older-than 180d --compress
    
  3. Expand storage:

    • Follow cloud provider procedure
    • Or add additional storage volume

Disaster Recovery

DR-001: Full Evidence Locker Recovery

Prerequisites:

  • Backup available
  • Target storage provisioned
  • Recovery environment ready

Procedure:

  1. Provision new storage:

    stella evidence storage provision --size <size>
    
  2. Restore from backup:

    # List available backups
    stella backup list --type evidence-locker
    
    # Restore
    stella evidence restore --backup-id <backup-id> --target /var/lib/stellaops/evidence
    
  3. Verify restoration:

    stella evidence verify --mode full
    stella evidence anchor verify --all
    
  4. Update service configuration:

    stella config set EvidenceLocker:Path /var/lib/stellaops/evidence
    stella service restart
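
Steps 2 to 4 can be scripted so that the service is only repointed after the restored data has passed verification. A hedged sketch: it assumes storage is already provisioned (step 1), that each command exits non-zero on failure, and that `restore_locker` is a name chosen for illustration.

```shell
# Restore, verify, then repoint the service. Each step must succeed before the next.
# Assumptions: storage already provisioned; non-zero exit code on any failure.
restore_locker() {
    backup_id="$1"
    target="${2:-/var/lib/stellaops/evidence}"
    [ -n "$backup_id" ] || { echo "usage: restore_locker <backup-id> [target]" >&2; return 2; }
    stella evidence restore --backup-id "$backup_id" --target "$target" || return 1
    stella evidence verify --mode full || return 1
    stella evidence anchor verify --all || return 1
    # Only reached when both verifications passed
    stella config set EvidenceLocker:Path "$target" || return 1
    stella service restart
}
```

Pick the backup ID from `stella backup list --type evidence-locker` and pass it as the first argument. A failed verification leaves the running configuration untouched.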
    

DR-002: Point-in-Time Recovery

To recover the locker to a specific point in time:

  1. Identify target anchor:

    stella evidence anchor list --before <timestamp>
    
  2. Restore to that point:

    stella evidence restore --to-anchor <anchor-id>
    
  3. Verify integrity:

    stella evidence verify --mode full --to-anchor <anchor-id>
    

Offline Mode Operations

Preparing Offline Evidence Pack

# Export evidence for specific artifact
stella evidence export --digest <artifact-digest> --output evidence-pack.tar.gz

# Export with all dependencies
stella evidence export --digest <artifact-digest> --include-deps --output evidence-full.tar.gz

Verifying Evidence Offline

# Verify evidence pack without network
stella evidence verify --offline --input evidence-pack.tar.gz

# Replay verdict using evidence
stella replay --evidence evidence-pack.tar.gz --output verdict.json
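
When a pack is carried across an air gap, recording a checksum on the connected side makes it possible to distinguish a damaged transfer from an evidence problem. The sketch below uses the standard `sha256sum` tool for the transfer checksum; this is an addition for illustration and not part of the `stella` tooling, and both function names are hypothetical.

```shell
# Connected side: export the pack and record its checksum next to it.
pack_evidence() {
    digest="$1"
    pack="$2"
    stella evidence export --digest "$digest" --include-deps --output "$pack" || return 1
    # Run in the pack's directory so the checksum file stores a relative name
    ( cd "$(dirname "$pack")" && sha256sum "$(basename "$pack")" > "$(basename "$pack").sha256" )
}

# Air-gapped side: confirm the transfer was clean before verifying the evidence.
verify_pack_offline() {
    pack="$1"
    ( cd "$(dirname "$pack")" && sha256sum -c "$(basename "$pack").sha256" ) || return 1
    stella evidence verify --offline --input "$pack"
}
```

Copy the `.sha256` file together with the pack. If `sha256sum -c` fails, repeat the transfer before investigating the evidence itself.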

Monitoring Dashboard

Access: Grafana → Dashboards → Stella Ops → Evidence Locker

Key panels:

  • Artifact ingestion rate
  • Retrieval latency
  • Storage utilization trend
  • Integrity check status
  • Anchor chain health

Evidence Capture

For any incident, capture a diagnostics bundle before making changes:

stella evidence diagnostics --output /tmp/evidence-diag-$(date +%Y%m%dT%H%M%S).tar.gz

Bundle includes:

  • Index status
  • Storage stats
  • Recent anchor chain
  • Integrity check results
  • Operation audit log

Escalation Path

  1. L1 (On-call): Standard procedures, cleanup operations
  2. L2 (Platform team): Index rebuild, anchor issues
  3. L3 (Architecture): Chain recovery, DR procedures

Last updated: 2026-01-17 (UTC)