# Sprint: SPRINT_20260117_029_Runbook_coverage_expansion # Task: RUN-004 - Backup/Restore Runbook # Backup and Restore Operations Runbook Status: PRODUCTION-READY (2026-01-17 UTC) ## Scope Comprehensive backup and restore procedures for all Stella Ops components including database, evidence locker, configuration, and secrets. --- ## Backup Architecture Overview ### Backup Components | Component | Backup Type | Default Schedule | Retention | |-----------|-------------|------------------|-----------| | PostgreSQL | Full + WAL | Daily full, continuous WAL | 30 days | | Evidence Locker | Incremental | Daily | 90 days | | Configuration | Snapshot | Daily + on change | 90 days | | Secrets | Encrypted snapshot | Daily | 30 days | | Attestation Keys | Encrypted export | Weekly | 1 year | ### Storage Locations - **Primary:** `/var/lib/stellaops/backups/` (local) - **Secondary:** S3/Azure Blob/GCS (configurable) - **Offline:** Removable media for air-gap scenarios --- ## Pre-flight Checklist ### Environment Verification ```bash # Check backup service status stella backup status # Verify backup storage stella doctor --check check.storage.backup # List recent backups stella backup list --last 7d # Test backup restore capability stella backup test-restore --latest --dry-run ``` ### Metrics to Watch - `stella_backup_last_success_timestamp` - Last successful backup - `stella_backup_duration_seconds` - Backup duration - `stella_backup_size_bytes` - Backup size - `stella_restore_test_last_success` - Last restore test --- ## Standard Procedures ### SP-001: Create Manual Backup **When:** Before upgrades, schema changes, or major configuration changes **Duration:** 5-30 minutes depending on data volume 1. Create full system backup: ```bash stella backup create --full --name "pre-upgrade-$(date +%Y%m%d)" ``` 2. Or create component-specific backup: ```bash # Database only stella backup create --type database --name "db-pre-migration" # Evidence locker only stella backup create --type evidence --name "evidence-snapshot" # Configuration only stella backup create --type config --name "config-backup" ``` 3. Verify backup: ```bash stella backup verify --name "pre-upgrade-$(date +%Y%m%d)" ``` 4. Copy to offsite storage (recommended): ```bash stella backup copy --name "pre-upgrade-$(date +%Y%m%d)" --destination s3://backup-bucket/ ``` ### SP-002: Verify Backup Integrity **Frequency:** Weekly **Duration:** 15-60 minutes 1. List backups for verification: ```bash stella backup list --unverified ``` 2. Verify backup integrity: ```bash # Verify specific backup stella backup verify --name # Verify all unverified stella backup verify --all-unverified ``` 3. Test restore (non-destructive): ```bash stella backup test-restore --name --target /tmp/restore-test ``` 4. Record verification result: ```bash stella backup log-verification --name --result success ``` ### SP-003: Restore from Backup **CAUTION: This is a destructive operation** #### Full System Restore 1. Stop all services: ```bash stella service stop --all ``` 2. List available backups: ```bash stella backup list --type full ``` 3. Restore: ```bash # Dry run first stella backup restore --name --dry-run # Execute restore stella backup restore --name --confirm ``` 4. Start services: ```bash stella service start --all ``` 5. Verify restoration: ```bash stella doctor --all stella service health ``` #### Component-Specific Restore 1. Database restore: ```bash stella service stop --service api,release-orchestrator stella backup restore --type database --name --confirm stella db migrate # Apply any pending migrations stella service start --service api,release-orchestrator ``` 2. Evidence locker restore: ```bash stella backup restore --type evidence --name --confirm stella evidence verify --mode quick ``` 3. Configuration restore: ```bash stella backup restore --type config --name --confirm stella service restart --graceful ``` ### SP-004: Point-in-Time Recovery (Database) 1. Identify target recovery point: ```bash # List WAL archives stella backup wal-list --after --before ``` 2. Perform PITR: ```bash stella backup restore-pitr --to-time "2026-01-17T10:30:00Z" --confirm ``` 3. Verify data state: ```bash stella db verify-integrity ``` --- ## Backup Schedules ### Configure Backup Schedule ```bash # View current schedule stella backup schedule show # Set database backup schedule stella backup schedule set --type database --cron "0 2 * * *" # Set evidence backup schedule stella backup schedule set --type evidence --cron "0 3 * * *" # Set configuration backup schedule stella backup schedule set --type config --cron "0 4 * * *" --on-change ``` ### Retention Policy ```bash # View retention policy stella backup retention show # Set retention stella backup retention set --type database --days 30 stella backup retention set --type evidence --days 90 stella backup retention set --type config --days 90 # Apply retention (cleanup old backups) stella backup retention apply ``` --- ## Incident Procedures ### INC-001: Backup Failure **Symptoms:** - Alert: `StellaBackupFailed` - Missing recent backup **Investigation:** ```bash # Check backup logs stella backup logs --last 24h # Check disk space stella doctor --check check.storage.diskspace,check.storage.backup # Test backup operation stella backup test --type database ``` **Resolution:** 1. **Disk space issue:** ```bash stella backup retention apply --force stella backup cleanup --expired ``` 2. **Database connectivity:** ```bash stella doctor --check check.postgres.connectivity ``` 3. **Permission issue:** - Check backup directory permissions - Verify service account access 4. **Retry backup:** ```bash stella backup create --type --retry ``` ### INC-002: Restore Failure **Symptoms:** - Restore command fails - Services not starting after restore **Investigation:** ```bash # Check restore logs stella backup restore-logs --last-attempt # Verify backup integrity stella backup verify --name # Check disk space stella doctor --check check.storage.diskspace ``` **Resolution:** 1. **Corrupted backup:** ```bash # Try previous backup stella backup list --type stella backup restore --name --confirm ``` 2. **Version mismatch:** ```bash # Check backup version stella backup info --name # Restore with migration stella backup restore --name --with-migration ``` 3. **Disk space:** - Free space or expand volume - Restore to alternate location ### INC-003: Backup Storage Full **Symptoms:** - Alert: `StellaBackupStorageFull` - New backups failing **Immediate Actions:** ```bash # Check storage stella backup storage stats # Emergency cleanup stella backup cleanup --keep-last 3 # Delete specific old backups stella backup delete --older-than 14d --confirm ``` **Resolution:** 1. **Adjust retention:** ```bash stella backup retention set --type database --days 14 stella backup retention apply ``` 2. **Expand storage:** - Add disk space - Configure offsite storage 3. **Archive to cold storage:** ```bash stella backup archive --older-than 30d --destination s3://archive-bucket/ ``` --- ## Disaster Recovery Scenarios ### DR-001: Complete System Loss 1. Provision new infrastructure 2. Install Stella Ops 3. Restore from offsite backup: ```bash stella backup restore --source s3://backup-bucket/latest-full.tar.gz --confirm ``` 4. Verify all components 5. Update DNS/load balancer ### DR-002: Database Corruption 1. Stop services 2. Restore database from latest clean backup: ```bash stella backup restore --type database --name ``` 3. Apply WAL to near-corruption point (PITR) 4. Verify data integrity 5. Resume services ### DR-003: Evidence Locker Loss 1. Restore evidence from backup: ```bash stella backup restore --type evidence --name ``` 2. Rebuild index: ```bash stella evidence index rebuild ``` 3. Verify anchor chain: ```bash stella evidence anchor verify --all ``` --- ## Offline/Air-Gap Backup ### Creating Offline Backup ```bash # Create encrypted offline bundle stella backup create-offline \ --output /media/usb/stellaops-backup-$(date +%Y%m%d).enc \ --encrypt \ --passphrase-file /secure/backup-key # Verify offline backup stella backup verify-offline --input /media/usb/stellaops-backup-*.enc ``` ### Restoring from Offline Backup ```bash # Restore from offline backup stella backup restore-offline \ --input /media/usb/stellaops-backup-*.enc \ --passphrase-file /secure/backup-key \ --confirm ``` --- ## Monitoring Dashboard Access: Grafana → Dashboards → Stella Ops → Backup Status Key panels: - Last backup success time - Backup size trend - Backup duration - Restore test status - Storage utilization --- ## Evidence Capture ```bash stella backup diagnostics --output /tmp/backup-diag-$(date +%Y%m%dT%H%M%S).tar.gz ``` --- ## Escalation Path 1. **L1 (On-call):** Retry failed backups, basic troubleshooting 2. **L2 (Platform team):** Restore operations, schedule adjustments 3. **L3 (Architecture):** Disaster recovery execution --- _Last updated: 2026-01-17 (UTC)_