# Stella Ops Upgrade Runbook

This runbook provides step-by-step procedures for upgrading Stella Ops while preserving evidence continuity.

## Quick Reference

| Phase | Duration | Owner | Rollback Point |
|-------|----------|-------|----------------|
| Pre-Upgrade | 2-4 hours | Platform Team | N/A |
| Backup | 1-2 hours | DBA | Full restore |
| Deploy Green | 30-60 min | Platform Team | Abort deploy |
| Cutover | 15-30 min | Platform Team | Instant rollback |
| Validation | 1-2 hours | QA + Security | 72h observation |
| Cleanup | 30 min | Platform Team | N/A |

## Pre-Upgrade Checklist

### Environment Verification

```bash
# Step 1: Record current version
stella version > /tmp/pre-upgrade-version.txt
echo "Current version: $(cat /tmp/pre-upgrade-version.txt)"

# Step 2: Verify system health
if ! stella doctor --full --output /tmp/pre-upgrade-health.json; then
  echo "ABORT: System health check failed"
  exit 1
fi

# Step 3: Check pending migrations
stella system migrations-status
# Ensure no pending migrations before upgrade

# Step 4: Verify queue depths
stella queue status --all
# All queues should be empty or near-empty
```

### Evidence Integrity Baseline

```bash
# Step 5: Capture evidence baseline
stella evidence verify-all \
  --output /backup/pre-upgrade-evidence-baseline.json \
  --include-merkle-roots

# Step 6: Export Merkle root summary
stella evidence roots-export \
  --output /backup/pre-upgrade-merkle-roots.json

# Step 7: Record evidence counts
stella evidence stats > /backup/pre-upgrade-evidence-stats.txt
```

### Backup Procedures

```bash
# Step 8: PostgreSQL backup
BACKUP_TIMESTAMP=$(date +%Y%m%d-%H%M%S)
pg_dump -Fc -d stellaops -f "/backup/stellaops-${BACKUP_TIMESTAMP}.dump"

# Step 9: Verify backup integrity
if ! pg_restore --list "/backup/stellaops-${BACKUP_TIMESTAMP}.dump" > /dev/null; then
  echo "ABORT: Backup verification failed"
  exit 1
fi

# Step 10: Evidence bundle backup
stella evidence export \
  --all \
  --output "/backup/evidence-bundles-${BACKUP_TIMESTAMP}/"

# Step 11: Configuration backup
kubectl get configmap -n stellaops -o yaml > "/backup/configmaps-${BACKUP_TIMESTAMP}.yaml"
kubectl get secret -n stellaops -o yaml > "/backup/secrets-${BACKUP_TIMESTAMP}.yaml"
```

### Pre-Flight Approval

Complete this checklist before proceeding:

- [ ] Current version documented
- [ ] System health: GREEN
- [ ] Evidence baseline captured
- [ ] PostgreSQL backup completed and verified
- [ ] Evidence bundles exported
- [ ] Configuration backed up
- [ ] Maintenance window approved
- [ ] Stakeholders notified
- [ ] Rollback plan reviewed

**Approver signature**: __________________ **Date**: __________

## Upgrade Execution

### Deploy Green Environment

```bash
# Step 12: Create green namespace
kubectl create namespace stellaops-green

# Step 13: Copy secrets to green namespace
kubectl get secret stellaops-secrets -n stellaops -o yaml | \
  sed 's/namespace: stellaops/namespace: stellaops-green/' | \
  kubectl apply -f -

# Step 14: Deploy new version
# --install creates the release if it does not exist yet in the new namespace
helm upgrade --install stellaops-green ./helm/stellaops \
  --namespace stellaops-green \
  --values values-production.yaml \
  --set image.tag="${TARGET_VERSION}" \
  --wait --timeout 10m

# Step 15: Verify deployment
kubectl get pods -n stellaops-green -w
# Wait for all pods to be Running and Ready
```

### Run Migrations

```bash
# Step 16: Apply Category A migrations (startup)
stella system migrations-run \
  --category A \
  --namespace stellaops-green

# Step 17: Verify migration success
stella system migrations-status --namespace stellaops-green
# All migrations should show "Applied"

# Step 18: Apply Category B migrations if needed (manual)
# Review migration list first
stella system migrations-pending --category B
# Apply after review
stella system migrations-run \
  --category B \
  --namespace stellaops-green \
  --confirm
```

### Evidence Migration (If Required)

```bash
# Step 19: Check if evidence migration needed
stella evidence migrate --dry-run --namespace stellaops-green

# Step 20: If migration needed, execute
stella evidence migrate \
  --namespace stellaops-green \
  --batch-size 100 \
  --progress

# Step 21: Verify evidence integrity post-migration
stella evidence verify-all \
  --namespace stellaops-green \
  --output /tmp/post-migration-evidence.json
```

### Health Validation

```bash
# Step 22: Run health checks on green
stella doctor --full --namespace stellaops-green

# Step 23: Run smoke tests
stella test smoke --namespace stellaops-green

# Step 24: Verify critical paths
stella test critical-paths --namespace stellaops-green
```

## Traffic Cutover

### Gradual Cutover

```bash
# Step 25: Enable canary (10%)
kubectl apply -f - <
```

- … 1%: Pause cutover
- p99 latency > 5s: Investigate
- Evidence failures > 0: Rollback

## Post-Upgrade Validation

### Evidence Continuity Verification

```bash
# Step 30: Verify chain-of-custody
stella evidence verify-continuity \
  --baseline /backup/pre-upgrade-evidence-baseline.json \
  --output /reports/continuity-report.html

# Step 31: Verify Merkle roots
stella evidence verify-roots \
  --baseline /backup/pre-upgrade-merkle-roots.json \
  --output /reports/roots-verification.json

# Step 32: Compare evidence stats
stella evidence stats > /tmp/post-upgrade-evidence-stats.txt
diff /backup/pre-upgrade-evidence-stats.txt /tmp/post-upgrade-evidence-stats.txt

# Step 33: Generate audit report
stella evidence audit-report \
  --since "${UPGRADE_START_TIME}" \
  --format pdf \
  --output "/reports/upgrade-audit-$(date +%Y%m%d).pdf"
```

### Functional Validation

```bash
# Step 34: Full integration test
stella test integration --full

# Step 35: Scan test
stella scan \
  --image registry.company.com/test-app:latest \
  --sbom-format spdx-2.3

# Step 36: Attestation test
stella attest \
  --subject sha256:test123 \
  --predicate-type slsa-provenance

# Step 37: Policy evaluation test
stella policy evaluate \
  --artifact sha256:test123 \
  --environment production
```

### Post-Upgrade Checklist

- [ ] Evidence continuity verified
- [ ] Merkle roots consistent
- [ ] All services healthy
- [ ] Integration tests passing
- [ ] Scan capability verified
- [ ] Attestation generation working
- [ ] Policy evaluation working
- [ ] No elevated error rates
- [ ] Latency within SLO

**Validator signature**: __________________ **Date**: __________

## Rollback Procedures

### Immediate Rollback (During Cutover)

```bash
# Revert canary to 0%
kubectl patch ingress stellaops-canary -n stellaops-green \
  --type='json' \
  -p='[{"op": "replace", "path": "/metadata/annotations/nginx.ingress.kubernetes.io~1canary-weight", "value": "0"}]'

# Or delete canary entirely
kubectl delete ingress stellaops-canary -n stellaops-green
```

### Full Rollback (After Cutover)

```bash
# Step R1: Assess database state
stella system migrations-status

# Step R2: If migrations are backward-compatible
# Simply redeploy previous version
helm upgrade stellaops ./helm/stellaops \
  --namespace stellaops \
  --set image.tag="${PREVIOUS_VERSION}" \
  --wait

# Step R3: If database restore needed
# Stop all services first
kubectl scale deployment --all --replicas=0 -n stellaops
# Restore database
pg_restore -d stellaops -c "/backup/stellaops-${BACKUP_TIMESTAMP}.dump"
# Redeploy previous version
helm upgrade stellaops ./helm/stellaops \
  --namespace stellaops \
  --set image.tag="${PREVIOUS_VERSION}" \
  --wait

# Step R4: Verify rollback
stella doctor --full
stella evidence verify-all
```

## Cleanup

### After 72-Hour Observation

```bash
# Step 40: Verify stable operation
stella doctor --full
stella evidence verify-all

# Step 41: Remove blue environment
# Confirm the namespace name first: earlier steps address the
# pre-upgrade environment as "stellaops", not "stellaops-blue"
kubectl delete namespace stellaops-blue

# Step 42: Archive upgrade artifacts
# The /tmp glob covers both the version .txt and the health .json files
tar -czf "/archive/upgrade-${UPGRADE_TIMESTAMP}.tar.gz" \
  /backup/ \
  /reports/ \
  /tmp/pre-upgrade-*

# Step 43: Update documentation
echo "${TARGET_VERSION}" > docs/CURRENT_VERSION.md
```

## Appendix

### Version-Specific Notes

See `docs/releases/{version}/MIGRATION.md` for version-specific migration notes.

### Breaking Changes Matrix

| From | To | Breaking Changes | Migration Required |
|------|-----|-----------------|-------------------|
| 2027.Q1 | 2027.Q2 | None | No |
| 2026.Q4 | 2027.Q1 | Policy schema v2 | Yes |

### Support Contacts

- Platform Team: platform@company.com
- DBA Team: dba@company.com
- Security Team: security@company.com
- On-Call: +1-555-OPS-CALL
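
### Abort-on-Failure Helper (Sketch)

Steps 2 and 9 both stop the upgrade when a command fails. As a minimal sketch, that pattern could be wrapped in one small Bash function so every gated step reports and fails the same way. `gate` is a hypothetical helper name introduced here for illustration; it is not a Stella Ops command and relies only on standard shell behavior.

```bash
#!/usr/bin/env bash

# gate DESCRIPTION COMMAND [ARGS...]
# Hypothetical helper (not part of the Stella Ops CLI).
# Runs COMMAND with its arguments. On success it prints a PASS line.
# On failure it prints an ABORT line to stderr and returns the
# command's own exit status so the caller can decide to exit.
gate() {
  local description="$1"
  shift
  if "$@"; then
    echo "PASS: ${description}"
  else
    local status=$?
    echo "ABORT: ${description} failed (exit ${status})" >&2
    return "${status}"
  fi
}
```

In a script built from this runbook, a gated step might then read `gate "System health check" stella doctor --full --output /tmp/pre-upgrade-health.json || exit 1`. Returning the status instead of calling `exit` inside the function leaves the abort decision with the caller, which also makes the helper safe to use in an interactive shell.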