9.5 KiB
9.5 KiB
Stella Ops Upgrade Runbook
This runbook provides step-by-step procedures for upgrading Stella Ops with evidence continuity preservation.
Quick Reference
| Phase | Duration | Owner | Rollback Point |
|---|---|---|---|
| Pre-Upgrade | 2-4 hours | Platform Team | N/A |
| Backup | 1-2 hours | DBA | Full restore |
| Deploy Green | 30-60 min | Platform Team | Abort deploy |
| Cutover | 15-30 min | Platform Team | Instant rollback |
| Validation | 1-2 hours | QA + Security | 72h observation |
| Cleanup | 30 min | Platform Team | N/A |
Pre-Upgrade Checklist
Environment Verification
# Step 1: Record current version
stella version > /tmp/pre-upgrade-version.txt
echo "Current version: $(cat /tmp/pre-upgrade-version.txt)"
# Step 2: Verify system health
stella doctor --full --output /tmp/pre-upgrade-health.json
if [ $? -ne 0 ]; then
echo "ABORT: System health check failed"
exit 1
fi
# Step 3: Check pending migrations
stella system migrations-status
# Ensure no pending migrations before upgrade
# Step 4: Verify queue depths
stella queue status --all
# All queues should be empty or near-empty
Evidence Integrity Baseline
# Step 5: Capture evidence baseline
stella evidence verify-all \
--output /backup/pre-upgrade-evidence-baseline.json \
--include-merkle-roots
# Step 6: Export Merkle root summary
stella evidence roots-export \
--output /backup/pre-upgrade-merkle-roots.json
# Step 7: Record evidence counts
stella evidence stats > /backup/pre-upgrade-evidence-stats.txt
Backup Procedures
# Step 8: PostgreSQL backup
BACKUP_TIMESTAMP=$(date +%Y%m%d-%H%M%S)
pg_dump -Fc -d stellaops -f /backup/stellaops-${BACKUP_TIMESTAMP}.dump
# Step 9: Verify backup integrity
pg_restore --list /backup/stellaops-${BACKUP_TIMESTAMP}.dump > /dev/null
if [ $? -ne 0 ]; then
echo "ABORT: Backup verification failed"
exit 1
fi
# Step 10: Evidence bundle backup
stella evidence export \
--all \
--output /backup/evidence-bundles-${BACKUP_TIMESTAMP}/
# Step 11: Configuration backup
kubectl get configmap -n stellaops -o yaml > /backup/configmaps-${BACKUP_TIMESTAMP}.yaml
kubectl get secret -n stellaops -o yaml > /backup/secrets-${BACKUP_TIMESTAMP}.yaml
Pre-Flight Approval
Complete this checklist before proceeding:
- Current version documented
- System health: GREEN
- Evidence baseline captured
- PostgreSQL backup completed and verified
- Evidence bundles exported
- Configuration backed up
- Maintenance window approved
- Stakeholders notified
- Rollback plan reviewed
Approver signature: __________________ Date: __________
Upgrade Execution
Deploy Green Environment
# Step 12: Create green namespace
kubectl create namespace stellaops-green
# Step 13: Copy secrets to green namespace
kubectl get secret stellaops-secrets -n stellaops -o yaml | \
sed 's/namespace: stellaops/namespace: stellaops-green/' | \
kubectl apply -f -
# Step 14: Deploy new version
helm upgrade stellaops-green ./helm/stellaops \
--namespace stellaops-green \
--values values-production.yaml \
--set image.tag=${TARGET_VERSION} \
--wait --timeout 10m
# Step 15: Verify deployment
kubectl get pods -n stellaops-green -w
# Wait for all pods to be Running and Ready
Run Migrations
# Step 16: Apply Category A migrations (startup)
stella system migrations-run \
--category A \
--namespace stellaops-green
# Step 17: Verify migration success
stella system migrations-status --namespace stellaops-green
# All migrations should show "Applied"
# Step 18: Apply Category B migrations if needed (manual)
# Review migration list first
stella system migrations-pending --category B
# Apply after review
stella system migrations-run \
--category B \
--namespace stellaops-green \
--confirm
Evidence Migration (If Required)
# Step 19: Check if evidence migration needed
stella evidence migrate --dry-run --namespace stellaops-green
# Step 20: If migration needed, execute
stella evidence migrate \
--namespace stellaops-green \
--batch-size 100 \
--progress
# Step 21: Verify evidence integrity post-migration
stella evidence verify-all \
--namespace stellaops-green \
--output /tmp/post-migration-evidence.json
Health Validation
# Step 22: Run health checks on green
stella doctor --full --namespace stellaops-green
# Step 23: Run smoke tests
stella test smoke --namespace stellaops-green
# Step 24: Verify critical paths
stella test critical-paths --namespace stellaops-green
Traffic Cutover
Gradual Cutover
# Step 25: Enable canary (10%)
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: stellaops-canary
namespace: stellaops-green
annotations:
nginx.ingress.kubernetes.io/canary: "true"
nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
ingressClassName: nginx
rules:
- host: stellaops.company.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: stellaops-api
port:
number: 80
EOF
# Step 26: Monitor for 15 minutes
# Check error rates, latency, evidence operations
# Step 27: Increase to 50%
kubectl patch ingress stellaops-canary -n stellaops-green \
--type='json' \
-p='[{"op": "replace", "path": "/metadata/annotations/nginx.ingress.kubernetes.io~1canary-weight", "value": "50"}]'
# Step 28: Monitor for 15 minutes
# Step 29: Complete cutover (100%)
kubectl patch ingress stellaops-canary -n stellaops-green \
--type='json' \
-p='[{"op": "replace", "path": "/metadata/annotations/nginx.ingress.kubernetes.io~1canary-weight", "value": "100"}]'
Monitoring During Cutover
Watch these dashboards:
- Grafana: Stella Ops Overview
- Grafana: Evidence Operations
- Grafana: Attestation Pipeline
Alert thresholds:
- Error rate > 1%: Pause cutover
- p99 latency > 5s: Investigate
- Evidence failures > 0: Rollback
Post-Upgrade Validation
Evidence Continuity Verification
# Step 30: Verify chain-of-custody
stella evidence verify-continuity \
--baseline /backup/pre-upgrade-evidence-baseline.json \
--output /reports/continuity-report.html
# Step 31: Verify Merkle roots
stella evidence verify-roots \
--baseline /backup/pre-upgrade-merkle-roots.json \
--output /reports/roots-verification.json
# Step 32: Compare evidence stats
stella evidence stats > /tmp/post-upgrade-evidence-stats.txt
diff /backup/pre-upgrade-evidence-stats.txt /tmp/post-upgrade-evidence-stats.txt
# Step 33: Generate audit report
stella evidence audit-report \
--since "${UPGRADE_START_TIME}" \
--format pdf \
--output /reports/upgrade-audit-$(date +%Y%m%d).pdf
Functional Validation
# Step 34: Full integration test
stella test integration --full
# Step 35: Scan test
stella scan \
--image registry.company.com/test-app:latest \
--sbom-format spdx-2.3
# Step 36: Attestation test
stella attest \
--subject sha256:test123 \
--predicate-type slsa-provenance
# Step 37: Policy evaluation test
stella policy evaluate \
--artifact sha256:test123 \
--environment production
Post-Upgrade Checklist
- Evidence continuity verified
- Merkle roots consistent
- All services healthy
- Integration tests passing
- Scan capability verified
- Attestation generation working
- Policy evaluation working
- No elevated error rates
- Latency within SLO
Validator signature: __________________ Date: __________
Rollback Procedures
Immediate Rollback (During Cutover)
# Revert canary to 0%
kubectl patch ingress stellaops-canary -n stellaops-green \
--type='json' \
-p='[{"op": "replace", "path": "/metadata/annotations/nginx.ingress.kubernetes.io~1canary-weight", "value": "0"}]'
# Or delete canary entirely
kubectl delete ingress stellaops-canary -n stellaops-green
Full Rollback (After Cutover)
# Step R1: Assess database state
stella system migrations-status
# Step R2: If migrations are backward-compatible
# Simply redeploy previous version
helm upgrade stellaops ./helm/stellaops \
--namespace stellaops \
--set image.tag=${PREVIOUS_VERSION} \
--wait
# Step R3: If database restore needed
# Stop all services first
kubectl scale deployment --all --replicas=0 -n stellaops
# Restore database
pg_restore -d stellaops -c /backup/stellaops-${BACKUP_TIMESTAMP}.dump
# Redeploy previous version
helm upgrade stellaops ./helm/stellaops \
--namespace stellaops \
--set image.tag=${PREVIOUS_VERSION} \
--wait
# Step R4: Verify rollback
stella doctor --full
stella evidence verify-all
Cleanup
After 72-Hour Observation
# Step 40: Verify stable operation
stella doctor --full
stella evidence verify-all
# Step 41: Remove blue environment
kubectl delete namespace stellaops-blue
# Step 42: Archive upgrade artifacts
tar -czf /archive/upgrade-${UPGRADE_TIMESTAMP}.tar.gz \
/backup/ \
/reports/ \
/tmp/pre-upgrade-*.txt
# Step 43: Update documentation
echo "${TARGET_VERSION}" > docs/CURRENT_VERSION.md
Appendix
Version-Specific Notes
See docs/releases/{version}/MIGRATION.md for version-specific migration notes.
Breaking Changes Matrix
| From | To | Breaking Changes | Migration Required |
|---|---|---|---|
| 2027.Q1 | 2027.Q2 | None | No |
| 2026.Q4 | 2027.Q1 | Policy schema v2 | Yes |
Support Contacts
- Platform Team: platform@company.com
- DBA Team: dba@company.com
- Security Team: security@company.com
- On-Call: +1-555-OPS-CALL