382 lines
9.5 KiB
Markdown
382 lines
9.5 KiB
Markdown
# Stella Ops Upgrade Runbook
|
|
|
|
This runbook provides step-by-step procedures for upgrading Stella Ops with evidence continuity preservation.
|
|
|
|
## Quick Reference
|
|
|
|
| Phase | Duration | Owner | Rollback Point |
|
|
|-------|----------|-------|----------------|
|
|
| Pre-Upgrade | 2-4 hours | Platform Team | N/A |
|
|
| Backup | 1-2 hours | DBA | Full restore |
|
|
| Deploy Green | 30-60 min | Platform Team | Abort deploy |
|
|
| Cutover | 15-30 min | Platform Team | Instant rollback |
|
|
| Validation | 1-2 hours | QA + Security | 72h observation |
|
|
| Cleanup | 30 min | Platform Team | N/A |
|
|
|
|
## Pre-Upgrade Checklist
|
|
|
|
### Environment Verification
|
|
|
|
```bash
|
|
# Step 1: Record current version
|
|
stella version > /tmp/pre-upgrade-version.txt
|
|
echo "Current version: $(cat /tmp/pre-upgrade-version.txt)"
|
|
|
|
# Step 2: Verify system health
|
|
stella doctor --full --output /tmp/pre-upgrade-health.json
|
|
if [ $? -ne 0 ]; then
|
|
echo "ABORT: System health check failed"
|
|
exit 1
|
|
fi
|
|
|
|
# Step 3: Check pending migrations
|
|
stella system migrations-status
|
|
# Ensure no pending migrations before upgrade
|
|
|
|
# Step 4: Verify queue depths
|
|
stella queue status --all
|
|
# All queues should be empty or near-empty
|
|
```
|
|
|
|
### Evidence Integrity Baseline
|
|
|
|
```bash
|
|
# Step 5: Capture evidence baseline
|
|
stella evidence verify-all \
|
|
--output /backup/pre-upgrade-evidence-baseline.json \
|
|
--include-merkle-roots
|
|
|
|
# Step 6: Export Merkle root summary
|
|
stella evidence roots-export \
|
|
--output /backup/pre-upgrade-merkle-roots.json
|
|
|
|
# Step 7: Record evidence counts
|
|
stella evidence stats > /backup/pre-upgrade-evidence-stats.txt
|
|
```
|
|
|
|
### Backup Procedures
|
|
|
|
```bash
|
|
# Step 8: PostgreSQL backup
|
|
BACKUP_TIMESTAMP=$(date +%Y%m%d-%H%M%S)
|
|
pg_dump -Fc -d stellaops -f /backup/stellaops-${BACKUP_TIMESTAMP}.dump
|
|
|
|
# Step 9: Verify backup integrity
|
|
pg_restore --list /backup/stellaops-${BACKUP_TIMESTAMP}.dump > /dev/null
|
|
if [ $? -ne 0 ]; then
|
|
echo "ABORT: Backup verification failed"
|
|
exit 1
|
|
fi
|
|
|
|
# Step 10: Evidence bundle backup
|
|
stella evidence export \
|
|
--all \
|
|
--output /backup/evidence-bundles-${BACKUP_TIMESTAMP}/
|
|
|
|
# Step 11: Configuration backup
|
|
kubectl get configmap -n stellaops -o yaml > /backup/configmaps-${BACKUP_TIMESTAMP}.yaml
|
|
kubectl get secret -n stellaops -o yaml > /backup/secrets-${BACKUP_TIMESTAMP}.yaml
|
|
```
|
|
|
|
### Pre-Flight Approval
|
|
|
|
Complete this checklist before proceeding:
|
|
|
|
- [ ] Current version documented
|
|
- [ ] System health: GREEN
|
|
- [ ] Evidence baseline captured
|
|
- [ ] PostgreSQL backup completed and verified
|
|
- [ ] Evidence bundles exported
|
|
- [ ] Configuration backed up
|
|
- [ ] Maintenance window approved
|
|
- [ ] Stakeholders notified
|
|
- [ ] Rollback plan reviewed
|
|
|
|
**Approver signature**: __________________ **Date**: __________
|
|
|
|
## Upgrade Execution
|
|
|
|
### Deploy Green Environment
|
|
|
|
```bash
|
|
# Step 12: Create green namespace
|
|
kubectl create namespace stellaops-green
|
|
|
|
# Step 13: Copy secrets to green namespace
|
|
kubectl get secret stellaops-secrets -n stellaops -o yaml | \
|
|
sed 's/namespace: stellaops/namespace: stellaops-green/' | \
|
|
kubectl apply -f -
|
|
|
|
# Step 14: Deploy new version
|
|
helm upgrade stellaops-green ./helm/stellaops \
|
|
--namespace stellaops-green \
|
|
--values values-production.yaml \
|
|
--set image.tag=${TARGET_VERSION} \
|
|
--wait --timeout 10m
|
|
|
|
# Step 15: Verify deployment
|
|
kubectl get pods -n stellaops-green -w
|
|
# Wait for all pods to be Running and Ready
|
|
```
|
|
|
|
### Run Migrations
|
|
|
|
```bash
|
|
# Step 16: Apply Category A migrations (startup)
|
|
stella system migrations-run \
|
|
--category A \
|
|
--namespace stellaops-green
|
|
|
|
# Step 17: Verify migration success
|
|
stella system migrations-status --namespace stellaops-green
|
|
# All migrations should show "Applied"
|
|
|
|
# Step 18: Apply Category B migrations if needed (manual)
|
|
# Review migration list first
|
|
stella system migrations-pending --category B
|
|
|
|
# Apply after review
|
|
stella system migrations-run \
|
|
--category B \
|
|
--namespace stellaops-green \
|
|
--confirm
|
|
```
|
|
|
|
### Evidence Migration (If Required)
|
|
|
|
```bash
|
|
# Step 19: Check if evidence migration needed
|
|
stella evidence migrate --dry-run --namespace stellaops-green
|
|
|
|
# Step 20: If migration needed, execute
|
|
stella evidence migrate \
|
|
--namespace stellaops-green \
|
|
--batch-size 100 \
|
|
--progress
|
|
|
|
# Step 21: Verify evidence integrity post-migration
|
|
stella evidence verify-all \
|
|
--namespace stellaops-green \
|
|
--output /tmp/post-migration-evidence.json
|
|
```
|
|
|
|
### Health Validation
|
|
|
|
```bash
|
|
# Step 22: Run health checks on green
|
|
stella doctor --full --namespace stellaops-green
|
|
|
|
# Step 23: Run smoke tests
|
|
stella test smoke --namespace stellaops-green
|
|
|
|
# Step 24: Verify critical paths
|
|
stella test critical-paths --namespace stellaops-green
|
|
```
|
|
|
|
## Traffic Cutover
|
|
|
|
### Gradual Cutover
|
|
|
|
```bash
|
|
# Step 25: Enable canary (10%)
|
|
kubectl apply -f - <<EOF
|
|
apiVersion: networking.k8s.io/v1
|
|
kind: Ingress
|
|
metadata:
|
|
name: stellaops-canary
|
|
namespace: stellaops-green
|
|
annotations:
|
|
nginx.ingress.kubernetes.io/canary: "true"
|
|
nginx.ingress.kubernetes.io/canary-weight: "10"
|
|
spec:
|
|
ingressClassName: nginx
|
|
rules:
|
|
- host: stellaops.company.com
|
|
http:
|
|
paths:
|
|
- path: /
|
|
pathType: Prefix
|
|
backend:
|
|
service:
|
|
name: stellaops-api
|
|
port:
|
|
number: 80
|
|
EOF
|
|
|
|
# Step 26: Monitor for 15 minutes
|
|
# Check error rates, latency, evidence operations
|
|
|
|
# Step 27: Increase to 50%
|
|
kubectl patch ingress stellaops-canary -n stellaops-green \
|
|
--type='json' \
|
|
-p='[{"op": "replace", "path": "/metadata/annotations/nginx.ingress.kubernetes.io~1canary-weight", "value": "50"}]'
|
|
|
|
# Step 28: Monitor for 15 minutes
|
|
|
|
# Step 29: Complete cutover (100%)
|
|
kubectl patch ingress stellaops-canary -n stellaops-green \
|
|
--type='json' \
|
|
-p='[{"op": "replace", "path": "/metadata/annotations/nginx.ingress.kubernetes.io~1canary-weight", "value": "100"}]'
|
|
```
|
|
|
|
### Monitoring During Cutover
|
|
|
|
Watch these dashboards:
|
|
- Grafana: Stella Ops Overview
|
|
- Grafana: Evidence Operations
|
|
- Grafana: Attestation Pipeline
|
|
|
|
Alert thresholds:
|
|
- Error rate > 1%: Pause cutover
|
|
- p99 latency > 5s: Investigate
|
|
- Evidence failures > 0: Rollback
|
|
|
|
## Post-Upgrade Validation
|
|
|
|
### Evidence Continuity Verification
|
|
|
|
```bash
|
|
# Step 30: Verify chain-of-custody
|
|
stella evidence verify-continuity \
|
|
--baseline /backup/pre-upgrade-evidence-baseline.json \
|
|
--output /reports/continuity-report.html
|
|
|
|
# Step 31: Verify Merkle roots
|
|
stella evidence verify-roots \
|
|
--baseline /backup/pre-upgrade-merkle-roots.json \
|
|
--output /reports/roots-verification.json
|
|
|
|
# Step 32: Compare evidence stats
|
|
stella evidence stats > /tmp/post-upgrade-evidence-stats.txt
|
|
diff /backup/pre-upgrade-evidence-stats.txt /tmp/post-upgrade-evidence-stats.txt
|
|
|
|
# Step 33: Generate audit report
|
|
stella evidence audit-report \
|
|
--since "${UPGRADE_START_TIME}" \
|
|
--format pdf \
|
|
--output /reports/upgrade-audit-$(date +%Y%m%d).pdf
|
|
```
|
|
|
|
### Functional Validation
|
|
|
|
```bash
|
|
# Step 34: Full integration test
|
|
stella test integration --full
|
|
|
|
# Step 35: Scan test
|
|
stella scan \
|
|
--image registry.company.com/test-app:latest \
|
|
--sbom-format spdx-2.3
|
|
|
|
# Step 36: Attestation test
|
|
stella attest \
|
|
--subject sha256:test123 \
|
|
--predicate-type slsa-provenance
|
|
|
|
# Step 37: Policy evaluation test
|
|
stella policy evaluate \
|
|
--artifact sha256:test123 \
|
|
--environment production
|
|
```
|
|
|
|
### Post-Upgrade Checklist
|
|
|
|
- [ ] Evidence continuity verified
|
|
- [ ] Merkle roots consistent
|
|
- [ ] All services healthy
|
|
- [ ] Integration tests passing
|
|
- [ ] Scan capability verified
|
|
- [ ] Attestation generation working
|
|
- [ ] Policy evaluation working
|
|
- [ ] No elevated error rates
|
|
- [ ] Latency within SLO
|
|
|
|
**Validator signature**: __________________ **Date**: __________
|
|
|
|
## Rollback Procedures
|
|
|
|
### Immediate Rollback (During Cutover)
|
|
|
|
```bash
|
|
# Revert canary to 0%
|
|
kubectl patch ingress stellaops-canary -n stellaops-green \
|
|
--type='json' \
|
|
-p='[{"op": "replace", "path": "/metadata/annotations/nginx.ingress.kubernetes.io~1canary-weight", "value": "0"}]'
|
|
|
|
# Or delete canary entirely
|
|
kubectl delete ingress stellaops-canary -n stellaops-green
|
|
```
|
|
|
|
### Full Rollback (After Cutover)
|
|
|
|
```bash
|
|
# Step R1: Assess database state
|
|
stella system migrations-status
|
|
|
|
# Step R2: If migrations are backward-compatible
|
|
# Simply redeploy previous version
|
|
helm upgrade stellaops ./helm/stellaops \
|
|
--namespace stellaops \
|
|
--set image.tag=${PREVIOUS_VERSION} \
|
|
--wait
|
|
|
|
# Step R3: If database restore needed
|
|
# Stop all services first
|
|
kubectl scale deployment --all --replicas=0 -n stellaops
|
|
|
|
# Restore database
|
|
pg_restore -d stellaops -c /backup/stellaops-${BACKUP_TIMESTAMP}.dump
|
|
|
|
# Redeploy previous version
|
|
helm upgrade stellaops ./helm/stellaops \
|
|
--namespace stellaops \
|
|
--set image.tag=${PREVIOUS_VERSION} \
|
|
--wait
|
|
|
|
# Step R4: Verify rollback
|
|
stella doctor --full
|
|
stella evidence verify-all
|
|
```
|
|
|
|
## Cleanup
|
|
|
|
### After 72-Hour Observation
|
|
|
|
```bash
|
|
# Step 40: Verify stable operation
|
|
stella doctor --full
|
|
stella evidence verify-all
|
|
|
|
# Step 41: Remove blue environment
|
|
kubectl delete namespace stellaops-blue
|
|
|
|
# Step 42: Archive upgrade artifacts
|
|
tar -czf /archive/upgrade-${UPGRADE_TIMESTAMP}.tar.gz \
|
|
/backup/ \
|
|
/reports/ \
|
|
/tmp/pre-upgrade-*.txt
|
|
|
|
# Step 43: Update documentation
|
|
echo "${TARGET_VERSION}" > docs/CURRENT_VERSION.md
|
|
```
|
|
|
|
## Appendix
|
|
|
|
### Version-Specific Notes
|
|
|
|
See `docs/releases/{version}/MIGRATION.md` for version-specific migration notes.
|
|
|
|
### Breaking Changes Matrix
|
|
|
|
| From | To | Breaking Changes | Migration Required |
|
|
|------|-----|-----------------|-------------------|
|
|
| 2027.Q1 | 2027.Q2 | None | No |
|
|
| 2026.Q4 | 2027.Q1 | Policy schema v2 | Yes |
|
|
|
|
### Support Contacts
|
|
|
|
- Platform Team: platform@company.com
|
|
- DBA Team: dba@company.com
|
|
- Security Team: security@company.com
|
|
- On-Call: +1-555-OPS-CALL
|