Files
git.stella-ops.org/docs/operations/blue-green-deployment.md

295 lines
7.3 KiB
Markdown

# Blue/Green Deployment Guide
This guide documents the blue/green deployment strategy for Stella Ops platform upgrades with evidence continuity preservation.
## Overview
Blue/green deployment maintains two identical production environments:
- **Blue**: Current production environment
- **Green**: New version deployment target
This approach enables zero-downtime upgrades and instant rollback capability while preserving all evidence integrity.
## Prerequisites
### Infrastructure Requirements
| Component | Blue Environment | Green Environment |
|-----------|-----------------|-------------------|
| Kubernetes namespace | `stellaops-prod` | `stellaops-green` |
| PostgreSQL | Shared (with migration support) | Shared |
| Redis/Valkey | Separate instance | Separate instance |
| Object Storage | Shared (evidence bundles) | Shared |
| Load Balancer | Traffic routing | Traffic routing |
### Version Compatibility
Before upgrading, verify version compatibility:
```bash
# Check current version
stella version
# Check target version compatibility
stella upgrade check --target 2027.Q2
```
See `docs/releases/VERSIONING.md` for the full compatibility matrix.
## Deployment Phases
### Phase 1: Preparation
#### 1.1 Environment Assessment
```bash
# Verify current health
stella doctor --full
# Check pending migrations
stella system migrations-status
# Verify evidence integrity baseline
stella evidence verify-all --output pre-upgrade-baseline.json
```
#### 1.2 Backup Procedures
```bash
# PostgreSQL backup
pg_dump -Fc stellaops > backup-$(date +%Y%m%d-%H%M%S).dump
# Evidence bundle export
stella evidence export --all --output evidence-backup/
# Configuration backup
kubectl get configmap -n stellaops-prod -o yaml > configmaps-backup.yaml
kubectl get secret -n stellaops-prod -o yaml > secrets-backup.yaml
```
#### 1.3 Pre-Flight Checklist
- [ ] All services healthy
- [ ] No active scans or attestations in progress
- [ ] Queue depths at zero
- [ ] Backup completed and verified
- [ ] Evidence baseline captured
- [ ] Maintenance window communicated
### Phase 2: Green Environment Deployment
#### 2.1 Deploy New Version
```bash
# Deploy to green namespace
helm upgrade stellaops-green ./helm/stellaops \
--namespace stellaops-green \
--create-namespace \
--values values-production.yaml \
--set image.tag=2027.Q2 \
--wait
# Verify deployment
kubectl get pods -n stellaops-green
```
#### 2.2 Run Migrations
```bash
# Apply startup migrations (Category A)
stella system migrations-run --category A
# Verify migration status
stella system migrations-status
```
#### 2.3 Health Validation
```bash
# Run health checks on green
stella doctor --full --namespace stellaops-green
# Run smoke tests
stella test smoke --namespace stellaops-green
```
### Phase 3: Traffic Cutover
#### 3.1 Gradual Cutover (Recommended)
```yaml
# Update ingress for gradual traffic shift
# ingress-canary.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: stellaops-canary
annotations:
nginx.ingress.kubernetes.io/canary: "true"
nginx.ingress.kubernetes.io/canary-weight: "10" # Start with 10%
spec:
rules:
- host: stellaops.company.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: stellaops-green
port:
number: 80
```
Increase weight gradually: 10% -> 25% -> 50% -> 100%
#### 3.2 Instant Cutover
```bash
# Switch DNS/load balancer to green
kubectl patch ingress stellaops-main \
-n stellaops-prod \
--type='json' \
-p='[{"op": "replace", "path": "/spec/rules/0/http/paths/0/backend/service/name", "value": "stellaops-green"}]'
```
#### 3.3 Monitoring During Cutover
Monitor these metrics during cutover:
- Error rate: `rate(http_requests_total{status=~"5.."}[1m])`
- Latency p99: `histogram_quantile(0.99, http_request_duration_seconds_bucket)`
- Evidence operations: `rate(evidence_operations_total[1m])`
- Attestation success: `rate(attestation_success_total[1m])`
### Phase 4: Post-Upgrade Validation
#### 4.1 Evidence Continuity Verification
```bash
# Verify evidence chain-of-custody
stella evidence verify-continuity \
--baseline pre-upgrade-baseline.json \
--output post-upgrade-report.html
# Generate audit report
stella evidence audit-report \
--since $(date -d '1 hour ago' --iso-8601) \
--output upgrade-audit.pdf
```
#### 4.2 Functional Validation
```bash
# Run full test suite
stella test integration
# Verify scan capability
stella scan --image test-image:latest --dry-run
# Verify attestation generation
stella attest verify --bundle test-bundle.tar.gz
```
#### 4.3 Documentation Update
- Update `CURRENT_VERSION.md` with new version
- Record upgrade in `CHANGELOG.md`
- Archive upgrade artifacts
### Phase 5: Cleanup
#### 5.1 Observation Period
Maintain blue environment for 72 hours minimum before decommission.
#### 5.2 Blue Environment Decommission
```bash
# After observation period, remove blue
helm uninstall stellaops-blue -n stellaops-prod
# Clean up resources
kubectl delete namespace stellaops-blue
```
## Rollback Procedures
### Immediate Rollback (During Cutover)
```bash
# Revert traffic to blue
kubectl patch ingress stellaops-main \
-n stellaops-prod \
--type='json' \
-p='[{"op": "replace", "path": "/spec/rules/0/http/paths/0/backend/service/name", "value": "stellaops-blue"}]'
```
### Post-Cutover Rollback
If rollback needed after cutover complete:
1. **Assess impact**: Run `stella evidence verify-continuity` to check evidence state
2. **Database considerations**: Backward-compatible migrations allow rollback; breaking migrations require restore
3. **Evidence preservation**: Evidence bundles created during green operation remain valid
```bash
# If database rollback needed
pg_restore -d stellaops backup-YYYYMMDD-HHMMSS.dump
# Redeploy blue version
helm upgrade stellaops ./helm/stellaops \
--namespace stellaops-prod \
--set image.tag=2027.Q1 \
--wait
```
## Evidence Continuity Guarantees
### Preserved During Upgrade
| Artifact | Guarantee |
|----------|-----------|
| OCI digests | Unchanged |
| SBOM content hashes | Unchanged |
| Merkle roots | Recomputed if schema changes (cross-reference maintained) |
| Attestation signatures | Valid |
| Rekor log entries | Immutable |
### Verification Commands
```bash
# Verify OCI digests unchanged
stella evidence verify-digests --report digests.json
# Verify attestation validity
stella attest verify-all --since $(date -d '7 days ago' --iso-8601)
# Generate compliance report
stella evidence compliance-report --format pdf
```
## Troubleshooting
### Common Issues
| Issue | Symptom | Resolution |
|-------|---------|------------|
| Migration timeout | Pod stuck in init | Increase `migrationTimeoutSeconds` |
| Health check failure | Ready probe failing | Check database connectivity |
| Evidence mismatch | Continuity check fails | Run `stella evidence reindex` |
| Traffic not routing | 502 errors | Verify service selector labels |
### Support Escalation
If upgrade issues cannot be resolved:
1. Capture diagnostics: `stella doctor --export diagnostics.tar.gz`
2. Rollback to blue environment
3. Contact support with diagnostics bundle
## Related Documentation
- [Upgrade Runbook](upgrade-runbook.md)
- [Evidence Migration](evidence-migration.md)
- [Database Migration Strategy](../db/MIGRATION_STRATEGY.md)
- [Versioning Policy](../releases/VERSIONING.md)