Blue/Green Deployment Guide

This guide documents the blue/green deployment strategy for Stella Ops platform upgrades, including how evidence continuity is preserved across the upgrade.

Overview

Blue/green deployment maintains two identical production environments:

  • Blue: Current production environment
  • Green: New version deployment target

This approach enables zero-downtime upgrades and instant rollback capability while preserving all evidence integrity.

Prerequisites

Infrastructure Requirements

| Component | Blue Environment | Green Environment |
| --- | --- | --- |
| Kubernetes namespace | stellaops-prod | stellaops-green |
| PostgreSQL | Shared (with migration support) | Shared |
| Redis/Valkey | Separate instance | Separate instance |
| Object Storage | Shared (evidence bundles) | Shared |
| Load Balancer | Traffic routing | Traffic routing |

Version Compatibility

Before upgrading, verify version compatibility:

# Check current version
stella version

# Check target version compatibility
stella upgrade check --target 2027.Q2

See docs/releases/VERSIONING.md for the full compatibility matrix.

Deployment Phases

Phase 1: Preparation

1.1 Environment Assessment

# Verify current health
stella doctor --full

# Check pending migrations
stella system migrations-status

# Verify evidence integrity baseline
stella evidence verify-all --output pre-upgrade-baseline.json

1.2 Backup Procedures

# PostgreSQL backup
pg_dump -Fc stellaops > backup-$(date +%Y%m%d-%H%M%S).dump

# Evidence bundle export
stella evidence export --all --output evidence-backup/

# Configuration backup
kubectl get configmap -n stellaops-prod -o yaml > configmaps-backup.yaml
kubectl get secret -n stellaops-prod -o yaml > secrets-backup.yaml
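
The pre-flight checklist asks for a backup that is "completed and verified", but the commands above only create it. The following is a minimal sketch of one way to verify the PostgreSQL dump. `verify_dump` is a hypothetical helper, not part of the stella CLI, and `sha256sum` assumes GNU coreutils. `pg_restore --list` only reads the archive's table of contents, so it does not need a database connection.

```shell
#!/usr/bin/env bash
# verify_dump is a hypothetical helper, not part of the stella CLI.
# It returns 0 only if the dump exists, is non-empty, and has a readable
# table of contents.
verify_dump() {
  local dump="$1"
  if [ ! -s "$dump" ]; then
    echo "backup missing or empty: $dump" >&2
    return 1
  fi
  # --list prints the archive's table of contents without connecting to a database.
  if ! pg_restore --list "$dump" > /dev/null; then
    echo "backup unreadable: $dump" >&2
    return 1
  fi
  # Record a checksum so the file can be re-checked before any restore.
  sha256sum "$dump" > "$dump.sha256"
}
```

A call such as `verify_dump backup-YYYYMMDD-HHMMSS.dump` would then gate the "Backup completed and verified" checklist item. Note that `secrets-backup.yaml` contains base64-encoded secret values and should be stored accordingly.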

1.3 Pre-Flight Checklist

  • All services healthy
  • No active scans or attestations in progress
  • Queue depths at zero
  • Backup completed and verified
  • Evidence baseline captured
  • Maintenance window communicated

Phase 2: Green Environment Deployment

2.1 Deploy New Version

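# Deploy to green namespace
# (--install creates the release on first run; --create-namespace only takes effect with --install)
helm upgrade --install stellaops-green ./helm/stellaops \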
  --namespace stellaops-green \
  --create-namespace \
  --values values-production.yaml \
  --set image.tag=2027.Q2 \
  --wait

# Verify deployment
kubectl get pods -n stellaops-green

2.2 Run Migrations

# Apply startup migrations (Category A)
stella system migrations-run --category A

# Verify migration status
stella system migrations-status

2.3 Health Validation

# Run health checks on green
stella doctor --full --namespace stellaops-green

# Run smoke tests
stella test smoke --namespace stellaops-green

Phase 3: Traffic Cutover

3.1 Gradual Cutover

# Update ingress for gradual traffic shift
# ingress-canary.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: stellaops-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"  # Start with 10%
spec:
  rules:
  - host: stellaops.company.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: stellaops-green
            port:
              number: 80

Increase weight gradually: 10% -> 25% -> 50% -> 100%
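
The weight progression above can be scripted rather than edited by hand. This is a sketch under assumptions: the helper names are hypothetical, the namespace of the `stellaops-canary` ingress is passed in because the manifest above does not state it, and the hold time between steps is illustrative. `kubectl annotate --overwrite` is the standard way to change an existing annotation.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for stepping the canary weight; names are illustrative.

# Print the weight that follows the given one in the 10 -> 25 -> 50 -> 100 sequence.
next_weight() {
  case "$1" in
    10) echo 25 ;;
    25) echo 50 ;;
    50) echo 100 ;;
    *)  return 1 ;;   # 100 (or an unknown value) has no next step
  esac
}

# Set the canary-weight annotation on the canary ingress.
# Usage: set_canary_weight <namespace> <weight>
set_canary_weight() {
  kubectl annotate ingress stellaops-canary \
    --namespace "$1" \
    --overwrite \
    "nginx.ingress.kubernetes.io/canary-weight=$2"
}

# Walk from the initial 10% to 100%, pausing between steps.
# Usage: ramp_canary <namespace> [hold-seconds]  (the 600 s default is an assumption)
ramp_canary() {
  local ns="$1" hold="${2:-600}" weight=10
  while weight=$(next_weight "$weight"); do
    set_canary_weight "$ns" "$weight"
    sleep "$hold"
  done
}
```

In practice each step should be gated on the cutover metrics rather than on a fixed sleep; the pause here only marks where that check would go.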

3.2 Instant Cutover

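# Point the main ingress backend at the green service.
# An Ingress can only reference a Service in its own namespace, so a Service
# named stellaops-green must be resolvable from stellaops-prod.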
kubectl patch ingress stellaops-main \
  -n stellaops-prod \
  --type='json' \
  -p='[{"op": "replace", "path": "/spec/rules/0/http/paths/0/backend/service/name", "value": "stellaops-green"}]'
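
Because cutover and rollback differ only in the service name, the patch can be generated from one place and the result read back afterwards. A minimal sketch, assuming the same ingress name, namespace, and JSON path as the command above; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the ingress patch; names are illustrative.

# Build the JSON patch that points the first path of the first rule at a service.
backend_patch() {
  printf '[{"op": "replace", "path": "/spec/rules/0/http/paths/0/backend/service/name", "value": "%s"}]' "$1"
}

# Apply the patch to the main ingress.
switch_backend() {
  kubectl patch ingress stellaops-main \
    -n stellaops-prod \
    --type=json \
    -p "$(backend_patch "$1")"
}

# Print the service name the main ingress currently routes to.
current_backend() {
  kubectl get ingress stellaops-main \
    -n stellaops-prod \
    -o jsonpath='{.spec.rules[0].http.paths[0].backend.service.name}'
}
```

After `switch_backend stellaops-green`, `current_backend` should print `stellaops-green`; calling `switch_backend stellaops-blue` performs the same operation as the immediate rollback command.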

3.3 Monitoring During Cutover

Monitor these metrics during cutover:

  • Error rate: rate(http_requests_total{status=~"5.."}[1m])
  • Latency p99: histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[1m])))
  • Evidence operations: rate(evidence_operations_total[1m])
  • Attestation success: rate(attestation_success_total[1m])
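
These metrics can also act as an automated gate between traffic steps. The sketch below queries the Prometheus HTTP API (`/api/v1/query`) and compares the 5xx ratio with a threshold. The `PROM_URL` default, the 1% threshold, and the function names are assumptions for illustration, and the script depends on `curl`, `jq`, and `awk` being installed.

```shell
#!/usr/bin/env bash
# PROM_URL is an assumption; point it at your Prometheus server.
PROM_URL="${PROM_URL:-http://prometheus:9090}"

# Run an instant query and print the first sample value (empty if no series match).
prom_query() {
  curl -fsS --get "$PROM_URL/api/v1/query" --data-urlencode "query=$1" |
    jq -r '.data.result[0].value[1] // empty'
}

# Succeed if value > threshold (floating-point comparison via awk).
exceeds() {
  awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 > t + 0) }'
}

# Fail if the share of 5xx responses over the last minute is above the threshold.
# Usage: check_error_ratio [threshold]  (the 0.01 default is an assumption)
check_error_ratio() {
  local ratio
  ratio=$(prom_query 'sum(rate(http_requests_total{status=~"5.."}[1m])) / sum(rate(http_requests_total[1m]))')
  if [ -n "$ratio" ] && exceeds "$ratio" "${1:-0.01}"; then
    echo "error ratio $ratio exceeds threshold" >&2
    return 1
  fi
}
```

A failing `check_error_ratio` is a reasonable trigger for the immediate rollback procedure.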

Phase 4: Post-Upgrade Validation

4.1 Evidence Continuity Verification

# Verify evidence chain-of-custody
stella evidence verify-continuity \
  --baseline pre-upgrade-baseline.json \
  --output post-upgrade-report.html

# Generate audit report
stella evidence audit-report \
  --since "$(date -d '1 hour ago' --iso-8601=seconds)" \
  --output upgrade-audit.pdf

4.2 Functional Validation

# Run full test suite
stella test integration

# Verify scan capability
stella scan --image test-image:latest --dry-run

# Verify an attestation bundle
stella attest verify --bundle test-bundle.tar.gz

4.3 Documentation Update

  • Update CURRENT_VERSION.md with new version
  • Record upgrade in CHANGELOG.md
  • Archive upgrade artifacts

Phase 5: Cleanup

5.1 Observation Period

Keep the blue environment running for at least 72 hours after cutover before decommissioning it.
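
The 72-hour minimum can be enforced with a small guard in front of the decommission commands. A minimal sketch, assuming the cutover time was recorded as a Unix timestamp; the function name and the `cutover.epoch` file mentioned below are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed once at least 72 hours have passed since cutover.
# Arguments are Unix timestamps in seconds; "now" defaults to the current time.
observation_complete() {
  local cutover="$1" now="${2:-$(date +%s)}"
  local min_seconds=$((72 * 3600))
  [ $((now - cutover)) -ge "$min_seconds" ]
}
```

For example, writing `date +%s > cutover.epoch` at cutover time allows a later `observation_complete "$(cat cutover.epoch)"` check before any blue resources are removed.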

5.2 Blue Environment Decommission

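# After the observation period, remove the blue release
helm uninstall stellaops-blue -n stellaops-prod

# Clean up remaining blue resources.
# Note: the infrastructure table places blue in stellaops-prod, which also hosts
# the stellaops-main ingress. Delete a namespace only if blue runs in a dedicated one.
kubectl delete namespace stellaops-blue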

Rollback Procedures

Immediate Rollback (During Cutover)

# Revert traffic to blue
kubectl patch ingress stellaops-main \
  -n stellaops-prod \
  --type='json' \
  -p='[{"op": "replace", "path": "/spec/rules/0/http/paths/0/backend/service/name", "value": "stellaops-blue"}]'

Post-Cutover Rollback

If a rollback is needed after cutover is complete:

  1. Assess impact: Run stella evidence verify-continuity to check evidence state
  2. Database considerations: Backward-compatible migrations allow rollback; breaking migrations require restore
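  3. Evidence preservation: Evidence bundles created during green operation remain valid

# If a database rollback is needed (breaking migrations), restore the pre-upgrade dump.
# --clean --if-exists drops existing objects before recreating them from the dump.
pg_restore --clean --if-exists -d stellaops backup-YYYYMMDD-HHMMSS.dump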

# Redeploy blue version
helm upgrade stellaops ./helm/stellaops \
  --namespace stellaops-prod \
  --set image.tag=2027.Q1 \
  --wait

Evidence Continuity Guarantees

Preserved During Upgrade

| Artifact | Guarantee |
| --- | --- |
| OCI digests | Unchanged |
| SBOM content hashes | Unchanged |
| Merkle roots | Recomputed if schema changes (cross-reference maintained) |
| Attestation signatures | Valid |
| Rekor log entries | Immutable |

Verification Commands

# Verify OCI digests unchanged
stella evidence verify-digests --report digests.json

# Verify attestation validity
stella attest verify-all --since $(date -d '7 days ago' --iso-8601)

# Generate compliance report
stella evidence compliance-report --format pdf

Troubleshooting

Common Issues

| Issue | Symptom | Resolution |
| --- | --- | --- |
| Migration timeout | Pod stuck in init | Increase migrationTimeoutSeconds |
| Health check failure | Ready probe failing | Check database connectivity |
| Evidence mismatch | Continuity check fails | Run stella evidence reindex |
| Traffic not routing | 502 errors | Verify service selector labels |
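
For the "Traffic not routing" case, one quick check is whether the Service has any ready endpoints at all. The helper below is a hypothetical sketch using standard `kubectl get` calls; the namespace and service name are passed in because they depend on how the chart names its Services.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the pod IPs behind a Service and fail if there are none.
# An empty list usually means the Service selector matches no ready pods.
# Usage: service_endpoints <namespace> <service>
service_endpoints() {
  local ns="$1" svc="$2" ips
  ips=$(kubectl get endpoints "$svc" --namespace "$ns" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')
  if [ -z "$ips" ]; then
    echo "no ready endpoints for $svc in $ns" >&2
    # Show the selector so it can be compared with the pod labels.
    kubectl get service "$svc" --namespace "$ns" -o jsonpath='{.spec.selector}' >&2
    echo >&2
    return 1
  fi
  echo "$ips"
}
```

If the selector printed on failure does not match the labels shown by `kubectl get pods --show-labels` in the same namespace, the ingress has nothing to route to and returns 502.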

Support Escalation

If upgrade issues cannot be resolved:

  1. Capture diagnostics: stella doctor --export diagnostics.tar.gz
  2. Rollback to blue environment
  3. Contact support with diagnostics bundle