Launch Cutover Runbook - Stella Ops
Document owner: DevOps Guild (2025-10-26)
Scope: Full-platform launch from staging to production for release 2025.09.2.
1. Roles and Communication
| Role | Primary | Backup | Contact | 
|---|---|---|---|
| Cutover lead | DevOps Guild (on-call engineer) | Platform Ops lead | #launch-bridge (Mattermost) | 
| Authority stack | Authority Core guild rep | Security guild rep | #authority | 
| Scanner / Queue | Scanner WebService guild rep | Runtime guild rep | #scanner | 
| Storage | Mongo/MinIO operators | Backup DB admin | Pager escalation | 
| Observability | Telemetry guild rep | SRE on-call | #telemetry | 
| Approvals | Product owner + CTO | DevOps lead | Approval recorded in change ticket | 
Set up a bridge call 30 minutes before start and keep #launch-bridge updated every 10 minutes.
2. Timeline Overview (UTC)
| Time | Activity | Owner | 
|---|---|---|
| T-24h | Change ticket approved, prod secrets verified, offline kit build status checked (DEVOPS-OFFLINE-18-005). | DevOps lead | 
| T-12h | Run deploy/tools/validate-profiles.sh; capture logs in ticket. | DevOps engineer | 
| T-6h | Freeze non-launch deployments; notify guild leads. | Product owner | 
| T-2h | Execute rehearsal in staging (Section 3) using values-stage.yaml to verify scripts. | DevOps + module reps | 
| T-30m | Final go/no-go with guild leads; confirm monitoring dashboards green. | Cutover lead | 
| T0 | Execute production cutover steps (Section 4). | Cutover team | 
| T+45m | Smoke tests complete (Section 5); announce success or trigger rollback. | Cutover lead | 
| T+4h | Post-cutover metrics review, notify stakeholders, close ticket. | DevOps + product owner | 
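Once T0 is fixed, the offsets in the table translate into concrete UTC times. The sketch below shows one way to print them. The function names are hypothetical, and it assumes a `date` that accepts `-d @<epoch>` (GNU coreutils or BusyBox).

```shell
# Print the UTC time of a milestone, given T0 as epoch seconds and an
# offset in minutes (negative means before T0).
# Assumes GNU or BusyBox date, which accept "-d @<epoch>".
milestone_utc() {
  local t0_epoch="$1" offset_min="$2"
  date -u -d "@$(( t0_epoch + offset_min * 60 ))" +%Y-%m-%dT%H:%MZ
}

# Print every milestone from the timeline table for a given T0.
print_timeline() {
  local t0_epoch="$1" entry
  for entry in "T-24h:-1440" "T-12h:-720" "T-6h:-360" "T-2h:-120" \
               "T-30m:-30" "T0:0" "T+45m:45" "T+4h:240"; do
    printf '%-6s %s\n' "${entry%%:*}" \
      "$(milestone_utc "$t0_epoch" "${entry#*:}")"
  done
}
```

Calling `print_timeline "$(date -u +%s)"` treats the current moment as T0; substitute the agreed cutover time in practice.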
3. Rehearsal (Staging) Checklist
- docker network create stellaops_frontdoor || true (if not present on staging jump host).
- Run deploy/tools/validate-profiles.sh and archive output.
- Apply staging secrets (kubectl apply -f secrets/stage/*.yaml or helm secrets upgrade), ensuring stellaops-stage credentials align with values-stage.yaml.
- Perform helm upgrade stellaops deploy/helm/stellaops -f deploy/helm/stellaops/values-stage.yaml in staging cluster.
- Verify health endpoints: curl https://authority.stage.../healthz, curl https://scanner.stage.../healthz.
- Execute smoke CLI: stellaops-cli scan submit --profile staging --sbom samples/sbom/demo.json and confirm report status in UI.
- Document total wall time and any deviations in the rehearsal log.
Rehearsal must complete without manual interventions before proceeding to production.
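Services can take a few seconds to come up after the helm upgrade, so a single curl against a health endpoint may fail spuriously. A minimal polling sketch follows. `wait_healthy` is a hypothetical helper, and the URL is passed in because the staging hostnames are elided in this runbook.

```shell
# Poll a health URL until it answers with a 2xx status, or give up.
# Usage: wait_healthy <url> [attempts] [delay_seconds]
wait_healthy() {
  local url="$1" attempts="${2:-10}" delay="${3:-5}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors (4xx/5xx).
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      echo "healthy: $url"
      return 0
    fi
    # Wait between attempts, but not after the final one.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  echo "unhealthy after $attempts attempts: $url" >&2
  return 1
}
```

Recording the number of attempts needed in the rehearsal log gives a rough startup-time baseline for production.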
4. Production Cutover Steps
4.1 Pre-flight
- Confirm production secrets in the appropriate secret store (stellaops-prod-core, stellaops-prod-mongo, stellaops-prod-minio, stellaops-prod-notify) contain the keys referenced in values-prod.yaml.
- Ensure the external reverse proxy network exists: docker network create stellaops_frontdoor || true on each compose host.
- Back up current configuration and data:
  - Mongo snapshot: mongodump --uri "$MONGO_BACKUP_URI" --out /backups/launch-$(date -Iseconds).
  - MinIO bucket mirror: mc mirror --overwrite minio/stellaops minio-backup/stellaops-$(date +%Y%m%d%H%M).
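The two backup commands use different timestamp formats, which makes it harder to pair the right Mongo dump with the right MinIO mirror during a rollback. One option, sketched below, is to generate a single stamp and reuse it for both. The function names and the `backup_root` parameter are hypothetical; the mongodump and mc invocations are the ones from this section.

```shell
# Produce one UTC stamp without colons, safe in paths and bucket prefixes.
backup_stamp() {
  date -u +%Y%m%dT%H%M%SZ
}

# Run both pre-flight backups under the same stamp and print the stamp
# so it can be recorded in the change ticket.
# Usage: run_backups <stamp> [backup_root]
run_backups() {
  local stamp="$1" backup_root="${2:-/backups}"
  if [ -z "${MONGO_BACKUP_URI:-}" ]; then
    echo "MONGO_BACKUP_URI is not set" >&2
    return 2
  fi
  mongodump --uri "$MONGO_BACKUP_URI" \
    --out "${backup_root}/launch-${stamp}" || return 1
  mc mirror --overwrite minio/stellaops \
    "minio-backup/stellaops-${stamp}" || return 1
  echo "$stamp"
}
```

A typical call would be `run_backups "$(backup_stamp)"`; the printed stamp is then the `<timestamp>` to use for both restore commands if a rollback is needed.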
4.2 Apply Updates (Compose)
- On each compose node, pull updated images for release 2025.09.2:
docker compose --env-file prod.env -f deploy/compose/docker-compose.prod.yaml pull
- Deploy changes:
docker compose --env-file prod.env -f deploy/compose/docker-compose.prod.yaml up -d
- Confirm containers are healthy via docker compose ps and docker logs <service> --tail 50.
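Reading docker compose ps output by eye is error-prone on hosts with many services. The sketch below lists only the containers whose health check reports something other than healthy. `unhealthy_containers` is a hypothetical helper, and treating containers without a health check as acceptable is an assumption that may not suit every service.

```shell
# Print "<container-id> <status>" for each container in the compose
# project that is not healthy; return non-zero if any are found.
# Usage: unhealthy_containers [env_file] [compose_file]
unhealthy_containers() {
  local env_file="${1:-prod.env}"
  local compose_file="${2:-deploy/compose/docker-compose.prod.yaml}"
  local id status rc=0
  for id in $(docker compose --env-file "$env_file" -f "$compose_file" ps -q); do
    # Containers with no HEALTHCHECK have no .State.Health section.
    status=$(docker inspect --format \
      '{{if .State.Health}}{{.State.Health.Status}}{{else}}no-healthcheck{{end}}' \
      "$id")
    case "$status" in
      healthy|no-healthcheck) ;;
      *) echo "$id $status"; rc=1 ;;
    esac
  done
  return "$rc"
}
```

Any container it reports is a candidate for the docker logs <service> --tail 50 inspection mentioned above.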
4.3 Apply Updates (Helm/Kubernetes)
If using Kubernetes, perform:
helm upgrade stellaops deploy/helm/stellaops -f deploy/helm/stellaops/values-prod.yaml --atomic --timeout 15m
Monitor rollout with kubectl get pods -n stellaops --watch and kubectl rollout status deployment/<service> -n stellaops.
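Checking each deployment by hand scales poorly. As a sketch, the loop below runs kubectl rollout status against every deployment in the namespace and reports any that fail to converge. `wait_rollouts` is a hypothetical helper, and the 15m default simply mirrors the helm timeout above.

```shell
# Wait for every deployment in a namespace to finish rolling out.
# Usage: wait_rollouts [namespace] [timeout]
wait_rollouts() {
  local ns="${1:-stellaops}" timeout="${2:-15m}" d failed=0
  # "-o name" yields entries such as deployment.apps/<name>.
  for d in $(kubectl get deployments -n "$ns" -o name); do
    if ! kubectl rollout status "$d" -n "$ns" --timeout="$timeout"; then
      echo "rollout failed: $d" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

The function keeps going after a failure so that the full list of stuck deployments is visible before deciding on rollback.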
4.4 Configuration Validation
- Verify Authority issuer metadata: curl https://authority.prod.../.well-known/openid-configuration.
- Validate Signer DSSE endpoint: stellaops-cli signer verify --base-url https://signer.prod... --bundle samples/dsse/demo.json.
- Check Scanner queue connectivity: docker exec stellaops-scanner-web dotnet StellaOps.Scanner.WebService.dll health queue (returns success).
- Ensure Notify (legacy) is still accessible while the Notifier migration is pending.
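For the issuer metadata step, fetching the discovery document is only half the check; OpenID Connect Discovery also requires the `issuer` value in the document to match the URL it was fetched from. A rough sketch of that comparison follows. `check_issuer` is a hypothetical helper, it assumes Authority serves the document directly under the base URL passed in, and it uses sed rather than a JSON parser to avoid extra dependencies.

```shell
# Fetch the OIDC discovery document and confirm its "issuer" matches
# the base URL (a trailing slash on either side is ignored).
# Usage: check_issuer <base_url>
check_issuer() {
  local base_url="${1%/}" issuer
  issuer=$(curl -fsS "${base_url}/.well-known/openid-configuration" \
    | sed -n 's/.*"issuer"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
    | head -n 1)
  if [ "${issuer%/}" = "$base_url" ]; then
    echo "issuer ok: $issuer"
  else
    echo "issuer mismatch: expected $base_url, got '$issuer'" >&2
    return 1
  fi
}
```

A mismatch usually points at a wrong public URL in the production values rather than a failed deployment, so it is worth catching before the smoke tests.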
5. Smoke Tests
| Test | Command / Action | Expected Result | 
|---|---|---|
| API health | curl https://scanner.prod.../healthz | HTTP 200 with "status":"Healthy" | 
| Scan submit | stellaops-cli scan submit --profile prod --sbom samples/sbom/demo.json | Scan completes < 5 minutes; report accessible with signed DSSE | 
| Runtime event ingest | Post sample event from Zastava observer fixture | /runtime/events responds 202 Accepted; record visible in Mongo runtime_events | 
| Signing | stellaops-cli signer sign --bundle demo.json | Returns DSSE with matching SHA256 and signer metadata | 
| Attestor verify | stellaops-cli attestor verify --uuid <uuid> | Verification result ok=true | 
| Web UI | Manual login, verify dashboards render and latency within budget | UI loads under 2 seconds; policy views consistent | 
Log results in the change ticket with timestamps and screenshots where applicable.
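To make the timestamped log easier to produce, each smoke command can be wrapped in a small helper that emits one tab-separated result line. This is a sketch only; `run_check` is a hypothetical name, and the PASS/FAIL verdict is based purely on the command's exit status, so checks such as "scan completes < 5 minutes" still need manual confirmation.

```shell
# Run a command quietly and print "<UTC timestamp>\t<name>\t<PASS|FAIL>".
# Returns 0 on PASS and 1 on FAIL so callers can count failures.
# Usage: run_check <name> <command> [args...]
run_check() {
  local name="$1" result=PASS
  shift
  "$@" >/dev/null 2>&1 || result=FAIL
  printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$name" "$result"
  [ "$result" = PASS ]
}
```

For example, `run_check "API health" curl -fsS "$SCANNER_URL/healthz"` (where `SCANNER_URL` is a placeholder for the production scanner base URL) yields a line that can be pasted into the change ticket as-is.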
6. Rollback Procedure
- Assess failure scope; if systemic, initiate rollback immediately while preserving logs/artifacts.
- For Compose:
docker compose --env-file prod.env -f deploy/compose/docker-compose.prod.yaml down
docker compose --env-file stage.env -f deploy/compose/docker-compose.stage.yaml up -d
- For Helm:
helm rollback stellaops <previous-release-number> --namespace stellaops
- Restore Mongo snapshot if data inconsistency detected: mongorestore --uri "$MONGO_BACKUP_URI" --drop /backups/launch-<timestamp>.
- Restore MinIO mirror if required: mc mirror minio-backup/stellaops-<timestamp> minio/stellaops.
- Notify stakeholders of rollback and capture root cause notes in incident ticket.
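The Helm rollback command needs a revision number, which is easy to get wrong under pressure. One way to look it up is to read the second-to-last row of helm history, as sketched below. `previous_revision` is a hypothetical helper, and it assumes the revision immediately before the latest one is the right target; if --atomic has already rolled the release back, inspect the history manually instead.

```shell
# Print the revision number that precedes the latest one in
# "helm history" (default table output, one header row).
# Fails if the release has fewer than two revisions.
# Usage: previous_revision [release] [namespace]
previous_revision() {
  local release="${1:-stellaops}" ns="${2:-stellaops}"
  helm history "$release" --namespace "$ns" \
    | awk 'NR > 1 { prev = last; last = $1 }
           END { if (prev == "") exit 1; print prev }'
}
```

The result can be fed straight into the rollback, for instance `helm rollback stellaops "$(previous_revision)" --namespace stellaops`.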
7. Post-cutover Actions
- Keep heightened monitoring for 4 hours post-cutover; track latency, error rates, and queue depth.
- Confirm audit trails: Authority tokens issued, Scanner events recorded, Attestor submissions stored.
- Update docs/ops/launch-readiness.md if any new gaps or follow-ups are discovered.
- Schedule retrospective within 48 hours; include DevOps, module guilds, and product owner.
8. Approval Matrix
| Step | Required Approvers | Record Location | 
|---|---|---|
| Production deployment plan | CTO + DevOps lead | Change ticket comment | 
| Cutover start (T0) | DevOps lead + module reps | #launch-bridge summary | 
| Post-smoke success | DevOps lead + product owner | Change ticket closure | 
| Rollback (if invoked) | DevOps lead + CTO | Incident ticket | 
Retain all approvals and logs for audit. Update this runbook after each execution to record actual timings and lessons learned.
9. Rehearsal Log
| Date (UTC) | What We Exercised | Outcome | Follow-up | 
|---|---|---|---|
| 2025-10-26 | Dry-run of compose/Helm validation via deploy/tools/validate-profiles.sh (dev/stage/prod/airgap/mirror). Network creation simulated (docker network create stellaops_frontdoor planned) and stage CLI submission reviewed. | Validation script succeeded; all profiles templated cleanly. Stage deployment apply deferred because no staging cluster is accessible from the current environment. | Schedule full stage rehearsal once staging cluster credentials are available; reuse this log section to capture timings. |