---
checkId: check.release.rollback.readiness
plugin: stellaops.doctor.release
severity: warn
tags: [release, rollback, disaster-recovery, production]
---
# Rollback Readiness

## What It Checks
Queries the Release Orchestrator at `/api/v1/environments/rollback-status` (with fallback to `/api/v1/environments`) and evaluates rollback capability for production environments:

- **Cannot rollback**: fail if a production environment has a previous version but cannot roll back (e.g., irreversible migration, artifacts purged).
- **No previous version**: warn if a production environment has no previous deployment to roll back to.
- **Missing health probe**: warn if a production environment lacks a health probe (prevents auto-rollback on failure).

Only production environments (type "prod" or "production") are evaluated. Non-production environments are not checked.

Evidence collected: `prod_environment_count`, `rollback_ready_count`, `cannot_rollback_count`, `no_previous_version_count`, `no_health_probe_count`, `cannot_rollback_environments`, `rollback_blocker`.

The check requires `ReleaseOrchestrator:Url` or `Release:Orchestrator:Url` to be configured.

## Why It Matters
Rollback is the primary recovery mechanism when a production deployment introduces a critical issue. If rollback is unavailable, the only options are an emergency forward-fix or extended downtime. Missing health probes prevent automatic rollback on deployment failure, requiring manual intervention during incidents. In regulated environments, rollback readiness is often a compliance requirement for change management.

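
The evaluation rules listed under What It Checks can be sketched in a few lines of Python. This is only an illustration, not the check's actual implementation: the `Environment` fields, the `evaluate_rollback_readiness` function, the `"pass"` result name, the case-insensitive type match, and the definition of "rollback ready" are all assumptions made for the example. Only the evidence key names come from the list above.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    # Hypothetical shape; the real orchestrator response is not documented here.
    name: str
    type: str
    has_previous_version: bool
    can_rollback: bool
    has_health_probe: bool


def evaluate_rollback_readiness(environments):
    """Classify production environments and return (severity, evidence)."""
    # Only "prod" / "production" environments are evaluated.
    # Case-insensitive matching is an assumption.
    prod = [e for e in environments if e.type.lower() in ("prod", "production")]

    # Has a previous version but cannot roll back -> fail.
    cannot = [e.name for e in prod if e.has_previous_version and not e.can_rollback]
    # Nothing to roll back to -> warn.
    no_previous = [e.name for e in prod if not e.has_previous_version]
    # No health probe, so auto-rollback cannot trigger -> warn.
    no_probe = [e.name for e in prod if not e.has_health_probe]
    # Assumed meaning of "ready": a previous version exists and rollback is allowed.
    ready = [e.name for e in prod if e.has_previous_version and e.can_rollback]

    if cannot:
        severity = "fail"
    elif no_previous or no_probe:
        severity = "warn"
    else:
        severity = "pass"  # assumed name for the healthy result

    evidence = {
        "prod_environment_count": len(prod),
        "rollback_ready_count": len(ready),
        "cannot_rollback_count": len(cannot),
        "no_previous_version_count": len(no_previous),
        "no_health_probe_count": len(no_probe),
        "cannot_rollback_environments": cannot,
    }
    return severity, evidence


if __name__ == "__main__":
    envs = [
        Environment("prod-eu", "production", True, False, True),
        Environment("prod-us", "prod", True, True, False),
        Environment("staging", "staging", False, False, False),
    ]
    severity, evidence = evaluate_rollback_readiness(envs)
    print(severity, evidence)
```

In this sketch a single environment that cannot roll back is enough to produce a failure, while a missing previous version or health probe only downgrades the result to a warning, mirroring the fail and warn wording of the three rules.
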
## Common Causes
- Previous deployment artifacts not retained (artifact retention policy too aggressive)
- Database migration not reversible (destructive schema change)
- Breaking API change deployed that prevents running the previous version
- Rollback manually disabled for the environment
- First deployment to environment (no previous version exists)
- Deployment history cleared during maintenance
- Health probe URL not configured for auto-rollback
- Auto-rollback on failure not enabled

## How to Fix

### Docker Compose
```bash
# Check rollback status for a specific environment
stella env rollback-status

# View deployment history
stella env history

# Configure artifact retention to keep previous versions
```

```yaml
services:
  orchestrator:
    environment:
      Release__ArtifactRetention__Count: "5"
      Release__ArtifactRetention__Days: "30"
```

Configure health probes:
```bash
# Set health probe for a production environment
stella env configure --health-probe-url "http://:8080/health"

# Enable auto-rollback on failure
stella env configure --auto-rollback-on-failure
```

### Bare Metal / systemd
```bash
# Check rollback blockers
stella env rollback-status

# View deployment history
stella env history

# Configure health probe
stella env configure --health-probe-url "http://localhost:8080/health"

# Enable auto-rollback
stella env configure --auto-rollback-on-failure
```

Edit `/etc/stellaops/orchestrator/appsettings.json`:
```json
{
  "Release": {
    "ArtifactRetention": {
      "Count": 5,
      "Days": 30
    }
  }
}
```

### Kubernetes / Helm
```bash
# Check rollback status
kubectl exec -it -- stella env rollback-status

# View deployment history
kubectl exec -it -- stella env history
```

Set in Helm `values.yaml`:
```yaml
releaseOrchestrator:
  artifactRetention:
    count: 5
    days: 30
  environments:
    production:
      healthProbeUrl: "http://app:8080/health"
      autoRollbackOnFailure: true
```

## Verification
```
stella doctor run --check check.release.rollback.readiness
```

## Related Checks
- `check.release.active` -- failed releases may require rollback
- `check.release.environment.readiness` -- environment health affects rollback execution
- `check.release.configuration` -- workflow configuration defines rollback behavior