git.stella-ops.org/docs/doctor/articles/release/rollback-readiness.md
master c58a236d70 Doctor plugin checks: implement health check classes and documentation
2026-03-27 12:28:00 +02:00

checkId: check.release.rollback.readiness
plugin: stellaops.doctor.release
severity: warn
tags: release, rollback, disaster-recovery, production

Rollback Readiness

What It Checks

Queries the Release Orchestrator at /api/v1/environments/rollback-status (with fallback to /api/v1/environments) and evaluates rollback capability for production environments:

  • Cannot rollback: fail if a production environment has a previous version but cannot roll back (e.g., irreversible migration, artifacts purged).
  • No previous version: warn if a production environment has no previous deployment to roll back to.
  • Missing health probe: warn if a production environment lacks a health probe (prevents auto-rollback on failure).
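
The endpoint fallback described at the top of this section can be sketched as follows. This is a minimal illustration, not the check's actual implementation: the function names are hypothetical, and it assumes that any HTTP, connection, or JSON decoding error on the primary endpoint triggers the fallback (the exact trigger is not stated here).

```python
import json
import urllib.error
import urllib.request

PRIMARY_PATH = "/api/v1/environments/rollback-status"
FALLBACK_PATH = "/api/v1/environments"


def http_get_json(url, timeout=10.0):
    """Fetch a URL and decode its body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_environments(base_url, get_json=http_get_json):
    """Try the rollback-status endpoint first, then the plain environment list.

    Assumption: any HTTP/connection error or an undecodable body on the
    primary endpoint causes the fallback request.
    """
    base = base_url.rstrip("/")
    try:
        return get_json(base + PRIMARY_PATH)
    except (urllib.error.URLError, OSError, ValueError):
        return get_json(base + FALLBACK_PATH)
```

The `get_json` parameter exists only so the fallback path can be exercised without a running orchestrator.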

Only production environments (type "prod" or "production") are evaluated; all other environments are skipped.
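
The three rules and the production filter above can be expressed as a small decision function. The sketch below is illustrative only: the `Environment` fields are hypothetical stand-ins for whatever the orchestrator returns, and the case-insensitive type comparison is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

PRODUCTION_TYPES = {"prod", "production"}


@dataclass
class Environment:
    # Hypothetical fields; the real API response shape is not documented here.
    name: str
    type: str
    previous_version: Optional[str] = None
    can_rollback: bool = False
    health_probe_url: Optional[str] = None


def evaluate(env: Environment) -> Optional[str]:
    """Return 'fail', 'warn', or 'pass'; None means the environment is skipped."""
    if env.type.lower() not in PRODUCTION_TYPES:
        return None  # non-production environments are not checked
    if env.previous_version and not env.can_rollback:
        return "fail"  # a previous version exists but rollback is blocked
    if not env.previous_version or not env.health_probe_url:
        return "warn"  # nothing to roll back to, or no probe for auto-rollback
    return "pass"
```

Checking the fail condition first means a blocked rollback is never downgraded to a warning by a missing health probe.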

Evidence collected: prod_environment_count, rollback_ready_count, cannot_rollback_count, no_previous_version_count, no_health_probe_count, cannot_rollback_environments, rollback_blocker.
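
As a rough guide to how these evidence fields relate to each other, the sketch below derives them from a list of environment records. The input keys (`type`, `previousVersion`, `canRollback`, `healthProbeUrl`, `rollbackBlocker`) are hypothetical, and treating `rollback_blocker` as the first reported blocker reason is an assumption.

```python
PRODUCTION_TYPES = {"prod", "production"}


def collect_evidence(environments):
    """Summarise rollback readiness for production environments.

    The keys read from each environment dict are hypothetical.
    """
    prod = [e for e in environments
            if str(e.get("type", "")).lower() in PRODUCTION_TYPES]
    has_previous = [e for e in prod if e.get("previousVersion")]
    cannot = [e for e in has_previous if not e.get("canRollback")]
    ready = [e for e in has_previous if e.get("canRollback")]
    no_probe = [e for e in prod if not e.get("healthProbeUrl")]
    # Assumption: rollback_blocker carries the first blocker reason found.
    blockers = [e["rollbackBlocker"] for e in cannot if e.get("rollbackBlocker")]
    return {
        "prod_environment_count": len(prod),
        "rollback_ready_count": len(ready),
        "cannot_rollback_count": len(cannot),
        "no_previous_version_count": len(prod) - len(has_previous),
        "no_health_probe_count": len(no_probe),
        "cannot_rollback_environments": ", ".join(e["name"] for e in cannot),
        "rollback_blocker": blockers[0] if blockers else "",
    }
```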

The check requires ReleaseOrchestrator:Url or Release:Orchestrator:Url to be configured.
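
The two accepted configuration keys can be resolved with a simple ordered lookup. The sketch below assumes the usual .NET configuration conventions (":" as the section separator, "__" in environment variable names, environment variables overriding file settings); the function names are hypothetical, and unlike .NET this lookup is case-sensitive.

```python
import os

# Keys are tried in the order the text above lists them.
CONFIG_KEYS = ("ReleaseOrchestrator:Url", "Release:Orchestrator:Url")


def lookup(config, key):
    """Walk a nested dict using a colon-separated key; None if any part is missing."""
    node = config
    for part in key.split(":"):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def resolve_orchestrator_url(config, environ=None):
    """Return the first configured orchestrator URL, or None if neither key is set."""
    environ = os.environ if environ is None else environ
    for key in CONFIG_KEYS:
        # Assumption: an environment variable wins over the settings file.
        value = environ.get(key.replace(":", "__")) or lookup(config, key)
        if value:
            return value
    return None
```

A `None` result corresponds to the case where the check cannot run because no orchestrator URL is configured.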

Why It Matters

Rollback is the primary recovery mechanism when a production deployment introduces a critical issue. If rollback is unavailable, the only options are an emergency forward-fix or extended downtime. Missing health probes prevent automatic rollback on deployment failure, requiring manual intervention during incidents. In regulated environments, rollback readiness is often a compliance requirement for change management.

Common Causes

  • Previous deployment artifacts not retained (artifact retention policy too aggressive)
  • Database migration not reversible (destructive schema change)
  • Breaking API change deployed that prevents running the previous version
  • Rollback manually disabled for the environment
  • First deployment to environment (no previous version exists)
  • Deployment history cleared during maintenance
  • Health probe URL not configured for auto-rollback
  • Auto-rollback on failure not enabled

How to Fix

Docker Compose

# Check rollback status for a specific environment
stella env rollback-status <environment-name>

# View deployment history
stella env history <environment-name>

# Configure artifact retention to keep previous versions
services:
  orchestrator:
    environment:
      Release__ArtifactRetention__Count: "5"
      Release__ArtifactRetention__Days: "30"

Configure health probes:

# Set health probe for a production environment
stella env configure <environment-name> --health-probe-url "http://<app>:8080/health"

# Enable auto-rollback on failure
stella env configure <environment-name> --auto-rollback-on-failure
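
Before pointing an environment at a health probe URL, it can help to confirm that the endpoint actually responds. The helper below is a hypothetical sketch, not part of the stella CLI; it assumes a probe counts as healthy when it returns a 2xx status within the timeout.

```python
import urllib.error
import urllib.request


def probe_health(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses raised by urlopen, refused
        # connections, DNS failures, and timeouts.
        return False
```

For example, `probe_health("http://<app>:8080/health")` can be run from a host that shares a network with the application before running `stella env configure`.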

Bare Metal / systemd

# Check rollback blockers
stella env rollback-status <environment-name>

# View deployment history
stella env history <environment-name>

# Configure health probe
stella env configure <environment-name> --health-probe-url "http://localhost:8080/health"

# Enable auto-rollback
stella env configure <environment-name> --auto-rollback-on-failure

Edit /etc/stellaops/orchestrator/appsettings.json:

{
  "Release": {
    "ArtifactRetention": {
      "Count": 5,
      "Days": 30
    }
  }
}
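
The JSON settings above and the `Release__ArtifactRetention__*` variables in the Docker Compose section appear to be two spellings of the same keys, following the .NET convention of joining nested sections with a double underscore. A small sketch of that mapping (the function name is hypothetical):

```python
def to_env_vars(settings, prefix=""):
    """Flatten nested settings into double-underscore environment variable names."""
    result = {}
    for key, value in settings.items():
        name = f"{prefix}__{key}" if prefix else key
        if isinstance(value, dict):
            result.update(to_env_vars(value, name))
        else:
            result[name] = str(value)
    return result


settings = {"Release": {"ArtifactRetention": {"Count": 5, "Days": 30}}}
print(to_env_vars(settings))
# {'Release__ArtifactRetention__Count': '5', 'Release__ArtifactRetention__Days': '30'}
```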

Kubernetes / Helm

# Check rollback status
kubectl exec -it <orchestrator-pod> -- stella env rollback-status <environment-name>

# View deployment history
kubectl exec -it <orchestrator-pod> -- stella env history <environment-name>

Set in Helm values.yaml:

releaseOrchestrator:
  artifactRetention:
    count: 5
    days: 30
  environments:
    production:
      healthProbeUrl: "http://app:8080/health"
      autoRollbackOnFailure: true

Verification

stella doctor run --check check.release.rollback.readiness

Related Checks

  • check.release.active -- failed releases may require rollback
  • check.release.environment.readiness -- environment health affects rollback execution
  • check.release.configuration -- workflow configuration defines rollback behavior