| checkId | plugin | severity | tags |
|---|---|---|---|
| check.release.rollback.readiness | stellaops.doctor.release | warn | |
Rollback Readiness
What It Checks
Queries the Release Orchestrator at /api/v1/environments/rollback-status (with fallback to /api/v1/environments) and evaluates rollback capability for production environments:
- Cannot rollback: fail if a production environment has a previous version but cannot roll back (e.g., irreversible migration, artifacts purged).
- No previous version: warn if a production environment has no previous deployment to roll back to.
- Missing health probe: warn if a production environment lacks a health probe (prevents auto-rollback on failure).
Only production environments (type "prod" or "production") are evaluated; all other environments are skipped.
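The rules above can be sketched as a small classifier. This is only an illustration of the documented outcomes, not the check's actual implementation: the `Environment` field names are hypothetical, and reporting the worst per-environment result as the overall result is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

PROD_TYPES = {"prod", "production"}


@dataclass
class Environment:
    # Hypothetical field names; the real API payload shape is not documented here.
    name: str
    type: str
    has_previous_version: bool
    can_rollback: bool
    has_health_probe: bool
    rollback_blocker: Optional[str] = None


def classify(env: Environment) -> str:
    """Return 'skip', 'fail', 'warn', or 'pass' for a single environment."""
    if env.type not in PROD_TYPES:
        return "skip"  # non-production environments are not evaluated
    if env.has_previous_version and not env.can_rollback:
        return "fail"  # a previous version exists but rollback is blocked
    if not env.has_previous_version or not env.has_health_probe:
        return "warn"  # nothing to roll back to, or no probe for auto-rollback
    return "pass"


def overall(envs: Iterable[Environment]) -> str:
    """Assumption: the most severe per-environment result wins."""
    rank = {"skip": 0, "pass": 0, "warn": 1, "fail": 2}
    worst = "pass"
    for env in envs:
        result = classify(env)
        if rank[result] > rank[worst]:
            worst = result
    return worst
```

For example, a production environment that has a previous version but is blocked by an irreversible migration classifies as `fail`, while a staging environment in the same state is skipped.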
Evidence collected: prod_environment_count, rollback_ready_count, cannot_rollback_count, no_previous_version_count, no_health_probe_count, cannot_rollback_environments, rollback_blocker.
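As a rough sketch of how these evidence fields could be derived from a list of environment records: the input keys (`type`, `has_previous_version`, `can_rollback`, `has_health_probe`, `rollback_blocker`) are hypothetical, and so are the choices to count an environment as rollback-ready when it has a previous version it can roll back to, to join environment names with commas, and to report the first blocker found.

```python
def collect_evidence(environments):
    """Build the evidence fields from environment dicts (hypothetical input shape)."""
    prod = [e for e in environments if e.get("type") in ("prod", "production")]
    blocked = [
        e for e in prod
        if e.get("has_previous_version") and not e.get("can_rollback")
    ]
    ready = [
        e for e in prod
        if e.get("has_previous_version") and e.get("can_rollback")
    ]
    no_previous = [e for e in prod if not e.get("has_previous_version")]
    no_probe = [e for e in prod if not e.get("has_health_probe")]
    return {
        "prod_environment_count": len(prod),
        "rollback_ready_count": len(ready),
        "cannot_rollback_count": len(blocked),
        "no_previous_version_count": len(no_previous),
        "no_health_probe_count": len(no_probe),
        # Assumed formatting: comma-separated environment names.
        "cannot_rollback_environments": ", ".join(e["name"] for e in blocked),
        # Assumed: report the blocker of the first blocked environment.
        "rollback_blocker": blocked[0].get("rollback_blocker", "") if blocked else "",
    }
```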
The check requires ReleaseOrchestrator:Url or Release:Orchestrator:Url to be configured.
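A minimal sketch of the lookup and endpoint fallback described in this section. The two configuration keys and the two paths come from the text above; everything else is assumed for illustration, including the key precedence, the injected `http_get` callable (hypothetical, returning a parsed body or `None` on any failure), and the idea that any failure on the first endpoint triggers the fallback.

```python
URL_KEYS = ("ReleaseOrchestrator:Url", "Release:Orchestrator:Url")
ENDPOINTS = ("/api/v1/environments/rollback-status", "/api/v1/environments")


def resolve_orchestrator_url(config):
    """Return the first configured orchestrator URL, or None if neither key is set."""
    for key in URL_KEYS:
        value = config.get(key)
        if value:
            return value.rstrip("/")
    return None


def fetch_environments(base_url, http_get):
    """Try the rollback-status endpoint first, then fall back to the plain listing."""
    for path in ENDPOINTS:
        body = http_get(base_url + path)
        if body is not None:
            return body
    return None
```

Passing `http_get` in as a parameter keeps the sketch independent of any particular HTTP client and makes the fallback order easy to exercise with a stub.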
Why It Matters
Rollback is the primary recovery mechanism when a production deployment introduces a critical issue. If rollback is unavailable, the only options are an emergency forward-fix or extended downtime. Missing health probes prevent automatic rollback on deployment failure, requiring manual intervention during incidents. In regulated environments, rollback readiness is often a compliance requirement for change management.
Common Causes
- Previous deployment artifacts not retained (artifact retention policy too aggressive)
- Database migration not reversible (destructive schema change)
- Breaking API change deployed that prevents running the previous version
- Rollback manually disabled for the environment
- First deployment to environment (no previous version exists)
- Deployment history cleared during maintenance
- Health probe URL not configured for auto-rollback
- Auto-rollback on failure not enabled
How to Fix
Docker Compose
# Check rollback status for a specific environment
stella env rollback-status <environment-name>
# View deployment history
stella env history <environment-name>
# Configure artifact retention to keep previous versions
services:
  orchestrator:
    environment:
      Release__ArtifactRetention__Count: "5"
      Release__ArtifactRetention__Days: "30"
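The double underscores in these variable names appear to follow the .NET configuration convention, where `__` in an environment variable stands in for the `:` separator of a hierarchical key. The sketch below illustrates that mapping only; `env_to_config` is a hypothetical helper, not part of any Stella tooling.

```python
def env_to_config(environ):
    """Expand NAME__SUB__KEY style variables into a nested dict."""
    config = {}
    for name, value in environ.items():
        parts = name.split("__")
        node = config
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        # Values stay strings here; typed binding happens in the application.
        node[parts[-1]] = value
    return config


compose_env = {
    "Release__ArtifactRetention__Count": "5",
    "Release__ArtifactRetention__Days": "30",
}
nested = env_to_config(compose_env)
```

Here `nested` has the same `Release` / `ArtifactRetention` / `Count`, `Days` nesting as the JSON settings file, with the values still as strings.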
Configure health probes:
# Set health probe for a production environment
stella env configure <environment-name> --health-probe-url "http://<app>:8080/health"
# Enable auto-rollback on failure
stella env configure <environment-name> --auto-rollback-on-failure
Bare Metal / systemd
# Check rollback blockers
stella env rollback-status <environment-name>
# View deployment history
stella env history <environment-name>
# Configure health probe
stella env configure <environment-name> --health-probe-url "http://localhost:8080/health"
# Enable auto-rollback
stella env configure <environment-name> --auto-rollback-on-failure
Edit /etc/stellaops/orchestrator/appsettings.json:
{
  "Release": {
    "ArtifactRetention": {
      "Count": 5,
      "Days": 30
    }
  }
}
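Before restarting the service, the edited file can be sanity-checked with a short script. This is a sketch under assumptions: it treats both settings as required positive integers, which the source does not state, and it expects the file to be strict JSON without comments.

```python
import json


def read_retention(appsettings_text):
    """Return (retention settings, list of problems) for the Release section."""
    settings = json.loads(appsettings_text)
    retention = settings.get("Release", {}).get("ArtifactRetention", {})
    problems = []
    for key in ("Count", "Days"):
        value = retention.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value < 1:
            problems.append(
                f"Release:ArtifactRetention:{key} should be a positive integer, got {value!r}"
            )
    return retention, problems


sample = '{"Release": {"ArtifactRetention": {"Count": 5, "Days": 30}}}'
retention, problems = read_retention(sample)
```

With the sample above, `problems` is empty; a missing or zero `Count` would produce one entry naming the offending key.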
Kubernetes / Helm
# Check rollback status
kubectl exec -it <orchestrator-pod> -- stella env rollback-status <environment-name>
# View deployment history
kubectl exec -it <orchestrator-pod> -- stella env history <environment-name>
Set in Helm values.yaml:
releaseOrchestrator:
  artifactRetention:
    count: 5
    days: 30
  environments:
    production:
      healthProbeUrl: "http://app:8080/health"
      autoRollbackOnFailure: true
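A pre-deployment check on these values might look like the following sketch. It assumes the values file has already been parsed into a plain dictionary (for example by a YAML parser, not shown), and `check_production_values` is a hypothetical helper rather than part of the chart.

```python
def check_production_values(values):
    """Return a list of problems with the production auto-rollback settings."""
    problems = []
    orchestrator = values.get("releaseOrchestrator", {})
    production = orchestrator.get("environments", {}).get("production", {})
    if not production.get("healthProbeUrl"):
        problems.append("environments.production.healthProbeUrl is not set")
    if production.get("autoRollbackOnFailure") is not True:
        problems.append("environments.production.autoRollbackOnFailure is not true")
    return problems


values = {
    "releaseOrchestrator": {
        "artifactRetention": {"count": 5, "days": 30},
        "environments": {
            "production": {
                "healthProbeUrl": "http://app:8080/health",
                "autoRollbackOnFailure": True,
            }
        },
    }
}
issues = check_production_values(values)
```

An empty `issues` list means both settings that the missing-health-probe warning depends on are present.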
Verification
stella doctor run --check check.release.rollback.readiness
Related Checks
- check.release.active -- failed releases may require rollback
- check.release.environment.readiness -- environment health affects rollback execution
- check.release.configuration -- workflow configuration defines rollback behavior