Runbook: Release Orchestrator - Rollback Operation Failed
Sprint: SPRINT_20260117_029_DOCS_runbook_coverage
Task: RUN-004 - Release Orchestrator Runbooks
Metadata
| Field | Value |
|---|---|
| Component | Release Orchestrator |
| Severity | Critical |
| On-call scope | Platform team, Release team |
| Last updated | 2026-01-17 |
| Doctor check | check.orchestrator.rollback-health |
Symptoms
- Rollback operation failing or stuck
- Alert OrchestratorRollbackFailed firing
- Error: "rollback failed" or "cannot restore previous version"
- Target environment in inconsistent state
- Previous artifact not available for deployment
Impact
| Impact Type | Description |
|---|---|
| User-facing | Rollback blocked; potentially broken release in production |
| Data integrity | Environment may be in partial rollback state |
| SLA impact | Incident resolution blocked; extended outage |
Diagnosis
Quick checks
- Check Doctor diagnostics:
  stella doctor --check check.orchestrator.rollback-health
- Check rollback status:
  stella rollback status <rollback-id>
- Check previous deployment history:
  stella orch deployments list --env <env-name> --last 10
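The three quick checks can be run in one pass during triage. The following is a minimal sketch, not part of the official tooling: the function name `quick_checks` is hypothetical, and it assumes each `stella` command exits non-zero on failure, which this runbook does not confirm.

```shell
# Sketch: run the quick checks in order and count how many failed.
# Assumption: stella commands signal failure through a non-zero exit code.
quick_checks() {
  local rollback_id="$1"
  local env_name="$2"
  local failed=0

  echo "== Doctor diagnostics =="
  stella doctor --check check.orchestrator.rollback-health || failed=$((failed + 1))

  echo "== Rollback status =="
  stella rollback status "$rollback_id" || failed=$((failed + 1))

  echo "== Last 10 deployments =="
  stella orch deployments list --env "$env_name" --last 10 || failed=$((failed + 1))

  echo "quick checks failed: $failed"
  # The return code equals the number of failed checks (0 to 3).
  return "$failed"
}
```

Call it as `quick_checks <rollback-id> <env-name>`. A later check still runs when an earlier one fails, so a single pass shows the full picture.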
Deep diagnosis
- Check why rollback failed:
  stella rollback trace <rollback-id> --verbose
  Look for: which step failed, error message
- Check previous artifact availability:
  stella orch artifacts get <previous-digest> --check
  Problem if: artifact deleted, not in registry
- Check environment state:
  stella orch env status <env-name> --detailed
- Check for deployment locks:
  stella orch locks list --env <env-name>
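Because mitigation steps change the environment, it can help to capture the deep-diagnosis output before acting. The sketch below is one way to do that under stated assumptions: the function name, the output directory, and the log file names are all hypothetical, and only the `stella` commands themselves come from this runbook.

```shell
# Sketch: save the output of each deep-diagnosis command to its own file.
# Assumption: file and directory names here are illustrative only.
collect_rollback_diagnostics() {
  local rollback_id="$1"
  local env_name="$2"
  local previous_digest="$3"
  local out_dir="${4:-rollback-diag-$rollback_id}"

  mkdir -p "$out_dir" || return 1

  # A failing command is noted in errors.log but does not stop collection.
  stella rollback trace "$rollback_id" --verbose > "$out_dir/trace.log" 2>&1 \
    || echo "rollback trace failed" >> "$out_dir/errors.log"
  stella orch artifacts get "$previous_digest" --check > "$out_dir/artifact.log" 2>&1 \
    || echo "artifact check failed" >> "$out_dir/errors.log"
  stella orch env status "$env_name" --detailed > "$out_dir/env-status.log" 2>&1 \
    || echo "env status failed" >> "$out_dir/errors.log"
  stella orch locks list --env "$env_name" > "$out_dir/locks.log" 2>&1 \
    || echo "locks list failed" >> "$out_dir/errors.log"

  # Print the directory so the caller can attach it to the incident record.
  echo "$out_dir"
}
```

The resulting directory gives a before-mitigation snapshot that can be compared with the environment state after the rollback is repaired.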
Resolution
Immediate mitigation
- Force release lock if stuck:
  stella orch locks release --env <env-name> --force
- Manual rollback using specific artifact:
  stella deploy --env <env-name> --artifact <previous-digest> --force
- If artifact unavailable, deploy last known good:
  stella orch deployments list --env <env-name> --status success
  stella deploy --env <env-name> --artifact <last-good-digest>
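The choice between redeploying the previous artifact and falling back to a last known good one can be sketched as a small helper. This is an illustration only: `manual_rollback` is a hypothetical name, and the sketch assumes that `stella orch artifacts get ... --check` exits non-zero when the artifact is unavailable, which is not confirmed here. It deliberately leaves out the forced lock release, since deciding whether a lock is truly stuck needs operator judgment.

```shell
# Sketch: redeploy the previous artifact if it is still available,
# otherwise list successful deployments so an operator can pick a digest.
# Assumption: a non-zero exit from the --check call means "unavailable".
manual_rollback() {
  local env_name="$1"
  local previous_digest="$2"

  if stella orch artifacts get "$previous_digest" --check; then
    stella deploy --env "$env_name" --artifact "$previous_digest" --force
  else
    echo "Artifact $previous_digest unavailable; choose a last known good digest:" >&2
    stella orch deployments list --env "$env_name" --status success
    # Return code 2 tells the caller that nothing was deployed.
    return 2
  fi
}
```

When the helper returns 2, choose a digest from the printed list and deploy it with `stella deploy --env <env-name> --artifact <last-good-digest>`.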
Root cause fix
If previous artifact not in registry:
- Check artifact retention policy:
  stella registry retention show
- Restore from backup registry:
  stella registry restore --artifact <digest> --from backup
- Increase artifact retention:
  stella registry retention set --min-versions 10
If deployment service unavailable:
- Check deployment target connectivity:
  stella orch connectivity --target <env-name>
- Check deployment agent status:
  stella orch agent status --env <env-name>
If configuration drift:
- Check environment configuration:
  stella orch env config diff <env-name>
- Reset environment to known state:
  stella orch env reset <env-name> --to-baseline
If database state inconsistent:
- Check orchestrator database:
  stella orch db verify
- Repair deployment state:
  stella orch repair --deployment <deployment-id>
Verification
# Verify rollback completed
stella rollback status <rollback-id>
# Verify environment state
stella orch env status <env-name>
# Verify correct version deployed
stella orch deployments current --env <env-name>
# Health check the environment
stella orch health-check --env <env-name>
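The verification commands above can also be chained so that the sequence stops at the first failing step. A minimal sketch follows, assuming each command exits non-zero when its check fails; `verify_rollback` is a hypothetical helper name.

```shell
# Sketch: run the verification steps in order, stopping at the first failure.
# Assumption: stella commands signal an unhealthy result via exit code.
verify_rollback() {
  local rollback_id="$1"
  local env_name="$2"

  stella rollback status "$rollback_id" \
    && stella orch env status "$env_name" \
    && stella orch deployments current --env "$env_name" \
    && stella orch health-check --env "$env_name"
}
```

A zero return code means all four steps passed. On a non-zero code, the last command that printed output is the step to investigate.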
Prevention
- Retention: Maintain at least 5 previous versions in registry
- Testing: Test rollback procedure in staging regularly
- Monitoring: Alert on rollback failures immediately
- Documentation: Document manual rollback procedures per environment
Related Resources
- Architecture: docs/modules/release-orchestrator/rollback.md
- Related runbooks: orchestrator-promotion-stuck.md, orchestrator-evidence-missing.md
- Rollback procedures: docs/operations/rollback-procedures.md