# Runbook: Release Orchestrator - Promotion Job Not Progressing

> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
> **Task:** RUN-004 - Release Orchestrator Runbooks

## Metadata

| Field | Value |
|-------|-------|
| **Component** | Release Orchestrator |
| **Severity** | Critical |
| **On-call scope** | Platform team, Release team |
| **Last updated** | 2026-01-17 |
| **Doctor check** | `check.orchestrator.job-health` |

---

## Symptoms

- [ ] Promotion job stuck in "in_progress" state for >10 minutes
- [ ] No progress updates in promotion timeline
- [ ] Alert `OrchestratorPromotionStuck` firing
- [ ] UI shows promotion spinner indefinitely
- [ ] Downstream environment not receiving promoted artifact

---

## Impact

| Impact Type | Description |
|-------------|-------------|
| **User-facing** | Release blocked, cannot promote to target environment |
| **Data integrity** | Artifact is safe; promotion can be retried |
| **SLA impact** | Release SLO violated if not resolved within 30 minutes |

---

## Diagnosis

### Quick checks

1. **Check Doctor diagnostics:**
   ```bash
   stella doctor --check check.orchestrator.job-health
   ```

2. **Check promotion status:**
   ```bash
   stella promotion status
   ```
   Look for: Current step, last update time, any error messages

3. **Check orchestrator service:**
   ```bash
   stella orch status
   ```

### Deep diagnosis

1. **Get detailed promotion trace:**
   ```bash
   stella promotion trace --verbose
   ```
   Look for: Which step is stuck, any timeouts

2. **Check gate evaluation status:**
   ```bash
   stella promotion gates
   ```
   Problem if: Gate stuck waiting for external service

3. **Check target environment connectivity:**
   ```bash
   stella orch connectivity --target
   ```

4. **Check for lock contention:**
   ```bash
   stella orch locks list
   ```
   Problem if: Stale locks on the artifact or environment

---

## Resolution

### Immediate mitigation

1. **If gate is stuck waiting for external service:**
   ```bash
   # Skip the stuck gate (requires approval)
   stella promotion gate skip --reason "External service timeout"
   ```

2. **If lock is stale:**
   ```bash
   # Release the lock (use with caution)
   stella orch locks release --force
   ```

3. **If orchestrator is unresponsive:**
   ```bash
   stella service restart orchestrator
   ```

### Root cause fix

**If external gate service is slow:**

1. Increase gate timeout:
   ```bash
   stella orch config set gates..timeout 5m
   ```

2. Configure gate retry:
   ```bash
   stella orch config set gates..retries 3
   ```

**If target environment is unreachable:**

1. Check network connectivity to target

2. Verify credentials for target environment:
   ```bash
   stella orch credentials verify --target
   ```

**If database lock contention:**

1. Increase lock timeout:
   ```bash
   stella orch config set locks.timeout 60s
   ```

2. Enable optimistic locking:
   ```bash
   stella orch config set locks.mode optimistic
   ```

### Verification

```bash
# Check promotion completed
stella promotion status

# Verify artifact in target environment
stella orch artifacts list --env --filter

# Check no stuck promotions
stella promotion list --status in_progress --older-than 5m
```

---

## Prevention

- [ ] **Timeouts:** Configure appropriate timeouts for all gates
- [ ] **Monitoring:** Alert on promotions stuck > 10 minutes
- [ ] **Health checks:** Enable connectivity pre-checks before promotion
- [ ] **Documentation:** Document SLAs for external gate services

---

## Related Resources

- **Architecture:** `docs/modules/release-orchestrator/architecture.md`
- **Related runbooks:** `orchestrator-gate-timeout.md`, `orchestrator-evidence-missing.md`
- **Dashboard:** Grafana > Stella Ops > Release Orchestrator
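
During an incident it can help to run the three commands from "Quick checks" in one pass and get a single pass/fail summary. The following is a minimal sketch, not part of the Stella tooling: the function name `run_quick_checks` and the `STELLA_BIN` override are hypothetical, and the commands are run exactly as they appear in this runbook, so any identifiers your environment requires are not supplied by the sketch.

```bash
#!/usr/bin/env bash
# Sketch: run the runbook's three quick checks in order and summarize.
# STELLA_BIN is a hypothetical override (not a documented Stella setting);
# it defaults to the `stella` CLI used throughout this runbook.

run_quick_checks() {
    local stella_bin="${STELLA_BIN:-stella}"
    local failures=0
    local check

    # Each here-document line is one quick-check command from the Diagnosis section.
    while IFS= read -r check; do
        [ -n "$check" ] || continue
        echo "==> stella $check"
        # Word splitting of $check is intentional: it holds a subcommand plus flags.
        # stdin is redirected so the command cannot consume the here-document.
        # shellcheck disable=SC2086
        if ! "$stella_bin" $check < /dev/null; then
            echo "FAILED: stella $check" >&2
            failures=$((failures + 1))
        fi
    done <<'EOF'
doctor --check check.orchestrator.job-health
promotion status
orch status
EOF

    echo "quick checks failed: $failures"
    # Exit status is zero only when every check succeeded.
    [ "$failures" -eq 0 ]
}
```

Source the file and call `run_quick_checks`. A non-zero exit status means at least one check failed, in which case the "Deep diagnosis" steps are the next stop.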