Runbook: Release Orchestrator - Promotion Quota Exhausted
Sprint: SPRINT_20260117_029_DOCS_runbook_coverage Task: RUN-004 - Release Orchestrator Runbooks
Metadata
| Field | Value |
|---|---|
| Component | Release Orchestrator |
| Severity | Medium |
| On-call scope | Platform team, Release team |
| Last updated | 2026-01-17 |
| Doctor check | check.orchestrator.quota-status |
Symptoms
- Promotions failing with "quota exceeded"
- Alert OrchestratorQuotaExceeded firing
- Error: "promotion rate limit reached" or "daily quota exhausted"
- New promotions being rejected
- Queued promotions not processing
Impact
| Impact Type | Description |
|---|---|
| User-facing | New releases blocked until quota resets or increases |
| Data integrity | No data loss; promotions queued for later |
| SLA impact | Release frequency SLO may be violated |
Diagnosis
Quick checks
- Check Doctor diagnostics:
    stella doctor --check check.orchestrator.quota-status
- Check current quota usage:
    stella orch quota status
- Check quota limits:
    stella orch quota limits show
Deep diagnosis
- Check promotion history:
    stella promotion list --last 24h --count
  Look for: Unusual spike in promotions
- Check per-environment quotas:
    stella orch quota status --by-environment
- Check for runaway automation:
    stella promotion list --last 1h --by-actor
  Problem if: Single actor/service making many promotions
- Check when quota resets:
    stella orch quota reset-time
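The "single actor making many promotions" check can be made concrete with a small script. The following is a minimal sketch that assumes the per-actor listing has already been exported as one actor name per promotion; the function name, thresholds, and sample actors are hypothetical and are not Release Orchestrator defaults.

```python
from collections import Counter

def find_runaway_actors(actors, share_threshold=0.5, min_count=10):
    """Return (actor, count) pairs that dominate the promotion window.

    `actors` holds one actor name per promotion in the window.
    Both thresholds are illustrative assumptions; tune them per team.
    """
    counts = Counter(actors)
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        (actor, n)
        for actor, n in counts.most_common()
        if n >= min_count and n / total >= share_threshold
    ]

# Hypothetical one-hour window: a CI service account made 40 of 46 promotions.
window = ["svc-ci-bot"] * 40 + ["alice"] * 4 + ["bob"] * 2
print(find_runaway_actors(window))  # [('svc-ci-bot', 40)]
```

Requiring both a minimum count and a minimum share avoids flagging a lone user during a quiet hour.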
Resolution
Immediate mitigation
- Request temporary quota increase:
    stella orch quota request-increase --amount 50 --reason "Release deadline"
- Prioritize critical promotions:
    stella promotion priority set <promotion-id> high
- Cancel unnecessary queued promotions:
    stella promotion list --status queued
    stella promotion cancel <promotion-id>
Root cause fix
If legitimate high volume:
- Increase quota limits:
    stella orch quota limits set --daily 200 --hourly 50
- Increase per-environment limits:
    stella orch quota limits set --env production --daily 50
If runaway automation:
- Identify the source:
    stella promotion list --last 1h --by-actor --verbose
- Revoke or rate-limit the service account:
    stella auth rate-limit set <service-account> --promotions-per-hour 10
- Fix the automation bug
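Alongside the server-side limit, the automation itself can throttle its promotion requests. Below is a minimal token-bucket sketch, assuming the pipeline calls `try_acquire()` before each promotion; the class, its parameters, and the burst size of 2 are hypothetical and not part of the stella CLI. The rate of 10 per hour mirrors the example limit above.

```python
import time

class TokenBucket:
    """Client-side limiter: `rate_per_hour` sustained, `capacity` burst."""

    def __init__(self, rate_per_hour, capacity, clock=time.monotonic):
        self.rate_per_hour = rate_per_hour
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        """Take one token if available; False means the caller should wait."""
        now = self.clock()
        refill = (now - self.last) * self.rate_per_hour / 3600.0
        self.tokens = min(float(self.capacity), self.tokens + refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Demo with a fake clock so the example runs instantly.
fake_now = [0.0]
bucket = TokenBucket(rate_per_hour=10, capacity=2, clock=lambda: fake_now[0])
burst = [bucket.try_acquire() for _ in range(3)]  # two allowed, third refused
fake_now[0] = 360.0                               # six minutes later: one token back
after_wait = bucket.try_acquire()
print(burst, after_wait)  # [True, True, False] True
```

A refused call should make the pipeline wait or skip the promotion, not retry in a tight loop.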
If promotion retries causing spike:
- Check for failing promotions causing retries:
    stella promotion list --status failed --last 24h
- Fix underlying promotion failures (see other runbooks)
- Configure retry limits:
    stella orch config set promotion.max_retries 3
    stella orch config set promotion.retry_backoff 5m
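To see why capping retries matters, the worst-case load from failing promotions can be estimated with simple arithmetic. This sketch assumes that every retry counts against the quota and that the backoff is a fixed interval, neither of which is confirmed here; the function names and the failure count of 20 are hypothetical.

```python
def worst_case_attempts(failing_promotions, max_retries):
    """Upper bound on attempts if every failing promotion uses all its retries."""
    return failing_promotions * (1 + max_retries)

def retry_span_minutes(max_retries, backoff_minutes):
    """Minutes from the first attempt to the last retry, assuming fixed backoff."""
    return max_retries * backoff_minutes

# 20 failing promotions with the retry settings from the step above.
print(worst_case_attempts(20, 3))   # 80
# The same failures with a looser, hypothetical cap of 10 retries.
print(worst_case_attempts(20, 10))  # 220
# With a 5-minute backoff, 3 retries spread over 15 minutes.
print(retry_span_minutes(3, 5))     # 15
```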
If quota too restrictive for workload:
- Analyze actual promotion patterns:
    stella orch quota analyze --last 30d
- Adjust quotas based on analysis:
    stella orch quota limits set --daily <recommended>
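One way to turn observed usage into a daily limit is to take a high percentile of the daily promotion counts and add headroom. This is only an illustrative sketch: how `stella orch quota analyze` derives its recommendation is not described here, and the function name, the 95th percentile, the 25% headroom, and the sample data are all assumptions.

```python
import math

def recommend_daily_quota(daily_counts, percentile=0.95, headroom=1.25):
    """Suggest a daily limit from historical per-day promotion counts.

    Uses the nearest-rank percentile, then multiplies by a headroom factor.
    """
    if not daily_counts:
        raise ValueError("need at least one day of history")
    ordered = sorted(daily_counts)
    rank = max(1, math.ceil(percentile * len(ordered)))  # 1-based nearest rank
    return math.ceil(ordered[rank - 1] * headroom)

# Hypothetical ten days of history with one release-day spike.
history = [40, 55, 60, 48, 52, 120, 58, 61, 47, 50]
print(recommend_daily_quota(history))  # 150
```

With short histories a high percentile is effectively the peak day, so a longer window gives a steadier recommendation.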
Verification
# Check quota status
stella orch quota status
# Verify promotions processing
stella promotion list --status in_progress
# Test new promotion
stella promotion create --test --dry-run
# Check no quota errors
stella orch logs --filter "quota" --level error --last 30m
Prevention
- Monitoring: Alert at 80% quota usage
- Limits: Set appropriate quotas based on team size and release frequency
- Automation: Implement rate limiting in CI/CD pipelines
- Review: Regularly review and adjust quotas based on usage patterns
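The 80% alert threshold can be expressed as a small classification rule. This sketch assumes the used and limit values come from the quota status output; the function name and the returned labels are hypothetical.

```python
def quota_alert_level(used, limit, warn_percent=80):
    """Classify quota usage as 'ok', 'warning', or 'exhausted'.

    Integer arithmetic avoids floating-point edge cases at the boundary.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    if used >= limit:
        return "exhausted"
    if used * 100 >= warn_percent * limit:
        return "warning"
    return "ok"

print(quota_alert_level(120, 200))  # ok
print(quota_alert_level(160, 200))  # warning
print(quota_alert_level(200, 200))  # exhausted
```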
Related Resources
- Architecture: docs/modules/release-orchestrator/quotas.md
- Related runbooks: orchestrator-promotion-stuck.md
- Quota management: docs/operations/quota-management.md