synergy moats product advisory implementations

This commit is contained in:
master
2026-01-17 01:30:03 +02:00
parent 77ff029205
commit 702a27ac83
112 changed files with 21356 additions and 127 deletions

@@ -0,0 +1,189 @@
# Runbook: Release Orchestrator - Promotion Quota Exhausted
> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
> **Task:** RUN-004 - Release Orchestrator Runbooks
## Metadata
| Field | Value |
|-------|-------|
| **Component** | Release Orchestrator |
| **Severity** | Medium |
| **On-call scope** | Platform team, Release team |
| **Last updated** | 2026-01-17 |
| **Doctor check** | `check.orchestrator.quota-status` |
---
## Symptoms
- [ ] Promotions failing with "quota exceeded"
- [ ] Alert `OrchestratorQuotaExceeded` firing
- [ ] Error: "promotion rate limit reached" or "daily quota exhausted"
- [ ] New promotions being rejected
- [ ] Queued promotions not processing
---
## Impact
| Impact Type | Description |
|-------------|-------------|
| **User-facing** | New releases are blocked until the quota resets or is increased |
| **Data integrity** | No data loss; promotions queued for later |
| **SLA impact** | Release frequency SLO may be violated |
---
## Diagnosis
### Quick checks
1. **Check Doctor diagnostics:**
```bash
stella doctor --check check.orchestrator.quota-status
```
2. **Check current quota usage:**
```bash
stella orch quota status
```
3. **Check quota limits:**
```bash
stella orch quota limits show
```
### Deep diagnosis
1. **Check promotion history:**
```bash
stella promotion list --last 24h --count
```
Look for: an unusual spike in promotions
2. **Check per-environment quotas:**
```bash
stella orch quota status --by-environment
```
3. **Check for runaway automation:**
```bash
stella promotion list --last 1h --by-actor
```
Problem if: a single actor or service is making many promotions
4. **Check when quota resets:**
```bash
stella orch quota reset-time
```
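The "runaway automation" check above comes down to counting promotions per actor and flagging outliers. A minimal sketch of that count, assuming the promotion list has already been reduced to one actor name per line (one line per promotion); the function name, the threshold, and the sample actors are hypothetical, and the real `stella` output format may differ:

```bash
# Sketch: flag actors with an unusually high promotion count.
# Assumption: stdin carries one actor name per line, one line per promotion.

# flag_noisy_actors THRESHOLD
# Prints "<actor> <count>" for every actor with more than THRESHOLD promotions.
flag_noisy_actors() {
  sort | uniq -c | awk -v t="$1" '$1 > t { print $2, $1 }'
}

# Made-up sample: ci-bot promoted 4 times, alice once.
printf '%s\n' ci-bot alice ci-bot ci-bot ci-bot | flag_noisy_actors 3
# prints: ci-bot 4
```

Choose the threshold relative to the per-actor rate you consider normal for the window being inspected.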
---
## Resolution
### Immediate mitigation
1. **Request temporary quota increase:**
```bash
stella orch quota request-increase --amount 50 --reason "Release deadline"
```
2. **Prioritize critical promotions:**
```bash
stella promotion priority set <promotion-id> high
```
3. **Cancel unnecessary queued promotions:**
```bash
stella promotion list --status queued
stella promotion cancel <promotion-id>
```
### Root cause fix
**If legitimate high volume:**
1. Increase quota limits:
```bash
stella orch quota limits set --daily 200 --hourly 50
```
2. Increase per-environment limits:
```bash
stella orch quota limits set --env production --daily 50
```
**If runaway automation:**
1. Identify the source:
```bash
stella promotion list --last 1h --by-actor --verbose
```
2. Revoke or rate-limit the service account:
```bash
stella auth rate-limit set <service-account> --promotions-per-hour 10
```
3. Fix the automation bug
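
As part of fixing the automation, a pipeline can also guard itself before it requests a promotion. The following is a sketch only, assuming the pipeline appends one Unix timestamp per promotion to a local log file; the file name, the function name, and the limit are hypothetical and are not part of the `stella` CLI:

```bash
# Sketch: client-side sliding-window guard for a CI/CD pipeline.
# Assumption: LOG_FILE holds one Unix timestamp per line, one per promotion.

# under_hourly_limit LOG_FILE LIMIT NOW
# Succeeds (exit 0) if fewer than LIMIT timestamps fall in the hour before NOW.
under_hourly_limit() {
  log_file="$1"; limit="$2"; now="$3"
  # No log yet means no promotions have been recorded.
  [ -f "$log_file" ] || return 0
  recent=$(awk -v now="$now" \
    '$1 > now - 3600 && $1 <= now { n++ } END { print n + 0 }' "$log_file")
  [ "$recent" -lt "$limit" ]
}

# Usage in a pipeline step:
#   if under_hourly_limit promotions.log 10 "$(date +%s)"; then
#     date +%s >> promotions.log   # record the attempt, then promote
#   else
#     echo "hourly promotion limit reached; skipping" >&2
#   fi
```

A guard like this complements the server-side rate limit and does not replace it, since a buggy pipeline may bypass its own checks.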
**If promotion retries are causing a spike:**
1. Check for failing promotions causing retries:
```bash
stella promotion list --status failed --last 24h
```
2. Fix the underlying promotion failures (see the runbooks listed under Related Resources)
3. Configure retry limits:
```bash
stella orch config set promotion.max_retries 3
stella orch config set promotion.retry_backoff 5m
```
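
To see why a retry cap matters, it helps to bound how many attempts a batch of failing promotions can generate. A rough sketch, assuming every retry is counted like a new promotion attempt (this runbook does not confirm how retries are counted); the function name is hypothetical:

```bash
# Sketch: upper bound on attempts produced by failing promotions.
# Assumption: each retry counts as one attempt, like the original request.

# worst_case_attempts FAILING MAX_RETRIES
worst_case_attempts() {
  echo $(( $1 * (1 + $2) ))
}

worst_case_attempts 20 3   # 20 failing promotions, 3 retries each: prints 80
```

With `promotion.max_retries` set to 3, 20 persistently failing promotions would produce at most 80 attempts under that assumption.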
**If the quota is too restrictive for the workload:**
1. Analyze actual promotion patterns:
```bash
stella orch quota analyze --last 30d
```
2. Adjust quotas based on analysis:
```bash
stella orch quota limits set --daily <recommended>
```
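
If you want to sanity-check a recommended value by hand, one simple heuristic is the observed peak day plus some headroom. This is an illustrative sketch, not a description of what `stella orch quota analyze` computes; the function name, the input format (one daily promotion count per line), and the 25% headroom are assumptions:

```bash
# Sketch: suggest a daily quota as peak daily usage plus headroom, rounded up.
# Assumption: stdin carries one daily promotion count per line.

# suggest_daily_quota HEADROOM_PCT
suggest_daily_quota() {
  awk -v h="$1" '
    $1 > peak { peak = $1 }            # track the busiest day
    END {
      q = peak * (100 + h) / 100       # add headroom
      r = int(q); if (r < q) r++       # round up to a whole promotion
      print r
    }'
}

# Made-up daily counts: the peak of 120 plus 25% headroom.
printf '%s\n' 40 55 120 80 | suggest_daily_quota 25
# prints: 150
```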
### Verification
```bash
# Check quota status
stella orch quota status
# Verify promotions processing
stella promotion list --status in_progress
# Test new promotion
stella promotion create --test --dry-run
# Check no quota errors
stella orch logs --filter "quota" --level error --last 30m
```
---
## Prevention
- [ ] **Monitoring:** Alert at 80% quota usage
- [ ] **Limits:** Set appropriate quotas based on team size and release frequency
- [ ] **Automation:** Implement rate limiting in CI/CD pipelines
- [ ] **Review:** Regularly review and adjust quotas based on usage patterns
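
The 80% monitoring threshold in the checklist above can be expressed as a small check. A minimal sketch, assuming the used and limit values have already been obtained as integers (how to extract them from `stella orch quota status` is not shown); the function names are hypothetical:

```bash
# Sketch: warn when quota usage reaches a threshold (default 80%).
# Assumption: USED and LIMIT are non-negative integers and LIMIT is not zero.

# quota_usage_pct USED LIMIT -> integer percentage, rounded down
quota_usage_pct() {
  echo $(( $1 * 100 / $2 ))
}

# quota_alert USED LIMIT [THRESHOLD_PCT]
# Succeeds (exit 0) when usage is at or above the threshold.
quota_alert() {
  [ "$(quota_usage_pct "$1" "$2")" -ge "${3:-80}" ]
}

if quota_alert 165 200; then
  echo "warning: quota usage at $(quota_usage_pct 165 200)%"
fi
# prints: warning: quota usage at 82%
```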
---
## Related Resources
- **Architecture:** `docs/modules/release-orchestrator/quotas.md`
- **Related runbooks:** `orchestrator-promotion-stuck.md`
- **Quota management:** `docs/operations/quota-management.md`