git.stella-ops.org/docs/operations/runbooks/orchestrator-quota-exceeded.md


Runbook: Release Orchestrator - Promotion Quota Exhausted

Sprint: SPRINT_20260117_029_DOCS_runbook_coverage
Task: RUN-004 - Release Orchestrator Runbooks

Metadata

| Field | Value |
|---|---|
| Component | Release Orchestrator |
| Severity | Medium |
| On-call scope | Platform team, Release team |
| Last updated | 2026-01-17 |
| Doctor check | check.orchestrator.quota-status |

Symptoms

  • Promotions failing with "quota exceeded"
  • Alert OrchestratorQuotaExceeded firing
  • Errors such as "promotion rate limit reached" or "daily quota exhausted"
  • New promotions being rejected
  • Queued promotions not processing

Impact

| Impact type | Description |
|---|---|
| User-facing | New releases are blocked until the quota resets or is increased |
| Data integrity | No data loss; promotions remain queued for later processing |
| SLA impact | The release-frequency SLO may be violated |

Diagnosis

Quick checks

  1. Check Doctor diagnostics:

    stella doctor --check check.orchestrator.quota-status
    
  2. Check current quota usage:

    stella orch quota status
    
  3. Check quota limits:

    stella orch quota limits show
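The quick checks give you a usage figure and a limit; the Prevention section recommends alerting at 80% usage. The helper below is a minimal sketch for classifying a reading by hand. `quota_state` is hypothetical (not part of the stella CLI), and it assumes you pass the used and limit numbers yourself, because the output format of `stella orch quota status` is not documented here.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the stella CLI): classify quota usage.
# Usage: quota_state <used> <limit> [warn_pct]
# warn_pct defaults to 80, matching the alert threshold in Prevention.
quota_state() {
  local used=$1 limit=$2 warn_pct=${3:-80}
  if [ "$limit" -le 0 ]; then
    echo "invalid limit: $limit" >&2
    return 2
  fi
  # Integer percentage, rounded down.
  local pct=$(( used * 100 / limit ))
  if [ "$used" -ge "$limit" ]; then
    echo "exhausted ${pct}%"
  elif [ "$pct" -ge "$warn_pct" ]; then
    echo "warning ${pct}%"
  else
    echo "ok ${pct}%"
  fi
}

quota_state 170 200   # prints "warning 85%"
```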
    

Deep diagnosis

  1. Check promotion history:

    stella promotion list --last 24h --count
    

    Look for: an unusual spike in promotions compared with your normal daily volume

  2. Check per-environment quotas:

    stella orch quota status --by-environment
    
  3. Check for runaway automation:

    stella promotion list --last 1h --by-actor
    

    Problem if: a single actor or service account is responsible for most of the promotions

  4. Check when the quota resets:

    stella orch quota reset-time
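To decide whether the 24-hour count from step 1 of the deep diagnosis is "unusual", one option is to compare it with the average of previous days. The sketch below assumes you already have those daily counts, however you obtain them (this runbook does not document a per-day query). `is_spike` and the factor of 2 used in the example are illustrative assumptions, not Stella Ops behavior.

```bash
#!/usr/bin/env bash
# Hypothetical helper: is the last-24h promotion count a spike?
# Usage: is_spike <current_24h> <factor> <previous_day_count>...
# Returns 0 (spike) when current > factor * average(previous days).
# factor must be an integer because bash arithmetic is integer-only.
is_spike() {
  local current=$1 factor=$2
  shift 2
  if [ "$#" -eq 0 ]; then
    echo "no baseline counts given" >&2
    return 2
  fi
  local sum=0 n=0 c
  for c in "$@"; do
    sum=$(( sum + c ))
    n=$(( n + 1 ))
  done
  # current > factor * sum / n, rearranged to avoid integer division.
  if [ $(( current * n )) -gt $(( factor * sum )) ]; then
    echo "spike: $current vs average $(( sum / n )) per day"
    return 0
  fi
  echo "normal: $current vs average $(( sum / n )) per day"
  return 1
}
```

For example, `is_spike 90 2 20 30 40` prints `spike: 90 vs average 30 per day`, because 90 is more than twice the three-day average of 30.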
    

Resolution

Immediate mitigation

  1. Request a temporary quota increase:

    stella orch quota request-increase --amount 50 --reason "Release deadline"
    
  2. Prioritize critical promotions:

    stella promotion priority set <promotion-id> high
    
  3. Cancel unnecessary queued promotions:

    stella promotion list --status queued
    stella promotion cancel <promotion-id>
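When several queued promotions need cancelling, a small loop saves typing each ID. This sketch reads promotion IDs (one per line, chosen by hand from `stella promotion list --status queued`) and calls the documented `stella promotion cancel` command for each. `cancel_promotions` and the `DRY_RUN` variable are hypothetical; dry run is the default, so nothing is cancelled until you set `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: cancel the promotion IDs read from stdin.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to really cancel.
# The final count is of attempts, not of successful cancellations.
cancel_promotions() {
  local id count=0
  while IFS= read -r id; do
    # Skip blank lines so a trailing newline does not produce an empty ID.
    if [ -z "$id" ]; then
      continue
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "would cancel $id"
    else
      stella promotion cancel "$id"
    fi
    count=$(( count + 1 ))
  done
  echo "processed $count promotion(s)"
}
```

Review the dry-run output first, then run `DRY_RUN=0 cancel_promotions < ids.txt`, where `ids.txt` is a file of IDs you prepare.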
    

Root cause fix

If the volume is legitimately high:

  1. Increase quota limits:

    stella orch quota limits set --daily 200 --hourly 50
    
  2. Increase per-environment limits:

    stella orch quota limits set --env production --daily 50
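Before raising limits, check how the hourly and daily caps interact. Assuming both caps apply at the same time (how the orchestrator combines them is not documented here), the example values `--daily 200 --hourly 50` let a sustained burst use the whole daily quota in 200 / 50 = 4 hours: the hourly cap smooths bursts but does not stop the daily cap from being reached. The hypothetical helper below does that arithmetic, rounding up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: hours of maximum-rate promotions needed to use up
# the daily quota, assuming the hourly and daily caps both apply.
# Usage: hours_to_exhaust <daily_limit> <hourly_limit>
hours_to_exhaust() {
  local daily=$1 hourly=$2
  if [ "$hourly" -le 0 ]; then
    echo "invalid hourly limit: $hourly" >&2
    return 2
  fi
  # Ceiling division: a partially used hour still counts as an hour.
  echo $(( (daily + hourly - 1) / hourly ))
}

hours_to_exhaust 200 50   # prints 4
```

A result above 24 means the hourly cap alone keeps a day's usage under the daily limit.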
    

If the cause is runaway automation:

  1. Identify the source:

    stella promotion list --last 1h --by-actor --verbose
    
  2. Revoke or rate-limit the service account:

    stella auth rate-limit set <service-account> --promotions-per-hour 10
    
  3. Fix the automation bug that caused the excess promotions
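To apply step 2 to several accounts at once, the sketch below tallies actors and prints (but does not run) the documented `stella auth rate-limit set` command for each actor at or above a threshold. It assumes you can extract one actor name per promotion, one per line; the output format of `--by-actor --verbose` is not documented here. `rate_limit_commands`, the threshold, and the default of 10 promotions per hour (taken from the example command) are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read one actor name per promotion on stdin and print
# a rate-limit command for every actor with at least <threshold> promotions.
# It only prints commands; review them before running any.
# Usage: rate_limit_commands <threshold> [promotions_per_hour]
rate_limit_commands() {
  local threshold=$1 per_hour=${2:-10}
  # uniq -c needs sorted input; the second sort lists the busiest actors first.
  sort | uniq -c | sort -rn |
    awk -v t="$threshold" -v ph="$per_hour" \
      '$1 >= t { printf "stella auth rate-limit set %s --promotions-per-hour %d\n", $2, ph }'
}
```

Actor names containing spaces would be split by awk; service-account names normally do not contain them.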

If promotion retries are causing the spike:

  1. Check for failing promotions causing retries:

    stella promotion list --status failed --last 24h
    
  2. Fix underlying promotion failures (see other runbooks)

  3. Configure retry limits:

    stella orch config set promotion.max_retries 3
    stella orch config set promotion.retry_backoff 5m
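Retries can multiply quota usage. Assuming each retry attempt consumes quota like a new promotion and that `retry_backoff` is a fixed delay between attempts (neither is confirmed in this runbook), a persistently failing promotion with `max_retries 3` can consume up to 1 + 3 = 4 units of quota, and its last retry starts 3 retries at 5-minute intervals = 15 minutes after the first failure. The hypothetical helper below computes both figures for a batch of failing promotions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: worst-case quota use of failing promotions, assuming
# every attempt (first try + each retry) consumes one unit of quota and the
# backoff is a fixed delay, in minutes, between attempts.
# Usage: retry_worst_case <failing_promotions> <max_retries> <backoff_minutes>
retry_worst_case() {
  local failing=$1 max_retries=$2 backoff=$3
  local units=$(( failing * (1 + max_retries) ))
  # Ignores attempt run time; only the waits between attempts are counted.
  local last_retry=$(( max_retries * backoff ))
  echo "quota units: $units, last retry after ${last_retry}m"
}

retry_worst_case 20 3 5   # prints "quota units: 80, last retry after 15m"
```

If the backoff is actually exponential, the retries spread over a longer period, but the worst-case quota units stay the same.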
    

If the quota is too restrictive for the workload:

  1. Analyze actual promotion patterns:

    stella orch quota analyze --last 30d
    
  2. Adjust quotas based on analysis:

    stella orch quota limits set --daily <recommended>
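To sanity-check the recommendation from `stella orch quota analyze`, one common heuristic is the 95th percentile of daily promotion counts plus some headroom. The sketch below implements that heuristic; it is not the analyzer's algorithm. `recommend_daily_limit`, the nearest-rank percentile method, and the 20% default headroom are assumptions. It reads one daily count per line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: suggest a daily limit from daily promotion counts
# (one integer per line on stdin), using the nearest-rank 95th percentile
# plus a headroom percentage (default 20).
# Usage: recommend_daily_limit [headroom_pct] < daily_counts.txt
recommend_daily_limit() {
  local headroom=${1:-20}
  sort -n | awk -v h="$headroom" '
    { v[NR] = $1 }
    END {
      if (NR == 0) { print "no data" > "/dev/stderr"; exit 1 }
      # Nearest rank: ceil(0.95 * NR), computed with integers.
      r = int((95 * NR + 99) / 100)
      p95 = v[r]
      rec = p95 * (100 + h) / 100
      # Round the recommendation up to a whole promotion.
      if (rec > int(rec)) rec = int(rec) + 1
      printf "p95=%d recommended=%d\n", p95, rec
    }'
}

seq 1 30 | recommend_daily_limit   # prints "p95=29 recommended=35"
```

Using a high percentile rather than the maximum keeps one exceptional day from inflating the limit.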
    

Verification

    # Check quota status
    stella orch quota status

    # Verify promotions are processing
    stella promotion list --status in_progress

    # Test a new promotion (dry run)
    stella promotion create --test --dry-run

    # Check that there are no recent quota errors
    stella orch logs --filter "quota" --level error --last 30m
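After a quota reset or increase, queued promotions may take a while to resume. A minimal polling helper can repeat one of the verification checks until it succeeds. `wait_until` is hypothetical, and using it with a `stella` command assumes that command exits non-zero while the problem persists, which this runbook does not confirm.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rerun a command until it succeeds or time runs out.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
wait_until() {
  local timeout=$1 interval=$2
  shift 2
  # SECONDS is a bash builtin counting seconds since the shell started.
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "timed out after ${timeout}s: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
  return 0
}
```

For example, `wait_until 1800 60 stella promotion create --test --dry-run` would retry the dry run every minute for up to 30 minutes, under the exit-code assumption above.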

Prevention

  • Monitoring: Alert at 80% quota usage
  • Limits: Set appropriate quotas based on team size and release frequency
  • Automation: Implement rate limiting in CI/CD pipelines
  • Review: Regularly review and adjust quotas based on usage patterns
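For the Automation item, one way to rate-limit on the pipeline side is a sliding window: record a timestamp per promotion and refuse new ones once the last hour already holds the maximum. The sketch below is a hypothetical helper, assuming a single CI runner with a writable state file; it does no locking, so concurrent jobs sharing one file could race. Call it before the pipeline's promotion step and skip the promotion when it returns non-zero.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sliding-window limiter for promotions from CI.
# Keeps one epoch timestamp per allowed promotion in <state_file>.
# Usage: allow_promotion <state_file> <max_per_hour> [now_epoch]
# Returns 0 (and records the promotion) if allowed, 1 if over the limit.
allow_promotion() {
  local state=$1 max=$2 now=${3:-$(date +%s)}
  local window_start=$(( now - 3600 ))
  touch "$state"
  # Drop timestamps that have left the one-hour window.
  awk -v s="$window_start" '$1 > s' "$state" > "$state.tmp" &&
    mv "$state.tmp" "$state"
  local used
  used=$(awk 'END { print NR }' "$state")
  if [ "$used" -ge "$max" ]; then
    echo "rate limit: $used promotions in the last hour (max $max)" >&2
    return 1
  fi
  echo "$now" >> "$state"
  return 0
}
```

The optional `now_epoch` argument exists so the window logic can be tested with fixed timestamps; in a pipeline, omit it to use the current time.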

Related resources

  • Architecture: docs/modules/release-orchestrator/quotas.md
  • Related runbooks: orchestrator-promotion-stuck.md
  • Quota management: docs/operations/quota-management.md