---
checkId: check.environment.capacity
plugin: stellaops.doctor.environment
severity: warn
tags: [environment, capacity, resources, cpu, memory, storage]
---
# Environment Capacity

## What It Checks
Queries the Release Orchestrator API (`/api/v1/environments/capacity`) and evaluates CPU, memory, storage, and deployment slot usage for every configured environment. Each resource is compared against two thresholds:

- **Warn** when usage >= 75%
- **Fail** when usage >= 90%

Deployment slot utilization is calculated as `activeDeployments / maxConcurrentDeployments * 100`.

If no environments exist, the check passes with a note. If the orchestrator is unreachable, the check returns warn.

## Why It Matters
Resource exhaustion in a target environment blocks deployments and can cause running services to crash or degrade. Detecting capacity pressure early gives operators time to scale up, clean up unused deployments, or redistribute workloads before an outage occurs. In production environments, exceeding 90% on any resource dimension is a leading indicator of imminent service disruption.

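
The threshold rules and the slot utilization formula from "What It Checks" can be sketched in a few lines of shell. This is only an illustration, not the plugin's actual implementation: the function names `classify_usage` and `slot_utilization` are hypothetical, usage is assumed to be a whole-number percentage, and treating a non-positive slot limit as 0% is an assumption made to avoid division by zero.

```bash
#!/usr/bin/env bash

# Hypothetical helper: map a usage percentage to pass/warn/fail
# using the documented thresholds (warn >= 75, fail >= 90).
classify_usage() {
  local pct=$1
  if [ "$pct" -ge 90 ]; then
    echo fail
  elif [ "$pct" -ge 75 ]; then
    echo warn
  else
    echo pass
  fi
}

# Hypothetical helper: activeDeployments / maxConcurrentDeployments * 100,
# computed with integer arithmetic (the result is rounded down).
slot_utilization() {
  local active=$1 max=$2
  if [ "$max" -le 0 ]; then
    # Assumption: report 0% when no slot limit is configured.
    echo 0
    return
  fi
  echo $(( active * 100 / max ))
}

# Example: 15 active deployments out of 20 slots is 75%, which is a warning.
classify_usage "$(slot_utilization 15 20)"   # prints "warn"
```

Rounding down does not change the outcome at the boundaries, because a utilization reaches a whole-number threshold exactly when its rounded-down value does.
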
## Common Causes
- Gradual organic growth without corresponding resource scaling
- Runaway or leaked processes consuming CPU/memory
- Accumulated old deployments that were never cleaned up
- Resource limits set too tightly relative to actual workload
- Unexpected traffic spike or batch job saturating storage

## How to Fix

### Docker Compose
```bash
# Check current resource usage on the host
docker stats --no-stream

# Increase resource limits in docker-compose.stella-ops.yml
# Edit the target service under deploy.resources.limits:
#   cpus: '4.0'
#   memory: 8G

# Remove stopped containers to free deployment slots
docker container prune -f

# Restart with updated limits
docker compose -f docker-compose.stella-ops.yml up -d
```

### Bare Metal / systemd
```bash
# Check system resource usage
free -h && df -h && top -bn1 | head -20

# Increase memory/CPU limits in systemd unit overrides
sudo systemctl edit stellaops-environment-agent.service
# Add under [Service]:
#   MemoryMax=8G
#   CPUQuota=400%

sudo systemctl daemon-reload && sudo systemctl restart stellaops-environment-agent.service

# Clean up old deployments
stella env cleanup
```

### Kubernetes / Helm
```bash
# Check node resource usage
kubectl top nodes
kubectl top pods -n stellaops

# Scale up resources via Helm values
helm upgrade stellaops stellaops/stellaops \
  --set environments.resources.limits.cpu=4 \
  --set environments.resources.limits.memory=8Gi \
  --set environments.maxConcurrentDeployments=20

# Or add more nodes to the cluster for horizontal scaling
```

## Verification
```bash
stella doctor run --check check.environment.capacity
```

## Related Checks
- `check.environment.deployments` - checks deployed service health, which may degrade under capacity pressure
- `check.environment.connectivity` - verifies agents are reachable, which capacity exhaustion can prevent