---
checkId: check.environment.deployments
plugin: stellaops.doctor.environment
severity: warn
tags: [environment, deployment, services, health]
---
# Environment Deployment Health

## What It Checks
Queries the Release Orchestrator (`/api/v1/environments/deployments`) for all deployed services across all environments. Each service is evaluated for:
- **Status** -- `failed`, `stopped`, `degraded`, or healthy
- **Replica health** -- compares `healthyReplicas` against total `replicas`; a service with fewer healthy replicas than total replicas is treated as degraded

Severity escalation:
- **Fail** if any service has status `failed`, whether production or non-production (production is detected by an environment name containing "prod")
- **Warn** if services are `degraded` (partial replica health)
- **Warn** if services are `stopped`
- **Pass** if all services are healthy
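The escalation rules above can be sketched as a small set of shell functions. This is only an illustration of the logic as described here, not the plugin's actual implementation: the function names (`is_production`, `classify_service`, `worst_result`) are hypothetical, the `healthy` status string is a placeholder for whatever the orchestrator reports for a healthy service, and the case-sensitive match on "prod" is an assumption.

```bash
#!/usr/bin/env bash

# Hypothetical helper: an environment counts as production when its
# name contains "prod" (case-sensitive matching is an assumption).
is_production() {
  case "$1" in
    *prod*) return 0 ;;
    *)      return 1 ;;
  esac
}

# Hypothetical helper: map one service to a result.
# Arguments: status, healthy replica count, total replica count.
classify_service() {
  status="$1"; healthy="$2"; total="$3"
  case "$status" in
    failed)           echo "fail" ;;
    stopped|degraded) echo "warn" ;;
    *)
      # Any other status: partial replica health still counts as degraded.
      if [ "$healthy" -lt "$total" ]; then
        echo "warn"
      else
        echo "pass"
      fi
      ;;
  esac
}

# Hypothetical helper: the overall result is the worst per-service result.
worst_result() {
  worst="pass"
  for r in "$@"; do
    if [ "$r" = "fail" ]; then
      worst="fail"
    elif [ "$r" = "warn" ] && [ "$worst" = "pass" ]; then
      worst="warn"
    fi
  done
  echo "$worst"
}

# Example: one failed service, one with 2 of 3 replicas healthy, one fully healthy.
worst_result \
  "$(classify_service failed 0 3)" \
  "$(classify_service healthy 2 3)" \
  "$(classify_service healthy 3 3)"   # prints "fail"
```

Because a `failed` service yields Fail in any environment, `is_production` does not change the result in this sketch; it would only matter for how a failure is reported or prioritized.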

## Why It Matters
Failed services in production directly impact end users and can violate SLA commitments. Degraded services with partial replica health have reduced fault tolerance and can cascade into full outages under load. Stopped services may indicate incomplete deployments or maintenance windows that were never closed. This check gives an early signal that a deployment rollout needs intervention.

## Common Causes
- Service crashed due to unhandled exception or OOM kill
- Deployment rolled out a bad image version
- Dependency (database, cache, message broker) became unavailable
- Resource exhaustion preventing replicas from starting
- Health check endpoint misconfigured, causing false failures
- Node failure taking down co-located replicas

## How to Fix

### Docker Compose
```bash
# Identify failed containers
docker ps -a --filter "status=exited" --filter "status=dead"

# View the last 200 log lines for the failed service
docker logs --tail 200 <container-name>

# Restart the failed service
docker compose -f docker-compose.stella-ops.yml restart <service-name>

# If the image is bad, roll back to previous version
# Edit docker-compose.stella-ops.yml to pin the previous image tag
docker compose -f docker-compose.stella-ops.yml up -d <service-name>
```

### Bare Metal / systemd
```bash
# Check service status
sudo systemctl status stellaops-<service-name>

# View logs for crash details
sudo journalctl -u stellaops-<service-name> --since "30 minutes ago" --no-pager

# Restart the service
sudo systemctl restart stellaops-<service-name>

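# Roll back to previous binary
# Stop the service first: overwriting a running executable can fail with "Text file busy"
sudo systemctl stop stellaops-<service-name>
sudo cp /opt/stellaops/backup/<service-name> /opt/stellaops/bin/<service-name>
sudo systemctl start stellaops-<service-name>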
```

### Kubernetes / Helm
```bash
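# List pods in one environment's namespace that are not in the Running phase
# (crash-looping pods can still report phase Running, so also check STATUS and RESTARTS in a plain `kubectl get pods`)
kubectl get pods -n stellaops-<env> --field-selector=status.phase!=Running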

# View events and logs for failing pods
kubectl describe pod <pod-name> -n stellaops-<env>
kubectl logs <pod-name> -n stellaops-<env> --previous

# Rollback a deployment
kubectl rollout undo deployment/<service-name> -n stellaops-<env>

# Or via Helm (list revisions first to find the one to roll back to)
helm history stellaops -n stellaops-<env>
helm rollback stellaops <previous-revision> -n stellaops-<env>
```

## Verification
```bash
stella doctor run --check check.environment.deployments
```

## Related Checks
- `check.environment.capacity` - resource exhaustion can cause deployment failures
- `check.environment.connectivity` - agent must be reachable to report deployment health
- `check.environment.drift` - configuration drift can cause services to fail after redeployment