| checkId | plugin | severity | tags |
|---|---|---|---|
| check.environment.connectivity | stellaops.doctor.environment | warn | |
Environment Connectivity
What It Checks
Retrieves the list of environments from the Release Orchestrator (/api/v1/environments), then probes each environment agent's /health endpoint. For each agent the check measures:
- Reachability -- whether the health endpoint returns a success status code
- Latency -- warns if the response takes more than 500 ms
- TLS certificate validity -- warns if the agent's TLS certificate expires within 30 days
- Authentication -- detects 401/403 responses indicating credential issues
If any agent is unreachable, the check fails. High latency or an expiring certificate produces a warning.
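The decision rules above can be summarised in a short shell sketch. It is an illustration only, not the Doctor plugin's implementation: the function name `classify_agent_probe`, its arguments, the result labels, and the choice to report 401/403 as a failure are all assumptions. Only the 500 ms and 30-day thresholds come from the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Stella Ops.
# Args: HTTP status code ("000" when there was no response),
#       latency in milliseconds, days until the TLS certificate expires.
classify_agent_probe() {
  local status="$1" latency_ms="$2" cert_days_left="$3"

  case "$status" in
    # Assumption: credential problems are reported as a failure with their own reason.
    401|403) echo "fail:auth"; return ;;
    # Any 2xx status counts as reachable; fall through to the warning rules.
    2??) ;;
    # Everything else (including "000", no response) is treated as unreachable.
    *) echo "fail:unreachable"; return ;;
  esac

  # Warning rules; the first one that matches is reported.
  if [ "$latency_ms" -gt 500 ]; then
    echo "warn:latency"
    return
  fi
  if [ "$cert_days_left" -le 30 ]; then
    echo "warn:cert-expiry"
    return
  fi
  echo "pass"
}
```

For example, `classify_agent_probe 200 120 90` prints `pass`, while `classify_agent_probe 200 750 90` prints `warn:latency`.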
Why It Matters
Environment agents are the control surface through which Stella Ops manages deployments, collects telemetry, and enforces policy. An unreachable agent means the platform cannot deploy to, monitor, or roll back services in that environment. TLS certificate expiry causes hard connectivity failures with no graceful degradation. High latency slows deployment pipelines and can cause timeouts in approval workflows.
Common Causes
- Environment agent service is stopped or crashed
- Firewall rule change blocking the agent port
- Network partition between Stella Ops control plane and target environment
- TLS certificate not renewed before expiry
- Agent authentication credentials rotated without updating Stella Ops configuration
- DNS resolution failure for the agent hostname
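When probing an agent by hand, curl's exit code often narrows down which of the causes above applies. The sketch below maps a few well-known curl exit codes to the most likely cause. The function name `likely_cause` is hypothetical and the mapping is a rough triage aid under that assumption, not something the check itself is known to do.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a curl exit code to the most likely cause from the list above.
likely_cause() {
  case "$1" in
    0)     echo "agent responded; inspect the HTTP status and latency" ;;
    6)     echo "DNS resolution failure for the agent hostname" ;;
    7)     echo "connection failed: agent stopped or port blocked by a firewall" ;;
    28)    echo "timed out: possible network partition or dropped packets" ;;
    35|60) echo "TLS handshake or certificate problem" ;;
    *)     echo "unclassified curl error $1" ;;
  esac
}

# Usage sketch; substitute the real agent host and port for the placeholders:
#   curl -sS -o /dev/null --max-time 5 "https://<agent-host>:<agent-port>/health"
#   likely_cause "$?"
```

A 401 or 403 response still yields exit code 0 here, so credential problems show up in the HTTP status rather than in the exit code.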
How to Fix
Docker Compose
# Check if the environment agent container is running
docker ps --filter "name=environment-agent"
# View agent logs for errors
docker logs stellaops-environment-agent --tail 100
# Restart the agent
docker compose -f docker-compose.stella-ops.yml restart environment-agent
# If TLS cert is expiring, replace the certificate files
# mounted into the agent container and restart
cp /path/to/new/cert.pem devops/compose/certs/agent.pem
cp /path/to/new/key.pem devops/compose/certs/agent-key.pem
docker compose -f docker-compose.stella-ops.yml restart environment-agent
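Before and after replacing the certificate files, it can help to confirm how many days the certificate has left. The following is a minimal sketch, assuming GNU `date` (for `-d`) and an installed `openssl`; the function name `cert_days_left` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Assumes GNU date; BSD/macOS date uses different flags.
# Args: the "notAfter=..." line printed by `openssl x509 -noout -enddate`,
#       and optionally the current time in epoch seconds (defaults to now).
cert_days_left() {
  local not_after="${1#notAfter=}"   # strip the "notAfter=" prefix if present
  local now="${2:-$(date +%s)}"
  local expiry
  expiry="$(date -u -d "$not_after" +%s)" || return 1
  # Whole days remaining; integer division truncates toward zero.
  echo $(( (expiry - now) / 86400 ))
}

# Usage sketch against the certificate file copied in the steps above:
#   cert_days_left "$(openssl x509 -noout -enddate -in devops/compose/certs/agent.pem)"
```

A result of 30 or less means the certificate is inside the check's warning window. For a plain yes/no answer, `openssl x509 -noout -checkend 2592000 -in <file>` exits non-zero when the certificate expires within 30 days.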
Bare Metal / systemd
# Check agent service status
sudo systemctl status stellaops-environment-agent
# View logs
sudo journalctl -u stellaops-environment-agent --since "1 hour ago"
# Restart agent
sudo systemctl restart stellaops-environment-agent
# Renew TLS certificate
sudo cp /path/to/new/cert.pem /etc/stellaops/certs/agent.pem
sudo cp /path/to/new/key.pem /etc/stellaops/certs/agent-key.pem
sudo systemctl restart stellaops-environment-agent
# Test network connectivity from control plane
curl -v https://<agent-host>:<agent-port>/health
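To compare a manual probe against the 500 ms latency threshold, curl can report the status code and total request time directly. This is a sketch under assumptions: the helper names `seconds_to_ms` and `probe_agent` are hypothetical, and the 5-second timeout is an arbitrary choice, not a documented setting of the check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert curl's %{time_total} (fractional seconds) to whole milliseconds.
seconds_to_ms() {
  awk -v s="$1" 'BEGIN { printf "%d\n", s * 1000 }'
}

# Hypothetical helper: print "<http_code> <latency_ms>" for the given URL.
# curl reports http_code 000 when no HTTP response was received.
probe_agent() {
  local out code secs
  out="$(curl -s -o /dev/null --max-time 5 -w '%{http_code} %{time_total}' "$1")"
  code="${out%% *}"   # text before the first space
  secs="${out##* }"   # text after the last space
  echo "$code $(seconds_to_ms "$secs")"
}

# Usage sketch; substitute the real agent host and port for the placeholders:
#   probe_agent "https://<agent-host>:<agent-port>/health"
```

A latency value above 500 in the output corresponds to the condition under which the check warns.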
Kubernetes / Helm
# Check agent pod status
kubectl get pods -n stellaops -l app=environment-agent
# View agent logs
kubectl logs -n stellaops -l app=environment-agent --tail=100
# Restart agent pods
kubectl rollout restart deployment/environment-agent -n stellaops
# Renew TLS certificate via cert-manager or manual secret update
kubectl create secret tls agent-tls \
  --cert=/path/to/cert.pem \
  --key=/path/to/key.pem \
  -n stellaops --dry-run=client -o yaml | kubectl apply -f -
# Check network policies
kubectl get networkpolicies -n stellaops
Verification
stella doctor run --check check.environment.connectivity
Related Checks
- check.environment.deployments - checks health of services deployed via agents
- check.environment.network.policy - verifies network policies that may block agent connectivity
- check.environment.secrets - agent credentials may need rotation