checkId: check.environment.connectivity
plugin: stellaops.doctor.environment
severity: warn
tags: environment, connectivity, agent, network

Environment Connectivity

What It Checks

Retrieves the list of environments from the Release Orchestrator (/api/v1/environments), then probes each environment agent's /health endpoint. For each agent the check measures:

  • Reachability -- whether the health endpoint returns a success status code
  • Latency -- warns if the response takes more than 500 ms
  • TLS certificate validity -- warns if the agent's TLS certificate expires within 30 days
  • Authentication -- detects 401/403 responses indicating credential issues

If any agent is unreachable, the check fails. High latency or an expiring certificate produces a warning instead.
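
The per-agent measurements above can be approximated by hand with curl. The following is a minimal sketch, assuming the 500 ms threshold stated in this article. The function names and result strings are hypothetical, are not part of the Stella Ops CLI, and simplify whatever the check does internally.

```shell
# Hypothetical helpers; names and output strings are illustrative only.

# Map an HTTP status code returned by /health to a coarse result.
classify_status() {
  case "$1" in
    2??)     echo "reachable" ;;
    401|403) echo "auth-problem" ;;
    000)     echo "unreachable" ;;  # curl reports 000 when no HTTP response arrived
    *)       echo "unhealthy" ;;
  esac
}

# Compare a latency in whole milliseconds against the 500 ms threshold.
classify_latency() {
  if [ "$1" -gt 500 ]; then echo "warn"; else echo "ok"; fi
}

# Probe one agent base URL and print "<status-class> <latency-class> <ms>".
probe_agent() {
  # -w prints the status code and the total time in seconds, e.g. "200 0.123456"
  probe_out=$(curl -s -o /dev/null --max-time 10 \
    -w '%{http_code} %{time_total}' "$1/health")
  probe_code=${probe_out%% *}
  probe_secs=${probe_out#* }
  probe_ms=$(awk -v s="$probe_secs" 'BEGIN { printf "%d", s * 1000 }')
  echo "$(classify_status "$probe_code") $(classify_latency "$probe_ms") $probe_ms"
}
```

Calling `probe_agent https://<agent-host>:<agent-port>` from the control plane host gives a rough view of what the check would see for that agent. It does not inspect the TLS certificate.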

Why It Matters

Environment agents are the control surface through which Stella Ops manages deployments, collects telemetry, and enforces policy. An unreachable agent means the platform cannot deploy to, monitor, or roll back services in that environment. TLS certificate expiry causes hard connectivity failures with no graceful degradation. High latency slows deployment pipelines and can cause timeouts in approval workflows.

Common Causes

  • Environment agent service is stopped or crashed
  • Firewall rule change blocking the agent port
  • Network partition between Stella Ops control plane and target environment
  • TLS certificate not renewed before expiry
  • Agent authentication credentials rotated without updating Stella Ops configuration
  • DNS resolution failure for the agent hostname
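
When probing an agent manually, curl's exit status often points at one of the causes listed above. The helper below is a hypothetical sketch that maps a few well-known curl exit codes to those causes. Treat the mapping as a heuristic for where to look first, not as a diagnosis.

```shell
# Hypothetical helper: translate a curl exit status into the likely cause.
explain_curl_exit() {
  case "$1" in
    0)     echo "connected: check the HTTP status for 401/403 (credentials)" ;;
    6)     echo "could not resolve host: DNS resolution failure" ;;
    7)     echo "connection failed: agent stopped or port blocked" ;;
    28)    echo "timed out: firewall drop or network partition" ;;
    35|60) echo "TLS failure: handshake failed or certificate invalid (for example expired)" ;;
    *)     echo "unclassified curl exit status $1" ;;
  esac
}

# Example, run from the control plane host:
#   curl -s -o /dev/null --max-time 10 https://<agent-host>:<agent-port>/health
#   explain_curl_exit $?
```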

How to Fix

Docker Compose

# Check if the environment agent container is running
docker ps --filter "name=environment-agent"

# View agent logs for errors
docker logs stellaops-environment-agent --tail 100

# Restart the agent
docker compose -f docker-compose.stella-ops.yml restart environment-agent

# If TLS cert is expiring, replace the certificate files
# mounted into the agent container and restart
cp /path/to/new/cert.pem devops/compose/certs/agent.pem
cp /path/to/new/key.pem devops/compose/certs/agent-key.pem
docker compose -f docker-compose.stella-ops.yml restart environment-agent

Bare Metal / systemd

# Check agent service status
sudo systemctl status stellaops-environment-agent

# View logs
sudo journalctl -u stellaops-environment-agent --since "1 hour ago"

# Restart agent
sudo systemctl restart stellaops-environment-agent

# Renew TLS certificate
sudo cp /path/to/new/cert.pem /etc/stellaops/certs/agent.pem
sudo cp /path/to/new/key.pem /etc/stellaops/certs/agent-key.pem
sudo systemctl restart stellaops-environment-agent

# Test network connectivity from control plane
curl -v https://<agent-host>:<agent-port>/health
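
After copying a renewed certificate into place, it may be worth confirming that it clears the 30-day window this check warns on. The following is a minimal sketch using `openssl x509 -checkend`; the function names are hypothetical and the 30-day default simply mirrors the threshold described in this article.

```shell
# Hypothetical helpers; the 30-day default mirrors the warning window above.

# Convert whole days to seconds for openssl's -checkend option.
days_to_seconds() {
  echo $(( $1 * 86400 ))
}

# Succeed if the PEM certificate in $1 is still valid $2 days from now
# (default 30). openssl x509 -checkend exits non-zero when the certificate
# expires within the given number of seconds.
cert_valid_for_days() {
  openssl x509 -in "$1" -noout \
    -checkend "$(days_to_seconds "${2:-30}")" >/dev/null
}

# Example (path taken from the commands above):
#   cert_valid_for_days /etc/stellaops/certs/agent.pem || echo "renew soon"
```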

Kubernetes / Helm

# Check agent pod status
kubectl get pods -n stellaops -l app=environment-agent

# View agent logs
kubectl logs -n stellaops -l app=environment-agent --tail=100

# Restart agent pods
kubectl rollout restart deployment/environment-agent -n stellaops

# Renew TLS certificate: if cert-manager does not manage it, update the secret manually
kubectl create secret tls agent-tls \
  --cert=/path/to/cert.pem \
  --key=/path/to/key.pem \
  -n stellaops --dry-run=client -o yaml | kubectl apply -f -

# Check network policies
kubectl get networkpolicies -n stellaops

Verification

stella doctor run --check check.environment.connectivity
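
An agent can take a few moments to come back after a restart, so a single run may still report a failure. The following is a minimal retry sketch. The `retry` helper is hypothetical, and it assumes that `stella doctor run` exits non-zero when the check fails, which this article does not confirm.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts, and succeed as soon as one attempt exits 0.
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_n=1
  while :; do
    "$@" && return 0
    [ "$retry_n" -ge "$retry_max" ] && return 1
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
}

# Example: re-run the check up to 6 times, 10 seconds apart.
# Assumes a failing check produces a non-zero exit status.
#   retry 6 10 stella doctor run --check check.environment.connectivity
```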

Related Checks

  • check.environment.deployments - checks health of services deployed via agents
  • check.environment.network.policy - verifies network policies that may block agent connectivity
  • check.environment.secrets - agent credentials may need rotation