git.stella-ops.org/docs/doctor/articles/agent/certificate-expiry.md
master c58a236d70 Doctor plugin checks: implement health check classes and documentation
2026-03-27 12:28:00 +02:00

checkId: check.agent.certificate.expiry
plugin: stellaops.doctor.agent
severity: fail
tags: agent, certificate, security, quick

Agent Certificate Expiry

What It Checks

Inspects the CertificateExpiresAt field on every active agent (neither revoked nor inactive) and classifies each into one of four buckets:

  1. Expired -- CertificateExpiresAt is in the past. Result: Fail.
  2. Critical -- certificate expires within 1 day (24 hours). Result: Fail.
  3. Warning -- certificate expires in more than 1 day but within 7 days. Result: Warn.
  4. Healthy -- certificate has more than 7 days remaining. Result: Pass.

The check short-circuits to the most severe bucket found, so a single expired or critical agent fails the whole check. Evidence includes per-agent names with time-since-expiry or time-until-expiry, plus counts of TotalActive, Expired, Critical, and Warning agents.

Agents whose CertificateExpiresAt is null or default are silently skipped (certificate info not available). If no active agents exist, the check is skipped entirely.
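
The bucket rules above can be sketched as a small shell script. This is an illustration only, not the Doctor plugin's implementation: the function names bucket_rank and overall_result are hypothetical, the input is assumed to be seconds until expiry per agent (negative once expired), and the exact boundary handling (for example, treating zero seconds remaining as expired) is an assumption. Agents without certificate info are assumed to have been filtered out before the call.

```shell
# Hypothetical sketch of the four-bucket classification (not the plugin's code).
DAY_S=86400

# Map seconds-until-expiry to a severity rank:
# 3 = Expired, 2 = Critical, 1 = Warning, 0 = Healthy.
bucket_rank() {
  if   [ "$1" -le 0 ]; then echo 3                 # already expired (assumed: 0 counts as expired)
  elif [ "$1" -le "$DAY_S" ]; then echo 2          # within 24 hours
  elif [ "$1" -le $((7 * DAY_S)) ]; then echo 1    # within 7 days
  else echo 0                                      # more than 7 days remaining
  fi
}

# Reduce per-agent values to one result, taking the most severe bucket.
overall_result() {
  if [ "$#" -eq 0 ]; then
    echo Skip                                      # no active agents: check is skipped
    return 0
  fi
  worst=0
  for remaining in "$@"; do
    rank=$(bucket_rank "$remaining")
    if [ "$rank" -gt "$worst" ]; then
      worst=$rank
    fi
  done
  case $worst in
    2|3) echo Fail ;;
    1)   echo Warn ;;
    *)   echo Pass ;;
  esac
}
```

For example, calling overall_result 864000 3600 models one agent with ten days remaining and one with an hour remaining; the second agent falls in the Critical bucket, so the overall result is Fail.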

Why It Matters

Agent mTLS certificates authenticate the agent to the orchestrator. An expired certificate causes the agent to fail heartbeats, reject task assignments, and drop out of the fleet. In production this means deployments and scans silently stop being dispatched to that agent, potentially leaving environments unserviced.

Common Causes

  • Certificate auto-renewal is disabled on the agent
  • Agent was offline when renewal was due (missed the renewal window)
  • Certificate authority is unreachable from the agent host
  • Agent bootstrap was incomplete (certificate provisioned but auto-renewal not configured)
  • Certificate renewal threshold not yet reached (typical for warning-level results)
  • Certificate authority rate limiting prevented renewal (typical for critical-level results)

How to Fix

Docker Compose

# Check certificate expiry for agent containers
docker compose -f devops/compose/docker-compose.stella-ops.yml exec agent \
  stella agent health --show-cert

# Force certificate renewal
docker compose -f devops/compose/docker-compose.stella-ops.yml exec agent \
  stella agent renew-cert --force

# Verify auto-renewal configuration
docker compose -f devops/compose/docker-compose.stella-ops.yml exec agent \
  stella agent config show | grep auto_renew

Bare Metal / systemd

# Force certificate renewal on an affected agent
stella agent renew-cert --agent-id <agent-id> --force

# If agent is unreachable, re-bootstrap
stella agent bootstrap --name <agent-name> --env <environment>

# Verify auto-renewal is enabled
stella agent config --agent-id <agent-id> | grep auto_renew

# Check agent logs for renewal failures
stella agent logs --agent-id <agent-id> --level warn
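
If the stella CLI is unavailable on the host, the same thresholds can be checked directly against the certificate file with openssl x509 -checkend, which exits non-zero when the certificate expires within the given number of seconds. A minimal sketch, assuming a PEM-encoded certificate; the function name cert_expiry_bucket is hypothetical, and the certificate's location depends on how the agent was installed.

```shell
# Hypothetical helper: classify a PEM certificate file into the same
# buckets the check uses (Expired, Critical, Warning, Healthy).
cert_expiry_bucket() {
  if [ ! -r "$1" ]; then
    echo "cannot read certificate: $1" >&2
    return 2
  fi
  # -checkend N exits 0 if the cert is still valid N seconds from now.
  if ! openssl x509 -in "$1" -noout -checkend 0 >/dev/null; then
    echo Expired
  elif ! openssl x509 -in "$1" -noout -checkend 86400 >/dev/null; then
    echo Critical
  elif ! openssl x509 -in "$1" -noout -checkend 604800 >/dev/null; then
    echo Warning
  else
    echo Healthy
  fi
}
```

Invoke it as cert_expiry_bucket <path-to-agent-cert.pem>. To see the exact expiry timestamp, run openssl x509 -in <path-to-agent-cert.pem> -noout -enddate.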

Kubernetes / Helm

# Check cert expiry across agent pods
kubectl exec -it deploy/stellaops-agent -n stellaops -- \
  stella agent health --show-cert

# Force renewal via pod exec
kubectl exec -it deploy/stellaops-agent -n stellaops -- \
  stella agent renew-cert --force

# If using cert-manager, check Certificate resource
kubectl get certificate -n stellaops
kubectl describe certificate stellaops-agent-tls -n stellaops

Verification

stella doctor run --check check.agent.certificate.expiry

Related Checks

  • check.agent.certificate.validity -- verifies certificate chain of trust (not just expiry)
  • check.agent.heartbeat.freshness -- expired certs cause heartbeat failures
  • check.agent.stale -- agents with expired certs often show as stale