Doctor plugin checks: implement health check classes and documentation

Implement remediation-aware health checks across all Doctor plugin modules
(Agent, Attestor, Auth, BinaryAnalysis, Compliance, Crypto, Environment,
EvidenceLocker, Notify, Observability, Operations, Policy, Postgres, Release,
Scanner, Storage, Vex) and their backing library counterparts (AI, Attestation,
Authority, Core, Cryptography, Database, Docker, Integration, Notify,
Observability, Security, ServiceGraph, Sources, Verification).

Each check now emits structured remediation metadata (severity, category,
runbook links, and fix suggestions) consumed by the Doctor dashboard
remediation panel.

Also adds:
- docs/doctor/articles/ knowledge base for check explanations
- Advisory AI search seed and allowlist updates for doctor content
- Sprint plan for doctor checks documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
master
2026-03-27 12:28:00 +02:00
parent fbd24e71de
commit c58a236d70
326 changed files with 18500 additions and 463 deletions

@@ -0,0 +1,84 @@
---
checkId: check.agent.certificate.validity
plugin: stellaops.doctor.agent
severity: fail
tags: [agent, certificate, security]
---
# Agent Certificate Validity
## What It Checks
Validates the full certificate chain of trust for agent mTLS certificates. The check is designed to verify:
1. Certificate is signed by a trusted CA
2. Certificate chain is complete (no missing intermediates)
3. No revoked certificates in the chain (CRL/OCSP check)
4. Certificate subject matches the agent's registered identity
**Current status:** implementation pending -- the check always returns Pass with a placeholder message. The framework and metadata are wired; the chain-validation logic is not yet connected.
Evidence collected: none yet (pending implementation).
The check requires `IAgentStore` to be registered in DI; otherwise it will not run.
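While the chain-validation logic is pending, the identity condition (item 4) can be approximated by hand. The sketch below is illustrative only. It assumes the agent's registered identity appears as the subject CN, which this article does not confirm (the real check may compare SANs or another field). The function names `extract_cn` and `agent_identity_matches` are hypothetical helpers, not part of the `stella` CLI, and escaped commas inside RDN values are not handled.

```bash
# Sketch only: compare a certificate subject CN with the expected agent name.
# Assumption: the agent identity is carried in the subject CN.

# Print the CN from an RFC 2253 subject line such as
#   subject=CN=agent-01,O=StellaOps
extract_cn() {
  # Strip the "subject=" prefix printed by `openssl x509 -subject`.
  _subject=${1#subject=}
  # Walk the comma-separated RDNs until a CN is found.
  while [ -n "$_subject" ]; do
    _rdn=${_subject%%,*}
    case "$_rdn" in
      CN=*) printf '%s\n' "${_rdn#CN=}"; return 0 ;;
    esac
    case "$_subject" in
      *,*) _subject=${_subject#*,} ;;
      *)   _subject= ;;
    esac
  done
  return 1
}

# Succeed only if the subject CN equals the expected agent name.
agent_identity_matches() {
  _cn=$(extract_cn "$1") || return 1
  [ "$_cn" = "$2" ]
}

# Usage (hypothetical agent name):
#   subject=$(openssl x509 -in /etc/stellaops/agent/tls.crt -noout -subject -nameopt RFC2253)
#   agent_identity_matches "$subject" "<agent-name>" && echo "identity OK"
```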
## Why It Matters
A non-expired certificate (checked by `check.agent.certificate.expiry`) is necessary but not sufficient. An agent could present an unexpired certificate that was signed by an untrusted CA, has a broken chain, or has been revoked. If such a certificate were accepted, an impersonating agent could receive task dispatches or exfiltrate deployment secrets.
## Common Causes
- CA certificate rotated but agent still presents cert signed by old CA
- Intermediate certificate missing from agent's cert bundle
- Certificate revoked via CRL but agent not yet re-provisioned
- Agent identity mismatch after hostname change or migration
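
When `openssl verify` fails, its error number usually points at one of the causes above. The following sketch maps the standard OpenSSL verification error numbers to those causes. The helper name `classify_verify_error` and the wording of its messages are illustrative, not part of any StellaOps tooling. An identity mismatch is not detected by `openssl verify` and has to be checked separately.

```bash
# Sketch only: map an `openssl verify` error line to a likely cause.
# Input looks like: "error 20 at 0 depth lookup: unable to get local issuer certificate"
classify_verify_error() {
  case "$1" in
    *"error 2 at"*|*"error 20 at"*|*"error 21 at"*)
      echo "issuer not found: CA rotated or intermediate missing from bundle" ;;
    *"error 23 at"*)
      echo "certificate revoked: re-provision the agent" ;;
    *"error 10 at"*)
      echo "certificate expired: see check.agent.certificate.expiry" ;;
    *"error 18 at"*|*"error 19 at"*)
      echo "self-signed certificate not in the trusted CA bundle" ;;
    *)
      echo "unclassified: inspect the full openssl output" ;;
  esac
}

# Usage (paths as in the How to Fix commands):
#   openssl verify -CAfile /etc/stellaops/ca/ca.crt /etc/stellaops/agent/tls.crt 2>&1 |
#     grep 'depth lookup' |
#     while IFS= read -r line; do classify_verify_error "$line"; done
```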
## How to Fix
### Docker Compose
```bash
# Inspect the agent certificate (openssl x509 prints only the first certificate in the file)
docker compose -f devops/compose/docker-compose.stella-ops.yml exec agent \
  openssl x509 -in /etc/stellaops/agent/tls.crt -text -noout
# Verify chain against CA bundle
docker compose -f devops/compose/docker-compose.stella-ops.yml exec agent \
  openssl verify -CAfile /etc/stellaops/ca/ca.crt /etc/stellaops/agent/tls.crt
```
### Bare Metal / systemd
```bash
# Inspect agent certificate
openssl x509 -in /etc/stellaops/agent/tls.crt -text -noout
# Verify certificate chain
openssl verify -CAfile /etc/stellaops/ca/ca.crt -untrusted /etc/stellaops/ca/intermediate.crt \
  /etc/stellaops/agent/tls.crt
# Re-bootstrap if chain is broken
stella agent bootstrap --name <agent-name> --env <environment>
```
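
A missing intermediate (see Common Causes) is easy to spot if the agent certificate file is expected to carry the full chain. As a rough sketch, assuming `tls.crt` is a PEM bundle with the leaf first and intermediates after it (an assumption; the deployment may keep intermediates in a separate file, as the `-untrusted` example above suggests), counting PEM blocks shows whether any intermediates are present. `count_pem_certs` and `bundle_has_intermediates` are hypothetical helper names.

```bash
# Sketch only: count the certificates in a PEM file.
# Assumption: intermediates, if any, are appended to the same file as the leaf.
count_pem_certs() {
  # grep -c prints the number of matching lines (0 if none).
  grep -c -e '-----BEGIN CERTIFICATE-----' "$1"
}

# Succeed if the file holds more than one certificate (leaf plus at least one intermediate).
bundle_has_intermediates() {
  [ "$(count_pem_certs "$1")" -gt 1 ]
}

# Usage:
#   count_pem_certs /etc/stellaops/agent/tls.crt
#   bundle_has_intermediates /etc/stellaops/agent/tls.crt || echo "leaf only: no intermediates in bundle"
```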
### Kubernetes / Helm
```bash
# Check certificate in agent pod
kubectl exec -it deploy/stellaops-agent -n stellaops -- \
  openssl x509 -in /etc/stellaops/agent/tls.crt -text -noout
# If using cert-manager, check CertificateRequest status
kubectl get certificaterequest -n stellaops
kubectl describe certificaterequest <name> -n stellaops
```
## Verification
```bash
stella doctor run --check check.agent.certificate.validity
```
## Related Checks
- `check.agent.certificate.expiry` -- checks expiry dates (complementary to chain validation)
- `check.agent.heartbeat.freshness` -- invalid certs prevent heartbeat communication