Doctor plugin checks: implement health check classes and documentation

Implement remediation-aware health checks across all Doctor plugin modules
(Agent, Attestor, Auth, BinaryAnalysis, Compliance, Crypto, Environment,
EvidenceLocker, Notify, Observability, Operations, Policy, Postgres, Release,
Scanner, Storage, Vex) and their backing library counterparts (AI, Attestation,
Authority, Core, Cryptography, Database, Docker, Integration, Notify,
Observability, Security, ServiceGraph, Sources, Verification).

Each check now emits structured remediation metadata (severity, category,
runbook links, and fix suggestions) consumed by the Doctor dashboard
remediation panel.

Also adds:
- docs/doctor/articles/ knowledge base for check explanations
- Advisory AI search seed and allowlist updates for doctor content
- Sprint plan for doctor checks documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
master
2026-03-27 12:28:00 +02:00
parent fbd24e71de
commit c58a236d70
326 changed files with 18500 additions and 463 deletions

@@ -0,0 +1,89 @@
---
checkId: check.integration.secrets.manager
plugin: stellaops.doctor.integration
severity: fail
tags: [integration, secrets, vault, security, keyvault]
---
# Secrets Manager Connectivity
## What It Checks
Iterates over all secrets managers defined under `Secrets:Managers` (or the legacy `Secrets:Vault:Url` / `Vault:Url` single-manager key). For each manager, the check sends an HTTP GET to a type-specific health endpoint: Vault uses `/v1/sys/health?standbyok=true&sealedcode=200&uninitcode=200`, Azure Key Vault uses `/healthstatus`, and all other types use `/health`. The request carries the appropriate auth header (`X-Vault-Token` for Vault, `Authorization: Bearer` for others), and the check records reachability, authentication success, and latency. For Vault, it also parses the response JSON for the `sealed`, `initialized`, and `version` fields; because `sealedcode=200` and `uninitcode=200` make a sealed or uninitialized instance answer with HTTP 200, the sealed state is read from the body rather than from the status code. The check **fails** if any manager is unreachable or returns 401/403, or if any Vault instance is sealed, and **passes** if all managers are healthy and unsealed.
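The pass/fail rules above can be sketched as a small shell function. This is an illustrative sketch only, not the plugin's implementation: `classify_vault_health` is a hypothetical helper, the `000` status stands for "no response" as reported by `curl -w '%{http_code}'`, the sample bodies are made up, and the `grep` match is a crude stand-in for real JSON parsing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: mirrors the documented pass/fail rules for one Vault manager.
# $1 = HTTP status code (empty or 000 means no response), $2 = response body (JSON).
classify_vault_health() {
  local status="$1" body="$2"
  case "$status" in
    ""|000)  echo "fail: unreachable"; return 1 ;;
    401|403) echo "fail: authentication rejected"; return 1 ;;
  esac
  # Other status codes are not covered by the documented rules and fall through.
  # Crude test for "sealed": true; a real implementation would parse the JSON.
  if printf '%s' "$body" | grep -Eq '"sealed"[[:space:]]*:[[:space:]]*true'; then
    echo "fail: sealed"
    return 1
  fi
  echo "pass"
}

# Canned responses, so no network access is needed:
classify_vault_health 200 '{"initialized":true,"sealed":false}' || true  # prints: pass
classify_vault_health 200 '{"initialized":true,"sealed":true}'  || true  # prints: fail: sealed
classify_vault_health 403 ''                                    || true  # prints: fail: authentication rejected
```

To feed it live data, the status and body could be captured with something like `curl -s -o body.json -w '%{http_code}' -H "X-Vault-Token: $VAULT_TOKEN" "$VAULT_ADDR/v1/sys/health?standbyok=true&sealedcode=200&uninitcode=200"`, assuming `VAULT_ADDR` and `VAULT_TOKEN` are set in the environment.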
## Why It Matters
Secrets managers store registry credentials, signing keys, API tokens, and encryption keys. If a secrets manager is unreachable, Stella Ops cannot retrieve credentials for deployments, cannot sign attestations, and cannot decrypt sensitive configuration. A sealed Vault is equally critical: all secret reads fail until it is manually unsealed. This is a hard blocker for any release operation.
## Common Causes
- Secrets manager service is down or restarting
- Network connectivity issue between Stella Ops and the secrets manager
- Authentication token has expired or been revoked
- TLS certificate issue (expired, untrusted CA)
- Vault was restarted and needs manual unseal
- Vault auto-seal triggered due to HSM connectivity loss
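When narrowing down which of these causes applies, the exit code of a manual `curl` probe is often a useful first hint. The sketch below maps a few well-known curl exit codes to the network and TLS causes listed above. `explain_curl_exit` is a hypothetical helper, and the mapping is a heuristic for manual triage, not something the Doctor check itself performs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate a curl exit code into a likely cause.
explain_curl_exit() {
  case "$1" in
    0)  echo "request completed; inspect the HTTP status and body" ;;
    6)  echo "DNS: could not resolve host" ;;
    7)  echo "network: could not connect (service down or port blocked)" ;;
    28) echo "network: operation timed out" ;;
    35) echo "TLS: handshake failed" ;;
    60) echo "TLS: certificate verification failed (e.g. untrusted CA)" ;;
    *)  echo "unrecognized curl exit code: $1" ;;
  esac
}

# Usage against a live endpoint (assumes VAULT_ADDR is set; not executed here):
#   curl -sS -o /dev/null --max-time 5 "$VAULT_ADDR/v1/sys/health"; explain_curl_exit $?
explain_curl_exit 7
```

An exit code of 0 only means the HTTP exchange completed; an expired token or a sealed Vault still has to be diagnosed from the status code and response body.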
## How to Fix
### Docker Compose
```bash
# Check secrets manager configuration (case-insensitive, since keys may be written as Secrets__... or SECRETS__...)
grep -i 'SECRETS__\|VAULT__' .env
# Test Vault health
docker compose exec gateway curl -sv \
  http://vault:8200/v1/sys/health
# Unseal Vault if sealed
docker compose exec vault vault operator unseal <key1>
docker compose exec vault vault operator unseal <key2>
docker compose exec vault vault operator unseal <key3>
# Refresh Vault token
docker compose exec vault vault token create -policy=stellaops
echo 'Secrets__Managers__0__Token=<new-token>' >> .env
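# Recreate the container so the new value is picked up (a plain restart keeps the old environment)
docker compose up -d --force-recreate platform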
```
### Bare Metal / systemd
```bash
# Check Vault status
vault status
# Unseal if needed (prompts for a key share; repeat until the unseal threshold is met)
vault operator unseal
# Renew the Vault token
vault token renew
# Check Azure Key Vault health
curl -v https://myvault.vault.azure.net/healthstatus
# Update configuration
sudo nano /etc/stellaops/appsettings.Production.json
sudo systemctl restart stellaops-platform
```
### Kubernetes / Helm
```yaml
# values.yaml
secrets:
  managers:
    - name: vault-prod
      url: http://vault.vault.svc.cluster.local:8200
      type: vault
      existingSecret: stellaops-vault-token
```
```bash
# Update Vault token secret
kubectl create secret generic stellaops-vault-token \
  --from-literal=token=<new-token> \
  --dry-run=client -o yaml | kubectl apply -f -
helm upgrade stellaops ./chart -f values.yaml
```
## Verification
```bash
stella doctor run --check check.integration.secrets.manager
```
## Related Checks
- `check.integration.oci.credentials` -- registry credentials that may be sourced from the secrets manager