Doctor plugin checks: implement health check classes and documentation

Implement remediation-aware health checks across all Doctor plugin modules
(Agent, Attestor, Auth, BinaryAnalysis, Compliance, Crypto, Environment,
EvidenceLocker, Notify, Observability, Operations, Policy, Postgres, Release,
Scanner, Storage, Vex) and their backing library counterparts (AI, Attestation,
Authority, Core, Cryptography, Database, Docker, Integration, Notify,
Observability, Security, ServiceGraph, Sources, Verification).

Each check now emits structured remediation metadata (severity, category,
runbook links, and fix suggestions) consumed by the Doctor dashboard
remediation panel.

Also adds:
- docs/doctor/articles/ knowledge base for check explanations
- Advisory AI search seed and allowlist updates for doctor content
- Sprint plan for doctor checks documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 12:28:00 +02:00
parent fbd24e71de
commit c58a236d70
326 changed files with 18500 additions and 463 deletions

@@ -0,0 +1,100 @@
---
checkId: check.auth.oidc
plugin: stellaops.doctor.auth
severity: warn
tags: [auth, oidc, connectivity]
---
# OIDC Provider Connectivity
## What It Checks
Tests connectivity to an external OIDC provider by performing real HTTP requests. The check reads the issuer URL from the first configured key among the following, in priority order: `Authentication:Oidc:Issuer`, `Auth:Oidc:Authority`, `Oidc:Issuer`. If none is configured, the check passes immediately because the platform is running in local authority mode.
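The priority order can be sketched in shell. This is an illustration, not the plugin's code: it assumes the three keys are supplied as environment variables with `:` replaced by `__` (the form `AUTHENTICATION__OIDC__ISSUER` takes in the Docker Compose example in this article), and the function name `resolve_oidc_issuer` is hypothetical.

```bash
# Hypothetical helper: print the first non-empty issuer URL, following the
# documented priority order. The environment variable names are assumed to
# mirror the configuration keys with ':' replaced by '__'.
resolve_oidc_issuer() {
  local candidate
  for candidate in \
    "${AUTHENTICATION__OIDC__ISSUER:-}" \
    "${AUTH__OIDC__AUTHORITY:-}" \
    "${OIDC__ISSUER:-}"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  # Nothing configured: local authority mode, where the check passes immediately.
  return 1
}
```

A caller could then write `issuer="$(resolve_oidc_issuer)" || echo "no external OIDC provider configured"` to tell the two modes apart.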
When an external provider is configured, the check performs a multi-step validation:
1. **Fetch discovery document** -- HTTP GET to `{issuerUrl}/.well-known/openid-configuration` with a 10-second timeout. If unreachable: **Fail**, with the connection error classified as one of `ssl_error`, `dns_failure`, `refused`, `timeout`, or `connection_failed`.
2. **Validate discovery fields** -- Parses the discovery JSON and verifies presence of `authorization_endpoint`, `token_endpoint`, and `jwks_uri`. If any are missing: **Warn** listing the missing fields.
3. **Fetch JWKS** -- HTTP GET to the `jwks_uri` from the discovery document. Counts the number of keys in the `keys` array. If zero keys: **Warn** (token validation may fail).
4. **All healthy** -- provider reachable, discovery valid, JWKS has keys. Result: **Pass**.
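The decision flow in the steps above can be approximated as follows. This is a sketch under stated assumptions rather than the plugin's implementation: both function names are hypothetical, and the mapping from `curl` exit codes to the connection error types is an approximation chosen for illustration (6 = could not resolve host, 7 = failed to connect, 28 = operation timed out, 35 and 60 = TLS handshake or certificate failures).

```bash
# Hypothetical helper: map a curl exit code to one of the check's
# connection error types. The mapping is an approximation for illustration.
classify_curl_exit() {
  case "$1" in
    6)     echo "dns_failure" ;;        # could not resolve host
    7)     echo "refused" ;;            # failed to connect to host
    28)    echo "timeout" ;;            # operation timed out
    35|60) echo "ssl_error" ;;          # TLS handshake or certificate problem
    *)     echo "connection_failed" ;;  # any other transport failure
  esac
}

# Hypothetical helper: derive the check result from what the steps observed.
#   $1  "true" if the discovery document was fetched
#   $2  space-separated list of missing discovery fields (may be empty)
#   $3  number of keys found in the JWKS "keys" array
oidc_check_result() {
  if [ "$1" != "true" ]; then
    echo "Fail"   # step 1: provider unreachable
  elif [ -n "$2" ]; then
    echo "Warn"   # step 2: required discovery fields missing
  elif [ "$3" -eq 0 ]; then
    echo "Warn"   # step 3: JWKS has no keys
  else
    echo "Pass"   # step 4: all healthy
  fi
}
```

For example, `curl -s --max-time 10 -o /dev/null "https://<oidc-issuer>/.well-known/openid-configuration" || classify_curl_exit $?` prints an error type only when the discovery request fails at the transport level.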
Evidence collected: `issuer_url`, `discovery_reachable`, `discovery_response_ms`, `authorization_endpoint_present`, `token_endpoint_present`, `jwks_uri_present`, `jwks_key_count`, `jwks_fetch_ms`, `http_status_code`, `error_message`, `connection_error_type`.
## Why It Matters
When Stella Ops is configured to delegate authentication to an external OIDC provider (Azure AD, Keycloak, Okta, etc.), all user logins and token validations depend on that provider being reachable and correctly configured. A connectivity failure means users cannot log in, and services cannot validate tokens, leading to a platform-wide authentication outage.
## Common Causes
- OIDC provider is down or undergoing maintenance
- Network connectivity issue (proxy misconfiguration, firewall rule change)
- DNS resolution failure for the provider hostname
- Firewall blocking outbound HTTPS to the provider
- Discovery document missing required fields (misconfigured provider)
- Token endpoint misconfigured after provider upgrade
- JWKS endpoint returning empty key set (key rotation in progress)
- OIDC provider rate limiting or returning errors
## How to Fix
### Docker Compose
```bash
# Test OIDC provider connectivity from the authority container
docker compose -f devops/compose/docker-compose.stella-ops.yml exec authority \
curl -s https://<oidc-issuer>/.well-known/openid-configuration | jq .
# Check DNS resolution
docker compose -f devops/compose/docker-compose.stella-ops.yml exec authority \
nslookup <oidc-host>
# Set OIDC configuration via environment
# AUTHENTICATION__OIDC__ISSUER=https://login.microsoftonline.com/<tenant>/v2.0
```
### Bare Metal / systemd
```bash
# Test provider connectivity
curl -s https://<oidc-issuer>/.well-known/openid-configuration | jq .
# Check DNS resolution
nslookup <oidc-host>
# Validate OIDC configuration
stella auth oidc validate
# Check JWKS endpoint
curl -s "$(curl -s https://<oidc-issuer>/.well-known/openid-configuration | jq -r .jwks_uri)" | jq .
# Run the Doctor DNS check
stella doctor run --check check.network.dns
```
### Kubernetes / Helm
```bash
# Test from authority pod
kubectl exec deploy/stellaops-authority -n stellaops -- \
curl -s https://<oidc-issuer>/.well-known/openid-configuration | jq .
# Check NetworkPolicy allows egress to OIDC provider
kubectl get networkpolicy -n stellaops -o yaml | grep -A10 egress
# Set OIDC configuration in Helm values
# authority:
# oidc:
# issuer: "https://login.microsoftonline.com/<tenant>/v2.0"
helm upgrade stellaops stellaops/stellaops -f values.yaml
```
## Verification
```bash
stella doctor run --check check.auth.oidc
```
## Related Checks
- `check.auth.config` -- overall auth configuration health
- `check.auth.signing-key` -- local signing key health (used when not delegating to external OIDC)
- `check.auth.token-service` -- token endpoint availability