Doctor plugin checks: implement health check classes and documentation

Implement remediation-aware health checks across all Doctor plugin modules (Agent, Attestor, Auth, BinaryAnalysis, Compliance, Crypto, Environment, EvidenceLocker, Notify, Observability, Operations, Policy, Postgres, Release, Scanner, Storage, Vex) and their backing library counterparts (AI, Attestation, Authority, Core, Cryptography, Database, Docker, Integration, Notify, Observability, Security, ServiceGraph, Sources, Verification). Each check now emits structured remediation metadata (severity, category, runbook links, and fix suggestions) consumed by the Doctor dashboard remediation panel.

Also adds:
- docs/doctor/articles/ knowledge base for check explanations
- Advisory AI search seed and allowlist updates for doctor content
- Sprint plan for doctor checks documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 12:28:00 +02:00
parent fbd24e71de
commit c58a236d70
326 changed files with 18500 additions and 463 deletions
--- a/docs/doctor/articles/environment/environment-drift.md
+++ b/docs/doctor/articles/environment/environment-drift.md
@@ -0,0 +1,86 @@
+---
+checkId: check.environment.drift
+plugin: stellaops.doctor.environment
+severity: warn
+tags: [environment, drift, configuration, consistency]
+---
+# Environment Drift Detection
+
+## What It Checks
+Queries the Release Orchestrator drift report API (`/api/v1/environments/drift`) and compares configuration snapshots across environments. The check requires at least two environments to perform a comparison. Each drift item carries a severity classification, which determines the check result:
+- **Fail** if any drift is classified as `critical` (e.g., security-relevant configuration differences between staging and production)
+- **Warn** if drifts exist but none are critical
+- **Pass** if no configuration drift is detected between environments
+
+Evidence includes the specific configuration keys that drifted and which environments are affected.
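+
+The result rules above can be sketched as a small shell function. This is a minimal illustration, not the plugin's implementation: `classify_drift` is a hypothetical helper, and the non-critical severity names (`low`, `medium`) are placeholders, since `critical` is the only severity named in this article.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: maps the severities of all drift items to one check result.
+# Arguments: zero or more severity strings, one per drift item.
+classify_drift() {
+  local result="pass"        # no drift items at all -> pass
+  local severity
+  for severity in "$@"; do
+    if [ "$severity" = "critical" ]; then
+      echo "fail"            # a single critical drift item fails the check
+      return 0
+    fi
+    result="warn"            # drift exists, but nothing critical so far
+  done
+  echo "$result"
+}
+
+classify_drift                   # prints: pass
+classify_drift low medium        # prints: warn
+classify_drift low critical      # prints: fail
+```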
+
+## Why It Matters
+Configuration drift between environments undermines the core promise of promotion-based releases: that what you test in staging is what runs in production. Drift can cause subtle behavioral differences that only manifest under production load, making bugs nearly impossible to reproduce. Critical drift in security-related configuration (TLS settings, authentication, network policies) can create compliance violations and security exposures.
+
+## Common Causes
+- Manual configuration changes applied directly to one environment (bypassing the release pipeline)
+- Failed deployment that left partial configuration in one environment
+- Configuration sync job that did not propagate to all environments
+- Environment restored from an outdated backup
+- Intentional per-environment overrides that were not tracked as accepted exceptions
+
+## How to Fix
+
+### Docker Compose
+```bash
+# View the current drift report
+stella env drift show
+
+# Compare specific configuration between environments
+diff <(docker exec stellaops-staging cat /app/appsettings.json) \
+     <(docker exec stellaops-prod cat /app/appsettings.json)
+
+# Reconcile by redeploying from the canonical source
+docker compose -f docker-compose.stella-ops.yml up -d --force-recreate <service>
+
+# If drift is intentional, mark it as accepted
+stella env drift accept <config-key>
+```
+
+### Bare Metal / systemd
+```bash
+# View drift report
+stella env drift show
+
+# Compare config files between environments
+diff /etc/stellaops/staging/appsettings.json /etc/stellaops/prod/appsettings.json
+
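+# Reconcile by copying from the source of truth (staging in this example);
+# back up the current file first, since the copy overwrites every setting in it
+sudo cp -a /etc/stellaops/prod/appsettings.json /etc/stellaops/prod/appsettings.json.bak
+sudo cp /etc/stellaops/staging/appsettings.json /etc/stellaops/prod/appsettings.json
+sudo systemctl restart stellaops-<service>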
+
+# Or accept drift as intentional
+stella env drift accept <config-key>
+```
+
+### Kubernetes / Helm
+```bash
+# View drift between environments
+stella env drift show
+
+# Compare Helm values between environments
+diff <(helm get values stellaops -n stellaops-staging -o yaml) \
+     <(helm get values stellaops -n stellaops-prod -o yaml)
+
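+# Compare the local ConfigMap manifest against the live cluster state
+kubectl diff -f configmap.yaml -n stellaops-prod
+
+# Reconcile by redeploying with consistent values
+helm upgrade stellaops stellaops/stellaops -n stellaops-prod \
+  -f values-prod.yaml
+
+# Or accept drift as intentional
+stella env drift accept <config-key>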
+```
+
+## Verification
+```bash
+stella doctor run --check check.environment.drift
+```
+
+## Related Checks
+- `check.environment.deployments` - drift can cause service failures after redeployment
+- `check.environment.secrets` - secret configuration differences between environments
+- `check.environment.network.policy` - network policy drift is a security concern