| checkId | plugin | severity | tags |
|---|---|---|---|
| check.environment.drift | stellaops.doctor.environment | warn | |

# Environment Drift Detection

## What It Checks
Queries the Release Orchestrator drift report API (`/api/v1/environments/drift`) and compares configuration snapshots across environments. The check needs at least two environments to perform a comparison. Each drift item carries a severity classification, and the check reports:

- Fail if any drift is classified as `critical` (e.g., security-relevant configuration differences between staging and production)
- Warn if drift exists but none of it is critical
- Pass if no configuration drift is detected between environments

Evidence includes the specific configuration keys that drifted and which environments are affected.
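
The Pass/Warn/Fail rule above fits in a few lines of logic. The shell function below is a minimal sketch, not the Doctor plugin's implementation. `drift_verdict` is a hypothetical name. It takes the number of environments followed by one severity string per drift item. Only `critical` comes from this article; other severity names such as `warning` are placeholders. The article does not say what the check reports with fewer than two environments, so the sketch prints `skip` in that case as an assumption.

```bash
# Hypothetical sketch of the verdict rule for check.environment.drift.
# Usage: drift_verdict <environment-count> [drift-severity ...]
drift_verdict() {
  local env_count severity
  env_count=$1
  shift

  # A comparison needs at least two environments. "skip" is a placeholder:
  # the real check's result for this case is not documented here.
  if [ "$env_count" -lt 2 ]; then
    echo "skip"
    return 0
  fi

  # No drift items at all: the environments match.
  if [ "$#" -eq 0 ]; then
    echo "pass"
    return 0
  fi

  # A single critical drift item is enough to fail the check.
  for severity in "$@"; do
    if [ "$severity" = "critical" ]; then
      echo "fail"
      return 0
    fi
  done

  # Drift exists, but none of it is critical.
  echo "warn"
}
```

For example, `drift_verdict 3 warning critical` prints `fail`. One critical item outweighs any number of non-critical ones.
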
## Why It Matters
Configuration drift between environments undermines the core promise of promotion-based releases: that what you test in staging is what runs in production. Drift can cause subtle behavioral differences that only manifest under production load, making bugs nearly impossible to reproduce. Critical drift in security-related configuration (TLS settings, authentication, network policies) can create compliance violations and security exposures.
## Common Causes
- Manual configuration changes applied directly to one environment (bypassing the release pipeline)
- Failed deployment that left partial configuration in one environment
- Configuration sync job that did not propagate to all environments
- Environment restored from an outdated backup
- Intentional per-environment overrides that were not tracked as accepted exceptions

## How to Fix
### Docker Compose

```bash
# View the current drift report
stella env drift show

# Compare specific configuration between environments
diff <(docker exec stellaops-staging cat /app/appsettings.json) \
     <(docker exec stellaops-prod cat /app/appsettings.json)

# Reconcile by redeploying from the canonical source
docker compose -f docker-compose.stella-ops.yml up -d --force-recreate <service>

# If drift is intentional, mark it as accepted
stella env drift accept <config-key>
```
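### Bare Metal / systemd

```bash
# View the drift report
stella env drift show

# Compare config files between environments
diff /etc/stellaops/staging/appsettings.json /etc/stellaops/prod/appsettings.json

# Reconcile by copying from the source of truth (staging in this example).
# This overwrites every production-specific value in the file, so review the
# diff first and keep a backup of the current production config.
sudo cp /etc/stellaops/prod/appsettings.json /etc/stellaops/prod/appsettings.json.bak
sudo cp /etc/stellaops/staging/appsettings.json /etc/stellaops/prod/appsettings.json
sudo systemctl restart stellaops-<service>

# Or accept the drift as intentional
stella env drift accept <config-key>
```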
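
A raw `diff` of two `appsettings.json` files also flags harmless differences in key order and whitespace. The sketch below cuts that noise by normalizing both files before diffing. It assumes `python3` is on the host and uses its standard-library `json.tool` module with `--sort-keys`. `config_diff` is a hypothetical helper, not part of the `stella` CLI. Files containing comments or trailing commas are rejected by the strict parser, and the function then returns 2.

```bash
# Hypothetical helper: diff two JSON config files after normalizing them
# (sorted keys, uniform indentation), so only real value changes remain.
# Exit status follows diff: 0 = identical, 1 = differences, 2 = error.
config_diff() {
  local left_norm right_norm status
  left_norm=$(mktemp) || return 2
  right_norm=$(mktemp) || { rm -f "$left_norm"; return 2; }

  # Normalize both inputs first; a parse error becomes status 2 instead of
  # silently producing an empty side of the diff.
  if python3 -m json.tool --sort-keys "$1" > "$left_norm" &&
     python3 -m json.tool --sort-keys "$2" > "$right_norm"; then
    diff -u "$left_norm" "$right_norm"
    status=$?
  else
    status=2
  fi

  rm -f "$left_norm" "$right_norm"
  return "$status"
}
```

With the bare-metal layout above, run `config_diff /etc/stellaops/staging/appsettings.json /etc/stellaops/prod/appsettings.json`. In Bash, process substitution also works as an argument, so the Docker Compose comparison becomes `config_diff <(docker exec stellaops-staging cat /app/appsettings.json) <(docker exec stellaops-prod cat /app/appsettings.json)`.
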
### Kubernetes / Helm

```bash
# View drift between environments
stella env drift show

# Compare Helm values between environments
diff <(helm get values stellaops -n stellaops-staging -o yaml) \
     <(helm get values stellaops -n stellaops-prod -o yaml)

# Reconcile by redeploying with consistent values
helm upgrade stellaops stellaops/stellaops -n stellaops-prod \
  -f values-prod.yaml

# Compare a local ConfigMap manifest against the live object in production
kubectl diff -f configmap.yaml -n stellaops-prod
```
## Verification

```bash
stella doctor run --check check.environment.drift
```
## Related Checks

- `check.environment.deployments` - drift can cause service failures after redeployment
- `check.environment.secrets` - secret configuration differences between environments
- `check.environment.network.policy` - network policy drift is a security concern