checkId: check.environment.drift
plugin: stellaops.doctor.environment
severity: warn
tags: environment, drift, configuration, consistency

Environment Drift Detection

What It Checks

Queries the Release Orchestrator drift report API (/api/v1/environments/drift) and compares configuration snapshots across environments. The check needs at least two environments to perform a comparison. Each drift item carries a severity classification, which determines the result:

  • Fail if any drift is classified as critical (e.g., security-relevant configuration differences between staging and production)
  • Warn if drifts exist but none are critical
  • Pass if no configuration drift is detected between environments

Evidence includes the specific configuration keys that drifted and which environments are affected.
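
The verdict rule above can be sketched as a small shell function. This is an illustration only, not the plugin's implementation: the function name classify_drift, the severity labels other than critical, and the skip result for fewer than two environments are all assumptions, since the article does not say how the check reports that case.

```shell
# Hypothetical helper: reduce a list of drift severities to a check verdict.
# $1    = number of environments being compared
# stdin = one severity label per line (only "critical" is named in this article;
#         any other label, e.g. "minor", is treated as non-critical)
classify_drift() {
  env_count=$1
  if [ "$env_count" -lt 2 ]; then
    # Assumption: with fewer than two environments there is nothing to compare.
    echo "skip"
    return 0
  fi
  verdict="pass"                      # no drift items seen yet
  while IFS= read -r severity; do
    [ -z "$severity" ] && continue    # ignore blank lines
    if [ "$severity" = "critical" ]; then
      verdict="fail"                  # one critical item is enough to fail
      break
    fi
    verdict="warn"                    # drift exists, but not critical so far
  done
  echo "$verdict"
}

# Example: two non-critical drift items across three environments.
printf 'minor\nmajor\n' | classify_drift 3   # prints "warn"
```

Feeding the function the severities from a drift report, one per line, yields the same three outcomes as the list above.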

Why It Matters

Configuration drift between environments undermines the core promise of promotion-based releases: that what you test in staging is what runs in production. Drift can cause subtle behavioral differences that only manifest under production load, which makes the resulting bugs hard to reproduce in staging. Critical drift in security-related configuration (TLS settings, authentication, network policies) can create compliance violations and security exposures.

Common Causes

  • Manual configuration changes applied directly to one environment (bypassing the release pipeline)
  • Failed deployment that left partial configuration in one environment
  • Configuration sync job that did not propagate to all environments
  • Environment restored from an outdated backup
  • Intentional per-environment overrides that were not tracked as accepted exceptions

How to Fix

Docker Compose

# View the current drift report
stella env drift show

# Compare specific configuration between environments
diff <(docker exec stellaops-staging cat /app/appsettings.json) \
     <(docker exec stellaops-prod cat /app/appsettings.json)

# Reconcile by redeploying from the canonical source
docker compose -f docker-compose.stella-ops.yml up -d --force-recreate <service>

# If drift is intentional, mark it as accepted
stella env drift accept <config-key>

Bare Metal / systemd

# View drift report
stella env drift show

# Compare config files between environments
diff /etc/stellaops/staging/appsettings.json /etc/stellaops/prod/appsettings.json

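# Reconcile by copying from the source of truth. Back up the target first:
# the copy overwrites any intentional production-only settings.
sudo cp /etc/stellaops/prod/appsettings.json /etc/stellaops/prod/appsettings.json.bak
sudo cp /etc/stellaops/staging/appsettings.json /etc/stellaops/prod/appsettings.json
sudo systemctl restart stellaops-<service>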

# Or accept drift as intentional
stella env drift accept <config-key>
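
A plain diff of two JSON files also flags differences in key order and whitespace, which are not configuration drift. The following is a minimal sketch, assuming jq is installed, that normalizes both files before comparing them. The helper name json_diff is hypothetical and not part of the stella CLI.

```shell
# Hypothetical helper: diff two JSON files, ignoring key order and formatting.
# Exit status: 0 = equivalent, 1 = different, 2 = a file could not be parsed
# or a temporary file could not be created.
json_diff() {
  left=$(mktemp) || return 2
  right=$(mktemp) || { rm -f "$left"; return 2; }
  # jq -S re-serializes the document with object keys sorted.
  if ! jq -S . "$1" > "$left" || ! jq -S . "$2" > "$right"; then
    rm -f "$left" "$right"
    return 2
  fi
  status=0
  diff "$left" "$right" || status=$?
  rm -f "$left" "$right"
  return "$status"
}
```

For example, json_diff /etc/stellaops/staging/appsettings.json /etc/stellaops/prod/appsettings.json prints only the keys and values that actually differ. Array order is still significant, because jq -S sorts object keys only.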

Kubernetes / Helm

# View drift between environments
stella env drift show

# Compare Helm values between environments
diff <(helm get values stellaops -n stellaops-staging -o yaml) \
     <(helm get values stellaops -n stellaops-prod -o yaml)

# Reconcile by redeploying with consistent values
helm upgrade stellaops stellaops/stellaops -n stellaops-prod \
  -f values-prod.yaml

# Preview how a local ConfigMap manifest differs from the live object in production
kubectl diff -f configmap.yaml -n stellaops-prod

Verification

stella doctor run --check check.environment.drift

Related Checks

  • check.environment.deployments - drift can cause service failures after redeployment
  • check.environment.secrets - secret configuration differences between environments
  • check.environment.network.policy - network policy drift is a security concern