Runbook: Policy Engine - Policy Version Conflicts
Sprint: SPRINT_20260117_029_DOCS_runbook_coverage
Task: RUN-003 - Policy Engine Runbooks
Metadata
| Field | Value |
|---|---|
| Component | Policy Engine |
| Severity | Medium |
| On-call scope | Platform team |
| Last updated | 2026-01-17 |
| Doctor check | check.policy.version-consistency |
Symptoms
- Policy evaluation returning unexpected results
- Alert `PolicyVersionMismatch` firing
- Error: "policy version conflict" or "bundle version mismatch"
- Different nodes evaluating with different policy versions
- Inconsistent gate decisions for the same artifact
Impact
| Impact Type | Description |
|---|---|
| User-facing | Inconsistent policy decisions; unpredictable gate results |
| Data integrity | Decisions may not match expected policy behavior |
| SLA impact | Gate accuracy SLO violated; trust in decisions reduced |
Diagnosis
Quick checks
- Check Doctor diagnostics: `stella doctor --check check.policy.version-consistency`
- Check policy version across nodes: `stella policy version --all-nodes`
- Check active policy version: `stella policy active --show-version`
Deep diagnosis
- Compare versions across instances: `stella policy version diff --all-instances`
  Problem if: Different versions on different nodes
- Check bundle distribution status: `stella policy bundle status --all-nodes`
- Check for failed deployments: `stella policy deployments list --status failed --last 24h`
- Check OPA bundle sync: `stella policy opa bundle-status`
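When many nodes are involved, grouping the per-node listing by version makes the outliers easier to spot. The sketch below assumes a hypothetical two-column, whitespace-separated listing (`<node-id> <version>`, one node per line). This runbook does not specify the real output format of `stella policy version --all-nodes`, so the field positions may need adjusting.

```shell
# Group node IDs by the policy version they report.
# ASSUMPTION: stdin lines look like "<node-id> <version>"; this layout is
# hypothetical and is not confirmed for the stella CLI.
group_by_version() {
  awk 'NF >= 2 {
         if ($2 in nodes) nodes[$2] = nodes[$2] "," $1
         else nodes[$2] = $1
       }
       END { for (v in nodes) print v ": " nodes[v] }' | sort
}
```

If the assumed format holds, `stella policy version --all-nodes | group_by_version` prints one line per version; more than one line indicates drift, and the shortest node list usually points at the stragglers.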
Resolution
Immediate mitigation
- Force sync to latest version: `stella policy sync --force --all-nodes`
- Pin specific version: `stella policy pin --version <version>`, then `stella policy sync --all-nodes`
- Restart policy engines to force reload: `stella service restart policy-engine --all-nodes`
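A forced sync or restart may take some time to reach every node, so it can help to poll a check instead of re-running it by hand. The helper below is a generic retry loop, not part of the stella CLI; the name `wait_until` and its arguments are illustrative.

```shell
# Retry a check command until it succeeds or the attempts run out.
# Usage: wait_until <max_attempts> <sleep_seconds> <command> [args...]
wait_until() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # No point sleeping after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `wait_until 10 15 stella doctor --check check.policy.version-consistency` would retry for a few minutes, under the assumption (not confirmed here) that the Doctor check exits non-zero while versions are still inconsistent.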
Root cause fix
If bundle distribution failed:
- Check bundle storage: `stella policy bundle storage-status`
- Rebuild and redistribute bundle: `stella policy bundle build`, then `stella policy bundle distribute --all-nodes`
If node out of sync:
- Check specific node status: `stella policy status --node <node-id>`
- Force node resync: `stella policy sync --node <node-id> --force`
- Verify node is receiving updates: `stella policy bundle check-subscription --node <node-id>`
If concurrent deployments caused conflict:
- Check deployment history: `stella policy deployments list --last 1h`
- Resolve to single version: `stella policy resolve-conflict --to-version <version>`
- Enable deployment locking: `stella policy config set deployment.locking true`
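Until server-side locking is confirmed active, a deployment script can also serialize its own runs. The sketch below is an illustrative client-side guard, not how the Policy Engine implements `deployment.locking`; the name `with_deploy_lock` is hypothetical, and the lock only protects callers that share the same filesystem (for example, jobs on a single CI runner).

```shell
# Run a command while holding a directory-based lock.
# mkdir is atomic, so only one caller can create the lock directory.
# Usage: with_deploy_lock <lock_dir> <command> [args...]
with_deploy_lock() {
  local lock_dir=$1 status
  shift
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "another deployment holds $lock_dir; refusing to start" >&2
    return 75  # EX_TEMPFAIL: distinguishes "locked" from a failed command
  fi
  "$@"
  status=$?
  rmdir "$lock_dir"  # release the lock whether or not the command succeeded
  return "$status"
}
```

A second caller that finds the lock held exits immediately with status 75 rather than queueing, which keeps two policy versions from being pushed back to back.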
If OPA bundle polling issue:
- Check OPA bundle configuration: `stella policy opa config show | grep bundle`
- Decrease polling interval for faster sync: `stella policy opa config set bundle.polling.min_delay_seconds 10`, then `stella policy opa config set bundle.polling.max_delay_seconds 30`
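Because the two polling values are set by separate commands, it is easy to leave the minimum above the maximum. A small sanity check before applying them might look like the sketch below; the function name is illustrative, and the rule that the minimum must be a positive integer no larger than the maximum is an assumption about how these settings are interpreted.

```shell
# Validate a polling window before applying it.
# Usage: valid_polling_window <min_seconds> <max_seconds>
valid_polling_window() {
  local min=$1 max=$2 v
  for v in "$min" "$max"; do
    case "$v" in
      '' | *[!0-9]*) return 1 ;;  # reject empty or non-numeric values
    esac
  done
  # ASSUMPTION: the minimum must be at least 1 and must not exceed the maximum.
  [ "$min" -ge 1 ] && [ "$min" -le "$max" ]
}
```

With this in place, `valid_polling_window 10 30` succeeds, while swapped values such as `valid_polling_window 30 10` fail before any configuration is changed.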
Verification
# Verify all nodes on same version
stella policy version --all-nodes
# Test consistent evaluation
stella policy evaluate --test --all-nodes
# Verify bundle status
stella policy bundle status --all-nodes
# Check no version warnings
stella policy logs --filter "version" --level warning --last 30m
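For scripted verification, the first check can be reduced to an exit status. This is a minimal sketch that again assumes a hypothetical `<node-id> <version>` listing on standard input; the real output format of `stella policy version --all-nodes` is not documented in this runbook.

```shell
# Succeed only when every node reports the same policy version.
# ASSUMPTION: stdin lines look like "<node-id> <version>" (hypothetical format).
all_nodes_same_version() {
  awk 'NF >= 2 { seen[$2] = 1; rows++ }
       END {
         n = 0
         for (v in seen) n++
         # Empty input fails too: no data is not proof of consistency.
         if (rows > 0 && n == 1) exit 0
         exit 1
       }'
}
```

If the assumed format holds, `stella policy version --all-nodes | all_nodes_same_version && echo "versions consistent"` gives a single pass/fail signal that can gate the incident's closure.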
Prevention
- Locking: Enable deployment locking to prevent concurrent updates
- Monitoring: Alert on version drift between nodes
- Sync: Configure aggressive bundle polling for fast convergence
- Testing: Deploy to staging before production to catch issues
Related Resources
- Architecture: `docs/modules/policy/versioning.md`
- Related runbooks: `policy-opa-crash.md`, `policy-storage-unavailable.md`
- Deployment guide: `docs/operations/policy-deployment.md`