Runbook: Policy Engine - Policy Version Conflicts

Sprint: SPRINT_20260117_029_DOCS_runbook_coverage
Task: RUN-003 - Policy Engine Runbooks

Metadata

| Field | Value |
|---|---|
| Component | Policy Engine |
| Severity | Medium |
| On-call scope | Platform team |
| Last updated | 2026-01-17 |
| Doctor check | check.policy.version-consistency |

Symptoms

  • Policy evaluations return unexpected results
  • Alert PolicyVersionMismatch is firing
  • Errors containing "policy version conflict" or "bundle version mismatch"
  • Different nodes evaluate with different policy versions
  • Inconsistent gate decisions for the same artifact

Impact

| Impact type | Description |
|---|---|
| User-facing | Inconsistent policy decisions; unpredictable gate results |
| Data integrity | Decisions may be evaluated against a stale or unintended policy version |
| SLA impact | Gate accuracy SLO violated; trust in gate decisions reduced |

Diagnosis

Quick checks

  1. Check Doctor diagnostics:

    stella doctor --check check.policy.version-consistency
    
  2. Check policy version across nodes:

    stella policy version --all-nodes
    
  3. Check active policy version:

    stella policy active --show-version
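The quick checks come down to comparing each node's reported policy version with the active version. The following is a minimal Python sketch of that comparison. It assumes you have transcribed the output of `stella policy active --show-version` into a string and the output of `stella policy version --all-nodes` into a `node ID -> version` mapping. The helper `find_mismatched_nodes` and that data shape are hypothetical and are not part of the `stella` CLI.

```python
from typing import Dict, List


def find_mismatched_nodes(active_version: str, node_versions: Dict[str, str]) -> List[str]:
    """Return IDs of nodes whose policy version differs from the active version.

    Hypothetical helper: node_versions maps node ID -> version string, as
    transcribed by hand from `stella policy version --all-nodes`.
    """
    # Sorted so repeated runs produce stable, comparable output.
    return sorted(
        node for node, version in node_versions.items() if version != active_version
    )


if __name__ == "__main__":
    observed = {"node-a": "v42", "node-b": "v42", "node-c": "v41"}  # example data
    print(find_mismatched_nodes("v42", observed))  # ['node-c']
```

An empty result means every node agrees with the active version. Any listed node is a candidate for the per-node steps under "If a node is out of sync".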
    

Deep diagnosis

  1. Compare versions across instances:

    stella policy version diff --all-instances
    

    Problem if: nodes report different policy versions

  2. Check bundle distribution status:

    stella policy bundle status --all-nodes
    
  3. Check for failed deployments:

    stella policy deployments list --status failed --last 24h
    
  4. Check OPA bundle sync:

    stella policy opa bundle-status
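Steps 1 and 3 are most useful together. A node that lags behind and also has a recent failed deployment most likely never applied the new bundle. This sketch cross-references the two lists. It assumes deployment history has been exported into hypothetical `Deployment` records (node, version, status, finish time); that record shape is an assumption, not the output format of `stella policy deployments list`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass
class Deployment:
    """Hypothetical deployment record; not the real `stella` output format."""
    node: str
    version: str
    status: str  # e.g. "succeeded" or "failed"
    finished_at: datetime


def lagging_nodes_with_failures(
    lagging: Iterable[str],
    deployments: Iterable[Deployment],
    now: datetime,
    window: timedelta = timedelta(hours=24),
) -> List[str]:
    """Return lagging nodes that also had a failed deployment inside `window`."""
    cutoff = now - window
    failed_recently = {
        d.node for d in deployments
        if d.status == "failed" and d.finished_at >= cutoff
    }
    return sorted(set(lagging) & failed_recently)


if __name__ == "__main__":
    now = datetime(2026, 1, 17, 12, 0, tzinfo=timezone.utc)
    history = [
        Deployment("node-a", "v42", "succeeded", now - timedelta(hours=2)),
        Deployment("node-c", "v42", "failed", now - timedelta(hours=3)),
        Deployment("node-b", "v42", "failed", now - timedelta(days=3)),  # outside the 24 h window
    ]
    print(lagging_nodes_with_failures(["node-c", "node-d"], history, now))  # ['node-c']
```

Lagging nodes without a recent failure (node-d in the example) point toward a sync or subscription problem rather than a failed deployment.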
    

Resolution

Immediate mitigation

  1. Force all nodes to sync to the latest version:

    stella policy sync --force --all-nodes
    
  2. Pin a specific version and sync all nodes to it:

    stella policy pin --version <version>
    stella policy sync --all-nodes
    
  3. Restart the policy engines to force a reload:

    stella service restart policy-engine --all-nodes
    

Root cause fix

If bundle distribution failed:

  1. Check bundle storage:

    stella policy bundle storage-status
    
  2. Rebuild and redistribute the bundle:

    stella policy bundle build
    stella policy bundle distribute --all-nodes
    

If a node is out of sync:

  1. Check the node's status:

    stella policy status --node <node-id>
    
  2. Force a resync of the node:

    stella policy sync --node <node-id> --force
    
  3. Verify node is receiving updates:

    stella policy bundle check-subscription --node <node-id>
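Step 3 asks whether a node is still receiving updates. Healthy polling should record a successful bundle poll at least once per maximum polling delay, so a much longer silence suggests a broken subscription or unreachable bundle storage. The sketch below flags such nodes. It assumes you can obtain each node's last successful poll time. OPA's Status API reports per-bundle fields such as `last_successful_request`, but how Stella surfaces them is an assumption. `silent_nodes` and its `tolerance` parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def silent_nodes(
    last_successful_poll: Dict[str, datetime],
    now: datetime,
    max_delay: timedelta,
    tolerance: int = 3,
) -> List[str]:
    """Return nodes with no successful bundle poll for more than `tolerance` max delays.

    Hypothetical helper: healthy polling should succeed at least once per
    `max_delay`, so a longer gap suggests the node is not receiving updates.
    """
    threshold = max_delay * tolerance
    return sorted(
        node for node, polled_at in last_successful_poll.items()
        if now - polled_at > threshold
    )


if __name__ == "__main__":
    now = datetime(2026, 1, 17, 12, 0, tzinfo=timezone.utc)
    polls = {
        "node-a": now - timedelta(seconds=20),
        "node-b": now - timedelta(minutes=10),
    }
    # With a 30 s max delay and tolerance 3, the threshold is 90 s.
    print(silent_nodes(polls, now, max_delay=timedelta(seconds=30)))  # ['node-b']
```

A tolerance of a few polling periods avoids flagging a node that has missed only a single poll.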
    

If concurrent deployments caused the conflict:

  1. Check deployment history:

    stella policy deployments list --last 1h
    
  2. Resolve to a single version:

    stella policy resolve-conflict --to-version <version>
    
  3. Enable deployment locking:

    stella policy config set deployment.locking true
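Deployment locking serializes policy deployments so that two versions cannot roll out at the same time. The sketch below shows the idea inside a single Python process using `threading.Lock`. It assumes reject-if-busy semantics. Whether `deployment.locking` rejects, queues, or waits on a concurrent deployment is not stated in this runbook. A lock that spans several nodes would also need shared storage rather than an in-process lock. The `DeploymentLock` class is hypothetical.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class DeploymentInProgress(RuntimeError):
    """Raised when another deployment already holds the lock."""


class DeploymentLock:
    """In-process sketch: allow at most one policy deployment at a time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    @contextmanager
    def deploy(self, version: str) -> Iterator[str]:
        # Non-blocking acquire: a concurrent deployment fails fast instead of queueing.
        if not self._lock.acquire(blocking=False):
            raise DeploymentInProgress(f"cannot deploy {version}: another deployment is running")
        try:
            yield version
        finally:
            # Release even if the deployment body raised.
            self._lock.release()


if __name__ == "__main__":
    lock = DeploymentLock()
    with lock.deploy("v42"):
        try:
            with lock.deploy("v43"):
                pass
        except DeploymentInProgress as exc:
            print(exc)  # cannot deploy v43: another deployment is running
```

Failing fast makes the conflict visible to whoever started the second deployment, instead of letting two versions race to the nodes.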
    

If OPA bundle polling is the issue:

  1. Check OPA bundle configuration:

    stella policy opa config show | grep bundle
    
  2. Decrease the polling interval for faster convergence:

    stella policy opa config set bundle.polling.min_delay_seconds 10
    stella policy opa config set bundle.polling.max_delay_seconds 30
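OPA waits between `min_delay_seconds` and `max_delay_seconds` between bundle polls. The maximum delay therefore bounds how long a node can keep serving an old bundle after a new one is published, and the mean delay sets the polling load. The back-of-the-envelope sketch below makes three assumptions: each delay is drawn uniformly from that range, download and activation time are ignored, and failed polls and long polling are ignored. `polling_profile` is a hypothetical helper.

```python
from typing import NamedTuple


class PollingProfile(NamedTuple):
    worst_case_staleness_s: float  # longest a node can keep an old bundle
    mean_poll_interval_s: float
    requests_per_hour: float       # summed across all nodes


def polling_profile(min_delay_s: float, max_delay_s: float, nodes: int) -> PollingProfile:
    """Estimate convergence time and polling load for a bundle polling window.

    Assumes each delay is drawn uniformly from [min_delay_s, max_delay_s] and
    ignores download/activation time and failed polls.
    """
    if not 0 < min_delay_s <= max_delay_s:
        raise ValueError("expected 0 < min_delay_s <= max_delay_s")
    mean_interval = (min_delay_s + max_delay_s) / 2
    return PollingProfile(
        worst_case_staleness_s=max_delay_s,
        mean_poll_interval_s=mean_interval,
        # Long-run poll rate per node is 1 / mean interval.
        requests_per_hour=nodes * 3600 / mean_interval,
    )


if __name__ == "__main__":
    print(polling_profile(10, 30, nodes=5))
    print(polling_profile(60, 120, nodes=5))
```

With the values above (10 s and 30 s) and five nodes, a newly published bundle reaches every node within roughly 30 s. The cost is about 900 bundle requests per hour against bundle storage. A 60 to 120 s window cuts that to 200 requests per hour but allows up to two minutes of drift.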
    

Verification

# Verify all nodes on same version
stella policy version --all-nodes

# Test consistent evaluation
stella policy evaluate --test --all-nodes

# Verify bundle status
stella policy bundle status --all-nodes

# Check no version warnings
stella policy logs --filter "version" --level warning --last 30m
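A consistent-evaluation test passes only if every node returns the same decision for every test input. The sketch below performs that check. It assumes the per-node results have been collected into a hypothetical `test case -> {node -> decision}` mapping; this shape is not the documented output of `stella policy evaluate --test --all-nodes`.

```python
from typing import Dict, List


def divergent_cases(results: Dict[str, Dict[str, str]]) -> List[str]:
    """Return IDs of test cases where nodes disagreed on the decision."""
    return sorted(
        case for case, decisions in results.items()
        # More than one distinct decision means at least two nodes disagree.
        if len(set(decisions.values())) > 1
    )


if __name__ == "__main__":
    results = {
        "case-1": {"node-a": "allow", "node-b": "allow"},
        "case-2": {"node-a": "allow", "node-b": "deny"},
    }
    print(divergent_cases(results))  # ['case-2']
```

An empty result, together with a single version reported by `stella policy version --all-nodes`, suggests the nodes have converged.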

Prevention

  • Locking: Enable deployment locking to prevent concurrent updates
  • Monitoring: Alert on version drift between nodes
  • Sync: Configure short bundle polling intervals for fast convergence
  • Testing: Deploy policy changes to staging before production to catch version conflicts early
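Alerting on every instance of drift would fire during normal rollouts, because nodes briefly disagree while a new bundle propagates. A common approach is to alert only when drift persists beyond a grace period, similar in spirit to the `for:` duration on a Prometheus alerting rule. The sketch assumes periodic samples of each node's version. `drift_alert`, the sample shape, and the grace value are hypothetical. How the existing PolicyVersionMismatch alert is defined is not covered here.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Tuple

# (sample time, node ID -> policy version); hypothetical sample shape.
Sample = Tuple[datetime, Dict[str, str]]


def drift_alert(samples: List[Sample], grace: timedelta) -> bool:
    """Return True if version drift has lasted longer than `grace`.

    Samples must be in chronological order. Drift means at least two distinct
    versions across nodes; a converged sample resets the drift timer.
    """
    drift_started: Optional[datetime] = None
    for sampled_at, versions in samples:
        if len(set(versions.values())) > 1:
            if drift_started is None:
                drift_started = sampled_at
        else:
            drift_started = None
    if drift_started is None:
        return False
    return samples[-1][0] - drift_started > grace


if __name__ == "__main__":
    t0 = datetime(2026, 1, 17, 12, 0, tzinfo=timezone.utc)
    minute = timedelta(minutes=1)
    rollout = [
        (t0, {"node-a": "v41", "node-b": "v41"}),
        (t0 + minute, {"node-a": "v42", "node-b": "v41"}),  # brief drift during rollout
        (t0 + 2 * minute, {"node-a": "v42", "node-b": "v42"}),
    ]
    stuck = [
        (t0, {"node-a": "v41", "node-b": "v41"}),
        (t0 + minute, {"node-a": "v42", "node-b": "v41"}),
        (t0 + 10 * minute, {"node-a": "v42", "node-b": "v41"}),
    ]
    print(drift_alert(rollout, grace=5 * minute))  # False
    print(drift_alert(stuck, grace=5 * minute))    # True
```

A grace period somewhat longer than the maximum bundle polling delay is a reasonable starting point. It tolerates a normal rollout but still catches a node that never converges.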

Related documentation

  • Architecture: docs/modules/policy/versioning.md
  • Related runbooks: policy-opa-crash.md, policy-storage-unavailable.md
  • Deployment guide: docs/operations/policy-deployment.md