# Runbook: Policy Engine - Policy Version Conflicts

> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
> **Task:** RUN-003 - Policy Engine Runbooks

## Metadata

| Field | Value |
|-------|-------|
| **Component** | Policy Engine |
| **Severity** | Medium |
| **On-call scope** | Platform team |
| **Last updated** | 2026-01-17 |
| **Doctor check** | `check.policy.version-consistency` |

---

## Symptoms

- [ ] Policy evaluation returning unexpected results
- [ ] Alert `PolicyVersionMismatch` firing
- [ ] Error: "policy version conflict" or "bundle version mismatch"
- [ ] Different nodes evaluating with different policy versions
- [ ] Inconsistent gate decisions for the same artifact

---

## Impact

| Impact Type | Description |
|-------------|-------------|
| **User-facing** | Inconsistent policy decisions; unpredictable gate results |
| **Data integrity** | Decisions may not match expected policy behavior |
| **SLA impact** | Gate accuracy SLO violated; trust in decisions reduced |

---

## Diagnosis

### Quick checks

1. **Check Doctor diagnostics:**
   ```bash
   stella doctor --check check.policy.version-consistency
   ```

2. **Check policy version across nodes:**
   ```bash
   stella policy version --all-nodes
   ```

3. **Check active policy version:**
   ```bash
   stella policy active --show-version
   ```

### Deep diagnosis

1. **Compare versions across instances:**
   ```bash
   stella policy version diff --all-instances
   ```
   Problem if: nodes report different versions.

2. **Check bundle distribution status:**
   ```bash
   stella policy bundle status --all-nodes
   ```

3. **Check for failed deployments:**
   ```bash
   stella policy deployments list --status failed --last 24h
   ```

4. **Check OPA bundle sync:**
   ```bash
   stella policy opa bundle-status
   ```
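The version comparison in the steps above comes down to one question: do all nodes report the same version? A minimal sketch of that check follows. It assumes the per-node listing can be reduced to `<node-id> <version>` lines, which is an assumption and not documented `stella` output, and the `detect_version_drift` helper is hypothetical. For example, `printf 'node-a v12\nnode-b v11\n' | detect_version_drift` reports drift and returns a non-zero status.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads "<node-id> <version>" lines on stdin and reports
# whether more than one distinct version is present. The two-column input
# format is an assumption, not documented stella output.
detect_version_drift() {
  local versions count
  # Keep the version column, then sort and de-duplicate it.
  versions="$(awk 'NF >= 2 { print $2 }' | sort -u)"
  if [ -z "$versions" ]; then
    echo "no version data" >&2
    return 2
  fi
  # Count the distinct versions that remain.
  count="$(printf '%s\n' "$versions" | grep -c .)"
  if [ "$count" -gt 1 ]; then
    echo "DRIFT: $(printf '%s\n' "$versions" | paste -sd, -)"
    return 1
  fi
  echo "OK: $versions"
}
```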

---

## Resolution

### Immediate mitigation

1. **Force sync to latest version:**
   ```bash
   stella policy sync --force --all-nodes
   ```

2. **Pin specific version:**
   ```bash
   stella policy pin --version <version>
   stella policy sync --all-nodes
   ```

3. **Restart policy engines to force reload:**
   ```bash
   stella service restart policy-engine --all-nodes
   ```

### Root cause fix

**If bundle distribution failed:**

1. Check bundle storage:
   ```bash
   stella policy bundle storage-status
   ```

2. Rebuild and redistribute bundle:
   ```bash
   stella policy bundle build
   stella policy bundle distribute --all-nodes
   ```

**If node out of sync:**

1. Check specific node status:
   ```bash
   stella policy status --node <node-id>
   ```

2. Force node resync:
   ```bash
   stella policy sync --node <node-id> --force
   ```

3. Verify node is receiving updates:
   ```bash
   stella policy bundle check-subscription --node <node-id>
   ```

**If concurrent deployments caused conflict:**

1. Check deployment history:
   ```bash
   stella policy deployments list --last 1h
   ```

2. Resolve to single version:
   ```bash
   stella policy resolve-conflict --to-version <version>
   ```

3. Enable deployment locking:
   ```bash
   stella policy config set deployment.locking true
   ```

**If OPA bundle polling issue:**

1. Check OPA bundle configuration:
   ```bash
   stella policy opa config show | grep bundle
   ```

2. Decrease polling interval for faster sync:
   ```bash
   stella policy opa config set bundle.polling.min_delay_seconds 10
   stella policy opa config set bundle.polling.max_delay_seconds 30
   ```
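The two polling values bound a delay range, so a quick sanity check before applying them can catch a swapped or malformed pair. The sketch below is illustrative only: `validate_polling_bounds` is a hypothetical helper, not part of the `stella` CLI, and it checks nothing beyond "both values are non-negative integers and min does not exceed max".

```bash
#!/usr/bin/env bash
# Hypothetical helper: checks a min/max polling pair before it is applied.
# Returns 0 if valid, 1 if min exceeds max, 2 if a value is not an integer.
validate_polling_bounds() {
  local min="$1" max="$2" value
  for value in "$min" "$max"; do
    case "$value" in
      # Reject empty strings and anything containing a non-digit.
      ''|*[!0-9]*)
        echo "not a non-negative integer: '$value'" >&2
        return 2
        ;;
    esac
  done
  if [ "$min" -gt "$max" ]; then
    echo "min ($min) exceeds max ($max)" >&2
    return 1
  fi
  echo "ok: poll every ${min}-${max}s"
}
```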

### Verification

```bash
# Verify all nodes are on the same version
stella policy version --all-nodes

# Test consistent evaluation
stella policy evaluate --test --all-nodes

# Verify bundle status
stella policy bundle status --all-nodes

# Check for version warnings (expect none)
stella policy logs --filter "version" --level warning --last 30m
```
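Nodes may take a few polling cycles to converge after a sync, so a single verification pass can fail spuriously. One possible approach is to retry a check with a bounded number of attempts, as sketched below. `wait_for_convergence` is a hypothetical helper, and it assumes the check command exits non-zero while nodes still disagree, which this runbook does not confirm for any `stella` subcommand. A call might look like `wait_for_convergence 10 15 stella policy evaluate --test --all-nodes`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: re-runs a check command until it succeeds or the
# attempt budget is exhausted.
# Usage: wait_for_convergence <attempts> <delay-seconds> <command> [args...]
wait_for_convergence() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    # Assumption: the check exits 0 only when all nodes agree.
    if "$@"; then
      echo "converged after $i attempt(s)"
      return 0
    fi
    # Sleep between attempts, but not after the final one.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  echo "not converged after $attempts attempt(s)" >&2
  return 1
}
```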

---

## Prevention

- [ ] **Locking:** Enable deployment locking to prevent concurrent updates
- [ ] **Monitoring:** Alert on version drift between nodes
- [ ] **Sync:** Configure aggressive bundle polling for fast convergence
- [ ] **Testing:** Deploy to staging before production to catch issues

---

## Related Resources

- **Architecture:** `docs/modules/policy/versioning.md`
- **Related runbooks:** `policy-opa-crash.md`, `policy-storage-unavailable.md`
- **Deployment guide:** `docs/operations/policy-deployment.md`