179 lines
3.7 KiB
Markdown
179 lines
3.7 KiB
Markdown
# Runbook: Policy Engine - Policy Storage Backend Down
|
|
|
|
> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
|
|
> **Task:** RUN-003 - Policy Engine Runbooks
|
|
|
|
## Metadata
|
|
|
|
| Field | Value |
|
|
|-------|-------|
|
|
| **Component** | Policy Engine |
|
|
| **Severity** | Critical |
|
|
| **On-call scope** | Platform team |
|
|
| **Last updated** | 2026-01-17 |
|
|
| **Doctor check** | `check.policy.storage-health` |
|
|
|
|
---
|
|
|
|
## Symptoms
|
|
|
|
- [ ] Policy operations failing with "storage unavailable"
|
|
- [ ] Alert `PolicyStorageUnavailable` firing
|
|
- [ ] Error: "failed to connect to policy store" or "database connection refused"
|
|
- [ ] Policy updates not persisting
|
|
- [ ] OPA unable to load bundles from storage
|
|
|
|
---
|
|
|
|
## Impact
|
|
|
|
| Impact Type | Description |
|
|
|-------------|-------------|
|
|
| **User-facing** | Policy updates fail; cached policies may still work |
|
|
| **Data integrity** | Policy changes not persisted; risk of inconsistent state |
|
|
| **SLA impact** | Policy management blocked; evaluations use cached data |
|
|
|
|
---
|
|
|
|
## Diagnosis
|
|
|
|
### Quick checks
|
|
|
|
1. **Check Doctor diagnostics:**
|
|
```bash
|
|
stella doctor --check check.policy.storage-health
|
|
```
|
|
|
|
2. **Check storage connectivity:**
|
|
```bash
|
|
stella policy storage status
|
|
```
|
|
|
|
3. **Check database health:**
|
|
```bash
|
|
stella db status --component policy
|
|
```
|
|
|
|
### Deep diagnosis
|
|
|
|
1. **Check PostgreSQL connectivity:**
|
|
```bash
|
|
stella db ping --database policy
|
|
```
|
|
|
|
2. **Check connection pool status:**
|
|
```bash
|
|
stella db pool-status --database policy
|
|
```
|
|
Problem if: Pool exhausted, connections timing out
|
|
|
|
3. **Check storage logs:**
|
|
```bash
|
|
stella policy logs --filter "storage" --level error --last 30m
|
|
```
|
|
|
|
4. **Check disk space (if local storage):**
|
|
```bash
|
|
stella policy storage disk-usage
|
|
```
|
|
|
|
---
|
|
|
|
## Resolution
|
|
|
|
### Immediate mitigation
|
|
|
|
1. **Enable read-only mode (use cached policies):**
|
|
```bash
|
|
stella policy config set storage.read_only true
|
|
stella policy reload
|
|
```
|
|
|
|
2. **Switch to backup storage:**
|
|
```bash
|
|
stella policy storage failover --to backup
|
|
```
|
|
|
|
3. **Restart policy service to reconnect:**
|
|
```bash
|
|
stella service restart policy-engine
|
|
```
|
|
|
|
### Root cause fix
|
|
|
|
**If database connection issue:**
|
|
|
|
1. Check database status:
|
|
```bash
|
|
stella db status --database policy --verbose
|
|
```
|
|
|
|
2. Restart database connection pool:
|
|
```bash
|
|
stella db pool-restart --database policy
|
|
```
|
|
|
|
3. Check and increase connection limits:
|
|
```bash
|
|
stella db config set policy.max_connections 50
|
|
```
|
|
|
|
**If disk space exhausted:**
|
|
|
|
1. Check storage usage:
|
|
```bash
|
|
stella policy storage disk-usage --verbose
|
|
```
|
|
|
|
2. Clean old policy versions:
|
|
```bash
|
|
stella policy versions cleanup --older-than 30d
|
|
```
|
|
|
|
3. Increase storage capacity
|
|
|
|
**If storage corruption:**
|
|
|
|
1. Verify storage integrity:
|
|
```bash
|
|
stella policy storage verify
|
|
```
|
|
|
|
2. Restore from backup:
|
|
```bash
|
|
stella policy storage restore --from-backup latest
|
|
```
|
|
|
|
### Verification
|
|
|
|
```bash
|
|
# Check storage status
|
|
stella policy storage status
|
|
|
|
# Test write operation
|
|
stella policy storage test-write
|
|
|
|
# Test policy update
|
|
stella policy update --test
|
|
|
|
# Verify no errors
|
|
stella policy logs --filter "storage" --level error --last 30m
|
|
```
|
|
|
|
---
|
|
|
|
## Prevention
|
|
|
|
- [ ] **Monitoring:** Alert on storage connection failures immediately
|
|
- [ ] **Redundancy:** Configure backup storage for failover
|
|
- [ ] **Cleanup:** Schedule regular cleanup of old policy versions
|
|
- [ ] **Capacity:** Monitor disk usage and plan for growth
|
|
|
|
---
|
|
|
|
## Related Resources
|
|
|
|
- **Architecture:** `docs/modules/policy/storage.md`
|
|
- **Related runbooks:** `policy-opa-crash.md`, `postgres-ops.md`
|
|
- **Database setup:** `docs/operations/database-configuration.md`
|