# Runbook: Policy Engine - Evaluation Latency High

> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
> **Task:** RUN-003 - Policy Engine Runbooks

## Metadata

| Field | Value |
|-------|-------|
| **Component** | Policy Engine |
| **Severity** | High |
| **On-call scope** | Platform team |
| **Last updated** | 2026-01-17 |
| **Doctor check** | `check.policy.evaluation-latency` |

---

## Symptoms

- [ ] Policy evaluation takes >500ms (warning) or >2s (critical)
- [ ] Gate decisions timing out in CI/CD pipelines
- [ ] Alert `PolicyEvaluationSlow` firing
- [ ] Metric `policy_evaluation_duration_seconds` P95 > 1s
- [ ] Users report "policy check taking too long"

---

## Impact

| Impact Type | Description |
|-------------|-------------|
| **User-facing** | Slow release gate checks, CI/CD pipeline delays |
| **Data integrity** | No data loss; decisions are still correct |
| **SLA impact** | Gate latency SLO violated (target: P95 < 500ms) |

---

## Diagnosis

### Quick checks

1. **Check Doctor diagnostics:**
   ```bash
   stella doctor --check check.policy.evaluation-latency
   ```

2. **Check policy engine status:**
   ```bash
   stella policy status
   ```

3. **Check recent evaluation times:**
   ```bash
   stella policy stats --last 10m
   ```
   Look for: P95 latency, cache hit rate

### Deep diagnosis

1. **Profile a slow evaluation:**
   ```bash
   stella policy evaluate --image --profile
   ```
   Look for: Which phase is slowest (parse, compile, execute)

2. **Check OPA compilation cache:**
   ```bash
   stella policy cache stats
   ```
   Problem if: Cache hit rate < 90%

3. **Check policy complexity:**
   ```bash
   stella policy analyze --complexity
   ```
   Problem if: Cyclomatic complexity > 50 or rule count > 200

4. **Check external data fetches:**
   ```bash
   stella policy logs --filter "external fetch" --level debug
   ```
   Problem if: Many external fetches or slow responses

---

## Resolution

### Immediate mitigation

1. **Clear and warm the compilation cache:**
   ```bash
   stella policy cache clear
   stella policy cache warm
   ```

2. **Increase OPA worker count:**
   ```bash
   stella policy config set opa.workers 4
   stella policy reload
   ```

3. **Enable evaluation result caching:**
   ```bash
   stella policy config set cache.evaluation_ttl 60s
   stella policy reload
   ```

### Root cause fix

**If policy is too complex:**

1. Analyze and simplify policy:
   ```bash
   stella policy analyze --suggest-optimizations
   ```

2. Split large policies into modules:
   ```bash
   stella policy refactor --auto-split
   ```

**If external data fetches are slow:**

1. Increase external data cache TTL:
   ```bash
   stella policy config set external_data.cache_ttl 5m
   ```

2. Pre-fetch external data:
   ```bash
   stella policy external-data prefetch
   ```

**If Rego compilation is slow:**

1. Enable partial evaluation:
   ```bash
   stella policy config set opa.partial_eval true
   ```

2. Pre-compile policies:
   ```bash
   stella policy compile --all
   ```

### Verification

```bash
# Run evaluation and check latency
stella policy evaluate --image --timing

# Check P95 latency
stella policy stats --last 5m

# Verify cache is effective
stella policy cache stats
```

---

## Prevention

- [ ] **Review:** Review policy complexity before deployment
- [ ] **Monitoring:** Alert on P95 latency > 300ms
- [ ] **Caching:** Ensure evaluation cache is enabled
- [ ] **Pre-warming:** Add cache warming to deployment pipeline

---

## Related Resources

- **Architecture:** `docs/modules/policy/architecture.md`
- **Related runbooks:** `policy-opa-crash.md`, `policy-compilation-failed.md`
- **Dashboard:** Grafana > Stella Ops > Policy Engine
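
---

## Appendix: Threshold check sketch

The warning and critical thresholds listed under Symptoms (>500ms and >2s) can be checked against raw latency samples without relying on any particular CLI output format. The following is a minimal sketch, not part of the `stella` CLI: the function names `p95_ms` and `classify_latency_ms` are hypothetical, and it assumes the samples have already been exported as integer milliseconds, one per line. It uses the nearest-rank method for P95, which may differ slightly from how the dashboard interpolates percentiles.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and the input format are assumptions.
# Input: integer latency samples in milliseconds, one per line on stdin.

# Thresholds taken from the Symptoms section of this runbook (milliseconds).
WARN_MS=500
CRIT_MS=2000

# Print the nearest-rank P95 of the samples read from stdin.
# Exits non-zero if there are no samples.
p95_ms() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((NR * 95 + 99) / 100)   # integer ceil(0.95 * NR)
      print v[rank]
    }'
}

# Map a latency in milliseconds to ok / warning / critical.
classify_latency_ms() {
  local ms=$1
  if [ "$ms" -gt "$CRIT_MS" ]; then
    echo critical
  elif [ "$ms" -gt "$WARN_MS" ]; then
    echo warning
  else
    echo ok
  fi
}
```

Typical use would be `classify_latency_ms "$(p95_ms < samples.txt)"`, where `samples.txt` is a hypothetical file of exported durations. Keeping the thresholds in variables makes it easy to try the stricter 300ms alert suggested under Prevention.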