# Runbook: Scanner - Worker Not Processing Jobs

> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
> **Task:** RUN-002 - Scanner Runbooks

## Metadata

| Field | Value |
|-------|-------|
| **Component** | Scanner |
| **Severity** | Critical |
| **On-call scope** | Platform team |
| **Last updated** | 2026-01-17 |
| **Doctor check** | `check.scanner.worker-health` |

---

## Symptoms

- [ ] Scan jobs stuck in "pending" or "processing" state for >5 minutes
- [ ] Scanner worker process shows 0% CPU usage
- [ ] Alert `ScannerWorkerStuck` or `ScannerQueueBacklog` firing
- [ ] UI shows "Scan in progress" indefinitely
- [ ] Metric `scanner_jobs_pending` increasing over time

---

## Impact

| Impact Type | Description |
|-------------|-------------|
| **User-facing** | New scans cannot complete, blocking CI/CD pipelines and release gates |
| **Data integrity** | No data loss; pending jobs resume once the workers recover |
| **SLA impact** | Scan latency SLO is violated if not resolved within 15 minutes |

---

## Diagnosis

### Quick checks (< 2 minutes)

1. **Check Doctor diagnostics:**

   ```bash
   stella doctor --check check.scanner.worker-health
   ```

2. **Check scanner service status:**

   ```bash
   stella scanner status
   ```

   - Expected: "Scanner workers: 4 active, 0 idle"
   - Problem: "Scanner workers: 0 active" or "status: degraded"

3. **Check job queue depth:**

   ```bash
   stella scanner queue status
   ```

   - Expected: Queue depth < 50
   - Problem: Queue depth > 100 or growing rapidly

### Deep diagnosis

1. **Check worker process logs:**

   ```bash
   stella scanner logs --tail 100 --level error
   ```

   Look for: "timeout", "connection refused", "out of memory"

2. **Check Valkey connectivity (job queue):**

   ```bash
   stella doctor --check check.storage.valkey
   ```

3. **Check whether workers were OOM-killed:**

   ```bash
   stella scanner workers inspect
   ```

   Look for: "exit_code: 137" (OOM) or "exit_code: 143" (SIGTERM)

4. **Check resource utilization:**

   ```bash
   stella obs metrics --filter scanner --last 10m
   ```

   Look for: memory > 90%, sustained CPU > 95%

---

## Resolution

### Immediate mitigation

1. **Restart scanner workers:**

   ```bash
   stella scanner workers restart
   ```

   This terminates the current workers and spawns fresh ones.

2. **If the restart fails, force-restart the scanner service:**

   ```bash
   stella service restart scanner
   ```

3. **Verify workers are processing** (a polling sketch follows this list):

   ```bash
   stella scanner queue status --watch
   ```

   Queue depth should start decreasing.
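To confirm the restart actually cleared the backlog without eyeballing the `--watch` output, a small polling loop can be used. This is a minimal sketch, not part of the documented CLI workflow: it assumes `stella scanner queue status` prints a line containing a numeric queue depth (for example `Queue depth: 42`); adjust the parsing to the real output format.

```bash
#!/usr/bin/env bash
# Sketch: confirm the scan queue is draining after a worker restart.
# Assumption: `stella scanner queue status` prints a line such as
# "Queue depth: 42" - adjust the grep parsing to the actual output.
set -euo pipefail

get_depth() {
  stella scanner queue status | grep -i 'queue depth' | grep -oE '[0-9]+' | head -n1
}

prev=$(get_depth)
echo "Initial queue depth: ${prev}"

# Poll every 30 seconds for up to 5 minutes.
for i in $(seq 1 10); do
  sleep 30
  cur=$(get_depth)
  echo "[${i}/10] queue depth: ${cur}"
  if (( cur < prev )); then
    echo "Queue is draining - workers are processing jobs again."
    exit 0
  fi
  prev=${cur}
done

echo "Queue depth did not decrease within 5 minutes - escalate and continue with the root cause fix." >&2
exit 1
```

If the loop exits non-zero, do not keep restarting workers; move on to the root cause steps below.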
### Root cause fix

**If workers were OOM-killed:**

1. Increase the worker memory limit:

   ```bash
   stella scanner config set worker.memory_limit 4Gi
   stella scanner workers restart
   ```

2. Reduce concurrent scans per worker:

   ```bash
   stella scanner config set worker.concurrency 2
   stella scanner workers restart
   ```

**If the Valkey connection failed:**

1. Check Valkey health:

   ```bash
   stella doctor --check check.storage.valkey
   ```

2. Restart Valkey if needed (see `valkey-connection-failure.md`).

**If workers are deadlocked:**

1. Enable deadlock detection:

   ```bash
   stella scanner config set worker.deadlock_detection true
   stella scanner workers restart
   ```

### Verification

```bash
# Verify workers are healthy
stella doctor --check check.scanner.worker-health

# Submit a test scan
stella scan image --image alpine:latest --dry-run

# Watch the queue drain
stella scanner queue status --watch

# Verify no errors in recent logs
stella scanner logs --tail 20 --level error
```

---

## Prevention

- [ ] **Alert:** Ensure the `ScannerQueueBacklog` alert is configured with a threshold of < 100 jobs
- [ ] **Monitoring:** Add a Grafana panel for worker memory usage
- [ ] **Capacity:** Review worker count and memory limits during capacity planning
- [ ] **Deadlock:** Enable `worker.deadlock_detection` in production

---

## Related Resources

- **Architecture:** `docs/modules/scanner/architecture.md`
- **Related runbooks:** `scanner-oom.md`, `scanner-timeout.md`
- **Doctor check:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Scanner/Checks/WorkerHealthCheck.cs`
- **Dashboard:** Grafana > Stella Ops > Scanner Overview
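The Verification commands above can also be wrapped into a single pass/fail script, useful immediately after recovery or as a periodic smoke test. This is a minimal sketch under two assumptions: each `stella` command exits non-zero on failure, and the error-log command prints nothing when there are no recent errors; confirm both against the actual CLI before wiring this into automation. The blocking `queue status --watch` step is deliberately omitted.

```bash
#!/usr/bin/env bash
# Sketch: post-recovery smoke test combining the Verification steps above.
# Assumptions: each `stella` command exits non-zero on failure, and the
# error-log command prints nothing when there are no recent errors.
# The blocking `queue status --watch` step is omitted on purpose.
set -uo pipefail

fail=0

run() {
  echo "==> $*"
  if ! "$@"; then
    echo "FAILED: $*" >&2
    fail=1
  fi
}

run stella doctor --check check.scanner.worker-health
run stella scan image --image alpine:latest --dry-run

# Recent error logs should be empty once the workers have recovered.
errors=$(stella scanner logs --tail 20 --level error)
if [ -n "${errors}" ]; then
  echo "FAILED: recent error logs are not empty:" >&2
  echo "${errors}" >&2
  fail=1
fi

if (( fail )); then
  echo "Smoke test failed - scanner has not fully recovered." >&2
  exit 1
fi
echo "Smoke test passed - scanner workers are processing jobs normally."
```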