# Runbook: Scanner - Out of Memory on Large Images

> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
> **Task:** RUN-002 - Scanner Runbooks

## Metadata

| Field | Value |
|-------|-------|
| **Component** | Scanner |
| **Severity** | High |
| **On-call scope** | Platform team |
| **Last updated** | 2026-01-17 |
| **Doctor check** | `check.scanner.memory-usage` |

---

## Symptoms

- [ ] Scanner worker exits with code 137 (OOM killed)
- [ ] Scans fail consistently for specific large images
- [ ] Error log contains "fatal error: runtime: out of memory"
- [ ] Alert `ScannerWorkerOOM` firing
- [ ] Metric `scanner_worker_restarts_total{reason="oom"}` increasing

---

## Impact

| Impact Type | Description |
|-------------|-------------|
| **User-facing** | Large images cannot be scanned; smaller images may still work |
| **Data integrity** | No data loss; failed scans can be retried |
| **SLA impact** | Specific images are blocked from the release pipeline |

---

## Diagnosis

### Quick checks

1. **Identify the failing image:**

   ```bash
   stella scanner jobs list --status failed --last 1h
   ```

2. **Check image size:**

   ```bash
   stella image inspect <image> --format json | jq '.size'
   ```

   Problem if: image size > 2GB or layer count > 100.

3. **Check the worker memory limit:**

   ```bash
   stella scanner config get worker.memory_limit
   ```

### Deep diagnosis

1. **Profile memory usage during a scan:**

   ```bash
   stella scan image --image <image> --profile-memory
   ```

2. **Check SBOM generation memory:**

   ```bash
   stella scanner logs --filter "sbom" --level debug --last 30m
   ```

   Look for: "memory allocation failed", "heap exhausted".

3. **Identify memory-heavy layers:**

   ```bash
   stella image layers <image> --sort-by size
   ```

---

## Resolution

### Immediate mitigation

1. **Increase the worker memory limit:**

   ```bash
   stella scanner config set worker.memory_limit 8Gi
   stella scanner workers restart
   ```
2. **Enable streaming mode for large images:**

   ```bash
   stella scanner config set sbom.streaming_threshold 1Gi
   stella scanner workers restart
   ```

3. **Retry the failed scan:**

   ```bash
   stella scan image --image <image> --retry
   ```

### Root cause fix

**For consistently large images:**

1. Configure a dedicated large-image worker pool:

   ```bash
   stella scanner workers add --pool large-images --memory 16Gi --count 2
   stella scanner config set routing.large_image_threshold 2Gi
   stella scanner config set routing.large_image_pool large-images
   ```

**For images with many small files (node_modules, etc.):**

1. Enable incremental SBOM mode:

   ```bash
   stella scanner config set sbom.incremental_mode true
   ```

**For base image reuse:**

1. Enable layer caching:

   ```bash
   stella scanner config set cache.layer_dedup true
   ```

### Verification

```bash
# Retry the previously failing scan
stella scan image --image <image>

# Monitor memory during the scan
stella scanner workers stats --watch

# Verify no OOM in recent logs
stella scanner logs --filter "out of memory" --last 1h
```

---

## Prevention

- [ ] **Capacity:** Set the memory limit based on the largest expected image (recommend 4Gi minimum)
- [ ] **Routing:** Configure a large-image pool for images > 2GB
- [ ] **Monitoring:** Alert when `scanner_worker_memory_usage_bytes` exceeds 80% of the limit
- [ ] **Documentation:** Document image size limits in the user guide

---

## Related Resources

- **Architecture:** `docs/modules/scanner/architecture.md`
- **Related runbooks:** `scanner-worker-stuck.md`, `scanner-timeout.md`
- **Dashboard:** Grafana > Stella Ops > Scanner Memory
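## Appendix: Worked Examples

The size-based routing rule used above (image size > 2GB or layer count > 100 goes to the large-image pool) can be sketched as a small shell helper. This is a hypothetical illustration, not part of the `stella` CLI; the function name `route_pool` and the pool names are assumptions.

```bash
#!/bin/sh
# Hypothetical sketch of the routing decision from Diagnosis/Root cause fix.
# Mirrors routing.large_image_threshold = 2Gi and the >100-layer heuristic.

GIB=$((1024 * 1024 * 1024))
LARGE_IMAGE_THRESHOLD=$((2 * GIB))   # 2Gi, per the runbook recommendation

# route_pool SIZE_BYTES LAYER_COUNT -> prints the pool an image should use
route_pool() {
  size_bytes=$1
  layer_count=$2
  if [ "$size_bytes" -gt "$LARGE_IMAGE_THRESHOLD" ] || [ "$layer_count" -gt 100 ]; then
    echo "large-images"   # route to the dedicated high-memory pool
  else
    echo "default"        # small enough for the standard worker pool
  fi
}

route_pool $((3 * GIB)) 40    # 3GiB image: over the size threshold
route_pool $((1 * GIB)) 150   # many layers: also routed to large-images
route_pool $((1 * GIB)) 30    # small image: stays on the default pool
```

The same thresholds should be kept in sync with `routing.large_image_threshold` so that operator expectations and scanner behavior match.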
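The Prevention checklist's 80%-of-limit alert rule can also be sketched in shell. This is a minimal illustration under stated assumptions: `to_bytes` and `should_alert` are hypothetical helpers, and only the `Gi`/`Mi` suffixes used by `worker.memory_limit` in this runbook are handled.

```bash
#!/bin/sh
# Hypothetical sketch of the "alert above 80% of the memory limit" rule
# from Prevention. Not part of the stella CLI.

# to_bytes VALUE (e.g. "8Gi", "512Mi") -> prints the value in bytes
to_bytes() {
  v=$1
  case "$v" in
    *Gi) echo $(( ${v%Gi} * 1024 * 1024 * 1024 )) ;;
    *Mi) echo $(( ${v%Mi} * 1024 * 1024 )) ;;
    *)   echo "$v" ;;   # already a plain byte count
  esac
}

# should_alert USAGE_BYTES LIMIT -> "alert" if usage exceeds 80% of the limit
should_alert() {
  usage=$1
  limit_bytes=$(to_bytes "$2")
  # Compare usage*100 > limit*80 to avoid fractional arithmetic in sh
  if [ $(( usage * 100 )) -gt $(( limit_bytes * 80 )) ]; then
    echo "alert"
  else
    echo "ok"
  fi
}

should_alert 7000000000 8Gi   # ~81% of 8Gi: fires
should_alert 4000000000 8Gi   # ~47% of 8Gi: quiet
```

In practice the same comparison would live in the alerting rule for `scanner_worker_memory_usage_bytes` rather than a script; the sketch just makes the arithmetic explicit.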