git.stella-ops.org/docs/doctor/articles/scanner/resources.md
master c58a236d70 Doctor plugin checks: implement health check classes and documentation
Implement remediation-aware health checks across all Doctor plugin modules
(Agent, Attestor, Auth, BinaryAnalysis, Compliance, Crypto, Environment,
EvidenceLocker, Notify, Observability, Operations, Policy, Postgres, Release,
Scanner, Storage, Vex) and their backing library counterparts (AI, Attestation,
Authority, Core, Cryptography, Database, Docker, Integration, Notify,
Observability, Security, ServiceGraph, Sources, Verification).

Each check now emits structured remediation metadata (severity, category,
runbook links, and fix suggestions) consumed by the Doctor dashboard
remediation panel.

Also adds:
- docs/doctor/articles/ knowledge base for check explanations
- Advisory AI search seed and allowlist updates for doctor content
- Sprint plan for doctor checks documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 12:28:00 +02:00

checkId: check.scanner.resources
plugin: stellaops.doctor.scanner
severity: warn
tags: scanner, resources, cpu, memory, workers

Scanner Resource Utilization

What It Checks

Queries the Scanner service at /api/v1/resources/stats and evaluates CPU, memory, and worker pool health:

  • CPU utilization: warn at 75%, fail at 90%.
  • Memory utilization: warn at 80%, fail at 95%.
  • Worker pool saturation: warn when all workers are busy (zero idle workers).
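The threshold logic above can be sketched as a small shell function for reasoning about results. This is a hypothetical re-implementation, not the Doctor plugin's code. It assumes that thresholds are inclusive (a value equal to the threshold triggers it), that utilization is reported as a 0-100 percentage, and that the overall result is the most severe of the three sub-results.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the check.scanner.resources thresholds; not the plugin source.
# Assumptions: inclusive thresholds, 0-100 percentages, worst sub-result wins.

# threshold_level VALUE WARN FAIL -> prints pass, warn, or fail.
# awk is used because bash arithmetic cannot compare decimals such as 82.5.
threshold_level() {
  awk -v v="$1" -v w="$2" -v f="$3" \
    'BEGIN { if (v + 0 >= f + 0) print "fail"; else if (v + 0 >= w + 0) print "warn"; else print "pass" }'
}

# scanner_resource_status CPU_PCT MEMORY_PCT IDLE_WORKERS
scanner_resource_status() {
  local cpu mem workers level
  cpu=$(threshold_level "$1" 75 90)   # CPU: warn at 75%, fail at 90%
  mem=$(threshold_level "$2" 80 95)   # memory: warn at 80%, fail at 95%
  workers=pass
  if [ "$3" -eq 0 ]; then
    workers=warn                      # a saturated pool only ever warns
  fi
  # Print the most severe of the three results.
  for level in fail warn; do
    case " $cpu $mem $workers " in
      *" $level "*) echo "$level"; return 0 ;;
    esac
  done
  echo pass
}

scanner_resource_status 82.5 60 3   # prints warn (CPU above 75%)
scanner_resource_status 91 70 2     # prints fail (CPU above 90%)
scanner_resource_status 40 50 0     # prints warn (no idle workers)
```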

Evidence collected: cpu_utilization, memory_utilization, memory_used_mb, active_workers, total_workers, idle_workers.

The check requires Scanner:Url or Services:Scanner:Url to be configured.
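For manual troubleshooting, the same endpoint can be queried directly. The sketch below assumes the standard .NET configuration mapping, in which the keys Scanner:Url and Services:Scanner:Url become the environment variables Scanner__Url and Services__Scanner__Url (the double-underscore convention used for Scanner__MaxConcurrentJobs in the Compose example). It also assumes Scanner:Url takes precedence when both are set. The response schema is not documented in this article, so the helper only builds the URL and leaves parsing to the reader.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the stats URL the check queries.
# Assumes .NET env-var mapping (Scanner:Url -> Scanner__Url) and that
# Scanner:Url wins over Services:Scanner:Url when both are set.
scanner_stats_url() {
  local base="${Scanner__Url:-${Services__Scanner__Url:-}}"
  if [ -z "$base" ]; then
    echo "neither Scanner:Url nor Services:Scanner:Url is configured" >&2
    return 1
  fi
  # Strip one trailing slash so the path separator is not doubled.
  printf '%s/api/v1/resources/stats\n' "${base%/}"
}

# Example (requires network access to the scanner):
#   curl -fsS "$(scanner_stats_url)"
```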

Why It Matters

The scanner is one of the most resource-intensive services in the Stella Ops stack. It processes container images, generates SBOMs, runs vulnerability matching, and performs reachability analysis. When scanner resources are exhausted, all downstream pipelines stall: queue depth grows, scan latency increases, and release gates time out waiting for scan results. Memory exhaustion can cause OOM kills that lose in-progress work.

Common Causes

  • High scan volume during bulk import or CI surge
  • Memory leak from accumulated scan artifacts not being garbage collected
  • Large container images (multi-GB layers) being processed concurrently
  • Insufficient CPU/memory allocation relative to workload
  • All workers busy with no capacity for new jobs
  • Worker scaling not keeping up with demand

How to Fix

Docker Compose

# Check scanner resource usage
docker stats scanner --no-stream

# Reduce concurrent jobs to lower resource pressure
# In docker-compose.stella-ops.yml:
services:
  scanner:
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: "4.0"
    environment:
      Scanner__MaxConcurrentJobs: "2"
      Scanner__Workers__Count: "4"

# Restart scanner to apply new resource limits
docker compose -f docker-compose.stella-ops.yml up -d scanner
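To confirm that Compose actually applied the new limits, the running container's HostConfig can be inspected. docker inspect reports HostConfig.Memory in bytes and HostConfig.NanoCpus in billionths of a CPU; a value of 0 means no limit is set. The formatter below is a hypothetical convenience that converts both to the units used in the Compose file.

```shell
#!/usr/bin/env bash
# format_limits MEMORY_BYTES NANO_CPUS -> human-readable limits.
# Hypothetical helper; 0 in either field means "no limit" to Docker.
format_limits() {
  awk -v m="$1" -v n="$2" 'BEGIN {
    printf "memory=%.1fGiB cpus=%.1f\n", m / 1073741824, n / 1000000000
  }'
}

# Usage against the running service (requires Docker):
#   cid=$(docker compose -f docker-compose.stella-ops.yml ps -q scanner)
#   format_limits $(docker inspect --format '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}' "$cid")
# With the limits above this should report memory=4.0GiB cpus=4.0.
```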

Bare Metal / systemd

# Check current resource usage (pgrep -d, joins multiple PIDs with commas, as top -p expects)
top -p "$(pgrep -d, -f stellaops-scanner)"

# Reduce concurrent processing
stella scanner config set MaxConcurrentJobs 2

Edit /etc/stellaops/scanner/appsettings.json:

{
  "Scanner": {
    "MaxConcurrentJobs": 2,
    "Workers": {
      "Count": 4
    }
  }
}
sudo systemctl restart stellaops-scanner

Kubernetes / Helm

# Check pod resource usage
kubectl top pods -l app=stellaops-scanner

# Scale horizontally instead of vertically
kubectl scale deployment stellaops-scanner --replicas=4

Set in Helm values.yaml:

scanner:
  replicas: 4
  resources:
    requests:
      memory: 2Gi
      cpu: "2"
    limits:
      memory: 4Gi
      cpu: "4"
  maxConcurrentJobs: 2
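Horizontal scaling multiplies every per-pod figure by the replica count, which matters for node capacity planning. The arithmetic for the values above can be sketched as a hypothetical helper; it accepts whole-number Gi and CPU values only, and assumes maxConcurrentJobs is a per-pod setting.

```shell
#!/usr/bin/env bash
# scanner_capacity REPLICAS REQ_MEM_GI REQ_CPU LIM_MEM_GI LIM_CPU JOBS_PER_POD
# Hypothetical planning helper: whole numbers only (bash integer arithmetic).
scanner_capacity() {
  local r=$1
  echo "requests: $(( r * $2 ))Gi memory, $(( r * $3 )) CPU"
  echo "limits: $(( r * $4 ))Gi memory, $(( r * $5 )) CPU"
  echo "concurrent jobs: $(( r * $6 ))"
}

# Values from the Helm example: 4 replicas, 2Gi/2 CPU requests,
# 4Gi/4 CPU limits, maxConcurrentJobs 2 (assumed per pod).
scanner_capacity 4 2 2 4 4 2
# requests: 8Gi memory, 8 CPU
# limits: 16Gi memory, 16 CPU
# concurrent jobs: 8
```

The Kubernetes scheduler reserves only the requests (8Gi and 8 CPU here), so the cluster needs at least that much free allocatable capacity; the limits total shows the worst case if every pod bursts at once.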

Verification

stella doctor run --check check.scanner.resources

Related Checks

  • check.scanner.queue: resource exhaustion causes queue backlog growth
  • check.scanner.sbom: memory exhaustion causes SBOM generation failures
  • check.scanner.reachability: CPU constraints slow reachability computation
  • check.scanner.slice.cache: an effective slice cache reduces resource demand