# SCHED-WORKER-16-205 — Scheduler Worker Observability _Sprint 16 · Scheduler Worker Guild_ The scheduler worker now exposes first-class metrics covering planner latency, runner throughput, and backlog health. ## Meter: `StellaOps.Scheduler.Worker` | Metric | Type | Tags | Description | | --- | --- | --- | --- | | `scheduler_planner_runs_total` | Counter | `mode`, `status` | Planner outcomes (`enqueued`, `no_work`, `failed`). | | `scheduler_planner_latency_seconds` | Histogram | `mode`, `status` | Time between run creation and planner completion. | | `scheduler_runner_segments_total` | Counter | `mode`, `status` | Runner segments processed (`Completed`, `persist_failed`, `RunMissing`). | | `scheduler_runner_images_total` | Counter | `mode`, `delta` | Images processed per mode, split by whether a delta was observed. | | `scheduler_runner_delta_total` | Counter | `mode` | Total new findings observed. | | `scheduler_runner_delta_critical_total` | Counter | `mode` | Critical findings observed. | | `scheduler_runner_delta_high_total` | Counter | `mode` | High findings observed. | | `scheduler_runner_delta_kev_total` | Counter | `mode` | KEV hits surfaced across runner segments. | | `scheduler_run_duration_seconds` | Histogram | `mode`, `result` | End-to-end run durations (currently recorded for successful completions). | | `scheduler_runs_active` | Up/down counter | `mode` | Active runs in-flight. | | `scheduler_runner_backlog` | Observable gauge | `mode`, `scheduleId` | Remaining images awaiting runner processing per schedule. | ## Instrumentation notes - Planner records latency once a run transitions out of `Planning`. `no_work` completions emit zero-duration runs without incrementing the active counter. - Runner updates backlog after every segment and decrements the active counter when a run reaches `Completed`. - Delta counters aggregate per severity and KEV hit; they only increment when `DeltaSummary` reports meaningful changes. - Metrics are emitted regardless of Notify availability so operators can track queue pressure even in air-gapped deployments. ## Dashboards & alerts - **Grafana dashboard:** `docs/ops/scheduler-worker-grafana-dashboard.json` (import into Prometheus-backed Grafana). Panels mirror the metrics above with mode filters. - **Prometheus rules:** `docs/ops/scheduler-worker-prometheus-rules.yaml` provides planner failure/latency, backlog, and stuck-run alerts. - **Operations guide:** see `docs/ops/scheduler-worker-operations.md` for runbook steps, alert context, and dashboard wiring instructions.