SCHED-WORKER-16-205 — Scheduler Worker Observability
Sprint 16 · Scheduler Worker Guild
The scheduler worker now exposes first-class metrics covering planner latency, runner throughput, and backlog health.
Meter: StellaOps.Scheduler.Worker
| Metric | Type | Tags | Description | 
|---|---|---|---|
| scheduler_planner_runs_total | Counter | mode,status | Planner outcomes (enqueued, no_work, failed). |
| scheduler_planner_latency_seconds | Histogram | mode,status | Time between run creation and planner completion. | 
| scheduler_runner_segments_total | Counter | mode,status | Runner segments processed (Completed, persist_failed, RunMissing). |
| scheduler_runner_images_total | Counter | mode,delta | Images processed per mode, split by whether a delta was observed. | 
| scheduler_runner_delta_total | Counter | mode | Total new findings observed. | 
| scheduler_runner_delta_critical_total | Counter | mode | Critical findings observed. | 
| scheduler_runner_delta_high_total | Counter | mode | High findings observed. | 
| scheduler_runner_delta_kev_total | Counter | mode | KEV hits surfaced across runner segments. | 
| scheduler_run_duration_seconds | Histogram | mode,result | End-to-end run durations (currently recorded for successful completions). | 
| scheduler_runs_active | Up/down counter | mode | Active runs in-flight. | 
| scheduler_runner_backlog | Observable gauge | mode,scheduleId | Remaining images awaiting runner processing per schedule. | 
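As a hedged illustration of how the tagged counters above could be consumed, the following Python sketch parses a Prometheus-style text exposition and sums scheduler_planner_runs_total by its status tag. It assumes the meter is exported under exactly the names in the table (an exporter may rename or suffix them), the mode values mode-a and mode-b are invented placeholders, and the helpers sum_by_tag and failure_ratio are hypothetical names, not part of the worker. The parser is deliberately minimal and does not handle label values that contain a closing brace.

```python
import re
from collections import defaultdict

# Matches a sample line such as: name{k="v",k2="v2"} 12
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def sum_by_tag(exposition: str, metric: str, tag: str) -> dict:
    """Sum all samples of `metric`, grouped by the value of `tag`."""
    totals = defaultdict(float)
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        totals[labels.get(tag, "")] += float(match.group("value"))
    return dict(totals)


def failure_ratio(by_status: dict) -> float:
    """Share of planner outcomes tagged `failed` (0.0 when there are no samples)."""
    total = sum(by_status.values())
    return by_status.get("failed", 0.0) / total if total else 0.0


# Hypothetical scrape output; the mode values are placeholders.
SAMPLE = """\
# TYPE scheduler_planner_runs_total counter
scheduler_planner_runs_total{mode="mode-a",status="enqueued"} 6
scheduler_planner_runs_total{mode="mode-a",status="failed"} 1
scheduler_planner_runs_total{mode="mode-b",status="enqueued"} 2
scheduler_planner_runs_total{mode="mode-b",status="no_work"} 1
scheduler_runs_active{mode="mode-a"} 3
"""

planner = sum_by_tag(SAMPLE, "scheduler_planner_runs_total", "status")
print(planner)                 # per-status totals across both modes
print(failure_ratio(planner))  # fraction of planner outcomes that failed
```

The same grouping works for any other tag in the table, for example summing scheduler_runner_backlog by scheduleId.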
Instrumentation notes
- Planner records latency once a run transitions out of Planning. Completions with status no_work emit zero-duration runs without incrementing the active counter.
- Runner updates backlog after every segment and decrements the active counter
when a run reaches Completed.
- Delta counters aggregate per severity and KEV hit; they only increment when
DeltaSummary reports meaningful changes.
- Metrics are emitted regardless of Notify availability so operators can track queue pressure even in air-gapped deployments.
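The bookkeeping described in these notes can be sketched as a small in-memory model. This is a toy Python illustration under stated assumptions, not the worker's implementation: the class WorkerMetricsModel, its method names, and the DeltaSummary fields shown here are hypothetical, only enqueued planner outcomes are assumed to mark a run as active, and the delta tag is modeled as a plain boolean.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class DeltaSummary:
    # Hypothetical fields; the real DeltaSummary shape is not given in this note.
    new_findings: int = 0
    new_criticals: int = 0
    new_highs: int = 0
    kev_hits: int = 0

    def has_changes(self) -> bool:
        return any((self.new_findings, self.new_criticals,
                    self.new_highs, self.kev_hits))


class WorkerMetricsModel:
    def __init__(self):
        self.counters = Counter()    # (metric, *tags) -> value
        self.active = Counter()      # mode -> in-flight runs
        self.backlog = {}            # (mode, schedule_id) -> remaining images
        self.planner_latencies = []  # (mode, status, seconds)
        self.run_durations = []      # (mode, seconds)

    def planner_completed(self, mode, status, latency_seconds):
        """Called once a run transitions out of Planning."""
        self.counters[("scheduler_planner_runs_total", mode, status)] += 1
        self.planner_latencies.append((mode, status, latency_seconds))
        if status == "no_work":
            # Zero-duration run; the active counter is left untouched.
            self.run_durations.append((mode, 0.0))
        elif status == "enqueued":
            self.active[mode] += 1  # assumption: only enqueued runs go active

    def segment_processed(self, mode, schedule_id, status, remaining_images,
                          deltas, run_completed=False):
        self.counters[("scheduler_runner_segments_total", mode, status)] += 1
        for delta in deltas:  # one DeltaSummary per image in the segment
            changed = delta.has_changes()
            self.counters[("scheduler_runner_images_total", mode, changed)] += 1
            if changed:  # delta counters move only on meaningful changes
                self.counters[("scheduler_runner_delta_total", mode)] += delta.new_findings
                self.counters[("scheduler_runner_delta_critical_total", mode)] += delta.new_criticals
                self.counters[("scheduler_runner_delta_high_total", mode)] += delta.new_highs
                self.counters[("scheduler_runner_delta_kev_total", mode)] += delta.kev_hits
        # Backlog is refreshed after every segment.
        self.backlog[(mode, schedule_id)] = remaining_images
        if run_completed:
            self.active[mode] -= 1  # run reached Completed


m = WorkerMetricsModel()
m.planner_completed("mode-a", "no_work", 0.2)
m.planner_completed("mode-a", "enqueued", 0.5)
m.segment_processed("mode-a", "sched-1", "Completed", remaining_images=1,
                    deltas=[DeltaSummary(new_findings=3, new_criticals=1, kev_hits=1),
                            DeltaSummary()])
m.segment_processed("mode-a", "sched-1", "Completed", remaining_images=0,
                    deltas=[DeltaSummary()], run_completed=True)
print(m.active["mode-a"], m.backlog)
```

In this walk-through the no_work outcome never touches the active counter, the enqueued run raises it by one, and the final segment brings both the active count and the backlog for sched-1 back to zero.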
Dashboards & alerts
- Grafana dashboard: docs/ops/scheduler-worker-grafana-dashboard.json (import into Prometheus-backed Grafana). Panels mirror the metrics above with mode filters.
- Prometheus rules: docs/ops/scheduler-worker-prometheus-rules.yaml provides planner failure/latency, backlog, and stuck-run alerts.
- Operations guide: see docs/ops/scheduler-worker-operations.md for runbook steps, alert context, and dashboard wiring instructions.