# Unknowns SLA Monitoring

## Module
Unknowns

## Status
VERIFIED

## Description
SLA monitoring for unknowns: tracks resolution timelines for unknown queue items and exposes health checks that report SLA compliance.

## Implementation Details
- **Unknowns SLA Monitor Service**: `src/Unknowns/StellaOps.Unknowns.Services/UnknownsSlaMonitorService.cs` -- background service that periodically checks unknown queue items against configured SLA thresholds (time-to-triage, time-to-resolution) and raises alerts for SLA breaches.
- **Unknowns SLA Health Check**: `src/Unknowns/StellaOps.Unknowns.Services/UnknownsSlaHealthCheck.cs` -- ASP.NET health check that reports SLA compliance status. It returns degraded/unhealthy when unknowns exceed SLA thresholds, which enables integration with orchestrator health monitoring.
- **Unknowns Metrics Service**: `src/Unknowns/StellaOps.Unknowns.Services/UnknownsMetricsService.cs` -- exposes Prometheus/OpenTelemetry metrics for unknown queue depth, average resolution time, SLA breach count, and hint coverage percentage.
- **SLA Calculator**: `src/Unknowns/StellaOps.Unknowns.Services/SlaCalculator.cs` -- shared score-band and elapsed-time calculations used by both the monitor and the health check.
- **Grey Queue Entry Model**: `src/Unknowns/__Libraries/StellaOps.Unknowns.Core/Models/GreyQueueEntry.cs` -- data model for grey queue entries, including the creation timestamp, last activity timestamp, and SLA deadline fields used by the monitor.
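
The components above share one core calculation: classify each entry's elapsed time against a score-band SLA budget, then fold the per-entry results into a single health status. The following is a minimal, language-neutral sketch in Python, not the C# implementation. The band names, score cut-offs, budgets, the 80% warning fraction, the `GreyQueueEntrySketch` type, measuring elapsed time from creation, and the "worst state wins" aggregation rule are all hypothetical assumptions, not values taken from `SlaCalculator.cs` or `UnknownsSlaHealthCheck.cs`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class SlaState(Enum):
    WITHIN = "within"
    WARNING = "warning"
    BREACHED = "breached"


class HealthStatus(Enum):
    # Mirrors the three health states named in this document.
    HEALTHY = "Healthy"
    DEGRADED = "Degraded"
    UNHEALTHY = "Unhealthy"


# HYPOTHETICAL score bands and SLA budgets; the real values are configured
# elsewhere and are not documented here.
SLA_BUDGET_BY_BAND = {
    "hot": timedelta(hours=24),
    "warm": timedelta(days=7),
    "cold": timedelta(days=30),
}
WARNING_FRACTION = 0.8  # HYPOTHETICAL: warn once 80% of the budget is used


@dataclass(frozen=True)
class GreyQueueEntrySketch:
    """Simplified, hypothetical stand-in for GreyQueueEntry."""
    entry_id: str
    score: float
    created_at: datetime


def score_band(score: float) -> str:
    """Map a score to a band (HYPOTHETICAL cut-offs)."""
    if score >= 0.7:
        return "hot"
    if score >= 0.4:
        return "warm"
    return "cold"


def sla_state(entry: GreyQueueEntrySketch, now: datetime) -> SlaState:
    """Compare elapsed time since creation with the entry's band budget."""
    budget = SLA_BUDGET_BY_BAND[score_band(entry.score)]
    elapsed = now - entry.created_at
    if elapsed >= budget:
        return SlaState.BREACHED
    if elapsed >= budget * WARNING_FRACTION:
        return SlaState.WARNING
    return SlaState.WITHIN


def health_status(entries: list[GreyQueueEntrySketch], now: datetime) -> HealthStatus:
    """Fold per-entry states into one status (worst state wins; an assumption)."""
    states = {sla_state(e, now) for e in entries}
    if SlaState.BREACHED in states:
        return HealthStatus.UNHEALTHY
    if SlaState.WARNING in states:
        return HealthStatus.DEGRADED
    return HealthStatus.HEALTHY


if __name__ == "__main__":
    now = datetime(2026, 2, 11, 12, 0, tzinfo=timezone.utc)
    fresh = GreyQueueEntrySketch("u-1", 0.9, now - timedelta(hours=1))
    aging = GreyQueueEntrySketch("u-2", 0.9, now - timedelta(hours=20))
    stale = GreyQueueEntrySketch("u-3", 0.9, now - timedelta(hours=30))
    print(health_status([fresh], now).value)         # Healthy
    print(health_status([fresh, aging], now).value)  # Degraded
    print(health_status([aging, stale], now).value)  # Unhealthy
```

Taking `now` as a parameter instead of reading the clock inside the calculation keeps it deterministic, so a breach can be exercised simply by backdating `created_at`, much like the breach scenario in the E2E test plan.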

## E2E Test Plan
- [ ] Enqueue an unknown item, let the `UnknownsSlaMonitorService` run its check cycle, and verify the item is reported as within SLA when the elapsed time is below the threshold.
- [ ] Enqueue an unknown item with a backdated creation timestamp (so the elapsed time exceeds the SLA threshold), run the monitor, and verify that an SLA breach alert is raised.
- [ ] Query the `UnknownsSlaHealthCheck` endpoint when all unknowns are within SLA and verify it returns `Healthy`; then introduce warning and breach states and verify it returns `Degraded` and `Unhealthy`, respectively.
- [ ] Verify that `UnknownsMetricsService` exposes correct Prometheus metrics: enqueue an item, resolve it, and verify that the `unknown_resolution_time_seconds` histogram records the elapsed time.

## Verification
- **Verified**: 2026-02-11
- **Method**: Tier 0 source verification + Tier 1 build/test + Tier 2 behavioral integration replay
- **Build**: PASS (`src/Unknowns/StellaOps.Unknowns.Services/StellaOps.Unknowns.Services.csproj`)
- **Tests**: PASS (`src/Unknowns/__Tests/StellaOps.Unknowns.Core.Tests/StellaOps.Unknowns.Core.Tests.csproj`: 119/119, `src/Unknowns/__Tests/StellaOps.Unknowns.WebService.Tests/StellaOps.Unknowns.WebService.Tests.csproj`: 9/9)
- **Tier 0 Evidence**: `docs/qa/feature-checks/runs/unknowns/unknowns-sla-monitoring/run-001/tier0-source-check.json`
- **Tier 1 Evidence**: `docs/qa/feature-checks/runs/unknowns/unknowns-sla-monitoring/run-001/tier1-build-check.json`
- **Tier 2 Evidence**: `docs/qa/feature-checks/runs/unknowns/unknowns-sla-monitoring/run-001/tier2-integration-check.json`