Unknowns SLA Monitoring
Module
Unknowns
Status
VERIFIED
Description
SLA monitoring for unknowns: tracks resolution timelines and provides health checks for unknown queue items.
Implementation Details
- Unknowns SLA Monitor Service: src/Unknowns/StellaOps.Unknowns.Services/UnknownsSlaMonitorService.cs -- background service that periodically checks unknown queue items against configured SLA thresholds (time-to-triage, time-to-resolution); raises alerts for SLA breaches.
- Unknowns SLA Health Check: src/Unknowns/StellaOps.Unknowns.Services/UnknownsSlaHealthCheck.cs -- ASP.NET health check that reports SLA compliance status; returns degraded/unhealthy when unknowns exceed SLA thresholds, enabling integration with orchestrator health monitoring.
- Unknowns Metrics Service: src/Unknowns/StellaOps.Unknowns.Services/UnknownsMetricsService.cs -- exposes Prometheus/OpenTelemetry metrics for unknown queue depth, average resolution time, SLA breach count, and hint coverage percentage.
- SLA Calculator: src/Unknowns/StellaOps.Unknowns.Services/SlaCalculator.cs -- shared score-band and elapsed-time calculations used by both monitor and health check paths.
- Grey Queue Entry Model: src/Unknowns/__Libraries/StellaOps.Unknowns.Core/Models/GreyQueueEntry.cs -- data model for grey queue entries including creation timestamp, last activity timestamp, and SLA deadline fields used by the monitor.
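The elapsed-time check and the health mapping described above can be sketched as follows. This is a minimal illustration in Python, not the C# code in SlaCalculator.cs or UnknownsSlaHealthCheck.cs. The names SlaState, classify, and health_status, the 80% warning fraction, and the "worst state wins" aggregation are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class SlaState(Enum):
    # Hypothetical per-item states; the real score bands may differ.
    WITHIN = "within"
    WARNING = "warning"
    BREACHED = "breached"


def classify(created_at, now, threshold, warn_fraction=0.8):
    """Classify one queue item by elapsed time against its SLA threshold.

    warn_fraction is an assumed value: the item enters the warning band
    once that share of the threshold has elapsed.
    """
    elapsed = now - created_at
    if elapsed >= threshold:
        return SlaState.BREACHED
    if elapsed >= threshold * warn_fraction:
        return SlaState.WARNING
    return SlaState.WITHIN


def health_status(states):
    """Map per-item states to one overall status; the worst state wins."""
    states = list(states)
    if SlaState.BREACHED in states:
        return "Unhealthy"
    if SlaState.WARNING in states:
        return "Degraded"
    return "Healthy"
```

Passing now in as a parameter, rather than reading the clock inside the function, keeps the calculation deterministic, so the monitor and health check paths can share it and tests can replay fixed timestamps.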
E2E Test Plan
- Enqueue an unknown item, let the UnknownsSlaMonitorService run its check cycle, and verify the item is reported as within SLA when the elapsed time is below the threshold
- Enqueue an unknown item with an artificially past creation timestamp (exceeding the SLA threshold), run the monitor, and verify an SLA breach alert is raised
- Query the UnknownsSlaHealthCheck endpoint when all unknowns are within SLA and verify it returns Healthy; then introduce warning and breach states and verify it returns Degraded and Unhealthy
- Verify UnknownsMetricsService exposes correct Prometheus metrics: enqueue an item, resolve it, and verify unknown_resolution_time_seconds histogram records the elapsed time
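The backdated-timestamp scenario in the plan above can be sketched with a fixed clock, so the test does not have to wait for real time to pass. This is a hypothetical Python illustration, not the project's test code. QueueItem, run_check_cycle, and the 24-hour threshold are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class QueueItem:
    # Hypothetical stand-in for a grey queue entry.
    item_id: str
    created_at: datetime


def run_check_cycle(items, now, threshold):
    """Return the ids of items whose elapsed time has reached the threshold."""
    return [i.item_id for i in items if now - i.created_at >= threshold]


# Fixed "current time" injected into the check instead of the system clock.
now = datetime(2026, 2, 11, 12, 0, tzinfo=timezone.utc)
threshold = timedelta(hours=24)  # assumed SLA threshold

items = [
    QueueItem("fresh", now - timedelta(hours=1)),   # within SLA
    QueueItem("stale", now - timedelta(hours=48)),  # backdated past the SLA
]

breaches = run_check_cycle(items, now, threshold)
print(breaches)
```

Only the backdated item is reported, which corresponds to the first two scenarios in the plan: one item within SLA and one raising a breach.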
Verification
- Verified: 2026-02-11
- Method: Tier 0 source verification + Tier 1 build/test + Tier 2 behavioral integration replay
- Build: PASS (src/Unknowns/StellaOps.Unknowns.Services/StellaOps.Unknowns.Services.csproj)
- Tests: PASS (src/Unknowns/__Tests/StellaOps.Unknowns.Core.Tests/StellaOps.Unknowns.Core.Tests.csproj: 119/119, src/Unknowns/__Tests/StellaOps.Unknowns.WebService.Tests/StellaOps.Unknowns.WebService.Tests.csproj: 9/9)
- Tier 0 Evidence: docs/qa/feature-checks/runs/unknowns/unknowns-sla-monitoring/run-001/tier0-source-check.json
- Tier 1 Evidence: docs/qa/feature-checks/runs/unknowns/unknowns-sla-monitoring/run-001/tier1-build-check.json
- Tier 2 Evidence: docs/qa/feature-checks/runs/unknowns/unknowns-sla-monitoring/run-001/tier2-integration-check.json