git.stella-ops.org/docs/features/checked/unknowns/unknowns-sla-monitoring.md
2026-02-12 10:27:23 +02:00

Unknowns SLA Monitoring

Module

Unknowns

Status

VERIFIED

Description

SLA monitoring for unknowns: tracks how long unknown queue items take to triage and resolve against configured SLA thresholds, and exposes health checks that report SLA compliance for the queue.

Implementation Details

  • Unknowns SLA Monitor Service: src/Unknowns/StellaOps.Unknowns.Services/UnknownsSlaMonitorService.cs -- background service that periodically checks unknown queue items against configured SLA thresholds (time-to-triage, time-to-resolution); raises alerts for SLA breaches.
  • Unknowns SLA Health Check: src/Unknowns/StellaOps.Unknowns.Services/UnknownsSlaHealthCheck.cs -- ASP.NET health check that reports SLA compliance status; returns degraded/unhealthy when unknowns exceed SLA thresholds, enabling integration with orchestrator health monitoring.
  • Unknowns Metrics Service: src/Unknowns/StellaOps.Unknowns.Services/UnknownsMetricsService.cs -- exposes Prometheus/OpenTelemetry metrics for unknown queue depth, average resolution time, SLA breach count, and hint coverage percentage.
  • SLA Calculator: src/Unknowns/StellaOps.Unknowns.Services/SlaCalculator.cs -- shared score-band and elapsed-time calculations used by both monitor and health check paths.
  • Grey Queue Entry Model: src/Unknowns/__Libraries/StellaOps.Unknowns.Core/Models/GreyQueueEntry.cs -- data model for grey queue entries including creation timestamp, last activity timestamp, and SLA deadline fields used by the monitor.
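
How the calculator feeds both the monitor and the health check can be illustrated with a small sketch. It is written in Python for brevity and is not the C# implementation: the `QueueEntry` fields, the 80% warning ratio, and the mapping from worst entry state to health result are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class SlaState(Enum):
    WITHIN = "within"
    WARNING = "warning"
    BREACHED = "breached"


@dataclass(frozen=True)
class QueueEntry:
    # Hypothetical stand-in for GreyQueueEntry; only the fields this sketch needs.
    entry_id: str
    created_at: datetime
    sla_deadline: datetime


def classify(entry: QueueEntry, now: datetime, warning_ratio: float = 0.8) -> SlaState:
    """Classify an entry by the fraction of its SLA window that has elapsed."""
    window = (entry.sla_deadline - entry.created_at).total_seconds()
    elapsed = (now - entry.created_at).total_seconds()
    if window <= 0 or elapsed >= window:
        return SlaState.BREACHED
    if elapsed >= warning_ratio * window:  # assumed warning threshold
        return SlaState.WARNING
    return SlaState.WITHIN


def health_status(entries: list[QueueEntry], now: datetime) -> str:
    """Map the worst entry state to a health result (assumed mapping)."""
    states = {classify(entry, now) for entry in entries}
    if SlaState.BREACHED in states:
        return "Unhealthy"
    if SlaState.WARNING in states:
        return "Degraded"
    return "Healthy"


# Usage: one fresh entry and one entry 20 hours into a 24-hour window.
now = datetime(2026, 2, 11, 12, 0, tzinfo=timezone.utc)
fresh = QueueEntry("a", now - timedelta(hours=1), now + timedelta(hours=23))
aging = QueueEntry("b", now - timedelta(hours=20), now + timedelta(hours=4))
print(health_status([fresh, aging], now))  # prints "Degraded"
```

Keeping the classification in one pure function that takes `now` as a parameter is what lets the monitor and the health check agree on an entry's state, and it makes the time-dependent cases testable without waiting on a real clock.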

E2E Test Plan

  • Enqueue an unknown item, let the UnknownsSlaMonitorService run its check cycle, and verify the item is reported as within SLA while the elapsed time is below the threshold.
  • Enqueue an unknown item with a creation timestamp set far enough in the past to exceed the SLA threshold, run the monitor, and verify an SLA breach alert is raised.
  • Query the UnknownsSlaHealthCheck endpoint when all unknowns are within SLA and verify it returns Healthy; then introduce a warning state and a breach state and verify it returns Degraded and Unhealthy, respectively.
  • Verify UnknownsMetricsService exposes correct Prometheus metrics: enqueue an item, resolve it, and verify the unknown_resolution_time_seconds histogram records the elapsed time.
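
The shape of the last scenario can be sketched without a real metrics backend. The Python example below uses a hypothetical in-memory histogram and an injected fake clock; the bucket bounds and the `Histogram`, `FakeClock`, and `UnknownsQueue` names are assumptions for illustration, and an actual test would read the value from the service's exported metrics instead.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Histogram:
    # Hypothetical in-memory stand-in for unknown_resolution_time_seconds.
    bounds: tuple[float, ...] = (60.0, 3600.0, 86400.0)  # assumed upper bounds, in seconds
    counts: list[int] = field(default_factory=list)
    total: float = 0.0

    def __post_init__(self) -> None:
        # One bucket per bound plus a final overflow bucket.
        # Counts here are per bucket, not cumulative as in Prometheus exposition.
        self.counts = [0] * (len(self.bounds) + 1)

    def observe(self, seconds: float) -> None:
        # bisect_left puts a value equal to a bound into that bound's bucket.
        self.counts[bisect.bisect_left(self.bounds, seconds)] += 1
        self.total += seconds


class FakeClock:
    """Manually advanced clock so the test controls elapsed time."""

    def __init__(self, start: float = 0.0) -> None:
        self.now = start

    def advance(self, seconds: float) -> None:
        self.now += seconds


class UnknownsQueue:
    """Records the enqueue time per item and observes elapsed time on resolve."""

    def __init__(self, clock: FakeClock, histogram: Histogram) -> None:
        self._clock = clock
        self._histogram = histogram
        self._enqueued: dict[str, float] = {}

    def enqueue(self, item_id: str) -> None:
        self._enqueued[item_id] = self._clock.now

    def resolve(self, item_id: str) -> float:
        elapsed = self._clock.now - self._enqueued.pop(item_id)
        self._histogram.observe(elapsed)
        return elapsed


# Usage: enqueue, let 90 seconds pass, resolve, then inspect the histogram.
clock = FakeClock()
histogram = Histogram()
queue = UnknownsQueue(clock, histogram)
queue.enqueue("unknown-1")
clock.advance(90.0)
queue.resolve("unknown-1")
print(histogram.counts, histogram.total)  # prints "[0, 1, 0, 0] 90.0"
```

Injecting the clock is the design choice that matters here: it lets the test assert an exact elapsed value rather than a tolerance around wall-clock time.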

Verification

  • Verified: 2026-02-11
  • Method: Tier 0 source verification + Tier 1 build/test + Tier 2 behavioral integration replay
  • Build: PASS (src/Unknowns/StellaOps.Unknowns.Services/StellaOps.Unknowns.Services.csproj)
  • Tests: PASS (src/Unknowns/__Tests/StellaOps.Unknowns.Core.Tests/StellaOps.Unknowns.Core.Tests.csproj: 119/119, src/Unknowns/__Tests/StellaOps.Unknowns.WebService.Tests/StellaOps.Unknowns.WebService.Tests.csproj: 9/9)
  • Tier 0 Evidence: docs/qa/feature-checks/runs/unknowns/unknowns-sla-monitoring/run-001/tier0-source-check.json
  • Tier 1 Evidence: docs/qa/feature-checks/runs/unknowns/unknowns-sla-monitoring/run-001/tier1-build-check.json
  • Tier 2 Evidence: docs/qa/feature-checks/runs/unknowns/unknowns-sla-monitoring/run-001/tier2-integration-check.json