# SLO Burn-Rate Computation and Alert Budget Tracking

## Module
Orchestrator

## Status
VERIFIED

## Description
SLO burn-rate computation for orchestrator operations with configurable alert budgets, enabling proactive capacity and reliability management.

## Implementation Details
- **Modules**: `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/SloManagement/`, `src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/`
- **Key Classes**:
  - `BurnRateEngine` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/SloManagement/BurnRateEngine.cs`) - computes SLO burn rate from error budget consumption over rolling windows (1h, 6h, 24h, 30d)
  - `Slo` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Domain/Slo.cs`) - SLO entity with target (e.g., 99.9%), error budget, and current burn rate
  - `SloEndpoints` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.WebService/Endpoints/SloEndpoints.cs`) - REST API for SLO queries and burn rate dashboards
  - `IncidentModeHooks` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Observability/IncidentModeHooks.cs`) - activates incident mode when burn rate exceeds alert thresholds
  - `OrchestratorGoldenSignals` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Infrastructure/Observability/OrchestratorGoldenSignals.cs`) - provides underlying error/latency data for SLO computation
  - `ScaleMetrics` (`src/JobEngine/StellaOps.JobEngine/StellaOps.JobEngine.Core/Scale/ScaleMetrics.cs`) - metrics feeding SLO saturation signals
- **Interfaces**: None (uses concrete implementations)
- **Source**: Feature matrix scan

## E2E Test Plan
- [ ] Define an `Slo` with target=99.9% and error budget=43.2 minutes/month; verify the SLO is persisted
- [ ] Generate 10 successful and 1 failed request and verify `BurnRateEngine` computation reflects the error
- [ ] Verify rolling window: compute burn rates for 1h, 6h, and 24h windows via `BurnRateEngine` and verify each reflects the appropriate time range
- [ ] Exceed the alert threshold (e.g., 2x burn rate) and verify `IncidentModeHooks` triggers incident mode
- [ ] Query SLO via `SloEndpoints` and verify the response includes current burn rate, remaining budget, and alert status
- [ ] Verify budget depletion: consume the entire error budget and verify the `Slo` shows 0% remaining
- [ ] Reset the SLO period (monthly rollover) and verify the error budget resets to full
- [ ] Verify multi-SLO: define SLOs for latency and availability, verify `BurnRateEngine` computes each independently

## Verification
- Verified on 2026-02-13 via `run-002`.
- Tier 0: Source files confirmed present on disk.
- Tier 1: `dotnet build` passed (0 errors); 1292/1292 tests passed.
- Tier 2d: `docs/qa/feature-checks/runs/jobengine/slo-burn-rate-computation-and-alert-budget-tracking/run-002/tier2-integration-check.json`
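
As a worked illustration of the arithmetic behind the E2E test plan figures (a 99.9% target, a 43.2 minutes/month budget, a 2x alert threshold), the following is a minimal sketch. It is written in Python for brevity and is not the C# `BurnRateEngine` implementation. It assumes the conventional definition of burn rate as the observed error rate divided by the error rate the target allows; whether `BurnRateEngine` uses exactly this definition is an assumption. The names `WindowCounts`, `error_budget_minutes`, `burn_rate`, and `exceeds_threshold` are hypothetical and do not appear in the source tree.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class WindowCounts:
    """Request counts observed in one rolling window (hypothetical type)."""
    window: timedelta
    total: int
    failed: int


def _allowed_error_rate(target: float) -> float:
    # A target of 0.999 allows a 0.001 fraction of requests (or time) to fail.
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    return 1.0 - target


def error_budget_minutes(target: float, period: timedelta) -> float:
    """Minutes of allowed unavailability over the SLO period."""
    return period.total_seconds() / 60.0 * _allowed_error_rate(target)


def burn_rate(counts: WindowCounts, target: float) -> float:
    """Observed error rate divided by the allowed error rate.

    1.0 means the budget is consumed exactly at the sustainable pace;
    a window with no traffic is treated as burning nothing.
    """
    allowed = _allowed_error_rate(target)
    if counts.total == 0:
        return 0.0
    return (counts.failed / counts.total) / allowed


def exceeds_threshold(rate: float, threshold: float = 2.0) -> bool:
    """True when the burn rate is above the alert threshold (2x assumed)."""
    return rate > threshold


if __name__ == "__main__":
    target = 0.999
    print(error_budget_minutes(target, timedelta(days=30)))

    # 10 successful requests and 1 failed request in the 1h window.
    one_hour = WindowCounts(window=timedelta(hours=1), total=11, failed=1)
    rate = burn_rate(one_hour, target)
    print(rate, exceeds_threshold(rate))
```

Under these assumptions, a 30-day period at 99.9% yields a budget of about 43.2 minutes, and 1 failure in 11 requests gives a burn rate of roughly 91x, well above a 2x threshold. The same `burn_rate` call can be repeated with counts for the 6h, 24h, and 30d windows to compare short-term and long-term consumption.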