P0 Product-Level Metrics and Dashboard
Module
Telemetry
Status
IMPLEMENTED
Description
Four P0 product-level metrics are instrumented: time-to-first-verified-release, mean-time-to-answer-why-blocked, support-minutes-per-customer, and determinism-regressions-total. They are accompanied by Prometheus alerting rules and an install timestamp tracking service.
Implementation Details
- P0ProductMetrics: src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/P0ProductMetrics.cs -- meter StellaOps.P0Metrics with 4 P0 metrics:
  - P0M-001: stella_time_to_first_verified_release_seconds -- histogram with buckets 5m to 1 week
  - P0M-002: stella_why_blocked_latency_seconds -- mean time to answer "why blocked"
  - P0M-003: stella_support_burden_minutes_total -- support minutes per customer counter
  - P0M-004: stella_determinism_regressions_total -- determinism regression counter
- InstallTimestampService: src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/InstallTimestampService.cs -- tracks fresh install timestamp for P0M-001
- GoldenSignalMetrics: src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/GoldenSignalMetrics.cs -- golden signal metrics (latency, traffic, errors, saturation)
- FidelityMetricsTelemetry: src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/FidelityMetricsTelemetry.cs -- fidelity metrics for evidence quality
- FidelitySloAlertingService: src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/FidelitySloAlertingService.cs -- SLO alerting for fidelity metrics
- ProofCoverageMetrics: src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/ProofCoverageMetrics.cs -- proof coverage tracking
- ProofGenerationMetrics: src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/ProofGenerationMetrics.cs -- proof generation performance
- Tests: src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/GoldenSignalMetricsTests.cs, ProofCoverageMetricsTests.cs
- Source: SPRINT_20260117_028_Telemetry
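To illustrate how a histogram such as P0M-001 sorts observations into buckets spanning 5 minutes to 1 week, here is a minimal sketch. It is written in Python for brevity and is not the C# implementation in P0ProductMetrics.cs. The class name and the intermediate bucket boundaries are assumptions; the source states only the 5-minute and 1-week endpoints.

```python
from bisect import bisect_left

# Hypothetical bucket upper bounds in seconds. Only the first (5 minutes)
# and last (1 week) values come from the feature description; the
# intermediate boundaries are assumptions made for illustration.
BUCKETS_SECONDS = [
    300,      # 5 minutes
    900,      # 15 minutes
    3600,     # 1 hour
    14400,    # 4 hours
    86400,    # 1 day
    259200,   # 3 days
    604800,   # 1 week
]


class TimeToFirstReleaseHistogram:
    """Minimal cumulative histogram in the style of a Prometheus histogram."""

    def __init__(self, bounds=BUCKETS_SECONDS):
        self.bounds = list(bounds)
        # One slot per finite bound plus a final +Inf slot.
        self.counts = [0] * (len(self.bounds) + 1)
        self.total = 0.0

    def observe(self, seconds):
        """Record one elapsed time (install to first verified release)."""
        if seconds < 0:
            raise ValueError("elapsed time cannot be negative")
        # bisect_left returns the index of the first bound >= seconds,
        # which matches the "less than or equal" bucket semantics.
        self.counts[bisect_left(self.bounds, seconds)] += 1
        self.total += seconds

    def cumulative(self):
        """Return (upper_bound, cumulative_count) pairs, ending with +Inf."""
        running, out = 0, []
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            running += count
            out.append((bound, running))
        return out
```

An observation above 1 week lands only in the +Inf bucket, so a large gap between the 1-week and +Inf counts would suggest the bucket range is too narrow for real installs.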
E2E Test Plan
- Verify time-to-first-verified-release histogram records elapsed time from install
- Test why-blocked latency captures mean time from block to explanation delivery
- Verify support minutes counter increments per customer interaction
- Test determinism regression counter fires on replay divergence detection
- Verify Prometheus alerting rules trigger on SLO breaches
- Test install timestamp service persists and recovers install time across restarts
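The last item in the plan, persisting and recovering the install time across restarts, can be sketched as follows. This is a Python illustration under stated assumptions, not the behavior of InstallTimestampService.cs: the class name, the JSON file format, and the `installed_at` key are all hypothetical, since the source does not describe how the timestamp is stored.

```python
import json
import os
import tempfile
import time


class InstallTimestampStore:
    """Hypothetical file-backed store for a first-install timestamp."""

    def __init__(self, path, clock=time.time):
        self.path = path
        self.clock = clock  # injectable so tests can control time

    def get_or_create(self):
        """Return the stored install time, recording it on first call."""
        try:
            with open(self.path, "r", encoding="utf-8") as fh:
                return float(json.load(fh)["installed_at"])
        except FileNotFoundError:
            pass  # fresh install: fall through and record the time now

        installed_at = float(self.clock())
        directory = os.path.dirname(os.path.abspath(self.path))
        # Write to a temp file and rename it into place so a crash
        # mid-write cannot leave a truncated timestamp file behind.
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump({"installed_at": installed_at}, fh)
        os.replace(tmp_path, self.path)
        return installed_at

    def seconds_since_install(self):
        """Elapsed seconds since install, the value P0M-001 would observe."""
        return self.clock() - self.get_or_create()
```

A restart can be simulated by constructing a second store on the same path with a later clock: it should return the original timestamp rather than a new one.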