P0 Product-Level Metrics and Dashboard

Module

Telemetry

Status

IMPLEMENTED

Description

Four P0 product-level metrics are instrumented: time-to-first-verified-release, mean-time-to-answer-why-blocked, support-minutes-per-customer, and determinism-regressions-total. They are accompanied by Prometheus alerting rules and an install timestamp tracking service.

Implementation Details

  • P0ProductMetrics: src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/P0ProductMetrics.cs -- meter StellaOps.P0Metrics with 4 P0 metrics:
    • P0M-001: stella_time_to_first_verified_release_seconds -- histogram with buckets from 5 minutes to 1 week
    • P0M-002: stella_why_blocked_latency_seconds -- mean time to answer "why blocked"
    • P0M-003: stella_support_burden_minutes_total -- counter of support minutes per customer
    • P0M-004: stella_determinism_regressions_total -- determinism regression counter
  • InstallTimestampService: src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/InstallTimestampService.cs -- tracks fresh install timestamp for P0M-001
  • GoldenSignalMetrics: src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/GoldenSignalMetrics.cs -- golden signal metrics (latency, traffic, errors, saturation)
  • FidelityMetricsTelemetry: src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/FidelityMetricsTelemetry.cs -- fidelity metrics for evidence quality
  • FidelitySloAlertingService: src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/FidelitySloAlertingService.cs -- SLO alerting for fidelity metrics
  • ProofCoverageMetrics: src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/ProofCoverageMetrics.cs -- proof coverage tracking
  • ProofGenerationMetrics: src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/ProofGenerationMetrics.cs -- proof generation performance
  • Tests: src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/GoldenSignalMetricsTests.cs, ProofCoverageMetricsTests.cs
  • Source: SPRINT_20260117_028_Telemetry
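
To illustrate how a histogram such as P0M-001 buckets elapsed time, here is a minimal Python sketch of a Prometheus-style cumulative histogram. It is not the code in P0ProductMetrics.cs. Only the 5-minute and 1-week endpoints come from the description above. The intermediate bucket boundaries, the `Histogram` class, and the `time_to_first_verified_release` helper are assumptions made for illustration.

```python
import bisect

MINUTE = 60
HOUR = 60 * MINUTE
DAY = 24 * HOUR

# Hypothetical upper bounds in seconds. Only the first (5 minutes) and the
# last (1 week) are taken from the feature description; the rest are assumed.
BUCKETS_SECONDS = [5 * MINUTE, 15 * MINUTE, HOUR, 4 * HOUR, DAY, 3 * DAY, 7 * DAY]


class Histogram:
    """Minimal cumulative histogram in the Prometheus style (le = upper bound)."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        # One slot per finite bound plus a final +Inf slot.
        self.counts = [0] * (len(self.bounds) + 1)
        self.total = 0.0

    def observe(self, value):
        # bisect_left returns the index of the first bound >= value,
        # which is the smallest bucket whose upper bound includes the value.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += value

    def cumulative(self):
        """Return (upper_bound, cumulative_count) pairs, ending with +Inf."""
        out, running = [], 0
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            running += count
            out.append((bound, running))
        return out


def time_to_first_verified_release(install_ts, release_ts):
    """Elapsed seconds between install and the first verified release."""
    if release_ts < install_ts:
        raise ValueError("release timestamp precedes install timestamp")
    return release_ts - install_ts
```

An observation of exactly 300 seconds lands in the 5-minute bucket, while anything longer than a week is counted only in the +Inf bucket. That is why the upper end of the range matters when alerting on slow first releases.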

E2E Test Plan

  • Verify that the time-to-first-verified-release histogram records the elapsed time since install
  • Test that the why-blocked latency metric captures the time from block to explanation delivery
  • Verify that the support minutes counter increments per customer interaction
  • Test that the determinism regression counter increments when replay divergence is detected
  • Verify that the Prometheus alerting rules trigger on SLO breaches
  • Test that the install timestamp service persists and recovers the install time across restarts
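
The last item in the plan, persistence across restarts, can be sketched as follows. This is a hedged Python illustration, not the behaviour of InstallTimestampService.cs. The `InstallTimestampStore` class, the JSON file layout, and the `installed_at` key are all hypothetical. The idea shown is that the first call records the timestamp, and later instances, standing in for restarts, read it back rather than overwrite it.

```python
import json
import os
import tempfile
import time


class InstallTimestampStore:
    """Hypothetical file-backed store for a write-once install timestamp."""

    def __init__(self, path, clock=time.time):
        self.path = path
        self.clock = clock  # injectable so tests can control "now"

    def get_or_create(self):
        # If a timestamp was already persisted, return it unchanged.
        try:
            with open(self.path, "r", encoding="utf-8") as fh:
                return float(json.load(fh)["installed_at"])
        except FileNotFoundError:
            pass

        # Fresh install: record the current time and write it atomically
        # (temp file in the same directory, then rename over the target).
        installed_at = float(self.clock())
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump({"installed_at": installed_at}, fh)
        os.replace(tmp_path, self.path)
        return installed_at
```

An end-to-end check along these lines would create the store, read the timestamp, build a second store on the same path with a later clock, and assert that both reads return the original value.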