Partly implemented or unimplemented features: now implemented
@@ -18,9 +18,9 @@

## 3) Pipelines & Guardrails

- **Redaction.** Attribute processors strip PII and secrets based on policy-managed allowed keys. Redaction profiles are mirrored in the Offline Kit.
- **Sampling.** Tail sampling by service/error; incident mode (triggered by Orchestrator) promotes services to 100% sampling, extends retention, and toggles Notify alerts.
- **Alerting.** Prometheus rules and dashboards are packaged with Export Center: service SLOs, queue depth, policy run latency, ingestion AOC violations.
- **Sealed-mode guard.** `StellaOps.Telemetry.Core` enforces `IEgressPolicy` on OTLP exporters; when air-gap mode is sealed, any non-loopback collector endpoints are automatically disabled and a structured warning with remediation is emitted.
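
The guardrails above can be illustrated with a small, language-neutral sketch. It is written in Python purely for illustration; the real logic lives in the .NET `StellaOps.Telemetry.Core` module and may differ. Every function name and parameter below (`redact`, `should_sample`, `is_loopback`, `sealed_endpoints`) is hypothetical, and the rule that error traces are always kept is an assumption rather than documented behaviour.

```python
import ipaddress
from urllib.parse import urlparse


def redact(attributes: dict, allowed_keys: set) -> dict:
    """Keep only attributes whose key is on the policy-managed allow-list."""
    return {k: v for k, v in attributes.items() if k in allowed_keys}


def should_sample(service: str, is_error: bool,
                  incident_services: set, keep_baseline: bool) -> bool:
    """Tail-sampling decision, taken once the whole trace is known."""
    if service in incident_services:
        return True  # incident mode: 100% sampling for promoted services
    if is_error:
        return True  # assumption: traces containing errors are always kept
    return keep_baseline  # otherwise defer to the baseline sampler


def is_loopback(endpoint: str) -> bool:
    """True if the endpoint URL points at localhost or a loopback IP."""
    host = urlparse(endpoint).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # any other host name is treated as remote


def sealed_endpoints(endpoints: list, sealed: bool) -> tuple:
    """Return (enabled endpoints, structured warnings) for the given mode."""
    if not sealed:
        return list(endpoints), []
    kept = [e for e in endpoints if is_loopback(e)]
    warnings = [
        {"endpoint": e,
         "reason": "non-loopback endpoint disabled in sealed mode",
         "remediation": "point the exporter at a loopback collector"}
        for e in endpoints if not is_loopback(e)
    ]
    return kept, warnings
```

Returning the warnings as data, rather than logging them inside the function, keeps the sketch easy to test; a real exporter would presumably route them to its structured logger.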
## 4) APIs & integration
@@ -39,4 +39,66 @@
- Meta-metrics: `collector_export_failures_total`, `telemetry_bundle_generation_seconds`, `telemetry_incident_mode{state}`.
- Health endpoints for collectors and storage clusters, plus dashboards for ingestion rate, retention, and rule evaluations.

Refer to the module README and implementation plan for immediate context, and update this document once component boundaries and data flows are finalised.

## 7) DORA Metrics

Stella Ops tracks the four key DORA (DevOps Research and Assessment) metrics for software delivery performance:

### 7.1) Metrics Tracked

- **Deployment Frequency** (`dora_deployments_total`, `dora_deployment_frequency_per_day`): How often deployments occur per day/week.
- **Lead Time for Changes** (`dora_lead_time_hours`): Time from commit to deployment in production.
- **Change Failure Rate (CFR)** (`dora_deployment_failure_total`, `dora_change_failure_rate_percent`): Percentage of deployments that fail or require a rollback or hotfix.
- **Mean Time to Recovery (MTTR)** (`dora_time_to_recovery_hours`): Average time to recover from incidents.
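
As a rough illustration of how these four figures relate to raw events, the following Python sketch derives them from a list of deployments and incidents. It is not the module's implementation: the `Deployment` and `Incident` records, the `dora_summary` function, and the choice of a plain arithmetic mean are all assumptions made for the example, and the result keys only loosely echo the metric names above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Deployment:  # hypothetical event shape
    committed_at: datetime
    deployed_at: datetime
    failed: bool  # rollback, hotfix, or outright failure


@dataclass(frozen=True)
class Incident:  # hypothetical event shape
    started_at: datetime
    resolved_at: Optional[datetime]  # None while the incident is still open


def _hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_summary(deployments, incidents, window_days: int) -> dict:
    """Aggregate the four DORA figures over a window of `window_days` days."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    count = len(deployments)
    lead = [_hours(d.deployed_at - d.committed_at) for d in deployments]
    # Only resolved incidents contribute to time-to-recovery.
    recovery = [_hours(i.resolved_at - i.started_at)
                for i in incidents if i.resolved_at is not None]
    failures = sum(1 for d in deployments if d.failed)
    return {
        "deployment_frequency_per_day": count / window_days,
        "lead_time_hours": sum(lead) / count if count else None,
        "change_failure_rate_percent": 100 * failures / count if count else None,
        "time_to_recovery_hours": sum(recovery) / len(recovery) if recovery else None,
    }


# Example usage with two deployments and one resolved incident.
start = datetime(2026, 2, 1, 12, 0)
summary = dora_summary(
    [Deployment(start, start + timedelta(hours=2), failed=False),
     Deployment(start, start + timedelta(hours=6), failed=True)],
    [Incident(start, start + timedelta(hours=3)), Incident(start, None)],
    window_days=2,
)
```

Empty inputs yield `None` rather than zero so that "no data" is not mistaken for a perfect score.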

### 7.2) Performance Classification

The system classifies teams into DORA performance levels:

- **Elite**: On-demand deployments, <24h lead time, <15% change failure rate (CFR), <1h MTTR
- **High**: Weekly deployments, <1 week lead time, <30% CFR, <1 day MTTR
- **Medium**: Monthly deployments, <6 months lead time, <45% CFR, <1 week MTTR
- **Low**: Quarterly or less frequent deployments with higher failure rates
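
One way to read the levels above is as a cascade of thresholds in which a team receives the best level whose limits it meets on every dimension. The Python sketch below encodes that reading. It rests on several assumptions not stated in this document: "on-demand" is taken as at least one deployment per day, "weekly" and "monthly" as at least one per 7 and 30 days, a month as 30 days, and the all-dimensions-must-pass rule itself. The `classify` function and `THRESHOLDS` table are hypothetical names.

```python
# level -> (min deployments/day, max lead time h, max CFR %, max MTTR h)
THRESHOLDS = {
    "Elite": (1.0, 24, 15, 1),
    "High": (1 / 7, 24 * 7, 30, 24),
    "Medium": (1 / 30, 24 * 30 * 6, 45, 24 * 7),
}


def classify(deploys_per_day: float, lead_time_hours: float,
             cfr_percent: float, mttr_hours: float) -> str:
    """Return the best level whose thresholds are met on all four metrics."""
    for level in ("Elite", "High", "Medium"):
        min_freq, max_lead, max_cfr, max_mttr = THRESHOLDS[level]
        if (deploys_per_day >= min_freq
                and lead_time_hours < max_lead
                and cfr_percent < max_cfr
                and mttr_hours < max_mttr):
            return level
    return "Low"  # anything that misses the Medium limits
```

Under this reading a single weak dimension caps the overall level, so a team that deploys daily but has a 20% change failure rate lands in High rather than Elite.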

### 7.3) Integration Points

- `IDoraMetricsService`: Service interface for recording deployments and incidents
- `DoraMetrics`: OpenTelemetry-style metrics class with SLO breach tracking
- DI registration: `services.AddDoraMetrics(options => { ... })`
- Events are recorded when Release Orchestrator completes promotions or rollbacks

### 7.4) SLO Tracking

SLO targets are configurable via `DoraMetricsOptions`:

- `LeadTimeSloHours` (default: 24)
- `DeploymentFrequencySloPerDay` (default: 1)
- `ChangeFailureRateSloPercent` (default: 15)
- `MttrSloHours` (default: 1)

SLO breaches are recorded as `dora_slo_breach_total` with a `metric` label.
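
To make the breach semantics concrete, here is a minimal Python sketch that compares measured values against the default targets listed above. The `DoraSloOptions` dataclass only mirrors the documented option names and defaults; it is not the real `DoraMetricsOptions` type. The `metric` label values and the strict comparisons (a value exactly on target is not a breach) are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DoraSloOptions:  # hypothetical mirror of the documented defaults
    lead_time_slo_hours: float = 24
    deployment_frequency_slo_per_day: float = 1
    change_failure_rate_slo_percent: float = 15
    mttr_slo_hours: float = 1


def slo_breaches(opts: DoraSloOptions, lead_time_hours: float,
                 deployments_per_day: float, cfr_percent: float,
                 mttr_hours: float) -> list:
    """Return the assumed `metric` label value for each breached SLO."""
    breaches = []
    if lead_time_hours > opts.lead_time_slo_hours:
        breaches.append("lead_time")
    # Deployment frequency is the only target where lower is worse.
    if deployments_per_day < opts.deployment_frequency_slo_per_day:
        breaches.append("deployment_frequency")
    if cfr_percent > opts.change_failure_rate_slo_percent:
        breaches.append("change_failure_rate")
    if mttr_hours > opts.mttr_slo_hours:
        breaches.append("mttr")
    return breaches
```

Each returned label would correspond to one increment of a breach counter; relaxing a target, for example `DoraSloOptions(mttr_slo_hours=4)`, removes the matching breach without touching the others.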

### 7.5) Outcome Analytics and Attribution (Sprint 20260208_065)

Telemetry now includes deterministic executive outcome attribution built on top of the existing DORA event stream:

- `IOutcomeAnalyticsService` (`src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/IOutcomeAnalyticsService.cs`)
- `DoraOutcomeAnalyticsService` (`src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/DoraOutcomeAnalyticsService.cs`)
- Outcome report models (`src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/OutcomeAnalyticsModels.cs`)

Outcome attribution behavior:

- Produces `OutcomeExecutiveReport` for a fixed tenant/environment/time window with deterministic ordering.
- Adds MTTA support via `DoraIncidentEvent.AcknowledgedAt` and `TimeToAcknowledge`.
- Groups deployment outcomes by normalized pipeline (`pipeline-a`, `pipeline-b`, `unknown`) with per-pipeline change failure rate and median lead time.
- Groups incidents by severity with resolved/acknowledged counts plus MTTA/MTTR aggregates.
- Produces daily cohort slices across the requested date range for executive trend views.
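
The per-pipeline grouping described above can be sketched as follows. This Python example is illustrative only and does not reproduce `DoraOutcomeAnalyticsService`: the normalization rule (trim, lowercase, fall back to `unknown`), the tuple input shape, and the function names are assumptions. Sorting the group keys is one simple way to obtain a deterministic ordering.

```python
from collections import defaultdict
from statistics import median


def normalize_pipeline(name) -> str:
    """Assumed rule: trim and lowercase, mapping missing names to 'unknown'."""
    cleaned = (name or "").strip().lower()
    return cleaned or "unknown"


def pipeline_outcomes(deployments) -> list:
    """Group (pipeline, lead_time_hours, failed) tuples by pipeline.

    Returns one row per normalized pipeline, sorted by name so that the
    same input always produces the same report order.
    """
    groups = defaultdict(list)
    for pipeline, lead_hours, failed in deployments:
        groups[normalize_pipeline(pipeline)].append((lead_hours, failed))

    report = []
    for pipeline in sorted(groups):
        rows = groups[pipeline]
        failures = sum(1 for _, failed in rows if failed)
        report.append({
            "pipeline": pipeline,
            "deployments": len(rows),
            "change_failure_rate_percent": 100 * failures / len(rows),
            "median_lead_time_hours": median(lead for lead, _ in rows),
        })
    return report
```

The median is used for lead time, as in the report description, because a single slow deployment would otherwise dominate a per-pipeline mean.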

Dependency injection integration:

- `TelemetryServiceCollectionExtensions.AddDoraMetrics(...)` now also registers `IOutcomeAnalyticsService`, so existing telemetry entry points automatically expose attribution reporting without additional module wiring.

Verification coverage:

- `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/OutcomeAnalyticsServiceTests.cs`
- `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/DoraMetricsServiceTests.cs`
- The full telemetry core test suite (`262` tests) remains green after integration.