partly or unimplemented features - now implemented

This commit is contained in:
master
2026-02-09 08:53:51 +02:00
parent 1bf6bbf395
commit 4bdc298ec1
674 changed files with 90194 additions and 2271 deletions


@@ -18,9 +18,9 @@
## 3) Pipelines & Guardrails
- **Redaction.** Attribute processors strip PII and secrets according to policy-managed allowed keys. Redaction profiles are mirrored in the Offline Kit.
- **Sampling.** Tail sampling by service/error; incident mode (triggered by Orchestrator) promotes services to 100% sampling, extends retention, and toggles Notify alerts.
- **Alerting.** Prometheus rules and dashboards are packaged with Export Center and cover service SLOs, queue depth, policy run latency, and ingestion AOC violations.
- **Sealed-mode guard.** `StellaOps.Telemetry.Core` enforces `IEgressPolicy` on OTLP exporters. When air-gap mode is sealed, any non-loopback collector endpoints are automatically disabled and a structured warning with remediation guidance is emitted.
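The sealed-mode rule can be illustrated with a minimal Python sketch. This is not the `StellaOps.Telemetry.Core` implementation, which is .NET and goes through `IEgressPolicy`. The function names, the warning record shape, and the decision to treat every hostname other than `localhost` as remote are assumptions made for illustration.

```python
import ipaddress
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class GuardResult:
    allowed: list = field(default_factory=list)   # exporters left enabled
    disabled: list = field(default_factory=list)  # exporters switched off
    warnings: list = field(default_factory=list)  # structured warning records


def is_loopback(endpoint: str) -> bool:
    """True when the endpoint URL's host is `localhost` or a loopback IP literal.

    Endpoints must include a scheme (e.g. "http://127.0.0.1:4317") so that
    urlparse can extract the host.
    """
    host = urlparse(endpoint).hostname or ""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname is treated as remote (assumption: conservative when sealed).
        return False


def apply_sealed_mode_guard(endpoints, sealed: bool) -> GuardResult:
    """Disable non-loopback OTLP endpoints when sealed; hypothetical helper."""
    result = GuardResult()
    for ep in endpoints:
        if not sealed or is_loopback(ep):
            result.allowed.append(ep)
            continue
        result.disabled.append(ep)
        result.warnings.append({
            "event": "otlp_exporter_disabled",  # hypothetical event name
            "endpoint": ep,
            "reason": "air-gap mode is sealed; non-loopback egress is blocked",
            "remediation": "route telemetry to a loopback collector or unseal the environment",
        })
    return result
```

Because the guard only filters the endpoint list and records warnings, it fails closed. A sealed deployment never attempts a remote export, but it still reports which exporters were switched off and why.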
## 4) APIs & integration
@@ -39,4 +39,66 @@
- Meta-metrics: `collector_export_failures_total`, `telemetry_bundle_generation_seconds`, `telemetry_incident_mode{state}`.
- Health endpoints for collectors and storage clusters, plus dashboards for ingestion rate, retention, and rule evaluations.
Refer to the module README and implementation plan for immediate context, and update this document once component boundaries and data flows are finalised.
## 7) DORA Metrics
Stella Ops tracks the four key DORA (DevOps Research and Assessment) metrics for software delivery performance:
### 7.1) Metrics Tracked
- **Deployment Frequency** (`dora_deployments_total`, `dora_deployment_frequency_per_day`) — How often deployments occur per day/week.
- **Lead Time for Changes** (`dora_lead_time_hours`) — Time from commit to deployment in production.
- **Change Failure Rate** (`dora_deployment_failure_total`, `dora_change_failure_rate_percent`) — Percentage of deployments requiring rollback, hotfix, or failing.
- **Mean Time to Recovery (MTTR)** (`dora_time_to_recovery_hours`) — Average time to recover from incidents.
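The following minimal Python sketch shows how the four metrics can be derived from raw deployment and incident events. The `Deployment` and `Incident` record shapes and `dora_summary` are hypothetical; they are not the `IDoraMetricsService` API. Lead time is summarized as a median, a common DORA convention. MTTR is a mean, matching the "average time" wording above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median

HOUR = timedelta(hours=1)


@dataclass
class Deployment:
    committed_at: datetime  # commit time of the change being deployed
    deployed_at: datetime   # time the change reached production
    failed: bool            # rolled back, hotfixed, or failed outright


@dataclass
class Incident:
    started_at: datetime
    resolved_at: datetime


def dora_summary(deployments, incidents, window_days: float) -> dict:
    """Summarize the four DORA metrics over a window of `window_days` days."""
    n = len(deployments)
    lead_hours = [(d.deployed_at - d.committed_at) / HOUR for d in deployments]
    recovery_hours = [(i.resolved_at - i.started_at) / HOUR for i in incidents]
    return {
        "deployment_frequency_per_day": n / window_days,
        "median_lead_time_hours": median(lead_hours) if lead_hours else None,
        "change_failure_rate_percent": (
            100.0 * sum(d.failed for d in deployments) / n if n else None
        ),
        "mttr_hours": (
            sum(recovery_hours) / len(recovery_hours) if recovery_hours else None
        ),
    }
```

Returning `None` for an empty window keeps "no data" distinct from a genuine 0% failure rate or zero-hour recovery.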
### 7.2) Performance Classification
The system classifies teams into DORA performance levels:
- **Elite**: On-demand deployments, <24h lead time, <15% CFR, <1h MTTR
- **High**: Weekly deployments, <1 week lead time, <30% CFR, <1 day MTTR
- **Medium**: Monthly deployments, <6 months lead time, <45% CFR, <1 week MTTR
- **Low**: Quarterly or less frequent deployments with higher failure rates
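The tiers above can be read as a first-match threshold table, sketched below in Python. Two points are assumptions rather than documented behavior. First, "on-demand", "weekly", and "monthly" are mapped to minimum rates of 1, 1/7, and 1/30 deployments per day, and six months is approximated as 183 days. Second, a team must satisfy all four thresholds of a tier to be placed in it.

```python
# (level, min deployments/day, max lead time h, max CFR %, max MTTR h)
# Frequency floors and the 183-day "6 months" are illustrative assumptions.
TIERS = [
    ("Elite", 1.0, 24.0, 15.0, 1.0),
    ("High", 1 / 7, 7 * 24.0, 30.0, 24.0),
    ("Medium", 1 / 30, 183 * 24.0, 45.0, 7 * 24.0),
]


def classify(deploys_per_day, lead_time_hours, cfr_percent, mttr_hours) -> str:
    """Return the first (best) tier whose thresholds are all met, else "Low"."""
    for level, min_freq, max_lead, max_cfr, max_mttr in TIERS:
        if (deploys_per_day >= min_freq
                and lead_time_hours < max_lead
                and cfr_percent < max_cfr
                and mttr_hours < max_mttr):
            return level
    return "Low"
```

With this "weakest link" rule, a team that deploys many times a day but has a 20% change failure rate is classified as High, not Elite.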
### 7.3) Integration Points
- `IDoraMetricsService`: service interface for recording deployments and incidents
- `DoraMetrics`: OpenTelemetry-style metrics class with SLO breach tracking
- DI registration: `services.AddDoraMetrics(options => { ... })`
- Events are recorded when Release Orchestrator completes promotions or rollbacks
### 7.4) SLO Tracking
Configurable SLO targets via `DoraMetricsOptions`:
- `LeadTimeSloHours` (default: 24)
- `DeploymentFrequencySloPerDay` (default: 1)
- `ChangeFailureRateSloPercent` (default: 15)
- `MttrSloHours` (default: 1)
SLO breaches are recorded in the `dora_slo_breach_total` counter, labeled by `metric`.
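The breach check can be sketched in Python as below. The defaults mirror the documented `DoraMetricsOptions` values, and a `Counter` keyed by metric name stands in for `dora_slo_breach_total{metric=...}`. The breach directions, the strict comparisons, and the label values (`lead_time`, `deployment_frequency`, `change_failure_rate`, `mttr`) are assumptions. Lead time, change failure rate, and MTTR breach when above target; deployment frequency breaches when below target.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class DoraSloTargets:
    # Defaults mirror the documented DoraMetricsOptions values.
    lead_time_slo_hours: float = 24.0
    deployment_frequency_slo_per_day: float = 1.0
    change_failure_rate_slo_percent: float = 15.0
    mttr_slo_hours: float = 1.0


# Stand-in for the dora_slo_breach_total counter, keyed by the `metric` label.
slo_breach_total: Counter = Counter()


def record_slo_breaches(targets, lead_time_hours, deploys_per_day,
                        cfr_percent, mttr_hours) -> list[str]:
    """Increment the breach counter for each metric that misses its target."""
    breaches = []
    if lead_time_hours > targets.lead_time_slo_hours:
        breaches.append("lead_time")
    if deploys_per_day < targets.deployment_frequency_slo_per_day:
        breaches.append("deployment_frequency")
    if cfr_percent > targets.change_failure_rate_slo_percent:
        breaches.append("change_failure_rate")
    if mttr_hours > targets.mttr_slo_hours:
        breaches.append("mttr")
    for metric in breaches:
        slo_breach_total[metric] += 1
    return breaches
```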
### 7.5) Outcome Analytics and Attribution (Sprint 20260208_065)
Telemetry now includes deterministic executive outcome attribution built on top of the existing DORA event stream:
- `IOutcomeAnalyticsService` (`src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/IOutcomeAnalyticsService.cs`)
- `DoraOutcomeAnalyticsService` (`src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/DoraOutcomeAnalyticsService.cs`)
- Outcome report models (`src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/OutcomeAnalyticsModels.cs`)
Outcome attribution behavior:
- Produces `OutcomeExecutiveReport` for a fixed tenant/environment/time window with deterministic ordering.
- Adds MTTA (mean time to acknowledge) support via `DoraIncidentEvent.AcknowledgedAt` and `TimeToAcknowledge`.
- Groups deployment outcomes by normalized pipeline (`pipeline-a`, `pipeline-b`, `unknown`) with per-pipeline change failure rate and median lead time.
- Groups incidents by severity with resolved/acknowledged counts plus MTTA/MTTR aggregates.
- Produces daily cohort slices across the requested date range for executive trend views.
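The pipeline grouping and daily cohort slicing can be sketched in Python as follows. This is not `DoraOutcomeAnalyticsService`. The `DeploymentEvent` shape, the helper names, and the normalization rule (trim and lowercase, with blank or missing names mapped to `unknown`) are assumptions. Deterministic ordering comes from sorting pipeline keys and emitting one slice for every calendar day in the range, including days with no deployments.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class DeploymentEvent:
    pipeline: Optional[str]
    deployed_at: datetime
    lead_time_hours: float
    failed: bool


def normalize_pipeline(name: Optional[str]) -> str:
    # Assumed rule: trim + lowercase; missing or blank names fall into "unknown".
    cleaned = (name or "").strip().lower()
    return cleaned or "unknown"


def pipeline_outcomes(events):
    """Per-pipeline change failure rate and median lead time, sorted by pipeline."""
    groups = defaultdict(list)
    for e in events:
        groups[normalize_pipeline(e.pipeline)].append(e)
    rows = []
    for pipeline in sorted(groups):  # sorted keys give a stable, deterministic order
        evs = groups[pipeline]
        rows.append({
            "pipeline": pipeline,
            "deployments": len(evs),
            "change_failure_rate_percent": 100.0 * sum(e.failed for e in evs) / len(evs),
            "median_lead_time_hours": median(e.lead_time_hours for e in evs),
        })
    return rows


def daily_cohorts(events, start: date, end: date):
    """One (day, deployment count) slice per calendar day in [start, end]."""
    counts = defaultdict(int)
    for e in events:
        counts[e.deployed_at.date()] += 1
    days = (end - start).days + 1
    return [(start + timedelta(days=i), counts[start + timedelta(days=i)])
            for i in range(days)]
```

Emitting zero-count days explicitly keeps executive trend charts aligned to the requested range, instead of silently skipping quiet days.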
Dependency injection integration:
- `TelemetryServiceCollectionExtensions.AddDoraMetrics(...)` now also registers `IOutcomeAnalyticsService`, so existing telemetry entry points automatically expose attribution reporting without additional module wiring.
Verification coverage:
- `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/OutcomeAnalyticsServiceTests.cs`
- `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/DoraMetricsServiceTests.cs`
- The full telemetry core test suite (262 tests) still passes after integration.