save checkpoint
32
docs/features/checked/telemetry/dora-metrics.md
Normal file
@@ -0,0 +1,32 @@
# DORA Metrics

## Module
Telemetry

## Status
IMPLEMENTED

## Description
DORA (DevOps Research and Assessment) metrics implementation tracking the four key metrics: Deployment Frequency, Lead Time for Changes, Change Failure Rate (CFR), and Mean Time to Recovery (MTTR), with SLO breach tracking and performance classification.

## Implementation Details
- **DoraMetrics**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/DoraMetrics.cs` -- OpenTelemetry meter `StellaOps.DORA` with counters for deployments, successes, failures, incidents, and resolutions, plus histograms for deployment duration, lead time, and MTTR; includes an SLO breach counter and performance level classification (Elite/High/Medium/Low/Unknown)
- **DoraMetricsModels**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/DoraMetricsModels.cs` -- DoraMetricsOptions, DoraPerformanceLevel enum, DoraDeploymentOutcome enum, DoraIncidentSeverity enum, DoraDeploymentEvent record, DoraIncidentEvent record, DoraSummary record
- **IDoraMetricsService**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/IDoraMetricsService.cs` -- service interface for recording deployments, recording and resolving incidents, getting summaries, and querying events
- **InMemoryDoraMetricsService**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/InMemoryDoraMetricsService.cs` -- in-memory implementation with per-tenant isolation, median lead time calculation, CFR computation, MTTR aggregation, and environment-level filtering
- **DI Registration**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryServiceCollectionExtensions.cs` -- `AddDoraMetrics()` extension method registering DoraMetrics, IDoraMetricsService, and IOutcomeAnalyticsService
- **Tests**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/DoraMetricsTests.cs` (11 test cases), `DoraMetricsServiceTests.cs` (11 test cases)
- **Source**: Feature matrix scan + QA verification

## Verified Behaviors
- Deployment recording emits `dora_deployments_total`, `dora_deployment_success_total`, `dora_deployment_duration_seconds`, and `dora_lead_time_hours`
- Rollback, hotfix, and failed outcomes emit `dora_deployment_failure_total`
- Lead time SLO breach emits `dora_slo_breach_total` with the `metric=lead_time` tag
- MTTR SLO breach emits `dora_slo_breach_total` with the `metric=mttr` tag
- Incident tracking with start/resolution lifecycle
- Performance classification across all four DORA levels (Elite, High, Medium, Low)
- Summary calculation with deployment frequency, CFR, median lead time, and MTTR
- Per-tenant and per-environment isolation
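
The summary arithmetic listed above can be sketched in a few lines. This is a language-neutral illustration in Python, not the C# code in `InMemoryDoraMetricsService`; the `Deployment` type, the `summarize` function, and the use of a plain mean for MTTR are assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import median

# Outcomes counted as failures, following the behavior list above.
FAILURE_OUTCOMES = {"rollback", "hotfix", "failed"}


@dataclass(frozen=True)
class Deployment:  # hypothetical stand-in for DoraDeploymentEvent
    outcome: str
    lead_time_hours: float


def summarize(deployments, incident_recovery_hours, period_days):
    """Return deployment frequency, CFR, median lead time, and MTTR."""
    total = len(deployments)
    failed = sum(1 for d in deployments if d.outcome in FAILURE_OUTCOMES)
    recoveries = list(incident_recovery_hours)
    return {
        "deployments_per_day": total / period_days,
        "change_failure_rate": failed / total if total else 0.0,
        "median_lead_time_hours": (
            median(d.lead_time_hours for d in deployments) if total else None
        ),
        # Assumption: MTTR is the arithmetic mean of recovery durations.
        "mttr_hours": sum(recoveries) / len(recoveries) if recoveries else None,
    }
```

Using the median for lead time keeps one unusually slow change from dominating the figure, which is presumably why the service reports a median rather than a mean.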

## QA Notes
- Bug fix applied: `DoraMetricsTests._measurements` changed from `List<>` to `ConcurrentBag<>` to fix a race condition in `MeterListener` callbacks
26
docs/features/checked/telemetry/incident-forensic-mode.md
Normal file
@@ -0,0 +1,26 @@
# Incident/Forensic Mode (High-Fidelity Sampling)

## Module
Telemetry

## Status
IMPLEMENTED

## Description
Incident/forensic mode service that enables high-fidelity (100%) sampling during security incidents for detailed investigation.

## Implementation Details
- **IIncidentModeService interface**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/IIncidentModeService.cs` -- `IsActive`, `CurrentState`, `ActivateAsync` (actor, tenantId, TTL override, reason), `DeactivateAsync`; manages incident mode state with per-tenant granularity
- **IncidentModeService**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/IncidentModeService.cs` -- default implementation with activation/deactivation lifecycle
- **IncidentModeOptions**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/IncidentModeOptions.cs` -- configurable default TTL and sampling rates
- **ISealedModeTelemetryService**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/ISealedModeTelemetryService.cs` -- `IsIncidentModeOverrideActive` property enables incident mode to override sealed mode sampling rate
- **Tests**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/IncidentModeServiceTests.cs`
- **Source**: Feature matrix scan
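
The activation lifecycle described above (per-tenant state, a default TTL with an optional override, and 100% sampling while active) might look roughly like the following. It is a Python illustration rather than the C# `IncidentModeService`; the class name, the default values, and the injected clock are all assumptions.

```python
import time


class IncidentModeSketch:
    """Per-tenant incident mode with TTL expiry (illustrative only)."""

    def __init__(self, default_ttl_seconds=1800.0, normal_rate=0.1, clock=time.monotonic):
        self._default_ttl = default_ttl_seconds  # assumed default, not from the source
        self._normal_rate = normal_rate          # assumed normal sampling rate
        self._clock = clock
        self._expires_at = {}  # tenant_id -> expiry on the injected clock

    def activate(self, tenant_id, ttl_override_seconds=None):
        ttl = self._default_ttl if ttl_override_seconds is None else ttl_override_seconds
        self._expires_at[tenant_id] = self._clock() + ttl

    def deactivate(self, tenant_id):
        self._expires_at.pop(tenant_id, None)

    def is_active(self, tenant_id):
        expiry = self._expires_at.get(tenant_id)
        return expiry is not None and self._clock() < expiry

    def sampling_rate(self, tenant_id):
        # Full-fidelity sampling while the tenant's incident window is open.
        return 1.0 if self.is_active(tenant_id) else self._normal_rate
```

Injecting the clock lets a test advance time without sleeping, which is one way to exercise the TTL expiry item in the test plan.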

## E2E Test Plan
- [ ] Verify incident mode activation increases sampling rate to 100%
- [ ] Test TTL override correctly expires incident mode after configured duration
- [ ] Verify incident mode tags are attached to all telemetry during active period
- [ ] Test incident mode overrides sealed mode sampling restrictions
- [ ] Verify deactivation restores normal sampling rates
- [ ] Test per-tenant incident mode isolation
23
docs/features/checked/telemetry/metric-label-analyzer.md
Normal file
@@ -0,0 +1,23 @@
# Metric Label Analyzer (Static Analysis)

## Module
Telemetry

## Status
IMPLEMENTED

## Description
Roslyn-based analyzer that validates metric label usage at compile time to prevent telemetry cardinality issues.

## Implementation Details
- **MetricLabelAnalyzer**: `src/Telemetry/StellaOps.Telemetry.Analyzers/MetricLabelAnalyzer.cs` -- Roslyn-based DiagnosticAnalyzer that validates metric label usage at compile time; detects high-cardinality labels, missing required labels, and naming convention violations
- **MetricLabelGuard**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/MetricLabelGuard.cs` -- runtime guard that validates metric labels before emission
- **Tests**: `src/Telemetry/StellaOps.Telemetry.Analyzers/StellaOps.Telemetry.Analyzers.Tests/MetricLabelAnalyzerTests.cs`, `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/MetricLabelGuardTests.cs`
- **Source**: Feature matrix scan
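
To make the three checks concrete (required labels, naming convention, cardinality), here is a minimal runtime-guard sketch in Python. It is not the C# `MetricLabelGuard`: the snake_case rule, the distinct-value threshold, and the `LabelGuardSketch` name are assumptions, and the required label set is taken from the test plan in this document.

```python
import re

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9_]*$")  # assumed naming convention


class LabelGuardSketch:
    def __init__(self, max_distinct_values=3, required=("tenant", "service", "environment")):
        self._max = max_distinct_values  # assumed cardinality threshold
        self._required = required
        self._seen = {}  # label name -> set of values observed so far

    def validate(self, labels):
        """Return a list of violation strings; an empty list means the labels pass."""
        problems = [
            f"missing required label: {name}"
            for name in self._required
            if name not in labels
        ]
        for name, value in labels.items():
            if not SNAKE_CASE.match(name):
                problems.append(f"bad label name: {name}")
            seen = self._seen.setdefault(name, set())
            if value not in seen and len(seen) >= self._max:
                # A new value would push this label past the threshold.
                problems.append(f"cardinality limit exceeded: {name}")
            else:
                seen.add(value)
        return problems
```

A compile-time analyzer can only flag label values that look unbounded in source, so a runtime check like this one complements it by counting the values actually observed.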

## E2E Test Plan
- [ ] Verify Roslyn analyzer detects high-cardinality metric labels at compile time
- [ ] Test analyzer flags missing required labels (tenant, service, environment)
- [ ] Verify naming convention violations produce diagnostic warnings
- [ ] Test runtime MetricLabelGuard rejects labels exceeding cardinality thresholds
- [ ] Verify analyzer integrates with CI build pipeline for automated enforcement
29
docs/features/checked/telemetry/opentelemetry-integration.md
Normal file
@@ -0,0 +1,29 @@
# OpenTelemetry Integration

## Module
Telemetry

## Status
IMPLEMENTED

## Description
OpenTelemetry-based telemetry infrastructure with configurable options and custom exporters, including the TTE percentile exporter.

## Implementation Details
- **StellaOpsTelemetryOptions**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/StellaOpsTelemetryOptions.cs` -- configurable OTEL options with `CollectorOptions` for endpoint, protocol, and component
- **TelemetryServiceCollectionExtensions**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryServiceCollectionExtensions.cs` -- DI registration for OTEL tracing, metrics, and logging
- **TelemetryApplicationBuilderExtensions**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryApplicationBuilderExtensions.cs` -- middleware pipeline integration
- **TelemetryServiceDescriptor**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryServiceDescriptor.cs` -- service identity for telemetry tagging
- **TelemetrySignal**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetrySignal.cs` -- signal types (traces, metrics, logs)
- **TtePercentileExporter**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TtePercentileExporter.cs` -- custom OTEL exporter for TTE percentile metrics
- **GoldenSignalMetrics**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/GoldenSignalMetrics.cs` -- golden signal metrics (latency, traffic, errors, saturation)
- **GrpcContextInterceptors**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/GrpcContextInterceptors.cs` -- gRPC telemetry interceptors
- **Tests**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/GoldenSignalMetricsTests.cs`
- **Source**: Feature matrix scan

## E2E Test Plan
- [ ] Verify OTEL traces are exported with correct service identity and span attributes
- [ ] Test OTEL metrics export includes golden signal metrics (latency, traffic, errors, saturation)
- [ ] Verify TTE percentile exporter publishes p50/p90/p99 buckets
- [ ] Test gRPC interceptors propagate trace context across service boundaries
- [ ] Verify collector endpoint configuration respects sealed mode restrictions
@@ -0,0 +1,27 @@
# Outcome Analytics / Attribution

## Module
Telemetry

## Status
IMPLEMENTED

## Description
Deterministic outcome analytics service providing MTTA/MTTR attribution, per-pipeline deployment attribution, per-severity incident attribution, daily cohort analysis, and executive reporting backed by DORA metrics.

## Implementation Details
- **IOutcomeAnalyticsService**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/IOutcomeAnalyticsService.cs` -- service interface for building executive outcome reports
- **DoraOutcomeAnalyticsService**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/DoraOutcomeAnalyticsService.cs` -- deterministic implementation backed by IDoraMetricsService; builds deployment attribution slices grouped by pipeline, incident attribution slices grouped by severity, and daily cohort views for trend reporting
- **OutcomeAnalyticsModels**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/OutcomeAnalyticsModels.cs` -- OutcomeExecutiveReport, DeploymentAttributionSlice, IncidentAttributionSlice, OutcomeCohortSlice records
- **DI Registration**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryServiceCollectionExtensions.cs` -- registered automatically via `AddDoraMetrics()` extension method
- **Tests**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/OutcomeAnalyticsServiceTests.cs` (3 test cases)
- **Source**: Feature matrix scan + QA verification

## Verified Behaviors
- Executive report computes total/failed deployments, total/resolved/acknowledged incidents
- MTTA and MTTR computed across incidents with deterministic rounding
- Deployment attribution grouped by pipeline with per-pipeline CFR and median lead time
- Incident attribution grouped by severity with per-severity MTTA/MTTR
- Daily cohort view covers full date range with deployment and incident counts
- Deterministic: repeated calls with same data produce identical reports
- DI registration via `AddDoraMetrics()` resolves IOutcomeAnalyticsService
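
The per-pipeline attribution and its determinism property can be illustrated as follows. This Python sketch is not the C# `DoraOutcomeAnalyticsService`; the tuple layout, the failure outcome set, the rounding precision, and the `attribute_by_pipeline` name are assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import median

FAILURE_OUTCOMES = {"rollback", "hotfix", "failed"}  # assumed failure outcomes


def attribute_by_pipeline(deployments):
    """deployments: iterable of (pipeline, outcome, lead_time_hours) tuples."""
    groups = defaultdict(list)
    for pipeline, outcome, lead_time_hours in deployments:
        groups[pipeline].append((outcome, lead_time_hours))
    slices = []
    for pipeline in sorted(groups):  # stable ordering keeps the report deterministic
        rows = groups[pipeline]
        failed = sum(1 for outcome, _ in rows if outcome in FAILURE_OUTCOMES)
        slices.append({
            "pipeline": pipeline,
            "deployments": len(rows),
            # Fixed rounding so repeated runs produce identical values.
            "change_failure_rate": round(failed / len(rows), 4),
            "median_lead_time_hours": round(median(lt for _, lt in rows), 2),
        })
    return slices
```

Sorting the group keys and rounding to a fixed precision are two simple ways to get identical output for the same input regardless of event order.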
@@ -0,0 +1,33 @@
# P0 Product-Level Metrics and Dashboard

## Module
Telemetry

## Status
IMPLEMENTED

## Description
Four P0 product-level metrics instrumented: time-to-first-verified-release, mean-time-to-answer-why-blocked, support-minutes-per-customer, and determinism-regressions-total, with Prometheus alerting rules and install timestamp tracking service.

## Implementation Details
- **P0ProductMetrics**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/P0ProductMetrics.cs` -- meter `StellaOps.P0Metrics` with 4 P0 metrics:
  - P0M-001: `stella_time_to_first_verified_release_seconds` -- histogram with buckets 5m to 1 week
  - P0M-002: `stella_why_blocked_latency_seconds` -- mean time to answer "why blocked"
  - P0M-003: `stella_support_burden_minutes_total` -- support minutes per customer counter
  - P0M-004: `stella_determinism_regressions_total` -- determinism regression counter
- **InstallTimestampService**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/InstallTimestampService.cs` -- tracks fresh install timestamp for P0M-001
- **GoldenSignalMetrics**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/GoldenSignalMetrics.cs` -- golden signal metrics (latency, traffic, errors, saturation)
- **FidelityMetricsTelemetry**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/FidelityMetricsTelemetry.cs` -- fidelity metrics for evidence quality
- **FidelitySloAlertingService**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/FidelitySloAlertingService.cs` -- SLO alerting for fidelity metrics
- **ProofCoverageMetrics**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/ProofCoverageMetrics.cs` -- proof coverage tracking
- **ProofGenerationMetrics**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/ProofGenerationMetrics.cs` -- proof generation performance
- **Tests**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/GoldenSignalMetricsTests.cs`, `ProofCoverageMetricsTests.cs`
- **Source**: SPRINT_20260117_028_Telemetry
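
As an illustration of how P0M-001 relates the install timestamp to a histogram bucket, the sketch below computes the elapsed time and the bucket it would fall into. It is written in Python rather than C#; only the 5-minute and 1-week endpoints come from the description above, while the intermediate boundaries and the `bucket_for` helper are assumptions.

```python
from bisect import bisect_left

# Upper bounds in seconds. Only the 5-minute (300) and 1-week (604800) endpoints
# are taken from the document; the boundaries in between are assumed.
BUCKETS = [300, 900, 3600, 14400, 86400, 259200, 604800]


def bucket_for(install_ts, first_verified_release_ts):
    """Return (elapsed_seconds, upper_bound); upper_bound is None for overflow."""
    elapsed = first_verified_release_ts - install_ts
    if elapsed < 0:
        raise ValueError("release precedes install timestamp")
    # bisect_left makes each upper bound inclusive, like a "less than or equal" bucket.
    index = bisect_left(BUCKETS, elapsed)
    return elapsed, (BUCKETS[index] if index < len(BUCKETS) else None)
```

Because the measurement starts at install time, the install timestamp has to survive restarts, which is the role the document assigns to `InstallTimestampService`.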

## E2E Test Plan
- [ ] Verify time-to-first-verified-release histogram records elapsed time from install
- [ ] Test why-blocked latency captures mean time from block to explanation delivery
- [ ] Verify support minutes counter increments per customer interaction
- [ ] Test determinism regression counter fires on replay divergence detection
- [ ] Verify Prometheus alerting rules trigger on SLO breaches
- [ ] Test install timestamp service persists and recovers install time across restarts
26
docs/features/checked/telemetry/redacting-log-processor.md
Normal file
@@ -0,0 +1,26 @@
# Redacting Log Processor

## Module
Telemetry

## Status
IMPLEMENTED

## Description
Log processor that redacts sensitive data from telemetry output before export.

## Implementation Details
- **RedactingLogProcessor**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/RedactingLogProcessor.cs` -- OpenTelemetry LogRecordProcessor that redacts sensitive data before export
- **ILogRedactor interface**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/ILogRedactor.cs` -- redaction service interface
- **LogRedactor**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/LogRedactor.cs` -- default implementation with configurable redaction patterns
- **LogRedactionOptions**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/LogRedactionOptions.cs` -- configurable patterns, replacement text, and scope
- **DeterministicLogFormatter**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/DeterministicLogFormatter.cs` -- deterministic log formatting for reproducibility
- **Tests**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/LogRedactorTests.cs`, `DeterministicLogFormatterTests.cs`
- **Source**: Feature matrix scan
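
Pattern-based redaction over both the message and the attributes could be sketched like this. The example is Python, not the C# `LogRedactor`; the three regular expressions, the `[REDACTED]` replacement text, and the function names are illustrative assumptions rather than the patterns shipped in `LogRedactionOptions`.

```python
import re

# Illustrative patterns only; a real deployment would configure its own.
PATTERNS = [
    re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),  # email addresses
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),                      # IPv4 addresses
    re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]+"),               # bearer tokens
]


def redact(text, replacement="[REDACTED]"):
    """Apply every pattern in turn to a single string."""
    for pattern in PATTERNS:
        text = pattern.sub(replacement, text)
    return text


def redact_record(message, attributes, replacement="[REDACTED]"):
    """Redact the message and every string attribute value; other values pass through."""
    clean = {
        key: redact(value, replacement) if isinstance(value, str) else value
        for key, value in attributes.items()
    }
    return redact(message, replacement), clean
```

Redacting attribute values one at a time, instead of rewriting a serialized record, leaves the record structure untouched, which matters for the JSON-integrity item in the test plan.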

## E2E Test Plan
- [ ] Verify log processor redacts PII patterns (emails, IPs, tokens) before export
- [ ] Test custom redaction patterns are applied via LogRedactionOptions
- [ ] Verify deterministic log formatter produces reproducible output
- [ ] Test redaction preserves log structure and does not corrupt JSON output
- [ ] Verify redaction applies to both log message and log attributes
26
docs/features/checked/telemetry/sealed-mode-telemetry.md
Normal file
@@ -0,0 +1,26 @@
# Sealed-Mode Telemetry (Offline/Air-Gap)

## Module
Telemetry

## Status
IMPLEMENTED

## Description
Sealed-mode telemetry that writes to local files instead of external endpoints, supporting air-gapped environments.

## Implementation Details
- **ISealedModeTelemetryService interface**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/ISealedModeTelemetryService.cs` -- `IsSealed`, `EffectiveSamplingRate`, `IsIncidentModeOverrideActive`, `GetSealedModeTags`, `ShouldAllowExporter`; blocks external exporters when sealed
- **SealedModeTelemetryService**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/SealedModeTelemetryService.cs` -- implementation that disables external exporters and writes to local storage
- **SealedModeFileExporter**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/SealedModeFileExporter.cs` -- writes telemetry to local files in air-gapped mode
- **SealedModeTelemetryOptions**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/SealedModeTelemetryOptions.cs` -- local storage path, file rotation, retention settings
- **Tests**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/SealedModeTelemetryServiceTests.cs`, `SealedModeFileExporterTests.cs`
- **Source**: Feature matrix scan
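
The interplay between sealing, exporter gating, and the incident-mode override can be summarized in a small model. This Python sketch mirrors the member names of `ISealedModeTelemetryService` loosely and is not its C# implementation; the sealed-mode rate cap, the exporter kind strings, and the tag value are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SealedModeSketch:
    is_sealed: bool
    normal_rate: float = 1.0
    sealed_max_rate: float = 0.1  # assumed cap while sealed
    incident_override_active: bool = False

    def should_allow_exporter(self, exporter_kind):
        # While sealed, only the local file exporter may run.
        return (not self.is_sealed) or exporter_kind == "file"

    @property
    def effective_sampling_rate(self):
        if not self.is_sealed:
            return self.normal_rate
        if self.incident_override_active:
            return 1.0  # incident mode lifts the sealed-mode cap
        return min(self.normal_rate, self.sealed_max_rate)

    def sealed_mode_tags(self):
        # Hypothetical tag; the real tag names are not given in this document.
        return {"sealed": "true"} if self.is_sealed else {}
```

Note that the override only changes how much is sampled; in this model exporter gating is unaffected, so high-fidelity data still stays on local storage while sealed.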

## E2E Test Plan
- [ ] Verify sealed mode blocks all external telemetry exporters
- [ ] Test telemetry is written to local files when sealed mode is active
- [ ] Verify sealed mode tags are added to all telemetry data
- [ ] Test incident mode can override sealed mode sampling rate
- [ ] Verify file exporter handles rotation and retention correctly
- [ ] Test transition between sealed and normal modes preserves data integrity
@@ -0,0 +1,32 @@
# Telemetry Context Propagation Library

## Module
Telemetry

## Status
IMPLEMENTED

## Description
Shared telemetry context propagation library providing standardized trace/span ID injection, tenant context threading, and PII scrubbing across all platform services.

## Implementation Details
- **ITelemetryContextAccessor**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/ITelemetryContextAccessor.cs` -- `Context` / `Current` accessor for ambient telemetry context
- **TelemetryContextAccessor**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryContextAccessor.cs` -- AsyncLocal-based implementation
- **TelemetryContext**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryContext.cs` -- context model with trace/span ID, tenant, service identity
- **TelemetryContextPropagationMiddleware**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryContextPropagationMiddleware.cs` -- ASP.NET middleware for HTTP context propagation
- **TelemetryPropagationMiddleware**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryPropagationMiddleware.cs` -- additional propagation middleware
- **TelemetryPropagationHandler**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryPropagationHandler.cs` -- HTTP client handler for outbound context propagation
- **TelemetryContextPropagator**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryContextPropagator.cs` -- W3C trace context propagation
- **TelemetryContextJobScope**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryContextJobScope.cs` -- context scoping for background jobs
- **GrpcContextInterceptors**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/GrpcContextInterceptors.cs` -- gRPC interceptors for context propagation
- **CliTelemetryContext**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/CliTelemetryContext.cs` -- CLI-specific context for command telemetry
- **Tests**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/TelemetryContextAccessorTests.cs`, `TelemetryContextTests.cs`, `TelemetryPropagationHandlerTests.cs`, `TelemetryPropagationMiddlewareTests.cs`, `CliTelemetryContextTests.cs`
- **Source**: SPRINT_0174_0001_0001_telemetry.md
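
Since the propagator is described as following W3C trace context, a short example of the `traceparent` header format may help. The sketch is in Python and handles only the version `00` layout defined by the W3C Trace Context specification; the function names are hypothetical and it does not reflect how `TelemetryContextPropagator` is actually written.

```python
import re

# W3C Trace Context "traceparent": version-traceid-parentid-flags, lowercase hex.
TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header):
    """Return (trace_id, parent_span_id, sampled) or None if the header is invalid."""
    match = TRACEPARENT.match(header.strip())
    if not match:
        return None
    version, trace_id, span_id, flags = match.groups()
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None  # the spec forbids version ff and all-zero IDs
    return trace_id, span_id, bool(int(flags, 16) & 0x01)


def format_traceparent(trace_id, span_id, sampled):
    """Build a version 00 header for an outbound request."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"
```

An inbound middleware would call something like `parse_traceparent` and an outbound handler something like `format_traceparent`; tenant context would travel in a separate header whose name is not specified in this document.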

## E2E Test Plan
- [ ] Verify trace/span IDs propagate across HTTP service boundaries via middleware
- [ ] Test tenant context threads through all service calls in a request
- [ ] Verify outbound HTTP calls include propagated context via TelemetryPropagationHandler
- [ ] Test gRPC interceptors propagate context for inter-service gRPC calls
- [ ] Verify background job scope correctly inherits and isolates telemetry context
- [ ] Test CLI telemetry context attaches command metadata to spans
25
docs/features/checked/telemetry/telemetry-exporter-guard.md
Normal file
@@ -0,0 +1,25 @@
# Telemetry Exporter Guard

## Module
Telemetry

## Status
IMPLEMENTED

## Description
Guard that prevents telemetry export to unauthorized endpoints, enforcing sealed-mode restrictions.

## Implementation Details
- **TelemetryExporterGuard**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryExporterGuard.cs` -- `IsExporterAllowed(descriptor, options, signal, endpoint, out decision)` that applies `IEgressPolicy` from `StellaOps.AirGap.Policy`; returns allow/deny with `EgressDecision` details; logs enforcement results
- **TelemetrySignal**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetrySignal.cs` -- signal types (traces, metrics, logs) for per-signal guard evaluation
- **TelemetryServiceDescriptor**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryServiceDescriptor.cs` -- service identity for guard evaluation
- **StellaOpsTelemetryOptions.CollectorOptions**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/StellaOpsTelemetryOptions.cs` -- collector endpoint and component configuration
- **Tests**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/TelemetryExporterGuardTests.cs`
- **Source**: Feature matrix scan
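
The allow/deny decision, including the permissive default when no policy exists, can be modeled in a few lines. This Python sketch stands in for the C# guard and does not reproduce `IEgressPolicy` or `EgressDecision`; representing the policy as a host allowlist, and the function name and reason strings, are assumptions.

```python
from urllib.parse import urlparse


def is_exporter_allowed(endpoint, allowed_hosts=None):
    """Return (allowed, reason). allowed_hosts=None models 'no egress policy configured'."""
    if allowed_hosts is None:
        return True, "no egress policy configured"
    host = urlparse(endpoint).hostname  # None when the endpoint is not a valid URL
    if host is not None and host.lower() in allowed_hosts:
        return True, f"host {host} is allowlisted"
    return False, f"host {host} is not allowlisted"
```

Returning the reason alongside the boolean gives the caller something to log, which corresponds to the audit-trail item in the test plan.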

## E2E Test Plan
- [ ] Verify guard blocks telemetry export to unauthorized endpoints when air-gap policy is active
- [ ] Test guard allows export when no egress policy is configured (permissive default)
- [ ] Verify per-signal guard evaluation (traces, metrics, logs can have different policies)
- [ ] Test guard logs enforcement decisions for audit trail
- [ ] Verify integration with SealedModeTelemetryService for complete export blocking
@@ -0,0 +1,37 @@
# Time-to-Evidence (TTE) Metric Instrumentation and Percentile Export

## Module
Telemetry

## Status
IMPLEMENTED

## Description
TTE metrics capture and percentile export are implemented in the Telemetry.Core library with DI registration support.

## Implementation Details
- **TimeToEvidenceMetrics**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TimeToEvidenceMetrics.cs` -- meter `StellaOps.TimeToEvidence` with:
  - `tte_phase_latency_seconds` -- histogram for per-phase latency
  - `tte_scan_duration_seconds` -- histogram for total scan duration
  - `tte_phase_completed_total` -- counter for completed phases
  - `tte_phase_failed_total` -- counter for failed phases
  - `tte_slo_breach_total` -- counter for SLO breaches
  - `tte_evidence_attached_total` -- counter for evidence attachments
  - `tte_decision_made_total` -- counter for decisions made
- **TtePercentileExporter**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TtePercentileExporter.cs` -- custom OTEL exporter for p50/p90/p99 percentile export
- **TimeToFirstSignalMetrics**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TimeToFirstSignalMetrics.cs` -- Time-to-First-Signal (TTFS) metrics for first signal detection
- **TimeToFirstSignalOptions**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TimeToFirstSignalOptions.cs` -- TTFS configuration
- **TtfsIngestionService**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/Triage/TtfsIngestionService.cs` -- ingests TTFS events for metrics
- **TtfsEvent**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/Triage/TtfsEvent.cs` -- TTFS event model
- **ScanCompletionMetricsIntegration**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/ScanCompletionMetricsIntegration.cs` -- integrates scan completion into TTE pipeline
- **UnknownsBurndownMetrics**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/UnknownsBurndownMetrics.cs` -- tracks unknowns burndown rate
- **Tests**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/TimeToFirstSignalMetricsTests.cs`, `TtfsIngestionServiceTests.cs`
- **Source**: Feature matrix scan
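
For readers unfamiliar with percentile export, the following shows one common way to derive p50/p90/p99 from raw latency samples. It is a Python sketch using the nearest-rank method, which is an assumption: the document does not say which percentile method `TtePercentileExporter` uses, and the function names are hypothetical.

```python
import math


def nearest_rank(sorted_values, percentile):
    """Nearest-rank percentile over an ascending list; percentile in (0, 100]."""
    rank = math.ceil(percentile * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def tte_percentiles(latencies_seconds):
    """Return p50/p90/p99 for the given samples, or an empty dict without samples."""
    values = sorted(latencies_seconds)
    if not values:
        return {}
    return {f"p{p}": nearest_rank(values, p) for p in (50, 90, 99)}
```

Nearest-rank always returns an observed sample instead of an interpolated value, so with few samples p90 and p99 can coincide.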

## E2E Test Plan
- [ ] Verify TTE phase latency histogram records per-phase timing accurately
- [ ] Test percentile exporter produces p50/p90/p99 values for TTE metrics
- [ ] Verify SLO breach counter fires when phase latency exceeds threshold
- [ ] Test TTFS metrics capture time from CVE disclosure to first signal detection
- [ ] Verify scan completion integration records evidence attachment timing
- [ ] Test unknowns burndown metrics track reduction rate over time