semi implemented and features implemented save checkpoint

This commit is contained in:
master
2026-02-08 18:00:49 +02:00
parent 04360dff63
commit 1bf6bbf395
20895 changed files with 716795 additions and 64 deletions

@@ -0,0 +1,26 @@
# Incident/Forensic Mode (High-Fidelity Sampling)
## Module
Telemetry
## Status
IMPLEMENTED
## Description
Incident/forensic mode service that enables high-fidelity (100%) sampling during security incidents for detailed investigation.
## Implementation Details
- **IIncidentModeService interface**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/IIncidentModeService.cs` -- `IsActive`, `CurrentState`, `ActivateAsync` (actor, tenantId, TTL override, reason), `DeactivateAsync`; manages incident mode state with per-tenant granularity
- **IncidentModeService**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/IncidentModeService.cs` -- default implementation with activation/deactivation lifecycle
- **IncidentModeOptions**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/IncidentModeOptions.cs` -- configurable default TTL and sampling rates
- **ISealedModeTelemetryService**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/ISealedModeTelemetryService.cs` -- `IsIncidentModeOverrideActive` property enables incident mode to override sealed mode sampling rate
- **Tests**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/IncidentModeServiceTests.cs`
- **Source**: Feature matrix scan
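
The activation lifecycle above can be illustrated with a minimal Python sketch (the real service is C#). It assumes per-tenant state, a TTL that falls back to a configured default when no override is given, lazy expiry on read, and full (100%) sampling while a tenant is in incident mode. `IncidentModeSketch`, `base_sampling_rate`, and the injectable `clock` are hypothetical names, not the `IIncidentModeService` API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class IncidentState:
    actor: str
    reason: str
    activated_at: datetime
    expires_at: datetime


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


class IncidentModeSketch:
    """Hypothetical per-tenant incident mode with a TTL (not the StellaOps API)."""

    def __init__(self, base_sampling_rate: float, default_ttl: timedelta,
                 clock: Callable[[], datetime] = _utc_now):
        self.base_sampling_rate = base_sampling_rate
        self.default_ttl = default_ttl
        self._clock = clock
        self._states: Dict[str, IncidentState] = {}

    def activate(self, actor: str, tenant_id: str,
                 ttl: Optional[timedelta] = None, reason: str = "") -> IncidentState:
        now = self._clock()
        effective_ttl = ttl if ttl is not None else self.default_ttl  # TTL override
        state = IncidentState(actor, reason, now, now + effective_ttl)
        self._states[tenant_id] = state  # other tenants are unaffected
        return state

    def deactivate(self, tenant_id: str) -> None:
        self._states.pop(tenant_id, None)

    def is_active(self, tenant_id: str) -> bool:
        state = self._states.get(tenant_id)
        if state is None:
            return False
        if self._clock() >= state.expires_at:
            del self._states[tenant_id]  # TTL elapsed: expire lazily on read
            return False
        return True

    def sampling_rate(self, tenant_id: str) -> float:
        # Full-fidelity (100%) sampling while the tenant is in incident mode.
        return 1.0 if self.is_active(tenant_id) else self.base_sampling_rate


# Usage: a mutable clock lets the TTL expire without sleeping.
now = [datetime(2026, 1, 1, tzinfo=timezone.utc)]
svc = IncidentModeSketch(0.1, timedelta(minutes=30), clock=lambda: now[0])
svc.activate("oncall", "tenant-a", ttl=timedelta(minutes=10), reason="INC-1")
print(svc.sampling_rate("tenant-a"), svc.sampling_rate("tenant-b"))  # 1.0 0.1
```

Injecting the clock is one way to exercise TTL expiry deterministically in tests.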
## E2E Test Plan
- [ ] Verify incident mode activation increases sampling rate to 100%
- [ ] Test TTL override correctly expires incident mode after configured duration
- [ ] Verify incident mode tags are attached to all telemetry during active period
- [ ] Test incident mode overrides sealed mode sampling restrictions
- [ ] Verify deactivation restores normal sampling rates
- [ ] Test per-tenant incident mode isolation

@@ -0,0 +1,23 @@
# Metric Label Analyzer (Static Analysis)
## Module
Telemetry
## Status
IMPLEMENTED
## Description
Roslyn-based analyzer that validates metric label usage at compile time to prevent telemetry cardinality issues.
## Implementation Details
- **MetricLabelAnalyzer**: `src/Telemetry/StellaOps.Telemetry.Analyzers/MetricLabelAnalyzer.cs` -- Roslyn-based DiagnosticAnalyzer that validates metric label usage at compile time; detects high-cardinality labels, missing required labels, and naming convention violations
- **MetricLabelGuard**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/MetricLabelGuard.cs` -- runtime guard that validates metric labels before emission
- **Tests**: `src/Telemetry/StellaOps.Telemetry.Analyzers/StellaOps.Telemetry.Analyzers.Tests/MetricLabelAnalyzerTests.cs`, `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/MetricLabelGuardTests.cs`
- **Source**: Feature matrix scan
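
A runtime guard of the kind `MetricLabelGuard` provides might look like the Python sketch below. The snake_case naming rule, the per-label cap on distinct values, and the all-or-nothing acceptance are assumptions for illustration; `MetricLabelGuardSketch` and `max_values_per_label` are hypothetical names.

```python
import re
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

# Assumed naming convention: lowercase snake_case label names.
LABEL_NAME = re.compile(r"[a-z][a-z0-9_]*")


class MetricLabelGuardSketch:
    """Hypothetical runtime guard bounding distinct values per (metric, label)."""

    def __init__(self, max_values_per_label: int):
        self.max_values_per_label = max_values_per_label
        self._seen: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)

    def check(self, metric: str, labels: Dict[str, str]) -> List[str]:
        """Return violations; an empty list means the label set may be emitted."""
        problems: List[str] = []
        pending: List[Tuple[Set[str], str]] = []
        for name, value in labels.items():
            if not LABEL_NAME.fullmatch(name):
                problems.append(f"{metric}: label '{name}' is not snake_case")
                continue
            seen = self._seen[(metric, name)]
            if value in seen:
                continue
            if len(seen) >= self.max_values_per_label:
                problems.append(
                    f"{metric}: label '{name}' exceeds {self.max_values_per_label} distinct values")
            else:
                pending.append((seen, value))
        # Only record new values when the whole label set is accepted.
        if not problems:
            for seen, value in pending:
                seen.add(value)
        return problems
```

Recording values only when the whole label set passes means a rejected emission does not consume cardinality budget.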
## E2E Test Plan
- [ ] Verify Roslyn analyzer detects high-cardinality metric labels at compile time
- [ ] Test analyzer flags missing required labels (tenant, service, environment)
- [ ] Verify naming convention violations produce diagnostic warnings
- [ ] Test runtime MetricLabelGuard rejects labels exceeding cardinality thresholds
- [ ] Verify analyzer integrates with CI build pipeline for automated enforcement

@@ -0,0 +1,29 @@
# OpenTelemetry Integration
## Module
Telemetry
## Status
IMPLEMENTED
## Description
OpenTelemetry-based telemetry infrastructure with configurable options and custom exporters, including a Time-to-Evidence (TTE) percentile exporter.
## Implementation Details
- **StellaOpsTelemetryOptions**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/StellaOpsTelemetryOptions.cs` -- configurable OTEL options with `CollectorOptions` for endpoint, protocol, and component
- **TelemetryServiceCollectionExtensions**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryServiceCollectionExtensions.cs` -- DI registration for OTEL tracing, metrics, and logging
- **TelemetryApplicationBuilderExtensions**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryApplicationBuilderExtensions.cs` -- middleware pipeline integration
- **TelemetryServiceDescriptor**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryServiceDescriptor.cs` -- service identity for telemetry tagging
- **TelemetrySignal**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetrySignal.cs` -- signal types (traces, metrics, logs)
- **TtePercentileExporter**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TtePercentileExporter.cs` -- custom OTEL exporter for TTE percentile metrics
- **GoldenSignalMetrics**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/GoldenSignalMetrics.cs` -- golden signal metrics (latency, traffic, errors, saturation)
- **GrpcContextInterceptors**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/GrpcContextInterceptors.cs` -- gRPC telemetry interceptors
- **Tests**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/GoldenSignalMetricsTests.cs`
- **Source**: Feature matrix scan
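
The four golden signals can be summarized over one time window as in this Python sketch. `RequestSample` and `golden_signals` are hypothetical. Latency is reduced to a mean purely for brevity (an OTEL instrument would normally record a latency histogram), and saturation is modeled as in-flight work over an assumed fixed capacity.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RequestSample:
    duration_ms: float
    failed: bool


def golden_signals(samples: List[RequestSample], window_s: float,
                   in_flight: int, capacity: int) -> Dict[str, float]:
    """Summarize one window of requests as the four golden signals (hypothetical helper)."""
    count = len(samples)
    return {
        # Latency: mean request duration in this window.
        "latency_ms": sum(s.duration_ms for s in samples) / count if count else 0.0,
        # Traffic: requests per second over the window.
        "traffic_rps": count / window_s,
        # Errors: fraction of requests that failed.
        "error_ratio": sum(s.failed for s in samples) / count if count else 0.0,
        # Saturation: share of available capacity currently in use.
        "saturation": in_flight / capacity,
    }
```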
## E2E Test Plan
- [ ] Verify OTEL traces are exported with correct service identity and span attributes
- [ ] Test OTEL metrics export includes golden signal metrics (latency, traffic, errors, saturation)
- [ ] Verify TTE percentile exporter publishes p50/p90/p99 values
- [ ] Test gRPC interceptors propagate trace context across service boundaries
- [ ] Verify collector endpoint configuration respects sealed mode restrictions

@@ -0,0 +1,33 @@
# P0 Product-Level Metrics and Dashboard
## Module
Telemetry
## Status
IMPLEMENTED
## Description
Four P0 product-level metrics are instrumented: time-to-first-verified-release, mean-time-to-answer-why-blocked, support-minutes-per-customer, and determinism-regressions-total, together with Prometheus alerting rules and an install timestamp tracking service.
## Implementation Details
- **P0ProductMetrics**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/P0ProductMetrics.cs` -- meter `StellaOps.P0Metrics` with 4 P0 metrics:
  - P0M-001: `stella_time_to_first_verified_release_seconds` -- histogram with buckets 5m to 1 week
  - P0M-002: `stella_why_blocked_latency_seconds` -- mean time to answer "why blocked"
  - P0M-003: `stella_support_burden_minutes_total` -- support minutes per customer counter
  - P0M-004: `stella_determinism_regressions_total` -- determinism regression counter
- **InstallTimestampService**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/InstallTimestampService.cs` -- tracks fresh install timestamp for P0M-001
- **GoldenSignalMetrics**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/GoldenSignalMetrics.cs` -- golden signal metrics (latency, traffic, errors, saturation)
- **FidelityMetricsTelemetry**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/FidelityMetricsTelemetry.cs` -- fidelity metrics for evidence quality
- **FidelitySloAlertingService**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/FidelitySloAlertingService.cs` -- SLO alerting for fidelity metrics
- **ProofCoverageMetrics**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/ProofCoverageMetrics.cs` -- proof coverage tracking
- **ProofGenerationMetrics**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/ProofGenerationMetrics.cs` -- proof generation performance
- **Tests**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/GoldenSignalMetricsTests.cs`, `ProofCoverageMetricsTests.cs`
- **Source**: SPRINT_20260117_028_Telemetry
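
P0M-001 reduces to the elapsed time between the persisted install timestamp and the first verified release, recorded into a histogram. The Python sketch below assumes concrete bucket bounds between 5 minutes and 1 week; the actual boundaries in `P0ProductMetrics` may differ, and `BUCKETS_S`, `time_to_first_verified_release`, and `bucket_upper_bound` are hypothetical names.

```python
import bisect
from datetime import datetime, timezone

# Assumed bucket upper bounds in seconds, spanning 5 minutes to 1 week.
BUCKETS_S = [300, 900, 1800, 3600, 14_400, 43_200, 86_400, 259_200, 604_800]


def time_to_first_verified_release(install_at: datetime, released_at: datetime) -> float:
    """Seconds between the persisted install timestamp and the first verified release."""
    elapsed = (released_at - install_at).total_seconds()
    if elapsed < 0:
        raise ValueError("release predates install timestamp")
    return elapsed


def bucket_upper_bound(elapsed_s: float) -> float:
    """Smallest bucket bound >= elapsed_s, or +inf for the overflow bucket."""
    i = bisect.bisect_left(BUCKETS_S, elapsed_s)
    return float(BUCKETS_S[i]) if i < len(BUCKETS_S) else float("inf")


install = datetime(2026, 1, 1, tzinfo=timezone.utc)
release = datetime(2026, 1, 1, 2, tzinfo=timezone.utc)
print(bucket_upper_bound(time_to_first_verified_release(install, release)))  # 14400.0
```

`bisect_left` picks the smallest bound that is greater than or equal to the value, matching the inclusive upper bounds (`le`) of Prometheus histogram buckets.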
## E2E Test Plan
- [ ] Verify time-to-first-verified-release histogram records elapsed time from install
- [ ] Test why-blocked latency captures mean time from block to explanation delivery
- [ ] Verify support minutes counter increments per customer interaction
- [ ] Test determinism regression counter fires on replay divergence detection
- [ ] Verify Prometheus alerting rules trigger on SLO breaches
- [ ] Test install timestamp service persists and recovers install time across restarts

@@ -0,0 +1,26 @@
# Redacting Log Processor
## Module
Telemetry
## Status
IMPLEMENTED
## Description
Log processor that redacts sensitive data from telemetry output before export.
## Implementation Details
- **RedactingLogProcessor**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/RedactingLogProcessor.cs` -- OpenTelemetry LogRecordProcessor that redacts sensitive data before export
- **ILogRedactor interface**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/ILogRedactor.cs` -- redaction service interface
- **LogRedactor**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/LogRedactor.cs` -- default implementation with configurable redaction patterns
- **LogRedactionOptions**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/LogRedactionOptions.cs` -- configurable patterns, replacement text, and scope
- **DeterministicLogFormatter**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/DeterministicLogFormatter.cs` -- deterministic log formatting for reproducibility
- **Tests**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/LogRedactorTests.cs`, `DeterministicLogFormatterTests.cs`
- **Source**: Feature matrix scan
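
Pattern-based redaction of both the log message and its attributes can be sketched in Python as follows. The three patterns (email addresses, IPv4 addresses, bearer tokens) and the `[REDACTED]` replacement text are assumed defaults standing in for whatever `LogRedactionOptions` configures; `redact_record` is a hypothetical helper, not the `ILogRedactor` interface.

```python
import re
from typing import Dict, Tuple

# Assumed default patterns; the real options are configurable.
PATTERNS = [
    re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),  # email addresses
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),                    # IPv4 addresses
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._~+/-]+=*"),               # bearer tokens
]
REPLACEMENT = "[REDACTED]"


def redact(text: str) -> str:
    for pattern in PATTERNS:
        text = pattern.sub(REPLACEMENT, text)
    return text


def redact_record(message: str, attributes: Dict[str, object]) -> Tuple[str, Dict[str, object]]:
    """Redact the message and every string attribute; other values pass through."""
    clean = {k: redact(v) if isinstance(v, str) else v for k, v in attributes.items()}
    return redact(message), clean
```

Leaving non-string attribute values untouched keeps structured fields such as status codes intact.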
## E2E Test Plan
- [ ] Verify log processor redacts PII and secrets (email addresses, IP addresses, tokens) before export
- [ ] Test custom redaction patterns are applied via LogRedactionOptions
- [ ] Verify deterministic log formatter produces reproducible output
- [ ] Test redaction preserves log structure and does not corrupt JSON output
- [ ] Verify redaction applies to both log message and log attributes

@@ -0,0 +1,26 @@
# Sealed-Mode Telemetry (Offline/Air-Gap)
## Module
Telemetry
## Status
IMPLEMENTED
## Description
Sealed-mode telemetry that writes to local files instead of external endpoints, supporting air-gapped environments.
## Implementation Details
- **ISealedModeTelemetryService interface**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/ISealedModeTelemetryService.cs` -- `IsSealed`, `EffectiveSamplingRate`, `IsIncidentModeOverrideActive`, `GetSealedModeTags`, `ShouldAllowExporter`; blocks external exporters when sealed
- **SealedModeTelemetryService**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/SealedModeTelemetryService.cs` -- implementation that disables external exporters and writes to local storage
- **SealedModeFileExporter**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/SealedModeFileExporter.cs` -- writes telemetry to local files in air-gapped mode
- **SealedModeTelemetryOptions**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/SealedModeTelemetryOptions.cs` -- local storage path, file rotation, retention settings
- **Tests**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/SealedModeTelemetryServiceTests.cs`, `SealedModeFileExporterTests.cs`
- **Source**: Feature matrix scan
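
The sealed-mode decisions listed above (exporter gating, effective sampling rate with the incident-mode override, and sealed-mode tags) can be modeled as in this Python sketch. The `file` exporter name, the `stellaops.sealed` tag key, and `SealedModeSketch` itself are assumptions, not the `ISealedModeTelemetryService` contract.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass
class SealedModeSketch:
    """Hypothetical model of sealed-mode decisions (not the StellaOps API)."""
    is_sealed: bool
    sealed_sampling_rate: float
    normal_sampling_rate: float
    incident_override_active: bool = False
    local_exporters: FrozenSet[str] = frozenset({"file"})

    def should_allow_exporter(self, exporter: str) -> bool:
        # When sealed, only local exporters (here, the file exporter) may run.
        return not self.is_sealed or exporter in self.local_exporters

    def effective_sampling_rate(self) -> float:
        if not self.is_sealed:
            return self.normal_sampling_rate
        # Incident mode lifts the sealed-mode sampling rate to full fidelity.
        return 1.0 if self.incident_override_active else self.sealed_sampling_rate

    def tags(self) -> Dict[str, str]:
        return {"stellaops.sealed": "true"} if self.is_sealed else {}
```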
## E2E Test Plan
- [ ] Verify sealed mode blocks all external telemetry exporters
- [ ] Test telemetry is written to local files when sealed mode is active
- [ ] Verify sealed mode tags are added to all telemetry data
- [ ] Test incident mode can override sealed mode sampling rate
- [ ] Verify file exporter handles rotation and retention correctly
- [ ] Test transition between sealed and normal modes preserves data integrity

@@ -0,0 +1,32 @@
# Telemetry Context Propagation Library
## Module
Telemetry
## Status
IMPLEMENTED
## Description
Shared telemetry context propagation library providing standardized trace/span ID injection, tenant context threading, and PII scrubbing across all platform services.
## Implementation Details
- **ITelemetryContextAccessor**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/ITelemetryContextAccessor.cs` -- `Context` / `Current` accessor for ambient telemetry context
- **TelemetryContextAccessor**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryContextAccessor.cs` -- AsyncLocal-based implementation
- **TelemetryContext**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryContext.cs` -- context model with trace/span ID, tenant, service identity
- **TelemetryContextPropagationMiddleware**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryContextPropagationMiddleware.cs` -- ASP.NET middleware for HTTP context propagation
- **TelemetryPropagationMiddleware**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryPropagationMiddleware.cs` -- additional propagation middleware
- **TelemetryPropagationHandler**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryPropagationHandler.cs` -- HTTP client handler for outbound context propagation
- **TelemetryContextPropagator**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryContextPropagator.cs` -- W3C trace context propagation
- **TelemetryContextJobScope**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryContextJobScope.cs` -- context scoping for background jobs
- **GrpcContextInterceptors**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/GrpcContextInterceptors.cs` -- gRPC interceptors for context propagation
- **CliTelemetryContext**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/CliTelemetryContext.cs` -- CLI-specific context for command telemetry
- **Tests**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/TelemetryContextAccessorTests.cs`, `TelemetryContextTests.cs`, `TelemetryPropagationHandlerTests.cs`, `TelemetryPropagationMiddlewareTests.cs`, `CliTelemetryContextTests.cs`
- **Source**: SPRINT_0174_0001_0001_telemetry.md
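
W3C Trace Context propagation plus an ambient context accessor can be sketched in Python, where `contextvars` plays the role `AsyncLocal` plays in .NET. The `traceparent` layout (`00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>`) and the rejection of all-zero IDs follow the W3C spec. `TelemetryContextSketch`, `parse_traceparent`, and `outbound_traceparent` are hypothetical names, and only the version-00 layout is accepted.

```python
import contextvars
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# W3C traceparent, version 00: 00-<trace-id>-<parent-id>-<trace-flags>, lowercase hex.
TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


@dataclass(frozen=True)
class TelemetryContextSketch:
    trace_id: str
    span_id: str
    tenant_id: Optional[str] = None


# A ContextVar flows with asyncio tasks much like AsyncLocal<T> flows in .NET.
_current = contextvars.ContextVar("telemetry_context", default=None)


def current_context() -> Optional[TelemetryContextSketch]:
    return _current.get()


def parse_traceparent(header: str, tenant_id: Optional[str] = None) -> Optional[TelemetryContextSketch]:
    """Parse an inbound header; return None for malformed or all-zero IDs."""
    match = TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, span_id, _flags = match.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return TelemetryContextSketch(trace_id, span_id, tenant_id)


def outbound_traceparent(ctx: TelemetryContextSketch, sampled: bool = True) -> str:
    """Build the header for an outgoing call: same trace ID, new span ID."""
    child_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{ctx.trace_id}-{child_span_id}-{flags}"


# Usage: bind the inbound context for the duration of a request.
inbound = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01", tenant_id="t1")
token = _current.set(inbound)
try:
    print(current_context().tenant_id)  # t1
finally:
    _current.reset(token)
```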
## E2E Test Plan
- [ ] Verify trace/span IDs propagate across HTTP service boundaries via middleware
- [ ] Test tenant context threads through all service calls in a request
- [ ] Verify outbound HTTP calls include propagated context via TelemetryPropagationHandler
- [ ] Test gRPC interceptors propagate context for inter-service gRPC calls
- [ ] Verify background job scope correctly inherits and isolates telemetry context
- [ ] Test CLI telemetry context attaches command metadata to spans

@@ -0,0 +1,25 @@
# Telemetry Exporter Guard
## Module
Telemetry
## Status
IMPLEMENTED
## Description
Guard that prevents telemetry export to unauthorized endpoints, enforcing sealed-mode restrictions.
## Implementation Details
- **TelemetryExporterGuard**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryExporterGuard.cs` -- `IsExporterAllowed(descriptor, options, signal, endpoint, out decision)` that applies `IEgressPolicy` from `StellaOps.AirGap.Policy`; returns allow/deny with `EgressDecision` details; logs enforcement results
- **TelemetrySignal**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetrySignal.cs` -- signal types (traces, metrics, logs) for per-signal guard evaluation
- **TelemetryServiceDescriptor**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TelemetryServiceDescriptor.cs` -- service identity for guard evaluation
- **StellaOpsTelemetryOptions.CollectorOptions**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/StellaOpsTelemetryOptions.cs` -- collector endpoint and component configuration
- **Tests**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/TelemetryExporterGuardTests.cs`
- **Source**: Feature matrix scan
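
A per-signal exporter guard with a permissive default can be sketched in Python as below. Modeling the egress policy as a map from signal name to permitted hosts is an assumption standing in for `IEgressPolicy`; `GuardDecision` and `is_exporter_allowed` are hypothetical and only loosely mirror `EgressDecision` and `IsExporterAllowed`.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set
from urllib.parse import urlparse


@dataclass(frozen=True)
class GuardDecision:
    allowed: bool
    reason: str  # suitable for an audit log entry


def is_exporter_allowed(signal: str, endpoint: str,
                        allowed_hosts: Optional[Dict[str, Set[str]]]) -> GuardDecision:
    """Hypothetical guard; allowed_hosts maps "traces"/"metrics"/"logs" to permitted hosts.

    None means no egress policy is configured, which is treated as permissive.
    """
    if allowed_hosts is None:
        return GuardDecision(True, "no egress policy configured")
    host = urlparse(endpoint).hostname or ""
    if host in allowed_hosts.get(signal, set()):
        return GuardDecision(True, f"{host} allowed for {signal}")
    return GuardDecision(False, f"{host} not allowed for {signal}")
```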
## E2E Test Plan
- [ ] Verify guard blocks telemetry export to unauthorized endpoints when air-gap policy is active
- [ ] Test guard allows export when no egress policy is configured (permissive default)
- [ ] Verify per-signal guard evaluation (traces, metrics, logs can have different policies)
- [ ] Test guard logs enforcement decisions for audit trail
- [ ] Verify integration with SealedModeTelemetryService for complete export blocking

@@ -0,0 +1,37 @@
# Time-to-Evidence (TTE) Metric Instrumentation and Percentile Export
## Module
Telemetry
## Status
IMPLEMENTED
## Description
TTE metrics capture and percentile export are implemented in the Telemetry.Core library with DI registration support.
## Implementation Details
- **TimeToEvidenceMetrics**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TimeToEvidenceMetrics.cs` -- meter `StellaOps.TimeToEvidence` with:
  - `tte_phase_latency_seconds` -- histogram for per-phase latency
  - `tte_scan_duration_seconds` -- histogram for total scan duration
  - `tte_phase_completed_total` -- counter for completed phases
  - `tte_phase_failed_total` -- counter for failed phases
  - `tte_slo_breach_total` -- counter for SLO breaches
  - `tte_evidence_attached_total` -- counter for evidence attachments
  - `tte_decision_made_total` -- counter for decisions made
- **TtePercentileExporter**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TtePercentileExporter.cs` -- custom OTEL exporter for p50/p90/p99 percentile export
- **TimeToFirstSignalMetrics**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TimeToFirstSignalMetrics.cs` -- TTFS metrics for first signal detection
- **TimeToFirstSignalOptions**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/TimeToFirstSignalOptions.cs` -- TTFS configuration
- **TtfsIngestionService**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/Triage/TtfsIngestionService.cs` -- ingests TTFS events for metrics
- **TtfsEvent**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/Triage/TtfsEvent.cs` -- TTFS event model
- **ScanCompletionMetricsIntegration**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/ScanCompletionMetricsIntegration.cs` -- integrates scan completion into TTE pipeline
- **UnknownsBurndownMetrics**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core/UnknownsBurndownMetrics.cs` -- tracks unknowns burndown rate
- **Tests**: `src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/TimeToFirstSignalMetricsTests.cs`, `TtfsIngestionServiceTests.cs`
- **Source**: Feature matrix scan
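
The p50/p90/p99 export can be illustrated with a nearest-rank percentile over a window of TTE samples. Nearest-rank is an assumed method here (an OTEL-based exporter may instead estimate percentiles from histogram buckets), and the `slo_breaches` count is a hypothetical per-window counterpart of `tte_slo_breach_total`.

```python
import math
from typing import Dict, List


def nearest_rank_percentile(values: List[float], p: float) -> float:
    """Smallest sample such that at least p% of samples are <= it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def tte_summary(latencies_s: List[float], slo_s: float) -> Dict[str, float]:
    return {
        "p50": nearest_rank_percentile(latencies_s, 50),
        "p90": nearest_rank_percentile(latencies_s, 90),
        "p99": nearest_rank_percentile(latencies_s, 99),
        # Samples above the SLO threshold in this window.
        "slo_breaches": sum(1 for v in latencies_s if v > slo_s),
    }


print(tte_summary([float(i) for i in range(1, 101)], slo_s=95.0))
# {'p50': 50.0, 'p90': 90.0, 'p99': 99.0, 'slo_breaches': 5}
```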
## E2E Test Plan
- [ ] Verify TTE phase latency histogram records per-phase timing accurately
- [ ] Test percentile exporter produces p50/p90/p99 values for TTE metrics
- [ ] Verify SLO breach counter fires when phase latency exceeds threshold
- [ ] Test TTFS metrics capture time from CVE disclosure to first signal detection
- [ ] Verify scan completion integration records evidence attachment timing
- [ ] Test unknowns burndown metrics track reduction rate over time