Files
git.stella-ops.org/docs/implplan/SPRINT_0174_0001_0001_telemetry.md
master b7059d523e Refactor and update test projects, remove obsolete tests, and upgrade dependencies
- Deleted obsolete test files for SchedulerAuditService and SchedulerMongoSessionFactory.
- Removed unused TestDataFactory class.
- Updated project files for Mongo.Tests to remove references to deleted files.
- Upgraded BouncyCastle.Cryptography package to version 2.6.2 across multiple projects.
- Replaced Microsoft.Extensions.Http.Polly with Microsoft.Extensions.Http.Resilience in Zastava.Webhook project.
- Updated NetEscapades.Configuration.Yaml package to version 3.1.0 in Configuration library.
- Upgraded Pkcs11Interop package to version 5.1.2 in Cryptography libraries.
- Refactored Argon2idPasswordHasher to use BouncyCastle for hashing instead of Konscious.
- Updated JsonSchema.Net package to version 7.3.2 in Microservice project.
- Updated global.json to use .NET SDK version 10.0.101.
2025-12-10 19:13:29 +02:00

9.6 KiB

Sprint 0174-0001-0001 · Telemetry (Notifications & Telemetry 170.B)

Topic & Scope

  • Deliver StellaOps.Telemetry.Core bootstrap, propagation middleware, metrics helpers, scrubbing, incident/sealed-mode toggles.
  • Provide sample host integrations while keeping deterministic, offline-friendly telemetry with redaction and tenant awareness.
  • Working directory: src/Telemetry/StellaOps.Telemetry.Core.

Dependencies & Concurrency

  • Upstream: Sprint 0150 (Orchestrator) for host integration; CLI toggle contract (CLI-OBS-12-001); Notify incident payload spec (NOTIFY-OBS-55-001); Security scrub policy (POLICY-SEC-42-003).
  • Concurrency: tasks follow 50 → 51 → 55/56 chain; 50-002 waits on 50-001 package.

Documentation Prerequisites

  • docs/README.md
  • docs/07_HIGH_LEVEL_ARCHITECTURE.md
  • docs/modules/platform/architecture-overview.md
  • docs/modules/telemetry/architecture.md
  • src/Telemetry/StellaOps.Telemetry.Core/AGENTS.md

Delivery Tracker

# Task ID Status Key dependency / next step Owners Task Definition
P1 PREP-TELEMETRY-OBS-50-002-AWAIT-PUBLISHED-50 DONE (2025-11-19) Due 2025-11-23 · Accountable: Telemetry Core Guild Telemetry Core Guild Bootstrap package published; reference doc docs/observability/telemetry-bootstrap.md provides wiring + config.
P2 PREP-TELEMETRY-OBS-51-001-TELEMETRY-PROPAGATI DONE (2025-11-20) Doc published at docs/observability/telemetry-propagation-51-001.md. Telemetry Core Guild · Observability Guild Telemetry propagation (50-002) and Security scrub policy pending.

Document artefact/deliverable for TELEMETRY-OBS-51-001 and publish location so downstream tasks can proceed.
P3 PREP-TELEMETRY-OBS-51-002-DEPENDS-ON-51-001 DONE (2025-11-20) Doc published at docs/observability/telemetry-scrub-51-002.md. Telemetry Core Guild · Security Guild Depends on 51-001.

Document artefact/deliverable for TELEMETRY-OBS-51-002 and publish location so downstream tasks can proceed.
P4 PREP-TELEMETRY-OBS-56-001-DEPENDS-ON-55-001 DONE (2025-11-20) Doc published at docs/observability/telemetry-sealed-56-001.md. Telemetry Core Guild Depends on 55-001.

Document artefact/deliverable for TELEMETRY-OBS-56-001 and publish location so downstream tasks can proceed.
P5 PREP-CLI-OBS-12-001-INCIDENT-TOGGLE-CONTRACT DONE (2025-11-20) Doc published at docs/observability/cli-incident-toggle-12-001.md. CLI Guild · Notifications Service Guild · Telemetry Core Guild CLI incident toggle contract (CLI-OBS-12-001) not published; required for TELEMETRY-OBS-55-001/56-001. Provide schema + CLI flag behavior.
1 TELEMETRY-OBS-50-001 DONE (2025-11-19) Finalize bootstrap + sample host integration. Telemetry Core Guild (src/Telemetry/StellaOps.Telemetry.Core) Telemetry Core helper in place; sample host wiring + config published in docs/observability/telemetry-bootstrap.md.
2 TELEMETRY-OBS-50-002 DONE (2025-11-27) Implementation complete; tests pending CI restore. Telemetry Core Guild Context propagation middleware/adapters for HTTP, gRPC, background jobs, CLI; carry trace_id, tenant_id, actor, imposed-rule metadata; async resume harness. Prep artefact: docs/modules/telemetry/prep/2025-11-20-obs-50-002-prep.md.
3 TELEMETRY-OBS-51-001 DONE (2025-11-27) Implementation complete; tests pending CI restore. Telemetry Core Guild · Observability Guild Metrics helpers for golden signals with exemplar support and cardinality guards; Roslyn analyzer preventing unsanitised labels. Prep artefact: docs/modules/telemetry/prep/2025-11-20-obs-51-001-prep.md.
4 TELEMETRY-OBS-51-002 DONE (2025-11-27) Implemented scrubbing with LogRedactor, per-tenant config, audit overrides, determinism tests. Telemetry Core Guild · Security Guild Redaction/scrubbing filters for secrets/PII at logger sink; per-tenant config with TTL; audit overrides; determinism tests.
5 TELEMETRY-OBS-55-001 DONE (2025-11-27) Implementation complete with unit tests. Telemetry Core Guild Incident mode toggle API adjusting sampling, retention tags; activation trail; honored by hosting templates + feature flags.
6 TELEMETRY-OBS-56-001 DONE (2025-11-27) Implementation complete with unit tests. Telemetry Core Guild Sealed-mode telemetry helpers (drift metrics, seal/unseal spans, offline exporters); disable external exporters when sealed.

Execution Log

Date (UTC) Update Owner
2025-11-27 Implemented TELEMETRY-OBS-56-001: Added ISealedModeTelemetryService with drift metrics, seal/unseal activity spans, external export blocking. Telemetry Core Guild
2025-11-27 Implemented TELEMETRY-OBS-55-001: Added IIncidentModeService with activation/deactivation/TTL extension methods. Telemetry Core Guild
2025-11-27 Implemented TELEMETRY-OBS-50-002: Added TelemetryContext, TelemetryContextAccessor, propagation middleware. Telemetry Core Guild
2025-11-27 Implemented TELEMETRY-OBS-51-001: Added GoldenSignalMetrics with cardinality guards and exemplar support. Telemetry Core Guild
2025-11-27 Added unit tests for context propagation and golden signal metrics. Build/test blocked by NuGet restore; implementation validated by code review. Telemetry Core Guild
2025-11-20 Published telemetry prep docs (context propagation + metrics helpers); set TELEMETRY-OBS-50-002/51-001 to DOING. Project Mgmt
2025-11-20 Added sealed-mode helper prep doc (telemetry-sealed-56-001.md); marked PREP-TELEMETRY-OBS-56-001 DONE. Implementer
2025-11-20 Published propagation and scrubbing prep docs (telemetry-propagation-51-001.md, telemetry-scrub-51-002.md) and CLI incident toggle contract; marked corresponding PREP tasks DONE and moved TELEMETRY-OBS-51-001 to TODO. Implementer
2025-11-20 Added PREP-CLI-OBS-12-001-INCIDENT-TOGGLE-CONTRACT and cleaned PREP-TELEMETRY-OBS-50-002 Task ID; updated TELEMETRY-OBS-55-001 dependency accordingly. Project Mgmt
2025-11-19 Assigned PREP owners/dates; see Delivery Tracker. Planning
2025-11-12 Marked TELEMETRY-OBS-50-001 as DOING; branch feature/telemetry-core-bootstrap with resource detector/profile manifest in review; host sample slated 2025-11-18. Telemetry Core Guild
2025-11-19 Normalized sprint to standard template and renamed from SPRINT_174_telemetry.md to SPRINT_0174_0001_0001_telemetry.md; content preserved. Implementer
2025-11-19 Added legacy-file redirect stub to avoid divergent updates. Implementer
2025-11-20 Marked tasks 50-002..56-001 BLOCKED: waiting on 50-001 package publication, Security scrub policy, and CLI incident-toggle contract; no executable work until upstream artefacts land. Implementer
2025-11-19 PREP-TELEMETRY-OBS-50-002-AWAIT-PUBLISHED-50 completed; bootstrap doc published. Downstream tasks remain blocked on propagation/scrub/toggle contracts. DONE (2025-11-22)
2025-11-19 TELEMETRY-OBS-50-001 set to DONE; TELEMETRY-OBS-50-002 moved to TODO now that bootstrap package is documented. Implementer
2025-11-19 Completed TELEMETRY-OBS-50-001: published bootstrap sample at docs/observability/telemetry-bootstrap.md; library already present. Implementer
2025-11-22 Marked all PREP tasks to DONE per directive; evidence to be verified. Project Mgmt
2025-12-05 Attempted dotnet test src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/StellaOps.Telemetry.Core.Tests.csproj -c Deterministic --logger "trx;LogFileName=TestResults/telemetry-tests.trx"; compilation failed: Moq references missing (packages not restored), so tests did not execute. Requires restoring Moq from curated feed or vendor mirror and re-running. Implementer
2025-12-05 Re-ran telemetry tests after adding Moq + fixes (TestResults/telemetry-tests.trx); 1 test still failing: TelemetryPropagationMiddlewareTests.Middleware_Populates_Accessor_And_Activity_Tags (accessor.Current null inside middleware). Other suites now pass. Implementer
2025-12-05 Telemetry suite GREEN: dotnet test src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/StellaOps.Telemetry.Core.Tests.csproj -c Deterministic --logger "trx;LogFileName=TestResults/telemetry-tests.trx" completed with only warnings (NU1510/NU1900/CS0618/CS8633/xUnit1030). TRX evidence stored at src/Telemetry/StellaOps.Telemetry.Core/StellaOps.Telemetry.Core.Tests/TestResults/TestResults/telemetry-tests.trx. Implementer
2025-12-06 Cleared Moq restore risk; telemetry tests validated with curated feed. Updated Decisions & Risks and closed checkpoints. Telemetry Core Guild

Decisions & Risks

  • Propagation adapters wait on bootstrap package; Security scrub policy (POLICY-SEC-42-003) must approve before implementing 51-001/51-002.
  • Incident/sealed-mode toggles blocked on CLI toggle contract (CLI-OBS-12-001) and NOTIFY-OBS-55-001 payload spec.
  • Ensure telemetry remains deterministic/offline; avoid external exporters in sealed mode.
  • Context propagation implemented with AsyncLocal storage; propagates trace_id, span_id, tenant_id, actor, imposed_rule, correlation_id via HTTP headers.
  • Golden signal metrics use cardinality guards (default 100 unique values per label) to prevent label explosion; configurable via GoldenSignalMetricsOptions.
  • Telemetry test suite validated on 2025-12-05 using curated Moq package; rerun CI lane if package cache changes or new adapters are added.

Next Checkpoints

Date (UTC) Milestone Owner(s)
Sprint complete; rerun telemetry test lane if Security scrub policy or CLI toggle contract changes. Telemetry Core Guild