Files
git.stella-ops.org/docs/observability/telemetry-propagation-51-001.md
StellaOps Bot 8abbf9574d up
2025-11-27 21:10:06 +02:00

3.1 KiB

Telemetry propagation contract (TELEMETRY-OBS-51-001)

Goal: standardise trace/metrics propagation across StellaOps services so golden-signal helpers remain deterministic, tenant-safe, and offline-friendly.

Scope

  • Applies to HTTP, gRPC, background jobs, and message handlers instrumented via StellaOps.Telemetry.Core.
  • Complements bootstrap guide (telemetry-bootstrap.md) and precedes metrics helper implementation.

Required context fields

  • trace_id / span_id: W3C TraceContext headers only (no B3); generate if missing.
  • tenant: lower-case string; required for all incoming requests; default to unknown only in sealed/offline diagnostics jobs.
  • actor: optional user/service principal; redacted to hash in logs when Scrub.Sealed=true.
  • imposed_rule: optional string conveying enforcement context (e.g., merge=false).

HTTP middleware

  • Accept traceparent/tracestate; reject/strip vendor-specific headers.
  • Propagate tenant, actor, imposed-rule via x-stella-tenant, x-stella-actor, x-stella-imposed-rule headers (defaults configurable via Telemetry:Propagation).
  • Middleware entry point: app.UseStellaOpsTelemetryContext() plus the TelemetryPropagationHandler automatically added to all HttpClient instances when AddStellaOpsTelemetry is called.
  • Emit exemplars: when sampling is off, attach exemplar ids to request duration and active request metrics.

gRPC interceptors

  • Use binary TraceContext; carry metadata keys stella-tenant, stella-actor, stella-imposed-rule.
  • Enforce presence of tenant; abort with Unauthenticated if missing in non-sealed mode.

Jobs & message handlers

  • Wrap background job execution with Activity + baggage items (tenant, actor, imposed_rule).
  • When publishing bus events, stamp trace_id and tenant into headers; avoid embedding PII in payloads.

Metrics helper expectations

  • Golden signals: http.server.duration, http.client.duration, messaging.operation.duration, job.execution.duration, runtime.gc.pause, db.call.duration.
  • Mandatory tags: tenant, service, endpoint/operation, result (ok|error|cancelled|throttled), sealed (true|false).
  • Cardinality guard: trim tag values to 64 chars (configurable) and replace values beyond the first 50 distinct entries per key with other (enforced by MetricLabelGuard).
  • Helper API: Histogram<double>.RecordRequestDuration(guard, durationMs, route, verb, status, result) applies guard + tags consistently.

Determinism & offline posture

  • All timestamps UTC RFC3339; sampling configs controlled via appsettings and mirrored in offline bundles.
  • No external exporters when Sealed=true; use in-memory or file-based OTLP for air-gap.

Tests to add with implementation

  • Middleware unit tests asserting header/baggage mapping and tenant enforcement.
  • Metrics helper tests ensuring required tags present and trimmed; exemplar id attached when enabled.
  • Deterministic snapshot tests for serialized OTLP when sealed/offline.

Provenance

  • Authored 2025-11-20 to unblock TELEMETRY-OBS-51-001; to be refined as helpers are coded.