Files
git.stella-ops.org/docs/observability/telemetry-propagation-51-001.md
StellaOps Bot 8abbf9574d up
2025-11-27 21:10:06 +02:00

46 lines
3.1 KiB
Markdown

# Telemetry propagation contract (TELEMETRY-OBS-51-001)
**Goal**: standardise trace/metrics propagation across StellaOps services so golden-signal helpers remain deterministic, tenant-safe, and offline-friendly.
## Scope
- Applies to HTTP, gRPC, background jobs, and message handlers instrumented via `StellaOps.Telemetry.Core`.
- Complements bootstrap guide (`telemetry-bootstrap.md`) and precedes metrics helper implementation.
## Required context fields
- `trace_id` / `span_id`: W3C TraceContext headers only (no B3); generate if missing.
- `tenant`: lower-case string; required for all incoming requests; default to `unknown` only in sealed/offline diagnostics jobs.
- `actor`: optional user/service principal; redacted to hash in logs when `Scrub.Sealed=true`.
- `imposed_rule`: optional string conveying enforcement context (e.g., `merge=false`).
## HTTP middleware
- Accept `traceparent`/`tracestate`; reject/strip vendor-specific headers.
- Propagate `tenant`, `actor`, `imposed-rule` via `x-stella-tenant`, `x-stella-actor`, `x-stella-imposed-rule` headers (defaults configurable via `Telemetry:Propagation`).
- Middleware entry point: `app.UseStellaOpsTelemetryContext()` plus the `TelemetryPropagationHandler` automatically added to all `HttpClient` instances when `AddStellaOpsTelemetry` is called.
- Emit exemplars: when sampling is off, attach exemplar ids to request duration and active request metrics.
## gRPC interceptors
- Use binary TraceContext; carry metadata keys `stella-tenant`, `stella-actor`, `stella-imposed-rule`.
- Enforce presence of `tenant`; abort with `Unauthenticated` if missing in non-sealed mode.
## Jobs & message handlers
- Wrap background job execution with Activity + baggage items (`tenant`, `actor`, `imposed_rule`).
- When publishing bus events, stamp `trace_id` and `tenant` into headers; avoid embedding PII in payloads.
## Metrics helper expectations
- Golden signals: `http.server.duration`, `http.client.duration`, `messaging.operation.duration`, `job.execution.duration`, `runtime.gc.pause`, `db.call.duration`.
- Mandatory tags: `tenant`, `service`, `endpoint`/`operation`, `result` (`ok|error|cancelled|throttled`), `sealed` (`true|false`).
- Cardinality guard: trim tag values to 64 chars (configurable) and replace values beyond the first 50 distinct entries per key with `other` (enforced by `MetricLabelGuard`).
- Helper API: `Histogram<double>.RecordRequestDuration(guard, durationMs, route, verb, status, result)` applies guard + tags consistently.
## Determinism & offline posture
- All timestamps UTC RFC3339; sampling configs controlled via appsettings and mirrored in offline bundles.
- No external exporters when `Sealed=true`; use in-memory or file-based OTLP for air-gap.
## Tests to add with implementation
- Middleware unit tests asserting header/baggage mapping and tenant enforcement.
- Metrics helper tests ensuring required tags present and trimmed; exemplar id attached when enabled.
- Deterministic snapshot tests for serialized OTLP when sealed/offline.
## Provenance
- Authored 2025-11-20 to unblock TELEMETRY-OBS-51-001; to be refined as helpers are coded.