# Tracing Standards (DOCS-OBS-50-004)

Last updated: 2025-11-25 (Docs Tasks Md.VI)

## Goals
- Consistent distributed tracing across services (API, workers, CLI).
- Safe for offline/air-gapped deployments.
- Deterministic span data for replay/debug.

## Context propagation
- Use W3C headers: `traceparent` (required), `baggage` (optional key/value pairs).
- Preserve the incoming `trace_id` for all downstream calls; create a child span per operation.
- For async work (queues, cron), copy `traceparent` and `baggage` into the message envelope; the new span references the stored context through span **links** rather than adopting it as a new parent.

## Span conventions
- Names: `<service>.<operation>` (e.g., `riskengine.simulate`, `notify.deliver`).
- Required attributes: `tenant`, `workload` (service), `env`, `region`, `version`, `operation`, `status`.
- HTTP spans: add `http.method`, `http.route`, `http.status_code`, `net.peer.name`, `net.peer.port`.
- DB spans: add `db.system`, `db.name`, `db.operation`, `db.statement` (omit literals).
- Message spans: add `messaging.system`, `messaging.destination`, `messaging.operation` (`send|receive|process`), `messaging.message_id`.
- Errors: set `status=error`; include `error.code`, a redacted `error.message`, and `retryable` (bool).

## Sampling
- Default head sampling: 10% non-prod, 5% prod.
- Always sample spans with `status=error|fault` or `audit=true`.
- Allow a per-service override via the env variable `Tracing__SampleRate` (0–1); document it in runbooks.

## Offline/air-gap posture
- No external exporters; emit OTLP to a local collector or to file.
- Disable remote enrichment; rely on the bundled service map.
- All timestamps in UTC; span ids are deterministic only within the scope of a `traceparent` (no GUID reuse).

## Validation checklist
- [ ] `traceparent` forwarded on every inbound/outbound call.
- [ ] Required attributes present on spans.
- [ ] Error spans include codes and redacted messages.
- [ ] Sampling knobs documented in service config.
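
The context-propagation and sampling rules above can be illustrated with a small, library-free Python sketch. It is not a reference implementation: the function names (`make_envelope`, `link_from_envelope`, `sample_rate`, `should_sample`) and the envelope's dict layout are hypothetical, the parser accepts only the four-field `traceparent` layout, and deciding on the low 64 bits of the `trace_id` is an assumed way to keep sampling decisions repeatable per trace. A real service would normally delegate this to its tracing SDK.

```python
import os
import re

# W3C traceparent layout: version-traceid-parentid-flags (lowercase hex).
# Simplification: only the four-field layout is accepted.
_TRACEPARENT_RE = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header):
    """Return (trace_id, span_id, flags), or None if the header is malformed."""
    match = _TRACEPARENT_RE.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero trace or span ids are invalid.
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return trace_id, span_id, int(flags, 16)


def make_envelope(payload, headers):
    """Copy trace context from inbound headers into a (hypothetical) message envelope."""
    traceparent = headers.get("traceparent", "")
    if parse_traceparent(traceparent) is None:
        raise ValueError("traceparent is required and must be well-formed")
    envelope = {"payload": payload, "traceparent": traceparent}
    if "baggage" in headers:  # baggage is optional
        envelope["baggage"] = headers["baggage"]
    return envelope


def link_from_envelope(envelope):
    """Describe the stored context as a span link (not a parent) for the consumer span."""
    trace_id, span_id, _ = parse_traceparent(envelope["traceparent"])
    return {"trace_id": trace_id, "span_id": span_id}


def sample_rate(env, environ=os.environ):
    """Default head-sampling rate, overridable through Tracing__SampleRate (0-1)."""
    override = environ.get("Tracing__SampleRate")
    if override is not None:
        return min(1.0, max(0.0, float(override)))  # clamp to the documented range
    return 0.05 if env == "prod" else 0.10


def should_sample(attributes, trace_id, rate):
    """Always keep error, fault, and audit spans; otherwise decide from the trace id."""
    if attributes.get("status") in ("error", "fault"):
        return True
    if attributes.get("audit") in (True, "true"):
        return True
    # Assumption: map the low 64 bits of trace_id onto [0, 1) and compare to the rate,
    # so every span in one trace gets the same decision.
    return int(trace_id[16:], 16) < rate * 2**64
```

A producer would call `make_envelope(payload, request_headers)` before enqueueing, and the consumer would attach `link_from_envelope(envelope)` to the span it starts. Because a span's final `status` is usually known only when the span ends, the always-sample rule for errors may have to be applied at export time rather than at span start; the sketch leaves that choice to the caller.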