git.stella-ops.org/docs/doctor/articles/observability/observability-tracing.md
2026-03-31 23:26:24 +03:00

checkId: check.observability.tracing
plugin: stellaops.doctor.observability
severity: warn
tags: observability, tracing, correlation

Distributed Tracing

What It Checks

Validates whether tracing is enabled, which propagator, sampling ratio, and exporter type are configured, and whether HTTP and database instrumentation are turned on.

The check reports info when tracing is explicitly disabled. It warns when the sampling ratio is invalid or too low, or when HTTP or database instrumentation is turned off.
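
The rules above can be approximated as follows. This is a minimal Python sketch, not the check's actual implementation. The flat Tracing:* key names follow the Tracing:SamplingRatio spelling used in this article. The 0.01 threshold is borrowed from the Bare Metal guidance further down, and the defaults applied to missing keys are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Mapping

# ASSUMPTION: lower bound borrowed from this article's 0.01 guidance;
# the real check may use a different threshold.
MIN_SAMPLING_RATIO = 0.01


@dataclass
class TracingResult:
    severity: str  # "pass", "info" or "warn"
    reasons: List[str] = field(default_factory=list)


def _is_on(config: Mapping[str, str], key: str, default: str) -> bool:
    """Read a boolean setting; the default for a missing key is an assumption."""
    return config.get(key, default).strip().lower() == "true"


def evaluate_tracing(config: Mapping[str, str]) -> TracingResult:
    """Approximate the documented rules; a sketch, not the real check."""
    # Only an explicit "false" counts as disabled; a missing key is assumed on.
    if config.get("Tracing:Enabled", "").strip().lower() == "false":
        return TracingResult("info", ["tracing is explicitly disabled"])

    reasons: List[str] = []
    raw = config.get("Tracing:SamplingRatio", "1.0")  # ASSUMPTION: default 1.0
    try:
        ratio = float(raw)
    except ValueError:
        ratio = float("nan")
    # NaN fails this range test too, so unparseable values land here.
    if not 0.0 <= ratio <= 1.0:
        reasons.append(f"sampling ratio {raw!r} is invalid")
    elif ratio < MIN_SAMPLING_RATIO:
        reasons.append(f"sampling ratio {ratio} is below {MIN_SAMPLING_RATIO}")

    for name in ("Http", "Database"):
        if not _is_on(config, f"Tracing:Instrumentation:{name}", "true"):
            reasons.append(f"{name} instrumentation is turned off")

    return TracingResult("warn" if reasons else "pass", reasons)
```

For example, a sampling ratio of 0 combined with database instrumentation turned off yields severity warn with two reasons, which matches the first two common causes listed below.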

Why It Matters

Tracing is the fastest way to understand cross-service latency and identify the exact hop that is failing. Disabling instrumentation removes that evidence.

Common Causes

  • Sampling ratio set to 0 during load testing and never restored
  • Only outbound HTTP traces are enabled while database spans remain off
  • Propagator or exporter defaults differ between services

How to Fix

Docker Compose

services:
  doctor-web:
    environment:
      Tracing__Enabled: "true"
      Tracing__SamplingRatio: "1.0"
      Tracing__Instrumentation__Http: "true"
      Tracing__Instrumentation__Database: "true"
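
The Compose variables above and the Tracing:SamplingRatio key used elsewhere in this article appear to be two spellings of the same setting. This assumes the services follow the .NET configuration convention, where __ in an environment variable name stands for the : section separator. A small Python sketch of that mapping can help when comparing a Compose file with file-based configuration:

```python
import os
from typing import Dict, Mapping


def env_to_config(env: Mapping[str, str], prefix: str = "Tracing__") -> Dict[str, str]:
    """Map Tracing__* variables to colon-separated keys (.NET-style, assumed).

    Unlike .NET, this sketch matches the prefix case-sensitively.
    """
    return {
        name.replace("__", ":"): value
        for name, value in env.items()
        if name.startswith(prefix)
    }


if __name__ == "__main__":
    # Print the tracing settings visible to the current process.
    for key, value in sorted(env_to_config(os.environ).items()):
        print(f"{key} = {value}")
```

Run inside a container, this lists the tracing settings the process actually received, which makes it easy to spot a value that Compose overrode or never set.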

Bare Metal / systemd

Keep Tracing:SamplingRatio between 0.01 and 1.0 unless you are deliberately suppressing traces for a benchmark.
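
To show what the ratio means in practice, here is a sketch of trace-ID ratio sampling, loosely modeled on the TraceIdRatioBased sampler in the OpenTelemetry Python SDK. It assumes that Tracing:SamplingRatio feeds a sampler of this kind, which this article does not confirm.

```python
import random

# Compare the low 64 bits of a 128-bit trace ID against a bound.
TRACE_ID_LIMIT = (1 << 64) - 1


def should_sample(trace_id: int, ratio: float) -> bool:
    """Keep a trace when the low 64 bits of its ID fall below ratio * 2**64."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("sampling ratio must be between 0.0 and 1.0")
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


def sampled_fraction(ratio: float, n: int = 100_000, seed: int = 7) -> float:
    """Estimate the share of random trace IDs kept at a given ratio."""
    rng = random.Random(seed)
    kept = sum(should_sample(rng.getrandbits(128), ratio) for _ in range(n))
    return kept / n
```

At 0.01, roughly one trace in a hundred is kept. At 0, none are, which is why a ratio left at 0 after load testing silently removes all trace evidence. In this sketch the decision depends only on the trace ID, so services configured with the same ratio keep or drop the same traces.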

Kubernetes / Helm

Apply the same trace configuration (propagator, sampling ratio, and exporter) to every service in the release path so that trace context, and with it correlation IDs, survives each hop.
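
Drift between services is easy to miss by eye. Below is a minimal Python sketch that compares per-service tracing settings and reports every key whose values disagree. Tracing__Propagator, Tracing__Exporter, and the other-api service name are hypothetical placeholders, because this article does not name those settings or services.

```python
from typing import Dict, Mapping, Sequence

# Settings to compare. Tracing__Propagator and Tracing__Exporter are
# HYPOTHETICAL names; replace them with the keys your services actually use.
TRACE_KEYS: Sequence[str] = (
    "Tracing__Enabled",
    "Tracing__SamplingRatio",
    "Tracing__Propagator",
    "Tracing__Exporter",
)


def find_drift(
    services: Mapping[str, Mapping[str, str]],
    keys: Sequence[str] = TRACE_KEYS,
) -> Dict[str, Dict[str, str]]:
    """Return {key: {service: value}} for every key whose values differ."""
    drift: Dict[str, Dict[str, str]] = {}
    for key in keys:
        values = {name: env.get(key, "<unset>") for name, env in services.items()}
        if len(set(values.values())) > 1:
            drift[key] = values
    return drift


services = {
    "doctor-web": {"Tracing__Enabled": "true", "Tracing__SamplingRatio": "1.0"},
    "other-api": {"Tracing__Enabled": "true", "Tracing__SamplingRatio": "0"},  # hypothetical
}
print(find_drift(services))
# {'Tracing__SamplingRatio': {'doctor-web': '1.0', 'other-api': '0'}}
```

A key that is set in one service and missing in another also counts as drift, since the missing side is reported as <unset>.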

Verification

stella doctor --check check.observability.tracing

Related Checks

  • check.observability.otel - exporter connectivity must work before traces can leave the process
  • check.servicegraph.timeouts - tracing is most useful when diagnosing timeout-related issues