doctor: complete runtime check documentation sprint

Signed-off-by: master <>
This commit is contained in:
master
2026-03-31 23:26:24 +03:00
parent 404d50bcb7
commit 152c1b1357
54 changed files with 2210 additions and 258 deletions


@@ -0,0 +1,48 @@
---
checkId: check.observability.tracing
plugin: stellaops.doctor.observability
severity: warn
tags: [observability, tracing, correlation]
---
# Distributed Tracing
## What It Checks
Validates trace enablement, propagator, sampling ratio, exporter type, and whether HTTP and database instrumentation are turned on.
The check reports info when tracing is explicitly disabled, and warns when the sampling ratio is invalid or too low, or when HTTP or database instrumentation is turned off.
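The rules above can be sketched as a small function. This is a hypothetical illustration, not the plugin's implementation. The function name, the flat `Tracing:*` key names, the defaults used for missing keys, and the `0.01` lower bound (borrowed from the bare-metal guidance below) are all assumptions. Propagator and exporter validation is left out because the accepted values are not documented here.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed lower bound, taken from the recommended 0.01 to 1.0 sampling range.
MIN_SAMPLING_RATIO = 0.01


@dataclass
class TracingResult:
    severity: str  # "pass", "info", or "warn"
    reasons: list[str] = field(default_factory=list)


def _as_bool(value: object, default: bool = True) -> bool:
    """Interpret config values such as "true"/"false"; missing keys use `default`."""
    if value is None:
        return default
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


def evaluate_tracing(config: dict[str, object]) -> TracingResult:
    """Hypothetical sketch of the tracing check over keys like "Tracing:SamplingRatio"."""
    # Explicitly disabled tracing is informational, not a warning.
    if not _as_bool(config.get("Tracing:Enabled")):
        return TracingResult("info", ["tracing is explicitly disabled"])

    reasons: list[str] = []
    raw_ratio = config.get("Tracing:SamplingRatio", "1.0")  # assumed default
    try:
        ratio = float(raw_ratio)
    except (TypeError, ValueError):
        reasons.append(f"sampling ratio {raw_ratio!r} is not a number")
    else:
        if not 0.0 <= ratio <= 1.0:
            reasons.append(f"sampling ratio {ratio} is outside [0, 1]")
        elif ratio < MIN_SAMPLING_RATIO:
            reasons.append(f"sampling ratio {ratio} is below {MIN_SAMPLING_RATIO}")

    # Missing instrumentation keys are treated as enabled (assumption).
    for kind in ("Http", "Database"):
        if not _as_bool(config.get(f"Tracing:Instrumentation:{kind}")):
            reasons.append(f"{kind} instrumentation is disabled")

    return TracingResult("warn" if reasons else "pass", reasons)
```

Collecting every reason rather than stopping at the first one mirrors how a doctor report is read: a single run should surface both a zero sampling ratio and disabled database spans.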
## Why It Matters
Tracing is the fastest way to understand cross-service latency and identify the exact hop that is failing. Disabling instrumentation, or sampling too few requests, removes the spans that provide that evidence.
## Common Causes
- Sampling ratio set to `0` during load testing and never restored
- Only outbound HTTP traces are enabled while database spans remain off
- Propagator or exporter defaults differ between services
## How to Fix
### Docker Compose
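```yaml
services:
  doctor-web:
    environment:
      # "__" separates nested keys: Tracing__SamplingRatio maps to Tracing:SamplingRatio
      Tracing__Enabled: "true"
      Tracing__SamplingRatio: "1.0"
      # Enable both HTTP and database spans so every hop is visible
      Tracing__Instrumentation__Http: "true"
      Tracing__Instrumentation__Database: "true"
```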
### Bare Metal / systemd
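Set the same keys in the service configuration using `:` as the separator (for example `Tracing:Enabled` and `Tracing:Instrumentation:Database`). Keep `Tracing:SamplingRatio` between `0.01` and `1.0` unless you are deliberately suppressing traces for a benchmark, and restore it afterwards.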
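To make the `0.01` to `1.0` range concrete, here is a minimal sketch of ratio-based sampling keyed on the trace ID. It is similar in spirit to OpenTelemetry's `TraceIdRatioBased` sampler but is not its exact algorithm; the `should_sample` name and the use of the low 64 bits of the trace ID are assumptions. A ratio of `0.01` keeps roughly one trace in a hundred, while `0` keeps none.

```python
def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Decide whether to keep a trace, given its 32-character hex trace ID.

    Hypothetical sketch: the decision depends only on the trace ID and the
    ratio, so services configured with the same ratio reach the same decision.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError(f"sampling ratio must be within [0, 1], got {ratio}")
    # Treat the low 64 bits of the trace ID as a uniformly distributed integer.
    low_bits = int(trace_id_hex[-16:], 16)
    # Keep the trace if it falls in the lowest `ratio` fraction of the 64-bit range.
    return low_bits < int(ratio * 2**64)


# Evenly spaced trace IDs: about 1% should be kept at a ratio of 0.01.
trace_ids = [f"{i * (2**64 // 1000):032x}" for i in range(1000)]
kept = sum(should_sample(t, 0.01) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

OpenTelemetry SDKs typically combine a ratio sampler with a parent-based sampler so downstream spans follow the caller's decision; this sketch leaves that out.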
### Kubernetes / Helm
Apply the same tracing configuration (enablement, propagator, sampling ratio, and exporter) to every service in the release path so trace context and correlation IDs stay intact across hops.
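One way to catch the "defaults differ between services" cause before it breaks correlation is to compare the tracing keys each service actually resolves. The sketch below is hypothetical: `find_tracing_drift`, the `doctor-worker` service, and the `Tracing:Propagator` and `Tracing:Exporter` key names are illustrative assumptions, and it expects each service's effective configuration to have been collected already as flat key/value pairs.

```python
from typing import Mapping, Optional

# Hypothetical key names: the propagator and exporter keys are assumptions.
TRACING_KEYS = (
    "Tracing:Enabled",
    "Tracing:SamplingRatio",
    "Tracing:Propagator",
    "Tracing:Exporter",
)


def find_tracing_drift(
    services: Mapping[str, Mapping[str, str]],
) -> dict[str, dict[str, Optional[str]]]:
    """Return, per key, each service's raw value wherever services disagree.

    Values are compared as raw strings, so "1" and "1.0" count as different.
    A key missing from every service is not reported.
    """
    drift: dict[str, dict[str, Optional[str]]] = {}
    for key in TRACING_KEYS:
        values = {name: config.get(key) for name, config in services.items()}
        if len(set(values.values())) > 1:
            drift[key] = values
    return drift


services = {
    "doctor-web": {"Tracing:Enabled": "true", "Tracing:SamplingRatio": "1.0"},
    "doctor-worker": {"Tracing:Enabled": "true", "Tracing:SamplingRatio": "0"},
}
for key, values in find_tracing_drift(services).items():
    print(key, values)
```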
## Verification
```bash
stella doctor --check check.observability.tracing
```
## Related Checks
- `check.observability.otel` - exporter connectivity must work before traces leave the process
- `check.servicegraph.timeouts` - tracing is most useful when diagnosing timeout-related issues