Tracing Standards (DOCS-OBS-50-004)
Last updated: 2025-11-25 (Docs Tasks Md.VI)
Goals
- Consistent distributed tracing across services (API, workers, CLI).
- Safe for offline/air-gapped deployments.
- Deterministic span data for replay/debug.
Context propagation
- Use W3C headers: `traceparent` (required), `baggage` (optional key/value pairs).
- Preserve incoming `trace_id` for all downstream calls; create child spans per operation.
- For async work (queues, cron), copy `traceparent` and `baggage` into the message envelope; new span links to the stored context using links, not a new parent.
Span conventions
- Names: `<component>.<operation>` (e.g., `riskengine.simulate`, `notify.deliver`).
- Required attributes: `tenant`, `workload` (service), `env`, `region`, `version`, `operation`, `status`.
- HTTP spans: add `http.method`, `http.route`, `http.status_code`, `net.peer.name`, `net.peer.port`.
- DB spans: `db.system`, `db.name`, `db.operation`, `db.statement` (omit literals).
- Message spans: `messaging.system`, `messaging.destination`, `messaging.operation` (send|receive|process), `messaging.message_id`.
- Errors: set `status=error`, include `error.code`, redacted `error.message`, `retryable` (bool).
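
As an illustration of how these conventions could be checked mechanically, the sketch below validates a span name and its attributes. The `validate_span` helper is hypothetical, and the name pattern (lowercase letters, digits and underscores in each segment) is an assumption inferred from the two examples above; the standard itself only specifies the `<component>.<operation>` shape.

```python
import re

REQUIRED_ATTRIBUTES = ("tenant", "workload", "env", "region",
                       "version", "operation", "status")
ERROR_ATTRIBUTES = ("error.code", "error.message", "retryable")

# Assumed pattern: two lowercase segments separated by a single dot.
SPAN_NAME_RE = re.compile(r"^[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*$")


def validate_span(name: str, attributes: dict) -> list:
    """Return a list of human-readable problems; an empty list means the span passes."""
    problems = []
    if not SPAN_NAME_RE.match(name):
        problems.append(f"name {name!r} is not <component>.<operation>")
    for key in REQUIRED_ATTRIBUTES:
        if key not in attributes:
            problems.append(f"missing required attribute {key!r}")
    if attributes.get("status") == "error":
        # Error spans carry a code, a redacted message, and a retry hint.
        for key in ERROR_ATTRIBUTES:
            if key not in attributes:
                problems.append(f"error span missing {key!r}")
        if "retryable" in attributes and not isinstance(attributes["retryable"], bool):
            problems.append("'retryable' must be a bool")
    return problems
```

A check like this could run in unit tests against spans captured from an in-memory exporter, so that missing attributes surface before deployment.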
Sampling
- Default head sampling: 10% non-prod, 5% prod.
- Always sample spans with `status=error|fault` or `audit=true`.
- Allow override via env `Tracing__SampleRate` (0–1) per service; document in runbooks.
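
The sampling rules could be combined into a single decision function along these lines. This is a sketch under several assumptions: the function names are hypothetical, the environment label for production is taken to be the string `prod`, an out-of-range or unparsable `Tracing__SampleRate` falls back to the default, and the status and audit attributes are assumed to be known when the decision is made. Deriving the decision from the trace id (rather than a random draw) is a design choice made here so that every service reaches the same verdict for a given trace.

```python
import os

PROD_RATE = 0.05      # default head sampling in prod
NON_PROD_RATE = 0.10  # default head sampling elsewhere


def sample_rate(env: str, environ=None) -> float:
    """Resolve the sampling rate, honouring a valid Tracing__SampleRate override."""
    environ = os.environ if environ is None else environ
    raw = environ.get("Tracing__SampleRate")
    if raw is not None:
        try:
            rate = float(raw)
        except ValueError:
            rate = None
        # NaN and out-of-range values fail this check and fall through.
        if rate is not None and 0.0 <= rate <= 1.0:
            return rate
    return PROD_RATE if env == "prod" else NON_PROD_RATE


def should_sample(trace_id: str, env: str, attributes: dict, environ=None) -> bool:
    """Decide whether to keep a span for the given 32-hex trace id."""
    # Errors, faults and audit spans bypass the rate entirely.
    if attributes.get("status") in ("error", "fault"):
        return True
    if attributes.get("audit") in (True, "true"):
        return True
    rate = sample_rate(env, environ)
    # Compare the low 64 bits of the trace id against the rate threshold.
    return int(trace_id[-16:], 16) < int(rate * 2**64)
```

With this shape, setting `Tracing__SampleRate=1` keeps every trace and `0` keeps only error, fault and audit spans.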
Offline/air-gap posture
- No external exporters; emit OTLP to local collector or file.
- Disable remote enrichment; rely on bundled service map.
- All timestamps UTC; span ids deterministic only in scope of traceparent (no GUID reuse).
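
One way to read the timestamp and span-id rules is sketched below. Everything here is an assumption for illustration: the helper names are hypothetical, the span id is derived by hashing the trace id, parent span id, span name and a per-trace sequence number (one possible interpretation of "deterministic only in scope of traceparent"), and the file writer emits plain JSON lines as a simplified stand-in rather than the OTLP wire format.

```python
import hashlib
import json
from datetime import datetime, timezone


def deterministic_span_id(trace_id: str, parent_span_id: str,
                          name: str, sequence: int) -> str:
    """Derive a 16-hex span id that is stable for the same position in a trace."""
    material = f"{trace_id}:{parent_span_id}:{name}:{sequence}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()[:16]


def utc_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as a UTC ISO 8601 string."""
    if moment.tzinfo is None:
        raise ValueError("naive datetimes are ambiguous; attach a timezone first")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def append_span(path: str, span: dict) -> None:
    """Append one span as a JSON line to a local file (no network involved)."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(span, sort_keys=True) + "\n")
```

Because the id depends on the trace id, replaying the same trace reproduces the same span ids, while a different trace yields different ones.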
Validation checklist
- `traceparent` forwarded on every inbound/outbound call.
- Required attributes present on spans.
- Error spans include codes and redacted messages.
- Sampling knobs documented in service config.