Telemetry Standards (DOCS-OBS-50-002)
Last updated: 2025-11-25 (Docs Tasks Md.VI)
Common envelope
- Trace context: `trace_id`, `span_id`, `trace_flags`; propagate W3C `traceparent` and `baggage` end to end.
- Tenant & workload: `tenant`, `workload` (service name), `region`, `env` (dev/stage/prod), `version` (git sha or semver).
- Subject: `component` (module), `operation` (verb/name), `resource` (purl/uri/subject id when safe).
- Timing: UTC ISO-8601 `timestamp`; durations as integer milliseconds.
- Outcome: `status` (ok|error|fault|throttle), `error.code` (machine), `error.message` (human, redacted), `retryable` (bool).
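As an illustration of the envelope fields above, the following minimal sketch assembles one record in Python. The function name `make_envelope`, the flat dotted key layout, the `duration_ms` key, and the choice to store the rendered `traceparent` next to its parts are assumptions made for the example, not part of this standard.

```python
from datetime import datetime, timezone


def make_envelope(trace_id, span_id, *, tenant, workload, region, env, version,
                  component, operation, resource, duration_ms, status="ok",
                  error_code=None, error_message=None, retryable=False,
                  sampled=True, now=None):
    """Build one telemetry record using the field names of the common envelope.

    Hypothetical helper: the key layout is an assumption for illustration.
    """
    now = now or datetime.now(timezone.utc)
    flags = "01" if sampled else "00"  # W3C trace-flags: bit 0 is "sampled"
    # UTC ISO-8601 with a fixed millisecond precision and a "Z" suffix.
    timestamp = (now.astimezone(timezone.utc)
                 .isoformat(timespec="milliseconds")
                 .replace("+00:00", "Z"))
    record = {
        "trace_id": trace_id,
        "span_id": span_id,
        "trace_flags": flags,
        # W3C traceparent layout: version-traceid-parentid-flags
        "traceparent": f"00-{trace_id}-{span_id}-{flags}",
        "tenant": tenant,
        "workload": workload,
        "region": region,
        "env": env,
        "version": version,
        "component": component,
        "operation": operation,
        "resource": resource,
        "timestamp": timestamp,
        "duration_ms": int(duration_ms),  # integer milliseconds (assumed key name)
        "status": status,
        "retryable": bool(retryable),
    }
    if status != "ok":
        record["error.code"] = error_code
        record["error.message"] = error_message  # must already be redacted
    return record
```

Passing `now` explicitly keeps the helper deterministic in tests; production callers would normally omit it.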
Scrubbing policy
- Denylist PII/secrets before emit: emails, tokens, Authorization headers, bearer fragments, private keys, passwords, session IDs.
- Redact fields to `"[redacted]"` and add `redaction.reason` (secret|pii|tenant_policy).
- Hash low-cardinality identifiers when needed (`sha256` lowercase hex) and mark `hashed=true`.
- Logs must not contain full request/response bodies; store hashes plus lengths. For NDJSON exports, allow hashes + selected headers only.
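The scrubbing rules above could be sketched roughly as follows. The denylisted key names, the email pattern, the per-field shape of `redaction.reason`, and the `body.sha256` / `body.length` keys are all assumptions for illustration; a real denylist would be broader and owned by the component spec.

```python
import hashlib
import re

REDACTED = "[redacted]"

# Hypothetical denylist of field names that always carry secrets.
SECRET_KEYS = {"authorization", "password", "token", "session_id", "private_key"}

# Deliberately simple email matcher; an assumption, not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def scrub(fields: dict) -> dict:
    """Return a copy of `fields` with secrets and PII replaced before emit."""
    out = {}
    reasons = {}
    for key, value in fields.items():
        if key.lower() in SECRET_KEYS:
            out[key] = REDACTED
            reasons[key] = "secret"
        elif isinstance(value, str) and EMAIL_RE.search(value):
            out[key] = REDACTED
            reasons[key] = "pii"
        else:
            out[key] = value
    if reasons:
        # Assumed shape: one reason per redacted field.
        out["redaction.reason"] = reasons
    return out


def hash_identifier(value: str) -> dict:
    """Hash an identifier as sha256 lowercase hex and mark it as hashed."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return {"value": digest, "hashed": True}


def summarize_body(body: bytes) -> dict:
    """Replace a request/response body with its hash and length."""
    return {"body.sha256": hashlib.sha256(body).hexdigest(),
            "body.length": len(body)}
```

For example, `scrub({"Authorization": "Bearer abc", "operation": "scan"})` keeps `operation` untouched and replaces the header value with `"[redacted]"`.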
Sampling defaults
- Traces: 10% head sampling non-prod; 100% for `status=error|fault` and for spans tagged `audit=true`. Prod default 5% with the same error/audit boost.
- Logs: info logs rate-limited per component (default 100/s); warn/error never sampled. Structured JSON only.
- Metrics: never sampled; counters/gauges/histograms use deterministic bucket boundaries documented in component specs.
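One way to express the trace sampling defaults above is a deterministic decision keyed on the trace id, sketched below. The function names and the use of the low 64 bits of the trace id as the sampling bucket are assumptions for the example; since span status is only known once a span ends, a real deployment would apply the error/audit boost at a later stage than the head decision.

```python
def head_sample_rate(env: str) -> float:
    """Default head sampling rate: 5% in prod, 10% elsewhere."""
    return 0.05 if env == "prod" else 0.10


def should_keep_trace(trace_id: str, env: str,
                      status: str = "ok", audit: bool = False) -> bool:
    """Decide whether to keep a trace under the documented defaults.

    Hypothetical helper. Errors, faults, and audit spans are always kept;
    everything else is kept according to a deterministic ratio so that every
    service seeing the same trace id reaches the same decision.
    """
    if status in ("error", "fault") or audit:
        return True
    # Map the low 64 bits of the 32-hex-digit trace id onto [0, 1].
    bucket = int(trace_id[-16:], 16) / 2**64
    return bucket < head_sample_rate(env)
```

Deriving the decision from the trace id rather than a random draw keeps sampling reproducible, which fits the determinism requirements later in this document.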
Redaction override procedure
- Overrides are rare and must be auditable.
- To allow a field temporarily, set `telemetry.redaction.overrides=<comma list>` in service config with change-ticket id; emit `redaction.override=true` tag on affected spans/logs.
- Overrides expire automatically after `telemetry.redaction.override_ttl` (default 24h); services refuse to start with expired overrides.
- All overrides are logged to `telemetry.redaction.audit` channel with actor, ticket, fields, TTL.
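The startup behaviour described above might look like the sketch below. The `RedactionOverride` type, its `created_at` field, and both function names are hypothetical; how a service actually records when an override was granted is not specified here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


def parse_override_list(raw: str) -> tuple:
    """Split the comma list from telemetry.redaction.overrides into field names."""
    return tuple(name.strip() for name in raw.split(",") if name.strip())


@dataclass(frozen=True)
class RedactionOverride:
    """Hypothetical in-memory form of one override entry."""
    fields: tuple
    ticket: str            # change-ticket id
    actor: str
    created_at: datetime   # timezone-aware, UTC (assumed to be recorded somewhere)
    ttl: timedelta = timedelta(hours=24)  # default telemetry.redaction.override_ttl

    def expired(self, now: datetime) -> bool:
        return now >= self.created_at + self.ttl


def check_overrides_at_startup(overrides, now=None):
    """Refuse to start on expired overrides; otherwise return audit records."""
    now = now or datetime.now(timezone.utc)
    expired = [o.ticket for o in overrides if o.expired(now)]
    if expired:
        raise RuntimeError("expired redaction overrides: " + ", ".join(expired))
    # One record per override, carrying actor, ticket, fields, and TTL,
    # ready to be written to the telemetry.redaction.audit channel.
    return [
        {"actor": o.actor, "ticket": o.ticket, "fields": list(o.fields),
         "ttl_seconds": int(o.ttl.total_seconds())}
        for o in overrides
    ]
```

Failing fast at startup means a stale override cannot silently keep a field unredacted after its ticket window has closed.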
Determinism & offline posture
- No external enrichers; all enrichment data must be preloaded bundles (e.g., service map, tenant metadata).
- Sorting for exports: by `timestamp`, then `workload`, then `operation`.
- Time always UTC; avoid locale-specific formats.
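The export ordering above can be sketched as a small NDJSON writer. The function name `export_ndjson` is hypothetical, and the sketch assumes every `timestamp` uses the same UTC ISO-8601 layout (same precision, same `Z` suffix), because only then does string order match chronological order.

```python
import json


def export_ndjson(records) -> str:
    """Serialize records as NDJSON in a deterministic order.

    Sort key: timestamp, then workload, then operation. Keys inside each
    JSON object are also sorted so that identical inputs give identical bytes.
    """
    ordered = sorted(
        records,
        key=lambda r: (r["timestamp"], r["workload"], r["operation"]),
    )
    return "\n".join(
        json.dumps(r, sort_keys=True, separators=(",", ":")) for r in ordered
    )
```

With this ordering, exporting the same set of records in any input order yields the same output, provided no two records share all three sort keys.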
Validation checklist
- `traceparent` propagated and present on inbound/outbound.
- Required fields present (`tenant`, `workload`, `operation`, `status`).
- Scrubbing tests cover auth headers and bodies.
- Sampling knobs configurable via env vars with documented defaults.
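Parts of this checklist lend themselves to an automated check. The sketch below covers the required fields and the `traceparent` shape; `validate_record` and the idea of carrying `traceparent` as a record field are assumptions for the example. The pattern follows the W3C Trace Context layout (version, 32-hex trace id, 16-hex parent id, 2-hex flags), which also treats all-zero ids as invalid.

```python
import re

TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})"
    r"-(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)
REQUIRED_FIELDS = ("tenant", "workload", "operation", "status")
ALLOWED_STATUS = {"ok", "error", "fault", "throttle"}


def is_valid_traceparent(value: str) -> bool:
    """Check the header shape and reject reserved or all-zero values."""
    match = TRACEPARENT_RE.match(value or "")
    if not match:
        return False
    if match["version"] == "ff":
        return False
    return (set(match["trace_id"]) != {"0"}
            and set(match["parent_id"]) != {"0"})


def validate_record(record: dict) -> list:
    """Return a list of human-readable problems; empty means the record passes."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS
                if not record.get(name)]
    status = record.get("status")
    if status and status not in ALLOWED_STATUS:
        problems.append(f"unknown status: {status}")
    if not is_valid_traceparent(record.get("traceparent", "")):
        problems.append("invalid traceparent")
    return problems
```

A check like this could run in CI against sample records emitted by each component; the scrubbing and sampling items on the checklist still need their own tests.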