# Telemetry Standards (DOCS-OBS-50-002)

Last updated: 2025-11-25 (Docs Tasks Md.VI)

## Common envelope

- **Trace context**: `trace_id`, `span_id`, `trace_flags`; propagate W3C `traceparent` and `baggage` end to end.
- **Tenant & workload**: `tenant`, `workload` (service name), `region`, `env` (dev/stage/prod), `version` (git sha or semver).
- **Subject**: `component` (module), `operation` (verb/name), `resource` (purl/uri/subject id when safe).
- **Timing**: `timestamp` in UTC ISO-8601; durations as integer milliseconds.
- **Outcome**: `status` (`ok|error|fault|throttle`), `error.code` (machine), `error.message` (human, redacted), `retryable` (bool).
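
As an illustration of the envelope above, the following is a minimal sketch of how a service might assemble one event. The function names, the flat dotted keys (`error.code`), and the `duration_ms` field name are assumptions made for this example; the standard only fixes the field names listed in the bullets.

```python
from datetime import datetime, timezone

VALID_STATUSES = {"ok", "error", "fault", "throttle"}


def utc_timestamp(moment: datetime) -> str:
    """Render a timezone-aware datetime as UTC ISO-8601 with a trailing 'Z'."""
    as_utc = moment.astimezone(timezone.utc)
    return as_utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def build_envelope(*, trace_id, span_id, trace_flags, tenant, workload, region,
                   env, version, component, operation, status, duration_ms,
                   resource=None, error_code=None, error_message=None,
                   retryable=False, now=None):
    """Assemble one event dict (hypothetical helper, not a published API)."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    event = {
        # Trace context
        "trace_id": trace_id, "span_id": span_id, "trace_flags": trace_flags,
        # Tenant & workload
        "tenant": tenant, "workload": workload, "region": region,
        "env": env, "version": version,
        # Subject
        "component": component, "operation": operation,
        # Timing: UTC timestamp, integer milliseconds
        "timestamp": utc_timestamp(now or datetime.now(timezone.utc)),
        "duration_ms": int(duration_ms),  # assumed field name
        # Outcome
        "status": status, "retryable": bool(retryable),
    }
    if resource is not None:
        event["resource"] = resource  # only when safe to emit
    if error_code is not None:
        event["error.code"] = error_code
        event["error.message"] = error_message  # caller must redact first
    return event
```

Keyword-only arguments keep call sites explicit, which makes a missing envelope field fail loudly at the call rather than silently at export time.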

## Scrubbing policy

- Denylist PII/secrets before emit: emails, tokens, Authorization headers, bearer fragments, private keys, passwords, session IDs.
- Redact fields to `"[redacted]"` and add `redaction.reason` (`secret|pii|tenant_policy`).
- Hash low-cardinality identifiers when needed (`sha256` lowercase hex) and mark `hashed=true`.
- Logs must not contain full request/response bodies; store hashes plus lengths. For NDJSON exports, allow hashes + selected headers only.
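
The scrubbing rules above could be sketched roughly as follows. This is a simplified illustration, not a complete denylist: the key set, the two regular expressions, the `body.sha256`/`body.length` field names, and the choice to let `secret` take precedence over `pii` when both occur in one event are all assumptions made for the example.

```python
import hashlib
import re

REDACTED = "[redacted]"
# Assumed denylist of field names; a real service would load this from policy.
SECRET_KEYS = {"authorization", "password", "token", "private_key", "session_id"}
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
BEARER_RE = re.compile(r"(?i)bearer\s+[A-Za-z0-9._~+/=-]+")


def sha256_hex(data: bytes) -> str:
    """SHA-256 digest as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def hash_identifier(value: str) -> dict:
    """Replace an identifier with its hash and mark it as hashed."""
    return {"value": sha256_hex(value.encode("utf-8")), "hashed": True}


def summarize_body(body: bytes) -> dict:
    """Store a hash plus length instead of the request/response body."""
    return {"body.sha256": sha256_hex(body), "body.length": len(body)}


def scrub(fields: dict) -> dict:
    """Redact denylisted fields before emit and record why."""
    out, reasons = {}, set()
    for key, value in fields.items():
        text = value if isinstance(value, str) else ""
        if key.lower() in SECRET_KEYS or BEARER_RE.search(text):
            out[key] = REDACTED
            reasons.add("secret")
        elif EMAIL_RE.search(text):
            out[key] = REDACTED
            reasons.add("pii")
        else:
            out[key] = value
    if reasons:
        # Assumption: one reason per event, with "secret" winning over "pii".
        out["redaction.reason"] = "secret" if "secret" in reasons else "pii"
    return out
```

For example, `scrub({"Authorization": "Bearer abc.def", "operation": "scan"})` leaves `operation` untouched, replaces the header value with `"[redacted]"`, and sets `redaction.reason` to `secret`.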

## Sampling defaults

- **Traces**: non-prod defaults to 10% head sampling, with 100% retention for `status=error|fault` and for spans tagged `audit=true`. Prod defaults to 5% with the same error/audit boost.
- **Logs**: info logs rate-limited per component (default 100/s); warn/error never sampled. Structured JSON only.
- **Metrics**: never sampled; counters/gauges/histograms use deterministic bucket boundaries documented in component specs.
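
One way to picture the trace defaults above is a deterministic decision keyed on the trace ID, so every service reaches the same verdict for the same trace. The sketch below is an assumption-laden illustration: `should_sample` is a hypothetical helper, it derives the bucket from the last 8 hex characters of a 32-character `trace_id`, and it assumes the span's `status` and `audit` tag are already known when the decision is made (in practice the error/audit boost would be applied at or after span completion).

```python
DEFAULT_RATES = {"prod": 0.05}   # documented prod default
NON_PROD_RATE = 0.10             # documented non-prod default


def should_sample(trace_id: str, *, env: str, status: str = "ok",
                  audit: bool = False) -> bool:
    """Decide whether to keep a trace (hypothetical helper)."""
    # Error/fault spans and audit-tagged spans are always kept.
    if audit or status in {"error", "fault"}:
        return True
    rate = DEFAULT_RATES.get(env, NON_PROD_RATE)
    # Map the low 32 bits of the trace ID onto [0, 1) so the decision is
    # reproducible for a given trace ID, with no random state involved.
    bucket = int(trace_id[-8:], 16) / 2**32
    return bucket < rate
```

A trace ID ending in `10000000` lands at bucket 0.0625, so it is kept at the 10% non-prod rate but dropped at the 5% prod rate unless the span is an error, a fault, or tagged `audit=true`.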

## Redaction override procedure

- Overrides are rare and must be auditable.
- To allow a field temporarily, set `telemetry.redaction.overrides=<comma list>` in the service config together with the change-ticket ID, and emit a `redaction.override=true` tag on affected spans/logs.
- Overrides expire automatically after `telemetry.redaction.override_ttl` (default 24h); services refuse to start with expired overrides.
- All overrides are logged to `telemetry.redaction.audit` channel with actor, ticket, fields, TTL.
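
The override lifecycle above might look something like the sketch below. Only the config keys `telemetry.redaction.overrides` and `telemetry.redaction.override_ttl`, the 24h default, and the audit fields (actor, ticket, fields, TTL) come from this document; the `RedactionOverride` type, the `created_at` field, and the helper functions are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RedactionOverride:
    fields: tuple                        # field names exempt from redaction
    ticket: str                          # change-ticket ID
    actor: str                           # who requested the override
    created_at: datetime                 # assumed: when the override was set
    ttl: timedelta = timedelta(hours=24) # telemetry.redaction.override_ttl

    def is_expired(self, now: datetime) -> bool:
        return now >= self.created_at + self.ttl


def parse_override_fields(raw: str) -> tuple:
    """Parse the comma list from telemetry.redaction.overrides."""
    return tuple(part.strip() for part in raw.split(",") if part.strip())


def check_overrides_at_startup(overrides, now: datetime) -> None:
    """Refuse to start if any configured override has expired."""
    expired = [o.ticket for o in overrides if o.is_expired(now)]
    if expired:
        raise RuntimeError(f"expired redaction overrides: {', '.join(expired)}")


def audit_record(override: RedactionOverride) -> dict:
    """Record destined for the telemetry.redaction.audit channel."""
    return {
        "actor": override.actor,
        "ticket": override.ticket,
        "fields": list(override.fields),
        "ttl_seconds": int(override.ttl.total_seconds()),
    }
```

Failing at startup rather than silently dropping an expired override keeps the operator in the loop: the stale entry has to be removed or renewed under a new ticket.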

## Determinism & offline posture

- No external enrichers; all enrichment data must be preloaded bundles (e.g., service map, tenant metadata).
- Sorting for exports: by `timestamp`, then `workload`, then `operation`.
- Time always UTC; avoid locale-specific formats.
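
The export ordering above can be sketched as a single sort over three keys. The example assumes every `timestamp` is a UTC ISO-8601 string with the same precision and suffix, so that lexicographic order matches chronological order; `export_ndjson` is a hypothetical helper, and the sorted keys and compact separators are an extra assumption to make the output byte-stable.

```python
import json
from operator import itemgetter


def export_ndjson(events) -> str:
    """Serialize events as NDJSON in deterministic order."""
    # Sort by timestamp, then workload, then operation.
    ordered = sorted(events, key=itemgetter("timestamp", "workload", "operation"))
    # Stable key order and separators keep repeated exports byte-identical.
    return "".join(
        json.dumps(event, sort_keys=True, separators=(",", ":")) + "\n"
        for event in ordered
    )
```

Because `sorted` is stable, events that tie on all three keys keep their input order, so a fully reproducible export still depends on the input sequence being reproducible.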

## Validation checklist

- [ ] `traceparent` propagated and present on both inbound and outbound calls.
- [ ] Required fields present (`tenant`, `workload`, `operation`, `status`).
- [ ] Scrubbing tests cover auth headers and bodies.
- [ ] Sampling knobs configurable via env vars with documented defaults.
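
Parts of the checklist above lend themselves to automated checks. The sketch below is one possible shape, under stated assumptions: the helper names are hypothetical, header names are assumed to be lowercased already, the `traceparent` check accepts only the W3C version-00 field layout, and `TELEMETRY_TRACE_SAMPLE_RATE` is an invented environment variable name used purely for illustration.

```python
import re

REQUIRED_FIELDS = ("tenant", "workload", "operation", "status")
# W3C Trace Context layout: version-traceid-parentid-flags, lowercase hex.
TRACEPARENT_RE = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


def is_valid_traceparent(header) -> bool:
    """Check the header's shape and reject the reserved invalid values."""
    match = TRACEPARENT_RE.fullmatch(header or "")
    if not match:
        return False
    version, trace_id, parent_id, _flags = match.groups()
    return version != "ff" and trace_id != "0" * 32 and parent_id != "0" * 16


def validate_event(event: dict, headers: dict) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if not event.get(name)]
    if not is_valid_traceparent(headers.get("traceparent")):
        problems.append("missing or malformed traceparent")
    return problems


def trace_sample_rate(environ, default: float = 0.10) -> float:
    """Read the sampling knob from an env mapping (variable name assumed)."""
    raw = environ.get("TELEMETRY_TRACE_SAMPLE_RATE")
    if raw is None:
        return default
    rate = float(raw)
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"sample rate out of range: {rate}")
    return rate
```

Returning a list of problems instead of raising on the first one lets a CI job report every gap in a single run.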