Notifier Telemetry SLO Webhook Schema (1.0.0)
Purpose: define the payload emitted by Telemetry SLO evaluators toward Notifier so that NOTIFY-OBS-51-001 can consume alerts deterministically (online and offline).
Delivery contract
- Content-Type: application/json
- Encoding: UTF-8
- Authentication: mTLS (service identity) or DPoP/JWT with aud=notifier and scope=obs:slo:ingest.
- Determinism: timestamps are UTC ISO-8601 with Z; field order stable for hashing (see canonical JSON below).
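As an illustration of the timestamp rule, the following is a minimal Python sketch of how a producer might render a timezone-aware instant as UTC ISO-8601 with a trailing Z. The helper name to_utc_z is hypothetical, and whole-second precision is an assumption taken from the sample values in this document rather than a stated requirement.

```python
from datetime import datetime, timedelta, timezone


def to_utc_z(moment: datetime) -> str:
    """Render an aware datetime as UTC ISO-8601 with a trailing 'Z'."""
    if moment.tzinfo is None:
        # Naive datetimes are ambiguous, so refuse them instead of guessing a zone.
        raise ValueError("naive datetime: attach a timezone before formatting")
    # Assumption: whole-second precision, matching the samples in this document.
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# An instant expressed at UTC+01:00 is normalized to UTC before formatting.
local = datetime(2025, 11, 19, 13, 5, 7, tzinfo=timezone(timedelta(hours=1)))
print(to_utc_z(local))
```

Normalizing to UTC before formatting keeps the serialized string independent of the producer's local zone, which matters once the payload is hashed.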
Payload fields
{
  "id": "uuid",
  "tenant": "string",            // required; aligns with orchestrator/telemetry tenant id
  "service": "string",           // logical service name
  "host": "string",              // optional; k8s node/hostname
  "slo": {
    "name": "string",            // human-readable
    "id": "string",              // immutable key used for dedupe
    "objective": {
      "window": "PT5M",          // ISO-8601 duration
      "target": 0.995            // decimal between 0 and 1
    }
  },
  "metric": {
    "type": "latency|error|availability|custom",
    "value": 0.0123,             // double; units depend on type
    "unit": "seconds|ratio|percent|count",
    "labels": {                  // sanitized, deterministic ordering when serialized
      "endpoint": "/api/jobs",
      "method": "GET"
    }
  },
  "window": {
    "start": "2025-11-19T12:00:00Z",
    "end": "2025-11-19T12:05:00Z"
  },
  "breach": {
    "state": "breaching|warning|ok",
    "reason": "p95 latency above objective",
    "evidence": [
      {
        "type": "timeseries",
        "href": "cas://telemetry/series/abc123",
        "hash": "sha256:..."
      }
    ]
  },
  "quietHours": {
    "active": false,
    "policyId": null
  },
  "trace": {
    "trace_id": "optional-trace-id",
    "span_id": "optional-span-id"
  },
  "version": "1.0.0",
  "issued_at": "2025-11-19T12:05:07Z"
}
Canonical JSON rules
- Sort object keys lexicographically before hashing/signing.
- Use lowercase for enum-like fields shown above.
- version is required for evolution; new fields must be add-only.
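The key-sorting rule can be sketched in Python as follows. This is an illustrative approximation, not the evaluator's actual serializer: the function names canonical_json and payload_digest are hypothetical, the choice of SHA-256 with a sha256: prefix simply mirrors the evidence hash format shown in the payload, and number formatting (for example how 0.995 is rendered) is left to the standard library, which may differ from other runtimes.

```python
import hashlib
import json


def canonical_json(payload: dict) -> str:
    """Serialize with keys sorted at every nesting level and no extra whitespace."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def payload_digest(payload: dict) -> str:
    """Hash the canonical UTF-8 bytes (assumed digest format: 'sha256:<hex>')."""
    data = canonical_json(payload).encode("utf-8")
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Two payloads with the same content but different key order.
a = {"version": "1.0.0", "id": "x", "window": {"start": "s", "end": "e"}}
b = {"window": {"end": "e", "start": "s"}, "id": "x", "version": "1.0.0"}

print(canonical_json(a))  # {"id":"x","version":"1.0.0","window":{"end":"e","start":"s"}}
assert payload_digest(a) == payload_digest(b)
```

Because keys are sorted recursively, producers that build the payload in different field orders still arrive at the same digest.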
Retry and idempotency
- id is the idempotency key; Notifier treats duplicates as no-op.
- Producers retry with exponential backoff up to 10 minutes; consumers respond 2xx only after persistence.
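A rough Python sketch of both sides of this contract is shown below. Several details are assumptions made for illustration only: "up to 10 minutes" is read here as a total retry budget of 600 seconds, the 1-second base delay and doubling factor are arbitrary, and backoff_delays and InMemoryNotifier are hypothetical names with an in-memory dict standing in for durable storage.

```python
def backoff_delays(base: float = 1.0, factor: float = 2.0, budget: float = 600.0):
    """Yield exponentially growing delays whose running total stays within budget."""
    elapsed, delay = 0.0, base
    while elapsed + delay <= budget:
        yield delay
        elapsed += delay
        delay *= factor


class InMemoryNotifier:
    """Toy consumer: deduplicates on payload['id'] before acknowledging."""

    def __init__(self):
        self._store = {}  # stand-in for a durable table keyed by id

    def ingest(self, payload: dict) -> bool:
        """Return True if the payload was newly stored, False if it was a duplicate.

        In a real service the 2xx response would be sent only after this returns.
        """
        key = payload["id"]
        if key in self._store:
            return False  # duplicate delivery: no-op, still safe to acknowledge
        self._store[key] = payload
        return True


notifier = InMemoryNotifier()
event = {"id": "8c1d58c4-b1de-4b3c-9c7b-40a6b0f8d4c1", "version": "1.0.0"}
assert notifier.ingest(event) is True   # first delivery is persisted
assert notifier.ingest(event) is False  # retry of the same id changes nothing
print(list(backoff_delays()))
```

With these assumed parameters the producer makes nine retries (1 s through 256 s, 511 s in total) before the next delay would exceed the budget.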
Validation checklist (for tests/CI)
- Required fields: id, tenant, service, slo.id, slo.objective.window, slo.objective.target, metric.type, metric.value, window.start/end, breach.state, version, issued_at.
- Timestamps parse with DateTimeStyles.RoundtripKind.
- When breach.state=ok, breach.reason may be null but object must exist.
- quietHours.active=true must include policyId.
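The checklist above could be exercised in a test with something like the following Python sketch. It is not the project's validator: lookup and validate are hypothetical helpers, a required field set to null is treated as missing (an assumption), and strptime with a whole-second UTC format stands in for the .NET DateTimeStyles.RoundtripKind parse named in the checklist, so it is stricter about fractional seconds.

```python
from datetime import datetime

REQUIRED_FIELDS = (
    "id", "tenant", "service", "slo.id", "slo.objective.window",
    "slo.objective.target", "metric.type", "metric.value",
    "window.start", "window.end", "breach.state", "version", "issued_at",
)
TIMESTAMP_FIELDS = ("window.start", "window.end", "issued_at")


def lookup(payload, dotted):
    """Walk a dotted path; return None when any segment is absent or null."""
    node = payload
    for part in dotted.split("."):
        if not isinstance(node, dict):
            return None
        node = node.get(part)
    return node


def validate(payload):
    """Return a list of violations; an empty list means the payload passed."""
    errors = [f"missing {name}" for name in REQUIRED_FIELDS if lookup(payload, name) is None]
    for name in TIMESTAMP_FIELDS:
        value = lookup(payload, name)
        if value is None:
            continue  # already reported as missing
        try:
            # Assumption: whole-second UTC timestamps, as in the samples.
            datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        except (TypeError, ValueError):
            errors.append(f"bad timestamp {name}")
    quiet = payload.get("quietHours")
    if isinstance(quiet, dict) and quiet.get("active") is True and not quiet.get("policyId"):
        errors.append("quietHours.active=true requires policyId")
    return errors
```

Requiring breach.state also enforces that the breach object exists, so a payload with breach.state=ok and a null breach.reason passes, as the checklist allows.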
Sample canonical JSON (minified)
{"breach":{"evidence":[],"reason":"p99 latency above objective","state":"breaching"},"host":"orchestrator-0","id":"8c1d58c4-b1de-4b3c-9c7b-40a6b0f8d4c1","issued_at":"2025-11-19T12:05:07Z","metric":{"labels":{"endpoint":"/api/jobs","method":"GET"},"type":"latency","unit":"seconds","value":1.234},"quietHours":{"active":false,"policyId":null},"service":"orchestrator","slo":{"id":"orch-api-latency-p99","name":"Orchestrator API p99","objective":{"target":0.99,"window":"PT5M"}},"tenant":"default","trace":{"span_id":null,"trace_id":null},"version":"1.0.0","window":{"end":"2025-11-19T12:05:00Z","start":"2025-11-19T12:00:00Z"}}
Evidence to surface in sprint tasks
- File: docs/notifications/slo-webhook-schema.md (this document).
- Sample payload (canonical) and validation checklist above.
- Dependencies: upstream Telemetry evaluator must emit metric.labels sanitized; Notifier to persist id for idempotency.