# Notifier Telemetry SLO Webhook Schema (1.0.0)

Purpose: define the payload emitted by Telemetry SLO evaluators toward Notifier so that NOTIFY-OBS-51-001 can consume alerts deterministically (online and offline).

## Delivery contract

- Content-Type: `application/json`
- Encoding: UTF-8
- Authentication: mTLS (service identity) or DPoP/JWT with `aud` = `notifier` and `scope` = `obs:slo:ingest`.
- Determinism: timestamps are UTC ISO-8601 with a `Z` suffix; field order is stable for hashing (see Canonical JSON rules below).

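As an illustration of the timestamp rule above, here is a minimal Python sketch of how a producer might render `issued_at` and the `window` bounds. The helper name `format_utc` is hypothetical, and second precision is an assumption taken from the examples in this document rather than a stated requirement.

```python
from datetime import datetime, timedelta, timezone


def format_utc(ts: datetime) -> str:
    """Render an aware datetime as UTC ISO-8601 with a trailing Z.

    Assumption: second precision, matching the examples in this document.
    """
    if ts.tzinfo is None:
        # A naive datetime has no defined offset, so refuse to guess.
        raise ValueError("naive datetime: attach a timezone before formatting")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# A local time at UTC+1 is normalized to UTC before formatting.
local = datetime(2025, 11, 19, 13, 5, 7, tzinfo=timezone(timedelta(hours=1)))
print(format_utc(local))  # 2025-11-19T12:05:07Z
```

Rejecting naive datetimes keeps the output independent of the producer host's local timezone.
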
## Payload fields

```
{
  "id": "uuid",
  "tenant": "string",            // required; aligns with orchestrator/telemetry tenant id
  "service": "string",           // logical service name
  "host": "string",              // optional; k8s node/hostname
  "slo": {
    "name": "string",            // human-readable
    "id": "string",              // immutable key used for dedupe
    "objective": {
      "window": "PT5M",          // ISO-8601 duration
      "target": 0.995            // decimal between 0 and 1
    }
  },
  "metric": {
    "type": "latency|error|availability|custom",
    "value": 0.0123,             // double; units depend on type
    "unit": "seconds|ratio|percent|count",
    "labels": {                  // sanitized, deterministic ordering when serialized
      "endpoint": "/api/jobs",
      "method": "GET"
    }
  },
  "window": {
    "start": "2025-11-19T12:00:00Z",
    "end": "2025-11-19T12:05:00Z"
  },
  "breach": {
    "state": "breaching|warning|ok",
    "reason": "p95 latency above objective",
    "evidence": [
      {
        "type": "timeseries",
        "href": "cas://telemetry/series/abc123",
        "hash": "sha256:..."
      }
    ]
  },
  "quietHours": {
    "active": false,
    "policyId": null
  },
  "trace": {
    "trace_id": "optional-trace-id",
    "span_id": "optional-span-id"
  },
  "version": "1.0.0",
  "issued_at": "2025-11-19T12:05:07Z"
}
```

### Canonical JSON rules

- Sort object keys lexicographically before hashing/signing.
- Use lowercase for the enum-like fields shown above.
- `version` is required so the schema can evolve; schema changes must be additive (new fields only).

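The key-ordering rule can be sketched in Python as follows. The function names `canonical_json` and `payload_digest` are hypothetical. This document does not specify number formatting, the byte encoding that is hashed, or the digest format, so hashing the UTF-8 bytes and reusing the `sha256:` prefix from the `evidence` example are assumptions.

```python
import hashlib
import json


def canonical_json(payload: dict) -> str:
    """Serialize with sorted keys and no insignificant whitespace."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def payload_digest(payload: dict) -> str:
    """Hash the canonical form. Assumption: SHA-256 over the UTF-8 bytes."""
    data = canonical_json(payload).encode("utf-8")
    return "sha256:" + hashlib.sha256(data).hexdigest()


# The same content with different key order yields the same canonical form.
a = {"version": "1.0.0", "slo": {"objective": {"window": "PT5M", "target": 0.99}, "id": "orch-api-latency-p99"}}
b = {"slo": {"id": "orch-api-latency-p99", "objective": {"target": 0.99, "window": "PT5M"}}, "version": "1.0.0"}

print(canonical_json(a))
print(payload_digest(a) == payload_digest(b))  # True
```

Python's `sort_keys=True` orders keys by Unicode code point at every nesting level, which coincides with lexicographic order for the ASCII keys used in this schema.
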
### Retry and idempotency

- `id` is the idempotency key; Notifier treats duplicates as a no-op.
- Producers retry with exponential backoff up to 10 minutes; consumers respond 2xx only after persistence.

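A minimal Python sketch of both sides of this contract, under stated assumptions: "up to 10 minutes" is read here as a total retry budget of 600 seconds (it could also mean a cap on a single delay), the base delay and factor are illustrative, and an in-memory dict stands in for Notifier's real persistence. `backoff_delays` and `InMemoryNotifier` are hypothetical names.

```python
def backoff_delays(base: float = 1.0, factor: float = 2.0, budget: float = 600.0) -> list:
    """Return exponential delays whose sum stays within the retry budget."""
    delays = []
    elapsed = 0.0
    delay = base
    while elapsed + delay <= budget:
        delays.append(delay)
        elapsed += delay
        delay *= factor
    return delays


class InMemoryNotifier:
    """Consumer stand-in: acknowledges only after the payload is stored."""

    def __init__(self):
        self._seen = {}  # assumption: a dict replaces durable storage

    def handle(self, payload: dict):
        key = payload["id"]
        if key in self._seen:
            # Duplicate delivery: acknowledge without side effects.
            return 200, "duplicate"
        self._seen[key] = payload  # "persist" before acknowledging
        return 200, "stored"


notifier = InMemoryNotifier()
event = {"id": "8c1d58c4-b1de-4b3c-9c7b-40a6b0f8d4c1"}
print(notifier.handle(event))  # (200, 'stored')
print(notifier.handle(event))  # (200, 'duplicate')
print(backoff_delays())        # nine delays, 1.0 through 256.0 seconds
```

Returning 2xx for a duplicate lets a retrying producer stop cleanly when its earlier attempt was stored but the acknowledgement was lost.
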
### Validation checklist (for tests/CI)

- Required fields: `id`, `tenant`, `service`, `slo.id`, `slo.objective.window`, `slo.objective.target`, `metric.type`, `metric.value`, `window.start`, `window.end`, `breach.state`, `version`, `issued_at`.
- Timestamps parse with `DateTimeStyles.RoundtripKind`.
- When `breach.state` is `ok`, `breach.reason` may be null, but the `breach` object must still be present.
- When `quietHours.active` is `true`, `quietHours.policyId` must be set.

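The checklist above could be turned into a test helper along these lines. This is a Python sketch, not the project's validator: `validate` and `lookup` are hypothetical names, `datetime.strptime` with a fixed format stands in for the .NET `DateTimeStyles.RoundtripKind` parse, and treating a null required field as missing is an assumption.

```python
from datetime import datetime

REQUIRED = [
    "id", "tenant", "service", "slo.id", "slo.objective.window",
    "slo.objective.target", "metric.type", "metric.value",
    "window.start", "window.end", "breach.state", "version", "issued_at",
]
TIMESTAMPS = ["window.start", "window.end", "issued_at"]
_MISSING = object()  # sentinel so falsy values such as 0.0 still count as present


def lookup(payload, path):
    """Follow a dotted path through nested dicts; return _MISSING if absent."""
    node = payload
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def validate(payload: dict) -> list:
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    for path in REQUIRED:
        value = lookup(payload, path)
        if value is _MISSING or value is None:
            errors.append(f"missing required field: {path}")
    for path in TIMESTAMPS:
        value = lookup(payload, path)
        if value is _MISSING or value is None:
            continue  # already reported as missing
        try:
            datetime.strptime(str(value), "%Y-%m-%dT%H:%M:%SZ")
        except ValueError:
            errors.append(f"bad timestamp: {path}")
    if not isinstance(payload.get("breach"), dict):
        errors.append("breach object must exist")
    quiet = payload.get("quietHours")
    if isinstance(quiet, dict) and quiet.get("active") is True and quiet.get("policyId") is None:
        errors.append("quietHours.active=true requires policyId")
    return errors


sample = {
    "id": "8c1d58c4-b1de-4b3c-9c7b-40a6b0f8d4c1",
    "tenant": "default",
    "service": "orchestrator",
    "slo": {"id": "orch-api-latency-p99", "objective": {"window": "PT5M", "target": 0.99}},
    "metric": {"type": "latency", "value": 1.234},
    "window": {"start": "2025-11-19T12:00:00Z", "end": "2025-11-19T12:05:00Z"},
    "breach": {"state": "ok", "reason": None},
    "quietHours": {"active": False, "policyId": None},
    "version": "1.0.0",
    "issued_at": "2025-11-19T12:05:07Z",
}
print(validate(sample))  # []
```

Collecting every error instead of failing on the first one tends to make CI output more useful when a producer change breaks several fields at once.
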
### Sample canonical JSON (minified)

```
{"breach":{"evidence":[],"reason":"p99 latency above objective","state":"breaching"},"host":"orchestrator-0","id":"8c1d58c4-b1de-4b3c-9c7b-40a6b0f8d4c1","issued_at":"2025-11-19T12:05:07Z","metric":{"labels":{"endpoint":"/api/jobs","method":"GET"},"type":"latency","unit":"seconds","value":1.234},"quietHours":{"active":false,"policyId":null},"service":"orchestrator","slo":{"id":"orch-api-latency-p99","name":"Orchestrator API p99","objective":{"target":0.99,"window":"PT5M"}},"tenant":"default","trace":{"span_id":null,"trace_id":null},"version":"1.0.0","window":{"end":"2025-11-19T12:05:00Z","start":"2025-11-19T12:00:00Z"}}
```

### Evidence to surface in sprint tasks

- File: `docs/notifications/slo-webhook-schema.md` (this document).
- Sample payload (canonical) and validation checklist above.
- Dependencies: the upstream Telemetry evaluator must emit sanitized `metric.labels`; Notifier must persist `id` for idempotency.