# Notifier Telemetry SLO Webhook Schema (1.0.0)
Purpose: define the payload emitted by Telemetry SLO evaluators toward Notifier so that NOTIFY-OBS-51-001 can consume alerts deterministically (online and offline).
## Delivery contract
- Content-Type: `application/json`
- Encoding: UTF-8
- Authentication: mTLS (service identity) or DPoP/JWT with `aud` = `notifier` and `scope` = `obs:slo:ingest`.
- Determinism: timestamps are UTC ISO-8601 with a `Z` suffix; field order is stable for hashing (see Canonical JSON rules below).
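
The timestamp rule in the delivery contract can be sketched as follows. This is a minimal illustration, not the evaluator's actual code: the helper names `to_utc_z` and `parse_utc_z` are hypothetical, and second precision is assumed because that is what the examples in this document use.

```python
from datetime import datetime, timezone


def to_utc_z(moment: datetime) -> str:
    """Render an aware datetime as UTC ISO-8601 with a trailing Z."""
    if moment.tzinfo is None:
        # A naive datetime has no offset, so converting it would be a guess.
        raise ValueError("naive datetime: attach a timezone before serializing")
    utc = moment.astimezone(timezone.utc)
    # Assumption: second precision, matching the examples in this document.
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ")


def parse_utc_z(text: str) -> datetime:
    """Parse a timestamp, rejecting anything not in the UTC 'Z' form."""
    # The literal Z in the format string makes offsets like +02:00 fail.
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)
```

Rejecting naive datetimes at serialization time keeps local-time values from leaking into payloads that are later hashed.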
## Payload fields
```
{
  "id": "uuid",
  "tenant": "string",          // required; aligns with orchestrator/telemetry tenant id
  "service": "string",         // logical service name
  "host": "string",            // optional; k8s node/hostname
  "slo": {
    "name": "string",          // human-readable
    "id": "string",            // immutable key used for dedupe
    "objective": {
      "window": "PT5M",        // ISO-8601 duration
      "target": 0.995          // decimal between 0 and 1
    }
  },
  "metric": {
    "type": "latency|error|availability|custom",
    "value": 0.0123,           // double; units depend on type
    "unit": "seconds|ratio|percent|count",
    "labels": {                // sanitized, deterministic ordering when serialized
      "endpoint": "/api/jobs",
      "method": "GET"
    }
  },
  "window": {
    "start": "2025-11-19T12:00:00Z",
    "end": "2025-11-19T12:05:00Z"
  },
  "breach": {
    "state": "breaching|warning|ok",
    "reason": "p95 latency above objective",
    "evidence": [
      {
        "type": "timeseries",
        "href": "cas://telemetry/series/abc123",
        "hash": "sha256:..."
      }
    ]
  },
  "quietHours": {
    "active": false,
    "policyId": null
  },
  "trace": {
    "trace_id": "optional-trace-id",
    "span_id": "optional-span-id"
  },
  "version": "1.0.0",
  "issued_at": "2025-11-19T12:05:07Z"
}
```
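
The enum-like and range-constrained fields above could be checked along these lines. This is a sketch under assumptions: `check_enums` is a hypothetical helper, the `target` bounds are treated as inclusive, and the duration pattern deliberately accepts only time-based durations such as `PT5M`, not the full ISO-8601 grammar.

```python
import re

METRIC_TYPES = {"latency", "error", "availability", "custom"}
METRIC_UNITS = {"seconds", "ratio", "percent", "count"}
BREACH_STATES = {"breaching", "warning", "ok"}

# Assumption: only time-based durations (PT5M, PT1H30M) are expected here.
_DURATION = re.compile(r"^PT(?=\d)(\d+H)?(\d+M)?(\d+S)?$")


def check_enums(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the fields look valid."""
    errors = []
    metric = payload.get("metric", {})
    if metric.get("type") not in METRIC_TYPES:
        errors.append(f"metric.type: {metric.get('type')!r}")
    # Only check the unit when it is present.
    if "unit" in metric and metric["unit"] not in METRIC_UNITS:
        errors.append(f"metric.unit: {metric['unit']!r}")
    state = payload.get("breach", {}).get("state")
    if state not in BREACH_STATES:  # case-sensitive: values are lowercase
        errors.append(f"breach.state: {state!r}")
    objective = payload.get("slo", {}).get("objective", {})
    target = objective.get("target")
    is_number = isinstance(target, (int, float)) and not isinstance(target, bool)
    if not is_number or not 0 <= target <= 1:
        errors.append(f"slo.objective.target: {target!r}")
    if not _DURATION.match(str(objective.get("window", ""))):
        errors.append(f"slo.objective.window: {objective.get('window')!r}")
    return errors
```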
### Canonical JSON rules
- Sort object keys lexicographically before hashing/signing.
- Use lowercase for enum-like fields shown above.
- `version` is required for evolution; new fields must be add-only.
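
A minimal sketch of the canonicalization step, assuming SHA-256 as the digest (the schema does not name a hash algorithm; the `sha256:` prefix simply mirrors the evidence `hash` format shown above). The function names `canonical_json` and `payload_digest` are hypothetical.

```python
import hashlib
import json


def canonical_json(payload: dict) -> str:
    """Serialize with sorted keys and no insignificant whitespace."""
    # sort_keys orders keys by Unicode code point at every nesting level.
    return json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )


def payload_digest(payload: dict) -> str:
    """Hash the canonical form; SHA-256 is an assumption, not part of the schema."""
    data = canonical_json(payload).encode("utf-8")
    return "sha256:" + hashlib.sha256(data).hexdigest()
```

Two payloads that differ only in key order yield the same digest, which is the property the hashing and signing rule depends on. For the ASCII-only keys in this schema, code-point ordering and the UTF-16 ordering used by some canonicalization standards give the same result.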
### Retry and idempotency
- `id` is the idempotency key; Notifier treats duplicates as no-op.
- Producers retry with exponential backoff up to 10 minutes; consumers respond 2xx only after persistence.
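
Both halves of this contract can be sketched briefly. Everything here is illustrative: `InMemoryStore`, `handle_webhook`, and `backoff_delays` are hypothetical names, the 400 and 503 status codes are assumptions, and "up to 10 minutes" is read as a total retry budget of 600 seconds rather than a cap on a single delay.

```python
class InMemoryStore:
    """Stand-in for durable storage; a real consumer would use a database."""

    def __init__(self):
        self._seen = {}

    def put_if_absent(self, key, value) -> bool:
        """Store the value and return True, or return False if key exists."""
        if key in self._seen:
            return False
        self._seen[key] = value
        return True


def handle_webhook(store, payload: dict) -> tuple[int, str]:
    """Return (HTTP status, outcome); reply 2xx only once the payload is stored."""
    alert_id = payload.get("id")
    if not alert_id:
        return 400, "missing id"
    try:
        created = store.put_if_absent(alert_id, payload)
    except Exception:
        # Persistence failed: a non-2xx reply lets the producer retry.
        return 503, "retry"
    # A duplicate id is a no-op but is still acknowledged with 2xx.
    return (200, "stored") if created else (200, "duplicate")


def backoff_delays(base: float = 1.0, factor: float = 2.0, budget: float = 600.0):
    """Yield delays in seconds while the cumulative wait stays within budget."""
    elapsed = 0.0
    delay = base
    while elapsed + delay <= budget:
        yield delay
        elapsed += delay
        delay *= factor
```

Acknowledging duplicates with 2xx matters: a producer that retried after a lost response would otherwise keep retrying an alert that was already persisted.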
### Validation checklist (for tests/CI)
- Required fields: `id`, `tenant`, `service`, `slo.id`, `slo.objective.window`, `slo.objective.target`, `metric.type`, `metric.value`, `window.start`, `window.end`, `breach.state`, `version`, `issued_at`.
- Timestamps parse with .NET `DateTimeStyles.RoundtripKind`.
- When `breach.state=ok`, `breach.reason` may be null, but the `breach` object must exist.
- When `quietHours.active=true`, `quietHours.policyId` must be set.
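
The checklist can be turned into a test helper roughly as follows. This is a sketch, not the project's validator: `validate` and `lookup` are hypothetical names, a null value is treated as missing for required fields, and timestamps are assumed to use second precision as in the examples.

```python
from datetime import datetime

REQUIRED = (
    "id", "tenant", "service", "slo.id", "slo.objective.window",
    "slo.objective.target", "metric.type", "metric.value",
    "window.start", "window.end", "breach.state", "version", "issued_at",
)
TIMESTAMPS = ("window.start", "window.end", "issued_at")

_MISSING = object()  # sentinel that cannot collide with a real JSON value


def lookup(payload, dotted: str):
    """Walk a dotted path through nested dicts; return _MISSING if absent."""
    node = payload
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def is_utc_z(value) -> bool:
    """True for strings like 2025-11-19T12:05:07Z (assumed second precision)."""
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        return True
    except (TypeError, ValueError):
        return False


def validate(payload: dict) -> list[str]:
    """Return human-readable problems; an empty list means the checks pass."""
    errors = []
    for path in REQUIRED:
        value = lookup(payload, path)
        if value is _MISSING or value is None:
            errors.append(f"missing required field: {path}")
    for path in TIMESTAMPS:
        value = lookup(payload, path)
        if isinstance(value, str) and not is_utc_z(value):
            errors.append(f"bad timestamp: {path}")
    if not isinstance(payload.get("breach"), dict):
        errors.append("breach must be an object")
    quiet = payload.get("quietHours")
    if isinstance(quiet, dict) and quiet.get("active") is True and not quiet.get("policyId"):
        errors.append("quietHours.policyId is required when quietHours.active is true")
    return errors
```

Returning a list instead of raising on the first failure lets a CI run report every problem in a payload at once.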
### Sample canonical JSON (minified)
```
{"breach":{"evidence":[],"reason":"p99 latency above objective","state":"breaching"},"host":"orchestrator-0","id":"8c1d58c4-b1de-4b3c-9c7b-40a6b0f8d4c1","issued_at":"2025-11-19T12:05:07Z","metric":{"labels":{"endpoint":"/api/jobs","method":"GET"},"type":"latency","unit":"seconds","value":1.234},"quietHours":{"active":false,"policyId":null},"service":"orchestrator","slo":{"id":"orch-api-latency-p99","name":"Orchestrator API p99","objective":{"target":0.99,"window":"PT5M"}},"tenant":"default","trace":{"span_id":null,"trace_id":null},"version":"1.0.0","window":{"end":"2025-11-19T12:05:00Z","start":"2025-11-19T12:00:00Z"}}
```
### Evidence to surface in sprint tasks
- File: `docs/notifications/slo-webhook-schema.md` (this document).
- Sample payload (canonical) and validation checklist above.
- Dependencies: the upstream Telemetry evaluator must emit sanitized `metric.labels`; Notifier must persist `id` for idempotency.