Notifier Telemetry SLO Webhook Schema (1.0.0)

Purpose: define the payload emitted by Telemetry SLO evaluators toward Notifier so that NOTIFY-OBS-51-001 can consume alerts deterministically (online and offline).

Delivery contract

  • Content-Type: application/json
  • Encoding: UTF-8
  • Authentication: mTLS (service identity) or DPoP/JWT with aud = notifier and scope = obs:slo:ingest.
  • Determinism: timestamps are UTC ISO-8601 with Z; field order stable for hashing (see canonical JSON below).
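The timestamp rule above can be sketched as a small helper. This is a minimal illustration, not part of the contract: the function name to_utc_z is hypothetical, and second precision is an assumption taken from the sample timestamps in this document.

```python
from datetime import datetime, timezone


def to_utc_z(ts: datetime) -> str:
    """Render an aware datetime as UTC ISO-8601 with a trailing Z.

    Assumption: second precision, matching the samples in this document.
    """
    if ts.tzinfo is None:
        # A naive datetime has no defined offset, so converting it would be a guess.
        raise ValueError("naive datetime: attach a timezone before serializing")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Rejecting naive datetimes keeps the producer from silently emitting local time labelled as UTC.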

Payload fields

{
  "id": "uuid",
  "tenant": "string",            // required; aligns with orchestrator/telemetry tenant id
  "service": "string",           // logical service name
  "host": "string",              // optional; k8s node/hostname
  "slo": {
    "name": "string",           // human-readable
    "id": "string",             // immutable key used for dedupe
    "objective": {
      "window": "PT5M",         // ISO-8601 duration
      "target": 0.995             // decimal between 0 and 1
    }
  },
  "metric": {
    "type": "latency|error|availability|custom",
    "value": 0.0123,             // double; units depend on type
    "unit": "seconds|ratio|percent|count",
    "labels": {                  // sanitized, deterministic ordering when serialized
      "endpoint": "/api/jobs",
      "method": "GET"
    }
  },
  "window": {
    "start": "2025-11-19T12:00:00Z",
    "end": "2025-11-19T12:05:00Z"
  },
  "breach": {
    "state": "breaching|warning|ok",
    "reason": "p95 latency above objective",
    "evidence": [
      {
        "type": "timeseries",
        "href": "cas://telemetry/series/abc123",
        "hash": "sha256:..."
      }
    ]
  },
  "quietHours": {
    "active": false,
    "policyId": null
  },
  "trace": {
    "trace_id": "optional-trace-id",
    "span_id": "optional-span-id"
  },
  "version": "1.0.0",
  "issued_at": "2025-11-19T12:05:07Z"
}

Canonical JSON rules

  • Sort object keys lexicographically before hashing/signing.
  • Use lowercase values for the enum-like fields shown above (metric.type, metric.unit, breach.state).
  • version is required so the schema can evolve; changes must be additive (new fields only).
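The key-sorting rule can be sketched with the Python standard library. This is an illustration under stated assumptions: the function names canonical_json and payload_digest are hypothetical, the compact separators follow the minified sample in this document, and the "sha256:" prefix mirrors the evidence hash format rather than any digest format this schema defines. Number formatting and string escaping are not specified here, so this sketch uses Python's defaults.

```python
import hashlib
import json


def canonical_json(payload: dict) -> str:
    """Serialize with lexicographically sorted keys and no extra whitespace."""
    return json.dumps(
        payload,
        sort_keys=True,          # applies to nested objects as well
        separators=(",", ":"),   # minified output
        ensure_ascii=False,      # keep non-ASCII text as UTF-8 rather than \u escapes
    )


def payload_digest(payload: dict) -> str:
    """Hash the canonical form; the 'sha256:' prefix is an assumption."""
    data = canonical_json(payload).encode("utf-8")
    return "sha256:" + hashlib.sha256(data).hexdigest()
```

Because keys are sorted at every level, two producers that build the same payload in a different field order obtain the same digest.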

Retry and idempotency

  • id is the idempotency key; Notifier treats duplicates as no-op.
  • Producers retry with exponential backoff up to 10 minutes; consumers respond 2xx only after persistence.
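The two rules above can be sketched as follows. Everything here is hypothetical scaffolding: backoff_delays reads "up to 10 minutes" as a total retry budget of 600 seconds (it could equally mean a cap on a single delay), the base delay and factor are invented, and IdempotentIngest uses an in-memory set as a stand-in for whatever durable store Notifier actually uses.

```python
from typing import Callable, Iterator


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   budget: float = 600.0) -> Iterator[float]:
    """Yield exponential delays (seconds) while their sum stays within the budget."""
    elapsed, delay = 0.0, base
    while elapsed + delay <= budget:
        yield delay
        elapsed += delay
        delay *= factor


class IdempotentIngest:
    """Consumer sketch: acknowledge only after persistence, ignore duplicate ids."""

    def __init__(self, persist: Callable[[dict], None]) -> None:
        self._persist = persist
        self._seen = set()  # stand-in for a durable id index

    def handle(self, payload: dict) -> int:
        event_id = payload["id"]
        if event_id in self._seen:
            return 200  # duplicate: no-op, still acknowledged
        self._persist(payload)  # if this raises, no 2xx is returned
        self._seen.add(event_id)
        return 200
```

With the defaults, the producer waits 1, 2, 4, ... 256 seconds (511 seconds in total) before giving up; a real producer would likely add jitter.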

Validation checklist (for tests/CI)

  • Required fields: id, tenant, service, slo.id, slo.objective.window, slo.objective.target, metric.type, metric.value, window.start, window.end, breach.state, version, issued_at.
  • Timestamps must parse with .NET's DateTimeStyles.RoundtripKind, so the UTC kind carried by the trailing Z is preserved.
  • When breach.state=ok, breach.reason may be null, but the breach object must still be present.
  • When quietHours.active=true, quietHours.policyId must be set.
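The checklist could be exercised in tests roughly as below. This is a sketch, not the project's validator: the names validate, REQUIRED and TIMESTAMPS are hypothetical, and the strptime patterns are a Python stand-in for the .NET round-trip parse, assuming UTC timestamps with a trailing Z and optional fractional seconds.

```python
from datetime import datetime
from typing import List

REQUIRED = (
    "id", "tenant", "service", "slo.id", "slo.objective.window",
    "slo.objective.target", "metric.type", "metric.value",
    "window.start", "window.end", "breach.state", "version", "issued_at",
)
TIMESTAMPS = ("window.start", "window.end", "issued_at")
_MISSING = object()


def _get(obj, path: str):
    """Walk a dotted path through nested dicts; return _MISSING if absent."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return _MISSING
        obj = obj[key]
    return obj


def _is_utc_z(text: str) -> bool:
    """Accept ISO-8601 UTC with trailing Z, with or without fractional seconds."""
    for fmt in ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%dT%H:%M:%S.%fZ"):
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue
    return False


def validate(payload: dict) -> List[str]:
    """Return a list of checklist violations; an empty list means the payload passes."""
    errors = []
    for path in REQUIRED:
        value = _get(payload, path)
        if value is _MISSING or value is None:
            errors.append(f"missing required field: {path}")
    for path in TIMESTAMPS:
        value = _get(payload, path)
        if isinstance(value, str) and not _is_utc_z(value):
            errors.append(f"timestamp is not UTC ISO-8601 with Z: {path}")
    # The breach object must exist even when state is ok; the required
    # breach.state check above already enforces that, and reason may be null.
    quiet = payload.get("quietHours")
    if isinstance(quiet, dict) and quiet.get("active") is True and not quiet.get("policyId"):
        errors.append("quietHours.active=true requires policyId")
    return errors
```

Returning every violation at once, rather than raising on the first, tends to make CI failures easier to read.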

Sample canonical JSON (minified)

{"breach":{"evidence":[],"reason":"p99 latency above objective","state":"breaching"},"host":"orchestrator-0","id":"8c1d58c4-b1de-4b3c-9c7b-40a6b0f8d4c1","issued_at":"2025-11-19T12:05:07Z","metric":{"labels":{"endpoint":"/api/jobs","method":"GET"},"type":"latency","unit":"seconds","value":1.234},"quietHours":{"active":false,"policyId":null},"service":"orchestrator","slo":{"id":"orch-api-latency-p99","name":"Orchestrator API p99","objective":{"target":0.99,"window":"PT5M"}},"tenant":"default","trace":{"span_id":null,"trace_id":null},"version":"1.0.0","window":{"end":"2025-11-19T12:05:00Z","start":"2025-11-19T12:00:00Z"}}

Evidence to surface in sprint tasks

  • File: docs/notifications/slo-webhook-schema.md (this document).
  • Sample payload (canonical) and validation checklist above.
  • Dependencies: the upstream Telemetry evaluator must emit sanitized metric.labels; Notifier must persist id for idempotency.