Escalations & Acknowledgements

Imposed rule: Work or tasks of this type on this component must also be applied everywhere else they should be applied.

Last updated: 2025-11-25 (Docs Tasks Md.V · DOCS-NOTIFY-40-001)

Model

  • Escalation policy: ordered stages of channels with delays; stored per tenant.
  • Acknowledgement: DSSE-signed token embedded in messages; acknowledger must present token to stop escalation.
  • Suppression: rules may mark events as non-escalating (informational) while still sending a single notification.

Policy schema (conceptual)

{
  "id": "uuid",
  "tenant": "string",
  "name": "pager-policy-prod",
  "stages": [
    { "delaySeconds": 0,   "channels": ["slack-prod", "email-oncall"] },
    { "delaySeconds": 900, "channels": ["pager-primary"] },
    { "delaySeconds": 1800,"channels": ["pager-management"] }
  ],
  "autoCloseMinutes": 120,
  "retry": { "maxAttempts": 3, "backoffSeconds": 60 }
}
  • Stages execute sequentially until an ack is recorded.
  • Deterministic ordering: channels within a stage are sorted lexicographically before dispatch.
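
The two rules above can be sketched in a few lines of Python. This is a minimal illustration, not the Notify implementation: the Stage type and build_dispatch_plan function are hypothetical names, and Python's default code-point string ordering stands in for "lexicographic", since the exact collation is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    delay_seconds: int
    channels: tuple[str, ...]


def build_dispatch_plan(policy: dict) -> list[Stage]:
    """Keep stages in declared order; sort channels inside each stage."""
    plan = []
    for raw in policy["stages"]:
        if raw["delaySeconds"] < 0:
            raise ValueError("delaySeconds must be non-negative")
        # Sorting makes dispatch order independent of how the policy was written.
        plan.append(Stage(raw["delaySeconds"], tuple(sorted(raw["channels"]))))
    return plan


policy = {
    "name": "pager-policy-prod",
    "stages": [
        {"delaySeconds": 0, "channels": ["slack-prod", "email-oncall"]},
        {"delaySeconds": 900, "channels": ["pager-primary"]},
        {"delaySeconds": 1800, "channels": ["pager-management"]},
    ],
}

plan = build_dispatch_plan(policy)
for stage in plan:
    print(stage.delay_seconds, list(stage.channels))
```

For the example policy, stage 0 dispatches to email-oncall before slack-prod regardless of the order in which the channels were declared.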

Ack tokens

  • Token payload: { tenant, deliveryId, expiresUtc, ruleId, actionHash }.
  • Signed with Authority-issued DSSE key; verified by Notify WebService before accepting POST /acks/{token}.
  • Expiry defaults to 24h; tokens are single-use and idempotent.
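
As a rough sketch of the expiry and single-use rules, the Python below builds the payload fields listed above and tracks accepted acks in memory. DSSE signing and verification are deliberately left out. The AckRegistry class, the new_payload helper, the status strings, and the reading of "idempotent" as "a replayed token changes nothing and reports the earlier ack" are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_TTL = timedelta(hours=24)  # expiry defaults to 24h


def new_payload(tenant, delivery_id, rule_id, action_hash, now):
    """Build the token payload fields; signing is out of scope for this sketch."""
    return {
        "tenant": tenant,
        "deliveryId": delivery_id,
        "expiresUtc": (now + DEFAULT_TTL).isoformat(),
        "ruleId": rule_id,
        "actionHash": action_hash,
    }


class AckRegistry:
    """Hypothetical in-memory store of accepted acks, keyed by (tenant, deliveryId)."""

    def __init__(self):
        self._acked = {}

    def accept(self, payload, now):
        key = (payload["tenant"], payload["deliveryId"])
        if key in self._acked:
            # Replay of a used token: no second state change (assumed semantics).
            return "already-acknowledged"
        if now >= datetime.fromisoformat(payload["expiresUtc"]):
            return "expired"
        self._acked[key] = now
        return "acknowledged"


issued = datetime(2025, 11, 25, 12, 0, tzinfo=timezone.utc)
registry = AckRegistry()
payload = new_payload("tenant-a", "delivery-1", "rule-1", "abc123", issued)

first = registry.accept(payload, issued + timedelta(minutes=5))
replay = registry.accept(payload, issued + timedelta(minutes=6))
stale = registry.accept(
    new_payload("tenant-a", "delivery-2", "rule-1", "abc123", issued),
    issued + timedelta(hours=25),
)
print(first, replay, stale)
```

A real service would verify the DSSE signature before trusting any payload field, and would persist accepted acks rather than hold them in process memory.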

Escalation flow

  1. Rule fires → action references an escalation policy.
  2. Stage 0 deliveries sent; ledger records attempts and ack URL.
  3. If no ack is recorded before the next stage's delaySeconds elapses, that stage dispatches; this repeats until an ack arrives or the final stage has been sent.
  4. On ack, remaining stages are cancelled; ledger entry marked acknowledged with timestamp and subject.
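
The flow above can be simulated with a small function that reports which stages would be sent for a given ack time. It is a sketch under two assumptions not stated in this document: each delaySeconds is measured from the start of the escalation run, and a tie between an ack and a stage deadline goes to the dispatch. The stages_dispatched name is hypothetical.

```python
def stages_dispatched(stage_delays, ack_at_seconds=None):
    """Return the indices of stages sent before an ack stops the run.

    stage_delays: delaySeconds per stage, assumed relative to run start.
    ack_at_seconds: when the ack was recorded, or None if nobody acked.
    """
    sent = []
    for index, delay in enumerate(stage_delays):
        if ack_at_seconds is not None and ack_at_seconds < delay:
            break  # ack came first: this stage and all later ones are cancelled
        sent.append(index)
    return sent


delays = [0, 900, 1800]  # from the example policy
print(stages_dispatched(delays, ack_at_seconds=600))   # ack after 10 minutes
print(stages_dispatched(delays, ack_at_seconds=1000))  # ack after stage 1 fired
print(stages_dispatched(delays))                       # never acknowledged
```

With the example policy, an ack at 600 seconds stops the run after stage 0, an ack at 1000 seconds stops it after stage 1, and a run with no ack sends all three stages.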

Quiet hours & throttles

  • Quiet hours suppress new escalations; in-flight escalations continue.
  • Per-policy throttle prevents repeated escalation runs for identical actionHash within a configurable window (default 30m).
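
Both gates apply only when a new escalation run is about to start, which the following Python sketch tries to make concrete. The EscalationThrottle class and in_quiet_hours function are hypothetical names, and the quiet-hours window is assumed to be a simple daily start/end pair that may wrap past midnight.

```python
from datetime import datetime, time, timedelta, timezone


def in_quiet_hours(t, start, end):
    """True if time-of-day t falls in [start, end), allowing a wrap past midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end


class EscalationThrottle:
    """Allow one escalation run per (policy, actionHash) within the window."""

    def __init__(self, window=timedelta(minutes=30)):  # default 30m
        self.window = window
        self._last_run = {}  # (policy_id, action_hash) -> start of last allowed run

    def should_start(self, policy_id, action_hash, now):
        key = (policy_id, action_hash)
        last = self._last_run.get(key)
        if last is not None and now - last < self.window:
            return False  # identical actionHash seen too recently
        self._last_run[key] = now
        return True


throttle = EscalationThrottle()
t0 = datetime(2025, 11, 25, 12, 0, tzinfo=timezone.utc)

results = [
    throttle.should_start("policy-1", "hash-a", t0),
    throttle.should_start("policy-1", "hash-a", t0 + timedelta(minutes=10)),
    throttle.should_start("policy-1", "hash-b", t0 + timedelta(minutes=10)),
    throttle.should_start("policy-1", "hash-a", t0 + timedelta(minutes=31)),
]
print(results)
print(in_quiet_hours(time(23, 30), time(22, 0), time(6, 0)))
```

Neither check is consulted for stages of a run that has already started, which matches the rule that in-flight escalations continue through quiet hours.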

Observability

  • Counters: notify.escalation.started, notify.escalation.stage_sent, notify.escalation.ack, notify.escalation.cancelled; each is tagged by tenant, policy, and stage.
  • Logs: structured escalation.{started|stage_sent|ack|cancelled} with delivery ids and rationale.

Runbooks

  • Update escalation policy safely: create a new policy id, switch rules to it, then delete the old policy; this avoids mid-flight ambiguity.
  • If a stage storms, raise the throttle window or add quiet hours. Do not delete the policy mid-flight; use the cancelEscalation endpoint instead.