up
StellaOps Bot
2025-11-25 22:09:44 +02:00
parent 6bee1fdcf5
commit 9f6e6f7fb3
116 changed files with 4495 additions and 730 deletions

docs/notifications/api.md Normal file

@@ -0,0 +1,42 @@
# Notifications API
> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
Last updated: 2025-11-25 (Docs Tasks Md.V · DOCS-NOTIFY-40-001)
All endpoints require the `Authorization: Bearer <token>` and `X-Stella-Tenant` headers. Responses use the common error envelope (see `docs/api/overview.md`). Paths are rooted at `/api/v1/notify`.
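As an illustration of the header and path requirements above, here is a minimal Python sketch that builds (but does not send) a request. The host name, token, and tenant values are placeholders invented for the example, and `build_request` is a hypothetical helper, not part of any Notify SDK.

```python
import urllib.request

# Hypothetical host; only the /api/v1/notify root comes from the text above.
BASE_URL = "https://stella.example.internal/api/v1/notify"


def build_request(path, token, tenant, method="GET"):
    """Build a request carrying the two required headers (nothing is sent)."""
    return urllib.request.Request(
        BASE_URL + path,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "X-Stella-Tenant": tenant,
        },
    )


req = build_request("/channels", token="example-token", tenant="tenant-a")
```

Passing `req` to `urllib.request.urlopen` would perform the call; the sketch stops short of that so it stays side-effect free.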
## Channels
- `POST /channels` — create channel. Body matches `notifications/channels.md` schema. Returns `201` + channel.
- `GET /channels` — list channels (deterministic order: type ASC, id ASC). Supports `type` filter.
- `GET /channels/{id}` — fetch single channel.
- `DELETE /channels/{id}` — soft-delete; fails if referenced by active rules unless `force=true` query.
## Rules
- `POST /rules` — create/update rule; idempotency via `Idempotency-Key`.
- `GET /rules` — list rules with paging (`page_token`, `page_size`). Sorted by `name` ASC.
- `POST /rules:preview` — dry-run rule against sample event; returns matched actions and rendered templates.
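The paging on `GET /rules` can be consumed with a simple token loop. The sketch below is in Python and assumes the response body carries its results under an `items` key, which is a guess; only `page_token`, `page_size`, and `next_page_token` come from this document. `fetch_page` stands in for whatever HTTP client is used.

```python
def iter_rules(fetch_page, page_size=50):
    """Yield every rule by following next_page_token until it is absent."""
    token = None
    while True:
        params = {"page_size": page_size}
        if token:
            params["page_token"] = token
        page = fetch_page(params)
        # "items" is an assumed field name for the list of rules.
        yield from page.get("items", [])
        token = page.get("next_page_token")
        if not token:
            break


# Stand-in for the HTTP call: two pages, the second without a next token.
PAGES = {
    None: {"items": [{"name": "alpha"}, {"name": "beta"}], "next_page_token": "p2"},
    "p2": {"items": [{"name": "gamma"}]},
}


def fake_fetch(params):
    return PAGES[params.get("page_token")]


names = [rule["name"] for rule in iter_rules(fake_fetch)]
```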
## Policies & escalations
- `POST /policies/escalations` — create escalation policy (see `notifications/escalations.md`).
- `GET /policies/escalations` — list policies.
## Deliveries & digests
- `GET /deliveries` — query delivery ledger; filters: `status`, `channel`, `rule_id`, `from`, `to`. Sorted by `createdUtc` DESC then `id` ASC.
- `GET /deliveries/{id}` — single delivery with rendered payload hash and attempts.
- `POST /digests/preview` — preview digest rendering for a tenant/rule set; returns deterministic body/hash without sending.
## Acknowledgements
- `POST /acks/{token}` — acknowledge an escalation token. Validates DSSE signature, token expiry, and tenant. Returns `200` with cancellation summary.
## Simulations
- `POST /simulations/rules` — simulate a rule set for a supplied event payload; no side effects. Returns matched actions and throttling outcome.
## Health & metadata
- `GET /health` — liveness/readiness probes.
- `GET /metadata` — returns supported channel types, max payload sizes, and server version.
## Determinism notes
- All list endpoints return results in a stable order and include `next_page_token` when applicable.
- Templates render with fixed locale `en-US` unless `Accept-Language` is provided; rendering is pure (no network calls).
- `bodyHash` uses SHA-256 over canonical JSON; repeated sends with identical inputs produce identical hashes and are de-duplicated.
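To make the `bodyHash` note concrete, here is a small Python sketch. The document does not say which canonical JSON form the server uses, so the canonicalisation below (sorted keys, compact separators, UTF-8) is an assumption chosen for illustration and may not match the server's hashes.

```python
import hashlib
import json


def body_hash(payload):
    """SHA-256 hex digest over an assumed canonical JSON encoding."""
    canonical = json.dumps(
        payload,
        sort_keys=True,            # key order must not affect the hash
        separators=(",", ":"),     # no insignificant whitespace
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = body_hash({"title": "CVE found", "severity": "high"})
second = body_hash({"severity": "high", "title": "CVE found"})
```

Both calls hash the same canonical bytes, so `first` and `second` are equal, which is the property the de-duplication relies on.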


@@ -0,0 +1,51 @@
# Notification Channels
> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
Last updated: 2025-11-25 (Docs Tasks Md.V · DOCS-NOTIFY-40-001)
## Supported channel types
- **Slack / Teams**: webhook-based with optional slash-command ack URLs.
- **Email (SMTP/SMTPS)**: relay-only; secrets provided via `secretRef` in Authority.
- **Generic webhook**: signed (HMAC-SHA256) payloads with replay protection and allowlisted hosts.
- **PagerDuty-style escalation webhooks**: same contract as generic webhooks but with escalation metadata.
- **Console in-app**: stored delivery rendered in UI; always enabled for each tenant.
## Channel resource schema (Notify API)
```json
{
"id": "uuid",
"tenant": "string",
"type": "slack|teams|email|webhook|pager|console",
"endpoint": "https://...",
"secretRef": "authority://secrets/notify/slack-hook", // optional per type
"labels": { "env": "prod", "team": "sre" },
"throttle": { "windowSeconds": 60, "max": 10 },
"quietHours": { "from": "22:00", "to": "06:00", "timezone": "UTC" },
"enabled": true,
"createdUtc": "2025-11-25T00:00:00Z"
}
```
- **Determinism**: channel ids are UUIDv5 seeded by `(tenant, type, endpoint)` when created via manifests; server generates new IDs for ad-hoc API calls.
- **Validation**: endpoints must be on the allowlist; `secretRef` must exist in Authority; quiet hours use a 24-hour clock in UTC.
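The manifest ID derivation described above can be sketched in Python with the standard `uuid` module. The namespace UUID and the way the three fields are joined into one name are assumptions made for the example; the document only states that the seed is `(tenant, type, endpoint)`.

```python
import uuid

# Hypothetical namespace; the real namespace UUID is not given in the text.
NOTIFY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://stella.example/notify/channels")


def manifest_channel_id(tenant, channel_type, endpoint):
    """Derive a repeatable UUIDv5 from (tenant, type, endpoint)."""
    # Joining with newlines is an assumed encoding of the tuple.
    name = "\n".join((tenant, channel_type, endpoint))
    return uuid.uuid5(NOTIFY_NAMESPACE, name)


channel_id = manifest_channel_id("tenant-a", "slack", "https://hooks.example/abc")
```

Because UUIDv5 is a pure function of namespace and name, re-applying the same manifest yields the same channel id.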
## Connector rules
- No secrets are stored in Notify DB; only `secretRef` is persisted.
- Per-tenant allowlists control outbound hostnames/ports; defaults block public internet in air-gapped kits.
- Payload signing:
  - Slack/Teams: bearer secret in URL (indirect via `secretRef`) plus optional HMAC header `X-Stella-Signature` for mirror validation.
  - Webhook/Pager: HMAC `X-Stella-Signature` (hex) over body with nonce + timestamp; receivers must enforce a 5-minute skew window.
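A receiver-side check for the webhook signature might look like the Python sketch below. The exact byte layout that is signed (`timestamp.nonce.body` here) and the way nonce and timestamp reach the receiver are assumptions; the document only states HMAC-SHA256, hex encoding, a nonce, a timestamp, and the 5-minute skew.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 5 * 60  # 5-minute skew from the text


def sign(secret, body, nonce, timestamp):
    """Hex HMAC-SHA256 over an assumed 'timestamp.nonce.body' layout."""
    message = f"{timestamp}.{nonce}.".encode("utf-8") + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret, body, nonce, timestamp, signature, seen_nonces, now=None):
    """Reject stale timestamps, replayed nonces, and bad signatures."""
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False
    if nonce in seen_nonces:
        return False
    expected = sign(secret, body, nonce, timestamp)
    if not hmac.compare_digest(expected, signature):
        return False
    seen_nonces.add(nonce)  # remember only nonces from valid requests
    return True


secret = b"example-secret"
body = b'{"event":"scan.completed"}'
signature = sign(secret, body, nonce="n-1", timestamp=1_000_000)
```

`hmac.compare_digest` is used so the comparison does not leak timing information; a production receiver would also expire old nonces rather than keep them in an unbounded set.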
## Offline posture
- Offline kits ship default channel manifests under `out/offline/notify/channels/*.json` with placeholder endpoints.
- Operators must replace endpoints and secretRefs before deploy; validation rejects placeholder values.
## Observability
- Emit `notify.channel.delivery` counter with tags: `channel_type`, `tenant`, `status` (success/fail/throttled/quiet_hours), `rule_id`.
- Store delivery attempt hashes in the delivery ledger; duplicate payloads are de-duplicated per `(channel, bodyHash)` for 24h.
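The 24h de-duplication per `(channel, bodyHash)` can be pictured with an in-memory Python sketch. The real ledger is persistent and its storage model is not described here, so the `DeliveryDeduplicator` class and its dictionary are purely illustrative.

```python
DEDUP_WINDOW_SECONDS = 24 * 60 * 60  # 24h window from the text


class DeliveryDeduplicator:
    """Tracks the last send time per (channel, bodyHash) key."""

    def __init__(self):
        self._last_sent = {}

    def should_send(self, channel, body_hash, now):
        """Return False when the same payload went to this channel within 24h."""
        key = (channel, body_hash)
        last = self._last_sent.get(key)
        if last is not None and now - last < DEDUP_WINDOW_SECONDS:
            return False
        self._last_sent[key] = now
        return True


dedup = DeliveryDeduplicator()
```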
## Safety checklist
- [ ] Endpoint on allowlist and TLS valid.
- [ ] `secretRef` exists in Authority and is scoped to the tenant.
- [ ] Quiet hours configured for non-critical alerts; throttles set for bursty rules.
- [ ] HMAC signing verified in downstream system (webhook/pager).


@@ -0,0 +1,51 @@
# Escalations & Acknowledgements
> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
Last updated: 2025-11-25 (Docs Tasks Md.V · DOCS-NOTIFY-40-001)
## Model
- **Escalation policy**: ordered stages of channels with delays; stored per tenant.
- **Acknowledgement**: DSSE-signed token embedded in messages; acknowledger must present token to stop escalation.
- **Suppression**: rules may mark events as non-escalating (informational) while still sending single notifications.
## Policy schema (conceptual)
```json
{
"id": "uuid",
"tenant": "string",
"name": "pager-policy-prod",
"stages": [
{ "delaySeconds": 0, "channels": ["slack-prod", "email-oncall"] },
{ "delaySeconds": 900, "channels": ["pager-primary"] },
{ "delaySeconds": 1800, "channels": ["pager-management"] }
],
"autoCloseMinutes": 120,
"retry": { "maxAttempts": 3, "backoffSeconds": 60 }
}
```
- Stages execute sequentially until an **ack** is recorded.
- Deterministic ordering: channels within a stage are sorted lexicographically before dispatch.
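The ordering rule above can be illustrated with a short Python sketch that turns the conceptual policy into a dispatch plan. `dispatch_plan` is a hypothetical helper, and Python's default string sort (by code point) is assumed to be an acceptable stand-in for "lexicographically".

```python
def dispatch_plan(policy):
    """List (delaySeconds, sorted channel names) for each stage, in stage order."""
    return [
        (stage["delaySeconds"], sorted(stage["channels"]))
        for stage in policy["stages"]
    ]


policy = {
    "name": "pager-policy-prod",
    "stages": [
        {"delaySeconds": 0, "channels": ["slack-prod", "email-oncall"]},
        {"delaySeconds": 900, "channels": ["pager-primary"]},
        {"delaySeconds": 1800, "channels": ["pager-management"]},
    ],
}

plan = dispatch_plan(policy)
# plan[0] == (0, ["email-oncall", "slack-prod"])
```

Stage order is preserved as declared; only the channels inside each stage are re-sorted.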
## Ack tokens
- Token payload: `{ tenant, deliveryId, expiresUtc, ruleId, actionHash }`.
- Signed with Authority-issued DSSE key; verified by Notify WebService before accepting `POST /acks/{token}`.
- Expiry defaults to 24h; tokens are single-use and idempotent.
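The non-cryptographic checks on an ack token can be sketched in Python as follows. DSSE signature verification is deliberately left out, and the payload is assumed to be already verified and decoded. Treating a repeated ack as a distinct `already_acknowledged` outcome is one possible reading of "single-use and idempotent"; the status strings and `check_ack` are invented for the example.

```python
from datetime import datetime, timezone


def parse_utc(value):
    """Parse an ISO-8601 UTC timestamp such as 2025-11-26T00:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def check_ack(payload, request_tenant, now, acknowledged):
    """Apply tenant, expiry, and reuse checks to a decoded token payload."""
    if payload["tenant"] != request_tenant:
        return "wrong_tenant"
    if now >= parse_utc(payload["expiresUtc"]):
        return "expired"
    if payload["deliveryId"] in acknowledged:
        return "already_acknowledged"  # repeat is harmless, nothing re-cancelled
    acknowledged.add(payload["deliveryId"])
    return "accepted"


token_payload = {
    "tenant": "tenant-a",
    "deliveryId": "d-1",
    "expiresUtc": "2025-11-26T00:00:00Z",
    "ruleId": "r-1",
    "actionHash": "abc",
}
```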
## Escalation flow
1) Rule fires → action references an escalation policy.
2) Stage 0 deliveries sent; ledger records attempts and ack URL.
3) If no ack by `delaySeconds`, next stage dispatches; repeats until ack or final stage.
4) On ack, remaining stages are cancelled; ledger entry marked `acknowledged` with timestamp and subject.
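The flow above can be walked through with a small Python simulation. It assumes each stage's `delaySeconds` is measured from the start of the escalation (the text does not say whether delays are absolute or relative to the previous stage) and that an ack arriving exactly at a stage's delay is too late to cancel that stage.

```python
def simulate(stages, ack_at=None):
    """Return (sent stage indexes, cancelled stage indexes) for an ack time.

    ack_at is seconds since the escalation started, or None for no ack.
    """
    sent, cancelled = [], []
    for index, stage in enumerate(stages):
        if ack_at is not None and ack_at < stage["delaySeconds"]:
            cancelled.append(index)  # ack landed before this stage was due
        else:
            sent.append(index)
    return sent, cancelled


stages = [
    {"delaySeconds": 0, "channels": ["slack-prod", "email-oncall"]},
    {"delaySeconds": 900, "channels": ["pager-primary"]},
    {"delaySeconds": 1800, "channels": ["pager-management"]},
]

outcome = simulate(stages, ack_at=1000)
# outcome == ([0, 1], [2]): the ack at 1000s cancels only the final stage
```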
## Quiet hours & throttles
- Quiet hours suppress *new* escalations; in-flight escalations continue.
- Per-policy throttle prevents repeated escalation runs for identical `actionHash` within a configurable window (default 30m).
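A gate that combines both rules for *new* escalation runs could look like the Python sketch below. The function names, the in-memory `last_started` map, and the choice to treat the quiet-hours end time as exclusive are assumptions; the 30-minute default and the 22:00 to 06:00 window come from this documentation.

```python
from datetime import time

THROTTLE_WINDOW_SECONDS = 30 * 60  # default window from the text
QUIET = {"from": time(22, 0), "to": time(6, 0)}


def in_quiet_hours(now, start, end):
    """True when now falls in [start, end), including windows that wrap midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def may_start_escalation(action_hash, now_epoch, now_clock, quiet, last_started):
    """Decide whether a new escalation run may start; in-flight runs are not consulted."""
    if in_quiet_hours(now_clock, quiet["from"], quiet["to"]):
        return False
    previous = last_started.get(action_hash)
    if previous is not None and now_epoch - previous < THROTTLE_WINDOW_SECONDS:
        return False
    last_started[action_hash] = now_epoch
    return True
```

Only the start of a run is gated here, which mirrors the statement that in-flight escalations continue through quiet hours.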
## Observability
- Counters: `notify.escalation.started`, `notify.escalation.stage_sent`, `notify.escalation.ack`, `notify.escalation.cancelled` tagged by `tenant`, `policy`, `stage`.
- Logs: structured `escalation.{started|stage_sent|ack|cancelled}` with delivery ids and rationale.
## Runbooks
- To update an escalation policy safely, create a new policy id, switch rules to it, then delete the old policy; this avoids mid-flight ambiguity.
- If a stage storms, set the throttle higher or add quiet hours. Do not delete the policy mid-flight; use the `cancelEscalation` endpoint instead.