Notifications Digests

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

Digests coalesce multiple matching events into a single notification when rules request batched delivery. They protect responders from alert storms while preserving a deterministic record of every input.


1. Digest lifecycle

  1. Window selection. Rule actions opt into a digest cadence by setting actions[].digest (instant, 5m, 15m, 1h, 1d). instant skips digest logic entirely.
  2. Aggregation. When an event matches, the worker appends it to the open digest window (tenantId + actionId + window). Events include the canonical scope, delta counts, and references.
  3. Flush. When the window expires or hits the worker's configurable safety cap, the worker renders a digest template and emits a single delivery with status Digested.
  4. Audit. The delivery ledger links back to the digest document so operators can inspect individual items and the aggregated summary.
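
The lifecycle above can be sketched as a small in-memory model. This is an illustrative sketch only, not the Notify worker itself: `DigestAggregator`, `DigestWindow`, `item_cap`, `tick`, and the list standing in for the delivery ledger are hypothetical names, and the mapping from cadence strings to durations is an assumed reading of the values listed in step 1.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Cadence values come from the text; the durations are an assumed reading.
WINDOW_LENGTHS = {
    "5m": timedelta(minutes=5),
    "15m": timedelta(minutes=15),
    "1h": timedelta(hours=1),
    "1d": timedelta(days=1),
}


@dataclass
class DigestWindow:
    tenant_id: str
    action_id: str
    window: str
    opened_at: datetime
    items: list = field(default_factory=list)

    def expired(self, now: datetime) -> bool:
        return now >= self.opened_at + WINDOW_LENGTHS[self.window]


class DigestAggregator:
    """Hypothetical in-memory stand-in for the worker's digest handling."""

    def __init__(self, item_cap: int = 100):
        self.item_cap = item_cap   # assumed name for the safety cap
        self.open_windows = {}     # keyed by (tenantId, actionId, window)
        self.deliveries = []       # stand-in for the delivery ledger

    def on_event(self, tenant_id, action_id, cadence, event, now) -> bool:
        """Return True if the event was added to a digest window."""
        if cadence == "instant":
            return False           # instant skips digest logic entirely
        key = (tenant_id, action_id, cadence)
        win = self.open_windows.get(key)
        if win is not None and win.expired(now):
            self.flush(key)        # late event: close the stale window first
            win = None
        if win is None:
            win = DigestWindow(tenant_id, action_id, cadence, opened_at=now)
            self.open_windows[key] = win
        win.items.append(event)
        if len(win.items) >= self.item_cap:
            self.flush(key)        # safety cap reached before the timer
        return True

    def tick(self, now) -> None:
        """Flush every window whose cadence has elapsed."""
        expired = [k for k, w in self.open_windows.items() if w.expired(now)]
        for key in expired:
            self.flush(key)

    def flush(self, key) -> None:
        win = self.open_windows.pop(key)
        self.deliveries.append({"status": "Digested", "items": list(win.items)})
```

Keeping the window key as a tuple of tenant, action, and cadence means two rules with different cadences never share a window, which matches the aggregation step described above.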

2. Storage model

Digest state lives in Mongo (digests collection) and mirrors the schema described in ARCHITECTURE_NOTIFY.md:

{
  "_id": "tenant-dev:act-email-compliance:1h",
  "tenantId": "tenant-dev",
  "actionKey": "act-email-compliance",
  "window": "1h",
  "openedAt": "2025-10-24T08:00:00Z",
  "status": "open",
  "items": [
    {
      "eventId": "00000000-0000-0000-0000-000000000001",
      "scope": {
        "namespace": "prod-payments",
        "repo": "ghcr.io/acme/api",
        "digest": "sha256:…"
      },
      "delta": {
        "newCritical": 1,
        "kev": 1
      }
    }
  ]
}
  • status reflects whether the window is currently collecting (open) or has been completed (closed). Future revisions may introduce a flushing status for flushes that are still in progress.
  • items[].delta captures aggregated counts for reporting (e.g., new critical findings, KEV, quieted).
  • Workers use optimistic concurrency on the document ID to avoid duplicate flushes across replicas.
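
To illustrate how optimistic concurrency can prevent duplicate flushes, the sketch below uses a compare-and-set on a per-document version counter. It is an assumption-laden illustration: the `version` field, `InMemoryDigestStore`, `compare_and_set`, and `try_flush` are hypothetical and do not appear in the schema above, and a real worker would issue a conditional update against Mongo rather than a dictionary.

```python
import threading


class InMemoryDigestStore:
    """Hypothetical stand-in for the digests collection."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def insert(self, doc):
        with self._lock:
            # 'version' is an assumed concurrency token, not part of the schema.
            self._docs[doc["_id"]] = dict(doc, version=1)

    def get(self, doc_id):
        with self._lock:
            return dict(self._docs[doc_id])

    def compare_and_set(self, doc_id, expected_version, changes):
        """Apply changes only if nobody else has modified the document."""
        with self._lock:
            current = self._docs[doc_id]
            if current["version"] != expected_version:
                return False  # another replica won the race
            current.update(changes)
            current["version"] += 1
            return True


def try_flush(store, doc_id):
    """Return True only for the single replica that closes the window."""
    doc = store.get(doc_id)
    if doc["status"] != "open":
        return False
    return store.compare_and_set(doc_id, doc["version"], {"status": "closed"})
```

If two replicas read the same open document, only the first conditional write succeeds; the second sees a stale version and skips the flush.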

3. Rendering and templates

  • Digest deliveries use the same template engine as instant notifications. Templates receive an additional digest object with window, openedAt, itemCount, and items (findings grouped by namespace/repository when available).
  • Provide digest-specific templates (e.g., tmpl-digest-hourly) so the body can enumerate top offenders, summarise totals, and link to detailed dashboards.
  • When no template is specified, Notify falls back to channel defaults that emphasise summary counts and redirect to Console for detail.
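
As a rough illustration of the digest object handed to templates, the sketch below builds one from a stored digest document and renders a plain-text summary. The field names window, openedAt, itemCount, and items come from the text; the helper names `build_digest_context` and `render_summary`, the grouping key, and the output format are assumptions, and the real template engine is not shown.

```python
from collections import defaultdict


def build_digest_context(doc):
    """Group items by (namespace, repo) and expose the documented fields."""
    groups = defaultdict(list)
    for item in doc["items"]:
        scope = item.get("scope", {})
        key = (scope.get("namespace", "(unknown)"), scope.get("repo", "(unknown)"))
        groups[key].append(item)
    return {
        "window": doc["window"],
        "openedAt": doc["openedAt"],
        "itemCount": len(doc["items"]),
        "items": dict(groups),
    }


def render_summary(digest):
    """Render one header line plus per-group delta totals."""
    lines = [
        f"{digest['itemCount']} event(s) in the {digest['window']} window "
        f"opened at {digest['openedAt']}"
    ]
    for (namespace, repo), items in sorted(digest["items"].items()):
        totals = defaultdict(int)
        for item in items:
            for name, count in item.get("delta", {}).items():
                totals[name] += count
        counts = ", ".join(f"{k}={v}" for k, v in sorted(totals.items()))
        lines.append(f"- {namespace} / {repo}: {counts}")
    return "\n".join(lines)
```

Summing the delta counts per group gives the kind of totals a digest-specific template could use to list top offenders.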

4. API surface

| Endpoint | Description | Notes |
|----------|-------------|-------|
| POST /digests | Issues administrative commands (e.g., force flush, reopen) for a specific action/window. | Request body specifies the command target; requires notify.admin. |
| GET /digests/{actionKey} | Returns the currently open window (if any) for the referenced action. | Supports operators/CLI inspecting pending digests; requires notify.read. |
| DELETE /digests/{actionKey} | Drops the open window without notifying (emergency stop). | Emits an audit record; use sparingly. |

All routes honour the tenant header and reuse the standard Notify rate limits.
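
The sketch below shows one way a client might construct requests for these routes without sending them. Everything beyond the paths and HTTP methods is an assumption: the base URL, the tenant header name (passed in as a parameter because this page does not name it), and the shape of the command body are hypothetical placeholders.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_digest_request(base_url, tenant_header, tenant_id, method,
                         action_key=None, command=None):
    """Build (but do not send) a request for one of the digest routes.

    tenant_header and the command payload are caller-supplied because
    their real names are not specified on this page.
    """
    path = "/digests"
    if action_key is not None:
        path += "/" + quote(action_key, safe="")
    data = None
    headers = {tenant_header: tenant_id}
    if command is not None:
        data = json.dumps(command).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(base_url.rstrip("/") + path, data=data,
                   headers=headers, method=method)


# Hypothetical usage: inspect the open window for one action.
inspect_req = build_digest_request(
    "https://notify.example.internal",  # placeholder host
    "X-Example-Tenant",                 # placeholder header name
    "tenant-dev",
    "GET",
    action_key="act-email-compliance",
)
```

A force flush would use the same helper with method "POST", no `action_key`, and a command body naming the target action and window.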


5. Worker behaviour and safety nets

  • Idempotency. Flush operations generate a deterministic digest delivery ID (digest:<tenant>:<actionId>:<window>:<openedAt>). Retries reuse the same ID.
  • Throttles. Digest generation respects action throttles; setting an aggressive throttle together with a digest window may result in deliberate skips (logged as Throttled in the delivery ledger).
  • Quiet hours. Future sprint work (NOTIFY-SVC-39-004) will integrate quiet-hour calendars. When enabled, flush timers pause during quiet windows and resume afterwards.
  • Back-pressure. When the window reaches the configured item cap before the timer, the worker flushes early and starts a new window immediately.
  • Crash resilience. Workers rebuild in-flight windows from Mongo on startup; a partially flushed window stays closed if the flush succeeded and is reopened if the flush failed.
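
The idempotency rule can be sketched as follows. The ID format is taken from the bullet above; `DeliveryLedger` and its `record` method are hypothetical stand-ins for the delivery ledger, shown only to illustrate why a retried flush does not produce a second delivery.

```python
def digest_delivery_id(tenant, action_id, window, opened_at):
    """Build the deterministic ID: digest:<tenant>:<actionId>:<window>:<openedAt>."""
    return f"digest:{tenant}:{action_id}:{window}:{opened_at}"


class DeliveryLedger:
    """Hypothetical in-memory ledger that rejects duplicate delivery IDs."""

    def __init__(self):
        self._entries = {}

    def record(self, delivery_id, payload):
        """Return True if the delivery is new, False if it is a retry."""
        if delivery_id in self._entries:
            return False
        self._entries[delivery_id] = payload
        return True


ledger = DeliveryLedger()
delivery_id = digest_delivery_id(
    "tenant-dev", "act-email-compliance", "1h", "2025-10-24T08:00:00Z"
)
first = ledger.record(delivery_id, {"status": "Digested"})
retry = ledger.record(delivery_id, {"status": "Digested"})  # same ID, ignored
```

Because the ID depends only on the window's identity and opening time, a worker that crashes mid-flush and retries after restart derives the same ID and the ledger can discard the duplicate.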

6. Operator guidance

  • Choose hourly digests for high-volume compliance events; daily digests suit executive reporting.
  • Pair digests with incident-focused instant rules so critical items surface immediately while less urgent noise is summarised.
  • Monitor /stats output for openDigestCount to ensure windows are flushing; spikes may indicate downstream connector failures.
  • When testing new digest templates, open a small (5m) window, trigger sample events, then issue a force-flush command through POST /digests for that action and window to validate rendering before moving to longer cadences.
