# StellaOps Notify

Notify (Notifications Studio) converts platform events into tenant-scoped alerts with deterministic delivery, offline parity, and a full audit trail. The service is split between the reusable tooling in `src/Notify/*` and the runtime host in `src/Notifier/*` (decision recorded 2025-11-02) so downstream systems can embed the rules engine without inheriting the Studio UI.

## Latest updates (2025-11-30)

- Sprint tracker `docs/implplan/SPRINT_322_docs_modules_notify.md` and module `TASKS.md` added to mirror status.
- Observability runbook stub and Grafana placeholder added under `operations/` (offline import); finalize after next demo.
- NOTIFY-DOCS-0002 remains blocked pending NOTIFY-SVC-39-001..004 outputs (correlation/digests/simulation/quiet hours).
- `2026-04-15`: Notify/Notifier production hosts now use the shared PostgreSQL + Redis-backed Notify persistence/queue path instead of live in-memory shadow registrations.
- `2026-04-15`: durable pack-approval persistence and restart-survival proof landed under sprint `SPRINT_20260415_002_DOCS_notify_notifier_real_backend_cutover.md`.
- `2026-04-16`: non-testing throttle and operator-override admin APIs now persist through PostgreSQL-backed suppression services and legacy compat adapters in both hosts; restart-survival proof landed in `NotifierSuppressionDurableRuntimeTests`.
- `2026-04-16`: non-testing escalation-policy and on-call schedule APIs now resolve through PostgreSQL-backed services plus durable legacy compat adapters in both hosts; restart-survival proof landed in `NotifierEscalationOnCallDurableRuntimeTests`.
- `2026-04-16`: non-testing quiet-hours and maintenance-window admin/runtime state now persists through PostgreSQL-backed quiet-hours calendar/evaluator services plus durable compat adapters in both hosts; restart-survival proof landed in `NotifierQuietHoursMaintenanceDurableRuntimeTests`.
- `2026-04-16`: non-testing webhook security, tenant isolation, dead-letter administration, and retention cleanup state now persist through PostgreSQL-backed runtime services plus durable compat adapters in both hosts; restart-survival proof landed in `NotifierSecurityDeadLetterDurableRuntimeTests`.

## Scope & responsibilities

- Apply tenant-scoped rules to events from Scanner, Scheduler, VEX Lens, Attestor, Task Runner, and Zastava.
- Render channel-specific payloads (Slack, Teams, Email, webhook) using deterministic templates with localisation safeguards.
- Enforce throttling, digests, and quiet-hour calendars so bursts stay explainable and recoverable.
- Persist deliveries, attempts, throttles, and DSSE hashes for CLI/UI investigation and compliance export.

## Current capabilities (Sprint 38 foundations)

- **Rules + channels API:** `StellaOps.Notify.WebService` exposes CRUD, previews, and health probes secured by Authority scopes.
- **Worker pipeline:** `StellaOps.Notify.Worker` ingests bus events, evaluates match predicates, applies per-tenant throttles, and dispatches deliveries.
- **Connector plug-ins:** Restart-time plug-ins under `StellaOps.Notify.Connectors.*` (Slack, Teams, Email, generic webhook) with health checks and retry policy hints declared in `notify-plugin.json`.
- **Template engine:** Deterministic rendering with safe helpers, locale bundles, and redaction defaults that keep Offline Kit parity.
- **Delivery ledger:** PostgreSQL-backed ledger storing hashed payloads, attempts, throttled/digested markers, and provenance links for audit + exports.

## In progress / upcoming (Sprint 39 focus)

- `NOTIFY-SVC-39-001` correlation engine with token-bucket throttles, incident lifecycle, and quiet-hours evaluator.
- `NOTIFY-SVC-39-002` digest generator with schedule runner, ledger queries, and distribution across existing channels.
- `NOTIFY-SVC-39-003` simulation API for rule dry-runs against historical events.
- `NOTIFY-SVC-39-004` quiet-hour calendar integration and default throttles with audit logging.

Status for these items is tracked in `src/Notifier/StellaOps.Notifier/TASKS.md` and sprint plans; update this README once tasks merge.

## Key docs & release alignment

- [`overview.md`](overview.md): summary of capabilities, imposed rules, and customer journey.
- [`architecture.md`](architecture.md) / [`architecture-detail.md`](architecture-detail.md): Notifications Studio runtime view.
- [`rules.md`](rules.md): declarative matcher syntax and evaluation order.
- [`digests.md`](digests.md): digest windows, coalescing logic, and delivery samples.
- [`templates.md`](templates.md): template helpers, localisation, and redaction guidelines.
- [`docs/implplan/archived/updates/2025-10-29-notify-docs.md`](../../implplan/archived/updates/2025-10-29-notify-docs.md): latest release note; follow-ups remain to validate connector metadata, quiet-hours semantics, and simulation payloads once Sprint 39 drops land.

## Integrations & dependencies

- **Storage:** PostgreSQL (schema `notify`) for rules, channels, deliveries, digests, and throttles; Valkey for worker coordination.
- **Queues:** Valkey Streams or NATS JetStream for ingestion, throttling, and DLQs (`notify.dlq`).
- **Authority:** OpTok-protected APIs, DPoP-backed CLI/UI scopes (`notify.viewer`, `notify.operator`, `notify.admin`), and secret references for channel credentials.
- **Observability:** Prometheus metrics (`notify.sent_total`, `notify.failed_total`, `notify.digest_coalesced_total`, etc.), OTEL traces, and dashboards documented in `architecture-detail.md`.

## Operational notes

- Schema fixtures live in `./resources/schemas`; event and delivery samples live in `./resources/samples` for contract tests and UI mocks.
- Offline Kit bundles ship plug-ins, default templates, and seed rules; update manifests under `ops/offline-kit/` when connectors change.
- Dashboards and alert references depend on `DEVOPS-NOTIFY-39-002`; coordinate before renaming metrics or labels.
- Observability assets: `operations/observability.md` and `operations/dashboards/notify-observability.json` (offline import).
- When releasing new rule or connector features, update guidance in this directory and related checklists until the follow-ups are closed.

## Epic alignment

- **Epic 11 – Notifications Studio:** notifications workspace, preview tooling, immutable delivery ledger, throttling/digest controls, and forthcoming correlation/simulation features.

## Implementation Status

### Delivery Phases

- **Phase 1 – Core rules engine & delivery ledger:** Implement rules/channels schema, event ingestion, rule evaluation, idempotent deliveries, audit logging
- **Phase 2 – Connectors & rendering:** Ship Slack/Teams/Email/Webhook connectors, template rendering, localization, throttling, retries, secret referencing
- **Phase 3 – Console & CLI authoring:** Provide UI/CLI for rule authoring, previews, channel health, delivery browsing, digests, test sends
- **Phase 4 – Governance & observability:** Add approvals, RBAC, tenant quotas, metrics/logs/traces, dashboards, alerts, runbooks
- **Phase 5 – Offline & compliance:** Produce Offline Kit bundles (rules/channels/deploy scripts), signed exports, retention policies, auditing

### Acceptance Criteria

- Rules evaluate deterministically per event; deliveries idempotent with audit trail and DSSE signatures
- Channel connectors support retries, rate limits, health checks, previews; secrets referenced securely
- Console/CLI support rule creation, testing, digests, delivery browsing, export/import workflows
- Observability dashboards track delivery health; alerts fire for sustained failures or backlog; runbooks cover remediation
- Offline Kit bundle contains configs, rules, digests, deployment scripts for air-gapped installs
- Notify respects tenancy and RBAC; governance (approvals, change log) enforced for
  high-impact rules

### Key Risks & Mitigations

- **Notification storms:** Throttling, digests, dedupe windows, preview/test gating
- **Secret compromise:** Secret references only, rotation workflows, audit logging
- **Connector API changes:** Versioned adapter layer, nightly health checks, fallback channels
- **Noise vs signal:** Simulation previews, metrics, rule scoring, recommended defaults
- **Offline parity:** Export/import of rules, connectors, digests with signed manifests

### Current Phase Progress

- Phase 1: Core rules engine mostly complete; template dispatch/rendering in progress
- Phase 2: Connector and rendering work not yet started; depends on Phase 1 completion
- Phase 3: Console/CLI authoring work not started; depends on Phase 2 completion
- Phase 4: Core observability complete; governance and risk notifications blocked on upstream dependencies
- Phase 5: Offline basics complete; tenancy work blocked on upstream Sprint 0172
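
### Illustrative sketch: token-bucket throttling

This README names throttling as the first mitigation for notification storms and mentions token-bucket throttles for the correlation engine. The sketch below shows the general token-bucket technique only; it is not taken from the Notify code base. It is written in Python purely for illustration, and every name in it (`TokenBucket`, `TenantThrottle`, `allow`, `tenant_id`, `rule_id`, `capacity`, `refill_per_second`) is hypothetical, as is the choice to key buckets by tenant and rule. The clock is injected so that the same inputs always give the same decisions, which makes the behaviour easy to test.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class TokenBucket:
    """Hypothetical token bucket; not the StellaOps implementation."""

    capacity: float            # maximum burst size, in deliveries
    refill_per_second: float   # sustained delivery rate
    tokens: float              # tokens currently available
    last_refill: float         # clock reading at the last refill

    def try_consume(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TenantThrottle:
    """Keeps one bucket per (tenant, rule) pair (an assumed keying scheme)."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._capacity = capacity
        self._refill = refill_per_second
        self._clock = clock
        self._buckets: Dict[Tuple[str, str], TokenBucket] = {}

    def allow(self, tenant_id: str, rule_id: str) -> bool:
        """Return True if a delivery may be dispatched right now."""
        now = self._clock()
        key = (tenant_id, rule_id)
        bucket = self._buckets.get(key)
        if bucket is None:
            # New buckets start full so the first burst is not delayed.
            bucket = TokenBucket(self._capacity, self._refill,
                                 tokens=self._capacity, last_refill=now)
            self._buckets[key] = bucket
        return bucket.try_consume(now)
```

A caller would check `allow(tenant_id, rule_id)` before dispatching and, when it returns `False`, record the event as throttled rather than dropping it silently, in keeping with the goal that bursts stay explainable and recoverable.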