
StellaOps Notify

Notify (Notifications Studio) converts platform events into tenant-scoped alerts with deterministic delivery, offline parity, and a full audit trail. The service is split between the reusable tooling in src/Notify/* and the runtime host in src/Notifier/* (decision recorded 2025-11-02) so downstream systems can embed the rules engine without inheriting the Studio UI.

Latest updates (2025-11-30)

  • Sprint tracker docs/implplan/SPRINT_322_docs_modules_notify.md and module TASKS.md added to mirror status.
  • Observability runbook stub and Grafana placeholder added under operations/ (offline import); finalize after next demo.
  • NOTIFY-DOCS-0002 remains blocked pending NOTIFY-SVC-39-001..004 outputs (correlation/digests/simulation/quiet hours).

Scope & responsibilities

  • Apply tenant-scoped rules to events from Scanner, Scheduler, VEX Lens, Attestor, Task Runner, and Zastava.
  • Render channel-specific payloads (Slack, Teams, Email, webhook) using deterministic templates with localisation safeguards.
  • Enforce throttling, digests, and quiet-hour calendars so bursts stay explainable and recoverable.
  • Persist deliveries, attempts, throttles, and DSSE hashes for CLI/UI investigation and compliance export.
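As a rough illustration of the first responsibility, tenant-scoped rule evaluation can be sketched as below. This is a minimal sketch, not the StellaOps.Notify implementation: the `Rule` fields, the severity ordering, and the event kind strings are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical severity ladder; the real service may use a different scale.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass(frozen=True)
class Rule:
    """Illustrative rule shape (assumed, not the actual Notify schema)."""
    rule_id: str
    tenant: str
    event_kinds: frozenset
    min_severity: str = "low"


def matching_rules(rules, event):
    """Return the rules that apply to an event, in a stable order.

    A rule only matches events from its own tenant, which keeps
    evaluation tenant-scoped. Sorting by rule_id makes the result
    independent of the order in which rules were loaded.
    """
    hits = [
        rule for rule in rules
        if rule.tenant == event["tenant"]
        and event["kind"] in rule.event_kinds
        and SEVERITY_ORDER[event["severity"]] >= SEVERITY_ORDER[rule.min_severity]
    ]
    return sorted(hits, key=lambda rule: rule.rule_id)


rules = [
    Rule("r-2", "tenant-a", frozenset({"scanner.report"}), "high"),
    Rule("r-1", "tenant-a", frozenset({"scanner.report", "scheduler.rescan"})),
    Rule("r-3", "tenant-b", frozenset({"scanner.report"})),
]
event = {"tenant": "tenant-a", "kind": "scanner.report", "severity": "critical"}
matched_ids = [rule.rule_id for rule in matching_rules(rules, event)]
```

Sorting the matches is one simple way to keep evaluation order deterministic; the real engine may order rules differently.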

Current capabilities (Sprint 38 foundations)

  • Rules + channels API: StellaOps.Notify.WebService exposes CRUD, previews, and health probes secured by Authority scopes.
  • Worker pipeline: StellaOps.Notify.Worker ingests bus events, evaluates match predicates, applies per-tenant throttles, and dispatches deliveries.
  • Connector plug-ins: Restart-time plug-ins under StellaOps.Notify.Connectors.* (Slack, Teams, Email, generic webhook) with health checks and retry policy hints declared in notify-plugin.json.
  • Template engine: Deterministic rendering with safe helpers, locale bundles, and redaction defaults that keep Offline Kit parity.
  • Delivery ledger: PostgreSQL-backed ledger storing hashed payloads, attempts, throttled/digested markers, and provenance links for audit + exports.
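The deterministic rendering and hashed-payload ideas above can be sketched as follows. This is an assumption-laden illustration: the canonical JSON form, the `sha256:` prefix, and the `$name` placeholder syntax (Python's `string.Template`) are choices made for the example, not the format the Notify template engine or delivery ledger actually uses.

```python
import hashlib
import json
from string import Template


def canonical_json(payload: dict) -> bytes:
    """Serialise with sorted keys and fixed separators.

    Two payloads with the same content then produce the same bytes,
    regardless of the key order they were built in.
    """
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def payload_digest(payload: dict) -> str:
    """Hash the canonical form; the 'sha256:' prefix is an assumed convention."""
    return "sha256:" + hashlib.sha256(canonical_json(payload)).hexdigest()


def render(template: str, fields: dict) -> dict:
    """Render a channel payload from a template and event fields.

    Template.substitute raises KeyError on a missing field, so a
    template never silently renders with gaps.
    """
    return {"text": Template(template).substitute(fields)}


fields_a = {"image": "registry.example/app:1.2", "severity": "high"}
fields_b = {"severity": "high", "image": "registry.example/app:1.2"}  # same data, other order
template = "[$severity] new findings for $image"

digest_a = payload_digest(render(template, fields_a))
digest_b = payload_digest(render(template, fields_b))
```

Storing only the digest in the ledger lets an auditor confirm what was sent without the ledger holding the rendered message body; whether the real ledger works this way is not stated here.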

In progress / upcoming (Sprint 39 focus)

  • NOTIFY-SVC-39-001 correlation engine with token-bucket throttles, incident lifecycle, and quiet-hours evaluator.
  • NOTIFY-SVC-39-002 digest generator with schedule runner, ledger queries, and distribution across existing channels.
  • NOTIFY-SVC-39-003 simulation API for rule dry-runs against historical events.
  • NOTIFY-SVC-39-004 quiet-hour calendar integration and default throttles with audit logging.

Status for these items is tracked in src/Notifier/StellaOps.Notifier/TASKS.md and sprint plans; update this README once tasks merge.
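For readers unfamiliar with the token-bucket throttle named in NOTIFY-SVC-39-001, here is a minimal sketch of the technique. The class name, parameters, and the per-(tenant, rule) keying are assumptions for illustration; the planned correlation engine may differ. The clock is passed in explicitly so the behaviour stays reproducible in tests.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Classic token bucket: bursts up to `capacity`, refilled at a steady rate."""
    capacity: float
    refill_per_second: float
    tokens: float = field(init=False)
    last_refill: float = 0.0

    def __post_init__(self):
        # Start full so the first burst is allowed.
        self.tokens = self.capacity

    def try_consume(self, now: float, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then spend `cost` tokens if available."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# One bucket per (tenant, rule) pair; this keying is a hypothetical choice.
buckets = {}


def allow_delivery(tenant: str, rule_id: str, now: float) -> bool:
    bucket = buckets.setdefault((tenant, rule_id), TokenBucket(capacity=2, refill_per_second=1.0))
    return bucket.try_consume(now)


decisions = [
    allow_delivery("tenant-a", "r-1", now=0.0),  # burst slot 1
    allow_delivery("tenant-a", "r-1", now=0.0),  # burst slot 2
    allow_delivery("tenant-a", "r-1", now=0.0),  # bucket empty, throttled
    allow_delivery("tenant-a", "r-1", now=1.0),  # one token refilled
]
```

A throttled delivery would presumably be marked in the ledger rather than dropped, in line with the "throttled/digested markers" mentioned under current capabilities.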

Key docs & release alignment

Integrations & dependencies

  • Storage: PostgreSQL (schema notify) for rules, channels, deliveries, digests, and throttles; Valkey for worker coordination.
  • Queues: Valkey Streams or NATS JetStream for ingestion, throttling, and DLQs (notify.dlq).
  • Authority: OpTok-protected APIs, DPoP-backed CLI/UI scopes (notify.viewer, notify.operator, notify.admin), and secret references for channel credentials.
  • Observability: Prometheus metrics (notify.sent_total, notify.failed_total, notify.digest_coalesced_total, etc.), OTEL traces, and dashboards documented in architecture-detail.md.

Operational notes

  • Schema fixtures live in ./resources/schemas; event and delivery samples live in ./resources/samples for contract tests and UI mocks.
  • Offline Kit bundles ship plug-ins, default templates, and seed rules; update manifests under ops/offline-kit/ when connectors change.
  • Dashboards and alert references depend on DEVOPS-NOTIFY-39-002; coordinate before renaming metrics or labels.
  • Observability assets: operations/observability.md and operations/dashboards/notify-observability.json (offline import).
  • When releasing new rule or connector features, update guidance in this directory and related checklists until the follow-ups are closed.

Epic alignment

  • Epic 11 Notifications Studio: notifications workspace, preview tooling, immutable delivery ledger, throttling/digest controls, and forthcoming correlation/simulation features.

Implementation Status

Delivery Phases

  • Phase 1 Core rules engine & delivery ledger: Implement rules/channels schema, event ingestion, rule evaluation, idempotent deliveries, audit logging
  • Phase 2 Connectors & rendering: Ship Slack/Teams/Email/Webhook connectors, template rendering, localization, throttling, retries, secret referencing
  • Phase 3 Console & CLI authoring: Provide UI/CLI for rule authoring, previews, channel health, delivery browsing, digests, test sends
  • Phase 4 Governance & observability: Add approvals, RBAC, tenant quotas, metrics/logs/traces, dashboards, alerts, runbooks
  • Phase 5 Offline & compliance: Produce Offline Kit bundles (rules/channels/deploy scripts), signed exports, retention policies, auditing

Acceptance Criteria

  • Rules evaluate deterministically per event; deliveries idempotent with audit trail and DSSE signatures
  • Channel connectors support retries, rate limits, health checks, previews; secrets referenced securely
  • Console/CLI support rule creation, testing, digests, delivery browsing, export/import workflows
  • Observability dashboards track delivery health; alerts fire for sustained failures or backlog; runbooks cover remediation
  • Offline Kit bundle contains configs, rules, digests, deployment scripts for air-gapped installs
  • Notify respects tenancy and RBAC; governance (approvals, change log) enforced for high-impact rules
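The idempotent-delivery criterion can be illustrated with an idempotency key derived from the identifiers that define one delivery. This is a sketch under stated assumptions: the choice of key fields, the in-memory `DeliveryLedger` class, and its `record` method are hypothetical stand-ins for the PostgreSQL-backed ledger.

```python
import hashlib


def idempotency_key(tenant: str, rule_id: str, event_id: str, channel_id: str) -> str:
    """Derive a stable key for one (tenant, rule, event, channel) delivery.

    The unit-separator character keeps field boundaries unambiguous, so
    ("ab", "c") and ("a", "bc") cannot collapse into the same key material.
    """
    material = "\x1f".join([tenant, rule_id, event_id, channel_id])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


class DeliveryLedger:
    """In-memory stand-in for a ledger table with a unique key constraint."""

    def __init__(self):
        self._entries = {}

    def record(self, tenant: str, rule_id: str, event_id: str, channel_id: str) -> bool:
        """Return True if the delivery is new, False if it was already recorded."""
        key = idempotency_key(tenant, rule_id, event_id, channel_id)
        if key in self._entries:
            return False  # replayed event: do not dispatch again
        self._entries[key] = {"tenant": tenant, "rule": rule_id, "event": event_id, "channel": channel_id}
        return True

    def __len__(self):
        return len(self._entries)


ledger = DeliveryLedger()
first = ledger.record("tenant-a", "r-1", "evt-42", "slack-ops")
replay = ledger.record("tenant-a", "r-1", "evt-42", "slack-ops")
other_channel = ledger.record("tenant-a", "r-1", "evt-42", "email-sec")
```

In a database-backed ledger the same effect would typically come from a unique constraint on the key column, so a redelivered bus event cannot produce a second send.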

Key Risks & Mitigations

  • Notification storms: Throttling, digests, dedupe windows, preview/test gating
  • Secret compromise: Secret references only, rotation workflows, audit logging
  • Connector API changes: Versioned adapter layer, nightly health checks, fallback channels
  • Noise vs signal: Simulation previews, metrics, rule scoring, recommended defaults
  • Offline parity: Export/import of rules, connectors, digests with signed manifests
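To make the "dedupe windows" mitigation concrete, here is a minimal sketch of one possible approach: suppress a notification if an identical one was sent within a fixed window. The `DedupeWindow` class and its key format are hypothetical, and the sketch assumes a fixed window measured from the last sent notification rather than a sliding one.

```python
class DedupeWindow:
    """Suppress repeats of the same key within `window_seconds` of the last send."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._last_sent = {}  # key -> timestamp of the last notification actually sent

    def should_send(self, key: str, now: float) -> bool:
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window_seconds:
            # Still inside the window: suppress, and leave the window anchored
            # at the last real send so a steady stream cannot extend it forever.
            return False
        self._last_sent[key] = now
        return True


window = DedupeWindow(window_seconds=60)
key = "tenant-a|r-1|registry.example/app:1.2"  # hypothetical dedupe key
outcomes = [
    window.should_send(key, now=0),    # first occurrence is sent
    window.should_send(key, now=30),   # duplicate inside the window
    window.should_send(key, now=60),   # window elapsed, sent again
    window.should_send(key, now=90),   # inside the new window
]
```

Anchoring the window at the last sent notification means a continuous storm still surfaces once per window instead of being silenced indefinitely.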

Current Phase Progress

  • Phase 1: Core rules engine mostly complete; template dispatch/rendering in progress
  • Phase 2: Connector and rendering work not yet started; depends on Phase 1 completion
  • Phase 3: Console/CLI authoring work not started; depends on Phase 2 completion
  • Phase 4: Core observability complete; governance and risk notifications blocked on upstream dependencies
  • Phase 5: Offline basics complete; tenancy work blocked on upstream Sprint 0172