git.stella-ops.org/docs/modules/notify/README.md
2025-12-25 18:50:33 +02:00


# StellaOps Notify
Notify (Notifications Studio) converts platform events into tenant-scoped alerts with deterministic delivery, offline parity, and a full audit trail. The service is split between the reusable tooling in `src/Notify/*` and the runtime host in `src/Notifier/*` (decision recorded 2025-11-02) so downstream systems can embed the rules engine without inheriting the Studio UI.
## Latest updates (2025-11-30)
- Sprint tracker `docs/implplan/SPRINT_322_docs_modules_notify.md` and module `TASKS.md` added to mirror status.
- Observability runbook stub and Grafana placeholder added under `operations/` (offline import); finalize after next demo.
- NOTIFY-DOCS-0002 remains blocked pending NOTIFY-SVC-39-001..004 outputs (correlation/digests/simulation/quiet hours).
## Scope & responsibilities
- Apply tenant-scoped rules to events from Scanner, Scheduler, VEX Lens, Attestor, Task Runner, and Zastava.
- Render channel-specific payloads (Slack, Teams, Email, webhook) using deterministic templates with localisation safeguards.
- Enforce throttling, digests, and quiet-hour calendars so bursts stay explainable and recoverable.
- Persist deliveries, attempts, throttles, and DSSE hashes for CLI/UI investigation and compliance export.
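
As an illustration of the first responsibility, the following is a minimal sketch of tenant-scoped rule matching. The `Rule` shape, the field names, and the event kind `scan.completed` are hypothetical and chosen only for this example; the real matcher syntax is described in `rules.md`.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    # Hypothetical rule shape for illustration; not the Notify schema.
    tenant: str
    name: str
    match: dict   # event field -> required value
    channel: str


def matching_rules(event: dict, rules: list) -> list:
    """Return the rules owned by the event's tenant whose predicates all hold.

    The result is sorted by rule name so the same inputs always yield the
    same order, regardless of how the rules were stored.
    """
    hits = [
        rule
        for rule in rules
        if rule.tenant == event.get("tenant")
        and all(event.get(key) == value for key, value in rule.match.items())
    ]
    return sorted(hits, key=lambda rule: rule.name)
```

A rule belonging to another tenant never matches, even if its predicates would otherwise hold, which is the property the tenant scoping is meant to guarantee.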
## Current capabilities (Sprint 38 foundations)
- **Rules + channels API:** `StellaOps.Notify.WebService` exposes CRUD, previews, and health probes secured by Authority scopes.
- **Worker pipeline:** `StellaOps.Notify.Worker` ingests bus events, evaluates match predicates, applies per-tenant throttles, and dispatches deliveries.
- **Connector plug-ins:** Restart-time plug-ins under `StellaOps.Notify.Connectors.*` (Slack, Teams, Email, generic webhook) with health checks and retry policy hints declared in `notify-plugin.json`.
- **Template engine:** Deterministic rendering with safe helpers, locale bundles, and redaction defaults that keep Offline Kit parity.
- **Delivery ledger:** PostgreSQL-backed ledger storing hashed payloads, attempts, throttled/digested markers, and provenance links for audit + exports.
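
To make the "hashed payloads" idea concrete, here is a minimal sketch of how a rendered payload could be hashed deterministically before it is written to a ledger. The canonicalisation choice (sorted keys, compact separators) and the entry fields are assumptions for illustration; this is neither the actual `notify` schema nor a DSSE envelope.

```python
import hashlib
import json


def payload_hash(payload: dict) -> str:
    """Hash a payload via a canonical JSON form so key order does not matter."""
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def ledger_entry(tenant: str, rule: str, channel: str,
                 payload: dict, attempt: int, status: str) -> dict:
    """Build a hypothetical ledger row that stores the hash, not the payload."""
    return {
        "tenant": tenant,
        "rule": rule,
        "channel": channel,
        "payload_sha256": payload_hash(payload),
        "attempt": attempt,
        "status": status,
    }
```

Because the hash is computed over a canonical form, two renders of the same payload produce the same digest, which lets an auditor compare ledger rows without access to the payload itself.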
## In progress / upcoming (Sprint 39 focus)
- `NOTIFY-SVC-39-001`: correlation engine with token-bucket throttles, incident lifecycle, and quiet-hours evaluator.
- `NOTIFY-SVC-39-002`: digest generator with schedule runner, ledger queries, and distribution across existing channels.
- `NOTIFY-SVC-39-003`: simulation API for rule dry-runs against historical events.
- `NOTIFY-SVC-39-004`: quiet-hour calendar integration and default throttles with audit logging.
Status for these items is tracked in `src/Notifier/StellaOps.Notifier/TASKS.md` and sprint plans; update this README once tasks merge.
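
For readers unfamiliar with the two mechanisms named in `NOTIFY-SVC-39-001`, the sketch below shows a generic token-bucket throttle and a quiet-hours check that handles windows crossing midnight. The class, function, and parameter names are hypothetical; the Sprint 39 design may differ in every detail.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class TokenBucket:
    capacity: float            # maximum burst size
    refill_per_second: float   # sustained rate
    tokens: float = 0.0
    updated_at: float = 0.0    # timestamp of the last refill, in seconds

    def __post_init__(self) -> None:
        # Start full so an initial burst up to `capacity` is allowed.
        self.tokens = self.capacity

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def in_quiet_hours(t: time, start: time, end: time) -> bool:
    """True if t falls in [start, end); supports windows that wrap midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end
```

Passing `now` explicitly, rather than reading the clock inside `allow`, keeps the throttle deterministic and easy to replay in tests or simulations.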
## Key docs & release alignment
- [`overview.md`](overview.md) — summary of capabilities, imposed rules, and customer journey.
- [`architecture.md`](architecture.md) / [`architecture-detail.md`](architecture-detail.md) — Notifications Studio runtime view.
- [`rules.md`](rules.md) — declarative matcher syntax and evaluation order.
- [`digests.md`](digests.md) — digest windows, coalescing logic, and delivery samples.
- [`templates.md`](templates.md) — template helpers, localisation, and redaction guidelines.
- [`docs/implplan/archived/updates/2025-10-29-notify-docs.md`](../../implplan/archived/updates/2025-10-29-notify-docs.md) — latest release note; follow-ups remain to validate connector metadata, quiet-hours semantics, and simulation payloads once Sprint 39 drops land.
## Integrations & dependencies
- **Storage:** PostgreSQL (schema `notify`) for rules, channels, deliveries, digests, and throttles; Valkey for worker coordination.
- **Queues:** Valkey Streams or NATS JetStream for ingestion, throttling, and DLQs (`notify.dlq`).
- **Authority:** OpTok-protected APIs, DPoP-backed CLI/UI scopes (`notify.viewer`, `notify.operator`, `notify.admin`), and secret references for channel credentials.
- **Observability:** Prometheus metrics (`notify.sent_total`, `notify.failed_total`, `notify.digest_coalesced_total`, etc.), OTEL traces, and dashboards documented in `architecture-detail.md`.
## Operational notes
- Schema fixtures live in `./resources/schemas`; event and delivery samples live in `./resources/samples` for contract tests and UI mocks.
- Offline Kit bundles ship plug-ins, default templates, and seed rules; update manifests under `ops/offline-kit/` when connectors change.
- Dashboards and alert references depend on `DEVOPS-NOTIFY-39-002`; coordinate before renaming metrics or labels.
- Observability assets: `operations/observability.md` and `operations/dashboards/notify-observability.json` (offline import).
- When releasing new rule or connector features, keep the guidance in this directory and the related checklists up to date until the follow-ups are closed.
## Epic alignment
- **Epic 11 - Notifications Studio:** notifications workspace, preview tooling, immutable delivery ledger, throttling/digest controls, and forthcoming correlation/simulation features.
## Implementation Status
### Delivery Phases
- **Phase 1 - Core rules engine & delivery ledger:** Implement rules/channels schema, event ingestion, rule evaluation, idempotent deliveries, audit logging
- **Phase 2 - Connectors & rendering:** Ship Slack/Teams/Email/Webhook connectors, template rendering, localisation, throttling, retries, secret referencing
- **Phase 3 - Console & CLI authoring:** Provide UI/CLI for rule authoring, previews, channel health, delivery browsing, digests, test sends
- **Phase 4 - Governance & observability:** Add approvals, RBAC, tenant quotas, metrics/logs/traces, dashboards, alerts, runbooks
- **Phase 5 - Offline & compliance:** Produce Offline Kit bundles (rules/channels/deploy scripts), signed exports, retention policies, auditing
### Acceptance Criteria
- Rules evaluate deterministically per event; deliveries idempotent with audit trail and DSSE signatures
- Channel connectors support retries, rate limits, health checks, previews; secrets referenced securely
- Console/CLI support rule creation, testing, digests, delivery browsing, export/import workflows
- Observability dashboards track delivery health; alerts fire for sustained failures or backlog; runbooks cover remediation
- Offline Kit bundle contains configs, rules, digests, deployment scripts for air-gapped installs
- Notify respects tenancy and RBAC; governance (approvals, change log) enforced for high-impact rules
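
The idempotency criterion can be illustrated with a minimal sketch: derive a stable key from the delivery's identity and refuse to send twice for the same key. The key composition and the in-memory store are assumptions made for this example; a real implementation would persist keys in the delivery ledger.

```python
import hashlib


def idempotency_key(tenant: str, rule_id: str, channel: str, event_id: str) -> str:
    """Derive a stable key; the unit separator avoids ambiguous concatenation."""
    raw = "\x1f".join([tenant, rule_id, channel, event_id])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class DeliveryLog:
    """Hypothetical in-memory stand-in for a persistent delivery ledger."""

    def __init__(self) -> None:
        self._sent = set()

    def deliver_once(self, key: str, send) -> bool:
        """Call send() only if key was not delivered before.

        The key is recorded after send() returns, so a send that raises
        can be retried later with the same key.
        """
        if key in self._sent:
            return False
        send()
        self._sent.add(key)
        return True
```

With this shape, replaying the same event through the same rule and channel produces the same key, so a retry or a duplicate bus message does not result in a second notification.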
### Key Risks & Mitigations
- **Notification storms:** Throttling, digests, dedupe windows, preview/test gating
- **Secret compromise:** Secret references only, rotation workflows, audit logging
- **Connector API changes:** Versioned adapter layer, nightly health checks, fallback channels
- **Noise vs signal:** Simulation previews, metrics, rule scoring, recommended defaults
- **Offline parity:** Export/import of rules, connectors, digests with signed manifests
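
As a rough illustration of how digests blunt a notification storm, the sketch below coalesces events into fixed time windows per tenant and event kind. The event fields (`id`, `tenant`, `kind`, `ts`) and the fixed-window strategy are assumptions for this example; the actual windowing and coalescing logic is documented in `digests.md`.

```python
from collections import defaultdict


def coalesce(events: list, window_seconds: int) -> list:
    """Group events into one digest per (tenant, kind, window start).

    `ts` is assumed to be an integer number of seconds. Output order and
    event id order are sorted so the result is deterministic.
    """
    buckets = defaultdict(list)
    for event in events:
        window_start = (event["ts"] // window_seconds) * window_seconds
        key = (event["tenant"], event["kind"], window_start)
        buckets[key].append(event["id"])
    return [
        {
            "tenant": tenant,
            "kind": kind,
            "window_start": window_start,
            "count": len(ids),
            "event_ids": sorted(ids),
        }
        for (tenant, kind, window_start), ids in sorted(buckets.items())
    ]
```

A burst of many events inside one window collapses into a single digest entry that still lists every event id, so the burst stays explainable after the fact.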
### Current Phase Progress
- Phase 1: Core rules engine mostly complete; template dispatch/rendering in progress
- Phase 2: Connector and rendering work not yet started; depends on Phase 1 completion
- Phase 3: Console/CLI authoring work not started; depends on Phase 2 completion
- Phase 4: Core observability complete; governance and risk notifications blocked on upstream dependencies
- Phase 5: Offline basics complete; tenancy work blocked on upstream Sprint 0172