# StellaOps Notify

Notify (Notifications Studio) converts platform events into tenant-scoped alerts with deterministic delivery, offline parity, and a full audit trail. The service is split between the reusable tooling in `src/Notify/*` and the runtime host in `src/Notifier/*` (decision recorded 2025-11-02) so downstream systems can embed the rules engine without inheriting the Studio UI.

## Latest updates

- `2025-11-30`: Sprint tracker `docs/implplan/SPRINT_322_docs_modules_notify.md` and module `TASKS.md` added to mirror status.
- `2025-11-30`: Observability runbook stub and Grafana placeholder added under `operations/` (offline import); finalize after next demo.
- `2025-11-30`: NOTIFY-DOCS-0002 remains blocked pending NOTIFY-SVC-39-001..004 outputs (correlation/digests/simulation/quiet hours).
- `2026-04-15`: Notify/Notifier production hosts now use the shared PostgreSQL + Redis-backed Notify persistence/queue path instead of live in-memory shadow registrations.
- `2026-04-15`: durable pack-approval persistence and restart-survival proof landed under sprint `SPRINT_20260415_002_DOCS_notify_notifier_real_backend_cutover.md`.
- `2026-04-16`: non-testing throttle and operator-override admin APIs now persist through PostgreSQL-backed suppression services and legacy compat adapters in both hosts; restart-survival proof landed in `NotifierSuppressionDurableRuntimeTests`.
- `2026-04-16`: non-testing escalation-policy and on-call schedule APIs now resolve through PostgreSQL-backed services plus durable legacy compat adapters in both hosts; restart-survival proof landed in `NotifierEscalationOnCallDurableRuntimeTests`.
- `2026-04-16`: non-testing quiet-hours and maintenance-window admin/runtime state now persists through PostgreSQL-backed quiet-hours calendar/evaluator services plus durable compat adapters in both hosts; restart-survival proof landed in `NotifierQuietHoursMaintenanceDurableRuntimeTests`.
- `2026-04-16`: non-testing webhook security, tenant isolation, dead-letter administration, and retention cleanup state now persist through PostgreSQL-backed runtime services plus durable compat adapters in both hosts; restart-survival proof landed in `NotifierSecurityDeadLetterDurableRuntimeTests`.

## Scope & responsibilities

- Apply tenant-scoped rules to events from Scanner, Scheduler, VEX Lens, Attestor, Task Runner, and Zastava.
- Render channel-specific payloads (Slack, Teams, Email, webhook) using deterministic templates with localisation safeguards.
- Enforce throttling, digests, and quiet-hours calendars so bursts stay explainable and recoverable.
- Persist deliveries, attempts, throttles, and DSSE hashes for CLI/UI investigation and compliance export.

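
The quiet-hours responsibility above can be illustrated with a small evaluator. This is a minimal sketch, not the Notify implementation: `QuietWindow` and `should_hold` are hypothetical names, and the window is assumed to be a single daily UTC range, which is simpler than a real calendar. The point it shows is the wrap-around case, where a window such as 22:00 to 06:00 spans midnight.

```python
from dataclasses import dataclass
from datetime import datetime, time, timezone


@dataclass(frozen=True)
class QuietWindow:
    """Hypothetical daily quiet-hours window in UTC (not the Notify schema)."""
    start: time
    end: time

    def contains(self, moment: datetime) -> bool:
        # Expects a timezone-aware datetime; normalise to UTC before comparing.
        t = moment.astimezone(timezone.utc).time()
        if self.start <= self.end:
            # Same-day window, e.g. 12:00-14:00 (end is exclusive).
            return self.start <= t < self.end
        # Window wraps past midnight, e.g. 22:00-06:00.
        return t >= self.start or t < self.end


def should_hold(window: QuietWindow, moment: datetime) -> bool:
    """Return True when a delivery falls inside the quiet window."""
    return window.contains(moment)


night = QuietWindow(start=time(22, 0), end=time(6, 0))
late = datetime(2026, 4, 16, 23, 30, tzinfo=timezone.utc)
noon = datetime(2026, 4, 16, 12, 0, tzinfo=timezone.utc)
print(should_hold(night, late), should_hold(night, noon))
```

Treating the end of the window as exclusive keeps adjacent windows from overlapping; whether Notify makes the same choice is not stated in this README.
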
## Current capabilities (Sprint 38 foundations)

- **Rules + channels API:** `StellaOps.Notify.WebService` exposes CRUD, previews, and health probes secured by Authority scopes.
- **Worker pipeline:** `StellaOps.Notify.Worker` ingests bus events, evaluates match predicates, applies per-tenant throttles, and dispatches deliveries.
- **Connector plug-ins:** Restart-time plug-ins under `StellaOps.Notify.Connectors.*` (Slack, Teams, Email, generic webhook) with health checks and retry policy hints declared in `notify-plugin.json`.
- **Template engine:** Deterministic rendering with safe helpers, locale bundles, and redaction defaults that keep Offline Kit parity.
- **Delivery ledger:** PostgreSQL-backed ledger storing hashed payloads, attempts, throttled/digested markers, and provenance links for audit + exports.

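
To make the worker pipeline's "evaluate match predicates" step concrete, here is a minimal sketch of tenant-scoped rule matching. The `Rule` and `Event` shapes, the severity field, and the event kind string are all assumptions made for illustration; the actual matcher syntax lives in `rules.md`. The sketch only shows two properties the text claims: rules never match across tenants, and evaluation order is stable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule shape; the real matcher syntax is in rules.md."""
    rule_id: str
    tenant: str
    event_kinds: frozenset
    min_severity: int
    channel: str


@dataclass(frozen=True)
class Event:
    """Hypothetical event shape used only for this sketch."""
    tenant: str
    kind: str
    severity: int


def match_rules(event, rules):
    """Return (rule_id, channel) pairs for matching rules in a stable order."""
    hits = [
        r for r in rules
        if r.tenant == event.tenant            # never cross tenant boundaries
        and event.kind in r.event_kinds
        and event.severity >= r.min_severity
    ]
    # Sort by rule id so repeated evaluation of the same event is identical.
    return [(r.rule_id, r.channel) for r in sorted(hits, key=lambda r: r.rule_id)]


rules = [
    Rule("r2", "tenant-a", frozenset({"scanner.report.ready"}), 5, "email"),
    Rule("r1", "tenant-a", frozenset({"scanner.report.ready"}), 0, "slack"),
    Rule("r3", "tenant-b", frozenset({"scanner.report.ready"}), 0, "teams"),
]
print(match_rules(Event("tenant-a", "scanner.report.ready", 7), rules))
```
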
## In progress / upcoming (Sprint 39 focus)

- `NOTIFY-SVC-39-001` correlation engine with token-bucket throttles, incident lifecycle, and quiet-hours evaluator.
- `NOTIFY-SVC-39-002` digest generator with schedule runner, ledger queries, and distribution across existing channels.
- `NOTIFY-SVC-39-003` simulation API for rule dry-runs against historical events.
- `NOTIFY-SVC-39-004` quiet-hours calendar integration and default throttles with audit logging.

Status for these items is tracked in `src/Notifier/StellaOps.Notifier/TASKS.md` and sprint plans; update this README once tasks merge.

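
For readers unfamiliar with the token-bucket throttles named in `NOTIFY-SVC-39-001`, the following is a generic sketch of the technique, not the planned implementation. `TokenBucket` and `TenantThrottle` are hypothetical names, keying buckets by `(tenant, rule_id)` is an assumption, and the clock is passed in explicitly so the behaviour stays reproducible in tests.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Classic token bucket: refills continuously, spends one token per send."""
    capacity: float
    refill_per_second: float
    tokens: float
    updated_at: float

    def try_take(self, now: float) -> bool:
        # Refill for the elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class TenantThrottle:
    """Hypothetical per-(tenant, rule) throttle built on TokenBucket."""

    def __init__(self, capacity: float, refill_per_second: float):
        self._capacity = capacity
        self._refill = refill_per_second
        self._buckets = {}

    def allow(self, tenant: str, rule_id: str, now: float) -> bool:
        key = (tenant, rule_id)
        bucket = self._buckets.get(key)
        if bucket is None:
            # New buckets start full so the first burst is allowed.
            bucket = TokenBucket(self._capacity, self._refill, self._capacity, now)
            self._buckets[key] = bucket
        return bucket.try_take(now)


throttle = TenantThrottle(capacity=2, refill_per_second=1.0)
burst = [throttle.allow("tenant-a", "r1", now=0.0) for _ in range(3)]
print(burst)
```

A burst of three sends at the same instant spends the two available tokens and rejects the third; one second later a single token has been refilled.
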
## Key docs & release alignment

- [`overview.md`](overview.md): summary of capabilities, imposed rules, and customer journey.
- [`architecture.md`](architecture.md) / [`architecture-detail.md`](architecture-detail.md): Notifications Studio runtime view.
- [`rules.md`](rules.md): declarative matcher syntax and evaluation order.
- [`digests.md`](digests.md): digest windows, coalescing logic, and delivery samples.
- [`templates.md`](templates.md): template helpers, localisation, and redaction guidelines.
- [`docs/implplan/archived/updates/2025-10-29-notify-docs.md`](../../implplan/archived/updates/2025-10-29-notify-docs.md): latest release note; follow-ups remain to validate connector metadata, quiet-hours semantics, and simulation payloads once Sprint 39 drops land.

## Integrations & dependencies

- **Storage:** PostgreSQL (schema `notify`) for rules, channels, deliveries, digests, and throttles; Valkey for worker coordination.
- **Queues:** Valkey Streams or NATS JetStream for ingestion, throttling, and DLQs (`notify.dlq`).
- **Authority:** OpTok-protected APIs, DPoP-backed CLI/UI scopes (`notify.viewer`, `notify.operator`, `notify.admin`), and secret references for channel credentials.
- **Observability:** Prometheus metrics (`notify.sent_total`, `notify.failed_total`, `notify.digest_coalesced_total`, etc.), OTEL traces, and dashboards documented in `architecture-detail.md`.

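
The dead-letter queue mentioned under Queues follows a common pattern: retry a delivery a bounded number of times, then park it for operator review. The sketch below shows that pattern under stated assumptions: `deliver_with_retries` and `always_down` are hypothetical names, the dead-letter store is a plain list standing in for the `notify.dlq` stream, and backoff delays are omitted to keep the example short.

```python
from typing import Callable, Dict, List, Tuple

DLQ_STREAM = "notify.dlq"  # stream name taken from the Queues entry above


def deliver_with_retries(
    message: Dict,
    send: Callable[[Dict], None],
    max_attempts: int,
    dead_letters: List[Dict],
) -> List[Tuple[int, str]]:
    """Try send() up to max_attempts times; dead-letter the message on exhaustion."""
    attempts: List[Tuple[int, str]] = []
    for attempt in range(1, max_attempts + 1):
        try:
            send(message)
        except Exception as exc:  # a real connector would catch narrower errors
            attempts.append((attempt, f"failed: {exc}"))
        else:
            attempts.append((attempt, "ok"))
            return attempts
    # Every attempt failed: record the message for later inspection or replay.
    dead_letters.append(
        {"stream": DLQ_STREAM, "message": message, "attempts": len(attempts)}
    )
    return attempts


def always_down(_message: Dict) -> None:
    """Stand-in connector that always fails."""
    raise ConnectionError("channel unreachable")


dlq: List[Dict] = []
history = deliver_with_retries({"id": "evt-1"}, always_down, 3, dlq)
print(len(history), dlq[0]["stream"])
```

Returning the attempt history alongside the dead-letter entry mirrors the README's emphasis on persisting attempts for audit, though the real ledger schema is not shown here.
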
## Operational notes

- Schema fixtures live in `./resources/schemas`; event and delivery samples live in `./resources/samples` for contract tests and UI mocks.
- Offline Kit bundles ship plug-ins, default templates, and seed rules; update manifests under `ops/offline-kit/` when connectors change.
- Dashboards and alert references depend on `DEVOPS-NOTIFY-39-002`; coordinate before renaming metrics or labels.
- Observability assets: `operations/observability.md` and `operations/dashboards/notify-observability.json` (offline import).
- When releasing new rule or connector features, update guidance in this directory and related checklists until the follow-ups are closed.

## Epic alignment

- **Epic 11 – Notifications Studio:** notifications workspace, preview tooling, immutable delivery ledger, throttling/digest controls, and forthcoming correlation/simulation features.

## Implementation Status

### Delivery Phases

- **Phase 1 – Core rules engine & delivery ledger:** Implement rules/channels schema, event ingestion, rule evaluation, idempotent deliveries, audit logging
- **Phase 2 – Connectors & rendering:** Ship Slack/Teams/Email/Webhook connectors, template rendering, localisation, throttling, retries, secret referencing
- **Phase 3 – Console & CLI authoring:** Provide UI/CLI for rule authoring, previews, channel health, delivery browsing, digests, test sends
- **Phase 4 – Governance & observability:** Add approvals, RBAC, tenant quotas, metrics/logs/traces, dashboards, alerts, runbooks
- **Phase 5 – Offline & compliance:** Produce Offline Kit bundles (rules/channels/deploy scripts), signed exports, retention policies, auditing

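
Phase 1 calls for idempotent deliveries backed by a ledger of hashed payloads. One common way to get there is to derive a stable key from the delivery's identity plus a canonical serialisation of its payload, and to refuse a second write under the same key. The sketch below assumes that approach; `canonical_json`, `delivery_key`, and the in-memory `Ledger` are hypothetical, and it computes a plain SHA-256 digest rather than a DSSE signature.

```python
import hashlib
import json


def canonical_json(payload) -> bytes:
    """Serialise with sorted keys and fixed separators so equal payloads hash equally."""
    return json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def delivery_key(tenant: str, rule_id: str, event_id: str, payload) -> str:
    """Hypothetical idempotency key: SHA-256 over identity fields and payload."""
    digest = hashlib.sha256()
    for part in (tenant, rule_id, event_id):
        digest.update(part.encode("utf-8"))
        digest.update(b"\x00")  # separator avoids ambiguous concatenation
    digest.update(canonical_json(payload))
    return digest.hexdigest()


class Ledger:
    """In-memory stand-in for the PostgreSQL-backed delivery ledger."""

    def __init__(self):
        self._seen = {}

    def record(self, key: str, payload) -> bool:
        """Return True on first write, False when the delivery was already recorded."""
        if key in self._seen:
            return False
        self._seen[key] = payload
        return True


ledger = Ledger()
first = delivery_key("tenant-a", "r1", "evt-1", {"b": 2, "a": 1})
replay = delivery_key("tenant-a", "r1", "evt-1", {"a": 1, "b": 2})
print(first == replay, ledger.record(first, {"a": 1}), ledger.record(replay, {"a": 1}))
```

Because the key ignores dictionary ordering, a redelivered event with the same content maps to the same ledger entry and is skipped instead of being sent twice.
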
### Acceptance Criteria

- Rules evaluate deterministically per event; deliveries idempotent with audit trail and DSSE signatures
- Channel connectors support retries, rate limits, health checks, previews; secrets referenced securely
- Console/CLI support rule creation, testing, digests, delivery browsing, export/import workflows
- Observability dashboards track delivery health; alerts fire for sustained failures or backlog; runbooks cover remediation
- Offline Kit bundle contains configs, rules, digests, deployment scripts for air-gapped installs
- Notify respects tenancy and RBAC; governance (approvals, change log) enforced for high-impact rules

### Key Risks & Mitigations

- **Notification storms:** Throttling, digests, dedupe windows, preview/test gating
- **Secret compromise:** Secret references only, rotation workflows, audit logging
- **Connector API changes:** Versioned adapter layer, nightly health checks, fallback channels
- **Noise vs signal:** Simulation previews, metrics, rule scoring, recommended defaults
- **Offline parity:** Export/import of rules, connectors, digests with signed manifests

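
As an illustration of how digests and dedupe windows blunt a notification storm, the sketch below coalesces events into fixed time windows per tenant and rule, so a burst becomes one summary entry with a count. It is a simplified assumption: `coalesce` is a hypothetical helper, timestamps are integer epoch seconds, and the actual window and coalescing semantics are documented in `digests.md`.

```python
from collections import defaultdict


def coalesce(events, window_seconds):
    """Group (timestamp, tenant, rule_id) tuples into fixed windows with counts."""
    buckets = defaultdict(int)
    for ts, tenant, rule_id in events:
        # Align each timestamp to the start of its window.
        window_start = ts - (ts % window_seconds)
        buckets[(tenant, rule_id, window_start)] += 1
    # Sort keys so the digest output is stable across runs.
    return [
        {"tenant": t, "rule_id": r, "window_start": w, "count": c}
        for (t, r, w), c in sorted(buckets.items())
    ]


storm = [
    (0, "tenant-a", "r1"),
    (10, "tenant-a", "r1"),
    (3600, "tenant-a", "r1"),
    (5, "tenant-b", "r1"),
]
for entry in coalesce(storm, window_seconds=3600):
    print(entry)
```
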
### Current Phase Progress

- Phase 1: Core rules engine mostly complete; template dispatch/rendering in progress
- Phase 2: Connector and rendering work not yet started; depends on Phase 1 completion
- Phase 3: Console/CLI authoring work not started; depends on Phase 2 completion
- Phase 4: Core observability complete; governance and risk notifications blocked on upstream dependencies
- Phase 5: Offline basics complete; tenancy work blocked on upstream Sprint 0172