StellaOps Notify
Notify (Notifications Studio) converts platform events into tenant-scoped alerts with deterministic delivery, offline parity, and a full audit trail. The service is split between the reusable tooling in src/Notify/* and the runtime host in src/Notifier/* (decision recorded 2025-11-02) so downstream systems can embed the rules engine without inheriting the Studio UI.
Latest updates (2025-11-30)
- Sprint tracker docs/implplan/SPRINT_322_docs_modules_notify.md and module TASKS.md added to mirror status.
- Observability runbook stub and Grafana placeholder added under operations/ (offline import); finalize after the next demo.
- NOTIFY-DOCS-0002 remains blocked pending NOTIFY-SVC-39-001..004 outputs (correlation/digests/simulation/quiet hours).
Scope & responsibilities
- Apply tenant-scoped rules to events from Scanner, Scheduler, VEX Lens, Attestor, Task Runner, and Zastava.
- Render channel-specific payloads (Slack, Teams, Email, webhook) using deterministic templates with localisation safeguards.
- Enforce throttling, digests, and quiet-hour calendars so bursts stay explainable and recoverable.
- Persist deliveries, attempts, throttles, and DSSE hashes for CLI/UI investigation and compliance export.
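The first responsibility, tenant-scoped rule application, can be illustrated with a minimal sketch. It assumes a hypothetical Event/Rule shape (tenant, event kind, minimum severity) that is not the actual StellaOps.Notify schema; the real matcher syntax lives in rules.md. The point is the two properties the list above calls out: rules owned by another tenant never match, and the result order does not depend on how rules happen to be stored.

```python
from dataclasses import dataclass

# Hypothetical severity ranking used only for this sketch.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass(frozen=True)
class Event:
    tenant: str
    kind: str       # hypothetical event kind, e.g. "scanner.report.ready"
    severity: str


@dataclass(frozen=True)
class Rule:
    rule_id: str
    tenant: str
    kinds: frozenset
    min_severity: str
    channel: str


def matching_rules(event: Event, rules: list) -> list:
    """Return the rules that match this event, restricted to the event's tenant."""
    hits = [
        r for r in rules
        if r.tenant == event.tenant                      # tenant isolation first
        and event.kind in r.kinds
        and SEVERITY_ORDER[event.severity] >= SEVERITY_ORDER[r.min_severity]
    ]
    # Sorting by rule_id keeps evaluation order independent of storage order.
    return sorted(hits, key=lambda r: r.rule_id)


rules = [
    Rule("r2", "acme", frozenset({"scanner.report.ready"}), "high", "slack"),
    Rule("r1", "acme", frozenset({"scanner.report.ready"}), "medium", "email"),
    Rule("r3", "other", frozenset({"scanner.report.ready"}), "low", "teams"),
]
event = Event("acme", "scanner.report.ready", "high")
print([r.rule_id for r in matching_rules(event, rules)])  # ['r1', 'r2']
```

Sorting on a stable identifier is one simple way to get deterministic evaluation; the actual evaluation order is defined in rules.md.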
Current capabilities (Sprint 38 foundations)
- Rules + channels API: StellaOps.Notify.WebService exposes CRUD, previews, and health probes secured by Authority scopes.
- Worker pipeline: StellaOps.Notify.Worker ingests bus events, evaluates match predicates, applies per-tenant throttles, and dispatches deliveries.
- Connector plug-ins: Restart-time plug-ins under StellaOps.Notify.Connectors.* (Slack, Teams, Email, generic webhook) with health checks and retry policy hints declared in notify-plugin.json.
- Template engine: Deterministic rendering with safe helpers, locale bundles, and redaction defaults that keep Offline Kit parity.
- Delivery ledger: PostgreSQL-backed ledger storing hashed payloads, attempts, throttled/digested markers, and provenance links for audit + exports.
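To make "hashed payloads" and idempotent deliveries concrete, here is a minimal in-memory sketch. It assumes SHA-256 over canonical JSON (sorted keys, no insignificant whitespace) and an idempotency key derived from tenant, rule, event, and channel; both choices, and the DeliveryLedger class itself, are hypothetical stand-ins for the PostgreSQL-backed ledger, not its actual schema.

```python
import hashlib
import json


def canonical_json(payload: dict) -> bytes:
    # Sorted keys and fixed separators give identical bytes for equal payloads.
    return json.dumps(payload, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def payload_hash(payload: dict) -> str:
    return hashlib.sha256(canonical_json(payload)).hexdigest()


def idempotency_key(tenant: str, rule_id: str, event_id: str, channel: str) -> str:
    # A unit-separator character avoids ambiguity such as ("ab", "c") vs ("a", "bc").
    raw = "\x1f".join([tenant, rule_id, event_id, channel])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class DeliveryLedger:
    """In-memory stand-in for the PostgreSQL-backed ledger (hypothetical shape)."""

    def __init__(self) -> None:
        self.rows: dict = {}

    def record(self, tenant: str, rule_id: str, event_id: str,
               channel: str, payload: dict) -> bool:
        """Return True for a new delivery, False if the same key was already recorded."""
        key = idempotency_key(tenant, rule_id, event_id, channel)
        if key in self.rows:
            return False
        self.rows[key] = {"tenant": tenant, "payload_sha256": payload_hash(payload)}
        return True
```

A redelivered bus event with the same identifiers is then detected as a duplicate instead of producing a second notification. DSSE signing of ledger entries is out of scope for this sketch.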
In progress / upcoming (Sprint 39 focus)
- NOTIFY-SVC-39-001: correlation engine with token-bucket throttles, incident lifecycle, and quiet-hours evaluator.
- NOTIFY-SVC-39-002: digest generator with schedule runner, ledger queries, and distribution across existing channels.
- NOTIFY-SVC-39-003: simulation API for rule dry-runs against historical events.
- NOTIFY-SVC-39-004: quiet-hour calendar integration and default throttles with audit logging.

Status for these items is tracked in src/Notifier/StellaOps.Notifier/TASKS.md and sprint plans; update this README once the tasks merge.
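NOTIFY-SVC-39-001 names two mechanisms, a token-bucket throttle and a quiet-hours evaluator. The sketch below is a generic illustration of both, not the Notifier implementation: the TokenBucket class, its parameters, and in_quiet_hours are hypothetical. The clock is passed in explicitly rather than read from the system so that replays and tests stay deterministic.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class TokenBucket:
    capacity: float        # maximum burst size
    refill_per_sec: float  # sustained rate
    tokens: float          # tokens currently available
    last: float            # timestamp (seconds) of the last refill

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # throttled: caller may digest or defer the delivery


def in_quiet_hours(local: time, start: time, end: time) -> bool:
    """True if local falls in [start, end), including windows that cross midnight."""
    if start <= end:
        return start <= local < end
    return local >= start or local < end
```

For example, a bucket with capacity 2 and a refill rate of 0.5 tokens/s admits two deliveries at t=0, throttles the third, and admits another at t=2. A 22:00 to 07:00 quiet window covers 23:30 but not 07:00.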
Key docs & release alignment
- overview.md: summary of capabilities, imposed rules, and customer journey.
- architecture.md / architecture-detail.md: Notifications Studio runtime view.
- rules.md: declarative matcher syntax and evaluation order.
- digests.md: digest windows, coalescing logic, and delivery samples.
- templates.md: template helpers, localisation, and redaction guidelines.
- docs/implplan/archived/updates/2025-10-29-notify-docs.md: latest release note; follow-ups remain to validate connector metadata, quiet-hours semantics, and simulation payloads once Sprint 39 drops land.
Integrations & dependencies
- Storage: PostgreSQL (schema notify) for rules, channels, deliveries, digests, and throttles; Valkey for worker coordination.
- Queues: Valkey Streams or NATS JetStream for ingestion, throttling, and DLQs (notify.dlq).
- Authority: OpTok-protected APIs, DPoP-backed CLI/UI scopes (notify.viewer, notify.operator, notify.admin), and secret references for channel credentials.
- Observability: Prometheus metrics (notify.sent_total, notify.failed_total, notify.digest_coalesced_total, etc.), OTEL traces, and dashboards documented in architecture-detail.md.
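The queue layer routes deliveries that keep failing to notify.dlq. A minimal sketch of that retry-then-park pattern follows, with an in-memory deque standing in for the real Valkey Streams or NATS JetStream stream; no client library is used, and TransientError and deliver_with_retry are hypothetical names. Backoff between attempts is deliberately left out; a production worker would use the retry policy hints the connector declares.

```python
from collections import deque
from typing import Callable


class TransientError(Exception):
    """Hypothetical error type a connector raises for retryable failures."""


def deliver_with_retry(send: Callable[[dict], None], message: dict,
                       max_attempts: int, dlq: deque) -> int:
    """Call send() up to max_attempts times; park the message on the DLQ if all fail.

    Returns the number of attempts made. Non-transient errors propagate to the caller.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            send(message)
            return attempt
        except TransientError as exc:
            last_error = str(exc)
    # Every attempt failed: keep enough context for later inspection or replay.
    dlq.append({"stream": "notify.dlq", "message": message,
                "attempts": max_attempts, "error": last_error})
    return max_attempts
```

Recording the attempt count and last error alongside the parked message is what lets a sustained-failure alert (for example on notify.failed_total) be traced back to specific deliveries.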
Operational notes
- Schema fixtures live in ./resources/schemas; event and delivery samples live in ./resources/samples for contract tests and UI mocks.
- Offline Kit bundles ship plug-ins, default templates, and seed rules; update manifests under ops/offline-kit/ when connectors change.
- Dashboards and alert references depend on DEVOPS-NOTIFY-39-002; coordinate before renaming metrics or labels.
- Observability assets: operations/observability.md and operations/dashboards/notify-observability.json (offline import).
- When releasing new rule or connector features, update guidance in this directory and related checklists until the follow-ups are closed.
Epic alignment
- Epic 11 – Notifications Studio: notifications workspace, preview tooling, immutable delivery ledger, throttling/digest controls, and forthcoming correlation/simulation features.
Implementation Status
Delivery Phases
- Phase 1 – Core rules engine & delivery ledger: Implement rules/channels schema, event ingestion, rule evaluation, idempotent deliveries, audit logging
- Phase 2 – Connectors & rendering: Ship Slack/Teams/Email/Webhook connectors, template rendering, localization, throttling, retries, secret referencing
- Phase 3 – Console & CLI authoring: Provide UI/CLI for rule authoring, previews, channel health, delivery browsing, digests, test sends
- Phase 4 – Governance & observability: Add approvals, RBAC, tenant quotas, metrics/logs/traces, dashboards, alerts, runbooks
- Phase 5 – Offline & compliance: Produce Offline Kit bundles (rules/channels/deploy scripts), signed exports, retention policies, auditing
Acceptance Criteria
- Rules evaluate deterministically per event; deliveries are idempotent, with an audit trail and DSSE signatures
- Channel connectors support retries, rate limits, health checks, previews; secrets referenced securely
- Console/CLI support rule creation, testing, digests, delivery browsing, export/import workflows
- Observability dashboards track delivery health; alerts fire for sustained failures or backlog; runbooks cover remediation
- Offline Kit bundle contains configs, rules, digests, deployment scripts for air-gapped installs
- Notify respects tenancy and RBAC; governance (approvals, change log) enforced for high-impact rules
Key Risks & Mitigations
- Notification storms: Throttling, digests, dedupe windows, preview/test gating
- Secret compromise: Secret references only, rotation workflows, audit logging
- Connector API changes: Versioned adapter layer, nightly health checks, fallback channels
- Noise vs signal: Simulation previews, metrics, rule scoring, recommended defaults
- Offline parity: Export/import of rules, connectors, digests with signed manifests
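Dedupe windows and digests are the main defence against notification storms. The sketch below shows one simple form of coalescing using fixed (tumbling) windows keyed by tenant and a dedupe key; the event fields tenant, key, and ts and the coalesce function are assumptions for illustration, and the actual windowing rules are described in digests.md.

```python
from collections import defaultdict


def coalesce(events: list, window_sec: int) -> list:
    """Collapse events sharing (tenant, key) within a fixed window into digest entries."""
    buckets = defaultdict(list)
    for ev in events:
        window = ev["ts"] // window_sec  # integer index of the tumbling window
        buckets[(ev["tenant"], ev["key"], window)].append(ev)
    digests = []
    # Sort by bucket key so the digest order is deterministic.
    for (tenant, key, window), group in sorted(buckets.items(), key=lambda kv: kv[0]):
        digests.append({
            "tenant": tenant,
            "key": key,
            "window_start": window * window_sec,
            "count": len(group),
        })
    return digests
```

With a 60-second window, two events for the same tenant and key at t=10 and t=50 become one digest entry with count 2, while an event at t=70 opens a new window. Sliding windows avoid the boundary effect at the cost of more state.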
Current Phase Progress
- Phase 1: Core rules engine mostly complete; template dispatch/rendering in progress
- Phase 2: Connector and rendering work not yet started; depends on Phase 1 completion
- Phase 3: Console/CLI authoring work not started; depends on Phase 2 completion
- Phase 4: Core observability complete; governance and risk notifications blocked on upstream dependencies
- Phase 5: Offline basics complete; tenancy work blocked on upstream Sprint 0172