# Implementation plan — Notify

## Delivery phases

- **Phase 1 – Core rules engine & delivery ledger**
  Implement rules/channels schema, event ingestion, rule evaluation, idempotent deliveries, and audit logging.
- **Phase 2 – Connectors & rendering**
  Ship Slack/Teams/Email/Webhook connectors, template rendering, localization, throttling, retries, and secret referencing.
- **Phase 3 – Console & CLI authoring**
  Provide UI/CLI for rule authoring, previews, channel health, delivery browsing, digests, and test sends.
- **Phase 4 – Governance & observability**
  Add approvals, RBAC, tenant quotas, Notify metrics/logs/traces, dashboards, Notify-specific alerts, and Notify runbooks.
- **Phase 5 – Offline & compliance**
  Produce Offline Kit bundles (rules/channels/deploy scripts), signed exports, retention policies, and auditing for regulated environments.

## Work breakdown

- **Service & worker**
  - REST API for rules/channels/delivery history, idempotency middleware, and digest scheduler.
  - Worker pipelines for event intake, rule matching, template rendering, delivery execution, retries, and throttling.
  - Delivery ledger capturing payload metadata, response, retry state, and DSSE signatures.
- **Connectors**
  - Slack/Teams/Email/Webhook plug-ins with configuration validation, rate limiting, and error classification.
  - Secrets referenced via Authority/Secret store; no plaintext storage.
- **Console & CLI**
  - Console module for rules builder, condition editor, preview, test send, delivery insights, and digest and schedule configuration.
  - CLI (`stella notify rule|channel|delivery`) for automation and export/import.
- **Integrations**
  - Event sources: Concelier, Excititor, Policy Engine, Vuln Explorer, Export Center, Attestor, Zastava, Scheduler.
  - Notify's own events fed back into Notify (meta) for failure escalations and accepted-risk expiration reminders.
- **Observability & ops**
  - Metrics: delivery success/failure, retry counts, throttle hits, digest generation, channel health.
  - Logs/traces tagged with tenant, rule ID, channel, and correlation ID; dashboards and alerts.
  - Runbooks for misconfigured channels, throttling, event backlog, and incident digests.
- **Docs & compliance**
  - Update Notifications Studio guides, channel runbooks, security/RBAC docs, and Offline Kit instructions.
  - Provide a compliance checklist (audit logging, retention, opt-out).

## Acceptance criteria

- Rules evaluate deterministically per event; deliveries are idempotent, with an audit trail and DSSE signatures.
- Channel connectors support retries, rate limits, health checks, and previews; secrets are referenced securely.
- Console/CLI support rule creation, testing, digests, delivery browsing, and export/import workflows.
- Observability dashboards track delivery health; alerts fire for sustained failures or backlog; runbooks cover remediation.
- The Offline Kit bundle contains configs, rules, digests, and deployment scripts for air-gapped installs.
- Notify respects tenancy and RBAC; governance (approvals, change log) is enforced for high-impact rules.

## Risks & mitigations

- **Notification storms:** throttling, digests, dedupe windows, preview/test gating.
- **Secret compromise:** secret references only, rotation workflows, audit logging.
- **Connector API changes:** versioned adapter layer, nightly health checks, fallback channels.
- **Noise vs signal:** simulation previews, metrics, rule scoring, recommended defaults.
- **Offline parity:** export/import of rules, connectors, and digests with signed manifests.

## Test strategy

- **Unit:** rule evaluation, template rendering, connector clients, throttling, digests.
- **Integration:** end-to-end events from core services, multi-channel routing, retries, audit logging.
- **Performance:** burst throttling, digest creation, large rule sets.
- **Security:** RBAC tests, tenant isolation, secret reference validation, DSSE signature verification.
- **Offline:** export/import round-trips, Offline Kit deployment, manual delivery replay.
## Definition of done

- Notify service, workers, connectors, Console/CLI, observability, and Offline Kit assets shipped with documentation and runbooks.
- Compliance checklist appended to docs; ./TASKS.md and ../../TASKS.md updated with progress.

---

## Sprint readiness tracker

> Last updated: 2025-11-27 (NOTIFY-ENG-0001)

This section maps delivery phases to implementation sprints and tracks readiness checkpoints.

### Phase 1 — Core rules engine & delivery ledger

| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| NOTIFY-SVC-37-001 | ✅ DONE (2025-11-24) | SPRINT_0172_0001_0002_notifier_ii | Pack approval contract published (OpenAPI schema, payloads). |
| NOTIFY-SVC-37-002 | ✅ DONE (2025-11-24) | SPRINT_0172_0001_0002_notifier_ii | Ingestion endpoint with Mongo persistence, idempotent writes, audit trail. |
| NOTIFY-SVC-37-003 | 🔄 DOING | SPRINT_0172_0001_0002_notifier_ii | Approval/policy templates, routing predicates; dispatch/rendering pending. |
| NOTIFY-SVC-37-004 | ✅ DONE (2025-11-24) | SPRINT_0172_0001_0002_notifier_ii | Acknowledgement API, test harness, metrics. |
| NOTIFY-OAS-61-001 | ✅ DONE (2025-11-17) | SPRINT_0171_0001_0001_notifier_i | OAS with rules/templates/incidents/quiet hours endpoints. |
| NOTIFY-OAS-61-002 | ✅ DONE (2025-11-17) | SPRINT_0171_0001_0001_notifier_i | `/.well-known/openapi` discovery endpoint. |
| NOTIFY-OAS-62-001 | ✅ DONE (2025-11-17) | SPRINT_0171_0001_0001_notifier_i | SDK examples for rule CRUD. |
| NOTIFY-OAS-63-001 | ✅ DONE (2025-11-17) | SPRINT_0171_0001_0001_notifier_i | Deprecation headers and templates. |

**Checkpoint:** Core rules engine mostly complete; template dispatch/rendering in progress.

### Phase 2 — Connectors & rendering

| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| NOTIFY-SVC-38-002 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Channel adapters (email, chat webhook, generic webhook) with retry policies. |
| NOTIFY-SVC-38-003 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Template service, renderer with redaction and localization. |
| NOTIFY-SVC-38-004 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | REST + WS APIs for rules CRUD, templates preview, incidents. |
| NOTIFY-DOC-70-001 | ✅ DONE (2025-11-02) | SPRINT_0171_0001_0001_notifier_i | Architecture docs for `src/Notify` vs `src/Notifier` split. |

**Checkpoint:** Connector and rendering work not yet started; depends on Phase 1 completion.

### Phase 3 — Console & CLI authoring

| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| NOTIFY-SVC-39-001 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Correlation engine with throttler, quiet hours, incident lifecycle. |
| NOTIFY-SVC-39-002 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Digest generator with schedule runner. |
| NOTIFY-SVC-39-003 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Simulation engine for dry-run rules against historical events. |
| NOTIFY-SVC-39-004 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Quiet hour calendars with audit logging. |

**Checkpoint:** Console/CLI authoring work not started; depends on Phase 2 completion.

### Phase 4 — Governance & observability

| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| NOTIFY-SVC-40-001 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Escalations, on-call schedules, PagerDuty/OpsGenie adapters. |
| NOTIFY-SVC-40-002 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Summary storm breaker, localization bundles. |
| NOTIFY-SVC-40-003 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Security hardening (signed ack links, webhook HMAC). |
| NOTIFY-SVC-40-004 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Observability metrics/traces, dead-letter handling, chaos tests. |
| NOTIFY-OBS-51-001 | ✅ DONE (2025-11-22) | SPRINT_0171_0001_0001_notifier_i | SLO evaluator webhooks with templates/routing/suppression. |
| NOTIFY-OBS-55-001 | ✅ DONE (2025-11-22) | SPRINT_0171_0001_0001_notifier_i | Incident mode templates with evidence/trace/retention context. |
| NOTIFY-ATTEST-74-001 | ✅ DONE (2025-11-16) | SPRINT_0171_0001_0001_notifier_i | Templates for verification failures, key revocations, transparency. |
| NOTIFY-ATTEST-74-002 | 📝 TODO | SPRINT_0171_0001_0001_notifier_i | Wire notifications to key rotation/revocation events. |
| NOTIFY-RISK-66-001 | ⏳ BLOCKED | SPRINT_0171_0001_0001_notifier_i | Risk severity escalation triggers; needs POLICY-RISK-40-002. |
| NOTIFY-RISK-67-001 | ⏳ BLOCKED | SPRINT_0171_0001_0001_notifier_i | Risk profile publish/deprecate notifications. |
| NOTIFY-RISK-68-001 | ⏳ BLOCKED | SPRINT_0171_0001_0001_notifier_i | Per-profile routing, quiet hours, dedupe. |

**Checkpoint:** SLO, incident-mode, and attestation templates are complete; NOTIFY-SVC-40-001 through NOTIFY-SVC-40-004 (including service observability) are not started; risk notifications are blocked on upstream dependencies.

### Phase 5 — Offline & compliance

| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| NOTIFY-AIRGAP-56-002 | ✅ DONE | SPRINT_0171_0001_0001_notifier_i | Bootstrap Pack with deterministic secrets and offline validation. |
| NOTIFY-TEN-48-001 | ⏳ BLOCKED | SPRINT_0173_0001_0003_notifier_iii | Tenant-scope rules/templates; needs Sprint 0172 tenancy model. |

**Checkpoint:** Offline basics complete; tenancy work blocked on upstream Sprint 0172.
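Phase 5 calls for signed exports, the offline-parity mitigation relies on signed manifests, and the delivery ledger records DSSE signatures. The sketch below shows what such a signature could cover, assuming a DSSE envelope over a JSON manifest of per-file SHA-256 digests. The manifest layout, the `PAYLOAD_TYPE` string, and the function names are hypothetical; only the pre-authentication encoding (PAE) follows the DSSE v1 specification. Key handling and the signing call itself are out of scope.

```python
import hashlib
import json
from typing import Dict

# Hypothetical payload type; this plan does not specify the Offline Kit media type.
PAYLOAD_TYPE = "application/vnd.example.notify-offline-manifest+json"


def build_manifest(files: Dict[str, bytes]) -> bytes:
    """Deterministic manifest: entries sorted by path, serialized as compact canonical JSON."""
    entries = [
        {"path": path, "sha256": hashlib.sha256(content).hexdigest()}
        for path, content in sorted(files.items())
    ]
    return json.dumps({"files": entries}, sort_keys=True, separators=(",", ":")).encode("utf-8")


def dsse_pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 pre-authentication encoding: the exact bytes a DSSE signer signs.

    PAE = "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body,
    where LEN is the byte length written as an ASCII decimal integer.
    """
    type_bytes = payload_type.encode("utf-8")
    return b" ".join([
        b"DSSEv1",
        str(len(type_bytes)).encode("ascii"),
        type_bytes,
        str(len(payload)).encode("ascii"),
        payload,
    ])
```

Sorting entries by path and serializing with fixed separators makes the manifest byte-identical across runs, so the same bundle always produces the same PAE bytes, and a verifier on an air-gapped host can recompute them from the bundle contents alone.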
---

### Overall readiness summary

| Phase | Status | Blocking items |
|-------|--------|----------------|
| **1 – Core rules engine** | 🔄 In progress | NOTIFY-SVC-37-003 dispatch/rendering |
| **2 – Connectors & rendering** | 📝 Not started | Phase 1 completion |
| **3 – Console & CLI** | 📝 Not started | Phase 2 completion |
| **4 – Governance & observability** | 🔄 Partial | POLICY-RISK-40-002 for risk notifications |
| **5 – Offline & compliance** | 🔄 Partial | Sprint 0172 tenancy model |

### Cross-module dependencies

| Dependency | Required by | Status |
|------------|-------------|--------|
| Attestor payload localization | NOTIFY-ATTEST-74-002 | Freeze pending |
| POLICY-RISK-40-002 export | NOTIFY-RISK-66/67/68 | BLOCKED |
| Sprint 0172 tenancy model | NOTIFY-TEN-48-001 | In progress |
| Telemetry SLO webhook schema | NOTIFY-OBS-51-001 | ✅ Published (`docs/notifications/slo-webhook-schema.md`) |

### Next actions

1. Complete NOTIFY-SVC-37-003 dispatch/rendering wiring (Sprint 0172).
2. Start NOTIFY-SVC-38-002 channel adapters once Phase 1 closes.
3. Track POLICY-RISK-40-002 to unblock risk notification tasks.
4. Monitor the Sprint 0172 tenancy model for NOTIFY-TEN-48-001.