# Notifications Architecture

> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

This dossier distils the Notify architecture into implementation-ready guidance for service owners, SREs, and integrators. It complements the high-level overview by detailing process boundaries, persistence models, and extensibility points.

---

## 1. Runtime shape

```
            ┌──────────────────┐
            │ Authority (OpTok)│
            └───────┬──────────┘
                    │
            ┌───────▼──────────┐        ┌───────────────┐
            │ Notify.WebService│◀──────▶│    MongoDB    │
  Tenant API│ REST + gRPC WIP  │        │ rules/channels│
            └───────▲──────────┘        │ deliveries    │
                    │                   │ digests       │
     Internal bus   │                   └───────────────┘
   (NATS/Redis/etc) │
                    │
          ┌─────────▼─────────┐      ┌────────────────┐
          │   Notify.Worker   │◀────▶│ Redis / Cache  │
          │ rule eval + render│      │ throttles/locks│
          └─────────▲─────────┘      └───────▲────────┘
                    │                        │
                    │                        │
             ┌──────┴──────┐       ┌─────────┴────────┐
             │ Connectors  │──────▶│ Slack/Teams/...  │
             │ (plug-ins)  │       │ External targets │
             └─────────────┘       └──────────────────┘
```

- **WebService** hosts REST endpoints (`/channels`, `/rules`, `/templates`, `/deliveries`, `/digests`, `/stats`) and handles schema normalisation, validation, and Authority enforcement.
- **Worker** subscribes to the platform event bus, evaluates rules per tenant, applies throttles/digests, renders payloads, writes ledger entries, and invokes connectors.
- **Plug-ins** live under `plugins/notify/` and are loaded at service start in the order given by the `orderedPlugins` list, which keeps loading deterministic. Each implements the connector contracts and, optionally, health and test-preview providers.
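
As a rough illustration of deterministic loading, the sketch below orders discovered plug-in directories by an `orderedPlugins`-style list. The function name, the sample plug-in names, and the choice to append unlisted plug-ins alphabetically are assumptions made for the example, not documented Notify behaviour.

```python
from typing import Iterable, List


def resolve_load_order(ordered_plugins: Iterable[str], discovered: Iterable[str]) -> List[str]:
    """Return plug-in names in a deterministic load order.

    Names listed in ordered_plugins come first, in the listed order.
    Discovered plug-ins missing from the list follow alphabetically
    (an assumption; a real loader might reject or skip them instead).
    """
    available = set(discovered)
    listed: List[str] = []
    for name in ordered_plugins:
        # Skip entries that are not on disk and ignore duplicate entries.
        if name in available and name not in listed:
            listed.append(name)
    unlisted = sorted(available - set(listed))
    return listed + unlisted


# Hypothetical plug-in names, in whatever order the filesystem returned them.
found_on_disk = ["teams", "webhook", "slack", "email"]
load_order = resolve_load_order(["slack", "teams", "email"], found_on_disk)
print(load_order)
```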

Both services share options via `notify.yaml` (see `etc/notify.yaml.sample`). An in-memory repository exists for dev/test scenarios, but production requires MongoDB plus Redis/NATS for durability and coordination.

---

## 2. Event ingestion and rule evaluation

1. **Subscription.** Workers attach to the internal bus (Redis Streams or NATS JetStream). The partition key is `tenantId|scope.digest|event.kind`, which preserves event order for a given artefact.
2. **Normalisation.** Incoming events are hydrated into `NotifyEvent` envelopes. Payload JSON is normalised (sorted object keys) to preserve determinism and enable hashing.
3. **Rule snapshot.** Per-tenant rule sets are cached in memory. Change streams from Mongo trigger snapshot refreshes without a restart.
4. **Match pipeline.**
   - Tenant check (`rule.tenantId` vs. event tenant).
   - Kind/namespace/repository/digest filters.
   - Severity and KEV gating based on event deltas.
   - VEX gating using `NotifyRuleMatchVex`.
   - Action iteration with throttle/digest decisions.
5. **Idempotency.** Each action computes `hash(ruleId|actionId|event.kind|scope.digest|delta.hash|dayBucket)`. If the same hash is seen again within the throttle TTL, the delivery is recorded with `status=Throttled` and processing of that action stops.
6. **Dispatch.** If digest is `instant`, the renderer processes the action immediately. Otherwise the event is appended to the digest window for a later flush.
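
Steps 2 and 5 can be sketched as follows. This is a minimal illustration, assuming SHA-256 as the hash, compact sorted-key JSON as the normalised form, and a UTC `YYYY-MM-DD` string as `dayBucket`; the document specifies none of these. The function names, the sample event values, and the in-memory throttle store are hypothetical stand-ins for the Redis or Mongo store with a TTL that production would use.

```python
import hashlib
import json
import time
from datetime import datetime, timezone
from typing import Dict, Optional


def canonical_json(payload: dict) -> str:
    """Serialise with sorted keys and fixed separators so equal payloads hash equally."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def idempotency_key(rule_id: str, action_id: str, event_kind: str,
                    scope_digest: str, delta_hash: str, when: datetime) -> str:
    """Join the step-5 fields with '|' and hash them (SHA-256 is an assumption)."""
    day_bucket = when.astimezone(timezone.utc).strftime("%Y-%m-%d")  # assumed bucket format
    raw = "|".join([rule_id, action_id, event_kind, scope_digest, delta_hash, day_bucket])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class InMemoryThrottle:
    """Toy stand-in for the throttle store: maps a key to its expiry timestamp."""

    def __init__(self) -> None:
        self._expiry: Dict[str, float] = {}

    def try_acquire(self, key: str, ttl_seconds: float, now: Optional[float] = None) -> bool:
        """Return True for the first hit within the TTL, False for repeats (throttled)."""
        now = time.time() if now is None else now
        if self._expiry.get(key, 0.0) > now:
            return False
        self._expiry[key] = now + ttl_seconds
        return True


# Hypothetical event values, for illustration only.
delta = {"severity": "critical", "kev": True}
delta_hash = hashlib.sha256(canonical_json(delta).encode("utf-8")).hexdigest()
key = idempotency_key("rule-1", "action-1", "example.event", "sha256:abc", delta_hash,
                      datetime(2025, 1, 2, 23, 30, tzinfo=timezone.utc))

throttle = InMemoryThrottle()
first = throttle.try_acquire("idem:" + key, ttl_seconds=300, now=1000.0)   # new key: dispatch
repeat = throttle.try_acquire("idem:" + key, ttl_seconds=300, now=1100.0)  # inside TTL: throttled
print(first, repeat)
```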

Failures during evaluation are logged with correlation IDs and surfaced through `/stats` and worker metrics (`notify_rule_eval_failures_total`, `notify_digest_flush_errors_total`).

---

## 3. Rendering & connectors

- **Template resolution.** The renderer picks the template in this order: action template → channel default template → locale fallback → built-in minimal template. Locale negotiation normalises tags to lower case (`en-US` becomes `en-us`).
- **Helpers & partials.** Exposed helpers mirror the list in [`notifications/templates.md`](templates.md#3-variables-helpers-and-context). Plug-ins may register additional helpers but must remain deterministic and side-effect free.
- **Rendering output.** `NotifyDeliveryRendered` captures:
  - `channelType`, `format`, `locale`
  - `title`, `body`, optional `summary`, `textBody`
  - `target` (redacted where necessary)
  - `attachments[]` (safe URLs or references)
  - `bodyHash` (lowercase SHA-256) for audit parity
- **Connector contract.** Connectors implement `INotifyConnector` (send + health) and can implement `INotifyChannelTestProvider` for `/channels/{id}/test`. All plug-ins are single-tenant aware; secrets are pulled via references at send time and never persisted in Mongo.
- **Retries.** Workers track delivery attempts and retry with exponential backoff plus jitter. On permanent failure, deliveries are marked `Failed` with a `statusReason`. Optional DLQ fan-out is slated for Sprint 40.
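
The template fallback order and the `bodyHash` field can be illustrated with a short sketch. It assumes templates are held in a dictionary keyed by hypothetical `(key, locale)` pairs, that "locale fallback" means retrying the same keys in a default locale, and that `bodyHash` is the SHA-256 of the UTF-8 body. The Python `str.format` placeholder stands in for the real template syntax, and the actual renderer's lookup model may differ.

```python
import hashlib
from typing import Dict, Optional, Tuple

BUILT_IN_MINIMAL = "{title}"  # placeholder for the built-in minimal template

TemplateStore = Dict[Tuple[str, str], str]


def normalise_locale(tag: str) -> str:
    """Lower-case a locale tag, e.g. 'en-US' -> 'en-us'."""
    return tag.strip().lower()


def resolve_template(templates: TemplateStore, action_key: Optional[str],
                     channel_default_key: Optional[str], locale: str,
                     fallback_locale: str = "en-us") -> Tuple[str, str]:
    """Try the action template, the channel default, both again in the fallback locale, then built-in."""
    wanted = normalise_locale(locale)
    candidates = [
        ("action", (action_key, wanted)),
        ("channel-default", (channel_default_key, wanted)),
        ("locale-fallback", (action_key, fallback_locale)),
        ("locale-fallback", (channel_default_key, fallback_locale)),
    ]
    for source, lookup in candidates:
        if lookup[0] is not None and lookup in templates:
            return source, templates[lookup]
    return "built-in", BUILT_IN_MINIMAL


def body_hash(body: str) -> str:
    """Lowercase hex SHA-256 of the rendered body (UTF-8 encoding is an assumption)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


# Hypothetical template store with a single channel default in en-us.
store: TemplateStore = {("slack.default", "en-us"): "New finding: {title}"}

source, template = resolve_template(store, "critical-alert", "slack.default", "de-DE")
rendered = template.format(title="example finding")
print(source, rendered, body_hash(rendered))
```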

---

## 4. Persistence model

| Collection | Purpose | Key fields & indexes |
|------------|---------|----------------------|
| `rules` | Tenant rule definitions. | `_id`, `tenantId`, `enabled`; index on `{tenantId, enabled}`. |
| `channels` | Channel metadata + config references. | `_id`, `tenantId`, `type`; index on `{tenantId, type}`. |
| `templates` | Locale-specific render bodies. | `_id`, `tenantId`, `channelType`, `key`; index on `{tenantId, channelType, key}`. |
| `deliveries` | Ledger of rendered notifications. | `_id`, `tenantId`, `sentAt`; compound index on `{tenantId, sentAt:-1}` for history queries. |
| `digests` | Open digest windows per action. | `_id` (`tenantId:actionKey:window`), `status`; index on `{tenantId, actionKey}`. |
| `throttles` | Short-lived throttle tokens (Mongo or Redis). | Key format `idem:<hash>` with TTL aligned to throttle duration. |
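
The two composite key formats in the table can be made concrete with a small sketch. Only the `tenantId:actionKey:window` and `idem:<hash>` shapes come from the table; the helper names, the sample values, the window label, and the assumption that tenant IDs and window labels never contain `:` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DigestId:
    """Composite `_id` for a digest window: tenantId:actionKey:window."""

    tenant_id: str
    action_key: str
    window: str

    def __str__(self) -> str:
        return f"{self.tenant_id}:{self.action_key}:{self.window}"

    @classmethod
    def parse(cls, value: str) -> "DigestId":
        # Split from both ends so an action key that itself contains ':' survives.
        # Assumes tenant_id and window contain no ':' characters.
        tenant_id, rest = value.split(":", 1)
        action_key, window = rest.rsplit(":", 1)
        return cls(tenant_id, action_key, window)


def throttle_key(idempotency_hash: str) -> str:
    """Build the throttle token key in the idem:<hash> format."""
    return f"idem:{idempotency_hash}"


# Hypothetical values for illustration.
digest_id = DigestId("tenant-a", "rule-1:action-1", "1h")
print(str(digest_id))
print(DigestId.parse(str(digest_id)) == digest_id)
print(throttle_key("ab12cd"))
```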

Documents are stored using the canonical JSON serializer (`NotifyCanonicalJsonSerializer`) to preserve property ordering and casing. Schema migration helpers upgrade stored documents when new versions ship.

---

## 5. Deployment & configuration

- **Configuration sources.** YAML files feed typed options (`NotifyMongoOptions`, `NotifyWorkerOptions`, etc.). Environment variables can override connection strings and rate limits for production.
- **Authority integration.** Two OAuth clients (`notify-web`, `notify-web-dev`) with scopes `notify.read` and `notify.admin` are required. Authority enforcement can be disabled for air-gapped dev use by providing `developmentSigningKey`.
- **Plug-in management.** `plugins.baseDirectory` and `orderedPlugins` guarantee deterministic loading. Offline Kits copy the plug-in tree verbatim; operations must keep the order aligned across environments.
- **Observability.** Workers expose structured logs (`ruleId`, `actionId`, `eventId`, `throttleKey`). Metrics include:
  - `notify_rule_matches_total{tenant,eventKind}`
  - `notify_delivery_attempts_total{channelType,status}`
  - `notify_digest_open_windows{window}`
  - Optional OpenTelemetry traces for rule evaluation and connector round-trips.
- **Scaling levers.** Increase worker replicas to cope with bus throughput; adjust `worker.prefetchCount` for Redis Streams or `ackWait` for NATS JetStream. WebService remains stateless and scales horizontally behind the gateway.
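
The layering of YAML-derived options and environment overrides might look like the sketch below. The `NOTIFY_` prefix, the double-underscore path convention, and the `storage.connectionString` option are assumptions for illustration (only `worker.prefetchCount` appears in this guide); consult `etc/notify.yaml.sample` for the real keys. The YAML file is represented as an already-parsed dictionary to keep the example dependency-free.

```python
import copy
from typing import Any, Dict, Mapping, Optional


def _find_key(node: Dict[str, Any], wanted: str) -> Optional[str]:
    """Match an option name case-insensitively, e.g. 'prefetchcount' -> 'prefetchCount'."""
    for key in node:
        if key.lower() == wanted:
            return key
    return None


def apply_env_overrides(options: Mapping[str, Any], environ: Mapping[str, str],
                        prefix: str = "NOTIFY_") -> Dict[str, Any]:
    """Return a copy of options with <prefix><SECTION>__<KEY> variables applied.

    Only existing scalar options are overridden, and each value is coerced to
    the type of its default so numeric limits stay numeric.
    """
    merged = copy.deepcopy(dict(options))
    for name, raw in environ.items():
        if not name.startswith(prefix):
            continue
        parts = name[len(prefix):].lower().split("__")
        node: Any = merged
        for part in parts[:-1]:
            key = _find_key(node, part) if isinstance(node, dict) else None
            node = node[key] if key is not None else None
        leaf = _find_key(node, parts[-1]) if isinstance(node, dict) else None
        if leaf is None or isinstance(node[leaf], dict):
            continue  # unknown option or whole section: ignore rather than invent keys
        current = node[leaf]
        if isinstance(current, bool):
            node[leaf] = raw.strip().lower() in ("1", "true", "yes")
        elif isinstance(current, (int, float)):
            node[leaf] = type(current)(raw)
        else:
            node[leaf] = raw
    return merged


# Defaults as they might be parsed from notify.yaml (hypothetical keys and values).
defaults = {
    "storage": {"connectionString": "mongodb://localhost:27017"},
    "worker": {"prefetchCount": 32},
}
# In a real process this mapping would be os.environ.
env = {
    "NOTIFY_STORAGE__CONNECTIONSTRING": "mongodb://mongo.internal:27017",
    "NOTIFY_WORKER__PREFETCHCOUNT": "128",
    "PATH": "/usr/bin",
}
effective = apply_env_overrides(defaults, env)
print(effective)
```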

---

## 6. Roadmap alignment

| Backlog | Architectural note |
|---------|--------------------|
| `NOTIFY-SVC-38-001` | Standardise event envelope publication (idempotency keys) – ensure bus bindings use the documented key format. |
| `NOTIFY-SVC-38-002..004` | Introduce simulation endpoints and throttle dashboards – expect additional `/internal/notify/simulate` routes and metrics; update once merged. |
| `NOTIFY-SVC-39-001..004` | Correlation engine, digests generator, simulation API, quiet hours – anticipate new Mongo documents (`quietHours`, correlation caches) and connector metadata (quiet mode hints). Review this guide when implementations land. |

Action: schedule a documentation sync with the Notifications Service Guild immediately after `NOTIFY-SVC-39-001..004` merge to confirm schema adjustments (e.g., correlation edge storage, quiet hour calendars) and add any new persistence or API details here.

---

> **Imposed rule reminder:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.