# Notifications Architecture

> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

This dossier distils the Notify architecture into implementation-ready guidance for service owners, SREs, and integrators. It complements the high-level overview by detailing process boundaries, persistence models, and extensibility points.

---

## 1. Runtime shape

```
          ┌──────────────────┐
          │ Authority (OpTok)│
          └───────┬──────────┘
                  │
          ┌───────▼──────────┐        ┌───────────────┐
          │ Notify.WebService│◀──────▶│ MongoDB       │
Tenant API│  REST + gRPC WIP │        │ rules/channels│
          └───────▲──────────┘        │ deliveries    │
                  │                   │ digests       │
   Internal bus   │                   └───────────────┘
 (NATS/Redis/etc) │
                  │
        ┌─────────▼─────────┐      ┌───────────────┐
        │ Notify.Worker     │◀────▶│ Redis / Cache │
        │ rule eval + render│      │throttles/locks│
        └─────────▲─────────┘      └───────▲───────┘
                  │                        │
                  │                        │
           ┌──────┴──────┐       ┌─────────┴────────┐
           │ Connectors  │──────▶│ Slack/Teams/...  │
           │ (plug-ins)  │       │ External targets │
           └─────────────┘       └──────────────────┘
```

- **WebService** hosts REST endpoints (`/channels`, `/rules`, `/templates`, `/deliveries`, `/digests`, `/stats`) and handles schema normalisation, validation, and Authority enforcement.
- **Worker** subscribes to the platform event bus, evaluates rules per tenant, applies throttles/digests, renders payloads, writes ledger entries, and invokes connectors.
- **Plug-ins** live under `plugins/notify/` and are loaded deterministically at service start (`orderedPlugins` list). Each implements connector contracts and optional health/test-preview providers.

Both services share options via `notify.yaml` (see `etc/notify.yaml.sample`). For dev/test scenarios, an in-memory repository exists, but production requires Mongo + Redis/NATS for durability and coordination.

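
The deterministic plug-in load order described above can be pictured with a small Python sketch. It is illustrative only and not the service's loader: the plug-in names are hypothetical, and the rule that discovered-but-unlisted plug-ins load after the listed ones in alphabetical order is an assumption made here so that the result is stable.

```python
def resolve_load_order(discovered, ordered_plugins):
    """Return plug-in names in a stable load order.

    Listed plug-ins come first, in the order given by `ordered_plugins`.
    Plug-ins found on disk but not listed follow alphabetically (assumption).
    Names that are listed but not present are skipped.
    """
    present = set(discovered)
    # dict.fromkeys drops duplicate entries while keeping first-seen order.
    listed = [name for name in dict.fromkeys(ordered_plugins) if name in present]
    unlisted = sorted(present.difference(listed))
    return listed + unlisted


# Hypothetical names; a directory scan returns entries in no guaranteed order.
print(resolve_load_order({"teams", "slack", "email"}, ["slack", "email"]))
# ['slack', 'email', 'teams']
```
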

---

## 2. Event ingestion and rule evaluation

1. **Subscription.** Workers attach to the internal bus (Redis Streams or NATS JetStream). Each partition key is `tenantId|scope.digest|event.kind` to preserve order for a given artefact.
2. **Normalisation.** Incoming events are hydrated into `NotifyEvent` envelopes. Payload JSON is normalised (sorted object keys) to preserve determinism and enable hashing.
3. **Rule snapshot.** Per-tenant rule sets are cached in memory. Change streams from Mongo trigger snapshot refreshes without restart.
4. **Match pipeline.**
   - Tenant check (`rule.tenantId` vs. event tenant).
   - Kind/namespace/repository/digest filters.
   - Severity and KEV gating based on event deltas.
   - VEX gating using `NotifyRuleMatchVex`.
   - Action iteration with throttle/digest decisions.
5. **Idempotency.** Each action computes `hash(ruleId|actionId|event.kind|scope.digest|delta.hash|dayBucket)`. A match within the throttle TTL is recorded with `status=Throttled`, and processing of that action stops.
6. **Dispatch.** If digest is `instant`, the renderer immediately processes the action. Otherwise the event is appended to the digest window for later flush.
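
Steps 1 and 2 above can be sketched in a few lines of Python. This is a minimal illustration, not the service's code: the tenant, digest, and event-kind values are made up, SHA-256 is assumed as the hash, and whether the real normaliser does anything beyond sorting object keys is not stated in this guide.

```python
import hashlib
import json


def canonical_json(payload):
    """Serialise with sorted object keys and no insignificant whitespace."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def payload_hash(payload):
    """Lowercase hex SHA-256 of the canonical form (hash choice is an assumption)."""
    return hashlib.sha256(canonical_json(payload).encode("utf-8")).hexdigest()


def partition_key(tenant_id, scope_digest, event_kind):
    """Join the three parts with '|' as described in step 1."""
    return f"{tenant_id}|{scope_digest}|{event_kind}"


# Two payloads that differ only in key order (hypothetical event shape).
a = {"kind": "scanner.report.ready", "delta": {"newCritical": 1, "kev": ["CVE-2024-0001"]}}
b = {"delta": {"kev": ["CVE-2024-0001"], "newCritical": 1}, "kind": "scanner.report.ready"}

assert canonical_json(a) == canonical_json(b)
assert payload_hash(a) == payload_hash(b)
print(partition_key("tenant-01", "sha256:abc", "scanner.report.ready"))
```

Only object keys are reordered; array element order is left untouched, so two payloads that list the same items in a different order still hash differently in this sketch.
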

Failures during evaluation are logged with correlation IDs and surfaced through `/stats` and worker metrics (`notify_rule_eval_failures_total`, `notify_digest_flush_errors_total`).

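
Returning to step 5 of the pipeline, the idempotency check can be sketched as follows. The field order and the `idem:` key prefix come from this guide; everything else is an assumption for illustration: SHA-256 as the hash, the UTC calendar date as `dayBucket`, and an in-memory dictionary standing in for the Redis or Mongo throttle store.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def idempotency_key(rule_id, action_id, event_kind, scope_digest, delta_hash, at):
    """Hash the step-5 fields; the day bucket is the UTC date (assumption)."""
    day_bucket = at.astimezone(timezone.utc).strftime("%Y-%m-%d")
    raw = "|".join([rule_id, action_id, event_kind, scope_digest, delta_hash, day_bucket])
    return "idem:" + hashlib.sha256(raw.encode("utf-8")).hexdigest()


class InMemoryThrottle:
    """Stand-in for the throttle store: maps key -> expiry time."""

    def __init__(self):
        self._expiry = {}

    def try_acquire(self, key, ttl, now):
        """Return True if the action may proceed, False if it is throttled."""
        expires = self._expiry.get(key)
        if expires is not None and now < expires:
            return False
        self._expiry[key] = now + ttl
        return True


store = InMemoryThrottle()
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
ttl = timedelta(minutes=5)
key = idempotency_key("rule-1", "act-1", "scanner.report.ready", "sha256:abc", "d41d", now)

print(store.try_acquire(key, ttl, now))                         # True: first match proceeds
print(store.try_acquire(key, ttl, now + timedelta(minutes=1)))  # False: inside the TTL
print(store.try_acquire(key, ttl, now + timedelta(minutes=6)))  # True: TTL has elapsed
```
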

---

## 3. Rendering & connectors

- **Template resolution.** The renderer picks the template in this order: action template → channel default template → locale fallback → built-in minimal template. Locale negotiation normalises tags to lowercase, so `en-US` becomes `en-us`.
- **Helpers & partials.** Exposed helpers mirror the list in [`notifications/templates.md`](templates.md#3-variables-helpers-and-context). Plug-ins may register additional helpers but must remain deterministic and side-effect free.
- **Rendering output.** `NotifyDeliveryRendered` captures:
  - `channelType`, `format`, `locale`
  - `title`, `body`, optional `summary`, `textBody`
  - `target` (redacted where necessary)
  - `attachments[]` (safe URLs or references)
  - `bodyHash` (lowercase SHA-256) for audit parity
- **Connector contract.** Connectors implement `INotifyConnector` (send + health) and can implement `INotifyChannelTestProvider` for `/channels/{id}/test`. All plug-ins are single-tenant aware; secrets are pulled via references at send time and never persisted in Mongo.
- **Retries.** Workers track attempts and retry with exponential backoff and jitter. On permanent failure, deliveries are marked `Failed` with a `statusReason`. Optional dead-letter queue (DLQ) fan-out is slated for Sprint 40.
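
The template resolution order in the first bullet can be sketched as a simple first-match lookup. This is an interpretation rather than the renderer's code: "locale fallback" is read here as retrying the same keys under a default locale, and the template keys, the `(key, locale)` lookup shape, and the built-in body are all hypothetical.

```python
BUILT_IN_MINIMAL = "{{title}}"  # hypothetical stand-in for the built-in template


def normalise_locale(tag):
    """Lowercase the tag, e.g. 'en-US' -> 'en-us'."""
    return tag.strip().lower()


def resolve_template(templates, action_key, channel_default_key, locale,
                     fallback_locale="en-us"):
    """Return the first template found, trying candidates in the documented order.

    `templates` maps (key, locale) -> template body.
    """
    loc = normalise_locale(locale)
    fallback = normalise_locale(fallback_locale)
    candidates = [
        (action_key, loc),                # 1. action template
        (channel_default_key, loc),       # 2. channel default template
        (action_key, fallback),           # 3. locale fallback (assumed meaning)
        (channel_default_key, fallback),
    ]
    for key, candidate_locale in candidates:
        if key is not None and (key, candidate_locale) in templates:
            return templates[(key, candidate_locale)]
    return BUILT_IN_MINIMAL               # 4. built-in minimal template


templates = {("slack.default", "en-us"): "Default: {{title}}"}
# No German template exists, so the channel default under the fallback locale wins.
print(resolve_template(templates, "slack.critical", "slack.default", "de-DE"))
```
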
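
The retry behaviour in the last bullet of this section can be sketched with capped exponential backoff and "full jitter", a common strategy that is assumed here because the guide does not name the exact one. The base delay, cap, attempt limit, and the `Sent` status name are illustrative; `Failed` and the reason string follow the bullet. For brevity the sketch retries on every exception, whereas a real connector would separate transient from permanent errors.

```python
import random


def backoff_delays(max_attempts, base=1.0, cap=60.0, rng=None):
    """Yield the wait in seconds before each retry, using full jitter."""
    if rng is None:
        rng = random.Random()
    for attempt in range(max_attempts - 1):  # no wait after the final attempt
        ceiling = min(cap, base * (2 ** attempt))
        yield rng.uniform(0.0, ceiling)


def send_with_retries(send, max_attempts=5, sleep=lambda seconds: None, rng=None):
    """Call `send()` until it succeeds or attempts run out.

    Returns ("Sent", None) on success or ("Failed", reason) otherwise.
    Pass `time.sleep` as `sleep` to actually wait between attempts.
    """
    delays = backoff_delays(max_attempts, rng=rng)
    reason = None
    for attempt in range(1, max_attempts + 1):
        try:
            send()
            return "Sent", None
        except Exception as exc:
            reason = f"attempt {attempt}: {exc}"
            delay = next(delays, None)
            if delay is not None:
                sleep(delay)
    return "Failed", reason


def always_down():
    raise RuntimeError("connection refused")


print(send_with_retries(always_down, max_attempts=3))
# ('Failed', 'attempt 3: connection refused')
```
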

---

## 4. Persistence model

| Collection | Purpose | Key fields & indexes |
|------------|---------|----------------------|
| `rules` | Tenant rule definitions. | `_id`, `tenantId`, `enabled`; index on `{tenantId, enabled}`. |
| `channels` | Channel metadata + config references. | `_id`, `tenantId`, `type`; index on `{tenantId, type}`. |
| `templates` | Locale-specific render bodies. | `_id`, `tenantId`, `channelType`, `key`; index on `{tenantId, channelType, key}`. |
| `deliveries` | Ledger of rendered notifications. | `_id`, `tenantId`, `sentAt`; compound index on `{tenantId, sentAt:-1}` for history queries. |
| `digests` | Open digest windows per action. | `_id` (`tenantId:actionKey:window`), `status`; index on `{tenantId, actionKey}`. |
| `throttles` | Short-lived throttle tokens (Mongo or Redis). | Key format `idem:<hash>` with TTL aligned to throttle duration. |

Documents are stored using the canonical JSON serializer (`NotifyCanonicalJsonSerializer`) to preserve property ordering and casing. Schema migration helpers upgrade stored documents when new versions ship.

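
To make the `digests` row concrete, the sketch below shows how the `tenantId:actionKey:window` identifier could tie into the instant-versus-windowed dispatch decision from section 2. It is an in-memory illustration under assumptions: the window label `1h`, the return values, and the `render` callback are hypothetical, and a real worker would persist open windows rather than hold them in a dictionary.

```python
from collections import defaultdict


def digest_id(tenant_id, action_key, window):
    """Compose the `digests` document id: tenantId:actionKey:window."""
    return f"{tenant_id}:{action_key}:{window}"


class DigestBuffer:
    """In-memory stand-in for the `digests` collection."""

    def __init__(self):
        self._open = defaultdict(list)

    def dispatch(self, tenant_id, action_key, window, event, render):
        """Render immediately for `instant`; otherwise queue the event in its window."""
        if window == "instant":
            render([event])
            return "rendered"
        self._open[digest_id(tenant_id, action_key, window)].append(event)
        return "queued"

    def flush(self, tenant_id, action_key, window, render):
        """Close the window and render everything queued in it as one batch."""
        events = self._open.pop(digest_id(tenant_id, action_key, window), [])
        if events:
            render(events)
        return len(events)


rendered = []
buffer = DigestBuffer()
buffer.dispatch("tenant-01", "act-1", "1h", {"id": 1}, rendered.append)
buffer.dispatch("tenant-01", "act-1", "1h", {"id": 2}, rendered.append)
print(buffer.flush("tenant-01", "act-1", "1h", rendered.append))  # 2
print(rendered)  # [[{'id': 1}, {'id': 2}]]
```
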

---

## 5. Deployment & configuration

- **Configuration sources.** YAML files feed typed options (`NotifyMongoOptions`, `NotifyWorkerOptions`, etc.). Environment variables can override connection strings and rate limits for production.
- **Authority integration.** Two OAuth clients (`notify-web`, `notify-web-dev`) with scopes `notify.read` and `notify.admin` are required. Authority enforcement can be disabled for air-gapped dev use by providing `developmentSigningKey`.
- **Plug-in management.** `plugins.baseDirectory` and `orderedPlugins` guarantee deterministic loading. Offline Kits copy the plug-in tree verbatim; operations must keep the order aligned across environments.
- **Observability.** Workers expose structured logs (`ruleId`, `actionId`, `eventId`, `throttleKey`). Metrics include:
  - `notify_rule_matches_total{tenant,eventKind}`
  - `notify_delivery_attempts_total{channelType,status}`
  - `notify_digest_open_windows{window}`
  - Optional OpenTelemetry traces for rule evaluation and connector round-trips.
- **Scaling levers.** Increase worker replicas to cope with bus throughput; adjust `worker.prefetchCount` for Redis Streams or `ackWait` for NATS JetStream. WebService remains stateless and scales horizontally behind the gateway.

---

## 6. Roadmap alignment

| Backlog | Architectural note |
|---------|--------------------|
| `NOTIFY-SVC-38-001` | Standardise event envelope publication (idempotency keys) – ensure bus bindings use the documented key format. |
| `NOTIFY-SVC-38-002..004` | Introduce simulation endpoints and throttle dashboards – expect additional `/internal/notify/simulate` routes and metrics; update once merged. |
| `NOTIFY-SVC-39-001..004` | Correlation engine, digest generator, simulation API, quiet hours – anticipate new Mongo documents (`quietHours`, correlation caches) and connector metadata (quiet mode hints). Review this guide when implementations land. |

Action: schedule a documentation sync with the Notifications Service Guild immediately after `NOTIFY-SVC-39-001..004` merge to confirm schema adjustments (e.g., correlation edge storage, quiet hour calendars) and add any new persistence or API details here.

---

> **Imposed rule reminder:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.