Notifications Architecture
Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
This dossier distils the Notify architecture into implementation-ready guidance for service owners, SREs, and integrators. It complements the high-level overview by detailing process boundaries, persistence models, and extensibility points.
1. Runtime shape
             ┌──────────────────┐
             │ Authority (OpTok)│
             └───────┬──────────┘
                     │
             ┌───────▼──────────┐        ┌───────────────┐
             │ Notify.WebService│◀──────▶│   MongoDB     │
Tenant API   │ REST + gRPC WIP  │        │ rules/channels│
             └───────▲──────────┘        │ deliveries    │
                     │                   │ digests       │
   Internal bus      │                   └───────────────┘
   (NATS/Redis/etc)  │
                     │
           ┌─────────▼─────────┐      ┌───────────────┐
           │ Notify.Worker     │◀────▶│ Redis / Cache │
           │ rule eval + render│      │throttles/locks│
           └─────────▲─────────┘      └───────▲───────┘
                     │                        │
                     │                        │
              ┌──────┴──────┐       ┌─────────┴────────┐
              │ Connectors  │──────▶│ Slack/Teams/...  │
              │ (plug-ins)  │       │ External targets │
              └─────────────┘       └──────────────────┘
- WebService hosts REST endpoints (/channels, /rules, /templates, /deliveries, /digests, /stats) and handles schema normalisation, validation, and Authority enforcement.
- Worker subscribes to the platform event bus, evaluates rules per tenant, applies throttles/digests, renders payloads, writes ledger entries, and invokes connectors.
- Plug-ins live under plugins/notify/ and are loaded deterministically at service start (orderedPlugins list). Each implements connector contracts and optional health/test-preview providers.
Both services share options via notify.yaml (see etc/notify.yaml.sample). For dev/test scenarios, an in-memory repository exists but production requires Mongo + Redis/NATS for durability and coordination.
2. Event ingestion and rule evaluation
- Subscription. Workers attach to the internal bus (Redis Streams or NATS JetStream). Each partition key is tenantId|scope.digest|event.kind to preserve order for a given artefact.
- Normalisation. Incoming events are hydrated into NotifyEvent envelopes. Payload JSON is normalised (sorted object keys) to preserve determinism and enable hashing.
- Rule snapshot. Per-tenant rule sets are cached in memory. Change streams from Mongo trigger snapshot refreshes without restart.
- Match pipeline.
  - Tenant check (rule.tenantId vs. event tenant).
  - Kind/namespace/repository/digest filters.
  - Severity and KEV gating based on event deltas.
  - VEX gating using NotifyRuleMatchVex.
  - Action iteration with throttle/digest decisions.
- Idempotency. Each action computes hash(ruleId|actionId|event.kind|scope.digest|delta.hash|dayBucket); matches within the throttle TTL record status=Throttled and stop.
- Dispatch. If digest is instant, the renderer immediately processes the action. Otherwise the event is appended to the digest window for later flush.
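The normalisation, partition-key, and idempotency steps above can be sketched in a few lines of Python. This is a minimal illustration, not the service's actual implementation: the function names are hypothetical, SHA-256 is an assumed hash (the text only says hash(...)), and dayBucket is assumed to be the UTC calendar date of the event. The idem: prefix follows the throttle key format listed in the persistence model.

```python
import hashlib
import json
from datetime import datetime, timezone


def canonical_json(payload: dict) -> str:
    """Serialise with sorted object keys and no optional whitespace,
    so equal payloads always produce the same bytes."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)


def partition_key(tenant_id: str, scope_digest: str, event_kind: str) -> str:
    """Build tenantId|scope.digest|event.kind as described in the text."""
    return "|".join((tenant_id, scope_digest, event_kind))


def idempotency_key(rule_id: str, action_id: str, event_kind: str,
                    scope_digest: str, delta_hash: str,
                    occurred_at: datetime) -> str:
    """Hash ruleId|actionId|event.kind|scope.digest|delta.hash|dayBucket."""
    if occurred_at.tzinfo is None:
        raise ValueError("occurred_at must be timezone-aware")
    # Assumption: dayBucket is the UTC date, formatted YYYY-MM-DD.
    day_bucket = occurred_at.astimezone(timezone.utc).strftime("%Y-%m-%d")
    material = "|".join((rule_id, action_id, event_kind,
                         scope_digest, delta_hash, day_bucket))
    # Assumption: SHA-256; the source does not name the hash function.
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()
    return "idem:" + digest
```

Under these assumptions, two evaluations of the same action for the same artefact on the same UTC day yield the same key, which is what lets the throttle store recognise the repeat.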
Failures during evaluation are logged with correlation IDs and surfaced through /stats and worker metrics (notify_rule_eval_failures_total, notify_digest_flush_errors_total).
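As a rough illustration of the match pipeline listed earlier in this section, the sketch below applies the tenant check, the kind/namespace/repository/digest filters, and severity/KEV gating in that order. The Rule and Event shapes, the numeric severity scale, and the "empty filter matches anything" convention are all assumptions made for the example; VEX gating and action iteration are left out because the document does not describe their data shapes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical, simplified view of a NotifyEvent envelope.
    tenant_id: str
    kind: str
    namespace: str
    repository: str
    digest: str
    max_severity: int = 0   # assumed numeric severity taken from the delta
    kev: bool = False       # assumed flag: delta contains a KEV entry


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule shape; empty filter sets mean "match any value".
    tenant_id: str
    kinds: frozenset = frozenset()
    namespaces: frozenset = frozenset()
    repositories: frozenset = frozenset()
    digests: frozenset = frozenset()
    min_severity: int = 0
    kev_only: bool = False


def _allowed(allowed: frozenset, value: str) -> bool:
    return not allowed or value in allowed


def matches(rule: Rule, event: Event) -> bool:
    # 1. Tenant check.
    if rule.tenant_id != event.tenant_id:
        return False
    # 2. Kind/namespace/repository/digest filters.
    if not (_allowed(rule.kinds, event.kind)
            and _allowed(rule.namespaces, event.namespace)
            and _allowed(rule.repositories, event.repository)
            and _allowed(rule.digests, event.digest)):
        return False
    # 3. Severity and KEV gating.
    if event.max_severity < rule.min_severity:
        return False
    if rule.kev_only and not event.kev:
        return False
    # 4. VEX gating and 5. action iteration are intentionally omitted.
    return True
```

Ordering the cheap tenant and filter checks first means most non-matching events are rejected before any gating logic runs.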
3. Rendering & connectors
- Template resolution. The renderer picks the template in this order: action template → channel default template → locale fallback → built-in minimal template. Locale negotiation reduces en-US to en-us.
- Helpers & partials. Exposed helpers mirror the list in notifications/templates.md. Plug-ins may register additional helpers but must remain deterministic and side-effect free.
- Rendering output. NotifyDeliveryRendered captures:
  - channelType, format, locale
  - title, body, optional summary, textBody
  - target (redacted where necessary)
  - attachments[] (safe URLs or references)
  - bodyHash (lowercase SHA-256) for audit parity
- Connector contract. Connectors implement INotifyConnector (send + health) and can implement INotifyChannelTestProvider for /channels/{id}/test. All plug-ins are single-tenant aware; secrets are pulled via references at send time and never persisted in Mongo.
- Retries. Workers track attempts and retry with exponential backoff and jitter. On permanent failure, deliveries are marked Failed with statusReason; optional DLQ fan-out is slated for Sprint 40.
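The template resolution order and the bodyHash field described above could look roughly like the following. This is a sketch under stated assumptions: the function names, the built-in template text, and the reading of "locale fallback" as "try the requested locale, then a default locale" are all hypothetical. Only the resolution order, the lowercasing of locales, and the lowercase SHA-256 hash come from the text.

```python
import hashlib
from typing import Mapping, Optional

# Hypothetical stand-in for the built-in minimal template.
BUILT_IN_MINIMAL = "{title}: {body}"


def normalise_locale(locale: str) -> str:
    """Reduce a locale tag such as en-US to en-us."""
    return locale.strip().lower()


def resolve_template(action_template: Optional[str],
                     channel_default: Optional[str],
                     locale_templates: Mapping[str, str],
                     locale: str,
                     fallback_locale: str = "en-us") -> str:
    """Pick: action template, then channel default, then locale fallback,
    then the built-in minimal template."""
    if action_template:
        return action_template
    if channel_default:
        return channel_default
    # Assumption: locale fallback tries the requested locale first,
    # then a configured default locale.
    for candidate in (normalise_locale(locale), fallback_locale):
        if candidate in locale_templates:
            return locale_templates[candidate]
    return BUILT_IN_MINIMAL


def body_hash(body: str) -> str:
    """Lowercase hex SHA-256 of the UTF-8 encoded rendered body."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()
```

Because hexdigest() already returns lowercase hex, no extra casing step is needed for the hash to match the "lowercase SHA-256" requirement.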
4. Persistence model
| Collection | Purpose | Key fields & indexes |
|---|---|---|
| rules | Tenant rule definitions. | _id, tenantId, enabled; index on {tenantId, enabled}. |
| channels | Channel metadata + config references. | _id, tenantId, type; index on {tenantId, type}. |
| templates | Locale-specific render bodies. | _id, tenantId, channelType, key; index on {tenantId, channelType, key}. |
| deliveries | Ledger of rendered notifications. | _id, tenantId, sentAt; compound index on {tenantId, sentAt:-1} for history queries. |
| digests | Open digest windows per action. | _id (tenantId:actionKey:window), status; index on {tenantId, actionKey}. |
| throttles | Short-lived throttle tokens (Mongo or Redis). | Key format idem:<hash> with TTL aligned to throttle duration. |
Documents are stored using the canonical JSON serializer (NotifyCanonicalJsonSerializer) to preserve property ordering and casing. Schema migration helpers upgrade stored documents when new versions ship.
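For reference, the key formats and indexes from the table above can be written down as plain Python data. This is only a sketch: the helper names are hypothetical, and index directions other than sentAt:-1 are assumed to be ascending because the table does not state them. The (field, direction) pair list is the shape that MongoDB drivers such as PyMongo accept for compound indexes, but no driver is imported here.

```python
from typing import Dict, List, Tuple

ASC, DESC = 1, -1

# Index key specs per collection, taken from the persistence table.
# Assumption: unspecified directions are ascending.
INDEXES: Dict[str, List[Tuple[str, int]]] = {
    "rules": [("tenantId", ASC), ("enabled", ASC)],
    "channels": [("tenantId", ASC), ("type", ASC)],
    "templates": [("tenantId", ASC), ("channelType", ASC), ("key", ASC)],
    "deliveries": [("tenantId", ASC), ("sentAt", DESC)],
    "digests": [("tenantId", ASC), ("actionKey", ASC)],
}


def digest_id(tenant_id: str, action_key: str, window: str) -> str:
    """Build the digests _id in the tenantId:actionKey:window format."""
    return f"{tenant_id}:{action_key}:{window}"


def throttle_key(hash_hex: str) -> str:
    """Build a throttle token key in the idem:<hash> format."""
    return f"idem:{hash_hex}"
```

Keeping the specs in one table like this makes it easy to compare what is deployed against what the document promises.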
5. Deployment & configuration
- Configuration sources. YAML files feed typed options (NotifyMongoOptions, NotifyWorkerOptions, etc.). Environment variables can override connection strings and rate limits for production.
- Authority integration. Two OAuth clients (notify-web, notify-web-dev) with scopes notify.read and notify.admin are required. Authority enforcement can be disabled for air-gapped dev use by providing developmentSigningKey.
- Plug-in management. plugins.baseDirectory and orderedPlugins guarantee deterministic loading. Offline Kits copy the plug-in tree verbatim; operations must keep the order aligned across environments.
- Observability. Workers expose structured logs (ruleId, actionId, eventId, throttleKey). Metrics include:
  - notify_rule_matches_total{tenant,eventKind}
  - notify_delivery_attempts_total{channelType,status}
  - notify_digest_open_windows{window}
  - Optional OpenTelemetry traces for rule evaluation and connector round-trips.
- Scaling levers. Increase worker replicas to cope with bus throughput; adjust worker.prefetchCount for Redis Streams or ackWait for NATS JetStream. WebService remains stateless and scales horizontally behind the gateway.
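To make the deterministic loading guarantee concrete, here is one way an orderedPlugins list could be turned into a load order. This is an illustrative sketch, not the service's loader: the function name is hypothetical, and both the failure on a missing plug-in and the "unlisted plug-ins load last, sorted by name" rule are assumptions, since the document does not say how those cases are handled.

```python
from typing import Iterable, List, Sequence


def load_order(discovered: Iterable[str],
               ordered_plugins: Sequence[str]) -> List[str]:
    """Return plug-in names in a deterministic load order.

    discovered: names found under plugins.baseDirectory.
    ordered_plugins: the configured orderedPlugins list.
    """
    available = set(discovered)
    # Assumption: a configured plug-in that is not on disk is an error.
    missing = [name for name in ordered_plugins if name not in available]
    if missing:
        raise LookupError(f"configured plug-ins not found: {missing}")
    # Keep the configured order, dropping accidental duplicates.
    ordered = list(dict.fromkeys(ordered_plugins))
    # Assumption: plug-ins not listed load afterwards, sorted by name,
    # so the result never depends on directory enumeration order.
    ordered.extend(sorted(available - set(ordered)))
    return ordered
```

For example, load_order(["teams", "slack", "email"], ["slack", "email"]) returns ["slack", "email", "teams"] regardless of the order in which the directory listing produced the names.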
6. Roadmap alignment
| Backlog | Architectural note |
|---|---|
| NOTIFY-SVC-38-001 | Standardise event envelope publication (idempotency keys) – ensure bus bindings use the documented key format. |
| NOTIFY-SVC-38-002..004 | Introduce simulation endpoints and throttle dashboards – expect additional /internal/notify/simulate routes and metrics; update once merged. |
| NOTIFY-SVC-39-001..004 | Correlation engine, digests generator, simulation API, quiet hours – anticipate new Mongo documents (quietHours, correlation caches) and connector metadata (quiet mode hints). Review this guide when implementations land. |
Action: schedule a documentation sync with the Notifications Service Guild immediately after NOTIFY-SVC-39-001..004 merge to confirm schema adjustments (e.g., correlation edge storage, quiet hour calendars) and add any new persistence or API details here.
Imposed rule reminder: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.