# Notification Rules and Alerting Engine

**Version:** 1.0
**Date:** 2025-11-29
**Status:** Canonical

This advisory defines the product rationale, rules engine semantics, and implementation strategy for the Notify module, covering channel connectors, throttling, digests, and delivery management.

---

## 1. Executive Summary

The Notify module provides **rules-driven, tenant-aware notification delivery** across security workflows. Key capabilities:

- **Rules Engine** - Declarative matchers for event routing
- **Multi-Channel Delivery** - Slack, Teams, Email, Webhooks
- **Noise Control** - Throttling, deduplication, digest windows
- **Approval Tokens** - DSSE-signed ack tokens for one-click workflows
- **Audit Trail** - Complete delivery history with redacted payloads

---

## 2. Market Drivers

### 2.1 Target Segments

| Segment | Notification Requirements | Use Case |
|---------|--------------------------|----------|
| **Security Teams** | Real-time critical alerts | Incident response |
| **DevSecOps** | CI/CD integration | Pipeline notifications |
| **Compliance** | Audit trails | Delivery verification |
| **Management** | Digest summaries | Executive reporting |

### 2.2 Competitive Positioning

Most vulnerability tools offer basic email alerts. Stella Ops differentiates with:

- **Rules-based routing** with fine-grained matchers
- **Native Slack/Teams integration** with rich formatting
- **Digest windows** to prevent alert fatigue
- **Cryptographic ack tokens** for approval workflows
- **Tenant isolation** with quota controls

---

## 3. Rules Engine

### 3.1 Rule Structure

```yaml
name: "critical-alerts-prod"
enabled: true
tenant: "acme-corp"
match:
  eventKinds:
    - "scanner.report.ready"
    - "scheduler.rescan.delta"
    - "zastava.admission"
  namespaces: ["prod-*"]
  repos: ["ghcr.io/acme/*"]
  minSeverity: "high"
  kev: true
  verdict: ["fail", "deny"]
  vex:
    includeRejectedJustifications: false
actions:
  - channel: "slack:sec-alerts"
    template: "concise"
    throttle: "5m"
  - channel: "email:soc"
    digest: "hourly"
    template: "detailed"
```

### 3.2 Matcher Types

| Matcher | Description | Example |
|---------|-------------|---------|
| `eventKinds` | Event type filter | `["scanner.report.ready"]` |
| `namespaces` | Namespace patterns | `["prod-*", "staging"]` |
| `repos` | Repository patterns | `["ghcr.io/acme/*"]` |
| `minSeverity` | Minimum severity | `"high"` |
| `kev` | KEV-tagged required | `true` |
| `verdict` | Report/admission verdict | `["fail", "deny"]` |
| `labels` | Kubernetes labels | `{"env": "production"}` |

### 3.3 Evaluation Order

1. **Tenant check** - Discard if rule tenant ≠ event tenant
2. **Kind filter** - Early discard for non-matching kinds
3. **Scope match** - Namespace/repo/label matching
4. **Delta gates** - Severity threshold evaluation
5. **VEX gate** - Filter based on VEX status
6. **Throttle/dedup** - Idempotency key check
7. **Actions** - Enqueue per-channel jobs

---

## 4. Channel Connectors

### 4.1 Built-in Channels

| Channel | Features | Rate Limits |
|---------|----------|-------------|
| **Slack** | Blocks, threads, reactions | 1 msg/sec per channel |
| **Teams** | Adaptive Cards, webhooks | 4 msgs/sec |
| **Email** | HTML+text, attachments | Relay-dependent |
| **Webhook** | JSON, HMAC signing | 10 req/sec |

### 4.2 Channel Configuration

```yaml
channels:
  - name: "slack:sec-alerts"
    type: slack
    config:
      channel: "#security-alerts"
      workspace: "acme-corp"
    secretRef: "ref://notify/slack-token"
  - name: "email:soc"
    type: email
    config:
      to: ["soc@acme.com"]
      from: "stellaops@acme.com"
      smtpHost: "smtp.acme.com"
    secretRef: "ref://notify/smtp-creds"
  - name: "webhook:siem"
    type: webhook
    config:
      url: "https://siem.acme.com/api/events"
      signMethod: "ed25519"
      signKeyRef: "ref://notify/webhook-key"
```

### 4.3 Connector Contract

```csharp
public interface INotifyConnector
{
    // Channel type this connector handles (matches `type` in the channel config).
    string Type { get; }
    // Delivers one notification described by the delivery context.
    Task SendAsync(DeliveryContext ctx, CancellationToken ct);
    // Checks the health of a configured channel.
    Task HealthAsync(ChannelConfig cfg, CancellationToken ct);
}
```

---

## 5. Noise Control

### 5.1 Throttling

- **Per-action throttle** - Suppress duplicates within window
- **Idempotency key** - `hash(ruleId | actionId | event.kind | scope.digest | day)`
- **Configurable windows** - 5m, 15m, 1h, 1d

### 5.2 Digest Windows

```yaml
actions:
  - channel: "email:weekly-summary"
    digest: "weekly"
    digestOptions:
      maxItems: 100
      groupBy: ["severity", "namespace"]
    template: "digest-summary"
```

**Behavior:**

- Coalesce events within window
- Summarize top N items with counts
- Flush on window close or max items
- Safe truncation with "and X more" links

### 5.3 Quiet Hours

```yaml
notify:
  quietHours:
    enabled: true
    window: "22:00-06:00"
    timezone: "America/New_York"
    minSeverity: "critical"
```

Only critical alerts are delivered during quiet hours; all others are deferred to digests.

---

## 6. Templates & Rendering

### 6.1 Template Engine

- Handlebars-style safe templates
- No arbitrary code execution
- Deterministic outputs (stable property order)
- Locale-aware formatting

### 6.2 Template Variables

| Variable | Description |
|----------|-------------|
| `event.kind` | Event type |
| `event.ts` | Timestamp |
| `scope.namespace` | Kubernetes namespace |
| `scope.repo` | Repository |
| `scope.digest` | Image digest |
| `payload.verdict` | Policy verdict |
| `payload.delta.newCritical` | New critical count |
| `payload.links.ui` | UI deep link |
| `topFindings[]` | Top N findings |

### 6.3 Channel-Specific Rendering

**Slack:**

```json
{
  "blocks": [
    {"type": "header", "text": {"type": "plain_text", "text": "Policy FAIL: nginx:latest"}},
    {"type": "section", "text": {"type": "mrkdwn", "text": "*2 critical*, 3 high vulnerabilities"}}
  ]
}
```

**Email:**

```html
<h2>Policy FAIL: nginx:latest</h2>
<table>
  <tr><td>Critical</td><td>2</td></tr>
  <tr><td>High</td><td>3</td></tr>
</table>
<a href="...">View Details</a>
```

---

## 7. Ack Tokens

### 7.1 Token Structure

DSSE-signed tokens for one-click acknowledgements:

```json
{
  "payloadType": "application/vnd.stellaops.notify-ack-token+json",
  "payload": {
    "tenant": "acme-corp",
    "deliveryId": "delivery-123",
    "notificationId": "notif-456",
    "channel": "slack:sec-alerts",
    "webhookUrl": "https://notify.internal/ack",
    "nonce": "random-nonce",
    "actions": ["acknowledge", "escalate"],
    "expiresAt": "2025-11-29T13:00:00Z"
  },
  "signatures": [{"keyid": "notify-ack-key-01", "sig": "..."}]
}
```

### 7.2 Token Workflow

1. **Issue** - `POST /notify/ack-tokens/issue`
2. **Embed** - Token included in message action button
3. **Click** - User clicks button, token sent to webhook
4. **Verify** - `POST /notify/ack-tokens/verify`
5. **Audit** - Ack event recorded

### 7.3 Token Rotation

```bash
# Rotate ack token signing key
stella notify rotate-ack-key --key-source kms://notify/ack-key
```

---

## 8. Implementation Strategy

### 8.1 Phase 1: Core Engine (Complete)

- [x] Rules engine with matchers
- [x] Slack connector
- [x] Teams connector
- [x] Email connector
- [x] Webhook connector

### 8.2 Phase 2: Noise Control (Complete)

- [x] Throttling
- [x] Digest windows
- [x] Idempotency
- [x] Quiet hours

### 8.3 Phase 3: Ack Tokens (In Progress)

- [x] Token issuance
- [x] Token verification
- [ ] Token rotation API (NOTIFY-ACK-45-001)
- [ ] Escalation workflows (NOTIFY-ESC-46-001)

### 8.4 Phase 4: Advanced Features (Planned)

- [ ] PagerDuty connector
- [ ] Jira ticket creation
- [ ] In-app notifications
- [ ] Anomaly suppression

---

## 9. API Surface

### 9.1 Channels

| Endpoint | Method | Scope | Description |
|----------|--------|-------|-------------|
| `/api/v1/notify/channels` | GET/POST | `notify.read/admin` | List/create channels |
| `/api/v1/notify/channels/{id}` | GET/PATCH/DELETE | `notify.admin` | Manage channel |
| `/api/v1/notify/channels/{id}/test` | POST | `notify.admin` | Send test message |
| `/api/v1/notify/channels/{id}/health` | GET | `notify.read` | Health check |

### 9.2 Rules

| Endpoint | Method | Scope | Description |
|----------|--------|-------|-------------|
| `/api/v1/notify/rules` | GET/POST | `notify.read/admin` | List/create rules |
| `/api/v1/notify/rules/{id}` | GET/PATCH/DELETE | `notify.admin` | Manage rule |
| `/api/v1/notify/rules/{id}/test` | POST | `notify.admin` | Dry-run rule |

### 9.3 Deliveries

| Endpoint | Method | Scope | Description |
|----------|--------|-------|-------------|
| `/api/v1/notify/deliveries` | GET | `notify.read` | List deliveries |
| `/api/v1/notify/deliveries/{id}` | GET | `notify.read` | Delivery detail |
| `/api/v1/notify/deliveries/{id}/retry` | POST | `notify.admin` | Retry delivery |

---

## 10. Event Sources

### 10.1 Subscribed Events

| Event | Source | Typical Actions |
|-------|--------|-----------------|
| `scanner.scan.completed` | Scanner | Immediate/digest |
| `scanner.report.ready` | Scanner | Immediate |
| `scheduler.rescan.delta` | Scheduler | Immediate/digest |
| `attestor.logged` | Attestor | Immediate |
| `zastava.admission` | Zastava | Immediate |
| `conselier.export.completed` | Concelier | Digest |
| `excitor.export.completed` | Excititor | Digest |

### 10.2 Event Envelope

```json
{
  "eventId": "uuid",
  "kind": "scanner.report.ready",
  "tenant": "acme-corp",
  "ts": "2025-11-29T12:00:00Z",
  "actor": "scanner-webservice",
  "scope": {
    "namespace": "production",
    "repo": "ghcr.io/acme/api",
    "digest": "sha256:..."
  },
  "payload": {
    "reportId": "report-123",
    "verdict": "fail",
    "summary": {"total": 12, "blocked": 2},
    "delta": {"newCritical": 1, "kev": ["CVE-2025-..."]}
  }
}
```

---

## 11. Observability

### 11.1 Metrics

- `notify.events_consumed_total{kind}`
- `notify.rules_matched_total{ruleId}`
- `notify.throttled_total{reason}`
- `notify.digest_coalesced_total{window}`
- `notify.sent_total{channel}`
- `notify.failed_total{channel,code}`
- `notify.delivery_latency_seconds{channel}`

### 11.2 SLO Targets

| Metric | Target |
|--------|--------|
| Event-to-delivery p95 | < 60 seconds |
| Failure rate | < 0.5% per hour |
| Duplicate rate | ~0% |

---

## 12. Security Considerations

### 12.1 Secret Management

- Secrets stored as references only
- Just-in-time fetch at send time
- No plaintext in Mongo

### 12.2 Webhook Signing

```
X-StellaOps-Signature: t=1764417600,v1=abc123...
X-StellaOps-Timestamp: 2025-11-29T12:00:00Z
```

- HMAC-SHA256 or Ed25519
- Replay window protection
- Canonical body hash

### 12.3 Loop Prevention

- Webhook target allowlist
- Event origin tags
- Own webhooks rejected

---

## 13. Related Documentation

| Resource | Location |
|----------|----------|
| Notify architecture | `docs/modules/notify/architecture.md` |
| Channel schemas | `docs/modules/notify/resources/schemas/` |
| Sample payloads | `docs/modules/notify/resources/samples/` |
| Bootstrap pack | `docs/modules/notify/bootstrap-pack.md` |

---

## 14. Sprint Mapping

- **Primary Sprint:** SPRINT_0170_0001_0001_notify_engine.md (NEW)
- **Related Sprints:**
  - SPRINT_0171_0001_0002_notify_connectors.md
  - SPRINT_0172_0001_0003_notify_ack_tokens.md

**Key Task IDs:**

- `NOTIFY-ENGINE-40-001` - Rules engine (DONE)
- `NOTIFY-CONN-41-001` - Connectors (DONE)
- `NOTIFY-NOISE-42-001` - Throttling/digests (DONE)
- `NOTIFY-ACK-45-001` - Token rotation (IN PROGRESS)
- `NOTIFY-ESC-46-001` - Escalation workflows (TODO)

---

## 15. Success Metrics

| Metric | Target |
|--------|--------|
| Delivery latency | < 60s p95 |
| Delivery success rate | > 99.5% |
| Duplicate rate | < 0.01% |
| Rule evaluation time | < 10ms |
| Channel health | 99.9% uptime |

---

*Last updated: 2025-11-29*
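
**Illustration: evaluation order (Section 3.3).** The first four steps of the evaluation order in Section 3.3 can be read as a chain of early returns, cheapest check first. The Python sketch below is one possible reading, not the Notify implementation: the glob semantics (`fnmatchcase`), the severity ranking, and the `maxSeverity` payload field are assumptions introduced only for this example.

```python
from fnmatch import fnmatchcase

# Assumed ordering; the document itself only names "high" and "critical".
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def rule_matches(rule: dict, event: dict) -> bool:
    """Apply steps 1-4 of the evaluation order, discarding as early as possible."""
    match = rule["match"]
    # 1. Tenant check
    if rule["tenant"] != event["tenant"]:
        return False
    # 2. Kind filter (a rule without eventKinds accepts every kind)
    kinds = match.get("eventKinds")
    if kinds and event["kind"] not in kinds:
        return False
    # 3. Scope match: namespace and repo against glob patterns
    scope = event["scope"]
    for field, patterns in (("namespace", match.get("namespaces")),
                            ("repo", match.get("repos"))):
        if patterns and not any(fnmatchcase(scope[field], p) for p in patterns):
            return False
    # 4. Severity threshold ("maxSeverity" is a hypothetical payload field)
    min_sev = match.get("minSeverity")
    if min_sev:
        seen = event["payload"].get("maxSeverity", "low")
        if SEVERITY_RANK[seen] < SEVERITY_RANK[min_sev]:
            return False
    return True


rule = {
    "tenant": "acme-corp",
    "match": {
        "eventKinds": ["scanner.report.ready"],
        "namespaces": ["prod-*"],
        "repos": ["ghcr.io/acme/*"],
        "minSeverity": "high",
    },
}
event = {
    "tenant": "acme-corp",
    "kind": "scanner.report.ready",
    "scope": {"namespace": "prod-eu", "repo": "ghcr.io/acme/api"},
    "payload": {"maxSeverity": "critical"},
}

matched = rule_matches(rule, event)
other_tenant = rule_matches(rule, {**event, "tenant": "globex"})
```

Under these glob semantics a namespace named `production` would not match `prod-*`, because the pattern requires the literal `prod-` prefix; patterns are worth dry-running against real namespace names before relying on them.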
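
**Illustration: idempotency key and throttle window (Section 5.1).** The key formula `hash(ruleId | actionId | event.kind | scope.digest | day)` from Section 5.1 can be sketched as follows. SHA-256, the `|` separator, the UTC day bucket, and the in-memory `Throttle` class are assumptions made for illustration; the document does not specify them, and a multi-instance deployment would need a shared store rather than a local dictionary.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def idempotency_key(rule_id: str, action_id: str, kind: str,
                    digest: str, ts: datetime) -> str:
    """Hash the five key parts; the day bucket is computed in UTC (assumption)."""
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    material = "|".join([rule_id, action_id, kind, digest, day])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


class Throttle:
    """Hypothetical in-memory throttle: at most one send per key per window."""

    def __init__(self, window: timedelta) -> None:
        self.window = window
        self._last_sent: dict[str, datetime] = {}

    def allow(self, key: str, now: datetime) -> bool:
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window:
            return False  # duplicate inside the window: suppress
        self._last_sent[key] = now
        return True


t0 = datetime(2025, 11, 29, 12, 0, tzinfo=timezone.utc)
key = idempotency_key("critical-alerts-prod", "slack:sec-alerts",
                      "scanner.report.ready", "sha256:abc", t0)

throttle = Throttle(timedelta(minutes=5))
first = throttle.allow(key, t0)                          # sent
second = throttle.allow(key, t0 + timedelta(minutes=2))  # suppressed
third = throttle.allow(key, t0 + timedelta(minutes=6))   # window elapsed, sent
```

Because the day is part of the key, an identical event on the next calendar day produces a new key and is never treated as a duplicate, regardless of the throttle window.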