# Notification Rules and Alerting Engine

**Version:** 1.0
**Date:** 2025-11-29
**Status:** Canonical

This advisory defines the product rationale, rules engine semantics, and implementation strategy for the Notify module, covering channel connectors, throttling, digests, and delivery management.

---

## 1. Executive Summary

The Notify module provides **rules-driven, tenant-aware notification delivery** across security workflows. Key capabilities:

- **Rules Engine** - Declarative matchers for event routing
- **Multi-Channel Delivery** - Slack, Teams, Email, Webhooks
- **Noise Control** - Throttling, deduplication, digest windows
- **Approval Tokens** - DSSE-signed ack tokens for one-click workflows
- **Audit Trail** - Complete delivery history with redacted payloads

---

## 2. Market Drivers

### 2.1 Target Segments

| Segment | Notification Requirements | Use Case |
|---------|--------------------------|----------|
| **Security Teams** | Real-time critical alerts | Incident response |
| **DevSecOps** | CI/CD integration | Pipeline notifications |
| **Compliance** | Audit trails | Delivery verification |
| **Management** | Digest summaries | Executive reporting |

### 2.2 Competitive Positioning

Most vulnerability tools offer basic email alerts. Stella Ops differentiates with:
- **Rules-based routing** with fine-grained matchers
- **Native Slack/Teams integration** with rich formatting
- **Digest windows** to prevent alert fatigue
- **Cryptographic ack tokens** for approval workflows
- **Tenant isolation** with quota controls

---

## 3. Rules Engine

### 3.1 Rule Structure

```yaml
name: "critical-alerts-prod"
enabled: true
tenant: "acme-corp"

match:
  eventKinds:
    - "scanner.report.ready"
    - "scheduler.rescan.delta"
    - "zastava.admission"
  namespaces: ["prod-*"]
  repos: ["ghcr.io/acme/*"]
  minSeverity: "high"
  kev: true
  verdict: ["fail", "deny"]
  vex:
    includeRejectedJustifications: false

actions:
  - channel: "slack:sec-alerts"
    template: "concise"
    throttle: "5m"

  - channel: "email:soc"
    digest: "hourly"
    template: "detailed"
```

### 3.2 Matcher Types

| Matcher | Description | Example |
|---------|-------------|---------|
| `eventKinds` | Event type filter | `["scanner.report.ready"]` |
| `namespaces` | Namespace patterns | `["prod-*", "staging"]` |
| `repos` | Repository patterns | `["ghcr.io/acme/*"]` |
| `minSeverity` | Minimum severity | `"high"` |
| `kev` | KEV-tagged required | `true` |
| `verdict` | Report/admission verdict | `["fail", "deny"]` |
| `labels` | Kubernetes labels | `{"env": "production"}` |

### 3.3 Evaluation Order

1. **Tenant check** - Discard if rule tenant ≠ event tenant
2. **Kind filter** - Early discard for non-matching kinds
3. **Scope match** - Namespace/repo/label matching
4. **Delta gates** - Severity threshold evaluation
5. **VEX gate** - Filter based on VEX status
6. **Throttle/dedup** - Idempotency key check
7. **Actions** - Enqueue per-channel jobs

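The first four steps of the evaluation order, plus the `verdict` matcher, can be sketched in Python as follows. This is an illustration under stated assumptions, not the Notify implementation: glob semantics are borrowed from `fnmatch`, the severity ordering is assumed, and `maxSeverity` is a hypothetical payload field introduced only so the threshold has something to compare against. The VEX gate, throttling, and action enqueueing are left out.

```python
from fnmatch import fnmatchcase

# Assumed ordering; the source only names "high" and "critical".
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def matches_any(value, patterns):
    """True when no patterns are configured or any glob pattern matches."""
    return not patterns or any(fnmatchcase(value, p) for p in patterns)


def rule_matches(rule, event):
    """Apply tenant, kind, scope, severity, and verdict gates in order."""
    match = rule.get("match", {})

    # 1. Tenant check (disabled rules are discarded here too).
    if not rule.get("enabled", True) or rule["tenant"] != event["tenant"]:
        return False

    # 2. Kind filter: cheap early discard.
    kinds = match.get("eventKinds")
    if kinds and event["kind"] not in kinds:
        return False

    # 3. Scope match on namespace and repository globs.
    scope = event.get("scope", {})
    if not matches_any(scope.get("namespace", ""), match.get("namespaces")):
        return False
    if not matches_any(scope.get("repo", ""), match.get("repos")):
        return False

    # 4. Severity threshold ("maxSeverity" is hypothetical).
    payload = event.get("payload", {})
    min_sev = match.get("minSeverity")
    if min_sev is not None:
        sev = payload.get("maxSeverity", "low")
        if SEVERITY_ORDER.index(sev) < SEVERITY_ORDER.index(min_sev):
            return False

    # Verdict matcher from the table in 3.2.
    verdicts = match.get("verdict")
    if verdicts and payload.get("verdict") not in verdicts:
        return False
    return True
```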
---

## 4. Channel Connectors

### 4.1 Built-in Channels

| Channel | Features | Rate Limits |
|---------|----------|-------------|
| **Slack** | Blocks, threads, reactions | 1 msg/sec per channel |
| **Teams** | Adaptive Cards, webhooks | 4 msgs/sec |
| **Email** | HTML+text, attachments | Relay-dependent |
| **Webhook** | JSON, HMAC signing | 10 req/sec |

### 4.2 Channel Configuration

```yaml
channels:
  - name: "slack:sec-alerts"
    type: slack
    config:
      channel: "#security-alerts"
      workspace: "acme-corp"
    secretRef: "ref://notify/slack-token"

  - name: "email:soc"
    type: email
    config:
      to: ["soc@acme.com"]
      from: "stellaops@acme.com"
      smtpHost: "smtp.acme.com"
    secretRef: "ref://notify/smtp-creds"

  - name: "webhook:siem"
    type: webhook
    config:
      url: "https://siem.acme.com/api/events"
      signMethod: "ed25519"
      signKeyRef: "ref://notify/webhook-key"
```

### 4.3 Connector Contract

```csharp
public interface INotifyConnector
{
    string Type { get; }
    Task<DeliveryResult> SendAsync(DeliveryContext ctx, CancellationToken ct);
    Task<HealthResult> HealthAsync(ChannelConfig cfg, CancellationToken ct);
}
```

---

## 5. Noise Control

### 5.1 Throttling

- **Per-action throttle** - Suppress duplicates within window
- **Idempotency key** - `hash(ruleId | actionId | event.kind | scope.digest | day)`
- **Configurable windows** - 5m, 15m, 1h, 1d

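A minimal Python sketch of the idempotency key and a per-key throttle window follows. The source does not name the hash function, the field delimiter, or how `day` is derived, so SHA-256, a literal `|` separator, and the UTC calendar day are assumptions here, and the in-memory `Throttle` class is a hypothetical stand-in for whatever store the engine actually uses.

```python
import hashlib
from datetime import datetime, timezone


def idempotency_key(rule_id, action_id, event_kind, scope_digest, ts):
    """Hash the key fields with the UTC day; ts must be timezone-aware."""
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    material = "|".join([rule_id, action_id, event_kind, scope_digest, day])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


class Throttle:
    """Suppress repeats of the same key inside a fixed window (seconds)."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self._last_sent = {}  # key -> time of the last delivered send

    def should_send(self, key, now):
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window:
            return False  # duplicate within the window
        self._last_sent[key] = now
        return True


ts = datetime(2025, 11, 29, 12, 0, tzinfo=timezone.utc)
key = idempotency_key("rule-1", "action-1", "scanner.report.ready", "sha256:abc", ts)
throttle = Throttle(window_seconds=300)  # the "5m" window
first = throttle.should_send(key, now=0.0)     # delivered
second = throttle.should_send(key, now=120.0)  # suppressed
```

Suppressed sends do not extend the window in this sketch; whether the real engine uses a fixed or sliding window is not stated in the source.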
### 5.2 Digest Windows

```yaml
actions:
  - channel: "email:weekly-summary"
    digest: "weekly"
    digestOptions:
      maxItems: 100
      groupBy: ["severity", "namespace"]
    template: "digest-summary"
```

**Behavior:**
- Coalesce events within window
- Summarize top N items with counts
- Flush on window close or max items
- Safe truncation with "and X more" links

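The coalesce, summarize, and truncate behavior can be illustrated with a short Python sketch. It assumes events arrive as flat dictionaries keyed by the `groupBy` field names, and that "top N" means the most frequent groups; `build_digest` and `top_n` are hypothetical names, and the source does not say how ties are ordered, so the sort by group key is only there to keep output deterministic.

```python
from collections import Counter


def build_digest(events, group_by, top_n):
    """Coalesce events into grouped counts and truncate to the top N groups."""
    counts = Counter(
        tuple(str(event.get(field, "unknown")) for field in group_by)
        for event in events
    )
    # Most frequent first; ties broken by group key for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    shown = ranked[:top_n]
    lines = [f"{', '.join(group)}: {count}" for group, count in shown]
    omitted = len(ranked) - len(shown)
    if omitted > 0:
        lines.append(f"and {omitted} more")
    return lines


events = [
    {"severity": "critical", "namespace": "prod"},
    {"severity": "critical", "namespace": "prod"},
    {"severity": "critical", "namespace": "prod"},
    {"severity": "high", "namespace": "prod"},
    {"severity": "high", "namespace": "staging"},
]
summary = build_digest(events, group_by=["severity", "namespace"], top_n=2)
# summary == ["critical, prod: 3", "high, prod: 1", "and 1 more"]
```

A real digest would render the "and X more" line as a link back to the UI; the sketch only produces the text.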
### 5.3 Quiet Hours

```yaml
notify:
  quietHours:
    enabled: true
    window: "22:00-06:00"
    timezone: "America/New_York"
    minSeverity: "critical"
```

During quiet hours, only alerts at or above `minSeverity` (critical in this example) are delivered immediately; all others are deferred to digests.

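Because the example window crosses midnight, the containment check is easy to get wrong. The Python sketch below shows one way to evaluate it, assuming the window start is inclusive and the end exclusive, and assuming a four-level severity ordering; both function names are hypothetical. It takes a `tzinfo` object so it can run without a time zone database. In practice `zoneinfo.ZoneInfo("America/New_York")` would be passed, while the usage lines use a fixed UTC-5 offset as a stand-in.

```python
from datetime import datetime, time, timedelta, timezone

# Assumed ordering; the source only names "high" and "critical".
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def in_quiet_hours(now, window, tz):
    """True when `now` (timezone-aware) falls inside "HH:MM-HH:MM" in tz."""
    start_text, end_text = window.split("-")
    start = time.fromisoformat(start_text)
    end = time.fromisoformat(end_text)
    local = now.astimezone(tz).time()
    if start <= end:
        return start <= local < end
    # Window wraps past midnight, e.g. 22:00-06:00.
    return local >= start or local < end


def deliver_immediately(severity, now, window, tz, min_severity="critical"):
    """Outside quiet hours always send; inside, only at or above the floor."""
    if not in_quiet_hours(now, window, tz):
        return True
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(min_severity)


eastern_standard = timezone(timedelta(hours=-5))  # stand-in for the zone
late_night = datetime(2025, 11, 29, 4, 0, tzinfo=timezone.utc)  # 23:00 local
deferred = not deliver_immediately("high", late_night, "22:00-06:00", eastern_standard)
```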
---

## 6. Templates & Rendering

### 6.1 Template Engine

- Handlebars-style safe templates
- No arbitrary code execution
- Deterministic outputs (stable property order)
- Locale-aware formatting

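To make "safe" and "no arbitrary code execution" concrete, here is a deliberately tiny Python sketch of logic-free placeholder substitution. It is not the Notify template engine and covers only `{{dotted.path}}` lookups, with no helpers, loops, or locale formatting; `render` and `lookup` are hypothetical names. Values are only ever read from the context dictionary and never evaluated.

```python
import re
from html import escape

# Matches {{ dotted.path }} placeholders only; nothing is ever evaluated.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def lookup(context, path):
    """Walk a dotted path through nested dicts; missing keys render empty."""
    value = context
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return ""
        value = value[part]
    return value


def render(template, context, html=False):
    """Substitute placeholders, HTML-escaping values for email bodies."""
    def substitute(match):
        text = str(lookup(context, match.group(1)))
        return escape(text) if html else text

    return _PLACEHOLDER.sub(substitute, template)


context = {"payload": {"verdict": "fail"}, "scope": {"repo": "ghcr.io/acme/api"}}
line = render("Policy {{payload.verdict}}: {{ scope.repo }}", context)
# line == "Policy fail: ghcr.io/acme/api"
```

The same input always yields the same output, which is the property the determinism bullet asks for.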
### 6.2 Template Variables

| Variable | Description |
|----------|-------------|
| `event.kind` | Event type |
| `event.ts` | Timestamp |
| `scope.namespace` | Kubernetes namespace |
| `scope.repo` | Repository |
| `scope.digest` | Image digest |
| `payload.verdict` | Policy verdict |
| `payload.delta.newCritical` | New critical count |
| `payload.links.ui` | UI deep link |
| `topFindings[]` | Top N findings |

### 6.3 Channel-Specific Rendering

**Slack:**
```json
{
  "blocks": [
    {"type": "header", "text": {"type": "plain_text", "text": "Policy FAIL: nginx:latest"}},
    {"type": "section", "text": {"type": "mrkdwn", "text": "*2 critical*, 3 high vulnerabilities"}}
  ]
}
```

**Email:**
```html
<h2>Policy FAIL: nginx:latest</h2>
<table>
  <tr><td>Critical</td><td>2</td></tr>
  <tr><td>High</td><td>3</td></tr>
</table>
<a href="https://ui.internal/reports/...">View Details</a>
```

---

## 7. Ack Tokens

### 7.1 Token Structure

DSSE-signed tokens for one-click acknowledgements:

```json
{
  "payloadType": "application/vnd.stellaops.notify-ack-token+json",
  "payload": {
    "tenant": "acme-corp",
    "deliveryId": "delivery-123",
    "notificationId": "notif-456",
    "channel": "slack:sec-alerts",
    "webhookUrl": "https://notify.internal/ack",
    "nonce": "random-nonce",
    "actions": ["acknowledge", "escalate"],
    "expiresAt": "2025-11-29T13:00:00Z"
  },
  "signatures": [{"keyid": "notify-ack-key-01", "sig": "..."}]
}
```

### 7.2 Token Workflow

1. **Issue** - `POST /notify/ack-tokens/issue`
2. **Embed** - Token included in message action button
3. **Click** - User clicks button, token sent to webhook
4. **Verify** - `POST /notify/ack-tokens/verify`
5. **Audit** - Ack event recorded

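The verify step has two parts that a Python sketch can illustrate. `pae` builds the DSSE pre-authentication encoding, which is the byte string a DSSE signature covers. `validate_claims` then checks the decoded payload fields from 7.1. The signature check itself is omitted because it depends on the key type, and the check order, the return strings, and the in-memory nonce set are assumptions for illustration, not the behavior of the `/notify/ack-tokens/verify` endpoint.

```python
from datetime import datetime, timezone


def pae(payload_type, payload):
    """DSSE pre-authentication encoding: the bytes that get signed."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (
        len(type_bytes), type_bytes, len(payload), payload,
    )


def validate_claims(claims, action, now, seen_nonces):
    """Check expiry, permitted action, and single use (after signature check)."""
    # Normalize the trailing "Z" so older Python versions can parse it.
    expires = datetime.fromisoformat(claims["expiresAt"].replace("Z", "+00:00"))
    if now >= expires:
        return "expired"
    if action not in claims["actions"]:
        return "action-not-permitted"
    if claims["nonce"] in seen_nonces:
        return "replayed"
    seen_nonces.add(claims["nonce"])  # consume the nonce on first use
    return "ok"


claims = {
    "nonce": "random-nonce",
    "actions": ["acknowledge", "escalate"],
    "expiresAt": "2025-11-29T13:00:00Z",
}
seen = set()
now = datetime(2025, 11, 29, 12, 30, tzinfo=timezone.utc)
first_click = validate_claims(claims, "acknowledge", now, seen)   # "ok"
second_click = validate_claims(claims, "acknowledge", now, seen)  # "replayed"
```

A production verifier would keep consumed nonces in shared storage at least until `expiresAt`, so a second click against another replica is also rejected.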
### 7.3 Token Rotation

```bash
# Rotate ack token signing key
stella notify rotate-ack-key --key-source kms://notify/ack-key
```

---

## 8. Implementation Strategy

### 8.1 Phase 1: Core Engine (Complete)

- [x] Rules engine with matchers
- [x] Slack connector
- [x] Teams connector
- [x] Email connector
- [x] Webhook connector

### 8.2 Phase 2: Noise Control (Complete)

- [x] Throttling
- [x] Digest windows
- [x] Idempotency
- [x] Quiet hours

### 8.3 Phase 3: Ack Tokens (In Progress)

- [x] Token issuance
- [x] Token verification
- [ ] Token rotation API (NOTIFY-ACK-45-001)
- [ ] Escalation workflows (NOTIFY-ESC-46-001)

### 8.4 Phase 4: Advanced Features (Planned)

- [ ] PagerDuty connector
- [ ] Jira ticket creation
- [ ] In-app notifications
- [ ] Anomaly suppression

---

## 9. API Surface

### 9.1 Channels

| Endpoint | Method | Scope | Description |
|----------|--------|-------|-------------|
| `/api/v1/notify/channels` | GET/POST | `notify.read/admin` | List/create channels |
| `/api/v1/notify/channels/{id}` | GET/PATCH/DELETE | `notify.admin` | Manage channel |
| `/api/v1/notify/channels/{id}/test` | POST | `notify.admin` | Send test message |
| `/api/v1/notify/channels/{id}/health` | GET | `notify.read` | Health check |

### 9.2 Rules

| Endpoint | Method | Scope | Description |
|----------|--------|-------|-------------|
| `/api/v1/notify/rules` | GET/POST | `notify.read/admin` | List/create rules |
| `/api/v1/notify/rules/{id}` | GET/PATCH/DELETE | `notify.admin` | Manage rule |
| `/api/v1/notify/rules/{id}/test` | POST | `notify.admin` | Dry-run rule |

### 9.3 Deliveries

| Endpoint | Method | Scope | Description |
|----------|--------|-------|-------------|
| `/api/v1/notify/deliveries` | GET | `notify.read` | List deliveries |
| `/api/v1/notify/deliveries/{id}` | GET | `notify.read` | Delivery detail |
| `/api/v1/notify/deliveries/{id}/retry` | POST | `notify.admin` | Retry delivery |

---

## 10. Event Sources

### 10.1 Subscribed Events

| Event | Source | Typical Actions |
|-------|--------|-----------------|
| `scanner.scan.completed` | Scanner | Immediate/digest |
| `scanner.report.ready` | Scanner | Immediate |
| `scheduler.rescan.delta` | Scheduler | Immediate/digest |
| `attestor.logged` | Attestor | Immediate |
| `zastava.admission` | Zastava | Immediate |
| `conselier.export.completed` | Concelier | Digest |
| `excitor.export.completed` | Excititor | Digest |

### 10.2 Event Envelope

```json
{
  "eventId": "uuid",
  "kind": "scanner.report.ready",
  "tenant": "acme-corp",
  "ts": "2025-11-29T12:00:00Z",
  "actor": "scanner-webservice",
  "scope": {
    "namespace": "production",
    "repo": "ghcr.io/acme/api",
    "digest": "sha256:..."
  },
  "payload": {
    "reportId": "report-123",
    "verdict": "fail",
    "summary": {"total": 12, "blocked": 2},
    "delta": {"newCritical": 1, "kev": ["CVE-2025-..."]}
  }
}
```

---

## 11. Observability

### 11.1 Metrics

- `notify.events_consumed_total{kind}`
- `notify.rules_matched_total{ruleId}`
- `notify.throttled_total{reason}`
- `notify.digest_coalesced_total{window}`
- `notify.sent_total{channel}`
- `notify.failed_total{channel,code}`
- `notify.delivery_latency_seconds{channel}`

### 11.2 SLO Targets

| Metric | Target |
|--------|--------|
| Event-to-delivery p95 | < 60 seconds |
| Failure rate | < 0.5% per hour |
| Duplicate rate | ~0% |

---

## 12. Security Considerations

### 12.1 Secret Management

- Secrets stored as references only
- Just-in-time fetch at send time
- No plaintext secrets in MongoDB

### 12.2 Webhook Signing

```
X-StellaOps-Signature: t=1764417600,v1=abc123...
X-StellaOps-Timestamp: 2025-11-29T12:00:00Z
```

- HMAC-SHA256 or Ed25519
- Replay window protection
- Canonical body hash

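A receiver-side check for the HMAC-SHA256 variant might look like the Python sketch below. The source shows the header format but not the exact string that is signed, so the message layout used here (`<t>.<sha256 hex of the body>`), the hex encoding of `v1`, and the 300-second replay tolerance are all assumptions; treat the functions as hypothetical and take the real layout from the Notify schemas.

```python
import hashlib
import hmac


def sign(secret, timestamp, body):
    """Build a header value of the form t=<unix seconds>,v1=<hex mac>."""
    body_hash = hashlib.sha256(body).hexdigest()
    # Assumed signed message layout: "<timestamp>.<body hash>".
    message = f"{timestamp}.{body_hash}".encode("ascii")
    mac = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"t={timestamp},v1={mac}"


def verify(secret, header, body, now, tolerance=300):
    """Reject malformed headers, stale timestamps, and bad signatures."""
    try:
        parts = dict(item.split("=", 1) for item in header.split(","))
        timestamp = int(parts["t"])
        received = parts["v1"]
    except (KeyError, ValueError):
        return False
    if abs(now - timestamp) > tolerance:
        return False  # outside the replay window
    expected = sign(secret, timestamp, body).split("v1=", 1)[1]
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, received)


secret = b"shared-webhook-secret"
body = b'{"kind":"scanner.report.ready"}'
header = sign(secret, 1764417600, body)
accepted = verify(secret, header, body, now=1764417610)
```

Hashing the body first keeps the signed message small and mirrors the "canonical body hash" bullet; it only works if sender and receiver agree on the canonical byte form of the body.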
### 12.3 Loop Prevention

- Webhook target allowlist
- Event origin tags
- Own webhooks rejected

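The allowlist and own-webhook checks can be sketched as a single host comparison in Python. The function name, the exact-host matching, and the HTTPS-only requirement are assumptions made for illustration; event origin tags are not modeled.

```python
from urllib.parse import urlsplit


def is_allowed_target(url, allowed_hosts, own_hosts):
    """Accept only HTTPS targets on the allowlist that are not Notify itself."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme != "https" or not host:
        return False
    if host in own_hosts:
        return False  # would loop events back into Notify
    return host in allowed_hosts


allowed = {"siem.acme.com"}
own = {"notify.internal"}
siem_ok = is_allowed_target("https://siem.acme.com/api/events", allowed, own)
loop_ok = is_allowed_target("https://notify.internal/ack", allowed | own, own)
```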
---

## 13. Related Documentation

| Resource | Location |
|----------|----------|
| Notify architecture | `docs/modules/notify/architecture.md` |
| Channel schemas | `docs/modules/notify/resources/schemas/` |
| Sample payloads | `docs/modules/notify/resources/samples/` |
| Bootstrap pack | `docs/modules/notify/bootstrap-pack.md` |

---

## 14. Sprint Mapping

- **Primary Sprint:** SPRINT_0170_0001_0001_notify_engine.md (NEW)
- **Related Sprints:**
  - SPRINT_0171_0001_0002_notify_connectors.md
  - SPRINT_0172_0001_0003_notify_ack_tokens.md

**Key Task IDs:**
- `NOTIFY-ENGINE-40-001` - Rules engine (DONE)
- `NOTIFY-CONN-41-001` - Connectors (DONE)
- `NOTIFY-NOISE-42-001` - Throttling/digests (DONE)
- `NOTIFY-ACK-45-001` - Token rotation (IN PROGRESS)
- `NOTIFY-ESC-46-001` - Escalation workflows (TODO)

---

## 15. Success Metrics

| Metric | Target |
|--------|--------|
| Delivery latency | < 60s p95 |
| Delivery success rate | > 99.5% |
| Duplicate rate | < 0.01% |
| Rule evaluation time | < 10ms |
| Channel health | 99.9% uptime |

---

*Last updated: 2025-11-29*