fix tests. new product advisories enhancements
42
docs/notifications/operations/alerts/notify-slo-alerts.yaml
Normal file
@@ -0,0 +1,42 @@
# Notify SLO Alerts
# Prometheus alerting rules for the notification service

groups:
  - name: notify-slo
    rules:
      - alert: NotifyDeliverySuccessSLO
        expr: |
          (
            sum(rate(notify_delivery_success_total[5m])) /
            sum(rate(notify_delivery_total[5m]))
          ) < 0.99
        for: 5m
        labels:
          severity: critical
          service: notify
        annotations:
          summary: "Notification delivery success rate below SLO"
          description: "Current success rate: {{ $value | humanizePercentage }}"

      - alert: NotifyBacklogDepth
        expr: notify_backlog_depth > 10000
        for: 10m
        labels:
          severity: warning
          service: notify
        annotations:
          summary: "Notification backlog depth high"
          description: "Current backlog: {{ $value }} notifications"

      - alert: NotifyLatencyP99
        expr: |
          histogram_quantile(0.99,
            sum(rate(notify_delivery_duration_seconds_bucket[5m])) by (le)
          ) > 5
        for: 5m
        labels:
          severity: warning
          service: notify
        annotations:
          summary: "Notification delivery P99 latency high"
          description: "P99 latency: {{ $value | humanizeDuration }}"
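
The NotifyDeliverySuccessSLO rule compares the ratio of successful to total deliveries against 0.99. As a rough illustration only, the comparison might be sketched in Python as below. The function names and sample rates are hypothetical, Prometheus itself evaluates the real rule, and the `for: 5m` hold period is not modeled here.

```python
import math

SLO_TARGET = 0.99  # threshold taken from the NotifyDeliverySuccessSLO expr


def success_ratio(success_rate: float, total_rate: float) -> float:
    """Mirror the PromQL division: 0/0 yields NaN, x/0 yields +Inf."""
    if total_rate == 0:
        return math.nan if success_rate == 0 else math.inf
    return success_rate / total_rate


def below_slo(success_rate: float, total_rate: float) -> bool:
    """True when the ratio is under the SLO target.

    NaN compares false against any number, so a window with no
    traffic does not count as a breach in this sketch.
    """
    return success_ratio(success_rate, total_rate) < SLO_TARGET


if __name__ == "__main__":
    # Hypothetical per-second rates for three 5-minute windows.
    for ok, total in [(98.0, 100.0), (99.5, 100.0), (0.0, 0.0)]:
        print(ok, total, below_slo(ok, total))
```

One consequence worth noting: a complete outage that also stops the `notify_delivery_total` counter from increasing would not trip this rule, so the backlog alert is likely the one that catches that case.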
32
docs/notifications/operations/quotas.md
Normal file
@@ -0,0 +1,32 @@
# Notification Quotas

This document describes the quota system for notification delivery.

## Overview

Quotas ensure fair usage of the notification system across tenants.

## Quota Types

### Daily Limits
- Maximum notifications per day per tenant
- Maximum notifications per channel per day

### Rate Limits
- Maximum notifications per minute
- Maximum notifications per second per channel

### Size Limits
- Maximum payload size
- Maximum attachment count

## Quota Enforcement

When a quota is exceeded:
1. The notification is queued for later delivery
2. The tenant is notified that the quota was exceeded
3. An admin alert is triggered if the threshold is reached
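
The first enforcement step (queue instead of deliver) can be sketched for the per-tenant daily limit as follows. This is a minimal sketch under stated assumptions: the `DailyQuota` class, the `"deliver"`/`"queue"` return values, and the limit of 2 are hypothetical, and the source does not say how counters are stored or reset.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DailyQuota:
    """Hypothetical in-memory counter for a per-tenant daily limit."""

    limit: int  # assumed limit; real values are configured per tenant
    counts: dict = field(default_factory=lambda: defaultdict(int))

    def check(self, tenant: str, day: date) -> str:
        """Return "deliver" while under the limit, otherwise "queue"."""
        key = (tenant, day)
        if self.counts[key] >= self.limit:
            return "queue"  # over quota: defer rather than drop
        self.counts[key] += 1
        return "deliver"


if __name__ == "__main__":
    quota = DailyQuota(limit=2)
    today = date(2025, 1, 1)
    print([quota.check("tenant-a", today) for _ in range(3)])
```

Keying the counter on `(tenant, day)` means a new day starts from zero without an explicit reset job, which is one simple way to get a daily window.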

## Configuration

Quotas are configured per tenant and can be overridden by administrators.
38
docs/notifications/operations/retries.md
Normal file
@@ -0,0 +1,38 @@
# Notification Retries

This document describes the retry mechanism for failed notification deliveries.

## Overview

The retry system ensures reliable notification delivery even when temporary failures occur.

## Retry Strategy

### Exponential Backoff
- Initial delay: 5 seconds
- Maximum delay: 1 hour
- Backoff multiplier: 2x

### Retry Limits
- Maximum attempts: 10
- Maximum retry duration: 24 hours

### Retry Conditions
- Network errors: Always retry
- HTTP 5xx errors: Always retry
- HTTP 429 (rate limit): Retry, honoring the Retry-After header
- Other HTTP 4xx errors: Do not retry (permanent failure)
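
The strategy, limits, and conditions above can be combined into a short sketch. Everything named here (`backoff_delay`, `should_retry`, the convention that a `None` status means a network error, and counting the first delivery as attempt 1) is an assumption made for illustration, not the service's actual implementation. Only the numeric form of `Retry-After` is handled.

```python
from typing import Optional

INITIAL_DELAY_S = 5                 # initial delay from the document
MAX_DELAY_S = 60 * 60               # maximum delay: 1 hour
MULTIPLIER = 2                      # backoff multiplier
MAX_ATTEMPTS = 10                   # assumed to include the first attempt
MAX_RETRY_DURATION_S = 24 * 60 * 60


def backoff_delay(retry_number: int, retry_after_s: Optional[int] = None) -> int:
    """Seconds to wait before the given retry (1-based).

    A Retry-After value in seconds, when present, takes precedence.
    """
    if retry_after_s is not None:
        return retry_after_s
    return min(INITIAL_DELAY_S * MULTIPLIER ** (retry_number - 1), MAX_DELAY_S)


def should_retry(status: Optional[int], attempts_made: int, elapsed_s: float) -> bool:
    """Apply the retry limits first, then the retry conditions."""
    if attempts_made >= MAX_ATTEMPTS or elapsed_s >= MAX_RETRY_DURATION_S:
        return False
    if status is None:              # network error: always retry
        return True
    if status == 429 or 500 <= status <= 599:
        return True
    return False                    # other 4xx (and anything else): permanent


if __name__ == "__main__":
    print([backoff_delay(n) for n in range(1, 12)])
```

With these values the uncapped delay first exceeds one hour at the eleventh retry (5 × 2^10 = 5120 seconds), so under a 10-attempt limit the one-hour maximum would only matter if the initial delay or multiplier were raised.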

## Dead Letter Queue

Notifications that exceed retry limits are moved to the dead letter queue for:
- Manual inspection
- Automatic alerting
- Scheduled reprocessing

## Monitoring

Retry metrics are exposed for:
- Retry count per notification
- Success rate after retries
- Average retry duration
27
docs/notifications/schemas/notify-schemas-catalog.json
Normal file
@@ -0,0 +1,27 @@
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://docs.stella-ops.org/notifications/schemas/notify-schemas-catalog.json",
  "title": "Notify Schemas Catalog",
  "description": "Catalog of all notification schemas",
  "type": "object",
  "properties": {
    "version": {
      "type": "string",
      "const": "1.0.0"
    },
    "schemas": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "version": { "type": "string" },
          "description": { "type": "string" },
          "path": { "type": "string" }
        },
        "required": ["name", "version", "path"]
      }
    }
  },
  "required": ["version", "schemas"]
}
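
To make the constraints in this schema concrete, the sketch below checks a catalog document by hand using only the Python standard library. It is not a JSON Schema validator and covers only the keywords used above. The `catalog_errors` helper and the sample entry (`notify-event` and its path) are hypothetical.

```python
import json
from typing import Any, List


def catalog_errors(doc: Any) -> List[str]:
    """Hand-rolled check of the constraints stated in the catalog schema."""
    if not isinstance(doc, dict):
        return ["catalog must be an object"]
    errors: List[str] = []
    if doc.get("version") != "1.0.0":  # required, and const "1.0.0"
        errors.append('version must be the string "1.0.0"')
    schemas = doc.get("schemas")
    if not isinstance(schemas, list):  # required array
        errors.append("schemas must be an array")
        return errors
    for i, entry in enumerate(schemas):
        if not isinstance(entry, dict):
            errors.append(f"schemas[{i}] must be an object")
            continue
        for key in ("name", "version", "path"):  # required string fields
            if not isinstance(entry.get(key), str):
                errors.append(f"schemas[{i}].{key} must be a string")
        if "description" in entry and not isinstance(entry["description"], str):
            errors.append(f"schemas[{i}].description must be a string")
    return errors


if __name__ == "__main__":
    # Hypothetical catalog instance; the entry name and path are made up.
    sample = json.loads(
        '{"version": "1.0.0", "schemas": ['
        '{"name": "notify-event", "version": "1.0.0", "path": "notify-event.json"}]}'
    )
    print(catalog_errors(sample))
```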
28
docs/notifications/security/redaction-catalog.md
Normal file
@@ -0,0 +1,28 @@
# Redaction Catalog

This document catalogs the redaction rules applied to notification payloads.

## Overview

The redaction catalog ensures that sensitive information is not exposed in notifications.

## Redaction Rules

### Personally Identifiable Information (PII)
- Email addresses are partially redacted
- IP addresses are anonymized
- User names are replaced with user IDs

### Credentials
- API keys are fully redacted
- Passwords are never included
- Tokens are truncated to first/last 4 characters

### Internal Data
- Internal URLs are replaced with public equivalents
- Database IDs are not exposed
- Stack traces are summarized
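
A few of these rules can be illustrated with a short sketch. The exact formats are assumptions: the source does not say which part of an email is kept, how IP addresses are anonymized, or what placeholder is used, so the choices below (keep the first character and the domain, zero the last IPv4 octet, `[REDACTED]`) are hypothetical, as are the function names.

```python
import ipaddress

PLACEHOLDER = "[REDACTED]"  # hypothetical placeholder text


def mask_email(email: str) -> str:
    """Partially redact an email: keep the first character and the domain."""
    local, sep, domain = email.partition("@")
    if not sep or not local or not domain:
        return PLACEHOLDER  # not a recognizable address: hide it entirely
    return f"{local[0]}***@{domain}"


def anonymize_ipv4(ip: str) -> str:
    """Anonymize an IPv4 address by zeroing its last octet (assumed policy)."""
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(network.network_address)


def truncate_token(token: str) -> str:
    """Keep only the first and last 4 characters of a token."""
    if len(token) <= 8:
        return PLACEHOLDER  # too short to truncate without revealing it
    return f"{token[:4]}...{token[-4:]}"


if __name__ == "__main__":
    print(mask_email("alice@example.com"))
    print(anonymize_ipv4("192.0.2.57"))
    print(truncate_token("abcd1234efgh5678"))
```

Short tokens are fully redacted in this sketch because showing the first and last 4 characters of an 8-character token would expose all of it.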

## Configuration

Redaction rules can be customized per tenant and notification channel.
19
docs/notifications/security/tenant-approvals.md
Normal file
@@ -0,0 +1,19 @@
# Tenant Approvals

This document describes the tenant approval process for notification delivery.

## Overview

Tenant approvals ensure that notifications are only sent to approved tenants with proper configuration.

## Approval Process

1. Tenant submits a request for notification access
2. Admin reviews the request and approves or denies it
3. Approved tenants can configure notification channels

## Security Considerations

- All approval decisions are logged for audit purposes
- Approvals can be revoked at any time
- Cross-tenant notifications are blocked by default
22
docs/notifications/security/webhook-ack-hardening.md
Normal file
@@ -0,0 +1,22 @@
# Webhook Acknowledgment Hardening

This document describes the security measures for webhook acknowledgment validation.

## Overview

Webhook acknowledgment hardening ensures that webhook deliveries are properly verified and acknowledged.

## Security Measures

- HMAC signature verification for all webhook payloads
- Timeout handling for slow webhook endpoints
- Retry logic with exponential backoff
- Dead letter queue for failed deliveries

## Configuration

Webhook endpoints must be configured with:
- Secret key for HMAC signing
- Signature header name
- Timeout duration
- Maximum retry attempts
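
HMAC signature verification with a configured secret and header name might look like the sketch below. The hash algorithm (SHA-256), the hex encoding, and the header name `X-Notify-Signature` are assumptions, since the source names none of them. The helper names are hypothetical too.

```python
import hashlib
import hmac
from typing import Mapping

SIGNATURE_HEADER = "X-Notify-Signature"  # hypothetical header name


def sign_payload(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC of the raw request body (SHA-256 is assumed)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_payload(secret: bytes, body: bytes, headers: Mapping[str, str]) -> bool:
    """Recompute the signature and compare it in constant time."""
    received = headers.get(SIGNATURE_HEADER, "")
    expected = sign_payload(secret, body)
    # Compare as bytes so a non-ASCII header value cannot raise TypeError.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


if __name__ == "__main__":
    secret = b"example-secret"           # placeholder, not a real key
    body = b'{"event": "delivered"}'
    headers = {SIGNATURE_HEADER: sign_payload(secret, body)}
    print(verify_payload(secret, body, headers))
```

`hmac.compare_digest` is used instead of `==` so the comparison time does not reveal how many leading characters of a forged signature were correct.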
4
docs/notifications/simulations/index.ndjson
Normal file
@@ -0,0 +1,4 @@
{"simulation_id": "sim-001", "name": "High Volume Burst", "description": "Simulates a burst of 10000 notifications in 1 minute", "tenant": "test-tenant", "status": "ready"}
{"simulation_id": "sim-002", "name": "Rate Limit Test", "description": "Simulates hitting rate limits across all channels", "tenant": "test-tenant", "status": "ready"}
{"simulation_id": "sim-003", "name": "Retry Storm", "description": "Simulates webhook endpoints returning 500 errors causing retries", "tenant": "test-tenant", "status": "ready"}
{"simulation_id": "sim-004", "name": "Multi-Tenant Isolation", "description": "Validates tenant isolation with concurrent notifications", "tenant": "test-tenant", "status": "ready"}
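
Since the index is newline-delimited JSON, each line parses on its own. A minimal reader might look like the sketch below. The `load_index` helper is hypothetical, and the embedded sample abbreviates two of the records above by omitting their `description` fields.

```python
import json
from typing import Dict

# Two records from the index, abbreviated (description omitted).
SAMPLE_NDJSON = "\n".join([
    '{"simulation_id": "sim-001", "name": "High Volume Burst", "tenant": "test-tenant", "status": "ready"}',
    '{"simulation_id": "sim-003", "name": "Retry Storm", "tenant": "test-tenant", "status": "ready"}',
])


def load_index(text: str) -> Dict[str, dict]:
    """Parse NDJSON text into a dict keyed by simulation_id."""
    index: Dict[str, dict] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        index[record["simulation_id"]] = record
    return index


if __name__ == "__main__":
    simulations = load_index(SAMPLE_NDJSON)
    ready = [s["name"] for s in simulations.values() if s["status"] == "ready"]
    print(ready)
```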