fix tests. new product advisories enhancements
docs/notifications/operations/alerts/notify-slo-alerts.yaml

# Notify SLO Alerts
# Prometheus alerting rules for the notification service

groups:
  - name: notify-slo
    rules:
      - alert: NotifyDeliverySuccessSLO
        expr: |
          (
            sum(rate(notify_delivery_success_total[5m])) /
            sum(rate(notify_delivery_total[5m]))
          ) < 0.99
        for: 5m
        labels:
          severity: critical
          service: notify
        annotations:
          summary: "Notification delivery success rate below SLO"
          description: "Current success rate: {{ $value | humanizePercentage }}"

      - alert: NotifyBacklogDepth
        expr: notify_backlog_depth > 10000
        for: 10m
        labels:
          severity: warning
          service: notify
        annotations:
          summary: "Notification backlog depth high"
          description: "Current backlog: {{ $value }} notifications"

      - alert: NotifyLatencyP99
        expr: |
          histogram_quantile(0.99,
            sum(rate(notify_delivery_duration_seconds_bucket[5m])) by (le)
          ) > 5
        for: 5m
        labels:
          severity: warning
          service: notify
        annotations:
          summary: "Notification delivery P99 latency high"
          description: "P99 latency: {{ $value | humanizeDuration }}"

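
As a rough illustration of what the three rules check, the following Python sketch re-evaluates their conditions on hand-supplied numbers. It is not Prometheus code. The function names and sample values are hypothetical, and the `for:` pending periods are not modeled. The `histogram_quantile` here is a simplified stand-in: it assumes cumulative `le` buckets ending in `+Inf` and interpolates linearly inside the bucket that holds the requested rank.

```python
import math
from typing import List, Tuple

# Hypothetical re-implementation of the three alert conditions; not Prometheus code.
Bucket = Tuple[float, float]  # (upper bound "le", cumulative count)


def success_ratio(success_rate: float, total_rate: float) -> float:
    """sum(rate(success)) / sum(rate(total)); NaN when there were no deliveries."""
    return success_rate / total_rate if total_rate > 0 else math.nan


def histogram_quantile(q: float, buckets: List[Bucket]) -> float:
    """Simplified histogram_quantile for 0 < q <= 1.

    Buckets are cumulative, sorted by upper bound, and end with +Inf.
    The first bucket's lower bound is taken as 0 (durations are non-negative).
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


def firing_alerts(success_rate: float, total_rate: float,
                  backlog_depth: float, latency_buckets: List[Bucket]) -> List[str]:
    """Names of rules whose expression is true right now (ignores `for:`)."""
    alerts = []
    # NaN < 0.99 is False, so an idle service does not trip the SLO alert.
    if success_ratio(success_rate, total_rate) < 0.99:
        alerts.append("NotifyDeliverySuccessSLO")
    if backlog_depth > 10000:
        alerts.append("NotifyBacklogDepth")
    if histogram_quantile(0.99, latency_buckets) > 5:
        alerts.append("NotifyLatencyP99")
    return alerts


buckets = [(0.5, 900.0), (1.0, 950.0), (5.0, 980.0), (10.0, 1000.0), (math.inf, 1000.0)]
print(firing_alerts(985.0, 1000.0, 12_000, buckets))
# ['NotifyDeliverySuccessSLO', 'NotifyBacklogDepth', 'NotifyLatencyP99']
```

With the sample buckets, rank 990 of 1,000 falls in the 5 to 10 second bucket, so the interpolated P99 is 7.5 seconds.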
docs/notifications/operations/quotas.md

# Notification Quotas

This document describes the quota system for notification delivery.

## Overview

Quotas ensure fair usage of the notification system across tenants.

## Quota Types

### Daily Limits
- Maximum notifications per day per tenant
- Maximum notifications per channel per day

### Rate Limits
- Maximum notifications per minute
- Maximum notifications per second per channel

### Size Limits
- Maximum payload size
- Maximum attachment count
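
One way to enforce these quota types is a per-tenant checker built on fixed-window counters. This is a hypothetical Python sketch, and the document does not say how usage is counted. The `QuotaLimits` fields and default numbers are invented for illustration. The sketch also assumes:

- the per-minute limit applies per tenant;
- per-channel limits are counted per tenant and channel;
- fixed windows (UTC day, minute, second) are used for counting.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuotaLimits:
    # Hypothetical defaults; real limits are configured per tenant.
    daily_per_tenant: int = 10_000
    daily_per_channel: int = 5_000
    per_minute: int = 500             # assumed to be per tenant
    per_second_per_channel: int = 20
    max_payload_bytes: int = 64 * 1024
    max_attachments: int = 5


class QuotaChecker:
    """Fixed-window counters keyed by scope and window; old windows are never evicted."""

    def __init__(self, limits: QuotaLimits) -> None:
        self.limits = limits
        self.counts: defaultdict = defaultdict(int)

    def check(self, tenant: str, channel: str, payload_bytes: int,
              attachments: int, now: float) -> Optional[str]:
        """Return the name of the first violated quota, or None after recording the send.

        `now` is a Unix timestamp in seconds, so `now // 86400` is the UTC day.
        """
        if payload_bytes > self.limits.max_payload_bytes:
            return "max_payload_bytes"
        if attachments > self.limits.max_attachments:
            return "max_attachments"
        day, minute, second = int(now // 86400), int(now // 60), int(now)
        windows = [
            (("day", tenant, day), self.limits.daily_per_tenant, "daily_per_tenant"),
            (("day", tenant, channel, day), self.limits.daily_per_channel, "daily_per_channel"),
            (("min", tenant, minute), self.limits.per_minute, "per_minute"),
            (("sec", tenant, channel, second), self.limits.per_second_per_channel,
             "per_second_per_channel"),
        ]
        # Check every window before counting, so a rejected send uses no quota.
        for key, limit, name in windows:
            if self.counts[key] >= limit:
                return name
        for key, _, _ in windows:
            self.counts[key] += 1
        return None
```

Fixed windows are the simplest choice but allow bursts at window boundaries. A sliding window or token bucket would smooth that out if the real system needs it.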

## Quota Enforcement

Quota violations result in:
1. The notification is queued for later delivery
2. The tenant is notified that the quota was exceeded
3. An admin alert is triggered if the threshold is reached
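
The three enforcement outcomes can be sketched as a single handler. In this hypothetical Python version, "the threshold" is assumed to be a per-tenant count of violations. The in-memory collections stand in for the real deferred-delivery queue, tenant notification and admin alerting channels, none of which this document names.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple


@dataclass
class QuotaEnforcer:
    # Assumption: "the threshold" is a per-tenant count of quota violations.
    admin_alert_threshold: int = 3
    deferred: Deque[Tuple[str, str]] = field(default_factory=deque)
    violations: Dict[str, int] = field(default_factory=dict)
    tenant_notices: List[Tuple[str, str]] = field(default_factory=list)
    admin_alerts: List[str] = field(default_factory=list)

    def on_violation(self, tenant: str, notification_id: str, quota: str) -> None:
        # 1. Queue the notification for later delivery rather than dropping it.
        self.deferred.append((tenant, notification_id))
        # 2. Tell the tenant which quota was exceeded.
        self.tenant_notices.append((tenant, quota))
        # 3. Alert administrators once, when the tenant's count reaches the threshold.
        count = self.violations.get(tenant, 0) + 1
        self.violations[tenant] = count
        if count == self.admin_alert_threshold:
            self.admin_alerts.append(tenant)
```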

## Configuration

Quotas are configured per tenant and can be overridden by administrators.

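
Below is a minimal sketch of per-tenant configuration with administrator overrides. It assumes overrides are applied field by field on top of the tenant's own settings, and that unset fields fall back to defaults. The `Quotas` fields, default values and `effective_quotas` function are hypothetical.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Dict


@dataclass(frozen=True)
class Quotas:
    # Hypothetical fields and defaults, used when neither tenant nor admin sets a value.
    daily_per_tenant: int = 10_000
    per_minute: int = 500
    max_payload_bytes: int = 65_536


def effective_quotas(tenant_config: Dict[str, Any],
                     admin_overrides: Dict[str, Any]) -> Quotas:
    """Apply tenant settings over the defaults, then admin overrides over both."""
    known = {f.name for f in fields(Quotas)}
    merged = {**tenant_config, **admin_overrides}  # later dict wins per key
    unknown = sorted(set(merged) - known)
    if unknown:
        raise ValueError(f"unknown quota settings: {unknown}")
    return replace(Quotas(), **merged)
```

Rejecting unknown keys catches typos in override files instead of silently ignoring them.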
docs/notifications/operations/retries.md

# Notification Retries

This document describes the retry mechanism for failed notification deliveries.

## Overview

The retry system ensures reliable notification delivery even when temporary failures occur.

## Retry Strategy

### Exponential Backoff
- Initial delay: 5 seconds
- Maximum delay: 1 hour
- Backoff multiplier: 2x
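
The backoff parameters translate into a delay schedule. This Python sketch assumes the first retry waits the initial 5 seconds and each later retry doubles the previous delay until the 1-hour cap. `backoff_delay` is a hypothetical name. Jitter, which many systems add, is omitted because the document does not mention it.

```python
def backoff_delay(retry_number: int, initial_s: float = 5.0,
                  multiplier: float = 2.0, max_delay_s: float = 3600.0) -> float:
    """Delay before the given retry (1-based): initial * multiplier**(n - 1), capped."""
    if retry_number < 1:
        raise ValueError("retry_number starts at 1")
    return min(initial_s * multiplier ** (retry_number - 1), max_delay_s)


if __name__ == "__main__":
    for n in range(1, 12):
        print(n, backoff_delay(n))
```

Under these assumptions the 10th retry waits 2,560 seconds. The cap first takes effect at the 11th retry, where 5 * 2^10 = 5,120 seconds is capped to 3,600.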

### Retry Limits
- Maximum attempts: 10
- Maximum retry duration: 24 hours
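
The two limits can be combined into one guard checked before scheduling another attempt. The document does not say whether the 10 attempts include the initial delivery, or where the 24-hour window starts. This hypothetical `within_retry_limits` therefore assumes both: the initial attempt counts, and the window runs from the first attempt.

```python
MAX_ATTEMPTS = 10                 # from the document
MAX_RETRY_DURATION_S = 24 * 3600  # 24 hours, from the document


def within_retry_limits(attempts_made: int, first_attempt_at: float,
                        next_attempt_at: float) -> bool:
    """Return True if another delivery attempt may be scheduled.

    Assumptions: attempts_made includes the initial delivery, and the 24-hour
    limit is measured from the first attempt to the proposed next attempt.
    Timestamps are seconds on a common clock.
    """
    if attempts_made >= MAX_ATTEMPTS:
        return False
    return next_attempt_at - first_attempt_at <= MAX_RETRY_DURATION_S
```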

### Retry Conditions
- Network errors: Always retry
- HTTP 5xx errors: Always retry
- HTTP 429 (rate limit): Retry after the delay given in the `Retry-After` header
- Other HTTP 4xx errors: Do not retry (permanent failure)
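
The retry conditions map to a small decision function. In this hypothetical sketch:

- `status=None` stands for a network error;
- falling back to the regular backoff delay, when a 429 response has no usable `Retry-After` header, is an assumption;
- statuses outside the listed classes are rejected.

`Retry-After` is parsed in both forms HTTP allows: a number of seconds or an HTTP-date.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], now: datetime) -> Optional[float]:
    """Parse a Retry-After value (delay-seconds or HTTP-date) into seconds from now."""
    if not value:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # unparseable header
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - now).total_seconds())


def retry_delay(status: Optional[int], retry_after: Optional[str],
                default_delay: float, now: datetime) -> Optional[float]:
    """Seconds to wait before retrying, or None if the failure is permanent.

    status is None when no HTTP response was received (network error).
    `now` must be timezone-aware.
    """
    if status is None or 500 <= status <= 599:
        return default_delay  # network errors and 5xx: always retry
    if status == 429:
        delay = parse_retry_after(retry_after, now)
        # Assumption: fall back to the normal backoff delay if Retry-After is unusable.
        return delay if delay is not None else default_delay
    if 400 <= status <= 499:
        return None  # other 4xx: permanent failure, do not retry
    raise ValueError(f"status {status} is not a failure status")
```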

## Dead Letter Queue

Notifications that exceed retry limits are moved to the dead letter queue for:
- Manual inspection
- Automatic alerting
- Scheduled reprocessing
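
The dead letter flow can be sketched as follows. Everything specific here is an assumption made for illustration:

- the `Delivery` and `DeadLetterQueue` types;
- the `retry_allowed` callback;
- one alert per dead-lettered notification;
- resetting the attempt count on reprocessing.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Delivery:
    notification_id: str
    attempts: int = 0
    last_error: str = ""


@dataclass
class DeadLetterQueue:
    entries: List[Delivery] = field(default_factory=list)

    def add(self, delivery: Delivery) -> None:
        self.entries.append(delivery)  # kept until inspected or reprocessed

    def drain_for_reprocessing(self) -> List[Delivery]:
        """Hand every entry to a scheduled reprocessing job (attempts reset: assumption)."""
        drained, self.entries = self.entries, []
        for delivery in drained:
            delivery.attempts = 0
        return drained


def handle_failure(delivery: Delivery, error: str,
                   retry_allowed: Callable[[Delivery], bool],
                   dlq: DeadLetterQueue, alerts: List[str]) -> str:
    """Record a failed attempt; retry if limits allow, otherwise dead-letter it."""
    delivery.attempts += 1
    delivery.last_error = error
    if retry_allowed(delivery):
        return "retry"
    dlq.add(delivery)
    alerts.append(f"dead-lettered {delivery.notification_id}: {error}")
    return "dead_lettered"
```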

## Monitoring

Retry metrics are exposed for:
- Retry count per notification
- Success rate after retries
- Average retry duration
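
The three retry metrics can be derived from per-notification delivery records. This sketch computes them in-process from a hypothetical `DeliveryRecord` type. It does not show how the metrics are exposed or under which names, since the document does not specify either. Success rate and average duration are computed only over notifications that were retried at least once, which is an assumed definition.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class DeliveryRecord:
    notification_id: str
    retries: int             # attempts after the first one
    succeeded: bool          # final outcome
    retry_duration_s: float  # time spent retrying before the final outcome


def retry_metrics(records: List[DeliveryRecord]) -> Dict[str, object]:
    """Per-notification retry counts plus aggregates over retried notifications."""
    retried = [r for r in records if r.retries > 0]
    return {
        "retry_count": {r.notification_id: r.retries for r in records},
        "success_rate_after_retries":
            sum(r.succeeded for r in retried) / len(retried) if retried else None,
        "avg_retry_duration_s":
            mean(r.retry_duration_s for r in retried) if retried else None,
    }
```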