docs consolidation
@@ -8,7 +8,7 @@ Operational steps to deploy, monitor, and recover the Notifications service (Web
## Pre-flight
- Secrets stored in Authority: SMTP creds, Slack/Teams hooks, webhook HMAC keys.
- Outbound allowlist updated for target channels.
-- PostgreSQL and Redis reachable; health checks pass.
+- PostgreSQL and Valkey reachable; health checks pass.
- Offline kit loaded: channel manifests, default templates, rule seeds.
## Deploy
@@ -37,7 +37,7 @@ Operational steps to deploy, monitor, and recover the Notifications service (Web
- **Rotate secrets**: update Authority secret, then `POST /api/v1/notify/channels/{id}:refresh-secret`.
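
The refresh call in the rotation step can be sketched in Python as below. Only the endpoint path comes from the runbook; the base URL, the bearer-token header, and the helper name `build_refresh_request` are assumptions made for illustration, so adjust them to however your deployment authenticates.

```python
import urllib.parse
import urllib.request


def build_refresh_request(base_url, channel_id, token):
    """Build (but do not send) the POST that asks a channel to reload its secret.

    The path is taken from the runbook; base_url, token, and the
    Authorization scheme are hypothetical placeholders.
    """
    # Percent-encode the channel id so unusual characters cannot alter the path.
    quoted_id = urllib.parse.quote(channel_id, safe="")
    path = "/api/v1/notify/channels/{}:refresh-secret".format(quoted_id)
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        method="POST",
        headers={"Authorization": "Bearer " + token},  # assumed auth scheme
    )


request = build_refresh_request(
    "https://notify.example.internal", "slack-ops", "example-token"
)
```

Passing the request to `urllib.request.urlopen(request)` would send it. Do this only after the secret has been updated in Authority, as the step requires.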
## Failure recovery
-- Worker crash loop: check Redis connectivity, template compile errors; run `notify-worker --validate-only` using current config.
+- Worker crash loop: check Valkey connectivity, template compile errors; run `notify-worker --validate-only` using current config.
- PostgreSQL outage: worker backs off with exponential retry; after recovery, replay via `:replay` or digests as needed.
- Channel outage (e.g., Slack 5xx): throttles + retry policy handle transient errors; for extended outages, disable channel or swap to backup policy.
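
The PostgreSQL outage item mentions exponential retry without giving parameters. A minimal Python sketch of that pattern follows. The base delay, growth factor, cap, attempt count, and the names `backoff_delays` and `retry` are illustrative assumptions, not the worker's actual settings or API.

```python
import random
import time


def backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=6, jitter=0.0, rng=None):
    """Yield the wait (in seconds) before each retry, growing geometrically up to cap.

    All defaults are hypothetical; jitter is the fraction of each delay that
    may be added at random to keep many workers from retrying in lockstep.
    """
    rng = rng or random.Random()
    for attempt in range(attempts):
        delay = min(cap, base * factor ** attempt)
        yield delay + rng.uniform(0, jitter * delay)


def retry(operation, delays, sleep=time.sleep):
    """Call operation; on ConnectionError wait for the next delay and try again.

    When the delays run out, the last ConnectionError is re-raised.
    """
    delays = iter(delays)
    while True:
        try:
            return operation()
        except ConnectionError:
            delay = next(delays, None)
            if delay is None:
                raise
            sleep(delay)
```

With the defaults above and no jitter, the waits are 1, 2, 4, 8, 16, and 32 seconds. Capping the delay bounds how long the worker stays idle once the database is back.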
@@ -54,5 +54,5 @@ Operational steps to deploy, monitor, and recover the Notifications service (Web
- [ ] Health endpoints green.
- [ ] Delivery failure rate < 0.5% over last hour.
- [ ] Escalation backlog empty or within SLO.
-- [ ] Redis memory < 75% and PostgreSQL primary healthy.
+- [ ] Valkey memory < 75% and PostgreSQL primary healthy.
- [ ] Latest release notes applied and channels validated.
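
The numeric items in the checklist above could be evaluated automatically along these lines. The thresholds (0.5% failure rate, 75% memory) come from the checklist; the `Snapshot` fields and the function name `failed_checks` are hypothetical, and how the numbers are collected is left out. The escalation backlog and release-notes items are not covered because the runbook excerpt gives no measurable threshold for them.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical point-in-time readings gathered by the operator's tooling."""
    health_green: bool
    delivered_last_hour: int
    failed_last_hour: int
    valkey_used_bytes: int
    valkey_max_bytes: int
    postgres_primary_healthy: bool


def failed_checks(s):
    """Return a description of every checklist item the snapshot violates."""
    problems = []
    if not s.health_green:
        problems.append("health endpoints not green")

    # Failure rate over all attempts in the last hour; no traffic counts as 0%.
    attempts = s.delivered_last_hour + s.failed_last_hour
    failure_rate = s.failed_last_hour / attempts if attempts else 0.0
    if failure_rate >= 0.005:
        problems.append("delivery failure rate {:.2%} is not below 0.5%".format(failure_rate))

    # An unknown or zero memory limit is reported rather than silently passed.
    if s.valkey_max_bytes <= 0:
        problems.append("Valkey memory limit unknown")
    elif s.valkey_used_bytes / s.valkey_max_bytes >= 0.75:
        problems.append("Valkey memory is not below 75%")

    if not s.postgres_primary_healthy:
        problems.append("PostgreSQL primary unhealthy")
    return problems
```

An empty list means the covered items pass; anything else names the item to investigate before signing off.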