git.stella-ops.org/docs/doctor/articles/notify/queue-health.md
master c58a236d70 Doctor plugin checks: implement health check classes and documentation
Implement remediation-aware health checks across all Doctor plugin modules
(Agent, Attestor, Auth, BinaryAnalysis, Compliance, Crypto, Environment,
EvidenceLocker, Notify, Observability, Operations, Policy, Postgres, Release,
Scanner, Storage, Vex) and their backing library counterparts (AI, Attestation,
Authority, Core, Cryptography, Database, Docker, Integration, Notify,
Observability, Security, ServiceGraph, Sources, Verification).

Each check now emits structured remediation metadata (severity, category,
runbook links, and fix suggestions) consumed by the Doctor dashboard
remediation panel.

Also adds:
- docs/doctor/articles/ knowledge base for check explanations
- Advisory AI search seed and allowlist updates for doctor content
- Sprint plan for doctor checks documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 12:28:00 +02:00

checkId: check.notify.queue.health
plugin: stellaops.doctor.notify
severity: fail
tags: notify, queue, redis, nats, infrastructure

Notification Queue Health

What It Checks

Verifies that the notification event and delivery queues are healthy. The check:

  • Reads the Notify:Queue:Transport (or Notify:Queue:Kind) setting to determine the queue transport type (Redis/Valkey or NATS).
  • Resolves NotifyQueueHealthCheck and NotifyDeliveryQueueHealthCheck from the DI container.
  • Invokes each registered health check and aggregates the results.
  • Fails if any queue reports Unhealthy, warns if any queue reports Degraded (and none is Unhealthy), and passes only if all queues report Healthy.

The check only runs when a queue transport is configured in Notify:Queue:Transport.
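
The gating and aggregation rules above can be sketched in shell. This is a hypothetical illustration, not the Doctor plugin's C# implementation: it reads settings from environment variables (.NET configuration spells Notify:Queue:Transport as Notify__Queue__Transport), assumes Transport takes precedence over Kind, treats either setting as "configured", and assumes the per-queue statuses use the .NET HealthStatus names Healthy, Degraded, and Unhealthy. The function names are invented for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the check's decision logic; not the Doctor plugin's
# C# implementation. Settings are read from environment variables, where .NET
# configuration spells Notify:Queue:Transport as Notify__Queue__Transport.

# Print the configured transport, preferring Transport and falling back to
# Kind (assumed precedence). Prints nothing when neither is set.
notify_queue_transport() {
  printf '%s' "${Notify__Queue__Transport:-${Notify__Queue__Kind:-}}"
}

# Fold per-queue statuses (Healthy, Degraded, Unhealthy) into a check result:
# fail on any Unhealthy, warn on any Degraded, otherwise pass.
# Unrecognized values are ignored in this sketch.
aggregate_queue_status() {
  local result=pass status
  for status in "$@"; do
    case "$status" in
      Unhealthy) echo fail; return 0 ;;
      Degraded)  result=warn ;;
    esac
  done
  echo "$result"
}

# Skip the check entirely when no transport is configured.
queue_check() {
  if [ -z "$(notify_queue_transport)" ]; then
    echo skip
    return 0
  fi
  aggregate_queue_status "$@"
}
```

For example, `Notify__Queue__Transport=redis queue_check Healthy Degraded` prints `warn`, while the same call with neither variable set prints `skip`.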

Why It Matters

The notification queue is the backbone of the notification pipeline. If the event queue is unhealthy, new notification events are lost. If the delivery queue is unhealthy, pending notifications to email, Slack, Teams, and webhook channels will not be delivered. This is a severity-fail check because queue failure means complete notification blackout.

Common Causes

  • Queue server (Redis/Valkey/NATS) not running
  • Network connectivity issues between the Notify service and the queue server
  • Authentication failure (wrong password or credentials)
  • Incorrect connection string in configuration

How to Fix

Docker Compose

For the Redis/Valkey transport:

# Check Redis health
docker exec <redis-container> redis-cli ping

# Check connection string
docker exec <notify-container> env | grep Notify__Queue

# Restart Redis if needed
docker restart <redis-container>

For the NATS transport:

# Check NATS server status
docker exec <nats-container> nats server ping

# Check NATS logs
docker logs <nats-container> --tail 50
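
The Docker Compose steps above can be combined into one helper that looks up the configured transport inside the Notify container and then runs the matching ping. This is a hypothetical sketch: NOTIFY_CONTAINER, REDIS_CONTAINER, and NATS_CONTAINER are placeholder variables you set yourself, the accepted transport names ("redis", "valkey", "nats") are assumptions, and the lookup only works if the transport is passed to the container as the environment variable Notify__Queue__Transport rather than only in appsettings.json.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Stella Ops: reads the transport from the
# Notify container and runs the matching ping from the steps above.
# NOTIFY_CONTAINER, REDIS_CONTAINER and NATS_CONTAINER are placeholders to set.

# Map a transport name to the ping command used above (case-insensitive).
# The accepted names "redis", "valkey" and "nats" are assumptions.
queue_ping_command() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    redis|valkey) echo "redis-cli ping" ;;
    nats)         echo "nats server ping" ;;
    *)            return 1 ;;
  esac
}

check_compose_queue() {
  local transport cmd container
  # Fails if the transport is only set in appsettings.json, not the environment.
  transport=$(docker exec "$NOTIFY_CONTAINER" printenv Notify__Queue__Transport) || return 1
  cmd=$(queue_ping_command "$transport") || {
    echo "unknown transport: $transport" >&2
    return 1
  }
  case "$cmd" in
    redis-cli*) container=$REDIS_CONTAINER ;;
    *)          container=$NATS_CONTAINER ;;
  esac
  # $cmd is left unquoted on purpose so it splits into program and arguments.
  docker exec "$container" $cmd
}
```

Usage, with container names substituted for your deployment: `NOTIFY_CONTAINER=<notify-container> REDIS_CONTAINER=<redis-container> NATS_CONTAINER=<nats-container> check_compose_queue`.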

Bare Metal / systemd

# Redis/Valkey
redis-cli ping
redis-cli info server

# NATS
nats server ping
systemctl status nats

Verify the connection string in appsettings.json:

{
  "Notify": {
    "Queue": {
      "Transport": "redis",
      "Redis": {
        "ConnectionString": "127.1.1.2:6379"
      }
    }
  }
}
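
.NET configuration also reads environment variables, with each `:` in a key written as `__`; this is why the Docker Compose step greps for `Notify__Queue`. The JSON above therefore corresponds to `Notify__Queue__Transport=redis` and `Notify__Queue__Redis__ConnectionString=127.1.1.2:6379`. The helper below is a hypothetical sketch that pings whatever endpoint that variable names. It assumes a StackExchange.Redis-style connection string (`host:port[,option=value...]`) and does not handle IPv6 literals or password options.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Stella Ops. Assumes a StackExchange.Redis-style
# connection string "host:port[,option=value...]"; IPv6 literals and
# password options are not handled.

# Print "host port" for the first endpoint; default to port 6379 if none is given.
redis_endpoint() {
  local first="${1%%,*}"                 # drop ",password=..." and other options
  local host="${first%:*}" port="${first##*:}"
  [ "$host" = "$first" ] && port=6379    # no ":port" part present
  printf '%s %s\n' "$host" "$port"
}

# Ping the endpoint from Notify__Queue__Redis__ConnectionString
# (the environment-variable form of Notify:Queue:Redis:ConnectionString).
ping_notify_redis() {
  local conn="${Notify__Queue__Redis__ConnectionString:-}" host port
  if [ -z "$conn" ]; then
    echo "Notify__Queue__Redis__ConnectionString is not set" >&2
    return 1
  fi
  read -r host port <<< "$(redis_endpoint "$conn")"
  redis-cli -h "$host" -p "$port" ping   # a healthy server replies PONG
}
```

For example, `Notify__Queue__Redis__ConnectionString=127.1.1.2:6379 ping_notify_redis`. If the server requires a password, a `NOAUTH` reply instead of `PONG` points to the authentication cause listed above; add redis-cli's `-a` option to authenticate.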

Kubernetes / Helm

kubectl exec -it <redis-pod> -- redis-cli ping
kubectl logs <notify-pod> --tail 50 | grep -i queue

Verification

stella doctor run --check check.notify.queue.health

Related Checks

  • check.notify.email.configured — verifies email channel configuration
  • check.notify.slack.configured — verifies Slack channel configuration
  • check.notify.webhook.configured — verifies webhook channel configuration