perf(router): replace 100ms Valkey polling with Pub/Sub notification wakeup and increase heartbeat to 45s

The Valkey transport layer used 100ms busy-polling loops (Task.Delay(100))
across ~90 concurrent loops in 45+ services, generating ~900 idle
commands/sec and burning ~58% CPU while the system was completely idle.

Replace polling with Valkey (Redis-compatible) Pub/Sub notifications:
- Publishers fire PUBLISH after each XADD (fire-and-forget)
- Consumers SUBSCRIBE and wait on SemaphoreSlim with 30s fallback timeout
- Applies to both ValkeyMessageQueue (INotifiableQueue) and ValkeyEventStream
- Non-Valkey transports fall back to 1s polling via QueueWaitExtensions
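
The consumer-side wait described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the NotificationSignal class and its members are hypothetical names, and only the SemaphoreSlim wait with a fallback timeout is taken from the message. The subscription callback would call Notify(); the consumer loop would await WaitAsync with the 30s fallback.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not from the commit): a wake-up signal that a Pub/Sub
// subscription callback releases and a consumer loop awaits.
public sealed class NotificationSignal : IDisposable
{
    // maxCount of 1 coalesces a burst of notifications into one pending wake-up.
    private readonly SemaphoreSlim _signal = new SemaphoreSlim(0, 1);

    // Call from the subscription handler whenever a PUBLISH arrives.
    public void Notify()
    {
        try
        {
            _signal.Release();
        }
        catch (SemaphoreFullException)
        {
            // A wake-up is already pending; nothing more to do.
        }
    }

    // Completes with true when woken by a notification,
    // or false when the fallback timeout elapsed first.
    public Task<bool> WaitAsync(TimeSpan fallbackTimeout, CancellationToken ct = default)
        => _signal.WaitAsync(fallbackTimeout, ct);

    public void Dispose() => _signal.Dispose();
}
```

A consumer would typically read the stream after WaitAsync returns regardless of the result, so a lost notification delays delivery by at most the fallback timeout instead of stalling the loop.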

Increase heartbeat interval from 10s to 45s across all transport options,
with corresponding health threshold adjustments (degraded: 90s, stale: 135s,
i.e. 2x and 3x the new interval).
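
As a rough illustration of how the new thresholds relate to the interval, the sketch below classifies a service by the age of its last heartbeat. The HealthState enum, the HeartbeatHealth class, and the inclusive comparisons are assumptions made for this example; the commit message only states the values 45s, 90s and 135s.

```csharp
using System;

// Hypothetical types for illustration; not taken from the commit.
public enum HealthState { Healthy, Degraded, Stale }

public static class HeartbeatHealth
{
    // Assumes degraded = 2x and stale = 3x the heartbeat interval, which
    // matches the 90s / 135s values stated for the 45s interval.
    public static HealthState Classify(TimeSpan sinceLastHeartbeat, int heartbeatIntervalSeconds = 45)
    {
        var degradedAfter = TimeSpan.FromSeconds(2 * heartbeatIntervalSeconds);
        var staleAfter = TimeSpan.FromSeconds(3 * heartbeatIntervalSeconds);

        if (sinceLastHeartbeat >= staleAfter) return HealthState.Stale;
        if (sinceLastHeartbeat >= degradedAfter) return HealthState.Degraded;
        return HealthState.Healthy;
    }
}
```

Under these assumptions a single missed heartbeat (for example, 60s of silence) still counts as healthy, which avoids flapping when one report is delayed.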

Expected idle CPU reduction: ~58% → ~3-5%.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in: master
2026-03-09 07:47:31 +02:00
parent f218ec82ec
commit 841add4f27
17 changed files with 230 additions and 48 deletions

@@ -39,9 +39,9 @@ public class StellaRouterOptionsBase
     /// <summary>
     /// Heartbeat interval in seconds for health reporting.
-    /// Default: 10 seconds.
+    /// Default: 45 seconds.
     /// </summary>
-    public int HeartbeatIntervalSeconds { get; set; } = 10;
+    public int HeartbeatIntervalSeconds { get; set; } = 45;
     /// <summary>
     /// Service trust mode for gateway-enforced authorization semantics.