perf(router): replace 100ms Valkey polling with Pub/Sub notification wakeup and increase heartbeat to 45s

The Valkey transport layer polled with a fixed 100ms delay (Task.Delay(100))
in ~90 concurrent loops across 45+ services, issuing ~900 commands/sec and
burning ~58% CPU while the system was completely idle.
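As a rough check of those figures: ~90 loops at 10 wakeups/sec each gives ~900 commands/sec, which implies one Valkey command per wakeup. Assuming that, the idle command rate is the number of loops divided by the wake interval. `IdleLoad` is a hypothetical helper for illustration only. The 30s case uses the fallback timeout introduced by this commit and ignores heartbeats and real traffic.

```csharp
using System;

// Hypothetical back-of-the-envelope helper (not part of the codebase).
// Assumes every loop issues exactly one command each time it wakes up.
public static class IdleLoad
{
    public static double CommandsPerSecond(int loops, TimeSpan wakeInterval) =>
        // Work in milliseconds so 100ms and 30s divide exactly in double math.
        loops * 1000.0 / wakeInterval.TotalMilliseconds;
}
```

Under the same assumption, 90 loops waking only on the 30s fallback would issue about 3 commands/sec while idle, compared with 900 commands/sec at 100ms.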

Replace polling with Redis Pub/Sub notifications:
- Publishers fire PUBLISH after each XADD (fire-and-forget)
- Consumers SUBSCRIBE and wait on SemaphoreSlim with 30s fallback timeout
- Applies to both ValkeyMessageQueue (INotifiableQueue) and ValkeyEventStream
- Non-Valkey transports fall back to 1s polling via QueueWaitExtensions
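Below is a minimal in-process sketch of the wakeup pattern described above, assuming one consumer loop per signal. `IWakeable`, `NotificationSignal` and `QueueWait.WaitForWorkAsync` are hypothetical names. They are not the actual `INotifiableQueue` or `QueueWaitExtensions` APIs. The Valkey SUBSCRIBE callback is stood in for by a direct call to `Notify()`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a notification-capable queue or stream.
public interface IWakeable
{
    // Returns true if woken by a notification, false if the timeout elapsed.
    Task<bool> WaitForNotificationAsync(TimeSpan timeout, CancellationToken ct);
}

// Coalescing wakeup signal. In a real transport the Pub/Sub message handler
// would call Notify(); the consumer loop awaits between stream reads.
public sealed class NotificationSignal : IWakeable
{
    // maxCount = 1: a burst of notifications collapses into one pending wakeup.
    private readonly SemaphoreSlim _signal = new SemaphoreSlim(0, 1);

    public void Notify()
    {
        try
        {
            _signal.Release();
        }
        catch (SemaphoreFullException)
        {
            // A wakeup is already pending; nothing more to do.
        }
    }

    public Task<bool> WaitForNotificationAsync(TimeSpan timeout, CancellationToken ct) =>
        _signal.WaitAsync(timeout, ct);
}

// Hypothetical counterpart of the fallback logic: notification-driven wait
// when supported, otherwise a plain 1s polling delay.
public static class QueueWait
{
    public static readonly TimeSpan NotificationFallback = TimeSpan.FromSeconds(30);
    public static readonly TimeSpan PollInterval = TimeSpan.FromSeconds(1);

    public static async Task WaitForWorkAsync(object source, CancellationToken ct)
    {
        if (source is IWakeable wakeable)
        {
            // Signal or timeout both mean "read the stream again".
            await wakeable.WaitForNotificationAsync(NotificationFallback, ct).ConfigureAwait(false);
        }
        else
        {
            await Task.Delay(PollInterval, ct).ConfigureAwait(false);
        }
    }
}
```

The semaphore is capped at 1, so a burst of PUBLISH notifications produces a single wakeup. This is only safe if the consumer reads everything available each time it wakes, which is an assumption here. A release that arrives while the consumer is busy stays pending, so a notification sent between an empty read and the next wait is not lost. The 30s timeout handles notifications that are lost outright. Pub/Sub delivery is fire-and-forget, so a message published while a subscriber is disconnected is never delivered to it.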

Increase the heartbeat interval from 10s to 45s across all transport options,
with matching health threshold adjustments (stale: 30s → 135s, 3x heartbeat;
degraded: 15s → 90s, 2x heartbeat; check interval: 5s → 15s).
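The thresholds follow from the multipliers in the `HealthOptions` doc comments (stale at least 3x the heartbeat interval, degraded at least 2x). The sketch below makes that derivation explicit. `HealthThresholds` is a hypothetical helper and is not part of `HealthOptions`.

```csharp
using System;

// Hypothetical helper: derives health thresholds from the heartbeat interval
// using the documented multipliers (stale >= 3x, degraded >= 2x heartbeat).
public static class HealthThresholds
{
    public const int StaleMultiplier = 3;
    public const int DegradedMultiplier = 2;

    public static TimeSpan Stale(TimeSpan heartbeat) =>
        TimeSpan.FromTicks(heartbeat.Ticks * StaleMultiplier);

    public static TimeSpan Degraded(TimeSpan heartbeat) =>
        TimeSpan.FromTicks(heartbeat.Ticks * DegradedMultiplier);

    // True when both thresholds leave room for the documented number of
    // heartbeat intervals and the stale threshold comes after the degraded one.
    public static bool IsConsistent(TimeSpan heartbeat, TimeSpan degraded, TimeSpan stale) =>
        degraded >= Degraded(heartbeat)
        && stale >= Stale(heartbeat)
        && stale > degraded;
}
```

With a 45s heartbeat, the old 15s/30s thresholds fail this check. A healthy connection would routinely look stale between heartbeats, which is why the thresholds move together with the interval. Suppose the checker compares the time since the last heartbeat against `StaleThreshold` once per `CheckInterval` (an assumption). A dead connection would then be reported stale roughly 135s to 150s after its last heartbeat.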

Expected idle CPU reduction: ~58% → ~3-5%.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
master
2026-03-09 07:47:31 +02:00
parent f218ec82ec
commit 841add4f27
17 changed files with 230 additions and 48 deletions

@@ -12,21 +12,23 @@ public sealed class HealthOptions
     /// <summary>
     /// Gets or sets the threshold after which a connection is considered stale (no heartbeat).
-    /// Default: 30 seconds.
+    /// Should be at least 3x the heartbeat interval (45s default).
+    /// Default: 135 seconds.
     /// </summary>
-    public TimeSpan StaleThreshold { get; set; } = TimeSpan.FromSeconds(30);
+    public TimeSpan StaleThreshold { get; set; } = TimeSpan.FromSeconds(135);
     /// <summary>
     /// Gets or sets the threshold after which a connection is considered degraded.
-    /// Default: 15 seconds.
+    /// Should be at least 2x the heartbeat interval (45s default).
+    /// Default: 90 seconds.
     /// </summary>
-    public TimeSpan DegradedThreshold { get; set; } = TimeSpan.FromSeconds(15);
+    public TimeSpan DegradedThreshold { get; set; } = TimeSpan.FromSeconds(90);
     /// <summary>
     /// Gets or sets the interval at which to check for stale connections.
-    /// Default: 5 seconds.
+    /// Default: 15 seconds.
     /// </summary>
-    public TimeSpan CheckInterval { get; set; } = TimeSpan.FromSeconds(5);
+    public TimeSpan CheckInterval { get; set; } = TimeSpan.FromSeconds(15);
     /// <summary>
     /// Gets or sets the number of ping measurements to keep for averaging.