git.stella-ops.org/src/Router/__Libraries/StellaOps.Router.Gateway/Configuration/HealthOptions.cs
master 841add4f27 perf(router): replace 100ms Valkey polling with Pub/Sub notification wakeup and increase heartbeat to 45s
The Valkey transport layer used 100ms busy-polling loops (Task.Delay(100))
across ~90 concurrent loops in 45+ services, generating ~900 idle
commands/sec and burning ~58% CPU while the system was completely idle.

Replace polling with Redis Pub/Sub notifications:
- Publishers fire PUBLISH after each XADD (fire-and-forget)
- Consumers SUBSCRIBE and wait on SemaphoreSlim with 30s fallback timeout
- Applies to both ValkeyMessageQueue (INotifiableQueue) and ValkeyEventStream
- Non-Valkey transports fall back to 1s polling via QueueWaitExtensions
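
The consumer side of the list above can be sketched as follows. This is a minimal illustration, not the actual ValkeyMessageQueue or INotifiableQueue code: `NotificationWaiter`, `Notify`, and `ConsumeLoopAsync` are hypothetical names, and the Pub/Sub subscription is assumed to call `Notify()` from its message callback. The semaphore is capped at one pending signal, so a burst of notifications collapses into a single wakeup.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the consumer-side wait; not the real transport code.
public sealed class NotificationWaiter
{
    // Starts with no signal pending; holds at most one pending wakeup.
    private readonly SemaphoreSlim _signal = new SemaphoreSlim(0, 1);

    // Assumed to be invoked from the Pub/Sub message callback.
    public void Notify()
    {
        try
        {
            _signal.Release();
        }
        catch (SemaphoreFullException)
        {
            // A wakeup is already pending; extra notifications are coalesced.
        }
    }

    // Completes with true when a notification arrived,
    // or false when the fallback timeout elapsed first.
    public Task<bool> WaitAsync(TimeSpan fallbackTimeout, CancellationToken ct = default)
        => _signal.WaitAsync(fallbackTimeout, ct);

    // Drain-then-wait loop: read everything available, then sleep until
    // notified or until the fallback timeout forces a re-check.
    public async Task ConsumeLoopAsync(Func<Task> drainAsync, TimeSpan fallbackTimeout, CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            await drainAsync();
            try
            {
                await WaitAsync(fallbackTimeout, ct);
            }
            catch (OperationCanceledException)
            {
                break;
            }
        }
    }
}
```

With a 30s fallback timeout, an idle consumer issues one read every 30 seconds instead of ten per second, while a missed notification delays delivery by at most the timeout.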

Increase heartbeat interval from 10s to 45s across all transport options,
with corresponding health threshold adjustments (stale: 135s, degraded: 90s).

Expected idle CPU reduction: ~58% → ~3-5%.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 07:47:31 +02:00

namespace StellaOps.Router.Gateway.Configuration;

/// <summary>
/// Configuration options for health monitoring.
/// </summary>
public sealed class HealthOptions
{
    /// <summary>
    /// Gets the configuration section name.
    /// </summary>
    public const string SectionName = "Router:Health";

    /// <summary>
    /// Gets or sets the threshold after which a connection is considered stale (no heartbeat).
    /// Should be at least 3x the heartbeat interval (45s default).
    /// Default: 135 seconds.
    /// </summary>
    public TimeSpan StaleThreshold { get; set; } = TimeSpan.FromSeconds(135);

    /// <summary>
    /// Gets or sets the threshold after which a connection is considered degraded.
    /// Should be at least 2x the heartbeat interval (45s default).
    /// Default: 90 seconds.
    /// </summary>
    public TimeSpan DegradedThreshold { get; set; } = TimeSpan.FromSeconds(90);

    /// <summary>
    /// Gets or sets the interval at which to check for stale connections.
    /// Default: 15 seconds.
    /// </summary>
    public TimeSpan CheckInterval { get; set; } = TimeSpan.FromSeconds(15);

    /// <summary>
    /// Gets or sets the number of ping measurements to keep for averaging.
    /// Default: 10.
    /// </summary>
    public int PingHistorySize { get; set; } = 10;
}
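
As an illustration of how the two thresholds relate to the heartbeat interval, the sketch below classifies a connection by the time since its last heartbeat and checks the 2x / 3x guidance from the doc comments. It is not part of the file: `ConnectionHealth`, `HealthSketch`, `Classify`, and `ThresholdsAreConsistent` are hypothetical names, and the strict "greater than" comparison is an assumption based on the wording "threshold after which".

```csharp
using System;

// Hypothetical health states; the gateway's real type may differ.
public enum ConnectionHealth { Healthy, Degraded, Stale }

public static class HealthSketch
{
    // Classifies a connection from the time elapsed since its last heartbeat.
    // Assumption: a connection changes state only once the threshold is exceeded.
    public static ConnectionHealth Classify(
        TimeSpan sinceLastHeartbeat, TimeSpan degradedThreshold, TimeSpan staleThreshold)
    {
        if (sinceLastHeartbeat > staleThreshold) return ConnectionHealth.Stale;
        if (sinceLastHeartbeat > degradedThreshold) return ConnectionHealth.Degraded;
        return ConnectionHealth.Healthy;
    }

    // Checks the guidance from the doc comments: degraded >= 2x heartbeat,
    // stale >= 3x heartbeat, and degraded strictly below stale.
    public static bool ThresholdsAreConsistent(
        TimeSpan heartbeatInterval, TimeSpan degradedThreshold, TimeSpan staleThreshold)
        => degradedThreshold >= heartbeatInterval * 2
           && staleThreshold >= heartbeatInterval * 3
           && degradedThreshold < staleThreshold;
}
```

With the defaults (45s heartbeat, 90s degraded, 135s stale), a connection is treated as degraded after two missed heartbeats and as stale after three. The old 10s heartbeat would also pass this check against the new thresholds, but the reverse does not hold: thresholds sized for 10s would flag healthy connections once the interval is raised to 45s.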