Eliminate Valkey queue polling fallback (phase 2 CPU optimization)

Replace hardcoded 1-5s polling constants with configurable
QueueWaitTimeoutSeconds (default 0 = pure event-driven). Consumers
now only wake on pub/sub notifications, eliminating ~118 idle
XREADGROUP polls per second across 59 services. Override with
VALKEY_QUEUE_WAIT_TIMEOUT env var if a safety-net poll is needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Branch: master
Date: 2026-03-10 02:36:01 +02:00
Parent: 166745f9f9
Commit: 31cb31d0fb
6 changed files with 68 additions and 41 deletions

@@ -111,6 +111,24 @@ Task description:
Completion criteria:
- [x] Verified ValkeyMessageQueue already uses push-first pattern
### WS-7 — Eliminate Valkey Queue Polling Fallback
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Remove hardcoded 1s PollingFallback and 1-5s notifiable timeout constants from QueueWaitExtensions.
- Add configurable `QueueWaitTimeoutSeconds` to ValkeyTransportOptions (default: 0 = pure event-driven).
- ValkeyMessageQueue.WaitForNotificationAsync uses configured timeout instead of caller-provided value.
- Compose env var `VALKEY_QUEUE_WAIT_TIMEOUT` (default 0) controls the setting for all services.
Completion criteria:
- [x] QueueWaitTimeoutSeconds added to ValkeyTransportOptions with default 0
- [x] ValkeyMessageQueue uses configured timeout (0 = Timeout.InfiniteTimeSpan)
- [x] Hardcoded PollingFallback/MinimumNotifiableTimeout/MaximumNotifiableTimeout removed from QueueWaitExtensions
- [x] Compose YAML updated for microservice defaults and gateway
- [x] All 252 gateway tests pass
- [x] Compose validates clean (45 services have the setting)
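The timeout semantics described in WS-7 can be sketched as follows. This is a language-neutral illustration in Python, not the C# code in ValkeyTransportOptions or ValkeyMessageQueue. The function names are hypothetical, `None` stands in for `Timeout.InfiniteTimeSpan`, and rejecting negative values is an assumption the sprint notes do not cover.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "VALKEY_QUEUE_WAIT_TIMEOUT"  # env var name taken from the sprint notes


def resolve_wait_timeout(seconds: int) -> Optional[float]:
    """Map the configured value to a wait timeout.

    0 means "wait indefinitely" (pure event-driven), represented here as None.
    A positive value is the safety-net poll interval in seconds.
    """
    if seconds < 0:
        # Assumption: negative values are treated as a configuration error.
        raise ValueError("timeout must be >= 0")
    return None if seconds == 0 else float(seconds)


def timeout_from_env(env: Mapping[str, str] = os.environ) -> Optional[float]:
    """Read the override from the environment, defaulting to 0."""
    raw = env.get(ENV_VAR, "0")
    return resolve_wait_timeout(int(raw))
```

With no override set, `timeout_from_env({})` yields `None` (wait until notified); with `VALKEY_QUEUE_WAIT_TIMEOUT=5` it yields `5.0`.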
### WS-6 — GC Configuration
Status: DONE
Dependency: none
@@ -129,12 +147,14 @@ Completion criteria:
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-03-10 | Sprint created. All workstreams completed. All 3 C# projects build clean. Compose validates clean. | Developer |
| 2026-03-10 | WS-7 added: eliminated Valkey queue polling fallback. Default is now pure event-driven (QueueWaitTimeoutSeconds=0). | Developer |
## Decisions & Risks
- Resource limits are dev/QA defaults; production deployments should tune per hardware.
- GCDynamicAdaptationMode=1 requires .NET 8+; all services use .NET 8/9.
- Healthcheck interval override via HEALTHCHECK_INTERVAL env var for operator flexibility.
- Valkey pub/sub notifications are fire-and-forget; a fallback timer covers missed notifications only when QueueWaitTimeoutSeconds is set above 0.
- QueueWaitTimeoutSeconds defaults to 0 (pure event-driven). Set VALKEY_QUEUE_WAIT_TIMEOUT=5 to restore a 5s safety-net poll if pub/sub proves unreliable.
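The trade-off between pure event-driven waiting and a safety-net poll can be illustrated with a minimal sketch. It uses Python's `threading.Event` as a stand-in for the pub/sub notification signal. The function `wait_for_work` is hypothetical and does not mirror the actual `WaitForNotificationAsync` implementation.

```python
import threading
from typing import Optional


def wait_for_work(notified: threading.Event, timeout: Optional[float]) -> str:
    """Block until a notification arrives or the optional timeout elapses.

    Returns "notified" when woken by the event, or "timeout" when the
    safety-net interval expired and the caller should poll the queue once.
    With timeout=None the call blocks until a notification arrives.
    """
    woke = notified.wait(timeout)  # timeout=None blocks indefinitely
    if woke:
        notified.clear()  # re-arm for the next notification
        return "notified"
    return "timeout"
```

With `timeout=None` (the analogue of the default 0), a consumer that misses a notification stays blocked until the next one arrives. A positive timeout bounds that delay, at the cost of periodic idle wake-ups.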
## Next Checkpoints
- Rebuild affected images (platform, jobengine, graph-indexer) after C# changes merge.