The gateway service was a redundant deployment of the same
StellaOps.Gateway.WebService binary already running as router-gateway.
It served no unique purpose — all traffic is handled by router-gateway
(slot 0). This removes the container, its route table entries, nginx
proxy blocks, health/quota stubs, and redirects STELLAOPS_GATEWAY_URL
to router.stella-ops.local so the Angular frontend resolves API base
URLs through the canonical frontdoor.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hardcoded 1-5s polling constants with configurable
QueueWaitTimeoutSeconds (default 0 = pure event-driven). Consumers
now only wake on pub/sub notifications, eliminating ~118 idle
XREADGROUP polls per second across 59 services. Override with
VALKEY_QUEUE_WAIT_TIMEOUT env var if a safety-net poll is needed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Route /api/v1/jobengine to jobengine service (was orchestrator)
- Route /api/v1/sources and /api/v1/witnesses to scanner service
- Add orch:quota and pack-registry scopes to platform OIDC token
- Align compose-local manifests with gateway appsettings.json
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Valkey transport layer used 100ms busy-polling loops (Task.Delay(100))
across ~90 concurrent loops in 45+ services, generating ~900 idle
commands/sec and burning ~58% CPU while the system was completely idle.
Replace polling with Redis Pub/Sub notifications:
- Publishers fire PUBLISH after each XADD (fire-and-forget)
- Consumers SUBSCRIBE and wait on SemaphoreSlim with 30s fallback timeout
- Applies to both ValkeyMessageQueue (INotifiableQueue) and ValkeyEventStream
- Non-Valkey transports fall back to 1s polling via QueueWaitExtensions
Increase heartbeat interval from 10s to 45s across all transport options,
with corresponding health threshold adjustments (stale: 135s, degraded: 90s).
Expected idle CPU reduction: ~58% → ~3-5%.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Valkey messaging transport auto-reconnection:
- MessagingTransportClient: detect persistent Redis failures (5 consecutive)
and exit processing loops instead of retrying forever with dead connection
- IMicroserviceTransport: add TransportDied event to interface
- RouterConnectionManager: listen for TransportDied, auto-reconnect after 2s
- Fixes services becoming unreachable after Valkey blip during restarts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Plugin assemblies loaded via PluginHost into isolated AssemblyLoadContexts
produce distinct types even from the same DLL. When AppDomain.GetAssemblies()
returns both Default and plugin-ALC copies, DI registration and IOptions<T>
resolution silently fail (e.g. ValkeyTransportOptions defaulting to localhost).
Applied AssemblyLoadContext.Default filter to all 7 assembly discovery sites:
- MessagingServiceCollectionExtensions (transport plugin scan)
- StellaRouterIntegrationHelper (transport plugin loader)
- Gateway.WebService Program.cs (startup transport scan)
- GeneratedEndpointDiscoveryProvider (endpoint provider scan)
- ReflectionEndpointDiscoveryProvider (endpoint attribute scan)
- ServiceCollectionExtensions (schema provider scan)
- MigrationModulePluginDiscovery (migration plugin scan)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sprint: SPRINT_20251229_049_BE_csproj_audit_maint_tests
Tasks: AUDIT-0001 through AUDIT-0147 APPLY tasks (approved decisions 1-9)
Changes:
- Set TreatWarningsAsErrors=true for all production .NET projects
- Fixed nullable warnings in Scanner.EntryTrace, Scanner.Evidence,
Scheduler.Worker, Concelier connectors, and other modules
- Injected TimeProvider/IGuidProvider for deterministic time/ID generation
- Added path traversal validation in AirGap.Bundle
- Fixed NULL handling in various cursor classes
- Third-party GostCryptography retains TreatWarningsAsErrors=false (preserves original)
- Test projects excluded per user decision (rejected decision 10)
Note: All 17 ACSC connector tests pass after snapshot fixture sync