The topology wizard creates environments and targets via POST /api/v1/environments
and POST /api/v1/targets. These were routed to JobEngine which doesn't have
the identity envelope middleware, causing 404 on ReverseProxy routes.
Fix: Add environment CRUD, target CRUD, and agent list endpoints directly
to Concelier's TopologySetupEndpointExtensions. These use the same
Topology.Read/Manage authorization policies that work with the identity
envelope middleware.
Routes updated:
- /api/v1/environments → Concelier (was JobEngine)
- /api/v1/agents → Concelier (new)
Topology wizard now completes steps 1-4:
1. Region: CREATE OK
2. Environment: CREATE OK
3. Stage Order: OK (skip)
4. Target: CREATE OK
5. Agent: BLOCKED (expected — no agents deployed on fresh install)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Concelier:
- Register Topology.Read, Topology.Manage, Topology.Admin authorization
policies mapped to OrchRead/OrchOperate/PlatformContextRead/IntegrationWrite
scopes. Previously these policies were referenced by endpoints but never
registered, causing System.InvalidOperationException on every topology
API call.
Gateway routes:
- Simplified targets/environments routes (removed specific sub-path routes,
use catch-all patterns instead)
- Changed environments base route to JobEngine (where CRUD lives)
- Changed to ReverseProxy type for all topology routes
KNOWN ISSUE (not yet fixed):
- ReverseProxy routes don't forward the gateway's identity envelope to
Concelier. The regions/targets/bindings endpoints return 401 because
hasPrincipal=False — the gateway authenticates the user but doesn't
pass the identity to the backend via ReverseProxy. Microservice routes
use Valkey transport which includes envelope headers. Topology endpoints
need either: (a) Valkey transport registration in Concelier, or
(b) Concelier configured to accept raw bearer tokens on ReverseProxy paths.
This is an architecture-level fix.
Journey findings collected so far:
- Integration wizard (Harbor + GitHub App): works end-to-end
- Advisory Check All: fixed (parallel individual checks)
- Mirror domain creation: works, generate-immediately fails silently
- Topology wizard Step 1 (Region): blocked by auth passthrough issue
- Topology wizard Step 2 (Environment): POST to JobEngine needs verify
- User ID resolution: raw hashes shown everywhere
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The topology setup wizard calls /api/v1/regions, /api/v1/infrastructure-bindings,
/api/v1/pending-deletions, and target/environment readiness+validate endpoints
which are registered on the Concelier service. Without explicit gateway routes,
these fall through to the generic Microservice matcher which tries to find a
non-existent "regions" service, returning 503.
Added 6 Microservice routes forwarding topology API paths to
http://concelier.stella-ops.local. Both compose mount config and source
appsettings.json updated.
Verified: /api/v1/regions now returns 401 (auth required) instead of 503.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backend: ExtractParameters() now discovers query params from [AsParameters]
records and [FromQuery] attributes via handler method reflection. Gateway
OpenApiDocumentGenerator emits parameters arrays in the aggregated spec.
QueryParameterInfo added to EndpointSchemaInfo for HELLO payload transport.
Frontend: Remaining spec files and straggler services updated to canonical
X-Stella-Ops-* header names. Sprint 026 archived (tasks 01-06 DONE,
07-09 TODO for backend service rename pass).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The gateway service was a redundant deployment of the same
StellaOps.Gateway.WebService binary already running as router-gateway.
It served no unique purpose — all traffic is handled by router-gateway
(slot 0). This removes the container, its route table entries, nginx
proxy blocks, health/quota stubs, and redirects STELLAOPS_GATEWAY_URL
to router.stella-ops.local so the Angular frontend resolves API base
URLs through the canonical frontdoor.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hardcoded 1-5s polling constants with configurable
QueueWaitTimeoutSeconds (default 0 = pure event-driven). Consumers
now only wake on pub/sub notifications, eliminating ~118 idle
XREADGROUP polls per second across 59 services. Override with
VALKEY_QUEUE_WAIT_TIMEOUT env var if a safety-net poll is needed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Route /api/v1/jobengine to jobengine service (was orchestrator)
- Route /api/v1/sources and /api/v1/witnesses to scanner service
- Add orch:quota and pack-registry scopes to platform OIDC token
- Align compose-local manifests with gateway appsettings.json
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Valkey transport layer used 100ms busy-polling loops (Task.Delay(100))
across ~90 concurrent loops in 45+ services, generating ~900 idle
commands/sec and burning ~58% CPU while the system was completely idle.
Replace polling with Redis Pub/Sub notifications:
- Publishers fire PUBLISH after each XADD (fire-and-forget)
- Consumers SUBSCRIBE and wait on SemaphoreSlim with 30s fallback timeout
- Applies to both ValkeyMessageQueue (INotifiableQueue) and ValkeyEventStream
- Non-Valkey transports fall back to 1s polling via QueueWaitExtensions
Increase heartbeat interval from 10s to 45s across all transport options,
with corresponding health threshold adjustments (stale: 135s, degraded: 90s).
Expected idle CPU reduction: ~58% → ~3-5%.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Valkey messaging transport auto-reconnection:
- MessagingTransportClient: detect persistent Redis failures (5 consecutive)
and exit processing loops instead of retrying forever with dead connection
- IMicroserviceTransport: add TransportDied event to interface
- RouterConnectionManager: listen for TransportDied, auto-reconnect after 2s
- Fixes services becoming unreachable after Valkey blip during restarts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Plugin assemblies loaded via PluginHost into isolated AssemblyLoadContexts
produce distinct types even from the same DLL. When AppDomain.GetAssemblies()
returns both Default and plugin-ALC copies, DI registration and IOptions<T>
resolution silently fail (e.g. ValkeyTransportOptions defaulting to localhost).
Applied AssemblyLoadContext.Default filter to all 7 assembly discovery sites:
- MessagingServiceCollectionExtensions (transport plugin scan)
- StellaRouterIntegrationHelper (transport plugin loader)
- Gateway.WebService Program.cs (startup transport scan)
- GeneratedEndpointDiscoveryProvider (endpoint provider scan)
- ReflectionEndpointDiscoveryProvider (endpoint attribute scan)
- ServiceCollectionExtensions (schema provider scan)
- MigrationModulePluginDiscovery (migration plugin scan)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>