# Router Architecture

This document is the canonical specification for the StellaOps Router system.

- Tenant selection and header propagation contract: `docs/architecture/decisions/ADR-002-multi-tenant-same-api-key-selection.md`
- Service impact ledger: `docs/technical/architecture/multi-tenant-service-impact-ledger.md`
- Flow sequences: `docs/technical/architecture/multi-tenant-flow-sequences.md`
- Rollout policy: `docs/operations/multi-tenant-rollout-and-compatibility.md`

> **Location clarification (updated 2026-03-04).** The Router (`src/Router/`) hosts `StellaOps.Gateway.WebService` with configurable route tables via `GatewayRouteCatalog`, reverse proxy support, SPA fallback hosting, WebSocket routing, Valkey messaging transport integration, and `StellaOpsRouteResolver` for front-door dispatching. This is the canonical deployment for HTTP ingress. The standalone `src/Gateway/` was deleted in Sprint 200.

## System Architecture

### Scope

- A single HTTP ingress service (`StellaOps.Gateway.WebService`) handles all external HTTP traffic
- Microservices communicate with the Gateway using binary transports (TCP, TLS, UDP, RabbitMQ)
- HTTP is not used for internal microservice-to-gateway traffic
- Request/response bodies are opaque to the router (raw bytes/streams)
- Forwarded HTTP headers remain case-insensitive across Router frame transport and ASP.NET bridge dispatch. Lowercase HTTP/2 names such as `content-type` must be preserved for JSON-bound endpoints, and the ASP.NET bridge must mark POST/PUT/PATCH requests as body-capable so minimal-API JSON binding survives frame dispatch
- Gateway scope authorization evaluates against the resolved per-request scope set from identity expansion (`GatewayContextKeys.Scopes`), so coarse compatibility scopes such as `orch:quota` can satisfy their fine-grained frontdoor equivalents without changing downstream policy names

### Transport Architecture

Each transport connection carries:

- Initial registration (HELLO) and endpoint configuration
- Ongoing heartbeats
- Request/response data frames
- Streaming data frames
- Cancellation frames

```
┌─────────────────┐                           ┌─────────────────┐
│  Microservice   │                           │     Gateway     │
│                 │          HELLO            │                 │
│  Endpoints:     │ ─────────────────────────►│  Routing        │
│  - POST /items  │        HEARTBEAT          │  State          │
│  - GET /items   │ ◄────────────────────────►│                 │
│                 │                           │  Connections[]  │
│                 │    REQUEST / RESPONSE     │                 │
│                 │ ◄────────────────────────►│                 │
│                 │                           │                 │
│                 │   STREAM_DATA / CANCEL    │                 │
│                 │ ◄────────────────────────►│                 │
└─────────────────┘                           └─────────────────┘
```

---

## Front Door (Configurable Route Table)

The Router Gateway serves as the **single HTTP entry point** for the entire StellaOps platform. In addition to binary transport routing for microservices, it handles:

- **Static file serving** (Angular SPA dist)
- **Reverse proxy** to HTTP-only backend services
- **WebSocket proxy** to upstream WebSocket servers
- **SPA fallback** (extensionless paths serve `index.html`)
- **Custom error pages** (404/500 HTML fallback)

### Route Table Model

Routes are configured in `Gateway:Routes` as a `StellaOpsRoute[]` array, evaluated **first-match-wins**:

```csharp
public sealed class StellaOpsRoute
{
    public StellaOpsRouteType Type { get; set; }
    public string Path { get; set; }
    public bool IsRegex { get; set; }
    public string? TranslatesTo { get; set; }
    public Dictionary<string, string> Headers { get; set; }
}
```

Route types:

| Type | Behavior |
|------|----------|
| `ReverseProxy` | Strip path prefix, forward to `TranslatesTo` HTTP URL |
| `StaticFiles` | Serve files from `TranslatesTo` directory, SPA fallback if `x-spa-fallback: true` header set |
| `StaticFile` | Serve a single file at exact path match |
| `WebSocket` | Bidirectional WebSocket proxy to `TranslatesTo` ws:// URL |
| `Microservice` | Pass through to binary transport pipeline |
| `NotFoundPage` | HTML file served on 404 (after all other middleware) |
| `ServerErrorPage` | HTML file served on 5xx (after all other middleware) |

Reverse proxy is reserved for external/bootstrap surfaces such as OIDC browser flows, Rekor, and frontdoor static assets. First-party Stella API surfaces are expected to use `Microservice` routing so the gateway remains the single routing authority instead of silently bypassing router registration state.

### Pipeline Order

System paths (`/health`, `/metrics`, `/openapi.*`) bypass the route table entirely. The dispatch middleware runs before the microservice pipeline:

```
HealthCheckMiddleware          → (system paths: health, metrics)
RouteDispatchMiddleware        → (static files, reverse proxy, websocket)
MapRouterOpenApi               → (OpenAPI endpoints)
UseWhen(non-system)            → (microservice pipeline: auth, routing, transport)
ErrorPageFallbackMiddleware    → (custom 404/500 pages)
```

### Docker Architecture

```
Browser → Router Gateway (port 80) → [microservices via binary transport]
                                   → [HTTP backends via reverse proxy]
                                   → [Angular SPA from /app/wwwroot volume]
```

The Angular SPA dist is provided by a `console-builder` init container that copies the built files to a shared `console-dist` volume mounted at `/app/wwwroot`.

When the gateway runs in-container, listener binding must honor explicit `ASPNETCORE_URLS` / `ASPNETCORE_HTTP_PORTS` / `ASPNETCORE_HTTPS_PORTS` values from compose. Wildcard hosts (`+`, `*`) are normalized to `0.0.0.0` before Kestrel listeners are created so the declared HTTP frontdoor contract actually comes up.

---

## Service Identity

### Instance Identity

Each microservice instance is identified by:

| Field | Type | Description |
|-------|------|-------------|
| `ServiceName` | string | Logical service name (e.g., "billing") |
| `Version` | string | Semantic version (`major.minor.patch`) |
| `Region` | string | Deployment region (e.g., "us-east-1") |
| `InstanceId` | string | Unique instance identifier |

### Version Matching

- Version matching is strict semver equality
- The Router routes only to instances whose version matches exactly
- The service's default version is used when the client does not specify one

### Region Configuration

Gateway region comes from `GatewayNodeConfig`:

```csharp
public sealed class GatewayNodeConfig
{
    public required string Region { get; init; }       // e.g., "eu1"
    public required string NodeId { get; init; }       // e.g., "gw-eu1-01"
    public required string Environment { get; init; }  // e.g., "prod"
}
```

Region is never derived from HTTP headers or URL hostnames.

---

## Endpoint Model

### Endpoint Identity

Endpoint identity is `(HTTP Method, Path)`:

| Field | Example |
|-------|---------|
| Method | `GET`, `POST`, `PUT`, `PATCH`, `DELETE` |
| Path | `/invoices`, `/items/{id}`, `/users/{userId}/orders` |

### Endpoint Descriptor

Each endpoint includes:

```csharp
public sealed class EndpointDescriptor
{
    public required string Method { get; init; }
    public required string Path { get; init; }
    public required string ServiceName { get; init; }
    public required string Version { get; init; }
    public TimeSpan DefaultTimeout { get; init; }
    public bool SupportsStreaming { get; init; }
    public IReadOnlyList<ClaimRequirement> RequiringClaims { get; init; } = [];
    public EndpointSchemaInfo? SchemaInfo { get; init; }
}
```

### Path Matching

- ASP.NET-style route templates
- Parameter segments: `{id}`, `{userId}`
- Extra path segments are consumed only by explicit catch-all parameters (`{**path}`); ordinary terminal parameters must not behave like implicit catch-alls during messaging transport dispatch
- Case sensitivity and trailing slash handling follow ASP.NET conventions

---

## Routing Algorithm

### Instance Selection

Given `(ServiceName, Version, Method, Path)`:

1. **Filter candidates**:
   - Match `ServiceName` exactly
   - Match `Version` exactly (strict semver)
   - Health status in acceptable set (`Healthy` or `Degraded`)

2. **Region preference**:
   - Prefer instances where `Region == GatewayNodeConfig.Region`
   - Fall back to configured neighbor regions
   - Fall back to all other regions

3. **Within region tier**:
   - Prefer lower `AveragePingMs`
   - If tied, prefer more recent `LastHeartbeatUtc`
   - If still tied, use round-robin balancing

### Instance Health

```csharp
public enum InstanceHealthStatus
{
    Unknown,
    Healthy,
    Degraded,
    Draining,
    Unhealthy
}
```

Health metadata per connection:

| Field | Type | Description |
|-------|------|-------------|
| `Status` | enum | Current health status |
| `LastHeartbeatUtc` | DateTime | Last heartbeat timestamp |
| `AveragePingMs` | double | Average round-trip latency |

---

## Transport Layer

### Transport Types

| Transport | Use Case | Streaming | Notes |
|-----------|----------|-----------|-------|
| InMemory | Testing | Yes | In-process channels |
| TCP | Production | Yes | Length-prefixed frames |
| TLS | Secure | Yes | Certificate-based encryption |
| UDP | Small payloads | No | Single datagram per frame |
| RabbitMQ | Queuing | Yes | Exchange/queue routing |

### Transport Plugin Interface

```csharp
public interface ITransportServer
{
    Task StartAsync(CancellationToken ct);
    Task StopAsync(CancellationToken ct);

    event Func OnHelloReceived;
    event Func OnHeartbeatReceived;
    event Func OnConnectionClosed;
}

public interface ITransportClient
{
    Task ConnectAsync(CancellationToken ct);
    Task DisconnectAsync(CancellationToken ct);
    Task SendFrameAsync(Frame frame, CancellationToken ct);
}
```

### Frame Types

```csharp
public enum FrameType : byte
{
    Hello = 1,
    Heartbeat = 2,
    Request = 3,
    Response = 4,
    RequestStreamData = 5,
    ResponseStreamData = 6,
    Cancel = 7
}
```

---

## Gateway Pipeline

### HTTP Middleware Stack

```
Request ─►│ ForwardedHeaders        │
          │ RequestLogging          │
          │ ErrorHandling           │
          │ Authentication          │
          │ EndpointResolution      │ ◄── (Method, Path) → EndpointDescriptor
          │ Authorization           │ ◄── RequiringClaims check
          │ RoutingDecision         │ ◄── Select connection/instance
          │ TransportDispatch       │ ◄── Send to microservice
          ▼
```

### Identity Header Policy and Tenant Selection

- Gateway strips client-supplied reserved identity headers (`X-StellaOps-*`, legacy aliases, raw claim headers, and auth headers) before proxying.
- Effective tenant is claim-derived from validated principal claims (`stellaops:tenant`, then bounded legacy `tid` fallback).
- Per-request tenant override is disabled by default and only works when explicitly enabled with `Gateway:Auth:EnableTenantOverride=true` and the requested tenant exists in `stellaops:allowed_tenants`.
- Authorization/DPoP passthrough is fail-closed:
  - route must be configured with `PreserveAuthHeaders=true`, and
  - route prefix must also be in the approved passthrough allow-list configured under `Gateway:Auth:ApprovedAuthPassthroughPrefixes`.
  - local frontdoor configs approve `/connect`, `/console`, `/authority`, `/doctor`, `/api`, `/policy/shadow`, and `/policy/simulations` so live policy compatibility endpoints can preserve DPoP/JWT passthrough without broadening unrelated routes.
- Tenant override attempts are logged with deterministic fields including route, actor, requested tenant, and resolved tenant.
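The tenant-selection and auth-passthrough rules above can be sketched roughly as follows. This is a minimal illustration, not the Gateway's implementation: `TenantSelectionSketch`, `ResolveEffectiveTenant`, and `MayPreserveAuthHeaders` are hypothetical names. The sketch also assumes that `stellaops:allowed_tenants` carries space-separated values, that a rejected override falls back to the claim-derived tenant (the real gateway may reject the request instead), and that prefixes match on whole path segments. The "bounded" part of the legacy `tid` fallback is not modeled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Claims;

// Illustrative sketch only: these helper names are hypothetical
// and are not taken from the Router codebase.
public static class TenantSelectionSketch
{
    // Returns the tenant a request would run under.
    public static string? ResolveEffectiveTenant(
        ClaimsPrincipal principal,
        string? requestedTenant,
        bool enableTenantOverride)
    {
        // Claim-derived tenant: canonical claim first, then the legacy `tid` claim.
        var claimTenant = principal.FindFirst("stellaops:tenant")?.Value
                          ?? principal.FindFirst("tid")?.Value;

        // Override is off by default; without it the claim tenant always wins.
        if (!enableTenantOverride || string.IsNullOrWhiteSpace(requestedTenant))
        {
            return claimTenant;
        }

        // Assumption: allowed tenants arrive as one or more space-separated claim values.
        var allowed = principal.FindAll("stellaops:allowed_tenants")
            .SelectMany(c => c.Value.Split(' ', StringSplitOptions.RemoveEmptyEntries));

        // Assumption: a disallowed override falls back instead of failing the request.
        return allowed.Contains(requestedTenant, StringComparer.Ordinal)
            ? requestedTenant
            : claimTenant;
    }

    // Fail-closed: both the route flag and the approved-prefix list must agree.
    public static bool MayPreserveAuthHeaders(
        string requestPath,
        bool preserveAuthHeaders,
        IReadOnlyCollection<string> approvedPrefixes)
    {
        if (!preserveAuthHeaders)
        {
            return false;
        }

        // Match whole segments so "/connect" does not approve "/connection".
        return approvedPrefixes.Any(prefix =>
            requestPath.Equals(prefix, StringComparison.OrdinalIgnoreCase) ||
            requestPath.StartsWith(prefix.TrimEnd('/') + "/", StringComparison.OrdinalIgnoreCase));
    }
}
```

With a principal holding `stellaops:tenant = acme` and `stellaops:allowed_tenants = "acme beta"`, a request for `beta` resolves to `beta` only when the override flag is enabled; in every other case the claim-derived `acme` is used.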
### Connection State

Per-connection state maintained by Gateway:

```csharp
public sealed class ConnectionState
{
    public required string ConnectionId { get; init; }
    public required InstanceDescriptor Instance { get; init; }
    public InstanceHealthStatus Status { get; set; }
    public DateTime? LastHeartbeatUtc { get; set; }
    public double AveragePingMs { get; set; }
    public TransportType TransportType { get; init; }
    public Dictionary<(string Method, string Path), EndpointDescriptor> Endpoints { get; } = new();
    public IReadOnlyDictionary Schemas { get; init; } = new Dictionary();
}
```

### Payload Handling

The Gateway treats bodies as opaque byte sequences:

- No deserialization or schema interpretation
- Headers and bytes forwarded as-is
- Schema validation is the microservice's responsibility

### Payload Limits

Configurable limits protect against resource exhaustion:

| Limit | Scope |
|-------|-------|
| `MaxRequestBytesPerCall` | Single request |
| `MaxRequestBytesPerConnection` | All requests on connection |
| `MaxAggregateInflightBytes` | All in-flight across gateway |

Exceeded limits result in:

- Early rejection (HTTP 413) if `Content-Length` known
- Mid-stream abort with CANCEL frame
- Appropriate error response (413 or 503)

---

## Microservice SDK

### Configuration

```csharp
services.AddStellaMicroservice(options =>
{
    options.ServiceName = "billing";
    options.Version = "1.0.0";
    options.Region = "us-east-1";
    options.InstanceId = Guid.NewGuid().ToString();
    options.ServiceDescription = "Invoice processing service";
});
```

### Endpoint Declaration

Attributes:

```csharp
[StellaEndpoint("POST", "/invoices")]
public sealed class CreateInvoiceEndpoint : IStellaEndpoint
```

### Handler Interfaces

**Typed handler** (JSON serialization):

```csharp
// Handler with a request body and a response body.
public interface IStellaEndpoint<TRequest, TResponse>
{
    Task<TResponse> HandleAsync(TRequest request, CancellationToken ct);
}

// Handler with a response body only.
public interface IStellaEndpoint<TResponse>
{
    Task<TResponse> HandleAsync(CancellationToken ct);
}
```

**Raw handler** (streaming):

```csharp
public interface IRawStellaEndpoint
{
    Task HandleAsync(RawRequestContext ctx, CancellationToken ct);
}
```

### Endpoint Discovery

Two mechanisms:

1. **Source Generator** (preferred): Compile-time discovery via Roslyn
2. **Reflection** (fallback): Runtime assembly scanning

### Connection Behavior

On connection:

1. Send HELLO with instance info and endpoints
2. Start heartbeat timer
3. Listen for REQUEST frames

HELLO payload:

```csharp
public sealed class HelloPayload
{
    public required InstanceDescriptor Instance { get; init; }
    public required IReadOnlyList<EndpointDescriptor> Endpoints { get; init; }
    public IReadOnlyDictionary Schemas { get; init; } = new Dictionary();
    public ServiceOpenApiInfo? OpenApiInfo { get; init; }
}
```

---

## Authorization

### Claims-based Model

Authorization uses `RequiringClaims`, not roles:

```csharp
public sealed class ClaimRequirement
{
    public required string Type { get; init; }
    public string? Value { get; init; }
}
```

### Precedence

1. Microservice provides defaults in HELLO
2. Authority can override centrally
3. Gateway enforces final effective claims

### Enforcement

Gateway `AuthorizationMiddleware`:

- Validates user principal has all required claims
- Empty claims list = authenticated access only
- Missing claim = 403 Forbidden

---

## Cancellation

### CANCEL Frame

```csharp
public sealed class CancelPayload
{
    public required string Reason { get; init; }
    // Values: "ClientDisconnected", "Timeout", "PayloadLimitExceeded", "Shutdown"
}
```

### Gateway sends CANCEL when:

- HTTP client disconnects (`HttpContext.RequestAborted`)
- Request timeout elapses
- Payload limit exceeded
- Gateway shutdown

### Microservice handles CANCEL:

- Maps correlation ID to `CancellationTokenSource`
- Calls `Cancel()` on the source
- Handler receives cancellation via `CancellationToken`

---

## Streaming

### Buffered vs Streaming

| Mode | Request Body | Response Body | Use Case |
|------|--------------|---------------|----------|
| Buffered | Full in memory | Full in memory | Small payloads |
| Streaming | Chunked frames | Chunked frames | Large payloads |

### Frame Flow (Streaming)

```
Gateway                                Microservice
   │                                      │
   │  REQUEST (headers only)              │
   │ ────────────────────────────────────►│
   │                                      │
   │  REQUEST_STREAM_DATA (chunk 1)       │
   │ ────────────────────────────────────►│
   │                                      │
   │  REQUEST_STREAM_DATA (chunk n)       │
   │ ────────────────────────────────────►│
   │                                      │
   │  REQUEST_STREAM_DATA (final=true)    │
   │ ────────────────────────────────────►│
   │                                      │
   │  RESPONSE                            │
   │◄─────────────────────────────────────│
   │                                      │
   │  RESPONSE_STREAM_DATA                │
   │◄─────────────────────────────────────│
```

---

## Heartbeat & Health

### Heartbeat Frame

Sent at regular intervals over the same connection as requests:

```csharp
public sealed class HeartbeatPayload
{
    public required InstanceHealthStatus Status { get; init; }
    public int InflightRequests { get; init; }
    public double ErrorRate { get; init; }
}
```

### Health Tracking

The Gateway:

- Tracks `LastHeartbeatUtc` per connection
- Derives status from heartbeat recency
- Marks stale instances as Unhealthy
- Uses health in routing decisions

Additional health and readiness rules:

- Messaging transports stay push-first even when backed by notifiable queues; the missed-notification safety-net timeout is derived from the configured heartbeat interval and clamped to a short bounded window instead of falling back to a fixed long poll.
- Gateway degraded and stale transitions are normalized against the messaging heartbeat contract. A gateway may not mark an instance `Degraded` earlier than `2x` the heartbeat interval or `Unhealthy` earlier than `3x` the heartbeat interval, even when looser defaults were configured.
- `/health/ready` is stricter than "process started": it remains `503` until the configured required first-party microservices have live healthy or degraded registrations in router state. Local scratch compose uses this to hold the frontdoor unhealthy until the core Stella API surface has replayed HELLO after a rebuild.
- The required-service list must use canonical router `serviceName` values, not loose product-family aliases. Gateway readiness normalizes host-style suffixes such as `-gateway`, `-web`, `.stella-ops.local`, and ports, but it does not treat sibling services as interchangeable.
- When a request already matched a configured `Microservice` route but the target service has not registered yet, the gateway returns `503 Service Unavailable`, not `404 Not Found`. `404` remains reserved for genuinely unknown paths or missing endpoints on an otherwise registered service.

Periodic HELLO re-registration is valid so a microservice can repopulate gateway state after a gateway restart, but it must refresh the existing logical transport connection instead of minting a second one. Gateway routing state also deduplicates by service instance identity (`ServiceName`, `Version`, `InstanceId`, transport) before re-indexing endpoints so repeated HELLO frames cannot accumulate stale route candidates.
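The `2x` / `3x` heartbeat floors described under Health Tracking can be sketched as a simple clamp. This is only an illustration under stated assumptions: `HeartbeatThresholdSketch`, `SketchHealth`, `Normalize`, and `Classify` are hypothetical names, and classifying purely by elapsed time since the last heartbeat is an assumption, since the real gateway also considers the status reported in the heartbeat payload.

```csharp
using System;

// Hypothetical, simplified status set used only by this sketch.
public enum SketchHealth { Healthy, Degraded, Unhealthy }

// Illustrative sketch only: not the Gateway's actual health tracker.
public static class HeartbeatThresholdSketch
{
    // Raises configured thresholds to the floors implied by the heartbeat interval:
    // Degraded no earlier than 2x the interval, Unhealthy no earlier than 3x.
    public static (TimeSpan DegradedAfter, TimeSpan UnhealthyAfter) Normalize(
        TimeSpan heartbeatInterval,
        TimeSpan configuredDegradedAfter,
        TimeSpan configuredUnhealthyAfter)
    {
        var minDegraded = TimeSpan.FromTicks(heartbeatInterval.Ticks * 2);
        var minUnhealthy = TimeSpan.FromTicks(heartbeatInterval.Ticks * 3);

        return (Max(configuredDegradedAfter, minDegraded),
                Max(configuredUnhealthyAfter, minUnhealthy));
    }

    // Assumption: status is derived only from time elapsed since the last heartbeat.
    public static SketchHealth Classify(
        TimeSpan sinceLastHeartbeat,
        TimeSpan degradedAfter,
        TimeSpan unhealthyAfter)
    {
        if (sinceLastHeartbeat >= unhealthyAfter) return SketchHealth.Unhealthy;
        if (sinceLastHeartbeat >= degradedAfter) return SketchHealth.Degraded;
        return SketchHealth.Healthy;
    }

    private static TimeSpan Max(TimeSpan a, TimeSpan b) => a >= b ? a : b;
}
```

For a 10-second heartbeat interval with configured thresholds of 5 and 15 seconds, `Normalize` yields 20 and 30 seconds, so an instance that has been silent for 25 seconds would classify as `Degraded` instead of `Unhealthy`.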
---

## Configuration

### Router YAML

```yaml
# router.yaml
Gateway:
  Region: "us-east-1"
  NodeId: "gw-east-01"
  Environment: "production"

PayloadLimits:
  MaxRequestBytesPerCall: 10485760          # 10 MB
  MaxRequestBytesPerConnection: 104857600   # 100 MB
  MaxAggregateInflightBytes: 1073741824     # 1 GB

Services:
  - ServiceName: billing
    DefaultVersion: "1.0.0"
    DefaultTransport: Tcp
    Endpoints:
      - Method: POST
        Path: /invoices
        TimeoutSeconds: 30
        RequiringClaims:
          - Type: "invoices:write"

OpenApi:
  Title: "StellaOps Gateway API"
  CacheTtlSeconds: 60
```

### Hot Reload

- YAML changes picked up at runtime
- Routing state updated without restart
- New services/endpoints added dynamically

---

## Error Mapping

| Condition | HTTP Status |
|-----------|-------------|
| Version not found | 404 Not Found |
| No healthy instance | 503 Service Unavailable |
| Request timeout | 504 Gateway Timeout |
| Payload too large | 413 Payload Too Large |
| Unauthorized | 401 Unauthorized |
| Missing claims | 403 Forbidden |
| Validation error | 422 Unprocessable Entity |
| Rate limit exceeded | 429 Too Many Requests |
| Internal error | 500 Internal Server Error |

---

## See Also

- [schema-validation.md](schema-validation.md) - JSON Schema validation
- [openapi-aggregation.md](openapi-aggregation.md) - OpenAPI document generation
- [migration-guide.md](migration-guide.md) - WebService to Microservice migration
- [rate-limiting.md](rate-limiting.md) - Centralized Router rate limiting