# Router Architecture

This document is the canonical specification for the StellaOps Router system.

- Tenant selection and header propagation contract: `docs/architecture/decisions/ADR-002-multi-tenant-same-api-key-selection.md`
- Service impact ledger: `docs/technical/architecture/multi-tenant-service-impact-ledger.md`
- Flow sequences: `docs/technical/architecture/multi-tenant-flow-sequences.md`
- Rollout policy: `docs/operations/multi-tenant-rollout-and-compatibility.md`

> **Location clarification (updated 2026-03-04).** The Router (`src/Router/`) hosts `StellaOps.Gateway.WebService` with configurable route tables via `GatewayRouteCatalog`, reverse proxy support, SPA fallback hosting, WebSocket routing, Valkey messaging transport integration, and `StellaOpsRouteResolver` for front-door dispatching. This is the canonical deployment for HTTP ingress. The standalone `src/Gateway/` was deleted in Sprint 200.

## System Architecture

### Scope

- A single HTTP ingress service (`StellaOps.Gateway.WebService`) handles all external HTTP traffic
- Microservices communicate with the Gateway using binary transports (TCP, TLS, UDP, RabbitMQ)
- HTTP is not used for internal microservice-to-gateway traffic
- Request/response bodies are opaque to the router (raw bytes/streams)
- Forwarded HTTP headers remain case-insensitive across Router frame transport and ASP.NET bridge dispatch; lowercase HTTP/2 names such as `content-type` must be preserved for JSON-bound endpoints, and the ASP.NET bridge must mark POST/PUT/PATCH requests as body-capable so minimal-API JSON binding survives frame dispatch

### Transport Architecture

Each transport connection carries:

- Initial registration (HELLO) and endpoint configuration
- Ongoing heartbeats
- Request/response data frames
- Streaming data frames
- Cancellation frames

```
┌─────────────────┐                           ┌─────────────────┐
│  Microservice   │                           │     Gateway     │
│                 │          HELLO            │                 │
│  Endpoints:     │ ─────────────────────────►│  Routing        │
│  - POST /items  │        HEARTBEAT          │  State          │
│  - GET /items   │ ◄────────────────────────►│                 │
│                 │                           │  Connections[]  │
│                 │    REQUEST / RESPONSE     │                 │
│                 │ ◄────────────────────────►│                 │
│                 │                           │                 │
│                 │   STREAM_DATA / CANCEL    │                 │
│                 │ ◄────────────────────────►│                 │
└─────────────────┘                           └─────────────────┘
```

---

## Front Door (Configurable Route Table)

The Router Gateway serves as the **single HTTP entry point** for the entire StellaOps platform. In addition to binary transport routing for microservices, it handles:

- **Static file serving** (Angular SPA dist)
- **Reverse proxy** to HTTP-only backend services
- **WebSocket proxy** to upstream WebSocket servers
- **SPA fallback** (extensionless paths serve `index.html`)
- **Custom error pages** (404/500 HTML fallback)

### Route Table Model

Routes are configured in `Gateway:Routes` as a `StellaOpsRoute[]` array, evaluated **first-match-wins**:

```csharp
public sealed class StellaOpsRoute
{
    public StellaOpsRouteType Type { get; set; }
    public string Path { get; set; }
    public bool IsRegex { get; set; }
    public string? TranslatesTo { get; set; }
    public Dictionary<string, string> Headers { get; set; }
}
```

Route types:

| Type | Behavior |
|------|----------|
| `ReverseProxy` | Strip path prefix, forward to `TranslatesTo` HTTP URL |
| `StaticFiles` | Serve files from `TranslatesTo` directory, SPA fallback if `x-spa-fallback: true` header set |
| `StaticFile` | Serve a single file at exact path match |
| `WebSocket` | Bidirectional WebSocket proxy to `TranslatesTo` ws:// URL |
| `Microservice` | Pass through to binary transport pipeline |
| `NotFoundPage` | HTML file served on 404 (after all other middleware) |
| `ServerErrorPage` | HTML file served on 5xx (after all other middleware) |

### Pipeline Order

System paths (`/health`, `/metrics`, `/openapi.*`) bypass the route table entirely.
The dispatch middleware runs before the microservice pipeline:

```
HealthCheckMiddleware          → (system paths: health, metrics)
RouteDispatchMiddleware        → (static files, reverse proxy, websocket)
MapRouterOpenApi               → (OpenAPI endpoints)
UseWhen(non-system)            → (microservice pipeline: auth, routing, transport)
ErrorPageFallbackMiddleware    → (custom 404/500 pages)
```

### Docker Architecture

```
Browser → Router Gateway (port 80) → [microservices via binary transport]
                                   → [HTTP backends via reverse proxy]
                                   → [Angular SPA from /app/wwwroot volume]
```

The Angular SPA dist is provided by a `console-builder` init container that copies the built files to a shared `console-dist` volume mounted at `/app/wwwroot`.

---

## Service Identity

### Instance Identity

Each microservice instance is identified by:

| Field | Type | Description |
|-------|------|-------------|
| `ServiceName` | string | Logical service name (e.g., "billing") |
| `Version` | string | Semantic version (`major.minor.patch`) |
| `Region` | string | Deployment region (e.g., "us-east-1") |
| `InstanceId` | string | Unique instance identifier |

### Version Matching

- Version matching is strict semver equality
- Router only routes to instances with exact version match
- Default version used when client doesn't specify

### Region Configuration

Gateway region comes from `GatewayNodeConfig`:

```csharp
public sealed class GatewayNodeConfig
{
    public required string Region { get; init; }       // e.g., "eu1"
    public required string NodeId { get; init; }       // e.g., "gw-eu1-01"
    public required string Environment { get; init; }  // e.g., "prod"
}
```

Region is never derived from HTTP headers or URL hostnames.

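
The version-matching rules in this section can be illustrated with a minimal sketch. It assumes that "strict semver equality" can be treated as an ordinal comparison of the version strings and that service names are compared case-sensitively; neither detail is stated in this document. `InstanceIdentity` and `VersionMatching` are hypothetical names introduced only for this example, with fields taken from the Instance Identity table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record for illustration; fields follow the Instance Identity table.
public sealed record InstanceIdentity(
    string ServiceName, string Version, string Region, string InstanceId);

public static class VersionMatching
{
    // Returns the instances of a service whose version equals the effective
    // version exactly. The effective version is the requested one, or the
    // configured default when the client did not specify a version.
    // Assumption: ordinal string equality stands in for strict semver equality.
    public static IReadOnlyList<InstanceIdentity> Match(
        IEnumerable<InstanceIdentity> instances,
        string serviceName,
        string? requestedVersion,
        string defaultVersion)
    {
        var effective = string.IsNullOrEmpty(requestedVersion)
            ? defaultVersion
            : requestedVersion;

        return instances
            .Where(i => string.Equals(i.ServiceName, serviceName, StringComparison.Ordinal)
                     && string.Equals(i.Version, effective, StringComparison.Ordinal))
            .ToList();
    }
}
```

In this sketch a request for `1.0` does not match an instance registered as `1.0.0`, because there is no range or prefix matching, only exact equality.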

---

## Endpoint Model

### Endpoint Identity

Endpoint identity is `(HTTP Method, Path)`:

| Field | Example |
|-------|---------|
| Method | `GET`, `POST`, `PUT`, `PATCH`, `DELETE` |
| Path | `/invoices`, `/items/{id}`, `/users/{userId}/orders` |

### Endpoint Descriptor

Each endpoint includes:

```csharp
public sealed class EndpointDescriptor
{
    public required string Method { get; init; }
    public required string Path { get; init; }
    public required string ServiceName { get; init; }
    public required string Version { get; init; }
    public TimeSpan DefaultTimeout { get; init; }
    public bool SupportsStreaming { get; init; }
    public IReadOnlyList<ClaimRequirement> RequiringClaims { get; init; } = [];
    public EndpointSchemaInfo? SchemaInfo { get; init; }
}
```

### Path Matching

- ASP.NET-style route templates
- Parameter segments: `{id}`, `{userId}`
- Case sensitivity and trailing slash handling follow ASP.NET conventions

---

## Routing Algorithm

### Instance Selection

Given `(ServiceName, Version, Method, Path)`:

1. **Filter candidates**:
   - Match `ServiceName` exactly
   - Match `Version` exactly (strict semver)
   - Health status in acceptable set (`Healthy` or `Degraded`)
2. **Region preference**:
   - Prefer instances where `Region == GatewayNodeConfig.Region`
   - Fall back to configured neighbor regions
   - Fall back to all other regions
3. **Within region tier**:
   - Prefer lower `AveragePingMs`
   - If tied, prefer more recent `LastHeartbeatUtc`
   - If still tied, use round-robin balancing

### Instance Health

```csharp
public enum InstanceHealthStatus
{
    Unknown,
    Healthy,
    Degraded,
    Draining,
    Unhealthy
}
```

Health metadata per connection:

| Field | Type | Description |
|-------|------|-------------|
| `Status` | enum | Current health status |
| `LastHeartbeatUtc` | DateTime | Last heartbeat timestamp |
| `AveragePingMs` | double | Average round-trip latency |

---

## Transport Layer

### Transport Types

| Transport | Use Case | Streaming | Notes |
|-----------|----------|-----------|-------|
| InMemory | Testing | Yes | In-process channels |
| TCP | Production | Yes | Length-prefixed frames |
| TLS | Secure | Yes | Certificate-based encryption |
| UDP | Small payloads | No | Single datagram per frame |
| RabbitMQ | Queuing | Yes | Exchange/queue routing |

### Transport Plugin Interface

```csharp
public interface ITransportServer
{
    Task StartAsync(CancellationToken ct);
    Task StopAsync(CancellationToken ct);
    event Func OnHelloReceived;
    event Func OnHeartbeatReceived;
    event Func OnConnectionClosed;
}

public interface ITransportClient
{
    Task ConnectAsync(CancellationToken ct);
    Task DisconnectAsync(CancellationToken ct);
    Task SendFrameAsync(Frame frame, CancellationToken ct);
}
```

### Frame Types

```csharp
public enum FrameType : byte
{
    Hello = 1,
    Heartbeat = 2,
    Request = 3,
    Response = 4,
    RequestStreamData = 5,
    ResponseStreamData = 6,
    Cancel = 7
}
```

---

## Gateway Pipeline

### HTTP Middleware Stack

```
Request ─►│ ForwardedHeaders       │
          │ RequestLogging         │
          │ ErrorHandling          │
          │ Authentication         │
          │ EndpointResolution     │ ◄── (Method, Path) → EndpointDescriptor
          │ Authorization          │ ◄── RequiringClaims check
          │ RoutingDecision        │ ◄── Select connection/instance
          │ TransportDispatch      │ ◄── Send to microservice
          ▼
```

### Identity Header Policy and Tenant Selection

- Gateway strips client-supplied reserved identity headers (`X-StellaOps-*`, legacy aliases, raw claim headers, and auth headers) before proxying.
- Effective tenant is claim-derived from validated principal claims (`stellaops:tenant`, then bounded legacy `tid` fallback).
- Per-request tenant override is disabled by default and only works when explicitly enabled with `Gateway:Auth:EnableTenantOverride=true` and the requested tenant exists in `stellaops:allowed_tenants`.
- Authorization/DPoP passthrough is fail-closed:
  - route must be configured with `PreserveAuthHeaders=true`, and
  - route prefix must also be in the approved passthrough allow-list (`/connect`, `/console`, `/authority`, `/doctor`, `/api`).
- Tenant override attempts are logged with deterministic fields including route, actor, requested tenant, and resolved tenant.

### Connection State

Per-connection state maintained by Gateway:

```csharp
public sealed class ConnectionState
{
    public required string ConnectionId { get; init; }
    public required InstanceDescriptor Instance { get; init; }
    public InstanceHealthStatus Status { get; set; }
    public DateTime? LastHeartbeatUtc { get; set; }
    public double AveragePingMs { get; set; }
    public TransportType TransportType { get; init; }
    public Dictionary<(string Method, string Path), EndpointDescriptor> Endpoints { get; } = new();
    public IReadOnlyDictionary Schemas { get; init; } = new Dictionary();
}
```

### Payload Handling

The Gateway treats bodies as opaque byte sequences:

- No deserialization or schema interpretation
- Headers and bytes forwarded as-is
- Schema validation is the microservice's responsibility

### Payload Limits

Configurable limits protect against resource exhaustion:

| Limit | Scope |
|-------|-------|
| `MaxRequestBytesPerCall` | Single request |
| `MaxRequestBytesPerConnection` | All requests on connection |
| `MaxAggregateInflightBytes` | All in-flight across gateway |

Exceeded limits result in:

- Early rejection (HTTP 413) if `Content-Length` known
- Mid-stream abort with CANCEL frame
- Appropriate error response (413 or 503)

---

## Microservice SDK

### Configuration

```csharp
services.AddStellaMicroservice(options =>
{
    options.ServiceName = "billing";
    options.Version = "1.0.0";
    options.Region = "us-east-1";
    options.InstanceId = Guid.NewGuid().ToString();
    options.ServiceDescription = "Invoice processing service";
});
```

### Endpoint Declaration

Attributes:

```csharp
[StellaEndpoint("POST", "/invoices")]
public sealed class CreateInvoiceEndpoint : IStellaEndpoint
```

### Handler Interfaces

**Typed handler** (JSON serialization):

```csharp
// Handler with a request body and a response body.
public interface IStellaEndpoint<TRequest, TResponse>
{
    Task<TResponse> HandleAsync(TRequest request, CancellationToken ct);
}

// Handler without a request body.
public interface IStellaEndpoint<TResponse>
{
    Task<TResponse> HandleAsync(CancellationToken ct);
}
```

**Raw handler** (streaming):

```csharp
public interface IRawStellaEndpoint
{
    Task HandleAsync(RawRequestContext ctx, CancellationToken ct);
}
```

### Endpoint Discovery

Two mechanisms:

1. **Source Generator** (preferred): Compile-time discovery via Roslyn
2. **Reflection** (fallback): Runtime assembly scanning

### Connection Behavior

On connection:

1. Send HELLO with instance info and endpoints
2. Start heartbeat timer
3. Listen for REQUEST frames

HELLO payload:

```csharp
public sealed class HelloPayload
{
    public required InstanceDescriptor Instance { get; init; }
    public required IReadOnlyList<EndpointDescriptor> Endpoints { get; init; }
    public IReadOnlyDictionary Schemas { get; init; } = new Dictionary();
    public ServiceOpenApiInfo? OpenApiInfo { get; init; }
}
```

---

## Authorization

### Claims-based Model

Authorization uses `RequiringClaims`, not roles:

```csharp
public sealed class ClaimRequirement
{
    public required string Type { get; init; }
    public string? Value { get; init; }
}
```

### Precedence

1. Microservice provides defaults in HELLO
2. Authority can override centrally
3. Gateway enforces final effective claims

### Enforcement

Gateway `AuthorizationMiddleware`:

- Validates user principal has all required claims
- Empty claims list = authenticated access only
- Missing claim = 403 Forbidden

---

## Cancellation

### CANCEL Frame

```csharp
public sealed class CancelPayload
{
    public required string Reason { get; init; }
    // Values: "ClientDisconnected", "Timeout", "PayloadLimitExceeded", "Shutdown"
}
```

### Gateway sends CANCEL when:

- HTTP client disconnects (`HttpContext.RequestAborted`)
- Request timeout elapses
- Payload limit exceeded
- Gateway shutdown

### Microservice handles CANCEL:

- Maps correlation ID to `CancellationTokenSource`
- Calls `Cancel()` on the source
- Handler receives cancellation via `CancellationToken`

---

## Streaming

### Buffered vs Streaming

| Mode | Request Body | Response Body | Use Case |
|------|--------------|---------------|----------|
| Buffered | Full in memory | Full in memory | Small payloads |
| Streaming | Chunked frames | Chunked frames | Large payloads |

### Frame Flow (Streaming)

```
Gateway                              Microservice
   │                                      │
   │  REQUEST (headers only)              │
   │ ────────────────────────────────────►│
   │                                      │
   │  REQUEST_STREAM_DATA (chunk 1)       │
   │ ────────────────────────────────────►│
   │                                      │
   │  REQUEST_STREAM_DATA (chunk n)       │
   │ ────────────────────────────────────►│
   │                                      │
   │  REQUEST_STREAM_DATA (final=true)    │
   │ ────────────────────────────────────►│
   │                                      │
   │  RESPONSE                            │
   │◄────────────────────────────────────│
   │                                      │
   │  RESPONSE_STREAM_DATA                │
   │◄────────────────────────────────────│
```

---

## Heartbeat & Health

### Heartbeat Frame

Sent at regular intervals over the same connection as requests:

```csharp
public sealed class HeartbeatPayload
{
    public required InstanceHealthStatus Status { get; init; }
    public int InflightRequests { get; init; }
    public double ErrorRate { get; init; }
}
```

### Health Tracking

The Gateway:

- Tracks `LastHeartbeatUtc` per connection
- Derives status from heartbeat recency
- Marks stale instances as Unhealthy
- Uses health in routing decisions

Periodic HELLO re-registration is valid so a microservice can repopulate gateway state after a gateway restart, but it must refresh the existing logical transport connection instead of minting a second one. Gateway routing state also deduplicates by service instance identity (`ServiceName`, `Version`, `InstanceId`, transport) before re-indexing endpoints so repeated HELLO frames cannot accumulate stale route candidates.

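
The HELLO deduplication described above can be sketched as a dictionary keyed by the instance identity tuple. This is a simplified illustration, not the Gateway's actual routing state: `InstanceKey`, `RoutingIndex`, `ApplyHello`, and `CandidateCount` are hypothetical names, and the transport is represented as a plain string.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type; its components follow the identity tuple named above.
public readonly record struct InstanceKey(
    string ServiceName, string Version, string InstanceId, string Transport);

public sealed class RoutingIndex
{
    // One endpoint set per logical instance.
    private readonly Dictionary<InstanceKey, HashSet<(string Method, string Path)>> _endpoints = new();

    // Registers or refreshes an instance. A repeated HELLO replaces the stored
    // endpoint set rather than adding a second entry for the same instance,
    // so endpoints dropped by the newer HELLO stop being route candidates.
    public void ApplyHello(InstanceKey key, IEnumerable<(string Method, string Path)> endpoints)
    {
        _endpoints[key] = new HashSet<(string Method, string Path)>(endpoints);
    }

    // Number of distinct instances currently registered.
    public int InstanceCount => _endpoints.Count;

    // Number of instances that currently expose the given endpoint.
    public int CandidateCount(string method, string path)
    {
        var count = 0;
        foreach (var set in _endpoints.Values)
        {
            if (set.Contains((method, path))) count++;
        }
        return count;
    }
}
```

Keying on the full tuple means that sending the same HELLO twice leaves exactly one candidate per endpoint, while a genuinely different instance (for example, a new `InstanceId`) is added alongside it.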

---

## Configuration

### Router YAML

```yaml
# router.yaml
Gateway:
  Region: "us-east-1"
  NodeId: "gw-east-01"
  Environment: "production"

PayloadLimits:
  MaxRequestBytesPerCall: 10485760           # 10 MB
  MaxRequestBytesPerConnection: 104857600    # 100 MB
  MaxAggregateInflightBytes: 1073741824      # 1 GB

Services:
  - ServiceName: billing
    DefaultVersion: "1.0.0"
    DefaultTransport: Tcp
    Endpoints:
      - Method: POST
        Path: /invoices
        TimeoutSeconds: 30
        RequiringClaims:
          - Type: "invoices:write"

OpenApi:
  Title: "StellaOps Gateway API"
  CacheTtlSeconds: 60
```

### Hot Reload

- YAML changes picked up at runtime
- Routing state updated without restart
- New services/endpoints added dynamically

---

## Error Mapping

| Condition | HTTP Status |
|-----------|-------------|
| Version not found | 404 Not Found |
| No healthy instance | 503 Service Unavailable |
| Request timeout | 504 Gateway Timeout |
| Payload too large | 413 Payload Too Large |
| Unauthorized | 401 Unauthorized |
| Missing claims | 403 Forbidden |
| Validation error | 422 Unprocessable Entity |
| Rate limit exceeded | 429 Too Many Requests |
| Internal error | 500 Internal Server Error |

---

## See Also

- [schema-validation.md](schema-validation.md) - JSON Schema validation
- [openapi-aggregation.md](openapi-aggregation.md) - OpenAPI document generation
- [migration-guide.md](migration-guide.md) - WebService to Microservice migration
- [rate-limiting.md](rate-limiting.md) - Centralized Router rate limiting