Router Architecture
This document is the canonical specification for the StellaOps Router system.
Tenant selection and header propagation contract: docs/architecture/decisions/ADR-002-multi-tenant-same-api-key-selection.md
Service impact ledger: docs/technical/architecture/multi-tenant-service-impact-ledger.md
Flow sequences: docs/technical/architecture/multi-tenant-flow-sequences.md
Rollout policy: docs/operations/multi-tenant-rollout-and-compatibility.md
Location clarification (updated 2026-03-04). The Router (src/Router/) hosts StellaOps.Gateway.WebService with configurable route tables via GatewayRouteCatalog, reverse proxy support, SPA fallback hosting, WebSocket routing, Valkey messaging transport integration, and StellaOpsRouteResolver for front-door dispatching. This is the canonical deployment for HTTP ingress. The standalone src/Gateway/ was deleted in Sprint 200.
System Architecture
Scope
- A single HTTP ingress service (StellaOps.Gateway.WebService) handles all external HTTP traffic
- Microservices communicate with the Gateway using binary transports (TCP, TLS, UDP, RabbitMQ)
- HTTP is not used for internal microservice-to-gateway traffic
- Request/response bodies are opaque to the router (raw bytes/streams)
- Forwarded HTTP headers remain case-insensitive across Router frame transport and ASP.NET bridge dispatch; lowercase HTTP/2 names such as content-type must be preserved for JSON-bound endpoints, and the ASP.NET bridge must mark POST/PUT/PATCH requests as body-capable so minimal-API JSON binding survives frame dispatch
- Gateway scope authorization evaluates against the resolved per-request scope set from identity expansion (GatewayContextKeys.Scopes), so coarse compatibility scopes such as orch:quota can satisfy their fine-grained frontdoor equivalents without changing downstream policy names
Transport Architecture
Each transport connection carries:
- Initial registration (HELLO) and endpoint configuration
- Ongoing heartbeats
- Request/response data frames
- Streaming data frames
- Cancellation frames
┌─────────────────┐                            ┌─────────────────┐
│  Microservice   │                            │     Gateway     │
│                 │           HELLO            │                 │
│  Endpoints:     │ ─────────────────────────► │  Routing        │
│  - POST /items  │         HEARTBEAT          │  State          │
│  - GET /items   │ ◄────────────────────────► │                 │
│                 │                            │  Connections[]  │
│                 │     REQUEST / RESPONSE     │                 │
│                 │ ◄────────────────────────► │                 │
│                 │                            │                 │
│                 │    STREAM_DATA / CANCEL    │                 │
│                 │ ◄────────────────────────► │                 │
└─────────────────┘                            └─────────────────┘
Front Door (Configurable Route Table)
The Router Gateway serves as the single HTTP entry point for the entire StellaOps platform. In addition to binary transport routing for microservices, it handles:
- Static file serving (Angular SPA dist)
- Reverse proxy to HTTP-only backend services
- WebSocket proxy to upstream WebSocket servers
- SPA fallback (extensionless paths serve index.html)
- Custom error pages (404/500 HTML fallback)
Route Table Model
Routes are configured in Gateway:Routes as a StellaOpsRoute[] array, evaluated first-match-wins:
public sealed class StellaOpsRoute
{
    public StellaOpsRouteType Type { get; set; }
    public string Path { get; set; }
    public bool IsRegex { get; set; }
    public string? TranslatesTo { get; set; }
    public Dictionary<string, string> Headers { get; set; }
}
Route types:
| Type | Behavior |
|---|---|
| ReverseProxy | Strip path prefix, forward to TranslatesTo HTTP URL |
| StaticFiles | Serve files from TranslatesTo directory, SPA fallback if x-spa-fallback: true header set |
| StaticFile | Serve a single file at exact path match |
| WebSocket | Bidirectional WebSocket proxy to TranslatesTo ws:// URL |
| Microservice | Pass through to binary transport pipeline |
| NotFoundPage | HTML file served on 404 (after all other middleware) |
| ServerErrorPage | HTML file served on 5xx (after all other middleware) |
Reverse proxy is reserved for external/bootstrap surfaces such as OIDC browser flows, Rekor, and frontdoor static assets. First-party Stella API surfaces are expected to use Microservice routing so the gateway remains the single routing authority instead of silently bypassing router registration state.
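First-match-wins evaluation can be illustrated with a minimal sketch. It assumes that non-regex routes match as segment-aware path prefixes (suggested by "strip path prefix" above) and that regex routes are tested with Regex.IsMatch. RouteSketch and RouteTableSketch are hypothetical names for illustration, not the gateway's actual types.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical, trimmed-down stand-in for StellaOpsRoute (illustration only).
public sealed record RouteSketch(string Type, string Path, bool IsRegex = false);

public static class RouteTableSketch
{
    // Returns the first route whose Path matches, or null when nothing matches.
    // Assumption: non-regex routes match as segment-aware path prefixes.
    public static RouteSketch? Resolve(IReadOnlyList<RouteSketch> routes, string requestPath)
    {
        foreach (var route in routes)
        {
            if (route.IsRegex)
            {
                if (Regex.IsMatch(requestPath, route.Path)) return route;
                continue;
            }

            var prefix = route.Path.TrimEnd('/');
            if (prefix.Length == 0) return route; // "/" matches every path

            if (requestPath.Equals(prefix, StringComparison.OrdinalIgnoreCase) ||
                requestPath.StartsWith(prefix + "/", StringComparison.OrdinalIgnoreCase))
            {
                return route;
            }
        }

        return null;
    }
}
```

Because evaluation stops at the first hit, a broad route such as "/" only behaves as a fallback when it is listed last.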
Pipeline Order
System paths (/health, /metrics, /openapi.*) bypass the route table entirely. The dispatch middleware runs before the microservice pipeline:
HealthCheckMiddleware → (system paths: health, metrics)
RouteDispatchMiddleware → (static files, reverse proxy, websocket)
MapRouterOpenApi → (OpenAPI endpoints)
UseWhen(non-system) → (microservice pipeline: auth, routing, transport)
ErrorPageFallbackMiddleware → (custom 404/500 pages)
Docker Architecture
Browser → Router Gateway (port 80) → [microservices via binary transport]
→ [HTTP backends via reverse proxy]
→ [Angular SPA from /app/wwwroot volume]
The Angular SPA dist is provided by a console-builder init container that copies the built files to a shared console-dist volume mounted at /app/wwwroot.
When the gateway runs in-container, listener binding must honor explicit ASPNETCORE_URLS / ASPNETCORE_HTTP_PORTS / ASPNETCORE_HTTPS_PORTS values from compose. Wildcard hosts (+, *) are normalized to 0.0.0.0 before Kestrel listeners are created so the declared HTTP frontdoor contract actually comes up.
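The wildcard normalization described above could look roughly like the following sketch. It assumes the URL list is semicolon-separated (the ASPNETCORE_URLS convention); ListenerUrlSketch is a hypothetical helper, not the gateway's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListenerUrlSketch
{
    // Splits an ASPNETCORE_URLS-style value on ';' and rewrites wildcard hosts.
    public static IReadOnlyList<string> Normalize(string urls) =>
        urls.Split(';', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
            .Select(NormalizeOne)
            .ToList();

    private static string NormalizeOne(string url)
    {
        var schemeEnd = url.IndexOf("://", StringComparison.Ordinal);
        if (schemeEnd < 0) return url; // not a URL this sketch understands; leave untouched

        var hostStart = schemeEnd + 3;
        if (hostStart < url.Length && (url[hostStart] == '+' || url[hostStart] == '*'))
        {
            // A wildcard host is a single character followed by ':port', '/', or end of string.
            var next = hostStart + 1;
            if (next == url.Length || url[next] == ':' || url[next] == '/')
            {
                return url[..hostStart] + "0.0.0.0" + url[next..];
            }
        }

        return url;
    }
}
```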
Service Identity
Instance Identity
Each microservice instance is identified by:
| Field | Type | Description |
|---|---|---|
| ServiceName | string | Logical service name (e.g., "billing") |
| Version | string | Semantic version (major.minor.patch) |
| Region | string | Deployment region (e.g., "us-east-1") |
| InstanceId | string | Unique instance identifier |
Version Matching
- Version matching is strict semver equality
- Router only routes to instances with exact version match
- The configured default version is used when the client does not specify one
Region Configuration
Gateway region comes from GatewayNodeConfig:
public sealed class GatewayNodeConfig
{
    public required string Region { get; init; }       // e.g., "eu1"
    public required string NodeId { get; init; }       // e.g., "gw-eu1-01"
    public required string Environment { get; init; }  // e.g., "prod"
}
Region is never derived from HTTP headers or URL hostnames.
Endpoint Model
Endpoint Identity
Endpoint identity is (HTTP Method, Path):
| Field | Example |
|---|---|
| Method | GET, POST, PUT, PATCH, DELETE |
| Path | /invoices, /items/{id}, /users/{userId}/orders |
Endpoint Descriptor
Each endpoint includes:
public sealed class EndpointDescriptor
{
    public required string Method { get; init; }
    public required string Path { get; init; }
    public required string ServiceName { get; init; }
    public required string Version { get; init; }
    public TimeSpan DefaultTimeout { get; init; }
    public bool SupportsStreaming { get; init; }
    public IReadOnlyList<ClaimRequirement> RequiringClaims { get; init; } = [];
    public EndpointSchemaInfo? SchemaInfo { get; init; }
}
Path Matching
- ASP.NET-style route templates
- Parameter segments: {id}, {userId}
- Extra path segments are consumed only by explicit catch-all parameters ({**path}); ordinary terminal parameters must not behave like implicit catch-alls during messaging transport dispatch
- Case sensitivity and trailing slash handling follow ASP.NET conventions
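The catch-all rule can be made concrete with a small matcher. This is a simplified sketch, not the ASP.NET route matcher and not the Router's implementation: it ignores constraints, defaults, and optional parameters, compares literal segments case-insensitively, and tolerates a trailing slash. PathTemplateSketch is a hypothetical name.

```csharp
using System;

public static class PathTemplateSketch
{
    // Matches a request path against a route template segment by segment.
    // Assumption: a catch-all ({**name}) only appears as the last template segment.
    public static bool IsMatch(string template, string path)
    {
        var templateSegments = template.Split('/', StringSplitOptions.RemoveEmptyEntries);
        var pathSegments = path.Split('/', StringSplitOptions.RemoveEmptyEntries);

        for (var i = 0; i < templateSegments.Length; i++)
        {
            var segment = templateSegments[i];

            // Only an explicit catch-all may consume all remaining path segments.
            if (segment.StartsWith("{**", StringComparison.Ordinal) &&
                segment.EndsWith("}", StringComparison.Ordinal))
            {
                return true;
            }

            if (i >= pathSegments.Length) return false;

            var isParameter = segment.StartsWith("{", StringComparison.Ordinal) &&
                              segment.EndsWith("}", StringComparison.Ordinal);
            if (!isParameter && !segment.Equals(pathSegments[i], StringComparison.OrdinalIgnoreCase))
            {
                return false;
            }
        }

        // An ordinary terminal parameter matches exactly one segment, never the rest of the path.
        return pathSegments.Length == templateSegments.Length;
    }
}
```

With this sketch, /items/{id} matches /items/42 but not /items/42/extra, while /files/{**path} matches /files/a/b/c.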
Routing Algorithm
Instance Selection
Given (ServiceName, Version, Method, Path):
1. Filter candidates:
   - Match ServiceName exactly
   - Match Version exactly (strict semver)
   - Health status in acceptable set (Healthy or Degraded)
2. Region preference:
   - Prefer instances where Region == GatewayNodeConfig.Region
   - Fall back to configured neighbor regions
   - Fall back to all other regions
3. Within region tier:
   - Prefer lower AveragePingMs
   - If tied, prefer more recent LastHeartbeatUtc
   - If still tied, use round-robin balancing
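The selection order above can be sketched as follows. The types here (HealthSketch, CandidateSketch, InstanceSelectorSketch) are hypothetical simplifications of the connection state, and the round-robin counter is one possible way to break remaining ties; the actual gateway may implement this differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public enum HealthSketch { Unknown, Healthy, Degraded, Draining, Unhealthy }

// Hypothetical flattened view of one registered instance.
public sealed record CandidateSketch(
    string InstanceId, string ServiceName, string Version, string Region,
    HealthSketch Status, double AveragePingMs, DateTime LastHeartbeatUtc);

public sealed class InstanceSelectorSketch
{
    private int _roundRobin;

    public CandidateSketch? Select(
        IEnumerable<CandidateSketch> all, string serviceName, string version,
        string gatewayRegion, IReadOnlyCollection<string> neighborRegions)
    {
        // Step 1: exact service/version match and acceptable health.
        var eligible = all
            .Where(c => c.ServiceName == serviceName && c.Version == version &&
                        (c.Status is HealthSketch.Healthy or HealthSketch.Degraded))
            .ToList();
        if (eligible.Count == 0) return null;

        // Step 2: region tiers (0 = gateway region, 1 = neighbor, 2 = anywhere else).
        int Tier(CandidateSketch c) =>
            c.Region == gatewayRegion ? 0 : neighborRegions.Contains(c.Region) ? 1 : 2;
        var bestTier = eligible.Min(c => Tier(c));
        var inTier = eligible.Where(c => Tier(c) == bestTier).ToList();

        // Step 3: lowest ping, then most recent heartbeat, then round-robin.
        var bestPing = inTier.Min(c => c.AveragePingMs);
        var byPing = inTier.Where(c => c.AveragePingMs == bestPing).ToList();
        var newest = byPing.Max(c => c.LastHeartbeatUtc);
        var tied = byPing
            .Where(c => c.LastHeartbeatUtc == newest)
            .OrderBy(c => c.InstanceId, StringComparer.Ordinal) // stable order for rotation
            .ToList();

        var turn = (uint)Interlocked.Increment(ref _roundRobin);
        return tied[(int)(turn % (uint)tied.Count)];
    }
}
```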
Instance Health
public enum InstanceHealthStatus
{
    Unknown,
    Healthy,
    Degraded,
    Draining,
    Unhealthy
}
Health metadata per connection:
| Field | Type | Description |
|---|---|---|
| Status | enum | Current health status |
| LastHeartbeatUtc | DateTime | Last heartbeat timestamp |
| AveragePingMs | double | Average round-trip latency |
Transport Layer
Transport Types
| Transport | Use Case | Streaming | Notes |
|---|---|---|---|
| InMemory | Testing | Yes | In-process channels |
| TCP | Production | Yes | Length-prefixed frames |
| TLS | Secure | Yes | Certificate-based encryption |
| UDP | Small payloads | No | Single datagram per frame |
| RabbitMQ | Queuing | Yes | Exchange/queue routing |
Transport Plugin Interface
public interface ITransportServer
{
    Task StartAsync(CancellationToken ct);
    Task StopAsync(CancellationToken ct);
    event Func<ConnectionState, HelloPayload, Task> OnHelloReceived;
    event Func<ConnectionState, HeartbeatPayload, Task> OnHeartbeatReceived;
    event Func<string, Task> OnConnectionClosed;
}

public interface ITransportClient
{
    Task ConnectAsync(CancellationToken ct);
    Task DisconnectAsync(CancellationToken ct);
    Task SendFrameAsync(Frame frame, CancellationToken ct);
}
Frame Types
public enum FrameType : byte
{
    Hello = 1,
    Heartbeat = 2,
    Request = 3,
    Response = 4,
    RequestStreamData = 5,
    ResponseStreamData = 6,
    Cancel = 7
}
Gateway Pipeline
HTTP Middleware Stack
Request ─►│ ForwardedHeaders │
│ RequestLogging │
│ ErrorHandling │
│ Authentication │
│ EndpointResolution │ ◄── (Method, Path) → EndpointDescriptor
│ Authorization │ ◄── RequiringClaims check
│ RoutingDecision │ ◄── Select connection/instance
│ TransportDispatch │ ◄── Send to microservice
▼
Identity Header Policy and Tenant Selection
- Gateway strips client-supplied reserved identity headers (X-StellaOps-*, legacy aliases, raw claim headers, and auth headers) before proxying.
- Effective tenant is claim-derived from validated principal claims (stellaops:tenant, then bounded legacy tid fallback).
- Per-request tenant override is disabled by default and only works when explicitly enabled with Gateway:Auth:EnableTenantOverride=true and the requested tenant exists in stellaops:allowed_tenants.
- Authorization/DPoP passthrough is fail-closed:
  - route must be configured with PreserveAuthHeaders=true, and
  - route prefix must also be in the approved passthrough allow-list configured under Gateway:Auth:ApprovedAuthPassthroughPrefixes.
  - local frontdoor configs approve /connect, /console, /authority, /doctor, /api, /policy/shadow, and /policy/simulations so live policy compatibility endpoints can preserve DPoP/JWT passthrough without broadening unrelated routes.
- Tenant override attempts are logged with deterministic fields including route, actor, requested tenant, and resolved tenant.
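The tenant rules above can be sketched as a small resolver. Several details are assumptions made for illustration only: that stellaops:allowed_tenants is a space-separated claim value (possibly repeated), that a disallowed override falls back to the claim-derived tenant rather than rejecting the request, and that the "bounded" qualifier on the tid fallback is omitted. TenantResolutionSketch is a hypothetical name; ADR-002 remains the authoritative contract.

```csharp
using System;
using System.Linq;
using System.Security.Claims;

public static class TenantResolutionSketch
{
    public static string? ResolveTenant(
        ClaimsPrincipal principal, string? requestedTenant, bool enableTenantOverride)
    {
        // Claim-derived tenant: stellaops:tenant first, then the legacy tid claim.
        var claimTenant = principal.FindFirst("stellaops:tenant")?.Value
                          ?? principal.FindFirst("tid")?.Value;

        // Override is off by default; without it the claim-derived tenant always wins.
        if (!enableTenantOverride || string.IsNullOrWhiteSpace(requestedTenant))
        {
            return claimTenant;
        }

        // Assumption: allowed tenants are carried as space-separated claim values.
        var allowed = principal.FindAll("stellaops:allowed_tenants")
            .SelectMany(c => c.Value.Split(' ', StringSplitOptions.RemoveEmptyEntries));

        // Assumption: a disallowed override falls back instead of failing the request.
        return allowed.Contains(requestedTenant, StringComparer.Ordinal)
            ? requestedTenant
            : claimTenant;
    }
}
```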
Connection State
Per-connection state maintained by Gateway:
public sealed class ConnectionState
{
    public required string ConnectionId { get; init; }
    public required InstanceDescriptor Instance { get; init; }
    public InstanceHealthStatus Status { get; set; }
    public DateTime? LastHeartbeatUtc { get; set; }
    public double AveragePingMs { get; set; }
    public TransportType TransportType { get; init; }
    public Dictionary<(string Method, string Path), EndpointDescriptor> Endpoints { get; } = new();
    public IReadOnlyDictionary<string, SchemaDefinition> Schemas { get; init; } = new Dictionary<string, SchemaDefinition>();
}
Payload Handling
The Gateway treats bodies as opaque byte sequences:
- No deserialization or schema interpretation
- Headers and bytes forwarded as-is
- Schema validation is microservice responsibility
Payload Limits
Configurable limits protect against resource exhaustion:
| Limit | Scope |
|---|---|
| MaxRequestBytesPerCall | Single request |
| MaxRequestBytesPerConnection | All requests on connection |
| MaxAggregateInflightBytes | All in-flight across gateway |
Exceeded limits result in:
- Early rejection (HTTP 413) if Content-Length is known
- Mid-stream abort with CANCEL frame
- Appropriate error response (413 or 503)
Microservice SDK
Configuration
services.AddStellaMicroservice(options =>
{
    options.ServiceName = "billing";
    options.Version = "1.0.0";
    options.Region = "us-east-1";
    options.InstanceId = Guid.NewGuid().ToString();
    options.ServiceDescription = "Invoice processing service";
});
Endpoint Declaration
Attributes:
[StellaEndpoint("POST", "/invoices")]
public sealed class CreateInvoiceEndpoint : IStellaEndpoint<CreateInvoiceRequest, CreateInvoiceResponse>
Handler Interfaces
Typed handler (JSON serialization):
public interface IStellaEndpoint<TRequest, TResponse>
{
    Task<TResponse> HandleAsync(TRequest request, CancellationToken ct);
}

public interface IStellaEndpoint<TResponse>
{
    Task<TResponse> HandleAsync(CancellationToken ct);
}
Raw handler (streaming):
public interface IRawStellaEndpoint
{
    Task<RawResponse> HandleAsync(RawRequestContext ctx, CancellationToken ct);
}
Endpoint Discovery
Two mechanisms:
- Source Generator (preferred): Compile-time discovery via Roslyn
- Reflection (fallback): Runtime assembly scanning
Connection Behavior
On connection:
- Send HELLO with instance info and endpoints
- Start heartbeat timer
- Listen for REQUEST frames
HELLO payload:
public sealed class HelloPayload
{
    public required InstanceDescriptor Instance { get; init; }
    public required IReadOnlyList<EndpointDescriptor> Endpoints { get; init; }
    public IReadOnlyDictionary<string, SchemaDefinition> Schemas { get; init; } = new Dictionary<string, SchemaDefinition>();
    public ServiceOpenApiInfo? OpenApiInfo { get; init; }
}
Authorization
Claims-based Model
Authorization uses RequiringClaims, not roles:
public sealed class ClaimRequirement
{
    public required string Type { get; init; }
    public string? Value { get; init; }
}
Precedence
- Microservice provides defaults in HELLO
- Authority can override centrally
- Gateway enforces final effective claims
Enforcement
Gateway AuthorizationMiddleware:
- Validates user principal has all required claims
- Empty claims list = authenticated access only
- Missing claim = 403 Forbidden
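The enforcement rules can be sketched as a pure function over a ClaimsPrincipal. This is an illustration, not the AuthorizationMiddleware itself; it assumes that a requirement with a null Value is satisfied by any claim of the matching Type. ClaimRequirementSketch and ClaimCheckSketch are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Claims;

// Hypothetical stand-in for ClaimRequirement (illustration only).
public sealed record ClaimRequirementSketch(string Type, string? Value = null);

public static class ClaimCheckSketch
{
    // Returns 200 when access is allowed, 401 when unauthenticated, 403 when a claim is missing.
    public static int Evaluate(ClaimsPrincipal user, IReadOnlyList<ClaimRequirementSketch> required)
    {
        if (user.Identity?.IsAuthenticated != true) return 401;

        // An empty requirement list passes: authenticated access only.
        var satisfied = required.All(r =>
            user.Claims.Any(c => c.Type == r.Type && (r.Value is null || c.Value == r.Value)));

        return satisfied ? 200 : 403;
    }
}
```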
Cancellation
CANCEL Frame
public sealed class CancelPayload
{
    public required string Reason { get; init; }
    // Values: "ClientDisconnected", "Timeout", "PayloadLimitExceeded", "Shutdown"
}
Gateway sends CANCEL when:
- HTTP client disconnects (HttpContext.RequestAborted)
- Request timeout elapses
- Payload limit exceeded
- Gateway shutdown
Microservice handles CANCEL:
- Maps correlation ID to CancellationTokenSource
- Calls Cancel() on the source
- Handler receives cancellation via CancellationToken
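On the microservice side, the correlation-ID mapping could be kept in a small registry like the sketch below. The Guid correlation ID type and the InflightRequestsSketch name are assumptions for illustration; the SDK's actual types may differ.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class InflightRequestsSketch
{
    private readonly ConcurrentDictionary<Guid, CancellationTokenSource> _inflight = new();

    // Called when a REQUEST frame arrives; the returned token is passed to the handler.
    public CancellationToken Begin(Guid correlationId)
    {
        var cts = new CancellationTokenSource();
        if (!_inflight.TryAdd(correlationId, cts))
        {
            cts.Dispose();
            throw new InvalidOperationException("Duplicate correlation ID.");
        }

        return cts.Token;
    }

    // Called when a CANCEL frame arrives; returns false if the request is unknown or already finished.
    public bool Cancel(Guid correlationId)
    {
        if (!_inflight.TryGetValue(correlationId, out var cts)) return false;

        try
        {
            cts.Cancel();
            return true;
        }
        catch (ObjectDisposedException)
        {
            return false; // the handler completed concurrently
        }
    }

    // Called when the handler finishes, whether it succeeded, failed, or was cancelled.
    public void Complete(Guid correlationId)
    {
        if (_inflight.TryRemove(correlationId, out var cts)) cts.Dispose();
    }
}
```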
Streaming
Buffered vs Streaming
| Mode | Request Body | Response Body | Use Case |
|---|---|---|---|
| Buffered | Full in memory | Full in memory | Small payloads |
| Streaming | Chunked frames | Chunked frames | Large payloads |
Frame Flow (Streaming)
Gateway Microservice
│ │
│ REQUEST (headers only) │
│ ────────────────────────────────────►│
│ │
│ REQUEST_STREAM_DATA (chunk 1) │
│ ────────────────────────────────────►│
│ │
│ REQUEST_STREAM_DATA (chunk n) │
│ ────────────────────────────────────►│
│ │
│ REQUEST_STREAM_DATA (final=true) │
│ ────────────────────────────────────►│
│ │
│ RESPONSE │
│◄────────────────────────────────────│
│ │
│ RESPONSE_STREAM_DATA │
│◄────────────────────────────────────│
Heartbeat & Health
Heartbeat Frame
Sent at regular intervals over the same connection as requests:
public sealed class HeartbeatPayload
{
    public required InstanceHealthStatus Status { get; init; }
    public int InflightRequests { get; init; }
    public double ErrorRate { get; init; }
}
Health Tracking
Gateway tracks:
- LastHeartbeatUtc per connection
- Derives status from heartbeat recency
- Marks stale instances as Unhealthy
- Uses health in routing decisions
- Messaging transports stay push-first even when backed by notifiable queues; the missed-notification safety-net timeout is derived from the configured heartbeat interval and clamped to a short bounded window instead of falling back to a fixed long poll.
- Gateway degraded and stale transitions are normalized against the messaging heartbeat contract. A gateway may not mark an instance Degraded earlier than 2x the heartbeat interval or Unhealthy earlier than 3x the heartbeat interval, even when looser defaults were configured.
- /health/ready is stricter than "process started": it remains 503 until the configured required first-party microservices have live healthy or degraded registrations in router state. Local scratch compose uses this to hold the frontdoor unhealthy until the core Stella API surface has replayed HELLO after a rebuild.
- The required-service list must use canonical router serviceName values, not loose product-family aliases. Gateway readiness normalizes host-style suffixes such as -gateway, -web, .stella-ops.local, and ports, but it does not treat sibling services as interchangeable.
- When a request already matched a configured Microservice route but the target service has not registered yet, the gateway returns 503 Service Unavailable, not 404 Not Found. 404 remains reserved for genuinely unknown paths or missing endpoints on an otherwise registered service.
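The 2x/3x floor on degraded and unhealthy transitions can be read as a clamp on the configured thresholds. The sketch below shows that reading; the parameter names and the string return values are illustrative assumptions, and HeartbeatHealthSketch is a hypothetical helper.

```csharp
using System;

public static class HeartbeatHealthSketch
{
    // Derives a status from heartbeat age, never transitioning earlier than
    // 2x (Degraded) or 3x (Unhealthy) the heartbeat interval.
    public static string Derive(
        TimeSpan sinceLastHeartbeat, TimeSpan heartbeatInterval,
        TimeSpan configuredDegradedAfter, TimeSpan configuredUnhealthyAfter)
    {
        var degradedAfter = Max(configuredDegradedAfter, TimeSpan.FromTicks(heartbeatInterval.Ticks * 2));
        var unhealthyAfter = Max(configuredUnhealthyAfter, TimeSpan.FromTicks(heartbeatInterval.Ticks * 3));

        if (sinceLastHeartbeat >= unhealthyAfter) return "Unhealthy";
        if (sinceLastHeartbeat >= degradedAfter) return "Degraded";
        return "Healthy";
    }

    private static TimeSpan Max(TimeSpan a, TimeSpan b) => a > b ? a : b;
}
```

For example, with a 10-second heartbeat interval and configured thresholds of 5 and 8 seconds, an instance whose last heartbeat is 15 seconds old is still treated as healthy, because the effective thresholds are 20 and 30 seconds.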
Periodic HELLO re-registration is valid so a microservice can repopulate gateway state after a gateway restart, but it must refresh the existing logical transport connection instead of minting a second one. Gateway routing state also deduplicates by service instance identity (ServiceName, Version, InstanceId, transport) before re-indexing endpoints so repeated HELLO frames cannot accumulate stale route candidates.
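Deduplication by instance identity can be sketched with a dictionary keyed on the identity tuple, so that a repeated HELLO replaces the previous endpoint set instead of appending to it. HelloRegistrySketch is a hypothetical, non-thread-safe illustration; endpoints are reduced to plain strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class HelloRegistrySketch
{
    private readonly Dictionary<(string Service, string Version, string InstanceId, string Transport), List<string>>
        _endpointsByInstance = new();

    // A repeated HELLO for the same identity overwrites the stored endpoints.
    public void OnHello(
        string service, string version, string instanceId, string transport,
        IEnumerable<string> endpoints)
    {
        var key = (service, version, instanceId, transport);
        _endpointsByInstance[key] = new List<string>(endpoints); // replace, never append
    }

    public int InstanceCount => _endpointsByInstance.Count;

    // Total route candidates currently indexed across all instances.
    public int CandidateCount => _endpointsByInstance.Values.Sum(list => list.Count);
}
```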
Configuration
Router YAML
# router.yaml
Gateway:
  Region: "us-east-1"
  NodeId: "gw-east-01"
  Environment: "production"

PayloadLimits:
  MaxRequestBytesPerCall: 10485760          # 10 MB
  MaxRequestBytesPerConnection: 104857600   # 100 MB
  MaxAggregateInflightBytes: 1073741824     # 1 GB

Services:
  - ServiceName: billing
    DefaultVersion: "1.0.0"
    DefaultTransport: Tcp
    Endpoints:
      - Method: POST
        Path: /invoices
        TimeoutSeconds: 30
        RequiringClaims:
          - Type: "invoices:write"

OpenApi:
  Title: "StellaOps Gateway API"
  CacheTtlSeconds: 60
Hot Reload
- YAML changes picked up at runtime
- Routing state updated without restart
- New services/endpoints added dynamically
Error Mapping
| Condition | HTTP Status |
|---|---|
| Version not found | 404 Not Found |
| No healthy instance | 503 Service Unavailable |
| Request timeout | 504 Gateway Timeout |
| Payload too large | 413 Payload Too Large |
| Unauthorized | 401 Unauthorized |
| Missing claims | 403 Forbidden |
| Validation error | 422 Unprocessable Entity |
| Rate limit exceeded | 429 Too Many Requests |
| Internal error | 500 Internal Server Error |
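The table above translates directly into a switch expression. The RouterFailureSketch enum is a hypothetical set of condition names invented for this illustration; only the status codes come from the table.

```csharp
using System;

// Hypothetical condition names mirroring the rows of the error mapping table.
public enum RouterFailureSketch
{
    VersionNotFound,
    NoHealthyInstance,
    RequestTimeout,
    PayloadTooLarge,
    Unauthorized,
    MissingClaims,
    ValidationError,
    RateLimitExceeded,
    InternalError
}

public static class ErrorMappingSketch
{
    public static int ToStatusCode(RouterFailureSketch failure) => failure switch
    {
        RouterFailureSketch.VersionNotFound => 404,
        RouterFailureSketch.NoHealthyInstance => 503,
        RouterFailureSketch.RequestTimeout => 504,
        RouterFailureSketch.PayloadTooLarge => 413,
        RouterFailureSketch.Unauthorized => 401,
        RouterFailureSketch.MissingClaims => 403,
        RouterFailureSketch.ValidationError => 422,
        RouterFailureSketch.RateLimitExceeded => 429,
        _ => 500 // InternalError and anything unrecognized
    };
}
```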
See Also
- schema-validation.md - JSON Schema validation
- openapi-aggregation.md - OpenAPI document generation
- migration-guide.md - WebService to Microservice migration
- rate-limiting.md - Centralized Router rate limiting