Router Architecture

This document is the canonical specification for the StellaOps Router system.

  • Tenant selection and header propagation contract: docs/architecture/decisions/ADR-002-multi-tenant-same-api-key-selection.md
  • Service impact ledger: docs/technical/architecture/multi-tenant-service-impact-ledger.md
  • Flow sequences: docs/technical/architecture/multi-tenant-flow-sequences.md
  • Rollout policy: docs/operations/multi-tenant-rollout-and-compatibility.md

Location clarification (updated 2026-03-04). The Router (src/Router/) hosts StellaOps.Gateway.WebService with configurable route tables via GatewayRouteCatalog, reverse proxy support, SPA fallback hosting, WebSocket routing, Valkey messaging transport integration, and StellaOpsRouteResolver for front-door dispatching. This is the canonical deployment for HTTP ingress. The standalone src/Gateway/ was deleted in Sprint 200.

System Architecture

Scope

  • A single HTTP ingress service (StellaOps.Gateway.WebService) handles all external HTTP traffic
  • Microservices communicate with the Gateway using binary transports (TCP, TLS, UDP, RabbitMQ)
  • HTTP is not used for internal microservice-to-gateway traffic
  • Request/response bodies are opaque to the router (raw bytes/streams)
  • Forwarded HTTP headers remain case-insensitive across Router frame transport and ASP.NET bridge dispatch; lowercase HTTP/2 names such as content-type must be preserved for JSON-bound endpoints, and the ASP.NET bridge must mark POST/PUT/PATCH requests as body-capable so minimal-API JSON binding survives frame dispatch
  • Gateway scope authorization evaluates against the resolved per-request scope set from identity expansion (GatewayContextKeys.Scopes), so coarse compatibility scopes such as orch:quota can satisfy their fine-grained frontdoor equivalents without changing downstream policy names

Transport Architecture

Each transport connection carries:

  • Initial registration (HELLO) and endpoint configuration
  • Ongoing heartbeats
  • Request/response data frames
  • Streaming data frames
  • Cancellation frames
┌─────────────────┐                           ┌─────────────────┐
│   Microservice  │                           │     Gateway     │
│                 │         HELLO             │                 │
│   Endpoints:    │ ─────────────────────────►│   Routing       │
│   - POST /items │         HEARTBEAT         │   State         │
│   - GET /items  │ ◄────────────────────────►│                 │
│                 │                           │   Connections[] │
│                 │  REQUEST / RESPONSE       │                 │
│                 │ ◄────────────────────────►│                 │
│                 │                           │                 │
│                 │  STREAM_DATA / CANCEL     │                 │
│                 │ ◄────────────────────────►│                 │
└─────────────────┘                           └─────────────────┘

Front Door (Configurable Route Table)

The Router Gateway serves as the single HTTP entry point for the entire StellaOps platform. In addition to binary transport routing for microservices, it handles:

  • Static file serving (Angular SPA dist)
  • Reverse proxy to HTTP-only backend services
  • WebSocket proxy to upstream WebSocket servers
  • SPA fallback (extensionless paths serve index.html)
  • Custom error pages (404/500 HTML fallback)

Route Table Model

Routes are configured in Gateway:Routes as a StellaOpsRoute[] array, evaluated first-match-wins:

public sealed class StellaOpsRoute
{
    public StellaOpsRouteType Type { get; set; }
    public string Path { get; set; }
    public bool IsRegex { get; set; }
    public string? TranslatesTo { get; set; }
    public Dictionary<string, string> Headers { get; set; }
}

Route types:

| Type | Behavior |
| --- | --- |
| ReverseProxy | Strip path prefix, forward to TranslatesTo HTTP URL |
| StaticFiles | Serve files from TranslatesTo directory, SPA fallback if x-spa-fallback: true header set |
| StaticFile | Serve a single file at exact path match |
| WebSocket | Bidirectional WebSocket proxy to TranslatesTo ws:// URL |
| Microservice | Pass through to binary transport pipeline |
| NotFoundPage | HTML file served on 404 (after all other middleware) |
| ServerErrorPage | HTML file served on 5xx (after all other middleware) |

Reverse proxy is reserved for external/bootstrap surfaces such as OIDC browser flows, Rekor, and frontdoor static assets. First-party Stella API surfaces are expected to use Microservice routing so the gateway remains the single routing authority instead of silently bypassing router registration state.

Regex microservice routes that own a root prefix must use a segment boundary when the same prefix can appear in static asset filenames. The local frontdoor uses ^/policy(?=/|$)(.*) rather than ^/policy(.*) so Angular chunks such as /policy-decisioning.routes-*.js stay on the SPA/static path instead of being misrouted to the Policy service.
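As a minimal sketch of why the segment boundary matters, the following compares the two patterns quoted above using .NET's Regex class. The class name RouteBoundaryDemo and the sample paths are illustrative assumptions, not part of the Router codebase.

```csharp
using System.Text.RegularExpressions;

// Illustrative only: RouteBoundaryDemo is a hypothetical name, not a Router type.
public static class RouteBoundaryDemo
{
    // Without a segment boundary, "/policy-..." filenames also match.
    public static readonly Regex Loose = new Regex(@"^/policy(.*)");

    // With the lookahead, "/policy" must be followed by "/" or the end of the path.
    public static readonly Regex Bounded = new Regex(@"^/policy(?=/|$)(.*)");

    // Returns the captured remainder for a bounded match, or null when the path
    // should stay on the SPA/static route.
    public static string RemainderOrNull(string path)
    {
        Match match = Bounded.Match(path);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

With these patterns, /policy and /policy/api/v1/rules match both expressions, while /policy-decisioning.routes-abc123.js matches only Loose. RemainderOrNull("/policy/api/v1/rules") yields /api/v1/rules.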

Browser-facing compatibility prefixes that exist only at the frontdoor must strip that prefix before dispatching to the target microservice. Local compose keeps /doctor/api/v1/doctor/* and /scheduler/api/v1/scheduler/* for the shell, but the route table translates them to http://doctor.stella-ops.local$1 and http://scheduler.stella-ops.local$1 so the backend still receives its canonical /api/v1/<service>/* path.

Pipeline Order

System paths (/health, /metrics, /openapi.*) bypass the route table entirely. The dispatch middleware runs before the microservice pipeline:

HealthCheckMiddleware  →  (system paths: health, metrics)
RouteDispatchMiddleware →  (static files, reverse proxy, websocket)
MapRouterOpenApi       →  (OpenAPI endpoints)
UseWhen(non-system)    →  (microservice pipeline: auth, routing, transport)
ErrorPageFallbackMiddleware → (custom 404/500 pages)

Docker Architecture

Browser → Router Gateway (port 80) → [microservices via binary transport]
                                    → [HTTP backends via reverse proxy]
                                    → [Angular SPA from /app/wwwroot volume]

The Angular SPA dist is provided by a console-builder init container that copies the built files to a shared console-dist volume mounted at /app/wwwroot.

When the gateway runs in-container, listener binding must honor explicit ASPNETCORE_URLS / ASPNETCORE_HTTP_PORTS / ASPNETCORE_HTTPS_PORTS values from compose. Wildcard hosts (+, *) are normalized to 0.0.0.0 before Kestrel listeners are created so the declared HTTP frontdoor contract actually comes up.
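The wildcard normalization can be sketched as follows. This is an assumption-laden illustration: ListenerUrlNormalizer is a hypothetical helper, and the gateway's actual parsing may differ. It relies only on the fact that ASPNETCORE_URLS is a semicolon-separated list of URLs.

```csharp
using System;
using System.Linq;

// Hypothetical helper; not the gateway's real implementation.
public static class ListenerUrlNormalizer
{
    // Rewrites a wildcard host ("+" or "*") in a single URL to 0.0.0.0.
    public static string Normalize(string url)
    {
        int schemeEnd = url.IndexOf("://", StringComparison.Ordinal);
        if (schemeEnd < 0) return url;

        int hostStart = schemeEnd + 3;
        // The host ends at the port separator, the path separator, or the end of the string.
        int hostEnd = url.IndexOfAny(new[] { ':', '/' }, hostStart);
        if (hostEnd < 0) hostEnd = url.Length;

        string host = url.Substring(hostStart, hostEnd - hostStart);
        if (host != "+" && host != "*") return url;

        return url.Substring(0, hostStart) + "0.0.0.0" + url.Substring(hostEnd);
    }

    // Applies Normalize to every entry of a semicolon-separated URL list.
    public static string NormalizeAll(string urls) =>
        string.Join(";", urls
            .Split(';', StringSplitOptions.RemoveEmptyEntries)
            .Select(u => Normalize(u.Trim())));
}
```

For example, NormalizeAll("http://+:80;https://*:443") returns http://0.0.0.0:80;https://0.0.0.0:443, and explicit hosts such as http://localhost:8080 pass through unchanged.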


Service Identity

Instance Identity

Each microservice instance is identified by:

| Field | Type | Description |
| --- | --- | --- |
| ServiceName | string | Logical service name (e.g., "billing") |
| Version | string | Semantic version (major.minor.patch) |
| Region | string | Deployment region (e.g., "us-east-1") |
| InstanceId | string | Unique instance identifier |

Version Matching

  • Version matching is strict semver equality
  • Router only routes to instances with exact version match
  • The configured default version is used when the client does not specify one

Region Configuration

Gateway region comes from GatewayNodeConfig:

public sealed class GatewayNodeConfig
{
    public required string Region { get; init; }     // e.g., "eu1"
    public required string NodeId { get; init; }     // e.g., "gw-eu1-01"
    public required string Environment { get; init; } // e.g., "prod"
}

Region is never derived from HTTP headers or URL hostnames.


Endpoint Model

Endpoint Identity

Endpoint identity is (HTTP Method, Path):

| Field | Example |
| --- | --- |
| Method | GET, POST, PUT, PATCH, DELETE |
| Path | /invoices, /items/{id}, /users/{userId}/orders |

Endpoint Descriptor

Each endpoint includes:

public sealed class EndpointDescriptor
{
    public required string Method { get; init; }
    public required string Path { get; init; }
    public required string ServiceName { get; init; }
    public required string Version { get; init; }
    public TimeSpan DefaultTimeout { get; init; }
    public bool SupportsStreaming { get; init; }
    public IReadOnlyList<ClaimRequirement> RequiringClaims { get; init; } = [];
    public EndpointSchemaInfo? SchemaInfo { get; init; }
}

Path Matching

  • ASP.NET-style route templates
  • Parameter segments: {id}, {userId}
  • Extra path segments are consumed only by explicit catch-all parameters ({**path}); ordinary terminal parameters must not behave like implicit catch-alls during messaging transport dispatch
  • Case sensitivity and trailing slash handling follow ASP.NET conventions
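The catch-all rule above can be illustrated with a simplified matcher. This is a sketch under stated assumptions, not the Router's matcher: TemplateMatcher is a hypothetical name, literal segments are compared case-insensitively, and a trailing slash is ignored, which approximates ASP.NET defaults without reproducing constraints, optional parameters, or defaults.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical, simplified matcher for illustration only.
public static class TemplateMatcher
{
    // Returns the captured route values on a match, or null when the path does not match.
    public static Dictionary<string, string>? Match(string template, string path)
    {
        string[] t = template.Split('/', StringSplitOptions.RemoveEmptyEntries);
        string[] p = path.Split('/', StringSplitOptions.RemoveEmptyEntries);
        var values = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        for (int i = 0; i < t.Length; i++)
        {
            string segment = t[i];
            bool isParameter = segment.StartsWith('{') && segment.EndsWith('}');

            if (isParameter && segment.StartsWith("{**", StringComparison.Ordinal))
            {
                // Explicit catch-all: consumes every remaining segment (possibly none).
                values[segment[3..^1]] = string.Join('/', p, i, p.Length - i);
                return values;
            }

            if (i >= p.Length) return null; // path is shorter than the template

            if (isParameter)
                values[segment[1..^1]] = p[i]; // ordinary parameter: exactly one segment
            else if (!string.Equals(segment, p[i], StringComparison.OrdinalIgnoreCase))
                return null; // literal mismatch
        }

        // Without a catch-all, extra path segments mean no match.
        return p.Length == t.Length ? values : null;
    }
}
```

Here Match("/items/{id}", "/items/42") captures id = 42, Match("/items/{id}", "/items/42/extra") returns null because the terminal parameter is not an implicit catch-all, and Match("/files/{**path}", "/files/a/b/c") captures path = a/b/c.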

Routing Algorithm

Instance Selection

Given (ServiceName, Version, Method, Path):

  1. Filter candidates:

    • Match ServiceName exactly
    • Match Version exactly (strict semver)
    • Health status in acceptable set (Healthy or Degraded)
  2. Region preference:

    • Prefer instances where Region == GatewayNodeConfig.Region
    • Fall back to configured neighbor regions
    • Fall back to all other regions
  3. Within region tier:

    • Prefer lower AveragePingMs
    • If tied, prefer more recent LastHeartbeatUtc
    • If still tied, use round-robin balancing
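The three selection steps can be sketched as below. This is an illustrative approximation: Candidate and InstanceSelector are hypothetical stand-ins for the gateway's connection state, the enum is a local copy of InstanceHealthStatus, ping ties are detected by exact equality, and the round-robin counter is a simple per-selector rotation.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public enum InstanceHealthStatus { Unknown, Healthy, Degraded, Draining, Unhealthy }

// Hypothetical flattened view of one registered instance.
public sealed record Candidate(
    string ServiceName, string Version, string Region, string InstanceId,
    InstanceHealthStatus Status, double AveragePingMs, DateTime LastHeartbeatUtc);

public sealed class InstanceSelector
{
    private readonly string _localRegion;
    private readonly HashSet<string> _neighborRegions;
    private int _roundRobin; // rotates among fully tied candidates

    public InstanceSelector(string localRegion, IEnumerable<string> neighborRegions)
    {
        _localRegion = localRegion;
        _neighborRegions = new HashSet<string>(neighborRegions);
    }

    // 0 = gateway's own region, 1 = configured neighbor, 2 = any other region.
    private int Tier(Candidate c) =>
        c.Region == _localRegion ? 0 : _neighborRegions.Contains(c.Region) ? 1 : 2;

    public Candidate? Select(IEnumerable<Candidate> all, string serviceName, string version)
    {
        // Step 1: exact service and version match, acceptable health only.
        var eligible = all.Where(c =>
            c.ServiceName == serviceName &&
            c.Version == version &&
            (c.Status == InstanceHealthStatus.Healthy || c.Status == InstanceHealthStatus.Degraded))
            .ToList();
        if (eligible.Count == 0) return null;

        // Step 2: keep only the best region tier that has candidates.
        int bestTier = eligible.Min(c => Tier(c));
        var inTier = eligible.Where(c => Tier(c) == bestTier).ToList();

        // Step 3: lowest ping, then most recent heartbeat.
        double bestPing = inTier.Min(c => c.AveragePingMs);
        var byPing = inTier.Where(c => c.AveragePingMs == bestPing).ToList();
        DateTime newest = byPing.Max(c => c.LastHeartbeatUtc);
        var tied = byPing.Where(c => c.LastHeartbeatUtc == newest)
                         .OrderBy(c => c.InstanceId, StringComparer.Ordinal)
                         .ToList();

        // Remaining ties rotate round-robin.
        uint turn = (uint)Interlocked.Increment(ref _roundRobin);
        return tied[(int)(turn % (uint)tied.Count)];
    }
}
```

A null result corresponds to the "no healthy instance" case, which the gateway surfaces as 503 Service Unavailable.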

Instance Health

public enum InstanceHealthStatus
{
    Unknown,
    Healthy,
    Degraded,
    Draining,
    Unhealthy
}

Health metadata per connection:

| Field | Type | Description |
| --- | --- | --- |
| Status | enum | Current health status |
| LastHeartbeatUtc | DateTime | Last heartbeat timestamp |
| AveragePingMs | double | Average round-trip latency |

Transport Layer

Transport Types

| Transport | Use Case | Streaming | Notes |
| --- | --- | --- | --- |
| InMemory | Testing | Yes | In-process channels |
| TCP | Production | Yes | Length-prefixed frames |
| TLS | Secure | Yes | Certificate-based encryption |
| UDP | Small payloads | No | Single datagram per frame |
| RabbitMQ | Queuing | Yes | Exchange/queue routing |

Transport Plugin Interface

public interface ITransportServer
{
    Task StartAsync(CancellationToken ct);
    Task StopAsync(CancellationToken ct);
    event Func<ConnectionState, HelloPayload, Task> OnHelloReceived;
    event Func<ConnectionState, HeartbeatPayload, Task> OnHeartbeatReceived;
    event Func<string, Task> OnConnectionClosed;
}

public interface ITransportClient
{
    Task ConnectAsync(CancellationToken ct);
    Task DisconnectAsync(CancellationToken ct);
    Task SendFrameAsync(Frame frame, CancellationToken ct);
}

Frame Types

public enum FrameType : byte
{
    Hello = 1,
    Heartbeat = 2,
    Request = 3,
    Response = 4,
    RequestStreamData = 5,
    ResponseStreamData = 6,
    Cancel = 7
}

Gateway Pipeline

HTTP Middleware Stack

Request ─►│ ForwardedHeaders        │
          │ RequestLogging          │
          │ ErrorHandling           │
          │ Authentication          │
          │ EndpointResolution      │  ◄── (Method, Path) → EndpointDescriptor
          │ Authorization           │  ◄── RequiringClaims check
          │ RoutingDecision         │  ◄── Select connection/instance
          │ TransportDispatch       │  ◄── Send to microservice
          ▼

Identity Header Policy and Tenant Selection

  • Gateway strips client-supplied reserved identity headers (X-StellaOps-*, legacy aliases, raw claim headers, and auth headers) before proxying.
  • The effective tenant is derived from validated principal claims (stellaops:tenant first, then a bounded legacy tid fallback).
  • Per-request tenant override is disabled by default and only works when explicitly enabled with Gateway:Auth:EnableTenantOverride=true and the requested tenant exists in stellaops:allowed_tenants.
  • Authorization/DPoP passthrough is fail-closed:
    • route must be configured with PreserveAuthHeaders=true, and
    • route prefix must also be in the approved passthrough allow-list configured under Gateway:Auth:ApprovedAuthPassthroughPrefixes.
    • local frontdoor configs approve /connect, /console, /authority, /doctor, /api, /policy/shadow, and /policy/simulations so live policy compatibility endpoints can preserve DPoP/JWT passthrough without broadening unrelated routes.
  • Tenant override attempts are logged with deterministic fields including route, actor, requested tenant, and resolved tenant.
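The tenant rules above can be approximated with the sketch below. Several details are assumptions made for illustration and are not confirmed by this document: TenantResolver is a hypothetical name, stellaops:allowed_tenants is treated as a space-separated list, a disallowed override falls back to the claim-derived tenant rather than rejecting the request, and the bounds on the legacy tid fallback are not modeled.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Security.Claims;

// Hypothetical sketch; the gateway's real resolution logic may differ.
public static class TenantResolver
{
    public static string? Resolve(
        ClaimsPrincipal principal, string? requestedTenant, bool enableTenantOverride)
    {
        // Claim-derived tenant: primary claim first, then the legacy fallback.
        string? claimTenant =
            principal.FindFirst("stellaops:tenant")?.Value
            ?? principal.FindFirst("tid")?.Value;

        // Override is ignored unless it was requested and explicitly enabled.
        if (string.IsNullOrEmpty(requestedTenant) || !enableTenantOverride)
            return claimTenant;

        // ASSUMPTION: allowed tenants are space-separated inside the claim value.
        var allowed = principal.FindAll("stellaops:allowed_tenants")
            .SelectMany(c => c.Value.Split(' ', StringSplitOptions.RemoveEmptyEntries));

        // ASSUMPTION: a disallowed override falls back instead of failing the request.
        return allowed.Contains(requestedTenant, StringComparer.Ordinal)
            ? requestedTenant
            : claimTenant;
    }
}
```

The key property is that the override path is closed by default: with enableTenantOverride set to false, the requested tenant never influences the result.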

Connection State

Per-connection state maintained by Gateway:

public sealed class ConnectionState
{
    public required string ConnectionId { get; init; }
    public required InstanceDescriptor Instance { get; init; }
    public InstanceHealthStatus Status { get; set; }
    public DateTime? LastHeartbeatUtc { get; set; }
    public double AveragePingMs { get; set; }
    public TransportType TransportType { get; init; }
    public Dictionary<(string Method, string Path), EndpointDescriptor> Endpoints { get; } = new();
    public IReadOnlyDictionary<string, SchemaDefinition> Schemas { get; init; } = new Dictionary<string, SchemaDefinition>();
}

Payload Handling

The Gateway treats bodies as opaque byte sequences:

  • No deserialization or schema interpretation
  • Headers and bytes forwarded as-is
  • Schema validation is the microservice's responsibility

Payload Limits

Configurable limits protect against resource exhaustion:

| Limit | Scope |
| --- | --- |
| MaxRequestBytesPerCall | Single request |
| MaxRequestBytesPerConnection | All requests on connection |
| MaxAggregateInflightBytes | All in-flight across gateway |

Exceeded limits result in:

  • Early rejection (HTTP 413) if Content-Length known
  • Mid-stream abort with CANCEL frame
  • Appropriate error response (413 or 503)
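A minimal sketch of the early-rejection and mid-stream checks follows. PayloadBudget is a hypothetical type: it models only the per-call and aggregate limits (the per-connection limit would follow the same pattern), and it leaves the choice between 413 and 503 for mid-stream failures to the caller because this document does not specify that mapping.

```csharp
using System.Threading;

// Hypothetical accounting helper; not the gateway's real limiter.
public sealed class PayloadBudget
{
    private readonly long _maxPerCall;
    private readonly long _maxAggregate;
    private long _aggregateInflight; // bytes currently reserved across all calls

    public PayloadBudget(long maxPerCall, long maxAggregate)
    {
        _maxPerCall = maxPerCall;
        _maxAggregate = maxAggregate;
    }

    // Early rejection: returns 413 when a known Content-Length exceeds the per-call limit.
    public int? CheckContentLength(long? contentLength) =>
        contentLength.HasValue && contentLength.Value > _maxPerCall ? 413 : (int?)null;

    // Mid-stream check: reserves a chunk, or returns false so the caller can
    // abort the request and send a CANCEL frame.
    public bool TryReserve(ref long callBytes, int chunkLength)
    {
        if (callBytes + chunkLength > _maxPerCall) return false;

        long total = Interlocked.Add(ref _aggregateInflight, chunkLength);
        if (total > _maxAggregate)
        {
            // Roll back the reservation that pushed the gateway over its aggregate limit.
            Interlocked.Add(ref _aggregateInflight, -chunkLength);
            return false;
        }

        callBytes += chunkLength;
        return true;
    }

    // Releases a finished call's bytes from the aggregate count.
    public void Release(long callBytes) => Interlocked.Add(ref _aggregateInflight, -callBytes);
}
```

Each in-flight request keeps its own callBytes counter and calls Release with that value when it completes or is cancelled.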

Microservice SDK

Configuration

services.AddStellaMicroservice(options =>
{
    options.ServiceName = "billing";
    options.Version = "1.0.0";
    options.Region = "us-east-1";
    options.InstanceId = Guid.NewGuid().ToString();
    options.ServiceDescription = "Invoice processing service";
});

Endpoint Declaration

Attributes:

[StellaEndpoint("POST", "/invoices")]
public sealed class CreateInvoiceEndpoint : IStellaEndpoint<CreateInvoiceRequest, CreateInvoiceResponse>

Handler Interfaces

Typed handler (JSON serialization):

public interface IStellaEndpoint<TRequest, TResponse>
{
    Task<TResponse> HandleAsync(TRequest request, CancellationToken ct);
}

public interface IStellaEndpoint<TResponse>
{
    Task<TResponse> HandleAsync(CancellationToken ct);
}

Raw handler (streaming):

public interface IRawStellaEndpoint
{
    Task<RawResponse> HandleAsync(RawRequestContext ctx, CancellationToken ct);
}

Endpoint Discovery

Two mechanisms:

  1. Source Generator (preferred): Compile-time discovery via Roslyn
  2. Reflection (fallback): Runtime assembly scanning

Connection Behavior

On connection:

  1. Send HELLO with instance info and endpoints
  2. Start heartbeat timer
  3. Listen for REQUEST frames

HELLO payload:

public sealed class HelloPayload
{
    public required InstanceDescriptor Instance { get; init; }
    public required IReadOnlyList<EndpointDescriptor> Endpoints { get; init; }
    public IReadOnlyDictionary<string, SchemaDefinition> Schemas { get; init; } = new Dictionary<string, SchemaDefinition>();
    public ServiceOpenApiInfo? OpenApiInfo { get; init; }
}

Authorization

Claims-based Model

Authorization uses RequiringClaims, not roles:

public sealed class ClaimRequirement
{
    public required string Type { get; init; }
    public string? Value { get; init; }
}

Precedence

  1. Microservice provides defaults in HELLO
  2. Authority can override centrally
  3. Gateway enforces final effective claims

Enforcement

Gateway AuthorizationMiddleware:

  • Validates user principal has all required claims
  • Empty claims list = authenticated access only
  • Missing claim = 403 Forbidden
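The enforcement rules can be sketched with the standard ClaimsPrincipal API. ClaimAuthorizer is a hypothetical name, the record is a local stand-in for the ClaimRequirement class shown earlier, and treating a null Value as "any value of that claim type" is an assumption this document does not spell out.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;
using System.Security.Claims;

// Local stand-in for the ClaimRequirement class shown earlier in this document.
public sealed record ClaimRequirement(string Type, string? Value = null);

// Hypothetical sketch of the middleware's decision, reduced to a status code.
public static class ClaimAuthorizer
{
    // Returns 200 when access is allowed, 401 when unauthenticated, 403 on a missing claim.
    public static int Evaluate(ClaimsPrincipal user, IReadOnlyList<ClaimRequirement> required)
    {
        if (user.Identity?.IsAuthenticated != true) return 401;

        // ASSUMPTION: a null Value accepts any value for that claim type.
        bool allPresent = required.All(r => r.Value is null
            ? user.HasClaim(c => c.Type == r.Type)
            : user.HasClaim(r.Type, r.Value));

        // An empty requirement list passes for any authenticated principal.
        return allPresent ? 200 : 403;
    }
}
```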

Cancellation

CANCEL Frame

public sealed class CancelPayload
{
    public required string Reason { get; init; }
    // Values: "ClientDisconnected", "Timeout", "PayloadLimitExceeded", "Shutdown"
}

Gateway sends CANCEL when:

  • HTTP client disconnects (HttpContext.RequestAborted)
  • Request timeout elapses
  • Payload limit exceeded
  • Gateway shutdown

Microservice handles CANCEL:

  • Maps correlation ID to CancellationTokenSource
  • Calls Cancel() on the source
  • Handler receives cancellation via CancellationToken
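The microservice-side handling can be sketched as a small registry. InflightRequests is a hypothetical name, and the use of Guid for the correlation ID is an assumption, since this document does not state the ID's type.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical registry mapping correlation IDs to cancellation sources.
public sealed class InflightRequests
{
    private readonly ConcurrentDictionary<Guid, CancellationTokenSource> _sources =
        new ConcurrentDictionary<Guid, CancellationTokenSource>();

    // Called when a REQUEST frame arrives; the returned token is passed to the handler.
    public CancellationToken Register(Guid correlationId)
    {
        var cts = new CancellationTokenSource();
        if (!_sources.TryAdd(correlationId, cts))
        {
            cts.Dispose();
            throw new InvalidOperationException($"Duplicate correlation ID {correlationId}");
        }
        return cts.Token;
    }

    // Called when a CANCEL frame arrives. Unknown IDs are ignored because the
    // request may already have completed.
    public bool Cancel(Guid correlationId)
    {
        if (!_sources.TryGetValue(correlationId, out var cts)) return false;
        try
        {
            cts.Cancel();
            return true;
        }
        catch (ObjectDisposedException)
        {
            return false; // the handler finished concurrently
        }
    }

    // Called when the handler finishes, whether or not it was cancelled.
    public void Complete(Guid correlationId)
    {
        if (_sources.TryRemove(correlationId, out var cts)) cts.Dispose();
    }
}
```

Calling Complete in a finally block around the handler keeps the dictionary from accumulating entries for finished requests.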

Streaming

Buffered vs Streaming

| Mode | Request Body | Response Body | Use Case |
| --- | --- | --- | --- |
| Buffered | Full in memory | Full in memory | Small payloads |
| Streaming | Chunked frames | Chunked frames | Large payloads |

Frame Flow (Streaming)

Gateway                              Microservice
   │                                      │
   │  REQUEST (headers only)              │
   │ ────────────────────────────────────►│
   │                                      │
   │  REQUEST_STREAM_DATA (chunk 1)       │
   │ ────────────────────────────────────►│
   │                                      │
   │  REQUEST_STREAM_DATA (chunk n)       │
   │ ────────────────────────────────────►│
   │                                      │
   │  REQUEST_STREAM_DATA (final=true)    │
   │ ────────────────────────────────────►│
   │                                      │
   │                   RESPONSE           │
   │◄────────────────────────────────────│
   │                                      │
   │         RESPONSE_STREAM_DATA         │
   │◄────────────────────────────────────│

Heartbeat & Health

Heartbeat Frame

Sent at regular intervals over the same connection as requests:

public sealed class HeartbeatPayload
{
    public required InstanceHealthStatus Status { get; init; }
    public int InflightRequests { get; init; }
    public double ErrorRate { get; init; }
}

Health Tracking

The Gateway tracks instance health as follows:

  • Records LastHeartbeatUtc per connection
  • Derives status from heartbeat recency
  • Marks stale instances as Unhealthy
  • Uses health in routing decisions
  • Messaging transports stay push-first even when backed by notifiable queues; the missed-notification safety-net timeout is derived from the configured heartbeat interval and clamped to a short bounded window instead of falling back to a fixed long poll.
  • Gateway degraded and stale transitions are normalized against the messaging heartbeat contract. A gateway may not mark an instance Degraded earlier than 2x the heartbeat interval or Unhealthy earlier than 3x the heartbeat interval, even when shorter thresholds were configured.
  • /health/ready is stricter than "process started": it remains 503 until the configured required first-party microservices have live healthy or degraded registrations in router state. Local scratch compose uses this to hold the frontdoor unhealthy until the core Stella API surface has replayed HELLO after a rebuild.
  • The required-service list must use canonical router serviceName values, not loose product-family aliases. Gateway readiness normalizes host-style suffixes such as -gateway, -web, .stella-ops.local, and ports, but it does not treat sibling services as interchangeable.
  • When a request already matched a configured Microservice route but the target service has not registered yet, the gateway returns 503 Service Unavailable, not 404 Not Found. 404 remains reserved for genuinely unknown paths or missing endpoints on an otherwise registered service.
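The 2x and 3x floors on the Degraded and Unhealthy transitions can be sketched as a clamp followed by a recency check. HeartbeatHealth and its parameter names are hypothetical, and the enum is a local copy of InstanceHealthStatus; how the gateway handles Unknown and Draining is outside this sketch.

```csharp
using System;

public enum InstanceHealthStatus { Unknown, Healthy, Degraded, Draining, Unhealthy }

// Hypothetical helper illustrating threshold normalization and status derivation.
public static class HeartbeatHealth
{
    // Raises configured thresholds to the 2x / 3x heartbeat-interval floors when they are shorter.
    public static (TimeSpan DegradedAfter, TimeSpan UnhealthyAfter) Normalize(
        TimeSpan heartbeatInterval, TimeSpan configuredDegraded, TimeSpan configuredUnhealthy)
    {
        TimeSpan degradedFloor = TimeSpan.FromTicks(heartbeatInterval.Ticks * 2);
        TimeSpan unhealthyFloor = TimeSpan.FromTicks(heartbeatInterval.Ticks * 3);

        return (
            configuredDegraded < degradedFloor ? degradedFloor : configuredDegraded,
            configuredUnhealthy < unhealthyFloor ? unhealthyFloor : configuredUnhealthy);
    }

    // Derives a status purely from heartbeat recency.
    public static InstanceHealthStatus Derive(
        DateTime nowUtc, DateTime lastHeartbeatUtc, TimeSpan degradedAfter, TimeSpan unhealthyAfter)
    {
        TimeSpan age = nowUtc - lastHeartbeatUtc;
        if (age >= unhealthyAfter) return InstanceHealthStatus.Unhealthy;
        if (age >= degradedAfter) return InstanceHealthStatus.Degraded;
        return InstanceHealthStatus.Healthy;
    }
}
```

With a 10-second heartbeat interval, a configured Degraded threshold of 5 seconds is raised to 20 seconds, while a configured Unhealthy threshold of 60 seconds is kept because it already exceeds the 30-second floor.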

Periodic HELLO re-registration is valid so a microservice can repopulate gateway state after a gateway restart, but it must refresh the existing logical transport connection instead of minting a second one. Gateway routing state also deduplicates by service instance identity (ServiceName, Version, InstanceId, transport) before re-indexing endpoints so repeated HELLO frames cannot accumulate stale route candidates.
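The deduplication described above can be sketched as a dictionary keyed by instance identity. InstanceKey and RoutingState are hypothetical names and the stored shape is simplified; the point is only that a repeated HELLO replaces the prior entry rather than adding a second route candidate.

```csharp
using System.Collections.Generic;

// Hypothetical identity key: (ServiceName, Version, InstanceId, transport).
public sealed record InstanceKey(string ServiceName, string Version, string InstanceId, string Transport);

// Hypothetical, simplified routing state for illustration only.
public sealed class RoutingState
{
    private readonly Dictionary<InstanceKey, List<(string Method, string Path)>> _endpoints =
        new Dictionary<InstanceKey, List<(string Method, string Path)>>();

    // Applies a HELLO frame. Re-registration overwrites the entry for the same
    // identity, so endpoints are re-indexed instead of accumulated.
    public void ApplyHello(InstanceKey key, IEnumerable<(string Method, string Path)> endpoints)
    {
        _endpoints[key] = new List<(string Method, string Path)>(endpoints);
    }

    public int InstanceCount => _endpoints.Count;

    // Number of registered instances that expose the given endpoint.
    public int CandidateCount(string method, string path)
    {
        int count = 0;
        foreach (var list in _endpoints.Values)
        {
            if (list.Contains((method, path))) count++;
        }
        return count;
    }
}
```

Because records compare by value, two HELLO frames carrying the same four identity fields map to the same dictionary entry.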


Configuration

Router YAML

# router.yaml
Gateway:
  Region: "us-east-1"
  NodeId: "gw-east-01"
  Environment: "production"

PayloadLimits:
  MaxRequestBytesPerCall: 10485760        # 10 MB
  MaxRequestBytesPerConnection: 104857600  # 100 MB
  MaxAggregateInflightBytes: 1073741824    # 1 GB

Services:
  - ServiceName: billing
    DefaultVersion: "1.0.0"
    DefaultTransport: Tcp
    Endpoints:
      - Method: POST
        Path: /invoices
        TimeoutSeconds: 30
        RequiringClaims:
          - Type: "invoices:write"

OpenApi:
  Title: "StellaOps Gateway API"
  CacheTtlSeconds: 60

Hot Reload

  • YAML changes picked up at runtime
  • Routing state updated without restart
  • New services/endpoints added dynamically

Error Mapping

| Condition | HTTP Status |
| --- | --- |
| Version not found | 404 Not Found |
| No healthy instance | 503 Service Unavailable |
| Request timeout | 504 Gateway Timeout |
| Payload too large | 413 Payload Too Large |
| Unauthorized | 401 Unauthorized |
| Missing claims | 403 Forbidden |
| Validation error | 422 Unprocessable Entity |
| Rate limit exceeded | 429 Too Many Requests |
| Internal error | 500 Internal Server Error |

See Also