Router Architecture

This document is the canonical specification for the StellaOps Router system.

System Architecture

Scope

  • A single HTTP ingress service (StellaOps.Gateway.WebService) handles all external HTTP traffic
  • Microservices communicate with the Gateway using binary transports (TCP, TLS, UDP, RabbitMQ)
  • HTTP is not used for internal microservice-to-gateway traffic
  • Request/response bodies are opaque to the router (raw bytes/streams)

Transport Architecture

Each transport connection carries:

  • Initial registration (HELLO) and endpoint configuration
  • Ongoing heartbeats
  • Request/response data frames
  • Streaming data frames
  • Cancellation frames
┌─────────────────┐                           ┌─────────────────┐
│   Microservice  │                           │     Gateway     │
│                 │         HELLO             │                 │
│   Endpoints:    │ ─────────────────────────►│   Routing       │
│   - POST /items │         HEARTBEAT         │   State         │
│   - GET /items  │ ◄────────────────────────►│                 │
│                 │                           │   Connections[] │
│                 │  REQUEST / RESPONSE       │                 │
│                 │ ◄────────────────────────►│                 │
│                 │                           │                 │
│                 │  STREAM_DATA / CANCEL     │                 │
│                 │ ◄────────────────────────►│                 │
└─────────────────┘                           └─────────────────┘

Service Identity

Instance Identity

Each microservice instance is identified by:

| Field | Type | Description |
|---|---|---|
| ServiceName | string | Logical service name (e.g., "billing") |
| Version | string | Semantic version (major.minor.patch) |
| Region | string | Deployment region (e.g., "us-east-1") |
| InstanceId | string | Unique instance identifier |

Version Matching

  • Version matching is strict semver equality
  • The router only routes to instances whose version matches exactly
  • A default version is used when the client does not specify one

Region Configuration

Gateway region comes from GatewayNodeConfig:

public sealed class GatewayNodeConfig
{
    public required string Region { get; init; }     // e.g., "eu1"
    public required string NodeId { get; init; }     // e.g., "gw-eu1-01"
    public required string Environment { get; init; } // e.g., "prod"
}

Region is never derived from HTTP headers or URL hostnames.


Endpoint Model

Endpoint Identity

Endpoint identity is (HTTP Method, Path):

| Field | Example |
|---|---|
| Method | GET, POST, PUT, PATCH, DELETE |
| Path | /invoices, /items/{id}, /users/{userId}/orders |

Endpoint Descriptor

Each endpoint includes:

public sealed class EndpointDescriptor
{
    public required string Method { get; init; }
    public required string Path { get; init; }
    public required string ServiceName { get; init; }
    public required string Version { get; init; }
    public TimeSpan DefaultTimeout { get; init; }
    public bool SupportsStreaming { get; init; }
    public IReadOnlyList<ClaimRequirement> RequiringClaims { get; init; } = [];
    public EndpointSchemaInfo? SchemaInfo { get; init; }
}

Path Matching

  • ASP.NET-style route templates
  • Parameter segments: {id}, {userId}
  • Case sensitivity and trailing slash handling follow ASP.NET conventions
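The matching rules above can be illustrated with a minimal sketch. It assumes a simplified subset of ASP.NET route templates: literal segments compared case-insensitively, plain {name} parameters, and leading or trailing slashes ignored. Constraints, optional parameters and catch-all segments are not handled, and RouteTemplateSketch is a hypothetical name, not part of the router.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the StellaOps API.
public static class RouteTemplateSketch
{
    // Returns the captured parameter values on a match, or null when the path does not match.
    public static IReadOnlyDictionary<string, string>? Match(string template, string path)
    {
        // Dropping empty entries ignores leading and trailing slashes.
        var templateSegments = template.Split('/', StringSplitOptions.RemoveEmptyEntries);
        var pathSegments = path.Split('/', StringSplitOptions.RemoveEmptyEntries);
        if (templateSegments.Length != pathSegments.Length) return null;

        var values = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        for (var i = 0; i < templateSegments.Length; i++)
        {
            var segment = templateSegments[i];
            if (segment.Length > 2 && segment[0] == '{' && segment[^1] == '}')
            {
                values[segment[1..^1]] = pathSegments[i]; // parameter segment, e.g. {id}
            }
            else if (!string.Equals(segment, pathSegments[i], StringComparison.OrdinalIgnoreCase))
            {
                return null; // literal segment mismatch
            }
        }
        return values;
    }
}
```

Under these assumptions, Match("/users/{userId}/orders", "/Users/42/orders/") yields userId = "42", while Match("/items/{id}", "/items") returns null because the segment counts differ.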

Routing Algorithm

Instance Selection

Given (ServiceName, Version, Method, Path):

  1. Filter candidates:

    • Match ServiceName exactly
    • Match Version exactly (strict semver)
    • Health status in acceptable set (Healthy or Degraded)
  2. Region preference:

    • Prefer instances where Region == GatewayNodeConfig.Region
    • Fall back to configured neighbor regions
    • Fall back to all other regions
  3. Within region tier:

    • Prefer lower AveragePingMs
    • If tied, prefer more recent LastHeartbeatUtc
    • If still tied, use round-robin balancing
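The three selection steps can be sketched as follows. This is an illustration under stated assumptions, not the Gateway's implementation: Candidate is a hypothetical flattened view of a connection, versions are compared as plain strings, pings are compared for exact equality, and the round-robin counter is a single shared counter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Restated from the Instance Health section so the sketch is self-contained.
public enum InstanceHealthStatus { Unknown, Healthy, Degraded, Draining, Unhealthy }

// Hypothetical flattened view of one connection, for illustration only.
public sealed record Candidate(
    string InstanceId, string ServiceName, string Version, string Region,
    InstanceHealthStatus Status, double AveragePingMs, DateTime LastHeartbeatUtc);

public sealed class InstanceSelectorSketch
{
    private readonly string _localRegion;
    private readonly HashSet<string> _neighborRegions;
    private int _roundRobin; // advanced on every call; only matters for final ties

    public InstanceSelectorSketch(string localRegion, IEnumerable<string> neighborRegions)
    {
        _localRegion = localRegion;
        _neighborRegions = new HashSet<string>(neighborRegions, StringComparer.Ordinal);
    }

    // 0 = gateway's own region, 1 = configured neighbor region, 2 = any other region.
    private int Tier(Candidate c) =>
        c.Region == _localRegion ? 0 : _neighborRegions.Contains(c.Region) ? 1 : 2;

    public Candidate? Select(IEnumerable<Candidate> all, string serviceName, string version)
    {
        // Step 1: exact service and version match, acceptable health only.
        var eligible = all.Where(c =>
            c.ServiceName == serviceName &&
            c.Version == version &&
            (c.Status is InstanceHealthStatus.Healthy or InstanceHealthStatus.Degraded)).ToList();
        if (eligible.Count == 0) return null;

        // Step 2: keep only the best (lowest) region tier that has candidates.
        var bestTier = eligible.Min(c => Tier(c));
        var inTier = eligible.Where(c => Tier(c) == bestTier).ToList();

        // Step 3: lowest ping first, then most recent heartbeat.
        var minPing = inTier.Min(c => c.AveragePingMs);
        var byPing = inTier.Where(c => c.AveragePingMs == minPing).ToList();
        var newest = byPing.Max(c => c.LastHeartbeatUtc);
        var tied = byPing.Where(c => c.LastHeartbeatUtc == newest)
                         .OrderBy(c => c.InstanceId, StringComparer.Ordinal)
                         .ToList();

        // Remaining ties: rotate through them in a stable order.
        var index = (int)((uint)Interlocked.Increment(ref _roundRobin) % (uint)tied.Count);
        return tied[index];
    }
}
```

A null result corresponds to the "no healthy instance" condition; a production implementation would likely compare pings with a tolerance rather than exact equality.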

Instance Health

public enum InstanceHealthStatus
{
    Unknown,
    Healthy,
    Degraded,
    Draining,
    Unhealthy
}

Health metadata per connection:

| Field | Type | Description |
|---|---|---|
| Status | enum | Current health status |
| LastHeartbeatUtc | DateTime | Last heartbeat timestamp |
| AveragePingMs | double | Average round-trip latency |

Transport Layer

Transport Types

| Transport | Use Case | Streaming | Notes |
|---|---|---|---|
| InMemory | Testing | Yes | In-process channels |
| TCP | Production | Yes | Length-prefixed frames |
| TLS | Secure | Yes | Certificate-based encryption |
| UDP | Small payloads | No | Single datagram per frame |
| RabbitMQ | Queuing | Yes | Exchange/queue routing |

Transport Plugin Interface

public interface ITransportServer
{
    Task StartAsync(CancellationToken ct);
    Task StopAsync(CancellationToken ct);
    event Func<ConnectionState, HelloPayload, Task> OnHelloReceived;
    event Func<ConnectionState, HeartbeatPayload, Task> OnHeartbeatReceived;
    event Func<string, Task> OnConnectionClosed;
}

public interface ITransportClient
{
    Task ConnectAsync(CancellationToken ct);
    Task DisconnectAsync(CancellationToken ct);
    Task SendFrameAsync(Frame frame, CancellationToken ct);
}

Frame Types

public enum FrameType : byte
{
    Hello = 1,
    Heartbeat = 2,
    Request = 3,
    Response = 4,
    RequestStreamData = 5,
    ResponseStreamData = 6,
    Cancel = 7
}
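
To make "length-prefixed frames" concrete, here is one possible encoding. The wire layout is not specified in this document, so everything about it is an assumption: a 4-byte big-endian body length, followed by one FrameType byte and the raw payload. Real frames presumably also carry a correlation ID, which this sketch omits. FrameCodecSketch is a hypothetical name.

```csharp
using System;
using System.Buffers.Binary;

// Restated from the Frame Types section so the sketch is self-contained.
public enum FrameType : byte
{
    Hello = 1, Heartbeat = 2, Request = 3, Response = 4,
    RequestStreamData = 5, ResponseStreamData = 6, Cancel = 7
}

// Hypothetical codec; the layout below is an assumption, not the documented wire format.
public static class FrameCodecSketch
{
    // Assumed layout: [4-byte big-endian body length][1-byte FrameType][payload bytes].
    public static byte[] Encode(FrameType type, ReadOnlySpan<byte> payload)
    {
        var buffer = new byte[4 + 1 + payload.Length];
        BinaryPrimitives.WriteInt32BigEndian(buffer, 1 + payload.Length);
        buffer[4] = (byte)type;
        payload.CopyTo(buffer.AsSpan(5));
        return buffer;
    }

    // Returns false when the buffer does not yet contain a complete frame.
    public static bool TryDecode(
        ReadOnlySpan<byte> buffer, out FrameType type, out byte[] payload, out int consumed)
    {
        type = default;
        payload = Array.Empty<byte>();
        consumed = 0;

        if (buffer.Length < 5) return false; // header incomplete
        var bodyLength = BinaryPrimitives.ReadInt32BigEndian(buffer);
        if (bodyLength < 1) throw new InvalidOperationException("Malformed frame length.");
        if (buffer.Length - 4 < bodyLength) return false; // body incomplete

        type = (FrameType)buffer[4];
        payload = buffer.Slice(5, bodyLength - 1).ToArray();
        consumed = 4 + bodyLength;
        return true;
    }
}
```

The consumed count lets a reader loop advance through a receive buffer that holds several frames or a partial one, which is the usual reason for prefixing a length on a stream transport such as TCP.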

Gateway Pipeline

HTTP Middleware Stack

Request ─►│ ForwardedHeaders        │
          │ RequestLogging          │
          │ ErrorHandling           │
          │ Authentication          │
          │ EndpointResolution      │  ◄── (Method, Path) → EndpointDescriptor
          │ Authorization           │  ◄── RequiringClaims check
          │ RoutingDecision         │  ◄── Select connection/instance
          │ TransportDispatch       │  ◄── Send to microservice
          ▼

Connection State

Per-connection state maintained by Gateway:

public sealed class ConnectionState
{
    public required string ConnectionId { get; init; }
    public required InstanceDescriptor Instance { get; init; }
    public InstanceHealthStatus Status { get; set; }
    public DateTime? LastHeartbeatUtc { get; set; }
    public double AveragePingMs { get; set; }
    public TransportType TransportType { get; init; }
    public Dictionary<(string Method, string Path), EndpointDescriptor> Endpoints { get; } = new();
    public IReadOnlyDictionary<string, SchemaDefinition> Schemas { get; init; } = new Dictionary<string, SchemaDefinition>();
}

Payload Handling

The Gateway treats bodies as opaque byte sequences:

  • No deserialization or schema interpretation
  • Headers and bytes forwarded as-is
  • Schema validation is microservice responsibility

Payload Limits

Configurable limits protect against resource exhaustion:

| Limit | Scope |
|---|---|
| MaxRequestBytesPerCall | Single request |
| MaxRequestBytesPerConnection | All requests on connection |
| MaxAggregateInflightBytes | All in-flight across gateway |

When a limit is exceeded, the result is:

  • Early rejection (HTTP 413) if Content-Length is known
  • Mid-stream abort with a CANCEL frame if the limit is crossed while the body is streaming
  • An appropriate error response to the client (413 or 503)
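
A minimal sketch of how the per-call and aggregate limits might be tracked as body chunks arrive. It is an illustration only: PayloadBudgetSketch and LimitResult are hypothetical names, the per-connection limit is left out for brevity, and the mapping of a per-call overflow to 413 and an aggregate overflow to 503 is an assumption.

```csharp
using System.Threading;

public enum LimitResult { Ok, CallLimitExceeded, AggregateLimitExceeded }

// Hypothetical tracker for illustration; not the Gateway's implementation.
public sealed class PayloadBudgetSketch
{
    private readonly long _maxRequestBytesPerCall;
    private readonly long _maxAggregateInflightBytes;
    private long _aggregateInflightBytes;

    public PayloadBudgetSketch(long maxRequestBytesPerCall, long maxAggregateInflightBytes)
    {
        _maxRequestBytesPerCall = maxRequestBytesPerCall;
        _maxAggregateInflightBytes = maxAggregateInflightBytes;
    }

    // Early rejection is only possible when the client declared a Content-Length.
    public bool ShouldRejectEarly(long? contentLength) =>
        contentLength.HasValue && contentLength.Value > _maxRequestBytesPerCall;

    // Called for each body chunk; callBytes is the running total for this request.
    public LimitResult TryReserve(ref long callBytes, int chunkLength)
    {
        if (callBytes + chunkLength > _maxRequestBytesPerCall)
            return LimitResult.CallLimitExceeded; // assumed to surface as 413

        var total = Interlocked.Add(ref _aggregateInflightBytes, chunkLength);
        if (total > _maxAggregateInflightBytes)
        {
            Interlocked.Add(ref _aggregateInflightBytes, -chunkLength); // undo the reservation
            return LimitResult.AggregateLimitExceeded; // assumed to surface as 503
        }

        callBytes += chunkLength;
        return LimitResult.Ok;
    }

    // Called when the request completes or is cancelled.
    public void Release(long callBytes) =>
        Interlocked.Add(ref _aggregateInflightBytes, -callBytes);
}
```

A non-Ok result from TryReserve after the body has started streaming is the point at which the Gateway would abort and send CANCEL with reason "PayloadLimitExceeded".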

Microservice SDK

Configuration

services.AddStellaMicroservice(options =>
{
    options.ServiceName = "billing";
    options.Version = "1.0.0";
    options.Region = "us-east-1";
    options.InstanceId = Guid.NewGuid().ToString();
    options.ServiceDescription = "Invoice processing service";
});

Endpoint Declaration

Attributes:

[StellaEndpoint("POST", "/invoices")]
public sealed class CreateInvoiceEndpoint : IStellaEndpoint<CreateInvoiceRequest, CreateInvoiceResponse>

Handler Interfaces

Typed handler (JSON serialization):

public interface IStellaEndpoint<TRequest, TResponse>
{
    Task<TResponse> HandleAsync(TRequest request, CancellationToken ct);
}

public interface IStellaEndpoint<TResponse>
{
    Task<TResponse> HandleAsync(CancellationToken ct);
}

Raw handler (streaming):

public interface IRawStellaEndpoint
{
    Task<RawResponse> HandleAsync(RawRequestContext ctx, CancellationToken ct);
}

Endpoint Discovery

Two mechanisms:

  1. Source Generator (preferred): Compile-time discovery via Roslyn
  2. Reflection (fallback): Runtime assembly scanning

Connection Behavior

On connection:

  1. Send HELLO with instance info and endpoints
  2. Start heartbeat timer
  3. Listen for REQUEST frames

HELLO payload:

public sealed class HelloPayload
{
    public required InstanceDescriptor Instance { get; init; }
    public required IReadOnlyList<EndpointDescriptor> Endpoints { get; init; }
    public IReadOnlyDictionary<string, SchemaDefinition> Schemas { get; init; } = new Dictionary<string, SchemaDefinition>();
    public ServiceOpenApiInfo? OpenApiInfo { get; init; }
}

Authorization

Claims-based Model

Authorization uses RequiringClaims, not roles:

public sealed class ClaimRequirement
{
    public required string Type { get; init; }
    public string? Value { get; init; }
}

Precedence

  1. Microservice provides defaults in HELLO
  2. Authority can override centrally
  3. Gateway enforces final effective claims

Enforcement

Gateway AuthorizationMiddleware:

  • Validates user principal has all required claims
  • Empty claims list = authenticated access only
  • Missing claim = 403 Forbidden
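
The enforcement rules can be sketched with the standard System.Security.Claims types. This is an illustration, not the AuthorizationMiddleware itself: ClaimCheckSketch is a hypothetical name, and treating a null Value as "any value of that claim type" is an assumption.

```csharp
using System.Collections.Generic;
using System.Security.Claims;

// Restated from the Claims-based Model section so the sketch is self-contained.
public sealed class ClaimRequirement
{
    public required string Type { get; init; }
    public string? Value { get; init; }
}

// Hypothetical helper for illustration only.
public static class ClaimCheckSketch
{
    // Returns the HTTP status to reject with, or null when the request may proceed.
    public static int? Evaluate(ClaimsPrincipal user, IReadOnlyList<ClaimRequirement> required)
    {
        // An empty requirement list still demands an authenticated caller.
        if (user.Identity?.IsAuthenticated != true) return 401;

        foreach (var requirement in required)
        {
            // Assumption: a null Value accepts any value of the given claim type.
            var satisfied = user.HasClaim(c =>
                c.Type == requirement.Type &&
                (requirement.Value is null || c.Value == requirement.Value));
            if (!satisfied) return 403;
        }
        return null;
    }
}
```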

Cancellation

CANCEL Frame

public sealed class CancelPayload
{
    public required string Reason { get; init; }
    // Values: "ClientDisconnected", "Timeout", "PayloadLimitExceeded", "Shutdown"
}

Gateway sends CANCEL when:

  • HTTP client disconnects (HttpContext.RequestAborted)
  • Request timeout elapses
  • Payload limit exceeded
  • Gateway shutdown

Microservice handles CANCEL:

  • Maps correlation ID to CancellationTokenSource
  • Calls Cancel() on the source
  • Handler receives cancellation via CancellationToken
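
The microservice-side handling can be sketched as a registry of in-flight requests. This is a hedged illustration: InflightRequestsSketch is a hypothetical name, and representing the correlation ID as a string is an assumption, since the document does not state its type.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical registry for illustration; not the SDK's implementation.
public sealed class InflightRequestsSketch
{
    private readonly ConcurrentDictionary<string, CancellationTokenSource> _inflight = new();

    // Called when a REQUEST frame arrives; the returned token is passed to the handler.
    public CancellationToken Begin(string correlationId)
    {
        var cts = new CancellationTokenSource();
        if (!_inflight.TryAdd(correlationId, cts))
        {
            cts.Dispose();
            throw new InvalidOperationException($"Duplicate correlation ID '{correlationId}'.");
        }
        return cts.Token;
    }

    // Called when a CANCEL frame arrives; unknown IDs are ignored because
    // the request may already have completed.
    public bool Cancel(string correlationId)
    {
        if (!_inflight.TryGetValue(correlationId, out var cts)) return false;
        try
        {
            cts.Cancel();
            return true;
        }
        catch (ObjectDisposedException)
        {
            return false; // the handler finished between lookup and Cancel()
        }
    }

    // Called when the handler finishes, whether it succeeded, failed, or was cancelled.
    public void Complete(string correlationId)
    {
        if (_inflight.TryRemove(correlationId, out var cts)) cts.Dispose();
    }
}
```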

Streaming

Buffered vs Streaming

| Mode | Request Body | Response Body | Use Case |
|---|---|---|---|
| Buffered | Full in memory | Full in memory | Small payloads |
| Streaming | Chunked frames | Chunked frames | Large payloads |

Frame Flow (Streaming)

Gateway                              Microservice
   │                                      │
   │  REQUEST (headers only)              │
   │ ────────────────────────────────────►│
   │                                      │
   │  REQUEST_STREAM_DATA (chunk 1)       │
   │ ────────────────────────────────────►│
   │                                      │
   │  REQUEST_STREAM_DATA (chunk n)       │
   │ ────────────────────────────────────►│
   │                                      │
   │  REQUEST_STREAM_DATA (final=true)    │
   │ ────────────────────────────────────►│
   │                                      │
   │                   RESPONSE           │
   │◄────────────────────────────────────│
   │                                      │
   │         RESPONSE_STREAM_DATA         │
   │◄────────────────────────────────────│
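
The request half of this flow can be sketched as an async iterator that turns a body stream into chunks. It assumes, following the diagram, that the end of the body is signalled by a separate frame with final=true and no data. StreamChunk, StreamChunkerSketch and the chunkSize parameter are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical chunk type: one REQUEST_STREAM_DATA frame's worth of body.
public readonly record struct StreamChunk(ReadOnlyMemory<byte> Data, bool Final);

public static class StreamChunkerSketch
{
    public static async IAsyncEnumerable<StreamChunk> ReadChunksAsync(
        Stream body, int chunkSize, [EnumeratorCancellation] CancellationToken ct = default)
    {
        var buffer = new byte[chunkSize];
        int read;
        while ((read = await body.ReadAsync(buffer.AsMemory(0, chunkSize), ct)) > 0)
        {
            // Copy the bytes so the chunk stays valid while the buffer is reused.
            yield return new StreamChunk(buffer[..read], Final: false);
        }

        // Assumed end-of-body marker: an empty chunk with Final set.
        yield return new StreamChunk(ReadOnlyMemory<byte>.Empty, Final: true);
    }
}
```

Passing the request's cancellation token into the iterator means a CANCEL condition stops the read loop without buffering the rest of the body.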

Heartbeat & Health

Heartbeat Frame

Sent at regular intervals over the same connection as requests:

public sealed class HeartbeatPayload
{
    public required InstanceHealthStatus Status { get; init; }
    public int InflightRequests { get; init; }
    public double ErrorRate { get; init; }
}

Health Tracking

The Gateway:

  • Tracks LastHeartbeatUtc per connection
  • Derives status from heartbeat recency
  • Marks stale instances as Unhealthy
  • Uses health in routing decisions
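
Deriving status from heartbeat recency might look like the sketch below. The two thresholds (degradedAfter and unhealthyAfter) are hypothetical parameters, as is the rule that a late heartbeat downgrades a Healthy instance to Degraded; the document only states that stale instances become Unhealthy.

```csharp
using System;

// Restated from the Instance Health section so the sketch is self-contained.
public enum InstanceHealthStatus { Unknown, Healthy, Degraded, Draining, Unhealthy }

// Hypothetical helper for illustration only.
public static class HealthDerivationSketch
{
    public static InstanceHealthStatus Derive(
        InstanceHealthStatus reported, DateTime? lastHeartbeatUtc, DateTime nowUtc,
        TimeSpan degradedAfter, TimeSpan unhealthyAfter)
    {
        // No heartbeat received yet: nothing to base a status on.
        if (lastHeartbeatUtc is null) return InstanceHealthStatus.Unknown;

        var age = nowUtc - lastHeartbeatUtc.Value;
        if (age >= unhealthyAfter) return InstanceHealthStatus.Unhealthy; // stale instance

        // Assumption: a late heartbeat only downgrades an instance that reported Healthy.
        if (age >= degradedAfter && reported == InstanceHealthStatus.Healthy)
            return InstanceHealthStatus.Degraded;

        return reported;
    }
}
```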

Configuration

Router YAML

# router.yaml
Gateway:
  Region: "us-east-1"
  NodeId: "gw-east-01"
  Environment: "production"

PayloadLimits:
  MaxRequestBytesPerCall: 10485760        # 10 MB
  MaxRequestBytesPerConnection: 104857600  # 100 MB
  MaxAggregateInflightBytes: 1073741824    # 1 GB

Services:
  - ServiceName: billing
    DefaultVersion: "1.0.0"
    DefaultTransport: Tcp
    Endpoints:
      - Method: POST
        Path: /invoices
        TimeoutSeconds: 30
        RequiringClaims:
          - Type: "invoices:write"

OpenApi:
  Title: "StellaOps Gateway API"
  CacheTtlSeconds: 60

Hot Reload

  • YAML changes picked up at runtime
  • Routing state updated without restart
  • New services/endpoints added dynamically

Error Mapping

| Condition | HTTP Status |
|---|---|
| Version not found | 404 Not Found |
| No healthy instance | 503 Service Unavailable |
| Request timeout | 504 Gateway Timeout |
| Payload too large | 413 Payload Too Large |
| Unauthorized | 401 Unauthorized |
| Missing claims | 403 Forbidden |
| Validation error | 422 Unprocessable Entity |
| Internal error | 500 Internal Server Error |
