# Router Architecture
This document is the canonical specification for the StellaOps Router system.
## System Architecture
### Scope
- A single HTTP ingress service (`StellaOps.Gateway.WebService`) handles all external HTTP traffic
- Microservices communicate with the Gateway using binary transports (TCP, TLS, UDP, RabbitMQ)
- HTTP is not used for internal microservice-to-gateway traffic
- Request/response bodies are opaque to the router (raw bytes/streams)
### Transport Architecture
Each transport connection carries:
- Initial registration (HELLO) and endpoint configuration
- Ongoing heartbeats
- Request/response data frames
- Streaming data frames
- Cancellation frames
```
┌─────────────────┐                           ┌─────────────────┐
│  Microservice   │                           │     Gateway     │
│                 │           HELLO           │                 │
│  Endpoints:     │ ─────────────────────────►│  Routing        │
│  - POST /items  │         HEARTBEAT         │  State          │
│  - GET /items   │ ◄────────────────────────►│                 │
│                 │                           │  Connections[]  │
│                 │    REQUEST / RESPONSE     │                 │
│                 │ ◄────────────────────────►│                 │
│                 │                           │                 │
│                 │   STREAM_DATA / CANCEL    │                 │
│                 │ ◄────────────────────────►│                 │
└─────────────────┘                           └─────────────────┘
```
---
## Service Identity
### Instance Identity
Each microservice instance is identified by:
| Field | Type | Description |
|-------|------|-------------|
| `ServiceName` | string | Logical service name (e.g., "billing") |
| `Version` | string | Semantic version (`major.minor.patch`) |
| `Region` | string | Deployment region (e.g., "us-east-1") |
| `InstanceId` | string | Unique instance identifier |
### Version Matching
- Version matching uses strict semver equality
- The router only routes to instances whose version matches exactly
- When the client does not specify a version, the service's default version is used
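
The version rules above can be sketched as follows. This is a minimal illustration, not the router's implementation. `VersionResolver` is a hypothetical name. The sketch assumes versions are plain `major.minor.patch` strings without pre-release or build suffixes, and that the default comes from the service's configured `DefaultVersion`.

```csharp
using System.Globalization;

// Hypothetical helper: strict semver equality plus default-version fallback.
public static class VersionResolver
{
    // Use the client's requested version when present; otherwise the service default.
    public static string Resolve(string? requested, string defaultVersion) =>
        string.IsNullOrWhiteSpace(requested) ? defaultVersion : requested.Trim();

    // Strict equality: major, minor and patch must all be identical.
    public static bool IsExactMatch(string requested, string instanceVersion) =>
        TryParse(requested, out var a) && TryParse(instanceVersion, out var b) && a == b;

    // Assumes plain "major.minor.patch" with non-negative integers and no suffixes.
    private static bool TryParse(string text, out (int Major, int Minor, int Patch) version)
    {
        version = default;
        var parts = text.Split('.');
        if (parts.Length != 3) return false;
        if (!TryParsePart(parts[0], out var major) ||
            !TryParsePart(parts[1], out var minor) ||
            !TryParsePart(parts[2], out var patch)) return false;
        version = (major, minor, patch);
        return true;
    }

    // NumberStyles.None rejects signs and whitespace such as "-1" or " 2".
    private static bool TryParsePart(string s, out int value) =>
        int.TryParse(s, NumberStyles.None, CultureInfo.InvariantCulture, out value);
}
```

Parsing into numeric components, rather than comparing raw strings, makes malformed versions fail to match instead of being treated as just another distinct version.
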
### Region Configuration
Gateway region comes from `GatewayNodeConfig`:
```csharp
public sealed class GatewayNodeConfig
{
    public required string Region { get; init; }      // e.g., "eu1"
    public required string NodeId { get; init; }      // e.g., "gw-eu1-01"
    public required string Environment { get; init; } // e.g., "prod"
}
```
Region is never derived from HTTP headers or URL hostnames.
---
## Endpoint Model
### Endpoint Identity
Endpoint identity is `(HTTP Method, Path)`:
| Field | Example |
|-------|---------|
| Method | `GET`, `POST`, `PUT`, `PATCH`, `DELETE` |
| Path | `/invoices`, `/items/{id}`, `/users/{userId}/orders` |
### Endpoint Descriptor
Each endpoint includes:
```csharp
public sealed class EndpointDescriptor
{
    public required string Method { get; init; }
    public required string Path { get; init; }
    public required string ServiceName { get; init; }
    public required string Version { get; init; }
    public TimeSpan DefaultTimeout { get; init; }
    public bool SupportsStreaming { get; init; }
    public IReadOnlyList<ClaimRequirement> RequiringClaims { get; init; } = [];
    public EndpointSchemaInfo? SchemaInfo { get; init; }
}
```
### Path Matching
- ASP.NET-style route templates
- Parameter segments: `{id}`, `{userId}`
- Case sensitivity and trailing slash handling follow ASP.NET conventions
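
The following is a minimal matcher for templates such as `/users/{userId}/orders`. `PathTemplate` is a hypothetical name. The sketch assumes that literal segments compare case-insensitively and that a trailing slash is ignored, which is our reading of ASP.NET Core routing defaults. It does not handle route constraints (`{id:int}`), optional parameters, or catch-all segments.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical template matcher. `values` is only meaningful when TryMatch returns true.
public static class PathTemplate
{
    public static bool TryMatch(string template, string path, out Dictionary<string, string> values)
    {
        values = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        // Trimming '/' makes "/items/7" and "/items/7/" equivalent (assumed trailing-slash rule).
        var templateSegments = template.Trim('/').Split('/');
        var pathSegments = path.Trim('/').Split('/');
        if (templateSegments.Length != pathSegments.Length) return false;

        for (var i = 0; i < templateSegments.Length; i++)
        {
            var segment = templateSegments[i];
            var isParameter = segment.Length > 2 && segment[0] == '{' && segment[^1] == '}';
            if (isParameter)
            {
                if (pathSegments[i].Length == 0) return false; // parameters must be non-empty
                values[segment[1..^1]] = pathSegments[i];      // "{id}" -> key "id"
            }
            else if (!string.Equals(segment, pathSegments[i], StringComparison.OrdinalIgnoreCase))
            {
                return false; // literal segment mismatch
            }
        }
        return true;
    }
}
```
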
---
## Routing Algorithm
### Instance Selection
Given `(ServiceName, Version, Method, Path)`:
1. **Filter candidates**:
- Match `ServiceName` exactly
- Match `Version` exactly (strict semver)
- Health status in acceptable set (`Healthy` or `Degraded`)
2. **Region preference**:
- Prefer instances where `Region == GatewayNodeConfig.Region`
- Fall back to configured neighbor regions
- Fall back to all other regions
3. **Within region tier**:
- Prefer lower `AveragePingMs`
- If tied, prefer more recent `LastHeartbeatUtc`
- If still tied, use round-robin balancing
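
The selection steps above can be sketched as follows, under these assumptions:

- `Candidate` and `InstanceSelector` are hypothetical names.
- The neighbor-region list is supplied by configuration.
- Ping ties use exact floating-point equality.
- The `InstanceHealthStatus` enum is repeated so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Mirrors the document's InstanceHealthStatus enum.
public enum InstanceHealthStatus { Unknown, Healthy, Degraded, Draining, Unhealthy }

// Hypothetical flattened view of one connection; field names follow this document.
public sealed record Candidate(
    string ConnectionId, string ServiceName, string Version, string Region,
    InstanceHealthStatus Status, double AveragePingMs, DateTime LastHeartbeatUtc);

public sealed class InstanceSelector
{
    private readonly string _localRegion;
    private readonly IReadOnlyList<string> _neighborRegions;
    private int _roundRobin = -1;

    public InstanceSelector(string localRegion, IReadOnlyList<string> neighborRegions)
    {
        _localRegion = localRegion;
        _neighborRegions = neighborRegions;
    }

    public Candidate? Select(IEnumerable<Candidate> all, string serviceName, string version)
    {
        // 1. Filter: exact service name, exact version, acceptable health.
        var eligible = all
            .Where(c => c.ServiceName == serviceName && c.Version == version)
            .Where(c => c.Status is InstanceHealthStatus.Healthy or InstanceHealthStatus.Degraded)
            .ToList();
        if (eligible.Count == 0) return null;

        // 2. Region tier: 0 = gateway's region, 1 = configured neighbor, 2 = any other region.
        int Tier(Candidate c) =>
            c.Region == _localRegion ? 0 : _neighborRegions.Contains(c.Region) ? 1 : 2;
        var bestTier = eligible.Min(c => Tier(c));
        var inTier = eligible.Where(c => Tier(c) == bestTier).ToList();

        // 3. Lowest average ping, then most recent heartbeat.
        var bestPing = inTier.Min(c => c.AveragePingMs);
        var byPing = inTier.Where(c => c.AveragePingMs == bestPing).ToList();
        var latest = byPing.Max(c => c.LastHeartbeatUtc);
        var tied = byPing
            .Where(c => c.LastHeartbeatUtc == latest)
            .OrderBy(c => c.ConnectionId, StringComparer.Ordinal)
            .ToList();

        // 4. Remaining ties rotate round-robin (uint cast keeps the index valid after overflow).
        var next = (uint)Interlocked.Increment(ref _roundRobin);
        return tied[(int)(next % (uint)tied.Count)];
    }
}
```

Ordering tied candidates by `ConnectionId` gives the round-robin counter a stable sequence to rotate through across calls.
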
### Instance Health
```csharp
public enum InstanceHealthStatus
{
    Unknown,
    Healthy,
    Degraded,
    Draining,
    Unhealthy
}
```
Health metadata per connection:
| Field | Type | Description |
|-------|------|-------------|
| `Status` | enum | Current health status |
| `LastHeartbeatUtc` | DateTime | Last heartbeat timestamp |
| `AveragePingMs` | double | Average round-trip latency |
---
## Transport Layer
### Transport Types
| Transport | Use Case | Streaming | Notes |
|-----------|----------|-----------|-------|
| InMemory | Testing | Yes | In-process channels |
| TCP | Production | Yes | Length-prefixed frames |
| TLS | Secure | Yes | Certificate-based encryption |
| UDP | Small payloads | No | Single datagram per frame |
| RabbitMQ | Queuing | Yes | Exchange/queue routing |
### Transport Plugin Interface
```csharp
public interface ITransportServer
{
    Task StartAsync(CancellationToken ct);
    Task StopAsync(CancellationToken ct);

    event Func<ConnectionState, HelloPayload, Task> OnHelloReceived;
    event Func<ConnectionState, HeartbeatPayload, Task> OnHeartbeatReceived;
    event Func<string, Task> OnConnectionClosed;
}

public interface ITransportClient
{
    Task ConnectAsync(CancellationToken ct);
    Task DisconnectAsync(CancellationToken ct);
    Task SendFrameAsync(Frame frame, CancellationToken ct);
}
```
### Frame Types
```csharp
public enum FrameType : byte
{
    Hello = 1,
    Heartbeat = 2,
    Request = 3,
    Response = 4,
    RequestStreamData = 5,
    ResponseStreamData = 6,
    Cancel = 7
}
```
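
The transport table states that TCP uses length-prefixed frames, but this document does not define the wire layout. The sketch below assumes a hypothetical layout: a 4-byte big-endian length, one `FrameType` byte, then the payload. `FrameCodec` and its 1 MiB default frame cap are illustrative; this is not the router's actual protocol.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

public enum FrameType : byte
{
    Hello = 1, Heartbeat = 2, Request = 3, Response = 4,
    RequestStreamData = 5, ResponseStreamData = 6, Cancel = 7
}

// Hypothetical layout: [int32 big-endian length of type+payload][1-byte FrameType][payload]
public static class FrameCodec
{
    public static byte[] Encode(FrameType type, ReadOnlySpan<byte> payload)
    {
        var buffer = new byte[4 + 1 + payload.Length];
        BinaryPrimitives.WriteInt32BigEndian(buffer, 1 + payload.Length);
        buffer[4] = (byte)type;
        payload.CopyTo(buffer.AsSpan(5));
        return buffer;
    }

    // Reads one frame; returns null if the stream ends cleanly before a new frame starts.
    public static (FrameType Type, byte[] Payload)? Decode(Stream stream, int maxFrameBytes = 1 << 20)
    {
        var header = new byte[4];
        if (!FillBuffer(stream, header)) return null;

        var length = BinaryPrimitives.ReadInt32BigEndian(header);
        if (length < 1 || length > maxFrameBytes)
            throw new InvalidDataException($"Invalid frame length {length}.");

        var body = new byte[length];
        if (!FillBuffer(stream, body)) throw new EndOfStreamException("Truncated frame.");
        return ((FrameType)body[0], body[1..]);
    }

    // Returns false only if the stream ended before any byte was read; a partial read throws.
    private static bool FillBuffer(Stream stream, byte[] buffer)
    {
        var read = 0;
        while (read < buffer.Length)
        {
            var n = stream.Read(buffer, read, buffer.Length - read);
            if (n == 0)
            {
                if (read == 0) return false;
                throw new EndOfStreamException("Truncated frame.");
            }
            read += n;
        }
        return true;
    }
}
```

Capping the declared length before allocating protects the reader from a corrupt or hostile prefix that claims a multi-gigabyte frame.
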
---
## Gateway Pipeline
### HTTP Middleware Stack
```
Request ─►│ ForwardedHeaders │
│ RequestLogging │
│ ErrorHandling │
│ Authentication │
│ EndpointResolution │ ◄── (Method, Path) → EndpointDescriptor
│ Authorization │ ◄── RequiringClaims check
│ RoutingDecision │ ◄── Select connection/instance
│ TransportDispatch │ ◄── Send to microservice
```
### Connection State
Per-connection state maintained by Gateway:
```csharp
public sealed class ConnectionState
{
    public required string ConnectionId { get; init; }
    public required InstanceDescriptor Instance { get; init; }
    public InstanceHealthStatus Status { get; set; }
    public DateTime? LastHeartbeatUtc { get; set; }
    public double AveragePingMs { get; set; }
    public TransportType TransportType { get; init; }
    public Dictionary<(string Method, string Path), EndpointDescriptor> Endpoints { get; } = new();
    public IReadOnlyDictionary<string, SchemaDefinition> Schemas { get; init; } = new Dictionary<string, SchemaDefinition>();
}
```
### Payload Handling
The Gateway treats bodies as opaque byte sequences:
- No deserialization or schema interpretation
- Headers and bytes forwarded as-is
- Schema validation is microservice responsibility
### Payload Limits
Configurable limits protect against resource exhaustion:
| Limit | Scope |
|-------|-------|
| `MaxRequestBytesPerCall` | Single request |
| `MaxRequestBytesPerConnection` | All requests on connection |
| `MaxAggregateInflightBytes` | All in-flight across gateway |
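Exceeding a limit results in:
- Early rejection with HTTP 413 when `Content-Length` is known in advance
- A mid-stream abort, with a CANCEL frame sent to the microservice, when a limit is exceeded while the body is streaming
- An error response to the client (413 or 503)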
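
The three limits can be enforced with lock-free counters. This is a sketch under these assumptions:

- `MaxRequestBytesPerConnection` counts bytes currently in flight on the connection, released when each request finishes, rather than a lifetime total.
- The class and enum names are hypothetical.
- Mapping the result to 413 or 503 is left to the caller.

```csharp
using System.Threading;

public enum LimitExceeded { None, PerCall, PerConnection, Aggregate }

// Gateway-wide in-flight byte counter (MaxAggregateInflightBytes).
public sealed class GatewayBudget
{
    private readonly long _maxAggregate;
    private long _inflight;

    public GatewayBudget(long maxAggregate) => _maxAggregate = maxAggregate;

    public long InflightBytes => Interlocked.Read(ref _inflight);

    public bool TryAdd(long bytes)
    {
        if (Interlocked.Add(ref _inflight, bytes) <= _maxAggregate) return true;
        Interlocked.Add(ref _inflight, -bytes); // roll back the optimistic reservation
        return false;
    }

    public void Release(long bytes) => Interlocked.Add(ref _inflight, -bytes);
}

// Per-connection counter (MaxRequestBytesPerConnection) plus the per-call check.
public sealed class ConnectionBudget
{
    private readonly GatewayBudget _gateway;
    private readonly long _maxPerCall;
    private readonly long _maxPerConnection;
    private long _connectionBytes;

    public ConnectionBudget(GatewayBudget gateway, long maxPerCall, long maxPerConnection)
    {
        _gateway = gateway;
        _maxPerCall = maxPerCall;
        _maxPerConnection = maxPerConnection;
    }

    // callBytesSoFar: bytes already reserved for this request. When Content-Length is known,
    // reserve it once (callBytesSoFar = 0); otherwise reserve each streamed chunk as it arrives.
    public LimitExceeded TryReserve(long callBytesSoFar, long chunkBytes)
    {
        if (callBytesSoFar + chunkBytes > _maxPerCall) return LimitExceeded.PerCall;

        if (Interlocked.Add(ref _connectionBytes, chunkBytes) > _maxPerConnection)
        {
            Interlocked.Add(ref _connectionBytes, -chunkBytes);
            return LimitExceeded.PerConnection;
        }
        if (!_gateway.TryAdd(chunkBytes))
        {
            Interlocked.Add(ref _connectionBytes, -chunkBytes);
            return LimitExceeded.Aggregate;
        }
        return LimitExceeded.None;
    }

    // Call when a request completes or is cancelled, with the bytes it had reserved.
    public void Release(long bytes)
    {
        Interlocked.Add(ref _connectionBytes, -bytes);
        _gateway.Release(bytes);
    }
}
```

Reserving optimistically and rolling back keeps the counters lock-free. The trade-off is that a concurrent request can occasionally be rejected even though it would have fit.
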
---
## Microservice SDK
### Configuration
```csharp
services.AddStellaMicroservice(options =>
{
    options.ServiceName = "billing";
    options.Version = "1.0.0";
    options.Region = "us-east-1";
    options.InstanceId = Guid.NewGuid().ToString();
    options.ServiceDescription = "Invoice processing service";
});
```
### Endpoint Declaration
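Endpoints are declared by applying the `[StellaEndpoint]` attribute to a handler class:
```csharp
[StellaEndpoint("POST", "/invoices")]
public sealed class CreateInvoiceEndpoint : IStellaEndpoint<CreateInvoiceRequest, CreateInvoiceResponse>
{
    public Task<CreateInvoiceResponse> HandleAsync(CreateInvoiceRequest request, CancellationToken ct)
        => throw new NotImplementedException(); // invoice-creation logic goes here
}
```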
### Handler Interfaces
**Typed handler** (JSON serialization):
```csharp
public interface IStellaEndpoint<TRequest, TResponse>
{
    Task<TResponse> HandleAsync(TRequest request, CancellationToken ct);
}

public interface IStellaEndpoint<TResponse>
{
    Task<TResponse> HandleAsync(CancellationToken ct);
}
```
**Raw handler** (streaming):
```csharp
public interface IRawStellaEndpoint
{
    Task<RawResponse> HandleAsync(RawRequestContext ctx, CancellationToken ct);
}
```
### Endpoint Discovery
Two mechanisms:
1. **Source Generator** (preferred): Compile-time discovery via Roslyn
2. **Reflection** (fallback): Runtime assembly scanning
### Connection Behavior
On connection:
1. Send HELLO with instance info and endpoints
2. Start heartbeat timer
3. Listen for REQUEST frames
HELLO payload:
```csharp
public sealed class HelloPayload
{
    public required InstanceDescriptor Instance { get; init; }
    public required IReadOnlyList<EndpointDescriptor> Endpoints { get; init; }
    public IReadOnlyDictionary<string, SchemaDefinition> Schemas { get; init; } = new Dictionary<string, SchemaDefinition>();
    public ServiceOpenApiInfo? OpenApiInfo { get; init; }
}
```
---
## Authorization
### Claims-based Model
Authorization uses `RequiringClaims`, not roles:
```csharp
public sealed class ClaimRequirement
{
    public required string Type { get; init; }
    public string? Value { get; init; }
}
```
### Precedence
1. Microservice provides defaults in HELLO
2. Authority can override centrally
3. Gateway enforces final effective claims
### Enforcement
Gateway `AuthorizationMiddleware`:
- Validates that the user principal holds every required claim
- An empty claims list means any authenticated user is allowed
- A missing claim results in 403 Forbidden
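
One way to express these enforcement rules with `System.Security.Claims` is shown below. `ClaimsCheck` is a hypothetical name. The `ClaimRequirement` class is repeated so the sketch compiles on its own. Treating a null `Value` as "any value of this claim type" is an assumption.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Security.Claims;

public sealed class ClaimRequirement
{
    public required string Type { get; init; }
    public string? Value { get; init; }
}

public static class ClaimsCheck
{
    // Returns 401 if unauthenticated, 403 if any requirement is unmet, 200 if the request may proceed.
    public static int Evaluate(ClaimsPrincipal user, IReadOnlyList<ClaimRequirement> requirements)
    {
        if (user.Identity?.IsAuthenticated != true) return 401;

        // All() over an empty list is true, so no requirements means "authenticated only".
        var satisfied = requirements.All(r => r.Value is null
            ? user.HasClaim(c => c.Type == r.Type)   // assumed: presence of the claim type suffices
            : user.HasClaim(r.Type, r.Value));
        return satisfied ? 200 : 403;
    }
}
```
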
---
## Cancellation
### CANCEL Frame
```csharp
public sealed class CancelPayload
{
    public required string Reason { get; init; }
    // Values: "ClientDisconnected", "Timeout", "PayloadLimitExceeded", "Shutdown"
}
```
### Gateway sends CANCEL when:
- HTTP client disconnects (`HttpContext.RequestAborted`)
- Request timeout elapses
- Payload limit exceeded
- Gateway shutdown
### Microservice handles CANCEL:
- Maps correlation ID to `CancellationTokenSource`
- Calls `Cancel()` on the source
- Handler receives cancellation via `CancellationToken`
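
The microservice-side steps above can be sketched as a small registry. `InflightRequests` and its method names are hypothetical. The sketch assumes each REQUEST frame carries a correlation ID that the matching CANCEL frame repeats.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical map from request correlation ID to its cancellation source.
public sealed class InflightRequests
{
    private readonly ConcurrentDictionary<string, CancellationTokenSource> _byCorrelationId = new();

    // On REQUEST: register the call and hand the token to the endpoint handler.
    public CancellationToken Register(string correlationId)
    {
        var cts = new CancellationTokenSource();
        if (!_byCorrelationId.TryAdd(correlationId, cts))
        {
            cts.Dispose();
            throw new InvalidOperationException($"Duplicate correlation ID '{correlationId}'.");
        }
        return cts.Token;
    }

    // On CANCEL: signal the handler; returns false if the request is unknown or already finished.
    public bool Cancel(string correlationId)
    {
        if (!_byCorrelationId.TryGetValue(correlationId, out var cts)) return false;
        try
        {
            cts.Cancel();
            return true;
        }
        catch (ObjectDisposedException)
        {
            return false; // the request completed concurrently
        }
    }

    // When the handler finishes (success, failure or cancellation): forget the request.
    public void Complete(string correlationId)
    {
        if (_byCorrelationId.TryRemove(correlationId, out var cts)) cts.Dispose();
    }
}
```
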
---
## Streaming
### Buffered vs Streaming
| Mode | Request Body | Response Body | Use Case |
|------|--------------|---------------|----------|
| Buffered | Full in memory | Full in memory | Small payloads |
| Streaming | Chunked frames | Chunked frames | Large payloads |
### Frame Flow (Streaming)
```
Gateway                             Microservice
   │                                      │
   │ REQUEST (headers only)               │
   │ ────────────────────────────────────►│
   │                                      │
   │ REQUEST_STREAM_DATA (chunk 1)        │
   │ ────────────────────────────────────►│
   │                                      │
   │ REQUEST_STREAM_DATA (chunk n)        │
   │ ────────────────────────────────────►│
   │                                      │
   │ REQUEST_STREAM_DATA (final=true)     │
   │ ────────────────────────────────────►│
   │                                      │
   │ RESPONSE                             │
   │◄─────────────────────────────────────│
   │                                      │
   │ RESPONSE_STREAM_DATA                 │
   │◄─────────────────────────────────────│
```
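
The following sketches how a sender might turn a body stream into the chunk sequence shown above. `StreamChunk` and `BodyChunker` are hypothetical. The document does not specify the chunk size or whether the `final=true` frame carries data, so this version emits data chunks followed by an empty terminating chunk.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical body of a REQUEST_STREAM_DATA / RESPONSE_STREAM_DATA frame.
public sealed record StreamChunk(int Sequence, byte[] Data, bool Final);

public static class BodyChunker
{
    // Yields data chunks of at most chunkSize bytes, then one empty chunk with Final = true.
    public static IEnumerable<StreamChunk> Split(Stream body, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var buffer = new byte[chunkSize];
        var sequence = 0;
        int read;
        while ((read = body.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Copy out of the shared buffer so each chunk owns its bytes.
            yield return new StreamChunk(sequence++, buffer.AsSpan(0, read).ToArray(), Final: false);
        }
        yield return new StreamChunk(sequence, Array.Empty<byte>(), Final: true);
    }
}
```

Because the chunks are produced lazily, only one chunk of the body needs to be in memory at a time. That memory bound is the point of the streaming mode.
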
---
## Heartbeat & Health
### Heartbeat Frame
Sent at regular intervals over the same connection as requests:
```csharp
public sealed class HeartbeatPayload
{
    public required InstanceHealthStatus Status { get; init; }
    public int InflightRequests { get; init; }
    public double ErrorRate { get; init; }
}
```
### Health Tracking
The Gateway:
- Tracks `LastHeartbeatUtc` per connection
- Derives status from heartbeat recency
- Marks stale instances as `Unhealthy`
- Uses health in routing decisions
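
Heartbeat-recency tracking can be sketched as shown below. `HealthEvaluator`, the staleness threshold, returning `Unknown` before the first heartbeat, and the moving-average smoothing factor are all assumptions. The document only states that stale instances are marked `Unhealthy` and that `AveragePingMs` is an average round-trip latency.

```csharp
using System;

// Mirrors the document's InstanceHealthStatus enum.
public enum InstanceHealthStatus { Unknown, Healthy, Degraded, Draining, Unhealthy }

public static class HealthEvaluator
{
    // Trust the instance's self-reported status only while heartbeats are recent.
    public static InstanceHealthStatus Effective(
        InstanceHealthStatus reported, DateTime? lastHeartbeatUtc, DateTime nowUtc, TimeSpan staleAfter)
    {
        if (lastHeartbeatUtc is null) return InstanceHealthStatus.Unknown; // no heartbeat yet
        return nowUtc - lastHeartbeatUtc.Value > staleAfter ? InstanceHealthStatus.Unhealthy : reported;
    }

    // Only Healthy and Degraded instances are eligible for routing.
    public static bool IsRoutable(InstanceHealthStatus status) =>
        status is InstanceHealthStatus.Healthy or InstanceHealthStatus.Degraded;

    // Exponentially weighted moving average; alpha = 0.2 is an assumed smoothing factor.
    public static double UpdateAveragePing(double averageMs, double sampleMs, double alpha = 0.2) =>
        averageMs <= 0 ? sampleMs : averageMs + alpha * (sampleMs - averageMs);
}
```
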
---
## Configuration
### Router YAML
```yaml
# router.yaml
Gateway:
  Region: "us-east-1"
  NodeId: "gw-east-01"
  Environment: "production"
PayloadLimits:
  MaxRequestBytesPerCall: 10485760          # 10 MB
  MaxRequestBytesPerConnection: 104857600   # 100 MB
  MaxAggregateInflightBytes: 1073741824     # 1 GB
Services:
  - ServiceName: billing
    DefaultVersion: "1.0.0"
    DefaultTransport: Tcp
    Endpoints:
      - Method: POST
        Path: /invoices
        TimeoutSeconds: 30
        RequiringClaims:
          - Type: "invoices:write"
OpenApi:
  Title: "StellaOps Gateway API"
  CacheTtlSeconds: 60
```
### Hot Reload
- YAML changes picked up at runtime
- Routing state updated without restart
- New services/endpoints added dynamically
---
## Error Mapping
| Condition | HTTP Status |
|-----------|-------------|
| Version not found | 404 Not Found |
| No healthy instance | 503 Service Unavailable |
| Request timeout | 504 Gateway Timeout |
| Payload too large | 413 Payload Too Large |
| Not authenticated | 401 Unauthorized |
| Missing claims | 403 Forbidden |
| Validation error | 422 Unprocessable Entity |
| Internal error | 500 Internal Server Error |
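
The table maps directly onto a single switch expression. `RouteFailure` and `ErrorMapping` are hypothetical names; the conditions and status codes are exactly those in the table.

```csharp
// Hypothetical enumeration of the routing failure conditions listed in the table.
public enum RouteFailure
{
    VersionNotFound, NoHealthyInstance, RequestTimeout, PayloadTooLarge,
    Unauthenticated, MissingClaims, ValidationError, InternalError
}

public static class ErrorMapping
{
    public static int ToHttpStatus(RouteFailure failure) => failure switch
    {
        RouteFailure.VersionNotFound => 404,
        RouteFailure.NoHealthyInstance => 503,
        RouteFailure.RequestTimeout => 504,
        RouteFailure.PayloadTooLarge => 413,
        RouteFailure.Unauthenticated => 401,
        RouteFailure.MissingClaims => 403,
        RouteFailure.ValidationError => 422,
        _ => 500, // InternalError and any unexpected value
    };
}
```
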
---
## See Also
- [schema-validation.md](schema-validation.md) - JSON Schema validation
- [openapi-aggregation.md](openapi-aggregation.md) - OpenAPI document generation
- [migration-guide.md](migration-guide.md) - WebService to Microservice migration