git.stella-ops.org/docs/router/specs.md
2025-12-02 18:38:32 +02:00

I'll group everything into requirement buckets, but keep it all as requirement statements (no rationale). This is the union of what you asked for or confirmed across the whole thread.


1. Architectural / scope requirements

  • There SHALL be a single HTTP ingress service named StellaOps.Gateway.WebService.

  • Microservices SHALL NOT expose HTTP to the router; all microservice-to-router traffic (control + data) MUST use in-house transports (UDP, TCP, certificate/TLS, RabbitMQ).

  • There SHALL NOT be a separate control-plane service or protocol; each transport connection between a microservice and the router MUST carry:

    • Initial registration (HELLO) and endpoint configuration.
    • Ongoing heartbeats.
    • Endpoint updates (if any).
    • Request/response and streaming data.
  • The router SHALL maintain per-connection endpoint mappings and derive its global routing state from the union of all live connections.

  • The router SHALL treat request and response bodies as opaque (raw bytes / streams); all deserialization and schema handling SHALL be the microservice's responsibility.

  • The system SHALL support both buffered and streaming request/response flows end-to-end.

  • The design MUST reuse only the generic parts of __SerdicaTemplate (dynamic endpoint metadata, attribute-based endpoint discovery, request routing patterns, correlation, connection management) and MUST drop Serdica-specific stack (Oracle schema, domain logic, etc.).

  • The solution MUST be a simpler, generic replacement for the existing Serdica HTTP→RabbitMQ→microservice design.


2. Service identity, region, versioning

  • Each microservice instance SHALL be identified by (ServiceName, Version, Region, InstanceId).

  • Version MUST follow strict semantic versioning (major.minor.patch).

  • Routing MUST be strict on version:

    • The router MUST only route a request to instances whose Version equals the selected version.
    • When a version is not explicitly specified by the client, a default version MUST be used (from config or metadata).
  • Each gateway node SHALL have a static configuration object GatewayNodeConfig containing at least:

    • Region (e.g. "eu1").
    • NodeId (e.g. "gw-eu1-01").
    • Environment (e.g. "prod").
  • Routing decisions MUST use GatewayNodeConfig.Region as the node's region; the router MUST NOT derive region from HTTP headers or URL host names.

  • DNS/host naming conventions SHOULD express region in the domain (e.g. eu1.global.stella-ops.org, mainoffice.contoso.stella-ops.org), but routing logic MUST be driven by GatewayNodeConfig.Region rather than by host parsing.
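
The identity tuple and strict version rule above could be modeled roughly as follows. This is a minimal sketch, not the project's actual types: the record shapes, the `VersionRules` helper, and the choice to reject pre-release/build suffixes are all assumptions made for illustration.

```csharp
using System;
using System.Globalization;

// Static per-node settings; field names follow the requirement text.
public sealed record GatewayNodeConfig(string Region, string NodeId, string Environment);

// Identity tuple for one microservice instance.
public sealed record InstanceDescriptor(string ServiceName, Version Version, string Region, string InstanceId);

// Hypothetical helper: strict "major.minor.patch" parsing and equality-based matching.
public static class VersionRules
{
    // Accepts exactly three dot-separated, digit-only components (assumption: no "-beta" or "+build" suffixes).
    public static bool TryParseStrict(string text, out Version version)
    {
        version = new Version(0, 0, 0);
        var parts = text.Split('.');
        if (parts.Length != 3) return false;

        var numbers = new int[3];
        for (var i = 0; i < 3; i++)
        {
            if (!int.TryParse(parts[i], NumberStyles.None, CultureInfo.InvariantCulture, out numbers[i]))
                return false;
        }

        version = new Version(numbers[0], numbers[1], numbers[2]);
        return true;
    }

    // Strict routing: the instance must carry exactly the selected version, never a "compatible" one.
    public static bool Matches(InstanceDescriptor instance, string serviceName, Version selected) =>
        string.Equals(instance.ServiceName, serviceName, StringComparison.Ordinal)
        && instance.Version.Equals(selected);
}
```

Note that the gateway's region comes only from `GatewayNodeConfig.Region`; nothing in this sketch reads a host name or header.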


3. Endpoint identity and metadata

  • Endpoint identity in the router and microservices MUST be HTTP Method + Path, for example:

    • Method: one of GET, POST, PUT, PATCH, DELETE.
    • Path: e.g. /section/get/{id}.
  • The router and microservices MUST use the same path template syntax and matching rules (e.g. ASP.NET-style route templates), including decisions on:

    • Case sensitivity.
    • Trailing slash handling.
    • Parameter segments (e.g. {id}).
  • The router MUST resolve an incoming HTTP (Method, Path) to a logical endpoint descriptor that includes:

    • ServiceName.
    • Version.
    • Method.
    • Path.
    • DefaultTimeout.
    • RequiringClaims: a list of claim requirements.
    • A flag indicating whether the endpoint supports streaming.
  • Every place that previously spoke about AllowedRoles MUST be replaced with RequiringClaims:

    • Each requirement MUST at minimum contain a Type and MAY contain a Value.
  • Endpoints MUST support being configured with default RequiringClaims in microservices, with the possibility of external override (see Authority section).
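
As an illustration of the descriptor and of one possible set of matching decisions, the sketch below assumes case-insensitive literal segments, ignored trailing slashes, and `{name}` parameter segments that match any single non-empty segment. Those choices, and the `PathTemplate` helper itself, are assumptions; the requirement only says the decisions must be made once and shared by router and SDK.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// A claim requirement has a Type and an optional Value.
public sealed record ClaimRequirement(string Type, string? Value = null);

// Logical endpoint descriptor with the fields listed in the requirement.
public sealed record EndpointDescriptor(
    string ServiceName,
    Version Version,
    string Method,
    string Path,
    TimeSpan DefaultTimeout,
    IReadOnlyList<ClaimRequirement> RequiringClaims,
    bool SupportsStreaming);

// Hypothetical matcher shared by router and SDK so both sides agree on the rules.
public static class PathTemplate
{
    public static bool IsMatch(string template, string path)
    {
        // Trailing (and leading) slashes are ignored in this sketch.
        var templateSegments = template.Trim('/').Split('/');
        var pathSegments = path.Trim('/').Split('/');
        if (templateSegments.Length != pathSegments.Length) return false;

        for (var i = 0; i < templateSegments.Length; i++)
        {
            var segment = templateSegments[i];
            var isParameter = segment.Length > 2 && segment[0] == '{' && segment[^1] == '}';
            if (isParameter)
            {
                // A parameter matches any single non-empty segment.
                if (pathSegments[i].Length == 0) return false;
            }
            else if (!string.Equals(segment, pathSegments[i], StringComparison.OrdinalIgnoreCase))
            {
                return false;
            }
        }

        return true;
    }
}
```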


4. Routing algorithm / instance selection

  • Given a resolved endpoint (ServiceName, Version, Method, Path), the router MUST:

    • Filter candidate instances by:

      • Matching ServiceName.
      • Matching Version (strict semver equality).
      • Health in an acceptable set (e.g. Healthy or Degraded).
  • Instances MUST have health metadata:

    • Status ∈ {Unknown, Healthy, Degraded, Draining, Unhealthy}.
    • LastHeartbeatUtc.
    • AveragePingMs.
  • The router's instance selection MUST obey these rules:

    • Region:

      • Prefer instances whose Region == GatewayNodeConfig.Region.
      • If none, fall back to configured neighbor regions.
      • If none, fall back to all other regions.
    • Within a chosen region tier:

      • Prefer lower AveragePingMs.
      • If several are tied, prefer more recent LastHeartbeatUtc.
      • If still tied, use a balancing strategy (e.g. random or round-robin).
  • The router MUST support a strict fallback order as requested:

    • Prefer “closest by region and heartbeat and ping.”

    • If having to choose between worse candidates, fall back in order of:

      • Greater ping (latency).
      • Greater heartbeat age.
      • Less preferred region tier.
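
The selection rules above can be sketched as a tiered filter. This is an illustrative reading of the requirement, not the router's implementation: `Candidate` and `InstanceSelector` are hypothetical names, the acceptable health set is taken to be Healthy or Degraded, and random choice is used as the final tie-breaker.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public enum InstanceHealthStatus { Unknown, Healthy, Degraded, Draining, Unhealthy }

// Hypothetical view of one instance that already matches ServiceName and Version.
public sealed record Candidate(
    string InstanceId,
    string Region,
    InstanceHealthStatus Status,
    DateTime LastHeartbeatUtc,
    double AveragePingMs);

public static class InstanceSelector
{
    public static Candidate? Select(
        IEnumerable<Candidate> candidates,
        string localRegion,
        IReadOnlyCollection<string> neighborRegions,
        Random random)
    {
        // 1. Keep only instances in the acceptable health set.
        var eligible = candidates
            .Where(c => c.Status is InstanceHealthStatus.Healthy or InstanceHealthStatus.Degraded)
            .ToList();
        if (eligible.Count == 0) return null;

        // 2. Region tier: 0 = local region, 1 = configured neighbor, 2 = any other region.
        int Tier(Candidate c) => c.Region == localRegion ? 0 : neighborRegions.Contains(c.Region) ? 1 : 2;
        var bestTier = eligible.Min(c => Tier(c));
        var inTier = eligible.Where(c => Tier(c) == bestTier).ToList();

        // 3. Within the tier: lowest ping, then most recent heartbeat.
        var bestPing = inTier.Min(c => c.AveragePingMs);
        var byPing = inTier.Where(c => c.AveragePingMs == bestPing).ToList();
        var newest = byPing.Max(c => c.LastHeartbeatUtc);
        var tied = byPing.Where(c => c.LastHeartbeatUtc == newest).ToList();

        // 4. Still tied: balance randomly (round-robin would also satisfy the requirement).
        return tied[random.Next(tied.Count)];
    }
}
```

Exact equality on `AveragePingMs` makes ties rare in practice; a real implementation might bucket latencies (for example to the nearest millisecond) before comparing.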

5. Transport plugin requirements

  • There MUST be a transport plugin abstraction representing how the router and microservices communicate.

  • The default transport type MUST be UDP.

  • Additional supported transport types MUST include:

    • TCP.
    • Certificate-based TCP (TLS / mTLS).
    • RabbitMQ.
  • There MUST NOT be an HTTP transport plugin; HTTP MUST NOT be used for microservice-to-router communications (control or data).

  • Each transport plugin MUST support:

    • Establishing logical connections between microservices and the router.
    • Sending/receiving HELLO (registration), HEARTBEAT, optional ENDPOINTS_UPDATE.
    • Sending/receiving REQUEST/RESPONSE frames.
    • Supporting streaming via REQUEST_STREAM_DATA / RESPONSE_STREAM_DATA frames where the transport allows it.
    • Sending/receiving CANCEL frames to abort specific in-flight requests.
  • UDP transport:

    • MUST be used only for small/bounded payloads (no unbounded streaming).
    • MUST respect configured MaxRequestBytesPerCall.
  • TCP and Certificate transports:

    • MUST implement a length-prefixed framing protocol capable of multiplexing frames for multiple correlation IDs.
    • Certificate transport MUST enforce TLS and support optional mutual TLS (verifiable peer identity).
  • RabbitMQ:

    • MUST implement queue/exchange naming and routing keys sufficient to represent logical connections and correlation IDs.
    • MUST use message properties (e.g. CorrelationId) for request/response matching.
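
For the TCP and Certificate transports, a length-prefixed frame might look like the sketch below. The wire layout (4-byte big-endian length, 1-byte frame type, 16-byte correlation ID, then payload), the numeric `FrameType` values, and the `FrameCodec` name are assumptions for illustration; the requirements leave the exact framing to be documented.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Frame kinds named in the requirements; the numeric values are assumed.
public enum FrameType : byte
{
    Hello = 1, Heartbeat = 2, EndpointsUpdate = 3, Request = 4, Response = 5,
    RequestStreamData = 6, ResponseStreamData = 7, Cancel = 8
}

public sealed record Frame(FrameType Type, Guid CorrelationId, byte[] Payload);

public static class FrameCodec
{
    private const int HeaderSize = 1 + 16; // type byte + correlation ID

    public static void Write(Stream stream, Frame frame)
    {
        // Length prefix counts everything after the prefix itself.
        Span<byte> prefix = stackalloc byte[4];
        BinaryPrimitives.WriteInt32BigEndian(prefix, HeaderSize + frame.Payload.Length);
        stream.Write(prefix);
        stream.WriteByte((byte)frame.Type);
        stream.Write(frame.CorrelationId.ToByteArray());
        stream.Write(frame.Payload);
    }

    public static Frame Read(Stream stream, int maxFrameBytes)
    {
        var length = BinaryPrimitives.ReadInt32BigEndian(ReadExactly(stream, 4));
        if (length < HeaderSize || length > maxFrameBytes)
            throw new InvalidDataException($"Frame length {length} is out of range.");

        var body = ReadExactly(stream, length);
        var correlationId = new Guid(body.AsSpan(1, 16));
        return new Frame((FrameType)body[0], correlationId, body.AsSpan(HeaderSize).ToArray());
    }

    // Stream.Read may return fewer bytes than requested, so loop until the frame is complete.
    private static byte[] ReadExactly(Stream stream, int count)
    {
        var buffer = new byte[count];
        var offset = 0;
        while (offset < count)
        {
            var read = stream.Read(buffer, offset, count - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }
        return buffer;
    }
}
```

Because every frame carries its own correlation ID, frames for different in-flight requests can be interleaved on one connection and demultiplexed by the reader.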

6. Gateway (StellaOps.Gateway.WebService) requirements

6.1 HTTP ingress pipeline

  • The gateway MUST host an ASP.NET Core HTTP server.

  • The HTTP middleware pipeline MUST include at least:

    • Forwarded headers handling (when behind reverse proxy).
    • Request logging (e.g. via Serilog) including correlation ID, service, endpoint, region, instance.
    • Global error-handling middleware.
    • Authentication middleware.
    • EndpointResolutionMiddleware to resolve (Method, Path) → endpoint.
    • Authorization middleware that enforces RequiringClaims.
    • RoutingDecisionMiddleware to choose connection/instance/transport.
    • TransportDispatchMiddleware to carry out buffered or streaming dispatch.
  • The gateway MUST read Method and Path from the HTTP request and use them to resolve endpoints.

6.2 Per-connection state and routing view

  • The gateway MUST maintain a ConnectionState per logical connection that includes:

    • ConnectionId.
    • InstanceDescriptor (InstanceId, ServiceName, Version, Region).
    • Status, LastHeartbeatUtc, AveragePingMs.
    • The set of endpoints that this connection serves ((Method, Path) → EndpointDescriptor).
    • The transport type for that connection.
  • The gateway MUST maintain a global routing state (IGlobalRoutingState) that:

    • Resolves (Method, Path) to an EndpointDescriptor (service, version, metadata).
    • Provides the set of ConnectionState objects that can handle a given (ServiceName, Version, Method, Path).
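
One way to picture "global routing state derived from the union of live connections" is sketched below. `RoutingTable` and `EndpointKey` are hypothetical names (the real abstraction is `IGlobalRoutingState`, whose members are not specified here), health fields are omitted, and lookup uses exact (Method, Path) keys rather than template matching to keep the example short.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// Endpoint identity: records give value equality, so keys can live in sets.
public sealed record EndpointKey(string Method, string Path);

// Trimmed-down per-connection state (assumption: Version kept as a string here).
public sealed record ConnectionState(
    string ConnectionId,
    string ServiceName,
    string Version,
    string Region,
    IReadOnlySet<EndpointKey> Endpoints);

public sealed class RoutingTable
{
    private readonly ConcurrentDictionary<string, ConnectionState> _live = new();

    // Called when a HELLO (or endpoint update) arrives on a connection.
    public void OnHello(ConnectionState connection) => _live[connection.ConnectionId] = connection;

    // Called when the transport reports the connection closed.
    public void OnDisconnected(string connectionId) => _live.TryRemove(connectionId, out _);

    // The global view is computed from whatever connections are live right now.
    public IReadOnlyList<ConnectionState> ConnectionsFor(string serviceName, string version, EndpointKey endpoint) =>
        _live.Values
            .Where(c => c.ServiceName == serviceName && c.Version == version && c.Endpoints.Contains(endpoint))
            .ToList();
}
```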

6.3 Buffered vs streaming dispatch

  • The gateway MUST support:

    • Buffered mode for small to medium payloads:

      • Read the entire HTTP body into memory (or temp file when above a threshold).
      • Send as a single REQUEST payload.
    • Streaming mode for large or unknown content:

      • Streaming from HTTP body to microservice via a sequence of REQUEST_STREAM_DATA frames.
      • Streaming from microservice back to HTTP via RESPONSE_STREAM_DATA frames.
  • For each endpoint, the gateway MUST know whether it can use streaming or must use buffered mode (SupportsStreaming flag).

6.4 Opaque body handling

  • The gateway MUST treat request and response bodies as opaque byte sequences and MUST NOT attempt to deserialize or interpret payload contents.
  • The gateway MUST forward headers and body bytes as given and leave any schema, JSON, or other decoding to the microservice.

6.5 Payload and memory protection

  • The gateway MUST enforce configured payload limits:

    • MaxRequestBytesPerCall.
    • MaxRequestBytesPerConnection.
    • MaxAggregateInflightBytes.
  • If Content-Length is known and exceeds MaxRequestBytesPerCall, the gateway MUST reject the request early (e.g. HTTP 413 Payload Too Large).

  • During streaming, the gateway MUST maintain counters of:

    • Bytes read for this request.
    • Bytes for this connection.
    • Total in-flight bytes across all requests.
  • If any limit is exceeded mid-stream, the gateway MUST:

    • Stop reading the HTTP body.
    • Send a CANCEL frame for that correlation ID.
    • Abort the stream to the microservice.
    • Return an appropriate error to the client (e.g. 413 or 503) and log the incident.
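
The three counters could be tracked as in the following sketch. `ByteBudget` and `RequestByteTracker` are hypothetical helpers, and the sketch assumes that the per-connection and aggregate limits apply to bytes currently in flight (released when the request completes); the requirement text does not pin that interpretation down.

```csharp
using System.Threading;

// A shared, thread-safe counter with an upper bound (one per connection, one global).
public sealed class ByteBudget
{
    private long _used;

    public ByteBudget(long limit) => Limit = limit;

    public long Limit { get; }
    public long Used => Interlocked.Read(ref _used);

    public bool TryReserve(long bytes)
    {
        if (Interlocked.Add(ref _used, bytes) <= Limit) return true;
        Interlocked.Add(ref _used, -bytes); // roll back the failed reservation
        return false;
    }

    public void Release(long bytes) => Interlocked.Add(ref _used, -bytes);
}

// Per-request tracker; intended to be used by the single reader of one HTTP body.
public sealed class RequestByteTracker
{
    private readonly long _maxRequestBytesPerCall;
    private readonly ByteBudget _connection;
    private readonly ByteBudget _aggregate;
    private long _bytesRead;

    public RequestByteTracker(long maxRequestBytesPerCall, ByteBudget connection, ByteBudget aggregate)
    {
        _maxRequestBytesPerCall = maxRequestBytesPerCall;
        _connection = connection;
        _aggregate = aggregate;
    }

    public long BytesRead => _bytesRead;

    // Returns false when any limit would be exceeded; the caller then stops reading and sends CANCEL.
    public bool TryAdd(long bytes)
    {
        if (_bytesRead + bytes > _maxRequestBytesPerCall) return false;
        if (!_connection.TryReserve(bytes)) return false;
        if (!_aggregate.TryReserve(bytes))
        {
            _connection.Release(bytes);
            return false;
        }
        _bytesRead += bytes;
        return true;
    }

    // Gives the reserved bytes back once the request finishes or is aborted.
    public void Complete()
    {
        _connection.Release(_bytesRead);
        _aggregate.Release(_bytesRead);
        _bytesRead = 0;
    }
}
```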

7. Microservice SDK (__Libraries/StellaOps.Microservice) requirements

7.1 Identity & router connections

  • StellaMicroserviceOptions MUST let microservices configure:

    • ServiceName.
    • Version.
    • Region.
    • InstanceId.
    • A list of router endpoints (Routers / router pool) including host, port, and transport type for each.
    • Optional path to a YAML config file for endpoint-level overrides.
  • Providing the router pool (Routers / HTTP servers pool) MUST be mandatory; a microservice cannot start without at least one configured router endpoint.

  • The router pool SHOULD be configurable via code and MAY optionally be configured via YAML with hot-reload (causing reconnections if changed).

7.2 Endpoint definition & discovery

  • Microservice endpoints MUST be declared using attributes that specify (Method, Path):

    [StellaEndpoint("POST", "/billing/invoices")]
    public sealed class CreateInvoiceEndpoint : ...
    
  • The SDK MUST support two handler shapes:

    • Raw handler:

      • IRawStellaEndpoint taking a RawRequestContext and returning a RawResponse, where:

        • RawRequestContext.Body is a stream (may be buffered or streaming).
        • Body contents are raw bytes.
    • Typed handlers:

      • IStellaEndpoint<TRequest, TResponse> which takes a typed request and returns a typed response.
      • IStellaEndpoint<TResponse> which has no request payload and returns a typed response.
  • The SDK MUST adapt typed endpoints to the raw model internally (microservice-side only), leaving the router unaware of types.

  • Endpoint discovery MUST work by:

    • Runtime reflection: scanning assemblies for [StellaEndpoint] and handler interfaces.

    • Build-time reflection via source generation:

      • A Roslyn source generator MUST generate a descriptor list at build time.
      • At runtime, the SDK MUST prefer source-generated metadata and only fall back to reflection if generation is not available.
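
The runtime-reflection fallback might be sketched like this. The attribute's constructor shape is inferred from the `[StellaEndpoint("POST", "/billing/invoices")]` usage above; `DiscoveredEndpoint` and `EndpointDiscovery` are hypothetical names, and this sketch does not verify that the type also implements one of the handler interfaces.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Assumed shape of the attribute, based on its usage in the requirement text.
[AttributeUsage(AttributeTargets.Class, AllowMultiple = false, Inherited = false)]
public sealed class StellaEndpointAttribute : Attribute
{
    public StellaEndpointAttribute(string method, string path)
    {
        Method = method;
        Path = path;
    }

    public string Method { get; }
    public string Path { get; }
}

public sealed record DiscoveredEndpoint(string Method, string Path, Type HandlerType);

public static class EndpointDiscovery
{
    // Scans one assembly for concrete classes carrying [StellaEndpoint].
    public static IReadOnlyList<DiscoveredEndpoint> Scan(Assembly assembly) =>
        assembly.GetTypes()
            .Where(t => t.IsClass && !t.IsAbstract)
            .Select(t => (Handler: t, Meta: t.GetCustomAttribute<StellaEndpointAttribute>()))
            .Where(x => x.Meta is not null)
            .Select(x => new DiscoveredEndpoint(x.Meta!.Method.ToUpperInvariant(), x.Meta.Path, x.Handler))
            .OrderBy(e => e.Path, StringComparer.Ordinal)
            .ToList();
}

// Example handler type; a real one would also implement a handler interface.
[StellaEndpoint("POST", "/billing/invoices")]
public sealed class CreateInvoiceEndpoint
{
}
```

A source generator would emit the same `DiscoveredEndpoint` list at build time, which lets the SDK skip the assembly scan at startup.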

7.3 Endpoint metadata defaults & overrides

  • Microservices MUST be able to provide default endpoint metadata:

    • SupportsStreaming flag.
    • Default timeout.
    • Default RequiringClaims.
  • Microservice-local YAML MUST be allowed to override or refine these defaults per endpoint, keyed by (Method, Path).

  • Precedence rules MUST be clearly defined and honored:

    • Service identity & router pool: from StellaMicroserviceOptions (not YAML).
    • Endpoint set: from code (attributes/source gen); YAML MAY override properties but ideally not create endpoints not present in code (policy decision to be documented).
    • RequiringClaims and timeouts: YAML overrides defaults from code, unless overridden by central Authority.

7.4 Connection behavior

  • On establishing a connection to a router endpoint, the SDK MUST:

    • Immediately send a HELLO frame containing:

      • ServiceName, Version, Region, InstanceId.
      • The list of endpoints (Method, Path) with their metadata (SupportsStreaming, default timeouts, default RequiringClaims).
  • At regular intervals, the SDK MUST send HEARTBEAT frames on each connection indicating:

    • Instance health status.
    • Optional metrics (e.g. in-flight request count, error rate).
  • The SDK SHOULD support optional ENDPOINTS_UPDATE (or a re-HELLO) to update endpoint metadata at runtime if needed.

7.5 Request handling & streaming

  • For each incoming REQUEST frame:

    • The SDK MUST create a RawRequestContext with:

      • Method.

      • Path.

      • Headers.

      • A Body stream that either:

        • Wraps a buffered byte array.
        • Or exposes streaming reads from subsequent REQUEST_STREAM_DATA frames.
      • A CancellationToken that will be cancelled when the router sends a CANCEL frame or the connection fails.

  • The SDK MUST resolve the correct endpoint handler by (Method, Path) using the same path template rules as the router.

  • For streaming endpoints, handlers MUST be able to read from RawRequestContext.Body incrementally and obey the CancellationToken.

7.6 Cancellation handling (microservice side)

  • The SDK MUST maintain a map of in-flight requests by correlation ID, each containing:

    • A CancellationTokenSource.
    • The task executing the handler.
  • Upon receiving a CANCEL frame for a given correlation ID, the SDK MUST:

    • Look up the corresponding entry and call CancellationTokenSource.Cancel().
  • Handlers (both raw and typed) MUST receive a CancellationToken:

    • They MUST observe the token and be coded to cancel promptly where needed.
    • They MUST pass the token to downstream I/O operations (DB calls, file I/O, network).
  • If the transport connection is closed, the SDK MUST treat it as a cancellation trigger for all outstanding requests on that connection and cancel their tokens.
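
A minimal sketch of the in-flight map follows. `InflightRequests` is a hypothetical name, and the handler is reduced to a `Func<CancellationToken, Task>` so the example stays self-contained; the real SDK would wrap raw or typed endpoint handlers.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class InflightRequests
{
    // One entry per correlation ID: the token source plus the task running the handler.
    private sealed class Entry
    {
        public Entry(CancellationTokenSource cts) => Cts = cts;
        public CancellationTokenSource Cts { get; }
        public Task Handler { get; set; } = Task.CompletedTask;
    }

    private readonly ConcurrentDictionary<Guid, Entry> _entries = new();

    public int Count => _entries.Count;

    public Task Start(Guid correlationId, Func<CancellationToken, Task> handler)
    {
        var entry = new Entry(new CancellationTokenSource());
        if (!_entries.TryAdd(correlationId, entry))
        {
            entry.Cts.Dispose();
            throw new InvalidOperationException("Duplicate correlation ID.");
        }

        entry.Handler = RunAsync(correlationId, entry, handler);
        return entry.Handler;
    }

    // CANCEL frame received for one request.
    public bool Cancel(Guid correlationId)
    {
        if (!_entries.TryGetValue(correlationId, out var entry)) return false;
        try
        {
            entry.Cts.Cancel();
            return true;
        }
        catch (ObjectDisposedException)
        {
            return false; // the handler finished at the same moment
        }
    }

    // Transport connection closed: cancel everything still running on it.
    public void CancelAll()
    {
        foreach (var correlationId in _entries.Keys) Cancel(correlationId);
    }

    private async Task RunAsync(Guid correlationId, Entry entry, Func<CancellationToken, Task> handler)
    {
        try
        {
            await handler(entry.Cts.Token).ConfigureAwait(false);
        }
        finally
        {
            _entries.TryRemove(correlationId, out _);
            entry.Cts.Dispose();
        }
    }
}
```

A handler that passes its token to downstream calls (for example `Task.Delay(..., token)` or a database command) stops promptly once `Cancel` or `CancelAll` is invoked.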


8. Control / health / ping requirements

  • Heartbeats MUST be sent over the same connection as requests (no separate control channel).

  • The router MUST:

    • Track LastHeartbeatUtc for each connection.
    • Derive InstanceHealthStatus based on heartbeat recency and optionally metrics.
    • Drop or mark as Unhealthy any instances whose heartbeats are stale past configured thresholds.
  • The router SHOULD measure network latency (ping) and update AveragePingMs for each connection by either:

    • Timing request-response round trips, or
    • Using explicit ping frames.
  • The router MUST use heartbeat and ping metrics in its routing decision as described above.
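
Heartbeat staleness and ping averaging could be tracked per connection roughly as below. The two-threshold scheme (degraded, then unhealthy), the exponentially weighted moving average, and the `ConnectionHealth` name are assumptions chosen for illustration; the requirement only asks for health derived from heartbeat recency and an `AveragePingMs` value.

```csharp
using System;

public enum InstanceHealthStatus { Unknown, Healthy, Degraded, Draining, Unhealthy }

public sealed class ConnectionHealth
{
    private bool _hasPing;

    public DateTime? LastHeartbeatUtc { get; private set; }
    public double AveragePingMs { get; private set; }

    public void RecordHeartbeat(DateTime utcNow) => LastHeartbeatUtc = utcNow;

    // Exponentially weighted moving average; alpha controls how fast old samples fade.
    public void RecordRoundTrip(TimeSpan roundTrip, double alpha = 0.2)
    {
        var sample = roundTrip.TotalMilliseconds;
        AveragePingMs = _hasPing ? alpha * sample + (1 - alpha) * AveragePingMs : sample;
        _hasPing = true;
    }

    // Status derived purely from heartbeat age in this sketch.
    public InstanceHealthStatus Evaluate(DateTime utcNow, TimeSpan degradedAfter, TimeSpan unhealthyAfter)
    {
        if (LastHeartbeatUtc is not DateTime last) return InstanceHealthStatus.Unknown;

        var age = utcNow - last;
        if (age >= unhealthyAfter) return InstanceHealthStatus.Unhealthy;
        if (age >= degradedAfter) return InstanceHealthStatus.Degraded;
        return InstanceHealthStatus.Healthy;
    }
}
```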


9. Authorization / requiringClaims / Authority requirements

  • RequiringClaims MUST be the only authorization metadata field; AllowedRoles MUST NOT be used.

  • Every endpoint MUST be able to specify:

    • An empty RequiringClaims list (no additional claims required beyond authenticated).
    • Or one or more ClaimRequirement objects (Type + optional Value).
  • The gateway MUST enforce RequiringClaims per request:

    • Authorization MUST check that the request's user principal has all required claims for the endpoint.
  • Microservices MUST provide default RequiringClaims as part of their HELLO metadata.

  • There MUST be a mechanism for an external Authority service to override RequiringClaims centrally:

    • Defaults MUST come from microservices.
    • Authority MUST be able to push or supply overrides that the gateway applies at startup and/or at runtime.
    • The gateway MUST proactively request such overrides on startup (e.g. via a special message or mechanism) before handling traffic, or as early as practical.
  • Final, effective RequiringClaims enforced at the gateway MUST be derived from microservice defaults plus Authority overrides, with Authority taking precedence where applicable.
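
The enforcement and precedence rules above might reduce to something like the following. `ClaimsAuthorization` is a hypothetical helper, and the sketch assumes an Authority override replaces the microservice defaults for an endpoint wholesale; merging strategies are not specified by the requirement.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Claims;

public sealed record ClaimRequirement(string Type, string? Value = null);

public static class ClaimsAuthorization
{
    // Authority wins when it supplies an override; otherwise the HELLO defaults apply.
    public static IReadOnlyList<ClaimRequirement> Effective(
        IReadOnlyList<ClaimRequirement> microserviceDefaults,
        IReadOnlyList<ClaimRequirement>? authorityOverride) =>
        authorityOverride ?? microserviceDefaults;

    // The principal must be authenticated and satisfy every requirement.
    public static bool IsAuthorized(ClaimsPrincipal user, IReadOnlyList<ClaimRequirement> required)
    {
        if (user.Identity?.IsAuthenticated != true) return false;

        return required.All(r => r.Value is null
            // Type-only requirement: any value of that claim type is enough.
            ? user.HasClaim(c => string.Equals(c.Type, r.Type, StringComparison.OrdinalIgnoreCase))
            // Type + Value requirement: both must match.
            : user.HasClaim(r.Type, r.Value));
    }
}
```

An empty requirement list therefore means "authenticated is enough", matching the first option in the list above.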


10. Cancellation requirements (router side)

  • The protocol MUST define a FrameType.Cancel with:

    • A CorrelationId indicating which request to cancel.
    • An optional payload containing a reason code (e.g. "ClientDisconnected", "Timeout", "PayloadLimitExceeded").
  • The router MUST send CANCEL frames when:

    • The HTTP client disconnects (ASP.NET HttpContext.RequestAborted fires) while the request is in progress.
    • The router's effective timeout for the request elapses, and no response has been received.
    • The router detects payload/memory limit breaches and has to abort the request.
    • The router is shutting down and explicitly aborts in-flight requests (if implemented).
  • The router MUST:

    • Stop forwarding any additional REQUEST_STREAM_DATA to the microservice once a CANCEL is sent.

    • Stop reading any remaining response frames for that correlation and either:

      • Discard them.
      • Or treat them as late, log them, and ignore them.
  • For streaming responses, if the HTTP client disconnects or router cancels:

    • The router MUST stop writing to the HTTP response and treat any subsequent frames as ignored.
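
The first two CANCEL triggers (client disconnect and timeout) can be combined with a linked token source, as in this sketch. It assumes .NET 6 or later for `Task.WaitAsync`; `DispatchWithCancel` and the `sendCancel` callback are hypothetical stand-ins for the transport call that emits the CANCEL frame, and the reason strings are the examples given above.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

public static class DispatchWithCancel
{
    // Returns null when the response arrived, otherwise the reason sent in the CANCEL frame.
    public static async Task<string?> AwaitResponseAsync(
        Task response,
        TimeSpan timeout,
        CancellationToken clientAborted, // e.g. HttpContext.RequestAborted in the gateway
        Func<string, Task> sendCancel)
    {
        using var timeoutCts = new CancellationTokenSource(timeout);
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(clientAborted, timeoutCts.Token);

        try
        {
            // Completes when the response arrives, or throws when either trigger fires.
            await response.WaitAsync(linked.Token).ConfigureAwait(false);
            return null;
        }
        catch (OperationCanceledException) when (linked.IsCancellationRequested)
        {
            var reason = clientAborted.IsCancellationRequested ? "ClientDisconnected" : "Timeout";
            await sendCancel(reason).ConfigureAwait(false);
            return reason;
        }
    }
}
```

After this method returns a reason, the caller would stop forwarding REQUEST_STREAM_DATA and ignore any late frames for that correlation ID.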

11. Configuration and YAML requirements

  • __Libraries/StellaOps.Router.Config MUST handle:

    • Binding router config from JSON/appsettings + YAML + environment variables.

    • Static service definitions:

      • ServiceName.
      • DefaultVersion.
      • DefaultTransport.
      • Endpoint list (Method, Path) with default timeouts, requiringClaims, streaming flags.
    • Static instance definitions (optional):

      • ServiceName, Version, Region, supported transports, plugin-specific settings.
    • Global payload limits (PayloadLimits).

  • Router YAML config MUST support hot-reload:

    • Changes SHOULD be picked up at runtime without restarting the gateway.

    • Hot-reload MUST cause in-memory routing state to be updated, including:

      • New or removed services/endpoints.
      • New or removed instances (static).
      • Updated payload limits.
  • Microservice YAML config MUST be optional and used for endpoint-level overrides only, not for identity or router pool configuration.

  • The router pool for microservices MUST be configured via code and MAY be backed by YAML (with hot-plug / reconnection behavior) if desired.


12. Library naming / repo structure requirements

  • The router configuration library MUST be named __Libraries/StellaOps.Router.Config.

  • The microservice SDK library MUST be named __Libraries/StellaOps.Microservice.

  • The gateway webservice MUST be named StellaOps.Gateway.WebService.

  • There MUST be a “common” library for shared types and abstractions (e.g. __Libraries/StellaOps.Router.Common).

  • Documentation files MUST include at least:

    • Stella Ops Router.md (what it is, why, high-level architecture).
    • Stella Ops Router - Webserver.md (how the webservice works).
    • Stella Ops Router - Microservice.md (how the microservice SDK works and is implemented).
    • Stella Ops Router - Common.md (common components and how they are implemented).
    • Migration of Webservices to Microservices.md.
    • Stella Ops Router Documentation.md (doc structure & guidance).

13. Documentation & developer-experience requirements

  • The docs MUST be detailed; “do not spare details” implies:

    • High-fidelity, concrete examples and not hand-wavy descriptions.
  • For average C# developers, documentation MUST cover:

    • Exact .NET / ASP.NET Core target version and runtime baseline.

    • Required NuGet packages (logging, serialization, YAML parsing, RabbitMQ client, etc.).

    • Exact serialization formats for frames and payloads (JSON vs MessagePack vs others).

    • Exact framing rules for each transport (length-prefix for TCP/TLS, datagrams for UDP, exchanges/queues for Rabbit).

    • Concrete sample Program.cs for:

      • A gateway node.
      • A microservice.
    • Example endpoint implementations:

      • Typed (with and without request).
      • Raw streaming endpoints for large payloads.
    • Example router YAML and microservice YAML with realistic values.

    • Error and HTTP status mapping policy:

      • E.g. “version not found → 404 or 400; no instance available → 503; timeout → 504; payload too large → 413.”
    • Guidelines on:

      • When to use UDP vs TCP vs RabbitMQ.
      • How to configure and validate certificates for the certificate transport.
      • How to write cancellation-friendly handlers (proper use of CancellationToken).
      • Testing strategies: local dev setups, integration test harnesses, how to run router + microservice together for tests.
    • Clear explanation of config precedence:

      • Code options vs YAML vs microservice defaults vs Authority for claims.
  • Documentation MUST answer for each major concept:

    • What it is.
    • Why it exists.
    • How it works.
    • How to use it (with examples).
    • What happens when it is misused and how to debug issues.

14. Migration requirements

  • There MUST be a defined migration path from StellaOps.*.WebServices to StellaOps.*.Microservices.

  • Migration documentation MUST cover:

    • Inventorying existing HTTP routes (Method + Path).

    • Strategy A (in-place adaptation):

      • Adding microservice SDK into WebService.
      • Declaring endpoints with [StellaEndpoint].
      • Wrapping existing controller logic in handlers.
      • Connecting to the router and validating registration.
      • Gradually shifting traffic from direct WebService HTTP ingress to gateway routing.
    • Strategy B (split):

      • Extracting domain logic into shared libraries.
      • Creating a dedicated microservice project using the SDK.
      • Mapping routes and handlers.
      • Phasing out or repurposing the original WebService.
    • Ensuring cancellation tokens are wired throughout migrated code.

    • Handling streaming endpoints (large uploads/downloads) via IRawStellaEndpoint and streaming support instead of naive buffered HTTP controllers.


If you want, I can next turn this requirement set into a machine-readable checklist (e.g. JSON or YAML) or derive a first-pass implementation roadmap directly from these requirements.