I’ll group everything into requirement buckets, but keep it all as requirement statements (no rationale). This is the union of what you asked for or confirmed across the whole thread.

---

## 1. Architectural / scope requirements

* There SHALL be a single HTTP ingress service named `StellaOps.Gateway.WebService`.
* Microservices SHALL NOT expose HTTP to the router; all microservice-to-router traffic (control + data) MUST use in-house transports (UDP, TCP, certificate/TLS, RabbitMQ).
* There SHALL NOT be a separate control-plane service or protocol; each transport connection between a microservice and the router MUST carry:
  * Initial registration (HELLO) and endpoint configuration.
  * Ongoing heartbeats.
  * Endpoint updates (if any).
  * Request/response and streaming data.
* The router SHALL maintain per-connection endpoint mappings and derive its global routing state from the union of all live connections.
* The router SHALL treat request and response bodies as opaque (raw bytes / streams); all deserialization and schema handling SHALL be the microservice’s responsibility.
* The system SHALL support both buffered and streaming request/response flows end-to-end.
* The design MUST reuse only the generic parts of `__SerdicaTemplate` (dynamic endpoint metadata, attribute-based endpoint discovery, request routing patterns, correlation, connection management) and MUST drop the Serdica-specific stack (Oracle schema, domain logic, etc.).
* The solution MUST be a simpler, generic replacement for the existing Serdica HTTP→RabbitMQ→microservice design.

---

## 2. Service identity, region, versioning

* Each microservice instance SHALL be identified by `(ServiceName, Version, Region, InstanceId)`.
* `Version` MUST follow strict semantic versioning (`major.minor.patch`).
* Routing MUST be strict on version:
  * The router MUST only route a request to instances whose `Version` equals the selected version.
  * When a version is not explicitly specified by the client, a default version MUST be used (from config or metadata).
* Each gateway node SHALL have a static configuration object `GatewayNodeConfig` containing at least:
  * `Region` (e.g. `"eu1"`).
  * `NodeId` (e.g. `"gw-eu1-01"`).
  * `Environment` (e.g. `"prod"`).
* Routing decisions MUST use `GatewayNodeConfig.Region` as the node’s region; the router MUST NOT derive region from HTTP headers or URL host names.
* DNS/host naming conventions SHOULD express region in the domain (e.g. `eu1.global.stella-ops.org`, `mainoffice.contoso.stella-ops.org`), but routing logic MUST be driven by `GatewayNodeConfig.Region` rather than by host parsing.

---

## 3. Endpoint identity and metadata

* Endpoint identity in the router and microservices MUST be `HTTP Method + Path`, for example:
  * `Method`: one of `GET`, `POST`, `PUT`, `PATCH`, `DELETE`.
  * `Path`: e.g. `/section/get/{id}`.
* The router and microservices MUST use the same path template syntax and matching rules (e.g. ASP.NET-style route templates), including decisions on:
  * Case sensitivity.
  * Trailing slash handling.
  * Parameter segments (e.g. `{id}`).
* The router MUST resolve an incoming HTTP `(Method, Path)` to a logical endpoint descriptor that includes:
  * ServiceName.
  * Version.
  * Method.
  * Path.
  * DefaultTimeout.
  * `RequiringClaims`: a list of claim requirements.
  * A flag indicating whether the endpoint supports streaming.
* Every place that previously referred to `AllowedRoles` MUST be replaced with `RequiringClaims`:
  * Each requirement MUST at minimum contain a `Type` and MAY contain a `Value`.
* Endpoints MUST support being configured with default `RequiringClaims` in microservices, with the possibility of external override (see section 9, Authority).

---

## 4. Routing algorithm / instance selection

* Given a resolved endpoint `(ServiceName, Version, Method, Path)`, the router MUST:
  * Filter candidate instances by:
    * Matching `ServiceName`.
    * Matching `Version` (strict semver equality).
    * Health in an acceptable set (e.g. `Healthy` or `Degraded`).
* Instances MUST have health metadata:
  * `Status` ∈ {`Unknown`, `Healthy`, `Degraded`, `Draining`, `Unhealthy`}.
  * `LastHeartbeatUtc`.
  * `AveragePingMs`.
* The router’s instance selection MUST obey these rules:
  * Region:
    * Prefer instances whose `Region == GatewayNodeConfig.Region`.
    * If none, fall back to configured neighbor regions.
    * If none, fall back to all other regions.
  * Within a chosen region tier:
    * Prefer lower `AveragePingMs`.
    * If several are tied, prefer more recent `LastHeartbeatUtc`.
    * If still tied, use a balancing strategy (e.g. random or round-robin).
* The router MUST support a strict fallback order as requested:
  * Prefer “closest by region and heartbeat and ping.”
  * When only worse candidates are available, fall back in this order:
    * Greater ping (latency).
    * Greater heartbeat age.
    * Less preferred region tier.

---

## 5. Transport plugin requirements

* There MUST be a transport plugin abstraction representing how the router and microservices communicate.
* The default transport type MUST be UDP.
* Additional supported transport types MUST include:
  * TCP.
  * Certificate-based TCP (TLS / mTLS).
  * RabbitMQ.
* There MUST NOT be an HTTP transport plugin; HTTP MUST NOT be used for microservice-to-router communications (control or data).
* Each transport plugin MUST support:
  * Establishing logical connections between microservices and the router.
  * Sending/receiving HELLO (registration), HEARTBEAT, and optional ENDPOINTS_UPDATE frames.
  * Sending/receiving REQUEST/RESPONSE frames.
  * Supporting streaming via REQUEST_STREAM_DATA / RESPONSE_STREAM_DATA frames where the transport allows it.
  * Sending/receiving CANCEL frames to abort specific in-flight requests.
* UDP transport:
  * MUST be used only for small/bounded payloads (no unbounded streaming).
  * MUST respect the configured `MaxRequestBytesPerCall`.
* TCP and Certificate transports:
  * MUST implement a length-prefixed framing protocol capable of multiplexing frames for multiple correlation IDs.
  * The Certificate transport MUST enforce TLS and support optional mutual TLS (verifiable peer identity).
* RabbitMQ:
  * MUST implement queue/exchange naming and routing keys sufficient to represent logical connections and correlation IDs.
  * MUST use message properties (e.g. `CorrelationId`) for request/response matching.

---

## 6. Gateway (`StellaOps.Gateway.WebService`) requirements

### 6.1 HTTP ingress pipeline

* The gateway MUST host an ASP.NET Core HTTP server.
* The HTTP middleware pipeline MUST include at least:
  * Forwarded headers handling (when behind a reverse proxy).
  * Request logging (e.g. via Serilog) including correlation ID, service, endpoint, region, and instance.
  * Global error-handling middleware.
  * Authentication middleware.
  * `EndpointResolutionMiddleware` to resolve `(Method, Path)` → endpoint.
  * Authorization middleware that enforces `RequiringClaims`.
  * `RoutingDecisionMiddleware` to choose connection/instance/transport.
  * `TransportDispatchMiddleware` to carry out buffered or streaming dispatch.
* The gateway MUST read `Method` and `Path` from the HTTP request and use them to resolve endpoints.

### 6.2 Per-connection state and routing view

* The gateway MUST maintain a `ConnectionState` per logical connection that includes:
  * `ConnectionId`.
  * `InstanceDescriptor` (`InstanceId`, `ServiceName`, `Version`, `Region`).
  * `Status`, `LastHeartbeatUtc`, `AveragePingMs`.
  * The set of endpoints that this connection serves (`(Method, Path)` → `EndpointDescriptor`).
  * The transport type for that connection.
* The gateway MUST maintain a global routing state (`IGlobalRoutingState`) that:
  * Resolves `(Method, Path)` to an `EndpointDescriptor` (service, version, metadata).
  * Provides the set of `ConnectionState` objects that can handle a given `(ServiceName, Version, Method, Path)`.
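
The following is a minimal C# sketch of how the per-connection state above could feed the selection rules from section 4. The steps are: match `ServiceName`, match `Version` exactly, keep only instances with acceptable health, pick the best region tier, then the lowest `AveragePingMs`, then the most recent `LastHeartbeatUtc`, then break any remaining tie. `ConnectionState`, `InstanceDescriptor` and `EndpointDescriptor` are names from the requirements, but their shapes here are assumptions, trimmed to what selection needs (for example, `RequiringClaims` is omitted). `InstanceSelector`, the `TransportType` enum and the `Select` signature are hypothetical. The acceptable health set is assumed to be `Healthy` or `Degraded`, and a random pick stands in for the balancing strategy.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Health states listed in section 4.
public enum InstanceHealthStatus { Unknown, Healthy, Degraded, Draining, Unhealthy }

// Hypothetical enum mirroring the transport types in section 5.
public enum TransportType { Udp, Tcp, Certificate, RabbitMq }

public sealed record InstanceDescriptor(string InstanceId, string ServiceName, string Version, string Region);

// Assumed, trimmed descriptor: RequiringClaims omitted because selection does not use it.
public sealed record EndpointDescriptor(
    string ServiceName, string Version, string Method, string Path,
    TimeSpan DefaultTimeout, bool SupportsStreaming);

// Assumed shape of the per-connection state from 6.2.
public sealed class ConnectionState
{
    public ConnectionState(string connectionId, InstanceDescriptor instance, TransportType transport)
    {
        ConnectionId = connectionId;
        Instance = instance;
        Transport = transport;
    }

    public string ConnectionId { get; }
    public InstanceDescriptor Instance { get; }
    public TransportType Transport { get; }
    public InstanceHealthStatus Status { get; set; } = InstanceHealthStatus.Unknown;
    public DateTime LastHeartbeatUtc { get; set; }
    public double AveragePingMs { get; set; }

    // Endpoints announced in HELLO, keyed by (Method, Path).
    public Dictionary<(string Method, string Path), EndpointDescriptor> Endpoints { get; } = new();
}

// Hypothetical helper; not part of any existing library.
public static class InstanceSelector
{
    // Returns the preferred connection for the endpoint, or null when no candidate qualifies.
    public static ConnectionState? Select(
        IEnumerable<ConnectionState> connections,
        EndpointDescriptor endpoint,
        string localRegion,                     // GatewayNodeConfig.Region
        IReadOnlyList<string> neighborRegions,  // configured neighbor regions
        Random random)
    {
        // 1. Filter: same service, strict version equality, acceptable health, serves this endpoint.
        var candidates = connections
            .Where(c => c.Instance.ServiceName == endpoint.ServiceName
                     && c.Instance.Version == endpoint.Version
                     && (c.Status == InstanceHealthStatus.Healthy || c.Status == InstanceHealthStatus.Degraded)
                     && c.Endpoints.ContainsKey((endpoint.Method, endpoint.Path)))
            .ToList();
        if (candidates.Count == 0) return null;

        // 2. Region tier: 0 = local, 1 = neighbor, 2 = any other region; keep only the best tier.
        int Tier(ConnectionState c) =>
            c.Instance.Region == localRegion ? 0 : neighborRegions.Contains(c.Instance.Region) ? 1 : 2;
        int bestTier = candidates.Min(c => Tier(c));
        var inTier = candidates.Where(c => Tier(c) == bestTier).ToList();

        // 3. Within the tier, keep the lowest average ping.
        double bestPing = inTier.Min(c => c.AveragePingMs);
        var fastest = inTier.Where(c => c.AveragePingMs == bestPing).ToList();

        // 4. Among ping ties, keep the most recent heartbeat.
        DateTime newest = fastest.Max(c => c.LastHeartbeatUtc);
        var finalists = fastest.Where(c => c.LastHeartbeatUtc == newest).ToList();

        // 5. Balancing: random pick among the remaining ties (round-robin would also fit here).
        return finalists[random.Next(finalists.Count)];
    }
}
```

Region tier is compared before ping. As a result, a local-region instance with 20 ms average ping wins over a neighbor-region instance with 5 ms. Ping and heartbeat age only decide between instances in the same tier, which matches the fallback order in section 4: tolerate worse ping, then older heartbeats, and only then a less preferred region.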
### 6.3 Buffered vs streaming dispatch

* The gateway MUST support:
  * **Buffered mode** for small to medium payloads:
    * Read the entire HTTP body into memory (or into a temporary file when it is above a threshold).
    * Send it as a single REQUEST payload.
  * **Streaming mode** for large or unknown-length content:
    * Stream from the HTTP body to the microservice via a sequence of REQUEST_STREAM_DATA frames.
    * Stream from the microservice back to HTTP via RESPONSE_STREAM_DATA frames.
* For each endpoint, the gateway MUST know whether it can use streaming or must use buffered mode (`SupportsStreaming` flag).

### 6.4 Opaque body handling

* The gateway MUST treat request and response bodies as opaque byte sequences and MUST NOT attempt to deserialize or interpret payload contents.
* The gateway MUST forward headers and body bytes as given and leave any schema, JSON, or other decoding to the microservice.

### 6.5 Payload and memory protection

* The gateway MUST enforce the configured payload limits:
  * `MaxRequestBytesPerCall`.
  * `MaxRequestBytesPerConnection`.
  * `MaxAggregateInflightBytes`.
* If `Content-Length` is known and exceeds `MaxRequestBytesPerCall`, the gateway MUST reject the request early (e.g. HTTP 413 Payload Too Large).
* During streaming, the gateway MUST maintain counters of:
  * Bytes read for this request.
  * Bytes for this connection.
  * Total in-flight bytes across all requests.
* If any limit is exceeded mid-stream, the gateway MUST:
  * Stop reading the HTTP body.
  * Send a CANCEL frame for that correlation ID.
  * Abort the stream to the microservice.
  * Return an appropriate error to the client (e.g. 413 or 503) and log the incident.

---

## 7. Microservice SDK (`__Libraries/StellaOps.Microservice`) requirements

### 7.1 Identity & router connections

* `StellaMicroserviceOptions` MUST let microservices configure:
  * `ServiceName`.
  * `Version`.
  * `Region`.
  * `InstanceId`.
  * A list of router endpoints (`Routers` / router pool) including host, port, and transport type for each.
  * Optional path to a YAML config file for endpoint-level overrides.
* The router pool (`Routers` / HTTP servers pool) MUST be provided; a microservice cannot start without at least one configured router endpoint.
* The router pool SHOULD be configurable via code and MAY also be configured via YAML with hot-reload (changes cause reconnections).

### 7.2 Endpoint definition & discovery

* Microservice endpoints MUST be declared using attributes that specify `(Method, Path)`:
  ```csharp
  // The attribute carries the (Method, Path) endpoint identity.
  [StellaEndpoint("POST", "/billing/invoices")]
  public sealed class CreateInvoiceEndpoint : ... // base interface elided (one of the handler shapes below)
  ```
* The SDK MUST support two handler shapes:
  * Raw handler:
    * `IRawStellaEndpoint` taking a `RawRequestContext` and returning a `RawResponse`, where:
      * `RawRequestContext.Body` is a stream (may be buffered or streaming).
      * Body contents are raw bytes.
  * Typed handlers:
    * `IStellaEndpoint<TRequest, TResponse>`, which takes a typed request and returns a typed response.
    * `IStellaEndpoint<TResponse>`, which has no request payload and returns a typed response.
* The SDK MUST adapt typed endpoints to the raw model internally (microservice-side only), leaving the router unaware of types.
* Endpoint discovery MUST work by:
  * Runtime reflection: scanning assemblies for `[StellaEndpoint]` and handler interfaces.
  * Build-time discovery via source generation:
    * A Roslyn source generator MUST generate a descriptor list at build time.
    * At runtime, the SDK MUST prefer source-generated metadata and only fall back to reflection if generation is not available.

### 7.3 Endpoint metadata defaults & overrides

* Microservices MUST be able to provide default endpoint metadata:
  * `SupportsStreaming` flag.
  * Default timeout.
  * Default `RequiringClaims`.
* Microservice-local YAML MUST be allowed to override or refine these defaults per endpoint, keyed by `(Method, Path)`.
* Precedence rules MUST be clearly defined and honored:
  * Service identity & router pool: from `StellaMicroserviceOptions` (not YAML).
  * Endpoint set: from code (attributes/source gen); YAML MAY override properties of code-declared endpoints but should ideally not create endpoints that are absent from code (policy decision to be documented).
  * `RequiringClaims` and timeouts: YAML overrides defaults from code, unless overridden by central Authority.

### 7.4 Connection behavior

* On establishing a connection to a router endpoint, the SDK MUST:
  * Immediately send a HELLO frame containing:
    * `ServiceName`, `Version`, `Region`, `InstanceId`.
    * The list of endpoints (Method, Path) with their metadata (SupportsStreaming, default timeouts, default RequiringClaims).
* At regular intervals, the SDK MUST send HEARTBEAT frames on each connection indicating:
  * Instance health status.
  * Optional metrics (e.g. in-flight request count, error rate).
* The SDK SHOULD support an optional ENDPOINTS_UPDATE frame (or a re-HELLO) to update endpoint metadata at runtime if needed.

### 7.5 Request handling & streaming

* For each incoming REQUEST frame:
  * The SDK MUST create a `RawRequestContext` with:
    * Method.
    * Path.
    * Headers.
    * A `Body` stream that either:
      * Wraps a buffered byte array, or
      * Exposes streaming reads from subsequent REQUEST_STREAM_DATA frames.
    * A `CancellationToken` that will be cancelled when the router sends a CANCEL frame or the connection fails.
* The SDK MUST resolve the correct endpoint handler by `(Method, Path)` using the same path template rules as the router.
* For streaming endpoints, handlers MUST be able to read from `RawRequestContext.Body` incrementally and obey the `CancellationToken`.

### 7.6 Cancellation handling (microservice side)

* The SDK MUST maintain a map of in-flight requests by correlation ID, each containing:
  * A `CancellationTokenSource`.
  * The task executing the handler.
* Upon receiving a CANCEL frame for a given correlation ID, the SDK MUST:
  * Look up the corresponding entry and call `CancellationTokenSource.Cancel()`.
* Handlers (both raw and typed) MUST receive a `CancellationToken`:
  * They MUST observe the token and be coded to cancel promptly where needed.
  * They MUST pass the token to downstream I/O operations (DB calls, file I/O, network).
* If the transport connection is closed, the SDK MUST treat it as a cancellation trigger for all outstanding requests on that connection and cancel their tokens.

---

## 8. Control / health / ping requirements

* Heartbeats MUST be sent over the same connection as requests (no separate control channel).
* The router MUST:
  * Track `LastHeartbeatUtc` for each connection.
  * Derive `InstanceHealthStatus` from heartbeat recency and, optionally, reported metrics.
  * Drop or mark as Unhealthy any instances whose heartbeats are stale beyond the configured thresholds.
* The router SHOULD measure network latency (ping) and update `AveragePingMs` for each connection, either by:
  * Timing request-response round trips, or
  * Using explicit ping frames.
* The router MUST use heartbeat and ping metrics in its routing decision as described in section 4.

---

## 9. Authorization / `RequiringClaims` / Authority requirements

* `RequiringClaims` MUST be the only authorization metadata field; `AllowedRoles` MUST NOT be used.
* Every endpoint MUST be able to specify:
  * An empty `RequiringClaims` list (no additional claims required beyond being authenticated).
  * Or one or more `ClaimRequirement` objects (Type + optional Value).
* The gateway MUST enforce `RequiringClaims` per request:
  * Authorization MUST check that the request’s user principal has all required claims for the endpoint.
* Microservices MUST provide default `RequiringClaims` as part of their HELLO metadata.
* There MUST be a mechanism for an external Authority service to override `RequiringClaims` centrally:
  * Defaults MUST come from microservices.
  * Authority MUST be able to push or supply overrides that the gateway applies at startup and/or at runtime.
  * The gateway MUST proactively request such overrides on startup (e.g. via a special message or mechanism) before handling traffic, or as early as practical.
* The final, effective `RequiringClaims` enforced at the gateway MUST be derived from microservice defaults plus Authority overrides, with Authority taking precedence where applicable.

---

## 10. Cancellation requirements (router side)

* The protocol MUST define a `FrameType.Cancel` with:
  * A `CorrelationId` indicating which request to cancel.
  * An optional payload containing a reason code (e.g. `"ClientDisconnected"`, `"Timeout"`, `"PayloadLimitExceeded"`).
* The router MUST send CANCEL frames when:
  * The HTTP client disconnects (ASP.NET `HttpContext.RequestAborted` fires) while the request is in progress.
  * The router’s effective timeout for the request elapses before a response has been received.
  * The router detects payload/memory limit breaches and has to abort the request.
  * The router is shutting down and explicitly aborts in-flight requests (if implemented).
* The router MUST:
  * Stop forwarding any additional REQUEST_STREAM_DATA to the microservice once a CANCEL is sent.
  * Stop processing any remaining response frames for that correlation ID and either:
    * Discard them, or
    * Treat them as late, log them, and ignore them.
* For streaming responses, if the HTTP client disconnects or the router cancels:
  * The router MUST stop writing to the HTTP response and ignore any subsequent frames.

---

## 11. Configuration and YAML requirements

* `__Libraries/StellaOps.Router.Config` MUST handle:
  * Binding router config from JSON/appsettings + YAML + environment variables.
  * Static service definitions:
    * ServiceName.
    * DefaultVersion.
    * DefaultTransport.
    * Endpoint list (Method, Path) with default timeouts, `RequiringClaims`, and streaming flags.
  * Static instance definitions (optional):
    * ServiceName, Version, Region, supported transports, plugin-specific settings.
  * Global payload limits (`PayloadLimits`).
* Router YAML config MUST support hot-reload:
  * Changes SHOULD be picked up at runtime without restarting the gateway.
  * Hot-reload MUST cause in-memory routing state to be updated, including:
    * New or removed services/endpoints.
    * New or removed instances (static).
    * Updated payload limits.
* Microservice YAML config MUST be optional and used for endpoint-level overrides only, not for identity or router pool configuration.
* The router pool for microservices MUST be configured via code and MAY be backed by YAML (with hot-plug / reconnection behavior) if desired.

---

## 12. Library naming / repo structure requirements

* The router configuration library MUST be named `__Libraries/StellaOps.Router.Config`.
* The microservice SDK library MUST be named `__Libraries/StellaOps.Microservice`.
* The gateway webservice MUST be named `StellaOps.Gateway.WebService`.
* There MUST be a “common” library for shared types and abstractions (e.g. `__Libraries/StellaOps.Router.Common`).
* Documentation files MUST include at least:
  * `Stella Ops Router.md` (what it is, why, high-level architecture).
  * `Stella Ops Router - Webserver.md` (how the webservice works).
  * `Stella Ops Router - Microservice.md` (how the microservice SDK works and is implemented).
  * `Stella Ops Router - Common.md` (common components and how they are implemented).
  * `Migration of Webservices to Microservices.md`.
  * `Stella Ops Router Documentation.md` (doc structure & guidance).

---

## 13. Documentation & developer-experience requirements

* The docs MUST be detailed; “do not spare details” implies:
  * High-fidelity, concrete examples rather than hand-wavy descriptions.
* For average C# developers, documentation MUST cover:
  * Exact .NET / ASP.NET Core target version and runtime baseline.
  * Required NuGet packages (logging, serialization, YAML parsing, RabbitMQ client, etc.).
  * Exact serialization formats for frames and payloads (JSON vs MessagePack vs others).
  * Exact framing rules for each transport (length-prefix for TCP/TLS, datagrams for UDP, exchanges/queues for RabbitMQ).
  * Concrete sample `Program.cs` for:
    * A gateway node.
    * A microservice.
  * Example endpoint implementations:
    * Typed (with and without request).
    * Raw streaming endpoints for large payloads.
  * Example router YAML and microservice YAML with realistic values.
  * Error and HTTP status mapping policy:
    * E.g. “version not found → 404 or 400; no instance available → 503; timeout → 504; payload too large → 413.”
  * Guidelines on:
    * When to use UDP vs TCP vs RabbitMQ.
    * How to configure and validate certificates for the certificate transport.
    * How to write cancellation-friendly handlers (proper use of `CancellationToken`).
  * Testing strategies: local dev setups, integration test harnesses, how to run router + microservice together for tests.
  * Clear explanation of config precedence:
    * Code options vs YAML vs microservice defaults vs Authority for claims.
* Documentation MUST answer for each major concept:
  * What it is.
  * Why it exists.
  * How it works.
  * How to use it (with examples).
  * What happens when it is misused and how to debug issues.

---

## 14. Migration requirements

* There MUST be a defined migration path from `StellaOps.*.WebServices` to `StellaOps.*.Microservices`.
* Migration documentation MUST cover:
  * Inventorying existing HTTP routes (Method + Path).
  * Strategy A (in-place adaptation):
    * Adding the microservice SDK to the WebService.
    * Declaring endpoints with `[StellaEndpoint]`.
    * Wrapping existing controller logic in handlers.
    * Connecting to the router and validating registration.
    * Gradually shifting traffic from direct WebService HTTP ingress to gateway routing.
  * Strategy B (split):
    * Extracting domain logic into shared libraries.
    * Creating a dedicated microservice project using the SDK.
    * Mapping routes and handlers.
    * Phasing out or repurposing the original WebService.
  * Ensuring cancellation tokens are wired throughout migrated code.
  * Handling streaming endpoints (large uploads/downloads) via `IRawStellaEndpoint` and streaming support instead of naive buffered HTTP controllers.

---

If you want, I can next turn this requirement set into a machine-readable checklist (e.g. JSON or YAML) or derive a first-pass implementation roadmap directly from these requirements.