I’ll group everything into requirement buckets, but keep it all as requirements statements (no rationale). This is the union of what you asked for or confirmed across the whole thread.
1. Architectural / scope requirements
-
There SHALL be a single HTTP ingress service named StellaOps.Gateway.WebService.
-
Microservices SHALL NOT expose HTTP to the router; all microservice-to-router traffic (control + data) MUST use in-house transports (UDP, TCP, certificate/TLS, RabbitMQ).
-
There SHALL NOT be a separate control-plane service or protocol; each transport connection between a microservice and the router MUST carry:
- Initial registration (HELLO) and endpoint configuration.
- Ongoing heartbeats.
- Endpoint updates (if any).
- Request/response and streaming data.
-
The router SHALL maintain per-connection endpoint mappings and derive its global routing state from the union of all live connections.
-
The router SHALL treat request and response bodies as opaque (raw bytes / streams); all deserialization and schema handling SHALL be the microservice’s responsibility.
-
The system SHALL support both buffered and streaming request/response flows end-to-end.
-
The design MUST reuse only the generic parts of __SerdicaTemplate (dynamic endpoint metadata, attribute-based endpoint discovery, request routing patterns, correlation, connection management) and MUST drop the Serdica-specific stack (Oracle schema, domain logic, etc.).
-
The solution MUST be a simpler, generic replacement for the existing Serdica HTTP→RabbitMQ→microservice design.
2. Service identity, region, versioning
-
Each microservice instance SHALL be identified by (ServiceName, Version, Region, InstanceId).
-
Version MUST follow strict semantic versioning (major.minor.patch).
-
Routing MUST be strict on version:
- The router MUST only route a request to instances whose Version equals the selected version.
- When a version is not explicitly specified by the client, a default version MUST be used (from config or metadata).
-
Each gateway node SHALL have a static configuration object GatewayNodeConfig containing at least:
- Region (e.g. "eu1").
- NodeId (e.g. "gw-eu1-01").
- Environment (e.g. "prod").
-
Routing decisions MUST use GatewayNodeConfig.Region as the node’s region; the router MUST NOT derive region from HTTP headers or URL host names.
-
DNS/host naming conventions SHOULD express region in the domain (e.g. eu1.global.stella-ops.org, mainoffice.contoso.stella-ops.org), but routing logic MUST be driven by GatewayNodeConfig.Region rather than by host parsing.
3. Endpoint identity and metadata
-
Endpoint identity in the router and microservices MUST be HTTP Method + Path, for example:
- Method: one of GET, POST, PUT, PATCH, DELETE.
- Path: e.g. /section/get/{id}.
-
The router and microservices MUST use the same path template syntax and matching rules (e.g. ASP.NET-style route templates), including decisions on:
- Case sensitivity.
- Trailing slash handling.
- Parameter segments (e.g. {id}).
-
The router MUST resolve an incoming HTTP (Method, Path) to a logical endpoint descriptor that includes:
- ServiceName.
- Version.
- Method.
- Path.
- DefaultTimeout.
- RequiringClaims: a list of claim requirements.
- A flag indicating whether the endpoint supports streaming.
-
Every place that previously referred to AllowedRoles MUST use RequiringClaims instead:
- Each requirement MUST at minimum contain a Type and MAY contain a Value.
-
Endpoints MUST support being configured with default RequiringClaims in microservices, with the possibility of external override (see Authority section).
4. Routing algorithm / instance selection
-
Given a resolved endpoint (ServiceName, Version, Method, Path), the router MUST:
-
Filter candidate instances by:
- Matching ServiceName.
- Matching Version (strict semver equality).
- Health in an acceptable set (e.g. Healthy or Degraded).
-
Instances MUST have health metadata:
- Status ∈ {Unknown, Healthy, Degraded, Draining, Unhealthy}.
- LastHeartbeatUtc.
- AveragePingMs.
-
The router’s instance selection MUST obey these rules:
-
Region:
- Prefer instances whose Region == GatewayNodeConfig.Region.
- If none, fall back to configured neighbor regions.
- If none, fall back to all other regions.
-
Within a chosen region tier:
- Prefer lower AveragePingMs.
- If several are tied, prefer more recent LastHeartbeatUtc.
- If still tied, use a balancing strategy (e.g. random or round-robin).
-
The router MUST support a strict fallback order as requested:
-
Prefer “closest by region and heartbeat and ping.”
-
If the router has to choose among worse candidates, it MUST fall back in order of:
- Greater ping (latency).
- Greater heartbeat age.
- Less preferred region tier.
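The selection rules in this section can be sketched as a small pure function. This is a minimal illustration, not the router's actual implementation: the `InstanceInfo` record, the `InstanceSelector` class, the `neighbors` set, and the random tie-break are assumptions made for the example; only the field names and the ordering (region tier, then ping, then heartbeat recency) come from the requirements above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public enum HealthStatus { Unknown, Healthy, Degraded, Draining, Unhealthy }

// Hypothetical shape; field names follow the requirements above.
public sealed record InstanceInfo(
    string ServiceName, string Version, string Region, string InstanceId,
    HealthStatus Status, DateTime LastHeartbeatUtc, double AveragePingMs);

public static class InstanceSelector
{
    // Tier 0 = local region, 1 = configured neighbor region, 2 = any other region.
    private static int RegionTier(string region, string localRegion, IReadOnlySet<string> neighbors) =>
        region == localRegion ? 0 : neighbors.Contains(region) ? 1 : 2;

    public static InstanceInfo? Select(
        IEnumerable<InstanceInfo> instances, string serviceName, string version,
        string localRegion, IReadOnlySet<string> neighbors, Random rng)
    {
        // Step 1: filter by service, strict version equality, and acceptable health.
        var candidates = instances
            .Where(i => i.ServiceName == serviceName
                     && i.Version == version
                     && (i.Status == HealthStatus.Healthy || i.Status == HealthStatus.Degraded))
            .ToList();
        if (candidates.Count == 0) return null;

        // Step 2: keep only the most preferred region tier that has candidates.
        int bestTier = candidates.Min(i => RegionTier(i.Region, localRegion, neighbors));
        var tier = candidates
            .Where(i => RegionTier(i.Region, localRegion, neighbors) == bestTier)
            .ToList();

        // Step 3: within the tier, prefer the lowest ping.
        double bestPing = tier.Min(i => i.AveragePingMs);
        var byPing = tier.Where(i => i.AveragePingMs == bestPing).ToList();

        // Step 4: among ping ties, prefer the most recent heartbeat.
        DateTime newest = byPing.Max(i => i.LastHeartbeatUtc);
        var tied = byPing.Where(i => i.LastHeartbeatUtc == newest).ToList();

        // Step 5: remaining ties are broken randomly (round-robin would also satisfy the rule).
        return tied[rng.Next(tied.Count)];
    }
}
```

A caller would pass `GatewayNodeConfig.Region` as `localRegion`; a `null` result corresponds to the "no instance available" case.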
5. Transport plugin requirements
-
There MUST be a transport plugin abstraction representing how the router and microservices communicate.
-
The default transport type MUST be UDP.
-
Additional supported transport types MUST include:
- TCP.
- Certificate-based TCP (TLS / mTLS).
- RabbitMQ.
-
There MUST NOT be an HTTP transport plugin; HTTP MUST NOT be used for microservice-to-router communications (control or data).
-
Each transport plugin MUST support:
- Establishing logical connections between microservices and the router.
- Sending/receiving HELLO (registration), HEARTBEAT, optional ENDPOINTS_UPDATE.
- Sending/receiving REQUEST/RESPONSE frames.
- Supporting streaming via REQUEST_STREAM_DATA / RESPONSE_STREAM_DATA frames where the transport allows it.
- Sending/receiving CANCEL frames to abort specific in-flight requests.
-
UDP transport:
- MUST be used only for small/bounded payloads (no unbounded streaming).
- MUST respect configured MaxRequestBytesPerCall.
-
TCP and Certificate transports:
- MUST implement a length-prefixed framing protocol capable of multiplexing frames for multiple correlation IDs.
- Certificate transport MUST enforce TLS and support optional mutual TLS (verifiable peer identity).
-
RabbitMQ:
- MUST implement queue/exchange naming and routing keys sufficient to represent logical connections and correlation IDs.
- MUST use message properties (e.g. CorrelationId) for request/response matching.
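To make the length-prefixed framing requirement for the TCP and Certificate transports concrete, here is one possible wire layout. Everything about the layout is an assumption for illustration (4-byte big-endian length, 1-byte frame type, 16-byte correlation ID, then payload), as are the `Frame`, `FrameType`, and `FrameCodec` names and the numeric enum values; the requirements only state that frames are length-prefixed and carry a correlation ID.

```csharp
#nullable enable
using System;
using System.Buffers.Binary;
using System.IO;

// Hypothetical frame types and values; the real protocol may define more.
public enum FrameType : byte { Hello = 1, Heartbeat = 2, Request = 3, Response = 4, Cancel = 5 }

public sealed record Frame(FrameType Type, Guid CorrelationId, byte[] Payload);

public static class FrameCodec
{
    private const int HeaderSize = 1 + 16; // frame type + correlation ID

    public static void Write(Stream stream, Frame frame)
    {
        // Length prefix covers everything after the prefix itself.
        Span<byte> prefix = stackalloc byte[4];
        BinaryPrimitives.WriteInt32BigEndian(prefix, HeaderSize + frame.Payload.Length);
        stream.Write(prefix);
        stream.WriteByte((byte)frame.Type);
        stream.Write(frame.CorrelationId.ToByteArray());
        stream.Write(frame.Payload);
    }

    // Returns null on a clean end of stream; throws on a truncated or oversized frame.
    public static Frame? Read(Stream stream, int maxFrameBytes)
    {
        var prefix = new byte[4];
        if (!TryReadExactly(stream, prefix)) return null;

        int length = BinaryPrimitives.ReadInt32BigEndian(prefix);
        if (length < HeaderSize || length > maxFrameBytes)
            throw new InvalidDataException($"Invalid frame length {length}.");

        var body = new byte[length];
        if (!TryReadExactly(stream, body))
            throw new EndOfStreamException("Truncated frame.");

        var type = (FrameType)body[0];
        var correlationId = new Guid(body.AsSpan(1, 16));
        return new Frame(type, correlationId, body.AsSpan(HeaderSize).ToArray());
    }

    // Loops until the buffer is full; returns false only if the stream ended before any byte was read.
    private static bool TryReadExactly(Stream stream, byte[] buffer)
    {
        int offset = 0;
        while (offset < buffer.Length)
        {
            int read = stream.Read(buffer, offset, buffer.Length - offset);
            if (read == 0)
            {
                if (offset == 0) return false;
                throw new EndOfStreamException("Truncated frame.");
            }
            offset += read;
        }
        return true;
    }
}
```

Because every frame carries its own correlation ID, frames for different in-flight requests can be interleaved on one connection and demultiplexed by the reader.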
6. Gateway (StellaOps.Gateway.WebService) requirements
6.1 HTTP ingress pipeline
-
The gateway MUST host an ASP.NET Core HTTP server.
-
The HTTP middleware pipeline MUST include at least:
- Forwarded headers handling (when behind reverse proxy).
- Request logging (e.g. via Serilog) including correlation ID, service, endpoint, region, instance.
- Global error-handling middleware.
- Authentication middleware.
- EndpointResolutionMiddleware to resolve (Method, Path) → endpoint.
- Authorization middleware that enforces RequiringClaims.
- RoutingDecisionMiddleware to choose connection/instance/transport.
- TransportDispatchMiddleware to carry out buffered or streaming dispatch.
-
The gateway MUST read Method and Path from the HTTP request and use them to resolve endpoints.
6.2 Per-connection state and routing view
-
The gateway MUST maintain a ConnectionState per logical connection that includes:
- ConnectionId.
- InstanceDescriptor (InstanceId, ServiceName, Version, Region).
- Status, LastHeartbeatUtc, AveragePingMs.
- The set of endpoints that this connection serves ((Method, Path) → EndpointDescriptor).
- The transport type for that connection.
-
The gateway MUST maintain a global routing state (IGlobalRoutingState) that:
- Resolves (Method, Path) to an EndpointDescriptor (service, version, metadata).
- Provides the set of ConnectionState objects that can handle a given (ServiceName, Version, Method, Path).
6.3 Buffered vs streaming dispatch
-
The gateway MUST support:
-
Buffered mode for small to medium payloads:
- Read the entire HTTP body into memory (or temp file when above a threshold).
- Send as a single REQUEST payload.
-
Streaming mode for large or unknown content:
- Streaming from HTTP body to microservice via a sequence of REQUEST_STREAM_DATA frames.
- Streaming from microservice back to HTTP via RESPONSE_STREAM_DATA frames.
-
-
For each endpoint, the gateway MUST know whether it can use streaming or must use buffered mode (SupportsStreaming flag).
6.4 Opaque body handling
- The gateway MUST treat request and response bodies as opaque byte sequences and MUST NOT attempt to deserialize or interpret payload contents.
- The gateway MUST forward headers and body bytes as given and leave any schema, JSON, or other decoding to the microservice.
6.5 Payload and memory protection
-
The gateway MUST enforce configured payload limits:
- MaxRequestBytesPerCall.
- MaxRequestBytesPerConnection.
- MaxAggregateInflightBytes.
-
If Content-Length is known and exceeds MaxRequestBytesPerCall, the gateway MUST reject the request early (e.g. HTTP 413 Payload Too Large).
-
During streaming, the gateway MUST maintain counters of:
- Bytes read for this request.
- Bytes for this connection.
- Total in-flight bytes across all requests.
-
If any limit is exceeded mid-stream, the gateway MUST:
- Stop reading the HTTP body.
- Send a CANCEL frame for that correlation ID.
- Abort the stream to the microservice.
- Return an appropriate error to the client (e.g. 413 or 503) and log the incident.
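The three counters and the early-reject check could be kept in a small tracker like the one below. This is a sketch under stated assumptions: the `PayloadTracker`, `ConnectionBytes`, and `RequestBytes` types are hypothetical, and it assumes the per-connection limit applies to bytes currently in flight on that connection (the requirements do not say whether it is in-flight or cumulative).

```csharp
#nullable enable
using System.Threading;

// Limit names come from the requirements; the record itself is illustrative.
public sealed record PayloadLimits(
    long MaxRequestBytesPerCall,
    long MaxRequestBytesPerConnection,
    long MaxAggregateInflightBytes);

public sealed class ConnectionBytes { public long InFlight; } // shared by all requests on one connection
public sealed class RequestBytes { public long Read; }        // owned by a single request

public sealed class PayloadTracker
{
    private readonly PayloadLimits _limits;
    private long _aggregate;

    public PayloadTracker(PayloadLimits limits) => _limits = limits;

    public long AggregateInflightBytes => Interlocked.Read(ref _aggregate);

    // Early rejection when Content-Length is known (maps to HTTP 413).
    public bool ExceedsPerCallLimit(long? contentLength) =>
        contentLength is long n && n > _limits.MaxRequestBytesPerCall;

    // Accounts for one streamed chunk. Returns false if any limit is now exceeded,
    // in which case the caller stops reading, sends CANCEL, and releases the request.
    public bool TryAdd(ConnectionBytes connection, RequestBytes request, int chunkBytes)
    {
        request.Read += chunkBytes;
        long connectionTotal = Interlocked.Add(ref connection.InFlight, chunkBytes);
        long aggregateTotal = Interlocked.Add(ref _aggregate, chunkBytes);

        return request.Read <= _limits.MaxRequestBytesPerCall
            && connectionTotal <= _limits.MaxRequestBytesPerConnection
            && aggregateTotal <= _limits.MaxAggregateInflightBytes;
    }

    // Called once when the request completes, fails, or is cancelled.
    public void Release(ConnectionBytes connection, RequestBytes request)
    {
        Interlocked.Add(ref connection.InFlight, -request.Read);
        Interlocked.Add(ref _aggregate, -request.Read);
        request.Read = 0;
    }
}
```

The streaming loop would call `TryAdd` after each read from the HTTP body and run the four abort steps listed above as soon as it returns `false`.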
7. Microservice SDK (__Libraries/StellaOps.Microservice) requirements
7.1 Identity & router connections
-
StellaMicroserviceOptions MUST let microservices configure:
- ServiceName.
- Version.
- Region.
- InstanceId.
- A list of router endpoints (Routers / router pool) including host, port, and transport type for each.
- Optional path to a YAML config file for endpoint-level overrides.
-
Providing the router pool (Routers / HTTP servers pool) MUST be mandatory; a microservice cannot start without at least one configured router endpoint.
-
The router pool SHOULD be configurable via code and MAY optionally be configured via YAML with hot-reload (causing reconnections if changed).
7.2 Endpoint definition & discovery
-
Microservice endpoints MUST be declared using attributes that specify (Method, Path):
    [StellaEndpoint("POST", "/billing/invoices")]
    public sealed class CreateInvoiceEndpoint : ...
-
The SDK MUST support two handler shapes:
-
Raw handler:
- IRawStellaEndpoint taking a RawRequestContext and returning a RawResponse, where:
  - RawRequestContext.Body is a stream (may be buffered or streaming).
  - Body contents are raw bytes.
-
Typed handlers:
- IStellaEndpoint<TRequest, TResponse> which takes a typed request and returns a typed response.
- IStellaEndpoint<TResponse> which has no request payload and returns a typed response.
-
The SDK MUST adapt typed endpoints to the raw model internally (microservice-side only), leaving the router unaware of types.
-
Endpoint discovery MUST work by:
-
Runtime reflection: scanning assemblies for [StellaEndpoint] and handler interfaces.
-
Build-time reflection via source generation:
- A Roslyn source generator MUST generate a descriptor list at build time.
- At runtime, the SDK MUST prefer source-generated metadata and only fall back to reflection if generation is not available.
7.3 Endpoint metadata defaults & overrides
-
Microservices MUST be able to provide default endpoint metadata:
- SupportsStreaming flag.
- Default timeout.
- Default RequiringClaims.
-
Microservice-local YAML MUST be allowed to override or refine these defaults per endpoint, keyed by (Method, Path).
-
Precedence rules MUST be clearly defined and honored:
- Service identity & router pool: from StellaMicroserviceOptions (not YAML).
- Endpoint set: from code (attributes/source gen); YAML MAY override properties but SHOULD NOT create endpoints that are not present in code (policy decision to be documented).
- RequiringClaims and timeouts: YAML overrides defaults from code, unless overridden by central Authority.
7.4 Connection behavior
-
On establishing a connection to a router endpoint, the SDK MUST:
-
Immediately send a HELLO frame containing:
- ServiceName, Version, Region, InstanceId.
- The list of endpoints (Method, Path) with their metadata (SupportsStreaming, default timeouts, default RequiringClaims).
-
At regular intervals, the SDK MUST send HEARTBEAT frames on each connection indicating:
- Instance health status.
- Optional metrics (e.g. in-flight request count, error rate).
-
The SDK SHOULD support optional ENDPOINTS_UPDATE (or a re-HELLO) to update endpoint metadata at runtime if needed.
7.5 Request handling & streaming
-
For each incoming REQUEST frame:
-
The SDK MUST create a RawRequestContext with:
- Method.
- Path.
- Headers.
- A Body stream that either:
  - Wraps a buffered byte array.
  - Or exposes streaming reads from subsequent REQUEST_STREAM_DATA frames.
- A CancellationToken that will be cancelled when the router sends a CANCEL frame or the connection fails.
-
The SDK MUST resolve the correct endpoint handler by (Method, Path) using the same path template rules as the router.
-
For streaming endpoints, handlers MUST be able to read from RawRequestContext.Body incrementally and obey the CancellationToken.
7.6 Cancellation handling (microservice side)
-
The SDK MUST maintain a map of in-flight requests by correlation ID, each containing:
- A CancellationTokenSource.
- The task executing the handler.
-
Upon receiving a CANCEL frame for a given correlation ID, the SDK MUST:
- Look up the corresponding entry and call CancellationTokenSource.Cancel().
-
Handlers (both raw and typed) MUST receive a CancellationToken:
- They MUST observe the token and be coded to cancel promptly where needed.
- They MUST pass the token to downstream I/O operations (DB calls, file I/O, network).
-
If the transport connection is closed, the SDK MUST treat it as a cancellation trigger for all outstanding requests on that connection and cancel their tokens.
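The in-flight map described in this section might look like the following. It is a minimal sketch, assuming one map per transport connection and a `Guid` correlation ID; the `InflightRequests` class and its method names are hypothetical and not part of the SDK surface named in the requirements.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// One instance per transport connection (an assumption of this sketch).
public sealed class InflightRequests
{
    private sealed class Entry
    {
        public Entry(CancellationTokenSource cts) => Cts = cts;
        public CancellationTokenSource Cts { get; }
        public Task? HandlerTask { get; set; }
    }

    private readonly ConcurrentDictionary<Guid, Entry> _inflight = new();

    public int Count => _inflight.Count;

    // Registers the request before the handler starts, so a CANCEL can never miss it.
    public Task Start(Guid correlationId, Func<CancellationToken, Task> handler)
    {
        var entry = new Entry(new CancellationTokenSource());
        if (!_inflight.TryAdd(correlationId, entry))
            throw new InvalidOperationException("Duplicate correlation ID.");
        entry.HandlerTask = RunAsync(correlationId, entry, handler);
        return entry.HandlerTask;
    }

    private async Task RunAsync(Guid id, Entry entry, Func<CancellationToken, Task> handler)
    {
        try
        {
            await handler(entry.Cts.Token);
        }
        finally
        {
            // Always unregister, whether the handler completed, failed, or was cancelled.
            _inflight.TryRemove(id, out _);
            entry.Cts.Dispose();
        }
    }

    // Called when a CANCEL frame arrives; returns false if the request already finished.
    public bool Cancel(Guid correlationId) =>
        _inflight.TryGetValue(correlationId, out var entry) && TryCancel(entry);

    // Called when the transport connection closes.
    public void CancelAll()
    {
        foreach (var entry in _inflight.Values) TryCancel(entry);
    }

    private static bool TryCancel(Entry entry)
    {
        try { entry.Cts.Cancel(); return true; }
        catch (ObjectDisposedException) { return false; } // handler finished concurrently
    }
}
```

Handlers receive the token through `handler(entry.Cts.Token)` and are expected to pass it on to downstream I/O, as required above.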
8. Control / health / ping requirements
-
Heartbeats MUST be sent over the same connection as requests (no separate control channel).
-
The router MUST:
- Track LastHeartbeatUtc for each connection.
- Derive InstanceHealthStatus based on heartbeat recency and optionally metrics.
- Drop or mark as Unhealthy any instances whose heartbeats are stale past configured thresholds.
-
The router SHOULD measure network latency (ping) and update AveragePingMs for each connection by:
- Timing request-response round trips, or
- Using explicit ping frames.
-
The router MUST use heartbeat and ping metrics in its routing decision as described above.
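As an illustration of deriving health from heartbeat recency and maintaining AveragePingMs, consider the sketch below. The two-threshold scheme, the `HealthThresholds` and `HealthEvaluator` names, and the exponential moving average with `alpha = 0.2` are all assumptions chosen for the example; the requirements only say that stale heartbeats lead to Unhealthy and that ping is averaged per connection.

```csharp
#nullable enable
using System;

public enum HealthStatus { Unknown, Healthy, Degraded, Draining, Unhealthy }

// Hypothetical thresholds; real values would come from router configuration.
public sealed record HealthThresholds(TimeSpan DegradedAfter, TimeSpan UnhealthyAfter);

public static class HealthEvaluator
{
    // Combines the status reported in the last HEARTBEAT with heartbeat age.
    public static HealthStatus Derive(
        HealthStatus reported, DateTime lastHeartbeatUtc, DateTime nowUtc, HealthThresholds thresholds)
    {
        TimeSpan age = nowUtc - lastHeartbeatUtc;

        // A stale heartbeat overrides whatever the instance last reported.
        if (age >= thresholds.UnhealthyAfter) return HealthStatus.Unhealthy;

        // Never upgrade or downgrade an instance that says it is draining or unhealthy.
        if (reported is HealthStatus.Draining or HealthStatus.Unhealthy) return reported;

        if (age >= thresholds.DegradedAfter) return HealthStatus.Degraded;
        return reported;
    }
}

// Exponentially weighted moving average of round-trip times (one possible averaging choice).
public sealed class PingAverage
{
    private readonly double _alpha;
    private double? _average;

    public PingAverage(double alpha = 0.2) => _alpha = alpha;

    public double AveragePingMs => _average ?? 0;

    public void AddSample(double roundTripMs) =>
        _average = _average is double avg ? avg + _alpha * (roundTripMs - avg) : roundTripMs;
}
```

An exponential average reacts to recent latency changes without storing a sample window; a plain sliding-window mean would satisfy the requirement equally well.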
9. Authorization / requiringClaims / Authority requirements
-
RequiringClaims MUST be the only authorization metadata field; AllowedRoles MUST NOT be used.
-
Every endpoint MUST be able to specify:
- An empty RequiringClaims list (no additional claims required beyond being authenticated).
- Or one or more ClaimRequirement objects (Type + optional Value).
-
The gateway MUST enforce RequiringClaims per request:
- Authorization MUST check that the request’s user principal has all required claims for the endpoint.
-
Microservices MUST provide default RequiringClaims as part of their HELLO metadata.
-
There MUST be a mechanism for an external Authority service to override RequiringClaims centrally:
- Defaults MUST come from microservices.
- Authority MUST be able to push or supply overrides that the gateway applies at startup and/or at runtime.
- The gateway MUST proactively request such overrides on startup (e.g. via a special message or mechanism) before handling traffic, or as early as practical.
-
Final, effective RequiringClaims enforced at the gateway MUST be derived from microservice defaults plus Authority overrides, with Authority taking precedence where applicable.
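The precedence and enforcement rules above could be expressed as follows. This is a sketch with one notable assumption: an Authority override replaces the microservice default list as a whole rather than being merged with it. The `ClaimsAuthorization` class is hypothetical; `ClaimRequirement` follows the Type + optional Value shape from the requirements, and `ClaimsPrincipal` is the standard .NET type.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;
using System.Security.Claims;

// Type is mandatory; Value is optional (null means "any value of this claim type").
public sealed record ClaimRequirement(string Type, string? Value = null);

public static class ClaimsAuthorization
{
    // Assumption: an Authority override, when present, replaces the defaults entirely.
    public static IReadOnlyList<ClaimRequirement> Effective(
        IReadOnlyList<ClaimRequirement> microserviceDefaults,
        IReadOnlyList<ClaimRequirement>? authorityOverride) =>
        authorityOverride ?? microserviceDefaults;

    // The principal must be authenticated and satisfy every requirement.
    // An empty list therefore means "authenticated is enough".
    public static bool IsAuthorized(ClaimsPrincipal user, IReadOnlyList<ClaimRequirement> required)
    {
        if (user.Identity?.IsAuthenticated != true) return false;

        return required.All(r => r.Value is null
            ? user.HasClaim(c => c.Type == r.Type)
            : user.HasClaim(r.Type, r.Value));
    }
}
```

The authorization middleware would call `Effective` once per endpoint whenever defaults or overrides change, and `IsAuthorized` on every request.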
10. Cancellation requirements (router side)
-
The protocol MUST define a FrameType.Cancel with:
- A CorrelationId indicating which request to cancel.
- An optional payload containing a reason code (e.g. "ClientDisconnected", "Timeout", "PayloadLimitExceeded").
-
The router MUST send CANCEL frames when:
- The HTTP client disconnects (ASP.NET HttpContext.RequestAborted fires) while the request is in progress.
- The router’s effective timeout for the request elapses, and no response has been received.
- The router detects payload/memory limit breaches and has to abort the request.
- The router is shutting down and explicitly aborts in-flight requests (if implemented).
-
The router MUST:
-
Stop forwarding any additional REQUEST_STREAM_DATA to the microservice once a CANCEL is sent.
-
Stop reading any remaining response frames for that correlation ID and either:
- Discard them.
- Or treat them as late, log them, and ignore them.
-
For streaming responses, if the HTTP client disconnects or the router cancels:
- The router MUST stop writing to the HTTP response and ignore any subsequent frames.
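One way to wire the router-side triggers together is a linked cancellation token that combines the timeout, the client-abort token, and a shutdown token. The sketch below is illustrative only: `RouterCancellation`, `DispatchAsync`, and the `sendCancelFrame` callback are hypothetical names, and the `"RouterShutdown"` reason code is an assumption (the requirements list `"ClientDisconnected"`, `"Timeout"`, and `"PayloadLimitExceeded"` as examples). Payload-limit cancellation is left out for brevity.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

public static class RouterCancellation
{
    // dispatch: sends the request over the transport and awaits the response.
    // sendCancelFrame: emits a CANCEL frame with the given reason for this correlation ID.
    public static async Task<T> DispatchAsync<T>(
        Func<CancellationToken, Task<T>> dispatch,
        Func<string, Task> sendCancelFrame,
        TimeSpan effectiveTimeout,
        CancellationToken clientAborted,   // e.g. HttpContext.RequestAborted
        CancellationToken shuttingDown)
    {
        using var timeoutCts = new CancellationTokenSource(effectiveTimeout);
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(
            timeoutCts.Token, clientAborted, shuttingDown);

        try
        {
            return await dispatch(linked.Token);
        }
        catch (OperationCanceledException) when (linked.IsCancellationRequested)
        {
            // Work out which trigger fired so the microservice gets a meaningful reason.
            string reason =
                clientAborted.IsCancellationRequested ? "ClientDisconnected" :
                shuttingDown.IsCancellationRequested ? "RouterShutdown" :
                "Timeout";

            await sendCancelFrame(reason);
            throw; // the caller maps this to an HTTP status (e.g. 504 for a timeout)
        }
    }
}
```

In an ASP.NET Core middleware, `clientAborted` would be `HttpContext.RequestAborted` and `shuttingDown` could come from `IHostApplicationLifetime.ApplicationStopping`.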
11. Configuration and YAML requirements
-
__Libraries/StellaOps.Router.Config MUST handle:
-
Binding router config from JSON/appsettings + YAML + environment variables.
-
Static service definitions:
- ServiceName.
- DefaultVersion.
- DefaultTransport.
- Endpoint list (Method, Path) with default timeouts, RequiringClaims, streaming flags.
-
Static instance definitions (optional):
- ServiceName, Version, Region, supported transports, plugin-specific settings.
-
Global payload limits (PayloadLimits).
-
Router YAML config MUST support hot-reload:
-
Changes SHOULD be picked up at runtime without restarting the gateway.
-
Hot-reload MUST cause in-memory routing state to be updated, including:
- New or removed services/endpoints.
- New or removed instances (static).
- Updated payload limits.
-
-
Microservice YAML config MUST be optional and used for endpoint-level overrides only, not for identity or router pool configuration.
-
The router pool for microservices MUST be configured via code and MAY be backed by YAML (with hot-plug / reconnection behavior) if desired.
12. Library naming / repo structure requirements
-
The router configuration library MUST be named __Libraries/StellaOps.Router.Config.
-
The microservice SDK library MUST be named __Libraries/StellaOps.Microservice.
-
The gateway webservice MUST be named StellaOps.Gateway.WebService.
-
There MUST be a “common” library for shared types and abstractions (e.g. __Libraries/StellaOps.Router.Common).
-
Documentation files MUST include at least:
- Stella Ops Router.md (what it is, why, high-level architecture).
- Stella Ops Router - Webserver.md (how the webservice works).
- Stella Ops Router - Microservice.md (how the microservice SDK works and is implemented).
- Stella Ops Router - Common.md (common components and how they are implemented).
- Migration of Webservices to Microservices.md.
- Stella Ops Router Documentation.md (doc structure & guidance).
13. Documentation & developer-experience requirements
-
The docs MUST be detailed; “do not spare details” implies:
- High-fidelity, concrete examples and not hand-wavy descriptions.
-
For average C# developers, documentation MUST cover:
-
Exact .NET / ASP.NET Core target version and runtime baseline.
-
Required NuGet packages (logging, serialization, YAML parsing, RabbitMQ client, etc.).
-
Exact serialization formats for frames and payloads (JSON vs MessagePack vs others).
-
Exact framing rules for each transport (length-prefix for TCP/TLS, datagrams for UDP, exchanges/queues for Rabbit).
-
Concrete sample Program.cs for:
- A gateway node.
- A microservice.
-
Example endpoint implementations:
- Typed (with and without request).
- Raw streaming endpoints for large payloads.
-
Example router YAML and microservice YAML with realistic values.
-
Error and HTTP status mapping policy:
- E.g. “version not found → 404 or 400; no instance available → 503; timeout → 504; payload too large → 413.”
-
Guidelines on:
- When to use UDP vs TCP vs RabbitMQ.
- How to configure and validate certificates for the certificate transport.
- How to write cancellation-friendly handlers (proper use of CancellationToken).
- Testing strategies: local dev setups, integration test harnesses, how to run router + microservice together for tests.
-
Clear explanation of config precedence:
- Code options vs YAML vs microservice defaults vs Authority for claims.
-
-
Documentation MUST answer for each major concept:
- What it is.
- Why it exists.
- How it works.
- How to use it (with examples).
- What happens when it is misused and how to debug issues.
14. Migration requirements
-
There MUST be a defined migration path from StellaOps.*.WebServices to StellaOps.*.Microservices.
-
Migration documentation MUST cover:
-
Inventorying existing HTTP routes (Method + Path).
-
Strategy A (in-place adaptation):
- Adding microservice SDK into WebService.
- Declaring endpoints with [StellaEndpoint].
- Wrapping existing controller logic in handlers.
- Connecting to the router and validating registration.
- Gradually shifting traffic from direct WebService HTTP ingress to gateway routing.
-
Strategy B (split):
- Extracting domain logic into shared libraries.
- Creating a dedicated microservice project using the SDK.
- Mapping routes and handlers.
- Phasing out or repurposing the original WebService.
-
Ensuring cancellation tokens are wired throughout migrated code.
-
Handling streaming endpoints (large uploads/downloads) via IRawStellaEndpoint and streaming support instead of naive buffered HTTP controllers.
If you want, I can next turn this requirement set into a machine-readable checklist (e.g. JSON or YAML) or derive a first-pass implementation roadmap directly from these requirements.