From 0c9e8d5d188ee7216bf8f95c01c8a913c90ab605 Mon Sep 17 00:00:00 2001 From: master <> Date: Tue, 2 Dec 2025 18:38:32 +0200 Subject: [PATCH] router planning --- docs/router/01-Step.md | 422 +++++++++++++ docs/router/02-Step.md | 375 +++++++++++ docs/router/03-Step.md | 144 +++++ docs/router/04-Step.md | 520 ++++++++++++++++ docs/router/05-Step.md | 554 +++++++++++++++++ docs/router/06-Step.md | 541 ++++++++++++++++ docs/router/07-Step.md | 378 +++++++++++ docs/router/08-Step.md | 501 +++++++++++++++ docs/router/09-Step.md | 562 +++++++++++++++++ docs/router/10-Step.md | 586 ++++++++++++++++++ docs/router/11-Step.md | 550 ++++++++++++++++ docs/router/12-Step.md | 415 +++++++++++++ .../SPRINT_7000_0001_0001_router_skeleton.md | 41 ++ docs/router/implplan.md | 356 +++++++++++ docs/router/specs.md | 494 +++++++++++++++ 15 files changed, 6439 insertions(+) create mode 100644 docs/router/01-Step.md create mode 100644 docs/router/02-Step.md create mode 100644 docs/router/03-Step.md create mode 100644 docs/router/04-Step.md create mode 100644 docs/router/05-Step.md create mode 100644 docs/router/06-Step.md create mode 100644 docs/router/07-Step.md create mode 100644 docs/router/08-Step.md create mode 100644 docs/router/09-Step.md create mode 100644 docs/router/10-Step.md create mode 100644 docs/router/11-Step.md create mode 100644 docs/router/12-Step.md create mode 100644 docs/router/SPRINT_7000_0001_0001_router_skeleton.md create mode 100644 docs/router/implplan.md create mode 100644 docs/router/specs.md diff --git a/docs/router/01-Step.md b/docs/router/01-Step.md new file mode 100644 index 000000000..bfc56f67a --- /dev/null +++ b/docs/router/01-Step.md @@ -0,0 +1,422 @@ +Goal for this phase: get a clean, compiling skeleton in place that matches the spec and folder conventions, with zero real logic and minimal dependencies. After this, all future work plugs into this structure. + +I’ll break it into concrete tasks you can assign to agents. + +--- + +## 1. 
Define the repository layout + +**Owner: “Skeleton” / infra agent** + +Target layout (no code yet, just dirs): + +```text +/ (repo root) + StellaOps.Router.sln + /src + /StellaOps.Gateway.WebService + /__Libraries + /StellaOps.Router.Common + /StellaOps.Router.Config + /StellaOps.Microservice + /StellaOps.Microservice.SourceGen (empty stub for now) + /tests + /StellaOps.Router.Common.Tests + /StellaOps.Gateway.WebService.Tests + /StellaOps.Microservice.Tests + /docs + /router + specs.md (already exists) + README.md (placeholder, 2–3 lines) +``` + +Tasks: + +1. Create `src`, `src/__Libraries`, `tests`, `docs/router` directories if missing. +2. Move/confirm `docs/router/specs.md` is the canonical spec. +3. Add `docs/router/README.md` with a pointer: “Start with specs.md; this folder will host router-related docs.” + +--- + +## 2. Create the solution and projects + +**Owner: skeleton agent** + +### 2.1 Create solution + +* At repo root: + + ```bash + dotnet new sln -n StellaOps.Router + ``` + +* Add projects as they are created in the next step. + +### 2.2 Create projects + +For each project below: + +* `dotnet new` with appropriate template. +* Set `RootNamespace` / `AssemblyName` to match folder & spec. + +Projects: + +1. **Gateway webservice** + + ```bash + cd src + dotnet new webapi -n StellaOps.Gateway.WebService + ``` + + * `-n` creates the `StellaOps.Gateway.WebService` folder itself, so run it from `src`. This will create an ASP.NET Core Web API project; we’ll trim later. + +2. **Common library** + + ```bash + cd src/__Libraries + dotnet new classlib -n StellaOps.Router.Common + ``` + +3. **Config library** + + ```bash + dotnet new classlib -n StellaOps.Router.Config + ``` + +4. **Microservice SDK** + + ```bash + dotnet new classlib -n StellaOps.Microservice + ``` + +5. **Microservice Source Generator (stub)** + + ```bash + dotnet new classlib -n StellaOps.Microservice.SourceGen + ``` + + * This will be converted to an Analyzer/SourceGen project later; for now it can compile as a plain library. + +6. 
**Test projects** + + Under `tests`: + + ```bash + cd tests + dotnet new xunit -n StellaOps.Router.Common.Tests + dotnet new xunit -n StellaOps.Gateway.WebService.Tests + dotnet new xunit -n StellaOps.Microservice.Tests + ``` + +### 2.3 Add projects to solution + +At repo root: + +```bash +dotnet sln StellaOps.Router.sln add \ + src/StellaOps.Gateway.WebService/StellaOps.Gateway.WebService.csproj \ + src/__Libraries/StellaOps.Router.Common/StellaOps.Router.Common.csproj \ + src/__Libraries/StellaOps.Router.Config/StellaOps.Router.Config.csproj \ + src/__Libraries/StellaOps.Microservice/StellaOps.Microservice.csproj \ + src/__Libraries/StellaOps.Microservice.SourceGen/StellaOps.Microservice.SourceGen.csproj \ + tests/StellaOps.Router.Common.Tests/StellaOps.Router.Common.Tests.csproj \ + tests/StellaOps.Gateway.WebService.Tests/StellaOps.Gateway.WebService.Tests.csproj \ + tests/StellaOps.Microservice.Tests/StellaOps.Microservice.Tests.csproj +``` + +--- + +## 3. Wire basic project references + +**Owner: skeleton agent** + +The reference graph should be: + +* `StellaOps.Gateway.WebService` + + * references `StellaOps.Router.Common` + * references `StellaOps.Router.Config` + +* `StellaOps.Microservice` + + * references `StellaOps.Router.Common` + * (later) references `StellaOps.Microservice.SourceGen` as analyzer; for now no reference. + +* `StellaOps.Router.Config` + + * references `StellaOps.Router.Common` (for `EndpointDescriptor`, `InstanceDescriptor`, etc.) 
+ +Test projects: + +* `StellaOps.Router.Common.Tests` → `StellaOps.Router.Common` +* `StellaOps.Gateway.WebService.Tests` → `StellaOps.Gateway.WebService` +* `StellaOps.Microservice.Tests` → `StellaOps.Microservice` + +Use `dotnet add reference`: + +```bash +dotnet add src/StellaOps.Gateway.WebService/StellaOps.Gateway.WebService.csproj reference \ + src/__Libraries/StellaOps.Router.Common/StellaOps.Router.Common.csproj \ + src/__Libraries/StellaOps.Router.Config/StellaOps.Router.Config.csproj + +dotnet add src/__Libraries/StellaOps.Microservice/StellaOps.Microservice.csproj reference \ + src/__Libraries/StellaOps.Router.Common/StellaOps.Router.Common.csproj + +dotnet add src/__Libraries/StellaOps.Router.Config/StellaOps.Router.Config.csproj reference \ + src/__Libraries/StellaOps.Router.Common/StellaOps.Router.Common.csproj + +dotnet add tests/StellaOps.Router.Common.Tests/StellaOps.Router.Common.Tests.csproj reference \ + src/__Libraries/StellaOps.Router.Common/StellaOps.Router.Common.csproj + +dotnet add tests/StellaOps.Gateway.WebService.Tests/StellaOps.Gateway.WebService.Tests.csproj reference \ + src/StellaOps.Gateway.WebService/StellaOps.Gateway.WebService.csproj + +dotnet add tests/StellaOps.Microservice.Tests/StellaOps.Microservice.Tests.csproj reference \ + src/__Libraries/StellaOps.Microservice/StellaOps.Microservice.csproj +``` + +--- + +## 4. Set common build settings + +**Owner: infra agent** + +Add a `Directory.Build.props` at repo root to centralize: + +* Target framework (e.g. `net8.0`). +* Nullable context. +* LangVersion. + +Example (minimal): + +```xml +<Project> + <PropertyGroup> + <TargetFramework>net8.0</TargetFramework> + <Nullable>enable</Nullable> + <LangVersion>preview</LangVersion> + <ImplicitUsings>enable</ImplicitUsings> + </PropertyGroup> +</Project> +``` + +Then, strip redundant `<TargetFramework>` from individual `.csproj` files if desired. + +--- + +## 5. Stub namespaces and “empty” entry points + +**Owner: each project’s agent** + +### 5.1 Common library + +Create empty placeholder types that match the spec names (no logic, just shells) so everything compiles and IntelliSense knows the shapes. 
+ +Example files: + +* `TransportType.cs` +* `FrameType.cs` +* `InstanceHealthStatus.cs` +* `ClaimRequirement.cs` +* `EndpointDescriptor.cs` +* `InstanceDescriptor.cs` +* `ConnectionState.cs` +* `RoutingContext.cs` +* `RoutingDecision.cs` +* `PayloadLimits.cs` +* Interfaces: `IGlobalRoutingState`, `IRoutingPlugin`, `ITransportServer`, `ITransportClient`. + +Each type can be an auto-property-only record/class/enum; no methods yet. + +Example: + +```csharp +namespace StellaOps.Router.Common; + +public enum TransportType +{ + Udp, + Tcp, + Certificate, + RabbitMq +} +``` + +and so on. + +### 5.2 Config library + +Add a minimal `RouterConfig` and `PayloadLimits` class aligned with the spec; again, just properties. + +```csharp +namespace StellaOps.Router.Config; + +public sealed class RouterConfig +{ + public IList<ServiceConfig> Services { get; init; } = new List<ServiceConfig>(); + public PayloadLimits PayloadLimits { get; init; } = new(); +} + +public sealed class ServiceConfig +{ + public string Name { get; init; } = string.Empty; + public string DefaultVersion { get; init; } = "1.0.0"; +} +``` + +No YAML binding, no logic yet. + +### 5.3 Microservice library + +Create: + +* `StellaMicroserviceOptions` with required properties. +* `RouterEndpointConfig` (host/port/transport). +* Extension method `AddStellaMicroservice(...)` with an empty body that just registers options and placeholder services. + +```csharp +namespace StellaOps.Microservice; + +public sealed class StellaMicroserviceOptions +{ + public string ServiceName { get; set; } = string.Empty; + public string Version { get; set; } = string.Empty; + public string Region { get; set; } = string.Empty; + public string InstanceId { get; set; } = string.Empty; + public IList<RouterEndpointConfig> Routers { get; set; } = new List<RouterEndpointConfig>(); + public string? 
ConfigFilePath { get; set; } +} + +public sealed class RouterEndpointConfig +{ + public string Host { get; set; } = string.Empty; + public int Port { get; set; } + public TransportType TransportType { get; set; } +} +``` + +`AddStellaMicroservice`: + +```csharp +public static class ServiceCollectionExtensions +{ + public static IServiceCollection AddStellaMicroservice( + this IServiceCollection services, + Action<StellaMicroserviceOptions> configure) + { + services.Configure(configure); + // TODO: register internal SDK services in later phases + return services; + } +} +``` + +### 5.4 Microservice.SourceGen + +For now: + +* Leave this as an empty classlib with a placeholder `README.md` stating: + + * “This project will host Roslyn source generators for endpoint discovery. No implementation yet.” + +Don’t hook it as an analyzer until there is content. + +### 5.5 Gateway webservice + +Simplify the scaffolded Web API to minimal: + +* In `Program.cs`: + + * Build a barebones `WebApplication` that: + + * Binds `GatewayNodeConfig` from config. + * Adds controllers or minimal endpoints. + * Runs; no router logic yet. + +Example: + +```csharp +var builder = WebApplication.CreateBuilder(args); + +builder.Services.Configure<GatewayNodeConfig>( + builder.Configuration.GetSection("GatewayNode")); + +builder.Services.AddControllers(); + +var app = builder.Build(); + +app.MapControllers(); // may be empty for now + +app.Run(); +``` + +* Add `GatewayNodeConfig` class in `StellaOps.Gateway.WebService` project. + +--- + +## 6. Make tests compile (even if empty) + +**Owner: test agent** + +For each test project: + +* Reference the appropriate main project (already done). +* Add a single dummy test class so CI passes: + +```csharp +public class SmokeTests +{ + [Fact] + public void SolutionCompiles() + { + Assert.True(true); + } +} +``` + +This is just to ensure the pipeline runs; real tests come later. + +--- + +## 7. 
Add initial CI/build pipeline + +**Owner: infra agent** + +Set up minimal CI (GitHub Actions, GitLab, Azure DevOps, whatever you use): + +* Steps: + + * `dotnet restore` + * `dotnet build StellaOps.Router.sln -c Release` + * `dotnet test StellaOps.Router.sln -c Release` + +No packaging or deployment yet; just compile + tests. + +--- + +## 8. Sanity check & readiness criteria + +Before you let agents move to “Common model implementation” (next phase), confirm: + +* The solution builds cleanly in a clean checkout (`dotnet restore`, `dotnet build`). +* All test projects run and pass (even with dummy tests). +* Namespaces and project names match the spec: + + * `StellaOps.Gateway.WebService` + * `StellaOps.Router.Common` + * `StellaOps.Router.Config` + * `StellaOps.Microservice` +* There is no real business logic yet: + + * No transport logic. + * No routing decisions. + * No reflection or YAML. +* `docs/router/specs.md` is referenced in `docs/router/README.md` as the spec. + +At that point, the skeleton is in place and stable. Next phases can then focus on: + +* Filling in `Common` contracts properly. +* Implementing the in-memory transport. +* Wiring minimal microservice/gateway flows. + +If you want, I can outline the next phase (“implement core model + in-memory transport”) with a similar task breakdown next. diff --git a/docs/router/02-Step.md b/docs/router/02-Step.md new file mode 100644 index 000000000..b3e09bc09 --- /dev/null +++ b/docs/router/02-Step.md @@ -0,0 +1,375 @@ +For this step, the goal is: make `StellaOps.Router.Common` the single, stable contract layer that everything else can depend on, with **no behavior** yet, just shapes. After this, gateway, microservice SDK, transports, and config can all compile against it. + +Think of this as “lock down the domain vocabulary”. + +--- + +## 0. Pre-work + +**All devs touching Common:** + +1. 
Read `docs/router/specs.md`, specifically: + + * The sections describing: + + * Enums (`TransportType`, `FrameType`, `InstanceHealthStatus`, etc.). + * Endpoint/instance/routing models. + * Frames and request/response correlation. + * Routing state and routing plugin. +2. Agree that no class/interface will be added to Common if it isn’t in the spec (or discussed with you and then added to the spec). + +--- + +## 1. Inventory and file layout + +**Owner: “Common” lead** + +1. From `specs.md`, extract a **type inventory** for `StellaOps.Router.Common`: + + Enumerations: + + * `TransportType` + * `FrameType` + * `InstanceHealthStatus` + + Core value objects: + + * `ClaimRequirement` + * `EndpointDescriptor` + * `InstanceDescriptor` + * `ConnectionState` + * `PayloadLimits` (if used from Common; otherwise keep in Config only) + * Any small value types you’ve defined (e.g. cancel payload, ping metrics etc. if present in specs). + + Routing: + + * `RoutingContext` + * `RoutingDecision` + + Frames: + + * `Frame` (type + correlation id + payload) + * Optional payload contracts for HELLO, HEARTBEAT, ENDPOINTS_UPDATE, etc., if you’ve specified them explicitly. + + Abstractions/interfaces: + + * `IGlobalRoutingState` + * `IRoutingPlugin` + * `ITransportServer` + * `ITransportClient` + * Optional: `IRegionProvider` if you kept it in the spec. + +2. Propose a file layout inside `src/__Libraries/StellaOps.Router.Common`: + + Example: + + ```text + /StellaOps.Router.Common + /Enums + TransportType.cs + FrameType.cs + InstanceHealthStatus.cs + /Models + ClaimRequirement.cs + EndpointDescriptor.cs + InstanceDescriptor.cs + ConnectionState.cs + RoutingContext.cs + RoutingDecision.cs + Frame.cs + /Abstractions + IGlobalRoutingState.cs + IRoutingPlugin.cs + ITransportClient.cs + ITransportServer.cs + IRegionProvider.cs (if used) + ``` + +3. Get a quick 👍/👎 from you on the layout (no code yet, just file names and namespaces). + +--- + +## 2. 
Implement enums and basic models + +**Owner: Common dev** + +Scope: simple, immutable models, no methods. + +1. **Enums** + + Implement: + + * `TransportType` with `[Udp, Tcp, Certificate, RabbitMq]`. + * `FrameType` with: + + * `Hello`, `Heartbeat`, `EndpointsUpdate`, `Request`, `RequestStreamData`, `Response`, `ResponseStreamData`, `Cancel` (and any others in specs). + * `InstanceHealthStatus` with: + + * `Unknown`, `Healthy`, `Degraded`, `Draining`, `Unhealthy`. + + All enums live under `namespace StellaOps.Router.Common;`. + +2. **Value models** + + Implement as plain classes/records with auto-properties: + + * `ClaimRequirement`: + + * `string Type` (required). + * `string? Value` (optional). + * `EndpointDescriptor`: + + * `string ServiceName` + * `string Version` + * `string Method` + * `string Path` + * `TimeSpan DefaultTimeout` + * `bool SupportsStreaming` + * `IReadOnlyList<ClaimRequirement> RequiringClaims` + * `InstanceDescriptor`: + + * `string InstanceId` + * `string ServiceName` + * `string Version` + * `string Region` + * `ConnectionState`: + + * `string ConnectionId` + * `InstanceDescriptor Instance` + * `InstanceHealthStatus Status` + * `DateTime LastHeartbeatUtc` + * `double AveragePingMs` + * `TransportType TransportType` + * `IReadOnlyDictionary<(string Method, string Path), EndpointDescriptor> Endpoints` + + Design choices: + + * Make constructors minimal (empty constructors okay for now). + * Use `init` where reasonable to encourage immutability for descriptors; `ConnectionState` can have mutable health fields. + +3. **PayloadLimits (if in Common)** + + If the spec places `PayloadLimits` in Common (versus Config), implement: + + ```csharp + public sealed class PayloadLimits + { + public long MaxRequestBytesPerCall { get; set; } + public long MaxRequestBytesPerConnection { get; set; } + public long MaxAggregateInflightBytes { get; set; } + } + ``` + + If it’s defined in Config only, leave it there and avoid duplication. + +--- + +## 3. 
Implement frame & correlation model + +**Owner: Common dev** + +1. Implement `Frame`: + + ```csharp + public sealed class Frame + { + public FrameType Type { get; init; } + public Guid CorrelationId { get; init; } + public byte[] Payload { get; init; } = Array.Empty<byte>(); + } + ``` + +2. If `specs.md` defines specific payload DTOs (e.g. `HelloPayload`, `HeartbeatPayload`, `CancelPayload`), define them too: + + * `HelloPayload`: + + * `InstanceDescriptor` and list of `EndpointDescriptor`s, or the equivalent properties. + * `HeartbeatPayload`: + + * `InstanceId`, `Status`, metrics. + * `CancelPayload`: + + * `string Reason` or similar. + + Keep them as simple DTOs with no logic. + +3. Do **not** implement serialization yet (no JSON/MessagePack references here); Common should only define shapes. + +--- + +## 4. Routing abstractions + +**Owner: Common dev** + +Implement the routing interface + context & decision types. + +1. `RoutingContext`: + + * Match the spec. If your `specs.md` version includes `HttpContext`, follow it; if you intentionally kept Common free of ASP.NET types, use a neutral context (e.g. method/path/headers/principal). + * For now, if `HttpContext` is included in spec, define: + + ```csharp + public sealed class RoutingContext + { + public object HttpContext { get; init; } = default!; // or Microsoft.AspNetCore.Http.HttpContext if allowed + public EndpointDescriptor Endpoint { get; init; } = default!; + public string GatewayRegion { get; init; } = string.Empty; + } + ``` + + Then you can refine the type once you finalize whether Common can reference ASP.NET packages. If you want to avoid that now, define your own lightweight context model and let gateway adapt. + +2. `RoutingDecision`: + + * Must include: + + * `EndpointDescriptor Endpoint` + * `ConnectionState Connection` + * `TransportType TransportType` + * `TimeSpan EffectiveTimeout` + +3. 
`IGlobalRoutingState`: + + Interface only, no implementation: + + ```csharp + public interface IGlobalRoutingState + { + EndpointDescriptor? ResolveEndpoint(string method, string path); + + IReadOnlyList<ConnectionState> GetConnectionsFor( + string serviceName, + string version, + string method, + string path); + } + ``` + +4. `IRoutingPlugin`: + + * Single method: + + ```csharp + public interface IRoutingPlugin + { + Task<RoutingDecision?> ChooseInstanceAsync( + RoutingContext context, + CancellationToken cancellationToken); + } + ``` + + * No logic; just interface. + +--- + +## 5. Transport abstractions + +**Owner: Common dev** + +Implement the shared transport contracts. + +1. `ITransportServer`: + + ```csharp + public interface ITransportServer + { + Task StartAsync(CancellationToken cancellationToken); + Task StopAsync(CancellationToken cancellationToken); + } + ``` + +2. `ITransportClient`: + + Per spec, you need: + + * A buffered call (request → response). + * A streaming call. + * A cancel call. + + Interfaces only; content roughly: + + ```csharp + public interface ITransportClient + { + Task<Frame> SendRequestAsync( + ConnectionState connection, + Frame requestFrame, + TimeSpan timeout, + CancellationToken cancellationToken); + + Task SendCancelAsync( + ConnectionState connection, + Guid correlationId, + string? reason = null); + + Task SendStreamingAsync( + ConnectionState connection, + Frame requestHeader, + Stream requestBody, + Func<Stream, Task> readResponseBody, + PayloadLimits limits, + CancellationToken cancellationToken); + } + ``` + + No implementation or transport-specific logic here. No network types beyond `Stream` and `Task`. + +3. `IRegionProvider` (if you decided to keep it): + + ```csharp + public interface IRegionProvider + { + string Region { get; } + } + ``` + +--- + +## 6. Wire Common into tests (sanity checks only) + +**Owner: Common tests dev** + +Create a few very simple unit tests in `StellaOps.Router.Common.Tests`: + +1. 
**Shape tests** (these are mostly compile-time): + + * That `EndpointDescriptor` has the expected properties and default values can be set. + * That `ConnectionState` can be constructed and that its `Endpoints` dictionary handles `(Method, Path)` keys. + +2. **Enum completeness tests**: + + * Assert that `Enum.GetValues(typeof(FrameType))` contains all expected values. This catches accidental changes. + +3. **No behavior yet**: + + * No routing algorithms or transport behavior tests here; just that model contracts behave like dumb DTOs (e.g. property assignment, default value semantics). + +This is mostly to lock in the shape and catch accidental refactors later. + +--- + +## 7. Cleanliness & review checklist + +Before you move on to the in-memory transport and gateway/microservice wiring, check: + +1. `StellaOps.Router.Common`: + + * Compiles with zero warnings (nullable enabled). + * Only references BCL; no ASP.NET or serializer packages unless intentionally agreed in the spec. + +2. All types listed in `specs.md` under the Common section exist and match names & property sets. + +3. No behavior/logic: + + * No LINQ-heavy methods. + * No routing algorithm code. + * No network code. + * No YAML/JSON or serialization. + +4. `StellaOps.Router.Common.Tests` runs and passes. + +5. `docs/router/specs.md` is updated if there was any discrepancy (or the code is updated to match the spec, not the other way around). + +--- + +If you want the next step, I can outline “3. Build in-memory transport + minimal HELLO/REQUEST/RESPONSE wiring” in the same style, so agents can move from contracts to a working vertical slice. diff --git a/docs/router/03-Step.md b/docs/router/03-Step.md new file mode 100644 index 000000000..acb6c0be8 --- /dev/null +++ b/docs/router/03-Step.md @@ -0,0 +1,144 @@ +For this step, you’re not writing any real logic yet – you’re just making sure the projects depend on each other in the right direction so future work doesn’t turn into spaghetti. 
+ +Think of it as locking in the dependency graph. + +--- + +## 1. Pin the desired dependency graph + +First, make explicit what is allowed to depend on what. + +Target graph: + +* `StellaOps.Router.Common` + + * Lowest layer. + * **No** project references to any other StellaOps projects. + +* `StellaOps.Router.Config` + + * References: + + * `StellaOps.Router.Common`. + +* `StellaOps.Microservice` + + * References: + + * `StellaOps.Router.Common`. + +* `StellaOps.Microservice.SourceGen` + + * For now: no references, or only to Common if needed for types in generated code. + * Later: will be consumed as an analyzer by `StellaOps.Microservice`, not via normal project reference. + +* `StellaOps.Gateway.WebService` + + * References: + + * `StellaOps.Router.Common` + * `StellaOps.Router.Config`. + +Test projects: + +* `StellaOps.Router.Common.Tests` → `StellaOps.Router.Common` +* `StellaOps.Gateway.WebService.Tests` → `StellaOps.Gateway.WebService` +* `StellaOps.Microservice.Tests` → `StellaOps.Microservice` + +Explicitly: there should be **no** circular references, and nothing should reference the Gateway from libraries. + +--- + +## 2. 
Add the project references + +From repo root, for each needed edge: + +```bash +# Gateway → Common + Config +dotnet add src/StellaOps.Gateway.WebService/StellaOps.Gateway.WebService.csproj reference \ + src/__Libraries/StellaOps.Router.Common/StellaOps.Router.Common.csproj \ + src/__Libraries/StellaOps.Router.Config/StellaOps.Router.Config.csproj + +# Microservice → Common +dotnet add src/__Libraries/StellaOps.Microservice/StellaOps.Microservice.csproj reference \ + src/__Libraries/StellaOps.Router.Common/StellaOps.Router.Common.csproj + +# Config → Common +dotnet add src/__Libraries/StellaOps.Router.Config/StellaOps.Router.Config.csproj reference \ + src/__Libraries/StellaOps.Router.Common/StellaOps.Router.Common.csproj + +# Tests → main projects +dotnet add tests/StellaOps.Router.Common.Tests/StellaOps.Router.Common.Tests.csproj reference \ + src/__Libraries/StellaOps.Router.Common/StellaOps.Router.Common.csproj + +dotnet add tests/StellaOps.Gateway.WebService.Tests/StellaOps.Gateway.WebService.Tests.csproj reference \ + src/StellaOps.Gateway.WebService/StellaOps.Gateway.WebService.csproj + +dotnet add tests/StellaOps.Microservice.Tests/StellaOps.Microservice.Tests.csproj reference \ + src/__Libraries/StellaOps.Microservice/StellaOps.Microservice.csproj +``` + +Do **not** add any references: + +* From `Common` → anything. +* From `Config` → Gateway or Microservice. +* From `Microservice` → Gateway. +* From tests → libraries other than their primary target (unless you explicitly want shared test utils later). + +--- + +## 3. Verify the .csproj contents + +Have one agent open each `.csproj` and confirm: + +* `StellaOps.Router.Common.csproj` + + * No `<ProjectReference>` elements. + +* `StellaOps.Router.Config.csproj` + + * Exactly one `<ProjectReference>`: Common. + +* `StellaOps.Microservice.csproj` + + * Exactly one `<ProjectReference>`: Common. + +* `StellaOps.Microservice.SourceGen.csproj` + + * No project references for now (we’ll convert it to a proper analyzer / source-generator package later). 
+ +* `StellaOps.Gateway.WebService.csproj` + + * Exactly two `<ProjectReference>`s: Common + Config. + * No reference to Microservice. + +* Test projects: + + * Each test project references only its corresponding main project (no cross-test coupling). + +If anything else is present (e.g. leftover references from templates), remove them. + +--- + +## 4. Run a full build & test as a sanity check + +From repo root: + +```bash +dotnet restore +dotnet build StellaOps.Router.sln -c Debug +dotnet test StellaOps.Router.sln -c Debug +``` + +Acceptance criteria for this step: + +* Solution builds without reference errors. +* All test projects compile and run (even if they only have dummy tests). +* IntelliSense / navigation in IDE shows: + + * Gateway can see Common & Config types. + * Microservice can see Common types. + * Config can see Common types. + * No library can see Gateway unless through tests. + +Once this is stable, your devs can safely move on to implementing the Common model and know they won’t have to rewrite references later. diff --git a/docs/router/04-Step.md b/docs/router/04-Step.md new file mode 100644 index 000000000..204c11c30 --- /dev/null +++ b/docs/router/04-Step.md @@ -0,0 +1,520 @@ +For this step, the goal is: a microservice that can: + +* Start up with `AddStellaMicroservice(...)` +* Discover its endpoints from attributes +* Connect to the router (via InMemory transport) +* Send a HELLO with identity + endpoints +* Receive a REQUEST and return a RESPONSE + +No streaming, no cancellation, no heartbeat yet. Pure minimal handshake & dispatch. + +--- + +## 0. Preconditions + +Before your agents start this step, you should have: + +* `StellaOps.Router.Common` contracts in place (enums, `EndpointDescriptor`, `ConnectionState`, `Frame`, etc.). +* The solution skeleton and project references configured. +* A **stub** InMemory transport “router harness” (at least a place to park the future InMemory transport). 
Even if it’s not fully implemented, assume it will expose: + + * A way for a microservice to “connect” and register itself. + * A way to deliver frames from router to microservice and back. + +If InMemory isn’t built yet, the microservice code should be written *against abstractions* so you can plug it in later. + +--- + +## 1. Define microservice public surface (SDK contract) + +**Project:** `__Libraries/StellaOps.Microservice` +**Owner:** microservice SDK agent + +Purpose: give product teams a stable way to define services and endpoints without caring about transports. + +### 1.1 Options + +Make sure `StellaMicroserviceOptions` matches the spec: + +```csharp +public sealed class StellaMicroserviceOptions +{ + public string ServiceName { get; set; } = string.Empty; + public string Version { get; set; } = string.Empty; + public string Region { get; set; } = string.Empty; + public string InstanceId { get; set; } = string.Empty; + + public IList<RouterEndpointConfig> Routers { get; set; } = new List<RouterEndpointConfig>(); + + public string? ConfigFilePath { get; set; } +} + +public sealed class RouterEndpointConfig +{ + public string Host { get; set; } = string.Empty; + public int Port { get; set; } + public TransportType TransportType { get; set; } +} +``` + +`Routers` is mandatory: without at least one router configured, the SDK should refuse to start (that policy can be enforced later, in the handshake stage). 
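The "refuse to start without a router" policy could be enforced with a small guard. The following is a minimal sketch under stated assumptions, not something the spec defines: `RouterGuard.EnsureRoutersConfigured` is a hypothetical helper, and the option types are trimmed copies of the ones above so that the snippet runs on its own.

```csharp
using System;
using System.Collections.Generic;

var options = new StellaMicroserviceOptions { ServiceName = "billing" };

try
{
    // No routers are configured yet, so the guard throws.
    RouterGuard.EnsureRoutersConfigured(options);
}
catch (InvalidOperationException ex)
{
    Console.WriteLine($"Startup refused: {ex.Message}");
}

options.Routers.Add(new RouterEndpointConfig { Host = "localhost", Port = 5100 });
RouterGuard.EnsureRoutersConfigured(options); // passes once a router is configured
Console.WriteLine("Router configuration accepted.");

// Hypothetical helper (assumption): the spec only says the SDK should refuse to start.
public static class RouterGuard
{
    public static void EnsureRoutersConfigured(StellaMicroserviceOptions options)
    {
        if (options.Routers is null || options.Routers.Count == 0)
        {
            throw new InvalidOperationException(
                $"Service '{options.ServiceName}' needs at least one entry in Routers.");
        }
    }
}

// Trimmed copies of the SDK option types, included only to keep the sketch self-contained.
public sealed class StellaMicroserviceOptions
{
    public string ServiceName { get; set; } = string.Empty;
    public IList<RouterEndpointConfig> Routers { get; set; } = new List<RouterEndpointConfig>();
}

public sealed class RouterEndpointConfig
{
    public string Host { get; set; } = string.Empty;
    public int Port { get; set; }
}
```

A guard like this would most naturally be called from the handshake hosted service before it tries to connect, so a misconfigured service fails fast with a readable message.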
+ +### 1.2 Public endpoint abstractions + +Define: + +* Attribute for endpoint identity: + +```csharp +[AttributeUsage(AttributeTargets.Class, AllowMultiple = true)] +public sealed class StellaEndpointAttribute : Attribute +{ + public string Method { get; } + public string Path { get; } + + public StellaEndpointAttribute(string method, string path) + { + Method = method; + Path = path; + } +} +``` + +* Raw handler: + +```csharp +public sealed class RawRequestContext +{ + public string Method { get; init; } = string.Empty; + public string Path { get; init; } = string.Empty; + public IReadOnlyDictionary<string, string> Headers { get; init; } = + new Dictionary<string, string>(); + public Stream Body { get; init; } = Stream.Null; + public CancellationToken CancellationToken { get; init; } +} + +public sealed class RawResponse +{ + public int StatusCode { get; set; } = 200; + public IDictionary<string, string> Headers { get; } = + new Dictionary<string, string>(); + public Func<Stream, Task>? WriteBodyAsync { get; set; } // may be null +} + +public interface IRawStellaEndpoint +{ + Task<RawResponse> HandleAsync(RawRequestContext ctx); +} +``` + +* Typed convenience interfaces (used later, but define now): + +```csharp +public interface IStellaEndpoint<TRequest, TResponse> +{ + Task<TResponse> HandleAsync(TRequest request, CancellationToken ct); +} + +public interface IStellaEndpoint<TResponse> +{ + Task<TResponse> HandleAsync(CancellationToken ct); +} +``` + +At this step, you don’t need to implement adapters yet, but the signatures must be fixed. 
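To see how these pieces fit together, here is a minimal sketch of a raw handler being invoked directly. `PingEndpoint` and the `/ping` route are invented for illustration. The SDK types are redeclared so the snippet runs on its own, and their generic arguments (`string` header values, `Func<Stream, Task>`, `Task<RawResponse>`) are assumptions rather than confirmed spec details.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Invoke the handler directly, roughly the way a dispatcher eventually would.
IRawStellaEndpoint endpoint = new PingEndpoint();
RawResponse response = await endpoint.HandleAsync(
    new RawRequestContext { Method = "GET", Path = "/ping" });

using var body = new MemoryStream();
if (response.WriteBodyAsync is not null)
{
    await response.WriteBodyAsync(body);
}
Console.WriteLine($"{response.StatusCode} {Encoding.UTF8.GetString(body.ToArray())}");

// Hypothetical endpoint, not part of the spec.
[StellaEndpoint("GET", "/ping")]
public sealed class PingEndpoint : IRawStellaEndpoint
{
    public Task<RawResponse> HandleAsync(RawRequestContext ctx)
    {
        byte[] payload = Encoding.UTF8.GetBytes("pong");
        var response = new RawResponse
        {
            StatusCode = 200,
            // The body is written lazily, when the caller supplies a target stream.
            WriteBodyAsync = stream => stream.WriteAsync(payload, 0, payload.Length),
        };
        response.Headers["Content-Type"] = "text/plain";
        return Task.FromResult(response);
    }
}

// Copies of the SDK types; generic arguments are assumed for this sketch.
[AttributeUsage(AttributeTargets.Class, AllowMultiple = true)]
public sealed class StellaEndpointAttribute : Attribute
{
    public string Method { get; }
    public string Path { get; }

    public StellaEndpointAttribute(string method, string path)
    {
        Method = method;
        Path = path;
    }
}

public sealed class RawRequestContext
{
    public string Method { get; init; } = string.Empty;
    public string Path { get; init; } = string.Empty;
    public IReadOnlyDictionary<string, string> Headers { get; init; } =
        new Dictionary<string, string>();
    public Stream Body { get; init; } = Stream.Null;
    public CancellationToken CancellationToken { get; init; }
}

public sealed class RawResponse
{
    public int StatusCode { get; set; } = 200;
    public IDictionary<string, string> Headers { get; } = new Dictionary<string, string>();
    public Func<Stream, Task>? WriteBodyAsync { get; set; }
}

public interface IRawStellaEndpoint
{
    Task<RawResponse> HandleAsync(RawRequestContext ctx);
}
```

Keeping the body as a write callback rather than a byte array means the same contract can later carry streamed responses without changing the handler signature.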
+ +### 1.3 Registration extension + +Extend `AddStellaMicroservice` to wire options + a few internal services: + +```csharp +public static class ServiceCollectionExtensions +{ + public static IServiceCollection AddStellaMicroservice( + this IServiceCollection services, + Action<StellaMicroserviceOptions> configure) + { + services.Configure(configure); + + services.AddSingleton<IEndpointCatalog, EndpointCatalog>(); // to be implemented + services.AddSingleton<IEndpointDispatcher, EndpointDispatcher>(); // to be implemented + + services.AddHostedService<MicroserviceHostedService>(); // handshake loop; type name is a placeholder + + return services; + } +} +``` + +This still compiles with empty implementations; you fill them in next steps. + +--- + +## 2. Endpoint discovery (reflection only for now) + +**Project:** `StellaOps.Microservice` +**Owner:** SDK agent + +Goal: given the entry assembly, build: + +* A list of `EndpointDescriptor` objects (from Common). +* A mapping `(Method, Path) -> handler type` used for dispatch. + +### 2.1 Internal types + +Define an internal representation: + +```csharp +internal sealed class EndpointRegistration +{ + public EndpointDescriptor Descriptor { get; init; } = default!; + public Type HandlerType { get; init; } = default!; +} +``` + +Define an interface for discovery: + +```csharp +internal interface IEndpointDiscovery +{ + IReadOnlyList<EndpointRegistration> DiscoverEndpoints(StellaMicroserviceOptions options); +} +``` + +### 2.2 Implement reflection-based discovery + +Create `ReflectionEndpointDiscovery`: + +* Scan the entry assembly (and optionally referenced assemblies) for classes that: + + * Have `StellaEndpointAttribute`. + * Implement either: + + * `IRawStellaEndpoint`, or + * `IStellaEndpoint<,>`, or + * `IStellaEndpoint<>`. + +* For each `[StellaEndpoint]` usage: + + * Create `EndpointDescriptor` with: + + * `ServiceName` = `options.ServiceName`. + * `Version` = `options.Version`. + * `Method`, `Path` from attribute. + * `DefaultTimeout` = some sensible default (e.g. `TimeSpan.FromSeconds(30)`; refine later). + * `SupportsStreaming` = `false` (for now). 
+ * `RequiringClaims` = empty array (for now). + + * Create `EndpointRegistration` with `Descriptor` + `HandlerType`. + +* Return the list. + +Wire it into DI: + +```csharp +services.AddSingleton<IEndpointDiscovery, ReflectionEndpointDiscovery>(); +``` + +--- + +## 3. Endpoint catalog & dispatcher (microservice internal) + +**Project:** `StellaOps.Microservice` +**Owner:** SDK agent + +Goal: have in place: + +* A catalog holding endpoints and descriptors. +* A dispatcher that takes frames and calls handlers. + +### 3.1 Endpoint catalog + +Define: + +```csharp +internal interface IEndpointCatalog +{ + IReadOnlyList<EndpointDescriptor> Descriptors { get; } + bool TryGetHandler(string method, string path, out EndpointRegistration endpoint); +} + +internal sealed class EndpointCatalog : IEndpointCatalog +{ + private readonly Dictionary<(string Method, string Path), EndpointRegistration> _map; + public IReadOnlyList<EndpointDescriptor> Descriptors { get; } + + public EndpointCatalog(IEndpointDiscovery discovery, + IOptions<StellaMicroserviceOptions> optionsAccessor) + { + var options = optionsAccessor.Value; + var registrations = discovery.DiscoverEndpoints(options); + + _map = registrations.ToDictionary( + r => Key(r.Descriptor.Method, r.Descriptor.Path), + r => r); + + Descriptors = registrations.Select(r => r.Descriptor).ToArray(); + } + + public bool TryGetHandler(string method, string path, out EndpointRegistration endpoint) => + _map.TryGetValue(Key(method, path), out endpoint!); + + // A StringComparer cannot compare tuple keys, so normalize casing up front instead. + private static (string Method, string Path) Key(string method, string path) => + (method.ToUpperInvariant(), path.ToLowerInvariant()); +} +``` + +You can refine path normalization later; for now, keep it simple. + +### 3.2 Endpoint dispatcher + +Define: + +```csharp +internal interface IEndpointDispatcher +{ + Task<Frame> HandleRequestAsync(Frame requestFrame, CancellationToken ct); +} +``` + +Implement `EndpointDispatcher` with minimal behavior: + +1. Decode `requestFrame.Payload` into a small DTO carrying: + + * Method + * Path + * Headers (if you already have a format; if not, assume no headers in v0) + * Body bytes + + For this step, you can stub decoding as: + + * Payload = raw body bytes.
+ * Method/Path are carried separately in frame header or in a simple DTO; decide a minimal interim format and write it down. + +2. Use `IEndpointCatalog.TryGetHandler(method, path, ...)`: + + * If not found: + + * Build a `RawResponse` with status 404 and empty body. + +3. If handler implements `IRawStellaEndpoint`: + + * Instantiate via DI (`IServiceProvider.GetRequiredService(handlerType)`). + * Build `RawRequestContext` with: + + * Method, Path, Headers, Body (`new MemoryStream(bodyBytes)` for now). + * `CancellationToken` = `ct`. + * Call `HandleAsync`. + * Convert `RawResponse` into a response frame payload. + +4. If handler implements `IStellaEndpoint<,>` (typed): + + * For now, **you can skip typed handling** or wire a very simple JSON-based adapter if you want to unlock it early. The focus in this step is the raw path; typed adapters can come in the next iteration. + +Return a `Frame` with: + +* `Type = FrameType.Response` +* `CorrelationId` = `requestFrame.CorrelationId` +* `Payload` = encoded response (status + body bytes). + +No streaming, no cancellation logic beyond passing `ct` through — router won’t cancel yet. + +--- + +## 4. Minimal handshake hosted service (using InMemory) + +**Project:** `StellaOps.Microservice` +**Owner:** SDK agent + +This is where the microservice actually “talks” to the router. + +### 4.1 Define a microservice connection abstraction + +Your SDK should not depend directly on InMemory; define an internal abstraction: + +```csharp +internal interface IMicroserviceConnection +{ + Task StartAsync(CancellationToken ct); + Task StopAsync(CancellationToken ct); +} +``` + +The implementation for this step will target the InMemory transport; later you can add TCP/TLS/RabbitMQ versions. 
+ +### 4.2 Implement InMemory microservice connection + +Assuming you have or will have an `IInMemoryRouterClient` (or similar) dev harness, implement: + +```csharp +internal sealed class InMemoryMicroserviceConnection : IMicroserviceConnection +{ + private readonly IEndpointCatalog _catalog; + private readonly IEndpointDispatcher _dispatcher; + private readonly IOptions<StellaMicroserviceOptions> _options; + private readonly IInMemoryRouterClient _routerClient; // dev-only abstraction + + public InMemoryMicroserviceConnection( + IEndpointCatalog catalog, + IEndpointDispatcher dispatcher, + IOptions<StellaMicroserviceOptions> options, + IInMemoryRouterClient routerClient) + { + _catalog = catalog; + _dispatcher = dispatcher; + _options = options; + _routerClient = routerClient; + } + + public async Task StartAsync(CancellationToken ct) + { + var opts = _options.Value; + + // Build HELLO payload from options + catalog.Descriptors + var helloPayload = BuildHelloPayload(opts, _catalog.Descriptors); + + await _routerClient.ConnectAsync(opts, ct); + await _routerClient.SendHelloAsync(helloPayload, ct); + + // Start background receive loop. + // Note: ct is the startup token; a dedicated CancellationTokenSource cancelled in StopAsync would be more robust. + _ = Task.Run(() => ReceiveLoopAsync(ct), ct); + } + + public Task StopAsync(CancellationToken ct) + { + // For now: ask routerClient to disconnect; finer handling later + return _routerClient.DisconnectAsync(ct); + } + + private async Task ReceiveLoopAsync(CancellationToken ct) + { + await foreach (var frame in _routerClient.GetIncomingFramesAsync(ct)) + { + if (frame.Type == FrameType.Request) + { + var response = await _dispatcher.HandleRequestAsync(frame, ct); + await _routerClient.SendFrameAsync(response, ct); + } + else + { + // Ignore other frame types in this minimal step + } + } + } +} +``` + +`IInMemoryRouterClient` is whatever dev harness you build for the in-memory transport; the exact shape is not important for this step’s planning, only that it provides: + +* `ConnectAsync` +* `SendHelloAsync` +* `GetIncomingFramesAsync` (async stream of frames) +* `SendFrameAsync` for responses
+* `DisconnectAsync` + +### 4.3 Hosted service to bootstrap the connection + +Implement `MicroserviceBootstrapHostedService`: + +```csharp +internal sealed class MicroserviceBootstrapHostedService : IHostedService +{ + private readonly IMicroserviceConnection _connection; + + public MicroserviceBootstrapHostedService(IMicroserviceConnection connection) + { + _connection = connection; + } + + public Task StartAsync(CancellationToken cancellationToken) => + _connection.StartAsync(cancellationToken); + + public Task StopAsync(CancellationToken cancellationToken) => + _connection.StopAsync(cancellationToken); +} +``` + +Wire `IMicroserviceConnection` to `InMemoryMicroserviceConnection` in DI for now: + +```csharp +services.AddSingleton<IMicroserviceConnection, InMemoryMicroserviceConnection>(); +``` + +In a later phase, you’ll swap this to transport-specific connectors. + +--- + +## 5. End-to-end smoke test (InMemory only) + +**Project:** `StellaOps.Microservice.Tests` + a minimal InMemory router test harness +**Owner:** test agent + +Goal: prove that minimal handshake & dispatch work in memory. + +1. Build a trivial test microservice: + + * Define a handler: + + ```csharp + [StellaEndpoint("GET", "/ping")] + public sealed class PingEndpoint : IRawStellaEndpoint + { + public Task<RawResponse> HandleAsync(RawRequestContext ctx) + { + var resp = new RawResponse { StatusCode = 200 }; + resp.Headers["Content-Type"] = "text/plain"; + // The single-argument WriteAsync overload returns ValueTask, so convert it to Task. + resp.WriteBodyAsync = stream => stream.WriteAsync( + Encoding.UTF8.GetBytes("pong")).AsTask(); + return Task.FromResult(resp); + } + } + ``` + +2. Test harness: + + * Spin up: + + * An instance of the microservice host (generic HostBuilder). + * An in-memory “router” that: + + * Accepts HELLO from the microservice. + * Sends a single REQUEST frame for `GET /ping`. + * Receives the RESPONSE frame. + +3. Assert: + + * The HELLO includes the `/ping` endpoint. + * The REQUEST is dispatched to `PingEndpoint`. + * The RESPONSE has status 200 and body “pong”.
+ +This verifies that: + +* `AddStellaMicroservice` wires discovery, catalog, dispatcher, bootstrap. +* The microservice sends HELLO on connect. +* The microservice can handle at least one request via InMemory. + +--- + +## 6. Done criteria for “minimal handshake & dispatch” + +You can consider this step complete when: + +* `StellaOps.Microservice` exposes: + + * Options. + * Attribute & handler interfaces (raw + typed). + * `AddStellaMicroservice` registering discovery, catalog, dispatcher, and hosted service. +* The microservice can: + + * Discover endpoints via reflection. + * Build a `HELLO` payload and send it over InMemory on startup. + * Receive a `REQUEST` frame over InMemory. + * Dispatch that request to the correct handler. + * Return a `RESPONSE` frame. + +Not yet required in this step: + +* Streaming bodies. +* Heartbeats or health evaluation. +* Cancellation via CANCEL frames. +* Authority overrides for requiringClaims. + +Those come in subsequent phases; right now you just want a working minimal vertical slice: InMemory microservice that says “HELLO” and responds to one simple request. diff --git a/docs/router/05-Step.md b/docs/router/05-Step.md new file mode 100644 index 000000000..5beb94f59 --- /dev/null +++ b/docs/router/05-Step.md @@ -0,0 +1,554 @@ +For this step, the goal is: the gateway can accept an HTTP request, route it to **one** microservice over the **InMemory** transport, get a response, and return it to the client. + +No health/heartbeat yet. No streaming yet. Just: HTTP → InMemory → microservice → InMemory → HTTP. + +I’ll assume you’re still in the InMemory world and not touching TCP/UDP/RabbitMQ at this stage. + +--- + +## 0. Preconditions + +Before you start: + +* `StellaOps.Router.Common` exists and exposes: + + * `EndpointDescriptor`, `ConnectionState`, `Frame`, `FrameType`, `TransportType`, `RoutingDecision`. + * Interfaces: `IGlobalRoutingState`, `IRoutingPlugin`, `ITransportClient`. 
+* `StellaOps.Microservice` minimal handshake & dispatch is in place (from your “step 4”): + + * Microservice can: + + * Discover endpoints. + * Connect to an InMemory router client. + * Send HELLO. + * Receive REQUEST and send RESPONSE. +* Gateway project exists (`StellaOps.Gateway.WebService`) and runs as a basic ASP.NET Core app. + +If anything in that list is not true, fix it first or adjust the plan accordingly. + +--- + +## 1. Implement an InMemory transport “hub” + +You need a simple in-process component that: + +* Keeps track of “connections” from microservices. +* Delivers frames from the gateway to the correct microservice and back. + +You can host this either: + +* In a dedicated **test/support** assembly, or +* In the gateway project but marked as “dev-only” transport. + +For this step, keep it simple and in-memory. + +### 1.1 Define an InMemory router hub + +Conceptually: + +```csharp +public interface IInMemoryRouterHub +{ + // Called by microservice side to register a new connection; returns the connection ID + Task<string> RegisterMicroserviceAsync( + InstanceDescriptor instance, + IReadOnlyList<EndpointDescriptor> endpoints, + Func<Frame, Task> onFrameFromGateway, + CancellationToken ct); + + // Called by microservice when it wants to send a frame to the gateway + Task SendFromMicroserviceAsync(string connectionId, Frame frame, CancellationToken ct); + + // Called by gateway transport client when sending a frame to a microservice; completes with the response frame + Task<Frame> SendFromGatewayAsync(string connectionId, Frame frame, CancellationToken ct); +} +``` + +Internally, the hub maintains per-connection data: + +* `ConnectionId` +* `InstanceDescriptor` +* Endpoints +* Delegate `onFrameFromGateway` (microservice receiver) + +For minimal routing you can start by: + +* Only supporting `SendFromGatewayAsync` for REQUEST and returning RESPONSE. +* Ignoring or stubbing heartbeat frames.
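The per-connection bookkeeping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the hub the plan mandates: the `Frame`, `FrameType`, `InstanceDescriptor`, and `EndpointDescriptor` records are simplified stand-ins for the Common types, and pairing requests with responses through a `TaskCompletionSource` keyed by `CorrelationId` is one possible approach.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Simplified stand-ins for the StellaOps.Router.Common types; the real shapes may differ.
public enum FrameType : byte { Hello = 1, Heartbeat = 2, Request = 4, Response = 6 }
public sealed record Frame(FrameType Type, Guid CorrelationId, byte[] Payload);
public sealed record InstanceDescriptor(string ServiceName, string Version, string InstanceId);
public sealed record EndpointDescriptor(string Method, string Path);

// Hypothetical hub: pairs REQUEST frames from the gateway with RESPONSE frames
// from the microservice by CorrelationId.
public sealed class InMemoryRouterHub
{
    private sealed record Connection(
        InstanceDescriptor Instance,
        IReadOnlyList<EndpointDescriptor> Endpoints,
        Func<Frame, Task> OnFrameFromGateway);

    private readonly ConcurrentDictionary<string, Connection> _connections = new();
    private readonly ConcurrentDictionary<Guid, TaskCompletionSource<Frame>> _pending = new();

    public Task<string> RegisterMicroserviceAsync(
        InstanceDescriptor instance,
        IReadOnlyList<EndpointDescriptor> endpoints,
        Func<Frame, Task> onFrameFromGateway,
        CancellationToken ct)
    {
        var connectionId = Guid.NewGuid().ToString("N");
        _connections[connectionId] = new Connection(instance, endpoints, onFrameFromGateway);
        return Task.FromResult(connectionId);
    }

    public Task SendFromMicroserviceAsync(string connectionId, Frame frame, CancellationToken ct)
    {
        // Only RESPONSE frames are correlated in this sketch; other frame types are dropped.
        if (frame.Type == FrameType.Response &&
            _pending.TryRemove(frame.CorrelationId, out var waiter))
        {
            waiter.TrySetResult(frame);
        }
        return Task.CompletedTask;
    }

    public async Task<Frame> SendFromGatewayAsync(string connectionId, Frame frame, CancellationToken ct)
    {
        if (!_connections.TryGetValue(connectionId, out var connection))
            throw new InvalidOperationException($"Unknown connection '{connectionId}'.");

        var waiter = new TaskCompletionSource<Frame>(TaskCreationOptions.RunContinuationsAsynchronously);
        _pending[frame.CorrelationId] = waiter;

        // Cancelling the token (for example on timeout) cancels the waiter and frees the slot.
        using var registration = ct.Register(() =>
        {
            if (_pending.TryRemove(frame.CorrelationId, out var cancelled))
                cancelled.TrySetCanceled(ct);
        });

        await connection.OnFrameFromGateway(frame);
        return await waiter.Task;
    }
}
```

The sketch keeps no HELLO or heartbeat state; it only shows the correlation mechanics that let `SendFromGatewayAsync` complete with the matching response frame.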
+ +### 1.2 Connect the microservice side + +Your `InMemoryMicroserviceConnection` (from step 4) should: + +* Call `RegisterMicroserviceAsync` on the hub when it sends HELLO: + + * Get `connectionId`. +* Provide a handler `onFrameFromGateway` that: + + * Dispatches REQUEST frames via `IEndpointDispatcher`. + * Sends RESPONSE frames back via `SendFromMicroserviceAsync`. + +This is mostly microservice work; you should already have most of it outlined. + +--- + +## 2. Implement an InMemory `ITransportClient` in the gateway + +Now focus on the gateway side. + +**Project:** `StellaOps.Gateway.WebService` (or a small internal infra class in the same project) + +### 2.1 `InMemoryTransportClient` + +Implement `ITransportClient` using the `IInMemoryRouterHub`: + +```csharp +public sealed class InMemoryTransportClient : ITransportClient +{ + private readonly IInMemoryRouterHub _hub; + + public InMemoryTransportClient(IInMemoryRouterHub hub) + { + _hub = hub; + } + + public Task<Frame> SendRequestAsync( + ConnectionState connection, + Frame requestFrame, + TimeSpan timeout, + CancellationToken ct) + { + // connection.ConnectionId must be set when HELLO is processed + return _hub.SendFromGatewayAsync(connection.ConnectionId, requestFrame, ct); + } + + public Task SendCancelAsync(ConnectionState connection, Guid correlationId, string? reason = null) + => Task.CompletedTask; // no-op at this stage + + public Task SendStreamingAsync( + ConnectionState connection, + Frame requestHeader, + Stream requestBody, + Func<Stream, Task> readResponseBody, + PayloadLimits limits, + CancellationToken ct) + => throw new NotSupportedException("Streaming not implemented for InMemory in this step."); +} +``` + +For now: + +* Ignore streaming. +* Ignore cancel. +* Just call `SendFromGatewayAsync` and get a response frame.
+ +### 2.2 Register it in DI + +In gateway `Program.cs` or a DI setup: + +```csharp +services.AddSingleton<IInMemoryRouterHub, InMemoryRouterHub>(); // your hub implementation +services.AddSingleton<ITransportClient, InMemoryTransportClient>(); +``` + +You’ll later swap this with real transport clients (TCP, UDP, Rabbit), but for now everything uses InMemory. + +--- + +## 3. Implement minimal `IGlobalRoutingState` + +You now need the gateway’s internal view of: + +* Which endpoints exist. +* Which connections serve them. + +**Project:** `StellaOps.Gateway.WebService` or a small internal infra namespace. + +### 3.1 In-memory implementation + +Implement an `InMemoryGlobalRoutingState` along these lines: + +```csharp +public sealed class InMemoryGlobalRoutingState : IGlobalRoutingState +{ + private readonly object _lock = new(); + private readonly Dictionary<(string, string), EndpointDescriptor> _endpoints = new(); + private readonly List<ConnectionState> _connections = new(); + + public EndpointDescriptor? ResolveEndpoint(string method, string path) + { + lock (_lock) + { + _endpoints.TryGetValue((method, path), out var endpoint); + return endpoint; + } + } + + public IReadOnlyList<ConnectionState> GetConnectionsFor( + string serviceName, + string version, + string method, + string path) + { + lock (_lock) + { + return _connections + .Where(c => + c.Instance.ServiceName == serviceName && + c.Instance.Version == version && + c.Endpoints.ContainsKey((method, path))) + .ToList(); + } + } + + // Called when HELLO arrives from microservice + public void RegisterConnection(ConnectionState connection) + { + lock (_lock) + { + _connections.Add(connection); + foreach (var kvp in connection.Endpoints) + { + var key = kvp.Key; // (Method, Path) + var descriptor = kvp.Value; + // global endpoint map: any connection's descriptor is ok as "canonical" + _endpoints[(key.Method, key.Path)] = descriptor; + } + } + } +} +``` + +You will refine this later; for minimal routing it's enough.
+ +### 3.2 Hook HELLO to `IGlobalRoutingState` + +In your InMemory router hub, when a microservice registers (HELLO): + +* Create a `ConnectionState`: + + ```csharp + var conn = new ConnectionState + { + ConnectionId = generatedConnectionId, + Instance = instanceDescriptor, + Status = InstanceHealthStatus.Healthy, + LastHeartbeatUtc = DateTime.UtcNow, + AveragePingMs = 0, + TransportType = TransportType.Udp, // placeholder: InMemory has no wire transport, so TransportType.Tcp would do equally well + Endpoints = endpointDescriptors.ToDictionary( + e => (e.Method, e.Path), + e => e) + }; + ``` + +* Call `InMemoryGlobalRoutingState.RegisterConnection(conn)`. + +This gives the gateway a routing view as soon as HELLO is processed. + +--- + +## 4. Implement HTTP pipeline middlewares for routing + +Now, wire the gateway HTTP pipeline so that an incoming HTTP request is: + +1. Resolved to a logical endpoint. +2. Routed to one connection. +3. Dispatched via InMemory transport. + +### 4.1 EndpointResolutionMiddleware + +This maps `(Method, Path)` to an `EndpointDescriptor`. + +Create a middleware: + +```csharp +public sealed class EndpointResolutionMiddleware +{ + private readonly RequestDelegate _next; + + public EndpointResolutionMiddleware(RequestDelegate next) => _next = next; + + public async Task Invoke(HttpContext context, IGlobalRoutingState routingState) + { + var method = context.Request.Method; + var path = context.Request.Path.ToString(); + + var endpoint = routingState.ResolveEndpoint(method, path); + if (endpoint is null) + { + context.Response.StatusCode = StatusCodes.Status404NotFound; + await context.Response.WriteAsync("Endpoint not found"); + return; + } + + context.Items["Stella.EndpointDescriptor"] = endpoint; + await _next(context); + } +} +``` + +Register it in the pipeline: + +```csharp +app.UseMiddleware<EndpointResolutionMiddleware>(); +``` + +Place it before or after auth depending on your final pipeline; for minimal routing, order is not critical.
+ +### 4.2 Minimal routing plugin (pick first connection) + +Implement a very naive `IRoutingPlugin` just to get things moving: + +```csharp +public sealed class NaiveRoutingPlugin : IRoutingPlugin +{ + private readonly IGlobalRoutingState _state; + + public NaiveRoutingPlugin(IGlobalRoutingState state) => _state = state; + + public Task<RoutingDecision?> ChooseInstanceAsync( + RoutingContext context, + CancellationToken cancellationToken) + { + var endpoint = context.Endpoint; + + var connections = _state.GetConnectionsFor( + endpoint.ServiceName, + endpoint.Version, + endpoint.Method, + endpoint.Path); + + var chosen = connections.FirstOrDefault(); + if (chosen is null) + return Task.FromResult<RoutingDecision?>(null); + + var decision = new RoutingDecision + { + Endpoint = endpoint, + Connection = chosen, + TransportType = chosen.TransportType, + EffectiveTimeout = endpoint.DefaultTimeout + }; + + return Task.FromResult<RoutingDecision?>(decision); + } +} +``` + +Register it: + +```csharp +services.AddSingleton<IGlobalRoutingState, InMemoryGlobalRoutingState>(); +services.AddSingleton<IRoutingPlugin, NaiveRoutingPlugin>(); +``` + +### 4.3 RoutingDecisionMiddleware + +This middleware grabs the endpoint descriptor and asks the routing plugin for a connection.
+ +```csharp +public sealed class RoutingDecisionMiddleware +{ + private readonly RequestDelegate _next; + + public RoutingDecisionMiddleware(RequestDelegate next) => _next = next; + + public async Task Invoke(HttpContext context, IRoutingPlugin routingPlugin) + { + var endpoint = (EndpointDescriptor?)context.Items["Stella.EndpointDescriptor"]; + if (endpoint is null) + { + context.Response.StatusCode = StatusCodes.Status500InternalServerError; + await context.Response.WriteAsync("Endpoint metadata missing"); + return; + } + + var routingContext = new RoutingContext + { + Endpoint = endpoint, + GatewayRegion = "not_used_yet", // you’ll fill this from GatewayNodeConfig later + HttpContext = context + }; + + var decision = await routingPlugin.ChooseInstanceAsync(routingContext, context.RequestAborted); + if (decision is null) + { + context.Response.StatusCode = StatusCodes.Status503ServiceUnavailable; + await context.Response.WriteAsync("No instances available"); + return; + } + + context.Items["Stella.RoutingDecision"] = decision; + await _next(context); + } +} +``` + +Register it after `EndpointResolutionMiddleware`: + +```csharp +app.UseMiddleware<RoutingDecisionMiddleware>(); +``` + +### 4.4 TransportDispatchMiddleware + +This middleware: + +* Builds a REQUEST frame from HTTP. +* Uses `ITransportClient` to send it to the chosen connection. +* Writes the RESPONSE frame back to HTTP.
+ +Minimal version (buffered, no streaming): + +```csharp +public sealed class TransportDispatchMiddleware +{ + private readonly RequestDelegate _next; + + public TransportDispatchMiddleware(RequestDelegate next) => _next = next; + + public async Task Invoke( + HttpContext context, + ITransportClient transportClient) + { + var decision = (RoutingDecision?)context.Items["Stella.RoutingDecision"]; + if (decision is null) + { + context.Response.StatusCode = 500; + await context.Response.WriteAsync("Routing decision missing"); + return; + } + + // Read request body into memory (safe for minimal tests) + byte[] bodyBytes; + using (var ms = new MemoryStream()) + { + await context.Request.Body.CopyToAsync(ms); + bodyBytes = ms.ToArray(); + } + + var requestPayload = new MinimalRequestPayload + { + Method = context.Request.Method, + Path = context.Request.Path.ToString(), + Body = bodyBytes + // headers can be ignored or added later + }; + + var requestFrame = new Frame + { + Type = FrameType.Request, + CorrelationId = Guid.NewGuid(), + Payload = SerializeRequestPayload(requestPayload) + }; + + var timeout = decision.EffectiveTimeout; + using var cts = CancellationTokenSource.CreateLinkedTokenSource(context.RequestAborted); + cts.CancelAfter(timeout); + + Frame responseFrame; + try + { + responseFrame = await transportClient.SendRequestAsync( + decision.Connection, + requestFrame, + timeout, + cts.Token); + } + catch (OperationCanceledException) + { + context.Response.StatusCode = StatusCodes.Status504GatewayTimeout; + await context.Response.WriteAsync("Upstream timeout"); + return; + } + + var responsePayload = DeserializeResponsePayload(responseFrame.Payload); + + context.Response.StatusCode = responsePayload.StatusCode; + foreach (var (k, v) in responsePayload.Headers) + { + context.Response.Headers[k] = v; + } + if (responsePayload.Body is { Length: > 0 }) + { + await context.Response.Body.WriteAsync(responsePayload.Body); + } + } +} +``` + +You’ll need minimal DTOs 
and serializers (`MinimalRequestPayload`, `MinimalResponsePayload`) just to move bytes. You can use JSON for now; protocol details will be formalized later. + +Register it after `RoutingDecisionMiddleware`: + +```csharp +app.UseMiddleware<TransportDispatchMiddleware>(); +``` + +At this point, you no longer need ASP.NET controllers for microservice endpoints; you can have a catch-all pipeline. + +--- + +## 5. Minimal end-to-end test + +**Owner:** test agent, probably in `StellaOps.Gateway.WebService.Tests` (plus a simple host for microservice in tests) + +Scenario: + +1. Start an in-memory microservice host: + + * It uses `AddStellaMicroservice`. + * It attaches to the same `IInMemoryRouterHub` instance as the gateway (created inside the test). + * It has a single endpoint: + + * `[StellaEndpoint("GET", "/ping")]` + * Handler returns “pong”. + +2. Start the gateway host: + + * Inject the same `IInMemoryRouterHub`. + * Use middlewares: `EndpointResolutionMiddleware`, `RoutingDecisionMiddleware`, `TransportDispatchMiddleware`. + +3. Invoke HTTP `GET /ping` against the gateway (using `WebApplicationFactory` or `TestServer`). + +Assert: + +* HTTP status 200. +* Body “pong”. +* The router hub saw: + + * At least one HELLO frame. + * One REQUEST frame. + * One RESPONSE frame. + +This proves: + +* HELLO → gateway routing state population. +* Endpoint resolution → connection selection. +* InMemory transport client used. +* Minimal dispatch works. + +--- + +## 6. Done criteria for “Gateway: minimal routing using InMemory plugin” + +You’re done with this step when: + +* A microservice can register with the gateway via InMemory. +* The gateway’s `IGlobalRoutingState` knows about endpoints and connections. +* The HTTP pipeline: + + * Resolves an endpoint based on `(Method, Path)`. + * Asks `IRoutingPlugin` for a connection. + * Uses `ITransportClient` (InMemory) to send REQUEST and get RESPONSE. + * Returns the mapped HTTP response to the client.
+* You have at least one automated test showing: + + * `GET /ping` through gateway → InMemory → microservice → back to HTTP. + +After this, you’re ready to: + +* Swap `NaiveRoutingPlugin` with the health/region-sensitive plugin you defined. +* Implement heartbeat and latency. +* Later replace InMemory with TCP/UDP/Rabbit without changing the HTTP pipeline. diff --git a/docs/router/06-Step.md b/docs/router/06-Step.md new file mode 100644 index 000000000..c56a42efc --- /dev/null +++ b/docs/router/06-Step.md @@ -0,0 +1,541 @@ +For this step, you’re layering **liveness** and **basic routing intelligence** on top of the minimal handshake/dispatch you already designed. + +Target outcome: + +* Microservices send **heartbeats** over the existing connection. +* The router tracks **LastHeartbeatUtc**, **health status**, and **AveragePingMs** per connection. +* The router’s `IRoutingPlugin` uses **region + health + latency** to pick an instance. + +No need to handle cancellation or streaming yet; just make routing decisions *not* naive. + +--- + +## 0. Preconditions + +Before starting, confirm: + +* `StellaOps.Router.Common` already has: + + * `InstanceHealthStatus` enum. + * `ConnectionState` with at least `Instance`, `Status`, `LastHeartbeatUtc`, `AveragePingMs`, `TransportType`. +* Minimal handshake is working: + + * Microservice sends HELLO (instance + endpoints). + * Router creates `ConnectionState` & populates global routing view. + * Router can send REQUEST and receive RESPONSE via InMemory transport. + +If any of that is incomplete, shore it up first. + +--- + +## 1. Extend Common with heartbeat payloads + +**Project:** `StellaOps.Router.Common` +**Owner:** Common dev + +Add DTOs for heartbeat frames. 
+ +### 1.1 Heartbeat payload + +```csharp +public sealed class HeartbeatPayload +{ + public string InstanceId { get; init; } = string.Empty; + public InstanceHealthStatus Status { get; init; } = InstanceHealthStatus.Healthy; + + // Optional basic metrics + public int InFlightRequests { get; init; } + public double ErrorRate { get; init; } // 0–1 range, optional +} +``` + +* This is application-level health; `Status` lets the microservice say “Degraded” / “Draining”. +* In-flight + error rate can be used later for smarter routing; initially, you can ignore them. + +### 1.2 Wire into frame model + +Ensure: + +* `FrameType` includes `Heartbeat`: + + ```csharp + public enum FrameType : byte + { + Hello = 1, + Heartbeat = 2, + EndpointsUpdate = 3, + Request = 4, + RequestStreamData = 5, + Response = 6, + ResponseStreamData = 7, + Cancel = 8 + } + ``` + +* No behavior in Common; only DTOs and enums. + +--- + +## 2. Microservice SDK: send heartbeats on the same connection + +**Project:** `StellaOps.Microservice` +**Owner:** SDK dev + +You already have `MicroserviceConnectionHostedService` doing HELLO and request dispatch. Now add heartbeat sending. + +### 2.1 Introduce heartbeat options + +Extend `StellaMicroserviceOptions` with simple settings: + +```csharp +public sealed class StellaMicroserviceOptions +{ + // existing fields... + public TimeSpan HeartbeatInterval { get; set; } = TimeSpan.FromSeconds(10); + public TimeSpan HeartbeatTimeout { get; set; } = TimeSpan.FromSeconds(30); // used by router, not here +} +``` + +### 2.2 Internal heartbeat sender + +Create an internal interface and implementation: + +```csharp +internal interface IHeartbeatSource +{ + InstanceHealthStatus GetCurrentStatus(); + int GetInFlightRequests(); + double GetErrorRate(); +} +``` + +For now you can implement a trivial `DefaultHeartbeatSource`: + +* `GetCurrentStatus()` → `Healthy`. +* `GetInFlightRequests()` → 0. +* `GetErrorRate()` → 0. 
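The trivial `DefaultHeartbeatSource` described in the list above could look like the sketch below. The `InstanceHealthStatus` member list is an assumption (the real enum lives in `StellaOps.Router.Common`), and the interface is repeated only so that the block compiles on its own.

```csharp
using System;

// Stand-in for the enum from StellaOps.Router.Common; the exact member list is an assumption.
public enum InstanceHealthStatus { Unknown, Healthy, Degraded, Draining, Unhealthy }

// Same shape as the internal interface defined in this section.
internal interface IHeartbeatSource
{
    InstanceHealthStatus GetCurrentStatus();
    int GetInFlightRequests();
    double GetErrorRate();
}

// Trivial implementation: always healthy, no load metrics reported yet.
internal sealed class DefaultHeartbeatSource : IHeartbeatSource
{
    public InstanceHealthStatus GetCurrentStatus() => InstanceHealthStatus.Healthy;
    public int GetInFlightRequests() => 0;
    public double GetErrorRate() => 0d;
}
```

Keeping the metrics behind an interface means a later implementation can report real in-flight counts and error rates without touching the heartbeat loop.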
+ +Wire this in DI: + +```csharp +services.AddSingleton<IHeartbeatSource, DefaultHeartbeatSource>(); +``` + +### 2.3 Add heartbeat loop to MicroserviceConnectionHostedService + +In `StartAsync` of `MicroserviceConnectionHostedService`: + +* After sending HELLO and subscribing to requests, start a background heartbeat loop. + +Pseudo-plan: + +```csharp +private Task? _heartbeatLoop; + +public async Task StartAsync(CancellationToken ct) +{ + // existing HELLO logic... + await _connection.SendHelloAsync(payload, ct); + + _connection.OnRequest(frame => HandleRequestAsync(frame, ct)); + + _heartbeatLoop = Task.Run(() => HeartbeatLoopAsync(ct), ct); +} + +private async Task HeartbeatLoopAsync(CancellationToken outerCt) +{ + var opt = _options.Value; + var interval = opt.HeartbeatInterval; + var instanceId = opt.InstanceId; + + while (!outerCt.IsCancellationRequested) + { + var payload = new HeartbeatPayload + { + InstanceId = instanceId, + Status = _heartbeatSource.GetCurrentStatus(), + InFlightRequests = _heartbeatSource.GetInFlightRequests(), + ErrorRate = _heartbeatSource.GetErrorRate() + }; + + var frame = new Frame + { + Type = FrameType.Heartbeat, + CorrelationId = Guid.Empty, // or a reserved value + Payload = SerializeHeartbeatPayload(payload) + }; + + await _connection.SendHeartbeatAsync(frame, outerCt); + + try + { + await Task.Delay(interval, outerCt); + } + catch (OperationCanceledException) + { + break; + } + } +} +``` + +You’ll need to extend `IMicroserviceConnection` with: + +```csharp +Task SendHeartbeatAsync(Frame frame, CancellationToken ct); +``` + +In this step, the logic is simple: every `HeartbeatInterval`, push a heartbeat. + +--- + +## 3. Router: accept heartbeats and update connection health + +**Project:** `StellaOps.Gateway.WebService` +**Owner:** Gateway dev + +You already have an InMemory router or similar structure that: + +* Handles HELLO frames, creates `ConnectionState`. +* Maintains a global `IGlobalRoutingState`. + +Now you need to: + +* Handle HEARTBEAT frames.
+* Update `ConnectionState.Status` and `LastHeartbeatUtc`. + +### 3.1 Frame dispatch on router side + +In your router’s InMemory server loop (or equivalent), add a case for `FrameType.Heartbeat`: + +* Deserialize `HeartbeatPayload` from `frame.Payload`. +* Find the corresponding `ConnectionState` by `InstanceId` (and/or connection ID). +* Update: + + * `LastHeartbeatUtc` = `DateTime.UtcNow`. + * `Status` = `payload.Status`. + +You can add a method in your routing-state implementation. It assumes `_connections` has become a dictionary keyed by connection ID (step 5 used a plain list) and that access is guarded the same way as the other members: + +```csharp +public void UpdateHeartbeat(string connectionId, HeartbeatPayload payload) +{ + if (!_connections.TryGetValue(connectionId, out var conn)) + return; + + conn.LastHeartbeatUtc = DateTime.UtcNow; + conn.Status = payload.Status; +} +``` + +The router’s transport server should know which `connectionId` delivered the frame; pass that along. + +### 3.2 Detect stale connections (health degradation) + +Add a background “health monitor” in the gateway: + +* Reads `HeartbeatTimeout` from configuration (can reuse the same default as microservice or have separate router-side config). +* Periodically scans all `ConnectionState` entries: + + * If `Now - LastHeartbeatUtc > HeartbeatTimeout`, mark `Status = Unhealthy` (or remove connection entirely). + * If connection drops (transport disconnect), also mark `Unhealthy` or remove. + +This can be a simple `IHostedService`: + +```csharp +internal sealed class ConnectionHealthMonitor : IHostedService +{ + private readonly IGlobalRoutingState _state; + private readonly TimeSpan _heartbeatTimeout; + private Task? _loop; + private CancellationTokenSource? 
_cts; + + // heartbeatTimeout comes from router-side configuration (register it through a factory or options) + public ConnectionHealthMonitor(IGlobalRoutingState state, TimeSpan heartbeatTimeout) + { + _state = state; + _heartbeatTimeout = heartbeatTimeout; + } + + public Task StartAsync(CancellationToken cancellationToken) + { + _cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken); + _loop = Task.Run(() => MonitorLoopAsync(_cts.Token), _cts.Token); + return Task.CompletedTask; + } + + public async Task StopAsync(CancellationToken cancellationToken) + { + _cts?.Cancel(); + if (_loop is not null) + await _loop; + } + + private async Task MonitorLoopAsync(CancellationToken ct) + { + while (!ct.IsCancellationRequested) + { + _state.MarkStaleConnectionsUnhealthy(_heartbeatTimeout, DateTime.UtcNow); + try + { + await Task.Delay(TimeSpan.FromSeconds(5), ct); + } + catch (OperationCanceledException) + { + break; // normal shutdown; keeps StopAsync from rethrowing + } + } + } +} +``` + +You’ll add a method like `MarkStaleConnectionsUnhealthy` to `IGlobalRoutingState` and its implementation. + +--- + +## 4. Track basic latency (AveragePingMs) + +**Project:** Gateway + Common +**Owner:** Gateway dev + +You want `AveragePingMs` per connection to inform routing decisions. + +### 4.1 Decide where to measure + +Simplest: measure “request → response” round-trip time in the gateway: + +* When you send a `Request` frame to a specific connection, record: + + * `SentAtUtc[CorrelationId] = DateTime.UtcNow`. +* When you receive a `Response` frame with that correlation: + + * Compute `latencyMs = (UtcNow - SentAtUtc[CorrelationId]).TotalMilliseconds`. + * Discard map entry. + +Then update `ConnectionState.AveragePingMs`, e.g. with an exponential moving average: + +```csharp +conn.AveragePingMs = conn.AveragePingMs <= 0 + ? latencyMs + : conn.AveragePingMs * 0.8 + latencyMs * 0.2; +``` + +### 4.2 Where to hook this + +* In the **gateway-side transport client** (InMemory implementation for now): + + * When sending `Request` frame: + + * Register `SentAtUtc` per correlation ID. + * When receiving `Response` frame: + + * Compute latency. + * Call `IGlobalRoutingState.UpdateLatency(connectionId, latencyMs)`.
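The send/receive bookkeeping described above can be sketched as a small helper. `LatencyTracker` is a hypothetical name that is not part of this plan, and the injectable clock is an assumption added only to make the helper testable; the 0.8 / 0.2 weighting mirrors the moving-average snippet in 4.1.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper: remembers when each request was sent (by correlation ID)
// and turns the matching response into a round-trip latency sample.
public sealed class LatencyTracker
{
    private readonly ConcurrentDictionary<Guid, DateTime> _sentAtUtc = new();
    private readonly Func<DateTime> _utcNow;

    // The clock is injectable for tests; by default it reads DateTime.UtcNow.
    public LatencyTracker(Func<DateTime>? utcNow = null) =>
        _utcNow = utcNow ?? (() => DateTime.UtcNow);

    // Call right before the Request frame is handed to the transport.
    public void MarkSent(Guid correlationId) => _sentAtUtc[correlationId] = _utcNow();

    // Call when the matching Response frame arrives; returns null for unknown correlation IDs.
    public double? MarkReceived(Guid correlationId)
    {
        if (!_sentAtUtc.TryRemove(correlationId, out var sentAt))
            return null;
        return (_utcNow() - sentAt).TotalMilliseconds;
    }

    // Exponential moving average; a non-positive current value means "no sample yet".
    public static double UpdateAverage(double currentAverageMs, double latencyMs) =>
        currentAverageMs <= 0 ? latencyMs : currentAverageMs * 0.8 + latencyMs * 0.2;
}
```

Requests that time out never reach `MarkReceived`, so a real implementation would also need to drop their entries (for example in the timeout path) to keep the map from growing.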
+ +Add a method to the routing state: + +```csharp +public void UpdateLatency(string connectionId, double latencyMs) +{ + if (_connections.TryGetValue(connectionId, out var conn)) + { + if (conn.AveragePingMs <= 0) + conn.AveragePingMs = latencyMs; + else + conn.AveragePingMs = conn.AveragePingMs * 0.8 + latencyMs * 0.2; + } +} +``` + +You can keep it simple; sophistication can come later. + +--- + +## 5. Basic routing plugin implementation + +**Project:** `StellaOps.Gateway.WebService` +**Owner:** Gateway dev + +You already have `IRoutingPlugin` defined. Now implement a concrete `BasicRoutingPlugin` that respects: + +* Region (gateway region first, then neighbor tiers). +* Health (`Healthy` / `Degraded` only). +* Latency preference (`AveragePingMs`). + +### 5.1 Inputs & data + +`RoutingContext` should carry: + +* `EndpointDescriptor` (with ServiceName, Version, Method, Path). +* `GatewayRegion` (from `GatewayNodeConfig.Region`). +* The `HttpContext` if you need headers (not needed for routing at this stage). + +`IGlobalRoutingState` should provide: + +* `GetConnectionsFor(serviceName, version, method, path)` returning all `ConnectionState`s that support that endpoint. + +### 5.2 Basic algorithm + +Algorithm outline: + +```csharp +public sealed class BasicRoutingPlugin : IRoutingPlugin +{ + private readonly IGlobalRoutingState _state; + private readonly string[] _neighborRegions; // configured, can be empty + + public async Task<RoutingDecision?> ChooseInstanceAsync( + RoutingContext context, + CancellationToken cancellationToken) + { + var endpoint = context.Endpoint; + var candidates = _state.GetConnectionsFor( + endpoint.ServiceName, + endpoint.Version, + endpoint.Method, + endpoint.Path); + + if (candidates.Count == 0) + return null; + + // 1. Filter by health (only Healthy or Degraded) + var healthy = candidates + .Where(c => c.Status == InstanceHealthStatus.Healthy || c.Status == InstanceHealthStatus.Degraded) + .ToList(); + + if (healthy.Count == 0) + return null; + + // 2. 
Partition by region tier + var gatewayRegion = context.GatewayRegion; + + List<ConnectionState> tier1 = healthy.Where(c => c.Instance.Region == gatewayRegion).ToList(); + List<ConnectionState> tier2 = healthy.Where(c => _neighborRegions.Contains(c.Instance.Region)).ToList(); + List<ConnectionState> tier3 = healthy.Except(tier1).Except(tier2).ToList(); + + var chosenTier = tier1.Count > 0 ? tier1 : tier2.Count > 0 ? tier2 : tier3; + if (chosenTier.Count == 0) + return null; + + // 3. Sort by latency, then heartbeat freshness + var ordered = chosenTier + .OrderBy(c => c.AveragePingMs <= 0 ? double.MaxValue : c.AveragePingMs) + .ThenByDescending(c => c.LastHeartbeatUtc) + .ToList(); + + var winner = ordered[0]; + + // 4. Build decision + return new RoutingDecision + { + Endpoint = endpoint, + Connection = winner, + TransportType = winner.TransportType, + EffectiveTimeout = endpoint.DefaultTimeout // or compose with config later + }; + } +} +``` + +Wire it into DI: + +```csharp +services.AddSingleton<IRoutingPlugin, BasicRoutingPlugin>(); +``` + +And ensure `RoutingDecisionMiddleware` calls it. + +--- + +## 6. Integrate health-aware routing into the HTTP pipeline + +**Project:** `StellaOps.Gateway.WebService` +**Owner:** Gateway dev + +Update your `RoutingDecisionMiddleware` to: + +* Use the final `IRoutingPlugin` instead of picking a random connection. +* Handle null decision appropriately: + + * If `ChooseInstanceAsync` returns `null`, respond with `503 Service Unavailable` or `502 Bad Gateway` and a generic error body, and log the incident. + +Check that: + +* Gateway’s region is injected (via `GatewayNodeConfig.Region`) into `RoutingContext.GatewayRegion`. +* Endpoint descriptor is resolved before you call the plugin. + +--- + +## 7. Testing plan + +**Project:** `StellaOps.Gateway.WebService.Tests`, `StellaOps.Microservice.Tests` +**Owner:** test agent + +Write basic tests to lock in behavior. + +### 7.1 Microservice heartbeat tests + +In `StellaOps.Microservice.Tests`: + +* Use a fake `IMicroserviceConnection` that records frames sent. 
+* Configure `HeartbeatInterval` to a small number (e.g. 100 ms). +* Start a Host with `AddStellaMicroservice`. +* Wait some time, assert: + + * At least one HELLO frame was sent. + * At least N HEARTBEAT frames were sent. + * HEARTBEAT payload has correct `InstanceId` and `Status`. + +### 7.2 Router health update tests + +In `StellaOps.Gateway.WebService.Tests` (or a separate routing-state test project): + +* Create an instance of your `IGlobalRoutingState` implementation. + +* Add a connection via HELLO simulation. + +* Call `UpdateHeartbeat` with a HeartbeatPayload. + +* Assert: + + * `LastHeartbeatUtc` updated. + * `Status` set to `Healthy` (or whatever payload said). + +* Advance time (simulate via injecting a clock or mocking DateTime) and call `MarkStaleConnectionsUnhealthy`: + + * Assert that `Status` changed to `Unhealthy`. + +### 7.3 Routing plugin tests + +Write tests for `BasicRoutingPlugin`: + +* Case 1: multiple connections, some unhealthy: + + * Only Healthy/Degraded are considered. +* Case 2: multiple regions: + + * Instances in gateway region win over others. +* Case 3: same region, different `AveragePingMs`: + + * Lower latency chosen. +* Case 4: same latency, different `LastHeartbeatUtc`: + + * More recent heartbeat chosen. + +These tests will give you confidence that the routing logic behaves as requested and is stable as you add complexity later (streaming, cancellation, etc.). + +--- + +## 8. Done criteria for “Add heartbeat, health, basic routing rules” + +You can declare this step complete when: + +* Microservices: + + * Periodically send HEARTBEAT frames on the same connection they use for requests. +* Gateway/router: + + * Updates `LastHeartbeatUtc` and `Status` on receipt of HEARTBEAT. + * Marks stale or disconnected connections as `Unhealthy` (or removes them). + * Tracks `AveragePingMs` per connection based on request/response round trips. 
+* Routing: + + * `IRoutingPlugin` chooses instances based on: + + * Strict `ServiceName` + `Version` + endpoint match. + * Health (`Healthy`/`Degraded` only). + * Region preference (gateway region > neighbors > others). + * Latency (`AveragePingMs`) then heartbeat recency. +* Tests: + + * Validate heartbeats are sent and processed. + * Validate stale connections are marked unhealthy. + * Validate routing plugin picks the expected instance in simple scenarios. + +Once this is in place, you have a live, health-aware routing fabric. The next logical step after this is to add **cancellation** and then **streaming + payload limits** on top of the same structures. diff --git a/docs/router/07-Step.md b/docs/router/07-Step.md new file mode 100644 index 000000000..4beee90c8 --- /dev/null +++ b/docs/router/07-Step.md @@ -0,0 +1,378 @@ +For this step you’re wiring **request cancellation** end‑to‑end in the InMemory setup: + +> Client / gateway gives up → gateway sends CANCEL → microservice cancels handler + +No need to mix in streaming or payload limits yet; just enforce cancellation for timeouts and client disconnects. + +--- + +## 0. Preconditions + +Have in place: + +* `FrameType.Cancel` in `StellaOps.Router.Common.FrameType`. +* `ITransportClient.SendCancelAsync(ConnectionState, Guid, string?)` in Common. +* Minimal InMemory path from HTTP → gateway → microservice (HELLO + REQUEST/RESPONSE) working. + +If `FrameType.Cancel` or `SendCancelAsync` aren’t there yet, add them first. + +--- + +## 1. Common: cancel payload (optional, but useful) + +If you want reasons attached, add a DTO in Common: + +```csharp +public sealed class CancelPayload +{ + public string Reason { get; init; } = string.Empty; // eg: "ClientDisconnected", "Timeout" +} +``` + +You’ll serialize this into `Frame.Payload` when sending a CANCEL. If you don’t care about reasons yet, you can skip the payload and just use the correlation id. + +No behavior in Common, just the shape. + +--- + +## 2. 
Gateway: trigger CANCEL on client abort and timeout + +### 2.1 Extend `TransportDispatchMiddleware` + +You already: + +* Generate a `correlationId`. +* Build a `FrameType.Request`. +* Call `ITransportClient.SendRequestAsync(...)` and await it. + +Now: + +1. Create a linked CTS that combines: + + * `HttpContext.RequestAborted` + * The endpoint timeout + +2. Register a callback on `RequestAborted` that sends a CANCEL with the same correlationId. + +3. On `OperationCanceledException` where the HTTP token is not canceled (pure timeout), send a CANCEL once and return 504. + +Sketch: + +```csharp +public async Task Invoke(HttpContext context, ITransportClient transportClient) +{ + var decision = (RoutingDecision)context.Items[RouterHttpContextKeys.RoutingDecision]!; + var correlationId = Guid.NewGuid(); + + // build requestFrame as before + + var timeout = decision.EffectiveTimeout; + using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(context.RequestAborted); + linkedCts.CancelAfter(timeout); + + // fire-and-forget cancel on client disconnect + context.RequestAborted.Register(() => + { + _ = transportClient.SendCancelAsync( + decision.Connection, correlationId, "ClientDisconnected"); + }); + + Frame responseFrame; + try + { + responseFrame = await transportClient.SendRequestAsync( + decision.Connection, + requestFrame, + timeout, + linkedCts.Token); + } + catch (OperationCanceledException) when (!context.RequestAborted.IsCancellationRequested) + { + // internal timeout + await transportClient.SendCancelAsync( + decision.Connection, correlationId, "Timeout"); + + context.Response.StatusCode = StatusCodes.Status504GatewayTimeout; + await context.Response.WriteAsync("Upstream timeout"); + return; + } + + // existing response mapping goes here +} +``` + +Key points: + +* The gateway sends CANCEL **as soon as**: + + * The client disconnects (RequestAborted). + * Or the internal timeout triggers (catch branch). 
+* We do not need any global correlation registry on the gateway side; the middleware has the `correlationId` and `Connection`. + +--- + +## 3. InMemory transport: propagate CANCEL to microservice + +### 3.1 Implement `SendCancelAsync` in `InMemoryTransportClient` (gateway side) + +In your gateway InMemory implementation: + +```csharp +public Task SendCancelAsync(ConnectionState connection, Guid correlationId, string? reason = null) +{ + var payload = reason is null + ? Array.Empty<byte>() + : SerializeCancelPayload(new CancelPayload { Reason = reason }); + + var frame = new Frame + { + Type = FrameType.Cancel, + CorrelationId = correlationId, + Payload = payload + }; + + return _hub.SendFromGatewayAsync(connection.ConnectionId, frame, CancellationToken.None); +} +``` + +`_hub.SendFromGatewayAsync` must route the frame to the microservice’s receive loop for that connection. + +### 3.2 Hub routing + +Ensure your `IInMemoryRouterHub` implementation: + +* When `SendFromGatewayAsync(connectionId, cancelFrame, ct)` is called: + + * Enqueues that frame onto the microservice’s incoming channel (`GetFramesForMicroserviceAsync` stream). + +No extra logic; just treat CANCEL like REQUEST/HELLO in terms of delivery. + +--- + +## 4. Microservice: track in-flight requests + +Now the microservice needs to know **which** request to cancel when a CANCEL arrives. + +### 4.1 In-flight registry + +In the microservice connection class (the one doing the receive loop): + +```csharp +private readonly ConcurrentDictionary<Guid, RequestExecution> _inflight = + new(); + +private sealed class RequestExecution +{ + public CancellationTokenSource Cts { get; init; } = default!; + public Task ExecutionTask { get; init; } = default!; +} +``` + +When a `Request` frame arrives: + +* Create a `CancellationTokenSource`. +* Start the handler using that token. +* Store both in `_inflight`. 
+ +Example pattern in `ReceiveLoopAsync`: + +```csharp +private async Task ReceiveLoopAsync(CancellationToken ct) +{ + await foreach (var frame in _routerClient.GetIncomingFramesAsync(ct)) + { + switch (frame.Type) + { + case FrameType.Request: + HandleRequest(frame); + break; + + case FrameType.Cancel: + HandleCancel(frame); + break; + + // other frame types... + } + } +} + +private void HandleRequest(Frame frame) +{ + var cts = new CancellationTokenSource(); + var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(cts.Token); // later link to global shutdown if needed + + var exec = new RequestExecution + { + Cts = cts, + ExecutionTask = HandleRequestCoreAsync(frame, linkedCts.Token) + }; + + _inflight[frame.CorrelationId] = exec; + + _ = exec.ExecutionTask.ContinueWith(_ => + { + _inflight.TryRemove(frame.CorrelationId, out _); + cts.Dispose(); + linkedCts.Dispose(); + }, TaskScheduler.Default); +} +``` + +### 4.2 Wire CancellationToken into dispatcher + +`HandleRequestCoreAsync` should: + +* Deserialize the request payload. +* Build a `RawRequestContext` with `CancellationToken = token`. +* Pass that token through to: + + * `IRawStellaEndpoint.HandleAsync(context)` (via the context). + * Or typed handler adapter (`IStellaEndpoint<,>` / `IStellaEndpoint`), passing it explicitly. 
+ +Example pattern: + +```csharp +private async Task HandleRequestCoreAsync(Frame frame, CancellationToken ct) +{ + var req = DeserializeRequestPayload(frame.Payload); + + if (!_catalog.TryGetHandler(req.Method, req.Path, out var registration)) + { + var notFound = BuildNotFoundResponse(frame.CorrelationId); + await _routerClient.SendFrameAsync(notFound, ct); + return; + } + + using var bodyStream = new MemoryStream(req.Body); // minimal case + + var ctx = new RawRequestContext + { + Method = req.Method, + Path = req.Path, + Headers = req.Headers, + Body = bodyStream, + CancellationToken = ct + }; + + var handler = (IRawStellaEndpoint)_serviceProvider.GetRequiredService(registration.HandlerType); + + var response = await handler.HandleAsync(ctx); + + var respFrame = BuildResponseFrame(frame.CorrelationId, response); + await _routerClient.SendFrameAsync(respFrame, ct); +} +``` + +Now each handler sees a token that will be canceled when a CANCEL frame arrives. + +### 4.3 Handle CANCEL frames + +When a `Cancel` frame arrives: + +```csharp +private void HandleCancel(Frame frame) +{ + if (_inflight.TryGetValue(frame.CorrelationId, out var exec)) + { + exec.Cts.Cancel(); + } + // Ignore if not found (e.g. already completed) +} +``` + +If you care about the reason, deserialize `CancelPayload` and log it; not required for behavior. + +--- + +## 5. Handler guidance (for your Microservice docs) + +In `Stella Ops Router – Microservice.md`, add simple rules devs must follow: + +* Any long‑running or IO-heavy code in endpoints MUST: + + * Accept a `CancellationToken` (for typed endpoints). + * Or use `RawRequestContext.CancellationToken` for raw endpoints. +* Always pass the token into: + + * DB calls. + * File I/O and stream operations. + * HTTP/gRPC calls to other services. +* Do not swallow `OperationCanceledException` unless there is a good reason; normally let it bubble or treat it as a normal cancellation. 
+ +Concrete example for devs: + +```csharp +[StellaEndpoint("POST", "/billing/slow-operation")] +public sealed class SlowEndpoint : IRawStellaEndpoint +{ + public async Task<RawResponse> HandleAsync(RawRequestContext ctx) + { + // Correct: observe token + await Task.Delay(TimeSpan.FromMinutes(5), ctx.CancellationToken); + + return new RawResponse { StatusCode = 204 }; + } +} +``` + +--- + +## 6. Tests + +### 6.1 Client abort → CANCEL + +Test outline: + +* Setup: + + * Gateway + microservice wired via InMemory hub. + * Microservice endpoint that: + + * Waits on `Task.Delay(TimeSpan.FromMinutes(5), ctx.CancellationToken)`. + +* Test: + + 1. Start HTTP request to `/slow`. + 2. After sending request, cancel the client’s HttpClient token or close the connection. + 3. Assert: + + * Gateway’s InMemory transport sent a `FrameType.Cancel`. + * Microservice’s handler is canceled (e.g. no longer running after a short time). + * No response (or partial) is written; HTTP side will produce whatever your test harness expects when client aborts. + +### 6.2 Gateway timeout → CANCEL + +* Configure a small endpoint timeout (e.g. 100 ms). +* Have endpoint sleep for 5 seconds with the token. +* Assert: + + * Gateway returns 504. + * Cancel frame was sent. + * Handler is canceled (task completes early). + +These tests lock in the semantics so later additions (real transports, streaming) don’t regress cancellation. + +--- + +## 7. Done criteria for “Add cancellation semantics (with InMemory)” + +You can mark step 7 as complete when: + +* For every routed request, the gateway knows its correlationId and connection. +* On client disconnect: + + * Gateway sends a `FrameType.Cancel` with that correlationId. +* On internal timeout: + + * Gateway sends a `FrameType.Cancel` and returns 504 to the client. +* InMemory hub delivers CANCEL frames to the microservice. +* Microservice: + + * Tracks in‑flight requests by correlationId. + * Cancels the proper `CancellationTokenSource` when CANCEL arrives. 
+ * Passes the token into handlers via `RawRequestContext` and typed adapters. +* At least one automated test proves: + + * Cancellation propagates from gateway to microservice and stops the handler. + +Once this is done, you’ll be in good shape to add streaming & payload-limits on top, because the cancel path is already wired end‑to‑end. diff --git a/docs/router/08-Step.md b/docs/router/08-Step.md new file mode 100644 index 000000000..95c179b75 --- /dev/null +++ b/docs/router/08-Step.md @@ -0,0 +1,501 @@ +For this step you’re teaching the system to handle **streams** instead of always buffering, and to **enforce payload limits** so the gateway can’t be DoS’d by large uploads. Still only using the InMemory transport. + +Goal state: + +* Gateway can stream HTTP request/response bodies to/from microservice without buffering everything. +* Gateway enforces per‑call and global/in‑flight payload limits. +* Microservice sees a `Stream` on `RawRequestContext.Body` and reads from it. +* All of this works over the existing InMemory “connection”. + +I’ll break it into concrete tasks. + +--- + +## 0. Preconditions + +Make sure you already have: + +* Minimal InMemory routing working: + + * HTTP → gateway → InMemory → microservice → InMemory → HTTP. +* Cancellation wired (step 7): + + * `FrameType.Cancel`. + * `ITransportClient.SendCancelAsync` implemented for InMemory. + * Microservice uses `CancellationToken` in `RawRequestContext`. + +Then layer streaming & limits on top. + +--- + +## 1. Confirm / finalize Common primitives for streaming & limits + +**Project:** `StellaOps.Router.Common` + +Tasks: + +1. 
Ensure `FrameType` has: + + ```csharp + public enum FrameType : byte + { + Hello = 1, + Heartbeat = 2, + EndpointsUpdate = 3, + Request = 4, + RequestStreamData = 5, + Response = 6, + ResponseStreamData = 7, + Cancel = 8 + } + ``` + + You may not *use* `RequestStreamData` / `ResponseStreamData` in the InMemory implementation initially if you choose the bridging approach, but having them defined keeps the model coherent. + +2. Ensure `EndpointDescriptor` has: + + ```csharp + public bool SupportsStreaming { get; init; } + ``` + +3. Ensure `PayloadLimits` type exists (in Common or Config, but referenced by both): + + ```csharp + public sealed class PayloadLimits + { + public long MaxRequestBytesPerCall { get; set; } // per HTTP request + public long MaxRequestBytesPerConnection { get; set; } // per microservice connection + public long MaxAggregateInflightBytes { get; set; } // across all requests + } + ``` + +4. `ITransportClient` already contains: + + ```csharp + Task SendStreamingAsync( + ConnectionState connection, + Frame requestHeader, + Stream requestBody, + Func<Stream, Task> readResponseBody, + PayloadLimits limits, + CancellationToken ct); + ``` + + If not, add it now (implementation will be InMemory-only for this step). + +No logic in Common; just shapes. + +--- + +## 2. Gateway: payload budget tracker + +You need a small service in the gateway that tracks in‑flight bytes to enforce limits. + +**Project:** `StellaOps.Gateway.WebService` + +### 2.1 Define a budget interface + +```csharp +public interface IPayloadBudget +{ + bool TryReserve(string connectionId, Guid requestId, long bytes); + void Release(string connectionId, Guid requestId, long bytes); +} +``` + +### 2.2 Implement a simple in-memory tracker + +Implementation outline: + +* Track: + + * `long _globalInflightBytes`. + * `Dictionary<string, long> _perConnectionInflightBytes`. + * `Dictionary<Guid, long> _perRequestInflightBytes`. + +All updated under a lock or `ConcurrentDictionary` + `Interlocked`. 
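A minimal sketch of such a tracker, assuming a single lock rather than `ConcurrentDictionary` + `Interlocked`. The class name `InMemoryPayloadBudget` is hypothetical, and the mapping of each counter to a `PayloadLimits` property is an assumption. `PayloadLimits` and `IPayloadBudget` are repeated only so the block stands alone.

```csharp
using System;
using System.Collections.Generic;

// Repeated from the definitions above so this sketch compiles on its own.
public sealed class PayloadLimits
{
    public long MaxRequestBytesPerCall { get; set; }
    public long MaxRequestBytesPerConnection { get; set; }
    public long MaxAggregateInflightBytes { get; set; }
}

public interface IPayloadBudget
{
    bool TryReserve(string connectionId, Guid requestId, long bytes);
    void Release(string connectionId, Guid requestId, long bytes);
}

// Hypothetical implementation name; all counters are guarded by one lock.
public sealed class InMemoryPayloadBudget : IPayloadBudget
{
    private readonly PayloadLimits _limits;
    private readonly object _gate = new();
    private long _globalInflightBytes;
    private readonly Dictionary<string, long> _perConnection = new();
    private readonly Dictionary<Guid, long> _perRequest = new();

    public InMemoryPayloadBudget(PayloadLimits limits) => _limits = limits;

    public bool TryReserve(string connectionId, Guid requestId, long bytes)
    {
        lock (_gate)
        {
            // Missing keys read as 0 because TryGetValue leaves the default.
            _perConnection.TryGetValue(connectionId, out var conn);
            _perRequest.TryGetValue(requestId, out var req);

            // Reject if any proposed total would exceed its configured limit.
            if (_globalInflightBytes + bytes > _limits.MaxAggregateInflightBytes ||
                conn + bytes > _limits.MaxRequestBytesPerConnection ||
                req + bytes > _limits.MaxRequestBytesPerCall)
            {
                return false;
            }

            _globalInflightBytes += bytes;
            _perConnection[connectionId] = conn + bytes;
            _perRequest[requestId] = req + bytes;
            return true;
        }
    }

    public void Release(string connectionId, Guid requestId, long bytes)
    {
        lock (_gate)
        {
            _globalInflightBytes = Math.Max(0, _globalInflightBytes - bytes);
            Decrement(_perConnection, connectionId, bytes);
            Decrement(_perRequest, requestId, bytes);
        }
    }

    // Subtracts without going below zero and drops empty entries.
    private static void Decrement<TKey>(Dictionary<TKey, long> map, TKey key, long bytes)
        where TKey : notnull
    {
        if (!map.TryGetValue(key, out var current))
            return;

        var next = Math.Max(0, current - bytes);
        if (next == 0)
            map.Remove(key);
        else
            map[key] = next;
    }
}
```

A single lock keeps the three counters mutually consistent, which is harder to guarantee with independent atomic updates. It is likely fine at InMemory scale and can be revisited if contention shows up.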
+ +Logic for `TryReserve`: + +* Compute proposed: + + * `newGlobal = _globalInflightBytes + bytes` + * `newConn = perConnection[connectionId] + bytes` + * `newReq = perRequest[requestId] + bytes` +* If any exceed configured limits (`PayloadLimits` from config), return `false`. +* Else: + + * Commit updates and return `true`. + +`Release` subtracts the bytes, never going below zero. + +Register in DI: + +```csharp +services.AddSingleton(); +``` + +--- + +## 3. Gateway: choose buffered vs streaming path + +Extend `TransportDispatchMiddleware` to branch on mode. + +**Project:** `StellaOps.Gateway.WebService` + +### 3.1 Decide mode + +At the start of the middleware: + +```csharp +var decision = (RoutingDecision)context.Items[RouterHttpContextKeys.RoutingDecision]!; +var endpoint = decision.Endpoint; +var limits = _options.Value.PayloadLimits; // from RouterConfig + +var supportsStreaming = endpoint.SupportsStreaming; +var hasKnownLength = context.Request.ContentLength.HasValue; +var contentLength = context.Request.ContentLength ?? -1; + +// Simple rule for now: +var useStreaming = + supportsStreaming && + (!hasKnownLength || contentLength > limits.MaxRequestBytesPerCall); +``` + +* If `useStreaming == false`: + + * Use buffered path with hard size checks. +* If `useStreaming == true`: + + * Use streaming path (`ITransportClient.SendStreamingAsync`). + +--- + +## 4. Gateway: buffered path with limits + +**Still in `TransportDispatchMiddleware`** + +### 4.1 Early 413 check + +When `supportsStreaming == false`: + +1. If `Content-Length` known and: + + ```csharp + if (hasKnownLength && contentLength > limits.MaxRequestBytesPerCall) + { + context.Response.StatusCode = StatusCodes.Status413PayloadTooLarge; + return; + } + ``` + +2. When reading body into memory: + + * Read in chunks. + * Track `bytesReadThisCall`. + * If `bytesReadThisCall > limits.MaxRequestBytesPerCall`, abort and return 413. 
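The chunked read with a hard cap can be sketched as follows. `BoundedBodyReader` is a hypothetical helper name; returning `null` to signal "too large" is just one possible convention, and the caller would map it to a 413.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: buffers a request body but gives up as soon as
// the running total exceeds the per-call limit.
public static class BoundedBodyReader
{
    // Returns the buffered body, or null when the body exceeds maxBytes.
    public static async Task<byte[]?> ReadAsync(
        Stream body, long maxBytes, CancellationToken ct)
    {
        using var buffered = new MemoryStream();
        var chunk = new byte[16 * 1024];
        long bytesReadThisCall = 0;
        int read;

        while ((read = await body.ReadAsync(chunk.AsMemory(0, chunk.Length), ct)) > 0)
        {
            bytesReadThisCall += read;
            if (bytesReadThisCall > maxBytes)
                return null; // caller responds with 413 Payload Too Large

            buffered.Write(chunk, 0, read);
        }

        return buffered.ToArray();
    }
}
```

Because the check runs per chunk, a request with a missing or understated `Content-Length` still cannot buffer much more than the limit plus one chunk.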
+ +You don’t have to call `IPayloadBudget` for buffered mode yet; you can, but the hard per-call limit already protects RAM for this step. + +The buffered path then proceeds as before: + +* Build `MinimalRequestPayload` with full body. +* Send via `SendRequestAsync`. +* Map response. + +--- + +## 5. Gateway: streaming path (InMemory) + +This is the new part. + +### 5.1 Use `ITransportClient.SendStreamingAsync` + +In the `useStreaming == true` branch: + +```csharp +var correlationId = Guid.NewGuid(); + +var headerPayload = new MinimalRequestPayload +{ + Method = context.Request.Method, + Path = context.Request.Path.ToString(), + Headers = ExtractHeaders(context.Request), + Body = Array.Empty<byte>(), // streaming body will follow + IsStreaming = true // add this flag to your payload DTO +}; + +var headerFrame = new Frame +{ + Type = FrameType.Request, + CorrelationId = correlationId, + Payload = SerializeRequestPayload(headerPayload) +}; + +using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(context.RequestAborted); +linkedCts.CancelAfter(decision.EffectiveTimeout); + +// register cancel → SendCancelAsync (already done in step 7) + +await _transportClient.SendStreamingAsync( + decision.Connection, + headerFrame, + context.Request.Body, + async responseBodyStream => + { + // Copy microservice stream directly to HTTP response + await responseBodyStream.CopyToAsync(context.Response.Body, linkedCts.Token); + }, + limits, + linkedCts.Token); +``` + +Key points: + +* Streaming path does not buffer the whole body. +* Limits and cancellation are enforced inside `SendStreamingAsync`. + +--- + +## 6. InMemory transport: streaming implementation + +**Project:** gateway side InMemory `ITransportClient` implementation and InMemory router hub; microservice side connection. + +For InMemory, you can model streaming via **bridged streams**: a producer/consumer pair in memory. 
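One way to get such a pair without writing a custom stream class is `System.IO.Pipelines`. The sketch below assumes a hypothetical `BridgedStream` factory; only `Pipe`, `PipeWriter.AsStream()` and `PipeReader.AsStream()` are real library calls.

```csharp
using System.IO;
using System.IO.Pipelines;

// Hypothetical factory for a connected producer/consumer stream pair.
public static class BridgedStream
{
    // Bytes written to Writer become readable from Reader.
    // Disposing Writer completes the pipe, so Reader then sees end-of-stream.
    public static (Stream Writer, Stream Reader) Create()
    {
        var pipe = new Pipe();
        return (pipe.Writer.AsStream(), pipe.Reader.AsStream());
    }
}
```

The producer and consumer should run concurrently. A `Pipe` applies backpressure once its buffered data passes the pause threshold, so a writer that never gets read will eventually wait. That is the desired behavior here, because it stops the gateway from buffering an entire upload in memory.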
+ +### 6.1 Add streaming call to InMemory client + +In `InMemoryTransportClient`: + +```csharp +public async Task SendStreamingAsync( + ConnectionState connection, + Frame requestHeader, + Stream httpRequestBody, + Func<Stream, Task> readResponseBody, + PayloadLimits limits, + CancellationToken ct) +{ + await _hub.StreamFromGatewayAsync( + connection.ConnectionId, + requestHeader, + httpRequestBody, + readResponseBody, + limits, + ct); +} +``` + +Expose `StreamFromGatewayAsync` on `IInMemoryRouterHub`: + +```csharp +Task StreamFromGatewayAsync( + string connectionId, + Frame requestHeader, + Stream requestBody, + Func<Stream, Task> readResponseBody, + PayloadLimits limits, + CancellationToken ct); +``` + +### 6.2 InMemory hub streaming strategy (bridging style) + +Inside `StreamFromGatewayAsync`: + +1. Create a **pair of connected streams** for request body: + + * e.g., a custom `ProducerConsumerStream` built on a `Channel<T>` or `System.IO.Pipelines`. + * “Producer” side (writer) will be fed from HTTP. + * “Consumer” side will be given to the microservice as `RawRequestContext.Body`. + +2. Create a **pair of connected streams** for response body: + + * “Consumer” side will be used in `readResponseBody` to write to HTTP. + * “Producer” side will be given to the microservice handler to write response body. + +3. On the microservice side: + + * Build a `RawRequestContext` with `Body = requestBodyConsumerStream` and `CancellationToken = ct`. + * Dispatch to the endpoint handler as usual. + * Have the handler’s `RawResponse.WriteBodyAsync` pointed at `responseBodyProducerStream`. + +4. Parallel tasks: + + * Task 1: Copy HTTP → `requestBodyProducerStream` in chunks, enforcing `PayloadLimits` (see next section). + * Task 2: Execute the handler, which reads from `Body` and writes to `responseBodyProducerStream`. + * Task 3: Copy `responseBodyConsumerStream` → HTTP via `readResponseBody`. + +5. 
Propagate cancellation: + + * If `ct` is canceled (client disconnect/timeout/payload limit breach): + + * Stop HTTP→requestBody copy. + * Signal stream completion / cancellation to handler. + * Handler should see cancellation via `CancellationToken`. + +Because this is InMemory, you don’t *have* to materialize explicit `RequestStreamData` frames; you only need the behavior. Real transports will implement the same semantics with actual frames. + +--- + +## 7. Enforce payload limits in streaming copy + +Still in `StreamFromGatewayAsync` / InMemory side: + +### 7.1 HTTP → microservice copy with budget + +In Task 1: + +```csharp +var buffer = new byte[64 * 1024]; +int read; +long totalBytesReserved = 0; +var requestId = requestHeader.CorrelationId; +// connectionId is the parameter passed to StreamFromGatewayAsync + +try +{ + while ((read = await httpRequestBody.ReadAsync(buffer.AsMemory(0, buffer.Length), ct)) > 0) + { + if (!_budget.TryReserve(connectionId, requestId, read)) + { + // Limit exceeded: signal failure + if (_cancelCallback is not null) + await _cancelCallback(requestId, "PayloadLimitExceeded"); // or call SendCancelAsync + break; + } + + totalBytesReserved += read; + await requestBodyProducerStream.WriteAsync(buffer.AsMemory(0, read), ct); + } + + await requestBodyProducerStream.FlushAsync(ct); +} +finally +{ + // Release whatever was reserved, even on cancellation or failure + _budget.Release(connectionId, requestId, totalBytesReserved); + await requestBodyProducerStream.DisposeAsync(); +} +``` + +If `TryReserve` fails: + +* Stop reading further bytes. +* Trigger cancellation downstream: + + * Either call the existing `SendCancelAsync` path. + * Or signal completion with error and let handler catch cancellation. + +Gateway side should then translate this into 413 or 503 to the client. + +### 7.2 Response copy + +Response path doesn’t need budget tracking (the danger is inbound to gateway); but if you want symmetry, you can also enforce a max outbound size. + +For now, just stream microservice → HTTP through `readResponseBody` until EOF or cancellation. + +--- + +## 8. 
Microservice side: streaming-aware `RawRequestContext.Body` + +Your streaming bridging already gives the handler a `Stream` that reads what the gateway sends: + +* No changes required in handler interfaces. +* You only need to ensure: + + * `RawRequestContext.Body` **may be non-seekable**. + * Handlers know they must treat it as a forward-only stream. + +Guidance for devs in `Microservice.md`: + +* For binary uploads or large files, implement `IRawStellaEndpoint` and read incrementally: + + ```csharp + [StellaEndpoint("POST", "/billing/invoices/upload")] + public sealed class InvoiceUploadEndpoint : IRawStellaEndpoint + { + public async Task<RawResponse> HandleAsync(RawRequestContext ctx) + { + var buffer = new byte[64 * 1024]; + int read; + while ((read = await ctx.Body.ReadAsync(buffer.AsMemory(0, buffer.Length), ctx.CancellationToken)) > 0) + { + // Process chunk + } + + return new RawResponse { StatusCode = 204 }; + } + } + ``` + +--- + +## 9. Tests + +**Scope:** still InMemory, but now streaming & limits. + +### 9.1 Streaming happy path + +* Setup: + + * Endpoint with `SupportsStreaming = true`. + * `IRawStellaEndpoint` that: + + * Counts total bytes read from `ctx.Body`. + * Returns 200. + +* Test: + + * Send an HTTP POST with a body larger than `MaxRequestBytesPerCall`, but with streaming enabled. + * Assert: + + * Gateway does **not** buffer entire body in one array (you can assert via instrumentation or at least confirm no 413). + * Handler sees the full number of bytes. + * Response is 200. + +### 9.2 Per-call limit breach + +* Configure: + + * `SupportsStreaming = false` (or use streaming but set low `MaxRequestBytesPerCall`). +* Test: + + * Send a body larger than limit. + * Assert: + + * Gateway responds 413. + * Handler is not invoked at all. + +### 9.3 Global/in-flight limit breach + +* Configure: + + * `MaxAggregateInflightBytes` very low (e.g. 1 MB). +* Test: + + * Start multiple concurrent streaming requests that each try to send more than the allowed total. 
+ * Assert: + + * Some of them get a CANCEL / error (413 or 503). + * `IPayloadBudget` denies reservations and releases resources correctly. + +--- + +## 10. Done criteria for “Add streaming & payload limits (InMemory)” + +You’re done with this step when: + +* Gateway: + + * Chooses buffered vs streaming based on `EndpointDescriptor.SupportsStreaming` and size. + * Enforces `MaxRequestBytesPerCall` for buffered requests (413 on violation). + * Uses `ITransportClient.SendStreamingAsync` for streaming. + * Has an `IPayloadBudget` preventing excessive in-flight payload accumulation. + +* InMemory transport: + + * Implements `SendStreamingAsync` by bridging HTTP streams to microservice handlers and back. + * Enforces payload limits while copying. + +* Microservice: + + * Receives a functional `Stream` in `RawRequestContext.Body`. + * Can implement `IRawStellaEndpoint` that reads incrementally for large payloads. + +* Tests: + + * Demonstrate a streaming endpoint works for large payloads. + * Demonstrate per-call and aggregate limits are respected and cause rejections/cancellations. + +After this, you can reuse the same semantics when you implement real transports (TCP/TLS/RabbitMQ), with InMemory as your reference implementation. diff --git a/docs/router/09-Step.md b/docs/router/09-Step.md new file mode 100644 index 000000000..35284f19e --- /dev/null +++ b/docs/router/09-Step.md @@ -0,0 +1,562 @@ +For this step you’re taking the protocol you already proved with InMemory and putting it on real transports: + +* TCP (baseline) +* Certificate/TLS (secure TCP) +* UDP (small, non‑streaming) +* RabbitMQ + +The idea: every plugin implements the same `Frame` semantics (HELLO/HEARTBEAT/REQUEST/RESPONSE/CANCEL, plus streaming where supported), and the gateway/microservices don’t change their business logic at all. + +I’ll structure this as a sequence of sub‑steps you can execute in order. + +--- + +## 0. 
Preconditions + +Before you start adding real transports, make sure: + +* Frame model is stable in `StellaOps.Router.Common`: + + * `Frame`, `FrameType`, `TransportType`. +* Microservice and gateway code use **only**: + + * `ITransportClient` to send (gateway side). + * `ITransportServer` / connection abstractions to receive (gateway side). + * `IMicroserviceConnection` + `ITransportClient` under the hood (microservice side). +* InMemory transport is working with: + + * HELLO + * REQUEST / RESPONSE + * CANCEL + * Streaming & payload limits (step 8) + +If any code still directly talks to “InMemoryRouterHub” from app logic, hide it behind the `ITransportClient` / `ITransportServer` abstractions first. + +--- + +## 1. Freeze the wire protocol and serializer + +**Owner:** protocol / infra dev + +Before touching sockets or RabbitMQ, lock down **how a `Frame` is encoded** on the wire. This must be consistent across all transports except InMemory (which can cheat a bit internally). + +### 1.1 Frame header + +Define a simple binary header; for example: + +* 1 byte: `FrameType` +* 16 bytes: `CorrelationId` (`Guid`) +* 4 bytes: payload length (`int32`, big- or little-endian, but be consistent) + +Total header = 21 bytes. Then `payloadLength` bytes follow. + +You can evolve later but start with something simple. + +### 1.2 Frame serializer + +In a small shared, **non‑ASP.NET** assembly (either Common or a new `StellaOps.Router.Protocol` library), implement: + +```csharp +public interface IFrameSerializer +{ + void WriteFrame(Frame frame, Stream stream, CancellationToken ct); + Task WriteFrameAsync(Frame frame, Stream stream, CancellationToken ct); + + Frame ReadFrame(Stream stream, CancellationToken ct); + Task<Frame> ReadFrameAsync(Stream stream, CancellationToken ct); +} +``` + +Implementation: + +* Writes header then payload. +* Reads header then payload; throws on EOF. + +For payloads (HELLO, HEARTBEAT, etc.), use one encoding consistently (e.g.
`System.Text.Json` for now) and **centralize** DTO ⇒ `byte[]` conversions: + +```csharp +public static class PayloadCodec +{ + public static byte[] Encode<T>(T payload) { ... } + public static T Decode<T>(byte[] bytes) { ... } +} +``` + +All transports use `IFrameSerializer` + `PayloadCodec`. + +--- + +## 2. Introduce a transport registry / resolver + +**Projects:** gateway + microservice +**Owner:** infra dev + +You need a way to map `TransportType` to a concrete plugin. + +### 2.1 Gateway side + +Define: + +```csharp +public interface ITransportClientResolver +{ + ITransportClient GetClient(TransportType transportType); +} + +public interface ITransportServerFactory +{ + ITransportServer CreateServer(TransportType transportType); +} +``` + +Initial implementation: + +* Registers the available clients: + +```csharp +public sealed class TransportClientResolver : ITransportClientResolver +{ + private readonly IServiceProvider _sp; + + public TransportClientResolver(IServiceProvider sp) => _sp = sp; + + public ITransportClient GetClient(TransportType transportType) => + transportType switch + { + TransportType.Tcp => _sp.GetRequiredService<TcpTransportClient>(), + TransportType.Certificate => _sp.GetRequiredService<TlsTransportClient>(), + TransportType.Udp => _sp.GetRequiredService<UdpTransportClient>(), + TransportType.RabbitMq => _sp.GetRequiredService<RabbitMqTransportClient>(), + _ => throw new NotSupportedException($"Transport {transportType} not supported.") + }; +} +``` + +Then in `TransportDispatchMiddleware`, instead of injecting a single `ITransportClient`, inject `ITransportClientResolver` and choose: + +```csharp +var client = clientResolver.GetClient(decision.TransportType); +``` + +### 2.2 Microservice side + +On the microservice, you can do something similar: + +```csharp +internal interface IMicroserviceTransportConnector +{ + Task ConnectAsync(StellaMicroserviceOptions options, CancellationToken ct); +} +``` + +Implement one per transport type; later `StellaMicroserviceOptions.Routers` will determine which transport to use for each router
endpoint. + +--- + +## 3. Implement plugin 1: TCP + +Start with TCP; it’s the most straightforward and will largely mirror your InMemory behavior. + +### 3.1 Gateway: `TcpTransportServer` + +**Project:** `StellaOps.Gateway.WebService` or a transport sub-namespace. + +Responsibilities: + +* Listen on a configured TCP port (e.g. from `RouterConfig`). +* Accept connections, each mapping to a `ConnectionId`. +* For each connection: + + * Start a background receive loop: + + * Use `IFrameSerializer.ReadFrameAsync` on a `NetworkStream`. + * On `FrameType.Hello`: + + * Deserialize HELLO payload. + * Build a `ConnectionState` and register with `IGlobalRoutingState`. + * On `FrameType.Heartbeat`: + + * Update heartbeat for that `ConnectionId`. + * On `FrameType.Response` or `ResponseStreamData`: + + * Push frame into the gateway’s correlation / streaming handler (similar to InMemory path). + * On `FrameType.Cancel` (rare from microservice): + + * Optionally implement; can be ignored for now. + * Provide a sending API to the matching `TcpTransportClient` (gateway-side) using `WriteFrameAsync`. + +You will likely have: + +* A `TcpConnectionContext` per connected microservice: + + * Holds `ConnectionId`, `TcpClient`, `NetworkStream`, `TaskCompletionSource` maps for correlation IDs. + +### 3.2 Gateway: `TcpTransportClient` (gateway-side, to microservices) + +Implements `ITransportClient`: + +* `SendRequestAsync`: + + * Given `ConnectionState`: + + * Get the associated `TcpConnectionContext`. + * Register a `TaskCompletionSource` keyed by `CorrelationId`. + * Call `WriteFrameAsync(requestFrame)` on the connection’s stream. + * Await the TCS, which is completed in the receive loop when a `Response` frame arrives. +* `SendStreamingAsync`: + + * Write header `FrameType.Request`. 
+ * Read from `BudgetedRequestStream` in chunks: + + * For TCP plugin you can either: + + * Use `RequestStreamData` frames with chunk payloads, or + * Keep the simple bridging approach and send a single `Request` with all body bytes. + * Since you already validated streaming semantics with InMemory, you can decide: + + * For first version of TCP, **only support buffered data**, then add chunk frames later. +* `SendCancelAsync`: + + * Write a `FrameType.Cancel` frame with the same `CorrelationId`. + +### 3.3 Microservice: `TcpTransportClientConnection` + +**Project:** `StellaOps.Microservice` + +Responsibilities on microservice side: + +* For each `RouterEndpointConfig` where `TransportType == Tcp`: + + * Open a `TcpClient` to `Host:Port`. + * Use `IFrameSerializer` to send: + + * `HELLO` frame (payload = identity + descriptors). + * Periodic `HEARTBEAT` frames. + * `RESPONSE` frames for incoming `REQUEST`s. + +* Receive loop: + + * `ReadFrameAsync` from `NetworkStream`. + * On `REQUEST`: + + * Dispatch through `IEndpointDispatcher`. + * For minimal streaming, treat payload as buffered; you’ll align with streaming later. + * On `CANCEL`: + + * Use correlation ID to cancel the `CancellationTokenSource` you already maintain. + +This is conceptually the same as InMemory but using real sockets. + +--- + +## 4. Implement plugin 2: Certificate/TLS + +Build TLS on top of TCP plugin; do not fork logic unnecessarily. + +### 4.1 Gateway: `TlsTransportServer` + +* Wrap accepted `TcpClient` sockets in `SslStream`. +* Load server certificate from configuration (for the node/region). +* Authenticate client if you want mutual TLS. + +Structure: + +* Reuse almost all of `TcpTransportServer` logic, but instead of `NetworkStream` you use `SslStream` as the underlying stream for `IFrameSerializer`. + +### 4.2 Microservice: `TlsTransportClientConnection` + +* Instead of plain `TcpClient.GetStream`, wrap in `SslStream`. +* Authenticate server (hostname & certificate). 
+* Optional: present client certificate. + +Configuration fields in `RouterEndpointConfig` (or a TLS-specific sub-config): + +* `UseTls` / `TransportType.Certificate`. +* Certificate paths / thumbprints / validation parameters. + +At the SDK level, you just treat it as a different transport type; protocol remains identical. + +--- + +## 5. Implement plugin 3: UDP (small, non‑streaming) + +UDP is only for small, bounded payloads. No streaming, best‑effort delivery. + +### 5.1 Constraints + +* Use UDP **only** for buffered, small payload endpoints. +* No streaming (`SupportsStreaming` must be `false` for UDP endpoints). +* No guarantee of delivery or ordering; caller must tolerate occasional failures/timeouts. + +### 5.2 Gateway: `UdpTransportServer` + +Responsibilities: + +* Listen on a UDP port. +* Parse each incoming datagram as a full `Frame`: + + * `FrameType.Hello`: + + * Register a “logical connection” keyed by `(remoteEndpoint)` and `InstanceId`. + * `FrameType.Heartbeat`: + + * Update health for that logical connection. + * `FrameType.Response`: + + * Use `CorrelationId` and “connectionId” to complete a `TaskCompletionSource` as with TCP. + +Because UDP is connectionless, your `ConnectionId` can be: + +* A composite of microservice identity + remote endpoint, e.g. `"{instanceId}@{ip}:{port}"`. + +### 5.3 Gateway: `UdpTransportClient` (gateway-side) + +`SendRequestAsync`: + +* Serialize `Frame` to `byte[]`. +* Send via `UdpClient.SendAsync` to the remote endpoint from `ConnectionState`. +* Start a timer: + + * Wait for `Response` datagram with matching `CorrelationId`. + * If none comes within timeout → throw `OperationCanceledException`. + +`SendStreamingAsync`: + +* For this first iteration, **throw NotSupportedException**. +* Router should not route streaming endpoints over UDP; your routing config should enforce that. + +`SendCancelAsync`: + +* Optionally send a CANCEL datagram; but in practice, if requests are small, this is less useful. 
You can still implement it for symmetry. + +### 5.4 Microservice: UDP connection + +For microservice side: + +* A single `UdpClient` bound to a local port. +* For each configured router (host/port): + + * HELLO: send a `FrameType.Hello` datagram. + * HEARTBEAT: send periodic `FrameType.Heartbeat`. + * REQUEST handling: not needed; UDP plugin is used **for gateway → microservice** only if you design it that way. More likely, microservice is the server in TCP, but for UDP you might decide microservice is listening on port and gateway sends requests. So invert roles if needed. + +Given the complexity and limited utility, you can treat UDP as “advanced/optional transport” and implement it last. + +--- + +## 6. Implement plugin 4: RabbitMQ + +This is conceptually similar to what you had in Serdica. + +### 6.1 Exchange/queue design + +Decide and document (in `Protocol & Transport Specification.md`) something like: + +* Exchange: `stella.router` +* Routing keys: + + * `request.{serviceName}.{version}` — gateway → microservice. + * Microservice’s reply queue per instance: `reply.{serviceName}.{version}.{instanceId}`. + +Rabbit usages: + +* Gateway: + + * Publishes REQUEST frames to `request.{serviceName}.{version}`. + * Consumes from `reply.*` for responses. + +* Microservice: + + * Consumes from `request.{serviceName}.{version}`. + * Publishes responses to its own reply queue; sets `CorrelationId` property. + +### 6.2 Gateway: `RabbitMqTransportClient` + +Implements `ITransportClient`: + +* `SendRequestAsync`: + + * Create a message with: + + * Body = serialized `Frame` (REQUEST or buffered streaming). + * Properties: + + * `CorrelationId` = `frame.CorrelationId`. + * `ReplyTo` = microservice’s reply queue name for this instance. + * Publish to `request.{serviceName}.{version}`. + * Await a response: + + * Consumer on reply queue completes a `TaskCompletionSource` keyed by correlation ID. 
+ +* `SendStreamingAsync`: + + * For v1, you can: + + * Only support buffered endpoints over RabbitMQ (like UDP). + * Or send chunked messages (`RequestStreamData` frames as separate messages) and reconstruct on microservice side. + * I’d recommend: + + * Start with buffered only over RabbitMQ. + * Mark Rabbit as “no streaming support yet” in config. + +* `SendCancelAsync`: + + * Option 1: send a separate CANCEL message with same `CorrelationId`. + * Option 2: rely on timeout; cancellation doesn’t buy much given overhead. + +### 6.3 Microservice: RabbitMQ listener + +* Single `IConnection` and `IModel`. + +* Declare and bind: + + * Service request queue: `request.{serviceName}.{version}`. + * Reply queue: `reply.{serviceName}.{version}.{instanceId}`. + +* Consume request queue: + + * On message: + + * Deserialize `Frame`. + * Dispatch through `IEndpointDispatcher`. + * Publish RESPONSE message to `ReplyTo` queue with same `CorrelationId`. + +If you already have RabbitMQ experience from Serdica, this should feel familiar. + +--- + +## 7. Routing config & transport selection + +**Projects:** router config + microservice options +**Owner:** config / platform dev + +You need to define which transport is actually used in production. + +### 7.1 Gateway config (RouterConfig) + +Per service/instance, store: + +* `TransportType` to listen on / expect connections for. +* Ports / Rabbit URLs / TLS settings. + +Example shape in `RouterConfig`: + +```csharp +public sealed class ServiceInstanceConfig +{ + public string ServiceName { get; set; } = string.Empty; + public string Version { get; set; } = string.Empty; + public string Region { get; set; } = string.Empty; + public TransportType TransportType { get; set; } = TransportType.Udp; // default + public int Port { get; set; } // for TCP/UDP/TLS + public string? RabbitConnectionString { get; set; } + // TLS info, etc. +} +``` + +`StellaOps.Gateway.WebService` startup: + +* Reads these configs. 
+* Starts corresponding `ITransportServer` instances. + +### 7.2 Microservice options + +`StellaMicroserviceOptions.Routers` entries must define: + +* `Host` +* `Port` +* `TransportType` +* Any transport-specific settings (TLS, Rabbit URL). + +At connect time, microservice chooses: + +* For each `RouterEndpointConfig`, instantiate the right connector: + + ```csharp + switch(config.TransportType) + { + case TransportType.Tcp: + use TcpMicroserviceConnector; + break; + case TransportType.Certificate: + use TlsMicroserviceConnector; + break; + case TransportType.Udp: + use UdpMicroserviceConnector; + break; + case TransportType.RabbitMq: + use RabbitMqMicroserviceConnector; + break; + } + ``` + +--- + +## 8. Implementation order & testing strategy + +**Owner:** tech lead + +Do NOT try to implement all at once. Suggested order: + +1. **TCP**: + + * Reuse InMemory test suite: + + * HELLO + endpoint registration. + * REQUEST → RESPONSE. + * CANCEL. + * Heartbeats. + * (Optional) streaming as buffered stub for v1, then add genuine streaming. + +2. **Certificate/TLS**: + + * Wrap TCP logic in TLS. + * Same tests, plus: + + * Certificate validation. + * Mutual TLS if required. + +3. **RabbitMQ**: + + * Start with buffered-only endpoints. + * Mirror existing InMemory/TCP tests where payloads are small. + * Add tests for connection resilience (reconnect, etc.). + +4. **UDP**: + + * Implement only for very small buffered requests; no streaming. + * Add tests that verify: + + * HELLO + basic health. + * REQUEST → RESPONSE with small payload. + * Proper timeouts. + +At each stage, tests for that plugin must reuse the **same microservice and gateway** code that worked with InMemory. Only the transport factories change. + +--- + +## 9. 
Done criteria for “Implement real transport plugins one by one” + +You can consider step 9 done when: + +* There are **concrete implementations** of `ITransportServer` + `ITransportClient` for: + + * TCP + * Certificate/TLS + * UDP (buffered only) + * RabbitMQ (buffered at minimum) +* Gateway startup: + + * Reads `RouterConfig`. + * Starts appropriate transport servers per node/region. +* Microservice SDK: + + * Reads `StellaMicroserviceOptions.Routers`. + * Connects to router nodes using the configured `TransportType`. + * Uses the same HELLO/HEARTBEAT/REQUEST/RESPONSE/CANCEL semantics as InMemory. +* The same functional tests that passed for InMemory: + + * Now pass with TCP plugin. + * At least a subset pass with TLS, Rabbit, and UDP, honoring their constraints (no streaming on UDP, etc.). + +From there, you can move into hardening each plugin (reconnect, backoff, error handling) and documenting “which transport to use when” in your router docs. diff --git a/docs/router/10-Step.md b/docs/router/10-Step.md new file mode 100644 index 000000000..81ce479d2 --- /dev/null +++ b/docs/router/10-Step.md @@ -0,0 +1,586 @@ +For this step you’re wiring **configuration** into the system properly: + +* Router reads a strongly‑typed config model (including payload limits, node region, transports). +* Microservices can optionally load a YAML file to **override** endpoint metadata discovered by reflection. +* No behavior changes to routing or transports, just how they get their settings. + +Think “config plumbing and merging rules,” not new business logic. + +--- + +## 0. Preconditions + +Before starting, confirm: + +* `__Libraries/StellaOps.Router.Config` project exists and references `StellaOps.Router.Common`. +* `StellaOps.Microservice` has: + + * `StellaMicroserviceOptions` (ServiceName, Version, Region, InstanceId, Routers, ConfigFilePath). + * Reflection‑based endpoint discovery that produces `EndpointDescriptor` instances. 
+* Gateway and microservices currently use **hardcoded** or stub config; you’re about to replace that with real config. + +--- + +## 1. Define RouterConfig model and YAML schema + +**Project:** `__Libraries/StellaOps.Router.Config` +**Owner:** config / platform dev + +### 1.1 C# model + +Create clear, minimal models to cover current needs (you can extend later): + +```csharp +namespace StellaOps.Router.Config; + +public sealed class RouterConfig +{ + public GatewayNodeConfig Node { get; set; } = new(); + public PayloadLimits PayloadLimits { get; set; } = new(); + public IList<TransportEndpointConfig> Transports { get; set; } = new List<TransportEndpointConfig>(); + public IList<ServiceConfig> Services { get; set; } = new List<ServiceConfig>(); +} + +public sealed class GatewayNodeConfig +{ + public string NodeId { get; set; } = string.Empty; + public string Region { get; set; } = string.Empty; + public string Environment { get; set; } = "prod"; +} + +public sealed class TransportEndpointConfig +{ + public TransportType TransportType { get; set; } + public int Port { get; set; } // for TCP/UDP/TLS + public bool Enabled { get; set; } = true; + + // TLS-specific + public string? ServerCertificatePath { get; set; } + public string? ServerCertificatePassword { get; set; } + public bool RequireClientCertificate { get; set; } + + // Rabbit-specific + public string? RabbitConnectionString { get; set; } +} + +public sealed class ServiceConfig +{ + public string Name { get; set; } = string.Empty; + public string DefaultVersion { get; set; } = "1.0.0"; + public IList<string> NeighborRegions { get; set; } = new List<string>(); +} +``` + +Use the `PayloadLimits` class from Common (or mirror it here and keep a single definition).
+ +### 1.2 YAML shape + +Decide and document a YAML layout, e.g.: + +```yaml +node: + nodeId: "gw-eu1-01" + region: "eu1" + environment: "prod" + +payloadLimits: + maxRequestBytesPerCall: 10485760 # 10 MB + maxRequestBytesPerConnection: 52428800 + maxAggregateInflightBytes: 209715200 + +transports: + - transportType: Tcp + port: 45000 + enabled: true + - transportType: Certificate + port: 45001 + enabled: false + serverCertificatePath: "certs/router.pfx" + serverCertificatePassword: "secret" + - transportType: Udp + port: 45002 + enabled: true + - transportType: RabbitMq + enabled: true + rabbitConnectionString: "amqp://guest:guest@localhost:5672" + +services: + - name: "Billing" + defaultVersion: "1.0.0" + neighborRegions: ["eu2", "us1"] + - name: "Identity" + defaultVersion: "2.1.0" + neighborRegions: ["eu2"] +``` + +This YAML is the canonical config for the router; environment variables and JSON can override individual properties later via `IConfiguration`. + +--- + +## 2. Implement Router.Config loader and DI extensions + +**Project:** `StellaOps.Router.Config` + +### 2.1 Choose YAML library + +Add a YAML library (e.g. YamlDotNet) to `StellaOps.Router.Config`: + +```bash +dotnet add src/__Libraries/StellaOps.Router.Config/StellaOps.Router.Config.csproj package YamlDotNet +``` + +### 2.2 Implement simple loader + +Provide a helper that can load YAML into `RouterConfig`: + +```csharp +public static class RouterConfigLoader +{ + public static RouterConfig LoadFromYaml(string path) + { + using var reader = new StreamReader(path); + var yaml = new YamlStream(); + yaml.Load(reader); + + var root = (YamlMappingNode)yaml.Documents[0].RootNode; + var json = ConvertYamlToJson(root); // simplest: walk node, serialize to JSON string + + // System.Text.Json matches property names case-sensitively by default and does not + // read enums from strings, so opt in to both for camelCase keys and values like "Tcp". + var options = new JsonSerializerOptions + { + PropertyNameCaseInsensitive = true, + Converters = { new JsonStringEnumConverter() } + }; + return JsonSerializer.Deserialize<RouterConfig>(json, options)!; + } +} +``` + +Alternatively, bind YAML directly to `RouterConfig` with YamlDotNet’s object mapping; the detail is implementation‑specific.
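
The direct-binding alternative mentioned in 2.2 could look roughly like the sketch below. It assumes the YamlDotNet package is referenced and uses trimmed stand-ins for the real models so that it compiles on its own; the `RouterYaml` helper name and the order of the `TransportType` members are illustrative assumptions, not part of the spec.

```csharp
using System.Collections.Generic;
using YamlDotNet.Serialization;
using YamlDotNet.Serialization.NamingConventions;

// Trimmed stand-ins for the real config models (assumption: the real ones
// live in StellaOps.Router.Common / StellaOps.Router.Config).
public enum TransportType { Tcp, Certificate, Udp, RabbitMq }

public sealed class TransportEndpointConfig
{
    public TransportType TransportType { get; set; }
    public int Port { get; set; }
    public bool Enabled { get; set; } = true;
}

public sealed class GatewayNodeConfig
{
    public string NodeId { get; set; } = string.Empty;
    public string Region { get; set; } = string.Empty;
}

public sealed class RouterConfig
{
    public GatewayNodeConfig Node { get; set; } = new();
    public List<TransportEndpointConfig> Transports { get; set; } = new();
}

// Hypothetical helper: binds YAML text straight onto RouterConfig.
public static class RouterYaml
{
    public static RouterConfig Parse(string yamlText)
    {
        var deserializer = new DeserializerBuilder()
            // YAML keys are camelCase, C# properties are PascalCase.
            .WithNamingConvention(CamelCaseNamingConvention.Instance)
            // Tolerate keys this trimmed model does not declare.
            .IgnoreUnmatchedProperties()
            .Build();

        // An empty document deserializes to null, so fall back to defaults.
        return deserializer.Deserialize<RouterConfig>(yamlText) ?? new RouterConfig();
    }
}
```

Compared with the YAML-to-JSON detour, this keeps a single parser in play, at the cost of tying the config models to YamlDotNet's naming and enum-parsing rules.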
+ +### 2.3 ASP.NET Core integration extension + +In the router library, add a DI extension the gateway can call: + +```csharp +public static class ServiceCollectionExtensions +{ + public static IServiceCollection AddRouterConfig( + this IServiceCollection services, + IConfiguration configuration) + { + // Bind the section handed in by the caller. Configure<T> also makes + // IOptions<RouterConfig> and IOptionsMonitor<RouterConfig> resolvable, + // so no extra registration is needed. + services.Configure<RouterConfig>(configuration); + + return services; + } +} +``` + +Gateway will: + +* Add the YAML file to the configuration builder. +* Call `AddRouterConfig` to bind it. + +--- + +## 3. Wire RouterConfig into Gateway startup & components + +**Project:** `StellaOps.Gateway.WebService` +**Owner:** gateway dev + +### 3.1 Program.cs configuration + +Adjust `Program.cs`: + +```csharp +var builder = WebApplication.CreateBuilder(args); + +// add YAML config +builder.Configuration + .AddJsonFile("appsettings.json", optional: true) + .AddYamlFile("router.yaml", optional: false, reloadOnChange: true) + .AddEnvironmentVariables("STELLAOPS_"); + +// bind RouterConfig +builder.Services.AddRouterConfig(builder.Configuration.GetSection("Router")); + +var app = builder.Build(); +``` + +Key points: + +* `AddYamlFile("router.yaml", optional: false, reloadOnChange: true)` ensures hot‑reload from YAML. `AddYamlFile` is not part of `Microsoft.Extensions.Configuration` itself; it comes from a community provider package such as `NetEscapades.Configuration.Yaml`. +* The `Router` section is what gets bound, so the settings in `router.yaml` must sit under a top-level `router:` key (or pass `builder.Configuration` itself to bind from the root). +* `AddEnvironmentVariables("STELLAOPS_")` allows env‑based overrides (optional, but useful). + +### 3.2 Inject config into transport factories and routing + +Where you start transports: + +* Inject `IOptionsMonitor<RouterConfig>` into your `ITransportServerFactory`, and use `RouterConfig.Transports` to know which servers to create and on which ports. + +Where you need node identity: + +* Inject `IOptionsMonitor<RouterConfig>` into any service needing `GatewayNodeConfig` (e.g. when building `RoutingContext.GatewayRegion`): + + ```csharp + var nodeRegion = routerConfig.CurrentValue.Node.Region; + ``` + +Where you need payload limits: + +* Inject `IOptionsMonitor<RouterConfig>` into `IPayloadBudget` or `TransportDispatchMiddleware` to fetch current `PayloadLimits`.
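
As a rough illustration of the payload-limits point above, the sketch below reads the limits on every reservation instead of caching them. To stay dependency-free it takes a `Func<PayloadLimits>` where the gateway would pass something like `() => monitor.CurrentValue.PayloadLimits`. The `PayloadBudgetSketch` type, its `TryReserve`/`Release` methods, and the `long` property types are assumptions for illustration; the real `IPayloadBudget` contract is not shown here.

```csharp
using System;
using System.Threading;

// Stand-in for the real PayloadLimits (assumption: byte counts are long).
public sealed class PayloadLimits
{
    public long MaxRequestBytesPerCall { get; set; }
    public long MaxAggregateInflightBytes { get; set; }
}

// Hypothetical budget that consults the current limits on each call.
public sealed class PayloadBudgetSketch
{
    private readonly Func<PayloadLimits> _currentLimits;
    private long _inflightBytes;

    public PayloadBudgetSketch(Func<PayloadLimits> currentLimits) =>
        _currentLimits = currentLimits;

    public long InflightBytes => Interlocked.Read(ref _inflightBytes);

    public bool TryReserve(long bytes)
    {
        // Read fresh limits so a reloaded config takes effect immediately.
        var limits = _currentLimits();
        if (bytes < 0 || bytes > limits.MaxRequestBytesPerCall)
            return false;

        // Optimistically add, then roll back if the aggregate cap is exceeded.
        var total = Interlocked.Add(ref _inflightBytes, bytes);
        if (total > limits.MaxAggregateInflightBytes)
        {
            Interlocked.Add(ref _inflightBytes, -bytes);
            return false;
        }

        return true;
    }

    public void Release(long bytes) => Interlocked.Add(ref _inflightBytes, -bytes);
}
```

Reading through a delegate rather than a captured snapshot is what lets a changed `router.yaml` affect the very next request without restarting the gateway.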
+ +Because you’re using `IOptionsMonitor`, components can react to changes when `router.yaml` is modified. + +--- + +## 4. Microservice YAML: schema & loader + +**Project:** `__Libraries/StellaOps.Microservice` +**Owner:** SDK dev + +Microservice YAML is optional and used **only** to override endpoint metadata, not to define identity or router pool. + +### 4.1 Define YAML shape + +Keep it focused on endpoints and overrides: + +```yaml +service: + serviceName: "Billing" + version: "1.0.0" + region: "eu1" + +endpoints: + - method: "POST" + path: "/billing/invoices/upload" + defaultTimeout: "00:02:00" + supportsStreaming: true + requiringClaims: + - type: "role" + value: "billing-editor" + - method: "GET" + path: "/billing/invoices/{id}" + defaultTimeout: "00:00:10" + requiringClaims: + - type: "role" + value: "billing-reader" +``` + +Identity (`serviceName`, `version`, `region`) in YAML is **informative**; the authoritative values still come from `StellaMicroserviceOptions`. If they differ, you log, but don’t override options from YAML. + +### 4.2 C# model + +In `StellaOps.Microservice`: + +```csharp +internal sealed class MicroserviceYamlConfig +{ + public MicroserviceYamlService? Service { get; set; } + public IList Endpoints { get; set; } = new List(); +} + +internal sealed class MicroserviceYamlService +{ + public string? ServiceName { get; set; } + public string? Version { get; set; } + public string? Region { get; set; } +} + +internal sealed class MicroserviceYamlEndpoint +{ + public string Method { get; set; } = string.Empty; + public string Path { get; set; } = string.Empty; + public string? DefaultTimeout { get; set; } + public bool? SupportsStreaming { get; set; } + public IList RequiringClaims { get; set; } = new List(); +} +``` + +### 4.3 YAML loader + +Reuse YamlDotNet (add package to `StellaOps.Microservice` if needed): + +```csharp +internal interface IMicroserviceYamlLoader +{ + MicroserviceYamlConfig? Load(string? 
path); +} + +internal sealed class MicroserviceYamlLoader : IMicroserviceYamlLoader +{ + private readonly ILogger<MicroserviceYamlLoader> _logger; + + public MicroserviceYamlLoader(ILogger<MicroserviceYamlLoader> logger) + { + _logger = logger; + } + + public MicroserviceYamlConfig? Load(string? path) + { + if (string.IsNullOrWhiteSpace(path) || !File.Exists(path)) + return null; + + try + { + using var reader = new StreamReader(path); + // The YAML keys are camelCase (serviceName, defaultTimeout, ...), so map them + // onto the PascalCase properties explicitly. + var deserializer = new DeserializerBuilder() + .WithNamingConvention(CamelCaseNamingConvention.Instance) + .Build(); + return deserializer.Deserialize<MicroserviceYamlConfig>(reader); + } + catch (Exception ex) + { + _logger.LogError(ex, "Failed to load microservice YAML from {Path}", path); + return null; + } + } +} +``` + +Register in DI: + +```csharp +services.AddSingleton<IMicroserviceYamlLoader, MicroserviceYamlLoader>(); +``` + +--- + +## 5. Merge YAML overrides with reflection-discovered endpoints + +**Project:** `StellaOps.Microservice` +**Owner:** SDK dev + +Extend `EndpointCatalog` to apply YAML overrides. + +### 5.1 Extend constructor to accept YAML config + +Adjust `EndpointCatalog`: + +```csharp +internal sealed class EndpointCatalog : IEndpointCatalog +{ + public IReadOnlyList<EndpointDescriptor> Descriptors { get; } + + private readonly Dictionary<(string Method, string Path), EndpointRegistration> _map; + + public EndpointCatalog( + IEndpointDiscovery discovery, + IMicroserviceYamlLoader yamlLoader, + IOptions<StellaMicroserviceOptions> optionsAccessor) + { + var options = optionsAccessor.Value; + + var registrations = discovery.DiscoverEndpoints(options); + var yamlConfig = yamlLoader.Load(options.ConfigFilePath); + + registrations = ApplyYamlOverrides(registrations, yamlConfig); + + // StringComparer cannot compare tuple keys, so normalize the method here; + // add a custom IEqualityComparer if paths must also match case-insensitively. + _map = registrations.ToDictionary( + r => (r.Descriptor.Method.ToUpperInvariant(), r.Descriptor.Path), + r => r); + + Descriptors = registrations.Select(r => r.Descriptor).ToArray(); + } +} +``` + +### 5.2 Implement `ApplyYamlOverrides` + +Key rules: + +* Identity (ServiceName, Version, Region) always comes from `StellaMicroserviceOptions`.
+* YAML can override: + + * `DefaultTimeout` + * `SupportsStreaming` + * `RequiringClaims` + +Implementation sketch: + +```csharp +private static IReadOnlyList<EndpointRegistration> ApplyYamlOverrides( + IReadOnlyList<EndpointRegistration> registrations, + MicroserviceYamlConfig? yaml) +{ + if (yaml is null || yaml.Endpoints.Count == 0) + return registrations; + + // Key on "METHOD path" as a single string so StringComparer can give + // case-insensitive matching (it cannot compare tuple keys). + var overrideMap = yaml.Endpoints.ToDictionary( + e => $"{e.Method} {e.Path}", + e => e, + StringComparer.OrdinalIgnoreCase); + + var result = new List<EndpointRegistration>(registrations.Count); + + foreach (var reg in registrations) + { + if (!overrideMap.TryGetValue($"{reg.Descriptor.Method} {reg.Descriptor.Path}", out var ov)) + { + result.Add(reg); + continue; + } + + var desc = reg.Descriptor; + + var timeout = desc.DefaultTimeout; + if (!string.IsNullOrWhiteSpace(ov.DefaultTimeout) && + TimeSpan.TryParse(ov.DefaultTimeout, out var parsed)) + { + timeout = parsed; + } + + var supportsStreaming = desc.SupportsStreaming; + if (ov.SupportsStreaming.HasValue) + { + supportsStreaming = ov.SupportsStreaming.Value; + } + + var requiringClaims = ov.RequiringClaims.Count > 0 + ? ov.RequiringClaims.ToArray() + : desc.RequiringClaims; + + var overriddenDescriptor = new EndpointDescriptor + { + ServiceName = desc.ServiceName, + Version = desc.Version, + Method = desc.Method, + Path = desc.Path, + DefaultTimeout = timeout, + SupportsStreaming = supportsStreaming, + RequiringClaims = requiringClaims + }; + + result.Add(new EndpointRegistration + { + Descriptor = overriddenDescriptor, + HandlerType = reg.HandlerType + }); + } + + return result; +} +``` + +This ensures code defines the set of endpoints; YAML only tunes metadata. + +--- + +## 6. Hot‑reload / YAML change handling + +**Router side:** you already enabled `reloadOnChange` for `router.yaml`, and use `IOptionsMonitor<RouterConfig>`.
Next: + +* Components that care about changes must **react**: + + * Payload limits: + + * `IPayloadBudget` or `TransportDispatchMiddleware` should read `routerConfig.CurrentValue.PayloadLimits` on each request rather than caching. + * Node region: + + * `RoutingContext.GatewayRegion` can be built from `routerConfig.CurrentValue.Node.Region` per request. + +You do **not** need a custom watcher; `IOptionsMonitor` already tracks config changes. + +**Microservice side:** for now you can start with **load-on-startup** YAML. If you want hot‑reload: + +* Implement a FileSystemWatcher in `MicroserviceYamlLoader` or a small `IHostedService`: + + * Watch `options.ConfigFilePath` for changes. + * On change: + + * Reload YAML. + * Rebuild `EndpointDescriptor` list. + * Send an updated HELLO or an ENDPOINTS_UPDATE frame to router. + +Given complexity, you can postpone true hot reload to a later iteration and document that microservices must be restarted to pick up YAML changes. + +--- + +## 7. Tests + +**Router.Config tests:** + +* Unit tests for `RouterConfigLoader`: + + * Given a YAML string, bind to `RouterConfig` properly. + * Validate `TransportType.Tcp` / `Udp` / `RabbitMq` values map correctly. + +* Integration test: + + * Start gateway with `router.yaml`. + * Access `IOptionsMonitor` in a test controller or test service and assert values. + * Modify YAML on disk (if test infra allows) and ensure values update via `IOptionsMonitor`. + +**Microservice YAML tests:** + +* Unit tests for `MicroserviceYamlLoader`: + + * Load valid YAML, confirm endpoints and claims/timeouts parsed. + +* `EndpointCatalog` tests: + + * Build fake `EndpointRegistration` list from reflection. + * Build YAML overrides. + * Call `ApplyYamlOverrides` and assert: + + * Timeouts updated. + * SupportsStreaming updated. + * RequiringClaims replaced where provided. + * Descriptors with no matching YAML remain unchanged. + +--- + +## 8. Documentation updates + +Update docs under `docs/router`: + +1. 
**Stella Ops Router – Webserver.md**: + + * Describe `router.yaml`: + + * Node config (region, nodeId). + * PayloadLimits. + * Transports. + * Explain precedence: + + * YAML as base. + * Environment variables can override individual fields via `STELLAOPS_Router__Node__Region` etc. + +2. **Stella Ops Router – Microservice.md**: + + * Explain `ConfigFilePath` in `StellaMicroserviceOptions`. + * Show full example microservice YAML and how it maps to endpoint metadata. + * Clearly state: + + * Identity comes from options (code/config), not YAML. + * YAML can override per‑endpoint timeout, streaming flag, requiringClaims. + * YAML can’t add endpoints that don’t exist in code. + +3. **Stella Ops Router Documentation.md**: + + * Add a short “Configuration” chapter: + + * Where `router.yaml` lives. + * Where microservice YAML lives. + * How to run locally with custom configs. + +--- + +## 9. Done criteria for “Add Router.Config + Microservice YAML integration” + +You can call step 10 complete when: + +* Router: + + * Loads `router.yaml` into `RouterConfig` using `StellaOps.Router.Config`. + * Uses `RouterConfig.Node.Region` when building routing context. + * Uses `RouterConfig.PayloadLimits` for payload budget enforcement. + * Uses `RouterConfig.Transports` to start the right `ITransportServer` instances. + * Supports runtime changes to `router.yaml` via `IOptionsMonitor` for at least node identity and payload limits. + +* Microservice: + + * Accepts optional `ConfigFilePath` in `StellaMicroserviceOptions`. + * Loads YAML (when present) and merges overrides into reflection‑discovered endpoints. + * Sends HELLO with the **merged** descriptors (i.e., YAML-aware defaults). + * Behavior remains unchanged when no YAML is provided (pure reflection mode). + +* Tests: + + * Confirm config binding for router and microservice. + * Confirm YAML overrides are applied correctly to endpoint metadata. 
+ +At that point, configuration is no longer hardcoded, and you have a clear, documented path for both router operators and microservice teams to configure behavior via YAML with predictable precedence. diff --git a/docs/router/11-Step.md b/docs/router/11-Step.md new file mode 100644 index 000000000..1bb55c0f4 --- /dev/null +++ b/docs/router/11-Step.md @@ -0,0 +1,550 @@ +Goal for this step: have a **concrete, runnable example** (gateway + one microservice) and a **clear skeleton** for migrating any existing `StellaOps.*.WebService` into `StellaOps.*.Microservice`. After this, devs should be able to: + +* Run a full vertical slice locally. +* Open a “migration cookbook” and follow a predictable recipe. + +I’ll split it into two tracks: reference example, then migration skeleton. + +--- + +## 1. Reference example: “Billing” vertical slice + +### 1.1 Create the sample microservice project + +**Project:** `src/StellaOps.Billing.Microservice` +**Owner:** feature/example dev + +Tasks: + +1. Create the project: + +```bash +cd src +dotnet new worker -n StellaOps.Billing.Microservice +``` + +2. Add references: + +```bash +dotnet add StellaOps.Billing.Microservice/StellaOps.Billing.Microservice.csproj reference \ + __Libraries/StellaOps.Microservice/StellaOps.Microservice.csproj +dotnet add StellaOps.Billing.Microservice/StellaOps.Billing.Microservice.csproj reference \ + __Libraries/StellaOps.Router.Common/StellaOps.Router.Common.csproj +``` + +3. 
In `Program.cs`, wire the SDK with **InMemory transport** for now: + +```csharp +var builder = Host.CreateApplicationBuilder(args); + +builder.Services.AddStellaMicroservice(opts => +{ + opts.ServiceName = "Billing"; + opts.Version = "1.0.0"; + opts.Region = "eu1"; + opts.InstanceId = $"billing-{Environment.MachineName}"; + opts.Routers.Add(new RouterEndpointConfig + { + Host = "localhost", + Port = 50050, // to match gateway’s InMemory/TCP harness + TransportType = TransportType.Tcp + }); + opts.ConfigFilePath = "billing.microservice.yaml"; // optional overrides +}); + +var app = builder.Build(); +await app.RunAsync(); +``` + +(You can keep `TransportType` as TCP even if implemented in-process for now; once real TCP is in, nothing changes here.) + +--- + +### 1.2 Implement a few canonical endpoints + +Pick 3–4 endpoints that exercise different features: + +1. **Health / contract check** + +```csharp +[StellaEndpoint("GET", "/ping")] +public sealed class PingEndpoint : IRawStellaEndpoint +{ + public Task<RawResponse> HandleAsync(RawRequestContext ctx) + { + var resp = new RawResponse { StatusCode = 200 }; + resp.Headers["Content-Type"] = "text/plain"; + resp.WriteBodyAsync = async stream => + { + await stream.WriteAsync("pong"u8.ToArray(), ctx.CancellationToken); + }; + return Task.FromResult(resp); + } +} +``` + +2. **Simple JSON read/write (non-streaming)** + +```csharp +public sealed record CreateInvoiceRequest(string CustomerId, decimal Amount); +public sealed record CreateInvoiceResponse(Guid Id); + +[StellaEndpoint("POST", "/billing/invoices")] +public sealed class CreateInvoiceEndpoint : IStellaEndpoint<CreateInvoiceRequest, CreateInvoiceResponse> +{ + public Task<CreateInvoiceResponse> HandleAsync(CreateInvoiceRequest req, CancellationToken ct) + { + // pretend to store in DB + return Task.FromResult(new CreateInvoiceResponse(Guid.NewGuid())); + } +} +``` + +3. 
**Streaming upload (large file)** + +```csharp +[StellaEndpoint("POST", "/billing/invoices/upload")] +public sealed class InvoiceUploadEndpoint : IRawStellaEndpoint +{ + public async Task<RawResponse> HandleAsync(RawRequestContext ctx) + { + var buffer = new byte[64 * 1024]; + var total = 0L; + + int read; + while ((read = await ctx.Body.ReadAsync(buffer.AsMemory(0, buffer.Length), ctx.CancellationToken)) > 0) + { + total += read; + // process chunk or write to temp file + } + + var resp = new RawResponse { StatusCode = 200 }; + resp.Headers["Content-Type"] = "application/json"; + resp.WriteBodyAsync = async stream => + { + var json = $"{{\"bytesReceived\":{total}}}"; + await stream.WriteAsync(System.Text.Encoding.UTF8.GetBytes(json), ctx.CancellationToken); + }; + return resp; + } +} +``` + +This gives devs examples of: + +* Raw endpoint (`/ping`, `/billing/invoices/upload`). +* Typed endpoint (`/billing/invoices`). +* Streaming usage (`Body.ReadAsync`). + +--- + +### 1.3 Microservice YAML override example + +**File:** `src/StellaOps.Billing.Microservice/billing.microservice.yaml` + +```yaml +endpoints: + - method: GET + path: /ping + timeout: 00:00:02 + + - method: POST + path: /billing/invoices + timeout: 00:00:05 + supportsStreaming: false + requiringClaims: + - type: role + value: BillingWriter + + - method: POST + path: /billing/invoices/upload + timeout: 00:02:00 + supportsStreaming: true + requiringClaims: + - type: role + value: BillingUploader +``` + +This file demonstrates: + +* Timeout override. +* Streaming flag. +* `RequiringClaims` usage. 
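
The three override kinds listed above could be merged into the reflection-discovered descriptors roughly as follows. This is a sketch under assumptions, not the SDK's implementation: `EndpointDescriptor`, `ClaimRequirement`, `EndpointOverride`, and `EndpointOverrideMerger` are hypothetical shapes invented here for illustration, and the real contracts in `StellaOps.Router.Common` may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the shared contracts.
public sealed record ClaimRequirement(string Type, string Value);

public sealed record EndpointDescriptor(
    string Method,
    string Path,
    TimeSpan Timeout,
    bool SupportsStreaming,
    IReadOnlyList<ClaimRequirement> RequiringClaims);

// One YAML entry; a null field means "not specified, keep the code default".
public sealed record EndpointOverride(
    string Method,
    string Path,
    TimeSpan? Timeout = null,
    bool? SupportsStreaming = null,
    IReadOnlyList<ClaimRequirement>? RequiringClaims = null);

public static class EndpointOverrideMerger
{
    public static IReadOnlyList<EndpointDescriptor> Apply(
        IEnumerable<EndpointDescriptor> discovered,
        IEnumerable<EndpointOverride> overrides)
    {
        // Index overrides by (METHOD, path). Duplicate keys make ToDictionary throw,
        // which surfaces a malformed YAML file early.
        var byKey = overrides.ToDictionary(o => (o.Method.ToUpperInvariant(), o.Path));

        // Iterate over discovered endpoints only, so an override without a
        // matching endpoint in code is ignored and can never add an endpoint.
        return discovered
            .Select(d => byKey.TryGetValue((d.Method.ToUpperInvariant(), d.Path), out var o)
                ? d with
                {
                    Timeout = o.Timeout ?? d.Timeout,
                    SupportsStreaming = o.SupportsStreaming ?? d.SupportsStreaming,
                    RequiringClaims = o.RequiringClaims ?? d.RequiringClaims
                }
                : d)
            .ToList();
    }
}
```

In this sketch `RequiringClaims` is replaced wholesale when the YAML provides it, and descriptors with no matching entry pass through unchanged.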
+ +--- + +### 1.4 Gateway example config for Billing + +**File:** `config/router.billing.yaml` (for local dev) + +```yaml +nodeId: "gw-dev-01" +region: "eu1" + +payloadLimits: + maxRequestBytesPerCall: 10485760 # 10 MB + maxRequestBytesPerConnection: 52428800 # 50 MB + maxAggregateInflightBytes: 209715200 # 200 MB + +services: + - name: "Billing" + defaultVersion: "1.0.0" + endpoints: + - method: "GET" + path: "/ping" + # router defaults, if any + - method: "POST" + path: "/billing/invoices" + defaultTimeout: "00:00:05" + requiringClaims: + - type: "role" + value: "BillingWriter" + - method: "POST" + path: "/billing/invoices/upload" + defaultTimeout: "00:02:00" + supportsStreaming: true + requiringClaims: + - type: "role" + value: "BillingUploader" +``` + +This lets you show precedence: + +* Reflection → microservice YAML → router YAML. + +--- + +### 1.5 Gateway wiring for the example + +**Project:** `StellaOps.Gateway.WebService` + +In `Program.cs`: + +1. Load router config and point it to `router.billing.yaml` for dev: + +```csharp +builder.Configuration + .AddJsonFile("appsettings.json", optional: true) + .AddEnvironmentVariables(prefix: "STELLAOPS_"); + +builder.Services.AddOptions<RouterConfig>() + .Configure<IConfiguration>((cfg, configuration) => + { + configuration.GetSection("Router").Bind(cfg); + + var yamlPath = configuration["Router:YamlPath"] ?? "config/router.billing.yaml"; + if (File.Exists(yamlPath)) + { + var yamlCfg = RouterConfigLoader.LoadFromFile(yamlPath); + // either cfg = yamlCfg (if you treat YAML as source of truth) + OverlayRouterConfig(cfg, yamlCfg); + } + }); + +builder.Services.AddOptions<GatewayNodeConfig>() + .Configure<IOptions<RouterConfig>>((node, routerCfg) => + { + var cfg = routerCfg.Value; + node.NodeId = cfg.NodeId; + node.Region = cfg.Region; + }); +``` + +2. 
Ensure you start the appropriate transport server (for dev, TCP on localhost:50050): + +* From `RouterConfig.Transports` or a dev shortcut, start the TCP server listening on that port. + +3. 
HTTP pipeline: + +* `EndpointResolutionMiddleware` +* `RoutingDecisionMiddleware` +* `TransportDispatchMiddleware` + +Now your dev loop is: + +* Run `StellaOps.Gateway.WebService`. +* Run `StellaOps.Billing.Microservice`. +* `curl http://localhost:{gatewayPort}/ping` → should go through gateway to microservice and back. +* Similarly for `/billing/invoices` and `/billing/invoices/upload`. + +--- + +### 1.6 Example documentation + +Create `docs/router/examples/Billing.Sample.md`: + +* “How to run the example”: + + * build solution + * `dotnet run` for gateway + * `dotnet run` for Billing microservice +* Show sample `curl` commands: + + * `curl http://localhost:8080/ping` + * `curl -X POST http://localhost:8080/billing/invoices -d '{"customerId":"C1","amount":123.45}'` + * `curl -X POST http://localhost:8080/billing/invoices/upload --data-binary @bigfile.bin` +* Note where config files live and how to change them. + +This becomes your canonical reference for new teams. + +--- + +## 2. Migration skeleton: from WebService to Microservice + +Now that you have a working example, you need a **repeatable recipe** for migrating any existing `StellaOps.*.WebService` into the microservice router model. + +### 2.1 Define the migration target shape + +For each webservice you migrate, you want: + +* A new project: `StellaOps.{Domain}.Microservice`. + +* Shared domain logic extracted into a library (if not already): `StellaOps.{Domain}.Core` or similar. + +* Controllers → endpoint classes: + + * `Controller` methods ⇨ `[StellaEndpoint]`-annotated types. + * `HttpGet/HttpPost` attributes ⇨ `Method` and `Path` pair. + +* Configuration: + + * WebService’s appsettings routes → microservice YAML + router YAML. + * Authentication/authorization → `RequiringClaims` in endpoint metadata. + +Document this target shape in `docs/router/Migration of Webservices to Microservices.md`. 
+ +--- + +### 2.2 Skeleton microservice template + +Create a **generic** microservice skeleton that any team can copy: + +**Project:** `templates/StellaOps.Template.Microservice` or at least a folder `samples/MigrationSkeleton/`. + +Contents: + +* `Program.cs`: + +```csharp +var builder = Host.CreateApplicationBuilder(args); + +builder.Services.AddStellaMicroservice(opts => +{ + opts.ServiceName = "{DomainName}"; + opts.Version = "1.0.0"; + opts.Region = "eu1"; + opts.InstanceId = "{DomainName}-" + Environment.MachineName; + + // Mandatory router pool configuration + opts.Routers.Add(new RouterEndpointConfig + { + Host = "localhost", // or injected via env + Port = 50050, + TransportType = TransportType.Tcp + }); + + // {DomainName} is a template placeholder, so this is a plain (non-interpolated) string + opts.ConfigFilePath = "{DomainName}.microservice.yaml"; +}); + +// domain DI (reuse existing domain services from WebService) +// builder.Services.AddDomainServices(); + +var app = builder.Build(); +await app.RunAsync(); +``` + +* A sample endpoint mapping from a typical WebService controller method: + + Legacy controller: + + ```csharp + [ApiController] + [Route("api/billing/invoices")] + public class InvoicesController : ControllerBase + { + [HttpPost] + [Authorize(Roles = "BillingWriter")] + public async Task<ActionResult<CreateInvoiceResponse>> Create(CreateInvoiceRequest request) + { + var result = await _service.Create(request); + return Ok(result); + } + } + ``` + + Microservice endpoint: + + ```csharp + [StellaEndpoint("POST", "/billing/invoices")] + public sealed class CreateInvoiceEndpoint : IStellaEndpoint<CreateInvoiceRequest, CreateInvoiceResponse> + { + private readonly IInvoiceService _service; + + public CreateInvoiceEndpoint(IInvoiceService service) + { + _service = service; + } + + public Task<CreateInvoiceResponse> HandleAsync(CreateInvoiceRequest request, CancellationToken ct) + { + return _service.Create(request, ct); + } + } + ``` + + And matching YAML: 
+ + ```yaml + endpoints: + - method: POST + path: /billing/invoices + timeout: 00:00:05 + requiringClaims: + - type: role + value: BillingWriter + ``` + +This skeleton demonstrates 
the mapping clearly. + +--- + +### 2.3 Migration workflow for a team (per service) + +Put this as a checklist in `Migration of Webservices to Microservices.md`: + +1. **Inventory existing HTTP surface** + + * List all controllers and actions with: + + * HTTP method. + * Route template (full path). + * Auth attributes (`[Authorize(Roles=..)]` or policies). + * Whether the action handles large uploads/downloads. + +2. **Create microservice project** + + * Add `StellaOps.{Domain}.Microservice` using the skeleton. + * Reference domain logic project (`StellaOps.{Domain}.Core`), or extract one if necessary. + +3. **Map each controller action → endpoint** + + For each action: + + * Create an endpoint class in the microservice: + + * `IRawStellaEndpoint` for: + + * Large payloads. + * Very custom body handling. + * `IStellaEndpoint` for standard JSON APIs. + * Use `[StellaEndpoint("{METHOD}", "{PATH}")]` matching the existing route. + +4. **Wire domain services & auth** + + * Register the same domain services the WebService used (DB contexts, repositories, etc.). + * Translate role/claim-based `[Authorize]` usage to microservice YAML `RequiringClaims`. + +5. **Create microservice YAML** + + * For each new endpoint: + + * Define default timeout. + * `supportsStreaming: true` where appropriate. + * `requiringClaims` matching prior auth requirements. + +6. **Update router YAML** + + * Add service entry under `services`: + + * `name: "{Domain}"`. + * `defaultVersion: "1.0.0"`. + * Add endpoints (method/path, router-side overrides if needed). + +7. **Smoke-test locally** + + * Run gateway + microservice side-by-side. + * Hit the same URLs via gateway that previously were served by the WebService directly. + * Compare behavior (status codes, semantics) with existing environment. + +8. **Gradual rollout** + + Strategy options: + + * **Proxy mode**: + + * Keep WebService behind gateway for a while. 
+ * Add router endpoints that proxy to existing WebService (via HTTP) while microservice matures. + * Gradually switch endpoints to microservice once stable. + + * **Blue/green**: + + * Run WebService and Microservice in parallel. + * Route a small percentage of traffic to microservice via router. + * Increase gradually. + + Outline these as patterns in the migration doc, but keep them high-level here. + +--- + +### 2.4 Migration skeleton repository structure + +Add a clear place in repo for skeleton code & docs: + +```text +/docs + /router + Migration of Webservices to Microservices.md + examples/ + Billing.Sample.md + +/samples + /Billing + StellaOps.Billing.Microservice/ # full example project + router.billing.yaml # example router config + /MigrationSkeleton + StellaOps.Template.Microservice/ # template project + example-controller-mapping.md # before/after snippet +``` + +The **skeleton** project should: + +* Compile. +* Contain TODO markers where teams fill in domain pieces. +* Be referenced in the migration doc so people know where to look. + +--- + +### 2.5 Tests to make the reference stick + +Add a minimal test suite around the Billing example: + +* **Integration tests** in `tests/StellaOps.Billing.IntegrationTests`: + + * Start gateway + Billing microservice (using in-memory test host or docker-compose). + * `GET /ping` returns 200 and “pong”. + * `POST /billing/invoices` returns 200 with a JSON body containing an `id`. + * `POST /billing/invoices/upload` with a large payload succeeds and reports `bytesReceived`. + +* Use these tests as a reference for future services: they show how to spin up a microservice + gateway in tests. + +--- + +## 3. Done criteria for step 11 + +You can treat “Build a reference example + migration skeleton” as complete when: + +* `StellaOps.Billing.Microservice` exists, runs, and successfully serves requests through the gateway using your real transport (or InMemory/TCP for dev). 
+* `router.billing.yaml` plus `billing.microservice.yaml` show config patterns for: + + * timeouts + * streaming + * requiringClaims +* `docs/router/examples/Billing.Sample.md` explains how to run and test the example. +* `Migration of Webservices to Microservices.md` contains: + + * A concrete mapping example (controller → endpoint + YAML). + * A step-by-step migration checklist for teams. + * Pointers to the skeleton project and sample configs. +* A template microservice project exists (`StellaOps.Template.Microservice` or equivalent) that teams can copy to bootstrap new services. + +Once you have this, onboarding new domains and migrating old WebServices stops being an ad-hoc effort and becomes a repeatable, documented process. diff --git a/docs/router/12-Step.md b/docs/router/12-Step.md new file mode 100644 index 000000000..709030c6a --- /dev/null +++ b/docs/router/12-Step.md @@ -0,0 +1,415 @@ +Below is how I’d tell your dev agents to operate on this codebase so it doesn’t turn into chaos over time. + +Think of this as the “rules of engagement” for Stella Ops Router. + +--- + +## 1. Non‑negotiable operating principles + +All agents follow these rules: + +1. **Specs are law** + + * `docs/router/specs.md` is the primary source of truth. + * If code and spec differ: + + * Fix the spec **first** (in a PR), then adjust the code. + * No “quick fixes” that contradict the spec. + +2. **Common & protocol are sacred** + + * `StellaOps.Router.Common` and the wire protocol (Frame/FrameType/serialization) are stable layers. + * Any change to: + + * `Frame`, `FrameType` + * `EndpointDescriptor`, `ConnectionState` + * `ITransportClient` / `ITransportServer` + * …requires: + + * Explicit spec update. + * Compatibility consideration. + * Code review by someone thinking about all transports and both sides (gateway + microservice). + +3. **InMemory first, then real transports** + + * New protocol semantics (e.g., new frame type, new behavior, new timeout rules) MUST: + + 1. 
Be implemented and proven with InMemory. + 2. Have tests passing with InMemory. + 3. Only then be rolled into TCP/TLS/UDP/RabbitMQ. + +4. **No backdoor HTTP between microservices and router** + + * Microservices must never talk HTTP to the router for control plane or data. + * All microservice–router traffic goes through the registered transports (UDP/TCP/TLS/RabbitMQ) using `Frame`. + +5. **Method + Path = contract** + + * Endpoint identity is always: `HTTP Method + Path`, nothing else. + * No “dynamic” routing hacks that bypass the `(Method, Path)` resolution. + +--- + +## 2. How agents should structure work (vertical slices, not scattered edits) + +Whenever you assign work, agents should: + +1. **Work in vertical slices** + + * Example slice: “Cancellation with InMemory”, “Streaming + payload limits with TCP”, “RabbitMQ buffered requests”. + * Each slice includes: + + * Spec amendments (if needed). + * Common contracts (if needed). + * Implementation (gateway + microservice + transport). + * Tests. + +2. **Avoid cross‑cutting, half‑finished changes** + + * Do not: + + * Change Common, start on TCP, then get bored and leave InMemory broken. + * Do: + + * Finish one vertical slice end‑to‑end, then move on. + +3. **Keep changes small and reviewable** + + * Prefer: + + * One PR for “add YAML overrides merging”. + * Another PR for “add router YAML hot‑reload details”. + * Avoid huge omnibus PRs that change protocol, transports, router, and microservice in one go. + +--- + +## 3. Change categories & review rules + +Agents should classify their work by category and obey the review level. + +1. **Category A – Protocol / Common changes** + + * Affects: + + * `Frame`, `FrameType`, payload DTOs. + * `EndpointDescriptor`, `ConnectionState`, `RoutingDecision`. + * `ITransportClient`, `ITransportServer`. + * Requirements: + + * Spec change with rationale. + * Cross‑side impact analysis: gateway + microservice + all transports. 
+ * Tests updated for InMemory and at least one real transport. + * Review: 2+ reviewers, one acting as “protocol owner”. + +2. **Category B – Router logic / routing plugin** + + * Affects: + + * `IGlobalRoutingState` implementation. + * `IRoutingPlugin` logic (region, ping, heartbeat). + * Requirements: + + * Unit tests for routing plugin (selection rules). + * At least one integration test through gateway + InMemory. + * Review: at least one reviewer who understands region/version semantics. + +3. **Category C – Transport implementation** + + * Affects: + + * TCP/TLS/UDP/RabbitMQ clients & servers. + * Requirements: + + * Transport‑specific tests (connection, basic request/response, timeout). + * No protocol changes. + * Review: 1–2 reviewers, including one who owns that transport. + +4. **Category D – SDK / Microservice developer experience** + + * Affects: + + * `StellaOps.Microservice` public surface, endpoint discovery, YAML merging. + * Requirements: + + * API review for public surface. + * Docs update (`Microservice.md`) if behavior changes. + * Review: 1–2 reviewers. + +5. **Category E – Docs only** + + * Affects: + + * `docs/router/*`, no code. + * Requirements: + + * Ensure docs match current behavior; if not, spawn follow‑up issues. + +--- + +## 4. Workflow per change (what each agent does) + +For any non‑trivial change: + +1. **Check the spec** + + * Confirm that: + + * The desired behavior is already described, or + * You will extend the spec first. + +2. **Update / extend spec if needed** + + * Edit `docs/router/specs.md` or appropriate doc. + * Document: + + * What’s changing. + * Why we need it. + * Which components are affected. + +3. **Adjust Common / contracts if needed** + + * Only after spec is updated. + * Keep changes minimal and backwards compatible where possible. + +4. **Implement in InMemory path** + + * Update: + + * InMemory `ITransportClient`/hub. + * Microservice and gateway logic that rely on it. + * Add tests to prove behavior. 
+ +5. **Port to real transports** + + * Implement the same behavior in: + + * TCP (baseline). + * TLS (wrapping TCP). + * Others when needed. + * Reuse the same InMemory tests pattern for transport tests. + +6. **Add / update tests** + + * Unit tests for logic. + * Integration tests for gateway + microservice via at least one real transport. + +7. **Update documentation** + + * Update relevant docs: + + * `Stella Ops Router - Webserver.md` + * `Stella Ops Router - Microservice.md` + * `Common.md`, if common contracts changed. + * Highlight any new configuration knobs or invariants. + +--- + +## 5. Testing expectations for all agents + +Agents should treat tests as part of the change, not an afterthought. + +1. **Unit tests** + + * For: + + * Routing plugin decisions. + * YAML merge behavior. + * Payload budget logic. + * Goal: + + * All tricky branches are covered. + +2. **Integration tests** + + * For gateway + microservice using: + + * InMemory. + * At least one real transport (TCP in dev). + + * Scenarios to maintain: + + * Simple request/response. + * Streaming upload. + * Cancellation on client abort. + * Timeout leading to CANCEL. + * Payload limit exceeded. + +3. **Smoke tests for examples** + + * Ensure `StellaOps.Billing.Microservice` example always passes a small test: + + * `/billing/health` works. + * `/billing/invoices/upload` streaming behaves. + +4. **CI gating** + + * No PR merges unless: + + * `dotnet build` for solution succeeds. + * All tests pass. + * If agents add new projects/tests, CI must be updated in the same PR. + +--- + +## 6. How agents should use configuration & YAML + +1. **Router side** + + * Always read payload limits, node region, transports from `RouterConfig` (bound from YAML + env). + * Do not hardcode: + + * Limits. + * Regions. + * Ports. + * If behavior depends on config, fetch from `IOptionsMonitor` at runtime, not from cached fields unless you explicitly freeze. + +2. 
**Microservice side** + + * Identity & router pool: + + * From `StellaMicroserviceOptions` (code/env). + * Endpoint metadata overrides: + + * From YAML (`ConfigFilePath`) merged into reflection result. + * Agents must not let YAML create endpoints that don’t exist in code; overrides only. + +3. **No hidden defaults** + + * If a default is important (e.g. `HeartbeatInterval`), document it and centralize it. + * Don’t sprinkle magic numbers across code. + +--- + +## 7. Adding new capabilities: pattern all agents follow + +When someone wants a new capability (e.g. “retry on transient transport failures”): + +1. **Open a design issue / doc snippet** + + * Describe: + + * Problem. + * Proposed design. + * Where it sits in architecture (router, microservice, transport, config). + +2. **Update spec** + + * Write the behavior in the appropriate doc section. + * Include: + + * API shape (if public). + * Transport impacts. + * Failure modes. + +3. **Follow the vertical slice path** + + * Implement in Common (if needed). + * Implement InMemory. + * Implement in primary transport (TCP). + * Add tests. + * Update docs. + +Agents should not just spike code into TCP implementation without spec or tests. + +--- + +## 8. Logging, tracing, and debugging expectations + +Agents should instrument consistently; this matters for operations and for debugging during development. + +1. **Use structured logging** + + * At minimum, include: + + * `ServiceName` + * `InstanceId` + * `CorrelationId` + * `Method` + * `Path` + * `ConnectionId` + * Never log full payload bodies by default for privacy and performance; log sizes and key metadata instead. + +2. **Trace correlation** + + * Ensure correlation IDs: + + * Propagate from HTTP (gateway) into `Frame.CorrelationId`. + * Are used in logs on both sides (gateway + microservice). + +3. **Agent debugging guidance** + + * When debugging a routing or transport problem: + + * Turn on debug logging for gateway + microservice for that service. 
+ * Use the correlation ID to follow the request end‑to‑end. + * Verify: + + * HELLO registration. + * HEARTBEAT events. + * REQUEST leaving gateway. + * RESPONSE arriving. + +--- + +## 9. Daily agent workflow (practical directions) + +For each day / task, an agent should: + +1. **Start from an issue or spec line item** + + * Never “just code something” without an issue/state in the backlog. + +2. **Locate the relevant doc** + + * Spec section. + * Example docs (e.g. Billing sample). + * Migration doc if working on conversion. + +3. **Work in a feature branch** + + * Branch name reflects scope: `feature/streaming-tcp`, `fix/router-cancellation`, etc. + +4. **Keep notes** + + * If an assumption is made (e.g. “we currently don’t support streaming over RabbitMQ”), note it in the issue. + * If they discover inconsistency in docs, open a doc‑fix issue. + +5. **Finish the full slice** + + * Code + tests + docs. + * Keep partial implementations behind feature flags (if needed) and clearly marked. + +6. **Open PR with clear description** + + * What changed. + * Which spec section it implements or modifies. + * Any risks or roll‑back notes. + +--- + +## 10. Guardrails against drift + +Finally, a few things agents must actively avoid: + +* **No silent protocol changes** + + * Don’t change `FrameType` semantics, payload formats, or header layout without: + + * Spec update. + * Full impact review. + +* **No specless behavior** + + * If something matters at runtime (timeouts, retries, routing rules), it has to be in the docs, not just in someone’s head. + +* **No bypassing of router** + + * Do not introduce “temporary” direct calls from clients to microservices. All client HTTP should go via gateway. + +* **No direct dependencies on specific transports in domain code** + + * Domain and microservice endpoint logic must not know if the transport is TCP, TLS, UDP, or RabbitMQ. They only see `RawRequestContext`, `RawResponse`, and cancellation tokens. 
+ +--- + +If you want, I can turn this into a one‑page “Agent Handbook” markdown file you can drop into `docs/router/AGENTS_PROCESS.md` and link from `specs.md` so every AI or human dev working on this stack has the same ground rules. diff --git a/docs/router/SPRINT_7000_0001_0001_router_skeleton.md b/docs/router/SPRINT_7000_0001_0001_router_skeleton.md new file mode 100644 index 000000000..44a7a25c4 --- /dev/null +++ b/docs/router/SPRINT_7000_0001_0001_router_skeleton.md @@ -0,0 +1,41 @@ +# Sprint 7000·0001·0001 · Router Skeleton + +## Topic & Scope +- Stand up the dedicated StellaOps Router repo skeleton under `docs/router` as per `specs.md` / `01-Step.md`. +- Produce the empty solution structure, projects, references, and placeholder docs ready for future transport/SDK work. +- Enforce .NET 10 (`net10.0`) across all new projects; ignore prior net8 defaults. +- **Working directory:** `docs/router`. + +## Dependencies & Concurrency +- Depends on `docs/router/specs.md` remaining the authoritative requirements source. +- No upstream sprint blockers; this spin-off is self-contained. +- Can run in parallel with other repo work because it writes only under `docs/router`. + +## Documentation Prerequisites +- `docs/router/specs.md` +- `docs/router/implplan.md` +- `docs/router/01-Step.md` + +## Delivery Tracker +| # | Task ID | Status | Key dependency / next step | Owners | Task Definition | +| --- | --- | --- | --- | --- | --- | +| 1 | ROUTER-SKEL-SETUP | TODO | Read specs + step docs | Skeleton Agent | Create repo folders (`src/`, `src/__Libraries/`, `tests/`, `docs/router`) & add `README.md` pointer. | +| 2 | ROUTER-SKEL-SOLUTION | TODO | Task 1 | Skeleton Agent | Generate `StellaOps.Router.sln`, add Gateway + library + test projects targeting `net10.0`. | +| 3 | ROUTER-SKEL-REFS | TODO | Task 2 | Skeleton Agent | Wire project references per plan (Gateway→Common+Config, etc.). 
| +| 4 | ROUTER-SKEL-BUILDPROPS | TODO | Task 2 | Infra Agent | Add repo-level `Directory.Build.props` pinning `net10.0`, nullable, implicit usings. | +| 5 | ROUTER-SKEL-STUBS | TODO | Tasks 2-4 | Common/Microservice Agents | Add placeholder types/extension methods per `01-Step.md` (no logic). | +| 6 | ROUTER-SKEL-TESTS | TODO | Task 5 | QA Agent | Create dummy `[Fact]` tests in each test project so `dotnet test` passes. | +| 7 | ROUTER-SKEL-CI | TODO | Tasks 2-6 | Infra Agent | Configure CI pipeline running `dotnet restore/build/test` on solution. | + +## Execution Log +| Date (UTC) | Update | Owner | +| --- | --- | --- | +| 2025-12-02 | Created sprint skeleton per router spin-off instructions. | Planning | + +## Decisions & Risks +- Use .NET 10 baseline even though other modules still target net8; future agents must not downgrade frameworks. +- Scope intentionally limited to `docs/router` to avoid cross-repo conflicts; any shared assets must be duplicated or referenced via documentation until later alignment. +- Risk: missing AGENTS.md for this folder—future sprint should establish one if work extends beyond skeleton. + +## Next Checkpoints +- 2025-12-04: Verify solution + CI scaffold committed and passing. diff --git a/docs/router/implplan.md b/docs/router/implplan.md new file mode 100644 index 000000000..e149aa94d --- /dev/null +++ b/docs/router/implplan.md @@ -0,0 +1,356 @@ +Start by treating `docs/router/specs.md` as law. Nothing gets coded that contradicts it. The first sprint or two should be about *wiring the skeleton* and proving the core flows with the simplest possible transport, then layering in the real transports and migration paths. + +I’d structure the work for your agents like this. + +--- + +## 0. Read & freeze invariants + +**All agents:** + +* Read `docs/router/specs.md` end to end. +* Extract and pin the non-negotiables: + + * Method + Path identity. + * Strict semver for versions. 
+ * Region from `GatewayNodeConfig.Region` (no host/header magic). + * No HTTP transport for microservice communications. + * Single connection carrying HELLO + HEARTBEAT + REQUEST/RESPONSE + CANCEL. + * Router treats body as opaque bytes/streams. + * `RequiringClaims` replaces any form of `AllowedRoles`. + +Agree that these are invariants; any future idea that violates them needs an explicit spec change first. + +--- + +## 1. Lay down the solution skeleton + +**“Skeleton” agent (or gateway core agent):** + +Create the basic project structure, no logic yet: + +* `src/__Libraries/StellaOps.Router.Common` +* `src/__Libraries/StellaOps.Router.Config` +* `src/__Libraries/StellaOps.Microservice` +* `src/StellaOps.Gateway.WebService` +* `docs/router/` already has `specs.md` (add placeholders for the other docs). + +Goal: everything builds, but most classes are empty or stubs. + +--- + +## 2. Implement the shared core model (Common) + +**Common/core agent:** + +Implement only the *data* and *interfaces*, no behavior: + +* Enums: + + * `TransportType`, `FrameType`, `InstanceHealthStatus`. +* Models: + + * `ClaimRequirement` + * `EndpointDescriptor` + * `InstanceDescriptor` + * `ConnectionState` + * `RoutingContext`, `RoutingDecision` + * `PayloadLimits` +* Interfaces: + + * `IGlobalRoutingState` + * `IRoutingPlugin` + * `ITransportServer` + * `ITransportClient` +* `Frame` struct/class: + + * `FrameType`, `CorrelationId`, `Payload` (byte[]). + +Leave implementations of `IGlobalRoutingState`, `IRoutingPlugin`, transports, etc., for later steps. + +Deliverable: a stable set of contracts that gateway + microservice SDK depend on. + +--- + +## 3. Build a fake “in-memory” transport plugin + +**Transport agent:** + +Before UDP/TCP/Rabbit, build an **in-process transport**: + +* `InMemoryTransportServer` and `InMemoryTransportClient`. +* They share a concurrent dictionary keyed by `ConnectionId`. +* Frames are passed via channels/queues in memory. 
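
The shared-dictionary-plus-channels idea above can be sketched as follows. Everything here is an assumption made for illustration: `InMemoryHub`, `InMemoryConnection`, and the exact `Frame`/`FrameType` shapes are hypothetical, and the real `InMemoryTransportServer` and `InMemoryTransportClient` would wrap something like this behind `ITransportServer` and `ITransportClient`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Channels;

// Hypothetical stand-ins for the Common contracts described in section 2.
public enum FrameType { Hello, Heartbeat, Request, Response, Cancel }

public sealed record Frame(FrameType Type, Guid CorrelationId, byte[] Payload);

// One logical connection: a pair of in-memory queues, one per direction.
public sealed class InMemoryConnection
{
    public Channel<Frame> ToGateway { get; } = Channel.CreateUnbounded<Frame>();
    public Channel<Frame> ToMicroservice { get; } = Channel.CreateUnbounded<Frame>();
}

// Registry that both the in-memory client and server would share in-process.
public sealed class InMemoryHub
{
    private readonly ConcurrentDictionary<string, InMemoryConnection> _connections = new();

    // Returns the existing connection for this id, or registers a new one.
    public InMemoryConnection Connect(string connectionId) =>
        _connections.GetOrAdd(connectionId, _ => new InMemoryConnection());

    // Removes the connection and completes both queues so readers observe the close.
    public bool Disconnect(string connectionId)
    {
        if (!_connections.TryRemove(connectionId, out var connection)) return false;
        connection.ToGateway.Writer.TryComplete();
        connection.ToMicroservice.Writer.TryComplete();
        return true;
    }
}
```

In this sketch the microservice side would write a HELLO frame to `ToGateway.Writer` and the gateway side would read it from `ToGateway.Reader`; requests travel the other way through `ToMicroservice`. Unbounded channels keep the example short; a bounded channel would be the natural place to exercise backpressure later.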
+ +Purpose: + +* Let you prove HELLO/HEARTBEAT/REQUEST/RESPONSE/CANCEL semantics and routing logic *without* dealing with sockets and Rabbit yet. +* Let you unit and integration test the router and SDK quickly. + +This plugin will never ship to production; it’s only for dev tests and CI. + +--- + +## 4. Microservice SDK: minimal handshake & dispatch (with InMemory) + +**Microservice agent:** + +Initial focus: “connect and say HELLO, then handle a simple request.” + +1. Implement `StellaMicroserviceOptions`. +2. Implement `AddStellaMicroservice(...)`: + + * Bind options. + * Register endpoint handlers and SDK internal services. +3. Endpoint discovery: + + * Implement runtime reflection for `[StellaEndpoint]` + handler types. + * Build in-memory `EndpointDescriptor` list (simple: no YAML yet). +4. Connection: + + * Use `InMemoryTransportClient` to “connect” to a fake router. + * On connect, send a HELLO frame with: + + * Identity. + * Endpoint list and metadata (`SupportsStreaming` false for now, simple `RequiringClaims` empty). +5. Request handling: + + * Implement `IRawStellaEndpoint` and adapter to it. + * Implement `RawRequestContext` / `RawResponse`. + * Implement a dispatcher that: + + * Receives `Request` frame. + * Builds `RawRequestContext`. + * Invokes the correct handler. + * Sends `Response` frame. + +Do **not** handle streaming or cancellation yet; just basic request/response with small bodies. + +--- + +## 5. Gateway: minimal routing using InMemory plugin + +**Gateway agent:** + +Goal: HTTP → in-memory transport → microservice → HTTP response. + +1. Implement `GatewayNodeConfig` and bind it from config. + +2. Implement `IGlobalRoutingState` as a simple in-memory implementation that: + + * Holds `ConnectionState` objects. + * Builds a map `(Method, Path)` → endpoint + connections. + +3. Implement a minimal `IRoutingPlugin` that: + + * For now, just picks *any* connection that has the endpoint (no region/ping logic yet). + +4. 
Implement minimal HTTP pipeline: + + * `EndpointResolutionMiddleware`: + + * `(Method, Path)` → `EndpointDescriptor` from `IGlobalRoutingState`. + * Naive authorization middleware stub (only checks “needs authenticated user”; ignore real requiringClaims for now). + * `RoutingDecisionMiddleware`: + + * Ask `IRoutingPlugin` for a `RoutingDecision`. + * `TransportDispatchMiddleware`: + + * Build a `Request` frame. + * Use `InMemoryTransportClient` to send and await `Response`. + * Map response to HTTP. + +5. Implement HELLO handler on gateway side: + + * When InMemory “connection” from microservice appears and sends HELLO: + + * Construct `ConnectionState`. + * Update `IGlobalRoutingState` with endpoint → connection mapping. + +Once this works, you have end-to-end: + +* Example microservice. +* Example gateway. +* In-memory transport. +* A couple of test endpoints returning simple JSON. + +--- + +## 6. Add heartbeat, health, and basic routing rules + +**Common/core + gateway agent:** + +Now enforce liveness and basic routing: + +1. Heartbeat: + + * Microservice SDK sends HEARTBEAT frames on a timer. + * Gateway updates `LastHeartbeatUtc` and `Status`. +2. Health: + + * Add background job in gateway that: + + * Marks instances Unhealthy if heartbeat stale. +3. Routing: + + * Enhance `IRoutingPlugin` to: + + * Filter out Unhealthy instances. + * Prefer gateway region (using `GatewayNodeConfig.Region`). + * Use simple `AveragePingMs` stub from request/response timings. + +Still using InMemory transport; just building the selection logic. + +--- + +## 7. Add cancellation semantics (with InMemory) + +**Microservice + gateway agents:** + +Wire up cancellation logic before touching real transports: + +1. Common: + + * Extend `FrameType` with `Cancel`. +2. Gateway: + + * In `TransportDispatchMiddleware`: + + * Tie `HttpContext.RequestAborted` to a `SendCancelAsync` call. + * On timeout, send CANCEL. + * Ignore late `Response`/stream data for canceled correlation IDs. +3. 
Microservice: + + * Maintain `_inflight` map of correlation → `CancellationTokenSource`. + * When `Cancel` frame arrives, call `cts.Cancel()`. + * Ensure handlers receive and honor `CancellationToken`. + +Prove via tests: if client disconnects, handler stops quickly. + +--- + +## 8. Add streaming & payload limits (still InMemory) + +**Gateway + microservice agents:** + +1. Streaming: + + * Extend InMemory transport to support `RequestStreamData` / `ResponseStreamData` frames. + * On the gateway: + + * For `SupportsStreaming` endpoints, pipe HTTP body stream → frame stream. + * For response, pipe frames → HTTP response stream. + * On microservice: + + * Expose `RawRequestContext.Body` as a stream reading frames as they arrive. + * Allow `RawResponse.WriteBodyAsync` to stream out. + +2. Payload limits: + + * Implement `PayloadLimits` enforcement at gateway: + + * Early reject large `Content-Length`. + * Track counters in streaming; trigger cancellation when exceeding thresholds. + +Demonstrate with a fake “upload” endpoint that uses `IRawStellaEndpoint` and streaming. + +--- + +## 9. Implement real transport plugins one by one + +**Transport agent:** + +Now replace InMemory with real transports: + +Order: + +1. **TCP plugin** (easiest baseline): + + * Length-prefixed frame protocol. + * Connection per microservice instance (or multi-instance if needed later). + * Implement HELLO/HEARTBEAT/REQUEST/RESPONSE/STREAM/CANCEL as per frame model. + +2. **Certificate (TLS) plugin**: + + * Wrap TCP plugin with TLS. + * Add configuration for server & client certs. + +3. **UDP plugin**: + + * Single datagram = single frame; no streaming. + * Enforce `MaxRequestBytesPerCall`. + * Use for small, idempotent operations. + +4. **RabbitMQ plugin**: + + * Add exchanges/queues for HELLO/HEARTBEAT and REQUEST/RESPONSE. + * Use `CorrelationId` properties for matching. + * Guarantee at-most-once semantics where practical. 
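For the TCP plugin's length-prefixed frame protocol (item 1 above), a minimal sketch could look like this. The byte layout is an assumption made only for illustration: a 4-byte big-endian body length, then a 1-byte frame type, a 16-byte correlation ID, and the payload. The spec leaves the exact framing rules to be documented, and `LengthPrefixedFraming` is a hypothetical helper name.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

public static class LengthPrefixedFraming
{
    // Assumed body header: 1 byte frame type + 16 bytes correlation id.
    private const int HeaderSize = 1 + 16;

    public static void Write(Stream stream, byte frameType, Guid correlationId, ReadOnlySpan<byte> payload)
    {
        // The prefix holds the length of everything that follows it.
        Span<byte> prefix = stackalloc byte[4];
        BinaryPrimitives.WriteInt32BigEndian(prefix, HeaderSize + payload.Length);
        stream.Write(prefix);
        stream.WriteByte(frameType);
        stream.Write(correlationId.ToByteArray());
        stream.Write(payload);
    }

    public static (byte FrameType, Guid CorrelationId, byte[] Payload) Read(Stream stream, int maxBodyBytes)
    {
        Span<byte> prefix = stackalloc byte[4];
        stream.ReadExactly(prefix);
        int bodyLength = BinaryPrimitives.ReadInt32BigEndian(prefix);

        // Reject malformed or oversized frames before allocating a buffer.
        if (bodyLength < HeaderSize || bodyLength > maxBodyBytes)
            throw new InvalidDataException($"Invalid frame length {bodyLength}.");

        var body = new byte[bodyLength];
        stream.ReadExactly(body);
        var correlationId = new Guid(body.AsSpan(1, 16));
        return (body[0], correlationId, body.AsSpan(HeaderSize).ToArray());
    }
}
```

Since every frame carries its own correlation ID, frames for different in-flight requests can be interleaved on one connection and matched on the receiving side. The `maxBodyBytes` guard is where a per-frame size limit would plug in.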
+ +While each plugin is built, keep the core router and microservice SDK relying only on `ITransportClient`/`ITransportServer` abstractions. + +--- + +## 10. Add Router.Config + Microservice YAML integration + +**Config agent:** + +1. Implement `__Libraries/StellaOps.Router.Config`: + + * YAML → `RouterConfig` binding. + * Services, endpoints, static instances, payload limits. + * Hot-reload via `IOptionsMonitor` / file watcher. + +2. Implement microservice YAML: + + * Endpoint-level overrides only (timeouts, requiringClaims, SupportsStreaming). + * Merge logic: code defaults → YAML override. + +3. Integrate: + + * Gateway uses RouterConfig for: + + * Defaults when no microservice registered yet. + * Payload limits. + * Microservice uses YAML to refine endpoint metadata before sending HELLO. + +--- + +## 11. Build a reference example + migration skeleton + +**DX / migration agent:** + +1. Build a `StellaOps.Billing.Microservice` example: + + * A couple of simple endpoints (GET/POST). + * One streaming upload endpoint. + * YAML for requiringClaims and timeouts. + +2. Build a `StellaOps.Gateway.WebService` example config around it. + +3. Document the full path: + + * How to run both locally. + * How to add a new endpoint. + * How cancellation behaves (killing the client, watching logs). + * How payload limits work (try to upload too-large file). + +4. Outline migration steps from an imaginary `StellaOps.Billing.WebService` using the patterns in `Migration of Webservices to Microservices.md`. + +--- + +## 12. Process guidance for your agents + +* **Do not jump to UDP/TCP immediately.** + Prove the protocol (HELLO/HEARTBEAT/REQUEST/RESPONSE/STREAM/CANCEL), routing, and limits on the InMemory plugin first. + +* **Guard the invariants.** + If someone proposes “just call HTTP between services” or “let’s derive region from host,” they’re violating spec and must update `docs/router/specs.md` before coding. 
+ +* **Keep Common stable.** + Changes to `StellaOps.Router.Common` must be rare and reviewed; everything else depends on it. + +* **Document as you go.** + Every time a behavior settles (e.g. status mapping, frame layout), update the docs under `docs/router/` so new agents always have a single source of truth. + +If you want, next step I can convert this into a task board (epic → stories) per repo folder, so you can assign specific chunks to named agents. diff --git a/docs/router/specs.md b/docs/router/specs.md new file mode 100644 index 000000000..f55a92f7b --- /dev/null +++ b/docs/router/specs.md @@ -0,0 +1,494 @@ +I’ll group everything into requirement buckets, but keep it all as requirements statements (no rationale). This is the union of what you asked for or confirmed across the whole thread. + +--- + +## 1. Architectural / scope requirements + +* There SHALL be a single HTTP ingress service named `StellaOps.Gateway.WebService`. +* Microservices SHALL NOT expose HTTP to the router; all microservice-to-router traffic (control + data) MUST use in-house transports (UDP, TCP, certificate/TLS, RabbitMQ). +* There SHALL NOT be a separate control-plane service or protocol; each transport connection between a microservice and the router MUST carry: + + * Initial registration (HELLO) and endpoint configuration. + * Ongoing heartbeats. + * Endpoint updates (if any). + * Request/response and streaming data. +* The router SHALL maintain per-connection endpoint mappings and derive its global routing state from the union of all live connections. +* The router SHALL treat request and response bodies as opaque (raw bytes / streams); all deserialization and schema handling SHALL be the microservice’s responsibility. +* The system SHALL support both buffered and streaming request/response flows end-to-end. 
+* The design MUST reuse only the generic parts of `__SerdicaTemplate` (dynamic endpoint metadata, attribute-based endpoint discovery, request routing patterns, correlation, connection management) and MUST drop Serdica-specific stack (Oracle schema, domain logic, etc.). +* The solution MUST be a simpler, generic replacement for the existing Serdica HTTP→RabbitMQ→microservice design. + +--- + +## 2. Service identity, region, versioning + +* Each microservice instance SHALL be identified by `(ServiceName, Version, Region, InstanceId)`. +* `Version` MUST follow strict semantic versioning (`major.minor.patch`). +* Routing MUST be strict on version: + + * The router MUST only route a request to instances whose `Version` equals the selected version. + * When a version is not explicitly specified by the client, a default version MUST be used (from config or metadata). +* Each gateway node SHALL have a static configuration object `GatewayNodeConfig` containing at least: + + * `Region` (e.g. `"eu1"`). + * `NodeId` (e.g. `"gw-eu1-01"`). + * `Environment` (e.g. `"prod"`). +* Routing decisions MUST use `GatewayNodeConfig.Region` as the node’s region; the router MUST NOT derive region from HTTP headers or URL host names. +* DNS/host naming conventions SHOULD express region in the domain (e.g. `eu1.global.stella-ops.org`, `mainoffice.contoso.stella-ops.org`), but routing logic MUST be driven by `GatewayNodeConfig.Region` rather than by host parsing. + +--- + +## 3. Endpoint identity and metadata + +* Endpoint identity in the router and microservices MUST be `HTTP Method + Path`, for example: + + * `Method`: one of `GET`, `POST`, `PUT`, `PATCH`, `DELETE`. + * `Path`: e.g. `/section/get/{id}`. + +* The router and microservices MUST use the same path template syntax and matching rules (e.g. ASP.NET-style route templates), including decisions on: + + * Case sensitivity. + * Trailing slash handling. + * Parameter segments (e.g. `{id}`). 
+ +* The router MUST resolve an incoming HTTP `(Method, Path)` to a logical endpoint descriptor that includes: + + * ServiceName. + * Version. + * Method. + * Path. + * DefaultTimeout. + * `RequiringClaims`: a list of claim requirements. + * A flag indicating whether the endpoint supports streaming. + +* Every place that previously spoke about `AllowedRoles` MUST be replaced with `RequiringClaims`: + + * Each requirement MUST at minimum contain a `Type` and MAY contain a `Value`. + +* Endpoints MUST support being configured with default `RequiringClaims` in microservices, with the possibility of external override (see Authority section). + +--- + +## 4. Routing algorithm / instance selection + +* Given a resolved endpoint `(ServiceName, Version, Method, Path)`, the router MUST: + + * Filter candidate instances by: + + * Matching `ServiceName`. + * Matching `Version` (strict semver equality). + * Health in an acceptable set (e.g. `Healthy` or `Degraded`). +* Instances MUST have health metadata: + + * `Status` ∈ {`Unknown`, `Healthy`, `Degraded`, `Draining`, `Unhealthy`}. + * `LastHeartbeatUtc`. + * `AveragePingMs`. +* The router’s instance selection MUST obey these rules: + + * Region: + + * Prefer instances whose `Region == GatewayNodeConfig.Region`. + * If none, fall back to configured neighbor regions. + * If none, fall back to all other regions. + * Within a chosen region tier: + + * Prefer lower `AveragePingMs`. + * If several are tied, prefer more recent `LastHeartbeatUtc`. + * If still tied, use a balancing strategy (e.g. random or round-robin). +* The router MUST support a strict fallback order as requested: + + * Prefer “closest by region and heartbeat and ping.” + * If having to choose between worse candidates, fall back in order of: + + * Greater ping (latency). + * Greater heartbeat age. + * Less preferred region tier. + +--- + +## 5. 
Transport plugin requirements + +* There MUST be a transport plugin abstraction representing how the router and microservices communicate. +* The default transport type MUST be UDP. +* Additional supported transport types MUST include: + + * TCP. + * Certificate-based TCP (TLS / mTLS). + * RabbitMQ. +* There MUST NOT be an HTTP transport plugin; HTTP MUST NOT be used for microservice-to-router communications (control or data). +* Each transport plugin MUST support: + + * Establishing logical connections between microservices and the router. + * Sending/receiving HELLO (registration), HEARTBEAT, optional ENDPOINTS_UPDATE. + * Sending/receiving REQUEST/RESPONSE frames. + * Supporting streaming via REQUEST_STREAM_DATA / RESPONSE_STREAM_DATA frames where the transport allows it. + * Sending/receiving CANCEL frames to abort specific in-flight requests. +* UDP transport: + + * MUST be used only for small/bounded payloads (no unbounded streaming). + * MUST respect configured `MaxRequestBytesPerCall`. +* TCP and Certificate transports: + + * MUST implement a length-prefixed framing protocol capable of multiplexing frames for multiple correlation IDs. + * Certificate transport MUST enforce TLS and support optional mutual TLS (verifiable peer identity). +* RabbitMQ: + + * MUST implement queue/exchange naming and routing keys sufficient to represent logical connections and correlation IDs. + * MUST use message properties (e.g. `CorrelationId`) for request/response matching. + +--- + +## 6. Gateway (`StellaOps.Gateway.WebService`) requirements + +### 6.1 HTTP ingress pipeline + +* The gateway MUST host an ASP.NET Core HTTP server. +* The HTTP middleware pipeline MUST include at least: + + * Forwarded headers handling (when behind reverse proxy). + * Request logging (e.g. via Serilog) including correlation ID, service, endpoint, region, instance. + * Global error-handling middleware. + * Authentication middleware. 
+ * `EndpointResolutionMiddleware` to resolve `(Method, Path)` → endpoint. + * Authorization middleware that enforces `RequiringClaims`. + * `RoutingDecisionMiddleware` to choose connection/instance/transport. + * `TransportDispatchMiddleware` to carry out buffered or streaming dispatch. +* The gateway MUST read `Method` and `Path` from the HTTP request and use them to resolve endpoints. + +### 6.2 Per-connection state and routing view + +* The gateway MUST maintain a `ConnectionState` per logical connection that includes: + + * ConnectionId. + * `InstanceDescriptor` (`InstanceId`, `ServiceName`, `Version`, `Region`). + * `Status`, `LastHeartbeatUtc`, `AveragePingMs`. + * The set of endpoints that this connection serves (`(Method, Path)` → `EndpointDescriptor`). + * The transport type for that connection. +* The gateway MUST maintain a global routing state (`IGlobalRoutingState`) that: + + * Resolves `(Method, Path)` to an `EndpointDescriptor` (service, version, metadata). + * Provides the set of `ConnectionState` objects that can handle a given `(ServiceName, Version, Method, Path)`. + +### 6.3 Buffered vs streaming dispatch + +* The gateway MUST support: + + * **Buffered mode** for small to medium payloads: + + * Read the entire HTTP body into memory (or temp file when above a threshold). + * Send as a single REQUEST payload. + * **Streaming mode** for large or unknown content: + + * Streaming from HTTP body to microservice via a sequence of REQUEST_STREAM_DATA frames. + * Streaming from microservice back to HTTP via RESPONSE_STREAM_DATA frames. +* For each endpoint, the gateway MUST know whether it can use streaming or must use buffered mode (`SupportsStreaming` flag). + +### 6.4 Opaque body handling + +* The gateway MUST treat request and response bodies as opaque byte sequences and MUST NOT attempt to deserialize or interpret payload contents. 
+* The gateway MUST forward headers and body bytes as given and leave any schema, JSON, or other decoding to the microservice. + +### 6.5 Payload and memory protection + +* The gateway MUST enforce configured payload limits: + + * `MaxRequestBytesPerCall`. + * `MaxRequestBytesPerConnection`. + * `MaxAggregateInflightBytes`. +* If `Content-Length` is known and exceeds `MaxRequestBytesPerCall`, the gateway MUST reject the request early (e.g. HTTP 413 Payload Too Large). +* During streaming, the gateway MUST maintain counters of: + + * Bytes read for this request. + * Bytes for this connection. + * Total in-flight bytes across all requests. +* If any limit is exceeded mid-stream, the gateway MUST: + + * Stop reading the HTTP body. + * Send a CANCEL frame for that correlation ID. + * Abort the stream to the microservice. + * Return an appropriate error to the client (e.g. 413 or 503) and log the incident. + +--- + +## 7. Microservice SDK (`__Libraries/StellaOps.Microservice`) requirements + +### 7.1 Identity & router connections + +* `StellaMicroserviceOptions` MUST let microservices configure: + + * `ServiceName`. + * `Version`. + * `Region`. + * `InstanceId`. + * A list of router endpoints (`Routers` / router pool) including host, port, and transport type for each. + * Optional path to a YAML config file for endpoint-level overrides. +* Providing the router pool (`Routers` / HTTP servers pool) MUST be mandatory; a microservice cannot start without at least one configured router endpoint. +* The router pool SHOULD be configurable via code and MAY optionally be configured via YAML with hot-reload (causing reconnections if changed). + +### 7.2 Endpoint definition & discovery + +* Microservice endpoints MUST be declared using attributes that specify `(Method, Path)`: + + ```csharp + [StellaEndpoint("POST", "/billing/invoices")] + public sealed class CreateInvoiceEndpoint : ... 
+ ``` + +* The SDK MUST support two handler shapes: + + * Raw handler: + + * `IRawStellaEndpoint` taking a `RawRequestContext` and returning a `RawResponse`, where: + + * `RawRequestContext.Body` is a stream (may be buffered or streaming). + * Body contents are raw bytes. + * Typed handlers: + + * `IStellaEndpoint<TRequest, TResponse>` which takes a typed request and returns a typed response. + * `IStellaEndpoint<TResponse>` which has no request payload and returns a typed response. + +* The SDK MUST adapt typed endpoints to the raw model internally (microservice-side only), leaving the router unaware of types. + +* Endpoint discovery MUST work by: + + * Runtime reflection: scanning assemblies for `[StellaEndpoint]` and handler interfaces. + * Build-time reflection via source generation: + + * A Roslyn source generator MUST generate a descriptor list at build time. + * At runtime, the SDK MUST prefer source-generated metadata and only fall back to reflection if generation is not available. + +### 7.3 Endpoint metadata defaults & overrides + +* Microservices MUST be able to provide default endpoint metadata: + + * `SupportsStreaming` flag. + * Default timeout. + * Default `RequiringClaims`. +* Microservice-local YAML MUST be allowed to override or refine these defaults per endpoint, keyed by `(Method, Path)`. +* Precedence rules MUST be clearly defined and honored: + + * Service identity & router pool: from `StellaMicroserviceOptions` (not YAML). + * Endpoint set: from code (attributes/source gen); YAML MAY override properties but ideally not create endpoints not present in code (policy decision to be documented). + * `RequiringClaims` and timeouts: YAML overrides defaults from code, unless overridden by central Authority. + +### 7.4 Connection behavior + +* On establishing a connection to a router endpoint, the SDK MUST: + + * Immediately send a HELLO frame containing: + + * `ServiceName`, `Version`, `Region`, `InstanceId`.
+ * The list of endpoints (Method, Path) with their metadata (SupportsStreaming, default timeouts, default RequiringClaims). +* At regular intervals, the SDK MUST send HEARTBEAT frames on each connection indicating: + + * Instance health status. + * Optional metrics (e.g. in-flight request count, error rate). +* The SDK SHOULD support optional ENDPOINTS_UPDATE (or a re-HELLO) to update endpoint metadata at runtime if needed. + +### 7.5 Request handling & streaming + +* For each incoming REQUEST frame: + + * The SDK MUST create a `RawRequestContext` with: + + * Method. + * Path. + * Headers. + * A `Body` stream that either: + + * Wraps a buffered byte array. + * Or exposes streaming reads from subsequent REQUEST_STREAM_DATA frames. + * A `CancellationToken` that will be cancelled when the router sends a CANCEL frame or the connection fails. +* The SDK MUST resolve the correct endpoint handler by `(Method, Path)` using the same path template rules as the router. +* For streaming endpoints, handlers MUST be able to read from `RawRequestContext.Body` incrementally and obey the `CancellationToken`. + +### 7.6 Cancellation handling (microservice side) + +* The SDK MUST maintain a map of in-flight requests by correlation ID, each containing: + + * A `CancellationTokenSource`. + * The task executing the handler. +* Upon receiving a CANCEL frame for a given correlation ID, the SDK MUST: + + * Look up the corresponding entry and call `CancellationTokenSource.Cancel()`. +* Handlers (both raw and typed) MUST receive a `CancellationToken`: + + * They MUST observe the token and be coded to cancel promptly where needed. + * They MUST pass the token to downstream I/O operations (DB calls, file I/O, network). +* If the transport connection is closed, the SDK MUST treat it as a cancellation trigger for all outstanding requests on that connection and cancel their tokens. + +--- + +## 8. 
Control / health / ping requirements + +* Heartbeats MUST be sent over the same connection as requests (no separate control channel). +* The router MUST: + + * Track `LastHeartbeatUtc` for each connection. + * Derive `InstanceHealthStatus` based on heartbeat recency and optionally metrics. + * Drop or mark as Unhealthy any instances whose heartbeats are stale past configured thresholds. +* The router SHOULD measure network latency (ping) by: + + * Timing request-response round trips, or + * Using explicit ping frames, and updating `AveragePingMs` for each connection. +* The router MUST use heartbeat and ping metrics in its routing decision as described above. + +--- + +## 9. Authorization / requiringClaims / Authority requirements + +* `RequiringClaims` MUST be the only authorization metadata field; `AllowedRoles` MUST NOT be used. +* Every endpoint MUST be able to specify: + + * An empty `RequiringClaims` list (no additional claims required beyond authenticated). + * Or one or more `ClaimRequirement` objects (Type + optional Value). +* The gateway MUST enforce `RequiringClaims` per request: + + * Authorization MUST check that the request’s user principal has all required claims for the endpoint. +* Microservices MUST provide default `RequiringClaims` as part of their HELLO metadata. +* There MUST be a mechanism for an external Authority service to override `RequiringClaims` centrally: + + * Defaults MUST come from microservices. + * Authority MUST be able to push or supply overrides that the gateway applies at startup and/or at runtime. + * The gateway MUST proactively request such overrides on startup (e.g. via a special message or mechanism) before handling traffic, or as early as practical. +* Final, effective `RequiringClaims` enforced at the gateway MUST be derived from microservice defaults plus Authority overrides, with Authority taking precedence where applicable. + +--- + +## 10. 
Cancellation requirements (router side) + +* The protocol MUST define a `FrameType.Cancel` with: + + * A `CorrelationId` indicating which request to cancel. + * An optional payload containing a reason code (e.g. `"ClientDisconnected"`, `"Timeout"`, `"PayloadLimitExceeded"`). +* The router MUST send CANCEL frames when: + + * The HTTP client disconnects (ASP.NET `HttpContext.RequestAborted` fires) while the request is in progress. + * The router’s effective timeout for the request elapses, and no response has been received. + * The router detects payload/memory limit breaches and has to abort the request. + * The router is shutting down and explicitly aborts in-flight requests (if implemented). +* The router MUST: + + * Stop forwarding any additional REQUEST_STREAM_DATA to the microservice once a CANCEL is sent. + * Stop reading any remaining response frames for that correlation and either: + + * Discard them. + * Or treat them as late, log them, and ignore them. +* For streaming responses, if the HTTP client disconnects or router cancels: + + * The router MUST stop writing to the HTTP response and treat any subsequent frames as ignored. + +--- + +## 11. Configuration and YAML requirements + +* `__Libraries/StellaOps.Router.Config` MUST handle: + + * Binding router config from JSON/appsettings + YAML + environment variables. + * Static service definitions: + + * ServiceName. + * DefaultVersion. + * DefaultTransport. + * Endpoint list (Method, Path) with default timeouts, requiringClaims, streaming flags. + * Static instance definitions (optional): + + * ServiceName, Version, Region, supported transports, plugin-specific settings. + * Global payload limits (`PayloadLimits`). +* Router YAML config MUST support hot-reload: + + * Changes SHOULD be picked up at runtime without restarting the gateway. + * Hot-reload MUST cause in-memory routing state to be updated, including: + + * New or removed services/endpoints. + * New or removed instances (static). 
+ * Updated payload limits. +* Microservice YAML config MUST be optional and used for endpoint-level overrides only, not for identity or router pool configuration. +* The router pool for microservices MUST be configured via code and MAY be backed by YAML (with hot-plug / reconnection behavior) if desired. + +--- + +## 12. Library naming / repo structure requirements + +* The router configuration library MUST be named `__Libraries/StellaOps.Router.Config`. +* The microservice SDK library MUST be named `__Libraries/StellaOps.Microservice`. +* The gateway webservice MUST be named `StellaOps.Gateway.WebService`. +* There MUST be a “common” library for shared types and abstractions (e.g. `__Libraries/StellaOps.Router.Common`). +* Documentation files MUST include at least: + + * `Stella Ops Router.md` (what it is, why, high-level architecture). + * `Stella Ops Router - Webserver.md` (how the webservice works). + * `Stella Ops Router - Microservice.md` (how the microservice SDK works and is implemented). + * `Stella Ops Router - Common.md` (common components and how they are implemented). + * `Migration of Webservices to Microservices.md`. + * `Stella Ops Router Documentation.md` (doc structure & guidance). + +--- + +## 13. Documentation & developer-experience requirements + +* The docs MUST be detailed; “do not spare details” implies: + + * High-fidelity, concrete examples and not hand-wavy descriptions. +* For average C# developers, documentation MUST cover: + + * Exact .NET / ASP.NET Core target version and runtime baseline. + * Required NuGet packages (logging, serialization, YAML parsing, RabbitMQ client, etc.). + * Exact serialization formats for frames and payloads (JSON vs MessagePack vs others). + * Exact framing rules for each transport (length-prefix for TCP/TLS, datagrams for UDP, exchanges/queues for Rabbit). + * Concrete sample `Program.cs` for: + + * A gateway node. + * A microservice. 
+ * Example endpoint implementations: + + * Typed (with and without request). + * Raw streaming endpoints for large payloads. + * Example router YAML and microservice YAML with realistic values. + * Error and HTTP status mapping policy: + + * E.g. “version not found → 404 or 400; no instance available → 503; timeout → 504; payload too large → 413.” + * Guidelines on: + + * When to use UDP vs TCP vs RabbitMQ. + * How to configure and validate certificates for the certificate transport. + * How to write cancellation-friendly handlers (proper use of `CancellationToken`). + * Testing strategies: local dev setups, integration test harnesses, how to run router + microservice together for tests. + * Clear explanation of config precedence: + + * Code options vs YAML vs microservice defaults vs Authority for claims. +* Documentation MUST answer for each major concept: + + * What it is. + * Why it exists. + * How it works. + * How to use it (with examples). + * What happens when it is misused and how to debug issues. + +--- + +## 14. Migration requirements + +* There MUST be a defined migration path from `StellaOps.*.WebServices` to `StellaOps.*.Microservices`. +* Migration documentation MUST cover: + + * Inventorying existing HTTP routes (Method + Path). + * Strategy A (in-place adaptation): + + * Adding microservice SDK into WebService. + * Declaring endpoints with `[StellaEndpoint]`. + * Wrapping existing controller logic in handlers. + * Connecting to the router and validating registration. + * Gradually shifting traffic from direct WebService HTTP ingress to gateway routing. + * Strategy B (split): + + * Extracting domain logic into shared libraries. + * Creating a dedicated microservice project using the SDK. + * Mapping routes and handlers. + * Phasing out or repurposing the original WebService. + * Ensuring cancellation tokens are wired throughout migrated code. 
+ * Handling streaming endpoints (large uploads/downloads) via `IRawStellaEndpoint` and streaming support instead of naive buffered HTTP controllers. + +--- + +If you want, I can next turn this requirement set into a machine-readable checklist (e.g. JSON or YAML) or derive a first-pass implementation roadmap directly from these requirements.