router planning

This commit is contained in:
master
2025-12-02 18:38:32 +02:00
parent 790801f329
commit 0c9e8d5d18
15 changed files with 6439 additions and 0 deletions

docs/router/implplan.md

Start by treating `docs/router/specs.md` as law. Nothing gets coded that contradicts it. The first sprint or two should be about *wiring the skeleton* and proving the core flows with the simplest possible transport, then layering in the real transports and migration paths.

I’d structure the work for your agents like this.

---
## 0. Read & freeze invariants

**All agents:**

* Read `docs/router/specs.md` end to end.
* Extract and pin the non-negotiables:
  * Method + Path identity.
  * Strict semver for versions.
  * Region from `GatewayNodeConfig.Region` (no host/header magic).
  * No HTTP transport for microservice communication.
  * Single connection carrying HELLO + HEARTBEAT + REQUEST/RESPONSE + CANCEL.
  * Router treats body as opaque bytes/streams.
  * `RequiringClaims` replaces any form of `AllowedRoles`.

Agree that these are invariants; any future idea that violates them needs an explicit spec change first.
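
The “strict semver” invariant is mechanical enough to pin down in code. Below is a minimal Python sketch of it. The plan targets .NET, and Python is used here only to show the rule. It relies on the SemVer 2.0.0 regular expression published on semver.org, so it rejects shortened versions such as `1.2`, leading zeros such as `01.2.3`, and a `v` prefix. The function name `parse_strict_semver` is hypothetical and not part of the spec.

```python
from __future__ import annotations

import re

# SemVer 2.0.0 grammar, taken from the regular expression published on semver.org.
# fullmatch() anchors it at both ends, so trailing characters (even "\n") are rejected.
_SEMVER = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?"
)


def parse_strict_semver(text: str) -> tuple[int, int, int, str | None, str | None]:
    """Return (major, minor, patch, prerelease, build) or raise ValueError."""
    match = _SEMVER.fullmatch(text)
    if match is None:
        raise ValueError(f"not a strict semantic version: {text!r}")
    major, minor, patch, prerelease, build = match.groups()
    return int(major), int(minor), int(patch), prerelease, build
```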

---

## 1. Lay down the solution skeleton

**“Skeleton” agent (or gateway core agent):**

Create the basic project structure, no logic yet:

* `src/__Libraries/StellaOps.Router.Common`
* `src/__Libraries/StellaOps.Router.Config`
* `src/__Libraries/StellaOps.Microservice`
* `src/StellaOps.Gateway.WebService`
* `docs/router/` already has `specs.md` (add placeholders for the other docs).

Goal: everything builds, but most classes are empty or stubs.

---
## 2. Implement the shared core model (Common)

**Common/core agent:**

Implement only the *data* and *interfaces*, no behavior:

* Enums:
  * `TransportType`, `FrameType`, `InstanceHealthStatus`.
* Models:
  * `ClaimRequirement`
  * `EndpointDescriptor`
  * `InstanceDescriptor`
  * `ConnectionState`
  * `RoutingContext`, `RoutingDecision`
  * `PayloadLimits`
* Interfaces:
  * `IGlobalRoutingState`
  * `IRoutingPlugin`
  * `ITransportServer`
  * `ITransportClient`
* `Frame` struct/class:
  * `FrameType`, `CorrelationId`, `Payload` (`byte[]`).

Leave implementations of `IGlobalRoutingState`, `IRoutingPlugin`, transports, etc., for later steps.

Deliverable: a stable set of contracts that gateway + microservice SDK depend on.
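
Here is a minimal sketch of what these contracts could look like, written as Python dataclasses purely for illustration (the real types are .NET). A few details come from the plan: `Frame` carries a frame type, a correlation ID and a raw byte payload; endpoints are identified by Method + Path; and `RequiringClaims` replaces roles. Everything else is assumed, including the snake_case field names and the `InstanceDescriptor` fields (`version`, `region`, which are inferred from the semver and region invariants).

```python
from __future__ import annotations

import enum
import uuid
from dataclasses import dataclass


class TransportType(enum.Enum):
    IN_MEMORY = enum.auto()
    TCP = enum.auto()
    CERTIFICATE = enum.auto()
    UDP = enum.auto()
    RABBITMQ = enum.auto()


class FrameType(enum.Enum):
    HELLO = enum.auto()
    HEARTBEAT = enum.auto()
    REQUEST = enum.auto()
    RESPONSE = enum.auto()
    # CANCEL and the stream-data frame types are added in later steps.


class InstanceHealthStatus(enum.Enum):
    HEALTHY = enum.auto()
    UNHEALTHY = enum.auto()


@dataclass(frozen=True)
class ClaimRequirement:
    claim_type: str
    value: str | None = None  # assumption: None means "claim present, any value"


@dataclass(frozen=True)
class EndpointDescriptor:
    method: str
    path: str
    supports_streaming: bool = False
    requiring_claims: tuple[ClaimRequirement, ...] = ()

    @property
    def key(self) -> tuple[str, str]:
        # Method + Path is the endpoint identity.
        return (self.method, self.path)


@dataclass(frozen=True)
class InstanceDescriptor:
    instance_id: str
    service_name: str
    version: str  # strict semver string
    region: str


@dataclass(frozen=True)
class Frame:
    frame_type: FrameType
    correlation_id: uuid.UUID
    payload: bytes = b""  # opaque to the router
```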

---

## 3. Build a fake “in-memory” transport plugin

**Transport agent:**

Before UDP/TCP/Rabbit, build an **in-process transport**:

* `InMemoryTransportServer` and `InMemoryTransportClient`.
* They share a concurrent dictionary keyed by `ConnectionId`.
* Frames are passed via channels/queues in memory.

Purpose:

* Let you prove HELLO/HEARTBEAT/REQUEST/RESPONSE/CANCEL semantics and routing logic *without* dealing with sockets and Rabbit yet.
* Let you unit and integration test the router and SDK quickly.

This plugin will never ship to production; it’s only for dev tests and CI.
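
Below is a minimal Python sketch of the in-process transport. It assumes thread-safe `queue.Queue` objects as the in-memory channels and a lock-guarded dictionary keyed by `ConnectionId` as the shared registry. `InMemoryHub` and the `accept`/`send`/`receive` method names are hypothetical and do not reflect the real `ITransportServer`/`ITransportClient` contracts. Frames are passed through as opaque objects.

```python
from __future__ import annotations

import queue
import threading
import uuid
from typing import Any


class InMemoryHub:
    """Stand-in for the network: ConnectionId -> pair of in-memory queues."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # ConnectionId -> (queue toward the server, queue toward the client)
        self._pipes: dict[str, tuple[queue.Queue, queue.Queue]] = {}
        # Newly opened ConnectionIds waiting for the server to accept them.
        self.pending: queue.Queue = queue.Queue()

    def open(self) -> str:
        connection_id = str(uuid.uuid4())
        with self._lock:
            self._pipes[connection_id] = (queue.Queue(), queue.Queue())
        self.pending.put(connection_id)
        return connection_id

    def pipes(self, connection_id: str) -> tuple[queue.Queue, queue.Queue]:
        with self._lock:
            return self._pipes[connection_id]


class InMemoryTransportClient:
    """Microservice side: one logical connection to the gateway."""

    def __init__(self, hub: InMemoryHub) -> None:
        self._hub = hub
        self.connection_id = hub.open()

    def send(self, frame: Any) -> None:
        self._hub.pipes(self.connection_id)[0].put(frame)

    def receive(self, timeout: float | None = None) -> Any:
        return self._hub.pipes(self.connection_id)[1].get(timeout=timeout)


class InMemoryTransportServer:
    """Gateway side: accepts connections and exchanges frames on each one."""

    def __init__(self, hub: InMemoryHub) -> None:
        self._hub = hub

    def accept(self, timeout: float | None = None) -> str:
        return self._hub.pending.get(timeout=timeout)

    def receive(self, connection_id: str, timeout: float | None = None) -> Any:
        return self._hub.pipes(connection_id)[0].get(timeout=timeout)

    def send(self, connection_id: str, frame: Any) -> None:
        self._hub.pipes(connection_id)[1].put(frame)
```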

---

## 4. Microservice SDK: minimal handshake & dispatch (with InMemory)

**Microservice agent:**

Initial focus: “connect and say HELLO, then handle a simple request.”

1. Implement `StellaMicroserviceOptions`.
2. Implement `AddStellaMicroservice(...)`:
   * Bind options.
   * Register endpoint handlers and SDK internal services.
3. Endpoint discovery:
   * Implement runtime reflection for `[StellaEndpoint]` + handler types.
   * Build an in-memory `EndpointDescriptor` list (simple: no YAML yet).
4. Connection:
   * Use `InMemoryTransportClient` to “connect” to a fake router.
   * On connect, send a HELLO frame with:
     * Identity.
     * Endpoint list and metadata (`SupportsStreaming` false for now, `RequiringClaims` empty).
5. Request handling:
   * Implement `IRawStellaEndpoint` and an adapter to it.
   * Implement `RawRequestContext` / `RawResponse`.
   * Implement a dispatcher that:
     * Receives a `Request` frame.
     * Builds `RawRequestContext`.
     * Invokes the correct handler.
     * Sends a `Response` frame.

Do **not** handle streaming or cancellation yet; just basic request/response with small bodies.
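
The discovery-and-dispatch loop for this step can be sketched in Python. The sketch makes two assumptions: a decorator stands in for reflection over `[StellaEndpoint]`, and frames are represented as already-decoded dictionaries, because the wire encoding is not defined at this step. `stella_endpoint`, `hello_payload` and `dispatch` are hypothetical names. Answering an unknown route with status 404 is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RawRequestContext:
    method: str
    path: str
    headers: dict[str, str]
    body: bytes


@dataclass
class RawResponse:
    status_code: int = 200
    headers: dict[str, str] = field(default_factory=dict)
    body: bytes = b""


Handler = Callable[[RawRequestContext], RawResponse]

# (Method, Path) -> handler. Filled by the decorator below, standing in for
# reflection over [StellaEndpoint] in the .NET SDK.
_REGISTRY: dict[tuple[str, str], Handler] = {}


def stella_endpoint(method: str, path: str) -> Callable[[Handler], Handler]:
    def register(handler: Handler) -> Handler:
        _REGISTRY[(method, path)] = handler
        return handler
    return register


def hello_payload(service: str, version: str) -> dict:
    # Step 4: no streaming and no claims are advertised yet.
    return {
        "service": service,
        "version": version,
        "endpoints": [
            {"method": m, "path": p, "supportsStreaming": False, "requiringClaims": []}
            for (m, p) in _REGISTRY
        ],
    }


def dispatch(request: dict) -> dict:
    """Build a context from a Request frame, run the handler, return a Response frame."""
    ctx = RawRequestContext(
        method=request["method"],
        path=request["path"],
        headers=request.get("headers", {}),
        body=request.get("body", b""),
    )
    handler = _REGISTRY.get((ctx.method, ctx.path))
    # Assumption: an unknown route yields 404 rather than a protocol error.
    response = handler(ctx) if handler else RawResponse(status_code=404)
    return {
        "type": "Response",
        "correlationId": request["correlationId"],  # echoed so the gateway can match it
        "status": response.status_code,
        "headers": response.headers,
        "body": response.body,
    }


@stella_endpoint("GET", "/ping")
def ping(ctx: RawRequestContext) -> RawResponse:
    return RawResponse(body=b"pong")
```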

---

## 5. Gateway: minimal routing using InMemory plugin

**Gateway agent:**

Goal: HTTP → in-memory transport → microservice → HTTP response.

1. Implement `GatewayNodeConfig` and bind it from config.
2. Implement `IGlobalRoutingState` as a simple in-memory implementation that:
   * Holds `ConnectionState` objects.
   * Builds a map `(Method, Path)` → endpoint + connections.
3. Implement a minimal `IRoutingPlugin` that:
   * For now, just picks *any* connection that has the endpoint (no region/ping logic yet).
4. Implement a minimal HTTP pipeline:
   * `EndpointResolutionMiddleware`:
     * `(Method, Path)` → `EndpointDescriptor` from `IGlobalRoutingState`.
   * Naive authorization middleware stub (only checks “needs authenticated user”; ignore real `RequiringClaims` for now).
   * `RoutingDecisionMiddleware`:
     * Ask `IRoutingPlugin` for a `RoutingDecision`.
   * `TransportDispatchMiddleware`:
     * Build a `Request` frame.
     * Use `InMemoryTransportClient` to send and await `Response`.
     * Map response to HTTP.
5. Implement the HELLO handler on the gateway side:
   * When an InMemory “connection” from a microservice appears and sends HELLO:
     * Construct `ConnectionState`.
     * Update `IGlobalRoutingState` with endpoint → connection mapping.

Once this works, you have end-to-end:

* Example microservice.
* Example gateway.
* In-memory transport.
* A couple of test endpoints returning simple JSON.
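
The gateway half of this flow can be sketched in Python. The sketch assumes HELLO payloads arrive as decoded dictionaries, and `InMemoryRoutingState` and `pick_any` are hypothetical names. HELLO populates the `(Method, Path)` → connections map, and the step-5 plugin simply takes the first candidate. A repeated HELLO on the same connection is not deduplicated here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConnectionState:
    connection_id: str
    service: str
    version: str
    endpoints: set[tuple[str, str]] = field(default_factory=set)


class InMemoryRoutingState:
    """Minimal IGlobalRoutingState: (Method, Path) -> connections serving it."""

    def __init__(self) -> None:
        self._connections: dict[str, ConnectionState] = {}
        self._by_endpoint: dict[tuple[str, str], list[str]] = {}

    def on_hello(self, connection_id: str, hello: dict) -> None:
        # Build ConnectionState from HELLO and index each advertised endpoint.
        state = ConnectionState(
            connection_id=connection_id,
            service=hello["service"],
            version=hello["version"],
            endpoints={(e["method"], e["path"]) for e in hello["endpoints"]},
        )
        self._connections[connection_id] = state
        for key in state.endpoints:
            self._by_endpoint.setdefault(key, []).append(connection_id)

    def resolve(self, method: str, path: str) -> list[ConnectionState]:
        ids = self._by_endpoint.get((method, path), [])
        return [self._connections[i] for i in ids]


def pick_any(candidates: list[ConnectionState]) -> ConnectionState | None:
    # Step-5 plugin: any connection that has the endpoint; no region/ping yet.
    return candidates[0] if candidates else None
```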

---

## 6. Add heartbeat, health, and basic routing rules

**Common/core + gateway agent:**

Now enforce liveness and basic routing:

1. Heartbeat:
   * Microservice SDK sends HEARTBEAT frames on a timer.
   * Gateway updates `LastHeartbeatUtc` and `Status`.
2. Health:
   * Add a background job in the gateway that:
     * Marks instances Unhealthy if their heartbeat is stale.
3. Routing:
   * Enhance `IRoutingPlugin` to:
     * Filter out Unhealthy instances.
     * Prefer the gateway’s region (using `GatewayNodeConfig.Region`).
     * Use a simple `AveragePingMs` stub from request/response timings.

This still runs on the InMemory transport; the work here is only the selection logic.
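
The three rules can be sketched in Python. The names (`Instance`, `mark_stale`, `record_ping`, `choose`) are hypothetical, health status is a plain string, and an exponentially weighted moving average serves as the `AveragePingMs` stub. The smoothing factor and staleness window are placeholders, not values from the spec. The sketch also assumes that “prefer” means falling back to other regions when the gateway’s own region has no healthy instance.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Instance:
    connection_id: str
    region: str
    last_heartbeat_utc: datetime
    status: str = "Healthy"
    average_ping_ms: float | None = None  # None until the first timing sample


def on_heartbeat(inst: Instance, now: datetime) -> None:
    inst.last_heartbeat_utc = now
    inst.status = "Healthy"


def mark_stale(instances: list[Instance], now: datetime, max_age: timedelta) -> None:
    # Background health job: no heartbeat within max_age -> Unhealthy.
    for inst in instances:
        if now - inst.last_heartbeat_utc > max_age:
            inst.status = "Unhealthy"


def record_ping(inst: Instance, sample_ms: float, alpha: float = 0.2) -> None:
    # Exponentially weighted moving average of request/response round trips.
    if inst.average_ping_ms is None:
        inst.average_ping_ms = sample_ms
    else:
        inst.average_ping_ms = alpha * sample_ms + (1 - alpha) * inst.average_ping_ms


def choose(instances: list[Instance], gateway_region: str) -> Instance | None:
    healthy = [i for i in instances if i.status != "Unhealthy"]
    local = [i for i in healthy if i.region == gateway_region]
    pool = local or healthy  # prefer the gateway's region, else fall back
    if not pool:
        return None
    # Unmeasured instances sort first so they get a timing sample.
    return min(pool, key=lambda i: -1.0 if i.average_ping_ms is None else i.average_ping_ms)
```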

---

## 7. Add cancellation semantics (with InMemory)

**Microservice + gateway agents:**

Wire up cancellation logic before touching real transports:

1. Common:
   * Extend `FrameType` with `Cancel`.
2. Gateway:
   * In `TransportDispatchMiddleware`:
     * Tie `HttpContext.RequestAborted` to a `SendCancelAsync` call.
     * On timeout, send CANCEL.
     * Ignore late `Response`/stream data for canceled correlation IDs.
3. Microservice:
   * Maintain an `_inflight` map of correlation ID → `CancellationTokenSource`.
   * When a `Cancel` frame arrives, call `cts.Cancel()`.
   * Ensure handlers receive and honor `CancellationToken`.

Prove via tests: if the client disconnects, the handler stops quickly.
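
Both sides can be sketched in Python, using `threading.Event` as a stand-in for `CancellationTokenSource`. Class and method names (`InflightRequests`, `PendingResponses`, `on_cancel_frame`, `cancellable_handler`) are hypothetical, and `send_cancel` is any callable that writes a CANCEL frame. A production version would also expire entries from the canceled set instead of letting it grow.

```python
from __future__ import annotations

import threading
from typing import Callable


class InflightRequests:
    """Microservice side: correlation ID -> cancellation signal."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: dict[str, threading.Event] = {}

    def start(self, correlation_id: str) -> threading.Event:
        token = threading.Event()  # handlers poll is_set() or wait() on it
        with self._lock:
            self._inflight[correlation_id] = token
        return token

    def finish(self, correlation_id: str) -> None:
        with self._lock:
            self._inflight.pop(correlation_id, None)

    def on_cancel_frame(self, correlation_id: str) -> bool:
        with self._lock:
            token = self._inflight.get(correlation_id)
        if token is None:
            return False  # request already finished; nothing to cancel
        token.set()
        return True


class PendingResponses:
    """Gateway side: send CANCEL and ignore anything that arrives afterwards."""

    def __init__(self, send_cancel: Callable[[str], None]) -> None:
        self._send_cancel = send_cancel
        self._canceled: set[str] = set()

    def cancel(self, correlation_id: str) -> None:
        # Called when the HTTP client aborts or the request times out.
        self._canceled.add(correlation_id)
        self._send_cancel(correlation_id)

    def should_deliver(self, correlation_id: str) -> bool:
        # Late Response/stream frames for canceled IDs are dropped.
        return correlation_id not in self._canceled


def cancellable_handler(token: threading.Event, steps: int) -> str:
    # Example handler that checks the token between units of work.
    for _ in range(steps):
        if token.wait(0.01):
            return "canceled"
    return "completed"
```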

---

## 8. Add streaming & payload limits (still InMemory)

**Gateway + microservice agents:**

1. Streaming:
   * Extend the InMemory transport to support `RequestStreamData` / `ResponseStreamData` frames.
   * On the gateway:
     * For `SupportsStreaming` endpoints, pipe HTTP body stream → frame stream.
     * For the response, pipe frames → HTTP response stream.
   * On the microservice:
     * Expose `RawRequestContext.Body` as a stream reading frames as they arrive.
     * Allow `RawResponse.WriteBodyAsync` to stream out.
2. Payload limits:
   * Implement `PayloadLimits` enforcement at the gateway:
     * Reject early when `Content-Length` is too large.
     * Track byte counters while streaming; trigger cancellation when a threshold is exceeded.

Demonstrate with a fake “upload” endpoint that uses `IRawStellaEndpoint` and streaming.
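
The two enforcement points can be sketched in Python: an up-front `Content-Length` check, plus a counting wrapper around the body stream. `PayloadTooLarge`, `check_content_length` and `limited_chunks` are hypothetical names. In the gateway, the point where the wrapper raises is where cancellation (a CANCEL frame upstream) would be triggered.

```python
from __future__ import annotations

from typing import Iterable, Iterator


class PayloadTooLarge(Exception):
    """Raised when a request body exceeds the configured limit."""


def check_content_length(content_length: int | None, max_bytes: int) -> None:
    # Early reject: the declared size is already over the limit.
    if content_length is not None and content_length > max_bytes:
        raise PayloadTooLarge(f"Content-Length {content_length} exceeds {max_bytes}")


def limited_chunks(chunks: Iterable[bytes], max_bytes: int) -> Iterator[bytes]:
    """Forward body chunks while counting bytes; fail once the limit is exceeded.

    Content-Length can be absent (for example with chunked transfer encoding),
    so the streaming counter is the check that actually bounds the body.
    """
    total = 0
    for chunk in chunks:
        total += len(chunk)
        if total > max_bytes:
            # In the gateway, this is where cancellation would be triggered.
            raise PayloadTooLarge(f"streamed {total} bytes, limit is {max_bytes}")
        yield chunk
```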

---

## 9. Implement real transport plugins one by one

**Transport agent:**

Now replace InMemory with real transports, in this order:

1. **TCP plugin** (easiest baseline):
   * Length-prefixed frame protocol.
   * Connection per microservice instance (or multi-instance if needed later).
   * Implement HELLO/HEARTBEAT/REQUEST/RESPONSE/STREAM/CANCEL as per the frame model.
2. **Certificate (TLS) plugin**:
   * Wrap the TCP plugin with TLS.
   * Add configuration for server & client certs.
3. **UDP plugin**:
   * Single datagram = single frame; no streaming.
   * Enforce `MaxRequestBytesPerCall`.
   * Use for small, idempotent operations.
4. **RabbitMQ plugin**:
   * Add exchanges/queues for HELLO/HEARTBEAT and REQUEST/RESPONSE.
   * Use `CorrelationId` properties for matching.
   * Guarantee at-most-once semantics where practical.

While each plugin is built, keep the core router and microservice SDK relying only on `ITransportClient`/`ITransportServer` abstractions.
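
For the TCP plugin, a length-prefixed frame codec might look like the Python sketch below. The wire layout (4-byte big-endian length, 1-byte frame type, 16-byte correlation ID, then payload) and the `MAX_FRAME_BYTES` cap are assumptions, not the spec’s format. `decode_frame` reads from any binary stream, so the same code works over `socket.makefile("rb")` or an in-memory `io.BytesIO`.

```python
from __future__ import annotations

import io
import struct
import uuid
from typing import BinaryIO

# Hypothetical wire layout (not defined by the plan):
#   uint32 big-endian length of everything after this field
#   uint8  frame type
#   16 bytes correlation ID (UUID)
#   payload bytes
_HEADER = struct.Struct(">I")
_BODY_PREFIX = struct.Struct(">B16s")
MAX_FRAME_BYTES = 16 * 1024 * 1024  # assumption: reject absurd lengths early


def encode_frame(frame_type: int, correlation_id: uuid.UUID, payload: bytes) -> bytes:
    body = _BODY_PREFIX.pack(frame_type, correlation_id.bytes) + payload
    return _HEADER.pack(len(body)) + body


def _read_exact(stream: BinaryIO, n: int) -> bytes:
    # A single read() may return fewer bytes than asked for (as on sockets).
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("connection closed mid-frame")
        buf += chunk
    return bytes(buf)


def decode_frame(stream: BinaryIO) -> tuple[int, uuid.UUID, bytes]:
    (length,) = _HEADER.unpack(_read_exact(stream, _HEADER.size))
    if length < _BODY_PREFIX.size or length > MAX_FRAME_BYTES:
        raise ValueError(f"invalid frame length {length}")
    body = _read_exact(stream, length)
    frame_type, raw_id = _BODY_PREFIX.unpack_from(body)
    return frame_type, uuid.UUID(bytes=raw_id), body[_BODY_PREFIX.size:]


def decode_all(data: bytes) -> list[tuple[int, uuid.UUID, bytes]]:
    # Convenience for tests: decode every frame in a complete byte buffer.
    stream = io.BytesIO(data)
    frames = []
    while stream.tell() < len(data):
        frames.append(decode_frame(stream))
    return frames
```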

---

## 10. Add Router.Config + Microservice YAML integration

**Config agent:**

1. Implement `src/__Libraries/StellaOps.Router.Config`:
   * YAML → `RouterConfig` binding.
   * Services, endpoints, static instances, payload limits.
   * Hot-reload via `IOptionsMonitor` / file watcher.
2. Implement microservice YAML:
   * Endpoint-level overrides only (timeouts, `RequiringClaims`, `SupportsStreaming`).
   * Merge logic: code defaults → YAML override.
3. Integrate:
   * Gateway uses `RouterConfig` for:
     * Defaults when no microservice is registered yet.
     * Payload limits.
   * Microservice uses YAML to refine endpoint metadata before sending HELLO.
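
The “code defaults → YAML override” merge can be sketched in Python, operating on the dictionary that a YAML loader (for example PyYAML’s `yaml.safe_load`) would produce. The key names (`timeoutSeconds`, `supportsStreaming`, `requiringClaims`), the 30-second default, and the choice to reject unknown keys and unknown endpoints are all assumptions. They illustrate the “endpoint-level overrides only” rule rather than define it.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EndpointSettings:
    method: str
    path: str
    timeout_seconds: float = 30.0  # hypothetical code default
    supports_streaming: bool = False
    requiring_claims: tuple[str, ...] = ()


# YAML key -> field; only these may be overridden ("endpoint-level overrides only").
_OVERRIDABLE = {
    "timeoutSeconds": "timeout_seconds",
    "supportsStreaming": "supports_streaming",
    "requiringClaims": "requiring_claims",
}


def apply_overrides(defaults: list[EndpointSettings], yaml_doc: dict) -> list[EndpointSettings]:
    """Merge code defaults with YAML endpoint overrides; YAML wins per field."""
    overrides = {(e["method"], e["path"]): e for e in yaml_doc.get("endpoints", [])}
    known = {(ep.method, ep.path) for ep in defaults}
    unknown_endpoints = set(overrides) - known
    if unknown_endpoints:
        # YAML may refine endpoints declared in code, never add new ones.
        raise ValueError(f"YAML overrides unknown endpoints: {sorted(unknown_endpoints)}")

    merged = []
    for ep in defaults:
        entry = overrides.get((ep.method, ep.path), {})
        bad_keys = set(entry) - set(_OVERRIDABLE) - {"method", "path"}
        if bad_keys:
            raise ValueError(f"{ep.method} {ep.path}: unsupported keys {sorted(bad_keys)}")
        changes = {_OVERRIDABLE[k]: v for k, v in entry.items() if k in _OVERRIDABLE}
        if "requiring_claims" in changes:
            changes["requiring_claims"] = tuple(changes["requiring_claims"])
        merged.append(replace(ep, **changes))
    return merged
```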

---

## 11. Build a reference example + migration skeleton

**DX / migration agent:**

1. Build a `StellaOps.Billing.Microservice` example:
   * A couple of simple endpoints (GET/POST).
   * One streaming upload endpoint.
   * YAML for `RequiringClaims` and timeouts.
2. Build a `StellaOps.Gateway.WebService` example config around it.
3. Document the full path:
   * How to run both locally.
   * How to add a new endpoint.
   * How cancellation behaves (killing the client, watching logs).
   * How payload limits work (try uploading a file that is too large).
4. Outline migration steps from an imaginary `StellaOps.Billing.WebService` using the patterns in `Migration of Webservices to Microservices.md`.

---
## 12. Process guidance for your agents

* **Do not jump to UDP/TCP immediately.**
  Prove the protocol (HELLO/HEARTBEAT/REQUEST/RESPONSE/STREAM/CANCEL), routing, and limits on the InMemory plugin first.
* **Guard the invariants.**
  If someone proposes “just call HTTP between services” or “let’s derive region from host,” they’re violating the spec and must update `docs/router/specs.md` before coding.
* **Keep Common stable.**
  Changes to `StellaOps.Router.Common` must be rare and reviewed; everything else depends on it.
* **Document as you go.**
  Every time a behavior settles (e.g. status mapping, frame layout), update the docs under `docs/router/` so new agents always have a single source of truth.

If you want, as a next step I can convert this into a task board (epic → stories) per repo folder, so you can assign specific chunks to named agents.