git.stella-ops.org/docs/router/implplan.md
2025-12-02 18:38:32 +02:00

Start by treating `docs/router/specs.md` as law. Nothing gets coded that contradicts it. The first sprint or two should be about *wiring the skeleton* and proving the core flows with the simplest possible transport, then layering in the real transports and migration paths.

I'd structure the work for your agents like this.

---
## 0. Read & freeze invariants

**All agents:**

* Read `docs/router/specs.md` end to end.
* Extract and pin the non-negotiables:
  * Method + Path identity.
  * Strict semver for versions.
  * Region from `GatewayNodeConfig.Region` (no host/header magic).
  * No HTTP transport for microservice communications.
  * Single connection carrying HELLO + HEARTBEAT + REQUEST/RESPONSE + CANCEL.
  * Router treats body as opaque bytes/streams.
  * `RequiringClaims` replaces any form of `AllowedRoles`.

Agree that these are invariants; any future idea that violates them needs an explicit spec change first.

---
## 1. Lay down the solution skeleton

**“Skeleton” agent (or gateway core agent):**

Create the basic project structure, no logic yet:

* `src/__Libraries/StellaOps.Router.Common`
* `src/__Libraries/StellaOps.Router.Config`
* `src/__Libraries/StellaOps.Microservice`
* `src/StellaOps.Gateway.WebService`
* `docs/router/` already has `specs.md` (add placeholders for the other docs).

Goal: everything builds, but most classes are empty or stubs.

---
## 2. Implement the shared core model (Common)

**Common/core agent:**

Implement only the *data* and *interfaces*, no behavior:

* Enums:
  * `TransportType`, `FrameType`, `InstanceHealthStatus`.
* Models:
  * `ClaimRequirement`
  * `EndpointDescriptor`
  * `InstanceDescriptor`
  * `ConnectionState`
  * `RoutingContext`, `RoutingDecision`
  * `PayloadLimits`
* Interfaces:
  * `IGlobalRoutingState`
  * `IRoutingPlugin`
  * `ITransportServer`
  * `ITransportClient`
* `Frame` struct/class:
  * `FrameType`, `CorrelationId`, `Payload` (byte[]).

Leave implementations of `IGlobalRoutingState`, `IRoutingPlugin`, transports, etc., for later steps.

Deliverable: a stable set of contracts that gateway + microservice SDK depend on.

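To make the "data only, no behavior" idea concrete, here is a minimal sketch of a few of these contracts, written in Python purely as a stand-in for the plan's types. The field names on `ClaimRequirement`, the extra `InstanceHealthStatus` members, and the upper-casing of the method in `identity` are assumptions for illustration; `docs/router/specs.md` remains the authority on the real shapes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class FrameType(Enum):
    # Frame kinds used in the first steps; Cancel and stream frames are added later.
    HELLO = "Hello"
    HEARTBEAT = "Heartbeat"
    REQUEST = "Request"
    RESPONSE = "Response"


class InstanceHealthStatus(Enum):
    # Only "Unhealthy" is named in the plan; the other members are assumptions.
    UNKNOWN = "Unknown"
    HEALTHY = "Healthy"
    UNHEALTHY = "Unhealthy"


@dataclass(frozen=True)
class Frame:
    frame_type: FrameType
    correlation_id: str
    payload: bytes = b""  # opaque to the router


@dataclass(frozen=True)
class ClaimRequirement:
    claim_type: str               # hypothetical field name
    value: Optional[str] = None   # hypothetical field name


@dataclass(frozen=True)
class EndpointDescriptor:
    method: str
    path: str
    supports_streaming: bool = False
    requiring_claims: Tuple[ClaimRequirement, ...] = ()

    @property
    def identity(self) -> Tuple[str, str]:
        # Method + Path identity; normalising the method case is an assumption.
        return (self.method.upper(), self.path)
```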
---
## 3. Build a fake “in-memory” transport plugin

**Transport agent:**

Before UDP/TCP/Rabbit, build an **in-process transport**:

* `InMemoryTransportServer` and `InMemoryTransportClient`.
* They share a concurrent dictionary keyed by `ConnectionId`.
* Frames are passed via channels/queues in memory.

Purpose:

* Let you prove HELLO/HEARTBEAT/REQUEST/RESPONSE/CANCEL semantics and routing logic *without* dealing with sockets and Rabbit yet.
* Let you unit and integration test the router and SDK quickly.

This plugin will never ship to production; it's only for dev tests and CI.

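A rough sketch of the shape such a transport could take, using Python's `asyncio.Queue` as the in-memory channel. `InMemoryHub` and the method names `send`/`receive` are hypothetical; the real `ITransportClient`/`ITransportServer` contracts come from the Common library.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    frame_type: str
    correlation_id: str
    payload: bytes = b""


class InMemoryHub:
    """Hypothetical shared registry: the dictionary keyed by connection ID."""

    def __init__(self):
        self.connections = {}

    def open(self, connection_id):
        # One queue per direction: client -> server and server -> client.
        pair = (asyncio.Queue(), asyncio.Queue())
        self.connections[connection_id] = pair
        return pair


class InMemoryTransportClient:
    def __init__(self, hub, connection_id):
        self._to_server, self._from_server = hub.open(connection_id)

    async def send(self, frame):
        await self._to_server.put(frame)

    async def receive(self):
        return await self._from_server.get()


class InMemoryTransportServer:
    def __init__(self, hub):
        self._hub = hub

    async def receive(self, connection_id):
        to_server, _ = self._hub.connections[connection_id]
        return await to_server.get()

    async def send(self, connection_id, frame):
        _, from_server = self._hub.connections[connection_id]
        await from_server.put(frame)


async def demo():
    hub = InMemoryHub()
    client = InMemoryTransportClient(hub, "conn-1")
    server = InMemoryTransportServer(hub)
    await client.send(Frame("Hello", "0"))
    hello = await server.receive("conn-1")
    await server.send("conn-1", Frame("Request", "1", b"ping"))
    request = await client.receive()
    return hello.frame_type, request.payload
```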
---
## 4. Microservice SDK: minimal handshake & dispatch (with InMemory)

**Microservice agent:**

Initial focus: “connect and say HELLO, then handle a simple request.”

1. Implement `StellaMicroserviceOptions`.
2. Implement `AddStellaMicroservice(...)`:
   * Bind options.
   * Register endpoint handlers and SDK internal services.
3. Endpoint discovery:
   * Implement runtime reflection for `[StellaEndpoint]` + handler types.
   * Build in-memory `EndpointDescriptor` list (simple: no YAML yet).
4. Connection:
   * Use `InMemoryTransportClient` to “connect” to a fake router.
   * On connect, send a HELLO frame with:
     * Identity.
     * Endpoint list and metadata (`SupportsStreaming` false for now, simple `RequiringClaims` empty).
5. Request handling:
   * Implement `IRawStellaEndpoint` and adapter to it.
   * Implement `RawRequestContext` / `RawResponse`.
   * Implement a dispatcher that:
     * Receives `Request` frame.
     * Builds `RawRequestContext`.
     * Invokes the correct handler.
     * Sends `Response` frame.

Do **not** handle streaming or cancellation yet; just basic request/response with small bodies.

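The discovery, HELLO, and dispatch steps above can be sketched roughly as follows. This is an illustration only: a decorator stands in for `[StellaEndpoint]` reflection, and both the JSON HELLO body and the `METHOD PATH\n<body>` request envelope are invented encodings, as is answering an unknown endpoint with 404.

```python
import asyncio
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    frame_type: str
    correlation_id: str
    payload: bytes = b""


@dataclass
class RawRequestContext:
    method: str
    path: str
    body: bytes


@dataclass
class RawResponse:
    status: int
    body: bytes = b""


HANDLERS = {}  # (method, path) -> handler, filled at "discovery" time


def stella_endpoint(method, path):
    """Decorator standing in for [StellaEndpoint] discovery via reflection."""
    def register(handler):
        HANDLERS[(method.upper(), path)] = handler
        return handler
    return register


@stella_endpoint("GET", "/ping")
async def ping(ctx):
    return RawResponse(200, b"pong")


def hello_frame(identity):
    # HELLO carries identity plus endpoint metadata (hypothetical JSON encoding).
    endpoints = [
        {"method": m, "path": p, "supportsStreaming": False, "requiringClaims": []}
        for (m, p) in sorted(HANDLERS)
    ]
    body = json.dumps({"identity": identity, "endpoints": endpoints})
    return Frame("Hello", "0", body.encode())


async def dispatch(frame):
    # Hypothetical envelope: first line is "METHOD PATH", the rest is the body.
    head, _, body = frame.payload.partition(b"\n")
    method, path = head.decode().split(" ", 1)
    handler = HANDLERS.get((method.upper(), path))
    if handler is None:
        response = RawResponse(404)  # assumption: unknown endpoint maps to 404
    else:
        response = await handler(RawRequestContext(method, path, body))
    payload = str(response.status).encode() + b"\n" + response.body
    # The response reuses the request's correlation ID so the gateway can match it.
    return Frame("Response", frame.correlation_id, payload)
```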
---
## 5. Gateway: minimal routing using InMemory plugin

**Gateway agent:**

Goal: HTTP → in-memory transport → microservice → HTTP response.

1. Implement `GatewayNodeConfig` and bind it from config.
2. Implement `IGlobalRoutingState` as a simple in-memory implementation that:
   * Holds `ConnectionState` objects.
   * Builds a map `(Method, Path)` → endpoint + connections.
3. Implement a minimal `IRoutingPlugin` that:
   * For now, just picks *any* connection that has the endpoint (no region/ping logic yet).
4. Implement minimal HTTP pipeline:
   * `EndpointResolutionMiddleware`:
     * `(Method, Path)` → `EndpointDescriptor` from `IGlobalRoutingState`.
   * Naive authorization middleware stub (only checks “needs authenticated user”; ignore real requiringClaims for now).
   * `RoutingDecisionMiddleware`:
     * Ask `IRoutingPlugin` for a `RoutingDecision`.
   * `TransportDispatchMiddleware`:
     * Build a `Request` frame.
     * Use `InMemoryTransportClient` to send and await `Response`.
     * Map response to HTTP.
5. Implement HELLO handler on gateway side:
   * When InMemory “connection” from microservice appears and sends HELLO:
     * Construct `ConnectionState`.
     * Update `IGlobalRoutingState` with endpoint → connection mapping.

Once this works, you have end-to-end:

* Example microservice.
* Example gateway.
* In-memory transport.
* A couple of test endpoints returning simple JSON.

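As an illustration of the routing state and the "pick any" plugin described in this step, the sketch below keeps a `(Method, Path)` → connections map that is updated when a HELLO arrives. The names `InMemoryRoutingState`, `on_hello`, `connections_for`, and `pick_any` are hypothetical, not taken from the spec.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ConnectionState:
    connection_id: str
    endpoints: list = field(default_factory=list)  # list of (method, path)


class InMemoryRoutingState:
    """Hypothetical in-memory take on the global routing state."""

    def __init__(self):
        self._connections = {}
        self._by_endpoint = defaultdict(list)

    def on_hello(self, connection_id, endpoints):
        # HELLO handler: build the ConnectionState and index its endpoints.
        keys = [(method.upper(), path) for method, path in endpoints]
        state = ConnectionState(connection_id, keys)
        self._connections[connection_id] = state
        for key in keys:
            self._by_endpoint[key].append(state)
        return state

    def connections_for(self, method, path):
        # Endpoint resolution: (Method, Path) -> connections that serve it.
        return list(self._by_endpoint.get((method.upper(), path), []))


def pick_any(candidates):
    """Minimal routing plugin: first connection with the endpoint, else None."""
    return candidates[0] if candidates else None
```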
---
## 6. Add heartbeat, health, and basic routing rules

**Common/core + gateway agent:**

Now enforce liveness and basic routing:

1. Heartbeat:
   * Microservice SDK sends HEARTBEAT frames on a timer.
   * Gateway updates `LastHeartbeatUtc` and `Status`.
2. Health:
   * Add background job in gateway that:
     * Marks instances Unhealthy if heartbeat stale.
3. Routing:
   * Enhance `IRoutingPlugin` to:
     * Filter out Unhealthy instances.
     * Prefer gateway region (using `GatewayNodeConfig.Region`).
     * Use simple `AveragePingMs` stub from request/response timings.

Still using InMemory transport; just building the selection logic.

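The selection logic above might look roughly like this. The staleness window passed to `mark_stale`, the fallback to other regions when no local instance is healthy, and the use of lowest average ping as the tie-breaker are assumptions made for the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Instance:
    connection_id: str
    region: str
    last_heartbeat_utc: datetime
    average_ping_ms: float
    status: str = "Healthy"


def mark_stale(instances, now, max_age):
    # Health job: flag instances whose last heartbeat is older than max_age.
    for inst in instances:
        if now - inst.last_heartbeat_utc > max_age:
            inst.status = "Unhealthy"


def choose(instances, gateway_region):
    # 1. Drop Unhealthy instances.
    healthy = [i for i in instances if i.status != "Unhealthy"]
    if not healthy:
        return None
    # 2. Prefer the gateway's own region; fall back to any region (assumption).
    local = [i for i in healthy if i.region == gateway_region]
    pool = local or healthy
    # 3. Among the remaining candidates, take the lowest average ping.
    return min(pool, key=lambda i: i.average_ping_ms)


def demo():
    now = datetime(2025, 1, 1, tzinfo=timezone.utc)
    instances = [
        Instance("a", "eu1", now - timedelta(seconds=5), 12.0),
        Instance("b", "eu1", now - timedelta(seconds=90), 3.0),
        Instance("c", "us1", now - timedelta(seconds=1), 1.0),
    ]
    mark_stale(instances, now, timedelta(seconds=30))
    return instances, choose(instances, "eu1"), choose(instances, "ap1")
```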
---
## 7. Add cancellation semantics (with InMemory)

**Microservice + gateway agents:**

Wire up cancellation logic before touching real transports:

1. Common:
   * Extend `FrameType` with `Cancel`.
2. Gateway:
   * In `TransportDispatchMiddleware`:
     * Tie `HttpContext.RequestAborted` to a `SendCancelAsync` call.
     * On timeout, send CANCEL.
     * Ignore late `Response`/stream data for canceled correlation IDs.
3. Microservice:
   * Maintain `_inflight` map of correlation → `CancellationTokenSource`.
   * When `Cancel` frame arrives, call `cts.Cancel()`.
   * Ensure handlers receive and honor `CancellationToken`.

Prove via tests: if client disconnects, handler stops quickly.

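A sketch of the microservice side of this, with one substitution stated plainly: Python has no `CancellationTokenSource`, so `asyncio` task cancellation stands in for it here. The class and method names are hypothetical.

```python
import asyncio


class InflightRequests:
    """Maps correlation IDs to running handler tasks (the `_inflight` map)."""

    def __init__(self):
        self._inflight = {}

    def start(self, correlation_id, handler_coro):
        task = asyncio.ensure_future(handler_coro)
        self._inflight[correlation_id] = task
        # Forget the request once the handler finishes, however it finishes.
        task.add_done_callback(lambda _: self._inflight.pop(correlation_id, None))
        return task

    def on_cancel_frame(self, correlation_id):
        # Returns True if a matching in-flight request was cancelled.
        task = self._inflight.pop(correlation_id, None)
        if task is None:
            return False  # unknown or already completed: nothing to do
        task.cancel()
        return True


async def slow_handler(log):
    try:
        await asyncio.sleep(60)  # stands in for long-running work
        log.append("finished")
    except asyncio.CancelledError:
        log.append("cancelled")  # handler observes cancellation and stops
        raise


async def demo():
    log = []
    inflight = InflightRequests()
    task = inflight.start("req-1", slow_handler(log))
    await asyncio.sleep(0)  # let the handler start running
    known = inflight.on_cancel_frame("req-1")
    try:
        await task
    except asyncio.CancelledError:
        pass
    repeated = inflight.on_cancel_frame("req-1")
    return log, known, repeated
```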
---
## 8. Add streaming & payload limits (still InMemory)

**Gateway + microservice agents:**

1. Streaming:
   * Extend InMemory transport to support `RequestStreamData` / `ResponseStreamData` frames.
   * On the gateway:
     * For `SupportsStreaming` endpoints, pipe HTTP body stream → frame stream.
     * For response, pipe frames → HTTP response stream.
   * On microservice:
     * Expose `RawRequestContext.Body` as a stream reading frames as they arrive.
     * Allow `RawResponse.WriteBodyAsync` to stream out.
2. Payload limits:
   * Implement `PayloadLimits` enforcement at gateway:
     * Early reject large `Content-Length`.
     * Track counters in streaming; trigger cancellation when exceeding thresholds.

Demonstrate with a fake “upload” endpoint that uses `IRawStellaEndpoint` and streaming.

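The two payload-limit checks can be illustrated as below: an early reject on the declared `Content-Length`, and a running byte counter for streamed bodies. `PayloadTooLarge` and the function names are hypothetical, and in the real gateway the streaming branch would also send CANCEL rather than only raise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PayloadLimits:
    max_request_bytes_per_call: int


class PayloadTooLarge(Exception):
    pass


def check_content_length(content_length, limits):
    # Early reject: a declared length over the limit never reaches the transport.
    # A missing Content-Length (None) is left to the streaming counter.
    if content_length is not None and content_length > limits.max_request_bytes_per_call:
        raise PayloadTooLarge("declared length %d exceeds limit" % content_length)


def limited_chunks(chunks, limits):
    # Streaming: count bytes as they pass and stop as soon as the limit is crossed.
    seen = 0
    for chunk in chunks:
        seen += len(chunk)
        if seen > limits.max_request_bytes_per_call:
            raise PayloadTooLarge("stream exceeded limit after %d bytes" % seen)
        yield chunk


def forward(chunks, limits):
    """Returns (chunks forwarded, whether the limit was hit)."""
    forwarded = []
    try:
        for chunk in limited_chunks(chunks, limits):
            forwarded.append(chunk)
    except PayloadTooLarge:
        return forwarded, True
    return forwarded, False
```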
---
## 9. Implement real transport plugins one by one

**Transport agent:**

Now replace InMemory with real transports:

Order:

1. **TCP plugin** (easiest baseline):
   * Length-prefixed frame protocol.
   * Connection per microservice instance (or multi-instance if needed later).
   * Implement HELLO/HEARTBEAT/REQUEST/RESPONSE/STREAM/CANCEL as per frame model.
2. **Certificate (TLS) plugin**:
   * Wrap TCP plugin with TLS.
   * Add configuration for server & client certs.
3. **UDP plugin**:
   * Single datagram = single frame; no streaming.
   * Enforce `MaxRequestBytesPerCall`.
   * Use for small, idempotent operations.
4. **RabbitMQ plugin**:
   * Add exchanges/queues for HELLO/HEARTBEAT and REQUEST/RESPONSE.
   * Use `CorrelationId` properties for matching.
   * Guarantee at-most-once semantics where practical.

While each plugin is built, keep the core router and microservice SDK relying only on `ITransportClient`/`ITransportServer` abstractions.

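For the TCP plugin's length-prefixed framing, a sketch of one possible wire layout follows. The layout itself is invented for illustration (4-byte big-endian length, 1-byte frame type, 16-byte correlation ID, then payload), as are the numeric type codes; the actual frame layout should come from the spec.

```python
import struct
import uuid

LENGTH = struct.Struct(">I")  # 4-byte big-endian length of everything after it
HEADER_SIZE = 1 + 16          # frame type byte + correlation ID bytes
# Hypothetical numeric codes for the frame types named in the plan.
FRAME_TYPES = {"Hello": 1, "Heartbeat": 2, "Request": 3, "Response": 4, "Cancel": 5}
FRAME_NAMES = {code: name for name, code in FRAME_TYPES.items()}


def encode_frame(frame_type, correlation_id, payload=b""):
    body = bytes([FRAME_TYPES[frame_type]]) + correlation_id.bytes + payload
    return LENGTH.pack(len(body)) + body


def decode_frames(buffer):
    """Parse every complete frame in buffer; return (frames, leftover bytes)."""
    frames = []
    offset = 0
    while len(buffer) - offset >= LENGTH.size:
        (size,) = LENGTH.unpack_from(buffer, offset)
        if size < HEADER_SIZE:
            raise ValueError("frame too short: %d bytes" % size)
        start = offset + LENGTH.size
        if len(buffer) - start < size:
            break  # partial frame: wait for more bytes from the socket
        body = buffer[start:start + size]
        frame_type = FRAME_NAMES[body[0]]
        correlation_id = uuid.UUID(bytes=bytes(body[1:HEADER_SIZE]))
        frames.append((frame_type, correlation_id, bytes(body[HEADER_SIZE:])))
        offset = start + size
    return frames, buffer[offset:]


def demo():
    cid = uuid.UUID(int=7)
    wire = encode_frame("Request", cid, b"hello") + encode_frame("Cancel", cid)
    # Simulate a TCP read that splits the second frame across two reads.
    first, rest = decode_frames(wire[:-3])
    second, leftover = decode_frames(rest + wire[-3:])
    return cid, first, second, leftover
```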
---
## 10. Add Router.Config + Microservice YAML integration

**Config agent:**

1. Implement `__Libraries/StellaOps.Router.Config`:
   * YAML → `RouterConfig` binding.
   * Services, endpoints, static instances, payload limits.
   * Hot-reload via `IOptionsMonitor` / file watcher.
2. Implement microservice YAML:
   * Endpoint-level overrides only (timeouts, requiringClaims, SupportsStreaming).
   * Merge logic: code defaults → YAML override.
3. Integrate:
   * Gateway uses RouterConfig for:
     * Defaults when no microservice registered yet.
     * Payload limits.
   * Microservice uses YAML to refine endpoint metadata before sending HELLO.

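The "code defaults → YAML override" merge can be sketched as follows. The input is assumed to be YAML that has already been parsed into a dictionary keyed by `(method, path)`; the field names, the 30-second default timeout, and rejecting overrides outside the three allowed fields are all assumptions for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EndpointDescriptor:
    method: str
    path: str
    timeout_seconds: float = 30.0   # hypothetical code default
    supports_streaming: bool = False
    requiring_claims: tuple = ()


# Only endpoint-level metadata may be overridden; identity may not.
OVERRIDABLE = {"timeout_seconds", "supports_streaming", "requiring_claims"}


def apply_overrides(defaults, overrides):
    """Merge parsed YAML overrides over the descriptors discovered in code.

    overrides: {(method, path): {field_name: value}}
    """
    merged = []
    for endpoint in defaults:
        patch = overrides.get((endpoint.method, endpoint.path), {})
        unknown = set(patch) - OVERRIDABLE
        if unknown:
            raise ValueError("cannot override: %s" % ", ".join(sorted(unknown)))
        # replace() copies the frozen descriptor with the patched fields.
        merged.append(replace(endpoint, **patch))
    return merged


def demo():
    defaults = [
        EndpointDescriptor("GET", "/invoices"),
        EndpointDescriptor("POST", "/upload"),
    ]
    overrides = {
        ("POST", "/upload"): {"timeout_seconds": 120.0, "supports_streaming": True},
    }
    return defaults, apply_overrides(defaults, overrides)
```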
---
## 11. Build a reference example + migration skeleton

**DX / migration agent:**

1. Build a `StellaOps.Billing.Microservice` example:
   * A couple of simple endpoints (GET/POST).
   * One streaming upload endpoint.
   * YAML for requiringClaims and timeouts.
2. Build a `StellaOps.Gateway.WebService` example config around it.
3. Document the full path:
   * How to run both locally.
   * How to add a new endpoint.
   * How cancellation behaves (killing the client, watching logs).
   * How payload limits work (try to upload too-large file).
4. Outline migration steps from an imaginary `StellaOps.Billing.WebService` using the patterns in `Migration of Webservices to Microservices.md`.

---
## 12. Process guidance for your agents

* **Do not jump to UDP/TCP immediately.**
  Prove the protocol (HELLO/HEARTBEAT/REQUEST/RESPONSE/STREAM/CANCEL), routing, and limits on the InMemory plugin first.
* **Guard the invariants.**
  If someone proposes “just call HTTP between services” or “let's derive region from host,” they're violating spec and must update `docs/router/specs.md` before coding.
* **Keep Common stable.**
  Changes to `StellaOps.Router.Common` must be rare and reviewed; everything else depends on it.
* **Document as you go.**
  Every time a behavior settles (e.g. status mapping, frame layout), update the docs under `docs/router/` so new agents always have a single source of truth.

If you want, as a next step I can convert this into a task board (epic → stories) per repo folder, so you can assign specific chunks to named agents.