router planning
docs/router/12-Step.md

Below is how I’d tell your dev agents to operate on this codebase so it doesn’t turn into chaos over time.

Think of this as the “rules of engagement” for Stella Ops Router.

---

## 1. Non‑negotiable operating principles

All agents follow these rules:

1. **Specs are law**

   * `docs/router/specs.md` is the primary source of truth.
   * If code and spec differ:

     * Fix the spec **first** (in a PR), then adjust the code.
     * No “quick fixes” that contradict the spec.

2. **Common & protocol are sacred**

   * `StellaOps.Router.Common` and the wire protocol (Frame/FrameType/serialization) are stable layers.
   * Any change to:

     * `Frame`, `FrameType`
     * `EndpointDescriptor`, `ConnectionState`
     * `ITransportClient` / `ITransportServer`
   * …requires:

     * Explicit spec update.
     * Compatibility consideration.
     * Code review by someone thinking about all transports and both sides (gateway + microservice).

3. **InMemory first, then real transports**

   * New protocol semantics (e.g., new frame type, new behavior, new timeout rules) MUST:

     1. Be implemented and proven with InMemory.
     2. Have tests passing with InMemory.
     3. Only then be rolled into TCP/TLS/UDP/RabbitMQ.

4. **No backdoor HTTP between microservices and router**

   * Microservices must never talk HTTP to the router for control plane or data.
   * All microservice–router traffic goes through the registered transports (UDP/TCP/TLS/RabbitMQ) using `Frame`.

5. **Method + Path = contract**

   * Endpoint identity is always: `HTTP Method + Path`, nothing else.
   * No “dynamic” routing hacks that bypass the `(Method, Path)` resolution.
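
To make the “Method + Path = contract” rule concrete, here is a minimal sketch of a lookup keyed only on `(Method, Path)`. `EndpointKey` and `EndpointTable` are hypothetical names for illustration, not types from the codebase, and the choice to compare methods case‑insensitively and paths exactly is an assumption the spec would need to confirm.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical key type: an endpoint is identified by (Method, Path) and nothing else.
public readonly record struct EndpointKey(string Method, string Path)
{
    // Assumption: methods compare case-insensitively, paths compare exactly (ordinal).
    public static EndpointKey Create(string method, string path) =>
        new(method.ToUpperInvariant(), path);
}

// Hypothetical lookup table from endpoint identity to the owning service name.
public sealed class EndpointTable
{
    private readonly Dictionary<EndpointKey, string> _owners = new();

    // Records which service owns the given (Method, Path).
    public void Register(string method, string path, string serviceName) =>
        _owners[EndpointKey.Create(method, path)] = serviceName;

    // Resolution goes through the (Method, Path) key only; there is no fallback route.
    public string? Resolve(string method, string path) =>
        _owners.TryGetValue(EndpointKey.Create(method, path), out var owner) ? owner : null;
}
```

With this shape, a request that matches no registered `(Method, Path)` simply resolves to `null`; anything that would find a target some other way is the kind of “dynamic” hack the rule forbids.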

---

## 2. How agents should structure work (vertical slices, not scattered edits)

Whenever you assign work, agents should:

1. **Work in vertical slices**

   * Example slice: “Cancellation with InMemory”, “Streaming + payload limits with TCP”, “RabbitMQ buffered requests”.
   * Each slice includes:

     * Spec amendments (if needed).
     * Common contracts (if needed).
     * Implementation (gateway + microservice + transport).
     * Tests.

2. **Avoid cross‑cutting, half‑finished changes**

   * Do not:

     * Change Common, start on TCP, then get bored and leave InMemory broken.
   * Do:

     * Finish one vertical slice end‑to‑end, then move on.

3. **Keep changes small and reviewable**

   * Prefer:

     * One PR for “add YAML overrides merging”.
     * Another PR for “add router YAML hot‑reload details”.
   * Avoid huge omnibus PRs that change protocol, transports, router, and microservice in one go.

---

## 3. Change categories & review rules

Agents should classify their work by category and obey the review level.

1. **Category A – Protocol / Common changes**

   * Affects:

     * `Frame`, `FrameType`, payload DTOs.
     * `EndpointDescriptor`, `ConnectionState`, `RoutingDecision`.
     * `ITransportClient`, `ITransportServer`.
   * Requirements:

     * Spec change with rationale.
     * Cross‑side impact analysis: gateway + microservice + all transports.
     * Tests updated for InMemory and at least one real transport.
   * Review: 2+ reviewers, one acting as “protocol owner”.

2. **Category B – Router logic / routing plugin**

   * Affects:

     * `IGlobalRoutingState` implementation.
     * `IRoutingPlugin` logic (region, ping, heartbeat).
   * Requirements:

     * Unit tests for routing plugin (selection rules).
     * At least one integration test through gateway + InMemory.
   * Review: at least one reviewer who understands region/version semantics.

3. **Category C – Transport implementation**

   * Affects:

     * TCP/TLS/UDP/RabbitMQ clients & servers.
   * Requirements:

     * Transport‑specific tests (connection, basic request/response, timeout).
     * No protocol changes.
   * Review: 1–2 reviewers, including one who owns that transport.

4. **Category D – SDK / Microservice developer experience**

   * Affects:

     * `StellaOps.Microservice` public surface, endpoint discovery, YAML merging.
   * Requirements:

     * API review for public surface.
     * Docs update (`Microservice.md`) if behavior changes.
   * Review: 1–2 reviewers.

5. **Category E – Docs only**

   * Affects:

     * `docs/router/*`, no code.
   * Requirements:

     * Ensure docs match current behavior; if not, spawn follow‑up issues.

---

## 4. Workflow per change (what each agent does)

For any non‑trivial change:

1. **Check the spec**

   * Confirm that:

     * The desired behavior is already described, or
     * You will extend the spec first.

2. **Update / extend spec if needed**

   * Edit `docs/router/specs.md` or appropriate doc.
   * Document:

     * What’s changing.
     * Why we need it.
     * Which components are affected.

3. **Adjust Common / contracts if needed**

   * Only after spec is updated.
   * Keep changes minimal and backwards compatible where possible.

4. **Implement in InMemory path**

   * Update:

     * InMemory `ITransportClient`/hub.
     * Microservice and gateway logic that rely on it.
   * Add tests to prove behavior.

5. **Port to real transports**

   * Implement the same behavior in:

     * TCP (baseline).
     * TLS (wrapping TCP).
     * Others when needed.
   * Reuse the InMemory test pattern for the transport tests.

6. **Add / update tests**

   * Unit tests for logic.
   * Integration tests for gateway + microservice via at least one real transport.

7. **Update documentation**

   * Update relevant docs:

     * `Stella Ops Router - Webserver.md`
     * `Stella Ops Router - Microservice.md`
     * `Common.md`, if common contracts changed.
   * Highlight any new configuration knobs or invariants.

---

## 5. Testing expectations for all agents

Agents should treat tests as part of the change, not an afterthought.

1. **Unit tests**

   * For:

     * Routing plugin decisions.
     * YAML merge behavior.
     * Payload budget logic.
   * Goal:

     * All tricky branches are covered.

2. **Integration tests**

   * For gateway + microservice using:

     * InMemory.
     * At least one real transport (TCP in dev).

   * Scenarios to maintain:

     * Simple request/response.
     * Streaming upload.
     * Cancellation on client abort.
     * Timeout leading to CANCEL.
     * Payload limit exceeded.

3. **Smoke tests for examples**

   * Ensure the `StellaOps.Billing.Microservice` example always passes a small test:

     * `/billing/health` works.
     * `/billing/invoices/upload` streaming behaves.

4. **CI gating**

   * No PR merges unless:

     * `dotnet build` for the solution succeeds.
     * All tests pass.
   * If agents add new projects/tests, CI must be updated in the same PR.
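
As an illustration of what “all tricky branches are covered” can look like for payload budget logic, here is a small sketch. `PayloadBudget` and its members are hypothetical (the real budget type may look quite different), and the checks are written without a test framework so the block stays self‑contained; a real suite would use whatever framework the solution already uses.

```csharp
#nullable enable
using System;

// Hypothetical helper: tracks bytes consumed by one streamed request against a fixed limit.
public sealed class PayloadBudget
{
    private readonly long _limitBytes;
    private long _usedBytes;

    public PayloadBudget(long limitBytes)
    {
        if (limitBytes < 0) throw new ArgumentOutOfRangeException(nameof(limitBytes));
        _limitBytes = limitBytes;
    }

    public long RemainingBytes => _limitBytes - _usedBytes;

    // Returns false, and consumes nothing, when the chunk would exceed the limit.
    public bool TryConsume(long chunkBytes)
    {
        if (chunkBytes < 0) throw new ArgumentOutOfRangeException(nameof(chunkBytes));
        if (chunkBytes > RemainingBytes) return false;
        _usedBytes += chunkBytes;
        return true;
    }
}

// Framework-free checks for the boundary cases.
public static class PayloadBudgetChecks
{
    public static void Run()
    {
        var budget = new PayloadBudget(limitBytes: 10);
        Expect(budget.TryConsume(4), "first chunk fits");
        Expect(budget.TryConsume(6), "chunk that lands exactly on the limit fits");
        Expect(!budget.TryConsume(1), "one byte over the limit is rejected");
        Expect(budget.RemainingBytes == 0, "a rejected chunk consumes nothing");
        Expect(new PayloadBudget(0).TryConsume(0), "an empty chunk fits a zero budget");
    }

    private static void Expect(bool condition, string what)
    {
        if (!condition) throw new InvalidOperationException("Check failed: " + what);
    }
}
```

The cases worth pinning down are the boundaries: exactly at the limit, one byte over, and the zero‑sized chunk. Those are the branches that tend to regress when limits become configurable.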

---

## 6. How agents should use configuration & YAML

1. **Router side**

   * Always read payload limits, node region, transports from `RouterConfig` (bound from YAML + env).
   * Do not hardcode:

     * Limits.
     * Regions.
     * Ports.
   * If behavior depends on config, fetch from `IOptionsMonitor<RouterConfig>` at runtime, not from cached fields, unless you explicitly freeze the value.

2. **Microservice side**

   * Identity & router pool:

     * From `StellaMicroserviceOptions` (code/env).
   * Endpoint metadata overrides:

     * From YAML (`ConfigFilePath`) merged into reflection result.
   * Agents must not let YAML create endpoints that don’t exist in code; overrides only.

3. **No hidden defaults**

   * If a default is important (e.g. `HeartbeatInterval`), document it and centralize it.
   * Don’t sprinkle magic numbers across code.
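
The “overrides only” rule for YAML can be sketched as a merge that refuses to add anything not already discovered in code. Everything here is hypothetical: `EndpointInfo`, `EndpointOverride`, and the idea that a timeout is the overridable field are assumptions made for illustration, and parsing the YAML file itself is left out.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical shapes; the real descriptor and override types live in the codebase.
public sealed record EndpointInfo(string Method, string Path, TimeSpan Timeout);
public sealed record EndpointOverride(string Method, string Path, TimeSpan? Timeout);

public static class EndpointOverrideMerger
{
    // Applies overrides to endpoints discovered in code. Overrides that match no
    // discovered endpoint are reported in 'unmatched' instead of creating endpoints.
    public static IReadOnlyList<EndpointInfo> Merge(
        IEnumerable<EndpointInfo> discovered,
        IEnumerable<EndpointOverride> overrides,
        out IReadOnlyList<EndpointOverride> unmatched)
    {
        var byKey = new Dictionary<(string Method, string Path), EndpointInfo>();
        foreach (var e in discovered)
            byKey[(e.Method.ToUpperInvariant(), e.Path)] = e;

        var rejected = new List<EndpointOverride>();
        foreach (var o in overrides)
        {
            var key = (o.Method.ToUpperInvariant(), o.Path);
            if (!byKey.TryGetValue(key, out var existing))
            {
                rejected.Add(o); // overrides only: never add an endpoint from YAML
                continue;
            }
            // Keep the code-defined value when the override leaves the field unset.
            byKey[key] = existing with { Timeout = o.Timeout ?? existing.Timeout };
        }

        unmatched = rejected;
        return new List<EndpointInfo>(byKey.Values);
    }
}
```

Returning the unmatched overrides, rather than silently dropping them, gives the caller a place to log or fail fast on a stale YAML entry; which of those is right is a decision for the spec.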

---

## 7. Adding new capabilities: pattern all agents follow

When someone wants a new capability (e.g. “retry on transient transport failures”):

1. **Open a design issue / doc snippet**

   * Describe:

     * Problem.
     * Proposed design.
     * Where it sits in the architecture (router, microservice, transport, config).

2. **Update spec**

   * Write the behavior in the appropriate doc section.
   * Include:

     * API shape (if public).
     * Transport impacts.
     * Failure modes.

3. **Follow the vertical slice path**

   * Implement in Common (if needed).
   * Implement InMemory.
   * Implement in primary transport (TCP).
   * Add tests.
   * Update docs.

Agents should not just spike code into the TCP implementation without spec or tests.

---

## 8. Logging, tracing, and debugging expectations

Agents should instrument consistently; this matters for operations and for debugging during development.

1. **Use structured logging**

   * At minimum, include:

     * `ServiceName`
     * `InstanceId`
     * `CorrelationId`
     * `Method`
     * `Path`
     * `ConnectionId`
   * Never log full payload bodies by default for privacy and performance; log sizes and key metadata instead.

2. **Trace correlation**

   * Ensure correlation IDs:

     * Propagate from HTTP (gateway) into `Frame.CorrelationId`.
     * Are used in logs on both sides (gateway + microservice).

3. **Agent debugging guidance**

   * When debugging a routing or transport problem:

     * Turn on debug logging for gateway + microservice for that service.
     * Use the correlation ID to follow the request end‑to‑end.
     * Verify:

       * HELLO registration.
       * HEARTBEAT events.
       * REQUEST leaving gateway.
       * RESPONSE arriving.
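
A minimal sketch of the logging rules above: resolve one correlation ID per request and build the minimum field set once, so the gateway and the microservice emit the same keys. `RequestLogFields` is a hypothetical helper, `PayloadBytes` is an assumed field name for “log sizes instead of bodies”, and how the incoming ID reaches the gateway (for example, which HTTP header carries it) is deliberately left undefined.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class RequestLogFields
{
    // Reuses the caller's correlation ID when one arrived with the HTTP request;
    // otherwise generates a new one so the request can still be followed end to end.
    public static string ResolveCorrelationId(string? incoming) =>
        string.IsNullOrWhiteSpace(incoming) ? Guid.NewGuid().ToString("N") : incoming.Trim();

    // Builds the minimum field set. The payload is represented by its size only,
    // never by its contents.
    public static IReadOnlyDictionary<string, object> Build(
        string serviceName, string instanceId, string correlationId,
        string method, string path, string connectionId, long payloadBytes) =>
        new Dictionary<string, object>
        {
            ["ServiceName"] = serviceName,
            ["InstanceId"] = instanceId,
            ["CorrelationId"] = correlationId,
            ["Method"] = method,
            ["Path"] = path,
            ["ConnectionId"] = connectionId,
            ["PayloadBytes"] = payloadBytes,
        };
}
```

If the solution logs through `Microsoft.Extensions.Logging`, a dictionary like this could be passed to `ILogger.BeginScope` so every log line within the request carries the same fields; that wiring is an assumption about the logging stack, not something this document specifies.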

---

## 9. Daily agent workflow (practical directions)

For each day / task, an agent should:

1. **Start from an issue or spec line item**

   * Never “just code something” without a tracked issue in the backlog.

2. **Locate the relevant doc**

   * Spec section.
   * Example docs (e.g. Billing sample).
   * Migration doc if working on conversion.

3. **Work in a feature branch**

   * Branch name reflects scope: `feature/streaming-tcp`, `fix/router-cancellation`, etc.

4. **Keep notes**

   * If you make an assumption (e.g. “we currently don’t support streaming over RabbitMQ”), note it in the issue.
   * If you discover an inconsistency in the docs, open a doc‑fix issue.

5. **Finish the full slice**

   * Code + tests + docs.
   * Keep partial implementations behind feature flags (if needed) and clearly marked.

6. **Open PR with clear description**

   * What changed.
   * Which spec section it implements or modifies.
   * Any risks or roll‑back notes.

---

## 10. Guardrails against drift

Finally, a few things agents must actively avoid:

* **No silent protocol changes**

  * Don’t change `FrameType` semantics, payload formats, or header layout without:

    * Spec update.
    * Full impact review.

* **No specless behavior**

  * If something matters at runtime (timeouts, retries, routing rules), it has to be in the docs, not just in someone’s head.

* **No bypassing of router**

  * Do not introduce “temporary” direct calls from clients to microservices. All client HTTP should go via gateway.

* **No direct dependencies on specific transports in domain code**

  * Domain and microservice endpoint logic must not know if the transport is TCP, TLS, UDP, or RabbitMQ. They only see `RawRequestContext`, `RawResponse`, and cancellation tokens.
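
The last guardrail is easiest to see in the shape of a handler. The sketch below uses hypothetical stand‑ins for `RawRequestContext` and `RawResponse` (the real types are defined in the codebase and will differ), and `HealthEndpoint` is an invented example. Nothing in the handler's signature or body reveals which transport delivered the request.

```csharp
#nullable enable
using System;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for the real types. Neither carries any transport detail.
public sealed record RawRequestContext(string Method, string Path, ReadOnlyMemory<byte> Body);
public sealed record RawResponse(int StatusCode, ReadOnlyMemory<byte> Body);

public static class HealthEndpoint
{
    // The handler sees the request, produces a response, and honors cancellation.
    // Whether the frames travelled over TCP, TLS, UDP, or RabbitMQ is invisible here.
    public static async Task<RawResponse> HandleAsync(
        RawRequestContext request, CancellationToken cancellationToken)
    {
        cancellationToken.ThrowIfCancellationRequested();
        await Task.Yield(); // placeholder for real asynchronous work
        return new RawResponse(200, Encoding.UTF8.GetBytes("ok"));
    }
}
```

A quick review heuristic that follows from this: if endpoint code needs a `using` for a socket or message‑queue namespace, the transport has leaked into the domain layer.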

---

If you want, I can turn this into a one‑page “Agent Handbook” markdown file you can drop into `docs/router/AGENTS_PROCESS.md` and link from `specs.md` so every AI or human dev working on this stack has the same ground rules.