git.stella-ops.org/docs/router/12-Step.md
2025-12-02 18:38:32 +02:00

Below is how I'd tell your dev agents to operate on this codebase so it doesn't turn into chaos over time.

Think of this as the “rules of engagement” for Stella Ops Router.


1. Non-negotiable operating principles

All agents follow these rules:

  1. Specs are law

    • docs/router/specs.md is the primary source of truth.

    • If code and spec differ:

      • Fix the spec first (in a PR), then adjust the code.
    • No “quick fixes” that contradict the spec.

  2. Common & protocol are sacred

    • StellaOps.Router.Common and the wire protocol (Frame/FrameType/serialization) are stable layers.

    • Any change to:

      • Frame, FrameType
      • EndpointDescriptor, ConnectionState
      • ITransportClient / ITransportServer
    • …requires:

      • Explicit spec update.
      • Compatibility consideration.
      • Code review by someone thinking about all transports and both sides (gateway + microservice).
  3. InMemory first, then real transports

    • New protocol semantics (e.g., new frame type, new behavior, new timeout rules) MUST:

      1. Be implemented and proven with InMemory.
      2. Have tests passing with InMemory.
      3. Only then be rolled into TCP/TLS/UDP/RabbitMQ.
  4. No backdoor HTTP between microservices and router

    • Microservices must never talk HTTP to the router for control plane or data.
    • All microservice-to-router traffic goes through the registered transports (UDP/TCP/TLS/RabbitMQ) using Frame.
  5. Method + Path = contract

    • Endpoint identity is always: HTTP Method + Path, nothing else.
    • No “dynamic” routing hacks that bypass the (Method, Path) resolution.
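
The "Method + Path = contract" rule can be illustrated with a lookup table keyed only by that pair. This is a minimal sketch in Python, used purely for illustration (the project itself builds with dotnet). The `EndpointTable` class, the fields on this simplified `EndpointDescriptor`, the case-insensitive method comparison, and exact path matching are all assumptions, not the behavior defined in specs.md.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class EndpointDescriptor:
    # Hypothetical minimal shape; the real EndpointDescriptor is defined by the spec.
    service_name: str
    method: str
    path: str


class EndpointTable:
    """Resolves endpoints strictly by (method, path) and by nothing else."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, str], EndpointDescriptor] = {}

    @staticmethod
    def _key(method: str, path: str) -> Tuple[str, str]:
        # Assumption: methods compare case-insensitively, paths compare exactly.
        return (method.upper(), path)

    def register(self, endpoint: EndpointDescriptor) -> None:
        # A later registration for the same key replaces the earlier one in this sketch.
        self._by_key[self._key(endpoint.method, endpoint.path)] = endpoint

    def resolve(self, method: str, path: str) -> Optional[EndpointDescriptor]:
        # No headers, query strings, or other "dynamic" inputs take part in resolution.
        return self._by_key.get(self._key(method, path))


table = EndpointTable()
table.register(EndpointDescriptor("billing", "GET", "/billing/health"))
table.register(EndpointDescriptor("billing", "POST", "/billing/invoices/upload"))
```

Because `resolve` accepts only the method and the path, there is no place for a routing hack to slip in another input without changing the signature, which makes such a change visible in review.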

2. How agents should structure work (vertical slices, not scattered edits)

Whenever you assign work, agents should:

  1. Work in vertical slices

    • Example slice: “Cancellation with InMemory”, “Streaming + payload limits with TCP”, “RabbitMQ buffered requests”.

    • Each slice includes:

      • Spec amendments (if needed).
      • Common contracts (if needed).
      • Implementation (gateway + microservice + transport).
      • Tests.
  2. Avoid cross-cutting, half-finished changes

    • Do not:

      • Change Common, start on TCP, then get bored and leave InMemory broken.
    • Do:

      • Finish one vertical slice end-to-end, then move on.
  3. Keep changes small and reviewable

    • Prefer:

      • One PR for “add YAML overrides merging”.
      • Another PR for “add router YAML hot-reload details”.
    • Avoid huge omnibus PRs that change protocol, transports, router, and microservice in one go.


3. Change categories & review rules

Agents should classify their work by category and obey the review level.

  1. Category A: Protocol / Common changes

    • Affects:

      • Frame, FrameType, payload DTOs.
      • EndpointDescriptor, ConnectionState, RoutingDecision.
      • ITransportClient, ITransportServer.
    • Requirements:

      • Spec change with rationale.
      • Cross-side impact analysis: gateway + microservice + all transports.
      • Tests updated for InMemory and at least one real transport.
    • Review: 2+ reviewers, one acting as “protocol owner”.

  2. Category B: Router logic / routing plugin

    • Affects:

      • IGlobalRoutingState implementation.
      • IRoutingPlugin logic (region, ping, heartbeat).
    • Requirements:

      • Unit tests for routing plugin (selection rules).
      • At least one integration test through gateway + InMemory.
    • Review: at least one reviewer who understands region/version semantics.

  3. Category C: Transport implementation

    • Affects:

      • TCP/TLS/UDP/RabbitMQ clients & servers.
    • Requirements:

      • Transport-specific tests (connection, basic request/response, timeout).
      • No protocol changes.
    • Review: 1-2 reviewers, including one who owns that transport.

  4. Category D: SDK / Microservice developer experience

    • Affects:

      • StellaOps.Microservice public surface, endpoint discovery, YAML merging.
    • Requirements:

      • API review for public surface.
      • Docs update (Microservice.md) if behavior changes.
    • Review: 1-2 reviewers.

  5. Category E: Docs only

    • Affects:

      • docs/router/*, no code.
    • Requirements:

      • Ensure docs match current behavior; if not, spawn follow-up issues.

4. Workflow per change (what each agent does)

For any non-trivial change:

  1. Check the spec

    • Confirm that:

      • The desired behavior is already described, or
      • You will extend the spec first.
  2. Update / extend spec if needed

    • Edit docs/router/specs.md or appropriate doc.

    • Document:

      • What's changing.
      • Why we need it.
      • Which components are affected.
  3. Adjust Common / contracts if needed

    • Only after spec is updated.
    • Keep changes minimal and backwards compatible where possible.
  4. Implement in InMemory path

    • Update:

      • InMemory ITransportClient/hub.
      • Microservice and gateway logic that rely on it.
    • Add tests to prove behavior.

  5. Port to real transports

    • Implement the same behavior in:

      • TCP (baseline).
      • TLS (wrapping TCP).
      • Others when needed.
    • Reuse the InMemory test pattern for the transport tests, so every transport is held to the same behavior.

  6. Add / update tests

    • Unit tests for logic.
    • Integration tests for gateway + microservice via at least one real transport.
  7. Update documentation

    • Update relevant docs:

      • Stella Ops Router - Webserver.md
      • Stella Ops Router - Microservice.md
      • Common.md, if common contracts changed.
    • Highlight any new configuration knobs or invariants.
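
One way to reuse the InMemory test pattern for real transports is to write each scenario once as a contract function and run it against every transport. The sketch below is in Python purely for illustration. The `send_request`/`close` interface, both transport classes, and the 4-byte length prefix are hypothetical stand-ins, not the project's ITransportClient/ITransportServer contracts or its wire format.

```python
import socket
import struct
import threading


def _recv_exact(sock, count):
    # Read exactly `count` bytes or fail if the peer closes early.
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data += chunk
    return data


def _send_message(sock, payload):
    # Assumption: 4-byte big-endian length prefix, used only by this sketch.
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def _recv_message(sock):
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    return _recv_exact(sock, length)


class InMemoryTransport:
    """Calls the handler directly, with no I/O."""

    def __init__(self, handler):
        self._handler = handler

    def send_request(self, payload):
        return self._handler(payload)

    def close(self):
        pass


class LoopbackTcpTransport:
    """Sends each request over a real TCP connection on localhost."""

    def __init__(self, handler):
        self._handler = handler
        self._server = socket.create_server(("127.0.0.1", 0))
        self._port = self._server.getsockname()[1]
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        while True:
            try:
                connection, _ = self._server.accept()
            except OSError:
                return  # listening socket was closed
            with connection:
                _send_message(connection, self._handler(_recv_message(connection)))

    def send_request(self, payload):
        address = ("127.0.0.1", self._port)
        with socket.create_connection(address, timeout=5) as connection:
            _send_message(connection, payload)
            return _recv_message(connection)

    def close(self):
        self._server.close()


def check_simple_request_response(make_transport):
    # The scenario is written once and knows nothing about the transport behind it.
    transport = make_transport(lambda body: body.upper())
    try:
        assert transport.send_request(b"ping") == b"PING"
        assert transport.send_request(b"") == b""
    finally:
        transport.close()


def run_contract():
    results = {}
    for name, factory in (("inmemory", InMemoryTransport), ("tcp", LoopbackTcpTransport)):
        check_simple_request_response(factory)
        results[name] = "ok"
    return results
```

Adding a transport then means adding one factory to the loop in `run_contract`, while the scenarios themselves stay untouched.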


5. Testing expectations for all agents

Agents should treat tests as part of the change, not an afterthought.

  1. Unit tests

    • For:

      • Routing plugin decisions.
      • YAML merge behavior.
      • Payload budget logic.
    • Goal:

      • All tricky branches are covered.
  2. Integration tests

    • For gateway + microservice using:

      • InMemory.
      • At least one real transport (TCP in dev).
    • Scenarios to maintain:

      • Simple request/response.
      • Streaming upload.
      • Cancellation on client abort.
      • Timeout leading to CANCEL.
      • Payload limit exceeded.
  3. Smoke tests for examples

    • Ensure StellaOps.Billing.Microservice example always passes a small test:

      • /billing/health works.
      • /billing/invoices/upload streaming behaves.
  4. CI gating

    • No PR merges unless:

      • dotnet build for solution succeeds.
      • All tests pass.
    • If agents add new projects/tests, CI must be updated in the same PR.
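
The "Timeout leading to CANCEL" scenario can be pinned down with an in-memory transport that never answers. The following is a minimal sketch in Python, for illustration only. The two-field `Frame`, the string frame types, `RecordingTransport`, and `dispatch` are hypothetical; the real Frame and FrameType are defined by the spec.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Frame:
    # Hypothetical minimal frame: a type tag plus the correlation ID.
    type: str
    correlation_id: str


class RecordingTransport:
    """In-memory transport that records sent frames and never responds."""

    def __init__(self) -> None:
        self.sent: List[Frame] = []

    async def send(self, frame: Frame) -> None:
        self.sent.append(frame)

    async def receive_response(self, correlation_id: str) -> Frame:
        await asyncio.Event().wait()  # never set, so this waits until cancelled
        raise AssertionError("unreachable")


async def dispatch(transport, correlation_id: str, timeout_seconds: float) -> Optional[Frame]:
    await transport.send(Frame("REQUEST", correlation_id))
    try:
        return await asyncio.wait_for(
            transport.receive_response(correlation_id), timeout_seconds
        )
    except asyncio.TimeoutError:
        # On timeout, tell the other side to stop working on this request.
        await transport.send(Frame("CANCEL", correlation_id))
        return None


def run_timeout_scenario():
    transport = RecordingTransport()
    result = asyncio.run(dispatch(transport, "req-1", 0.05))
    return result, [frame.type for frame in transport.sent]
```

The test asserts on the sequence of frames sent rather than on timing, which keeps it stable on slow CI machines.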


6. How agents should use configuration & YAML

  1. Router side

    • Always read payload limits, node region, transports from RouterConfig (bound from YAML + env).

    • Do not hardcode:

      • Limits.
      • Regions.
      • Ports.
    • If behavior depends on config, read it from IOptionsMonitor<RouterConfig> at the point of use rather than from a field cached at startup, unless you deliberately freeze the value.

  2. Microservice side

    • Identity & router pool:

      • From StellaMicroserviceOptions (code/env).
    • Endpoint metadata overrides:

      • From YAML (ConfigFilePath) merged into reflection result.
    • Agents must not let YAML create endpoints that don't exist in code; overrides only.

  3. No hidden defaults

    • If a default is important (e.g. HeartbeatInterval), document it and centralize it.
    • Don't sprinkle magic numbers across code.
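
The "overrides only" rule for microservice YAML can be sketched as a merge that updates endpoints discovered in code and refuses to create new ones. This is a Python illustration under stated assumptions: the `Endpoint` shape, the `timeout_seconds` field, and the already-parsed `overrides` mapping are hypothetical, and whether a rejected override should fail startup or only be logged is left open here.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Mapping, Tuple

EndpointKey = Tuple[str, str]  # (method, path)


@dataclass(frozen=True)
class Endpoint:
    method: str
    path: str
    timeout_seconds: float  # hypothetical overridable metadata field


def merge_overrides(
    discovered: List[Endpoint],
    overrides: Mapping[EndpointKey, Dict[str, float]],
) -> Tuple[List[Endpoint], List[EndpointKey]]:
    """Apply overrides to discovered endpoints; return (merged, rejected keys)."""
    by_key = {(endpoint.method, endpoint.path): endpoint for endpoint in discovered}
    rejected: List[EndpointKey] = []
    for key, fields in overrides.items():
        if key not in by_key:
            # YAML names an endpoint that does not exist in code: never create it.
            rejected.append(key)
            continue
        by_key[key] = replace(by_key[key], **fields)
    return list(by_key.values()), rejected


# Endpoints found by reflection (stand-in data for the sketch).
discovered = [
    Endpoint("GET", "/billing/health", 5.0),
    Endpoint("POST", "/billing/invoices/upload", 30.0),
]
# What a parsed YAML override file might contain (stand-in data for the sketch).
overrides = {
    ("POST", "/billing/invoices/upload"): {"timeout_seconds": 120.0},
    ("GET", "/billing/ghost"): {"timeout_seconds": 1.0},
}
merged, rejected = merge_overrides(discovered, overrides)
```

Returning the rejected keys instead of silently dropping them gives the caller something concrete to log or to fail on.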

7. Adding new capabilities: pattern all agents follow

When someone wants a new capability (e.g. “retry on transient transport failures”):

  1. Open a design issue / doc snippet

    • Describe:

      • Problem.
      • Proposed design.
      • Where it sits in architecture (router, microservice, transport, config).
  2. Update spec

    • Write the behavior in the appropriate doc section.

    • Include:

      • API shape (if public).
      • Transport impacts.
      • Failure modes.
  3. Follow the vertical slice path

    • Implement in Common (if needed).
    • Implement InMemory.
    • Implement in primary transport (TCP).
    • Add tests.
    • Update docs.

Agents should not spike code straight into the TCP implementation without a spec update or tests.


8. Logging, tracing, and debugging expectations

Agents should instrument consistently; this matters for operations and for debugging during development.

  1. Use structured logging

    • At minimum, include:

      • ServiceName
      • InstanceId
      • CorrelationId
      • Method
      • Path
      • ConnectionId
    • Never log full payload bodies by default for privacy and performance; log sizes and key metadata instead.

  2. Trace correlation

    • Ensure correlation IDs:

      • Propagate from HTTP (gateway) into Frame.CorrelationId.
      • Are used in logs on both sides (gateway + microservice).
  3. Agent debugging guidance

    • When debugging a routing or transport problem:

      • Turn on debug logging for gateway + microservice for that service.

      • Use the correlation ID to follow the request end-to-end.

      • Verify:

        • HELLO registration.
        • HEARTBEAT events.
        • REQUEST leaving gateway.
        • RESPONSE arriving.
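
The structured logging rules above can be sketched as a helper that insists on the minimum fields and records the payload size instead of the body. This is a Python illustration only; `build_log_record`, the `Event` and `PayloadBytes` keys, and the sample context values are hypothetical, and a real implementation would use the project's own logging stack.

```python
import json
import logging
from typing import Dict, Mapping

REQUIRED_FIELDS = (
    "ServiceName",
    "InstanceId",
    "CorrelationId",
    "Method",
    "Path",
    "ConnectionId",
)


def build_log_record(event: str, context: Mapping[str, str], payload: bytes) -> Dict[str, object]:
    """Build a structured record with the minimum fields and the payload size only."""
    missing = [name for name in REQUIRED_FIELDS if name not in context]
    if missing:
        raise ValueError("missing log fields: " + ", ".join(missing))
    record: Dict[str, object] = {"Event": event}
    for name in REQUIRED_FIELDS:
        record[name] = context[name]
    record["PayloadBytes"] = len(payload)  # the size is logged, the body never is
    return record


def log_event(logger: logging.Logger, event: str, context: Mapping[str, str], payload: bytes) -> None:
    logger.info(json.dumps(build_log_record(event, context, payload), sort_keys=True))


# Sample values for the sketch; none of these identifiers come from the real system.
context = {
    "ServiceName": "billing",
    "InstanceId": "billing-1",
    "CorrelationId": "req-1",
    "Method": "POST",
    "Path": "/billing/invoices/upload",
    "ConnectionId": "conn-7",
}
record = build_log_record("REQUEST leaving gateway", context, b"secret-invoice-bytes")
```

Using the same helper on the gateway and the microservice keeps the field names identical on both sides, so a single CorrelationId search finds the whole request.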

9. Daily agent workflow (practical directions)

For each day / task, an agent should:

  1. Start from an issue or spec line item

    • Never “just code something” without an issue/state in the backlog.
  2. Locate the relevant doc

    • Spec section.
    • Example docs (e.g. Billing sample).
    • Migration doc if working on conversion.
  3. Work in a feature branch

    • Branch name reflects scope: feature/streaming-tcp, fix/router-cancellation, etc.
  4. Keep notes

    • If you make an assumption (e.g. “we currently don't support streaming over RabbitMQ”), note it in the issue.
    • If you discover an inconsistency in the docs, open a doc-fix issue.
  5. Finish the full slice

    • Code + tests + docs.
    • Keep partial implementations behind feature flags (if needed) and clearly marked.
  6. Open PR with clear description

    • What changed.
    • Which spec section it implements or modifies.
    • Any risks or rollback notes.

10. Guardrails against drift

Finally, a few things agents must actively avoid:

  • No silent protocol changes

    • Don't change FrameType semantics, payload formats, or header layout without:

      • Spec update.
      • Full impact review.
  • No spec-less behavior

    • If something matters at runtime (timeouts, retries, routing rules), it has to be in the docs, not just in someone's head.
  • No bypassing of router

    • Do not introduce “temporary” direct calls from clients to microservices. All client HTTP should go via gateway.
  • No direct dependencies on specific transports in domain code

    • Domain and microservice endpoint logic must not know if the transport is TCP, TLS, UDP, or RabbitMQ. They only see RawRequestContext, RawResponse, and cancellation tokens.
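
A transport-agnostic handler could look like the following sketch, written in Python for illustration only. The fields on `RawRequestContext` and `RawResponse`, the `RequestCancelled` exception, and the use of `threading.Event` as a stand-in for a cancellation token are all assumptions; the real types are defined in the codebase.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class RawRequestContext:
    # Hypothetical fields; note there is nothing here that names a transport.
    method: str
    path: str
    body: bytes


@dataclass(frozen=True)
class RawResponse:
    status_code: int
    body: bytes


class RequestCancelled(Exception):
    """Hypothetical signal that the caller cancelled the request."""


def upload_handler(request: RawRequestContext, cancelled: threading.Event) -> RawResponse:
    # The handler sees only the request, the response type, and a cancellation
    # signal. It cannot tell whether the bytes arrived over TCP, TLS, UDP, or RabbitMQ.
    total = 0
    chunk_size = 4  # deliberately tiny so cancellation is checked often in this sketch
    for offset in range(0, len(request.body), chunk_size):
        if cancelled.is_set():
            raise RequestCancelled(request.path)
        total += len(request.body[offset:offset + chunk_size])
    return RawResponse(200, str(total).encode("ascii"))


def handle_or_none(request: RawRequestContext, cancelled: threading.Event):
    # Small helper so the cancelled case can be observed without a try/except at the call site.
    try:
        return upload_handler(request, cancelled)
    except RequestCancelled:
        return None
```

Because nothing in the handler signature names a transport, the same function can be exercised in an InMemory test and served over any registered transport without modification.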

If you want, I can turn this into a one-page “Agent Handbook” markdown file you can drop into docs/router/AGENTS_PROCESS.md and link from specs.md so every AI or human dev working on this stack has the same ground rules.