Update docs, sprint plans, and compose configuration

Add 12 new sprint files (Integrations, Graph, JobEngine, FE, Router, AdvisoryAI), archive completed scheduler UI sprint, update module architecture docs (router, graph, jobengine, web, integrations), and add Gitea entrypoint script for local dev.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -24,7 +24,7 @@ Rollout policy: `docs/operations/multi-tenant-rollout-and-compatibility.md`

 Each transport connection carries:

-- Initial registration (HELLO) and endpoint configuration
+- Initial identity (HELLO) and, when needed, endpoint metadata replay
 - Ongoing heartbeats
 - Request/response data frames
 - Streaming data frames
@@ -34,9 +34,11 @@ Each transport connection carries:
 ┌─────────────────┐                           ┌─────────────────┐
 │  Microservice   │                           │     Gateway     │
 │                 │           HELLO           │                 │
-│  Endpoints:     │ ─────────────────────────►│    Routing      │
+│  Identity       │ ─────────────────────────►│    Routing      │
 │  - POST /items  │         HEARTBEAT         │    State        │
 │  - GET /items   │ ◄────────────────────────►│                 │
+│  Metadata       │    RESYNC / ENDPOINTS     │  Connections[]  │
+│  replay         │ ◄────────────────────────►│                 │
 │                 │                           │  Connections[]  │
 │                 │    REQUEST / RESPONSE     │                 │
 │                 │ ◄────────────────────────►│                 │
@@ -280,7 +282,9 @@ public enum FrameType : byte
     Response = 4,
     RequestStreamData = 5,
     ResponseStreamData = 6,
-    Cancel = 7
+    Cancel = 7,
+    ResyncRequest = 8,
+    EndpointsUpdate = 9
 }
 ```

@@ -415,9 +419,10 @@ Two mechanisms:
 ### Connection Behavior

 On connection:
-1. Send HELLO with instance info and endpoints
-2. Start heartbeat timer
-3. Listen for REQUEST frames
+1. Send HELLO with instance identity.
+2. Start heartbeat timer.
+3. For messaging transport, replay endpoint/schema/OpenAPI metadata only when the router explicitly asks for it.
+4. Listen for REQUEST frames.

 HELLO payload:

@@ -431,6 +436,11 @@ public sealed class HelloPayload
 }
 ```

+For messaging transport the steady-state contract is intentionally slimmer than the generic shape above:
+- startup `HELLO` carries identity and may leave `Endpoints` empty
+- the gateway sends `ResyncRequest` on service startup, administrative replay, or gateway-state miss
+- the microservice answers with `EndpointsUpdate` containing endpoints, schemas, and OpenAPI metadata
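
The messaging contract listed above can be sketched from the microservice side as follows. This is a minimal illustration, not the router's actual implementation: `SketchFrameType`, `EndpointsUpdateSketch`, and `MessagingMicroserviceSketch` are hypothetical names, and the enum mirrors only the two frame values shown in the `FrameType` hunk.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, partial mirror of FrameType: only the two members used here.
public enum SketchFrameType : byte
{
    ResyncRequest = 8,
    EndpointsUpdate = 9
}

// Hypothetical payload shape; the real EndpointsUpdate payload is not shown in the source.
public sealed record EndpointsUpdateSketch(
    IReadOnlyList<string> Endpoints,
    IReadOnlyList<string> Schemas,
    string? OpenApiDocument);

public sealed class MessagingMicroserviceSketch
{
    private readonly IReadOnlyList<string> _endpoints;
    private readonly IReadOnlyList<string> _schemas;
    private readonly string? _openApi;

    public MessagingMicroserviceSketch(
        IReadOnlyList<string> endpoints,
        IReadOnlyList<string> schemas,
        string? openApi)
    {
        _endpoints = endpoints;
        _schemas = schemas;
        _openApi = openApi;
    }

    // Counts how often the full catalog was replayed; heartbeats never touch it.
    public int ReplayCount { get; private set; }

    // Returns the metadata replay to send, or null when the frame does not ask for one.
    public EndpointsUpdateSketch? OnFrame(SketchFrameType frameType)
    {
        if (frameType != SketchFrameType.ResyncRequest)
        {
            return null;
        }

        ReplayCount++;
        return new EndpointsUpdateSketch(_endpoints, _schemas, _openApi);
    }
}
```

The point of the design is that the endpoint catalog travels only in response to an explicit `ResyncRequest`, so steady-state traffic stays limited to small frames.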
+
 ---

 ## Authorization
@@ -449,7 +459,7 @@ public sealed class ClaimRequirement

 ### Precedence

-1. Microservice provides defaults in HELLO
+1. Microservice provides defaults in registration metadata
 2. Authority can override centrally
 3. Gateway enforces final effective claims
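
The precedence order above can be illustrated with a small sketch. It assumes that a central override replaces the microservice defaults wholesale rather than merging with them, which the source does not state; `ClaimRuleSketch` and `EffectiveClaimsSketch` are hypothetical names and do not reflect the real `ClaimRequirement` shape.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a claim requirement: a claim type and an optional required value.
public sealed record ClaimRuleSketch(string Type, string? Value);

public static class EffectiveClaimsSketch
{
    // Steps 1 and 2: start from microservice defaults, let a central override win when present.
    // Assumption: the override replaces the defaults entirely.
    public static IReadOnlyList<ClaimRuleSketch> Resolve(
        IReadOnlyList<ClaimRuleSketch> microserviceDefaults,
        IReadOnlyList<ClaimRuleSketch>? authorityOverride)
        => authorityOverride ?? microserviceDefaults;

    // Step 3: the gateway enforces the effective set against the caller's claims.
    public static bool IsAuthorized(
        IReadOnlyList<ClaimRuleSketch> effective,
        IEnumerable<KeyValuePair<string, string>> callerClaims)
    {
        var claims = callerClaims.ToList();
        return effective.All(rule => claims.Any(c =>
            c.Key == rule.Type && (rule.Value is null || c.Value == rule.Value)));
    }
}
```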
@@ -533,9 +543,12 @@ Sent at regular intervals over the same connection as requests:
```csharp
public sealed class HeartbeatPayload
{
    public InstanceDescriptor? Instance { get; init; }
    public string InstanceId { get; init; }
    public required InstanceHealthStatus Status { get; init; }
    public int InflightRequests { get; init; }
    public int InFlightRequestCount { get; init; }
    public double ErrorRate { get; init; }
    public DateTime TimestampUtc { get; init; }
}
```

@@ -546,13 +559,14 @@ Gateway tracks:
 - Derives status from heartbeat recency
 - Marks stale instances as Unhealthy
 - Uses health in routing decisions
+- Messaging heartbeats include instance identity so the gateway can rebuild minimal state after a gateway restart or local routing-state loss without waiting for a full reconnect.
 - Messaging transports stay push-first even when backed by notifiable queues; the missed-notification safety-net timeout is derived from the configured heartbeat interval and clamped to a short bounded window instead of falling back to a fixed long poll.
 - Gateway degraded and stale transitions are normalized against the messaging heartbeat contract. A gateway may not mark an instance `Degraded` earlier than `2x` the heartbeat interval or `Unhealthy` earlier than `3x` the heartbeat interval, even when looser defaults were configured.
 - `/health/ready` is stricter than "process started": it remains `503` until the configured required first-party microservices have live healthy or degraded registrations in router state. Local scratch compose uses this to hold the frontdoor unhealthy until the core Stella API surface has replayed HELLO after a rebuild.
 - The required-service list must use canonical router `serviceName` values, not loose product-family aliases. Gateway readiness normalizes host-style suffixes such as `-gateway`, `-web`, `.stella-ops.local`, and ports, but it does not treat sibling services as interchangeable.
 - When a request already matched a configured `Microservice` route but the target service has not registered yet, the gateway returns `503 Service Unavailable`, not `404 Not Found`. `404` remains reserved for genuinely unknown paths or missing endpoints on an otherwise registered service.
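
The `503` versus `404` rule in the last item above can be sketched as a small decision function. This is an illustration under assumptions: `RouteStatusSketch` is a hypothetical helper, route matching is reduced to an already-resolved service name, and `200` stands in for "forward the request to the instance".

```csharp
using System;
using System.Collections.Generic;

public static class RouteStatusSketch
{
    // matchedService: the service a configured Microservice route points at,
    //                 or null when no configured route matched the path.
    // registeredEndpoints: service name -> endpoints currently registered in router state.
    public static int Resolve(
        string? matchedService,
        string endpoint,
        IReadOnlyDictionary<string, HashSet<string>> registeredEndpoints)
    {
        if (matchedService is null)
        {
            return 404; // genuinely unknown path
        }

        if (!registeredEndpoints.TryGetValue(matchedService, out var endpoints))
        {
            return 503; // route is configured, but the service has not registered yet
        }

        // Registered service: a missing endpoint is still a 404.
        return endpoints.Contains(endpoint) ? 200 : 404;
    }
}
```

Separating the two cases lets clients treat `503` as retryable during startup or rebuilds while `404` keeps signalling a request that will not succeed on retry.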
-
-Periodic HELLO re-registration is valid so a microservice can repopulate gateway state after a gateway restart, but it must refresh the existing logical transport connection instead of minting a second one. Gateway routing state also deduplicates by service instance identity (`ServiceName`, `Version`, `InstanceId`, transport) before re-indexing endpoints so repeated HELLO frames cannot accumulate stale route candidates.
+- Messaging resync is explicit instead of periodic: startup, administrative replay, and gateway-state misses trigger `ResyncRequest`, while normal heartbeats stay small.
+- The Valkey transport keeps its timeout fallback plus proactive randomized re-subscribe so silent Pub/Sub failures still recover. That fallback still produces some `XREADGROUP`/`XAUTOCLAIM` traffic, but it is resilience traffic rather than endpoint-catalog churn.
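
The degraded and stale normalization rule listed above (no `Degraded` before `2x` and no `Unhealthy` before `3x` the heartbeat interval) can be sketched as a floor on the configured thresholds. The type and method names are hypothetical, and the sketch assumes the gateway simply raises any configured value that falls below the floor.

```csharp
using System;

public enum HealthSketch { Healthy, Degraded, Unhealthy }

public static class HeartbeatThresholdsSketch
{
    // Raises configured thresholds to the 2x / 3x floors derived from the heartbeat interval.
    public static (TimeSpan Degraded, TimeSpan Unhealthy) Normalize(
        TimeSpan heartbeatInterval,
        TimeSpan configuredDegraded,
        TimeSpan configuredUnhealthy)
    {
        var minDegraded = TimeSpan.FromTicks(heartbeatInterval.Ticks * 2);
        var minUnhealthy = TimeSpan.FromTicks(heartbeatInterval.Ticks * 3);

        var degraded = configuredDegraded < minDegraded ? minDegraded : configuredDegraded;
        var unhealthy = configuredUnhealthy < minUnhealthy ? minUnhealthy : configuredUnhealthy;
        return (degraded, unhealthy);
    }

    // Derives status from heartbeat recency using the normalized thresholds.
    public static HealthSketch Derive(
        TimeSpan sinceLastHeartbeat,
        TimeSpan degradedAfter,
        TimeSpan unhealthyAfter)
    {
        if (sinceLastHeartbeat >= unhealthyAfter) return HealthSketch.Unhealthy;
        if (sinceLastHeartbeat >= degradedAfter) return HealthSketch.Degraded;
        return HealthSketch.Healthy;
    }
}
```

With a 10-second heartbeat and a configured degraded threshold of 5 seconds, the normalized threshold becomes 20 seconds, so a single late heartbeat does not change the instance's status.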

 ---
