git.stella-ops.org/docs/router/SPRINT_7000_0005_0003_cancellation.md
StellaOps Bot 6a299d231f
2025-12-05 08:01:47 +02:00


Sprint 7000-0005-0003 · Protocol Features · Cancellation Semantics

Topic & Scope

Implement cancellation semantics on both the gateway and microservice sides. When an HTTP client disconnects, a request times out, a payload limit is exceeded, or the gateway shuts down, the gateway sends a CANCEL frame so the microservice stops the in-flight work.

Goal: Clean cancellation propagation from HTTP client through gateway to microservice handlers.

Working directories:

  • src/Gateway/StellaOps.Gateway.WebService/ (send CANCEL)
  • src/__Libraries/StellaOps.Microservice/ (receive CANCEL, cancel handler)
  • src/__Libraries/StellaOps.Router.Common/ (CancelPayload)

Dependencies & Concurrency

  • Upstream: SPRINT_7000_0005_0002 (routing algorithm complete)
  • Downstream: SPRINT_7000_0005_0004 (streaming uses cancellation)
  • Parallel work: None. Sequential.
  • Cross-module impact: SDK and Gateway both modified.

Documentation Prerequisites

  • docs/router/specs.md (sections 7.6, 10 - Cancellation requirements)
  • docs/router/07-Step.md (cancellation section)
  • docs/router/implplan.md (phase 7 guidance)

BLOCKED Tasks: Before working on BLOCKED tasks, review ../implplan/BLOCKED_DEPENDENCY_TREE.md for root blockers and dependencies.

Delivery Tracker

| # | Task ID | Status | Description | Working Directory |
| --- | --- | --- | --- | --- |
| 1 | CAN-001 | DONE | Define CancelPayload with Reason code | Common |
| 2 | CAN-002 | DONE | Define cancel reason constants: ClientDisconnected, Timeout, PayloadLimitExceeded, Shutdown | |
| 3 | CAN-010 | DONE | Implement CANCEL frame sending in gateway | Gateway |
| 4 | CAN-011 | DONE | Wire HttpContext.RequestAborted to CANCEL | Gateway |
| 5 | CAN-012 | DONE | Implement timeout-triggered CANCEL | Gateway |
| 6 | CAN-013 | DONE | Implement payload-limit-triggered CANCEL | Gateway |
| 7 | CAN-014 | DONE | Implement shutdown-triggered CANCEL for in-flight requests | Gateway |
| 8 | CAN-020 | DONE | Stop forwarding REQUEST_STREAM_DATA after CANCEL | Gateway |
| 9 | CAN-021 | DONE | Ignore late RESPONSE frames for cancelled requests | Gateway |
| 10 | CAN-022 | DONE | Log cancelled requests with reason | Gateway |
| 11 | CAN-030 | DONE | Implement inflight request tracking in SDK | Microservice |
| 12 | CAN-031 | DONE | Create ConcurrentDictionary<Guid, CancellationTokenSource> | Microservice |
| 13 | CAN-032 | DONE | Add handler task to tracking map | Microservice |
| 14 | CAN-033 | DONE | Implement CANCEL frame processing | Microservice |
| 15 | CAN-034 | DONE | Call cts.Cancel() on CANCEL frame | Microservice |
| 16 | CAN-035 | DONE | Remove from tracking when handler completes | Microservice |
| 17 | CAN-040 | DONE | Implement connection-close cancellation | Microservice |
| 18 | CAN-041 | DONE | Cancel all inflight on connection loss | Microservice |
| 19 | CAN-050 | DONE | Pass CancellationToken to handler interfaces | Microservice |
| 20 | CAN-051 | DONE | Document cancellation best practices for handlers | Docs |
| 21 | CAN-060 | DONE | Write integration tests: client disconnect → handler cancelled | |
| 22 | CAN-061 | DONE | Write integration tests: timeout → handler cancelled | |
| 23 | CAN-062 | DONE | Write tests: late response ignored | |

CancelPayload

public sealed class CancelPayload
{
    public string Reason { get; init; } = string.Empty;
}

public static class CancelReasons
{
    public const string ClientDisconnected = "ClientDisconnected";
    public const string Timeout = "Timeout";
    public const string PayloadLimitExceeded = "PayloadLimitExceeded";
    public const string Shutdown = "Shutdown";
}
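The sprint does not say how a CancelPayload is encoded inside a CANCEL frame. Assuming a UTF-8 JSON body (an assumption, as is the hypothetical `CancelPayloadCodec` name), a round trip with System.Text.Json could look like this:

```csharp
using System;
using System.Text.Json;

// Mirrors the CancelPayload definition above so the sketch compiles on its own.
public sealed class CancelPayload
{
    public string Reason { get; init; } = string.Empty;
}

// Hypothetical codec: assumes CANCEL frames carry a UTF-8 JSON body.
public static class CancelPayloadCodec
{
    // Produces e.g. {"Reason":"Timeout"} with the default serializer options.
    public static byte[] Encode(string reason) =>
        JsonSerializer.SerializeToUtf8Bytes(new CancelPayload { Reason = reason });

    // Rejects a literal JSON null body instead of returning a null payload.
    public static CancelPayload Decode(ReadOnlySpan<byte> body) =>
        JsonSerializer.Deserialize<CancelPayload>(body)
        ?? throw new JsonException("CANCEL frame has an empty payload.");
}
```

Keeping `Reason` a plain string (rather than an enum) means a receiver can still decode and log a reason it does not recognize.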

Gateway-Side: Sending CANCEL

On Client Disconnect

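// In TransportDispatchMiddleware
// Register takes a synchronous Action, so an async lambda here would compile to
// async void and an exception from SendCancelAsync could crash the process.
// Start the send explicitly and log failures instead. Disposing the registration
// once the response is written stops a late abort from sending CANCEL.
using var abortRegistration = context.RequestAborted.Register(() =>
{
    _ = transport.SendCancelAsync(connection, correlationId, CancelReasons.ClientDisconnected)
        .ContinueWith(
            t => _logger.LogWarning(t.Exception, "Failed to send CANCEL for {CorrelationId}", correlationId),
            TaskContinuationOptions.OnlyOnFaulted);
});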
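`CancellationToken.Register` runs its callback synchronously on the thread that calls `Cancel`, and runs it immediately if the token is already cancelled. Since disconnect, timeout, and shutdown can all fire for the same request, a per-request guard that sends at most one CANCEL may help. This is a sketch; `ICancelSender` (a stand-in for the transport) and `CancelOnce` are hypothetical names:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the gateway transport's SendCancelAsync.
public interface ICancelSender
{
    Task SendCancelAsync(Guid correlationId, string reason);
}

// Hypothetical guard: at most one CANCEL frame per correlation ID.
public sealed class CancelOnce
{
    private readonly ICancelSender _sender;
    private readonly Guid _correlationId;
    private int _sent; // 0 = not sent yet, 1 = sent

    public CancelOnce(ICancelSender sender, Guid correlationId)
    {
        _sender = sender;
        _correlationId = correlationId;
    }

    // The first caller wins; later triggers become no-ops.
    public Task TrySendAsync(string reason) =>
        Interlocked.Exchange(ref _sent, 1) == 0
            ? _sender.SendCancelAsync(_correlationId, reason)
            : Task.CompletedTask;

    // Wires client disconnect to CANCEL ("ClientDisconnected" mirrors CancelReasons).
    // Real code should log failed sends instead of discarding the task.
    public CancellationTokenRegistration RegisterDisconnect(CancellationToken requestAborted) =>
        requestAborted.Register(() => { _ = TrySendAsync("ClientDisconnected"); });
}
```

Disposing the returned registration after the response is written means a client that disconnects afterwards does not trigger a spurious CANCEL.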

On Timeout

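// Link the client-abort token with the effective timeout chosen by routing.
using var cts = CancellationTokenSource.CreateLinkedTokenSource(context.RequestAborted);
cts.CancelAfter(decision.EffectiveTimeout);

try
{
    var response = await transport.SendRequestAsync(..., cts.Token);
    // Copy the response to context.Response (omitted).
}
catch (OperationCanceledException) when (cts.IsCancellationRequested)
{
    if (context.RequestAborted.IsCancellationRequested)
    {
        // Client disconnect: the RequestAborted registration already sent CANCEL,
        // and no client is left to receive a status code.
        return;
    }

    // Timeout, not client disconnect
    await transport.SendCancelAsync(connection, correlationId, CancelReasons.Timeout);
    context.Response.StatusCode = StatusCodes.Status504GatewayTimeout;
    return;
}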
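The decision in the catch block (did the linked token fire because of the timer or because the client left?) can be isolated and tested without ASP.NET Core. In this sketch, `DispatchOutcome` and `TimeoutClassifier.ForwardAsync` are hypothetical, and the `send` delegate stands in for `SendRequestAsync`:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public enum DispatchOutcome { Completed, ClientDisconnected, Timeout }

public static class TimeoutClassifier
{
    // Runs `send` under a token linked to the client's abort token plus a timeout,
    // then reports which one (if either) stopped it.
    public static async Task<DispatchOutcome> ForwardAsync(
        Func<CancellationToken, Task> send,
        TimeSpan timeout,
        CancellationToken requestAborted)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(requestAborted);
        cts.CancelAfter(timeout);
        try
        {
            await send(cts.Token);
            return DispatchOutcome.Completed;
        }
        catch (OperationCanceledException) when (cts.IsCancellationRequested)
        {
            // Check the client's token first: if both fired, the client is gone
            // and a 504 would never be read.
            return requestAborted.IsCancellationRequested
                ? DispatchOutcome.ClientDisconnected
                : DispatchOutcome.Timeout;
        }
    }
}
```

An `OperationCanceledException` that did not come from the linked token (for example, one thrown by the transport for its own reasons) is not caught by the filter and propagates unchanged.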

Late Response Handling

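// Correlation IDs of requests the gateway has cancelled, stamped with the time of
// cancellation so entries can be expired (60 seconds, per Decisions & Risks).
private readonly ConcurrentDictionary<Guid, DateTimeOffset> _cancelledRequests = new();

// Call whenever a CANCEL frame is sent for a request.
public void MarkCancelled(Guid correlationId)
{
    _cancelledRequests[correlationId] = DateTimeOffset.UtcNow;
}

public bool IsCancelled(Guid correlationId)
{
    return _cancelledRequests.ContainsKey(correlationId);
}

// When a RESPONSE frame arrives
if (IsCancelled(frame.CorrelationId))
{
    _logger.LogDebug("Ignoring late response for cancelled {CorrelationId}", frame.CorrelationId);
    return; // Discard
}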
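The Decisions section says late-response entries expire after 60 seconds, but the snippet above only ever adds entries. One way to bound the set, sketched here with a hypothetical `CancelledRequestSet` and an injectable clock so expiry can be tested deterministically, is to stamp each entry and sweep on a timer:

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical: remembers cancelled correlation IDs and forgets them after a TTL.
public sealed class CancelledRequestSet
{
    private readonly ConcurrentDictionary<Guid, DateTimeOffset> _cancelledAt = new();
    private readonly TimeSpan _ttl;
    private readonly Func<DateTimeOffset> _now;

    public CancelledRequestSet(TimeSpan ttl, Func<DateTimeOffset>? now = null)
    {
        _ttl = ttl;
        _now = now ?? (() => DateTimeOffset.UtcNow);
    }

    public void MarkCancelled(Guid correlationId) => _cancelledAt[correlationId] = _now();

    // A late RESPONSE is dropped only while its entry is younger than the TTL.
    public bool IsCancelled(Guid correlationId) =>
        _cancelledAt.TryGetValue(correlationId, out var at) && _now() - at < _ttl;

    // Intended to run from a periodic timer; removes expired entries and returns the count.
    public int Sweep()
    {
        var cutoff = _now() - _ttl;
        var removed = 0;
        foreach (var (id, at) in _cancelledAt)
        {
            if (at <= cutoff && _cancelledAt.TryRemove(id, out _))
            {
                removed++;
            }
        }
        return removed;
    }
}
```

`IsCancelled` also checks the age, so a response arriving after the TTL but before the next sweep is treated as not cancelled, matching what it would see after the sweep.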

Microservice-Side: Receiving CANCEL

Inflight Tracking

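using System;
using System.Collections.Concurrent;
using System.Threading;
using Microsoft.Extensions.Logging;

internal sealed record InflightRequest(CancellationTokenSource Cts);

internal sealed class InflightRequestTracker
{
    private readonly ConcurrentDictionary<Guid, InflightRequest> _inflight = new();
    private readonly ILogger<InflightRequestTracker> _logger;

    public InflightRequestTracker(ILogger<InflightRequestTracker> logger) => _logger = logger;

    // Call before starting the handler: the handler needs this token, and the entry
    // must exist before a CANCEL frame for the request can be processed.
    public CancellationToken Track(Guid correlationId)
    {
        var cts = new CancellationTokenSource();
        if (!_inflight.TryAdd(correlationId, new InflightRequest(cts)))
        {
            cts.Dispose();
            throw new InvalidOperationException($"Request {correlationId} is already in flight.");
        }
        return cts.Token;
    }

    public void Cancel(Guid correlationId, string reason)
    {
        if (_inflight.TryGetValue(correlationId, out var request) && TryCancel(request.Cts))
        {
            _logger.LogInformation("Cancelled {CorrelationId}: {Reason}", correlationId, reason);
        }
    }

    // Call from the handler's finally block; this is where each source is disposed.
    public void Complete(Guid correlationId)
    {
        if (_inflight.TryRemove(correlationId, out var request))
        {
            request.Cts.Dispose();
        }
    }

    public void CancelAll(string reason)
    {
        // Cancel without removing: each handler still calls Complete(), which removes
        // and disposes its entry. Clearing the map here would skip that disposal.
        foreach (var (correlationId, request) in _inflight)
        {
            if (TryCancel(request.Cts))
            {
                _logger.LogInformation("Cancelled {CorrelationId}: {Reason}", correlationId, reason);
            }
        }
    }

    // Complete() can dispose a source concurrently with Cancel()/CancelAll();
    // cancelling a disposed source throws ObjectDisposedException, which is ignored.
    private static bool TryCancel(CancellationTokenSource cts)
    {
        try
        {
            cts.Cancel();
            return true;
        }
        catch (ObjectDisposedException)
        {
            return false;
        }
    }
}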
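The intended call order (track before starting the handler, cancel on a CANCEL frame, complete in a `finally`) is easiest to see end to end. This sketch uses a hypothetical `RequestDispatcher` with its own minimal map rather than the tracker class; its method names are illustrative assumptions, not the SDK's API:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical dispatcher showing the track -> run -> cancel/complete lifecycle.
public sealed class RequestDispatcher
{
    private readonly ConcurrentDictionary<Guid, CancellationTokenSource> _inflight = new();

    public int InflightCount => _inflight.Count;

    // REQUEST frame: register first so an immediate CANCEL finds the entry.
    // Returns null when the handler was cancelled (no RESPONSE would be sent).
    public async Task<string?> OnRequestAsync(Guid correlationId, Func<CancellationToken, Task<string>> handler)
    {
        using var cts = new CancellationTokenSource();
        if (!_inflight.TryAdd(correlationId, cts))
        {
            throw new InvalidOperationException($"Duplicate correlation ID {correlationId}.");
        }
        try
        {
            return await handler(cts.Token);
        }
        catch (OperationCanceledException) when (cts.IsCancellationRequested)
        {
            return null;
        }
        finally
        {
            _inflight.TryRemove(correlationId, out _); // runs before `using` disposes cts
        }
    }

    // CANCEL frame: signal the handler's token if the request is still in flight.
    public bool OnCancel(Guid correlationId)
    {
        if (!_inflight.TryGetValue(correlationId, out var cts)) return false;
        try
        {
            cts.Cancel();
            return true;
        }
        catch (ObjectDisposedException)
        {
            return false; // the request completed concurrently
        }
    }
}
```

A CANCEL that arrives after the handler finished finds no entry and is a no-op.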

Connection-Close Handling

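// When the connection closes unexpectedly. "ConnectionClosed" is a local log reason,
// not one of the CancelReasons constants: no CANCEL frame can reach a closed connection.
_inflightTracker.CancelAll("ConnectionClosed");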
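An alternative to iterating the map on connection loss is to create each request's source linked to a connection-lifetime token, so cancelling that one token reaches every in-flight handler. This is an assumption for comparison, not the SDK's approach, and `ConnectionScope` is a hypothetical name:

```csharp
using System;
using System.Threading;

// Hypothetical: one source per connection; per-request sources are linked to it.
public sealed class ConnectionScope : IDisposable
{
    private readonly CancellationTokenSource _connection = new();

    // Each request still gets its own source, so a CANCEL frame can stop just that
    // request. Dispose each request source when its handler completes: that removes
    // its registration from the long-lived connection token.
    public CancellationTokenSource CreateRequestSource() =>
        CancellationTokenSource.CreateLinkedTokenSource(_connection.Token);

    // Connection loss: one Cancel() propagates to every linked request source.
    public void OnConnectionClosed() => _connection.Cancel();

    public void Dispose() => _connection.Dispose();
}
```

The trade-off is that per-request logging of the reason happens in each handler's cancellation path rather than in one loop.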

Handler Cancellation Guidelines

Handlers MUST:

  1. Accept CancellationToken parameter
  2. Pass token to all async I/O operations
  3. Check the token in loops (IsCancellationRequested or ThrowIfCancellationRequested())
  4. Stop work promptly when cancelled

public class ProcessDataEndpoint : IStellaEndpoint<DataRequest, DataResponse>
{
    public async Task<DataResponse> HandleAsync(DataRequest request, CancellationToken ct)
    {
        // Pass token to I/O
        var data = await _database.QueryAsync(request.Id, ct);

        // Check in loops
        foreach (var item in data)
        {
            ct.ThrowIfCancellationRequested();
            await ProcessItemAsync(item, ct);
        }

        return new DataResponse { ... };
    }
}
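A self-contained version of these guidelines, with the database replaced by an in-memory list and `Task.Delay` standing in for per-item I/O (both assumptions), shows what a cooperative handler looks like to its caller: cancellation surfaces as an `OperationCanceledException` rather than a partial result. `CooperativeHandler` is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical handler following the four rules above.
public static class CooperativeHandler
{
    // Rule 1: accept the token.
    public static async Task<int> ProcessAsync(IReadOnlyList<int> items, CancellationToken ct)
    {
        var processed = 0;
        foreach (var item in items)
        {
            // Rule 3: check between iterations.
            ct.ThrowIfCancellationRequested();
            // Rule 2: pass the token to async I/O (Task.Delay stands in for real work).
            await Task.Delay(TimeSpan.FromMilliseconds(10), ct);
            processed++;
        }
        // Rule 4: on cancellation the exception propagates immediately; no cleanup loop.
        return processed;
    }
}
```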

Exit Criteria

Before marking this sprint DONE:

  1. CANCEL frames sent on client disconnect
  2. CANCEL frames sent on timeout
  3. SDK tracks inflight requests with CTS
  4. SDK cancels handlers on CANCEL frame
  5. Connection close cancels all inflight
  6. Late responses are ignored/logged
  7. Integration tests verify cancellation flow

Execution Log

| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-12-05 | Sprint DONE: CancelReasons defined, InflightRequestTracker implemented, Gateway sends CANCEL on disconnect/timeout, SDK handles CANCEL frames, 67 tests pass | Claude |

Decisions & Risks

  • Cancellation is cooperative; handlers must honor the token
  • CTS disposal happens on completion to avoid leaks
  • Late response cleanup: entries expire after 60 seconds
  • Shutdown CANCEL is best-effort (connections may close first)