Sprint 7000-0005-0003 · Protocol Features · Cancellation Semantics

Topic & Scope

Implement cancellation semantics on both the gateway and microservice sides. When an HTTP client disconnects, a timeout elapses, or a payload limit is breached, the gateway sends a CANCEL frame so the microservice stops the in-flight work.

Goal: Clean cancellation propagation from HTTP client through gateway to microservice handlers.

Working directories:

  • src/Gateway/StellaOps.Gateway.WebService/ (send CANCEL)
  • src/__Libraries/StellaOps.Microservice/ (receive CANCEL, cancel handler)
  • src/__Libraries/StellaOps.Router.Common/ (CancelPayload)

Dependencies & Concurrency

  • Upstream: SPRINT_7000_0005_0002 (routing algorithm complete)
  • Downstream: SPRINT_7000_0005_0004 (streaming uses cancellation)
  • Parallel work: None. Sequential.
  • Cross-module impact: SDK and Gateway both modified.

Documentation Prerequisites

  • docs/router/specs.md (sections 7.6, 10 - Cancellation requirements)
  • docs/router/07-Step.md (cancellation section)
  • docs/router/implplan.md (phase 7 guidance)

BLOCKED Tasks: Before working on BLOCKED tasks, review ../implplan/BLOCKED_DEPENDENCY_TREE.md for root blockers and dependencies.

Delivery Tracker

| # | Task ID | Status | Description | Working Directory |
|---|---------|--------|-------------|-------------------|
| 1 | CAN-001 | TODO | Define CancelPayload with Reason code | Common |
| 2 | CAN-002 | TODO | Define cancel reason constants: ClientDisconnected, Timeout, PayloadLimitExceeded, Shutdown | |
| 3 | CAN-010 | TODO | Implement CANCEL frame sending in gateway | Gateway |
| 4 | CAN-011 | TODO | Wire HttpContext.RequestAborted to CANCEL | Gateway |
| 5 | CAN-012 | TODO | Implement timeout-triggered CANCEL | Gateway |
| 6 | CAN-013 | TODO | Implement payload-limit-triggered CANCEL | Gateway |
| 7 | CAN-014 | TODO | Implement shutdown-triggered CANCEL for in-flight | Gateway |
| 8 | CAN-020 | TODO | Stop forwarding REQUEST_STREAM_DATA after CANCEL | Gateway |
| 9 | CAN-021 | TODO | Ignore late RESPONSE frames for cancelled requests | Gateway |
| 10 | CAN-022 | TODO | Log cancelled requests with reason | Gateway |
| 11 | CAN-030 | TODO | Implement inflight request tracking in SDK | Microservice |
| 12 | CAN-031 | TODO | Create ConcurrentDictionary<Guid, CancellationTokenSource> | Microservice |
| 13 | CAN-032 | TODO | Add handler task to tracking map | Microservice |
| 14 | CAN-033 | TODO | Implement CANCEL frame processing | Microservice |
| 15 | CAN-034 | TODO | Call cts.Cancel() on CANCEL frame | Microservice |
| 16 | CAN-035 | TODO | Remove from tracking when handler completes | Microservice |
| 17 | CAN-040 | TODO | Implement connection-close cancellation | Microservice |
| 18 | CAN-041 | TODO | Cancel all inflight on connection loss | Microservice |
| 19 | CAN-050 | TODO | Pass CancellationToken to handler interfaces | Microservice |
| 20 | CAN-051 | TODO | Document cancellation best practices for handlers | Docs |
| 21 | CAN-060 | TODO | Write integration tests: client disconnect → handler cancelled | |
| 22 | CAN-061 | TODO | Write integration tests: timeout → handler cancelled | |
| 23 | CAN-062 | TODO | Write tests: late response ignored | |

CancelPayload

public sealed class CancelPayload
{
    public string Reason { get; init; } = string.Empty;
}

public static class CancelReasons
{
    public const string ClientDisconnected = "ClientDisconnected";
    public const string Timeout = "Timeout";
    public const string PayloadLimitExceeded = "PayloadLimitExceeded";
    public const string Shutdown = "Shutdown";
}
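
The payload above is small enough that a round trip is easy to sketch. The example below assumes, purely for illustration, that the payload travels as UTF-8 JSON; this document does not specify the actual frame encoding, and `CancelPayloadCodec` is a hypothetical helper name.

```csharp
using System;
using System.Text.Json;

public sealed class CancelPayload
{
    public string Reason { get; init; } = string.Empty;
}

// Hypothetical helper: JSON is an assumption, the real encoding belongs to the transport.
public static class CancelPayloadCodec
{
    public static byte[] Encode(CancelPayload payload) =>
        JsonSerializer.SerializeToUtf8Bytes(payload);

    // Falls back to an empty payload if the bytes decode to JSON null.
    public static CancelPayload Decode(ReadOnlySpan<byte> utf8Json) =>
        JsonSerializer.Deserialize<CancelPayload>(utf8Json) ?? new CancelPayload();
}
```

Keeping `Reason` a plain string means a receiver that does not recognise a newer reason can still log it and cancel the handler.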

Gateway-Side: Sending CANCEL

On Client Disconnect

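// In TransportDispatchMiddleware
// Register takes a synchronous Action, so an async lambda here would compile to
// async void and a failed send would surface as an unhandled exception.
using var registration = context.RequestAborted.Register(() => _ = NotifyCancelAsync());

async Task NotifyCancelAsync()
{
    try
    {
        await transport.SendCancelAsync(connection, correlationId, CancelReasons.ClientDisconnected);
    }
    catch (Exception ex)
    {
        _logger.LogWarning(ex, "Failed to send CANCEL for {CorrelationId}", correlationId);
    }
}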

On Timeout

using var cts = CancellationTokenSource.CreateLinkedTokenSource(context.RequestAborted);
cts.CancelAfter(decision.EffectiveTimeout);

try
{
    var response = await transport.SendRequestAsync(..., cts.Token);
}
catch (OperationCanceledException) when (cts.IsCancellationRequested)
{
    if (context.RequestAborted.IsCancellationRequested)
    {
        // Client disconnect: the RequestAborted callback sends CANCEL, and there is no one left to respond to
        return;
    }

    // Timeout, not client disconnect
    await transport.SendCancelAsync(connection, correlationId, CancelReasons.Timeout);
    context.Response.StatusCode = StatusCodes.Status504GatewayTimeout;
    return;
}
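
The classification step in the snippet above (timeout versus client disconnect) can be isolated and exercised without a gateway. The following is a minimal sketch in which `callerToken` stands in for `HttpContext.RequestAborted`; `TimeoutClassifier` and `CallOutcome` are hypothetical names introduced only for this example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public enum CallOutcome { Completed, TimedOut, CallerCancelled }

public static class TimeoutClassifier
{
    public static async Task<CallOutcome> RunAsync(
        Func<CancellationToken, Task> work,
        TimeSpan timeout,
        CancellationToken callerToken)
    {
        // The linked source fires when either the caller cancels or the timeout elapses.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        cts.CancelAfter(timeout);

        try
        {
            await work(cts.Token);
            return CallOutcome.Completed;
        }
        catch (OperationCanceledException) when (cts.IsCancellationRequested)
        {
            // If the caller's own token is set, treat it as a disconnect; otherwise only the timer fired.
            return callerToken.IsCancellationRequested
                ? CallOutcome.CallerCancelled
                : CallOutcome.TimedOut;
        }
    }
}
```

If both conditions become true at nearly the same moment, this sketch reports `CallerCancelled`, which matches the gateway snippet's choice not to write a 504 to a client that has already gone.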

Late Response Handling

private readonly ConcurrentDictionary<Guid, bool> _cancelledRequests = new();

public void MarkCancelled(Guid correlationId)
{
    _cancelledRequests[correlationId] = true;
}

public bool IsCancelled(Guid correlationId)
{
    return _cancelledRequests.ContainsKey(correlationId);
}

// When response arrives
if (IsCancelled(frame.CorrelationId))
{
    _logger.LogDebug("Ignoring late response for cancelled {CorrelationId}", frame.CorrelationId);
    return; // Discard
}
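
The dictionary above never removes entries, while the Decisions & Risks section states that late-response entries expire after 60 seconds. One possible way to get that behaviour is to store the cancellation time instead of a bool. This is a sketch only: `CancelledRequestRegistry`, `Sweep`, and the injectable clock are assumptions made for the example, not part of the planned API.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;

public sealed class CancelledRequestRegistry
{
    private readonly ConcurrentDictionary<Guid, DateTimeOffset> _cancelledAt = new();
    private readonly TimeSpan _retention;
    private readonly Func<DateTimeOffset> _now;

    // The clock is injectable so expiry can be tested without waiting.
    public CancelledRequestRegistry(TimeSpan retention, Func<DateTimeOffset>? now = null)
    {
        _retention = retention;
        _now = now ?? (() => DateTimeOffset.UtcNow);
    }

    public void MarkCancelled(Guid correlationId) => _cancelledAt[correlationId] = _now();

    // An entry older than the retention window no longer counts as cancelled.
    public bool IsCancelled(Guid correlationId) =>
        _cancelledAt.TryGetValue(correlationId, out var at) && _now() - at < _retention;

    // Intended to be called periodically (for example from a timer); returns the number of entries dropped.
    public int Sweep()
    {
        var cutoff = _now() - _retention;
        var removed = 0;
        foreach (var entry in _cancelledAt)
        {
            // TryRemove(KeyValuePair) only removes if the value is unchanged, so a re-marked id survives.
            if (entry.Value <= cutoff && _cancelledAt.TryRemove(entry))
            {
                removed++;
            }
        }
        return removed;
    }
}
```

With a 60-second retention, a registry built as `new CancelledRequestRegistry(TimeSpan.FromSeconds(60))` would match the stated decision.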

Microservice-Side: Receiving CANCEL

Inflight Tracking

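internal sealed class InflightRequestTracker
{
    private readonly ConcurrentDictionary<Guid, InflightRequest> _inflight = new();
    private readonly ILogger<InflightRequestTracker> _logger;

    public InflightRequestTracker(ILogger<InflightRequestTracker> logger)
    {
        _logger = logger;
    }

    public CancellationToken Track(Guid correlationId, Task handlerTask)
    {
        var cts = new CancellationTokenSource();
        _inflight[correlationId] = new InflightRequest(cts, handlerTask);
        return cts.Token;
    }

    public void Cancel(Guid correlationId, string reason)
    {
        if (_inflight.TryGetValue(correlationId, out var request))
        {
            try
            {
                request.Cts.Cancel();
                _logger.LogInformation("Cancelled {CorrelationId}: {Reason}", correlationId, reason);
            }
            catch (ObjectDisposedException)
            {
                // The handler finished concurrently and Complete() already disposed the CTS
            }
        }
    }

    public void Complete(Guid correlationId)
    {
        if (_inflight.TryRemove(correlationId, out var request))
        {
            request.Cts.Dispose();
        }
    }

    public void CancelAll(string reason)
    {
        // Cancel only. Each handler still calls Complete() when it unwinds, which removes
        // and disposes its entry; clearing the map here would leave every CTS undisposed.
        foreach (var kvp in _inflight)
        {
            Cancel(kvp.Key, reason);
        }
    }
}

// Assumed shape of the tracked entry; the type is not defined elsewhere in this document
internal sealed record InflightRequest(CancellationTokenSource Cts, Task HandlerTask);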
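
The tracker only avoids leaks if removal runs on every handler exit path, including cancellation and exceptions. The sketch below shows one way to guarantee that with `try`/`finally`. It uses the simpler `ConcurrentDictionary<Guid, CancellationTokenSource>` map named in CAN-031, and `HandlerRunner` is a hypothetical wrapper, not the SDK's actual dispatch code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class HandlerRunner
{
    private readonly ConcurrentDictionary<Guid, CancellationTokenSource> _inflight = new();

    public int InflightCount => _inflight.Count;

    public async Task<TResult> RunAsync<TResult>(
        Guid correlationId,
        Func<CancellationToken, Task<TResult>> handler)
    {
        var cts = new CancellationTokenSource();
        if (!_inflight.TryAdd(correlationId, cts))
        {
            cts.Dispose();
            throw new InvalidOperationException($"Duplicate correlation id {correlationId}");
        }

        try
        {
            return await handler(cts.Token);
        }
        finally
        {
            // Runs on success, failure, and cancellation alike, so the entry is always removed.
            if (_inflight.TryRemove(correlationId, out var tracked))
            {
                tracked.Dispose();
            }
        }
    }

    // Returns false when the request is unknown or has already completed.
    public bool Cancel(Guid correlationId)
    {
        if (!_inflight.TryGetValue(correlationId, out var cts))
        {
            return false;
        }

        try
        {
            cts.Cancel();
            return true;
        }
        catch (ObjectDisposedException)
        {
            return false; // lost the race with handler completion
        }
    }
}
```

Registering the token source before invoking the handler sidesteps the ordering question of how to obtain a token for a handler task that does not exist yet.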

Connection-Close Handling

// When connection closes unexpectedly
_inflightTracker.CancelAll("ConnectionClosed");

Handler Cancellation Guidelines

Handlers MUST:

  1. Accept CancellationToken parameter
  2. Pass token to all async I/O operations
  3. Check token.IsCancellationRequested in loops
  4. Stop work promptly when cancelled

public class ProcessDataEndpoint : IStellaEndpoint<DataRequest, DataResponse>
{
    public async Task<DataResponse> HandleAsync(DataRequest request, CancellationToken ct)
    {
        // Pass token to I/O
        var data = await _database.QueryAsync(request.Id, ct);

        // Check in loops
        foreach (var item in data)
        {
            ct.ThrowIfCancellationRequested();
            await ProcessItemAsync(item, ct);
        }

        return new DataResponse { ... };
    }
}
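
Stopping promptly usually still involves releasing whatever the handler acquired. A common pattern, sketched here under the assumption that cleanup is cheap and synchronous, is to put cleanup in `finally` and let the `OperationCanceledException` propagate rather than catching and swallowing it. `BatchWorker` and its properties are hypothetical and exist only to make the behaviour observable.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class BatchWorker
{
    public int Processed { get; private set; }
    public bool CleanedUp { get; private set; }

    public async Task RunAsync(IReadOnlyList<int> items, CancellationToken ct)
    {
        try
        {
            foreach (var item in items)
            {
                // Check before each unit of work so cancellation is observed between items.
                ct.ThrowIfCancellationRequested();

                // Stand-in for real async I/O; the token is passed through.
                await Task.Delay(TimeSpan.FromMilliseconds(1), ct);
                Processed++;
            }
        }
        finally
        {
            // Runs whether the loop finished or was cancelled; the exception is not swallowed.
            CleanedUp = true;
        }
    }
}
```

Letting the exception escape leaves it to the caller to tell a cancelled handler apart from one that returned a result.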

Exit Criteria

Before marking this sprint DONE:

  1. CANCEL frames sent on client disconnect
  2. CANCEL frames sent on timeout
  3. SDK tracks inflight requests with CTS
  4. SDK cancels handlers on CANCEL frame
  5. Connection close cancels all inflight
  6. Late responses are ignored/logged
  7. Integration tests verify cancellation flow

Execution Log

| Date (UTC) | Update | Owner |
|------------|--------|-------|

Decisions & Risks

  • Cancellation is cooperative; handlers must honor the token
  • CTS disposal happens on completion to avoid leaks
  • Late response cleanup: entries expire after 60 seconds
  • Shutdown CANCEL is best-effort (connections may close first)