git.stella-ops.org/docs/workflow/engine/04-persistence-signaling-and-scheduling.md
master f5b5f24d95 Add StellaOps.Workflow engine: 14 libraries, WebService, 8 test projects
Extract product-agnostic workflow engine from Ablera.Serdica.Workflow into
standalone StellaOps.Workflow.* libraries targeting net10.0.

Libraries (14):
- Contracts, Abstractions (compiler, decompiler, expression runtime)
- Engine (execution, signaling, scheduling, projections, hosted services)
- ElkSharp (generic graph layout algorithm)
- Renderer.ElkSharp, Renderer.ElkJs, Renderer.Msagl, Renderer.Svg
- Signaling.Redis, Signaling.OracleAq
- DataStore.MongoDB, DataStore.PostgreSQL, DataStore.Oracle

WebService: ASP.NET Core Minimal API with 22 endpoints

Tests (8 projects, 109 tests pass):
- Engine.Tests (105 pass), WebService.Tests (4 E2E pass)
- Renderer.Tests, DataStore.MongoDB/Oracle/PostgreSQL.Tests
- Signaling.Redis.Tests, IntegrationTests.Shared

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 19:14:44 +02:00


# 04. Persistence, Signaling, And Scheduling
## 1. Persistence Strategy
Oracle is the single durable backend for v1.
It stores:
- workflow instance projections
- workflow task projections
- workflow task events
- workflow runtime snapshots
- hosted job locks
- AQ queues for immediate and delayed signals
This keeps correctness inside one transactional platform.
## 2. Existing Tables To Preserve
The current workflow schema already has the right base tables:
- `WF_INSTANCES`
- `WF_TASKS`
- `WF_TASK_EVENTS`
- `WF_RUNTIME_STATES`
- `WF_HOST_LOCKS`
The current workflow database model is the mapping baseline for these tables.
### 2.1 WF_INSTANCES
Purpose:
- product-facing workflow instance summary
- instance business reference
- instance status
- product-facing state snapshot
### 2.2 WF_TASKS
Purpose:
- active and historical human task projections
- task routing
- assignment
- task payload
- effective roles
### 2.3 WF_TASK_EVENTS
Purpose:
- append-only task event history
- created, assigned, released, completed, reassigned events
### 2.4 WF_RUNTIME_STATES
Purpose:
- engine-owned durable runtime snapshot
This table becomes the main source of truth for engine resume.
## 3. Proposed Runtime State Extensions
`WF_RUNTIME_STATES` should be extended to support canonical engine execution directly.
Recommended new columns:
- `STATE_VERSION`
Numeric optimistic concurrency version.
- `SNAPSHOT_SCHEMA_VERSION`
Snapshot format version for engine evolution.
- `WAITING_KIND`
Current wait type.
- `WAITING_TOKEN`
Stale-signal guard token.
- `WAITING_UNTIL_UTC`
Next due time when waiting on time-based resume.
- `ACTIVE_TASK_ID`
Current active task projection id when applicable.
- `RESUME_POINTER_JSON`
Serialized canonical resume pointer.
- `LAST_SIGNAL_ID`
Last successfully processed signal id.
- `LAST_ERROR_CODE`
Last engine error code.
- `LAST_ERROR_JSON`
Structured last error details.
- `LAST_EXECUTED_BY`
Node id that last committed execution.
- `LAST_EXECUTED_ON_UTC`
Last successful engine commit timestamp.
The existing fields remain useful:
- workflow identity
- business reference
- runtime provider
- runtime instance id
- runtime status
- state json
- lifecycle timestamps
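As an illustration, the extended row could be mirrored in application code roughly as follows. This is a sketch: the record name and CLR types are assumptions, and only a few of the existing columns are shown.

```csharp
// Illustrative mapping of the proposed WF_RUNTIME_STATES columns.
// The record itself is hypothetical; column comments tie each property
// to the schema proposal above.
public sealed record WorkflowRuntimeStateRow
{
    // Existing identity fields (abbreviated).
    public required string WorkflowInstanceId { get; init; }
    public required string RuntimeProvider { get; init; }
    public required string StateJson { get; init; }          // STATE_JSON

    // Proposed extensions.
    public long StateVersion { get; init; }                  // STATE_VERSION
    public int SnapshotSchemaVersion { get; init; }          // SNAPSHOT_SCHEMA_VERSION
    public string? WaitingKind { get; init; }                // WAITING_KIND
    public string? WaitingToken { get; init; }               // WAITING_TOKEN
    public DateTime? WaitingUntilUtc { get; init; }          // WAITING_UNTIL_UTC
    public string? ActiveTaskId { get; init; }               // ACTIVE_TASK_ID
    public string? ResumePointerJson { get; init; }          // RESUME_POINTER_JSON
    public string? LastSignalId { get; init; }               // LAST_SIGNAL_ID
    public string? LastErrorCode { get; init; }              // LAST_ERROR_CODE
    public string? LastErrorJson { get; init; }              // LAST_ERROR_JSON
    public string? LastExecutedBy { get; init; }             // LAST_EXECUTED_BY
    public DateTime? LastExecutedOnUtc { get; init; }        // LAST_EXECUTED_ON_UTC
}
```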
## 4. Snapshot Structure
`STATE_JSON` should hold a provider snapshot object for `SerdicaEngine`.
Recommended shape:
```json
{
  "engineSchemaVersion": 1,
  "workflowState": {},
  "businessReference": {
    "key": "1200345",
    "parts": {}
  },
  "status": "Open",
  "waiting": {
    "kind": "TaskCompletion",
    "token": "wait-123",
    "untilUtc": null
  },
  "resume": {
    "entryPointKind": "TaskOnComplete",
    "taskName": "ApproveApplication",
    "branchPath": [],
    "nextStepIndex": 3
  },
  "subWorkflowFrames": [],
  "continuationBuffer": []
}
```
## 5. Oracle AQ Strategy
### 5.1 Why AQ
AQ is the default signaling backend because it gives:
- durable storage
- blocking dequeue
- delayed delivery
- database-managed recovery
- transactional semantics close to the runtime state store
### 5.2 Queue Topology
Use explicit, purpose-named queues for clarity and operability.
Recommended objects:
- `WF_SIGNAL_QTAB`
- `WF_SIGNAL_Q`
- `WF_SCHEDULE_QTAB`
- `WF_SCHEDULE_Q`
- `WF_DLQ_QTAB`
- `WF_DLQ_Q`
Rationale:
- immediate signals and delayed signals are operationally different
- dead-letter isolation matters for supportability
- queue separation makes metrics and troubleshooting simpler
### 5.3 Payload Format
Use a compact JSON envelope serialized to UTF-8 bytes in a `RAW` payload.
Reasons:
- simple from .NET
- explicit schema ownership in application code
- small message size
- backend abstraction remains possible later
Do not put full workflow snapshots into AQ messages.
## 6. Signal Envelope
Recommended envelope:
```csharp
using System.Text.Json;

public sealed record WorkflowSignalEnvelope
{
    public required string SignalId { get; init; }
    public required string WorkflowInstanceId { get; init; }
    public required string RuntimeProvider { get; init; }
    public required string SignalType { get; init; }
    public required long ExpectedVersion { get; init; }
    public string? WaitingToken { get; init; }
    public DateTime OccurredAtUtc { get; init; }
    public DateTime? DueAtUtc { get; init; }
    public Dictionary<string, JsonElement> Payload { get; init; } = [];
}
```
Signal types:
- `TaskCompleted`
- `TimerDue`
- `RetryDue`
- `ExternalSignal`
- `SubWorkflowCompleted`
- `InternalContinue`
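Under the `RAW` payload rule from section 5.3, the envelope can be encoded to and decoded from UTF-8 bytes roughly as follows. This is a sketch: the codec class name is an assumption, and `System.Text.Json` is the assumed serializer (consistent with the `JsonElement` payload above).

```csharp
using System.Text.Json;

// Sketch: encode/decode the signal envelope as the UTF-8 bytes carried
// in the AQ RAW payload. No workflow snapshot is included (section 5.3).
public static class SignalPayloadCodec
{
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web);

    public static byte[] Encode(WorkflowSignalEnvelope envelope) =>
        JsonSerializer.SerializeToUtf8Bytes(envelope, Options);

    public static WorkflowSignalEnvelope Decode(byte[] raw) =>
        JsonSerializer.Deserialize<WorkflowSignalEnvelope>(raw, Options)
            ?? throw new InvalidOperationException("Empty signal payload.");
}
```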
## 7. Transaction Model
### 7.1 Start Transaction
Start must durably commit:
- instance projection
- runtime snapshot
- task rows if any
- task events if any
- scheduled or immediate AQ messages if any
### 7.2 Completion Transaction
Task completion must durably commit:
- task completion event
- updated instance projection
- new task rows if any
- updated runtime snapshot
- any resulting AQ signals or schedules
### 7.3 Signal Resume Transaction
Signal resume must durably commit:
- AQ dequeue
- updated runtime snapshot
- resulting projection changes
- any next AQ signals
The intended operational model is:
- dequeue with transactional semantics
- update state and projections
- commit once
If commit fails, the signal must become visible again.
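A minimal sketch of that model, assuming hypothetical `ISignalQueue`, `IRuntimeStateStore`, and `IWorkflowEngine` abstractions. With Oracle AQ, the dequeue and the state update would share one database transaction, so a rollback automatically re-exposes the message.

```csharp
// Sketch of the single-commit resume transaction. All three dependencies
// are assumed abstractions, not engine APIs.
public sealed class SignalResumeHandler(
    ISignalQueue queue,
    IRuntimeStateStore store,
    IWorkflowEngine engine)
{
    public async Task HandleOneAsync(CancellationToken ct)
    {
        await using var tx = await store.BeginTransactionAsync(ct);

        // 1. Dequeue inside the transaction: rollback re-exposes the message.
        var envelope = await queue.DequeueAsync(tx, ct);
        if (envelope is null) return;

        // 2. Load the snapshot and resume the engine.
        var snapshot = await store.LoadAsync(tx, envelope.WorkflowInstanceId, ct);
        var result = await engine.ResumeAsync(snapshot, envelope, ct);

        // 3. Persist snapshot, projections, and follow-up signals,
        //    then commit exactly once.
        await store.SaveAsync(tx, result, ct);
        await tx.CommitAsync(ct);
    }
}
```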
## 8. Blocking Receive Model
No polling loop should be used for work discovery.
Each node should run a signal pump that:
- opens one or more blocking AQ dequeue consumers
- waits on AQ rather than sleeping and scanning
- dispatches envelopes to bounded execution workers
Suggested parameters:
- dequeue wait seconds
- max concurrent handlers
- max poison retries
- dead-letter policy
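The pump could be structured roughly like this, using a bounded channel to connect blocking consumers to execution workers. `ISignalConsumer` is an assumed abstraction over the blocking AQ dequeue, not an AQ API.

```csharp
using System.Linq;
using System.Threading.Channels;

// Sketch of a per-node signal pump: a blocking AQ consumer feeds a bounded
// channel, and a fixed pool of workers executes handlers.
public sealed class SignalPump(ISignalConsumer consumer, int maxConcurrentHandlers)
{
    private readonly Channel<WorkflowSignalEnvelope> _buffer =
        Channel.CreateBounded<WorkflowSignalEnvelope>(maxConcurrentHandlers * 2);

    public async Task RunAsync(
        Func<WorkflowSignalEnvelope, CancellationToken, Task> handle,
        CancellationToken ct)
    {
        // Reader task: AQ blocks on dequeue instead of sleeping and scanning.
        var reader = Task.Run(async () =>
        {
            while (!ct.IsCancellationRequested)
            {
                var envelope = await consumer.DequeueAsync(dequeueWaitSeconds: 30, ct);
                if (envelope is not null)
                    await _buffer.Writer.WriteAsync(envelope, ct);
            }
        }, ct);

        // Bounded execution workers.
        var workers = Enumerable.Range(0, maxConcurrentHandlers)
            .Select(_ => Task.Run(async () =>
            {
                await foreach (var envelope in _buffer.Reader.ReadAllAsync(ct))
                    await handle(envelope, ct);
            }, ct));

        await Task.WhenAll(workers.Append(reader));
    }
}
```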
## 9. Scheduling Model
### 9.1 Scheduling Requirement
The engine must support timers without a periodic sweep job.
### 9.2 Scheduling Approach
When a workflow enters a timed wait:
1. runtime snapshot is updated with:
- waiting kind
- waiting token
- due time
2. a delayed AQ message is enqueued
3. transaction commits
When the delayed message becomes available:
1. a signal consumer dequeues it
2. current snapshot is loaded
3. waiting token is checked
4. if token matches, resume
5. if token does not match, ignore as stale
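The token check in steps 3 to 5 can be sketched as follows; the `WaitingState` record is a minimal stand-in for the `waiting` object in the section 4 snapshot, and the guard class is illustrative.

```csharp
// Minimal stand-in for the snapshot's waiting object (section 4).
public sealed record WaitingState(string Kind, string Token, DateTime? UntilUtc);

// Sketch of the stale-token guard applied when a delayed message arrives.
public static class TimerResumeGuard
{
    public static bool ShouldResume(WaitingState? waiting, string signalType, string? waitingToken)
    {
        // A reschedule issues a fresh token (section 9.3), so an old delayed
        // message fails this comparison and is ignored as stale.
        return signalType == "TimerDue"
            && waiting is not null
            && waiting.Token == waitingToken;
    }
}
```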
### 9.3 Logical Cancel Instead Of Physical Delete
The scheduler should treat cancel and reschedule logically.
Do not make correctness depend on deleting a queued timer message.
Instead:
- generate a new waiting token when schedule changes
- old delayed message becomes stale automatically
This is simpler and more reliable in distributed execution.
## 10. Multi-Node Concurrency Model
The engine must assume multiple nodes can receive signals for the same workflow instance.
Correctness model:
- signal delivery is at-least-once
- snapshot updates use optimistic version checks
- waiting token guards stale work
- duplicate resumes are safe
Recommended write model:
- read snapshot version
- execute
- update `WF_RUNTIME_STATES` where `STATE_VERSION = expected`
- if update count is zero, abandon and retry or ignore as stale
This avoids permanent instance ownership.
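The versioned update can be sketched with plain ADO.NET (`DbCommand` is the generic abstraction; with Oracle this would be an `OracleCommand`). The key column name `WORKFLOW_INSTANCE_ID` is an assumption about the existing schema.

```csharp
using System.Data.Common;

public static class SnapshotWriter
{
    // Sketch: commit the new snapshot only if no other node advanced the
    // version first. Zero rows updated means we lost the race.
    public static async Task<bool> TryCommitSnapshotAsync(
        DbConnection connection, string instanceId, long expectedVersion, string newStateJson)
    {
        await using var command = connection.CreateCommand();
        command.CommandText =
            """
            UPDATE WF_RUNTIME_STATES
               SET STATE_JSON = :stateJson,
                   STATE_VERSION = STATE_VERSION + 1,
                   LAST_EXECUTED_ON_UTC = SYS_EXTRACT_UTC(SYSTIMESTAMP)
             WHERE WORKFLOW_INSTANCE_ID = :instanceId
               AND STATE_VERSION = :expectedVersion
            """;
        AddParameter(command, "stateJson", newStateJson);
        AddParameter(command, "instanceId", instanceId);
        AddParameter(command, "expectedVersion", expectedVersion);

        // Caller decides: retry with a fresh read, or ignore as stale.
        return await command.ExecuteNonQueryAsync() == 1;
    }

    private static void AddParameter(DbCommand command, string name, object value)
    {
        var parameter = command.CreateParameter();
        parameter.ParameterName = name;
        parameter.Value = value;
        command.Parameters.Add(parameter);
    }
}
```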
## 11. Restart And Recovery Semantics
### 11.1 One Node Down
Other nodes continue consuming AQ and processing instances.
### 11.2 All Nodes Down, Database Up
Signals remain durable in AQ.
When any node comes back:
- AQ consumers reconnect
- pending immediate and delayed signals are processed
- workflow resumes continue
### 11.3 Database Down
No execution can continue while Oracle is unavailable.
Once Oracle returns:
- AQ queues recover with the database
- runtime snapshots recover with the database
- resumed node consumers continue from durable state
### 11.4 All Nodes And Database Down
After Oracle returns and at least one application node starts:
- AQ messages are still present
- runtime state is still present
- due delayed messages can be consumed
- execution resumes from durable state
This is one of the main reasons Oracle AQ is preferred over a separate volatile wake-up layer.
## 12. Redis Position In V1
Redis is optional and not part of the correctness path.
It may be used later for:
- local cache
- non-authoritative wake hints
- metrics fanout
It should not be required for:
- durable signal delivery
- timer delivery
- restart recovery
## 13. Dead-Letter Strategy
Messages should move to DLQ when:
- deserialization fails
- definition is missing
- snapshot is irreparably inconsistent
- retry count exceeds threshold
DLQ entry should preserve:
- original envelope
- failure reason
- last node id
- failure timestamp
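A dead-letter entry preserving those fields might look like this; the record name and shape are illustrative, not part of the schema.

```csharp
// Sketch of a DLQ entry: the original envelope plus failure context.
public sealed record DeadLetterEntry
{
    public required WorkflowSignalEnvelope OriginalEnvelope { get; init; }
    public required string FailureReason { get; init; }
    public required string LastNodeId { get; init; }
    public required DateTime FailedAtUtc { get; init; }
    public int RetryCount { get; init; }
}
```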
## 14. Retention
Retention remains a service responsibility.
It should continue to clean:
- stale instances
- stale tasks
- completed data past purge window
- runtime states past purge window
AQ retention policy should be aligned with application retention and supportability needs, but queue cleanup must not delete active work.