03. Canonical Execution Model

1. Why The Engine Executes Canonical Definitions

The workflow corpus is now fully declarative and canonicalizable.

That changes the best runtime strategy:

  • authored C# remains the source of truth
  • canonical definition becomes the runtime execution contract
  • the engine interprets canonical definitions directly

This gives the platform:

  • deterministic runtime behavior
  • shared semantics between export/import and execution
  • less runtime coupling to workflow-specific CLR delegates
  • a clean separation between authoring and execution

2. Definition Lifecycle

2.1 Authoring

Workflows are authored in C# through the declarative DSL.

2.2 Normalization

At service startup, each workflow registration is normalized into:

  1. workflow registration metadata
  2. canonical workflow definition
  3. required module set
  4. function usage metadata

2.3 Validation

The runtime should validate canonical definitions before accepting them for execution.

Recommended startup modes:

  • Strict: startup fails if a definition is invalid.
  • Warn: startup succeeds, but invalid definitions are marked unavailable.
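
The two startup modes can be illustrated with a minimal Python sketch. The `Definition` type, the `validate` rules, and the function names are hypothetical placeholders chosen for this example; they are not the engine's actual API.

```python
from dataclasses import dataclass
from enum import Enum


class StartupMode(Enum):
    STRICT = "strict"
    WARN = "warn"


@dataclass(frozen=True)
class Definition:
    name: str
    version: str
    steps: tuple  # canonical step declarations; contents are opaque in this sketch


def validate(definition: Definition) -> list[str]:
    """Return a list of problems; an empty list means the definition is valid."""
    problems = []
    if not definition.name:
        problems.append("missing workflow name")
    if not definition.steps:
        problems.append("definition has no steps")
    return problems


def accept_definitions(definitions, mode: StartupMode):
    """Split definitions into (available, unavailable) according to the startup mode."""
    available, unavailable = [], []
    for definition in definitions:
        problems = validate(definition)
        if not problems:
            available.append(definition)
        elif mode is StartupMode.STRICT:
            # Strict: the first invalid definition aborts startup.
            raise ValueError(f"{definition.name}@{definition.version}: {problems}")
        else:
            # Warn: startup continues, the definition is kept but marked unavailable.
            unavailable.append((definition, problems))
    return available, unavailable
```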

2.4 Runtime Cache

The engine should cache canonical runtime definitions in memory by:

  • workflow name
  • workflow version

This cache is immutable after startup in v1.
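
A cache with these properties might look like the following Python sketch. It assumes each runtime definition exposes a name and a version (modeled here as dictionary keys), and it uses a read-only mapping view to stand in for "immutable after startup".

```python
from types import MappingProxyType


def build_definition_cache(definitions):
    """Index runtime definitions by (workflow name, workflow version)."""
    cache = {}
    for definition in definitions:
        key = (definition["name"], definition["version"])
        if key in cache:
            raise ValueError(f"duplicate definition: {key}")
        cache[key] = definition
    # MappingProxyType exposes a read-only view, so callers cannot add or replace entries.
    return MappingProxyType(cache)
```

Lookups then use the pair as the key, for example `cache[("Quote", "1.0")]`, and any attempt to assign into the returned view raises `TypeError`.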

3. Canonical Runtime Definition Shape

The runtime definition should be treated as a compiled, execution-ready representation of the canonical contracts, not a raw JSON document.

The runtime model should contain:

  • definition identity
  • display metadata
  • required modules
  • step graph
  • task declarations
  • expression trees
  • transport declarations
  • subworkflow declarations
  • continue-with declarations

4. Execution Context Model

The interpreter should run every step against a single canonical execution context.

Recommended execution context fields:

  • WorkflowName
  • WorkflowVersion
  • WorkflowInstanceId
  • BusinessReference
  • State
  • StartPayload
  • CompletionPayload
  • CurrentTask
  • CurrentSignal
  • FunctionRuntime
  • TransportDispatcher
  • RuntimeMetadata

RuntimeMetadata should hold:

  • node id
  • current signal id
  • snapshot version
  • waiting token
  • execution started at
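
As a rough illustration, the recommended fields could be grouped as below. This is a Python sketch only: the field types are assumptions, and `function_runtime` and `transport_dispatcher` are left opaque because their interfaces are not defined in this section.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional


@dataclass
class RuntimeMetadata:
    node_id: str
    current_signal_id: Optional[str]
    snapshot_version: int
    waiting_token: Optional[str]
    execution_started_at: datetime


@dataclass
class ExecutionContext:
    workflow_name: str
    workflow_version: str
    workflow_instance_id: str
    business_reference: Optional[str]
    runtime_metadata: RuntimeMetadata
    state: dict[str, Any] = field(default_factory=dict)
    start_payload: Optional[dict] = None
    completion_payload: Optional[dict] = None
    current_task: Optional[str] = None
    current_signal: Optional[dict] = None
    function_runtime: Any = None       # expression function host (opaque here)
    transport_dispatcher: Any = None   # transport adapter router (opaque here)
```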

5. Core Runtime State Model

The runtime must distinguish between:

  • business state
  • engine state

5.1 Business State

Business state is what the workflow author reasons about.

Examples:

  • srPolicyId
  • policySubstatus
  • customer lookup state
  • payload shaping outputs
  • subworkflow results

5.2 Engine State

Engine state is what the runtime needs to resume correctly.

Examples:

  • current workflow status
  • current wait type
  • current wait token
  • active task identity
  • resume pointer
  • subworkflow frame stack
  • outstanding timer descriptors
  • last processed signal id

Business state must remain visible in runtime inspection. Engine state must remain safe and deterministic for resume.
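
One way to keep the two kinds of state apart is to store them as separate parts of the snapshot. The Python sketch below is illustrative: the type names, the status strings, and the choice to summarize engine state in the inspection view are assumptions, not part of the engine contract.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class EngineState:
    status: str = "Running"
    wait_type: Optional[str] = None
    waiting_token: Optional[str] = None
    active_task: Optional[str] = None
    resume_pointer: Optional[dict] = None
    subworkflow_frames: list = field(default_factory=list)
    timers: list = field(default_factory=list)
    last_processed_signal_id: Optional[str] = None


@dataclass
class Snapshot:
    version: int
    business_state: dict[str, Any] = field(default_factory=dict)
    engine_state: EngineState = field(default_factory=EngineState)


def inspect(snapshot: Snapshot) -> dict:
    """Inspection view: business state in full, engine state as a short summary."""
    return {
        "state": dict(snapshot.business_state),  # copy, so callers cannot mutate the snapshot
        "status": snapshot.engine_state.status,
        "waitType": snapshot.engine_state.wait_type,
    }
```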

6. Run-To-Wait Execution Model

The engine uses a run-to-wait interpreter.

This means:

  1. load snapshot
  2. execute sequentially
  3. stop when a durable wait boundary is reached
  4. persist resulting snapshot
  5. release instance

Wait boundaries are:

  • human task activation
  • scheduled timer
  • external signal wait
  • child workflow wait
  • terminal completion

This model is essential because it provides:

  • multi-instance safety
  • restart recovery
  • freedom from sticky instance ownership
  • correctness that never depends on in-memory state
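
The five-step loop can be sketched in Python as follows. This is a simplified model under stated assumptions: steps are plain functions that return a wait type or `None`, the snapshot tracks only a flat step index, and `InMemoryStore` stands in for the durable snapshot store.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Snapshot:
    version: int = 0
    next_step: int = 0
    status: str = "Running"
    wait_type: Optional[str] = None
    state: dict = field(default_factory=dict)


class InMemoryStore:
    """Stand-in for the durable snapshot store; copies on load and save."""

    def __init__(self):
        self._rows = {}

    def load(self, instance_id):
        return copy.deepcopy(self._rows.get(instance_id, Snapshot()))

    def save(self, instance_id, snapshot):
        self._rows[instance_id] = copy.deepcopy(snapshot)


def run_to_wait(store, instance_id, steps):
    """Execute steps sequentially until a wait boundary, then persist once."""
    snapshot = store.load(instance_id)                # 1. load snapshot
    snapshot.status, snapshot.wait_type = "Running", None
    while snapshot.next_step < len(steps):            # 2. execute sequentially
        step = steps[snapshot.next_step]
        snapshot.next_step += 1
        wait_type = step(snapshot.state)              # a step returns a wait type or None
        if wait_type is not None:                     # 3. stop at a durable wait boundary
            snapshot.status, snapshot.wait_type = "Waiting", wait_type
            break
    else:
        snapshot.status = "Completed"                 # terminal completion
    snapshot.version += 1
    store.save(instance_id, snapshot)                 # 4. persist resulting snapshot
    return snapshot                                   # 5. nothing is retained by the worker
```

Because every call starts from the stored snapshot and ends with a save, any worker can pick up the next slice of the same instance.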

7. Step Semantics

7.1 State Assignment

State assignment is immediate and local to the current execution transaction.

The engine:

  • evaluates the assignment expression
  • writes to the business state dictionary
  • keeps changes in memory until the next durable checkpoint

7.2 Business Reference Assignment

Business reference assignment updates the canonical business reference attached to:

  • the runtime snapshot
  • new tasks
  • instance projection updates

Business reference changes must be applied transactionally with other execution results.

7.3 Human Task Activation

A human task activation step always ends the current execution slice at a wait boundary.

The interpreter does not continue past it in the same execution.

The result of task activation is:

  • one active task projection
  • updated instance status
  • updated runtime snapshot
  • optional runtime metadata for the active task

7.4 Transport Call

Transport calls are synchronous from the perspective of a single execution slice.

The engine:

  • evaluates payload expressions
  • dispatches through the correct transport adapter
  • captures result payload
  • stores result under the result key when present
  • chooses the success, failure, or timeout branch

No engine-specific callback registration should be required for normal synchronous transport calls.
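
The transport step can be sketched in Python as below. The step layout (`transport`, `payload`, `result_key`) and the use of plain callables as adapters are assumptions made for this example, as is mapping `TimeoutError` to the timeout branch and every other exception to the failure branch.

```python
def execute_transport_call(step, state, adapters):
    """Run one transport step and return the name of the branch to follow."""
    payload = step["payload"](state)          # evaluate the payload expression
    adapter = adapters[step["transport"]]     # pick the transport adapter
    try:
        result = adapter(payload)             # synchronous within this execution slice
    except TimeoutError:
        return "timeout"
    except Exception:
        return "failure"
    if step.get("result_key"):                # store the result only when a key is declared
        state[step["result_key"]] = result
    return "success"
```

The caller receives only a branch name, so no callback has to be registered with the adapter.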

7.5 Conditional Branch

Conditions evaluate against the current execution context.

Only one branch is executed.

The branch path must be reproducible in the resume pointer model.

7.6 Repeat

Repeat executes logically as:

  • evaluate collection or repeat source
  • for each iteration:
    • bind iteration context
    • execute nested sequence

If an iteration hits a wait boundary, the engine snapshot must preserve:

  • repeat step id
  • iteration index
  • remaining resume location inside the iteration body
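
A Python sketch of a resumable repeat follows. It assumes body steps are functions that return `"wait"` at a wait boundary, that the iteration item is bound under a hypothetical `item` key, and that the repeat source evaluates to the same collection on resume.

```python
def run_repeat(repeat_step_id, items, body, state, resume=None):
    """Run `body` for each item; return a resume record if a step waits, else None.

    `resume` is the record returned by an earlier call that stopped at a wait.
    """
    start_iteration = resume["iteration"] if resume else 0
    start_step = resume["step"] if resume else 0
    for i in range(start_iteration, len(items)):
        state["item"] = items[i]                          # bind iteration context
        first = start_step if i == start_iteration else 0
        for j in range(first, len(body)):
            if body[j](state) == "wait":
                # Resume after the waiting step, inside the same iteration.
                return {"repeat_step_id": repeat_step_id, "iteration": i, "step": j + 1}
    return None
```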

7.7 Subworkflow Invocation

Subworkflow invocation is a wait boundary unless the child completes inline before producing a wait.

Parent snapshot must record:

  • child workflow identity
  • child workflow version
  • parent business reference
  • parent resume pointer
  • target result key
  • parent workflow state needed for resume

7.8 Continue-With

Continue-with creates a new workflow start request as an engine side effect.

It is not a wait boundary for the current instance unless the workflow explicitly models it that way.

8. Resume Model

8.1 Resume Pointer

The engine must persist a deterministic resume pointer.

It should identify:

  • entry point kind
  • task name if resuming from task completion
  • branch path
  • next step index
  • repeat iteration where applicable

The existing declarative resume model is the right conceptual baseline, but the engine should persist it inside the canonical runtime snapshot rather than inside a CLR-only execution flow.
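
As an illustration, a resume pointer with these fields could be serialized into the snapshot as JSON. The Python sketch below uses hypothetical field names and example values; the actual persisted format is not specified here.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ResumePointer:
    entry_point_kind: str                    # e.g. "TaskCompleted" (illustrative value)
    task_name: Optional[str] = None
    branch_path: tuple = ()
    next_step_index: int = 0
    repeat_iteration: Optional[int] = None


def serialize(pointer: ResumePointer) -> str:
    # sort_keys keeps the output stable for identical pointers.
    return json.dumps(asdict(pointer), sort_keys=True)


def deserialize(text: str) -> ResumePointer:
    data = json.loads(text)
    data["branch_path"] = tuple(data["branch_path"])  # JSON has no tuple type
    return ResumePointer(**data)
```

A round trip through `serialize` and `deserialize` yields an equal pointer, which is the property a resume serializer needs.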

8.2 Waiting Token

Every durable wait must have a waiting token.

The waiting token is how the engine prevents stale resumes.

When a signal arrives and its waiting token does not match the one in the snapshot, the signal is stale and must be ignored safely.

This is the primary guard for:

  • canceled timers
  • duplicate wake-ups
  • late child completions
  • redelivered signals

8.3 Version

Every successful execution commit must increment snapshot version.

Signals may carry the snapshot version that was current when the wait was created.

This allows the engine to detect stale work before any mutation.
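
The waiting-token check from 8.2 and the version check from 8.3 can be combined into a single guard that runs before any mutation. The Python sketch below uses simplified, hypothetical `Signal` and `Snapshot` shapes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signal:
    instance_id: str
    waiting_token: str
    expected_version: Optional[int] = None   # optional, as described in 8.3


@dataclass
class Snapshot:
    version: int
    waiting_token: Optional[str]


def is_stale(signal: Signal, snapshot: Snapshot) -> bool:
    """Return True when the signal must be ignored without mutating anything."""
    if snapshot.waiting_token is None or signal.waiting_token != snapshot.waiting_token:
        return True
    if signal.expected_version is not None and signal.expected_version != snapshot.version:
        return True
    return False


def commit(snapshot: Snapshot, new_waiting_token: Optional[str]) -> Snapshot:
    """Every successful commit increments the version and replaces the token."""
    return Snapshot(version=snapshot.version + 1, waiting_token=new_waiting_token)
```

A redelivered signal still carries the old token, so after the first delivery commits, `is_stale` returns `True` for the duplicate.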

9. Human Task Model

The task model remains projection-first.

The runtime does not wait on an in-memory task object.

Instead:

  • task activation writes a task projection row
  • runtime snapshot enters WaitingForTaskCompletion
  • task completion API provides the wake-up event

Task completion is therefore an external signal into the engine.
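
The projection-first flow can be sketched in Python as follows. The row layout, the signal shape, and the decision to store the waiting token on the task row are assumptions for this example; only the `WaitingForTaskCompletion` status name comes from the text above.

```python
import uuid


def activate_task(snapshot, task_rows, task_name):
    """Write a task projection row and put the snapshot into its waiting status."""
    token = uuid.uuid4().hex
    task_rows.append({
        "instance": snapshot["instance_id"],
        "task": task_name,
        "status": "Active",
        "waiting_token": token,
    })
    snapshot["status"] = "WaitingForTaskCompletion"
    snapshot["waiting_token"] = token
    return token


def complete_task(task_rows, instance_id, task_name, payload):
    """Task completion API: close the row and return the wake-up signal.

    It does not execute workflow steps itself; the engine consumes the signal.
    """
    for row in task_rows:
        if (row["instance"], row["task"], row["status"]) == (instance_id, task_name, "Active"):
            row["status"] = "Completed"
            return {
                "kind": "TaskCompleted",
                "instance": instance_id,
                "waiting_token": row["waiting_token"],
                "payload": payload,
            }
    return None  # no active task: nothing to wake up
```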

10. Error Model

The interpreter should classify errors into:

  • definition errors
  • expression evaluation errors
  • transport errors
  • timeout errors
  • authorization errors
  • engine consistency errors

Definition errors are startup or validation failures. Execution errors are runtime failures that may:

  • route into a failure branch
  • schedule a retry
  • fail the workflow
  • move the instance to a recoverable error state

11. Retry Model

Retries should be modeled explicitly as scheduled signals.

The engine should not sleep inside a worker.

A retry should:

  1. persist the failure context
  2. generate a new waiting token
  3. enqueue a delayed resume signal
  4. commit
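
The four retry steps might be sketched in Python like this. The snapshot is modeled as a dictionary, a plain list stands in for the delayed-signal queue, the `RetryDue` signal kind is a hypothetical name, and the version bump stands in for the real transactional commit.

```python
import uuid
from datetime import datetime, timedelta, timezone


def schedule_retry(snapshot, failure, delay_seconds, signal_queue, now):
    """Turn a failure into a delayed resume signal instead of sleeping in the worker."""
    snapshot["failure_context"] = failure              # 1. persist the failure context
    snapshot["waiting_token"] = uuid.uuid4().hex       # 2. generate a new waiting token
    signal_queue.append({                              # 3. enqueue a delayed resume signal
        "kind": "RetryDue",
        "waiting_token": snapshot["waiting_token"],
        "due_at": now + timedelta(seconds=delay_seconds),
    })
    snapshot["version"] += 1                           # 4. commit (simplified)
    return snapshot


# Usage: the current time is passed in rather than read inside the function.
queue = []
snap = {"version": 3, "waiting_token": "old"}
schedule_retry(snap, {"error": "transport timeout"}, 30, queue,
               now=datetime(2026, 1, 1, tzinfo=timezone.utc))
```

Because the retry carries a fresh waiting token, any signal still addressed to the previous wait is rejected as stale.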

12. Completion Model

A workflow completes when the interpreter reaches terminal completion with no outstanding waits.

Completion result must:

  • mark instance projection completed
  • mark runtime state completed
  • clear stale timeout metadata
  • apply retention timing

13. Determinism Requirements

The runtime must assume:

  • expressions are deterministic given the execution context
  • transport calls are side effects and must be treated explicitly
  • no hidden CLR delegate behavior remains in workflow definitions

The runtime should not rely on:

  • non-deterministic local time calls inside step execution
  • in-memory mutable workflow objects
  • ambient state outside the canonical execution context
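
For example, a time-dependent expression can read a timestamp captured in the execution context instead of calling the system clock. In this Python sketch, `execution_started_at` mirrors the RuntimeMetadata field from section 4, while `graceDays` is a hypothetical business state key.

```python
from datetime import datetime, timedelta, timezone


def due_date(context):
    """Deterministic: derived from the context, never from the wall clock."""
    return context["execution_started_at"] + timedelta(days=context["state"]["graceDays"])


context = {
    "execution_started_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
    "state": {"graceDays": 14},
}
# Re-evaluating against the same context always gives the same result,
# which would not hold if the function called datetime.now() internally.
assert due_date(context) == due_date(context)
```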

14. Resulting Implementation Shape

The engine kernel should be implemented as:

  • definition normalizer
  • canonical interpreter
  • transport dispatcher
  • execution coordinator
  • resume serializer/deserializer

This produces a runtime that is small, explicit, and aligned with the already-completed full-declaration effort.