# 03. Canonical Execution Model
## 1. Why The Engine Executes Canonical Definitions
The workflow corpus is now fully declarative and canonicalizable.
That changes the best runtime strategy:
- authored C# remains the source of truth
- canonical definition becomes the runtime execution contract
- the engine interprets canonical definitions directly
This gives the platform:
- deterministic runtime behavior
- shared semantics between export/import and execution
- less runtime coupling to workflow-specific CLR delegates
- a clean separation between authoring and execution
## 2. Definition Lifecycle
### 2.1 Authoring
Workflows are authored in C# through the declarative DSL.
### 2.2 Normalization
At service startup, each workflow registration is normalized into:
1. workflow registration metadata
2. canonical workflow definition
3. required module set
4. function usage metadata
### 2.3 Validation
The runtime should validate canonical definitions before accepting them for execution.
Recommended startup modes:
- `Strict`: startup fails if a definition is invalid.
- `Warn`: startup succeeds, but invalid definitions are marked unavailable.
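The two startup modes can be sketched as follows. This is a minimal illustration in Python rather than the engine's C#; `StartupValidationMode`, `CanonicalDefinition`, `validate`, and `load_definitions` are hypothetical names, and the two validation rules are placeholders for whatever checks the real validator performs.

```python
from dataclasses import dataclass
from enum import Enum


class StartupValidationMode(Enum):
    STRICT = "Strict"
    WARN = "Warn"


@dataclass(frozen=True)
class CanonicalDefinition:
    # Hypothetical minimal shape: only the fields this sketch needs.
    name: str
    version: str
    steps: tuple


def validate(definition: CanonicalDefinition) -> list[str]:
    """Return a list of problems; an empty list means the definition is valid."""
    problems = []
    if not definition.name:
        problems.append("missing workflow name")
    if not definition.steps:
        problems.append("definition has no steps")
    return problems


def load_definitions(definitions, mode):
    """Split definitions into available and unavailable according to the mode."""
    available, unavailable = {}, {}
    for definition in definitions:
        key = (definition.name, definition.version)
        problems = validate(definition)
        if not problems:
            available[key] = definition
        elif mode is StartupValidationMode.STRICT:
            # Strict: one invalid definition aborts startup.
            raise ValueError(f"invalid definition {key}: {problems}")
        else:
            # Warn: startup continues, the definition is marked unavailable.
            unavailable[key] = problems
    return available, unavailable
```

In `Warn` mode the caller would presumably log the `unavailable` map and refuse start requests for those keys.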
### 2.4 Runtime Cache
The engine should cache canonical runtime definitions in memory by:
- workflow name
- workflow version
This cache is immutable after startup in v1.
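A rough sketch of such a cache, in Python for brevity: the lookup is keyed by `(name, version)` and frozen once built. The function name `build_definition_cache` and the dictionary-shaped definitions are assumptions made for the example.

```python
from types import MappingProxyType


def build_definition_cache(definitions):
    """Build a read-only lookup keyed by (workflow name, workflow version)."""
    cache = {}
    for definition in definitions:
        key = (definition["name"], definition["version"])
        if key in cache:
            raise ValueError(f"duplicate definition for {key}")
        cache[key] = definition
    # MappingProxyType is a read-only view: item assignment raises TypeError.
    return MappingProxyType(cache)
```

Because nothing is added after startup, lookups need no locking in this sketch.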
## 3. Canonical Runtime Definition Shape
The runtime definition should be treated as a compiled, execution-ready representation of the canonical contracts, not a raw JSON document.
The runtime model should contain:
- definition identity
- display metadata
- required modules
- step graph
- task declarations
- expression trees
- transport declarations
- subworkflow declarations
- continue-with declarations
## 4. Execution Context Model
The interpreter should run every step against a single canonical execution context.
Recommended execution context fields:
- `WorkflowName`
- `WorkflowVersion`
- `WorkflowInstanceId`
- `BusinessReference`
- `State`
- `StartPayload`
- `CompletionPayload`
- `CurrentTask`
- `CurrentSignal`
- `FunctionRuntime`
- `TransportDispatcher`
- `RuntimeMetadata`
`RuntimeMetadata` should hold:
- node id
- current signal id
- snapshot version
- waiting token
- execution started at
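One possible shape for this context, sketched as Python dataclasses. The field names mirror the lists above; every type is an assumption, and `function_runtime` and `transport_dispatcher` are left as untyped placeholders because their contracts are not defined in this section.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional


@dataclass
class RuntimeMetadata:
    node_id: str
    current_signal_id: Optional[str]
    snapshot_version: int
    waiting_token: Optional[str]
    execution_started_at: datetime


@dataclass
class ExecutionContext:
    workflow_name: str
    workflow_version: str
    workflow_instance_id: str
    business_reference: Optional[str]
    runtime_metadata: RuntimeMetadata
    # default_factory gives every context its own dictionary instance.
    state: dict[str, Any] = field(default_factory=dict)
    start_payload: dict[str, Any] = field(default_factory=dict)
    completion_payload: Optional[dict[str, Any]] = None
    current_task: Optional[str] = None
    current_signal: Optional[dict[str, Any]] = None
    function_runtime: Any = None      # placeholder: expression function runtime
    transport_dispatcher: Any = None  # placeholder: transport dispatcher
```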
## 5. Core Runtime State Model
The runtime must distinguish between:
- business state
- engine state
### 5.1 Business State
Business state is what the workflow author reasons about.
Examples:
- `srPolicyId`
- `policySubstatus`
- customer lookup state
- payload shaping outputs
- subworkflow results
### 5.2 Engine State
Engine state is what the runtime needs to resume correctly.
Examples:
- current workflow status
- current wait type
- current wait token
- active task identity
- resume pointer
- subworkflow frame stack
- outstanding timer descriptors
- last processed signal id
Business state must remain visible in runtime inspection.
Engine state must remain safe and deterministic for resume.
## 6. Run-To-Wait Execution Model
The engine uses a run-to-wait interpreter.
This means:
1. load snapshot
2. execute sequentially
3. stop when a durable wait boundary is reached
4. persist resulting snapshot
5. release instance
Wait boundaries are:
- human task activation
- scheduled timer
- external signal wait
- child workflow wait
- terminal completion
This model is essential for:
- multi-instance safety
- restart recovery
- no sticky ownership
- no in-memory correctness assumptions
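The five steps can be illustrated with a small loop. This is a sketch under simplifying assumptions, not the engine's implementation: the snapshot is a plain dictionary, `store` stands in for durable storage, steps are callables that return `WAIT` or `CONTINUE`, and the resume pointer is reduced to a single `next_step` index.

```python
import copy

WAIT = "wait"          # hypothetical step outcome: durable wait boundary reached
CONTINUE = "continue"  # hypothetical step outcome: keep executing


def run_to_wait(store, instance_id, steps):
    """Execute steps from the persisted pointer until a wait boundary or the end."""
    snapshot = copy.deepcopy(store[instance_id])   # 1. load snapshot
    while snapshot["next_step"] < len(steps):      # 2. execute sequentially
        step = steps[snapshot["next_step"]]
        outcome = step(snapshot["state"])
        snapshot["next_step"] += 1
        if outcome == WAIT:                        # 3. stop at a wait boundary
            snapshot["status"] = "Waiting"
            break
    else:
        # The loop ran out of steps without hitting a wait: terminal completion.
        snapshot["status"] = "Completed"
    snapshot["version"] += 1
    store[instance_id] = snapshot                  # 4. persist resulting snapshot
    return snapshot["status"]                      # 5. caller releases the instance
```

Since the function works on a copy and writes back only at the end, any worker can pick up the next slice from `store`; nothing has to survive in process memory between slices.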
## 7. Step Semantics
### 7.1 State Assignment
State assignment is immediate and local to the current execution transaction.
The engine:
- evaluates the assignment expression
- writes to the business state dictionary
- keeps changes in memory until the next durable checkpoint
### 7.2 Business Reference Assignment
Business reference assignment updates the canonical business reference attached to:
- the runtime snapshot
- new tasks
- instance projection updates
Business reference changes must be applied transactionally with other execution results.
### 7.3 Human Task Activation
A human task activation step is a durable wait boundary.
The interpreter does not continue past it in the same execution.
The result of task activation is:
- one active task projection
- updated instance status
- updated runtime snapshot
- optional runtime metadata for the active task
### 7.4 Transport Call
Transport calls are synchronous from the perspective of a single execution slice.
The engine:
- evaluates payload expressions
- dispatches through the correct transport adapter
- captures result payload
- stores result under the result key when present
- chooses the success, failure, or timeout branch
No engine-specific callback registration should be required for normal synchronous transport calls.
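A simplified sketch of that sequence in Python. The step layout (`transport`, `payload`, `result_key`), the adapter registry, and the mapping of exceptions to branches are all assumptions made for the example; the real dispatcher may signal failures and timeouts differently.

```python
def execute_transport_call(step, state, adapters):
    """Run one transport step and return the name of the branch to follow."""
    # Evaluate payload expressions against the current business state.
    payload = {key: expr(state) for key, expr in step["payload"].items()}
    adapter = adapters[step["transport"]]  # pick the adapter for this transport
    try:
        result = adapter(payload)
    except TimeoutError:
        # Checked first because TimeoutError is also an Exception.
        return "timeout"
    except Exception:
        return "failure"
    result_key = step.get("result_key")
    if result_key is not None:
        state[result_key] = result  # store the result only when a key is declared
    return "success"
```

The call returns within the same execution slice, so no callback needs to be registered for the result.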
### 7.5 Conditional Branch
Conditions evaluate against the current execution context.
Only one branch is executed.
The branch path must be reproducible in the resume pointer model.
### 7.6 Repeat
Repeat executes logically as:
- evaluate collection or repeat source
- for each iteration:
  - bind iteration context
  - execute nested sequence
If an iteration hits a wait boundary, the engine snapshot must preserve:
- repeat step id
- iteration index
- remaining resume location inside the iteration body
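To make the preserved fields concrete, here is a small Python sketch of a repeat that can stop inside an iteration and later resume from the saved frame. The frame keys and the convention that a body step returns `True` when it hits a wait boundary are assumptions for illustration only.

```python
def run_repeat(repeat_step_id, items, body, state, frame=None):
    """Run body steps per item; return a resume frame on wait, or None when done."""
    start_index = frame["iteration_index"] if frame else 0
    start_step = frame["next_body_step"] if frame else 0
    for index in range(start_index, len(items)):
        # Only the resumed iteration starts mid-body; later ones start at step 0.
        first_step = start_step if index == start_index else 0
        for step_number in range(first_step, len(body)):
            waited = body[step_number](items[index], state)
            if waited:
                # Preserve enough to resume inside this iteration.
                return {
                    "repeat_step_id": repeat_step_id,
                    "iteration_index": index,
                    "next_body_step": step_number + 1,
                }
    return None
```

Passing the returned frame back in continues with the step after the wait, without re-running earlier iterations.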
### 7.7 Subworkflow Invocation
Subworkflow invocation is a wait boundary unless the child completes inline before producing a wait.
Parent snapshot must record:
- child workflow identity
- child workflow version
- parent business reference
- parent resume pointer
- target result key
- parent workflow state needed for resume
### 7.8 Continue-With
Continue-with creates a new workflow start request as an engine side effect.
It is not a wait boundary for the current instance unless explicitly modeled that way by the workflow.
## 8. Resume Model
### 8.1 Resume Pointer
The engine must persist a deterministic resume pointer.
It should identify:
- entry point kind
- task name if resuming from task completion
- branch path
- next step index
- repeat iteration where applicable
The existing declarative resume model is the right conceptual baseline, but the engine should persist it inside the canonical runtime snapshot rather than inside a CLR-only execution flow.
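As an illustration, a resume pointer with the fields listed above could be serialized into the snapshot like this. The Python dataclass, its field types, and the JSON encoding are assumptions; the section does not prescribe a storage format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ResumePointer:
    entry_point_kind: str
    task_name: Optional[str]
    branch_path: tuple
    next_step_index: int
    repeat_iteration: Optional[int] = None

    def to_json(self) -> str:
        # sort_keys keeps the serialized form stable across runs.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ResumePointer":
        data = json.loads(text)
        data["branch_path"] = tuple(data["branch_path"])  # JSON arrays load as lists
        return cls(**data)
```

A lossless round trip is the property that matters here: the pointer read back after a restart must equal the one that was written.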
### 8.2 Waiting Token
Every durable wait must have a waiting token.
The waiting token is how the engine prevents stale resumes.
When a signal arrives:
- if its waiting token does not match the snapshot, the signal is stale and must be ignored safely
This is the primary guard for:
- canceled timers
- duplicate wake-ups
- late child completions
- redelivered signals
### 8.3 Version
Every successful execution commit must increment the snapshot version.
Signals may carry the expected version that created the wait.
This allows the engine to detect stale work before any mutation.
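The token and version checks from sections 8.2 and 8.3 can be combined into one guard that runs before any mutation. The sketch below uses plain dictionaries and hypothetical key names; clearing the token after a successful resume is an assumption about how a consumed wait is invalidated.

```python
def is_stale(signal, snapshot):
    """Return True when the signal no longer matches the wait that created it."""
    if signal["waiting_token"] != snapshot["waiting_token"]:
        return True
    expected = signal.get("expected_version")
    # The version check is optional: signals may omit the expected version.
    return expected is not None and expected != snapshot["version"]


def handle_signal(signal, snapshot, resume):
    """Ignore stale signals before any mutation; otherwise resume and commit."""
    if is_stale(signal, snapshot):
        return False
    resume(snapshot, signal)
    snapshot["version"] += 1          # every successful commit increments the version
    snapshot["waiting_token"] = None  # assumption: a consumed wait cannot be resumed again
    return True
```

With this guard, a redelivered copy of an already processed signal is rejected because its token no longer matches the snapshot.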
## 9. Human Task Model
The task model remains projection-first.
The runtime does not wait on an in-memory task object.
Instead:
- task activation writes a task projection row
- runtime snapshot enters `WaitingForTaskCompletion`
- task completion API provides the wake-up event
Task completion is therefore an external signal into the engine.
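A compact sketch of this flow in Python. The row layout, the `TaskCompleted` signal kind, and storing the waiting token on the task row are assumptions made so the example is self-contained; only the `WaitingForTaskCompletion` status comes from the text above.

```python
import uuid


def activate_task(snapshot: dict, task_rows: list, task_name: str) -> dict:
    """Write a task projection row and park the instance on a durable wait."""
    token = uuid.uuid4().hex
    row = {
        "task_id": uuid.uuid4().hex,
        "instance_id": snapshot["instance_id"],
        "task_name": task_name,
        "status": "Active",
        "waiting_token": token,
    }
    task_rows.append(row)
    snapshot["status"] = "WaitingForTaskCompletion"
    snapshot["waiting_token"] = token
    return row


def complete_task(row: dict, payload: dict) -> dict:
    """Mark the row completed and return the wake-up signal for the engine."""
    row["status"] = "Completed"
    return {
        "kind": "TaskCompleted",
        "instance_id": row["instance_id"],
        "task_name": row["task_name"],
        "waiting_token": row["waiting_token"],
        "payload": payload,
    }
```

No in-memory task object links the two functions; the projection row and the snapshot carry everything needed to resume.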
## 10. Error Model
The interpreter should classify errors into:
- definition errors
- expression evaluation errors
- transport errors
- timeout errors
- authorization errors
- engine consistency errors
Definition errors are startup or validation failures.
Execution errors are runtime failures that may:
- route into a failure branch
- schedule a retry
- fail the workflow
- move the instance to a recoverable error state
## 11. Retry Model
Retries should be modeled explicitly as scheduled signals.
The engine should not sleep inside a worker.
A retry should:
1. persist the failure context
2. generate a new waiting token
3. enqueue a delayed resume signal
4. commit
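The four steps might look like this in a simplified Python sketch. The `RetryDue` signal kind, the dictionary snapshot, and the in-memory `signal_queue` are stand-ins; in a real engine the snapshot write and the enqueue would presumably be committed together, which this example only approximates with a version bump.

```python
import uuid
from datetime import datetime, timedelta


def schedule_retry(snapshot: dict, error: Exception, delay: timedelta,
                   now: datetime, signal_queue: list) -> str:
    """Turn a failure into a delayed resume signal instead of sleeping in a worker."""
    # 1. persist the failure context
    snapshot["failure_context"] = {"error": str(error), "failed_at": now.isoformat()}
    # 2. generate a new waiting token, invalidating any earlier wait
    token = uuid.uuid4().hex
    snapshot["waiting_token"] = token
    # 3. enqueue a delayed resume signal
    signal_queue.append({
        "kind": "RetryDue",
        "waiting_token": token,
        "due_at": now + delay,
    })
    # 4. commit (stand-in: bump the snapshot version)
    snapshot["version"] += 1
    return token
```

Passing `now` in as a parameter keeps the function free of wall-clock reads, and the worker is released as soon as the function returns.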
## 12. Completion Model
A workflow completes when the interpreter reaches terminal completion with no outstanding waits.
Completion result must:
- mark instance projection completed
- mark runtime state completed
- clear stale timeout metadata
- apply retention timing
## 13. Determinism Requirements
The runtime must assume:
- expressions are deterministic given the execution context
- transport calls are side effects and must be treated explicitly
- no hidden CLR delegate behavior remains in workflow definitions
The runtime should not rely on:
- non-deterministic local time calls inside step execution
- in-memory mutable workflow objects
- ambient state outside the canonical execution context
## 14. Resulting Implementation Shape
The engine kernel should be implemented as:
- definition normalizer
- canonical interpreter
- transport dispatcher
- execution coordinator
- resume serializer/deserializer
This produces a runtime that is small, explicit, and aligned with the already-completed full-declaration effort.