# 03. Canonical Execution Model

## 1. Why The Engine Executes Canonical Definitions

The workflow corpus is now fully declarative and canonicalizable.

That changes the best runtime strategy:

- authored C# remains the source of truth
- canonical definition becomes the runtime execution contract
- the engine interprets canonical definitions directly

This gives the platform:

- deterministic runtime behavior
- shared semantics between export/import and execution
- less runtime coupling to workflow-specific CLR delegates
- a clean separation between authoring and execution
## 2. Definition Lifecycle

### 2.1 Authoring

Workflows are authored in C# through the declarative DSL.
### 2.2 Normalization

At service startup, each workflow registration is normalized into:

1. workflow registration metadata
2. canonical workflow definition
3. required module set
4. function usage metadata
### 2.3 Validation

The runtime should validate canonical definitions before accepting them for execution.

Recommended startup modes:

- `Strict`: startup fails if a definition is invalid.
- `Warn`: startup succeeds, but invalid definitions are marked unavailable.
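
The two modes differ only in what happens to an invalid definition. A minimal Python sketch, assuming validation has already produced a list of findings per definition; `StartupMode`, `CanonicalDefinition`, `StartupCatalog`, and `load_definitions` are hypothetical names, not the engine's API:

```python
from dataclasses import dataclass, field
from enum import Enum


class StartupMode(Enum):
    STRICT = "Strict"
    WARN = "Warn"


@dataclass(frozen=True)
class CanonicalDefinition:
    name: str
    version: str
    errors: tuple = ()  # findings from a (hypothetical) validator; empty means valid


class DefinitionValidationError(Exception):
    pass


@dataclass
class StartupCatalog:
    available: dict = field(default_factory=dict)    # (name, version) -> definition
    unavailable: dict = field(default_factory=dict)  # (name, version) -> errors


def load_definitions(definitions, mode):
    catalog = StartupCatalog()
    for d in definitions:
        key = (d.name, d.version)
        if not d.errors:
            catalog.available[key] = d
        elif mode is StartupMode.STRICT:
            # Strict: one invalid definition aborts service startup.
            raise DefinitionValidationError(f"{d.name}@{d.version}: " + "; ".join(d.errors))
        else:
            # Warn: startup continues, but this definition cannot be executed.
            catalog.unavailable[key] = d.errors
    return catalog
```

Keeping the unavailable set (rather than only logging it) lets a caller explain why a workflow cannot be started when running in `Warn` mode.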
### 2.4 Runtime Cache

The engine should cache canonical runtime definitions in memory, keyed by:

- workflow name
- workflow version

This cache is immutable after startup in v1.
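
An immutable cache can be as simple as a read-only mapping built once. A minimal Python sketch, assuming the compiled definitions are already available as values; `RuntimeDefinition` and `RuntimeDefinitionCache` are hypothetical names, and `types.MappingProxyType` is used here only as a convenient read-only view:

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class RuntimeDefinition:
    name: str
    version: str
    steps: tuple = ()  # compiled step graph; left empty in this sketch


class RuntimeDefinitionCache:
    """Built once at startup and read-only afterwards, keyed by (name, version)."""

    def __init__(self, definitions):
        entries = {}
        for d in definitions:
            key = (d.name, d.version)
            if key in entries:
                raise ValueError(f"duplicate definition {d.name}@{d.version}")
            entries[key] = d
        # MappingProxyType exposes no mutating methods, and `entries` never escapes.
        self._entries = MappingProxyType(entries)

    def get(self, name, version):
        return self._entries.get((name, version))
```

Because nothing can be added after construction, concurrent readers need no locking in this sketch.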
## 3. Canonical Runtime Definition Shape

The runtime definition should be treated as a compiled, execution-ready representation of the canonical contracts, not a raw JSON document.

The runtime model should contain:

- definition identity
- display metadata
- required modules
- step graph
- task declarations
- expression trees
- transport declarations
- subworkflow declarations
- continue-with declarations
## 4. Execution Context Model

The interpreter should run every step against a single canonical execution context.

Recommended execution context fields:

- `WorkflowName`
- `WorkflowVersion`
- `WorkflowInstanceId`
- `BusinessReference`
- `State`
- `StartPayload`
- `CompletionPayload`
- `CurrentTask`
- `CurrentSignal`
- `FunctionRuntime`
- `TransportDispatcher`
- `RuntimeMetadata`

`RuntimeMetadata` should hold:

- node id
- current signal id
- snapshot version
- waiting token
- execution started at
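
The context is plain data plus two service handles. A minimal Python sketch, assuming snake_case equivalents of the field names above; the handles are typed as generic callables only as placeholders for the real function runtime and transport dispatcher:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Optional


@dataclass
class RuntimeMetadata:
    node_id: str
    current_signal_id: Optional[str]
    snapshot_version: int
    waiting_token: Optional[str]
    execution_started_at: datetime


@dataclass
class ExecutionContext:
    workflow_name: str
    workflow_version: str
    workflow_instance_id: str
    business_reference: Optional[str]
    state: dict            # business state, read and written by steps
    start_payload: dict
    runtime_metadata: RuntimeMetadata
    completion_payload: Optional[dict] = None
    current_task: Optional[str] = None
    current_signal: Optional[dict] = None
    # Service handles; typed as plain callables only as placeholders.
    function_runtime: Optional[Callable[..., Any]] = None
    transport_dispatcher: Optional[Callable[..., Any]] = None
```

Since every step receives this one object, everything a step can read is visible in one place, which is what makes the determinism requirements in section 13 checkable.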
## 5. Core Runtime State Model

The runtime must distinguish between:

- business state
- engine state

### 5.1 Business State

Business state is what the workflow author reasons about.

Examples:

- `srPolicyId`
- `policySubstatus`
- customer lookup state
- payload shaping outputs
- subworkflow results
### 5.2 Engine State

Engine state is what the runtime needs to resume correctly.

Examples:

- current workflow status
- current wait type
- current wait token
- active task identity
- resume pointer
- subworkflow frame stack
- outstanding timer descriptors
- last processed signal id

Business state must remain visible in runtime inspection. Engine state must remain safe and deterministic for resume.
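
One way to keep the two kinds of state apart is to store them as separate parts of the persisted snapshot. A minimal Python sketch, assuming JSON persistence; `EngineState`, `RuntimeSnapshot`, and the choice to expose only status alongside business state in `inspect()` are illustrative assumptions:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class EngineState:
    status: str
    wait_type: Optional[str] = None
    wait_token: Optional[str] = None
    active_task: Optional[str] = None
    resume_pointer: Optional[dict] = None
    subworkflow_frames: list = field(default_factory=list)
    timers: list = field(default_factory=list)
    last_signal_id: Optional[str] = None


@dataclass
class RuntimeSnapshot:
    business_state: dict
    engine_state: EngineState

    def inspect(self) -> dict:
        # In this sketch, inspection shows business state and status only.
        return {"status": self.engine_state.status, "state": dict(self.business_state)}

    def to_json(self) -> str:
        # sort_keys gives identical text for identical snapshots.
        payload = {"business": self.business_state, "engine": asdict(self.engine_state)}
        return json.dumps(payload, sort_keys=True)

    @staticmethod
    def from_json(raw: str) -> "RuntimeSnapshot":
        data = json.loads(raw)
        return RuntimeSnapshot(data["business"], EngineState(**data["engine"]))
```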
## 6. Run-To-Wait Execution Model

The engine uses a run-to-wait interpreter.

This means:

1. load snapshot
2. execute sequentially
3. stop when a durable wait boundary is reached
4. persist resulting snapshot
5. release instance

Wait boundaries are:

- human task activation
- scheduled timer
- external signal wait
- child workflow wait
- terminal completion

This model is essential for:

- multi-instance safety
- restart recovery
- no sticky ownership
- no in-memory correctness assumptions
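
The five steps above can be sketched as one function per execution slice. This is a minimal Python illustration, assuming steps are plain callables over the business-state dict that return `None` to continue or a wait-kind string (such as `"task"`) to stop; `Snapshot`, `InMemoryStore`, and `run_to_wait` are hypothetical names:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Snapshot:
    version: int
    next_step: int
    state: dict = field(default_factory=dict)
    waiting_on: Optional[str] = None


class InMemoryStore:
    """Stand-in for durable storage; a real store would write to a database."""

    def __init__(self):
        self._rows = {}

    def load(self, instance_id):
        return self._rows[instance_id]

    def save(self, instance_id, snapshot):
        self._rows[instance_id] = snapshot


def run_to_wait(store, instance_id, steps):
    """One execution slice: load, run until a wait boundary, persist."""
    snap = store.load(instance_id)                # 1. load snapshot
    state = dict(snap.state)                      # changes stay in memory until saved
    i, wait = snap.next_step, None
    while wait is None and i < len(steps):        # 2. execute sequentially
        wait = steps[i](state)                    # a step returns None or a wait kind
        i += 1
    if wait is None:
        wait = "complete"                         # ran off the end: terminal completion
    result = Snapshot(snap.version + 1, i, state, wait)
    store.save(instance_id, result)               # 3-4. stop at boundary, persist
    return result                                 # 5. caller releases the instance
```

Nothing survives in memory between slices: the next call starts from the saved snapshot, so any node can pick up the instance.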
## 7. Step Semantics

### 7.1 State Assignment

State assignment is immediate and local to the current execution transaction.

The engine:

- evaluates the assignment expression
- writes to the business state dictionary
- keeps changes in memory until the next durable checkpoint
### 7.2 Business Reference Assignment

Business reference assignment updates the canonical business reference attached to:

- the runtime snapshot
- new tasks
- instance projection updates

Business reference changes must be applied transactionally with other execution results.
### 7.3 Human Task Activation

A human task activation step is a terminal wait boundary.

The interpreter does not continue past it in the same execution.

The result of task activation is:

- one active task projection
- updated instance status
- updated runtime snapshot
- optional runtime metadata for the active task
### 7.4 Transport Call

Transport calls are synchronous from the perspective of a single execution slice.

The engine:

- evaluates payload expressions
- dispatches through the correct transport adapter
- captures result payload
- stores result under the result key when present
- chooses the success, failure, or timeout branch

No engine-specific callback registration should be required for normal synchronous transport calls.
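
Branch selection for a synchronous transport call fits in one function. A minimal Python sketch, assuming an adapter is a callable taking `(payload, timeout_s)` that returns a result or raises the built-in `TimeoutError` on a deadline; `run_transport_step` and the branch-name strings are hypothetical:

```python
from typing import Any, Callable, Optional


def run_transport_step(
    state: dict,
    build_payload: Callable[[dict], Any],
    adapter: Callable[[Any, float], Any],
    result_key: Optional[str] = None,
    timeout_s: float = 30.0,
) -> str:
    """Run one synchronous transport call and return the branch to take next."""
    payload = build_payload(state)            # evaluate payload expressions
    try:
        result = adapter(payload, timeout_s)  # dispatch through the transport adapter
    except TimeoutError:
        return "timeout"
    except Exception:                         # any other adapter error is a failure
        return "failure"
    if result_key is not None:
        state[result_key] = result            # store under the result key when present
    return "success"
```

The call returns or raises within the slice, so the interpreter never has to register a callback and wait for it.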
### 7.5 Conditional Branch

Conditions evaluate against the current execution context.

Only one branch is executed.

The branch path must be reproducible in the resume pointer model.
### 7.6 Repeat

Repeat executes logically as:

- evaluate collection or repeat source
- for each iteration:
  - bind iteration context
  - execute nested sequence

If an iteration hits a wait boundary, the engine snapshot must preserve:

- repeat step id
- iteration index
- remaining resume location inside the iteration body
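
The three preserved values are enough to continue a repeat mid-body. A minimal Python sketch, assuming the collection evaluates to the same items on resume (or is persisted) and that the iteration variable is bound as `state["item"]`; `RepeatCursor` and `run_repeat` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepeatCursor:
    step_id: str
    iteration: int   # index into the evaluated collection
    body_index: int  # next step to run inside that iteration's body


def run_repeat(step_id, items, body, state, cursor: Optional[RepeatCursor] = None):
    """Run `body` once per item; return a cursor on a wait, or None when finished."""
    first = cursor.iteration if cursor else 0
    for i in range(first, len(items)):
        state["item"] = items[i]                       # bind the iteration context
        j = cursor.body_index if cursor and i == first else 0
        while j < len(body):
            wait = body[j](state)
            j += 1
            if wait is not None:                       # wait boundary inside the body
                return RepeatCursor(step_id, i, j)
    return None
```

On resume, the steps before the wait in the interrupted iteration are not re-run, and later iterations start from the top of the body.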
### 7.7 Subworkflow Invocation

Subworkflow invocation is a wait boundary unless the child completes inline before producing a wait.

The parent snapshot must record:

- child workflow identity
- child workflow version
- parent business reference
- parent resume pointer
- target result key
- parent workflow state needed for resume
### 7.8 Continue-With

Continue-with creates a new workflow start request as an engine side effect.

It is not a resume boundary for the current instance unless explicitly modeled that way by the workflow.
## 8. Resume Model

### 8.1 Resume Pointer

The engine must persist a deterministic resume pointer.

It should identify:

- entry point kind
- task name, if resuming from task completion
- branch path
- next step index
- repeat iteration, where applicable

The existing declarative resume model is the right conceptual baseline, but the engine should persist it inside the canonical runtime snapshot rather than inside a CLR-only execution flow.
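
Persisting the pointer inside the snapshot means it needs one stable encoding. A minimal Python sketch, assuming JSON and a tuple of branch indexes for the branch path; `ResumePointer` and the entry-point kind values shown in the comment are hypothetical:

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ResumePointer:
    entry_point_kind: str                  # hypothetical values, e.g. "Start", "TaskCompleted"
    next_step_index: int
    task_name: Optional[str] = None        # set only when resuming from task completion
    branch_path: tuple = ()                # branch index chosen at each nesting level
    repeat_iteration: Optional[int] = None

    def to_json(self) -> str:
        # Sorted keys and compact separators give one canonical encoding per value.
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))

    @classmethod
    def from_json(cls, raw: str) -> "ResumePointer":
        data = json.loads(raw)
        data["branch_path"] = tuple(data["branch_path"])  # JSON arrays decode as lists
        return cls(**data)
```

Because the pointer is plain data, resuming does not depend on any CLR delegate or call stack surviving a restart.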
### 8.2 Waiting Token

Every durable wait must have a waiting token.

The waiting token is how the engine prevents stale resumes.

When a signal arrives and its waiting token does not match the token in the snapshot, the signal is stale and must be ignored safely.

This is the primary guard against:

- canceled timers
- duplicate wake-ups
- late child completions
- redelivered signals
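
The token check is a single comparison, plus issuing a fresh token for every new wait. A minimal Python sketch, assuming the token is consumed when a matching signal is accepted; `WaitState`, `enter_wait`, and `accept_signal` are hypothetical names:

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class WaitState:
    waiting_token: Optional[str] = None


def enter_wait(snapshot: WaitState) -> str:
    """Start a durable wait with a fresh token; any older token becomes stale."""
    snapshot.waiting_token = secrets.token_hex(16)
    return snapshot.waiting_token


def accept_signal(snapshot: WaitState, signal_token: str) -> bool:
    """Return True if the signal may resume the instance; stale signals are dropped."""
    if snapshot.waiting_token is None or signal_token != snapshot.waiting_token:
        return False  # stale: canceled timer, duplicate, late child, or redelivery
    snapshot.waiting_token = None  # consume, so a redelivered copy is now stale
    return True
```

Ignoring a stale signal is safe because it never touches business state; the instance keeps waiting for the signal that carries the current token.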
### 8.3 Version

Every successful execution commit must increment the snapshot version.

Signals may carry the expected version, meaning the snapshot version that created the wait.

This allows the engine to detect stale work before any mutation.
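
Version checks are a form of optimistic concurrency: a commit succeeds only if the stored version still equals the version the work started from. A minimal in-memory Python sketch; `SnapshotStore` and `StaleVersionError` are hypothetical names:

```python
import threading
from dataclasses import dataclass


class StaleVersionError(Exception):
    pass


@dataclass
class Row:
    version: int
    snapshot: dict


class SnapshotStore:
    """In-memory stand-in for a versioned snapshot table (compare-and-set on version)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def create(self, instance_id, snapshot):
        with self._lock:
            self._rows[instance_id] = Row(1, snapshot)

    def is_current(self, instance_id, signal_version):
        # Cheap pre-check for a signal that carries the version that created its wait.
        with self._lock:
            return self._rows[instance_id].version == signal_version

    def commit(self, instance_id, expected_version, snapshot):
        with self._lock:
            row = self._rows[instance_id]
            if row.version != expected_version:
                # Another commit happened first: reject before mutating anything.
                raise StaleVersionError(f"expected {expected_version}, found {row.version}")
            self._rows[instance_id] = Row(row.version + 1, snapshot)
            return row.version + 1
```

In a relational store, the same check is commonly a conditional update (`UPDATE ... WHERE id = ? AND version = ?`) that treats zero affected rows as a stale commit.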
## 9. Human Task Model

The task model remains projection-first.

The runtime does not wait on an in-memory task object.

Instead:

- task activation writes a task projection row
- runtime snapshot enters `WaitingForTaskCompletion`
- task completion API provides the wake-up event

Task completion is therefore an external signal into the engine.
## 10. Error Model

The interpreter should classify errors into:

- definition errors
- expression evaluation errors
- transport errors
- timeout errors
- authorization errors
- engine consistency errors

Definition errors are startup or validation failures.

Execution errors (the other categories) are runtime failures that may:

- route into a failure branch
- schedule a retry
- fail the workflow
- move the instance to a recoverable error state
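
The classification becomes useful once each category maps to one of the four outcomes. The mapping below is an illustrative policy assumed for this Python sketch, not one the model prescribes; `ErrorKind`, `Disposition`, and `dispose` are hypothetical names:

```python
from enum import Enum, auto


class ErrorKind(Enum):
    DEFINITION = auto()
    EXPRESSION = auto()
    TRANSPORT = auto()
    TIMEOUT = auto()
    AUTHORIZATION = auto()
    ENGINE_CONSISTENCY = auto()


class Disposition(Enum):
    FAILURE_BRANCH = auto()
    RETRY = auto()
    FAIL_WORKFLOW = auto()
    RECOVERABLE_ERROR = auto()


def dispose(kind, has_failure_branch, attempt, max_attempts):
    """Map an execution error to an outcome. The policy below is illustrative only."""
    if kind is ErrorKind.DEFINITION:
        raise ValueError("definition errors belong to startup validation, not execution")
    if kind in (ErrorKind.TRANSPORT, ErrorKind.TIMEOUT):
        if has_failure_branch:
            return Disposition.FAILURE_BRANCH
        if attempt < max_attempts:
            return Disposition.RETRY
        return Disposition.RECOVERABLE_ERROR
    if kind is ErrorKind.ENGINE_CONSISTENCY:
        return Disposition.RECOVERABLE_ERROR   # assumed: park for an operator
    return Disposition.FAIL_WORKFLOW           # expression and authorization errors
```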
## 11. Retry Model

Retries should be modeled explicitly as scheduled signals.

The engine should not sleep inside a worker.

A retry should:

1. persist the failure context
2. generate a new waiting token
3. enqueue a delayed resume signal
4. commit
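
The four retry steps can be sketched without any sleeping: the delay lives in the queued signal's due time. A minimal Python sketch, assuming an in-memory heap in place of a durable delayed queue and an exponential backoff policy chosen only for illustration; `DelayedSignal`, `DelayedSignalQueue`, and `schedule_retry` are hypothetical names:

```python
import heapq
import secrets
from dataclasses import dataclass, field


@dataclass(order=True)
class DelayedSignal:
    due_at: float
    instance_id: str = field(compare=False)
    waiting_token: str = field(compare=False)


class DelayedSignalQueue:
    """Toy in-memory delayed queue; a real one would be durable."""

    def __init__(self):
        self._heap = []

    def enqueue(self, signal):
        heapq.heappush(self._heap, signal)

    def pop_due(self, now):
        due = []
        while self._heap and self._heap[0].due_at <= now:
            due.append(heapq.heappop(self._heap))
        return due


def schedule_retry(snapshot, queue, instance_id, error, now, attempt, base_delay_s=5.0):
    snapshot["failure"] = {"error": error, "attempt": attempt}    # 1. failure context
    token = secrets.token_hex(8)                                   # 2. new waiting token
    snapshot["waiting_token"] = token
    delay = base_delay_s * 2 ** (attempt - 1)                      # backoff policy (assumed)
    queue.enqueue(DelayedSignal(now + delay, instance_id, token))  # 3. delayed resume signal
    # 4. commit: the caller persists the snapshot and the queued signal together.
    return token
```

Because the retry signal carries the new waiting token, any earlier signal for the same instance becomes stale under the section 8.2 guard.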
## 12. Completion Model

A workflow completes when the interpreter reaches terminal completion with no outstanding waits.

The completion result must:

- mark instance projection completed
- mark runtime state completed
- clear stale timeout metadata
- apply retention timing
## 13. Determinism Requirements

The runtime must assume:

- expressions are deterministic given the execution context
- transport calls are side effects and must be treated explicitly
- no hidden CLR delegate behavior remains in workflow definitions

The runtime should not rely on:

- non-deterministic local time calls inside step execution
- in-memory mutable workflow objects
- ambient state outside the canonical execution context
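
One way to avoid local time calls inside steps is to capture time once per slice (for example, as the `execution started at` value in `RuntimeMetadata`) and pass it to expressions. A minimal Python sketch under that assumption; `StepClock` and `due_date_expression` are hypothetical names:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class StepClock:
    """Time source for one execution slice, fixed when the slice starts."""
    execution_started_at: datetime

    def now(self) -> datetime:
        return self.execution_started_at


def due_date_expression(clock: StepClock, days: int) -> str:
    # Uses the slice's captured time instead of datetime.now(), so evaluating
    # the same expression with the same clock always yields the same value.
    return (clock.now() + timedelta(days=days)).date().isoformat()
```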
## 14. Resulting Implementation Shape

The engine kernel should be implemented as:

- definition normalizer
- canonical interpreter
- transport dispatcher
- execution coordinator
- resume serializer/deserializer

This produces a runtime that is small, explicit, and aligned with the already-completed full-declaration effort.