Add StellaOps.Workflow engine: 14 libraries, WebService, 8 test projects
Extract product-agnostic workflow engine from Ablera.Serdica.Workflow into
standalone StellaOps.Workflow.* libraries targeting net10.0.

Libraries (14):
- Contracts, Abstractions (compiler, decompiler, expression runtime)
- Engine (execution, signaling, scheduling, projections, hosted services)
- ElkSharp (generic graph layout algorithm)
- Renderer.ElkSharp, Renderer.ElkJs, Renderer.Msagl, Renderer.Svg
- Signaling.Redis, Signaling.OracleAq
- DataStore.MongoDB, DataStore.PostgreSQL, DataStore.Oracle

WebService: ASP.NET Core Minimal API with 22 endpoints

Tests (8 projects, 109 tests pass):
- Engine.Tests (105 pass), WebService.Tests (4 E2E pass)
- Renderer.Tests, DataStore.MongoDB/Oracle/PostgreSQL.Tests
- Signaling.Redis.Tests, IntegrationTests.Shared

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
377
docs/workflow/engine/03-canonical-execution-model.md
Normal file
@@ -0,0 +1,377 @@

# 03. Canonical Execution Model

## 1. Why The Engine Executes Canonical Definitions

The workflow corpus is now fully declarative and canonicalizable.

That changes the best runtime strategy:

- authored C# remains the source of truth
- canonical definition becomes the runtime execution contract
- the engine interprets canonical definitions directly

This gives the platform:

- deterministic runtime behavior
- shared semantics between export/import and execution
- less runtime coupling to workflow-specific CLR delegates
- a clean separation between authoring and execution


## 2. Definition Lifecycle

### 2.1 Authoring

Workflows are authored in C# through the declarative DSL.

### 2.2 Normalization

At service startup, each workflow registration is normalized into:

1. workflow registration metadata
2. canonical workflow definition
3. required module set
4. function usage metadata

### 2.3 Validation

The runtime should validate canonical definitions before accepting them for execution.

Recommended startup modes:

- `Strict`
  Startup fails if a definition is invalid.
- `Warn`
  Startup succeeds, but invalid definitions are marked unavailable.

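
The two startup modes can be sketched as a small policy function. This is a minimal illustration, not the engine's actual API: `ValidationStartupMode`, `ValidationResult`, and `StartupValidation` are hypothetical names, and the workflow version is assumed to be a string.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names; the document only specifies the two modes and their effect.
public enum ValidationStartupMode { Strict, Warn }

public sealed record ValidationResult(
    string WorkflowName,
    string WorkflowVersion,
    IReadOnlyList<string> Errors)
{
    public bool IsValid => Errors.Count == 0;
}

public static class StartupValidation
{
    // Returns the definitions that must be marked unavailable.
    // In Strict mode the first invalid definition aborts startup instead.
    public static IReadOnlySet<(string Name, string Version)> Apply(
        ValidationStartupMode mode,
        IEnumerable<ValidationResult> results)
    {
        var unavailable = new HashSet<(string Name, string Version)>();
        foreach (var result in results)
        {
            if (result.IsValid) continue;

            if (mode == ValidationStartupMode.Strict)
            {
                throw new InvalidOperationException(
                    $"Invalid workflow definition {result.WorkflowName}@{result.WorkflowVersion}: " +
                    string.Join("; ", result.Errors));
            }

            unavailable.Add((result.WorkflowName, result.WorkflowVersion));
        }
        return unavailable;
    }
}
```

In `Warn` mode the returned set would be consulted when a start request arrives, so an invalid definition is rejected per request rather than blocking the whole service.
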

### 2.4 Runtime Cache

The engine should cache canonical runtime definitions in memory by:

- workflow name
- workflow version

This cache is immutable after startup in v1.

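
One way to get a cache that cannot change after startup is to build it once into a frozen dictionary keyed by name and version. The sketch below assumes .NET 8 or later for `System.Collections.Frozen`; `DefinitionKey`, `RuntimeDefinition`, and `RuntimeDefinitionCache` are hypothetical names, and the string-typed version is an assumption.

```csharp
using System.Collections.Frozen;
using System.Collections.Generic;

// Composite cache key: workflow name plus workflow version.
public readonly record struct DefinitionKey(string WorkflowName, string WorkflowVersion);

// Placeholder for the compiled runtime definition; real content is omitted here.
public sealed record RuntimeDefinition(DefinitionKey Key);

public sealed class RuntimeDefinitionCache
{
    private readonly FrozenDictionary<DefinitionKey, RuntimeDefinition> _definitions;

    // Built once at startup; no mutation API is exposed afterwards.
    public RuntimeDefinitionCache(IEnumerable<RuntimeDefinition> definitions)
        => _definitions = definitions.ToFrozenDictionary(d => d.Key);

    public bool TryGet(string name, string version, out RuntimeDefinition? definition)
        => _definitions.TryGetValue(new DefinitionKey(name, version), out definition);
}
```
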

## 3. Canonical Runtime Definition Shape

The runtime definition should be treated as a compiled, execution-ready representation of the canonical contracts, not a raw JSON document.

The runtime model should contain:

- definition identity
- display metadata
- required modules
- step graph
- task declarations
- expression trees
- transport declarations
- subworkflow declarations
- continue-with declarations

## 4. Execution Context Model

The interpreter should run every step against a single canonical execution context.

Recommended execution context fields:

- `WorkflowName`
- `WorkflowVersion`
- `WorkflowInstanceId`
- `BusinessReference`
- `State`
- `StartPayload`
- `CompletionPayload`
- `CurrentTask`
- `CurrentSignal`
- `FunctionRuntime`
- `TransportDispatcher`
- `RuntimeMetadata`

`RuntimeMetadata` should hold:

- node id
- current signal id
- snapshot version
- waiting token
- execution started at

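
As a rough sketch, the context could be expressed as a pair of records. The property names follow the lists above; every type choice is an assumption (string identifiers, dictionary-shaped state and payloads, string placeholders for the current task and signal), and `IFunctionRuntime` and `ITransportDispatcher` are empty placeholder interfaces. The type is named `WorkflowExecutionContext` here only to avoid clashing with `System.Threading.ExecutionContext`.

```csharp
using System;
using System.Collections.Generic;

// Placeholder abstractions; their members are not specified in this document.
public interface IFunctionRuntime { }
public interface ITransportDispatcher { }

public sealed record RuntimeMetadata(
    string NodeId,
    string? CurrentSignalId,
    long SnapshotVersion,
    string? WaitingToken,
    DateTimeOffset ExecutionStartedAt);

public sealed record WorkflowExecutionContext(
    string WorkflowName,
    string WorkflowVersion,
    string WorkflowInstanceId,
    string? BusinessReference,
    IDictionary<string, object?> State,                       // business state
    IReadOnlyDictionary<string, object?> StartPayload,
    IReadOnlyDictionary<string, object?>? CompletionPayload,  // set on task completion
    string? CurrentTask,
    string? CurrentSignal,
    IFunctionRuntime FunctionRuntime,
    ITransportDispatcher TransportDispatcher,
    RuntimeMetadata RuntimeMetadata);
```
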

## 5. Core Runtime State Model

The runtime must distinguish between:

- business state
- engine state

### 5.1 Business State

Business state is what the workflow author reasons about.

Examples:

- `srPolicyId`
- `policySubstatus`
- customer lookup state
- payload shaping outputs
- subworkflow results

### 5.2 Engine State

Engine state is what the runtime needs to resume correctly.

Examples:

- current workflow status
- current wait type
- current wait token
- active task identity
- resume pointer
- subworkflow frame stack
- outstanding timer descriptors
- last processed signal id

Business state must remain visible in runtime inspection.
Engine state must remain safe and deterministic for resume.

## 6. Run-To-Wait Execution Model

The engine uses a run-to-wait interpreter.

This means:

1. load snapshot
2. execute sequentially
3. stop when a durable wait boundary is reached
4. persist resulting snapshot
5. release instance

Wait boundaries are:

- human task activation
- scheduled timer
- external signal wait
- child workflow wait
- terminal completion

This model is essential for:

- multi-instance safety
- restart recovery
- no sticky ownership
- no in-memory correctness assumptions

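
The five steps of one execution slice can be sketched as follows. This is a deliberately simplified illustration: steps form a flat list rather than a step graph, an `IDictionary` stands in for the durable snapshot store, business state persistence is omitted, and all type names (`WaitKind`, `Snapshot`, `Step`, `RunToWait`) are hypothetical.

```csharp
using System.Collections.Generic;

public enum WaitKind { None, HumanTask, Timer, ExternalSignal, ChildWorkflow, Completed }

// Minimal engine state: version, where to resume, and why the slice stopped.
public sealed record Snapshot(long Version, int NextStepIndex, WaitKind Wait);

// A step returns the wait boundary it reached, or WaitKind.None to keep going.
public delegate WaitKind Step(IDictionary<string, object?> state);

public static class RunToWait
{
    public static Snapshot Execute(
        string instanceId,
        IReadOnlyList<Step> steps,
        IDictionary<string, object?> state,
        IDictionary<string, Snapshot> store) // stand-in for durable storage
    {
        var snapshot = store[instanceId];            // 1. load snapshot
        var index = snapshot.NextStepIndex;
        var wait = WaitKind.None;

        while (wait == WaitKind.None)                // 2. execute sequentially
        {
            if (index >= steps.Count)
            {
                wait = WaitKind.Completed;           // terminal completion
                break;
            }
            wait = steps[index](state);              // 3. stop at a wait boundary
            index++;
        }

        var next = new Snapshot(snapshot.Version + 1, index, wait);
        store[instanceId] = next;                    // 4. persist resulting snapshot
        return next;                                 // 5. caller releases the instance
    }
}
```

Because the only thing carried between slices is the persisted snapshot, any node can pick up the next slice, which is what removes sticky ownership and in-memory correctness assumptions.
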

## 7. Step Semantics

### 7.1 State Assignment

State assignment is immediate and local to the current execution transaction.

The engine:

- evaluates the assignment expression
- writes to the business state dictionary
- keeps changes in memory until the next durable checkpoint

### 7.2 Business Reference Assignment

Business reference assignment updates the canonical business reference attached to:

- the runtime snapshot
- new tasks
- instance projection updates

Business reference changes must be applied transactionally with other execution results.

### 7.3 Human Task Activation

A human task activation step is a terminal wait boundary.

The interpreter does not continue past it in the same execution.

The result of task activation is:

- one active task projection
- updated instance status
- updated runtime snapshot
- optional runtime metadata for the active task

### 7.4 Transport Call

Transport calls are synchronous from the perspective of a single execution slice.

The engine:

- evaluates payload expressions
- dispatches through the correct transport adapter
- captures result payload
- stores result under the result key when present
- chooses the success, failure, or timeout branch

No engine-specific callback registration should be required for normal synchronous transport calls.

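
A transport call step could look roughly like the sketch below. The payload is assumed to be already evaluated, adapters are modeled as plain delegates keyed by transport name, and the timeout branch is detected with a `CancellationTokenSource`; none of these choices come from the engine itself, and `TransportCallStep`, `TransportOutcome`, and `TransportCall` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public enum TransportOutcome { Success, Failure, Timeout }

public sealed record TransportCallStep(string Transport, string? ResultKey, TimeSpan Timeout);

public static class TransportCall
{
    public static async Task<TransportOutcome> ExecuteAsync(
        TransportCallStep step,
        object? payload, // result of evaluating the payload expressions
        IReadOnlyDictionary<string, Func<object?, CancellationToken, Task<object?>>> adapters,
        IDictionary<string, object?> state)
    {
        using var cts = new CancellationTokenSource(step.Timeout);
        try
        {
            // Dispatch through the adapter registered for this transport.
            var result = await adapters[step.Transport](payload, cts.Token);

            // Store the result only when the step declares a result key.
            if (step.ResultKey is not null) state[step.ResultKey] = result;
            return TransportOutcome.Success;
        }
        catch (OperationCanceledException) when (cts.IsCancellationRequested)
        {
            return TransportOutcome.Timeout;
        }
        catch (Exception)
        {
            return TransportOutcome.Failure;
        }
    }
}
```

The returned outcome is what the interpreter would use to select the success, failure, or timeout branch; the call is awaited inside the slice, so no callback registration is involved.
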

### 7.5 Conditional Branch

Conditions evaluate against the current execution context.

Only one branch is executed.

The branch path must be reproducible in the resume pointer model.

### 7.6 Repeat

Repeat executes logically as:

- evaluate collection or repeat source
- for each iteration:
  - bind iteration context
  - execute nested sequence

If an iteration hits a wait boundary, the engine snapshot must preserve:

- repeat step id
- iteration index
- remaining resume location inside the iteration body

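
The three preserved values can be sketched as a small frame that the loop both produces and resumes from. This is an illustration under simplifying assumptions: body steps are delegates that return `true` when they hit a wait boundary, and `RepeatFrame` and `RepeatExecution` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// What the snapshot keeps when a repeat body stops at a wait boundary.
public sealed record RepeatFrame(string RepeatStepId, int IterationIndex, int BodyStepIndex);

public static class RepeatExecution
{
    // Returns null when all iterations finished, or the frame to persist
    // when a body step reported a wait boundary.
    public static RepeatFrame? Run<T>(
        string repeatStepId,
        IReadOnlyList<T> items,
        IReadOnlyList<Func<T, int, bool>> body,
        RepeatFrame? resumeFrom = null)
    {
        var startIteration = resumeFrom?.IterationIndex ?? 0;
        var startStep = resumeFrom?.BodyStepIndex ?? 0;

        for (var i = startIteration; i < items.Count; i++)
        {
            // Only the resumed iteration starts mid-body; later ones start at step 0.
            for (var s = i == startIteration ? startStep : 0; s < body.Count; s++)
            {
                var waited = body[s](items[i], i);
                if (waited)
                {
                    // Resume after the step that produced the wait.
                    return new RepeatFrame(repeatStepId, i, s + 1);
                }
            }
        }
        return null;
    }
}
```
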

### 7.7 Subworkflow Invocation

Subworkflow invocation is a wait boundary unless the child completes inline before producing a wait.

Parent snapshot must record:

- child workflow identity
- child workflow version
- parent business reference
- parent resume pointer
- target result key
- parent workflow state needed for resume

### 7.8 Continue-With

Continue-with creates a new workflow start request as an engine side effect.

It is not a resume boundary for the current instance unless explicitly modeled that way by the workflow.

## 8. Resume Model

### 8.1 Resume Pointer

The engine must persist a deterministic resume pointer.

It should identify:

- entry point kind
- task name if resuming from task completion
- branch path
- next step index
- repeat iteration where applicable

The existing declarative resume model is the right conceptual baseline, but the engine should persist it inside the canonical runtime snapshot rather than inside a CLR-only execution flow.

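
A resume pointer that lives in the snapshot has to round-trip through serialization. The sketch below uses `System.Text.Json` with a positional record; the `EntryPointKind` members, the string-array branch path, and the type names are assumptions made for illustration.

```csharp
using System.Text.Json;

// Assumed set of entry point kinds; the document does not enumerate them.
public enum EntryPointKind { Start, TaskCompletion, Signal, Timer, ChildCompletion }

public sealed record ResumePointer(
    EntryPointKind EntryPoint,
    string? TaskName,        // set only when resuming from task completion
    string[] BranchPath,     // branch segments from the root sequence downwards
    int NextStepIndex,
    int? RepeatIteration);   // set only inside a repeat body

public static class ResumePointerSerializer
{
    public static string Serialize(ResumePointer pointer)
        => JsonSerializer.Serialize(pointer);

    public static ResumePointer Deserialize(string json)
        => JsonSerializer.Deserialize<ResumePointer>(json)
           ?? throw new JsonException("Resume pointer payload was null.");
}
```
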

### 8.2 Waiting Token

Every durable wait must have a waiting token.

The waiting token is how the engine prevents stale resumes.

When a signal arrives:

- if the waiting token does not match the snapshot, the signal is stale and must be ignored safely

This is the primary guard for:

- canceled timers
- duplicate wake-ups
- late child completions
- redelivered signals

### 8.3 Version

Every successful execution commit must increment snapshot version.

Signals may carry the expected version that created the wait.

This allows the engine to detect stale work before any mutation.

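
The token and version checks can be combined into a single guard that runs before any mutation. This is a sketch with hypothetical types (`WaitState`, `ResumeSignal`, `StaleSignalGuard`); it treats the expected version as optional, matching "may carry" above.

```csharp
using System;

// The wait currently recorded in the snapshot.
public sealed record WaitState(string WaitingToken, long SnapshotVersion);

// An incoming wake-up; ExpectedVersion is optional.
public sealed record ResumeSignal(string SignalId, string WaitingToken, long? ExpectedVersion);

public static class StaleSignalGuard
{
    // True only when the signal targets the wait the snapshot is actually in.
    public static bool ShouldProcess(WaitState? current, ResumeSignal signal)
    {
        // No active wait: nothing to resume.
        if (current is null) return false;

        // Token mismatch: canceled timer, duplicate wake-up, late child, redelivery.
        if (!string.Equals(current.WaitingToken, signal.WaitingToken, StringComparison.Ordinal))
            return false;

        // Version mismatch: the snapshot moved on after the wait was created.
        if (signal.ExpectedVersion is long expected && expected != current.SnapshotVersion)
            return false;

        return true;
    }
}
```

A stale signal is acknowledged and dropped rather than raised as an error, so redelivery stays harmless.
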

## 9. Human Task Model

The task model remains projection-first.

The runtime does not wait on an in-memory task object.

Instead:

- task activation writes a task projection row
- runtime snapshot enters `WaitingForTaskCompletion`
- task completion API provides the wake-up event

Task completion is therefore an external signal into the engine.

## 10. Error Model

The interpreter should classify errors into:

- definition errors
- expression evaluation errors
- transport errors
- timeout errors
- authorization errors
- engine consistency errors

Definition errors are startup or validation failures.
Execution errors are runtime failures that may:

- route into a failure branch
- schedule a retry
- fail the workflow
- move the instance to a recoverable error state

## 11. Retry Model

Retries should be modeled explicitly as scheduled signals.

The engine should not sleep inside a worker.

A retry should:

1. persist the failure context
2. generate a new waiting token
3. enqueue a delayed resume signal
4. commit

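
The four retry steps can be sketched as a pure function that produces the next snapshot and the delayed signal, leaving the atomic commit to the caller. `RetrySnapshot`, `DelayedSignal`, and `RetryScheduling` are hypothetical names, and the failure context is reduced to the exception message for brevity. The current time is passed in rather than read from the clock, in line with the determinism requirements in section 13.

```csharp
using System;

public sealed record DelayedSignal(string InstanceId, string WaitingToken, DateTimeOffset DueAt);

public sealed record RetrySnapshot(
    long Version,
    string? WaitingToken,
    string? FailureContext,
    int Attempt);

public static class RetryScheduling
{
    public static (RetrySnapshot Snapshot, DelayedSignal Signal) Schedule(
        string instanceId,
        RetrySnapshot current,
        Exception failure,
        TimeSpan delay,
        DateTimeOffset now) // supplied by the caller, not read inside the step
    {
        var token = Guid.NewGuid().ToString("N");          // 2. new waiting token

        var next = current with
        {
            Version = current.Version + 1,
            WaitingToken = token,
            FailureContext = failure.Message,              // 1. persist the failure context
            Attempt = current.Attempt + 1
        };

        var signal = new DelayedSignal(instanceId, token, now + delay); // 3. delayed resume signal

        return (next, signal);                             // 4. caller commits both together
    }
}
```

Because the retry is just another token-guarded wait, a worker restart between scheduling and delivery loses nothing, and a superseded retry signal is discarded as stale.
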

## 12. Completion Model

A workflow completes when the interpreter reaches terminal completion with no outstanding waits.

Completion result must:

- mark instance projection completed
- mark runtime state completed
- clear stale timeout metadata
- apply retention timing

## 13. Determinism Requirements

The runtime must assume:

- expressions are deterministic given the execution context
- transport calls are side effects and must be treated explicitly
- no hidden CLR delegate behavior remains in workflow definitions

The runtime should not rely on:

- non-deterministic local time calls inside step execution
- in-memory mutable workflow objects
- ambient state outside the canonical execution context

## 14. Resulting Implementation Shape

The engine kernel should be implemented as:

- definition normalizer
- canonical interpreter
- transport dispatcher
- execution coordinator
- resume serializer/deserializer

This produces a runtime that is small, explicit, and aligned with the already-completed full-declaration effort.