2026-03-20 19:14:44 +02:00

03. Canonical Execution Model

1. Why The Engine Executes Canonical Definitions

The workflow corpus is now fully declarative and canonicalizable.

That changes the best runtime strategy:

authored C# remains the source of truth
canonical definition becomes the runtime execution contract
the engine interprets canonical definitions directly

This gives the platform:

deterministic runtime behavior
shared semantics between export/import and execution
less runtime coupling to workflow-specific CLR delegates
a clean separation between authoring and execution

2. Definition Lifecycle

2.1 Authoring

Workflows are authored in C# through the declarative DSL.

2.2 Normalization

At service startup, each workflow registration is normalized into:

workflow registration metadata
canonical workflow definition
required module set
function usage metadata

2.3 Validation

The runtime should validate canonical definitions before accepting them for execution.

Recommended startup modes:

Strict: startup fails if any definition is invalid.
Warn: startup succeeds, but invalid definitions are marked unavailable.
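
The two startup modes can be sketched as follows. This is a minimal illustration in Python, not the engine's C# API: `StartupMode`, `DefinitionError`, `find_problems`, and the two checks it performs are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class StartupMode(Enum):
    STRICT = "strict"
    WARN = "warn"


class DefinitionError(Exception):
    """Raised in strict mode when a canonical definition is invalid."""


@dataclass(frozen=True)
class ValidationResult:
    available: tuple   # (name, version) keys accepted for execution
    unavailable: dict  # (name, version) -> list of problems


def find_problems(definition: dict) -> list:
    # Hypothetical checks; a real validator would cover far more.
    problems = []
    if not definition.get("name"):
        problems.append("missing workflow name")
    if not definition.get("steps"):
        problems.append("definition has no steps")
    return problems


def validate_at_startup(definitions: list, mode: StartupMode) -> ValidationResult:
    available, unavailable = [], {}
    for definition in definitions:
        key = (definition.get("name"), definition.get("version"))
        problems = find_problems(definition)
        if not problems:
            available.append(key)
        elif mode is StartupMode.STRICT:
            # Strict: one invalid definition aborts startup.
            raise DefinitionError(f"{key}: {'; '.join(problems)}")
        else:
            # Warn: keep starting, but remember why this one is unavailable.
            unavailable[key] = problems
    return ValidationResult(tuple(available), unavailable)
```

In warn mode the caller would refuse start requests for any key listed in `unavailable`.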

2.4 Runtime Cache

The engine should cache canonical runtime definitions in memory by:

workflow name
workflow version

This cache is immutable after startup in v1.
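
One way to picture a cache that is keyed by name and version and frozen after startup is the sketch below. It is written in Python for brevity and is not the engine's implementation; `DefinitionCache` and the dictionary-shaped definitions are assumptions.

```python
from types import MappingProxyType


class DefinitionCache:
    """Read-only lookup of runtime definitions keyed by (name, version)."""

    def __init__(self, definitions):
        entries = {}
        for definition in definitions:
            key = (definition["name"], definition["version"])
            if key in entries:
                raise ValueError(f"duplicate definition: {key}")
            entries[key] = definition
        # MappingProxyType exposes a read-only view of the dict,
        # so nothing can be added or replaced after construction.
        self._entries = MappingProxyType(entries)

    def get(self, name, version):
        # Returns None when the (name, version) pair is unknown.
        return self._entries.get((name, version))
```

Because the cache is built once and never mutated, concurrent readers need no locking.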

3. Canonical Runtime Definition Shape

The runtime definition should be treated as a compiled, execution-ready representation of the canonical contracts, not a raw JSON document.

The runtime model should contain:

definition identity
display metadata
required modules
step graph
task declarations
expression trees
transport declarations
subworkflow declarations
continue-with declarations

4. Execution Context Model

The interpreter should run every step against a single canonical execution context.

Recommended execution context fields:

WorkflowName
WorkflowVersion
WorkflowInstanceId
BusinessReference
State
StartPayload
CompletionPayload
CurrentTask
CurrentSignal
FunctionRuntime
TransportDispatcher
RuntimeMetadata

RuntimeMetadata should hold:

node id
current signal id
snapshot version
waiting token
execution started at
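
As a rough illustration of the shape described above, the context could be modeled as two plain records. The field names follow the lists in this section; the field types, the defaults, and the use of Python rather than the engine's C# are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class RuntimeMetadata:
    node_id: str
    current_signal_id: Optional[str]
    snapshot_version: int
    waiting_token: Optional[str]
    execution_started_at: str  # assumed to be supplied by the coordinator


@dataclass
class ExecutionContext:
    workflow_name: str
    workflow_version: str
    workflow_instance_id: str
    business_reference: Optional[str]
    runtime_metadata: RuntimeMetadata
    state: dict = field(default_factory=dict)          # business state
    start_payload: dict = field(default_factory=dict)
    completion_payload: Optional[dict] = None
    current_task: Optional[str] = None
    current_signal: Optional[dict] = None
    function_runtime: Any = None                       # expression function host
    transport_dispatcher: Any = None                   # routes transport calls
```

Every step receives the same object, so step handlers need no other source of ambient state.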

5. Core Runtime State Model

The runtime must distinguish between:

business state
engine state

5.1 Business State

Business state is what the workflow author reasons about.

Examples:

srPolicyId
policySubstatus
customer lookup state
payload shaping outputs
subworkflow results

5.2 Engine State

Engine state is what the runtime needs to resume correctly.

Examples:

current workflow status
current wait type
current wait token
active task identity
resume pointer
subworkflow frame stack
outstanding timer descriptors
last processed signal id

Business state must remain visible in runtime inspection. Engine state must remain safe and deterministic for resume.

6. Run-To-Wait Execution Model

The engine uses a run-to-wait interpreter.

This means:

load snapshot
execute sequentially
stop when a durable wait boundary is reached
persist resulting snapshot
release instance

Wait boundaries are:

human task activation
scheduled timer
external signal wait
child workflow wait
terminal completion

This model is essential for:

multi-instance safety
restart recovery
no sticky ownership
no in-memory correctness assumptions
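
The loop above can be sketched in a few lines. This is a simplified model in Python, not the engine's code: the snapshot and step dictionaries, the `assign` step kind, and the `WAIT_KINDS` names are assumptions, and loading, persisting, and releasing are left to the caller.

```python
WAIT_KINDS = {"human_task", "timer", "external_signal", "child_workflow", "complete"}


def run_to_wait(snapshot: dict, steps: list) -> dict:
    """Execute steps from the resume index until a wait boundary; return the new snapshot."""
    state = dict(snapshot["state"])      # work on a copy; nothing is shared in memory
    index = snapshot["next_step_index"]
    wait_kind = "complete"               # falling off the end is terminal completion
    while index < len(steps):
        step = steps[index]
        index += 1
        if step["kind"] == "assign":
            state[step["key"]] = step["value"]
        elif step["kind"] in WAIT_KINDS:
            wait_kind = step["kind"]     # durable wait boundary: stop here
            break
        else:
            raise ValueError(f"unknown step kind: {step['kind']}")
    return {
        "state": state,
        "next_step_index": index,
        "wait_kind": wait_kind,
        "version": snapshot["version"] + 1,
    }
```

Each call is one execution slice: any worker can load the snapshot, call the function, persist the result, and release the instance, which is why no sticky ownership is needed.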

7. Step Semantics

7.1 State Assignment

State assignment is immediate and local to the current execution transaction.

The engine:

evaluates the assignment expression
writes to the business state dictionary
keeps changes in memory until the next durable checkpoint

7.2 Business Reference Assignment

Business reference assignment updates the canonical business reference attached to:

the runtime snapshot
new tasks
instance projection updates

Business reference changes must be applied transactionally with other execution results.

7.3 Human Task Activation

A human task activation step is a wait boundary: it always ends the current execution slice.

The interpreter does not continue past it in the same execution.

The result of task activation is:

one active task projection
updated instance status
updated runtime snapshot
optional runtime metadata for the active task

7.4 Transport Call

Transport calls are synchronous from the perspective of a single execution slice.

The engine:

evaluates payload expressions
dispatches through the correct transport adapter
captures result payload
stores result under the result key when present
chooses the success, failure, or timeout branch

No engine-specific callback registration should be required for normal synchronous transport calls.
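
A synchronous transport step might look like the sketch below. It is illustrative Python, not the engine's API: the step layout, the `dispatch` callable, the use of the built-in `TimeoutError` to signal a timeout, and the simplification of payload expressions to plain state lookups are all assumptions.

```python
def execute_transport_call(step: dict, state: dict, dispatch) -> str:
    """Run one transport call and return the name of the branch to follow."""
    # Payload "expressions" are reduced to state lookups in this sketch.
    payload = {key: state.get(source) for key, source in step["payload"].items()}
    try:
        result = dispatch(step["transport"], payload)
    except TimeoutError:
        return "timeout"
    except Exception:
        return "failure"
    result_key = step.get("result_key")
    if result_key:
        state[result_key] = result       # store only when a result key is declared
    return "success"
```

The call blocks inside the slice and returns a branch name, so the interpreter needs no callback registration to continue.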

7.5 Conditional Branch

Conditions evaluate against the current execution context.

Only one branch is executed.

The branch path must be reproducible in the resume pointer model.

7.6 Repeat

Repeat executes logically as:

evaluate collection or repeat source
for each iteration:
- bind iteration context
- execute nested sequence

If an iteration hits a wait boundary, the engine snapshot must preserve:

repeat step id
iteration index
remaining resume location inside the iteration body
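
The resume requirement can be illustrated with a small sketch. It is Python for brevity and not the engine's code: the body is a list of step names, the literal step `"wait"` stands in for any wait boundary, and the repeat step id is omitted because only one repeat is modeled.

```python
def run_repeat(items: list, body: list, state: dict,
               start_iteration: int = 0, start_step: int = 0):
    """Run a repeat; return None when finished, or a resume location at a wait boundary."""
    for iteration in range(start_iteration, len(items)):
        state["item"] = items[iteration]             # bind iteration context
        # Only the resumed iteration starts mid-body; later ones start at step 0.
        first = start_step if iteration == start_iteration else 0
        for step_index in range(first, len(body)):
            step = body[step_index]
            if step == "wait":
                # Resume after the wait step, inside the same iteration.
                return {"iteration": iteration, "next_step": step_index + 1}
            state.setdefault("log", []).append(f"{step}:{items[iteration]}")
    return None
```

Passing the returned `iteration` and `next_step` back in on the next slice continues exactly where the previous slice stopped.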

7.7 Subworkflow Invocation

Subworkflow invocation is a wait boundary unless the child completes inline before producing a wait.

Parent snapshot must record:

child workflow identity
child workflow version
parent business reference
parent resume pointer
target result key
parent workflow state needed for resume

7.8 Continue-With

Continue-with creates a new workflow start request as an engine side effect.

It is not a wait boundary for the current instance unless explicitly modeled that way by the workflow.

8. Resume Model

8.1 Resume Pointer

The engine must persist a deterministic resume pointer.

It should identify:

entry point kind
task name if resuming from task completion
branch path
next step index
repeat iteration where applicable

The existing declarative resume model is the right conceptual baseline, but the engine should persist it inside the canonical runtime snapshot rather than inside a CLR-only execution flow.
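
A resume pointer with the fields listed above could be persisted as plain data, for example as JSON inside the snapshot. The sketch below is illustrative Python; the field types, the JSON encoding, and the `serialize`/`deserialize` helpers are assumptions, not the engine's serializer.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class ResumePointer:
    entry_point_kind: str
    task_name: Optional[str]          # set only when resuming from task completion
    branch_path: tuple                # ordered branch names from the root
    next_step_index: int
    repeat_iteration: Optional[int] = None


def serialize(pointer: ResumePointer) -> str:
    # sort_keys keeps the output byte-stable for the same pointer.
    return json.dumps(asdict(pointer), sort_keys=True)


def deserialize(text: str) -> ResumePointer:
    data = json.loads(text)
    data["branch_path"] = tuple(data["branch_path"])  # JSON arrays come back as lists
    return ResumePointer(**data)
```

Because the pointer is pure data, it round-trips without any reference to CLR delegates or in-memory objects.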

8.2 Waiting Token

Every durable wait must have a waiting token.

The waiting token is how the engine prevents stale resumes.

When a signal arrives:

if the waiting token does not match the snapshot, the signal is stale and must be ignored safely

This is the primary guard for:

canceled timers
duplicate wake-ups
late child completions
redelivered signals
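
The token check itself is small. The following Python sketch assumes dictionary-shaped snapshots and signals and a UUID-based token; none of these names come from the engine.

```python
import uuid


def new_waiting_token() -> str:
    """Create a fresh token each time the instance enters a durable wait."""
    return uuid.uuid4().hex


def accept_signal(snapshot: dict, signal: dict) -> bool:
    """Return True only if the signal targets the wait the snapshot is currently in."""
    expected = snapshot.get("waiting_token")
    return expected is not None and signal.get("waiting_token") == expected
```

After a resume, the next wait gets a new token, so a redelivered or late signal that still carries the old token is rejected without touching state.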

8.3 Version

Every successful execution commit must increment the snapshot version.

Signals may carry the expected version that created the wait.

This allows the engine to detect stale work before any mutation.
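
One common way to enforce this is a compare-and-swap on commit, sketched below in Python. `SnapshotStore` is a hypothetical in-memory stand-in for a durable store, and treating a version mismatch as an exception is an assumption about how stale work would be surfaced.

```python
class StaleSnapshotError(Exception):
    """The snapshot changed since the caller loaded it."""


class SnapshotStore:
    """In-memory stand-in for a durable snapshot store (hypothetical)."""

    def __init__(self):
        self._snapshots = {}

    def load(self, instance_id):
        return self._snapshots.get(instance_id)

    def commit(self, instance_id, snapshot, expected_version):
        current = self._snapshots.get(instance_id)
        current_version = current["version"] if current else 0
        if current_version != expected_version:
            # Someone else committed first; reject before any mutation.
            raise StaleSnapshotError(
                f"expected {expected_version}, found {current_version}")
        self._snapshots[instance_id] = {**snapshot, "version": expected_version + 1}
        return expected_version + 1
```

A signal that carries the version from when its wait was created can be compared against the loaded snapshot in the same way, before the interpreter runs at all.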

9. Human Task Model

The task model remains projection-first.

The runtime does not wait on an in-memory task object.

Instead:

task activation writes a task projection row
runtime snapshot enters WaitingForTaskCompletion
task completion API provides the wake-up event

Task completion is therefore an external signal into the engine.
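
The projection-first flow can be sketched as two functions. This is illustrative Python: the row fields, the list used as a stand-in for the task projection table, and the decision to store the waiting token on the task row are assumptions.

```python
import uuid


def activate_task(snapshot: dict, task_name: str, task_rows: list) -> dict:
    """Write the task projection row and return the snapshot in its waiting state."""
    token = uuid.uuid4().hex
    task_rows.append({
        "task_id": uuid.uuid4().hex,
        "instance_id": snapshot["instance_id"],
        "name": task_name,
        "status": "Active",
        "waiting_token": token,
    })
    return {**snapshot, "status": "WaitingForTaskCompletion", "waiting_token": token}


def complete_task(task_row: dict, payload: dict) -> dict:
    """Task completion API: close the row and build the wake-up signal for the engine."""
    task_row["status"] = "Completed"
    return {
        "kind": "task_completed",
        "instance_id": task_row["instance_id"],
        "waiting_token": task_row["waiting_token"],
        "payload": payload,
    }
```

Nothing in memory links the two calls; the row and the snapshot carry everything the engine needs to resume.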

10. Error Model

The interpreter should classify errors into:

definition errors
expression evaluation errors
transport errors
timeout errors
authorization errors
engine consistency errors

Definition errors are startup or validation failures. Execution errors are runtime failures that may:

route into a failure branch
schedule a retry
fail the workflow
move the instance to a recoverable error state

11. Retry Model

Retries should be modeled explicitly as scheduled signals.

The engine should not sleep inside a worker.

A retry should:

persist the failure context
generate a new waiting token
enqueue a delayed resume signal
commit
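
The four steps above might be sketched like this. It is illustrative Python: the exponential backoff, the `base_delay` default, and the snapshot and signal field names are assumptions, and the current time is passed in rather than read from a local clock.

```python
import uuid


def schedule_retry(snapshot: dict, error: str, attempt: int, now: float,
                   base_delay: float = 5.0):
    """Return (new_snapshot, delayed_signal); the caller commits both together."""
    token = uuid.uuid4().hex                       # new token invalidates older wake-ups
    delay = base_delay * (2 ** (attempt - 1))      # exponential backoff is an assumption
    new_snapshot = {
        **snapshot,
        "failure_context": {"error": error, "attempt": attempt},
        "waiting_token": token,
        "wait_kind": "retry",
    }
    signal = {"kind": "retry", "waiting_token": token, "due_at": now + delay}
    return new_snapshot, signal
```

The worker returns immediately after the commit; the delayed signal, not a sleeping thread, brings the instance back.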

12. Completion Model

A workflow completes when the interpreter reaches terminal completion with no outstanding waits.

Completion result must:

mark instance projection completed
mark runtime state completed
clear stale timeout metadata
apply retention timing

13. Determinism Requirements

The runtime must assume:

expressions are deterministic given the execution context
transport calls are side effects and must be treated explicitly
no hidden CLR delegate behavior remains in workflow definitions

The runtime should not rely on:

non-deterministic local time calls inside step execution
in-memory mutable workflow objects
ambient state outside the canonical execution context

14. Resulting Implementation Shape

The engine kernel should be implemented as:

definition normalizer
canonical interpreter
transport dispatcher
execution coordinator
resume serializer/deserializer

This produces a runtime that is small, explicit, and aligned with the already-completed full-declaration effort.
