git.stella-ops.org/docs/workflow/engine/02-runtime-and-component-architecture.md


# 02. Runtime And Component Architecture
## 1. Top-Level System View
At the highest level, the service contains six product-facing areas:
1. definition and canonical catalog
2. start and task APIs
3. engine execution runtime
4. durable state and read projections
5. signaling and scheduling
6. operational services
The engine replaces the Elsa-dependent runtime area, not the whole product.
## 2. Top-Level Components
### 2.1 API Layer
Responsibilities:
- expose workflow endpoints
- validate user input
- call `WorkflowRuntimeService`
- preserve current contract shape
Examples in the current service:
- workflow start endpoint
- task get/list endpoints
- task assign/release/complete endpoints
- instance get/list endpoints
- canonical schema and validation endpoints
### 2.2 Product Orchestration Layer
Responsibilities:
- resolve workflow registration and definition
- enforce service-level flow for start and task completion
- update read projections
- call runtime provider
- persist runtime snapshot metadata
- start continuations
The current workflow runtime service remains the product orchestrator in v1.
### 2.3 Runtime Provider Layer
Responsibilities:
- provide a stable execution interface
- hide the concrete runtime implementation
- allow a future backend swap without changing service-level behavior
Proposed abstraction:
```csharp
public interface IWorkflowRuntimeProvider
{
    string ProviderName { get; }

    Task<WorkflowRuntimeExecutionResult> StartAsync(
        WorkflowRegistration registration,
        WorkflowDefinitionDescriptor definition,
        WorkflowBusinessReference? businessReference,
        StartWorkflowRequest request,
        object startRequest,
        CancellationToken cancellationToken = default);

    Task<WorkflowRuntimeExecutionResult> CompleteAsync(
        WorkflowRegistration registration,
        WorkflowDefinitionDescriptor definition,
        WorkflowTaskExecutionContext context,
        CancellationToken cancellationToken = default);
}
```
In v1, one provider is active per deployment:
- `SerdicaEngineRuntimeProvider`
The abstraction still exists so the backend can change later.
### 2.4 Canonical Execution Layer
Responsibilities:
- execute canonical definitions
- evaluate expressions
- drive state transitions
- activate tasks
- invoke transports
- persist wait state
- emit signals and schedules
This is the actual engine kernel.
### 2.5 Persistence Layer
Responsibilities:
- store runtime snapshots
- store instance projections
- store task projections
- store task events
- coordinate host-owned jobs and workers
The current baseline uses one workflow database model plus one projection application service for product-facing reads.
### 2.6 Signal And Schedule Layer
Responsibilities:
- deliver immediate wake-up signals
- deliver delayed wake-up signals
- support blocking receive
- support durable retry and dead-letter handling
Default backend:
- Oracle AQ
### 2.7 Operational Layer
Responsibilities:
- retention
- dead-letter handling
- metrics
- tracing
- runtime diagnostics
- workflow diagram projection
## 3. Mid-Level Runtime Structure
The engine should be decomposed into the following internal runtime components.
### 3.1 Definition Normalizer
Purpose:
- take authored workflow registrations
- compile them into canonical runtime definitions
- validate the definitions
- cache them for execution
Responsibilities:
- call canonical compiler
- call canonical validator
- fail startup when configured to require valid definitions
- expose resolved runtime definitions by workflow name/version
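
The compile, validate, and cache steps above can be sketched roughly as follows. This is an illustration only: `AuthoredWorkflow`, `CanonicalDefinition`, and `DefinitionNormalizer` are hypothetical stand-ins, and the trivial `Compile` and `Validate` methods only mark where the real canonical compiler and validator would be called.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-ins for an authored registration and its compiled form.
public sealed record AuthoredWorkflow(string Name, string Version, IReadOnlyList<string> Steps);
public sealed record CanonicalDefinition(string Name, string Version, IReadOnlyList<string> Steps);

public sealed class DefinitionNormalizer
{
    private readonly ConcurrentDictionary<(string Name, string Version), CanonicalDefinition> _cache = new();
    private readonly bool _requireValid;

    public DefinitionNormalizer(bool requireValidDefinitions) => _requireValid = requireValidDefinitions;

    // Compiles, validates, and caches each registration; returns the validation errors found.
    public IReadOnlyList<string> Load(IEnumerable<AuthoredWorkflow> registrations)
    {
        var errors = new List<string>();
        foreach (var authored in registrations)
        {
            var definition = Compile(authored);
            var problems = Validate(definition);
            if (problems.Count > 0)
            {
                errors.AddRange(problems);
                continue; // invalid definitions are never cached
            }
            _cache[(definition.Name, definition.Version)] = definition;
        }

        // Fail startup only when configured to require valid definitions.
        if (_requireValid && errors.Count > 0)
            throw new InvalidOperationException("Invalid workflow definitions: " + string.Join("; ", errors));
        return errors;
    }

    // Resolves a cached runtime definition by workflow name and version.
    public CanonicalDefinition? Resolve(string name, string version)
        => _cache.TryGetValue((name, version), out var definition) ? definition : null;

    // Placeholder for the canonical compiler (assumption: a 1:1 mapping).
    private static CanonicalDefinition Compile(AuthoredWorkflow authored)
        => new(authored.Name.Trim(), authored.Version.Trim(), authored.Steps);

    // Placeholder for the canonical validator (assumption: only checks for steps).
    private static List<string> Validate(CanonicalDefinition definition)
    {
        var problems = new List<string>();
        if (definition.Steps.Count == 0)
            problems.Add($"{definition.Name}/{definition.Version}: no steps");
        return problems;
    }
}
```

Keeping invalid definitions out of the cache means a later `Resolve` call cannot hand a broken definition to the interpreter, even when startup is allowed to continue.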
### 3.2 Execution Coordinator
Purpose:
- provide the single in-process entry point for runtime execution
Responsibilities:
- load current snapshot
- acquire the right to execute through a version check or a row lock
- invoke interpreter
- collect engine side effects
- persist snapshot changes
- update projections
- enqueue signals or schedules
- commit transaction
### 3.3 Canonical Interpreter
Purpose:
- interpret canonical steps until the next wait boundary
Responsibilities:
- evaluate canonical expressions
- handle step sequencing
- handle branching and repeat loops
- activate human tasks
- invoke transport adapters
- enter wait states
- resume from wait states
- manage subworkflow frames
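
To make "interpret until the next wait boundary" concrete, here is a minimal sketch. The step kinds, `MiniInterpreter`, and `RunResult` are hypothetical. A real interpreter would also evaluate expressions, branch, loop, call transports, and manage subworkflow frames, all of which are omitted here.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical, heavily reduced step model.
public enum StepKind { Assign, HumanTask, TimerWait }
public sealed record Step(StepKind Kind, string Name);

public enum Outcome { Waiting, Completed }
public sealed record RunResult(Outcome Outcome, int ResumeAt, string? WaitingOn);

public static class MiniInterpreter
{
    // Runs steps from startAt until one needs outside input or the list ends.
    public static RunResult Run(IReadOnlyList<Step> steps, int startAt, IDictionary<string, object?> state)
    {
        for (var i = startAt; i < steps.Count; i++)
        {
            var step = steps[i];
            switch (step.Kind)
            {
                case StepKind.Assign:
                    state[step.Name] = true; // synchronous step: keep going
                    break;

                case StepKind.HumanTask:
                case StepKind.TimerWait:
                    // Wait boundary: record the resume point and hand control back.
                    return new RunResult(Outcome.Waiting, i + 1, step.Name);

                default:
                    throw new InvalidOperationException($"Unknown step kind {step.Kind}");
            }
        }
        return new RunResult(Outcome.Completed, steps.Count, null);
    }
}
```

The caller persists `ResumeAt` with the snapshot and passes it back as `startAt` when the task completes or the timer fires, which is how the sketch models "resume from wait states".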
### 3.4 Expression Runtime
Purpose:
- evaluate canonical expressions so that runtime results match what validation expects
Responsibilities:
- use core function catalog
- use plugin function catalog
- evaluate against the canonical execution context
Current design baseline:
- one canonical expression runtime
- one core function catalog
- zero or more plugin-provided function catalogs
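
One way to combine a core catalog with plugin catalogs is an ordered lookup, sketched below. `IFunctionCatalog`, `DictionaryCatalog`, and `ExpressionRuntime` are hypothetical names. The rule that the core catalog is consulted first, so plugins cannot shadow core functions, is an assumption of this sketch and is not stated by the design.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical catalog contract: look a function up by name.
public interface IFunctionCatalog
{
    bool TryGet(string name, out Func<object?[], object?>? function);
}

public sealed class DictionaryCatalog : IFunctionCatalog
{
    private readonly Dictionary<string, Func<object?[], object?>> _functions;

    public DictionaryCatalog(Dictionary<string, Func<object?[], object?>> functions)
        => _functions = functions;

    public bool TryGet(string name, out Func<object?[], object?>? function)
        => _functions.TryGetValue(name, out function);
}

public sealed class ExpressionRuntime
{
    // Core catalog first, then plugins in registration order.
    private readonly IReadOnlyList<IFunctionCatalog> _catalogs;

    public ExpressionRuntime(IFunctionCatalog core, params IFunctionCatalog[] plugins)
        => _catalogs = new[] { core }.Concat(plugins).ToList();

    // Calls the first function with the given name; unknown names are an error.
    public object? Call(string name, params object?[] args)
    {
        foreach (var catalog in _catalogs)
        {
            if (catalog.TryGet(name, out var function) && function is not null)
                return function(args);
        }
        throw new KeyNotFoundException($"Unknown function '{name}'");
    }
}
```

Because both validation and execution would go through the same `Call` path, a function that validates also resolves at runtime, which is the point of having a single expression runtime.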
### 3.5 Transport Dispatcher
Purpose:
- execute transport-backed steps through Serdica transport abstractions
Responsibilities:
- resolve transport type
- call the correct adapter
- normalize responses to canonical result objects
- route failure and timeout behavior back into the interpreter
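
The dispatcher responsibilities can be illustrated with the sketch below. `ITransportAdapter`, `DelegateAdapter`, `TransportResult`, and `TransportDispatcher` are hypothetical and do not reflect the actual Serdica transport abstractions. The sketch only shows adapter resolution by transport type and the folding of failures and timeouts into one result shape.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical canonical result handed back to the interpreter.
public sealed record TransportResult(bool Succeeded, object? Payload, string? Error);

public interface ITransportAdapter
{
    string TransportType { get; }
    Task<object?> InvokeAsync(object? request, CancellationToken ct);
}

// Small delegate-backed adapter, used here only to keep the example self-contained.
public sealed class DelegateAdapter : ITransportAdapter
{
    private readonly Func<object?, CancellationToken, Task<object?>> _invoke;

    public DelegateAdapter(string transportType, Func<object?, CancellationToken, Task<object?>> invoke)
    {
        TransportType = transportType;
        _invoke = invoke;
    }

    public string TransportType { get; }
    public Task<object?> InvokeAsync(object? request, CancellationToken ct) => _invoke(request, ct);
}

public sealed class TransportDispatcher
{
    private readonly Dictionary<string, ITransportAdapter> _adapters;

    public TransportDispatcher(IEnumerable<ITransportAdapter> adapters)
        => _adapters = adapters.ToDictionary(a => a.TransportType, StringComparer.OrdinalIgnoreCase);

    public async Task<TransportResult> DispatchAsync(
        string transportType, object? request, TimeSpan timeout, CancellationToken ct = default)
    {
        if (!_adapters.TryGetValue(transportType, out var adapter))
            return new TransportResult(false, null, $"No adapter for transport '{transportType}'");

        // Link the caller's token with a timeout so both can stop the call.
        using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        timeoutCts.CancelAfter(timeout);
        try
        {
            var payload = await adapter.InvokeAsync(request, timeoutCts.Token);
            return new TransportResult(true, payload, null);
        }
        catch (OperationCanceledException) when (!ct.IsCancellationRequested)
        {
            return new TransportResult(false, null, "timeout"); // our timeout, not the caller's cancel
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            return new TransportResult(false, null, ex.Message);
        }
    }
}
```

Returning a result object instead of throwing lets the interpreter treat failure and timeout as ordinary branches of the workflow. Cancellation requested by the caller still propagates as an exception.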
### 3.6 Task Activation Writer
Purpose:
- convert a runtime task activation result into projection rows
Responsibilities:
- create task rows
- create task-created events
- preserve business reference and role semantics
### 3.7 Signal Pump
Purpose:
- block on AQ dequeue
- dispatch envelopes to the execution coordinator
Responsibilities:
- receive signal envelope
- process with bounded concurrency
- complete or abandon transactionally
- dead-letter poison signals
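
The pump behavior can be sketched without Oracle AQ by using an in-memory channel as a stand-in queue. Everything here is hypothetical (`SignalEnvelope`, `SignalPump`, the attempt counter, `RunUntilIdleAsync`). The run-until-idle shutdown exists only so the example terminates, and the pump is single-use after it goes idle. A production pump would block on dequeue indefinitely and rely on the queue's own transactional complete and abandon.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public sealed record SignalEnvelope(string InstanceId, string WaitingToken, int Attempt = 1);

public sealed class SignalPump
{
    private readonly Channel<SignalEnvelope> _queue = Channel.CreateUnbounded<SignalEnvelope>();
    private readonly Func<SignalEnvelope, Task> _handler;
    private readonly int _workers;
    private readonly int _maxAttempts;
    private int _outstanding;

    public ConcurrentQueue<SignalEnvelope> DeadLetters { get; } = new();

    public SignalPump(Func<SignalEnvelope, Task> handler, int workers, int maxAttempts)
    {
        _handler = handler;
        _workers = workers;
        _maxAttempts = maxAttempts;
    }

    public void Publish(SignalEnvelope envelope)
    {
        Interlocked.Increment(ref _outstanding);
        _queue.Writer.TryWrite(envelope);
    }

    // Runs a fixed number of workers (the concurrency bound) until every
    // published envelope has been completed or dead-lettered.
    public Task RunUntilIdleAsync()
    {
        if (Volatile.Read(ref _outstanding) == 0)
            _queue.Writer.TryComplete(); // nothing was published
        return Task.WhenAll(Enumerable.Range(0, _workers).Select(_ => WorkAsync()));
    }

    private async Task WorkAsync()
    {
        await foreach (var envelope in _queue.Reader.ReadAllAsync())
        {
            try
            {
                await _handler(envelope); // success: the signal is complete
            }
            catch (Exception)
            {
                if (envelope.Attempt < _maxAttempts)
                    Publish(envelope with { Attempt = envelope.Attempt + 1 }); // abandon: redeliver
                else
                    DeadLetters.Enqueue(envelope); // poison signal: dead-letter
            }

            if (Interlocked.Decrement(ref _outstanding) == 0)
                _queue.Writer.TryComplete(); // queue drained: let the workers exit
        }
    }
}
```

A retry is published before the failed delivery is counted as finished, so the outstanding count cannot reach zero while a redelivery is still pending.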
### 3.8 Scheduler Adapter
Purpose:
- translate runtime waits into AQ delayed messages
Responsibilities:
- enqueue due signals with delay
- cancel waits logically by invalidating their waiting tokens
- ignore stale delayed messages safely
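
Logical cancellation through waiting tokens can be sketched as below. `DelayedSignal` and `WaitRegistry` are hypothetical, and the in-memory dictionary stands in for waiting metadata that would live in the runtime snapshot. The idea shown is that a delayed message is never removed from the queue; it simply fails the token check when it finally arrives.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical shape of a delayed wake-up message.
public sealed record DelayedSignal(string InstanceId, string WaitingToken, DateTimeOffset DueAt);

public sealed class WaitRegistry
{
    // Current waiting token per workflow instance.
    private readonly Dictionary<string, string> _currentToken = new();

    // Registers a new wait and returns the delayed signal to enqueue.
    public DelayedSignal Schedule(string instanceId, DateTimeOffset now, TimeSpan delay)
    {
        var token = Guid.NewGuid().ToString("N");
        _currentToken[instanceId] = token; // replacing the token cancels any earlier wait
        return new DelayedSignal(instanceId, token, now + delay);
    }

    // Logical cancel: the queued message stays, but its token no longer matches.
    public void Cancel(string instanceId) => _currentToken.Remove(instanceId);

    // A delivered message is acted on only if its token is still current.
    public bool ShouldResume(DelayedSignal signal)
        => _currentToken.TryGetValue(signal.InstanceId, out var token)
           && token == signal.WaitingToken;
}
```

With this shape, a stale message costs one lookup and is then dropped, so the scheduler never has to find and delete messages inside the queue.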
## 4. Detailed Component Responsibilities
### 4.1 WorkflowRuntimeService
This service remains the product boundary for runtime actions.
It should continue to own:
- start request binding
- business reference resolution
- task authorization integration
- projection updates
- runtime snapshot persistence
- continuation dispatch
It should stop owning:
- engine-specific step execution logic
- engine-specific scheduling details
- engine-specific signal handling
### 4.2 SerdicaEngineRuntimeProvider
This provider becomes the main bridge between product orchestration and the runtime kernel.
It should:
- normalize the requested workflow into a canonical runtime definition
- create an execution request
- call the execution coordinator
- map engine execution results into `WorkflowRuntimeExecutionResult`
It should not:
- update read projections directly
- own task authorization
- know about HTTP endpoint contracts
### 4.3 WorkflowProjectionStore
This store remains the read model writer.
It should continue to own:
- `WF_INSTANCES`
- `WF_TASKS`
- `WF_TASK_EVENTS`
It should not become the engine snapshot store.
### 4.4 Runtime Snapshot Store
This store owns the authoritative engine snapshot.
It should:
- read current runtime state
- write runtime state atomically
- enforce optimistic concurrency or explicit version progression
- store waiting metadata
- store provider state
It may evolve from the current `IWorkflowRuntimeStateStore`.
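
A minimal sketch of the optimistic concurrency rule follows, using an in-memory dictionary in place of the workflow database. `RuntimeSnapshot` and `InMemorySnapshotStore` are hypothetical and are not the shape of `IWorkflowRuntimeStateStore`. The convention that version 0 means "not yet stored" is an assumption of this example.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;

// Hypothetical snapshot row: state plus the metadata needed to resume.
public sealed record RuntimeSnapshot(string InstanceId, long Version, string StateJson, string? WaitingToken);

public sealed class InMemorySnapshotStore
{
    private readonly ConcurrentDictionary<string, RuntimeSnapshot> _rows = new();

    public RuntimeSnapshot? Read(string instanceId)
        => _rows.TryGetValue(instanceId, out var snapshot) ? snapshot : null;

    // Succeeds only if the caller saw the latest version; the stored version
    // then advances by exactly one. expectedVersion 0 means "create".
    public bool TryWrite(RuntimeSnapshot next, long expectedVersion)
    {
        var updated = next with { Version = expectedVersion + 1 };
        if (expectedVersion == 0)
            return _rows.TryAdd(next.InstanceId, updated);

        return _rows.TryGetValue(next.InstanceId, out var current)
            && current.Version == expectedVersion
            && _rows.TryUpdate(next.InstanceId, updated, current);
    }
}
```

A writer whose `TryWrite` returns `false` has lost the race and would reload the snapshot before deciding whether its work still applies. In a relational store the same rule is commonly expressed as an `UPDATE` guarded by the expected version.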
### 4.5 AQ Signal Bus
This adapter owns durable wake-up delivery.
It should:
- publish immediate signals
- publish delayed signals
- receive with blocking dequeue
- expose complete/abandon semantics
It should not:
- understand workflow business logic
- mutate projections
- deserialize full workflow snapshots
## 5. Runtime Request Flows
### 5.1 Start Workflow
1. API receives `StartWorkflowRequest`.
2. `WorkflowRuntimeService` resolves registration and definition.
3. The typed request is bound from payload.
4. Business reference is resolved.
5. `SerdicaEngineRuntimeProvider.StartAsync` is called.
6. The provider resolves the canonical runtime definition.
7. The execution coordinator creates a new snapshot and runs the interpreter.
8. The interpreter runs until:
   - a task is activated
   - a timer wait is registered
   - an external wait is registered
   - the workflow completes
9. The coordinator persists runtime snapshot changes.
10. `WorkflowRuntimeService` writes projections and runtime metadata.
11. Continuations are started if present.
### 5.2 Complete Task
1. API receives `WorkflowTaskCompleteRequest`.
2. `WorkflowRuntimeService` loads snapshot and task projection.
3. Authorization is checked.
4. The runtime provider is called with:
   - task context
   - workflow state
   - completion payload
5. The execution coordinator advances the canonical definition from the task completion entry point.
6. It persists the new runtime snapshot and engine wait state.
7. `WorkflowRuntimeService` applies task completion and creates new task rows if needed.
### 5.3 External Or Scheduled Signal
1. AQ signal pump dequeues a signal.
2. The signal is deserialized to a workflow signal envelope.
3. The execution coordinator loads the current snapshot.
4. The coordinator verifies:
   - workflow instance exists
   - waiting token matches
   - version is compatible
5. The interpreter resumes from the stored resume point.
6. The transaction commits snapshot changes, projection changes, and any next signals.
## 6. Why This Structure Fits The Current Service
The current service already separates:
- product orchestration
- execution abstraction
- projections
- runtime state
- authorization
The new engine architecture uses that separation rather than fighting it.
That is the main reason the replacement can be implemented incrementally without redesigning the whole product.