# Serdica Workflow Engine
A declarative, plugin-based workflow engine for long-running insurance business processes. Replaces Camunda BPMN with a native C# fluent DSL, canonical JSON schema, durable signal-based execution, and multi-backend persistence.
---
## Table of Contents
- [Architecture Overview](#architecture-overview)
- [Workflow Declaration DSL](#workflow-declaration-dsl)
- [Human Tasks](#human-tasks)
- [Transport Calls (Service Tasks)](#transport-calls-service-tasks)
- [Control Flow](#control-flow)
- [Sub-Workflows & Continuations](#sub-workflows--continuations)
- [Canonical Definition Schema](#canonical-definition-schema)
- [Expression System](#expression-system)
- [Signal System](#signal-system)
- [Timeout Architecture](#timeout-architecture)
- [Retention & Lifecycle](#retention--lifecycle)
- [Authorization](#authorization)
- [Plugin System](#plugin-system)
- [Configuration Reference](#configuration-reference)
- [Service Surface](#service-surface)
- [Diagram & Visualization](#diagram--visualization)
- [Error Handling](#error-handling)
- [Compiler & Decompiler](#compiler--decompiler)
---
## Architecture Overview
### Execution Flow
```
Workflow request
-> Workflow runtime service
-> Runtime orchestrator
-> Canonical execution handler
-> Transport adapters
-> Projection store
-> Runtime state store
-> Signal and schedule buses
```
### Key Components
| Component | Responsibility |
|-----------|---------------|
| **WorkflowRuntimeService** | Engine-facing lifecycle and task operations |
| **CanonicalWorkflowExecutionHandler** | Evaluates canonical step sequences, manages fork/join state |
| **WorkflowSignalPumpHostedService** | Background consumer for durable signal processing |
| **WorkflowRetentionHostedService** | Background cleanup of stale/completed instances |
| **IWorkflowProjectionStore** | Task and instance persistence (Mongo, Oracle, Postgres) |
| **IWorkflowRuntimeStateStore** | Durable execution state snapshots |
| **IWorkflowSignalBus** | Signal publishing and delivery |
| **Transport Plugins** | HTTP, GraphQL, message-bus transports, microservice command transport |
### Runtime Providers
| Provider | Description | Use Case |
|----------|------|----------|
| **Serdica.InProcess** | In-memory execution | Testing, simple workflows without durability |
| **Serdica.Engine** | Canonical engine with durable state | Production workflows with signal-based resumption |
---
## Workflow Declaration DSL
Workflows are defined using a strongly-typed C# fluent builder. The DSL compiles to a canonical JSON definition at startup.
### Minimal Workflow
```csharp
public sealed class ApproveApplicationWorkflow
    : IDeclarativeWorkflow<ApproveRequest>
{
    public string WorkflowName => "ApproveApplication";
    public string WorkflowVersion => "1.0.0";
    public string DisplayName => "Approve Application";
    public IReadOnlyCollection<string> WorkflowRoles => ["DBA", "UR_UNDERWRITER"];

    public WorkflowSpec<ApproveRequest> Spec { get; } = WorkflowSpec
        .For<ApproveRequest>()
        .InitializeState(request => new Dictionary<string, JsonElement>
        {
            ["policyId"] = JsonSerializer.SerializeToElement(request.PolicyId),
        })
        .StartWith(approveTask)
        .Build();

    public IReadOnlyCollection<WorkflowTaskDescriptor> Tasks => Spec.TaskDescriptors;

    private static readonly WorkflowHumanTaskDefinition<ApproveRequest> approveTask =
        WorkflowHumanTask.For<ApproveRequest>(
                taskName: "Approve Application",
                taskType: "ApproveQTApproveApplication",
                route: "business/policies")
            .WithPayload(context => new Dictionary<string, JsonElement>
            {
                ["policyId"] = context.StateValues.GetRequired<long>("policyId").AsJsonElement(),
            })
            .OnComplete(flow => flow.Complete());
}
```
### State Initialization
State can be initialized from the start request using delegates or expressions:
```csharp
// Delegate-based (typed)
.InitializeState(request => new { policyId = request.PolicyId, status = "NEW" })
// Expression-based (canonical, portable)
.InitializeState(
WorkflowExpr.Object(
WorkflowExpr.Prop("policyId", WorkflowExpr.Path("start.policyId")),
WorkflowExpr.Prop("status", WorkflowExpr.String("NEW"))))
```
### Business Reference
Business references provide a queryable key for workflow instances:
```csharp
flow.SetBusinessReference(new WorkflowBusinessReferenceDeclaration
{
KeyExpression = WorkflowExpr.Path("state.policyId"),
PartsExpressions =
{
["policyId"] = WorkflowExpr.Path("state.policyId"),
["annexId"] = WorkflowExpr.Path("state.annexId"),
},
})
```
---
## Human Tasks
Human tasks pause workflow execution and wait for a user action (assign, complete, release).
### Defining a Task
```csharp
var reviewTask = WorkflowHumanTask.For<MyRequest>(
        taskName: "Review Changes",
        taskType: "ReviewPolicyChanges",
        route: "business/policies",
        taskRoles: ["UR_UNDERWRITER", "UR_OPERATIONS"])
    .WithPayload(context => new Dictionary<string, JsonElement>
    {
        ["policyId"] = context.StateValues.GetRequired<long>("policyId").AsJsonElement(),
    })
    .WithTimeout(86400) // 24-hour deadline (optional; default: no deadline)
    .OnComplete(flow => flow
        .WhenExpression(
            "Approved?",
            WorkflowExpr.Eq(WorkflowExpr.Path("payload.answer"), WorkflowExpr.String("approve")),
            approved => approved
                .Call("Confirm", confirmAddress, confirmPayload,
                    WorkflowHandledBranchAction.Complete,
                    WorkflowHandledBranchAction.Complete)
                .Complete(),
            rejected => rejected.Complete()));
```
### Task Properties
| Property | Type | Description |
|----------|------|-------------|
| `TaskName` | string | Unique name within the workflow |
| `TaskType` | string | UI component type identifier |
| `Route` | string | Navigation route for the UI |
| `TaskRoles` | string[] | Roles that can interact with this task |
| `TimeoutSeconds` | int? | Optional deadline. Null = no deadline (runs until completed or purged) |
| `DeadlineUtc` | DateTime? | Computed: `CreatedOnUtc + TimeoutSeconds`. Null if no timeout set |
### Task Lifecycle
```
Created (Pending)
-> Assigned (user claims task)
-> Completed (user submits payload)
-> OnComplete sequence executes
-> Next task activated, or workflow completes
```
### Task Actions & Authorization
| Action | Who Can Perform |
|--------|----------------|
| `AssignSelf` | Any user with matching effective roles |
| `AssignOther` | Admin roles only |
| `AssignRoles` | Admin roles only |
| `Release` | Current assignee or admin |
| `Complete` | Current assignee or admin |
---
## Transport Calls (Service Tasks)
Service tasks call external services via pluggable transports. Each call has optional failure and timeout recovery branches.
### Call with Address
```csharp
flow.Call<object>(
"Calculate Premium",
Address.LegacyRabbit("pas_premium_calculate_for_object"),
context => new { policyId = context.StateValues.GetRequired<long>("policyId") },
whenFailure: fail => fail.Complete(), // recovery on failure
whenTimeout: timeout => timeout.Complete(), // recovery on timeout
resultKey: "premiumResult", // store response in state
timeoutSeconds: 120); // per-step timeout override
```
### Address Types
| Address Factory | Transport | Example |
|----------------|-----------|---------|
| `Address.Microservice(name, command)` | Microservice command transport | `Address.Microservice("PasOperations", "perform")` |
| `Address.LegacyRabbit(command)` | Legacy message-bus transport | `Address.LegacyRabbit("pas_premium_calculate")` |
| `Address.Rabbit(exchange, routingKey)` | Exchange/routing-key bus transport | `Address.Rabbit("serdica", "policy.create")` |
| `Address.Http(target, path, method?)` | HTTP REST | `Address.Http("authority", "/api/users", "GET")` |
| `Address.Graphql(target, query)` | GraphQL | `Address.Graphql("serdica", "query { ... }")` |
### Failure & Timeout Handling
Every `Call` step supports optional `whenFailure` and `whenTimeout` branches:
```csharp
.Call("Service Task", address, payload,
whenFailure: fail => fail
.SetState("errorOccurred", WorkflowExpr.Bool(true))
.Complete(), // graceful completion on failure
whenTimeout: timeout => timeout
.Call("Retry Alternative", altAddress, altPayload,
WorkflowHandledBranchAction.Complete,
WorkflowHandledBranchAction.Complete)
.Complete())
```
If neither handler is defined and the transport call fails or times out, the exception propagates out of the step and the signal that triggered it is retried, up to `MaxDeliveryAttempts` times.
### Shorthand Actions
```csharp
.Call("Step", address, payload,
WorkflowHandledBranchAction.Complete, // on failure: complete workflow
WorkflowHandledBranchAction.Complete) // on timeout: complete workflow
```
---
## Control Flow
### Decisions (Conditional Branching)
```csharp
flow.WhenExpression(
"Is VIP Customer?",
WorkflowExpr.Eq(WorkflowExpr.Path("state.customerType"), WorkflowExpr.String("VIP")),
whenTrue: vip => vip
.Call("VIP Processing", ...)
.Complete(),
whenElse: standard => standard
.Call("Standard Processing", ...)
.Complete());
```
### State Flag Decisions
```csharp
flow.WhenStateFlag(
"policyExistsOnIPAL",
expectedValue: true,
"Policy exists on IPAL?",
whenTrue: exists => exists.Call("Open For Change", ...),
whenElse: notExists => notExists.Call("Create Policy", ...));
```
### Repeat (Loops)
```csharp
flow.Repeat(
"Retry Integration",
maxIterations: context => 5,
body: body => body
.Call("Integrate", integrationAddress, payload,
WorkflowHandledBranchAction.Complete,
WorkflowHandledBranchAction.Complete)
.SetState("retryCount", WorkflowExpr.Func("add",
WorkflowExpr.Path("state.retryCount"), WorkflowExpr.Number(1))),
continueWhile: WorkflowExpr.Ne(
WorkflowExpr.Path("state.integrationStatus"),
WorkflowExpr.String("SUCCESS")));
```
### Fork (Parallel Branches)
```csharp
flow.Fork("Process All Objects",
branch1 => branch1.Call("Process Object A", ...),
branch2 => branch2.Call("Process Object B", ...),
branch3 => branch3.Call("Process Object C", ...));
```
All branches execute concurrently. The workflow resumes after all branches complete.
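The join rule above (resume only after every branch has finished) can be illustrated with a small tracker. This is a minimal sketch, not the engine's fork/join state: `ForkJoinState` and `MarkCompleted` are hypothetical names, and the idea that branches are identified by index is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical join tracker; the engine's real fork/join state is not shown in this document.
public sealed class ForkJoinState
{
    private readonly HashSet<int> _completed = new();

    public int BranchCount { get; }

    public ForkJoinState(int branchCount)
    {
        if (branchCount <= 0) throw new ArgumentOutOfRangeException(nameof(branchCount));
        BranchCount = branchCount;
    }

    // Records a finished branch and reports whether all branches are now done.
    // Recording the same branch twice does not change the result.
    public bool MarkCompleted(int branchIndex)
    {
        if (branchIndex < 0 || branchIndex >= BranchCount)
            throw new ArgumentOutOfRangeException(nameof(branchIndex));
        _completed.Add(branchIndex);
        return _completed.Count == BranchCount;
    }
}
```

Using a set rather than a counter keeps the join idempotent, which may matter if a branch completion is ever delivered more than once.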
### Timer (Delay)
```csharp
flow.Timer("Wait Before Retry",
delay: context => TimeSpan.FromMinutes(5));
```
### External Signal (Wait for Event)
```csharp
flow.WaitForSignal(
"Wait for Document Upload",
signalName: "documents-uploaded",
resultKey: "uploadedDocuments");
```
Signals are raised via `RaiseExternalSignalAsync` and matched by `signalName` + `WaitingToken`.
---
## Sub-Workflows & Continuations
### SubWorkflow (Inline Execution)
Executes a child workflow inline within the parent. The parent waits for the child to complete.
```csharp
flow.SubWorkflow(
"Run Review Process",
new WorkflowWorkflowInvocationDeclaration
{
WorkflowName = "ReviewPolicyChanges",
PayloadExpression = WorkflowExpr.Object(
WorkflowExpr.Prop("policyId", WorkflowExpr.Path("state.policyId"))),
});
```
### ContinueWith (Signal-Based)
Starts a new workflow instance asynchronously via the signal bus. The parent completes immediately.
```csharp
flow.ContinueWith(
"Start Transfer Process",
new WorkflowWorkflowInvocationDeclaration
{
WorkflowName = "TransferPolicy",
PayloadExpression = WorkflowExpr.Path("state"),
});
```
**When to use which:**
- **SubWorkflow**: Child must complete before parent continues. State flows back to parent.
- **ContinueWith**: Fire-and-forget. Parent completes, child runs independently.
---
## Canonical Definition Schema
Every workflow compiles to a canonical JSON definition (`serdica.workflow.definition/v1`). This enables:
- Portable workflow definitions (JSON import/export)
- Runtime validation without C# compilation
- Visual designer support
### Step Types
| Type | JSON `$type` | Description |
|------|-------------|-------------|
| Set State | `"set-state"` | Assign a value to workflow state |
| Business Reference | `"assign-business-reference"` | Set the business reference |
| Transport Call | `"call-transport"` | Call an external service |
| Decision | `"decision"` | Conditional branch |
| Activate Task | `"activate-task"` | Pause for human task |
| Continue With | `"continue-with-workflow"` | Start child workflow (async) |
| Sub-Workflow | `"sub-workflow"` | Execute child workflow (inline) |
| Repeat | `"repeat"` | Loop with condition |
| Timer | `"timer"` | Delay execution |
| External Signal | `"external-signal"` | Wait for external event |
| Fork | `"fork"` | Parallel branches |
| Complete | `"complete"` | Terminal step |
### Transport Address Types
| Type | JSON `$type` | Properties |
|------|-------------|------------|
| Microservice | `"microservice"` | `microserviceName`, `command` |
| Rabbit | `"rabbit"` | `exchange`, `routingKey` |
| Legacy Rabbit | `"legacy-rabbit"` | `command`, `mode` |
| GraphQL | `"graphql"` | `target`, `query`, `operationName?` |
| HTTP | `"http"` | `target`, `path`, `method` |
### Example Canonical Definition
```json
{
  "$schemaVersion": "serdica.workflow.definition/v1",
  "workflowName": "ApproveApplication",
  "workflowVersion": "1.0.0",
  "displayName": "Approve Application",
  "workflowRoles": ["DBA", "UR_UNDERWRITER"],
  "start": {
    "initializeStateExpression": {
      "$type": "object",
      "properties": [
        { "name": "policyId", "expression": { "$type": "path", "path": "start.policyId" } }
      ]
    },
    "sequence": {
      "steps": [
        {
          "$type": "call-transport",
          "stepName": "Validate Policy",
          "timeoutSeconds": 60,
          "invocation": {
            "address": {
              "$type": "legacy-rabbit",
              "command": "pas_policy_validate"
            },
            "payloadExpression": {
              "$type": "object",
              "properties": [
                { "name": "policyId", "expression": { "$type": "path", "path": "state.policyId" } }
              ]
            }
          }
        },
        {
          "$type": "activate-task",
          "taskName": "Approve Application",
          "timeoutSeconds": 86400
        }
      ]
    }
  },
  "tasks": [
    {
      "taskName": "Approve Application",
      "taskType": "ApproveQTApproveApplication",
      "routeExpression": { "$type": "string", "value": "business/policies" },
      "taskRoles": [],
      "payloadExpression": { "$type": "path", "path": "state" }
    }
  ]
}
```
---
## Expression System
The expression system evaluates declarative expressions at runtime without recompilation. All expressions are JSON-serializable for canonical portability.
### Expression Types
| Type | Builder | Example |
|------|---------|---------|
| Null | `WorkflowExpr.Null()` | JSON null |
| String | `WorkflowExpr.String("value")` | `"value"` |
| Number | `WorkflowExpr.Number(42)` | `42` |
| Boolean | `WorkflowExpr.Bool(true)` | `true` |
| Path | `WorkflowExpr.Path("state.policyId")` | Navigate object graph |
| Object | `WorkflowExpr.Object(props...)` | Construct object from named props |
| Array | `WorkflowExpr.Array(items...)` | Construct array |
| Function | `WorkflowExpr.Func("name", args...)` | Call a registered function |
| Binary | `WorkflowExpr.Eq(left, right)` | Comparison/arithmetic |
| Unary | `WorkflowExpr.Not(expr)` | Logical negation |
### Path Navigation
Paths navigate the execution context:
- `start.*` — Start request fields
- `state.*` — Current workflow state
- `payload.*` — Current task completion payload
- `result.*` — Step result (when `resultKey` is set)
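To make the dotted-path idea concrete, here is a minimal sketch of resolving a path such as `state.policyId` against a JSON context. It is not the engine's evaluator: `PathResolver` is a hypothetical helper, and it assumes the execution context is a single JSON object whose top-level properties are the roots listed above.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper; assumes the context is one JSON object with start/state/payload/result roots.
public static class PathResolver
{
    // Walks the dotted path one segment at a time.
    // Returns null when a segment is missing or an intermediate value is not an object.
    public static JsonElement? Resolve(JsonElement context, string path)
    {
        var current = context;
        foreach (var segment in path.Split('.'))
        {
            if (current.ValueKind != JsonValueKind.Object ||
                !current.TryGetProperty(segment, out var next))
            {
                return null;
            }
            current = next;
        }
        return current;
    }
}
```

With a context of `{"state":{"policyId":42}}`, resolving `state.policyId` yields the number element, while `state.missing` yields `null` rather than throwing.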
### Binary Operators
| Operator | Builder | Description |
|----------|---------|-------------|
| `eq` | `WorkflowExpr.Eq(a, b)` | Equal |
| `ne` | `WorkflowExpr.Ne(a, b)` | Not equal |
| `gt` | `WorkflowExpr.Gt(a, b)` | Greater than |
| `gte` | `WorkflowExpr.Gte(a, b)` | Greater or equal |
| `lt` | `WorkflowExpr.Lt(a, b)` | Less than |
| `lte` | `WorkflowExpr.Lte(a, b)` | Less or equal |
| `and` | `WorkflowExpr.And(a, b)` | Logical AND |
| `or` | `WorkflowExpr.Or(a, b)` | Logical OR |
| `add` | — | Arithmetic addition |
| `subtract` | — | Arithmetic subtraction |
| `multiply` | — | Arithmetic multiplication |
| `divide` | — | Arithmetic division |
### Built-in Functions
| Function | Signature | Description |
|----------|-----------|-------------|
| `coalesce` | `coalesce(value1, value2, ...)` | Returns first non-null argument |
| `concat` | `concat(str1, str2, ...)` | String concatenation |
| `add` | `add(num1, num2, ...)` | Sum of numeric arguments |
| `first` | `first(array)` | First element of array |
| `if` | `if(condition, whenTrue, whenFalse)` | Conditional value |
| `isNullOrWhiteSpace` | `isNullOrWhiteSpace(value)` | Check for null/empty string |
| `length` | `length(value)` | Length of string or array |
| `mergeObjects` | `mergeObjects(obj1, obj2, ...)` | Deep merge objects |
| `upper` | `upper(value)` | Uppercase string |
| `selectManyPath` | `selectManyPath(array, path)` | Map over array elements |
| `findPath` | `findPath(data, path)` | Navigate nested paths |
Custom functions can be registered via `IWorkflowFunctionProvider` plugins.
---
## Signal System
Signals enable durable, asynchronous communication within workflows. They are persisted to a message queue (Oracle AQ, MongoDB, etc.) and processed by the signal pump.
### Signal Types
| Type | Trigger | Purpose |
|------|---------|---------|
| `InternalContinue` | `ContinueWith` step | Start a child workflow asynchronously |
| `TimerDue` | Timer step delay expired | Resume workflow after delay |
| `RetryDue` | Retry delay expired | Resume after backoff |
| `ExternalSignal` | External signal submission through the workflow service surface | External event notification |
| `SubWorkflowCompleted` | Child sub-workflow finished | Resume parent workflow |
### Signal Envelope
```csharp
new WorkflowSignalEnvelope
{
SignalId = "unique-id",
WorkflowInstanceId = "target-instance",
RuntimeProvider = "Serdica.Engine",
SignalType = "ExternalSignal",
ExpectedVersion = 5, // Concurrency control
WaitingToken = "wait-token", // Match specific wait
DueAtUtc = null, // null = immediate, DateTime = scheduled
Payload = { ... },
}
```
### Signal Processing Pipeline
```
Oracle AQ / MongoDB / Postgres
-> WorkflowSignalPumpHostedService (N concurrent workers)
-> WorkflowSignalPumpWorker.RunOnceAsync
-> IWorkflowSignalBus.ReceiveAsync (blocking dequeue)
-> WorkflowSignalProcessor.ProcessAsync (route by type)
-> WorkflowSignalCommandDispatcher
-> WorkflowRuntimeService.StartWorkflowAsync (InternalContinue)
-> WorkflowRuntimeService.ResumeSignalAsync (all others)
```
### Concurrency Control
Version-based optimistic concurrency prevents duplicate signal processing:
- Each signal carries `ExpectedVersion`
- `IWorkflowRuntimeStateStore.UpsertAsync` validates version matches
- On mismatch: `WorkflowRuntimeStateConcurrencyException` is thrown
- Signal pump treats concurrency conflicts as successful (completes the lease)
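The version check described above can be sketched with an in-memory store. Everything here is illustrative: `InMemoryStateStore`, `VersionConflictException`, and `SignalApplier` are hypothetical stand-ins for `IWorkflowRuntimeStateStore.UpsertAsync`, `WorkflowRuntimeStateConcurrencyException`, and the signal pump, whose real signatures are not shown in this document.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for WorkflowRuntimeStateConcurrencyException.
public sealed class VersionConflictException : Exception
{
    public VersionConflictException(string instanceId, long expected, long actual)
        : base($"Instance '{instanceId}': expected version {expected}, found {actual}.") { }
}

// Hypothetical stand-in for the runtime state store; tracks only the version per instance.
public sealed class InMemoryStateStore
{
    private readonly Dictionary<string, long> _versions = new();
    private readonly object _gate = new();

    // Writes only when the caller's expected version matches the stored one, then bumps it.
    public long Upsert(string instanceId, long expectedVersion)
    {
        lock (_gate)
        {
            _versions.TryGetValue(instanceId, out var current); // 0 when the instance is new
            if (current != expectedVersion)
                throw new VersionConflictException(instanceId, expectedVersion, current);
            return _versions[instanceId] = current + 1;
        }
    }
}

public enum ApplyResult { Applied, SkippedAsStale }

public static class SignalApplier
{
    // A conflict means another delivery already advanced the instance,
    // so the caller can complete the lease instead of retrying.
    public static ApplyResult Apply(InMemoryStateStore store, string instanceId, long expectedVersion)
    {
        try
        {
            store.Upsert(instanceId, expectedVersion);
            return ApplyResult.Applied;
        }
        catch (VersionConflictException)
        {
            return ApplyResult.SkippedAsStale;
        }
    }
}
```

Delivering the same signal twice with `ExpectedVersion = 0` applies it once; the second delivery is skipped as stale and is not retried.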
### Dead Letter Queue
Signals that fail `MaxDeliveryAttempts` times are moved to the dead-letter queue. Dead letters can be inspected and replayed through the workflow service surface.
---
## Timeout Architecture
Timeouts operate at three independent levels:
### Level 1: Per-Step Timeout (Service Tasks)
Each transport call step has an optional timeout that wraps the entire call (including retries) with a `CancellationTokenSource`.
| Setting | Default | Override |
|---------|---------|----------|
| `step.TimeoutSeconds` | null | Per-step in workflow declaration |
| `DefaultTimeoutForServiceTaskCallsSeconds` | 3600s (1h) | Code constant (fallback) |
```csharp
// Per-step override in workflow DSL:
.Call("Slow Service", address, payload, fail, timeout, timeoutSeconds: 300)
// In canonical JSON:
{ "$type": "call-transport", "timeoutSeconds": 300, ... }
```
**Precedence:** `step.TimeoutSeconds` -> `DefaultTimeoutForServiceTaskCallsSeconds` (1h)
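The precedence rule amounts to a null-coalescing fallback. A minimal sketch, assuming the constant keeps the name used in the table; `StepTimeouts.Resolve` is a hypothetical helper, not an engine API:

```csharp
using System;

public static class StepTimeouts
{
    // Fallback from the table above: 3600 seconds (1 hour).
    public const int DefaultTimeoutForServiceTaskCallsSeconds = 3600;

    // Uses the per-step value when present, otherwise the fallback constant.
    public static TimeSpan Resolve(int? stepTimeoutSeconds) =>
        TimeSpan.FromSeconds(stepTimeoutSeconds ?? DefaultTimeoutForServiceTaskCallsSeconds);
}
```

`StepTimeouts.Resolve(300)` gives five minutes, and `StepTimeouts.Resolve(null)` falls back to one hour.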
### Level 2: Per-Attempt Transport Timeout
Each individual transport attempt (single HTTP request, single RPC call) has its own timeout. This is independent of the step-level timeout.
| Transport | Default | Config Section |
|-----------|---------|---------------|
| HTTP | 30s | `WorkflowHttpTransport.TimeoutSeconds` |
| GraphQL | 30s | `WorkflowGraphqlTransport.TimeoutSeconds` |
| Legacy message-bus transport | 30s | `WorkflowLegacyRabbitTransport.DefaultTimeout` |
| Exchange/routing-key bus transport | 30s | `WorkflowRabbitTransport.DefaultTimeout` |
The step timeout wraps all attempts. Example: with a step timeout of 120s and a transport timeout of 30s, up to four full-length attempts fit within the step window.
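The arithmetic behind that example is an integer division of the step window by the per-attempt timeout. The sketch below uses a hypothetical helper, `AttemptBudget`, and ignores any delay between attempts, which the document does not specify:

```csharp
using System;

public static class AttemptBudget
{
    // Number of attempts that fit in the step window if every attempt
    // runs until its own timeout. Delays between attempts are ignored.
    public static int MaxFullLengthAttempts(TimeSpan stepTimeout, TimeSpan attemptTimeout)
    {
        if (attemptTimeout <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(attemptTimeout));
        return (int)(stepTimeout.Ticks / attemptTimeout.Ticks);
    }
}
```

For a 120-second step and a 30-second transport timeout this returns 4. Attempts that fail faster than their timeout leave room for more, subject to the transport's own `RetryCount`.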
### Level 3: Engine-Wide Execution Timeout
Optional global timeout per workflow operation (start, complete, resume).
| Setting | Default | Config |
|---------|---------|--------|
| `ExecutionTimeoutSeconds` | null (disabled) | `WorkflowEngine.ExecutionTimeoutSeconds` |
Set to null for long-running business processes that span days or months.
### Human Task Deadlines
| Setting | Default | Override |
|---------|---------|----------|
| `TimeoutSeconds` on activate-task | null (no deadline) | `.WithTimeout(seconds)` on task builder |
| `DeadlineUtc` on task summary | null | Computed: `CreatedOnUtc + TimeoutSeconds` |
When null, human tasks run indefinitely. Stale/orphaned tasks are cleaned up by the retention service.
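The deadline rule from the table can be written as a one-line computation. `TaskDeadlines` is a hypothetical helper used only for illustration:

```csharp
using System;

public static class TaskDeadlines
{
    // DeadlineUtc = CreatedOnUtc + TimeoutSeconds, or null when no timeout is set.
    public static DateTime? Compute(DateTime createdOnUtc, int? timeoutSeconds) =>
        timeoutSeconds is int seconds ? createdOnUtc.AddSeconds(seconds) : null;
}
```

A task created at midnight UTC with `.WithTimeout(86400)` gets a deadline exactly one day later; a task without a timeout has no deadline.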
---
## Retention & Lifecycle
The retention system automatically manages workflow instance lifecycle.
### Configuration
| Setting | Default | Config Section |
|---------|---------|---------------|
| `OpenStaleAfterDays` | 30 | `WorkflowRetention` |
| `CompletedPurgeAfterDays` | 180 | `WorkflowRetention` |
### Retention Job
| Setting | Default | Config Section |
|---------|---------|---------------|
| `Enabled` | true | `WorkflowRetentionHostedJob` |
| `RunOnStartup` | false | `WorkflowRetentionHostedJob` |
| `InitialDelay` | 5 min | `WorkflowRetentionHostedJob` |
| `Interval` | 24 hours | `WorkflowRetentionHostedJob` |
| `LockLease` | 2 hours | `WorkflowRetentionHostedJob` |
### Lifecycle Flow
```
Instance Created (Open)
-> StaleAfterUtc = CreatedOnUtc + OpenStaleAfterDays
[Retention job marks as stale]
Instance Completed
-> PurgeAfterUtc = CompletedOnUtc + CompletedPurgeAfterDays
[Retention job deletes instance, tasks, events, runtime state]
```
Manual trigger: `POST /workflow-retention/run`
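The two date computations in the lifecycle flow can be sketched as follows. `RetentionSettings` and `RetentionDates` are hypothetical names; the default values mirror the `WorkflowRetention` table above.

```csharp
using System;

// Hypothetical settings type; defaults mirror the WorkflowRetention table.
public sealed record RetentionSettings(int OpenStaleAfterDays = 30, int CompletedPurgeAfterDays = 180);

public static class RetentionDates
{
    // An open instance becomes eligible for the "stale" mark at this time.
    public static DateTime StaleAfterUtc(DateTime createdOnUtc, RetentionSettings settings) =>
        createdOnUtc.AddDays(settings.OpenStaleAfterDays);

    // A completed instance becomes eligible for deletion at this time.
    public static DateTime PurgeAfterUtc(DateTime completedOnUtc, RetentionSettings settings) =>
        completedOnUtc.AddDays(settings.CompletedPurgeAfterDays);
}
```

With the defaults, an instance created on 1 January is stale-eligible on 31 January, and one completed on 1 January is purge-eligible 180 days later.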
---
## Authorization
Task authorization uses a pluggable evaluator pattern.
### Interface
```csharp
public interface IWorkflowAssignmentPermissionEvaluator
{
WorkflowAssignmentPermissionDecision Evaluate(WorkflowAssignmentPermissionContext context);
}
```
### Default Plugin: Generic Assignment Permissions
Configured via `GenericAssignmentPermissions.AdminRoles` (appsettings).
| Action | Admin | Standard User |
|--------|-------|--------------|
| AssignSelf | Yes | Yes (if has effective role) |
| AssignOther | Yes | No |
| AssignRoles | Yes | No |
| Release | Yes | Yes (if current assignee) |
| Complete | Yes | Yes (if current assignee) |
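The decision table can be read as a small function. This sketch does not implement `IWorkflowAssignmentPermissionEvaluator`, whose context and decision types are not shown here; `TaskAction`, `Actor`, and `GenericPermissions` are hypothetical, and the case-sensitive role comparison is an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public enum TaskAction { AssignSelf, AssignOther, AssignRoles, Release, Complete }

// Hypothetical caller description: a user id plus the roles the user holds.
public sealed record Actor(string UserId, IReadOnlyCollection<string> Roles);

public static class GenericPermissions
{
    public static bool IsAllowed(
        TaskAction action,
        Actor actor,
        IReadOnlyCollection<string> adminRoles,
        IReadOnlyCollection<string> effectiveRoles,
        string? currentAssignee)
    {
        // Admin column: every action is allowed.
        if (actor.Roles.Intersect(adminRoles).Any()) return true;

        // Standard user column.
        return action switch
        {
            TaskAction.AssignSelf => actor.Roles.Intersect(effectiveRoles).Any(),
            TaskAction.Release or TaskAction.Complete => currentAssignee == actor.UserId,
            _ => false, // AssignOther and AssignRoles are admin-only
        };
    }
}
```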
### Effective Roles
A task's `EffectiveRoles` combines:
1. `WorkflowRoles` — from the workflow definition
2. `TaskRoles` — from the task definition
3. `RuntimeRoles` — computed at runtime via expression
If `TaskRoles` are specified, they narrow the effective roles. Otherwise, `WorkflowRoles` apply.
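The fallback between task and workflow roles can be sketched as below. `EffectiveRoles.Resolve` is a hypothetical helper, and `RuntimeRoles` is deliberately left out because this document does not say how runtime-computed roles are merged.

```csharp
using System.Collections.Generic;

public static class EffectiveRoles
{
    // Task roles take over when any are declared; otherwise workflow roles apply.
    // RuntimeRoles are not modelled here.
    public static IReadOnlyCollection<string> Resolve(
        IReadOnlyCollection<string> workflowRoles,
        IReadOnlyCollection<string> taskRoles) =>
        taskRoles.Count > 0 ? taskRoles : workflowRoles;
}
```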
---
## Plugin System
Plugins extend the workflow engine with backend stores, transports, signal drivers, and workflow definitions.
### Plugin Types
| Category | Example Plugins |
|----------|----------------|
| **Backend Store** | Oracle, MongoDB, Postgres |
| **Signal Driver** | Redis, Oracle AQ (native) |
| **Transport** | HTTP, GraphQL, legacy message-bus, exchange/routing-key bus, microservice command |
| **Permissions** | Generic RBAC |
| **Workflow Definitions** | Bulstrad (customer-specific) |
### Creating a Plugin
```csharp
public sealed class ServiceRegistrator : IPluginServiceRegistrator
{
public void RegisterServices(IServiceCollection services, IConfiguration configuration)
{
services.AddWorkflowModule("my-module", "1.0.0");
services.AddScoped<IMyService, MyServiceImpl>();
}
}
```
### Loading Order
Plugins load in the order specified by `PluginsConfig.PluginsOrder` in appsettings. Backend stores must load before transport or workflow plugins.
### Marker Interfaces
- `IWorkflowBackendRegistrationMarker` — validates backend plugin is loaded
- `IWorkflowSignalDriverRegistrationMarker` — validates signal driver is loaded
Startup validation throws `InvalidOperationException` if a configured provider is missing its plugin.
---
## Configuration Reference
### WorkflowEngine
```json
{
"WorkflowEngine": {
"NodeId": "workflow-node-1",
"MaxConcurrentExecutions": 16,
"MaxConcurrentSignalHandlers": 16,
"ExecutionTimeoutSeconds": 300,
"GracefulShutdownTimeoutSeconds": 30
}
}
```
### WorkflowRuntime
```json
{
"WorkflowRuntime": {
"DefaultProvider": "Serdica.Engine",
"EnabledProviders": ["Serdica.InProcess", "Serdica.Engine"]
}
}
```
### WorkflowAq (Signal Queue)
```json
{
"WorkflowAq": {
"QueueOwner": "SRD_WFKLW",
"SignalQueueName": "WF_SIGNAL_Q",
"ScheduleQueueName": "WF_SCHEDULE_Q",
"DeadLetterQueueName": "WF_DLQ_Q",
"ConsumerName": "WORKFLOW_SERVICE",
"BlockingDequeueSeconds": 30,
"MaxDeliveryAttempts": 10
}
}
```
### WorkflowRetention
```json
{
"WorkflowRetention": {
"OpenStaleAfterDays": 30,
"CompletedPurgeAfterDays": 180
}
}
```
### WorkflowRetentionHostedJob
```json
{
"WorkflowRetentionHostedJob": {
"Enabled": true,
"RunOnStartup": false,
"InitialDelay": "00:05:00",
"Interval": "1.00:00:00",
"LockName": "workflow.retention",
"LockLease": "02:00:00"
}
}
```
### Transport Configuration
```json
{
"WorkflowHttpTransport": {
"TimeoutSeconds": 30,
"RetryCount": 3,
"Targets": {
"authority": { "Url": "http://localhost:52000", "Headers": {} }
}
},
"WorkflowGraphqlTransport": {
"TimeoutSeconds": 30,
"RetryCount": 3,
"Targets": {
"serdica": { "Url": "http://localhost:5100/graphql/" }
}
},
"WorkflowLegacyRabbitTransport": {
"DefaultTimeout": "00:00:30"
},
"WorkflowRabbitTransport": {
"DefaultTimeout": "00:00:30",
"DefaultUserId": "workflow-engine"
}
}
```
### Plugin Loading
```json
{
"PluginsConfig": {
"PluginsDirectory": "PluginBinaries",
"PluginsOrder": [
"assign-permissions",
"workflow-store",
"signal-driver",
"transports",
"workflow-definitions"
]
}
}
```
Use deployment-specific plugin identifiers in that order: durability first, wake mechanism second, transports after that, and workflow-definition bundles last.
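An ordering constraint such as "stores before transports" can be checked against `PluginsOrder` before any plugin loads. The sketch below is hypothetical: `PluginOrderCheck` is not part of the engine, and the rule pairs passed in are whatever a deployment chooses to enforce.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class PluginOrderCheck
{
    // Returns a message for the first violated rule, or null when the order is acceptable.
    // Rules that mention a plugin absent from the list are skipped.
    public static string? FirstViolation(
        IReadOnlyList<string> order,
        IReadOnlyList<(string Before, string After)> rules)
    {
        foreach (var (before, after) in rules)
        {
            var i = IndexOf(order, before);
            var j = IndexOf(order, after);
            if (i >= 0 && j >= 0 && i > j)
                return $"'{before}' must load before '{after}'";
        }
        return null;
    }

    private static int IndexOf(IReadOnlyList<string> items, string value)
    {
        for (var k = 0; k < items.Count; k++)
            if (string.Equals(items[k], value, StringComparison.Ordinal)) return k;
        return -1;
    }
}
```

Running the check with the sample order above and the rule `("workflow-store", "transports")` reports no violation; swapping those two entries produces a message naming the pair.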
---
## Service Surface
The engine depends on a workflow service surface, but the platform transport and command-mapping layer are intentionally out of scope for this document.
### Lifecycle Operations
- Start a workflow instance.
- List workflow instances with filtering (by name, version, status, business reference, instance ID, or multiple instance IDs). Set `IncludeDetails = true` to return each instance's active task and workflow state variables.
- Read one workflow instance with tasks, events, and runtime state.
### Task Operations
- List tasks by workflow, status, assignee, or business reference.
- Read one task.
- Assign a task to a user or role group.
- Release a task back to the pool.
- Complete a task with payload.
### Signal And Operations Management
- Raise an external signal to a waiting instance.
- Inspect dead-lettered signals.
- Replay dead-lettered signals.
- Inspect signal-pump telemetry.
### Definitions And Metadata
- List workflow definitions (filterable by name, version, or multiple names).
- Get a single definition by name with optional rendering assets (SVG/PNG/JSON).
- Render a workflow definition as a diagram.
- Render a definition in a specific format (`svg`, `png`, or `json` render graph).
- Expose the canonical schema.
- Validate canonical definitions.
- Expose the installed function catalog and engine metadata.
### Definition Deployment
- Import a canonical definition with versioned storage and content-hash deduplication.
- Export a definition with optional rendering package.
- List all versions of a definition with hash, active flag, and metadata.
- Activate a specific version as the active version for a workflow name.
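Content-hash deduplication on import can be illustrated with a small in-memory version store. This is a sketch under assumptions: the document does not name the hash algorithm or the canonicalization step, so SHA-256 over the raw UTF-8 text is used here, and `DefinitionVersionStore` is a hypothetical type.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical store; version N is the N-th distinct content imported for a workflow name.
public sealed class DefinitionVersionStore
{
    private readonly Dictionary<string, List<string>> _hashesByName = new();

    // Assumption: SHA-256 over the UTF-8 bytes of the definition text.
    public static string ComputeHash(string canonicalJson) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(canonicalJson)));

    // Returns the version that holds this content.
    // Importing identical content again returns the existing version instead of creating a new one.
    public int Import(string workflowName, string canonicalJson)
    {
        var hash = ComputeHash(canonicalJson);
        if (!_hashesByName.TryGetValue(workflowName, out var hashes))
            _hashesByName[workflowName] = hashes = new List<string>();

        var existing = hashes.IndexOf(hash);
        if (existing >= 0) return existing + 1;

        hashes.Add(hash);
        return hashes.Count;
    }
}
```

Hashing raw text means two definitions that differ only in whitespace count as different content; a real implementation would likely normalize the JSON first.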
### Administration
- Trigger a manual retention sweep.
---
## Diagram & Visualization
The engine can render workflow definitions as visual diagrams.
### Layout Engines
| Engine | Description |
|--------|-------------|
| **ElkSharp** | Port of Eclipse Layout Kernel (default) |
| **ElkJS** | JavaScript-based ELK via Node.js |
| **MSAGL** | Microsoft Automatic Graph Layout |
### Configuration
```json
{
"WorkflowRendering": {
"LayoutProvider": "ElkSharp"
}
}
```
### Render Pipeline
```
WorkflowCanonicalDefinition
-> WorkflowRenderGraphCompiler (nodes + edges)
-> WorkflowRenderLayoutEngineResolver (select engine)
-> Layout engine (compute positions)
-> WorkflowRenderDiagramResponse (JSON for UI)
```
---
## Error Handling
### Exception Types
| Exception | Cause | Recovery |
|-----------|-------|----------|
| `WorkflowRuntimeStateConcurrencyException` | Duplicate/stale signal delivery | Auto-handled by signal pump (completes lease) |
| `BaseResultException` | Business validation failure (not found, denied) | Returns error to caller |
| `TimeoutException` | Transport or step timeout exceeded | Executes the `whenTimeout` branch if configured |
| `NotSupportedException` | Unsupported operation (e.g., null signal store) | Configuration error — check plugin loading |
### Signal Retry Behavior
| Scenario | Behavior |
|----------|----------|
| Transient error | Signal abandoned, retried on next poll |
| Concurrency conflict | Signal completed (not retried) |
| Max delivery attempts exceeded | Signal moved to dead-letter queue |
| Deserialization failure | Signal dead-lettered with error logged |
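The retry table maps each outcome to one lease action. The sketch below encodes it as a switch. `SignalOutcome`, `LeaseAction`, and `SignalRetryPolicy` are hypothetical names, and treating "attempt number reaches `MaxDeliveryAttempts`" as the dead-letter threshold is an assumption about how attempts are counted.

```csharp
using System;

public enum SignalOutcome { Succeeded, ConcurrencyConflict, TransientError, DeserializationFailure }

public enum LeaseAction { Complete, Abandon, DeadLetter }

public static class SignalRetryPolicy
{
    // deliveryAttempt is 1-based: the attempt that has just finished.
    public static LeaseAction Decide(SignalOutcome outcome, int deliveryAttempt, int maxDeliveryAttempts) =>
        outcome switch
        {
            // Conflicts count as success: another delivery already did the work.
            SignalOutcome.Succeeded or SignalOutcome.ConcurrencyConflict => LeaseAction.Complete,
            // A payload that cannot be read will not improve on retry.
            SignalOutcome.DeserializationFailure => LeaseAction.DeadLetter,
            SignalOutcome.TransientError when deliveryAttempt >= maxDeliveryAttempts => LeaseAction.DeadLetter,
            SignalOutcome.TransientError => LeaseAction.Abandon,
            _ => throw new ArgumentOutOfRangeException(nameof(outcome)),
        };
}
```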
### Observability
- **Structured logging** via Serilog (all key operations logged with structured properties)
- **Signal pump telemetry** via `WorkflowSignalPumpTelemetryService` (in-memory counters, queryable through the workflow service surface)
- **W3C trace IDs** enabled (`Activity.DefaultIdFormat = W3C`)
---
## Compiler & Decompiler
### Forward Compiler
`WorkflowCanonicalDefinitionCompiler.Compile<TStartRequest>()` converts a C# fluent DSL workflow (`IDeclarativeWorkflow<T>`) into a canonical JSON definition (`WorkflowCanonicalDefinition`). This runs at startup for all registered workflows.
The compiler also generates a **JSON Schema** for the start request type, embedded in the canonical definition's `startRequest.schema` field. This provides a portable, CLR-independent contract for the workflow's input.
### Reverse Compiler (Decompiler)
`WorkflowCanonicalDecompiler` converts a canonical definition back to C# source code using Roslyn `SyntaxFactory`. Two modes:
- **`Decompile(definition)`** — produces formatted C# source text including a typed start request class generated from the JSON Schema and the full workflow class with fluent builder chain
- **`Reconstruct(definition)`** — produces a new `WorkflowCanonicalDefinition` via deep clone (for structural comparison)
The decompiler uses `nameof()` for all type and method references (`WorkflowExpr.Object`, `LegacyRabbitAddress`, etc.) so that a rename in the DSL surfaces as a compile error rather than as stale generated source.
### Round-Trip Verification
The test suite verifies compiler fidelity via real Roslyn dynamic compilation:
```
Original C# workflow
-> [compile] -> canonical JSON (JSON1)
-> [decompile] -> C# source text
-> [Roslyn CSharpCompilation] -> in-memory assembly
-> [reflection: instantiate workflow]
-> [compile] -> canonical JSON (JSON2)
-> assert JSON1 == JSON2
```
This catches any information loss in the compile/decompile cycle: missing steps, truncated expressions, wrong addresses, lost failure/timeout branches.
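The final `assert JSON1 == JSON2` step is best done structurally rather than by string comparison, so that formatting differences do not cause false failures. A minimal sketch using `JsonNode.DeepEquals` from `System.Text.Json` (available from .NET 8); `RoundTrip.SameDefinition` is a hypothetical helper and not necessarily how the test suite compares definitions:

```csharp
using System.Text.Json.Nodes;

public static class RoundTrip
{
    // Parses both documents and compares them node by node,
    // so whitespace and indentation differences are ignored.
    public static bool SameDefinition(string json1, string json2) =>
        JsonNode.DeepEquals(JsonNode.Parse(json1), JsonNode.Parse(json2));
}
```

A mismatch only says that the two definitions differ; locating the first differing step or expression would need a separate tree walk.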
**Test results:**
- 177/177 decompiled C# files compile cleanly with Roslyn
- Semantic round-trip comparison identifies remaining gaps for iterative improvement
### Decompiled Output
Running the `RenderAllDecompiledOutputs` test generates human-readable output for all workflows:
```
docs/decompiled-samples/
csharp/ 177 .cs files (Roslyn-formatted C# with typed request models)
json/ 177 .json files (indented canonical definitions with JSON Schema)
```