Add StellaOps.Workflow engine: 14 libraries, WebService, 8 test projects
Extract product-agnostic workflow engine from Ablera.Serdica.Workflow into standalone StellaOps.Workflow.* libraries targeting net10.0.

Libraries (14):
- Contracts, Abstractions (compiler, decompiler, expression runtime)
- Engine (execution, signaling, scheduling, projections, hosted services)
- ElkSharp (generic graph layout algorithm)
- Renderer.ElkSharp, Renderer.ElkJs, Renderer.Msagl, Renderer.Svg
- Signaling.Redis, Signaling.OracleAq
- DataStore.MongoDB, DataStore.PostgreSQL, DataStore.Oracle

WebService: ASP.NET Core Minimal API with 22 endpoints

Tests (8 projects, 109 tests pass):
- Engine.Tests (105 pass), WebService.Tests (4 E2E pass)
- Renderer.Tests, DataStore.MongoDB/Oracle/PostgreSQL.Tests
- Signaling.Redis.Tests, IntegrationTests.Shared

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
docs/workflow/engine/15-backend-and-signal-driver-usage.md
# 15. Backend And Signal Driver Usage

## Purpose

This document turns the current backend implementation and the measured six-profile performance matrix into operating guidance.

It answers three practical questions:

1. Which backend should be the durable workflow system of record?
2. Should the signal driver stay native or use Redis?
3. When should a given combination be used, and when should it be avoided?

The reference comparison data comes from:

- [13-backend-comparison-2026-03-17.md](13-backend-comparison-2026-03-17.md)
- [14-signal-driver-backend-matrix-2026-03-17.md](14-signal-driver-backend-matrix-2026-03-17.md)

## Two Separate Choices

The current engine requires two distinct infrastructure choices.

### 1. Backend

The backend is the durable correctness layer.

It owns:

- runtime state
- projections
- durable signal persistence
- delayed signal persistence
- dead-letter persistence
- the transaction boundary for mutations

The backend is selected with the configuration key:

- `WorkflowBackend:Provider`

Supported values are defined by the engine's backend identifiers.

Current values:

- `Oracle`
- `Postgres`
- `Mongo`

### 2. Signal Driver

The signal driver is the wake mechanism.

It owns:

- wake notification delivery
- receive wait behavior
- the entry path into the claim loop

It does not own correctness.

The signal driver is selected with the configuration key:

- `WorkflowSignalDriver:Provider`

Supported values are defined by the engine's signal-driver identifiers.

Current values:

- `Native`
- `Redis`

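
The two keys above appear to follow the colon-separated path convention of .NET configuration, where `WorkflowBackend:Provider` names the `Provider` value inside the `WorkflowBackend` section. The Python sketch below is an illustration, not the engine's code: it flattens a JSON settings document the same way and rejects provider values outside the lists above. It simplifies two details by assumption: it matches keys exactly (.NET configuration keys are case-insensitive), and it treats a missing key as an error, since any engine default is not covered here.

```python
import json

# Provider values listed in this document; the engine's identifier constants are authoritative.
SUPPORTED_BACKENDS = {"Oracle", "Postgres", "Mongo"}
SUPPORTED_SIGNAL_DRIVERS = {"Native", "Redis"}


def flatten(node, prefix=""):
    """Flatten nested JSON objects into colon-separated keys such as 'WorkflowBackend:Provider'."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}:{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def selected_profile(config_text):
    """Return (backend, signal_driver) from a JSON settings document, or raise ValueError."""
    settings = flatten(json.loads(config_text))
    backend = settings.get("WorkflowBackend:Provider")
    driver = settings.get("WorkflowSignalDriver:Provider")
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"unsupported WorkflowBackend:Provider value: {backend!r}")
    if driver not in SUPPORTED_SIGNAL_DRIVERS:
        raise ValueError(f"unsupported WorkflowSignalDriver:Provider value: {driver!r}")
    return backend, driver


example = '{"WorkflowBackend": {"Provider": "Postgres"}, "WorkflowSignalDriver": {"Provider": "Native"}}'
print(selected_profile(example))  # ('Postgres', 'Native')
```

A value such as `"Provider": "Redis"` under `WorkflowBackend` is rejected, which mirrors the rule that Redis is never a backend.
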
## Core Rule

Redis is a wake driver, not a durable workflow queue.

That means:

1. the selected backend always remains the durable source of truth
2. runtime state and durable signals commit within the backend transaction boundary
3. Redis only publishes wake hints after that transaction commits
4. workers always claim from the durable backend store

Do not design or describe Redis as the place where workflow correctness lives.

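
The ordering in this rule is what makes a lost wake hint harmless. The Python sketch below models it with in-memory stand-ins; the class and function names are hypothetical, not engine types. The durable commit happens first, the wake hint is published only afterwards, and the worker's claim reads the durable store, so a signal whose hint was never delivered is still picked up on the next claim pass.

```python
from dataclasses import dataclass, field


@dataclass
class DurableStore:
    """Stand-in for the backend (Oracle, PostgreSQL, or Mongo): the only source of truth."""
    pending: list = field(default_factory=list)

    def commit(self, instance_id, signal):
        # In the real engine, runtime state and the durable signal commit in one
        # backend transaction; this stand-in only records the signal.
        self.pending.append((instance_id, signal))

    def claim(self):
        # Workers claim from the durable store, never from the wake channel.
        claimed, self.pending = self.pending, []
        return claimed


@dataclass
class WakeChannel:
    """Stand-in for the Redis wake driver: it carries hints only, and hints can be lost."""
    hints: list = field(default_factory=list)

    def publish(self, instance_id):
        self.hints.append(instance_id)


def raise_signal(store, wake, instance_id, signal, hint_delivered=True):
    store.commit(instance_id, signal)    # 1. durable commit first
    if hint_delivered:                   # False simulates a lost hint
        wake.publish(instance_id)        # 2. wake hint only after the commit


def worker_pass(store, wake):
    # A hint (or a wait timeout) only triggers the pass; the work itself comes from
    # the durable store, so a lost hint delays a signal but never loses it.
    wake.hints.clear()
    return store.claim()


store, wake = DurableStore(), WakeChannel()
raise_signal(store, wake, "wf-1", "approve")
raise_signal(store, wake, "wf-2", "reject", hint_delivered=False)
print(worker_pass(store, wake))  # [('wf-1', 'approve'), ('wf-2', 'reject')]
```

Reversing steps 1 and 2 would let a worker wake, find nothing committed yet, and go back to waiting, which is why the hint must follow the commit.
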
## Supported Profiles

| Profile | Durable correctness layer | Wake path | Current recommendation |
| --- | --- | --- | --- |
| `Oracle + Native` | Oracle + AQ | AQ dequeue | Default production profile |
| `Oracle + Redis` | Oracle + AQ | Redis wake, AQ claim | Supported, not preferred |
| `Postgres + Native` | PostgreSQL tables | PostgreSQL native wake | Best relational portability profile |
| `Postgres + Redis` | PostgreSQL tables | Redis wake, PostgreSQL claim | Supported, optional |
| `Mongo + Native` | Mongo collections | Mongo change streams | Fastest measured profile, with operational caveats |
| `Mongo + Redis` | Mongo collections | Redis wake, Mongo claim | Supported, generally not recommended |

## How To Read The Performance Data

The six-profile matrix contains two kinds of timing: real resume timing and timing driven by the benchmark's drain policy.

Use these rows as primary decision inputs:

- `Signal to first completion avg`
- `Throughput`

Treat these rows as secondary:

- `Signal to completion avg`
- `Drain-to-idle overhang avg`

Reason:

- `Signal to first completion avg` measures actual wake and resume speed
- `Signal to completion avg` also includes empty-queue drain behavior
- `Drain-to-idle overhang avg` shows how much of the mixed latency is benchmark overhang rather than real resume work

The current matrix shows this clearly:

| Metric | Oracle (native) | PostgreSQL (native) | Mongo (native) | Oracle + Redis | PostgreSQL + Redis | Mongo + Redis |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| Signal to first completion avg (ms) | 76.15 | 37.56 | 55.06 | 81.46 | 31.77 | 40.88 |
| Throughput (ops/s) | 24.17 | 26.28 | 119.51 | 21.88 | 25.51 | 25.14 |
| Drain-to-idle overhang avg (ms) | 2909.65 | 3047.65 | 57.86 | 3031.66 | 3033.61 | 3036.85 |

Interpretation:

- native Mongo reaches more than 4.5 times the throughput of any other profile because its change-stream wake path has almost no empty-receive overhang (57.86 ms versus roughly 2.9 to 3.05 s elsewhere)
- PostgreSQL native and PostgreSQL plus Redis are close in real resume speed (37.56 ms versus 31.77 ms to first completion)
- Oracle native remains slightly better than Oracle plus Redis on both primary metrics
- Mongo plus Redis loses most of native Mongo's advantage because Redis mode reintroduces the empty-wait overhang; its first-completion latency is lower, but its throughput falls to the level of the other profiles

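
To compare profiles on the primary rows, the matrix can be loaded into a small script. This Python sketch copies the values from the table above; the dictionary keys and helper names are illustrative, not fields of the benchmark output.

```python
# Values copied from the six-profile matrix above.
MATRIX = {
    "Oracle + Native":   {"first_completion_ms": 76.15, "throughput_ops": 24.17, "overhang_ms": 2909.65},
    "Postgres + Native": {"first_completion_ms": 37.56, "throughput_ops": 26.28, "overhang_ms": 3047.65},
    "Mongo + Native":    {"first_completion_ms": 55.06, "throughput_ops": 119.51, "overhang_ms": 57.86},
    "Oracle + Redis":    {"first_completion_ms": 81.46, "throughput_ops": 21.88, "overhang_ms": 3031.66},
    "Postgres + Redis":  {"first_completion_ms": 31.77, "throughput_ops": 25.51, "overhang_ms": 3033.61},
    "Mongo + Redis":     {"first_completion_ms": 40.88, "throughput_ops": 25.14, "overhang_ms": 3036.85},
}


def rank(metric, lower_is_better):
    """Profile names ordered from best to worst on one metric."""
    return sorted(MATRIX, key=lambda profile: MATRIX[profile][metric], reverse=not lower_is_better)


def redis_delta(backend):
    """Percentage change on the primary metrics when `backend` moves from Native to Redis."""
    native, redis = MATRIX[f"{backend} + Native"], MATRIX[f"{backend} + Redis"]
    return {
        "first_completion_pct": 100 * (redis["first_completion_ms"] / native["first_completion_ms"] - 1),
        "throughput_pct": 100 * (redis["throughput_ops"] / native["throughput_ops"] - 1),
    }


print(rank("throughput_ops", lower_is_better=False)[0])      # Mongo + Native
print(rank("first_completion_ms", lower_is_better=True)[0])  # Postgres + Redis
for backend in ("Oracle", "Postgres", "Mongo"):
    d = redis_delta(backend)
    print(f"{backend}: first completion {d['first_completion_pct']:+.1f}%, "
          f"throughput {d['throughput_pct']:+.1f}%")
```

With these numbers, moving Mongo from the native driver to Redis cuts throughput by about 79% even though first-completion latency drops by about 26%, which is the trade-off described in the interpretation above.
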
## Recommended Default Choices

### Default Production Choice Today

Use `Oracle + Native`.

Use it when:

- Oracle is already the platform system of record
- the most thoroughly validated correctness and restart behavior matter more than portability
- AQ is available and operationally acceptable
- timer precision and native transactional coupling are important

Why:

- it has the strongest hostile-condition coverage
- it remains the semantic reference implementation
- it keeps one native durable stack for state, signals, and scheduling

### Best Relational Non-Oracle Choice

Use `Postgres + Native`.

Use it when:

- a relational backend is required
- Oracle is not desired
- you want the cleanest portability path
- you want performance comparable to Oracle on simpler infrastructure (in the current matrix it is ahead of native Oracle on both primary metrics)

Why:

- it is the strongest non-Oracle backend in the current relational comparison
- native PostgreSQL wake is already competitive with Redis in the current measurements
- it keeps a single backend-native operational story

### Highest Measured Synthetic Throughput Choice

Use `Mongo + Native` only when its operational assumptions are acceptable.

Use it when:

- high throughput and a small drain-to-idle overhang matter strongly
- Mongo replica-set transactions are already an accepted platform dependency
- the team is comfortable operating change streams and handling Mongo-specific failure modes

Why:

- it currently has the highest measured throughput
- its native wake path avoids the large empty-wait overhang seen in the other measured paths

Do not treat this as the universal default.

Mongo is fast on the current engine workload, but its operational model is still less conservative than that of the relational profiles.

## When Redis Should Be Used

Select Redis for operational topology reasons, not on the assumption that it improves performance.

Good reasons to use Redis:

- one shared wake substrate is required across multiple backend profiles
- the deployment already standardizes on Redis for fan-out and worker wake infrastructure
- you intentionally want to disable the backend-native wake path and replace it with one uniform wake mechanism

Weak reasons to use Redis:

- "Redis is always faster"
- "Redis should hold the durable signal queue"
- "Redis should replace the backend transaction boundary"

None of these are valid design assumptions for this engine.

## Profile-By-Profile Guidance

### Oracle + Native

Use when:

- Oracle is the chosen workflow backend
- AQ is available
- you want the strongest native transactional semantics

Do not switch away from it just to standardize on Redis.

Current measured result:

- native Oracle is slightly better than Oracle plus Redis on both first-completion latency (76.15 ms versus 81.46 ms) and throughput (24.17 versus 21.88 ops/s)

### Oracle + Redis

Use only when:

- Oracle remains the durable backend
- Redis is required as a uniform wake topology across the environment
- the small performance loss is acceptable

Do not use it as the default Oracle profile.

Current measured result:

- it works correctly
- it is slower than native Oracle
- it does not improve timer behavior today

### Postgres + Native

Use it as the first portability target when moving away from Oracle.

Use when:

- you want a relational durable store
- you want the cleanest alternative to Oracle
- you want the simplest operational story for PostgreSQL

This should be the default PostgreSQL profile.

### Postgres + Redis

Use when:

- PostgreSQL is the durable backend
- a shared Redis wake topology is required
- performance roughly equal to native PostgreSQL, rather than a gain, is acceptable

Do not assume it is a speed upgrade.

Current measured result:

- it is very close to native PostgreSQL (31.77 ms versus 37.56 ms to first completion, 25.51 versus 26.28 ops/s)
- it is not a compelling performance win on its own

### Mongo + Native

Use when:

- MongoDB is an accepted transactional system of record for workflow runtime state
- replica-set transactions are available
- the team accepts Mongo operational ownership

This should be the default Mongo profile.

### Mongo + Redis

Avoid as the normal Mongo profile.

Use only when:

- Mongo must remain the durable backend
- Redis wake standardization is mandatory for the deployment
- the team accepts materially worse measured wake behavior than native Mongo

Current measured result:

- native Mongo is much better overall
- first-completion latency stays acceptable (40.88 ms), but throughput drops from 119.51 to 25.14 ops/s and drain-to-idle overhang grows from 57.86 ms to 3036.85 ms
- Redis removes the main measured advantage of the native Mongo wake path

## Timer And Delayed-Signal Guidance

Timers remain durable in the selected backend.

That means:

- Oracle timers remain durable in AQ
- PostgreSQL timers remain durable in PostgreSQL tables
- Mongo timers remain durable in Mongo collections

Redis does not become the timer authority.

Current practical rule:

- if timer behavior is a primary concern, prefer the native signal driver for the selected backend

Reason:

- Redis wake currently optimizes wake notification, not durable due-time ownership
- delayed messages still live in the backend store
- in Redis mode, due-time wake precision is still bounded by the driver's wait policy, not by a separate Redis-native timer authority

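
The following Python sketch illustrates how the wait policy bounds due-time precision in Redis mode. It is a simplified, hypothetical model rather than the engine's actual loop: the worker blocks until a wake hint arrives or `BlockingWaitSeconds` elapses, then claims every due message, and it assumes that a delayed signal publishes no wake hint of its own when it becomes due. Under those assumptions, an idle worker can notice a due signal up to one wait interval late.

```python
def pickup_time(due_at_s, blocking_wait_s, wake_hints_at_s=()):
    """Time at which an idle worker, starting its loop at t=0, claims a delayed signal.

    Hypothetical model: each wait ends at the next wake hint or after
    `blocking_wait_s`, whichever comes first, and every wait end triggers a
    claim pass. The delayed signal itself publishes no hint when it becomes due.
    """
    if blocking_wait_s <= 0:
        raise ValueError("blocking_wait_s must be positive")
    hints = sorted(wake_hints_at_s)
    t = 0
    while True:
        timeout_at = t + blocking_wait_s
        next_hint = next((h for h in hints if h > t), None)
        # Wake on whichever comes first: an unrelated hint or the wait timeout.
        t = next_hint if next_hint is not None and next_hint < timeout_at else timeout_at
        if t >= due_at_s:
            return t  # this claim pass finds the now-due signal in the backend store


print(pickup_time(12, 5))                         # 15
print(pickup_time(12, 5, wake_hints_at_s=(13,)))  # 13
```

With the `BlockingWaitSeconds: 5` used in the Redis examples, a signal due at second 12 is claimed at second 15 in this model; an unrelated wake hint at second 13 would shorten the delay to one second.
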
## What Must Not Be Mixed

Do not split durable responsibilities across systems.

Bad combinations:

- Oracle runtime state with PostgreSQL signals
- PostgreSQL runtime state with Redis as the durable signal queue
- Mongo runtime state with Oracle scheduling
- one backend for runtime state and another backend for projections

Use one backend profile per deployment.

The only supported cross-system split is:

- the durable backend
- an optional Redis wake driver

## Operational Decision Matrix

| Goal | Recommended profile |
| --- | --- |
| strongest production default today | `Oracle + Native` |
| best non-Oracle relational target | `Postgres + Native` |
| one uniform wake substrate across relational backends | `Postgres + Redis` |
| highest measured synthetic throughput and lowest drain-to-idle overhang | `Mongo + Native` |
| Mongo with forced Redis standardization | `Mongo + Redis`, only if policy requires it |
| Oracle with forced Redis standardization | `Oracle + Redis`, only if policy requires it |

## Configuration Surface

### Oracle + Native

```json
{
  "WorkflowBackend": {
    "Provider": "Oracle"
  },
  "WorkflowSignalDriver": {
    "Provider": "Native"
  },
  "WorkflowAq": {
    "QueueOwner": "SRD_WFKLW",
    "SignalQueueName": "WF_SIGNAL_Q",
    "ScheduleQueueName": "WF_SCHEDULE_Q",
    "DeadLetterQueueName": "WF_DLQ_Q"
  }
}
```

### Oracle + Redis

```json
{
  "WorkflowBackend": {
    "Provider": "Oracle"
  },
  "WorkflowSignalDriver": {
    "Provider": "Redis",
    "Redis": {
      "ChannelName": "serdica:workflow:signals",
      "BlockingWaitSeconds": 5
    }
  },
  "WorkflowAq": {
    "QueueOwner": "SRD_WFKLW",
    "SignalQueueName": "WF_SIGNAL_Q",
    "ScheduleQueueName": "WF_SCHEDULE_Q",
    "DeadLetterQueueName": "WF_DLQ_Q"
  }
}
```

### Postgres + Native

```json
{
  "WorkflowBackend": {
    "Provider": "Postgres",
    "Postgres": {
      "ConnectionStringName": "WorkflowPostgres",
      "SchemaName": "srd_wfklw",
      "ClaimBatchSize": 32,
      "BlockingWaitSeconds": 30
    }
  },
  "WorkflowSignalDriver": {
    "Provider": "Native"
  }
}
```

### Postgres + Redis

```json
{
  "WorkflowBackend": {
    "Provider": "Postgres",
    "Postgres": {
      "ConnectionStringName": "WorkflowPostgres",
      "SchemaName": "srd_wfklw"
    }
  },
  "WorkflowSignalDriver": {
    "Provider": "Redis",
    "Redis": {
      "ChannelName": "serdica:workflow:signals",
      "BlockingWaitSeconds": 5
    }
  }
}
```

### Mongo + Native

```json
{
  "WorkflowBackend": {
    "Provider": "Mongo",
    "Mongo": {
      "ConnectionStringName": "WorkflowMongo",
      "DatabaseName": "serdica_workflow_store",
      "BlockingWaitSeconds": 30
    }
  },
  "WorkflowSignalDriver": {
    "Provider": "Native"
  }
}
```

### Mongo + Redis

```json
{
  "WorkflowBackend": {
    "Provider": "Mongo",
    "Mongo": {
      "ConnectionStringName": "WorkflowMongo",
      "DatabaseName": "serdica_workflow_store"
    }
  },
  "WorkflowSignalDriver": {
    "Provider": "Redis",
    "Redis": {
      "ChannelName": "serdica:workflow:signals",
      "BlockingWaitSeconds": 5
    }
  }
}
```

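
The six examples share one pattern: a backend section, a signal-driver section that carries a `Redis` subsection only in Redis mode, and a separate `WorkflowAq` section only for Oracle. The Python sketch below reproduces that pattern as a hypothetical helper. It reuses the example values shown above, which are placeholders rather than defaults, and it leaves out the optional tuning keys (`ClaimBatchSize`, `BlockingWaitSeconds`) that appear in some backend examples.

```python
import json

# Section and key names from the examples above; values are the example placeholders.
ORACLE_AQ = {
    "QueueOwner": "SRD_WFKLW",
    "SignalQueueName": "WF_SIGNAL_Q",
    "ScheduleQueueName": "WF_SCHEDULE_Q",
    "DeadLetterQueueName": "WF_DLQ_Q",
}
BACKEND_SETTINGS = {
    "Oracle": None,  # Oracle queue settings live in the separate WorkflowAq section
    "Postgres": {"ConnectionStringName": "WorkflowPostgres", "SchemaName": "srd_wfklw"},
    "Mongo": {"ConnectionStringName": "WorkflowMongo", "DatabaseName": "serdica_workflow_store"},
}
REDIS_WAKE = {"ChannelName": "serdica:workflow:signals", "BlockingWaitSeconds": 5}


def build_config(backend, driver):
    """Build a settings dict shaped like the examples for one backend/driver profile."""
    if backend not in BACKEND_SETTINGS or driver not in ("Native", "Redis"):
        raise ValueError(f"unsupported profile: {backend} + {driver}")
    backend_section = {"Provider": backend}
    if BACKEND_SETTINGS[backend] is not None:
        backend_section[backend] = dict(BACKEND_SETTINGS[backend])
    driver_section = {"Provider": driver}
    if driver == "Redis":
        driver_section["Redis"] = dict(REDIS_WAKE)  # wake hints only; no durable data
    config = {"WorkflowBackend": backend_section, "WorkflowSignalDriver": driver_section}
    if backend == "Oracle":
        config["WorkflowAq"] = dict(ORACLE_AQ)  # AQ stays the durable signal and schedule store
    return config


print(json.dumps(build_config("Postgres", "Redis"), indent=2))
```

Generating the file from one function keeps the Redis subsection from drifting into a backend section, which would blur the split between durable backend and wake driver.
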
## Plugin Registration Rule

The host stays backend-neutral.

That means the selected backend plugin, and the Redis wake-driver plugin if it is used, must be listed in `PluginsConfig:PluginsOrder`.

Relevant plugin categories are:

- Oracle backend plugin
- PostgreSQL backend plugin
- MongoDB backend plugin
- Redis wake-driver plugin

If Redis is not the configured signal driver, do not register its plugin just because it is available.

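
A deployment check for this rule can be sketched in Python. The plugin identifiers below are hypothetical placeholders, since the actual entries in `PluginsConfig:PluginsOrder` depend on the host. Treating a second registered backend plugin as an error is this sketch's reading of the one-backend-per-deployment rule, not a documented engine check.

```python
# Hypothetical plugin identifiers; substitute the names your host lists in PluginsOrder.
BACKEND_PLUGINS = {"Oracle": "oracle-backend", "Postgres": "postgres-backend", "Mongo": "mongo-backend"}
REDIS_WAKE_PLUGIN = "redis-wake-driver"


def check_plugins_order(plugins_order, backend, driver):
    """Return a list of problems; an empty list means the registration looks consistent."""
    problems = []
    registered_backends = [p for p in plugins_order if p in BACKEND_PLUGINS.values()]
    if registered_backends != [BACKEND_PLUGINS[backend]]:
        problems.append(
            f"expected exactly one backend plugin ({BACKEND_PLUGINS[backend]}), found {registered_backends}"
        )
    redis_registered = REDIS_WAKE_PLUGIN in plugins_order
    if driver == "Redis" and not redis_registered:
        problems.append("Redis signal driver selected, but its plugin is not registered")
    if driver != "Redis" and redis_registered:
        problems.append("Redis wake plugin registered although the signal driver is not Redis")
    return problems


print(check_plugins_order(["postgres-backend", "redis-wake-driver"], "Postgres", "Native"))
```

The printed list flags the Redis plugin as registered without being the selected driver, the case the rule above warns against.
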
## Recommended Decision Order

When choosing a deployment profile, use this order:

1. choose the durable backend based on correctness and platform ownership
2. start with the native signal driver
3. add Redis only if there is a clear topology or operational reason
4. validate the choice against the six-profile matrix, not against assumptions

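
The first three steps of this order can be encoded as a small guard in Python. The reason tags are hypothetical labels for the good reasons listed under "When Redis Should Be Used"; step 4, checking the choice against the measured matrix, remains a manual review and is not modeled here.

```python
# Hypothetical tags for the good reasons listed under "When Redis Should Be Used".
VALID_REDIS_REASONS = {
    "shared-wake-substrate",    # one wake substrate across multiple backend profiles
    "existing-redis-standard",  # deployment already standardizes on Redis for worker wake
    "uniform-wake-mechanism",   # backend-native wake intentionally replaced
}


def choose_profile(backend, redis_reason=None):
    """Return a profile name such as 'Postgres + Native' following steps 1 to 3."""
    # Step 1: the durable backend is chosen first, on correctness and platform ownership.
    if backend not in {"Oracle", "Postgres", "Mongo"}:
        raise ValueError(f"unsupported backend: {backend!r}")
    # Step 2: start from the native signal driver.
    if redis_reason is None:
        return f"{backend} + Native"
    # Step 3: switch to Redis only for a topology or operational reason.
    if redis_reason not in VALID_REDIS_REASONS:
        raise ValueError(f"not a valid reason to use Redis: {redis_reason!r}")
    return f"{backend} + Redis"


print(choose_profile("Postgres"))                          # Postgres + Native
print(choose_profile("Oracle", "shared-wake-substrate"))   # Oracle + Redis
```

A performance-based reason such as "redis-is-faster" is rejected, matching the weak reasons listed earlier.
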
## Current Bottom Line

Today the practical recommendation is:

- `Oracle + Native` as the strongest default production backend
- `Postgres + Native` as the best relational portability target
- `Mongo + Native` only when Mongo's operational assumptions are explicitly accepted
- `Redis` as an optional wake standardization layer, not as the default performance answer