# 15. Backend And Signal Driver Usage

## Purpose

This document turns the current backend implementation and measured six-profile matrix into operating guidance.

It answers three practical questions:

1. which backend should be the durable workflow system of record
2. whether the signal driver should stay native or use Redis
3. when a given combination should or should not be used

The reference comparison data comes from:

- [13-backend-comparison-2026-03-17.md](13-backend-comparison-2026-03-17.md)
- [14-signal-driver-backend-matrix-2026-03-17.md](14-signal-driver-backend-matrix-2026-03-17.md)

## Two Separate Choices

There are two distinct infrastructure choices in the current engine.

### 1. Backend

The backend is the durable correctness layer.

It owns:

- runtime state
- projections
- durable signal persistence
- delayed signal persistence
- dead-letter persistence
- mutation transaction boundary

The configured backend lives under:

- `WorkflowBackend:Provider`

Supported values are defined by the engine backend identifiers.

Current values:

- `Oracle`
- `Postgres`
- `Mongo`

### 2. Signal Driver

The signal driver is the wake mechanism.

It owns:

- wake notification delivery
- receive wait behavior
- claim loop entry path

It does not own correctness.

The configured signal driver lives under:

- `WorkflowSignalDriver:Provider`

Supported values are defined by the engine signal-driver identifiers.

Current values:

- `Native`
- `Redis`

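As a minimal sketch (not the engine's own validation code), the two provider keys can be checked independently before startup. The helper name `read_profile` and the dictionary-shaped configuration are assumptions made for illustration; only the key names and the allowed values come from this document.

```python
# Illustrative only: validates the two provider keys described above.
# `read_profile` is a hypothetical helper, not part of the engine.

BACKENDS = {"Oracle", "Postgres", "Mongo"}
SIGNAL_DRIVERS = {"Native", "Redis"}


def read_profile(config: dict) -> tuple:
    """Return (backend, signal_driver) or raise ValueError on an unknown value."""
    backend = config.get("WorkflowBackend", {}).get("Provider")
    driver = config.get("WorkflowSignalDriver", {}).get("Provider")
    if backend not in BACKENDS:
        raise ValueError(f"unknown WorkflowBackend:Provider: {backend!r}")
    if driver not in SIGNAL_DRIVERS:
        raise ValueError(f"unknown WorkflowSignalDriver:Provider: {driver!r}")
    return backend, driver


if __name__ == "__main__":
    sample = {
        "WorkflowBackend": {"Provider": "Postgres"},
        "WorkflowSignalDriver": {"Provider": "Native"},
    }
    print(read_profile(sample))
```

The two lookups are deliberately separate: the backend choice and the signal-driver choice are validated on their own, mirroring the split described in this section.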
## Core Rule

Redis is a wake driver, not a durable workflow queue.

That means:

1. the selected backend always remains the durable source of truth
2. runtime state and durable signals commit in the backend transaction boundary
3. Redis only publishes wake hints after commit
4. workers always claim from the durable backend store

Do not design or describe Redis as the place where workflow correctness lives.

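The four points above can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not the engine's implementation: `DurableBackend`, `WakeChannel`, `enqueue`, and `worker_pass` are hypothetical names, and no real database or Redis client is involved. It shows that a lost wake hint delays work but never loses it, because the signal only ever lives in the backend.

```python
# Toy model of "durable commit first, wake hint second".
# Every class and function name here is hypothetical.
from collections import deque
from typing import Optional


class DurableBackend:
    """Stands in for Oracle, PostgreSQL, or Mongo: the only place signals live."""

    def __init__(self) -> None:
        self.state = {}
        self.signals = deque()

    def commit(self, workflow_id: str, new_state: str, signal: str) -> None:
        # Runtime state and the durable signal are written together,
        # modelling the backend transaction boundary.
        self.state[workflow_id] = new_state
        self.signals.append(signal)

    def claim(self) -> Optional[str]:
        # Workers always claim from the durable store.
        return self.signals.popleft() if self.signals else None


class WakeChannel:
    """Stands in for the Redis wake driver: carries hints, never payloads."""

    def __init__(self, drop_hints: bool = False) -> None:
        self.drop_hints = drop_hints
        self.hints = 0

    def publish(self) -> None:
        if not self.drop_hints:
            self.hints += 1


def enqueue(backend: DurableBackend, channel: WakeChannel,
            workflow_id: str, signal: str) -> None:
    backend.commit(workflow_id, "waiting", signal)  # 1. durable commit
    channel.publish()                               # 2. wake hint, after commit


def worker_pass(backend: DurableBackend) -> list:
    """Claim everything that is durably available, with or without a hint."""
    claimed = []
    while True:
        signal = backend.claim()
        if signal is None:
            return claimed
        claimed.append(signal)


if __name__ == "__main__":
    backend, channel = DurableBackend(), WakeChannel(drop_hints=True)
    enqueue(backend, channel, "wf-1", "resume")
    print(channel.hints, worker_pass(backend))
```

In the usage at the bottom the hint is dropped on purpose, yet the next worker pass still claims the signal from the backend.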
## Supported Profiles

| Profile | Durable correctness layer | Wake path | Current recommendation |
| --- | --- | --- | --- |
| `Oracle + Native` | Oracle + AQ | AQ dequeue | Default production profile |
| `Oracle + Redis` | Oracle + AQ | Redis wake, AQ claim | Supported, not preferred |
| `Postgres + Native` | PostgreSQL tables | PostgreSQL native wake | Best relational portability profile |
| `Postgres + Redis` | PostgreSQL tables | Redis wake, PostgreSQL claim | Supported, optional |
| `Mongo + Native` | Mongo collections | Mongo change streams | Highest measured throughput, with operational caveats |
| `Mongo + Redis` | Mongo collections | Redis wake, Mongo claim | Supported, generally not recommended |

## How To Read The Performance Data

The six-profile matrix contains both real resume timing and benchmark drain-policy timing.

Use these rows as primary decision inputs:

- `Signal to first completion avg`
- `Throughput`

Treat these rows as secondary:

- `Signal to completion avg`
- `Drain-to-idle overhang avg`

Reason:

- `Signal to first completion avg` measures actual wake and resume speed
- `Signal to completion avg` also includes empty-queue drain behavior
- `Drain-to-idle overhang avg` explains how much of the mixed latency is benchmark overhang, not real resume work

The current matrix shows that clearly:

| Metric | Oracle | PostgreSQL | Mongo | Oracle+Redis | PostgreSQL+Redis | Mongo+Redis |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| Signal to first completion avg ms | 76.15 | 37.56 | 55.06 | 81.46 | 31.77 | 40.88 |
| Throughput ops/s | 24.17 | 26.28 | 119.51 | 21.88 | 25.51 | 25.14 |
| Drain-to-idle overhang avg ms | 2909.65 | 3047.65 | 57.86 | 3031.66 | 3033.61 | 3036.85 |

Interpretation:

- native Mongo has by far the highest throughput (119.51 ops/s) because its change-stream wake path also has low empty-receive overhang (57.86 ms, versus roughly 2900 to 3050 ms for every other profile)
- PostgreSQL native and PostgreSQL plus Redis are close in real resume speed (37.56 ms versus 31.77 ms to first completion)
- Oracle native remains slightly better than Oracle plus Redis (76.15 ms versus 81.46 ms to first completion, 24.17 versus 21.88 ops/s)
- Mongo plus Redis loses most of native Mongo's advantage (throughput falls to 25.14 ops/s) because Redis mode reintroduces the empty-wait overhang

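To make the primary and secondary rows concrete, the following sketch loads the figures from the matrix above and ranks the profiles. The numbers are copied from the table; the helper names (`rank_by_first_completion`, `best_throughput`, `overhang_dominated`) and the factor of 10 used to flag overhang are illustrative assumptions, not part of the benchmark tooling.

```python
# Figures copied from the six-profile matrix; helper names are illustrative.
MATRIX = {
    "Oracle + Native":   {"first_ms": 76.15, "ops_s": 24.17,  "overhang_ms": 2909.65},
    "Postgres + Native": {"first_ms": 37.56, "ops_s": 26.28,  "overhang_ms": 3047.65},
    "Mongo + Native":    {"first_ms": 55.06, "ops_s": 119.51, "overhang_ms": 57.86},
    "Oracle + Redis":    {"first_ms": 81.46, "ops_s": 21.88,  "overhang_ms": 3031.66},
    "Postgres + Redis":  {"first_ms": 31.77, "ops_s": 25.51,  "overhang_ms": 3033.61},
    "Mongo + Redis":     {"first_ms": 40.88, "ops_s": 25.14,  "overhang_ms": 3036.85},
}


def rank_by_first_completion(matrix: dict) -> list:
    """Profiles ordered from lowest to highest signal-to-first-completion time."""
    return sorted(matrix, key=lambda name: matrix[name]["first_ms"])


def best_throughput(matrix: dict) -> str:
    """Profile with the highest measured ops/s."""
    return max(matrix, key=lambda name: matrix[name]["ops_s"])


def overhang_dominated(matrix: dict, factor: float = 10.0) -> list:
    """Profiles whose drain overhang exceeds `factor` times the real resume time."""
    return [
        name for name, row in matrix.items()
        if row["overhang_ms"] > factor * row["first_ms"]
    ]


if __name__ == "__main__":
    print(rank_by_first_completion(MATRIX))
    print(best_throughput(MATRIX))
    print(overhang_dominated(MATRIX))
```

The two primary rows do not produce the same ordering, which is why both are listed as decision inputs, and the overhang check separates native Mongo from the other five profiles.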
## Recommended Default Choices

### Default Production Choice Today

Use `Oracle + Native`.

Use it when:

- Oracle is already the platform system of record
- strongest validated correctness and restart behavior matter more than portability
- AQ is available and operationally acceptable
- timer precision and native transactional coupling are important

Why:

- it has the strongest hostile-condition coverage
- it remains the semantic reference implementation
- it keeps one native durable stack for state, signals, and scheduling

### Best Relational Non-Oracle Choice

Use `Postgres + Native`.

Use it when:

- a relational backend is required
- Oracle is not desired
- you want the cleanest portability path
- you want performance close to Oracle with simpler infrastructure

Why:

- it is the strongest non-Oracle backend in the current relational comparison
- native PostgreSQL wake is already competitive with Redis in the current measurements
- it keeps one backend-native operational story

### Highest Measured Synthetic Throughput Choice

Use `Mongo + Native` only when its operational assumptions are acceptable.

Use it when:

- throughput and low wake latency matter strongly
- Mongo replica-set transactions are already an accepted platform dependency
- the team is comfortable operating change streams and Mongo-specific failure modes

Why:

- it currently has the highest measured throughput of the six profiles
- its native wake path avoids the large empty-wait overhang seen in the other measured paths

Do not treat this as the universal default.

Mongo is fast in the current engine workload, but its operational model is still less conservative than the relational profiles.

## When Redis Should Be Used

Select Redis for operational topology reasons, not as a default performance assumption.

Good reasons to use Redis:

- one shared wake substrate is required across multiple backend profiles
- the deployment already standardizes on Redis for fan-out and worker wake infrastructure
- you want the backend-native wake path disabled intentionally and replaced by one uniform wake mechanism

Weak reasons to use Redis:

- "Redis is always faster"
- "Redis should hold the durable signal queue"
- "Redis should replace the backend transaction boundary"

Those are not valid design assumptions for this engine.

## Profile-By-Profile Guidance

### Oracle + Native

Use when:

- Oracle is the chosen workflow backend
- AQ is available
- you want the strongest native transactional semantics

Do not switch away from it just to standardize on Redis.

Current measured result:

- native Oracle is slightly better than Oracle plus Redis on both first-completion latency and throughput

### Oracle + Redis

Use only when:

- Oracle remains the durable backend
- Redis is required as a uniform wake topology across the environment
- the small performance loss is acceptable

Do not use it as the default Oracle profile.

Current measured result:

- it works correctly
- it is slower than native Oracle
- it does not improve timer behavior today

### Postgres + Native

Use as the first portability target when leaving Oracle.

Use when:

- you want a relational durable store
- you want the cleanest alternative to Oracle
- you want the simplest operational story for PostgreSQL

This should be the default PostgreSQL profile.

### Postgres + Redis

Use when:

- PostgreSQL is the durable backend
- a shared Redis wake topology is required
- a nearly flat performance profile versus native PostgreSQL is acceptable

Do not assume it is a speed upgrade.

Current measured result:

- it is very close to native PostgreSQL
- it is not a compelling performance win on its own

### Mongo + Native

Use when:

- MongoDB is an accepted transactional system of record for workflow runtime state
- replica-set transactions are available
- the team accepts Mongo operational ownership

This should be the default Mongo profile.

### Mongo + Redis

Avoid as the normal Mongo profile.

Use only when:

- Mongo must remain the durable backend
- Redis wake standardization is mandatory for the deployment
- the team accepts materially worse measured throughput and idle-drain behavior than native Mongo

Current measured result:

- native Mongo is much better overall
- first-completion latency stays acceptable, but steady throughput and idle-drain behavior become much worse
- Redis removes the main measured advantage of the native Mongo wake path

## Timer And Delayed-Signal Guidance

Timers remain durable in the selected backend.

That means:

- Oracle timers remain durable in AQ
- PostgreSQL timers remain durable in PostgreSQL tables
- Mongo timers remain durable in Mongo collections

Redis does not become the timer authority.

Current practical rule:

- if timer behavior is a primary concern, prefer the native signal driver for the selected backend

Reason:

- Redis wake currently optimizes wake notification, not durable due-time ownership
- delayed messages still live in the backend store
- due-time wake precision in Redis mode is still bounded by the driver wait policy rather than a separate Redis-native timer authority

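The last point can be pictured with a deliberately simplified model. It assumes, purely for illustration, a worker that blocks for at most a fixed wait interval and re-checks the backend for due signals each time that wait expires, with no wake hint arriving when a delayed signal becomes due. The real driver wait policy may differ; `pickup_time` and `lateness` are hypothetical helpers.

```python
# Simplified model: due signals are only noticed at wait-interval boundaries.
# This is an assumption for illustration, not the engine's actual wait policy.
import math


def pickup_time(due_at: float, blocking_wait: float, loop_start: float = 0.0) -> float:
    """First re-check instant at or after `due_at`, in seconds."""
    if blocking_wait <= 0:
        raise ValueError("blocking_wait must be positive")
    elapsed = max(0.0, due_at - loop_start)
    checks = math.ceil(elapsed / blocking_wait)
    return loop_start + checks * blocking_wait


def lateness(due_at: float, blocking_wait: float, loop_start: float = 0.0) -> float:
    """How long after its due time the signal is noticed under this model."""
    return pickup_time(due_at, blocking_wait, loop_start) - due_at


if __name__ == "__main__":
    # A signal due 12 s after the loop starts, with a 5 s wait interval.
    print(pickup_time(12.0, 5.0), lateness(12.0, 5.0))
```

Under this model the lateness is always smaller than the wait interval, which is the sense in which due-time precision is bounded by the wait policy.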
## What Must Not Be Mixed

Do not mix durable responsibilities across systems.

Bad combinations:

- Oracle runtime state with PostgreSQL signals
- PostgreSQL runtime state with Redis as the durable signal queue
- Mongo runtime state with Oracle scheduling
- one backend for runtime state and another backend for projections

Use one backend profile per deployment.

The only supported cross-system split is:

- durable backend
- optional Redis wake driver

## Operational Decision Matrix

| Goal | Recommended profile |
| --- | --- |
| strongest production default today | `Oracle + Native` |
| best non-Oracle relational target | `Postgres + Native` |
| one uniform wake substrate across relational backends | `Postgres + Redis` |
| highest measured synthetic throughput | `Mongo + Native` |
| Mongo with forced Redis standardization | `Mongo + Redis`, only if policy requires it |
| Oracle with forced Redis standardization | `Oracle + Redis`, only if policy requires it |

## Configuration Surface

### Oracle + Native

```json
{
  "WorkflowBackend": {
    "Provider": "Oracle"
  },
  "WorkflowSignalDriver": {
    "Provider": "Native"
  },
  "WorkflowAq": {
    "QueueOwner": "SRD_WFKLW",
    "SignalQueueName": "WF_SIGNAL_Q",
    "ScheduleQueueName": "WF_SCHEDULE_Q",
    "DeadLetterQueueName": "WF_DLQ_Q"
  }
}
```

### Oracle + Redis

```json
{
  "WorkflowBackend": {
    "Provider": "Oracle"
  },
  "WorkflowSignalDriver": {
    "Provider": "Redis",
    "Redis": {
      "ChannelName": "serdica:workflow:signals",
      "BlockingWaitSeconds": 5
    }
  },
  "WorkflowAq": {
    "QueueOwner": "SRD_WFKLW",
    "SignalQueueName": "WF_SIGNAL_Q",
    "ScheduleQueueName": "WF_SCHEDULE_Q",
    "DeadLetterQueueName": "WF_DLQ_Q"
  }
}
```

### Postgres + Native

```json
{
  "WorkflowBackend": {
    "Provider": "Postgres",
    "Postgres": {
      "ConnectionStringName": "WorkflowPostgres",
      "SchemaName": "srd_wfklw",
      "ClaimBatchSize": 32,
      "BlockingWaitSeconds": 30
    }
  },
  "WorkflowSignalDriver": {
    "Provider": "Native"
  }
}
```

### Postgres + Redis

```json
{
  "WorkflowBackend": {
    "Provider": "Postgres",
    "Postgres": {
      "ConnectionStringName": "WorkflowPostgres",
      "SchemaName": "srd_wfklw"
    }
  },
  "WorkflowSignalDriver": {
    "Provider": "Redis",
    "Redis": {
      "ChannelName": "serdica:workflow:signals",
      "BlockingWaitSeconds": 5
    }
  }
}
```

### Mongo + Native

```json
{
  "WorkflowBackend": {
    "Provider": "Mongo",
    "Mongo": {
      "ConnectionStringName": "WorkflowMongo",
      "DatabaseName": "serdica_workflow_store",
      "BlockingWaitSeconds": 30
    }
  },
  "WorkflowSignalDriver": {
    "Provider": "Native"
  }
}
```

### Mongo + Redis

```json
{
  "WorkflowBackend": {
    "Provider": "Mongo",
    "Mongo": {
      "ConnectionStringName": "WorkflowMongo",
      "DatabaseName": "serdica_workflow_store"
    }
  },
  "WorkflowSignalDriver": {
    "Provider": "Redis",
    "Redis": {
      "ChannelName": "serdica:workflow:signals",
      "BlockingWaitSeconds": 5
    }
  }
}
```

## Plugin Registration Rule

The host stays backend-neutral.

That means the selected backend plugin and, when Redis is the configured signal driver, the Redis wake plugin must be present in `PluginsConfig:PluginsOrder`.

Relevant plugin categories are:

- Oracle backend plugin
- PostgreSQL backend plugin
- MongoDB backend plugin
- Redis wake-driver plugin

If Redis is not the configured signal driver, do not register its plugin just because the plugin exists.

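The rule can be expressed as a small lookup. This sketch works only with the plugin categories named above, because the concrete identifiers that go into `PluginsConfig:PluginsOrder` are not listed in this document; `required_plugin_categories` is a hypothetical helper.

```python
# Maps a chosen profile to the plugin categories it needs.
# Category labels come from this document; real plugin identifiers are not shown here.
BACKEND_PLUGIN = {
    "Oracle": "Oracle backend plugin",
    "Postgres": "PostgreSQL backend plugin",
    "Mongo": "MongoDB backend plugin",
}
REDIS_PLUGIN = "Redis wake-driver plugin"


def required_plugin_categories(backend: str, signal_driver: str) -> list:
    """Return the plugin categories a profile needs, and nothing more."""
    if backend not in BACKEND_PLUGIN:
        raise ValueError(f"unknown backend: {backend!r}")
    if signal_driver not in ("Native", "Redis"):
        raise ValueError(f"unknown signal driver: {signal_driver!r}")
    required = [BACKEND_PLUGIN[backend]]
    if signal_driver == "Redis":
        # Redis is registered only when it is the configured wake driver.
        required.append(REDIS_PLUGIN)
    return required


if __name__ == "__main__":
    print(required_plugin_categories("Postgres", "Native"))
    print(required_plugin_categories("Mongo", "Redis"))
```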
## Recommended Decision Order

When choosing a deployment profile, use this order:

1. choose the durable backend based on correctness and platform ownership
2. choose the native signal driver first
3. add Redis only if there is a clear topology or operational reason
4. validate the choice against the six-profile matrix, not against assumptions

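A minimal sketch of that order as code, assuming the caller has already settled on a backend for correctness and ownership reasons (step 1). The function `choose_profile` and its `redis_wake_required` flag are hypothetical names; step 4, checking the result against the measured matrix, remains a manual review and is not automated here.

```python
# Encodes steps 1 to 3 of the decision order; step 4 stays a human review.
BACKENDS = ("Oracle", "Postgres", "Mongo")


def choose_profile(backend: str, redis_wake_required: bool = False) -> str:
    """Return a profile name such as 'Postgres + Native'."""
    # Step 1: the backend is chosen first and must be a supported value.
    if backend not in BACKENDS:
        raise ValueError(f"unsupported backend: {backend!r}")
    # Step 2: the native signal driver is the starting point.
    driver = "Native"
    # Step 3: Redis is added only for an explicit topology or operational reason.
    if redis_wake_required:
        driver = "Redis"
    return f"{backend} + {driver}"


if __name__ == "__main__":
    print(choose_profile("Oracle"))
    print(choose_profile("Postgres", redis_wake_required=True))
```

Making Redis an explicit opt-in keeps the native driver as the default outcome for every backend.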
## Current Bottom Line

Today the practical recommendation is:

- `Oracle + Native` for the strongest default production backend
- `Postgres + Native` for the best relational portability target
- `Mongo + Native` only when Mongo operational assumptions are explicitly accepted
- `Redis` as an optional wake standardization layer, not as the default performance answer