15. Backend And Signal Driver Usage

Purpose

This document turns the current backend implementation and measured six-profile matrix into operating guidance.

It answers three practical questions:

  1. which backend should be the durable workflow system of record
  2. whether the signal driver should stay native or use Redis
  3. when a given combination should or should not be used

The reference comparison data comes from the measured six-profile matrix.

Two Separate Choices

There are two distinct infrastructure choices in the current engine.

1. Backend

The backend is the durable correctness layer.

It owns:

  • runtime state
  • projections
  • durable signal persistence
  • delayed signal persistence
  • dead-letter persistence
  • mutation transaction boundary

The configured backend lives under:

  • WorkflowBackend:Provider

Supported values are defined by the engine backend identifiers.

Current values:

  • Oracle
  • Postgres
  • Mongo

2. Signal Driver

The signal driver is the wake mechanism.

It owns:

  • wake notification delivery
  • receive wait behavior
  • claim loop entry path

It does not own correctness.

The configured signal driver lives under:

  • WorkflowSignalDriver:Provider

Supported values are defined by the engine signal-driver identifiers.

Current values:

  • Native
  • Redis
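The two choices are independent, so a deployment can be checked against both value lists before startup. The following is a minimal sketch, not engine code: the key names and allowed values are taken from this document, while the `validate_profile` helper, its exact-case matching, and its requirement that both keys be present are assumptions made for illustration.

```python
# Sketch only: key names and values come from this document;
# the helper itself is hypothetical and not part of the engine.
BACKEND_PROVIDERS = {"Oracle", "Postgres", "Mongo"}
SIGNAL_DRIVER_PROVIDERS = {"Native", "Redis"}


def validate_profile(config: dict) -> tuple:
    """Return (backend, signal_driver) or raise ValueError.

    Assumption: both providers must be set explicitly and match case exactly.
    """
    backend = config.get("WorkflowBackend", {}).get("Provider")
    driver = config.get("WorkflowSignalDriver", {}).get("Provider")
    if backend not in BACKEND_PROVIDERS:
        raise ValueError(f"unsupported WorkflowBackend:Provider: {backend!r}")
    if driver not in SIGNAL_DRIVER_PROVIDERS:
        raise ValueError(f"unsupported WorkflowSignalDriver:Provider: {driver!r}")
    return backend, driver


if __name__ == "__main__":
    cfg = {
        "WorkflowBackend": {"Provider": "Postgres"},
        "WorkflowSignalDriver": {"Provider": "Native"},
    }
    print(validate_profile(cfg))
```

Note that "Redis" is rejected as a backend value in this sketch, which mirrors the separation described above: Redis can only appear as a signal driver.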

Core Rule

Redis is a wake driver, not a durable workflow queue.

That means:

  1. the selected backend always remains the durable source of truth
  2. runtime state and durable signals commit in the backend transaction boundary
  3. Redis only publishes wake hints after commit
  4. workers always claim from the durable backend store

Do not design or describe Redis as the place where workflow correctness lives.
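The four points above can be sketched with in-memory stand-ins. This is an illustrative model under stated assumptions, not the engine's implementation: `DurableBackend`, `WakeChannel`, `raise_signal`, and `worker_step` are hypothetical names, and the wake channel is deliberately allowed to drop a hint to show that correctness does not depend on it.

```python
import queue


class DurableBackend:
    """In-memory stand-in for the durable backend (hypothetical)."""

    def __init__(self):
        self.state = {}
        self.pending_signals = []

    def commit(self, instance_id, new_state, signal):
        # Runtime state and the durable signal change together,
        # modeling one backend transaction boundary.
        self.state[instance_id] = new_state
        self.pending_signals.append(signal)

    def claim(self):
        # Workers always claim from the durable store.
        return self.pending_signals.pop(0) if self.pending_signals else None


class WakeChannel:
    """Stand-in for a wake channel: carries hints only, and may lose them."""

    def __init__(self):
        self._hints = queue.Queue()

    def publish(self, lose=False):
        if not lose:
            self._hints.put("wake")

    def wait(self, timeout):
        try:
            self._hints.get(timeout=timeout)
            return True
        except queue.Empty:
            return False


def raise_signal(backend, wake, instance_id, new_state, signal, lose_hint=False):
    backend.commit(instance_id, new_state, signal)  # 1. durable commit first
    wake.publish(lose=lose_hint)                    # 2. wake hint only after commit


def worker_step(backend, wake, wait_seconds):
    wake.wait(wait_seconds)  # hint or timeout: either way, go claim
    return backend.claim()


if __name__ == "__main__":
    backend, wake = DurableBackend(), WakeChannel()
    raise_signal(backend, wake, "wf-1", {"step": 2}, "sig-1", lose_hint=True)
    print(worker_step(backend, wake, wait_seconds=0.01))
```

Even with the hint lost, the worker still finds the signal once its wait ends, because the signal was never stored in the wake channel in the first place. The lost hint costs latency, not correctness.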

Supported Profiles

| Profile | Durable correctness layer | Wake path | Current recommendation |
| --- | --- | --- | --- |
| Oracle + Native | Oracle + AQ | AQ dequeue | Default production profile |
| Oracle + Redis | Oracle + AQ | Redis wake, AQ claim | Supported, not preferred |
| Postgres + Native | PostgreSQL tables | PostgreSQL native wake | Best relational portability profile |
| Postgres + Redis | PostgreSQL tables | Redis wake, PostgreSQL claim | Supported, optional |
| Mongo + Native | Mongo collections | Mongo change streams | Fastest measured profile, with operational caveats |
| Mongo + Redis | Mongo collections | Redis wake, Mongo claim | Supported, generally not recommended |

How To Read The Performance Data

The six-profile matrix contains both real resume timing and benchmark drain policy timing.

Use these rows as primary decision inputs:

  • Signal to first completion avg
  • Throughput

Treat these rows as secondary:

  • Signal to completion avg
  • Drain-to-idle overhang avg

Reason:

  • Signal to first completion avg measures actual wake and resume speed
  • Signal to completion avg also includes empty-queue drain behavior
  • Drain-to-idle overhang avg explains how much of the mixed latency is benchmark overhang, not real resume work

The current matrix shows that clearly:

| Metric | Oracle | PostgreSQL | Mongo | Oracle+Redis | PostgreSQL+Redis | Mongo+Redis |
| --- | --- | --- | --- | --- | --- | --- |
| Signal to first completion avg (ms) | 76.15 | 37.56 | 55.06 | 81.46 | 31.77 | 40.88 |
| Throughput (ops/s) | 24.17 | 26.28 | 119.51 | 21.88 | 25.51 | 25.14 |
| Drain-to-idle overhang avg (ms) | 2909.65 | 3047.65 | 57.86 | 3031.66 | 3033.61 | 3036.85 |

Interpretation:

  • native Mongo is fast because the native change-stream wake path also has low empty-receive overhang
  • PostgreSQL native and PostgreSQL plus Redis are close in real resume speed
  • Oracle native remains slightly better than Oracle plus Redis
  • Mongo plus Redis loses most of native Mongo's advantage because Redis mode reintroduces the empty-wait overhang
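The interpretation can be checked with simple arithmetic on the two primary rows. The sketch below copies the first-completion and throughput values from the matrix and computes the relative change of each Redis profile against its native counterpart. The `redis_vs_native` helper is hypothetical and exists only for this illustration.

```python
# Values copied from the matrix above: (first completion avg ms, throughput ops/s).
MATRIX = {
    "Oracle":     {"native": (76.15, 24.17),  "redis": (81.46, 21.88)},
    "PostgreSQL": {"native": (37.56, 26.28),  "redis": (31.77, 25.51)},
    "Mongo":      {"native": (55.06, 119.51), "redis": (40.88, 25.14)},
}


def redis_vs_native(backend):
    """Return (latency change %, throughput change %) of Redis relative to native."""
    lat_n, thr_n = MATRIX[backend]["native"]
    lat_r, thr_r = MATRIX[backend]["redis"]
    return (100 * (lat_r - lat_n) / lat_n, 100 * (thr_r - thr_n) / thr_n)


for name in MATRIX:
    lat, thr = redis_vs_native(name)
    print(f"{name}: first-completion {lat:+.1f}%, throughput {thr:+.1f}%")
```

On these numbers, Redis is worse than native Oracle on both rows by less than ten percent, leaves PostgreSQL throughput nearly flat, and cuts Mongo throughput by roughly 79 percent even though Mongo first-completion latency does not get worse.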

Default Production Choice Today

Use Oracle + Native.

Use it when:

  • Oracle is already the platform system of record
  • strongest validated correctness and restart behavior matter more than portability
  • AQ is available and operationally acceptable
  • timer precision and native transactional coupling are important

Why:

  • it has the strongest hostile-condition coverage
  • it remains the semantic reference implementation
  • it keeps one native durable stack for state, signals, and scheduling

Best Relational Non-Oracle Choice

Use Postgres + Native.

Use it when:

  • a relational backend is required
  • Oracle is not desired
  • you want the cleanest portability path
  • you want performance close to Oracle with simpler infrastructure

Why:

  • it is the strongest non-Oracle backend in the current relational comparison
  • native PostgreSQL wake is already competitive with Redis in the current measurements
  • it keeps one backend-native operational story

Highest Measured Synthetic Throughput Choice

Use Mongo + Native only when its operational assumptions are acceptable.

Use it when:

  • throughput and low wake latency matter strongly
  • Mongo replica-set transactions are already an accepted platform dependency
  • the team is comfortable operating change streams and Mongo-specific failure modes

Why:

  • it currently has the highest measured throughput of the six profiles (119.51 ops/s)
  • its native wake path avoids the large empty-wait overhang seen in the other measured paths

Do not treat this as the universal default.

Mongo is fast in the current engine workload, but its operational model is still less conservative than that of the relational profiles.

When Redis Should Be Used

Redis should be selected for operational topology reasons, not by default on the assumption that it is faster.

Good reasons to use Redis:

  • one shared wake substrate is required across multiple backend profiles
  • the deployment already standardizes on Redis for fan-out and worker wake infrastructure
  • you want the backend-native wake path disabled intentionally and replaced by one uniform wake mechanism

Weak reasons to use Redis:

  • "Redis is always faster"
  • "Redis should hold the durable signal queue"
  • "Redis should replace the backend transaction boundary"

Those are not valid design assumptions for this engine.

Profile-By-Profile Guidance

Oracle + Native

Use when:

  • Oracle is the chosen workflow backend
  • AQ is available
  • you want the strongest native transactional semantics

Do not switch away from it just to standardize on Redis.

Current measured result:

  • native Oracle is slightly better than Oracle plus Redis on both first-completion latency and throughput

Oracle + Redis

Use only when:

  • Oracle remains the durable backend
  • Redis is required as a uniform wake topology across the environment
  • the small performance loss is acceptable

Do not use it as the default Oracle profile.

Current measured result:

  • it works correctly
  • it is slower than native Oracle
  • it does not improve timer behavior today

Postgres + Native

Use as the first portability target when leaving Oracle.

Use when:

  • you want a relational durable store
  • you want the cleanest alternative to Oracle
  • you want the simplest operational story for PostgreSQL

This should be the default PostgreSQL profile.

Postgres + Redis

Use when:

  • PostgreSQL is the durable backend
  • a shared Redis wake topology is required
  • a nearly flat performance profile versus native PostgreSQL is acceptable

Do not assume it is a speed upgrade.

Current measured result:

  • it is very close to native PostgreSQL
  • it is not a compelling performance win on its own

Mongo + Native

Use when:

  • MongoDB is an accepted transactional system of record for workflow runtime state
  • replica-set transactions are available
  • the team accepts Mongo operational ownership

This should be the default Mongo profile.

Mongo + Redis

Avoid as the normal Mongo profile.

Use only when:

  • Mongo must remain the durable backend
  • Redis wake standardization is mandatory for the deployment
  • the team accepts materially worse measured throughput and idle-drain behavior than native Mongo

Current measured result:

  • native Mongo is much better overall
  • first-completion latency stays acceptable, but steady throughput and idle-drain behavior become much worse
  • Redis removes the main measured advantage of the native Mongo wake path

Timer And Delayed-Signal Guidance

Timers remain durable in the selected backend.

That means:

  • Oracle timers remain durable in AQ
  • PostgreSQL timers remain durable in PostgreSQL tables
  • Mongo timers remain durable in Mongo collections

Redis does not become the timer authority.

Current practical rule:

  • if timer behavior is a primary concern, prefer the native signal driver for the selected backend

Reason:

  • Redis wake currently optimizes wake notification, not durable due-time ownership
  • delayed messages still live in the backend store
  • due-time wake precision in Redis mode is still bounded by the driver wait policy rather than a separate Redis-native timer authority
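The last point can be illustrated with a deliberately simplified model. It assumes, purely for illustration, that a worker with no incoming wake hints re-checks the durable store for due signals only when each blocking wait ends, and that waits start at time zero and run back to back. Under that assumption a timer is noticed at the first check at or after its due time, so its lateness is always smaller than the wait length. The `observed_lateness` function is hypothetical and does not describe the engine's actual wait loop.

```python
import math


def observed_lateness(due_at, blocking_wait_seconds):
    """Seconds between a timer's due time and the first check that can see it.

    Simplified model (assumption): checks happen only at multiples of the
    blocking wait, and no wake hint arrives in between.
    """
    next_check = math.ceil(due_at / blocking_wait_seconds) * blocking_wait_seconds
    return next_check - due_at


if __name__ == "__main__":
    # 5 matches the BlockingWaitSeconds shown for the Redis driver in this document.
    for due in (10.0, 12.0, 14.5):
        print(due, observed_lateness(due, 5))
```

In this model a shorter wait tightens the bound at the cost of more empty receives, which is the trade-off the wait policy controls. The durable due time itself never moves out of the backend store.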

What Must Not Be Mixed

Do not mix durable responsibilities across systems.

Bad combinations:

  • Oracle runtime state with PostgreSQL signals
  • PostgreSQL runtime state with Redis as the durable signal queue
  • Mongo runtime state with Oracle scheduling
  • one backend for runtime state and another backend for projections

Use one backend profile per deployment.

The only supported cross-system split is:

  • durable backend
  • optional Redis wake driver

Operational Decision Matrix

| Goal | Recommended profile |
| --- | --- |
| strongest production default today | Oracle + Native |
| best non-Oracle relational target | Postgres + Native |
| one uniform wake substrate across relational backends | Postgres + Redis |
| highest measured synthetic wake and throughput | Mongo + Native |
| Mongo with forced Redis standardization | Mongo + Redis, only if policy requires it |
| Oracle with forced Redis standardization | Oracle + Redis, only if policy requires it |

Configuration Surface

Oracle + Native

{
  "WorkflowBackend": {
    "Provider": "Oracle"
  },
  "WorkflowSignalDriver": {
    "Provider": "Native"
  },
  "WorkflowAq": {
    "QueueOwner": "SRD_WFKLW",
    "SignalQueueName": "WF_SIGNAL_Q",
    "ScheduleQueueName": "WF_SCHEDULE_Q",
    "DeadLetterQueueName": "WF_DLQ_Q"
  }
}

Oracle + Redis

{
  "WorkflowBackend": {
    "Provider": "Oracle"
  },
  "WorkflowSignalDriver": {
    "Provider": "Redis",
    "Redis": {
      "ChannelName": "serdica:workflow:signals",
      "BlockingWaitSeconds": 5
    }
  },
  "WorkflowAq": {
    "QueueOwner": "SRD_WFKLW",
    "SignalQueueName": "WF_SIGNAL_Q",
    "ScheduleQueueName": "WF_SCHEDULE_Q",
    "DeadLetterQueueName": "WF_DLQ_Q"
  }
}

Postgres + Native

{
  "WorkflowBackend": {
    "Provider": "Postgres",
    "Postgres": {
      "ConnectionStringName": "WorkflowPostgres",
      "SchemaName": "srd_wfklw",
      "ClaimBatchSize": 32,
      "BlockingWaitSeconds": 30
    }
  },
  "WorkflowSignalDriver": {
    "Provider": "Native"
  }
}

Postgres + Redis

{
  "WorkflowBackend": {
    "Provider": "Postgres",
    "Postgres": {
      "ConnectionStringName": "WorkflowPostgres",
      "SchemaName": "srd_wfklw"
    }
  },
  "WorkflowSignalDriver": {
    "Provider": "Redis",
    "Redis": {
      "ChannelName": "serdica:workflow:signals",
      "BlockingWaitSeconds": 5
    }
  }
}

Mongo + Native

{
  "WorkflowBackend": {
    "Provider": "Mongo",
    "Mongo": {
      "ConnectionStringName": "WorkflowMongo",
      "DatabaseName": "serdica_workflow_store",
      "BlockingWaitSeconds": 30
    }
  },
  "WorkflowSignalDriver": {
    "Provider": "Native"
  }
}

Mongo + Redis

{
  "WorkflowBackend": {
    "Provider": "Mongo",
    "Mongo": {
      "ConnectionStringName": "WorkflowMongo",
      "DatabaseName": "serdica_workflow_store"
    }
  },
  "WorkflowSignalDriver": {
    "Provider": "Redis",
    "Redis": {
      "ChannelName": "serdica:workflow:signals",
      "BlockingWaitSeconds": 5
    }
  }
}

Plugin Registration Rule

The host stays backend-neutral.

That means the selected backend and optional Redis wake plugin must be present in PluginsConfig:PluginsOrder.

Relevant plugin categories are:

  • Oracle backend plugin
  • PostgreSQL backend plugin
  • MongoDB backend plugin
  • Redis wake-driver plugin

If Redis is not configured, do not register it just because it exists.

When choosing a deployment profile, use this order:

  1. choose the durable backend based on correctness and platform ownership
  2. choose the native signal driver first
  3. add Redis only if there is a clear topology or operational reason
  4. validate the choice against the six-profile matrix, not against assumptions
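The first three steps of that order can be expressed as a small helper that starts from the native driver and switches to Redis only when a topology reason is stated explicitly. This is a sketch under assumptions: `choose_profile` and its `redis_topology_reason` parameter are hypothetical, and the returned fragment contains only the two provider keys named in this document, not a complete configuration. Step 4, validation against the matrix, remains a manual review.

```python
BACKENDS = {"Oracle", "Postgres", "Mongo"}


def choose_profile(backend, redis_topology_reason=None):
    """Build the provider fragment for a deployment profile.

    backend: chosen for correctness and platform ownership (step 1).
    redis_topology_reason: a non-empty explanation is required to leave
    the native driver (steps 2 and 3); hypothetical parameter.
    """
    if backend not in BACKENDS:
        raise ValueError(f"unsupported backend: {backend!r}")
    driver = "Redis" if redis_topology_reason else "Native"
    return {
        "WorkflowBackend": {"Provider": backend},
        "WorkflowSignalDriver": {"Provider": driver},
    }


if __name__ == "__main__":
    print(choose_profile("Postgres"))
    print(choose_profile("Oracle", redis_topology_reason="shared wake substrate"))
```

Requiring a written reason rather than a boolean is a design choice for this sketch: it keeps the native driver as the default and makes the Redis decision reviewable.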

Current Bottom Line

Today the practical recommendation is:

  • Oracle + Native for the strongest default production backend
  • Postgres + Native for the best relational portability target
  • Mongo + Native only when Mongo operational assumptions are explicitly accepted
  • Redis as an optional wake standardization layer, not as the default performance answer