PostgreSQL Performance Baseline 2026-03-17

Purpose

This document captures the current PostgreSQL-backed load and performance baseline for the Serdica workflow engine. It is the reference point for later MongoDB backend comparisons and the final three-backend decision pack.

The durable machine-readable companion is 11-postgres-performance-baseline-2026-03-17.json.

Run Metadata

  • Date: 2026-03-17
  • Test command:
    • integration performance suite filtered to PostgresPerformance
  • Suite result:
    • 11/11 tests passed
    • total wall-clock time: 2 m 16 s
  • Raw artifact directory:
    • TestResults/workflow-performance/
  • PostgreSQL environment:
    • Docker image: postgres:16-alpine
    • database: workflow
    • version: PostgreSQL 16.13
    • backend: durable queue tables plus LISTEN/NOTIFY wake hints
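The per-scenario backend counters reported below (xact_commit, blks_hit, tup_inserted, and so on) are standard cumulative columns of PostgreSQL's pg_stat_database view, so a scenario's cost is the delta between a snapshot taken before the run and one taken after it. A minimal sketch of that delta step, assuming snapshots land in plain dicts (the query string and helper below are illustrative, not the suite's actual harness):

```python
# Illustrative only: not the suite's real collection harness.
# pg_stat_database counters are cumulative, so a scenario's cost is the
# field-by-field difference between a "before" and an "after" snapshot.

SNAPSHOT_SQL = """
SELECT xact_commit, xact_rollback, blks_hit, blks_read,
       tup_inserted, tup_updated, tup_deleted
FROM pg_stat_database
WHERE datname = 'workflow';
"""

def stat_delta(before: dict, after: dict) -> dict:
    """Subtract two cumulative pg_stat_database snapshots field by field."""
    return {key: after[key] - before[key] for key in before}

# Made-up snapshot values, chosen so the delta reproduces the c1 ladder row:
before = {"xact_commit": 1000, "blks_hit": 5000, "tup_inserted": 200}
after = {"xact_commit": 1251, "blks_hit": 6654, "tup_inserted": 248}
print(stat_delta(before, after))
# {'xact_commit': 251, 'blks_hit': 1654, 'tup_inserted': 48}
```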

Scenario Summary

| Scenario | Tier | Ops | Conc | Duration ms | Throughput/s | Avg ms | P95 ms | Max ms |
|---|---|---|---|---|---|---|---|---|
| postgres-signal-roundtrip-capacity-c1 | WorkflowPerfCapacity | 16 | 1 | 3895.54 | 4.11 | 3738.08 | 3762.51 | 3771.10 |
| postgres-signal-roundtrip-capacity-c4 | WorkflowPerfCapacity | 64 | 4 | 3700.99 | 17.29 | 3577.49 | 3583.70 | 3584.43 |
| postgres-signal-roundtrip-capacity-c8 | WorkflowPerfCapacity | 128 | 8 | 3853.89 | 33.21 | 3713.31 | 3718.66 | 3719.34 |
| postgres-signal-roundtrip-capacity-c16 | WorkflowPerfCapacity | 256 | 16 | 4488.07 | 57.04 | 4251.48 | 4287.87 | 4294.09 |
| postgres-signal-roundtrip-latency-serial | WorkflowPerfLatency | 16 | 1 | 49290.47 | 0.32 | 3079.33 | 3094.94 | 3101.71 |
| postgres-bulstrad-quotation-confirm-convert-to-policy-nightly | WorkflowPerfNightly | 12 | 4 | 3598.64 | 3.33 | 3478.52 | 3500.76 | 3503.73 |
| postgres-delayed-burst-nightly | WorkflowPerfNightly | 48 | 1 | 2449.25 | 19.60 | 2096.34 | 2152.50 | 2157.39 |
| postgres-immediate-burst-nightly | WorkflowPerfNightly | 120 | 1 | 1711.87 | 70.10 | 849.78 | 1012.13 | 1030.98 |
| postgres-synthetic-external-resume-nightly | WorkflowPerfNightly | 36 | 8 | 4162.56 | 8.65 | 4026.50 | 4048.09 | 4049.91 |
| postgres-bulstrad-quote-or-apl-cancel-smoke | WorkflowPerfSmoke | 10 | 4 | 166.99 | 59.88 | 13.51 | 23.87 | 26.35 |
| postgres-delayed-burst-smoke | WorkflowPerfSmoke | 12 | 1 | 2146.89 | 5.59 | 2032.67 | 2050.20 | 2051.30 |
| postgres-immediate-burst-smoke | WorkflowPerfSmoke | 24 | 1 | 341.84 | 70.21 | 176.19 | 197.25 | 197.91 |
| postgres-signal-roundtrip-soak | WorkflowPerfSoak | 108 | 8 | 25121.68 | 4.30 | 4164.52 | 4208.42 | 4209.96 |
| postgres-signal-roundtrip-throughput-parallel | WorkflowPerfThroughput | 96 | 16 | 3729.17 | 25.74 | 3603.54 | 3635.59 | 3649.96 |

Measurement Split

The synthetic signal round-trip workload is measured in three separate ways:

  • postgres-signal-roundtrip-latency-serial: one workflow at a time, one signal worker, used as the single-instance latency baseline.
  • postgres-signal-roundtrip-throughput-parallel: 96 workflows, 16-way workload concurrency, 8 signal workers, used as the steady-state throughput baseline.
  • postgres-signal-roundtrip-capacity-c*: batch-wave capacity ladder used to observe scaling and pressure points.

The useful PostgreSQL baseline is:

  • serial latency baseline: 3079.33 ms average end-to-end per workflow
  • steady throughput baseline: 25.74 ops/s with 16 workload concurrency and 8 signal workers
  • capacity c1: 4.11 ops/s; this is only the smallest batch-wave rung
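These headline numbers are internally consistent with the scenario summary table: the reported throughput is simply operation count divided by wall-clock duration. A quick check against the serial and parallel rows:

```python
# Sanity-check: throughput/s == ops / (duration_ms / 1000),
# using rows from the scenario summary table above.
def throughput(ops: int, duration_ms: float) -> float:
    return ops / (duration_ms / 1000.0)

# postgres-signal-roundtrip-latency-serial: 16 ops in 49290.47 ms
serial = throughput(16, 49290.47)      # ~0.32 ops/s

# postgres-signal-roundtrip-throughput-parallel: 96 ops in 3729.17 ms
parallel = throughput(96, 3729.17)     # ~25.74 ops/s

print(round(serial, 2), round(parallel, 2))
# 0.32 25.74
```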

Serial Latency Baseline

| Phase | Avg ms | P95 ms | Max ms |
|---|---|---|---|
| start | 6.12 | 9.29 | 11.26 |
| signalPublish | 5.63 | 6.82 | 7.53 |
| signalToCompletion | 3073.20 | 3086.59 | 3090.44 |

Interpretation:

  • almost all serial latency (roughly 99.8% of the 3079.33 ms end-to-end average) is in signalToCompletion
  • workflow start is very cheap on this backend
  • external signal publication is also cheap

Steady Throughput Baseline

| Phase | Avg ms | P95 ms | Max ms |
|---|---|---|---|
| start | 16.21 | 40.31 | 47.02 |
| signalPublish | 18.11 | 23.62 | 28.41 |
| signalToCompletion | 3504.24 | 3530.38 | 3531.14 |

Interpretation:

  • the engine sustained 25.74 ops/s in a 96-operation wave
  • end-to-end average stayed at 3603.54 ms
  • start and signal publication remained small compared to the resume path

PostgreSQL Observations

Dominant Waits

  • Client:ClientRead was the top observed wait class in 13/14 scenario artifacts.
  • The serial latency scenario had no distinct competing wait class because the measurement ran with effectively no backend concurrency.
  • On this local PostgreSQL profile the wake-up path is not the visible bottleneck; the dominant observed state is clients waiting on the next command while the engine completes work in short transactions.
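Wait labels such as Client:ClientRead combine the wait_event_type and wait_event columns of PostgreSQL's pg_stat_activity view. A small sketch of how periodic samples of that view could be folded into a ranked wait summary (the query string and fold are illustrative; this is not the suite's actual sampler):

```python
from collections import Counter

# Illustrative only: labels like "Client:ClientRead" come from joining
# wait_event_type and wait_event in sampled pg_stat_activity rows.
SAMPLE_SQL = """
SELECT wait_event_type, wait_event
FROM pg_stat_activity
WHERE datname = 'workflow' AND wait_event IS NOT NULL;
"""

def top_waits(samples: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """Fold sampled (type, event) pairs into ranked 'Type:Event' labels."""
    return Counter(f"{t}:{e}" for t, e in samples).most_common()

# Hypothetical sample set dominated by clients idle between commands:
samples = [("Client", "ClientRead")] * 5 + [("LWLock", "WALWrite")]
print(top_waits(samples))
# [('Client:ClientRead', 5), ('LWLock:WALWrite', 1)]
```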

Capacity Ladder

| Scenario | Throughput/s | P95 ms | Xact Commits | Buffer Hits | Buffer Reads | Tuples Inserted | Tuples Updated | Tuples Deleted | Top Wait |
|---|---|---|---|---|---|---|---|---|---|
| c1 | 4.11 | 3762.51 | 251 | 1654 | 24 | 48 | 48 | 16 | Client:ClientRead |
| c4 | 17.29 | 3583.70 | 1080 | 7084 | 1 | 192 | 192 | 64 | Client:ClientRead |
| c8 | 33.21 | 3718.66 | 2348 | 17069 | 0 | 384 | 384 | 128 | Client:ClientRead |
| c16 | 57.04 | 4287.87 | 4536 | 40443 | 0 | 768 | 768 | 256 | Client:ClientRead |

Interpretation:

  • the capacity ladder scales more smoothly than the Oracle baseline on the same local machine
  • c16 is the highest-throughput rung tested and does not yet show a hard cliff, although its P95 latency is the highest on the ladder
  • the next meaningful PostgreSQL characterization step should test above c16 before declaring a saturation boundary
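One useful way to read the ladder is to normalize the backend counters per operation. Row movement is exactly linear (3 inserts, 3 updates, and 1 delete per operation at every rung), while commits per operation stay in a narrow band, which is consistent with the smooth scaling described above. A small check against the ladder table:

```python
# Normalize the capacity-ladder counters per operation to check scaling shape.
# Values are copied from the capacity ladder table.
rungs = {
    "c1":  {"ops": 16,  "commits": 251,  "ins": 48,  "upd": 48,  "dels": 16},
    "c4":  {"ops": 64,  "commits": 1080, "ins": 192, "upd": 192, "dels": 64},
    "c8":  {"ops": 128, "commits": 2348, "ins": 384, "upd": 384, "dels": 128},
    "c16": {"ops": 256, "commits": 4536, "ins": 768, "upd": 768, "dels": 256},
}

for name, r in rungs.items():
    per_op = {k: r[k] / r["ops"] for k in ("commits", "ins", "upd", "dels")}
    print(name, per_op)
# Row movement is constant (3 inserts, 3 updates, 1 delete per op);
# commits per op stay roughly between 15.7 and 18.4 across the ladder.
```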

Transport Baselines

| Scenario | Throughput/s | Xact Commits | Buffer Hits | Buffer Reads | Tuples Inserted | Tuples Updated | Top Wait |
|---|---|---|---|---|---|---|---|
| postgres-immediate-burst-nightly | 70.10 | 801 | 13207 | 4 | 570 | 162 | Client:ClientRead |
| postgres-delayed-burst-nightly | 19.60 | 269 | 11472 | 3 | 498 | 33 | Client:ClientRead |

Interpretation:

  • immediate transport remains much cheaper than full workflow resume
  • delayed transport is still dominated by the intentional delay window, not by raw dequeue speed
  • the very short smoke transport runs are useful for end-to-end timing, but they are too brief to rely on as the primary PostgreSQL stat sample

Business Flow Baselines

| Scenario | Throughput/s | Avg ms | Xact Commits | Buffer Hits | Buffer Reads | Tuples Inserted | Tuples Updated | Top Wait |
|---|---|---|---|---|---|---|---|---|
| postgres-bulstrad-quote-or-apl-cancel-smoke | 59.88 | 13.51 | 3 | 93 | 0 | 0 | 0 | Client:ClientRead |
| postgres-bulstrad-quotation-confirm-convert-to-policy-nightly | 3.33 | 3478.52 | 236 | 12028 | 270 | 546 | 75 | Client:ClientRead |

Interpretation:

  • the short Bulstrad flow is still mostly transport and orchestration overhead
  • the heavier QuotationConfirm -> ConvertToPolicy flow is a better real-workload pressure baseline because it exercises deeper projection and signal traffic

Soak Baseline

postgres-signal-roundtrip-soak completed 108 operations at concurrency 8 with:

  • throughput: 4.30 ops/s
  • average latency: 4164.52 ms
  • P95 latency: 4208.42 ms
  • 0 failures
  • 0 dead-lettered signals
  • 0 runtime conflicts
  • 0 stuck instances

PostgreSQL metrics for the soak run:

  • xact_commit: 3313
  • xact_rollback: 352
  • blks_hit: 26548
  • blks_read: 269
  • tup_inserted: 774
  • tup_updated: 339
  • tup_deleted: 108
  • top wait: Client:ClientRead
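A derived number worth carrying into the backend comparisons is the buffer cache hit ratio. Computed from the soak run's blks_hit and blks_read above:

```python
# Buffer cache hit ratio for the soak run: blks_hit / (blks_hit + blks_read),
# using the pg_stat_database counters reported above.
blks_hit, blks_read = 26548, 269
hit_ratio = blks_hit / (blks_hit + blks_read)
print(f"{hit_ratio:.1%}")
# 99.0% -- the soak working set essentially fits in shared buffers
```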

What Must Stay Constant For Future Backend Comparisons

When MongoDB is benchmarked and the final Oracle/PostgreSQL/MongoDB comparison is produced, keep these constant:

  • same scenario names
  • same operation counts
  • same concurrency levels
  • same worker counts for signal drain
  • same synthetic workflow definitions
  • same Bulstrad workflow families
  • same correctness assertions

Compare these dimensions directly:

  • throughput per second
  • latency average, P95, P99, and max
  • phase latency summaries for start, signal publish, and signal-to-completion on the synthetic signal round-trip workload
  • failures, dead letters, runtime conflicts, and stuck instances
  • commit count analogs
  • row, tuple, or document movement analogs
  • read-hit or read-amplification analogs
  • dominant waits, locks, or wake-path contention classes
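The three-backend pack is easiest to assemble if every backend emits these dimensions under the same keys. A minimal sketch of one comparison row with hypothetical field names (the real artifact schema lives in the JSON companion, not here):

```python
# Hypothetical comparison-row shape for the Oracle/PostgreSQL/MongoDB pack;
# the field names are illustrative, not the JSON companion's schema.
def comparison_row(backend: str, scenario: str, metrics: dict) -> dict:
    """Validate that a backend reports every required dimension, then tag it."""
    required = {"throughput_per_s", "avg_ms", "p95_ms", "max_ms", "failures"}
    missing = required - metrics.keys()
    if missing:
        raise ValueError(f"{backend}/{scenario} missing dimensions: {missing}")
    return {"backend": backend, "scenario": scenario, **metrics}

# Example using the PostgreSQL steady-throughput baseline from this document:
row = comparison_row(
    "postgresql",
    "signal-roundtrip-throughput-parallel",
    {"throughput_per_s": 25.74, "avg_ms": 3603.54,
     "p95_ms": 3635.59, "max_ms": 3649.96, "failures": 0},
)
print(row["backend"], row["throughput_per_s"])
# postgresql 25.74
```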

First Sizing Note

On this local PostgreSQL baseline:

  • immediate queue burst handling is comfortably above the small workflow tiers; the current nightly transport baseline is 70.10 ops/s
  • the separated steady throughput baseline is 25.74 ops/s, ahead of the current Oracle baseline on the same synthetic workflow profile
  • the ladder through c16 still looks healthy and does not yet expose a sharp pressure rung
  • the dominant observed backend state is client read waiting, which suggests the next tuning conversation should focus on queue claim cadence, notification wake-ups, and transaction shape rather than on an obvious storage stall

This is a baseline, not a production commitment. MongoDB should now reuse the same scenarios and produce the same summary tables before any backend recommendation is declared.