git.stella-ops.org/docs/workflow/engine/12-mongo-performance-baseline-2026-03-17.md

# MongoDB Performance Baseline 2026-03-17

## Purpose

This document captures the current MongoDB-backed load and performance baseline for the Serdica workflow engine. It completes the per-backend baseline set that will feed the final three-backend comparison.

The durable machine-readable companion is `12-mongo-performance-baseline-2026-03-17.json`.

## Run Metadata

- Date: 2026-03-17
- Test command: integration performance suite filtered to `MongoPerformance`
- Suite result: 14/14 tests passed; total wall-clock time: 48 s
- Raw artifact directory: `TestResults/workflow-performance/`
- MongoDB environment:
  - Docker image: `mongo:7.0`
  - topology: single-node replica set
  - version: 7.0.30
  - backend: durable collections plus change-stream wake hints
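Change streams and multi-document transactions require a replica set, which is presumably why the baseline runs `mongo:7.0` as a one-member replica set rather than a standalone `mongod`. A minimal local equivalent (container name, port, and replica-set name here are illustrative, not taken from the test harness) would look like:

```shell
# Run mongo:7.0 as a single-node replica set (name/port are illustrative).
docker run -d --name mongo-perf -p 27017:27017 mongo:7.0 --replSet rs0

# Initiate the one-member replica set; change streams and transactions
# are unavailable on a standalone mongod.
docker exec mongo-perf mongosh --eval 'rs.initiate()'
```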

## Scenario Summary

| Scenario | Tier | Ops | Conc | Duration (ms) | Throughput (ops/s) | Avg (ms) | P95 (ms) | Max (ms) |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| mongo-signal-roundtrip-capacity-c1 | WorkflowPerfCapacity | 16 | 1 | 2259.99 | 7.08 | 1394.99 | 1576.55 | 2063.72 |
| mongo-signal-roundtrip-capacity-c4 | WorkflowPerfCapacity | 64 | 4 | 1668.99 | 38.35 | 1244.81 | 1472.61 | 1527.26 |
| mongo-signal-roundtrip-capacity-c8 | WorkflowPerfCapacity | 128 | 8 | 1938.12 | 66.04 | 1477.49 | 1743.52 | 1757.88 |
| mongo-signal-roundtrip-capacity-c16 | WorkflowPerfCapacity | 256 | 16 | 3728.88 | 68.65 | 3203.94 | 3507.95 | 3527.96 |
| mongo-signal-roundtrip-latency-serial | WorkflowPerfLatency | 16 | 1 | 1675.77 | 9.55 | 97.88 | 149.20 | 324.02 |
| mongo-bulstrad-quotation-confirm-convert-to-policy-nightly | WorkflowPerfNightly | 12 | 4 | 1108.42 | 10.83 | 790.30 | 947.21 | 963.16 |
| mongo-delayed-burst-nightly | WorkflowPerfNightly | 48 | 1 | 2881.66 | 16.66 | 2142.14 | 2265.15 | 2281.04 |
| mongo-immediate-burst-nightly | WorkflowPerfNightly | 120 | 1 | 2598.57 | 46.18 | 1148.06 | 1530.49 | 1575.98 |
| mongo-synthetic-external-resume-nightly | WorkflowPerfNightly | 36 | 8 | 976.73 | 36.86 | 633.82 | 770.10 | 772.71 |
| mongo-bulstrad-quote-or-apl-cancel-smoke | WorkflowPerfSmoke | 10 | 4 | 425.81 | 23.48 | 124.35 | 294.76 | 295.32 |
| mongo-delayed-burst-smoke | WorkflowPerfSmoke | 12 | 1 | 2416.23 | 4.97 | 2040.30 | 2079.79 | 2084.03 |
| mongo-immediate-burst-smoke | WorkflowPerfSmoke | 24 | 1 | 747.36 | 32.11 | 264.14 | 339.42 | 400.99 |
| mongo-signal-roundtrip-soak | WorkflowPerfSoak | 108 | 8 | 2267.91 | 47.62 | 322.40 | 550.50 | 572.73 |
| mongo-signal-roundtrip-throughput-parallel | WorkflowPerfThroughput | 96 | 16 | 1258.48 | 76.28 | 1110.94 | 1121.22 | 1127.11 |
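The throughput column is derivable from the ops and duration columns, which makes the table easy to sanity-check. A quick cross-check over a few representative rows (scenario names shortened for readability):

```python
# Sanity-check the scenario table: throughput should equal ops / wall-clock seconds.
rows = [
    # (scenario, ops, duration_ms, reported_throughput)
    ("capacity-c1", 16, 2259.99, 7.08),
    ("capacity-c16", 256, 3728.88, 68.65),
    ("throughput-parallel", 96, 1258.48, 76.28),
]

for name, ops, duration_ms, reported in rows:
    derived = ops / (duration_ms / 1000.0)
    # Allow small rounding differences from the harness output.
    assert abs(derived - reported) < 0.05, (name, derived, reported)
    print(f"{name}: {derived:.2f} ops/s (reported {reported})")
```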

## Measurement Split

The synthetic signal round-trip workload is measured in three separate ways:

- `mongo-signal-roundtrip-latency-serial`: one workflow at a time, one signal worker, used as the single-instance latency baseline.
- `mongo-signal-roundtrip-throughput-parallel`: 96 workflows, 16-way workload concurrency, 8 signal workers, used as the steady-state throughput baseline.
- `mongo-signal-roundtrip-capacity-c*`: batch-wave capacity ladder used to observe scaling and pressure points.

The headline MongoDB baseline figures are:

- serial latency baseline: 97.88 ms average end-to-end per workflow
- steady throughput baseline: 76.28 ops/s with 16 workload concurrency and 8 signal workers
- capacity c1: 7.08 ops/s, but this is only the smallest batch-wave rung, not a latency figure

## Serial Latency Baseline

| Phase | Avg (ms) | P95 (ms) | Max (ms) |
| --- | ---: | ---: | ---: |
| start | 26.34 | 79.35 | 251.36 |
| signalPublish | 8.17 | 10.75 | 12.17 |
| signalToCompletion | 71.54 | 77.94 | 79.48 |

Interpretation:

- serial end-to-end latency is far lower than the Oracle and PostgreSQL baselines on this local setup
- most of the work remains in signal-to-completion, but the absolute time is much smaller
- workflow start is still the most variable of the three measured phases
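The "most of the work" claim can be quantified from the phase averages. Note the phases sum to about 106 ms while the reported end-to-end average is 97.88 ms, so the phases evidently overlap slightly; the shares below are therefore indicative rather than exact:

```python
# Phase averages from the serial latency baseline (ms).
phases = {"start": 26.34, "signalPublish": 8.17, "signalToCompletion": 71.54}
total = sum(phases.values())

for name, avg in phases.items():
    # signalToCompletion works out to roughly two thirds of summed phase time.
    print(f"{name}: {avg / total:.0%} of summed phase time")
```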

## Steady Throughput Baseline

| Phase | Avg (ms) | P95 (ms) | Max (ms) |
| --- | ---: | ---: | ---: |
| start | 20.88 | 28.64 | 33.67 |
| signalPublish | 16.01 | 20.90 | 22.71 |
| signalToCompletion | 988.88 | 1000.12 | 1004.92 |

Interpretation:

- the engine sustained 76.28 ops/s in a 96-operation wave
- end-to-end average stayed at 1110.94 ms
- the dominant cost is still resume processing, but Mongo remains materially faster on this synthetic profile
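As a rough sanity check (our inference, not a harness output), Little's law relates these two numbers: average in-flight operations equal throughput times average latency. The result lands near the 96-operation wave size, consistent with the whole wave being in flight at once rather than a rate-limited stream of 16:

```python
# Little's law: average in-flight operations = throughput * average latency.
throughput_ops_s = 76.28
avg_latency_s = 1110.94 / 1000.0

in_flight = throughput_ops_s * avg_latency_s
print(f"average in-flight operations ~ {in_flight:.1f}")  # close to the 96-op wave
```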

## MongoDB Observations

### Dominant Waits

- no durable current-op contention class dominated these runs; every scenario finished without a stable top wait entry
- this means the current Mongo baseline should be read primarily through normalized workflow metrics and the Mongo-specific counter set, not through a wait-event headline
- the backend bug exposed by the first perf pass was one of correctness, not storage contention:
  - empty-queue receive had to become bounded
  - collection bootstrap had to be explicit before transactional concurrency
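The bounded empty-queue receive fix can be illustrated with a backend-agnostic sketch. Instead of blocking indefinitely when no signal is queued, the receive loop polls against a deadline and returns `None` once the budget is spent (`poll_once` and the timeout values here are hypothetical, not the engine's actual API):

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def bounded_receive(poll_once: Callable[[], Optional[T]],
                    timeout_s: float,
                    poll_interval_s: float = 0.05) -> Optional[T]:
    """Poll for a queued item, but give up once timeout_s has elapsed.

    An unbounded receive on an empty queue was the correctness bug class
    noted above: a consumer could hang forever waiting for work.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        item = poll_once()
        if item is not None:
            return item
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None  # bounded: an empty queue yields None instead of hanging
        time.sleep(min(poll_interval_s, remaining))
```

Usage: `bounded_receive(queue_poll, timeout_s=1.0)` returns a signal if one arrives within the budget, otherwise `None`, letting the caller decide whether to retry or back off.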

### Capacity Ladder

| Scenario | Throughput (ops/s) | P95 (ms) | Commands | Inserts | Updates | Deletes | Docs Returned | Docs Inserted | Docs Updated | Docs Deleted |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| c1 | 7.08 | 1576.55 | 183 | 48 | 48 | 16 | 80 | 48 | 48 | 16 |
| c4 | 38.35 | 1472.61 | 684 | 192 | 192 | 64 | 320 | 192 | 192 | 64 |
| c8 | 66.04 | 1743.52 | 1349 | 384 | 384 | 128 | 640 | 384 | 384 | 128 |
| c16 | 68.65 | 3507.95 | 2515 | 768 | 768 | 256 | 1280 | 768 | 768 | 256 |
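Dividing each counter column by the operation count shows the per-operation storage cost is identical on every rung, which suggests (our inference) that the c16 latency growth is queueing pressure rather than extra storage work:

```python
# Per-operation document movement across the capacity ladder.
# Columns: ops, inserts, updates, deletes, docs returned.
ladder = {
    "c1": (16, 48, 48, 16, 80),
    "c4": (64, 192, 192, 64, 320),
    "c8": (128, 384, 384, 128, 640),
    "c16": (256, 768, 768, 256, 1280),
}

for rung, (ops, ins, upd, dele, ret) in ladder.items():
    # Every rung works out to exactly 3 inserts, 3 updates, 1 delete,
    # and 5 documents returned per operation.
    assert (ins / ops, upd / ops, dele / ops, ret / ops) == (3, 3, 1, 5)
    print(f"{rung}: 3 ins, 3 upd, 1 del, 5 returned per op")
```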

Interpretation:

- Mongo scales strongly through c8, from 7.08 ops/s at c1 to 66.04 ops/s at c8
- c16 is still the fastest rung at 68.65 ops/s, but it is also where latency expands sharply relative to c8 (P95 3507.95 ms vs 1743.52 ms)
- the first visible pressure point is therefore c16, even though throughput still rises slightly

## Transport Baselines

| Scenario | Throughput (ops/s) | Commands | Inserts | Deletes | Network In (bytes) | Network Out (bytes) |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| mongo-immediate-burst-nightly | 46.18 | 379 | 120 | 120 | 277307 | 296277 |
| mongo-delayed-burst-nightly | 16.66 | 1052 | 48 | 48 | 507607 | 450004 |
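Normalizing the network columns per operation makes the "chatter" claim in the interpretation concrete: the delayed path moves several times more inbound bytes per operation than the immediate path:

```python
# Network traffic per operation for the two transport baselines (bytes).
immediate = {"ops": 120, "net_in": 277307, "net_out": 296277}
delayed = {"ops": 48, "net_in": 507607, "net_out": 450004}

for name, row in (("immediate", immediate), ("delayed", delayed)):
    print(f"{name}: {row['net_in'] / row['ops']:.0f} B in/op, "
          f"{row['net_out'] / row['ops']:.0f} B out/op")
# The delayed wake path costs roughly 4-5x more inbound bytes per operation.
```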

Interpretation:

- immediate transport is still much cheaper than full workflow resume
- delayed transport carries more command and network chatter because the wake path repeatedly checks due work through the change-stream plus due-time model

## Business Flow Baselines

| Scenario | Throughput (ops/s) | Avg (ms) | Commands | Queries | Inserts | Updates | Deletes | Tx Started | Tx Committed |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| mongo-bulstrad-quote-or-apl-cancel-smoke | 23.48 | 124.35 | 54 | 45 | 20 | 0 | 0 | 10 | 10 |
| mongo-bulstrad-quotation-confirm-convert-to-policy-nightly | 10.83 | 790.30 | 189 | 151 | 96 | 48 | 12 | 36 | 36 |
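The transaction columns divide cleanly by the operation counts (10 and 12 workflows respectively), showing the heavier flow commits three transactions per workflow against one for the short cancel flow, with zero aborts in either case:

```python
# Committed transactions per completed workflow for the two Bulstrad flows.
flows = {
    "quote-or-apl-cancel-smoke": (10, 10),            # (ops, tx committed)
    "quotation-confirm-convert-to-policy-nightly": (12, 36),
}

for name, (ops, tx) in flows.items():
    print(f"{name}: {tx / ops:.0f} committed transaction(s) per workflow")
```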

Interpretation:

- the short Bulstrad flow is still cheap enough that transport and projection movement dominate
- the heavier QuotationConfirm -> ConvertToPolicy -> PdfGenerator path stays comfortably sub-second on this local Mongo baseline

## Soak Baseline

`mongo-signal-roundtrip-soak` completed 108 operations at concurrency 8 with:

- throughput: 47.62 ops/s
- average latency: 322.40 ms
- P95 latency: 550.50 ms
- 0 failures
- 0 dead-lettered signals
- 0 runtime conflicts
- 0 stuck instances

MongoDB metrics for the soak run:

- `opcounters.command`: 2264
- `opcounters.insert`: 324
- `opcounters.update`: 324
- `opcounters.delete`: 108
- `metrics.document.returned`: 540
- `metrics.document.inserted`: 324
- `metrics.document.updated`: 324
- `metrics.document.deleted`: 108
- `transactions.totalStarted`: 216
- `transactions.totalCommitted`: 216
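Normalizing the soak counters by the 108 operations reproduces the same per-operation mutation profile seen on the capacity ladder, plus exactly two transactions per operation with every started transaction committed:

```python
# Soak run: 108 operations; check the per-operation mutation profile.
ops = 108
soak = {"insert": 324, "update": 324, "delete": 108,
        "tx_started": 216, "tx_committed": 216}

per_op = {k: v / ops for k, v in soak.items()}
print(per_op)
# 3 inserts, 3 updates, 1 delete per op -- the same storage profile as the
# capacity ladder -- and 2 transactions per op, all committed.
assert per_op == {"insert": 3.0, "update": 3.0, "delete": 1.0,
                  "tx_started": 2.0, "tx_committed": 2.0}
```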

## What Must Stay Constant For Final Backend Comparison

When the final Oracle/PostgreSQL/MongoDB comparison is produced, keep these constant:

- same scenario names
- same operation counts
- same concurrency levels
- same worker counts for signal drain
- same synthetic workflow definitions
- same Bulstrad workflow families
- same correctness assertions

Compare these dimensions directly:

- throughput per second
- latency average, P95, P99, and max
- phase latency summaries for start, signal publish, and signal-to-completion on the synthetic signal round-trip workload
- failures, dead letters, runtime conflicts, and stuck instances
- commit, transaction, or mutation count analogs
- row, tuple, or document movement analogs
- read, network, and wake-path cost analogs
- dominant waits, locks, or contention classes when the backend exposes them clearly
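One way to hold these dimensions fixed across backends is to normalize every run into a shared record shape. The sketch below is illustrative only, not the project's actual schema; the field names mirror the dimensions listed above:

```python
# Illustrative (hypothetical) record shape for the three-backend comparison pack.
from dataclasses import dataclass

@dataclass(frozen=True)
class ScenarioResult:
    backend: str              # "oracle" | "postgresql" | "mongodb"
    scenario: str             # must match exactly across backends
    ops: int
    concurrency: int
    throughput_ops_s: float
    avg_ms: float
    p95_ms: float
    p99_ms: float             # not captured in this baseline; 0.0 as placeholder
    max_ms: float
    failures: int
    dead_letters: int
    runtime_conflicts: int
    stuck_instances: int

# The soak row from this document, expressed in that shape:
r = ScenarioResult("mongodb", "signal-roundtrip-soak", 108, 8,
                   47.62, 322.40, 550.50, 0.0, 572.73, 0, 0, 0, 0)
print(r.backend, r.throughput_ops_s)
```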

## First Sizing Note

On this local MongoDB baseline:

- Mongo is the fastest of the three backends on the synthetic signal round-trip workloads measured so far
- the biggest correctness findings came from backend behavior, not raw throughput:
  - bounded empty-queue receive
  - explicit collection bootstrap before transactional concurrency
- c8 is the last clearly comfortable capacity rung
- c16 is the first rung where latency growth becomes visible, even though throughput still increases slightly
This is a baseline, not a production commitment. The final recommendation still needs the explicit three-backend comparison pack using the same workloads and the same correctness rules.