# MongoDB Performance Baseline 2026-03-17
## Purpose

This document captures the current MongoDB-backed load and performance baseline for the Serdica workflow engine. It completes the per-backend baseline set that will feed the final three-backend comparison.

The durable, machine-readable companion is `12-mongo-performance-baseline-2026-03-17.json`.
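The companion file's exact schema is not shown in this document; a minimal sketch of loading it, assuming (hypothetically) a `scenarios` array whose fields mirror the column names of the Scenario Summary table below:

```python
import json

# Hypothetical excerpt of the JSON companion; the real field names in
# 12-mongo-performance-baseline-2026-03-17.json may differ.
sample = """
{
  "scenarios": [
    {"name": "mongo-signal-roundtrip-latency-serial",
     "tier": "WorkflowPerfLatency",
     "ops": 16, "concurrency": 1, "durationMs": 1675.77,
     "throughputPerSec": 9.55,
     "avgMs": 97.88, "p95Ms": 149.20, "maxMs": 324.02}
  ]
}
"""

data = json.loads(sample)
for s in data["scenarios"]:
    # One line per scenario: name, throughput, average latency.
    print(f'{s["name"]}: {s["throughputPerSec"]} ops/s, avg {s["avgMs"]} ms')
```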
## Run Metadata

- Date: 2026-03-17
- Test command: integration performance suite filtered to `MongoPerformance`
- Suite result: 14/14 tests passed
- Total wall-clock time: 48 s
- Raw artifact directory: `TestResults/workflow-performance/`
- MongoDB environment:
  - Docker image: `mongo:7.0`
  - Topology: single-node replica set
  - Version: 7.0.30
  - Backend: durable collections plus change-stream wake hints
## Scenario Summary

| Scenario | Tier | Ops | Conc | Duration ms | Throughput/s | Avg ms | P95 ms | Max ms |
|---|---|---|---|---|---|---|---|---|
| mongo-signal-roundtrip-capacity-c1 | WorkflowPerfCapacity | 16 | 1 | 2259.99 | 7.08 | 1394.99 | 1576.55 | 2063.72 |
| mongo-signal-roundtrip-capacity-c4 | WorkflowPerfCapacity | 64 | 4 | 1668.99 | 38.35 | 1244.81 | 1472.61 | 1527.26 |
| mongo-signal-roundtrip-capacity-c8 | WorkflowPerfCapacity | 128 | 8 | 1938.12 | 66.04 | 1477.49 | 1743.52 | 1757.88 |
| mongo-signal-roundtrip-capacity-c16 | WorkflowPerfCapacity | 256 | 16 | 3728.88 | 68.65 | 3203.94 | 3507.95 | 3527.96 |
| mongo-signal-roundtrip-latency-serial | WorkflowPerfLatency | 16 | 1 | 1675.77 | 9.55 | 97.88 | 149.20 | 324.02 |
| mongo-bulstrad-quotation-confirm-convert-to-policy-nightly | WorkflowPerfNightly | 12 | 4 | 1108.42 | 10.83 | 790.30 | 947.21 | 963.16 |
| mongo-delayed-burst-nightly | WorkflowPerfNightly | 48 | 1 | 2881.66 | 16.66 | 2142.14 | 2265.15 | 2281.04 |
| mongo-immediate-burst-nightly | WorkflowPerfNightly | 120 | 1 | 2598.57 | 46.18 | 1148.06 | 1530.49 | 1575.98 |
| mongo-synthetic-external-resume-nightly | WorkflowPerfNightly | 36 | 8 | 976.73 | 36.86 | 633.82 | 770.10 | 772.71 |
| mongo-bulstrad-quote-or-apl-cancel-smoke | WorkflowPerfSmoke | 10 | 4 | 425.81 | 23.48 | 124.35 | 294.76 | 295.32 |
| mongo-delayed-burst-smoke | WorkflowPerfSmoke | 12 | 1 | 2416.23 | 4.97 | 2040.30 | 2079.79 | 2084.03 |
| mongo-immediate-burst-smoke | WorkflowPerfSmoke | 24 | 1 | 747.36 | 32.11 | 264.14 | 339.42 | 400.99 |
| mongo-signal-roundtrip-soak | WorkflowPerfSoak | 108 | 8 | 2267.91 | 47.62 | 322.40 | 550.50 | 572.73 |
| mongo-signal-roundtrip-throughput-parallel | WorkflowPerfThroughput | 96 | 16 | 1258.48 | 76.28 | 1110.94 | 1121.22 | 1127.11 |
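As a sanity check, the Throughput/s column can be recomputed from the Ops and Duration columns (throughput = ops / duration in seconds). For the capacity rungs, using the figures from the table above:

```python
# name -> (ops, duration_ms, reported throughput/s), copied from the table.
rows = {
    "mongo-signal-roundtrip-capacity-c1":  (16, 2259.99, 7.08),
    "mongo-signal-roundtrip-capacity-c4":  (64, 1668.99, 38.35),
    "mongo-signal-roundtrip-capacity-c8":  (128, 1938.12, 66.04),
    "mongo-signal-roundtrip-capacity-c16": (256, 3728.88, 68.65),
}

for name, (ops, duration_ms, reported) in rows.items():
    derived = ops / (duration_ms / 1000.0)
    # Derived and reported throughput agree to two decimal places.
    assert abs(derived - reported) < 0.01, name
    print(f"{name}: {derived:.2f} ops/s")
```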
## Measurement Split

The synthetic signal round-trip workload is measured in three separate ways:

- `mongo-signal-roundtrip-latency-serial`: one workflow at a time, one signal worker, used as the single-instance latency baseline.
- `mongo-signal-roundtrip-throughput-parallel`: 96 workflows, 16-way workload concurrency, 8 signal workers, used as the steady-state throughput baseline.
- `mongo-signal-roundtrip-capacity-c*`: a batch-wave capacity ladder used to observe scaling and pressure points.

The useful MongoDB baselines are:

- Serial latency baseline: 97.88 ms average end-to-end per workflow
- Steady throughput baseline: 76.28 ops/s with 16 workload concurrency and 8 signal workers
- Capacity c1: 7.08 ops/s; this is only the smallest batch-wave rung
## Serial Latency Baseline

| Phase | Avg ms | P95 ms | Max ms |
|---|---|---|---|
| start | 26.34 | 79.35 | 251.36 |
| signalPublish | 8.17 | 10.75 | 12.17 |
| signalToCompletion | 71.54 | 77.94 | 79.48 |
Interpretation:
- serial end-to-end latency is far lower than the Oracle and PostgreSQL baselines on this local setup
- most of the work remains in signal-to-completion, but the absolute time is much smaller
- workflow start is still the most variable of the three measured phases
## Steady Throughput Baseline

| Phase | Avg ms | P95 ms | Max ms |
|---|---|---|---|
| start | 20.88 | 28.64 | 33.67 |
| signalPublish | 16.01 | 20.90 | 22.71 |
| signalToCompletion | 988.88 | 1000.12 | 1004.92 |
Interpretation:
- the engine sustained 76.28 ops/s in a 96-operation wave
- the end-to-end average stayed at 1110.94 ms
- the dominant cost is still resume processing, but Mongo remains materially faster on this synthetic profile
## MongoDB Observations

### Dominant Waits
- no durable current-op contention class dominated these runs; every scenario finished without a stable top wait entry
- this means the current Mongo baseline should be read primarily through normalized workflow metrics and the Mongo-specific counter set, not through a wait-event headline
- the backend bug exposed by the first perf pass was not storage contention; it was correctness:
- empty-queue receive had to become bounded
- collection bootstrap had to be explicit before transactional concurrency
### Capacity Ladder

| Scenario | Throughput/s | P95 ms | Commands | Inserts | Updates | Deletes | Docs Returned | Docs Inserted | Docs Updated | Docs Deleted |
|---|---|---|---|---|---|---|---|---|---|---|
| c1 | 7.08 | 1576.55 | 183 | 48 | 48 | 16 | 80 | 48 | 48 | 16 |
| c4 | 38.35 | 1472.61 | 684 | 192 | 192 | 64 | 320 | 192 | 192 | 64 |
| c8 | 66.04 | 1743.52 | 1349 | 384 | 384 | 128 | 640 | 384 | 384 | 128 |
| c16 | 68.65 | 3507.95 | 2515 | 768 | 768 | 256 | 1280 | 768 | 768 | 256 |
Interpretation:
- Mongo scales very aggressively through c8
- c16 is still the fastest rung, but it is also where latency expands sharply relative to c8
- the first visible pressure point is therefore c16, even though throughput still rises slightly
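The counters above also show fixed per-operation ratios at every rung (3 inserts, 3 updates, 1 delete, and 5 returned documents per operation), which makes the ladder easy to validate mechanically:

```python
# rung -> (ops, inserts, updates, deletes, docs returned),
# copied from the Capacity Ladder table above.
ladder = {
    "c1":  (16, 48, 48, 16, 80),
    "c4":  (64, 192, 192, 64, 320),
    "c8":  (128, 384, 384, 128, 640),
    "c16": (256, 768, 768, 256, 1280),
}

for rung, (ops, ins, upd, dele, ret) in ladder.items():
    # Document movement scales linearly with operation count.
    assert (ins, upd, dele, ret) == (3 * ops, 3 * ops, ops, 5 * ops), rung
    print(f"{rung}: {ins // ops} inserts/op, {ret // ops} docs returned/op")
```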
## Transport Baselines

| Scenario | Throughput/s | Commands | Inserts | Deletes | Network In | Network Out |
|---|---|---|---|---|---|---|
| mongo-immediate-burst-nightly | 46.18 | 379 | 120 | 120 | 277307 | 296277 |
| mongo-delayed-burst-nightly | 16.66 | 1052 | 48 | 48 | 507607 | 450004 |
Interpretation:
- immediate transport is still much cheaper than full workflow resume
- delayed transport carries more command and network chatter because the wake path repeatedly checks due work through the change-stream plus due-time model
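Dividing the counters by operation count makes the wake-path cost concrete (assuming, as is usual for these metrics, that the Network In/Out columns are byte counters):

```python
# scenario -> (ops, network in, network out), from the table above.
transports = {
    "mongo-immediate-burst-nightly": (120, 277307, 296277),
    "mongo-delayed-burst-nightly":   (48, 507607, 450004),
}

for name, (ops, net_in, net_out) in transports.items():
    # Delayed transport carries several times more inbound traffic per op.
    print(f"{name}: {net_in / ops:.0f} B in/op, {net_out / ops:.0f} B out/op")
```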
## Business Flow Baselines

| Scenario | Throughput/s | Avg ms | Commands | Queries | Inserts | Updates | Deletes | Tx Started | Tx Committed |
|---|---|---|---|---|---|---|---|---|---|
| mongo-bulstrad-quote-or-apl-cancel-smoke | 23.48 | 124.35 | 54 | 45 | 20 | 0 | 0 | 10 | 10 |
| mongo-bulstrad-quotation-confirm-convert-to-policy-nightly | 10.83 | 790.30 | 189 | 151 | 96 | 48 | 12 | 36 | 36 |
Interpretation:
- the short Bulstrad flow is still cheap enough that transport and projection movement dominate
- the heavier `QuotationConfirm -> ConvertToPolicy -> PdfGenerator` path stays comfortably sub-second on this local Mongo baseline
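The transaction counters above imply one transaction per operation for the short cancel flow and three for the conversion path, with every started transaction committing:

```python
# scenario -> (ops, tx started, tx committed), from the table above.
flows = {
    "mongo-bulstrad-quote-or-apl-cancel-smoke": (10, 10, 10),
    "mongo-bulstrad-quotation-confirm-convert-to-policy-nightly": (12, 36, 36),
}

for name, (ops, started, committed) in flows.items():
    assert started == committed  # no aborted transactions in either flow
    print(f"{name}: {started // ops} tx/op")
```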
## Soak Baseline

`mongo-signal-roundtrip-soak` completed 108 operations at concurrency 8 with:

- throughput: 47.62 ops/s
- average latency: 322.40 ms
- P95 latency: 550.50 ms
- 0 failures
- 0 dead-lettered signals
- 0 runtime conflicts
- 0 stuck instances

MongoDB metrics for the soak run:

- `opcounters.command`: 2264
- `opcounters.insert`: 324
- `opcounters.update`: 324
- `opcounters.delete`: 108
- `metrics.document.returned`: 540
- `metrics.document.inserted`: 324
- `metrics.document.updated`: 324
- `metrics.document.deleted`: 108
- `transactions.totalStarted`: 216
- `transactions.totalCommitted`: 216
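These counters stay in the same fixed per-operation ratios seen on the capacity ladder (3 inserts, 3 updates, 1 delete per operation, plus 2 transactions per operation here), which can be checked mechanically:

```python
ops = 108  # operations completed by mongo-signal-roundtrip-soak

# Counter values from the list above.
counters = {
    "opcounters.insert": 324,
    "opcounters.update": 324,
    "opcounters.delete": 108,
    "transactions.totalStarted": 216,
    "transactions.totalCommitted": 216,
}

assert counters["opcounters.insert"] == 3 * ops
assert counters["opcounters.update"] == 3 * ops
assert counters["opcounters.delete"] == ops
assert counters["transactions.totalStarted"] == 2 * ops
assert counters["transactions.totalStarted"] == counters["transactions.totalCommitted"]
print("soak counters consistent with", ops, "operations")
```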
## What Must Stay Constant For Final Backend Comparison
When the final Oracle/PostgreSQL/MongoDB comparison is produced, keep these constant:
- same scenario names
- same operation counts
- same concurrency levels
- same worker counts for signal drain
- same synthetic workflow definitions
- same Bulstrad workflow families
- same correctness assertions
Compare these dimensions directly:
- throughput per second
- latency average, P95, P99, and max
- phase latency summaries for start, signal publish, and signal-to-completion on the synthetic signal round-trip workload
- failures, dead letters, runtime conflicts, and stuck instances
- commit, transaction, or mutation count analogs
- row, tuple, or document movement analogs
- read, network, and wake-path cost analogs
- dominant waits, locks, or contention classes when the backend exposes them clearly
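A comparison harness could enforce the "stay constant" list mechanically before comparing any metrics. A minimal sketch, with illustrative field names that are not taken from the real result schema:

```python
def assert_comparable(runs: dict) -> None:
    """Refuse to compare backends whose scenario shape differs.

    `runs` maps backend name -> per-scenario parameters; the field names
    here (ops, concurrency, signalWorkers) are hypothetical.
    """
    shapes = {
        backend: (r["ops"], r["concurrency"], r["signalWorkers"])
        for backend, r in runs.items()
    }
    if len(set(shapes.values())) != 1:
        raise ValueError(f"scenario shapes differ across backends: {shapes}")

# Example: the throughput-parallel scenario run against all three backends.
runs = {
    "mongodb":    {"ops": 96, "concurrency": 16, "signalWorkers": 8},
    "postgresql": {"ops": 96, "concurrency": 16, "signalWorkers": 8},
    "oracle":     {"ops": 96, "concurrency": 16, "signalWorkers": 8},
}
assert_comparable(runs)
print("backends comparable")
```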
## First Sizing Note
On this local MongoDB baseline:
- Mongo is the fastest of the three backends on the synthetic signal round-trip workloads measured so far
- the biggest correctness findings came from backend behavior, not raw throughput:
- bounded empty-queue receive
- explicit collection bootstrap before transactional concurrency
- c8 is the last clearly comfortable capacity rung
- c16 is the first rung where latency growth becomes visible, even though throughput still increases slightly
This is a baseline, not a production commitment. The final recommendation still needs the explicit three-backend comparison pack using the same workloads and the same correctness rules.