# MongoDB Performance Baseline 2026-03-17

## Purpose

This document captures the current MongoDB-backed load and performance baseline for the Serdica workflow engine. It completes the per-backend baseline set that will feed the final three-backend comparison.

The durable machine-readable companion is [12-mongo-performance-baseline-2026-03-17.json](12-mongo-performance-baseline-2026-03-17.json).

## Run Metadata

- Date: `2026-03-17`
- Test command:
  - integration performance suite filtered to `MongoPerformance`
- Suite result:
  - `14/14` tests passed
  - total wall-clock time: `48 s`
- Raw artifact directory:
  - `TestResults/workflow-performance/`
- MongoDB environment:
  - Docker image: `mongo:7.0`
  - topology: single-node replica set
  - version: `7.0.30`
  - backend: durable collections plus change-stream wake hints

## Scenario Summary

| Scenario | Tier | Ops | Conc | Duration ms | Throughput/s | Avg ms | P95 ms | Max ms |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| `mongo-signal-roundtrip-capacity-c1` | `WorkflowPerfCapacity` | 16 | 1 | 2259.99 | 7.08 | 1394.99 | 1576.55 | 2063.72 |
| `mongo-signal-roundtrip-capacity-c4` | `WorkflowPerfCapacity` | 64 | 4 | 1668.99 | 38.35 | 1244.81 | 1472.61 | 1527.26 |
| `mongo-signal-roundtrip-capacity-c8` | `WorkflowPerfCapacity` | 128 | 8 | 1938.12 | 66.04 | 1477.49 | 1743.52 | 1757.88 |
| `mongo-signal-roundtrip-capacity-c16` | `WorkflowPerfCapacity` | 256 | 16 | 3728.88 | 68.65 | 3203.94 | 3507.95 | 3527.96 |
| `mongo-signal-roundtrip-latency-serial` | `WorkflowPerfLatency` | 16 | 1 | 1675.77 | 9.55 | 97.88 | 149.20 | 324.02 |
| `mongo-bulstrad-quotation-confirm-convert-to-policy-nightly` | `WorkflowPerfNightly` | 12 | 4 | 1108.42 | 10.83 | 790.30 | 947.21 | 963.16 |
| `mongo-delayed-burst-nightly` | `WorkflowPerfNightly` | 48 | 1 | 2881.66 | 16.66 | 2142.14 | 2265.15 | 2281.04 |
| `mongo-immediate-burst-nightly` | `WorkflowPerfNightly` | 120 | 1 | 2598.57 | 46.18 | 1148.06 | 1530.49 | 1575.98 |
| `mongo-synthetic-external-resume-nightly` | `WorkflowPerfNightly` | 36 | 8 | 976.73 | 36.86 | 633.82 | 770.10 | 772.71 |
| `mongo-bulstrad-quote-or-apl-cancel-smoke` | `WorkflowPerfSmoke` | 10 | 4 | 425.81 | 23.48 | 124.35 | 294.76 | 295.32 |
| `mongo-delayed-burst-smoke` | `WorkflowPerfSmoke` | 12 | 1 | 2416.23 | 4.97 | 2040.30 | 2079.79 | 2084.03 |
| `mongo-immediate-burst-smoke` | `WorkflowPerfSmoke` | 24 | 1 | 747.36 | 32.11 | 264.14 | 339.42 | 400.99 |
| `mongo-signal-roundtrip-soak` | `WorkflowPerfSoak` | 108 | 8 | 2267.91 | 47.62 | 322.40 | 550.50 | 572.73 |
| `mongo-signal-roundtrip-throughput-parallel` | `WorkflowPerfThroughput` | 96 | 16 | 1258.48 | 76.28 | 1110.94 | 1121.22 | 1127.11 |

## Measurement Split

The synthetic signal round-trip workload is measured in three separate ways:

- `mongo-signal-roundtrip-latency-serial`: one workflow at a time, one signal worker, used as the single-instance latency baseline.
- `mongo-signal-roundtrip-throughput-parallel`: `96` workflows, `16`-way workload concurrency, `8` signal workers, used as the steady-state throughput baseline.
- `mongo-signal-roundtrip-capacity-c*`: batch-wave capacity ladder used to observe scaling and pressure points.
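For readers reproducing these tables from the raw artifacts, the aggregation can be sketched as follows. This is a minimal illustration, not the actual harness code: the nearest-rank percentile method and the function name `summarize` are assumptions.

```python
from math import ceil


def summarize(durations_ms: list[float], wall_clock_ms: float) -> dict:
    """Roll per-operation end-to-end latency samples up into the
    scenario-level metrics reported in the tables (ops, throughput/s,
    avg, P95, max)."""
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest-rank P95: the smallest sample with at least 95 % of
    # observations at or below it.
    p95 = ordered[ceil(0.95 * n) - 1]
    return {
        "ops": n,
        "throughput_per_s": n / (wall_clock_ms / 1000.0),
        "avg_ms": sum(ordered) / n,
        "p95_ms": p95,
        "max_ms": ordered[-1],
    }
```

Note that throughput is computed against the scenario's wall-clock duration, not the sum of per-operation latencies, which is why high-concurrency rungs can show large average latency and high throughput at the same time.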
The useful MongoDB baseline is:

- serial latency baseline: `97.88 ms` average end-to-end per workflow
- steady throughput baseline: `76.28 ops/s` with `16` workload concurrency and `8` signal workers
- capacity `c1`: `7.08 ops/s`; this is only the smallest batch-wave rung

### Serial Latency Baseline

| Phase | Avg ms | P95 ms | Max ms |
| --- | ---: | ---: | ---: |
| `start` | 26.34 | 79.35 | 251.36 |
| `signalPublish` | 8.17 | 10.75 | 12.17 |
| `signalToCompletion` | 71.54 | 77.94 | 79.48 |

Interpretation:

- serial end-to-end latency is far lower than the Oracle and PostgreSQL baselines on this local setup
- most of the work remains in signal-to-completion, but the absolute time is much smaller
- workflow start is still the most variable of the three measured phases

### Steady Throughput Baseline

| Phase | Avg ms | P95 ms | Max ms |
| --- | ---: | ---: | ---: |
| `start` | 20.88 | 28.64 | 33.67 |
| `signalPublish` | 16.01 | 20.90 | 22.71 |
| `signalToCompletion` | 988.88 | 1000.12 | 1004.92 |

Interpretation:

- the engine sustained `76.28 ops/s` in a `96`-operation wave
- the end-to-end average stayed at `1110.94 ms`
- the dominant cost is still resume processing, but Mongo remains materially faster on this synthetic profile

## MongoDB Observations

### Dominant Waits

- no durable current-op contention class dominated these runs; every scenario finished without a stable top wait entry
- this means the current Mongo baseline should be read primarily through normalized workflow metrics and the Mongo-specific counter set, not through a wait-event headline
- the backend bug exposed by the first perf pass was not storage contention; it was correctness:
  - empty-queue receive had to become bounded
  - collection bootstrap had to be explicit before transactional concurrency

### Capacity Ladder

| Scenario | Throughput/s | P95 ms | Commands | Inserts | Updates | Deletes | Docs Returned | Docs Inserted | Docs Updated | Docs Deleted |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| `c1` | 7.08 | 1576.55 | 183 | 48 | 48 | 16 | 80 | 48 | 48 | 16 |
| `c4` | 38.35 | 1472.61 | 684 | 192 | 192 | 64 | 320 | 192 | 192 | 64 |
| `c8` | 66.04 | 1743.52 | 1349 | 384 | 384 | 128 | 640 | 384 | 384 | 128 |
| `c16` | 68.65 | 3507.95 | 2515 | 768 | 768 | 256 | 1280 | 768 | 768 | 256 |

Interpretation:

- Mongo scales very aggressively through `c8`
- `c16` is still the fastest rung, but it is also where latency expands sharply relative to `c8`
- the first visible pressure point is therefore `c16`, even though throughput still rises slightly

### Transport Baselines

| Scenario | Throughput/s | Commands | Inserts | Deletes | Network In | Network Out |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| `mongo-immediate-burst-nightly` | 46.18 | 379 | 120 | 120 | 277307 | 296277 |
| `mongo-delayed-burst-nightly` | 16.66 | 1052 | 48 | 48 | 507607 | 450004 |

Interpretation:

- immediate transport is still much cheaper than full workflow resume
- delayed transport carries more command and network chatter because the wake path repeatedly checks due work through the change-stream plus due-time model

### Business Flow Baselines

| Scenario | Throughput/s | Avg ms | Commands | Queries | Inserts | Updates | Deletes | Tx Started | Tx Committed |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| `mongo-bulstrad-quote-or-apl-cancel-smoke` | 23.48 | 124.35 | 54 | 45 | 20 | 0 | 0 | 10 | 10 |
| `mongo-bulstrad-quotation-confirm-convert-to-policy-nightly` | 10.83 | 790.30 | 189 | 151 | 96 | 48 | 12 | 36 | 36 |

Interpretation:

- the short Bulstrad flow is still cheap enough that transport and projection movement dominate
- the heavier `QuotationConfirm -> ConvertToPolicy -> PdfGenerator` path stays comfortably sub-second on this local Mongo baseline

### Soak Baseline

`mongo-signal-roundtrip-soak` completed `108` operations at concurrency `8` with:

- throughput: `47.62 ops/s`
- average latency: `322.40 ms`
- P95 latency: `550.50 ms`
- `0` failures
- `0` dead-lettered signals
- `0` runtime conflicts
- `0` stuck instances

MongoDB metrics for the soak run:

- `opcounters.command`: `2264`
- `opcounters.insert`: `324`
- `opcounters.update`: `324`
- `opcounters.delete`: `108`
- `metrics.document.returned`: `540`
- `metrics.document.inserted`: `324`
- `metrics.document.updated`: `324`
- `metrics.document.deleted`: `108`
- `transactions.totalStarted`: `216`
- `transactions.totalCommitted`: `216`

## What Must Stay Constant For Final Backend Comparison

When the final Oracle/PostgreSQL/MongoDB comparison is produced, keep these constant:

- same scenario names
- same operation counts
- same concurrency levels
- same worker counts for signal drain
- same synthetic workflow definitions
- same Bulstrad workflow families
- same correctness assertions

Compare these dimensions directly:

- throughput per second
- latency average, P95, P99, and max
- phase latency summaries for start, signal publish, and signal-to-completion on the synthetic signal round-trip workload
- failures, dead letters, runtime conflicts, and stuck instances
- commit, transaction, or mutation count analogs
- row, tuple, or document movement analogs
- read, network, and wake-path cost analogs
- dominant waits, locks, or contention classes when the backend exposes them clearly

## First Sizing Note

On this local MongoDB baseline:

- Mongo is the fastest of the three backends on the synthetic signal round-trip workloads measured so far
- the biggest correctness findings came from backend behavior, not raw throughput:
  - bounded empty-queue receive
  - explicit collection bootstrap before transactional concurrency
- `c8` is the last clearly comfortable capacity rung
- `c16` is the first rung where latency growth becomes visible, even though throughput still increases slightly

This is a baseline, not a production commitment.
The final recommendation still needs the explicit three-backend comparison pack using the same workloads and the same correctness rules.
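When that comparison pack is assembled, the per-scenario roll-up could be sketched as below. This is only an illustration of the intended comparison shape: the field name `throughputPerSec`, the backend-keyed input layout, and the function name `compare_backends` are assumptions, not the actual schema of the JSON companion artifacts.

```python
def compare_backends(
    baselines: dict[str, dict[str, dict[str, float]]],
) -> dict[str, str]:
    """For each scenario, name the backend with the highest throughput.

    `baselines` maps backend name -> scenario name -> metrics dict that
    carries a `throughputPerSec` entry (an assumed field name). Scenario
    names are assumed to be normalized across backends, matching the
    'same scenario names' constraint above.
    """
    winners: dict[str, str] = {}
    scenarios = set().union(*(per_backend.keys() for per_backend in baselines.values()))
    for scenario in sorted(scenarios):
        # Only backends that actually ran this scenario compete for it.
        candidates = {
            backend: metrics[scenario]["throughputPerSec"]
            for backend, metrics in baselines.items()
            if scenario in metrics
        }
        winners[scenario] = max(candidates, key=candidates.get)
    return winners
```

The same skeleton extends naturally to the other comparison dimensions listed above (latency percentiles, failure counts, mutation analogs) by swapping the metric key and, for latency, taking the minimum instead of the maximum.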