git.stella-ops.org/docs/workflow/engine/12-mongo-performance-baseline-2026-03-17.md

# MongoDB Performance Baseline 2026-03-17

## Purpose

This document captures the current MongoDB-backed load and performance baseline for the Serdica workflow engine. It completes the per-backend baseline set that will feed the final three-backend comparison.

The durable machine-readable companion is `12-mongo-performance-baseline-2026-03-17.json`.

## Run Metadata

- Date: 2026-03-17
- Test command: integration performance suite filtered to `MongoPerformance`
- Suite result: 14/14 tests passed; total wall-clock time: 48 s
- Raw artifact directory: `TestResults/workflow-performance/`
- MongoDB environment:
  - Docker image: `mongo:7.0`
  - topology: single-node replica set
  - version: 7.0.30
  - backend: durable collections plus change-stream wake hints
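Change streams and multi-document transactions require a replica set, which is presumably why the baseline runs `mongo:7.0` as a one-member replica set rather than a standalone `mongod`. A minimal local equivalent (container name, port, and replica-set name here are illustrative, not taken from the test harness) would look like:

```shell
# Run mongo:7.0 as a single-node replica set (name/port are illustrative).
docker run -d --name mongo-perf -p 27017:27017 mongo:7.0 --replSet rs0

# Initiate the one-member replica set; change streams and transactions
# are unavailable on a standalone mongod.
docker exec mongo-perf mongosh --eval 'rs.initiate()'
```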

## Scenario Summary

| Scenario | Tier | Ops | Conc | Duration (ms) | Throughput (ops/s) | Avg (ms) | P95 (ms) | Max (ms) |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| mongo-signal-roundtrip-capacity-c1 | WorkflowPerfCapacity | 16 | 1 | 2259.99 | 7.08 | 1394.99 | 1576.55 | 2063.72 |
| mongo-signal-roundtrip-capacity-c4 | WorkflowPerfCapacity | 64 | 4 | 1668.99 | 38.35 | 1244.81 | 1472.61 | 1527.26 |
| mongo-signal-roundtrip-capacity-c8 | WorkflowPerfCapacity | 128 | 8 | 1938.12 | 66.04 | 1477.49 | 1743.52 | 1757.88 |
| mongo-signal-roundtrip-capacity-c16 | WorkflowPerfCapacity | 256 | 16 | 3728.88 | 68.65 | 3203.94 | 3507.95 | 3527.96 |
| mongo-signal-roundtrip-latency-serial | WorkflowPerfLatency | 16 | 1 | 1675.77 | 9.55 | 97.88 | 149.20 | 324.02 |
| mongo-bulstrad-quotation-confirm-convert-to-policy-nightly | WorkflowPerfNightly | 12 | 4 | 1108.42 | 10.83 | 790.30 | 947.21 | 963.16 |
| mongo-delayed-burst-nightly | WorkflowPerfNightly | 48 | 1 | 2881.66 | 16.66 | 2142.14 | 2265.15 | 2281.04 |
| mongo-immediate-burst-nightly | WorkflowPerfNightly | 120 | 1 | 2598.57 | 46.18 | 1148.06 | 1530.49 | 1575.98 |
| mongo-synthetic-external-resume-nightly | WorkflowPerfNightly | 36 | 8 | 976.73 | 36.86 | 633.82 | 770.10 | 772.71 |
| mongo-bulstrad-quote-or-apl-cancel-smoke | WorkflowPerfSmoke | 10 | 4 | 425.81 | 23.48 | 124.35 | 294.76 | 295.32 |
| mongo-delayed-burst-smoke | WorkflowPerfSmoke | 12 | 1 | 2416.23 | 4.97 | 2040.30 | 2079.79 | 2084.03 |
| mongo-immediate-burst-smoke | WorkflowPerfSmoke | 24 | 1 | 747.36 | 32.11 | 264.14 | 339.42 | 400.99 |
| mongo-signal-roundtrip-soak | WorkflowPerfSoak | 108 | 8 | 2267.91 | 47.62 | 322.40 | 550.50 | 572.73 |
| mongo-signal-roundtrip-throughput-parallel | WorkflowPerfThroughput | 96 | 16 | 1258.48 | 76.28 | 1110.94 | 1121.22 | 1127.11 |
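The throughput column is derivable from the ops and duration columns, which makes the table easy to sanity-check. A quick cross-check over a few representative rows (scenario names shortened for readability):

```python
# Sanity-check the scenario table: throughput should equal ops / wall-clock seconds.
rows = [
    # (scenario, ops, duration_ms, reported_throughput)
    ("capacity-c1", 16, 2259.99, 7.08),
    ("capacity-c16", 256, 3728.88, 68.65),
    ("throughput-parallel", 96, 1258.48, 76.28),
]

for name, ops, duration_ms, reported in rows:
    derived = ops / (duration_ms / 1000.0)
    # Allow small rounding differences from the harness output.
    assert abs(derived - reported) < 0.05, (name, derived, reported)
    print(f"{name}: {derived:.2f} ops/s (reported {reported})")
```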

## Measurement Split

The synthetic signal round-trip workload is measured in three separate ways:

- `mongo-signal-roundtrip-latency-serial`: one workflow at a time, one signal worker, used as the single-instance latency baseline.
- `mongo-signal-roundtrip-throughput-parallel`: 96 workflows, 16-way workload concurrency, 8 signal workers, used as the steady-state throughput baseline.
- `mongo-signal-roundtrip-capacity-c*`: batch-wave capacity ladder used to observe scaling and pressure points.

The headline MongoDB baseline figures are:

- serial latency baseline: 97.88 ms average end-to-end per workflow
- steady throughput baseline: 76.28 ops/s with 16 workload concurrency and 8 signal workers
- capacity c1: 7.08 ops/s, but this is only the smallest batch-wave rung, not a latency figure

## Serial Latency Baseline

| Phase | Avg (ms) | P95 (ms) | Max (ms) |
| --- | ---: | ---: | ---: |
| start | 26.34 | 79.35 | 251.36 |
| signalPublish | 8.17 | 10.75 | 12.17 |
| signalToCompletion | 71.54 | 77.94 | 79.48 |

Interpretation:

- serial end-to-end latency is far lower than the Oracle and PostgreSQL baselines on this local setup
- most of the work remains in signal-to-completion, but the absolute time is much smaller
- workflow start is still the most variable of the three measured phases
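The "most of the work" claim can be quantified from the phase averages. Note the phases sum to about 106 ms while the reported end-to-end average is 97.88 ms, so the phases evidently overlap slightly; the shares below are therefore indicative rather than exact:

```python
# Phase averages from the serial latency baseline (ms).
phases = {"start": 26.34, "signalPublish": 8.17, "signalToCompletion": 71.54}
total = sum(phases.values())

for name, avg in phases.items():
    # signalToCompletion works out to roughly two thirds of summed phase time.
    print(f"{name}: {avg / total:.0%} of summed phase time")
```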

## Steady Throughput Baseline

| Phase | Avg (ms) | P95 (ms) | Max (ms) |
| --- | ---: | ---: | ---: |
| start | 20.88 | 28.64 | 33.67 |
| signalPublish | 16.01 | 20.90 | 22.71 |
| signalToCompletion | 988.88 | 1000.12 | 1004.92 |

Interpretation:

- the engine sustained 76.28 ops/s in a 96-operation wave
- end-to-end average stayed at 1110.94 ms
- the dominant cost is still resume processing, but Mongo remains materially faster on this synthetic profile
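As a rough sanity check (our inference, not a harness output), Little's law relates these two numbers: average in-flight operations equal throughput times average latency. The result lands near the 96-operation wave size, consistent with the whole wave being in flight at once rather than a rate-limited stream of 16:

```python
# Little's law: average in-flight operations = throughput * average latency.
throughput_ops_s = 76.28
avg_latency_s = 1110.94 / 1000.0

in_flight = throughput_ops_s * avg_latency_s
print(f"average in-flight operations ~ {in_flight:.1f}")  # close to the 96-op wave
```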

## MongoDB Observations

### Dominant Waits

- no durable current-op contention class dominated these runs; every scenario finished without a stable top wait entry
- this means the current Mongo baseline should be read primarily through normalized workflow metrics and the Mongo-specific counter set, not through a wait-event headline
- the backend bug exposed by the first perf pass was one of correctness, not storage contention:
  - empty-queue receive had to become bounded
  - collection bootstrap had to be explicit before transactional concurrency
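The bounded empty-queue receive fix can be illustrated with a backend-agnostic sketch. Instead of blocking indefinitely when no signal is queued, the receive loop polls against a deadline and returns `None` once the budget is spent (`poll_once` and the timeout values here are hypothetical, not the engine's actual API):

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def bounded_receive(poll_once: Callable[[], Optional[T]],
                    timeout_s: float,
                    poll_interval_s: float = 0.05) -> Optional[T]:
    """Poll for a queued item, but give up once timeout_s has elapsed.

    An unbounded receive on an empty queue was the correctness bug class
    noted above: a consumer could hang forever waiting for work.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        item = poll_once()
        if item is not None:
            return item
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None  # bounded: an empty queue yields None instead of hanging
        time.sleep(min(poll_interval_s, remaining))
```

Usage: `bounded_receive(queue_poll, timeout_s=1.0)` returns a signal if one arrives within the budget, otherwise `None`, letting the caller decide whether to retry or back off.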

### Capacity Ladder

| Scenario | Throughput (ops/s) | P95 (ms) | Commands | Inserts | Updates | Deletes | Docs Returned | Docs Inserted | Docs Updated | Docs Deleted |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| c1 | 7.08 | 1576.55 | 183 | 48 | 48 | 16 | 80 | 48 | 48 | 16 |
| c4 | 38.35 | 1472.61 | 684 | 192 | 192 | 64 | 320 | 192 | 192 | 64 |
| c8 | 66.04 | 1743.52 | 1349 | 384 | 384 | 128 | 640 | 384 | 384 | 128 |
| c16 | 68.65 | 3507.95 | 2515 | 768 | 768 | 256 | 1280 | 768 | 768 | 256 |
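Dividing each counter column by the operation count shows the per-operation storage cost is identical on every rung, which suggests (our inference) that the c16 latency growth is queueing pressure rather than extra storage work:

```python
# Per-operation document movement across the capacity ladder.
# Columns: ops, inserts, updates, deletes, docs returned.
ladder = {
    "c1": (16, 48, 48, 16, 80),
    "c4": (64, 192, 192, 64, 320),
    "c8": (128, 384, 384, 128, 640),
    "c16": (256, 768, 768, 256, 1280),
}

for rung, (ops, ins, upd, dele, ret) in ladder.items():
    # Every rung works out to exactly 3 inserts, 3 updates, 1 delete,
    # and 5 documents returned per operation.
    assert (ins / ops, upd / ops, dele / ops, ret / ops) == (3, 3, 1, 5)
    print(f"{rung}: 3 ins, 3 upd, 1 del, 5 returned per op")
```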

Interpretation:

- Mongo scales strongly through c8, from 7.08 ops/s at c1 to 66.04 ops/s at c8
- c16 is still the fastest rung at 68.65 ops/s, but it is also where latency expands sharply relative to c8 (P95 3507.95 ms vs 1743.52 ms)
- the first visible pressure point is therefore c16, even though throughput still rises slightly

## Transport Baselines

| Scenario | Throughput (ops/s) | Commands | Inserts | Deletes | Network In (bytes) | Network Out (bytes) |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| mongo-immediate-burst-nightly | 46.18 | 379 | 120 | 120 | 277307 | 296277 |
| mongo-delayed-burst-nightly | 16.66 | 1052 | 48 | 48 | 507607 | 450004 |
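Normalizing the network columns per operation makes the "chatter" claim in the interpretation concrete: the delayed path moves several times more inbound bytes per operation than the immediate path:

```python
# Network traffic per operation for the two transport baselines (bytes).
immediate = {"ops": 120, "net_in": 277307, "net_out": 296277}
delayed = {"ops": 48, "net_in": 507607, "net_out": 450004}

for name, row in (("immediate", immediate), ("delayed", delayed)):
    print(f"{name}: {row['net_in'] / row['ops']:.0f} B in/op, "
          f"{row['net_out'] / row['ops']:.0f} B out/op")
# The delayed wake path costs roughly 4-5x more inbound bytes per operation.
```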

Interpretation:

- immediate transport is still much cheaper than full workflow resume
- delayed transport carries more command and network chatter because the wake path repeatedly checks due work through the change-stream plus due-time model

## Business Flow Baselines

| Scenario | Throughput (ops/s) | Avg (ms) | Commands | Queries | Inserts | Updates | Deletes | Tx Started | Tx Committed |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| mongo-bulstrad-quote-or-apl-cancel-smoke | 23.48 | 124.35 | 54 | 45 | 20 | 0 | 0 | 10 | 10 |
| mongo-bulstrad-quotation-confirm-convert-to-policy-nightly | 10.83 | 790.30 | 189 | 151 | 96 | 48 | 12 | 36 | 36 |
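The transaction columns divide cleanly by the operation counts (10 and 12 workflows respectively), showing the heavier flow commits three transactions per workflow against one for the short cancel flow, with zero aborts in either case:

```python
# Committed transactions per completed workflow for the two Bulstrad flows.
flows = {
    "quote-or-apl-cancel-smoke": (10, 10),            # (ops, tx committed)
    "quotation-confirm-convert-to-policy-nightly": (12, 36),
}

for name, (ops, tx) in flows.items():
    print(f"{name}: {tx / ops:.0f} committed transaction(s) per workflow")
```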

Interpretation:

- the short Bulstrad flow is still cheap enough that transport and projection movement dominate
- the heavier QuotationConfirm -> ConvertToPolicy -> PdfGenerator path stays comfortably sub-second on this local Mongo baseline

## Soak Baseline

`mongo-signal-roundtrip-soak` completed 108 operations at concurrency 8 with:

- throughput: 47.62 ops/s
- average latency: 322.40 ms
- P95 latency: 550.50 ms
- 0 failures
- 0 dead-lettered signals
- 0 runtime conflicts
- 0 stuck instances

MongoDB metrics for the soak run:

- `opcounters.command`: 2264
- `opcounters.insert`: 324
- `opcounters.update`: 324
- `opcounters.delete`: 108
- `metrics.document.returned`: 540
- `metrics.document.inserted`: 324
- `metrics.document.updated`: 324
- `metrics.document.deleted`: 108
- `transactions.totalStarted`: 216
- `transactions.totalCommitted`: 216
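Normalizing the soak counters by the 108 operations reproduces the same per-operation mutation profile seen on the capacity ladder, plus exactly two transactions per operation with every started transaction committed:

```python
# Soak run: 108 operations; check the per-operation mutation profile.
ops = 108
soak = {"insert": 324, "update": 324, "delete": 108,
        "tx_started": 216, "tx_committed": 216}

per_op = {k: v / ops for k, v in soak.items()}
print(per_op)
# 3 inserts, 3 updates, 1 delete per op -- the same storage profile as the
# capacity ladder -- and 2 transactions per op, all committed.
assert per_op == {"insert": 3.0, "update": 3.0, "delete": 1.0,
                  "tx_started": 2.0, "tx_committed": 2.0}
```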

## What Must Stay Constant For Final Backend Comparison

When the final Oracle/PostgreSQL/MongoDB comparison is produced, keep these constant:

- same scenario names
- same operation counts
- same concurrency levels
- same worker counts for signal drain
- same synthetic workflow definitions
- same Bulstrad workflow families
- same correctness assertions

Compare these dimensions directly:

- throughput per second
- latency average, P95, P99, and max
- phase latency summaries for start, signal publish, and signal-to-completion on the synthetic signal round-trip workload
- failures, dead letters, runtime conflicts, and stuck instances
- commit, transaction, or mutation count analogs
- row, tuple, or document movement analogs
- read, network, and wake-path cost analogs
- dominant waits, locks, or contention classes when the backend exposes them clearly
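One way to hold these dimensions fixed across backends is to normalize every run into a shared record shape. The sketch below is illustrative only, not the project's actual schema; the field names mirror the dimensions listed above:

```python
# Illustrative (hypothetical) record shape for the three-backend comparison pack.
from dataclasses import dataclass

@dataclass(frozen=True)
class ScenarioResult:
    backend: str              # "oracle" | "postgresql" | "mongodb"
    scenario: str             # must match exactly across backends
    ops: int
    concurrency: int
    throughput_ops_s: float
    avg_ms: float
    p95_ms: float
    p99_ms: float             # not captured in this baseline; 0.0 as placeholder
    max_ms: float
    failures: int
    dead_letters: int
    runtime_conflicts: int
    stuck_instances: int

# The soak row from this document, expressed in that shape:
r = ScenarioResult("mongodb", "signal-roundtrip-soak", 108, 8,
                   47.62, 322.40, 550.50, 0.0, 572.73, 0, 0, 0, 0)
print(r.backend, r.throughput_ops_s)
```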

## First Sizing Note

On this local MongoDB baseline:

- Mongo is the fastest of the three backends on the synthetic signal round-trip workloads measured so far
- the biggest correctness findings came from backend behavior, not raw throughput:
  - bounded empty-queue receive
  - explicit collection bootstrap before transactional concurrency
- c8 is the last clearly comfortable capacity rung
- c16 is the first rung where latency growth becomes visible, even though throughput still increases slightly
This is a baseline, not a production commitment. The final recommendation still needs the explicit three-backend comparison pack using the same workloads and the same correctness rules.