08. Load And Performance Plan
Purpose
This document defines how the Serdica workflow engine should be load-tested, performance-characterized, and capacity-sized once functional parity is in place.
The goal is not only to prove that the engine is correct under load, but also to answer these product and platform questions:
- how many workflow starts, task completions, and signal resumes can one node sustain
- how quickly does backlog drain after restart or outage
- how much timing variance is normal for Oracle AQ on local Docker, CI, and shared environments
- which workloads are Oracle-bound, AQ-bound, or engine-bound
- which scenarios are safe to gate in PR and which belong in nightly or explicit soak runs
Principles
The performance plan follows these rules:
- correctness comes first; a fast but lossy engine result is a failed run
- performance tests must be split by intent: smoke, characterization, stress, soak, and failure-under-load
- transport-only tests and full workflow tests must both exist; they answer different questions
- synthetic workflows are required for stable measurement
- representative Bulstrad workflows are required for product confidence
- PR gates should use coarse, stable envelopes
- nightly and explicit runs should record and compare detailed metrics
- Oracle and AQ behavior must be measured directly, not inferred from app logs alone
What Must Be Measured
Correctness Under Load
Every load run should capture:
- total workflows started
- total tasks activated
- total tasks completed
- total signals published
- total signals processed
- total signals ignored as stale or duplicate
- total dead-lettered signals
- total runtime concurrency conflicts
- total failed runs
- total stuck instances at end of run
Correctness invariants:
- no lost committed signal
- no duplicate open task for the same logical wait
- no orphan subworkflow frame
- no runtime state row left without a valid explainable wait reason
- no queue backlog remaining after a successful drain phase unless the scenario intentionally leaves poison messages in DLQ
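As a sketch, the invariants above can be checked mechanically at the end of every load run. The counter names below are hypothetical stand-ins, not the engine's real telemetry fields:

```python
def check_invariants(counters: dict) -> list[str]:
    """Return the correctness invariants violated by one load run.

    All keys are hypothetical stand-ins for the harness's real counters.
    """
    failures = []
    # every committed signal must end up processed, ignored as stale, or dead-lettered
    accounted = (counters["signals_processed"]
                 + counters["signals_ignored_stale"]
                 + counters["signals_dead_lettered"])
    if counters["signals_published"] != accounted:
        failures.append("lost or unaccounted signal")
    if counters["stuck_instances"] != 0:
        failures.append("stuck instances at end of run")
    # backlog must be empty unless the scenario intentionally leaves poison in DLQ
    if counters["queue_backlog"] != 0 and not counters.get("expects_poison", False):
        failures.append("backlog remaining after drain phase")
    return failures
```

A run passes the correctness gate only if this list comes back empty.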
Latency
The engine should measure at least:
- start-to-first-task latency
- start-to-completion latency
- task-complete-to-next-task latency
- signal-publish-to-task-visible latency
- timer-due-to-resume latency
- delayed-message lateness relative to requested due time
- backlog-drain completion time
- restart-to-first-processed-signal time
These should be recorded as:
- average
- p50
- p95
- p99
- max
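One way to produce these summary fields from a recorded latency series, sketched with a nearest-rank percentile (other percentile definitions are equally valid):

```python
import math

def latency_summary(samples_ms: list[float]) -> dict:
    """Summarize one latency series into the envelope fields used by this plan."""
    s = sorted(samples_ms)

    def pct(p: float) -> float:
        # nearest-rank percentile: smallest sample covering p percent of the series
        idx = max(0, math.ceil(p / 100 * len(s)) - 1)
        return s[idx]

    return {
        "avg": sum(s) / len(s),
        "p50": pct(50),
        "p95": pct(95),
        "p99": pct(99),
        "max": s[-1],
    }
```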
Throughput
The engine should measure:
- workflows started per second
- task completions per second
- signals published per second
- signals processed per second
- backlog drain rate in signals per second
- completed end-to-end business workflows per minute
Saturation
The engine should measure:
- app process CPU
- app process private memory and working set
- Oracle container CPU and memory when running locally
- queue depth over time
- active waiting instances over time
- dead-letter depth over time
- runtime state update conflicts over time
- open task count over time
Oracle-Side Signals
If the environment permits access, also collect:
- AQ queue depth before, during, and after load
- queue-table growth during sustained runs
- visible dequeue lag
- Oracle session count for the test service
- lock or wait spikes on workflow tables
- transaction duration for mutation transactions
If the environment does not permit these views, fall back to:
- app-side timing
- browse counts from AQ
- workflow table row counts
- signal pump telemetry snapshots
Workload Model
The load plan should be split into four workload families.
1. Transport Microbenchmarks
These isolate Oracle AQ behavior from workflow logic.
Use them to answer:
- how fast can AQ accept immediate messages
- how fast can AQ release delayed messages
- what is the drain rate for mixed backlogs
- how much delayed-message jitter is normal
Core scenarios:
- burst immediate enqueue and drain
- burst delayed enqueue with same due second
- mixed immediate and delayed enqueue on one queue
- dequeue rollback redelivery under sustained load
- dead-letter and replay backlog
- delayed backlog surviving Oracle restart
2. Synthetic Engine Workloads
These isolate the runtime from business-specific transport noise.
Recommended synthetic workflow types:
- start-to-complete with no task
- start-to-task with one human task
- signal-wait then task activation
- timer-wait then task activation
- continue-with dispatcher chain
- parent-child subworkflow chain
Use them to answer:
- raw start throughput
- raw resume throughput
- timer-due drain rate
- subworkflow coordination cost
- task activation/update cost
3. Representative Bulstrad Workloads
These prove that realistic product workflows behave well under load.
The first performance wave should use workflows that are already functionally covered in the Oracle suite:
- AssistantPrintInsisDocuments
- OpenForChangePolicy
- ReviewPolicyOpenForChange
- AssistantAddAnnex
- AnnexCancellation
- AssistantPolicyCancellation
- AssistantPolicyReinstate
- InsisIntegrationNew
- QuotationConfirm
- QuoteOrAplCancel
Use them to answer:
- how the engine behaves with realistic transport payload shaping
- how nested child workflows affect latency
- how multi-step review chains behave during backlog drain
- how short utility flows compare to long policy chains
4. Failure-Under-Load Workloads
These are not optional. A production engine must be tested while busy.
Scenarios:
- provider restart during active signal drain
- Oracle restart while delayed backlog exists
- dead-letter replay while new live signals continue to arrive
- duplicate signal storm against the same waiting instance set
- one worker repeatedly failing while another healthy worker continues
- scheduled backlog plus external-signal backlog mixed together
Use them to answer:
- whether recovery stays bounded
- whether backlog drain remains monotonic
- whether duplicate-delivery protections still hold under pressure
- whether DLQ replay can safely coexist with live traffic
Test Tiers
Performance testing should not be a single bucket.
Tier 1: PR Smoke
Purpose:
- catch catastrophic regressions quickly
Characteristics:
- small datasets
- short run time
- deterministic scenarios
- hard pass/fail envelopes
Recommended scope:
- one AQ immediate burst
- one AQ delayed backlog burst
- one synthetic signal-resume scenario
- one short Bulstrad business flow
Target duration:
- under 5 minutes total
Gating style:
- zero correctness failures
- no DLQ unless explicitly expected
- coarse latency ceilings only
Tier 2: Nightly Characterization
Purpose:
- measure trends and detect meaningful performance regression
Characteristics:
- moderate dataset
- multiple concurrency levels
- metrics persisted as artifacts
Recommended scope:
- full Oracle transport matrix
- synthetic engine workloads at 1, 4, 8, and 16-way concurrency
- 3-5 representative Bulstrad families
- restart and DLQ replay under moderate backlog
Target duration:
- 15 to 45 minutes
Gating style:
- correctness failures fail the run
- latency/throughput compare against baseline with tolerance
Tier 3: Weekly Soak
Purpose:
- detect leaks, drift, and long-tail timing issues
Characteristics:
- long-running mixed workload
- periodic restarts or controlled faults
- queue depth and runtime-state stability tracking
Recommended scope:
- 30 to 120 minute mixed load
- immediate, delayed, and replay traffic mixed together
- repeated provider restarts
- one Oracle restart in the middle of the run
Gating style:
- no unbounded backlog growth
- no stuck instances
- no memory growth trend outside a defined envelope
Tier 4: Explicit Capacity And Breakpoint Runs
Purpose:
- learn real limits before production sizing decisions
Characteristics:
- not part of normal CI
- intentionally pushes throughput until latency or failure thresholds break
Recommended scope:
- ramp concurrency upward until queue lag or DB pressure exceeds target
- test one-node and multi-node configurations
- record saturation points, not just pass/fail
Deliverable:
- capacity report with recommended node counts and operational envelopes
Scenario Matrix
The initial scenario matrix should look like this.
Oracle AQ Transport
- immediate burst: 100, 500, 1000 messages
- delayed burst: 50, 100, 250 messages due in same second
- mixed burst: 70 percent immediate, 30 percent delayed
- redelivery burst: 25 messages rolled back once then committed
- DLQ burst: 25 poison messages then replay
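The mixed burst should be generated deterministically so reruns are comparable. This sketch only builds the message list; the field names are illustrative, and the real harness would enqueue these via AQ:

```python
import random

def mixed_burst(total: int = 1000, immediate_ratio: float = 0.7,
                max_delay_s: int = 30, seed: int = 1) -> list[dict]:
    """Build an immediate/delayed message mix for one transport burst.

    A seeded RNG keeps the mix reproducible across reruns of the scenario.
    """
    rng = random.Random(seed)
    msgs = []
    for i in range(total):
        if rng.random() < immediate_ratio:
            msgs.append({"id": i, "delay_s": 0})
        else:
            msgs.append({"id": i, "delay_s": rng.randint(1, max_delay_s)})
    return msgs
```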
Synthetic Engine
- start-to-task: 50, 200, 500 workflow starts
- task-complete-to-next-task: 50, 200 completions
- signal-wait-resume: 50, 200, 500 waiting instances resumed concurrently
- timer-wait-resume: 50, 200 due timers
- subworkflow chain: 25, 100 parent-child chains
Bulstrad Business
- short business flow: QuoteOrAplCancel
- medium transport flow: InsisIntegrationNew
- child-workflow flow: QuotationConfirm
- long review chain: OpenForChangePolicy
- print flow: AssistantPrintInsisDocuments
- cancellation flow: AnnexCancellation
Failure Under Load
- 100 waiting instances, provider restart during drain
- 100 delayed messages, Oracle restart before due time
- 50 poison signals plus live replay traffic
- duplicate external signal storm against 50 waiting instances
- mixed task completions and signal resumes on same service instance set
Concurrency Steps
Use explicit concurrency ladders instead of one arbitrary load value.
Recommended first ladder:
- 1
- 4
- 8
- 16
- 32
Use different ladders if the environment is too small, but always record:
- node count
- worker concurrency
- queue backlog size
- workflow count
- message mix
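A minimal ladder driver, assuming a hypothetical `run_scenario` callable that executes one scenario at a given concurrency and returns its metrics; the context fields above are attached to every rung's result:

```python
def run_ladder(run_scenario, levels=(1, 4, 8, 16, 32), **context) -> list[dict]:
    """Run one scenario per ladder rung, tagging each result with the run
    context (node count, backlog size, workflow count, message mix, ...)."""
    results = []
    for concurrency in levels:
        metrics = run_scenario(concurrency=concurrency)
        results.append({"concurrency": concurrency, **context, **metrics})
    return results
```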
Metrics Collection Design
The harness should persist results for every performance run.
Each result set should include:
- scenario name
- git commit or working tree marker
- test timestamp
- environment label
- node count
- concurrency level
- workflow count
- signal count
- Oracle queue names used
- measured latency summary
- throughput summary
- correctness summary
- process resource summary
- optional Oracle observations
Recommended output format:
- JSON artifact for machines
- short markdown summary for humans
Recommended location:
TestResults/workflow-performance/
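A sketch of the artifact writer; the file-naming pattern and field layout are assumptions, not a fixed schema:

```python
import json
import pathlib
from datetime import datetime, timezone

def write_artifact(result: dict,
                   out_dir: str = "TestResults/workflow-performance") -> pathlib.Path:
    """Persist one run's result set as a JSON artifact under the recommended
    location. Timestamped names keep successive runs side by side."""
    path = pathlib.Path(out_dir)
    path.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    file = path / f"{result['scenario']}-{stamp}.json"
    file.write_text(json.dumps(result, indent=2, sort_keys=True))
    return file
```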
Baseline Strategy
Do not hard-code aggressive latency thresholds before collecting stable data.
Use this sequence:
- characterization phase: run each scenario several times on local Docker and CI Oracle
- baseline phase: record stable p50, p95, p99, throughput, and drain-rate envelopes
- gating phase: add coarse PR thresholds and tighter nightly regression detection
PR thresholds should be:
- intentionally forgiving
- correctness-first
- designed to catch major regressions only
Nightly thresholds should be:
- baseline-relative
- environment-specific if necessary
- reviewed whenever Oracle container images or CI hardware changes
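Baseline-relative nightly gating can be as simple as a relative-tolerance compare. This sketch assumes higher-is-worse metrics such as latencies; throughput metrics would invert the comparison:

```python
def regressed_metrics(current: dict, baseline: dict,
                      tolerance: float = 0.25) -> list[str]:
    """Return names of higher-is-worse metrics whose current value exceeds
    baseline * (1 + tolerance). Missing metrics count as regressions."""
    return [name for name, base in baseline.items()
            if current.get(name, float("inf")) > base * (1 + tolerance)]
```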
Harness Design
The load harness should be separate from the normal fast integration suite.
Recommended structure:
- keep correctness-focused Oracle AQ tests in the current integration project
- add categorized performance tests with explicit categories such as:
  - WorkflowPerfLatency
  - WorkflowPerfThroughput
  - WorkflowPerfSmoke
  - WorkflowPerfNightly
  - WorkflowPerfSoak
  - WorkflowPerfCapacity
- keep scenario builders reusable so the same workflow/transports can be used in correctness and performance runs
The harness should include:
- scenario driver
- result collector
- metric aggregator
- optional Oracle observation collector
- artifact writer
- explicit phase-latency capture for start, signal publish, and signal-to-completion on the synthetic signal round-trip workload
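The phase-latency capture can be a simple monotonic-clock mark/delta helper; the phase names used here are illustrative:

```python
import time

class PhaseTimer:
    """Record millisecond marks for named phases of one synthetic round trip
    (e.g. start, signal publish, completion) and report deltas between them."""

    def __init__(self):
        self._t0 = time.perf_counter()
        self.marks_ms: dict[str, float] = {}

    def mark(self, phase: str) -> None:
        self.marks_ms[phase] = (time.perf_counter() - self._t0) * 1000.0

    def deltas_ms(self) -> dict:
        # consecutive pairs in mark order become phase-to-phase latencies
        names = list(self.marks_ms)
        return {f"{a}->{b}": self.marks_ms[b] - self.marks_ms[a]
                for a, b in zip(names, names[1:])}
```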
Multi-Backend Expansion Rules
Once Oracle is the validated reference baseline, PostgreSQL and MongoDB must adopt the same load and performance structure instead of inventing backend-specific suites first.
Required rules:
- keep one shared scenario catalog for Oracle, PostgreSQL, and MongoDB
- compare backends first on normalized workflow metrics, not backend-native counters
- keep backend-native metrics as appendices, not as the headline result
- use the same tier names and artifact schema across all backends
- keep the same curated Bulstrad workload pack across all backends unless a workflow is backend-blocked by a real functional defect
The shared artifact set should ultimately include:
- 10-oracle-performance-baseline-<date>.md/.json
- 11-postgres-performance-baseline-<date>.md/.json
- 12-mongo-performance-baseline-<date>.md/.json
- 13-backend-comparison-<date>.md/.json
The shared normalized metrics are:
- serial end-to-end latency
- start-to-first-task latency
- signal-publish-to-visible-resume latency
- steady-state throughput
- capacity ladder at c1, c4, c8, and c16
- backlog drain time
- failures
- dead letters
- runtime conflicts
- stuck instances
Backend-native appendices should include:
- Oracle:
  - AQ browse depth
  - V$SYSSTAT deltas
  - V$SYS_TIME_MODEL deltas
  - top wait deltas
- PostgreSQL:
  - queue-table depth
  - pg_stat_database
  - pg_stat_statements
  - lock and wait observations
  - WAL pressure observations
- MongoDB:
  - signal collection depth
  - serverStatus counters
  - transaction counters
  - change-stream wake observations
  - lock percentage observations
Oracle-Specific Observation Plan
For Oracle-backed runs, observe both the engine and the database.
At minimum, record:
- AQ browse depth before, during, and after the run
- count of runtime-state rows touched
- count of task and task-event rows created
- number of dead-lettered signals
- duplicate/stale resume ignore count
If the environment allows deeper Oracle access, also record:
- session count for the service user
- top wait classes during the run
- lock waits on workflow tables
- statement time for key mutation queries
Exit Criteria
The load/performance work is complete when:
- PR smoke scenarios are stable and cheap enough to run continuously
- nightly characterization produces persisted metrics and useful regression signal
- at least one weekly soak run is stable without correctness drift
- representative Bulstrad families have measured latency and throughput envelopes
- Oracle restart, provider restart, DLQ replay, and duplicate-delivery scenarios are all characterized under load
- the team can state a first production sizing recommendation for one node and multi-node deployment
Next Sprint Shape
This plan maps naturally to a dedicated sprint focused on:
- performance harness infrastructure
- synthetic scenario library
- representative Bulstrad workload runner
- metrics artifact generation
- baseline capture
- first capacity report