# 08. Load And Performance Plan

## Purpose

This document defines how the Serdica workflow engine should be load-tested, performance-characterized, and capacity-sized once functional parity is in place.

The goal is not only to prove that the engine is correct under load, but to answer these product and platform questions:

- how many workflow starts, task completions, and signal resumes can one node sustain
- how quickly does backlog drain after restart or outage
- how much timing variance is normal for Oracle AQ on local Docker, CI, and shared environments
- which workloads are Oracle-bound, AQ-bound, or engine-bound
- which scenarios are safe to gate in PR and which belong in nightly or explicit soak runs

## Principles

The performance plan follows these rules:

- correctness comes first; a fast but lossy engine result is a failed run
- performance tests must be split by intent: smoke, characterization, stress, soak, and failure-under-load
- transport-only tests and full workflow tests must both exist; they answer different questions
- synthetic workflows are required for stable measurement
- representative Bulstrad workflows are required for product confidence
- PR gates should use coarse, stable envelopes
- nightly and explicit runs should record and compare detailed metrics
- Oracle and AQ behavior must be measured directly, not inferred from app logs alone

## What Must Be Measured

### Correctness Under Load

Every load run should capture:

- total workflows started
- total tasks activated
- total tasks completed
- total signals published
- total signals processed
- total signals ignored as stale or duplicate
- total dead-lettered signals
- total runtime concurrency conflicts
- total failed runs
- total stuck instances at end of run

Correctness invariants:

- no lost committed signal
- no duplicate open task for the same logical wait
- no orphan subworkflow frame
- no runtime state row left without a valid, explainable wait reason
- no queue backlog remaining after a successful drain phase unless the scenario intentionally leaves poison messages in the DLQ

### Latency

The engine should measure at least:

- start-to-first-task latency
- start-to-completion latency
- task-complete-to-next-task latency
- signal-publish-to-task-visible latency
- timer-due-to-resume latency
- delayed-message lateness relative to requested due time
- backlog-drain completion time
- restart-to-first-processed-signal time

These should be recorded as:

- average
- p50
- p95
- p99
- max

### Throughput

The engine should measure:

- workflows started per second
- task completions per second
- signals published per second
- signals processed per second
- backlog drain rate in signals per second
- completed end-to-end business workflows per minute

### Saturation

The engine should measure:

- app process CPU
- app process private memory and working set
- Oracle container CPU and memory when running locally
- queue depth over time
- active waiting instances over time
- dead-letter depth over time
- runtime state update conflicts over time
- open task count over time

### Oracle-Side Signals

If the environment permits access, also collect:

- AQ queue depth before, during, and after load
- queue-table growth during sustained runs
- visible dequeue lag
- Oracle session count for the test service
- lock or wait spikes on workflow tables
- transaction duration for mutation transactions

If the environment does not permit these views, fall back to:

- app-side timing
- browse counts from AQ
- workflow table row counts
- signal pump telemetry snapshots

## Workload Model

The load plan should be split into four workload families.

### 1. Transport Microbenchmarks

These isolate Oracle AQ behavior from workflow logic.
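As a rough illustration, one of these microbenchmarks (burst enqueue followed by drain) can be sketched transport-agnostically. The `enqueue` and `dequeue` callables below are assumed stand-ins for the real AQ operations, not part of any existing harness:

```python
import time

def burst_and_drain(enqueue, dequeue, count):
    """Enqueue `count` messages as fast as possible, then drain the queue.

    `enqueue(payload)` sends one message; `dequeue()` returns one message
    or None when the queue is empty. Returns enqueue throughput and drain
    rate in messages per second.
    """
    t0 = time.perf_counter()
    for i in range(count):
        enqueue(i)
    t1 = time.perf_counter()

    drained = 0
    while dequeue() is not None:
        drained += 1
    t2 = time.perf_counter()

    return {
        "enqueued": count,
        "drained": drained,
        "enqueue_per_sec": count / max(t1 - t0, 1e-9),
        "drain_per_sec": drained / max(t2 - t1, 1e-9),
    }
```

Wiring the two callables to the real AQ enqueue/dequeue calls keeps the measurement loop identical across immediate, delayed, and mixed bursts.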
Use them to answer:

- how fast can AQ accept immediate messages
- how fast can AQ release delayed messages
- what is the drain rate for mixed backlogs
- how much delayed-message jitter is normal

Core scenarios:

- burst immediate enqueue and drain
- burst delayed enqueue with the same due second
- mixed immediate and delayed enqueue on one queue
- dequeue rollback redelivery under sustained load
- dead-letter and replay backlog
- delayed backlog surviving Oracle restart

### 2. Synthetic Engine Workloads

These isolate the runtime from business-specific transport noise.

Recommended synthetic workflow types:

- start-to-complete with no task
- start-to-task with one human task
- signal-wait then task activation
- timer-wait then task activation
- continue-with dispatcher chain
- parent-child subworkflow chain

Use them to answer:

- raw start throughput
- raw resume throughput
- timer-due drain rate
- subworkflow coordination cost
- task activation/update cost

### 3. Representative Bulstrad Workloads

These prove that realistic product workflows behave well under load.

The first performance wave should use workflows that are already functionally covered in the Oracle suite:

- `AssistantPrintInsisDocuments`
- `OpenForChangePolicy`
- `ReviewPolicyOpenForChange`
- `AssistantAddAnnex`
- `AnnexCancellation`
- `AssistantPolicyCancellation`
- `AssistantPolicyReinstate`
- `InsisIntegrationNew`
- `QuotationConfirm`
- `QuoteOrAplCancel`

Use them to answer:

- how the engine behaves with realistic transport payload shaping
- how nested child workflows affect latency
- how multi-step review chains behave during backlog drain
- how short utility flows compare to long policy chains

### 4. Failure-Under-Load Workloads

These are not optional. A production engine must be tested while busy.
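A check worth automating for these runs is that backlog drain stays bounded and ends empty even while faults are injected. A minimal sketch over periodic queue-depth samples follows; the `slack` tolerance for transient growth is an assumption:

```python
def drain_is_bounded(depth_samples, slack=0):
    """Check drain behavior from periodic queue-depth samples.

    Passes when depth never rises more than `slack` above the lowest
    depth seen so far (bounded transient growth only) and the final
    sample is zero (fully drained). `depth_samples` must be non-empty.
    """
    low = depth_samples[0]
    for depth in depth_samples:
        if depth > low + slack:
            return False          # backlog regrew beyond the allowed slack
        low = min(low, depth)
    return depth_samples[-1] == 0  # drain must finish empty
```

With `slack=0` this enforces strictly non-increasing depth; a small positive slack tolerates the brief regrowth expected around a provider or Oracle restart.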
Scenarios:

- provider restart during active signal drain
- Oracle restart while a delayed backlog exists
- dead-letter replay while new live signals continue to arrive
- duplicate signal storm against the same waiting instance set
- one worker repeatedly failing while another healthy worker continues
- scheduled backlog plus external-signal backlog mixed together

Use them to answer:

- whether recovery stays bounded
- whether backlog drain remains monotonic
- whether duplicate-delivery protections still hold under pressure
- whether DLQ replay can safely coexist with live traffic

## Test Tiers

Performance testing should not be a single bucket.

### Tier 1: PR Smoke

Purpose:

- catch catastrophic regressions quickly

Characteristics:

- small datasets
- short run time
- deterministic scenarios
- hard pass/fail envelopes

Recommended scope:

- one AQ immediate burst
- one AQ delayed backlog burst
- one synthetic signal-resume scenario
- one short Bulstrad business flow

Target duration:

- under 5 minutes total

Gating style:

- zero correctness failures
- no DLQ entries unless explicitly expected
- coarse latency ceilings only

### Tier 2: Nightly Characterization

Purpose:

- measure trends and detect meaningful performance regression

Characteristics:

- moderate dataset
- multiple concurrency levels
- metrics persisted as artifacts

Recommended scope:

- full Oracle transport matrix
- synthetic engine workloads at 1, 4, 8, and 16-way concurrency
- 3-5 representative Bulstrad families
- restart and DLQ replay under moderate backlog

Target duration:

- 15 to 45 minutes

Gating style:

- correctness failures fail the run
- latency/throughput compared against baseline with tolerance

### Tier 3: Weekly Soak

Purpose:

- detect leaks, drift, and long-tail timing issues

Characteristics:

- long-running mixed workload
- periodic restarts or controlled faults
- queue depth and runtime-state stability tracking

Recommended scope:

- 30 to 120 minutes of mixed load
- immediate, delayed, and replay traffic mixed together
- repeated provider restarts
- one Oracle restart in the middle of the run

Gating style:

- no unbounded backlog growth
- no stuck instances
- no memory growth trend outside a defined envelope

### Tier 4: Explicit Capacity And Breakpoint Runs

Purpose:

- learn real limits before production sizing decisions

Characteristics:

- not part of normal CI
- intentionally pushes throughput until latency or failure thresholds break

Recommended scope:

- ramp concurrency upward until queue lag or DB pressure exceeds target
- test one-node and multi-node configurations
- record saturation points, not just pass/fail

Deliverable:

- capacity report with recommended node counts and operational envelopes

## Scenario Matrix

The initial scenario matrix should look like this.

### Oracle AQ Transport

- immediate burst: 100, 500, 1000 messages
- delayed burst: 50, 100, 250 messages due in the same second
- mixed burst: 70 percent immediate, 30 percent delayed
- redelivery burst: 25 messages rolled back once then committed
- DLQ burst: 25 poison messages then replay

### Synthetic Engine

- start-to-task: 50, 200, 500 workflow starts
- task-complete-to-next-task: 50, 200 completions
- signal-wait-resume: 50, 200, 500 waiting instances resumed concurrently
- timer-wait-resume: 50, 200 due timers
- subworkflow chain: 25, 100 parent-child chains

### Bulstrad Business

- short business flow: `QuoteOrAplCancel`
- medium transport flow: `InsisIntegrationNew`
- child-workflow flow: `QuotationConfirm`
- long review chain: `OpenForChangePolicy`
- print flow: `AssistantPrintInsisDocuments`
- cancellation flow: `AnnexCancellation`

### Failure Under Load

- 100 waiting instances, provider restart during drain
- 100 delayed messages, Oracle restart before due time
- 50 poison signals plus live replay traffic
- duplicate external signal storm against 50 waiting instances
- mixed task completions and signal resumes on the same service instance set

## Concurrency Steps

Use explicit concurrency ladders instead of one arbitrary load value.

Recommended first ladder:

- 1
- 4
- 8
- 16
- 32

Use different ladders if the environment is too small, but always record:

- node count
- worker concurrency
- queue backlog size
- workflow count
- message mix

## Metrics Collection Design

The harness should persist results for every performance run.

Each result set should include:

- scenario name
- git commit or working tree marker
- test timestamp
- environment label
- node count
- concurrency level
- workflow count
- signal count
- Oracle queue names used
- measured latency summary
- throughput summary
- correctness summary
- process resource summary
- optional Oracle observations

Recommended output format:

- JSON artifact for machines
- short markdown summary for humans

Recommended location:

- `TestResults/workflow-performance/`

## Baseline Strategy

Do not hard-code aggressive latency thresholds before collecting stable data.

Use this sequence:

1. characterization phase: run each scenario several times on local Docker and CI Oracle
2. baseline phase: record stable p50, p95, p99, throughput, and drain-rate envelopes
3. gating phase: add coarse PR thresholds and tighter nightly regression detection

PR thresholds should be:

- intentionally forgiving
- correctness-first
- designed to catch major regressions only

Nightly thresholds should be:

- baseline-relative
- environment-specific if necessary
- reviewed whenever Oracle container images or CI hardware changes

## Harness Design

The load harness should be separate from the normal fast integration suite.
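One piece the harness will need is the baseline-relative comparison used in the gating phase above. A minimal sketch follows; the metric-name convention (a `_ms` suffix marking latency metrics) and the default 20 percent tolerance are assumptions, not settled conventions:

```python
def check_against_baseline(current, baseline, tolerance=0.20):
    """Compare a run's metric summary against a recorded baseline.

    Both arguments map metric name -> value. Metrics ending in `_ms`
    are latencies (higher is worse); all others are throughputs
    (lower is worse). Returns the names of regressed metrics; an
    empty list means the run passes.
    """
    regressions = []
    for name, base in baseline.items():
        cur = current[name]
        if name.endswith("_ms"):
            # latency regression: grew beyond the allowed tolerance
            if cur > base * (1 + tolerance):
                regressions.append(name)
        else:
            # throughput regression: shrank beyond the allowed tolerance
            if cur < base * (1 - tolerance):
                regressions.append(name)
    return regressions
```

Keeping the tolerance a parameter lets PR gates run the same comparison with a much looser value than nightly runs.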
Recommended structure:

- keep correctness-focused Oracle AQ tests in the current integration project
- add categorized performance tests with explicit categories such as:
  - `WorkflowPerfLatency`
  - `WorkflowPerfThroughput`
  - `WorkflowPerfSmoke`
  - `WorkflowPerfNightly`
  - `WorkflowPerfSoak`
  - `WorkflowPerfCapacity`
- keep scenario builders reusable so the same workflows/transports can be used in correctness and performance runs

The harness should include:

- scenario driver
- result collector
- metric aggregator
- optional Oracle observation collector
- artifact writer
- explicit phase-latency capture for start, signal publish, and signal-to-completion on the synthetic signal round-trip workload

## Multi-Backend Expansion Rules

Once Oracle is the validated reference baseline, PostgreSQL and MongoDB must adopt the same load and performance structure instead of inventing backend-specific suites first.

Required rules:

- keep one shared scenario catalog for Oracle, PostgreSQL, and MongoDB
- compare backends first on normalized workflow metrics, not backend-native counters
- keep backend-native metrics as appendices, not as the headline result
- use the same tier names and artifact schema across all backends
- keep the same curated Bulstrad workload pack across all backends unless a workflow is backend-blocked by a real functional defect

The shared artifact set should ultimately include:

- `10-oracle-performance-baseline-.md/.json`
- `11-postgres-performance-baseline-.md/.json`
- `12-mongo-performance-baseline-.md/.json`
- `13-backend-comparison-.md/.json`

The shared normalized metrics are:

- serial end-to-end latency
- start-to-first-task latency
- signal-publish-to-visible-resume latency
- steady-state throughput
- capacity ladder at `c1`, `c4`, `c8`, and `c16`
- backlog drain time
- failures
- dead letters
- runtime conflicts
- stuck instances

Backend-native appendices should include:

- Oracle:
  - AQ browse depth
  - `V$SYSSTAT` deltas
  - `V$SYS_TIME_MODEL` deltas
  - top wait deltas
- PostgreSQL:
  - queue-table depth
  - `pg_stat_database`
  - `pg_stat_statements`
  - lock and wait observations
  - WAL pressure observations
- MongoDB:
  - signal collection depth
  - `serverStatus` counters
  - transaction counters
  - change-stream wake observations
  - lock percentage observations

## Oracle-Specific Observation Plan

For Oracle-backed runs, observe both the engine and the database.

At minimum, record:

- AQ browse depth before, during, and after the run
- count of runtime-state rows touched
- count of task and task-event rows created
- number of dead-lettered signals
- duplicate/stale resume ignore count

If the environment allows deeper Oracle access, also record:

- session count for the service user
- top wait classes during the run
- lock waits on workflow tables
- statement time for key mutation queries

## Exit Criteria

The load/performance work is complete when:

- PR smoke scenarios are stable and cheap enough to run continuously
- nightly characterization produces persisted metrics and a useful regression signal
- at least one weekly soak run is stable without correctness drift
- representative Bulstrad families have measured latency and throughput envelopes
- Oracle restart, provider restart, DLQ replay, and duplicate-delivery scenarios are all characterized under load
- the team can state a first production sizing recommendation for one-node and multi-node deployments

## Next Sprint Shape

This plan maps naturally to a dedicated sprint focused on:

- performance harness infrastructure
- synthetic scenario library
- representative Bulstrad workload runner
- metrics artifact generation
- baseline capture
- first capacity report
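Finally, as a concrete illustration of the shared metric shape, the latency summaries used throughout this plan (average, p50, p95, p99, max) can come from one helper reused by every backend so the normalized numbers stay comparable. A sketch using nearest-rank percentiles:

```python
def latency_summary(samples_ms):
    """Summarize latency samples (milliseconds) into the percentile
    shape used across all performance artifacts. `samples_ms` must be
    a non-empty sequence of numbers."""
    ordered = sorted(samples_ms)
    n = len(ordered)

    def pct(p):
        # nearest-rank percentile: smallest sample with at least p% below or equal
        rank = int(p / 100 * n + 0.5)
        return ordered[min(n - 1, max(0, rank - 1))]

    return {
        "avg": sum(ordered) / n,
        "p50": pct(50),
        "p95": pct(95),
        "p99": pct(99),
        "max": ordered[-1],
    }
```

Nearest-rank is deliberately simple and reproducible; if interpolated percentiles are preferred, the harness can swap the `pct` helper without changing the artifact schema.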