Add StellaOps.Workflow engine: 14 libraries, WebService, 8 test projects

Extract product-agnostic workflow engine from Ablera.Serdica.Workflow into
standalone StellaOps.Workflow.* libraries targeting net10.0.

Libraries (14):
- Contracts, Abstractions (compiler, decompiler, expression runtime)
- Engine (execution, signaling, scheduling, projections, hosted services)
- ElkSharp (generic graph layout algorithm)
- Renderer.ElkSharp, Renderer.ElkJs, Renderer.Msagl, Renderer.Svg
- Signaling.Redis, Signaling.OracleAq
- DataStore.MongoDB, DataStore.PostgreSQL, DataStore.Oracle

WebService: ASP.NET Core Minimal API with 22 endpoints

Tests (8 projects, 109 tests pass):
- Engine.Tests (105 pass), WebService.Tests (4 E2E pass)
- Renderer.Tests, DataStore.MongoDB/Oracle/PostgreSQL.Tests
- Signaling.Redis.Tests, IntegrationTests.Shared

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contained in branch: master
Commit f5b5f24d95 (parent e56f9a114a), 2026-03-20 19:14:44 +02:00
422 changed files with 85428 additions and 0 deletions

# 08. Load And Performance Plan
## Purpose
This document defines how the Serdica workflow engine should be load-tested, performance-characterized, and capacity-sized once functional parity is in place.
The goal is not only to prove that the engine is correct under load, but to answer these product and platform questions:
- how many workflow starts, task completions, and signal resumes can one node sustain
- how quickly does backlog drain after restart or outage
- how much timing variance is normal for Oracle AQ on local Docker, CI, and shared environments
- which workloads are Oracle-bound, AQ-bound, or engine-bound
- which scenarios are safe to gate in PR and which belong in nightly or explicit soak runs
## Principles
The performance plan follows these rules:
- correctness comes first; a fast but lossy engine result is a failed run
- performance tests must be split by intent: smoke, characterization, stress, soak, and failure-under-load
- transport-only tests and full workflow tests must both exist; they answer different questions
- synthetic workflows are required for stable measurement
- representative Bulstrad workflows are required for product confidence
- PR gates should use coarse, stable envelopes
- nightly and explicit runs should record and compare detailed metrics
- Oracle and AQ behavior must be measured directly, not inferred from app logs alone
## What Must Be Measured
### Correctness Under Load
Every load run should capture:
- total workflows started
- total tasks activated
- total tasks completed
- total signals published
- total signals processed
- total signals ignored as stale or duplicate
- total dead-lettered signals
- total runtime concurrency conflicts
- total failed runs
- total stuck instances at end of run
Correctness invariants:
- no lost committed signal
- no duplicate open task for the same logical wait
- no orphan subworkflow frame
- no runtime state row left without a valid explainable wait reason
- no queue backlog remaining after a successful drain phase unless the scenario intentionally leaves poison messages in DLQ
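The invariants above can be checked mechanically at the end of every run. A minimal sketch (Python here for illustration; the real harness is .NET, and all counter names are assumptions, not the engine's actual telemetry fields):

```python
from dataclasses import dataclass

@dataclass
class RunCounters:
    """Counters collected at the end of a load run (names are illustrative)."""
    signals_published: int
    signals_processed: int
    signals_ignored_stale: int
    signals_dead_lettered: int
    duplicate_open_tasks: int
    orphan_subworkflow_frames: int
    unexplained_waits: int
    residual_backlog: int
    expected_poison: int  # poison messages the scenario intentionally leaves in DLQ

def check_invariants(c: RunCounters) -> list[str]:
    """Return a list of violated invariants; an empty list means the run passed."""
    violations = []
    # Every committed signal must be accounted for: processed, ignored as stale, or dead-lettered.
    if c.signals_processed + c.signals_ignored_stale + c.signals_dead_lettered < c.signals_published:
        violations.append("lost committed signal")
    if c.duplicate_open_tasks > 0:
        violations.append("duplicate open task for the same logical wait")
    if c.orphan_subworkflow_frames > 0:
        violations.append("orphan subworkflow frame")
    if c.unexplained_waits > 0:
        violations.append("runtime state row without explainable wait reason")
    # Backlog may remain only when the scenario deliberately left poison messages.
    if c.residual_backlog > c.expected_poison:
        violations.append("queue backlog remaining after drain phase")
    return violations
```

Wiring these checks into every tier means a "fast but lossy" run fails immediately, regardless of how good its latency numbers look.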
### Latency
The engine should measure at least:
- start-to-first-task latency
- start-to-completion latency
- task-complete-to-next-task latency
- signal-publish-to-task-visible latency
- timer-due-to-resume latency
- delayed-message lateness relative to requested due time
- backlog-drain completion time
- restart-to-first-processed-signal time
These should be recorded as:
- average
- p50
- p95
- p99
- max
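Each latency series can be reduced to this summary with a simple nearest-rank percentile. A sketch (Python for illustration only):

```python
import math

def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Summarize one latency series as average, p50, p95, p99, and max."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)

    def percentile(p: float) -> float:
        # Nearest-rank: smallest value with at least p% of samples at or below it.
        rank = math.ceil(p / 100 * len(ordered))
        return ordered[rank - 1]

    return {
        "avg": sum(ordered) / len(ordered),
        "p50": percentile(50),
        "p95": percentile(95),
        "p99": percentile(99),
        "max": ordered[-1],
    }
```

Nearest-rank (rather than interpolated) percentiles are a deliberate choice for gating: they are deterministic and never report a latency that no sample actually exhibited.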
### Throughput
The engine should measure:
- workflows started per second
- task completions per second
- signals published per second
- signals processed per second
- backlog drain rate in signals per second
- completed end-to-end business workflows per minute
### Saturation
The engine should measure:
- app process CPU
- app process private memory and working set
- Oracle container CPU and memory when running locally
- queue depth over time
- active waiting instances over time
- dead-letter depth over time
- runtime state update conflicts over time
- open task count over time
### Oracle-Side Signals
If the environment permits access, also collect:
- AQ queue depth before, during, and after load
- queue-table growth during sustained runs
- visible dequeue lag
- Oracle session count for the test service
- lock or wait spikes on workflow tables
- transaction duration for mutation transactions
If the environment does not permit these views, fall back to:
- app-side timing
- browse counts from AQ
- workflow table row counts
- signal pump telemetry snapshots
## Workload Model
The load plan should be split into four workload families.
### 1. Transport Microbenchmarks
These isolate Oracle AQ behavior from workflow logic.
Use them to answer:
- how fast can AQ accept immediate messages
- how fast can AQ release delayed messages
- what is the drain rate for mixed backlogs
- how much delayed-message jitter is normal
Core scenarios:
- burst immediate enqueue and drain
- burst delayed enqueue with same due second
- mixed immediate and delayed enqueue on one queue
- dequeue rollback redelivery under sustained load
- dead-letter and replay backlog
- delayed backlog surviving Oracle restart
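The burst-and-drain scenarios share one shape: time the enqueue phase, then time a full drain and verify nothing was lost. A sketch of that driver, with an in-memory queue standing in for the AQ transport (the real harness would pass AQ enqueue/dequeue callables):

```python
import time
from collections import deque

def burst_drain_benchmark(enqueue, dequeue, count: int) -> dict[str, float]:
    """Time a burst enqueue followed by a full drain.
    `enqueue`/`dequeue` are transport callables; `dequeue` returns None when empty."""
    t0 = time.perf_counter()
    for i in range(count):
        enqueue(f"msg-{i}")
    t1 = time.perf_counter()
    drained = 0
    while dequeue() is not None:
        drained += 1
    t2 = time.perf_counter()
    # Correctness first: the drain must account for every enqueued message.
    assert drained == count, "drain lost messages"
    return {
        "enqueue_rate_per_s": count / (t1 - t0),
        "drain_rate_per_s": count / (t2 - t1),
    }

# In-memory stand-in for the AQ transport, for demonstration only.
q: deque = deque()
result = burst_drain_benchmark(q.append, lambda: q.popleft() if q else None, 1000)
```

The same driver covers immediate, delayed, and mixed bursts by varying what the `enqueue` callable does; only the transport binding changes between microbenchmark scenarios.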
### 2. Synthetic Engine Workloads
These isolate the runtime from business-specific transport noise.
Recommended synthetic workflow types:
- start-to-complete with no task
- start-to-task with one human task
- signal-wait then task activation
- timer-wait then task activation
- continue-with dispatcher chain
- parent-child subworkflow chain
Use them to answer:
- raw start throughput
- raw resume throughput
- timer-due drain rate
- subworkflow coordination cost
- task activation/update cost
### 3. Representative Bulstrad Workloads
These prove that realistic product workflows behave well under load.
The first performance wave should use workflows that are already functionally covered in the Oracle suite:
- `AssistantPrintInsisDocuments`
- `OpenForChangePolicy`
- `ReviewPolicyOpenForChange`
- `AssistantAddAnnex`
- `AnnexCancellation`
- `AssistantPolicyCancellation`
- `AssistantPolicyReinstate`
- `InsisIntegrationNew`
- `QuotationConfirm`
- `QuoteOrAplCancel`
Use them to answer:
- how the engine behaves with realistic transport payload shaping
- how nested child workflows affect latency
- how multi-step review chains behave during backlog drain
- how short utility flows compare to long policy chains
### 4. Failure-Under-Load Workloads
These are not optional. A production engine must be tested while busy.
Scenarios:
- provider restart during active signal drain
- Oracle restart while delayed backlog exists
- dead-letter replay while new live signals continue to arrive
- duplicate signal storm against the same waiting instance set
- one worker repeatedly failing while another healthy worker continues
- scheduled backlog plus external-signal backlog mixed together
Use them to answer:
- whether recovery stays bounded
- whether backlog drain remains monotonic
- whether duplicate-delivery protections still hold under pressure
- whether DLQ replay can safely coexist with live traffic
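The "backlog drain remains monotonic" question can be answered from periodic queue-depth samples. A sketch (Python for illustration; the tolerance parameter is an assumption to absorb transient redelivery bumps):

```python
def drain_is_monotonic(depth_samples: list[int], tolerance: int = 0) -> bool:
    """Check that queue depth never rises during a drain phase.
    `tolerance` allows small transient bumps, e.g. rollback redeliveries."""
    return all(b <= a + tolerance for a, b in zip(depth_samples, depth_samples[1:]))
```

A failure-under-load run would sample depth every few seconds during recovery and fail if this check reports a sustained reversal, which distinguishes "slow but recovering" from "falling behind".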
## Test Tiers
Performance testing should not be a single bucket.
### Tier 1: PR Smoke
Purpose:
- catch catastrophic regressions quickly
Characteristics:
- small datasets
- short run time
- deterministic scenarios
- hard pass/fail envelopes
Recommended scope:
- one AQ immediate burst
- one AQ delayed backlog burst
- one synthetic signal-resume scenario
- one short Bulstrad business flow
Target duration:
- under 5 minutes total
Gating style:
- zero correctness failures
- no DLQ unless explicitly expected
- coarse latency ceilings only
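The Tier 1 gating style translates into a small pass/fail function: correctness always gates, latency only has a coarse ceiling. A sketch (metric key names are illustrative assumptions):

```python
def smoke_gate(metrics: dict, *, max_p95_ms: float, allow_dlq: bool = False) -> list[str]:
    """Coarse PR gate: correctness failures always fail the run;
    latency gets only a forgiving ceiling. Returns the list of gate failures."""
    failures = []
    if metrics["correctness_failures"] > 0:
        failures.append("correctness failure")
    if metrics["dead_letters"] > 0 and not allow_dlq:
        failures.append("unexpected dead letters")
    if metrics["p95_ms"] > max_p95_ms:
        failures.append(f"p95 {metrics['p95_ms']}ms exceeds ceiling {max_p95_ms}ms")
    return failures
```

Keeping the ceiling deliberately high (several multiples of the expected p95) is what keeps this gate stable across noisy CI hardware while still catching catastrophic regressions.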
### Tier 2: Nightly Characterization
Purpose:
- measure trends and detect meaningful performance regression
Characteristics:
- moderate dataset
- multiple concurrency levels
- metrics persisted as artifacts
Recommended scope:
- full Oracle transport matrix
- synthetic engine workloads at 1, 4, 8, and 16-way concurrency
- 3-5 representative Bulstrad families
- restart and DLQ replay under moderate backlog
Target duration:
- 15 to 45 minutes
Gating style:
- correctness failures fail the run
- latency/throughput compare against baseline with tolerance
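Baseline-relative comparison with tolerance can be sketched as follows (the `_per_s` naming convention for throughput metrics is an assumption; any marker distinguishing "higher is better" from "lower is better" works):

```python
def compare_to_baseline(current: dict[str, float], baseline: dict[str, float],
                        tolerance: float = 0.15) -> list[str]:
    """Flag metrics that regressed beyond `tolerance` (fraction of baseline).
    Latency regresses upward; throughput (metrics named *_per_s) regresses downward."""
    regressions = []
    for name, base in baseline.items():
        cur = current[name]
        if name.endswith("_per_s"):  # throughput: lower is worse
            if cur < base * (1 - tolerance):
                regressions.append(f"{name}: {cur:.1f} vs baseline {base:.1f}")
        else:  # latency: higher is worse
            if cur > base * (1 + tolerance):
                regressions.append(f"{name}: {cur:.1f} vs baseline {base:.1f}")
    return regressions
```

The tolerance band is what makes this usable nightly: small environment noise passes, sustained drift beyond the band produces a reviewable regression list rather than a binary failure.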
### Tier 3: Weekly Soak
Purpose:
- detect leaks, drift, and long-tail timing issues
Characteristics:
- long-running mixed workload
- periodic restarts or controlled faults
- queue depth and runtime-state stability tracking
Recommended scope:
- 30 to 120 minute mixed load
- immediate, delayed, and replay traffic mixed together
- repeated provider restarts
- one Oracle restart in the middle of the run
Gating style:
- no unbounded backlog growth
- no stuck instances
- no memory growth trend outside a defined envelope
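"Memory growth trend outside a defined envelope" implies fitting a trend line to periodic memory samples rather than comparing first and last values, which are both noisy. A least-squares sketch:

```python
def memory_growth_slope(samples_mb: list[float], interval_s: float) -> float:
    """Least-squares slope of process memory over a soak run, in MB per hour.
    Samples are taken at a fixed interval of `interval_s` seconds."""
    n = len(samples_mb)
    xs = [i * interval_s for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(samples_mb) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples_mb))
    den = sum((x - mean_x) ** 2 for x in xs)
    return (num / den) * 3600.0  # MB/s -> MB/h
```

The soak gate would then assert that the fitted slope stays under an agreed envelope (for example a few MB per hour), which catches slow leaks that a single before/after comparison on a sawtooth GC profile would miss.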
### Tier 4: Explicit Capacity And Breakpoint Runs
Purpose:
- learn real limits before production sizing decisions
Characteristics:
- not part of normal CI
- intentionally pushes throughput until latency or failure thresholds break
Recommended scope:
- ramp concurrency upward until queue lag or DB pressure exceeds target
- test one-node and multi-node configurations
- record saturation points, not just pass/fail
Deliverable:
- capacity report with recommended node counts and operational envelopes
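The ramp-until-break loop for Tier 4 can be sketched as follows; `run_at` is a placeholder for whatever executes one load step and reports its p95:

```python
def find_breakpoint(run_at, levels: list[int], p95_ceiling_ms: float):
    """Ramp concurrency upward until the p95 ceiling breaks.
    `run_at(level)` executes one load step and returns its p95 latency in ms.
    Returns (last sustainable level or None, all observations including the break)."""
    observations = []
    sustainable = None
    for level in levels:
        p95 = run_at(level)
        observations.append((level, p95))
        if p95 > p95_ceiling_ms:
            break  # record the saturation point, then stop ramping
        sustainable = level
    return sustainable, observations
```

Note that the observation list deliberately includes the failing rung: the capacity report needs the saturation point itself, not just the last passing level.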
## Scenario Matrix
The initial scenario matrix should look like this.
### Oracle AQ Transport
- immediate burst: 100, 500, 1000 messages
- delayed burst: 50, 100, 250 messages due in same second
- mixed burst: 70 percent immediate, 30 percent delayed
- redelivery burst: 25 messages rolled back once then committed
- DLQ burst: 25 poison messages then replay
### Synthetic Engine
- start-to-task: 50, 200, 500 workflow starts
- task-complete-to-next-task: 50, 200 completions
- signal-wait-resume: 50, 200, 500 waiting instances resumed concurrently
- timer-wait-resume: 50, 200 due timers
- subworkflow chain: 25, 100 parent-child chains
### Bulstrad Business
- short business flow: `QuoteOrAplCancel`
- medium transport flow: `InsisIntegrationNew`
- child-workflow flow: `QuotationConfirm`
- long review chain: `OpenForChangePolicy`
- print flow: `AssistantPrintInsisDocuments`
- cancellation flow: `AnnexCancellation`
### Failure Under Load
- 100 waiting instances, provider restart during drain
- 100 delayed messages, Oracle restart before due time
- 50 poison signals plus live replay traffic
- duplicate external signal storm against 50 waiting instances
- mixed task completions and signal resumes on same service instance set
## Concurrency Steps
Use explicit concurrency ladders instead of one arbitrary load value.
Recommended first ladder:
- 1
- 4
- 8
- 16
- 32
Use different ladders if the environment is too small, but always record:
- node count
- worker concurrency
- queue backlog size
- workflow count
- message mix
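A ladder runner that stamps every rung's result with the required context might look like this sketch (all parameter names are illustrative):

```python
def run_ladder(run_step, rungs=(1, 4, 8, 16, 32), **context):
    """Run each rung of a concurrency ladder and attach the environment
    context (node count, backlog size, message mix, ...) to every result.
    `run_step(concurrency)` returns a dict of measured metrics for one rung."""
    results = []
    for concurrency in rungs:
        metrics = run_step(concurrency)
        results.append({"concurrency": concurrency, **context, **metrics})
    return results
```

Recording the context on every rung, rather than once per run, keeps individual results self-describing when they are later merged across environments or compared against baselines.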
## Metrics Collection Design
The harness should persist results for every performance run.
Each result set should include:
- scenario name
- git commit or working tree marker
- test timestamp
- environment label
- node count
- concurrency level
- workflow count
- signal count
- Oracle queue names used
- measured latency summary
- throughput summary
- correctness summary
- process resource summary
- optional Oracle observations
Recommended output format:
- JSON artifact for machines
- short markdown summary for humans
Recommended location:
- `TestResults/workflow-performance/`
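The JSON-plus-markdown artifact pair can be produced by one small writer. A sketch (field names and file naming are assumptions, not a fixed schema):

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def write_perf_artifacts(result: dict, out_dir: str = "TestResults/workflow-performance") -> Path:
    """Persist one run as a JSON artifact (for machines) plus a short
    markdown summary (for humans). Returns the common path stem."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    base = out / f"{result['scenario']}-{stamp}"
    base.with_suffix(".json").write_text(json.dumps(result, indent=2))
    summary = (
        f"# {result['scenario']}\n\n"
        "| metric | value |\n|---|---|\n"
        + "\n".join(f"| {k} | {v} |" for k, v in result.items() if k != "scenario")
    )
    base.with_suffix(".md").write_text(summary)
    return base
```

Writing both formats from the same dict guarantees the human summary can never drift from the machine-readable record that nightly comparisons consume.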
## Baseline Strategy
Do not hard-code aggressive latency thresholds before collecting stable data.
Use this sequence:
1. Characterization phase: run each scenario several times on local Docker and CI Oracle.
2. Baseline phase: record stable p50, p95, p99, throughput, and drain-rate envelopes.
3. Gating phase: add coarse PR thresholds and tighter nightly regression detection.
PR thresholds should be:
- intentionally forgiving
- correctness-first
- designed to catch major regressions only
Nightly thresholds should be:
- baseline-relative
- environment-specific if necessary
- reviewed whenever Oracle container images or CI hardware changes
## Harness Design
The load harness should be separate from the normal fast integration suite.
Recommended structure:
- keep correctness-focused Oracle AQ tests in the current integration project
- add categorized performance tests with explicit categories such as:
- `WorkflowPerfLatency`
- `WorkflowPerfThroughput`
- `WorkflowPerfSmoke`
- `WorkflowPerfNightly`
- `WorkflowPerfSoak`
- `WorkflowPerfCapacity`
- keep scenario builders reusable so the same workflow/transports can be used in correctness and performance runs
The harness should include:
- scenario driver
- result collector
- metric aggregator
- optional Oracle observation collector
- artifact writer
- explicit phase-latency capture for start, signal publish, and signal-to-completion on the synthetic signal round-trip workload
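The phase-latency capture in the last bullet amounts to recording named timestamps around engine calls and deriving spans from them. A sketch (Python for illustration; in the real .NET harness this would wrap the synthetic signal round-trip workload):

```python
import time

class PhaseClock:
    """Capture named per-phase timestamps for one workflow instance,
    e.g. start, signal publish, and completion on the signal round-trip workload."""
    def __init__(self):
        self.marks: dict[str, float] = {}

    def mark(self, phase: str) -> None:
        self.marks[phase] = time.perf_counter()

    def span_ms(self, start: str, end: str) -> float:
        """Elapsed milliseconds between two previously recorded phases."""
        return (self.marks[end] - self.marks[start]) * 1000.0
```

Deriving spans from stored marks, rather than timing each span directly, lets one run report every pairwise latency (start-to-publish, publish-to-completion, start-to-completion) from a single set of timestamps.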
## Multi-Backend Expansion Rules
Once Oracle is the validated reference baseline, PostgreSQL and MongoDB must adopt the same load and performance structure instead of inventing backend-specific suites first.
Required rules:
- keep one shared scenario catalog for Oracle, PostgreSQL, and MongoDB
- compare backends first on normalized workflow metrics, not backend-native counters
- keep backend-native metrics as appendices, not as the headline result
- use the same tier names and artifact schema across all backends
- keep the same curated Bulstrad workload pack across all backends unless a workflow is backend-blocked by a real functional defect
The shared artifact set should ultimately include:
- `10-oracle-performance-baseline-<date>.md/.json`
- `11-postgres-performance-baseline-<date>.md/.json`
- `12-mongo-performance-baseline-<date>.md/.json`
- `13-backend-comparison-<date>.md/.json`
The shared normalized metrics are:
- serial end-to-end latency
- start-to-first-task latency
- signal-publish-to-visible-resume latency
- steady-state throughput
- capacity ladder at `c1`, `c4`, `c8`, and `c16`
- backlog drain time
- failures
- dead letters
- runtime conflicts
- stuck instances
Backend-native appendices should include:
- Oracle:
- AQ browse depth
- `V$SYSSTAT` deltas
- `V$SYS_TIME_MODEL` deltas
- top wait deltas
- PostgreSQL:
- queue-table depth
- `pg_stat_database`
- `pg_stat_statements`
- lock and wait observations
- WAL pressure observations
- MongoDB:
- signal collection depth
- `serverStatus` counters
- transaction counters
- change-stream wake observations
- lock percentage observations
## Oracle-Specific Observation Plan
For Oracle-backed runs, observe both the engine and the database.
At minimum, record:
- AQ browse depth before, during, and after the run
- count of runtime-state rows touched
- count of task and task-event rows created
- number of dead-lettered signals
- duplicate/stale resume ignore count
If the environment allows deeper Oracle access, also record:
- session count for the service user
- top wait classes during the run
- lock waits on workflow tables
- statement time for key mutation queries
## Exit Criteria
The load/performance work is complete when:
- PR smoke scenarios are stable and cheap enough to run continuously
- nightly characterization produces persisted metrics and useful regression signal
- at least one weekly soak run is stable without correctness drift
- representative Bulstrad families have measured latency and throughput envelopes
- Oracle restart, provider restart, DLQ replay, and duplicate-delivery scenarios are all characterized under load
- the team can state a first production sizing recommendation for one node and multi-node deployment
## Next Sprint Shape
This plan maps naturally to a dedicated sprint focused on:
- performance harness infrastructure
- synthetic scenario library
- representative Bulstrad workload runner
- metrics artifact generation
- baseline capture
- first capacity report