Add StellaOps.Workflow engine: 14 libraries, WebService, 8 test projects
Extract product-agnostic workflow engine from Ablera.Serdica.Workflow into standalone StellaOps.Workflow.* libraries targeting net10.0.

Libraries (14):
- Contracts, Abstractions (compiler, decompiler, expression runtime)
- Engine (execution, signaling, scheduling, projections, hosted services)
- ElkSharp (generic graph layout algorithm)
- Renderer.ElkSharp, Renderer.ElkJs, Renderer.Msagl, Renderer.Svg
- Signaling.Redis, Signaling.OracleAq
- DataStore.MongoDB, DataStore.PostgreSQL, DataStore.Oracle

WebService: ASP.NET Core Minimal API with 22 endpoints

Tests (8 projects, 109 tests pass):
- Engine.Tests (105 pass), WebService.Tests (4 E2E pass)
- Renderer.Tests, DataStore.MongoDB/Oracle/PostgreSQL.Tests
- Signaling.Redis.Tests, IntegrationTests.Shared

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
docs/workflow/ENGINE.md — new file, 1023 lines (diff suppressed because it is too large)

docs/workflow/engine/01-requirements-and-principles.md — new file, 291 lines
# 01. Requirements And Principles

## 1. Product Goal

Build a Serdica-owned workflow engine that can run the current Bulstrad workflow corpus without Elsa while preserving the existing service-level workflow product:

- workflow start
- task inbox and task lifecycle
- business-reference-based lookup
- runtime state inspection
- workflow diagrams
- canonical schema and canonical validation exposure
- workflow retention and hosted jobs

The engine must execute the same business behavior currently expressed in the declarative workflow DSL and canonical workflow definition model.
## 2. Functional Requirements

### 2.1 Workflow Definition Handling

The engine must:

- discover workflow registrations from authored C# workflow classes
- resolve the latest or exact workflow version through the existing registration catalog
- compile authored declarative workflows into canonical runtime definitions
- keep canonical validation as a first-class platform capability
- reject invalid or unsupported definitions during startup or validation

### 2.2 Workflow Start

The engine must:

- bind the untyped start payload to the workflow start request type
- resolve or derive business reference data
- initialize canonical workflow state
- execute the initial sequence until a wait boundary or completion
- create workflow projections and runtime state in one durable flow
- support workflow continuations created during start

### 2.3 Human Tasks

The engine must:

- activate human tasks with:
  - task type
  - route
  - workflow roles
  - task roles
  - runtime roles
  - payload
  - business reference
- preserve the current task assignment model:
  - assign to self
  - assign to user
  - assign to runtime roles
  - release
- expose completed and active task history through the existing projection model

### 2.4 Task Completion

The engine must:

- load the current workflow state and task context
- authorize completion through the existing service layer
- apply the completion payload
- continue execution from the task completion entry point
- produce next tasks, next waits, next continuations, or completion
- update runtime state and read projections durably

### 2.5 Runtime Semantics

The engine must support the semantic surface already present in declarative workflows:

- state assignment
- business reference assignment
- human task activation
- microservice calls
- legacy rabbit calls
- GraphQL calls
- HTTP calls
- conditional branches
- decision branches
- repeat loops
- subworkflow invocation
- continue-with orchestration
- timeout branches
- failure branches
- function-backed expressions

### 2.6 Subworkflows

The engine must:

- start child workflows
- persist parent resume frames
- carry child output back into parent state
- support nested resume across multiple levels
- preserve current declarative subworkflow semantics

### 2.7 Scheduling

The engine must support:

- timeouts
- retry wake-ups
- delayed continuation
- explicit wait-until behavior

This must happen without a steady-state polling loop.
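One way to satisfy these requirements without polling is to map every timed wait onto a delayed queue message. A minimal sketch, assuming a hypothetical `IScheduleBus` abstraction over Oracle AQ's delayed delivery — the type and member names here are illustrative, not the actual API:

```csharp
// Hypothetical sketch: schedule a timeout as a delayed AQ message.
// WorkflowSignal and IScheduleBus are illustrative names, not the real contracts.
public sealed record WorkflowSignal(
    Guid WorkflowInstanceId,
    Guid WaitingToken,          // guards against stale wake-ups
    string Reason);             // "Timeout", "Retry", "WaitUntil", ...

public interface IScheduleBus
{
    // Enqueue with a delivery delay; the queue holds the message until it is due.
    Task EnqueueAsync(WorkflowSignal signal, TimeSpan delay, CancellationToken ct);
}

public static class TimeoutScheduling
{
    public static Task ScheduleTimeoutAsync(
        IScheduleBus bus, Guid instanceId, Guid waitingToken,
        TimeSpan timeout, CancellationToken ct)
        => bus.EnqueueAsync(
            new WorkflowSignal(instanceId, waitingToken, "Timeout"), timeout, ct);
}
```

Because the delay lives in the queue rather than in a worker, no node has to stay awake waiting for the timer to elapse.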
### 2.8 Inspection And Operations

The service must continue to expose:

- workflow definitions
- workflow instances
- workflow tasks
- workflow task events
- workflow diagrams
- runtime state snapshots
- canonical schema
- canonical validation
## 3. Non-Functional Requirements

### 3.1 Multi-Instance Deployment

The service must support multiple application nodes against one shared Oracle database.

Implications:

- no single-node assumptions
- no in-memory-only correctness logic
- no sticky workflow ownership
- duplicate signal delivery must be safe

### 3.2 Durability

The system of record must be durable across:

- process restart
- node restart
- full cluster restart
- database restart

Workflow progress, pending waits, active tasks, and due timers must not be lost.
### 3.3 No Polling

Signal-driven wake-up is mandatory.

The engine must not rely on a periodic database scan loop to discover work. Blocking or event-driven delivery is required for:

- task completion wake-up
- delayed resume wake-up
- subworkflow completion wake-up
- external signal wake-up
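A blocking receive loop over the signal bus is enough to satisfy this. A minimal sketch, assuming a hypothetical `ISignalBus` whose `ReceiveAsync` blocks (via AQ's blocking dequeue) until a message arrives — all names are illustrative assumptions:

```csharp
// Hypothetical sketch of an event-driven wake-up loop (no periodic DB scan).
// ISignalBus and WorkflowSignal are illustrative names, not the actual API.
public sealed record WorkflowSignal(Guid WorkflowInstanceId, Guid WaitingToken);

public interface ISignalBus
{
    Task<WorkflowSignal> ReceiveAsync(CancellationToken ct);          // blocks on dequeue
    Task CompleteAsync(WorkflowSignal signal, CancellationToken ct);  // acknowledge
    Task AbandonAsync(WorkflowSignal signal, CancellationToken ct);   // redeliver
}

public sealed class SignalPumpLoop
{
    private readonly ISignalBus _bus;
    private readonly Func<WorkflowSignal, CancellationToken, Task> _dispatch;

    public SignalPumpLoop(ISignalBus bus,
        Func<WorkflowSignal, CancellationToken, Task> dispatch)
        => (_bus, _dispatch) = (bus, dispatch);

    public async Task RunAsync(CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            // Blocks until a signal is delivered; there is no polling interval.
            var signal = await _bus.ReceiveAsync(ct);
            try
            {
                await _dispatch(signal, ct);
                await _bus.CompleteAsync(signal, ct);
            }
            catch (Exception)
            {
                await _bus.AbandonAsync(signal, ct);  // dead-letter after max attempts
            }
        }
    }
}
```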
### 3.4 One Database

Oracle is the shared durable state backend for:

- workflow projections
- workflow runtime snapshots
- host coordination
- signal and schedule durability through Oracle AQ

Redis may exist in the wider platform, but it is not required for engine correctness.

### 3.5 Observability

The engine must produce enough telemetry to answer:

- what instance is waiting
- why it is waiting
- which signal resumed it
- which node executed it
- which definition version it used
- why it failed
- whether a message was retried, dead-lettered, or ignored as stale

### 3.6 Compatibility

The engine must preserve the existing public workflow service contracts unless a future product change explicitly changes them.

The following service-contract groups are especially important:

- workflow start contracts
- workflow definition contracts
- workflow task contracts
- workflow instance contracts
- workflow operational contracts

## 4. Explicit V1 Assumptions

These assumptions simplify the engine architecture and are intentional.

### 4.1 Single Active Runtime Provider Per Deployment

The service runs one engine provider at a time.

This means:

- no mixed-provider instance routing
- no live migration between engines
- no simultaneous old-runtime and engine execution inside one deployment

The design still keeps abstractions around the runtime, signaling bus, and scheduler so that future replacement remains possible.

### 4.2 Canonical Runtime, Not Elsa Activity Runtime

The target engine executes canonical workflow definitions directly.

Authored C# remains the source of truth, but runtime semantics are driven by canonical definitions compiled from that source.

### 4.3 Oracle AQ Is The Default Event Backbone

Oracle AQ is treated as part of the durable engine platform because it satisfies:

- one-database architecture
- blocking dequeue
- durable delivery
- delayed delivery
- transactional behavior
## 5. Design Principles

### 5.1 Keep The Product Surface Stable

The workflow service remains the product boundary. The engine is an internal subsystem.

### 5.2 Separate Read Model From Runtime Model

Task and instance projections are optimized for product reads.

Runtime snapshots are optimized for deterministic resume.

They are related, but they are not the same data structure.

### 5.3 Run To Wait

The engine should never keep a workflow instance “hot” in memory for correctness.

Execution should run until:

- a task is activated
- a timer is scheduled
- an external signal wait is registered
- the workflow completes

Then the snapshot is persisted and released.
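The run-to-wait contract can be expressed as a short control loop. A minimal sketch — the store and interpreter interfaces are illustrative assumptions, not the actual engine API:

```csharp
// Hypothetical sketch of the run-to-wait execution model.
// ISnapshotStore and ICanonicalInterpreter are assumed, illustrative abstractions.
public async Task ExecuteSliceAsync(Guid instanceId, ISnapshotStore store,
    ICanonicalInterpreter interpreter, CancellationToken ct)
{
    // 1. Load the authoritative snapshot from the database.
    var snapshot = await store.LoadAsync(instanceId, ct);

    // 2. Run canonical steps until a durable wait boundary or completion.
    var result = await interpreter.RunToWaitAsync(snapshot, ct);

    // 3. Persist the new snapshot atomically; the expected-version check
    //    prevents lost updates when another node raced this execution.
    await store.SaveAsync(result.Snapshot, expectedVersion: snapshot.Version, ct);

    // 4. Release the instance: nothing stays "hot" in memory for correctness.
}
```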
### 5.4 Make Delivery At-Least-Once And Resume Idempotent

Distributed delivery is never exactly-once in practice.

The engine must treat duplicate signals, duplicate wake-ups, and late timer arrivals as normal conditions.

### 5.5 Keep Signals Small

Signals should identify work, not carry the full workflow state.

The database snapshot remains authoritative.

### 5.6 Keep Abstractions At The Backend Boundary

Abstract:

- runtime provider
- signal bus
- schedule bus
- snapshot store

Do not abstract away the workflow semantics themselves.

### 5.7 Prefer Transactional Consistency Over Cleverness

If a feature can be made transactional in Oracle, prefer that over eventually-consistent coordination tricks.

## 6. Success Criteria

The engine architecture is successful when:

- the service can start and complete workflows without Elsa
- task projections remain correct
- delayed resumes happen without polling
- a stopped cluster resumes safely after restart
- a multi-node deployment does not corrupt workflow state
- canonical definitions remain the execution contract
- operations can inspect and support the system with existing product-level APIs
docs/workflow/engine/02-runtime-and-component-architecture.md — new file, 397 lines
# 02. Runtime And Component Architecture

## 1. Top-Level System View

At the highest level, the service contains six product-facing areas:

1. definition and canonical catalog
2. start and task APIs
3. engine execution runtime
4. durable state and read projections
5. signaling and scheduling
6. operational services

The engine replaces the Elsa-dependent runtime area, not the whole product.

## 2. Top-Level Components

### 2.1 API Layer

Responsibilities:

- expose workflow endpoints
- validate user input
- call `WorkflowRuntimeService`
- preserve current contract shape

Examples in the current service:

- workflow start endpoint
- task get/list endpoints
- task assign/release/complete endpoints
- instance get/list endpoints
- canonical schema and validation endpoints

### 2.2 Product Orchestration Layer

Responsibilities:

- resolve workflow registration and definition
- enforce service-level flow for start and task completion
- update read projections
- call runtime provider
- persist runtime snapshot metadata
- start continuations

The current workflow runtime service remains the product orchestrator in v1.

### 2.3 Runtime Provider Layer

Responsibilities:

- provide a stable execution interface
- hide the concrete runtime implementation
- allow a future backend swap without changing service-level behavior

Proposed abstraction:
```csharp
public interface IWorkflowRuntimeProvider
{
    string ProviderName { get; }

    Task<WorkflowRuntimeExecutionResult> StartAsync(
        WorkflowRegistration registration,
        WorkflowDefinitionDescriptor definition,
        WorkflowBusinessReference? businessReference,
        StartWorkflowRequest request,
        object startRequest,
        CancellationToken cancellationToken = default);

    Task<WorkflowRuntimeExecutionResult> CompleteAsync(
        WorkflowRegistration registration,
        WorkflowDefinitionDescriptor definition,
        WorkflowTaskExecutionContext context,
        CancellationToken cancellationToken = default);
}
```
In v1, one provider is active per deployment:

- `SerdicaEngineRuntimeProvider`

The abstraction still exists so the backend can change later.

### 2.4 Canonical Execution Layer

Responsibilities:

- execute canonical definitions
- evaluate expressions
- drive state transitions
- activate tasks
- invoke transports
- persist wait state
- emit signals and schedules

This is the actual engine kernel.

### 2.5 Persistence Layer

Responsibilities:

- store runtime snapshots
- store instance projections
- store task projections
- store task events
- coordinate host-owned jobs and workers

The current baseline uses one workflow database model plus one projection application service for product-facing reads.

### 2.6 Signal And Schedule Layer

Responsibilities:

- deliver immediate wake-up signals
- deliver delayed wake-up signals
- support blocking receive
- support durable retry and dead-letter handling

Default backend:

- Oracle AQ

### 2.7 Operational Layer

Responsibilities:

- retention
- dead-letter handling
- metrics
- tracing
- runtime diagnostics
- workflow diagram projection

## 3. Mid-Level Runtime Structure

The engine should be decomposed into the following internal runtime components.

### 3.1 Definition Normalizer

Purpose:

- take authored workflow registrations
- compile them into canonical runtime definitions
- validate the definitions
- cache them for execution

Responsibilities:

- call canonical compiler
- call canonical validator
- fail startup when configured to require valid definitions
- expose resolved runtime definitions by workflow name/version

### 3.2 Execution Coordinator

Purpose:

- provide the single in-process entry point for runtime execution

Responsibilities:

- load current snapshot
- acquire execution right through version check or row lock
- invoke interpreter
- collect engine side effects
- persist snapshot changes
- update projections
- enqueue signals or schedules
- commit transaction
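These responsibilities amount to one transactional pipeline. A minimal sketch under assumed member names (`_unitOfWork`, `_snapshots`, `_interpreter`, `_projections`, `_signalBus` are illustrative, not the actual implementation):

```csharp
// Hypothetical sketch of the execution coordinator's transactional pipeline.
// All field and interface names are illustrative assumptions.
public async Task HandleAsync(WorkflowSignal signal, CancellationToken ct)
{
    await using var uow = await _unitOfWork.BeginAsync(ct);       // one Oracle transaction

    var snapshot = await _snapshots.LoadAsync(signal.WorkflowInstanceId, ct);

    // Acquire the execution right: a stale signal fails the token/version check.
    if (snapshot.WaitingToken != signal.WaitingToken)
        return;                                                   // stale; ignore safely

    var effects = await _interpreter.RunToWaitAsync(snapshot, ct);

    await _snapshots.SaveAsync(effects.Snapshot, snapshot.Version, ct);
    await _projections.ApplyAsync(effects.ProjectionChanges, ct);
    await _signalBus.EnqueueAsync(effects.NextSignals, ct);

    await uow.CommitAsync(ct);                                    // all-or-nothing commit
}
```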
### 3.3 Canonical Interpreter

Purpose:

- interpret canonical steps until the next wait boundary

Responsibilities:

- evaluate canonical expressions
- handle step sequencing
- handle branching and repeat loops
- activate human tasks
- invoke transport adapters
- enter wait states
- resume from wait states
- manage subworkflow frames

### 3.4 Expression Runtime

Purpose:

- evaluate canonical expressions consistently across runtime and validation expectations

Responsibilities:

- use core function catalog
- use plugin function catalog
- evaluate against the canonical execution context

Current design baseline:

- one canonical expression runtime
- one core function catalog
- zero or more plugin-provided function catalogs

### 3.5 Transport Dispatcher

Purpose:

- execute transport-backed steps through Serdica transport abstractions

Responsibilities:

- resolve transport type
- call the correct adapter
- normalize responses to canonical result objects
- route failure and timeout behavior back into the interpreter

### 3.6 Task Activation Writer

Purpose:

- convert a runtime task activation result into projection rows

Responsibilities:

- create task rows
- create task-created events
- preserve business reference and role semantics

### 3.7 Signal Pump

Purpose:

- block on AQ dequeue
- dispatch envelopes to the execution coordinator

Responsibilities:

- receive signal envelope
- process with bounded concurrency
- complete or abandon transactionally
- dead-letter poison signals

### 3.8 Scheduler Adapter

Purpose:

- translate runtime waits into AQ delayed messages

Responsibilities:

- enqueue due signals with delay
- cancel logically through waiting tokens
- ignore stale delayed messages safely
## 4. Detailed Component Responsibilities

### 4.1 WorkflowRuntimeService

This service remains the product boundary for runtime actions.

It should continue to own:

- start request binding
- business reference resolution
- task authorization integration
- projection updates
- runtime snapshot persistence
- continuation dispatch

It should stop owning:

- engine-specific step execution logic
- engine-specific scheduling details
- engine-specific signal handling

### 4.2 SerdicaEngineRuntimeProvider

This provider becomes the main bridge between product orchestration and the runtime kernel.

It should:

- normalize the requested workflow into a canonical runtime definition
- create an execution request
- call the execution coordinator
- map engine execution results into `WorkflowRuntimeExecutionResult`

It should not:

- update read projections directly
- own task authorization
- know about HTTP endpoint contracts

### 4.3 WorkflowProjectionStore

This store remains the read model writer.

It should continue to own:

- `WF_INSTANCES`
- `WF_TASKS`
- `WF_TASK_EVENTS`

It should not become the engine snapshot store.

### 4.4 Runtime Snapshot Store

This store owns the authoritative engine snapshot.

It should:

- read current runtime state
- write runtime state atomically
- enforce optimistic concurrency or explicit version progression
- store waiting metadata
- store provider state

It may evolve from the current `IWorkflowRuntimeStateStore`.

### 4.5 AQ Signal Bus

This adapter owns durable wake-up delivery.

It should:

- publish immediate signals
- publish delayed signals
- receive with blocking dequeue
- expose complete/abandon semantics

It should not:

- understand workflow business logic
- mutate projections
- deserialize full workflow snapshots
## 5. Runtime Request Flows

### 5.1 Start Workflow

1. API receives `StartWorkflowRequest`.
2. `WorkflowRuntimeService` resolves registration and definition.
3. The typed request is bound from payload.
4. Business reference is resolved.
5. `SerdicaEngineRuntimeProvider.StartAsync` is called.
6. The provider resolves the canonical runtime definition.
7. The execution coordinator creates a new snapshot and runs the interpreter.
8. The interpreter runs until:
   - a task is activated
   - a timer wait is registered
   - an external wait is registered
   - the workflow completes
9. The coordinator persists runtime snapshot changes.
10. `WorkflowRuntimeService` writes projections and runtime metadata.
11. Continuations are started if present.
### 5.2 Complete Task

1. API receives `WorkflowTaskCompleteRequest`.
2. `WorkflowRuntimeService` loads snapshot and task projection.
3. Authorization is checked.
4. The runtime provider is called with:
   - task context
   - workflow state
   - completion payload
5. The execution coordinator advances the canonical definition from the task completion entry point.
6. It persists the new runtime snapshot and engine wait state.
7. `WorkflowRuntimeService` applies task completion and creates new task rows if needed.
### 5.3 External Or Scheduled Signal

1. AQ signal pump dequeues a signal.
2. The signal is deserialized to a workflow signal envelope.
3. The execution coordinator loads the current snapshot.
4. The coordinator verifies:
   - workflow instance exists
   - waiting token matches
   - version is compatible
5. The interpreter resumes from the stored resume point.
6. The transaction commits snapshot changes, projection changes, and any next signals.
## 6. Why This Structure Fits The Current Service

The current service already separates:

- product orchestration
- execution abstraction
- projections
- runtime state
- authorization

The new engine architecture uses that separation rather than fighting it.

That is the main reason the replacement can be implemented incrementally without redesigning the whole product.
docs/workflow/engine/03-canonical-execution-model.md — new file, 377 lines
# 03. Canonical Execution Model

## 1. Why The Engine Executes Canonical Definitions

The workflow corpus is now fully declarative and canonicalizable.

That changes the best runtime strategy:

- authored C# remains the source of truth
- canonical definition becomes the runtime execution contract
- the engine interprets canonical definitions directly

This gives the platform:

- deterministic runtime behavior
- shared semantics between export/import and execution
- less runtime coupling to workflow-specific CLR delegates
- a clean separation between authoring and execution
## 2. Definition Lifecycle

### 2.1 Authoring

Workflows are authored in C# through the declarative DSL.

### 2.2 Normalization

At service startup, each workflow registration is normalized into:

1. workflow registration metadata
2. canonical workflow definition
3. required module set
4. function usage metadata

### 2.3 Validation

The runtime should validate canonical definitions before accepting them for execution.

Recommended startup modes:

- `Strict`
  Startup fails if a definition is invalid.
- `Warn`
  Startup succeeds, but invalid definitions are marked unavailable.

### 2.4 Runtime Cache

The engine should cache canonical runtime definitions in memory by:

- workflow name
- workflow version

This cache is immutable after startup in v1.
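Such a cache can be sketched as a read-only dictionary keyed by name and version, built once during startup normalization. The type names here are illustrative assumptions, not the actual engine types:

```csharp
// Hypothetical sketch: immutable definition cache keyed by (name, version).
// CanonicalRuntimeDefinition is an illustrative type name.
public sealed class RuntimeDefinitionCache
{
    private readonly IReadOnlyDictionary<(string Name, int Version), CanonicalRuntimeDefinition> _byKey;

    // Built once at startup; a read-only dictionary makes later mutation impossible.
    public RuntimeDefinitionCache(IEnumerable<CanonicalRuntimeDefinition> definitions)
        => _byKey = definitions.ToDictionary(d => (d.Name, d.Version));

    public CanonicalRuntimeDefinition Resolve(string name, int version)
        => _byKey.TryGetValue((name, version), out var definition)
            ? definition
            : throw new InvalidOperationException(
                  $"Unknown workflow definition {name} v{version}.");
}
```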
## 3. Canonical Runtime Definition Shape

The runtime definition should be treated as a compiled, execution-ready representation of the canonical contracts, not a raw JSON document.

The runtime model should contain:

- definition identity
- display metadata
- required modules
- step graph
- task declarations
- expression trees
- transport declarations
- subworkflow declarations
- continue-with declarations
## 4. Execution Context Model

The interpreter should run every step against a single canonical execution context.

Recommended execution context fields:

- `WorkflowName`
- `WorkflowVersion`
- `WorkflowInstanceId`
- `BusinessReference`
- `State`
- `StartPayload`
- `CompletionPayload`
- `CurrentTask`
- `CurrentSignal`
- `FunctionRuntime`
- `TransportDispatcher`
- `RuntimeMetadata`

`RuntimeMetadata` should hold:

- node id
- current signal id
- snapshot version
- waiting token
- execution started at
## 5. Core Runtime State Model

The runtime must distinguish between:

- business state
- engine state

### 5.1 Business State

Business state is what the workflow author reasons about.

Examples:

- `srPolicyId`
- `policySubstatus`
- customer lookup state
- payload shaping outputs
- subworkflow results

### 5.2 Engine State

Engine state is what the runtime needs to resume correctly.

Examples:

- current workflow status
- current wait type
- current wait token
- active task identity
- resume pointer
- subworkflow frame stack
- outstanding timer descriptors
- last processed signal id

Business state must remain visible in runtime inspection.
Engine state must remain safe and deterministic for resume.
## 6. Run-To-Wait Execution Model

The engine uses a run-to-wait interpreter.

This means:

1. load snapshot
2. execute sequentially
3. stop when a durable wait boundary is reached
4. persist resulting snapshot
5. release instance

Wait boundaries are:

- human task activation
- scheduled timer
- external signal wait
- child workflow wait
- terminal completion

This model is essential for:

- multi-instance safety
- restart recovery
- no sticky ownership
- no in-memory correctness assumptions
## 7. Step Semantics

### 7.1 State Assignment

State assignment is immediate and local to the current execution transaction.

The engine:

- evaluates the assignment expression
- writes to the business state dictionary
- keeps changes in-memory until the next durable checkpoint

### 7.2 Business Reference Assignment

Business reference assignment updates the canonical business reference attached to:

- the runtime snapshot
- new tasks
- instance projection updates

Business reference changes must be applied transactionally with other execution results.

### 7.3 Human Task Activation

A human task activation step is a terminal wait boundary.

The interpreter does not continue past it in the same execution.

The result of task activation is:

- one active task projection
- updated instance status
- updated runtime snapshot
- optional runtime metadata for the active task

### 7.4 Transport Call

Transport calls are synchronous from the perspective of a single execution slice.

The engine:

- evaluates payload expressions
- dispatches through the correct transport adapter
- captures result payload
- stores result under the result key when present
- chooses the success, failure, or timeout branch

No engine-specific callback registration should be required for normal synchronous transport calls.

### 7.5 Conditional Branch

Conditions evaluate against the current execution context.

Only one branch is executed.

The branch path must be reproducible in the resume pointer model.

### 7.6 Repeat

Repeat executes logically as:

- evaluate collection or repeat source
- for each iteration:
  - bind iteration context
  - execute nested sequence

If an iteration hits a wait boundary, the engine snapshot must preserve:

- repeat step id
- iteration index
- remaining resume location inside the iteration body

### 7.7 Subworkflow Invocation

Subworkflow invocation is a wait boundary unless the child completes inline before producing a wait.

The parent snapshot must record:

- child workflow identity
- child workflow version
- parent business reference
- parent resume pointer
- target result key
- parent workflow state needed for resume

### 7.8 Continue-With

Continue-with creates a new workflow start request as an engine side effect.

It is not a resume boundary for the current instance unless explicitly modeled that way by the workflow.
## 8. Resume Model

### 8.1 Resume Pointer

The engine must persist a deterministic resume pointer.

It should identify:

- entry point kind
- task name if resuming from task completion
- branch path
- next step index
- repeat iteration where applicable

The existing declarative resume model is the right conceptual baseline, but the engine should persist it inside the canonical runtime snapshot rather than inside a CLR-only execution flow.

### 8.2 Waiting Token

Every durable wait must have a waiting token.

The waiting token is how the engine prevents stale resumes.

When a signal arrives, if the waiting token does not match the snapshot, the signal is stale and must be ignored safely.

This is the primary guard for:

- canceled timers
- duplicate wake-ups
- late child completions
- redelivered signals
### 8.3 Version
|
||||
|
||||
Every successful execution commit must increment snapshot version.
|
||||
|
||||
Signals may carry the expected version that created the wait.
|
||||
|
||||
This allows the engine to detect stale work before any mutation.
|
||||
|
||||
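The token and version guards above can be combined into one check that runs before any mutation. This is a minimal sketch, assuming a `WorkflowRuntimeSnapshot` type with `WaitingToken` and `StateVersion` members; those names are assumptions, not the shipped API.

```csharp
// Hypothetical stale-signal guard; type and member names are illustrative.
public static bool IsStale(
    WorkflowRuntimeSnapshot snapshot,
    string? signalWaitingToken,
    long signalExpectedVersion)
{
    // Token mismatch: the wait that produced this signal no longer exists
    // (canceled timer, reschedule, duplicate wake-up, late child completion).
    if (signalWaitingToken is not null &&
        !string.Equals(snapshot.WaitingToken, signalWaitingToken, StringComparison.Ordinal))
    {
        return true;
    }

    // Version moved on: another commit already consumed this wait.
    return snapshot.StateVersion != signalExpectedVersion;
}
```

A stale result means the signal is acknowledged and dropped; it is never an error path.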
## 9. Human Task Model

The task model remains projection-first.

The runtime does not wait on an in-memory task object.

Instead:

- task activation writes a task projection row
- runtime snapshot enters `WaitingForTaskCompletion`
- task completion API provides the wake-up event

Task completion is therefore an external signal into the engine.
## 10. Error Model

The interpreter should classify errors into:

- definition errors
- expression evaluation errors
- transport errors
- timeout errors
- authorization errors
- engine consistency errors

Definition errors are startup or validation failures.
Execution errors are runtime failures that may:

- route into a failure branch
- schedule a retry
- fail the workflow
- move the instance to a recoverable error state
## 11. Retry Model

Retries should be modeled explicitly as scheduled signals.

The engine should not sleep inside a worker.

A retry should:

1. persist the failure context
2. generate a new waiting token
3. enqueue a delayed resume signal
4. commit
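The four steps above can be sketched as one method. This is an illustrative sketch only: the store and bus members, `WorkflowRuntimeSnapshot`, and `WorkflowFailureContext` are assumptions, and in the real engine the final commit would be a single ambient transaction rather than two independent awaits.

```csharp
// Hypothetical retry-as-scheduled-signal flow; all names are assumptions.
public async Task ScheduleRetryAsync(
    WorkflowRuntimeSnapshot snapshot,
    WorkflowFailureContext failure,
    TimeSpan delay,
    CancellationToken ct)
{
    snapshot = snapshot with
    {
        LastError = failure,                       // 1. persist the failure context
        WaitingToken = Guid.NewGuid().ToString()   // 2. new token invalidates older wake-ups
    };

    var dueAtUtc = DateTime.UtcNow.Add(delay);
    var envelope = new WorkflowSignalEnvelope
    {
        SignalId = Guid.NewGuid().ToString(),
        WorkflowInstanceId = snapshot.WorkflowInstanceId,
        RuntimeProvider = "SerdicaEngine",
        SignalType = "RetryDue",
        ExpectedVersion = snapshot.StateVersion,
        WaitingToken = snapshot.WaitingToken,
        OccurredAtUtc = DateTime.UtcNow,
        DueAtUtc = dueAtUtc
    };

    await snapshotStore.TryUpsertAsync(snapshot, snapshot.StateVersion, ct);
    await scheduleBus.ScheduleAsync(envelope, dueAtUtc, ct); // 3. delayed resume signal
    // 4. commit: both writes share one transaction in the real engine
}
```

Note that the worker returns immediately after the commit; the delayed signal, not a sleeping thread, drives the retry.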
## 12. Completion Model

A workflow completes when the interpreter reaches terminal completion with no outstanding waits.

Completion result must:

- mark instance projection completed
- mark runtime state completed
- clear stale timeout metadata
- apply retention timing

## 13. Determinism Requirements

The runtime must assume:

- expressions are deterministic given the execution context
- transport calls are side effects and must be treated explicitly
- no hidden CLR delegate behavior remains in workflow definitions

The runtime should not rely on:

- non-deterministic local time calls inside step execution
- in-memory mutable workflow objects
- ambient state outside the canonical execution context
## 14. Resulting Implementation Shape

The engine kernel should be implemented as:

- definition normalizer
- canonical interpreter
- transport dispatcher
- execution coordinator
- resume serializer/deserializer

This produces a runtime that is small, explicit, and aligned with the already-completed full-declaration effort.
403
docs/workflow/engine/04-persistence-signaling-and-scheduling.md
Normal file
@@ -0,0 +1,403 @@

# 04. Persistence, Signaling, And Scheduling

## 1. Persistence Strategy

Oracle is the single durable backend for v1.

It stores:

- workflow instance projections
- workflow task projections
- workflow task events
- workflow runtime snapshots
- hosted job locks
- AQ queues for immediate and delayed signals

This keeps correctness inside one transactional platform.
## 2. Existing Tables To Preserve

The current workflow schema already has the right base tables:

- `WF_INSTANCES`
- `WF_TASKS`
- `WF_TASK_EVENTS`
- `WF_RUNTIME_STATES`
- `WF_HOST_LOCKS`

The current workflow database model is the mapping baseline for these tables.

### 2.1 WF_INSTANCES

Purpose:

- product-facing workflow instance summary
- instance business reference
- instance status
- product-facing state snapshot

### 2.2 WF_TASKS

Purpose:

- active and historical human task projections
- task routing
- assignment
- task payload
- effective roles

### 2.3 WF_TASK_EVENTS

Purpose:

- append-only task event history
- created, assigned, released, completed, reassigned events

### 2.4 WF_RUNTIME_STATES

Purpose:

- engine-owned durable runtime snapshot

This table becomes the main source of truth for engine resume.
## 3. Proposed Runtime State Extensions

`WF_RUNTIME_STATES` should be extended to support canonical engine execution directly.

Recommended new columns:

- `STATE_VERSION`
  Numeric optimistic concurrency version.
- `SNAPSHOT_SCHEMA_VERSION`
  Snapshot format version for engine evolution.
- `WAITING_KIND`
  Current wait type.
- `WAITING_TOKEN`
  Stale-signal guard token.
- `WAITING_UNTIL_UTC`
  Next due time when waiting on time-based resume.
- `ACTIVE_TASK_ID`
  Current active task projection id when applicable.
- `RESUME_POINTER_JSON`
  Serialized canonical resume pointer.
- `LAST_SIGNAL_ID`
  Last successfully processed signal id.
- `LAST_ERROR_CODE`
  Last engine error code.
- `LAST_ERROR_JSON`
  Structured last error details.
- `LAST_EXECUTED_BY`
  Node id that last committed execution.
- `LAST_EXECUTED_ON_UTC`
  Last successful engine commit timestamp.

The existing fields remain useful:

- workflow identity
- business reference
- runtime provider
- runtime instance id
- runtime status
- state json
- lifecycle timestamps
## 4. Snapshot Structure

`STATE_JSON` should hold a provider snapshot object for `SerdicaEngine`.

Recommended shape:

```json
{
  "engineSchemaVersion": 1,
  "workflowState": {},
  "businessReference": {
    "key": "1200345",
    "parts": {}
  },
  "status": "Open",
  "waiting": {
    "kind": "TaskCompletion",
    "token": "wait-123",
    "untilUtc": null
  },
  "resume": {
    "entryPointKind": "TaskOnComplete",
    "taskName": "ApproveApplication",
    "branchPath": [],
    "nextStepIndex": 3
  },
  "subWorkflowFrames": [],
  "continuationBuffer": []
}
```
## 5. Oracle AQ Strategy

### 5.1 Why AQ

AQ is the default signaling backend because it gives:

- durable storage
- blocking dequeue
- delayed delivery
- database-managed recovery
- transactional semantics close to the runtime state store

### 5.2 Queue Topology

Use explicit queues for clarity and operations.

Recommended objects:

- `WF_SIGNAL_QTAB`
- `WF_SIGNAL_Q`
- `WF_SCHEDULE_QTAB`
- `WF_SCHEDULE_Q`
- `WF_DLQ_QTAB`
- `WF_DLQ_Q`

Rationale:

- immediate signals and delayed signals are operationally different
- dead-letter isolation matters for supportability
- queue separation makes metrics and troubleshooting simpler

### 5.3 Payload Format

Use a compact JSON envelope serialized to UTF-8 bytes in a `RAW` payload.

Reasons:

- simple from .NET
- explicit schema ownership in application code
- small message size
- backend abstraction remains possible later

Do not put full workflow snapshots into AQ messages.
## 6. Signal Envelope

Recommended envelope:

```csharp
public sealed record WorkflowSignalEnvelope
{
    public required string SignalId { get; init; }
    public required string WorkflowInstanceId { get; init; }
    public required string RuntimeProvider { get; init; }
    public required string SignalType { get; init; }
    public required long ExpectedVersion { get; init; }
    public string? WaitingToken { get; init; }
    public DateTime OccurredAtUtc { get; init; }
    public DateTime? DueAtUtc { get; init; }
    public Dictionary<string, JsonElement> Payload { get; init; } = [];
}
```

Signal types:

- `TaskCompleted`
- `TimerDue`
- `RetryDue`
- `ExternalSignal`
- `SubWorkflowCompleted`
- `InternalContinue`
## 7. Transaction Model

### 7.1 Start Transaction

Start must durably commit:

- instance projection
- runtime snapshot
- task rows if any
- task events if any
- scheduled or immediate AQ messages if any

### 7.2 Completion Transaction

Task completion must durably commit:

- task completion event
- updated instance projection
- new task rows if any
- updated runtime snapshot
- any resulting AQ signals or schedules

### 7.3 Signal Resume Transaction

Signal resume must durably commit:

- AQ dequeue
- updated runtime snapshot
- resulting projection changes
- any next AQ signals

The intended operational model is:

- dequeue with transactional semantics
- update state and projections
- commit once

If commit fails, the signal must become visible again.
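The dequeue-update-commit shape above can be sketched with `System.Transactions.TransactionScope`, assuming the AQ dequeue and the ODP.NET connection both enlist in the ambient transaction. The `signalBus` and `coordinator` members and the lease shape are illustrative assumptions, not the shipped API.

```csharp
// Sketch of the single-commit signal-resume transaction; enlistment details
// and all member names are assumptions for illustration.
using System.Transactions;

using var scope = new TransactionScope(
    TransactionScopeOption.Required,
    new TransactionOptions { IsolationLevel = IsolationLevel.ReadCommitted },
    TransactionScopeAsyncFlowOption.Enabled);

// AQ dequeue with remove-on-commit semantics.
var lease = await signalBus.ReceiveAsync("workflow-node-1", ct);
if (lease is not null)
{
    // Updates runtime snapshot and projections on the same connection.
    var result = await coordinator.ExecuteAsync(lease.Envelope, ct);

    // Any follow-up signals join the same transaction.
    foreach (var next in result.NextSignals)
    {
        await signalBus.PublishAsync(next, ct);
    }
}

scope.Complete(); // one commit; on failure the dequeued message becomes visible again
```

If the scope is disposed without `Complete()`, the dequeue rolls back with everything else, which is exactly the redelivery behavior §7.3 requires.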
## 8. Blocking Receive Model

No polling loop should be used for work discovery.

Each node should run a signal pump that:

- opens one or more blocking AQ dequeue consumers
- waits on AQ rather than sleeping and scanning
- dispatches envelopes to bounded execution workers

Suggested parameters:

- dequeue wait seconds
- max concurrent handlers
- max poison retries
- dead-letter policy
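The pump described above can be sketched as a loop with bounded concurrency. This is a minimal sketch assuming the `IWorkflowSignalBus` contract proposed later in this document set; `handler`, `consumerName`, and `maxConcurrentHandlers` are illustrative assumptions.

```csharp
// Illustrative signal pump; bounded concurrency via SemaphoreSlim.
public async Task RunPumpAsync(CancellationToken ct)
{
    using var slots = new SemaphoreSlim(maxConcurrentHandlers);

    while (!ct.IsCancellationRequested)
    {
        // Blocking AQ dequeue: the wait happens server-side,
        // not via sleep-and-scan in the application.
        var lease = await signalBus.ReceiveAsync(consumerName, ct);
        if (lease is null)
        {
            continue; // dequeue wait elapsed with no message
        }

        await slots.WaitAsync(ct);
        _ = Task.Run(async () =>
        {
            try
            {
                await handler.HandleAsync(lease, ct);
            }
            finally
            {
                slots.Release();
            }
        }, ct);
    }
}
```

The semaphore caps in-flight executions so a burst of signals cannot exhaust the node; poison handling and dead-lettering would live inside the handler.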
## 9. Scheduling Model

### 9.1 Scheduling Requirement

The engine must support timers without a periodic sweep job.

### 9.2 Scheduling Approach

When a workflow enters a timed wait:

1. runtime snapshot is updated with:
   - waiting kind
   - waiting token
   - due time
2. a delayed AQ message is enqueued
3. transaction commits

When the delayed message becomes available:

1. a signal consumer dequeues it
2. current snapshot is loaded
3. waiting token is checked
4. if token matches, resume
5. if token does not match, ignore as stale

### 9.3 Logical Cancel Instead Of Physical Delete

The scheduler should treat cancel and reschedule logically.

Do not make correctness depend on deleting a queued timer message.

Instead:

- generate a new waiting token when schedule changes
- old delayed message becomes stale automatically

This is simpler and more reliable in distributed execution.
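Logical cancel can be sketched as a token rotation. This is an illustrative sketch: `snapshotStore`, `scheduleBus`, `BuildTimerEnvelope`, and the snapshot members are assumptions based on the contracts and columns proposed in this document, not a definitive implementation.

```csharp
// Sketch of reschedule-as-token-rotation; all names are assumptions.
public async Task RescheduleAsync(
    WorkflowRuntimeSnapshot snapshot,
    DateTime newDueUtc,
    CancellationToken ct)
{
    // Rotating the token is the cancel: the previously enqueued delayed
    // message will fail the token check on dequeue and be ignored as stale.
    snapshot = snapshot with
    {
        WaitingToken = Guid.NewGuid().ToString(),
        WaitingUntilUtc = newDueUtc
    };

    await snapshotStore.TryUpsertAsync(snapshot, snapshot.StateVersion, ct);
    await scheduleBus.ScheduleAsync(BuildTimerEnvelope(snapshot), newDueUtc, ct);
    // Both writes commit together; no queued message is ever deleted.
}
```

The old timer message still fires, but it fires into the stale-signal guard, which is cheap and safe.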
## 10. Multi-Node Concurrency Model

The engine must assume multiple nodes can receive signals for the same workflow instance.

Correctness model:

- signal delivery is at-least-once
- snapshot update uses version control
- waiting token guards stale work
- duplicate resumes are safe

Recommended write model:

- read snapshot version
- execute
- update `WF_RUNTIME_STATES` where `STATE_VERSION = expected`
- if update count is zero, abandon and retry or ignore as stale

This avoids permanent instance ownership.
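The conditional update above can be sketched directly against `WF_RUNTIME_STATES`. This sketch uses a Dapper-style `ExecuteAsync` for brevity; the column names follow the extensions proposed in section 3, and the key column `RUNTIME_INSTANCE_ID` and parameter names are assumptions.

```csharp
// Illustrative optimistic-concurrency write; column and parameter names
// are assumptions based on the proposed WF_RUNTIME_STATES extensions.
const string sql = """
    UPDATE WF_RUNTIME_STATES
       SET STATE_JSON = :stateJson,
           STATE_VERSION = :expectedVersion + 1,
           LAST_EXECUTED_BY = :nodeId,
           LAST_EXECUTED_ON_UTC = SYS_EXTRACT_UTC(SYSTIMESTAMP)
     WHERE RUNTIME_INSTANCE_ID = :instanceId
       AND STATE_VERSION = :expectedVersion
    """;

var updated = await connection.ExecuteAsync(
    sql, new { stateJson, expectedVersion, nodeId, instanceId });

if (updated == 0)
{
    // Another node committed first: abandon this slice and either
    // retry against the fresh snapshot or ignore the signal as stale.
}
```

Because the version predicate is in the `WHERE` clause, the losing node detects the conflict from the row count alone, with no locks held across the execution slice.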
## 11. Restart And Recovery Semantics

### 11.1 One Node Down

Other nodes continue consuming AQ and processing instances.

### 11.2 All Nodes Down, Database Up

Signals remain durable in AQ.

When any node comes back:

- AQ consumers reconnect
- pending immediate and delayed signals are processed
- workflow resumes continue

### 11.3 Database Down

No execution can continue while Oracle is unavailable.

Once Oracle returns:

- AQ queues recover with the database
- runtime snapshots recover with the database
- resumed node consumers continue from durable state

### 11.4 All Nodes And Database Down

After Oracle returns and at least one application node starts:

- AQ messages are still present
- runtime state is still present
- due delayed messages can be consumed
- execution resumes from durable state

This is one of the main reasons Oracle AQ is preferred over a separate volatile wake-up layer.
## 12. Redis Position In V1

Redis is optional and not part of the correctness path.

It may be used later for:

- local cache
- non-authoritative wake hints
- metrics fanout

It should not be required for:

- durable signal delivery
- timer delivery
- restart recovery

## 13. Dead-Letter Strategy

Messages should move to DLQ when:

- deserialization fails
- definition is missing
- snapshot is irreparably inconsistent
- retry count exceeds threshold

DLQ entry should preserve:

- original envelope
- failure reason
- last node id
- failure timestamp
## 14. Retention

Retention remains a service responsibility.

It should continue to clean:

- stale instances
- stale tasks
- completed data past purge window
- runtime states past purge window

AQ retention policy should be aligned with application retention and supportability needs, but queue cleanup must not delete active work.

@@ -0,0 +1,425 @@

# 05. Service Surface, Hosting, And Operations
## 1. Public Service Surface

The engine replacement must preserve the current workflow product APIs.

That means the following capability groups remain stable:

- workflow definition inspection
- workflow start
- workflow tasks list/get/assign/release/complete
- workflow instances list/get
- workflow diagrams
- workflow retention run
- canonical schema inspection
- canonical import validation

The existing service-contract groups remain the baseline:

- workflow definition contracts
- workflow start contracts
- workflow task contracts
- workflow instance contracts
- workflow operational contracts
## 2. Service Metadata

The service should continue to advertise:

- definition inspection support
- instance inspection support
- canonical schema inspection support
- canonical validation support

The diagram provider value should change from old-runtime semantics to an engine-compatible diagram provider, but the public contract can remain unchanged.

## 3. Workflow Diagram Strategy

The current diagram service builds a simplified linear diagram from definition metadata and overlays instance/task status.

The current simplified workflow diagram service is the baseline. V1 engine design keeps this approach.

Why:

- it is already product-compatible
- it does not depend on Elsa runtime internals
- it uses task and instance projections, which remain in place

The engine should not block on building a richer graph renderer.
## 4. Authorization And Assignment

Authorization remains in the service layer, not the engine kernel.

This should remain true in v1:

- engine activates tasks
- projection store writes tasks
- service decides who may assign/release/complete them

The engine should never embed user-specific authorization policy.
## 5. Hosting Model

### 5.1 Host Shape

The service process should host:

- API endpoints
- canonical definition cache
- runtime provider
- AQ signal consumer hosted service
- retention hosted service

### 5.2 Background Services

Recommended hosted services:

- `WorkflowEngineSignalHostedService`
- `WorkflowEngineScheduleHostedService`
  This may be unnecessary if delayed AQ messages are consumed by the same signal service.
- `WorkflowRetentionHostedService`

### 5.3 Concurrency Configuration

The host must expose configuration for:

- signal consumer count
- max concurrent execution handlers
- dequeue wait duration
- per-execution timeout
## 6. Configuration Model

### 6.1 Runtime Configuration

Recommended runtime options:

```json
{
  "WorkflowRuntime": {
    "Provider": "SerdicaEngine",
    "FailStartupOnInvalidDefinition": true
  }
}
```

In v1 this is a single-provider choice, not a mixed routing system.

### 6.2 Engine Execution Configuration

Recommended engine options:

```json
{
  "WorkflowEngine": {
    "NodeId": "workflow-node-1",
    "MaxConcurrentExecutions": 16,
    "ExecutionTimeoutSeconds": 300,
    "DefinitionCacheMode": "Startup"
  }
}
```

### 6.3 AQ Configuration

Recommended AQ options:

```json
{
  "WorkflowAq": {
    "Schema": "SRD_WFKLW",
    "SignalQueueName": "WF_SIGNAL_Q",
    "ScheduleQueueName": "WF_SCHEDULE_Q",
    "DeadLetterQueueName": "WF_DLQ_Q",
    "DequeueWaitSeconds": 30,
    "MaxDeliveryAttempts": 10,
    "SignalConsumers": 4
  }
}
```

### 6.4 Retention Configuration

Reuse the existing retention options and align engine snapshot retention with projection retention.
## 7. Operational Diagnostics

The engine must make the following available in logs and metrics:

- workflow instance id
- workflow name
- workflow version
- business reference key
- signal id
- signal type
- waiting token
- state version
- node id
- execution duration
- dequeue latency
- retry count
- dead-letter count
- transport name and step id on failure

## 8. Metrics

Recommended metrics:

- workflows started
- workflows completed
- workflows failed
- tasks activated
- task completions processed
- AQ signals dequeued
- AQ signal failures
- AQ DLQ count
- timer signals fired
- stale signals ignored
- execution conflict retries
- average execution slice duration
- active waiting instances by waiting kind
## 9. Logging

Logging should distinguish between:

- product logs
- engine execution logs
- signal bus logs
- scheduler logs
- transport logs

The engine should log structured fields, not only free text.

Minimum structured fields:

- `workflowInstanceId`
- `workflowName`
- `workflowVersion`
- `businessReferenceKey`
- `signalId`
- `signalType`
- `nodeId`
- `stateVersion`
- `waitingToken`
## 10. Failure Handling Policy

### 10.1 Recoverable Failures

Examples:

- transient transport failure
- transient AQ dequeue failure
- optimistic concurrency conflict

Handling:

- retry execution
- reschedule if policy exists
- keep workflow consistent

### 10.2 Non-Recoverable Failures

Examples:

- invalid snapshot format
- missing definition for existing instance
- unresolvable signal payload

Handling:

- move signal to DLQ
- mark instance runtime state as failed or blocked
- expose failure through inspection
## 11. Security Boundaries

### 11.1 Service API Boundary

User-facing authorization stays where it currently belongs:

- endpoint layer
- task authorization service

### 11.2 Engine Boundary

The engine should trust only:

- validated workflow definitions
- validated task completion requests from the service layer
- authenticated transport adapters

### 11.3 AQ Boundary

AQ queues should be scoped to the workflow schema and not shared casually with unrelated services.
## 12. Testing Strategy

### 12.1 Unit Tests

Test:

- canonical interpreter step behavior
- resume pointer serialization
- waiting token behavior
- optimistic concurrency conflict handling
- AQ envelope serialization

### 12.2 Component Tests

Test:

- start flow to task activation
- task completion to next task
- timer registration to delayed resume
- subworkflow completion to parent resume
- transport failure to retry or failure branch

### 12.3 Integration Tests

Test with real Oracle and AQ:

- signal enqueue/dequeue
- delayed message handling
- restart recovery
- multi-node duplicate delivery safety
### 12.4 Oracle And AQ Reliability Tests

The engine should have a dedicated Oracle-focused integration suite, not just generic workflow integration coverage.

The Oracle suite should be split into four layers.

#### 12.4.1 Oracle Transport Reality Tests

These tests prove the raw AQ behavior that the engine depends on:

- immediate enqueue followed by blocking dequeue
- delayed enqueue followed by eventual dequeue
- enqueue with transaction commit succeeds
- enqueue with transaction rollback disappears
- dequeue with `OnCommit` plus rollback causes redelivery
- dequeue with `OnCommit` plus commit removes message
- dead-letter enqueue and replay path
- browse path against dead-letter queue
- queue creation and teardown in ephemeral schemas or ephemeral queue names

These tests should stay small and synthetic so transport failures are easy to isolate.

#### 12.4.2 Engine Persistence And Delivery Coupling Tests

These tests prove that Oracle state and Oracle AQ stay consistent together:

- runtime state update plus AQ enqueue committed atomically
- runtime state update rolled back means no visible signal
- projection update plus AQ enqueue committed atomically
- duplicate AQ delivery with the same waiting token is harmless
- stale expected version plus valid waiting token is ignored safely
- stale timer message after reschedule becomes a no-op

These are the most important correctness tests for the run-to-wait architecture.

#### 12.4.3 Restart And Recovery Tests

These tests should simulate realistic restart conditions:

- app restart with immediate signal already in queue
- app restart with delayed signal not yet due
- app restart after delayed signal becomes due
- app restart after dequeue but before commit
- Oracle container restart while waiting instances exist
- Oracle restart while delayed messages are still pending
- service restart with dead-letter backlog present

These tests should prove that no polling is needed to recover normal execution.

#### 12.4.4 Oracle Load And Timing Tests

These tests should focus on timing variance and backlog behavior:

- cold-container delayed message latency envelope
- many delayed messages becoming due in the same second
- burst of immediate signals after service startup
- mixed immediate and delayed signals on one queue
- long-running dequeue loops with empty polls between real messages
- bounded backlog drain time for representative queue depth

The goal is not only correctness, but knowing what timing variance is normal on local and CI Oracle containers.

The detailed workload model, KPI set, harness structure, and test-tier split should live in [08-load-and-performance-plan.md](08-load-and-performance-plan.md).
### 12.5 Bulstrad Product-Parity Tests

Synthetic engine tests are necessary but not sufficient.

The main parity suite should use real Bulstrad declarative workflows with scripted downstream transport responses. The purpose is to prove that the Serdica engine executes product workflows, not just toy workflows.

Recommended first-wave Bulstrad coverage:

- transport-heavy completion flows such as `AssistantPrintInsisDocuments`
- approval/review chains such as `ReviewPolicyOpenForChange`
- parent-child workflow chains such as `OpenForChangePolicy`
- cancellation flows such as `AnnexCancellation`
- policy end-state flows such as `AssistantPolicyCancellation`
- reinstate or reopen flows such as `AssistantPolicyReinstate`
- shared-policy integration flows such as `InsisIntegrationNew`
- shared-policy confirmation and conversion flows such as `QuotationConfirm`
- failure-tolerant cleanup flows such as `QuoteOrAplCancel`

Each Bulstrad test should assert:

- task sequence
- task payload shape
- transport invocation order
- final workflow state
- runtime version progression
- absence of leaked subworkflow frames or stale wait metadata

Current Oracle-backed parity coverage already includes these families and uses restarted providers plus real Oracle workflow tables, not synthetic in-memory state.
### 12.6 Chaos And Fault-Injection Tests

The engine should also have a deterministic chaos suite.

Recommended failure points:

- before snapshot save
- after snapshot save but before projection save
- after projection save but before AQ enqueue
- after AQ enqueue but before commit
- after dequeue but before signal processing completes
- after signal processing but before lease commit

Recommended assertions:

- no duplicate open tasks
- no lost committed signal
- no unbounded retry loop
- no invalid version rollback
- no stuck instance without an explainable wait reason

### 12.7 Parity Tests

The most important tests compare outcomes against the current declarative workflow expectations:

- same task sequence
- same state changes
- same business reference results
- same transport payload shaping
## 13. Supportability

Operations staff should be able to answer:

- what is this instance waiting for
- when was it last executed
- what signal is due next
- why was a signal ignored
- why did a signal go to DLQ
- which step failed

This is why runtime state inspection and structured failure metadata are mandatory.
285
docs/workflow/engine/06-implementation-structure.md
Normal file
@@ -0,0 +1,285 @@

# 06. Implementation Structure

## 1. Implementation Goal

The implementation should mirror the architecture instead of collapsing everything into `Services/`.

The code layout should make it obvious which parts are:

- product orchestration
- engine runtime
- persistence
- signaling
- scheduling
- operations
## 2. Proposed Folder Layout

Recommended new structure across the workflow host, shared abstractions, and external service contracts.

### 2.1 Service Host Project

Proposed folders:

```text
Engine/
  Contracts/
  Execution/
  Persistence/
  Signaling/
  Scheduling/
  Hosting/
  Diagnostics/
```

Detailed proposal:

```text
Engine/
  Contracts/
    IWorkflowRuntimeProvider.cs
    IWorkflowSignalBus.cs
    IWorkflowScheduleBus.cs
    IWorkflowRuntimeSnapshotStore.cs
    IWorkflowRuntimeDefinitionStore.cs

  Execution/
    SerdicaEngineRuntimeProvider.cs
    WorkflowExecutionCoordinator.cs
    WorkflowCanonicalInterpreter.cs
    WorkflowResumePointerSerializer.cs
    WorkflowExecutionSliceResult.cs
    WorkflowWaitDescriptor.cs
    WorkflowSubWorkflowCoordinator.cs
    WorkflowTransportDispatcher.cs

  Persistence/
    OracleWorkflowRuntimeSnapshotStore.cs
    WorkflowRuntimeSnapshotMapper.cs
    WorkflowRuntimeStateMutator.cs

  Signaling/
    OracleAqWorkflowSignalBus.cs
    WorkflowSignalEnvelope.cs
    WorkflowSignalPump.cs
    WorkflowSignalHandler.cs

  Scheduling/
    OracleAqWorkflowScheduleBus.cs
    WorkflowScheduleRequest.cs

  Hosting/
    WorkflowEngineSignalHostedService.cs
    WorkflowEngineStartupValidator.cs

  Diagnostics/
    WorkflowEngineMetrics.cs
    WorkflowEngineLogScope.cs
```
### 2.2 Shared Abstractions Project

Keep these in abstractions:

- execution contracts
- signal/schedule bus interfaces
- runtime provider interfaces
- runtime snapshot records where shared

Do not put Oracle-specific details into the shared abstractions project.

### 2.3 Contracts Project

Keep only external service contracts there.

Do not leak engine-internal snapshot or AQ message contracts into public workflow contracts.
## 3. Recommended Core Interfaces
|
||||
|
||||
### 3.1 Runtime Provider
|
||||
|
||||
```csharp
public interface IWorkflowRuntimeProvider
{
    string ProviderName { get; }

    Task<WorkflowRuntimeExecutionResult> StartAsync(
        WorkflowRegistration registration,
        WorkflowDefinitionDescriptor definition,
        WorkflowBusinessReference? businessReference,
        StartWorkflowRequest request,
        object startRequest,
        CancellationToken cancellationToken = default);

    Task<WorkflowRuntimeExecutionResult> CompleteAsync(
        WorkflowRegistration registration,
        WorkflowDefinitionDescriptor definition,
        WorkflowTaskExecutionContext context,
        CancellationToken cancellationToken = default);
}
```

### 3.2 Snapshot Store

```csharp
public interface IWorkflowRuntimeSnapshotStore
{
    Task<WorkflowRuntimeSnapshot?> GetAsync(
        string workflowInstanceId,
        CancellationToken cancellationToken = default);

    Task<bool> TryUpsertAsync(
        WorkflowRuntimeSnapshot snapshot,
        long expectedVersion,
        CancellationToken cancellationToken = default);
}
```
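
A `false` result from `TryUpsertAsync` signals an optimistic-concurrency conflict: another worker committed a newer snapshot version first. A typical caller reloads the snapshot and retries its mutation. The sketch below is illustrative only; the `mutate` delegate and the retry limit are assumptions, not part of the contract.

```csharp
// Hypothetical retry helper around the optimistic-concurrency contract.
// "mutate" produces the next snapshot from the current one; a false result
// from TryUpsertAsync means another worker committed first, so we reload.
public static async Task<WorkflowRuntimeSnapshot> CommitWithRetryAsync(
    IWorkflowRuntimeSnapshotStore store,
    string workflowInstanceId,
    Func<WorkflowRuntimeSnapshot, WorkflowRuntimeSnapshot> mutate,
    int maxAttempts = 3,
    CancellationToken cancellationToken = default)
{
    for (var attempt = 1; attempt <= maxAttempts; attempt++)
    {
        var current = await store.GetAsync(workflowInstanceId, cancellationToken)
            ?? throw new InvalidOperationException(
                $"Snapshot '{workflowInstanceId}' not found.");

        var next = mutate(current) with { Version = current.Version + 1 };

        if (await store.TryUpsertAsync(next, current.Version, cancellationToken))
        {
            return next;
        }
        // Stale version: another worker advanced the snapshot; reload and retry.
    }

    throw new InvalidOperationException(
        $"Snapshot '{workflowInstanceId}' could not be committed after {maxAttempts} attempts.");
}
```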

### 3.3 Signal Bus

```csharp
public interface IWorkflowSignalBus
{
    Task PublishAsync(
        WorkflowSignalEnvelope envelope,
        CancellationToken cancellationToken = default);

    Task<IWorkflowSignalLease?> ReceiveAsync(
        string consumerName,
        CancellationToken cancellationToken = default);
}
```
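
`ReceiveAsync` hands out a lease so a message leaves the queue only once processing commits. A hosted consumer loop can then be sketched roughly as below; the lease members used here (`Envelope`, `CompleteAsync`, `AbandonAsync`) are assumed shapes for illustration and are not fixed by the interface above.

```csharp
// Illustrative consumer loop. The IWorkflowSignalLease members used here
// (Envelope, CompleteAsync, AbandonAsync) are assumptions.
public async Task PumpAsync(
    IWorkflowSignalBus bus,
    Func<WorkflowSignalEnvelope, CancellationToken, Task> handler,
    CancellationToken cancellationToken)
{
    while (!cancellationToken.IsCancellationRequested)
    {
        var lease = await bus.ReceiveAsync("workflow-engine", cancellationToken);
        if (lease is null)
        {
            continue; // Receive window elapsed with no message; wait again.
        }

        try
        {
            await handler(lease.Envelope, cancellationToken);
            await lease.CompleteAsync(cancellationToken); // Removes the message.
        }
        catch
        {
            await lease.AbandonAsync(cancellationToken); // Makes it redeliverable.
        }
    }
}
```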

### 3.4 Schedule Bus

```csharp
public interface IWorkflowScheduleBus
{
    Task ScheduleAsync(
        WorkflowSignalEnvelope envelope,
        DateTime dueAtUtc,
        CancellationToken cancellationToken = default);
}
```
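
A timer wait then reduces to publishing a delayed signal addressed to the engine's own consumer. For example (the envelope factory method is a hypothetical convenience, since the envelope's members are defined elsewhere):

```csharp
// Hypothetical timer registration: deliver a resume signal in 24 hours.
// WorkflowSignalEnvelope.CreateTimerResume is assumed for illustration.
var envelope = WorkflowSignalEnvelope.CreateTimerResume(
    workflowInstanceId: "wf-123",
    waitingToken: "timer-token-1");

await scheduleBus.ScheduleAsync(
    envelope,
    dueAtUtc: DateTime.UtcNow.AddHours(24),
    cancellationToken);
```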

### 3.5 Definition Store

```csharp
public interface IWorkflowRuntimeDefinitionStore
{
    WorkflowRuntimeDefinition GetRequiredDefinition(
        string workflowName,
        string workflowVersion);
}
```

## 4. Runtime Definition Normalization

Recommended startup path:

1. read registrations from `WorkflowRegistrationCatalog`
2. compile each workflow to a canonical definition
3. validate the canonical definition
4. convert it to a `WorkflowRuntimeDefinition`
5. store it in an immutable in-memory cache

This startup step should be implemented once and reused by:

- runtime execution
- canonical inspection endpoints
- diagnostics

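The startup path above can be sketched as a single routine. The compiler and validator collaborator types named here are assumptions based on the surrounding design, not confirmed APIs:

```csharp
// Illustrative startup normalization. ICanonicalWorkflowCompiler and
// ICanonicalDefinitionValidator are assumed collaborator shapes.
public static IReadOnlyDictionary<(string Name, string Version), WorkflowRuntimeDefinition>
    BuildDefinitionCache(
        WorkflowRegistrationCatalog catalog,
        ICanonicalWorkflowCompiler compiler,
        ICanonicalDefinitionValidator validator)
{
    var cache = new Dictionary<(string, string), WorkflowRuntimeDefinition>();

    foreach (var registration in catalog.Registrations)
    {
        var canonical = compiler.Compile(registration);          // step 2
        validator.ValidateOrThrow(canonical);                    // step 3: fail startup fast
        var runtime = WorkflowRuntimeDefinition.From(canonical); // step 4

        cache[(registration.WorkflowName, registration.WorkflowVersion)] = runtime;
    }

    return cache; // step 5: treated as immutable after startup
}
```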
## 5. Snapshot Model

Recommended runtime snapshot record:

```csharp
public sealed record WorkflowRuntimeSnapshot
{
    public required string WorkflowInstanceId { get; init; }
    public required string WorkflowName { get; init; }
    public required string WorkflowVersion { get; init; }
    public required string RuntimeProvider { get; init; }
    public required long Version { get; init; }
    public WorkflowBusinessReference? BusinessReference { get; init; }
    public required string RuntimeStatus { get; init; }
    public required WorkflowEngineState EngineState { get; init; }
    public DateTime CreatedOnUtc { get; init; }
    public DateTime? CompletedOnUtc { get; init; }
    public DateTime LastUpdatedOnUtc { get; init; }
}
```

## 6. AQ Adapter Design

AQ adapters should be isolated behind backend-neutral interfaces.

Do not let the rest of the engine know about:

- queue table names
- enqueue option types
- dequeue option types
- AQ-specific exception types

That isolation is the main swap seam for any future non-AQ backend.
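
One practical consequence: the Oracle adapter should catch provider exceptions at its own boundary and rethrow backend-neutral ones. A minimal sketch, in which `WorkflowTransportException`, the private enqueue helper, and the envelope member are all hypothetical names used for illustration:

```csharp
// Illustrative boundary translation inside OracleAqWorkflowSignalBus.
// OracleException stays inside the adapter; callers only ever see the
// hypothetical backend-neutral WorkflowTransportException.
public async Task PublishAsync(
    WorkflowSignalEnvelope envelope,
    CancellationToken cancellationToken = default)
{
    try
    {
        await EnqueueToOracleAqAsync(envelope, cancellationToken);
    }
    catch (OracleException ex)
    {
        throw new WorkflowTransportException(
            $"Failed to publish signal for instance '{envelope.WorkflowInstanceId}'.", ex);
    }
}
```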

## 7. Transaction Boundary Design

### 7.1 Coordinator Owns Transactions

`WorkflowExecutionCoordinator` should own the unit of work for:

- snapshot update
- projection update
- AQ publish
- AQ dequeue completion

This avoids splitting that responsibility across product services and engine helpers.
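
Concretely, one execution slice commits or rolls back as a unit, roughly as below. The unit-of-work abstraction and its members are assumptions used for illustration; the real seam may be shaped differently:

```csharp
// Illustrative unit of work for one execution slice. IWorkflowUnitOfWork is
// a hypothetical abstraction over the Oracle transaction shared by the
// snapshot store, projection store, and AQ session.
public async Task CommitSliceAsync(
    IWorkflowUnitOfWork unitOfWork,
    WorkflowRuntimeSnapshot snapshot,
    long expectedVersion,
    IReadOnlyList<WorkflowSignalEnvelope> outgoingSignals,
    CancellationToken cancellationToken)
{
    await using var transaction = await unitOfWork.BeginAsync(cancellationToken);

    if (!await unitOfWork.SnapshotStore.TryUpsertAsync(snapshot, expectedVersion, cancellationToken))
    {
        await transaction.RollbackAsync(cancellationToken);
        return; // Stale slice: a concurrent worker already advanced the instance.
    }

    await unitOfWork.ProjectionStore.ApplyAsync(snapshot, cancellationToken);

    foreach (var signal in outgoingSignals)
    {
        await unitOfWork.SignalBus.PublishAsync(signal, cancellationToken);
    }

    // Snapshot, projections, and AQ publishes become visible together.
    await transaction.CommitAsync(cancellationToken);
}
```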

### 7.2 Projection Store Remains Focused

`WorkflowProjectionStore` should stay focused on:

- read-projection writes
- query paths
- task event history

It should not become the coordinator for AQ or engine versioning.

## 8. Startup Composition

`WorkflowServiceCollectionExtensions` should eventually compose the engine roughly like this:

```csharp
services.Configure<WorkflowRuntimeOptions>(...);
services.Configure<WorkflowEngineOptions>(...);
services.Configure<WorkflowAqOptions>(...);

services.AddScoped<IWorkflowRuntimeProvider, SerdicaEngineRuntimeProvider>();
services.AddScoped<IWorkflowRuntimeOrchestrator, WorkflowRuntimeOrchestrator>();
services.AddScoped<IWorkflowRuntimeSnapshotStore, OracleWorkflowRuntimeSnapshotStore>();
services.AddScoped<IWorkflowSignalBus, OracleAqWorkflowSignalBus>();
services.AddScoped<IWorkflowScheduleBus, OracleAqWorkflowScheduleBus>();
services.AddSingleton<IWorkflowRuntimeDefinitionStore, WorkflowRuntimeDefinitionStore>();
services.AddHostedService<WorkflowEngineSignalHostedService>();
```

## 9. Avoided Anti-Patterns

The implementation should explicitly avoid:

- a giant engine service that knows everything
- polling tables for due work
- in-memory-only timer ownership
- transport-specific engine branches scattered across the codebase
- storing huge snapshots in AQ messages
- mixing public contracts with engine-internal contracts

## 10. Implementation Rules

1. Put backend-specific code behind an interface.
2. Keep canonical interpretation pure and backend-agnostic.
3. Keep Oracle transaction handling close to the execution coordinator.
4. Make resume idempotency part of the snapshot model, not a side utility.
5. Keep projection writes product-oriented, not runtime-oriented.

676
docs/workflow/engine/07-sprint-plan.md
Normal file
@@ -0,0 +1,676 @@

# 07. Sprint Plan

## Planning Assumptions

- sprint length: 2 weeks
- one team owning runtime, persistence, and service integration
- Oracle AQ available
- no concurrent-engine migration scope
- acceptance means code, tests, and updated docs

## Sprint 1: Foundations And Contracts

### Goal

Create the engine skeleton and the stable interfaces.

### Scope

- add runtime provider abstraction
- add signal bus abstraction
- add schedule bus abstraction
- add runtime snapshot abstraction
- add engine option classes
- add `docs/engine/` package

### Deliverables

- interface set compiled into shared abstractions
- configuration classes
- initial DI composition path
- unit tests for options and registration

### Exit Criteria

- service builds with engine abstractions present
- no Elsa runtime assumptions are introduced into new code
- docs and interface names are stable enough for later sprints

## Sprint 2: Canonical Runtime Definition Store

### Goal

Make canonical execution definitions available at runtime without Elsa.

### Scope

- compile authored workflows to canonical runtime definitions at startup
- validate definitions during startup
- cache runtime definitions
- expose a startup failure mode for invalid definitions

### Deliverables

- `WorkflowRuntimeDefinitionStore`
- definition normalization pipeline
- startup validator
- tests covering:
  - valid definition load
  - invalid definition rejection
  - version resolution

### Exit Criteria

- all registered workflows load into the runtime definition cache
- the runtime can resolve a definition by name/version

## Sprint 3: Snapshot Store And Versioned Runtime State

### Goal

Turn `WF_RUNTIME_STATES` into a first-class engine snapshot store.

### Scope

- extend the runtime state schema
- implement the snapshot mapper
- implement optimistic concurrency versioning
- wire snapshot reads and writes

### Deliverables

- database migration scripts
- `OracleWorkflowRuntimeSnapshotStore`
- snapshot serialization contracts
- tests for:
  - initial insert
  - update with expected version
  - stale version conflict

### Exit Criteria

- runtime snapshots can be loaded and committed with version control
- stale updates are rejected safely

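Optimistic concurrency versioning typically comes down to a guarded `UPDATE` whose affected-row count decides success. A minimal sketch, assuming a `VERSION` column on `WF_RUNTIME_STATES`; the column names and payload shape are hypothetical, since the real schema extension is a Sprint 3 deliverable:

```csharp
// Illustrative versioned update. Column names are assumptions; only the
// affected-row-count check pattern is the point.
const string Sql = @"
    UPDATE WF_RUNTIME_STATES
       SET ENGINE_STATE = :engineState,
           VERSION = :expectedVersion + 1,
           LAST_UPDATED_ON_UTC = :nowUtc
     WHERE WORKFLOW_INSTANCE_ID = :instanceId
       AND VERSION = :expectedVersion";

// With any ADO.NET command bound to those parameters:
// var affected = await command.ExecuteNonQueryAsync(cancellationToken);
// return affected == 1; // false => stale version; caller reloads and retries
```
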
## Sprint 4: AQ Signal And Schedule Backbone

### Goal

Introduce Oracle AQ as the durable event backbone.

### Scope

- create AQ setup scripts
- implement the signal bus
- implement the schedule bus
- implement signal envelope serialization
- implement the hosted signal consumer skeleton

### Deliverables

- AQ DDL scripts
- `OracleAqWorkflowSignalBus`
- `OracleAqWorkflowScheduleBus`
- integration tests with enqueue/dequeue
- delayed message smoke tests

### Exit Criteria

- the engine can publish and receive immediate signals without polling
- the engine can publish and receive delayed signals

## Sprint 5: Start Flow And Human Task Activation

### Goal

Run workflows from start until the first durable wait.

### Scope

- implement the execution coordinator
- implement a canonical interpreter subset:
  - state assignment
  - business reference assignment
  - task activation
  - terminal completion
- integrate with `WorkflowRuntimeService`
- keep the existing projection model

### Deliverables

- `SerdicaEngineRuntimeProvider.StartAsync`
- execution slice result model
- task activation write path
- tests for:
  - start to task
  - start to completion
  - business reference propagation

### Exit Criteria

- selected declarative workflows can start and create correct tasks without Elsa

## Sprint 6: Task Completion And Transport Calls

### Goal

Advance workflows after task completion and support transport-backed orchestration.

### Scope

- implement the task completion execution path
- implement canonical interpreter support for:
  - transport calls
  - branches
  - success/failure paths
- integrate the completion flow with the runtime snapshot commit

### Deliverables

- `SerdicaEngineRuntimeProvider.CompleteAsync`
- transport dispatcher
- tests for:
  - completion to next task
  - failure branch
  - timeout branch where applicable

### Exit Criteria

- representative workflows can complete their first task and reach the correct next state

## Sprint 7: Subworkflows, Continue-With, And Repeat

### Goal

Support the higher-order orchestration patterns used heavily in the corpus.

### Scope

- implement subworkflow frame persistence
- implement parent resume
- implement continue-with production
- implement repeat resume semantics

### Deliverables

- subworkflow coordinator
- resume pointer serializer
- tests for:
  - child completion resumes parent
  - nested frame handling
  - repeat interrupted by wait
  - continue-with request emission

### Exit Criteria

- representative subworkflow-heavy families execute correctly

## Sprint 8: Timers, Retries, And Delayed Resume

### Goal

Finish the non-polling scheduling path.

### Scope

- implement timer waits
- implement retry scheduling
- implement stale-timer ignore logic via waiting tokens
- integrate delayed AQ delivery into the execution coordinator

### Deliverables

- timer wait model
- delayed resume handler
- tests for:
  - timer due resume
  - retry due resume
  - canceled timer ignored
  - restart-safe delayed processing

### Exit Criteria

- the engine supports time-based orchestration without polling loops

## Sprint 9: Operational Parity

### Goal

Reach product-surface and operations parity with the existing workflow service.

### Scope

- diagram parity validation
- runtime state inspection parity
- retention integration
- structured metrics and logging
- DLQ handling and diagnostics

### Deliverables

- runtime metadata mapping updates
- operational dashboards or a documented metric set
- DLQ support
- tests for supportability paths

### Exit Criteria

- operations can inspect and support engine-driven instances through the existing product surface

## Sprint 10: Corpus Parity And Hardening

### Goal

Prove the engine against the real declarative workflow corpus.

### Scope

- execute representative high-fanout families end-to-end
- resolve remaining interpreter gaps
- multi-node duplicate-delivery testing
- restart and recovery testing
- performance and soak tests

### Deliverables

- parity report against selected workflow families
- load test results
- recovery test results
- production readiness checklist

### Exit Criteria

- selected production-grade workflows run without Elsa
- restart recovery is proven
- no polling is used for steady-state signal or timer discovery

## Sprint 11: Bulstrad E2E Parity And Oracle Reliability

### Goal

Turn the engine from a validated runtime into a production-grade execution platform by proving it against real Bulstrad workflows and hostile Oracle operating conditions.

### Scope

- build a curated Bulstrad Oracle-AQ E2E suite
- replace synthetic runtime-state backing in Oracle integration tests with the real Oracle runtime-state store
- add Oracle transaction-coupling tests for state, projections, and AQ publish
- add Oracle restart, redelivery, and DLQ replay tests
- add multi-worker and duplicate-delivery race tests
- add deterministic fault injection around commit boundaries

### Deliverables

- `BulstradOracleAqE2ETests`
- curated representative workflows with scripted downstream responders
- Oracle transport reliability suite covering:
  - immediate and delayed delivery
  - rollback and redelivery
  - dead-letter browse and replay
  - restart-safe delayed processing
- concurrency suite covering:
  - duplicate signal delivery
  - same-instance multi-worker races
  - retry-after-conflict behavior
- documented timing expectations for cold-start and steady-state Oracle AQ

### Implemented Coverage

The current Oracle-backed integration harness now includes:

- Bulstrad policy-change families:
  - `OpenForChangePolicy`
  - `ReviewPolicyOpenForChange`
  - `AssistantAddAnnex`
  - `AnnexCancellation`
  - `AssistantPolicyReinstate`
  - `AssistantPolicyCancellation`
  - `AssistantPrintInsisDocuments`
- shared policy families:
  - `InsisIntegrationNew`
  - `QuotationConfirm`
  - `QuoteOrAplCancel`
- Oracle transport and recovery matrix:
  - immediate and delayed AQ delivery
  - delayed backlog drain within a bounded latency envelope
  - dequeue rollback redelivery
  - ambient Oracle transaction commit and rollback for immediate messages
  - ambient Oracle transaction commit and rollback for delayed messages
  - dead-letter browse, replay, and backlog replay
  - dead-letter backlog survival across Oracle restart
  - timer backlog recovery across provider restart and Oracle restart
  - external-signal backlog recovery, worker abandon/recovery, and duplicate-delivery races
  - schedule/publish failure rollback inside workflow mutation transactions

### Exit Criteria

- representative Bulstrad workflows execute correctly on `SerdicaEngine` with real Oracle AQ
- AQ-backed restart and delayed-delivery behavior is proven under realistic timing variance
- duplicate delivery and commit-boundary failures are shown to be safe
- the team has a stable PR suite and a broader nightly suite for Oracle-backed engine validation

## Sprint 12: Load, Performance, And Capacity Characterization

### Goal

Turn the correctness-focused Oracle validation suite into a real load and performance program with stable smoke gates, nightly trend runs, soak coverage, and first capacity numbers.

### Scope

- build a dedicated performance harness on top of the Oracle AQ integration foundation
- separate PR smoke, nightly characterization, weekly soak, and explicit capacity tiers
- add synthetic engine workloads for stable measurement
- add representative Bulstrad workload runners for business realism
- persist performance artifacts and summary reports
- define a baseline and regression strategy per environment

### Deliverables

- categorized performance scenarios:
  - `WorkflowPerfLatency`
  - `WorkflowPerfThroughput`
  - `WorkflowPerfSmoke`
  - `WorkflowPerfNightly`
  - `WorkflowPerfSoak`
  - `WorkflowPerfCapacity`
- result artifact writer under `TestResults/workflow-performance/`
- scenario matrix covering:
  - AQ immediate bursts
  - AQ delayed bursts
  - mixed signal backlogs
  - synthetic start/task/signal/timer/subworkflow flows
  - representative Bulstrad families
  - restart and replay under load
- first baseline report for local Docker and CI Oracle
- first capacity note for one-node and multi-node assumptions

### Exit Criteria

- PR smoke load checks are cheap and stable enough to run continuously
- nightly runs capture latency, throughput, and correctness artifacts
- soak runs prove no backlog drift or correctness decay over extended execution
- representative Bulstrad workflows have measured latency envelopes, not just functional pass/fail
- the team has an initial sizing recommendation for worker concurrency and queue backlog expectations

### Implemented Foundation

The current Sprint 12 implementation now includes:

- performance categories and artifact generation under `TestResults/workflow-performance/`
- Oracle AQ smoke scenarios for:
  - immediate burst drain
  - delayed burst drain
  - synthetic external-signal backlog resume
  - short Bulstrad business burst using `QuoteOrAplCancel`
  - persisted comparison against the previous artifact for the same scenario and tier
- Oracle AQ nightly scenarios for:
  - larger immediate burst drain
  - larger delayed burst drain
  - larger synthetic external-signal backlog resume
  - Bulstrad `QuotationConfirm -> PdfGenerator` burst
- an Oracle AQ soak scenario for:
  - sustained synthetic signal round-trip waves without correctness drift
- an Oracle AQ latency baseline for:
  - one-at-a-time synthetic signal round-trip with phase-level latency summaries
- an Oracle AQ throughput baseline for:
  - parallel synthetic signal round-trip with `16` workload concurrency and `8` signal workers
- an Oracle AQ capacity ladder for:
  - synthetic signal round-trip at concurrency `1`, `4`, `8`, and `16`
- thread-safe scripted transport recording for concurrent smoke scenarios
- first full Oracle baseline run with documented metrics in:
  - [10-oracle-performance-baseline-2026-03-17.md](10-oracle-performance-baseline-2026-03-17.md)
  - [10-oracle-performance-baseline-2026-03-17.json](10-oracle-performance-baseline-2026-03-17.json)

### Reference

The detailed workload model, KPI set, harness design, and baseline strategy are defined in [08-load-and-performance-plan.md](08-load-and-performance-plan.md).

## Sprint 13: Engine-Native Rendering And Authoring Projection

### Goal

Restore definition rendering and authoring projection without reintroducing Elsa types or runtime dependencies into the workflow declarations or the engine host.

### Scope

- design and implement a native definition-to-diagram projection for declarative and canonical workflows
- support deterministic node and edge generation from runtime definitions
- preserve task, branch, repeat, fork, timer, signal, and subworkflow visibility in the rendered output
- define a stable rendering contract for the operational API and future authoring tools
- keep rendering as a separate projection layer, not part of runtime execution

### Deliverables

- native rendering model and renderer for `WorkflowRuntimeDefinition`
- canonical-to-diagram projection rules for:
  - linear sequences
  - decisions and conditional branches
  - repeats
  - forks and joins
  - timers and external-signal waits
  - continuations and subworkflows
- updated operational metadata and diagram endpoints backed only by engine assets
- test suite covering rendering determinism and parity for representative Bulstrad workflows

### Exit Criteria

- workflow definitions render without any Elsa packages, builders, or activity models
- rendered diagrams remain stable for the same declarative definition across rebuilds
- operational diagram inspection uses the native renderer only
- the rendering layer is ready to support a later authoring surface without changing workflow declarations

## Sprint 14: Backend Portability And Store Profiles

### Goal

Turn the Oracle-first engine into a backend-switchable engine with one selected backend profile per deployment.

### Scope

- introduce a backend profile abstraction and dedicated backend plugin registration
- split projection persistence from the current Oracle-first application service
- formalize the mutation coordinator abstraction
- add a backend-neutral dead-letter contract
- add a backend conformance suite
- implement the PostgreSQL profile
- design the MongoDB profile in executable detail, with implementation only after explicit product approval

### Deliverables

- `IWorkflowBackendRegistrationMarker`
- backend-neutral projection contract
- backend-neutral mutation coordinator contract
- backend conformance suite
- dedicated Oracle, PostgreSQL, and MongoDB backend plugin projects
- executable MongoDB backend plugin design package

### Exit Criteria

- the host selects one backend profile by configuration
- the host stays backend-neutral and does not resolve Oracle/PostgreSQL directly
- Oracle and PostgreSQL pass the same conformance suite
- the MongoDB path is specified well enough that implementation is a bounded engineering task
- workflow declarations and canonical definitions remain unchanged across backend profiles

## Sprint 15: Backend-Neutral Parity And Performance Harness

### Goal

Remove the remaining Oracle-only assumptions from the validation stack so PostgreSQL and MongoDB can be measured with the same correctness, Bulstrad, and performance scenarios.

### Scope

- extract backend-neutral performance artifacts, categories, and scenario drivers
- extract backend-neutral runtime workload helpers from the Oracle-only harness
- define one hostile-condition matrix shared by Oracle, PostgreSQL, and MongoDB
- define one curated Bulstrad parity pack shared by all backends
- define one normalized performance artifact format and baseline comparison model

### Deliverables

- shared `IntegrationTests/Performance/Common/` package
- shared normalized performance metrics model
- shared Bulstrad workload catalog for:
  - `OpenForChangePolicy`
  - `ReviewPolicyOpenForChange`
  - `AssistantPrintInsisDocuments`
  - `AssistantAddAnnex`
  - `AnnexCancellation`
  - `AssistantPolicyCancellation`
  - `AssistantPolicyReinstate`
  - `InsisIntegrationNew`
  - `QuotationConfirm`
  - `QuoteOrAplCancel`
- backend-neutral hostile-condition checklist for:
  - duplicate delivery
  - same-instance resume race
  - abandon and reclaim
  - rollback on publish/schedule failure
  - restart with pending due messages
  - DLQ replay
  - backlog drain

### Exit Criteria

- Oracle, PostgreSQL, and MongoDB use the same performance artifact shape
- Oracle no longer owns the reporting model for later backend baselines
- PostgreSQL and MongoDB can plug into the same workload definitions without changing workflow semantics

## Sprint 16: PostgreSQL Hardening, Bulstrad Parity, And Baseline

### Goal

Bring PostgreSQL to Oracle-level confidence for correctness, hostile conditions, representative product behavior, and measured performance.

### Scope

- close the PostgreSQL hostile-condition gap to the Oracle matrix
- add PostgreSQL-backed Bulstrad E2E parity
- implement PostgreSQL latency, throughput, smoke, nightly, soak, and capacity suites
- publish PostgreSQL baseline artifacts and a narrative summary

### Deliverables

- PostgreSQL hostile-condition integration suite
- PostgreSQL Bulstrad parity suite
- PostgreSQL performance suites for:
  - latency
  - throughput
  - smoke
  - nightly
  - soak
  - capacity
- baseline documents:
  - `11-postgres-performance-baseline-<date>.md`
  - `11-postgres-performance-baseline-<date>.json`

### Exit Criteria

- PostgreSQL passes the same hostile-condition matrix as Oracle
- representative Bulstrad workflows run correctly on PostgreSQL
- PostgreSQL has a durable, documented performance baseline comparable to Oracle

## Sprint 17: MongoDB Hardening, Bulstrad Parity, And Baseline

### Goal

Bring MongoDB to the same product and operational confidence level as the relational backends without changing workflow behavior.

### Scope

- close the MongoDB hostile-condition gap to the Oracle matrix
- add MongoDB-backed Bulstrad E2E parity
- implement MongoDB latency, throughput, smoke, nightly, soak, and capacity suites
- publish MongoDB baseline artifacts and a narrative summary

### Deliverables

- MongoDB hostile-condition integration suite
- MongoDB Bulstrad parity suite
- MongoDB performance suites for:
  - latency
  - throughput
  - smoke
  - nightly
  - soak
  - capacity
- baseline documents:
  - `12-mongo-performance-baseline-<date>.md`
  - `12-mongo-performance-baseline-<date>.json`

### Exit Criteria

- MongoDB passes the same hostile-condition matrix as Oracle
- representative Bulstrad workflows run correctly on MongoDB
- MongoDB has a durable, documented performance baseline comparable to Oracle and PostgreSQL

## Sprint 18: Final Three-Backend Characterization And Decision Pack

### Goal

Produce the final side-by-side comparison for Oracle, PostgreSQL, and MongoDB using the same workloads, the same correctness rules, and the same performance artifact format.

### Scope

- rerun the shared Bulstrad parity pack on all three backends
- rerun the shared hostile-condition matrix on all three backends
- rerun the shared performance tiers and compare normalized metrics
- capture backend-specific metrics appendices without letting them replace the normalized workflow metrics
- publish the final recommendation pack

### Deliverables

- final comparison documents:
  - `13-backend-comparison-<date>.md`
  - `13-backend-comparison-<date>.json`
- normalized comparison across:
  - serial latency
  - steady-state throughput
  - capacity ladder
  - backlog drain
  - duplicate-delivery safety
  - restart recovery
- backend-specific appendices for:
  - Oracle wait and AQ observations
  - PostgreSQL lock, WAL, and queue-table observations
  - MongoDB transaction, lock, and change-stream observations

### Exit Criteria

- all three backends are compared through the same workload lens
- the team has one documented backend recommendation pack
- future backend decisions can reuse the same comparison harness instead of inventing new ad hoc measurements

### Current Status

- baseline comparison pack published in:
  - [13-backend-comparison-2026-03-17.md](13-backend-comparison-2026-03-17.md)
  - [13-backend-comparison-2026-03-17.json](13-backend-comparison-2026-03-17.json)
- the normalized performance comparison is complete for Oracle, PostgreSQL, and MongoDB
- reliability and Bulstrad hardening depth remains Oracle-first, so the current comparison is a baseline decision pack, not the final production closeout
- the signal path is now split into durable-store and wake-driver seams
- PostgreSQL and MongoDB now persist transactional wake-outbox records behind that seam
- the optional Redis wake-driver plugin is implemented for PostgreSQL and MongoDB
- Oracle intentionally remains on native AQ and does not support the Redis wake-driver combination

## Cross-Sprint Work Items

These should be maintained continuously, not left to the end:

- architecture doc updates
- test harness improvements
- canonical execution parity assertions
- operational telemetry quality
- snapshot schema versioning discipline
- Oracle timing-envelope observations for CI and local Docker environments

## Final Milestone Definition

The project is complete when:

- the workflow service can run on the engine as the active runtime
- task and instance APIs remain stable
- Oracle AQ handles both immediate signaling and delayed scheduling
- the service resumes correctly after restart without polling
- the engine runs representative real workflows with production-grade observability

544
docs/workflow/engine/08-load-and-performance-plan.md
Normal file
@@ -0,0 +1,544 @@

# 08. Load And Performance Plan

## Purpose

This document defines how the Serdica workflow engine should be load-tested, performance-characterized, and capacity-sized once functional parity is in place.

The goal is not only to prove that the engine is correct under load, but to answer these product and platform questions:

- how many workflow starts, task completions, and signal resumes can one node sustain
- how quickly does backlog drain after restart or outage
- how much timing variance is normal for Oracle AQ on local Docker, CI, and shared environments
- which workloads are Oracle-bound, AQ-bound, or engine-bound
- which scenarios are safe to gate in PR and which belong in nightly or explicit soak runs

## Principles

The performance plan follows these rules:

- correctness comes first; a fast but lossy engine result is a failed run
- performance tests must be split by intent: smoke, characterization, stress, soak, and failure-under-load
- transport-only tests and full workflow tests must both exist; they answer different questions
- synthetic workflows are required for stable measurement
- representative Bulstrad workflows are required for product confidence
- PR gates should use coarse, stable envelopes
- nightly and explicit runs should record and compare detailed metrics
- Oracle and AQ behavior must be measured directly, not inferred from app logs alone
## What Must Be Measured

### Correctness Under Load

Every load run should capture:

- total workflows started
- total tasks activated
- total tasks completed
- total signals published
- total signals processed
- total signals ignored as stale or duplicate
- total dead-lettered signals
- total runtime concurrency conflicts
- total failed runs
- total stuck instances at end of run

Correctness invariants:

- no lost committed signal
- no duplicate open task for the same logical wait
- no orphan subworkflow frame
- no runtime state row left without a valid explainable wait reason
- no queue backlog remaining after a successful drain phase unless the scenario intentionally leaves poison messages in DLQ
### Latency

The engine should measure at least:

- start-to-first-task latency
- start-to-completion latency
- task-complete-to-next-task latency
- signal-publish-to-task-visible latency
- timer-due-to-resume latency
- delayed-message lateness relative to requested due time
- backlog-drain completion time
- restart-to-first-processed-signal time

These should be recorded as:

- average
- p50
- p95
- p99
- max
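As a sketch, the recorded distribution for each latency metric could be reduced to a summary shape like the following; the `LatencySummary` type and its nearest-rank percentile helper are illustrative, not an existing engine type:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record LatencySummary(
    double AverageMs, double P50Ms, double P95Ms, double P99Ms, double MaxMs)
{
    // Nearest-rank percentile over a sorted copy of the samples.
    public static LatencySummary From(IReadOnlyCollection<double> samplesMs)
    {
        var sorted = samplesMs.OrderBy(v => v).ToArray();
        double Pct(double p) =>
            sorted[Math.Clamp((int)Math.Ceiling(p * sorted.Length) - 1, 0, sorted.Length - 1)];
        return new LatencySummary(sorted.Average(), Pct(0.50), Pct(0.95), Pct(0.99), sorted[^1]);
    }
}
```

Keeping max alongside p99 matters here: delayed-message jitter and restart recovery show up first in the tail, not the average.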
### Throughput

The engine should measure:

- workflows started per second
- task completions per second
- signals published per second
- signals processed per second
- backlog drain rate in signals per second
- completed end-to-end business workflows per minute
### Saturation

The engine should measure:

- app process CPU
- app process private memory and working set
- Oracle container CPU and memory when running locally
- queue depth over time
- active waiting instances over time
- dead-letter depth over time
- runtime state update conflicts over time
- open task count over time
### Oracle-Side Signals

If the environment permits access, also collect:

- AQ queue depth before, during, and after load
- queue-table growth during sustained runs
- visible dequeue lag
- Oracle session count for the test service
- lock or wait spikes on workflow tables
- transaction duration for mutation transactions

If the environment does not permit these views, fall back to:

- app-side timing
- browse counts from AQ
- workflow table row counts
- signal pump telemetry snapshots
## Workload Model

The load plan should be split into four workload families.

### 1. Transport Microbenchmarks

These isolate Oracle AQ behavior from workflow logic.

Use them to answer:

- how fast can AQ accept immediate messages
- how fast can AQ release delayed messages
- what is the drain rate for mixed backlogs
- how much delayed-message jitter is normal

Core scenarios:

- burst immediate enqueue and drain
- burst delayed enqueue with same due second
- mixed immediate and delayed enqueue on one queue
- dequeue rollback redelivery under sustained load
- dead-letter and replay backlog
- delayed backlog surviving Oracle restart
### 2. Synthetic Engine Workloads

These isolate the runtime from business-specific transport noise.

Recommended synthetic workflow types:

- start-to-complete with no task
- start-to-task with one human task
- signal-wait then task activation
- timer-wait then task activation
- continue-with dispatcher chain
- parent-child subworkflow chain

Use them to answer:

- raw start throughput
- raw resume throughput
- timer-due drain rate
- subworkflow coordination cost
- task activation/update cost
### 3. Representative Bulstrad Workloads

These prove that realistic product workflows behave well under load.

The first performance wave should use workflows that are already functionally covered in the Oracle suite:

- `AssistantPrintInsisDocuments`
- `OpenForChangePolicy`
- `ReviewPolicyOpenForChange`
- `AssistantAddAnnex`
- `AnnexCancellation`
- `AssistantPolicyCancellation`
- `AssistantPolicyReinstate`
- `InsisIntegrationNew`
- `QuotationConfirm`
- `QuoteOrAplCancel`

Use them to answer:

- how the engine behaves with realistic transport payload shaping
- how nested child workflows affect latency
- how multi-step review chains behave during backlog drain
- how short utility flows compare to long policy chains
### 4. Failure-Under-Load Workloads

These are not optional. A production engine must be tested while busy.

Scenarios:

- provider restart during active signal drain
- Oracle restart while delayed backlog exists
- dead-letter replay while new live signals continue to arrive
- duplicate signal storm against the same waiting instance set
- one worker repeatedly failing while another healthy worker continues
- scheduled backlog plus external-signal backlog mixed together

Use them to answer:

- whether recovery stays bounded
- whether backlog drain remains monotonic
- whether duplicate-delivery protections still hold under pressure
- whether DLQ replay can safely coexist with live traffic
## Test Tiers

Performance testing should not be a single bucket.

### Tier 1: PR Smoke

Purpose:

- catch catastrophic regressions quickly

Characteristics:

- small datasets
- short run time
- deterministic scenarios
- hard pass/fail envelopes

Recommended scope:

- one AQ immediate burst
- one AQ delayed backlog burst
- one synthetic signal-resume scenario
- one short Bulstrad business flow

Target duration:

- under 5 minutes total

Gating style:

- zero correctness failures
- no DLQ unless explicitly expected
- coarse latency ceilings only
### Tier 2: Nightly Characterization

Purpose:

- measure trends and detect meaningful performance regression

Characteristics:

- moderate dataset
- multiple concurrency levels
- metrics persisted as artifacts

Recommended scope:

- full Oracle transport matrix
- synthetic engine workloads at 1, 4, 8, and 16-way concurrency
- 3-5 representative Bulstrad families
- restart and DLQ replay under moderate backlog

Target duration:

- 15 to 45 minutes

Gating style:

- correctness failures fail the run
- latency/throughput compare against baseline with tolerance
### Tier 3: Weekly Soak

Purpose:

- detect leaks, drift, and long-tail timing issues

Characteristics:

- long-running mixed workload
- periodic restarts or controlled faults
- queue depth and runtime-state stability tracking

Recommended scope:

- 30 to 120 minute mixed load
- immediate, delayed, and replay traffic mixed together
- repeated provider restarts
- one Oracle restart in the middle of the run

Gating style:

- no unbounded backlog growth
- no stuck instances
- no memory growth trend outside a defined envelope
### Tier 4: Explicit Capacity And Breakpoint Runs

Purpose:

- learn real limits before production sizing decisions

Characteristics:

- not part of normal CI
- intentionally pushes throughput until latency or failure thresholds break

Recommended scope:

- ramp concurrency upward until queue lag or DB pressure exceeds target
- test one-node and multi-node configurations
- record saturation points, not just pass/fail

Deliverable:

- capacity report with recommended node counts and operational envelopes
## Scenario Matrix

The initial scenario matrix should look like this.

### Oracle AQ Transport

- immediate burst: 100, 500, 1000 messages
- delayed burst: 50, 100, 250 messages due in same second
- mixed burst: 70 percent immediate, 30 percent delayed
- redelivery burst: 25 messages rolled back once then committed
- DLQ burst: 25 poison messages then replay
### Synthetic Engine

- start-to-task: 50, 200, 500 workflow starts
- task-complete-to-next-task: 50, 200 completions
- signal-wait-resume: 50, 200, 500 waiting instances resumed concurrently
- timer-wait-resume: 50, 200 due timers
- subworkflow chain: 25, 100 parent-child chains
### Bulstrad Business

- short business flow: `QuoteOrAplCancel`
- medium transport flow: `InsisIntegrationNew`
- child-workflow flow: `QuotationConfirm`
- long review chain: `OpenForChangePolicy`
- print flow: `AssistantPrintInsisDocuments`
- cancellation flow: `AnnexCancellation`
### Failure Under Load

- 100 waiting instances, provider restart during drain
- 100 delayed messages, Oracle restart before due time
- 50 poison signals plus live replay traffic
- duplicate external signal storm against 50 waiting instances
- mixed task completions and signal resumes on same service instance set
## Concurrency Steps

Use explicit concurrency ladders instead of one arbitrary load value.

Recommended first ladder:

- 1
- 4
- 8
- 16
- 32

Use different ladders if the environment is too small, but always record:

- node count
- worker concurrency
- queue backlog size
- workflow count
- message mix
## Metrics Collection Design

The harness should persist results for every performance run.

Each result set should include:

- scenario name
- git commit or working tree marker
- test timestamp
- environment label
- node count
- concurrency level
- workflow count
- signal count
- Oracle queue names used
- measured latency summary
- throughput summary
- correctness summary
- process resource summary
- optional Oracle observations

Recommended output format:

- JSON artifact for machines
- short markdown summary for humans

Recommended location:

- `TestResults/workflow-performance/`
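An illustrative JSON artifact covering this field list; every value, the queue name, and the scenario label are hypothetical:

```json
{
  "scenario": "synthetic-signal-wait-resume",
  "commit": "workingtree-2026-03-17",
  "timestampUtc": "2026-03-17T02:15:00Z",
  "environment": "ci-oracle",
  "nodeCount": 1,
  "concurrency": 8,
  "workflowCount": 200,
  "signalCount": 200,
  "queues": ["WF_SIGNAL_Q"],
  "latencyMs": {
    "signalPublishToTaskVisible": { "avg": 41.2, "p50": 33, "p95": 88, "p99": 140, "max": 212 }
  },
  "throughput": { "signalsProcessedPerSec": 95.4 },
  "correctness": { "failed": 0, "deadLettered": 0, "stuck": 0, "runtimeConflicts": 3 }
}
```

Keeping the schema identical across backends is what later makes the `13-backend-comparison-<date>` pack a direct diff instead of a manual reconciliation.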
## Baseline Strategy

Do not hard-code aggressive latency thresholds before collecting stable data.

Use this sequence:

1. characterization phase

   Run each scenario several times on local Docker and CI Oracle.

2. baseline phase

   Record stable p50, p95, p99, throughput, and drain-rate envelopes.

3. gating phase

   Add coarse PR thresholds and tighter nightly regression detection.

PR thresholds should be:

- intentionally forgiving
- correctness-first
- designed to catch major regressions only

Nightly thresholds should be:

- baseline-relative
- environment-specific if necessary
- reviewed whenever Oracle container images or CI hardware changes
## Harness Design

The load harness should be separate from the normal fast integration suite.

Recommended structure:

- keep correctness-focused Oracle AQ tests in the current integration project
- add categorized performance tests with explicit categories such as:
  - `WorkflowPerfLatency`
  - `WorkflowPerfThroughput`
  - `WorkflowPerfSmoke`
  - `WorkflowPerfNightly`
  - `WorkflowPerfSoak`
  - `WorkflowPerfCapacity`
- keep scenario builders reusable so the same workflows and transports can be used in correctness and performance runs

The harness should include:

- scenario driver
- result collector
- metric aggregator
- optional Oracle observation collector
- artifact writer
- explicit phase-latency capture for start, signal publish, and signal-to-completion on the synthetic signal round-trip workload
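If the harness uses xUnit, the categories above map naturally onto traits, which makes tiers selectable from the test filter. A sketch; the test class and `PerfScenarioDriver` entry point are hypothetical harness pieces, not existing code:

```csharp
using System;
using System.Threading.Tasks;
using Xunit;

public class AqImmediateBurstPerfTests
{
    [Fact]
    [Trait("Category", "WorkflowPerfSmoke")]
    public async Task Immediate_burst_drains_within_coarse_envelope()
    {
        // PerfScenarioDriver is a hypothetical harness entry point; the real
        // harness would wire backend fixtures and scenario builders here.
        var result = await PerfScenarioDriver.RunImmediateBurstAsync(messageCount: 100);

        Assert.Equal(0, result.DeadLettered);                          // correctness-first gate
        Assert.True(result.DrainDuration < TimeSpan.FromSeconds(30));  // coarse PR ceiling
    }
}
```

A tier then runs as, for example, `dotnet test --filter "Category=WorkflowPerfSmoke"`.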
## Multi-Backend Expansion Rules

Once Oracle is the validated reference baseline, PostgreSQL and MongoDB must adopt the same load and performance structure instead of inventing backend-specific suites first.

Required rules:

- keep one shared scenario catalog for Oracle, PostgreSQL, and MongoDB
- compare backends first on normalized workflow metrics, not backend-native counters
- keep backend-native metrics as appendices, not as the headline result
- use the same tier names and artifact schema across all backends
- keep the same curated Bulstrad workload pack across all backends unless a workflow is backend-blocked by a real functional defect

The shared artifact set should ultimately include:

- `10-oracle-performance-baseline-<date>.md/.json`
- `11-postgres-performance-baseline-<date>.md/.json`
- `12-mongo-performance-baseline-<date>.md/.json`
- `13-backend-comparison-<date>.md/.json`
The shared normalized metrics are:

- serial end-to-end latency
- start-to-first-task latency
- signal-publish-to-visible-resume latency
- steady-state throughput
- capacity ladder at `c1`, `c4`, `c8`, and `c16`
- backlog drain time
- failures
- dead letters
- runtime conflicts
- stuck instances

Backend-native appendices should include:

- Oracle:
  - AQ browse depth
  - `V$SYSSTAT` deltas
  - `V$SYS_TIME_MODEL` deltas
  - top wait deltas
- PostgreSQL:
  - queue-table depth
  - `pg_stat_database`
  - `pg_stat_statements`
  - lock and wait observations
  - WAL pressure observations
- MongoDB:
  - signal collection depth
  - `serverStatus` counters
  - transaction counters
  - change-stream wake observations
  - lock percentage observations
## Oracle-Specific Observation Plan

For Oracle-backed runs, observe both the engine and the database.

At minimum, record:

- AQ browse depth before, during, and after the run
- count of runtime-state rows touched
- count of task and task-event rows created
- number of dead-lettered signals
- duplicate/stale resume ignore count

If the environment allows deeper Oracle access, also record:

- session count for the service user
- top wait classes during the run
- lock waits on workflow tables
- statement time for key mutation queries
## Exit Criteria

The load and performance work is complete when:

- PR smoke scenarios are stable and cheap enough to run continuously
- nightly characterization produces persisted metrics and useful regression signal
- at least one weekly soak run is stable without correctness drift
- representative Bulstrad families have measured latency and throughput envelopes
- Oracle restart, provider restart, DLQ replay, and duplicate-delivery scenarios are all characterized under load
- the team can state a first production sizing recommendation for one-node and multi-node deployments
## Next Sprint Shape

This plan maps naturally to a dedicated sprint focused on:

- performance harness infrastructure
- synthetic scenario library
- representative Bulstrad workload runner
- metrics artifact generation
- baseline capture
- first capacity report
806
docs/workflow/engine/09-backend-portability-plan.md
Normal file
@@ -0,0 +1,806 @@
# 09. Backend Portability Plan

## Purpose

This document defines how `SerdicaEngine` should evolve from an Oracle-first runtime into a backend-switchable engine that can also run on PostgreSQL and MongoDB without changing workflow declarations, canonical definitions, or runtime semantics.

The goal is not to support every backend in the same way internally.

The goal is to preserve one stable engine contract:

- the same declarative workflow classes
- the same canonical runtime definitions
- the same public workflow/task APIs
- the same runtime behavior around tasks, waits, timers, external signals, subworkflows, retries, and retention

Backend switching must only change infrastructure adapters and host configuration.

## Current Baseline

Today the strongest backend shape is Oracle:

- runtime state persists in an Oracle-backed runtime-state adapter
- projections persist in an Oracle-backed projection adapter
- immediate signaling and delayed scheduling run through Oracle AQ adapters
- the engine host composes those adapters through backend registration

Oracle is the reference implementation because it already gives:

- one durable database
- durable queueing
- delayed delivery
- blocking dequeue without polling
- transactional coupling between state mutation and queue enqueue

That reference point matters because PostgreSQL and MongoDB must match the engine contract even if they reach it through different infrastructure mechanisms.
## Non-Negotiable Product Rules

Backend portability must not break these rules:

1. Authored workflow classes remain pure declaration classes.
2. Canonical runtime definitions remain backend-agnostic.
3. Engine execution remains run-to-wait.
4. Multi-instance deployment remains supported.
5. Steady-state signal and timer discovery must not rely on polling loops.
6. Signal delivery remains at-least-once.
7. Resume remains idempotent through version and waiting-token checks.
8. Public API contracts and projections remain stable.
9. Operational features remain available:
   - signal raise
   - dead-letter inspection
   - dead-letter replay
   - runtime inspection
   - retention
   - diagram inspection
## Architecture Principle

Do not make the engine "database-agnostic" by hiding everything behind one giant repository.

That approach will collapse important guarantees.

Instead, separate the backend into explicit capabilities:

1. runtime state persistence
2. projection persistence
3. signal transport
4. schedule transport
5. mutation transaction boundary
6. wake-up notification strategy
7. lease or concurrency strategy
8. dead-letter and replay strategy
9. retention and purge strategy

Each backend implementation must satisfy the full capability matrix.
## Implemented Signal Driver Split

The engine now separates durable signal ownership from wake-up delivery.

The shared seam is defined by engine signal-driver abstractions plus signal and schedule bridge contracts.

That split exists to preserve transactional correctness while still allowing faster wake strategies later.

The separation is:

- `IWorkflowSignalStore`: durable immediate signal persistence
- `IWorkflowSignalDriver`: wake-up and claim path for available signals
- `IWorkflowSignalScheduler`: durable delayed-signal persistence
- `IWorkflowWakeOutbox`: deferred wake publication when the driver is not transaction-coupled to the durable store

The public engine surface still uses:

- `IWorkflowSignalBus`
- `IWorkflowScheduleBus`

Those are now bridge contracts.

They do not define backend mechanics directly.
### Current Backend Matrix

| Backend profile | Durable signal store | Wake driver | Schedule store | Dispatch mode |
|-----------------|----------------------|-------------|----------------|---------------|
| Oracle | Oracle AQ signal adapter | Oracle AQ blocking dequeue | Oracle AQ schedule adapter | `NativeTransactional` |
| PostgreSQL | PostgreSQL durable signal store | PostgreSQL native wake or claim adapter | PostgreSQL durable schedule store | `NativeTransactional` |
| MongoDB | MongoDB durable signal store | MongoDB change-stream wake or claim adapter | MongoDB durable schedule store | `NativeTransactional` |
### Implemented Optional Redis Wake Driver

The Redis driver is implemented as a separate wake-driver plugin.

Its shape is intentionally narrow:

- Oracle, PostgreSQL, and MongoDB remain the durable signal stores.
- Oracle, PostgreSQL, and MongoDB persist durable signals transactionally.
- Redis receives wake hints directly after commit through the mutation scope post-commit hook.
- workers wake through Redis and then claim from the durable backend store.

Oracle is now supported in this combination, but it is not the preferred Oracle profile.
Oracle native AQ wake remains the default because it is slightly faster in the current measurements and keeps the cleanest native timer and dequeue path.

Redis on Oracle exists for topology consistency, not because Oracle needs Redis for correctness or because it is the current fastest Oracle path.
### Redis Driver Rules

Redis must remain a wake-driver plugin, not the authoritative durable signal queue for mixed backends.

The intended shape is:

- Oracle, PostgreSQL, or MongoDB remains the durable `IWorkflowSignalStore`
- Redis becomes an `IWorkflowSignalDriver`
- the Redis wake hint is published directly after the durable store transaction commits
- backend-native wake drivers are not active when Redis is selected

That preserves the required correctness model:

1. persist runtime state, projections, and durable signal inside the backend mutation boundary
2. commit the mutation boundary
3. publish the Redis wake hint from the registered post-commit action
4. wake workers and claim from the durable backend store
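The post-commit publication step can be sketched with StackExchange.Redis; the channel name and the way the hook is registered are illustrative assumptions, not existing engine code:

```csharp
using System.Threading.Tasks;
using StackExchange.Redis;

public sealed class RedisWakeHintPublisher
{
    private readonly ISubscriber _subscriber;

    public RedisWakeHintPublisher(IConnectionMultiplexer redis) =>
        _subscriber = redis.GetSubscriber();

    // Registered as the mutation scope's post-commit action: it runs only after
    // the durable signal has committed in Oracle, PostgreSQL, or MongoDB, so a
    // lost hint degrades latency but never correctness.
    public Task PublishAsync(string workflowInstanceId) =>
        _subscriber.PublishAsync(
            RedisChannel.Literal("workflow:wake"),   // hypothetical channel name
            workflowInstanceId,
            CommandFlags.FireAndForget);
}
```

Fire-and-forget is deliberate: workers that miss the hint still find the signal row on the next wake or during startup recovery.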
`IWorkflowWakeOutbox` remains in the abstraction set for future non-Redis wake drivers that may still need deferred publication, but it is not the active Redis hot path.

Redis may improve signal-to-resume latency, especially for PostgreSQL and MongoDB where the durable store and the wake path are already split cleanly.

Redis must not become the correctness layer unless the whole durable signal model also moves there, which is not the design target of this engine.
## Required Capability Matrix

Every engine backend profile must define concrete answers for the following:

| Capability | Oracle | PostgreSQL | MongoDB |
|-----------|--------|------------|---------|
| Runtime state durability | Native | Required | Required |
| Projection durability | Native | Required | Required |
| Optimistic concurrency | Row/version | Row/version | Document version |
| Immediate signal durability | AQ | Queue table or queue extension | Signal collection |
| Delayed scheduling | AQ delayed delivery | Durable due-message table | Durable due-message collection |
| Blocking wake-up | AQ dequeue | `LISTEN/NOTIFY`, Redis wake driver, or dedicated queue worker | Change streams or Redis wake driver |
| Atomic state + signal publish | Native DB transaction | Outbox transaction | Transactional outbox or equivalent |
| Dead-letter support | AQ + table | Queue/DLQ table | DLQ collection |
| Multi-node safety | DB + AQ | DB + wake hints | DB + change stream / wake hints |
| Restart recovery | Native | Required | Required |

The backend is not complete until every row has a real implementation.
## Engine Backend Layers

The switchable backend model should be built around these interfaces.

### 1. Runtime State Store

Responsible for:

- loading runtime snapshot by workflow instance id
- inserting new snapshot
- updating snapshot with expected version
- querying runtime status for operational needs
- storing engine-specific snapshot JSON

Target interface shape:

```csharp
public interface IWorkflowRuntimeStateStore
{
    Task<WorkflowRuntimeStateRecord?> GetAsync(string workflowInstanceId, CancellationToken ct = default);
    Task InsertAsync(WorkflowRuntimeStateRecord record, CancellationToken ct = default);
    Task UpdateAsync(
        WorkflowRuntimeStateRecord record,
        long expectedVersion,
        CancellationToken ct = default);
}
```

Notes:

- Oracle and PostgreSQL should use explicit version columns.
- MongoDB should use a document version field and conditional update filter.
- This store must not also own signal publishing logic.
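The MongoDB conditional-update note can be sketched with the official C# driver; the record properties (`WorkflowInstanceId`, `Version`) and the conflict handling are illustrative assumptions:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using MongoDB.Driver;

public sealed class MongoRuntimeStateUpdater
{
    // Replaces the snapshot only when the stored version still matches the
    // expected version, mirroring the relational version-column check.
    public async Task UpdateAsync(
        IMongoCollection<WorkflowRuntimeStateRecord> states,
        WorkflowRuntimeStateRecord record,
        long expectedVersion,
        CancellationToken ct = default)
    {
        var builder = Builders<WorkflowRuntimeStateRecord>.Filter;
        var filter = builder.Eq(r => r.WorkflowInstanceId, record.WorkflowInstanceId)
                   & builder.Eq(r => r.Version, expectedVersion);

        var result = await states.ReplaceOneAsync(
            filter, record with { Version = expectedVersion + 1 }, cancellationToken: ct);

        if (result.ModifiedCount == 0)
            throw new InvalidOperationException("stale runtime state version"); // concurrency conflict
    }
}
```

A zero-modified result is the document-store equivalent of an optimistic-lock failure and should surface as the same runtime concurrency conflict the relational adapters raise.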
### 2. Projection Store

Responsible for:

- workflow instance summaries
- task summaries
- task event history
- business reference lookup
- support read APIs

The projection model is product-facing and must remain stable.

That means:

- the shape of projection records must not depend on the backend
- only the persistence adapter may change

Target direction:

- split the current projection application service into a backend-neutral application service plus backend adapters
- keep one projection contract
- allow Oracle and PostgreSQL to stay relational
- allow MongoDB to project into document collections if needed
### 3. Signal Bus

Responsible for durable immediate signals:

- internal continue
- external signal
- task completion continuation
- subworkflow completion
- replay from dead-letter

The current contract already exists in the engine runtime abstractions.

Required guarantees:

- at-least-once delivery
- ack only after successful processing
- delivery count visibility
- explicit abandon
- explicit dead-letter move
- replay support
### 4. Schedule Bus

Responsible for durable delayed delivery:

- timer due
- retry due
- delayed continuation

Required guarantees:

- message is not lost across process restart
- message becomes visible at or after due time
- stale due messages are safely ignored through waiting tokens
- schedule and immediate signal semantics use the same envelope model
### 5. Mutation Transaction Boundary

This is the most important portability seam.

The engine mutates three things together:

- runtime state
- projections
- signals or schedules

Oracle can do that in one database transaction because state, projections, and AQ live inside the same durable boundary.

PostgreSQL and MongoDB may require an outbox-based boundary instead.

This must be explicit:

```csharp
public interface IWorkflowMutationCoordinator
{
    Task ExecuteAsync(
        Func<IWorkflowMutationContext, CancellationToken, Task> action,
        CancellationToken ct = default);
}
```

Where the mutation context exposes:

- runtime state adapter
- projection adapter
- signal outbox writer
- schedule outbox writer

Do not let the runtime service hand-roll transaction logic per backend.
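A sketch of how engine logic would consume this seam; the context member names and the `EnqueueAsync` writer method are illustrative, not existing contracts:

```csharp
public interface IWorkflowMutationContext
{
    IWorkflowRuntimeStateStore RuntimeState { get; }       // member names are assumptions
    IWorkflowProjectionWriter Projections { get; }
    IWorkflowSignalOutboxWriter SignalOutbox { get; }
    IWorkflowScheduleOutboxWriter ScheduleOutbox { get; }
}

public sealed class TaskCompletionHandler
{
    private readonly IWorkflowMutationCoordinator _coordinator;

    public TaskCompletionHandler(IWorkflowMutationCoordinator coordinator) =>
        _coordinator = coordinator;

    // Engine logic sees one callback and one boundary; the coordinator decides
    // whether that is a native transaction (Oracle) or an outbox transaction.
    public Task CompleteAsync(
        WorkflowRuntimeStateRecord snapshot, long expectedVersion, object continueSignal) =>
        _coordinator.ExecuteAsync(async (ctx, ct) =>
        {
            await ctx.RuntimeState.UpdateAsync(snapshot, expectedVersion, ct);
            await ctx.SignalOutbox.EnqueueAsync(continueSignal, ct);
        });
}
```

The point of the sketch is the shape: backend-specific commit mechanics never leak into the handler.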
### 6. Wake-Up Notifier

The engine must not scan due rows in a steady loop.

That means every backend needs a wake-up channel:

- Oracle: AQ blocking dequeue
- PostgreSQL: `LISTEN/NOTIFY` as wake hint for durable queue tables
- MongoDB: change streams as wake hint for durable signal collections

The wake-up channel is not the durable source of truth except in Oracle AQ.

It is only the wake mechanism.

That distinction is mandatory for PostgreSQL and MongoDB.
## Backend Profiles

## Oracle Profile

### Role

Oracle remains the reference backend profile and the operational default.

### Storage Model

- runtime state table
- relational projection tables
- AQ signal queue
- AQ schedule queue or delayed signal queue
- DLQ table and AQ-assisted replay

### Commit Model

- one transaction for runtime state, projections, and AQ enqueue

### Wake Model

- AQ blocking dequeue

### Advantages

- strongest correctness story
- simplest atomic mutation model
- no extra wake layer required

### Risks

- Oracle-specific infrastructure coupling
- AQ operational expertise required
- portability work must not assume AQ-only features in engine logic

Oracle should be treated as the semantic gold standard that other backends must match.
## PostgreSQL Profile

### Goal

Provide a backend profile that preserves engine semantics using PostgreSQL as the durable system of record.

### Recommended Shape

- runtime state in PostgreSQL tables
- projections in PostgreSQL tables
- durable signal queue table
- durable schedule queue table
- DLQ table
- `LISTEN/NOTIFY` for wake-up hints only

### Why Not `LISTEN/NOTIFY` Alone

`LISTEN/NOTIFY` is not sufficient as the durable signal layer because notifications are ephemeral.

The durable truth must stay in tables.

The recommended model is:

1. insert durable signal row in the same transaction as state/projection mutation
2. emit `NOTIFY` before commit or immediately after durable insert
3. workers wake up and claim rows from the signal queue table
4. if notification is missed, the next notification or startup recovery still finds the rows
### Queue Claim Strategy

Recommended queue-claim pattern:

- `FOR UPDATE SKIP LOCKED`
- ordered by available time, priority, and creation time
- delivery count increment on claim
- explicit ack by state transition or delete
- explicit dead-letter move after delivery limit
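A minimal sketch of this claim pattern, assuming illustrative table and column names (`workflow_signals`, `priority`, `created_at_utc` are assumptions taken from the ordering rules above, not an actual schema):

```sql
-- Hypothetical claim query; table and column names are illustrative.
-- Locks one due row, skips rows claimed by other workers, and
-- increments the delivery count in the same statement.
WITH claimed AS (
    SELECT id
    FROM workflow_signals
    WHERE status = 'Ready'
      AND available_at_utc <= now()
    ORDER BY available_at_utc, priority, created_at_utc
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
UPDATE workflow_signals s
SET status = 'Claimed',
    delivery_count = s.delivery_count + 1,
    claimed_at_utc = now()
FROM claimed
WHERE s.id = claimed.id
RETURNING s.*;
```

Ack then deletes the row or moves it to a terminal status; a row that exceeds the delivery limit is moved to the dead-letter table instead of being re-claimed.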
### Schedule Strategy

Recommended schedule table:

- `signal_id`
- `available_at_utc`
- `workflow_instance_id`
- `runtime_provider`
- `signal_type`
- serialized payload
- delivery count
- dead-letter metadata

Recommended wake-up path:

- durable insert into schedule table
- `NOTIFY workflow_signal`
- workers wake and attempt claim of rows with `available_at_utc <= now()`

This is still not "polling" if workers block on `LISTEN` and only do bounded claim attempts on wake-up, startup, and recovery events.
### Atomicity Model

PostgreSQL cannot rely on an external broker if we want the same atomicity guarantees.

The cleanest profile is:

- database state
- database projections
- database signal queue
- database schedule queue
- `NOTIFY` as non-durable wake hint

That keeps the entire correctness boundary in PostgreSQL.
### Operational Notes

Need explicit handling for:

- orphan claimed rows after node crash
- reclaim timeout
- dead-letter browsing and replay
- table bloat and retention
- index strategy for due rows
### Suggested Components

- `PostgresWorkflowRuntimeStateStore`
- `PostgresWorkflowProjectionStore`
- `PostgresWorkflowSignalQueue`
- `PostgresWorkflowScheduleQueue`
- `PostgresWorkflowWakeListener`
- `PostgresWorkflowMutationCoordinator`
## MongoDB Profile

### Goal

Provide a backend profile that preserves engine semantics using MongoDB as the durable system of record.

### Recommended Shape

- runtime state in a `workflow_runtime_states` collection
- projections in dedicated collections
- durable `workflow_signals` collection
- durable `workflow_schedules` collection
- dead-letter collection
- change streams for wake-up hints

### Why Change Streams Are Not Enough

Change streams are a wake mechanism, not the durable queue itself.

The durable truth must remain in collections so the engine can recover after:

- service restart
- watcher restart
- temporary connectivity loss
### Document Model

Signal document fields should include:

- `_id`
- `workflowInstanceId`
- `runtimeProvider`
- `signalType`
- `waitingToken`
- `expectedVersion`
- `dueAtUtc`
- `status`
- `deliveryCount`
- `claimedBy`
- `claimedAtUtc`
- `deadLetterReason`
- `payload`
### Claim Strategy

Recommended model:

- atomically claim one available document with `findOneAndUpdate`
- filter by:
  - `status = Ready`
  - `dueAtUtc <= now`
  - not already claimed
- set:
  - `status = Claimed`
  - `claimedBy`
  - `claimedAtUtc`
- increment `deliveryCount`

Ack means:

- delete the signal or mark it completed

Abandon means:

- move back to `Ready`

Dead-letter means:

- move to DLQ collection or set `status = DeadLetter`
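The claim step above reduces to a filter/update pair handed to `findOneAndUpdate`. The sketch below builds those documents as plain objects so the shape is explicit; field names follow the document model above, while `buildClaimOperation` and the `workerId` parameter are illustrative assumptions:

```javascript
// Build the findOneAndUpdate arguments for claiming one due signal.
// Field names follow the signal document model; the function name and
// workerId parameter are assumptions for illustration.
function buildClaimOperation(workerId, now) {
  return {
    filter: {
      status: "Ready",
      dueAtUtc: { $lte: now },   // only due signals
      claimedBy: null,           // not already claimed
    },
    update: {
      $set: { status: "Claimed", claimedBy: workerId, claimedAtUtc: now },
      $inc: { deliveryCount: 1 },
    },
    // oldest due signal first; return the claimed document
    options: { sort: { dueAtUtc: 1 }, returnDocument: "after" },
  };
}
```

With the official driver this would be invoked as `collection.findOneAndUpdate(op.filter, op.update, op.options)`; the single-document atomicity of `findOneAndUpdate` is what prevents two workers from claiming the same signal.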
### Schedule Strategy

Two reasonable models exist.

#### Model A: Separate Schedule Collection

- keep delayed signals in `workflow_schedules`
- promote due documents into `workflow_signals`
- wake workers through change streams

This is simpler conceptually but adds one extra movement step.

#### Model B: Unified Signal Collection

- store all signals in one collection
- use `dueAtUtc` and `status`
- workers claim only due documents

This is the better v1 choice because it keeps one signal envelope pipeline.
### Atomicity Model

MongoDB can support multi-document transactions in replica-set mode.

That means the preferred model is:

- runtime state
- projections
- signal collection writes
- schedule writes

all inside one MongoDB transaction.

If that operational assumption is unacceptable, then MongoDB is not a correctness-grade replacement for the Oracle profile and should not be offered as a production engine backend.
### Wake Model

Use change streams to avoid steady-state polling:

- watch inserts and state transitions for ready or due signals
- on startup, run a bounded recovery sweep for unclaimed ready signals
- on worker restart, resume from durable signal documents, not from missed change stream events
### Operational Notes

Need explicit handling for:

- resume token persistence for observers
- claimed-document recovery after node failure
- shard-key implications if sharding is introduced later
- transactional prerequisites in local and CI test environments
### Suggested Components

- `MongoWorkflowRuntimeStateStore`
- `MongoWorkflowProjectionStore`
- `MongoWorkflowSignalStore`
- `MongoWorkflowWakeStreamListener`
- `MongoWorkflowMutationCoordinator`
## Backend Selection Model

The engine should not expose dozens of independent switches in appsettings.

Use one backend profile section plus internal composition.

Recommended shape:

```json
{
  "WorkflowEngine": {
    "BackendProfile": "Oracle"
  }
}
```

And then backend-specific option sections:

```json
{
  "WorkflowBackend:Oracle": {
    "ConnectionString": "...",
    "QueueOwner": "SRD_WFKLW",
    "SignalQueueName": "WF_SIGNAL_Q",
    "DeadLetterQueueName": "WF_SIGNAL_DLQ"
  },
  "WorkflowBackend:PostgreSql": {
    "ConnectionString": "...",
    "SignalTable": "workflow_signals",
    "ScheduleTable": "workflow_schedules",
    "DeadLetterTable": "workflow_signal_dead_letters",
    "NotificationChannel": "workflow_signal"
  },
  "WorkflowBackend:MongoDb": {
    "ConnectionString": "...",
    "DatabaseName": "serdica_workflow",
    "SignalCollection": "workflow_signals",
    "RuntimeStateCollection": "workflow_runtime_states",
    "ProjectionPrefix": "workflow"
  }
}
```

The DI layer should map `BackendProfile` to one complete backend package, not a mix-and-match set of partial adapters.

That avoids unsupported combinations like:

- Oracle state + Mongo signals
- PostgreSQL state + Redis schedule

unless they are designed explicitly as a later profile.
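The profile-to-package mapping can be as small as one switch in the composition root. This is a hypothetical sketch; the extension method names are assumptions, and each one would register the complete backend package for its profile:

```csharp
// Hypothetical composition root; the Add*WorkflowBackend method
// names are illustrative assumptions, not an existing API.
public static class WorkflowBackendComposition
{
    public static IServiceCollection AddWorkflowBackend(
        this IServiceCollection services, string backendProfile) =>
        backendProfile switch
        {
            "Oracle" => services.AddOracleWorkflowBackend(),
            "PostgreSql" => services.AddPostgresWorkflowBackend(),
            "MongoDb" => services.AddMongoWorkflowBackend(),
            _ => throw new InvalidOperationException(
                $"Unknown workflow backend profile '{backendProfile}'.")
        };
}
```

Because the switch selects one package, partial combinations such as Oracle state with Mongo signals simply cannot be configured.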
## Implementation Refactor Needed

To make the backend switch clean, the current Oracle-first host should be refactored in this order.

### Phase 1: Split Projection Persistence

Refactor the current projection application service into:

- projection application service
- backend-neutral projection contract
- Oracle implementation

Then add backend implementations later without changing the application service.
### Phase 2: Introduce Dedicated Backend Plugin Registration

Add:

```csharp
public interface IWorkflowBackendRegistrationMarker
{
    string BackendName { get; }
}
```

Then create dedicated backend plugins for:

- Oracle
- PostgreSQL
- MongoDB

The host should remain backend-neutral and validate that the selected backend plugin has registered itself.

Each backend plugin should own registration of:

- runtime state store
- projection store
- mutation coordinator
- signal bus
- schedule bus
- dead-letter store
- backend-specific options and wake-up strategy
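A backend plugin would then announce itself through the marker while registering its full package. The sketch below is hypothetical; the concrete Oracle store and coordinator type names are illustrative assumptions following the naming pattern of the suggested components above:

```csharp
// Hypothetical Oracle backend plugin; all concrete type names
// are illustrative assumptions.
public sealed class OracleWorkflowBackendPlugin : IWorkflowBackendRegistrationMarker
{
    public string BackendName => "Oracle";
}

public static class OracleWorkflowBackendRegistration
{
    public static IServiceCollection AddOracleWorkflowBackend(
        this IServiceCollection services) =>
        services
            // marker lets the host validate that a backend registered itself
            .AddSingleton<IWorkflowBackendRegistrationMarker, OracleWorkflowBackendPlugin>()
            .AddSingleton<IWorkflowRuntimeStateStore, OracleWorkflowRuntimeStateStore>()
            .AddSingleton<IWorkflowProjectionStore, OracleWorkflowProjectionStore>()
            .AddSingleton<IWorkflowMutationCoordinator, OracleWorkflowMutationCoordinator>();
}
```

At startup the host resolves `IWorkflowBackendRegistrationMarker` and fails fast if its `BackendName` does not match the configured profile.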
### Phase 3: Move Transaction Logic Into Backend Coordinator

Refactor the current workflow mutation transaction scope so the runtime service no longer knows whether the backend uses:

- direct database transaction
- database transaction plus outbox
- document transaction

The runtime service should only ask for one mutation boundary.
### Phase 4: Normalize Dead-Letter Model

Standardize a backend-neutral dead-letter record so the operational endpoints do not care which backend stores it.

That includes:

- signal id
- workflow instance id
- signal type
- first failure time
- last failure time
- delivery count
- last error
- payload snapshot
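The field list above maps naturally onto a small immutable contract. This is a hedged sketch; the record name and property types are assumptions:

```csharp
// Hypothetical backend-neutral dead-letter record mirroring the
// field list above; name and types are illustrative assumptions.
public sealed record WorkflowDeadLetterRecord(
    string SignalId,
    string WorkflowInstanceId,
    string SignalType,
    DateTimeOffset FirstFailureAtUtc,
    DateTimeOffset LastFailureAtUtc,
    int DeliveryCount,
    string LastError,
    string PayloadSnapshot);
```

Each backend's dead-letter store materializes this record from its own table or collection, so browsing and replay endpoints stay backend-agnostic.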
### Phase 5: Introduce Backend Conformance Tests

Every backend must pass the same contract suite:

- state insert/update/version conflict
- task activation and completion
- timer due resume
- external signal resume
- subworkflow completion resume
- duplicate delivery safety
- restart recovery
- dead-letter move and replay
- retention and purge

Oracle should remain the first backend to pass the full suite.

PostgreSQL and MongoDB are not ready until they pass the same suite.
## Backend-Specific Risks

## PostgreSQL Risks

- row-level queue claim logic can create hot indexes under high throughput
- `LISTEN/NOTIFY` payloads are not durable
- reclaim and retry logic must be designed carefully to avoid stuck claimed rows
- due-row access patterns must be tuned with indexes and partitioning if volume grows

## MongoDB Risks

- production-grade correctness depends on replica-set transactions
- change streams add operational requirements and resume-token handling
- projection queries may become more complex if the read model is heavily relational today
- collection growth and retention strategy must be explicit early

## Oracle Risks

- Oracle remains the strongest correctness model but the least portable implementation
- engine logic must not drift toward AQ-only assumptions that other backends cannot model
## Recommended Rollout Order

Do not build PostgreSQL and MongoDB in parallel first.

Use this order:

1. stabilize Oracle as the contract baseline
2. refactor the host into a true backend-plugin model
3. implement the PostgreSQL profile
4. pass the full backend conformance suite on PostgreSQL
5. implement the MongoDB profile only if there is a real product need for MongoDB as the system of record

PostgreSQL should come before MongoDB because:

- its runtime-state and projection models are closer to the current Oracle design
- its transaction semantics fit the engine more naturally
- the read-side model is already relational
## Validation Order After Functional Backend Completion

Functional backend completion is not the same as backend readiness.

After a backend can start, resume, signal, schedule, and retain workflows, the next required order is:

1. backend-neutral hostile-condition coverage
2. curated Bulstrad parity coverage
3. backend-neutral performance tiers
4. backend-specific baseline publication
5. final three-backend comparison

This means:

- PostgreSQL is not done when its basic stores and buses compile; it must also match the Oracle hostile-condition and Bulstrad suites
- MongoDB is not done when replica-set transactions and signal delivery work; it must also match the same parity and performance suites
- the final adoption decision should be based on the shared comparison pack, not on isolated backend microbenchmarks
## Proposed Sprint

## Sprint 14: Backend Portability And Store Profiles

### Goal

Turn the Oracle-first engine into a backend-switchable engine with one selected backend profile per deployment.

### Scope

- introduce backend profile abstraction and dedicated backend plugin registration
- split projection persistence from the current Oracle-first application service
- formalize mutation coordinator abstraction
- add backend-neutral dead-letter contract
- define and implement backend conformance suite
- implement PostgreSQL profile
- design MongoDB profile in executable detail, with implementation only after explicit product approval

### Deliverables

- `IWorkflowBackendRegistrationMarker`
- backend-neutral projection contract
- backend-neutral mutation coordinator contract
- backend conformance test suite
- dedicated Oracle, PostgreSQL, and MongoDB backend plugin projects
- architecture-ready MongoDB backend plugin design package

### Exit Criteria

- host selects one backend profile by configuration
- host stays backend-neutral and does not resolve Oracle/PostgreSQL directly
- Oracle and PostgreSQL pass the same conformance suite
- MongoDB path is specified well enough that implementation is a bounded engineering task
- workflow declarations and canonical definitions remain unchanged across backend profiles
## Final Rule

Backend switching is an infrastructure concern, not a workflow concern.

If a future backend requires changing workflow declarations, canonical definitions, or engine semantics, that backend does not fit the architecture and should not be adopted without a new ADR.
1855
docs/workflow/engine/10-oracle-performance-baseline-2026-03-17.json
Normal file
File diff suppressed because it is too large
Load Diff
@@ -0,0 +1,200 @@
# Oracle Performance Baseline 2026-03-17

## Purpose

This document captures the current Oracle-backed load and performance baseline for the Serdica workflow engine. It is the reference point for later PostgreSQL and MongoDB backend comparisons.

The durable machine-readable companion is [10-oracle-performance-baseline-2026-03-17.json](10-oracle-performance-baseline-2026-03-17.json).

## Run Metadata

- Date: `2026-03-17`
- Test command:
  - integration performance suite filtered to `OracleAqPerformance`
- Suite result:
  - `12/12` tests passed
  - total wall-clock time: `2 m 40 s`
- Raw artifact directory:
  - `TestResults/workflow-performance/`
- Oracle environment:
  - Docker image: `gvenzl/oracle-free:23-slim`
  - instance: `FREE`
  - version: `23.0.0.0.0`
  - AQ backend: Oracle AQ with pooled connections and retry-hardened setup

## Scenario Summary

| Scenario | Tier | Ops | Conc | Duration ms | Throughput/s | Avg ms | P95 ms | Max ms |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| `oracle-aq-signal-roundtrip-capacity-c1` | `WorkflowPerfCapacity` | 16 | 1 | 4752.27 | 3.37 | 4257.28 | 4336.38 | 4359.67 |
| `oracle-aq-signal-roundtrip-capacity-c4` | `WorkflowPerfCapacity` | 64 | 4 | 4205.24 | 15.22 | 3926.42 | 3988.33 | 3994.74 |
| `oracle-aq-signal-roundtrip-capacity-c8` | `WorkflowPerfCapacity` | 128 | 8 | 5998.88 | 21.34 | 5226.56 | 5561.22 | 5605.59 |
| `oracle-aq-signal-roundtrip-capacity-c16` | `WorkflowPerfCapacity` | 256 | 16 | 7523.47 | 34.03 | 6551.81 | 6710.05 | 6721.81 |
| `oracle-aq-signal-roundtrip-latency-serial` | `WorkflowPerfLatency` | 16 | 1 | 49755.52 | 0.32 | 3104.85 | 3165.04 | 3232.40 |
| `oracle-aq-bulstrad-quotation-confirm-convert-to-policy-nightly` | `WorkflowPerfNightly` | 12 | 4 | 6761.14 | 1.77 | 5679.63 | 6259.65 | 6276.32 |
| `oracle-aq-delayed-burst-nightly` | `WorkflowPerfNightly` | 48 | 1 | 4483.42 | 10.71 | 3908.13 | 3978.47 | 3991.75 |
| `oracle-aq-immediate-burst-nightly` | `WorkflowPerfNightly` | 120 | 1 | 2391.29 | 50.18 | 902.17 | 1179.59 | 1207.44 |
| `oracle-aq-synthetic-external-resume-nightly` | `WorkflowPerfNightly` | 36 | 8 | 6793.73 | 5.30 | 6238.80 | 6425.95 | 6466.75 |
| `oracle-aq-bulstrad-quote-or-apl-cancel-smoke` | `WorkflowPerfSmoke` | 10 | 4 | 507.79 | 19.69 | 28.54 | 40.05 | 42.93 |
| `oracle-aq-delayed-burst-smoke` | `WorkflowPerfSmoke` | 12 | 1 | 4202.91 | 2.86 | 4040.62 | 4083.70 | 4084.12 |
| `oracle-aq-immediate-burst-smoke` | `WorkflowPerfSmoke` | 24 | 1 | 421.48 | 56.94 | 205.87 | 209.90 | 210.16 |
| `oracle-aq-synthetic-external-resume-smoke` | `WorkflowPerfSmoke` | 12 | 4 | 3843.39 | 3.12 | 3644.91 | 3691.31 | 3696.92 |
| `oracle-aq-signal-roundtrip-soak` | `WorkflowPerfSoak` | 108 | 8 | 27620.16 | 3.91 | 4494.29 | 5589.33 | 5595.04 |
| `oracle-aq-signal-roundtrip-throughput-parallel` | `WorkflowPerfThroughput` | 96 | 16 | 4575.99 | 20.98 | 4142.13 | 4215.64 | 4233.33 |

## Measurement Split

The synthetic signal round-trip workload is now measured in three separate ways so the numbers are not conflated:

- `oracle-aq-signal-roundtrip-latency-serial`: one workflow at a time, one signal worker, used as the single-instance latency baseline.
- `oracle-aq-signal-roundtrip-throughput-parallel`: `96` workflows, `16`-way workload concurrency, `8` signal workers, used as the steady-state throughput baseline.
- `oracle-aq-signal-roundtrip-capacity-c*`: batch-wave capacity ladder used to observe scaling and pressure points.

This split matters because the old low `c1` figure was easy to misread. The useful baseline now is:

- serial latency baseline: `3104.85 ms` average end-to-end per workflow
- steady throughput baseline: `20.98 ops/s` with `16` workload concurrency and `8` signal workers
- capacity `c1`: `3.37 ops/s`; this is now just the smallest batch-wave rung, not the headline latency number
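The throughput figures above follow directly from operation count divided by wall-clock duration; a quick sanity check against two published rows:

```python
# Sanity-check published throughput = operations / duration
# for two scenario rows from the summary table above.
def throughput_per_second(operations: int, duration_ms: float) -> float:
    return operations / (duration_ms / 1000.0)

# oracle-aq-signal-roundtrip-capacity-c1: 16 ops in 4752.27 ms
c1 = throughput_per_second(16, 4752.27)
# oracle-aq-signal-roundtrip-throughput-parallel: 96 ops in 4575.99 ms
parallel = throughput_per_second(96, 4575.99)

print(round(c1, 2), round(parallel, 2))  # 3.37 20.98
```

Both values match the table, which confirms the throughput columns are derived from the duration columns rather than measured independently.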
### Serial Latency Baseline

| Phase | Avg ms | P95 ms | Max ms |
| --- | ---: | ---: | ---: |
| `start` | 25.14 | 41.92 | 45.27 |
| `signalPublish` | 16.57 | 31.39 | 50.72 |
| `signalToCompletion` | 3079.70 | 3128.56 | 3203.33 |

Interpretation:

- most of the serial latency is in `signalToCompletion`, not in start or signal publication
- start itself is cheap
- signal publication itself is also cheap

### Steady Throughput Baseline

| Phase | Avg ms | P95 ms | Max ms |
| --- | ---: | ---: | ---: |
| `start` | 63.15 | 122.10 | 134.45 |
| `signalPublish` | 18.61 | 25.99 | 29.06 |
| `signalToCompletion` | 3905.26 | 4007.86 | 4016.82 |

Interpretation:

- the engine sustained `20.98 ops/s` in a `96`-operation wave
- end-to-end average stayed at `4142.13 ms`
- start and signal publication remained small compared to the resume path
## Oracle Observations

### Dominant Waits

- `log file sync` was the top wait in `14/15` scenario artifacts. Commit pressure is still the main Oracle-side cost center for this engine profile.
- The only scenario with a different top wait was the heavier Bulstrad nightly flow:
  - `oracle-aq-bulstrad-quotation-confirm-convert-to-policy-nightly` -> `library cache lock`
- At higher concurrency the second-order waits become visible:
  - `resmgr:cpu quantum`
  - `row cache lock`
  - `buffer busy waits`

### Capacity Ladder

| Scenario | Throughput/s | P95 ms | User Commits | Session Logical Reads | Redo Size | DB Time | DB CPU | Top Wait |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | --- |
| `c1` | 3.37 | 4336.38 | 64 | 3609 | 232824 | 653630 | 403101 | `log file sync` |
| `c4` | 15.22 | 3988.33 | 256 | 19710 | 913884 | 1867747 | 1070601 | `log file sync` |
| `c8` | 21.34 | 5561.22 | 512 | 66375 | 1910412 | 14103899 | 2746786 | `log file sync` |
| `c16` | 34.03 | 6710.05 | 1024 | 229828 | 3796688 | 17605655 | 6083523 | `log file sync` |

Interpretation:

- The harness changes improved the ladder materially compared to the previous cut.
- `c1` moved to `3.37 ops/s` from the earlier `1.63 ops/s`, mostly because the harness no longer spends as much time in serial verifier tail behavior.
- `c16` reached `34.03 ops/s`, but it is also the first rung with clearly visible CPU scheduling and contention pressure.
- `c8` is still the last comfortable rung on this local Oracle Free setup.
### Transport Baselines

| Scenario | Throughput/s | User Commits | Session Logical Reads | Redo Size | Top Wait |
| --- | ---: | ---: | ---: | ---: | --- |
| `oracle-aq-immediate-burst-smoke` | 56.94 | 48 | 973 | 88700 | `log file sync` |
| `oracle-aq-immediate-burst-nightly` | 50.18 | 240 | 12426 | 451200 | `log file sync` |
| `oracle-aq-delayed-burst-smoke` | 2.86 | 24 | 566 | 52724 | `log file sync` |
| `oracle-aq-delayed-burst-nightly` | 10.71 | 96 | 3043 | 197696 | `log file sync` |

Interpretation:

- Immediate AQ transport remains much cheaper than full workflow resume.
- Delayed AQ transport is still dominated by the intentional delay window, not raw dequeue throughput.
### Business Flow Baselines

| Scenario | Throughput/s | Avg ms | User Commits | Session Logical Reads | Redo Size | Top Wait |
| --- | ---: | ---: | ---: | ---: | ---: | --- |
| `oracle-aq-bulstrad-quote-or-apl-cancel-smoke` | 19.69 | 28.54 | 10 | 3411 | 40748 | `log file sync` |
| `oracle-aq-bulstrad-quotation-confirm-convert-to-policy-nightly` | 1.77 | 5679.63 | 48 | 18562 | 505656 | `library cache lock` |

Interpretation:

- The short Bulstrad flow is still mostly transport-bound.
- The heavier `QuotationConfirm -> ConvertToPolicy` flow remains a useful real-workflow pressure baseline because it introduces parse and library pressure that the synthetic workloads do not.
### Soak Baseline

`oracle-aq-signal-roundtrip-soak` completed `108` operations at concurrency `8` with:

- throughput: `3.91 ops/s`
- average latency: `4494.29 ms`
- P95 latency: `5589.33 ms`
- `0` failures
- `0` dead-lettered signals
- `0` runtime conflicts
- `0` stuck instances

Oracle metrics for the soak run:

- `user commits`: `432`
- `user rollbacks`: `54`
- `session logical reads`: `104711`
- `redo size`: `1535580`
- `DB time`: `15394405`
- `DB CPU`: `2680492`
- top waits:
  - `log file sync`: `10904550 us`
  - `resmgr:cpu quantum`: `1573185 us`
  - `row cache lock`: `719739 us`
## What Must Stay Constant For Future Backend Comparisons

When PostgreSQL and MongoDB backends are benchmarked, keep these constant:

- same scenario names
- same operation counts
- same concurrency levels
- same worker counts for signal drain
- same synthetic workflow definitions
- same Bulstrad workflow families
- same correctness assertions

Compare these dimensions directly:

- throughput per second
- latency average, P95, P99, and max
- phase latency summaries for start, signal publish, and signal-to-completion on the synthetic signal round-trip workload
- failures, dead letters, runtime conflicts, and stuck instances
- commit count analogs
- logical read or document/row read analogs
- redo or WAL/journal write analogs
- dominant waits, locks, or contention classes
## First Sizing Note

On this local Oracle Free baseline:

- Oracle AQ immediate burst handling is comfortably above the small workflow tiers, but not at the earlier near-`100 ops/s` level on this latest run; the current nightly transport baseline is `50.18 ops/s`.
- The first clear saturation signal is still not transport dequeue itself, but commit pressure and then CPU scheduling pressure.
- The separated throughput baseline is the better reference for backend comparisons than the old low `c1` figure.
- `c8` remains the last comfortably scaling signal-roundtrip rung on this machine.
- `c16` is still correct and faster, but it is the first pressure rung, not the default deployment target.

This is a baseline, not a production commitment. PostgreSQL and MongoDB backend work should reuse the same scenarios and produce the same summary tables before any architectural preference is declared.
@@ -0,0 +1,213 @@
{
  "Date": "2026-03-17",
  "Workspace": "C:\\dev\\serdica-backend4",
  "TestCommand": "dotnet test src/Serdica/Ablera.Serdica.Workflow/__Tests/Ablera.Serdica.Workflow.IntegrationTests/Ablera.Serdica.Workflow.IntegrationTests.csproj -c Release --no-build --filter \"FullyQualifiedName~PostgresPerformance\"",
  "SuiteResult": {
    "Passed": 11,
    "Total": 11,
    "Duration": "2 m 16 s"
  },
  "RawArtifactDirectory": "src/Serdica/Ablera.Serdica.Workflow/__Tests/Ablera.Serdica.Workflow.IntegrationTests/bin/Release/net9.0/TestResults/workflow-performance/",
  "PostgresEnvironment": {
    "DockerImage": "postgres:16-alpine",
    "Database": "workflow",
    "Version": "PostgreSQL 16.13",
    "Backend": "Durable queue tables plus LISTEN/NOTIFY wake hints"
  },
  "MeasurementViews": {
    "SerialLatencyScenario": "postgres-signal-roundtrip-latency-serial",
    "SteadyThroughputScenario": "postgres-signal-roundtrip-throughput-parallel",
    "CapacityScenarioPrefix": "postgres-signal-roundtrip-capacity-"
  },
  "Notes": {
    "TopWaitCounts": [
      {
        "Name": "Client:ClientRead",
        "Count": 13
      },
      {
        "Name": "",
        "Count": 1
      }
    ],
    "Interpretation": [
      "Serial latency baseline and steady throughput baseline are separated from the capacity ladder.",
      "The capacity ladder still scales through c16 on this local PostgreSQL Docker setup.",
      "Immediate queue handling remains much cheaper than full workflow resume.",
      "The dominant observed backend state is client read waiting, not an obvious storage stall."
    ]
  },
  "Scenarios": [
    {
      "ScenarioName": "postgres-signal-roundtrip-capacity-c1",
      "Tier": "WorkflowPerfCapacity",
      "OperationCount": 16,
      "Concurrency": 1,
      "DurationMilliseconds": 3895.54,
      "ThroughputPerSecond": 4.11,
      "AverageLatencyMilliseconds": 3738.08,
      "P95LatencyMilliseconds": 3762.51,
      "MaxLatencyMilliseconds": 3771.10,
      "TopWait": "Client:ClientRead",
      "CounterDeltas": {
        "xact_commit": 251,
        "xact_rollback": 7,
        "blks_hit": 1654,
        "blks_read": 24,
        "tup_inserted": 48,
        "tup_updated": 48,
        "tup_deleted": 16
      }
    },
    {
      "ScenarioName": "postgres-signal-roundtrip-capacity-c4",
      "Tier": "WorkflowPerfCapacity",
      "OperationCount": 64,
      "Concurrency": 4,
      "DurationMilliseconds": 3700.99,
      "ThroughputPerSecond": 17.29,
      "AverageLatencyMilliseconds": 3577.49,
      "P95LatencyMilliseconds": 3583.70,
      "MaxLatencyMilliseconds": 3584.43,
      "TopWait": "Client:ClientRead",
      "CounterDeltas": {
        "xact_commit": 1080,
        "xact_rollback": 21,
        "blks_hit": 7084,
        "blks_read": 1,
        "tup_inserted": 192,
        "tup_updated": 192,
        "tup_deleted": 64
      }
    },
    {
      "ScenarioName": "postgres-signal-roundtrip-capacity-c8",
      "Tier": "WorkflowPerfCapacity",
      "OperationCount": 128,
      "Concurrency": 8,
      "DurationMilliseconds": 3853.89,
      "ThroughputPerSecond": 33.21,
      "AverageLatencyMilliseconds": 3713.31,
      "P95LatencyMilliseconds": 3718.66,
      "MaxLatencyMilliseconds": 3719.34,
      "TopWait": "Client:ClientRead",
      "CounterDeltas": {
        "xact_commit": 2348,
        "xact_rollback": 44,
        "blks_hit": 17069,
        "blks_read": 0,
        "tup_inserted": 384,
        "tup_updated": 384,
        "tup_deleted": 128
      }
    },
    {
      "ScenarioName": "postgres-signal-roundtrip-capacity-c16",
      "Tier": "WorkflowPerfCapacity",
      "OperationCount": 256,
      "Concurrency": 16,
      "DurationMilliseconds": 4488.07,
      "ThroughputPerSecond": 57.04,
      "AverageLatencyMilliseconds": 4251.48,
      "P95LatencyMilliseconds": 4287.87,
      "MaxLatencyMilliseconds": 4294.09,
      "TopWait": "Client:ClientRead",
      "CounterDeltas": {
        "xact_commit": 4536,
        "xact_rollback": 48,
        "blks_hit": 40443,
        "blks_read": 0,
        "tup_inserted": 768,
        "tup_updated": 768,
        "tup_deleted": 256
      }
    },
    {
      "ScenarioName": "postgres-signal-roundtrip-latency-serial",
      "Tier": "WorkflowPerfLatency",
      "OperationCount": 16,
      "Concurrency": 1,
      "DurationMilliseconds": 49290.47,
      "ThroughputPerSecond": 0.32,
      "AverageLatencyMilliseconds": 3079.33,
      "P95LatencyMilliseconds": 3094.94,
      "MaxLatencyMilliseconds": 3101.71,
      "PhaseLatencySummaries": {
        "start": {
          "AverageMilliseconds": 6.12,
          "P95Milliseconds": 9.29,
          "MaxMilliseconds": 11.26
        },
        "signalPublish": {
          "AverageMilliseconds": 5.63,
          "P95Milliseconds": 6.82,
          "MaxMilliseconds": 7.53
        },
        "signalToCompletion": {
          "AverageMilliseconds": 3073.20,
          "P95Milliseconds": 3086.59,
          "MaxMilliseconds": 3090.44
        }
      }
    },
    {
      "ScenarioName": "postgres-signal-roundtrip-throughput-parallel",
      "Tier": "WorkflowPerfThroughput",
      "OperationCount": 96,
      "Concurrency": 16,
      "DurationMilliseconds": 3729.17,
      "ThroughputPerSecond": 25.74,
      "AverageLatencyMilliseconds": 3603.54,
      "P95LatencyMilliseconds": 3635.59,
      "MaxLatencyMilliseconds": 3649.96,
      "TopWait": "Client:ClientRead",
      "CounterDeltas": {
        "xact_commit": 1502,
        "xact_rollback": 38,
        "blks_hit": 21978,
        "blks_read": 24,
        "tup_inserted": 288,
        "tup_updated": 288,
        "tup_deleted": 96
      },
      "PhaseLatencySummaries": {
        "start": {
          "AverageMilliseconds": 16.21,
          "P95Milliseconds": 40.31,
          "MaxMilliseconds": 47.02
        },
        "signalPublish": {
          "AverageMilliseconds": 18.11,
|
||||
"P95Milliseconds": 23.62,
|
||||
"MaxMilliseconds": 28.41
|
||||
},
|
||||
"signalToCompletion": {
|
||||
"AverageMilliseconds": 3504.24,
|
||||
"P95Milliseconds": 3530.38,
|
||||
"MaxMilliseconds": 3531.14
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"ScenarioName": "postgres-signal-roundtrip-soak",
|
||||
"Tier": "WorkflowPerfSoak",
|
||||
"OperationCount": 108,
|
||||
"Concurrency": 8,
|
||||
"DurationMilliseconds": 25121.68,
|
||||
"ThroughputPerSecond": 4.30,
|
||||
"AverageLatencyMilliseconds": 4164.52,
|
||||
"P95LatencyMilliseconds": 4208.42,
|
||||
"MaxLatencyMilliseconds": 4209.96,
|
||||
"TopWait": "Client:ClientRead",
|
||||
"CounterDeltas": {
|
||||
"xact_commit": 3313,
|
||||
"xact_rollback": 352,
|
||||
"blks_hit": 26548,
|
||||
"blks_read": 269,
|
||||
"tup_inserted": 774,
|
||||
"tup_updated": 339,
|
||||
"tup_deleted": 108
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
@@ -0,0 +1,191 @@
# PostgreSQL Performance Baseline 2026-03-17

## Purpose

This document captures the current PostgreSQL-backed load and performance baseline for the Serdica workflow engine. It is the reference point for later MongoDB backend comparisons and the final three-backend decision pack.

The durable machine-readable companion is [11-postgres-performance-baseline-2026-03-17.json](11-postgres-performance-baseline-2026-03-17.json).

## Run Metadata

- Date: `2026-03-17`
- Test command:
  - integration performance suite filtered to `PostgresPerformance`
- Suite result:
  - `11/11` tests passed
  - total wall-clock time: `2 m 16 s`
- Raw artifact directory:
  - `TestResults/workflow-performance/`
- PostgreSQL environment:
  - Docker image: `postgres:16-alpine`
  - database: `workflow`
  - version: `PostgreSQL 16.13`
  - backend: durable queue tables plus `LISTEN/NOTIFY` wake hints

## Scenario Summary

| Scenario | Tier | Ops | Conc | Duration ms | Throughput/s | Avg ms | P95 ms | Max ms |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| `postgres-signal-roundtrip-capacity-c1` | `WorkflowPerfCapacity` | 16 | 1 | 3895.54 | 4.11 | 3738.08 | 3762.51 | 3771.10 |
| `postgres-signal-roundtrip-capacity-c4` | `WorkflowPerfCapacity` | 64 | 4 | 3700.99 | 17.29 | 3577.49 | 3583.70 | 3584.43 |
| `postgres-signal-roundtrip-capacity-c8` | `WorkflowPerfCapacity` | 128 | 8 | 3853.89 | 33.21 | 3713.31 | 3718.66 | 3719.34 |
| `postgres-signal-roundtrip-capacity-c16` | `WorkflowPerfCapacity` | 256 | 16 | 4488.07 | 57.04 | 4251.48 | 4287.87 | 4294.09 |
| `postgres-signal-roundtrip-latency-serial` | `WorkflowPerfLatency` | 16 | 1 | 49290.47 | 0.32 | 3079.33 | 3094.94 | 3101.71 |
| `postgres-bulstrad-quotation-confirm-convert-to-policy-nightly` | `WorkflowPerfNightly` | 12 | 4 | 3598.64 | 3.33 | 3478.52 | 3500.76 | 3503.73 |
| `postgres-delayed-burst-nightly` | `WorkflowPerfNightly` | 48 | 1 | 2449.25 | 19.60 | 2096.34 | 2152.50 | 2157.39 |
| `postgres-immediate-burst-nightly` | `WorkflowPerfNightly` | 120 | 1 | 1711.87 | 70.10 | 849.78 | 1012.13 | 1030.98 |
| `postgres-synthetic-external-resume-nightly` | `WorkflowPerfNightly` | 36 | 8 | 4162.56 | 8.65 | 4026.50 | 4048.09 | 4049.91 |
| `postgres-bulstrad-quote-or-apl-cancel-smoke` | `WorkflowPerfSmoke` | 10 | 4 | 166.99 | 59.88 | 13.51 | 23.87 | 26.35 |
| `postgres-delayed-burst-smoke` | `WorkflowPerfSmoke` | 12 | 1 | 2146.89 | 5.59 | 2032.67 | 2050.20 | 2051.30 |
| `postgres-immediate-burst-smoke` | `WorkflowPerfSmoke` | 24 | 1 | 341.84 | 70.21 | 176.19 | 197.25 | 197.91 |
| `postgres-signal-roundtrip-soak` | `WorkflowPerfSoak` | 108 | 8 | 25121.68 | 4.30 | 4164.52 | 4208.42 | 4209.96 |
| `postgres-signal-roundtrip-throughput-parallel` | `WorkflowPerfThroughput` | 96 | 16 | 3729.17 | 25.74 | 3603.54 | 3635.59 | 3649.96 |

## Measurement Split

The synthetic signal round-trip workload is measured in three separate ways:

- `postgres-signal-roundtrip-latency-serial`: one workflow at a time, one signal worker, used as the single-instance latency baseline.
- `postgres-signal-roundtrip-throughput-parallel`: `96` workflows, `16`-way workload concurrency, `8` signal workers, used as the steady-state throughput baseline.
- `postgres-signal-roundtrip-capacity-c*`: batch-wave capacity ladder used to observe scaling and pressure points.

The useful PostgreSQL baseline is:

- serial latency baseline: `3079.33 ms` average end-to-end per workflow
- steady throughput baseline: `25.74 ops/s` with `16` workload concurrency and `8` signal workers
- capacity `c1`: `4.11 ops/s`; this is only the smallest batch-wave rung

### Serial Latency Baseline

| Phase | Avg ms | P95 ms | Max ms |
| --- | ---: | ---: | ---: |
| `start` | 6.12 | 9.29 | 11.26 |
| `signalPublish` | 5.63 | 6.82 | 7.53 |
| `signalToCompletion` | 3073.20 | 3086.59 | 3090.44 |

Interpretation:

- almost all serial latency is in `signalToCompletion`
- workflow start is very cheap on this backend
- external signal publication is also cheap

### Steady Throughput Baseline

| Phase | Avg ms | P95 ms | Max ms |
| --- | ---: | ---: | ---: |
| `start` | 16.21 | 40.31 | 47.02 |
| `signalPublish` | 18.11 | 23.62 | 28.41 |
| `signalToCompletion` | 3504.24 | 3530.38 | 3531.14 |

Interpretation:

- the engine sustained `25.74 ops/s` in a `96`-operation wave
- end-to-end average stayed at `3603.54 ms`
- start and signal publication remained small compared to the resume path

## PostgreSQL Observations

### Dominant Waits

- `Client:ClientRead` was the top observed wait class in `13/14` scenario artifacts.
- The serial latency scenario had no distinct competing wait class because the measurement ran with effectively no backend concurrency.
- On this local PostgreSQL profile the wake-up path is not the visible bottleneck; the dominant observed state is clients waiting on the next command while the engine completes work in short transactions.

### Capacity Ladder

| Scenario | Throughput/s | P95 ms | Xact Commits | Buffer Hits | Buffer Reads | Tuples Inserted | Tuples Updated | Tuples Deleted | Top Wait |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | --- |
| `c1` | 4.11 | 3762.51 | 251 | 1654 | 24 | 48 | 48 | 16 | `Client:ClientRead` |
| `c4` | 17.29 | 3583.70 | 1080 | 7084 | 1 | 192 | 192 | 64 | `Client:ClientRead` |
| `c8` | 33.21 | 3718.66 | 2348 | 17069 | 0 | 384 | 384 | 128 | `Client:ClientRead` |
| `c16` | 57.04 | 4287.87 | 4536 | 40443 | 0 | 768 | 768 | 256 | `Client:ClientRead` |

Interpretation:

- the capacity ladder scales more smoothly than the Oracle baseline on the same local machine
- `c16` is the fastest tested rung and does not yet show a hard cliff
- the next meaningful PostgreSQL characterization step should test above `c16` before declaring a saturation boundary

### Transport Baselines

| Scenario | Throughput/s | Xact Commits | Buffer Hits | Buffer Reads | Tuples Inserted | Tuples Updated | Top Wait |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | --- |
| `postgres-immediate-burst-nightly` | 70.10 | 801 | 13207 | 4 | 570 | 162 | `Client:ClientRead` |
| `postgres-delayed-burst-nightly` | 19.60 | 269 | 11472 | 3 | 498 | 33 | `Client:ClientRead` |

Interpretation:

- immediate transport remains much cheaper than full workflow resume
- delayed transport is still dominated by the intentional delay window, not by raw dequeue speed
- the very short smoke transport runs are useful for end-to-end timing, but they are too brief to rely on as the primary PostgreSQL stat sample

### Business Flow Baselines

| Scenario | Throughput/s | Avg ms | Xact Commits | Buffer Hits | Buffer Reads | Tuples Inserted | Tuples Updated | Top Wait |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | --- |
| `postgres-bulstrad-quote-or-apl-cancel-smoke` | 59.88 | 13.51 | 3 | 93 | 0 | 0 | 0 | `Client:ClientRead` |
| `postgres-bulstrad-quotation-confirm-convert-to-policy-nightly` | 3.33 | 3478.52 | 236 | 12028 | 270 | 546 | 75 | `Client:ClientRead` |

Interpretation:

- the short Bulstrad flow is still mostly transport and orchestration overhead
- the heavier `QuotationConfirm -> ConvertToPolicy` flow is a better real-workload pressure baseline because it exercises deeper projection and signal traffic

### Soak Baseline

`postgres-signal-roundtrip-soak` completed `108` operations at concurrency `8` with:

- throughput: `4.30 ops/s`
- average latency: `4164.52 ms`
- P95 latency: `4208.42 ms`
- `0` failures
- `0` dead-lettered signals
- `0` runtime conflicts
- `0` stuck instances

PostgreSQL metrics for the soak run:

- `xact_commit`: `3313`
- `xact_rollback`: `352`
- `blks_hit`: `26548`
- `blks_read`: `269`
- `tup_inserted`: `774`
- `tup_updated`: `339`
- `tup_deleted`: `108`
- top wait:
  - `Client:ClientRead`

## What Must Stay Constant For Future Backend Comparisons

When MongoDB is benchmarked and the final Oracle/PostgreSQL/MongoDB comparison is produced, keep these constant:

- same scenario names
- same operation counts
- same concurrency levels
- same worker counts for signal drain
- same synthetic workflow definitions
- same Bulstrad workflow families
- same correctness assertions

Compare these dimensions directly:

- throughput per second
- latency average, P95, P99, and max
- phase latency summaries for start, signal publish, and signal-to-completion on the synthetic signal round-trip workload
- failures, dead letters, runtime conflicts, and stuck instances
- commit count analogs
- row, tuple, or document movement analogs
- read-hit or read-amplification analogs
- dominant waits, locks, or wake-path contention classes

## First Sizing Note

On this local PostgreSQL baseline:

- immediate queue burst handling is comfortably above the small workflow tiers; the current nightly transport baseline is `70.10 ops/s`
- the separated steady throughput baseline is `25.74 ops/s`, ahead of the current Oracle baseline on the same synthetic workflow profile
- the ladder through `c16` still looks healthy and does not yet expose a sharp pressure rung
- the dominant observed backend state is client read waiting, which suggests the next tuning conversation should focus on queue claim cadence, notification wake-ups, and transaction shape rather than on an obvious storage stall

This is a baseline, not a production commitment. MongoDB should now reuse the same scenarios and produce the same summary tables before any backend recommendation is declared.
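The Avg/P95/Max latency columns used throughout these baseline tables can be reproduced from raw per-operation latency samples. A minimal sketch, assuming a nearest-rank P95 (the suite's exact percentile estimator is not specified in this document, so an interpolating estimator could differ slightly near the tail):

```python
def summarize_latencies(samples_ms):
    """Compute the Avg/P95/Max view used in the scenario tables.

    Assumes nearest-rank P95: the value at index ceil(0.95 * n) - 1
    of the sorted samples.
    """
    ordered = sorted(samples_ms)
    rank = -(-95 * len(ordered) // 100) - 1  # ceil(0.95 * n) - 1
    return {
        "AverageLatencyMilliseconds": round(sum(ordered) / len(ordered), 2),
        "P95LatencyMilliseconds": round(ordered[rank], 2),
        "MaxLatencyMilliseconds": round(ordered[-1], 2),
    }

# 20 evenly spaced samples: average 10.5, nearest-rank P95 is the 19th value.
print(summarize_latencies([float(i) for i in range(1, 21)]))
```

The same helper applies per phase, which is how the `start`, `signalPublish`, and `signalToCompletion` summary rows are shaped.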
@@ -0,0 +1,211 @@
{
  "Date": "2026-03-17",
  "Workspace": "C:\\dev\\serdica-backend4",
  "TestCommand": "dotnet test src/Serdica/Ablera.Serdica.Workflow/__Tests/Ablera.Serdica.Workflow.IntegrationTests/Ablera.Serdica.Workflow.IntegrationTests.csproj -c Release --no-build --filter \"FullyQualifiedName~MongoPerformance\"",
  "SuiteResult": {
    "Passed": 14,
    "Total": 14,
    "Duration": "48 s"
  },
  "RawArtifactDirectory": "src/Serdica/Ablera.Serdica.Workflow/__Tests/Ablera.Serdica.Workflow.IntegrationTests/bin/Release/net9.0/TestResults/workflow-performance/",
  "MongoEnvironment": {
    "DockerImage": "mongo:7.0",
    "Topology": "single-node replica set",
    "Version": "7.0.30",
    "Backend": "Durable collections plus change-stream wake hints"
  },
  "MeasurementViews": {
    "SerialLatencyScenario": "mongo-signal-roundtrip-latency-serial",
    "SteadyThroughputScenario": "mongo-signal-roundtrip-throughput-parallel",
    "CapacityScenarioPrefix": "mongo-signal-roundtrip-capacity-"
  },
  "Notes": {
    "TopWaitCounts": [
      {
        "Name": "(none)",
        "Count": 14
      }
    ],
    "Interpretation": [
      "Serial latency baseline and steady throughput baseline are separated from the capacity ladder.",
      "Mongo exposed two backend-correctness issues during the first performance pass: bounded idle receive and explicit collection bootstrap.",
      "Mongo scales very strongly through c8 on this local replica-set baseline.",
      "c16 is the first visible pressure rung because latency rises materially even though throughput still improves."
    ]
  },
  "Scenarios": [
    {
      "ScenarioName": "mongo-signal-roundtrip-capacity-c1",
      "Tier": "WorkflowPerfCapacity",
      "OperationCount": 16,
      "Concurrency": 1,
      "DurationMilliseconds": 2259.99,
      "ThroughputPerSecond": 7.08,
      "AverageLatencyMilliseconds": 1394.99,
      "P95LatencyMilliseconds": 1576.55,
      "MaxLatencyMilliseconds": 2063.72,
      "CounterDeltas": {
        "opcounters.command": 183,
        "opcounters.insert": 48,
        "opcounters.update": 48,
        "opcounters.delete": 16,
        "metrics.document.returned": 80,
        "metrics.document.inserted": 48,
        "metrics.document.updated": 48,
        "metrics.document.deleted": 16
      }
    },
    {
      "ScenarioName": "mongo-signal-roundtrip-capacity-c4",
      "Tier": "WorkflowPerfCapacity",
      "OperationCount": 64,
      "Concurrency": 4,
      "DurationMilliseconds": 1668.99,
      "ThroughputPerSecond": 38.35,
      "AverageLatencyMilliseconds": 1244.81,
      "P95LatencyMilliseconds": 1472.61,
      "MaxLatencyMilliseconds": 1527.26,
      "CounterDeltas": {
        "opcounters.command": 684,
        "opcounters.insert": 192,
        "opcounters.update": 192,
        "opcounters.delete": 64,
        "metrics.document.returned": 320,
        "metrics.document.inserted": 192,
        "metrics.document.updated": 192,
        "metrics.document.deleted": 64
      }
    },
    {
      "ScenarioName": "mongo-signal-roundtrip-capacity-c8",
      "Tier": "WorkflowPerfCapacity",
      "OperationCount": 128,
      "Concurrency": 8,
      "DurationMilliseconds": 1938.12,
      "ThroughputPerSecond": 66.04,
      "AverageLatencyMilliseconds": 1477.49,
      "P95LatencyMilliseconds": 1743.52,
      "MaxLatencyMilliseconds": 1757.88,
      "CounterDeltas": {
        "opcounters.command": 1349,
        "opcounters.insert": 384,
        "opcounters.update": 384,
        "opcounters.delete": 128,
        "metrics.document.returned": 640,
        "metrics.document.inserted": 384,
        "metrics.document.updated": 384,
        "metrics.document.deleted": 128
      }
    },
    {
      "ScenarioName": "mongo-signal-roundtrip-capacity-c16",
      "Tier": "WorkflowPerfCapacity",
      "OperationCount": 256,
      "Concurrency": 16,
      "DurationMilliseconds": 3728.88,
      "ThroughputPerSecond": 68.65,
      "AverageLatencyMilliseconds": 3203.94,
      "P95LatencyMilliseconds": 3507.95,
      "MaxLatencyMilliseconds": 3527.96,
      "CounterDeltas": {
        "opcounters.command": 2515,
        "opcounters.insert": 768,
        "opcounters.update": 768,
        "opcounters.delete": 256,
        "metrics.document.returned": 1280,
        "metrics.document.inserted": 768,
        "metrics.document.updated": 768,
        "metrics.document.deleted": 256
      }
    },
    {
      "ScenarioName": "mongo-signal-roundtrip-latency-serial",
      "Tier": "WorkflowPerfLatency",
      "OperationCount": 16,
      "Concurrency": 1,
      "DurationMilliseconds": 1675.77,
      "ThroughputPerSecond": 9.55,
      "AverageLatencyMilliseconds": 97.88,
      "P95LatencyMilliseconds": 149.20,
      "MaxLatencyMilliseconds": 324.02,
      "PhaseLatencySummaries": {
        "start": {
          "AverageMilliseconds": 26.34,
          "P95Milliseconds": 79.35,
          "MaxMilliseconds": 251.36
        },
        "signalPublish": {
          "AverageMilliseconds": 8.17,
          "P95Milliseconds": 10.75,
          "MaxMilliseconds": 12.17
        },
        "signalToCompletion": {
          "AverageMilliseconds": 71.54,
          "P95Milliseconds": 77.94,
          "MaxMilliseconds": 79.48
        }
      }
    },
    {
      "ScenarioName": "mongo-signal-roundtrip-throughput-parallel",
      "Tier": "WorkflowPerfThroughput",
      "OperationCount": 96,
      "Concurrency": 16,
      "DurationMilliseconds": 1258.48,
      "ThroughputPerSecond": 76.28,
      "AverageLatencyMilliseconds": 1110.94,
      "P95LatencyMilliseconds": 1121.22,
      "MaxLatencyMilliseconds": 1127.11,
      "PhaseLatencySummaries": {
        "start": {
          "AverageMilliseconds": 20.88,
          "P95Milliseconds": 28.64,
          "MaxMilliseconds": 33.67
        },
        "signalPublish": {
          "AverageMilliseconds": 16.01,
          "P95Milliseconds": 20.90,
          "MaxMilliseconds": 22.71
        },
        "signalToCompletion": {
          "AverageMilliseconds": 988.88,
          "P95Milliseconds": 1000.12,
          "MaxMilliseconds": 1004.92
        }
      },
      "CounterDeltas": {
        "opcounters.command": 1049,
        "opcounters.insert": 288,
        "opcounters.update": 288,
        "opcounters.delete": 96,
        "metrics.document.returned": 480,
        "metrics.document.inserted": 288,
        "metrics.document.updated": 288,
        "metrics.document.deleted": 96
      }
    },
    {
      "ScenarioName": "mongo-signal-roundtrip-soak",
      "Tier": "WorkflowPerfSoak",
      "OperationCount": 108,
      "Concurrency": 8,
      "DurationMilliseconds": 2267.91,
      "ThroughputPerSecond": 47.62,
      "AverageLatencyMilliseconds": 322.40,
      "P95LatencyMilliseconds": 550.50,
      "MaxLatencyMilliseconds": 572.73,
      "CounterDeltas": {
        "opcounters.command": 2264,
        "opcounters.insert": 324,
        "opcounters.update": 324,
        "opcounters.delete": 108,
        "metrics.document.returned": 540,
        "metrics.document.inserted": 324,
        "metrics.document.updated": 324,
        "metrics.document.deleted": 108,
        "transactions.totalStarted": 216,
        "transactions.totalCommitted": 216
      }
    }
  ]
}
195
docs/workflow/engine/12-mongo-performance-baseline-2026-03-17.md
Normal file
@@ -0,0 +1,195 @@
# MongoDB Performance Baseline 2026-03-17

## Purpose

This document captures the current MongoDB-backed load and performance baseline for the Serdica workflow engine. It completes the per-backend baseline set that will feed the final three-backend comparison.

The durable machine-readable companion is [12-mongo-performance-baseline-2026-03-17.json](12-mongo-performance-baseline-2026-03-17.json).

## Run Metadata

- Date: `2026-03-17`
- Test command:
  - integration performance suite filtered to `MongoPerformance`
- Suite result:
  - `14/14` tests passed
  - total wall-clock time: `48 s`
- Raw artifact directory:
  - `TestResults/workflow-performance/`
- MongoDB environment:
  - Docker image: `mongo:7.0`
  - topology: single-node replica set
  - version: `7.0.30`
  - backend: durable collections plus change-stream wake hints

## Scenario Summary

| Scenario | Tier | Ops | Conc | Duration ms | Throughput/s | Avg ms | P95 ms | Max ms |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| `mongo-signal-roundtrip-capacity-c1` | `WorkflowPerfCapacity` | 16 | 1 | 2259.99 | 7.08 | 1394.99 | 1576.55 | 2063.72 |
| `mongo-signal-roundtrip-capacity-c4` | `WorkflowPerfCapacity` | 64 | 4 | 1668.99 | 38.35 | 1244.81 | 1472.61 | 1527.26 |
| `mongo-signal-roundtrip-capacity-c8` | `WorkflowPerfCapacity` | 128 | 8 | 1938.12 | 66.04 | 1477.49 | 1743.52 | 1757.88 |
| `mongo-signal-roundtrip-capacity-c16` | `WorkflowPerfCapacity` | 256 | 16 | 3728.88 | 68.65 | 3203.94 | 3507.95 | 3527.96 |
| `mongo-signal-roundtrip-latency-serial` | `WorkflowPerfLatency` | 16 | 1 | 1675.77 | 9.55 | 97.88 | 149.20 | 324.02 |
| `mongo-bulstrad-quotation-confirm-convert-to-policy-nightly` | `WorkflowPerfNightly` | 12 | 4 | 1108.42 | 10.83 | 790.30 | 947.21 | 963.16 |
| `mongo-delayed-burst-nightly` | `WorkflowPerfNightly` | 48 | 1 | 2881.66 | 16.66 | 2142.14 | 2265.15 | 2281.04 |
| `mongo-immediate-burst-nightly` | `WorkflowPerfNightly` | 120 | 1 | 2598.57 | 46.18 | 1148.06 | 1530.49 | 1575.98 |
| `mongo-synthetic-external-resume-nightly` | `WorkflowPerfNightly` | 36 | 8 | 976.73 | 36.86 | 633.82 | 770.10 | 772.71 |
| `mongo-bulstrad-quote-or-apl-cancel-smoke` | `WorkflowPerfSmoke` | 10 | 4 | 425.81 | 23.48 | 124.35 | 294.76 | 295.32 |
| `mongo-delayed-burst-smoke` | `WorkflowPerfSmoke` | 12 | 1 | 2416.23 | 4.97 | 2040.30 | 2079.79 | 2084.03 |
| `mongo-immediate-burst-smoke` | `WorkflowPerfSmoke` | 24 | 1 | 747.36 | 32.11 | 264.14 | 339.42 | 400.99 |
| `mongo-signal-roundtrip-soak` | `WorkflowPerfSoak` | 108 | 8 | 2267.91 | 47.62 | 322.40 | 550.50 | 572.73 |
| `mongo-signal-roundtrip-throughput-parallel` | `WorkflowPerfThroughput` | 96 | 16 | 1258.48 | 76.28 | 1110.94 | 1121.22 | 1127.11 |

## Measurement Split

The synthetic signal round-trip workload is measured in three separate ways:

- `mongo-signal-roundtrip-latency-serial`: one workflow at a time, one signal worker, used as the single-instance latency baseline.
- `mongo-signal-roundtrip-throughput-parallel`: `96` workflows, `16`-way workload concurrency, `8` signal workers, used as the steady-state throughput baseline.
- `mongo-signal-roundtrip-capacity-c*`: batch-wave capacity ladder used to observe scaling and pressure points.

The useful MongoDB baseline is:

- serial latency baseline: `97.88 ms` average end-to-end per workflow
- steady throughput baseline: `76.28 ops/s` with `16` workload concurrency and `8` signal workers
- capacity `c1`: `7.08 ops/s`; this is only the smallest batch-wave rung

### Serial Latency Baseline

| Phase | Avg ms | P95 ms | Max ms |
| --- | ---: | ---: | ---: |
| `start` | 26.34 | 79.35 | 251.36 |
| `signalPublish` | 8.17 | 10.75 | 12.17 |
| `signalToCompletion` | 71.54 | 77.94 | 79.48 |

Interpretation:

- serial end-to-end latency is far lower than the Oracle and PostgreSQL baselines on this local setup
- most of the work remains in signal-to-completion, but the absolute time is much smaller
- workflow start is still the most variable of the three measured phases

### Steady Throughput Baseline

| Phase | Avg ms | P95 ms | Max ms |
| --- | ---: | ---: | ---: |
| `start` | 20.88 | 28.64 | 33.67 |
| `signalPublish` | 16.01 | 20.90 | 22.71 |
| `signalToCompletion` | 988.88 | 1000.12 | 1004.92 |

Interpretation:

- the engine sustained `76.28 ops/s` in a `96`-operation wave
- end-to-end average stayed at `1110.94 ms`
- the dominant cost is still resume processing, but Mongo remains materially faster on this synthetic profile

## MongoDB Observations

### Dominant Waits

- no durable current-op contention class dominated these runs; every scenario finished without a stable top wait entry
- this means the current Mongo baseline should be read primarily through normalized workflow metrics and the Mongo-specific counter set, not through a wait-event headline
- the backend bug exposed by the first perf pass was not storage contention; it was correctness:
  - empty-queue receive had to become bounded
  - collection bootstrap had to be explicit before transactional concurrency

### Capacity Ladder

| Scenario | Throughput/s | P95 ms | Commands | Inserts | Updates | Deletes | Docs Returned | Docs Inserted | Docs Updated | Docs Deleted |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| `c1` | 7.08 | 1576.55 | 183 | 48 | 48 | 16 | 80 | 48 | 48 | 16 |
| `c4` | 38.35 | 1472.61 | 684 | 192 | 192 | 64 | 320 | 192 | 192 | 64 |
| `c8` | 66.04 | 1743.52 | 1349 | 384 | 384 | 128 | 640 | 384 | 384 | 128 |
| `c16` | 68.65 | 3507.95 | 2515 | 768 | 768 | 256 | 1280 | 768 | 768 | 256 |

Interpretation:

- Mongo scales very aggressively through `c8`
- `c16` is still the fastest rung, but it is also where latency expands sharply relative to `c8`
- the first visible pressure point is therefore `c16`, even though throughput still rises slightly

### Transport Baselines

| Scenario | Throughput/s | Commands | Inserts | Deletes | Network In | Network Out |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| `mongo-immediate-burst-nightly` | 46.18 | 379 | 120 | 120 | 277307 | 296277 |
| `mongo-delayed-burst-nightly` | 16.66 | 1052 | 48 | 48 | 507607 | 450004 |

Interpretation:

- immediate transport is still much cheaper than full workflow resume
- delayed transport carries more command and network chatter because the wake path repeatedly checks due work through the change-stream plus due-time model

### Business Flow Baselines

| Scenario | Throughput/s | Avg ms | Commands | Queries | Inserts | Updates | Deletes | Tx Started | Tx Committed |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| `mongo-bulstrad-quote-or-apl-cancel-smoke` | 23.48 | 124.35 | 54 | 45 | 20 | 0 | 0 | 10 | 10 |
| `mongo-bulstrad-quotation-confirm-convert-to-policy-nightly` | 10.83 | 790.30 | 189 | 151 | 96 | 48 | 12 | 36 | 36 |

Interpretation:

- the short Bulstrad flow is still cheap enough that transport and projection movement dominate
- the heavier `QuotationConfirm -> ConvertToPolicy -> PdfGenerator` path stays comfortably sub-second on this local Mongo baseline

### Soak Baseline

`mongo-signal-roundtrip-soak` completed `108` operations at concurrency `8` with:

- throughput: `47.62 ops/s`
- average latency: `322.40 ms`
- P95 latency: `550.50 ms`
- `0` failures
- `0` dead-lettered signals
- `0` runtime conflicts
- `0` stuck instances

MongoDB metrics for the soak run:

- `opcounters.command`: `2264`
- `opcounters.insert`: `324`
- `opcounters.update`: `324`
- `opcounters.delete`: `108`
- `metrics.document.returned`: `540`
- `metrics.document.inserted`: `324`
- `metrics.document.updated`: `324`
- `metrics.document.deleted`: `108`
- `transactions.totalStarted`: `216`
- `transactions.totalCommitted`: `216`

## What Must Stay Constant For Final Backend Comparison

When the final Oracle/PostgreSQL/MongoDB comparison is produced, keep these constant:

- same scenario names
- same operation counts
- same concurrency levels
- same worker counts for signal drain
- same synthetic workflow definitions
- same Bulstrad workflow families
- same correctness assertions

Compare these dimensions directly:

- throughput per second
- latency average, P95, P99, and max
- phase latency summaries for start, signal publish, and signal-to-completion on the synthetic signal round-trip workload
- failures, dead letters, runtime conflicts, and stuck instances
- commit, transaction, or mutation count analogs
- row, tuple, or document movement analogs
- read, network, and wake-path cost analogs
- dominant waits, locks, or contention classes when the backend exposes them clearly

## First Sizing Note

On this local MongoDB baseline:

- Mongo is the fastest of the three backends on the synthetic signal round-trip workloads measured so far
- the biggest correctness findings came from backend behavior, not raw throughput:
  - bounded empty-queue receive
  - explicit collection bootstrap before transactional concurrency
- `c8` is the last clearly comfortable capacity rung
- `c16` is the first rung where latency growth becomes visible, even though throughput still increases slightly

This is a baseline, not a production commitment. The final recommendation still needs the explicit three-backend comparison pack using the same workloads and the same correctness rules.
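The cross-backend comparison that follows normalizes each backend against a reference. A minimal sketch of that ratio view; the helper name `throughput_ratios` is illustrative, and the numbers are the steady throughput baselines quoted in these baseline documents (`20.98` Oracle, `25.74` PostgreSQL, `76.28` Mongo):

```python
def throughput_ratios(steady_ops_per_second, reference="oracle"):
    """Normalize each backend's steady throughput against a reference backend."""
    ref = steady_ops_per_second[reference]
    return {name: round(ops / ref, 2) for name, ops in steady_ops_per_second.items()}

# Steady throughput baselines from the per-backend documents above.
steady = {"oracle": 20.98, "postgres": 25.74, "mongo": 76.28}
print(throughput_ratios(steady))
```

The same normalization applies to the latency and capacity-ladder figures, keeping the reference backend explicit so the decision pack reads as relative speedups rather than raw local numbers.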
125
docs/workflow/engine/13-backend-comparison-2026-03-17.json
Normal file
@@ -0,0 +1,125 @@

{
  "date": "2026-03-17",
  "type": "backend-comparison",
  "status": "baseline-decision-pack",
  "sources": {
    "oracle": "10-oracle-performance-baseline-2026-03-17.json",
    "postgres": "11-postgres-performance-baseline-2026-03-17.json",
    "mongo": "12-mongo-performance-baseline-2026-03-17.json"
  },
  "validation": {
    "integrationBuild": {
      "warnings": 0,
      "errors": 0
    },
    "oraclePerformanceSuite": "12/12",
    "postgresPerformanceSuite": "11/11",
    "mongoPerformanceSuite": "14/14",
    "postgresBackendParitySuite": "9/9",
    "mongoBackendParitySuite": "23/23"
  },
  "normalizedMetrics": {
    "signalRoundTrip": {
      "oracle": {
        "serialLatencyAvgMs": 3104.85,
        "serialLatencyP95Ms": 3165.04,
        "throughputOpsPerSecond": 20.98,
        "throughputAvgMs": 4142.13,
        "throughputP95Ms": 4215.64,
        "soakOpsPerSecond": 3.91,
        "soakAvgMs": 4494.29,
        "soakP95Ms": 5589.33
      },
      "postgres": {
        "serialLatencyAvgMs": 3079.33,
        "serialLatencyP95Ms": 3094.94,
        "throughputOpsPerSecond": 25.74,
        "throughputAvgMs": 3603.54,
        "throughputP95Ms": 3635.59,
        "soakOpsPerSecond": 4.30,
        "soakAvgMs": 4164.52,
        "soakP95Ms": 4208.42
      },
      "mongo": {
        "serialLatencyAvgMs": 97.88,
        "serialLatencyP95Ms": 149.20,
        "throughputOpsPerSecond": 76.28,
        "throughputAvgMs": 1110.94,
        "throughputP95Ms": 1121.22,
        "soakOpsPerSecond": 47.62,
        "soakAvgMs": 322.40,
        "soakP95Ms": 550.50
      }
    },
    "capacity": {
      "c1": {
        "oracle": 3.37,
        "postgres": 4.11,
        "mongo": 7.08
      },
      "c4": {
        "oracle": 15.22,
        "postgres": 17.29,
        "mongo": 38.35
      },
      "c8": {
        "oracle": 21.34,
        "postgres": 33.21,
        "mongo": 66.04
      },
      "c16": {
        "oracle": 34.03,
        "postgres": 57.04,
        "mongo": 68.65
      }
    },
    "transport": {
      "immediateBurstNightlyOpsPerSecond": {
        "oracle": 50.18,
        "postgres": 70.10,
        "mongo": 46.18
      },
      "delayedBurstNightlyOpsPerSecond": {
        "oracle": 10.71,
        "postgres": 19.60,
        "mongo": 16.66
      }
    },
    "bulstrad": {
      "quoteOrAplCancelSmokeOpsPerSecond": {
        "oracle": 19.69,
        "postgres": 59.88,
        "mongo": 23.48
      },
      "quotationConfirmConvertToPolicyNightlyOpsPerSecond": {
        "oracle": 1.77,
        "postgres": 3.33,
        "mongo": 10.83
      }
    }
  },
  "backendObservations": {
    "oracle": {
      "dominantWait": "log file sync",
      "strength": "highest validation maturity",
      "primaryPressure": "commit pressure"
    },
    "postgres": {
      "dominantWait": "Client:ClientRead",
      "strength": "best relational performance profile",
      "primaryPressure": "queue claim and wake cadence"
    },
    "mongo": {
      "dominantWait": "none-stable",
      "strength": "fastest measured synthetic throughput",
      "primaryPressure": "latency expansion from c8 to c16 and operational dependence on transactions plus change streams"
    }
  },
  "recommendation": {
    "currentPortabilityChoice": "PostgreSQL",
    "reason": "best performance-to-operability compromise with a relational model and competitive backend-native validation",
    "fastestMeasuredBackend": "MongoDB",
    "highestValidationMaturity": "Oracle",
    "caveat": "this is a baseline-level decision pack; Oracle still has the broadest hostile-condition and Bulstrad hardening surface"
  }
}
167
docs/workflow/engine/13-backend-comparison-2026-03-17.md
Normal file
@@ -0,0 +1,167 @@

# Backend Comparison 2026-03-17

## Purpose

This document compares the current Oracle, PostgreSQL, and MongoDB workflow-engine backends using the published normalized performance baselines and the currently implemented backend-specific validation suites.

This is a decision pack for the current local Docker benchmark set. It is not the final production recommendation pack yet, because hostile-condition and Bulstrad hardening depth is still strongest on Oracle.

The durable machine-readable companion is [13-backend-comparison-2026-03-17.json](13-backend-comparison-2026-03-17.json).

For the exact six-profile signal-driver matrix, including `Oracle+Redis`, `PostgreSQL+Redis`, and `Mongo+Redis`, see [14-signal-driver-backend-matrix-2026-03-17.md](14-signal-driver-backend-matrix-2026-03-17.md).

## Source Baselines

- [10-oracle-performance-baseline-2026-03-17.md](10-oracle-performance-baseline-2026-03-17.md)
- [11-postgres-performance-baseline-2026-03-17.md](11-postgres-performance-baseline-2026-03-17.md)
- [12-mongo-performance-baseline-2026-03-17.md](12-mongo-performance-baseline-2026-03-17.md)

## Validation Status

Current comparison-relevant validation state:

- Oracle performance suite: `12/12` passed
- PostgreSQL performance suite: `11/11` passed
- MongoDB performance suite: `14/14` passed
- PostgreSQL focused backend parity suite: `9/9` passed
- MongoDB focused backend parity suite: `23/23` passed
- integration project build: `0` warnings, `0` errors

Important caveat:

- Oracle still has the broadest hostile-condition and Bulstrad E2E matrix.
- PostgreSQL and MongoDB now have backend-native signal/runtime/projection/performance coverage plus curated Bulstrad parity, but they do not yet match Oracle's full reliability surface.

## Normalized Comparison

### Synthetic Signal Round-Trip

| Metric | Oracle | PostgreSQL | MongoDB |
| --- | ---: | ---: | ---: |
| Serial latency avg ms | 3104.85 | 3079.33 | 97.88 |
| Serial latency P95 ms | 3165.04 | 3094.94 | 149.20 |
| Throughput ops/s | 20.98 | 25.74 | 76.28 |
| Throughput avg ms | 4142.13 | 3603.54 | 1110.94 |
| Throughput P95 ms | 4215.64 | 3635.59 | 1121.22 |
| Soak ops/s | 3.91 | 4.30 | 47.62 |
| Soak avg ms | 4494.29 | 4164.52 | 322.40 |
| Soak P95 ms | 5589.33 | 4208.42 | 550.50 |

### Capacity Ladder

| Concurrency | Oracle ops/s | PostgreSQL ops/s | MongoDB ops/s |
| --- | ---: | ---: | ---: |
| `c1` | 3.37 | 4.11 | 7.08 |
| `c4` | 15.22 | 17.29 | 38.35 |
| `c8` | 21.34 | 33.21 | 66.04 |
| `c16` | 34.03 | 57.04 | 68.65 |

### Transport Baselines

| Scenario | Oracle ops/s | PostgreSQL ops/s | MongoDB ops/s |
| --- | ---: | ---: | ---: |
| Immediate burst nightly | 50.18 | 70.10 | 46.18 |
| Delayed burst nightly | 10.71 | 19.60 | 16.66 |
| Immediate burst smoke | 56.94 | 70.21 | 32.11 |
| Delayed burst smoke | 2.86 | 5.59 | 4.97 |

### Bulstrad Workloads

| Scenario | Oracle ops/s | PostgreSQL ops/s | MongoDB ops/s |
| --- | ---: | ---: | ---: |
| `QuoteOrAplCancel` smoke | 19.69 | 59.88 | 23.48 |
| `QuotationConfirm -> ConvertToPolicy` nightly | 1.77 | 3.33 | 10.83 |

## Backend-Specific Observations

### Oracle

- Strongest validation depth and strongest correctness story.
- Main cost center is still commit pressure.
- Dominant wait is `log file sync` in almost every measured scenario.
- `c8` is still the last comfortable rung on the local Oracle Free setup.

### PostgreSQL

- Best relational performance profile in the current measurements.
- Immediate transport is the strongest of the three measured backends.
- Dominant wait is `Client:ClientRead`, which points to queue-claim cadence and short-transaction wake behavior, not a clear storage stall.
- `c16` still scales cleanly on this benchmark set and does not yet show a hard saturation cliff.

### MongoDB

- Fastest measured backend across the synthetic signal round-trip workloads and the medium Bulstrad nightly flow.
- No stable top wait dominated the measured runs; current analysis is more meaningful through normalized metrics and Mongo counters than through wait classification.
- The significant findings were correctness issues discovered and fixed during perf work:
  - bounded empty-queue receive
  - explicit collection bootstrap before transactional concurrency
- `c16` is the first rung where latency visibly expands even though throughput still rises.

## Decision View

### Raw Performance Ranking

For the current local benchmark set:

1. MongoDB
2. PostgreSQL
3. Oracle

This order is stable for:

- serial latency
- steady-state synthetic throughput
- soak throughput
- medium Bulstrad nightly flow

### Validation Maturity Ranking

For the current implementation state:

1. Oracle
2. PostgreSQL
3. MongoDB

Reason:

- Oracle has the deepest hostile-condition and Bulstrad E2E surface.
- PostgreSQL now has a solid backend-native suite and competitive performance, but less reliability breadth than Oracle.
- MongoDB now has good backend-native and performance coverage, but its operational model is still the most infrastructure-sensitive because it depends on replica-set transactions plus change-stream wake behavior.

### Current Recommendation

If the next backend after Oracle must be chosen today:

- choose PostgreSQL as the next default portability target

Reason:

- it materially outperforms Oracle on the normalized workflow workloads
- it preserves the relational operational model for runtime state and projections
- its wake-hint model is simpler to reason about operationally than MongoDB change streams
- it now has enough backend-native correctness and Bulstrad parity coverage to be a credible second backend

If the decision is based only on benchmark speed:

- MongoDB is currently fastest

But that is not the same as the safest operational recommendation yet.

## What Remains Before A Final Production Recommendation

- expand PostgreSQL hostile-condition coverage to match the broader Oracle matrix
- expand MongoDB hostile-condition coverage to match the broader Oracle matrix
- widen the curated Bulstrad parity pack further on PostgreSQL and MongoDB
- rerun the shared parity pack on all three backends in one closeout pass
- add environment-to-environment reruns before turning local Docker numbers into sizing guidance

## Short Conclusion

The engine is now backend-comparable on normalized performance across Oracle, PostgreSQL, and MongoDB.

The current picture is:

- Oracle is the most validated backend
- PostgreSQL is the best performance-to-operability compromise
- MongoDB is the fastest measured backend but not yet the safest backend recommendation
@@ -0,0 +1,137 @@

{
  "date": "2026-03-17",
  "type": "signal-driver-backend-matrix",
  "sourcePolicy": "artifact-driven-only",
  "generatedMatrixArtifact": {
    "markdown": "C:\\dev\\serdica-backend4\\src\\Serdica\\Ablera.Serdica.Workflow\\__Tests\\Ablera.Serdica.Workflow.IntegrationTests\\bin\\Release\\net9.0\\TestResults\\workflow-performance\\WorkflowPerfComparison\\20260317T210643496-workflow-backend-signal-roundtrip-six-profile-matrix.md",
    "json": "C:\\dev\\serdica-backend4\\src\\Serdica\\Ablera.Serdica.Workflow\\__Tests\\Ablera.Serdica.Workflow.IntegrationTests\\bin\\Release\\net9.0\\TestResults\\workflow-performance\\WorkflowPerfComparison\\20260317T210643496-workflow-backend-signal-roundtrip-six-profile-matrix.json"
  },
  "profiles": [
    "Oracle",
    "PostgreSQL",
    "Mongo",
    "Oracle+Redis",
    "PostgreSQL+Redis",
    "Mongo+Redis"
  ],
  "serialLatency": {
    "endToEndAvgMs": {
      "Oracle": 3091.73,
      "PostgreSQL": 3101.57,
      "Mongo": 151.36,
      "Oracle+Redis": 3223.22,
      "PostgreSQL+Redis": 3073.70,
      "Mongo+Redis": 3099.51
    },
    "endToEndP95Ms": {
      "Oracle": 3492.73,
      "PostgreSQL": 3143.39,
      "Mongo": 308.90,
      "Oracle+Redis": 3644.66,
      "PostgreSQL+Redis": 3090.75,
      "Mongo+Redis": 3162.04
    },
    "startAvgMs": {
      "Oracle": 105.88,
      "PostgreSQL": 16.35,
      "Mongo": 38.39,
      "Oracle+Redis": 110.03,
      "PostgreSQL+Redis": 8.32,
      "Mongo+Redis": 21.77
    },
    "signalPublishAvgMs": {
      "Oracle": 23.39,
      "PostgreSQL": 11.47,
      "Mongo": 14.30,
      "Oracle+Redis": 23.90,
      "PostgreSQL+Redis": 7.55,
      "Mongo+Redis": 10.43
    },
    "signalToFirstCompletionAvgMs": {
      "Oracle": 76.15,
      "PostgreSQL": 37.56,
      "Mongo": 55.06,
      "Oracle+Redis": 81.46,
      "PostgreSQL+Redis": 31.77,
      "Mongo+Redis": 40.88
    },
    "signalToCompletionAvgMs": {
      "Oracle": 2985.81,
      "PostgreSQL": 3085.21,
      "Mongo": 112.92,
      "Oracle+Redis": 3113.11,
      "PostgreSQL+Redis": 3065.38,
      "Mongo+Redis": 3077.73
    },
    "drainToIdleOverhangAvgMs": {
      "Oracle": 2909.65,
      "PostgreSQL": 3047.65,
      "Mongo": 57.86,
      "Oracle+Redis": 3031.66,
      "PostgreSQL+Redis": 3033.61,
      "Mongo+Redis": 3036.85
    }
  },
  "parallelThroughput": {
    "throughputOpsPerSecond": {
      "Oracle": 24.17,
      "PostgreSQL": 26.28,
      "Mongo": 119.51,
      "Oracle+Redis": 21.88,
      "PostgreSQL+Redis": 25.51,
      "Mongo+Redis": 25.14
    },
    "endToEndAvgMs": {
      "Oracle": 3740.84,
      "PostgreSQL": 3546.11,
      "Mongo": 688.57,
      "Oracle+Redis": 4147.82,
      "PostgreSQL+Redis": 3643.70,
      "Mongo+Redis": 3701.72
    },
    "endToEndP95Ms": {
      "Oracle": 3841.33,
      "PostgreSQL": 3554.13,
      "Mongo": 701.92,
      "Oracle+Redis": 4243.83,
      "PostgreSQL+Redis": 3675.15,
      "Mongo+Redis": 3721.14
    },
    "startAvgMs": {
      "Oracle": 47.44,
      "PostgreSQL": 11.32,
      "Mongo": 17.89,
      "Oracle+Redis": 55.82,
      "PostgreSQL+Redis": 17.07,
      "Mongo+Redis": 17.04
    },
    "signalPublishAvgMs": {
      "Oracle": 15.62,
      "PostgreSQL": 15.11,
      "Mongo": 10.85,
      "Oracle+Redis": 23.80,
      "PostgreSQL+Redis": 10.53,
      "Mongo+Redis": 12.27
    },
    "signalToCompletionAvgMs": {
      "Oracle": 3525.84,
      "PostgreSQL": 3469.46,
      "Mongo": 590.78,
      "Oracle+Redis": 3872.54,
      "PostgreSQL+Redis": 3564.43,
      "Mongo+Redis": 3598.14
    }
  },
  "integrity": {
    "allProfilesPassed": true,
    "requiredChecks": [
      "Failures = 0",
      "DeadLetteredSignals = 0",
      "RuntimeConflicts = 0",
      "StuckInstances = 0",
      "WorkflowsStarted = OperationCount",
      "SignalsPublished = OperationCount",
      "SignalsProcessed = OperationCount"
    ]
  }
}
@@ -0,0 +1,76 @@

# Signal Driver / Backend Matrix 2026-03-17

## Purpose

This snapshot records the current six-profile synthetic signal round-trip comparison:

- `Oracle`
- `PostgreSQL`
- `Mongo`
- `Oracle+Redis`
- `PostgreSQL+Redis`
- `Mongo+Redis`

The matrix is artifact-driven. Every value comes from measured JSON artifacts under `TestResults/workflow-performance/`. No hand-entered metric values are used.

The exact generated matrix artifact is:

- `WorkflowPerfComparison/20260317T210643496-workflow-backend-signal-roundtrip-six-profile-matrix.md`
- `WorkflowPerfComparison/20260317T210643496-workflow-backend-signal-roundtrip-six-profile-matrix.json`

## Serial Latency

Primary comparison rows in this section are:

- `Signal to first completion avg`
- `Drain-to-idle overhang avg`

`Signal to completion avg` is a mixed number. It includes both real resume work and the benchmark drain policy.

| Metric | Unit | Oracle | PostgreSQL | Mongo | Oracle+Redis | PostgreSQL+Redis | Mongo+Redis |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: |
| End-to-end avg | ms | 3091.73 | 3101.57 | 151.36 | 3223.22 | 3073.70 | 3099.51 |
| End-to-end p95 | ms | 3492.73 | 3143.39 | 308.90 | 3644.66 | 3090.75 | 3162.04 |
| Start avg | ms | 105.88 | 16.35 | 38.39 | 110.03 | 8.32 | 21.77 |
| Signal publish avg | ms | 23.39 | 11.47 | 14.30 | 23.90 | 7.55 | 10.43 |
| Signal to first completion avg | ms | 76.15 | 37.56 | 55.06 | 81.46 | 31.77 | 40.88 |
| Signal to completion avg | ms | 2985.81 | 3085.21 | 112.92 | 3113.11 | 3065.38 | 3077.73 |
| Drain-to-idle overhang avg | ms | 2909.65 | 3047.65 | 57.86 | 3031.66 | 3033.61 | 3036.85 |

## Parallel Throughput

| Metric | Unit | Oracle | PostgreSQL | Mongo | Oracle+Redis | PostgreSQL+Redis | Mongo+Redis |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: |
| Throughput | ops/s | 24.17 | 26.28 | 119.51 | 21.88 | 25.51 | 25.14 |
| End-to-end avg | ms | 3740.84 | 3546.11 | 688.57 | 4147.82 | 3643.70 | 3701.72 |
| End-to-end p95 | ms | 3841.33 | 3554.13 | 701.92 | 4243.83 | 3675.15 | 3721.14 |
| Start avg | ms | 47.44 | 11.32 | 17.89 | 55.82 | 17.07 | 17.04 |
| Signal publish avg | ms | 15.62 | 15.11 | 10.85 | 23.80 | 10.53 | 12.27 |
| Signal to completion avg | ms | 3525.84 | 3469.46 | 590.78 | 3872.54 | 3564.43 | 3598.14 |

## Integrity

The comparison test required all six columns to pass these checks on both the serial-latency and parallel-throughput source artifacts:

- `Failures = 0`
- `DeadLetteredSignals = 0`
- `RuntimeConflicts = 0`
- `StuckInstances = 0`
- `WorkflowsStarted = OperationCount`
- `SignalsPublished = OperationCount`
- `SignalsProcessed = OperationCount`

All six columns passed.
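
The integrity gate can be expressed as a small check over a result artifact. The sketch below is illustrative, not the actual test code; the flat artifact shape and field names mirror the published check list but are otherwise assumed:

```python
# Hedged sketch of the integrity gate: every profile artifact must show zero
# failure counters and exact operation-count parity. Artifact shape is assumed.
REQUIRED_ZERO = ["Failures", "DeadLetteredSignals", "RuntimeConflicts", "StuckInstances"]
REQUIRED_PARITY = ["WorkflowsStarted", "SignalsPublished", "SignalsProcessed"]

def profile_passes(artifact: dict) -> bool:
    ops = artifact["OperationCount"]
    return (all(artifact[k] == 0 for k in REQUIRED_ZERO)
            and all(artifact[k] == ops for k in REQUIRED_PARITY))

sample = {
    "OperationCount": 64,
    "Failures": 0, "DeadLetteredSignals": 0,
    "RuntimeConflicts": 0, "StuckInstances": 0,
    "WorkflowsStarted": 64, "SignalsPublished": 64, "SignalsProcessed": 64,
}

print(profile_passes(sample))  # True
```

A single dropped or duplicated signal breaks parity, so any dead letter, conflict, or stuck instance fails the whole column rather than just lowering a score.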

## Interpretation

The main conclusions from this six-profile matrix are:

- Native Mongo is still the fastest measured profile for the synthetic signal round-trip.
- Native PostgreSQL remains the best-performing relational profile.
- Oracle+Redis is slower than native Oracle in this benchmark.
- PostgreSQL+Redis is very close to native PostgreSQL, but not clearly better.
- Mongo+Redis is dramatically worse than native Mongo because the Redis path reintroduces the empty-wait overhang that native change streams avoid.

The most useful row for actual resume speed is `Signal to first completion avg`, not the mixed `Signal to completion avg`, because the latter still includes drain-to-idle policy.
493
docs/workflow/engine/15-backend-and-signal-driver-usage.md
Normal file
@@ -0,0 +1,493 @@

# 15. Backend And Signal Driver Usage

## Purpose

This document turns the current backend implementation and measured six-profile matrix into operating guidance.

It answers three practical questions:

1. which backend should be the durable workflow system of record
2. whether the signal driver should stay native or use Redis
3. when a given combination should or should not be used

The reference comparison data comes from:

- [13-backend-comparison-2026-03-17.md](13-backend-comparison-2026-03-17.md)
- [14-signal-driver-backend-matrix-2026-03-17.md](14-signal-driver-backend-matrix-2026-03-17.md)

## Two Separate Choices

There are two distinct infrastructure choices in the current engine.

### 1. Backend

The backend is the durable correctness layer.

It owns:

- runtime state
- projections
- durable signal persistence
- delayed signal persistence
- dead-letter persistence
- mutation transaction boundary

The configured backend lives under:

- `WorkflowBackend:Provider`

Supported values are defined by the engine backend identifiers.

Current values:

- `Oracle`
- `Postgres`
- `Mongo`

### 2. Signal Driver

The signal driver is the wake mechanism.

It owns:

- wake notification delivery
- receive wait behavior
- claim loop entry path

It does not own correctness.

The configured signal driver lives under:

- `WorkflowSignalDriver:Provider`

Supported values are defined by the engine signal-driver identifiers.

Current values:

- `Native`
- `Redis`

## Core Rule

Redis is a wake driver, not a durable workflow queue.

That means:

1. the selected backend always remains the durable source of truth
2. runtime state and durable signals commit in the backend transaction boundary
3. Redis only publishes wake hints after commit
4. workers always claim from the durable backend store

Do not design or describe Redis as the place where workflow correctness lives.
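
The commit-then-hint ordering can be sketched as follows. This is an illustrative Python model, not engine code; every class and method name here is hypothetical:

```python
# Hedged sketch of the core rule: the durable backend commits state and the
# signal row first; Redis only receives a wake hint afterwards, and workers
# always claim from the durable store, never from Redis.
class DurableBackend:
    """Stands in for Oracle/Postgres/Mongo: the transaction boundary lives here."""
    def __init__(self):
        self.signals = []

    def commit_signal(self, signal):
        # Durable commit: the signal now exists regardless of any wake hint.
        self.signals.append(signal)

    def claim_next(self):
        # Workers claim durably; a lost Redis hint only delays the claim.
        return self.signals.pop(0) if self.signals else None

class RedisWake:
    """Wake hints only: carries no payload and no correctness authority."""
    def __init__(self):
        self.hints = 0

    def publish_hint(self):
        self.hints += 1

def publish(backend, wake, signal):
    backend.commit_signal(signal)  # 1. durable commit first
    wake.publish_hint()            # 2. wake hint strictly after commit

backend, wake = DurableBackend(), RedisWake()
publish(backend, wake, {"type": "QuotationConfirm"})
print(backend.claim_next(), wake.hints)
```

Because the hint carries no data, a dropped hint degrades latency (the worker waits for its next poll) but never loses a signal.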

## Supported Profiles

| Profile | Durable correctness layer | Wake path | Current recommendation |
| --- | --- | --- | --- |
| `Oracle + Native` | Oracle + AQ | AQ dequeue | Default production profile |
| `Oracle + Redis` | Oracle + AQ | Redis wake, AQ claim | Supported, not preferred |
| `Postgres + Native` | PostgreSQL tables | PostgreSQL native wake | Best relational portability profile |
| `Postgres + Redis` | PostgreSQL tables | Redis wake, PostgreSQL claim | Supported, optional |
| `Mongo + Native` | Mongo collections | Mongo change streams | Fastest measured profile, with operational caveats |
| `Mongo + Redis` | Mongo collections | Redis wake, Mongo claim | Supported, generally not recommended |

## How To Read The Performance Data

The six-profile matrix contains both real resume timing and benchmark drain policy timing.

Use these rows as primary decision inputs:

- `Signal to first completion avg`
- `Throughput`

Treat these rows as secondary:

- `Signal to completion avg`
- `Drain-to-idle overhang avg`

Reason:

- `Signal to first completion avg` measures actual wake and resume speed
- `Signal to completion avg` also includes empty-queue drain behavior
- `Drain-to-idle overhang avg` explains how much of the mixed latency is benchmark overhang, not real resume work

The current matrix shows that clearly:

| Metric | Oracle | PostgreSQL | Mongo | Oracle+Redis | PostgreSQL+Redis | Mongo+Redis |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| Signal to first completion avg ms | 76.15 | 37.56 | 55.06 | 81.46 | 31.77 | 40.88 |
| Throughput ops/s | 24.17 | 26.28 | 119.51 | 21.88 | 25.51 | 25.14 |
| Drain-to-idle overhang avg ms | 2909.65 | 3047.65 | 57.86 | 3031.66 | 3033.61 | 3036.85 |

Interpretation:

- native Mongo is fast because the native change-stream wake path also has low empty-receive overhang
- PostgreSQL native and PostgreSQL plus Redis are close in real resume speed
- Oracle native remains slightly better than Oracle plus Redis
- Mongo plus Redis loses most of native Mongo's advantage because Redis mode reintroduces the empty-wait overhang

## Recommended Default Choices

### Default Production Choice Today

Use `Oracle + Native`.

Use it when:

- Oracle is already the platform system of record
- strongest validated correctness and restart behavior matter more than portability
- AQ is available and operationally acceptable
- timer precision and native transactional coupling are important

Why:

- it has the strongest hostile-condition coverage
- it remains the semantic reference implementation
- it keeps one native durable stack for state, signals, and scheduling

### Best Relational Non-Oracle Choice

Use `Postgres + Native`.

Use it when:

- a relational backend is required
- Oracle is not desired
- you want the cleanest portability path
- you want performance close to Oracle with simpler infrastructure

Why:

- it is the strongest non-Oracle backend in the current relational comparison
- native PostgreSQL wake is already competitive with Redis in the current measurements
- it keeps one backend-native operational story

### Highest Measured Synthetic Throughput Choice

Use `Mongo + Native` only when its operational assumptions are acceptable.

Use it when:

- throughput and low wake latency matter strongly
- Mongo replica-set transactions are already an accepted platform dependency
- the team is comfortable operating change streams and Mongo-specific failure modes

Why:

- it is currently the fastest measured profile
- its native wake path avoids the large empty-wait overhang seen in the other measured paths

Do not treat this as the universal default.

Mongo is fast in the current engine workload, but its operational model is still less conservative than the relational profiles.

## When Redis Should Be Used

Redis should be selected for operational topology reasons, not by default as a performance assumption.

Good reasons to use Redis:

- one shared wake substrate is required across multiple backend profiles
- the deployment already standardizes on Redis for fan-out and worker wake infrastructure
- you want the backend-native wake path disabled intentionally and replaced by one uniform wake mechanism

Weak reasons to use Redis:

- "Redis is always faster"
- "Redis should hold the durable signal queue"
- "Redis should replace the backend transaction boundary"

Those are not valid design assumptions for this engine.

## Profile-By-Profile Guidance

### Oracle + Native

Use when:

- Oracle is the chosen workflow backend
- AQ is available
- you want the strongest native transactional semantics

Do not switch away from it just to standardize on Redis.

Current measured result:

- native Oracle is slightly better than Oracle plus Redis on both first-completion latency and throughput

### Oracle + Redis

Use only when:

- Oracle remains the durable backend
- Redis is required as a uniform wake topology across the environment
- the small performance loss is acceptable

Do not use it as the default Oracle profile.

Current measured result:

- it works correctly
- it is slower than native Oracle
- it does not improve timer behavior today

### Postgres + Native

Use as the first portability target when leaving Oracle.

Use when:

- you want a relational durable store
- you want the cleanest alternative to Oracle
- you want the simplest operational story for PostgreSQL

This should be the default PostgreSQL profile.

### Postgres + Redis

Use when:

- PostgreSQL is the durable backend
- a shared Redis wake topology is required
- a nearly flat performance profile versus native PostgreSQL is acceptable

Do not assume it is a speed upgrade.

Current measured result:

- it is very close to native PostgreSQL
- it is not a compelling performance win on its own

### Mongo + Native

Use when:

- MongoDB is an accepted transactional system of record for workflow runtime state
- replica-set transactions are available
- the team accepts Mongo operational ownership

This should be the default Mongo profile.

### Mongo + Redis

Avoid as the normal Mongo profile.

Use only when:

- Mongo must remain the durable backend
- Redis wake standardization is mandatory for the deployment
- the team accepts materially worse measured wake behavior than native Mongo

Current measured result:

- native Mongo is much better overall
- first-completion latency stays acceptable, but steady throughput and idle-drain behavior become much worse
- Redis removes the main measured advantage of the native Mongo wake path

## Timer And Delayed-Signal Guidance

Timers remain durable in the selected backend.

That means:

- Oracle timers remain durable in AQ
- PostgreSQL timers remain durable in PostgreSQL tables
- Mongo timers remain durable in Mongo collections

Redis does not become the timer authority.

Current practical rule:

- if timer behavior is a primary concern, prefer the native signal driver for the selected backend

Reason:

- Redis wake currently optimizes wake notification, not durable due-time ownership
- delayed messages still live in the backend store
- due-time wake precision in Redis mode is still bounded by the driver wait policy rather than a separate Redis-native timer authority

## What Must Not Be Mixed

Do not mix durable responsibilities across systems.

Bad combinations:

- Oracle runtime state with PostgreSQL signals
- PostgreSQL runtime state with Redis as the durable signal queue
- Mongo runtime state with Oracle scheduling
- one backend for runtime state and another backend for projections

Use one backend profile per deployment.

The only supported cross-system split is:

- durable backend
- optional Redis wake driver
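
A deployment can guard against unsupported mixes with a simple configuration check. The sketch below uses the documented `WorkflowBackend:Provider` and `WorkflowSignalDriver:Provider` identifiers; the helper itself is hypothetical and not part of the engine:

```python
# Hedged sketch: enforce exactly one supported backend/driver profile per
# deployment. Provider values come from the documented identifiers; the
# config dict mirrors the JSON configuration examples in this document.
SUPPORTED_BACKENDS = {"Oracle", "Postgres", "Mongo"}
SUPPORTED_DRIVERS = {"Native", "Redis"}

def validate_profile(config: dict) -> str:
    backend = config.get("WorkflowBackend", {}).get("Provider")
    driver = config.get("WorkflowSignalDriver", {}).get("Provider")
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"unsupported backend provider: {backend!r}")
    if driver not in SUPPORTED_DRIVERS:
        raise ValueError(f"unsupported signal driver provider: {driver!r}")
    return f"{backend} + {driver}"

print(validate_profile({
    "WorkflowBackend": {"Provider": "Postgres"},
    "WorkflowSignalDriver": {"Provider": "Native"},
}))  # Postgres + Native
```

Any durable responsibility split other than backend plus optional Redis wake falls outside this check by construction, because only one backend provider can be configured.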

## Operational Decision Matrix

| Goal | Recommended profile |
| --- | --- |
| strongest production default today | `Oracle + Native` |
| best non-Oracle relational target | `Postgres + Native` |
| one uniform wake substrate across relational backends | `Postgres + Redis` |
| highest measured synthetic wake and throughput | `Mongo + Native` |
| Mongo with forced Redis standardization | `Mongo + Redis`, only if policy requires it |
| Oracle with forced Redis standardization | `Oracle + Redis`, only if policy requires it |

## Configuration Surface

### Oracle + Native

```json
{
  "WorkflowBackend": {
    "Provider": "Oracle"
  },
  "WorkflowSignalDriver": {
    "Provider": "Native"
  },
  "WorkflowAq": {
    "QueueOwner": "SRD_WFKLW",
    "SignalQueueName": "WF_SIGNAL_Q",
    "ScheduleQueueName": "WF_SCHEDULE_Q",
    "DeadLetterQueueName": "WF_DLQ_Q"
  }
}
```

### Oracle + Redis

```json
{
  "WorkflowBackend": {
    "Provider": "Oracle"
  },
  "WorkflowSignalDriver": {
    "Provider": "Redis",
    "Redis": {
      "ChannelName": "serdica:workflow:signals",
      "BlockingWaitSeconds": 5
    }
  },
  "WorkflowAq": {
    "QueueOwner": "SRD_WFKLW",
    "SignalQueueName": "WF_SIGNAL_Q",
    "ScheduleQueueName": "WF_SCHEDULE_Q",
    "DeadLetterQueueName": "WF_DLQ_Q"
  }
}
```

### Postgres + Native

```json
{
  "WorkflowBackend": {
    "Provider": "Postgres",
    "Postgres": {
      "ConnectionStringName": "WorkflowPostgres",
      "SchemaName": "srd_wfklw",
      "ClaimBatchSize": 32,
      "BlockingWaitSeconds": 30
    }
  },
  "WorkflowSignalDriver": {
    "Provider": "Native"
  }
}
```

### Postgres + Redis

```json
{
  "WorkflowBackend": {
    "Provider": "Postgres",
    "Postgres": {
      "ConnectionStringName": "WorkflowPostgres",
      "SchemaName": "srd_wfklw"
    }
  },
  "WorkflowSignalDriver": {
    "Provider": "Redis",
    "Redis": {
      "ChannelName": "serdica:workflow:signals",
      "BlockingWaitSeconds": 5
    }
  }
}
```

### Mongo + Native

```json
{
  "WorkflowBackend": {
|
||||
"Provider": "Mongo",
|
||||
"Mongo": {
|
||||
"ConnectionStringName": "WorkflowMongo",
|
||||
"DatabaseName": "serdica_workflow_store",
|
||||
"BlockingWaitSeconds": 30
|
||||
}
|
||||
},
|
||||
"WorkflowSignalDriver": {
|
||||
"Provider": "Native"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Mongo + Redis
|
||||
|
||||
```json
|
||||
{
|
||||
"WorkflowBackend": {
|
||||
"Provider": "Mongo",
|
||||
"Mongo": {
|
||||
"ConnectionStringName": "WorkflowMongo",
|
||||
"DatabaseName": "serdica_workflow_store"
|
||||
}
|
||||
},
|
||||
"WorkflowSignalDriver": {
|
||||
"Provider": "Redis",
|
||||
"Redis": {
|
||||
"ChannelName": "serdica:workflow:signals",
|
||||
"BlockingWaitSeconds": 5
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Plugin Registration Rule
|
||||
|
||||
The host stays backend-neutral.
|
||||
|
||||
That means the selected backend and optional Redis wake plugin must be present in `PluginsConfig:PluginsOrder`.
|
||||
|
||||
Relevant plugin categories are:
|
||||
|
||||
- Oracle backend plugin
|
||||
- PostgreSQL backend plugin
|
||||
- MongoDB backend plugin
|
||||
- Redis wake-driver plugin
|
||||
|
||||
If Redis is not configured, do not register it just because it exists.
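
For example, an `Oracle + Redis` deployment would list both plugins. The entry names below are placeholders for illustration only; this document does not specify the actual plugin identifiers:

```json
{
  "PluginsConfig": {
    "PluginsOrder": [
      "Workflow.Backend.Oracle",
      "Workflow.SignalDriver.Redis"
    ]
  }
}
```

An `Oracle + Native` deployment would keep only the Oracle backend entry and drop the Redis line.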

## Recommended Decision Order

When choosing a deployment profile, use this order:

1. choose the durable backend based on correctness and platform ownership
2. choose the native signal driver first
3. add Redis only if there is a clear topology or operational reason
4. validate the choice against the six-profile matrix, not assumption

## Current Bottom Line

Today the practical recommendation is:

- `Oracle + Native` for the strongest default production backend
- `Postgres + Native` for the best relational portability target
- `Mongo + Native` only when Mongo operational assumptions are explicitly accepted
- `Redis` as an optional wake standardization layer, not as the default performance answer

85
docs/workflow/engine/index.md
Normal file
@@ -0,0 +1,85 @@

# Serdica Workflow Engine Architecture

## Purpose

This folder defines the target architecture for the workflow engine runtime.

The design in this folder assumes:

- authored C# workflow classes remain the source of truth
- workflows are already fully declarative and can be compiled to canonical definitions
- the service will run a single engine provider per deployment
- migration and concurrent engine execution are out of scope for v1
- Oracle is the durable system of record
- Oracle Advanced Queuing is the default signaling and scheduling backend
- Redis is optional and not on the correctness path

This package is intentionally detailed. It documents engine behavior, persistence, signaling, and runtime structure. Platform transport and command-mapping details are outside its scope.

## Reading Order

1. [01-requirements-and-principles.md](01-requirements-and-principles.md)
2. [02-runtime-and-component-architecture.md](02-runtime-and-component-architecture.md)
3. [03-canonical-execution-model.md](03-canonical-execution-model.md)
4. [04-persistence-signaling-and-scheduling.md](04-persistence-signaling-and-scheduling.md)
5. [05-service-surface-hosting-and-operations.md](05-service-surface-hosting-and-operations.md)
6. [06-implementation-structure.md](06-implementation-structure.md)
7. [07-sprint-plan.md](07-sprint-plan.md)
8. [08-load-and-performance-plan.md](08-load-and-performance-plan.md)
9. [09-backend-portability-plan.md](09-backend-portability-plan.md)
10. [10-oracle-performance-baseline-2026-03-17.md](10-oracle-performance-baseline-2026-03-17.md)
11. [11-postgres-performance-baseline-2026-03-17.md](11-postgres-performance-baseline-2026-03-17.md)
12. [12-mongo-performance-baseline-2026-03-17.md](12-mongo-performance-baseline-2026-03-17.md)
13. [13-backend-comparison-2026-03-17.md](13-backend-comparison-2026-03-17.md)
14. [14-signal-driver-backend-matrix-2026-03-17.md](14-signal-driver-backend-matrix-2026-03-17.md)
15. [15-backend-and-signal-driver-usage.md](15-backend-and-signal-driver-usage.md)

## Executive Summary

The engine is designed around six core decisions:

1. Workflow execution moves from the earlier runtime to a canonical interpreter owned by the engine.
2. The interpreter executes canonical workflow definitions compiled from authored C# workflows.
3. Oracle remains the single durable source of truth for workflow runtime state, projections, and host coordination.
4. Oracle AQ provides durable signaling and scheduling with blocking dequeue semantics, which removes polling from the steady-state engine path.
5. The engine uses a run-to-wait model: an instance is loaded, advanced until the next wait boundary, persisted, and released. No node permanently owns an instance.
6. The workflow service surface remains stable. The engine is a runtime replacement, not a transport or UI rewrite.
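
Decision 5's run-to-wait loop can be sketched roughly as follows. This is an illustrative shape only; the names (`signalQueue`, `instanceStore`, `interpreter`) and method signatures are assumptions for the sketch, not the engine's actual API:

```csharp
// Hypothetical steady-state pump for one engine worker.
while (!stopping.IsCancellationRequested)
{
    // Blocking dequeue from the durable queue (AQ-style) — no polling.
    var signal = await signalQueue.DequeueAsync(stopping);

    // Claim and load the instance; no node owns it permanently.
    var instance = await instanceStore.ClaimAsync(signal.InstanceId, stopping);

    // Advance until the next wait boundary (task activation, timer, call).
    var outcome = await interpreter.RunToWaitAsync(instance, signal, stopping);

    // Persist durable state, then release the claim so any node can pick it up next.
    await instanceStore.SaveAndReleaseAsync(instance, outcome, stopping);
}
```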

## Scope Summary

### In Scope

- start workflow
- activate human tasks
- assign, release, and complete tasks
- execute canonical transport calls
- execute canonical conditions, loops, branches, and subworkflows
- schedule delayed resumes and retry wakes without polling
- resume safely after node, service, or full-cluster restart
- support multi-instance deployment with one shared database
- preserve read-side service contracts, projections, authorization, diagrams, retention, and canonical inspection

### Out of Scope

- concurrent old-runtime and engine execution
- in-place instance migration between engines
- Redis as a correctness-grade signaling dependency
- user-facing workflow authoring changes
- replacing the public workflow service surface

## Product Position

At the workflow-service boundary, the runtime still has to support:

- workflow and task operations
- the task inbox and assignment system
- the workflow diagram provider
- canonical definition and validation access
- the operational retention owner

The engine is a runtime subsystem under the workflow service, not a separate product.

## Design Baseline

The engine architecture in this folder should be treated as the default implementation plan unless a later ADR explicitly replaces part of it.
30
docs/workflow/tutorials/01-hello-world/README.md
Normal file
@@ -0,0 +1,30 @@

# Tutorial 1: Hello World

The simplest possible workflow: initialize state from a start request, activate a single human task, and complete the workflow when the task is done.

## Concepts Introduced

- `IDeclarativeWorkflow<T>` — the contract every workflow implements
- `WorkflowSpec.For<T>()` — the builder entry point
- `.InitializeState()` — transforms the start request into workflow state
- `.StartWith(task)` — sets the first task to activate
- `WorkflowHumanTask.For<T>()` — defines a human task
- `.OnComplete(flow => flow.Complete())` — terminal step

## What Happens at Runtime

1. Client calls `StartWorkflowAsync` with `WorkflowName = "Greeting"` and payload `{ "customerName": "John" }`
2. State initializes to `{ "customerName": "John" }`
3. Task "Greet Customer" is created with status "Pending"
4. A user assigns the task to themselves, then completes it
5. `OnComplete` executes `.Complete()` — the workflow finishes
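
Step 1 on the wire might look like this. The route is hypothetical (this tutorial does not name the endpoint); only the workflow name and payload shape come from the steps above:

```
POST /workflows/start
{
  "workflowName": "Greeting",
  "payload": { "customerName": "John" }
}
```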

## Variants

- [C# Fluent DSL](csharp/)
- [Canonical JSON](json/)

## Next

[Tutorial 2: Service Tasks](../02-service-tasks/) — call external services before or after human tasks.
@@ -0,0 +1,58 @@

using System.Collections.Generic;
using System.Text.Json;

using WorkflowEngine.Abstractions;
using WorkflowEngine.Contracts;

namespace WorkflowEngine.Tutorials;

// Start request — defines the input contract for the workflow.
public sealed class GreetingRequest
{
    public string CustomerName { get; set; } = string.Empty;
}

// Workflow definition — implements IDeclarativeWorkflow<TStartRequest>.
public sealed class GreetingWorkflow : IDeclarativeWorkflow<GreetingRequest>
{
    // Identity: name + version uniquely identify the workflow definition.
    public string WorkflowName => "Greeting";
    public string WorkflowVersion => "1.0.0";
    public string DisplayName => "Customer Greeting";

    // Roles: which user roles can see and interact with this workflow's tasks.
    public IReadOnlyCollection<string> WorkflowRoles => ["DBA", "UR_AGENT"];

    // Spec: the workflow specification built via the fluent DSL.
    public WorkflowSpec<GreetingRequest> Spec { get; } = WorkflowSpec
        .For<GreetingRequest>()

        // InitializeState: transform the start request into the workflow's mutable state.
        // State is a Dictionary<string, JsonElement> — all values are JSON-serialized.
        .InitializeState(request => new Dictionary<string, JsonElement>
        {
            ["customerName"] = JsonSerializer.SerializeToElement(request.CustomerName),
        })

        // StartWith: register and activate this task as the first step.
        .StartWith(greetTask)
        .Build();

    // Tasks: expose task descriptors for the registration catalog.
    public IReadOnlyCollection<WorkflowTaskDescriptor> Tasks => Spec.TaskDescriptors;

    // Task definition: defines name, type (UI component), route (navigation), and behavior.
    private static readonly WorkflowHumanTaskDefinition<GreetingRequest> greetTask =
        WorkflowHumanTask.For<GreetingRequest>(
            taskName: "Greet Customer",    // unique name within this workflow
            taskType: "GreetCustomerTask", // UI component identifier
            route: "customers/greet")      // navigation route
        .WithPayload(context => new Dictionary<string, JsonElement>
        {
            // Pass state values to the task's UI payload.
            ["customerName"] = context.StateValues
                .GetRequired<string>("customerName").AsJsonElement(),
        })
        // OnComplete: what happens after the user completes this task.
        .OnComplete(flow => flow.Complete()); // simply end the workflow
}
@@ -0,0 +1,56 @@

{
  "schemaVersion": "serdica.workflow.definition/v1",
  "workflowName": "Greeting",
  "workflowVersion": "1.0.0",
  "displayName": "Customer Greeting",

  "startRequest": {
    "contractName": "GreetingRequest",
    "allowAdditionalProperties": true
  },

  "workflowRoles": ["DBA", "UR_AGENT"],

  "start": {
    "initializeStateExpression": {
      "$type": "object",
      "properties": [
        {
          "name": "customerName",
          "expression": { "$type": "path", "path": "start.customerName" }
        }
      ]
    },
    "sequence": {
      "steps": [
        {
          "$type": "activate-task",
          "taskName": "Greet Customer"
        }
      ]
    }
  },

  "tasks": [
    {
      "taskName": "Greet Customer",
      "taskType": "GreetCustomerTask",
      "routeExpression": { "$type": "string", "value": "customers/greet" },
      "taskRoles": [],
      "payloadExpression": {
        "$type": "object",
        "properties": [
          {
            "name": "customerName",
            "expression": { "$type": "path", "path": "state.customerName" }
          }
        ]
      },
      "onCompleteSequence": {
        "steps": [
          { "$type": "complete" }
        ]
      }
    }
  ]
}
29
docs/workflow/tutorials/02-service-tasks/README.md
Normal file
@@ -0,0 +1,29 @@

# Tutorial 2: Service Tasks

Call external services (microservices, HTTP APIs, GraphQL, RabbitMQ) from within a workflow. Handle failures and timeouts gracefully.

## Concepts Introduced

- `.Call()` — invoke a transport with payload and optional response capture
- Address types — `LegacyRabbit`, `Microservice`, `Http`, `Graphql`, `Rabbit`
- `resultKey` — store the service response in workflow state
- `whenFailure` / `whenTimeout` — recovery branches
- `WorkflowHandledBranchAction.Complete` — shorthand for "complete on error"
- `timeoutSeconds` — per-step timeout override (default: 1 hour)

## Key Points

- Each `Call` step executes synchronously within the workflow
- The per-step timeout wraps the entire call including transport-level retries
- Transport timeouts (30s default) control individual attempt duration
- If no failure/timeout handler is defined, the error propagates and the signal pump retries
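
As a worked reading of the two timeout layers: with `timeoutSeconds: 120` and the default 30-second transport timeout, a call that never answers can make at most four attempts before the per-step deadline triggers the `whenTimeout` branch. The timeline below is illustrative and assumes sequential retries with no backoff; the actual retry policy is not specified in this tutorial:

```
per-step timeout: 120 s
├─ attempt 1:  0–30 s   (transport timeout)
├─ attempt 2: 30–60 s   (transport timeout)
├─ attempt 3: 60–90 s   (transport timeout)
└─ attempt 4: 90–120 s  (per-step deadline reached → whenTimeout)
```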

## Variants

- [C# Fluent DSL](csharp/)
- [Canonical JSON](json/)

## Next

[Tutorial 3: Decisions](../03-decisions/) — branch workflow logic based on conditions.
@@ -0,0 +1,83 @@

using System.Collections.Generic;
using System.Text.Json;

using WorkflowEngine.Abstractions;
using WorkflowEngine.Contracts;

namespace WorkflowEngine.Tutorials;

public sealed class PolicyValidationRequest
{
    public long PolicyId { get; set; }
}

public sealed class PolicyValidationWorkflow : IDeclarativeWorkflow<PolicyValidationRequest>
{
    public string WorkflowName => "PolicyValidation";
    public string WorkflowVersion => "1.0.0";
    public string DisplayName => "Policy Validation";
    public IReadOnlyCollection<string> WorkflowRoles => ["DBA"];

    public WorkflowSpec<PolicyValidationRequest> Spec { get; } = WorkflowSpec
        .For<PolicyValidationRequest>()
        .InitializeState(WorkflowExpr.Object(
            WorkflowExpr.Prop("policyId", WorkflowExpr.Path("start.policyId"))))
        .StartWith(BuildFlow)
        .Build();

    public IReadOnlyCollection<WorkflowTaskDescriptor> Tasks => Spec.TaskDescriptors;

    private static void BuildFlow(WorkflowFlowBuilder<PolicyValidationRequest> flow)
    {
        flow
            // --- Example 1: Simple call with shorthand error handling ---
            .Call(
                "Validate Policy",                           // step name
                Address.LegacyRabbit("pas_policy_validate"), // transport address
                WorkflowExpr.Object(                         // payload (expression-based)
                    WorkflowExpr.Prop("policyId", WorkflowExpr.Path("state.policyId"))),
                WorkflowHandledBranchAction.Complete,        // on failure: complete workflow
                WorkflowHandledBranchAction.Complete)        // on timeout: complete workflow

            // --- Example 2: Call with typed response stored in state ---
            .Call<object>(
                "Load Policy Info",
                Address.LegacyRabbit("pas_get_policy_product_info"),
                WorkflowExpr.Object(
                    WorkflowExpr.Prop("policyId", WorkflowExpr.Path("state.policyId"))),
                WorkflowHandledBranchAction.Complete,
                WorkflowHandledBranchAction.Complete,
                resultKey: "policyInfo") // store response as "policyInfo"

            // Use the result to set state values
            .SetIfHasValue("productCode",
                WorkflowExpr.Func("upper", WorkflowExpr.Path("result.policyInfo.productCode")))
            .SetIfHasValue("lob",
                WorkflowExpr.Path("result.policyInfo.lob"))

            // --- Example 3: Call with custom failure/timeout branches ---
            .Call(
                "Calculate Premium",
                Address.LegacyRabbit("pas_premium_calculate_for_object",
                    SerdicaLegacyRabbitMode.MicroserviceConsumer),
                WorkflowExpr.Object(
                    WorkflowExpr.Prop("policyId", WorkflowExpr.Path("state.policyId"))),
                whenFailure: fail => fail
                    .Set("calculationFailed", WorkflowExpr.Bool(true))
                    .Complete(),
                whenTimeout: timeout => timeout
                    .Set("calculationTimedOut", WorkflowExpr.Bool(true))
                    .Complete(),
                timeoutSeconds: 120) // per-step timeout: 2 minutes

            // --- Example 4: HTTP transport ---
            // .Call("Notify External",
            //     Address.Http("authority", "/api/v1/notifications", "POST"),
            //     WorkflowExpr.Object(
            //         WorkflowExpr.Prop("message", WorkflowExpr.String("Policy validated"))),
            //     WorkflowHandledBranchAction.Complete,
            //     WorkflowHandledBranchAction.Complete)

            .Complete();
    }
}
@@ -0,0 +1,89 @@

{
  "schemaVersion": "serdica.workflow.definition/v1",
  "workflowName": "PolicyValidation",
  "workflowVersion": "1.0.0",
  "displayName": "Policy Validation",
  "workflowRoles": ["DBA"],

  "start": {
    "initializeStateExpression": {
      "$type": "object",
      "properties": [
        { "name": "policyId", "expression": { "$type": "path", "path": "start.policyId" } }
      ]
    },
    "sequence": {
      "steps": [
        {
          "$type": "call-transport",
          "stepName": "Validate Policy",
          "invocation": {
            "address": { "$type": "legacy-rabbit", "command": "pas_policy_validate", "mode": "Envelope" },
            "payloadExpression": {
              "$type": "object",
              "properties": [
                { "name": "policyId", "expression": { "$type": "path", "path": "state.policyId" } }
              ]
            }
          },
          "whenFailure": { "steps": [{ "$type": "complete" }] },
          "whenTimeout": { "steps": [{ "$type": "complete" }] }
        },
        {
          "$type": "call-transport",
          "stepName": "Load Policy Info",
          "resultKey": "policyInfo",
          "invocation": {
            "address": { "$type": "legacy-rabbit", "command": "pas_get_policy_product_info", "mode": "Envelope" },
            "payloadExpression": {
              "$type": "object",
              "properties": [
                { "name": "policyId", "expression": { "$type": "path", "path": "state.policyId" } }
              ]
            }
          },
          "whenFailure": { "steps": [{ "$type": "complete" }] },
          "whenTimeout": { "steps": [{ "$type": "complete" }] }
        },
        {
          "$type": "set-state",
          "stateKey": "productCode",
          "valueExpression": {
            "$type": "function",
            "functionName": "upper",
            "arguments": [{ "$type": "path", "path": "result.policyInfo.productCode" }]
          },
          "onlyIfPresent": true
        },
        {
          "$type": "call-transport",
          "stepName": "Calculate Premium",
          "timeoutSeconds": 120,
          "invocation": {
            "address": { "$type": "legacy-rabbit", "command": "pas_premium_calculate_for_object", "mode": "MicroserviceConsumer" },
            "payloadExpression": {
              "$type": "object",
              "properties": [
                { "name": "policyId", "expression": { "$type": "path", "path": "state.policyId" } }
              ]
            }
          },
          "whenFailure": {
            "steps": [
              { "$type": "set-state", "stateKey": "calculationFailed", "valueExpression": { "$type": "boolean", "value": true } },
              { "$type": "complete" }
            ]
          },
          "whenTimeout": {
            "steps": [
              { "$type": "set-state", "stateKey": "calculationTimedOut", "valueExpression": { "$type": "boolean", "value": true } },
              { "$type": "complete" }
            ]
          }
        },
        { "$type": "complete" }
      ]
    }
  },
  "tasks": []
}
28
docs/workflow/tutorials/03-decisions/README.md
Normal file
@@ -0,0 +1,28 @@

# Tutorial 3: Decisions

Branch workflow logic based on conditions — state values, payload answers, or complex expressions.

## Concepts Introduced

- `.WhenExpression()` — branch on any boolean expression
- `.WhenStateFlag()` — shorthand for checking a boolean state value
- `.WhenPayloadEquals()` — shorthand for checking a task completion payload value
- Nested decisions — decisions inside decisions for complex routing

## Decision Types

| Method | Use When |
|--------|----------|
| `WhenExpression` | Complex conditions (comparisons, boolean logic, function calls) |
| `WhenStateFlag` | Checking a boolean state key against true/false |
| `WhenPayloadEquals` | Checking a task completion answer (inside OnComplete) |

## Variants

- [C# Fluent DSL](csharp/)
- [Canonical JSON](json/)

## Next

[Tutorial 4: Human Tasks](../04-human-tasks/) — approve/reject patterns with OnComplete flows.
@@ -0,0 +1,72 @@

using System.Collections.Generic;

using WorkflowEngine.Abstractions;
using WorkflowEngine.Contracts;

namespace WorkflowEngine.Tutorials;

public sealed class PolicyRoutingRequest
{
    public long PolicyId { get; set; }
    public string AnnexType { get; set; } = string.Empty;
    public bool PolicyExistsOnIPAL { get; set; }
}

public sealed class PolicyRoutingWorkflow : IDeclarativeWorkflow<PolicyRoutingRequest>
{
    public string WorkflowName => "PolicyRouting";
    public string WorkflowVersion => "1.0.0";
    public string DisplayName => "Policy Routing Example";
    public IReadOnlyCollection<string> WorkflowRoles => ["DBA"];

    public WorkflowSpec<PolicyRoutingRequest> Spec { get; } = WorkflowSpec
        .For<PolicyRoutingRequest>()
        .InitializeState(WorkflowExpr.Object(
            WorkflowExpr.Prop("policyId", WorkflowExpr.Path("start.policyId")),
            WorkflowExpr.Prop("annexType", WorkflowExpr.Path("start.annexType")),
            WorkflowExpr.Prop("policyExistsOnIPAL",
                WorkflowExpr.Func("coalesce",
                    WorkflowExpr.Path("start.policyExistsOnIPAL"),
                    WorkflowExpr.Bool(true)))))
        .StartWith(BuildFlow)
        .Build();

    public IReadOnlyCollection<WorkflowTaskDescriptor> Tasks => Spec.TaskDescriptors;

    private static void BuildFlow(WorkflowFlowBuilder<PolicyRoutingRequest> flow)
    {
        // --- Example 1: State flag decision (boolean shorthand) ---
        flow.WhenStateFlag(
            "policyExistsOnIPAL",     // state key to check
            true,                     // expected value
            "Policy exists on IPAL?", // decision name (appears in diagram)
            whenTrue: ipal => ipal

                // --- Example 2: Expression decision ---
                .WhenExpression(
                    "Annex Type?",
                    WorkflowExpr.Eq(
                        WorkflowExpr.Func("upper", WorkflowExpr.Path("state.annexType")),
                        WorkflowExpr.String("BENEF")),
                    benefit => benefit
                        .Set("route", WorkflowExpr.String("BENEFIT_PROCESSING"))
                        .Complete(),

                    // --- Example 3: Nested decision ---
                    other => other.WhenExpression(
                        "Is Equipment?",
                        WorkflowExpr.Eq(
                            WorkflowExpr.Func("upper", WorkflowExpr.Path("state.annexType")),
                            WorkflowExpr.String("ADDEQ")),
                        equipment => equipment
                            .Set("route", WorkflowExpr.String("EQUIPMENT_PROCESSING"))
                            .Complete(),
                        cover => cover
                            .Set("route", WorkflowExpr.String("COVER_CHANGE"))
                            .Complete())),

            whenElse: notIpal => notIpal
                .Set("route", WorkflowExpr.String("INSIS_PROCESSING"))
                .Complete());
    }
}
@@ -0,0 +1,83 @@

{
  "schemaVersion": "serdica.workflow.definition/v1",
  "workflowName": "PolicyRouting",
  "workflowVersion": "1.0.0",
  "displayName": "Policy Routing Example",
  "workflowRoles": ["DBA"],
  "start": {
    "initializeStateExpression": {
      "$type": "object",
      "properties": [
        { "name": "policyId", "expression": { "$type": "path", "path": "start.policyId" } },
        { "name": "annexType", "expression": { "$type": "path", "path": "start.annexType" } },
        { "name": "policyExistsOnIPAL", "expression": {
          "$type": "function", "functionName": "coalesce",
          "arguments": [
            { "$type": "path", "path": "start.policyExistsOnIPAL" },
            { "$type": "boolean", "value": true }
          ]
        }}
      ]
    },
    "sequence": {
      "steps": [
        {
          "$type": "decision",
          "decisionName": "Policy exists on IPAL?",
          "conditionExpression": { "$type": "path", "path": "state.policyExistsOnIPAL" },
          "whenTrue": {
            "steps": [
              {
                "$type": "decision",
                "decisionName": "Annex Type?",
                "conditionExpression": {
                  "$type": "binary", "operator": "eq",
                  "left": { "$type": "function", "functionName": "upper", "arguments": [{ "$type": "path", "path": "state.annexType" }] },
                  "right": { "$type": "string", "value": "BENEF" }
                },
                "whenTrue": {
                  "steps": [
                    { "$type": "set-state", "stateKey": "route", "valueExpression": { "$type": "string", "value": "BENEFIT_PROCESSING" } },
                    { "$type": "complete" }
                  ]
                },
                "whenElse": {
                  "steps": [
                    {
                      "$type": "decision",
                      "decisionName": "Is Equipment?",
                      "conditionExpression": {
                        "$type": "binary", "operator": "eq",
                        "left": { "$type": "function", "functionName": "upper", "arguments": [{ "$type": "path", "path": "state.annexType" }] },
                        "right": { "$type": "string", "value": "ADDEQ" }
                      },
                      "whenTrue": {
                        "steps": [
                          { "$type": "set-state", "stateKey": "route", "valueExpression": { "$type": "string", "value": "EQUIPMENT_PROCESSING" } },
                          { "$type": "complete" }
                        ]
                      },
                      "whenElse": {
                        "steps": [
                          { "$type": "set-state", "stateKey": "route", "valueExpression": { "$type": "string", "value": "COVER_CHANGE" } },
                          { "$type": "complete" }
                        ]
                      }
                    }
                  ]
                }
              }
            ]
          },
          "whenElse": {
            "steps": [
              { "$type": "set-state", "stateKey": "route", "valueExpression": { "$type": "string", "value": "INSIS_PROCESSING" } },
              { "$type": "complete" }
            ]
          }
        }
      ]
    }
  },
  "tasks": []
}
34
docs/workflow/tutorials/04-human-tasks/README.md
Normal file
@@ -0,0 +1,34 @@

# Tutorial 4: Human Tasks with OnComplete Flows

The approve/reject pattern — the most common human task flow in insurance workflows.

## Concepts Introduced

- `WorkflowHumanTask.For<T>()` — define a task with name, type, route, and roles
- `.WithPayload()` — data sent to the UI when the task is displayed
- `.WithTimeout(seconds)` — optional deadline for the task
- `.WithRoles()` — restrict which roles can interact with this task
- `.OnComplete(flow => ...)` — sequence executed after user completes the task
- `.ActivateTask()` — pause workflow and wait for user action
- `.AddTask()` — register a task in the workflow spec (separate from activation)
- Re-activation — send the user back to the same task on validation failure

## Approve/Reject Pattern

1. Workflow starts, runs some service tasks
2. `.ActivateTask("Approve")` — workflow pauses
3. User sees the task in their inbox, assigns it, submits an answer
4. `.OnComplete` checks `payload.answer`:
   - `"approve"` — run confirmation operations, convert to policy
   - `"reject"` — cancel the application
5. If operations fail, re-activate the same task for correction
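
The completion payload submitted in step 3 might look like this. The shape is inferred from the `payload.answer` path used in the pattern above; the `comment` field is purely illustrative:

```json
{
  "answer": "approve",
  "comment": "Premium verified against tariff."
}
```

Inside `OnComplete`, the flow reads it with `WorkflowExpr.Path("payload.answer")` or the `WhenPayloadEquals` shorthand.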

## Variants

- [C# Fluent DSL](csharp/)
- [Canonical JSON](json/)

## Next

[Tutorial 5: Sub-Workflows](../05-sub-workflows/) — inline vs fire-and-forget child workflows.
@@ -0,0 +1,101 @@
|
||||
using System.Collections.Generic;
using System.Text.Json;

using WorkflowEngine.Abstractions;
using WorkflowEngine.Contracts;

namespace WorkflowEngine.Tutorials;

public sealed class ApprovalRequest
{
    public long PolicyId { get; set; }
    public long AnnexId { get; set; }
}

public sealed class ApprovalWorkflow : IDeclarativeWorkflow<ApprovalRequest>
{
    public string WorkflowName => "ApprovalExample";
    public string WorkflowVersion => "1.0.0";
    public string DisplayName => "Approval Example";
    public IReadOnlyCollection<string> WorkflowRoles => ["DBA", "UR_UNDERWRITER"];

    public WorkflowSpec<ApprovalRequest> Spec { get; } = WorkflowSpec
        .For<ApprovalRequest>()
        .InitializeState(WorkflowExpr.Object(
            WorkflowExpr.Prop("policyId", WorkflowExpr.Path("start.policyId")),
            WorkflowExpr.Prop("annexId", WorkflowExpr.Path("start.annexId"))))

        // Register the task definition (separate from activation).
        .AddTask(approveTask)

        // Start flow: validate, then activate the approval task.
        .StartWith(flow => flow
            .Call("Validate",
                Address.LegacyRabbit("pas_policy_validate"),
                WorkflowExpr.Object(
                    WorkflowExpr.Prop("policyId", WorkflowExpr.Path("state.policyId"))),
                WorkflowHandledBranchAction.Complete,
                WorkflowHandledBranchAction.Complete)
            .ActivateTask("Approve Policy")) // pauses here
        .Build();

    public IReadOnlyCollection<WorkflowTaskDescriptor> Tasks => Spec.TaskDescriptors;

    // Define the human task with roles, payload, optional deadline, and OnComplete flow.
    private static readonly WorkflowHumanTaskDefinition<ApprovalRequest> approveTask =
        WorkflowHumanTask.For<ApprovalRequest>(
            "Approve Policy",              // task name
            "PolicyApproval",              // task type (UI component)
            "business/policies",           // route
            taskRoles: ["UR_UNDERWRITER"]) // only underwriters
        .WithPayload(WorkflowExpr.Object(
            WorkflowExpr.Prop("policyId", WorkflowExpr.Path("state.policyId")),
            WorkflowExpr.Prop("annexId", WorkflowExpr.Path("state.annexId"))))
        .WithTimeout(86400) // 24-hour deadline (optional)
        .OnComplete(BuildApprovalFlow);

    private static void BuildApprovalFlow(WorkflowFlowBuilder<ApprovalRequest> flow)
    {
        flow
            // Store the user's answer in state for auditability.
            .Set("answer", WorkflowExpr.Path("payload.answer"))

            // Branch on the answer.
            .WhenPayloadEquals("answer", "reject", "Rejected?",
                rejected => rejected
                    .Call("Cancel Application",
                        Address.LegacyRabbit("pas_annexprocessing_cancelaplorqt"),
                        WorkflowExpr.Object(
                            WorkflowExpr.Prop("policyId", WorkflowExpr.Path("state.policyId"))),
                        WorkflowHandledBranchAction.Complete,
                        WorkflowHandledBranchAction.Complete)
                    .Complete(),

                approved => approved
                    .Call<object>("Perform Operations",
                        Address.LegacyRabbit("pas_operations_perform",
                            SerdicaLegacyRabbitMode.MicroserviceConsumer),
                        WorkflowExpr.Object(
                            WorkflowExpr.Prop("policyId", WorkflowExpr.Path("state.policyId")),
                            WorkflowExpr.Prop("stages", WorkflowExpr.Array(
                                WorkflowExpr.String("UNDERWRITING"),
                                WorkflowExpr.String("CONFIRMATION")))),
                        WorkflowHandledBranchAction.Complete,
                        WorkflowHandledBranchAction.Complete,
                        resultKey: "operations")
                    .Set("passed", WorkflowExpr.Path("result.operations.passed"))

                    .WhenStateFlag("passed", true, "Operations Passed?",
                        passed => passed
                            .Call("Convert To Policy",
                                Address.LegacyRabbit("pas_polreg_convertapltopol"),
                                WorkflowExpr.Object(
                                    WorkflowExpr.Prop("policyId", WorkflowExpr.Path("state.policyId"))),
                                WorkflowHandledBranchAction.Complete,
                                WorkflowHandledBranchAction.Complete)
                            .Complete(),

                        // Operations failed: re-open the same task for the user to fix and retry.
                        failed => failed.ActivateTask("Approve Policy")));
    }
}

@@ -0,0 +1,144 @@
{
  "schemaVersion": "serdica.workflow.definition/v1",
  "workflowName": "ApprovalExample",
  "workflowVersion": "1.0.0",
  "displayName": "Approval Example",
  "workflowRoles": ["DBA", "UR_UNDERWRITER"],
  "start": {
    "initializeStateExpression": {
      "$type": "object",
      "properties": [
        { "name": "policyId", "expression": { "$type": "path", "path": "start.policyId" } },
        { "name": "annexId", "expression": { "$type": "path", "path": "start.annexId" } }
      ]
    },
    "sequence": {
      "steps": [
        {
          "$type": "call-transport",
          "stepName": "Validate",
          "invocation": {
            "address": { "$type": "legacy-rabbit", "command": "pas_policy_validate", "mode": "Envelope" },
            "payloadExpression": {
              "$type": "object",
              "properties": [
                { "name": "policyId", "expression": { "$type": "path", "path": "state.policyId" } }
              ]
            }
          },
          "whenFailure": { "steps": [{ "$type": "complete" }] },
          "whenTimeout": { "steps": [{ "$type": "complete" }] }
        },
        {
          "$type": "activate-task",
          "taskName": "Approve Policy",
          "timeoutSeconds": 86400
        }
      ]
    }
  },
  "tasks": [
    {
      "taskName": "Approve Policy",
      "taskType": "PolicyApproval",
      "routeExpression": { "$type": "string", "value": "business/policies" },
      "taskRoles": ["UR_UNDERWRITER"],
      "payloadExpression": {
        "$type": "object",
        "properties": [
          { "name": "policyId", "expression": { "$type": "path", "path": "state.policyId" } },
          { "name": "annexId", "expression": { "$type": "path", "path": "state.annexId" } }
        ]
      },
      "onCompleteSequence": {
        "steps": [
          { "$type": "set-state", "stateKey": "answer", "valueExpression": { "$type": "path", "path": "payload.answer" } },
          {
            "$type": "decision",
            "decisionName": "Rejected?",
            "conditionExpression": {
              "$type": "binary", "operator": "eq",
              "left": { "$type": "path", "path": "payload.answer" },
              "right": { "$type": "string", "value": "reject" }
            },
            "whenTrue": {
              "steps": [
                {
                  "$type": "call-transport",
                  "stepName": "Cancel Application",
                  "invocation": {
                    "address": { "$type": "legacy-rabbit", "command": "pas_annexprocessing_cancelaplorqt", "mode": "Envelope" },
                    "payloadExpression": {
                      "$type": "object",
                      "properties": [
                        { "name": "policyId", "expression": { "$type": "path", "path": "state.policyId" } }
                      ]
                    }
                  },
                  "whenFailure": { "steps": [{ "$type": "complete" }] },
                  "whenTimeout": { "steps": [{ "$type": "complete" }] }
                },
                { "$type": "complete" }
              ]
            },
            "whenElse": {
              "steps": [
                {
                  "$type": "call-transport",
                  "stepName": "Perform Operations",
                  "resultKey": "operations",
                  "invocation": {
                    "address": { "$type": "legacy-rabbit", "command": "pas_operations_perform", "mode": "MicroserviceConsumer" },
                    "payloadExpression": {
                      "$type": "object",
                      "properties": [
                        { "name": "policyId", "expression": { "$type": "path", "path": "state.policyId" } },
                        { "name": "stages", "expression": { "$type": "array", "items": [
                          { "$type": "string", "value": "UNDERWRITING" },
                          { "$type": "string", "value": "CONFIRMATION" }
                        ]}}
                      ]
                    }
                  },
                  "whenFailure": { "steps": [{ "$type": "complete" }] },
                  "whenTimeout": { "steps": [{ "$type": "complete" }] }
                },
                { "$type": "set-state", "stateKey": "passed", "valueExpression": { "$type": "path", "path": "result.operations.passed" } },
                {
                  "$type": "decision",
                  "decisionName": "Operations Passed?",
                  "conditionExpression": { "$type": "path", "path": "state.passed" },
                  "whenTrue": {
                    "steps": [
                      {
                        "$type": "call-transport",
                        "stepName": "Convert To Policy",
                        "invocation": {
                          "address": { "$type": "legacy-rabbit", "command": "pas_polreg_convertapltopol", "mode": "Envelope" },
                          "payloadExpression": {
                            "$type": "object",
                            "properties": [
                              { "name": "policyId", "expression": { "$type": "path", "path": "state.policyId" } }
                            ]
                          }
                        },
                        "whenFailure": { "steps": [{ "$type": "complete" }] },
                        "whenTimeout": { "steps": [{ "$type": "complete" }] }
                      },
                      { "$type": "complete" }
                    ]
                  },
                  "whenElse": {
                    "steps": [
                      { "$type": "activate-task", "taskName": "Approve Policy" }
                    ]
                  }
                }
              ]
            }
          }
        ]
      }
    }
  ]
}
docs/workflow/tutorials/05-sub-workflows/README.md (new file, 22 lines)
@@ -0,0 +1,22 @@
# Tutorial 5: Sub-Workflows & Continuations

Compose workflows by invoking child workflows — either inline (SubWorkflow) or fire-and-forget (ContinueWith).

## SubWorkflow vs ContinueWith

| Feature | `.SubWorkflow()` | `.ContinueWith()` |
|---------|-----------------|-------------------|
| Parent waits | Yes — resumes after child completes | No — parent completes immediately |
| State flows back | Yes — child state merges into parent | No — child is independent |
| Same instance | Yes — tasks appear under parent instance | No — new workflow instance |
| Use when | Steps must complete before parent continues | Fire-and-forget, scheduled work |
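
In canonical JSON the two composition styles map to two step types, `sub-workflow` and `continue-with-workflow`, which share the same `invocation` shape. A minimal sketch — field names are copied from the Canonical JSON variant of this tutorial:

```json
{
  "$type": "continue-with-workflow",
  "stepName": "Start Transfer Process",
  "invocation": {
    "workflowName": "TransferPolicy",
    "payloadExpression": { "$type": "path", "path": "state" }
  }
}
```

Swapping `"$type"` to `"sub-workflow"` turns the same declaration into an inline child that the parent waits on.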

## Variants

- [C# Fluent DSL](csharp/)
- [Canonical JSON](json/)

## Next

[Tutorial 6: Advanced Patterns](../06-advanced-patterns/) — Fork, Repeat, Timer, External Signal.
@@ -0,0 +1,61 @@
using System.Collections.Generic;

using WorkflowEngine.Abstractions;
using WorkflowEngine.Contracts;

namespace WorkflowEngine.Tutorials;

public sealed class ParentWorkflow : IDeclarativeWorkflow<PolicyChangeWorkflowRequest>
{
    public string WorkflowName => "ParentWorkflow";
    public string WorkflowVersion => "1.0.0";
    public string DisplayName => "Parent Workflow Example";
    public IReadOnlyCollection<string> WorkflowRoles => [];

    public WorkflowSpec<PolicyChangeWorkflowRequest> Spec { get; } = WorkflowSpec
        .For<PolicyChangeWorkflowRequest>()
        .InitializeState(WorkflowExpr.Object(
            WorkflowExpr.Prop("policyId", WorkflowExpr.Path("start.policyId"))))
        .StartWith(BuildFlow)
        .Build();

    public IReadOnlyCollection<WorkflowTaskDescriptor> Tasks => Spec.TaskDescriptors;

    private static void BuildFlow(WorkflowFlowBuilder<PolicyChangeWorkflowRequest> flow)
    {
        flow
            .Call("Open For Change",
                Address.LegacyRabbit("pas_annexprocessing_alterpolicy"),
                WorkflowExpr.Object(
                    WorkflowExpr.Prop("policyId", WorkflowExpr.Path("state.policyId"))),
                WorkflowHandledBranchAction.Complete,
                WorkflowHandledBranchAction.Complete)

            // --- SubWorkflow: inline execution, parent waits ---
            // The child workflow runs within this execution.
            // Its tasks appear under the parent instance.
            // State from the child merges back into the parent after completion.
            .SubWorkflow(
                "Review Policy Changes",
                new WorkflowWorkflowInvocationDeclaration
                {
                    WorkflowName = "ReviewPolicyOpenForChange",
                    PayloadExpression = WorkflowExpr.Object(
                        WorkflowExpr.Prop("policyId", WorkflowExpr.Path("state.policyId")),
                        WorkflowExpr.Prop("productCode", WorkflowExpr.Path("state.productCode"))),
                })
            // Execution resumes here after child completes.

            // --- ContinueWith: fire-and-forget ---
            // The parent workflow completes immediately.
            // A new independent workflow instance is created via the signal bus.
            .ContinueWith(
                "Start Transfer Process",
                new WorkflowWorkflowInvocationDeclaration
                {
                    WorkflowName = "TransferPolicy",
                    PayloadExpression = WorkflowExpr.Path("state"),
                });
        // Parent is now complete. TransferPolicy runs independently.
    }
}

@@ -0,0 +1,57 @@
{
  "schemaVersion": "serdica.workflow.definition/v1",
  "workflowName": "ParentWorkflow",
  "workflowVersion": "1.0.0",
  "displayName": "Parent Workflow Example",
  "workflowRoles": [],
  "start": {
    "initializeStateExpression": {
      "$type": "object",
      "properties": [
        { "name": "policyId", "expression": { "$type": "path", "path": "start.policyId" } }
      ]
    },
    "sequence": {
      "steps": [
        {
          "$type": "call-transport",
          "stepName": "Open For Change",
          "invocation": {
            "address": { "$type": "legacy-rabbit", "command": "pas_annexprocessing_alterpolicy", "mode": "Envelope" },
            "payloadExpression": {
              "$type": "object",
              "properties": [
                { "name": "policyId", "expression": { "$type": "path", "path": "state.policyId" } }
              ]
            }
          },
          "whenFailure": { "steps": [{ "$type": "complete" }] },
          "whenTimeout": { "steps": [{ "$type": "complete" }] }
        },
        {
          "$type": "sub-workflow",
          "stepName": "Review Policy Changes",
          "invocation": {
            "workflowName": "ReviewPolicyOpenForChange",
            "payloadExpression": {
              "$type": "object",
              "properties": [
                { "name": "policyId", "expression": { "$type": "path", "path": "state.policyId" } },
                { "name": "productCode", "expression": { "$type": "path", "path": "state.productCode" } }
              ]
            }
          }
        },
        {
          "$type": "continue-with-workflow",
          "stepName": "Start Transfer Process",
          "invocation": {
            "workflowName": "TransferPolicy",
            "payloadExpression": { "$type": "path", "path": "state" }
          }
        }
      ]
    }
  },
  "tasks": []
}
docs/workflow/tutorials/06-advanced-patterns/README.md (new file, 22 lines)
@@ -0,0 +1,22 @@
# Tutorial 6: Advanced Patterns

Fork (parallel branches), Repeat (retry loops), Timer (delays), and External Signal (wait for events).

## Patterns

| Pattern | Use When |
|---------|----------|
| **Fork** | Multiple independent operations that should run concurrently |
| **Repeat** | Retry a service call with backoff, poll until a condition is met |
| **Timer** | Delay between steps (backoff, scheduled processing) |
| **External Signal** | Wait for an external event (document upload, approval from another system) |
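
Each pattern has a dedicated canonical step type (`fork`, `repeat`, `timer`, `external-signal`). The timer step, for example, is declared as follows — a minimal sketch with the field shape used in the Canonical JSON variant, where the delay is a time-span string:

```json
{
  "$type": "timer",
  "stepName": "Backoff",
  "delayExpression": { "$type": "string", "value": "00:05:00" }
}
```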

## Variants

- [C# Fluent DSL](csharp/)
- [Canonical JSON](json/)

## Next

[Tutorial 7: Shared Helpers](../07-shared-helpers/) — organizing reusable workflow components.
@@ -0,0 +1,92 @@
using System.Collections.Generic;

using WorkflowEngine.Abstractions;
using WorkflowEngine.Contracts;

namespace WorkflowEngine.Tutorials;

public sealed class AdvancedPatternsWorkflow : IDeclarativeWorkflow<PolicyChangeWorkflowRequest>
{
    public string WorkflowName => "AdvancedPatterns";
    public string WorkflowVersion => "1.0.0";
    public string DisplayName => "Advanced Patterns Example";
    public IReadOnlyCollection<string> WorkflowRoles => [];

    public WorkflowSpec<PolicyChangeWorkflowRequest> Spec { get; } = WorkflowSpec
        .For<PolicyChangeWorkflowRequest>()
        .InitializeState(WorkflowExpr.Object(
            WorkflowExpr.Prop("policyId", WorkflowExpr.Path("start.policyId")),
            WorkflowExpr.Prop("retryAttempt", WorkflowExpr.Number(0)),
            WorkflowExpr.Prop("integrationFailed", WorkflowExpr.Bool(false))))
        .StartWith(BuildFlow)
        .Build();

    public IReadOnlyCollection<WorkflowTaskDescriptor> Tasks => Spec.TaskDescriptors;

    private static void BuildFlow(WorkflowFlowBuilder<PolicyChangeWorkflowRequest> flow)
    {
        flow
            // ═══════════════════════════════════════════════
            // FORK: parallel branches
            // ═══════════════════════════════════════════════
            // Both branches run concurrently. Workflow resumes after all complete.
            .Fork("Process Documents and Notify",
                documents => documents
                    .Call("Generate PDF",
                        Address.LegacyRabbit("pas_pdf_generate"),
                        WorkflowExpr.Object(
                            WorkflowExpr.Prop("policyId", WorkflowExpr.Path("state.policyId"))),
                        WorkflowHandledBranchAction.Complete,
                        WorkflowHandledBranchAction.Complete),
                notification => notification
                    .Call("Send Email",
                        Address.LegacyRabbit("notifications_send_email",
                            SerdicaLegacyRabbitMode.MicroserviceConsumer),
                        WorkflowExpr.Object(
                            WorkflowExpr.Prop("to", WorkflowExpr.String("agent@company.com")),
                            WorkflowExpr.Prop("subject", WorkflowExpr.String("Policy processed"))),
                        WorkflowHandledBranchAction.Complete,
                        WorkflowHandledBranchAction.Complete))

            // ═══════════════════════════════════════════════
            // REPEAT: retry loop with backoff
            // ═══════════════════════════════════════════════
            // Retries up to 3 times while integrationFailed is true.
            .Repeat(
                "Retry Integration",
                WorkflowExpr.Number(3),  // max iterations
                "retryAttempt",          // counter state key
                WorkflowExpr.Or(         // continue while:
                    WorkflowExpr.Eq(     // first attempt (counter == 0)
                        WorkflowExpr.Path("state.retryAttempt"),
                        WorkflowExpr.Number(0)),
                    WorkflowExpr.Path("state.integrationFailed")), // or previous attempt failed
                body => body
                    .Set("integrationFailed", WorkflowExpr.Bool(false))
                    .Call("Transfer Policy",
                        Address.Http("integration", "/api/transfer", "POST"),
                        WorkflowExpr.Object(
                            WorkflowExpr.Prop("policyId", WorkflowExpr.Path("state.policyId"))),
                        whenFailure: fail => fail
                            .Set("integrationFailed", WorkflowExpr.Bool(true))
                            // TIMER: wait before retrying
                            .Wait("Backoff", WorkflowExpr.String("00:05:00")),
                        whenTimeout: timeout => timeout
                            .Set("integrationFailed", WorkflowExpr.Bool(true))))

            // ═══════════════════════════════════════════════
            // EXTERNAL SIGNAL: wait for event
            // ═══════════════════════════════════════════════
            // Workflow pauses until an external system raises the named signal.
            .WaitForSignal(
                "Wait for Document Upload",
                signalName: "documents-ready",
                resultKey: "uploadedDocs")

            // Use the signal payload in subsequent steps.
            .Set("documentCount",
                WorkflowExpr.Func("length",
                    WorkflowExpr.Path("result.uploadedDocs.fileIds")))
            .Complete();
    }
}
@@ -0,0 +1,127 @@
{
  "schemaVersion": "serdica.workflow.definition/v1",
  "workflowName": "AdvancedPatterns",
  "workflowVersion": "1.0.0",
  "displayName": "Advanced Patterns Example",
  "workflowRoles": [],
  "start": {
    "initializeStateExpression": {
      "$type": "object",
      "properties": [
        { "name": "policyId", "expression": { "$type": "path", "path": "start.policyId" } },
        { "name": "retryAttempt", "expression": { "$type": "number", "value": 0 } },
        { "name": "integrationFailed", "expression": { "$type": "boolean", "value": false } }
      ]
    },
    "sequence": {
      "steps": [
        {
          "$type": "fork",
          "stepName": "Process Documents and Notify",
          "branches": [
            {
              "steps": [
                {
                  "$type": "call-transport",
                  "stepName": "Generate PDF",
                  "invocation": {
                    "address": { "$type": "legacy-rabbit", "command": "pas_pdf_generate", "mode": "Envelope" },
                    "payloadExpression": {
                      "$type": "object",
                      "properties": [
                        { "name": "policyId", "expression": { "$type": "path", "path": "state.policyId" } }
                      ]
                    }
                  },
                  "whenFailure": { "steps": [{ "$type": "complete" }] },
                  "whenTimeout": { "steps": [{ "$type": "complete" }] }
                }
              ]
            },
            {
              "steps": [
                {
                  "$type": "call-transport",
                  "stepName": "Send Email",
                  "invocation": {
                    "address": { "$type": "legacy-rabbit", "command": "notifications_send_email", "mode": "MicroserviceConsumer" },
                    "payloadExpression": {
                      "$type": "object",
                      "properties": [
                        { "name": "to", "expression": { "$type": "string", "value": "agent@company.com" } },
                        { "name": "subject", "expression": { "$type": "string", "value": "Policy processed" } }
                      ]
                    }
                  },
                  "whenFailure": { "steps": [{ "$type": "complete" }] },
                  "whenTimeout": { "steps": [{ "$type": "complete" }] }
                }
              ]
            }
          ]
        },
        {
          "$type": "repeat",
          "stepName": "Retry Integration",
          "maxIterationsExpression": { "$type": "number", "value": 3 },
          "iterationStateKey": "retryAttempt",
          "continueWhileExpression": {
            "$type": "binary", "operator": "or",
            "left": {
              "$type": "binary", "operator": "eq",
              "left": { "$type": "path", "path": "state.retryAttempt" },
              "right": { "$type": "number", "value": 0 }
            },
            "right": { "$type": "path", "path": "state.integrationFailed" }
          },
          "body": {
            "steps": [
              { "$type": "set-state", "stateKey": "integrationFailed", "valueExpression": { "$type": "boolean", "value": false } },
              {
                "$type": "call-transport",
                "stepName": "Transfer Policy",
                "invocation": {
                  "address": { "$type": "http", "target": "integration", "path": "/api/transfer", "method": "POST" },
                  "payloadExpression": {
                    "$type": "object",
                    "properties": [
                      { "name": "policyId", "expression": { "$type": "path", "path": "state.policyId" } }
                    ]
                  }
                },
                "whenFailure": {
                  "steps": [
                    { "$type": "set-state", "stateKey": "integrationFailed", "valueExpression": { "$type": "boolean", "value": true } },
                    { "$type": "timer", "stepName": "Backoff", "delayExpression": { "$type": "string", "value": "00:05:00" } }
                  ]
                },
                "whenTimeout": {
                  "steps": [
                    { "$type": "set-state", "stateKey": "integrationFailed", "valueExpression": { "$type": "boolean", "value": true } }
                  ]
                }
              }
            ]
          }
        },
        {
          "$type": "external-signal",
          "stepName": "Wait for Document Upload",
          "signalNameExpression": { "$type": "string", "value": "documents-ready" },
          "resultKey": "uploadedDocs"
        },
        {
          "$type": "set-state",
          "stateKey": "documentCount",
          "valueExpression": {
            "$type": "function",
            "functionName": "length",
            "arguments": [{ "$type": "path", "path": "result.uploadedDocs.fileIds" }]
          }
        },
        { "$type": "complete" }
      ]
    }
  },
  "tasks": []
}
docs/workflow/tutorials/07-shared-helpers/README.md (new file, 24 lines)
@@ -0,0 +1,24 @@
# Tutorial 7: Shared Support Helpers

When building many workflows for the same domain (e.g., 50+ policy change workflows), extract reusable components into a support helper class.

## What to Extract

| Component | Example |
|-----------|---------|
| **Address constants** | `LegacyRabbitAddress`, `HttpAddress` — centralized routing |
| **Workflow references** | `WorkflowReference` — for SubWorkflow/ContinueWith targets |
| **Payload builders** | Static methods returning `WorkflowExpressionDefinition` |
| **State initializers** | Base state + override pattern |
| **Flow extensions** | Extension methods on `WorkflowFlowBuilder<T>` for common sequences |

## C#-Only Tutorial

This tutorial has no JSON equivalent — it covers C# code organization patterns.

- [C# Example](csharp/)

## Next

[Tutorial 8: Expressions](../08-expressions/) — path navigation, functions, and operators.
@@ -0,0 +1,170 @@
using System;
using System.Collections.Generic;

using WorkflowEngine.Abstractions;
using WorkflowEngine.Contracts;

namespace WorkflowEngine.Tutorials;

/// <summary>
/// Shared support helper for policy change workflows.
/// Centralizes addresses, payload builders, and reusable flow patterns.
/// </summary>
internal static class PolicyWorkflowSupport
{
    // ═══════════════════════════════════════════════════════════
    // ADDRESS REGISTRY
    // Centralize all service routing in one place.
    // ═══════════════════════════════════════════════════════════

    internal static readonly LegacyRabbitAddress ValidatePolicyAddress =
        new("pas_policy_validate");

    internal static readonly LegacyRabbitAddress AlterPolicyAddress =
        new("pas_annexprocessing_alterpolicy");

    internal static readonly LegacyRabbitAddress CalculatePremiumAddress =
        new("pas_premium_calculate_for_object",
            SerdicaLegacyRabbitMode.MicroserviceConsumer);

    internal static readonly LegacyRabbitAddress GetAnnexDescAddress =
        new("pas_polannexes_get");

    internal static readonly LegacyRabbitAddress NotificationEmailAddress =
        new("notifications_send_email",
            SerdicaLegacyRabbitMode.MicroserviceConsumer);

    // ═══════════════════════════════════════════════════════════
    // WORKFLOW REFERENCES
    // For SubWorkflow and ContinueWith invocations.
    // ═══════════════════════════════════════════════════════════

    internal static readonly WorkflowReference ReviewPolicyReference =
        new("ReviewPolicyOpenForChange");

    internal static readonly WorkflowReference TransferPolicyReference =
        new("TransferPolicy");

    // ═══════════════════════════════════════════════════════════
    // STATE INITIALIZATION
    // Base state + override pattern for workflow families.
    // ═══════════════════════════════════════════════════════════

    /// <summary>
    /// Builds a state initialization expression with common policy fields
    /// and optional per-workflow overrides.
    /// </summary>
    internal static WorkflowExpressionDefinition BuildInitializeState(
        params WorkflowNamedExpressionDefinition[] overrides)
    {
        var properties = new List<WorkflowNamedExpressionDefinition>
        {
            WorkflowExpr.Prop("srPolicyId", WorkflowExpr.Path("start.srPolicyId")),
            WorkflowExpr.Prop("srAnnexId", WorkflowExpr.Path("start.srAnnexId")),
            WorkflowExpr.Prop("srCustId", WorkflowExpr.Path("start.srCustId")),
            WorkflowExpr.Prop("annexType", WorkflowExpr.Path("start.annexType")),
            WorkflowExpr.Prop("beginDate", WorkflowExpr.Path("start.beginDate")),
            WorkflowExpr.Prop("endDate", WorkflowExpr.Path("start.endDate")),
        };

        // Apply overrides: replace existing or add new properties.
        foreach (var o in overrides)
        {
            var existing = properties.FindIndex(
                p => string.Equals(p.Name, o.Name, StringComparison.OrdinalIgnoreCase));
            if (existing >= 0)
            {
                properties[existing] = o;
            }
            else
            {
                properties.Add(o);
            }
        }

        return WorkflowExpr.Object(properties);
    }

    // ═══════════════════════════════════════════════════════════
    // PAYLOAD BUILDERS
    // Reusable expressions for common service call payloads.
    // ═══════════════════════════════════════════════════════════

    internal static WorkflowExpressionDefinition BuildAlterPolicyPayload()
    {
        return WorkflowExpr.Object(
            WorkflowExpr.Prop("srPolicyId", WorkflowExpr.Path("state.srPolicyId")),
            WorkflowExpr.Prop("beginDate", WorkflowExpr.Path("state.beginDate")),
            WorkflowExpr.Prop("endDate", WorkflowExpr.Path("state.endDate")),
            WorkflowExpr.Prop("annexType", WorkflowExpr.Path("state.annexType")));
    }

    internal static WorkflowExpressionDefinition BuildAnnexTypeEquals(string type)
    {
        return WorkflowExpr.Eq(
            WorkflowExpr.Func("upper", WorkflowExpr.Path("state.annexType")),
            WorkflowExpr.String(type));
    }

    internal static WorkflowExpressionDefinition BuildPolicyIdPayload()
    {
        return WorkflowExpr.Object(
            WorkflowExpr.Prop("srPolicyId", WorkflowExpr.Path("state.srPolicyId")));
    }

    // ═══════════════════════════════════════════════════════════
    // WORKFLOW INVOCATION BUILDERS
    // ═══════════════════════════════════════════════════════════

    internal static WorkflowWorkflowInvocationDeclaration BuildReviewInvocation()
    {
        return new WorkflowWorkflowInvocationDeclaration
        {
            WorkflowName = ReviewPolicyReference.WorkflowName,
            PayloadExpression = WorkflowExpr.Path("state"),
        };
    }
}

/// <summary>
/// Extension methods for common flow patterns.
/// Used across multiple workflows for DRY step sequences.
/// </summary>
internal static class PolicyWorkflowFlowExtensions
{
    /// <summary>
    /// Applies product info from a service call result into workflow state.
    /// </summary>
    internal static WorkflowFlowBuilder<T> ApplyProductInfo<T>(
        this WorkflowFlowBuilder<T> flow,
        string resultKey = "productInfo")
        where T : class
    {
        return flow
            .SetIfHasValue("productCode",
                WorkflowExpr.Func("upper",
                    WorkflowExpr.Path($"result.{resultKey}.productCode")))
            .SetIfHasValue("lob",
                WorkflowExpr.Func("upper",
                    WorkflowExpr.Path($"result.{resultKey}.lob")))
            .SetIfHasValue("contractType",
                WorkflowExpr.Path($"result.{resultKey}.contractType"));
    }

    /// <summary>
    /// Standard "load product info and apply" pattern.
    /// </summary>
    internal static WorkflowFlowBuilder<T> LoadAndApplyProductInfo<T>(
        this WorkflowFlowBuilder<T> flow)
        where T : class
    {
        return flow
            .Call<object>("Load Product Info",
                Address.LegacyRabbit("pas_get_policy_product_info"),
                PolicyWorkflowSupport.BuildPolicyIdPayload(),
                WorkflowHandledBranchAction.Complete,
                WorkflowHandledBranchAction.Complete,
                resultKey: "productInfo")
            .ApplyProductInfo();
    }
}
docs/workflow/tutorials/08-expressions/README.md (new file, 36 lines)
@@ -0,0 +1,36 @@
# Tutorial 8: Expressions

The expression system enables declarative logic that compiles to portable canonical JSON. All expressions are evaluable at runtime without recompilation.

## Path Navigation

| Prefix | Source | Example |
|--------|--------|---------|
| `start.*` | Start request fields | `start.policyId` |
| `state.*` | Mutable workflow state | `state.customerName` |
| `payload.*` | Task completion payload | `payload.answer` |
| `result.*` | Step result (by resultKey) | `result.productInfo.lob` |

## Built-in Functions

| Function | Description | Example |
|----------|-------------|---------|
| `coalesce` | First non-null | `coalesce(state.id, start.id, 0)` |
| `concat` | String join | `concat("Policy #", state.policyNo)` |
| `add` | Sum | `add(state.attempt, 1)` |
| `if` | Conditional | `if(state.isVip, "VIP", "Standard")` |
| `isNullOrWhiteSpace` | Null/empty check | `isNullOrWhiteSpace(state.name)` |
| `length` | String/array length | `length(state.items)` |
| `upper` | Uppercase | `upper(state.annexType)` |
| `first` | First array element | `first(state.objects)` |
| `mergeObjects` | Deep merge | `mergeObjects(state, payload)` |
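
In canonical JSON, function calls use the `function` expression type with a `functionName` and an `arguments` array; nesting expressions in `arguments` composes calls. A sketch of the `coalesce` row above, assuming it serializes the same way the `length` function does in the Tutorial 6 canonical JSON:

```json
{
  "$type": "function",
  "functionName": "coalesce",
  "arguments": [
    { "$type": "path", "path": "state.id" },
    { "$type": "path", "path": "start.id" },
    { "$type": "number", "value": 0 }
  ]
}
```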

## Variants

- [C# Expression Builder](csharp/)
- [JSON Expression Format](json/)

## Next

[Tutorial 9: Testing](../09-testing/) — unit test setup with recording transports.
@@ -0,0 +1,131 @@
using WorkflowEngine.Abstractions;

namespace WorkflowEngine.Tutorials;

/// <summary>
/// Expression builder examples showing all expression types.
/// These examples are not a runnable workflow — they demonstrate the WorkflowExpr API.
/// </summary>
internal static class ExpressionExamples
{
    // ═══════════════════════════════════════════════
    // LITERALS
    // ═══════════════════════════════════════════════

    static readonly WorkflowExpressionDefinition nullExpr = WorkflowExpr.Null();
    static readonly WorkflowExpressionDefinition stringExpr = WorkflowExpr.String("hello");
    static readonly WorkflowExpressionDefinition numberExpr = WorkflowExpr.Number(42);
    static readonly WorkflowExpressionDefinition boolExpr = WorkflowExpr.Bool(true);

    // ═══════════════════════════════════════════════
    // PATH NAVIGATION
    // ═══════════════════════════════════════════════

    // Read from start request
    static readonly WorkflowExpressionDefinition fromStart = WorkflowExpr.Path("start.policyId");

    // Read from workflow state
    static readonly WorkflowExpressionDefinition fromState = WorkflowExpr.Path("state.customerName");

    // Read from task completion payload
    static readonly WorkflowExpressionDefinition fromPayload = WorkflowExpr.Path("payload.answer");

    // Read from step result (requires resultKey on the Call step)
    static readonly WorkflowExpressionDefinition fromResult = WorkflowExpr.Path("result.productInfo.lob");

    // Nested path navigation
    static readonly WorkflowExpressionDefinition nested = WorkflowExpr.Path("state.entityData.address.city");

    // ═══════════════════════════════════════════════
    // OBJECT & ARRAY CONSTRUCTION
    // ═══════════════════════════════════════════════

    static readonly WorkflowExpressionDefinition obj = WorkflowExpr.Object(
        WorkflowExpr.Prop("policyId", WorkflowExpr.Path("state.policyId")),
        WorkflowExpr.Prop("status", WorkflowExpr.String("ACTIVE")),
        WorkflowExpr.Prop("tags", WorkflowExpr.Array(
            WorkflowExpr.String("motor"),
            WorkflowExpr.String("casco"))));

    // ═══════════════════════════════════════════════
    // COMPARISONS
    // ═══════════════════════════════════════════════

    static readonly WorkflowExpressionDefinition eq = WorkflowExpr.Eq(
        WorkflowExpr.Path("state.status"), WorkflowExpr.String("APPROVED"));
    static readonly WorkflowExpressionDefinition ne = WorkflowExpr.Ne(
        WorkflowExpr.Path("state.status"), WorkflowExpr.String("REJECTED"));
    static readonly WorkflowExpressionDefinition gt = WorkflowExpr.Gt(
        WorkflowExpr.Path("state.premium"), WorkflowExpr.Number(1000));

    // ═══════════════════════════════════════════════
    // BOOLEAN LOGIC
    // ═══════════════════════════════════════════════

    static readonly WorkflowExpressionDefinition not = WorkflowExpr.Not(
        WorkflowExpr.Path("state.isRejected"));
    static readonly WorkflowExpressionDefinition and = WorkflowExpr.And(
        WorkflowExpr.Path("state.isValid"),
        WorkflowExpr.Not(WorkflowExpr.Path("state.isRejected")));
    static readonly WorkflowExpressionDefinition or = WorkflowExpr.Or(
        WorkflowExpr.Eq(WorkflowExpr.Path("state.status"), WorkflowExpr.String("APPROVED")),
        WorkflowExpr.Eq(WorkflowExpr.Path("state.status"), WorkflowExpr.String("OVERRIDE")));

    // ═══════════════════════════════════════════════
    // FUNCTION CALLS
    // ═══════════════════════════════════════════════

    // Coalesce: first non-null value
    static readonly WorkflowExpressionDefinition coalesce = WorkflowExpr.Func("coalesce",
        WorkflowExpr.Path("state.customerId"),
        WorkflowExpr.Path("start.customerId"),
        WorkflowExpr.Number(0));

    // String concatenation
    static readonly WorkflowExpressionDefinition concat = WorkflowExpr.Func("concat",
        WorkflowExpr.String("Policy #"),
        WorkflowExpr.Path("state.policyNo"));

    // Arithmetic
    static readonly WorkflowExpressionDefinition increment = WorkflowExpr.Func("add",
        WorkflowExpr.Func("coalesce",
            WorkflowExpr.Path("state.attempt"), WorkflowExpr.Number(0)),
        WorkflowExpr.Number(1));

    // Conditional value
    static readonly WorkflowExpressionDefinition conditional = WorkflowExpr.Func("if",
        WorkflowExpr.Path("state.isVip"),
        WorkflowExpr.String("VIP"),
        WorkflowExpr.String("Standard"));

    // Uppercase
    static readonly WorkflowExpressionDefinition upper = WorkflowExpr.Func("upper",
        WorkflowExpr.Path("state.annexType"));

    // Null check
    static readonly WorkflowExpressionDefinition nullCheck = WorkflowExpr.Func("isNullOrWhiteSpace",
        WorkflowExpr.Path("state.integrationId"));

    // Array length
    static readonly WorkflowExpressionDefinition length = WorkflowExpr.Func("length",
        WorkflowExpr.Path("state.documents"));

    // ═══════════════════════════════════════════════
    // COMBINING EXPRESSIONS (real-world patterns)
    // ═══════════════════════════════════════════════

    // "Use integration customer ID if present, otherwise use lookup ID"
    static readonly WorkflowExpressionDefinition resolveCustomerId = WorkflowExpr.Func("if",
        WorkflowExpr.Not(
            WorkflowExpr.Func("isNullOrWhiteSpace",
                WorkflowExpr.Path("state.integrationCustomerId"))),
        WorkflowExpr.Path("state.integrationCustomerId"),
        WorkflowExpr.Path("state.lookupCustomerId"));

    // "Should we retry? (first attempt or previous failed, and not timed out)"
    static readonly WorkflowExpressionDefinition shouldRetry = WorkflowExpr.Or(
        WorkflowExpr.Eq(WorkflowExpr.Path("state.retryAttempt"), WorkflowExpr.Number(0)),
        WorkflowExpr.And(
            WorkflowExpr.Not(WorkflowExpr.Path("state.timedOut")),
            WorkflowExpr.Path("state.integrationFailed")));
}
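As a sanity check on the two combined patterns above (`resolveCustomerId` and `shouldRetry`), here is a hedged Python rendering of their intended truth semantics. The function names and the plain `state` dict are illustrative stand-ins, not engine API:

```python
# Illustrative Python rendering of the two real-world patterns.
def resolve_customer_id(state):
    # if(not(isNullOrWhiteSpace(state.integrationCustomerId)),
    #    state.integrationCustomerId, state.lookupCustomerId)
    integration_id = state.get("integrationCustomerId")
    if integration_id is not None and str(integration_id).strip():
        return integration_id
    return state.get("lookupCustomerId")

def should_retry(state):
    # or(eq(state.retryAttempt, 0),
    #    and(not(state.timedOut), state.integrationFailed))
    return state.get("retryAttempt") == 0 or (
        not state.get("timedOut") and bool(state.get("integrationFailed")))

assert resolve_customer_id({"integrationCustomerId": "C-9"}) == "C-9"
assert resolve_customer_id({"integrationCustomerId": "  ",
                            "lookupCustomerId": "L-1"}) == "L-1"
assert should_retry({"retryAttempt": 0})
assert should_retry({"retryAttempt": 2, "timedOut": False,
                     "integrationFailed": True})
assert not should_retry({"retryAttempt": 2, "timedOut": True,
                         "integrationFailed": True})
```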
@@ -0,0 +1,166 @@
{
  "_comment": "Expression examples in canonical JSON format. Each key shows a different expression pattern.",

  "literals": {
    "null": { "$type": "null" },
    "string": { "$type": "string", "value": "hello" },
    "number": { "$type": "number", "value": 42 },
    "boolean": { "$type": "boolean", "value": true }
  },

  "paths": {
    "fromStartRequest": { "$type": "path", "path": "start.policyId" },
    "fromState": { "$type": "path", "path": "state.customerName" },
    "fromPayload": { "$type": "path", "path": "payload.answer" },
    "fromResult": { "$type": "path", "path": "result.productInfo.lob" },
    "nestedPath": { "$type": "path", "path": "state.entityData.address.city" }
  },

  "objectConstruction": {
    "$type": "object",
    "properties": [
      { "name": "policyId", "expression": { "$type": "path", "path": "state.policyId" } },
      { "name": "status", "expression": { "$type": "string", "value": "ACTIVE" } },
      { "name": "tags", "expression": {
        "$type": "array",
        "items": [
          { "$type": "string", "value": "motor" },
          { "$type": "string", "value": "casco" }
        ]
      }}
    ]
  },

  "comparisons": {
    "equals": {
      "$type": "binary", "operator": "eq",
      "left": { "$type": "path", "path": "state.status" },
      "right": { "$type": "string", "value": "APPROVED" }
    },
    "notEquals": {
      "$type": "binary", "operator": "ne",
      "left": { "$type": "path", "path": "state.status" },
      "right": { "$type": "string", "value": "REJECTED" }
    },
    "greaterThan": {
      "$type": "binary", "operator": "gt",
      "left": { "$type": "path", "path": "state.premium" },
      "right": { "$type": "number", "value": 1000 }
    }
  },

  "booleanLogic": {
    "not": {
      "$type": "unary", "operator": "not",
      "operand": { "$type": "path", "path": "state.isRejected" }
    },
    "and": {
      "$type": "binary", "operator": "and",
      "left": { "$type": "path", "path": "state.isValid" },
      "right": {
        "$type": "unary", "operator": "not",
        "operand": { "$type": "path", "path": "state.isRejected" }
      }
    },
    "or": {
      "$type": "binary", "operator": "or",
      "left": {
        "$type": "binary", "operator": "eq",
        "left": { "$type": "path", "path": "state.status" },
        "right": { "$type": "string", "value": "APPROVED" }
      },
      "right": {
        "$type": "binary", "operator": "eq",
        "left": { "$type": "path", "path": "state.status" },
        "right": { "$type": "string", "value": "OVERRIDE" }
      }
    }
  },

  "functions": {
    "coalesce": {
      "$type": "function", "functionName": "coalesce",
      "arguments": [
        { "$type": "path", "path": "state.customerId" },
        { "$type": "path", "path": "start.customerId" },
        { "$type": "number", "value": 0 }
      ]
    },
    "concat": {
      "$type": "function", "functionName": "concat",
      "arguments": [
        { "$type": "string", "value": "Policy #" },
        { "$type": "path", "path": "state.policyNo" }
      ]
    },
    "increment": {
      "$type": "function", "functionName": "add",
      "arguments": [
        {
          "$type": "function", "functionName": "coalesce",
          "arguments": [
            { "$type": "path", "path": "state.attempt" },
            { "$type": "number", "value": 0 }
          ]
        },
        { "$type": "number", "value": 1 }
      ]
    },
    "conditional": {
      "$type": "function", "functionName": "if",
      "arguments": [
        { "$type": "path", "path": "state.isVip" },
        { "$type": "string", "value": "VIP" },
        { "$type": "string", "value": "Standard" }
      ]
    },
    "uppercase": {
      "$type": "function", "functionName": "upper",
      "arguments": [{ "$type": "path", "path": "state.annexType" }]
    },
    "nullCheck": {
      "$type": "function", "functionName": "isNullOrWhiteSpace",
      "arguments": [{ "$type": "path", "path": "state.integrationId" }]
    },
    "arrayLength": {
      "$type": "function", "functionName": "length",
      "arguments": [{ "$type": "path", "path": "state.documents" }]
    }
  },

  "realWorldPatterns": {
    "resolveCustomerId_comment": "Use integration customer ID if present, otherwise use lookup ID",
    "resolveCustomerId": {
      "$type": "function", "functionName": "if",
      "arguments": [
        {
          "$type": "unary", "operator": "not",
          "operand": {
            "$type": "function", "functionName": "isNullOrWhiteSpace",
            "arguments": [{ "$type": "path", "path": "state.integrationCustomerId" }]
          }
        },
        { "$type": "path", "path": "state.integrationCustomerId" },
        { "$type": "path", "path": "state.lookupCustomerId" }
      ]
    },

    "shouldRetry_comment": "First attempt or previous failed and not timed out",
    "shouldRetry": {
      "$type": "binary", "operator": "or",
      "left": {
        "$type": "binary", "operator": "eq",
        "left": { "$type": "path", "path": "state.retryAttempt" },
        "right": { "$type": "number", "value": 0 }
      },
      "right": {
        "$type": "binary", "operator": "and",
        "left": {
          "$type": "unary", "operator": "not",
          "operand": { "$type": "path", "path": "state.timedOut" }
        },
        "right": { "$type": "path", "path": "state.integrationFailed" }
      }
    }
  }
}
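To make the canonical node shapes above concrete, here is a minimal Python evaluator for just the `$type` variants used in this tutorial. It is a sketch under stated assumptions (eager argument evaluation, null-propagating paths); the real evaluator is the engine's C# expression runtime, which may differ in coercion and error handling:

```python
# Hypothetical mini-evaluator for the tutorial's canonical JSON nodes.
def evaluate(node, ctx):
    """Recursively evaluate a canonical-JSON expression node against ctx."""
    t = node["$type"]
    if t == "null":
        return None
    if t in ("string", "number", "boolean"):
        return node["value"]
    if t == "path":  # dotted walk through nested dicts; missing -> None
        cur = ctx
        for seg in node["path"].split("."):
            cur = cur.get(seg) if isinstance(cur, dict) else None
        return cur
    if t == "array":
        return [evaluate(item, ctx) for item in node["items"]]
    if t == "object":
        return {p["name"]: evaluate(p["expression"], ctx)
                for p in node["properties"]}
    if t == "unary" and node["operator"] == "not":
        return not evaluate(node["operand"], ctx)
    if t == "binary":
        op = node["operator"]
        if op == "and":  # short-circuit like the C# builder's And
            return bool(evaluate(node["left"], ctx)) and bool(evaluate(node["right"], ctx))
        if op == "or":
            return bool(evaluate(node["left"], ctx)) or bool(evaluate(node["right"], ctx))
        left, right = evaluate(node["left"], ctx), evaluate(node["right"], ctx)
        return {"eq": lambda: left == right,
                "ne": lambda: left != right,
                "gt": lambda: left > right}[op]()
    if t == "function":  # only the functions used in this file
        args = [evaluate(a, ctx) for a in node["arguments"]]
        fns = {
            "coalesce": lambda *a: next((x for x in a if x is not None), None),
            "concat": lambda *a: "".join(str(x) for x in a),
            "add": lambda a, b: a + b,
            "if": lambda cond, then, other: then if cond else other,
            "upper": lambda s: str(s).upper(),
            "length": lambda v: len(v),
            "isNullOrWhiteSpace": lambda s: s is None or not str(s).strip(),
        }
        return fns[node["functionName"]](*args)
    raise ValueError(f"unsupported node type: {t}")

equals = {"$type": "binary", "operator": "eq",
          "left": {"$type": "path", "path": "state.status"},
          "right": {"$type": "string", "value": "APPROVED"}}
assert evaluate(equals, {"state": {"status": "APPROVED"}}) is True
```

For example, the `concat` node from the `functions` section evaluates to `"Policy #POL-1"` against a state of `{"policyNo": "POL-1"}`.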
29
docs/workflow/tutorials/09-testing/README.md
Normal file
29
docs/workflow/tutorials/09-testing/README.md
Normal file
@@ -0,0 +1,29 @@
# Tutorial 9: Testing Your Workflow

Write unit tests for workflows using `RecordingSerdicaLegacyRabbitTransport` and `TechnicalStyleWorkflowTestHelpers`.

## Test Setup Pattern

1. Create a recording transport with pre-configured responses
2. Build a test service provider via `TechnicalStyleWorkflowTestHelpers.CreateServiceProvider`
3. Resolve `WorkflowRuntimeService` from DI
4. Call `StartWorkflowAsync` with a test payload
5. Assert: tasks created, transport calls made, state values correct
6. Optionally complete tasks and verify downstream behavior

## What to Test

| Scenario | Approach |
|----------|----------|
| Workflow starts correctly | Assert a single open task after start |
| Service calls made in order | `transport.Invocations.Select(x => x.Command).Should().Equal(...)` |
| Rejection flow | Complete task with `"answer": "reject"`, verify cancel call |
| Approval flow | Complete with `"answer": "approve"`, verify conversion calls |
| Operations failure re-opens task | Check the same task re-appears after operations return `passed: false` |
| Sub-workflow creates child tasks | Query tasks by child workflow name |
| Business reference set | `startResponse.BusinessReference.Key.Should().Be(...)` |

## C#-Only Tutorial

- [C# Test Examples](csharp/)
196
docs/workflow/tutorials/09-testing/csharp/WorkflowTests.cs
Normal file
196
docs/workflow/tutorials/09-testing/csharp/WorkflowTests.cs
Normal file
@@ -0,0 +1,196 @@
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

using WorkflowEngine.Abstractions;
using WorkflowEngine.Contracts;
using WorkflowEngine.Services;

using FluentAssertions;

using Microsoft.Extensions.DependencyInjection;

using NUnit.Framework;

namespace WorkflowEngine.Tutorials.Tests;

[TestFixture]
public class WorkflowTestExamples
{
    // ═══════════════════════════════════════════════
    // BASIC: Start workflow and verify task created
    // ═══════════════════════════════════════════════

    [Test]
    public async Task Workflow_WhenStarted_ShouldCreateOpenTask()
    {
        // 1. Configure fake transport responses
        var transport = new RecordingSerdicaLegacyRabbitTransport()
            .Respond("pas_policy_validate", new { valid = true })
            .Respond("pas_get_policy_product_info", new
            {
                productCode = "4704",
                lob = "MOT",
                contractType = "STANDARD",
            });

        // 2. Build test service provider
        using var provider = TechnicalStyleWorkflowTestHelpers.CreateServiceProvider(
            transport,
            WorkflowRuntimeProviderNames.SerdicaEngine);
        var runtimeService = provider.GetRequiredService<WorkflowRuntimeService>();

        // 3. Start the workflow
        var start = await runtimeService.StartWorkflowAsync(new StartWorkflowRequest
        {
            WorkflowName = "ApproveApplication",
            Payload = new Dictionary<string, object?>
            {
                ["srPolicyId"] = 12345L,
                ["srAnnexId"] = 67890L,
                ["srCustId"] = 11111L,
            },
        });

        // 4. Assert task was created
        var tasks = await runtimeService.GetTasksAsync(new WorkflowTasksGetRequest
        {
            WorkflowInstanceId = start.WorkflowInstanceId,
            Status = "Open",
        });
        tasks.Tasks.Should().ContainSingle();
        tasks.Tasks.Single().TaskName.Should().Be("Approve Application");
    }

    // ═══════════════════════════════════════════════
    // VERIFY: Service calls made in order
    // ═══════════════════════════════════════════════

    [Test]
    public async Task Workflow_WhenStarted_ShouldCallServicesInOrder()
    {
        var transport = new RecordingSerdicaLegacyRabbitTransport()
            .Respond("pas_annexprocessing_alterpolicy", new
            {
                srPolicyId = 1L,
                srAnnexId = 2L,
                previouslyOpened = false,
            })
            .Respond("pas_polclmparticipants_create", new { ok = true })
            .Respond("pas_premium_calculate_for_object", new { ok = true },
                SerdicaLegacyRabbitMode.MicroserviceConsumer)
            .Respond("pas_polannexes_get", new
            {
                shortDescription = "Test annex",
                policyNo = "POL-001",
            });

        using var provider = TechnicalStyleWorkflowTestHelpers.CreateServiceProvider(
            transport,
            WorkflowRuntimeProviderNames.SerdicaEngine);
        var runtimeService = provider.GetRequiredService<WorkflowRuntimeService>();

        await runtimeService.StartWorkflowAsync(new StartWorkflowRequest
        {
            WorkflowName = "AssistantAddAnnex",
            Payload = new Dictionary<string, object?>
            {
                ["srPolicyId"] = 1L,
                ["srAnnexId"] = 2L,
                ["policyExistsOnIPAL"] = true,
                ["annexPreviouslyOpened"] = false,
                ["annexType"] = "BENEF",
                ["entityData"] = new { srCustId = 3L },
            },
        });

        // Verify exact call sequence
        transport.Invocations.Select(x => x.Command)
            .Should().Equal(
                "pas_annexprocessing_alterpolicy",
                "pas_polclmparticipants_create",
                "pas_premium_calculate_for_object",
                "pas_polannexes_get");
    }

    // ═══════════════════════════════════════════════
    // TASK COMPLETION: Approve/reject flows
    // ═══════════════════════════════════════════════

    [Test]
    public async Task Workflow_WhenTaskCompleted_ShouldExecuteOnCompleteFlow()
    {
        var transport = new RecordingSerdicaLegacyRabbitTransport()
            .Respond("pas_operations_perform", new { passed = true, nextStep = "CONTINUE" },
                SerdicaLegacyRabbitMode.MicroserviceConsumer)
            .Respond("pas_polreg_convertapltopol", new { ok = true })
            .Respond("pas_annexprocessing_generatepolicyno", new { ok = true })
            .Respond("pas_get_policy_product_info", new
            {
                productCode = "4704",
                lob = "MOT",
                contractType = "STANDARD",
            });

        using var provider = TechnicalStyleWorkflowTestHelpers.CreateServiceProvider(
            transport,
            WorkflowRuntimeProviderNames.SerdicaEngine);
        var runtimeService = provider.GetRequiredService<WorkflowRuntimeService>();

        // Start workflow
        var start = await runtimeService.StartWorkflowAsync(new StartWorkflowRequest
        {
            WorkflowName = "ApproveApplication",
            Payload = new Dictionary<string, object?>
            {
                ["srPolicyId"] = 1L,
                ["srAnnexId"] = 2L,
                ["srCustId"] = 3L,
            },
        });

        // Get the open task
        var task = (await runtimeService.GetTasksAsync(new WorkflowTasksGetRequest
        {
            WorkflowInstanceId = start.WorkflowInstanceId,
            Status = "Open",
        })).Tasks.Single();

        // Complete with "approve"
        await runtimeService.CompleteTaskAsync(new WorkflowTaskCompleteRequest
        {
            WorkflowTaskId = task.WorkflowTaskId,
            ActorId = "test-user",
            ActorRoles = ["DBA"],
            Payload = new Dictionary<string, object?> { ["answer"] = "approve" },
        });

        // Verify operations and conversion were called
        transport.Invocations.Should().Contain(x => x.Command == "pas_operations_perform");
        transport.Invocations.Should().Contain(x => x.Command == "pas_polreg_convertapltopol");
    }

    // ═══════════════════════════════════════════════
    // RECORDING TRANSPORT: multiple responses
    // ═══════════════════════════════════════════════

    [Test]
    public void RecordingTransport_CanConfigureMultipleResponses()
    {
        var transport = new RecordingSerdicaLegacyRabbitTransport()
            // Default mode (Envelope)
            .Respond("command_a", new { result = "first" })
            // Specific mode
            .Respond("command_b", new { result = "second" },
                SerdicaLegacyRabbitMode.MicroserviceConsumer)
            // Same command, different responses (returned in order)
            .Respond("command_c", new { attempt = 1 })
            .Respond("command_c", new { attempt = 2 });

        // After workflow execution, inspect:
        // transport.Invocations — list of all calls made
        // transport.Invocations[0].Command — command name
        // transport.Invocations[0].Payload — request payload
        transport.Should().NotBeNull();
    }
}
32
docs/workflow/tutorials/README.md
Normal file
32
docs/workflow/tutorials/README.md
Normal file
@@ -0,0 +1,32 @@
# Workflow Declaration Tutorials

Step-by-step tutorials for building workflows with the Serdica Workflow Engine. Each tutorial is available in both **C# fluent DSL** and **canonical JSON** variants.

## Reference Documentation

- [Engine Reference Manual](../ENGINE.md) - Architecture, configuration, service surface, timeout model, signal system
- [Fluent DSL Syntax Guide](../workflow-fluent-syntax-guide.md) - Complete DSL method reference

## Tutorials

| # | Tutorial | C# | JSON | Topics |
|---|----------|----|------|--------|
| 01 | [Hello World](01-hello-world/) | [C#](01-hello-world/csharp/) | [JSON](01-hello-world/json/) | Minimal workflow, single task, state init |
| 02 | [Service Tasks](02-service-tasks/) | [C#](02-service-tasks/csharp/) | [JSON](02-service-tasks/json/) | Transport calls, addresses, failure/timeout handling |
| 03 | [Decisions](03-decisions/) | [C#](03-decisions/csharp/) | [JSON](03-decisions/json/) | WhenExpression, WhenStateFlag, nested branching |
| 04 | [Human Tasks](04-human-tasks/) | [C#](04-human-tasks/csharp/) | [JSON](04-human-tasks/json/) | Approve/reject, OnComplete, re-activation, deadlines |
| 05 | [Sub-Workflows](05-sub-workflows/) | [C#](05-sub-workflows/csharp/) | [JSON](05-sub-workflows/json/) | SubWorkflow vs ContinueWith, state flow |
| 06 | [Advanced Patterns](06-advanced-patterns/) | [C#](06-advanced-patterns/csharp/) | [JSON](06-advanced-patterns/json/) | Fork, Repeat, Timer, External Signal |
| 07 | [Shared Helpers](07-shared-helpers/) | [C#](07-shared-helpers/csharp/) | - | Address registries, payload builders, extensions |
| 08 | [Expressions](08-expressions/) | [C#](08-expressions/csharp/) | [JSON](08-expressions/json/) | Path navigation, functions, operators |
| 09 | [Testing](09-testing/) | [C#](09-testing/csharp/) | - | Recording transports, task completion, assertions |

## How to Read

Each tutorial folder contains:
- **`README.md`** - Explanation, concepts, and what to expect
- **`csharp/`** - C# fluent DSL examples
- **`json/`** - Equivalent canonical JSON definitions (where applicable)

Start with Tutorial 01 and progress sequentially. Tutorials 07 (Shared Helpers) and 09 (Testing) are C#-only since they cover code organization and test infrastructure.