01. Requirements And Principles
1. Product Goal
Build a Serdica-owned workflow engine that can run the current Bulstrad workflow corpus without Elsa while preserving the existing service-level workflow product:
- workflow start
- task inbox and task lifecycle
- business-reference-based lookup
- runtime state inspection
- workflow diagrams
- canonical schema and canonical validation exposure
- workflow retention and hosted jobs
The engine must execute the same business behavior currently expressed in the declarative workflow DSL and canonical workflow definition model.
2. Functional Requirements
2.1 Workflow Definition Handling
The engine must:
- discover workflow registrations from authored C# workflow classes
- resolve the latest or exact workflow version through the existing registration catalog
- compile authored declarative workflows into canonical runtime definitions
- keep canonical validation as a first-class platform capability
- reject invalid or unsupported definitions during startup or validation
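A minimal sketch of this pipeline, with all type names hypothetical (they are illustrative placeholders, not the actual Serdica API):

```csharp
using System;
using System.Collections.Generic;

// All names below are illustrative placeholders, not the real API.
public sealed record WorkflowRegistration(string Name, int Version, Type AuthoredType);
public sealed record CanonicalDefinition(string Name, int Version);

public interface IWorkflowRegistry
{
    // Resolve the latest version, or an exact version when one is requested,
    // from the existing registration catalog.
    WorkflowRegistration Resolve(string name, int? exactVersion = null);
}

public interface ICanonicalCompiler
{
    // Compile an authored C# workflow class into a canonical runtime definition.
    CanonicalDefinition Compile(WorkflowRegistration registration);
}

public interface ICanonicalValidator
{
    // Canonical validation stays first-class: errors are surfaced rather than
    // letting an invalid definition reach the runtime.
    IReadOnlyList<string> Validate(CanonicalDefinition definition);
}

public static class DefinitionStartup
{
    // Invalid or unsupported definitions are rejected here, at startup.
    public static CanonicalDefinition Load(
        IWorkflowRegistry registry, ICanonicalCompiler compiler,
        ICanonicalValidator validator, string name)
    {
        var registration = registry.Resolve(name);
        var definition = compiler.Compile(registration);
        var errors = validator.Validate(definition);
        if (errors.Count > 0)
            throw new InvalidOperationException(
                $"Definition '{name}' v{registration.Version} rejected: {string.Join("; ", errors)}");
        return definition;
    }
}
```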
2.2 Workflow Start
The engine must:
- bind the untyped start payload to the workflow start request type
- resolve or derive business reference data
- initialize canonical workflow state
- execute the initial sequence until a wait boundary or completion
- create workflow projections and runtime state in one durable flow
- support workflow continuations created during start
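The start sequence above can be sketched as one method over hypothetical seams (every interface and name here is an assumption for illustration):

```csharp
using System;
using System.Threading.Tasks;

// Illustrative sketch of the start flow; every name is a placeholder.
public interface IStartBinder { object Bind(string workflowName, string untypedPayload); }
public interface IBusinessReferenceResolver { string Resolve(object startRequest); }
public interface IInitialRuntime { Task<object> RunUntilWaitAsync(object startRequest); }
public interface IDurableUnitOfWork
{
    // Projections, the runtime snapshot, and any continuations created
    // during start are committed in one durable flow.
    Task CommitAsync(Guid instanceId, object snapshot, string businessReference);
}

public sealed class WorkflowStarter
{
    private readonly IStartBinder _binder;
    private readonly IBusinessReferenceResolver _references;
    private readonly IInitialRuntime _runtime;
    private readonly IDurableUnitOfWork _unitOfWork;

    public WorkflowStarter(IStartBinder binder, IBusinessReferenceResolver references,
        IInitialRuntime runtime, IDurableUnitOfWork unitOfWork)
        => (_binder, _references, _runtime, _unitOfWork) = (binder, references, runtime, unitOfWork);

    public async Task<Guid> StartAsync(string workflowName, string untypedPayload)
    {
        var request = _binder.Bind(workflowName, untypedPayload);  // bind untyped start payload
        var reference = _references.Resolve(request);              // resolve or derive business reference
        var snapshot = await _runtime.RunUntilWaitAsync(request);  // run to wait boundary or completion
        var instanceId = Guid.NewGuid();
        await _unitOfWork.CommitAsync(instanceId, snapshot, reference);
        return instanceId;
    }
}
```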
2.3 Human Tasks
The engine must:
- activate human tasks with:
- task type
- route
- workflow roles
- task roles
- runtime roles
- payload
- business reference
- preserve the current task assignment model:
- assign to self
- assign to user
- assign to runtime roles
- release
- expose completed and active task history through the existing projection model
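The activation attributes and the preserved assignment model can be pictured as a record and an enum (field names are assumptions, not the actual contract):

```csharp
using System;
using System.Collections.Generic;

// Illustrative shape of a human-task activation; names are placeholders.
public sealed record HumanTaskActivation(
    string TaskType,
    string Route,
    IReadOnlyList<string> WorkflowRoles,
    IReadOnlyList<string> TaskRoles,
    IReadOnlyList<string> RuntimeRoles,
    string PayloadJson,
    string BusinessReference);

// The current assignment model, preserved unchanged.
public enum TaskAssignment
{
    AssignToSelf,
    AssignToUser,
    AssignToRuntimeRoles,
    Release
}
```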
2.4 Task Completion
The engine must:
- load the current workflow state and task context
- authorize completion through the existing service layer
- apply completion payload
- continue execution from the task completion entry point
- produce next tasks, next waits, next continuations, or completion
- update runtime state and read projections durably
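A sketch of the completion flow under the same hypothetical-seams convention (all names are illustrative):

```csharp
using System;
using System.Threading.Tasks;

// Illustrative sketch of the completion flow; names are placeholders.
public enum StepOutcome { NextTasks, NextWaits, NextContinuations, Completed }

public interface ITaskContextStore { Task<object> LoadAsync(Guid instanceId, Guid taskId); }
public interface ICompletionAuthorizer { Task EnsureAllowedAsync(object context, string userId); }
public interface IRuntimeContinuation
{
    // Resumes from the task-completion entry point and reports what came next.
    Task<StepOutcome> ContinueAsync(object context, string completionPayload);
}
public interface IDurableProjectionWriter { Task UpdateAsync(Guid instanceId, StepOutcome outcome); }

public sealed class TaskCompletionHandler
{
    private readonly ITaskContextStore _store;
    private readonly ICompletionAuthorizer _authorizer;
    private readonly IRuntimeContinuation _runtime;
    private readonly IDurableProjectionWriter _projections;

    public TaskCompletionHandler(ITaskContextStore store, ICompletionAuthorizer authorizer,
        IRuntimeContinuation runtime, IDurableProjectionWriter projections)
        => (_store, _authorizer, _runtime, _projections) = (store, authorizer, runtime, projections);

    public async Task CompleteAsync(Guid instanceId, Guid taskId, string userId, string payload)
    {
        var context = await _store.LoadAsync(instanceId, taskId);      // load state and task context
        await _authorizer.EnsureAllowedAsync(context, userId);         // existing service-layer authorization
        var outcome = await _runtime.ContinueAsync(context, payload);  // apply payload, continue execution
        await _projections.UpdateAsync(instanceId, outcome);           // durable runtime + read-model update
    }
}
```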
2.5 Runtime Semantics
The engine must support the semantic surface already present in declarative workflows:
- state assignment
- business reference assignment
- human task activation
- microservice calls
- legacy rabbit calls
- GraphQL calls
- HTTP calls
- conditional branches
- decision branches
- repeat loops
- subworkflow invocation
- continue-with orchestration
- timeout branches
- failure branches
- function-backed expressions
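The semantic surface above maps naturally onto a node taxonomy in the canonical model; this enum is an illustration of that mapping, not the actual taxonomy:

```csharp
// Illustrative enumeration of the canonical semantic surface listed above;
// the real canonical model's node taxonomy may differ.
public enum CanonicalNodeKind
{
    StateAssignment, BusinessReferenceAssignment, HumanTask,
    MicroserviceCall, LegacyRabbitCall, GraphQlCall, HttpCall,
    ConditionalBranch, DecisionBranch, RepeatLoop,
    Subworkflow, ContinueWith, TimeoutBranch, FailureBranch,
    FunctionExpression
}
```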
2.6 Subworkflows
The engine must:
- start child workflows
- persist parent resume frames
- carry child output back into parent state
- support nested resume across multiple levels
- preserve current declarative subworkflow semantics
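A sketch of the parent resume frame, assuming a stack-per-lineage shape (names are hypothetical):

```csharp
using System;
using System.Collections.Generic;

// Illustrative shape of a parent resume frame; names are placeholders.
public sealed record ResumeFrame(
    Guid ParentInstanceId,
    string ResumePointId,   // where the parent continues after the child
    string OutputBinding);  // where the child's output lands in parent state

// Nested resume: each subworkflow level pushes one frame; child completion
// pops the top frame, copies the child output into the parent state, and
// signals the parent to continue.
public sealed class ResumeStack
{
    private readonly Stack<ResumeFrame> _frames = new();
    public void Push(ResumeFrame frame) => _frames.Push(frame);
    public bool TryPop(out ResumeFrame frame) => _frames.TryPop(out frame!);
}
```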
2.7 Scheduling
The engine must support:
- timeouts
- retry wake-ups
- delayed continuation
- explicit wait-until behavior
This must happen without a steady-state polling loop.
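One way to picture poll-free scheduling is a schedule-bus seam whose backend performs the delay (for example, a queue with delayed delivery); the interface below is an assumed sketch, not the actual API:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative schedule-bus boundary; names are placeholders. The point is
// that delayed delivery is pushed down to the backend, so no steady-state
// loop scans for due timers.
public sealed record ScheduledSignal(Guid InstanceId, string WaitId, DateTimeOffset DueAt);

public interface IScheduleBus
{
    // Enqueue with a delivery delay; a blocked consumer is woken by the
    // backend when the signal becomes visible. This one primitive covers
    // timeouts, retry wake-ups, delayed continuation, and wait-until.
    Task ScheduleAsync(ScheduledSignal signal, CancellationToken ct);
}
```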
2.8 Inspection And Operations
The service must continue to expose:
- workflow definitions
- workflow instances
- workflow tasks
- workflow task events
- workflow diagrams
- runtime state snapshots
- canonical schema
- canonical validation
3. Non-Functional Requirements
3.1 Multi-Instance Deployment
The service must support multiple application nodes against one shared Oracle database.
Implications:
- no single-node assumptions
- no in-memory-only correctness logic
- no sticky workflow ownership
- duplicate signal delivery must be safe
3.2 Durability
The system of record must be durable across:
- process restart
- node restart
- full cluster restart
- database restart
Workflow progress, pending waits, active tasks, and due timers must not be lost.
3.3 No Polling
Signal-driven wake-up is mandatory.
The engine must not rely on a periodic database scan loop to discover work. Blocking or event-driven delivery is required for:
- task completion wake-up
- delayed resume wake-up
- subworkflow completion wake-up
- external signal wake-up
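The consumer side of this requirement can be sketched as a blocking-dequeue pump (all names are hypothetical; the blocking dequeue itself is backend-specific, e.g. a dequeue call with a wait timeout):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative consumer loop; names are placeholders. The dequeue call
// blocks until a signal arrives instead of scanning tables on a timer.
public sealed record WorkflowSignal(Guid InstanceId, string WaitId);

public interface IBlockingSignalConsumer
{
    // Returns null when the wait window elapses without a signal.
    Task<WorkflowSignal?> DequeueAsync(TimeSpan wait, CancellationToken ct);
}

public interface ISignalDispatcher { Task ResumeAsync(WorkflowSignal signal, CancellationToken ct); }

public sealed class SignalPump
{
    private readonly IBlockingSignalConsumer _consumer;
    private readonly ISignalDispatcher _dispatcher;

    public SignalPump(IBlockingSignalConsumer consumer, ISignalDispatcher dispatcher)
        => (_consumer, _dispatcher) = (consumer, dispatcher);

    public async Task RunAsync(CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            var signal = await _consumer.DequeueAsync(TimeSpan.FromSeconds(30), ct);
            if (signal is null) continue; // wait window elapsed; block again
            // One path serves all wake-up kinds: task completion, delayed
            // resume, subworkflow completion, external signal.
            await _dispatcher.ResumeAsync(signal, ct);
        }
    }
}
```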
3.4 One Database
Oracle is the shared durable state backend for:
- workflow projections
- workflow runtime snapshots
- host coordination
- signal and schedule durability through Oracle AQ
Redis may exist in the wider platform, but it is not required for engine correctness.
3.5 Observability
The engine must produce enough telemetry to answer:
- what instance is waiting
- why it is waiting
- which signal resumed it
- which node executed it
- which definition version it used
- why it failed
- whether a message was retried, dead-lettered, or ignored as stale
3.6 Compatibility
The engine must preserve the existing public workflow service contracts unless a future product change explicitly changes them.
The following service-contract groups are especially important:
- workflow start contracts
- workflow definition contracts
- workflow task contracts
- workflow instance contracts
- workflow operational contracts
4. Explicit V1 Assumptions
These assumptions simplify the engine architecture and are intentional.
4.1 Single Active Runtime Provider Per Deployment
The service runs one engine provider at a time.
This means:
- no mixed-provider instance routing
- no live migration between engines
- no simultaneous old-runtime and engine execution inside one deployment
The design still keeps abstractions around the runtime, signaling bus, and scheduler so that future replacement remains possible.
4.2 Canonical Runtime, Not Elsa Activity Runtime
The target engine executes canonical workflow definitions directly.
Authored C# remains the source of truth, but runtime semantics are driven by canonical definitions compiled from that source.
4.3 Oracle AQ Is The Default Event Backbone
Oracle AQ is treated as part of the durable engine platform because it satisfies:
- one-database architecture
- blocking dequeue
- durable delivery
- delayed delivery
- transactional behavior
5. Design Principles
5.1 Keep The Product Surface Stable
The workflow service remains the product boundary. The engine is an internal subsystem.
5.2 Separate Read Model From Runtime Model
Task and instance projections are optimized for product reads.
Runtime snapshots are optimized for deterministic resume.
They are related, but they are not the same data structure.
5.3 Run To Wait
The engine should never keep a workflow instance “hot” in memory for correctness.
Execution should run until:
- a task is activated
- a timer is scheduled
- an external signal wait is registered
- the workflow completes
Then the snapshot is persisted and released.
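Run-to-wait can be sketched as a single execute-then-release step, assuming an interpreter that reports why it stopped (names are illustrative):

```csharp
using System.Threading.Tasks;

// Illustrative run-to-wait step; names are placeholders.
public enum WaitBoundary { TaskActivated, TimerScheduled, SignalWaitRegistered, Completed }

public interface IInterpreter
{
    // Executes canonical nodes until a wait boundary or completion.
    WaitBoundary RunToWait(object state);
}

public interface ISnapshotWriter { Task SaveAsync(object state, WaitBoundary boundary); }

public static class RunToWaitStep
{
    public static async Task ExecuteAsync(IInterpreter interpreter, ISnapshotWriter writer, object state)
    {
        var boundary = interpreter.RunToWait(state);
        // Whatever the boundary, the snapshot is persisted and the instance
        // released; nothing stays resident in memory for correctness.
        await writer.SaveAsync(state, boundary);
    }
}
```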
5.4 Make Delivery At-Least-Once And Resume Idempotent
Distributed delivery is never exactly-once in practice.
The engine must treat duplicate signals, duplicate wake-ups, and late timer arrivals as normal conditions.
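One common way to make resume idempotent is to version the snapshot and let signals carry the version they target; the guard below is a sketch under that assumption, not the actual mechanism:

```csharp
using System;

// Illustrative duplicate/stale guard; names are placeholders. Duplicates
// and late timer arrivals are dropped as normal conditions, not errors.
public sealed record VersionedSignal(Guid InstanceId, string WaitId, long SnapshotVersion);

public enum ResumeDecision { Resume, IgnoredAsStale }

public static class ResumeGuard
{
    public static ResumeDecision Decide(VersionedSignal signal, long currentSnapshotVersion)
        => signal.SnapshotVersion < currentSnapshotVersion
            ? ResumeDecision.IgnoredAsStale   // duplicate delivery or late wake-up
            : ResumeDecision.Resume;
}
```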
5.5 Keep Signals Small
Signals should identify work, not carry the full workflow state.
The database snapshot remains authoritative.
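Concretely, a signal under this principle is little more than an identifier pair (field names are illustrative):

```csharp
using System;

// Illustrative minimal signal: it identifies work, it carries no state.
// The durable snapshot remains authoritative; the resuming node reloads it.
public sealed record WakeUp(
    Guid InstanceId,   // which instance to wake
    string WaitId);    // which registered wait is satisfied
```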
5.6 Keep Abstractions At The Backend Boundary
Abstract:
- runtime provider
- signal bus
- schedule bus
- snapshot store
Do not abstract away the workflow semantics themselves.
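The four seams named above can be sketched as interfaces (names and signatures are assumptions for illustration; workflow semantics live above these seams and are deliberately not abstracted):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative backend-boundary interfaces; all names are placeholders.
public interface IRuntimeProvider
{
    Task RunToWaitAsync(Guid instanceId, CancellationToken ct);
}

public interface ISignalBus
{
    Task PublishAsync(Guid instanceId, string waitId, CancellationToken ct);
}

public interface IDelayedScheduleBus
{
    Task ScheduleAsync(Guid instanceId, string waitId, DateTimeOffset dueAt, CancellationToken ct);
}

public interface ISnapshotStore
{
    // expectedVersion enables optimistic concurrency across nodes.
    Task SaveAsync(Guid instanceId, ReadOnlyMemory<byte> snapshot,
        long expectedVersion, CancellationToken ct);
}
```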
5.7 Prefer Transactional Consistency Over Cleverness
If a feature can be made transactional in Oracle, prefer that over eventually-consistent coordination tricks.
6. Success Criteria
The engine architecture is successful when:
- the service can start and complete workflows without Elsa
- task projections remain correct
- delayed resumes happen without polling
- a stopped cluster resumes safely after restart
- a multi-node deployment does not corrupt workflow state
- canonical definitions remain the execution contract
- operations can inspect and support the system with existing product-level APIs