
master f5b5f24d95 Add StellaOps.Workflow engine: 14 libraries, WebService, 8 test projects

Extract product-agnostic workflow engine from Ablera.Serdica.Workflow into
standalone StellaOps.Workflow.* libraries targeting net10.0.

Libraries (14):
- Contracts, Abstractions (compiler, decompiler, expression runtime)
- Engine (execution, signaling, scheduling, projections, hosted services)
- ElkSharp (generic graph layout algorithm)
- Renderer.ElkSharp, Renderer.ElkJs, Renderer.Msagl, Renderer.Svg
- Signaling.Redis, Signaling.OracleAq
- DataStore.MongoDB, DataStore.PostgreSQL, DataStore.Oracle

WebService: ASP.NET Core Minimal API with 22 endpoints

Tests (8 projects, 109 tests pass):
- Engine.Tests (105 pass), WebService.Tests (4 E2E pass)
- Renderer.Tests, DataStore.MongoDB/Oracle/PostgreSQL.Tests
- Signaling.Redis.Tests, IntegrationTests.Shared

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-20 19:14:44 +02:00


07. Sprint Plan

Planning Assumptions

sprint length: 2 weeks
one team owning runtime, persistence, and service integration
Oracle AQ available
no concurrent-engine migration scope
acceptance means code, tests, and updated docs

Sprint 1: Foundations And Contracts

Goal

Create the engine skeleton and the stable interfaces.

Scope

add runtime provider abstraction
add signal bus abstraction
add schedule bus abstraction
add runtime snapshot abstraction
add engine option classes
add docs/engine/ package

Deliverables

interface set compiled into shared abstractions
configuration classes
initial DI composition path
unit tests for options and registration

Exit Criteria

service builds with engine abstractions present
no Elsa runtime assumptions are introduced into new code
docs and interface names are stable enough for later sprints

Sprint 2: Canonical Runtime Definition Store

Goal

Make canonical execution definitions available at runtime without Elsa.

Scope

compile authored workflows to canonical runtime definitions at startup
validate definitions during startup
cache runtime definitions
expose startup failure mode for invalid definitions

Deliverables

WorkflowRuntimeDefinitionStore
definition normalization pipeline
startup validator
tests covering:
- valid definition load
- invalid definition rejection
- version resolution

Exit Criteria

all registered workflows load into runtime definition cache
the runtime can resolve a definition by name and version
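
As an illustration of the Sprint 2 scope, the following is a minimal Python sketch of a definition cache that validates everything at load time and resolves by name and version. It is not the actual WorkflowRuntimeDefinitionStore; the class names, the `steps` field, and the "no version means latest" rule are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeDefinition:
    name: str
    version: int
    steps: tuple  # canonical step names, in execution order (assumed shape)


class DefinitionStore:
    """Hypothetical in-memory stand-in for a runtime definition cache."""

    def __init__(self):
        self._by_key = {}   # (name, version) -> definition
        self._latest = {}   # name -> highest registered version

    def load(self, definitions):
        # Validate everything first so that one invalid definition fails
        # startup before any definition becomes resolvable.
        errors = []
        seen = set()
        for d in definitions:
            if not d.steps:
                errors.append(f"{d.name} v{d.version}: no steps")
            if (d.name, d.version) in seen:
                errors.append(f"{d.name} v{d.version}: duplicate")
            seen.add((d.name, d.version))
        if errors:
            raise ValueError("invalid definitions: " + "; ".join(errors))
        for d in definitions:
            self._by_key[(d.name, d.version)] = d
            self._latest[d.name] = max(self._latest.get(d.name, 0), d.version)

    def resolve(self, name, version=None):
        # Assumption: a missing version means "latest registered version".
        if version is None:
            version = self._latest[name]
        return self._by_key[(name, version)]
```

Validating the whole batch before caching any entry keeps the failure mode all-or-nothing, which matches the "startup failure mode for invalid definitions" item in the scope.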

Sprint 3: Snapshot Store And Versioned Runtime State

Goal

Turn WF_RUNTIME_STATES into a first-class engine snapshot store.

Scope

extend runtime state schema
implement snapshot mapper
implement optimistic concurrency versioning
wire snapshot reads and writes

Deliverables

database migration scripts
OracleWorkflowRuntimeSnapshotStore
snapshot serialization contracts
tests for:
- initial insert
- update with expected version
- stale version conflict

Exit Criteria

runtime snapshots can be loaded and committed with version control
stale updates are rejected safely
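
The three test cases listed above (initial insert, update with expected version, stale version conflict) can be sketched with a small in-memory store. This is an illustrative Python sketch only; `SnapshotStore`, `StaleVersionError`, and the convention that a missing snapshot has version 0 are assumptions, not the contract of OracleWorkflowRuntimeSnapshotStore.

```python
class StaleVersionError(Exception):
    """Raised when a commit carries an outdated expected version."""


class SnapshotStore:
    """Hypothetical in-memory snapshot store with optimistic concurrency."""

    def __init__(self):
        self._rows = {}  # instance_id -> (version, state)

    def load(self, instance_id):
        # Returns (version, state), or None when no snapshot exists yet.
        return self._rows.get(instance_id)

    def commit(self, instance_id, expected_version, state):
        current = self._rows.get(instance_id)
        current_version = current[0] if current else 0
        if current_version != expected_version:
            # Another writer committed first; reject without changing the row.
            raise StaleVersionError(
                f"{instance_id}: expected v{expected_version}, "
                f"found v{current_version}"
            )
        new_version = current_version + 1
        self._rows[instance_id] = (new_version, dict(state))
        return new_version
```

In a relational store the same check is usually expressed as an update guarded by the expected version, with zero affected rows treated as the conflict.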

Sprint 4: AQ Signal And Schedule Backbone

Goal

Introduce Oracle AQ as the durable event backbone.

Scope

create AQ setup scripts
implement signal bus
implement schedule bus
implement signal envelope serialization
implement hosted signal consumer skeleton

Deliverables

AQ DDL scripts
OracleAqWorkflowSignalBus
OracleAqWorkflowScheduleBus
integration tests with enqueue/dequeue
delayed message smoke tests

Exit Criteria

engine can publish and receive immediate signals without polling
engine can publish and receive delayed signals
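
To make the immediate versus delayed distinction concrete, here is a minimal Python sketch of a bus that orders envelopes by due time. It is an in-memory stand-in with an explicit clock, not Oracle AQ: a real AQ consumer would block on dequeue instead of being called with `now`, and every name below is hypothetical.

```python
import heapq
import itertools


class InMemorySignalBus:
    """Hypothetical bus: envelopes become visible once their due time passes."""

    def __init__(self):
        self._heap = []                 # (due_time, sequence, envelope)
        self._seq = itertools.count()   # tie-breaker keeps FIFO order

    def publish(self, envelope, now, delay=0.0):
        # delay == 0 models an immediate signal; delay > 0 a scheduled one.
        heapq.heappush(self._heap, (now + delay, next(self._seq), envelope))

    def dequeue_due(self, now):
        # Return every envelope whose due time is at or before `now`.
        due = []
        while self._heap and self._heap[0][0] <= now:
            _, _, envelope = heapq.heappop(self._heap)
            due.append(envelope)
        return due
```

Passing the clock in explicitly keeps the delayed-message smoke tests deterministic, since no test has to sleep.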

Sprint 5: Start Flow And Human Task Activation

Goal

Run workflows from start until the first durable wait.

Scope

implement execution coordinator
implement canonical interpreter subset:
- state assignment
- business reference assignment
- task activation
- terminal completion
integrate with WorkflowRuntimeService
keep existing projection model

Deliverables

SerdicaEngineRuntimeProvider.StartAsync
execution slice result model
task activation write path
tests for:
- start to task
- start to completion
- business reference propagation

Exit Criteria

selected declarative workflows can start and create correct tasks without Elsa
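
The idea of running "until the first durable wait" can be sketched as an interpreter loop that returns an execution slice result when it activates a task or completes. This is an illustrative Python sketch; the step tuples, the result dictionary, and the function name are assumptions and do not reflect the canonical definition format.

```python
def run_slice(steps, state, pointer=0):
    """Execute steps from `pointer` until a task wait or terminal completion.

    `steps` is a list of (kind, argument) tuples (assumed shape).
    """
    while pointer < len(steps):
        kind, arg = steps[pointer]
        pointer += 1
        if kind == "set_state":
            state.update(arg)
        elif kind == "set_business_ref":
            state["business_ref"] = arg
        elif kind == "activate_task":
            # Durable wait: stop here and remember where to resume.
            return {"status": "waiting", "task": arg,
                    "pointer": pointer, "state": state}
        elif kind == "complete":
            return {"status": "completed", "task": None,
                    "pointer": pointer, "state": state}
        else:
            raise ValueError(f"unsupported step kind: {kind}")
    # Running off the end is treated as completion in this sketch.
    return {"status": "completed", "task": None,
            "pointer": pointer, "state": state}
```

Calling `run_slice` again with the returned `pointer` and `state` models resuming after the task is completed.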

Sprint 6: Task Completion And Transport Calls

Goal

Advance workflows after task completion and support transport-backed orchestration.

Scope

implement task completion execution path
implement canonical interpreter support for:
- transport calls
- branches
- success/failure paths
integrate completion flow with runtime snapshot commit

Deliverables

SerdicaEngineRuntimeProvider.CompleteAsync
transport dispatcher
tests for:
- completion to next task
- failure branch
- timeout branch where applicable

Exit Criteria

representative workflows can complete their first task and reach the correct next state

Sprint 7: Subworkflows, Continue-With, And Repeat

Goal

Support the higher-order orchestration patterns used heavily in the corpus.

Scope

implement subworkflow frame persistence
implement parent resume
implement continue-with production
implement repeat resume semantics

Deliverables

subworkflow coordinator
resume pointer serializer
tests for:
- child completion resumes parent
- nested frame handling
- repeat interrupted by wait
- continue-with request emission

Exit Criteria

representative subworkflow-heavy families execute correctly
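
The "child completion resumes parent" and "nested frame handling" cases can be pictured as a stack of frames. The Python sketch below is an assumption-laden illustration: the frame fields (`workflow`, `resume_pointer`, `state`, `result_key`) are hypothetical and are not the persisted frame format.

```python
def complete_child(frames, child_result):
    """Pop the finished frame and hand its result to the parent frame.

    `frames` is a list used as a stack; the last element is the running frame.
    """
    finished = frames.pop()
    if not frames:
        # The root workflow finished; nothing is left to resume.
        return {"status": "completed", "result": child_result}
    parent = frames[-1]
    # Store the child's result where the parent expects it, then resume
    # the parent at the pointer saved when the child was started.
    parent["state"][finished["result_key"]] = child_result
    return {"status": "resume",
            "workflow": parent["workflow"],
            "pointer": parent["resume_pointer"]}
```

Because each frame carries its own resume pointer, nesting depth does not change the logic: completing a grandchild resumes the child, and completing the child resumes the parent.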

Sprint 8: Timers, Retries, And Delayed Resume

Goal

Finish the non-polling scheduling path.

Scope

implement timer waits
implement retry scheduling
implement stale timer ignore logic via waiting tokens
integrate delayed AQ delivery into execution coordinator

Deliverables

timer wait model
delayed resume handler
tests for:
- timer due resume
- retry due resume
- canceled timer ignored
- restart-safe delayed processing

Exit Criteria

the engine supports time-based orchestration without polling loops
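
The "stale timer ignore logic via waiting tokens" item can be illustrated as follows: each armed wait gets a fresh token, and a due timer resumes the instance only if it carries the token that is currently active. This is a Python sketch under assumed names (`Instance`, `arm_timer`, `on_timer_due`), not the engine's timer wait model.

```python
import uuid


class Instance:
    """Hypothetical workflow instance holding at most one active wait."""

    def __init__(self):
        self.waiting_token = None
        self.resumed = 0

    def arm_timer(self):
        # Each new wait gets a fresh token; the delayed message carries it.
        self.waiting_token = uuid.uuid4().hex
        return self.waiting_token

    def cancel_wait(self):
        # Canceling does not remove the queued message; it only
        # invalidates the token that message carries.
        self.waiting_token = None

    def on_timer_due(self, token):
        if token is None or token != self.waiting_token:
            return False  # stale or duplicate timer: ignore
        self.waiting_token = None  # consume the wait exactly once
        self.resumed += 1
        return True
```

The same check also absorbs duplicate delivery of a valid timer, because the first delivery clears the token.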

Sprint 9: Operational Parity

Goal

Reach product-surface and operations parity with the existing workflow service.

Scope

diagram parity validation
runtime state inspection parity
retention integration
structured metrics and logging
DLQ handling and diagnostics

Deliverables

runtime metadata mapping updates
operational dashboards or documented metric set
DLQ support
tests for supportability paths

Exit Criteria

operations can inspect and support engine-driven instances through the existing product surface

Sprint 10: Corpus Parity And Hardening

Goal

Prove the engine against the real declarative workflow corpus.

Scope

execute representative high-fanout families end-to-end
resolve remaining interpreter gaps
multi-node duplicate delivery testing
restart and recovery testing
performance and soak tests

Deliverables

parity report against selected workflow families
load test results
recovery test results
production readiness checklist

Exit Criteria

selected production-grade workflows run without Elsa
restart recovery is proven
no polling is used for steady-state signal or timer discovery

Sprint 11: Bulstrad E2E Parity And Oracle Reliability

Goal

Turn the engine from a validated runtime into a production-grade execution platform by proving it against real Bulstrad workflows and hostile Oracle operating conditions.

Scope

build a curated Bulstrad Oracle-AQ E2E suite
replace synthetic runtime-state backing in Oracle integration tests with the real Oracle runtime-state store
add Oracle transaction-coupling tests for state, projections, and AQ publish
add Oracle restart, redelivery, and DLQ replay tests
add multi-worker and duplicate-delivery race tests
add deterministic fault-injection around commit boundaries

Deliverables

BulstradOracleAqE2ETests
curated representative workflows with scripted downstream responders
Oracle transport reliability suite covering:
- immediate and delayed delivery
- rollback and redelivery
- dead-letter browse and replay
- restart-safe delayed processing
concurrency suite covering:
- duplicate signal delivery
- same-instance multi-worker races
- retry-after-conflict behavior
documented timing expectations for cold-start and steady-state Oracle AQ

Implemented Coverage

The current Oracle-backed integration harness now includes:

Bulstrad policy-change families:
- OpenForChangePolicy
- ReviewPolicyOpenForChange
- AssistantAddAnnex
- AnnexCancellation
- AssistantPolicyReinstate
- AssistantPolicyCancellation
- AssistantPrintInsisDocuments
shared policy families:
- InsisIntegrationNew
- QuotationConfirm
- QuoteOrAplCancel
Oracle transport and recovery matrix:
- immediate and delayed AQ delivery
- delayed backlog drain within a bounded latency envelope
- dequeue rollback redelivery
- ambient Oracle transaction commit and rollback for immediate messages
- ambient Oracle transaction commit and rollback for delayed messages
- dead-letter browse, replay, and backlog replay
- dead-letter backlog survival across Oracle restart
- timer backlog recovery across provider restart and Oracle restart
- external-signal backlog recovery, worker abandon/recovery, and duplicate-delivery races
- schedule/publish failure rollback inside workflow mutation transactions

Exit Criteria

representative Bulstrad workflows execute correctly on SerdicaEngine with real Oracle AQ
AQ-backed restart and delayed-delivery behavior is proven under realistic timing variance
duplicate delivery and commit-boundary failures are shown to be safe
the team has a stable PR suite and a broader nightly suite for Oracle-backed engine validation
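
The "schedule/publish failure rollback inside workflow mutation transactions" item describes an all-or-nothing rule: the state change and the outgoing messages commit together or not at all. The following Python sketch shows that rule with staged copies. It stands in for a real database transaction, and `apply_mutation` and its callback parameters are hypothetical.

```python
import copy


def apply_mutation(state, queue, mutate, publish):
    """Apply a state change and its outgoing messages atomically.

    `mutate(staged_state)` edits a copy of the state.
    `publish(staged_messages)` appends outgoing messages and may raise.
    Nothing is visible to `state` or `queue` unless both succeed.
    """
    staged_state = copy.deepcopy(state)
    staged_messages = []
    mutate(staged_state)
    publish(staged_messages)
    # Reached only when both callbacks succeeded: commit both together.
    state.clear()
    state.update(staged_state)
    queue.extend(staged_messages)
```

If `publish` raises, the exception propagates before the commit step, so the caller sees the original state and an unchanged queue.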

Sprint 12: Load, Performance, And Capacity Characterization

Goal

Turn the correctness-focused Oracle validation suite into a real load and performance program with stable smoke gates, nightly trend runs, soak coverage, and first capacity numbers.

Scope

build a dedicated performance harness on top of the Oracle AQ integration foundation
separate PR smoke, nightly characterization, weekly soak, and explicit capacity tiers
add synthetic engine workloads for stable measurement
add representative Bulstrad workload runners for business realism
persist performance artifacts and summary reports
define baseline and regression strategy per environment

Deliverables

categorized performance scenarios:
- WorkflowPerfLatency
- WorkflowPerfThroughput
- WorkflowPerfSmoke
- WorkflowPerfNightly
- WorkflowPerfSoak
- WorkflowPerfCapacity
result artifact writer under TestResults/workflow-performance/
scenario matrix covering:
- AQ immediate bursts
- AQ delayed bursts
- mixed signal backlogs
- synthetic start/task/signal/timer/subworkflow flows
- representative Bulstrad families
- restart and replay under load
first baseline report for local Docker and CI Oracle
first capacity note for one-node and multi-node assumptions

Exit Criteria

PR smoke load checks are cheap and stable enough to run continuously
nightly runs capture latency, throughput, and correctness artifacts
soak runs prove no backlog drift or correctness decay over extended execution
representative Bulstrad workflows have measured latency envelopes, not just functional pass/fail
the team has an initial sizing recommendation for worker concurrency and queue backlog expectations

Implemented Foundation

The current Sprint 12 implementation now includes:

performance categories and artifact generation under TestResults/workflow-performance/
Oracle AQ smoke scenarios for:
- immediate burst drain
- delayed burst drain
- synthetic external-signal backlog resume
- short Bulstrad business burst using QuoteOrAplCancel
persisted comparison against the previous artifact for the same scenario and tier
Oracle AQ nightly scenarios for:
- larger immediate burst drain
- larger delayed burst drain
- larger synthetic external-signal backlog resume
- Bulstrad QuotationConfirm -> PdfGenerator burst
Oracle AQ soak scenario for:
- sustained synthetic signal round-trip waves without correctness drift
Oracle AQ latency baseline for:
- one-at-a-time synthetic signal round-trip with phase-level latency summaries
Oracle AQ throughput baseline for:
- parallel synthetic signal round-trip with a workload concurrency of 16 and 8 signal workers
Oracle AQ capacity ladder for:
- synthetic signal round-trip at concurrency 1, 4, 8, and 16
thread-safe scripted transport recording for concurrent smoke scenarios
first full Oracle baseline run with documented metrics in:
- 10-oracle-performance-baseline-2026-03-17.md
- 10-oracle-performance-baseline-2026-03-17.json

Reference

The detailed workload model, KPI set, harness design, and baseline strategy are defined in 08-load-and-performance-plan.md.
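
As a rough illustration of latency summaries and of comparing a run against the previous artifact, here is a minimal Python sketch. The nearest-rank percentile method, the metric names, and the 10% tolerance are assumptions chosen for the example; the actual KPI set and regression strategy are the ones defined in 08-load-and-performance-plan.md.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list, with p in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Build a small summary record for one scenario run."""
    return {
        "count": len(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "max": max(latencies_ms),
    }


def regressions(current, previous, tolerance=0.10):
    """Return the metrics that got worse by more than `tolerance`."""
    return [
        key for key in ("p50", "p95", "max")
        if current[key] > previous[key] * (1 + tolerance)
    ]
```

Comparing only against an artifact from the same scenario and tier, as the list above describes, keeps environment differences from showing up as false regressions.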

Sprint 13: Engine-Native Rendering And Authoring Projection

Goal

Restore definition rendering and authoring projection without reintroducing Elsa types or runtime dependencies into the workflow declarations or the engine host.

Scope

design and implement a native definition-to-diagram projection for declarative and canonical workflows
support deterministic node and edge generation from runtime definitions
preserve task, branch, repeat, fork, timer, signal, and subworkflow visibility in the rendered output
define a stable rendering contract for the operational API and future authoring tools
keep rendering as a separate projection layer, not as part of runtime execution

Deliverables

native rendering model and renderer for WorkflowRuntimeDefinition
canonical-to-diagram projection rules for:
- linear sequences
- decisions and conditional branches
- repeats
- forks and joins
- timers and external-signal waits
- continuations and subworkflows
updated operational metadata and diagram endpoints backed only by engine assets
test suite covering rendering determinism and parity for representative Bulstrad workflows

Exit Criteria

workflow definitions render without any Elsa packages, builders, or activity models
rendered diagrams remain stable for the same declarative definition across rebuilds
operational diagram inspection uses the native renderer only
the rendering layer is ready to support a later authoring surface without changing workflow declarations
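
Rendering determinism, as required above, mostly comes down to deriving nodes and edges purely from the definition and emitting them in a fixed order. The Python sketch below assumes a hypothetical definition shape (a `steps` list with `id`, `kind`, and `next` fields); it is not the WorkflowRuntimeDefinition model.

```python
def project_diagram(definition):
    """Project a definition into nodes and edges with a stable ordering."""
    nodes = []
    edges = []
    for step in definition["steps"]:
        nodes.append({"id": step["id"], "kind": step["kind"]})
        for target in step.get("next", []):
            edges.append({"from": step["id"], "to": target})
    # Sorting removes any dependence on declaration or dictionary order,
    # so the same definition always yields the same output.
    nodes.sort(key=lambda n: n["id"])
    edges.sort(key=lambda e: (e["from"], e["to"]))
    return {"nodes": nodes, "edges": edges}
```

Since the function only reads the definition, it stays a separate projection layer and never touches runtime execution state.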

Sprint 14: Backend Portability And Store Profiles

Goal

Turn the Oracle-first engine into a backend-switchable engine with one selected backend profile per deployment.

Scope

introduce backend profile abstraction and dedicated backend plugin registration
split projection persistence from the current Oracle-first application service
formalize mutation coordinator abstraction
add backend-neutral dead-letter contract
add backend conformance suite
implement PostgreSQL profile
design MongoDB profile in executable detail, with implementation only after explicit product approval

Deliverables

IWorkflowBackendRegistrationMarker
backend-neutral projection contract
backend-neutral mutation coordinator contract
backend conformance suite
dedicated Oracle, PostgreSQL, and MongoDB backend plugin projects
executable MongoDB backend plugin design package

Exit Criteria

host selects one backend profile by configuration
host stays backend-neutral and does not resolve Oracle/PostgreSQL directly
Oracle and PostgreSQL pass the same conformance suite
MongoDB path is specified well enough that implementation is a bounded engineering task
workflow declarations and canonical definitions remain unchanged across backend profiles
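
The "host selects one backend profile by configuration" criterion can be sketched as a registry that backend plugins populate and the host queries by name. This Python sketch is illustrative only: the registry, the decorator, the `backend` configuration key, and the profile contents are all assumptions rather than the real plugin registration contract.

```python
BACKENDS = {}  # profile name -> factory, filled in by backend plugins


def register_backend(name):
    """Decorator a backend plugin uses to register its profile factory."""
    def decorator(factory):
        BACKENDS[name] = factory
        return factory
    return decorator


@register_backend("oracle")
def oracle_profile():
    return {"name": "oracle"}


@register_backend("postgres")
def postgres_profile():
    return {"name": "postgres"}


def select_backend(config):
    """Resolve exactly one profile; the host never names a backend itself."""
    name = config.get("backend")
    if name not in BACKENDS:
        raise ValueError(f"unknown backend profile: {name!r}")
    return BACKENDS[name]()
```

The host only calls `select_backend`, so adding a backend means adding a plugin that registers itself, with no change to host code.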

Sprint 15: Backend-Neutral Parity And Performance Harness

Goal

Remove the remaining Oracle-only assumptions from the validation stack so PostgreSQL and MongoDB can be measured with the same correctness, Bulstrad, and performance scenarios.

Scope

extract backend-neutral performance artifacts, categories, and scenario drivers
extract backend-neutral runtime workload helpers from the Oracle-only harness
define one hostile-condition matrix shared by Oracle, PostgreSQL, and MongoDB
define one curated Bulstrad parity pack shared by all backends
define one normalized performance artifact format and baseline comparison model

Deliverables

shared IntegrationTests/Performance/Common/ package
shared normalized performance metrics model
shared Bulstrad workload catalog for:
- OpenForChangePolicy
- ReviewPolicyOpenForChange
- AssistantPrintInsisDocuments
- AssistantAddAnnex
- AnnexCancellation
- AssistantPolicyCancellation
- AssistantPolicyReinstate
- InsisIntegrationNew
- QuotationConfirm
- QuoteOrAplCancel
backend-neutral hostile-condition checklist for:
- duplicate delivery
- same-instance resume race
- abandon and reclaim
- rollback on publish/schedule failure
- restart with pending due messages
- DLQ replay
- backlog drain

Exit Criteria

Oracle, PostgreSQL, and MongoDB use the same performance artifact shape
Oracle no longer owns the reporting model for later backend baselines
PostgreSQL and MongoDB can plug into the same workload definitions without changing workflow semantics
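
One way to picture a shared hostile-condition checklist is a set of checks written once against a small backend interface and then run against every backend. The Python sketch below uses a hypothetical `deliver` method and an in-memory backend purely for illustration; the real checklist items are the ones listed above.

```python
class InMemoryBackend:
    """Hypothetical backend that applies each message id at most once."""

    def __init__(self):
        self.handled = set()
        self.applied = []

    def deliver(self, message_id, payload):
        if message_id in self.handled:
            return False  # duplicate delivery: ignore
        self.handled.add(message_id)
        self.applied.append(payload)
        return True


def check_duplicate_delivery(make_backend):
    """Delivering the same message twice must apply it exactly once."""
    backend = make_backend()
    first = backend.deliver("m1", "resume")
    second = backend.deliver("m1", "resume")
    return first and not second and backend.applied == ["resume"]


def run_conformance(backends, checks):
    """Run every check against every backend factory; collect pass/fail."""
    return {
        name: {check.__name__: check(factory) for check in checks}
        for name, factory in backends.items()
    }
```

Because the checks take a factory instead of a concrete backend, each check starts from a clean instance and the same list can be reused for any later backend.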

Sprint 16: PostgreSQL Hardening, Bulstrad Parity, And Baseline

Goal

Bring PostgreSQL to Oracle-level confidence for correctness, hostile conditions, representative product behavior, and measured performance.

Scope

close the PostgreSQL hostile-condition gap to the Oracle matrix
add PostgreSQL-backed Bulstrad E2E parity
implement PostgreSQL latency, throughput, smoke, nightly, soak, and capacity suites
publish PostgreSQL baseline artifacts and narrative summary

Deliverables

PostgreSQL hostile-condition integration suite
PostgreSQL Bulstrad parity suite
PostgreSQL performance suites for:
- latency
- throughput
- smoke
- nightly
- soak
- capacity
baseline documents:
- 11-postgres-performance-baseline-<date>.md
- 11-postgres-performance-baseline-<date>.json

Exit Criteria

PostgreSQL passes the same hostile-condition matrix as Oracle
representative Bulstrad workflows run correctly on PostgreSQL
PostgreSQL has a durable, documented performance baseline comparable to Oracle

Sprint 17: MongoDB Hardening, Bulstrad Parity, And Baseline

Goal

Bring MongoDB to the same product and operational confidence level as the relational backends without changing workflow behavior.

Scope

close the MongoDB hostile-condition gap to the Oracle matrix
add MongoDB-backed Bulstrad E2E parity
implement MongoDB latency, throughput, smoke, nightly, soak, and capacity suites
publish MongoDB baseline artifacts and narrative summary

Deliverables

MongoDB hostile-condition integration suite
MongoDB Bulstrad parity suite
MongoDB performance suites for:
- latency
- throughput
- smoke
- nightly
- soak
- capacity
baseline documents:
- 12-mongo-performance-baseline-<date>.md
- 12-mongo-performance-baseline-<date>.json

Exit Criteria

MongoDB passes the same hostile-condition matrix as Oracle
representative Bulstrad workflows run correctly on MongoDB
MongoDB has a durable, documented performance baseline comparable to Oracle and PostgreSQL

Sprint 18: Final Three-Backend Characterization And Decision Pack

Goal

Produce the final side-by-side comparison for Oracle, PostgreSQL, and MongoDB using the same workloads, the same correctness rules, and the same performance artifact format.

Scope

rerun the shared Bulstrad parity pack on all three backends
rerun the shared hostile-condition matrix on all three backends
rerun the shared performance tiers and compare normalized metrics
capture backend-specific metrics appendices without letting them replace normalized workflow metrics
publish the final recommendation pack

Deliverables

final comparison documents:
- 13-backend-comparison-<date>.md
- 13-backend-comparison-<date>.json
normalized comparison across:
- serial latency
- steady-state throughput
- capacity ladder
- backlog drain
- duplicate-delivery safety
- restart recovery
backend-specific appendices for:
- Oracle wait and AQ observations
- PostgreSQL lock, WAL, and queue-table observations
- MongoDB transaction, lock, and change-stream observations

Exit Criteria

all three backends are compared through the same workload lens
the team has one documented backend recommendation pack
future backend decisions can reuse the same comparison harness instead of inventing new ad hoc measurements

Current Status

baseline comparison pack published in:
- 13-backend-comparison-2026-03-17.md
- 13-backend-comparison-2026-03-17.json
normalized performance comparison is complete for Oracle, PostgreSQL, and MongoDB
reliability and Bulstrad hardening depth remains Oracle-first, so the current comparison is a baseline decision pack, not the final production closeout
the signal path is now split into durable store and wake driver seams
PostgreSQL and MongoDB now persist transactional wake-outbox records behind that seam
the optional Redis wake-driver plugin is implemented for PostgreSQL and MongoDB
Oracle intentionally remains on native AQ and does not support the Redis wake-driver combination

Cross-Sprint Work Items

These should be maintained continuously, not left to the end:

architecture doc updates
test harness improvements
canonical execution parity assertions
operational telemetry quality
snapshot schema versioning discipline
Oracle timing-envelope observations for CI and local Docker environments

Final Milestone Definition

The project is complete when:

the workflow service can run on the engine as the active runtime
task and instance APIs remain stable
Oracle AQ handles both immediate signaling and delayed scheduling
the service resumes correctly after restart without polling
the engine runs representative real workflows with production-grade observability
