07. Sprint Plan
Planning Assumptions
- sprint length: 2 weeks
- one team owning runtime, persistence, and service integration
- Oracle AQ available
- no concurrent-engine migration scope
- acceptance means code, tests, and updated docs
Sprint 1: Foundations And Contracts
Goal
Create the engine skeleton and the stable interfaces.
Scope
- add runtime provider abstraction
- add signal bus abstraction
- add schedule bus abstraction
- add runtime snapshot abstraction
- add engine option classes
- add docs/engine/ package
Deliverables
- interface set compiled into shared abstractions
- configuration classes
- initial DI composition path
- unit tests for options and registration
Exit Criteria
- service builds with engine abstractions present
- no Elsa runtime assumptions are introduced into new code
- docs and interface names are stable enough for later sprints
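As a rough illustration of the abstraction set in this sprint's scope, the sketch below expresses the signal bus, schedule bus, snapshot store, and one option class as minimal interfaces. It is written in Python for brevity and is not the service's own code; every name in it (EngineOptions, SignalBus, ScheduleBus, SnapshotStore, the queue name, and all members) is a hypothetical placeholder.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Protocol


@dataclass(frozen=True)
class EngineOptions:
    # Hypothetical option names and defaults, for illustration only.
    signal_queue_name: str = "WF_SIGNALS"
    max_signal_workers: int = 4

    def validate(self) -> None:
        # Fail fast on obviously unusable configuration.
        if not self.signal_queue_name:
            raise ValueError("signal_queue_name must not be empty")
        if self.max_signal_workers < 1:
            raise ValueError("max_signal_workers must be at least 1")


class SignalBus(Protocol):
    def publish(self, instance_id: str, payload: dict) -> None: ...


class ScheduleBus(Protocol):
    def schedule(self, instance_id: str, payload: dict, due_at: datetime) -> None: ...


class SnapshotStore(Protocol):
    def load(self, instance_id: str) -> Optional[dict]: ...

    def save(self, instance_id: str, snapshot: dict, expected_version: int) -> int: ...
```

Keeping the buses and the store behind narrow interfaces like these is what lets later sprints swap the transport or persistence layer without touching the interpreter.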
Sprint 2: Canonical Runtime Definition Store
Goal
Make canonical execution definitions available at runtime without Elsa.
Scope
- compile authored workflows to canonical runtime definitions at startup
- validate definitions during startup
- cache runtime definitions
- expose startup failure mode for invalid definitions
Deliverables
- WorkflowRuntimeDefinitionStore
- definition normalization pipeline
- startup validator
- tests covering:
- valid definition load
- invalid definition rejection
- version resolution
Exit Criteria
- all registered workflows load into runtime definition cache
- the runtime can resolve definition by name/version
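A minimal sketch of the load, validate, and resolve behavior described for this sprint, written in Python for illustration rather than in the service's own language. The class and field names are hypothetical, and the rule that an omitted version resolves to the highest loaded version is an assumption, not something this plan specifies.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class RuntimeDefinition:
    name: str
    version: int
    steps: Tuple[str, ...]


class InvalidDefinitionError(Exception):
    pass


class RuntimeDefinitionStore:
    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, int], RuntimeDefinition] = {}

    def load_all(self, definitions: Iterable[RuntimeDefinition]) -> None:
        # Validate everything into a staging map first, so one invalid
        # definition fails startup before anything becomes resolvable.
        staged: Dict[Tuple[str, int], RuntimeDefinition] = {}
        for d in definitions:
            if not d.name or d.version < 1 or not d.steps:
                raise InvalidDefinitionError(f"invalid definition: {d.name!r} v{d.version}")
            key = (d.name, d.version)
            if key in staged:
                raise InvalidDefinitionError(f"duplicate definition: {d.name} v{d.version}")
            staged[key] = d
        self._by_key = staged

    def resolve(self, name: str, version: Optional[int] = None) -> RuntimeDefinition:
        if version is not None:
            return self._by_key[(name, version)]
        # Assumption: no explicit version means "latest loaded version".
        versions = [v for (n, v) in self._by_key if n == name]
        if not versions:
            raise KeyError(name)
        return self._by_key[(name, max(versions))]
```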
Sprint 3: Snapshot Store And Versioned Runtime State
Goal
Turn WF_RUNTIME_STATES into a first-class engine snapshot store.
Scope
- extend runtime state schema
- implement snapshot mapper
- implement optimistic concurrency versioning
- wire snapshot reads and writes
Deliverables
- database migration scripts
- OracleWorkflowRuntimeSnapshotStore
- snapshot serialization contracts
- tests for:
- initial insert
- update with expected version
- stale version conflict
Exit Criteria
- runtime snapshots can be loaded and committed with version control
- stale updates are rejected safely
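The three test cases listed for this sprint (initial insert, update with expected version, stale version conflict) can be sketched with an in-memory stand-in. This is an illustrative Python sketch, not the Oracle-backed store; all names are hypothetical, and the convention that a missing row has version 0 is an assumption.

```python
from typing import Dict, Optional, Tuple


class StaleVersionError(Exception):
    pass


class InMemorySnapshotStore:
    def __init__(self) -> None:
        # instance_id -> (version, snapshot)
        self._rows: Dict[str, Tuple[int, dict]] = {}

    def load(self, instance_id: str) -> Optional[Tuple[int, dict]]:
        return self._rows.get(instance_id)

    def commit(self, instance_id: str, snapshot: dict, expected_version: int) -> int:
        current = self._rows.get(instance_id)
        current_version = current[0] if current else 0  # assumption: 0 means "no row yet"
        if current_version != expected_version:
            # Another writer committed first; the caller must reload and retry.
            raise StaleVersionError(
                f"{instance_id}: expected v{expected_version}, found v{current_version}"
            )
        new_version = current_version + 1
        self._rows[instance_id] = (new_version, dict(snapshot))
        return new_version
```

In a relational store the same check is commonly expressed as an update guarded by the expected version, with "zero rows affected" treated as the stale case.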
Sprint 4: AQ Signal And Schedule Backbone
Goal
Introduce Oracle AQ as the durable event backbone.
Scope
- create AQ setup scripts
- implement signal bus
- implement schedule bus
- implement signal envelope serialization
- implement hosted signal consumer skeleton
Deliverables
- AQ DDL scripts
- OracleAqWorkflowSignalBus
- OracleAqWorkflowScheduleBus
- integration tests with enqueue/dequeue
- delayed message smoke tests
Exit Criteria
- engine can publish and receive immediate signals without polling
- engine can publish and receive delayed signals
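To make the envelope and delayed-delivery ideas concrete, here is a small in-memory Python sketch. It only illustrates envelope round-tripping and due-time ordering; it does not model Oracle AQ itself or its blocking dequeue. The envelope fields and class names are hypothetical, and JSON is an assumed wire format.

```python
import heapq
import json
from dataclasses import asdict, dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SignalEnvelope:
    instance_id: str
    signal_type: str
    payload: dict

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(raw: str) -> "SignalEnvelope":
        return SignalEnvelope(**json.loads(raw))


class InMemorySignalBus:
    def __init__(self) -> None:
        # Heap of (due_time, sequence, serialized envelope); the sequence
        # keeps ordering stable for messages with the same due time.
        self._heap: List[Tuple[float, int, str]] = []
        self._seq = 0

    def publish(self, envelope: SignalEnvelope, now: float, delay_seconds: float = 0.0) -> None:
        self._seq += 1
        heapq.heappush(self._heap, (now + delay_seconds, self._seq, envelope.to_json()))

    def try_receive(self, now: float) -> Optional[SignalEnvelope]:
        # Deliver the earliest message only once its due time has passed.
        if self._heap and self._heap[0][0] <= now:
            _, _, raw = heapq.heappop(self._heap)
            return SignalEnvelope.from_json(raw)
        return None
```

Passing `now` explicitly keeps the sketch deterministic, which is also a convenient property for the delayed-message smoke tests.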
Sprint 5: Start Flow And Human Task Activation
Goal
Run workflows from start until first durable wait.
Scope
- implement execution coordinator
- implement canonical interpreter subset:
- state assignment
- business reference assignment
- task activation
- terminal completion
- integrate with WorkflowRuntimeService
- keep existing projection model
Deliverables
- SerdicaEngineRuntimeProvider.StartAsync
- execution slice result model
- task activation write path
- tests for:
- start to task
- start to completion
- business reference propagation
Exit Criteria
- selected declarative workflows can start and create correct tasks without Elsa
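The "run until first durable wait" idea can be sketched as a tiny interpreter over the four step kinds in this sprint's subset. This is an illustrative Python sketch under stated assumptions: the step shape (dictionaries with a "kind" key), the ExecutionSlice fields, and the rule that running off the end of the step list counts as completion are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ExecutionSlice:
    state: Dict[str, object] = field(default_factory=dict)
    business_reference: Optional[str] = None
    activated_tasks: List[str] = field(default_factory=list)
    completed: bool = False
    next_step: int = 0  # resume pointer into the step list


def run_until_wait(steps: List[dict], slice_: Optional[ExecutionSlice] = None) -> ExecutionSlice:
    result = slice_ if slice_ is not None else ExecutionSlice()
    while result.next_step < len(steps):
        step = steps[result.next_step]
        result.next_step += 1
        kind = step["kind"]
        if kind == "assign":
            result.state[step["key"]] = step["value"]
        elif kind == "business_reference":
            result.business_reference = step["value"]
        elif kind == "task":
            # Durable wait: activate the task and stop until it is completed.
            result.activated_tasks.append(step["name"])
            return result
        elif kind == "complete":
            result.completed = True
            return result
        else:
            raise ValueError(f"unsupported step kind: {kind}")
    result.completed = True  # assumption: end of steps means completion
    return result
```

Calling `run_until_wait` again with the returned slice continues from the stored resume pointer, which mirrors how a later task-completion path would pick up the same instance.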
Sprint 6: Task Completion And Transport Calls
Goal
Advance workflows after task completion and support transport-backed orchestration.
Scope
- implement task completion execution path
- implement canonical interpreter support for:
- transport calls
- branches
- success/failure paths
- integrate completion flow with runtime snapshot commit
Deliverables
- SerdicaEngineRuntimeProvider.CompleteAsync
- transport dispatcher
- tests for:
- completion to next task
- failure branch
- timeout branch where applicable
Exit Criteria
- representative workflows can complete first task and reach correct next state
Sprint 7: Subworkflows, Continue-With, And Repeat
Goal
Support the higher-order orchestration patterns used heavily in the corpus.
Scope
- implement subworkflow frame persistence
- implement parent resume
- implement continue-with production
- implement repeat resume semantics
Deliverables
- subworkflow coordinator
- resume pointer serializer
- tests for:
- child completion resumes parent
- nested frame handling
- repeat interrupted by wait
- continue-with request emission
Exit Criteria
- representative subworkflow-heavy families execute correctly
Sprint 8: Timers, Retries, And Delayed Resume
Goal
Finish the non-polling scheduling path.
Scope
- implement timer waits
- implement retry scheduling
- implement stale timer ignore logic via waiting tokens
- integrate delayed AQ delivery into execution coordinator
Deliverables
- timer wait model
- delayed resume handler
- tests for:
- timer due resume
- retry due resume
- canceled timer ignored
- restart-safe delayed processing
Exit Criteria
- the engine supports time-based orchestration without polling loops
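One plausible reading of "stale timer ignore logic via waiting tokens" is that each armed wait gets a fresh token, the delayed message carries that token, and a delivery is applied only if the token still matches the instance's current wait. The Python sketch below illustrates that reading; the names and the use of random hex tokens are assumptions, not the engine's actual model.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstanceState:
    waiting_token: Optional[str] = None
    resumed: int = 0


def arm_timer(state: InstanceState) -> str:
    # A fresh token per wait; the delayed message would carry this value.
    token = uuid.uuid4().hex
    state.waiting_token = token
    return token


def cancel_timer(state: InstanceState) -> None:
    # The queued message cannot be recalled, so only the token is cleared.
    state.waiting_token = None


def on_timer_due(state: InstanceState, token: str) -> bool:
    if state.waiting_token != token:
        return False  # stale: canceled, superseded, or already consumed
    state.waiting_token = None
    state.resumed += 1
    return True
```

The same comparison also makes a redelivered timer message harmless, because the first delivery clears the token.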
Sprint 9: Operational Parity
Goal
Reach product-surface and operations parity with the existing workflow service.
Scope
- diagram parity validation
- runtime state inspection parity
- retention integration
- structured metrics and logging
- DLQ handling and diagnostics
Deliverables
- runtime metadata mapping updates
- operational dashboards or documented metric set
- DLQ support
- tests for supportability paths
Exit Criteria
- operations can inspect and support engine-driven instances through the existing product surface
Sprint 10: Corpus Parity And Hardening
Goal
Prove the engine against the real declarative workflow corpus.
Scope
- execute representative high-fanout families end-to-end
- resolve remaining interpreter gaps
- multi-node duplicate delivery testing
- restart and recovery testing
- performance and soak tests
Deliverables
- parity report against selected workflow families
- load test results
- recovery test results
- production readiness checklist
Exit Criteria
- selected production-grade workflows run without Elsa
- restart recovery is proven
- no polling is used for steady-state signal or timer discovery
Sprint 11: Bulstrad E2E Parity And Oracle Reliability
Goal
Turn the engine from a validated runtime into a production-grade execution platform by proving it against real Bulstrad workflows and hostile Oracle operating conditions.
Scope
- build a curated Bulstrad Oracle-AQ E2E suite
- replace synthetic runtime-state backing in Oracle integration tests with the real Oracle runtime-state store
- add Oracle transaction-coupling tests for state, projections, and AQ publish
- add Oracle restart, redelivery, and DLQ replay tests
- add multi-worker and duplicate-delivery race tests
- add deterministic fault-injection around commit boundaries
Deliverables
- BulstradOracleAqE2ETests
- curated representative workflows with scripted downstream responders
- Oracle transport reliability suite covering:
- immediate and delayed delivery
- rollback and redelivery
- dead-letter browse and replay
- restart-safe delayed processing
- concurrency suite covering:
- duplicate signal delivery
- same-instance multi-worker races
- retry-after-conflict behavior
- documented timing expectations for cold-start and steady-state Oracle AQ
Implemented Coverage
The current Oracle-backed integration harness now includes:
- Bulstrad policy-change families:
- OpenForChangePolicy
- ReviewPolicyOpenForChange
- AssistantAddAnnex
- AnnexCancellation
- AssistantPolicyReinstate
- AssistantPolicyCancellation
- AssistantPrintInsisDocuments
- shared policy families:
- InsisIntegrationNew
- QuotationConfirm
- QuoteOrAplCancel
- Oracle transport and recovery matrix:
- immediate and delayed AQ delivery
- delayed backlog drain within a bounded latency envelope
- dequeue rollback redelivery
- ambient Oracle transaction commit and rollback for immediate messages
- ambient Oracle transaction commit and rollback for delayed messages
- dead-letter browse, replay, and backlog replay
- dead-letter backlog survival across Oracle restart
- timer backlog recovery across provider restart and Oracle restart
- external-signal backlog recovery, worker abandon/recovery, and duplicate-delivery races
- schedule/publish failure rollback inside workflow mutation transactions
Exit Criteria
- representative Bulstrad workflows execute correctly on SerdicaEngine with real Oracle AQ
- AQ-backed restart and delayed-delivery behavior is proven under realistic timing variance
- duplicate delivery and commit-boundary failures are shown to be safe
- the team has a stable PR suite and a broader nightly suite for Oracle-backed engine validation
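Duplicate delivery and retry-after-conflict interact: a worker that loses a version race should reload, and on reload it may discover the message was already applied by the winner. The Python sketch below illustrates that interaction with an in-memory versioned state. The names, the processed-message list, and the `before_write` hook (used only to simulate a racing worker) are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


class ConflictError(Exception):
    pass


class VersionedState:
    def __init__(self) -> None:
        self.version = 0
        self.processed: List[str] = []

    def read(self) -> Tuple[int, List[str]]:
        return self.version, list(self.processed)

    def write(self, expected_version: int, processed: List[str]) -> None:
        if expected_version != self.version:
            raise ConflictError("state changed since it was read")
        self.version += 1
        self.processed = list(processed)


def handle_signal(
    state: VersionedState,
    message_id: str,
    max_attempts: int = 3,
    before_write: Optional[Callable[[int], None]] = None,
) -> str:
    for attempt in range(1, max_attempts + 1):
        version, processed = state.read()
        if message_id in processed:
            return "duplicate"  # already applied, possibly by another worker
        processed.append(message_id)
        if before_write is not None:
            before_write(attempt)  # test hook: lets a "racing worker" interleave
        try:
            state.write(version, processed)
            return "applied"
        except ConflictError:
            continue  # reload and re-check on the next attempt
    raise ConflictError(f"gave up after {max_attempts} attempts")
```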
Sprint 12: Load, Performance, And Capacity Characterization
Goal
Turn the correctness-focused Oracle validation suite into a real load and performance program with stable smoke gates, nightly trend runs, soak coverage, and first capacity numbers.
Scope
- build a dedicated performance harness on top of the Oracle AQ integration foundation
- separate PR smoke, nightly characterization, weekly soak, and explicit capacity tiers
- add synthetic engine workloads for stable measurement
- add representative Bulstrad workload runners for business realism
- persist performance artifacts and summary reports
- define baseline and regression strategy per environment
Deliverables
- categorized performance scenarios:
- WorkflowPerfLatency
- WorkflowPerfThroughput
- WorkflowPerfSmoke
- WorkflowPerfNightly
- WorkflowPerfSoak
- WorkflowPerfCapacity
- result artifact writer under TestResults/workflow-performance/
- scenario matrix covering:
- AQ immediate bursts
- AQ delayed bursts
- mixed signal backlogs
- synthetic start/task/signal/timer/subworkflow flows
- representative Bulstrad families
- restart and replay under load
- first baseline report for local Docker and CI Oracle
- first capacity note for one-node and multi-node assumptions
Exit Criteria
- PR smoke load checks are cheap and stable enough to run continuously
- nightly runs capture latency, throughput, and correctness artifacts
- soak runs prove no backlog drift or correctness decay over extended execution
- representative Bulstrad workflows have measured latency envelopes, not just functional pass/fail
- the team has an initial sizing recommendation for worker concurrency and queue backlog expectations
Implemented Foundation
The current Sprint 12 implementation now includes:
- performance categories and artifact generation under TestResults/workflow-performance/
- Oracle AQ smoke scenarios for:
- immediate burst drain
- delayed burst drain
- synthetic external-signal backlog resume
- short Bulstrad business burst using QuoteOrAplCancel
- persisted comparison against the previous artifact for the same scenario and tier
- Oracle AQ nightly scenarios for:
- larger immediate burst drain
- larger delayed burst drain
- larger synthetic external-signal backlog resume
- Bulstrad QuotationConfirm -> PdfGenerator burst
- Oracle AQ soak scenario for:
- sustained synthetic signal round-trip waves without correctness drift
- Oracle AQ latency baseline for:
- one-at-a-time synthetic signal round-trip with phase-level latency summaries
- Oracle AQ throughput baseline for:
- parallel synthetic signal round-trip with 16 workload concurrency and 8 signal workers
- Oracle AQ capacity ladder for:
- synthetic signal round-trip at concurrency 1, 4, 8, and 16
- thread-safe scripted transport recording for concurrent smoke scenarios
- first full Oracle baseline run with documented metrics in:
Reference
The detailed workload model, KPI set, harness design, and baseline strategy are defined in 08-load-and-performance-plan.md.
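The "persisted comparison against the previous artifact for the same scenario and tier" can be pictured as: read the last artifact if one exists, write the new one, and report per-metric deltas. The Python sketch below illustrates this under assumptions: the one-file-per-scenario-and-tier layout, the JSON shape, and the metric names in the usage example are hypothetical, not the harness's actual format.

```python
import json
from pathlib import Path
from typing import Dict


def write_and_compare(root: Path, scenario: str, tier: str, metrics: Dict[str, float]) -> Dict[str, float]:
    """Persist metrics for (scenario, tier) and return deltas against the previous run."""
    path = root / f"{scenario}.{tier}.json"  # assumed file layout
    previous = json.loads(path.read_text()) if path.exists() else None
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metrics, sort_keys=True))
    if previous is None:
        return {}  # first run: nothing to compare against
    # Only metrics present in both runs are comparable.
    return {k: metrics[k] - previous[k] for k in metrics if k in previous}
```

For example, calling it twice for the same scenario and tier with a p95 latency of 120 and then 150 would report a delta of 30 for that metric on the second call.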
Sprint 13: Engine-Native Rendering And Authoring Projection
Goal
Restore definition rendering and authoring projection without reintroducing Elsa types or runtime dependencies into the workflow declarations or the engine host.
Scope
- design and implement a native definition-to-diagram projection for declarative and canonical workflows
- support deterministic node and edge generation from runtime definitions
- preserve task, branch, repeat, fork, timer, signal, and subworkflow visibility in the rendered output
- define a stable rendering contract for the operational API and future authoring tools
- keep rendering as a separate projection layer, not as part of runtime execution
Deliverables
- native rendering model and renderer for WorkflowRuntimeDefinition
- canonical-to-diagram projection rules for:
- linear sequences
- decisions and conditional branches
- repeats
- forks and joins
- timers and external-signal waits
- continuations and subworkflows
- updated operational metadata and diagram endpoints backed only by engine assets
- test suite covering rendering determinism and parity for representative Bulstrad workflows
Exit Criteria
- workflow definitions render without any Elsa packages, builders, or activity models
- rendered diagrams remain stable for the same declarative definition across rebuilds
- operational diagram inspection uses the native renderer only
- the rendering layer is ready to support a later authoring surface without changing workflow declarations
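Deterministic node and edge generation mostly comes down to deriving identifiers from the definition's structure rather than from anything run-specific. The Python sketch below shows this for the simplest rule in the list, linear sequences; branches, repeats, forks, and subworkflows are out of scope here. The step shape, the positional `n<index>` identifiers, and the output dictionary layout are assumptions for illustration.

```python
from typing import Dict, List


def project_diagram(steps: List[dict]) -> Dict[str, List[dict]]:
    """Project a linear list of steps into nodes and edges with stable ids."""
    nodes: List[dict] = []
    edges: List[dict] = []
    for index, step in enumerate(steps):
        # Ids come from position only, so the same definition always
        # yields the same diagram, across processes and rebuilds.
        node_id = f"n{index}"
        nodes.append({
            "id": node_id,
            "kind": step["kind"],
            "label": step.get("name", step["kind"]),
        })
        if index > 0:
            edges.append({"from": f"n{index - 1}", "to": node_id})
    return {"nodes": nodes, "edges": edges}
```

A determinism test can then simply project the same definition twice and assert that the two results are equal.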
Sprint 14: Backend Portability And Store Profiles
Goal
Turn the Oracle-first engine into a backend-switchable engine with one selected backend profile per deployment.
Scope
- introduce backend profile abstraction and dedicated backend plugin registration
- split projection persistence from the current Oracle-first application service
- formalize mutation coordinator abstraction
- add backend-neutral dead-letter contract
- add backend conformance suite
- implement PostgreSQL profile
- design MongoDB profile in executable detail, with implementation only after explicit product approval
Deliverables
- IWorkflowBackendRegistrationMarker
- backend-neutral projection contract
- backend-neutral mutation coordinator contract
- backend conformance suite
- dedicated Oracle, PostgreSQL, and MongoDB backend plugin projects
- executable MongoDB backend plugin design package
Exit Criteria
- host selects one backend profile by configuration
- host stays backend-neutral and does not resolve Oracle/PostgreSQL directly
- Oracle and PostgreSQL pass the same conformance suite
- MongoDB path is specified well enough that implementation is a bounded engineering task
- workflow declarations and canonical definitions remain unchanged across backend profiles
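The "one backend profile per deployment, selected by configuration" rule can be sketched as a registry that plugins register into and that the host queries by name only. This Python sketch is illustrative; the registry and profile classes and the `Workflow:Backend` configuration key are hypothetical, not the service's actual registration mechanism.

```python
from typing import Callable, Dict, Mapping


class BackendProfile:
    def __init__(self, name: str) -> None:
        self.name = name


class BackendRegistry:
    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], BackendProfile]] = {}

    def register(self, name: str, factory: Callable[[], BackendProfile]) -> None:
        key = name.lower()
        if key in self._factories:
            raise ValueError(f"backend profile already registered: {name}")
        self._factories[key] = factory

    def select(self, config: Mapping[str, str]) -> BackendProfile:
        # The host only knows the configured name, never a concrete backend type.
        name = config.get("Workflow:Backend", "").strip().lower()  # hypothetical key
        if not name:
            raise ValueError("no backend profile configured")
        if name not in self._factories:
            raise ValueError(f"unknown backend profile: {name}")
        return self._factories[name]()
```

Failing on a missing or unknown name, rather than falling back to a default, keeps a misconfigured deployment from silently running on the wrong store.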
Sprint 15: Backend-Neutral Parity And Performance Harness
Goal
Remove the remaining Oracle-only assumptions from the validation stack so PostgreSQL and MongoDB can be measured with the same correctness, Bulstrad, and performance scenarios.
Scope
- extract backend-neutral performance artifacts, categories, and scenario drivers
- extract backend-neutral runtime workload helpers from the Oracle-only harness
- define one hostile-condition matrix shared by Oracle, PostgreSQL, and MongoDB
- define one curated Bulstrad parity pack shared by all backends
- define one normalized performance artifact format and baseline comparison model
Deliverables
- shared IntegrationTests/Performance/Common/ package
- shared normalized performance metrics model
- shared Bulstrad workload catalog for:
- OpenForChangePolicy
- ReviewPolicyOpenForChange
- AssistantPrintInsisDocuments
- AssistantAddAnnex
- AnnexCancellation
- AssistantPolicyCancellation
- AssistantPolicyReinstate
- InsisIntegrationNew
- QuotationConfirm
- QuoteOrAplCancel
- backend-neutral hostile-condition checklist for:
- duplicate delivery
- same-instance resume race
- abandon and reclaim
- rollback on publish/schedule failure
- restart with pending due messages
- DLQ replay
- backlog drain
Exit Criteria
- Oracle, PostgreSQL, and MongoDB use the same performance artifact shape
- Oracle no longer owns the reporting model for later backend baselines
- PostgreSQL and MongoDB can plug into the same workload definitions without changing workflow semantics
Sprint 16: PostgreSQL Hardening, Bulstrad Parity, And Baseline
Goal
Bring PostgreSQL to Oracle-level confidence for correctness, hostile conditions, representative product behavior, and measured performance.
Scope
- close the PostgreSQL hostile-condition gap to the Oracle matrix
- add PostgreSQL-backed Bulstrad E2E parity
- implement PostgreSQL latency, throughput, smoke, nightly, soak, and capacity suites
- publish PostgreSQL baseline artifacts and narrative summary
Deliverables
- PostgreSQL hostile-condition integration suite
- PostgreSQL Bulstrad parity suite
- PostgreSQL performance suites for:
- latency
- throughput
- smoke
- nightly
- soak
- capacity
- baseline documents:
- 11-postgres-performance-baseline-<date>.md
- 11-postgres-performance-baseline-<date>.json
Exit Criteria
- PostgreSQL passes the same hostile-condition matrix as Oracle
- representative Bulstrad workflows run correctly on PostgreSQL
- PostgreSQL has a durable, documented performance baseline comparable to Oracle
Sprint 17: MongoDB Hardening, Bulstrad Parity, And Baseline
Goal
Bring MongoDB to the same product and operational confidence level as the relational backends without changing workflow behavior.
Scope
- close the MongoDB hostile-condition gap to the Oracle matrix
- add MongoDB-backed Bulstrad E2E parity
- implement MongoDB latency, throughput, smoke, nightly, soak, and capacity suites
- publish MongoDB baseline artifacts and narrative summary
Deliverables
- MongoDB hostile-condition integration suite
- MongoDB Bulstrad parity suite
- MongoDB performance suites for:
- latency
- throughput
- smoke
- nightly
- soak
- capacity
- baseline documents:
- 12-mongo-performance-baseline-<date>.md
- 12-mongo-performance-baseline-<date>.json
Exit Criteria
- MongoDB passes the same hostile-condition matrix as Oracle
- representative Bulstrad workflows run correctly on MongoDB
- MongoDB has a durable, documented performance baseline comparable to Oracle and PostgreSQL
Sprint 18: Final Three-Backend Characterization And Decision Pack
Goal
Produce the final side-by-side comparison for Oracle, PostgreSQL, and MongoDB using the same workloads, the same correctness rules, and the same performance artifact format.
Scope
- rerun the shared Bulstrad parity pack on all three backends
- rerun the shared hostile-condition matrix on all three backends
- rerun the shared performance tiers and compare normalized metrics
- capture backend-specific metrics appendices without letting them replace normalized workflow metrics
- publish the final recommendation pack
Deliverables
- final comparison documents:
- 13-backend-comparison-<date>.md
- 13-backend-comparison-<date>.json
- normalized comparison across:
- serial latency
- steady-state throughput
- capacity ladder
- backlog drain
- duplicate-delivery safety
- restart recovery
- backend-specific appendices for:
- Oracle wait and AQ observations
- PostgreSQL lock, WAL, and queue-table observations
- MongoDB transaction, lock, and change-stream observations
Exit Criteria
- all three backends are compared through the same workload lens
- the team has one documented backend recommendation pack
- future backend decisions can reuse the same comparison harness instead of inventing new ad hoc measurements
Current Status
- baseline comparison pack published in:
- normalized performance comparison is complete for Oracle, PostgreSQL, and MongoDB
- reliability and Bulstrad hardening depth remains Oracle-first, so the current comparison is a baseline decision pack, not the final production closeout
- the signal path is now split into durable store and wake driver seams
- PostgreSQL and MongoDB now persist transactional wake-outbox records behind that seam
- the optional Redis wake-driver plugin is implemented for PostgreSQL and MongoDB
- Oracle intentionally remains on native AQ and does not support the Redis wake-driver combination
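The split between a durable store and a wake driver resembles the general transactional-outbox pattern: the state change and a wake record are committed together, and a separate step notifies workers and marks the record dispatched. The Python sketch below illustrates that general pattern only; it is an assumption about the shape of the seam, and every name in it is hypothetical rather than taken from the PostgreSQL, MongoDB, or Redis plugins.

```python
from typing import Dict, List, Protocol


class WakeDriver(Protocol):
    def notify(self, instance_id: str) -> None: ...


class DurableStore:
    def __init__(self) -> None:
        self.states: Dict[str, dict] = {}
        self.outbox: List[dict] = []
        self._next_id = 1

    def commit(self, instance_id: str, state: dict, wake: bool) -> None:
        # Stands in for one database transaction: the state change and the
        # wake record become visible together.
        self.states[instance_id] = dict(state)
        if wake:
            self.outbox.append(
                {"id": self._next_id, "instance_id": instance_id, "dispatched": False}
            )
            self._next_id += 1

    def pending(self) -> List[dict]:
        return [r for r in self.outbox if not r["dispatched"]]

    def mark_dispatched(self, record_id: int) -> None:
        for r in self.outbox:
            if r["id"] == record_id:
                r["dispatched"] = True


def dispatch_pending(store: DurableStore, driver: WakeDriver) -> int:
    delivered = 0
    for record in store.pending():
        # Notify first, mark afterwards: a failed notify leaves the record
        # pending, so delivery is at-least-once rather than at-most-once.
        driver.notify(record["instance_id"])
        store.mark_dispatched(record["id"])
        delivered += 1
    return delivered
```

Because the record survives a failed notification, the wake driver can be a best-effort channel while the durable store remains the source of truth.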
Cross-Sprint Work Items
These should be maintained continuously, not left to the end:
- architecture doc updates
- test harness improvements
- canonical execution parity assertions
- operational telemetry quality
- snapshot schema versioning discipline
- Oracle timing-envelope observations for CI and local Docker environments
Final Milestone Definition
The project is complete when:
- the workflow service can run on the engine as the active runtime
- task and instance APIs remain stable
- Oracle AQ handles both immediate signaling and delayed scheduling
- the service resumes correctly after restart without polling
- the engine runs representative real workflows with production-grade observability