# 07. Sprint Plan

## Planning Assumptions

- sprint length: 2 weeks
- one team owning runtime, persistence, and service integration
- Oracle AQ available
- no concurrent-engine migration scope
- acceptance means code, tests, and updated docs

## Sprint 1: Foundations And Contracts

### Goal

Create the engine skeleton and the stable interfaces.

### Scope

- add runtime provider abstraction
- add signal bus abstraction
- add schedule bus abstraction
- add runtime snapshot abstraction
- add engine option classes
- add `docs/engine/` package

### Deliverables

- interface set compiled into shared abstractions
- configuration classes
- initial DI composition path
- unit tests for options and registration

### Exit Criteria

- service builds with engine abstractions present
- no Elsa runtime assumptions are introduced into new code
- docs and interface names are stable enough for later sprints

## Sprint 2: Canonical Runtime Definition Store

### Goal

Make canonical execution definitions available at runtime without Elsa.

### Scope

- compile authored workflows to canonical runtime definitions at startup
- validate definitions during startup
- cache runtime definitions
- expose startup failure mode for invalid definitions

### Deliverables

- `WorkflowRuntimeDefinitionStore`
- definition normalization pipeline
- startup validator
- tests covering:
  - valid definition load
  - invalid definition rejection
  - version resolution

### Exit Criteria

- all registered workflows load into runtime definition cache
- the runtime can resolve definition by name/version

## Sprint 3: Snapshot Store And Versioned Runtime State

### Goal

Turn `WF_RUNTIME_STATES` into a first-class engine snapshot store.
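
As a rough illustration of what a versioned snapshot store implies, the sketch below shows compare-on-commit semantics: a commit succeeds only when the caller's expected version matches the stored one. This is a minimal in-memory sketch, not the engine's implementation. The names `InMemorySnapshotStore`, `Snapshot`, and `StaleSnapshotError` are hypothetical, the dictionary merely stands in for `WF_RUNTIME_STATES`, and the integer version counter is an assumption.

```python
from dataclasses import dataclass


class StaleSnapshotError(Exception):
    """Raised when a commit carries a version that is no longer current."""


@dataclass(frozen=True)
class Snapshot:
    instance_id: str
    version: int
    state: dict


class InMemorySnapshotStore:
    """Hypothetical stand-in for a table-backed snapshot store."""

    def __init__(self):
        self._rows = {}

    def load(self, instance_id):
        # Returns None when no snapshot has been committed yet.
        return self._rows.get(instance_id)

    def commit(self, instance_id, expected_version, state):
        current = self._rows.get(instance_id)
        # Assumption: version 0 means "no row yet", so the first insert expects 0.
        current_version = current.version if current else 0
        if current_version != expected_version:
            raise StaleSnapshotError(
                f"{instance_id}: expected v{expected_version}, found v{current_version}"
            )
        snapshot = Snapshot(instance_id, current_version + 1, dict(state))
        self._rows[instance_id] = snapshot
        return snapshot


if __name__ == "__main__":
    store = InMemorySnapshotStore()
    first = store.commit("wf-1", 0, {"step": "start"})      # initial insert
    store.commit("wf-1", first.version, {"step": "task"})   # update with expected version
    try:
        store.commit("wf-1", first.version, {"step": "late"})  # stale writer
    except StaleSnapshotError as error:
        print("rejected:", error)
```

The three calls in the usage section correspond to the initial insert, update with expected version, and stale version conflict cases. In a relational store the same check is commonly expressed as an update guarded by the expected version, with zero affected rows treated as a conflict.
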
### Scope

- extend runtime state schema
- implement snapshot mapper
- implement optimistic concurrency versioning
- wire snapshot reads and writes

### Deliverables

- database migration scripts
- `OracleWorkflowRuntimeSnapshotStore`
- snapshot serialization contracts
- tests for:
  - initial insert
  - update with expected version
  - stale version conflict

### Exit Criteria

- runtime snapshots can be loaded and committed with version control
- stale updates are rejected safely

## Sprint 4: AQ Signal And Schedule Backbone

### Goal

Introduce Oracle AQ as the durable event backbone.

### Scope

- create AQ setup scripts
- implement signal bus
- implement schedule bus
- implement signal envelope serialization
- implement hosted signal consumer skeleton

### Deliverables

- AQ DDL scripts
- `OracleAqWorkflowSignalBus`
- `OracleAqWorkflowScheduleBus`
- integration tests with enqueue/dequeue
- delayed message smoke tests

### Exit Criteria

- engine can publish and receive immediate signals without polling
- engine can publish and receive delayed signals

## Sprint 5: Start Flow And Human Task Activation

### Goal

Run workflows from start until first durable wait.

### Scope

- implement execution coordinator
- implement canonical interpreter subset:
  - state assignment
  - business reference assignment
  - task activation
  - terminal completion
- integrate with `WorkflowRuntimeService`
- keep existing projection model

### Deliverables

- `SerdicaEngineRuntimeProvider.StartAsync`
- execution slice result model
- task activation write path
- tests for:
  - start to task
  - start to completion
  - business reference propagation

### Exit Criteria

- selected declarative workflows can start and create correct tasks without Elsa

## Sprint 6: Task Completion And Transport Calls

### Goal

Advance workflows after task completion and support transport-backed orchestration.
### Scope

- implement task completion execution path
- implement canonical interpreter support for:
  - transport calls
  - branches
  - success/failure paths
- integrate completion flow with runtime snapshot commit

### Deliverables

- `SerdicaEngineRuntimeProvider.CompleteAsync`
- transport dispatcher
- tests for:
  - completion to next task
  - failure branch
  - timeout branch where applicable

### Exit Criteria

- representative workflows can complete first task and reach correct next state

## Sprint 7: Subworkflows, Continue-With, And Repeat

### Goal

Support the higher-order orchestration patterns used heavily in the corpus.

### Scope

- implement subworkflow frame persistence
- implement parent resume
- implement continue-with production
- implement repeat resume semantics

### Deliverables

- subworkflow coordinator
- resume pointer serializer
- tests for:
  - child completion resumes parent
  - nested frame handling
  - repeat interrupted by wait
  - continue-with request emission

### Exit Criteria

- representative subworkflow-heavy families execute correctly

## Sprint 8: Timers, Retries, And Delayed Resume

### Goal

Finish the non-polling scheduling path.

### Scope

- implement timer waits
- implement retry scheduling
- implement stale timer ignore logic via waiting tokens
- integrate delayed AQ delivery into execution coordinator

### Deliverables

- timer wait model
- delayed resume handler
- tests for:
  - timer due resume
  - retry due resume
  - canceled timer ignored
  - restart-safe delayed processing

### Exit Criteria

- the engine supports time-based orchestration without polling loops

## Sprint 9: Operational Parity

### Goal

Reach product-surface and operations parity with the existing workflow service.
### Scope

- diagram parity validation
- runtime state inspection parity
- retention integration
- structured metrics and logging
- DLQ handling and diagnostics

### Deliverables

- runtime metadata mapping updates
- operational dashboards or documented metric set
- DLQ support
- tests for supportability paths

### Exit Criteria

- operations can inspect and support engine-driven instances through the existing product surface

## Sprint 10: Corpus Parity And Hardening

### Goal

Prove the engine against the real declarative workflow corpus.

### Scope

- execute representative high-fanout families end-to-end
- resolve remaining interpreter gaps
- multi-node duplicate delivery testing
- restart and recovery testing
- performance and soak tests

### Deliverables

- parity report against selected workflow families
- load test results
- recovery test results
- production readiness checklist

### Exit Criteria

- selected production-grade workflows run without Elsa
- restart recovery is proven
- no polling is used for steady-state signal or timer discovery

## Sprint 11: Bulstrad E2E Parity And Oracle Reliability

### Goal

Turn the engine from a validated runtime into a production-grade execution platform by proving it against real Bulstrad workflows and hostile Oracle operating conditions.
### Scope

- build a curated Bulstrad Oracle-AQ E2E suite
- replace synthetic runtime-state backing in Oracle integration tests with the real Oracle runtime-state store
- add Oracle transaction-coupling tests for state, projections, and AQ publish
- add Oracle restart, redelivery, and DLQ replay tests
- add multi-worker and duplicate-delivery race tests
- add deterministic fault-injection around commit boundaries

### Deliverables

- `BulstradOracleAqE2ETests`
- curated representative workflows with scripted downstream responders
- Oracle transport reliability suite covering:
  - immediate and delayed delivery
  - rollback and redelivery
  - dead-letter browse and replay
  - restart-safe delayed processing
- concurrency suite covering:
  - duplicate signal delivery
  - same-instance multi-worker races
  - retry-after-conflict behavior
- documented timing expectations for cold-start and steady-state Oracle AQ

### Implemented Coverage

The current Oracle-backed integration harness now includes:

- Bulstrad policy-change families:
  - `OpenForChangePolicy`
  - `ReviewPolicyOpenForChange`
  - `AssistantAddAnnex`
  - `AnnexCancellation`
  - `AssistantPolicyReinstate`
  - `AssistantPolicyCancellation`
  - `AssistantPrintInsisDocuments`
- shared policy families:
  - `InsisIntegrationNew`
  - `QuotationConfirm`
  - `QuoteOrAplCancel`
- Oracle transport and recovery matrix:
  - immediate and delayed AQ delivery
  - delayed backlog drain within a bounded latency envelope
  - dequeue rollback redelivery
  - ambient Oracle transaction commit and rollback for immediate messages
  - ambient Oracle transaction commit and rollback for delayed messages
  - dead-letter browse, replay, and backlog replay
  - dead-letter backlog survival across Oracle restart
  - timer backlog recovery across provider restart and Oracle restart
  - external-signal backlog recovery, worker abandon/recovery, and duplicate-delivery races
  - schedule/publish failure rollback inside workflow mutation transactions

### Exit Criteria

- representative Bulstrad
workflows execute correctly on `SerdicaEngine` with real Oracle AQ
- AQ-backed restart and delayed-delivery behavior is proven under realistic timing variance
- duplicate delivery and commit-boundary failures are shown to be safe
- the team has a stable PR suite and a broader nightly suite for Oracle-backed engine validation

## Sprint 12: Load, Performance, And Capacity Characterization

### Goal

Turn the correctness-focused Oracle validation suite into a real load and performance program with stable smoke gates, nightly trend runs, soak coverage, and first capacity numbers.

### Scope

- build a dedicated performance harness on top of the Oracle AQ integration foundation
- separate PR smoke, nightly characterization, weekly soak, and explicit capacity tiers
- add synthetic engine workloads for stable measurement
- add representative Bulstrad workload runners for business realism
- persist performance artifacts and summary reports
- define baseline and regression strategy per environment

### Deliverables

- categorized performance scenarios:
  - `WorkflowPerfLatency`
  - `WorkflowPerfThroughput`
  - `WorkflowPerfSmoke`
  - `WorkflowPerfNightly`
  - `WorkflowPerfSoak`
  - `WorkflowPerfCapacity`
- result artifact writer under `TestResults/workflow-performance/`
- scenario matrix covering:
  - AQ immediate bursts
  - AQ delayed bursts
  - mixed signal backlogs
  - synthetic start/task/signal/timer/subworkflow flows
  - representative Bulstrad families
  - restart and replay under load
- first baseline report for local Docker and CI Oracle
- first capacity note for one-node and multi-node assumptions

### Exit Criteria

- PR smoke load checks are cheap and stable enough to run continuously
- nightly runs capture latency, throughput, and correctness artifacts
- soak runs prove no backlog drift or correctness decay over extended execution
- representative Bulstrad workflows have measured latency envelopes, not just functional pass/fail
- the team has an initial sizing recommendation for worker
concurrency and queue backlog expectations

### Implemented Foundation

The current Sprint 12 implementation now includes:

- performance categories and artifact generation under `TestResults/workflow-performance/`
- Oracle AQ smoke scenarios for:
  - immediate burst drain
  - delayed burst drain
  - synthetic external-signal backlog resume
  - short Bulstrad business burst using `QuoteOrAplCancel`
- persisted comparison against the previous artifact for the same scenario and tier
- Oracle AQ nightly scenarios for:
  - larger immediate burst drain
  - larger delayed burst drain
  - larger synthetic external-signal backlog resume
  - Bulstrad `QuotationConfirm -> PdfGenerator` burst
- Oracle AQ soak scenario for:
  - sustained synthetic signal round-trip waves without correctness drift
- Oracle AQ latency baseline for:
  - one-at-a-time synthetic signal round-trip with phase-level latency summaries
- Oracle AQ throughput baseline for:
  - parallel synthetic signal round-trip with `16` workload concurrency and `8` signal workers
- Oracle AQ capacity ladder for:
  - synthetic signal round-trip at concurrency `1`, `4`, `8`, and `16`
- thread-safe scripted transport recording for concurrent smoke scenarios
- first full Oracle baseline run with documented metrics in:
  - [10-oracle-performance-baseline-2026-03-17.md](10-oracle-performance-baseline-2026-03-17.md)
  - [10-oracle-performance-baseline-2026-03-17.json](10-oracle-performance-baseline-2026-03-17.json)

### Reference

The detailed workload model, KPI set, harness design, and baseline strategy are defined in [08-load-and-performance-plan.md](08-load-and-performance-plan.md).

## Sprint 13: Engine-Native Rendering And Authoring Projection

### Goal

Restore definition rendering and authoring projection without reintroducing Elsa types or runtime dependencies into the workflow declarations or the engine host.
### Scope

- design and implement a native definition-to-diagram projection for declarative and canonical workflows
- support deterministic node and edge generation from runtime definitions
- preserve task, branch, repeat, fork, timer, signal, and subworkflow visibility in the rendered output
- define a stable rendering contract for the operational API and future authoring tools
- keep rendering as a separate projection layer, not as part of runtime execution

### Deliverables

- native rendering model and renderer for `WorkflowRuntimeDefinition`
- canonical-to-diagram projection rules for:
  - linear sequences
  - decisions and conditional branches
  - repeats
  - forks and joins
  - timers and external-signal waits
  - continuations and subworkflows
- updated operational metadata and diagram endpoints backed only by engine assets
- test suite covering rendering determinism and parity for representative Bulstrad workflows

### Exit Criteria

- workflow definitions render without any Elsa packages, builders, or activity models
- rendered diagrams remain stable for the same declarative definition across rebuilds
- operational diagram inspection uses the native renderer only
- the rendering layer is ready to support a later authoring surface without changing workflow declarations

## Sprint 14: Backend Portability And Store Profiles

### Goal

Turn the Oracle-first engine into a backend-switchable engine with one selected backend profile per deployment.
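
To make "one selected backend profile per deployment" concrete, the following sketch shows a registry in which each backend plugin registers a profile and the host resolves exactly one by configuration. It is an illustration under stated assumptions, not the engine's design: `BackendProfile`, `BackendRegistry`, and the `workflow.backend` configuration key are all hypothetical names, and the factories return placeholder strings instead of real stores.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class BackendProfile:
    """Hypothetical profile: a name plus a factory for backend-specific parts."""
    name: str
    create_snapshot_store: Callable[[], object]


class BackendRegistry:
    """Holds registered profiles; the host asks for one by configured name."""

    def __init__(self):
        self._profiles: Dict[str, BackendProfile] = {}

    def register(self, profile: BackendProfile) -> None:
        key = profile.name.lower()
        if key in self._profiles:
            raise ValueError(f"backend '{profile.name}' is already registered")
        self._profiles[key] = profile

    def select(self, config: Dict[str, str]) -> BackendProfile:
        # Assumption: the deployment names its backend under a single key.
        requested = config.get("workflow.backend")
        if not requested:
            raise ValueError("no backend configured")
        profile = self._profiles.get(requested.lower())
        if profile is None:
            raise ValueError(f"unknown backend '{requested}'")
        return profile


if __name__ == "__main__":
    registry = BackendRegistry()
    # Placeholder factories; real plugins would build their own stores.
    registry.register(BackendProfile("Oracle", lambda: "oracle-store"))
    registry.register(BackendProfile("PostgreSQL", lambda: "postgres-store"))

    selected = registry.select({"workflow.backend": "postgresql"})
    print(selected.name, selected.create_snapshot_store())
```

The host code only touches `BackendRegistry` and `BackendProfile`, which is one way to keep it from referencing a concrete backend directly.
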
### Scope

- introduce backend profile abstraction and dedicated backend plugin registration
- split projection persistence from the current Oracle-first application service
- formalize mutation coordinator abstraction
- add backend-neutral dead-letter contract
- add backend conformance suite
- implement PostgreSQL profile
- design MongoDB profile in executable detail, with implementation only after explicit product approval

### Deliverables

- `IWorkflowBackendRegistrationMarker`
- backend-neutral projection contract
- backend-neutral mutation coordinator contract
- backend conformance suite
- dedicated Oracle, PostgreSQL, and MongoDB backend plugin projects
- executable MongoDB backend plugin design package

### Exit Criteria

- host selects one backend profile by configuration
- host stays backend-neutral and does not resolve Oracle/PostgreSQL directly
- Oracle and PostgreSQL pass the same conformance suite
- MongoDB path is specified well enough that implementation is a bounded engineering task
- workflow declarations and canonical definitions remain unchanged across backend profiles

## Sprint 15: Backend-Neutral Parity And Performance Harness

### Goal

Remove the remaining Oracle-only assumptions from the validation stack so PostgreSQL and MongoDB can be measured with the same correctness, Bulstrad, and performance scenarios.
### Scope

- extract backend-neutral performance artifacts, categories, and scenario drivers
- extract backend-neutral runtime workload helpers from the Oracle-only harness
- define one hostile-condition matrix shared by Oracle, PostgreSQL, and MongoDB
- define one curated Bulstrad parity pack shared by all backends
- define one normalized performance artifact format and baseline comparison model

### Deliverables

- shared `IntegrationTests/Performance/Common/` package
- shared normalized performance metrics model
- shared Bulstrad workload catalog for:
  - `OpenForChangePolicy`
  - `ReviewPolicyOpenForChange`
  - `AssistantPrintInsisDocuments`
  - `AssistantAddAnnex`
  - `AnnexCancellation`
  - `AssistantPolicyCancellation`
  - `AssistantPolicyReinstate`
  - `InsisIntegrationNew`
  - `QuotationConfirm`
  - `QuoteOrAplCancel`
- backend-neutral hostile-condition checklist for:
  - duplicate delivery
  - same-instance resume race
  - abandon and reclaim
  - rollback on publish/schedule failure
  - restart with pending due messages
  - DLQ replay
  - backlog drain

### Exit Criteria

- Oracle, PostgreSQL, and MongoDB use the same performance artifact shape
- Oracle no longer owns the reporting model for later backend baselines
- PostgreSQL and MongoDB can plug into the same workload definitions without changing workflow semantics

## Sprint 16: PostgreSQL Hardening, Bulstrad Parity, And Baseline

### Goal

Bring PostgreSQL to Oracle-level confidence for correctness, hostile conditions, representative product behavior, and measured performance.
### Scope

- close the PostgreSQL hostile-condition gap to the Oracle matrix
- add PostgreSQL-backed Bulstrad E2E parity
- implement PostgreSQL latency, throughput, smoke, nightly, soak, and capacity suites
- publish PostgreSQL baseline artifacts and narrative summary

### Deliverables

- PostgreSQL hostile-condition integration suite
- PostgreSQL Bulstrad parity suite
- PostgreSQL performance suites for:
  - latency
  - throughput
  - smoke
  - nightly
  - soak
  - capacity
- baseline documents:
  - `11-postgres-performance-baseline-.md`
  - `11-postgres-performance-baseline-.json`

### Exit Criteria

- PostgreSQL passes the same hostile-condition matrix as Oracle
- representative Bulstrad workflows run correctly on PostgreSQL
- PostgreSQL has a durable, documented performance baseline comparable to Oracle

## Sprint 17: MongoDB Hardening, Bulstrad Parity, And Baseline

### Goal

Bring MongoDB to the same product and operational confidence level as the relational backends without changing workflow behavior.

### Scope

- close the MongoDB hostile-condition gap to the Oracle matrix
- add MongoDB-backed Bulstrad E2E parity
- implement MongoDB latency, throughput, smoke, nightly, soak, and capacity suites
- publish MongoDB baseline artifacts and narrative summary

### Deliverables

- MongoDB hostile-condition integration suite
- MongoDB Bulstrad parity suite
- MongoDB performance suites for:
  - latency
  - throughput
  - smoke
  - nightly
  - soak
  - capacity
- baseline documents:
  - `12-mongo-performance-baseline-.md`
  - `12-mongo-performance-baseline-.json`

### Exit Criteria

- MongoDB passes the same hostile-condition matrix as Oracle
- representative Bulstrad workflows run correctly on MongoDB
- MongoDB has a durable, documented performance baseline comparable to Oracle and PostgreSQL

## Sprint 18: Final Three-Backend Characterization And Decision Pack

### Goal

Produce the final side-by-side comparison for Oracle, PostgreSQL, and MongoDB using the same workloads, the same correctness rules, and the same
performance artifact format.

### Scope

- rerun the shared Bulstrad parity pack on all three backends
- rerun the shared hostile-condition matrix on all three backends
- rerun the shared performance tiers and compare normalized metrics
- capture backend-specific metrics appendices without letting them replace normalized workflow metrics
- publish the final recommendation pack

### Deliverables

- final comparison documents:
  - `13-backend-comparison-.md`
  - `13-backend-comparison-.json`
- normalized comparison across:
  - serial latency
  - steady-state throughput
  - capacity ladder
  - backlog drain
  - duplicate-delivery safety
  - restart recovery
- backend-specific appendices for:
  - Oracle wait and AQ observations
  - PostgreSQL lock, WAL, and queue-table observations
  - MongoDB transaction, lock, and change-stream observations

### Exit Criteria

- all three backends are compared through the same workload lens
- the team has one documented backend recommendation pack
- future backend decisions can reuse the same comparison harness instead of inventing new ad hoc measurements

### Current Status

- baseline comparison pack published in:
  - [13-backend-comparison-2026-03-17.md](13-backend-comparison-2026-03-17.md)
  - [13-backend-comparison-2026-03-17.json](13-backend-comparison-2026-03-17.json)
- normalized performance comparison is complete for Oracle, PostgreSQL, and MongoDB
- reliability and Bulstrad hardening depth remains Oracle-first, so the current comparison is a baseline decision pack, not the final production closeout
- the signal path is now split into durable store and wake driver seams
- PostgreSQL and MongoDB now persist transactional wake-outbox records behind that seam
- the optional Redis wake-driver plugin is implemented for PostgreSQL and MongoDB
- Oracle intentionally remains on native AQ and does not support the Redis wake-driver combination

## Cross-Sprint Work Items

These should be maintained continuously, not left to the end:

- architecture doc updates
- test
harness improvements
- canonical execution parity assertions
- operational telemetry quality
- snapshot schema versioning discipline
- Oracle timing-envelope observations for CI and local Docker environments

## Final Milestone Definition

The project is complete when:

- the workflow service can run on the engine as the active runtime
- task and instance APIs remain stable
- Oracle AQ handles both immediate signaling and delayed scheduling
- the service resumes correctly after restart without polling
- the engine runs representative real workflows with production-grade observability