# 07. Sprint Plan

## Planning Assumptions

- sprint length: 2 weeks
- one team owning runtime, persistence, and service integration
- Oracle AQ available
- no concurrent-engine migration scope
- acceptance means code, tests, and updated docs

## Sprint 1: Foundations And Contracts

### Goal

Create the engine skeleton and the stable interfaces.

### Scope

- add runtime provider abstraction
- add signal bus abstraction
- add schedule bus abstraction
- add runtime snapshot abstraction
- add engine option classes
- add `docs/engine/` package

### Deliverables

- interface set compiled into shared abstractions
- configuration classes
- initial DI composition path
- unit tests for options and registration

### Exit Criteria

- service builds with engine abstractions present
- no Elsa runtime assumptions are introduced into new code
- docs and interface names are stable enough for later sprints
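The Sprint 1 contracts can be sketched as small interfaces plus self-validating options. This is a Python illustration only, since the engine itself is .NET. The following are all hypothetical:

- `WorkflowSignal`
- the `SignalBus` and `ScheduleBus` method names
- the `EngineOptions` fields

The in-memory bus is simply the kind of test double that options and registration tests could use.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Protocol


@dataclass(frozen=True)
class WorkflowSignal:
    """Hypothetical signal shape: which instance to wake, and why."""
    instance_id: str
    signal_type: str
    payload: dict = field(default_factory=dict)


class SignalBus(Protocol):
    """Immediate delivery (hypothetical contract)."""
    def publish(self, signal: WorkflowSignal) -> None: ...
    def receive(self) -> WorkflowSignal | None: ...


class ScheduleBus(Protocol):
    """Delayed delivery at or after `due_at` (hypothetical contract)."""
    def schedule(self, signal: WorkflowSignal, due_at: datetime) -> None: ...


@dataclass(frozen=True)
class EngineOptions:
    """Hypothetical option class that rejects bad values at composition time."""
    signal_workers: int = 4
    max_slice_steps: int = 1000

    def __post_init__(self) -> None:
        if self.signal_workers < 1:
            raise ValueError("signal_workers must be >= 1")
        if self.max_slice_steps < 1:
            raise ValueError("max_slice_steps must be >= 1")


class InMemorySignalBus:
    """Non-durable FIFO test double that structurally satisfies SignalBus."""
    def __init__(self) -> None:
        self._queue: list[WorkflowSignal] = []

    def publish(self, signal: WorkflowSignal) -> None:
        self._queue.append(signal)

    def receive(self) -> WorkflowSignal | None:
        return self._queue.pop(0) if self._queue else None
```

Keeping the contracts this narrow is meant to let later sprints plug in durable implementations, such as the Oracle AQ buses, without changing runtime code.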

## Sprint 2: Canonical Runtime Definition Store

### Goal

Make canonical execution definitions available at runtime without Elsa.

### Scope

- compile authored workflows to canonical runtime definitions at startup
- validate definitions during startup
- cache runtime definitions
- expose a startup failure mode for invalid definitions

### Deliverables

- `WorkflowRuntimeDefinitionStore`
- definition normalization pipeline
- startup validator
- tests covering:
  - valid definition load
  - invalid definition rejection
  - version resolution

### Exit Criteria

- all registered workflows load into the runtime definition cache
- the runtime can resolve a definition by name and version
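Sprint 2's load, validate, cache, and resolve steps might look roughly like the sketch below. It makes several assumptions:

- a deliberately tiny definition shape (a name, an integer version, and ordered step ids)
- hypothetical validation rules
- a resolution rule where a lookup without a version picks the highest registered version

The real `WorkflowRuntimeDefinitionStore` works on compiled canonical definitions. All problems are collected and raised together, so startup fails once with a complete report.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeDefinition:
    """Toy canonical definition: name, integer version, ordered step ids."""
    name: str
    version: int
    steps: tuple[str, ...]


class DefinitionValidationError(Exception):
    """Raised once at startup with every problem found."""


def validate(d: RuntimeDefinition) -> list[str]:
    # Hypothetical rules; a real validator would check the canonical graph.
    errors = []
    if not d.name:
        errors.append("name is required")
    if d.version < 1:
        errors.append("version must be >= 1")
    if not d.steps:
        errors.append("at least one step is required")
    if len(set(d.steps)) != len(d.steps):
        errors.append("step ids must be unique")
    return errors


class RuntimeDefinitionStore:
    def __init__(self, definitions: list[RuntimeDefinition]) -> None:
        self._cache: dict[tuple[str, int], RuntimeDefinition] = {}
        problems = []
        for d in definitions:
            errors = validate(d)
            if (d.name, d.version) in self._cache:
                errors.append("duplicate name/version")
            if errors:
                problems.append(f"{d.name} v{d.version}: {', '.join(errors)}")
            else:
                self._cache[(d.name, d.version)] = d
        if problems:
            # Fail startup instead of discovering a bad definition mid-execution.
            raise DefinitionValidationError("; ".join(problems))

    def resolve(self, name: str, version: int | None = None) -> RuntimeDefinition:
        """Exact lookup when a version is given, otherwise the highest version."""
        if version is not None:
            return self._cache[(name, version)]
        versions = [v for (n, v) in self._cache if n == name]
        if not versions:
            raise KeyError(name)
        return self._cache[(name, max(versions))]
```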

## Sprint 3: Snapshot Store And Versioned Runtime State

### Goal

Turn `WF_RUNTIME_STATES` into a first-class engine snapshot store.

### Scope

- extend runtime state schema
- implement snapshot mapper
- implement optimistic concurrency versioning
- wire snapshot reads and writes

### Deliverables

- database migration scripts
- `OracleWorkflowRuntimeSnapshotStore`
- snapshot serialization contracts
- tests for:
  - initial insert
  - update with expected version
  - stale version conflict

### Exit Criteria

- runtime snapshots can be loaded and committed with optimistic version checks
- stale updates are rejected safely
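The optimistic concurrency check in Sprint 3 reduces to a compare-and-swap on a version column. The sketch below uses Python's built-in `sqlite3` as a stand-in for Oracle. The column names (`instance_id`, `version`, `snapshot`) are assumptions, not the real `WF_RUNTIME_STATES` schema. It walks through the three listed test cases: initial insert, update with expected version, and stale version conflict.

```python
from __future__ import annotations

import json
import sqlite3


class ConcurrencyConflict(Exception):
    """Raised when the stored version no longer matches the caller's expectation."""


class SnapshotStore:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS WF_RUNTIME_STATES ("
            " instance_id TEXT PRIMARY KEY,"
            " version INTEGER NOT NULL,"
            " snapshot TEXT NOT NULL)"
        )

    def insert(self, instance_id: str, snapshot: dict) -> int:
        # The first write starts at version 1; a duplicate insert violates the primary key.
        with self._conn:
            self._conn.execute(
                "INSERT INTO WF_RUNTIME_STATES VALUES (?, 1, ?)",
                (instance_id, json.dumps(snapshot)),
            )
        return 1

    def load(self, instance_id: str) -> tuple[int, dict]:
        row = self._conn.execute(
            "SELECT version, snapshot FROM WF_RUNTIME_STATES WHERE instance_id = ?",
            (instance_id,),
        ).fetchone()
        if row is None:
            raise KeyError(instance_id)
        return row[0], json.loads(row[1])

    def commit(self, instance_id: str, expected_version: int, snapshot: dict) -> int:
        # Compare-and-swap: the WHERE clause matches only if nobody committed since our load.
        with self._conn:
            cur = self._conn.execute(
                "UPDATE WF_RUNTIME_STATES SET snapshot = ?, version = version + 1"
                " WHERE instance_id = ? AND version = ?",
                (json.dumps(snapshot), instance_id, expected_version),
            )
        if cur.rowcount != 1:
            raise ConcurrencyConflict(f"{instance_id}: expected v{expected_version}")
        return expected_version + 1
```

The same pattern carries over to any SQL backend. It is an `UPDATE ... WHERE version = <expected>`, and an affected-row count of zero is treated as a stale write.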

## Sprint 4: AQ Signal And Schedule Backbone

### Goal

Introduce Oracle AQ as the durable event backbone.

### Scope

- create AQ setup scripts
- implement signal bus
- implement schedule bus
- implement signal envelope serialization
- implement hosted signal consumer skeleton

### Deliverables

- AQ DDL scripts
- `OracleAqWorkflowSignalBus`
- `OracleAqWorkflowScheduleBus`
- integration tests with enqueue/dequeue
- delayed message smoke tests

### Exit Criteria

- engine can publish and receive immediate signals without polling
- engine can publish and receive delayed signals
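A possible shape for the Sprint 4 signal envelope is sketched below in Python. The field names, the UUID `signal_id`, and the explicit `schema_version` are assumptions rather than the actual AQ payload. The version check shows one way for a consumer to reject envelopes it does not understand instead of misreading them. A stable `signal_id` also gives later duplicate-delivery work something to key on.

```python
from __future__ import annotations

import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class SignalEnvelope:
    """Hypothetical AQ message body; field names are illustrative only."""
    instance_id: str
    signal_type: str
    payload: dict
    signal_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_utc: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    schema_version: int = 1

    def to_json(self) -> str:
        # sort_keys gives a stable text form for logs and assertions.
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(raw: str) -> SignalEnvelope:
        data = json.loads(raw)
        if data.get("schema_version") != 1:
            raise ValueError(f"unsupported envelope schema {data.get('schema_version')}")
        return SignalEnvelope(**data)
```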

## Sprint 5: Start Flow And Human Task Activation

### Goal

Run workflows from start until the first durable wait.

### Scope

- implement execution coordinator
- implement canonical interpreter subset:
  - state assignment
  - business reference assignment
  - task activation
  - terminal completion
- integrate with `WorkflowRuntimeService`
- keep existing projection model

### Deliverables

- `SerdicaEngineRuntimeProvider.StartAsync`
- execution slice result model
- task activation write path
- tests for:
  - start to task
  - start to completion
  - business reference propagation

### Exit Criteria

- selected declarative workflows can start and create correct tasks without Elsa
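The Sprint 5 interpreter subset can be pictured as a loop that runs steps until it reaches a durable wait. Two parts of the sketch are hypothetical stand-ins:

- the step encoding (`assign`, `business_reference`, `activate_task`, `complete`) stands in for the canonical definition
- the `SliceResult` fields stand in for the execution slice result model

The part meant to carry over is that a slice returns a resume pointer to persist alongside the snapshot.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SliceResult:
    status: str                        # "waiting" or "completed"
    pointer: int                       # index of the next step to run on resume
    state: dict
    business_reference: str | None = None
    activated_tasks: list[str] = field(default_factory=list)


def run_slice(steps: list[dict], state: dict, pointer: int = 0,
              business_reference: str | None = None) -> SliceResult:
    """Run steps from `pointer` until the first durable wait or terminal completion."""
    state = dict(state)  # never mutate the caller's snapshot
    while pointer < len(steps):
        step = steps[pointer]
        pointer += 1
        op = step["op"]
        if op == "assign":
            state[step["key"]] = step["value"]
        elif op == "business_reference":
            business_reference = state[step["from_key"]]
        elif op == "activate_task":
            # A human task is a durable wait: stop here and let the caller persist the pointer.
            return SliceResult("waiting", pointer, state, business_reference, [step["task"]])
        elif op == "complete":
            return SliceResult("completed", pointer, state, business_reference)
        else:
            raise ValueError(f"unsupported step op: {op}")
    # Running off the end is treated as completion in this sketch.
    return SliceResult("completed", pointer, state, business_reference)
```

Resuming after a task completes would then be a second call that starts from the stored pointer and state.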

## Sprint 6: Task Completion And Transport Calls

### Goal

Advance workflows after task completion and support transport-backed orchestration.

### Scope

- implement task completion execution path
- implement canonical interpreter support for:
  - transport calls
  - branches
  - success/failure paths
- integrate completion flow with runtime snapshot commit

### Deliverables

- `SerdicaEngineRuntimeProvider.CompleteAsync`
- transport dispatcher
- tests for:
  - completion to next task
  - failure branch
  - timeout branch where applicable

### Exit Criteria

- representative workflows can complete their first task and reach the correct next state

## Sprint 7: Subworkflows, Continue-With, And Repeat

### Goal

Support the higher-order orchestration patterns used heavily in the corpus.

### Scope

- implement subworkflow frame persistence
- implement parent resume
- implement continue-with production
- implement repeat resume semantics

### Deliverables

- subworkflow coordinator
- resume pointer serializer
- tests for:
  - child completion resumes parent
  - nested frame handling
  - repeat interrupted by wait
  - continue-with request emission

### Exit Criteria

- representative subworkflow-heavy families execute correctly
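For Sprint 7, subworkflow frames can be modeled as a persisted stack:

- starting a child pushes the parent's resume pointer and state
- completing a child pops the innermost frame and writes the child's output into the parent

The `Frame` fields and the JSON encoding below are assumptions made for illustration. They are not the format used by the engine's resume pointer serializer.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Frame:
    definition: str      # parent definition name
    resume_pointer: int  # step index to continue at in the parent
    result_key: str      # parent state key that receives the child's output
    state: dict          # parent state captured when the child started


def push_child(frames: list[Frame], parent: str, resume_at: int,
               result_key: str, parent_state: dict) -> list[Frame]:
    # Persist the parent's resume point before the child starts running.
    return frames + [Frame(parent, resume_at, result_key, dict(parent_state))]


def complete_child(frames: list[Frame], child_output: dict) -> tuple[list[Frame], Frame]:
    """Pop the innermost frame and return (remaining frames, frame to resume with merged state)."""
    if not frames:
        raise RuntimeError("no parent frame to resume")
    top = frames[-1]
    resumed = Frame(top.definition, top.resume_pointer, top.result_key,
                    {**top.state, top.result_key: child_output})
    return frames[:-1], resumed


def serialize_frames(frames: list[Frame]) -> str:
    return json.dumps([asdict(f) for f in frames])


def deserialize_frames(raw: str) -> list[Frame]:
    return [Frame(**f) for f in json.loads(raw)]
```

Because the stack is plain data, nested frames survive a restart as long as the serialized stack is committed with the snapshot.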

## Sprint 8: Timers, Retries, And Delayed Resume

### Goal

Finish the non-polling scheduling path.

### Scope

- implement timer waits
- implement retry scheduling
- implement logic that ignores stale timers via waiting tokens
- integrate delayed AQ delivery into execution coordinator

### Deliverables

- timer wait model
- delayed resume handler
- tests for:
  - timer due resume
  - retry due resume
  - canceled timer ignored
  - restart-safe delayed processing

### Exit Criteria

- the engine supports time-based orchestration without polling loops
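The waiting-token idea from Sprint 8 can be sketched as follows. The sketch assumes that each wait stores a fresh random token in the snapshot, and that every scheduled message carries the token it was created for.

- A message whose token no longer matches belongs to a canceled or superseded wait, so it is dropped.
- Consuming the token on resume also makes a redelivered timer harmless.

In the engine, the token check and the snapshot update would have to commit together with the versioned snapshot write. This in-memory sketch does not model that.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class InstanceSnapshot:
    """Hypothetical snapshot fields relevant to timer handling."""
    waiting_token: str | None = None
    history: list[str] = field(default_factory=list)


def enter_wait(snapshot: InstanceSnapshot) -> str:
    # Every new wait gets a fresh token; any earlier timer now points at a dead wait.
    snapshot.waiting_token = uuid.uuid4().hex
    return snapshot.waiting_token


def cancel_wait(snapshot: InstanceSnapshot) -> None:
    snapshot.waiting_token = None


def on_timer_due(snapshot: InstanceSnapshot, message_token: str) -> bool:
    """Return True if the timer resumed the instance, False if it was ignored as stale."""
    if snapshot.waiting_token != message_token:
        return False  # canceled or superseded wait: acknowledge and drop the message
    snapshot.waiting_token = None  # consume the wait so a redelivery is also ignored
    snapshot.history.append("timer-resumed")
    return True
```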

## Sprint 9: Operational Parity

### Goal

Reach product-surface and operations parity with the existing workflow service.

### Scope

- diagram parity validation
- runtime state inspection parity
- retention integration
- structured metrics and logging
- DLQ handling and diagnostics

### Deliverables

- runtime metadata mapping updates
- operational dashboards or a documented metric set
- DLQ support
- tests for supportability paths

### Exit Criteria

- operations can inspect and support engine-driven instances through the existing product surface

## Sprint 10: Corpus Parity And Hardening

### Goal

Prove the engine against the real declarative workflow corpus.

### Scope

- execute representative high-fanout families end-to-end
- resolve remaining interpreter gaps
- multi-node duplicate delivery testing
- restart and recovery testing
- performance and soak tests

### Deliverables

- parity report against selected workflow families
- load test results
- recovery test results
- production readiness checklist

### Exit Criteria

- selected production-grade workflows run without Elsa
- restart recovery is proven
- no polling is used for steady-state signal or timer discovery

## Sprint 11: Bulstrad E2E Parity And Oracle Reliability

### Goal

Turn the engine from a validated runtime into a production-grade execution platform by proving it against real Bulstrad workflows and hostile Oracle operating conditions.

### Scope

- build a curated Bulstrad Oracle-AQ E2E suite
- replace synthetic runtime-state backing in Oracle integration tests with the real Oracle runtime-state store
- add Oracle transaction-coupling tests for state, projections, and AQ publish
- add Oracle restart, redelivery, and DLQ replay tests
- add multi-worker and duplicate-delivery race tests
- add deterministic fault-injection around commit boundaries

### Deliverables

- `BulstradOracleAqE2ETests`
- curated representative workflows with scripted downstream responders
- Oracle transport reliability suite covering:
  - immediate and delayed delivery
  - rollback and redelivery
  - dead-letter browse and replay
  - restart-safe delayed processing
- concurrency suite covering:
  - duplicate signal delivery
  - same-instance multi-worker races
  - retry-after-conflict behavior
- documented timing expectations for cold-start and steady-state Oracle AQ

### Implemented Coverage

The current Oracle-backed integration harness now includes:

- Bulstrad policy-change families:
  - `OpenForChangePolicy`
  - `ReviewPolicyOpenForChange`
  - `AssistantAddAnnex`
  - `AnnexCancellation`
  - `AssistantPolicyReinstate`
  - `AssistantPolicyCancellation`
  - `AssistantPrintInsisDocuments`
- shared policy families:
  - `InsisIntegrationNew`
  - `QuotationConfirm`
  - `QuoteOrAplCancel`
- Oracle transport and recovery matrix:
  - immediate and delayed AQ delivery
  - delayed backlog drain within a bounded latency envelope
  - dequeue rollback redelivery
  - ambient Oracle transaction commit and rollback for immediate messages
  - ambient Oracle transaction commit and rollback for delayed messages
  - dead-letter browse, replay, and backlog replay
  - dead-letter backlog survival across Oracle restart
  - timer backlog recovery across provider restart and Oracle restart
  - external-signal backlog recovery, worker abandon/recovery, and duplicate-delivery races
  - schedule/publish failure rollback inside workflow mutation transactions

### Exit Criteria

- representative Bulstrad workflows execute correctly on `SerdicaEngine` with real Oracle AQ
- AQ-backed restart and delayed-delivery behavior is proven under realistic timing variance
- duplicate delivery and commit-boundary failures are shown to be safe
- the team has a stable PR suite and a broader nightly suite for Oracle-backed engine validation
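The retry-after-conflict and duplicate-delivery cases in the Sprint 11 concurrency suite combine into one handler loop:

1. Load the snapshot.
2. Skip the signal if it was already applied.
3. Mutate the state.
4. Commit with the expected version.
5. On conflict, reload and re-evaluate.

The sketch below is a Python illustration with an in-memory store and a `before_commit` test hook for injecting a racing worker. Tracking processed signal ids inside the snapshot is an assumption; the engine may deduplicate differently, for example through waiting tokens.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


class Conflict(Exception):
    pass


@dataclass
class VersionedStore:
    """In-memory stand-in for a snapshot store with optimistic concurrency."""
    version: int = 1
    state: dict = field(default_factory=lambda: {"processed": [], "count": 0})

    def load(self) -> tuple[int, dict]:
        # Return a copy so callers cannot mutate the committed state in place.
        return self.version, {"processed": list(self.state["processed"]),
                              "count": self.state["count"]}

    def commit(self, expected: int, new_state: dict) -> None:
        if expected != self.version:
            raise Conflict()
        self.version += 1
        self.state = new_state


def handle_signal(store: VersionedStore, signal_id: str, max_attempts: int = 3,
                  before_commit: Callable[[int], None] | None = None) -> str:
    for attempt in range(max_attempts):
        version, state = store.load()
        if signal_id in state["processed"]:
            return "duplicate-ignored"
        state["processed"].append(signal_id)
        state["count"] += 1
        if before_commit is not None:
            before_commit(attempt)  # test hook: simulate a racing worker
        try:
            store.commit(version, state)
            return "applied"
        except Conflict:
            continue  # another worker committed first: reload and re-evaluate
    raise RuntimeError("gave up after repeated conflicts")
```

Re-evaluating after a conflict, rather than blindly re-applying the stale mutation, is what keeps a lost race from double-counting a signal.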

## Sprint 12: Load, Performance, And Capacity Characterization

### Goal

Turn the correctness-focused Oracle validation suite into a real load and performance program with stable smoke gates, nightly trend runs, soak coverage, and first capacity numbers.

### Scope

- build a dedicated performance harness on top of the Oracle AQ integration foundation
- separate PR smoke, nightly characterization, weekly soak, and explicit capacity tiers
- add synthetic engine workloads for stable measurement
- add representative Bulstrad workload runners for business realism
- persist performance artifacts and summary reports
- define baseline and regression strategy per environment

### Deliverables

- categorized performance scenarios:
  - `WorkflowPerfLatency`
  - `WorkflowPerfThroughput`
  - `WorkflowPerfSmoke`
  - `WorkflowPerfNightly`
  - `WorkflowPerfSoak`
  - `WorkflowPerfCapacity`
- result artifact writer under `TestResults/workflow-performance/`
- scenario matrix covering:
  - AQ immediate bursts
  - AQ delayed bursts
  - mixed signal backlogs
  - synthetic start/task/signal/timer/subworkflow flows
  - representative Bulstrad families
  - restart and replay under load
- first baseline report for local Docker and CI Oracle
- first capacity note for one-node and multi-node assumptions

### Exit Criteria

- PR smoke load checks are cheap and stable enough to run continuously
- nightly runs capture latency, throughput, and correctness artifacts
- soak runs prove no backlog drift or correctness decay over extended execution
- representative Bulstrad workflows have measured latency envelopes, not just functional pass/fail
- the team has an initial sizing recommendation for worker concurrency and queue backlog expectations

### Implemented Foundation

The current Sprint 12 implementation now includes:

- performance categories and artifact generation under `TestResults/workflow-performance/`
- Oracle AQ smoke scenarios for:
  - immediate burst drain
  - delayed burst drain
  - synthetic external-signal backlog resume
  - short Bulstrad business burst using `QuoteOrAplCancel`
- persisted comparison against the previous artifact for the same scenario and tier
- Oracle AQ nightly scenarios for:
  - larger immediate burst drain
  - larger delayed burst drain
  - larger synthetic external-signal backlog resume
  - Bulstrad `QuotationConfirm -> PdfGenerator` burst
- Oracle AQ soak scenario for:
  - sustained synthetic signal round-trip waves without correctness drift
- Oracle AQ latency baseline for:
  - one-at-a-time synthetic signal round-trip with phase-level latency summaries
- Oracle AQ throughput baseline for:
  - parallel synthetic signal round-trip with `16` workload concurrency and `8` signal workers
- Oracle AQ capacity ladder for:
  - synthetic signal round-trip at concurrency `1`, `4`, `8`, and `16`
- thread-safe scripted transport recording for concurrent smoke scenarios
- first full Oracle baseline run with documented metrics in:
  - [10-oracle-performance-baseline-2026-03-17.md](10-oracle-performance-baseline-2026-03-17.md)
  - [10-oracle-performance-baseline-2026-03-17.json](10-oracle-performance-baseline-2026-03-17.json)

### Reference

The detailed workload model, KPI set, harness design, and baseline strategy are defined in [08-load-and-performance-plan.md](08-load-and-performance-plan.md).
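The Sprint 12 foundation persists a comparison against the previous artifact for the same scenario and tier. A sketch of that lookup and a simple regression rule follows. It assumes several things that are not the harness's actual format or gates:

- the `PerfArtifact` fields
- the 25% p95-latency and 20% throughput thresholds
- ordering runs by ISO-8601 strings

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PerfArtifact:
    """Hypothetical artifact shape, loosely modeled on files under TestResults/workflow-performance/."""
    scenario: str
    tier: str             # e.g. "smoke", "nightly", "soak", "capacity"
    run_utc: str          # ISO-8601 UTC timestamp in one fixed format, so text order = time order
    p95_ms: float
    throughput_per_s: float


def previous_for(history: list[PerfArtifact], current: PerfArtifact) -> PerfArtifact | None:
    """Most recent earlier artifact with the same scenario and tier."""
    earlier = [a for a in history
               if a.scenario == current.scenario and a.tier == current.tier
               and a.run_utc < current.run_utc]
    return max(earlier, key=lambda a: a.run_utc, default=None)


def compare(previous: PerfArtifact | None, current: PerfArtifact,
            max_latency_growth: float = 0.25, max_throughput_drop: float = 0.20) -> list[str]:
    """Return regression findings; an empty list means no regression was detected."""
    if previous is None:
        return []  # the first run for a scenario/tier becomes the baseline
    findings = []
    if current.p95_ms > previous.p95_ms * (1 + max_latency_growth):
        findings.append(f"p95 {previous.p95_ms:.0f}ms -> {current.p95_ms:.0f}ms")
    if current.throughput_per_s < previous.throughput_per_s * (1 - max_throughput_drop):
        findings.append(
            f"throughput {previous.throughput_per_s:.1f}/s -> {current.throughput_per_s:.1f}/s")
    return findings
```

Keying the lookup on both scenario and tier keeps a cheap smoke run from being judged against a nightly run of the same scenario.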

## Sprint 13: Engine-Native Rendering And Authoring Projection

### Goal

Restore definition rendering and authoring projection without reintroducing Elsa types or runtime dependencies into the workflow declarations or the engine host.

### Scope

- design and implement a native definition-to-diagram projection for declarative and canonical workflows
- support deterministic node and edge generation from runtime definitions
- preserve task, branch, repeat, fork, timer, signal, and subworkflow visibility in the rendered output
- define a stable rendering contract for the operational API and future authoring tools
- keep rendering as a separate projection layer, not as part of runtime execution

### Deliverables

- native rendering model and renderer for `WorkflowRuntimeDefinition`
- canonical-to-diagram projection rules for:
  - linear sequences
  - decisions and conditional branches
  - repeats
  - forks and joins
  - timers and external-signal waits
  - continuations and subworkflows
- updated operational metadata and diagram endpoints backed only by engine assets
- test suite covering rendering determinism and parity for representative Bulstrad workflows

### Exit Criteria

- workflow definitions render without any Elsa packages, builders, or activity models
- rendered diagrams remain stable for the same declarative definition across rebuilds
- operational diagram inspection uses the native renderer only
- the rendering layer is ready to support a later authoring surface without changing workflow declarations
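One way to meet Sprint 13's determinism requirement combines three techniques:

- derive node ids from each step's position in the definition, rather than from object identity or insertion order
- visit branches in sorted order
- sort the final node and edge lists

The Python sketch below assumes a toy step tree containing only sequences and decisions, and the `kind` values and id scheme are hypothetical. Repeats, forks, timers, and subworkflows would need their own projection rules. The fingerprint is a convenient way to assert that rebuilds produce identical output.

```python
from __future__ import annotations

import hashlib
import json


def project(definition: dict) -> dict:
    """Turn a nested step tree into sorted node/edge lists with path-based ids."""
    nodes: list[dict] = []
    edges: list[dict] = []

    def walk(steps: list[dict], prefix: str, entry: str) -> list[str]:
        # Returns the ids of the nodes that exit this sequence.
        exits = [entry]
        for i, step in enumerate(steps):
            node_id = f"{prefix}{i}"
            nodes.append({"id": node_id, "kind": step["kind"], "label": step.get("label", "")})
            for src in exits:
                edges.append({"from": src, "to": node_id})
            exits = [node_id]
            if step["kind"] == "decision":
                branch_exits: list[str] = []
                for name in sorted(step["branches"]):  # sorted: stable branch order
                    branch_exits += walk(step["branches"][name], f"{node_id}.{name}.", node_id)
                exits = branch_exits
        return exits

    nodes.append({"id": "start", "kind": "start", "label": ""})
    for src in walk(definition["steps"], "s", "start"):
        edges.append({"from": src, "to": "end"})
    nodes.append({"id": "end", "kind": "end", "label": ""})
    return {"nodes": sorted(nodes, key=lambda n: n["id"]),
            "edges": sorted(edges, key=lambda e: (e["from"], e["to"]))}


def fingerprint(diagram: dict) -> str:
    """Stable hash of the diagram, suitable for snapshot-style determinism tests."""
    return hashlib.sha256(json.dumps(diagram, sort_keys=True).encode()).hexdigest()
```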

## Sprint 14: Backend Portability And Store Profiles

### Goal

Turn the Oracle-first engine into a backend-switchable engine with one selected backend profile per deployment.

### Scope

- introduce backend profile abstraction and dedicated backend plugin registration
- split projection persistence from the current Oracle-first application service
- formalize mutation coordinator abstraction
- add backend-neutral dead-letter contract
- add backend conformance suite
- implement PostgreSQL profile
- design MongoDB profile in executable detail, with implementation only after explicit product approval

### Deliverables

- `IWorkflowBackendRegistrationMarker`
- backend-neutral projection contract
- backend-neutral mutation coordinator contract
- backend conformance suite
- dedicated Oracle, PostgreSQL, and MongoDB backend plugin projects
- executable MongoDB backend plugin design package

### Exit Criteria

- host selects one backend profile by configuration
- host stays backend-neutral and does not resolve Oracle/PostgreSQL directly
- Oracle and PostgreSQL pass the same conformance suite
- MongoDB path is specified well enough that implementation is a bounded engineering task
- workflow declarations and canonical definitions remain unchanged across backend profiles
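Sprint 14 requires the host to select exactly one backend by configuration without referencing Oracle or PostgreSQL types itself. That rule can be illustrated with a registry that backend plugins populate. This Python sketch is only an analogy for .NET plugin registration and does not model `IWorkflowBackendRegistrationMarker`. The following names are hypothetical:

- the `register_backend` decorator
- the `backend` and `connectionString` configuration keys
- `PostgresProfile`

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Protocol


class BackendProfile(Protocol):
    name: str
    def describe(self) -> str: ...


_REGISTRY: dict[str, Callable[[dict], BackendProfile]] = {}


def register_backend(name: str):
    """Decorator a backend plugin uses to announce itself (hypothetical mechanism)."""
    def wrap(factory: Callable[[dict], BackendProfile]) -> Callable[[dict], BackendProfile]:
        if name in _REGISTRY:
            raise ValueError(f"backend {name!r} registered twice")
        _REGISTRY[name] = factory
        return factory
    return wrap


def select_backend(config: dict) -> BackendProfile:
    """The host only knows the configured name, never the concrete backend types."""
    name = config.get("backend")
    if name not in _REGISTRY:
        raise ValueError(f"unknown or missing backend {name!r}; known: {sorted(_REGISTRY)}")
    return _REGISTRY[name](config)


# --- plugin side (would live in a separate backend package) ---
@dataclass
class PostgresProfile:
    connection_string: str
    name: str = "postgres"

    def describe(self) -> str:
        return f"{self.name} @ {self.connection_string}"


@register_backend("postgres")
def _make_postgres(config: dict) -> BackendProfile:
    return PostgresProfile(config["connectionString"])
```

Failing fast on an unknown or missing profile name keeps a misconfigured deployment from silently starting on a default backend.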

## Sprint 15: Backend-Neutral Parity And Performance Harness

### Goal

Remove the remaining Oracle-only assumptions from the validation stack so PostgreSQL and MongoDB can be measured with the same correctness, Bulstrad, and performance scenarios.

### Scope

- extract backend-neutral performance artifacts, categories, and scenario drivers
- extract backend-neutral runtime workload helpers from the Oracle-only harness
- define one hostile-condition matrix shared by Oracle, PostgreSQL, and MongoDB
- define one curated Bulstrad parity pack shared by all backends
- define one normalized performance artifact format and baseline comparison model

### Deliverables

- shared `IntegrationTests/Performance/Common/` package
- shared normalized performance metrics model
- shared Bulstrad workload catalog for:
  - `OpenForChangePolicy`
  - `ReviewPolicyOpenForChange`
  - `AssistantPrintInsisDocuments`
  - `AssistantAddAnnex`
  - `AnnexCancellation`
  - `AssistantPolicyCancellation`
  - `AssistantPolicyReinstate`
  - `InsisIntegrationNew`
  - `QuotationConfirm`
  - `QuoteOrAplCancel`
- backend-neutral hostile-condition checklist for:
  - duplicate delivery
  - same-instance resume race
  - abandon and reclaim
  - rollback on publish/schedule failure
  - restart with pending due messages
  - DLQ replay
  - backlog drain

### Exit Criteria

- Oracle, PostgreSQL, and MongoDB use the same performance artifact shape
- Oracle no longer owns the reporting model for later backend baselines
- PostgreSQL and MongoDB can plug into the same workload definitions without changing workflow semantics

## Sprint 16: PostgreSQL Hardening, Bulstrad Parity, And Baseline

### Goal

Bring PostgreSQL to Oracle-level confidence for correctness, hostile conditions, representative product behavior, and measured performance.

### Scope

- close the PostgreSQL hostile-condition gap to the Oracle matrix
- add PostgreSQL-backed Bulstrad E2E parity
- implement PostgreSQL latency, throughput, smoke, nightly, soak, and capacity suites
- publish PostgreSQL baseline artifacts and narrative summary

### Deliverables

- PostgreSQL hostile-condition integration suite
- PostgreSQL Bulstrad parity suite
- PostgreSQL performance suites for:
  - latency
  - throughput
  - smoke
  - nightly
  - soak
  - capacity
- baseline documents:
  - `11-postgres-performance-baseline-<date>.md`
  - `11-postgres-performance-baseline-<date>.json`

### Exit Criteria

- PostgreSQL passes the same hostile-condition matrix as Oracle
- representative Bulstrad workflows run correctly on PostgreSQL
- PostgreSQL has a durable, documented performance baseline comparable to Oracle

## Sprint 17: MongoDB Hardening, Bulstrad Parity, And Baseline

### Goal

Bring MongoDB to the same product and operational confidence level as the relational backends without changing workflow behavior.

### Scope

- close the MongoDB hostile-condition gap to the Oracle matrix
- add MongoDB-backed Bulstrad E2E parity
- implement MongoDB latency, throughput, smoke, nightly, soak, and capacity suites
- publish MongoDB baseline artifacts and narrative summary

### Deliverables

- MongoDB hostile-condition integration suite
- MongoDB Bulstrad parity suite
- MongoDB performance suites for:
  - latency
  - throughput
  - smoke
  - nightly
  - soak
  - capacity
- baseline documents:
  - `12-mongo-performance-baseline-<date>.md`
  - `12-mongo-performance-baseline-<date>.json`

### Exit Criteria

- MongoDB passes the same hostile-condition matrix as Oracle
- representative Bulstrad workflows run correctly on MongoDB
- MongoDB has a durable, documented performance baseline comparable to Oracle and PostgreSQL

## Sprint 18: Final Three-Backend Characterization And Decision Pack

### Goal

Produce the final side-by-side comparison for Oracle, PostgreSQL, and MongoDB using the same workloads, the same correctness rules, and the same performance artifact format.

### Scope

- rerun the shared Bulstrad parity pack on all three backends
- rerun the shared hostile-condition matrix on all three backends
- rerun the shared performance tiers and compare normalized metrics
- capture backend-specific metrics appendices without letting them replace normalized workflow metrics
- publish the final recommendation pack

### Deliverables

- final comparison documents:
  - `13-backend-comparison-<date>.md`
  - `13-backend-comparison-<date>.json`
- normalized comparison across:
  - serial latency
  - steady-state throughput
  - capacity ladder
  - backlog drain
  - duplicate-delivery safety
  - restart recovery
- backend-specific appendices for:
  - Oracle wait and AQ observations
  - PostgreSQL lock, WAL, and queue-table observations
  - MongoDB transaction, lock, and change-stream observations

### Exit Criteria

- all three backends are compared through the same workload lens
- the team has one documented backend recommendation pack
- future backend decisions can reuse the same comparison harness instead of inventing new ad hoc measurements

### Current Status

- baseline comparison pack published in:
  - [13-backend-comparison-2026-03-17.md](13-backend-comparison-2026-03-17.md)
  - [13-backend-comparison-2026-03-17.json](13-backend-comparison-2026-03-17.json)
- normalized performance comparison is complete for Oracle, PostgreSQL, and MongoDB
- reliability and Bulstrad hardening depth remains Oracle-first, so the current comparison is a baseline decision pack, not the final production closeout
- the signal path is now split into durable store and wake driver seams
- PostgreSQL and MongoDB now persist transactional wake-outbox records behind that seam
- the optional Redis wake-driver plugin is implemented for PostgreSQL and MongoDB
- Oracle intentionally remains on native AQ and does not support the Redis wake-driver combination
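The wake-outbox seam described in the current status can be sketched with Python's `sqlite3` standing in for PostgreSQL. The sketch has two parts:

- the durable signal and its wake record are inserted in the same transaction
- a separate dispatcher pushes pending wakes to a publisher (Redis, in the optional plugin) and then marks them dispatched

Table and column names are hypothetical. A crash between publishing and marking would republish a wake, so wakes in this sketch are at-least-once hints and the durable signal table remains the source of truth.

```python
import sqlite3
from typing import Callable


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE signals (id INTEGER PRIMARY KEY, instance_id TEXT, body TEXT);
        CREATE TABLE wake_outbox (id INTEGER PRIMARY KEY, instance_id TEXT,
                                  dispatched INTEGER NOT NULL DEFAULT 0);
        """
    )
    return conn


def store_signal(conn: sqlite3.Connection, instance_id: str, body: str,
                 fail_after_insert: bool = False) -> None:
    # The durable signal and its wake record commit or roll back together.
    with conn:
        conn.execute("INSERT INTO signals (instance_id, body) VALUES (?, ?)", (instance_id, body))
        conn.execute("INSERT INTO wake_outbox (instance_id) VALUES (?)", (instance_id,))
        if fail_after_insert:
            raise RuntimeError("simulated failure before commit")


def dispatch_wakes(conn: sqlite3.Connection, publish: Callable[[str], None]) -> int:
    """Push pending wake hints, then mark each one dispatched; returns how many were sent."""
    rows = conn.execute(
        "SELECT id, instance_id FROM wake_outbox WHERE dispatched = 0 ORDER BY id"
    ).fetchall()
    for row_id, instance_id in rows:
        publish(instance_id)  # a crash after this line republishes later (at-least-once)
        with conn:
            conn.execute("UPDATE wake_outbox SET dispatched = 1 WHERE id = ?", (row_id,))
    return len(rows)
```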

## Cross-Sprint Work Items

These should be maintained continuously, not left to the end:

- architecture doc updates
- test harness improvements
- canonical execution parity assertions
- operational telemetry quality
- snapshot schema versioning discipline
- Oracle timing-envelope observations for CI and local Docker environments

## Final Milestone Definition

The project is complete when:

- the workflow service can run on the engine as the active runtime
- task and instance APIs remain stable
- Oracle AQ handles both immediate signaling and delayed scheduling
- the service resumes correctly after restart without polling
- the engine runs representative real workflows with production-grade observability