Add StellaOps.Workflow engine: 14 libraries, WebService, 8 test projects
Extract product-agnostic workflow engine from Ablera.Serdica.Workflow into standalone StellaOps.Workflow.* libraries targeting net10.0.

Libraries (14):
- Contracts, Abstractions (compiler, decompiler, expression runtime)
- Engine (execution, signaling, scheduling, projections, hosted services)
- ElkSharp (generic graph layout algorithm)
- Renderer.ElkSharp, Renderer.ElkJs, Renderer.Msagl, Renderer.Svg
- Signaling.Redis, Signaling.OracleAq
- DataStore.MongoDB, DataStore.PostgreSQL, DataStore.Oracle

WebService: ASP.NET Core Minimal API with 22 endpoints

Tests (8 projects, 109 tests pass):
- Engine.Tests (105 pass), WebService.Tests (4 E2E pass)
- Renderer.Tests, DataStore.MongoDB/Oracle/PostgreSQL.Tests
- Signaling.Redis.Tests, IntegrationTests.Shared

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
676
docs/workflow/engine/07-sprint-plan.md
Normal file
@@ -0,0 +1,676 @@
# 07. Sprint Plan

## Planning Assumptions

- sprint length: 2 weeks
- one team owning runtime, persistence, and service integration
- Oracle AQ available
- no concurrent-engine migration scope
- acceptance means code, tests, and updated docs

## Sprint 1: Foundations And Contracts

### Goal

Create the engine skeleton and the stable interfaces.

### Scope

- add runtime provider abstraction
- add signal bus abstraction
- add schedule bus abstraction
- add runtime snapshot abstraction
- add engine option classes
- add `docs/engine/` package

### Deliverables

- interface set compiled into shared abstractions
- configuration classes
- initial DI composition path
- unit tests for options and registration

### Exit Criteria

- service builds with engine abstractions present
- no Elsa runtime assumptions are introduced into new code
- docs and interface names are stable enough for later sprints

## Sprint 2: Canonical Runtime Definition Store

### Goal

Make canonical execution definitions available at runtime without Elsa.

### Scope

- compile authored workflows to canonical runtime definitions at startup
- validate definitions during startup
- cache runtime definitions
- expose startup failure mode for invalid definitions

### Deliverables

- `WorkflowRuntimeDefinitionStore`
- definition normalization pipeline
- startup validator
- tests covering:
  - valid definition load
  - invalid definition rejection
  - version resolution

### Exit Criteria

- all registered workflows load into runtime definition cache
- the runtime can resolve definition by name/version
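
The startup validation and name/version resolution described in this sprint could look roughly like the sketch below. It is written in Python for brevity and is not the engine's .NET code: `RuntimeDefinition`, `DefinitionStore`, and the "highest version wins when no version is given" rule are all assumptions made for illustration, not the behavior of `WorkflowRuntimeDefinitionStore`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeDefinition:
    # Hypothetical shape: a named, versioned list of step identifiers.
    name: str
    version: int
    steps: tuple


class DefinitionStore:
    """In-memory cache keyed by (name, version); a hypothetical stand-in."""

    def __init__(self, definitions):
        self._by_key = {}
        for definition in definitions:
            if not definition.steps:
                # Fail at startup rather than at first execution.
                raise ValueError(
                    f"{definition.name} v{definition.version} has no steps"
                )
            key = (definition.name, definition.version)
            if key in self._by_key:
                raise ValueError(f"duplicate definition {key}")
            self._by_key[key] = definition

    def resolve(self, name, version=None):
        """Return the exact version, or the highest one when version is None."""
        if version is not None:
            return self._by_key[(name, version)]
        candidates = [d for (n, _), d in self._by_key.items() if n == name]
        if not candidates:
            raise KeyError(name)
        return max(candidates, key=lambda d: d.version)
```

Building the store once at startup means an invalid definition stops the host immediately, which matches the "startup failure mode" item in the scope.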

## Sprint 3: Snapshot Store And Versioned Runtime State

### Goal

Turn `WF_RUNTIME_STATES` into a first-class engine snapshot store.

### Scope

- extend runtime state schema
- implement snapshot mapper
- implement optimistic concurrency versioning
- wire snapshot reads and writes

### Deliverables

- database migration scripts
- `OracleWorkflowRuntimeSnapshotStore`
- snapshot serialization contracts
- tests for:
  - initial insert
  - update with expected version
  - stale version conflict

### Exit Criteria

- runtime snapshots can be loaded and committed with version control
- stale updates are rejected safely
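
The three test cases listed for this sprint (initial insert, update with expected version, stale version conflict) correspond to a plain optimistic-concurrency check. The sketch below shows that check against an in-memory dictionary. `InMemorySnapshotStore`, `StaleVersionError`, and the convention that `expected_version=0` means "insert" are assumptions for illustration, not the contract of `OracleWorkflowRuntimeSnapshotStore`.

```python
class StaleVersionError(Exception):
    """Raised when the caller's expected version no longer matches the store."""


class InMemorySnapshotStore:
    # Hypothetical stand-in for a database-backed snapshot store.

    def __init__(self):
        self._rows = {}  # instance_id -> (version, state)

    def load(self, instance_id):
        return self._rows.get(instance_id)  # None when no snapshot exists

    def commit(self, instance_id, state, expected_version):
        """Write state only if the stored version equals expected_version.

        expected_version=0 means "insert"; returns the new version.
        """
        current_version = self._rows.get(instance_id, (0, None))[0]
        if current_version != expected_version:
            raise StaleVersionError(
                f"{instance_id}: expected v{expected_version}, "
                f"found v{current_version}"
            )
        new_version = current_version + 1
        self._rows[instance_id] = (new_version, dict(state))
        return new_version
```

In a relational store the same check is usually expressed as a conditional update on the version column, with zero affected rows treated as the stale-version conflict.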

## Sprint 4: AQ Signal And Schedule Backbone

### Goal

Introduce Oracle AQ as the durable event backbone.

### Scope

- create AQ setup scripts
- implement signal bus
- implement schedule bus
- implement signal envelope serialization
- implement hosted signal consumer skeleton

### Deliverables

- AQ DDL scripts
- `OracleAqWorkflowSignalBus`
- `OracleAqWorkflowScheduleBus`
- integration tests with enqueue/dequeue
- delayed message smoke tests

### Exit Criteria

- engine can publish and receive immediate signals without polling
- engine can publish and receive delayed signals

## Sprint 5: Start Flow And Human Task Activation

### Goal

Run workflows from start until first durable wait.

### Scope

- implement execution coordinator
- implement canonical interpreter subset:
  - state assignment
  - business reference assignment
  - task activation
  - terminal completion
- integrate with `WorkflowRuntimeService`
- keep existing projection model

### Deliverables

- `SerdicaEngineRuntimeProvider.StartAsync`
- execution slice result model
- task activation write path
- tests for:
  - start to task
  - start to completion
  - business reference propagation

### Exit Criteria

- selected declarative workflows can start and create correct tasks without Elsa

## Sprint 6: Task Completion And Transport Calls

### Goal

Advance workflows after task completion and support transport-backed orchestration.

### Scope

- implement task completion execution path
- implement canonical interpreter support for:
  - transport calls
  - branches
  - success/failure paths
- integrate completion flow with runtime snapshot commit

### Deliverables

- `SerdicaEngineRuntimeProvider.CompleteAsync`
- transport dispatcher
- tests for:
  - completion to next task
  - failure branch
  - timeout branch where applicable

### Exit Criteria

- representative workflows can complete first task and reach correct next state

## Sprint 7: Subworkflows, Continue-With, And Repeat

### Goal

Support the higher-order orchestration patterns used heavily in the corpus.

### Scope

- implement subworkflow frame persistence
- implement parent resume
- implement continue-with production
- implement repeat resume semantics

### Deliverables

- subworkflow coordinator
- resume pointer serializer
- tests for:
  - child completion resumes parent
  - nested frame handling
  - repeat interrupted by wait
  - continue-with request emission

### Exit Criteria

- representative subworkflow-heavy families execute correctly

## Sprint 8: Timers, Retries, And Delayed Resume

### Goal

Finish the non-polling scheduling path.

### Scope

- implement timer waits
- implement retry scheduling
- implement stale timer ignore logic via waiting tokens
- integrate delayed AQ delivery into execution coordinator

### Deliverables

- timer wait model
- delayed resume handler
- tests for:
  - timer due resume
  - retry due resume
  - canceled timer ignored
  - restart-safe delayed processing

### Exit Criteria

- the engine supports time-based orchestration without polling loops
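
The "stale timer ignore logic via waiting tokens" item can be read as follows: every wait gets a fresh token, the delayed message carries that token, and a delivery is honored only if its token still matches the instance's current wait. The sketch below illustrates that reading. `WorkflowInstance` and its methods are hypothetical names, and the real timer wait model may store tokens differently.

```python
import uuid


class WorkflowInstance:
    # Hypothetical in-memory instance; real state would live in the snapshot.

    def __init__(self):
        self.waiting_token = None
        self.resumed = 0

    def wait_for_timer(self):
        """Arm a new wait; any previously issued token becomes stale."""
        self.waiting_token = uuid.uuid4().hex
        return self.waiting_token

    def cancel_wait(self):
        self.waiting_token = None

    def on_timer_due(self, token):
        """Resume only if the delivered token matches the current wait."""
        if token is None or token != self.waiting_token:
            return False  # stale, canceled, or duplicate delivery: ignore
        self.waiting_token = None  # consume so a redelivery is a no-op
        self.resumed += 1
        return True
```

Because a canceled timer is simply ignored on arrival, the delayed message never has to be removed from the queue, which keeps cancellation free of any queue-side bookkeeping.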

## Sprint 9: Operational Parity

### Goal

Reach product-surface and operations parity with the existing workflow service.

### Scope

- diagram parity validation
- runtime state inspection parity
- retention integration
- structured metrics and logging
- DLQ handling and diagnostics

### Deliverables

- runtime metadata mapping updates
- operational dashboards or documented metric set
- DLQ support
- tests for supportability paths

### Exit Criteria

- operations can inspect and support engine-driven instances through the existing product surface

## Sprint 10: Corpus Parity And Hardening

### Goal

Prove the engine against the real declarative workflow corpus.

### Scope

- execute representative high-fanout families end-to-end
- resolve remaining interpreter gaps
- multi-node duplicate delivery testing
- restart and recovery testing
- performance and soak tests

### Deliverables

- parity report against selected workflow families
- load test results
- recovery test results
- production readiness checklist

### Exit Criteria

- selected production-grade workflows run without Elsa
- restart recovery is proven
- no polling is used for steady-state signal or timer discovery

## Sprint 11: Bulstrad E2E Parity And Oracle Reliability

### Goal

Turn the engine from a validated runtime into a production-grade execution platform by proving it against real Bulstrad workflows and hostile Oracle operating conditions.

### Scope

- build a curated Bulstrad Oracle-AQ E2E suite
- replace synthetic runtime-state backing in Oracle integration tests with the real Oracle runtime-state store
- add Oracle transaction-coupling tests for state, projections, and AQ publish
- add Oracle restart, redelivery, and DLQ replay tests
- add multi-worker and duplicate-delivery race tests
- add deterministic fault-injection around commit boundaries

### Deliverables

- `BulstradOracleAqE2ETests`
- curated representative workflows with scripted downstream responders
- Oracle transport reliability suite covering:
  - immediate and delayed delivery
  - rollback and redelivery
  - dead-letter browse and replay
  - restart-safe delayed processing
- concurrency suite covering:
  - duplicate signal delivery
  - same-instance multi-worker races
  - retry-after-conflict behavior
- documented timing expectations for cold-start and steady-state Oracle AQ

### Implemented Coverage

The current Oracle-backed integration harness now includes:

- Bulstrad policy-change families:
  - `OpenForChangePolicy`
  - `ReviewPolicyOpenForChange`
  - `AssistantAddAnnex`
  - `AnnexCancellation`
  - `AssistantPolicyReinstate`
  - `AssistantPolicyCancellation`
  - `AssistantPrintInsisDocuments`
- shared policy families:
  - `InsisIntegrationNew`
  - `QuotationConfirm`
  - `QuoteOrAplCancel`
- Oracle transport and recovery matrix:
  - immediate and delayed AQ delivery
  - delayed backlog drain within a bounded latency envelope
  - dequeue rollback redelivery
  - ambient Oracle transaction commit and rollback for immediate messages
  - ambient Oracle transaction commit and rollback for delayed messages
  - dead-letter browse, replay, and backlog replay
  - dead-letter backlog survival across Oracle restart
  - timer backlog recovery across provider restart and Oracle restart
  - external-signal backlog recovery, worker abandon/recovery, and duplicate-delivery races
  - schedule/publish failure rollback inside workflow mutation transactions

### Exit Criteria

- representative Bulstrad workflows execute correctly on `SerdicaEngine` with real Oracle AQ
- AQ-backed restart and delayed-delivery behavior is proven under realistic timing variance
- duplicate delivery and commit-boundary failures are shown to be safe
- the team has a stable PR suite and a broader nightly suite for Oracle-backed engine validation

## Sprint 12: Load, Performance, And Capacity Characterization

### Goal

Turn the correctness-focused Oracle validation suite into a real load and performance program with stable smoke gates, nightly trend runs, soak coverage, and first capacity numbers.

### Scope

- build a dedicated performance harness on top of the Oracle AQ integration foundation
- separate PR smoke, nightly characterization, weekly soak, and explicit capacity tiers
- add synthetic engine workloads for stable measurement
- add representative Bulstrad workload runners for business realism
- persist performance artifacts and summary reports
- define baseline and regression strategy per environment

### Deliverables

- categorized performance scenarios:
  - `WorkflowPerfLatency`
  - `WorkflowPerfThroughput`
  - `WorkflowPerfSmoke`
  - `WorkflowPerfNightly`
  - `WorkflowPerfSoak`
  - `WorkflowPerfCapacity`
- result artifact writer under `TestResults/workflow-performance/`
- scenario matrix covering:
  - AQ immediate bursts
  - AQ delayed bursts
  - mixed signal backlogs
  - synthetic start/task/signal/timer/subworkflow flows
  - representative Bulstrad families
  - restart and replay under load
- first baseline report for local Docker and CI Oracle
- first capacity note for one-node and multi-node assumptions

### Exit Criteria

- PR smoke load checks are cheap and stable enough to run continuously
- nightly runs capture latency, throughput, and correctness artifacts
- soak runs prove no backlog drift or correctness decay over extended execution
- representative Bulstrad workflows have measured latency envelopes, not just functional pass/fail
- the team has an initial sizing recommendation for worker concurrency and queue backlog expectations
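
One way to read the "baseline and regression strategy" scope item is a per-metric comparison of the current run against the previously persisted artifact for the same scenario and tier. The sketch below is a guess at the simplest useful form. The metric names, the 20% tolerance, and the "lower is better" assumption are all hypothetical and are not taken from the actual artifact format.

```python
def compare_to_previous(current, previous, tolerance=0.20):
    """Return metric names where current is worse than previous by more than tolerance.

    Both arguments map metric name -> value; lower is treated as better,
    which suits latency-style metrics but not throughput.
    """
    regressions = []
    for name, value in sorted(current.items()):
        baseline = previous.get(name)
        if baseline is None or baseline <= 0:
            continue  # nothing comparable recorded for this metric
        if value > baseline * (1 + tolerance):
            regressions.append(name)
    return regressions
```

A smoke gate built on this kind of check needs a generous tolerance on shared CI hardware, which is presumably why the plan asks for a baseline per environment rather than a single global one.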

### Implemented Foundation

The current Sprint 12 implementation now includes:

- performance categories and artifact generation under `TestResults/workflow-performance/`
- Oracle AQ smoke scenarios for:
  - immediate burst drain
  - delayed burst drain
  - synthetic external-signal backlog resume
  - short Bulstrad business burst using `QuoteOrAplCancel`
- persisted comparison against the previous artifact for the same scenario and tier
- Oracle AQ nightly scenarios for:
  - larger immediate burst drain
  - larger delayed burst drain
  - larger synthetic external-signal backlog resume
  - Bulstrad `QuotationConfirm -> PdfGenerator` burst
- Oracle AQ soak scenario for:
  - sustained synthetic signal round-trip waves without correctness drift
- Oracle AQ latency baseline for:
  - one-at-a-time synthetic signal round-trip with phase-level latency summaries
- Oracle AQ throughput baseline for:
  - parallel synthetic signal round-trip with `16` workload concurrency and `8` signal workers
- Oracle AQ capacity ladder for:
  - synthetic signal round-trip at concurrency `1`, `4`, `8`, and `16`
- thread-safe scripted transport recording for concurrent smoke scenarios
- first full Oracle baseline run with documented metrics in:
  - [10-oracle-performance-baseline-2026-03-17.md](10-oracle-performance-baseline-2026-03-17.md)
  - [10-oracle-performance-baseline-2026-03-17.json](10-oracle-performance-baseline-2026-03-17.json)

### Reference

The detailed workload model, KPI set, harness design, and baseline strategy are defined in [08-load-and-performance-plan.md](08-load-and-performance-plan.md).

## Sprint 13: Engine-Native Rendering And Authoring Projection

### Goal

Restore definition rendering and authoring projection without reintroducing Elsa types or runtime dependencies into the workflow declarations or the engine host.

### Scope

- design and implement a native definition-to-diagram projection for declarative and canonical workflows
- support deterministic node and edge generation from runtime definitions
- preserve task, branch, repeat, fork, timer, signal, and subworkflow visibility in the rendered output
- define a stable rendering contract for the operational API and future authoring tools
- keep rendering as a separate projection layer, not as part of runtime execution

### Deliverables

- native rendering model and renderer for `WorkflowRuntimeDefinition`
- canonical-to-diagram projection rules for:
  - linear sequences
  - decisions and conditional branches
  - repeats
  - forks and joins
  - timers and external-signal waits
  - continuations and subworkflows
- updated operational metadata and diagram endpoints backed only by engine assets
- test suite covering rendering determinism and parity for representative Bulstrad workflows

### Exit Criteria

- workflow definitions render without any Elsa packages, builders, or activity models
- rendered diagrams remain stable for the same declarative definition across rebuilds
- operational diagram inspection uses the native renderer only
- the rendering layer is ready to support a later authoring surface without changing workflow declarations
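
"Deterministic node and edge generation" mostly comes down to never letting container iteration order leak into the output. The sketch below shows one way to get that property by sorting nodes and edges on stable keys. The dictionary shape used for a definition is invented for the example and does not reflect `WorkflowRuntimeDefinition`.

```python
def project_diagram(definition):
    """Project a definition into (nodes, edges) with a stable order.

    `definition` is a hypothetical mapping:
    step id -> {"kind": str, "next": [step ids]}.
    """
    # Sort by step id so insertion order of the mapping does not matter.
    nodes = [
        {"id": step_id, "kind": definition[step_id]["kind"]}
        for step_id in sorted(definition)
    ]
    # Sort (source, target) pairs so the order of "next" lists does not matter.
    edges = sorted(
        (source, target)
        for source, step in definition.items()
        for target in step.get("next", [])
    )
    return nodes, edges
```

A determinism test can then build the same definition in two different orders and assert that both projections are equal, which is roughly what "stable across rebuilds" asks for.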

## Sprint 14: Backend Portability And Store Profiles

### Goal

Turn the Oracle-first engine into a backend-switchable engine with one selected backend profile per deployment.

### Scope

- introduce backend profile abstraction and dedicated backend plugin registration
- split projection persistence from the current Oracle-first application service
- formalize mutation coordinator abstraction
- add backend-neutral dead-letter contract
- add backend conformance suite
- implement PostgreSQL profile
- design MongoDB profile in executable detail, with implementation only after explicit product approval

### Deliverables

- `IWorkflowBackendRegistrationMarker`
- backend-neutral projection contract
- backend-neutral mutation coordinator contract
- backend conformance suite
- dedicated Oracle, PostgreSQL, and MongoDB backend plugin projects
- executable MongoDB backend plugin design package

### Exit Criteria

- host selects one backend profile by configuration
- host stays backend-neutral and does not resolve Oracle/PostgreSQL directly
- Oracle and PostgreSQL pass the same conformance suite
- MongoDB path is specified well enough that implementation is a bounded engineering task
- workflow declarations and canonical definitions remain unchanged across backend profiles
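
The exit criteria "host selects one backend profile by configuration" and "host stays backend-neutral" suggest a registry that plugins populate and the host queries by name. The sketch below illustrates that split. `BackendRegistry`, the two backend classes, and the `backend` configuration key are hypothetical, and none of this describes how `IWorkflowBackendRegistrationMarker` is actually used.

```python
class OracleBackend:
    # Hypothetical plugin; a real one would expose stores and buses.
    name = "oracle"


class PostgresBackend:
    name = "postgresql"


class BackendRegistry:
    """Maps a configured profile name to a backend factory."""

    def __init__(self):
        self._factories = {}

    def register(self, name, factory):
        self._factories[name.lower()] = factory

    def create(self, config):
        """Build the single backend named by config["backend"]."""
        name = config.get("backend", "").lower()
        if name not in self._factories:
            known = ", ".join(sorted(self._factories))
            raise ValueError(f"unknown backend '{name}'; registered: {known}")
        return self._factories[name]()


def build_default_registry():
    # Plugins register themselves; the host only ever calls create().
    registry = BackendRegistry()
    registry.register("oracle", OracleBackend)
    registry.register("postgresql", PostgresBackend)
    return registry
```

With this shape the host never imports a concrete backend type, so adding a MongoDB plugin later would only mean one more `register` call inside that plugin.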

## Sprint 15: Backend-Neutral Parity And Performance Harness

### Goal

Remove the remaining Oracle-only assumptions from the validation stack so PostgreSQL and MongoDB can be measured with the same correctness, Bulstrad, and performance scenarios.

### Scope

- extract backend-neutral performance artifacts, categories, and scenario drivers
- extract backend-neutral runtime workload helpers from the Oracle-only harness
- define one hostile-condition matrix shared by Oracle, PostgreSQL, and MongoDB
- define one curated Bulstrad parity pack shared by all backends
- define one normalized performance artifact format and baseline comparison model

### Deliverables

- shared `IntegrationTests/Performance/Common/` package
- shared normalized performance metrics model
- shared Bulstrad workload catalog for:
  - `OpenForChangePolicy`
  - `ReviewPolicyOpenForChange`
  - `AssistantPrintInsisDocuments`
  - `AssistantAddAnnex`
  - `AnnexCancellation`
  - `AssistantPolicyCancellation`
  - `AssistantPolicyReinstate`
  - `InsisIntegrationNew`
  - `QuotationConfirm`
  - `QuoteOrAplCancel`
- backend-neutral hostile-condition checklist for:
  - duplicate delivery
  - same-instance resume race
  - abandon and reclaim
  - rollback on publish/schedule failure
  - restart with pending due messages
  - DLQ replay
  - backlog drain

### Exit Criteria

- Oracle, PostgreSQL, and MongoDB use the same performance artifact shape
- Oracle no longer owns the reporting model for later backend baselines
- PostgreSQL and MongoDB can plug into the same workload definitions without changing workflow semantics

## Sprint 16: PostgreSQL Hardening, Bulstrad Parity, And Baseline

### Goal

Bring PostgreSQL to Oracle-level confidence for correctness, hostile conditions, representative product behavior, and measured performance.

### Scope

- close the PostgreSQL hostile-condition gap to the Oracle matrix
- add PostgreSQL-backed Bulstrad E2E parity
- implement PostgreSQL latency, throughput, smoke, nightly, soak, and capacity suites
- publish PostgreSQL baseline artifacts and narrative summary

### Deliverables

- PostgreSQL hostile-condition integration suite
- PostgreSQL Bulstrad parity suite
- PostgreSQL performance suites for:
  - latency
  - throughput
  - smoke
  - nightly
  - soak
  - capacity
- baseline documents:
  - `11-postgres-performance-baseline-<date>.md`
  - `11-postgres-performance-baseline-<date>.json`

### Exit Criteria

- PostgreSQL passes the same hostile-condition matrix as Oracle
- representative Bulstrad workflows run correctly on PostgreSQL
- PostgreSQL has a durable, documented performance baseline comparable to Oracle

## Sprint 17: MongoDB Hardening, Bulstrad Parity, And Baseline

### Goal

Bring MongoDB to the same product and operational confidence level as the relational backends without changing workflow behavior.

### Scope

- close the MongoDB hostile-condition gap to the Oracle matrix
- add MongoDB-backed Bulstrad E2E parity
- implement MongoDB latency, throughput, smoke, nightly, soak, and capacity suites
- publish MongoDB baseline artifacts and narrative summary

### Deliverables

- MongoDB hostile-condition integration suite
- MongoDB Bulstrad parity suite
- MongoDB performance suites for:
  - latency
  - throughput
  - smoke
  - nightly
  - soak
  - capacity
- baseline documents:
  - `12-mongo-performance-baseline-<date>.md`
  - `12-mongo-performance-baseline-<date>.json`

### Exit Criteria

- MongoDB passes the same hostile-condition matrix as Oracle
- representative Bulstrad workflows run correctly on MongoDB
- MongoDB has a durable, documented performance baseline comparable to Oracle and PostgreSQL

## Sprint 18: Final Three-Backend Characterization And Decision Pack

### Goal

Produce the final side-by-side comparison for Oracle, PostgreSQL, and MongoDB using the same workloads, the same correctness rules, and the same performance artifact format.

### Scope

- rerun the shared Bulstrad parity pack on all three backends
- rerun the shared hostile-condition matrix on all three backends
- rerun the shared performance tiers and compare normalized metrics
- capture backend-specific metrics appendices without letting them replace normalized workflow metrics
- publish the final recommendation pack

### Deliverables

- final comparison documents:
  - `13-backend-comparison-<date>.md`
  - `13-backend-comparison-<date>.json`
- normalized comparison across:
  - serial latency
  - steady-state throughput
  - capacity ladder
  - backlog drain
  - duplicate-delivery safety
  - restart recovery
- backend-specific appendices for:
  - Oracle wait and AQ observations
  - PostgreSQL lock, WAL, and queue-table observations
  - MongoDB transaction, lock, and change-stream observations

### Exit Criteria

- all three backends are compared through the same workload lens
- the team has one documented backend recommendation pack
- future backend decisions can reuse the same comparison harness instead of inventing new ad hoc measurements

### Current Status

- baseline comparison pack published in:
  - [13-backend-comparison-2026-03-17.md](13-backend-comparison-2026-03-17.md)
  - [13-backend-comparison-2026-03-17.json](13-backend-comparison-2026-03-17.json)
- normalized performance comparison is complete for Oracle, PostgreSQL, and MongoDB
- reliability and Bulstrad hardening depth remains Oracle-first, so the current comparison is a baseline decision pack, not the final production closeout
- the signal path is now split into durable store and wake driver seams
- PostgreSQL and MongoDB now persist transactional wake-outbox records behind that seam
- the optional Redis wake-driver plugin is implemented for PostgreSQL and MongoDB
- Oracle intentionally remains on native AQ and does not support the Redis wake-driver combination
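
The status items above mention transactional wake-outbox records. The general outbox idea is that the state change and the "wake this instance" record are written in the same transaction, so a wake driver can never see one without the other. The sketch below demonstrates that property using SQLite from the Python standard library purely as a stand-in. The table names, columns, and the `fail_before_commit` switch are invented for the example and do not describe the PostgreSQL or MongoDB schemas.

```python
import sqlite3


def open_store():
    # In-memory database standing in for the real backend.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE runtime_state (instance_id TEXT PRIMARY KEY, state TEXT)"
    )
    db.execute(
        "CREATE TABLE wake_outbox (id INTEGER PRIMARY KEY, instance_id TEXT NOT NULL)"
    )
    return db


def commit_with_wake(db, instance_id, state, fail_before_commit=False):
    """Persist the new state and its wake record in one transaction."""
    with db:  # commits on success, rolls back if the block raises
        db.execute(
            "INSERT OR REPLACE INTO runtime_state (instance_id, state) VALUES (?, ?)",
            (instance_id, state),
        )
        db.execute(
            "INSERT INTO wake_outbox (instance_id) VALUES (?)", (instance_id,)
        )
        if fail_before_commit:
            raise RuntimeError("simulated failure before commit")


def pending_wakes(db):
    rows = db.execute("SELECT instance_id FROM wake_outbox ORDER BY id")
    return [row[0] for row in rows]
```

A wake driver, such as the optional Redis one, would then only need to notify workers that outbox rows exist. Losing a notification delays processing but cannot lose the signal, because the durable record is already committed.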

## Cross-Sprint Work Items

These should be maintained continuously, not left to the end:

- architecture doc updates
- test harness improvements
- canonical execution parity assertions
- operational telemetry quality
- snapshot schema versioning discipline
- Oracle timing-envelope observations for CI and local Docker environments

## Final Milestone Definition

The project is complete when:

- the workflow service can run on the engine as the active runtime
- task and instance APIs remain stable
- Oracle AQ handles both immediate signaling and delayed scheduling
- the service resumes correctly after restart without polling
- the engine runs representative real workflows with production-grade observability