# 01. Requirements And Principles

## 1. Product Goal

Build a Serdica-owned workflow engine that can run the current Bulstrad workflow corpus without Elsa while preserving the existing service-level workflow product:

- workflow start
- task inbox and task lifecycle
- business-reference-based lookup
- runtime state inspection
- workflow diagrams
- canonical schema and canonical validation exposure
- workflow retention and hosted jobs

The engine must execute the same business behavior currently expressed in the declarative workflow DSL and canonical workflow definition model.

## 2. Functional Requirements

### 2.1 Workflow Definition Handling

The engine must:

- discover workflow registrations from authored C# workflow classes
- resolve the latest or exact workflow version through the existing registration catalog
- compile authored declarative workflows into canonical runtime definitions
- keep canonical validation as a first-class platform capability
- reject invalid or unsupported definitions during startup or validation

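
The "latest or exact version" rule above can be sketched as follows. This is a minimal illustration in Python, not the engine's C# implementation: `Registration`, `RegistrationCatalog`, and integer version numbers are assumptions made for the example, and the in-memory catalog only stands in for the existing registration catalog.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Registration:
    name: str
    version: int  # assumption: versions are comparable integers


class RegistrationCatalog:
    """Hypothetical in-memory stand-in for the existing registration catalog."""

    def __init__(self, registrations: list[Registration]) -> None:
        self._by_name: dict[str, dict[int, Registration]] = {}
        for reg in registrations:
            versions = self._by_name.setdefault(reg.name, {})
            if reg.version in versions:
                # Ambiguous registrations are rejected at startup, not at run time.
                raise ValueError(f"duplicate registration: {reg.name} v{reg.version}")
            versions[reg.version] = reg

    def resolve(self, name: str, version: Optional[int] = None) -> Registration:
        """Return the exact version if given, otherwise the highest one."""
        versions = self._by_name.get(name)
        if not versions:
            raise KeyError(f"unknown workflow: {name}")
        if version is None:
            return versions[max(versions)]
        if version not in versions:
            raise KeyError(f"unknown version: {name} v{version}")
        return versions[version]


catalog = RegistrationCatalog([
    Registration("ApprovePolicy", 1),
    Registration("ApprovePolicy", 3),
])
latest = catalog.resolve("ApprovePolicy")
exact = catalog.resolve("ApprovePolicy", version=1)
```

Raising on duplicates in the constructor mirrors the requirement that invalid definitions are rejected during startup or validation.
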
### 2.2 Workflow Start

The engine must:

- bind the untyped start payload to the workflow start request type
- resolve or derive business reference data
- initialize canonical workflow state
- execute the initial sequence until a wait boundary or completion
- create workflow projections and runtime state in one durable flow
- support workflow continuations created during start

### 2.3 Human Tasks

The engine must:

- activate human tasks with:
  - task type
  - route
  - workflow roles
  - task roles
  - runtime roles
  - payload
  - business reference
- preserve the current task assignment model:
  - assign to self
  - assign to user
  - assign to runtime roles
  - release
- expose completed and active task history through the existing projection model

### 2.4 Task Completion

The engine must:

- load the current workflow state and task context
- authorize completion through the existing service layer
- apply completion payload
- continue execution from the task completion entry point
- produce next tasks, next waits, next continuations, or completion
- update runtime state and read projections durably

### 2.5 Runtime Semantics

The engine must support the semantic surface already present in declarative workflows:

- state assignment
- business reference assignment
- human task activation
- microservice calls
- legacy rabbit calls
- GraphQL calls
- HTTP calls
- conditional branches
- decision branches
- repeat loops
- subworkflow invocation
- continue-with orchestration
- timeout branches
- failure branches
- function-backed expressions

### 2.6 Subworkflows

The engine must:

- start child workflows
- persist parent resume frames
- carry child output back into parent state
- support nested resume across multiple levels
- preserve current declarative subworkflow semantics

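
One way to picture parent resume frames is as a small record stored with each child instance, naming the parent, the step at which the parent continues, and the parent state key that receives the child output. The sketch below is illustrative only: `ResumeFrame`, `Instance`, `complete_child`, and the dictionary-based store are hypothetical names, not the engine's actual model.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ResumeFrame:
    parent_instance_id: str
    resume_step: str   # hypothetical: where the parent continues
    output_key: str    # parent state key that receives the child output


@dataclass
class Instance:
    instance_id: str
    state: dict = field(default_factory=dict)
    frame: Optional[ResumeFrame] = None  # present only for child workflows
    next_step: Optional[str] = None


def complete_child(store: dict, child_id: str, output: Any) -> Optional[str]:
    """Carry child output into the parent; return the parent id to resume."""
    frame = store[child_id].frame
    if frame is None:
        return None  # top-level workflow: nothing to resume
    parent = store[frame.parent_instance_id]
    parent.state[frame.output_key] = output
    parent.next_step = frame.resume_step
    return parent.instance_id


# Three levels: root -> mid -> leaf.
store = {
    "root": Instance("root"),
    "mid": Instance("mid", frame=ResumeFrame("root", "after-mid", "midResult")),
    "leaf": Instance("leaf", frame=ResumeFrame("mid", "after-leaf", "leafResult")),
}
first = complete_child(store, "leaf", {"ok": True})
second = complete_child(store, "mid", store["mid"].state)
third = complete_child(store, "root", None)
```

Because each frame lives with its own child instance, nested resume needs no in-memory call stack: completing `leaf` resumes `mid`, and completing `mid` later resumes `root`.
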
### 2.7 Scheduling

The engine must support:

- timeouts
- retry wake-ups
- delayed continuation
- explicit wait-until behavior

This must happen without a steady-state polling loop.

### 2.8 Inspection And Operations

The service must continue to expose:

- workflow definitions
- workflow instances
- workflow tasks
- workflow task events
- workflow diagrams
- runtime state snapshots
- canonical schema
- canonical validation

## 3. Non-Functional Requirements

### 3.1 Multi-Instance Deployment

The service must support multiple application nodes against one shared Oracle database.

Implications:

- no single-node assumptions
- no in-memory-only correctness logic
- no sticky workflow ownership
- duplicate signal delivery must be safe

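
A common way to let any node touch any instance without sticky ownership is a version check on the snapshot write. The document does not prescribe this technique, so treat the following as an assumption-laden sketch: `SnapshotStore`, `StaleSnapshotError`, and the version column are hypothetical, and the lock merely stands in for a database transaction.

```python
import threading


class StaleSnapshotError(Exception):
    """Raised when another node has already advanced the snapshot."""


class SnapshotStore:
    """Hypothetical in-memory store; the lock stands in for a DB transaction."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._rows = {}  # instance_id -> (version, state)

    def load(self, instance_id: str):
        with self._lock:
            version, state = self._rows.get(instance_id, (0, {}))
            return version, dict(state)

    def save(self, instance_id: str, expected_version: int, state: dict) -> int:
        with self._lock:
            current, _ = self._rows.get(instance_id, (0, {}))
            if current != expected_version:
                # A concurrent writer won; the caller must reload and re-evaluate.
                raise StaleSnapshotError(instance_id)
            self._rows[instance_id] = (current + 1, dict(state))
            return current + 1


store = SnapshotStore()
version_a, state_a = store.load("wf-1")  # node A reads version 0
version_b, state_b = store.load("wf-1")  # node B reads version 0 as well
new_version = store.save("wf-1", version_a, {"step": 1})
try:
    store.save("wf-1", version_b, {"step": 1})
    second_write = "accepted"
except StaleSnapshotError:
    second_write = "rejected"
```

Here the second writer is rejected instead of silently overwriting state, which is the property the implications above require.
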
### 3.2 Durability

The system of record must be durable across:

- process restart
- node restart
- full cluster restart
- database restart

Workflow progress, pending waits, active tasks, and due timers must not be lost.

### 3.3 No Polling

Signal-driven wake-up is mandatory.

The engine must not rely on a periodic database scan loop to discover work. Blocking or event-driven delivery is required for:

- task completion wake-up
- delayed resume wake-up
- subworkflow completion wake-up
- external signal wake-up

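
The difference between blocking delivery and a scan loop can be shown with a plain in-process queue. This sketch uses Python's standard `queue.Queue` purely as an analogy for a blocking dequeue; the signal strings and the `demo` helper are made up for the example, and nothing here represents the Oracle AQ API.

```python
import queue
import threading


def run_worker(signals: queue.Queue, handled: list, stop: object) -> None:
    while True:
        # Blocks until a signal arrives; there is no sleep-and-scan cycle.
        signal = signals.get()
        if signal is stop:
            break
        handled.append(signal)


def demo() -> list:
    signals: queue.Queue = queue.Queue()
    handled: list = []
    stop = object()  # sentinel used only to end the demo worker
    worker = threading.Thread(target=run_worker, args=(signals, handled, stop))
    worker.start()
    signals.put("task-completed:42")
    signals.put("timer-due:7")
    signals.put(stop)
    worker.join()
    return handled


result = demo()
```

The worker consumes no CPU and issues no queries while idle; it wakes only when a producer enqueues a signal.
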
### 3.4 One Database

Oracle is the shared durable state backend for:

- workflow projections
- workflow runtime snapshots
- host coordination
- signal and schedule durability through Oracle AQ

Redis may exist in the wider platform, but it is not required for engine correctness.

### 3.5 Observability

The engine must produce enough telemetry to answer:

- which instance is waiting
- why it is waiting
- which signal resumed it
- which node executed it
- which definition version it used
- why it failed
- whether a message was retried, dead-lettered, or ignored as stale

### 3.6 Compatibility

The engine must preserve the existing public workflow service contracts unless a future product change explicitly changes them.

The following service-contract groups are especially important:

- workflow start contracts
- workflow definition contracts
- workflow task contracts
- workflow instance contracts
- workflow operational contracts

## 4. Explicit V1 Assumptions

These assumptions simplify the engine architecture and are intentional.

### 4.1 Single Active Runtime Provider Per Deployment

The service runs one engine provider at a time.

This means:

- no mixed-provider instance routing
- no live migration between engines
- no simultaneous old-runtime and engine execution inside one deployment

The design still keeps abstractions around the runtime, signaling bus, and scheduler so that future replacement remains possible.

### 4.2 Canonical Runtime, Not Elsa Activity Runtime

The target engine executes canonical workflow definitions directly.

Authored C# remains the source of truth, but runtime semantics are driven by canonical definitions compiled from that source.

### 4.3 Oracle AQ Is The Default Event Backbone

Oracle AQ is treated as part of the durable engine platform because it satisfies:

- one-database architecture
- blocking dequeue
- durable delivery
- delayed delivery
- transactional behavior

## 5. Design Principles

### 5.1 Keep The Product Surface Stable

The workflow service remains the product boundary. The engine is an internal subsystem.

### 5.2 Separate Read Model From Runtime Model

Task and instance projections are optimized for product reads.

Runtime snapshots are optimized for deterministic resume.

They are related, but they are not the same data structure.

### 5.3 Run To Wait

The engine should never keep a workflow instance “hot” in memory for correctness.

Execution should run until:

- a task is activated
- a timer is scheduled
- an external signal wait is registered
- the workflow completes

Then the snapshot is persisted and released.

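
The run-to-wait loop described above might look roughly like this. It is a simplified sketch under stated assumptions: workflows are modeled as a flat list of step functions, each step returns a wait kind (`"task"`, `"timer"`, `"signal"`) or `None` to keep running, and `Snapshot`, `run_to_wait`, and `persist` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Snapshot:
    instance_id: str
    step_index: int = 0
    state: dict = field(default_factory=dict)
    waiting_on: Optional[str] = None
    completed: bool = False


# A step mutates state and returns a wait kind, or None to keep running.
Step = Callable[[dict], Optional[str]]


def run_to_wait(snapshot: Snapshot, steps: list, persist: Callable) -> Snapshot:
    snapshot.waiting_on = None
    while snapshot.step_index < len(steps):
        wait = steps[snapshot.step_index](snapshot.state)
        snapshot.step_index += 1
        if wait is not None:
            snapshot.waiting_on = wait  # wait boundary reached
            break
    else:
        snapshot.completed = True  # ran off the end without hitting a wait
    persist(snapshot)  # persisted once per run, then released by the caller
    return snapshot


def set_a(state: dict) -> Optional[str]:
    state["a"] = 1
    return None


def activate_task(state: dict) -> Optional[str]:
    state["task"] = "review"
    return "task"


def set_b(state: dict) -> Optional[str]:
    state["b"] = 2
    return None


persisted = []
snap = run_to_wait(Snapshot("wf-1"), [set_a, activate_task, set_b], persisted.append)
waiting_after_start = snap.waiting_on
# Later, a task-completion signal triggers a second run from the stored index.
snap = run_to_wait(snap, [set_a, activate_task, set_b], persisted.append)
```

Each invocation is self-contained: it starts from the stored `step_index`, stops at the next wait boundary or at completion, and writes the snapshot exactly once.
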
### 5.4 Make Delivery At-Least-Once And Resume Idempotent

Distributed delivery is never exactly-once in practice.

The engine must treat duplicate signals, duplicate wake-ups, and late timer arrivals as normal conditions.

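
One possible way to make resume idempotent is to give every registered wait an identifier and to accept a signal only if it names the wait the snapshot is currently parked on. The identifier scheme, `WaitingSnapshot`, and `handle_signal` below are assumptions for illustration; the document does not specify the mechanism.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WaitingSnapshot:
    instance_id: str
    pending_wait_id: Optional[str]  # hypothetical: id of the wait currently registered
    resume_count: int = 0


def handle_signal(snapshot: WaitingSnapshot, wait_id: str) -> str:
    """Resume at most once per wait; duplicates and late arrivals are no-ops."""
    if snapshot.pending_wait_id != wait_id:
        return "ignored_stale"
    snapshot.pending_wait_id = None  # consume the wait before continuing
    snapshot.resume_count += 1
    return "resumed"


snap = WaitingSnapshot("wf-1", pending_wait_id="wait-7")
outcomes = [
    handle_signal(snap, "wait-7"),  # first delivery
    handle_signal(snap, "wait-7"),  # duplicate delivery
    handle_signal(snap, "wait-3"),  # late timer for an older wait
]
```

In a real deployment the check and the consumption of the wait would need to happen in the same transaction as the snapshot update; otherwise two nodes could both pass the check.
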
### 5.5 Keep Signals Small

Signals should identify work, not carry the full workflow state.

The database snapshot remains authoritative.

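
As an illustration of a signal that identifies work without carrying state, the envelope below holds only identifiers. The field names and the JSON encoding are hypothetical choices for this sketch, not the engine's wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SignalEnvelope:
    instance_id: str   # which workflow instance to load
    wait_id: str       # which registered wait this signal targets
    signal_type: str   # e.g. "task-completed" or "timer-due" (illustrative values)


def encode(signal: SignalEnvelope) -> bytes:
    return json.dumps(asdict(signal), separators=(",", ":")).encode("utf-8")


def decode(raw: bytes) -> SignalEnvelope:
    return SignalEnvelope(**json.loads(raw.decode("utf-8")))


original = SignalEnvelope("wf-1", "wait-7", "task-completed")
raw = encode(original)
round_tripped = decode(raw)
```

The receiver uses `instance_id` to load the authoritative snapshot from the database, so a lost or stale payload can never overwrite workflow state.
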
### 5.6 Keep Abstractions At The Backend Boundary

Abstract:

- runtime provider
- signal bus
- schedule bus
- snapshot store

Do not abstract away the workflow semantics themselves.

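
The backend boundary can be pictured as a handful of narrow interfaces that engine code depends on, with concrete backends plugged in behind them. The method names and signatures below are assumptions for the sake of the sketch (the runtime provider is omitted for brevity), written with Python's `typing.Protocol`.

```python
from typing import Optional, Protocol


class SignalBus(Protocol):
    def publish(self, instance_id: str, wait_id: str) -> None: ...


class ScheduleBus(Protocol):
    def publish_at(self, instance_id: str, wait_id: str, due_epoch_s: float) -> None: ...


class SnapshotStore(Protocol):
    def load(self, instance_id: str) -> Optional[dict]: ...
    def save(self, instance_id: str, snapshot: dict) -> None: ...


class InMemorySnapshotStore:
    """Test double; a production backend would sit behind the same interface."""

    def __init__(self) -> None:
        self._rows: dict = {}

    def load(self, instance_id: str) -> Optional[dict]:
        return self._rows.get(instance_id)

    def save(self, instance_id: str, snapshot: dict) -> None:
        self._rows[instance_id] = dict(snapshot)


class InMemorySignalBus:
    def __init__(self) -> None:
        self.published: list = []

    def publish(self, instance_id: str, wait_id: str) -> None:
        self.published.append((instance_id, wait_id))


def request_resume(store: SnapshotStore, bus: SignalBus,
                   instance_id: str, wait_id: str) -> bool:
    """Publish a wake-up only for instances that have a stored snapshot."""
    if store.load(instance_id) is None:
        return False
    bus.publish(instance_id, wait_id)
    return True


store = InMemorySnapshotStore()
bus = InMemorySignalBus()
store.save("wf-1", {"step": 2})
known = request_resume(store, bus, "wf-1", "wait-7")
unknown = request_resume(store, bus, "wf-404", "wait-1")
```

Note that `request_resume` knows nothing about queues or databases, while workflow semantics such as branching, tasks, and subworkflows are deliberately not hidden behind any interface.
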
### 5.7 Prefer Transactional Consistency Over Cleverness

If a feature can be made transactional in Oracle, prefer that over eventually-consistent coordination tricks.

## 6. Success Criteria

The engine architecture is successful when:

- the service can start and complete workflows without Elsa
- task projections remain correct
- delayed resumes happen without polling
- a stopped cluster resumes safely after restart
- a multi-node deployment does not corrupt workflow state
- canonical definitions remain the execution contract
- operations can inspect and support the system with existing product-level APIs