release orchestrator pivot, architecture and planning

2026-01-10 22:37:22 +02:00
parent c84f421e2f
commit d509c44411
130 changed files with 70292 additions and 721 deletions
--- a/docs/modules/release-orchestrator/workflow/execution.md
+++ b/docs/modules/release-orchestrator/workflow/execution.md
@@ -0,0 +1,591 @@
+# Workflow Execution
+
+## Overview
+
+The Workflow Engine executes workflow templates as DAGs (Directed Acyclic Graphs) of steps, managing state transitions, parallelism, retries, and failure handling.
+
+## Execution Architecture
+
+```
+                         WORKFLOW EXECUTION ARCHITECTURE
+
+  ┌─────────────────────────────────────────────────────────────────────────────┐
+  │                           WORKFLOW ENGINE                                    │
+  │                                                                             │
+  │  ┌─────────────────────────────────────────────────────────────────────┐   │
+  │  │                        WORKFLOW RUNNER                               │   │
+  │  │                                                                      │   │
+  │  │  ┌────────────┐    ┌────────────┐    ┌────────────┐                │   │
+  │  │  │ Template   │───►│ Execution  │───►│ Context    │                │   │
+  │  │  │ Parser     │    │ Planner    │    │ Builder    │                │   │
+  │  │  └────────────┘    └────────────┘    └────────────┘                │   │
+  │  │         │                │                 │                        │   │
+  │  │         └────────────────┼─────────────────┘                        │   │
+  │  │                          ▼                                          │   │
+  │  │  ┌─────────────────────────────────────────────────────────────┐   │   │
+  │  │  │                    DAG EXECUTOR                              │   │   │
+  │  │  │                                                              │   │   │
+  │  │  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐   │   │   │
+  │  │  │  │ Ready    │  │ Running  │  │ Waiting  │  │ Completed│   │   │   │
+  │  │  │  │ Queue    │  │ Set      │  │ Set      │  │ Set      │   │   │   │
+  │  │  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘   │   │   │
+  │  │  │                                                              │   │   │
+  │  │  │  ┌──────────────────────────────────────────────────────┐   │   │   │
+  │  │  │  │                 STEP DISPATCHER                       │   │   │   │
+  │  │  │  └──────────────────────────────────────────────────────┘   │   │   │
+  │  │  └─────────────────────────────────────────────────────────────┘   │   │
+  │  └─────────────────────────────────────────────────────────────────────┘   │
+  │                                    │                                        │
+  │                                    ▼                                        │
+  │  ┌─────────────────────────────────────────────────────────────────────┐   │
+  │  │                        STEP EXECUTOR POOL                            │   │
+  │  │                                                                      │   │
+  │  │  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐   │   │
+  │  │  │ Executor 1 │  │ Executor 2 │  │ Executor 3 │  │ Executor N │   │   │
+  │  │  └────────────┘  └────────────┘  └────────────┘  └────────────┘   │   │
+  │  │                                                                      │   │
+  │  └─────────────────────────────────────────────────────────────────────┘   │
+  │                                                                             │
+  └─────────────────────────────────────────────────────────────────────────────┘
+```
+
+## Workflow Run State Machine
+
+```
+                         WORKFLOW RUN STATES
+
+                            ┌──────────┐
+                            │ CREATED  │
+                            └────┬─────┘
+                                 │ start()
+                                 ▼
+                            ┌──────────┐
+                            │ RUNNING  │◄──────────────────┐
+                            └────┬─────┘                   │
+                                 │                         │
+             ┌───────────────────┼───────────────────┐     │
+             │                   │                   │     │
+             ▼                   ▼                   ▼     │
+        ┌──────────┐       ┌──────────┐       ┌──────────┐│
+        │ WAITING  │       │ PAUSED   │       │ FAILING  ││
+        │ APPROVAL │       │          │       │          ││
+        └────┬─────┘       └────┬─────┘       └────┬─────┘│
+             │                  │                  │      │
+             │ approve()        │ resume()         │      │
+             │                  │                  │      │
+             └───────────────►──┴──────────────────┘      │
+                                │                         │
+                                └─────────────────────────┘
+                                │
+        ┌───────────────────────┼───────────────────┐
+        │                       │                   │
+        ▼                       ▼                   ▼
+   ┌──────────┐           ┌──────────┐       ┌──────────┐
+   │COMPLETED │           │ FAILED   │       │ CANCELLED│
+   └──────────┘           └──────────┘       └──────────┘
+```
+
+### State Transitions
+
+| Current State | Event | Next State | Description |
+|---------------|-------|------------|-------------|
+| `created` | `start()` | `running` | Begin workflow execution |
+| `running` | Step requires approval | `waiting_approval` | Pause for human approval |
+| `running` | `pause()` | `paused` | Manual pause requested |
+| `running` | Step fails | `failing` | Handle failure path |
+| `running` | All steps complete | `completed` | Workflow success |
+| `waiting_approval` | `approve()` | `running` | Resume after approval |
+| `waiting_approval` | `reject()` | `failed` | Rejection ends workflow |
+| `paused` | `resume()` | `running` | Resume execution |
+| `paused` | `cancel()` | `cancelled` | Cancel workflow |
+| `failing` | Rollback complete | `failed` | Failure handling done |
+| `failing` | Fallback path succeeds | `running` | Resume with fallback |
+
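+The table above can be read as a lookup from a (state, event) pair to the next state. A minimal Python sketch of that reading follows; the event names are hypothetical labels chosen for illustration, and only the state names come from the table.
+
+```python
+# Hypothetical event names; only the state names come from the table above.
+TRANSITIONS = {
+    ("created", "start"): "running",
+    ("running", "approval_required"): "waiting_approval",
+    ("running", "pause"): "paused",
+    ("running", "step_failed"): "failing",
+    ("running", "all_steps_complete"): "completed",
+    ("waiting_approval", "approve"): "running",
+    ("waiting_approval", "reject"): "failed",
+    ("paused", "resume"): "running",
+    ("paused", "cancel"): "cancelled",
+    ("failing", "rollback_complete"): "failed",
+    ("failing", "fallback_succeeded"): "running",
+}
+
+
+def next_state(state: str, event: str) -> str:
+    """Return the next run state, or raise if the table has no such row."""
+    try:
+        return TRANSITIONS[(state, event)]
+    except KeyError:
+        raise ValueError(f"illegal transition: {event!r} in state {state!r}") from None
+```
+
+Rejecting unknown pairs explicitly keeps terminal states such as `completed` from being left by accident.
+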
+## Step Execution State Machine
+
+```
+                           STEP STATES
+
+                           ┌──────────┐
+                           │ PENDING  │
+                           └────┬─────┘
+                                │ schedule()
+                                ▼
+                           ┌──────────┐
+                           │ QUEUED   │
+                           └────┬─────┘
+                                │ dispatch()
+                                ▼
+                           ┌──────────┐
+                           │ RUNNING  │◄─────────┐
+                           └────┬─────┘          │
+                                │                │ retry()
+            ┌───────────────────┼───────────────┐│
+            │                   │               ││
+            ▼                   ▼               ▼│
+       ┌──────────┐       ┌──────────┐    ┌──────────┐
+       │SUCCEEDED │       │ FAILED   │    │ RETRYING │
+       └──────────┘       └────┬─────┘    └──────────┘
+                               │
+                               ▼
+                    ┌─────────────────────┐
+                    │  FAILURE HANDLER    │
+                    │  ┌───────────────┐  │
+                    │  │ fail          │──┼─► Mark workflow failing
+                    │  │ continue      │──┼─► Continue to next step
+                    │  │ rollback      │──┼─► Trigger rollback path
+                    │  │ goto:{nodeId} │──┼─► Jump to specific node
+                    │  └───────────────┘  │
+                    └─────────────────────┘
+```
+
+### Step States
+
+| State | Description |
+|-------|-------------|
+| `pending` | Step not yet ready (dependencies incomplete) |
+| `queued` | Ready for execution, waiting for executor |
+| `running` | Currently executing |
+| `succeeded` | Completed successfully |
+| `failed` | Failed after all retries exhausted |
+| `retrying` | Failed, waiting for retry |
+| `skipped` | Condition evaluated to false |
+
+## DAG Execution Algorithm
+
+```python
+import asyncio
+from datetime import datetime, timezone
+from typing import List
+
+# WorkflowRun, WorkflowStatus, StepRun, StepStatus and DeadlockException
+# are engine types defined elsewhere.
+
+class DAGExecutor:
+    def __init__(self, workflow_run: WorkflowRun):
+        self.run = workflow_run
+        self.template = workflow_run.template
+        self.pending = set(node.id for node in self.template.nodes)
+        self.running = set()
+        self.completed = set()
+        self.failed = set()
+        self.outputs = {}  # nodeId -> outputs
+        self.tasks = set()  # strong references to in-flight tasks
+
+    async def execute(self):
+        """Main execution loop."""
+        self.run.status = WorkflowStatus.RUNNING
+        self.run.started_at = datetime.now(timezone.utc)
+
+        while self.pending or self.running:
+            # Find ready nodes (all dependencies satisfied)
+            ready = self.find_ready_nodes()
+
+            # Dispatch ready nodes
+            for node_id in ready:
+                self.pending.remove(node_id)
+                self.running.add(node_id)
+                # Hold a reference so the task is not garbage-collected
+                task = asyncio.create_task(self.execute_node(node_id))
+                self.tasks.add(task)
+                task.add_done_callback(self.tasks.discard)
+
+            # Wait for any node to complete
+            if self.running:
+                await self.wait_for_completion()
+
+            # Nothing is ready or running, yet nodes remain pending
+            if not ready and self.pending and not self.running:
+                if self.failed:
+                    # Dependents of a failed step can never become ready
+                    break
+                raise DeadlockException(self.pending)
+
+        # Determine final status
+        if self.failed:
+            self.run.status = WorkflowStatus.FAILED
+        else:
+            self.run.status = WorkflowStatus.COMPLETED
+
+        self.run.completed_at = datetime.now(timezone.utc)
+
+    def find_ready_nodes(self) -> List[str]:
+        """Find nodes whose dependencies are all complete."""
+        ready = []
+        # Iterate over a snapshot: mark_skipped() may mutate self.pending
+        for node_id in list(self.pending):
+            node = self.template.get_node(node_id)
+
+            # Check all incoming edges whose edge condition holds
+            incoming = self.template.get_incoming_edges(node_id)
+            dependencies_met = all(
+                edge.from_node in self.completed
+                for edge in incoming
+                if self.evaluate_edge_condition(edge)
+            )
+            if not dependencies_met:
+                continue
+
+            # Evaluate the node condition only after its dependencies are
+            # done, so the condition can read upstream outputs
+            if node.condition and not self.evaluate_condition(node.condition):
+                self.mark_skipped(node_id)
+                continue
+
+            ready.append(node_id)
+
+        return ready
+
+    async def execute_node(self, node_id: str, attempt: int = 0):
+        """Execute a single node; `attempt` counts earlier failed attempts."""
+        node = self.template.get_node(node_id)
+        step_run = StepRun(
+            workflow_run_id=self.run.id,
+            node_id=node_id,
+            status=StepStatus.RUNNING,
+            attempt_number=attempt
+        )
+
+        try:
+            # Resolve inputs
+            inputs = self.resolve_inputs(node)
+
+            # Get step executor
+            executor = self.step_registry.get_executor(node.type)
+
+            # Execute with timeout (asyncio.timeout requires Python 3.11+)
+            async with asyncio.timeout(node.timeout):
+                outputs = await executor.execute(inputs, node.config)
+
+            # Store outputs
+            self.outputs[node_id] = outputs
+            step_run.outputs = outputs
+            step_run.status = StepStatus.SUCCEEDED
+
+            self.running.remove(node_id)
+            self.completed.add(node_id)
+
+        except Exception as e:
+            await self.handle_step_failure(node, step_run, e)
+
+    async def handle_step_failure(self, node, step_run, error):
+        """Handle step failure according to retry and failure policies."""
+        step_run.attempt_number += 1
+
+        # Check retry policy
+        if step_run.attempt_number <= node.retry_policy.max_retries:
+            if self.is_retryable(error, node.retry_policy):
+                step_run.status = StepStatus.RETRYING
+                delay = self.calculate_backoff(node.retry_policy, step_run.attempt_number)
+                await asyncio.sleep(delay)
+                # Carry the attempt count forward so retries stay bounded
+                await self.execute_node(node.id, attempt=step_run.attempt_number)
+                return
+
+        # No more retries - handle failure
+        step_run.status = StepStatus.FAILED
+        step_run.error = str(error)
+
+        match node.on_failure:  # match statement requires Python 3.10+
+            case "fail":
+                self.run.status = WorkflowStatus.FAILING
+                self.failed.add(node.id)
+            case "continue":
+                self.completed.add(node.id)  # Continue as if succeeded
+            case "rollback":
+                await self.trigger_rollback(node)
+            case _ if node.on_failure.startswith("goto:"):
+                # Split once so node ids that contain ":" stay intact
+                target = node.on_failure.split(":", 1)[1]
+                self.pending.add(target)  # Add target to pending
+
+        self.running.remove(node.id)
+```
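+
+The executor above calls `calculate_backoff` without defining it. One common choice is capped exponential backoff, sketched below under stated assumptions: apart from `max_retries`, every `RetryPolicy` field here is hypothetical, since the source does not show the policy's shape.
+
+```python
+import random
+from dataclasses import dataclass
+
+
+@dataclass
+class RetryPolicy:
+    # Hypothetical fields; the source only names `max_retries`.
+    max_retries: int = 3
+    base_delay: float = 1.0   # seconds before the first retry
+    multiplier: float = 2.0   # growth factor per attempt
+    max_delay: float = 60.0   # upper bound on any single delay
+    jitter: float = 0.0       # fraction of the delay to randomize (0 disables)
+
+
+def calculate_backoff(policy: RetryPolicy, attempt: int) -> float:
+    """Delay in seconds before retry number `attempt` (1-based)."""
+    delay = policy.base_delay * policy.multiplier ** (attempt - 1)
+    if policy.jitter:
+        # Spread retries out so failing steps do not retry in lockstep
+        delay *= 1 + random.uniform(-policy.jitter, policy.jitter)
+    # Cap last, so the bound holds even with jitter applied
+    return min(delay, policy.max_delay)
+```
+
+With the defaults, attempts 1, 2 and 3 wait 1, 2 and 4 seconds, and no attempt waits longer than 60 seconds.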
+
+## Input Resolution
+
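+Each step input is bound to one of five sources: a literal value, a path into the execution context, an output of an earlier step, a secret, or an expression: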
+
+```typescript
+interface InputResolver {
+  resolve(binding: InputBinding, context: ExecutionContext): any;
+}
+
+class StandardInputResolver implements InputResolver {
+  resolve(binding: InputBinding, context: ExecutionContext): any {
+    switch (binding.source.type) {
+      case "literal":
+        return binding.source.value;
+
+      case "context":
+        // Navigate context path: "release.name" -> context.release.name
+        return this.navigatePath(context, binding.source.path);
+
+      case "output":
+        // Get output from previous step
+        const stepOutputs = context.stepOutputs[binding.source.nodeId];
+        return stepOutputs?.[binding.source.outputName];
+
+      case "secret":
+        // Fetch from vault (never cached)
+        return this.secretsClient.fetch(binding.source.secretName);
+
+      case "expression":
+        // Evaluate JavaScript expression
+        return this.expressionEvaluator.evaluate(
+          binding.source.expression,
+          context
+        );
+    }
+  }
+}
+```
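+
+The `context` case relies on a `navigatePath` helper that is not shown. A minimal Python sketch of dotted-path lookup follows; the function name `navigate_path` and the choice to return `None` for a missing segment (mirroring the optional chaining used for step outputs) are assumptions, not the engine's confirmed behavior.
+
+```python
+from typing import Any
+
+
+def navigate_path(context: Any, path: str) -> Any:
+    """Resolve a dotted path such as "release.name" against a context.
+
+    Works on dicts and on plain objects; returns None when any segment
+    is missing instead of raising.
+    """
+    current = context
+    for segment in path.split("."):
+        if current is None:
+            return None
+        if isinstance(current, dict):
+            current = current.get(segment)
+        else:
+            current = getattr(current, segment, None)
+    return current
+```
+
+For example, `navigate_path({"release": {"name": "v1"}}, "release.name")` yields `"v1"`.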
+
+## Execution Context
+
+The execution context provides data available to all steps:
+
+```typescript
+interface ExecutionContext {
+  // Workflow identifiers
+  workflowRunId: UUID;
+  templateId: UUID;
+  templateVersion: number;
+
+  // Input values
+  inputs: Record<string, any>;
+
+  // Domain objects (loaded at start)
+  release?: Release;
+  promotion?: Promotion;
+  environment?: Environment;
+  targets?: Target[];
+
+  // Step outputs (accumulated during execution)
+  stepOutputs: Record<string, Record<string, any>>;
+
+  // Tenant context
+  tenantId: UUID;
+  userId: UUID;
+
+  // Metadata
+  startedAt: DateTime;
+  correlationId: string;
+}
+```
+
+## Concurrency Control
+
+### Parallelism Within Workflows
+
+```typescript
+interface ParallelConfig {
+  maxConcurrency: number;     // Max simultaneous steps
+  failFast: boolean;          // Stop all on first failure
+}
+
+// Example: Parallel deployment to multiple targets
+const parallelDeploy: StepNode = {
+  id: "parallel-deploy",
+  type: "parallel",
+  config: {
+    maxConcurrency: 5,
+    failFast: false
+  },
+  children: [
+    { id: "deploy-target-1", type: "deploy-docker", ... },
+    { id: "deploy-target-2", type: "deploy-docker", ... },
+    { id: "deploy-target-3", type: "deploy-docker", ... },
+  ]
+};
+```
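+
+A `maxConcurrency` cap of this kind is often enforced with a semaphore. The sketch below shows the idea in Python with `asyncio.Semaphore`; `run_bounded` is a hypothetical helper, and `failFast` handling is deliberately left out.
+
+```python
+import asyncio
+from typing import Awaitable, Callable, List
+
+
+async def run_bounded(
+    steps: List[Callable[[], Awaitable[str]]],
+    max_concurrency: int,
+) -> List[str]:
+    """Run all steps, never more than `max_concurrency` at once."""
+    semaphore = asyncio.Semaphore(max_concurrency)
+
+    async def guarded(step: Callable[[], Awaitable[str]]) -> str:
+        # Each child waits here until a slot is free
+        async with semaphore:
+            return await step()
+
+    # gather returns results in the same order as the input steps
+    return await asyncio.gather(*(guarded(step) for step in steps))
+```
+
+All children are scheduled immediately, but only `max_concurrency` of them hold the semaphore at any moment; the rest wait without consuming an executor.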
+
+### Global Concurrency Limits
+
+```typescript
+interface ConcurrencyLimits {
+  maxWorkflowsPerTenant: number;        // Concurrent workflow runs
+  maxStepsPerWorkflow: number;          // Concurrent steps per workflow
+  maxDeploymentsPerEnvironment: number; // Prevent deployment conflicts
+}
+
+// Default limits
+const defaults: ConcurrencyLimits = {
+  maxWorkflowsPerTenant: 10,
+  maxStepsPerWorkflow: 20,
+  maxDeploymentsPerEnvironment: 1  // One deployment at a time
+};
+```
+
+## Checkpoint and Resume
+
+The engine saves a checkpoint after each step completes, so a long-running workflow can resume after a service restart:
+
+```typescript
+interface WorkflowCheckpoint {
+  workflowRunId: UUID;
+  checkpointedAt: DateTime;
+
+  // Execution state
+  pendingNodes: string[];
+  completedNodes: string[];
+  failedNodes: string[];
+
+  // Accumulated data
+  stepOutputs: Record<string, Record<string, any>>;
+
+  // Context snapshot
+  contextSnapshot: ExecutionContext;
+}
+
+class CheckpointManager {
+  // Save checkpoint after each step completion
+  async saveCheckpoint(run: WorkflowRun): Promise<void> {
+    const checkpoint: WorkflowCheckpoint = {
+      workflowRunId: run.id,
+      checkpointedAt: new Date(),
+      // In-flight steps are saved as pending so they are re-dispatched on resume
+      pendingNodes: [...run.executor.pending, ...run.executor.running],
+      completedNodes: Array.from(run.executor.completed),
+      failedNodes: Array.from(run.executor.failed),
+      stepOutputs: run.executor.outputs,
+      contextSnapshot: run.context
+    };
+
+    await this.repository.save(checkpoint);
+  }
+
+  // Resume from checkpoint after service restart
+  async resumeFromCheckpoint(workflowRunId: UUID): Promise<WorkflowRun> {
+    const checkpoint = await this.repository.get(workflowRunId);
+
+    const run = new WorkflowRun();
+    run.executor.pending = new Set(checkpoint.pendingNodes);
+    run.executor.completed = new Set(checkpoint.completedNodes);
+    run.executor.failed = new Set(checkpoint.failedNodes);
+    run.executor.outputs = checkpoint.stepOutputs;
+    run.context = checkpoint.contextSnapshot;
+
+    // Resume execution
+    await run.executor.execute();
+    return run;
+  }
+}
+```
+
+## Timeout Handling
+
+```typescript
+interface TimeoutConfig {
+  stepTimeout: number;       // Per-step timeout (seconds)
+  workflowTimeout: number;   // Total workflow timeout (seconds)
+}
+
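+class TimeoutHandler {
+  async executeWithTimeout<T>(
+    operation: (signal: AbortSignal) => Promise<T>,
+    timeoutSeconds: number,
+    onTimeout: () => Promise<void>
+  ): Promise<T> {
+    const controller = new AbortController();
+    let timeoutId: ReturnType<typeof setTimeout> | undefined;
+
+    // Rejects when the deadline passes; also aborts the signal so a
+    // cooperative operation can stop its own work.
+    const deadline = new Promise<never>((_, reject) => {
+      timeoutId = setTimeout(() => {
+        controller.abort();
+        reject(new TimeoutException(timeoutSeconds));
+      }, timeoutSeconds * 1000);
+    });
+
+    try {
+      return await Promise.race([operation(controller.signal), deadline]);
+    } catch (error) {
+      if (error instanceof TimeoutException) {
+        await onTimeout();
+      }
+      throw error;
+    } finally {
+      // Always clear the timer, on success and on failure
+      clearTimeout(timeoutId);
+    }
+  }
+}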
+```
+
+## Event Emission
+
+The workflow engine emits events for observability:
+
+```typescript
+type WorkflowEvent =
+  | { type: "workflow.started"; workflowRunId: UUID; templateId: UUID }
+  | { type: "workflow.completed"; workflowRunId: UUID; status: string }
+  | { type: "workflow.failed"; workflowRunId: UUID; error: string }
+  | { type: "step.started"; workflowRunId: UUID; nodeId: string }
+  | { type: "step.completed"; workflowRunId: UUID; nodeId: string; outputs: any }
+  | { type: "step.failed"; workflowRunId: UUID; nodeId: string; error: string }
+  | { type: "step.retrying"; workflowRunId: UUID; nodeId: string; attempt: number };
+
+class WorkflowEventEmitter {
+  private subscribers: Map<string, ((event: WorkflowEvent) => void)[]> = new Map();
+
+  emit(event: WorkflowEvent): void {
+    const handlers = this.subscribers.get(event.type) || [];
+    for (const handler of handlers) {
+      handler(event);
+    }
+
+    // Also emit to event bus for external consumers
+    this.eventBus.publish("workflow.events", event);
+  }
+}
+```
+
+## Execution Monitoring
+
+### Real-time Progress
+
+```typescript
+interface WorkflowProgress {
+  workflowRunId: UUID;
+  status: WorkflowStatus;
+
+  // Step progress
+  totalSteps: number;
+  completedSteps: number;
+  runningSteps: number;
+  failedSteps: number;
+
+  // Current activity
+  currentNodes: string[];
+
+  // Timing
+  startedAt: DateTime;
+  estimatedCompletion?: DateTime;
+
+  // Step details
+  steps: StepProgress[];
+}
+
+interface StepProgress {
+  nodeId: string;
+  nodeName: string;
+  status: StepStatus;
+  startedAt?: DateTime;
+  completedAt?: DateTime;
+  attempt: number;
+  logs?: string;
+}
+```
+
+### WebSocket Streaming
+
+```typescript
+// Client subscribes to workflow progress
+const ws = new WebSocket(`/api/v1/workflow-runs/${runId}/stream`);
+
+ws.onmessage = (event) => {
+  const progress: WorkflowProgress = JSON.parse(event.data);
+  updateUI(progress);
+};
+
+// Server streams updates
+class WorkflowStreamHandler {
+  async stream(runId: UUID, connection: WebSocket): Promise<void> {
+    const subscription = this.eventBus.subscribe(`workflow.${runId}.*`);
+
+    for await (const event of subscription) {
+      const progress = await this.buildProgress(runId);
+      connection.send(JSON.stringify(progress));
+
+      // Stop streaming once the run reaches any terminal state
+      if (['completed', 'failed', 'cancelled'].includes(progress.status)) {
+        break;
+      }
+    }
+
+    connection.close();
+  }
+}
+```
+
+## References
+
+- [Workflow Templates](templates.md)
+- [Workflow Engine Module](../modules/workflow-engine.md)
+- [Promotion Manager](../modules/promotion-manager.md)
--- a/docs/modules/release-orchestrator/workflow/promotion.md
+++ b/docs/modules/release-orchestrator/workflow/promotion.md
@@ -0,0 +1,405 @@
+# Promotion State Machine
+
+## Overview
+
+Promotions move releases through environments (Dev -> Staging -> Production). The promotion state machine manages the lifecycle from request to completion.
+
+## Promotion States
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                    PROMOTION STATE MACHINE                                  │
+│                                                                             │
+│                         ┌──────────────────┐                               │
+│                         │ PENDING_APPROVAL │ (initial)                     │
+│                         └────────┬─────────┘                               │
+│                                  │                                          │
+│               ┌──────────────────┼──────────────────┐                      │
+│               │                  │                  │                      │
+│               ▼                  ▼                  ▼                      │
+│      ┌────────────────┐ ┌────────────────┐ ┌────────────────┐             │
+│      │   REJECTED     │ │  PENDING_GATE  │ │   CANCELLED    │             │
+│      └────────────────┘ └────────┬───────┘ └────────────────┘             │
+│                                  │                                          │
+│                                  │ gates pass                               │
+│                                  ▼                                          │
+│                         ┌────────────────┐                                 │
+│                         │    APPROVED    │                                 │
+│                         └────────┬───────┘                                 │
+│                                  │                                          │
+│                                  │ start deployment                         │
+│                                  ▼                                          │
+│                         ┌────────────────┐                                 │
+│                         │   DEPLOYING    │                                 │
+│                         └────────┬───────┘                                 │
+│                                  │                                          │
+│               ┌──────────────────┼──────────────────┐                      │
+│               │                  │                  │                      │
+│               ▼                  ▼                  ▼                      │
+│      ┌────────────────┐ ┌────────────────┐ ┌────────────────┐             │
+│      │    FAILED      │ │   DEPLOYED     │ │  ROLLED_BACK   │             │
+│      └────────────────┘ └────────────────┘ └────────────────┘             │
+│                                                                             │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+## State Definitions
+
+| State | Description |
+|-------|-------------|
+| `pending_approval` | Awaiting human approval (if required) |
+| `pending_gate` | Awaiting automated gate evaluation |
+| `approved` | All approvals and gates satisfied; ready for deployment |
+| `rejected` | Blocked by approval rejection or gate failure |
+| `deploying` | Deployment in progress |
+| `deployed` | Successfully deployed to target environment |
+| `failed` | Deployment failed (not rolled back) |
+| `cancelled` | Cancelled by user before completion |
+| `rolled_back` | Deployment rolled back to previous version |
+
+## State Transitions
+
+### Valid Transitions
+
+```typescript
+const validTransitions: Record<PromotionStatus, PromotionStatus[]> = {
+  pending_approval: ["pending_gate", "approved", "rejected", "cancelled"],
+  pending_gate: ["approved", "rejected", "cancelled"],
+  approved: ["deploying", "cancelled"],
+  deploying: ["deployed", "failed", "rolled_back"],
+  rejected: [],  // terminal
+  cancelled: [], // terminal
+  deployed: [],  // terminal (for this promotion)
+  failed: ["rolled_back"],  // can trigger rollback
+  rolled_back: [] // terminal
+};
+```
+
+### Transition Events
+
+```typescript
+interface PromotionTransition {
+  promotionId: UUID;
+  fromState: PromotionStatus;
+  toState: PromotionStatus;
+  trigger: TransitionTrigger;
+  triggeredBy: UUID;  // user or system
+  timestamp: DateTime;
+  details: object;
+}
+
+type TransitionTrigger =
+  | "approval_granted"
+  | "approval_rejected"
+  | "gate_passed"
+  | "gate_failed"
+  | "deployment_started"
+  | "deployment_completed"
+  | "deployment_failed"
+  | "rollback_triggered"
+  | "rollback_completed"
+  | "user_cancelled";
+```
+
+## Promotion Flow
+
+### 1. Request Promotion
+
+```typescript
+async function requestPromotion(request: PromotionRequest): Promise<Promotion> {
+  // Validate release exists and is ready
+  const release = await getRelease(request.releaseId);
+  if (release.status !== "ready" && release.status !== "deployed") {
+    throw new Error("Release not ready for promotion");
+  }
+
+  // Validate target environment
+  const environment = await getEnvironment(request.targetEnvironmentId);
+
+  // Check freeze windows
+  if (await isEnvironmentFrozen(environment.id)) {
+    throw new Error("Environment is frozen");
+  }
+
+  // Determine initial state
+  const requiresApproval = environment.requiredApprovals > 0;
+  const initialStatus = requiresApproval ? "pending_approval" : "pending_gate";
+
+  // Create promotion
+  const promotion = await createPromotion({
+    releaseId: request.releaseId,
+    sourceEnvironmentId: release.currentEnvironmentId,
+    targetEnvironmentId: environment.id,
+    status: initialStatus,
+    requestedBy: request.userId,
+    requestReason: request.reason
+  });
+
+  // Emit event
+  await emitEvent("promotion.requested", promotion);
+
+  return promotion;
+}
+```
+
+### 2. Approval Phase
+
+```typescript
+async function processApproval(
+  promotionId: UUID,
+  approverId: UUID,
+  action: "approve" | "reject",
+  comment?: string
+): Promise<Promotion> {
+  const promotion = await getPromotion(promotionId);
+  const environment = await getEnvironment(promotion.targetEnvironmentId);
+
+  // Validate approver can approve
+  await validateApproverPermission(approverId, environment.id);
+
+  // Check separation of duties
+  if (environment.requireSeparationOfDuties) {
+    if (approverId === promotion.requestedBy) {
+      throw new Error("Separation of duties violation: requester cannot approve");
+    }
+  }
+
+  // Record approval
+  await recordApproval({
+    promotionId,
+    approverId,
+    action,
+    comment
+  });
+
+  if (action === "reject") {
+    return await transitionState(promotion, "rejected", {
+      trigger: "approval_rejected",
+      triggeredBy: approverId,
+      details: { reason: comment }
+    });
+  }
+
+  // Check if all required approvals received
+  const approvalCount = await countApprovals(promotionId);
+  if (approvalCount >= environment.requiredApprovals) {
+    return await transitionState(promotion, "pending_gate", {
+      trigger: "approval_granted",
+      triggeredBy: approverId
+    });
+  }
+
+  return promotion;
+}
+```
+
+### 3. Gate Evaluation
+
+```typescript
+async function evaluateGates(promotionId: UUID): Promise<GateEvaluationResult> {
+  const promotion = await getPromotion(promotionId);
+  const environment = await getEnvironment(promotion.targetEnvironmentId);
+  const release = await getRelease(promotion.releaseId);
+
+  const gateResults: GateResult[] = [];
+
+  // Security gate
+  const securityResult = await evaluateSecurityGate(release, environment);
+  gateResults.push(securityResult);
+
+  // Custom policy gates
+  for (const policy of environment.policies) {
+    const policyResult = await evaluatePolicyGate(release, environment, policy);
+    gateResults.push(policyResult);
+  }
+
+  // Aggregate results: only blocking gates can stop the promotion
+  const blockingFailures = gateResults.filter(g => !g.passed && g.blocking);
+  const allowed = blockingFailures.length === 0;
+
+  // Create decision record
+  const decisionRecord = await createDecisionRecord({
+    promotionId,
+    gateResults,
+    decision: allowed ? "allow" : "block",
+    decidedAt: new Date()
+  });
+
+  // Transition state
+  if (allowed) {
+    await transitionState(promotion, "approved", {
+      trigger: "gate_passed",
+      triggeredBy: "system",
+      details: { decisionRecordId: decisionRecord.id }
+    });
+  } else {
+    await transitionState(promotion, "rejected", {
+      trigger: "gate_failed",
+      triggeredBy: "system",
+      details: { blockingGates: blockingFailures }
+    });
+  }
+
+  return { passed: allowed, gateResults, decisionRecord };
+}
+```
+
+### 4. Deployment Execution
+
+```typescript
+async function executeDeployment(promotionId: UUID): Promise<DeploymentJob> {
+  const promotion = await getPromotion(promotionId);
+
+  // Transition to deploying
+  await transitionState(promotion, "deploying", {
+    trigger: "deployment_started",
+    triggeredBy: "system"
+  });
+
+  // Generate artifacts
+  const artifacts = await generateArtifacts(promotion);
+
+  // Create deployment job
+  const job = await createDeploymentJob({
+    promotionId,
+    releaseId: promotion.releaseId,
+    environmentId: promotion.targetEnvironmentId,
+    artifacts
+  });
+
+  // Execute via workflow or direct
+  const workflowRun = await startDeploymentWorkflow(job);
+
+  // Update promotion with workflow reference
+  await updatePromotion(promotionId, { workflowRunId: workflowRun.id });
+
+  return job;
+}
+```
+
+### 5. Completion Handling
+
+```typescript
+async function handleDeploymentCompletion(
+  jobId: UUID,
+  status: "succeeded" | "failed"
+): Promise<Promotion> {
+  const job = await getDeploymentJob(jobId);
+  const promotion = await getPromotion(job.promotionId);
+
+  if (status === "succeeded") {
+    // Generate evidence packet
+    const evidence = await generateEvidencePacket(promotion, job);
+
+    // Update release environment state
+    await updateReleaseEnvironmentState({
+      releaseId: promotion.releaseId,
+      environmentId: promotion.targetEnvironmentId,
+      status: "deployed",
+      promotionId: promotion.id,
+      evidenceRef: evidence.id
+    });
+
+    return await transitionState(promotion, "deployed", {
+      trigger: "deployment_completed",
+      triggeredBy: "system",
+      details: { evidencePacketId: evidence.id }
+    });
+  } else {
+    return await transitionState(promotion, "failed", {
+      trigger: "deployment_failed",
+      triggeredBy: "system",
+      details: { jobId, error: job.errorMessage }
+    });
+  }
+}
+```
+
+## Decision Record
+
+Every gate evaluation produces a decision record that captures the inputs, gate results, and approvals behind the decision:
+
+```typescript
+interface DecisionRecord {
+  id: UUID;
+  promotionId: UUID;
+  decision: "allow" | "block";
+  decidedAt: DateTime;
+
+  // Inputs
+  release: {
+    id: UUID;
+    name: string;
+    components: Array<{ name: string; digest: string }>;
+  };
+  environment: {
+    id: UUID;
+    name: string;
+  };
+
+  // Gate results
+  gateResults: Array<{
+    gateName: string;
+    gateType: string;
+    passed: boolean;
+    blocking: boolean;
+    message: string;
+    details: object;
+    evaluatedAt: DateTime;
+  }>;
+
+  // Approvals
+  approvals: Array<{
+    approverId: UUID;
+    approverName: string;
+    action: "approved" | "rejected";
+    comment?: string;
+    timestamp: DateTime;
+  }>;
+
+  // Context
+  requester: {
+    id: UUID;
+    name: string;
+  };
+  requestReason: string;
+
+  // Signature
+  contentHash: string;
+  signature: string;
+}
+```
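+
+The `contentHash` and `signature` fields suggest that the record is hashed in a canonical form before it is signed. The source does not name the algorithm, so the following is only a minimal sketch under stated assumptions: SHA-256 over a JSON serialization with recursively sorted keys, computed over every field except `contentHash` and `signature`. The helper names `canonicalize` and `computeContentHash` are hypothetical.
+
+```typescript
+import { createHash } from "node:crypto";
+
+// Assumption: canonical form = JSON with object keys sorted recursively,
+// so that two records with the same content always serialize identically.
+function canonicalize(value: unknown): string {
+  if (Array.isArray(value)) {
+    return `[${value.map(canonicalize).join(",")}]`;
+  }
+  if (value !== null && typeof value === "object") {
+    const obj = value as Record<string, unknown>;
+    const entries = Object.keys(obj)
+      .filter((key) => obj[key] !== undefined)
+      .sort()
+      .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`);
+    return `{${entries.join(",")}}`;
+  }
+  // JSON.stringify returns undefined for undefined input; treat it as null.
+  return JSON.stringify(value) ?? "null";
+}
+
+// Hypothetical helper: hashes the record without its own hash and signature.
+function computeContentHash(record: Record<string, unknown>): string {
+  const { contentHash: _hash, signature: _sig, ...payload } = record;
+  return createHash("sha256").update(canonicalize(payload)).digest("hex");
+}
+```
+
+Excluding `contentHash` and `signature` from the hashed payload lets a verifier recompute the hash from a stored record and compare it with the stored value.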
+
+## API Endpoints
+
+```yaml
+# Request promotion
+POST /api/v1/promotions
+Body: { releaseId, targetEnvironmentId, reason? }
+Response: Promotion
+
+# Approve/reject promotion
+POST /api/v1/promotions/{id}/approve
+POST /api/v1/promotions/{id}/reject
+Body: { comment? }
+Response: Promotion
+
+# Cancel promotion
+POST /api/v1/promotions/{id}/cancel
+Response: Promotion
+
+# Get decision record
+GET /api/v1/promotions/{id}/decision
+Response: DecisionRecord
+
+# Preview gates (dry run)
+POST /api/v1/promotions/preview-gates
+Body: { releaseId, targetEnvironmentId }
+Response: { wouldPass: boolean, gates: GateResult[] }
+```
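+
+As an illustration of the dry-run endpoint, the sketch below calls `POST /api/v1/promotions/preview-gates` from a client. It is not the project's client library: the `Transport` type, the `previewGates` function, the JSON content type, and the error handling are assumptions, and `GateResult` models only a few fields taken from the decision record's `gateResults`.
+
+```typescript
+// Assumed minimal shapes; only the fields used here are modeled.
+interface GateResult {
+  gateName: string;
+  passed: boolean;
+  blocking: boolean;
+}
+
+interface GatePreview {
+  wouldPass: boolean;
+  gates: GateResult[];
+}
+
+// Assumption: a fetch-compatible transport is injected, which keeps the
+// sketch independent of any particular HTTP library.
+type Transport = (
+  url: string,
+  init: { method: string; headers: Record<string, string>; body: string }
+) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;
+
+async function previewGates(
+  transport: Transport,
+  baseUrl: string,
+  releaseId: string,
+  targetEnvironmentId: string
+): Promise<GatePreview> {
+  const response = await transport(`${baseUrl}/api/v1/promotions/preview-gates`, {
+    method: "POST",
+    headers: { "Content-Type": "application/json" },
+    body: JSON.stringify({ releaseId, targetEnvironmentId }),
+  });
+  if (!response.ok) {
+    throw new Error(`preview-gates failed with HTTP ${response.status}`);
+  }
+  return (await response.json()) as GatePreview;
+}
+```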
+
+## References
+
+- [Workflow Templates](templates.md)
+- [Workflow Execution](execution.md)
+- [Evidence Schema](../appendices/evidence-schema.md)
--- a/docs/modules/release-orchestrator/workflow/templates.md
+++ b/docs/modules/release-orchestrator/workflow/templates.md
@@ -0,0 +1,327 @@
+# Workflow Template Structure
+
+## Overview
+
+Workflow templates define the DAG (Directed Acyclic Graph) of steps to execute during deployment, promotion, and other automated processes.
+
+## Template Structure
+
+```typescript
+interface WorkflowTemplate {
+  id: UUID;
+  tenantId: UUID;
+  name: string;                    // "standard-deploy"
+  displayName: string;             // "Standard Deployment"
+  description: string;
+  version: number;                 // Auto-incremented
+
+  // DAG structure
+  nodes: StepNode[];
+  edges: StepEdge[];
+
+  // I/O definitions
+  inputs: InputDefinition[];
+  outputs: OutputDefinition[];
+
+  // Metadata
+  tags: string[];
+  isBuiltin: boolean;
+  createdAt: DateTime;
+  createdBy: UUID;
+}
+```
+
+## Node Types
+
+### Step Node
+
+```typescript
+interface StepNode {
+  id: string;                    // Unique within template (e.g., "deploy-api")
+  type: string;                  // Step type from registry
+  name: string;                  // Display name
+  config: Record<string, any>;   // Step-specific configuration
+  inputs: InputBinding[];        // Input value bindings
+  outputs: OutputBinding[];      // Output declarations
+  position: { x: number; y: number };  // UI position
+
+  // Execution settings
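+  timeout?: number;              // Seconds (defaults to the step type's value)
+  retryPolicy?: RetryPolicy;     // Optional; omitted on several nodes in the example template
+  onFailure?: FailureAction;     // Optional; omitted on several nodes in the example template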
+  condition?: string;            // JS expression for conditional execution
+
+  // Documentation
+  description?: string;
+  documentation?: string;
+}
+
+type FailureAction = "fail" | "continue" | "rollback" | `goto:${string}`;  // e.g., "goto:notify-failure"
+
+interface RetryPolicy {
+  maxRetries: number;
+  backoffType: "fixed" | "exponential";
+  backoffSeconds: number;
+  retryableErrors?: string[];    // Error codes eligible for retry (omitted in the example template)
+}
+```
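+
+The retry policy above leaves the delay calculation implicit. The sketch below shows one plausible reading, not the engine's confirmed behavior: it assumes that exponential backoff doubles `backoffSeconds` on each retry and that a missing or empty `retryableErrors` list means every error is retryable. `retryDelaySeconds` and `isRetryable` are hypothetical names.
+
+```typescript
+interface RetryPolicy {
+  maxRetries: number;
+  backoffType: "fixed" | "exponential";
+  backoffSeconds: number;
+  retryableErrors?: string[];
+}
+
+// Returns the delay before the given retry, or null once retries are exhausted.
+// `attempt` is 1 for the first retry.
+// Assumption: exponential backoff doubles the base delay on every retry.
+function retryDelaySeconds(policy: RetryPolicy, attempt: number): number | null {
+  if (attempt < 1 || attempt > policy.maxRetries) {
+    return null;
+  }
+  return policy.backoffType === "exponential"
+    ? policy.backoffSeconds * 2 ** (attempt - 1)
+    : policy.backoffSeconds;
+}
+
+// Assumption: no listed error codes means every error is retryable.
+function isRetryable(policy: RetryPolicy, errorCode: string): boolean {
+  const codes = policy.retryableErrors ?? [];
+  return codes.length === 0 || codes.includes(errorCode);
+}
+```
+
+Under these assumptions, a policy with `maxRetries: 2`, `backoffType: "exponential"`, and `backoffSeconds: 30` waits 30 seconds before the first retry and 60 seconds before the second, then gives up.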
+
+### Input Bindings
+
+```typescript
+interface InputBinding {
+  name: string;                  // Input parameter name
+  source: InputSource;
+}
+
+type InputSource =
+  | { type: "literal"; value: any }
+  | { type: "context"; path: string }        // e.g., "release.name"
+  | { type: "output"; nodeId: string; outputName: string }
+  | { type: "secret"; secretName: string }
+  | { type: "expression"; expression: string };  // JS expression
+```
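+
+To make the binding sources concrete, the sketch below resolves an `InputSource` against a runtime scope. The `ResolutionScope` shape and the `getPath` and `resolveInput` helpers are hypothetical, and expression sources are deliberately left unhandled because they need a sandboxed evaluator that this document does not describe.
+
+```typescript
+type InputSource =
+  | { type: "literal"; value: unknown }
+  | { type: "context"; path: string }
+  | { type: "output"; nodeId: string; outputName: string }
+  | { type: "secret"; secretName: string }
+  | { type: "expression"; expression: string };
+
+// Assumed runtime scope: workflow context, outputs of completed nodes, secrets.
+interface ResolutionScope {
+  context: Record<string, unknown>;
+  outputs: Record<string, Record<string, unknown>>; // nodeId -> outputName -> value
+  secrets: Record<string, string>;
+}
+
+// Walks a dotted path such as "release.name" through nested objects.
+function getPath(root: unknown, path: string): unknown {
+  return path.split(".").reduce<unknown>((current, key) => {
+    if (current !== null && typeof current === "object") {
+      return (current as Record<string, unknown>)[key];
+    }
+    return undefined;
+  }, root);
+}
+
+function resolveInput(source: InputSource, scope: ResolutionScope): unknown {
+  switch (source.type) {
+    case "literal":
+      return source.value;
+    case "context":
+      return getPath(scope.context, source.path);
+    case "output":
+      return scope.outputs[source.nodeId]?.[source.outputName];
+    case "secret":
+      return scope.secrets[source.secretName];
+    case "expression":
+      // Not handled in this sketch; see the lead-in above.
+      throw new Error("expression sources are not handled in this sketch");
+  }
+}
+```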
+
+### Edge Types
+
+```typescript
+interface StepEdge {
+  id: string;
+  from: string;           // Source node ID
+  to: string;             // Target node ID
+  condition?: string;     // Optional condition expression
+  label?: string;         // Display label for conditional edges
+}
+```
+
+## Built-in Step Types
+
+### Control Steps
+
+| Type | Description | Config |
+|------|-------------|--------|
+| `approval` | Wait for human approval | `promotionId` |
+| `wait` | Wait for specified duration | `durationSeconds` |
+| `condition` | Branch based on condition | `expression` |
+| `parallel` | Execute children in parallel | `maxConcurrency` |
+
+### Gate Steps
+
+| Type | Description | Config |
+|------|-------------|--------|
+| `security-gate` | Evaluate security policy | `blockOnCritical`, `blockOnHigh` |
+| `custom-gate` | Custom OPA policy evaluation | `policyName` |
+| `freeze-check` | Check freeze windows | - |
+| `approval-check` | Check approval status | `requiredCount` |
+
+### Deploy Steps
+
+| Type | Description | Config |
+|------|-------------|--------|
+| `deploy-docker` | Deploy single container | `containerName`, `strategy` |
+| `deploy-compose` | Deploy Docker Compose stack | `composePath`, `strategy` |
+| `deploy-ecs` | Deploy to AWS ECS | `cluster`, `service` |
+| `deploy-nomad` | Deploy to HashiCorp Nomad | `jobName` |
+
+### Verification Steps
+
+| Type | Description | Config |
+|------|-------------|--------|
+| `health-check` | HTTP/TCP health check | `type`, `path`, `expectedStatus` |
+| `smoke-test` | Run smoke test suite | `testSuite`, `timeout` |
+| `verify-digest` | Verify deployed digest | `expectedDigest` |
+
+### Integration Steps
+
+| Type | Description | Config |
+|------|-------------|--------|
+| `webhook` | Call external webhook | `url`, `method`, `headers` |
+| `trigger-ci` | Trigger CI pipeline | `integrationId`, `pipelineId` |
+| `wait-ci` | Wait for CI pipeline | `runId`, `timeout` |
+
+### Notification Steps
+
+| Type | Description | Config |
+|------|-------------|--------|
+| `notify` | Send notification | `channel`, `template` |
+| `slack` | Send Slack message | `channel`, `message` |
+| `email` | Send email | `recipients`, `template` |
+
+### Recovery Steps
+
+| Type | Description | Config |
+|------|-------------|--------|
+| `rollback` | Rollback deployment | `strategy`, `targetReleaseId` |
+| `execute-script` | Run recovery script | `scriptType`, `scriptRef` |
+
+## Template Example: Standard Deployment
+
+```json
+{
+  "id": "template-standard-deploy",
+  "name": "standard-deploy",
+  "displayName": "Standard Deployment",
+  "version": 1,
+  "inputs": [
+    { "name": "releaseId", "type": "uuid", "required": true },
+    { "name": "environmentId", "type": "uuid", "required": true },
+    { "name": "promotionId", "type": "uuid", "required": true }
+  ],
+  "nodes": [
+    {
+      "id": "approval",
+      "type": "approval",
+      "name": "Approval Gate",
+      "config": {},
+      "inputs": [
+        { "name": "promotionId", "source": { "type": "context", "path": "promotionId" } }
+      ],
+      "position": { "x": 100, "y": 100 }
+    },
+    {
+      "id": "security-gate",
+      "type": "security-gate",
+      "name": "Security Verification",
+      "config": {
+        "blockOnCritical": true,
+        "blockOnHigh": true
+      },
+      "inputs": [
+        { "name": "releaseId", "source": { "type": "context", "path": "releaseId" } }
+      ],
+      "position": { "x": 100, "y": 200 }
+    },
+    {
+      "id": "pre-deploy-hook",
+      "type": "execute-script",
+      "name": "Pre-Deploy Hook",
+      "config": {
+        "scriptType": "csharp",
+        "scriptRef": "hooks/pre-deploy.csx"
+      },
+      "inputs": [
+        { "name": "release", "source": { "type": "context", "path": "release" } },
+        { "name": "environment", "source": { "type": "context", "path": "environment" } }
+      ],
+      "timeout": 300,
+      "onFailure": "fail",
+      "position": { "x": 100, "y": 300 }
+    },
+    {
+      "id": "deploy-targets",
+      "type": "deploy-compose",
+      "name": "Deploy to Targets",
+      "config": {
+        "strategy": "rolling",
+        "parallelism": 2
+      },
+      "inputs": [
+        { "name": "releaseId", "source": { "type": "context", "path": "releaseId" } },
+        { "name": "environmentId", "source": { "type": "context", "path": "environmentId" } }
+      ],
+      "timeout": 600,
+      "retryPolicy": {
+        "maxRetries": 2,
+        "backoffType": "exponential",
+        "backoffSeconds": 30
+      },
+      "onFailure": "rollback",
+      "position": { "x": 100, "y": 400 }
+    },
+    {
+      "id": "health-check",
+      "type": "health-check",
+      "name": "Health Verification",
+      "config": {
+        "type": "http",
+        "path": "/health",
+        "expectedStatus": 200,
+        "timeout": 30,
+        "retries": 5
+      },
+      "inputs": [
+        { "name": "targets", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "deployedTargets" } }
+      ],
+      "onFailure": "rollback",
+      "position": { "x": 100, "y": 500 }
+    },
+    {
+      "id": "post-deploy-hook",
+      "type": "execute-script",
+      "name": "Post-Deploy Hook",
+      "config": {
+        "scriptType": "bash",
+        "inline": "echo 'Deployment complete'"
+      },
+      "timeout": 300,
+      "onFailure": "continue",
+      "position": { "x": 100, "y": 600 }
+    },
+    {
+      "id": "notify-success",
+      "type": "notify",
+      "name": "Success Notification",
+      "config": {
+        "channel": "slack",
+        "template": "deployment-success"
+      },
+      "inputs": [
+        { "name": "release", "source": { "type": "context", "path": "release" } },
+        { "name": "environment", "source": { "type": "context", "path": "environment" } }
+      ],
+      "onFailure": "continue",
+      "position": { "x": 100, "y": 700 }
+    },
+    {
+      "id": "rollback-handler",
+      "type": "rollback",
+      "name": "Rollback Handler",
+      "config": {
+        "strategy": "to-previous"
+      },
+      "inputs": [
+        { "name": "deploymentJobId", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "jobId" } }
+      ],
+      "position": { "x": 300, "y": 450 }
+    },
+    {
+      "id": "notify-failure",
+      "type": "notify",
+      "name": "Failure Notification",
+      "config": {
+        "channel": "slack",
+        "template": "deployment-failure"
+      },
+      "onFailure": "continue",
+      "position": { "x": 300, "y": 550 }
+    }
+  ],
+  "edges": [
+    { "id": "e1", "from": "approval", "to": "security-gate" },
+    { "id": "e2", "from": "security-gate", "to": "pre-deploy-hook" },
+    { "id": "e3", "from": "pre-deploy-hook", "to": "deploy-targets" },
+    { "id": "e4", "from": "deploy-targets", "to": "health-check" },
+    { "id": "e5", "from": "health-check", "to": "post-deploy-hook" },
+    { "id": "e6", "from": "post-deploy-hook", "to": "notify-success" },
+    { "id": "e7", "from": "deploy-targets", "to": "rollback-handler", "condition": "status === 'failed'" },
+    { "id": "e8", "from": "health-check", "to": "rollback-handler", "condition": "status === 'failed'" },
+    { "id": "e9", "from": "rollback-handler", "to": "notify-failure" }
+  ]
+}
+```
+
+## Template Validation
+
+Templates are validated for:
+
+1. **Structural validity**: Valid JSON/YAML, required fields present
+2. **DAG validity**: No cycles, all edges reference valid nodes
+3. **Type validity**: All step types exist in registry
+4. **Schema validity**: Step configs match type schemas
+5. **Input validity**: All required inputs are bindable
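+
+The DAG validity check in the list above (no cycles, all edges reference valid nodes) can be sketched with Kahn's algorithm. This is an illustrative sketch, not the engine's validator: `validateDag`, the error messages, and the reduced `NodeRef` and `EdgeRef` shapes are assumptions, and the other four checks are not covered.
+
+```typescript
+interface NodeRef {
+  id: string;
+}
+
+interface EdgeRef {
+  id: string;
+  from: string;
+  to: string;
+}
+
+// Returns a list of problems; an empty list means the graph is a valid DAG.
+function validateDag(nodes: NodeRef[], edges: EdgeRef[]): string[] {
+  const errors: string[] = [];
+  const inDegree = new Map<string, number>();
+  const successors = new Map<string, string[]>();
+
+  for (const node of nodes) {
+    if (inDegree.has(node.id)) {
+      errors.push(`duplicate node id: ${node.id}`);
+    }
+    inDegree.set(node.id, 0);
+    successors.set(node.id, []);
+  }
+
+  for (const edge of edges) {
+    if (!inDegree.has(edge.from) || !inDegree.has(edge.to)) {
+      errors.push(`edge ${edge.id} references an unknown node`);
+      continue;
+    }
+    successors.get(edge.from)!.push(edge.to);
+    inDegree.set(edge.to, inDegree.get(edge.to)! + 1);
+  }
+
+  // Kahn's algorithm: repeatedly remove nodes with no remaining incoming edges.
+  const ready = Array.from(inDegree.entries())
+    .filter(([, degree]) => degree === 0)
+    .map(([id]) => id);
+  let visited = 0;
+  while (ready.length > 0) {
+    const id = ready.pop()!;
+    visited++;
+    for (const next of successors.get(id)!) {
+      const remaining = inDegree.get(next)! - 1;
+      inDegree.set(next, remaining);
+      if (remaining === 0) {
+        ready.push(next);
+      }
+    }
+  }
+
+  // Any node never removed still has an incoming edge, so it lies on or behind a cycle.
+  if (visited < inDegree.size) {
+    errors.push("template graph contains a cycle");
+  }
+  return errors;
+}
+```
+
+Collecting errors instead of throwing on the first one lets a template editor report every structural problem in a single pass.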
+
+## References
+
+- [Workflow Engine](../modules/workflow-engine.md)
+- [Execution State Machine](execution.md)
+- [Step Registry](../modules/workflow-engine.md#module-step-registry)