release orchestrator pivot, architecture and planning

2026-01-10 22:37:22 +02:00
parent c84f421e2f
commit d509c44411
130 changed files with 70292 additions and 721 deletions

# Workflow Execution
## Overview
The Workflow Engine executes workflow templates as DAGs (Directed Acyclic Graphs) of steps, managing state transitions, parallelism, retries, and failure handling.
## Execution Architecture
```
WORKFLOW EXECUTION ARCHITECTURE
┌─────────────────────────────────────────────────────────────────────────────┐
│ WORKFLOW ENGINE │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ WORKFLOW RUNNER │ │
│ │ │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │
│ │ │ Template │───►│ Execution │───►│ Context │ │ │
│ │ │ Parser │ │ Planner │ │ Builder │ │ │
│ │ └────────────┘ └────────────┘ └────────────┘ │ │
│ │ │ │ │ │ │
│ │ └────────────────┼─────────────────┘ │ │
│ │ ▼ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ DAG EXECUTOR │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ │
│ │ │ │ Ready │ │ Running │ │ Waiting │ │ Completed│ │ │ │
│ │ │ │ Queue │ │ Set │ │ Set │ │ Set │ │ │ │
│ │ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────────────────────────────────────────────────┐ │ │ │
│ │ │ │ STEP DISPATCHER │ │ │ │
│ │ │ └──────────────────────────────────────────────────────┘ │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ STEP EXECUTOR POOL │ │
│ │ │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │
│ │ │ Executor 1 │ │ Executor 2 │ │ Executor 3 │ │ Executor N │ │ │
│ │ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Workflow Run State Machine
```
                           WORKFLOW RUN STATES

                              ┌──────────┐
                              │ CREATED  │
                              └────┬─────┘
                                   │ start()
                                   ▼
                              ┌──────────┐
                              │ RUNNING  │◄─────────────────────────────────┐
                              └────┬─────┘                                  │
                                   │                                        │
               ┌───────────────────┼───────────────────┐                    │
               ▼                   ▼                   ▼                    │
          ┌──────────┐        ┌──────────┐        ┌──────────┐              │
          │ WAITING  │        │  PAUSED  │        │ FAILING  │              │
          │ APPROVAL │        │          │        │          │              │
          └────┬─────┘        └────┬─────┘        └────┬─────┘              │
               │ approve()         │ resume()          │ recovered          │
               └───────────────────┴───────────────────┴────────────────────┘

          ┌──────────┐        ┌──────────┐        ┌──────────┐
          │COMPLETED │        │  FAILED  │        │CANCELLED │
          └──────────┘        └──────────┘        └──────────┘
          Terminal states, entered from RUNNING, WAITING APPROVAL,
          PAUSED or FAILING (see State Transitions below)
```
### State Transitions
| Current State | Event | Next State | Description |
|---------------|-------|------------|-------------|
| `created` | `start()` | `running` | Begin workflow execution |
| `running` | Step requires approval | `waiting_approval` | Pause for human approval |
| `running` | `pause()` | `paused` | Manual pause requested |
| `running` | Step fails after retries are exhausted | `failing` | Handle failure path |
| `running` | All steps complete | `completed` | Workflow success |
| `waiting_approval` | `approve()` | `running` | Resume after approval |
| `waiting_approval` | `reject()` | `failed` | Rejection ends workflow |
| `paused` | `resume()` | `running` | Resume execution |
| `paused` | `cancel()` | `cancelled` | Cancel workflow |
| `failing` | Failure handling completes (e.g. rollback finished) | `failed` | Workflow ends in failure |
| `failing` | Fallback path recovers the run | `running` | Resume with fallback |
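The transition table can also be encoded as a lookup map, which mirrors the `validTransitions` map used for promotions. This is a minimal sketch. `WorkflowRunStatus`, `runTransitions` and `canTransitionRun` are illustrative names, not part of the engine's documented API.

```typescript
// Hypothetical type: status names taken from the State Transitions table.
type WorkflowRunStatus =
  | "created"
  | "running"
  | "waiting_approval"
  | "paused"
  | "failing"
  | "completed"
  | "failed"
  | "cancelled";

// Allowed next states for each state, one entry per table row.
const runTransitions: Record<WorkflowRunStatus, readonly WorkflowRunStatus[]> = {
  created: ["running"],
  running: ["waiting_approval", "paused", "failing", "completed"],
  waiting_approval: ["running", "failed"],
  paused: ["running", "cancelled"],
  failing: ["failed", "running"],
  completed: [], // terminal
  failed: [], // terminal
  cancelled: [], // terminal
};

function canTransitionRun(from: WorkflowRunStatus, to: WorkflowRunStatus): boolean {
  return runTransitions[from].includes(to);
}

console.log(canTransitionRun("paused", "cancelled")); // true
console.log(canTransitionRun("completed", "running")); // false
```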
## Step Execution State Machine
```
                          STEP STATES

                         ┌──────────┐
                         │ PENDING  │── condition false ──► SKIPPED
                         └────┬─────┘
                              │ schedule()
                              ▼
                         ┌──────────┐
                         │  QUEUED  │
                         └────┬─────┘
                              │ dispatch()
                              ▼
                         ┌──────────┐
                         │ RUNNING  │◄────────────────────┐
                         └────┬─────┘                     │ retry()
                              │                           │
             ┌────────────────┼────────────────┐          │
             ▼                ▼                ▼          │
        ┌──────────┐     ┌──────────┐     ┌──────────┐    │
        │SUCCEEDED │     │  FAILED  │     │ RETRYING │────┘
        └──────────┘     └────┬─────┘     └──────────┘
                              │ onFailure
                              ▼
                   ┌─────────────────────┐
                   │   FAILURE HANDLER   │
                   │ ┌─────────────────┐ │
                   │ │ fail            │─┼─► Mark workflow failing
                   │ │ continue        │─┼─► Continue to next step
                   │ │ rollback        │─┼─► Trigger rollback path
                   │ │ goto:{nodeId}   │─┼─► Jump to specific node
                   │ └─────────────────┘ │
                   └─────────────────────┘
```
### Step States
| State | Description |
|-------|-------------|
| `pending` | Step not yet ready (dependencies incomplete) |
| `queued` | Ready for execution, waiting for executor |
| `running` | Currently executing |
| `succeeded` | Completed successfully |
| `failed` | Failed after all retries exhausted |
| `retrying` | Failed, waiting for retry |
| `skipped` | Condition evaluated to false |
## DAG Execution Algorithm
```python
import asyncio
from datetime import datetime, timezone

# WorkflowRun, StepRun, WorkflowStatus, StepStatus and DeadlockException are
# engine types; resolve_inputs, evaluate_condition, evaluate_edge_condition,
# is_retryable, calculate_backoff and trigger_rollback are helpers defined
# elsewhere. asyncio.timeout() requires Python 3.11+.


class DAGExecutor:
    def __init__(self, workflow_run: "WorkflowRun", step_registry):
        self.run = workflow_run
        self.template = workflow_run.template
        self.step_registry = step_registry
        self.pending: set[str] = {node.id for node in self.template.nodes}
        self.running: set[str] = set()
        self.completed: set[str] = set()
        self.skipped: set[str] = set()
        self.failed: set[str] = set()
        self.outputs: dict[str, dict] = {}  # nodeId -> outputs
        # Hold task references so in-flight steps are not garbage-collected
        self.tasks: dict[str, asyncio.Task] = {}

    async def execute(self):
        """Main execution loop."""
        self.run.status = WorkflowStatus.RUNNING
        self.run.started_at = datetime.now(timezone.utc)

        while self.pending or self.running:
            pending_before = len(self.pending)
            # Find ready nodes (all dependencies satisfied); may mark skips
            ready = self.find_ready_nodes()
            progressed = bool(ready) or len(self.pending) < pending_before

            # Dispatch ready nodes
            for node_id in ready:
                self.pending.remove(node_id)
                self.running.add(node_id)
                self.tasks[node_id] = asyncio.create_task(self.execute_node(node_id))

            if self.running:
                # Wait for any node to complete
                await self.wait_for_completion()
            elif not progressed:
                if self.failed:
                    break  # Remaining nodes depend on a failed step
                raise DeadlockException(self.pending)

        # Determine final status
        if self.failed:
            self.run.status = WorkflowStatus.FAILED
        else:
            self.run.status = WorkflowStatus.COMPLETED
        self.run.completed_at = datetime.now(timezone.utc)

    async def wait_for_completion(self):
        """Block until at least one running node's task finishes."""
        active = [self.tasks[node_id] for node_id in self.running]
        await asyncio.wait(active, return_when=asyncio.FIRST_COMPLETED)

    def find_ready_nodes(self) -> list[str]:
        """Find pending nodes whose dependencies are all resolved."""
        ready = []
        # Skipped nodes count as resolved so their dependents can still run
        resolved = self.completed | self.skipped
        # Iterate over a snapshot: mark_skipped() mutates self.pending
        for node_id in list(self.pending):
            node = self.template.get_node(node_id)
            # Check all incoming edges whose condition currently holds
            incoming = self.template.get_incoming_edges(node_id)
            dependencies_met = all(
                edge.from_node in resolved
                for edge in incoming
                if self.evaluate_edge_condition(edge)
            )
            if not dependencies_met:
                continue
            # Evaluate the node condition only once its inputs are available
            if node.condition and not self.evaluate_condition(node.condition):
                self.mark_skipped(node_id)
                continue
            ready.append(node_id)
        return ready

    def mark_skipped(self, node_id: str):
        self.pending.discard(node_id)
        self.skipped.add(node_id)

    async def execute_node(self, node_id: str):
        """Execute a single node, retrying according to its retry policy."""
        node = self.template.get_node(node_id)
        step_run = StepRun(
            workflow_run_id=self.run.id,
            node_id=node_id,
            status=StepStatus.RUNNING,
            attempt_number=0,
        )
        try:
            while True:
                try:
                    # Resolve inputs and get the step executor
                    inputs = self.resolve_inputs(node)
                    executor = self.step_registry.get_executor(node.type)
                    # Execute with timeout
                    async with asyncio.timeout(node.timeout):
                        outputs = await executor.execute(inputs, node.config)
                except Exception as error:
                    # The attempt counter lives across retries of this node
                    step_run.attempt_number += 1
                    if (step_run.attempt_number <= node.retry_policy.max_retries
                            and self.is_retryable(error, node.retry_policy)):
                        step_run.status = StepStatus.RETRYING
                        delay = self.calculate_backoff(node.retry_policy, step_run.attempt_number)
                        await asyncio.sleep(delay)
                        step_run.status = StepStatus.RUNNING
                        continue
                    await self.handle_step_failure(node, step_run, error)
                    return
                # Store outputs
                self.outputs[node_id] = outputs
                step_run.outputs = outputs
                step_run.status = StepStatus.SUCCEEDED
                self.completed.add(node_id)
                return
        finally:
            # Leave the running set on every path (success, failure, retry exhaustion)
            self.running.discard(node_id)

    async def handle_step_failure(self, node, step_run, error):
        """Apply the node's failure policy once retries are exhausted."""
        step_run.status = StepStatus.FAILED
        step_run.error = str(error)

        match node.on_failure:
            case "fail":
                self.run.status = WorkflowStatus.FAILING
                self.failed.add(node.id)
            case "continue":
                self.completed.add(node.id)  # Continue as if succeeded
            case "rollback":
                self.failed.add(node.id)
                await self.trigger_rollback(node)
            case action if action.startswith("goto:"):
                target = action.split(":", 1)[1]
                self.completed.discard(target)
                self.pending.add(target)  # Re-schedule the target node
            case _:
                raise ValueError(f"Unknown on_failure action: {node.on_failure}")
```
## Input Resolution
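Each step input is bound to one of five source types: a literal value, a path into the execution context, an output of an earlier step, a secret, or an expression: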
```typescript
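interface InputResolver {
  resolve(binding: InputBinding, context: ExecutionContext): Promise<unknown>;
}

class StandardInputResolver implements InputResolver {
  constructor(
    private readonly secretsClient: { fetch(name: string): Promise<string> },
    private readonly expressionEvaluator: {
      evaluate(expression: string, context: ExecutionContext): unknown;
    }
  ) {}

  // Async because secret lookups go to the vault
  async resolve(binding: InputBinding, context: ExecutionContext): Promise<unknown> {
    const source = binding.source;
    switch (source.type) {
      case "literal":
        return source.value;
      case "context":
        // Navigate context path: "release.name" -> context.release.name
        return this.navigatePath(context, source.path);
      case "output":
        // Get output from a previous step (undefined if not produced yet)
        return context.stepOutputs[source.nodeId]?.[source.outputName];
      case "secret":
        // Fetch from vault on every call (never cached)
        return await this.secretsClient.fetch(source.secretName);
      case "expression":
        // Evaluate JavaScript expression against the context
        return this.expressionEvaluator.evaluate(source.expression, context);
    }
  }

  // Walks a dot-separated path; returns undefined if any segment is missing
  private navigatePath(root: unknown, path: string): unknown {
    return path.split(".").reduce<unknown>(
      (value, key) => (value == null ? undefined : (value as Record<string, unknown>)[key]),
      root
    );
  }
}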
```
## Execution Context
The execution context provides data available to all steps:
```typescript
interface ExecutionContext {
// Workflow identifiers
workflowRunId: UUID;
templateId: UUID;
templateVersion: number;
// Input values
inputs: Record<string, any>;
// Domain objects (loaded at start)
release?: Release;
promotion?: Promotion;
environment?: Environment;
targets?: Target[];
// Step outputs (accumulated during execution)
stepOutputs: Record<string, Record<string, any>>;
// Tenant context
tenantId: UUID;
userId: UUID;
// Metadata
startedAt: DateTime;
correlationId: string;
}
```
## Concurrency Control
### Parallelism Within Workflows
```typescript
interface ParallelConfig {
maxConcurrency: number; // Max simultaneous steps
failFast: boolean; // Stop all on first failure
}
// Example: Parallel deployment to multiple targets
const parallelDeploy: StepNode = {
id: "parallel-deploy",
type: "parallel",
config: {
maxConcurrency: 5,
failFast: false
},
children: [
{ id: "deploy-target-1", type: "deploy-docker" /* , ...other StepNode fields */ },
{ id: "deploy-target-2", type: "deploy-docker" /* , ...other StepNode fields */ },
{ id: "deploy-target-3", type: "deploy-docker" /* , ...other StepNode fields */ },
]
};
```
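The `maxConcurrency` and `failFast` settings of a `parallel` node can be honored with a worker-pool pattern. This is a minimal sketch that assumes each child step is modeled as an async function. `runParallel` and `Settled` are hypothetical names. In this sketch, `failFast` only stops new children from being launched: children already in flight run to completion.

```typescript
interface ParallelConfig {
  maxConcurrency: number; // Max simultaneous steps
  failFast: boolean;      // Stop launching new steps after the first failure
}

// Outcome of one child; undefined in the result array means "never started".
type Settled<T> =
  | { status: "fulfilled"; value: T }
  | { status: "rejected"; reason: unknown };

// Hypothetical helper: runs children with at most maxConcurrency in flight.
async function runParallel<T>(
  children: Array<() => Promise<T>>,
  config: ParallelConfig
): Promise<Array<Settled<T> | undefined>> {
  const results: Array<Settled<T> | undefined> = Array.from(
    { length: children.length },
    () => undefined
  );
  let next = 0;
  let stopped = false;

  // Each worker keeps claiming the next unstarted child until none remain.
  const worker = async (): Promise<void> => {
    while (!stopped && next < children.length) {
      const index = next++;
      try {
        results[index] = { status: "fulfilled", value: await children[index]() };
      } catch (reason) {
        results[index] = { status: "rejected", reason };
        if (config.failFast) stopped = true;
      }
    }
  };

  const workerCount = Math.max(1, Math.min(config.maxConcurrency, children.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Because JavaScript is single-threaded, the shared `next` counter needs no locking: each worker claims an index synchronously before it awaits.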
### Global Concurrency Limits
```typescript
interface ConcurrencyLimits {
maxWorkflowsPerTenant: number; // Concurrent workflow runs
maxStepsPerWorkflow: number; // Concurrent steps per workflow
maxDeploymentsPerEnvironment: number; // Prevent deployment conflicts
}
// Default limits
const defaults: ConcurrencyLimits = {
maxWorkflowsPerTenant: 10,
maxStepsPerWorkflow: 20,
maxDeploymentsPerEnvironment: 1 // One deployment at a time
};
```
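Limits such as `maxWorkflowsPerTenant` and `maxDeploymentsPerEnvironment` amount to counting active slots per key. This is a minimal in-process sketch. `SlotLimiter` is a hypothetical name. An engine running on several instances would need to keep these counts in shared storage (for example, behind a database lock) rather than in memory.

```typescript
// Hypothetical in-memory limiter: tracks active slots per key
// (e.g. a tenant ID or environment ID) against a fixed limit.
class SlotLimiter {
  private readonly active = new Map<string, number>();

  constructor(private readonly limit: number) {}

  // Claims a slot and returns true if the key is under its limit.
  tryAcquire(key: string): boolean {
    const current = this.active.get(key) ?? 0;
    if (current >= this.limit) return false;
    this.active.set(key, current + 1);
    return true;
  }

  // Frees a slot; call when the workflow run or deployment finishes.
  release(key: string): void {
    const current = this.active.get(key) ?? 0;
    if (current <= 1) this.active.delete(key);
    else this.active.set(key, current - 1);
  }
}

// One deployment at a time per environment, matching the defaults above.
const deployments = new SlotLimiter(1);
console.log(deployments.tryAcquire("env-prod")); // true
console.log(deployments.tryAcquire("env-prod")); // false: already deploying
deployments.release("env-prod");
console.log(deployments.tryAcquire("env-prod")); // true
```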
## Checkpoint and Resume
Workflows support checkpointing for long-running executions:
```typescript
interface WorkflowCheckpoint {
workflowRunId: UUID;
checkpointedAt: DateTime;
// Execution state
pendingNodes: string[];
completedNodes: string[];
failedNodes: string[];
// Accumulated data
stepOutputs: Record<string, Record<string, any>>;
// Context snapshot
contextSnapshot: ExecutionContext;
}
class CheckpointManager {
  constructor(private readonly repository: CheckpointRepository) {}

  // Save checkpoint after each step completion
  async saveCheckpoint(run: WorkflowRun): Promise<void> {
    const checkpoint: WorkflowCheckpoint = {
      workflowRunId: run.id,
      checkpointedAt: new Date(),
      // Steps still running are stored as pending so they re-run on resume;
      // step executors must therefore be idempotent
      pendingNodes: [...run.executor.pending, ...run.executor.running],
      completedNodes: Array.from(run.executor.completed),
      failedNodes: Array.from(run.executor.failed),
      stepOutputs: run.executor.outputs,
      contextSnapshot: run.context
    };
    await this.repository.save(checkpoint);
  }

  // Resume from checkpoint after service restart
  async resumeFromCheckpoint(workflowRunId: UUID): Promise<WorkflowRun> {
    const checkpoint = await this.repository.get(workflowRunId);
    if (!checkpoint) {
      throw new Error(`No checkpoint found for workflow run ${workflowRunId}`);
    }
    const run = new WorkflowRun();
    run.id = checkpoint.workflowRunId;
    run.executor.pending = new Set(checkpoint.pendingNodes);
    run.executor.running = new Set(); // nothing is in flight after a restart
    run.executor.completed = new Set(checkpoint.completedNodes);
    run.executor.failed = new Set(checkpoint.failedNodes);
    run.executor.outputs = checkpoint.stepOutputs;
    run.context = checkpoint.contextSnapshot;
    // Resume execution
    await run.executor.execute();
    return run;
  }
}
```
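`Set` values do not survive `JSON.stringify` (a `Set` serializes as `{}`), so the executor state has to be converted to arrays before it is persisted and rebuilt afterwards. This is a minimal round-trip sketch. The `ExecutorState` shape and both function names are hypothetical, simplified stand-ins for the executor fields above.

```typescript
// Simplified, hypothetical shape of the executor state being checkpointed.
interface ExecutorState {
  pending: Set<string>;
  running: Set<string>;
  completed: Set<string>;
  failed: Set<string>;
  outputs: Record<string, Record<string, unknown>>;
}

// Serialize: in-flight steps are folded into pending so they re-run on resume.
function toCheckpointJson(state: ExecutorState): string {
  return JSON.stringify({
    pendingNodes: Array.from(state.pending).concat(Array.from(state.running)),
    completedNodes: Array.from(state.completed),
    failedNodes: Array.from(state.failed),
    stepOutputs: state.outputs,
  });
}

// Deserialize: rebuild the Sets; nothing is running right after a restart.
function fromCheckpointJson(json: string): ExecutorState {
  const data = JSON.parse(json) as {
    pendingNodes: string[];
    completedNodes: string[];
    failedNodes: string[];
    stepOutputs: Record<string, Record<string, unknown>>;
  };
  return {
    pending: new Set<string>(data.pendingNodes),
    running: new Set<string>(),
    completed: new Set<string>(data.completedNodes),
    failed: new Set<string>(data.failedNodes),
    outputs: data.stepOutputs,
  };
}

const saved = toCheckpointJson({
  pending: new Set<string>(["notify-success"]),
  running: new Set<string>(["health-check"]),
  completed: new Set<string>(["deploy-targets"]),
  failed: new Set<string>(),
  outputs: { "deploy-targets": { jobId: "job-1" } },
});
console.log(fromCheckpointJson(saved).pending.has("health-check")); // true
```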
## Timeout Handling
```typescript
interface TimeoutConfig {
stepTimeout: number; // Per-step timeout (seconds)
workflowTimeout: number; // Total workflow timeout (seconds)
}
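class TimeoutException extends Error {
  constructor(readonly timeoutSeconds: number) {
    super(`Operation timed out after ${timeoutSeconds}s`);
    this.name = "TimeoutException";
  }
}

class TimeoutHandler {
  async executeWithTimeout<T>(
    // The operation receives the abort signal so it can stop its own work
    operation: (signal: AbortSignal) => Promise<T>,
    timeoutSeconds: number,
    onTimeout: () => Promise<void>
  ): Promise<T> {
    const controller = new AbortController();
    let timeoutId: ReturnType<typeof setTimeout> | undefined = undefined;
    // Rejects at the deadline even if the operation ignores the signal
    const deadline = new Promise<never>((_, reject) => {
      timeoutId = setTimeout(() => {
        reject(new TimeoutException(timeoutSeconds));
        controller.abort();
      }, timeoutSeconds * 1000);
    });
    try {
      return await Promise.race([operation(controller.signal), deadline]);
    } catch (error) {
      if (error instanceof TimeoutException) {
        await onTimeout();
      }
      throw error;
    } finally {
      clearTimeout(timeoutId);
    }
  }
}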
```
## Event Emission
The workflow engine emits events for observability:
```typescript
type WorkflowEvent =
| { type: "workflow.started"; workflowRunId: UUID; templateId: UUID }
| { type: "workflow.completed"; workflowRunId: UUID; status: string }
| { type: "workflow.failed"; workflowRunId: UUID; error: string }
| { type: "step.started"; workflowRunId: UUID; nodeId: string }
| { type: "step.completed"; workflowRunId: UUID; nodeId: string; outputs: any }
| { type: "step.failed"; workflowRunId: UUID; nodeId: string; error: string }
| { type: "step.retrying"; workflowRunId: UUID; nodeId: string; attempt: number };
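class WorkflowEventEmitter {
  private subscribers: Map<string, ((event: WorkflowEvent) => void)[]> = new Map();

  constructor(
    private readonly eventBus: { publish(topic: string, event: WorkflowEvent): void }
  ) {}

  subscribe(type: WorkflowEvent["type"], handler: (event: WorkflowEvent) => void): void {
    const handlers = this.subscribers.get(type) ?? [];
    handlers.push(handler);
    this.subscribers.set(type, handlers);
  }

  emit(event: WorkflowEvent): void {
    const handlers = this.subscribers.get(event.type) ?? [];
    for (const handler of handlers) {
      try {
        handler(event);
      } catch (error) {
        // One failing in-process subscriber must not starve the others
        console.error(`Subscriber for ${event.type} failed`, error);
      }
    }
    // Also emit to event bus for external consumers, on a per-run topic
    // that matches the `workflow.${runId}.*` subscription used for streaming
    this.eventBus.publish(`workflow.${event.workflowRunId}.${event.type}`, event);
  }
}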
```
## Execution Monitoring
### Real-time Progress
```typescript
interface WorkflowProgress {
workflowRunId: UUID;
status: WorkflowStatus;
// Step progress
totalSteps: number;
completedSteps: number;
runningSteps: number;
failedSteps: number;
// Current activity
currentNodes: string[];
// Timing
startedAt: DateTime;
estimatedCompletion?: DateTime;
// Step details
steps: StepProgress[];
}
interface StepProgress {
nodeId: string;
nodeName: string;
status: StepStatus;
startedAt?: DateTime;
completedAt?: DateTime;
attempt: number;
logs?: string;
}
```
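The counters in `WorkflowProgress` can be derived from the per-step list. This sketch assumes that `StepStatus` uses the lowercase names from the Step States table. `summarizeSteps` is a hypothetical helper. Counting skipped steps as completed and retrying steps as running is an assumption about how progress should be presented.

```typescript
// Assumed to match the Step States table.
type StepStatus =
  | "pending" | "queued" | "running" | "succeeded"
  | "failed" | "retrying" | "skipped";

interface StepSummaryInput {
  nodeId: string;
  status: StepStatus;
}

// Derives the WorkflowProgress counters from the step list.
function summarizeSteps(steps: StepSummaryInput[]) {
  const count = (...statuses: StepStatus[]) =>
    steps.filter((s) => statuses.includes(s.status)).length;
  const active = (s: StepSummaryInput) => s.status === "running" || s.status === "retrying";
  return {
    totalSteps: steps.length,
    completedSteps: count("succeeded", "skipped"), // skipped steps are finished too
    runningSteps: count("running", "retrying"),    // a retrying step is still in progress
    failedSteps: count("failed"),
    currentNodes: steps.filter(active).map((s) => s.nodeId),
  };
}
```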
### WebSocket Streaming
```typescript
// Client subscribes to workflow progress (an absolute ws:// or wss:// URL
// works across browsers)
const scheme = location.protocol === "https:" ? "wss" : "ws";
const ws = new WebSocket(`${scheme}://${location.host}/api/v1/workflow-runs/${runId}/stream`);
ws.onmessage = (event) => {
  const progress: WorkflowProgress = JSON.parse(event.data);
  updateUI(progress);
};

// Server streams updates
class WorkflowStreamHandler {
  async stream(runId: UUID, connection: WebSocket): Promise<void> {
    const subscription = this.eventBus.subscribe(`workflow.${runId}.*`);
    try {
      for await (const event of subscription) {
        const progress = await this.buildProgress(runId);
        connection.send(JSON.stringify(progress));
        // Stop streaming once the run reaches a terminal state
        if (["completed", "failed", "cancelled"].includes(progress.status)) {
          break;
        }
      }
    } finally {
      connection.close();
    }
  }
}
```
## References
- [Workflow Templates](templates.md)
- [Workflow Engine Module](../modules/workflow-engine.md)
- [Promotion Manager](../modules/promotion-manager.md)

# Promotion State Machine
## Overview
Promotions move releases through environments (Dev -> Staging -> Production). The promotion state machine manages the lifecycle from request to completion.
## Promotion States
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ PROMOTION STATE MACHINE │
│ │
│ ┌──────────────────┐ │
│ │ PENDING_APPROVAL │ (initial) │
│ └────────┬─────────┘ │
│ │ │
│ ┌──────────────────┼──────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │
│ │ REJECTED │ │ PENDING_GATE │ │ CANCELLED │ │
│ └────────────────┘ └────────┬───────┘ └────────────────┘ │
│ │ │
│ │ gates pass │
│ ▼ │
│ ┌────────────────┐ │
│ │ APPROVED │ │
│ └────────┬───────┘ │
│ │ │
│ │ start deployment │
│ ▼ │
│ ┌────────────────┐ │
│ │ DEPLOYING │ │
│ └────────┬───────┘ │
│ │ │
│ ┌──────────────────┼──────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │
│ │ FAILED │ │ DEPLOYED │ │ ROLLED_BACK │ │
│ └────────────────┘ └────────────────┘ └────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
## State Definitions
| State | Description |
|-------|-------------|
| `pending_approval` | Awaiting human approval (if required) |
| `pending_gate` | Awaiting automated gate evaluation |
| `approved` | All approvals and gates satisfied; ready for deployment |
| `rejected` | Blocked by approval rejection or gate failure |
| `deploying` | Deployment in progress |
| `deployed` | Successfully deployed to target environment |
| `failed` | Deployment failed (not rolled back) |
| `cancelled` | Cancelled by user before deployment starts |
| `rolled_back` | Deployment rolled back to previous version |
## State Transitions
### Valid Transitions
```typescript
const validTransitions: Record<PromotionStatus, PromotionStatus[]> = {
pending_approval: ["pending_gate", "approved", "rejected", "cancelled"],
pending_gate: ["approved", "rejected", "cancelled"],
approved: ["deploying", "cancelled"],
deploying: ["deployed", "failed", "rolled_back"],
rejected: [], // terminal
cancelled: [], // terminal
deployed: [], // terminal (for this promotion)
failed: ["rolled_back"], // can trigger rollback
rolled_back: [] // terminal
};
```
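A `transitionState` helper would consult this map before it persists a change. Below is a minimal sketch of that guard. `assertTransition`, `isTerminal` and `InvalidTransitionError` are hypothetical names, and the map is repeated so that the block runs on its own.

```typescript
type PromotionStatus =
  | "pending_approval" | "pending_gate" | "approved" | "rejected"
  | "deploying" | "deployed" | "failed" | "cancelled" | "rolled_back";

const validTransitions: Record<PromotionStatus, PromotionStatus[]> = {
  pending_approval: ["pending_gate", "approved", "rejected", "cancelled"],
  pending_gate: ["approved", "rejected", "cancelled"],
  approved: ["deploying", "cancelled"],
  deploying: ["deployed", "failed", "rolled_back"],
  rejected: [],
  cancelled: [],
  deployed: [],
  failed: ["rolled_back"],
  rolled_back: []
};

class InvalidTransitionError extends Error {
  constructor(from: PromotionStatus, to: PromotionStatus) {
    super(`Invalid promotion transition: ${from} -> ${to}`);
    this.name = "InvalidTransitionError";
  }
}

// Throws unless `to` is listed as a valid next state of `from`.
function assertTransition(from: PromotionStatus, to: PromotionStatus): void {
  if (!validTransitions[from].includes(to)) {
    throw new InvalidTransitionError(from, to);
  }
}

// A state is terminal when it has no outgoing transitions.
const isTerminal = (status: PromotionStatus) => validTransitions[status].length === 0;
```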
### Transition Events
```typescript
interface PromotionTransition {
promotionId: UUID;
fromState: PromotionStatus;
toState: PromotionStatus;
trigger: TransitionTrigger;
triggeredBy: UUID | "system"; // user ID, or "system" for automated transitions
timestamp: DateTime;
details: object;
}
type TransitionTrigger =
| "approval_granted"
| "approval_rejected"
| "gate_passed"
| "gate_failed"
| "deployment_started"
| "deployment_completed"
| "deployment_failed"
| "rollback_triggered"
| "rollback_completed"
| "user_cancelled";
```
## Promotion Flow
### 1. Request Promotion
```typescript
async function requestPromotion(request: PromotionRequest): Promise<Promotion> {
  // Validate release exists and is ready
  const release = await getRelease(request.releaseId);
  if (!release) {
    throw new Error("Release not found");
  }
  if (release.status !== "ready" && release.status !== "deployed") {
    throw new Error("Release not ready for promotion");
  }
  // Validate target environment
  const environment = await getEnvironment(request.targetEnvironmentId);
  if (!environment) {
    throw new Error("Target environment not found");
  }
  // Check freeze windows
  if (await isEnvironmentFrozen(environment.id)) {
    throw new Error("Environment is frozen");
  }
  // Determine initial state
  const requiresApproval = environment.requiredApprovals > 0;
  const initialStatus = requiresApproval ? "pending_approval" : "pending_gate";
  // Create promotion
  const promotion = await createPromotion({
    releaseId: request.releaseId,
    sourceEnvironmentId: release.currentEnvironmentId,
    targetEnvironmentId: environment.id,
    status: initialStatus,
    requestedBy: request.userId,
    requestReason: request.reason
  });
  // Emit event
  await emitEvent("promotion.requested", promotion);
  return promotion;
}
```
### 2. Approval Phase
```typescript
async function processApproval(
  promotionId: UUID,
  approverId: UUID,
  action: "approve" | "reject",
  comment?: string
): Promise<Promotion> {
  const promotion = await getPromotion(promotionId);
  // Only promotions that are awaiting approval can be approved or rejected
  if (promotion.status !== "pending_approval") {
    throw new Error(`Promotion is not awaiting approval (status: ${promotion.status})`);
  }
  const environment = await getEnvironment(promotion.targetEnvironmentId);
  // Validate approver can approve
  await validateApproverPermission(approverId, environment.id);
  // Check separation of duties
  if (environment.requireSeparationOfDuties && approverId === promotion.requestedBy) {
    throw new Error("Separation of duties violation: requester cannot approve");
  }
  // Record approval
  await recordApproval({
    promotionId,
    approverId,
    action,
    comment
  });
  if (action === "reject") {
    return await transitionState(promotion, "rejected", {
      trigger: "approval_rejected",
      triggeredBy: approverId,
      details: { reason: comment }
    });
  }
  // Check if all required approvals received; countApprovals should count
  // distinct approvers so repeated approvals by one user do not form a quorum
  const approvalCount = await countApprovals(promotionId);
  if (approvalCount >= environment.requiredApprovals) {
    return await transitionState(promotion, "pending_gate", {
      trigger: "approval_granted",
      triggeredBy: approverId
    });
  }
  return promotion;
}
```
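`countApprovals` is not defined here. One reasonable reading is that it counts distinct approvers, because a rejection already ends the promotion. Here is a sketch under that assumption. `ApprovalEntry` and `countDistinctApprovals` are hypothetical names.

```typescript
// Hypothetical shape of a recorded approval action.
interface ApprovalEntry {
  approverId: string;
  action: "approve" | "reject";
}

// Counts each approver at most once, ignoring rejections.
function countDistinctApprovals(entries: ApprovalEntry[]): number {
  const approvers = new Set<string>();
  for (const entry of entries) {
    if (entry.action === "approve") approvers.add(entry.approverId);
  }
  return approvers.size;
}

const entries: ApprovalEntry[] = [
  { approverId: "alice", action: "approve" },
  { approverId: "alice", action: "approve" }, // repeated approval
  { approverId: "bob", action: "approve" },
];
console.log(countDistinctApprovals(entries)); // 2
```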
### 3. Gate Evaluation
```typescript
async function evaluateGates(promotionId: UUID): Promise<GateEvaluationResult> {
  const promotion = await getPromotion(promotionId);
  const environment = await getEnvironment(promotion.targetEnvironmentId);
  const release = await getRelease(promotion.releaseId);
  const gateResults: GateResult[] = [];
  // Security gate
  const securityResult = await evaluateSecurityGate(release, environment);
  gateResults.push(securityResult);
  // Custom policy gates
  for (const policy of environment.policies) {
    const policyResult = await evaluatePolicyGate(release, environment, policy);
    gateResults.push(policyResult);
  }
  // Aggregate results: only failed gates marked `blocking` stop the promotion;
  // non-blocking failures are recorded in the decision record but do not block
  const blockingFailures = gateResults.filter(g => !g.passed && g.blocking);
  const passed = blockingFailures.length === 0;
  // Create decision record
  const decisionRecord = await createDecisionRecord({
    promotionId,
    gateResults,
    decision: passed ? "allow" : "block",
    decidedAt: new Date()
  });
  // Transition state
  if (passed) {
    await transitionState(promotion, "approved", {
      trigger: "gate_passed",
      triggeredBy: "system",
      details: { decisionRecordId: decisionRecord.id }
    });
  } else {
    await transitionState(promotion, "rejected", {
      trigger: "gate_failed",
      triggeredBy: "system",
      details: { blockingGates: blockingFailures }
    });
  }
  return { passed, gateResults, decisionRecord };
}
```
### 4. Deployment Execution
```typescript
async function executeDeployment(promotionId: UUID): Promise<DeploymentJob> {
  const promotion = await getPromotion(promotionId);
  // Transition to deploying
  await transitionState(promotion, "deploying", {
    trigger: "deployment_started",
    triggeredBy: "system"
  });
  // Generate artifacts
  const artifacts = await generateArtifacts(promotion);
  // Create deployment job
  const job = await createDeploymentJob({
    promotionId,
    releaseId: promotion.releaseId,
    environmentId: promotion.targetEnvironmentId,
    artifacts
  });
  // Execute through the deployment workflow; if it cannot be started,
  // mark the promotion failed instead of leaving it in "deploying"
  const workflowRun = await startDeploymentWorkflow(job).catch(async (error) => {
    await transitionState(await getPromotion(promotionId), "failed", {
      trigger: "deployment_failed",
      triggeredBy: "system",
      details: { jobId: job.id, error: String(error) }
    });
    throw error;
  });
  // Update promotion with workflow reference
  await updatePromotion(promotionId, { workflowRunId: workflowRun.id });
  return job;
}
```
### 5. Completion Handling
```typescript
async function handleDeploymentCompletion(
jobId: UUID,
status: "succeeded" | "failed"
): Promise<Promotion> {
const job = await getDeploymentJob(jobId);
const promotion = await getPromotion(job.promotionId);
if (status === "succeeded") {
// Generate evidence packet
const evidence = await generateEvidencePacket(promotion, job);
// Update release environment state
await updateReleaseEnvironmentState({
releaseId: promotion.releaseId,
environmentId: promotion.targetEnvironmentId,
status: "deployed",
promotionId: promotion.id,
evidenceRef: evidence.id
});
return await transitionState(promotion, "deployed", {
trigger: "deployment_completed",
triggeredBy: "system",
details: { evidencePacketId: evidence.id }
});
} else {
return await transitionState(promotion, "failed", {
trigger: "deployment_failed",
triggeredBy: "system",
details: { jobId, error: job.errorMessage }
});
}
}
```
## Decision Record
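Each gate evaluation produces a decision record. It captures the release, environment, gate results, approvals and requester, together with a content hash and a signature: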
```typescript
interface DecisionRecord {
id: UUID;
promotionId: UUID;
decision: "allow" | "block";
decidedAt: DateTime;
// Inputs
release: {
id: UUID;
name: string;
components: Array<{ name: string; digest: string }>;
};
environment: {
id: UUID;
name: string;
};
// Gate results
gateResults: Array<{
gateName: string;
gateType: string;
passed: boolean;
blocking: boolean;
message: string;
details: object;
evaluatedAt: DateTime;
}>;
// Approvals
approvals: Array<{
approverId: UUID;
approverName: string;
action: "approved" | "rejected";
comment?: string;
timestamp: DateTime;
}>;
// Context
requester: {
id: UUID;
name: string;
};
requestReason: string;
// Signature
contentHash: string;
signature: string;
}
```
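The record does not specify how `contentHash` is computed. One common approach, shown here as an assumption, is to hash a canonical JSON serialization of the record with SHA-256, leaving out the `contentHash` and `signature` fields themselves. The signature would then be computed over that hash. `canonicalJson` and `computeContentHash` are hypothetical helpers. They only sort object keys recursively and assume the record holds JSON-compatible values. For interoperability, RFC 8785 (JSON Canonicalization Scheme) defines a stricter standard form.

```typescript
import { createHash } from "node:crypto";

// Serializes a JSON value with object keys sorted recursively, so logically
// equal records always produce the same bytes.
function canonicalJson(value: unknown): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (value instanceof Date) {
    return JSON.stringify(value); // ISO-8601 string via Date.toJSON
  }
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalJson(item)).join(",")}]`;
  }
  const entries = Object.entries(value as Record<string, unknown>)
    .filter(([, v]) => v !== undefined)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return `{${entries.map(([k, v]) => `${JSON.stringify(k)}:${canonicalJson(v)}`).join(",")}}`;
}

// Hashes every field except contentHash and signature (hex-encoded SHA-256).
function computeContentHash(record: Record<string, unknown>): string {
  const { contentHash: _hash, signature: _signature, ...content } = record;
  return createHash("sha256").update(canonicalJson(content)).digest("hex");
}
```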
## API Endpoints
```yaml
# Request promotion
POST /api/v1/promotions
Body: { releaseId, targetEnvironmentId, reason? }
Response: Promotion
# Approve/reject promotion
POST /api/v1/promotions/{id}/approve
POST /api/v1/promotions/{id}/reject
Body: { comment? }
Response: Promotion
# Cancel promotion
POST /api/v1/promotions/{id}/cancel
Response: Promotion
# Get decision record
GET /api/v1/promotions/{id}/decision
Response: DecisionRecord
# Preview gates (dry run)
POST /api/v1/promotions/preview-gates
Body: { releaseId, targetEnvironmentId }
Response: { wouldPass: boolean, gates: GateResult[] }
```
## References
- [Workflow Templates](templates.md)
- [Workflow Execution](execution.md)
- [Evidence Schema](../appendices/evidence-schema.md)

# Workflow Template Structure
## Overview
Workflow templates define the DAG (Directed Acyclic Graph) of steps to execute during deployment, promotion, and other automated processes.
## Template Structure
```typescript
interface WorkflowTemplate {
id: UUID;
tenantId: UUID;
name: string; // "standard-deploy"
displayName: string; // "Standard Deployment"
description: string;
version: number; // Auto-incremented
// DAG structure
nodes: StepNode[];
edges: StepEdge[];
// I/O definitions
inputs: InputDefinition[];
outputs: OutputDefinition[];
// Metadata
tags: string[];
isBuiltin: boolean;
createdAt: DateTime;
createdBy: UUID;
}
```
## Node Types
### Step Node
```typescript
interface StepNode {
id: string; // Unique within template (e.g., "deploy-api")
type: string; // Step type from registry
name: string; // Display name
config: Record<string, any>; // Step-specific configuration
inputs: InputBinding[]; // Input value bindings
outputs: OutputBinding[]; // Output declarations
position: { x: number; y: number }; // UI position
// Execution settings
timeout?: number; // Seconds (default from step type)
retryPolicy?: RetryPolicy;
onFailure?: FailureAction;
condition?: string; // JS expression for conditional execution
// Documentation
description?: string;
documentation?: string;
}
type FailureAction = "fail" | "continue" | "rollback" | `goto:${string}`; // e.g. "goto:rollback-handler"
interface RetryPolicy {
maxRetries: number;
backoffType: "fixed" | "exponential";
backoffSeconds: number;
retryableErrors?: string[];
}
```
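The document does not pin down how `backoffSeconds` grows or how `onFailure` strings are parsed. This sketch assumes that "exponential" doubles the delay on each attempt, starting at `backoffSeconds`, and that a `goto:` value must name a target node. `calculateBackoff` and `parseFailureAction` are hypothetical helpers.

```typescript
interface RetryPolicy {
  maxRetries: number;
  backoffType: "fixed" | "exponential";
  backoffSeconds: number;
  retryableErrors?: string[];
}

// Delay in seconds before retry number `attempt` (1-based). Doubling is an assumption.
function calculateBackoff(policy: RetryPolicy, attempt: number): number {
  if (policy.backoffType === "fixed") return policy.backoffSeconds;
  return policy.backoffSeconds * 2 ** (attempt - 1);
}

type ParsedFailureAction =
  | { kind: "fail" | "continue" | "rollback" }
  | { kind: "goto"; target: string };

// Validates a raw onFailure value and splits out the goto target.
function parseFailureAction(action: string): ParsedFailureAction {
  if (action === "fail" || action === "continue" || action === "rollback") {
    return { kind: action };
  }
  if (action.startsWith("goto:") && action.length > "goto:".length) {
    return { kind: "goto", target: action.slice("goto:".length) };
  }
  throw new Error(`Invalid onFailure value: ${action}`);
}

// The deploy-targets node in the example template uses this policy.
const policy: RetryPolicy = { maxRetries: 2, backoffType: "exponential", backoffSeconds: 30 };
console.log(calculateBackoff(policy, 1), calculateBackoff(policy, 2)); // 30 60
```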
### Input Bindings
```typescript
interface InputBinding {
name: string; // Input parameter name
source: InputSource;
}
type InputSource =
| { type: "literal"; value: any }
| { type: "context"; path: string } // e.g., "release.name"
| { type: "output"; nodeId: string; outputName: string }
| { type: "secret"; secretName: string }
| { type: "expression"; expression: string }; // JS expression
```
### Edge Types
```typescript
interface StepEdge {
id: string;
from: string; // Source node ID
to: string; // Target node ID
condition?: string; // Optional condition expression
label?: string; // Display label for conditional edges
}
```
## Built-in Step Types
### Control Steps
| Type | Description | Config |
|------|-------------|--------|
| `approval` | Wait for human approval | `promotionId` |
| `wait` | Wait for specified duration | `durationSeconds` |
| `condition` | Branch based on condition | `expression` |
| `parallel` | Execute children in parallel | `maxConcurrency` |
### Gate Steps
| Type | Description | Config |
|------|-------------|--------|
| `security-gate` | Evaluate security policy | `blockOnCritical`, `blockOnHigh` |
| `custom-gate` | Custom OPA policy evaluation | `policyName` |
| `freeze-check` | Check freeze windows | - |
| `approval-check` | Check approval status | `requiredCount` |
### Deploy Steps
| Type | Description | Config |
|------|-------------|--------|
| `deploy-docker` | Deploy single container | `containerName`, `strategy` |
| `deploy-compose` | Deploy Docker Compose stack | `composePath`, `strategy` |
| `deploy-ecs` | Deploy to AWS ECS | `cluster`, `service` |
| `deploy-nomad` | Deploy to HashiCorp Nomad | `jobName` |
### Verification Steps
| Type | Description | Config |
|------|-------------|--------|
| `health-check` | HTTP/TCP health check | `type`, `path`, `expectedStatus` |
| `smoke-test` | Run smoke test suite | `testSuite`, `timeout` |
| `verify-digest` | Verify deployed digest | `expectedDigest` |
### Integration Steps
| Type | Description | Config |
|------|-------------|--------|
| `webhook` | Call external webhook | `url`, `method`, `headers` |
| `trigger-ci` | Trigger CI pipeline | `integrationId`, `pipelineId` |
| `wait-ci` | Wait for CI pipeline | `runId`, `timeout` |
### Notification Steps
| Type | Description | Config |
|------|-------------|--------|
| `notify` | Send notification | `channel`, `template` |
| `slack` | Send Slack message | `channel`, `message` |
| `email` | Send email | `recipients`, `template` |
### Recovery Steps
| Type | Description | Config |
|------|-------------|--------|
| `rollback` | Rollback deployment | `strategy`, `targetReleaseId` |
| `execute-script` | Run recovery script | `scriptType`, `scriptRef` |
## Template Example: Standard Deployment
```json
{
"id": "template-standard-deploy",
"name": "standard-deploy",
"displayName": "Standard Deployment",
"version": 1,
"inputs": [
{ "name": "releaseId", "type": "uuid", "required": true },
{ "name": "environmentId", "type": "uuid", "required": true },
{ "name": "promotionId", "type": "uuid", "required": true }
],
"nodes": [
{
"id": "approval",
"type": "approval",
"name": "Approval Gate",
"config": {},
"inputs": [
{ "name": "promotionId", "source": { "type": "context", "path": "promotionId" } }
],
"position": { "x": 100, "y": 100 }
},
{
"id": "security-gate",
"type": "security-gate",
"name": "Security Verification",
"config": {
"blockOnCritical": true,
"blockOnHigh": true
},
"inputs": [
{ "name": "releaseId", "source": { "type": "context", "path": "releaseId" } }
],
"position": { "x": 100, "y": 200 }
},
{
"id": "pre-deploy-hook",
"type": "execute-script",
"name": "Pre-Deploy Hook",
"config": {
"scriptType": "csharp",
"scriptRef": "hooks/pre-deploy.csx"
},
"inputs": [
{ "name": "release", "source": { "type": "context", "path": "release" } },
{ "name": "environment", "source": { "type": "context", "path": "environment" } }
],
"timeout": 300,
"onFailure": "fail",
"position": { "x": 100, "y": 300 }
},
{
"id": "deploy-targets",
"type": "deploy-compose",
"name": "Deploy to Targets",
"config": {
"strategy": "rolling",
"parallelism": 2
},
"inputs": [
{ "name": "releaseId", "source": { "type": "context", "path": "releaseId" } },
{ "name": "environmentId", "source": { "type": "context", "path": "environmentId" } }
],
"timeout": 600,
"retryPolicy": {
"maxRetries": 2,
"backoffType": "exponential",
"backoffSeconds": 30
},
"onFailure": "rollback",
"position": { "x": 100, "y": 400 }
},
{
"id": "health-check",
"type": "health-check",
"name": "Health Verification",
"config": {
"type": "http",
"path": "/health",
"expectedStatus": 200,
"timeout": 30,
"retries": 5
},
"inputs": [
{ "name": "targets", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "deployedTargets" } }
],
"onFailure": "rollback",
"position": { "x": 100, "y": 500 }
},
{
"id": "post-deploy-hook",
"type": "execute-script",
"name": "Post-Deploy Hook",
"config": {
"scriptType": "bash",
"inline": "echo 'Deployment complete'"
},
"timeout": 300,
"onFailure": "continue",
"position": { "x": 100, "y": 600 }
},
{
"id": "notify-success",
"type": "notify",
"name": "Success Notification",
"config": {
"channel": "slack",
"template": "deployment-success"
},
"inputs": [
{ "name": "release", "source": { "type": "context", "path": "release" } },
{ "name": "environment", "source": { "type": "context", "path": "environment" } }
],
"onFailure": "continue",
"position": { "x": 100, "y": 700 }
},
{
"id": "rollback-handler",
"type": "rollback",
"name": "Rollback Handler",
"config": {
"strategy": "to-previous"
},
"inputs": [
{ "name": "deploymentJobId", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "jobId" } }
],
"position": { "x": 300, "y": 450 }
},
{
"id": "notify-failure",
"type": "notify",
"name": "Failure Notification",
"config": {
"channel": "slack",
"template": "deployment-failure"
},
"onFailure": "continue",
"position": { "x": 300, "y": 550 }
}
],
"edges": [
{ "id": "e1", "from": "approval", "to": "security-gate" },
{ "id": "e2", "from": "security-gate", "to": "pre-deploy-hook" },
{ "id": "e3", "from": "pre-deploy-hook", "to": "deploy-targets" },
{ "id": "e4", "from": "deploy-targets", "to": "health-check" },
{ "id": "e5", "from": "health-check", "to": "post-deploy-hook" },
{ "id": "e6", "from": "post-deploy-hook", "to": "notify-success" },
{ "id": "e7", "from": "deploy-targets", "to": "rollback-handler", "condition": "status === 'failed'" },
{ "id": "e8", "from": "health-check", "to": "rollback-handler", "condition": "status === 'failed'" },
{ "id": "e9", "from": "rollback-handler", "to": "notify-failure" }
]
}
```
## Template Validation
Templates are validated for:
1. **Structural validity**: Valid JSON/YAML, required fields present
2. **DAG validity**: No cycles, all edges reference valid nodes
3. **Type validity**: All step types exist in registry
4. **Schema validity**: Step configs match type schemas
5. **Input validity**: All required inputs are bindable
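Check 2 (DAG validity) can be implemented with Kahn's topological sort. Nodes with no remaining incoming edges are removed one at a time; any nodes left over lie on or behind a cycle. This is a minimal sketch over simplified, hypothetical `NodeRef` and `EdgeRef` shapes that carry only the fields the check needs.

```typescript
interface NodeRef { id: string; }
interface EdgeRef { id: string; from: string; to: string; }

// Returns a list of problems; an empty list means the graph is a valid DAG.
function validateDag(nodes: NodeRef[], edges: EdgeRef[]): string[] {
  const errors: string[] = [];
  const ids: string[] = [];
  const seen = new Set<string>();
  for (const node of nodes) {
    if (seen.has(node.id)) errors.push(`Duplicate node id: ${node.id}`);
    else ids.push(node.id);
    seen.add(node.id);
  }

  // Every edge must reference existing nodes
  const inDegree = new Map<string, number>(ids.map((id): [string, number] => [id, 0]));
  const outgoing = new Map<string, string[]>(ids.map((id): [string, string[]] => [id, []]));
  for (const edge of edges) {
    if (!seen.has(edge.from) || !seen.has(edge.to)) {
      errors.push(`Edge ${edge.id} references an unknown node`);
      continue;
    }
    outgoing.get(edge.from)!.push(edge.to);
    inDegree.set(edge.to, inDegree.get(edge.to)! + 1);
  }

  // Kahn's algorithm: repeatedly remove nodes with no remaining incoming edges
  const queue = ids.filter((id) => inDegree.get(id) === 0);
  let visited = 0;
  while (queue.length > 0) {
    const id = queue.shift()!;
    visited++;
    for (const next of outgoing.get(id)!) {
      const remaining = inDegree.get(next)! - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }
  if (visited < ids.length) {
    const blocked = ids.filter((id) => inDegree.get(id)! > 0);
    errors.push(`Cycle detected involving nodes: ${blocked.join(", ")}`);
  }
  return errors;
}
```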
## References
- [Workflow Engine](../modules/workflow-engine.md)
- [Execution State Machine](execution.md)
- [Step Registry](../modules/workflow-engine.md#module-step-registry)