diff --git a/docs/product/advisories/09-Jan-2026 - Stella Ops Orchestrator Architecture.md b/docs/product/advisories/09-Jan-2026 - Stella Ops Orchestrator Architecture.md new file mode 100644 index 000000000..f351eea7d --- /dev/null +++ b/docs/product/advisories/09-Jan-2026 - Stella Ops Orchestrator Architecture.md @@ -0,0 +1,7322 @@ +# Stella Ops Suite: Unified Architecture Specification +## Release Orchestration Module — Implementation-Grade Design + +**Document Version:** 2.0 +**Date:** January 2026 +**Status:** Approved for Implementation +**Classification:** Internal Engineering + +--- + +# Table of Contents + +1. [Executive Summary](#1-executive-summary) +2. [Design Principles & Invariants](#2-design-principles--invariants) +3. [Module Architecture](#3-module-architecture) +4. [Plugin System Specification](#4-plugin-system-specification) +5. [Data Model](#5-data-model) +6. [API Specification](#6-api-specification) +7. [Workflow Engine & State Machines](#7-workflow-engine--state-machines) +8. [Security Architecture](#8-security-architecture) +9. [Integration Architecture](#9-integration-architecture) +10. [Deployment Execution Model](#10-deployment-execution-model) +11. [A/B & Progressive Delivery](#11-ab--progressive-delivery) +12. [UI/UX Specification](#12-uiux-specification) +13. [Observability & Operations](#13-observability--operations) +14. [Implementation Roadmap](#14-implementation-roadmap) +15. [Appendices](#15-appendices) + +--- + +# 1. Executive Summary + +## 1.1 Purpose + +This document specifies the complete technical architecture for extending Stella Ops from a vulnerability scanning platform into **Stella Ops Suite** — a unified release control plane for non-Kubernetes container environments. The architecture integrates: + +- **Existing capabilities**: SBOM generation, reachability-aware vulnerability analysis, VEX support, policy engine (OPA/Rego), evidence locker, deterministic replay +- **New capabilities**: Environment management, release orchestration, promotion workflows, deployment execution, progressive delivery, audit-grade release governance + +## 1.2 Scope + +| In Scope | Out of Scope | +|----------|--------------| +| Non-K8s container deployments (Docker, Compose, ECS, Nomad) | Kubernetes deployments (use ArgoCD, Flux) | +| Release identity via OCI digests | Tag-based release identity | +| Plugin-extensible integrations | Hard-coded vendor integrations | +| SSH/WinRM + agent-based deployment | Cloud-native serverless deployments | +| L4/L7 traffic management via router plugins | Built-in service mesh | + +## 1.3 Key Architectural Decisions + +| Decision | Rationale | +|----------|-----------| +| **3-surface plugin model** | Enables extensibility without core code changes; plugins contribute UI, runtime logic, and step types | +| **Digest-first release identity** | Tags are mutable; digests provide immutable release identity for audit | +| **Compiled C# scripts + sandboxed bash** | C# for complex orchestration (typed, safe); bash for simple hooks (sandboxed) | +| **Agent + agentless execution** | Agent-based preferred for reliability; agentless for adoption in agent-averse environments | +| **Evidence packets for every decision** | Enables deterministic replay and audit-grade compliance | +| **Certified path scoping for v1** | GitHub + Harbor + Docker/Compose as v1; everything else via plugins | + +## 1.4 Document Conventions + +- **MUST**: Mandatory requirement +- **SHOULD**: Recommended but not mandatory +- **MAY**: Optional +- **Entity names**: 
`PascalCase` (e.g., `ReleaseBundle`) +- **Table names**: `snake_case` (e.g., `release_bundles`) +- **API paths**: `/api/v1/resource-name` +- **Module names**: `kebab-case` (e.g., `release-manager`) + +--- + +# 2. Design Principles & Invariants + +## 2.1 Core Principles + +These principles are **inviolable** and MUST be reflected in all code, UI, documentation, and audit artifacts. + +### Principle 1: Release Identity via Digest + +``` +INVARIANT: A release is a set of OCI image digests (component → digest mapping), never tags. +``` + +- Tags are convenience inputs for resolution +- Tags are resolved to digests at release creation time +- All downstream operations (promotion, deployment, rollback) use digests +- Digest mismatch at pull time = deployment failure (tamper detection) + +### Principle 2: Determinism and Evidence + +``` +INVARIANT: Every deployment/promotion produces an immutable evidence record. +``` + +Evidence record contains: +- **Who**: User identity (from Authority) +- **What**: Release bundle (digests), target environment, target hosts +- **Why**: Policy evaluation result, approval records, decision reasons +- **How**: Generated artifacts (compose files, scripts), execution logs +- **When**: Timestamps for request, decision, execution, completion + +Evidence enables: +- Audit-grade compliance reporting +- Deterministic replay (same inputs + policy → same decision) +- "Why blocked?" explainability + +### Principle 3: Pluggable Everything, Stable Core + +``` +INVARIANT: Integrations are plugins; the core orchestration engine is stable. +``` + +Plugins contribute: +- Configuration screens (UI) +- Connector logic (runtime) +- Step node types (workflow) +- Doctor checks (diagnostics) +- Agent types (deployment) + +Core engine provides: +- Workflow execution (DAG processing) +- State machine management +- Evidence generation +- Policy evaluation +- Credential brokering + +### Principle 4: No Feature Gating + +``` +INVARIANT: All plans include all features. Limits are only: +- Number of environments +- Number of new digests analyzed per day +- Fair use on deployments +``` + +This prevents: +- "Pay for security" anti-pattern +- Per-project/per-seat billing landmines +- Feature fragmentation across tiers + +### Principle 5: Offline-First Operation + +``` +INVARIANT: All core operations MUST work in air-gapped environments. +``` + +Implications: +- No runtime calls to external APIs for core decisions +- Vulnerability data synced via mirror bundles +- Plugins may require connectivity; core does not +- Evidence packets exportable for external audit + +### Principle 6: Immutable Generated Artifacts + +``` +INVARIANT: Every deployment generates and stores immutable artifacts. 
+``` + +Generated artifacts: +- `compose.stella.lock.yml`: Pinned digests, resolved env refs +- `deploy.stella.script.dll`: Compiled C# script (or hash reference) +- `release.evidence.json`: Decision record +- `stella.version.json`: Version sticker placed on target + +Version sticker enables: +- Drift detection (expected vs actual) +- Audit trail on target host +- Rollback reference + +## 2.2 Architectural Invariants (Enforced by Design) + +| Invariant | Enforcement Mechanism | +|-----------|----------------------| +| Digests are immutable | Database constraint: digest column is unique, no updates | +| Evidence packets are append-only | Evidence table has no UPDATE/DELETE permissions | +| Secrets never in database | Vault integration; only references stored | +| Plugins cannot bypass policy | Policy evaluation in core, not plugin | +| Multi-tenant isolation | `tenant_id` FK on all tables; row-level security | + +--- + +# 3. Module Architecture + +## 3.1 Module Landscape Overview + +The Stella Ops Suite comprises existing modules (vulnerability scanning) and new modules (release orchestration). Modules are organized into **themes** (functional areas). + +``` +┌─────────────────────────────────────────────────────────────────────────────────┐ +│ STELLA OPS SUITE │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────────┐ │ +│ │ EXISTING THEMES (Vulnerability) │ │ +│ │ │ │ +│ │ INGEST VEXOPS REASON SCANENG EVIDENCE │ │ +│ │ ├─concelier ├─excititor ├─policy ├─scanner ├─locker │ │ +│ │ └─advisory-ai └─linksets └─opa-runtime ├─sbom-gen ├─export │ │ +│ │ └─reachability └─timeline │ │ +│ │ │ │ +│ │ RUNTIME JOBCTRL OBSERVE REPLAY DEVEXP │ │ +│ │ ├─signals ├─scheduler ├─notifier └─replay-core ├─cli │ │ +│ │ ├─graph ├─orchestrator └─telemetry ├─web-ui │ │ +│ │ └─zastava └─task-runner └─sdk │ │ +│ └───────────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────────┐ │ +│ │ NEW THEMES (Release Orchestration) │ │ +│ │ │ │ +│ │ INTHUB (Integration Hub) │ │ +│ │ ├─integration-manager Central registry of configured integrations │ │ +│ │ ├─connection-profiles Default settings + credential management │ │ +│ │ ├─connector-runtime Plugin connector execution environment │ │ +│ │ └─doctor-checks Integration health diagnostics │ │ +│ │ │ │ +│ │ ENVMGR (Environment & Inventory) │ │ +│ │ ├─environment-manager Environment CRUD, ordering, config │ │ +│ │ ├─target-registry Deployment targets (hosts/services) │ │ +│ │ ├─agent-manager Agent registration, health, capabilities │ │ +│ │ └─inventory-sync Drift detection, state reconciliation │ │ +│ │ │ │ +│ │ RELMAN (Release Management) │ │ +│ │ ├─component-registry Image repos → components mapping │ │ +│ │ ├─version-manager Tag/digest → semver mapping │ │ +│ │ ├─release-manager Release bundle lifecycle │ │ +│ │ └─release-catalog Release history, search, compare │ │ +│ │ │ │ +│ │ WORKFL (Workflow Engine) │ │ +│ │ ├─workflow-designer Template creation, step graph editor │ │ +│ │ ├─workflow-engine DAG execution, state machine │ │ +│ │ ├─step-executor Step dispatch, retry, timeout │ │ +│ │ └─step-registry Built-in + plugin-provided steps │ │ +│ │ │ │ +│ │ PROMOT (Promotion & Approval) │ │ +│ │ ├─promotion-manager Promotion request lifecycle │ │ +│ │ ├─approval-gateway Approval collection, SoD enforcement │ │ +│ │ ├─decision-engine Gate evaluation, policy integration │ │ +│ │ └─gate-registry Built-in + custom gates │ │ +│ │ │ │ +│ │ DEPLOY (Deployment 
Execution) │ │ +│ │ ├─deploy-orchestrator Deployment job coordination │ │ +│ │ ├─target-executor Target-specific deployment logic │ │ +│ │ ├─runner-executor Script/hook execution sandbox │ │ +│ │ ├─artifact-generator Compose/script artifact generation │ │ +│ │ └─rollback-manager Rollback orchestration │ │ +│ │ │ │ +│ │ AGENTS (Deployment Agents) │ │ +│ │ ├─agent-core Shared agent runtime │ │ +│ │ ├─agent-docker Docker host agent │ │ +│ │ ├─agent-compose Docker Compose agent │ │ +│ │ ├─agent-ssh SSH remote executor │ │ +│ │ ├─agent-winrm WinRM remote executor │ │ +│ │ ├─agent-ecs AWS ECS agent │ │ +│ │ └─agent-nomad HashiCorp Nomad agent │ │ +│ │ │ │ +│ │ PROGDL (Progressive Delivery) │ │ +│ │ ├─ab-manager A/B release coordination │ │ +│ │ ├─traffic-router Router plugin orchestration │ │ +│ │ ├─canary-controller Canary ramp automation │ │ +│ │ └─rollout-strategy Strategy templates │ │ +│ │ │ │ +│ │ RELEVI (Release Evidence) │ │ +│ │ ├─evidence-collector Evidence aggregation │ │ +│ │ ├─evidence-signer Cryptographic signing │ │ +│ │ ├─sticker-writer Version sticker generation │ │ +│ │ └─audit-exporter Compliance report generation │ │ +│ │ │ │ +│ │ PLUGIN (Plugin Infrastructure) │ │ +│ │ ├─plugin-registry Plugin discovery, versioning │ │ +│ │ ├─plugin-loader Plugin lifecycle management │ │ +│ │ ├─plugin-sandbox Isolation, resource limits │ │ +│ │ └─plugin-sdk SDK for plugin development │ │ +│ └───────────────────────────────────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────────────────────────────┘ +``` + +## 3.2 Module Specifications + +### 3.2.1 INTHUB: Integration Hub + +**Purpose**: Central management of all external integrations (SCM, CI, registries, vaults, targets). + +#### Module: `integration-manager` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | CRUD for integration instances; plugin type registry | +| **Dependencies** | `plugin-registry`, `authority` (for credentials) | +| **Data Entities** | `Integration`, `IntegrationType`, `IntegrationCredential` | +| **Events Produced** | `integration.created`, `integration.updated`, `integration.deleted`, `integration.health_changed` | +| **Events Consumed** | `plugin.registered`, `plugin.unregistered` | + +**Key Operations**: +``` +CreateIntegration(type, name, config, credentials) → Integration +UpdateIntegration(id, config, credentials) → Integration +DeleteIntegration(id) → void +TestConnection(id) → ConnectionTestResult +DiscoverResources(id, resourceType) → Resource[] +GetIntegrationHealth(id) → HealthStatus +ListIntegrations(filter) → Integration[] +``` + +#### Module: `connection-profiles` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Default settings management; "last used" pattern | +| **Dependencies** | `integration-manager` | +| **Data Entities** | `ConnectionProfile`, `ProfileTemplate` | + +**Behavior**: When user adds a new integration instance: +1. Wizard defaults to last used endpoint, auth mode, network settings +2. Secrets are **never** auto-reused (explicit confirmation required) +3. 
User can save as named profile for reuse + +#### Module: `connector-runtime` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Execute plugin connector logic in controlled environment | +| **Dependencies** | `plugin-loader`, `plugin-sandbox` | +| **Protocol** | gRPC (preferred) or HTTP/REST | + +**Connector Interface** (implemented by plugins): +```protobuf +service Connector { + rpc TestConnection(TestConnectionRequest) returns (TestConnectionResponse); + rpc DiscoverResources(DiscoverRequest) returns (DiscoverResponse); + rpc ResolveTagToDigest(ResolveRequest) returns (ResolveResponse); + rpc FetchMetadata(MetadataRequest) returns (MetadataResponse); + rpc GetSecretsRef(SecretsRequest) returns (SecretsResponse); + rpc ExecuteStep(StepRequest) returns (stream StepResponse); +} +``` + +#### Module: `doctor-checks` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Integration health diagnostics; troubleshooting | +| **Dependencies** | `integration-manager`, `connector-runtime` | + +**Doctor Check Types**: +- Connectivity (can reach endpoint) +- Authentication (credentials valid) +- Authorization (permissions sufficient) +- Version compatibility (API version supported) +- Rate limit status (remaining quota) + +--- + +### 3.2.2 ENVMGR: Environment & Inventory Manager + +**Purpose**: Model environments, targets, agents, and their relationships. + +#### Module: `environment-manager` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Environment CRUD, ordering, configuration, freeze windows | +| **Dependencies** | `authority` | +| **Data Entities** | `Environment`, `EnvironmentConfig`, `FreezeWindow` | +| **Events Produced** | `environment.created`, `environment.updated`, `environment.freeze_started`, `environment.freeze_ended` | + +**Environment Attributes**: +```typescript +interface Environment { + id: UUID; + tenantId: UUID; + name: string; // "dev", "stage", "prod" + displayName: string; // "Development" + orderIndex: number; // 0, 1, 2 for promotion order + config: EnvironmentConfig; + freezeWindows: FreezeWindow[]; + requiredApprovals: number; // 0 for dev, 1+ for prod + requireSeparationOfDuties: boolean; + autoPromoteFrom: UUID | null; // auto-promote from this env + promotionPolicy: string; // OPA policy name + createdAt: DateTime; + updatedAt: DateTime; +} + +interface EnvironmentConfig { + variables: Record; // env-specific variables + secrets: SecretReference[]; // vault references + registryOverrides: RegistryOverride[]; // per-env registry + agentLabels: string[]; // required agent labels + deploymentTimeout: number; // seconds + healthCheckConfig: HealthCheckConfig; +} + +interface FreezeWindow { + id: UUID; + start: DateTime; + end: DateTime; + reason: string; + createdBy: UUID; + exceptions: UUID[]; // users who can override +} +``` + +#### Module: `target-registry` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Deployment target inventory; capability tracking | +| **Dependencies** | `environment-manager`, `agent-manager` | +| **Data Entities** | `Target`, `TargetGroup`, `TargetCapability` | + +**Target Types** (plugin-provided): +- `docker_host`: Single Docker host +- `compose_host`: Docker Compose host +- `ssh_remote`: Generic SSH target +- `winrm_remote`: Windows remote target +- `ecs_service`: AWS ECS service +- `nomad_job`: HashiCorp Nomad job + +**Target Attributes**: +```typescript +interface Target { + id: UUID; + tenantId: UUID; + 
environmentId: UUID; + name: string; // "prod-web-01" + targetType: string; // "docker_host" + connection: TargetConnection; // type-specific + capabilities: TargetCapability[]; + labels: Record; // for grouping + healthStatus: HealthStatus; + lastHealthCheck: DateTime; + deploymentDirectory: string; // where artifacts are placed + currentDigest: string | null; // what's currently deployed + agentId: UUID | null; // assigned agent +} + +interface TargetConnection { + // Common fields + host: string; + port: number; + + // Type-specific (examples) + // docker_host: + dockerSocket?: string; + tlsCert?: SecretReference; + + // ssh_remote: + username?: string; + privateKey?: SecretReference; + + // ecs_service: + cluster?: string; + service?: string; + region?: string; + roleArn?: string; +} +``` + +#### Module: `agent-manager` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Agent registration, heartbeat, capability advertisement | +| **Dependencies** | `authority` (for agent tokens) | +| **Data Entities** | `Agent`, `AgentCapability`, `AgentHeartbeat` | +| **Events Produced** | `agent.registered`, `agent.online`, `agent.offline`, `agent.capability_changed` | + +**Agent Lifecycle**: +1. Agent starts, requests registration token from Authority +2. Agent registers with capabilities and labels +3. Agent sends heartbeats (default: 30s interval) +4. Agent pulls tasks from task queue +5. Agent reports task completion/failure + +**Agent Attributes**: +```typescript +interface Agent { + id: UUID; + tenantId: UUID; + name: string; + version: string; + capabilities: AgentCapability[]; + labels: Record; + status: "online" | "offline" | "degraded"; + lastHeartbeat: DateTime; + assignedTargets: UUID[]; + resourceUsage: ResourceUsage; +} + +interface AgentCapability { + type: string; // "docker", "compose", "ssh", "winrm" + version: string; // capability version + config: object; // capability-specific config +} +``` + +#### Module: `inventory-sync` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Drift detection; expected vs actual state reconciliation | +| **Dependencies** | `target-registry`, `agent-manager` | +| **Events Produced** | `inventory.drift_detected`, `inventory.reconciled` | + +**Drift Detection**: +1. Read `stella.version.json` from target deployment directory +2. Compare with expected state in database +3. Flag discrepancies (digest mismatch, missing sticker, unexpected files) +4. Report on dashboard + +--- + +### 3.2.3 RELMAN: Release Management + +**Purpose**: Manage components, versions, and release bundles. 
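+
+**Illustrative Sketch (non-normative)**: Before the module specifications below, the fragment shows the digest-pinning behavior RELMAN is built around: convenience tags are resolved exactly once, at release-creation time, and only digests flow into promotion, deployment, and rollback (Principle 1). `composeRelease`, `PinnedComponent`, and `resolveTagToDigest` are placeholder names for this sketch; the normative entity shapes (`Component`, `VersionMap`, `Release`) are defined in the modules that follow.
+
+```typescript
+// Non-normative sketch of tag -> digest pinning at release-creation time.
+// resolveTagToDigest() stands in for the registry connector call.
+interface PinnedComponent {
+  componentName: string;
+  tag: string;    // convenience input, kept for display only
+  digest: string; // sha256:... (the identity used by all downstream operations)
+}
+
+async function composeRelease(
+  requested: Array<{ componentName: string; tag: string }>,
+  resolveTagToDigest: (componentName: string, tag: string) => Promise<string>,
+): Promise<PinnedComponent[]> {
+  const pinned: PinnedComponent[] = [];
+  for (const { componentName, tag } of requested) {
+    // Resolution happens once; promotions and rollbacks never re-resolve tags.
+    const digest = await resolveTagToDigest(componentName, tag);
+    pinned.push({ componentName, tag, digest });
+  }
+  return pinned;
+}
+```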
+ +#### Module: `component-registry` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Map image repositories to logical components | +| **Dependencies** | `integration-manager` (for registry access) | +| **Data Entities** | `Component`, `ComponentVersion` | + +**Component Attributes**: +```typescript +interface Component { + id: UUID; + tenantId: UUID; + name: string; // "api", "worker", "frontend" + displayName: string; // "API Service" + imageRepository: string; // "registry.example.com/myapp/api" + registryIntegrationId: UUID; // which registry integration + versioningStrategy: VersionStrategy; + deploymentTemplate: string; // which workflow template to use + defaultChannel: string; // "stable", "beta" + metadata: Record; +} + +interface VersionStrategy { + type: "semver" | "date" | "sequential" | "manual"; + tagPattern?: string; // regex for tag extraction + semverExtract?: string; // regex capture group +} +``` + +#### Module: `version-manager` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Tag/digest → semver mapping; version rules | +| **Dependencies** | `component-registry`, `connector-runtime` | +| **Data Entities** | `VersionMap`, `VersionRule`, `Channel` | + +**Version Resolution**: +```typescript +interface VersionMap { + id: UUID; + componentId: UUID; + tag: string; // "v2.3.1" + digest: string; // "sha256:abc123..." + semver: string; // "2.3.1" + channel: string; // "stable" + prerelease: boolean; + buildMetadata: string; + resolvedAt: DateTime; + source: "auto" | "manual"; +} + +interface VersionRule { + id: UUID; + componentId: UUID; + pattern: string; // "^v(\\d+\\.\\d+\\.\\d+)$" + channel: string; // "stable" + prereleasePattern: string;// ".*-(alpha|beta|rc).*" +} +``` + +**Version Resolution Algorithm**: +1. Fetch tags from registry (via connector) +2. Apply version rules to extract semver +3. Resolve each tag to digest +4. Store in version map +5. Update channels ("latest stable", "latest beta") + +#### Module: `release-manager` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Release bundle lifecycle; composition | +| **Dependencies** | `component-registry`, `version-manager` | +| **Data Entities** | `Release`, `ReleaseComponent` | +| **Events Produced** | `release.created`, `release.promoted`, `release.deprecated` | + +**Release Attributes**: +```typescript +interface Release { + id: UUID; + tenantId: UUID; + name: string; // "myapp-v2.3.1" + displayName: string; // "MyApp 2.3.1" + components: ReleaseComponent[]; + sourceRef: SourceReference; + status: ReleaseStatus; + createdAt: DateTime; + createdBy: UUID; + deployedEnvironments: UUID[]; // where currently deployed + metadata: Record; +} + +interface ReleaseComponent { + componentId: UUID; + componentName: string; + digest: string; // sha256:... + semver: string; // resolved semver + tag: string; // original tag (for display) + role: "primary" | "sidecar" | "init" | "migration"; +} + +interface SourceReference { + scmIntegrationId?: UUID; + commitSha?: string; + branch?: string; + ciIntegrationId?: UUID; + buildId?: string; + pipelineUrl?: string; +} + +type ReleaseStatus = + | "draft" // being composed + | "ready" // ready for promotion + | "promoting" // promotion in progress + | "deployed" // deployed to at least one env + | "deprecated" // marked as deprecated + | "archived"; // no longer active +``` + +**Release Creation Modes**: + +1. **Full Release**: All components, latest versions +2. 
**Partial Release**: Subset of components updated; others pinned from last deployment +3. **Pinned Release**: All versions explicitly specified +4. **Channel Release**: All components from specific channel ("beta") + +#### Module: `release-catalog` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Release history, search, comparison | +| **Dependencies** | `release-manager` | + +**Operations**: +``` +SearchReleases(filter, pagination) → Release[] +CompareReleases(releaseA, releaseB) → ReleaseDiff +GetReleaseHistory(componentId) → Release[] +GetReleaseLineage(releaseId) → ReleaseLineage // promotion path +``` + +--- + +### 3.2.4 WORKFL: Workflow Engine + +**Purpose**: Define and execute deployment workflows as DAGs. + +#### Module: `workflow-designer` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Workflow template CRUD; graph editing | +| **Dependencies** | `step-registry` | +| **Data Entities** | `WorkflowTemplate`, `StepNode`, `StepEdge` | + +**Workflow Template Structure**: +```typescript +interface WorkflowTemplate { + id: UUID; + tenantId: UUID; + name: string; + displayName: string; + description: string; + version: number; + + // Graph definition + nodes: StepNode[]; + edges: StepEdge[]; + + // Input schema + inputs: InputDefinition[]; + + // Output schema + outputs: OutputDefinition[]; + + // Metadata + createdAt: DateTime; + updatedAt: DateTime; + createdBy: UUID; + isBuiltIn: boolean; + tags: string[]; +} + +interface StepNode { + id: string; // unique within template + type: string; // "approval", "deploy-compose", etc. + name: string; + config: Record; // step-specific config + inputs: InputBinding[]; // bind inputs from context/previous steps + outputs: OutputBinding[]; + position: { x: number; y: number }; // for UI rendering + + // Execution settings + timeout: number; // seconds + retryPolicy: RetryPolicy; + onFailure: "fail" | "continue" | "rollback"; + condition?: string; // expression for conditional execution +} + +interface StepEdge { + id: string; + from: string; // node id + to: string; // node id + condition?: string; // optional condition +} + +interface RetryPolicy { + maxRetries: number; + backoffType: "fixed" | "exponential"; + backoffSeconds: number; + retryOn: string[]; // error types to retry +} +``` + +#### Module: `workflow-engine` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | DAG execution; state machine management | +| **Dependencies** | `step-executor`, `step-registry` | +| **Data Entities** | `WorkflowRun`, `StepRun` | +| **Events Produced** | `workflow.started`, `workflow.completed`, `workflow.failed`, `step.started`, `step.completed`, `step.failed` | + +**Workflow Run State Machine**: +``` + ┌─────────────────────────────────────────────┐ + │ │ + ▼ │ +┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ +│ CREATED │───►│ RUNNING │───►│ SUCCEEDED│ │ FAILED │ │ +└──────────┘ └────┬─────┘ └──────────┘ └──────────┘ │ + │ ▲ │ + │ ┌──────────┐ │ │ + ├─────────►│ PAUSED │─────────┤ │ + │ └──────────┘ │ │ + │ │ │ + │ ┌──────────┐ │ │ + └─────────►│ CANCELLED│─────────┴───────────┘ + └──────────┘ +``` + +**Step Run State Machine**: +``` +┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ +│ PENDING │───►│ RUNNING │───►│ SUCCEEDED│ │ FAILED │ +└──────────┘ └────┬─────┘ └──────────┘ └────┬─────┘ + │ │ + │ ┌──────────┐ │ + ├─────────►│ RETRYING │─────────┘ + │ └──────────┘ + │ + │ ┌──────────┐ + └─────────►│ SKIPPED │ (condition not met) + └──────────┘ +``` + 
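+**Illustrative Encoding (non-normative)**: The sketch below expresses the two state machines as transition tables plus the guard the `workflow-engine` is expected to apply before persisting any state change. Edges are transcribed from the diagrams above; the `assertTransition` helper and the `RETRYING → RUNNING` edge (implied by retry semantics) are assumptions of this sketch, and the diagrams remain the normative definition.
+
+```typescript
+// Non-normative: state machines as data, with one shared transition guard.
+type WorkflowRunState =
+  | "CREATED" | "RUNNING" | "PAUSED" | "SUCCEEDED" | "FAILED" | "CANCELLED";
+
+type StepRunState =
+  | "PENDING" | "RUNNING" | "RETRYING" | "SUCCEEDED" | "FAILED" | "SKIPPED";
+
+const workflowTransitions: Record<WorkflowRunState, WorkflowRunState[]> = {
+  CREATED: ["RUNNING"],
+  RUNNING: ["SUCCEEDED", "FAILED", "PAUSED", "CANCELLED"],
+  PAUSED: ["RUNNING", "CANCELLED"],
+  SUCCEEDED: [],
+  FAILED: [],
+  CANCELLED: [],
+};
+
+const stepTransitions: Record<StepRunState, StepRunState[]> = {
+  PENDING: ["RUNNING", "SKIPPED"],
+  RUNNING: ["SUCCEEDED", "FAILED", "RETRYING"],
+  RETRYING: ["RUNNING", "FAILED"],
+  SUCCEEDED: [],
+  FAILED: [],
+  SKIPPED: [],
+};
+
+// Terminal states have no outgoing edges, so illegal transitions are rejected uniformly.
+function assertTransition<S extends string>(table: Record<S, S[]>, from: S, to: S): void {
+  if (!table[from].includes(to)) {
+    throw new Error(`Illegal transition ${from} -> ${to}`);
+  }
+}
+```
+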
+**DAG Execution Algorithm**: +```python +def execute_workflow(workflow_run): + ready_nodes = get_nodes_with_no_pending_dependencies(workflow_run) + + while ready_nodes: + # Execute ready nodes in parallel + for node in ready_nodes: + if evaluate_condition(node, workflow_run.context): + dispatch_step(node, workflow_run) + else: + mark_skipped(node, workflow_run) + + # Wait for any step to complete + completed_step = wait_for_completion(workflow_run) + + if completed_step.status == FAILED: + if completed_step.on_failure == "fail": + fail_workflow(workflow_run) + return + elif completed_step.on_failure == "rollback": + trigger_rollback(workflow_run) + return + # "continue" proceeds to next nodes + + # Update ready nodes + ready_nodes = get_nodes_with_no_pending_dependencies(workflow_run) + + complete_workflow(workflow_run) +``` + +#### Module: `step-executor` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Step dispatch, retry, timeout management | +| **Dependencies** | `step-registry`, `plugin-sandbox` | + +**Step Execution Flow**: +1. Resolve step type from registry +2. Validate inputs against schema +3. Prepare execution context (credentials, variables) +4. Dispatch to appropriate executor (built-in or plugin) +5. Stream logs to step run record +6. Handle timeout/retry +7. Collect outputs +8. Update step run status + +#### Module: `step-registry` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Registry of available step types (built-in + plugin) | +| **Dependencies** | `plugin-registry` | +| **Data Entities** | `StepType`, `StepSchema` | + +**Built-in Step Types** (ship with v1): + +| Step Type | Category | Description | +|-----------|----------|-------------| +| `approval` | Gate | Manual approval gate | +| `policy-gate` | Gate | Policy evaluation (OPA/Rego) | +| `security-gate` | Gate | Vulnerability scan verdict check | +| `deploy-compose` | Deploy | Deploy via Docker Compose | +| `deploy-docker` | Deploy | Deploy to Docker host | +| `deploy-script` | Deploy | Deploy via custom script | +| `copy-assets` | Deploy | Copy files to target | +| `execute-command` | Deploy | Run command on target/container | +| `update-config` | Deploy | Update configuration files | +| `reload-service` | Deploy | Reload service (nginx, systemd) | +| `health-check` | Verify | HTTP/TCP/script health check | +| `smoke-test` | Verify | Run smoke test suite | +| `notify` | Notify | Send notification (webhook/email) | +| `wait` | Control | Wait for duration | +| `parallel` | Control | Parallel execution block | +| `conditional` | Control | If/else branching | + +**Step Type Schema**: +```typescript +interface StepType { + id: string; // "deploy-compose" + name: string; // "Deploy via Compose" + category: StepCategory; + provider: "builtin" | UUID; // plugin ID if plugin-provided + + // Schemas + inputSchema: JSONSchema; + outputSchema: JSONSchema; + configSchema: JSONSchema; + + // Execution metadata + idempotencyStrategy: "key" | "none" | "automatic"; + idempotencyKeyExpression?: string; + safeToRetry: boolean; + estimatedDuration: number; // seconds (for UI) + + // Requirements + requiredCapabilities: string[]; // ["docker", "compose"] + requiredPermissions: string[]; + + // UI metadata + icon: string; + color: string; + documentation: string; +} +``` + +--- + +### 3.2.5 PROMOT: Promotion & Approval + +**Purpose**: Manage promotion requests, approvals, and gate evaluation. 
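+
+**Illustrative Sketch (non-normative)**: The fragment below shows one way the `approval-gateway` specified in this section can combine `ApprovalPolicy.requiredCount`, `allowSelfApproval`, and `expirationMinutes` into a single quorum check; separation of duties is handled separately (see `checkSeparationOfDuties` under `approval-gateway`). The helper name and the trimmed-down interfaces are assumptions of this sketch; the normative entity shapes appear below.
+
+```typescript
+// Non-normative quorum check: expired and (optionally) self-approvals do not count.
+interface ApprovalLite {
+  approverId: string;
+  action: "approved" | "rejected";
+  approvedAt: Date;
+}
+
+interface ApprovalPolicyLite {
+  requiredCount: number;
+  allowSelfApproval: boolean;
+  expirationMinutes: number;
+}
+
+function hasRequiredApprovals(
+  policy: ApprovalPolicyLite,
+  approvals: ApprovalLite[],
+  requestedBy: string,
+  now: Date = new Date(),
+): boolean {
+  const valid = approvals.filter(a =>
+    a.action === "approved" &&
+    // Approvals older than expirationMinutes no longer count toward the quorum.
+    (now.getTime() - a.approvedAt.getTime()) / 60_000 <= policy.expirationMinutes &&
+    // The requester's own approval counts only when the policy allows it.
+    (policy.allowSelfApproval || a.approverId !== requestedBy)
+  );
+  return valid.length >= policy.requiredCount;
+}
+```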
+ +#### Module: `promotion-manager` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Promotion request lifecycle | +| **Dependencies** | `release-manager`, `environment-manager`, `workflow-engine` | +| **Data Entities** | `Promotion` | +| **Events Produced** | `promotion.requested`, `promotion.approved`, `promotion.rejected`, `promotion.completed`, `promotion.failed` | + +**Promotion Attributes**: +```typescript +interface Promotion { + id: UUID; + tenantId: UUID; + releaseId: UUID; + sourceEnvironmentId: UUID | null; // null for initial deployment + targetEnvironmentId: UUID; + status: PromotionStatus; + + // Request + requestedAt: DateTime; + requestedBy: UUID; + requestReason: string; + + // Decision + decisionRecord: DecisionRecord | null; + decidedAt: DateTime | null; + + // Execution + workflowRunId: UUID | null; + startedAt: DateTime | null; + completedAt: DateTime | null; + + // Evidence + evidencePacketId: UUID | null; +} + +type PromotionStatus = + | "pending_approval" + | "pending_gate" + | "approved" + | "rejected" + | "deploying" + | "deployed" + | "failed" + | "cancelled" + | "rolled_back"; +``` + +**Promotion State Machine**: +``` +┌─────────────────┐ +│ REQUESTED │ +└────────┬────────┘ + │ + ▼ +┌─────────────────┐ ┌─────────────────┐ +│PENDING_APPROVAL │────►│ REJECTED │ +└────────┬────────┘ └─────────────────┘ + │ + ▼ +┌─────────────────┐ ┌─────────────────┐ +│ PENDING_GATE │────►│ REJECTED │ +└────────┬────────┘ └─────────────────┘ + │ + ▼ +┌─────────────────┐ +│ APPROVED │ +└────────┬────────┘ + │ + ▼ +┌─────────────────┐ ┌─────────────────┐ +│ DEPLOYING │────►│ FAILED │ +└────────┬────────┘ └────────┬────────┘ + │ │ + ▼ ▼ +┌─────────────────┐ ┌─────────────────┐ +│ DEPLOYED │ │ ROLLED_BACK │ +└─────────────────┘ └─────────────────┘ +``` + +#### Module: `approval-gateway` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Approval collection, SoD enforcement | +| **Dependencies** | `authority`, `promotion-manager` | +| **Data Entities** | `Approval`, `ApprovalPolicy` | +| **Events Produced** | `approval.granted`, `approval.denied` | + +**Approval Attributes**: +```typescript +interface Approval { + id: UUID; + tenantId: UUID; + promotionId: UUID; + approverId: UUID; + action: "approved" | "rejected"; + comment: string; + approvedAt: DateTime; + + // For audit + approverRole: string; + approverGroups: string[]; +} + +interface ApprovalPolicy { + id: UUID; + environmentId: UUID; + requiredCount: number; + requiredRoles: string[]; // any of these roles + requiredGroups: string[]; // any of these groups + requireSeparationOfDuties: boolean; + allowSelfApproval: boolean; + expirationMinutes: number; // approval expires after +} +``` + +**Separation of Duties Enforcement**: +```typescript +function checkSeparationOfDuties(promotion: Promotion, approvals: Approval[]): boolean { + const policy = getApprovalPolicy(promotion.targetEnvironmentId); + + if (!policy.requireSeparationOfDuties) { + return true; + } + + // Requester cannot be the sole approver + const nonRequesterApprovals = approvals.filter( + a => a.approverId !== promotion.requestedBy && a.action === "approved" + ); + + return nonRequesterApprovals.length >= 1; +} +``` + +#### Module: `decision-engine` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Gate evaluation, policy integration | +| **Dependencies** | `policy` (REASON theme), `approval-gateway`, `scanner` (SCANENG theme) | +| **Data Entities** | `DecisionRecord`, 
`GateResult` | + +**Decision Flow**: +``` +Input: Promotion Request + │ + ▼ +┌─────────────────────────────────────────────────────┐ +│ DECISION ENGINE │ +│ │ +│ ┌─────────────────────────────────────────────┐ │ +│ │ EVIDENCE COLLECTOR │ │ +│ │ - Fetch scan verdicts for all digests │ │ +│ │ - Fetch approval records │ │ +│ │ - Fetch environment config │ │ +│ │ - Fetch freeze window status │ │ +│ └─────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────┐ │ +│ │ GATE EVALUATOR │ │ +│ │ │ │ +│ │ Gate 1: Security Gate │ │ +│ │ - Check reachable_critical == 0 │ │ +│ │ - Check reachable_high == 0 │ │ +│ │ - Or: check against threshold policy │ │ +│ │ │ │ +│ │ Gate 2: Approval Gate │ │ +│ │ - Check approval count >= required │ │ +│ │ - Check SoD compliance │ │ +│ │ │ │ +│ │ Gate 3: Freeze Window Gate │ │ +│ │ - Check not in active freeze │ │ +│ │ - Or: requester has exception │ │ +│ │ │ │ +│ │ Gate 4: Custom Policy Gates │ │ +│ │ - Evaluate OPA/Rego policies │ │ +│ │ │ │ +│ └─────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────┐ │ +│ │ DECISION RECORD │ │ +│ │ { │ │ +│ │ decision: "allow" | "deny", │ │ +│ │ gates: [ │ │ +│ │ { name, passed, reason, evidence_refs }│ │ +│ │ ], │ │ +│ │ policyHash: "sha256:...", │ │ +│ │ inputsHash: "sha256:...", │ │ +│ │ evaluatedAt: DateTime │ │ +│ │ } │ │ +│ └─────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────┘ + │ + ▼ +Output: DecisionRecord +``` + +**Decision Record Schema**: +```typescript +interface DecisionRecord { + decision: "allow" | "deny"; + gates: GateResult[]; + policyHash: string; // SHA256 of all policies evaluated + inputsHash: string; // SHA256 of all inputs + evaluatedAt: DateTime; + evaluatorVersion: string; // for replay compatibility +} + +interface GateResult { + name: string; + type: string; // "security", "approval", "freeze", "custom" + passed: boolean; + reason: string; // human-readable explanation + evidenceRefs: string[]; // ["scan:uuid1", "approval:uuid2"] + details: object; // gate-specific details +} +``` + +#### Module: `gate-registry` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Registry of gate types (built-in + custom) | +| **Dependencies** | `plugin-registry` | + +**Built-in Gate Types**: + +| Gate Type | Description | +|-----------|-------------| +| `security-verdict` | Check scan verdict (reachable vulns) | +| `vulnerability-threshold` | Check vuln count against threshold | +| `approval-count` | Check approval count | +| `separation-of-duties` | Check SoD compliance | +| `freeze-window` | Check freeze window | +| `time-window` | Check deployment time window | +| `opa-policy` | Evaluate custom OPA/Rego policy | +| `previous-env-deployed` | Check release deployed to previous env | + +--- + +### 3.2.6 DEPLOY: Deployment Execution + +**Purpose**: Execute deployments on targets. 
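+
+**Illustrative Sketch (non-normative)**: The fragment below shows the shape of the `rolling` strategy listed under `deploy-orchestrator`: targets are updated one at a time, and the job stops at the first failed health check so remaining targets keep their previous digest. `deployToTarget` and `checkHealth` stand in for the target-executor operations defined in this section; whether already-updated targets are rolled back is a `rollback-manager` decision.
+
+```typescript
+// Non-normative rolling deployment loop: stop on the first unhealthy target.
+async function rollingDeploy(
+  targetIds: string[],
+  digest: string,
+  deployToTarget: (targetId: string, digest: string) => Promise<void>,
+  checkHealth: (targetId: string) => Promise<boolean>,
+): Promise<{ deployed: string[]; failedAt: string | null }> {
+  const deployed: string[] = [];
+  for (const targetId of targetIds) {
+    await deployToTarget(targetId, digest);
+    if (!(await checkHealth(targetId))) {
+      // Remaining targets are left on the previous digest.
+      return { deployed, failedAt: targetId };
+    }
+    deployed.push(targetId);
+  }
+  return { deployed, failedAt: null };
+}
+```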
+ +#### Module: `deploy-orchestrator` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Deployment job coordination across targets | +| **Dependencies** | `target-registry`, `agent-manager`, `artifact-generator` | +| **Data Entities** | `DeploymentJob`, `DeploymentTask` | +| **Events Produced** | `deployment.started`, `deployment.progress`, `deployment.completed`, `deployment.failed` | + +**Deployment Job Structure**: +```typescript +interface DeploymentJob { + id: UUID; + promotionId: UUID; + releaseId: UUID; + environmentId: UUID; + + tasks: DeploymentTask[]; + + status: DeploymentJobStatus; + startedAt: DateTime; + completedAt: DateTime | null; + + artifacts: GeneratedArtifact[]; + + rollbackOf: UUID | null; // if this is a rollback +} + +interface DeploymentTask { + id: UUID; + jobId: UUID; + targetId: UUID; + digest: string; + + status: TaskStatus; + agentId: UUID | null; + + startedAt: DateTime | null; + completedAt: DateTime | null; + + logs: string; + exitCode: number | null; + + previousDigest: string | null; // for rollback + stickerWritten: boolean; +} +``` + +**Deployment Strategies**: + +| Strategy | Description | +|----------|-------------| +| `all-at-once` | Deploy to all targets simultaneously | +| `rolling` | Deploy to targets one-by-one with health checks | +| `canary` | Deploy to subset, verify, then proceed | +| `blue-green` | Deploy to B targets, switch traffic, retire A | + +#### Module: `target-executor` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Target-specific deployment logic | +| **Dependencies** | `agent-manager`, `connector-runtime` | + +**Executor Interface** (implemented by agents/plugins): +```typescript +interface TargetExecutor { + // Deploy a digest to a target + deploy(request: DeployRequest): Promise; + + // Inspect current state + inspect(target: Target): Promise; + + // Rollback to previous digest + rollback(request: RollbackRequest): Promise; + + // Health check + healthCheck(target: Target, config: HealthCheckConfig): Promise; + + // Write version sticker + writeSticker(target: Target, sticker: VersionSticker): Promise; + + // Read version sticker (for drift detection) + readSticker(target: Target): Promise; +} + +interface DeployRequest { + target: Target; + digest: string; + image: string; + config: DeployConfig; + artifacts: GeneratedArtifact[]; + timeout: number; + dryRun: boolean; +} + +interface DeployResult { + success: boolean; + containerId: string | null; + logs: string; + durationMs: number; + previousDigest: string | null; + artifactsDeployed: string[]; +} +``` + +#### Module: `runner-executor` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Script/hook execution in sandbox | +| **Dependencies** | `plugin-sandbox` | + +**Script Execution Modes**: + +1. **C# Script (Compiled)** + - Compiled via Roslyn to deterministic assembly + - Executed in sandbox container + - Typed context injection + - Stella SDK available + +2. 
**Bash Script (Sandboxed)** + - Executed in restricted container + - Limited filesystem mount + - No network by default + - Environment variable injection + +**Stella SDK (for C# scripts)**: +```csharp +namespace Stella.Scripting +{ + public static class Step + { + // Logging + public static void Log(string message); + public static void LogWarning(string message); + public static void LogError(string message); + + // Artifacts + public static void WriteArtifact(string name, byte[] content); + public static void WriteArtifact(string name, string content); + + // Outputs + public static void SetOutput(string name, object value); + + // Failure + public static void Fail(string reason); + public static void FailWithRetry(string reason); + } + + public static class Context + { + // Inputs + public static T GetInput(string name); + public static string GetVariable(string name); + + // Secrets (secure handles) + public static SecretHandle GetSecret(string name); + + // Release info + public static ReleaseInfo Release { get; } + public static EnvironmentInfo Environment { get; } + public static TargetInfo Target { get; } + } + + public static class Targets + { + // Remote execution + public static CommandResult RunCommand(string command, TimeSpan timeout); + public static void UploadFile(string localPath, string remotePath); + public static void DownloadFile(string remotePath, string localPath); + } + + public static class Registry + { + // Image operations + public static string ResolveDigest(string imageRef); + public static void PullImage(string imageRef); + } +} +``` + +#### Module: `artifact-generator` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Generate immutable deployment artifacts | +| **Dependencies** | `release-manager`, `environment-manager` | + +**Generated Artifacts**: + +1. **compose.stella.lock.yml** +```yaml +# Immutable, pinned compose file +version: "3.8" +services: + api: + image: registry.example.com/myapp/api@sha256:abc123... + environment: + - DB_HOST=${stella.secrets.db_host} # resolved at deploy time + deploy: + replicas: 3 + +# Stella metadata +x-stella: + release_id: "uuid" + release_name: "myapp-v2.3.1" + generated_at: "2026-01-09T14:30:00Z" + generator_version: "1.0.0" + inputs_hash: "sha256:..." +``` + +2. **stella.version.json** (Version Sticker) +```json +{ + "stella_version": "1.0", + "release_id": "uuid", + "release_name": "myapp-v2.3.1", + "components": [ + { + "name": "api", + "digest": "sha256:abc123...", + "semver": "2.3.1", + "tag": "v2.3.1" + } + ], + "environment": "prod", + "deployed_at": "2026-01-09T14:32:15Z", + "deployed_by": "jane@example.com", + "promotion_id": "uuid", + "evidence_packet_hash": "sha256:...", + "policy_decision_hash": "sha256:...", + "orchestrator_version": "1.0.0" +} +``` + +#### Module: `rollback-manager` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Rollback orchestration | +| **Dependencies** | `deploy-orchestrator`, `target-registry` | +| **Events Produced** | `rollback.started`, `rollback.completed`, `rollback.failed` | + +**Rollback Strategies**: + +| Strategy | Description | +|----------|-------------| +| `to-previous` | Roll back to previous digest (from deployment record) | +| `to-release` | Roll back to specific release | +| `to-sticker` | Roll back to digest in version sticker | + +**Rollback Flow**: +1. Identify rollback target (previous digest) +2. Create rollback deployment job +3. Execute deployment (same flow as forward deploy) +4. 
Update version sticker +5. Record rollback evidence + +--- + +### 3.2.7 AGENTS: Deployment Agents + +**Purpose**: Execute deployments on specific target types. + +#### Module: `agent-core` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Shared agent runtime, communication, task queue | +| **Dependencies** | `authority` (for tokens) | + +**Agent Communication Protocol**: +- **Task Pull**: Agent polls for tasks (gRPC streaming or long-poll HTTP) +- **Heartbeat**: Agent sends health status every 30s +- **Log Streaming**: Agent streams step logs via gRPC/WebSocket +- **Authentication**: mTLS + short-lived JWT tokens + +**Task Queue Model**: +```typescript +interface AgentTask { + id: UUID; + type: string; // "deploy", "health-check", "rollback" + targetId: UUID; + payload: object; // type-specific + priority: number; + timeout: number; + assignedAt: DateTime; + expiresAt: DateTime; +} +``` + +#### Module: `agent-docker` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Docker host deployment | +| **Dependencies** | `agent-core` | +| **Capabilities Advertised** | `["docker"]` | + +**Operations**: +- Pull image (verify digest) +- Stop/rename existing container +- Create new container with config +- Start container +- Health check +- Write version sticker to volume/bind mount + +#### Module: `agent-compose` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Docker Compose deployment | +| **Dependencies** | `agent-core` | +| **Capabilities Advertised** | `["docker", "compose"]` | + +**Operations**: +- Write compose.stella.lock.yml to deployment directory +- `docker-compose pull` +- `docker-compose up -d` +- Health check on services +- Write version sticker + +#### Module: `agent-ssh` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | SSH remote executor (agentless) | +| **Dependencies** | `agent-core` | +| **Capabilities Advertised** | `["ssh", "linux"]` | + +**Operations**: +- Connect via SSH (key or password) +- Upload artifacts (SCP/SFTP) +- Execute commands +- Capture output/logs +- Write version sticker + +**Security Considerations**: +- Private keys stored in Vault, fetched at execution time +- Connection timeout and command timeout enforced +- No interactive sessions (command-only) + +#### Module: `agent-winrm` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | WinRM remote executor (agentless) | +| **Dependencies** | `agent-core` | +| **Capabilities Advertised** | `["winrm", "windows"]` | + +**Operations**: +- Connect via WinRM (NTLM or Kerberos) +- Upload artifacts (PowerShell remoting) +- Execute PowerShell commands +- Capture output/logs +- Write version sticker + +#### Module: `agent-ecs` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | AWS ECS deployment | +| **Dependencies** | `agent-core` | +| **Capabilities Advertised** | `["aws", "ecs"]` | + +**Operations**: +- Update task definition with new image digest +- Update service to use new task definition +- Wait for deployment to complete +- Health check via ECS service status +- Tag resources with version metadata + +#### Module: `agent-nomad` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | HashiCorp Nomad deployment | +| **Dependencies** | `agent-core` | +| **Capabilities Advertised** | `["nomad"]` | + +**Operations**: +- Update job spec with new image digest +- Submit job +- Monitor allocation status +- Health 
check +- Tag job with version metadata + +--- + +### 3.2.8 PROGDL: Progressive Delivery + +**Purpose**: A/B releases, canary deployments, traffic management. + +#### Module: `ab-manager` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | A/B release coordination | +| **Dependencies** | `release-manager`, `target-registry` | +| **Data Entities** | `ABRelease`, `Variation` | + +**A/B Release Structure**: +```typescript +interface ABRelease { + id: UUID; + tenantId: UUID; + environmentId: UUID; + name: string; + + variations: Variation[]; + activeVariation: string; // "A" or "B" + + trafficSplit: TrafficSplit; + rolloutStrategy: RolloutStrategy; + + status: ABStatus; + createdAt: DateTime; + completedAt: DateTime | null; +} + +interface Variation { + name: string; // "A", "B" + releaseId: UUID; + targetGroupId: UUID; // or targetLabels + trafficPercentage: number; + healthStatus: HealthStatus; +} + +interface TrafficSplit { + type: "percentage" | "header" | "cookie" | "tenant" | "label"; + config: object; // type-specific +} + +interface RolloutStrategy { + type: "manual" | "time-based" | "health-based" | "mixed"; + stages: RolloutStage[]; +} + +interface RolloutStage { + trafficPercentage: number; + duration: number; // seconds (for time-based) + healthThreshold: number; // percentage (for health-based) + requireManualApproval: boolean; +} +``` + +#### Module: `traffic-router` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Router plugin orchestration | +| **Dependencies** | `integration-manager`, `connector-runtime` | + +**Router Plugin Interface**: +```typescript +interface TrafficRouterPlugin { + // Configure traffic split + configureRoute(config: RouteConfig): Promise; + + // Get current traffic distribution + getTrafficDistribution(): Promise; + + // Shift traffic + shiftTraffic(from: string, to: string, percentage: number): Promise; + + // Health of routing layer + getHealth(): Promise; +} + +interface RouteConfig { + upstream: string; // backend service name + variations: VariationRoute[]; +} + +interface VariationRoute { + name: string; + targets: string[]; // backend addresses + weight: number; // 0-100 + headers?: Record; // header-based routing + cookies?: Record; // cookie-based routing +} +``` + +**Supported Routers** (via plugins): +- Nginx (config generation + reload) +- HAProxy (config generation + reload) +- Traefik (dynamic config API) +- AWS ALB (target group weights) +- Custom (webhook) + +#### Module: `canary-controller` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Canary ramp automation | +| **Dependencies** | `ab-manager`, `traffic-router` | +| **Events Produced** | `canary.stage_started`, `canary.stage_completed`, `canary.promoted`, `canary.rolled_back` | + +**Canary Flow**: +``` +┌─────────────────────────────────────────────────────────────────────┐ +│ CANARY CONTROLLER │ +│ │ +│ Stage 1: 10% Stage 2: 25% Stage 3: 50% │ +│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ +│ │ Deploy B │ │ Shift 25% │ │ Shift 50% │ │ +│ │ Shift 10% │──────►│ Check health│─────►│ Check health│───────►│ +│ │ Check health│ │ Wait 5m │ │ Wait 10m │ │ +│ └─────────────┘ └─────────────┘ └─────────────┘ │ +│ │ │ │ │ +│ ▼ ▼ ▼ │ +│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ +│ │ Health fail?│ │ Health fail?│ │ Health fail?│ │ +│ │ → Rollback │ │ → Rollback │ │ → Rollback │ │ +│ └─────────────┘ └─────────────┘ └─────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────┐ │ +│ │ Stage 4:100%│ │ +│ 
│ Full promote│ │ +│ │ Retire A │ │ +│ └─────────────┘ │ +└─────────────────────────────────────────────────────────────────────┘ +``` + +#### Module: `rollout-strategy` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Strategy templates | +| **Dependencies** | - | +| **Data Entities** | `RolloutStrategyTemplate` | + +**Built-in Strategies**: + +| Strategy | Description | +|----------|-------------| +| `immediate` | 0% → 100% immediately | +| `canary-10-25-50-100` | 10% → 25% → 50% → 100% with health checks | +| `blue-green` | 0% → 100% with instant cutover | +| `rolling-10` | 10% at a time, rolling update | +| `manual-stages` | Manual approval at each stage | + +--- + +### 3.2.9 RELEVI: Release Evidence + +**Purpose**: Generate, sign, and export release evidence packets. + +#### Module: `evidence-collector` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Aggregate evidence from all sources | +| **Dependencies** | All modules that produce evidence | + +**Evidence Sources**: +- Scan verdicts (from SCANENG) +- Approval records (from PROMOT) +- Decision records (from decision-engine) +- Deployment logs (from DEPLOY) +- Version stickers (from targets) +- Policy snapshots (from REASON) + +#### Module: `evidence-signer` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Cryptographic signing of evidence packets | +| **Dependencies** | Key management (Vault integration) | + +**Signing Algorithm**: +1. Compute SHA256 of evidence packet JSON +2. Sign hash with Stella release authority key +3. Attach signature and key reference to packet +4. Optionally: counter-sign with customer key + +#### Module: `sticker-writer` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Generate and write version stickers | +| **Dependencies** | `artifact-generator` | + +**Sticker Placement**: +- Docker: Volume or bind mount at `/stella/version.json` +- Compose: Deployment directory +- SSH: Specified deployment directory +- ECS: Task metadata (tags) +- Nomad: Job metadata + +#### Module: `audit-exporter` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Compliance report generation | +| **Dependencies** | `evidence-collector` | + +**Export Formats**: +- JSON (machine-readable) +- PDF (human-readable report) +- CSV (spreadsheet) +- SARIF (security findings) + +**Report Types**: +- Release audit trail (single release) +- Environment audit (all releases to env) +- Compliance summary (SOC2, FedRAMP) +- Vulnerability remediation (CVE resolution history) + +--- + +### 3.2.10 PLUGIN: Plugin Infrastructure + +**Purpose**: Enable extensibility via plugins. 
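+
+**Illustrative Sketch (non-normative)**: The fragment below shows the `stellaMinVersion` compatibility check performed during manifest validation (see the discovery flow in Section 4.2). A plain `MAJOR.MINOR.PATCH` comparison is assumed for the sketch; prerelease and build-metadata handling, and the choice of semver library, are left to the implementation.
+
+```typescript
+// Non-normative semver gate used at plugin registration time.
+function parseSemver(version: string): [number, number, number] {
+  const m = /^(\d+)\.(\d+)\.(\d+)/.exec(version);
+  if (!m) throw new Error(`Not a semver string: ${version}`);
+  return [Number(m[1]), Number(m[2]), Number(m[3])];
+}
+
+function isCompatible(stellaVersion: string, stellaMinVersion: string): boolean {
+  const current = parseSemver(stellaVersion);
+  const required = parseSemver(stellaMinVersion);
+  for (let i = 0; i < 3; i++) {
+    if (current[i] !== required[i]) return current[i] > required[i];
+  }
+  return true; // equal versions are compatible
+}
+
+// e.g. isCompatible("1.2.0", "1.0.0") === true, isCompatible("0.9.0", "1.0.0") === false
+```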
+ +#### Module: `plugin-registry` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Plugin discovery, versioning, dependency resolution | +| **Data Entities** | `Plugin`, `PluginVersion`, `PluginDependency` | + +**Plugin Manifest Schema**: +```typescript +interface PluginManifest { + // Identity + pluginId: string; // "stella-plugin-github" + version: string; // semver + vendor: string; + license: string; + + // Capabilities + capabilities: PluginCapabilities; + + // Dependencies + dependencies: PluginDependency[]; + stellaMinVersion: string; + + // Runtime + entrypoint: string; // container image or assembly + configSchema: JSONSchema; + + // UI contributions + uiContributions: UIContribution[]; +} + +interface PluginCapabilities { + providesIntegrations?: IntegrationCapability[]; + providesSteps?: StepCapability[]; + providesAgentTypes?: AgentCapability[]; + providesGates?: GateCapability[]; + providesRouters?: RouterCapability[]; + providesDoctorChecks?: DoctorCheckCapability[]; +} + +interface IntegrationCapability { + type: "scm" | "ci" | "registry" | "vault" | "target" | "router"; + name: string; + configSchema: JSONSchema; + secretsSchema: JSONSchema; +} + +interface StepCapability { + stepType: string; + name: string; + category: StepCategory; + inputSchema: JSONSchema; + outputSchema: JSONSchema; + configSchema: JSONSchema; + requiredCapabilities: string[]; + safeToRetry: boolean; + idempotencyStrategy: string; +} + +interface UIContribution { + type: "config-form" | "dashboard-widget" | "step-editor"; + component: string; // component name + assets: string[]; // JS/CSS bundles +} +``` + +#### Module: `plugin-loader` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Plugin lifecycle management | +| **Dependencies** | `plugin-registry`, `plugin-sandbox` | + +**Plugin Lifecycle**: +``` +┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ +│DISCOVERED│───►│ LOADED │───►│ ACTIVE │───►│ STOPPED │ +└──────────┘ └────┬─────┘ └────┬─────┘ └──────────┘ + │ │ + ▼ ▼ + ┌──────────┐ ┌──────────┐ + │ FAILED │ │ DEGRADED │ + └──────────┘ └──────────┘ +``` + +#### Module: `plugin-sandbox` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | Isolation, resource limits | +| **Dependencies** | Container runtime | + +**Sandbox Constraints**: +- CPU limit: configurable (default 1 core) +- Memory limit: configurable (default 512MB) +- Network: restricted (allowlist of endpoints) +- Filesystem: read-only except temp directory +- Timeout: enforced per operation + +#### Module: `plugin-sdk` + +| Aspect | Specification | +|--------|---------------| +| **Responsibility** | SDK for plugin development | +| **Languages** | C#, TypeScript, Go | + +**SDK Components**: +- Base classes for connector implementation +- Step provider interface +- gRPC/HTTP client for Stella APIs +- Logging and telemetry helpers +- Testing utilities + +--- + +# 4. Plugin System Specification + +## 4.1 Three-Surface Plugin Model + +Plugins interact with Stella through three distinct surfaces: + +### Surface 1: Plugin Manifest (Static Declaration) + +The manifest declares what the plugin provides. It is evaluated at plugin discovery time. 
+ +```yaml +# stella-plugin.yaml +pluginId: stella-plugin-github +version: 1.0.0 +vendor: Stella Ops +license: Apache-2.0 + +stellaMinVersion: "1.0.0" + +capabilities: + providesIntegrations: + - type: scm + name: GitHub + configSchema: + type: object + properties: + apiUrl: + type: string + default: "https://api.github.com" + organization: + type: string + required: [organization] + secretsSchema: + type: object + properties: + token: + type: string + x-stella-secret: true + required: [token] + + providesSteps: + - stepType: github-create-status + name: Create GitHub Status + category: notify + inputSchema: + type: object + properties: + commitSha: { type: string } + state: { type: string, enum: [pending, success, failure, error] } + description: { type: string } + required: [commitSha, state] + outputSchema: + type: object + properties: + statusUrl: { type: string } + safeToRetry: true + idempotencyStrategy: key + idempotencyKeyExpression: "${commitSha}-${state}" + +uiContributions: + - type: config-form + component: GitHubConfigForm + assets: + - /ui/github-config.js + +entrypoint: ghcr.io/stella-ops/plugin-github:1.0.0 +``` + +### Surface 2: Connector Runtime (Dynamic Execution) + +Plugins implement a standard gRPC interface for runtime operations. + +```protobuf +syntax = "proto3"; + +package stella.plugin.v1; + +service ConnectorService { + // Test connectivity to the integration + rpc TestConnection(TestConnectionRequest) returns (TestConnectionResponse); + + // Discover resources (repos, branches, etc.) + rpc DiscoverResources(DiscoverResourcesRequest) returns (DiscoverResourcesResponse); + + // Resolve tag to digest + rpc ResolveTagToDigest(ResolveTagRequest) returns (ResolveTagResponse); + + // Fetch metadata (commit info, build info) + rpc FetchMetadata(FetchMetadataRequest) returns (FetchMetadataResponse); + + // Execute a plugin-provided step + rpc ExecuteStep(ExecuteStepRequest) returns (stream ExecuteStepResponse); + + // Health check + rpc HealthCheck(HealthCheckRequest) returns (HealthCheckResponse); +} + +message TestConnectionRequest { + string integration_id = 1; + bytes config = 2; // JSON-encoded config + bytes secrets = 3; // JSON-encoded secrets (encrypted in transit) +} + +message TestConnectionResponse { + bool success = 1; + string message = 2; + repeated DiagnosticItem diagnostics = 3; +} + +message ExecuteStepRequest { + string step_type = 1; + string step_run_id = 2; + bytes config = 3; // step config + bytes inputs = 4; // step inputs + bytes context = 5; // execution context (release, env, etc.) +} + +message ExecuteStepResponse { + oneof payload { + LogEntry log = 1; + ProgressUpdate progress = 2; + OutputValue output = 3; + Artifact artifact = 4; + StepResult result = 5; + } +} + +message StepResult { + bool success = 1; + string message = 2; + bytes outputs = 3; // JSON-encoded outputs +} +``` + +### Surface 3: Step Provider (Execution Contract) + +Steps must declare their execution characteristics for the orchestrator. 
+ +```typescript +interface StepProviderContract { + // Step metadata + stepType: string; + + // Input/output contracts + validateInputs(inputs: object): ValidationResult; + validateConfig(config: object): ValidationResult; + + // Execution characteristics + idempotencyStrategy: "none" | "key" | "automatic"; + computeIdempotencyKey?(inputs: object, config: object): string; + + // Retry behavior + safeToRetry: boolean; + retryableErrors: string[]; // error types that can be retried + + // Timeout + defaultTimeout: number; + maxTimeout: number; + + // Capability requirements + requiredCapabilities: string[]; + requiredPermissions: string[]; + + // Execution + execute(context: StepContext): AsyncIterable; +} +``` + +## 4.2 Plugin Discovery and Loading + +``` +┌─────────────────────────────────────────────────────────────────────┐ +│ PLUGIN DISCOVERY FLOW │ +│ │ +│ ┌──────────────┐ │ +│ │ Plugin Source│ (local dir, OCI registry, marketplace) │ +│ └──────┬───────┘ │ +│ │ │ +│ ▼ │ +│ ┌──────────────┐ │ +│ │Read Manifest │ │ +│ └──────┬───────┘ │ +│ │ │ +│ ▼ │ +│ ┌──────────────┐ │ +│ │Validate │ - Schema validation │ +│ │Manifest │ - Version compatibility check │ +│ └──────┬───────┘ - Capability conflict detection │ +│ │ │ +│ ▼ │ +│ ┌──────────────┐ │ +│ │Resolve │ │ +│ │Dependencies │ │ +│ └──────┬───────┘ │ +│ │ │ +│ ▼ │ +│ ┌──────────────┐ │ +│ │Register │ - Add to plugin registry │ +│ │Plugin │ - Register step types │ +│ └──────┬───────┘ - Register integration types │ +│ │ │ +│ ▼ │ +│ ┌──────────────┐ │ +│ │Start Plugin │ - Pull container image │ +│ │Container │ - Start with sandbox constraints │ +│ └──────┬───────┘ - Establish gRPC connection │ +│ │ │ +│ ▼ │ +│ ┌──────────────┐ │ +│ │Health Check │ - Verify plugin is responsive │ +│ └──────────────┘ - Mark as ACTIVE or FAILED │ +└─────────────────────────────────────────────────────────────────────┘ +``` + +## 4.3 Plugin Security Model + +| Concern | Mitigation | +|---------|------------| +| **Malicious plugin code** | Sandbox with resource limits; no host access | +| **Secret exfiltration** | Secrets injected at execution time; audit log of access | +| **Network abuse** | Allowlist of endpoints; egress filtering | +| **Denial of service** | CPU/memory limits; timeout enforcement | +| **Plugin impersonation** | Signed manifests; registry verification | +| **Privilege escalation** | Plugins cannot bypass policy engine | + +--- + +# 5. 
Data Model + +## 5.1 Database Schema (PostgreSQL) + +### 5.1.1 Core Tables + +```sql +-- ============================================================ +-- TENANT AND AUTHORITY (existing, extended) +-- ============================================================ + +-- Extended: Add release-related permissions +ALTER TABLE permissions ADD COLUMN IF NOT EXISTS + resource_type VARCHAR(50) CHECK (resource_type IN ( + 'environment', 'release', 'promotion', 'target', 'workflow', 'plugin' + )); + +-- ============================================================ +-- INTEGRATION HUB +-- ============================================================ + +CREATE TABLE integration_types ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + name VARCHAR(100) NOT NULL UNIQUE, + category VARCHAR(50) NOT NULL CHECK (category IN ( + 'scm', 'ci', 'registry', 'vault', 'target', 'router' + )), + plugin_id UUID REFERENCES plugins(id), + config_schema JSONB NOT NULL, + secrets_schema JSONB NOT NULL, + is_builtin BOOLEAN NOT NULL DEFAULT FALSE, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE TABLE integrations ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + integration_type_id UUID NOT NULL REFERENCES integration_types(id), + name VARCHAR(255) NOT NULL, + config JSONB NOT NULL, + credential_ref VARCHAR(500), -- Vault path or encrypted ref + status VARCHAR(50) NOT NULL DEFAULT 'unknown' CHECK (status IN ( + 'healthy', 'degraded', 'unhealthy', 'unknown' + )), + last_health_check TIMESTAMPTZ, + last_health_message TEXT, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + created_by UUID REFERENCES users(id), + UNIQUE (tenant_id, name) +); + +CREATE INDEX idx_integrations_tenant ON integrations(tenant_id); +CREATE INDEX idx_integrations_type ON integrations(integration_type_id); + +CREATE TABLE connection_profiles ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + user_id UUID NOT NULL REFERENCES users(id), + integration_type_id UUID NOT NULL REFERENCES integration_types(id), + name VARCHAR(255) NOT NULL, + config_defaults JSONB NOT NULL, + is_default BOOLEAN NOT NULL DEFAULT FALSE, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, user_id, integration_type_id, name) +); + +-- ============================================================ +-- ENVIRONMENT & INVENTORY +-- ============================================================ + +CREATE TABLE environments ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(100) NOT NULL, + display_name VARCHAR(255) NOT NULL, + order_index INTEGER NOT NULL, + config JSONB NOT NULL DEFAULT '{}', + freeze_windows JSONB NOT NULL DEFAULT '[]', + required_approvals INTEGER NOT NULL DEFAULT 0, + require_sod BOOLEAN NOT NULL DEFAULT FALSE, + auto_promote_from UUID REFERENCES environments(id), + promotion_policy VARCHAR(255), + deployment_timeout INTEGER NOT NULL DEFAULT 600, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, name) +); + +CREATE INDEX idx_environments_tenant ON environments(tenant_id); +CREATE INDEX idx_environments_order ON environments(tenant_id, order_index); + +CREATE TABLE target_groups ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT 
NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + labels JSONB NOT NULL DEFAULT '{}', + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, environment_id, name) +); + +CREATE TABLE targets ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE, + target_group_id UUID REFERENCES target_groups(id), + name VARCHAR(255) NOT NULL, + target_type VARCHAR(100) NOT NULL, + connection JSONB NOT NULL, + capabilities JSONB NOT NULL DEFAULT '[]', + labels JSONB NOT NULL DEFAULT '{}', + deployment_directory VARCHAR(500), + health_status VARCHAR(50) NOT NULL DEFAULT 'unknown' CHECK (health_status IN ( + 'healthy', 'degraded', 'unhealthy', 'unknown' + )), + last_health_check TIMESTAMPTZ, + current_digest VARCHAR(100), + agent_id UUID REFERENCES agents(id), + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, environment_id, name) +); + +CREATE INDEX idx_targets_tenant_env ON targets(tenant_id, environment_id); +CREATE INDEX idx_targets_type ON targets(target_type); +CREATE INDEX idx_targets_labels ON targets USING GIN (labels); + +CREATE TABLE agents ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + version VARCHAR(50) NOT NULL, + capabilities JSONB NOT NULL DEFAULT '[]', + labels JSONB NOT NULL DEFAULT '{}', + status VARCHAR(50) NOT NULL DEFAULT 'offline' CHECK (status IN ( + 'online', 'offline', 'degraded' + )), + last_heartbeat TIMESTAMPTZ, + resource_usage JSONB, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, name) +); + +CREATE INDEX idx_agents_tenant ON agents(tenant_id); +CREATE INDEX idx_agents_status ON agents(status); +CREATE INDEX idx_agents_capabilities ON agents USING GIN (capabilities); + +-- ============================================================ +-- RELEASE MANAGEMENT +-- ============================================================ + +CREATE TABLE components ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + display_name VARCHAR(255) NOT NULL, + image_repository VARCHAR(500) NOT NULL, + registry_integration_id UUID REFERENCES integrations(id), + versioning_strategy JSONB NOT NULL DEFAULT '{"type": "semver"}', + deployment_template VARCHAR(255), + default_channel VARCHAR(50) NOT NULL DEFAULT 'stable', + metadata JSONB NOT NULL DEFAULT '{}', + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (tenant_id, name) +); + +CREATE INDEX idx_components_tenant ON components(tenant_id); + +CREATE TABLE version_maps ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + component_id UUID NOT NULL REFERENCES components(id) ON DELETE CASCADE, + tag VARCHAR(255) NOT NULL, + digest VARCHAR(100) NOT NULL, + semver VARCHAR(50), + channel VARCHAR(50) NOT NULL DEFAULT 'stable', + prerelease BOOLEAN NOT NULL DEFAULT FALSE, + build_metadata VARCHAR(255), + resolved_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + source VARCHAR(50) NOT NULL DEFAULT 'auto' CHECK (source IN ('auto', 'manual')), + UNIQUE 
(tenant_id, component_id, digest) +); + +CREATE INDEX idx_version_maps_component ON version_maps(component_id); +CREATE INDEX idx_version_maps_digest ON version_maps(digest); +CREATE INDEX idx_version_maps_semver ON version_maps(semver); + +CREATE TABLE releases ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + name VARCHAR(255) NOT NULL, + display_name VARCHAR(255) NOT NULL, + components JSONB NOT NULL, -- [{componentId, digest, semver, tag, role}] + source_ref JSONB, -- {scmIntegrationId, commitSha, ciIntegrationId, buildId} + status VARCHAR(50) NOT NULL DEFAULT 'draft' CHECK (status IN ( + 'draft', 'ready', 'promoting', 'deployed', 'deprecated', 'archived' + )), + metadata JSONB NOT NULL DEFAULT '{}', + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + created_by UUID REFERENCES users(id), + UNIQUE (tenant_id, name) +); + +CREATE INDEX idx_releases_tenant ON releases(tenant_id); +CREATE INDEX idx_releases_status ON releases(status); +CREATE INDEX idx_releases_created ON releases(created_at DESC); + +CREATE TABLE release_environment_state ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE, + release_id UUID NOT NULL REFERENCES releases(id), + status VARCHAR(50) NOT NULL CHECK (status IN ( + 'deployed', 'deploying', 'failed', 'rolling_back', 'rolled_back' + )), + deployed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + deployed_by UUID REFERENCES users(id), + promotion_id UUID, -- will reference promotions + evidence_ref VARCHAR(255), + UNIQUE (tenant_id, environment_id) +); + +CREATE INDEX idx_release_env_state_env ON release_environment_state(environment_id); +CREATE INDEX idx_release_env_state_release ON release_environment_state(release_id); + +-- ============================================================ +-- WORKFLOW ENGINE +-- ============================================================ + +CREATE TABLE workflow_templates ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID REFERENCES tenants(id) ON DELETE CASCADE, -- NULL for builtin + name VARCHAR(255) NOT NULL, + display_name VARCHAR(255) NOT NULL, + description TEXT, + version INTEGER NOT NULL DEFAULT 1, + nodes JSONB NOT NULL, + edges JSONB NOT NULL, + inputs JSONB NOT NULL DEFAULT '[]', + outputs JSONB NOT NULL DEFAULT '[]', + is_builtin BOOLEAN NOT NULL DEFAULT FALSE, + tags JSONB NOT NULL DEFAULT '[]', + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + created_by UUID REFERENCES users(id), + UNIQUE (tenant_id, name, version) +); + +CREATE INDEX idx_workflow_templates_tenant ON workflow_templates(tenant_id); +CREATE INDEX idx_workflow_templates_builtin ON workflow_templates(is_builtin); + +CREATE TABLE workflow_runs ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + template_id UUID NOT NULL REFERENCES workflow_templates(id), + template_version INTEGER NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'created' CHECK (status IN ( + 'created', 'running', 'paused', 'succeeded', 'failed', 'cancelled' + )), + context JSONB NOT NULL, -- inputs, variables, release info + outputs JSONB, + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + error_message TEXT, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + triggered_by UUID REFERENCES users(id) 
+); + +CREATE INDEX idx_workflow_runs_tenant ON workflow_runs(tenant_id); +CREATE INDEX idx_workflow_runs_status ON workflow_runs(status); +CREATE INDEX idx_workflow_runs_template ON workflow_runs(template_id); + +CREATE TABLE step_runs ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + workflow_run_id UUID NOT NULL REFERENCES workflow_runs(id) ON DELETE CASCADE, + node_id VARCHAR(100) NOT NULL, + step_type VARCHAR(100) NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'running', 'succeeded', 'failed', 'skipped', 'retrying', 'cancelled' + )), + inputs JSONB NOT NULL, + config JSONB NOT NULL, + outputs JSONB, + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + attempt_number INTEGER NOT NULL DEFAULT 1, + error_message TEXT, + error_type VARCHAR(100), + logs TEXT, + artifacts JSONB NOT NULL DEFAULT '[]', + UNIQUE (workflow_run_id, node_id, attempt_number) +); + +CREATE INDEX idx_step_runs_workflow ON step_runs(workflow_run_id); +CREATE INDEX idx_step_runs_status ON step_runs(status); + +-- ============================================================ +-- PROMOTION & APPROVAL +-- ============================================================ + +CREATE TABLE promotions ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + release_id UUID NOT NULL REFERENCES releases(id), + source_environment_id UUID REFERENCES environments(id), + target_environment_id UUID NOT NULL REFERENCES environments(id), + status VARCHAR(50) NOT NULL DEFAULT 'pending_approval' CHECK (status IN ( + 'pending_approval', 'pending_gate', 'approved', 'rejected', + 'deploying', 'deployed', 'failed', 'cancelled', 'rolled_back' + )), + decision_record JSONB, + workflow_run_id UUID REFERENCES workflow_runs(id), + requested_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + requested_by UUID NOT NULL REFERENCES users(id), + request_reason TEXT, + decided_at TIMESTAMPTZ, + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + evidence_packet_id UUID +); + +CREATE INDEX idx_promotions_tenant ON promotions(tenant_id); +CREATE INDEX idx_promotions_release ON promotions(release_id); +CREATE INDEX idx_promotions_status ON promotions(status); +CREATE INDEX idx_promotions_target_env ON promotions(target_environment_id); + +-- Add FK to release_environment_state +ALTER TABLE release_environment_state + ADD CONSTRAINT fk_release_env_state_promotion + FOREIGN KEY (promotion_id) REFERENCES promotions(id); + +CREATE TABLE approvals ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + promotion_id UUID NOT NULL REFERENCES promotions(id) ON DELETE CASCADE, + approver_id UUID NOT NULL REFERENCES users(id), + action VARCHAR(50) NOT NULL CHECK (action IN ('approved', 'rejected')), + comment TEXT, + approved_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + approver_role VARCHAR(255), + approver_groups JSONB NOT NULL DEFAULT '[]' +); + +CREATE INDEX idx_approvals_promotion ON approvals(promotion_id); +CREATE INDEX idx_approvals_approver ON approvals(approver_id); + +CREATE TABLE approval_policies ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE, + required_count INTEGER NOT NULL DEFAULT 1, + required_roles JSONB NOT NULL DEFAULT '[]', + required_groups JSONB NOT NULL DEFAULT '[]', + require_sod BOOLEAN NOT NULL DEFAULT FALSE, + allow_self_approval 
BOOLEAN NOT NULL DEFAULT FALSE, + expiration_minutes INTEGER NOT NULL DEFAULT 1440, + UNIQUE (tenant_id, environment_id) +); + +-- ============================================================ +-- DEPLOYMENT +-- ============================================================ + +CREATE TABLE deployment_jobs ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + promotion_id UUID NOT NULL REFERENCES promotions(id), + release_id UUID NOT NULL REFERENCES releases(id), + environment_id UUID NOT NULL REFERENCES environments(id), + status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'running', 'succeeded', 'failed', 'cancelled', 'rolling_back', 'rolled_back' + )), + strategy VARCHAR(50) NOT NULL DEFAULT 'all-at-once', + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + artifacts JSONB NOT NULL DEFAULT '[]', + rollback_of UUID REFERENCES deployment_jobs(id), + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX idx_deployment_jobs_promotion ON deployment_jobs(promotion_id); +CREATE INDEX idx_deployment_jobs_status ON deployment_jobs(status); + +CREATE TABLE deployment_tasks ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + job_id UUID NOT NULL REFERENCES deployment_jobs(id) ON DELETE CASCADE, + target_id UUID NOT NULL REFERENCES targets(id), + digest VARCHAR(100) NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'running', 'succeeded', 'failed', 'cancelled', 'skipped' + )), + agent_id UUID REFERENCES agents(id), + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + exit_code INTEGER, + logs TEXT, + previous_digest VARCHAR(100), + sticker_written BOOLEAN NOT NULL DEFAULT FALSE, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX idx_deployment_tasks_job ON deployment_tasks(job_id); +CREATE INDEX idx_deployment_tasks_target ON deployment_tasks(target_id); +CREATE INDEX idx_deployment_tasks_status ON deployment_tasks(status); + +CREATE TABLE generated_artifacts ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + deployment_job_id UUID REFERENCES deployment_jobs(id) ON DELETE CASCADE, + artifact_type VARCHAR(50) NOT NULL CHECK (artifact_type IN ( + 'compose_lock', 'script', 'sticker', 'evidence', 'config' + )), + name VARCHAR(255) NOT NULL, + content_hash VARCHAR(100) NOT NULL, + content BYTEA, -- for small artifacts + storage_ref VARCHAR(500), -- for large artifacts (S3, etc.) 
+ created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX idx_generated_artifacts_job ON generated_artifacts(deployment_job_id); + +-- ============================================================ +-- PROGRESSIVE DELIVERY +-- ============================================================ + +CREATE TABLE ab_releases ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + environment_id UUID NOT NULL REFERENCES environments(id), + name VARCHAR(255) NOT NULL, + variations JSONB NOT NULL, -- [{name, releaseId, targetGroupId, trafficPercentage}] + active_variation VARCHAR(50) NOT NULL DEFAULT 'A', + traffic_split JSONB NOT NULL, + rollout_strategy JSONB NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'created' CHECK (status IN ( + 'created', 'deploying', 'running', 'promoting', 'completed', 'rolled_back' + )), + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + completed_at TIMESTAMPTZ, + created_by UUID REFERENCES users(id) +); + +CREATE INDEX idx_ab_releases_tenant_env ON ab_releases(tenant_id, environment_id); +CREATE INDEX idx_ab_releases_status ON ab_releases(status); + +CREATE TABLE canary_stages ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + ab_release_id UUID NOT NULL REFERENCES ab_releases(id) ON DELETE CASCADE, + stage_number INTEGER NOT NULL, + traffic_percentage INTEGER NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN ( + 'pending', 'running', 'succeeded', 'failed', 'skipped' + )), + health_threshold DECIMAL(5,2), + duration_seconds INTEGER, + require_approval BOOLEAN NOT NULL DEFAULT FALSE, + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + health_result JSONB, + UNIQUE (ab_release_id, stage_number) +); + +-- ============================================================ +-- RELEASE EVIDENCE +-- ============================================================ + +CREATE TABLE evidence_packets ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + promotion_id UUID NOT NULL REFERENCES promotions(id), + packet_type VARCHAR(50) NOT NULL CHECK (packet_type IN ( + 'release_decision', 'deployment', 'rollback', 'ab_promotion' + )), + content JSONB NOT NULL, + content_hash VARCHAR(100) NOT NULL, + signature TEXT, + signer_key_ref VARCHAR(255), + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() + -- Note: No UPDATE or DELETE allowed (append-only) +); + +CREATE INDEX idx_evidence_packets_promotion ON evidence_packets(promotion_id); +CREATE INDEX idx_evidence_packets_created ON evidence_packets(created_at DESC); + +-- Append-only enforcement via trigger +CREATE OR REPLACE FUNCTION prevent_evidence_modification() +RETURNS TRIGGER AS $$ +BEGIN + RAISE EXCEPTION 'Evidence packets are immutable and cannot be modified or deleted'; +END; +$$ LANGUAGE plpgsql; + +CREATE TRIGGER evidence_packets_immutable +BEFORE UPDATE OR DELETE ON evidence_packets +FOR EACH ROW EXECUTE FUNCTION prevent_evidence_modification(); + +CREATE TABLE version_stickers ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + target_id UUID NOT NULL REFERENCES targets(id), + release_id UUID NOT NULL REFERENCES releases(id), + promotion_id UUID NOT NULL REFERENCES promotions(id), + sticker_content JSONB NOT NULL, + content_hash VARCHAR(100) NOT NULL, + written_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + verified_at TIMESTAMPTZ, + drift_detected BOOLEAN NOT NULL DEFAULT FALSE +); + +CREATE INDEX 
idx_version_stickers_target ON version_stickers(target_id); +CREATE INDEX idx_version_stickers_release ON version_stickers(release_id); + +-- ============================================================ +-- PLUGINS +-- ============================================================ + +CREATE TABLE plugins ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + plugin_id VARCHAR(255) NOT NULL UNIQUE, + version VARCHAR(50) NOT NULL, + vendor VARCHAR(255) NOT NULL, + license VARCHAR(100), + manifest JSONB NOT NULL, + status VARCHAR(50) NOT NULL DEFAULT 'discovered' CHECK (status IN ( + 'discovered', 'loaded', 'active', 'stopped', 'failed', 'degraded' + )), + entrypoint VARCHAR(500) NOT NULL, + last_health_check TIMESTAMPTZ, + health_message TEXT, + installed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX idx_plugins_status ON plugins(status); + +CREATE TABLE plugin_instances ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + plugin_id UUID NOT NULL REFERENCES plugins(id) ON DELETE CASCADE, + tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE, + config JSONB NOT NULL DEFAULT '{}', + enabled BOOLEAN NOT NULL DEFAULT TRUE, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX idx_plugin_instances_tenant ON plugin_instances(tenant_id); +``` + +## 5.2 Entity Relationship Diagram + +``` +┌─────────────────────────────────────────────────────────────────────────────────┐ +│ ENTITY RELATIONSHIPS │ +│ │ +│ ┌──────────┐ ┌──────────────┐ ┌────────────┐ │ +│ │ Tenant │───────│ Environment │───────│ Target │ │ +│ └──────────┘ └──────────────┘ └────────────┘ │ +│ │ │ │ │ +│ │ │ │ │ +│ ▼ ▼ ▼ │ +│ ┌──────────┐ ┌──────────────┐ ┌────────────┐ │ +│ │ Component│ │ Approval │ │ Agent │ │ +│ └──────────┘ │ Policy │ └────────────┘ │ +│ │ └──────────────┘ │ │ +│ │ │ │ │ +│ ▼ │ ▼ │ +│ ┌──────────┐ │ ┌─────────────┐ │ +│ │ Version │ │ │ Deployment │ │ +│ │ Map │ │ │ Task │ │ +│ └──────────┘ │ └─────────────┘ │ +│ │ │ │ │ +│ │ │ │ │ +│ ▼ │ ▼ │ +│ ┌─────────────────────────┼─────────────────────────────┐ │ +│ │ │ │ │ +│ │ ┌──────────┐ ┌─────▼─────┐ ┌─────────────┐ │ │ +│ │ │ Release │─────│ Promotion │─────│ Deployment │ │ │ +│ │ └──────────┘ └───────────┘ │ Job │ │ │ +│ │ │ │ └─────────────┘ │ │ +│ │ │ │ │ │ │ +│ │ │ ▼ │ │ │ +│ │ │ ┌───────────┐ │ │ │ +│ │ │ │ Approval │ │ │ │ +│ │ │ └───────────┘ │ │ │ +│ │ │ │ │ │ │ +│ │ │ ▼ ▼ │ │ +│ │ │ ┌───────────┐ ┌───────────┐ │ │ +│ │ │ │ Decision │ │ Generated │ │ │ +│ │ │ │ Record │ │ Artifacts │ │ │ +│ │ │ └───────────┘ └───────────┘ │ │ +│ │ │ │ │ │ │ +│ │ │ └────────┬────────┘ │ │ +│ │ │ │ │ │ +│ │ │ ▼ │ │ +│ │ │ ┌───────────┐ │ │ +│ │ └───────────────────►│ Evidence │◄────────────┘ │ +│ │ │ Packet │ │ +│ │ └───────────┘ │ +│ │ │ │ +│ │ ▼ │ +│ │ ┌───────────┐ │ +│ │ │ Version │ │ +│ │ │ Sticker │ │ +│ │ └───────────┘ │ +│ │ │ +│ └─────────────────────────────────────────────────────────────────────────────┘ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────────────┐ +│ │ WORKFLOW RELATIONSHIPS │ +│ │ │ +│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ │ Workflow │───────│ Workflow │───────│ Step │ │ +│ │ │ Template │ │ Run │ │ Run │ │ +│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ +│ │ │ │ │ +│ │ │ │ │ +│ │ ▼ ▼ │ +│ │ ┌──────────────┐ ┌──────────────┐ │ +│ │ │ Step │ │ Promotion │ │ +│ │ │ Registry │ │ │ │ +│ │ └──────────────┘ └──────────────┘ │ +│ │ │ │ +│ │ │ │ +│ │ ▼ │ +│ │ ┌──────────────┐ │ +│ │ │ Plugin │ │ +│ │ └──────────────┘ │ 
+│ │ │ +│ └─────────────────────────────────────────────────────────────────────────────┘ +└─────────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +# 6. API Specification + +## 6.1 API Design Principles + +| Principle | Implementation | +|-----------|----------------| +| **RESTful** | Resource-oriented URLs, standard HTTP methods | +| **Versioned** | `/api/v1/...` prefix; breaking changes require version bump | +| **Consistent** | Standard response envelope, error format, pagination | +| **Authenticated** | OAuth 2.0 Bearer tokens via Authority module | +| **Tenant-scoped** | Tenant ID from token; all operations scoped to tenant | +| **Audited** | All mutating operations logged with user/timestamp | + +## 6.2 Standard Response Envelope + +```typescript +// Success response +interface ApiResponse { + success: true; + data: T; + meta?: { + pagination?: PaginationMeta; + requestId: string; + timestamp: string; + }; +} + +// Error response +interface ApiErrorResponse { + success: false; + error: { + code: string; // e.g., "PROMOTION_BLOCKED" + message: string; // Human-readable message + details?: object; // Additional context + validationErrors?: ValidationError[]; + }; + meta: { + requestId: string; + timestamp: string; + }; +} + +interface PaginationMeta { + page: number; + pageSize: number; + totalItems: number; + totalPages: number; + hasNext: boolean; + hasPrevious: boolean; +} + +interface ValidationError { + field: string; + message: string; + code: string; +} +``` + +## 6.3 API Endpoints by Module + +### 6.3.1 Integration Hub APIs + +```yaml +# Integration Types (read-only, populated by plugins) +GET /api/v1/integration-types +GET /api/v1/integration-types/{typeId} +GET /api/v1/integration-types/{typeId}/schema + +# Integrations +POST /api/v1/integrations + Body: { typeId, name, config, credentials } + Response: Integration + +GET /api/v1/integrations + Query: ?type={scm|ci|registry|vault|target|router}&status={healthy|degraded|unhealthy} + Response: Integration[] + +GET /api/v1/integrations/{id} + Response: Integration + +PUT /api/v1/integrations/{id} + Body: { name?, config?, credentials? 
} + Response: Integration + +DELETE /api/v1/integrations/{id} + Response: { deleted: true } + +POST /api/v1/integrations/{id}/test + Response: { success, message, diagnostics[] } + +POST /api/v1/integrations/{id}/discover + Body: { resourceType: "repos" | "branches" | "pipelines" | "images" } + Response: { resources: Resource[] } + +GET /api/v1/integrations/{id}/health + Response: { status, message, lastCheck, details } + +# Connection Profiles +POST /api/v1/connection-profiles +GET /api/v1/connection-profiles +GET /api/v1/connection-profiles/{id} +PUT /api/v1/connection-profiles/{id} +DELETE /api/v1/connection-profiles/{id} +POST /api/v1/connection-profiles/{id}/set-default +``` + +### 6.3.2 Environment & Inventory APIs + +```yaml +# Environments +POST /api/v1/environments + Body: { + name: string, + displayName: string, + orderIndex: number, + config: EnvironmentConfig, + requiredApprovals?: number, + requireSod?: boolean, + promotionPolicy?: string + } + Response: Environment + +GET /api/v1/environments + Query: ?includeState=true + Response: Environment[] (with current release state if requested) + +GET /api/v1/environments/{id} + Response: Environment + +PUT /api/v1/environments/{id} + Body: { ...partial Environment } + Response: Environment + +DELETE /api/v1/environments/{id} + Response: { deleted: true } + +# Freeze Windows +POST /api/v1/environments/{envId}/freeze-windows + Body: { start: DateTime, end: DateTime, reason: string, exceptions?: UUID[] } + Response: FreezeWindow + +GET /api/v1/environments/{envId}/freeze-windows + Query: ?active=true + Response: FreezeWindow[] + +DELETE /api/v1/environments/{envId}/freeze-windows/{windowId} + Response: { deleted: true } + +# Target Groups +POST /api/v1/environments/{envId}/target-groups +GET /api/v1/environments/{envId}/target-groups +GET /api/v1/target-groups/{id} +PUT /api/v1/target-groups/{id} +DELETE /api/v1/target-groups/{id} + +# Targets +POST /api/v1/targets + Body: { + environmentId: UUID, + targetGroupId?: UUID, + name: string, + targetType: string, + connection: TargetConnection, + labels?: Record, + deploymentDirectory?: string + } + Response: Target + +GET /api/v1/targets + Query: ?environmentId={uuid}&targetType={type}&labels={json}&healthStatus={status} + Response: Target[] + +GET /api/v1/targets/{id} + Response: Target + +PUT /api/v1/targets/{id} + Body: { ...partial Target } + Response: Target + +DELETE /api/v1/targets/{id} + Response: { deleted: true } + +POST /api/v1/targets/{id}/health-check + Response: { status, message, checkedAt } + +GET /api/v1/targets/{id}/sticker + Response: VersionSticker | null + +GET /api/v1/targets/{id}/drift + Response: { hasDrift: boolean, expected: VersionSticker, actual: VersionSticker | null, differences: Diff[] } + +# Agents +POST /api/v1/agents/register + Body: { name, version, capabilities[], labels } + Headers: X-Agent-Token: {registration-token} + Response: { agentId, token, config } + +GET /api/v1/agents + Query: ?status={online|offline|degraded}&capability={type} + Response: Agent[] + +GET /api/v1/agents/{id} + Response: Agent + +PUT /api/v1/agents/{id} + Body: { labels?, capabilities? 
} + Response: Agent + +DELETE /api/v1/agents/{id} + Response: { deleted: true } + +POST /api/v1/agents/{id}/heartbeat + Body: { status, resourceUsage, capabilities } + Response: { tasks: AgentTask[] } + +POST /api/v1/agents/{id}/tasks/{taskId}/complete + Body: { success, result, logs } + Response: { acknowledged: true } +``` + +### 6.3.3 Release Management APIs + +```yaml +# Components +POST /api/v1/components + Body: { + name: string, + displayName: string, + imageRepository: string, + registryIntegrationId: UUID, + versioningStrategy?: VersionStrategy, + defaultChannel?: string + } + Response: Component + +GET /api/v1/components + Response: Component[] + +GET /api/v1/components/{id} + Response: Component + +PUT /api/v1/components/{id} + Response: Component + +DELETE /api/v1/components/{id} + Response: { deleted: true } + +POST /api/v1/components/{id}/sync-versions + Body: { forceRefresh?: boolean } + Response: { synced: number, versions: VersionMap[] } + +GET /api/v1/components/{id}/versions + Query: ?channel={stable|beta}&limit={n} + Response: VersionMap[] + +# Version Maps +POST /api/v1/version-maps + Body: { componentId, tag, semver, channel } # manual version assignment + Response: VersionMap + +GET /api/v1/version-maps + Query: ?componentId={uuid}&channel={channel} + Response: VersionMap[] + +# Releases +POST /api/v1/releases + Body: { + name: string, + displayName?: string, + components: [ + { componentId: UUID, version?: string, digest?: string, channel?: string } + ], + sourceRef?: SourceReference + } + Response: Release + +GET /api/v1/releases + Query: ?status={status}&componentId={uuid}&page={n}&pageSize={n} + Response: { data: Release[], meta: PaginationMeta } + +GET /api/v1/releases/{id} + Response: Release (with full component details) + +PUT /api/v1/releases/{id} + Body: { displayName?, metadata?, status? 
} + Response: Release + +DELETE /api/v1/releases/{id} + Response: { deleted: true } + +GET /api/v1/releases/{id}/state + Response: { environments: [{ environmentId, status, deployedAt }] } + +POST /api/v1/releases/{id}/deprecate + Response: Release + +GET /api/v1/releases/{id}/compare/{otherId} + Response: { + added: Component[], + removed: Component[], + changed: [{ component, fromVersion, toVersion, fromDigest, toDigest }] + } + +# Quick release creation +POST /api/v1/releases/from-latest + Body: { + name: string, + channel?: string, # default: stable + componentIds?: UUID[], # default: all + pinFrom?: { environmentId: UUID } # for partial release + } + Response: Release +``` + +### 6.3.4 Workflow APIs + +```yaml +# Workflow Templates +POST /api/v1/workflow-templates + Body: { + name: string, + displayName: string, + description?: string, + nodes: StepNode[], + edges: StepEdge[], + inputs?: InputDefinition[], + outputs?: OutputDefinition[] + } + Response: WorkflowTemplate + +GET /api/v1/workflow-templates + Query: ?includeBuiltin=true&tags={tag} + Response: WorkflowTemplate[] + +GET /api/v1/workflow-templates/{id} + Response: WorkflowTemplate + +PUT /api/v1/workflow-templates/{id} + Body: { ...updates } # Creates new version + Response: WorkflowTemplate (new version) + +DELETE /api/v1/workflow-templates/{id} + Response: { deleted: true } + +POST /api/v1/workflow-templates/{id}/validate + Body: { inputs: object } + Response: { valid: boolean, errors: ValidationError[] } + +# Step Registry (read-only, populated by core + plugins) +GET /api/v1/step-types + Query: ?category={category}&provider={builtin|pluginId} + Response: StepType[] + +GET /api/v1/step-types/{type} + Response: StepType (with full schemas) + +# Workflow Runs +POST /api/v1/workflow-runs + Body: { + templateId: UUID, + context: object # inputs, release, environment, etc. 
+ } + Response: WorkflowRun + +GET /api/v1/workflow-runs + Query: ?status={status}&templateId={uuid}&page={n} + Response: { data: WorkflowRun[], meta: PaginationMeta } + +GET /api/v1/workflow-runs/{id} + Response: WorkflowRun (with step runs) + +POST /api/v1/workflow-runs/{id}/pause + Response: WorkflowRun + +POST /api/v1/workflow-runs/{id}/resume + Response: WorkflowRun + +POST /api/v1/workflow-runs/{id}/cancel + Response: WorkflowRun + +GET /api/v1/workflow-runs/{id}/steps + Response: StepRun[] + +GET /api/v1/workflow-runs/{id}/steps/{nodeId} + Response: StepRun (with logs) + +GET /api/v1/workflow-runs/{id}/steps/{nodeId}/logs + Query: ?follow=true # for streaming + Response: string | SSE stream + +GET /api/v1/workflow-runs/{id}/steps/{nodeId}/artifacts + Response: Artifact[] + +GET /api/v1/workflow-runs/{id}/steps/{nodeId}/artifacts/{artifactId} + Response: binary (download) +``` + +### 6.3.5 Promotion & Approval APIs + +```yaml +# Promotions +POST /api/v1/promotions + Body: { + releaseId: UUID, + targetEnvironmentId: UUID, + reason?: string + } + Response: Promotion + +GET /api/v1/promotions + Query: ?status={status}&releaseId={uuid}&environmentId={uuid}&page={n} + Response: { data: Promotion[], meta: PaginationMeta } + +GET /api/v1/promotions/{id} + Response: Promotion (with decision record, approvals) + +POST /api/v1/promotions/{id}/approve + Body: { comment?: string } + Response: Promotion + +POST /api/v1/promotions/{id}/reject + Body: { reason: string } + Response: Promotion + +POST /api/v1/promotions/{id}/cancel + Response: Promotion + +GET /api/v1/promotions/{id}/decision + Response: DecisionRecord + +GET /api/v1/promotions/{id}/approvals + Response: Approval[] + +GET /api/v1/promotions/{id}/evidence + Response: EvidencePacket + +# Gate Evaluation (for preview) +POST /api/v1/promotions/preview-gates + Body: { + releaseId: UUID, + targetEnvironmentId: UUID + } + Response: { + wouldPass: boolean, + gates: GateResult[] + } + +# Approval Policies +POST /api/v1/approval-policies +GET /api/v1/approval-policies +GET /api/v1/approval-policies/{id} +PUT /api/v1/approval-policies/{id} +DELETE /api/v1/approval-policies/{id} + +# Pending Approvals (for current user) +GET /api/v1/my/pending-approvals + Response: Promotion[] +``` + +### 6.3.6 Deployment APIs + +```yaml +# Deployment Jobs (mostly read-only; created by promotions) +GET /api/v1/deployment-jobs + Query: ?promotionId={uuid}&status={status}&environmentId={uuid} + Response: DeploymentJob[] + +GET /api/v1/deployment-jobs/{id} + Response: DeploymentJob (with tasks) + +GET /api/v1/deployment-jobs/{id}/tasks + Response: DeploymentTask[] + +GET /api/v1/deployment-jobs/{id}/tasks/{taskId} + Response: DeploymentTask (with logs) + +GET /api/v1/deployment-jobs/{id}/tasks/{taskId}/logs + Query: ?follow=true + Response: string | SSE stream + +GET /api/v1/deployment-jobs/{id}/artifacts + Response: GeneratedArtifact[] + +GET /api/v1/deployment-jobs/{id}/artifacts/{artifactId} + Response: binary (download) + +# Rollback +POST /api/v1/rollbacks + Body: { + environmentId: UUID, + strategy: "to-previous" | "to-release" | "to-sticker", + targetReleaseId?: UUID # for to-release strategy + } + Response: DeploymentJob (rollback job) + +GET /api/v1/rollbacks + Query: ?environmentId={uuid} + Response: DeploymentJob[] (rollback jobs only) +``` + +### 6.3.7 Progressive Delivery APIs + +```yaml +# A/B Releases +POST /api/v1/ab-releases + Body: { + environmentId: UUID, + name: string, + variations: [ + { name: "A", releaseId: UUID, targetGroupId?: UUID }, + { 
name: "B", releaseId: UUID, targetGroupId?: UUID } + ], + trafficSplit: TrafficSplit, + rolloutStrategy: RolloutStrategy + } + Response: ABRelease + +GET /api/v1/ab-releases + Query: ?environmentId={uuid}&status={status} + Response: ABRelease[] + +GET /api/v1/ab-releases/{id} + Response: ABRelease (with stages) + +POST /api/v1/ab-releases/{id}/start + Response: ABRelease + +POST /api/v1/ab-releases/{id}/advance + Body: { stageNumber?: number } # advance to next or specific stage + Response: ABRelease + +POST /api/v1/ab-releases/{id}/promote + Body: { variation: "A" | "B" } # promote to 100% + Response: ABRelease + +POST /api/v1/ab-releases/{id}/rollback + Response: ABRelease + +GET /api/v1/ab-releases/{id}/traffic + Response: { currentSplit: TrafficDistribution, history: TrafficHistory[] } + +GET /api/v1/ab-releases/{id}/health + Response: { variations: [{ name, healthStatus, metrics }] } + +# Rollout Strategies (templates) +GET /api/v1/rollout-strategies + Response: RolloutStrategyTemplate[] + +GET /api/v1/rollout-strategies/{id} + Response: RolloutStrategyTemplate +``` + +### 6.3.8 Evidence & Audit APIs + +```yaml +# Evidence Packets +GET /api/v1/evidence-packets + Query: ?promotionId={uuid}&type={type}&from={date}&to={date} + Response: EvidencePacket[] + +GET /api/v1/evidence-packets/{id} + Response: EvidencePacket (full content) + +GET /api/v1/evidence-packets/{id}/download + Query: ?format={json|pdf} + Response: binary + +# Audit Reports +POST /api/v1/audit-reports + Body: { + type: "release" | "environment" | "compliance", + scope: { releaseId?: UUID, environmentId?: UUID, from?: Date, to?: Date }, + format: "json" | "pdf" | "csv" + } + Response: { reportId: UUID, status: "generating" } + +GET /api/v1/audit-reports/{id} + Response: { status, downloadUrl? 
} + +GET /api/v1/audit-reports/{id}/download + Response: binary + +# Version Stickers +GET /api/v1/version-stickers + Query: ?targetId={uuid}&releaseId={uuid} + Response: VersionSticker[] + +GET /api/v1/version-stickers/{id} + Response: VersionSticker +``` + +### 6.3.9 Plugin APIs + +```yaml +# Plugin Registry +GET /api/v1/plugins + Query: ?status={status}&capability={type} + Response: Plugin[] + +GET /api/v1/plugins/{id} + Response: Plugin (with manifest) + +POST /api/v1/plugins/{id}/enable + Response: Plugin + +POST /api/v1/plugins/{id}/disable + Response: Plugin + +GET /api/v1/plugins/{id}/health + Response: { status, message, diagnostics[] } + +# Plugin Instances (per-tenant config) +POST /api/v1/plugin-instances + Body: { pluginId: UUID, config: object } + Response: PluginInstance + +GET /api/v1/plugin-instances + Response: PluginInstance[] + +PUT /api/v1/plugin-instances/{id} + Body: { config: object, enabled: boolean } + Response: PluginInstance + +DELETE /api/v1/plugin-instances/{id} + Response: { deleted: true } +``` + +## 6.4 WebSocket / SSE Endpoints + +```yaml +# Real-time workflow run updates +WS /api/v1/workflow-runs/{id}/stream + Messages: + - { type: "step_started", nodeId, timestamp } + - { type: "step_progress", nodeId, progress, message } + - { type: "step_log", nodeId, line, level } + - { type: "step_completed", nodeId, status, outputs } + - { type: "workflow_completed", status } + +# Real-time deployment updates +WS /api/v1/deployment-jobs/{id}/stream + Messages: + - { type: "task_started", taskId, targetId } + - { type: "task_progress", taskId, progress } + - { type: "task_log", taskId, line } + - { type: "task_completed", taskId, status } + - { type: "job_completed", status } + +# Agent task stream +WS /api/v1/agents/{id}/task-stream + Messages: + - { type: "task_assigned", task: AgentTask } + - { type: "task_cancelled", taskId } + +# Dashboard metrics stream +WS /api/v1/dashboard/stream + Messages: + - { type: "metric_update", metrics: DashboardMetrics } + - { type: "alert", alert: Alert } +``` + +--- + +# 7. 
Workflow Engine & State Machines + +## 7.1 Workflow Template Structure + +### 7.1.1 Node Types + +```typescript +// Base node structure +interface StepNode { + id: string; // Unique within template (e.g., "deploy-api") + type: string; // Step type from registry + name: string; // Display name + config: Record; // Step-specific configuration + inputs: InputBinding[]; // Input value bindings + outputs: OutputBinding[]; // Output declarations + position: { x: number; y: number }; // UI position + + // Execution settings + timeout: number; // Seconds (default from step type) + retryPolicy: RetryPolicy; + onFailure: FailureAction; + condition?: string; // JS expression for conditional execution + + // Documentation + description?: string; + documentation?: string; +} + +type FailureAction = "fail" | "continue" | "rollback" | "goto:{nodeId}"; + +interface InputBinding { + name: string; // Input parameter name + source: InputSource; +} + +type InputSource = + | { type: "literal"; value: any } + | { type: "context"; path: string } // e.g., "release.name" + | { type: "output"; nodeId: string; outputName: string } + | { type: "secret"; secretName: string } + | { type: "expression"; expression: string }; // JS expression + +interface OutputBinding { + name: string; + description?: string; +} +``` + +### 7.1.2 Edge Types + +```typescript +interface StepEdge { + id: string; + from: string; // Source node ID + to: string; // Target node ID + condition?: string; // Optional condition expression + label?: string; // Display label for conditional edges +} +``` + +### 7.1.3 Template Example: Standard Deployment + +```json +{ + "id": "template-standard-deploy", + "name": "standard-deploy", + "displayName": "Standard Deployment", + "version": 1, + "inputs": [ + { "name": "releaseId", "type": "uuid", "required": true }, + { "name": "environmentId", "type": "uuid", "required": true }, + { "name": "promotionId", "type": "uuid", "required": true } + ], + "nodes": [ + { + "id": "approval", + "type": "approval", + "name": "Approval Gate", + "config": {}, + "inputs": [ + { "name": "promotionId", "source": { "type": "context", "path": "promotionId" } } + ], + "position": { "x": 100, "y": 100 } + }, + { + "id": "security-gate", + "type": "security-gate", + "name": "Security Verification", + "config": { + "blockOnCritical": true, + "blockOnHigh": true + }, + "inputs": [ + { "name": "releaseId", "source": { "type": "context", "path": "releaseId" } } + ], + "position": { "x": 100, "y": 200 } + }, + { + "id": "pre-deploy-hook", + "type": "execute-script", + "name": "Pre-Deploy Hook", + "config": { + "scriptType": "csharp", + "scriptRef": "hooks/pre-deploy.csx" + }, + "inputs": [ + { "name": "release", "source": { "type": "context", "path": "release" } }, + { "name": "environment", "source": { "type": "context", "path": "environment" } } + ], + "timeout": 300, + "onFailure": "fail", + "position": { "x": 100, "y": 300 } + }, + { + "id": "deploy-targets", + "type": "deploy-compose", + "name": "Deploy to Targets", + "config": { + "strategy": "rolling", + "parallelism": 2 + }, + "inputs": [ + { "name": "releaseId", "source": { "type": "context", "path": "releaseId" } }, + { "name": "environmentId", "source": { "type": "context", "path": "environmentId" } } + ], + "timeout": 600, + "retryPolicy": { + "maxRetries": 2, + "backoffType": "exponential", + "backoffSeconds": 30 + }, + "onFailure": "rollback", + "position": { "x": 100, "y": 400 } + }, + { + "id": "health-check", + "type": "health-check", + "name": "Health 
Verification", + "config": { + "type": "http", + "path": "/health", + "expectedStatus": 200, + "timeout": 30, + "retries": 5 + }, + "inputs": [ + { "name": "targets", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "deployedTargets" } } + ], + "onFailure": "rollback", + "position": { "x": 100, "y": 500 } + }, + { + "id": "post-deploy-hook", + "type": "execute-script", + "name": "Post-Deploy Hook", + "config": { + "scriptType": "bash", + "inline": "echo 'Deployment complete'" + }, + "timeout": 300, + "onFailure": "continue", + "position": { "x": 100, "y": 600 } + }, + { + "id": "notify-success", + "type": "notify", + "name": "Success Notification", + "config": { + "channel": "slack", + "template": "deployment-success" + }, + "inputs": [ + { "name": "release", "source": { "type": "context", "path": "release" } }, + { "name": "environment", "source": { "type": "context", "path": "environment" } } + ], + "onFailure": "continue", + "position": { "x": 100, "y": 700 } + }, + { + "id": "rollback-handler", + "type": "rollback", + "name": "Rollback Handler", + "config": { + "strategy": "to-previous" + }, + "inputs": [ + { "name": "deploymentJobId", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "jobId" } } + ], + "position": { "x": 300, "y": 450 } + }, + { + "id": "notify-failure", + "type": "notify", + "name": "Failure Notification", + "config": { + "channel": "slack", + "template": "deployment-failure" + }, + "onFailure": "continue", + "position": { "x": 300, "y": 550 } + } + ], + "edges": [ + { "id": "e1", "from": "approval", "to": "security-gate" }, + { "id": "e2", "from": "security-gate", "to": "pre-deploy-hook" }, + { "id": "e3", "from": "pre-deploy-hook", "to": "deploy-targets" }, + { "id": "e4", "from": "deploy-targets", "to": "health-check" }, + { "id": "e5", "from": "health-check", "to": "post-deploy-hook" }, + { "id": "e6", "from": "post-deploy-hook", "to": "notify-success" }, + { "id": "e7", "from": "deploy-targets", "to": "rollback-handler", "condition": "status === 'failed'" }, + { "id": "e8", "from": "health-check", "to": "rollback-handler", "condition": "status === 'failed'" }, + { "id": "e9", "from": "rollback-handler", "to": "notify-failure" } + ] +} +``` + +## 7.2 Workflow Execution State Machine + +### 7.2.1 Workflow Run States + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ WORKFLOW RUN STATE MACHINE │ +│ │ +│ ┌──────────┐ │ +│ │ CREATED │ │ +│ └────┬─────┘ │ +│ │ start() │ +│ ▼ │ +│ ┌─────────────────────────────┐ │ +│ │ │ │ +│ pause() ┌──┴──────────┐ │ │ +│ ┌────────►│ PAUSED │◄─────────┐ │ │ +│ │ └──────┬──────┘ │ │ │ +│ │ │ resume() │ │ │ +│ │ ▼ │ │ │ +│ │ ┌─────────────┐ │ │ │ +│ └─────────│ RUNNING │──────────┘ │ │ +│ └──────┬──────┘ (waiting for │ │ +│ │ approval) │ │ +│ ┌────────────┼────────────┐ │ │ +│ │ │ │ │ │ +│ ▼ ▼ ▼ │ │ +│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ │ +│ │ SUCCEEDED │ │ FAILED │ │ CANCELLED │ │ │ +│ └───────────┘ └───────────┘ └───────────┘ │ │ +│ │ │ +│ │ │ +│ Transitions: │ │ +│ - CREATED → RUNNING: start() │ │ +│ - RUNNING → PAUSED: pause(), waiting approval│ │ +│ - PAUSED → RUNNING: resume(), approval granted │ +│ - RUNNING → SUCCEEDED: all nodes complete │ │ +│ - RUNNING → FAILED: node fails with fail action │ +│ - RUNNING → CANCELLED: cancel() │ │ +│ - PAUSED → CANCELLED: cancel() │ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### 7.2.2 Step Run States + +``` 
+┌─────────────────────────────────────────────────────────────────────────────┐ +│ STEP RUN STATE MACHINE │ +│ │ +│ ┌──────────┐ │ +│ │ PENDING │ ◄──── Initial state; dependencies not met │ +│ └────┬─────┘ │ +│ │ dependencies met + condition true │ +│ ▼ │ +│ ┌──────────┐ │ +│ │ RUNNING │ ◄──── Step is executing │ +│ └────┬─────┘ │ +│ │ │ +│ ┌────┴────────────────┬─────────────────┐ │ +│ │ │ │ │ +│ ▼ ▼ ▼ │ +│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ +│ │ SUCCEEDED │ │ FAILED │ │ SKIPPED │ │ +│ └───────────┘ └─────┬─────┘ └───────────┘ │ +│ │ ▲ │ +│ │ │ condition false │ +│ ▼ │ │ +│ ┌───────────┐ │ │ +│ │ RETRYING │──────┘ (max retries exceeded) │ +│ └─────┬─────┘ │ +│ │ │ +│ │ retry attempt │ +│ └──────────────────┐ │ +│ │ │ +│ ▼ │ +│ ┌──────────┐ │ +│ │ RUNNING │ (retry) │ +│ └──────────┘ │ +│ │ +│ Additional transitions: │ +│ - Any state → CANCELLED: workflow cancelled │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### 7.2.3 Execution Algorithm + +```python +class WorkflowEngine: + def execute(self, workflow_run: WorkflowRun) -> None: + """Main workflow execution loop.""" + + # Initialize + workflow_run.status = "running" + workflow_run.started_at = now() + self.save(workflow_run) + + try: + while not self.is_terminal(workflow_run): + # Handle pause state + if workflow_run.status == "paused": + self.wait_for_resume(workflow_run) + continue + + # Get nodes ready for execution + ready_nodes = self.get_ready_nodes(workflow_run) + + if not ready_nodes: + # Check if we're waiting on approvals + if self.has_pending_approvals(workflow_run): + workflow_run.status = "paused" + self.save(workflow_run) + continue + + # Check if all nodes are complete + if self.all_nodes_complete(workflow_run): + break + + # Deadlock detection + raise WorkflowDeadlockError(workflow_run.id) + + # Execute ready nodes in parallel + futures = [] + for node in ready_nodes: + future = self.executor.submit( + self.execute_node, + workflow_run, + node + ) + futures.append((node, future)) + + # Wait for at least one to complete + completed = self.wait_any(futures) + + for node, result in completed: + step_run = self.get_step_run(workflow_run, node.id) + + if result.success: + step_run.status = "succeeded" + step_run.outputs = result.outputs + self.propagate_outputs(workflow_run, node, result.outputs) + else: + step_run.status = "failed" + step_run.error_message = result.error + + # Handle failure action + if node.on_failure == "fail": + workflow_run.status = "failed" + workflow_run.error_message = f"Step {node.name} failed: {result.error}" + self.cancel_pending_steps(workflow_run) + return + elif node.on_failure == "rollback": + self.trigger_rollback(workflow_run, node) + elif node.on_failure.startswith("goto:"): + target = node.on_failure.split(":")[1] + self.add_ready_node(workflow_run, target) + # "continue" just continues to next nodes + + step_run.completed_at = now() + self.save(step_run) + + # Workflow completed successfully + workflow_run.status = "succeeded" + workflow_run.completed_at = now() + self.save(workflow_run) + + except WorkflowCancelledError: + workflow_run.status = "cancelled" + workflow_run.completed_at = now() + self.save(workflow_run) + except Exception as e: + workflow_run.status = "failed" + workflow_run.error_message = str(e) + workflow_run.completed_at = now() + self.save(workflow_run) + + def get_ready_nodes(self, workflow_run: WorkflowRun) -> List[StepNode]: + """Find nodes whose dependencies are all satisfied.""" + ready = [] + + for node in 
workflow_run.template.nodes: + step_run = self.get_step_run(workflow_run, node.id) + + # Skip if already started + if step_run.status != "pending": + continue + + # Check all incoming edges + incoming = self.get_incoming_edges(workflow_run.template, node.id) + all_satisfied = True + + for edge in incoming: + source_step = self.get_step_run(workflow_run, edge.from_node) + + # Check edge condition + if edge.condition: + if not self.evaluate_condition(edge.condition, source_step): + continue + + # Source must be succeeded (or skipped if condition false) + if source_step.status not in ["succeeded", "skipped"]: + all_satisfied = False + break + + if all_satisfied: + # Check node condition + if node.condition: + if not self.evaluate_condition(node.condition, workflow_run.context): + step_run.status = "skipped" + self.save(step_run) + continue + + ready.append(node) + + return ready + + def execute_node(self, workflow_run: WorkflowRun, node: StepNode) -> StepResult: + """Execute a single step node.""" + step_run = self.get_step_run(workflow_run, node.id) + step_run.status = "running" + step_run.started_at = now() + self.save(step_run) + + # Resolve inputs + resolved_inputs = self.resolve_inputs(node.inputs, workflow_run.context) + step_run.inputs = resolved_inputs + + # Get step executor + step_type = self.step_registry.get(node.type) + executor = self.get_executor(step_type) + + # Execute with retry + attempt = 0 + last_error = None + + while attempt <= node.retry_policy.max_retries: + attempt += 1 + step_run.attempt_number = attempt + + try: + result = executor.execute( + config=node.config, + inputs=resolved_inputs, + context=self.build_step_context(workflow_run), + timeout=node.timeout + ) + + # Stream logs + async for event in result: + if event.type == "log": + step_run.logs += event.line + "\n" + self.broadcast_log(workflow_run.id, node.id, event.line) + elif event.type == "output": + step_run.outputs[event.name] = event.value + elif event.type == "artifact": + step_run.artifacts.append(event.artifact) + elif event.type == "result": + if event.success: + return StepResult(success=True, outputs=step_run.outputs) + else: + raise StepExecutionError(event.message) + + except Exception as e: + last_error = e + + # Check if retryable + if attempt <= node.retry_policy.max_retries: + if step_type.safe_to_retry or type(e).__name__ in step_type.retryable_errors: + step_run.status = "retrying" + self.save(step_run) + + # Backoff + delay = self.calculate_backoff(node.retry_policy, attempt) + time.sleep(delay) + continue + + return StepResult(success=False, error=str(last_error)) + + return StepResult(success=False, error=str(last_error)) +``` + +## 7.3 Promotion State Machine + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ PROMOTION STATE MACHINE │ +│ │ +│ ┌───────────────┐ │ +│ │ REQUESTED │ ◄──── User requests promotion │ +│ └───────┬───────┘ │ +│ │ │ +│ ▼ │ +│ ┌───────────────┐ ┌───────────────┐ │ +│ │ PENDING │─────►│ REJECTED │ ◄──── Approver rejects │ +│ │ APPROVAL │ └───────────────┘ │ +│ └───────┬───────┘ │ +│ │ approval received │ +│ ▼ │ +│ ┌───────────────┐ ┌───────────────┐ │ +│ │ PENDING │─────►│ REJECTED │ ◄──── Gate fails │ +│ │ GATE │ └───────────────┘ │ +│ └───────┬───────┘ │ +│ │ all gates pass │ +│ ▼ │ +│ ┌───────────────┐ │ +│ │ APPROVED │ ◄──── Ready for deployment │ +│ └───────┬───────┘ │ +│ │ workflow starts │ +│ ▼ │ +│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │ +│ │ DEPLOYING │─────►│ FAILED │─────►│ ROLLED_BACK │ │ +│ 
└───────┬───────┘ └───────────────┘ └───────────────┘ │ +│ │ │ +│ │ deployment complete │ +│ ▼ │ +│ ┌───────────────┐ │ +│ │ DEPLOYED │ ◄──── Success! │ +│ └───────────────┘ │ +│ │ +│ Additional transitions: │ +│ - Any non-terminal → CANCELLED: user cancels │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +# 8. Security Architecture + +## 8.1 Security Principles + +| Principle | Implementation | +|-----------|----------------| +| **Defense in depth** | Multiple layers: network, auth, authz, audit | +| **Least privilege** | Role-based access; minimal permissions | +| **Zero trust** | All requests authenticated; mTLS for agents | +| **Secrets hygiene** | Secrets in vault; never in DB; ephemeral injection | +| **Audit everything** | All mutations logged; evidence trail | +| **Immutable evidence** | Evidence packets append-only; cryptographically signed | + +## 8.2 Authentication & Authorization + +### 8.2.1 Authentication Flow + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ AUTHENTICATION ARCHITECTURE │ +│ │ +│ Human Users Service/Agent │ +│ ┌──────────┐ ┌──────────┐ │ +│ │ Browser │ │ Agent │ │ +│ └────┬─────┘ └────┬─────┘ │ +│ │ │ │ +│ │ OAuth 2.0 │ mTLS + JWT │ +│ │ Authorization Code │ │ +│ ▼ ▼ │ +│ ┌──────────────────────────────────────────────────────────────────┐ │ +│ │ AUTHORITY MODULE │ │ +│ │ │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │ OAuth 2.0 │ │ mTLS │ │ API Key │ │ │ +│ │ │ Provider │ │ Validator │ │ Validator │ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────────────────────┐ │ │ +│ │ │ TOKEN ISSUER │ │ │ +│ │ │ - Short-lived JWT (15 min) │ │ │ +│ │ │ - Contains: user_id, tenant_id, roles, permissions │ │ │ +│ │ │ - Signed with RS256 │ │ │ +│ │ └─────────────────────────────────────────────────────────────┘ │ │ +│ └──────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌──────────────────────────────────────────────────────────────────┐ │ +│ │ API GATEWAY │ │ +│ │ │ │ +│ │ - Validate JWT signature │ │ +│ │ - Check token expiration │ │ +│ │ - Extract tenant context │ │ +│ │ - Enforce rate limits │ │ +│ └──────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### 8.2.2 Authorization Model + +```typescript +// Permission structure +interface Permission { + resource: ResourceType; + action: ActionType; + scope?: ScopeType; + conditions?: Condition[]; +} + +type ResourceType = + | "environment" + | "release" + | "promotion" + | "target" + | "agent" + | "workflow" + | "plugin" + | "integration" + | "evidence"; + +type ActionType = + | "create" + | "read" + | "update" + | "delete" + | "execute" + | "approve" + | "deploy" + | "rollback"; + +type ScopeType = + | "*" // All resources + | { environmentId: UUID } // Specific environment + | { labels: Record }; // Label-based + +// Role definitions +const ROLES = { + "admin": { + permissions: [ + { resource: "*", action: "*", scope: "*" } + ] + }, + "release-manager": { + permissions: [ + { resource: "release", action: "*", scope: "*" }, + { resource: "promotion", action: "*", scope: "*" }, + { resource: "environment", action: "read", scope: "*" }, + { resource: "target", action: "read", scope: "*" } + ] + }, + "deployer": { + permissions: [ + { resource: "release", action: "read", scope: "*" }, + { resource: 
"promotion", action: ["create", "read"], scope: "*" }, + { resource: "target", action: "read", scope: "*" } + ] + }, + "approver": { + permissions: [ + { resource: "promotion", action: ["read", "approve"], scope: "*" } + ] + }, + "viewer": { + permissions: [ + { resource: "*", action: "read", scope: "*" } + ] + } +}; + +// Environment-scoped roles +const ENV_SCOPED_ROLES = { + "prod-deployer": { + permissions: [ + { resource: "promotion", action: "*", scope: { environmentId: "prod" } } + ] + } +}; +``` + +### 8.2.3 Policy Enforcement Points + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ POLICY ENFORCEMENT POINTS │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ API LAYER (PEP 1) │ │ +│ │ - Authenticate request │ │ +│ │ - Check resource-level permissions │ │ +│ │ - Enforce tenant isolation │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ SERVICE LAYER (PEP 2) │ │ +│ │ - Check business-level permissions │ │ +│ │ - Validate separation of duties │ │ +│ │ - Enforce approval policies │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ DECISION ENGINE (PEP 3) │ │ +│ │ - Evaluate security gates │ │ +│ │ - Evaluate custom OPA policies │ │ +│ │ - Produce signed decision records │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ DATA LAYER (PEP 4) │ │ +│ │ - Row-level security (tenant_id) │ │ +│ │ - Append-only enforcement (evidence) │ │ +│ │ - Encryption at rest │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## 8.3 Agent Security Model + +### 8.3.1 Agent Registration Flow + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ AGENT REGISTRATION FLOW │ +│ │ +│ 1. Admin generates registration token (one-time use) │ +│ POST /api/v1/admin/agent-tokens │ +│ → { token: "reg_xxx", expiresAt: "..." } │ +│ │ +│ 2. Agent starts with registration token │ +│ ./stella-agent --register --token=reg_xxx │ +│ │ +│ 3. Agent requests mTLS certificate │ +│ POST /api/v1/agents/register │ +│ Headers: X-Registration-Token: reg_xxx │ +│ Body: { name, version, capabilities, csr } │ +│ → { agentId, certificate, caCertificate } │ +│ │ +│ 4. Agent establishes mTLS connection │ +│ Uses issued certificate for all subsequent requests │ +│ │ +│ 5. Agent requests short-lived JWT for task execution │ +│ POST /api/v1/agents/token (over mTLS) │ +│ → { token, expiresIn: 3600 } // 1 hour │ +│ │ +│ 6. 
Agent refreshes token before expiration │ +│ Token refresh only over mTLS connection │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### 8.3.2 Agent Communication Security + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ AGENT COMMUNICATION SECURITY │ +│ │ +│ ┌──────────────┐ ┌──────────────┐ │ +│ │ AGENT │ │ STELLA CORE │ │ +│ └──────┬───────┘ └──────┬───────┘ │ +│ │ │ │ +│ │ mTLS (mutual TLS) │ │ +│ │ - Agent cert signed by Stella CA │ │ +│ │ - Server cert verified by Agent │ │ +│ │ - TLS 1.3 only │ │ +│ │ - Perfect forward secrecy │ │ +│ │◄───────────────────────────────────────►│ │ +│ │ │ │ +│ │ Encrypted payload │ │ +│ │ - Task payloads encrypted with │ │ +│ │ agent-specific key │ │ +│ │ - Logs encrypted in transit │ │ +│ │◄───────────────────────────────────────►│ │ +│ │ │ │ +│ │ Heartbeat + capability refresh │ │ +│ │ - Every 30 seconds │ │ +│ │ - Signed with agent key │ │ +│ │─────────────────────────────────────────►│ │ +│ │ │ │ +│ │ Task assignment │ │ +│ │ - Contains short-lived credentials │ │ +│ │ - Scoped to specific target │ │ +│ │ - Expires after task timeout │ │ +│ │◄─────────────────────────────────────────│ │ +│ │ │ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### 8.3.3 Secrets Management + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ SECRETS FLOW (NEVER STORED IN DB) │ +│ │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ VAULT │ │ STELLA CORE │ │ AGENT │ │ +│ │ (Source) │ │ (Broker) │ │ (Consumer) │ │ +│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ +│ │ │ │ │ +│ │ │ Task requires secret │ │ +│ │ │ │ │ +│ │ Fetch with service │ │ │ +│ │ account token │ │ │ +│ │◄─────────────────────── │ │ +│ │ │ │ │ +│ │ Return secret │ │ │ +│ │ (wrapped, short TTL) │ │ │ +│ │───────────────────────► │ │ +│ │ │ │ │ +│ │ │ Embed in task payload │ │ +│ │ │ (encrypted) │ │ +│ │ │───────────────────────► │ +│ │ │ │ │ +│ │ │ │ Decrypt │ +│ │ │ │ Use for task │ +│ │ │ │ Discard │ +│ │ │ │ │ +│ │ │ │ │ +│ Rules: │ +│ - Secrets NEVER stored in Stella database │ +│ - Only Vault references stored │ +│ - Secrets fetched at execution time only │ +│ - Secrets not logged (masked in logs) │ +│ - Secrets not persisted in agent memory beyond task scope │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## 8.4 Threat Model & Mitigations + +| Threat | Attack Vector | Mitigation | +|--------|---------------|------------| +| **Credential theft** | Database breach | Secrets never in DB; only vault refs | +| **Token replay** | Stolen JWT | Short-lived tokens (15 min); refresh tokens rotated | +| **Agent impersonation** | Fake agent | mTLS with CA-signed certs; registration token one-time | +| **Digest tampering** | Modified image | Digest verification at pull time; mismatch = failure | +| **Evidence tampering** | Modified audit records | Append-only table; cryptographic signing | +| **Privilege escalation** | Compromised account | Role-based access; SoD enforcement; audit logs | +| **Supply chain attack** | Malicious plugin | Plugin sandbox; capability declarations; review process | +| **Lateral movement** | Compromised target | Short-lived task credentials; scoped permissions | +| **Data exfiltration** | Log/artifact theft | Encryption at rest; network segmentation | +| **Denial of service** | Resource exhaustion | Rate limiting; resource quotas; circuit breakers | + 
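+The "Evidence tampering" mitigation above is backed by the hash chain that every audit event carries (`previousEventHash` / `eventHash`, defined in §8.5.1). The sketch below shows, under stated assumptions, how such a chain could be built and verified; the `canonicalize` helper, the exact field set hashed, and the `GENESIS` sentinel are illustrative choices rather than part of this specification.
+
+```typescript
+import { createHash } from "crypto";
+
+// Minimal shape of a chained audit event (subset of the AuditEvent in §8.5.1).
+interface ChainedAuditEvent {
+  id: string;
+  timestamp: string;
+  action: string;
+  resourceId: string;
+  previousEventHash: string; // hash of the preceding event ("GENESIS" for the first)
+  eventHash: string;         // hash over this event's fields plus previousEventHash
+}
+
+// Deterministic serialization: stable key order so the same event always hashes the same.
+function canonicalize(payload: Omit<ChainedAuditEvent, "eventHash">): string {
+  return JSON.stringify(payload, Object.keys(payload).sort());
+}
+
+function computeEventHash(payload: Omit<ChainedAuditEvent, "eventHash">): string {
+  return "sha256:" + createHash("sha256").update(canonicalize(payload)).digest("hex");
+}
+
+// Append a new event, linking it to the hash of the previous one.
+function appendEvent(
+  chain: ChainedAuditEvent[],
+  partial: Omit<ChainedAuditEvent, "previousEventHash" | "eventHash">
+): ChainedAuditEvent {
+  const previousEventHash = chain.length > 0 ? chain[chain.length - 1].eventHash : "GENESIS";
+  const payload = { ...partial, previousEventHash };
+  const event: ChainedAuditEvent = { ...payload, eventHash: computeEventHash(payload) };
+  chain.push(event);
+  return event;
+}
+
+// Walk the chain; any edited, reordered, or deleted event breaks every subsequent link.
+function verifyChain(chain: ChainedAuditEvent[]): { valid: boolean; brokenAt?: number } {
+  let expectedPrevious = "GENESIS";
+  for (let i = 0; i < chain.length; i++) {
+    const { eventHash, ...payload } = chain[i];
+    if (payload.previousEventHash !== expectedPrevious) return { valid: false, brokenAt: i };
+    if (computeEventHash(payload) !== eventHash) return { valid: false, brokenAt: i };
+    expectedPrevious = eventHash;
+  }
+  return { valid: true };
+}
+```
+
+Because each `eventHash` covers the previous event's hash, editing or deleting any stored record invalidates every later link, which is what makes the append-only audit table tamper-evident.
+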
+## 8.5 Audit Trail + +### 8.5.1 Audit Event Structure + +```typescript +interface AuditEvent { + id: UUID; + timestamp: DateTime; + tenantId: UUID; + + // Actor + actorType: "user" | "agent" | "system" | "plugin"; + actorId: UUID; + actorName: string; + actorIp?: string; + + // Action + action: string; // "promotion.approved", "deployment.started" + resource: string; // "promotion" + resourceId: UUID; + + // Context + environmentId?: UUID; + releaseId?: UUID; + promotionId?: UUID; + + // Details + before?: object; // State before (for updates) + after?: object; // State after + metadata?: object; // Additional context + + // Integrity + previousEventHash: string; // Hash chain for tamper detection + eventHash: string; +} +``` + +### 8.5.2 Audited Operations + +| Category | Operations | +|----------|------------| +| **Authentication** | Login, logout, token refresh, failed attempts | +| **Authorization** | Permission denied events | +| **Environments** | Create, update, delete, freeze window changes | +| **Releases** | Create, deprecate, archive | +| **Promotions** | Request, approve, reject, cancel | +| **Deployments** | Start, complete, fail, rollback | +| **Targets** | Register, update, delete, health changes | +| **Agents** | Register, heartbeat gaps, capability changes | +| **Integrations** | Create, update, delete, test | +| **Plugins** | Enable, disable, config changes | +| **Evidence** | Create (never update/delete) | + +--- + +# 9. Integration Architecture + +## 9.1 Integration Types + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ INTEGRATION TYPES │ +│ │ +│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ +│ │ SCM │ │ CI │ │ REGISTRY │ │ +│ │ │ │ │ │ │ │ +│ │ - GitHub │ │ - GitHub Actions│ │ - Harbor │ │ +│ │ - GitLab │ │ - GitLab CI │ │ - Docker Hub │ │ +│ │ - Bitbucket │ │ - Jenkins │ │ - ECR │ │ +│ │ - Azure Repos │ │ - TeamCity │ │ - GCR │ │ +│ │ │ │ - Webhook │ │ - ACR │ │ +│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │ +│ │ +│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ +│ │ VAULT │ │ TARGET │ │ ROUTER │ │ +│ │ │ │ │ │ │ │ +│ │ - HashiCorp │ │ - Docker Host │ │ - Nginx │ │ +│ │ - AWS Secrets │ │ - Compose Host │ │ - HAProxy │ │ +│ │ - Azure KeyVault│ │ - SSH Remote │ │ - Traefik │ │ +│ │ - Local Encrypt │ │ - WinRM Remote │ │ - AWS ALB │ │ +│ │ │ │ - ECS Service │ │ │ │ +│ │ │ │ - Nomad Job │ │ │ │ +│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## 9.2 Integration Connector Interface + +```typescript +// Base connector interface (all types implement this) +interface BaseConnector { + // Lifecycle + initialize(config: ConnectorConfig, secrets: SecretHandle[]): Promise; + dispose(): Promise; + + // Health + testConnection(): Promise; + getHealth(): Promise; + + // Discovery + discoverResources(resourceType: string): Promise; +} + +// SCM Connector +interface SCMConnector extends BaseConnector { + // Metadata + getCommit(ref: string): Promise; + getBranch(name: string): Promise; + listBranches(): Promise; + + // Status + createCommitStatus(commit: string, status: CommitStatus): Promise; + + // Webhooks + registerWebhook(config: WebhookConfig): Promise; + unregisterWebhook(id: string): Promise; +} + +// CI Connector +interface CIConnector extends BaseConnector { + // Pipelines + listPipelines(): Promise; + getPipelineRun(runId: string): Promise; + triggerPipeline(pipelineId: 
string, params: object): Promise; + + // Artifacts + listArtifacts(runId: string): Promise; + downloadArtifact(runId: string, artifactId: string): Promise; +} + +// Registry Connector +interface RegistryConnector extends BaseConnector { + // Images + listRepositories(): Promise; + listTags(repository: string): Promise; + resolveDigest(imageRef: string): Promise; + getManifest(imageRef: string): Promise; + + // Verification + verifyDigest(imageRef: string, expectedDigest: string): Promise; + verifySignature(imageRef: string): Promise; +} + +// Vault Connector +interface VaultConnector extends BaseConnector { + // Secrets + getSecret(path: string): Promise; + listSecrets(path: string): Promise; + + // Dynamic secrets (if supported) + requestCredential(role: string, ttl: number): Promise; + revokeCredential(leaseId: string): Promise; +} + +// Target Connector (implemented by agents) +interface TargetConnector extends BaseConnector { + // Deployment + deploy(request: DeployRequest): Promise; + inspect(): Promise; + rollback(request: RollbackRequest): Promise; + + // Files + uploadFile(localPath: string, remotePath: string): Promise; + downloadFile(remotePath: string, localPath: string): Promise; + + // Commands + executeCommand(command: string, timeout: number): Promise; + + // Sticker + writeSticker(sticker: VersionSticker): Promise; + readSticker(): Promise; + + // Health + checkHealth(config: HealthCheckConfig): Promise; +} + +// Router Connector +interface RouterConnector extends BaseConnector { + // Traffic management + configureRoute(config: RouteConfig): Promise; + getTrafficDistribution(): Promise; + shiftTraffic(from: string, to: string, percentage: number): Promise; + + // Configuration + reloadConfig(): Promise; + validateConfig(config: string): Promise; +} +``` + +## 9.3 Webhook Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ WEBHOOK PROCESSING ARCHITECTURE │ +│ │ +│ External Service (GitHub, GitLab, CI) │ +│ ┌──────────────────────────────┐ │ +│ │ Webhook Event │ │ +│ └──────────────┬───────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌──────────────────────────────────────────────────────────────────────┐ │ +│ │ WEBHOOK GATEWAY │ │ +│ │ │ │ +│ │ POST /webhooks/{integrationId}/{secret} │ │ +│ │ │ │ +│ │ 1. Verify signature (HMAC, etc.) │ │ +│ │ 2. Validate payload schema │ │ +│ │ 3. Lookup integration by ID │ │ +│ │ 4. Enqueue event for processing │ │ +│ └──────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌──────────────────────────────────────────────────────────────────────┐ │ +│ │ EVENT PROCESSOR │ │ +│ │ │ │ +│ │ Event Types: │ │ +│ │ - push: Trigger version sync │ │ +│ │ - tag: Trigger version resolution │ │ +│ │ - pr_merged: Trigger release creation │ │ +│ │ - pipeline_complete: Trigger promotion │ │ +│ │ - image_pushed: Trigger scan │ │ +│ │ │ │ +│ │ Processing: │ │ +│ │ 1. Parse event type │ │ +│ │ 2. Extract relevant data │ │ +│ │ 3. Trigger appropriate action │ │ +│ │ 4. 
Log audit event │ │ +│ └──────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## 9.4 CI/CD Integration Patterns + +### 9.4.1 Pattern: CI Triggers Scan + Release + +```yaml +# GitHub Actions example +name: Build and Create Release + +on: + push: + branches: [main] + +jobs: + build: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Build and Push Image + run: | + docker build -t $REGISTRY/$IMAGE:${{ github.sha }} . + docker push $REGISTRY/$IMAGE:${{ github.sha }} + + - name: Trigger Stella Scan + uses: stella-ops/scan-action@v1 + with: + stella-url: ${{ secrets.STELLA_URL }} + stella-token: ${{ secrets.STELLA_TOKEN }} + image: $REGISTRY/$IMAGE:${{ github.sha }} + + - name: Create Release + uses: stella-ops/release-action@v1 + with: + stella-url: ${{ secrets.STELLA_URL }} + stella-token: ${{ secrets.STELLA_TOKEN }} + release-name: myapp-${{ github.sha }} + components: + - name: api + image: $REGISTRY/$IMAGE + tag: ${{ github.sha }} +``` + +### 9.4.2 Pattern: Stella Triggers CI + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ STELLA → CI TRIGGER PATTERN │ +│ │ +│ User promotes release in Stella │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ STELLA WORKFLOW │ │ +│ │ │ │ +│ │ Step: "trigger-ci-pipeline" │ │ +│ │ Config: │ │ +│ │ integrationId: ci-jenkins-prod │ │ +│ │ pipelineId: deploy-to-prod │ │ +│ │ params: │ │ +│ │ RELEASE_ID: ${release.id} │ │ +│ │ IMAGE_DIGEST: ${release.components[0].digest} │ │ +│ │ ENVIRONMENT: ${environment.name} │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ JENKINS │ │ +│ │ │ │ +│ │ 1. Receive trigger from Stella │ │ +│ │ 2. Execute custom deployment logic │ │ +│ │ 3. Report status back to Stella via webhook │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ STELLA WEBHOOK │ │ +│ │ │ │ +│ │ Receive pipeline completion event │ │ +│ │ Update workflow step status │ │ +│ │ Continue or fail workflow │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + + +--- + +# 10. 
Deployment Execution Model + +## 10.1 Deployment Architecture Overview + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ DEPLOYMENT EXECUTION ARCHITECTURE │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ PROMOTION REQUEST │ │ +│ │ { releaseId, targetEnvironmentId, requestedBy } │ │ +│ └──────────────────────────────┬──────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ DECISION ENGINE │ │ +│ │ - Security gate evaluation │ │ +│ │ - Approval verification │ │ +│ │ - Freeze window check │ │ +│ │ - Custom policy evaluation │ │ +│ └──────────────────────────────┬──────────────────────────────────────┘ │ +│ │ decision: ALLOW │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ ARTIFACT GENERATOR │ │ +│ │ - Generate compose.stella.lock.yml │ │ +│ │ - Compile C# scripts │ │ +│ │ - Generate version sticker │ │ +│ │ - Package deployment bundle │ │ +│ └──────────────────────────────┬──────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ DEPLOY ORCHESTRATOR │ │ +│ │ - Select deployment strategy │ │ +│ │ - Create deployment job │ │ +│ │ - Assign tasks to targets │ │ +│ │ - Coordinate execution │ │ +│ └──────────────────────────────┬──────────────────────────────────────┘ │ +│ │ │ +│ ┌─────────────────────┼─────────────────────┐ │ +│ │ │ │ │ +│ ▼ ▼ ▼ │ +│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ +│ │ TARGET EXEC 1 │ │ TARGET EXEC 2 │ │ TARGET EXEC N │ │ +│ │ │ │ │ │ │ │ +│ │ Agent/Remote │ │ Agent/Remote │ │ Agent/Remote │ │ +│ └────────┬────────┘ └────────┬────────┘ └────────┬────────┘ │ +│ │ │ │ │ +│ ▼ ▼ ▼ │ +│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ +│ │ TARGET 1 │ │ TARGET 2 │ │ TARGET N │ │ +│ │ Docker Host │ │ Compose Host │ │ ECS Service │ │ +│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## 10.2 Deployment Strategies + +### 10.2.1 Strategy Definitions + +```typescript +interface DeploymentStrategy { + type: StrategyType; + config: StrategyConfig; +} + +type StrategyType = + | "all-at-once" // Deploy to all targets simultaneously + | "rolling" // Deploy to targets sequentially with health checks + | "canary" // Deploy to subset, verify, then proceed + | "blue-green" // Deploy to B, switch traffic, retire A + | "custom"; // Custom workflow-driven + +interface AllAtOnceConfig { + parallelism: number; // Max concurrent deployments (0 = unlimited) + continueOnFailure: boolean; // Continue if some targets fail + failureThreshold: number; // Max failures before abort (percentage or count) +} + +interface RollingConfig { + batchSize: number; // Targets per batch (count or percentage) + batchDelay: number; // Seconds between batches + healthCheckBetweenBatches: boolean; + rollbackOnFailure: boolean; + maxUnavailable: number; // Max targets unavailable at once +} + +interface CanaryConfig { + canaryTargets: number; // Number or percentage of targets for canary + canaryDuration: number; // Seconds to run canary before proceeding + healthThreshold: number; // Required health percentage + autoPromote: boolean; // Auto-proceed if healthy + requireApproval: boolean; // Require manual approval after canary +} + +interface BlueGreenConfig { + targetGroupA: UUID; // Current (blue) 
target group + targetGroupB: UUID; // New (green) target group + trafficShiftType: "instant" | "gradual"; + gradualShiftSteps?: number[]; // e.g., [10, 25, 50, 100] + rollbackOnHealthFailure: boolean; +} +``` + +### 10.2.2 Rolling Deployment Algorithm + +```python +class RollingDeploymentExecutor: + def execute(self, job: DeploymentJob, config: RollingConfig) -> DeploymentResult: + targets = self.get_targets(job.environment_id) + batches = self.create_batches(targets, config.batch_size) + + deployed_targets = [] + failed_targets = [] + + for batch_index, batch in enumerate(batches): + self.log(f"Starting batch {batch_index + 1} of {len(batches)}") + + # Deploy batch in parallel + batch_results = self.deploy_batch(job, batch) + + for target, result in batch_results: + if result.success: + deployed_targets.append(target) + + # Write version sticker + self.write_sticker(target, job.release) + else: + failed_targets.append(target) + + if config.rollback_on_failure: + # Rollback all deployed targets + self.rollback_targets(deployed_targets, job.previous_release) + return DeploymentResult( + success=False, + error=f"Batch {batch_index + 1} failed, rolled back", + deployed=deployed_targets, + failed=failed_targets, + rolled_back=deployed_targets + ) + + # Health check between batches + if config.health_check_between_batches and batch_index < len(batches) - 1: + health_result = self.check_batch_health(deployed_targets[-len(batch):]) + + if not health_result.healthy: + if config.rollback_on_failure: + self.rollback_targets(deployed_targets, job.previous_release) + return DeploymentResult( + success=False, + error=f"Health check failed after batch {batch_index + 1}", + deployed=deployed_targets, + failed=failed_targets, + rolled_back=deployed_targets + ) + + # Delay between batches + if config.batch_delay > 0 and batch_index < len(batches) - 1: + self.log(f"Waiting {config.batch_delay}s before next batch") + time.sleep(config.batch_delay) + + return DeploymentResult( + success=len(failed_targets) == 0, + deployed=deployed_targets, + failed=failed_targets + ) + + def deploy_batch(self, job: DeploymentJob, batch: List[Target]) -> List[Tuple[Target, TaskResult]]: + futures = [] + + for target in batch: + task = DeploymentTask( + job_id=job.id, + target_id=target.id, + digest=job.release.get_digest_for_target(target), + config=self.prepare_task_config(job, target) + ) + + executor = self.get_executor(target) + future = self.thread_pool.submit(executor.deploy, task) + futures.append((target, future)) + + results = [] + for target, future in futures: + try: + result = future.result(timeout=job.deployment_timeout) + results.append((target, result)) + except Exception as e: + results.append((target, TaskResult(success=False, error=str(e)))) + + return results +``` + +## 10.3 Agent-Based Deployment + +### 10.3.1 Agent Task Protocol + +```typescript +// Task assignment (Core → Agent) +interface AgentTask { + id: UUID; + type: TaskType; + targetId: UUID; + payload: TaskPayload; + credentials: EncryptedCredentials; + timeout: number; + priority: TaskPriority; + idempotencyKey: string; + assignedAt: DateTime; + expiresAt: DateTime; +} + +type TaskType = + | "deploy" + | "rollback" + | "health-check" + | "inspect" + | "execute-command" + | "upload-files" + | "write-sticker" + | "read-sticker"; + +interface DeployTaskPayload { + image: string; + digest: string; + config: DeployConfig; + artifacts: ArtifactReference[]; + previousDigest?: string; + hooks: { + preDeploy?: HookConfig; + postDeploy?: HookConfig; + }; 
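+  // Note: registry/target credentials are not fields of this payload; they are delivered
+  // separately via AgentTask.credentials as encrypted, short-lived material (see §8.3.3).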
+} + +// Task result (Agent → Core) +interface TaskResult { + taskId: UUID; + success: boolean; + startedAt: DateTime; + completedAt: DateTime; + + // Success details + outputs?: Record; + artifacts?: ArtifactReference[]; + + // Failure details + error?: string; + errorType?: string; + retriable?: boolean; + + // Logs + logs: string; + + // Metrics + metrics: { + pullDurationMs?: number; + deployDurationMs?: number; + healthCheckDurationMs?: number; + }; +} +``` + +### 10.3.2 Docker Agent Implementation + +```typescript +class DockerAgent implements TargetExecutor { + private docker: Docker; + + async deploy(task: DeployTaskPayload): Promise { + const { image, digest, config, previousDigest } = task; + const containerName = config.containerName; + + // 1. Pull image and verify digest + this.log(`Pulling image ${image}@${digest}`); + await this.docker.pull(image, { digest }); + + const pulledDigest = await this.getImageDigest(image); + if (pulledDigest !== digest) { + throw new DigestMismatchError( + `Expected digest ${digest}, got ${pulledDigest}. Possible tampering detected.` + ); + } + + // 2. Run pre-deploy hook + if (task.hooks?.preDeploy) { + await this.runHook(task.hooks.preDeploy, "pre-deploy"); + } + + // 3. Stop and rename existing container + const existingContainer = await this.findContainer(containerName); + if (existingContainer) { + this.log(`Stopping existing container ${containerName}`); + await existingContainer.stop({ t: 10 }); + await existingContainer.rename(`${containerName}-previous-${Date.now()}`); + } + + // 4. Create new container + this.log(`Creating container ${containerName} from ${image}@${digest}`); + const container = await this.docker.createContainer({ + name: containerName, + Image: `${image}@${digest}`, // Always use digest, not tag + Env: this.buildEnvVars(config.environment), + HostConfig: { + PortBindings: this.buildPortBindings(config.ports), + Binds: this.buildBindMounts(config.volumes), + RestartPolicy: { Name: config.restartPolicy || "unless-stopped" }, + Memory: config.memoryLimit, + CpuQuota: config.cpuLimit, + }, + Labels: { + "stella.release.id": config.releaseId, + "stella.release.name": config.releaseName, + "stella.digest": digest, + "stella.deployed.at": new Date().toISOString(), + }, + }); + + // 5. Start container + this.log(`Starting container ${containerName}`); + await container.start(); + + // 6. Wait for container to be healthy (if health check configured) + if (config.healthCheck) { + this.log(`Waiting for container health check`); + const healthy = await this.waitForHealthy(container, config.healthCheck.timeout); + if (!healthy) { + // Rollback to previous container + await this.rollbackContainer(containerName, existingContainer); + throw new HealthCheckFailedError(`Container ${containerName} failed health check`); + } + } + + // 7. Run post-deploy hook + if (task.hooks?.postDeploy) { + await this.runHook(task.hooks.postDeploy, "post-deploy"); + } + + // 8. 
Cleanup previous container + if (existingContainer && config.cleanupPrevious !== false) { + this.log(`Removing previous container`); + await existingContainer.remove({ force: true }); + } + + return { + success: true, + containerId: container.id, + previousDigest: previousDigest, + logs: this.getLogs(), + durationMs: this.getDuration(), + }; + } + + async rollback(task: RollbackTaskPayload): Promise { + const { containerName, targetDigest } = task; + + // Find previous container or use specified digest + if (targetDigest) { + // Deploy specific digest + return this.deploy({ + ...task, + digest: targetDigest, + }); + } + + // Find and restore previous container + const previousContainer = await this.findContainer(`${containerName}-previous-*`); + if (!previousContainer) { + throw new RollbackError(`No previous container found for ${containerName}`); + } + + // Stop current, rename, start previous + const currentContainer = await this.findContainer(containerName); + if (currentContainer) { + await currentContainer.stop({ t: 10 }); + await currentContainer.rename(`${containerName}-failed-${Date.now()}`); + } + + await previousContainer.rename(containerName); + await previousContainer.start(); + + return { + success: true, + containerId: previousContainer.id, + logs: this.getLogs(), + durationMs: this.getDuration(), + }; + } + + async inspect(): Promise { + const containers = await this.docker.listContainers({ + filters: { label: ["stella.release.id"] } + }); + + return { + containers: containers.map(c => ({ + name: c.Names[0], + digest: c.Labels["stella.digest"], + releaseId: c.Labels["stella.release.id"], + status: c.State, + health: c.Status, + })), + }; + } + + async writeSticker(sticker: VersionSticker): Promise { + const stickerPath = this.config.stickerPath || "/var/stella/version.json"; + const stickerContent = JSON.stringify(sticker, null, 2); + + // Write to host filesystem or container volume + if (this.config.stickerLocation === "volume") { + // Write to shared volume + await this.docker.run("alpine", [ + "sh", "-c", + `echo '${stickerContent}' > ${stickerPath}` + ], { + HostConfig: { + Binds: [`${this.config.stickerVolume}:/var/stella`] + } + }); + } else { + // Write directly to host + fs.writeFileSync(stickerPath, stickerContent); + } + } + + async readSticker(): Promise { + const stickerPath = this.config.stickerPath || "/var/stella/version.json"; + + try { + const content = fs.readFileSync(stickerPath, "utf-8"); + return JSON.parse(content); + } catch { + return null; + } + } +} +``` + +### 10.3.3 Compose Agent Implementation + +```typescript +class ComposeAgent implements TargetExecutor { + async deploy(task: DeployTaskPayload): Promise { + const { artifacts, config } = task; + const deployDir = config.deploymentDirectory; + + // 1. Write compose lock file + const composeLock = artifacts.find(a => a.type === "compose_lock"); + const composeContent = await this.fetchArtifact(composeLock); + + const composePath = path.join(deployDir, "compose.stella.lock.yml"); + await fs.writeFile(composePath, composeContent); + + // 2. Write any additional config files + for (const artifact of artifacts.filter(a => a.type === "config")) { + const content = await this.fetchArtifact(artifact); + await fs.writeFile(path.join(deployDir, artifact.name), content); + } + + // 3. Run pre-deploy hook + if (task.hooks?.preDeploy) { + await this.runHook(task.hooks.preDeploy, deployDir); + } + + // 4. 
Pull images + this.log("Pulling images..."); + const pullResult = await this.runCompose(deployDir, ["pull"]); + if (!pullResult.success) { + throw new Error(`Failed to pull images: ${pullResult.stderr}`); + } + + // 5. Verify digests + await this.verifyDigests(composePath, config.expectedDigests); + + // 6. Deploy + this.log("Deploying services..."); + const upResult = await this.runCompose(deployDir, [ + "up", "-d", + "--remove-orphans", + "--force-recreate" + ]); + + if (!upResult.success) { + throw new Error(`Failed to deploy: ${upResult.stderr}`); + } + + // 7. Wait for services to be healthy + if (config.healthCheck) { + this.log("Waiting for services to be healthy..."); + const healthy = await this.waitForServicesHealthy( + deployDir, + config.healthCheck.timeout + ); + + if (!healthy) { + // Rollback + await this.rollbackToBackup(deployDir); + throw new HealthCheckFailedError("Services failed health check"); + } + } + + // 8. Run post-deploy hook + if (task.hooks?.postDeploy) { + await this.runHook(task.hooks.postDeploy, deployDir); + } + + // 9. Write version sticker + await this.writeSticker(config.sticker, deployDir); + + return { + success: true, + logs: this.getLogs(), + durationMs: this.getDuration(), + }; + } + + private async runCompose(dir: string, args: string[]): Promise { + return this.exec("docker-compose", ["-f", "compose.stella.lock.yml", ...args], { + cwd: dir, + timeout: this.config.commandTimeout, + }); + } + + private async verifyDigests( + composePath: string, + expectedDigests: Record + ): Promise { + const composeContent = yaml.parse(await fs.readFile(composePath, "utf-8")); + + for (const [service, expectedDigest] of Object.entries(expectedDigests)) { + const serviceConfig = composeContent.services[service]; + if (!serviceConfig) { + throw new Error(`Service ${service} not found in compose file`); + } + + const image = serviceConfig.image; + if (!image.includes("@sha256:")) { + throw new Error(`Service ${service} image not pinned to digest: ${image}`); + } + + const actualDigest = image.split("@")[1]; + if (actualDigest !== expectedDigest) { + throw new DigestMismatchError( + `Service ${service}: expected ${expectedDigest}, got ${actualDigest}` + ); + } + } + } +} +``` + +## 10.4 Agentless Deployment (SSH/WinRM) + +### 10.4.1 SSH Remote Executor + +```typescript +class SSHRemoteExecutor implements TargetExecutor { + private ssh: SSHClient; + + async connect(config: SSHConnectionConfig): Promise { + const privateKey = await this.secrets.getSecret(config.privateKeyRef); + + this.ssh = new SSHClient(); + await this.ssh.connect({ + host: config.host, + port: config.port || 22, + username: config.username, + privateKey: privateKey.value, + readyTimeout: config.connectionTimeout || 30000, + keepaliveInterval: 10000, + }); + } + + async deploy(task: DeployTaskPayload): Promise { + const { artifacts, config } = task; + const deployDir = config.deploymentDirectory; + + try { + // 1. Ensure deployment directory exists + await this.exec(`mkdir -p ${deployDir}`); + await this.exec(`mkdir -p ${deployDir}/.stella-backup`); + + // 2. Backup current deployment + await this.exec(`cp -r ${deployDir}/* ${deployDir}/.stella-backup/ 2>/dev/null || true`); + + // 3. Upload artifacts + for (const artifact of artifacts) { + const content = await this.fetchArtifact(artifact); + const remotePath = path.join(deployDir, artifact.name); + await this.uploadFile(content, remotePath); + } + + // 4. 
Run pre-deploy hook + if (task.hooks?.preDeploy) { + await this.runRemoteHook(task.hooks.preDeploy, deployDir); + } + + // 5. Execute deployment script + const deployScript = artifacts.find(a => a.type === "deploy_script"); + if (deployScript) { + const scriptPath = path.join(deployDir, deployScript.name); + await this.exec(`chmod +x ${scriptPath}`); + + const result = await this.exec(scriptPath, { + cwd: deployDir, + timeout: config.deploymentTimeout, + env: config.environment, + }); + + if (result.exitCode !== 0) { + throw new DeploymentError(`Deploy script failed: ${result.stderr}`); + } + } + + // 6. Run post-deploy hook + if (task.hooks?.postDeploy) { + await this.runRemoteHook(task.hooks.postDeploy, deployDir); + } + + // 7. Health check + if (config.healthCheck) { + const healthy = await this.runHealthCheck(config.healthCheck); + if (!healthy) { + await this.rollback(task); + throw new HealthCheckFailedError("Health check failed"); + } + } + + // 8. Write version sticker + await this.writeSticker(config.sticker, deployDir); + + // 9. Cleanup backup + await this.exec(`rm -rf ${deployDir}/.stella-backup`); + + return { + success: true, + logs: this.getLogs(), + durationMs: this.getDuration(), + }; + + } finally { + this.ssh.end(); + } + } + + async rollback(task: RollbackTaskPayload): Promise { + const deployDir = task.config.deploymentDirectory; + + // Restore from backup + await this.exec(`rm -rf ${deployDir}/*`); + await this.exec(`cp -r ${deployDir}/.stella-backup/* ${deployDir}/`); + + // Re-run deployment from backup + const deployScript = path.join(deployDir, "deploy.sh"); + await this.exec(deployScript, { cwd: deployDir }); + + return { + success: true, + logs: this.getLogs(), + durationMs: this.getDuration(), + }; + } + + private async exec( + command: string, + options?: ExecOptions + ): Promise { + return new Promise((resolve, reject) => { + const timeout = options?.timeout || 60000; + let stdout = ""; + let stderr = ""; + + this.ssh.exec(command, { cwd: options?.cwd }, (err, stream) => { + if (err) { + reject(err); + return; + } + + const timer = setTimeout(() => { + stream.close(); + reject(new TimeoutError(`Command timed out after ${timeout}ms`)); + }, timeout); + + stream.on("data", (data: Buffer) => { + stdout += data.toString(); + this.log(data.toString()); + }); + + stream.stderr.on("data", (data: Buffer) => { + stderr += data.toString(); + this.log(`[stderr] ${data.toString()}`); + }); + + stream.on("close", (code: number) => { + clearTimeout(timer); + resolve({ exitCode: code, stdout, stderr }); + }); + }); + }); + } + + private async uploadFile(content: Buffer | string, remotePath: string): Promise { + return new Promise((resolve, reject) => { + this.ssh.sftp((err, sftp) => { + if (err) { + reject(err); + return; + } + + const writeStream = sftp.createWriteStream(remotePath); + writeStream.on("close", () => resolve()); + writeStream.on("error", reject); + writeStream.end(content); + }); + }); + } +} +``` + +### 10.4.2 WinRM Remote Executor + +```typescript +class WinRMRemoteExecutor implements TargetExecutor { + private winrm: WinRMClient; + + async connect(config: WinRMConnectionConfig): Promise { + const credential = await this.secrets.getSecret(config.credentialRef); + + this.winrm = new WinRMClient({ + host: config.host, + port: config.port || 5986, + username: credential.username, + password: credential.password, + protocol: config.useHttps ? 
"https" : "http", + authentication: config.authType || "ntlm", // ntlm, kerberos, basic + }); + + await this.winrm.openShell(); + } + + async deploy(task: DeployTaskPayload): Promise { + const { artifacts, config } = task; + const deployDir = config.deploymentDirectory; + + try { + // 1. Ensure deployment directory exists + await this.execPowerShell(` + if (-not (Test-Path "${deployDir}")) { + New-Item -ItemType Directory -Path "${deployDir}" -Force + } + if (-not (Test-Path "${deployDir}\\.stella-backup")) { + New-Item -ItemType Directory -Path "${deployDir}\\.stella-backup" -Force + } + `); + + // 2. Backup current deployment + await this.execPowerShell(` + Get-ChildItem "${deployDir}" -Exclude ".stella-backup" | + Copy-Item -Destination "${deployDir}\\.stella-backup" -Recurse -Force + `); + + // 3. Upload artifacts + for (const artifact of artifacts) { + const content = await this.fetchArtifact(artifact); + const remotePath = `${deployDir}\\${artifact.name}`; + await this.uploadFile(content, remotePath); + } + + // 4. Run pre-deploy hook + if (task.hooks?.preDeploy) { + await this.runRemoteHook(task.hooks.preDeploy, deployDir); + } + + // 5. Execute deployment script + const deployScript = artifacts.find(a => a.type === "deploy_script"); + if (deployScript) { + const scriptPath = `${deployDir}\\${deployScript.name}`; + + const result = await this.execPowerShell(` + Set-Location "${deployDir}" + & "${scriptPath}" + exit $LASTEXITCODE + `, { timeout: config.deploymentTimeout }); + + if (result.exitCode !== 0) { + throw new DeploymentError(`Deploy script failed: ${result.stderr}`); + } + } + + // 6. Run post-deploy hook + if (task.hooks?.postDeploy) { + await this.runRemoteHook(task.hooks.postDeploy, deployDir); + } + + // 7. Health check + if (config.healthCheck) { + const healthy = await this.runHealthCheck(config.healthCheck); + if (!healthy) { + await this.rollback(task); + throw new HealthCheckFailedError("Health check failed"); + } + } + + // 8. Write version sticker + await this.writeSticker(config.sticker, deployDir); + + // 9. 
Cleanup backup + await this.execPowerShell(` + Remove-Item -Path "${deployDir}\\.stella-backup" -Recurse -Force + `); + + return { + success: true, + logs: this.getLogs(), + durationMs: this.getDuration(), + }; + + } finally { + this.winrm.closeShell(); + } + } + + private async execPowerShell( + script: string, + options?: ExecOptions + ): Promise { + const encoded = Buffer.from(script, "utf16le").toString("base64"); + return this.winrm.runCommand( + `powershell -EncodedCommand ${encoded}`, + { timeout: options?.timeout || 60000 } + ); + } + + private async uploadFile(content: Buffer | string, remotePath: string): Promise { + // Use PowerShell to write file content + const base64Content = Buffer.from(content).toString("base64"); + + await this.execPowerShell(` + $bytes = [Convert]::FromBase64String("${base64Content}") + [IO.File]::WriteAllBytes("${remotePath}", $bytes) + `); + } +} +``` + +## 10.5 Immutable Artifact Generation + +### 10.5.1 Compose Lock File Generation + +```typescript +class ComposeLockGenerator { + async generate( + release: Release, + environment: Environment, + targets: Target[] + ): Promise { + + // Start with template or existing compose file + const template = await this.loadTemplate(environment.config.composeTemplate); + + // Build services section with pinned digests + const services: Record = {}; + + for (const component of release.components) { + const service = template.services[component.componentName] || {}; + + services[component.componentName] = { + ...service, + // CRITICAL: Always use digest, never tag + image: `${component.imageRepository}@${component.digest}`, + + // Environment variables + environment: this.mergeEnvironment( + service.environment, + environment.config.variables, + this.buildStellaEnv(release, environment) + ), + + // Labels for Stella tracking + labels: { + ...service.labels, + "stella.release.id": release.id, + "stella.release.name": release.name, + "stella.component.name": component.componentName, + "stella.component.digest": component.digest, + "stella.component.semver": component.semver, + "stella.environment": environment.name, + "stella.deployed.at": new Date().toISOString(), + }, + + // Replicas from environment config + deploy: { + ...service.deploy, + replicas: environment.config.replicas?.[component.componentName] || + service.deploy?.replicas || 1, + }, + }; + } + + // Build complete compose file + const composeLock = { + version: "3.8", + services, + + // Networks from template + networks: template.networks, + + // Volumes from template + volumes: template.volumes, + + // Stella metadata + "x-stella": { + release_id: release.id, + release_name: release.name, + environment: environment.name, + generated_at: new Date().toISOString(), + generator_version: this.version, + inputs_hash: this.computeInputsHash(release, environment), + components: release.components.map(c => ({ + name: c.componentName, + digest: c.digest, + semver: c.semver, + })), + }, + }; + + const content = yaml.stringify(composeLock); + const hash = crypto.createHash("sha256").update(content).digest("hex"); + + return { + type: "compose_lock", + name: "compose.stella.lock.yml", + content: Buffer.from(content), + contentHash: `sha256:${hash}`, + }; + } + + private mergeEnvironment( + serviceEnv: string[] | Record, + envVars: Record, + stellaEnv: Record + ): string[] { + const merged: Record = {}; + + // Parse service environment + if (Array.isArray(serviceEnv)) { + for (const entry of serviceEnv) { + const [key, ...valueParts] = entry.split("="); + merged[key] 
= valueParts.join("="); + } + } else if (serviceEnv) { + Object.assign(merged, serviceEnv); + } + + // Apply environment-specific overrides + Object.assign(merged, envVars); + + // Add Stella metadata + Object.assign(merged, stellaEnv); + + // Convert back to array format + return Object.entries(merged).map(([k, v]) => `${k}=${v}`); + } + + private buildStellaEnv(release: Release, environment: Environment): Record { + return { + STELLA_RELEASE_ID: release.id, + STELLA_RELEASE_NAME: release.name, + STELLA_ENVIRONMENT: environment.name, + STELLA_DEPLOYED_AT: new Date().toISOString(), + }; + } +} +``` + +### 10.5.2 Version Sticker Generation + +```typescript +class VersionStickerGenerator { + generate( + release: Release, + environment: Environment, + promotion: Promotion, + evidencePacket: EvidencePacket + ): VersionSticker { + + const sticker: VersionSticker = { + // Schema version + stella_version: "1.0", + + // Release identity + release_id: release.id, + release_name: release.name, + + // Component details + components: release.components.map(c => ({ + name: c.componentName, + digest: c.digest, + semver: c.semver, + tag: c.tag, + image_repository: c.imageRepository, + })), + + // Deployment context + environment: environment.name, + environment_id: environment.id, + deployed_at: new Date().toISOString(), + deployed_by: promotion.requestedBy, + + // Traceability + promotion_id: promotion.id, + workflow_run_id: promotion.workflowRunId, + + // Evidence chain + evidence_packet_id: evidencePacket.id, + evidence_packet_hash: evidencePacket.contentHash, + policy_decision_hash: promotion.decisionRecord?.inputsHash, + + // Orchestrator info + orchestrator_version: this.version, + + // Source reference + source_ref: release.sourceRef ? { + commit_sha: release.sourceRef.commitSha, + branch: release.sourceRef.branch, + repository: release.sourceRef.repository, + } : undefined, + }; + + return sticker; + } + + serialize(sticker: VersionSticker): string { + return JSON.stringify(sticker, null, 2); + } + + verify(sticker: VersionSticker, expectedRelease: Release): VerificationResult { + const errors: string[] = []; + + // Verify release ID + if (sticker.release_id !== expectedRelease.id) { + errors.push(`Release ID mismatch: ${sticker.release_id} vs ${expectedRelease.id}`); + } + + // Verify all component digests + for (const expectedComponent of expectedRelease.components) { + const stickerComponent = sticker.components.find( + c => c.name === expectedComponent.componentName + ); + + if (!stickerComponent) { + errors.push(`Missing component: ${expectedComponent.componentName}`); + continue; + } + + if (stickerComponent.digest !== expectedComponent.digest) { + errors.push( + `Digest mismatch for ${expectedComponent.componentName}: ` + + `${stickerComponent.digest} vs ${expectedComponent.digest}` + ); + } + } + + return { + valid: errors.length === 0, + errors, + }; + } +} +``` + +--- + +# 11. 
A/B & Progressive Delivery + +## 11.1 Progressive Delivery Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ PROGRESSIVE DELIVERY ARCHITECTURE │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ A/B RELEASE MANAGER │ │ +│ │ │ │ +│ │ - Create A/B release with variations │ │ +│ │ - Manage traffic split configuration │ │ +│ │ - Coordinate rollout stages │ │ +│ │ - Handle promotion/rollback │ │ +│ └──────────────────────────────┬──────────────────────────────────────┘ │ +│ │ │ +│ ┌──────────────────┴──────────────────┐ │ +│ │ │ │ +│ ▼ ▼ │ +│ ┌───────────────────────┐ ┌───────────────────────┐ │ +│ │ TARGET-GROUP A/B │ │ ROUTER-BASED A/B │ │ +│ │ │ │ │ │ +│ │ Deploy to groups │ │ Configure traffic │ │ +│ │ by labels/membership │ │ via load balancer │ │ +│ │ │ │ │ │ +│ │ Good for: │ │ Good for: │ │ +│ │ - Background workers │ │ - Web/API traffic │ │ +│ │ - Batch processors │ │ - Customer-facing │ │ +│ │ - Internal services │ │ - L7 routing │ │ +│ └───────────────────────┘ └───────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ CANARY CONTROLLER │ │ +│ │ │ │ +│ │ - Execute rollout stages │ │ +│ │ - Monitor health metrics │ │ +│ │ - Auto-advance or pause │ │ +│ │ - Trigger rollback on failure │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ TRAFFIC ROUTER INTEGRATION │ │ +│ │ │ │ +│ │ Plugin-based integration with: │ │ +│ │ - Nginx (config generation + reload) │ │ +│ │ - HAProxy (config generation + reload) │ │ +│ │ - Traefik (dynamic config API) │ │ +│ │ - AWS ALB (target group weights) │ │ +│ │ - Custom (webhook) │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## 11.2 A/B Release Models + +### 11.2.1 Model 1: Target-Group A/B + +```typescript +interface TargetGroupABConfig { + type: "target-group"; + + // Group definitions + groupA: { + targetGroupId: UUID; // or labels + labels?: Record; + }; + groupB: { + targetGroupId: UUID; + labels?: Record; + }; + + // Rollout by scaling groups + rolloutStrategy: { + type: "scale-groups"; + stages: ScaleStage[]; + }; +} + +interface ScaleStage { + name: string; + groupAPercentage: number; // Percentage of group A targets active + groupBPercentage: number; // Percentage of group B targets active + duration?: number; // Auto-advance after duration (seconds) + healthThreshold?: number; // Required health % to advance + requireApproval?: boolean; +} + +// Example: Worker service canary +const workerCanaryConfig: TargetGroupABConfig = { + type: "target-group", + groupA: { labels: { "worker-group": "A" } }, + groupB: { labels: { "worker-group": "B" } }, + rolloutStrategy: { + type: "scale-groups", + stages: [ + // Stage 1: 100% A, 10% B (canary) + { name: "canary", groupAPercentage: 100, groupBPercentage: 10, + duration: 300, healthThreshold: 95 }, + // Stage 2: 100% A, 50% B + { name: "expand", groupAPercentage: 100, groupBPercentage: 50, + duration: 600, healthThreshold: 95 }, + // Stage 3: 50% A, 100% B + { name: "shift", groupAPercentage: 50, groupBPercentage: 100, + duration: 600, healthThreshold: 95 }, + // Stage 4: 0% A, 100% B (complete) + { name: "complete", groupAPercentage: 0, groupBPercentage: 100, + requireApproval: true }, + ], + }, +}; +``` + +### 11.2.2 
Model 2: Router-Based A/B + +```typescript +interface RouterBasedABConfig { + type: "router-based"; + + // Router integration + routerIntegrationId: UUID; + + // Upstream configuration + upstreamName: string; // e.g., "api-backend" + variationA: { + targets: string[]; // Backend addresses + serviceName?: string; // Service discovery name + }; + variationB: { + targets: string[]; + serviceName?: string; + }; + + // Traffic split configuration + trafficSplit: TrafficSplitConfig; + + // Rollout strategy + rolloutStrategy: RouterRolloutStrategy; +} + +interface TrafficSplitConfig { + type: "weight" | "header" | "cookie" | "tenant" | "composite"; + + // Weight-based (percentage) + weights?: { A: number; B: number }; + + // Header-based + headerName?: string; + headerValueA?: string; + headerValueB?: string; + + // Cookie-based + cookieName?: string; + cookieValueA?: string; + cookieValueB?: string; + + // Tenant-based (by host/path) + tenantRules?: TenantRule[]; +} + +interface RouterRolloutStrategy { + type: "manual" | "time-based" | "health-based" | "composite"; + stages: RouterRolloutStage[]; +} + +interface RouterRolloutStage { + name: string; + trafficPercentageB: number; // % of traffic to variation B + + // Advancement criteria + duration?: number; // Auto-advance after duration + healthThreshold?: number; // Required health % + errorRateThreshold?: number; // Max error rate % + latencyThreshold?: number; // Max p99 latency ms + requireApproval?: boolean; + + // Optional: specific routing rules for this stage + routingOverrides?: TrafficSplitConfig; +} + +// Example: API canary with health-based advancement +const apiCanaryConfig: RouterBasedABConfig = { + type: "router-based", + routerIntegrationId: "nginx-prod", + upstreamName: "api-backend", + variationA: { serviceName: "api-v1" }, + variationB: { serviceName: "api-v2" }, + trafficSplit: { type: "weight", weights: { A: 100, B: 0 } }, + rolloutStrategy: { + type: "health-based", + stages: [ + { name: "canary-10", trafficPercentageB: 10, + duration: 300, healthThreshold: 99, errorRateThreshold: 1 }, + { name: "canary-25", trafficPercentageB: 25, + duration: 600, healthThreshold: 99, errorRateThreshold: 1 }, + { name: "canary-50", trafficPercentageB: 50, + duration: 900, healthThreshold: 99, errorRateThreshold: 1 }, + { name: "promote", trafficPercentageB: 100, + requireApproval: true }, + ], + }, +}; +``` + +## 11.3 Canary Controller Implementation + +```typescript +class CanaryController { + async executeRollout(abRelease: ABRelease): Promise { + const strategy = abRelease.rolloutStrategy; + + for (let i = 0; i < strategy.stages.length; i++) { + const stage = strategy.stages[i]; + const stageRecord = await this.startStage(abRelease, stage, i); + + try { + // 1. Apply traffic configuration for this stage + await this.applyStageTraffic(abRelease, stage); + this.emit("canary.stage_started", { abRelease, stage, stageNumber: i }); + + // 2. Wait for stage completion based on criteria + const result = await this.waitForStageCompletion(abRelease, stage); + + if (!result.success) { + // Health check failed - rollback + this.log(`Stage ${stage.name} failed health check: ${result.reason}`); + await this.rollback(abRelease, result.reason); + return; + } + + // 3. 
Check if approval required + if (stage.requireApproval) { + this.log(`Stage ${stage.name} requires approval`); + await this.pauseForApproval(abRelease, stage); + + // Wait for approval + const approval = await this.waitForApproval(abRelease, stage); + if (!approval.approved) { + await this.rollback(abRelease, "Approval denied"); + return; + } + } + + await this.completeStage(stageRecord, "succeeded"); + this.emit("canary.stage_completed", { abRelease, stage, stageNumber: i }); + + } catch (error) { + await this.completeStage(stageRecord, "failed", error.message); + await this.rollback(abRelease, error.message); + return; + } + } + + // Rollout complete + await this.completeRollout(abRelease); + this.emit("canary.promoted", { abRelease }); + } + + private async waitForStageCompletion( + abRelease: ABRelease, + stage: RolloutStage + ): Promise { + + const startTime = Date.now(); + const checkInterval = 30000; // 30 seconds + + while (true) { + // Check health metrics + const health = await this.checkHealth(abRelease, stage); + + if (!health.healthy) { + return { + success: false, + reason: `Health check failed: ${health.reason}` + }; + } + + // Check error rate (if threshold configured) + if (stage.errorRateThreshold !== undefined) { + const errorRate = await this.getErrorRate(abRelease); + if (errorRate > stage.errorRateThreshold) { + return { + success: false, + reason: `Error rate ${errorRate}% exceeds threshold ${stage.errorRateThreshold}%` + }; + } + } + + // Check latency (if threshold configured) + if (stage.latencyThreshold !== undefined) { + const latency = await this.getP99Latency(abRelease); + if (latency > stage.latencyThreshold) { + return { + success: false, + reason: `P99 latency ${latency}ms exceeds threshold ${stage.latencyThreshold}ms` + }; + } + } + + // Check duration (auto-advance) + if (stage.duration !== undefined) { + const elapsed = (Date.now() - startTime) / 1000; + if (elapsed >= stage.duration) { + return { success: true }; + } + } + + // Wait before next check + await sleep(checkInterval); + } + } + + private async applyStageTraffic(abRelease: ABRelease, stage: RolloutStage): Promise { + if (abRelease.config.type === "router-based") { + const router = await this.getRouterConnector(abRelease.config.routerIntegrationId); + + await router.shiftTraffic( + abRelease.config.variationA.serviceName, + abRelease.config.variationB.serviceName, + stage.trafficPercentageB + ); + + } else if (abRelease.config.type === "target-group") { + // Scale target groups + await this.scaleTargetGroup( + abRelease.config.groupA, + stage.groupAPercentage + ); + await this.scaleTargetGroup( + abRelease.config.groupB, + stage.groupBPercentage + ); + } + } + + async rollback(abRelease: ABRelease, reason: string): Promise { + this.log(`Rolling back A/B release: ${reason}`); + this.emit("canary.rollback_started", { abRelease, reason }); + + if (abRelease.config.type === "router-based") { + // Shift all traffic back to A + const router = await this.getRouterConnector(abRelease.config.routerIntegrationId); + await router.shiftTraffic( + abRelease.config.variationB.serviceName, + abRelease.config.variationA.serviceName, + 100 + ); + + } else if (abRelease.config.type === "target-group") { + // Scale B to 0, A to 100 + await this.scaleTargetGroup(abRelease.config.groupA, 100); + await this.scaleTargetGroup(abRelease.config.groupB, 0); + } + + abRelease.status = "rolled_back"; + await this.save(abRelease); + + this.emit("canary.rolled_back", { abRelease, reason }); + } +} +``` + +## 11.4 Router 
Plugin Implementations + +### 11.4.1 Nginx Router Plugin + +```typescript +class NginxRouterPlugin implements TrafficRouterPlugin { + async configureRoute(config: RouteConfig): Promise { + const upstreamConfig = this.generateUpstreamConfig(config); + const serverConfig = this.generateServerConfig(config); + + // Write configuration files + await this.writeConfig( + `/etc/nginx/conf.d/upstream-${config.upstream}.conf`, + upstreamConfig + ); + await this.writeConfig( + `/etc/nginx/conf.d/server-${config.upstream}.conf`, + serverConfig + ); + + // Validate configuration + const validation = await this.validateConfig(); + if (!validation.valid) { + throw new Error(`Nginx config validation failed: ${validation.error}`); + } + + // Reload nginx + await this.reload(); + } + + private generateUpstreamConfig(config: RouteConfig): string { + const lines: string[] = []; + + for (const variation of config.variations) { + lines.push(`upstream ${config.upstream}_${variation.name} {`); + + for (const target of variation.targets) { + lines.push(` server ${target};`); + } + + lines.push(`}`); + lines.push(``); + } + + // Combined upstream with weights (for percentage-based routing) + if (config.splitType === "weight") { + lines.push(`upstream ${config.upstream} {`); + + for (const variation of config.variations) { + const weight = variation.weight; + for (const target of variation.targets) { + lines.push(` server ${target} weight=${weight};`); + } + } + + lines.push(`}`); + } + + return lines.join("\n"); + } + + private generateServerConfig(config: RouteConfig): string { + if (config.splitType === "header" || config.splitType === "cookie") { + // Split block based on header/cookie + return ` +map $http_${config.headerName || "x-variation"} $${config.upstream}_backend { + default ${config.upstream}_A; + "${config.headerValueB || "B"}" ${config.upstream}_B; +} + +server { + listen 80; + server_name ${config.serverName}; + + location / { + proxy_pass http://$${config.upstream}_backend; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + } +} +`; + } else { + // Weight-based (default) + return ` +server { + listen 80; + server_name ${config.serverName}; + + location / { + proxy_pass http://${config.upstream}; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + } +} +`; + } + } + + async shiftTraffic(from: string, to: string, percentage: number): Promise { + const config = await this.getCurrentConfig(); + + // Update weights + for (const variation of config.variations) { + if (variation.name === to) { + variation.weight = percentage; + } else { + variation.weight = 100 - percentage; + } + } + + await this.configureRoute(config); + } + + async getTrafficDistribution(): Promise { + // Parse current nginx config to get weights + const config = await this.parseCurrentConfig(); + + return { + variations: config.variations.map(v => ({ + name: v.name, + percentage: v.weight, + targets: v.targets, + })), + }; + } +} +``` + +### 11.4.2 AWS ALB Router Plugin + +```typescript +class AWSALBRouterPlugin implements TrafficRouterPlugin { + private alb: AWS.ELBv2; + + async configureRoute(config: RouteConfig): Promise { + const listenerArn = config.listenerArn; + + // Create/update target groups for each variation + const targetGroupArns: Record = {}; + + for (const variation of config.variations) { + const tgArn = await this.ensureTargetGroup( + `${config.upstream}-${variation.name}`, + variation.targets + ); + targetGroupArns[variation.name] = tgArn; + } + + // Update listener 
rule with weighted target groups + await this.alb.modifyRule({ + RuleArn: config.ruleArn, + Actions: [{ + Type: "forward", + ForwardConfig: { + TargetGroups: config.variations.map(v => ({ + TargetGroupArn: targetGroupArns[v.name], + Weight: v.weight, + })), + TargetGroupStickinessConfig: { + Enabled: config.stickySession || false, + DurationSeconds: config.stickyDuration || 3600, + }, + }, + }], + }).promise(); + } + + async shiftTraffic(from: string, to: string, percentage: number): Promise { + const rule = await this.getRule(); + const forwardConfig = rule.Actions[0].ForwardConfig; + + // Update weights + for (const tg of forwardConfig.TargetGroups) { + if (tg.TargetGroupArn.includes(`-${to}`)) { + tg.Weight = percentage; + } else { + tg.Weight = 100 - percentage; + } + } + + await this.alb.modifyRule({ + RuleArn: rule.RuleArn, + Actions: rule.Actions, + }).promise(); + } + + async getTrafficDistribution(): Promise { + const rule = await this.getRule(); + const forwardConfig = rule.Actions[0].ForwardConfig; + + const variations = []; + for (const tg of forwardConfig.TargetGroups) { + const targets = await this.getTargetGroupTargets(tg.TargetGroupArn); + const name = tg.TargetGroupArn.split("-").pop(); // Extract variation name + + variations.push({ + name, + percentage: tg.Weight, + targets: targets.map(t => t.Id), + }); + } + + return { variations }; + } +} +``` + +--- + +# 12. UI/UX Specification + +## 12.1 Dashboard Specification + +### 12.1.1 Dashboard Layout + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ STELLA OPS SUITE │ +│ ┌─────┐ [User Menu ▼] │ +│ │Logo │ Dashboard Releases Environments Workflows Integrations │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌───────────────────────────────┐ ┌───────────────────────────────────┐ │ +│ │ SECURITY POSTURE │ │ RELEASE OPERATIONS │ │ +│ │ │ │ │ │ +│ │ ┌─────────┐ ┌─────────┐ │ │ ┌─────────┐ ┌─────────┐ │ │ +│ │ │Critical │ │ High │ │ │ │In Flight│ │Completed│ │ │ +│ │ │ 0 ● │ │ 3 ● │ │ │ │ 2 │ │ 47 │ │ │ +│ │ │reachable│ │reachable│ │ │ │deploys │ │ today │ │ │ +│ │ └─────────┘ └─────────┘ │ │ └─────────┘ └─────────┘ │ │ +│ │ │ │ │ │ +│ │ Blocked: 2 releases │ │ Pending Approval: 3 │ │ +│ │ Risk Drift: 1 env │ │ Failed (24h): 1 │ │ +│ │ │ │ │ │ +│ └───────────────────────────────┘ └───────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────┐ ┌───────────────────────────────────┐ │ +│ │ ESTATE HEALTH │ │ COMPLIANCE/AUDIT │ │ +│ │ │ │ │ │ +│ │ Agents: 12 online, 1 offline│ │ Evidence Complete: 98% │ │ +│ │ Targets: 45/47 healthy │ │ Policy Changes: 2 (this week) │ │ +│ │ Drift Detected: 2 targets │ │ Audit Exports: 5 (this month) │ │ +│ │ │ │ │ │ +│ └───────────────────────────────┘ └───────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ RECENT ACTIVITY │ │ +│ │ │ │ +│ │ ● 14:32 myapp-v2.3.1 deployed to prod (jane@example.com) │ │ +│ │ ○ 14:28 myapp-v2.3.1 promoted to stage (auto) │ │ +│ │ ● 14:15 api-v1.2.0 blocked: critical vuln CVE-2024-1234 │ │ +│ │ ○ 13:45 worker-v3.0.0 release created (john@example.com) │ │ +│ │ ● 13:30 Target prod-web-03 health: degraded │ │ +│ │ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### 12.1.2 Dashboard Metrics + +```typescript +interface DashboardMetrics { + // Security Posture + security: { + 
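+    // Counts below are reachability-filtered: only findings the analyzer has
+    // confirmed reachable are surfaced in the dashboard tiles.
+    // digestsAnalyzedToday / digestQuota back the fair-use limit on new digests per day.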
criticalReachable: number; + highReachable: number; + blockedReleases: number; + riskDriftEnvironments: number; + digestsAnalyzedToday: number; + digestQuota: number; + }; + + // Release Operations + operations: { + deploymentsInFlight: number; + deploymentsCompletedToday: number; + deploymentsFailed24h: number; + pendingApprovals: number; + averageDeployTime: number; // seconds + }; + + // Estate Health + estate: { + agentsOnline: number; + agentsOffline: number; + agentsDegraded: number; + targetsHealthy: number; + targetsUnhealthy: number; + targetsDrift: number; + }; + + // Compliance/Audit + compliance: { + evidenceCompleteness: number; // percentage + policyChangesThisWeek: number; + auditExportsThisMonth: number; + lastExportDate: DateTime; + }; +} +``` + +## 12.2 Workflow Editor Specification + +### 12.2.1 Graph Editor Component + +```typescript +interface WorkflowEditorState { + template: WorkflowTemplate; + selectedNode: string | null; + selectedEdge: string | null; + zoom: number; + pan: { x: number; y: number }; + mode: "select" | "pan" | "connect"; + clipboard: StepNode[] | null; + undoStack: WorkflowTemplate[]; + redoStack: WorkflowTemplate[]; +} + +interface WorkflowEditorProps { + template: WorkflowTemplate; + stepTypes: StepType[]; + readOnly: boolean; + onSave: (template: WorkflowTemplate) => void; + onValidate: (template: WorkflowTemplate) => ValidationResult; +} + +// Node component +interface NodeRendererProps { + node: StepNode; + stepType: StepType; + status?: StepRunStatus; // For run visualization + selected: boolean; + onSelect: () => void; + onMove: (position: Position) => void; + onConnect: (sourceHandle: string) => void; +} + +// Node display +const NodeRenderer: React.FC = ({ node, stepType, status, selected }) => { + const statusColor = getStatusColor(status); + + return ( +
+    <div
+      className={`workflow-node ${selected ? "selected" : ""}`}
+      style={{ borderColor: statusColor }}
+    >
+      {/* Node header */}
+      <div className="node-header">
+        <span className="node-name">{node.name}</span>
+        {status && <StatusIndicator status={status} />}
+      </div>
+
+      {/* Node body */}
+      <div className="node-body">
+        <span className="step-type">{stepType.name}</span>
+        {node.timeout && <span className="node-timeout">⏱ {node.timeout}s</span>}
+      </div>
+
+      {/* Connection handles */}
+      <Handle type="target" position="left" />
+      <Handle type="source" position="right" />
+
+      {/* Conditional indicator */}
+      {node.condition && (
+        <ConditionIndicator condition={node.condition} />
+      )}
+    </div>
+  );
+};
+```
+
+### 12.2.2 Run Visualization Overlay
+
+```typescript
+interface RunVisualizationProps {
+  template: WorkflowTemplate;
+  workflowRun: WorkflowRun;
+  stepRuns: StepRun[];
+  onNodeClick: (nodeId: string) => void;
+}
+
+const RunVisualization: React.FC<RunVisualizationProps> = ({
+  template, workflowRun, stepRuns, onNodeClick
+}) => {
+  const [selectedNode, setSelectedNode] = useState<string | null>(null);
+
+  // WebSocket for real-time updates
+  const { subscribe, unsubscribe } = useWorkflowStream(workflowRun.id);
+
+  useEffect(() => {
+    const handlers = {
+      'step_started': (data) => updateStepStatus(data.nodeId, 'running'),
+      'step_completed': (data) => updateStepStatus(data.nodeId, data.status),
+      'step_log': (data) => appendLog(data.nodeId, data.line),
+    };
+
+    subscribe(handlers);
+    return () => unsubscribe();
+  }, [workflowRun.id]);
+
+  return (
+    <div className="run-visualization">
+      {/* Workflow graph with status overlay */}
+      <WorkflowGraph
+        template={template}
+        nodeRenderer={(node) => (
+          <NodeRenderer
+            node={node}
+            stepType={getStepType(node.type)}
+            status={stepRuns.find((run) => run.nodeId === node.id)?.status}
+            selected={node.id === selectedNode}
+            onSelect={() => setSelectedNode(node.id)}
+          />
+        )}
+        edgeRenderer={(edge) => (
+          <EdgeRenderer edge={edge} />
+        )}
+      />
+
+      {/* Log panel */}
+      {selectedNode && (
+        <StepLogPanel
+          stepRun={stepRuns.find((run) => run.nodeId === selectedNode)}
+          onClose={() => setSelectedNode(null)}
+        />
+      )}
+
+      {/* Progress bar */}
+      <WorkflowProgressBar workflowRun={workflowRun} stepRuns={stepRuns} />
+    </div>
+ ); +}; +``` + +## 12.3 Key UI Screens + +### 12.3.1 Environment Overview Screen + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ ENVIRONMENTS [+ New Environment] │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌──────────────────────────────────────────────────────────────────────┐ │ +│ │ ENVIRONMENT PIPELINE │ │ +│ │ │ │ +│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │ +│ │ │ DEV │ ───► │ TEST │ ───► │ STAGE │ ───► │ PROD │ │ │ +│ │ │ │ │ │ │ │ │ │ │ │ +│ │ │ v2.4.0 │ │ v2.3.1 │ │ v2.3.1 │ │ v2.3.0 │ │ │ +│ │ │ ● 5 min │ │ ● 2h │ │ ● 1d │ │ ● 3d │ │ │ +│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │ +│ │ │ │ +│ └──────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌──────────────────────────────────────────────────────────────────────┐ │ +│ │ PRODUCTION [Manage] [View] │ │ +│ │ │ │ +│ │ Current Release: myapp-v2.3.0 │ │ +│ │ Deployed: 3 days ago by jane@example.com │ │ +│ │ Targets: 5 healthy, 0 unhealthy │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Pending Promotion: myapp-v2.3.1 [Review] │ │ │ +│ │ │ Waiting: 2 approvals (1/2) │ │ │ +│ │ │ Security: ✓ All gates pass │ │ │ +│ │ └─────────────────────────────────────────────────────────────┘ │ │ +│ │ │ │ +│ │ Freeze Windows: None active │ │ +│ │ Required Approvals: 2 │ │ +│ │ │ │ +│ └──────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### 12.3.2 Release Detail Screen + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ RELEASE: myapp-v2.3.1 │ +│ Created: 2 hours ago by jane@example.com │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ [Overview] [Components] [Security] [Deployments] [Evidence] │ +│ │ +│ ┌──────────────────────────────────────────────────────────────────────┐ │ +│ │ COMPONENTS │ │ +│ │ │ │ +│ │ ┌──────────────────────────────────────────────────────────────┐ │ │ +│ │ │ api │ │ │ +│ │ │ Version: 2.3.1 Digest: sha256:abc123... │ │ │ +│ │ │ Security: ✓ 0 critical, 0 high (0 reachable) │ │ │ +│ │ │ Image: registry.example.com/myapp/api@sha256:abc123 │ │ │ +│ │ └──────────────────────────────────────────────────────────────┘ │ │ +│ │ │ │ +│ │ ┌──────────────────────────────────────────────────────────────┐ │ │ +│ │ │ worker │ │ │ +│ │ │ Version: 2.3.1 Digest: sha256:def456... │ │ │ +│ │ │ Security: ✓ 0 critical, 0 high (0 reachable) │ │ │ +│ │ │ Image: registry.example.com/myapp/worker@sha256:def456 │ │ │ +│ │ └──────────────────────────────────────────────────────────────┘ │ │ +│ │ │ │ +│ └──────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌──────────────────────────────────────────────────────────────────────┐ │ +│ │ DEPLOYMENT STATUS │ │ +│ │ │ │ +│ │ dev ●────────────────────────────────────────● Deployed (2h) │ │ +│ │ test ●────────────────────────────────────────● Deployed (1h) │ │ +│ │ stage ○────────────────────────────────────────● Deploying... │ │ +│ │ prod ○ Not deployed │ │ +│ │ │ │ +│ └──────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ [Promote to Stage ▼] [Compare with Production] [Download Evidence] │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### 12.3.3 "Why Blocked?" 
Modal + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ WHY IS THIS PROMOTION BLOCKED? [Close] │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ Release: myapp-v2.4.0 → Production │ +│ │ +│ ┌──────────────────────────────────────────────────────────────────────┐ │ +│ │ ✗ SECURITY GATE FAILED │ │ +│ │ │ │ +│ │ Component 'api' has 1 critical reachable vulnerability: │ │ +│ │ │ │ +│ │ • CVE-2024-1234 (Critical, CVSS 9.8) │ │ +│ │ Package: log4j 2.14.0 │ │ +│ │ Reachability: ✓ Confirmed reachable via api/logging/Logger.java │ │ +│ │ Fixed in: 2.17.1 │ │ +│ │ [View Details] [View Evidence] │ │ +│ │ │ │ +│ │ Remediation: Update log4j to version 2.17.1 or later │ │ +│ │ │ │ +│ └──────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌──────────────────────────────────────────────────────────────────────┐ │ +│ │ ✓ APPROVAL GATE PASSED │ │ +│ │ │ │ +│ │ Required: 2 approvals │ │ +│ │ Received: 2 approvals │ │ +│ │ • john@example.com (2h ago): "LGTM" │ │ +│ │ • sarah@example.com (1h ago): "Approved for prod" │ │ +│ │ │ │ +│ └──────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌──────────────────────────────────────────────────────────────────────┐ │ +│ │ ✓ FREEZE WINDOW GATE PASSED │ │ +│ │ │ │ +│ │ No active freeze windows for production │ │ +│ │ │ │ +│ └──────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ Policy evaluated at: 2026-01-09T14:32:15Z │ +│ Policy hash: sha256:789xyz... │ +│ [View Full Decision Record] │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +# 13. Observability & Operations + +## 13.1 Metrics Specification + +### 13.1.1 Prometheus Metrics + +```yaml +# Promotion Metrics +- name: stella_promotions_total + type: counter + labels: [tenant_id, environment, status, release_name] + description: Total number of promotions + +- name: stella_promotion_duration_seconds + type: histogram + labels: [tenant_id, environment, status] + buckets: [1, 5, 10, 30, 60, 120, 300, 600] + description: Duration of promotion workflow + +- name: stella_promotion_gate_duration_seconds + type: histogram + labels: [tenant_id, gate_name, result] + buckets: [0.01, 0.05, 0.1, 0.5, 1, 5] + description: Duration of gate evaluation + +# Deployment Metrics +- name: stella_deployments_total + type: counter + labels: [tenant_id, environment, target_type, status] + description: Total number of deployments + +- name: stella_deployment_duration_seconds + type: histogram + labels: [tenant_id, environment, target_type, strategy] + buckets: [10, 30, 60, 120, 300, 600, 900, 1800] + description: Duration of deployment job + +- name: stella_deployment_tasks_total + type: counter + labels: [tenant_id, target_type, status] + description: Total number of deployment tasks + +# Security Metrics +- name: stella_security_gate_results_total + type: counter + labels: [tenant_id, result, block_reason] + description: Security gate evaluation results + +- name: stella_vulnerabilities_blocked_total + type: counter + labels: [tenant_id, severity, reachable] + description: Vulnerabilities that blocked releases + +# Agent Metrics +- name: stella_agents_status + type: gauge + labels: [tenant_id, agent_id, status] + description: Agent status (1=online, 0=offline) + +- name: stella_agent_task_duration_seconds + type: histogram + labels: [tenant_id, agent_id, task_type] + buckets: [1, 5, 10, 30, 60, 120, 300] + description: 
Duration of agent tasks + +- name: stella_agent_tasks_total + type: counter + labels: [tenant_id, agent_id, task_type, status] + description: Total number of agent tasks + +# Workflow Metrics +- name: stella_workflow_runs_total + type: counter + labels: [tenant_id, template_name, status] + description: Total workflow runs + +- name: stella_workflow_step_duration_seconds + type: histogram + labels: [tenant_id, step_type, status] + buckets: [1, 5, 10, 30, 60, 120, 300, 600] + description: Duration of workflow steps + +# API Metrics +- name: stella_api_requests_total + type: counter + labels: [method, path, status_code] + description: Total API requests + +- name: stella_api_request_duration_seconds + type: histogram + labels: [method, path] + buckets: [0.01, 0.05, 0.1, 0.5, 1, 5] + description: API request duration +``` + +### 13.1.2 Health Check Endpoints + +```yaml +# Liveness probe +GET /health/live +Response: { "status": "ok" } +Status: 200 if alive, 503 if not + +# Readiness probe +GET /health/ready +Response: { + "status": "ready", + "checks": { + "database": { "status": "ok", "latency_ms": 5 }, + "scheduler": { "status": "ok" }, + "plugin_runtime": { "status": "ok" } + } +} +Status: 200 if ready, 503 if not + +# Detailed health (authenticated) +GET /api/v1/health +Response: { + "status": "healthy", + "version": "1.0.0", + "uptime_seconds": 86400, + "components": { + "database": { "status": "healthy", "pool_size": 20, "active": 5 }, + "scheduler": { "status": "healthy", "jobs_queued": 12 }, + "agents": { "online": 10, "offline": 1 }, + "plugins": { "active": 5, "failed": 0 }, + "integrations": { "healthy": 8, "unhealthy": 1 } + } +} +``` + +## 13.2 Logging Specification + +### 13.2.1 Structured Log Format + +```json +{ + "timestamp": "2026-01-09T14:32:15.123Z", + "level": "info", + "module": "promotion-manager", + "message": "Promotion approved", + "context": { + "tenant_id": "uuid", + "promotion_id": "uuid", + "release_id": "uuid", + "environment": "prod", + "user_id": "uuid" + }, + "details": { + "approvals_count": 2, + "gates_passed": ["security", "approval", "freeze"], + "decision": "allow" + }, + "trace_id": "abc123", + "span_id": "def456", + "duration_ms": 45 +} +``` + +### 13.2.2 Log Levels and Categories + +| Level | Usage | +|-------|-------| +| `error` | Errors requiring attention; failures that impact functionality | +| `warn` | Potential issues; degraded functionality; approaching limits | +| `info` | Significant events; state changes; audit-relevant actions | +| `debug` | Detailed debugging info; request/response bodies | +| `trace` | Very detailed tracing; internal state; performance profiling | + +| Category | Examples | +|----------|----------| +| `api` | Request received, response sent, validation errors | +| `promotion` | Promotion requested, approved, rejected, completed | +| `deployment` | Deployment started, task assigned, completed, failed | +| `security` | Gate evaluation, vulnerability found, policy violation | +| `agent` | Agent registered, heartbeat, task execution | +| `workflow` | Workflow started, step executed, completed | +| `integration` | Integration tested, resource discovered, webhook received | + +## 13.3 Tracing Specification + +### 13.3.1 Trace Context Propagation + +```typescript +// Trace context structure +interface TraceContext { + traceId: string; // 32-char hex + spanId: string; // 16-char hex + parentSpanId?: string; + sampled: boolean; + baggage: Record; +} + +// Propagation headers +const TRACE_HEADERS = { + W3C_TRACEPARENT: "traceparent", 
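+  // traceparent carries the W3C Trace Context fields joined by dashes:
+  // version ("00"), 32-hex trace-id, 16-hex parent span-id, 2-hex flags ("01" = sampled);
+  // see the example traceparent below.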
+ W3C_TRACESTATE: "tracestate", + BAGGAGE: "baggage", +}; + +// Example traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01 +``` + +### 13.3.2 Key Traces + +| Operation | Span Name | Attributes | +|-----------|-----------|------------| +| Promotion request | `promotion.request` | promotion_id, release_id, environment | +| Gate evaluation | `promotion.evaluate_gates` | gate_names, result | +| Workflow execution | `workflow.execute` | workflow_run_id, template_name | +| Step execution | `workflow.step.{type}` | step_run_id, node_id, inputs | +| Deployment job | `deployment.execute` | job_id, environment, strategy | +| Agent task | `agent.task.{type}` | task_id, agent_id, target_id | +| Plugin call | `plugin.{method}` | plugin_id, method, duration | + +## 13.4 Alerting Rules + +```yaml +# High-priority alerts +alerts: + - name: PromotionGateBlockRate + expr: | + rate(stella_security_gate_results_total{result="blocked"}[1h]) / + rate(stella_security_gate_results_total[1h]) > 0.5 + for: 15m + labels: + severity: warning + annotations: + summary: "High rate of security gate blocks" + description: "More than 50% of promotions are being blocked by security gates" + + - name: DeploymentFailureRate + expr: | + rate(stella_deployments_total{status="failed"}[1h]) / + rate(stella_deployments_total[1h]) > 0.1 + for: 10m + labels: + severity: critical + annotations: + summary: "High deployment failure rate" + description: "More than 10% of deployments are failing" + + - name: AgentOffline + expr: | + stella_agents_status{status="offline"} == 1 + for: 5m + labels: + severity: warning + annotations: + summary: "Agent offline" + description: "Agent {{ $labels.agent_id }} has been offline for 5 minutes" + + - name: PromotionStuck + expr: | + time() - stella_promotion_start_time{status="deploying"} > 1800 + for: 5m + labels: + severity: warning + annotations: + summary: "Promotion stuck in deploying state" + description: "Promotion {{ $labels.promotion_id }} has been deploying for more than 30 minutes" + + - name: IntegrationUnhealthy + expr: | + stella_integration_health{status="unhealthy"} == 1 + for: 10m + labels: + severity: warning + annotations: + summary: "Integration unhealthy" + description: "Integration {{ $labels.integration_name }} has been unhealthy for 10 minutes" +``` + +--- + +# 14. 
Implementation Roadmap + +## 14.1 Phased Delivery Plan + +### Phase 1: Foundation (Weeks 1-4) + +**Goal**: Core infrastructure and basic release management + +| Week | Deliverables | +|------|-------------| +| **Week 1** | Database schema migration; INTHUB integration-manager; connection-profiles | +| **Week 2** | ENVMGR environment-manager; target-registry (basic) | +| **Week 3** | RELMAN component-registry; version-manager; release-manager | +| **Week 4** | Basic release CRUD APIs; CLI commands; integration tests | + +**Exit Criteria**: +- [ ] Can create environments with config +- [ ] Can register components with image repos +- [ ] Can create releases with pinned digests +- [ ] Can list/search releases + +**Certified Path**: Manual release creation; no deployment yet + +--- + +### Phase 2: Workflow Engine (Weeks 5-8) + +**Goal**: Workflow execution capability + +| Week | Deliverables | +|------|-------------| +| **Week 5** | WORKFL step-registry; built-in step types (approval, policy-gate, notify) | +| **Week 6** | WORKFL workflow-designer; workflow template CRUD | +| **Week 7** | WORKFL workflow-engine; DAG execution; state machine | +| **Week 8** | Step executor; retry logic; timeout handling; workflow run APIs | + +**Exit Criteria**: +- [ ] Can create workflow templates via API +- [ ] Can execute workflows with approval steps +- [ ] Workflow state machine handles all transitions +- [ ] Step retries work correctly + +**Certified Path**: Approval-only workflows; no deployment execution yet + +--- + +### Phase 3: Promotion & Decision (Weeks 9-12) + +**Goal**: Promotion workflow with security gates + +| Week | Deliverables | +|------|-------------| +| **Week 9** | PROMOT promotion-manager; approval-gateway | +| **Week 10** | PROMOT decision-engine; security gate integration with SCANENG | +| **Week 11** | Gate registry; freeze window gate; SoD enforcement | +| **Week 12** | Promotion APIs; "Why blocked?" endpoint; decision record | + +**Exit Criteria**: +- [ ] Can request promotion +- [ ] Security gates evaluate scan verdicts +- [ ] Approval workflow enforces SoD +- [ ] Decision record captures gate results + +**Certified Path**: Promotions with security + approval gates; no deployment yet + +--- + +### Phase 4: Deployment Execution (Weeks 13-18) + +**Goal**: Deploy to Docker/Compose targets + +| Week | Deliverables | +|------|-------------| +| **Week 13** | AGENTS agent-core; agent registration; heartbeat | +| **Week 14** | AGENTS agent-docker; Docker host deployment | +| **Week 15** | AGENTS agent-compose; Compose deployment | +| **Week 16** | DEPLOY deploy-orchestrator; artifact-generator | +| **Week 17** | DEPLOY rollback-manager; version sticker writing | +| **Week 18** | RELEVI evidence-collector; evidence-signer; audit-exporter | + +**Exit Criteria**: +- [ ] Agents can register and receive tasks +- [ ] Docker deployment works with digest verification +- [ ] Compose deployment writes lock files +- [ ] Rollback restores previous version +- [ ] Evidence packets generated for deployments + +**Certified Path**: Full promotion → deployment flow for Docker/Compose + +--- + +### Phase 5: UI & Polish (Weeks 19-22) + +**Goal**: Web console for release orchestration + +| Week | Deliverables | +|------|-------------| +| **Week 19** | Dashboard components; metrics widgets | +| **Week 20** | Environment overview; release detail screens | +| **Week 21** | Workflow editor (graph); run visualization | +| **Week 22** | Promotion UI; approval queue; "Why blocked?" 
modal | + +**Exit Criteria**: +- [ ] Dashboard shows operational metrics +- [ ] Can manage environments/releases via UI +- [ ] Can create/edit workflows in graph editor +- [ ] Can approve promotions via UI + +**Certified Path**: Complete v1 user experience + +--- + +### Phase 6: Progressive Delivery (Weeks 23-26) + +**Goal**: A/B releases and canary deployments + +| Week | Deliverables | +|------|-------------| +| **Week 23** | PROGDL ab-manager; target-group A/B | +| **Week 24** | PROGDL canary-controller; stage execution | +| **Week 25** | PROGDL traffic-router; Nginx plugin | +| **Week 26** | Canary UI; traffic visualization; health monitoring | + +**Exit Criteria**: +- [ ] Can create A/B release with variations +- [ ] Canary controller advances stages based on health +- [ ] Traffic router shifts weights +- [ ] Rollback on health failure works + +**Certified Path**: Target-group A/B; Nginx router-based A/B + +--- + +### Phase 7: Extended Targets (Weeks 27-30) + +**Goal**: ECS and Nomad support; SSH/WinRM agentless + +| Week | Deliverables | +|------|-------------| +| **Week 27** | AGENTS agent-ssh; SSH remote executor | +| **Week 28** | AGENTS agent-winrm; WinRM remote executor | +| **Week 29** | AGENTS agent-ecs; ECS deployment | +| **Week 30** | AGENTS agent-nomad; Nomad deployment | + +**Exit Criteria**: +- [ ] SSH deployment works with script execution +- [ ] WinRM deployment works with PowerShell +- [ ] ECS task definition updates work +- [ ] Nomad job submissions work + +**Certified Path**: All target types operational + +--- + +### Phase 8: Plugin Ecosystem (Weeks 31-34) + +**Goal**: Full plugin system; external integrations + +| Week | Deliverables | +|------|-------------| +| **Week 31** | PLUGIN plugin-registry; plugin-loader | +| **Week 32** | PLUGIN plugin-sandbox; plugin-sdk | +| **Week 33** | GitHub plugin; GitLab plugin | +| **Week 34** | Jenkins plugin; Vault plugin | + +**Exit Criteria**: +- [ ] Can install and configure plugins +- [ ] Plugins can contribute step types +- [ ] Plugins can contribute integrations +- [ ] Plugin sandbox enforces limits + +**Certified Path**: GitHub + Harbor + Docker/Compose + Vault + +--- + +## 14.2 Resource Requirements + +### 14.2.1 Team Structure + +| Role | Count | Responsibilities | +|------|-------|------------------| +| **Tech Lead** | 1 | Architecture decisions; code review; unblocking | +| **Backend Engineers** | 4 | Module development; API implementation | +| **Frontend Engineers** | 2 | Web console; dashboard; workflow editor | +| **DevOps Engineer** | 1 | CI/CD; infrastructure; agent deployment | +| **QA Engineer** | 1 | Test automation; integration testing | +| **Technical Writer** | 0.5 | Documentation; API docs; user guides | + +### 14.2.2 Infrastructure Requirements + +| Component | Specification | +|-----------|---------------| +| **PostgreSQL** | Primary database; 16+ recommended; read replicas for scale | +| **Redis** | Job queues; caching; session storage | +| **Object Storage** | S3-compatible; evidence packets; large artifacts | +| **Container Runtime** | Docker; for plugin sandboxes | +| **Kubernetes** | Optional; for Stella core deployment (not required for targets) | + +## 14.3 Risk Mitigation + +| Risk | Likelihood | Impact | Mitigation | +|------|------------|--------|------------| +| **Agent security complexity** | High | High | Early security review; penetration testing; mTLS implementation in Phase 4 | +| **Workflow state machine edge cases** | Medium | High | Comprehensive state transition tests; chaos testing 
+| **Plugin sandbox escapes** | Low | Critical | Security audit; capability restrictions; resource limits |
+| **Database migration issues** | Medium | Medium | Staged rollout; rollback scripts; data validation |
+| **UI performance with large workflows** | Medium | Medium | Virtual rendering; lazy loading; performance testing |
+| **Integration compatibility** | High | Medium | Abstract connector interface; extensive integration tests |
+
+---
+
+# 15. Appendices
+
+## 15.1 Glossary
+
+| Term | Definition |
+|------|------------|
+| **Component** | A deployable unit mapped to an image repository |
+| **Digest** | Immutable SHA256 identifier of a container image |
+| **Environment** | A deployment stage (dev, stage, prod) with configuration |
+| **Evidence Packet** | Cryptographically signed record of a release decision |
+| **Gate** | A policy check that must pass for promotion to proceed |
+| **Promotion** | The process of moving a release to a target environment |
+| **Release** | A bundle of component versions (digests) that are deployed together |
+| **Target** | A deployment destination (host, service, container runtime) |
+| **Version Sticker** | JSON file placed on target recording deployed version |
+| **Workflow** | A DAG of steps defining a deployment process |
+
+## 15.2 Configuration Reference
+
+### 15.2.1 Environment Variables
+
+```bash
+# Core
+STELLA_DATABASE_URL=postgresql://user:pass@host:5432/stella
+STELLA_REDIS_URL=redis://host:6379
+STELLA_SECRET_KEY=base64-encoded-32-bytes
+STELLA_LOG_LEVEL=info
+STELLA_LOG_FORMAT=json
+
+# Authority
+STELLA_OAUTH_ISSUER=https://auth.example.com
+STELLA_OAUTH_CLIENT_ID=stella-app
+STELLA_OAUTH_CLIENT_SECRET=secret
+
+# Agents
+STELLA_AGENT_LISTEN_PORT=8443
+STELLA_AGENT_TLS_CERT=/path/to/cert.pem
+STELLA_AGENT_TLS_KEY=/path/to/key.pem
+STELLA_AGENT_CA_CERT=/path/to/ca.pem
+
+# Plugins
+STELLA_PLUGIN_DIR=/var/stella/plugins
+STELLA_PLUGIN_SANDBOX_MEMORY=512m
+STELLA_PLUGIN_SANDBOX_CPU=1
+
+# Integrations
+STELLA_VAULT_ADDR=https://vault.example.com
+STELLA_VAULT_TOKEN=hvs.xxx
+```
+
+### 15.2.2 OPA Policy Examples
+
+```rego
+# security_gate.rego
+package stella.gates.security
+
+default allow = false
+
+# Allow only when no component carries reachable critical or high findings.
+allow {
+    not any_reachable_critical_or_high
+}
+
+any_reachable_critical_or_high {
+    component := input.release.components[_]
+    component.security.reachable_critical + component.security.reachable_high > 0
+}
+
+deny[msg] {
+    component := input.release.components[_]
+    component.security.reachable_critical > 0
+    msg := sprintf("Component %s has %d reachable critical vulnerabilities",
+        [component.name, component.security.reachable_critical])
+}
+
+# approval_gate.rego
+package stella.gates.approval
+
+default allow = false
+
+allow {
+    count(input.approvals) >= input.environment.required_approvals
+    separation_of_duties_met
+}
+
+separation_of_duties_met {
+    not input.environment.require_sod
+}
+
+separation_of_duties_met {
+    input.environment.require_sod
+    approver_ids := {a.approver_id | a := input.approvals[_]; a.action == "approved"}
+    not input.promotion.requested_by in approver_ids
+}
+
+# freeze_window_gate.rego
+package stella.gates.freeze
+
+default allow = true
+
+allow = false {
+    window := input.environment.freeze_windows[_]
+    time.now_ns() >= time.parse_rfc3339_ns(window.start)
+    time.now_ns() <= time.parse_rfc3339_ns(window.end)
+    not input.promotion.requested_by in window.exceptions
+}
+```
+
+## 15.3 API Error Codes
+
+| Code | HTTP Status | Description |
+|------|-------------|-------------|
+| `RELEASE_NOT_FOUND` | 404 | Release with specified ID does not exist |
+| 
`ENVIRONMENT_NOT_FOUND` | 404 | Environment with specified ID does not exist | +| `PROMOTION_BLOCKED` | 403 | Promotion blocked by policy gates | +| `APPROVAL_REQUIRED` | 403 | Additional approvals required | +| `FREEZE_WINDOW_ACTIVE` | 403 | Environment is in freeze window | +| `DIGEST_MISMATCH` | 400 | Image digest does not match expected | +| `AGENT_OFFLINE` | 503 | Required agent is offline | +| `WORKFLOW_FAILED` | 500 | Workflow execution failed | +| `PLUGIN_ERROR` | 500 | Plugin returned an error | +| `QUOTA_EXCEEDED` | 429 | Digest analysis quota exceeded | +| `VALIDATION_ERROR` | 400 | Request validation failed | +| `UNAUTHORIZED` | 401 | Authentication required | +| `FORBIDDEN` | 403 | Insufficient permissions | + +## 15.4 Evidence Packet Schema + +```json +{ + "$schema": "https://stella-ops.io/schemas/evidence-packet-v1.json", + "type": "object", + "required": ["packet_type", "version", "timestamp", "tenant_id", "content", "signatures"], + "properties": { + "packet_type": { + "type": "string", + "enum": ["release_decision", "deployment", "rollback", "ab_promotion"] + }, + "version": { + "type": "string", + "const": "1.0" + }, + "timestamp": { + "type": "string", + "format": "date-time" + }, + "tenant_id": { + "type": "string", + "format": "uuid" + }, + "content": { + "type": "object", + "properties": { + "promotion_id": { "type": "string", "format": "uuid" }, + "release": { + "type": "object", + "properties": { + "id": { "type": "string", "format": "uuid" }, + "name": { "type": "string" }, + "components": { + "type": "array", + "items": { + "type": "object", + "properties": { + "name": { "type": "string" }, + "digest": { "type": "string", "pattern": "^sha256:[a-f0-9]{64}$" }, + "semver": { "type": "string" } + } + } + } + } + }, + "source_environment": { "type": "string" }, + "target_environment": { "type": "string" }, + "decision": { + "type": "object", + "properties": { + "outcome": { "type": "string", "enum": ["allow", "deny"] }, + "policy_hash": { "type": "string" }, + "inputs_hash": { "type": "string" }, + "evaluated_at": { "type": "string", "format": "date-time" }, + "gates": { + "type": "array", + "items": { + "type": "object", + "properties": { + "name": { "type": "string" }, + "passed": { "type": "boolean" }, + "reason": { "type": "string" }, + "evidence_refs": { "type": "array", "items": { "type": "string" } } + } + } + } + } + }, + "approvals": { + "type": "array", + "items": { + "type": "object", + "properties": { + "approver_id": { "type": "string", "format": "uuid" }, + "approver_email": { "type": "string", "format": "email" }, + "approved_at": { "type": "string", "format": "date-time" }, + "comment": { "type": "string" } + } + } + }, + "deployment": { + "type": "object", + "properties": { + "job_id": { "type": "string", "format": "uuid" }, + "started_at": { "type": "string", "format": "date-time" }, + "completed_at": { "type": "string", "format": "date-time" }, + "targets": { + "type": "array", + "items": { + "type": "object", + "properties": { + "target_id": { "type": "string", "format": "uuid" }, + "target_name": { "type": "string" }, + "status": { "type": "string" }, + "sticker_hash": { "type": "string" } + } + } + } + } + } + } + }, + "signatures": { + "type": "array", + "items": { + "type": "object", + "properties": { + "signer": { "type": "string" }, + "algorithm": { "type": "string" }, + "key_ref": { "type": "string" }, + "signature": { "type": "string" } + } + } + } + } +} +``` + +--- + +# Document Revision History + +| Version | Date | Author | Changes | 
+|---------|------|--------|---------| +| 1.0 | 2026-01-08 | Architecture Team | Initial architecture document | +| 2.0 | 2026-01-09 | Architecture Team | Unified architecture incorporating alternative proposal; added plugin system, version manager, A/B delivery, comprehensive APIs | + +--- + +**END OF DOCUMENT** \ No newline at end of file