Merge branch 'main' of https://git.stella-ops.org/stella-ops.org/git.stella-ops.org
docs/modules/release-orchestrator/README.md (new file, +137 lines)
@@ -0,0 +1,137 @@
# Release Orchestrator

> Central release control plane for non-Kubernetes container estates.

**Status:** Planned (not yet implemented)
**Source:** [Full Architecture Specification](../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)

## Purpose

The Release Orchestrator extends Stella Ops from a vulnerability scanning platform into **Stella Ops Suite** — a unified release control plane for non-Kubernetes container environments. It integrates:

- **Existing capabilities**: SBOM generation, reachability-aware vulnerability analysis, VEX support, policy engine, evidence locker, deterministic replay
- **New capabilities**: environment management, release orchestration, promotion workflows, deployment execution, progressive delivery, audit-grade release governance

## Scope

| In Scope | Out of Scope |
|----------|--------------|
| Non-K8s container deployments (Docker, Compose, ECS, Nomad) | Kubernetes deployments (use ArgoCD, Flux) |
| Release identity via OCI digests | Tag-based release identity |
| Plugin-extensible integrations | Hard-coded vendor integrations |
| SSH/WinRM + agent-based deployment | Cloud-native serverless deployments |
| L4/L7 traffic management via router plugins | Built-in service mesh |

## Documentation Structure

### Design & Principles
- [Design Principles](design/principles.md) — Core principles and invariants
- [Key Decisions](design/decisions.md) — Architectural decision record

### Implementation
- [Implementation Guide](implementation-guide.md) — .NET 10 patterns and best practices
- [Test Structure](test-structure.md) — Test organization and guidelines

### Module Architecture
- [Module Overview](modules/overview.md) — All modules and themes
- [Integration Hub (INTHUB)](modules/integration-hub.md) — External integrations
- [Environment Manager (ENVMGR)](modules/environment-manager.md) — Environments and targets
- [Release Manager (RELMAN)](modules/release-manager.md) — Release bundles and versions
- [Workflow Engine (WORKFL)](modules/workflow-engine.md) — DAG execution
- [Promotion Manager (PROMOT)](modules/promotion-manager.md) — Approvals and gates
- [Deploy Orchestrator (DEPLOY)](modules/deploy-orchestrator.md) — Deployment execution
- [Agents (AGENTS)](modules/agents.md) — Deployment agents
- [Progressive Delivery (PROGDL)](modules/progressive-delivery.md) — A/B and canary
- [Release Evidence (RELEVI)](modules/evidence.md) — Evidence packets
- [Plugin System (PLUGIN)](modules/plugin-system.md) — Plugin infrastructure

### Data Model
- [Database Schema](data-model/schema.md) — PostgreSQL schema specification
- [Entity Definitions](data-model/entities.md) — Entity descriptions

### API Specification
- [API Overview](api/overview.md) — API design principles
- [Environment APIs](api/environments.md) — Environment endpoints
- [Release APIs](api/releases.md) — Release endpoints
- [Promotion APIs](api/promotions.md) — Promotion endpoints
- [Workflow APIs](api/workflows.md) — Workflow endpoints
- [Agent APIs](api/agents.md) — Agent endpoints
- [WebSocket APIs](api/websockets.md) — Real-time endpoints

### Workflow Engine
- [Template Structure](workflow/templates.md) — Workflow template specification
- [Execution State Machine](workflow/execution.md) — Workflow state machine
- [Promotion State Machine](workflow/promotion.md) — Promotion state machine

### Security
- [Security Overview](security/overview.md) — Security principles
- [Authentication & Authorization](security/auth.md) — AuthN/AuthZ
- [Agent Security](security/agent-security.md) — Agent security model
- [Threat Model](security/threat-model.md) — Threats and mitigations
- [Audit Trail](security/audit-trail.md) — Audit logging

### Integrations
- [Integration Overview](integrations/overview.md) — Integration types
- [Connector Interface](integrations/connectors.md) — Connector specification
- [Webhook Architecture](integrations/webhooks.md) — Webhook handling
- [CI/CD Patterns](integrations/ci-cd.md) — CI/CD integration patterns

### Deployment
- [Deployment Overview](deployment/overview.md) — Architecture overview
- [Deployment Strategies](deployment/strategies.md) — Deployment strategies
- [Agent-Based Deployment](deployment/agent-based.md) — Agent deployment
- [Agentless Deployment](deployment/agentless.md) — SSH/WinRM deployment
- [Artifact Generation](deployment/artifacts.md) — Generated artifacts

### Progressive Delivery
- [Progressive Overview](progressive-delivery/overview.md) — Progressive delivery architecture
- [A/B Releases](progressive-delivery/ab-releases.md) — A/B release models
- [Canary Controller](progressive-delivery/canary.md) — Canary implementation
- [Router Plugins](progressive-delivery/routers.md) — Traffic routing plugins

### UI/UX
- [Dashboard Specification](ui/dashboard.md) — Dashboard screens
- [Workflow Editor](ui/workflow-editor.md) — Workflow editor
- [Screen Reference](ui/screens.md) — Key UI screens

### Operations
- [Metrics](operations/metrics.md) — Metrics specification
- [Logging](operations/logging.md) — Logging patterns
- [Tracing](operations/tracing.md) — Distributed tracing
- [Alerting](operations/alerting.md) — Alert rules

### Roadmap
- [Roadmap](roadmap.md) — Implementation phases
- [Resource Requirements](roadmap.md#resource-requirements) — Sizing

### Appendices
- [Glossary](appendices/glossary.md) — Term definitions
- [Configuration Reference](appendices/config.md) — Configuration options
- [Error Codes](appendices/errors.md) — API error codes
- [Evidence Schema](appendices/evidence-schema.md) — Evidence packet format

## Quick Reference

### Key Principles

1. **Digest-first release identity** — Releases are immutable OCI digests, not tags
2. **Evidence for every decision** — Every promotion/deployment produces sealed evidence
3. **Pluggable everything, stable core** — Integrations are plugins; core is stable
4. **No feature gating** — All plans include all features
5. **Offline-first operation** — Core works in air-gapped environments
6. **Immutable generated artifacts** — Every deployment generates stored artifacts
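The digest-first principle is mechanical enough to check in code: a release reference is acceptable only if it pins an OCI digest, never a mutable tag. A minimal sketch of such a check (the helper is hypothetical, not part of the planned API):

```python
import re

# An image reference is digest-pinned when it ends in "@sha256:<64 hex chars>".
# Tag-only references (e.g. ":v2.3.1") are rejected under the digest-first rule.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")

def is_digest_pinned(image_ref: str) -> bool:
    """True only for references that carry an explicit sha256 digest."""
    return bool(DIGEST_RE.search(image_ref))
```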
### Platform Themes

| Theme | Purpose |
|-------|---------|
| **INTHUB** | Integration hub — external system connections |
| **ENVMGR** | Environment management — environments, targets, agents |
| **RELMAN** | Release management — components, versions, releases |
| **WORKFL** | Workflow engine — DAG execution, steps |
| **PROMOT** | Promotion — approvals, gates, decisions |
| **DEPLOY** | Deployment — execution, artifacts, rollback |
| **AGENTS** | Agents — Docker, Compose, ECS, Nomad |
| **PROGDL** | Progressive delivery — A/B, canary |
| **RELEVI** | Evidence — packets, stickers, audit |
| **PLUGIN** | Plugins — registry, loader, SDK |
docs/modules/release-orchestrator/api/agents.md (new file, +274 lines)
@@ -0,0 +1,274 @@
# Agent APIs

> API endpoints for agent registration, lifecycle management, and task coordination.

**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 6.3.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Agents Module](../modules/agents.md), [Agent Security](../security/agent-security.md)

## Overview

The Agent API provides endpoints for registering deployment agents, managing their lifecycle, and coordinating task execution. Agents use mTLS for secure communication after initial registration.

---

## Registration Endpoints

### Register Agent

**Endpoint:** `POST /api/v1/agents/register`

Registers a new agent with the orchestrator. Requires a one-time registration token.

**Headers:**
```
X-Agent-Token: {registration-token}
```

**Request:**
```json
{
  "name": "agent-prod-01",
  "version": "1.0.0",
  "capabilities": ["docker", "compose"],
  "labels": {
    "datacenter": "us-east-1",
    "role": "deployment"
  }
}
```

**Response:** `201 Created`
```json
{
  "agentId": "uuid",
  "token": "jwt-token-for-subsequent-requests",
  "config": {
    "heartbeatInterval": 30,
    "taskPollInterval": 5,
    "logLevel": "info"
  },
  "certificate": {
    "cert": "-----BEGIN CERTIFICATE-----...",
    "key": "-----BEGIN PRIVATE KEY-----...",
    "ca": "-----BEGIN CERTIFICATE-----...",
    "expiresAt": "2026-01-11T14:23:45Z"
  }
}
```

**Notes:**
- The registration token is single-use and expires after 24 hours
- After registration, the agent must use mTLS for all subsequent requests
- The certificate is short-lived (24h) and must be renewed via heartbeat
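The single-use, 24-hour token rule above reduces to a small validity check. A sketch of the orchestrator-side logic, assuming the server records an issuance time and a used flag (names are illustrative, not part of the spec):

```python
from datetime import datetime, timedelta, timezone

TOKEN_TTL = timedelta(hours=24)  # registration tokens expire after 24 hours

def token_is_valid(issued_at, already_used, now=None):
    """A registration token is accepted only if it is unused and
    less than 24 hours old (single-use, per the notes above)."""
    now = now or datetime.now(timezone.utc)
    return (not already_used) and (now - issued_at) < TOKEN_TTL
```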
---

## Lifecycle Endpoints

### List Agents

**Endpoint:** `GET /api/v1/agents`

**Query Parameters:**
- `status` (string): Filter by status (`online`, `offline`, `degraded`)
- `capability` (string): Filter by capability (`docker`, `compose`, `ssh`, `winrm`, `ecs`, `nomad`)

**Response:** `200 OK`
```json
[
  {
    "id": "uuid",
    "name": "agent-prod-01",
    "version": "1.0.0",
    "status": "online",
    "capabilities": ["docker", "compose"],
    "lastHeartbeat": "2026-01-10T14:23:45Z",
    "resourceUsage": {
      "cpu": 15.5,
      "memory": 45.2
    }
  }
]
```

### Get Agent

**Endpoint:** `GET /api/v1/agents/{id}`

**Response:** `200 OK` - Full agent details including assigned targets

### Update Agent

**Endpoint:** `PUT /api/v1/agents/{id}`

**Request:**
```json
{
  "labels": {
    "datacenter": "us-west-2"
  },
  "capabilities": ["docker", "compose", "ssh"]
}
```

**Response:** `200 OK` - Updated agent

### Delete Agent

**Endpoint:** `DELETE /api/v1/agents/{id}`

Revokes the agent's credentials and removes its registration.

**Response:** `200 OK`
```json
{ "deleted": true }
```

---

## Heartbeat Endpoints

### Send Heartbeat

**Endpoint:** `POST /api/v1/agents/{id}/heartbeat`

Agents must send heartbeats at the configured interval to maintain online status and receive pending tasks.

**Request:**
```json
{
  "status": "healthy",
  "resourceUsage": {
    "cpu": 15.5,
    "memory": 45.2,
    "disk": 60.0
  },
  "capabilities": ["docker", "compose"],
  "runningTasks": 2
}
```

**Response:** `200 OK`
```json
{
  "tasks": [
    {
      "taskId": "uuid",
      "taskType": "docker.pull",
      "payload": {
        "image": "myapp",
        "tag": "v2.3.1",
        "digest": "sha256:abc123..."
      },
      "credentials": {
        "registry.username": "user",
        "registry.password": "token"
      },
      "timeout": 300
    }
  ],
  "certificateRenewal": {
    "cert": "-----BEGIN CERTIFICATE-----...",
    "expiresAt": "2026-01-11T14:23:45Z"
  }
}
```

**Notes:**
- Certificate renewal material is included when the current certificate is within 1 hour of expiration
- The `tasks` array contains pending work for the agent
- An agent that misses heartbeats for 3 consecutive intervals is marked `offline`
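Both heartbeat notes above are simple clock comparisons. A sketch of the orchestrator-side checks (helper names are illustrative, not part of the spec):

```python
from datetime import timedelta

def agent_status(last_heartbeat, heartbeat_interval_s, now):
    """Mark an agent offline after 3 consecutive missed heartbeat intervals."""
    missed = (now - last_heartbeat).total_seconds() / heartbeat_interval_s
    return "offline" if missed >= 3 else "online"

def needs_cert_renewal(cert_expires_at, now):
    """Include renewal material when the certificate expires within 1 hour."""
    return cert_expires_at - now <= timedelta(hours=1)
```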
---

## Task Endpoints

### Complete Task

**Endpoint:** `POST /api/v1/agents/{id}/tasks/{taskId}/complete`

Reports task completion status back to the orchestrator.

**Request:**
```json
{
  "success": true,
  "result": {
    "imageId": "sha256:abc123...",
    "containerId": "container-uuid"
  },
  "logs": [
    { "timestamp": "2026-01-10T14:23:45Z", "level": "info", "message": "Pulling image..." },
    { "timestamp": "2026-01-10T14:23:50Z", "level": "info", "message": "Image pulled successfully" }
  ]
}
```

**Response:** `200 OK`
```json
{ "acknowledged": true }
```

### Get Pending Tasks

**Endpoint:** `GET /api/v1/agents/{id}/tasks`

Alternative to the heartbeat for polling pending tasks.

**Response:** `200 OK`
```json
{
  "tasks": [
    {
      "taskId": "uuid",
      "taskType": "docker.run",
      "priority": 10,
      "createdAt": "2026-01-10T14:20:00Z"
    }
  ]
}
```

---

## WebSocket Endpoints

### Task Stream

**Endpoint:** `WS /api/v1/agents/{id}/task-stream`

Real-time task assignment stream for agents.

**Messages (Server to Agent):**
```json
{ "type": "task_assigned", "task": { "taskId": "uuid", "taskType": "docker.pull", ... } }
{ "type": "task_cancelled", "taskId": "uuid" }
```

**Messages (Agent to Server):**
```json
{ "type": "task_progress", "taskId": "uuid", "progress": 50, "message": "Pulling layer 3/5" }
{ "type": "task_log", "taskId": "uuid", "level": "info", "message": "..." }
```
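An agent consuming this stream only needs to branch on the `type` field. A minimal dispatcher sketch (the handler names are hypothetical):

```python
import json

def dispatch(raw_message, handlers):
    """Route a server-to-agent message to a handler keyed by its 'type' field."""
    msg = json.loads(raw_message)
    handler = handlers.get(msg["type"])
    if handler is None:
        raise ValueError(f"unknown message type: {msg['type']}")
    return handler(msg)

# Example handler table; real agents would start/cancel task execution here.
handlers = {
    "task_assigned": lambda m: f"start {m['task']['taskId']}",
    "task_cancelled": lambda m: f"cancel {m['taskId']}",
}
```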
---

## Error Responses

| Status Code | Description |
|-------------|-------------|
| `401` | Invalid or expired registration token |
| `403` | Agent not authorized for this operation |
| `404` | Agent not found |
| `409` | Agent name already registered |
| `503` | Agent offline or unreachable |

---

## See Also

- [Environments API](environments.md)
- [Agents Module](../modules/agents.md)
- [Agent Security](../security/agent-security.md)
- [WebSocket APIs](websockets.md)
docs/modules/release-orchestrator/api/environments.md (new file, +289 lines)
@@ -0,0 +1,289 @@
# Environment Management APIs

> API endpoints for managing environments, targets, agents, freeze windows, and inventory.

**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 6.3.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Environment Manager](../modules/environment-manager.md), [Agents](../modules/agents.md)

## Overview

The Environment Management API provides CRUD operations for environments, target groups, deployment targets, agents, freeze windows, and inventory synchronization. All endpoints require authentication and respect tenant isolation via Row-Level Security.

---

## Environment Endpoints

### Create Environment

**Endpoint:** `POST /api/v1/environments`

**Request:**
```json
{
  "name": "production",
  "displayName": "Production",
  "orderIndex": 3,
  "config": {
    "deploymentTimeout": 600,
    "healthCheckInterval": 30
  },
  "requiredApprovals": 2,
  "requireSod": true,
  "promotionPolicy": "default"
}
```

**Response:** `201 Created`
```json
{
  "id": "uuid",
  "name": "production",
  "displayName": "Production",
  "orderIndex": 3,
  "isProduction": true,
  "requiredApprovals": 2,
  "requireSeparationOfDuties": true,
  "createdAt": "2026-01-10T14:23:45Z"
}
```

### List Environments

**Endpoint:** `GET /api/v1/environments`

**Query Parameters:**
- `includeState` (boolean): Include current release state

**Response:** `200 OK`
```json
[
  {
    "id": "uuid",
    "name": "development",
    "displayName": "Development",
    "orderIndex": 1,
    "currentRelease": {
      "id": "release-uuid",
      "name": "myapp-v2.3.1",
      "deployedAt": "2026-01-09T10:00:00Z"
    }
  }
]
```

### Get Environment

**Endpoint:** `GET /api/v1/environments/{id}`

**Response:** `200 OK` - Full environment details

### Update Environment

**Endpoint:** `PUT /api/v1/environments/{id}`

**Request:** Partial environment object

**Response:** `200 OK` - Updated environment

### Delete Environment

**Endpoint:** `DELETE /api/v1/environments/{id}`

**Response:** `200 OK`
```json
{ "deleted": true }
```

---

## Freeze Window Endpoints

### Create Freeze Window

**Endpoint:** `POST /api/v1/environments/{envId}/freeze-windows`

**Request:**
```json
{
  "start": "2026-01-15T00:00:00Z",
  "end": "2026-01-20T00:00:00Z",
  "reason": "Holiday freeze",
  "exceptions": ["user-uuid-1", "user-uuid-2"]
}
```

**Response:** `201 Created`
```json
{
  "id": "uuid",
  "environmentId": "env-uuid",
  "start": "2026-01-15T00:00:00Z",
  "end": "2026-01-20T00:00:00Z",
  "reason": "Holiday freeze",
  "createdBy": "user-uuid"
}
```

### List Freeze Windows

**Endpoint:** `GET /api/v1/environments/{envId}/freeze-windows`

**Query Parameters:**
- `active` (boolean): Filter to active freeze windows only

**Response:** `200 OK` - Array of freeze windows

### Delete Freeze Window

**Endpoint:** `DELETE /api/v1/environments/{envId}/freeze-windows/{windowId}`

**Response:** `200 OK`
```json
{ "deleted": true }
```
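Whether a deployment is blocked by a freeze window is a pure interval check, with the `exceptions` list letting named users through. A sketch under those assumptions (the semantics of `exceptions` as exempt user IDs is inferred from the payload, and the helper name is hypothetical):

```python
from datetime import datetime

def is_frozen(window, user_id, now):
    """True when 'now' falls inside the window and the user is not excepted."""
    start = datetime.fromisoformat(window["start"].replace("Z", "+00:00"))
    end = datetime.fromisoformat(window["end"].replace("Z", "+00:00"))
    if user_id in window.get("exceptions", []):
        return False
    return start <= now < end
```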
---

## Target Group Endpoints

### Create Target Group

**Endpoint:** `POST /api/v1/environments/{envId}/target-groups`

### List Target Groups

**Endpoint:** `GET /api/v1/environments/{envId}/target-groups`

### Get Target Group

**Endpoint:** `GET /api/v1/target-groups/{id}`

### Update Target Group

**Endpoint:** `PUT /api/v1/target-groups/{id}`

### Delete Target Group

**Endpoint:** `DELETE /api/v1/target-groups/{id}`

---

## Target Endpoints

### Create Target

**Endpoint:** `POST /api/v1/targets`

**Request:**
```json
{
  "environmentId": "env-uuid",
  "targetGroupId": "group-uuid",
  "name": "prod-web-01",
  "targetType": "docker_host",
  "connection": {
    "host": "192.168.1.100",
    "port": 2376,
    "tlsEnabled": true
  },
  "labels": {
    "role": "web",
    "datacenter": "us-east-1"
  },
  "deploymentDirectory": "/opt/deployments"
}
```

**Response:** `201 Created`
```json
{
  "id": "uuid",
  "name": "prod-web-01",
  "targetType": "docker_host",
  "healthStatus": "unknown",
  "createdAt": "2026-01-10T14:23:45Z"
}
```

### List Targets

**Endpoint:** `GET /api/v1/targets`

**Query Parameters:**
- `environmentId` (UUID): Filter by environment
- `targetType` (string): Filter by type (`docker_host`, `compose_host`, `ecs_service`, `nomad_job`)
- `labels` (JSON): Filter by labels
- `healthStatus` (string): Filter by health status

**Response:** `200 OK` - Array of targets

### Get Target

**Endpoint:** `GET /api/v1/targets/{id}`

### Update Target

**Endpoint:** `PUT /api/v1/targets/{id}`

### Delete Target

**Endpoint:** `DELETE /api/v1/targets/{id}`

### Trigger Health Check

**Endpoint:** `POST /api/v1/targets/{id}/health-check`

**Response:** `200 OK`
```json
{
  "status": "healthy",
  "message": "Docker daemon responding",
  "checkedAt": "2026-01-10T14:23:45Z"
}
```

### Get Version Sticker

**Endpoint:** `GET /api/v1/targets/{id}/sticker`

**Response:** `200 OK`
```json
{
  "releaseId": "uuid",
  "releaseName": "myapp-v2.3.1",
  "components": [
    {
      "componentId": "uuid",
      "componentName": "api",
      "digest": "sha256:abc123..."
    }
  ],
  "deployedAt": "2026-01-09T10:00:00Z",
  "deployedBy": "user-uuid"
}
```

### Check Drift

**Endpoint:** `GET /api/v1/targets/{id}/drift`

**Response:** `200 OK`
```json
{
  "hasDrift": true,
  "expected": { "releaseId": "uuid", "digest": "sha256:abc..." },
  "actual": { "digest": "sha256:def..." },
  "differences": [
    { "component": "api", "expected": "sha256:abc...", "actual": "sha256:def..." }
  ]
}
```
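The drift payload above is derivable from the version sticker and the live inventory by a per-component digest comparison. A sketch of that derivation (the helper is hypothetical; it mirrors the response shape):

```python
def compute_drift(expected, actual):
    """Compare expected component digests (from the version sticker)
    against observed digests; report per-component differences."""
    differences = [
        {"component": name, "expected": digest, "actual": actual.get(name)}
        for name, digest in expected.items()
        if actual.get(name) != digest
    ]
    return {"hasDrift": bool(differences), "differences": differences}
```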
---

## See Also

- [Agents API](agents.md)
- [Environment Manager Module](../modules/environment-manager.md)
- [Agent Security](../security/agent-security.md)
docs/modules/release-orchestrator/api/overview.md (new file, +299 lines)
@@ -0,0 +1,299 @@
# API Overview

**Version**: v1
**Base Path**: `/api/v1`

## Design Principles

| Principle | Implementation |
|-----------|----------------|
| **RESTful** | Resource-oriented URLs, standard HTTP methods |
| **Versioned** | `/api/v1/...` prefix; breaking changes require a version bump |
| **Consistent** | Standard response envelope, error format, pagination |
| **Authenticated** | OAuth 2.0 Bearer tokens via the Authority module |
| **Tenant-scoped** | Tenant ID from token; all operations scoped to tenant |
| **Audited** | All mutating operations logged with user/timestamp |

## Authentication

All API requests require a valid JWT Bearer token:

```http
Authorization: Bearer <token>
```

Tokens are issued by the Authority module and contain:
- `user_id`: User identifier
- `tenant_id`: Tenant scope
- `roles`: User roles
- `permissions`: Specific permissions

## Standard Response Envelope

### Success Response

```typescript
interface ApiResponse<T> {
  success: true;
  data: T;
  meta?: {
    pagination?: PaginationMeta;
    requestId: string;
    timestamp: string;
  };
}
```

### Error Response

```typescript
interface ApiErrorResponse {
  success: false;
  error: {
    code: string;        // e.g., "PROMOTION_BLOCKED"
    message: string;     // Human-readable message
    details?: object;    // Additional context
    validationErrors?: ValidationError[];
  };
  meta: {
    requestId: string;
    timestamp: string;
  };
}

interface ValidationError {
  field: string;
  message: string;
  code: string;
}
```

### Pagination

```typescript
interface PaginationMeta {
  page: number;
  pageSize: number;
  totalItems: number;
  totalPages: number;
  hasNext: boolean;
  hasPrevious: boolean;
}
```
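Every field of `PaginationMeta` is derived from `page`, `pageSize`, and the total row count. A sketch of that derivation:

```python
import math

def pagination_meta(page, page_size, total_items):
    """Derive the PaginationMeta envelope fields from the request and row count."""
    total_pages = math.ceil(total_items / page_size) if page_size else 0
    return {
        "page": page,
        "pageSize": page_size,
        "totalItems": total_items,
        "totalPages": total_pages,
        "hasNext": page < total_pages,
        "hasPrevious": page > 1,  # pages are 1-indexed
    }
```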
## HTTP Status Codes

| Code | Description |
|------|-------------|
| `200` | Success |
| `201` | Created |
| `204` | No Content |
| `400` | Bad Request - validation error |
| `401` | Unauthorized - invalid/missing token |
| `403` | Forbidden - insufficient permissions |
| `404` | Not Found |
| `409` | Conflict - resource state conflict |
| `422` | Unprocessable Entity - business rule violation |
| `429` | Too Many Requests - rate limited |
| `500` | Internal Server Error |

## Common Query Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `page` | integer | Page number (1-indexed) |
| `pageSize` | integer | Items per page (max 100) |
| `sort` | string | Sort field (prefix `-` for descending) |
| `filter` | string | JSON filter expression |
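The `sort` convention (a leading `-` means descending) parses in a couple of lines; a sketch of how a handler might interpret it (function name is illustrative):

```python
def parse_sort(sort):
    """Split a sort expression into (field, descending) per the '-' prefix rule."""
    if sort.startswith("-"):
        return sort[1:], True
    return sort, False
```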
## API Modules
|
||||
|
||||
### Integration Hub (INTHUB)
|
||||
|
||||
```
|
||||
GET /api/v1/integration-types
|
||||
GET /api/v1/integration-types/{typeId}
|
||||
POST /api/v1/integrations
|
||||
GET /api/v1/integrations
|
||||
GET /api/v1/integrations/{id}
|
||||
PUT /api/v1/integrations/{id}
|
||||
DELETE /api/v1/integrations/{id}
|
||||
POST /api/v1/integrations/{id}/test
|
||||
POST /api/v1/integrations/{id}/discover
|
||||
GET /api/v1/integrations/{id}/health
|
||||
```
|
||||
|
||||
### Environment & Inventory (ENVMGR)
|
||||
|
||||
```
|
||||
POST /api/v1/environments
|
||||
GET /api/v1/environments
|
||||
GET /api/v1/environments/{id}
|
||||
PUT /api/v1/environments/{id}
|
||||
DELETE /api/v1/environments/{id}
|
||||
POST /api/v1/environments/{envId}/freeze-windows
|
||||
GET /api/v1/environments/{envId}/freeze-windows
|
||||
DELETE /api/v1/environments/{envId}/freeze-windows/{windowId}
|
||||
POST /api/v1/targets
|
||||
GET /api/v1/targets
|
||||
GET /api/v1/targets/{id}
|
||||
PUT /api/v1/targets/{id}
|
||||
DELETE /api/v1/targets/{id}
|
||||
POST /api/v1/targets/{id}/health-check
|
||||
GET /api/v1/targets/{id}/sticker
|
||||
GET /api/v1/targets/{id}/drift
|
||||
POST /api/v1/agents/register
|
||||
GET /api/v1/agents
|
||||
GET /api/v1/agents/{id}
|
||||
PUT /api/v1/agents/{id}
|
||||
DELETE /api/v1/agents/{id}
|
||||
POST /api/v1/agents/{id}/heartbeat
|
||||
```
|
||||
|
||||
### Release Management (RELMAN)
|
||||
|
||||
```
|
||||
POST /api/v1/components
|
||||
GET /api/v1/components
|
||||
GET /api/v1/components/{id}
|
||||
PUT /api/v1/components/{id}
|
||||
DELETE /api/v1/components/{id}
|
||||
POST /api/v1/components/{id}/sync-versions
|
||||
GET /api/v1/components/{id}/versions
|
||||
POST /api/v1/releases
|
||||
GET /api/v1/releases
|
||||
GET /api/v1/releases/{id}
|
||||
PUT /api/v1/releases/{id}
|
||||
DELETE /api/v1/releases/{id}
|
||||
GET /api/v1/releases/{id}/state
|
||||
POST /api/v1/releases/{id}/deprecate
|
||||
GET /api/v1/releases/{id}/compare/{otherId}
|
||||
POST /api/v1/releases/from-latest
|
||||
```
|
||||
|
||||
### Workflow Engine (WORKFL)
|
||||
|
||||
```
|
||||
POST /api/v1/workflow-templates
|
||||
GET /api/v1/workflow-templates
|
||||
GET /api/v1/workflow-templates/{id}
|
||||
PUT /api/v1/workflow-templates/{id}
|
||||
DELETE /api/v1/workflow-templates/{id}
|
||||
POST /api/v1/workflow-templates/{id}/validate
|
||||
GET /api/v1/step-types
|
||||
GET /api/v1/step-types/{type}
|
||||
POST /api/v1/workflow-runs
|
||||
GET /api/v1/workflow-runs
|
||||
GET /api/v1/workflow-runs/{id}
|
||||
POST /api/v1/workflow-runs/{id}/pause
|
||||
POST /api/v1/workflow-runs/{id}/resume
|
||||
POST /api/v1/workflow-runs/{id}/cancel
|
||||
GET /api/v1/workflow-runs/{id}/steps
|
||||
GET /api/v1/workflow-runs/{id}/steps/{nodeId}
|
||||
GET /api/v1/workflow-runs/{id}/steps/{nodeId}/logs
|
||||
GET /api/v1/workflow-runs/{id}/steps/{nodeId}/artifacts
|
||||
```
|
||||
|
||||
### Promotion & Approval (PROMOT)
|
||||
|
||||
```
|
||||
POST /api/v1/promotions
|
||||
GET /api/v1/promotions
|
||||
GET /api/v1/promotions/{id}
|
||||
POST /api/v1/promotions/{id}/approve
|
||||
POST /api/v1/promotions/{id}/reject
|
||||
POST /api/v1/promotions/{id}/cancel
|
||||
GET /api/v1/promotions/{id}/decision
|
||||
GET /api/v1/promotions/{id}/approvals
|
||||
GET /api/v1/promotions/{id}/evidence
|
||||
POST /api/v1/promotions/preview-gates
|
||||
POST /api/v1/approval-policies
|
||||
GET /api/v1/approval-policies
|
||||
GET /api/v1/my/pending-approvals
|
||||
```
|
||||
|
||||
### Deployment (DEPLOY)
|
||||
|
||||
```
|
||||
GET /api/v1/deployment-jobs
|
||||
GET /api/v1/deployment-jobs/{id}
|
||||
GET /api/v1/deployment-jobs/{id}/tasks
|
||||
GET /api/v1/deployment-jobs/{id}/tasks/{taskId}
|
||||
GET /api/v1/deployment-jobs/{id}/tasks/{taskId}/logs
|
||||
GET /api/v1/deployment-jobs/{id}/artifacts
|
||||
GET /api/v1/deployment-jobs/{id}/artifacts/{artifactId}
|
||||
POST /api/v1/rollbacks
|
||||
GET /api/v1/rollbacks
|
||||
```
|
||||
|
||||
### Progressive Delivery (PROGDL)
|
||||
|
||||
```
|
||||
POST /api/v1/ab-releases
|
||||
GET /api/v1/ab-releases
|
||||
GET /api/v1/ab-releases/{id}
|
||||
POST /api/v1/ab-releases/{id}/start
|
||||
POST /api/v1/ab-releases/{id}/advance
|
||||
POST /api/v1/ab-releases/{id}/promote
|
||||
POST /api/v1/ab-releases/{id}/rollback
|
||||
GET /api/v1/ab-releases/{id}/traffic
|
||||
GET /api/v1/ab-releases/{id}/health
|
||||
GET /api/v1/rollout-strategies
|
||||
```
|
||||
|
||||
### Release Evidence (RELEVI)
|
||||
|
||||
```
|
||||
GET /api/v1/evidence-packets
|
||||
GET /api/v1/evidence-packets/{id}
|
||||
GET /api/v1/evidence-packets/{id}/download
|
||||
POST /api/v1/audit-reports
|
||||
GET /api/v1/audit-reports/{id}
|
||||
GET /api/v1/audit-reports/{id}/download
|
||||
GET /api/v1/version-stickers
|
||||
GET /api/v1/version-stickers/{id}
|
||||
```
|
||||
|
||||
### Plugin Infrastructure (PLUGIN)
|
||||
|
||||
```
|
||||
GET /api/v1/plugins
|
||||
GET /api/v1/plugins/{id}
|
||||
POST /api/v1/plugins/{id}/enable
|
||||
POST /api/v1/plugins/{id}/disable
|
||||
GET /api/v1/plugins/{id}/health
|
||||
POST /api/v1/plugin-instances
|
||||
GET /api/v1/plugin-instances
|
||||
PUT /api/v1/plugin-instances/{id}
|
||||
DELETE /api/v1/plugin-instances/{id}
|
||||
```
|
||||
|
||||
## WebSocket Endpoints
|
||||
|
||||
```
|
||||
WS /api/v1/workflow-runs/{id}/stream
|
||||
WS /api/v1/deployment-jobs/{id}/stream
|
||||
WS /api/v1/agents/{id}/task-stream
|
||||
WS /api/v1/dashboard/stream
|
||||
```
|
||||
|
||||
## Rate Limits

| Tier | Requests/minute | Burst |
|------|-----------------|-------|
| Standard | 1000 | 100 |
| Premium | 5000 | 500 |

Rate limit headers:
- `X-RateLimit-Limit`: Request limit
- `X-RateLimit-Remaining`: Remaining requests
- `X-RateLimit-Reset`: Reset timestamp

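Clients hitting the limit can use these headers to pause until the window resets. A minimal sketch of that computation (hypothetical helper; it assumes `X-RateLimit-Reset` carries a Unix timestamp in seconds, which the table above does not specify):

```javascript
// Computes how many milliseconds to wait before retrying, based on the
// rate-limit headers listed above. Returns 0 while requests remain.
// Assumption: X-RateLimit-Reset is a Unix timestamp in seconds.
function rateLimitDelayMs(headers, nowMs = Date.now()) {
  const remaining = Number(headers["x-ratelimit-remaining"]);
  if (Number.isNaN(remaining) || remaining > 0) return 0; // budget left (or headers absent)
  const resetMs = Number(headers["x-ratelimit-reset"]) * 1000;
  return Math.max(0, resetMs - nowMs); // wait until the window resets
}
```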
## References

- [Environments API](environments.md)
- [Releases API](releases.md)
- [Promotions API](promotions.md)
- [Workflows API](workflows.md)
- [Agents API](agents.md)
- [WebSocket API](websockets.md)

317
docs/modules/release-orchestrator/api/promotions.md
Normal file
@@ -0,0 +1,317 @@

# Promotion & Approval APIs

> API endpoints for managing promotions, approvals, and gate evaluations.

**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 6.3.5](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Promotion Manager Module](../modules/promotion-manager.md), [Workflow Promotion](../workflow/promotion.md)

## Overview

The Promotion API provides endpoints for requesting release promotions between environments, managing approvals, and evaluating promotion gates. Promotions enforce separation of duties (SoD) and require configured approvals before deployment proceeds.

---

## Promotion Endpoints

### Create Promotion Request

**Endpoint:** `POST /api/v1/promotions`

Initiates a promotion request for a release to a target environment.

**Request:**
```json
{
  "releaseId": "uuid",
  "targetEnvironmentId": "uuid",
  "reason": "Deploying v2.3.1 with critical bug fix"
}
```

**Response:** `201 Created`
```json
{
  "id": "uuid",
  "releaseId": "uuid",
  "releaseName": "myapp-v2.3.1",
  "sourceEnvironmentId": "uuid",
  "sourceEnvironmentName": "Staging",
  "targetEnvironmentId": "uuid",
  "targetEnvironmentName": "Production",
  "status": "pending",
  "requestedBy": "user-uuid",
  "requestedAt": "2026-01-10T14:23:45Z",
  "reason": "Deploying v2.3.1 with critical bug fix"
}
```

**Status Flow:**
```
pending -> awaiting_approval -> approved -> deploying -> deployed
                             -> rejected
                             -> cancelled
                                                      -> failed
                                                      -> rolled_back
```

### List Promotions

**Endpoint:** `GET /api/v1/promotions`

**Query Parameters:**
- `status` (string): Filter by status
- `releaseId` (UUID): Filter by release
- `environmentId` (UUID): Filter by target environment
- `page` (number): Page number

**Response:** `200 OK`
```json
{
  "data": [
    {
      "id": "uuid",
      "releaseName": "myapp-v2.3.1",
      "targetEnvironmentName": "Production",
      "status": "awaiting_approval",
      "requestedAt": "2026-01-10T14:23:45Z"
    }
  ],
  "meta": { "page": 1, "totalCount": 25 }
}
```

### Get Promotion

**Endpoint:** `GET /api/v1/promotions/{id}`

**Response:** `200 OK` - Full promotion with decision record and approvals

### Approve Promotion

**Endpoint:** `POST /api/v1/promotions/{id}/approve`

**Request:**
```json
{
  "comment": "Approved after reviewing security scan results"
}
```

**Response:** `200 OK`
```json
{
  "id": "uuid",
  "status": "approved",
  "approvalCount": 2,
  "requiredApprovals": 2,
  "decidedAt": "2026-01-10T14:30:00Z"
}
```

**Notes:**
- Separation of Duties (SoD): The user who requested the promotion cannot approve it if `requireSod` is enabled on the environment
- Multi-party approval: Promotion proceeds when `approvalCount >= requiredApprovals`

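The two rules above can be sketched as pure checks (hypothetical helpers, not the server implementation; the `requireSod`, `requiredApprovals`, and approval shapes follow the payloads shown in this document):

```javascript
// A user may approve a promotion only if:
//  - SoD: they are not the requester while requireSod is enabled, and
//  - they have not already approved it (each approver counts once).
function canApprove(promotion, userId, policy) {
  if (policy.requireSod && promotion.requestedBy === userId) {
    return false; // SoD violation: requester cannot self-approve
  }
  if (promotion.approvals.some((a) => a.approverId === userId)) {
    return false; // duplicate approval
  }
  return true;
}

// A promotion proceeds once approvalCount >= requiredApprovals.
function isFullyApproved(promotion, policy) {
  return promotion.approvals.length >= policy.requiredApprovals;
}
```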
### Reject Promotion

**Endpoint:** `POST /api/v1/promotions/{id}/reject`

**Request:**
```json
{
  "reason": "Security vulnerabilities not addressed"
}
```

**Response:** `200 OK` - Updated promotion with `status: rejected`

### Cancel Promotion

**Endpoint:** `POST /api/v1/promotions/{id}/cancel`

Cancels a pending or awaiting_approval promotion.

**Response:** `200 OK` - Updated promotion with `status: cancelled`

---

## Decision & Evidence Endpoints

### Get Decision Record

**Endpoint:** `GET /api/v1/promotions/{id}/decision`

Returns the full decision record including gate evaluations.

**Response:** `200 OK`
```json
{
  "promotionId": "uuid",
  "decision": "allow",
  "decidedAt": "2026-01-10T14:30:00Z",
  "gates": [
    {
      "gateName": "security-gate",
      "passed": true,
      "details": {
        "criticalCount": 0,
        "highCount": 3,
        "maxCritical": 0,
        "maxHigh": 5
      }
    },
    {
      "gateName": "freeze-window-gate",
      "passed": true,
      "details": {
        "activeFreezeWindow": null
      }
    }
  ],
  "approvals": [
    {
      "approverId": "uuid",
      "approverName": "John Doe",
      "decision": "approved",
      "comment": "LGTM",
      "approvedAt": "2026-01-10T14:28:00Z"
    }
  ]
}
```

### Get Approvals

**Endpoint:** `GET /api/v1/promotions/{id}/approvals`

**Response:** `200 OK` - Array of approval records

### Get Evidence Packet

**Endpoint:** `GET /api/v1/promotions/{id}/evidence`

Returns the signed evidence packet for the promotion decision.

**Response:** `200 OK`
```json
{
  "id": "uuid",
  "type": "release_decision",
  "version": "1.0",
  "content": { ... },
  "contentHash": "sha256:abc...",
  "signature": "base64-signature",
  "signatureAlgorithm": "ECDSA-P256-SHA256",
  "signerKeyRef": "key-id",
  "generatedAt": "2026-01-10T14:30:00Z"
}
```

---

## Gate Preview Endpoints

### Preview Gate Evaluation

**Endpoint:** `POST /api/v1/promotions/preview-gates`

Evaluates gates without creating a promotion (dry run).

**Request:**
```json
{
  "releaseId": "uuid",
  "targetEnvironmentId": "uuid"
}
```

**Response:** `200 OK`
```json
{
  "wouldPass": false,
  "gates": [
    {
      "gateName": "security-gate",
      "passed": false,
      "blocking": true,
      "message": "3 critical vulnerabilities exceed threshold (max: 0)"
    },
    {
      "gateName": "freeze-window-gate",
      "passed": true,
      "blocking": false,
      "message": "No active freeze window"
    }
  ]
}
```

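Reading the response above, `wouldPass` amounts to "no blocking gate failed". A minimal sketch of that aggregation (hypothetical helper, assuming the per-gate shape shown in the preview response):

```javascript
// Aggregates per-gate results into the overall wouldPass flag:
// a failed gate blocks the promotion only when it is marked blocking.
function wouldPass(gates) {
  return gates.every((g) => g.passed || !g.blocking);
}
```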
---

## Approval Policy Endpoints

### Create Approval Policy

**Endpoint:** `POST /api/v1/approval-policies`

**Request:**
```json
{
  "name": "production-policy",
  "environmentId": "uuid",
  "requiredApprovals": 2,
  "approverGroups": ["release-managers", "sre-team"],
  "requireSeparationOfDuties": true,
  "autoExpireHours": 24
}
```

### List Approval Policies

**Endpoint:** `GET /api/v1/approval-policies`

### Get Approval Policy

**Endpoint:** `GET /api/v1/approval-policies/{id}`

### Update Approval Policy

**Endpoint:** `PUT /api/v1/approval-policies/{id}`

### Delete Approval Policy

**Endpoint:** `DELETE /api/v1/approval-policies/{id}`

---

## Current User Endpoints

### Get My Pending Approvals

**Endpoint:** `GET /api/v1/my/pending-approvals`

Returns promotions awaiting approval from the current user.

**Response:** `200 OK` - Array of promotions

---

## Error Responses

| Status Code | Description |
|-------------|-------------|
| `400` | Invalid promotion request |
| `403` | User cannot approve (SoD violation or not in approver list) |
| `404` | Promotion not found |
| `409` | Promotion already decided |
| `422` | Gate evaluation failed |

---

## See Also

- [Workflows API](workflows.md)
- [Releases API](releases.md)
- [Promotion Manager Module](../modules/promotion-manager.md)
- [Security Gates](../modules/promotion-manager.md#security-gate)

345
docs/modules/release-orchestrator/api/releases.md
Normal file
@@ -0,0 +1,345 @@

# Release Management APIs

> API endpoints for managing components, versions, and release bundles.

**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 6.3.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Release Manager Module](../modules/release-manager.md), [Integration Hub](../modules/integration-hub.md)

## Overview

The Release Management API provides endpoints for managing container components, version tracking, and release bundle creation. All releases are identified by immutable OCI digests, ensuring cryptographic verification throughout the deployment pipeline.

> **Design Principle:** Release identity is established via digest, not tag. Tags are human-friendly aliases; digests are the source of truth.

---

## Component Endpoints

### Create Component

**Endpoint:** `POST /api/v1/components`

Registers a new container component for release management.

**Request:**
```json
{
  "name": "api",
  "displayName": "API Service",
  "imageRepository": "myorg/api",
  "registryIntegrationId": "uuid",
  "versioningStrategy": "semver",
  "defaultChannel": "stable"
}
```

**Response:** `201 Created`
```json
{
  "id": "uuid",
  "name": "api",
  "displayName": "API Service",
  "imageRepository": "myorg/api",
  "registryIntegrationId": "uuid",
  "versioningStrategy": "semver",
  "createdAt": "2026-01-10T14:23:45Z"
}
```

### List Components

**Endpoint:** `GET /api/v1/components`

**Response:** `200 OK` - Array of components

### Get Component

**Endpoint:** `GET /api/v1/components/{id}`

### Update Component

**Endpoint:** `PUT /api/v1/components/{id}`

### Delete Component

**Endpoint:** `DELETE /api/v1/components/{id}`

### Sync Versions

**Endpoint:** `POST /api/v1/components/{id}/sync-versions`

Triggers a refresh of available versions from the container registry.

**Request:**
```json
{
  "forceRefresh": true
}
```

**Response:** `200 OK`
```json
{
  "synced": 15,
  "versions": [
    {
      "tag": "v2.3.1",
      "digest": "sha256:abc123...",
      "semver": "2.3.1",
      "channel": "stable",
      "pushedAt": "2026-01-09T10:00:00Z"
    }
  ]
}
```

### List Component Versions

**Endpoint:** `GET /api/v1/components/{id}/versions`

**Query Parameters:**
- `channel` (string): Filter by channel (`stable`, `beta`, `rc`)
- `limit` (number): Maximum versions to return

**Response:** `200 OK` - Array of version maps

---

## Version Map Endpoints

### Create Version Map

**Endpoint:** `POST /api/v1/version-maps`

Manually assigns a semver and channel to a tag/digest.

**Request:**
```json
{
  "componentId": "uuid",
  "tag": "v2.3.1",
  "semver": "2.3.1",
  "channel": "stable"
}
```

**Response:** `201 Created`

### List Version Maps

**Endpoint:** `GET /api/v1/version-maps`

**Query Parameters:**
- `componentId` (UUID): Filter by component
- `channel` (string): Filter by channel

---

## Release Endpoints

### Create Release

**Endpoint:** `POST /api/v1/releases`

Creates a new release bundle with specified component versions.

**Request:**
```json
{
  "name": "myapp-v2.3.1",
  "displayName": "My App 2.3.1",
  "components": [
    { "componentId": "uuid", "version": "2.3.1" },
    { "componentId": "uuid", "digest": "sha256:def456..." },
    { "componentId": "uuid", "channel": "stable" }
  ],
  "sourceRef": {
    "scmIntegrationId": "uuid",
    "repository": "myorg/myapp",
    "branch": "main",
    "commitSha": "abc123"
  }
}
```

**Response:** `201 Created`
```json
{
  "id": "uuid",
  "name": "myapp-v2.3.1",
  "displayName": "My App 2.3.1",
  "status": "draft",
  "components": [
    {
      "componentId": "uuid",
      "componentName": "api",
      "version": "2.3.1",
      "digest": "sha256:abc123...",
      "channel": "stable"
    }
  ],
  "createdAt": "2026-01-10T14:23:45Z",
  "createdBy": "user-uuid"
}
```

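In the request above a component entry may pin by `version`, by `digest`, or by `channel`, while the response always carries the resolved digest. A rough sketch of that resolution against a component's known versions (hypothetical helper; the actual resolution rules are not specified here, and it assumes versions are sorted newest-first as in the sync-versions response):

```javascript
// Resolves a component reference ({version} | {digest} | {channel})
// to an immutable digest using the component's known versions.
function resolveComponentRef(ref, versions) {
  if (ref.digest) return ref.digest; // already immutable
  if (ref.version) {
    const v = versions.find((x) => x.semver === ref.version);
    if (!v) throw new Error(`Cannot resolve version ${ref.version}`); // maps to HTTP 422
    return v.digest;
  }
  if (ref.channel) {
    // newest-first ordering makes the first match the latest in channel
    const v = versions.find((x) => x.channel === ref.channel);
    if (!v) throw new Error(`No versions in channel ${ref.channel}`);
    return v.digest;
  }
  throw new Error("Component ref must specify version, digest, or channel");
}
```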
### Create Release from Latest

**Endpoint:** `POST /api/v1/releases/from-latest`

Convenience endpoint to create a release from the latest versions of all (or specified) components.

**Request:**
```json
{
  "name": "myapp-latest",
  "channel": "stable",
  "componentIds": ["uuid1", "uuid2"],
  "pinFrom": {
    "environmentId": "uuid"
  }
}
```

**Response:** `201 Created` - Release with resolved digests

### List Releases

**Endpoint:** `GET /api/v1/releases`

**Query Parameters:**
- `status` (string): Filter by status (`draft`, `ready`, `promoting`, `deployed`, `deprecated`)
- `componentId` (UUID): Filter by component inclusion
- `page` (number): Page number
- `pageSize` (number): Items per page

**Response:** `200 OK`
```json
{
  "data": [
    {
      "id": "uuid",
      "name": "myapp-v2.3.1",
      "status": "deployed",
      "componentCount": 3,
      "createdAt": "2026-01-10T14:23:45Z"
    }
  ],
  "meta": {
    "page": 1,
    "pageSize": 20,
    "totalCount": 150,
    "totalPages": 8
  }
}
```

### Get Release

**Endpoint:** `GET /api/v1/releases/{id}`

**Response:** `200 OK` - Full release with component details

### Update Release

**Endpoint:** `PUT /api/v1/releases/{id}`

**Request:**
```json
{
  "displayName": "Updated Display Name",
  "metadata": { "key": "value" },
  "status": "ready"
}
```

### Delete Release

**Endpoint:** `DELETE /api/v1/releases/{id}`

### Get Release State

**Endpoint:** `GET /api/v1/releases/{id}/state`

Returns the deployment state of a release across environments.

**Response:** `200 OK`
```json
{
  "environments": [
    {
      "environmentId": "uuid",
      "environmentName": "Development",
      "status": "deployed",
      "deployedAt": "2026-01-09T10:00:00Z"
    },
    {
      "environmentId": "uuid",
      "environmentName": "Staging",
      "status": "deployed",
      "deployedAt": "2026-01-10T08:00:00Z"
    },
    {
      "environmentId": "uuid",
      "environmentName": "Production",
      "status": "not_deployed"
    }
  ]
}
```

### Deprecate Release

**Endpoint:** `POST /api/v1/releases/{id}/deprecate`

Marks a release as deprecated, preventing new promotions.

**Response:** `200 OK` - Updated release with `status: deprecated`

### Compare Releases

**Endpoint:** `GET /api/v1/releases/{id}/compare/{otherId}`

Compares two releases to identify component differences.

**Response:** `200 OK`
```json
{
  "added": [
    { "componentId": "uuid", "componentName": "worker" }
  ],
  "removed": [
    { "componentId": "uuid", "componentName": "legacy-service" }
  ],
  "changed": [
    {
      "component": "api",
      "fromVersion": "2.3.0",
      "toVersion": "2.3.1",
      "fromDigest": "sha256:old...",
      "toDigest": "sha256:new..."
    }
  ]
}
```

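The added/removed/changed split above can be computed from the two releases' component lists. A minimal sketch keyed on `componentId` (hypothetical helper; it assumes digest inequality is what marks a component as changed, consistent with the digest-first design principle):

```javascript
// Computes the compare result between two releases' component lists,
// keyed by componentId; a component present in both with a different
// digest counts as changed.
function compareReleases(from, to) {
  const fromById = new Map(from.map((c) => [c.componentId, c]));
  const toById = new Map(to.map((c) => [c.componentId, c]));
  const added = to.filter((c) => !fromById.has(c.componentId));
  const removed = from.filter((c) => !toById.has(c.componentId));
  const changed = to
    .filter((c) => fromById.has(c.componentId) && fromById.get(c.componentId).digest !== c.digest)
    .map((c) => ({
      component: c.componentName,
      fromDigest: fromById.get(c.componentId).digest,
      toDigest: c.digest,
    }));
  return { added, removed, changed };
}
```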
---

## Error Responses

| Status Code | Description |
|-------------|-------------|
| `400` | Invalid release configuration |
| `404` | Release or component not found |
| `409` | Release name already exists |
| `422` | Cannot resolve component version |

---

## See Also

- [Promotions API](promotions.md)
- [Release Manager Module](../modules/release-manager.md)
- [Integration Hub](../modules/integration-hub.md)
- [Design Principles](../design/principles.md)

374
docs/modules/release-orchestrator/api/websockets.md
Normal file
@@ -0,0 +1,374 @@

# Real-Time APIs (WebSocket/SSE)

> WebSocket and Server-Sent Events endpoints for real-time updates.

**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 6.4](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Workflow Execution](../workflow/execution.md), [UI Dashboard](../ui/dashboard.md)

## Overview

The Release Orchestrator provides real-time streaming endpoints for workflow runs, deployment progress, agent tasks, and dashboard metrics. These endpoints support both WebSocket connections and Server-Sent Events (SSE) for browser compatibility.

---

## Authentication

All WebSocket and SSE connections require authentication via JWT token:

**WebSocket:** Token in query parameter or first message
```
ws://api/v1/workflow-runs/{id}/stream?token=jwt-token
```

**SSE:** Token in Authorization header
```
GET /api/v1/dashboard/stream
Authorization: Bearer jwt-token
```

---

## Workflow Run Stream

**Endpoint:** `WS /api/v1/workflow-runs/{id}/stream`

Streams real-time updates for a workflow run including step progress and logs.

### Message Types (Server to Client)

**Step Started:**
```json
{
  "type": "step_started",
  "nodeId": "security-check",
  "stepType": "security-gate",
  "timestamp": "2026-01-10T14:23:45Z"
}
```

**Step Progress:**
```json
{
  "type": "step_progress",
  "nodeId": "deploy",
  "progress": 50,
  "message": "Deploying to target 3/6"
}
```

**Step Log:**
```json
{
  "type": "step_log",
  "nodeId": "deploy",
  "line": "Pulling image sha256:abc123...",
  "level": "info",
  "timestamp": "2026-01-10T14:23:50Z"
}
```

**Step Completed:**
```json
{
  "type": "step_completed",
  "nodeId": "security-check",
  "status": "succeeded",
  "outputs": {
    "criticalCount": 0,
    "highCount": 3
  },
  "duration": 5.2,
  "timestamp": "2026-01-10T14:23:50Z"
}
```

**Workflow Completed:**
```json
{
  "type": "workflow_completed",
  "status": "succeeded",
  "duration": 125.5,
  "outputs": {
    "deploymentId": "uuid"
  },
  "timestamp": "2026-01-10T14:25:50Z"
}
```

---

## Deployment Job Stream

**Endpoint:** `WS /api/v1/deployment-jobs/{id}/stream`

Streams real-time updates for deployment job execution.

### Message Types (Server to Client)

**Task Started:**
```json
{
  "type": "task_started",
  "taskId": "uuid",
  "targetId": "uuid",
  "targetName": "prod-web-01",
  "taskType": "docker.pull",
  "timestamp": "2026-01-10T14:23:45Z"
}
```

**Task Progress:**
```json
{
  "type": "task_progress",
  "taskId": "uuid",
  "progress": 75,
  "message": "Pulling layer 4/5"
}
```

**Task Log:**
```json
{
  "type": "task_log",
  "taskId": "uuid",
  "line": "Container started successfully",
  "level": "info"
}
```

**Task Completed:**
```json
{
  "type": "task_completed",
  "taskId": "uuid",
  "targetId": "uuid",
  "status": "succeeded",
  "duration": 45.2,
  "result": {
    "containerId": "abc123",
    "digest": "sha256:..."
  },
  "timestamp": "2026-01-10T14:24:30Z"
}
```

**Job Completed:**
```json
{
  "type": "job_completed",
  "status": "succeeded",
  "targetsDeployed": 4,
  "targetsFailed": 0,
  "duration": 180.5,
  "timestamp": "2026-01-10T14:26:45Z"
}
```

---

## Agent Task Stream

**Endpoint:** `WS /api/v1/agents/{id}/task-stream`

Bidirectional stream for agent task assignment and progress reporting.

### Message Types (Server to Agent)

**Task Assigned:**
```json
{
  "type": "task_assigned",
  "task": {
    "taskId": "uuid",
    "taskType": "docker.pull",
    "payload": {
      "image": "myapp",
      "digest": "sha256:abc123..."
    },
    "credentials": {
      "registry.username": "user",
      "registry.password": "token"
    },
    "timeout": 300
  }
}
```

**Task Cancelled:**
```json
{
  "type": "task_cancelled",
  "taskId": "uuid",
  "reason": "Deployment cancelled by user"
}
```

### Message Types (Agent to Server)

**Task Progress:**
```json
{
  "type": "task_progress",
  "taskId": "uuid",
  "progress": 50,
  "message": "Pulling image layer 3/5"
}
```

**Task Log:**
```json
{
  "type": "task_log",
  "taskId": "uuid",
  "level": "info",
  "message": "Image layer downloaded: sha256:def456..."
}
```

**Task Completed:**
```json
{
  "type": "task_completed",
  "taskId": "uuid",
  "success": true,
  "result": {
    "imageId": "sha256:abc123..."
  }
}
```

---

## Dashboard Metrics Stream

**Endpoint:** `WS /api/v1/dashboard/stream`

Streams real-time dashboard metrics and alerts.

### Message Types (Server to Client)

**Metric Update:**
```json
{
  "type": "metric_update",
  "metrics": {
    "pipelineStatus": [
      { "environmentId": "uuid", "name": "Production", "health": "healthy" }
    ],
    "pendingApprovals": 3,
    "activeDeployments": 1,
    "recentReleases": 12,
    "systemHealth": {
      "agentsOnline": 8,
      "agentsTotal": 10,
      "queueDepth": 5
    }
  },
  "timestamp": "2026-01-10T14:23:45Z"
}
```

**Alert:**
```json
{
  "type": "alert",
  "alert": {
    "id": "uuid",
    "severity": "warning",
    "title": "Deployment Failed",
    "message": "Deployment to Production failed: health check timeout",
    "resourceType": "deployment",
    "resourceId": "uuid",
    "timestamp": "2026-01-10T14:23:45Z"
  }
}
```

**Promotion Update:**
```json
{
  "type": "promotion_update",
  "promotion": {
    "id": "uuid",
    "releaseName": "myapp-v2.3.1",
    "targetEnvironment": "Production",
    "status": "awaiting_approval",
    "requestedBy": "John Doe"
  }
}
```

---

## Connection Management

### Reconnection

Clients should reconnect with exponential backoff:

```javascript
// Reconnect with exponential backoff, capped at 30 seconds.
const connect = (retryCount = 0) => {
  const ws = new WebSocket(url);

  ws.onclose = () => {
    // Double the delay on each consecutive failure: 1s, 2s, 4s, ... up to 30s.
    const delay = Math.min(1000 * Math.pow(2, retryCount), 30000);
    setTimeout(() => connect(retryCount + 1), delay);
  };

  ws.onopen = () => {
    // Reset the backoff once a connection succeeds.
    retryCount = 0;
  };
};
```

### Heartbeat

WebSocket connections receive periodic heartbeat messages:

```json
{
  "type": "heartbeat",
  "timestamp": "2026-01-10T14:23:45Z"
}
```

Clients should respond with:
```json
{
  "type": "pong"
}
```

Connections that do not respond with a pong within 30 seconds are terminated.

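The heartbeat handling above can be kept as a small pure function feeding the socket's message handler (a sketch, using the message shapes shown in this section):

```javascript
// Returns the reply a client should send for a given server message,
// or null when no reply is required. Heartbeats must be answered with
// a pong before the 30-second deadline or the server closes the socket.
function replyFor(message) {
  if (message.type === "heartbeat") {
    return { type: "pong" };
  }
  return null; // all other message types are informational
}

// Usage sketch:
// ws.onmessage = (event) => {
//   const reply = replyFor(JSON.parse(event.data));
//   if (reply) ws.send(JSON.stringify(reply));
// };
```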
---

## Error Messages

```json
{
  "type": "error",
  "code": "unauthorized",
  "message": "Token expired",
  "timestamp": "2026-01-10T14:23:45Z"
}
```

| Error Code | Description |
|------------|-------------|
| `unauthorized` | Invalid or expired token |
| `forbidden` | No access to resource |
| `not_found` | Resource not found |
| `rate_limited` | Too many connections |
| `internal_error` | Server error |

---

## See Also

- [Workflows API](workflows.md)
- [Agents API](agents.md)
- [UI Dashboard](../ui/dashboard.md)
- [Workflow Execution](../workflow/execution.md)

354
docs/modules/release-orchestrator/api/workflows.md
Normal file
@@ -0,0 +1,354 @@

# Workflow APIs

> API endpoints for managing workflow templates, step registry, and workflow runs.

**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 6.3.4](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Workflow Engine Module](../modules/workflow-engine.md), [Workflow Templates](../workflow/templates.md)

## Overview

The Workflow API provides endpoints for managing workflow templates (DAG definitions), discovering available step types, and executing workflow runs. Workflows are directed acyclic graphs (DAGs) of steps that orchestrate promotions, deployments, and other automation tasks.

---

## Workflow Template Endpoints

### Create Workflow Template

**Endpoint:** `POST /api/v1/workflow-templates`

**Request:**
```json
{
  "name": "standard-promotion",
  "displayName": "Standard Promotion Workflow",
  "description": "Default workflow for promoting releases",
  "nodes": [
    {
      "id": "security-check",
      "type": "security-gate",
      "name": "Security Check",
      "config": {
        "maxCritical": 0,
        "maxHigh": 5
      },
      "position": { "x": 100, "y": 100 }
    },
    {
      "id": "approval",
      "type": "approval",
      "name": "Manager Approval",
      "config": {
        "approvers": ["manager-group"],
        "minApprovals": 1
      },
      "position": { "x": 300, "y": 100 }
    },
    {
      "id": "deploy",
      "type": "deploy",
      "name": "Deploy to Target",
      "config": {
        "strategy": "rolling",
        "batchSize": "25%"
      },
      "position": { "x": 500, "y": 100 }
    }
  ],
  "edges": [
    { "from": "security-check", "to": "approval" },
    { "from": "approval", "to": "deploy" }
  ],
  "inputs": [
    { "name": "releaseId", "type": "uuid", "required": true },
    { "name": "environmentId", "type": "uuid", "required": true }
  ],
  "outputs": [
    { "name": "deploymentId", "type": "uuid" }
  ]
}
```

**Response:** `201 Created`
```json
{
  "id": "uuid",
  "name": "standard-promotion",
  "displayName": "Standard Promotion Workflow",
  "version": 1,
  "nodeCount": 3,
  "isActive": true,
  "createdAt": "2026-01-10T14:23:45Z"
}
```

### List Workflow Templates

**Endpoint:** `GET /api/v1/workflow-templates`

**Query Parameters:**
- `includeBuiltin` (boolean): Include system-provided templates
- `tags` (string): Filter by tags

**Response:** `200 OK` - Array of workflow templates

### Get Workflow Template

**Endpoint:** `GET /api/v1/workflow-templates/{id}`

**Response:** `200 OK` - Full template with nodes and edges

### Update Workflow Template

**Endpoint:** `PUT /api/v1/workflow-templates/{id}`

Creates a new version of the template.

**Request:** Partial or full template definition

**Response:** `200 OK` - New version of template

### Delete Workflow Template

**Endpoint:** `DELETE /api/v1/workflow-templates/{id}`

**Response:** `200 OK`
```json
{ "deleted": true }
```

### Validate Workflow Template

**Endpoint:** `POST /api/v1/workflow-templates/{id}/validate`

Validates a template with sample inputs.

**Request:**
```json
{
  "inputs": {
    "releaseId": "sample-uuid",
    "environmentId": "sample-uuid"
  }
}
```

**Response:** `200 OK`
```json
{
  "valid": true,
  "errors": []
}
```

Or on validation failure:
```json
{
  "valid": false,
  "errors": [
    { "nodeId": "deploy", "field": "config.strategy", "message": "Invalid strategy: unknown" },
    { "type": "dag", "message": "Cycle detected: node-a -> node-b -> node-a" }
  ]
}
```

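The DAG check behind the `Cycle detected` error above can be sketched as a depth-first search over the template's `nodes` and `edges` (hypothetical helper; the actual validator is not specified here):

```javascript
// Detects a cycle in a workflow template's node/edge graph via DFS
// with three node states: unvisited, in the current path, and done.
function hasCycle(nodes, edges) {
  const next = new Map(nodes.map((n) => [n.id, []]));
  for (const e of edges) next.get(e.from).push(e.to);
  const state = new Map(); // undefined = unvisited, 1 = in path, 2 = done
  const visit = (id) => {
    if (state.get(id) === 1) return true; // back edge found: cycle
    if (state.get(id) === 2) return false;
    state.set(id, 1);
    for (const t of next.get(id) || []) {
      if (visit(t)) return true;
    }
    state.set(id, 2);
    return false;
  };
  return nodes.some((n) => visit(n.id));
}
```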
---

## Step Registry Endpoints

### List Step Types

**Endpoint:** `GET /api/v1/step-types`

Lists all available step types from core and plugins.

**Query Parameters:**

- `category` (string): Filter by category (`deployment`, `gate`, `notification`, `utility`)
- `provider` (string): Filter by provider (`builtin`, `plugin-id`)

**Response:** `200 OK`

```json
[
  {
    "type": "script",
    "displayName": "Script",
    "description": "Execute shell script on target",
    "category": "utility",
    "provider": "builtin",
    "configSchema": { ... }
  },
  {
    "type": "security-gate",
    "displayName": "Security Gate",
    "description": "Check vulnerability thresholds",
    "category": "gate",
    "provider": "builtin",
    "configSchema": { ... }
  }
]
```

### Get Step Type

**Endpoint:** `GET /api/v1/step-types/{type}`

**Response:** `200 OK` - Full step type with configuration schema

---

## Workflow Run Endpoints

### Start Workflow Run

**Endpoint:** `POST /api/v1/workflow-runs`

**Request:**

```json
{
  "templateId": "uuid",
  "context": {
    "releaseId": "uuid",
    "environmentId": "uuid",
    "variables": {
      "deploymentTimeout": 600
    }
  }
}
```

**Response:** `201 Created`

```json
{
  "id": "uuid",
  "templateId": "uuid",
  "templateVersion": 1,
  "status": "running",
  "startedAt": "2026-01-10T14:23:45Z"
}
```

### List Workflow Runs

**Endpoint:** `GET /api/v1/workflow-runs`

**Query Parameters:**

- `status` (string): Filter by status (`pending`, `running`, `succeeded`, `failed`, `cancelled`)
- `templateId` (UUID): Filter by template
- `page` (number): Page number

**Response:** `200 OK`

```json
{
  "data": [
    {
      "id": "uuid",
      "templateName": "standard-promotion",
      "status": "running",
      "progress": 66,
      "startedAt": "2026-01-10T14:23:45Z"
    }
  ],
  "meta": { "page": 1, "totalCount": 50 }
}
```

### Get Workflow Run

**Endpoint:** `GET /api/v1/workflow-runs/{id}`

**Response:** `200 OK` - Full run with step statuses
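Clients typically poll this endpoint (or subscribe to real-time updates) until the run reaches a terminal status. A sketch of the client-side pieces involved — the backoff delays and cap are illustrative choices, not documented server behavior:

```typescript
// Exponential backoff schedule (in ms) for polling a workflow run.
// Base delay, growth factor, and cap are illustrative, not API-mandated.
function backoffSchedule(attempts: number, baseMs = 500, factor = 2, capMs = 30_000): number[] {
  const delays: number[] = [];
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(baseMs * factor ** i, capMs));
  }
  return delays;
}

// Terminal statuses match the `status` filter values documented for list runs.
const TERMINAL = new Set(["succeeded", "failed", "cancelled"]);

function isTerminal(status: string): boolean {
  return TERMINAL.has(status);
}
```

A polling loop would sleep for each delay in `backoffSchedule(...)` in turn, re-fetching the run until `isTerminal(run.status)` is true.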
### Pause Workflow Run

**Endpoint:** `POST /api/v1/workflow-runs/{id}/pause`

Pauses a running workflow at the next step boundary.

**Response:** `200 OK` - Updated workflow run

### Resume Workflow Run

**Endpoint:** `POST /api/v1/workflow-runs/{id}/resume`

Resumes a paused workflow.

**Response:** `200 OK` - Updated workflow run

### Cancel Workflow Run

**Endpoint:** `POST /api/v1/workflow-runs/{id}/cancel`

Cancels a running or paused workflow.

**Response:** `200 OK` - Updated workflow run

### List Step Runs

**Endpoint:** `GET /api/v1/workflow-runs/{id}/steps`

**Response:** `200 OK`

```json
[
  {
    "nodeId": "security-check",
    "stepType": "security-gate",
    "status": "succeeded",
    "startedAt": "2026-01-10T14:23:45Z",
    "completedAt": "2026-01-10T14:23:50Z"
  },
  {
    "nodeId": "approval",
    "stepType": "approval",
    "status": "running",
    "startedAt": "2026-01-10T14:23:50Z"
  }
]
```

### Get Step Run

**Endpoint:** `GET /api/v1/workflow-runs/{id}/steps/{nodeId}`

**Response:** `200 OK` - Step run with logs

### Get Step Logs

**Endpoint:** `GET /api/v1/workflow-runs/{id}/steps/{nodeId}/logs`

**Query Parameters:**

- `follow` (boolean): Stream logs in real-time via SSE

**Response:** `200 OK` - Log content or SSE stream
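With `follow=true` the response is a Server-Sent-Events stream. A minimal parser for the `data:` payloads in an SSE text chunk (a sketch of the standard SSE wire format; any event fields beyond `data:` are not specified here):

```typescript
// Extract the data payloads from a Server-Sent-Events text chunk.
// Events are separated by blank lines; payload lines start with "data:".
function parseSseData(chunk: string): string[] {
  const events: string[] = [];
  for (const block of chunk.split("\n\n")) {
    const data = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).trimStart())
      .join("\n");
    if (data.length > 0) events.push(data);
  }
  return events;
}
```

Each returned string is one event's payload (for example a JSON-encoded log line), ready for `JSON.parse` if the server emits structured logs.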
### List Step Artifacts

**Endpoint:** `GET /api/v1/workflow-runs/{id}/steps/{nodeId}/artifacts`

**Response:** `200 OK` - Array of artifacts

### Download Artifact

**Endpoint:** `GET /api/v1/workflow-runs/{id}/steps/{nodeId}/artifacts/{artifactId}`

**Response:** Binary download

---

## Error Responses

| Status Code | Description |
|-------------|-------------|
| `400` | Invalid workflow template |
| `404` | Template or run not found |
| `409` | Workflow already running |
| `422` | DAG validation failed |

---

## See Also

- [WebSocket APIs](websockets.md) - Real-time workflow updates
- [Workflow Engine Module](../modules/workflow-engine.md)
- [Workflow Templates](../workflow/templates.md)
- [Workflow Execution](../workflow/execution.md)

224
docs/modules/release-orchestrator/appendices/config.md
Normal file
@@ -0,0 +1,224 @@
# Configuration Reference

> Environment variables and OPA policy examples for the Release Orchestrator.

**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 15.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Security Overview](../security/overview.md), [Promotion Manager](../modules/promotion-manager.md)
**Sprint:** [101_001 Foundation](../../../../implplan/SPRINT_20260110_101_001_DB_schema_core_tables.md)

## Overview

This document provides the configuration reference for the Release Orchestrator, including environment variables and OPA policy examples.

---

## Environment Variables

### Core Configuration

```bash
# Database
STELLA_DATABASE_URL=postgresql://user:pass@host:5432/stella
STELLA_REDIS_URL=redis://host:6379
STELLA_SECRET_KEY=base64-encoded-32-bytes
STELLA_LOG_LEVEL=info
STELLA_LOG_FORMAT=json
```

### Authentication (Authority)

```bash
# OAuth/OIDC
STELLA_OAUTH_ISSUER=https://auth.example.com
STELLA_OAUTH_CLIENT_ID=stella-app
STELLA_OAUTH_CLIENT_SECRET=secret
```

### Agents

```bash
# Agent TLS
STELLA_AGENT_LISTEN_PORT=8443
STELLA_AGENT_TLS_CERT=/path/to/cert.pem
STELLA_AGENT_TLS_KEY=/path/to/key.pem
STELLA_AGENT_CA_CERT=/path/to/ca.pem
```

### Plugins

```bash
# Plugin configuration
STELLA_PLUGIN_DIR=/var/stella/plugins
STELLA_PLUGIN_SANDBOX_MEMORY=512m
STELLA_PLUGIN_SANDBOX_CPU=1
```

### Integrations

```bash
# Vault integration
STELLA_VAULT_ADDR=https://vault.example.com
STELLA_VAULT_TOKEN=hvs.xxx
```

---

## Full Configuration File

```yaml
# stella-config.yaml

database:
  url: postgresql://user:pass@host:5432/stella
  pool_size: 20
  ssl_mode: require

redis:
  url: redis://host:6379
  prefix: stella

auth:
  issuer: https://auth.example.com
  client_id: stella-app
  client_secret_ref: vault://secrets/oauth-client-secret

agents:
  listen_port: 8443
  tls:
    cert_path: /etc/stella/agent.crt
    key_path: /etc/stella/agent.key
    ca_path: /etc/stella/ca.crt
  heartbeat_interval: 30
  task_timeout: 600

plugins:
  directory: /var/stella/plugins
  sandbox:
    memory: 512m
    cpu: 1
    network: restricted

evidence:
  storage_path: /var/stella/evidence
  signing_key_ref: vault://secrets/evidence-signing-key
  retention_days: 2555  # 7 years

logging:
  level: info
  format: json
  output: stdout

telemetry:
  enabled: true
  otlp_endpoint: otel-collector:4317
  service_name: stella-release-orchestrator
```

---

## OPA Policy Examples

### Security Gate Policy

```rego
# security_gate.rego
package stella.gates.security

default allow = false

# Allow only when no component has reachable critical or high findings.
# Each `components[_]` reference binds existentially, so the rule is phrased
# as "no component violates" rather than "some component passes".
allow {
  not any_reachable_findings
}

any_reachable_findings {
  input.release.components[_].security.reachable_critical > 0
}

any_reachable_findings {
  input.release.components[_].security.reachable_high > 0
}

deny[msg] {
  component := input.release.components[_]
  component.security.reachable_critical > 0
  msg := sprintf("Component %s has %d reachable critical vulnerabilities",
    [component.name, component.security.reachable_critical])
}
```
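For reference, an input document of the shape this policy inspects — the field names are taken from the policy itself; the surrounding structure is illustrative:

```json
{
  "release": {
    "components": [
      { "name": "api", "security": { "reachable_critical": 0, "reachable_high": 0 } },
      { "name": "worker", "security": { "reachable_critical": 1, "reachable_high": 0 } }
    ]
  }
}
```

With this input, `deny` yields one message for `worker`, since its reachable critical count exceeds zero.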
### Approval Gate Policy

```rego
# approval_gate.rego
package stella.gates.approval

import future.keywords.in

default allow = false

allow {
  count(input.approvals) >= input.environment.required_approvals
  separation_of_duties_met
}

separation_of_duties_met {
  not input.environment.require_sod
}

separation_of_duties_met {
  input.environment.require_sod
  approver_ids := {a.approver_id | a := input.approvals[_]; a.action == "approved"}
  not input.promotion.requested_by in approver_ids
}
```
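The separation-of-duties rule above can also be mirrored client-side as a preflight check before submitting an approval. A sketch (the object shapes are illustrative assumptions, not the API's types):

```typescript
// Client-side mirror of the separation-of-duties check in approval_gate.rego.
// Shapes are illustrative assumptions for this sketch.
interface Approval { approverId: string; action: "approved" | "rejected" }

function sodSatisfied(requireSod: boolean, requestedBy: string, approvals: Approval[]): boolean {
  if (!requireSod) return true; // SoD not enforced for this environment
  const approverIds = new Set(
    approvals.filter((a) => a.action === "approved").map((a) => a.approverId),
  );
  // The requester must not appear among the approvers.
  return !approverIds.has(requestedBy);
}
```

This only pre-screens the obvious self-approval case; the server-side policy remains the source of truth.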
### Freeze Window Gate Policy

```rego
# freeze_window_gate.rego
package stella.gates.freeze

import future.keywords.in

default allow = true

allow = false {
  window := input.environment.freeze_windows[_]
  time.now_ns() >= time.parse_rfc3339_ns(window.start)
  time.now_ns() <= time.parse_rfc3339_ns(window.end)
  not input.promotion.requested_by in window.exceptions
}
```
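The same freeze-window logic, as a plain function: a promotion is blocked while the current time falls inside any window, unless the requester is listed as an exception. A sketch with assumed shapes:

```typescript
// Freeze-window check mirroring freeze_window_gate.rego.
// Window shape (start/end as RFC 3339 strings, exceptions as user ids)
// is an illustrative assumption.
interface FreezeWindow { start: string; end: string; exceptions: string[] }

function isFrozenFor(windows: FreezeWindow[], requestedBy: string, now: Date = new Date()): boolean {
  return windows.some((w) => {
    const inWindow = now >= new Date(w.start) && now <= new Date(w.end);
    return inWindow && !w.exceptions.includes(requestedBy);
  });
}
```

Note the default-allow semantics: with no windows configured, `isFrozenFor` returns `false` and the promotion proceeds.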
---

## API Error Codes

| Code | HTTP Status | Description |
|------|-------------|-------------|
| `RELEASE_NOT_FOUND` | 404 | Release with specified ID does not exist |
| `ENVIRONMENT_NOT_FOUND` | 404 | Environment with specified ID does not exist |
| `PROMOTION_BLOCKED` | 403 | Promotion blocked by policy gates |
| `APPROVAL_REQUIRED` | 403 | Additional approvals required |
| `FREEZE_WINDOW_ACTIVE` | 403 | Environment is in freeze window |
| `DIGEST_MISMATCH` | 400 | Image digest does not match expected |
| `AGENT_OFFLINE` | 503 | Required agent is offline |
| `WORKFLOW_FAILED` | 500 | Workflow execution failed |
| `PLUGIN_ERROR` | 500 | Plugin returned an error |
| `QUOTA_EXCEEDED` | 429 | Digest analysis quota exceeded |
| `VALIDATION_ERROR` | 400 | Request validation failed |
| `UNAUTHORIZED` | 401 | Authentication required |
| `FORBIDDEN` | 403 | Insufficient permissions |

---

## Default Values

| Setting | Default | Description |
|---------|---------|-------------|
| Agent heartbeat interval | 30s | Frequency of agent heartbeats |
| Task timeout | 600s | Maximum time for agent task |
| Deployment batch size | 25% | Percentage of targets per batch |
| Health check timeout | 60s | Timeout for health checks |
| Evidence retention | 7 years | Audit compliance requirement |
| Max workflow steps | 50 | Maximum steps per workflow |
| Max parallel tasks | 10 | Per-agent concurrent tasks |

---

## See Also

- [Security Overview](../security/overview.md)
- [Promotion Manager](../modules/promotion-manager.md)
- [Database Schema](../data-model/schema.md)
- [Glossary](glossary.md)

296
docs/modules/release-orchestrator/appendices/errors.md
Normal file
@@ -0,0 +1,296 @@
# API Error Codes

## Overview

All API errors follow a consistent format with error codes for programmatic handling.

## Error Response Format

```typescript
interface ApiErrorResponse {
  success: false;
  error: {
    code: string;        // Machine-readable error code
    message: string;     // Human-readable message
    details?: object;    // Additional context
    validationErrors?: ValidationError[];
  };
  meta: {
    requestId: string;
    timestamp: string;
  };
}

interface ValidationError {
  field: string;
  message: string;
  code: string;
}
```
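Clients can narrow an unknown response body to this shape before branching on `error.code`. A minimal structural type guard (a sketch; it checks only the fields a client typically needs, not the full schema):

```typescript
// Narrow an unknown response body to the ApiErrorResponse error shape.
// Minimal structural check, not an exhaustive schema validation.
function isApiError(body: unknown): body is { success: false; error: { code: string; message: string } } {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  if (b.success !== false) return false;
  const err = b.error as Record<string, unknown> | undefined;
  return !!err && typeof err.code === "string" && typeof err.message === "string";
}
```

Once narrowed, a handler can route on the code's prefix, for example treating codes starting with `AUTH_` as a signal to re-authenticate.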
## Error Code Categories

| Prefix | Category | HTTP Status Range |
|--------|----------|-------------------|
| `AUTH_` | Authentication | 401 |
| `PERM_` | Authorization/Permission | 403 |
| `VAL_` | Validation | 400 |
| `RES_` | Resource | 404, 409 |
| `ENV_` | Environment | 422 |
| `REL_` | Release | 422 |
| `PROM_` | Promotion | 422 |
| `DEPLOY_` | Deployment | 422 |
| `GATE_` | Gate | 422 |
| `AGT_` | Agent | 422 |
| `INT_` | Integration | 422 |
| `WF_` | Workflow | 422 |
| `SYS_` | System | 500 |

## Authentication Errors (401)

| Code | Message | Description |
|------|---------|-------------|
| `AUTH_TOKEN_MISSING` | Authentication token required | No token provided |
| `AUTH_TOKEN_INVALID` | Invalid authentication token | Token cannot be parsed |
| `AUTH_TOKEN_EXPIRED` | Authentication token expired | Token has expired |
| `AUTH_TOKEN_REVOKED` | Authentication token revoked | Token has been revoked |
| `AUTH_AGENT_CERT_INVALID` | Invalid agent certificate | Agent mTLS cert invalid |
| `AUTH_AGENT_CERT_EXPIRED` | Agent certificate expired | Agent cert has expired |
| `AUTH_API_KEY_INVALID` | Invalid API key | API key not recognized |

## Permission Errors (403)

| Code | Message | Description |
|------|---------|-------------|
| `PERM_DENIED` | Permission denied | Generic permission denial |
| `PERM_RESOURCE_DENIED` | Access to resource denied | Cannot access specific resource |
| `PERM_ACTION_DENIED` | Action not permitted | Cannot perform specific action |
| `PERM_SCOPE_DENIED` | Outside permitted scope | Action outside user's scope |
| `PERM_SOD_VIOLATION` | Separation of duties violation | SoD prevents action |
| `PERM_SELF_APPROVAL` | Cannot approve own request | Self-approval not allowed |
| `PERM_TENANT_MISMATCH` | Tenant mismatch | Resource belongs to different tenant |

## Validation Errors (400)

| Code | Message | Description |
|------|---------|-------------|
| `VAL_REQUIRED_FIELD` | Required field missing | Field is required |
| `VAL_INVALID_FORMAT` | Invalid field format | Field format incorrect |
| `VAL_INVALID_VALUE` | Invalid field value | Value not in allowed set |
| `VAL_TOO_LONG` | Field value too long | Exceeds max length |
| `VAL_TOO_SHORT` | Field value too short | Below min length |
| `VAL_INVALID_UUID` | Invalid UUID format | Not a valid UUID |
| `VAL_INVALID_DIGEST` | Invalid digest format | Not a valid OCI digest |
| `VAL_INVALID_SEMVER` | Invalid semver format | Not valid semantic version |
| `VAL_INVALID_JSON` | Invalid JSON | Request body not valid JSON |
| `VAL_SCHEMA_MISMATCH` | Schema validation failed | Doesn't match schema |

## Resource Errors (404, 409)

| Code | Message | HTTP | Description |
|------|---------|------|-------------|
| `RES_NOT_FOUND` | Resource not found | 404 | Generic not found |
| `RES_ENVIRONMENT_NOT_FOUND` | Environment not found | 404 | Environment doesn't exist |
| `RES_RELEASE_NOT_FOUND` | Release not found | 404 | Release doesn't exist |
| `RES_PROMOTION_NOT_FOUND` | Promotion not found | 404 | Promotion doesn't exist |
| `RES_TARGET_NOT_FOUND` | Target not found | 404 | Target doesn't exist |
| `RES_AGENT_NOT_FOUND` | Agent not found | 404 | Agent doesn't exist |
| `RES_CONFLICT` | Resource conflict | 409 | Resource state conflict |
| `RES_ALREADY_EXISTS` | Resource already exists | 409 | Duplicate resource |
| `RES_VERSION_CONFLICT` | Version conflict | 409 | Optimistic lock failure |

## Environment Errors (422)

| Code | Message | Description |
|------|---------|-------------|
| `ENV_FROZEN` | Environment is frozen | Deployment blocked by freeze window |
| `ENV_FREEZE_ACTIVE` | Active freeze window | Cannot modify during freeze |
| `ENV_INVALID_ORDER` | Invalid environment order | Order index conflict |
| `ENV_CIRCULAR_PROMOTION` | Circular promotion path | Auto-promote creates cycle |
| `ENV_QUOTA_EXCEEDED` | Environment quota exceeded | Max environments reached |

## Release Errors (422)

| Code | Message | Description |
|------|---------|-------------|
| `REL_ALREADY_FINALIZED` | Release already finalized | Cannot modify finalized release |
| `REL_NOT_READY` | Release not ready | Release not in ready state |
| `REL_DIGEST_MISMATCH` | Digest mismatch | Resolved digest differs |
| `REL_TAG_NOT_FOUND` | Tag not found in registry | Cannot resolve tag |
| `REL_COMPONENT_MISSING` | Component not found | Referenced component missing |
| `REL_INVALID_STATUS_TRANSITION` | Invalid status transition | Status change not allowed |
| `REL_DEPRECATED` | Release deprecated | Cannot promote deprecated release |

## Promotion Errors (422)

| Code | Message | Description |
|------|---------|-------------|
| `PROM_ALREADY_EXISTS` | Promotion already pending | Duplicate promotion request |
| `PROM_NOT_PENDING` | Promotion not pending | Cannot approve/reject |
| `PROM_ALREADY_APPROVED` | Promotion already approved | Already approved |
| `PROM_ALREADY_REJECTED` | Promotion already rejected | Already rejected |
| `PROM_ALREADY_CANCELLED` | Promotion already cancelled | Already cancelled |
| `PROM_DEPLOYING` | Promotion is deploying | Cannot cancel during deploy |
| `PROM_INVALID_STATE` | Invalid promotion state | State doesn't allow action |
| `PROM_APPROVER_REQUIRED` | Additional approvers required | Insufficient approvals |
| `PROM_SKIP_ENVIRONMENT` | Cannot skip environments | Must promote sequentially |

## Deployment Errors (422)

| Code | Message | Description |
|------|---------|-------------|
| `DEPLOY_IN_PROGRESS` | Deployment in progress | Another deployment running |
| `DEPLOY_NO_TARGETS` | No targets available | No targets in environment |
| `DEPLOY_TARGET_UNHEALTHY` | Target unhealthy | Target failed health check |
| `DEPLOY_AGENT_UNAVAILABLE` | Agent unavailable | Required agent offline |
| `DEPLOY_ARTIFACT_MISSING` | Deployment artifact missing | Required artifact not found |
| `DEPLOY_TIMEOUT` | Deployment timeout | Exceeded timeout |
| `DEPLOY_PULL_FAILED` | Image pull failed | Cannot pull container image |
| `DEPLOY_DIGEST_VERIFICATION_FAILED` | Digest verification failed | Image tampered |
| `DEPLOY_HEALTH_CHECK_FAILED` | Health check failed | Post-deploy health failed |
| `DEPLOY_ROLLBACK_IN_PROGRESS` | Rollback in progress | Already rolling back |
| `DEPLOY_NOTHING_TO_ROLLBACK` | Nothing to rollback | No previous deployment |

## Gate Errors (422)

| Code | Message | Description |
|------|---------|-------------|
| `GATE_EVALUATION_FAILED` | Gate evaluation failed | Gate cannot be evaluated |
| `GATE_SECURITY_BLOCKED` | Blocked by security gate | Security policy violation |
| `GATE_POLICY_BLOCKED` | Blocked by policy gate | Custom policy violation |
| `GATE_APPROVAL_BLOCKED` | Blocked pending approval | Awaiting approval |
| `GATE_TIMEOUT` | Gate evaluation timeout | Evaluation exceeded timeout |

## Agent Errors (422)

| Code | Message | Description |
|------|---------|-------------|
| `AGT_REGISTRATION_FAILED` | Agent registration failed | Cannot register agent |
| `AGT_TOKEN_INVALID` | Invalid registration token | Bad or expired token |
| `AGT_TOKEN_USED` | Registration token already used | One-time token reused |
| `AGT_CERTIFICATE_FAILED` | Certificate issuance failed | Cannot issue certificate |
| `AGT_OFFLINE` | Agent offline | Agent not responding |
| `AGT_CAPABILITY_MISSING` | Missing capability | Agent lacks required capability |
| `AGT_TASK_FAILED` | Task execution failed | Agent task failed |
| `AGT_HEARTBEAT_TIMEOUT` | Heartbeat timeout | Agent heartbeat overdue |

## Integration Errors (422)

| Code | Message | Description |
|------|---------|-------------|
| `INT_CONNECTION_FAILED` | Connection failed | Cannot connect to integration |
| `INT_AUTH_FAILED` | Authentication failed | Integration auth failed |
| `INT_RATE_LIMITED` | Rate limited | Integration rate limit hit |
| `INT_TIMEOUT` | Integration timeout | Request timeout |
| `INT_INVALID_RESPONSE` | Invalid response | Unexpected response format |
| `INT_RESOURCE_NOT_FOUND` | External resource not found | Registry/SCM resource missing |

## Workflow Errors (422)

| Code | Message | Description |
|------|---------|-------------|
| `WF_TEMPLATE_NOT_FOUND` | Workflow template not found | Template doesn't exist |
| `WF_TEMPLATE_INVALID` | Invalid workflow template | Template validation failed |
| `WF_CYCLE_DETECTED` | Cycle detected in workflow | DAG contains cycle |
| `WF_STEP_FAILED` | Workflow step failed | Step execution failed |
| `WF_ALREADY_RUNNING` | Workflow already running | Duplicate workflow run |
| `WF_INVALID_STATE` | Invalid workflow state | Cannot perform action |
| `WF_EXPRESSION_ERROR` | Expression evaluation error | Bad expression |

## System Errors (500)

| Code | Message | Description |
|------|---------|-------------|
| `SYS_INTERNAL_ERROR` | Internal server error | Unexpected error |
| `SYS_DATABASE_ERROR` | Database error | Database operation failed |
| `SYS_STORAGE_ERROR` | Storage error | Storage operation failed |
| `SYS_VAULT_ERROR` | Vault error | Secret retrieval failed |
| `SYS_QUEUE_ERROR` | Queue error | Message queue failed |
| `SYS_SERVICE_UNAVAILABLE` | Service unavailable | Dependency unavailable |
| `SYS_OVERLOADED` | System overloaded | Capacity exceeded |

## Example Error Responses

### Validation Error

```json
{
  "success": false,
  "error": {
    "code": "VAL_REQUIRED_FIELD",
    "message": "Validation failed",
    "validationErrors": [
      {
        "field": "releaseId",
        "message": "Release ID is required",
        "code": "VAL_REQUIRED_FIELD"
      },
      {
        "field": "targetEnvironmentId",
        "message": "Invalid UUID format",
        "code": "VAL_INVALID_UUID"
      }
    ]
  },
  "meta": {
    "requestId": "req-12345",
    "timestamp": "2026-01-10T14:30:00Z"
  }
}
```

### Permission Error

```json
{
  "success": false,
  "error": {
    "code": "PERM_SOD_VIOLATION",
    "message": "Separation of duties violation: requester cannot approve their own promotion",
    "details": {
      "promotionId": "promo-uuid",
      "requesterId": "user-uuid",
      "approverId": "user-uuid",
      "environmentId": "env-uuid",
      "requiresSoD": true
    }
  },
  "meta": {
    "requestId": "req-12345",
    "timestamp": "2026-01-10T14:30:00Z"
  }
}
```

### Gate Block Error

```json
{
  "success": false,
  "error": {
    "code": "GATE_SECURITY_BLOCKED",
    "message": "Promotion blocked by security gate",
    "details": {
      "gateName": "security-gate",
      "releaseId": "rel-uuid",
      "targetEnvironment": "production",
      "violations": [
        {
          "type": "critical_vulnerability",
          "count": 3,
          "threshold": 0
        }
      ]
    }
  },
  "meta": {
    "requestId": "req-12345",
    "timestamp": "2026-01-10T14:30:00Z"
  }
}
```

## References

- [API Overview](../api/overview.md)
- [Security Overview](../security/overview.md)

549
docs/modules/release-orchestrator/appendices/evidence-schema.md
Normal file
@@ -0,0 +1,549 @@
# Evidence Packet Schema

## Overview

Evidence packets are cryptographically signed, immutable records of deployment decisions and outcomes. They provide audit-grade proof of who did what, when, and why.

## Evidence Packet Types

| Type | Description | Generated When |
|------|-------------|----------------|
| `release_decision` | Promotion decision evidence | Promotion approved/rejected |
| `deployment` | Deployment execution evidence | Deployment completes |
| `rollback` | Rollback evidence | Rollback completes |
| `ab_promotion` | A/B release promotion evidence | A/B promotion completes |

## Schema Definition

### Evidence Packet Structure

```typescript
interface EvidencePacket {
  // Identification
  id: UUID;
  version: "1.0";
  type: EvidencePacketType;

  // Metadata
  generatedAt: DateTime;
  generatorVersion: string;
  tenantId: UUID;

  // Content
  content: EvidenceContent;

  // Integrity
  contentHash: string;         // SHA-256 of canonical JSON content
  signature: string;           // Base64-encoded signature
  signatureAlgorithm: string;  // "RS256", "ES256"
  signerKeyRef: string;        // Reference to signing key
}

type EvidencePacketType =
  | "release_decision"
  | "deployment"
  | "rollback"
  | "ab_promotion";
```
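The `contentHash` is described as a SHA-256 over canonical JSON. One common canonicalization is serializing with recursively sorted object keys, so identical content always hashes identically. A sketch of that approach (illustrative only; the exact canonical form used in production is not pinned down here):

```typescript
import { createHash } from "node:crypto";

// Serialize with object keys sorted recursively, so the same content always
// produces the same byte sequence, then hash it. Illustrative only: the
// document does not specify the exact canonicalization scheme.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : 1))
      .map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

function contentHash(content: unknown): string {
  return "sha256:" + createHash("sha256").update(canonicalize(content)).digest("hex");
}
```

Because key order no longer matters, `contentHash({ a: 1, b: 2 })` and `contentHash({ b: 2, a: 1 })` agree, which is the property a verifier needs when recomputing the hash from a stored packet.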
### Evidence Content
|
||||
|
||||
```typescript
|
||||
interface EvidenceContent {
|
||||
// What was released
|
||||
release: ReleaseEvidence;
|
||||
|
||||
// Where it was released
|
||||
environment: EnvironmentEvidence;
|
||||
|
||||
// Who requested and approved
|
||||
actors: ActorEvidence;
|
||||
|
||||
// Why it was allowed
|
||||
decision: DecisionEvidence;
|
||||
|
||||
// How it was executed (deployment only)
|
||||
execution?: ExecutionEvidence;
|
||||
|
||||
// Previous state (for rollback)
|
||||
previous?: PreviousStateEvidence;
|
||||
}
|
||||
```
|
||||
|
||||
### Release Evidence
|
||||
|
||||
```typescript
|
||||
interface ReleaseEvidence {
|
||||
id: UUID;
|
||||
name: string;
|
||||
displayName: string;
|
||||
createdAt: DateTime;
|
||||
createdBy: ActorRef;
|
||||
|
||||
components: Array<{
|
||||
id: UUID;
|
||||
name: string;
|
||||
digest: string;
|
||||
semver: string;
|
||||
tag: string;
|
||||
role: "primary" | "sidecar" | "init" | "migration";
|
||||
}>;
|
||||
|
||||
sourceRef?: {
|
||||
scmIntegrationId?: UUID;
|
||||
repository?: string;
|
||||
commitSha?: string;
|
||||
branch?: string;
|
||||
ciIntegrationId?: UUID;
|
||||
buildId?: string;
|
||||
pipelineUrl?: string;
|
||||
};
|
||||
}
|
||||
```
|
||||
|
||||
### Environment Evidence
|
||||
|
||||
```typescript
|
||||
interface EnvironmentEvidence {
|
||||
id: UUID;
|
||||
name: string;
|
||||
displayName: string;
|
||||
orderIndex: number;
|
||||
|
||||
targets: Array<{
|
||||
id: UUID;
|
||||
name: string;
|
||||
type: string;
|
||||
healthStatus: string;
|
||||
}>;
|
||||
|
||||
configuration: {
|
||||
requiredApprovals: number;
|
||||
requireSeparationOfDuties: boolean;
|
||||
promotionPolicy?: string;
|
||||
deploymentTimeout: number;
|
||||
};
|
||||
}
|
||||
```
|
||||
|
||||
### Actor Evidence
|
||||
|
||||
```typescript
|
||||
interface ActorEvidence {
|
||||
requester: ActorRef;
|
||||
requestReason: string;
|
||||
requestedAt: DateTime;
|
||||
|
||||
approvers: Array<{
|
||||
actor: ActorRef;
|
||||
action: "approved" | "rejected";
|
||||
comment?: string;
|
||||
timestamp: DateTime;
|
||||
roles: string[];
|
||||
}>;
|
||||
|
||||
deployer?: {
|
||||
agent: AgentRef;
|
||||
triggeredBy: ActorRef;
|
||||
startedAt: DateTime;
|
||||
};
|
||||
}
|
||||
|
||||
interface ActorRef {
|
||||
id: UUID;
|
||||
type: "user" | "system" | "agent";
|
||||
name: string;
|
||||
email?: string;
|
||||
}
|
||||
|
||||
interface AgentRef {
|
||||
id: UUID;
|
||||
name: string;
|
||||
version: string;
|
||||
}
|
||||
```
|
||||
|
||||
### Decision Evidence
|
||||
|
||||
```typescript
|
||||
interface DecisionEvidence {
|
||||
promotionId: UUID;
|
||||
decision: "allow" | "block";
|
||||
decidedAt: DateTime;
|
||||
|
||||
gateResults: Array<{
|
||||
gateName: string;
|
||||
gateType: string;
|
||||
passed: boolean;
|
||||
blocking: boolean;
|
||||
message: string;
|
||||
evaluatedAt: DateTime;
|
||||
details: object;
|
||||
}>;
|
||||
|
||||
freezeWindowCheck: {
|
||||
checked: boolean;
|
||||
windowActive: boolean;
|
||||
windowId?: UUID;
|
||||
exemption?: {
|
||||
grantedBy: UUID;
|
||||
reason: string;
|
||||
};
|
||||
};
|
||||
|
||||
separationOfDuties: {
|
||||
required: boolean;
|
||||
satisfied: boolean;
|
||||
requesterIds: UUID[];
|
||||
approverIds: UUID[];
|
||||
};
|
||||
}
|
||||
```
|
||||
|
||||
### Execution Evidence
|
||||
|
||||
```typescript
|
||||
interface ExecutionEvidence {
|
||||
deploymentJobId: UUID;
|
||||
strategy: string;
|
||||
startedAt: DateTime;
|
||||
completedAt: DateTime;
|
||||
status: "succeeded" | "failed" | "rolled_back";
|
||||
|
||||
tasks: Array<{
|
||||
targetId: UUID;
|
||||
targetName: string;
|
||||
agentId: UUID;
|
||||
status: string;
|
||||
startedAt: DateTime;
|
||||
completedAt: DateTime;
|
||||
digest: string;
|
||||
stickerWritten: boolean;
|
||||
error?: string;
|
||||
}>;
|
||||
|
||||
artifacts: Array<{
|
||||
name: string;
|
||||
type: string;
|
||||
sha256: string;
|
||||
storageRef: string;
|
||||
}>;
|
||||
|
||||
metrics: {
|
||||
totalTasks: number;
|
||||
succeededTasks: number;
|
||||
failedTasks: number;
|
||||
totalDurationSeconds: number;
|
||||
};
|
||||
}
|
||||
```
|
||||
|
||||
### Previous State Evidence
|
||||
|
||||
```typescript
|
||||
interface PreviousStateEvidence {
|
||||
releaseId: UUID;
|
||||
releaseName: string;
|
||||
deployedAt: DateTime;
|
||||
deployedBy: ActorRef;
|
||||
components: Array<{
|
||||
name: string;
|
||||
digest: string;
|
||||
}>;
|
||||
}
|
||||
```
|
||||
|
||||
## Example Evidence Packet

```json
{
  "id": "evid-12345-uuid",
  "version": "1.0",
  "type": "deployment",
  "generatedAt": "2026-01-10T14:35:00Z",
  "generatorVersion": "stella-evidence-generator@1.5.0",
  "tenantId": "tenant-uuid",

  "content": {
    "release": {
      "id": "rel-uuid",
      "name": "myapp-v2.3.1",
      "displayName": "MyApp v2.3.1",
      "createdAt": "2026-01-10T10:00:00Z",
      "createdBy": {
        "id": "user-uuid",
        "type": "user",
        "name": "John Doe",
        "email": "john@example.com"
      },
      "components": [
        {
          "id": "comp-api-uuid",
          "name": "api",
          "digest": "sha256:abc123def456...",
          "semver": "2.3.1",
          "tag": "v2.3.1",
          "role": "primary"
        },
        {
          "id": "comp-worker-uuid",
          "name": "worker",
          "digest": "sha256:789xyz...",
          "semver": "2.3.1",
          "tag": "v2.3.1",
          "role": "primary"
        }
      ],
      "sourceRef": {
        "repository": "github.com/myorg/myapp",
        "commitSha": "abc123",
        "branch": "main",
        "buildId": "build-456"
      }
    },

    "environment": {
      "id": "env-prod-uuid",
      "name": "production",
      "displayName": "Production",
      "orderIndex": 2,
      "targets": [
        {
          "id": "target-1-uuid",
          "name": "prod-web-01",
          "type": "compose_host",
          "healthStatus": "healthy"
        },
        {
          "id": "target-2-uuid",
          "name": "prod-web-02",
          "type": "compose_host",
          "healthStatus": "healthy"
        }
      ],
      "configuration": {
        "requiredApprovals": 2,
        "requireSeparationOfDuties": true,
        "deploymentTimeout": 600
      }
    },

    "actors": {
      "requester": {
        "id": "user-john-uuid",
        "type": "user",
        "name": "John Doe",
        "email": "john@example.com"
      },
      "requestReason": "Release v2.3.1 with performance improvements",
      "requestedAt": "2026-01-10T12:00:00Z",
      "approvers": [
        {
          "actor": {
            "id": "user-jane-uuid",
            "type": "user",
            "name": "Jane Smith",
            "email": "jane@example.com"
          },
          "action": "approved",
          "comment": "LGTM, tests passed",
          "timestamp": "2026-01-10T13:00:00Z",
          "roles": ["release_manager"]
        },
        {
          "actor": {
            "id": "user-bob-uuid",
            "type": "user",
            "name": "Bob Johnson",
            "email": "bob@example.com"
          },
          "action": "approved",
          "comment": "Approved for production",
          "timestamp": "2026-01-10T13:30:00Z",
          "roles": ["approver"]
        }
      ],
      "deployer": {
        "agent": {
          "id": "agent-prod-uuid",
          "name": "prod-agent-01",
          "version": "1.5.0"
        },
        "triggeredBy": {
          "id": "system",
          "type": "system",
          "name": "Stella Orchestrator"
        },
        "startedAt": "2026-01-10T14:00:00Z"
      }
    },

    "decision": {
      "promotionId": "promo-uuid",
      "decision": "allow",
      "decidedAt": "2026-01-10T13:55:00Z",
      "gateResults": [
        {
          "gateName": "security-gate",
          "gateType": "security",
          "passed": true,
          "blocking": true,
          "message": "No critical or high vulnerabilities",
          "evaluatedAt": "2026-01-10T13:50:00Z",
          "details": {
            "critical": 0,
            "high": 0,
            "medium": 5,
            "low": 12
          }
        },
        {
          "gateName": "approval-gate",
          "gateType": "approval",
          "passed": true,
          "blocking": true,
          "message": "2/2 required approvals received",
          "evaluatedAt": "2026-01-10T13:55:00Z",
          "details": {
            "required": 2,
            "received": 2
          }
        }
      ],
      "freezeWindowCheck": {
        "checked": true,
        "windowActive": false
      },
      "separationOfDuties": {
        "required": true,
        "satisfied": true,
        "requesterIds": ["user-john-uuid"],
        "approverIds": ["user-jane-uuid", "user-bob-uuid"]
      }
    },

    "execution": {
      "deploymentJobId": "job-uuid",
      "strategy": "rolling",
      "startedAt": "2026-01-10T14:00:00Z",
      "completedAt": "2026-01-10T14:35:00Z",
      "status": "succeeded",
      "tasks": [
        {
          "targetId": "target-1-uuid",
          "targetName": "prod-web-01",
          "agentId": "agent-prod-uuid",
          "status": "succeeded",
          "startedAt": "2026-01-10T14:00:00Z",
          "completedAt": "2026-01-10T14:15:00Z",
          "digest": "sha256:abc123def456...",
          "stickerWritten": true
        },
        {
          "targetId": "target-2-uuid",
          "targetName": "prod-web-02",
          "agentId": "agent-prod-uuid",
          "status": "succeeded",
          "startedAt": "2026-01-10T14:20:00Z",
          "completedAt": "2026-01-10T14:35:00Z",
          "digest": "sha256:abc123def456...",
          "stickerWritten": true
        }
      ],
      "artifacts": [
        {
          "name": "compose.stella.lock.yml",
          "type": "compose-lock",
          "sha256": "checksum...",
          "storageRef": "s3://artifacts/job-uuid/compose.stella.lock.yml"
        }
      ],
      "metrics": {
        "totalTasks": 2,
        "succeededTasks": 2,
        "failedTasks": 0,
        "totalDurationSeconds": 2100
      }
    }
  },

  "contentHash": "sha256:content-hash...",
  "signature": "base64-signature...",
  "signatureAlgorithm": "RS256",
  "signerKeyRef": "stella/signing/prod-key-2026"
}
```
## Signature Verification

```typescript
async function verifyEvidencePacket(packet: EvidencePacket): Promise<VerificationResult> {
  // 1. Verify content hash
  const canonicalContent = canonicalize(packet.content);
  const computedHash = sha256(canonicalContent);

  if (computedHash !== packet.contentHash) {
    return { valid: false, error: "Content hash mismatch" };
  }

  // 2. Get signing key
  const publicKey = await getPublicKey(packet.signerKeyRef);

  // 3. Verify signature
  const signatureValid = await verify(
    packet.signature,
    packet.contentHash,
    publicKey,
    packet.signatureAlgorithm
  );

  if (!signatureValid) {
    return { valid: false, error: "Invalid signature" };
  }

  return { valid: true };
}
```
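`canonicalize` is not defined above. The sketch below assumes an RFC 8785-style canonical JSON, in which object keys are serialized in sorted order so that the same content always produces the same bytes, and therefore the same `contentHash`; the real implementation may differ:

```typescript
// Minimal canonical-JSON sketch (assumption: RFC 8785-style key ordering).
// Primitives use default JSON encoding; objects emit keys sorted ascending.
function canonicalize(value: unknown): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  const entries = Object.entries(value as Record<string, unknown>)
    .sort(([a], [b]) => (a < b ? -1 : 1))
    .map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`);
  return `{${entries.join(",")}}`;
}
```

Two packets whose `content` objects differ only in key insertion order canonicalize to identical strings, so their hashes match.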
## Storage

Evidence packets are stored in an append-only table:

```sql
CREATE TABLE release.evidence_packets (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    promotion_id UUID NOT NULL REFERENCES release.promotions(id),
    type TEXT NOT NULL,
    version TEXT NOT NULL DEFAULT '1.0',
    content JSONB NOT NULL,
    content_hash TEXT NOT NULL,
    signature TEXT NOT NULL,
    signature_algorithm TEXT NOT NULL,
    signer_key_ref TEXT NOT NULL,
    generated_at TIMESTAMPTZ NOT NULL,
    generator_version TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
    -- Note: no updated_at column; packets are immutable
);

-- Prevent modifications
REVOKE UPDATE, DELETE ON release.evidence_packets FROM app_role;
```
## Export Formats

Evidence packets can be exported in multiple formats:

| Format | Use Case |
|--------|----------|
| JSON | API consumption, archival |
| PDF | Human-readable compliance reports |
| CSV | Spreadsheet analysis |
| SLSA | SLSA provenance format |
## References

- [Security Overview](../security/overview.md)
- [Deployment Artifacts](../deployment/artifacts.md)
- [Audit Trail](../security/audit-trail.md)
235
docs/modules/release-orchestrator/appendices/glossary.md
Normal file
@@ -0,0 +1,235 @@
# Glossary

## Core Concepts

### Agent
A software component installed on deployment targets that receives and executes deployment tasks. Agents communicate with the orchestrator via mTLS and execute deployments locally on the target.

### Approval
A human decision to authorize a promotion request. Approvals may require multiple approvers and enforce separation of duties.

### Approval Policy
Rules defining who can approve promotions to specific environments, including required approval counts and SoD requirements.

### Blue-Green Deployment
A deployment strategy using two identical production environments. Traffic switches from "blue" (current) to "green" (new) after validation.

### Canary Deployment
A deployment strategy that gradually rolls out changes to a small subset of targets before full deployment, allowing validation with real traffic.

### Channel
A version stream for components (e.g., "stable", "beta", "nightly"). Each channel tracks the latest compatible version.

### Component
A deployable unit mapped to a container image repository. Components have versions tracked via digest.

### Compose Lock
A Docker Compose file with all image references pinned to specific digests, ensuring reproducible deployments.

### Connector
A plugin that integrates Release Orchestrator with external systems (registries, CI/CD, notifications, etc.).

### Decision Record
An immutable record of all gate evaluations and conditions considered when making a promotion decision.

### Deployment Job
A unit of work representing the deployment of a release to an environment. Contains multiple deployment tasks.

### Deployment Task
A single target-level deployment operation within a deployment job.

### Digest
A cryptographic hash (SHA-256) that uniquely identifies a container image. Format: `sha256:abc123...`

### Drift
A mismatch between the expected deployed version (from the version sticker) and the actual running version on a target.

### Environment
A logical grouping of deployment targets representing a stage in the promotion pipeline (e.g., dev, staging, production).

### Evidence Packet
An immutable, cryptographically signed record of deployment decisions and outcomes for audit purposes.

### Freeze Window
A time period during which deployments to an environment are blocked (e.g., holiday code freeze).

### Gate
A checkpoint in the promotion workflow that must pass before deployment proceeds. Types include security gates, approval gates, and custom policy gates.

### Promotion
The process of moving a release from one environment to another, subject to gates and approvals.

### Release
A versioned bundle of component digests representing a deployable unit. Releases are immutable once created.

### Rolling Deployment
A deployment strategy that updates targets in batches, maintaining availability throughout the process.

### Rollback
The process of reverting to a previous release version when a deployment fails or causes issues.

### Security Gate
An automated gate that evaluates security policies (vulnerability thresholds, compliance requirements) before allowing promotion.

### Separation of Duties (SoD)
A security principle requiring that the person who requests a promotion cannot be the same person who approves it.

### Step
A single unit of work within a workflow template. Steps have types (deploy, approve, notify, etc.) and can have dependencies.

### Target
A specific deployment destination (host, service, container) within an environment.

### Tenant
An isolated organizational unit with its own environments, releases, and configurations. Multi-tenancy ensures data isolation.

### Version Map
A mapping of image tags to digests for a component, allowing tag-based references while maintaining digest-based deployments.

### Version Sticker
Metadata placed on deployment targets indicating the currently deployed release and digest.

### Workflow
A DAG (Directed Acyclic Graph) of steps defining the deployment process, including gates, approvals, and verification.

### Workflow Template
A reusable workflow definition that can be customized for specific deployment scenarios.
## Module Abbreviations

| Abbreviation | Full Name | Description |
|--------------|-----------|-------------|
| INTHUB | Integration Hub | External system integration |
| ENVMGR | Environment Manager | Environment and target management |
| RELMAN | Release Management | Component and release management |
| WORKFL | Workflow Engine | Workflow execution |
| PROMOT | Promotion & Approval | Promotion and approval handling |
| DEPLOY | Deployment Execution | Deployment orchestration |
| AGENTS | Deployment Agents | Agent management |
| PROGDL | Progressive Delivery | A/B and canary releases |
| RELEVI | Release Evidence | Audit and compliance |
| PLUGIN | Plugin Infrastructure | Plugin system |
## Deployment Strategies

| Strategy | Description |
|----------|-------------|
| All-at-once | Deploy to all targets simultaneously |
| Rolling | Deploy in batches while maintaining availability |
| Canary | Gradual rollout with metrics validation |
| Blue-Green | Parallel environment with traffic switch |
## Status Values

### Promotion Status

| Status | Description |
|--------|-------------|
| `pending` | Promotion created, not yet evaluated |
| `pending_approval` | Waiting for human approval |
| `approved` | Approved, ready for deployment |
| `rejected` | Rejected by approver |
| `deploying` | Deployment in progress |
| `completed` | Successfully deployed |
| `failed` | Deployment failed |
| `cancelled` | Cancelled by user |

### Deployment Job Status

| Status | Description |
|--------|-------------|
| `pending` | Job created, not started |
| `preparing` | Generating artifacts |
| `running` | Tasks executing |
| `completing` | Verifying deployment |
| `completed` | Successfully completed |
| `failed` | Deployment failed |
| `rolling_back` | Rollback in progress |
| `rolled_back` | Rollback completed |

### Agent Status

| Status | Description |
|--------|-------------|
| `online` | Agent connected and healthy |
| `offline` | Agent not connected |
| `degraded` | Agent connected but reporting issues |

### Target Health Status

| Status | Description |
|--------|-------------|
| `healthy` | Target responding correctly |
| `unhealthy` | Target failing health checks |
| `unknown` | Health status not determined |
## API Error Codes

| Code | Description |
|------|-------------|
| `RELEASE_NOT_FOUND` | Release ID does not exist |
| `ENVIRONMENT_NOT_FOUND` | Environment ID does not exist |
| `PROMOTION_BLOCKED` | Promotion blocked by gate or freeze |
| `APPROVAL_REQUIRED` | Promotion requires approval |
| `INSUFFICIENT_APPROVALS` | Not enough approvals |
| `SOD_VIOLATION` | Separation of duties violated |
| `FREEZE_WINDOW_ACTIVE` | Environment in freeze window |
| `SECURITY_GATE_FAILED` | Security requirements not met |
| `NO_AGENT_AVAILABLE` | No agent available for target |
| `DEPLOYMENT_IN_PROGRESS` | Another deployment running |
| `ROLLBACK_NOT_POSSIBLE` | No previous version to roll back to |
## Integration Types

| Type | Category | Description |
|------|----------|-------------|
| `docker-registry` | Registry | Docker Registry v2 |
| `ecr` | Registry | AWS ECR |
| `acr` | Registry | Azure Container Registry |
| `gcr` | Registry | Google Container Registry |
| `harbor` | Registry | Harbor Registry |
| `gitlab-ci` | CI/CD | GitLab CI/CD |
| `github-actions` | CI/CD | GitHub Actions |
| `jenkins` | CI/CD | Jenkins |
| `slack` | Notification | Slack |
| `teams` | Notification | Microsoft Teams |
| `email` | Notification | Email (SMTP) |
| `hashicorp-vault` | Secrets | HashiCorp Vault |
| `prometheus` | Metrics | Prometheus |
## Workflow Step Types

| Type | Category | Description |
|------|----------|-------------|
| `approval` | Control | Wait for human approval |
| `wait` | Control | Wait for duration |
| `condition` | Control | Branch based on condition |
| `parallel` | Control | Execute children in parallel |
| `security-gate` | Gate | Evaluate security policy |
| `custom-gate` | Gate | Custom OPA policy |
| `freeze-check` | Gate | Check freeze windows |
| `deploy-docker` | Deploy | Deploy single container |
| `deploy-compose` | Deploy | Deploy Compose stack |
| `health-check` | Verify | HTTP/TCP health check |
| `smoke-test` | Verify | Run smoke tests |
| `notify` | Notify | Send notification |
| `webhook` | Integration | Call external webhook |
| `trigger-ci` | Integration | Trigger CI pipeline |
| `rollback` | Recovery | Rollback deployment |
## Security Terms

| Term | Description |
|------|-------------|
| mTLS | Mutual TLS: both client and server authenticate with certificates |
| JWT | JSON Web Token, used for API authentication |
| RBAC | Role-Based Access Control |
| OPA | Open Policy Agent, a policy evaluation engine |
| SoD | Separation of Duties |
| PEP | Policy Enforcement Point |
## References

- [Design Principles](../design/principles.md)
- [API Overview](../api/overview.md)
- [Security Overview](../security/overview.md)
410
docs/modules/release-orchestrator/architecture.md
Normal file
@@ -0,0 +1,410 @@
# Release Orchestrator Architecture

> Technical architecture specification for the Release Orchestrator — Stella Ops Suite's central release control plane for non-Kubernetes container estates.

**Status:** Planned (not yet implemented)

## Overview

The Release Orchestrator transforms Stella Ops Suite from a vulnerability scanning platform into a centralized, auditable release control plane. It sits between CI systems and runtime targets, governing promotion across environments, enforcing security and policy gates, and producing verifiable evidence for every release decision.

### Core Value Proposition

- **Release orchestration** — UI-driven promotion (Dev → Stage → Prod), approvals, policy gates, rollbacks
- **Security decisioning as a gate** — Scan on build, evaluate on release, re-evaluate on CVE updates
- **OCI-digest-first releases** — Immutable digest-based release identity
- **Toolchain-agnostic integrations** — Plug into any SCM, CI, registry, secrets system
- **Auditability + standards** — Evidence packets, SBOM/VEX/attestation support, deterministic replay

## Design Principles

1. **Digest-First Release Identity** — A release is an immutable set of OCI digests, never mutable tags. Tags are resolved to digests at release creation time.

2. **Pluggable Everything, Stable Core** — Integrations are plugins; the core orchestration engine is stable. Plugins contribute UI screens, connector logic, step types, and agent types.

3. **Evidence for Every Decision** — Every deployment/promotion produces an immutable evidence record containing who, what, why, how, and when.

4. **No Feature Gating** — All plans include all features. The only limits are environment count, new digests per day, and fair use on deployments.

5. **Offline-First Operation** — All core operations work in air-gapped environments. Plugins may require connectivity; the core does not.

6. **Immutable Generated Artifacts** — Every deployment generates and stores immutable artifacts (compose lockfiles, scripts, evidence).
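Principle 1 can be sketched as a pinning step at release creation; the resolver callback is illustrative, and the returned shape mirrors the `ComponentDigest` record later in this document:

```typescript
// Sketch of digest-first resolution: a tag is only an input; the release
// stores the digest it resolved to, keeping the tag purely as provenance.
function pinComponent(
  resolve: (repository: string, tag: string) => string,
  repository: string,
  tag: string,
) {
  const digest = resolve(repository, tag);
  if (!/^sha256:[0-9a-f]{6,}/.test(digest)) {
    throw new Error(`resolver returned a non-digest reference: ${digest}`);
  }
  return { repository, digest, resolvedFromTag: tag };
}
```

Once pinned, re-pointing the tag in the registry cannot change what the release deploys.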
## Platform Themes

The Release Orchestrator introduces ten new functional themes:

| Theme | Purpose | Key Modules |
|-------|---------|-------------|
| **INTHUB** | Integration hub | Integration Manager, Connection Profiles, Connector Runtime |
| **ENVMGR** | Environment management | Environment Manager, Target Registry, Agent Manager |
| **RELMAN** | Release management | Component Registry, Version Manager, Release Manager |
| **WORKFL** | Workflow engine | Workflow Designer, Workflow Engine, Step Executor |
| **PROMOT** | Promotion and approval | Promotion Manager, Approval Gateway, Decision Engine |
| **DEPLOY** | Deployment execution | Deploy Orchestrator, Target Executor, Artifact Generator |
| **AGENTS** | Deployment agents | Agent Core, Docker/Compose/ECS/Nomad agents |
| **PROGDL** | Progressive delivery | A/B Manager, Traffic Router, Canary Controller |
| **RELEVI** | Release evidence | Evidence Collector, Sticker Writer, Audit Exporter |
| **PLUGIN** | Plugin infrastructure | Plugin Registry, Plugin Loader, Plugin SDK |
## Components

```
ReleaseOrchestrator/
├── __Libraries/
│   ├── StellaOps.ReleaseOrchestrator.Core/          # Core domain models
│   ├── StellaOps.ReleaseOrchestrator.Workflow/      # DAG workflow engine
│   ├── StellaOps.ReleaseOrchestrator.Promotion/     # Promotion logic
│   ├── StellaOps.ReleaseOrchestrator.Deploy/        # Deployment coordination
│   ├── StellaOps.ReleaseOrchestrator.Evidence/      # Evidence generation
│   ├── StellaOps.ReleaseOrchestrator.Plugin/        # Plugin infrastructure
│   └── StellaOps.ReleaseOrchestrator.Integration/   # Integration connectors
├── StellaOps.ReleaseOrchestrator.WebService/        # HTTP API
├── StellaOps.ReleaseOrchestrator.Worker/            # Background processing
├── StellaOps.Agent.Core/                            # Agent base framework
├── StellaOps.Agent.Docker/                          # Docker host agent
├── StellaOps.Agent.Compose/                         # Docker Compose agent
├── StellaOps.Agent.SSH/                             # SSH agentless executor
├── StellaOps.Agent.WinRM/                           # WinRM agentless executor
├── StellaOps.Agent.ECS/                             # AWS ECS agent
├── StellaOps.Agent.Nomad/                           # HashiCorp Nomad agent
└── __Tests/
    └── StellaOps.ReleaseOrchestrator.*.Tests/
```
## Data Flow

### Release Orchestration Flow

```
CI Build → Registry Push → Webhook → Stella Scan → Create Release →
Request Promotion → Gate Evaluation → Decision Record →
Deploy via Agent → Version Sticker → Evidence Packet
```

### Detailed Flow

1. **CI pushes image** to registry by digest; triggers webhook to Stella
2. **Stella scans** the new digest (if not already scanned); stores verdict
3. **Release created** bundling component digests with a semantic version
4. **Promotion requested** to move the release from source → target environment
5. **Gate evaluation** runs: security verdict, approval count, freeze windows, custom policies
6. **Decision record** produced, signed, and linked to its evidence references
7. **Deployment executed** via agent to target (Docker/Compose/ECS/Nomad)
8. **Version sticker** written to target for drift detection
9. **Evidence packet** sealed and stored
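Step 8 enables later drift detection: the digest recorded in the version sticker is compared with the digest actually running on the target. A sketch of the comparison (sticker field names are assumptions for illustration):

```typescript
// Hypothetical sticker shape: the release and digest written at deploy time.
interface VersionSticker { releaseId: string; digest: string; }

// Drift exists when the running digest no longer matches the sticker.
function detectDrift(sticker: VersionSticker, runningDigest: string) {
  return sticker.digest === runningDigest
    ? { drifted: false as const }
    : { drifted: true as const, expected: sticker.digest, actual: runningDigest };
}
```

Because releases are digest-pinned, equality of digests is sufficient; there is no tag to re-resolve.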
## Key Abstractions

### Environment

```csharp
public sealed record Environment
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required string Name { get; init; }          // "dev", "stage", "prod"
    public required string Slug { get; init; }          // URL-safe identifier
    public required int PromotionOrder { get; init; }   // 1, 2, 3...
    public required FreezeWindow[] FreezeWindows { get; init; }
    public required ApprovalPolicy ApprovalPolicy { get; init; }
    public required bool IsProduction { get; init; }
    public EnvironmentState State { get; init; }        // Active, Frozen, Retired
}
```
### Release

```csharp
public sealed record Release
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required string Version { get; init; }       // SemVer: "2.3.1"
    public required string Name { get; init; }          // Display name
    public required ImmutableDictionary<string, ComponentDigest> Components { get; init; }
    public required string SourceRef { get; init; }     // Git SHA or tag
    public required DateTimeOffset CreatedAt { get; init; }
    public required Guid CreatedBy { get; init; }
    public ReleaseState State { get; init; }            // Draft, Active, Deprecated
}

public sealed record ComponentDigest
{
    public required string Repository { get; init; }        // registry.example.com/app/api
    public required string Digest { get; init; }            // sha256:abc123...
    public required string? ResolvedFromTag { get; init; }  // Optional: "v2.3.1"
}
```
### Promotion

```csharp
public sealed record Promotion
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required Guid ReleaseId { get; init; }
    public required Guid SourceEnvironmentId { get; init; }
    public required Guid TargetEnvironmentId { get; init; }
    public required Guid RequestedBy { get; init; }
    public required DateTimeOffset RequestedAt { get; init; }
    public PromotionState State { get; init; }          // Pending, Approved, Rejected, Deployed, RolledBack
    public required ImmutableArray<GateResult> GateResults { get; init; }
    public required ImmutableArray<ApprovalRecord> Approvals { get; init; }
    public required DecisionRecord? Decision { get; init; }
}
```
### Workflow

```csharp
public sealed record Workflow
{
    public required Guid Id { get; init; }
    public required string Name { get; init; }
    public required ImmutableArray<WorkflowStep> Steps { get; init; }
    public required ImmutableDictionary<string, string[]> DependencyGraph { get; init; }
}

public sealed record WorkflowStep
{
    public required string Id { get; init; }
    public required string Type { get; init; }          // "script", "approval", "deploy", "gate"
    public required StepProvider Provider { get; init; }
    public required ImmutableDictionary<string, object> Config { get; init; }
    public required string[] DependsOn { get; init; }
    public StepState State { get; init; }
}
```
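The `DependsOn` edges above define a DAG from which an execution order can be derived. A sketch of Kahn's algorithm (shown in TypeScript for brevity; the engine's actual scheduler is not specified here). Processing is deterministic for a given step order, and a cycle means the workflow is invalid:

```typescript
// Derive an execution order from DependsOn edges (Kahn's algorithm).
function executionOrder(steps: { id: string; dependsOn: string[] }[]): string[] {
  const remaining = new Map(steps.map((s) => [s.id, new Set(s.dependsOn)]));
  const order: string[] = [];
  while (remaining.size > 0) {
    // Steps whose dependencies have all been scheduled are ready.
    const ready = [...remaining.entries()].filter(([, deps]) => deps.size === 0);
    if (ready.length === 0) throw new Error("cycle detected in workflow DAG");
    for (const [id] of ready) {
      order.push(id);
      remaining.delete(id);
      for (const deps of remaining.values()) deps.delete(id);
    }
  }
  return order;
}
```

Steps in the same `ready` batch have no mutual dependencies and could run in parallel, matching the `parallel` step type.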
### Target

```csharp
public sealed record Target
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required Guid EnvironmentId { get; init; }
    public required string Name { get; init; }
    public required TargetType Type { get; init; }      // DockerHost, ComposeHost, ECSService, NomadJob
    public required ImmutableDictionary<string, string> Labels { get; init; }
    public required Guid? AgentId { get; init; }        // Null for agentless
    public required TargetState State { get; init; }
    public required HealthStatus Health { get; init; }
}

public enum TargetType
{
    DockerHost,
    ComposeHost,
    ECSService,
    NomadJob,
    SSHRemote,
    WinRMRemote
}
```
### Agent

```csharp
public sealed record Agent
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required string Name { get; init; }
    public required string Version { get; init; }
    public required ImmutableArray<string> Capabilities { get; init; }
    public required DateTimeOffset LastHeartbeat { get; init; }
    public required AgentState State { get; init; }     // Online, Offline, Degraded
    public required ImmutableDictionary<string, string> Labels { get; init; }
}
```
## Database Schema

| Table | Purpose |
|-------|---------|
| `release.environments` | Environment definitions with freeze windows |
| `release.targets` | Deployment targets within environments |
| `release.agents` | Registered deployment agents |
| `release.components` | Component definitions (service → repository mapping) |
| `release.releases` | Release bundles (version → component digests) |
| `release.promotions` | Promotion requests and state |
| `release.approvals` | Approval records |
| `release.workflows` | Workflow templates |
| `release.workflow_runs` | Workflow execution state |
| `release.deployment_jobs` | Deployment job records |
| `release.evidence_packets` | Sealed evidence records |
| `release.integrations` | Integration configurations |
| `release.plugins` | Plugin registrations |
## Gate Types

| Gate | Purpose | Evaluation |
|------|---------|------------|
| **Security** | Check scan verdict | Query latest scan for release digest; block on critical/high reachable |
| **Approval** | Human sign-off | Count approvals; check SoD rules |
| **FreezeWindow** | Calendar-based blocking | Check target environment freeze windows |
| **PreviousEnvironment** | Require prior deployment | Verify release deployed to source environment |
| **Policy** | Custom OPA/Rego rules | Evaluate policy with promotion context |
| **HealthCheck** | Target health | Verify target is healthy before deploy |
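Individual gate results combine into a single promotion decision. A sketch of the aggregation rule implied by the `blocking` flag (field names follow the evidence-packet examples; the treatment of non-blocking failures as warnings is an assumption):

```typescript
interface GateResult { gateName: string; passed: boolean; blocking: boolean; }

// Allow only when every blocking gate passed; failed non-blocking gates
// surface as warnings rather than vetoes.
function decide(results: GateResult[]) {
  const blockers = results.filter((r) => r.blocking && !r.passed);
  const warnings = results.filter((r) => !r.blocking && !r.passed);
  return {
    decision: blockers.length === 0 ? "allow" : "block",
    blockedBy: blockers.map((r) => r.gateName),
    warnings: warnings.map((r) => r.gateName),
  };
}
```

In the example evidence packet, both blocking gates passed, producing `"decision": "allow"`.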
## Plugin System (Three-Surface Model)

Plugins contribute through three surfaces:

### 1. Manifest (Static Declaration)

```yaml
# plugin-manifest.yaml
name: github-integration
version: 1.0.0
provider: StellaOps.Integration.GitHub.Plugin
capabilities:
  integrations:
    - type: scm
      id: github
      displayName: GitHub
  steps:
    - type: github-status
      displayName: Update GitHub Status
  gates:
    - type: github-check
      displayName: GitHub Check Required
```
### 2. Connector Runtime (Dynamic Execution)

```csharp
public interface IIntegrationConnector
{
    Task<ConnectionTestResult> TestConnectionAsync(CancellationToken ct);
    Task<HealthStatus> GetHealthAsync(CancellationToken ct);
    Task<IReadOnlyList<Resource>> DiscoverResourcesAsync(string resourceType, CancellationToken ct);
}

public interface ISCMConnector : IIntegrationConnector
{
    Task<CommitInfo> GetCommitAsync(string commitRef, CancellationToken ct);
    Task CreateCommitStatusAsync(string commit, CommitStatus status, CancellationToken ct);
}

public interface IRegistryConnector : IIntegrationConnector
{
    Task<string> ResolveDigestAsync(string imageRef, CancellationToken ct);
    Task<bool> VerifyDigestAsync(string imageRef, string expectedDigest, CancellationToken ct);
}
```
### 3. Step Provider (Execution Contract)

```csharp
public interface IStepProvider
{
    StepExecutionCharacteristics Characteristics { get; }
    Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct);
    Task<StepResult> RollbackAsync(StepContext context, CancellationToken ct);
}

public sealed record StepExecutionCharacteristics
{
    public bool IsIdempotent { get; init; }
    public bool SupportsRollback { get; init; }
    public TimeSpan DefaultTimeout { get; init; }
    public ResourceRequirements Resources { get; init; }
}
```

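A concrete step built on the `IStepProvider` contract might look like the following sketch. The `StepContext`/`StepResult` members used here (`WorkingDirectory`, `Inputs`, `Ok()`) are assumed for illustration and are not confirmed API:

```csharp
// Illustrative IStepProvider implementation: writes a version sticker file.
// Marked idempotent (rewriting the same content is safe) and rollback-capable.
public sealed class WriteVersionStickerStep : IStepProvider
{
    public StepExecutionCharacteristics Characteristics { get; } = new()
    {
        IsIdempotent = true,
        SupportsRollback = true,
        DefaultTimeout = TimeSpan.FromMinutes(1),
        Resources = new ResourceRequirements(),
    };

    public async Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct)
    {
        var path = Path.Combine(context.WorkingDirectory, "version-sticker.json");
        await File.WriteAllTextAsync(path, context.Inputs["sticker"], ct);
        return StepResult.Ok();
    }

    public Task<StepResult> RollbackAsync(StepContext context, CancellationToken ct)
    {
        // Rollback sketch: remove the sticker written by ExecuteAsync.
        File.Delete(Path.Combine(context.WorkingDirectory, "version-sticker.json"));
        return Task.FromResult(StepResult.Ok());
    }
}
```
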
## Invariants

1. **Release identity is immutable** — Once created, a release's component digests cannot be changed. Create a new release instead.
2. **Promotions are append-only** — Promotion state transitions are recorded; no edits or deletions.
3. **Evidence packets are sealed** — Evidence is cryptographically signed and stored immutably.
4. **Digest verification at deploy time** — Agents verify image digests at pull time; mismatch fails deployment.
5. **Separation of duties enforced** — Requester cannot be sole approver for production promotions.
6. **Workflow execution is deterministic** — Same inputs produce same execution order and outputs.

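Invariant 4 can be sketched with a small helper that hashes pulled content and compares it against the expected OCI digest (`DigestVerifier` is a hypothetical name, not part of the shipped agent):

```csharp
using System.Security.Cryptography;

// Sketch: verify pulled content against an expected "sha256:<hex>" digest
// before the agent proceeds with deployment.
public static class DigestVerifier
{
    public static bool Matches(Stream content, string expectedDigest)
    {
        if (!expectedDigest.StartsWith("sha256:", StringComparison.Ordinal))
            throw new ArgumentException("Only sha256 digests supported", nameof(expectedDigest));

        using var sha = SHA256.Create();
        var actual = Convert.ToHexString(sha.ComputeHash(content)).ToLowerInvariant();
        return actual == expectedDigest["sha256:".Length..];
    }
}
```

A mismatch here would fail the deployment task rather than continue with unverified content.
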
## Error Handling

- **Transient failures** — Retry with exponential backoff; circuit breaker for repeated failures
- **Agent disconnection** — Mark agent offline; reassign pending tasks to other agents
- **Deployment failure** — Automatic rollback if configured; otherwise mark promotion as failed
- **Gate failure** — Block promotion; require manual intervention or re-evaluation

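The transient-failure policy can be sketched as exponential backoff with jitter; the attempt count and delay schedule below are illustrative defaults, not spec:

```csharp
// Sketch of a retry helper for transient failures. A circuit breaker would
// wrap this at a higher level and stop calling after repeated failures.
public static class Transient
{
    public static async Task<T> RetryAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        int maxAttempts,
        TimeSpan baseDelay,
        CancellationToken ct)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await operation(ct);
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // 2^(attempt-1) * baseDelay, plus up to 20% random jitter.
                var delay = baseDelay * Math.Pow(2, attempt - 1);
                delay += TimeSpan.FromMilliseconds(
                    Random.Shared.NextDouble() * 0.2 * delay.TotalMilliseconds);
                await Task.Delay(delay, ct);
            }
        }
    }
}
```
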
## Observability

### Metrics

- `release_promotions_total` — Counter by environment and outcome
- `release_deployments_duration_seconds` — Histogram of deployment times
- `release_gate_evaluations_total` — Counter by gate type and result
- `release_agents_online` — Gauge of online agents
- `release_workflow_steps_duration_seconds` — Histogram by step type

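The instruments above could be registered with `System.Diagnostics.Metrics`; the meter name `StellaOps.Release` is an assumption:

```csharp
using System.Diagnostics.Metrics;

// Sketch: expose the documented metric names via a shared Meter.
public static class ReleaseMetrics
{
    private static readonly Meter Meter = new("StellaOps.Release");

    public static readonly Counter<long> Promotions =
        Meter.CreateCounter<long>("release_promotions_total");

    public static readonly Histogram<double> DeploymentDuration =
        Meter.CreateHistogram<double>("release_deployments_duration_seconds", unit: "s");

    public static readonly Counter<long> GateEvaluations =
        Meter.CreateCounter<long>("release_gate_evaluations_total");

    public static void RecordPromotion(string environment, string outcome) =>
        Promotions.Add(1,
            new KeyValuePair<string, object?>("environment", environment),
            new KeyValuePair<string, object?>("outcome", outcome));
}
```
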
### Traces

- `promotion.request` — Span for promotion request handling
- `gate.evaluate` — Span for each gate evaluation
- `deployment.execute` — Span for deployment execution
- `agent.task` — Span for agent task execution

### Logs

- Structured logs with correlation IDs
- Promotion ID, release ID, environment ID in all relevant logs
- Sensitive data (secrets, credentials) masked

## Security Considerations

### Agent Security

- **mTLS authentication** — Agents authenticate with CA-signed certificates
- **Short-lived credentials** — Task credentials expire after execution
- **Capability-based authorization** — Agents only receive tasks matching their capabilities
- **Heartbeat monitoring** — Detect and flag agent disconnections

### Secrets Management

- **Never stored in database** — Only vault references stored
- **Fetched at execution time** — Secrets retrieved just-in-time for deployment
- **Short-lived** — Dynamic credentials with minimal TTL
- **Masked in logs** — Secret values never logged

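The just-in-time pattern can be sketched as follows; `IVaultClient` and `Secret` are hypothetical types standing in for whatever vault integration is configured:

```csharp
// Sketch: only the credential_ref (a vault path) is ever persisted; the
// secret value is fetched at deployment time and must be masked in logs.
public interface IVaultClient
{
    Task<Secret> ReadAsync(string path, CancellationToken ct);
}

public sealed record Secret(string Value, DateTimeOffset ExpiresAt);

public sealed class DeploymentSecretResolver(IVaultClient vault)
{
    public async Task<string> ResolveAsync(string credentialRef, CancellationToken ct)
    {
        // Fetched just-in-time; never written back to the database.
        var secret = await vault.ReadAsync(credentialRef, ct);
        return secret.Value; // caller is responsible for masking in log output
    }
}
```
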
### Plugin Sandbox

- **Resource limits** — CPU, memory, timeout limits per plugin
- **Capability restrictions** — Plugins declare required capabilities
- **Network isolation** — Optional network restrictions for plugins

## Performance Characteristics

- **Promotion evaluation** — < 5 seconds for typical gate evaluation
- **Deployment latency** — Dominated by image pull time; orchestration overhead < 10 seconds
- **Agent heartbeat** — 30-second interval; offline detection within 90 seconds
- **Workflow step timeout** — Configurable; default 5 minutes per step

## Implementation Roadmap

| Phase | Focus | Key Deliverables |
|-------|-------|------------------|
| **Phase 1** | Foundation | Environment management, integration hub, release bundles |
| **Phase 2** | Workflow Engine | DAG execution, step registry, workflow templates |
| **Phase 3** | Promotion & Decision | Approval gateway, security gates, decision records |
| **Phase 4** | Deployment Execution | Docker/Compose agents, artifact generation, rollback |
| **Phase 5** | UI & Polish | Release dashboard, promotion UI, environment management |
| **Phase 6** | Progressive Delivery | A/B releases, canary, traffic routing |
| **Phase 7** | Extended Targets | ECS, Nomad, SSH/WinRM agentless |
| **Phase 8** | Plugin Ecosystem | Full plugin system, marketplace |

## References

- [Product Vision](../../product/VISION.md)
- [Architecture Overview](../../ARCHITECTURE_OVERVIEW.md)
- [Full Orchestrator Specification](../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
- [Competitive Landscape](../../product/competitive-landscape.md)
343
docs/modules/release-orchestrator/data-model/entities.md
Normal file
# Entity Definitions

This document describes the core entities in the Release Orchestrator data model.

## Entity Relationship Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                            ENTITY RELATIONSHIPS                             │
│                                                                             │
│   ┌──────────┐        ┌──────────────┐        ┌────────────┐                │
│   │  Tenant  │────────│ Environment  │────────│   Target   │                │
│   └──────────┘        └──────────────┘        └────────────┘                │
│        │                     │                      │                       │
│        ▼                     ▼                      ▼                       │
│   ┌──────────┐        ┌──────────────┐        ┌────────────┐                │
│   │ Component│        │   Approval   │        │   Agent    │                │
│   └──────────┘        │    Policy    │        └────────────┘                │
│        │              └──────────────┘              │                       │
│        ▼                     │                      ▼                       │
│   ┌──────────┐               │                ┌─────────────┐               │
│   │ Version  │               │                │ Deployment  │               │
│   │   Map    │               │                │    Task     │               │
│   └──────────┘               │                └─────────────┘               │
│        │                     │                      │                       │
│        ▼                     ▼                      │                       │
│   ┌──────────┐        ┌───────────┐         ┌─────────────┐                 │
│   │ Release  │────────│ Promotion │─────────│ Deployment  │                 │
│   └──────────┘        └───────────┘         │    Job      │                 │
│        │                     │              └─────────────┘                 │
│        │                     ▼                      │                       │
│        │              ┌───────────┐                 │                       │
│        │              │ Approval  │                 │                       │
│        │              └───────────┘                 │                       │
│        │                     │                      │                       │
│        │                     ▼                      ▼                       │
│        │              ┌───────────┐          ┌───────────┐                  │
│        │              │ Decision  │          │ Generated │                  │
│        │              │  Record   │          │ Artifacts │                  │
│        │              └───────────┘          └───────────┘                  │
│        │                     │                      │                       │
│        │                     └─────────┬────────────┘                       │
│        │                               ▼                                    │
│        │                        ┌───────────┐                               │
│        └───────────────────────►│ Evidence  │                               │
│                                 │  Packet   │                               │
│                                 └───────────┘                               │
│                                       │                                     │
│                                       ▼                                     │
│                                 ┌───────────┐                               │
│                                 │  Version  │                               │
│                                 │  Sticker  │                               │
│                                 └───────────┘                               │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Core Entities

### Environment

Represents a deployment target environment (dev, staging, production).

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference |
| `name` | string | Unique name (e.g., "prod") |
| `display_name` | string | Display name (e.g., "Production") |
| `order_index` | integer | Promotion order |
| `config` | JSONB | Environment configuration |
| `freeze_windows` | JSONB | Active freeze windows |
| `required_approvals` | integer | Approvals needed for promotion |
| `require_sod` | boolean | Require separation of duties |
| `created_at` | timestamp | Creation time |

### Target

Represents a deployment target (host, service).

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference |
| `environment_id` | UUID | Environment reference |
| `name` | string | Target name |
| `target_type` | string | Type (docker_host, compose_host, etc.) |
| `connection` | JSONB | Connection configuration |
| `labels` | JSONB | Target labels |
| `health_status` | string | Current health status |
| `current_digest` | string | Currently deployed digest |

### Agent

Represents a deployment agent.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference |
| `name` | string | Agent name |
| `version` | string | Agent version |
| `capabilities` | JSONB | Agent capabilities |
| `status` | string | online/offline/degraded |
| `last_heartbeat` | timestamp | Last heartbeat time |

### Component

Represents a deployable component (maps to an image repository).

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference |
| `name` | string | Component name |
| `display_name` | string | Display name |
| `image_repository` | string | Image repository URL |
| `versioning_strategy` | JSONB | How versions are determined |
| `default_channel` | string | Default version channel |

### Version Map

Maps image tags to digests and semantic versions.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `component_id` | UUID | Component reference |
| `tag` | string | Image tag |
| `digest` | string | Image digest (sha256:...) |
| `semver` | string | Semantic version |
| `channel` | string | Version channel (stable, beta) |

### Release

A versioned bundle of component digests.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference |
| `name` | string | Release name |
| `display_name` | string | Display name |
| `components` | JSONB | Component/digest mappings |
| `source_ref` | JSONB | Source code reference |
| `status` | string | draft/ready/deployed/deprecated |
| `created_by` | UUID | Creator user reference |

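The `components` JSONB follows the shape noted in the database schema (`[{componentId, digest, semver, tag, role}]`); a sketch with illustrative values:

```json
[
  {
    "componentId": "00000000-0000-0000-0000-000000000001",
    "digest": "sha256:9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
    "semver": "2.4.1",
    "tag": "v2.4.1",
    "role": "api"
  }
]
```
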
### Promotion

A request to promote a release to an environment.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference |
| `release_id` | UUID | Release reference |
| `source_environment_id` | UUID | Source environment (nullable) |
| `target_environment_id` | UUID | Target environment |
| `status` | string | Promotion status |
| `decision_record` | JSONB | Gate evaluation results |
| `workflow_run_id` | UUID | Associated workflow run |
| `requested_by` | UUID | Requesting user |
| `requested_at` | timestamp | Request time |

### Approval

An approval or rejection of a promotion.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `promotion_id` | UUID | Promotion reference |
| `approver_id` | UUID | Approving user |
| `action` | string | approved/rejected |
| `comment` | string | Approval comment |
| `approved_at` | timestamp | Approval time |

### Deployment Job

A deployment execution job.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `promotion_id` | UUID | Promotion reference |
| `release_id` | UUID | Release reference |
| `environment_id` | UUID | Environment reference |
| `status` | string | Job status |
| `strategy` | string | Deployment strategy |
| `artifacts` | JSONB | Generated artifacts |
| `rollback_of` | UUID | If rollback, original job |

### Deployment Task

A task to deploy to a single target.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `job_id` | UUID | Job reference |
| `target_id` | UUID | Target reference |
| `digest` | string | Digest to deploy |
| `status` | string | Task status |
| `agent_id` | UUID | Assigned agent |
| `logs` | text | Execution logs |
| `previous_digest` | string | Previous digest (for rollback) |

### Evidence Packet

Immutable audit evidence for a promotion/deployment.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `promotion_id` | UUID | Promotion reference |
| `packet_type` | string | Type of evidence |
| `content` | JSONB | Evidence content |
| `content_hash` | string | SHA-256 of content |
| `signature` | string | Cryptographic signature |
| `signer_key_ref` | string | Signing key reference |
| `created_at` | timestamp | Creation time (no update) |

### Version Sticker

Version marker placed on deployment targets.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `target_id` | UUID | Target reference |
| `release_id` | UUID | Release reference |
| `promotion_id` | UUID | Promotion reference |
| `sticker_content` | JSONB | Sticker JSON content |
| `content_hash` | string | Content hash |
| `written_at` | timestamp | Write time |
| `drift_detected` | boolean | Drift detection flag |

## Workflow Entities

### Workflow Template

A reusable workflow definition.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference (null for builtin) |
| `name` | string | Template name |
| `version` | integer | Template version |
| `nodes` | JSONB | Step nodes |
| `edges` | JSONB | Step edges |
| `inputs` | JSONB | Input definitions |
| `outputs` | JSONB | Output definitions |
| `is_builtin` | boolean | Is built-in template |

### Workflow Run

An execution of a workflow template.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `template_id` | UUID | Template reference |
| `template_version` | integer | Template version at execution |
| `status` | string | Run status |
| `context` | JSONB | Execution context |
| `inputs` | JSONB | Input values |
| `outputs` | JSONB | Output values |
| `started_at` | timestamp | Start time |
| `completed_at` | timestamp | Completion time |

### Step Run

Execution of a single step within a workflow run.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `workflow_run_id` | UUID | Workflow run reference |
| `node_id` | string | Node ID from template |
| `status` | string | Step status |
| `inputs` | JSONB | Resolved inputs |
| `outputs` | JSONB | Produced outputs |
| `logs` | text | Execution logs |
| `attempt_number` | integer | Retry attempt number |

## Plugin Entities

### Plugin

A registered plugin.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `plugin_id` | string | Unique plugin identifier |
| `version` | string | Plugin version |
| `vendor` | string | Plugin vendor |
| `manifest` | JSONB | Plugin manifest |
| `status` | string | Plugin status |
| `entrypoint` | string | Plugin entrypoint path |

### Plugin Instance

A tenant-specific plugin configuration.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `plugin_id` | UUID | Plugin reference |
| `tenant_id` | UUID | Tenant reference |
| `config` | JSONB | Tenant configuration |
| `enabled` | boolean | Is enabled for tenant |

## Integration Entities

### Integration

A configured external integration.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference |
| `type_id` | string | Integration type |
| `name` | string | Integration name |
| `config` | JSONB | Integration configuration |
| `credential_ref` | string | Vault credential reference |
| `health_status` | string | Connection health |

## References

- [Database Schema](schema.md)
- [Module Overview](../modules/overview.md)
631
docs/modules/release-orchestrator/data-model/schema.md
Normal file
# Database Schema (PostgreSQL)

This document specifies the complete PostgreSQL schema for the Release Orchestrator.

## Schema Organization

All release orchestration tables reside in the `release` schema:

```sql
CREATE SCHEMA IF NOT EXISTS release;
SET search_path TO release, public;
```

## Core Tables

### Tenant and Authority Extensions

```sql
-- Extended: Add release-related permissions
ALTER TABLE permissions ADD COLUMN IF NOT EXISTS
    resource_type VARCHAR(50) CHECK (resource_type IN (
        'environment', 'release', 'promotion', 'target', 'workflow', 'plugin'
    ));
```

---

## Integration Hub

```sql
CREATE TABLE integration_types (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(100) NOT NULL UNIQUE,
    category VARCHAR(50) NOT NULL CHECK (category IN (
        'scm', 'ci', 'registry', 'vault', 'target', 'router'
    )),
    plugin_id UUID REFERENCES plugins(id),
    config_schema JSONB NOT NULL,
    secrets_schema JSONB NOT NULL,
    is_builtin BOOLEAN NOT NULL DEFAULT FALSE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE integrations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    integration_type_id UUID NOT NULL REFERENCES integration_types(id),
    name VARCHAR(255) NOT NULL,
    config JSONB NOT NULL,
    credential_ref VARCHAR(500), -- Vault path or encrypted ref
    status VARCHAR(50) NOT NULL DEFAULT 'unknown' CHECK (status IN (
        'healthy', 'degraded', 'unhealthy', 'unknown'
    )),
    last_health_check TIMESTAMPTZ,
    last_health_message TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by UUID REFERENCES users(id),
    UNIQUE (tenant_id, name)
);

CREATE INDEX idx_integrations_tenant ON integrations(tenant_id);
CREATE INDEX idx_integrations_type ON integrations(integration_type_id);

CREATE TABLE connection_profiles (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    user_id UUID NOT NULL REFERENCES users(id),
    integration_type_id UUID NOT NULL REFERENCES integration_types(id),
    name VARCHAR(255) NOT NULL,
    config_defaults JSONB NOT NULL,
    is_default BOOLEAN NOT NULL DEFAULT FALSE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, user_id, integration_type_id, name)
);
```

---

## Environment & Inventory

```sql
CREATE TABLE environments (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    name VARCHAR(100) NOT NULL,
    display_name VARCHAR(255) NOT NULL,
    order_index INTEGER NOT NULL,
    config JSONB NOT NULL DEFAULT '{}',
    freeze_windows JSONB NOT NULL DEFAULT '[]',
    required_approvals INTEGER NOT NULL DEFAULT 0,
    require_sod BOOLEAN NOT NULL DEFAULT FALSE,
    auto_promote_from UUID REFERENCES environments(id),
    promotion_policy VARCHAR(255),
    deployment_timeout INTEGER NOT NULL DEFAULT 600,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, name)
);

CREATE INDEX idx_environments_tenant ON environments(tenant_id);
CREATE INDEX idx_environments_order ON environments(tenant_id, order_index);

CREATE TABLE target_groups (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    labels JSONB NOT NULL DEFAULT '{}',
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, environment_id, name)
);

-- Agents are declared before targets so targets.agent_id can reference them.
CREATE TABLE agents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    version VARCHAR(50) NOT NULL,
    capabilities JSONB NOT NULL DEFAULT '[]',
    labels JSONB NOT NULL DEFAULT '{}',
    status VARCHAR(50) NOT NULL DEFAULT 'offline' CHECK (status IN (
        'online', 'offline', 'degraded'
    )),
    last_heartbeat TIMESTAMPTZ,
    resource_usage JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, name)
);

CREATE INDEX idx_agents_tenant ON agents(tenant_id);
CREATE INDEX idx_agents_status ON agents(status);
CREATE INDEX idx_agents_capabilities ON agents USING GIN (capabilities);

CREATE TABLE targets (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
    target_group_id UUID REFERENCES target_groups(id),
    name VARCHAR(255) NOT NULL,
    target_type VARCHAR(100) NOT NULL,
    connection JSONB NOT NULL,
    capabilities JSONB NOT NULL DEFAULT '[]',
    labels JSONB NOT NULL DEFAULT '{}',
    deployment_directory VARCHAR(500),
    health_status VARCHAR(50) NOT NULL DEFAULT 'unknown' CHECK (health_status IN (
        'healthy', 'degraded', 'unhealthy', 'unknown'
    )),
    last_health_check TIMESTAMPTZ,
    current_digest VARCHAR(100),
    agent_id UUID REFERENCES agents(id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, environment_id, name)
);

CREATE INDEX idx_targets_tenant_env ON targets(tenant_id, environment_id);
CREATE INDEX idx_targets_type ON targets(target_type);
CREATE INDEX idx_targets_labels ON targets USING GIN (labels);
```

---

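The 90-second offline detection described in the README can be expressed as a periodic sweep over `agents`; the query below is a sketch, and the interval value is an illustrative default:

```sql
-- Mark agents offline when no heartbeat has arrived within 90 seconds.
UPDATE agents
SET    status = 'offline',
       updated_at = NOW()
WHERE  status <> 'offline'
  AND  (last_heartbeat IS NULL
        OR last_heartbeat < NOW() - INTERVAL '90 seconds');
```
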
## Release Management

```sql
CREATE TABLE components (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    display_name VARCHAR(255) NOT NULL,
    image_repository VARCHAR(500) NOT NULL,
    registry_integration_id UUID REFERENCES integrations(id),
    versioning_strategy JSONB NOT NULL DEFAULT '{"type": "semver"}',
    deployment_template VARCHAR(255),
    default_channel VARCHAR(50) NOT NULL DEFAULT 'stable',
    metadata JSONB NOT NULL DEFAULT '{}',
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, name)
);

CREATE INDEX idx_components_tenant ON components(tenant_id);

CREATE TABLE version_maps (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    component_id UUID NOT NULL REFERENCES components(id) ON DELETE CASCADE,
    tag VARCHAR(255) NOT NULL,
    digest VARCHAR(100) NOT NULL,
    semver VARCHAR(50),
    channel VARCHAR(50) NOT NULL DEFAULT 'stable',
    prerelease BOOLEAN NOT NULL DEFAULT FALSE,
    build_metadata VARCHAR(255),
    resolved_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    source VARCHAR(50) NOT NULL DEFAULT 'auto' CHECK (source IN ('auto', 'manual')),
    UNIQUE (tenant_id, component_id, digest)
);

CREATE INDEX idx_version_maps_component ON version_maps(component_id);
CREATE INDEX idx_version_maps_digest ON version_maps(digest);
CREATE INDEX idx_version_maps_semver ON version_maps(semver);

CREATE TABLE releases (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    display_name VARCHAR(255) NOT NULL,
    components JSONB NOT NULL, -- [{componentId, digest, semver, tag, role}]
    source_ref JSONB, -- {scmIntegrationId, commitSha, ciIntegrationId, buildId}
    status VARCHAR(50) NOT NULL DEFAULT 'draft' CHECK (status IN (
        'draft', 'ready', 'promoting', 'deployed', 'deprecated', 'archived'
    )),
    metadata JSONB NOT NULL DEFAULT '{}',
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by UUID REFERENCES users(id),
    UNIQUE (tenant_id, name)
);

CREATE INDEX idx_releases_tenant ON releases(tenant_id);
CREATE INDEX idx_releases_status ON releases(status);
CREATE INDEX idx_releases_created ON releases(created_at DESC);

CREATE TABLE release_environment_state (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
    release_id UUID NOT NULL REFERENCES releases(id),
    status VARCHAR(50) NOT NULL CHECK (status IN (
        'deployed', 'deploying', 'failed', 'rolling_back', 'rolled_back'
    )),
    deployed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    deployed_by UUID REFERENCES users(id),
    promotion_id UUID, -- will reference promotions
    evidence_ref VARCHAR(255),
    UNIQUE (tenant_id, environment_id)
);

CREATE INDEX idx_release_env_state_env ON release_environment_state(environment_id);
CREATE INDEX idx_release_env_state_release ON release_environment_state(release_id);
```

---

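A typical lookup against `version_maps` resolves the most recently observed digest for a component tag; this query is illustrative:

```sql
-- Resolve the latest digest recorded for a component's tag on the
-- stable channel ($1 = component_id, $2 = tag).
SELECT digest, semver
FROM   version_maps
WHERE  component_id = $1
  AND  tag = $2
  AND  channel = 'stable'
ORDER  BY resolved_at DESC
LIMIT  1;
```
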
## Workflow Engine

```sql
CREATE TABLE workflow_templates (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID REFERENCES tenants(id) ON DELETE CASCADE, -- NULL for builtin
    name VARCHAR(255) NOT NULL,
    display_name VARCHAR(255) NOT NULL,
    description TEXT,
    version INTEGER NOT NULL DEFAULT 1,
    nodes JSONB NOT NULL,
    edges JSONB NOT NULL,
    inputs JSONB NOT NULL DEFAULT '[]',
    outputs JSONB NOT NULL DEFAULT '[]',
    is_builtin BOOLEAN NOT NULL DEFAULT FALSE,
    tags JSONB NOT NULL DEFAULT '[]',
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by UUID REFERENCES users(id),
    UNIQUE (tenant_id, name, version)
);

CREATE INDEX idx_workflow_templates_tenant ON workflow_templates(tenant_id);
CREATE INDEX idx_workflow_templates_builtin ON workflow_templates(is_builtin);

CREATE TABLE workflow_runs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    template_id UUID NOT NULL REFERENCES workflow_templates(id),
    template_version INTEGER NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'created' CHECK (status IN (
        'created', 'running', 'paused', 'succeeded', 'failed', 'cancelled'
    )),
    context JSONB NOT NULL, -- inputs, variables, release info
    outputs JSONB,
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    error_message TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    triggered_by UUID REFERENCES users(id)
);

CREATE INDEX idx_workflow_runs_tenant ON workflow_runs(tenant_id);
CREATE INDEX idx_workflow_runs_status ON workflow_runs(status);
CREATE INDEX idx_workflow_runs_template ON workflow_runs(template_id);

CREATE TABLE step_runs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    workflow_run_id UUID NOT NULL REFERENCES workflow_runs(id) ON DELETE CASCADE,
    node_id VARCHAR(100) NOT NULL,
    step_type VARCHAR(100) NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN (
        'pending', 'running', 'succeeded', 'failed', 'skipped', 'retrying', 'cancelled'
    )),
    inputs JSONB NOT NULL,
    config JSONB NOT NULL,
    outputs JSONB,
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    attempt_number INTEGER NOT NULL DEFAULT 1,
    error_message TEXT,
    error_type VARCHAR(100),
    logs TEXT,
    artifacts JSONB NOT NULL DEFAULT '[]',
    t_hlc BIGINT, -- Hybrid Logical Clock for ordering (optional)
    ts_wall TIMESTAMPTZ, -- Wall-clock timestamp for debugging (optional)
    UNIQUE (workflow_run_id, node_id, attempt_number)
);

CREATE INDEX idx_step_runs_workflow ON step_runs(workflow_run_id);
CREATE INDEX idx_step_runs_status ON step_runs(status);
```

---

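Because retries are stored as separate rows keyed by `attempt_number`, reading the current state of a run means taking the latest attempt per node; a PostgreSQL sketch:

```sql
-- Latest attempt per step node for a given workflow run ($1 = workflow_run_id).
SELECT DISTINCT ON (node_id)
       node_id, status, attempt_number, error_message
FROM   step_runs
WHERE  workflow_run_id = $1
ORDER  BY node_id, attempt_number DESC;
```
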
## Promotion & Approval

```sql
CREATE TABLE promotions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    release_id UUID NOT NULL REFERENCES releases(id),
    source_environment_id UUID REFERENCES environments(id),
    target_environment_id UUID NOT NULL REFERENCES environments(id),
    status VARCHAR(50) NOT NULL DEFAULT 'pending_approval' CHECK (status IN (
        'pending_approval', 'pending_gate', 'approved', 'rejected',
        'deploying', 'deployed', 'failed', 'cancelled', 'rolled_back'
    )),
    decision_record JSONB,
    workflow_run_id UUID REFERENCES workflow_runs(id),
    requested_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    requested_by UUID NOT NULL REFERENCES users(id),
    request_reason TEXT,
    decided_at TIMESTAMPTZ,
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    evidence_packet_id UUID,
    t_hlc BIGINT, -- Hybrid Logical Clock for ordering (optional)
    ts_wall TIMESTAMPTZ -- Wall-clock timestamp for debugging (optional)
);

CREATE INDEX idx_promotions_tenant ON promotions(tenant_id);
CREATE INDEX idx_promotions_release ON promotions(release_id);
CREATE INDEX idx_promotions_status ON promotions(status);
CREATE INDEX idx_promotions_target_env ON promotions(target_environment_id);

-- Add FK to release_environment_state
ALTER TABLE release_environment_state
    ADD CONSTRAINT fk_release_env_state_promotion
    FOREIGN KEY (promotion_id) REFERENCES promotions(id);

CREATE TABLE approvals (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    promotion_id UUID NOT NULL REFERENCES promotions(id) ON DELETE CASCADE,
    approver_id UUID NOT NULL REFERENCES users(id),
    action VARCHAR(50) NOT NULL CHECK (action IN ('approved', 'rejected')),
    comment TEXT,
    approved_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    approver_role VARCHAR(255),
    approver_groups JSONB NOT NULL DEFAULT '[]'
|
||||
);
|
||||
|
||||
CREATE INDEX idx_approvals_promotion ON approvals(promotion_id);
|
||||
CREATE INDEX idx_approvals_approver ON approvals(approver_id);
|
||||
|
||||
CREATE TABLE approval_policies (
|
||||
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
||||
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
|
||||
environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
|
||||
required_count INTEGER NOT NULL DEFAULT 1,
|
||||
required_roles JSONB NOT NULL DEFAULT '[]',
|
||||
required_groups JSONB NOT NULL DEFAULT '[]',
|
||||
require_sod BOOLEAN NOT NULL DEFAULT FALSE,
|
||||
allow_self_approval BOOLEAN NOT NULL DEFAULT FALSE,
|
||||
expiration_minutes INTEGER NOT NULL DEFAULT 1440,
|
||||
UNIQUE (tenant_id, environment_id)
|
||||
);
|
||||
```
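
The schema above only stores the approval policy; how it is evaluated is up to the orchestrator. As a hedged illustration (none of these function or field names come from the advisory, and `require_sod` is read here as separation of duties, meaning the requester cannot satisfy approval on their own), an evaluation might look like:

```typescript
// Illustrative only: one way the orchestrator could evaluate an
// approval_policies row against the approvals recorded for a promotion.
interface ApprovalPolicy {
  requiredCount: number;
  allowSelfApproval: boolean;
  requireSod: boolean; // separation of duties (assumed semantics)
}

interface Approval {
  approverId: string;
  action: "approved" | "rejected";
}

function isPromotionApproved(
  policy: ApprovalPolicy,
  approvals: Approval[],
  requestedBy: string,
): boolean {
  // Any explicit rejection blocks the promotion outright.
  if (approvals.some(a => a.action === "rejected")) return false;

  // Count distinct approvers, dropping the requester unless self-approval is allowed.
  const approvers = new Set(
    approvals
      .filter(a => a.action === "approved")
      .filter(a => policy.allowSelfApproval || a.approverId !== requestedBy)
      .map(a => a.approverId),
  );

  // Separation of duties: require at least one approver other than the requester.
  if (policy.requireSod && ![...approvers].some(id => id !== requestedBy)) {
    return false;
  }
  return approvers.size >= policy.requiredCount;
}
```

`expiration_minutes` would additionally bound how long a pending request may collect approvals before it lapses.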

---

## Deployment

```sql
CREATE TABLE deployment_jobs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    promotion_id UUID NOT NULL REFERENCES promotions(id),
    release_id UUID NOT NULL REFERENCES releases(id),
    environment_id UUID NOT NULL REFERENCES environments(id),
    status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN (
        'pending', 'running', 'succeeded', 'failed', 'cancelled', 'rolling_back', 'rolled_back'
    )),
    strategy VARCHAR(50) NOT NULL DEFAULT 'all-at-once',
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    artifacts JSONB NOT NULL DEFAULT '[]',
    rollback_of UUID REFERENCES deployment_jobs(id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    t_hlc BIGINT,       -- Hybrid Logical Clock for ordering (optional)
    ts_wall TIMESTAMPTZ -- Wall-clock timestamp for debugging (optional)
);

CREATE INDEX idx_deployment_jobs_promotion ON deployment_jobs(promotion_id);
CREATE INDEX idx_deployment_jobs_status ON deployment_jobs(status);

CREATE TABLE deployment_tasks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    job_id UUID NOT NULL REFERENCES deployment_jobs(id) ON DELETE CASCADE,
    target_id UUID NOT NULL REFERENCES targets(id),
    digest VARCHAR(100) NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN (
        'pending', 'running', 'succeeded', 'failed', 'cancelled', 'skipped'
    )),
    agent_id UUID REFERENCES agents(id),
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    exit_code INTEGER,
    logs TEXT,
    previous_digest VARCHAR(100),
    sticker_written BOOLEAN NOT NULL DEFAULT FALSE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_deployment_tasks_job ON deployment_tasks(job_id);
CREATE INDEX idx_deployment_tasks_target ON deployment_tasks(target_id);
CREATE INDEX idx_deployment_tasks_status ON deployment_tasks(status);

CREATE TABLE generated_artifacts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    deployment_job_id UUID REFERENCES deployment_jobs(id) ON DELETE CASCADE,
    artifact_type VARCHAR(50) NOT NULL CHECK (artifact_type IN (
        'compose_lock', 'script', 'sticker', 'evidence', 'config'
    )),
    name VARCHAR(255) NOT NULL,
    content_hash VARCHAR(100) NOT NULL,
    content BYTEA,            -- for small artifacts
    storage_ref VARCHAR(500), -- for large artifacts (S3, etc.)
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_generated_artifacts_job ON generated_artifacts(deployment_job_id);
```

---

## Progressive Delivery

```sql
CREATE TABLE ab_releases (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    environment_id UUID NOT NULL REFERENCES environments(id),
    name VARCHAR(255) NOT NULL,
    variations JSONB NOT NULL, -- [{name, releaseId, targetGroupId, trafficPercentage}]
    active_variation VARCHAR(50) NOT NULL DEFAULT 'A',
    traffic_split JSONB NOT NULL,
    rollout_strategy JSONB NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'created' CHECK (status IN (
        'created', 'deploying', 'running', 'promoting', 'completed', 'rolled_back'
    )),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    completed_at TIMESTAMPTZ,
    created_by UUID REFERENCES users(id)
);

CREATE INDEX idx_ab_releases_tenant_env ON ab_releases(tenant_id, environment_id);
CREATE INDEX idx_ab_releases_status ON ab_releases(status);

CREATE TABLE canary_stages (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    ab_release_id UUID NOT NULL REFERENCES ab_releases(id) ON DELETE CASCADE,
    stage_number INTEGER NOT NULL,
    traffic_percentage INTEGER NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN (
        'pending', 'running', 'succeeded', 'failed', 'skipped'
    )),
    health_threshold DECIMAL(5,2),
    duration_seconds INTEGER,
    require_approval BOOLEAN NOT NULL DEFAULT FALSE,
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    health_result JSONB,
    UNIQUE (ab_release_id, stage_number)
);
```
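
A hedged sketch of how a controller might act on a `canary_stages` row once its observation window ends. The function and types below are illustrative, and `health_threshold` is read as a minimum success rate in percent, which is an assumption rather than something the schema defines:

```typescript
// Illustrative decision logic for one canary stage.
interface CanaryStage {
  stageNumber: number;
  trafficPercentage: number;
  healthThreshold: number | null; // e.g. 99.5, interpreted as % healthy (assumption)
  requireApproval: boolean;
}

type StageDecision = "advance" | "await_approval" | "rollback";

function decideStage(stage: CanaryStage, observedSuccessRate: number): StageDecision {
  // Below the configured threshold: abort the rollout.
  if (stage.healthThreshold !== null && observedSuccessRate < stage.healthThreshold) {
    return "rollback";
  }
  // Healthy, but this stage is gated on a human decision.
  if (stage.requireApproval) return "await_approval";
  // Healthy and ungated: shift traffic to the next stage.
  return "advance";
}
```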

---

## Release Evidence

```sql
CREATE TABLE evidence_packets (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    promotion_id UUID NOT NULL REFERENCES promotions(id),
    packet_type VARCHAR(50) NOT NULL CHECK (packet_type IN (
        'release_decision', 'deployment', 'rollback', 'ab_promotion'
    )),
    content JSONB NOT NULL,
    content_hash VARCHAR(100) NOT NULL,
    signature TEXT,
    signer_key_ref VARCHAR(255),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
    -- Note: No UPDATE or DELETE allowed (append-only)
);

CREATE INDEX idx_evidence_packets_promotion ON evidence_packets(promotion_id);
CREATE INDEX idx_evidence_packets_created ON evidence_packets(created_at DESC);

-- Append-only enforcement via trigger
CREATE OR REPLACE FUNCTION prevent_evidence_modification()
RETURNS TRIGGER AS $$
BEGIN
    RAISE EXCEPTION 'Evidence packets are immutable and cannot be modified or deleted';
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER evidence_packets_immutable
    BEFORE UPDATE OR DELETE ON evidence_packets
    FOR EACH ROW EXECUTE FUNCTION prevent_evidence_modification();

CREATE TABLE version_stickers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    target_id UUID NOT NULL REFERENCES targets(id),
    deployment_job_id UUID REFERENCES deployment_jobs(id),
    release_id UUID NOT NULL REFERENCES releases(id),
    digest VARCHAR(100) NOT NULL,
    sticker_content JSONB NOT NULL,
    written_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    verified_at TIMESTAMPTZ,
    verification_status VARCHAR(50) CHECK (verification_status IN ('valid', 'mismatch', 'missing'))
);

CREATE INDEX idx_version_stickers_target ON version_stickers(target_id);
CREATE INDEX idx_version_stickers_release ON version_stickers(release_id);
```
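
The three `verification_status` values map naturally onto a comparison between the sticker read back from a target and the digest the orchestrator recorded. A minimal sketch of that classification (names are illustrative, not part of the schema):

```typescript
// Classifies a read-back version sticker the way the schema's
// verification_status CHECK constraint expects: valid | mismatch | missing.
type VerificationStatus = "valid" | "mismatch" | "missing";

function classifySticker(
  expectedDigest: string,
  stickerOnTarget: { digest: string } | null,
): VerificationStatus {
  if (stickerOnTarget === null) return "missing"; // no sticker file on the target
  return stickerOnTarget.digest === expectedDigest
    ? "valid"      // the target runs exactly what was recorded
    : "mismatch";  // drift: the target changed outside the orchestrator
}
```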

---

## Plugin Infrastructure

```sql
CREATE TABLE plugins (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL UNIQUE,
    display_name VARCHAR(255) NOT NULL,
    version VARCHAR(50) NOT NULL,
    description TEXT,
    manifest JSONB NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'inactive' CHECK (status IN (
        'active', 'inactive', 'error'
    )),
    error_message TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE plugin_instances (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    plugin_id UUID NOT NULL REFERENCES plugins(id),
    config JSONB NOT NULL DEFAULT '{}',
    enabled BOOLEAN NOT NULL DEFAULT TRUE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, plugin_id)
);

CREATE INDEX idx_plugin_instances_tenant ON plugin_instances(tenant_id);
```

---

## Hybrid Logical Clock (HLC) for Distributed Ordering

**Optional Enhancement**: For strict distributed ordering and multi-region support, the following tables include optional `t_hlc` (Hybrid Logical Clock timestamp) and `ts_wall` (wall-clock timestamp) columns:

- `promotions` — Promotion state transitions
- `deployment_jobs` — Deployment task ordering
- `step_runs` — Workflow step execution ordering

**When to use HLC**:
- Multi-region deployments requiring strict causal ordering
- Deterministic replay across distributed systems
- Timeline event ordering in audit logs

**HLC Schema**:
```sql
t_hlc   BIGINT       -- HLC timestamp (monotonic, skew-tolerant)
ts_wall TIMESTAMPTZ  -- Wall-clock timestamp (informational)
```

**Usage**:
- `t_hlc` is generated by `IHybridLogicalClock.Tick()` on state transitions
- `ts_wall` is populated by `TimeProvider.GetUtcNow()` for debugging
- Index on `t_hlc` for ordering queries: `CREATE INDEX idx_promotions_hlc ON promotions(t_hlc);`

**Reference**: See [Implementation Guide](../implementation-guide.md#hybrid-logical-clock-hlc-for-distributed-ordering) for HLC usage patterns.
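
The tick behaviour can be sketched in a few lines. This is a TypeScript illustration of the standard HLC update rule, not the real `IHybridLogicalClock` (which is .NET); the 48/16-bit packing is one common encoding, assumed here:

```typescript
// Sketch of an HLC tick: advance with the wall clock when it moves forward,
// bump a logical counter when it does not. nowMs is injected for testability.
class HybridLogicalClock {
  private lastPhysicalMs = 0n;
  private counter = 0n;

  constructor(private readonly nowMs: () => bigint) {}

  /** Returns a monotonically increasing t_hlc: 48 bits of wall time, 16 bits of counter. */
  tick(): bigint {
    const wall = this.nowMs();
    if (wall > this.lastPhysicalMs) {
      this.lastPhysicalMs = wall; // physical time advanced: reset the logical counter
      this.counter = 0n;
    } else {
      this.counter += 1n;         // frozen or skewed-back wall time: bump the counter
    }
    return (this.lastPhysicalMs << 16n) | this.counter;
  }
}
```

Because the counter absorbs repeated or backwards wall-clock readings, `t_hlc` stays strictly increasing per node even under clock skew.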

---

## Row-Level Security (Multi-Tenancy)

All tables with a `tenant_id` column should have RLS enabled:

```sql
-- Enable RLS on all release tables
ALTER TABLE integrations ENABLE ROW LEVEL SECURITY;
ALTER TABLE environments ENABLE ROW LEVEL SECURITY;
ALTER TABLE targets ENABLE ROW LEVEL SECURITY;
ALTER TABLE releases ENABLE ROW LEVEL SECURITY;
ALTER TABLE promotions ENABLE ROW LEVEL SECURITY;
-- ... etc.

-- Example policy
CREATE POLICY tenant_isolation ON integrations
    FOR ALL
    USING (tenant_id = current_setting('app.tenant_id')::UUID);
```
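
For the policy above to select any rows, application code has to set `app.tenant_id` before querying. A hedged sketch of one way to do that (the `DbClient` shape mirrors node-postgres `query(text, values)`, and `withTenant` is an illustrative helper, not part of the platform); the `true` argument to `set_config` makes the setting transaction-local, so tenants cannot leak across pooled connections:

```typescript
// Minimal client shape so the helper works with any driver.
interface DbClient {
  query(text: string, values?: unknown[]): Promise<unknown>;
}

// Runs fn inside a transaction with app.tenant_id scoped to that transaction.
async function withTenant<T>(
  client: DbClient,
  tenantId: string,
  fn: (client: DbClient) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // Parameterized, unlike SET LOCAL, so tenantId cannot inject SQL.
    await client.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    const result = await fn(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```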
403
docs/modules/release-orchestrator/deployment/agent-based.md
Normal file
@@ -0,0 +1,403 @@

# Agent-Based Deployment

> Agent-based deployment using Docker and Compose agents for executing tasks on targets.

**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 10.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Agents Module](../modules/agents.md), [Deploy Orchestrator](../modules/deploy-orchestrator.md)
**Sprints:** [108_002 Docker Agent](../../../../implplan/SPRINT_20260110_108_002_AGENTS_docker.md), [108_003 Compose Agent](../../../../implplan/SPRINT_20260110_108_003_AGENTS_compose.md)

## Overview

Agent-based deployment uses lightweight agents installed on target hosts to execute deployment tasks. Agents communicate with the orchestrator over mTLS and receive tasks through heartbeat polling or WebSocket streams.
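
The heartbeat-polling half of that loop can be sketched as follows. None of these endpoint or function names are defined by the advisory; they only illustrate the poll, execute, report shape:

```typescript
// Hypothetical API surface between an agent and the orchestrator core.
interface TaskApi<Task, Result> {
  fetchPendingTasks(agentId: string): Promise<Task[]>;
  reportResult(result: Result): Promise<void>;
}

// One heartbeat cycle: poll for work, execute each task, report each result.
async function heartbeatOnce<Task, Result>(
  agentId: string,
  api: TaskApi<Task, Result>,
  execute: (task: Task) => Promise<Result>,
): Promise<number> {
  const tasks = await api.fetchPendingTasks(agentId); // one poll of the queue
  for (const task of tasks) {
    const result = await execute(task); // sequential here; real agents may bound concurrency
    await api.reportResult(result);
  }
  return tasks.length; // how many tasks this heartbeat handled
}
```

A real agent would call this on a timer and switch to a WebSocket stream when available.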

---

## Agent Task Protocol

### Task Payload Structure

```typescript
// Task assignment (Core -> Agent)
interface AgentTask {
  id: UUID;
  type: TaskType;
  targetId: UUID;
  payload: TaskPayload;
  credentials: EncryptedCredentials;
  timeout: number;
  priority: TaskPriority;
  idempotencyKey: string;
  assignedAt: DateTime;
  expiresAt: DateTime;
}

type TaskType =
  | "deploy"
  | "rollback"
  | "health-check"
  | "inspect"
  | "execute-command"
  | "upload-files"
  | "write-sticker"
  | "read-sticker";

interface DeployTaskPayload {
  image: string;
  digest: string;
  config: DeployConfig;
  artifacts: ArtifactReference[];
  previousDigest?: string;
  hooks: {
    preDeploy?: HookConfig;
    postDeploy?: HookConfig;
  };
}
```

### Task Result Structure

```typescript
// Task result (Agent -> Core)
interface TaskResult {
  taskId: UUID;
  success: boolean;
  startedAt: DateTime;
  completedAt: DateTime;

  // Success details
  outputs?: Record<string, any>;
  artifacts?: ArtifactReference[];

  // Failure details
  error?: string;
  errorType?: string;
  retriable?: boolean;

  // Logs
  logs: string;

  // Metrics
  metrics: {
    pullDurationMs?: number;
    deployDurationMs?: number;
    healthCheckDurationMs?: number;
  };
}
```

---

## Docker Agent Implementation

The Docker agent deploys single containers to Docker hosts with digest verification.

### Docker Agent Capabilities

- Pull images with digest verification
- Create and start containers
- Stop and remove containers
- Health check monitoring
- Version sticker management
- Rollback to previous container

### Deploy Task Flow

```typescript
class DockerAgent implements TargetExecutor {
  private docker: Docker;

  async deploy(task: DeployTaskPayload): Promise<DeployResult> {
    const { image, digest, config, previousDigest } = task;
    const containerName = config.containerName;

    // 1. Pull image and verify digest
    this.log(`Pulling image ${image}@${digest}`);
    await this.docker.pull(image, { digest });

    const pulledDigest = await this.getImageDigest(image);
    if (pulledDigest !== digest) {
      throw new DigestMismatchError(
        `Expected digest ${digest}, got ${pulledDigest}. Possible tampering detected.`
      );
    }

    // 2. Run pre-deploy hook
    if (task.hooks?.preDeploy) {
      await this.runHook(task.hooks.preDeploy, "pre-deploy");
    }

    // 3. Stop and rename existing container
    const existingContainer = await this.findContainer(containerName);
    if (existingContainer) {
      this.log(`Stopping existing container ${containerName}`);
      await existingContainer.stop({ t: 10 });
      await existingContainer.rename(`${containerName}-previous-${Date.now()}`);
    }

    // 4. Create new container
    this.log(`Creating container ${containerName} from ${image}@${digest}`);
    const container = await this.docker.createContainer({
      name: containerName,
      Image: `${image}@${digest}`, // Always use digest, not tag
      Env: this.buildEnvVars(config.environment),
      HostConfig: {
        PortBindings: this.buildPortBindings(config.ports),
        Binds: this.buildBindMounts(config.volumes),
        RestartPolicy: { Name: config.restartPolicy || "unless-stopped" },
        Memory: config.memoryLimit,
        CpuQuota: config.cpuLimit,
      },
      Labels: {
        "stella.release.id": config.releaseId,
        "stella.release.name": config.releaseName,
        "stella.digest": digest,
        "stella.deployed.at": new Date().toISOString(),
      },
    });

    // 5. Start container
    this.log(`Starting container ${containerName}`);
    await container.start();

    // 6. Wait for container to be healthy (if health check configured)
    if (config.healthCheck) {
      this.log(`Waiting for container health check`);
      const healthy = await this.waitForHealthy(container, config.healthCheck.timeout);
      if (!healthy) {
        // Rollback to previous container
        await this.rollbackContainer(containerName, existingContainer);
        throw new HealthCheckFailedError(`Container ${containerName} failed health check`);
      }
    }

    // 7. Run post-deploy hook
    if (task.hooks?.postDeploy) {
      await this.runHook(task.hooks.postDeploy, "post-deploy");
    }

    // 8. Cleanup previous container
    if (existingContainer && config.cleanupPrevious !== false) {
      this.log(`Removing previous container`);
      await existingContainer.remove({ force: true });
    }

    return {
      success: true,
      containerId: container.id,
      previousDigest: previousDigest,
      logs: this.getLogs(),
      durationMs: this.getDuration(),
    };
  }
}
```

### Rollback Implementation

```typescript
async rollback(task: RollbackTaskPayload): Promise<DeployResult> {
  const { containerName, targetDigest } = task;

  // Find previous container or use specified digest
  if (targetDigest) {
    // Deploy specific digest
    return this.deploy({
      ...task,
      digest: targetDigest,
    });
  }

  // Find and restore previous container
  const previousContainer = await this.findContainer(`${containerName}-previous-*`);
  if (!previousContainer) {
    throw new RollbackError(`No previous container found for ${containerName}`);
  }

  // Stop current, rename, start previous
  const currentContainer = await this.findContainer(containerName);
  if (currentContainer) {
    await currentContainer.stop({ t: 10 });
    await currentContainer.rename(`${containerName}-failed-${Date.now()}`);
  }

  await previousContainer.rename(containerName);
  await previousContainer.start();

  return {
    success: true,
    containerId: previousContainer.id,
    logs: this.getLogs(),
    durationMs: this.getDuration(),
  };
}
```

### Version Sticker Management

```typescript
async writeSticker(sticker: VersionSticker): Promise<void> {
  const stickerPath = this.config.stickerPath || "/var/stella/version.json";
  const stickerContent = JSON.stringify(sticker, null, 2);

  // Write to host filesystem or container volume
  if (this.config.stickerLocation === "volume") {
    // Write to shared volume
    await this.docker.run("alpine", [
      "sh", "-c",
      `echo '${stickerContent}' > ${stickerPath}`
    ], {
      HostConfig: {
        Binds: [`${this.config.stickerVolume}:/var/stella`]
      }
    });
  } else {
    // Write directly to host
    fs.writeFileSync(stickerPath, stickerContent);
  }
}
```

---

## Compose Agent Implementation

The Compose agent deploys multi-container applications defined in Docker Compose files.

### Compose Agent Capabilities

- Pull images for all services
- Verify digests for all services
- Deploy using compose lock files
- Health check all services
- Rollback to previous deployment
- Version sticker management

### Deploy Task Flow

```typescript
class ComposeAgent implements TargetExecutor {
  async deploy(task: DeployTaskPayload): Promise<DeployResult> {
    const { artifacts, config } = task;
    const deployDir = config.deploymentDirectory;

    // 1. Write compose lock file
    const composeLock = artifacts.find(a => a.type === "compose_lock");
    const composeContent = await this.fetchArtifact(composeLock);

    const composePath = path.join(deployDir, "compose.stella.lock.yml");
    await fs.writeFile(composePath, composeContent);

    // 2. Write any additional config files
    for (const artifact of artifacts.filter(a => a.type === "config")) {
      const content = await this.fetchArtifact(artifact);
      await fs.writeFile(path.join(deployDir, artifact.name), content);
    }

    // 3. Run pre-deploy hook
    if (task.hooks?.preDeploy) {
      await this.runHook(task.hooks.preDeploy, deployDir);
    }

    // 4. Pull images
    this.log("Pulling images...");
    const pullResult = await this.runCompose(deployDir, ["pull"]);
    if (!pullResult.success) {
      throw new Error(`Failed to pull images: ${pullResult.stderr}`);
    }

    // 5. Verify digests
    await this.verifyDigests(composePath, config.expectedDigests);

    // 6. Deploy
    this.log("Deploying services...");
    const upResult = await this.runCompose(deployDir, [
      "up", "-d",
      "--remove-orphans",
      "--force-recreate"
    ]);

    if (!upResult.success) {
      throw new Error(`Failed to deploy: ${upResult.stderr}`);
    }

    // 7. Wait for services to be healthy
    if (config.healthCheck) {
      this.log("Waiting for services to be healthy...");
      const healthy = await this.waitForServicesHealthy(
        deployDir,
        config.healthCheck.timeout
      );

      if (!healthy) {
        // Rollback
        await this.rollbackToBackup(deployDir);
        throw new HealthCheckFailedError("Services failed health check");
      }
    }

    // 8. Run post-deploy hook
    if (task.hooks?.postDeploy) {
      await this.runHook(task.hooks.postDeploy, deployDir);
    }

    // 9. Write version sticker
    await this.writeSticker(config.sticker, deployDir);

    return {
      success: true,
      logs: this.getLogs(),
      durationMs: this.getDuration(),
    };
  }
}
```

### Digest Verification

```typescript
private async verifyDigests(
  composePath: string,
  expectedDigests: Record<string, string>
): Promise<void> {
  const composeContent = yaml.parse(await fs.readFile(composePath, "utf-8"));

  for (const [service, expectedDigest] of Object.entries(expectedDigests)) {
    const serviceConfig = composeContent.services[service];
    if (!serviceConfig) {
      throw new Error(`Service ${service} not found in compose file`);
    }

    const image = serviceConfig.image;
    if (!image.includes("@sha256:")) {
      throw new Error(`Service ${service} image not pinned to digest: ${image}`);
    }

    const actualDigest = image.split("@")[1];
    if (actualDigest !== expectedDigest) {
      throw new DigestMismatchError(
        `Service ${service}: expected ${expectedDigest}, got ${actualDigest}`
      );
    }
  }
}
```

---

## Security Considerations

1. **Digest Verification:** All deployments verify image digests before execution
2. **Credential Encryption:** Credentials are encrypted in transit and at rest
3. **mTLS Communication:** All agent-server communication uses mutual TLS
4. **Hook Sandboxing:** Pre/post-deploy hooks run in isolated environments
5. **Audit Logging:** All deployment actions are logged with actor context
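
Point 3 can be made concrete with a small sketch of the TLS options an agent might hand to Node's `https.Agent`. The `cert`, `key`, `ca`, `rejectUnauthorized`, and `minVersion` fields are standard Node.js TLS options; the `MtlsMaterial` shape and helper name are illustrative only:

```typescript
// Sketch of mutual-TLS client options for agent -> orchestrator calls.
interface MtlsMaterial {
  clientCertPem: string; // agent's certificate, presented to the server
  clientKeyPem: string;  // agent's private key
  caPem: string;         // orchestrator CA used to verify the server
}

function buildMtlsAgentOptions(m: MtlsMaterial) {
  return {
    cert: m.clientCertPem,
    key: m.clientKeyPem,
    ca: m.caPem,
    // Never disable server verification; TLS is only mutual if both sides verify.
    rejectUnauthorized: true,
    minVersion: "TLSv1.2" as const,
  };
}
// Usage (sketch): new https.Agent(buildMtlsAgentOptions(material))
```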

---

## See Also

- [Agents Module](../modules/agents.md)
- [Agent Security](../security/agent-security.md)
- [Deployment Orchestrator](../modules/deploy-orchestrator.md)
- [Agentless Deployment](agentless.md)
427
docs/modules/release-orchestrator/deployment/agentless.md
Normal file
@@ -0,0 +1,427 @@

# Agentless Deployment (SSH/WinRM)

> Agentless deployment using SSH and WinRM for remote execution without installing agents.

**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 10.4](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Agents Module](../modules/agents.md), [Deploy Orchestrator](../modules/deploy-orchestrator.md)
**Sprints:** [108_004 SSH Agent](../../../../implplan/SPRINT_20260110_108_004_AGENTS_ssh.md), [108_005 WinRM Agent](../../../../implplan/SPRINT_20260110_108_005_AGENTS_winrm.md)

## Overview

Agentless deployment enables deployment to targets without requiring a pre-installed agent. The orchestrator connects directly to targets using SSH (Linux/Unix) or WinRM (Windows) to execute deployment commands.

---

## SSH Remote Executor

### Capabilities

- SSH key-based authentication
- File transfer via SFTP
- Remote command execution
- Docker operations over SSH
- Script execution
- Backup and rollback

### Connection Management

```typescript
class SSHRemoteExecutor implements TargetExecutor {
  private ssh: SSHClient;

  async connect(config: SSHConnectionConfig): Promise<void> {
    const privateKey = await this.secrets.getSecret(config.privateKeyRef);

    this.ssh = new SSHClient();
    await this.ssh.connect({
      host: config.host,
      port: config.port || 22,
      username: config.username,
      privateKey: privateKey.value,
      readyTimeout: config.connectionTimeout || 30000,
      keepaliveInterval: 10000,
    });
  }
}
```

### Deploy Task Flow

```typescript
async deploy(task: DeployTaskPayload): Promise<DeployResult> {
  const { artifacts, config } = task;
  const deployDir = config.deploymentDirectory;

  try {
    // 1. Ensure deployment directory exists
    await this.exec(`mkdir -p ${deployDir}`);
    await this.exec(`mkdir -p ${deployDir}/.stella-backup`);

    // 2. Backup current deployment
    await this.exec(`cp -r ${deployDir}/* ${deployDir}/.stella-backup/ 2>/dev/null || true`);

    // 3. Upload artifacts
    for (const artifact of artifacts) {
      const content = await this.fetchArtifact(artifact);
      const remotePath = path.join(deployDir, artifact.name);
      await this.uploadFile(content, remotePath);
    }

    // 4. Run pre-deploy hook
    if (task.hooks?.preDeploy) {
      await this.runRemoteHook(task.hooks.preDeploy, deployDir);
    }

    // 5. Execute deployment script
    const deployScript = artifacts.find(a => a.type === "deploy_script");
    if (deployScript) {
      const scriptPath = path.join(deployDir, deployScript.name);
      await this.exec(`chmod +x ${scriptPath}`);

      const result = await this.exec(scriptPath, {
        cwd: deployDir,
        timeout: config.deploymentTimeout,
        env: config.environment,
      });

      if (result.exitCode !== 0) {
        throw new DeploymentError(`Deploy script failed: ${result.stderr}`);
      }
    }

    // 6. Run post-deploy hook
    if (task.hooks?.postDeploy) {
      await this.runRemoteHook(task.hooks.postDeploy, deployDir);
    }

    // 7. Health check
    if (config.healthCheck) {
      const healthy = await this.runHealthCheck(config.healthCheck);
      if (!healthy) {
        await this.rollback(task);
        throw new HealthCheckFailedError("Health check failed");
      }
    }

    // 8. Write version sticker
    await this.writeSticker(config.sticker, deployDir);

    // 9. Cleanup backup
    await this.exec(`rm -rf ${deployDir}/.stella-backup`);

    return {
      success: true,
      logs: this.getLogs(),
      durationMs: this.getDuration(),
    };

  } finally {
    this.ssh.end();
  }
}
```

### Command Execution

```typescript
private async exec(
  command: string,
  options?: ExecOptions
): Promise<CommandResult> {
  return new Promise((resolve, reject) => {
    const timeout = options?.timeout || 60000;
    let stdout = "";
    let stderr = "";

    this.ssh.exec(command, { cwd: options?.cwd }, (err, stream) => {
      if (err) {
        reject(err);
        return;
      }

      const timer = setTimeout(() => {
        stream.close();
        reject(new TimeoutError(`Command timed out after ${timeout}ms`));
      }, timeout);

      stream.on("data", (data: Buffer) => {
        stdout += data.toString();
        this.log(data.toString());
      });

      stream.stderr.on("data", (data: Buffer) => {
        stderr += data.toString();
        this.log(`[stderr] ${data.toString()}`);
      });

      stream.on("close", (code: number) => {
        clearTimeout(timer);
        resolve({ exitCode: code, stdout, stderr });
      });
    });
  });
}
```

### File Upload via SFTP

```typescript
private async uploadFile(content: Buffer | string, remotePath: string): Promise<void> {
  return new Promise((resolve, reject) => {
    this.ssh.sftp((err, sftp) => {
      if (err) {
        reject(err);
        return;
      }

      const writeStream = sftp.createWriteStream(remotePath);
      writeStream.on("close", () => resolve());
      writeStream.on("error", reject);
      writeStream.end(content);
    });
  });
}
```
|
||||
|
||||
### Rollback

```typescript
async rollback(task: RollbackTaskPayload): Promise<DeployResult> {
  const deployDir = task.config.deploymentDirectory;

  // Restore from backup. The shell glob `*` does not match dotfiles,
  // so `.stella-backup` itself survives the delete.
  await this.exec(`rm -rf ${deployDir}/*`);
  await this.exec(`cp -r ${deployDir}/.stella-backup/* ${deployDir}/`);

  // Re-run deployment from the restored files
  const deployScript = path.join(deployDir, "deploy.sh");
  await this.exec(deployScript, { cwd: deployDir });

  return {
    success: true,
    logs: this.getLogs(),
    durationMs: this.getDuration(),
  };
}
```

---

## WinRM Remote Executor

### Capabilities

- NTLM/Kerberos authentication
- PowerShell script execution
- File transfer via base64 encoding
- Windows container operations
- Windows service management

### Connection Management

```typescript
class WinRMRemoteExecutor implements TargetExecutor {
  private winrm: WinRMClient;

  async connect(config: WinRMConnectionConfig): Promise<void> {
    const credential = await this.secrets.getSecret(config.credentialRef);

    this.winrm = new WinRMClient({
      host: config.host,
      port: config.port || 5986,
      username: credential.username,
      password: credential.password,
      protocol: config.useHttps ? "https" : "http",
      authentication: config.authType || "ntlm", // ntlm, kerberos, basic
    });

    await this.winrm.openShell();
  }
}
```

### Deploy Task Flow

```typescript
async deploy(task: DeployTaskPayload): Promise<DeployResult> {
  const { artifacts, config } = task;
  const deployDir = config.deploymentDirectory;

  try {
    // 1. Ensure deployment directory exists
    await this.execPowerShell(`
      if (-not (Test-Path "${deployDir}")) {
        New-Item -ItemType Directory -Path "${deployDir}" -Force
      }
      if (-not (Test-Path "${deployDir}\\.stella-backup")) {
        New-Item -ItemType Directory -Path "${deployDir}\\.stella-backup" -Force
      }
    `);

    // 2. Backup current deployment
    await this.execPowerShell(`
      Get-ChildItem "${deployDir}" -Exclude ".stella-backup" |
        Copy-Item -Destination "${deployDir}\\.stella-backup" -Recurse -Force
    `);

    // 3. Upload artifacts
    for (const artifact of artifacts) {
      const content = await this.fetchArtifact(artifact);
      const remotePath = `${deployDir}\\${artifact.name}`;
      await this.uploadFile(content, remotePath);
    }

    // 4. Run pre-deploy hook
    if (task.hooks?.preDeploy) {
      await this.runRemoteHook(task.hooks.preDeploy, deployDir);
    }

    // 5. Execute deployment script
    const deployScript = artifacts.find(a => a.type === "deploy_script");
    if (deployScript) {
      const scriptPath = `${deployDir}\\${deployScript.name}`;

      const result = await this.execPowerShell(`
        Set-Location "${deployDir}"
        & "${scriptPath}"
        exit $LASTEXITCODE
      `, { timeout: config.deploymentTimeout });

      if (result.exitCode !== 0) {
        throw new DeploymentError(`Deploy script failed: ${result.stderr}`);
      }
    }

    // 6. Run post-deploy hook
    if (task.hooks?.postDeploy) {
      await this.runRemoteHook(task.hooks.postDeploy, deployDir);
    }

    // 7. Health check
    if (config.healthCheck) {
      const healthy = await this.runHealthCheck(config.healthCheck);
      if (!healthy) {
        await this.rollback(task);
        throw new HealthCheckFailedError("Health check failed");
      }
    }

    // 8. Write version sticker
    await this.writeSticker(config.sticker, deployDir);

    // 9. Clean up backup
    await this.execPowerShell(`
      Remove-Item -Path "${deployDir}\\.stella-backup" -Recurse -Force
    `);

    return {
      success: true,
      logs: this.getLogs(),
      durationMs: this.getDuration(),
    };
  } finally {
    this.winrm.closeShell();
  }
}
```

### PowerShell Execution

```typescript
private async execPowerShell(
  script: string,
  options?: ExecOptions
): Promise<CommandResult> {
  // -EncodedCommand expects the script as base64-encoded UTF-16LE
  const encoded = Buffer.from(script, "utf16le").toString("base64");
  return this.winrm.runCommand(
    `powershell -EncodedCommand ${encoded}`,
    { timeout: options?.timeout || 60000 }
  );
}
```

### File Upload

```typescript
private async uploadFile(content: Buffer | string, remotePath: string): Promise<void> {
  // Write the file by shipping its bytes as base64 inside a PowerShell
  // command. Note: WinRM bounds the command length, so large files must
  // be transferred in chunks.
  const base64Content = Buffer.from(content).toString("base64");

  await this.execPowerShell(`
    $bytes = [Convert]::FromBase64String("${base64Content}")
    [IO.File]::WriteAllBytes("${remotePath}", $bytes)
  `);
}
```

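Inlining an entire file into one `-EncodedCommand` invocation runs into WinRM command-length limits for anything beyond a few kilobytes. A common workaround is to split the base64 payload and append it to a staging file on the target chunk by chunk, decoding once at the end. The splitting itself is a pure function; this is a minimal sketch, and the default chunk size is an assumption to tune for your transport:

```typescript
// Split a base64 payload into pieces small enough to inline into a
// remote command. Each piece would be appended to a staging file on the
// target (e.g. via Add-Content) and decoded after the last one arrives.
function chunkPayload(base64: string, maxChunk = 7000): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < base64.length; i += maxChunk) {
    chunks.push(base64.slice(i, i + maxChunk));
  }
  return chunks;
}
```

Reassembling `chunkPayload(b64).join("")` always yields the original payload, so the transfer stays lossless regardless of chunk size.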
---

## Security Considerations

### SSH Security

1. **Key-Based Authentication:** Always use SSH keys, never passwords
2. **Key Rotation:** Rotate SSH keys regularly
3. **Bastion Hosts:** Use jump hosts for network isolation
4. **Connection Timeouts:** Enforce strict connection timeouts
5. **Known Hosts:** Verify host fingerprints before trusting a connection

### WinRM Security

1. **HTTPS Required:** Always use WinRM over HTTPS in production
2. **Certificate Validation:** Validate server certificates
3. **Kerberos Preferred:** Use Kerberos when available, NTLM as fallback
4. **Credential Protection:** Store credentials in the secrets vault
5. **Session Cleanup:** Always close sessions after use

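Host fingerprint verification (SSH item 5) can be implemented against the pinned fingerprint recorded for each target. A minimal sketch using Node's `crypto` module to compute an OpenSSH-style `SHA256:` fingerprint from the raw host key bytes; the helper names are ours:

```typescript
import { createHash } from "node:crypto";

// OpenSSH-style fingerprint: "SHA256:" + base64(sha256(keyblob)),
// with base64 padding stripped, as `ssh-keygen -lf` prints it.
function sshFingerprint(hostKey: Buffer): string {
  const digest = createHash("sha256").update(hostKey).digest("base64");
  return `SHA256:${digest.replace(/=+$/, "")}`;
}

// Accept the connection only if the presented host key matches the
// fingerprint pinned alongside the target configuration.
function verifyHostKey(hostKey: Buffer, pinnedFingerprint: string): boolean {
  return sshFingerprint(hostKey) === pinnedFingerprint;
}
```

With the ssh2 library this plugs into the `hostVerifier` connect option, e.g. `hostVerifier: (key) => verifyHostKey(key, pinnedFingerprint)`.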
---

## Configuration Examples

### SSH Target Configuration

```yaml
target:
  name: web-server-01
  type: ssh
  connection:
    host: 192.168.1.100
    port: 22
    username: deploy
    privateKeyRef: vault://ssh-keys/deploy-key
  deployment:
    directory: /opt/myapp
    healthCheck:
      command: curl -f http://localhost:8080/health
      timeout: 30  # seconds
```

### WinRM Target Configuration

```yaml
target:
  name: windows-server-01
  type: winrm
  connection:
    host: 192.168.1.200
    port: 5986
    useHttps: true
    authType: kerberos
    credentialRef: vault://windows-creds/deploy-user
  deployment:
    directory: C:\Apps\MyApp
    healthCheck:
      command: Invoke-WebRequest -Uri http://localhost:8080/health -UseBasicParsing
      timeout: 30  # seconds
```

---

## See Also

- [Agent-Based Deployment](agent-based.md)
- [Agents Module](../modules/agents.md)
- [Deployment Orchestrator](../modules/deploy-orchestrator.md)
- [Security Overview](../security/overview.md)

308
docs/modules/release-orchestrator/deployment/artifacts.md
Normal file
@@ -0,0 +1,308 @@
# Artifact Generation

## Overview

Every deployment generates immutable artifacts that enable reproducibility, audit, and rollback.

## Generated Artifacts

### 1. Compose Lock File

**File:** `compose.stella.lock.yml`

A Docker Compose file with all image references pinned to specific digests.

```yaml
# compose.stella.lock.yml
# Generated by Stella Ops - DO NOT EDIT
# Release: myapp-v2.3.1
# Generated: 2026-01-10T14:30:00Z
# Generator: stella-artifact-generator@1.5.0

version: "3.8"

services:
  api:
    image: registry.example.com/myapp/api@sha256:abc123...
    # Original tag: v2.3.1
    deploy:
      replicas: 2
    environment:
      - DATABASE_URL=${DATABASE_URL}
      - REDIS_URL=${REDIS_URL}
    labels:
      stella.component.id: "comp-api-uuid"
      stella.release.id: "rel-uuid"
      stella.digest: "sha256:abc123..."

  worker:
    image: registry.example.com/myapp/worker@sha256:def456...
    # Original tag: v2.3.1
    deploy:
      replicas: 1
    labels:
      stella.component.id: "comp-worker-uuid"
      stella.release.id: "rel-uuid"
      stella.digest: "sha256:def456..."

# Stella metadata
x-stella:
  release:
    id: "rel-uuid"
    name: "myapp-v2.3.1"
    created_at: "2026-01-10T14:00:00Z"
  environment:
    id: "env-uuid"
    name: "production"
  deployment:
    id: "deploy-uuid"
    started_at: "2026-01-10T14:30:00Z"
  checksums:
    sha256: "checksum-of-this-file"
```

### 2. Version Sticker

**File:** `stella.version.json`

A metadata file placed on each deployment target that records the current deployment state.

```json
{
  "version": "1.0",
  "generatedAt": "2026-01-10T14:35:00Z",
  "generator": "stella-artifact-generator@1.5.0",

  "release": {
    "id": "rel-uuid",
    "name": "myapp-v2.3.1",
    "createdAt": "2026-01-10T14:00:00Z",
    "components": [
      {
        "name": "api",
        "digest": "sha256:abc123...",
        "semver": "2.3.1",
        "tag": "v2.3.1"
      },
      {
        "name": "worker",
        "digest": "sha256:def456...",
        "semver": "2.3.1",
        "tag": "v2.3.1"
      }
    ]
  },

  "deployment": {
    "id": "deploy-uuid",
    "promotionId": "promo-uuid",
    "environmentId": "env-uuid",
    "environmentName": "production",
    "targetId": "target-uuid",
    "targetName": "prod-web-01",
    "strategy": "rolling",
    "startedAt": "2026-01-10T14:30:00Z",
    "completedAt": "2026-01-10T14:35:00Z"
  },

  "deployer": {
    "userId": "user-uuid",
    "userName": "john.doe",
    "agentId": "agent-uuid",
    "agentName": "prod-agent-01"
  },

  "previous": {
    "releaseId": "prev-rel-uuid",
    "releaseName": "myapp-v2.3.0",
    "digest": "sha256:789..."
  },

  "signature": "base64-encoded-signature",
  "signatureAlgorithm": "RS256",
  "signerKeyRef": "stella/signing/prod-key-2026"
}
```

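The `signature` field covers the sticker payload itself, so signer and verifier must serialize the document with the signature-related fields excluded. A self-contained sketch of that round trip with Node's `crypto` (RS256 is RSA with SHA-256; real key management goes through the vault, so the example generates a throwaway key pair, and a production implementation should use a canonical JSON serialization so key order cannot differ between signer and verifier):

```typescript
import { createSign, createVerify, generateKeyPairSync, type KeyLike } from "node:crypto";

// Serialize the sticker with signature fields blanked, so the signature
// covers exactly the payload the verifier will reconstruct.
function signingPayload(sticker: Record<string, unknown>): string {
  const clone: Record<string, unknown> = { ...sticker };
  delete clone.signature;
  delete clone.signatureAlgorithm;
  delete clone.signerKeyRef;
  return JSON.stringify(clone);
}

function signSticker(sticker: Record<string, unknown>, privateKey: KeyLike): string {
  return createSign("RSA-SHA256").update(signingPayload(sticker)).sign(privateKey, "base64");
}

function verifySticker(sticker: Record<string, unknown>, publicKey: KeyLike): boolean {
  return createVerify("RSA-SHA256")
    .update(signingPayload(sticker))
    .verify(publicKey, String(sticker.signature), "base64");
}
```

Any change to the signed payload after signing (a swapped release ID, an edited digest) makes `verifySticker` return `false`, which is what drift and tamper checks rely on.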
### 3. Evidence Packet

**File:** Evidence stored in the database (exportable as JSON/PDF)

See [Evidence Schema](../appendices/evidence-schema.md) for the full specification.

### 4. Deployment Script (Optional)

**File:** `deploy.stella.script.dll` or `deploy.stella.sh`

When deployments use C# or shell scripts with hooks:

```csharp
// deploy.stella.csx (source, compiled to DLL)
#r "nuget: StellaOps.Sdk, 1.0.0"

using StellaOps.Sdk;

// Pre-deploy hook
await Context.RunPreDeployHook(async (ctx) => {
    await ctx.ExecuteCommand("./scripts/backup-database.sh");
    await ctx.HealthCheck("/ready", timeout: 30);
});

// Deploy
await Context.Deploy();

// Post-deploy hook
await Context.RunPostDeployHook(async (ctx) => {
    await ctx.ExecuteCommand("./scripts/warm-cache.sh");
    await ctx.Notify("slack", "Deployment complete");
});
```

## Artifact Storage

### Storage Structure

```
artifacts/
├── {tenant_id}/
│   ├── {deployment_id}/
│   │   ├── compose.stella.lock.yml
│   │   ├── deploy.stella.script.dll (if applicable)
│   │   ├── deploy.stella.script.csx (source)
│   │   ├── manifest.json
│   │   └── checksums.sha256
│   └── ...
└── ...
```

### Manifest File

```json
{
  "version": "1.0",
  "deploymentId": "deploy-uuid",
  "createdAt": "2026-01-10T14:30:00Z",
  "artifacts": [
    {
      "name": "compose.stella.lock.yml",
      "type": "compose-lock",
      "size": 2048,
      "sha256": "abc123..."
    },
    {
      "name": "deploy.stella.script.dll",
      "type": "script-compiled",
      "size": 8192,
      "sha256": "def456..."
    }
  ],
  "totalSize": 10240,
  "signature": "base64-signature"
}
```

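Before any artifact in the manifest is trusted, the manifest itself deserves a structural sanity check. A minimal sketch validating the per-entry checksum format and the `totalSize` sum; the rules are our assumptions for illustration, not a normative schema:

```typescript
interface ManifestEntry { name: string; size: number; sha256: string; }
interface Manifest { artifacts: ManifestEntry[]; totalSize: number; }

const SHA256_HEX = /^[0-9a-f]{64}$/;

// Structural check: every entry carries a well-formed hex digest and the
// entry sizes add up to the declared total. Returns a list of problems.
function validateManifest(m: Manifest): string[] {
  const errors: string[] = [];
  let sum = 0;
  for (const a of m.artifacts) {
    if (!SHA256_HEX.test(a.sha256)) errors.push(`${a.name}: malformed sha256`);
    sum += a.size;
  }
  if (sum !== m.totalSize) errors.push(`totalSize ${m.totalSize} != sum ${sum}`);
  return errors;
}
```

This catches truncated or hand-edited manifests early; cryptographic verification of the manifest `signature` is a separate step.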
## Artifact Generation Process

```
Promotion approved
        │
        ▼
ARTIFACT GENERATOR
  1. Load release bundle (components, digests)
  2. Load environment configuration (variables, secret refs)
  3. Load workflow template (hooks, scripts)
  4. Generate compose.stella.lock.yml
  5. Compile scripts (if any)
  6. Generate version sticker template
  7. Compute checksums
  8. Sign artifacts
  9. Store in artifact storage
        │
        ▼
DEPLOYMENT ORCHESTRATOR
  Artifacts distributed to targets via agents
```

## Artifact Properties

### Immutability

Once generated, artifacts are never modified:

- Content-addressed storage (hash in path/metadata)
- No overwrite capability
- Append-only storage pattern

### Integrity

All artifacts are:

- Checksummed (SHA-256)
- Signed with the deployment key
- Verifiable at deployment time

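The checksum half of this verification is a one-liner with Node's `crypto`. A minimal sketch of the check an agent could run against a manifest entry before executing an artifact (the helper names are ours):

```typescript
import { createHash } from "node:crypto";

// SHA-256 of artifact content, hex-encoded to match manifest entries.
function sha256Hex(content: Buffer | string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Reject artifacts whose content does not match the signed manifest.
function verifyChecksum(content: Buffer | string, expectedSha256: string): boolean {
  return sha256Hex(content) === expectedSha256.toLowerCase();
}
```

Signature verification of the checksummed manifest then establishes that the expected digest itself is authentic.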
### Retention

| Environment | Retention Period |
|-------------|------------------|
| Development | 30 days |
| Staging | 90 days |
| Production | 7 years (compliance) |

## API Operations

```yaml
# List artifacts for a deployment
GET /api/v1/deployment-jobs/{id}/artifacts
Response: Artifact[]

# Download a specific artifact
GET /api/v1/deployment-jobs/{id}/artifacts/{name}
Response: binary

# Get the artifact manifest
GET /api/v1/deployment-jobs/{id}/artifacts/manifest
Response: ArtifactManifest

# Verify artifact integrity
POST /api/v1/deployment-jobs/{id}/artifacts/{name}/verify
Response: { valid: boolean, checksum: string, signature: string }
```

## Drift Detection

Version stickers enable drift detection:

```typescript
interface DriftCheck {
  targetId: UUID;
  expectedSticker: VersionSticker;
  actualSticker: VersionSticker | null;
  driftDetected: boolean;
  driftType?: "missing" | "corrupted" | "mismatch";
  details?: {
    expectedDigest: string;
    actualDigest: string;
    field: string;
  };
}
```

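Given that interface, the comparison itself is straightforward. A sketch of how an agent-side check might classify drift, using trimmed-down sticker types for illustration (`corrupted` would come from a failed signature or checksum check and is omitted here):

```typescript
interface VersionStickerLite {
  release: { id: string };
  deployment: { id: string };
}

interface DriftResult {
  driftDetected: boolean;
  driftType?: "missing" | "mismatch";
}

// Classify drift by comparing the sticker read from the target against
// the sticker the control plane expects to find there.
function checkDrift(
  expected: VersionStickerLite,
  actual: VersionStickerLite | null
): DriftResult {
  if (actual === null) {
    return { driftDetected: true, driftType: "missing" };
  }
  if (
    actual.release.id !== expected.release.id ||
    actual.deployment.id !== expected.deployment.id
  ) {
    return { driftDetected: true, driftType: "mismatch" };
  }
  return { driftDetected: false };
}
```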
## References

- [Deployment Overview](overview.md)
- [Deployment Strategies](strategies.md)
- [Evidence Schema](../appendices/evidence-schema.md)
671
docs/modules/release-orchestrator/deployment/overview.md
Normal file
@@ -0,0 +1,671 @@
# Deployment Overview

## Purpose

The Deployment system executes releases against target environments: it manages deployment jobs and tasks, generates deployment artifacts, and provides rollback capabilities.

## Deployment Architecture

```
DEPLOYMENT ARCHITECTURE

DEPLOY ORCHESTRATOR
├─ Deployment Job Manager
│    Promotion ──► Create Job ──► Plan Tasks ──► Execute Tasks
├─ Target Executor     (task dispatch, status tracking, log aggregation)
├─ Runner Executor     (agent tasks, SSH tasks, API tasks)
└─ Artifact Generator  (compose files, env configs, manifests)

Execution paths:
  Agent execution:     Docker, Compose
  Agentless execution: SSH, WinRM
  API execution:       ECS, Nomad
```

## Deployment Flow

### Standard Deployment Flow

```
DEPLOYMENT FLOW

Promotion approved
  1. Create deployment job
  2. Job generates artifacts
  3. Job creates one task per target
  4. Tasks are dispatched to agents/targets
  5. Agent/target executes (pull images, deploy)
  6. Agent/target reports status back to the task
  7. Job aggregates task results
  8. Job complete; promotion is updated
```

## Deployment Job

### Job Entity

```typescript
interface DeploymentJob {
  id: UUID;
  promotionId: UUID;
  releaseId: UUID;
  environmentId: UUID;

  // Execution configuration
  strategy: DeploymentStrategy;
  parallelism: number;

  // Status tracking
  status: JobStatus;
  startedAt?: DateTime;
  completedAt?: DateTime;

  // Artifacts
  artifacts: GeneratedArtifact[];

  // Rollback reference
  rollbackOf?: UUID;     // If this is a rollback job
  previousJobId?: UUID;  // Previous successful job

  // Tasks
  tasks: DeploymentTask[];
}

type JobStatus =
  | "pending"
  | "preparing"
  | "running"
  | "completing"
  | "completed"
  | "failed"
  | "rolling_back"
  | "rolled_back";

type DeploymentStrategy =
  | "all-at-once"
  | "rolling"
  | "canary"
  | "blue-green";
```

### Job State Machine

```
JOB STATE MACHINE

PENDING ──start()──► PREPARING    (generate artifacts)
PREPARING ─────────► RUNNING      (execute tasks)
RUNNING ───────────► COMPLETING   (verify health) ──► COMPLETED
RUNNING ───────────► FAILED       ──► [failure handling]
RUNNING ───────────► ROLLING_BACK
ROLLING_BACK ──────► RUNNING      (rollback tasks execute)
ROLLING_BACK ──────► ROLLED_BACK
```

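The state machine above can be enforced with a simple allowed-transitions table so that no code path moves a job along an edge the diagram does not show. A sketch; the `preparing → failed` and `failed → rolling_back` edges are our additions, since the diagram leaves failure handling open:

```typescript
type JobStatus =
  | "pending" | "preparing" | "running" | "completing"
  | "completed" | "failed" | "rolling_back" | "rolled_back";

// Edges of the job state machine; anything not listed is rejected.
const allowedTransitions: Record<JobStatus, JobStatus[]> = {
  pending: ["preparing"],
  preparing: ["running", "failed"],
  running: ["completing", "failed", "rolling_back"],
  completing: ["completed", "failed"],
  completed: [],
  failed: ["rolling_back"],
  rolling_back: ["running", "rolled_back", "failed"],
  rolled_back: [],
};

function canTransition(from: JobStatus, to: JobStatus): boolean {
  return allowedTransitions[from].includes(to);
}
```

A job manager would call `canTransition(job.status, next)` before every status update and treat a rejected edge as a programming error.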
## Deployment Task

### Task Entity

```typescript
interface DeploymentTask {
  id: UUID;
  jobId: UUID;
  targetId: UUID;

  // What to deploy
  componentId: UUID;
  digest: string;

  // Execution
  status: TaskStatus;
  agentId?: UUID;
  startedAt?: DateTime;
  completedAt?: DateTime;

  // Results
  logs: string;
  previousDigest?: string; // For rollback
  error?: string;

  // Retry tracking
  attemptNumber: number;
  maxAttempts: number;
}

type TaskStatus =
  | "pending"
  | "queued"
  | "dispatched"
  | "running"
  | "verifying"
  | "succeeded"
  | "failed"
  | "retrying";
```

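`attemptNumber` and `maxAttempts` imply a retry loop, but the delay policy between attempts is not specified here. A common choice is capped exponential backoff; this sketch is deterministic for clarity, and a production retry loop would usually add jitter to avoid thundering herds:

```typescript
// Capped exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... up to capMs.
function backoffMs(attemptNumber: number, baseMs = 1000, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attemptNumber);
}

// A task is retried while it has attempts left; the dispatcher moves it
// to "retrying" and re-queues it after the computed delay.
function shouldRetry(attemptNumber: number, maxAttempts: number): boolean {
  return attemptNumber < maxAttempts;
}
```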
### Task Dispatch

```typescript
class TaskDispatcher {
  async dispatchTask(task: DeploymentTask): Promise<void> {
    const target = await this.targetRepository.get(task.targetId);

    switch (target.executionModel) {
      case "agent":
        await this.dispatchToAgent(task, target);
        break;

      case "ssh":
        await this.dispatchViaSsh(task, target);
        break;

      case "api":
        await this.dispatchViaApi(task, target);
        break;
    }
  }

  private async dispatchToAgent(
    task: DeploymentTask,
    target: Target
  ): Promise<void> {
    // Find an available agent for the target
    const agent = await this.agentManager.findAgentForTarget(target);

    if (!agent) {
      throw new NoAgentAvailableError(target.id);
    }

    // Create task payload
    const payload: AgentTaskPayload = {
      taskId: task.id,
      targetId: target.id,
      action: "deploy",
      digest: task.digest,
      config: target.connection,
      credentials: await this.fetchTaskCredentials(target)
    };

    // Dispatch to agent
    await this.agentClient.dispatchTask(agent.id, payload);

    // Update task status
    task.status = "dispatched";
    task.agentId = agent.id;
    await this.taskRepository.update(task);
  }
}
```

## Generated Artifacts

### Artifact Types

| Type | Description | Format |
|------|-------------|--------|
| `compose-file` | Docker Compose file | YAML |
| `compose-lock` | Pinned compose file | YAML |
| `env-file` | Environment variables | .env |
| `systemd-unit` | Systemd service unit | .service |
| `nginx-config` | Nginx configuration | .conf |
| `manifest` | Deployment manifest | JSON |

### Compose Lock Generation

```typescript
interface ComposeLock {
  version: string;
  services: Record<string, LockedService>;
  generated: {
    releaseId: string;
    promotionId: string;
    timestamp: string;
    digest: string; // Hash of this file
  };
}

interface LockedService {
  image: string; // Full image reference with digest
  environment?: Record<string, string>;
  labels: Record<string, string>;
}

class ComposeArtifactGenerator {
  async generateLock(
    release: Release,
    target: Target,
    template: ComposeTemplate
  ): Promise<ComposeLock> {
    const services: Record<string, LockedService> = {};

    for (const [serviceName, serviceConfig] of Object.entries(template.services)) {
      // Find the component for this service
      const componentDigest = release.components.find(
        c => c.name === serviceConfig.componentName
      );

      if (!componentDigest) {
        throw new Error(`No component found for service ${serviceName}`);
      }

      // Build locked image reference
      const imageRef = `${componentDigest.repository}@${componentDigest.digest}`;

      services[serviceName] = {
        image: imageRef,
        environment: {
          ...serviceConfig.environment,
          STELLA_RELEASE_ID: release.id,
          STELLA_DIGEST: componentDigest.digest
        },
        labels: {
          "stella.release.id": release.id,
          "stella.component.name": componentDigest.name,
          "stella.digest": componentDigest.digest,
          "stella.deployed.at": new Date().toISOString()
        }
      };
    }

    const lock: ComposeLock = {
      version: "3.8",
      services,
      generated: {
        releaseId: release.id,
        promotionId: target.promotionId,
        timestamp: new Date().toISOString(),
        // Computed below; the recorded hash therefore covers the
        // serialized file with this field still blank.
        digest: ""
      }
    };

    // Compute content hash
    const content = yaml.stringify(lock);
    lock.generated.digest = crypto.createHash("sha256").update(content).digest("hex");

    return lock;
  }
}
```

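Because `generated.digest` is computed while the field is still empty, a verifier must blank that field before re-hashing or the check can never pass. A sketch of the verification side; JSON serialization keeps the example self-contained, but since the generator above serializes to YAML, a real verifier must use the exact same canonical serialization as the generator:

```typescript
import { createHash } from "node:crypto";

interface LockLike {
  generated: { digest: string; [k: string]: unknown };
  [k: string]: unknown;
}

// Re-compute the content hash with generated.digest blanked, exactly as
// the generator did before filling the field in.
function computeLockDigest(lock: LockLike): string {
  const clone: LockLike = JSON.parse(JSON.stringify(lock));
  clone.generated.digest = "";
  return createHash("sha256").update(JSON.stringify(clone)).digest("hex");
}

function verifyLockDigest(lock: LockLike): boolean {
  return computeLockDigest(lock) === lock.generated.digest;
}
```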
## Deployment Execution

### Execution Models

| Model | Description | Use Case |
|-------|-------------|----------|
| `agent` | Stella agent on target | Docker hosts, servers |
| `ssh` | SSH-based agentless | Unix servers |
| `winrm` | WinRM-based agentless | Windows servers |
| `api` | API-based | ECS, Nomad, K8s |

### Agent-Based Execution

```typescript
class AgentExecutor {
  async execute(task: DeploymentTask): Promise<ExecutionResult> {
    const agent = await this.agentManager.get(task.agentId);
    const target = await this.targetRepository.get(task.targetId);

    // Prepare task payload with secrets
    const payload: TaskPayload = {
      taskId: task.id,
      targetId: target.id,
      action: "deploy",
      digest: task.digest,
      config: target.connection,
      artifacts: await this.getArtifacts(task.jobId),
      credentials: await this.secretsManager.fetchForTask(target)
    };

    // Dispatch to agent
    const taskRef = await this.agentClient.dispatchTask(agent.id, payload);

    // Wait for completion
    const result = await this.waitForTaskCompletion(taskRef, task.timeout);

    return result;
  }

  private async waitForTaskCompletion(
    taskRef: TaskReference,
    timeout: number
  ): Promise<ExecutionResult> {
    const deadline = Date.now() + timeout * 1000;

    while (Date.now() < deadline) {
      const status = await this.agentClient.getTaskStatus(taskRef);

      if (status.completed) {
        return {
          success: status.success,
          logs: status.logs,
          deployedDigest: status.deployedDigest,
          error: status.error
        };
      }

      await sleep(1000);
    }

    throw new TimeoutError(`Task did not complete within ${timeout} seconds`);
  }
}
```

### SSH-Based Execution

```typescript
class SshExecutor {
  async execute(task: DeploymentTask): Promise<ExecutionResult> {
    const target = await this.targetRepository.get(task.targetId);
    const sshConfig = target.connection as SshConnectionConfig;

    // Get SSH credentials from the vault
    const creds = await this.secretsManager.fetchSshCredentials(
      sshConfig.credentialRef
    );

    // Connect via SSH
    const ssh = new NodeSSH();
    await ssh.connect({
      host: sshConfig.host,
      port: sshConfig.port || 22,
      username: creds.username,
      privateKey: creds.privateKey
    });

    try {
      // Upload artifacts
      const artifacts = await this.getArtifacts(task.jobId);
      for (const artifact of artifacts) {
        await ssh.putFile(artifact.localPath, artifact.remotePath);
      }

      // Execute deployment script
      const result = await ssh.execCommand(
        this.buildDeployCommand(task, target),
        { cwd: sshConfig.workDir }
      );

      return {
        success: result.code === 0,
        logs: `${result.stdout}\n${result.stderr}`,
        error: result.code !== 0 ? result.stderr : undefined
      };
    } finally {
      ssh.dispose();
    }
  }

  private buildDeployCommand(task: DeploymentTask, target: Target): string {
    // Build the deployment command based on target type
    switch (target.targetType) {
      case "compose_host":
        return `cd ${target.connection.workDir} && docker-compose pull && docker-compose up -d`;

      case "docker_host":
        // task.digest must be a full image reference pinned by digest
        // (repo@sha256:...), which is what `docker pull` requires; the
        // stopped container is removed before reusing its name.
        return `docker pull ${task.digest} && docker stop ${target.containerName} && docker rm ${target.containerName} && docker run -d --name ${target.containerName} ${task.digest}`;

      default:
        throw new Error(`Unsupported target type: ${target.targetType}`);
    }
  }
}
```

## Health Verification
|
||||
|
||||
```typescript
|
||||
interface HealthCheckConfig {
|
||||
type: "http" | "tcp" | "command";
|
||||
timeout: number;
|
||||
retries: number;
|
||||
interval: number;
|
||||
|
||||
// HTTP-specific
|
||||
path?: string;
|
||||
expectedStatus?: number;
|
||||
expectedBody?: string;
|
||||
|
||||
// TCP-specific
|
||||
port?: number;
|
||||
|
||||
// Command-specific
|
||||
command?: string;
|
||||
}
|
||||
|
||||
class HealthVerifier {
|
||||
async verify(
|
||||
target: Target,
|
||||
config: HealthCheckConfig
|
||||
): Promise<HealthCheckResult> {
|
||||
let lastError: Error | undefined;
|
||||
|
||||
for (let attempt = 0; attempt < config.retries; attempt++) {
|
||||
try {
|
||||
const result = await this.performCheck(target, config);
|
||||
|
||||
if (result.healthy) {
|
||||
return result;
|
||||
}
|
||||
|
||||
lastError = new Error(result.message);
|
||||
} catch (error) {
|
||||
lastError = error as Error;
|
||||
}
|
||||
|
||||
if (attempt < config.retries - 1) {
|
||||
await sleep(config.interval * 1000);
|
||||
}
|
||||
}
|
||||
|
||||
return {
|
||||
healthy: false,
|
||||
message: lastError?.message || "Health check failed",
|
||||
attempts: config.retries
|
||||
};
|
||||
}
|
||||
|
||||
private async performCheck(
|
||||
target: Target,
|
||||
config: HealthCheckConfig
|
||||
): Promise<HealthCheckResult> {
|
||||
switch (config.type) {
|
||||
case "http":
|
||||
return this.httpCheck(target, config);
|
||||
|
||||
case "tcp":
|
||||
return this.tcpCheck(target, config);
|
||||
|
||||
case "command":
|
||||
return this.commandCheck(target, config);
|
||||
}
|
||||
}
|
||||
|
||||
private async httpCheck(
|
||||
target: Target,
|
||||
config: HealthCheckConfig
|
||||
): Promise<HealthCheckResult> {
|
||||
const url = `${target.healthEndpoint}${config.path || "/health"}`;
|
||||
|
||||
try {
|
||||
const response = await fetch(url, {
|
||||
signal: AbortSignal.timeout(config.timeout * 1000)
|
||||
});
|
||||
|
||||
const healthy = response.status === (config.expectedStatus || 200);
|
||||
|
||||
return {
|
||||
healthy,
|
||||
message: healthy ? "OK" : `Status ${response.status}`,
|
||||
statusCode: response.status
|
||||
};
|
||||
} catch (error) {
|
||||
return {
|
||||
healthy: false,
|
||||
message: (error as Error).message
|
||||
};
|
||||
}
|
||||
}
|
||||
}
|
||||
```
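
The retry loop above puts an upper bound on how long a single target can hold up a deployment. A quick sketch of that bound (the helper name is ours, not part of the orchestrator API):

```typescript
// Worst case: every attempt runs to its full timeout, plus an interval
// between attempts (there is no sleep after the final attempt).
function worstCaseHealthCheckSeconds(cfg: {
  timeout: number;
  retries: number;
  interval: number;
}): number {
  return cfg.retries * cfg.timeout + (cfg.retries - 1) * cfg.interval;
}
```

With the HTTP example used throughout these pages (30 s timeout, 3 retries, 10 s interval) a dead target can stall its task for up to 110 seconds, which is worth budgeting for when choosing batch delays.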

## Rollback Management

```typescript
class RollbackManager {
  async initiateRollback(
    jobId: UUID,
    reason: string
  ): Promise<DeploymentJob> {
    const failedJob = await this.jobRepository.get(jobId);
    const previousJob = await this.findPreviousSuccessfulJob(
      failedJob.environmentId,
      failedJob.releaseId
    );

    if (!previousJob) {
      throw new NoRollbackTargetError(jobId);
    }

    // Create rollback job
    const rollbackJob: DeploymentJob = {
      id: uuidv4(),
      promotionId: failedJob.promotionId,
      releaseId: previousJob.releaseId, // Previous release
      environmentId: failedJob.environmentId,
      strategy: "all-at-once", // Fast rollback
      parallelism: 10,
      status: "pending",
      rollbackOf: jobId,
      previousJobId: previousJob.id,
      artifacts: [],
      tasks: []
    };

    // Create tasks to restore previous state
    for (const task of failedJob.tasks) {
      const previousTask = previousJob.tasks.find(
        t => t.targetId === task.targetId
      );

      if (previousTask) {
        rollbackJob.tasks.push({
          id: uuidv4(),
          jobId: rollbackJob.id,
          targetId: task.targetId,
          componentId: previousTask.componentId,
          // Restore the digest the last successful job deployed; fall back
          // to what the failed job recorded as the prior state.
          digest: previousTask.digest ?? task.previousDigest!,
          status: "pending",
          logs: "",
          attemptNumber: 0,
          maxAttempts: 3
        });
      }
    }

    await this.jobRepository.save(rollbackJob);

    // Execute rollback
    await this.executeJob(rollbackJob);

    return rollbackJob;
  }

  private async findPreviousSuccessfulJob(
    environmentId: UUID,
    excludeReleaseId: UUID
  ): Promise<DeploymentJob | null> {
    return this.jobRepository.findOne({
      environmentId,
      status: "completed",
      releaseId: { $ne: excludeReleaseId }
    }, {
      orderBy: { completedAt: "desc" }
    });
  }
}
```
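
The task-matching loop in `initiateRollback` boils down to a small, testable rule. A pure sketch of it (names and shapes are illustrative, assuming the goal is to restore whatever the last successful job deployed on each target):

```typescript
interface TaskDigests {
  targetId: string;
  digest?: string;          // digest this job deployed
  previousDigest?: string;  // digest running before this job
}

// For each target the failed job touched, choose the digest to restore:
// prefer what the last successful job deployed there, falling back to
// what the failed job recorded as the prior state. Targets unknown to
// the previous job are skipped.
function pickRollbackDigests(
  failedTasks: TaskDigests[],
  previousTasks: TaskDigests[]
): Map<string, string> {
  const restore = new Map<string, string>();
  for (const task of failedTasks) {
    const prev = previousTasks.find(t => t.targetId === task.targetId);
    const digest = prev && (prev.digest ?? task.previousDigest);
    if (digest) {
      restore.set(task.targetId, digest);
    }
  }
  return restore;
}
```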

## References

- [Deployment Strategies](strategies.md)
- [Agent-Based Deployment](agent-based.md)
- [Agentless Deployment](agentless.md)
- [Generated Artifacts](artifacts.md)
- [Deploy Orchestrator Module](../modules/deploy-orchestrator.md)
656
docs/modules/release-orchestrator/deployment/strategies.md
Normal file
@@ -0,0 +1,656 @@

# Deployment Strategies

## Overview

Release Orchestrator supports multiple deployment strategies to balance deployment speed, risk, and availability requirements.

## Strategy Comparison

| Strategy | Description | Risk Level | Downtime | Rollback Speed |
|----------|-------------|------------|----------|----------------|
| All-at-once | Deploy to all targets simultaneously | High | Brief | Fast |
| Rolling | Deploy to targets in batches | Medium | None | Medium |
| Canary | Deploy to subset, then expand | Low | None | Fast |
| Blue-Green | Deploy to parallel environment | Low | None | Instant |

## All-at-Once Strategy

### Description

Deploys to all targets simultaneously. Simple and fast, but highest risk.

```
ALL-AT-ONCE DEPLOYMENT

      Time 0                      Time 1
┌─────────────────┐        ┌─────────────────┐
│ Target 1  [v1]  │        │ Target 1  [v2]  │
├─────────────────┤        ├─────────────────┤
│ Target 2  [v1]  │  ───►  │ Target 2  [v2]  │
├─────────────────┤        ├─────────────────┤
│ Target 3  [v1]  │        │ Target 3  [v2]  │
└─────────────────┘        └─────────────────┘
```

### Configuration

```typescript
interface AllAtOnceConfig {
  strategy: "all-at-once";

  // Concurrency limit (0 = unlimited)
  maxConcurrent: number;

  // Health check after deployment
  healthCheck: HealthCheckConfig;

  // Failure behavior
  failureBehavior: "rollback" | "continue" | "pause";
}

// Example
const config: AllAtOnceConfig = {
  strategy: "all-at-once",
  maxConcurrent: 0,
  healthCheck: {
    type: "http",
    path: "/health",
    timeout: 30,
    retries: 3,
    interval: 10
  },
  failureBehavior: "rollback"
};
```

### Execution

```typescript
class AllAtOnceExecutor {
  async execute(job: DeploymentJob, config: AllAtOnceConfig): Promise<void> {
    const tasks = job.tasks;
    const concurrency = config.maxConcurrent || tasks.length;

    // Execute all tasks with concurrency limit
    const results = await pMap(
      tasks,
      async (task) => {
        try {
          await this.executeTask(task);
          return { taskId: task.id, success: true };
        } catch (error) {
          return { taskId: task.id, success: false, error };
        }
      },
      { concurrency }
    );

    // Check for failures
    const failures = results.filter(r => !r.success);

    if (failures.length > 0) {
      if (config.failureBehavior === "rollback") {
        await this.rollbackAll(job);
        throw new DeploymentFailedError(failures);
      } else if (config.failureBehavior === "pause") {
        job.status = "failed";
        throw new DeploymentFailedError(failures);
      }
      // "continue" - proceed despite failures
    }

    // Health check all targets
    await this.verifyAllTargets(job, config.healthCheck);
  }
}
```
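
`pMap` above is the `p-map` npm package. In an offline-first codebase the same concurrency limit can be had with a few lines of dependency-free TypeScript (a sketch; `mapWithConcurrency` is our name, not an orchestrator API):

```typescript
async function mapWithConcurrency<T, R>(
  items: T[],
  fn: (item: T) => Promise<R>,
  concurrency: number
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Spawn up to `concurrency` workers; each synchronously claims the next
  // index before awaiting, so no index runs twice and result order is kept.
  const workers = Array.from(
    { length: Math.min(concurrency, items.length) },
    async () => {
      while (next < items.length) {
        const i = next++;
        results[i] = await fn(items[i]);
      }
    }
  );
  await Promise.all(workers);
  return results;
}
```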

### Use Cases

- Development environments
- Small deployments
- Time-critical updates
- Stateless services with fast startup

## Rolling Strategy

### Description

Deploys to targets in configurable batches, maintaining availability throughout.

```
ROLLING DEPLOYMENT (batch size: 1)

    Time 0            Time 1            Time 2            Time 3
┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐
│ T1 [v1]     │   │ T1 [v2] ✓   │   │ T1 [v2] ✓   │   │ T1 [v2] ✓   │
├─────────────┤   ├─────────────┤   ├─────────────┤   ├─────────────┤
│ T2 [v1]     │──►│ T2 [v1]     │──►│ T2 [v2] ✓   │──►│ T2 [v2] ✓   │
├─────────────┤   ├─────────────┤   ├─────────────┤   ├─────────────┤
│ T3 [v1]     │   │ T3 [v1]     │   │ T3 [v1]     │   │ T3 [v2] ✓   │
└─────────────┘   └─────────────┘   └─────────────┘   └─────────────┘
```

### Configuration

```typescript
interface RollingConfig {
  strategy: "rolling";

  // Batch configuration
  batchSize: number;          // Targets per batch
  batchPercent?: number;      // Alternative: percentage of targets

  // Timing
  batchDelay: number;         // Seconds between batches
  stabilizationTime: number;  // Seconds to wait after health check passes

  // Health check
  healthCheck: HealthCheckConfig;

  // Failure handling
  maxFailedBatches: number;   // Failed batches tolerated before stopping
  failureBehavior: "rollback" | "pause" | "skip";

  // Ordering
  targetOrder: "default" | "shuffle" | "priority";
}

// Example
const config: RollingConfig = {
  strategy: "rolling",
  batchSize: 2,
  batchDelay: 30,
  stabilizationTime: 60,
  healthCheck: {
    type: "http",
    path: "/health",
    timeout: 30,
    retries: 5,
    interval: 10
  },
  maxFailedBatches: 1,
  failureBehavior: "rollback",
  targetOrder: "default"
};
```

### Execution

```typescript
class RollingExecutor {
  async execute(job: DeploymentJob, config: RollingConfig): Promise<void> {
    const tasks = this.orderTasks(job.tasks, config.targetOrder);
    const batches = this.createBatches(tasks, config);
    let failedBatches = 0;
    const completedTasks: DeploymentTask[] = [];

    for (const batch of batches) {
      this.emitProgress(job, {
        phase: "deploying",
        currentBatch: batches.indexOf(batch) + 1,
        totalBatches: batches.length,
        completedTargets: completedTasks.length,
        totalTargets: tasks.length
      });

      // Execute batch
      const results = await Promise.all(
        batch.map(task => this.executeTask(task))
      );

      // Check batch results
      const failures = results.filter(r => !r.success);

      if (failures.length > 0) {
        failedBatches++;

        if (failedBatches > config.maxFailedBatches) {
          if (config.failureBehavior === "rollback") {
            await this.rollbackCompleted(completedTasks);
          }
          throw new DeploymentFailedError(failures);
        }

        if (config.failureBehavior === "pause") {
          job.status = "failed";
          throw new DeploymentFailedError(failures);
        }
        // "skip" - continue to next batch
      }

      // Health check batch targets
      await this.verifyBatch(batch, config.healthCheck);

      // Wait for stabilization
      if (config.stabilizationTime > 0) {
        await sleep(config.stabilizationTime * 1000);
      }

      completedTasks.push(...batch);

      // Wait before next batch
      if (batches.indexOf(batch) < batches.length - 1) {
        await sleep(config.batchDelay * 1000);
      }
    }
  }

  private createBatches(
    tasks: DeploymentTask[],
    config: RollingConfig
  ): DeploymentTask[][] {
    const batchSize = config.batchPercent
      ? Math.ceil(tasks.length * config.batchPercent / 100)
      : config.batchSize;

    const batches: DeploymentTask[][] = [];
    for (let i = 0; i < tasks.length; i += batchSize) {
      batches.push(tasks.slice(i, i + batchSize));
    }

    return batches;
  }
}
```
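
`orderTasks` is referenced above but not defined in this spec. A hedged sketch of what each `targetOrder` mode could mean (the `priority` field, the seed, and the seeded PRNG are our assumptions; a deterministic shuffle keeps replays reproducible):

```typescript
interface OrderableTask {
  id: string;
  priority?: number;
}

// "default" keeps stored order, "priority" sorts descending by an
// optional priority field, and "shuffle" uses a seeded Park-Miller PRNG
// so a given job always replays with the same order.
function orderTasks<T extends OrderableTask>(
  tasks: T[],
  order: "default" | "shuffle" | "priority",
  seed = 1
): T[] {
  const copy = [...tasks];
  if (order === "priority") {
    copy.sort((a, b) => (b.priority ?? 0) - (a.priority ?? 0));
  } else if (order === "shuffle") {
    let s = seed % 2147483647;
    if (s <= 0) s += 2147483646;
    const rand = () => (s = (s * 48271) % 2147483647) / 2147483647;
    // Fisher-Yates shuffle driven by the seeded generator
    for (let i = copy.length - 1; i > 0; i--) {
      const j = Math.floor(rand() * (i + 1));
      [copy[i], copy[j]] = [copy[j], copy[i]];
    }
  }
  return copy;
}
```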

### Use Cases

- Production deployments
- High-availability requirements
- Large target counts
- Services requiring gradual rollout

## Canary Strategy

### Description

Deploys to a small subset of targets first, validates, then expands to remaining targets.

```
CANARY DEPLOYMENT

Phase 1: Canary (10%)     Phase 2: Expand (50%)     Phase 3: Full (100%)

┌─────────────┐           ┌─────────────┐           ┌─────────────┐
│ T1 [v2] ✓   │◄─canary   │ T1 [v2] ✓   │           │ T1 [v2] ✓   │
├─────────────┤           ├─────────────┤           ├─────────────┤
│ T2 [v1]     │           │ T2 [v2] ✓   │           │ T2 [v2] ✓   │
├─────────────┤           ├─────────────┤           ├─────────────┤
│ T3 [v1]     │           │ T3 [v2] ✓   │           │ T3 [v2] ✓   │
├─────────────┤           ├─────────────┤           ├─────────────┤
│ T4 [v1]     │           │ T4 [v1]     │           │ T4 [v2] ✓   │
├─────────────┤           ├─────────────┤           ├─────────────┤
│ T5 [v1]     │           │ T5 [v1]     │           │ T5 [v2] ✓   │
└─────────────┘           └─────────────┘           └─────────────┘
       │                         │                         │
       ▼                         ▼                         ▼
 Health Check              Health Check              Health Check
 Error Rate Check          Error Rate Check          Error Rate Check
```

### Configuration

```typescript
interface CanaryConfig {
  strategy: "canary";

  // Canary stages
  stages: CanaryStage[];

  // Canary selection
  canarySelector: "random" | "labeled" | "first";
  canaryLabel?: string;      // Label for canary targets

  // Automatic vs manual progression
  autoProgress: boolean;

  // Health and metrics checks
  healthCheck: HealthCheckConfig;
  metricsCheck?: MetricsCheckConfig;
}

interface CanaryStage {
  name: string;
  percentage: number;    // Cumulative target percentage
  duration: number;      // Minimum time at this stage (seconds)
  autoProgress: boolean; // Auto-advance after duration
}

interface MetricsCheckConfig {
  integrationId: UUID;       // Metrics integration
  queries: MetricQuery[];
  failureThreshold: number;  // Percentage deviation to fail
}

interface MetricQuery {
  name: string;
  query: string;             // PromQL or similar
  operator: "lt" | "gt" | "eq";
  threshold: number;
}

// Example
const config: CanaryConfig = {
  strategy: "canary",
  stages: [
    { name: "canary", percentage: 10, duration: 300, autoProgress: false },
    { name: "expand", percentage: 50, duration: 300, autoProgress: true },
    { name: "full", percentage: 100, duration: 0, autoProgress: true }
  ],
  canarySelector: "labeled",
  canaryLabel: "canary=true",
  autoProgress: false,
  healthCheck: {
    type: "http",
    path: "/health",
    timeout: 30,
    retries: 5,
    interval: 10
  },
  metricsCheck: {
    integrationId: "prometheus-uuid",
    queries: [
      {
        name: "error_rate",
        query: "rate(http_requests_total{status=~\"5..\"}[5m]) / rate(http_requests_total[5m])",
        operator: "lt",
        threshold: 0.01 // Less than 1% error rate
      }
    ],
    failureThreshold: 10
  }
};
```
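
Stages are cumulative, so each stage only deploys the targets not already covered by earlier stages. A sketch of the arithmetic (`stagePlan` is our name; the cumulative count uses the same `Math.ceil` rounding as the executor):

```typescript
// For each stage percentage, compute how many targets it covers in total
// and how many are newly deployed when the stage starts.
function stagePlan(totalTargets: number, percentages: number[]) {
  let deployed = 0;
  return percentages.map(pct => {
    const cumulative = Math.ceil((totalTargets * pct) / 100);
    const added = cumulative - deployed;
    deployed = cumulative;
    return { pct, cumulative, added };
  });
}
```

For the 5-target example with stages at 10/50/100 percent, the stages cover 1, 3, and 5 targets, newly deploying 1, 2, and 2 of them.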

### Execution

```typescript
class CanaryExecutor {
  async execute(job: DeploymentJob, config: CanaryConfig): Promise<void> {
    const tasks = this.orderTasks(job.tasks, config);

    for (const stage of config.stages) {
      const targetCount = Math.ceil(tasks.length * stage.percentage / 100);
      const stageTasks = tasks.slice(0, targetCount);
      const newTasks = stageTasks.filter(t => t.status === "pending");

      this.emitProgress(job, {
        phase: "canary",
        stage: stage.name,
        percentage: stage.percentage,
        targets: stageTasks.length
      });

      // Deploy to new targets in this stage
      await Promise.all(newTasks.map(task => this.executeTask(task)));

      // Health check stage targets
      await this.verifyTargets(stageTasks, config.healthCheck);

      // Metrics check if configured
      if (config.metricsCheck) {
        await this.checkMetrics(stageTasks, config.metricsCheck);
      }

      // Wait for stage duration
      if (stage.duration > 0) {
        await this.waitWithMonitoring(
          stageTasks,
          stage.duration,
          config.metricsCheck
        );
      }

      // Wait for manual approval if not auto-progress
      if (!stage.autoProgress && stage.percentage < 100) {
        await this.waitForApproval(job, stage.name);
      }
    }
  }

  private async checkMetrics(
    targets: DeploymentTask[],
    config: MetricsCheckConfig
  ): Promise<void> {
    const metricsClient = await this.getMetricsClient(config.integrationId);

    for (const query of config.queries) {
      const result = await metricsClient.query(query.query);

      const passed = this.evaluateMetric(result, query);

      if (!passed) {
        throw new CanaryMetricsFailedError(query.name, result, query.threshold);
      }
    }
  }
}
```
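
`evaluateMetric` is referenced above but not defined. Its core is a three-way comparison; a minimal sketch (the real signature is not specified in this document):

```typescript
// Compare the sampled metric value against the query's threshold using
// the configured operator.
function evaluateMetric(
  value: number,
  query: { operator: "lt" | "gt" | "eq"; threshold: number }
): boolean {
  switch (query.operator) {
    case "lt": return value < query.threshold;
    case "gt": return value > query.threshold;
    case "eq": return value === query.threshold;
  }
}
```

With the example error-rate query (`lt` with threshold 0.01), a sampled value of 0.005 passes and 0.02 aborts the canary.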

### Use Cases

- Risk-sensitive deployments
- Services with real user traffic
- Deployments with metrics-based validation
- Gradual feature rollouts

## Blue-Green Strategy

### Description

Deploys to a parallel "green" environment while "blue" continues serving traffic, then switches.

```
BLUE-GREEN DEPLOYMENT

  Phase 1: Deploy Green             Phase 2: Switch Traffic

┌─────────────────────────┐       ┌─────────────────────────┐
│      Load Balancer      │       │      Load Balancer      │
│           │             │       │           │             │
│           ▼             │       │           ▼             │
│  ┌─────────────┐        │       │  ┌─────────────┐        │
│  │ Blue  [v1]  │◄─active│       │  │ Blue  [v1]  │        │
│  │ T1, T2, T3  │        │       │  │ T1, T2, T3  │        │
│  └─────────────┘        │       │  └─────────────┘        │
│                         │       │                         │
│  ┌─────────────┐        │       │  ┌─────────────┐        │
│  │ Green [v2]  │◄─deploy│       │  │ Green [v2]  │◄─active│
│  │ T4, T5, T6  │        │       │  │ T4, T5, T6  │        │
│  └─────────────┘        │       │  └─────────────┘        │
└─────────────────────────┘       └─────────────────────────┘
```

### Configuration

```typescript
interface BlueGreenConfig {
  strategy: "blue-green";

  // Environment labels
  blueLabel: string;           // Label for blue targets
  greenLabel: string;          // Label for green targets

  // Traffic routing
  routerIntegration: UUID;     // Router/LB integration
  routingConfig: RoutingConfig;

  // Validation
  healthCheck: HealthCheckConfig;
  warmupTime: number;          // Seconds to warm up green
  validationTests?: string[];  // Test suites to run

  // Switchover
  switchoverMode: "instant" | "gradual";
  gradualSteps?: number[];     // Percentage steps for gradual

  // Rollback
  keepBlueActive: number;      // Seconds to keep blue ready
}

// Example
const config: BlueGreenConfig = {
  strategy: "blue-green",
  blueLabel: "deployment=blue",
  greenLabel: "deployment=green",
  routerIntegration: "nginx-lb-uuid",
  routingConfig: {
    upstreamName: "myapp",
    healthEndpoint: "/health"
  },
  healthCheck: {
    type: "http",
    path: "/health",
    timeout: 30,
    retries: 5,
    interval: 10
  },
  warmupTime: 60,
  validationTests: ["smoke-test-suite"],
  switchoverMode: "instant",
  keepBlueActive: 1800 // 30 minutes
};
```

### Execution

```typescript
class BlueGreenExecutor {
  async execute(job: DeploymentJob, config: BlueGreenConfig): Promise<void> {
    // Identify blue and green targets
    const { blue, green } = this.categorizeTargets(job.tasks, config);

    // Phase 1: Deploy to green
    this.emitProgress(job, { phase: "deploying-green" });

    await Promise.all(green.map(task => this.executeTask(task)));

    // Health check green targets
    await this.verifyTargets(green, config.healthCheck);

    // Warmup period
    if (config.warmupTime > 0) {
      this.emitProgress(job, { phase: "warming-up" });
      await sleep(config.warmupTime * 1000);
    }

    // Run validation tests
    if (config.validationTests?.length) {
      this.emitProgress(job, { phase: "validating" });
      await this.runValidationTests(green, config.validationTests);
    }

    // Phase 2: Switch traffic
    this.emitProgress(job, { phase: "switching-traffic" });

    if (config.switchoverMode === "instant") {
      await this.instantSwitchover(config, blue, green);
    } else {
      await this.gradualSwitchover(config, blue, green);
    }

    // Verify traffic routing
    await this.verifyRouting(green, config);

    // Schedule blue decommission
    if (config.keepBlueActive > 0) {
      this.scheduleBlueDecommission(blue, config.keepBlueActive);
    }
  }

  private async instantSwitchover(
    config: BlueGreenConfig,
    blue: DeploymentTask[],
    green: DeploymentTask[]
  ): Promise<void> {
    const router = await this.getRouter(config.routerIntegration);

    // Update upstream to green targets
    await router.updateUpstream(config.routingConfig.upstreamName, {
      servers: green.map(t => ({
        address: t.target.address,
        weight: 1
      }))
    });

    // Remove blue from rotation
    await router.removeServers(
      config.routingConfig.upstreamName,
      blue.map(t => t.target.address)
    );
  }

  private async gradualSwitchover(
    config: BlueGreenConfig,
    blue: DeploymentTask[],
    green: DeploymentTask[]
  ): Promise<void> {
    const router = await this.getRouter(config.routerIntegration);
    const steps = config.gradualSteps || [25, 50, 75, 100];

    for (const percentage of steps) {
      await router.setTrafficSplit(config.routingConfig.upstreamName, {
        blue: 100 - percentage,
        green: percentage
      });

      // Monitor for errors
      await this.monitorTraffic(30);
    }
  }
}
```
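
The gradual switchover walks a fixed weight schedule. A tiny sketch of the weights handed to the router at each step (helper name is ours; default steps as in `gradualSwitchover`):

```typescript
// Weight pairs sent to the router, one per gradual switchover step.
function trafficSchedule(steps: number[] = [25, 50, 75, 100]) {
  return steps.map(green => ({ blue: 100 - green, green }));
}
```

With the defaults the router sees 75/25, 50/50, 25/75, and finally 0/100 blue/green.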

### Use Cases

- Zero-downtime deployments
- Database migration deployments
- High-stakes production updates
- Instant rollback requirements

## Strategy Selection Guide

```
STRATEGY SELECTION

Zero downtime needed?
├── No  → All-at-once
└── Yes → Metrics-based validation needed?
    ├── Yes → Canary
    └── No  → Instant rollback required?
        ├── Yes → Blue-Green
        └── No  → Rolling
```

## References

- [Deployment Overview](overview.md)
- [Progressive Delivery](../modules/progressive-delivery.md)
- [Rollback Management](overview.md#rollback-management)
249
docs/modules/release-orchestrator/design/decisions.md
Normal file
@@ -0,0 +1,249 @@

# Key Architectural Decisions

This document records significant architectural decisions and their rationale.

## ADR-001: Digest-First Release Identity

**Status:** Accepted

**Context:**
Container images can be referenced by tags (e.g., `v1.2.3`) or digests (e.g., `sha256:abc123...`). Tags are mutable: the same tag can point to different images over time.

**Decision:**
All releases are identified by immutable OCI digests, never tags. Tags are accepted as input but immediately resolved to digests at release creation time.

**Consequences:**
- Releases are immutable and reproducible
- Digest mismatch at pull time indicates tampering (deployment fails)
- Rollback targets a specific digest, not "previous tag"
- Requires registry integration for tag resolution
- Users see both tag (friendly) and digest (authoritative) in UI
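
Since every internal reference must be a resolved digest, the invariant is easy to enforce with a syntactic guard at module boundaries. A hedged sketch (the regex and helper name are ours, not part of the spec):

```typescript
// An OCI sha256 digest reference: the literal algorithm prefix followed
// by exactly 64 lowercase hex characters.
const DIGEST_RE = /^sha256:[0-9a-f]{64}$/;

function isReleaseDigest(ref: string): boolean {
  return DIGEST_RE.test(ref);
}
```

A tag like `v1.2.3` fails the check and must go through registry resolution first.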

---

## ADR-002: Evidence for Every Decision

**Status:** Accepted

**Context:**
Compliance and audit requirements demand proof of what was deployed, when, by whom, and why.

**Decision:**
Every promotion and deployment produces a cryptographically signed evidence packet that is immutable and append-only.

**Consequences:**
- Evidence table has no UPDATE/DELETE permissions
- Evidence enables audit-grade compliance reporting
- Evidence enables deterministic replay (same inputs + policy = same decision)
- Evidence packets are exportable for external audit systems
- Storage requirements increase over time

---

## ADR-003: Plugin Architecture for Integrations

**Status:** Accepted

**Context:**
Organizations use diverse toolchains (registries, CI/CD, vaults, notification systems). Hard-coding integrations limits adoption.

**Decision:**
All integrations are implemented as plugins via a three-surface contract (Manifest, Connector Runtime, Step Provider). Core orchestration is stable and plugin-agnostic.

**Consequences:**
- Core has no hard-coded vendor integrations
- New integrations can be added without core changes
- Plugin failures cannot crash the core (sandbox isolation)
- Plugin interface must be versioned and stable
- Additional complexity in plugin lifecycle management

---

## ADR-004: No Feature Gating

**Status:** Accepted

**Context:**
Enterprise software often gates security features behind premium tiers, creating "pay for security" anti-patterns.

**Decision:**
All plans include all features. Pricing is based only on:
- Number of environments
- New digests analyzed per day
- Fair-use limits on deployments

**Consequences:**
- No feature flags tied to billing tier
- Transparent pricing without feature fragmentation
- May limit revenue optimization per customer
- Quota enforcement must be clear and user-friendly

---

## ADR-005: Offline-First Operation

**Status:** Accepted

**Context:**
Many organizations operate in air-gapped or restricted network environments. Dependency on external services limits adoption.

**Decision:**
All core operations must work in air-gapped environments. External data is synced via mirror bundles. Plugins may require connectivity; core does not.

**Consequences:**
- No runtime calls to external APIs for core decisions
- Advisory data synced via offline bundles
- Plugin connectivity requirements are declared in the manifest
- Evidence packets exportable for external submission
- Additional complexity in data synchronization

---

## ADR-006: Agent-Based and Agentless Deployment

**Status:** Accepted

**Context:**
Some organizations prefer agents for security isolation; others prefer agentless for simplicity.

**Decision:**
Support both agent-based (persistent daemon on targets) and agentless (SSH/WinRM on demand) deployment models.

**Consequences:**
- Agents provide better performance and reliability
- Agentless reduces infrastructure footprint
- A unified task model abstracts deployment details
- Security model must handle both patterns
- Larger testing matrix

---

## ADR-007: PostgreSQL as Primary Database

**Status:** Accepted

**Context:**
Database choice affects scalability, operations, and feature availability.

**Decision:**
PostgreSQL (16+) as the primary database with:
- Per-module schema isolation
- Row-level security for multi-tenancy
- JSONB for flexible configuration
- Append-only triggers for evidence tables

**Consequences:**
- Proven scalability and reliability
- Rich feature set (JSONB, RLS, triggers)
- Single database technology to operate
- Requires PostgreSQL expertise
- Schema migrations must be carefully managed

---

## ADR-008: Workflow Engine with DAG Execution

**Status:** Accepted

**Context:**
Deployment workflows need conditional logic, parallel execution, error handling, and rollback support.

**Decision:**
Implement a DAG-based workflow engine where:
- Workflows are templates with nodes (steps) and edges (dependencies)
- Steps execute when all dependencies are satisfied
- Expressions reference previous step outputs
- Built-in support for approval, retry, timeout, and rollback

**Consequences:**
- Flexible workflow composition
- Visual representation in UI
- Complex error handling scenarios supported
- Learning curve for workflow authors
- Expression engine security considerations

---

## ADR-009: Separation of Duties Enforcement

**Status:** Accepted

**Context:**
Compliance requires that the person requesting a change cannot be the same person approving it.

**Decision:**
Separation of Duties (SoD) is enforced at the approval gateway level, preventing self-approval when SoD is enabled for an environment.

**Consequences:**
- Prevents single-person deployment to sensitive environments
- Configurable per environment
- May slow down deployments
- Requires a minimum team size for SoD-enabled environments

---

## ADR-010: Version Stickers for Drift Detection

**Status:** Accepted

**Context:**
Knowing what's actually deployed on targets is essential for audit and troubleshooting.

**Decision:**
Every deployment writes a `stella.version.json` sticker file on the target containing release ID, digests, deployment timestamp, and deployer identity.

**Consequences:**
- Enables drift detection (expected vs actual)
- Provides audit trail on target hosts
- Enables accurate "what's deployed where" queries
- Requires file access on targets
- Sticker corruption/deletion must be handled
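
ADR-010 names the sticker file but fixes no schema. A plausible shape, and the drift comparison it enables, strictly illustrative (every field name here is our assumption):

```typescript
// stella.version.json - illustrative schema, not normative
interface VersionSticker {
  releaseId: string;
  digests: Record<string, string>; // component -> sha256 digest
  deployedAt: string;              // ISO-8601 timestamp
  deployedBy: string;              // deployer identity from Authority
}

const sticker: VersionSticker = {
  releaseId: "00000000-0000-0000-0000-000000000000",
  digests: { web: "sha256:" + "a".repeat(64) },
  deployedAt: "2026-01-09T12:00:00Z",
  deployedBy: "release-orchestrator@example"
};

// Drift detection: compare the digest the orchestrator expects on a
// target with what the sticker on that target reports.
function hasDrift(
  expected: string,
  s: VersionSticker,
  component: string
): boolean {
  return s.digests[component] !== expected;
}
```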

---

## ADR-011: Security Gate Integration

**Status:** Accepted

**Context:**
Security scanning exists as a separate concern; release orchestration should leverage but not duplicate it.

**Decision:**
Security scanning remains in existing modules (Scanner, VEX). Release orchestration consumes scan results through a security gate that evaluates vulnerability thresholds.

**Consequences:**
- Clear separation of concerns
- Existing scanning investment preserved
- Gate configuration determines block thresholds
- Requires API integration with scanning modules
- Policy engine evaluates security verdicts

---

## ADR-012: gRPC for Agent Communication

**Status:** Accepted

**Context:**
Agent communication requires efficient, bidirectional, and secure data transfer.

**Decision:**
Use gRPC for agent communication with:
- mTLS for transport security
- Bidirectional streaming for logs and progress
- Protocol buffers for efficient serialization

**Consequences:**
- Efficient binary protocol
- Strong typing via protobuf
- Built-in streaming support
- Requires gRPC infrastructure
- Firewall considerations for gRPC traffic

---

## References

- [Design Principles](principles.md)
- [Security Architecture](../security/overview.md)
- [Plugin System](../modules/plugin-system.md)
221
docs/modules/release-orchestrator/design/principles.md
Normal file
@@ -0,0 +1,221 @@
|
||||
# Design Principles & Invariants

> These principles are **inviolable** and MUST be reflected in all code, UI, documentation, and audit artifacts.

## Core Principles
### Principle 1: Release Identity via Digest

```
INVARIANT: A release is a set of OCI image digests (component → digest mapping), never tags.
```

- Tags are convenience inputs for resolution
- Tags are resolved to digests at release creation time
- All downstream operations (promotion, deployment, rollback) use digests
- Digest mismatch at pull time = deployment failure (tamper detection)

**Implementation Requirements:**
- Release creation API accepts tags but immediately resolves to digests
- All internal references use `sha256:`-prefixed digests
- Agent deployment verifies digest at pull time
- Rollback targets a specific digest, not "previous tag"
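As an illustration of the digest-only invariant, a release bundle reduces to a component → digest mapping. The shape and field names below are illustrative, not the actual Stella Ops release schema:

```json
{
  "releaseId": "rel-2026-01-10-001",
  "components": {
    "api": "sha256:4f53cda18c2baa0c0354bb5f9a3ecbe5ed12ab4d8e11ba873c2f11161202b945",
    "worker": "sha256:6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b"
  },
  "resolvedFrom": { "api": "v1.4.2", "worker": "v1.4.2" }
}
```

The original tags survive only as provenance metadata; promotion, deployment, and rollback operate on the digests alone.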
### Principle 2: Determinism and Evidence

```
INVARIANT: Every deployment/promotion produces an immutable evidence record.
```

Evidence record contains:
- **Who**: User identity (from Authority)
- **What**: Release bundle (digests), target environment, target hosts
- **Why**: Policy evaluation result, approval records, decision reasons
- **How**: Generated artifacts (compose files, scripts), execution logs
- **When**: Timestamps for request, decision, execution, completion

Evidence enables:
- Audit-grade compliance reporting
- Deterministic replay (same inputs + policy → same decision)
- "Why blocked?" explainability

**Implementation Requirements:**
- Evidence is generated synchronously with the decision
- Evidence is signed before storage
- Evidence table is append-only (no UPDATE/DELETE)
- Evidence includes a hash of all inputs for replay verification
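Putting the who/what/why/how/when fields together, an evidence record could look roughly like this. The structure is a hypothetical sketch for illustration, not the actual evidence packet schema:

```json
{
  "who":  { "userId": "u-123", "issuer": "authority" },
  "what": { "releaseId": "rel-001", "environment": "staging", "hosts": ["app-01"] },
  "why":  { "policyResult": "pass", "approvals": ["appr-9"], "reasons": [] },
  "how":  { "artifacts": ["compose.stella.lock.yml"], "executionLogRef": "log-ref" },
  "when": { "requested": "2026-01-10T12:00:00Z", "completed": "2026-01-10T12:05:00Z" },
  "inputsHash": "sha256:<hash-of-canonical-inputs>"
}
```

The `inputsHash` is what makes deterministic replay verifiable: re-evaluating the same inputs under the same policy must reproduce the same decision.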
### Principle 3: Pluggable Everything, Stable Core

```
INVARIANT: Integrations are plugins; the core orchestration engine is stable.
```

**Plugins contribute:**
- Configuration screens (UI)
- Connector logic (runtime)
- Step node types (workflow)
- Doctor checks (diagnostics)
- Agent types (deployment)

**Core engine provides:**
- Workflow execution (DAG processing)
- State machine management
- Evidence generation
- Policy evaluation
- Credential brokering

**Implementation Requirements:**
- Core has no hard-coded integrations
- Plugin interface is versioned and stable
- Plugin failures cannot crash the core
- Core provides fallback behavior when plugins are unavailable
### Principle 4: No Feature Gating

```
INVARIANT: All plans include all features. Limits are only:
- Number of environments
- Number of new digests analyzed per day
- Fair use on deployments
```

This prevents:
- "Pay for security" anti-pattern
- Per-project/per-seat billing landmines
- Feature fragmentation across tiers

**Implementation Requirements:**
- No feature flags tied to billing tier
- Quota enforcement is transparent (clear error messages)
- Usage metrics exposed for customer visibility
- Overage handling is graceful (soft limits with warnings)
### Principle 5: Offline-First Operation

```
INVARIANT: All core operations MUST work in air-gapped environments.
```

Implications:
- No runtime calls to external APIs for core decisions
- Vulnerability data synced via mirror bundles
- Plugins may require connectivity; core does not
- Evidence packets exportable for external audit

**Implementation Requirements:**
- Core decision logic has no external HTTP calls
- All external data is pre-synced and cached
- Plugin connectivity requirements are declared in the manifest
- Offline mode is explicit configuration, not a degraded fallback
### Principle 6: Immutable Generated Artifacts

```
INVARIANT: Every deployment generates and stores immutable artifacts.
```

Generated artifacts:
- `compose.stella.lock.yml`: Pinned digests, resolved env refs
- `deploy.stella.script.dll`: Compiled C# script (or hash reference)
- `release.evidence.json`: Decision record
- `stella.version.json`: Version sticker placed on target

Version sticker enables:
- Drift detection (expected vs actual)
- Audit trail on target host
- Rollback reference

**Implementation Requirements:**
- Artifacts are content-addressed (hash in filename or metadata)
- Artifacts are stored before deployment execution
- Artifact storage is immutable (no overwrites)
- Version sticker is an atomic write on target
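A version sticker of this kind could look roughly as follows; the field names are a hypothetical sketch, not the actual `stella.version.json` format:

```json
{
  "releaseId": "rel-001",
  "environment": "production",
  "components": {
    "api": "sha256:4f53cda18c2baa0c0354bb5f9a3ecbe5ed12ab4d8e11ba873c2f11161202b945"
  },
  "deployedAt": "2026-01-10T12:05:00Z",
  "evidenceRef": "release.evidence.json"
}
```

Drift detection then becomes a comparison of this expected state against what is actually running on the host.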
---

## Architectural Invariants (Enforced by Design)

These invariants are enforced through database constraints, code architecture, and operational controls.

| Invariant | Enforcement Mechanism |
|-----------|----------------------|
| Digests are immutable | Database constraint: digest column is unique, no updates |
| Evidence packets are append-only | Evidence table has no UPDATE/DELETE permissions |
| Secrets never in database | Vault integration; only references stored |
| Plugins cannot bypass policy | Policy evaluation in core, not plugin |
| Multi-tenant isolation | `tenant_id` FK on all tables; row-level security |
| Workflow state is auditable | State transitions logged; no direct state manipulation |
| Approvals are tamper-evident | Approval records are signed and append-only |
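The multi-tenant isolation row in the table above could be enforced in PostgreSQL along these lines. This is a sketch under assumptions — the table, policy, and setting names are illustrative:

```sql
-- Sketch: row-level tenant isolation (names illustrative)
ALTER TABLE release.promotions ENABLE ROW LEVEL SECURITY;

-- Restrict every query to rows whose tenant_id matches the
-- tenant bound to the current session.
CREATE POLICY tenant_isolation ON release.promotions
  USING (tenant_id = current_setting('app.tenant_id')::uuid);
```

The application role then sets `app.tenant_id` per connection, so isolation holds even if application-level filters are missed.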
### Database Enforcement

```sql
-- Example: Evidence table with no UPDATE/DELETE
CREATE TABLE release.evidence_packets (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    promotion_id UUID NOT NULL REFERENCES release.promotions(id),
    content_hash TEXT NOT NULL,
    content JSONB NOT NULL,
    signature TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
    -- No updated_at column; immutable by design
);

-- Revoke UPDATE/DELETE from application role
REVOKE UPDATE, DELETE ON release.evidence_packets FROM app_role;
```
### Code Architecture Enforcement

```csharp
// Policy evaluation is ALWAYS in core, never delegated to plugins
public sealed class PromotionDecisionEngine
{
    // Plugins provide gate implementations, but core orchestrates evaluation
    public async Task<DecisionResult> EvaluateAsync(
        Promotion promotion,
        IReadOnlyList<IGateProvider> gates,
        CancellationToken ct)
    {
        // Core controls evaluation order and aggregation
        var results = new List<GateResult>();
        foreach (var gate in gates)
        {
            // Plugin provides evaluation logic
            var result = await gate.EvaluateAsync(promotion, ct);
            results.Add(result);

            // Core decides how to aggregate (plugins cannot override)
            if (result.IsBlocking && _policy.FailFast)
                break;
        }

        // Core makes final decision
        return _decisionAggregator.Aggregate(results);
    }
}
```

---
## Document Conventions

Throughout the Release Orchestrator documentation:

- **MUST**: Mandatory requirement; non-compliance is a bug
- **SHOULD**: Recommended but not mandatory; deviation requires justification
- **MAY**: Optional; implementation decision
- **Entity names**: `PascalCase` (e.g., `ReleaseBundle`)
- **Table names**: `snake_case` (e.g., `release_bundles`)
- **API paths**: `/api/v1/resource-name`
- **Module names**: `kebab-case` (e.g., `release-manager`)

---
## References

- [Key Architectural Decisions](decisions.md)
- [Module Architecture](../modules/overview.md)
- [Security Architecture](../security/overview.md)
602
docs/modules/release-orchestrator/implementation-guide.md
Normal file
@@ -0,0 +1,602 @@
# Implementation Guide

> .NET 10 implementation patterns and best practices for Release Orchestrator modules.

**Target Audience**: Development team implementing Release Orchestrator modules
**Prerequisites**: Familiarity with [CLAUDE.md](../../../CLAUDE.md) coding rules

---

## Overview

This guide supplements the architecture documentation with .NET 10-specific implementation patterns required for all Release Orchestrator modules. These patterns ensure:

- Deterministic behavior for evidence reproducibility
- Testability through dependency injection
- Compliance with Stella Ops coding standards
- Performance and reliability

---
## Code Quality Requirements

### Compiler Configuration

All Release Orchestrator projects **MUST** enforce warnings as errors:

```xml
<!-- In Directory.Build.props or .csproj -->
<PropertyGroup>
  <TreatWarningsAsErrors>true</TreatWarningsAsErrors>
  <Nullable>enable</Nullable>
  <ImplicitUsings>disable</ImplicitUsings>
</PropertyGroup>
```

**Rationale**: Warnings indicate potential bugs, regressions, or code quality drift. Treating them as errors prevents them from being ignored.

---
## Determinism & Time Handling

### TimeProvider Injection

**Never** use `DateTime.UtcNow`, `DateTimeOffset.UtcNow`, or `DateTimeOffset.Now` directly. Always inject `TimeProvider`.

```csharp
// ❌ BAD - non-deterministic, hard to test
public class PromotionManager
{
    public Promotion CreatePromotion(Guid releaseId, Guid targetEnvId)
    {
        return new Promotion
        {
            Id = Guid.NewGuid(),
            ReleaseId = releaseId,
            TargetEnvironmentId = targetEnvId,
            RequestedAt = DateTimeOffset.UtcNow // ❌ Hard-coded time
        };
    }
}

// ✅ GOOD - injectable, testable, deterministic
public class PromotionManager
{
    private readonly TimeProvider _timeProvider;
    private readonly IGuidGenerator _guidGenerator;

    public PromotionManager(TimeProvider timeProvider, IGuidGenerator guidGenerator)
    {
        _timeProvider = timeProvider;
        _guidGenerator = guidGenerator;
    }

    public Promotion CreatePromotion(Guid releaseId, Guid targetEnvId)
    {
        return new Promotion
        {
            Id = _guidGenerator.NewGuid(),
            ReleaseId = releaseId,
            TargetEnvironmentId = targetEnvId,
            RequestedAt = _timeProvider.GetUtcNow() // ✅ Injected, testable
        };
    }
}
```

**Registration**:
```csharp
// Production: use system time
services.AddSingleton(TimeProvider.System);

// Testing: use FakeTimeProvider (Microsoft.Extensions.Time.Testing)
// for deterministic tests
var fakeTime = new FakeTimeProvider();
fakeTime.SetUtcNow(new DateTimeOffset(2026, 1, 10, 12, 0, 0, TimeSpan.Zero));
services.AddSingleton<TimeProvider>(fakeTime);
```

---
### GUID Generation

**Never** use `Guid.NewGuid()` directly. Always inject `IGuidGenerator`.

```csharp
// ❌ BAD
var releaseId = Guid.NewGuid();

// ✅ GOOD
var releaseId = _guidGenerator.NewGuid();
```

**Interface**:
```csharp
public interface IGuidGenerator
{
    Guid NewGuid();
}

// Production implementation
public sealed class SystemGuidGenerator : IGuidGenerator
{
    public Guid NewGuid() => Guid.NewGuid();
}

// Deterministic test implementation
public sealed class SequentialGuidGenerator : IGuidGenerator
{
    private int _counter;

    public Guid NewGuid()
    {
        var bytes = new byte[16];
        BitConverter.GetBytes(_counter++).CopyTo(bytes, 0);
        return new Guid(bytes);
    }
}
```

---
## Async & Cancellation

### CancellationToken Propagation

**Always** propagate `CancellationToken` through async call chains. Never use `CancellationToken.None` except at entry points where no token is available.

```csharp
// ❌ BAD - ignores cancellation
public async Task<Promotion> ApprovePromotionAsync(Guid promotionId, Guid userId, CancellationToken ct)
{
    var promotion = await _repository.GetByIdAsync(promotionId, CancellationToken.None); // ❌ Wrong

    promotion.Approvals.Add(new Approval
    {
        ApproverId = userId,
        ApprovedAt = _timeProvider.GetUtcNow()
    });

    await _repository.SaveAsync(promotion, CancellationToken.None); // ❌ Wrong
    await Task.Delay(1000); // ❌ Missing ct

    return promotion;
}

// ✅ GOOD - propagates cancellation
public async Task<Promotion> ApprovePromotionAsync(Guid promotionId, Guid userId, CancellationToken ct)
{
    var promotion = await _repository.GetByIdAsync(promotionId, ct); // ✅ Propagated

    promotion.Approvals.Add(new Approval
    {
        ApproverId = userId,
        ApprovedAt = _timeProvider.GetUtcNow()
    });

    await _repository.SaveAsync(promotion, ct); // ✅ Propagated
    await Task.Delay(1000, ct); // ✅ Cancellable

    return promotion;
}
```

---
## HTTP Client Usage

### IHttpClientFactory for Connector Runtime

**Never** instantiate `HttpClient` directly. Always use `IHttpClientFactory` with configured timeouts and resilience policies.

```csharp
// ❌ BAD - direct instantiation risks socket exhaustion
public class GitHubConnector
{
    public async Task<string> GetCommitAsync(string sha)
    {
        using var client = new HttpClient(); // ❌ Socket exhaustion risk
        var response = await client.GetAsync($"https://api.github.com/commits/{sha}");
        return await response.Content.ReadAsStringAsync();
    }
}

// ✅ GOOD - factory with resilience
public class GitHubConnector
{
    private readonly IHttpClientFactory _httpClientFactory;

    public GitHubConnector(IHttpClientFactory httpClientFactory)
    {
        _httpClientFactory = httpClientFactory;
    }

    public async Task<string> GetCommitAsync(string sha, CancellationToken ct)
    {
        var client = _httpClientFactory.CreateClient("GitHub");
        var response = await client.GetAsync($"/commits/{sha}", ct);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync(ct);
    }
}
```

**Registration with resilience**:
```csharp
services.AddHttpClient("GitHub", client =>
{
    client.BaseAddress = new Uri("https://api.github.com");
    client.Timeout = TimeSpan.FromSeconds(30);
    client.DefaultRequestHeaders.Add("User-Agent", "StellaOps/1.0");
})
.AddStandardResilienceHandler(options =>
{
    options.Retry.MaxRetryAttempts = 3;
    options.CircuitBreaker.SamplingDuration = TimeSpan.FromSeconds(30);
    options.TotalRequestTimeout.Timeout = TimeSpan.FromMinutes(1);
});
```

---
## Culture & Formatting

### Invariant Culture for Parsing

**Always** use `CultureInfo.InvariantCulture` for parsing and formatting dates, numbers, and any string that will be persisted, hashed, or compared.

```csharp
// ❌ BAD - culture-sensitive
var percentage = double.Parse(input);
var formatted = value.ToString("P2");
var dateStr = date.ToString("yyyy-MM-dd");

// ✅ GOOD - invariant culture
var percentage = double.Parse(input, CultureInfo.InvariantCulture);
var formatted = value.ToString("P2", CultureInfo.InvariantCulture);
var dateStr = date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
```

---
## JSON Handling

### RFC 8785 Canonical JSON for Evidence

For evidence packets and decision records that will be hashed or signed, use **RFC 8785-compliant** canonical JSON serialization.

```csharp
// ❌ BAD - non-canonical JSON
var json = JsonSerializer.Serialize(decisionRecord, new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping,
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase
});
var hash = ComputeHash(json); // ❌ Non-deterministic

// ✅ GOOD - use shared canonicalizer
var canonicalJson = CanonicalJsonSerializer.Serialize(decisionRecord);
var hash = ComputeHash(canonicalJson); // ✅ Deterministic
```

**Canonical JSON Requirements**:
- Keys sorted alphabetically
- Minimal escaping per the RFC 8785 spec
- No exponent notation for numbers
- No trailing/leading zeros
- No whitespace
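For intuition, here is the same object in an arbitrary serialization and in canonical form (keys sorted, no whitespace) — a trivial illustration, not output of any specific serializer:

```json
{ "digest": "sha256:abc", "component": "api" }
```

canonicalizes to:

```json
{"component":"api","digest":"sha256:abc"}
```

Only the canonical form is safe to hash or sign, because every compliant serializer produces the identical byte sequence for it.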
---

## Database Interaction

### DateTimeOffset for PostgreSQL timestamptz

PostgreSQL `timestamptz` columns **MUST** be read and written as `DateTimeOffset`, not `DateTime`.

```csharp
// ❌ BAD - loses offset information
await using var reader = await command.ExecuteReaderAsync(ct);
while (await reader.ReadAsync(ct))
{
    var createdAt = reader.GetDateTime(reader.GetOrdinal("created_at")); // ❌ Loses offset
}

// ✅ GOOD - preserves offset
await using var reader = await command.ExecuteReaderAsync(ct);
while (await reader.ReadAsync(ct))
{
    var createdAt = reader.GetFieldValue<DateTimeOffset>(reader.GetOrdinal("created_at")); // ✅ Correct
}
```

**Insertion**:
```csharp
// ✅ Always use UTC DateTimeOffset
var createdAt = _timeProvider.GetUtcNow(); // Returns DateTimeOffset
await command.ExecuteNonQueryAsync(ct);
```

---
## Hybrid Logical Clock (HLC) for Distributed Ordering

For distributed ordering and audit-safe sequencing, use `IHybridLogicalClock` from `StellaOps.HybridLogicalClock`.

**When to use HLC**:
- Promotion state transitions
- Workflow step execution ordering
- Deployment task sequencing
- Timeline event ordering

```csharp
public class PromotionStateTransition
{
    private readonly IHybridLogicalClock _hlc;
    private readonly TimeProvider _timeProvider;

    public async Task TransitionStateAsync(
        Promotion promotion,
        PromotionState newState,
        CancellationToken ct)
    {
        var transition = new StateTransition
        {
            PromotionId = promotion.Id,
            FromState = promotion.Status,
            ToState = newState,
            THlc = _hlc.Tick(), // ✅ Monotonic, skew-tolerant ordering
            TsWall = _timeProvider.GetUtcNow(), // ✅ Informational timestamp
            TransitionedBy = _currentUser.Id
        };

        await _repository.RecordTransitionAsync(transition, ct);
    }
}
```

**HLC State Persistence**:
```csharp
// Service startup
public async Task StartAsync(CancellationToken ct)
{
    await _hlc.InitializeFromStateAsync(ct); // Restore monotonicity
}

// Service shutdown
public async Task StopAsync(CancellationToken ct)
{
    await _hlc.PersistStateAsync(ct); // Persist HLC state
}
```

---
## Configuration & Options

### Options Validation at Startup

Use `ValidateDataAnnotations()` and `ValidateOnStart()` for all options classes.

```csharp
// Options class
public sealed class PromotionManagerOptions
{
    [Required]
    [Range(1, 10)]
    public int MaxConcurrentPromotions { get; set; } = 3;

    [Required]
    [Range(1, 3600)]
    public int ApprovalExpirationSeconds { get; set; } = 1440;
}

// Registration with validation
services.AddOptions<PromotionManagerOptions>()
    .Bind(configuration.GetSection("PromotionManager"))
    .ValidateDataAnnotations()
    .ValidateOnStart();

// Complex validation
public class PromotionManagerOptionsValidator : IValidateOptions<PromotionManagerOptions>
{
    public ValidateOptionsResult Validate(string? name, PromotionManagerOptions options)
    {
        if (options.MaxConcurrentPromotions <= 0)
            return ValidateOptionsResult.Fail("MaxConcurrentPromotions must be positive");

        return ValidateOptionsResult.Success;
    }
}

services.AddSingleton<IValidateOptions<PromotionManagerOptions>, PromotionManagerOptionsValidator>();
```

---
## Immutability & Collections

### Return Immutable Collections from Public APIs

Public APIs **MUST** return `IReadOnlyList<T>`, `ImmutableArray<T>`, or defensive copies. Never expose mutable backing stores.

```csharp
// ❌ BAD - exposes mutable backing store
public class ReleaseManager
{
    private readonly List<Component> _components = new();

    public List<Component> Components => _components; // ❌ Callers can mutate!
}

// ✅ GOOD - immutable return
public class ReleaseManager
{
    private readonly List<Component> _components = new();

    public IReadOnlyList<Component> Components => _components.AsReadOnly(); // ✅ Read-only

    // Or using ImmutableArray
    public ImmutableArray<Component> GetComponents() => _components.ToImmutableArray();
}
```

---
## Error Handling

### No Silent Stubs

Placeholder code **MUST** throw `NotImplementedException` or return an explicit error. Never return success from unimplemented paths.

```csharp
// ❌ BAD - silent stub masks missing implementation
public async Task<Result> DeployToNomadAsync(Deployment deployment, CancellationToken ct)
{
    // TODO: implement Nomad deployment
    return Result.Success(); // ❌ Ships broken feature!
}

// ✅ GOOD - explicit failure
public async Task<Result> DeployToNomadAsync(Deployment deployment, CancellationToken ct)
{
    throw new NotImplementedException(
        "Nomad deployment not yet implemented. See SPRINT_20260115_003_AGENTS_nomad_support.md");
}

// ✅ Alternative: return unsupported result
public async Task<Result> DeployToNomadAsync(Deployment deployment, CancellationToken ct)
{
    return Result.Failure("Nomad deployment target not yet supported. Use Docker or Compose.");
}
```

---
## Caching

### Bounded Caches with Eviction

**Do not** use `ConcurrentDictionary` or `Dictionary` for caching without eviction policies. Use bounded caches with TTL/LRU eviction.

```csharp
// ❌ BAD - unbounded growth
public class VersionMapCache
{
    private readonly ConcurrentDictionary<string, DigestMapping> _cache = new();

    public void Add(string tag, DigestMapping mapping)
    {
        _cache[tag] = mapping; // ❌ Never evicts, memory grows forever
    }
}

// ✅ GOOD - bounded with eviction
public class VersionMapCache
{
    private readonly MemoryCache _cache;

    public VersionMapCache()
    {
        _cache = new MemoryCache(new MemoryCacheOptions
        {
            SizeLimit = 10_000 // Max 10k entries
        });
    }

    public void Add(string tag, DigestMapping mapping)
    {
        _cache.Set(tag, mapping, new MemoryCacheEntryOptions
        {
            Size = 1,
            SlidingExpiration = TimeSpan.FromHours(1) // ✅ 1 hour TTL
        });
    }

    public DigestMapping? Get(string tag) => _cache.Get<DigestMapping>(tag);
}
```

**Cache TTL Recommendations**:
- **Integration health checks**: 5 minutes
- **Version maps (tag → digest)**: 1 hour
- **Environment configs**: 30 minutes
- **Agent capabilities**: 10 minutes

---
## Testing

### Test Helpers Must Call Production Code

Test helpers **MUST** call production code, not reimplement algorithms. Only mock I/O and network boundaries.

```csharp
// ❌ BAD - test reimplements production logic
public static string ComputeEvidenceHash(DecisionRecord record)
{
    // Custom hash implementation in test
    var json = JsonSerializer.Serialize(record); // ❌ Different from production!
    return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(json)));
}

// ✅ GOOD - test uses production code
public static string ComputeEvidenceHash(DecisionRecord record)
{
    // Calls production EvidenceHasher
    return EvidenceHasher.ComputeHash(record); // ✅ Same as production
}
```

---
## Path Resolution

### Explicit CLI Options for Paths

**Do not** derive paths from `AppContext.BaseDirectory` with parent directory walks. Use explicit CLI options or environment variables.

```csharp
// ❌ BAD - fragile parent walks
var repoRoot = Path.GetFullPath(Path.Combine(
    AppContext.BaseDirectory, "..", "..", "..", ".."));

// ✅ GOOD - explicit option with fallback
[Option("--repo-root", Description = "Repository root path")]
public string? RepoRoot { get; set; }

public string GetRepoRoot() =>
    RepoRoot
    ?? Environment.GetEnvironmentVariable("STELLAOPS_REPO_ROOT")
    ?? throw new InvalidOperationException(
        "Repository root not specified. Use --repo-root or set STELLAOPS_REPO_ROOT.");
```

---
## Summary Checklist

Before submitting a pull request, verify:

- [ ] `TreatWarningsAsErrors` enabled in project file
- [ ] All timestamps use `TimeProvider`, never `DateTime.UtcNow`
- [ ] All GUIDs use `IGuidGenerator`, never `Guid.NewGuid()`
- [ ] `CancellationToken` propagated through all async methods
- [ ] HTTP clients use `IHttpClientFactory`, never `new HttpClient()`
- [ ] Culture-invariant parsing for all formatted strings
- [ ] Canonical JSON for evidence/decision records
- [ ] `DateTimeOffset` for all PostgreSQL `timestamptz` columns
- [ ] HLC used for distributed ordering where applicable
- [ ] Options classes validated at startup with `ValidateOnStart()`
- [ ] Public APIs return immutable collections
- [ ] No silent stubs; unimplemented code throws `NotImplementedException`
- [ ] Caches have bounded size and TTL eviction
- [ ] Tests exercise production code, not reimplementations

---
## References

- [CLAUDE.md](../../../CLAUDE.md) — Stella Ops coding rules
- [Test Structure](./test-structure.md) — Test organization guidelines
- [Database Schema](./data-model/schema.md) — Schema patterns
- [HLC Documentation](../../eventing/event-envelope-schema.md) — Event ordering with HLC
643
docs/modules/release-orchestrator/integrations/ci-cd.md
Normal file
@@ -0,0 +1,643 @@
# CI/CD Integration

## Overview

Release Orchestrator integrates with CI/CD systems to:
- Receive build completion notifications
- Trigger additional pipelines during deployment
- Create releases from CI artifacts
- Report deployment status back to CI systems

## Integration Patterns
### Pattern 1: CI Triggers Release

```
CI TRIGGERS RELEASE

┌────────────┐      ┌────────────┐      ┌────────────────────┐
│   CI/CD    │      │ Container  │      │      Release       │
│   System   │      │  Registry  │      │    Orchestrator    │
└─────┬──────┘      └─────┬──────┘      └─────────┬──────────┘
      │                   │                       │
      │ Build & Push      │                       │
      │──────────────────►│                       │
      │                   │                       │
      │                   │ Webhook: image pushed │
      │                   │──────────────────────►│
      │                   │                       │
      │                   │                       │ Create/Update
      │                   │                       │ Version Map
      │                   │                       │
      │                   │                       │ Auto-create
      │                   │                       │ Release (if configured)
      │                   │                       │
      │ API: Create Release (optional)            │
      │──────────────────────────────────────────►│
      │                   │                       │
      │                   │                       │ Start Promotion
      │                   │                       │ Workflow
      │                   │                       │
```
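The "image pushed" webhook in the flow above might carry a payload roughly like the following. The shape is a hypothetical sketch — actual payloads differ between registries (Harbor, GitLab, Docker Hub, and others each use their own format):

```json
{
  "event": "image.pushed",
  "repository": "registry.example.com/myapp",
  "tag": "v1.4.2",
  "digest": "sha256:4f53cda18c2baa0c0354bb5f9a3ecbe5ed12ab4d8e11ba873c2f11161202b945",
  "pushedAt": "2026-01-10T12:00:00Z"
}
```

The orchestrator would use the `digest` field (not the tag) to update the version map, consistent with the digest-only release identity invariant.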
### Pattern 2: Orchestrator Triggers CI

```
ORCHESTRATOR TRIGGERS CI

┌────────────────────┐      ┌────────────┐      ┌────────────┐
│      Release       │      │   CI/CD    │      │   Target   │
│    Orchestrator    │      │   System   │      │  Systems   │
└─────────┬──────────┘      └─────┬──────┘      └─────┬──────┘
          │                       │                   │
          │ Pre-deploy: Trigger   │                   │
          │ Integration Tests     │                   │
          │──────────────────────►│                   │
          │                       │                   │
          │                       │ Run Tests         │
          │                       │──────────────────►│
          │                       │                   │
          │ Wait for completion   │                   │
          │◄──────────────────────│                   │
          │                       │                   │
          │ If passed: Deploy     │                   │
          │──────────────────────────────────────────►│
          │                       │                   │
```
### Pattern 3: Bidirectional Integration

```
BIDIRECTIONAL INTEGRATION

┌────────────┐                          ┌────────────────────┐
│   CI/CD    │◄────────────────────────►│      Release       │
│   System   │                          │    Orchestrator    │
└─────┬──────┘                          └─────────┬──────────┘
      │                                           │
      │═══════════════════════════════════════════│
      │         Events (both directions)          │
      │═══════════════════════════════════════════│
      │                                           │
      │ CI Events:                                │
      │ - Pipeline completed                      │
      │ - Tests passed/failed                     │
      │ - Artifacts ready                         │
      │                                           │
      │ Orchestrator Events:                      │
      │ - Deployment started                      │
      │ - Deployment completed                    │
      │ - Rollback initiated                      │
      │                                           │
```
## CI/CD System Configuration

### GitLab CI Integration

```yaml
# .gitlab-ci.yml
stages:
  - build
  - push
  - release

variables:
  STELLA_API_URL: https://stella.example.com/api/v1
  COMPONENT_NAME: myapp

build:
  stage: build
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .

push:
  stage: push
  script:
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    - docker tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG
  rules:
    - if: $CI_COMMIT_TAG

release:
  stage: release
  # This job invokes both docker and curl, so it must run in an image (or on a
  # runner) that provides both; a curl-only image cannot run `docker inspect`.
  script:
    - |
      # Get image digest
      DIGEST=$(docker inspect --format='{{index .RepoDigests 0}}' $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG | cut -d@ -f2)

      # Create release in Stella
      curl -X POST "$STELLA_API_URL/releases" \
        -H "Authorization: Bearer $STELLA_TOKEN" \
        -H "Content-Type: application/json" \
        -d "{
          \"name\": \"$COMPONENT_NAME-$CI_COMMIT_TAG\",
          \"components\": [{
            \"componentId\": \"$STELLA_COMPONENT_ID\",
            \"digest\": \"$DIGEST\"
          }],
          \"sourceRef\": {
            \"type\": \"git\",
            \"repository\": \"$CI_PROJECT_URL\",
            \"commit\": \"$CI_COMMIT_SHA\",
            \"tag\": \"$CI_COMMIT_TAG\"
          }
        }"
  rules:
    - if: $CI_COMMIT_TAG
```
### GitHub Actions Integration

```yaml
# .github/workflows/release.yml
name: Release to Stella

on:
  push:
    tags:
      - 'v*'

jobs:
  build-and-release:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push
        id: build
        uses: docker/build-push-action@v5
        with:
          push: true
          tags: |
            ghcr.io/${{ github.repository }}:${{ github.sha }}
            ghcr.io/${{ github.repository }}:${{ github.ref_name }}

      # build-push-action exposes the pushed manifest digest directly;
      # `docker inspect` would fail here because Buildx does not load the
      # image into the local daemon when pushing.
      - name: Create Stella Release
        uses: stella-ops/create-release-action@v1
        with:
          stella-url: ${{ vars.STELLA_API_URL }}
          stella-token: ${{ secrets.STELLA_TOKEN }}
          release-name: ${{ github.event.repository.name }}-${{ github.ref_name }}
          components: |
            - componentId: ${{ vars.STELLA_COMPONENT_ID }}
              digest: ${{ steps.build.outputs.digest }}
          source-ref: |
            type: git
            repository: ${{ github.server_url }}/${{ github.repository }}
            commit: ${{ github.sha }}
            tag: ${{ github.ref_name }}
```

### Jenkins Integration

```groovy
// Jenkinsfile
pipeline {
    agent any

    environment {
        STELLA_API_URL = 'https://stella.example.com/api/v1'
        STELLA_TOKEN = credentials('stella-api-token')
        REGISTRY = 'registry.example.com'
        IMAGE_NAME = 'myorg/myapp'
    }

    stages {
        stage('Build') {
            steps {
                script {
                    docker.build("${REGISTRY}/${IMAGE_NAME}:${env.BUILD_TAG}")
                }
            }
        }

        stage('Push') {
            steps {
                script {
                    docker.withRegistry("https://${REGISTRY}", 'registry-creds') {
                        docker.image("${REGISTRY}/${IMAGE_NAME}:${env.BUILD_TAG}").push()
                    }
                }
            }
        }

        stage('Create Release') {
            when {
                tag pattern: "v\\d+\\.\\d+\\.\\d+", comparator: "REGEXP"
            }
            steps {
                script {
                    def digest = sh(
                        script: "docker inspect --format='{{index .RepoDigests 0}}' ${REGISTRY}/${IMAGE_NAME}:${env.TAG_NAME} | cut -d@ -f2",
                        returnStdout: true
                    ).trim()

                    def response = httpRequest(
                        url: "${STELLA_API_URL}/releases",
                        httpMode: 'POST',
                        contentType: 'APPLICATION_JSON',
                        customHeaders: [[name: 'Authorization', value: "Bearer ${STELLA_TOKEN}"]],
                        requestBody: """
                        {
                          "name": "${IMAGE_NAME}-${env.TAG_NAME}",
                          "components": [{
                            "componentId": "${env.STELLA_COMPONENT_ID}",
                            "digest": "${digest}"
                          }],
                          "sourceRef": {
                            "type": "git",
                            "repository": "${env.GIT_URL}",
                            "commit": "${env.GIT_COMMIT}",
                            "tag": "${env.TAG_NAME}"
                          }
                        }
                        """
                    )

                    echo "Release created: ${response.content}"
                }
            }
        }
    }

    post {
        success {
            // Notify Stella of successful build
            httpRequest(
                url: "${STELLA_API_URL}/webhooks/ci-status",
                httpMode: 'POST',
                contentType: 'APPLICATION_JSON',
                customHeaders: [[name: 'Authorization', value: "Bearer ${STELLA_TOKEN}"]],
                requestBody: """
                {
                  "buildId": "${env.BUILD_ID}",
                  "status": "success",
                  "commit": "${env.GIT_COMMIT}"
                }
                """
            )
        }
    }
}
```

## Workflow Step Integration

### Trigger CI Pipeline Step

```typescript
// Step type: trigger-ci
interface TriggerCIConfig {
  integrationId: UUID;        // CI integration reference
  pipelineId: string;         // Pipeline to trigger
  ref?: string;               // Branch/tag reference
  variables?: Record<string, string>;
  waitForCompletion: boolean;
  timeout?: number;
}

class TriggerCIStep implements IStepExecutor {
  async execute(
    inputs: StepInputs,
    config: TriggerCIConfig,
    context: ExecutionContext
  ): Promise<StepOutputs> {
    const connector = await this.getConnector(config.integrationId);

    // Trigger pipeline
    const run = await connector.triggerPipeline(
      config.pipelineId,
      {
        ref: config.ref || context.release?.sourceRef?.tag,
        variables: {
          ...config.variables,
          STELLA_RELEASE_ID: context.release?.id,
          STELLA_PROMOTION_ID: context.promotion?.id,
          STELLA_ENVIRONMENT: context.environment?.name
        }
      }
    );

    if (!config.waitForCompletion) {
      return {
        pipelineRunId: run.id,
        status: run.status,
        webUrl: run.webUrl
      };
    }

    // Wait for completion
    const finalStatus = await this.waitForPipeline(
      connector,
      run.id,
      config.timeout || 3600
    );

    if (finalStatus.status !== "success") {
      throw new StepError(
        `Pipeline failed with status: ${finalStatus.status}`,
        { pipelineRunId: run.id, status: finalStatus }
      );
    }

    return {
      pipelineRunId: run.id,
      status: finalStatus.status,
      webUrl: run.webUrl
    };
  }

  private async waitForPipeline(
    connector: ICICDConnector,
    runId: string,
    timeout: number
  ): Promise<PipelineRun> {
    const deadline = Date.now() + timeout * 1000;

    while (Date.now() < deadline) {
      const run = await connector.getPipelineRun(runId);

      if (run.status === "success" || run.status === "failed" || run.status === "cancelled") {
        return run;
      }

      await sleep(10000); // Poll every 10 seconds
    }

    throw new TimeoutError(`Pipeline did not complete within ${timeout} seconds`);
  }
}
```

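The polling loop above relies on a `sleep` helper that the snippet does not define; a minimal version (an assumption, since the real codebase may use a shared utility) looks like this:

```typescript
// Minimal sleep helper assumed by the polling loop above;
// resolves after `ms` milliseconds without blocking the event loop.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Example: check that the delay is at least the requested duration.
async function demo(): Promise<boolean> {
  const start = Date.now();
  await sleep(50);
  return Date.now() - start >= 45; // small tolerance for timer jitter
}
```
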
### Wait for CI Step

```typescript
// Step type: wait-ci
interface WaitCIConfig {
  integrationId: UUID;
  runId?: string;       // If known, or from input
  runIdInput?: string;  // Input name containing run ID
  timeout: number;
  failOnError: boolean;
}

class WaitCIStep implements IStepExecutor {
  async execute(
    inputs: StepInputs,
    config: WaitCIConfig,
    context: ExecutionContext
  ): Promise<StepOutputs> {
    const runId = config.runId || inputs[config.runIdInput!];

    if (!runId) {
      throw new StepError("Pipeline run ID not provided");
    }

    const connector = await this.getConnector(config.integrationId);

    const finalStatus = await this.waitForPipeline(
      connector,
      runId,
      config.timeout
    );

    const success = finalStatus.status === "success";

    if (!success && config.failOnError) {
      throw new StepError(
        `Pipeline failed with status: ${finalStatus.status}`,
        { pipelineRunId: runId, status: finalStatus }
      );
    }

    return {
      status: finalStatus.status,
      success,
      pipelineRun: finalStatus
    };
  }
}
```

## Deployment Status Reporting

### GitHub Deployment Status

```typescript
class GitHubStatusReporter {
  async reportDeploymentStart(
    integration: Integration,
    deployment: DeploymentContext
  ): Promise<void> {
    const client = await this.getClient(integration);

    // Create deployment
    const { data: ghDeployment } = await client.repos.createDeployment({
      owner: deployment.repository.owner,
      repo: deployment.repository.name,
      ref: deployment.sourceRef.commit,
      environment: deployment.environment.name,
      auto_merge: false,
      required_contexts: [],
      payload: {
        stellaReleaseId: deployment.release.id,
        stellaPromotionId: deployment.promotion.id
      }
    });

    // Set status to in_progress
    await client.repos.createDeploymentStatus({
      owner: deployment.repository.owner,
      repo: deployment.repository.name,
      deployment_id: ghDeployment.id,
      state: "in_progress",
      log_url: `${this.stellaUrl}/deployments/${deployment.jobId}`,
      description: "Deployment in progress"
    });

    // Store deployment ID for later status update
    await this.storeMapping(deployment.jobId, ghDeployment.id);
  }

  async reportDeploymentComplete(
    integration: Integration,
    deployment: DeploymentContext,
    success: boolean
  ): Promise<void> {
    const client = await this.getClient(integration);
    const ghDeploymentId = await this.getMapping(deployment.jobId);

    await client.repos.createDeploymentStatus({
      owner: deployment.repository.owner,
      repo: deployment.repository.name,
      deployment_id: ghDeploymentId,
      state: success ? "success" : "failure",
      log_url: `${this.stellaUrl}/deployments/${deployment.jobId}`,
      environment_url: deployment.environment.url,
      description: success
        ? "Deployment completed successfully"
        : "Deployment failed"
    });
  }
}
```

### GitLab Pipeline Status

```typescript
class GitLabStatusReporter {
  async reportDeploymentStatus(
    integration: Integration,
    deployment: DeploymentContext,
    state: "running" | "success" | "failed" | "canceled"
  ): Promise<void> {
    const client = await this.getClient(integration);

    await client.post(
      `/projects/${integration.config.projectId}/statuses/${deployment.sourceRef.commit}`,
      {
        state,
        ref: deployment.sourceRef.tag || deployment.sourceRef.branch,
        name: `stella/${deployment.environment.name}`,
        target_url: `${this.stellaUrl}/deployments/${deployment.jobId}`,
        description: this.getDescription(state, deployment)
      }
    );
  }

  private getDescription(state: string, deployment: DeploymentContext): string {
    switch (state) {
      case "running":
        return `Deploying to ${deployment.environment.name}`;
      case "success":
        return `Deployed to ${deployment.environment.name}`;
      case "failed":
        return `Deployment to ${deployment.environment.name} failed`;
      case "canceled":
        return `Deployment to ${deployment.environment.name} cancelled`;
      default:
        return "";
    }
  }
}
```

## API for CI Systems

### Create Release from CI

```http
POST /api/v1/releases
Authorization: Bearer <ci-token>
Content-Type: application/json

{
  "name": "myapp-v1.2.0",
  "components": [
    {
      "componentId": "component-uuid",
      "digest": "sha256:abc123..."
    }
  ],
  "sourceRef": {
    "type": "git",
    "repository": "https://github.com/myorg/myapp",
    "commit": "abc123def456",
    "tag": "v1.2.0",
    "branch": "main"
  },
  "metadata": {
    "buildId": "12345",
    "buildUrl": "https://ci.example.com/builds/12345",
    "triggeredBy": "ci-pipeline"
  }
}
```

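For CI systems without a first-class action, the same request body can be assembled in code. This sketch only mirrors the JSON payload of the HTTP example above; the type names are assumptions for illustration:

```typescript
// Mirrors the POST /api/v1/releases payload shown above; field names follow
// the HTTP example, but the TypeScript types themselves are illustrative.
interface ReleaseComponent {
  componentId: string;
  digest: string;
}

interface SourceRef {
  type: "git";
  repository: string;
  commit: string;
  tag?: string;
  branch?: string;
}

interface CreateReleaseRequest {
  name: string;
  components: ReleaseComponent[];
  sourceRef: SourceRef;
  metadata?: Record<string, string>;
}

// Validate minimally and serialize; the result is what a fetch/curl call
// would send as the request body.
function buildCreateReleaseBody(req: CreateReleaseRequest): string {
  if (req.components.length === 0) {
    throw new Error("a release needs at least one component");
  }
  return JSON.stringify(req);
}
```
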
### Report Build Status

```http
POST /api/v1/ci-events/build-complete
Authorization: Bearer <ci-token>
Content-Type: application/json

{
  "integrationId": "integration-uuid",
  "buildId": "12345",
  "status": "success",
  "commit": "abc123def456",
  "artifacts": [
    {
      "name": "myapp",
      "digest": "sha256:abc123...",
      "repository": "registry.example.com/myorg/myapp"
    }
  ],
  "testResults": {
    "passed": 150,
    "failed": 0,
    "skipped": 5
  }
}
```

## Service Account for CI

### Creating CI Service Account

```http
POST /api/v1/service-accounts
Authorization: Bearer <admin-token>
Content-Type: application/json

{
  "name": "ci-pipeline",
  "description": "Service account for CI/CD integration",
  "roles": ["release-creator"],
  "permissions": [
    { "resource": "release", "action": "create" },
    { "resource": "component", "action": "read" },
    { "resource": "version-map", "action": "read" }
  ],
  "expiresIn": "365d"
}
```

Response:

```json
{
  "success": true,
  "data": {
    "id": "sa-uuid",
    "name": "ci-pipeline",
    "token": "stella_sa_xxxxxxxxxxxxx",
    "expiresAt": "2027-01-09T00:00:00Z"
  }
}
```

## References

- [Integrations Overview](overview.md)
- [Connectors](connectors.md)
- [Webhooks](webhooks.md)
- [Workflow Templates](../workflow/templates.md)
900
docs/modules/release-orchestrator/integrations/connectors.md
Normal file
@@ -0,0 +1,900 @@

# Connector Development

## Overview

Connectors are the integration layer between Release Orchestrator and external systems. Each connector implements a standard interface for its integration type.

## Connector Architecture

```
                            CONNECTOR ARCHITECTURE

┌─────────────────────────────────────────────────────────────────────────────┐
│                              CONNECTOR RUNTIME                              │
│                                                                             │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                          CONNECTOR INTERFACE                          │  │
│  │                                                                       │  │
│  │   ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐    │  │
│  │   │ getCapabilities()│  │      ping()      │  │  authenticate()  │    │  │
│  │   └──────────────────┘  └──────────────────┘  └──────────────────┘    │  │
│  │                                                                       │  │
│  │   ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐    │  │
│  │   │    discover()    │  │    execute()     │  │  healthCheck()   │    │  │
│  │   └──────────────────┘  └──────────────────┘  └──────────────────┘    │  │
│  │                                                                       │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                      │                                      │
│                                      ▼                                      │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                       CONNECTOR IMPLEMENTATIONS                       │  │
│  │                                                                       │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐   │  │
│  │  │  Registry   │  │    CI/CD    │  │ Notification│  │   Secret    │   │  │
│  │  │  Connectors │  │  Connectors │  │  Connectors │  │  Connectors │   │  │
│  │  │             │  │             │  │             │  │             │   │  │
│  │  │ - Docker    │  │ - GitLab    │  │ - Slack     │  │ - Vault     │   │  │
│  │  │ - ECR       │  │ - GitHub    │  │ - Teams     │  │ - AWS SM    │   │  │
│  │  │ - ACR       │  │ - Jenkins   │  │ - Email     │  │ - Azure KV  │   │  │
│  │  │ - Harbor    │  │ - Azure DO  │  │ - PagerDuty │  │             │   │  │
│  │  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘   │  │
│  │                                                                       │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Base Connector Interface

```typescript
interface IConnector {
  // Metadata
  readonly typeId: string;
  readonly displayName: string;
  readonly version: string;
  readonly capabilities: ConnectorCapabilities;

  // Lifecycle
  initialize(config: IntegrationConfig): Promise<void>;
  dispose(): Promise<void>;

  // Health
  ping(config: IntegrationConfig): Promise<void>;
  healthCheck(config: IntegrationConfig, creds: Credential): Promise<HealthCheckResult>;

  // Authentication
  authenticate(config: IntegrationConfig, creds: Credential): Promise<AuthContext>;

  // Discovery (optional)
  discover?(
    config: IntegrationConfig,
    authContext: AuthContext,
    resourceType: string,
    filter?: DiscoveryFilter
  ): Promise<DiscoveredResource[]>;
}

interface ConnectorCapabilities {
  discovery: boolean;
  webhooks: boolean;
  streaming: boolean;
  batchOperations: boolean;
  customActions: string[];
}
```

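To make the lifecycle concrete, here is a do-nothing connector satisfying a trimmed-down version of the interface. The trimmed types are inlined so the sketch stands alone; the real interface above has more members:

```typescript
// Trimmed-down copies of the interface types so this sketch is self-contained;
// the full IConnector above also covers auth, health checks, and discovery.
interface Capabilities { discovery: boolean; webhooks: boolean; }
interface MinimalConnector {
  readonly typeId: string;
  readonly capabilities: Capabilities;
  initialize(config: Record<string, unknown>): Promise<void>;
  ping(): Promise<void>;
  dispose(): Promise<void>;
}

// A connector that targets nothing; useful as a template or in tests.
class NoopConnector implements MinimalConnector {
  readonly typeId = "noop";
  readonly capabilities: Capabilities = { discovery: false, webhooks: false };
  private initialized = false;

  async initialize(_config: Record<string, unknown>): Promise<void> {
    this.initialized = true;
  }

  async ping(): Promise<void> {
    // A real connector would probe the remote endpoint here.
    if (!this.initialized) throw new Error("connector not initialized");
  }

  async dispose(): Promise<void> {
    this.initialized = false;
  }
}
```

The runtime calls `initialize` once per integration, `ping` on health probes, and `dispose` when the integration is removed or reconfigured.
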
## Registry Connectors

### IRegistryConnector

```typescript
interface IRegistryConnector extends IConnector {
  // Repository operations
  listRepositories(authContext: AuthContext): Promise<Repository[]>;

  // Tag operations
  listTags(authContext: AuthContext, repository: string): Promise<Tag[]>;
  getManifest(authContext: AuthContext, repository: string, reference: string): Promise<Manifest>;
  getDigest(authContext: AuthContext, repository: string, tag: string): Promise<string>;

  // Image operations
  imageExists(authContext: AuthContext, repository: string, digest: string): Promise<boolean>;
  getImageMetadata(authContext: AuthContext, repository: string, digest: string): Promise<ImageMetadata>;
}

interface Repository {
  name: string;
  fullName: string;
  tagCount?: number;
  lastUpdated?: DateTime;
}

interface Tag {
  name: string;
  digest: string;
  createdAt?: DateTime;
  size?: number;
}

interface ImageMetadata {
  digest: string;
  mediaType: string;
  size: number;
  architecture: string;
  os: string;
  created: DateTime;
  labels: Record<string, string>;
  layers: LayerInfo[];
}
```

### Docker Registry Connector

```typescript
class DockerRegistryConnector implements IRegistryConnector {
  readonly typeId = "docker-registry";
  readonly displayName = "Docker Registry";
  readonly version = "1.0.0";
  readonly capabilities: ConnectorCapabilities = {
    discovery: true,
    webhooks: true,
    streaming: false,
    batchOperations: false,
    customActions: []
  };

  private httpClient: HttpClient;

  async initialize(config: DockerRegistryConfig): Promise<void> {
    this.httpClient = new HttpClient({
      baseUrl: config.url,
      timeout: config.timeout || 30000,
      insecureSkipVerify: config.insecureSkipVerify
    });
  }

  async ping(config: DockerRegistryConfig): Promise<void> {
    const response = await this.httpClient.get("/v2/");
    if (response.status !== 200 && response.status !== 401) {
      throw new Error(`Registry unavailable: ${response.status}`);
    }
  }

  async authenticate(
    config: DockerRegistryConfig,
    creds: BasicCredential
  ): Promise<AuthContext> {
    // Get auth challenge from /v2/
    const challenge = await this.getAuthChallenge();

    if (challenge.type === "bearer") {
      // OAuth2 token flow
      const token = await this.getToken(challenge, creds);
      return { type: "bearer", token };
    } else {
      // Basic auth
      return {
        type: "basic",
        credentials: Buffer.from(`${creds.username}:${creds.password}`).toString("base64")
      };
    }
  }

  async getDigest(
    authContext: AuthContext,
    repository: string,
    tag: string
  ): Promise<string> {
    const response = await this.httpClient.head(
      `/v2/${repository}/manifests/${tag}`,
      {
        headers: {
          ...this.authHeader(authContext),
          Accept: "application/vnd.docker.distribution.manifest.v2+json"
        }
      }
    );

    const digest = response.headers.get("docker-content-digest");
    if (!digest) {
      throw new Error("No digest header in response");
    }

    return digest;
  }

  async getImageMetadata(
    authContext: AuthContext,
    repository: string,
    digest: string
  ): Promise<ImageMetadata> {
    // Fetch manifest
    const manifest = await this.getManifest(authContext, repository, digest);

    // Fetch config blob
    const configDigest = manifest.config.digest;
    const configResponse = await this.httpClient.get(
      `/v2/${repository}/blobs/${configDigest}`,
      { headers: this.authHeader(authContext) }
    );

    const config = await configResponse.json();

    return {
      digest,
      mediaType: manifest.mediaType,
      size: manifest.config.size,
      architecture: config.architecture,
      os: config.os,
      created: new Date(config.created),
      labels: config.config?.Labels || {},
      layers: manifest.layers.map(l => ({
        digest: l.digest,
        size: l.size,
        mediaType: l.mediaType
      }))
    };
  }
}
```

### ECR Connector

```typescript
class ECRConnector implements IRegistryConnector {
  readonly typeId = "ecr";
  readonly displayName = "AWS ECR";
  readonly version = "1.0.0";
  readonly capabilities: ConnectorCapabilities = {
    discovery: true,
    webhooks: false,
    streaming: false,
    batchOperations: true,
    customActions: ["createRepository", "setLifecyclePolicy"]
  };

  private ecrClient: ECRClient;

  async initialize(config: ECRConfig): Promise<void> {
    this.ecrClient = new ECRClient({
      region: config.region,
      credentials: {
        accessKeyId: config.accessKeyId,
        secretAccessKey: config.secretAccessKey
      }
    });
  }

  async authenticate(
    config: ECRConfig,
    creds: AWSCredential
  ): Promise<AuthContext> {
    const command = new GetAuthorizationTokenCommand({});
    const response = await this.ecrClient.send(command);

    const authData = response.authorizationData?.[0];
    if (!authData?.authorizationToken) {
      throw new Error("Failed to get ECR authorization token");
    }

    return {
      type: "bearer",
      token: authData.authorizationToken,
      expiresAt: authData.expiresAt
    };
  }

  async listRepositories(authContext: AuthContext): Promise<Repository[]> {
    const repositories: Repository[] = [];
    let nextToken: string | undefined;

    do {
      const command = new DescribeRepositoriesCommand({
        nextToken
      });
      const response = await this.ecrClient.send(command);

      for (const repo of response.repositories || []) {
        repositories.push({
          name: repo.repositoryName!,
          fullName: repo.repositoryUri!,
          lastUpdated: repo.createdAt
        });
      }

      nextToken = response.nextToken;
    } while (nextToken);

    return repositories;
  }
}
```

## CI/CD Connectors

### ICICDConnector

```typescript
interface ICICDConnector extends IConnector {
  // Pipeline operations
  listPipelines(authContext: AuthContext): Promise<Pipeline[]>;
  getPipeline(authContext: AuthContext, pipelineId: string): Promise<Pipeline>;

  // Trigger operations
  triggerPipeline(
    authContext: AuthContext,
    pipelineId: string,
    params: TriggerParams
  ): Promise<PipelineRun>;

  // Run operations
  getPipelineRun(authContext: AuthContext, runId: string): Promise<PipelineRun>;
  cancelPipelineRun(authContext: AuthContext, runId: string): Promise<void>;
  getPipelineRunLogs(authContext: AuthContext, runId: string): Promise<string>;
}

interface Pipeline {
  id: string;
  name: string;
  ref?: string;
  webUrl?: string;
}

interface TriggerParams {
  ref?: string;  // Branch/tag
  variables?: Record<string, string>;
}

interface PipelineRun {
  id: string;
  pipelineId: string;
  status: PipelineStatus;
  ref?: string;
  webUrl?: string;
  startedAt?: DateTime;
  finishedAt?: DateTime;
}

type PipelineStatus =
  | "pending"
  | "running"
  | "success"
  | "failed"
  | "cancelled";
```

### GitLab CI Connector

```typescript
class GitLabCIConnector implements ICICDConnector {
  readonly typeId = "gitlab-ci";
  readonly displayName = "GitLab CI/CD";
  readonly version = "1.0.0";
  readonly capabilities: ConnectorCapabilities = {
    discovery: true,
    webhooks: true,
    streaming: false,
    batchOperations: false,
    customActions: ["retryPipeline"]
  };

  private apiClient: GitLabClient;
  private projectId: string;
  private defaultBranch?: string;

  async initialize(config: GitLabCIConfig): Promise<void> {
    // Keep these locally; triggerPipeline and getPipelineRun need them.
    this.projectId = config.projectId;
    this.defaultBranch = config.defaultBranch;
    this.apiClient = new GitLabClient({
      baseUrl: config.url,
      projectId: config.projectId
    });
  }

  async authenticate(
    config: GitLabCIConfig,
    creds: TokenCredential
  ): Promise<AuthContext> {
    // Validate token with user endpoint
    this.apiClient.setToken(creds.token);
    await this.apiClient.get("/user");

    return {
      type: "bearer",
      token: creds.token
    };
  }

  async triggerPipeline(
    authContext: AuthContext,
    pipelineId: string,
    params: TriggerParams
  ): Promise<PipelineRun> {
    const response = await this.apiClient.post(
      `/projects/${this.projectId}/pipeline`,
      {
        ref: params.ref || this.defaultBranch,
        variables: Object.entries(params.variables || {}).map(([key, value]) => ({
          key,
          value,
          variable_type: "env_var"
        }))
      },
      { headers: { Authorization: `Bearer ${authContext.token}` } }
    );

    return {
      id: response.id.toString(),
      pipelineId: pipelineId,
      status: this.mapStatus(response.status),
      ref: response.ref,
      webUrl: response.web_url,
      startedAt: response.started_at ? new Date(response.started_at) : undefined
    };
  }

  async getPipelineRun(
    authContext: AuthContext,
    runId: string
  ): Promise<PipelineRun> {
    const response = await this.apiClient.get(
      `/projects/${this.projectId}/pipelines/${runId}`,
      { headers: { Authorization: `Bearer ${authContext.token}` } }
    );

    return {
      id: response.id.toString(),
      pipelineId: response.id.toString(),
      status: this.mapStatus(response.status),
      ref: response.ref,
      webUrl: response.web_url,
      startedAt: response.started_at ? new Date(response.started_at) : undefined,
      finishedAt: response.finished_at ? new Date(response.finished_at) : undefined
    };
  }

  private mapStatus(gitlabStatus: string): PipelineStatus {
    const statusMap: Record<string, PipelineStatus> = {
      created: "pending",
      waiting_for_resource: "pending",
      preparing: "pending",
      pending: "pending",
      running: "running",
      success: "success",
      failed: "failed",
      canceled: "cancelled",
      skipped: "cancelled",
      manual: "pending"
    };
    return statusMap[gitlabStatus] || "pending";
  }
}
```

## Notification Connectors

### INotificationConnector

```typescript
interface INotificationConnector extends IConnector {
  // Channel operations
  listChannels(authContext: AuthContext): Promise<Channel[]>;

  // Send operations
  sendMessage(
    authContext: AuthContext,
    channel: string,
    message: NotificationMessage
  ): Promise<MessageResult>;

  sendTemplate(
    authContext: AuthContext,
    channel: string,
    templateId: string,
    data: Record<string, any>
  ): Promise<MessageResult>;
}

interface Channel {
  id: string;
  name: string;
  type: string;
}

interface NotificationMessage {
  text: string;
  title?: string;
  color?: string;
  fields?: MessageField[];
  actions?: MessageAction[];
}

interface MessageField {
  name: string;
  value: string;
  inline?: boolean;
}

interface MessageAction {
  type: "button" | "link";
  text: string;
  url?: string;
  style?: "primary" | "danger" | "default";
}
```

### Slack Connector

```typescript
class SlackConnector implements INotificationConnector {
  readonly typeId = "slack";
  readonly displayName = "Slack";
  readonly version = "1.0.0";
  readonly capabilities: ConnectorCapabilities = {
    discovery: true,
    webhooks: true,
    streaming: false,
    batchOperations: false,
    customActions: ["addReaction", "updateMessage"]
  };

  private slackClient: WebClient;

  async initialize(config: SlackConfig): Promise<void> {
    // Client initialized in authenticate
  }

  async authenticate(
    config: SlackConfig,
    creds: TokenCredential
  ): Promise<AuthContext> {
    this.slackClient = new WebClient(creds.token);

    // Test authentication
    const result = await this.slackClient.auth.test();
    if (!result.ok) {
      throw new Error("Slack authentication failed");
    }

    return {
      type: "bearer",
      token: creds.token,
      teamId: result.team_id,
      userId: result.user_id
    };
  }

  async listChannels(authContext: AuthContext): Promise<Channel[]> {
    const channels: Channel[] = [];
    let cursor: string | undefined;

    do {
      const result = await this.slackClient.conversations.list({
        types: "public_channel,private_channel",
        cursor
      });

      for (const channel of result.channels || []) {
        channels.push({
          id: channel.id!,
          name: channel.name!,
          type: channel.is_private ? "private" : "public"
        });
      }

      cursor = result.response_metadata?.next_cursor;
    } while (cursor);

    return channels;
  }

  async sendMessage(
    authContext: AuthContext,
    channel: string,
    message: NotificationMessage
  ): Promise<MessageResult> {
    const blocks = this.buildBlocks(message);

    // When a color is set, nest the blocks in a colored attachment instead of
    // sending them twice (top-level and inside the attachment).
    const result = await this.slackClient.chat.postMessage({
      channel,
      text: message.text,
      ...(message.color
        ? { attachments: [{ color: message.color, blocks }] }
        : { blocks })
    });

    return {
      messageId: result.ts!,
      channel: result.channel!,
      success: result.ok
    };
  }

  private buildBlocks(message: NotificationMessage): KnownBlock[] {
    const blocks: KnownBlock[] = [];

    if (message.title) {
      blocks.push({
        type: "header",
        text: {
          type: "plain_text",
          text: message.title
        }
      });
    }

    blocks.push({
      type: "section",
      text: {
        type: "mrkdwn",
        text: message.text
      }
    });

    if (message.fields?.length) {
      blocks.push({
        type: "section",
        fields: message.fields.map(f => ({
          type: "mrkdwn",
          text: `*${f.name}*\n${f.value}`
        }))
      });
    }

    if (message.actions?.length) {
      blocks.push({
        type: "actions",
        elements: message.actions.map(a => ({
          type: "button",
          text: {
            type: "plain_text",
            text: a.text
          },
          url: a.url,
          // Slack buttons accept only "primary" or "danger"; omit the style
          // field entirely for the default appearance.
          ...(a.style === "danger" || a.style === "primary"
            ? { style: a.style }
            : {})
        }))
      });
    }

    return blocks;
  }
}
```

## Secret Store Connectors

### ISecretConnector

```typescript
interface ISecretConnector extends IConnector {
  // Secret operations
  getSecret(
    authContext: AuthContext,
    path: string,
    key?: string
  ): Promise<SecretValue>;

  listSecrets(
    authContext: AuthContext,
    path: string
  ): Promise<string[]>;
}

interface SecretValue {
  value: string;
  version?: string;
  createdAt?: DateTime;
  expiresAt?: DateTime;
}
```

### HashiCorp Vault Connector

```typescript
class VaultConnector implements ISecretConnector {
  readonly typeId = "hashicorp-vault";
  readonly displayName = "HashiCorp Vault";
  readonly version = "1.0.0";
  readonly capabilities: ConnectorCapabilities = {
    discovery: true,
    webhooks: false,
    streaming: false,
    batchOperations: false,
    customActions: ["renewToken"]
  };

  private vaultClient!: VaultClient;
  private mountPath!: string;

  async initialize(config: VaultConfig): Promise<void> {
    this.vaultClient = new VaultClient({
      endpoint: config.url,
      namespace: config.namespace
    });
    // Remember the mount path for later kv reads/lists
    this.mountPath = config.mountPath;
  }

  async authenticate(
    config: VaultConfig,
    creds: Credential
  ): Promise<AuthContext> {
    let token: string;

    switch (config.authMethod) {
      case "token":
        token = (creds as TokenCredential).token;
        break;

      case "approle": {
        const approle = creds as AppRoleCredential;
        const result = await this.vaultClient.auth.approle.login({
          role_id: approle.roleId,
          secret_id: approle.secretId
        });
        token = result.auth.client_token;
        break;
      }

      case "kubernetes": {
        const k8s = creds as KubernetesCredential;
        const k8sResult = await this.vaultClient.auth.kubernetes.login({
          role: k8s.role,
          jwt: k8s.serviceAccountToken
        });
        token = k8sResult.auth.client_token;
        break;
      }

      default:
        throw new Error(`Unsupported auth method: ${config.authMethod}`);
    }

    this.vaultClient.token = token;

    return {
      type: "bearer",
      token,
      renewable: true
    };
  }

  async getSecret(
    authContext: AuthContext,
    path: string,
    key?: string
  ): Promise<SecretValue> {
    const result = await this.vaultClient.kv.v2.read({
      mount_path: this.mountPath,
      path
    });

    const data = result.data.data;
    const value = key ? data[key] : JSON.stringify(data);

    return {
      value,
      version: result.data.metadata.version.toString(),
      createdAt: new Date(result.data.metadata.created_time)
    };
  }

  async listSecrets(
    authContext: AuthContext,
    path: string
  ): Promise<string[]> {
    const result = await this.vaultClient.kv.v2.list({
      mount_path: this.mountPath,
      path
    });

    return result.data.keys;
  }
}
```

## Custom Connector Development

### Plugin Structure

```
my-connector/
├── manifest.yaml
├── src/
│   ├── connector.ts
│   ├── config.ts
│   └── types.ts
└── package.json
```

### Manifest

```yaml
# manifest.yaml
id: my-custom-connector
version: 1.0.0
name: My Custom Connector
description: Custom connector for XYZ service
author: Your Name

connector:
  typeId: my-service
  displayName: My Service
  entrypoint: ./src/connector.js

capabilities:
  discovery: true
  webhooks: false
  streaming: false
  batchOperations: false

config_schema:
  type: object
  properties:
    url:
      type: string
      format: uri
      description: Service URL
    timeout:
      type: integer
      default: 30000
  required:
    - url

credential_types:
  - api-key
  - oauth2
```

### Implementation

```typescript
// connector.ts
import { IConnector, ConnectorCapabilities } from "@stella-ops/connector-sdk";

export class MyConnector implements IConnector {
  readonly typeId = "my-service";
  readonly displayName = "My Service";
  readonly version = "1.0.0";
  readonly capabilities: ConnectorCapabilities = {
    discovery: true,
    webhooks: false,
    streaming: false,
    batchOperations: false,
    customActions: []
  };

  async initialize(config: MyConfig): Promise<void> {
    // Initialize your connector
  }

  async dispose(): Promise<void> {
    // Cleanup resources
  }

  async ping(config: MyConfig): Promise<void> {
    // Check connectivity
  }

  async healthCheck(config: MyConfig, creds: Credential): Promise<HealthCheckResult> {
    // Full health check
    throw new Error("Not implemented");
  }

  async authenticate(config: MyConfig, creds: Credential): Promise<AuthContext> {
    // Authenticate and return context
    throw new Error("Not implemented");
  }

  async discover(
    config: MyConfig,
    authContext: AuthContext,
    resourceType: string,
    filter?: DiscoveryFilter
  ): Promise<DiscoveredResource[]> {
    // Discover resources
    throw new Error("Not implemented");
  }
}

// Export connector factory
export default function createConnector(): IConnector {
  return new MyConnector();
}
```

## References

- [Integrations Overview](overview.md)
- [Webhooks](webhooks.md)
- [Plugin System](../modules/plugin-system.md)

412
docs/modules/release-orchestrator/integrations/overview.md
Normal file
@@ -0,0 +1,412 @@
# Integrations Overview

## Purpose

The Integration Hub (INTHUB) provides a unified interface for connecting Release Orchestrator to external systems, including container registries, CI/CD pipelines, notification services, secret stores, and metrics providers.

## Integration Architecture

```
                          INTEGRATION HUB ARCHITECTURE

┌─────────────────────────────────────────────────────────────────────────────┐
│                               INTEGRATION HUB                               │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                         INTEGRATION MANAGER                         │    │
│  │                                                                     │    │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐     │    │
│  │  │    Type    │  │  Instance  │  │   Health   │  │ Discovery  │     │    │
│  │  │  Registry  │  │  Manager   │  │  Monitor   │  │  Service   │     │    │
│  │  └────────────┘  └────────────┘  └────────────┘  └────────────┘     │    │
│  │                                                                     │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                      │                                      │
│                                      ▼                                      │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                          CONNECTOR RUNTIME                          │    │
│  │                                                                     │    │
│  │  ┌──────────────────────────────────────────────────────────────┐   │    │
│  │  │                        CONNECTOR POOL                        │   │    │
│  │  │                                                              │   │    │
│  │  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐      │   │    │
│  │  │  │  Docker  │  │  GitLab  │  │  Slack   │  │  Vault   │      │   │    │
│  │  │  │ Registry │  │    CI    │  │          │  │          │      │   │    │
│  │  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘      │   │    │
│  │  │                                                              │   │    │
│  │  └──────────────────────────────────────────────────────────────┘   │    │
│  │                                                                     │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
        ┌──────────────┬──────────────┼──────────────┬──────────────┐
        │              │              │              │              │
        ▼              ▼              ▼              ▼              ▼
  ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐
  │Container│    │  CI/CD  │    │ Notifi- │    │ Secret  │    │ Metrics │
  │Registry │    │ Systems │    │ cations │    │ Stores  │    │ Systems │
  └─────────┘    └─────────┘    └─────────┘    └─────────┘    └─────────┘
```

## Integration Types

### Container Registries

| Type ID | Description | Discovery Support |
|---------|-------------|-------------------|
| `docker-registry` | Docker Registry v2 API | Yes |
| `docker-hub` | Docker Hub | Yes |
| `gcr` | Google Container Registry | Yes |
| `ecr` | AWS Elastic Container Registry | Yes |
| `acr` | Azure Container Registry | Yes |
| `ghcr` | GitHub Container Registry | Yes |
| `harbor` | Harbor Registry | Yes |
| `jfrog` | JFrog Artifactory | Yes |
| `nexus` | Sonatype Nexus | Yes |
| `quay` | Quay.io | Yes |

### CI/CD Systems

| Type ID | Description | Trigger Support |
|---------|-------------|-----------------|
| `gitlab-ci` | GitLab CI/CD | Yes |
| `github-actions` | GitHub Actions | Yes |
| `jenkins` | Jenkins | Yes |
| `azure-devops` | Azure DevOps Pipelines | Yes |
| `circleci` | CircleCI | Yes |
| `teamcity` | TeamCity | Yes |
| `drone` | Drone CI | Yes |

### Notification Services

| Type ID | Description | Features |
|---------|-------------|----------|
| `slack` | Slack | Channels, threads, reactions |
| `teams` | Microsoft Teams | Channels, cards |
| `email` | Email (SMTP) | Templates, attachments |
| `webhook` | Generic Webhook | JSON payloads |
| `pagerduty` | PagerDuty | Incidents, alerts |
| `opsgenie` | OpsGenie | Alerts, on-call |

### Secret Stores

| Type ID | Description | Features |
|---------|-------------|----------|
| `hashicorp-vault` | HashiCorp Vault | KV, Transit, PKI |
| `aws-secrets-manager` | AWS Secrets Manager | Rotation, versioning |
| `azure-key-vault` | Azure Key Vault | Keys, secrets, certs |
| `gcp-secret-manager` | GCP Secret Manager | Versions, labels |

### Metrics & Monitoring

| Type ID | Description | Use Case |
|---------|-------------|----------|
| `prometheus` | Prometheus | Canary metrics |
| `datadog` | Datadog | APM, logs, metrics |
| `newrelic` | New Relic | APM, infra monitoring |
| `dynatrace` | Dynatrace | Full-stack monitoring |

## Integration Configuration

### Integration Entity

```typescript
interface Integration {
  id: UUID;
  tenantId: UUID;
  typeId: string;       // e.g., "docker-registry"
  name: string;         // Display name
  description?: string;

  // Connection configuration
  config: IntegrationConfig;

  // Credential reference (stored in vault)
  credentialRef: string;

  // Health tracking
  healthStatus: "healthy" | "degraded" | "unhealthy" | "unknown";
  lastHealthCheck?: DateTime;

  // Metadata
  labels: Record<string, string>;
  createdAt: DateTime;
  updatedAt: DateTime;
}

interface IntegrationConfig {
  // Common fields
  url?: string;
  timeout?: number;
  retries?: number;

  // Type-specific fields
  [key: string]: any;
}
```

### Type-Specific Configuration

```typescript
// Docker Registry
interface DockerRegistryConfig extends IntegrationConfig {
  url: string;                  // https://registry.example.com
  repository?: string;          // Optional default repository
  insecureSkipVerify?: boolean; // Skip TLS verification
}

// GitLab CI
interface GitLabCIConfig extends IntegrationConfig {
  url: string;            // https://gitlab.example.com
  projectId: string;      // Project ID or path
  defaultBranch?: string; // Default ref for triggers
}

// Slack
interface SlackConfig extends IntegrationConfig {
  workspace?: string;      // Workspace identifier
  defaultChannel?: string; // Default channel for notifications
  iconEmoji?: string;      // Bot icon
}

// HashiCorp Vault
interface VaultConfig extends IntegrationConfig {
  url: string;        // https://vault.example.com
  namespace?: string; // Vault namespace
  mountPath: string;  // Secret mount path
  authMethod: "token" | "approle" | "kubernetes";
}
```

## Credential Management

Credentials are never stored in the Release Orchestrator database. Instead, references to external secret stores are used.

### Credential Reference Format

```
vault://vault-integration-id/path/to/secret#key
        └────────┬─────────┘ └─────┬──────┘ └┬┘
             Vault ID         Secret path   Key
```
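
As a sketch of how such a reference can be decomposed (the helper name and the `ParsedCredentialRef` shape are illustrative assumptions, not part of the documented API), the format above parses with a single regular expression:

```typescript
interface ParsedCredentialRef {
  integrationId: string; // Vault integration ID
  path: string;          // Secret path inside the mount
  key?: string;          // Optional key within the secret
}

// Hypothetical helper; not part of the shipped API surface.
function parseCredentialRef(ref: string): ParsedCredentialRef {
  const match = ref.match(/^vault:\/\/([^/]+)\/([^#]+)(?:#(.+))?$/);
  if (!match) {
    throw new Error(`Invalid credential reference: ${ref}`);
  }
  const [, integrationId, path, key] = match;
  return { integrationId, path, key };
}
```

For example, `parseCredentialRef("vault://vault-1/docker/prod-registry#credentials")` yields the integration ID `vault-1`, the path `docker/prod-registry`, and the key `credentials`.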

### Credential Types

```typescript
type CredentialType =
  | "basic"           // Username/password
  | "token"           // Bearer token
  | "api-key"         // API key
  | "oauth2"          // OAuth2 credentials
  | "service-account" // GCP/K8s service account
  | "certificate";    // Client certificate

interface CredentialReference {
  type: CredentialType;
  ref: string; // Vault reference
}

// Examples
const dockerCreds: CredentialReference = {
  type: "basic",
  ref: "vault://vault-1/docker/registry.example.com#credentials"
};

const gitlabToken: CredentialReference = {
  type: "token",
  ref: "vault://vault-1/ci/gitlab#access_token"
};
```

## Health Monitoring

### Health Check Types

| Check Type | Description | Frequency |
|------------|-------------|-----------|
| `connectivity` | TCP/HTTP connectivity | 1 min |
| `authentication` | Credential validity | 5 min |
| `functionality` | Full operation test | 15 min |

### Health Check Flow

```typescript
interface HealthCheckResult {
  integrationId: UUID;
  checkType: string;
  status: "healthy" | "degraded" | "unhealthy";
  latencyMs: number;
  message?: string;
  checkedAt: DateTime;
}

class IntegrationHealthMonitor {
  async checkHealth(integration: Integration): Promise<HealthCheckResult> {
    const connector = this.connectorPool.get(integration.typeId);
    const startTime = Date.now();

    try {
      // Connectivity check
      await connector.ping(integration.config);

      // Authentication check
      const creds = await this.fetchCredentials(integration.credentialRef);
      await connector.authenticate(integration.config, creds);

      return {
        integrationId: integration.id,
        checkType: "full",
        status: "healthy",
        latencyMs: Date.now() - startTime,
        checkedAt: new Date()
      };
    } catch (error) {
      return {
        integrationId: integration.id,
        checkType: "full",
        status: this.classifyError(error),
        latencyMs: Date.now() - startTime,
        message: error.message,
        checkedAt: new Date()
      };
    }
  }
}
```

## Discovery Service

Integrations can discover resources from connected systems.

### Discovery Operations

```typescript
interface DiscoveryService {
  // Discover available repositories
  discoverRepositories(integrationId: UUID): Promise<Repository[]>;

  // Discover tags/versions
  discoverTags(integrationId: UUID, repository: string): Promise<Tag[]>;

  // Discover pipelines
  discoverPipelines(integrationId: UUID): Promise<Pipeline[]>;

  // Discover notification channels
  discoverChannels(integrationId: UUID): Promise<Channel[]>;
}

// Example: Discover Docker repositories
const repos = await discoveryService.discoverRepositories(dockerIntegrationId);
// Returns: [{ name: "myapp", tags: ["latest", "v1.0.0", ...] }, ...]
```

### Discovery Caching

```typescript
interface DiscoveryCache {
  key: string; // integration_id:resource_type
  data: any;
  discoveredAt: DateTime;
  ttlSeconds: number;
}

// Cache TTLs by resource type
const cacheTTLs = {
  repositories: 3600, // 1 hour
  tags: 300,          // 5 minutes
  pipelines: 3600,    // 1 hour
  channels: 86400     // 24 hours
};
```
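
A minimal in-memory sketch of the get-or-discover pattern implied by this cache (the class name and method are illustrative assumptions; the shipped cache may be backed by a database or Redis):

```typescript
// TTLs as defined above, repeated here so the sketch is self-contained.
const cacheTTLs = {
  repositories: 3600,
  tags: 300,
  pipelines: 3600,
  channels: 86400
};

// Hypothetical in-memory implementation of the DiscoveryCache described above.
class InMemoryDiscoveryCache {
  private entries = new Map<
    string,
    { data: unknown; discoveredAt: number; ttlSeconds: number }
  >();

  async getOrDiscover<T>(
    integrationId: string,
    resourceType: keyof typeof cacheTTLs,
    discover: () => Promise<T>
  ): Promise<T> {
    const key = `${integrationId}:${resourceType}`;
    const hit = this.entries.get(key);
    const now = Date.now();

    // Serve from cache while the entry is within its TTL.
    if (hit && now - hit.discoveredAt < hit.ttlSeconds * 1000) {
      return hit.data as T;
    }

    // Miss or expired: run discovery and record the result.
    const data = await discover();
    this.entries.set(key, {
      data,
      discoveredAt: now,
      ttlSeconds: cacheTTLs[resourceType]
    });
    return data;
  }
}
```

With this shape, two back-to-back calls for the same `integrationId` and resource type hit the connected system only once until the TTL expires.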

## API Reference

### Create Integration

```http
POST /api/v1/integrations
Content-Type: application/json

{
  "typeId": "docker-registry",
  "name": "Production Registry",
  "config": {
    "url": "https://registry.example.com",
    "repository": "myorg"
  },
  "credentialRef": "vault://vault-1/docker/prod-registry#credentials",
  "labels": {
    "environment": "production"
  }
}
```

### Test Integration

```http
POST /api/v1/integrations/{id}/test
```

Response:
```json
{
  "success": true,
  "data": {
    "connectivityTest": { "status": "passed", "latencyMs": 45 },
    "authenticationTest": { "status": "passed", "latencyMs": 120 },
    "functionalityTest": { "status": "passed", "latencyMs": 230 }
  }
}
```

### Discover Resources

```http
POST /api/v1/integrations/{id}/discover
Content-Type: application/json

{
  "resourceType": "repositories",
  "filter": {
    "namePattern": "myapp-*"
  }
}
```

## Error Handling

### Integration Errors

| Error Code | Description | Retry Strategy |
|------------|-------------|----------------|
| `INTEGRATION_NOT_FOUND` | Integration ID not found | No retry |
| `INTEGRATION_UNHEALTHY` | Integration health check failing | Backoff retry |
| `CREDENTIAL_FETCH_FAILED` | Cannot fetch credentials | Retry with backoff |
| `CONNECTION_REFUSED` | Cannot connect to endpoint | Retry with backoff |
| `AUTHENTICATION_FAILED` | Invalid credentials | No retry |
| `RATE_LIMITED` | Too many requests | Retry after delay |

### Circuit Breaker

```typescript
interface CircuitBreakerConfig {
  failureThreshold: number; // Failures before opening
  successThreshold: number; // Successes to close
  timeout: number;          // Time in open state (ms)
}

// Default configuration
const defaultCircuitBreaker: CircuitBreakerConfig = {
  failureThreshold: 5,
  successThreshold: 3,
  timeout: 60000
};
```
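
To make the three thresholds concrete, here is a minimal closed/open/half-open state machine matching the config above. This is a sketch, not the shipped breaker; the class and method names are assumptions, and the injectable clock exists only to make the sketch testable:

```typescript
interface CircuitBreakerConfig {
  failureThreshold: number; // Failures before opening
  successThreshold: number; // Successes to close
  timeout: number;          // Time in open state (ms)
}

type CircuitState = "closed" | "open" | "half-open";

// Illustrative breaker; one instance would guard one integration.
class CircuitBreaker {
  private state: CircuitState = "closed";
  private failures = 0;
  private successes = 0;
  private openedAt = 0;

  constructor(
    private config: CircuitBreakerConfig,
    private now: () => number = Date.now // injectable clock for tests
  ) {}

  canExecute(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.config.timeout) {
      this.state = "half-open"; // Probe the integration again after the timeout.
      this.successes = 0;
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    if (this.state === "half-open" && ++this.successes >= this.config.successThreshold) {
      this.state = "closed"; // Enough probes succeeded; resume normal traffic.
      this.failures = 0;
    }
  }

  recordFailure(): void {
    // Any failure while half-open, or hitting the threshold while closed,
    // opens the breaker and blocks calls for `timeout` ms.
    if (this.state === "half-open" || ++this.failures >= this.config.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
      this.failures = 0;
    }
  }
}
```

The retry strategies in the table above would consult `canExecute()` before each attempt, so a persistently unhealthy integration stops consuming retries entirely until the open-state timeout elapses.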

## References

- [Connectors](connectors.md)
- [Webhooks](webhooks.md)
- [CI/CD Integration](ci-cd.md)
- [Integration Hub Module](../modules/integration-hub.md)

627
docs/modules/release-orchestrator/integrations/webhooks.md
Normal file
@@ -0,0 +1,627 @@
# Webhooks

## Overview

Release Orchestrator supports both inbound webhooks (receiving events from external systems) and outbound webhooks (sending events to external systems).

## Inbound Webhooks

### Webhook Types

| Type | Source | Triggers |
|------|--------|----------|
| `registry-push` | Container registries | Image push events |
| `ci-pipeline` | CI/CD systems | Pipeline completion |
| `github-app` | GitHub | PR, push, workflow events |
| `gitlab-webhook` | GitLab | Pipeline, push, MR events |
| `generic` | Any system | Custom payloads |

### Registry Push Webhook

Receives events when new images are pushed to registries.

```
POST /api/v1/webhooks/registry/{integrationId}
Content-Type: application/json

# Docker Hub
{
  "push_data": {
    "tag": "v1.2.0",
    "images": ["sha256:abc123..."],
    "pushed_at": 1704067200
  },
  "repository": {
    "name": "myapp",
    "namespace": "myorg",
    "repo_url": "https://hub.docker.com/r/myorg/myapp"
  }
}

# Harbor
{
  "type": "PUSH_ARTIFACT",
  "occur_at": 1704067200,
  "event_data": {
    "repository": {
      "name": "myapp",
      "repo_full_name": "myorg/myapp"
    },
    "resources": [{
      "digest": "sha256:abc123...",
      "tag": "v1.2.0"
    }]
  }
}
```

### Webhook Handler

```typescript
interface WebhookHandler {
  handleRegistryPush(
    integrationId: UUID,
    payload: RegistryPushPayload
  ): Promise<WebhookResponse>;

  handleCIPipeline(
    integrationId: UUID,
    payload: CIPipelinePayload
  ): Promise<WebhookResponse>;
}

class RegistryWebhookHandler implements WebhookHandler {
  async handleRegistryPush(
    integrationId: UUID,
    payload: RegistryPushPayload
  ): Promise<WebhookResponse> {
    // Normalize payload from different registries
    const normalized = this.normalizePayload(payload);

    // Find matching component
    const component = await this.componentRegistry.findByRepository(
      normalized.repository
    );

    if (!component) {
      return {
        success: true,
        action: "ignored",
        reason: "No matching component"
      };
    }

    // Update version map
    await this.versionManager.addVersion({
      componentId: component.id,
      tag: normalized.tag,
      digest: normalized.digest,
      channel: this.determineChannel(normalized.tag)
    });

    // Check for auto-release triggers
    const triggers = await this.getTriggers(component.id, normalized.tag);
    for (const trigger of triggers) {
      await this.triggerRelease(trigger, normalized);
    }

    return {
      success: true,
      action: "processed",
      componentId: component.id,
      versionsAdded: 1,
      triggersActivated: triggers.length
    };
  }

  async handleCIPipeline(
    integrationId: UUID,
    payload: CIPipelinePayload
  ): Promise<WebhookResponse> {
    // Pipeline handling is omitted from this example
    throw new Error("Not implemented");
  }

  private normalizePayload(payload: any): NormalizedPushEvent {
    // Detect registry type and normalize
    if (payload.push_data) {
      // Docker Hub format
      return {
        repository: `${payload.repository.namespace}/${payload.repository.name}`,
        tag: payload.push_data.tag,
        digest: payload.push_data.images[0],
        pushedAt: new Date(payload.push_data.pushed_at * 1000)
      };
    }

    if (payload.type === "PUSH_ARTIFACT") {
      // Harbor format
      return {
        repository: payload.event_data.repository.repo_full_name,
        tag: payload.event_data.resources[0].tag,
        digest: payload.event_data.resources[0].digest,
        pushedAt: new Date(payload.occur_at * 1000)
      };
    }

    // Generic format
    return payload as NormalizedPushEvent;
  }
}
```

### Webhook Authentication

```typescript
interface WebhookAuth {
  // Signature validation
  validateSignature(
    payload: Buffer,
    signature: string,
    secret: string,
    algorithm: SignatureAlgorithm
  ): boolean;

  // Token validation
  validateToken(
    token: string,
    expectedToken: string
  ): boolean;
}

type SignatureAlgorithm = "hmac-sha256" | "hmac-sha1";

class WebhookAuthenticator implements WebhookAuth {
  validateSignature(
    payload: Buffer,
    signature: string,
    secret: string,
    algorithm: SignatureAlgorithm
  ): boolean {
    const algo = algorithm === "hmac-sha256" ? "sha256" : "sha1";
    const expected = crypto
      .createHmac(algo, secret)
      .update(payload)
      .digest("hex");

    // Constant-time comparison; timingSafeEqual throws on length
    // mismatch, so reject unequal lengths up front.
    const received = Buffer.from(signature);
    const computed = Buffer.from(expected);
    if (received.length !== computed.length) {
      return false;
    }
    return crypto.timingSafeEqual(received, computed);
  }

  validateToken(token: string, expectedToken: string): boolean {
    // Same constant-time discipline as signature validation
    const a = Buffer.from(token);
    const b = Buffer.from(expectedToken);
    return a.length === b.length && crypto.timingSafeEqual(a, b);
  }
}
```

### Webhook Configuration

```typescript
interface WebhookConfig {
  id: UUID;
  integrationId: UUID;
  type: WebhookType;

  // Security
  secretRef: string;        // Vault reference for signature secret
  signatureHeader?: string; // Header containing signature
  signatureAlgorithm?: SignatureAlgorithm;

  // Processing
  enabled: boolean;
  filters?: WebhookFilter[]; // Filter events

  // Retry
  retryPolicy: RetryPolicy;
}

interface WebhookFilter {
  field: string; // JSONPath to field
  operator: "equals" | "contains" | "matches";
  value: string;
}

// Example: Only process tags matching semver
const semverFilter: WebhookFilter = {
  field: "$.tag",
  operator: "matches",
  value: "^v\\d+\\.\\d+\\.\\d+$"
};
```
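
As a sketch of how such filters could be evaluated (the function is hypothetical and supports only simple `$.a.b` dotted paths, not full JSONPath):

```typescript
// Repeated here so the sketch is self-contained.
interface WebhookFilter {
  field: string; // JSONPath to field, e.g. "$.tag"
  operator: "equals" | "contains" | "matches";
  value: string;
}

// Hypothetical evaluator; handles dotted paths only, not full JSONPath.
function matchesFilter(payload: Record<string, unknown>, filter: WebhookFilter): boolean {
  // "$.data.environment.name" -> ["data", "environment", "name"]
  const path = filter.field.replace(/^\$\.?/, "").split(".").filter(Boolean);

  let current: unknown = payload;
  for (const segment of path) {
    if (current === null || typeof current !== "object") return false;
    current = (current as Record<string, unknown>)[segment];
  }
  if (typeof current !== "string") return false;

  switch (filter.operator) {
    case "equals":
      return current === filter.value;
    case "contains":
      return current.includes(filter.value);
    case "matches":
      return new RegExp(filter.value).test(current);
    default:
      return false;
  }
}
```

With the `semverFilter` above, a payload carrying `tag: "v1.2.0"` passes while `tag: "latest"` is dropped before any handler runs.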
|
||||
|
||||
## Outbound Webhooks
|
||||
|
||||
### Event Types
|
||||
|
||||
| Event | Description | Payload |
|
||||
|-------|-------------|---------|
|
||||
| `release.created` | New release created | Release details |
|
||||
| `promotion.requested` | Promotion requested | Promotion details |
|
||||
| `promotion.approved` | Promotion approved | Approval details |
|
||||
| `promotion.rejected` | Promotion rejected | Rejection details |
|
||||
| `deployment.started` | Deployment started | Job details |
|
||||
| `deployment.completed` | Deployment completed | Job details, results |
|
||||
| `deployment.failed` | Deployment failed | Job details, error |
|
||||
| `rollback.initiated` | Rollback initiated | Rollback details |
|
||||
|
||||
### Webhook Subscription
|
||||
|
||||
```typescript
|
||||
interface WebhookSubscription {
|
||||
id: UUID;
|
||||
tenantId: UUID;
|
||||
name: string;
|
||||
|
||||
// Target
|
||||
url: string;
|
||||
method: "POST" | "PUT";
|
||||
headers?: Record<string, string>;
|
||||
|
||||
// Authentication
|
||||
authType: "none" | "basic" | "bearer" | "signature";
|
||||
credentialRef?: string;
|
||||
signatureSecret?: string;
|
||||
|
||||
// Events
|
||||
events: string[]; // Event types to subscribe
|
||||
filters?: EventFilter[]; // Filter events
|
||||
|
||||
// Delivery
|
||||
retryPolicy: RetryPolicy;
|
||||
timeout: number;
|
||||
|
||||
// Status
|
||||
enabled: boolean;
|
||||
lastDelivery?: DateTime;
|
||||
lastStatus?: number;
|
||||
}
|
||||
|
||||
interface EventFilter {
|
||||
field: string;
|
||||
operator: string;
|
||||
value: any;
|
||||
}
|
||||
```
|
||||
|
||||
### Webhook Delivery
|
||||
|
||||
```typescript
|
||||
interface WebhookPayload {
|
||||
id: string; // Delivery ID
|
||||
timestamp: string; // ISO-8601
|
||||
event: string; // Event type
|
||||
tenantId: string;
|
||||
data: Record<string, any>; // Event-specific data
|
||||
}
|
||||
|
||||
class WebhookDeliveryService {
|
||||
async deliver(
|
||||
subscription: WebhookSubscription,
|
||||
event: DomainEvent
|
||||
): Promise<DeliveryResult> {
|
||||
const payload: WebhookPayload = {
|
||||
id: uuidv4(),
|
||||
      timestamp: new Date().toISOString(),
      event: event.type,
      tenantId: subscription.tenantId,
      data: this.buildEventData(event)
    };

    const headers = this.buildHeaders(subscription, payload);
    const body = JSON.stringify(payload);

    // Attempt delivery with retries
    return this.deliverWithRetry(subscription, headers, body);
  }

  private buildHeaders(
    subscription: WebhookSubscription,
    payload: WebhookPayload
  ): Record<string, string> {
    const headers: Record<string, string> = {
      "Content-Type": "application/json",
      "X-Stella-Event": payload.event,
      "X-Stella-Delivery": payload.id,
      "X-Stella-Timestamp": payload.timestamp,
      ...subscription.headers
    };

    // Add signature if configured
    if (subscription.authType === "signature") {
      const signature = this.computeSignature(
        JSON.stringify(payload),
        subscription.signatureSecret!
      );
      headers["X-Stella-Signature"] = signature;
    }

    return headers;
  }

  private async deliverWithRetry(
    subscription: WebhookSubscription,
    headers: Record<string, string>,
    body: string
  ): Promise<DeliveryResult> {
    const policy = subscription.retryPolicy;
    let lastError: Error | undefined;

    for (let attempt = 0; attempt <= policy.maxRetries; attempt++) {
      try {
        const response = await fetch(subscription.url, {
          method: subscription.method,
          headers,
          body,
          signal: AbortSignal.timeout(subscription.timeout)
        });

        // Record delivery
        await this.recordDelivery(subscription.id, {
          attempt,
          statusCode: response.status,
          success: response.ok
        });

        if (response.ok) {
          return { success: true, statusCode: response.status, attempts: attempt + 1 };
        }

        // Non-retryable status codes
        if (response.status >= 400 && response.status < 500) {
          return {
            success: false,
            statusCode: response.status,
            attempts: attempt + 1,
            error: `Client error: ${response.status}`
          };
        }

        lastError = new Error(`Server error: ${response.status}`);
      } catch (error) {
        lastError = error as Error;
      }

      // Wait before retry
      if (attempt < policy.maxRetries) {
        const delay = this.calculateDelay(policy, attempt);
        await sleep(delay);
      }
    }

    return {
      success: false,
      attempts: policy.maxRetries + 1,
      error: lastError?.message
    };
  }
}
```
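The `calculateDelay` helper referenced above is not shown. A minimal sketch, assuming exponential backoff in the shape of the `retryPolicy` object from the Register Subscription example (`backoffType`, `backoffSeconds`); the 5-minute cap is illustrative, not part of the spec:

```typescript
// Sketch of the calculateDelay helper referenced in deliverWithRetry.
// The retryPolicy shape mirrors the subscription API; the cap is an assumption.
interface RetryPolicy {
  maxRetries: number;
  backoffType: "fixed" | "exponential";
  backoffSeconds: number;
}

function calculateDelay(policy: RetryPolicy, attempt: number): number {
  const baseMs = policy.backoffSeconds * 1000;
  const delayMs =
    policy.backoffType === "exponential"
      ? baseMs * 2 ** attempt // 10s, 20s, 40s, ... for backoffSeconds = 10
      : baseMs;               // constant delay between attempts
  return Math.min(delayMs, 5 * 60 * 1000); // cap at 5 minutes (illustrative)
}
```

With the example policy (`backoffSeconds: 10`, exponential), retries wait 10s, 20s, then 40s.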

### Delivery Logging

```typescript
interface WebhookDeliveryLog {
  id: UUID;
  subscriptionId: UUID;
  deliveryId: string;

  // Request
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string;

  // Response
  statusCode?: number;
  responseBody?: string;
  responseTime: number;

  // Result
  success: boolean;
  attempt: number;
  error?: string;

  // Timing
  createdAt: DateTime;
}
```

## Webhook API

### Register Subscription

```http
POST /api/v1/webhook-subscriptions
Content-Type: application/json

{
  "name": "Deployment Notifications",
  "url": "https://api.example.com/webhooks/stella",
  "method": "POST",
  "authType": "signature",
  "signatureSecret": "my-secret-key",
  "events": [
    "deployment.started",
    "deployment.completed",
    "deployment.failed"
  ],
  "filters": [
    {
      "field": "data.environment.name",
      "operator": "equals",
      "value": "production"
    }
  ],
  "retryPolicy": {
    "maxRetries": 3,
    "backoffType": "exponential",
    "backoffSeconds": 10
  },
  "timeout": 30000
}
```
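The `filters` array above narrows which events a subscription receives. A minimal sketch of how such a filter could be evaluated against an event payload; the dot-path helper and the operator set beyond `equals` are assumptions, not the shipped implementation:

```typescript
// Hypothetical filter evaluation for the `filters` array shown above.
// Operators other than "equals" are assumed for illustration.
interface WebhookFilter {
  field: string; // dot path into the payload, e.g. "data.environment.name"
  operator: "equals" | "not_equals" | "contains";
  value: string;
}

// Walk a dot path ("data.environment.name") into a nested object.
function getPath(obj: unknown, path: string): unknown {
  return path.split(".").reduce<unknown>(
    (cur, key) =>
      cur != null && typeof cur === "object"
        ? (cur as Record<string, unknown>)[key]
        : undefined,
    obj
  );
}

// An event is delivered only if every filter matches.
function matchesFilters(payload: unknown, filters: WebhookFilter[]): boolean {
  return filters.every(f => {
    const actual = getPath(payload, f.field);
    if (f.operator === "equals") return actual === f.value;
    if (f.operator === "not_equals") return actual !== f.value;
    return typeof actual === "string" && actual.includes(f.value);
  });
}
```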

### Test Subscription

```http
POST /api/v1/webhook-subscriptions/{id}/test
Content-Type: application/json

{
  "event": "deployment.completed"
}
```

Response:
```json
{
  "success": true,
  "data": {
    "deliveryId": "d1234567-...",
    "statusCode": 200,
    "responseTime": 245,
    "response": "OK"
  }
}
```

### List Deliveries

```http
GET /api/v1/webhook-subscriptions/{id}/deliveries?page=1&pageSize=20
```

## Event Payloads

### deployment.completed

```json
{
  "id": "delivery-uuid",
  "timestamp": "2026-01-09T10:30:00Z",
  "event": "deployment.completed",
  "tenantId": "tenant-uuid",
  "data": {
    "deploymentJob": {
      "id": "job-uuid",
      "status": "completed"
    },
    "release": {
      "id": "release-uuid",
      "name": "myapp-v1.2.0",
      "components": [
        {
          "name": "api",
          "digest": "sha256:abc123..."
        }
      ]
    },
    "environment": {
      "id": "env-uuid",
      "name": "production"
    },
    "promotion": {
      "id": "promo-uuid",
      "requestedBy": "user@example.com"
    },
    "targets": [
      {
        "id": "target-uuid",
        "name": "prod-host-1",
        "status": "succeeded"
      }
    ],
    "timing": {
      "startedAt": "2026-01-09T10:25:00Z",
      "completedAt": "2026-01-09T10:30:00Z",
      "durationSeconds": 300
    }
  }
}
```

### promotion.requested

```json
{
  "id": "delivery-uuid",
  "timestamp": "2026-01-09T10:00:00Z",
  "event": "promotion.requested",
  "tenantId": "tenant-uuid",
  "data": {
    "promotion": {
      "id": "promo-uuid",
      "status": "pending_approval"
    },
    "release": {
      "id": "release-uuid",
      "name": "myapp-v1.2.0"
    },
    "sourceEnvironment": {
      "id": "staging-uuid",
      "name": "staging"
    },
    "targetEnvironment": {
      "id": "prod-uuid",
      "name": "production"
    },
    "requestedBy": {
      "id": "user-uuid",
      "email": "user@example.com",
      "name": "John Doe"
    },
    "approvalRequired": {
      "count": 2,
      "currentApprovals": 0
    }
  }
}
```

## Security Considerations

### Signature Verification

Receivers should verify webhook signatures:

```python
import hmac
import hashlib

def verify_signature(payload: bytes, signature: str, secret: str) -> bool:
    expected = hmac.new(
        secret.encode(),
        payload,
        hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(signature, expected)

# In webhook handler
@app.route("/webhooks/stella", methods=["POST"])
def handle_webhook():
    signature = request.headers.get("X-Stella-Signature")
    # Reject both missing and invalid signatures (a missing header would
    # otherwise raise a TypeError inside compare_digest).
    if not signature or not verify_signature(request.data, signature, WEBHOOK_SECRET):
        return "Invalid signature", 401

    payload = request.json
    # Process event...
```
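On the sending side, the delivery service's `computeSignature` helper is referenced but not shown. A sketch of what it could look like, assuming hex-encoded HMAC-SHA256 over the exact JSON body, consistent with the Python verifier above; the real scheme may differ (e.g. a versioned prefix):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed sender-side counterpart to the receiver check: HMAC-SHA256 over
// the serialized body, hex-encoded. Not the shipped implementation.
function computeSignature(body: string, secret: string): string {
  return createHmac("sha256", secret).update(body, "utf8").digest("hex");
}

// Constant-time comparison, mirroring hmac.compare_digest on the receiver.
function signaturesMatch(a: string, b: string): boolean {
  const bufA = Buffer.from(a, "utf8");
  const bufB = Buffer.from(b, "utf8");
  return bufA.length === bufB.length && timingSafeEqual(bufA, bufB);
}
```

Note that the signature must be computed over the exact bytes sent on the wire; re-serializing the JSON on the receiver can change key order or whitespace and break verification.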

### IP Allowlisting

Configure firewall rules to only accept webhooks from Stella IP ranges:
- Document IP ranges in deployment configuration
- Use VPN or private networking where possible

### Replay Protection

Check delivery timestamps to prevent replay attacks:

```python
from datetime import datetime, timedelta

MAX_TIMESTAMP_AGE = timedelta(minutes=5)

def check_timestamp(timestamp_str: str) -> bool:
    timestamp = datetime.fromisoformat(timestamp_str.replace("Z", "+00:00"))
    now = datetime.now(timestamp.tzinfo)
    return abs(now - timestamp) < MAX_TIMESTAMP_AGE
```

## References

- [Integrations Overview](overview.md)
- [Connectors](connectors.md)
- [CI/CD Integration](ci-cd.md)

---

**New file:** `docs/modules/release-orchestrator/modules/agents.md` (597 lines)

# AGENTS: Deployment Agents

**Purpose**: Lightweight deployment agents for target execution.

## Agent Types

| Agent Type | Transport | Target Types |
|------------|-----------|--------------|
| `agent-docker` | gRPC | Docker hosts |
| `agent-compose` | gRPC | Docker Compose hosts |
| `agent-ssh` | SSH | Linux remote hosts |
| `agent-winrm` | WinRM | Windows remote hosts |
| `agent-ecs` | AWS API | AWS ECS services |
| `agent-nomad` | Nomad API | HashiCorp Nomad jobs |

## Modules

### Module: `agent-core`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Shared agent runtime; task execution framework |
| **Protocol** | gRPC for communication with Stella Core |
| **Security** | mTLS authentication; short-lived JWT for tasks |

**Agent Lifecycle**:
1. Agent starts with registration token
2. Agent registers with capabilities and labels
3. Agent sends heartbeats (default: 30s interval)
4. Agent receives tasks from Stella Core
5. Agent reports task completion/failure

**Agent Task Protocol**:
```typescript
// Task assignment (Core → Agent)
interface AgentTask {
  id: UUID;
  type: TaskType;
  targetId: UUID;
  payload: TaskPayload;
  credentials: EncryptedCredentials;
  timeout: number;
  priority: TaskPriority;
  idempotencyKey: string;
  assignedAt: DateTime;
  expiresAt: DateTime;
}

type TaskType =
  | "deploy"
  | "rollback"
  | "health-check"
  | "inspect"
  | "execute-command"
  | "upload-files"
  | "write-sticker"
  | "read-sticker";

interface DeployTaskPayload {
  image: string;
  digest: string;
  config: DeployConfig;
  artifacts: ArtifactReference[];
  previousDigest?: string;
  hooks: {
    preDeploy?: HookConfig;
    postDeploy?: HookConfig;
  };
}

// Task result (Agent → Core)
interface TaskResult {
  taskId: UUID;
  success: boolean;
  startedAt: DateTime;
  completedAt: DateTime;

  // Success details
  outputs?: Record<string, any>;
  artifacts?: ArtifactReference[];

  // Failure details
  error?: string;
  errorType?: string;
  retriable?: boolean;

  // Logs
  logs: string;

  // Metrics
  metrics: {
    pullDurationMs?: number;
    deployDurationMs?: number;
    healthCheckDurationMs?: number;
  };
}
```
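Each task carries an `idempotencyKey`, which lets an agent tolerate redelivered assignments. A minimal sketch of agent-side deduplication keyed on that field; the cache shape and the lack of eviction are assumptions for illustration, not the agent-core implementation:

```typescript
// Hypothetical dedupe for redelivered tasks: a repeated idempotencyKey
// reuses the in-flight or completed result instead of re-running the
// side effect. Eviction/persistence is omitted for brevity.
type TaskHandler = (taskId: string) => Promise<string>;

class IdempotentTaskRunner {
  private results = new Map<string, Promise<string>>();

  run(idempotencyKey: string, taskId: string, handler: TaskHandler): Promise<string> {
    const existing = this.results.get(idempotencyKey);
    if (existing) return existing; // duplicate delivery: reuse prior result
    const result = handler(taskId);
    this.results.set(idempotencyKey, result);
    return result;
  }
}
```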

---

### Module: `agent-docker`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Docker container deployment |
| **Dependencies** | Docker Engine API |
| **Capabilities** | `docker.deploy`, `docker.rollback`, `docker.inspect` |

**Docker Agent Implementation**:
```typescript
class DockerAgent implements TargetExecutor {
  private docker: Docker;

  async deploy(task: DeployTaskPayload): Promise<DeployResult> {
    const { image, digest, config, previousDigest } = task;
    const containerName = config.containerName;

    // 1. Pull image and verify digest
    this.log(`Pulling image ${image}@${digest}`);
    await this.docker.pull(image, { digest });

    const pulledDigest = await this.getImageDigest(image);
    if (pulledDigest !== digest) {
      throw new DigestMismatchError(
        `Expected digest ${digest}, got ${pulledDigest}. Possible tampering detected.`
      );
    }

    // 2. Run pre-deploy hook
    if (task.hooks?.preDeploy) {
      await this.runHook(task.hooks.preDeploy, "pre-deploy");
    }

    // 3. Stop and rename existing container
    const existingContainer = await this.findContainer(containerName);
    if (existingContainer) {
      this.log(`Stopping existing container ${containerName}`);
      await existingContainer.stop({ t: 10 });
      await existingContainer.rename(`${containerName}-previous-${Date.now()}`);
    }

    // 4. Create new container
    this.log(`Creating container ${containerName} from ${image}@${digest}`);
    const container = await this.docker.createContainer({
      name: containerName,
      Image: `${image}@${digest}`, // Always use digest, not tag
      Env: this.buildEnvVars(config.environment),
      HostConfig: {
        PortBindings: this.buildPortBindings(config.ports),
        Binds: this.buildBindMounts(config.volumes),
        RestartPolicy: { Name: config.restartPolicy || "unless-stopped" },
        Memory: config.memoryLimit,
        CpuQuota: config.cpuLimit,
      },
      Labels: {
        "stella.release.id": config.releaseId,
        "stella.release.name": config.releaseName,
        "stella.digest": digest,
        "stella.deployed.at": new Date().toISOString(),
      },
    });

    // 5. Start container
    this.log(`Starting container ${containerName}`);
    await container.start();

    // 6. Wait for container to be healthy
    if (config.healthCheck) {
      this.log(`Waiting for container health check`);
      const healthy = await this.waitForHealthy(container, config.healthCheck.timeout);
      if (!healthy) {
        await this.rollbackContainer(containerName, existingContainer);
        throw new HealthCheckFailedError(`Container ${containerName} failed health check`);
      }
    }

    // 7. Run post-deploy hook
    if (task.hooks?.postDeploy) {
      await this.runHook(task.hooks.postDeploy, "post-deploy");
    }

    // 8. Cleanup previous container
    if (existingContainer && config.cleanupPrevious !== false) {
      this.log(`Removing previous container`);
      await existingContainer.remove({ force: true });
    }

    return {
      success: true,
      containerId: container.id,
      previousDigest: previousDigest,
    };
  }

  async rollback(task: RollbackTaskPayload): Promise<DeployResult> {
    const { containerName, targetDigest } = task;

    if (targetDigest) {
      // Deploy specific digest
      return this.deploy({ ...task, digest: targetDigest });
    }

    // Find and restore previous container
    const previousContainer = await this.findContainer(`${containerName}-previous-*`);
    if (!previousContainer) {
      throw new RollbackError(`No previous container found for ${containerName}`);
    }

    const currentContainer = await this.findContainer(containerName);
    if (currentContainer) {
      await currentContainer.stop({ t: 10 });
      await currentContainer.rename(`${containerName}-failed-${Date.now()}`);
    }

    await previousContainer.rename(containerName);
    await previousContainer.start();

    return { success: true, containerId: previousContainer.id };
  }

  async writeSticker(sticker: VersionSticker): Promise<void> {
    const stickerPath = this.config.stickerPath || "/var/stella/version.json";
    const stickerContent = JSON.stringify(sticker, null, 2);

    if (this.config.stickerLocation === "volume") {
      await this.docker.run("alpine", [
        "sh", "-c",
        `echo '${stickerContent}' > ${stickerPath}`
      ], {
        HostConfig: { Binds: [`${this.config.stickerVolume}:/var/stella`] }
      });
    } else {
      fs.writeFileSync(stickerPath, stickerContent);
    }
  }
}
```
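The `waitForHealthy` helper used in step 6 is not shown. A minimal sketch of the polling loop it implies; the injected inspect function, the three-state health model, and the poll interval are assumptions, not the shipped code:

```typescript
// Hypothetical shape of the waitForHealthy helper from step 6. The inspect
// function is injected so the polling logic stays independent of the
// Docker client; states mirror Docker's container health model.
type HealthState = "starting" | "healthy" | "unhealthy";

async function waitForHealthy(
  inspect: () => Promise<HealthState>,
  timeoutMs: number,
  pollIntervalMs = 1000
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const state = await inspect();
    if (state === "healthy") return true;
    if (state === "unhealthy") return false; // fail fast, no point waiting
    await new Promise(resolve => setTimeout(resolve, pollIntervalMs));
  }
  return false; // still "starting" at the deadline
}
```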

---

### Module: `agent-compose`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Docker Compose stack deployment |
| **Dependencies** | Docker Compose CLI |
| **Capabilities** | `compose.deploy`, `compose.rollback`, `compose.inspect` |

**Compose Agent Implementation**:
```typescript
class ComposeAgent implements TargetExecutor {
  async deploy(task: DeployTaskPayload): Promise<DeployResult> {
    const { artifacts, config } = task;
    const deployDir = config.deploymentDirectory;

    // 1. Write compose lock file
    const composeLock = artifacts.find(a => a.type === "compose_lock");
    const composeContent = await this.fetchArtifact(composeLock);
    const composePath = path.join(deployDir, "compose.stella.lock.yml");
    await fs.writeFile(composePath, composeContent);

    // 2. Run pre-deploy hook
    if (task.hooks?.preDeploy) {
      await this.runHook(task.hooks.preDeploy, deployDir);
    }

    // 3. Pull images
    this.log("Pulling images...");
    await this.runCompose(deployDir, ["pull"]);

    // 4. Verify digests
    await this.verifyDigests(composePath, config.expectedDigests);

    // 5. Deploy
    this.log("Deploying services...");
    await this.runCompose(deployDir, ["up", "-d", "--remove-orphans", "--force-recreate"]);

    // 6. Wait for services to be healthy
    if (config.healthCheck) {
      const healthy = await this.waitForServicesHealthy(deployDir, config.healthCheck.timeout);
      if (!healthy) {
        await this.rollbackToBackup(deployDir);
        throw new HealthCheckFailedError("Services failed health check");
      }
    }

    // 7. Run post-deploy hook
    if (task.hooks?.postDeploy) {
      await this.runHook(task.hooks.postDeploy, deployDir);
    }

    // 8. Write version sticker
    await this.writeSticker(config.sticker, deployDir);

    return { success: true };
  }

  private async verifyDigests(
    composePath: string,
    expectedDigests: Record<string, string>
  ): Promise<void> {
    const composeContent = yaml.parse(await fs.readFile(composePath, "utf-8"));

    for (const [service, expectedDigest] of Object.entries(expectedDigests)) {
      const serviceConfig = composeContent.services[service];
      if (!serviceConfig) {
        throw new Error(`Service ${service} not found in compose file`);
      }

      const image = serviceConfig.image;
      if (!image.includes("@sha256:")) {
        throw new Error(`Service ${service} image not pinned to digest: ${image}`);
      }

      const actualDigest = image.split("@")[1];
      if (actualDigest !== expectedDigest) {
        throw new DigestMismatchError(
          `Service ${service}: expected ${expectedDigest}, got ${actualDigest}`
        );
      }
    }
  }
}
```

---

### Module: `agent-ssh`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | SSH remote execution (agentless) |
| **Dependencies** | SSH client library |
| **Capabilities** | `ssh.deploy`, `ssh.execute`, `ssh.upload` |

**SSH Remote Executor**:
```typescript
class SSHRemoteExecutor implements TargetExecutor {
  async connect(config: SSHConnectionConfig): Promise<void> {
    const privateKey = await this.secrets.getSecret(config.privateKeyRef);

    this.ssh = new SSHClient();
    await this.ssh.connect({
      host: config.host,
      port: config.port || 22,
      username: config.username,
      privateKey: privateKey.value,
      readyTimeout: config.connectionTimeout || 30000,
    });
  }

  async deploy(task: DeployTaskPayload): Promise<DeployResult> {
    const { artifacts, config } = task;
    const deployDir = config.deploymentDirectory;

    try {
      // 1. Ensure deployment directory exists
      await this.exec(`mkdir -p ${deployDir}`);
      await this.exec(`mkdir -p ${deployDir}/.stella-backup`);

      // 2. Backup current deployment
      await this.exec(`cp -r ${deployDir}/* ${deployDir}/.stella-backup/ 2>/dev/null || true`);

      // 3. Upload artifacts
      for (const artifact of artifacts) {
        const content = await this.fetchArtifact(artifact);
        const remotePath = path.join(deployDir, artifact.name);
        await this.uploadFile(content, remotePath);
      }

      // 4. Run pre-deploy hook
      if (task.hooks?.preDeploy) {
        await this.runRemoteHook(task.hooks.preDeploy, deployDir);
      }

      // 5. Execute deployment script
      const deployScript = artifacts.find(a => a.type === "deploy_script");
      if (deployScript) {
        const scriptPath = path.join(deployDir, deployScript.name);
        await this.exec(`chmod +x ${scriptPath}`);
        const result = await this.exec(scriptPath, { cwd: deployDir, timeout: config.deploymentTimeout });
        if (result.exitCode !== 0) {
          throw new DeploymentError(`Deploy script failed: ${result.stderr}`);
        }
      }

      // 6. Run post-deploy hook
      if (task.hooks?.postDeploy) {
        await this.runRemoteHook(task.hooks.postDeploy, deployDir);
      }

      // 7. Health check
      if (config.healthCheck) {
        const healthy = await this.runHealthCheck(config.healthCheck);
        if (!healthy) {
          await this.rollback(task);
          throw new HealthCheckFailedError("Health check failed");
        }
      }

      // 8. Write version sticker
      await this.writeSticker(config.sticker, deployDir);

      // 9. Cleanup backup
      await this.exec(`rm -rf ${deployDir}/.stella-backup`);

      return { success: true };
    } finally {
      this.ssh.end();
    }
  }
}
```

---

### Module: `agent-winrm`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | WinRM remote execution (agentless) |
| **Dependencies** | WinRM client library |
| **Capabilities** | `winrm.deploy`, `winrm.execute`, `winrm.upload` |
| **Authentication** | NTLM, Kerberos, Basic |

---

### Module: `agent-ecs`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | AWS ECS service deployment |
| **Dependencies** | AWS SDK |
| **Capabilities** | `ecs.deploy`, `ecs.rollback`, `ecs.inspect` |

---

### Module: `agent-nomad`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | HashiCorp Nomad job deployment |
| **Dependencies** | Nomad API client |
| **Capabilities** | `nomad.deploy`, `nomad.rollback`, `nomad.inspect` |

---

## Agent Security Model

### Registration Flow

1. Admin generates a registration token (one-time use):
   `POST /api/v1/admin/agent-tokens` → `{ token: "reg_xxx", expiresAt: "..." }`
2. Agent starts with the registration token:
   `./stella-agent --register --token=reg_xxx`
3. Agent requests an mTLS certificate:
   `POST /api/v1/agents/register` with header `X-Registration-Token: reg_xxx` and body `{ name, version, capabilities, csr }` → `{ agentId, certificate, caCertificate }`
4. Agent establishes an mTLS connection, using the issued certificate for all subsequent requests.
5. Agent requests a short-lived JWT for task execution:
   `POST /api/v1/agents/token` (over mTLS) → `{ token, expiresIn: 3600 }` (1 hour)
6. Agent refreshes the token before expiration; token refresh happens only over the mTLS connection.

### Communication Security

All Agent ↔ Stella Core traffic is protected in layers:

- **mTLS (mutual TLS)**: agent certificate signed by the Stella CA; server certificate verified by the agent; TLS 1.3 only; perfect forward secrecy.
- **Encrypted payloads**: task payloads encrypted with an agent-specific key; logs encrypted in transit.
- **Heartbeat + capability refresh** (Agent → Core): sent every 30 seconds; signed with the agent key.
- **Task assignment** (Core → Agent): contains short-lived credentials scoped to the specific target; expires after the task timeout.

---

## Database Schema

```sql
-- Agents
CREATE TABLE release.agents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    version VARCHAR(50) NOT NULL,
    capabilities JSONB NOT NULL DEFAULT '[]',
    labels JSONB NOT NULL DEFAULT '{}',
    status VARCHAR(50) NOT NULL DEFAULT 'offline' CHECK (status IN (
        'online', 'offline', 'degraded'
    )),
    last_heartbeat TIMESTAMPTZ,
    resource_usage JSONB,
    certificate_fingerprint VARCHAR(64),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, name)
);

CREATE INDEX idx_agents_tenant ON release.agents(tenant_id);
CREATE INDEX idx_agents_status ON release.agents(status);
CREATE INDEX idx_agents_capabilities ON release.agents USING GIN (capabilities);
```
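The `status` column distinguishes `online`, `degraded`, and `offline` agents. A sketch of how status could be derived from `last_heartbeat`, given the 30-second heartbeat interval; the 2x/5x thresholds are assumptions for illustration, not the shipped policy:

```typescript
// Hypothetical liveness derivation from last_heartbeat. Thresholds are
// illustrative: within 2 missed intervals → online, within 5 → degraded.
type AgentStatus = "online" | "degraded" | "offline";

const HEARTBEAT_INTERVAL_MS = 30_000; // matches the 30s interval above

function deriveStatus(lastHeartbeat: Date | null, now: Date): AgentStatus {
  if (!lastHeartbeat) return "offline"; // never checked in
  const age = now.getTime() - lastHeartbeat.getTime();
  if (age <= 2 * HEARTBEAT_INTERVAL_MS) return "online";
  if (age <= 5 * HEARTBEAT_INTERVAL_MS) return "degraded";
  return "offline";
}
```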

---

## API Endpoints

```yaml
# Agent Registration
POST /api/v1/agents/register
  Headers: X-Registration-Token: {token}
  Body: { name, version, capabilities, csr }
  Response: { agentId, certificate, caCertificate }

# Agent Management
GET /api/v1/agents
  Query: ?status={online|offline|degraded}&capability={type}
  Response: Agent[]

GET /api/v1/agents/{id}
  Response: Agent

PUT /api/v1/agents/{id}
  Body: { labels?, capabilities? }
  Response: Agent

DELETE /api/v1/agents/{id}
  Response: { deleted: true }

# Agent Communication
POST /api/v1/agents/{id}/heartbeat
  Body: { status, resourceUsage, capabilities }
  Response: { tasks: AgentTask[] }

POST /api/v1/agents/{id}/tasks/{taskId}/complete
  Body: { success, result, logs }
  Response: { acknowledged: true }

# WebSocket for real-time task stream
WS /api/v1/agents/{id}/task-stream
  Messages:
    - { type: "task_assigned", task: AgentTask }
    - { type: "task_cancelled", taskId }
```

---

## References

- [Module Overview](overview.md)
- [Deploy Orchestrator](deploy-orchestrator.md)
- [Agent Security](../security/agent-security.md)
- [API Documentation](../api/agents.md)

---

**New file:** `docs/modules/release-orchestrator/modules/deploy-orchestrator.md` (477 lines)

# DEPLOY: Deployment Execution

**Purpose**: Orchestrate deployment jobs, execute on targets, manage rollbacks, and generate artifacts.

## Modules

### Module: `deploy-orchestrator`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Deployment job coordination; strategy execution |
| **Dependencies** | `target-executor`, `artifact-generator`, `agent-manager` |
| **Data Entities** | `DeploymentJob`, `DeploymentTask` |
| **Events Produced** | `deployment.started`, `deployment.task_started`, `deployment.task_completed`, `deployment.completed`, `deployment.failed` |

**Deployment Job Entity**:
```typescript
interface DeploymentJob {
  id: UUID;
  tenantId: UUID;
  promotionId: UUID;
  releaseId: UUID;
  environmentId: UUID;
  status: DeploymentStatus;
  strategy: DeploymentStrategy;
  startedAt: DateTime;
  completedAt: DateTime;
  artifacts: GeneratedArtifact[];
  rollbackOf: UUID | null; // If this is a rollback job
  tasks: DeploymentTask[];
}

type DeploymentStatus =
  | "pending"      // Waiting to start
  | "running"      // Deployment in progress
  | "succeeded"    // All tasks succeeded
  | "failed"       // One or more tasks failed
  | "cancelled"    // User cancelled
  | "rolling_back" // Rollback in progress
  | "rolled_back"; // Rollback complete

interface DeploymentTask {
  id: UUID;
  jobId: UUID;
  targetId: UUID;
  digest: string;
  status: TaskStatus;
  agentId: UUID | null;
  startedAt: DateTime;
  completedAt: DateTime;
  exitCode: number | null;
  logs: string;
  previousDigest: string | null;
  stickerWritten: boolean;
}

type TaskStatus =
  | "pending"
  | "running"
  | "succeeded"
  | "failed"
  | "cancelled"
  | "skipped";
```

---

### Module: `target-executor`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Target-specific deployment logic |
| **Dependencies** | `agent-manager`, `connector-runtime` |
| **Protocol** | gRPC for agents, SSH/WinRM for agentless |

**Executor Types**:

| Type | Transport | Use Case |
|------|-----------|----------|
| `agent-docker` | gRPC | Docker hosts with agent |
| `agent-compose` | gRPC | Compose hosts with agent |
| `ssh-remote` | SSH | Agentless Linux hosts |
| `winrm-remote` | WinRM | Agentless Windows hosts |
| `ecs-api` | AWS API | AWS ECS services |
| `nomad-api` | Nomad API | HashiCorp Nomad jobs |

---

### Module: `runner-executor`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Script/hook execution in sandbox |
| **Dependencies** | `plugin-sandbox` |
| **Supported Scripts** | C# (.csx), Bash, PowerShell |

**Hook Types**:
- `pre-deploy`: Run before deployment starts
- `post-deploy`: Run after deployment succeeds
- `on-failure`: Run when deployment fails
- `on-rollback`: Run during rollback

---

### Module: `artifact-generator`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Generate immutable deployment artifacts |
| **Dependencies** | `release-manager`, `environment-manager` |
| **Data Entities** | `GeneratedArtifact`, `ComposeLock`, `VersionSticker` |

**Generated Artifacts**:

| Artifact Type | Description |
|---------------|-------------|
| `compose_lock` | `compose.stella.lock.yml` - Pinned digests |
| `script` | Compiled deployment script |
| `sticker` | `stella.version.json` - Version marker |
| `evidence` | Decision and execution evidence |
| `config` | Environment-specific config files |

**Compose Lock File Generation**:
```typescript
class ComposeLockGenerator {
  async generate(
    release: Release,
    environment: Environment,
    targets: Target[]
  ): Promise<GeneratedArtifact> {
    const services: Record<string, any> = {};

    for (const component of release.components) {
      services[component.componentName] = {
        // CRITICAL: Always use digest, never tag
        image: `${component.imageRepository}@${component.digest}`,

        // Environment variables
        environment: this.mergeEnvironment(
          environment.config.variables,
          this.buildStellaEnv(release, environment)
        ),

        // Labels for Stella tracking
        labels: {
          "stella.release.id": release.id,
          "stella.release.name": release.name,
          "stella.component.name": component.componentName,
          "stella.component.digest": component.digest,
          "stella.environment": environment.name,
          "stella.deployed.at": new Date().toISOString(),
        },
      };
    }

    const composeLock = {
      version: "3.8",
      services,
      "x-stella": {
        release_id: release.id,
        release_name: release.name,
        environment: environment.name,
        generated_at: new Date().toISOString(),
        inputs_hash: this.computeInputsHash(release, environment),
        components: release.components.map(c => ({
          name: c.componentName,
          digest: c.digest,
          semver: c.semver,
        })),
      },
    };

    const content = yaml.stringify(composeLock);
    const hash = crypto.createHash("sha256").update(content).digest("hex");

    return {
      type: "compose_lock",
      name: "compose.stella.lock.yml",
      content: Buffer.from(content),
      contentHash: `sha256:${hash}`,
    };
  }
}
```
|
||||
|
||||
**Version Sticker Generation**:
```typescript
interface VersionSticker {
  stella_version: "1.0";
  release_id: UUID;
  release_name: string;
  components: Array<{
    name: string;
    digest: string;
    semver: string;
    tag: string;
    image_repository: string;
  }>;
  environment: string;
  environment_id: UUID;
  deployed_at: string;
  deployed_by: UUID;
  promotion_id: UUID;
  workflow_run_id: UUID;
  evidence_packet_id: UUID;
  evidence_packet_hash: string;
  orchestrator_version: string;
  source_ref?: {
    commit_sha: string;
    branch: string;
    repository: string;
  };
}
```

---

### Module: `rollback-manager`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Rollback orchestration; previous-state recovery |
| **Dependencies** | `deploy-orchestrator`, `target-registry` |

**Rollback Strategies**:

| Strategy | Description |
|----------|-------------|
| `to-previous` | Roll back to the last successful deployment |
| `to-release` | Roll back to a specific release ID |
| `to-sticker` | Roll back to the version recorded in the sticker on the target |

**Rollback Flow**:
1. Identify the rollback target (previous release or a specified one)
2. Create a rollback deployment job
3. Execute the deployment with rollback artifacts
4. Update target state and sticker
5. Record rollback evidence

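The target-resolution step of the rollback flow can be sketched as follows; the lookup inputs (`previousReleaseId` from the database, `stickerReleaseId` read from the target's sticker) are illustrative names, not part of the specification:

```typescript
type RollbackStrategy = "to-previous" | "to-release" | "to-sticker";

interface RollbackRequest {
  strategy: RollbackStrategy;
  targetReleaseId?: string; // required for "to-release"
}

// Resolve which release ID a rollback should deploy.
function resolveRollbackTarget(
  req: RollbackRequest,
  previousReleaseId: string | null,
  stickerReleaseId: string | null
): string {
  switch (req.strategy) {
    case "to-previous":
      if (!previousReleaseId) throw new Error("no previous successful deployment");
      return previousReleaseId;
    case "to-release":
      if (!req.targetReleaseId) throw new Error("targetReleaseId required for to-release");
      return req.targetReleaseId;
    case "to-sticker":
      if (!stickerReleaseId) throw new Error("no version sticker found on target");
      return stickerReleaseId;
    default:
      throw new Error(`unknown strategy: ${req.strategy}`);
  }
}
```

Whatever the strategy, the resolved release is then deployed through the normal deployment-job path (steps 2-5 above), so rollbacks produce the same evidence trail as forward deployments.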
---

## Deployment Strategies

### All-at-Once
Deploy to all targets simultaneously.

```typescript
interface AllAtOnceConfig {
  parallelism: number;        // Max concurrent deployments (0 = unlimited)
  continueOnFailure: boolean; // Continue if some targets fail
  failureThreshold: number;   // Max failures before abort
}
```

### Rolling
Deploy to targets sequentially with health checks.

```typescript
interface RollingConfig {
  batchSize: number;   // Targets per batch
  batchDelay: number;  // Seconds between batches
  healthCheckBetweenBatches: boolean;
  rollbackOnFailure: boolean;
  maxUnavailable: number; // Max targets unavailable at once
}
```

### Canary
Deploy to a subset of targets, verify health, then proceed.

```typescript
interface CanaryConfig {
  canaryTargets: number;    // Number or percentage for canary
  canaryDuration: number;   // Seconds to run canary
  healthThreshold: number;  // Required health percentage
  autoPromote: boolean;     // Auto-proceed if healthy
  requireApproval: boolean; // Require manual approval
}
```

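As a sketch of how these fields interact once the canary window elapses (the decision table itself is an assumption consistent with the fields above, not specified behavior):

```typescript
// Decide the next action after the canary duration, from observed health.
function canaryNextAction(
  healthPercent: number,
  cfg: { healthThreshold: number; autoPromote: boolean; requireApproval: boolean }
): "rollback" | "await_approval" | "promote" {
  // Unhealthy canary always rolls back, regardless of approval settings.
  if (healthPercent < cfg.healthThreshold) return "rollback";
  // Healthy but gated: wait for a human.
  if (cfg.requireApproval || !cfg.autoPromote) return "await_approval";
  // Healthy and auto-promote enabled: proceed to the remaining targets.
  return "promote";
}
```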
### Blue-Green
Deploy to the green (B) group, switch traffic over, then retire the blue (A) group.

```typescript
interface BlueGreenConfig {
  targetGroupA: UUID; // Current (blue) target group
  targetGroupB: UUID; // New (green) target group
  trafficShiftType: "instant" | "gradual";
  gradualShiftSteps?: number[]; // e.g., [10, 25, 50, 100]
  rollbackOnHealthFailure: boolean;
}
```

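A gradual shift over the configured steps might look like the following sketch; the `RouterPlugin` interface here is a hypothetical stand-in for the router-plugin contract, which this section does not define:

```typescript
interface RouterPlugin {
  // Shift the given percentage of traffic to the green target group.
  setGreenTrafficPercent(percent: number): Promise<void>;
  // Health of the green group under its current traffic share.
  greenHealthy(): Promise<boolean>;
}

// Walk through the configured shift steps (e.g. [10, 25, 50, 100]),
// returning to 0% green if a health check fails mid-shift.
async function gradualShift(
  router: RouterPlugin,
  steps: number[],
  rollbackOnHealthFailure: boolean
): Promise<boolean> {
  for (const percent of steps) {
    await router.setGreenTrafficPercent(percent);
    if (!(await router.greenHealthy())) {
      if (rollbackOnHealthFailure) await router.setGreenTrafficPercent(0);
      return false;
    }
  }
  return true; // green now serves 100%; blue can be retired
}
```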
---

## Rolling Deployment Algorithm

```python
import time


class RollingDeploymentExecutor:
    def execute(self, job: DeploymentJob, config: RollingConfig) -> DeploymentResult:
        targets = self.get_targets(job.environment_id)
        batches = self.create_batches(targets, config.batch_size)

        deployed_targets = []
        failed_targets = []

        for batch_index, batch in enumerate(batches):
            self.log(f"Starting batch {batch_index + 1} of {len(batches)}")

            # Deploy batch in parallel
            batch_results = self.deploy_batch(job, batch)

            for target, result in batch_results:
                if result.success:
                    deployed_targets.append(target)
                    # Write version sticker
                    self.write_sticker(target, job.release)
                else:
                    failed_targets.append(target)

                    if config.rollback_on_failure:
                        # Roll back all targets deployed so far
                        self.rollback_targets(deployed_targets, job.previous_release)
                        return DeploymentResult(
                            success=False,
                            error=f"Batch {batch_index + 1} failed, rolled back",
                            deployed=deployed_targets,
                            failed=failed_targets,
                            rolled_back=deployed_targets
                        )

            # Health check between batches
            if config.health_check_between_batches and batch_index < len(batches) - 1:
                health_result = self.check_batch_health(deployed_targets[-len(batch):])

                if not health_result.healthy:
                    if config.rollback_on_failure:
                        self.rollback_targets(deployed_targets, job.previous_release)
                    return DeploymentResult(
                        success=False,
                        error=f"Health check failed after batch {batch_index + 1}",
                        deployed=deployed_targets,
                        failed=failed_targets,
                        rolled_back=deployed_targets
                    )

            # Delay between batches
            if config.batch_delay > 0 and batch_index < len(batches) - 1:
                time.sleep(config.batch_delay)

        return DeploymentResult(
            success=len(failed_targets) == 0,
            deployed=deployed_targets,
            failed=failed_targets
        )
```

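The `create_batches` helper used by the algorithm is not spelled out above; a minimal sketch of its batching behavior (written in TypeScript for consistency with the config interfaces in this document) could be:

```typescript
// Split a target list into fixed-size batches; the final batch may be smaller.
function createBatches<T>(targets: T[], batchSize: number): T[][] {
  if (batchSize <= 0) throw new Error("batchSize must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < targets.length; i += batchSize) {
    batches.push(targets.slice(i, i + batchSize));
  }
  return batches;
}
```

With `batchSize: 2` and five targets this yields three batches of sizes 2, 2, and 1, matching the batch count used in the executor's logging.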
---

## Database Schema

```sql
-- Deployment Jobs
CREATE TABLE release.deployment_jobs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    promotion_id UUID NOT NULL REFERENCES release.promotions(id),
    release_id UUID NOT NULL REFERENCES release.releases(id),
    environment_id UUID NOT NULL REFERENCES release.environments(id),
    status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN (
        'pending', 'running', 'succeeded', 'failed', 'cancelled', 'rolling_back', 'rolled_back'
    )),
    strategy VARCHAR(50) NOT NULL DEFAULT 'all-at-once',
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    artifacts JSONB NOT NULL DEFAULT '[]',
    rollback_of UUID REFERENCES release.deployment_jobs(id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_deployment_jobs_promotion ON release.deployment_jobs(promotion_id);
CREATE INDEX idx_deployment_jobs_status ON release.deployment_jobs(status);

-- Deployment Tasks
CREATE TABLE release.deployment_tasks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    job_id UUID NOT NULL REFERENCES release.deployment_jobs(id) ON DELETE CASCADE,
    target_id UUID NOT NULL REFERENCES release.targets(id),
    digest VARCHAR(100) NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN (
        'pending', 'running', 'succeeded', 'failed', 'cancelled', 'skipped'
    )),
    agent_id UUID REFERENCES release.agents(id),
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    exit_code INTEGER,
    logs TEXT,
    previous_digest VARCHAR(100),
    sticker_written BOOLEAN NOT NULL DEFAULT FALSE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_deployment_tasks_job ON release.deployment_tasks(job_id);
CREATE INDEX idx_deployment_tasks_target ON release.deployment_tasks(target_id);
CREATE INDEX idx_deployment_tasks_status ON release.deployment_tasks(status);

-- Generated Artifacts
CREATE TABLE release.generated_artifacts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    deployment_job_id UUID REFERENCES release.deployment_jobs(id) ON DELETE CASCADE,
    artifact_type VARCHAR(50) NOT NULL CHECK (artifact_type IN (
        'compose_lock', 'script', 'sticker', 'evidence', 'config'
    )),
    name VARCHAR(255) NOT NULL,
    content_hash VARCHAR(100) NOT NULL,
    content BYTEA,            -- for small artifacts
    storage_ref VARCHAR(500), -- for large artifacts (S3, etc.)
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_generated_artifacts_job ON release.generated_artifacts(deployment_job_id);
```

---

## API Endpoints

```yaml
# Deployment Jobs (mostly read-only; created by promotions)
GET /api/v1/deployment-jobs
  Query: ?promotionId={uuid}&status={status}&environmentId={uuid}
  Response: DeploymentJob[]

GET /api/v1/deployment-jobs/{id}
  Response: DeploymentJob (with tasks)

GET /api/v1/deployment-jobs/{id}/tasks
  Response: DeploymentTask[]

GET /api/v1/deployment-jobs/{id}/tasks/{taskId}
  Response: DeploymentTask (with logs)

GET /api/v1/deployment-jobs/{id}/tasks/{taskId}/logs
  Query: ?follow=true
  Response: string | SSE stream

GET /api/v1/deployment-jobs/{id}/artifacts
  Response: GeneratedArtifact[]

GET /api/v1/deployment-jobs/{id}/artifacts/{artifactId}
  Response: binary (download)

# Rollback
POST /api/v1/rollbacks
  Body: {
    environmentId: UUID,
    strategy: "to-previous" | "to-release" | "to-sticker",
    targetReleaseId?: UUID   # for to-release strategy
  }
  Response: DeploymentJob (rollback job)

GET /api/v1/rollbacks
  Query: ?environmentId={uuid}
  Response: DeploymentJob[] (rollback jobs only)
```

---

## References

- [Module Overview](overview.md)
- [Agents Specification](agents.md)
- [Deployment Strategies](../deployment/strategies.md)
- [Artifact Generation](../deployment/artifacts.md)
- [API Documentation](../api/deployments.md)

---

`docs/modules/release-orchestrator/modules/environment-manager.md` (new file, 418 lines)

# ENVMGR: Environment & Inventory Manager

**Purpose**: Model environments, targets, agents, and their relationships.

## Modules

### Module: `environment-manager`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Environment CRUD, ordering, configuration, freeze windows |
| **Dependencies** | `authority` |
| **Data Entities** | `Environment`, `EnvironmentConfig`, `FreezeWindow` |
| **Events Produced** | `environment.created`, `environment.updated`, `environment.freeze_started`, `environment.freeze_ended` |

**Key Operations**:
```
CreateEnvironment(name, displayName, orderIndex, config) → Environment
UpdateEnvironment(id, config) → Environment
DeleteEnvironment(id) → void
SetFreezeWindow(environmentId, start, end, reason, exceptions) → FreezeWindow
ClearFreezeWindow(environmentId, windowId) → void
ListEnvironments(tenantId) → Environment[]
GetEnvironmentState(id) → EnvironmentState
```

**Environment Entity**:
```typescript
interface Environment {
  id: UUID;
  tenantId: UUID;
  name: string;                 // "dev", "stage", "prod"
  displayName: string;          // "Development"
  orderIndex: number;           // 0, 1, 2 for promotion order
  config: EnvironmentConfig;
  freezeWindows: FreezeWindow[];
  requiredApprovals: number;    // 0 for dev, 1+ for prod
  requireSeparationOfDuties: boolean;
  autoPromoteFrom: UUID | null; // auto-promote from this env
  promotionPolicy: string;      // OPA policy name
  createdAt: DateTime;
  updatedAt: DateTime;
}

interface EnvironmentConfig {
  variables: Record<string, string>;     // env-specific variables
  secrets: SecretReference[];            // vault references
  registryOverrides: RegistryOverride[]; // per-env registry
  agentLabels: string[];                 // required agent labels
  deploymentTimeout: number;             // seconds
  healthCheckConfig: HealthCheckConfig;
}

interface FreezeWindow {
  id: UUID;
  start: DateTime;
  end: DateTime;
  reason: string;
  createdBy: UUID;
  exceptions: UUID[]; // users who can override
}
```

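A freeze-window check built on this entity might look like the following sketch (field types are simplified to plain strings and `Date`s; the override semantics for `exceptions` follow the comment above):

```typescript
// A deployment is blocked when any freeze window covers `now`,
// unless the requesting user is listed in that window's exceptions.
function isDeploymentFrozen(
  windows: { start: Date; end: Date; exceptions: string[] }[],
  now: Date,
  userId: string
): boolean {
  return windows.some(
    (w) => now >= w.start && now <= w.end && !w.exceptions.includes(userId)
  );
}
```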
---

### Module: `target-registry`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Deployment target inventory; capability tracking |
| **Dependencies** | `environment-manager`, `agent-manager` |
| **Data Entities** | `Target`, `TargetGroup`, `TargetCapability` |
| **Events Produced** | `target.created`, `target.updated`, `target.deleted`, `target.health_changed` |

**Target Types** (plugin-provided):

| Type | Description |
|------|-------------|
| `docker_host` | Single Docker host |
| `compose_host` | Docker Compose host |
| `ssh_remote` | Generic SSH target |
| `winrm_remote` | Windows remote target |
| `ecs_service` | AWS ECS service |
| `nomad_job` | HashiCorp Nomad job |

**Target Entity**:
```typescript
interface Target {
  id: UUID;
  tenantId: UUID;
  environmentId: UUID;
  name: string;                   // "prod-web-01"
  targetType: string;             // "docker_host"
  connection: TargetConnection;   // type-specific
  capabilities: TargetCapability[];
  labels: Record<string, string>; // for grouping
  healthStatus: HealthStatus;
  lastHealthCheck: DateTime;
  deploymentDirectory: string;    // where artifacts are placed
  currentDigest: string | null;   // what's currently deployed
  agentId: UUID | null;           // assigned agent
}

interface TargetConnection {
  // Common fields
  host: string;
  port: number;

  // Type-specific (examples)
  // docker_host:
  dockerSocket?: string;
  tlsCert?: SecretReference;

  // ssh_remote:
  username?: string;
  privateKey?: SecretReference;

  // ecs_service:
  cluster?: string;
  service?: string;
  region?: string;
  roleArn?: string;
}

interface TargetGroup {
  id: UUID;
  tenantId: UUID;
  environmentId: UUID;
  name: string;
  labels: Record<string, string>;
  createdAt: DateTime;
}
```

---

### Module: `agent-manager`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Agent registration, heartbeat, capability advertisement |
| **Dependencies** | `authority` (for agent tokens) |
| **Data Entities** | `Agent`, `AgentCapability`, `AgentHeartbeat` |
| **Events Produced** | `agent.registered`, `agent.online`, `agent.offline`, `agent.capability_changed` |

**Agent Lifecycle**:
1. Agent starts and requests a registration token from Authority
2. Agent registers with its capabilities and labels
3. Agent sends heartbeats (default: 30s interval)
4. Agent pulls tasks from the task queue
5. Agent reports task completion/failure

**Agent Entity**:
```typescript
interface Agent {
  id: UUID;
  tenantId: UUID;
  name: string;
  version: string;
  capabilities: AgentCapability[];
  labels: Record<string, string>;
  status: "online" | "offline" | "degraded";
  lastHeartbeat: DateTime;
  assignedTargets: UUID[];
  resourceUsage: ResourceUsage;
}

interface AgentCapability {
  type: string;    // "docker", "compose", "ssh", "winrm"
  version: string; // capability version
  config: object;  // capability-specific config
}

interface ResourceUsage {
  cpuPercent: number;
  memoryPercent: number;
  diskPercent: number;
  activeTasks: number;
}
```

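One way to derive the `status` field from heartbeat age is sketched below. The 60s/180s thresholds are illustrative assumptions; the spec fixes only the 30s default heartbeat interval:

```typescript
// Derive agent status from heartbeat age.
function agentStatus(
  lastHeartbeat: Date,
  now: Date
): "online" | "degraded" | "offline" {
  const ageSeconds = (now.getTime() - lastHeartbeat.getTime()) / 1000;
  if (ageSeconds <= 60) return "online";    // within ~2 missed heartbeats
  if (ageSeconds <= 180) return "degraded"; // lagging but recently seen
  return "offline";
}
```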
**Agent Registration Protocol**:
```
1. Admin generates registration token (one-time use)
   POST /api/v1/admin/agent-tokens
   → { token: "reg_xxx", expiresAt: "..." }

2. Agent starts with registration token
   ./stella-agent --register --token=reg_xxx

3. Agent requests mTLS certificate
   POST /api/v1/agents/register
   Headers: X-Registration-Token: reg_xxx
   Body: { name, version, capabilities, csr }
   → { agentId, certificate, caCertificate }

4. Agent establishes mTLS connection
   Uses issued certificate for all subsequent requests

5. Agent requests short-lived JWT for task execution
   POST /api/v1/agents/token (over mTLS)
   → { token, expiresIn: 3600 } // 1 hour
```

---

### Module: `inventory-sync`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Drift detection; expected-vs-actual state reconciliation |
| **Dependencies** | `target-registry`, `agent-manager` |
| **Events Produced** | `inventory.drift_detected`, `inventory.reconciled` |

**Drift Detection Process**:
1. Read `stella.version.json` from the target's deployment directory
2. Compare it with the expected state in the database
3. Flag discrepancies (digest mismatch, missing sticker, unexpected files)
4. Report on the dashboard

**Drift Detection Types**:

| Drift Type | Description | Severity |
|------------|-------------|----------|
| `digest_mismatch` | Running digest differs from expected | Critical |
| `missing_sticker` | No version sticker found on target | Warning |
| `stale_sticker` | Sticker timestamp older than last deployment | Warning |
| `orphan_container` | Container not managed by Stella | Info |
| `extra_files` | Unexpected files in deployment directory | Info |

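The first three drift types can be classified from a target's expected digest, observed running digest, and sticker, as in this sketch (the `StickerInfo` shape is an assumption; severity ordering follows the table above):

```typescript
interface StickerInfo {
  writtenAt: Date; // timestamp recorded when the sticker was written
}

// Classify the most severe drift condition for a single target.
function classifyDrift(
  expectedDigest: string,
  runningDigest: string,
  sticker: StickerInfo | null,
  lastDeployedAt: Date
): "digest_mismatch" | "missing_sticker" | "stale_sticker" | "none" {
  if (runningDigest !== expectedDigest) return "digest_mismatch"; // Critical
  if (!sticker) return "missing_sticker";                         // Warning
  if (sticker.writtenAt < lastDeployedAt) return "stale_sticker"; // Warning
  return "none";
}
```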
---

## Cache Eviction Policies

Environment configurations and target states are cached to improve performance. **All caches MUST have bounded size and TTL-based eviction**:

| Cache Type | Purpose | TTL | Max Size | Eviction Strategy |
|-----------|---------|-----|----------|-------------------|
| **Environment Configs** | Environment configuration data | 30 minutes | 500 entries | Sliding expiration |
| **Target Health** | Target health status | 5 minutes | 2,000 entries | Sliding expiration |
| **Agent Capabilities** | Agent capability advertisement | 10 minutes | 1,000 entries | Sliding expiration |
| **Freeze Windows** | Active freeze window checks | 15 minutes | 100 entries | Absolute expiration |

**Implementation**:
```csharp
using System;
using Microsoft.Extensions.Caching.Memory;

public class EnvironmentConfigCache
{
    private readonly MemoryCache _cache;

    public EnvironmentConfigCache()
    {
        _cache = new MemoryCache(new MemoryCacheOptions
        {
            SizeLimit = 500 // Max 500 environment configs
        });
    }

    public void CacheConfig(Guid environmentId, EnvironmentConfig config)
    {
        _cache.Set(environmentId, config, new MemoryCacheEntryOptions
        {
            Size = 1,
            SlidingExpiration = TimeSpan.FromMinutes(30) // 30-minute TTL
        });
    }

    public EnvironmentConfig? GetCachedConfig(Guid environmentId)
        => _cache.Get<EnvironmentConfig>(environmentId);

    public void InvalidateConfig(Guid environmentId)
        => _cache.Remove(environmentId);
}
```

**Cache Invalidation**:
- Environment configs: invalidate on update
- Target health: invalidate on health check or deployment
- Agent capabilities: invalidate on capability change event
- Freeze windows: invalidate on window creation/deletion

**Reference**: See the [Implementation Guide](../implementation-guide.md#caching) for cache implementation patterns.

---

## Database Schema

```sql
-- Environments
CREATE TABLE release.environments (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    name VARCHAR(100) NOT NULL,
    display_name VARCHAR(255) NOT NULL,
    order_index INTEGER NOT NULL,
    config JSONB NOT NULL DEFAULT '{}',
    freeze_windows JSONB NOT NULL DEFAULT '[]',
    required_approvals INTEGER NOT NULL DEFAULT 0,
    require_sod BOOLEAN NOT NULL DEFAULT FALSE,
    auto_promote_from UUID REFERENCES release.environments(id),
    promotion_policy VARCHAR(255),
    deployment_timeout INTEGER NOT NULL DEFAULT 600,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, name)
);

CREATE INDEX idx_environments_tenant ON release.environments(tenant_id);
CREATE INDEX idx_environments_order ON release.environments(tenant_id, order_index);

-- Target Groups
CREATE TABLE release.target_groups (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    environment_id UUID NOT NULL REFERENCES release.environments(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    labels JSONB NOT NULL DEFAULT '{}',
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, environment_id, name)
);

-- Targets
CREATE TABLE release.targets (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    environment_id UUID NOT NULL REFERENCES release.environments(id) ON DELETE CASCADE,
    target_group_id UUID REFERENCES release.target_groups(id),
    name VARCHAR(255) NOT NULL,
    target_type VARCHAR(100) NOT NULL,
    connection JSONB NOT NULL,
    capabilities JSONB NOT NULL DEFAULT '[]',
    labels JSONB NOT NULL DEFAULT '{}',
    deployment_directory VARCHAR(500),
    health_status VARCHAR(50) NOT NULL DEFAULT 'unknown',
    last_health_check TIMESTAMPTZ,
    current_digest VARCHAR(100),
    agent_id UUID REFERENCES release.agents(id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, environment_id, name)
);

CREATE INDEX idx_targets_tenant_env ON release.targets(tenant_id, environment_id);
CREATE INDEX idx_targets_type ON release.targets(target_type);
CREATE INDEX idx_targets_labels ON release.targets USING GIN (labels);

-- Agents
CREATE TABLE release.agents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    version VARCHAR(50) NOT NULL,
    capabilities JSONB NOT NULL DEFAULT '[]',
    labels JSONB NOT NULL DEFAULT '{}',
    status VARCHAR(50) NOT NULL DEFAULT 'offline',
    last_heartbeat TIMESTAMPTZ,
    resource_usage JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, name)
);

CREATE INDEX idx_agents_tenant ON release.agents(tenant_id);
CREATE INDEX idx_agents_status ON release.agents(status);
CREATE INDEX idx_agents_capabilities ON release.agents USING GIN (capabilities);
```

---

## API Endpoints

```yaml
# Environments
POST   /api/v1/environments
GET    /api/v1/environments
GET    /api/v1/environments/{id}
PUT    /api/v1/environments/{id}
DELETE /api/v1/environments/{id}

# Freeze Windows
POST   /api/v1/environments/{envId}/freeze-windows
GET    /api/v1/environments/{envId}/freeze-windows
DELETE /api/v1/environments/{envId}/freeze-windows/{windowId}

# Target Groups
POST   /api/v1/environments/{envId}/target-groups
GET    /api/v1/environments/{envId}/target-groups
GET    /api/v1/target-groups/{id}
PUT    /api/v1/target-groups/{id}
DELETE /api/v1/target-groups/{id}

# Targets
POST   /api/v1/targets
GET    /api/v1/targets
GET    /api/v1/targets/{id}
PUT    /api/v1/targets/{id}
DELETE /api/v1/targets/{id}
POST   /api/v1/targets/{id}/health-check
GET    /api/v1/targets/{id}/sticker
GET    /api/v1/targets/{id}/drift

# Agents
POST   /api/v1/agents/register
GET    /api/v1/agents
GET    /api/v1/agents/{id}
PUT    /api/v1/agents/{id}
DELETE /api/v1/agents/{id}
POST   /api/v1/agents/{id}/heartbeat
POST   /api/v1/agents/{id}/tasks/{taskId}/complete
```

---

## References

- [Module Overview](overview.md)
- [Agent Specification](agents.md)
- [API Documentation](../api/environments.md)
- [Agent Security](../security/agent-security.md)

---

`docs/modules/release-orchestrator/modules/evidence.md` (new file, 575 lines)

# RELEVI: Release Evidence

**Purpose**: Cryptographically sealed evidence packets for audit-grade release governance.

## Modules

### Module: `evidence-collector`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Evidence aggregation; packet composition |
| **Dependencies** | `promotion-manager`, `deploy-orchestrator`, `decision-engine` |
| **Data Entities** | `EvidencePacket`, `EvidenceContent` |
| **Events Produced** | `evidence.collected`, `evidence.packet_created` |

**Evidence Packet Structure**:
```typescript
interface EvidencePacket {
  id: UUID;
  tenantId: UUID;
  promotionId: UUID;
  packetType: EvidencePacketType;
  content: EvidenceContent;
  contentHash: string;  // SHA-256 of content
  signature: string;    // Cryptographic signature
  signerKeyRef: string; // Reference to signing key
  createdAt: DateTime;
  // Note: No updatedAt - packets are immutable
}

type EvidencePacketType =
  | "release_decision" // Promotion decision evidence
  | "deployment"       // Deployment execution evidence
  | "rollback"         // Rollback evidence
  | "ab_promotion";    // A/B promotion evidence

interface EvidenceContent {
  // Metadata
  version: "1.0";
  generatedAt: DateTime;
  generatorVersion: string;

  // What
  release: {
    id: UUID;
    name: string;
    components: Array<{
      name: string;
      digest: string;
      semver: string;
      imageRepository: string;
    }>;
    sourceRef: SourceReference | null;
  };

  // Where
  environment: {
    id: UUID;
    name: string;
    targets: Array<{
      id: UUID;
      name: string;
      type: string;
    }>;
  };

  // Who
  actors: {
    requester: {
      id: UUID;
      name: string;
      email: string;
    };
    approvers: Array<{
      id: UUID;
      name: string;
      action: string;
      at: DateTime;
      comment: string | null;
    }>;
  };

  // Why
  decision: {
    result: "allow" | "deny";
    gates: Array<{
      type: string;
      name: string;
      status: string;
      message: string;
      details: Record<string, any>;
    }>;
    reasons: string[];
  };

  // How
  execution: {
    workflowRunId: UUID | null;
    deploymentJobId: UUID | null;
    artifacts: Array<{
      type: string;
      name: string;
      contentHash: string;
    }>;
    logs: string | null; // Compressed/truncated
  };

  // When
  timeline: {
    requestedAt: DateTime;
    decidedAt: DateTime | null;
    startedAt: DateTime | null;
    completedAt: DateTime | null;
  };

  // Integrity
  inputsHash: string;              // Hash of all inputs for replay
  previousEvidenceId: UUID | null; // Chain to previous evidence
}
```

---

### Module: `evidence-signer`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Cryptographic signing of evidence packets |
| **Dependencies** | `authority`, `vault` (for key storage) |
| **Algorithms** | RS256, ES256, Ed25519 |

**Signing Process**:
```typescript
import * as crypto from "node:crypto";
// `canonicalize` (npm) implements RFC 8785 JSON canonicalization
import canonicalize from "canonicalize";

class EvidenceSigner {
  async sign(content: EvidenceContent): Promise<SignedEvidence> {
    // 1. Canonicalize content (RFC 8785)
    const canonicalJson = canonicalize(content);

    // 2. Compute content hash
    const contentHash = crypto
      .createHash("sha256")
      .update(canonicalJson)
      .digest("hex");

    // 3. Get signing key from vault
    const keyRef = await this.getActiveSigningKey();
    const privateKey = await this.vault.getPrivateKey(keyRef);

    // 4. Sign the content hash
    const signature = await this.signWithKey(contentHash, privateKey);

    return {
      content,
      contentHash: `sha256:${contentHash}`,
      signature: base64Encode(signature),
      signerKeyRef: keyRef,
      algorithm: this.config.signatureAlgorithm,
    };
  }

  async verify(packet: EvidencePacket): Promise<VerificationResult> {
    // 1. Canonicalize stored content
    const canonicalJson = canonicalize(packet.content);

    // 2. Verify content hash
    const computedHash = crypto
      .createHash("sha256")
      .update(canonicalJson)
      .digest("hex");

    if (`sha256:${computedHash}` !== packet.contentHash) {
      return { valid: false, error: "Content hash mismatch" };
    }

    // 3. Get public key
    const publicKey = await this.vault.getPublicKey(packet.signerKeyRef);

    // 4. Verify signature
    const signatureValid = await this.verifySignature(
      computedHash,
      base64Decode(packet.signature),
      publicKey
    );

    return {
      valid: signatureValid,
      signerKeyRef: packet.signerKeyRef,
      signedAt: packet.createdAt,
    };
  }
}
```

---

### Module: `sticker-writer`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Version sticker generation and placement |
| **Dependencies** | `deploy-orchestrator`, `agent-manager` |
| **Data Entities** | `VersionSticker` |

**Version Sticker Schema**:
```typescript
|
||||
interface VersionSticker {
|
||||
stella_version: "1.0";
|
||||
|
||||
// Release identity
|
||||
release_id: UUID;
|
||||
release_name: string;
|
||||
|
||||
// Component details
|
||||
components: Array<{
|
||||
name: string;
|
||||
digest: string;
|
||||
semver: string;
|
||||
tag: string;
|
||||
image_repository: string;
|
||||
}>;
|
||||
|
||||
// Deployment context
|
||||
environment: string;
|
||||
environment_id: UUID;
|
||||
deployed_at: string; // ISO 8601
|
||||
deployed_by: UUID;
|
||||
|
||||
// Traceability
|
||||
promotion_id: UUID;
|
||||
workflow_run_id: UUID;
|
||||
|
||||
// Evidence chain
|
||||
evidence_packet_id: UUID;
|
||||
evidence_packet_hash: string;
|
||||
policy_decision_hash: string;
|
||||
|
||||
// Orchestrator info
|
||||
orchestrator_version: string;
|
||||
|
||||
// Source reference
|
||||
source_ref?: {
|
||||
commit_sha: string;
|
||||
branch: string;
|
||||
repository: string;
|
||||
};
|
||||
}
|
||||
```
|
||||
|
||||
**Sticker Placement**:
|
||||
- Written to `/var/stella/version.json` on each target
|
||||
- Atomic write (write to temp, rename)
|
||||
- Read during drift detection
|
||||
- Verified against expected state
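The write-to-temp-then-rename step above can be sketched as follows; the helper name and temp-file naming scheme are illustrative, not part of the spec:

```typescript
import { promises as fs } from "node:fs";

// Sketch of the atomic-write rule: write the sticker to a temporary file on
// the same filesystem, then rename it over the final path. rename() is
// atomic on POSIX filesystems, so readers never observe a partial file.
async function writeStickerAtomically(
  targetPath: string, // e.g. /var/stella/version.json
  sticker: object
): Promise<void> {
  const tmpPath = `${targetPath}.tmp-${process.pid}`;
  await fs.writeFile(tmpPath, JSON.stringify(sticker, null, 2), "utf8");
  await fs.rename(tmpPath, targetPath);
}
```

A crash between the two calls leaves only a stray `.tmp-*` file; the final path always holds either the old sticker or the complete new one.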

---

### Module: `audit-exporter`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Compliance report generation; evidence export |
| **Dependencies** | `evidence-collector` |
| **Export Formats** | JSON, PDF, CSV |

**Audit Report Types**:

| Report Type | Description |
|-------------|-------------|
| `release_audit` | Full audit trail for a release |
| `environment_audit` | All deployments to an environment |
| `compliance_summary` | Summary for compliance review |
| `change_log` | Chronological change log |

**Report Generation**:
```typescript
interface AuditReportRequest {
  type: AuditReportType;
  scope: {
    releaseId?: UUID;
    environmentId?: UUID;
    from?: DateTime;
    to?: DateTime;
  };
  format: "json" | "pdf" | "csv";
  options?: {
    includeDecisionDetails: boolean;
    includeApproverDetails: boolean;
    includeLogs: boolean;
    includeArtifacts: boolean;
  };
}

interface AuditReport {
  id: UUID;
  type: AuditReportType;
  scope: ReportScope;
  generatedAt: DateTime;
  generatedBy: UUID;

  summary: {
    totalPromotions: number;
    successfulDeployments: number;
    failedDeployments: number;
    rollbacks: number;
    averageDeploymentTime: number;
  };

  entries: AuditEntry[];

  // For compliance
  signatureChain: {
    valid: boolean;
    verifiedPackets: number;
    invalidPackets: number;
  };
}
```
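As an illustration of how the `summary` block might be derived from `entries`, here is a sketch; the `AuditEntryLike` shape (`kind`, `status`, `durationMs`) is an assumption, since `AuditEntry` is not defined in this document:

```typescript
// Illustrative derivation of the report summary from audit entries.
// The field names below are assumptions, not the normative AuditEntry schema.
interface AuditEntryLike {
  kind: "promotion" | "deployment" | "rollback";
  status?: "succeeded" | "failed";
  durationMs?: number;
}

function summarize(entries: AuditEntryLike[]) {
  const deployments = entries.filter((e) => e.kind === "deployment");
  const durations = deployments
    .map((e) => e.durationMs ?? 0)
    .filter((d) => d > 0);
  return {
    totalPromotions: entries.filter((e) => e.kind === "promotion").length,
    successfulDeployments: deployments.filter((e) => e.status === "succeeded").length,
    failedDeployments: deployments.filter((e) => e.status === "failed").length,
    rollbacks: entries.filter((e) => e.kind === "rollback").length,
    // average over deployments that reported a duration
    averageDeploymentTime:
      durations.length === 0
        ? 0
        : durations.reduce((sum, d) => sum + d, 0) / durations.length,
  };
}
```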

---

## Immutability Enforcement

Evidence packets are append-only. This is enforced at multiple levels:

### Database Level
```sql
-- Evidence packets table with no UPDATE/DELETE
CREATE TABLE release.evidence_packets (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    promotion_id UUID NOT NULL REFERENCES release.promotions(id),
    packet_type VARCHAR(50) NOT NULL CHECK (packet_type IN (
        'release_decision', 'deployment', 'rollback', 'ab_promotion'
    )),
    content JSONB NOT NULL,
    content_hash VARCHAR(100) NOT NULL,
    signature TEXT,
    signer_key_ref VARCHAR(255),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
    -- Note: No updated_at column; immutable by design
);

-- Append-only enforcement via trigger
CREATE OR REPLACE FUNCTION prevent_evidence_modification()
RETURNS TRIGGER AS $$
BEGIN
    RAISE EXCEPTION 'Evidence packets are immutable and cannot be modified or deleted';
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER evidence_packets_immutable
    BEFORE UPDATE OR DELETE ON release.evidence_packets
    FOR EACH ROW EXECUTE FUNCTION prevent_evidence_modification();

-- Revoke UPDATE/DELETE from application role
REVOKE UPDATE, DELETE ON release.evidence_packets FROM app_role;

-- Version stickers table
CREATE TABLE release.version_stickers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    target_id UUID NOT NULL REFERENCES release.targets(id),
    release_id UUID NOT NULL REFERENCES release.releases(id),
    promotion_id UUID NOT NULL REFERENCES release.promotions(id),
    sticker_content JSONB NOT NULL,
    content_hash VARCHAR(100) NOT NULL,
    written_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    verified_at TIMESTAMPTZ,
    drift_detected BOOLEAN NOT NULL DEFAULT FALSE
);

CREATE INDEX idx_version_stickers_target ON release.version_stickers(target_id);
CREATE INDEX idx_version_stickers_release ON release.version_stickers(release_id);
CREATE INDEX idx_evidence_packets_promotion ON release.evidence_packets(promotion_id);
CREATE INDEX idx_evidence_packets_created ON release.evidence_packets(created_at DESC);
```

### Application Level
```csharp
// Evidence service enforces immutability
public sealed class EvidenceService
{
    // Only Create method - no Update or Delete
    public async Task<EvidencePacket> CreateAsync(
        EvidenceContent content,
        CancellationToken ct)
    {
        // Sign content
        var signed = await _signer.SignAsync(content, ct);

        // Store (append-only)
        var packet = new EvidencePacket
        {
            Id = Guid.NewGuid(),
            TenantId = content.TenantId,
            PromotionId = content.PromotionId,
            PacketType = content.PacketType,
            Content = content,
            ContentHash = signed.ContentHash,
            Signature = signed.Signature,
            SignerKeyRef = signed.SignerKeyRef,
            CreatedAt = DateTime.UtcNow,
        };

        await _repository.InsertAsync(packet, ct);
        return packet;
    }

    // Read methods only
    public async Task<EvidencePacket> GetAsync(Guid id, CancellationToken ct);
    public async Task<IReadOnlyList<EvidencePacket>> ListAsync(
        EvidenceFilter filter, CancellationToken ct);
    public async Task<VerificationResult> VerifyAsync(
        Guid id, CancellationToken ct);

    // No Update or Delete methods exist
}
```

---

## Evidence Chain

Evidence packets form a verifiable chain:

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ Evidence #1     │     │ Evidence #2     │     │ Evidence #3     │
│ (Dev Deploy)    │────►│ (Stage Deploy)  │────►│ (Prod Deploy)   │
│                 │     │                 │     │                 │
│ prevEvidenceId: │     │ prevEvidenceId: │     │ prevEvidenceId: │
│   null          │     │   #1            │     │   #2            │
│                 │     │                 │     │                 │
│ contentHash:    │     │ contentHash:    │     │ contentHash:    │
│   sha256:abc... │     │   sha256:def... │     │   sha256:ghi... │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```

**Chain Verification**:
```typescript
async function verifyEvidenceChain(releaseId: UUID): Promise<ChainVerificationResult> {
  const packets = await getPacketsForRelease(releaseId);
  const results: PacketVerificationResult[] = [];

  let previousHash: string | null = null;

  for (const packet of packets) {
    // 1. Verify packet signature
    const signatureValid = await verifySignature(packet);

    // 2. Verify content hash
    const contentValid = await verifyContentHash(packet);

    // 3. Verify chain link
    const chainValid = packet.content.previousEvidenceId === null
      ? previousHash === null
      : await verifyPreviousLink(packet, previousHash);

    results.push({
      packetId: packet.id,
      signatureValid,
      contentValid,
      chainValid,
      valid: signatureValid && contentValid && chainValid,
    });

    previousHash = packet.contentHash;
  }

  return {
    valid: results.every(r => r.valid),
    packets: results,
  };
}
```

---

## API Endpoints

```yaml
# Evidence Packets
GET  /api/v1/evidence-packets
  Query: ?promotionId={uuid}&type={type}&from={date}&to={date}
  Response: EvidencePacket[]

GET  /api/v1/evidence-packets/{id}
  Response: EvidencePacket (full content)

GET  /api/v1/evidence-packets/{id}/verify
  Response: VerificationResult

GET  /api/v1/evidence-packets/{id}/download
  Query: ?format={json|pdf}
  Response: binary

# Evidence Chain
GET  /api/v1/releases/{id}/evidence-chain
  Response: EvidenceChain

GET  /api/v1/releases/{id}/evidence-chain/verify
  Response: ChainVerificationResult

# Audit Reports
POST /api/v1/audit-reports
  Body: {
    type: "release" | "environment" | "compliance",
    scope: { releaseId?, environmentId?, from?, to? },
    format: "json" | "pdf" | "csv"
  }
  Response: { reportId: UUID, status: "generating" }

GET  /api/v1/audit-reports/{id}
  Response: { status, downloadUrl? }

GET  /api/v1/audit-reports/{id}/download
  Response: binary

# Version Stickers
GET  /api/v1/version-stickers
  Query: ?targetId={uuid}&releaseId={uuid}
  Response: VersionSticker[]

GET  /api/v1/version-stickers/{id}
  Response: VersionSticker
```

---

## Deterministic Replay

Evidence packets enable deterministic replay: given the same inputs and policy version, the same decision is produced.

```typescript
async function replayDecision(evidencePacket: EvidencePacket): Promise<ReplayResult> {
  const content = evidencePacket.content;

  // 1. Verify inputs hash
  const currentInputsHash = computeInputsHash(
    content.release,
    content.environment,
    content.decision.gates
  );

  if (currentInputsHash !== content.inputsHash) {
    return { valid: false, error: "Inputs have changed since original decision" };
  }

  // 2. Re-evaluate decision with same inputs
  const replayedDecision = await evaluateDecision(
    content.release,
    content.environment,
    { asOf: content.timeline.decidedAt } // Use policy version from that time
  );

  // 3. Compare decisions
  const decisionsMatch = replayedDecision.result === content.decision.result;

  return {
    valid: decisionsMatch,
    originalDecision: content.decision.result,
    replayedDecision: replayedDecision.result,
    differences: decisionsMatch ? [] : computeDifferences(content.decision, replayedDecision),
  };
}
```
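`computeInputsHash` is not defined in this document. A sketch of one way to implement it, hashing a key-order-independent serialization of the inputs so that any change is detectable; the exact input set and canonicalization are normative decisions of the orchestrator:

```typescript
import { createHash } from "node:crypto";

// Sketch: serialize every decision input with sorted object keys, then hash.
// Any change to any input therefore changes the hash. Illustrative only.
function computeInputsHash(...inputs: unknown[]): string {
  const canonical = JSON.stringify(inputs, (_key, value) =>
    value !== null && typeof value === "object" && !Array.isArray(value)
      ? Object.fromEntries(
          // sort keys so property insertion order cannot affect the hash
          Object.entries(value as Record<string, unknown>).sort(([a], [b]) =>
            a < b ? -1 : a > b ? 1 : 0
          )
        )
      : value
  );
  return `sha256:${createHash("sha256").update(canonical).digest("hex")}`;
}
```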

---

## References

- [Module Overview](overview.md)
- [Design Principles](../design/principles.md)
- [Security Architecture](../security/overview.md)
- [Evidence Schema](../appendices/evidence-schema.md)

---

`docs/modules/release-orchestrator/modules/integration-hub.md`

# INTHUB: Integration Hub

**Purpose**: Central management of all external integrations (SCM, CI, registries, vaults, targets).

## Modules

### Module: `integration-manager`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | CRUD for integration instances; plugin type registry |
| **Dependencies** | `plugin-registry`, `authority` (for credentials) |
| **Data Entities** | `Integration`, `IntegrationType`, `IntegrationCredential` |
| **Events Produced** | `integration.created`, `integration.updated`, `integration.deleted`, `integration.health_changed` |
| **Events Consumed** | `plugin.registered`, `plugin.unregistered` |

**Key Operations**:
```
CreateIntegration(type, name, config, credentials) → Integration
UpdateIntegration(id, config, credentials) → Integration
DeleteIntegration(id) → void
TestConnection(id) → ConnectionTestResult
DiscoverResources(id, resourceType) → Resource[]
GetIntegrationHealth(id) → HealthStatus
ListIntegrations(filter) → Integration[]
```

**Integration Entity**:
```typescript
interface Integration {
  id: UUID;
  tenantId: UUID;
  type: string;              // "scm.github", "registry.harbor"
  name: string;              // user-defined name
  config: IntegrationConfig; // type-specific config
  credentialId: UUID;        // reference to vault
  healthStatus: HealthStatus;
  lastHealthCheck: DateTime;
  createdAt: DateTime;
  updatedAt: DateTime;
}

interface IntegrationConfig {
  endpoint: string;
  authMode: "token" | "oauth" | "mtls" | "iam";
  timeout: number;
  retryPolicy: RetryPolicy;
  customHeaders?: Record<string, string>;
  // Type-specific fields added by plugin
  [key: string]: any;
}
```

---

### Module: `connection-profiles`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Default settings management; "last used" pattern |
| **Dependencies** | `integration-manager` |
| **Data Entities** | `ConnectionProfile`, `ProfileTemplate` |

**Behavior**: When a user adds a new integration instance:
1. The wizard defaults to the last used endpoint, auth mode, and network settings
2. Secrets are **never** auto-reused (explicit confirmation required)
3. The user can save the settings as a named profile for reuse

**Profile Entity**:
```typescript
interface ConnectionProfile {
  id: UUID;
  tenantId: UUID;
  name: string;              // "Production GitHub"
  integrationType: string;
  defaultConfig: Partial<IntegrationConfig>;
  isDefault: boolean;
  lastUsedAt: DateTime;
  createdBy: UUID;
}
```
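The defaulting behavior described above (reuse last-used settings, never auto-reuse secrets) can be sketched as a merge that copies non-secret defaults and drops secret fields; the `SECRET_FIELDS` list and helper name are illustrative, not part of the spec:

```typescript
// Sketch: copy non-secret defaults from the last-used profile into a new
// integration draft. Secret fields are stripped so they are never reused
// implicitly. The SECRET_FIELDS list is an assumption.
const SECRET_FIELDS = new Set(["token", "password", "clientSecret", "privateKey"]);

function applyProfileDefaults(
  profileConfig: Record<string, unknown>,
  draft: Record<string, unknown>
): Record<string, unknown> {
  const defaults = Object.fromEntries(
    Object.entries(profileConfig).filter(([key]) => !SECRET_FIELDS.has(key))
  );
  // Values the user already entered in the draft win over profile defaults.
  return { ...defaults, ...draft };
}
```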

---

### Module: `connector-runtime`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Execute plugin connector logic in controlled environment |
| **Dependencies** | `plugin-loader`, `plugin-sandbox` |
| **Protocol** | gRPC (preferred) or HTTP/REST |

**Connector Interface** (implemented by plugins):
```protobuf
service Connector {
  // Connection management
  rpc TestConnection(TestConnectionRequest) returns (TestConnectionResponse);
  rpc GetHealth(HealthRequest) returns (HealthResponse);

  // Resource discovery
  rpc DiscoverResources(DiscoverRequest) returns (DiscoverResponse);
  rpc ListRepositories(ListReposRequest) returns (ListReposResponse);
  rpc ListBranches(ListBranchesRequest) returns (ListBranchesResponse);
  rpc ListTags(ListTagsRequest) returns (ListTagsResponse);

  // Registry operations
  rpc ResolveTagToDigest(ResolveRequest) returns (ResolveResponse);
  rpc FetchManifest(ManifestRequest) returns (ManifestResponse);
  rpc VerifyDigest(VerifyRequest) returns (VerifyResponse);

  // Secrets operations
  rpc GetSecretsRef(SecretsRequest) returns (SecretsResponse);
  rpc FetchSecret(FetchSecretRequest) returns (FetchSecretResponse);

  // Workflow step execution
  rpc ExecuteStep(StepRequest) returns (stream StepResponse);
  rpc CancelStep(CancelRequest) returns (CancelResponse);
}
```

**Request/Response Types**:
```protobuf
message TestConnectionRequest {
  string integration_id = 1;
  map<string, string> config = 2;
  string credential_ref = 3;
}

message TestConnectionResponse {
  bool success = 1;
  string error_message = 2;
  map<string, string> details = 3;
  int64 latency_ms = 4;
}

message ResolveRequest {
  string integration_id = 1;
  string image_ref = 2;      // "myapp:v2.3.1"
}

message ResolveResponse {
  string digest = 1;         // "sha256:abc123..."
  string manifest_type = 2;
  int64 size_bytes = 3;
  google.protobuf.Timestamp pushed_at = 4;
}
```
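On the caller side, `ResolveTagToDigest` results can be cached with an absolute TTL (the Cache Eviction Policies section prescribes 1 hour for tag → digest mappings). A sketch, where the class and resolver signature are illustrative and size bounding is omitted for brevity:

```typescript
// Client-side sketch of caching tag → digest resolutions with an absolute
// TTL. The resolver stands in for the connector's ResolveTagToDigest RPC.
type Resolver = (imageRef: string) => Promise<string>;

class TagResolutionCache {
  private entries = new Map<string, { digest: string; expiresAt: number }>();

  constructor(private resolve: Resolver, private ttlMs = 60 * 60 * 1000) {}

  async getDigest(imageRef: string): Promise<string> {
    const hit = this.entries.get(imageRef);
    if (hit && hit.expiresAt > Date.now()) {
      return hit.digest; // fresh entry: skip the connector round-trip
    }
    const digest = await this.resolve(imageRef);
    this.entries.set(imageRef, { digest, expiresAt: Date.now() + this.ttlMs });
    return digest;
  }
}
```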

---

### Module: `doctor-checks`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Integration health diagnostics; troubleshooting |
| **Dependencies** | `integration-manager`, `connector-runtime` |

**Doctor Check Types**:

| Check | Purpose | Pass Criteria |
|-------|---------|---------------|
| **Connectivity** | Can reach endpoint | TCP connect succeeds |
| **TLS** | Certificate valid | Chain validates, not expired |
| **Authentication** | Credentials valid | Auth request succeeds |
| **Authorization** | Permissions sufficient | Required scopes present |
| **Version** | API version supported | Version in supported range |
| **Rate Limit** | Quota available | >10% remaining |
| **Latency** | Response time acceptable | <5s p99 |

**Doctor Check Output**:
```typescript
interface DoctorCheckResult {
  checkType: string;
  status: "pass" | "warn" | "fail";
  message: string;
  details: Record<string, any>;
  suggestions: string[];
  runAt: DateTime;
  durationMs: number;
}

interface DoctorReport {
  integrationId: UUID;
  overallStatus: "healthy" | "degraded" | "unhealthy";
  checks: DoctorCheckResult[];
  generatedAt: DateTime;
}
```
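One plausible roll-up from per-check results to `DoctorReport.overallStatus`; the mapping is an assumption consistent with the statuses above (any `fail` makes the integration unhealthy, otherwise any `warn` degrades it):

```typescript
// Illustrative roll-up of per-check statuses into the report-level status.
type CheckStatus = "pass" | "warn" | "fail";

function overallStatus(
  checks: { status: CheckStatus }[]
): "healthy" | "degraded" | "unhealthy" {
  if (checks.some((c) => c.status === "fail")) return "unhealthy";
  if (checks.some((c) => c.status === "warn")) return "degraded";
  return "healthy";
}
```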

---

## Cache Eviction Policies

Integration health status and connector results are cached to reduce load on external systems. **All caches MUST have bounded size and TTL-based eviction**:

| Cache Type | Purpose | TTL | Max Size | Eviction Strategy |
|-----------|---------|-----|----------|-------------------|
| **Health Checks** | Integration health status | 5 minutes | 1,000 entries | Sliding expiration |
| **Connection Tests** | Test connection results | 2 minutes | 500 entries | Sliding expiration |
| **Resource Discovery** | Discovered resources (repos, tags) | 10 minutes | 5,000 entries | Sliding expiration |
| **Tag Resolution** | Tag → digest mappings | 1 hour | 10,000 entries | Absolute expiration |

**Implementation**:
```csharp
// Requires Microsoft.Extensions.Caching.Memory
public class IntegrationHealthCache
{
    private readonly MemoryCache _cache;

    public IntegrationHealthCache()
    {
        _cache = new MemoryCache(new MemoryCacheOptions
        {
            SizeLimit = 1_000 // Max 1,000 integration health entries
        });
    }

    public void CacheHealthStatus(Guid integrationId, HealthStatus status)
    {
        _cache.Set(integrationId, status, new MemoryCacheEntryOptions
        {
            Size = 1,
            SlidingExpiration = TimeSpan.FromMinutes(5) // 5-minute TTL
        });
    }

    public HealthStatus? GetCachedHealthStatus(Guid integrationId)
        => _cache.Get<HealthStatus>(integrationId);
}
```

**Reference**: See [Implementation Guide](../implementation-guide.md#caching) for cache implementation patterns.

---

## Integration Types

The following integration types are supported (via plugins):

### SCM Integrations

| Type | Plugin | Capabilities |
|------|--------|--------------|
| `scm.github` | Built-in | repos, branches, commits, webhooks, status |
| `scm.gitlab` | Built-in | repos, branches, commits, webhooks, pipelines |
| `scm.bitbucket` | Plugin | repos, branches, commits, webhooks |
| `scm.azure_repos` | Plugin | repos, branches, commits, pipelines |

### Registry Integrations

| Type | Plugin | Capabilities |
|------|--------|--------------|
| `registry.harbor` | Built-in | repos, tags, digests, scanning status |
| `registry.ecr` | Plugin | repos, tags, digests, IAM auth |
| `registry.gcr` | Plugin | repos, tags, digests |
| `registry.dockerhub` | Plugin | repos, tags, digests |
| `registry.ghcr` | Plugin | repos, tags, digests |
| `registry.acr` | Plugin | repos, tags, digests |

### Vault Integrations

| Type | Plugin | Capabilities |
|------|--------|--------------|
| `vault.hashicorp` | Built-in | KV, transit, PKI |
| `vault.aws_secrets` | Plugin | secrets, IAM auth |
| `vault.azure_keyvault` | Plugin | secrets, certificates |
| `vault.gcp_secrets` | Plugin | secrets, IAM auth |

### CI Integrations

| Type | Plugin | Capabilities |
|------|--------|--------------|
| `ci.github_actions` | Built-in | workflows, runs, artifacts, status |
| `ci.gitlab_ci` | Built-in | pipelines, jobs, artifacts |
| `ci.jenkins` | Plugin | jobs, builds, artifacts |
| `ci.azure_pipelines` | Plugin | pipelines, runs, artifacts |

### Router Integrations (for Progressive Delivery)

| Type | Plugin | Capabilities |
|------|--------|--------------|
| `router.nginx` | Plugin | upstream config, reload |
| `router.haproxy` | Plugin | backend config, reload |
| `router.traefik` | Plugin | dynamic config |
| `router.aws_alb` | Plugin | target groups, listener rules |
---

## Database Schema

```sql
-- Integration types (populated by plugins)
CREATE TABLE release.integration_types (
    id TEXT PRIMARY KEY,           -- "scm.github"
    plugin_id UUID REFERENCES release.plugins(id),
    display_name TEXT NOT NULL,
    description TEXT,
    icon_url TEXT,
    config_schema JSONB NOT NULL,  -- JSON Schema for config
    capabilities TEXT[] NOT NULL,  -- ["repos", "webhooks", "status"]
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Integration instances
CREATE TABLE release.integrations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    type_id TEXT NOT NULL REFERENCES release.integration_types(id),
    name TEXT NOT NULL,
    config JSONB NOT NULL,
    credential_ref TEXT NOT NULL,  -- vault reference
    health_status TEXT NOT NULL DEFAULT 'unknown',
    last_health_check TIMESTAMPTZ,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    created_by UUID NOT NULL REFERENCES users(id),
    UNIQUE(tenant_id, name)
);

-- Connection profiles
CREATE TABLE release.connection_profiles (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    name TEXT NOT NULL,
    integration_type TEXT NOT NULL,
    default_config JSONB NOT NULL,
    is_default BOOLEAN NOT NULL DEFAULT false,
    last_used_at TIMESTAMPTZ,
    created_by UUID NOT NULL REFERENCES users(id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE(tenant_id, name)
);

-- Doctor check history
CREATE TABLE release.doctor_checks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    integration_id UUID NOT NULL REFERENCES release.integrations(id),
    check_type TEXT NOT NULL,
    status TEXT NOT NULL,
    message TEXT,
    details JSONB,
    duration_ms INTEGER NOT NULL,
    run_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_doctor_checks_integration ON release.doctor_checks(integration_id, run_at DESC);
```

---

## API Endpoints

See [API Documentation](../api/overview.md) for full specification.

```
GET    /api/v1/integration-types              # List available types
GET    /api/v1/integration-types/{type}       # Get type details

GET    /api/v1/integrations                   # List integrations
POST   /api/v1/integrations                   # Create integration
GET    /api/v1/integrations/{id}              # Get integration
PUT    /api/v1/integrations/{id}              # Update integration
DELETE /api/v1/integrations/{id}              # Delete integration
POST   /api/v1/integrations/{id}/test         # Test connection
GET    /api/v1/integrations/{id}/health       # Get health status
POST   /api/v1/integrations/{id}/doctor       # Run doctor checks
GET    /api/v1/integrations/{id}/resources    # Discover resources

GET    /api/v1/connection-profiles            # List profiles
POST   /api/v1/connection-profiles            # Create profile
GET    /api/v1/connection-profiles/{id}       # Get profile
PUT    /api/v1/connection-profiles/{id}       # Update profile
DELETE /api/v1/connection-profiles/{id}       # Delete profile
```

---

`docs/modules/release-orchestrator/modules/overview.md`

# Module Landscape Overview

The Stella Ops Suite comprises existing modules (vulnerability scanning) and new modules (release orchestration). Modules are organized into **themes** (functional areas).

## Architecture Diagram

```
STELLA OPS SUITE

EXISTING THEMES (Vulnerability)

  INGEST          VEXOPS          REASON          SCANENG         EVIDENCE
  ├─concelier     ├─excititor     ├─policy        ├─scanner       ├─locker
  └─advisory-ai   └─linksets      └─opa-runtime   ├─sbom-gen      ├─export
                                                  └─reachability  └─timeline

  RUNTIME         JOBCTRL         OBSERVE         REPLAY          DEVEXP
  ├─signals       ├─scheduler     ├─notifier      └─replay-core   ├─cli
  ├─graph         ├─orchestrator  └─telemetry                     ├─web-ui
  └─zastava       └─task-runner                                   └─sdk

NEW THEMES (Release Orchestration)

  INTHUB (Integration Hub)
  ├─integration-manager    Central registry of configured integrations
  ├─connection-profiles    Default settings + credential management
  ├─connector-runtime      Plugin connector execution environment
  └─doctor-checks          Integration health diagnostics

  ENVMGR (Environment & Inventory)
  ├─environment-manager    Environment CRUD, ordering, config
  ├─target-registry        Deployment targets (hosts/services)
  ├─agent-manager          Agent registration, health, capabilities
  └─inventory-sync         Drift detection, state reconciliation

  RELMAN (Release Management)
  ├─component-registry     Image repos → components mapping
  ├─version-manager        Tag/digest → semver mapping
  ├─release-manager        Release bundle lifecycle
  └─release-catalog        Release history, search, compare

  WORKFL (Workflow Engine)
  ├─workflow-designer      Template creation, step graph editor
  ├─workflow-engine        DAG execution, state machine
  ├─step-executor          Step dispatch, retry, timeout
  └─step-registry          Built-in + plugin-provided steps

  PROMOT (Promotion & Approval)
  ├─promotion-manager      Promotion request lifecycle
  ├─approval-gateway       Approval collection, SoD enforcement
  ├─decision-engine        Gate evaluation, policy integration
  └─gate-registry          Built-in + custom gates

  DEPLOY (Deployment Execution)
  ├─deploy-orchestrator    Deployment job coordination
  ├─target-executor        Target-specific deployment logic
  ├─runner-executor        Script/hook execution sandbox
  ├─artifact-generator     Compose/script artifact generation
  └─rollback-manager       Rollback orchestration

  AGENTS (Deployment Agents)
  ├─agent-core             Shared agent runtime
  ├─agent-docker           Docker host agent
  ├─agent-compose          Docker Compose agent
  ├─agent-ssh              SSH remote executor
  ├─agent-winrm            WinRM remote executor
  ├─agent-ecs              AWS ECS agent
  └─agent-nomad            HashiCorp Nomad agent

  PROGDL (Progressive Delivery)
  ├─ab-manager             A/B release coordination
  ├─traffic-router         Router plugin orchestration
  ├─canary-controller      Canary ramp automation
  └─rollout-strategy       Strategy templates

  RELEVI (Release Evidence)
  ├─evidence-collector     Evidence aggregation
  ├─evidence-signer        Cryptographic signing
  ├─sticker-writer         Version sticker generation
  └─audit-exporter         Compliance report generation

  PLUGIN (Plugin Infrastructure)
  ├─plugin-registry        Plugin discovery, versioning
  ├─plugin-loader          Plugin lifecycle management
  ├─plugin-sandbox         Isolation, resource limits
  └─plugin-sdk             SDK for plugin development
```
|
||||
|
||||
## Theme Summary
|
||||
|
||||
### Existing Themes (Vulnerability Scanning)
|
||||
|
||||
| Theme | Purpose | Key Modules |
|
||||
|-------|---------|-------------|
|
||||
| **INGEST** | Advisory ingestion | concelier, advisory-ai |
|
||||
| **VEXOPS** | VEX document handling | excititor, linksets |
|
||||
| **REASON** | Policy and decisioning | policy, opa-runtime |
|
||||
| **SCANENG** | Scanning and SBOM | scanner, sbom-gen, reachability |
|
||||
| **EVIDENCE** | Evidence and attestation | locker, export, timeline |
|
||||
| **RUNTIME** | Runtime signals | signals, graph, zastava |
|
||||
| **JOBCTRL** | Job orchestration | scheduler, orchestrator, task-runner |
|
||||
| **OBSERVE** | Observability | notifier, telemetry |
|
||||
| **REPLAY** | Deterministic replay | replay-core |
|
||||
| **DEVEXP** | Developer experience | cli, web-ui, sdk |
|
||||
|
||||
### New Themes (Release Orchestration)

| Theme | Purpose | Key Modules | Documentation |
|-------|---------|-------------|---------------|
| **INTHUB** | Integration hub | integration-manager, connection-profiles, connector-runtime, doctor-checks | [Details](integration-hub.md) |
| **ENVMGR** | Environment & inventory | environment-manager, target-registry, agent-manager, inventory-sync | [Details](environment-manager.md) |
| **RELMAN** | Release management | component-registry, version-manager, release-manager, release-catalog | [Details](release-manager.md) |
| **WORKFL** | Workflow engine | workflow-designer, workflow-engine, step-executor, step-registry | [Details](workflow-engine.md) |
| **PROMOT** | Promotion & approval | promotion-manager, approval-gateway, decision-engine, gate-registry | [Details](promotion-manager.md) |
| **DEPLOY** | Deployment execution | deploy-orchestrator, target-executor, runner-executor, artifact-generator, rollback-manager | [Details](deploy-orchestrator.md) |
| **AGENTS** | Deployment agents | agent-core, agent-docker, agent-compose, agent-ssh, agent-winrm, agent-ecs, agent-nomad | [Details](agents.md) |
| **PROGDL** | Progressive delivery | ab-manager, traffic-router, canary-controller, rollout-strategy | [Details](progressive-delivery.md) |
| **RELEVI** | Release evidence | evidence-collector, evidence-signer, sticker-writer, audit-exporter | [Details](evidence.md) |
| **PLUGIN** | Plugin infrastructure | plugin-registry, plugin-loader, plugin-sandbox, plugin-sdk | [Details](plugin-system.md) |
## Module Dependencies

```
                    ┌──────────────┐
                    │  AUTHORITY   │
                    └──────┬───────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
        ▼                  ▼                  ▼
┌───────────────┐  ┌───────────────┐  ┌───────────────┐
│    INTHUB     │  │    ENVMGR     │  │    PLUGIN     │
│ (Integrations)│  │ (Environments)│  │   (Plugins)   │
└───────┬───────┘  └───────┬───────┘  └───────┬───────┘
        │                  │                  │
        └──────────┬───────┴──────────────────┘
                   │
                   ▼
           ┌───────────────┐
           │    RELMAN     │
           │  (Releases)   │
           └───────┬───────┘
                   │
                   ▼
           ┌───────────────┐
           │    WORKFL     │
           │  (Workflows)  │
           └───────┬───────┘
                   │
        ┌──────────┴──────────┐
        │                     │
        ▼                     ▼
┌───────────────┐     ┌───────────────┐
│    PROMOT     │     │    DEPLOY     │
│  (Promotion)  │     │ (Deployment)  │
└───────┬───────┘     └───────┬───────┘
        │                     │
        │                     ▼
        │             ┌───────────────┐
        │             │    AGENTS     │
        │             │   (Agents)    │
        │             └───────┬───────┘
        │                     │
        └──────────┬──────────┘
                   │
                   ▼
           ┌───────────────┐
           │    RELEVI     │
           │  (Evidence)   │
           └───────────────┘
```
## Communication Patterns

| Pattern | Usage |
|---------|-------|
| **Synchronous API** | User-initiated operations (CRUD, queries) |
| **Event Bus** | Cross-module notifications (domain events) |
| **Task Queue** | Long-running operations (deployments, syncs) |
| **WebSocket/SSE** | Real-time UI updates |
| **gRPC Streams** | Agent communication |
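The event-bus row above implies a common envelope for cross-module domain events. A minimal sketch of such an envelope — the field names and factory are illustrative assumptions, not the actual Stella Ops wire format:

```typescript
// Illustrative domain-event envelope for the Event Bus pattern above.
// Field names are assumptions, not the real Stella Ops wire format.
interface DomainEvent<T> {
  type: string;        // e.g. "plugin.registered"
  source: string;      // producing theme, e.g. "PLUGIN"
  occurredAt: string;  // ISO-8601 timestamp
  payload: T;
}

function makeEvent<T>(type: string, source: string, payload: T, now: Date): DomainEvent<T> {
  return { type, source, occurredAt: now.toISOString(), payload };
}

const evt = makeEvent("plugin.registered", "PLUGIN", { pluginId: "com.example.demo" }, new Date(0));
```

Consumers would subscribe by `type` prefix (e.g. everything under `plugin.*`), keeping producers and consumers decoupled.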
## Database Schema Organization

Each theme owns a PostgreSQL schema:

| Schema | Owner Theme |
|--------|-------------|
| `release.integrations` | INTHUB |
| `release.environments` | ENVMGR |
| `release.components` | RELMAN |
| `release.workflows` | WORKFL |
| `release.promotions` | PROMOT |
| `release.deployments` | DEPLOY |
| `release.agents` | AGENTS |
| `release.evidence` | RELEVI |
| `release.plugins` | PLUGIN |
629
docs/modules/release-orchestrator/modules/plugin-system.md
Normal file
@@ -0,0 +1,629 @@
# PLUGIN: Plugin Infrastructure

**Purpose**: Extensible plugin system for integrations, steps, and custom functionality.

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                             PLUGIN ARCHITECTURE                             │
│                                                                             │
│  ┌───────────────────────────────────────────────────────────────────────┐ │
│  │                            PLUGIN REGISTRY                            │ │
│  │                                                                       │ │
│  │  - Plugin discovery and versioning                                    │ │
│  │  - Manifest validation                                                │ │
│  │  - Dependency resolution                                              │ │
│  └───────────────────────────────────┬───────────────────────────────────┘ │
│                                      │                                     │
│                                      ▼                                     │
│  ┌───────────────────────────────────────────────────────────────────────┐ │
│  │                             PLUGIN LOADER                             │ │
│  │                                                                       │ │
│  │  - Lifecycle management (load, start, stop, unload)                   │ │
│  │  - Health monitoring                                                  │ │
│  │  - Hot reload support                                                 │ │
│  └───────────────────────────────────┬───────────────────────────────────┘ │
│                                      │                                     │
│                                      ▼                                     │
│  ┌───────────────────────────────────────────────────────────────────────┐ │
│  │                            PLUGIN SANDBOX                             │ │
│  │                                                                       │ │
│  │  - Process isolation                                                  │ │
│  │  - Resource limits (CPU, memory, network)                             │ │
│  │  - Capability enforcement                                             │ │
│  └───────────────────────────────────────────────────────────────────────┘ │
│                                                                             │
│  Plugin Types:                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │
│  │  Connector   │  │     Step     │  │     Gate     │  │    Agent     │    │
│  │   Plugins    │  │  Providers   │  │  Providers   │  │   Plugins    │    │
│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘    │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Modules

### Module: `plugin-registry`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Plugin discovery; versioning; manifest management |
| **Data Entities** | `Plugin`, `PluginManifest`, `PluginVersion` |
| **Events Produced** | `plugin.discovered`, `plugin.registered`, `plugin.unregistered` |

**Plugin Entity**:
```typescript
interface Plugin {
  id: UUID;
  pluginId: string;          // "com.example.my-connector"
  version: string;           // "1.2.3"
  vendor: string;
  license: string;
  manifest: PluginManifest;
  status: PluginStatus;
  entrypoint: string;        // Path to plugin executable/module
  lastHealthCheck: DateTime;
  healthMessage: string | null;
  installedAt: DateTime;
  updatedAt: DateTime;
}

type PluginStatus =
  | "discovered"   // Found but not loaded
  | "loaded"       // Loaded but not active
  | "active"       // Running and healthy
  | "stopped"      // Manually stopped
  | "failed"       // Failed to load or crashed
  | "degraded";    // Running but with issues
```
---

### Module: `plugin-loader`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Plugin lifecycle management |
| **Dependencies** | `plugin-registry`, `plugin-sandbox` |
| **Events Produced** | `plugin.loaded`, `plugin.started`, `plugin.stopped`, `plugin.failed` |

**Plugin Lifecycle**:
```
┌──────────────┐
│  DISCOVERED  │ ──── Plugin found in registry
└──────┬───────┘
       │ load()
       ▼
┌──────────────┐
│    LOADED    │ ──── Plugin validated and prepared
└──────┬───────┘
       │ start()
       ▼
┌──────────────┐      ┌──────────────┐
│    ACTIVE    │ ──── │   DEGRADED   │ ◄── Health issues
└──────┬───────┘      └──────┬───────┘
       │ stop()              │
       ▼                     │
┌──────────────┐             │
│   STOPPED    │ ◄───────────┘ manual stop
└──────┬───────┘
       │ unload()
       ▼
┌──────────────┐
│   UNLOADED   │
└──────────────┘
```
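The lifecycle above can be sketched as a simple guard table. This is an illustrative model of the allowed transitions, not the loader's actual implementation, and it adds an explicit `"unloaded"` terminal state from the diagram:

```typescript
// Illustrative transition guard for the plugin lifecycle diagram above.
// The real plugin-loader may enforce transitions differently.
type PluginState =
  | "discovered" | "loaded" | "active"
  | "stopped" | "failed" | "degraded" | "unloaded";

const allowedTransitions: Record<PluginState, PluginState[]> = {
  discovered: ["loaded", "failed"],
  loaded:     ["active", "failed"],
  active:     ["degraded", "stopped", "failed"],
  degraded:   ["active", "stopped", "failed"],
  stopped:    ["active", "unloaded"],
  failed:     ["unloaded"],
  unloaded:   [],  // terminal
};

function canTransition(from: PluginState, to: PluginState): boolean {
  return allowedTransitions[from].includes(to);
}
```

A loader built this way rejects illegal calls (e.g. `start()` on an unloaded plugin) before touching the plugin process.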
**Lifecycle Operations**:
```typescript
interface PluginLoader {
  // Discovery
  discover(): Promise<Plugin[]>;
  refresh(): Promise<void>;

  // Lifecycle
  load(pluginId: string): Promise<Plugin>;
  start(pluginId: string): Promise<void>;
  stop(pluginId: string): Promise<void>;
  unload(pluginId: string): Promise<void>;
  restart(pluginId: string): Promise<void>;

  // Health
  checkHealth(pluginId: string): Promise<HealthStatus>;
  getStatus(pluginId: string): Promise<PluginStatus>;

  // Hot reload
  reload(pluginId: string): Promise<void>;
}
```

---
### Module: `plugin-sandbox`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Isolation; resource limits; security |
| **Enforcement** | Process isolation, capability-based security |

**Sandbox Configuration**:
```typescript
interface SandboxConfig {
  // Process isolation
  processIsolation: boolean;       // Run in separate process
  containerIsolation: boolean;     // Run in container

  // Resource limits
  resourceLimits: {
    maxMemoryMb: number;           // Memory limit
    maxCpuPercent: number;         // CPU limit
    maxDiskMb: number;             // Disk quota
    maxNetworkBandwidth: number;   // Network bandwidth limit
  };

  // Network restrictions
  networkPolicy: {
    allowedHosts: string[];        // Allowed outbound hosts
    blockedHosts: string[];        // Blocked hosts
    allowOutbound: boolean;        // Allow any outbound
  };

  // Filesystem restrictions
  filesystemPolicy: {
    readOnlyPaths: string[];
    writablePaths: string[];
    blockedPaths: string[];
  };

  // Timeouts
  timeouts: {
    initializationMs: number;
    operationMs: number;
    shutdownMs: number;
  };
}
```
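As a concrete illustration, a sandbox profile for a small connector plugin and a helper that checks observed usage against the configured limits might look like this. The values and the `withinLimits` helper are hypothetical, not recommended defaults:

```typescript
// Hypothetical sandbox profile for a small connector plugin, plus a
// helper that checks observed usage against the configured limits.
const connectorLimits = {
  maxMemoryMb: 256,
  maxCpuPercent: 50,
  maxDiskMb: 512,
  maxNetworkBandwidth: 10, // MB/s
};

interface ObservedUsage {
  memoryMb: number;
  cpuPercent: number;
  diskMb: number;
}

// True when the plugin's observed usage stays inside every limit.
function withinLimits(usage: ObservedUsage, limits: typeof connectorLimits): boolean {
  return usage.memoryMb <= limits.maxMemoryMb
    && usage.cpuPercent <= limits.maxCpuPercent
    && usage.diskMb <= limits.maxDiskMb;
}
```

A supervising loop would poll usage and stop (or mark `degraded`) any plugin for which `withinLimits` returns false.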
**Capability Enforcement**:
```typescript
interface PluginCapabilities {
  // Integration capabilities
  integrations: {
    scm: boolean;
    ci: boolean;
    registry: boolean;
    vault: boolean;
    router: boolean;
  };

  // Step capabilities
  steps: {
    deploy: boolean;
    gate: boolean;
    notify: boolean;
    custom: boolean;
  };

  // System capabilities
  system: {
    network: boolean;
    filesystem: boolean;
    secrets: boolean;
    database: boolean;
  };
}
```
---

### Module: `plugin-sdk`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | SDK for plugin development |
| **Languages** | C#, TypeScript, Go |

**Plugin SDK Interface**:
```typescript
// Base plugin interface
interface StellaPlugin {
  // Lifecycle
  initialize(config: PluginConfig): Promise<void>;
  start(): Promise<void>;
  stop(): Promise<void>;
  dispose(): Promise<void>;

  // Health
  getHealth(): Promise<HealthStatus>;

  // Metadata
  getManifest(): PluginManifest;
}

// Connector plugin interface
interface ConnectorPlugin extends StellaPlugin {
  createConnector(config: ConnectorConfig): Promise<Connector>;
}

// Step provider plugin interface
interface StepProviderPlugin extends StellaPlugin {
  getStepTypes(): StepType[];
  executeStep(
    stepType: string,
    config: StepConfig,
    inputs: StepInputs,
    context: StepContext
  ): AsyncGenerator<StepEvent>;
}

// Gate provider plugin interface
interface GateProviderPlugin extends StellaPlugin {
  getGateTypes(): GateType[];
  evaluateGate(
    gateType: string,
    config: GateConfig,
    context: GateContext
  ): Promise<GateResult>;
}
```

---
## Three-Surface Plugin Model

Plugins contribute to the system through three distinct surfaces:

### 1. Manifest Surface (Static)

The plugin manifest declares:
- Plugin identity and version
- Required capabilities
- Provided integrations/steps/gates
- Configuration schema
- UI components (optional)

```yaml
# plugin.stella.yaml
plugin:
  id: "com.example.jenkins-connector"
  version: "1.0.0"
  vendor: "Example Corp"
  license: "Apache-2.0"
  description: "Jenkins CI integration for Stella Ops"

  capabilities:
    required:
      - network
    optional:
      - secrets

  provides:
    integrations:
      - type: "ci.jenkins"
        displayName: "Jenkins"
        configSchema: "./schemas/jenkins-config.json"
        capabilities:
          - "pipelines"
          - "builds"
          - "artifacts"

    steps:
      - type: "jenkins-trigger"
        displayName: "Trigger Jenkins Build"
        category: "integration"
        configSchema: "./schemas/jenkins-trigger-config.json"
        inputSchema: "./schemas/jenkins-trigger-input.json"
        outputSchema: "./schemas/jenkins-trigger-output.json"

  ui:
    configScreen: "./ui/config.html"
    icon: "./assets/jenkins-icon.svg"

  dependencies:
    stellaCore: ">=1.0.0"
```
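The registry's "manifest validation" step would reject a manifest like the one above if basic identity fields are missing before applying the full JSON Schema. A minimal sketch of that first-pass check — the function and required-field list are illustrative assumptions:

```typescript
// Illustrative first-pass manifest check; the real registry would
// validate against the full JSON Schema after this shape check.
function missingManifestFields(manifest: Record<string, unknown>): string[] {
  const required = ["id", "version", "vendor", "license"];
  return required.filter(field => !(field in manifest) || manifest[field] === "");
}

const ok = missingManifestFields({
  id: "com.example.jenkins-connector",
  version: "1.0.0",
  vendor: "Example Corp",
  license: "Apache-2.0",
});

const bad = missingManifestFields({ id: "com.example.broken" });
```

`ok` is empty; `bad` lists every missing field so the registry can report them all at once instead of failing on the first.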
### 2. Connector Runtime Surface (Dynamic)

Plugins implement connector interfaces for runtime operations:

```typescript
// Jenkins connector implementation
class JenkinsConnector implements CIConnector {
  private client: JenkinsClient;

  async initialize(config: ConnectorConfig, secrets: SecretHandle[]): Promise<void> {
    const apiToken = await this.getSecret(secrets, "api_token");
    this.client = new JenkinsClient({
      baseUrl: config.endpoint,
      username: config.username,
      apiToken: apiToken,
    });
  }

  async testConnection(): Promise<ConnectionTestResult> {
    try {
      await this.client.getCrumb();
      return { success: true, message: "Connected to Jenkins" };
    } catch (error) {
      return { success: false, message: error.message };
    }
  }

  async listPipelines(): Promise<PipelineInfo[]> {
    const jobs = await this.client.getJobs();
    return jobs.map(job => ({
      id: job.name,
      name: job.displayName,
      url: job.url,
      lastBuild: job.lastBuild?.number,
    }));
  }

  async triggerPipeline(pipelineId: string, params: object): Promise<PipelineRun> {
    const queueItem = await this.client.build(pipelineId, params);
    return {
      id: queueItem.id.toString(),
      pipelineId,
      status: "queued",
      startedAt: new Date(),
    };
  }

  async getPipelineRun(runId: string): Promise<PipelineRun> {
    const build = await this.client.getBuild(runId);
    return {
      id: build.number.toString(),
      pipelineId: build.job,
      status: this.mapStatus(build.result),
      startedAt: new Date(build.timestamp),
      completedAt: build.result ? new Date(build.timestamp + build.duration) : null,
    };
  }
}
```
### 3. Step Provider Surface (Execution)

Plugins implement step execution logic:

```typescript
// Jenkins trigger step implementation
class JenkinsTriggerStep implements StepExecutor {
  async *execute(
    config: StepConfig,
    inputs: StepInputs,
    context: StepContext
  ): AsyncGenerator<StepEvent> {
    const connector = await context.getConnector<JenkinsConnector>(config.integrationId);

    yield { type: "log", line: `Triggering Jenkins job: ${config.jobName}` };

    // Trigger build
    const run = await connector.triggerPipeline(config.jobName, inputs.parameters);
    yield { type: "output", name: "buildId", value: run.id };
    yield { type: "log", line: `Build queued: ${run.id}` };

    // Wait for completion if configured
    if (config.waitForCompletion) {
      yield { type: "log", line: "Waiting for build to complete..." };

      while (true) {
        const status = await connector.getPipelineRun(run.id);

        if (status.status === "succeeded") {
          yield { type: "output", name: "status", value: "succeeded" };
          yield { type: "result", success: true };
          return;
        }

        if (status.status === "failed") {
          yield { type: "output", name: "status", value: "failed" };
          yield { type: "result", success: false, message: "Build failed" };
          return;
        }

        yield { type: "progress", progress: 50, message: `Build running: ${status.status}` };
        await sleep(config.pollIntervalSeconds * 1000);
      }
    }

    yield { type: "result", success: true };
  }
}
```

---
## Database Schema

```sql
-- Plugins
CREATE TABLE release.plugins (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    plugin_id VARCHAR(255) NOT NULL UNIQUE,
    version VARCHAR(50) NOT NULL,
    vendor VARCHAR(255) NOT NULL,
    license VARCHAR(100),
    manifest JSONB NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'discovered' CHECK (status IN (
        'discovered', 'loaded', 'active', 'stopped', 'failed', 'degraded'
    )),
    entrypoint VARCHAR(500) NOT NULL,
    last_health_check TIMESTAMPTZ,
    health_message TEXT,
    installed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_plugins_status ON release.plugins(status);

-- Plugin Instances (per-tenant configuration)
CREATE TABLE release.plugin_instances (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    plugin_id UUID NOT NULL REFERENCES release.plugins(id) ON DELETE CASCADE,
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    config JSONB NOT NULL DEFAULT '{}',
    enabled BOOLEAN NOT NULL DEFAULT TRUE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_plugin_instances_tenant ON release.plugin_instances(tenant_id);

-- Integration types (populated by plugins)
CREATE TABLE release.integration_types (
    id TEXT PRIMARY KEY,              -- "scm.github", "ci.jenkins"
    plugin_id UUID REFERENCES release.plugins(id),
    display_name TEXT NOT NULL,
    description TEXT,
    icon_url TEXT,
    config_schema JSONB NOT NULL,     -- JSON Schema for config
    capabilities TEXT[] NOT NULL,     -- ["repos", "webhooks", "status"]
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```

---
## API Endpoints

```yaml
# Plugin Registry
GET    /api/v1/plugins
  Query: ?status={status}&capability={type}
  Response: Plugin[]

GET    /api/v1/plugins/{id}
  Response: Plugin (with manifest)

POST   /api/v1/plugins/{id}/enable
  Response: Plugin

POST   /api/v1/plugins/{id}/disable
  Response: Plugin

GET    /api/v1/plugins/{id}/health
  Response: { status, message, diagnostics[] }

# Plugin Instances (per-tenant config)
POST   /api/v1/plugin-instances
  Body: { pluginId: UUID, config: object }
  Response: PluginInstance

GET    /api/v1/plugin-instances
  Response: PluginInstance[]

PUT    /api/v1/plugin-instances/{id}
  Body: { config: object, enabled: boolean }
  Response: PluginInstance

DELETE /api/v1/plugin-instances/{id}
  Response: { deleted: true }
```

---
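A client calling the endpoints above would typically centralize path construction. A small sketch for the health endpoint — the base URL and helper name are assumptions for illustration, not part of the published API surface:

```typescript
// Illustrative path builder for the plugin health endpoint above.
// Only path construction is shown; authentication is omitted.
function pluginHealthUrl(base: string, id: string): string {
  return `${base.replace(/\/$/, "")}/api/v1/plugins/${encodeURIComponent(id)}/health`;
}

const url = pluginHealthUrl("https://stella.local/", "com.example.jenkins-connector");
```

Encoding the id guards against plugin ids containing reserved URL characters.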
## Plugin Security

### Capability Declaration

Plugins must declare all required capabilities in their manifest. The system enforces:

1. **Network Access**: Plugins can only access declared hosts
2. **Secret Access**: Plugins receive secrets through controlled injection
3. **Database Access**: No direct database access; API only
4. **Filesystem Access**: Limited to declared paths
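The network-access rule can be sketched against the `networkPolicy` shape from the sandbox configuration. The precedence shown here (block list wins, then allow list, then the `allowOutbound` fallback) is an assumption about evaluation order, not documented behavior:

```typescript
// Sketch of the "plugins can only access declared hosts" rule.
// Precedence assumed: blocked > allowed > allowOutbound fallback.
interface NetworkPolicy {
  allowedHosts: string[];
  blockedHosts: string[];
  allowOutbound: boolean;
}

function isHostAllowed(policy: NetworkPolicy, host: string): boolean {
  if (policy.blockedHosts.includes(host)) return false;
  if (policy.allowedHosts.includes(host)) return true;
  return policy.allowOutbound;
}

const policy: NetworkPolicy = {
  allowedHosts: ["jenkins.internal"],
  blockedHosts: ["metadata.internal"],
  allowOutbound: false,
};
```

With `allowOutbound: false`, anything not explicitly declared is denied, which matches the default-deny posture the list above describes.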
### Sandbox Enforcement

```typescript
// Plugin execution is sandboxed
class PluginSandbox {
  async execute<T>(
    plugin: Plugin,
    operation: () => Promise<T>
  ): Promise<T> {
    // 1. Verify capabilities
    this.verifyCapabilities(plugin);

    // 2. Set resource limits
    const limits = this.getResourceLimits(plugin);
    await this.applyLimits(limits);

    // 3. Create isolated context
    const context = await this.createIsolatedContext(plugin);

    try {
      // 4. Execute with timeout
      return await this.withTimeout(
        operation(),
        plugin.manifest.timeouts.operationMs
      );
    } catch (error) {
      // 5. Log and handle errors
      await this.handlePluginError(plugin, error);
      throw error;
    } finally {
      // 6. Cleanup
      await context.dispose();
    }
  }
}
```
### Plugin Failures Cannot Crash Core

```csharp
// Core orchestration is protected from plugin failures
public sealed class PromotionDecisionEngine
{
    public async Task<DecisionResult> EvaluateAsync(
        Promotion promotion,
        IReadOnlyList<IGateProvider> gates,
        CancellationToken ct)
    {
        var results = new List<GateResult>();

        foreach (var gate in gates)
        {
            try
            {
                // Plugin provides evaluation logic
                var result = await gate.EvaluateAsync(promotion, ct);
                results.Add(result);
            }
            catch (Exception ex)
            {
                // Plugin failure is logged but doesn't crash core
                _logger.LogError(ex, "Gate {GateType} failed", gate.Type);
                results.Add(new GateResult
                {
                    GateType = gate.Type,
                    Status = GateStatus.Failed,
                    Message = $"Gate evaluation failed: {ex.Message}",
                    IsBlocking = gate.IsBlocking,
                });
            }

            // Core decides how to aggregate (plugins cannot override):
            // stop early only on a blocking *failure* under fail-fast policy
            var last = results[^1];
            if (_policy.FailFast && last.IsBlocking && last.Status == GateStatus.Failed)
                break;
        }

        // Core makes final decision
        return _decisionAggregator.Aggregate(results);
    }
}
```

---

## References

- [Module Overview](overview.md)
- [Integration Hub](integration-hub.md)
- [Workflow Engine](workflow-engine.md)
- [Connector Interface](../integrations/connectors.md)
@@ -0,0 +1,471 @@
# PROGDL: Progressive Delivery

**Purpose**: A/B releases, canary deployments, and traffic management.

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                      PROGRESSIVE DELIVERY ARCHITECTURE                      │
│                                                                             │
│  ┌───────────────────────────────────────────────────────────────────────┐ │
│  │                          A/B RELEASE MANAGER                          │ │
│  │                                                                       │ │
│  │  - Create A/B release with variations                                 │ │
│  │  - Manage traffic split configuration                                 │ │
│  │  - Coordinate rollout stages                                          │ │
│  │  - Handle promotion/rollback                                          │ │
│  └───────────────────────────────────┬───────────────────────────────────┘ │
│                                      │                                     │
│                   ┌──────────────────┴──────────────────┐                  │
│                   │                                     │                  │
│                   ▼                                     ▼                  │
│  ┌───────────────────────┐             ┌───────────────────────┐           │
│  │   TARGET-GROUP A/B    │             │   ROUTER-BASED A/B    │           │
│  │                       │             │                       │           │
│  │  Deploy to groups     │             │  Configure traffic    │           │
│  │  by labels/membership │             │  via load balancer    │           │
│  │                       │             │                       │           │
│  │  Good for:            │             │  Good for:            │           │
│  │  - Background workers │             │  - Web/API traffic    │           │
│  │  - Batch processors   │             │  - Customer-facing    │           │
│  │  - Internal services  │             │  - L7 routing         │           │
│  └───────────────────────┘             └───────────────────────┘           │
│                                                                             │
│  ┌───────────────────────────────────────────────────────────────────────┐ │
│  │                           CANARY CONTROLLER                           │ │
│  │                                                                       │ │
│  │  - Execute rollout stages                                             │ │
│  │  - Monitor health metrics                                             │ │
│  │  - Auto-advance or pause                                              │ │
│  │  - Trigger rollback on failure                                        │ │
│  └───────────────────────────────────────────────────────────────────────┘ │
│                                                                             │
│  ┌───────────────────────────────────────────────────────────────────────┐ │
│  │                       TRAFFIC ROUTER INTEGRATION                      │ │
│  │                                                                       │ │
│  │  Plugin-based integration with:                                       │ │
│  │  - Nginx (config generation + reload)                                 │ │
│  │  - HAProxy (config generation + reload)                               │ │
│  │  - Traefik (dynamic config API)                                       │ │
│  │  - AWS ALB (target group weights)                                     │ │
│  │  - Custom (webhook)                                                   │ │
│  └───────────────────────────────────────────────────────────────────────┘ │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Modules

### Module: `ab-manager`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | A/B release lifecycle; variation management |
| **Dependencies** | `release-manager`, `environment-manager`, `deploy-orchestrator` |
| **Data Entities** | `ABRelease`, `Variation`, `TrafficSplit` |
| **Events Produced** | `ab.created`, `ab.started`, `ab.stage_advanced`, `ab.promoted`, `ab.rolled_back` |

**A/B Release Entity**:
```typescript
interface ABRelease {
  id: UUID;
  tenantId: UUID;
  environmentId: UUID;
  name: string;
  variations: Variation[];
  activeVariation: string;        // "A" or "B"
  trafficSplit: TrafficSplit;
  rolloutStrategy: RolloutStrategy;
  status: ABReleaseStatus;
  createdAt: DateTime;
  completedAt: DateTime | null;
  createdBy: UUID;
}

interface Variation {
  name: string;                   // "A", "B"
  releaseId: UUID;
  targetGroupId: UUID | null;     // for target-group based A/B
  trafficPercentage: number;
  deploymentJobId: UUID | null;
}

interface TrafficSplit {
  type: "percentage" | "sticky" | "header";
  percentages: Record<string, number>;  // {"A": 90, "B": 10}
  stickyKey?: string;                   // cookie or header name
  headerMatch?: {                       // for header-based routing
    header: string;
    values: Record<string, string>;     // value -> variation
  };
}

type ABReleaseStatus =
  | "created"       // Configured, not started
  | "deploying"     // Deploying variations
  | "running"       // Active with traffic split
  | "promoting"     // Promoting winner to 100%
  | "completed"     // Successfully completed
  | "rolled_back";  // Rolled back to original
```
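For the `percentage` split type above, a router ultimately reduces the split to a weighted pick per request. A minimal sketch — the `pickVariation` helper and its injected random draw are illustrative, not how any particular router plugin implements weighting:

```typescript
// Illustrative weighted pick for a percentage TrafficSplit.
// `r` is a uniform draw in [0, 1), injected so the choice is testable.
function pickVariation(percentages: Record<string, number>, r: number): string {
  let cumulative = 0;
  const names = Object.keys(percentages);
  for (const name of names) {
    cumulative += percentages[name] / 100;
    if (r < cumulative) return name;
  }
  return names[names.length - 1]; // guard against floating-point rounding
}
```

With `{"A": 90, "B": 10}`, draws below 0.9 land on `A` and the rest on `B`; a `sticky` split would instead hash `stickyKey` to keep each client pinned to one variation.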
**A/B Release Models**:

| Model | Description | Use Case |
|-------|-------------|----------|
| **Target-Group A/B** | Deploy different releases to different target groups | Background workers, internal services |
| **Router-Based A/B** | Use load balancer to split traffic | Web/API traffic, customer-facing |
| **Hybrid A/B** | Combination of both | Complex deployments |

---
### Module: `traffic-router`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Router plugin orchestration; traffic shifting |
| **Dependencies** | `integration-manager`, `connector-runtime` |
| **Protocol** | Plugin-specific (API calls, config generation) |

**Router Connector Interface**:
```typescript
interface RouterConnector extends BaseConnector {
  // Traffic management
  configureRoute(config: RouteConfig): Promise<void>;
  getTrafficDistribution(): Promise<TrafficDistribution>;
  shiftTraffic(from: string, to: string, percentage: number): Promise<void>;

  // Configuration
  reloadConfig(): Promise<void>;
  validateConfig(config: string): Promise<ValidationResult>;
}

interface RouteConfig {
  upstream: string;
  backends: Array<{
    name: string;
    targets: string[];
    weight: number;
  }>;
  healthCheck?: {
    path: string;
    interval: number;
    timeout: number;
  };
}

interface TrafficDistribution {
  backends: Array<{
    name: string;
    weight: number;
    healthyTargets: number;
    totalTargets: number;
  }>;
  timestamp: DateTime;
}
```

**Router Plugins**:

| Plugin | Capabilities |
|--------|-------------|
| `router.nginx` | Config generation, reload via signal/API |
| `router.haproxy` | Config generation, reload via socket |
| `router.traefik` | Dynamic config API |
| `router.aws_alb` | Target group weights via AWS API |
| `router.custom` | Webhook-based custom integration |

---
### Module: `canary-controller`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Canary ramp automation; health monitoring |
| **Dependencies** | `ab-manager`, `traffic-router` |
| **Data Entities** | `CanaryStage`, `HealthResult` |
| **Events Produced** | `canary.stage_started`, `canary.stage_passed`, `canary.stage_failed` |

**Canary Stage Entity**:
```typescript
interface CanaryStage {
  id: UUID;
  abReleaseId: UUID;
  stageNumber: number;
  trafficPercentage: number;
  status: CanaryStageStatus;
  healthThreshold: number;     // Required health % to pass
  durationSeconds: number;     // How long to run stage
  requireApproval: boolean;    // Require manual approval
  startedAt: DateTime | null;
  completedAt: DateTime | null;
  healthResult: HealthResult | null;
}

type CanaryStageStatus =
  | "pending"
  | "running"
  | "succeeded"
  | "failed"
  | "skipped";

interface HealthResult {
  healthy: boolean;
  healthPercentage: number;
  metrics: {
    successRate: number;
    errorRate: number;
    latencyP50: number;
    latencyP99: number;
  };
  samples: number;
  evaluatedAt: DateTime;
}
```
**Canary Rollout Execution**:
```typescript
class CanaryController {
  async executeRollout(abRelease: ABRelease): Promise<void> {
    const stages = abRelease.rolloutStrategy.stages;

    for (const stage of stages) {
      this.log(`Starting canary stage ${stage.stageNumber}: ${stage.trafficPercentage}%`);

      // 1. Shift traffic to canary percentage
      await this.trafficRouter.shiftTraffic(
        abRelease.variations[0].name,  // baseline
        abRelease.variations[1].name,  // canary
        stage.trafficPercentage
      );

      // 2. Update stage status
      stage.status = "running";
      stage.startedAt = new Date();
      await this.save(stage);

      // 3. Wait for stage duration
      await this.waitForDuration(stage.durationSeconds);

      // 4. Evaluate health
      const healthResult = await this.evaluateHealth(abRelease, stage);
      stage.healthResult = healthResult;

      if (!healthResult.healthy || healthResult.healthPercentage < stage.healthThreshold) {
        stage.status = "failed";
        await this.save(stage);

        // Rollback
        await this.rollback(abRelease);
        throw new CanaryFailedError(`Stage ${stage.stageNumber} failed health check`);
      }

      // 5. Check if approval required
      if (stage.requireApproval) {
        await this.waitForApproval(abRelease, stage);
      }

      stage.status = "succeeded";
      stage.completedAt = new Date();
      await this.save(stage);

      // 6. Check for auto-advance
      if (!abRelease.rolloutStrategy.autoAdvance) {
        await this.waitForManualAdvance(abRelease);
      }
    }

    // All stages passed - promote canary to 100%
    await this.promote(abRelease, abRelease.variations[1].name);
  }

  private async evaluateHealth(abRelease: ABRelease, stage: CanaryStage): Promise<HealthResult> {
    // Collect metrics from targets
    const canaryVariation = abRelease.variations.find(v => v.name === "B");
    const targets = await this.getTargets(canaryVariation.targetGroupId);

    let healthyCount = 0;
    let totalLatency = 0;
    let errorCount = 0;

    for (const target of targets) {
      const health = await this.checkTargetHealth(target);
      if (health.healthy) healthyCount++;
      totalLatency += health.latencyMs;
      if (health.errorRate > 0) errorCount++;
    }

    return {
      healthy: healthyCount >= targets.length * (stage.healthThreshold / 100),
      healthPercentage: (healthyCount / targets.length) * 100,
      metrics: {
        successRate: ((targets.length - errorCount) / targets.length) * 100,
        errorRate: (errorCount / targets.length) * 100,
        latencyP50: totalLatency / targets.length,
        latencyP99: totalLatency / targets.length * 1.5,  // simplified
      },
      samples: targets.length,
|
||||
evaluatedAt: new Date(),
|
||||
};
|
||||
}
|
||||
}
|
||||
```

---

### Module: `rollout-strategy`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Strategy templates; configuration |
| **Data Entities** | `RolloutStrategyTemplate` |

**Built-in Strategy Templates**:

| Template | Stages | Description |
|----------|--------|-------------|
| `canary-10-25-50-100` | 4 | Standard canary: 10%, 25%, 50%, 100% |
| `canary-1-5-10-50-100` | 5 | Conservative: 1%, 5%, 10%, 50%, 100% |
| `blue-green-instant` | 2 | Deploy 100% to green, instant switch |
| `blue-green-gradual` | 4 | Gradual shift: 25%, 50%, 75%, 100% |

**Rollout Strategy Definition**:

```typescript
interface RolloutStrategy {
  id: UUID;
  name: string;
  stages: Array<{
    trafficPercentage: number;
    durationSeconds: number;
    healthThreshold: number;
    requireApproval: boolean;
  }>;
  autoAdvance: boolean;
  rollbackOnFailure: boolean;
  healthCheckInterval: number;
}

// Example: Standard Canary
const standardCanary: RolloutStrategy = {
  name: "canary-10-25-50-100",
  stages: [
    { trafficPercentage: 10, durationSeconds: 300, healthThreshold: 95, requireApproval: false },
    { trafficPercentage: 25, durationSeconds: 600, healthThreshold: 95, requireApproval: false },
    { trafficPercentage: 50, durationSeconds: 900, healthThreshold: 95, requireApproval: true },
    { trafficPercentage: 100, durationSeconds: 0, healthThreshold: 95, requireApproval: false },
  ],
  autoAdvance: true,
  rollbackOnFailure: true,
  healthCheckInterval: 30,
};
```

---

## Database Schema

```sql
-- A/B Releases
CREATE TABLE release.ab_releases (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    environment_id UUID NOT NULL REFERENCES release.environments(id),
    name VARCHAR(255) NOT NULL,
    variations JSONB NOT NULL, -- [{name, releaseId, targetGroupId, trafficPercentage}]
    active_variation VARCHAR(50) NOT NULL DEFAULT 'A',
    traffic_split JSONB NOT NULL,
    rollout_strategy JSONB NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'created' CHECK (status IN (
        'created', 'deploying', 'running', 'promoting', 'completed', 'rolled_back'
    )),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    completed_at TIMESTAMPTZ,
    created_by UUID REFERENCES users(id)
);

CREATE INDEX idx_ab_releases_tenant_env ON release.ab_releases(tenant_id, environment_id);
CREATE INDEX idx_ab_releases_status ON release.ab_releases(status);

-- Canary Stages
CREATE TABLE release.canary_stages (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    ab_release_id UUID NOT NULL REFERENCES release.ab_releases(id) ON DELETE CASCADE,
    stage_number INTEGER NOT NULL,
    traffic_percentage INTEGER NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN (
        'pending', 'running', 'succeeded', 'failed', 'skipped'
    )),
    health_threshold DECIMAL(5,2),
    duration_seconds INTEGER,
    require_approval BOOLEAN NOT NULL DEFAULT FALSE,
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    health_result JSONB,
    UNIQUE (ab_release_id, stage_number)
);
```

---

## API Endpoints

```yaml
# A/B Releases
POST /api/v1/ab-releases
  Body: {
    environmentId: UUID,
    name: string,
    variations: [
      { name: "A", releaseId: UUID, targetGroupId?: UUID },
      { name: "B", releaseId: UUID, targetGroupId?: UUID }
    ],
    trafficSplit: TrafficSplit,
    rolloutStrategy: RolloutStrategy
  }
  Response: ABRelease

GET /api/v1/ab-releases
  Query: ?environmentId={uuid}&status={status}
  Response: ABRelease[]

GET /api/v1/ab-releases/{id}
  Response: ABRelease (with stages)

POST /api/v1/ab-releases/{id}/start
  Response: ABRelease

POST /api/v1/ab-releases/{id}/advance
  Body: { stageNumber?: number }  # advance to next or specific stage
  Response: ABRelease

POST /api/v1/ab-releases/{id}/promote
  Body: { variation: "A" | "B" }  # promote to 100%
  Response: ABRelease

POST /api/v1/ab-releases/{id}/rollback
  Response: ABRelease

GET /api/v1/ab-releases/{id}/traffic
  Response: { currentSplit: TrafficDistribution, history: TrafficHistory[] }

GET /api/v1/ab-releases/{id}/health
  Response: { variations: [{ name, healthStatus, metrics }] }

# Rollout Strategies
GET /api/v1/rollout-strategies
  Response: RolloutStrategyTemplate[]

GET /api/v1/rollout-strategies/{id}
  Response: RolloutStrategyTemplate
```

---

## References

- [Module Overview](overview.md)
- [Deploy Orchestrator](deploy-orchestrator.md)
- [A/B Releases](../progressive-delivery/ab-releases.md)
- [Canary Controller](../progressive-delivery/canary.md)
- [Router Plugins](../progressive-delivery/routers.md)

433
docs/modules/release-orchestrator/modules/promotion-manager.md
Normal file
@@ -0,0 +1,433 @@

# PROMOT: Promotion & Approval Manager

**Purpose**: Manage promotion requests, approvals, gates, and decision records.

## Modules

### Module: `promotion-manager`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Promotion request lifecycle; state management |
| **Dependencies** | `release-manager`, `environment-manager`, `workflow-engine` |
| **Data Entities** | `Promotion`, `PromotionState` |
| **Events Produced** | `promotion.requested`, `promotion.approved`, `promotion.rejected`, `promotion.started`, `promotion.completed`, `promotion.failed`, `promotion.rolled_back` |

**Key Operations**:
```
RequestPromotion(releaseId, targetEnvironmentId, reason) → Promotion
ApprovePromotion(promotionId, comment) → Promotion
RejectPromotion(promotionId, reason) → Promotion
CancelPromotion(promotionId) → Promotion
GetPromotionStatus(promotionId) → PromotionState
GetDecisionRecord(promotionId) → DecisionRecord
```

**Promotion Entity**:
```typescript
interface Promotion {
  id: UUID;
  tenantId: UUID;
  releaseId: UUID;
  sourceEnvironmentId: UUID | null;  // null for first deployment
  targetEnvironmentId: UUID;
  status: PromotionStatus;
  decisionRecord: DecisionRecord;
  workflowRunId: UUID | null;
  requestedAt: DateTime;
  requestedBy: UUID;
  requestReason: string;
  decidedAt: DateTime | null;
  startedAt: DateTime | null;
  completedAt: DateTime | null;
  evidencePacketId: UUID | null;
}

type PromotionStatus =
  | "pending_approval"  // Waiting for human approval
  | "pending_gate"      // Waiting for gate evaluation
  | "approved"          // Ready for deployment
  | "rejected"          // Blocked by approval or gate
  | "deploying"         // Deployment in progress
  | "deployed"          // Successfully deployed
  | "failed"            // Deployment failed
  | "cancelled"         // User cancelled
  | "rolled_back";      // Rolled back after failure
```

---

### Module: `approval-gateway`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Approval collection; separation of duties enforcement |
| **Dependencies** | `authority` (for user/group lookup) |
| **Data Entities** | `Approval`, `ApprovalPolicy` |
| **Events Produced** | `approval.granted`, `approval.denied` |

**Approval Policy Entity**:
```typescript
interface ApprovalPolicy {
  id: UUID;
  tenantId: UUID;
  environmentId: UUID;
  requiredCount: number;               // Minimum approvals required
  requiredRoles: string[];             // At least one approver must have role
  requiredGroups: string[];            // At least one approver must be in group
  requireSeparationOfDuties: boolean;  // Requester cannot approve
  allowSelfApproval: boolean;          // Override SoD for specific users
  expirationMinutes: number;           // Approval expires after N minutes
}

interface Approval {
  id: UUID;
  tenantId: UUID;
  promotionId: UUID;
  approverId: UUID;
  action: "approved" | "rejected";
  comment: string;
  approvedAt: DateTime;
  approverRole: string;
  approverGroups: string[];
}
```

**Separation of Duties (SoD) Rules**:
1. Requester cannot approve their own promotion (if `requireSeparationOfDuties` is true)
2. Same user cannot approve twice
3. At least N different users must approve (based on `requiredCount`)
4. At least one approver must match `requiredRoles` if specified
5. At least one approver must be in `requiredGroups` if specified
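The five rules above can be sketched as a single policy check. This is a minimal illustration with trimmed entity shapes; the helper name `evaluateApprovals` is an assumption, not part of the specification.

```typescript
// Sketch only: entity shapes are trimmed, and `evaluateApprovals` is a
// hypothetical helper name, not part of the specification.
interface ApprovalInput {
  approverId: string;
  action: "approved" | "rejected";
  approverRole: string;
  approverGroups: string[];
}

interface PolicyInput {
  requiredCount: number;
  requiredRoles: string[];
  requiredGroups: string[];
  requireSeparationOfDuties: boolean;
}

function evaluateApprovals(
  policy: PolicyInput,
  requesterId: string,
  approvals: ApprovalInput[]
): { satisfied: boolean; reasons: string[] } {
  const reasons: string[] = [];

  // Rules 1 + 2: under SoD the requester's approval does not count,
  // and each user counts at most once.
  const byUser = new Map<string, ApprovalInput>();
  for (const a of approvals) {
    if (a.action !== "approved") continue;
    if (policy.requireSeparationOfDuties && a.approverId === requesterId) continue;
    if (!byUser.has(a.approverId)) byUser.set(a.approverId, a);
  }
  const distinct = Array.from(byUser.values());

  // Rule 3: at least N different users must approve.
  if (distinct.length < policy.requiredCount) {
    reasons.push(`need ${policy.requiredCount} approvals, have ${distinct.length}`);
  }
  // Rule 4: at least one approver with a required role, if configured.
  if (policy.requiredRoles.length > 0 &&
      !distinct.some(a => policy.requiredRoles.includes(a.approverRole))) {
    reasons.push("no approver holds a required role");
  }
  // Rule 5: at least one approver in a required group, if configured.
  if (policy.requiredGroups.length > 0 &&
      !distinct.some(a => a.approverGroups.some(g => policy.requiredGroups.includes(g)))) {
    reasons.push("no approver belongs to a required group");
  }

  return { satisfied: reasons.length === 0, reasons };
}
```

Note that a self-approval under SoD is simply not counted here; whether it should additionally be recorded as a `sodViolation` in the decision record is a policy choice.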

---

### Module: `decision-engine`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Gate evaluation; policy integration; decision record generation |
| **Dependencies** | `gate-registry`, `policy` (OPA integration), `scanner` (security data) |
| **Data Entities** | `DecisionRecord`, `GateResult` |
| **Events Produced** | `decision.evaluated`, `decision.recorded` |

**Decision Record Structure**:
```typescript
interface DecisionRecord {
  promotionId: UUID;
  evaluatedAt: DateTime;
  decision: "allow" | "deny" | "pending";

  // What was evaluated
  release: {
    id: UUID;
    name: string;
    components: Array<{
      name: string;
      digest: string;
      semver: string;
    }>;
  };

  environment: {
    id: UUID;
    name: string;
    requiredApprovals: number;
    freezeWindow: boolean;
  };

  // Gate evaluation results
  gates: GateResult[];

  // Approval status
  approvalStatus: {
    required: number;
    received: number;
    approvers: Array<{
      userId: UUID;
      action: string;
      at: DateTime;
    }>;
    sodViolation: boolean;
  };

  // Reason for decision
  reasons: string[];

  // Hash of all inputs for replay verification
  inputsHash: string;
}

interface GateResult {
  gateType: string;
  gateName: string;
  status: "passed" | "failed" | "warning" | "skipped";
  message: string;
  details: Record<string, any>;
  evaluatedAt: DateTime;
  durationMs: number;
}
```

**Gate Evaluation Order**:
1. **Freeze Window Check**: Is the environment in freeze?
2. **Approval Check**: Have all required approvals been received?
3. **Security Gate**: Are there no blocking vulnerabilities?
4. **Custom Policy Gates**: Do all OPA policies pass?
5. **Integration Gates**: Do external system checks pass?
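A minimal sketch of how the engine might walk this fixed order, recording later gates as skipped once a blocking gate fails. The `GateSpec`/`evaluateGates` names and the short-circuit behaviour are illustrative assumptions, not spec API.

```typescript
// Illustrative only: `GateSpec` and `evaluateGates` are assumed names; the
// short-circuit-on-failure behaviour sketches the ordered evaluation above.
interface GateSpec {
  name: string;
  run: () => { passed: boolean; message: string };
}

interface GateOutcome {
  gateName: string;
  status: "passed" | "failed" | "skipped";
  message: string;
}

function evaluateGates(orderedGates: GateSpec[]): {
  decision: "allow" | "deny";
  gates: GateOutcome[];
} {
  const results: GateOutcome[] = [];
  let failed = false;

  for (const gate of orderedGates) {
    if (failed) {
      // A blocking failure short-circuits: later gates are never run.
      results.push({ gateName: gate.name, status: "skipped", message: "earlier gate failed" });
      continue;
    }
    const { passed, message } = gate.run();
    results.push({ gateName: gate.name, status: passed ? "passed" : "failed", message });
    if (!passed) failed = true;
  }

  return { decision: failed ? "deny" : "allow", gates: results };
}
```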

---

### Module: `gate-registry`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Built-in + custom gate registration |
| **Dependencies** | `plugin-registry` |
| **Data Entities** | `GateDefinition`, `GateConfig` |

**Built-in Gates**:

| Gate Type | Description |
|-----------|-------------|
| `freeze-window` | Check if environment is in freeze |
| `approval` | Check if required approvals received |
| `security-scan` | Check for blocking vulnerabilities |
| `scan-freshness` | Check if scan is recent enough |
| `digest-verification` | Verify digests haven't changed |
| `environment-sequence` | Enforce promotion order |
| `custom-opa` | Custom OPA/Rego policy |
| `webhook` | External webhook gate |

**Gate Definition**:
```typescript
interface GateDefinition {
  type: string;
  displayName: string;
  description: string;
  configSchema: JSONSchema;
  evaluator: "builtin" | UUID;  // builtin or plugin ID
  blocking: boolean;            // Can block promotion
  cacheable: boolean;           // Can cache result
  cacheTtlSeconds: number;
}
```

---

## Promotion State Machine

```
                     PROMOTION STATE MACHINE

  ┌───────────────┐
  │   REQUESTED   │ ◄──── User requests promotion
  └───────┬───────┘
          │
          ▼
  ┌───────────────┐      ┌───────────────┐
  │    PENDING    │─────►│   REJECTED    │ ◄──── Approver rejects
  │   APPROVAL    │      └───────────────┘
  └───────┬───────┘
          │ approval received
          ▼
  ┌───────────────┐      ┌───────────────┐
  │    PENDING    │─────►│   REJECTED    │ ◄──── Gate fails
  │     GATE      │      └───────────────┘
  └───────┬───────┘
          │ all gates pass
          ▼
  ┌───────────────┐
  │   APPROVED    │ ◄──── Ready for deployment
  └───────┬───────┘
          │ workflow starts
          ▼
  ┌───────────────┐      ┌───────────────┐      ┌───────────────┐
  │   DEPLOYING   │─────►│    FAILED     │─────►│  ROLLED_BACK  │
  └───────┬───────┘      └───────────────┘      └───────────────┘
          │ deployment complete
          ▼
  ┌───────────────┐
  │   DEPLOYED    │ ◄──── Success!
  └───────────────┘

  Additional transitions:
  - Any non-terminal state → CANCELLED: user cancels
```
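The transitions in the diagram can be captured as an adjacency map with a small guard. A sketch under the assumption that state names mirror the `PromotionStatus` values; `canTransition` and the choice of terminal states are illustrative, not spec API.

```typescript
// Sketch: the adjacency map mirrors the state machine diagram above;
// `canTransition` is an assumed helper name, not part of the specification.
type PromoState =
  | "requested" | "pending_approval" | "pending_gate" | "approved"
  | "deploying" | "deployed" | "failed" | "rejected"
  | "rolled_back" | "cancelled";

const TRANSITIONS: Record<PromoState, PromoState[]> = {
  requested: ["pending_approval"],
  pending_approval: ["pending_gate", "rejected"],
  pending_gate: ["approved", "rejected"],
  approved: ["deploying"],
  deploying: ["deployed", "failed"],
  failed: ["rolled_back"],
  deployed: [],
  rejected: [],
  rolled_back: [],
  cancelled: [],
};

// Terminal states accept no further transitions, including cancellation.
const TERMINAL: PromoState[] = ["deployed", "rejected", "rolled_back", "cancelled"];

function canTransition(from: PromoState, to: PromoState): boolean {
  // Any non-terminal state may be cancelled by the user.
  if (to === "cancelled") return !TERMINAL.includes(from);
  return TRANSITIONS[from].includes(to);
}
```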

---

## Database Schema

```sql
-- Promotions
CREATE TABLE release.promotions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    release_id UUID NOT NULL REFERENCES release.releases(id),
    source_environment_id UUID REFERENCES release.environments(id),
    target_environment_id UUID NOT NULL REFERENCES release.environments(id),
    status VARCHAR(50) NOT NULL DEFAULT 'pending_approval' CHECK (status IN (
        'pending_approval', 'pending_gate', 'approved', 'rejected',
        'deploying', 'deployed', 'failed', 'cancelled', 'rolled_back'
    )),
    decision_record JSONB,
    workflow_run_id UUID REFERENCES release.workflow_runs(id),
    requested_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    requested_by UUID NOT NULL REFERENCES users(id),
    request_reason TEXT,
    decided_at TIMESTAMPTZ,
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    evidence_packet_id UUID
);

CREATE INDEX idx_promotions_tenant ON release.promotions(tenant_id);
CREATE INDEX idx_promotions_release ON release.promotions(release_id);
CREATE INDEX idx_promotions_status ON release.promotions(status);
CREATE INDEX idx_promotions_target_env ON release.promotions(target_environment_id);

-- Approvals
CREATE TABLE release.approvals (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    promotion_id UUID NOT NULL REFERENCES release.promotions(id) ON DELETE CASCADE,
    approver_id UUID NOT NULL REFERENCES users(id),
    action VARCHAR(50) NOT NULL CHECK (action IN ('approved', 'rejected')),
    comment TEXT,
    approved_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    approver_role VARCHAR(255),
    approver_groups JSONB NOT NULL DEFAULT '[]'
);

CREATE INDEX idx_approvals_promotion ON release.approvals(promotion_id);
CREATE INDEX idx_approvals_approver ON release.approvals(approver_id);

-- Approval Policies
CREATE TABLE release.approval_policies (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    environment_id UUID NOT NULL REFERENCES release.environments(id) ON DELETE CASCADE,
    required_count INTEGER NOT NULL DEFAULT 1,
    required_roles JSONB NOT NULL DEFAULT '[]',
    required_groups JSONB NOT NULL DEFAULT '[]',
    require_sod BOOLEAN NOT NULL DEFAULT FALSE,
    allow_self_approval BOOLEAN NOT NULL DEFAULT FALSE,
    expiration_minutes INTEGER NOT NULL DEFAULT 1440,
    UNIQUE (tenant_id, environment_id)
);
```

---

## API Endpoints

```yaml
# Promotions
POST /api/v1/promotions
  Body: { releaseId, targetEnvironmentId, reason? }
  Response: Promotion

GET /api/v1/promotions
  Query: ?status={status}&releaseId={uuid}&environmentId={uuid}&page={n}
  Response: { data: Promotion[], meta: PaginationMeta }

GET /api/v1/promotions/{id}
  Response: Promotion (with decision record, approvals)

POST /api/v1/promotions/{id}/approve
  Body: { comment? }
  Response: Promotion

POST /api/v1/promotions/{id}/reject
  Body: { reason }
  Response: Promotion

POST /api/v1/promotions/{id}/cancel
  Response: Promotion

GET /api/v1/promotions/{id}/decision
  Response: DecisionRecord

GET /api/v1/promotions/{id}/approvals
  Response: Approval[]

GET /api/v1/promotions/{id}/evidence
  Response: EvidencePacket

# Gate Evaluation Preview
POST /api/v1/promotions/preview-gates
  Body: { releaseId, targetEnvironmentId }
  Response: { wouldPass: boolean, gates: GateResult[] }

# Approval Policies
POST   /api/v1/approval-policies
GET    /api/v1/approval-policies
GET    /api/v1/approval-policies/{id}
PUT    /api/v1/approval-policies/{id}
DELETE /api/v1/approval-policies/{id}

# Pending Approvals (for current user)
GET /api/v1/my/pending-approvals
  Response: Promotion[]
```

---

## Security Gate Integration

The security gate evaluates the release against vulnerability data from the Scanner module:

```typescript
interface SecurityGateConfig {
  blockOnCritical: boolean;          // Block if any critical severity
  blockOnHigh: boolean;              // Block if any high severity
  maxCritical: number;               // Max allowed critical (0 for strict)
  maxHigh: number;                   // Max allowed high
  requireFreshScan: boolean;         // Require scan within N hours
  scanFreshnessHours: number;        // How recent scan must be
  allowExceptions: boolean;          // Allow VEX exceptions
  requireVexJustification: boolean;  // Require VEX for exceptions
}

interface SecurityGateResult {
  passed: boolean;
  summary: {
    critical: number;
    high: number;
    medium: number;
    low: number;
  };
  blocking: Array<{
    cve: string;
    severity: string;
    component: string;
    digest: string;
    fixAvailable: boolean;
  }>;
  exceptions: Array<{
    cve: string;
    vexStatus: string;
    justification: string;
  }>;
  scanAge: {
    component: string;
    scannedAt: DateTime;
    ageHours: number;
    fresh: boolean;
  }[];
}
```
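One way the pass/fail decision could be derived from a `SecurityGateConfig` and a scan summary; the `securityGatePasses` helper and the trimmed `ScanSummary` shape are illustrative assumptions, and VEX exception handling is deliberately omitted.

```typescript
// Sketch only: `securityGatePasses` and `ScanSummary` are assumed names;
// VEX exception handling (`allowExceptions`) is intentionally left out.
interface ScanSummary { critical: number; high: number; scanAgeHours: number }

interface SecGateCfg {
  blockOnCritical: boolean;
  blockOnHigh: boolean;
  maxCritical: number;
  maxHigh: number;
  requireFreshScan: boolean;
  scanFreshnessHours: number;
}

function securityGatePasses(cfg: SecGateCfg, scan: ScanSummary): { passed: boolean; reasons: string[] } {
  const reasons: string[] = [];
  // Hard blocks take precedence over count budgets.
  if (cfg.blockOnCritical && scan.critical > 0) {
    reasons.push(`${scan.critical} critical finding(s)`);
  } else if (scan.critical > cfg.maxCritical) {
    reasons.push(`critical count ${scan.critical} exceeds max ${cfg.maxCritical}`);
  }
  if (cfg.blockOnHigh && scan.high > 0) {
    reasons.push(`${scan.high} high finding(s)`);
  } else if (scan.high > cfg.maxHigh) {
    reasons.push(`high count ${scan.high} exceeds max ${cfg.maxHigh}`);
  }
  if (cfg.requireFreshScan && scan.scanAgeHours > cfg.scanFreshnessHours) {
    reasons.push(`scan is ${scan.scanAgeHours}h old (limit ${cfg.scanFreshnessHours}h)`);
  }
  return { passed: reasons.length === 0, reasons };
}
```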

---

## References

- [Module Overview](overview.md)
- [Workflow Engine](workflow-engine.md)
- [Security Architecture](../security/overview.md)
- [API Documentation](../api/promotions.md)

406
docs/modules/release-orchestrator/modules/release-manager.md
Normal file
@@ -0,0 +1,406 @@

# RELMAN: Release Management

**Purpose**: Manage components, versions, and release bundles.

## Modules

### Module: `component-registry`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Map image repositories to logical components |
| **Dependencies** | `integration-manager` (for registry access) |
| **Data Entities** | `Component`, `ComponentVersion` |
| **Events Produced** | `component.created`, `component.updated`, `component.deleted` |

**Key Operations**:
```
CreateComponent(name, displayName, imageRepository, registryId) → Component
UpdateComponent(id, config) → Component
DeleteComponent(id) → void
SyncVersions(componentId, forceRefresh) → VersionMap[]
ListComponents(tenantId) → Component[]
```

**Component Entity**:
```typescript
interface Component {
  id: UUID;
  tenantId: UUID;
  name: string;                 // "api", "worker", "frontend"
  displayName: string;          // "API Service"
  imageRepository: string;      // "registry.example.com/myapp/api"
  registryIntegrationId: UUID;  // which registry integration
  versioningStrategy: VersionStrategy;
  deploymentTemplate: string;   // which workflow template to use
  defaultChannel: string;       // "stable", "beta"
  metadata: Record<string, string>;
}

interface VersionStrategy {
  type: "semver" | "date" | "sequential" | "manual";
  tagPattern?: string;     // regex for tag extraction
  semverExtract?: string;  // regex capture group
}
```

---

### Module: `version-manager`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Tag/digest mapping; version rules |
| **Dependencies** | `component-registry`, `connector-runtime` |
| **Data Entities** | `VersionMap`, `VersionRule`, `Channel` |
| **Events Produced** | `version.resolved`, `version.updated` |

**Version Resolution**:
```typescript
interface VersionMap {
  id: UUID;
  componentId: UUID;
  tag: string;      // "v2.3.1"
  digest: string;   // "sha256:abc123..."
  semver: string;   // "2.3.1"
  channel: string;  // "stable"
  prerelease: boolean;
  buildMetadata: string;
  resolvedAt: DateTime;
  source: "auto" | "manual";
}

interface VersionRule {
  id: UUID;
  componentId: UUID;
  pattern: string;            // "^v(\\d+\\.\\d+\\.\\d+)$"
  channel: string;            // "stable"
  prereleasePattern: string;  // ".*-(alpha|beta|rc).*"
}
```

**Version Resolution Algorithm**:
1. Fetch tags from registry (via connector)
2. Apply version rules to extract semver
3. Resolve each tag to digest
4. Store in version map
5. Update channels ("latest stable", "latest beta")
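Steps 2-3 above could look roughly like this. `resolveVersions` is an illustrative helper, and the unanchored pattern (no trailing `$`) is an assumption so that prerelease tags still resolve.

```typescript
// Sketch only: `resolveVersions` is an assumed helper, and the rule pattern
// is used unanchored (no trailing "$") so prerelease tags still match.
interface TagDigest { tag: string; digest: string }

interface ResolvedVersion {
  tag: string;
  digest: string;
  semver: string;
  channel: string;
  prerelease: boolean;
}

function resolveVersions(
  tags: TagDigest[],
  rule: { pattern: string; channel: string; prereleasePattern: string }
): ResolvedVersion[] {
  const pattern = new RegExp(rule.pattern);
  const pre = new RegExp(rule.prereleasePattern);
  const out: ResolvedVersion[] = [];

  for (const { tag, digest } of tags) {
    const m = pattern.exec(tag);
    if (!m) continue; // tag does not match the rule: skip it
    out.push({
      tag,
      digest,
      semver: m[1] ?? tag, // first capture group carries the extracted semver
      channel: rule.channel,
      prerelease: pre.test(tag),
    });
  }
  return out;
}
```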

---

### Module: `release-manager`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Release bundle lifecycle; composition |
| **Dependencies** | `component-registry`, `version-manager` |
| **Data Entities** | `Release`, `ReleaseComponent` |
| **Events Produced** | `release.created`, `release.promoted`, `release.deprecated` |

**Release Entity**:
```typescript
interface Release {
  id: UUID;
  tenantId: UUID;
  name: string;         // "myapp-v2.3.1"
  displayName: string;  // "MyApp 2.3.1"
  components: ReleaseComponent[];
  sourceRef: SourceReference;
  status: ReleaseStatus;
  createdAt: DateTime;
  createdBy: UUID;
  deployedEnvironments: UUID[];  // where currently deployed
  metadata: Record<string, string>;
}

interface ReleaseComponent {
  componentId: UUID;
  componentName: string;
  digest: string;  // sha256:...
  semver: string;  // resolved semver
  tag: string;     // original tag (for display)
  role: "primary" | "sidecar" | "init" | "migration";
}

interface SourceReference {
  scmIntegrationId?: UUID;
  commitSha?: string;
  branch?: string;
  ciIntegrationId?: UUID;
  buildId?: string;
  pipelineUrl?: string;
}

type ReleaseStatus =
  | "draft"       // being composed
  | "ready"       // ready for promotion
  | "promoting"   // promotion in progress
  | "deployed"    // deployed to at least one env
  | "deprecated"  // marked as deprecated
  | "archived";   // no longer active
```

**Release Creation Modes**:

| Mode | Description |
|------|-------------|
| **Full Release** | All components, latest versions |
| **Partial Release** | Subset of components updated; others pinned from last deployment |
| **Pinned Release** | All versions explicitly specified |
| **Channel Release** | All components from specific channel ("beta") |
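The Partial Release mode above can be sketched as a merge over the last deployed release: updated components take new versions, everything else stays pinned. `composePartialRelease` and the trimmed component shape are illustrative assumptions.

```typescript
// Sketch only: `composePartialRelease` is an assumed helper; components are
// trimmed to the fields needed to illustrate pinning.
interface PinnedComponent { componentName: string; digest: string; semver: string }

function composePartialRelease(
  lastDeployed: PinnedComponent[],
  updates: PinnedComponent[]
): PinnedComponent[] {
  // Start from the last deployed pin set, then override updated components.
  const byName = new Map(lastDeployed.map(c => [c.componentName, c] as const));
  for (const u of updates) byName.set(u.componentName, u);
  return Array.from(byName.values());
}
```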

---

### Module: `release-catalog`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Release history, search, comparison |
| **Dependencies** | `release-manager` |

**Key Operations**:
```
SearchReleases(filter, pagination) → Release[]
CompareReleases(releaseA, releaseB) → ReleaseDiff
GetReleaseHistory(componentId) → Release[]
GetReleaseLineage(releaseId) → ReleaseLineage  // promotion path
```

**Release Comparison**:
```typescript
interface ReleaseDiff {
  releaseA: UUID;
  releaseB: UUID;
  added: ComponentDiff[];      // Components in B not in A
  removed: ComponentDiff[];    // Components in A not in B
  changed: ComponentChange[];  // Components with different versions
  unchanged: ComponentDiff[];  // Components with same version
}

interface ComponentChange {
  componentId: UUID;
  componentName: string;
  fromVersion: string;
  toVersion: string;
  fromDigest: string;
  toDigest: string;
}
```
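`CompareReleases` reduces to a keyed diff over the two component lists. A minimal sketch, with component names standing in for the full `ComponentDiff`/`ComponentChange` records:

```typescript
// Sketch of the CompareReleases diff; only component name and digest are
// modelled, and names stand in for the full diff record types.
interface Comp { componentName: string; digest: string }

interface Diff {
  added: string[];      // in B, not in A
  removed: string[];    // in A, not in B
  changed: string[];    // present in both, different digest
  unchanged: string[];  // present in both, same digest
}

function compareReleases(a: Comp[], b: Comp[]): Diff {
  const mapA = new Map(a.map(c => [c.componentName, c.digest] as const));
  const mapB = new Map(b.map(c => [c.componentName, c.digest] as const));
  const diff: Diff = { added: [], removed: [], changed: [], unchanged: [] };

  mapB.forEach((digestB, name) => {
    const digestA = mapA.get(name);
    if (digestA === undefined) diff.added.push(name);
    else if (digestA !== digestB) diff.changed.push(name);
    else diff.unchanged.push(name);
  });
  mapA.forEach((_, name) => {
    if (!mapB.has(name)) diff.removed.push(name);
  });
  return diff;
}
```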

---

## Database Schema

```sql
-- Components
CREATE TABLE release.components (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    display_name VARCHAR(255) NOT NULL,
    image_repository VARCHAR(500) NOT NULL,
    registry_integration_id UUID REFERENCES release.integrations(id),
    versioning_strategy JSONB NOT NULL DEFAULT '{"type": "semver"}',
    deployment_template VARCHAR(255),
    default_channel VARCHAR(50) NOT NULL DEFAULT 'stable',
    metadata JSONB NOT NULL DEFAULT '{}',
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, name)
);

CREATE INDEX idx_components_tenant ON release.components(tenant_id);

-- Version Maps
CREATE TABLE release.version_maps (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    component_id UUID NOT NULL REFERENCES release.components(id) ON DELETE CASCADE,
    tag VARCHAR(255) NOT NULL,
    digest VARCHAR(100) NOT NULL,
    semver VARCHAR(50),
    channel VARCHAR(50) NOT NULL DEFAULT 'stable',
    prerelease BOOLEAN NOT NULL DEFAULT FALSE,
    build_metadata VARCHAR(255),
    resolved_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    source VARCHAR(50) NOT NULL DEFAULT 'auto',
    UNIQUE (tenant_id, component_id, digest)
);

CREATE INDEX idx_version_maps_component ON release.version_maps(component_id);
CREATE INDEX idx_version_maps_digest ON release.version_maps(digest);
CREATE INDEX idx_version_maps_semver ON release.version_maps(semver);

-- Releases
CREATE TABLE release.releases (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    display_name VARCHAR(255) NOT NULL,
    components JSONB NOT NULL, -- [{componentId, digest, semver, tag, role}]
    source_ref JSONB, -- {scmIntegrationId, commitSha, ciIntegrationId, buildId}
    status VARCHAR(50) NOT NULL DEFAULT 'draft',
    metadata JSONB NOT NULL DEFAULT '{}',
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by UUID REFERENCES users(id),
    UNIQUE (tenant_id, name)
);

CREATE INDEX idx_releases_tenant ON release.releases(tenant_id);
CREATE INDEX idx_releases_status ON release.releases(status);
CREATE INDEX idx_releases_created ON release.releases(created_at DESC);

-- Release Environment State
CREATE TABLE release.release_environment_state (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    environment_id UUID NOT NULL REFERENCES release.environments(id) ON DELETE CASCADE,
    release_id UUID NOT NULL REFERENCES release.releases(id),
    status VARCHAR(50) NOT NULL,
    deployed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    deployed_by UUID REFERENCES users(id),
    promotion_id UUID,
    evidence_ref VARCHAR(255),
    UNIQUE (tenant_id, environment_id)
);

CREATE INDEX idx_release_env_state_env ON release.release_environment_state(environment_id);
CREATE INDEX idx_release_env_state_release ON release.release_environment_state(release_id);
```

---

## API Endpoints

```yaml
# Components
POST /api/v1/components
  Body: { name, displayName, imageRepository, registryIntegrationId, versioningStrategy?, defaultChannel? }
  Response: Component

GET /api/v1/components
  Response: Component[]

GET /api/v1/components/{id}
  Response: Component

PUT /api/v1/components/{id}
  Response: Component

DELETE /api/v1/components/{id}
  Response: { deleted: true }

POST /api/v1/components/{id}/sync-versions
  Body: { forceRefresh?: boolean }
  Response: { synced: number, versions: VersionMap[] }

GET /api/v1/components/{id}/versions
  Query: ?channel={stable|beta}&limit={n}
  Response: VersionMap[]

# Version Maps
POST /api/v1/version-maps
  Body: { componentId, tag, semver, channel }   # manual version assignment
  Response: VersionMap

GET /api/v1/version-maps
  Query: ?componentId={uuid}&channel={channel}
  Response: VersionMap[]

# Releases
POST /api/v1/releases
  Body: {
    name: string,
    displayName?: string,
    components: [
      { componentId: UUID, version?: string, digest?: string, channel?: string }
    ],
    sourceRef?: SourceReference
  }
  Response: Release

GET /api/v1/releases
  Query: ?status={status}&componentId={uuid}&page={n}&pageSize={n}
  Response: { data: Release[], meta: PaginationMeta }

GET /api/v1/releases/{id}
  Response: Release (with full component details)

PUT /api/v1/releases/{id}
  Body: { displayName?, metadata?, status? }
  Response: Release

DELETE /api/v1/releases/{id}
  Response: { deleted: true }

GET /api/v1/releases/{id}/state
  Response: { environments: [{ environmentId, status, deployedAt }] }

POST /api/v1/releases/{id}/deprecate
  Response: Release

GET /api/v1/releases/{id}/compare/{otherId}
  Response: ReleaseDiff

# Quick release creation
POST /api/v1/releases/from-latest
  Body: {
    name: string,
    channel?: string,                     # default: stable
    componentIds?: UUID[],                # default: all
    pinFrom?: { environmentId: UUID }     # for partial release
  }
  Response: Release
```
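
To make the `components[]` entries of `POST /api/v1/releases` concrete, here is a minimal sketch of how a server might resolve one entry to a pinned digest. The precedence order (explicit digest, then version lookup, then newest entry on the channel) and the `VersionMap` shape are assumptions for illustration, not the specified implementation:

```python
from dataclasses import dataclass

@dataclass
class VersionMap:
    tag: str
    digest: str
    semver: str
    channel: str

def resolve_component_pin(spec: dict, version_maps: list) -> str:
    """Resolve one `components[]` entry from the release body to a digest.

    Assumed precedence: an explicit digest wins; otherwise `version` is
    matched against the component's version maps (tag or semver); otherwise
    the newest entry on the requested channel is used.
    """
    if spec.get("digest"):
        return spec["digest"]
    if spec.get("version"):
        for vm in version_maps:
            if vm.tag == spec["version"] or vm.semver == spec["version"]:
                return vm.digest
        raise LookupError(f"unknown version {spec['version']!r}")
    channel = spec.get("channel", "stable")
    candidates = [vm for vm in version_maps if vm.channel == channel]
    if not candidates:
        raise LookupError(f"no versions on channel {channel!r}")
    return candidates[-1].digest  # assumes maps are ordered oldest-to-newest
```

Whatever the precedence rules end up being, the output is always a digest, so every release row satisfies the digest-first invariant described below.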

---

## Release Identity: Digest-First Principle

A core design invariant of the Release Orchestrator:

```
INVARIANT: A release is a set of OCI image digests (component -> digest mapping), never tags.
```

**Implementation Requirements**:
- Tags are convenience inputs for resolution
- Tags are resolved to digests at release creation time
- All downstream operations (promotion, deployment, rollback) use digests
- Digest mismatch at pull time = deployment failure (tamper detection)

**Example**:
```json
{
  "id": "release-uuid",
  "name": "myapp-v2.3.1",
  "components": [
    {
      "componentId": "api-component-uuid",
      "componentName": "api",
      "tag": "v2.3.1",
      "digest": "sha256:abc123def456...",
      "semver": "2.3.1",
      "role": "primary"
    },
    {
      "componentId": "worker-component-uuid",
      "componentName": "worker",
      "tag": "v2.3.1",
      "digest": "sha256:789xyz123abc...",
      "semver": "2.3.1",
      "role": "primary"
    }
  ]
}
```
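
The "digest mismatch at pull time = deployment failure" requirement can be sketched as a single guard run by the deployer after pulling an image. The function name and dict shape are illustrative (matching the example above), not a specified API:

```python
def verify_pinned_digest(release_component: dict, pulled_digest: str) -> None:
    """Enforce the digest-first invariant at pull time.

    The digest the registry reports for the pulled image must equal the
    digest recorded in the release; any mismatch aborts the deployment
    (tamper detection).
    """
    expected = release_component["digest"]
    if pulled_digest != expected:
        raise RuntimeError(
            f"digest mismatch for {release_component['componentName']}: "
            f"expected {expected}, got {pulled_digest}"
        )
```

Because the check compares against the release record rather than re-resolving the tag, a tag that has been moved in the registry after release creation is detected rather than silently deployed.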

---

## References

- [Module Overview](overview.md)
- [Design Principles](../design/principles.md)
- [API Documentation](../api/releases.md)
- [Promotion Manager](promotion-manager.md)
590
docs/modules/release-orchestrator/modules/workflow-engine.md
Normal file
@@ -0,0 +1,590 @@
# WORKFL: Workflow Engine

**Purpose**: DAG-based workflow execution for deployments, approvals, and custom automation.

## Modules

### Module: `workflow-designer`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Template creation; DAG graph editor; validation |
| **Dependencies** | `step-registry` |
| **Data Entities** | `WorkflowTemplate`, `StepNode`, `StepEdge` |

**Workflow Template Structure**:
```typescript
interface WorkflowTemplate {
  id: UUID;
  tenantId: UUID;
  name: string;
  displayName: string;
  description: string;
  version: number;

  // DAG structure
  nodes: StepNode[];
  edges: StepEdge[];

  // I/O
  inputs: InputDefinition[];
  outputs: OutputDefinition[];

  // Metadata
  tags: string[];
  isBuiltin: boolean;
  createdAt: DateTime;
  createdBy: UUID;
}
```

---

### Module: `workflow-engine`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | DAG execution; state machine; pause/resume |
| **Dependencies** | `step-executor`, `step-registry` |
| **Data Entities** | `WorkflowRun`, `WorkflowState` |
| **Events Produced** | `workflow.started`, `workflow.paused`, `workflow.resumed`, `workflow.completed`, `workflow.failed` |

**Workflow Execution Algorithm**:
```python
class WorkflowEngine:
    def execute(self, workflow_run: WorkflowRun) -> None:
        """Main workflow execution loop."""

        # Initialize
        workflow_run.status = "running"
        workflow_run.started_at = now()
        self.save(workflow_run)

        try:
            while not self.is_terminal(workflow_run):
                # Handle pause state
                if workflow_run.status == "paused":
                    self.wait_for_resume(workflow_run)
                    continue

                # Get nodes ready for execution
                ready_nodes = self.get_ready_nodes(workflow_run)

                if not ready_nodes:
                    # Check if we're waiting on approvals
                    if self.has_pending_approvals(workflow_run):
                        workflow_run.status = "paused"
                        self.save(workflow_run)
                        continue

                    # Check if all nodes are complete
                    if self.all_nodes_complete(workflow_run):
                        break

                    # Deadlock detection
                    raise WorkflowDeadlockError(workflow_run.id)

                # Execute ready nodes in parallel
                futures = []
                for node in ready_nodes:
                    future = self.executor.submit(
                        self.execute_node,
                        workflow_run,
                        node
                    )
                    futures.append((node, future))

                # Wait for at least one to complete
                completed = self.wait_any(futures)

                for node, result in completed:
                    step_run = self.get_step_run(workflow_run, node.id)

                    if result.success:
                        step_run.status = "succeeded"
                        step_run.outputs = result.outputs
                        self.propagate_outputs(workflow_run, node, result.outputs)
                    else:
                        step_run.status = "failed"
                        step_run.error_message = result.error

                        # Handle failure action
                        if node.on_failure == "fail":
                            workflow_run.status = "failed"
                            workflow_run.error_message = f"Step {node.name} failed: {result.error}"
                            self.cancel_pending_steps(workflow_run)
                            return
                        elif node.on_failure == "rollback":
                            self.trigger_rollback(workflow_run, node)
                        elif node.on_failure.startswith("goto:"):
                            target = node.on_failure.split(":")[1]
                            self.add_ready_node(workflow_run, target)
                        # "continue" just continues to next nodes

                    step_run.completed_at = now()
                    self.save(step_run)

            # Workflow completed successfully
            workflow_run.status = "succeeded"
            workflow_run.completed_at = now()
            self.save(workflow_run)

        except WorkflowCancelledError:
            workflow_run.status = "cancelled"
            workflow_run.completed_at = now()
            self.save(workflow_run)
        except Exception as e:
            workflow_run.status = "failed"
            workflow_run.error_message = str(e)
            workflow_run.completed_at = now()
            self.save(workflow_run)
```
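
The `get_ready_nodes` call in the loop above can be sketched as a pure function over the DAG: a node is ready when it is still pending and every predecessor has finished. The list-based signature and the set of "done" states are assumptions for illustration:

```python
def get_ready_nodes(nodes, edges, statuses):
    """Return the node ids that are ready to run.

    `nodes` is a list of node ids, `edges` a list of {"from", "to"} dicts,
    and `statuses` maps node id -> step status. A node is ready when it is
    still pending and every predecessor reached a success-like state.
    """
    done = {"succeeded", "skipped"}
    preds = {n: [] for n in nodes}
    for e in edges:
        preds[e["to"]].append(e["from"])
    return [n for n in nodes
            if statuses.get(n, "pending") == "pending"
            and all(statuses.get(p) in done for p in preds[n])]
```

Calling this after every completed step, as the engine loop does, naturally yields level-by-level parallel execution of the DAG.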

---

### Module: `step-executor`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Step dispatch; retry logic; timeout handling |
| **Dependencies** | `step-registry`, `plugin-sandbox` |
| **Data Entities** | `StepRun`, `StepResult` |
| **Events Produced** | `step.started`, `step.progress`, `step.completed`, `step.failed`, `step.retrying` |

**Step Node Structure**:
```typescript
interface StepNode {
  id: string;                         // Unique within template (e.g., "deploy-api")
  type: string;                       // Step type from registry
  name: string;                       // Display name
  config: Record<string, any>;        // Step-specific configuration
  inputs: InputBinding[];             // Input value bindings
  outputs: OutputBinding[];           // Output declarations
  position: { x: number; y: number }; // UI position

  // Execution settings
  timeout: number;                    // Seconds (default from step type)
  retryPolicy: RetryPolicy;
  onFailure: FailureAction;
  condition?: string;                 // JS expression for conditional execution

  // Documentation
  description?: string;
  documentation?: string;
}

type FailureAction = "fail" | "continue" | "rollback" | "goto:{nodeId}";

interface InputBinding {
  name: string;                       // Input parameter name
  source: InputSource;
}

type InputSource =
  | { type: "literal"; value: any }
  | { type: "context"; path: string }                        // e.g., "release.name"
  | { type: "output"; nodeId: string; outputName: string }
  | { type: "secret"; secretName: string }
  | { type: "expression"; expression: string };              // JS expression

interface StepEdge {
  id: string;
  from: string;                       // Source node ID
  to: string;                         // Target node ID
  condition?: string;                 // Optional condition expression
  label?: string;                     // Display label for conditional edges
}

interface RetryPolicy {
  maxRetries: number;
  backoffType: "fixed" | "exponential";
  backoffSeconds: number;
  retryableErrors: string[];
}
```
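
A minimal sketch of how the executor might interpret `RetryPolicy`, assuming exponential backoff doubles the base interval per attempt and that an empty `retryableErrors` list means any error is retryable (both assumptions, not specified behavior):

```python
def backoff_delay(policy: dict, attempt: int) -> float:
    """Delay in seconds before retry `attempt` (1-based).

    Fixed backoff waits the same interval each time; exponential doubles
    the base interval with every attempt.
    """
    base = policy["backoffSeconds"]
    if policy["backoffType"] == "exponential":
        return float(base * (2 ** (attempt - 1)))
    return float(base)

def should_retry(policy: dict, attempt: int, error_code: str) -> bool:
    """Retry only while attempts remain and the error is retryable.

    Assumption: an empty retryableErrors list means any error may be retried.
    """
    if attempt > policy["maxRetries"]:
        return False
    retryable = policy["retryableErrors"]
    return not retryable or error_code in retryable
```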

---

### Module: `step-registry`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Built-in + plugin-provided step types |
| **Dependencies** | `plugin-registry` |
| **Data Entities** | `StepType`, `StepSchema` |

**Built-in Step Types**:

| Step Type | Category | Description |
|-----------|----------|-------------|
| `approval` | Control | Wait for human approval |
| `security-gate` | Gate | Evaluate security policy |
| `custom-gate` | Gate | Custom OPA policy evaluation |
| `deploy-docker` | Deploy | Deploy single container |
| `deploy-compose` | Deploy | Deploy Docker Compose stack |
| `deploy-ecs` | Deploy | Deploy to AWS ECS |
| `deploy-nomad` | Deploy | Deploy to HashiCorp Nomad |
| `health-check` | Verify | HTTP/TCP health check |
| `smoke-test` | Verify | Run smoke test suite |
| `execute-script` | Custom | Run C#/Bash script |
| `webhook` | Integration | Call external webhook |
| `trigger-ci` | Integration | Trigger CI pipeline |
| `wait-ci` | Integration | Wait for CI pipeline |
| `notify` | Notification | Send notification |
| `rollback` | Recovery | Rollback deployment |
| `traffic-shift` | Progressive | Shift traffic percentage |

**Step Type Definition**:
```typescript
interface StepType {
  type: string;                 // "deploy-compose"
  displayName: string;          // "Deploy Compose Stack"
  description: string;
  category: StepCategory;
  icon: string;

  // Schema
  configSchema: JSONSchema;     // Step configuration schema
  inputSchema: JSONSchema;      // Required inputs schema
  outputSchema: JSONSchema;     // Produced outputs schema

  // Execution
  executor: "builtin" | UUID;   // builtin or plugin ID
  defaultTimeout: number;
  safeToRetry: boolean;
  retryableErrors: string[];

  // Documentation
  documentation: string;
  examples: StepExample[];
}
```

---

## Workflow Run State Machine

```
                    WORKFLOW RUN STATE MACHINE

          ┌──────────┐
          │ CREATED  │
          └────┬─────┘
               │ start()
               ▼
  pause() ┌─────────────┐
 ┌───────►│   PAUSED    │◄─────────┐
 │        └──────┬──────┘          │
 │               │ resume()        │ (waiting for approval)
 │               ▼                 │
 │        ┌─────────────┐          │
 └────────│   RUNNING   │──────────┘
          └──────┬──────┘
                 │
    ┌────────────┼────────────┐
    │            │            │
    ▼            ▼            ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│ SUCCEEDED │ │  FAILED   │ │ CANCELLED │
└───────────┘ └───────────┘ └───────────┘

Transitions:
- CREATED → RUNNING: start()
- RUNNING → PAUSED: pause(), waiting approval
- PAUSED → RUNNING: resume(), approval granted
- RUNNING → SUCCEEDED: all nodes complete
- RUNNING → FAILED: node fails with fail action
- RUNNING → CANCELLED: cancel()
- PAUSED → CANCELLED: cancel()
```

## Step Run State Machine

```
                      STEP RUN STATE MACHINE

          ┌──────────┐
          │ PENDING  │ ◄──── Initial state; dependencies not met
          └────┬─────┘
               │ dependencies met + condition true
               ▼
          ┌──────────┐
          │ RUNNING  │ ◄──── Step is executing
          └────┬─────┘
               │
    ┌──────────┴─────────┬──────────────────┐
    │                    │                  │ condition false
    ▼                    ▼                  ▼
┌───────────┐      ┌───────────┐      ┌───────────┐
│ SUCCEEDED │      │  FAILED   │      │  SKIPPED  │
└───────────┘      └─────┬─────┘      └───────────┘
                         │  ▲
                         ▼  │ (max retries exceeded)
                   ┌───────────┐
                   │ RETRYING  │
                   └─────┬─────┘
                         │ retry attempt
                         ▼
                   ┌──────────┐
                   │ RUNNING  │ (retry)
                   └──────────┘

Additional transitions:
- Any state → CANCELLED: workflow cancelled
```
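
The step state machine above can be encoded as a transition table that the executor consults before every status change. This is a sketch derived from the diagram; it additionally treats the completed states (other than FAILED, which may re-enter RETRYING) as terminal, which the diagram does not state explicitly:

```python
# Allowed target states per current state, per the diagram above.
# "cancelled" is reachable from every non-terminal state.
STEP_TRANSITIONS = {
    "pending":   {"running", "skipped", "cancelled"},
    "running":   {"succeeded", "failed", "cancelled"},
    "failed":    {"retrying", "cancelled"},
    "retrying":  {"running", "failed", "cancelled"},
    "succeeded": set(),
    "skipped":   set(),
    "cancelled": set(),
}

def transition(current: str, target: str) -> str:
    """Validate and apply one step-status transition."""
    if target not in STEP_TRANSITIONS[current]:
        raise ValueError(f"illegal step transition {current} -> {target}")
    return target
```

Centralizing the table this way makes illegal transitions (for example, resurrecting a succeeded step) fail loudly instead of corrupting run state.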

---

## Database Schema

```sql
-- Workflow Templates
CREATE TABLE release.workflow_templates (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID REFERENCES tenants(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    display_name VARCHAR(255) NOT NULL,
    description TEXT,
    version INTEGER NOT NULL DEFAULT 1,
    nodes JSONB NOT NULL,
    edges JSONB NOT NULL,
    inputs JSONB NOT NULL DEFAULT '[]',
    outputs JSONB NOT NULL DEFAULT '[]',
    tags JSONB NOT NULL DEFAULT '[]',
    is_builtin BOOLEAN NOT NULL DEFAULT FALSE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by UUID REFERENCES users(id)
);

CREATE INDEX idx_workflow_templates_tenant ON release.workflow_templates(tenant_id);
CREATE INDEX idx_workflow_templates_name ON release.workflow_templates(name);

-- Workflow Runs
CREATE TABLE release.workflow_runs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    template_id UUID NOT NULL REFERENCES release.workflow_templates(id),
    template_version INTEGER NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'created',
    context JSONB NOT NULL,
    inputs JSONB NOT NULL DEFAULT '{}',
    outputs JSONB NOT NULL DEFAULT '{}',
    error_message TEXT,
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by UUID REFERENCES users(id)
);

CREATE INDEX idx_workflow_runs_tenant ON release.workflow_runs(tenant_id);
CREATE INDEX idx_workflow_runs_template ON release.workflow_runs(template_id);
CREATE INDEX idx_workflow_runs_status ON release.workflow_runs(status);

-- Step Runs
CREATE TABLE release.step_runs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    workflow_run_id UUID NOT NULL REFERENCES release.workflow_runs(id) ON DELETE CASCADE,
    node_id VARCHAR(255) NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'pending',
    inputs JSONB NOT NULL DEFAULT '{}',
    outputs JSONB NOT NULL DEFAULT '{}',
    error_message TEXT,
    logs TEXT,
    attempt_number INTEGER NOT NULL DEFAULT 1,
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    UNIQUE (workflow_run_id, node_id)
);

CREATE INDEX idx_step_runs_workflow ON release.step_runs(workflow_run_id);
CREATE INDEX idx_step_runs_status ON release.step_runs(status);

-- Step Registry
CREATE TABLE release.step_types (
    type VARCHAR(255) PRIMARY KEY,
    display_name VARCHAR(255) NOT NULL,
    description TEXT,
    category VARCHAR(100) NOT NULL,
    icon VARCHAR(255),
    config_schema JSONB NOT NULL,
    input_schema JSONB NOT NULL,
    output_schema JSONB NOT NULL,
    executor VARCHAR(255) NOT NULL DEFAULT 'builtin',
    default_timeout INTEGER NOT NULL DEFAULT 300,
    safe_to_retry BOOLEAN NOT NULL DEFAULT FALSE,
    retryable_errors JSONB NOT NULL DEFAULT '[]',
    documentation TEXT,
    examples JSONB NOT NULL DEFAULT '[]',
    plugin_id UUID REFERENCES release.plugins(id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_step_types_category ON release.step_types(category);
CREATE INDEX idx_step_types_plugin ON release.step_types(plugin_id);
```

---

## Workflow Template Example: Standard Deployment

```json
{
  "id": "template-standard-deploy",
  "name": "standard-deploy",
  "displayName": "Standard Deployment",
  "version": 1,
  "inputs": [
    { "name": "releaseId", "type": "uuid", "required": true },
    { "name": "environmentId", "type": "uuid", "required": true },
    { "name": "promotionId", "type": "uuid", "required": true }
  ],
  "nodes": [
    {
      "id": "approval",
      "type": "approval",
      "name": "Approval Gate",
      "config": {},
      "inputs": [
        { "name": "promotionId", "source": { "type": "context", "path": "promotionId" } }
      ],
      "position": { "x": 100, "y": 100 }
    },
    {
      "id": "security-gate",
      "type": "security-gate",
      "name": "Security Verification",
      "config": {
        "blockOnCritical": true,
        "blockOnHigh": true
      },
      "inputs": [
        { "name": "releaseId", "source": { "type": "context", "path": "releaseId" } }
      ],
      "position": { "x": 100, "y": 200 }
    },
    {
      "id": "deploy-targets",
      "type": "deploy-compose",
      "name": "Deploy to Targets",
      "config": {
        "strategy": "rolling",
        "parallelism": 2
      },
      "inputs": [
        { "name": "releaseId", "source": { "type": "context", "path": "releaseId" } },
        { "name": "environmentId", "source": { "type": "context", "path": "environmentId" } }
      ],
      "timeout": 600,
      "retryPolicy": {
        "maxRetries": 2,
        "backoffType": "exponential",
        "backoffSeconds": 30
      },
      "onFailure": "rollback",
      "position": { "x": 100, "y": 400 }
    },
    {
      "id": "health-check",
      "type": "health-check",
      "name": "Health Verification",
      "config": {
        "type": "http",
        "path": "/health",
        "expectedStatus": 200,
        "timeout": 30,
        "retries": 5
      },
      "inputs": [
        { "name": "targets", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "deployedTargets" } }
      ],
      "onFailure": "rollback",
      "position": { "x": 100, "y": 500 }
    },
    {
      "id": "notify-success",
      "type": "notify",
      "name": "Success Notification",
      "config": {
        "channel": "slack",
        "template": "deployment-success"
      },
      "onFailure": "continue",
      "position": { "x": 100, "y": 700 }
    },
    {
      "id": "rollback-handler",
      "type": "rollback",
      "name": "Rollback Handler",
      "config": {
        "strategy": "to-previous"
      },
      "inputs": [
        { "name": "deploymentJobId", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "jobId" } }
      ],
      "position": { "x": 300, "y": 450 }
    }
  ],
  "edges": [
    { "id": "e1", "from": "approval", "to": "security-gate" },
    { "id": "e2", "from": "security-gate", "to": "deploy-targets" },
    { "id": "e3", "from": "deploy-targets", "to": "health-check" },
    { "id": "e4", "from": "health-check", "to": "notify-success" },
    { "id": "e5", "from": "deploy-targets", "to": "rollback-handler", "condition": "status === 'failed'" },
    { "id": "e6", "from": "health-check", "to": "rollback-handler", "condition": "status === 'failed'" }
  ]
}
```
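
The `workflow-designer` module's validation responsibility can be sketched against a template like the one above: every edge must reference an existing node, and the graph must be acyclic. This sketch uses Kahn's algorithm for the cycle check; the exact validation rules and error types are assumptions:

```python
def validate_dag(nodes, edges):
    """Template validation sketch: reject dangling edges and cycles.

    `nodes` is a list of dicts with an "id" key; `edges` is a list of
    dicts with "id", "from", and "to" keys (as in the template JSON).
    """
    ids = {n["id"] for n in nodes}
    for e in edges:
        if e["from"] not in ids or e["to"] not in ids:
            raise ValueError(f"edge {e['id']} references an unknown node")

    # Kahn's algorithm: repeatedly remove nodes with no unprocessed
    # predecessors; if any node is never removed, the graph has a cycle.
    indeg = {i: 0 for i in ids}
    succ = {i: [] for i in ids}
    for e in edges:
        indeg[e["to"]] += 1
        succ[e["from"]].append(e["to"])
    queue = [i for i in ids if indeg[i] == 0]
    seen = 0
    while queue:
        n = queue.pop()
        seen += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    if seen != len(ids):
        raise ValueError("workflow graph contains a cycle")
```

Running this at template save time (e.g. behind `POST /api/v1/workflow-templates/{id}/validate`) keeps the engine's deadlock detection from being the first place a malformed graph is noticed.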

---

## API Endpoints

See [API Documentation](../api/workflows.md) for the full specification.

```yaml
# Workflow Templates
POST /api/v1/workflow-templates
GET /api/v1/workflow-templates
GET /api/v1/workflow-templates/{id}
PUT /api/v1/workflow-templates/{id}
DELETE /api/v1/workflow-templates/{id}
POST /api/v1/workflow-templates/{id}/validate

# Step Registry
GET /api/v1/step-types
GET /api/v1/step-types/{type}

# Workflow Runs
POST /api/v1/workflow-runs
GET /api/v1/workflow-runs
GET /api/v1/workflow-runs/{id}
POST /api/v1/workflow-runs/{id}/pause
POST /api/v1/workflow-runs/{id}/resume
POST /api/v1/workflow-runs/{id}/cancel
GET /api/v1/workflow-runs/{id}/steps
GET /api/v1/workflow-runs/{id}/steps/{nodeId}
GET /api/v1/workflow-runs/{id}/steps/{nodeId}/logs
GET /api/v1/workflow-runs/{id}/steps/{nodeId}/artifacts
```

---

## References

- [Module Overview](overview.md)
- [Workflow Templates](../workflow/templates.md)
- [Execution State Machine](../workflow/execution.md)
- [API Documentation](../api/workflows.md)
246
docs/modules/release-orchestrator/operations/alerting.md
Normal file
@@ -0,0 +1,246 @@
# Alerting Rules

> Prometheus alerting rules for the Release Orchestrator.

**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 13.4](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Metrics](metrics.md), [Observability Overview](overview.md)

## Overview

The Release Orchestrator provides Prometheus alerting rules for monitoring promotions, deployments, agents, and integrations.

---

## High Priority Alerts

### Security Gate Block Rate

```yaml
- alert: PromotionGateBlockRate
  expr: |
    rate(stella_security_gate_results_total{result="blocked"}[1h]) /
    rate(stella_security_gate_results_total[1h]) > 0.5
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: "High rate of security gate blocks"
    description: "More than 50% of promotions are being blocked by security gates"
```

### Deployment Failure Rate

```yaml
- alert: DeploymentFailureRate
  expr: |
    rate(stella_deployments_total{status="failed"}[1h]) /
    rate(stella_deployments_total[1h]) > 0.1
  for: 10m
  labels:
    severity: critical
  annotations:
    summary: "High deployment failure rate"
    description: "More than 10% of deployments are failing"
```

### Agent Offline

```yaml
- alert: AgentOffline
  expr: |
    stella_agents_status{status="offline"} == 1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Agent offline"
    description: "Agent {{ $labels.agent_id }} has been offline for 5 minutes"
```

### Promotion Stuck

```yaml
- alert: PromotionStuck
  expr: |
    time() - stella_promotion_start_time{status="deploying"} > 1800
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Promotion stuck in deploying state"
    description: "Promotion {{ $labels.promotion_id }} has been deploying for more than 30 minutes"
```

### Integration Unhealthy

```yaml
- alert: IntegrationUnhealthy
  expr: |
    stella_integration_health{status="unhealthy"} == 1
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Integration unhealthy"
    description: "Integration {{ $labels.integration_name }} has been unhealthy for 10 minutes"
```

---

## Medium Priority Alerts

### Workflow Step Timeout

```yaml
- alert: WorkflowStepTimeout
  expr: |
    stella_workflow_step_duration_seconds > 600
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "Workflow step taking too long"
    description: "Step {{ $labels.step_type }} in workflow {{ $labels.workflow_run_id }} has been running for more than 10 minutes"
```

### Evidence Generation Failure

```yaml
- alert: EvidenceGenerationFailure
  expr: |
    rate(stella_evidence_generation_failures_total[1h]) > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Evidence generation failures"
    description: "Evidence generation is failing, affecting audit compliance"
```

### Target Health Degraded

```yaml
- alert: TargetHealthDegraded
  expr: |
    stella_target_health{status!="healthy"} == 1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Target health degraded"
    description: "Target {{ $labels.target_name }} is reporting {{ $labels.status }}"
```

### Approval Timeout

```yaml
- alert: ApprovalTimeout
  expr: |
    time() - stella_promotion_approval_requested_time > 86400
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "Promotion awaiting approval for too long"
    description: "Promotion {{ $labels.promotion_id }} has been waiting for approval for more than 24 hours"
```

---

## Low Priority Alerts

### Database Connection Pool

```yaml
- alert: DatabaseConnectionPoolExhausted
  expr: |
    stella_db_connection_pool_available < 5
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Database connection pool running low"
    description: "Only {{ $value }} database connections available"
```

### Plugin Error Rate

```yaml
- alert: PluginErrorRate
  expr: |
    rate(stella_plugin_errors_total[5m]) > 1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Plugin errors detected"
    description: "Plugin {{ $labels.plugin_id }} is experiencing errors"
```

---

## Alert Routing

### Example AlertManager Configuration

```yaml
# alertmanager.yaml
route:
  receiver: default
  group_by: [alertname, severity]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

  routes:
    - match:
        severity: critical
      receiver: pagerduty
      continue: true

    - match:
        severity: warning
      receiver: slack

receivers:
  - name: default
    webhook_configs:
      - url: http://webhook.example.com/alerts

  - name: pagerduty
    pagerduty_configs:
      - service_key: ${PAGERDUTY_KEY}
        severity: critical

  - name: slack
    slack_configs:
      - channel: '#alerts'
        api_url: ${SLACK_WEBHOOK_URL}
        title: '{{ .CommonAnnotations.summary }}'
        text: '{{ .CommonAnnotations.description }}'
```

---

## Dashboard Integration

### Grafana Alert Panels

Recommended dashboard panels for alerts:

| Panel | Query |
|-------|-------|
| Active Alerts | `count(ALERTS{alertstate="firing"})` |
| Alert History | `count_over_time(ALERTS{alertstate="firing"}[24h])` |
| By Severity | `count(ALERTS{alertstate="firing"}) by (severity)` |
| By Component | `count(ALERTS{alertstate="firing"}) by (alertname)` |

---

## See Also

- [Metrics](metrics.md)
- [Observability Overview](overview.md)
- [Logging](logging.md)
- [Tracing](tracing.md)
220
docs/modules/release-orchestrator/operations/logging.md
Normal file
@@ -0,0 +1,220 @@
# Logging Specification

> Structured logging format and categories for the Release Orchestrator.

**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 13.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Observability Overview](overview.md), [Tracing](tracing.md)

## Overview

The Release Orchestrator uses structured JSON logging with a consistent format, correlation IDs, and context propagation for all components.

---

## Structured Log Format

### JSON Schema

```json
{
  "timestamp": "2026-01-09T14:32:15.123Z",
  "level": "info",
  "module": "promotion-manager",
  "message": "Promotion approved",
  "context": {
    "tenant_id": "uuid",
    "promotion_id": "uuid",
    "release_id": "uuid",
    "environment": "prod",
    "user_id": "uuid"
  },
  "details": {
    "approvals_count": 2,
    "gates_passed": ["security", "approval", "freeze"],
    "decision": "allow"
  },
  "trace_id": "abc123",
  "span_id": "def456",
  "duration_ms": 45
}
```
|
||||
|
||||
---
|
||||
|
||||
## Log Levels
|
||||
|
||||
| Level | Usage |
|
||||
|-------|-------|
|
||||
| `error` | Errors requiring attention; failures that impact functionality |
|
||||
| `warn` | Potential issues; degraded functionality; approaching limits |
|
||||
| `info` | Significant events; state changes; audit-relevant actions |
|
||||
| `debug` | Detailed debugging info; request/response bodies |
|
||||
| `trace` | Very detailed tracing; internal state; performance profiling |
|
||||
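
The level ordering above can be sketched as a simple severity filter. This is an illustrative sketch only: the `makeLogger` helper and its field set are assumptions, not the planned implementation.

```typescript
// Levels ordered from most to least severe, matching the table above.
const LEVELS = ["error", "warn", "info", "debug", "trace"] as const;
type Level = (typeof LEVELS)[number];

// Hypothetical factory: returns a log function that suppresses entries
// below the configured threshold and emits the JSON shape used in this doc.
function makeLogger(minLevel: Level, module: string) {
  const threshold = LEVELS.indexOf(minLevel);
  return (level: Level, message: string, details: object = {}): string | null => {
    if (LEVELS.indexOf(level) > threshold) return null; // below threshold
    return JSON.stringify({
      timestamp: new Date().toISOString(),
      level,
      module,
      message,
      details,
    });
  };
}

const log = makeLogger("info", "promotion-manager");
log("info", "Promotion approved", { approvals_count: 2 }); // emitted as JSON
log("debug", "Gate inputs");                               // null: suppressed
```

With `minLevel: "info"`, `debug` and `trace` entries are dropped before serialization, which keeps request/response bodies out of production logs by default.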

---

## Log Categories

| Category | Examples |
|----------|----------|
| `api` | Request received, response sent, validation errors |
| `promotion` | Promotion requested, approved, rejected, completed |
| `deployment` | Deployment started, task assigned, completed, failed |
| `security` | Gate evaluation, vulnerability found, policy violation |
| `agent` | Agent registered, heartbeat, task execution |
| `workflow` | Workflow started, step executed, completed |
| `integration` | Integration tested, resource discovered, webhook received |

---

## Logging Examples

### API Request

```json
{
  "timestamp": "2026-01-09T14:32:15.123Z",
  "level": "info",
  "module": "api",
  "message": "Request received",
  "context": {
    "tenant_id": "uuid",
    "user_id": "uuid"
  },
  "details": {
    "method": "POST",
    "path": "/api/v1/promotions",
    "status": 201,
    "duration_ms": 125
  },
  "trace_id": "abc123",
  "span_id": "def456"
}
```

### Promotion Event

```json
{
  "timestamp": "2026-01-09T14:32:15.123Z",
  "level": "info",
  "module": "promotion-manager",
  "message": "Promotion approved",
  "context": {
    "tenant_id": "uuid",
    "promotion_id": "uuid",
    "release_id": "uuid",
    "environment": "prod",
    "user_id": "uuid"
  },
  "details": {
    "approvals_count": 2,
    "gates_passed": ["security", "approval", "freeze"],
    "decision": "allow"
  },
  "trace_id": "abc123",
  "span_id": "def456",
  "duration_ms": 45
}
```

### Security Gate Failure

```json
{
  "timestamp": "2026-01-09T14:32:15.123Z",
  "level": "warn",
  "module": "security",
  "message": "Security gate blocked promotion",
  "context": {
    "tenant_id": "uuid",
    "promotion_id": "uuid",
    "release_id": "uuid",
    "environment": "prod"
  },
  "details": {
    "gate_name": "security-gate",
    "reason": "Critical vulnerability found",
    "vulnerabilities": {
      "critical": 1,
      "high": 3
    }
  },
  "trace_id": "abc123",
  "span_id": "def456"
}
```

---

## Sensitive Data Masking

The following fields are automatically masked in logs:

| Field Type | Masking Strategy |
|------------|------------------|
| Passwords | Not logged |
| API Keys | First 4 and last 4 chars only |
| Tokens | Hash only |
| PII | Redacted |
| Credentials | Not logged |

### Example

```json
{
  "message": "Authentication succeeded",
  "details": {
    "api_key": "sk_l...abcd",
    "token_hash": "sha256:abc123..."
  }
}
```
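
The two non-trivial strategies in the table can be sketched as follows. The helper names `maskApiKey` and `maskToken` are illustrative assumptions, not the planned API.

```typescript
import { createHash } from "node:crypto";

// Hypothetical masking helpers matching the table above.
function maskApiKey(key: string): string {
  // Keep the first 4 and last 4 characters only, e.g. "sk_l...abcd".
  if (key.length <= 8) return "****";
  return `${key.slice(0, 4)}...${key.slice(-4)}`;
}

function maskToken(token: string): string {
  // Log only a hash — the raw token value never reaches the log stream.
  const digest = createHash("sha256").update(token).digest("hex");
  return `sha256:${digest.slice(0, 12)}...`;
}

maskApiKey("sk_live_1234567890abcd"); // "sk_l...abcd"
```

Truncating the digest keeps log lines short while still letting operators correlate entries that reference the same token.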

---

## Correlation IDs

All logs include correlation IDs for request tracing:

| Field | Description |
|-------|-------------|
| `trace_id` | W3C Trace Context trace ID |
| `span_id` | Current operation span ID |
| `correlation_id` | Business-level correlation (optional) |

---

## Log Aggregation

Recommended log aggregation setup:

```yaml
# Fluent Bit configuration
[INPUT]
    Name    tail
    Path    /var/log/stella/*.log
    Parser  json

[FILTER]
    Name         nest
    Match        *
    Operation    lift
    Nested_under context

[OUTPUT]
    Name   opensearch
    Match  *
    Host   opensearch.example.com
    Index  stella-logs
```

---

## See Also

- [Observability Overview](overview.md)
- [Tracing](tracing.md)
- [Alerting](alerting.md)
- [Security Overview](../security/overview.md)
274
docs/modules/release-orchestrator/operations/metrics.md
Normal file
@@ -0,0 +1,274 @@

# Metrics Specification

## Overview

The Release Orchestrator exposes Prometheus-compatible metrics for monitoring deployment health, performance, and operational status.

## Core Metrics

### Release Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_releases_total` | counter | Total releases created | `tenant`, `status` |
| `stella_releases_active` | gauge | Currently active releases | `tenant`, `status` |
| `stella_release_components_count` | histogram | Components per release | `tenant` |

### Promotion Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_promotions_total` | counter | Total promotions | `tenant`, `env`, `status` |
| `stella_promotions_in_progress` | gauge | Promotions currently in progress | `tenant`, `env` |
| `stella_promotion_duration_seconds` | histogram | Time from request to completion | `tenant`, `env`, `status` |
| `stella_approval_pending_count` | gauge | Pending approvals | `tenant`, `env` |
| `stella_approval_duration_seconds` | histogram | Time to approve | `tenant`, `env` |

### Deployment Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_deployments_total` | counter | Total deployments | `tenant`, `env`, `strategy`, `status` |
| `stella_deployment_duration_seconds` | histogram | Deployment duration | `tenant`, `env`, `strategy` |
| `stella_deployment_tasks_total` | counter | Total deployment tasks | `tenant`, `status` |
| `stella_deployment_task_duration_seconds` | histogram | Task duration | `target_type` |
| `stella_rollbacks_total` | counter | Total rollbacks | `tenant`, `env`, `reason` |

### Agent Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_agents_connected` | gauge | Connected agents | `tenant` |
| `stella_agents_by_status` | gauge | Agents by status | `tenant`, `status` |
| `stella_agent_tasks_total` | counter | Tasks executed by agents | `agent`, `type`, `status` |
| `stella_agent_task_duration_seconds` | histogram | Agent task duration | `agent`, `type` |
| `stella_agent_heartbeat_age_seconds` | gauge | Seconds since last heartbeat | `agent` |
| `stella_agent_resource_cpu_percent` | gauge | Agent CPU usage | `agent` |
| `stella_agent_resource_memory_percent` | gauge | Agent memory usage | `agent` |

### Workflow Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_workflow_runs_total` | counter | Workflow executions | `tenant`, `template`, `status` |
| `stella_workflow_runs_active` | gauge | Currently running workflows | `tenant`, `template` |
| `stella_workflow_duration_seconds` | histogram | Workflow duration | `template`, `status` |
| `stella_workflow_step_duration_seconds` | histogram | Step execution time | `step_type`, `status` |
| `stella_workflow_step_retries_total` | counter | Step retry count | `step_type` |

### Target Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_targets_total` | gauge | Total targets | `tenant`, `env`, `type` |
| `stella_targets_by_health` | gauge | Targets by health status | `tenant`, `env`, `health` |
| `stella_target_drift_detected` | gauge | Targets with drift | `tenant`, `env` |

### Integration Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_integrations_total` | gauge | Configured integrations | `tenant`, `type` |
| `stella_integration_health` | gauge | Integration health (1=healthy) | `tenant`, `integration` |
| `stella_integration_requests_total` | counter | Requests to integrations | `integration`, `operation`, `status` |
| `stella_integration_latency_seconds` | histogram | Integration request latency | `integration`, `operation` |

### Gate Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_gate_evaluations_total` | counter | Gate evaluations | `tenant`, `gate_type`, `result` |
| `stella_gate_evaluation_duration_seconds` | histogram | Gate evaluation time | `gate_type` |
| `stella_gate_blocks_total` | counter | Blocked promotions by gate | `tenant`, `gate_type`, `env` |

## API Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_http_requests_total` | counter | HTTP requests | `method`, `path`, `status` |
| `stella_http_request_duration_seconds` | histogram | Request latency | `method`, `path` |
| `stella_http_requests_in_flight` | gauge | Active requests | `method` |
| `stella_http_request_size_bytes` | histogram | Request size | `method`, `path` |
| `stella_http_response_size_bytes` | histogram | Response size | `method`, `path` |

## Evidence Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_evidence_packets_total` | counter | Evidence packets generated | `tenant`, `type` |
| `stella_evidence_packet_size_bytes` | histogram | Evidence packet size | `type` |
| `stella_evidence_verification_total` | counter | Evidence verifications | `result` |

## Prometheus Configuration

```yaml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'stella-orchestrator'
    static_configs:
      - targets: ['stella-orchestrator:9090']
    metrics_path: /metrics
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/ca.crt

  - job_name: 'stella-agents'
    kubernetes_sd_configs:
      - role: pod
        selectors:
          - role: pod
            label: "app.kubernetes.io/name=stella-agent"
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_agent_id]
        target_label: agent_id
```

## Histogram Buckets

### Duration Buckets (seconds)

```yaml
# Short operations (API calls, gate evaluations)
short_duration_buckets: [0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]

# Medium operations (workflow steps)
medium_duration_buckets: [0.1, 0.5, 1, 2.5, 5, 10, 30, 60, 120, 300]

# Long operations (deployments)
long_duration_buckets: [1, 5, 10, 30, 60, 120, 300, 600, 1200, 3600]
```

### Size Buckets (bytes)

```yaml
# Request/response sizes
size_buckets: [100, 1000, 10000, 100000, 1000000, 10000000]

# Evidence packet sizes
evidence_buckets: [1000, 10000, 100000, 500000, 1000000, 5000000]
```
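
Prometheus histogram buckets are cumulative: each observation increments every bucket whose upper bound (`le`) it fits under. The sketch below illustrates that semantics with the short-duration buckets; it is illustrative only — a real service would use a client library such as prom-client rather than hand-rolling this.

```typescript
// Short-duration buckets from the config above (seconds).
const shortDurationBuckets = [0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10];

// Record one observation into cumulative bucket counts.
function observe(buckets: number[], counts: number[], value: number): void {
  buckets.forEach((le, i) => {
    // Cumulative: every bucket with upper bound >= the value is incremented.
    if (value <= le) counts[i]++;
  });
}

const counts = new Array(shortDurationBuckets.length).fill(0);
[0.02, 0.3, 7].forEach((v) => observe(shortDurationBuckets, counts, v));
// The le=0.5 bucket has now seen 2 of the 3 observations (0.02 and 0.3),
// and the le=10 bucket has seen all 3.
```

This cumulative shape is what `histogram_quantile` relies on in the SLI queries below the bucket definitions.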

## SLI Definitions

### Availability SLI

```promql
# API availability (99.9% target)
sum(rate(stella_http_requests_total{status!~"5.."}[5m]))
/
sum(rate(stella_http_requests_total[5m]))
```

### Latency SLI

```promql
# API latency P99 < 500ms
histogram_quantile(0.99,
  sum(rate(stella_http_request_duration_seconds_bucket[5m])) by (le)
)
```

### Deployment Success SLI

```promql
# Deployment success rate (99% target)
sum(rate(stella_deployments_total{status="succeeded"}[24h]))
/
sum(rate(stella_deployments_total[24h]))
```

## Alert Rules

```yaml
groups:
  - name: stella-orchestrator
    rules:
      - alert: HighDeploymentFailureRate
        expr: |
          sum(rate(stella_deployments_total{status="failed"}[1h]))
          /
          sum(rate(stella_deployments_total[1h])) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: High deployment failure rate
          description: More than 10% of deployments failing in the last hour

      - alert: AgentOffline
        expr: stella_agent_heartbeat_age_seconds > 120
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Agent {{ $labels.agent }} offline
          description: Agent has not sent heartbeat for > 2 minutes

      - alert: PendingApprovalsStale
        expr: |
          stella_approval_pending_count > 0
          and
          time() - stella_promotion_request_timestamp > 3600
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Stale pending approvals
          description: Approvals pending for more than 1 hour

      - alert: IntegrationUnhealthy
        expr: stella_integration_health == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Integration {{ $labels.integration }} unhealthy
          description: Integration health check failing

      - alert: HighAPILatency
        expr: |
          histogram_quantile(0.99,
            sum(rate(stella_http_request_duration_seconds_bucket[5m])) by (le, path)
          ) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: High API latency on {{ $labels.path }}
          description: P99 latency exceeds 1 second
```

## Grafana Dashboards

### Main Dashboard Panels

1. **Deployment Pipeline Overview**
   - Promotions per environment (time series)
   - Success/failure rates (gauge)
   - Active deployments (stat)

2. **Agent Health**
   - Connected agents (stat)
   - Agent status distribution (pie chart)
   - Heartbeat age (table)

3. **Gate Performance**
   - Gate evaluation counts (bar chart)
   - Block rate by gate type (time series)
   - Evaluation latency (heatmap)

4. **API Performance**
   - Request rate (time series)
   - Error rate (time series)
   - Latency distribution (heatmap)

## References

- [Operations Overview](overview.md)
- [Logging](logging.md)
- [Tracing](tracing.md)
- [Alerting](alerting.md)
||||
508
docs/modules/release-orchestrator/operations/overview.md
Normal file
@@ -0,0 +1,508 @@

# Operations Overview

## Observability Stack

The Release Orchestrator provides comprehensive observability through metrics, logging, and distributed tracing.

```
                       OBSERVABILITY ARCHITECTURE

┌──────────────────────────────────────────────────────────────────────┐
│                         RELEASE ORCHESTRATOR                         │
│                                                                      │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │   Metrics   │  │    Logs     │  │   Traces    │  │   Events    │  │
│  │  Exporter   │  │  Collector  │  │  Exporter   │  │  Publisher  │  │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  │
│         │                │                │                │         │
└─────────┼────────────────┼────────────────┼────────────────┼─────────┘
          │                │                │                │
          ▼                ▼                ▼                ▼
┌──────────────────────────────────────────────────────────────────────┐
│                        OBSERVABILITY BACKENDS                        │
│                                                                      │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │ Prometheus  │  │   Loki /    │  │  Jaeger /   │  │    Event    │  │
│  │  / Mimir    │  │Elasticsearch│  │    Tempo    │  │     Bus     │  │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  │
│         │                │                │                │         │
│         └────────────────┴────────────────┴────────────────┘         │
│                                  │                                   │
│                                  ▼                                   │
│                         ┌─────────────────┐                          │
│                         │     Grafana     │                          │
│                         │   Dashboards    │                          │
│                         └─────────────────┘                          │
└──────────────────────────────────────────────────────────────────────┘
```

## Metrics

### Core Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_releases_total` | counter | Total releases created | `tenant`, `status` |
| `stella_promotions_total` | counter | Total promotions | `tenant`, `env`, `status` |
| `stella_deployments_total` | counter | Total deployments | `tenant`, `env`, `strategy` |
| `stella_deployment_duration_seconds` | histogram | Deployment duration | `tenant`, `env`, `strategy` |
| `stella_rollbacks_total` | counter | Total rollbacks | `tenant`, `env`, `reason` |
| `stella_agents_connected` | gauge | Connected agents | `tenant` |
| `stella_targets_total` | gauge | Total targets | `tenant`, `env`, `type` |
| `stella_workflow_runs_total` | counter | Workflow executions | `tenant`, `template`, `status` |
| `stella_workflow_step_duration_seconds` | histogram | Step execution time | `step_type` |
| `stella_approval_pending_count` | gauge | Pending approvals | `tenant`, `env` |
| `stella_approval_duration_seconds` | histogram | Time to approve | `tenant`, `env` |

### API Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_http_requests_total` | counter | HTTP requests | `method`, `path`, `status` |
| `stella_http_request_duration_seconds` | histogram | Request latency | `method`, `path` |
| `stella_http_requests_in_flight` | gauge | Active requests | `method` |

### Agent Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_agent_tasks_total` | counter | Tasks executed | `agent`, `type`, `status` |
| `stella_agent_task_duration_seconds` | histogram | Task duration | `agent`, `type` |
| `stella_agent_heartbeat_age_seconds` | gauge | Seconds since last heartbeat | `agent` |

### Prometheus Configuration

```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'stella-orchestrator'
    static_configs:
      - targets: ['stella-orchestrator:9090']
    metrics_path: /metrics
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/ca.crt

  - job_name: 'stella-agents'
    kubernetes_sd_configs:
      - role: pod
        selectors:
          - role: pod
            label: "app.kubernetes.io/name=stella-agent"
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_agent_id]
        target_label: agent_id
```

## Logging

### Log Format

```json
{
  "timestamp": "2026-01-09T10:30:00.123Z",
  "level": "info",
  "message": "Deployment started",
  "service": "deploy-orchestrator",
  "version": "1.0.0",
  "traceId": "abc123def456",
  "spanId": "789ghi",
  "tenantId": "tenant-uuid",
  "correlationId": "corr-uuid",
  "context": {
    "deploymentJobId": "job-uuid",
    "releaseId": "release-uuid",
    "environmentId": "env-uuid"
  }
}
```

### Log Levels

| Level | Usage |
|-------|-------|
| `error` | Failures requiring attention |
| `warn` | Degraded operation, recoverable issues |
| `info` | Business events (deployment started, approval granted) |
| `debug` | Detailed operational info |
| `trace` | Very detailed debugging |

### Structured Logging Configuration

```typescript
// Logging configuration
const loggerConfig = {
  level: process.env.LOG_LEVEL || 'info',
  format: 'json',
  outputs: [
    {
      type: 'stdout',
      format: 'json'
    },
    {
      type: 'file',
      path: '/var/log/stella/orchestrator.log',
      rotation: {
        maxSize: '100MB',
        maxFiles: 10
      }
    }
  ],
  // Sensitive field masking
  redact: [
    'password',
    'token',
    'secret',
    'credentials',
    'authorization'
  ]
};
```

### Important Log Events

| Event | Level | Description |
|-------|-------|-------------|
| `deployment.started` | info | Deployment job started |
| `deployment.completed` | info | Deployment successful |
| `deployment.failed` | error | Deployment failed |
| `rollback.initiated` | warn | Rollback triggered |
| `approval.granted` | info | Promotion approved |
| `approval.denied` | info | Promotion rejected |
| `agent.connected` | info | Agent came online |
| `agent.disconnected` | warn | Agent went offline |
| `security.gate.failed` | warn | Security gate blocked promotion |

## Distributed Tracing

### Trace Context Propagation

```typescript
import type { Request, Response, NextFunction } from 'express';

// Trace context in requests
interface TraceContext {
  traceId: string;
  spanId: string;
  parentSpanId?: string;
  sampled: boolean;
  baggage?: Record<string, string>;
}

// W3C Trace Context headers
// traceparent: 00-{traceId}-{spanId}-{flags}
// tracestate: stella=...

// Example trace propagation (`tracer` and `parseTraceParent` are
// provided by the tracing library wiring)
class TracingMiddleware {
  handle(req: Request, res: Response, next: NextFunction): void {
    const traceparent = req.headers['traceparent'];
    const traceContext = this.parseTraceParent(traceparent);

    // Start a span for this request
    const span = this.tracer.startSpan('http.request', {
      parent: traceContext,
      attributes: {
        'http.method': req.method,
        'http.url': req.url,
        'http.user_agent': req.headers['user-agent'],
        'tenant.id': req.tenantId
      }
    });

    // Attach to the request for downstream use
    req.span = span;

    res.on('finish', () => {
      span.setAttribute('http.status_code', res.statusCode);
      span.end();
    });

    next();
  }
}
```

### Key Spans

| Span Name | Description | Attributes |
|-----------|-------------|------------|
| `deployment.execute` | Full deployment | `release_id`, `environment` |
| `task.dispatch` | Task dispatch to agent | `target_id`, `agent_id` |
| `agent.execute` | Agent task execution | `task_type`, `duration` |
| `workflow.run` | Workflow execution | `template_id`, `status` |
| `workflow.step` | Individual step | `step_type`, `node_id` |
| `approval.wait` | Waiting for approval | `promotion_id`, `duration` |
| `gate.evaluate` | Gate evaluation | `gate_type`, `result` |

### Jaeger Configuration

```yaml
# jaeger-config.yaml
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: stella-jaeger
spec:
  strategy: production
  collector:
    maxReplicas: 5
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: https://elasticsearch:9200
    secretName: jaeger-es-secret
  ingress:
    enabled: true
```

## Alerting

### Alert Rules

```yaml
# prometheus-rules.yaml
groups:
  - name: stella.deployment
    rules:
      - alert: DeploymentFailureRateHigh
        expr: |
          sum(rate(stella_deployments_total{status="failed"}[5m])) /
          sum(rate(stella_deployments_total[5m])) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High deployment failure rate"
          description: "More than 10% of deployments are failing"

      - alert: DeploymentDurationHigh
        expr: |
          histogram_quantile(0.95, sum(rate(stella_deployment_duration_seconds_bucket[5m])) by (le, tenant)) > 600
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Deployment duration high"
          description: "P95 deployment duration exceeds 10 minutes"

      - alert: RollbackRateHigh
        expr: |
          sum(increase(stella_rollbacks_total[1h])) > 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High rollback rate"
          description: "More than 3 rollbacks in the last hour"

  - name: stella.agents
    rules:
      - alert: AgentOffline
        expr: |
          stella_agent_heartbeat_age_seconds > 120
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Agent offline"
          description: "Agent {{ $labels.agent }} has not sent heartbeat for 2 minutes"

      - alert: AgentPoolLow
        expr: |
          sum(stella_agents_by_status{status="online"}) by (tenant) < 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low agent count"
          description: "Fewer than 2 agents online for tenant {{ $labels.tenant }}"

  - name: stella.approvals
    rules:
      - alert: ApprovalBacklogHigh
        expr: |
          stella_approval_pending_count > 10
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Approval backlog growing"
          description: "More than 10 pending approvals for over an hour"

      - alert: ApprovalWaitLong
        expr: |
          histogram_quantile(0.90, sum(rate(stella_approval_duration_seconds_bucket[1h])) by (le)) > 86400
        for: 1h
        labels:
          severity: info
        annotations:
          summary: "Long approval wait times"
          description: "P90 approval wait time exceeds 24 hours"
```

### PagerDuty Integration

```typescript
const alertManagerConfig = {
  receivers: [
    {
      name: "stella-critical",
      pagerduty_configs: [
        {
          service_key: "${PAGERDUTY_SERVICE_KEY}",
          severity: "critical"
        }
      ]
    },
    {
      name: "stella-warning",
      slack_configs: [
        {
          api_url: "${SLACK_WEBHOOK_URL}",
          channel: "#stella-alerts",
          send_resolved: true
        }
      ]
    }
  ],
  route: {
    receiver: "stella-warning",
    routes: [
      {
        match: { severity: "critical" },
        receiver: "stella-critical"
      }
    ]
  }
};
```

## Dashboards

### Deployment Dashboard

Key panels:

- Deployment rate over time
- Success/failure ratio
- Average deployment duration
- Deployment duration histogram
- Active deployments by environment
- Recent deployment list

### Agent Health Dashboard

Key panels:

- Connected agents count
- Agent heartbeat status
- Tasks per agent
- Task success rate by agent
- Agent resource utilization

### Approval Dashboard

Key panels:

- Pending approvals count
- Approval response time
- Approvals by user
- Rejection reasons breakdown

## Health Endpoints

### Application Health

```http
GET /health
```

Response:

```json
{
  "status": "healthy",
  "version": "1.0.0",
  "uptime": 86400,
  "checks": {
    "database": { "status": "healthy", "latency": 5 },
    "redis": { "status": "healthy", "latency": 2 },
    "vault": { "status": "healthy", "latency": 10 }
  }
}
```

### Readiness Probe

```http
GET /health/ready
```

### Liveness Probe

```http
GET /health/live
```

## Performance Tuning

### Database Connection Pool

```typescript
const poolConfig = {
  min: 5,
  max: 20,
  acquireTimeout: 30000,
  idleTimeout: 600000,
  connectionTimeout: 10000
};
```

### Cache Configuration

```typescript
const cacheConfig = {
  // Release cache
  releases: {
    ttl: 300, // 5 minutes
    maxSize: 1000
  },
  // Target cache
  targets: {
    ttl: 60, // 1 minute
    maxSize: 5000
  },
  // Workflow template cache
  templates: {
    ttl: 3600, // 1 hour
    maxSize: 100
  }
};
```
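
A cache with this `ttl`/`maxSize` shape can be sketched as below. This is an illustrative sketch only: the `TtlCache` class and its method names are assumptions, not the planned implementation.

```typescript
// Minimal TTL + size-bounded cache matching the config shape above.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  constructor(private ttlSeconds: number, private maxSize: number) {}

  set(key: string, value: V, now = Date.now()): void {
    // Evict the oldest entry when full (Map preserves insertion order).
    if (!this.entries.has(key) && this.entries.size >= this.maxSize) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: now + this.ttlSeconds * 1000 });
  }

  get(key: string, now = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (now > entry.expiresAt) {
      this.entries.delete(key);
      return undefined; // expired: caller re-fetches from the database
    }
    return entry.value;
  }
}

// Mirrors the releases cache settings (300 s TTL, 1000 entries).
const releaseCache = new TtlCache<string>(300, 1000);
```

Short TTLs (targets at 60 s) bound staleness for fast-changing state, while longer TTLs (templates at 1 h) avoid repeated reads of rarely-changing data.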

### Rate Limiting

```typescript
const rateLimitConfig = {
  // API rate limits
  api: {
    windowMs: 60000, // 1 minute
    max: 1000,       // requests per window
    burst: 100       // burst allowance
  },
  // Webhook rate limits
  webhooks: {
    windowMs: 60000,
    max: 100
  },
  // Per-tenant limits
  tenant: {
    windowMs: 60000,
    max: 500
  }
};
```
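
The `windowMs`/`max` pairs above describe a fixed-window limiter, which can be sketched as follows (burst handling omitted). The `FixedWindowLimiter` class is an illustrative assumption, not the planned implementation.

```typescript
// Fixed-window rate limiter: at most `max` calls per `windowMs` window.
class FixedWindowLimiter {
  private windowStart = 0;
  private count = 0;
  constructor(private windowMs: number, private max: number) {}

  allow(now = Date.now()): boolean {
    // Reset the counter when a new window begins.
    if (now - this.windowStart >= this.windowMs) {
      this.windowStart = now;
      this.count = 0;
    }
    return ++this.count <= this.max;
  }
}

// Mirrors the api limits above: 1000 requests per minute.
const apiLimiter = new FixedWindowLimiter(60000, 1000);
```

A production limiter would usually use a sliding window or token bucket (which is what the `burst` field suggests) to avoid the boundary spike a fixed window allows.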

## References

- [Metrics Reference](metrics.md)
- [Logging Guide](logging.md)
- [Tracing Setup](tracing.md)
- [Alert Configuration](alerting.md)
222
docs/modules/release-orchestrator/operations/tracing.md
Normal file
@@ -0,0 +1,222 @@
# Distributed Tracing Specification

> OpenTelemetry-based distributed tracing for the Release Orchestrator.

**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 13.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Observability Overview](overview.md), [Logging](logging.md)

## Overview

The Release Orchestrator uses OpenTelemetry for distributed tracing, enabling end-to-end visibility of promotion workflows, deployments, and agent tasks.

---
## Trace Context Propagation

### W3C Trace Context

```typescript
// Trace context structure
interface TraceContext {
  traceId: string;       // 32-char hex
  spanId: string;        // 16-char hex
  parentSpanId?: string;
  sampled: boolean;
  baggage: Record<string, string>;
}

// Propagation headers
const TRACE_HEADERS = {
  W3C_TRACEPARENT: "traceparent",
  W3C_TRACESTATE: "tracestate",
  BAGGAGE: "baggage",
};

// Example traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```
### Header Format

```
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             ^  ^                                ^                ^
             |  |                                |                |
             |  trace-id (32 hex)                span-id (16 hex) flags
             version
```
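
The header format above can be parsed with a single regular expression. A sketch for version `00` headers (the function name and return shape are illustrative):

```typescript
// Sketch of parsing the W3C traceparent format shown above (version 00 only).
interface ParsedTraceparent {
  version: string;
  traceId: string;
  spanId: string;
  sampled: boolean; // bit 0 of the flags byte
}

function parseTraceparent(header: string): ParsedTraceparent | null {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) return null;
  const [, version, traceId, spanId, flags] = match;
  return { version, traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

const ctx = parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
// ctx.traceId === "4bf92f3577b34da6a3ce929d0e0e4736", ctx.sampled === true
```

In practice the OpenTelemetry SDK's propagators handle this; the sketch only illustrates the wire format.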
---

## Key Traces

| Operation | Span Name | Attributes |
|-----------|-----------|------------|
| Promotion request | `promotion.request` | promotion_id, release_id, environment |
| Gate evaluation | `promotion.evaluate_gates` | gate_names, result |
| Workflow execution | `workflow.execute` | workflow_run_id, template_name |
| Step execution | `workflow.step.{type}` | step_run_id, node_id, inputs |
| Deployment job | `deployment.execute` | job_id, environment, strategy |
| Agent task | `agent.task.{type}` | task_id, agent_id, target_id |
| Plugin call | `plugin.{method}` | plugin_id, method, duration |

---
## Trace Hierarchy

### Promotion Flow

```
promotion.request (root)
+-- promotion.evaluate_gates
|   +-- gate.security
|   +-- gate.approval
|   +-- gate.freeze_window
|
+-- workflow.execute
|   +-- workflow.step.security-check
|   +-- workflow.step.approval
|   +-- workflow.step.deploy
|       +-- deployment.execute
|           +-- deployment.assign_tasks
|           +-- agent.task.pull
|           +-- agent.task.deploy
|           +-- agent.task.health_check
|
+-- evidence.generate
+-- evidence.sign
```
---

## Span Attributes

### Common Attributes

| Attribute | Type | Description |
|-----------|------|-------------|
| `tenant.id` | string | Tenant UUID |
| `user.id` | string | User UUID (if authenticated) |
| `release.id` | string | Release UUID |
| `environment.name` | string | Environment name |
| `error` | boolean | Whether an error occurred |
| `error.type` | string | Error type/class |

### Promotion Attributes

| Attribute | Type | Description |
|-----------|------|-------------|
| `promotion.id` | string | Promotion UUID |
| `promotion.status` | string | Current status |
| `promotion.gates` | string[] | Gates evaluated |
| `promotion.decision` | string | allow/deny |

### Deployment Attributes

| Attribute | Type | Description |
|-----------|------|-------------|
| `deployment.job_id` | string | Deployment job UUID |
| `deployment.strategy` | string | Deployment strategy |
| `deployment.target_count` | int | Number of targets |
| `deployment.batch_size` | int | Batch size |

### Agent Task Attributes

| Attribute | Type | Description |
|-----------|------|-------------|
| `task.id` | string | Task UUID |
| `task.type` | string | Task type |
| `agent.id` | string | Agent UUID |
| `target.id` | string | Target UUID |

---
## OpenTelemetry Configuration

### SDK Configuration

```yaml
# otel-config.yaml
service:
  name: stella-release-orchestrator
  version: ${VERSION}

exporters:
  otlp:
    endpoint: otel-collector:4317
    protocol: grpc

processors:
  batch:
    timeout: 10s
    send_batch_size: 1024

resource:
  attributes:
    - key: service.namespace
      value: stella-ops
    - key: deployment.environment
      value: ${ENVIRONMENT}
```
### Environment Variables

```bash
OTEL_SERVICE_NAME=stella-release-orchestrator
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1
```

---

## Sampling Strategy

| Environment | Sampling Rate | Reason |
|-------------|---------------|--------|
| Development | 100% | Full visibility |
| Staging | 100% | Full visibility |
| Production | 10% | Cost/performance |
| Production (errors) | 100% | Always sample errors |
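
Combining the two production rows above (10% baseline, 100% on errors) requires an export decision that can override the head-sampling verdict once an error is recorded; in OpenTelemetry terms this is tail-based sampling. A simplified sketch of the decision logic, not the SDK API:

```typescript
// Simplified sketch of the production sampling policy above:
// keep 10% of traces up front, but always keep traces that recorded an error.
function shouldExport(span: { hasError: boolean; headSampled: boolean }): boolean {
  return span.hasError || span.headSampled;
}

// Head decision derived deterministically from the trace id,
// so all spans of one trace agree on the verdict.
function headSample(traceId: string, ratio: number): boolean {
  // Use the low 8 hex chars of the trace id as a pseudo-random value in [0, 1).
  const value = parseInt(traceId.slice(-8), 16) / 0x100000000;
  return value < ratio;
}
```

The real `parentbased_traceidratio` sampler hashes the trace id similarly; the error override must happen at the collector or exporter, after span end.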
---

## Example Trace

```json
{
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
  "spans": [
    {
      "spanId": "00f067aa0ba902b7",
      "name": "promotion.request",
      "duration_ms": 5234,
      "attributes": {
        "promotion.id": "promo-123",
        "release.id": "rel-456",
        "environment.name": "production"
      }
    },
    {
      "spanId": "00f067aa0ba902b8",
      "parentSpanId": "00f067aa0ba902b7",
      "name": "gate.security",
      "duration_ms": 234,
      "attributes": {
        "gate.result": "passed",
        "vulnerabilities.critical": 0
      }
    }
  ]
}
```

---

## See Also

- [Observability Overview](overview.md)
- [Logging](logging.md)
- [Metrics](metrics.md)
- [Alerting](alerting.md)
@@ -0,0 +1,266 @@

# A/B Release Models

> Two models for A/B releases: target-group based and router-based traffic splitting.

**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 11.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Progressive Delivery Module](../modules/progressive-delivery.md), [Traffic Router](routers.md)
**Sprint:** [110_001 A/B Release Manager](../../../../implplan/SPRINT_20260110_110_001_PROGDL_ab_release_manager.md)

## Overview

Stella Ops supports two distinct models for A/B releases:

1. **Target-Group A/B:** Scale different target groups to shift workload
2. **Router-Based A/B:** Use traffic routers to split requests between variations

Each model has different use cases, trade-offs, and implementation requirements.

---
## Model 1: Target-Group A/B

Target-group A/B splits traffic by scaling different groups of targets. It is suitable for worker services, background processors, and scenarios where sticky sessions are not required.

### Configuration

```typescript
interface TargetGroupABConfig {
  type: "target-group";

  // Group definitions
  groupA: {
    targetGroupId: UUID;
    labels?: Record<string, string>;
  };
  groupB: {
    targetGroupId: UUID;
    labels?: Record<string, string>;
  };

  // Rollout by scaling groups
  rolloutStrategy: {
    type: "scale-groups";
    stages: ScaleStage[];
  };
}

interface ScaleStage {
  name: string;
  groupAPercentage: number; // Percentage of group A targets active
  groupBPercentage: number; // Percentage of group B targets active
  duration?: number;        // Auto-advance after duration (seconds)
  healthThreshold?: number; // Required health % to advance
  requireApproval?: boolean;
}
```
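
Translating a `ScaleStage` into concrete instance counts is a straightforward percentage calculation; the helper below is hypothetical (the orchestrator's actual scaler is not specified here), and rounds up so a non-zero percentage always activates at least one target:

```typescript
// Hypothetical helper: turn a ScaleStage into concrete instance counts,
// given the total size of each target group.
function stageInstanceCounts(
  stage: { groupAPercentage: number; groupBPercentage: number },
  groupASize: number,
  groupBSize: number
): { a: number; b: number } {
  // Math.ceil ensures e.g. 10% of 5 targets still activates one canary target.
  return {
    a: Math.ceil((stage.groupAPercentage / 100) * groupASize),
    b: Math.ceil((stage.groupBPercentage / 100) * groupBSize),
  };
}

// A canary stage (100% A, 10% B) applied to two groups of 10 targets:
const counts = stageInstanceCounts({ groupAPercentage: 100, groupBPercentage: 10 }, 10, 10);
// counts => { a: 10, b: 1 }
```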
### Example: Worker Service Canary

```typescript
const workerCanaryConfig: TargetGroupABConfig = {
  type: "target-group",
  groupA: { labels: { "worker-group": "A" } },
  groupB: { labels: { "worker-group": "B" } },
  rolloutStrategy: {
    type: "scale-groups",
    stages: [
      // Stage 1: 100% A, 10% B (canary)
      { name: "canary", groupAPercentage: 100, groupBPercentage: 10,
        duration: 300, healthThreshold: 95 },
      // Stage 2: 100% A, 50% B
      { name: "expand", groupAPercentage: 100, groupBPercentage: 50,
        duration: 600, healthThreshold: 95 },
      // Stage 3: 50% A, 100% B
      { name: "shift", groupAPercentage: 50, groupBPercentage: 100,
        duration: 600, healthThreshold: 95 },
      // Stage 4: 0% A, 100% B (complete)
      { name: "complete", groupAPercentage: 0, groupBPercentage: 100,
        requireApproval: true },
    ],
  },
};
```
### Use Cases

- Background job processors
- Worker services without external traffic
- Infrastructure-level splitting
- Static traffic distribution
- Hardware-based variants

---
## Model 2: Router-Based A/B

Router-based A/B uses traffic routers (Nginx, HAProxy, ALB) to split incoming requests between variations. It is suitable for APIs, web services, and scenarios requiring sticky sessions.

### Configuration

```typescript
interface RouterBasedABConfig {
  type: "router-based";

  // Router integration
  routerIntegrationId: UUID;

  // Upstream configuration
  upstreamName: string;
  variationA: {
    targets: string[];
    serviceName?: string;
  };
  variationB: {
    targets: string[];
    serviceName?: string;
  };

  // Traffic split configuration
  trafficSplit: TrafficSplitConfig;

  // Rollout strategy
  rolloutStrategy: RouterRolloutStrategy;
}

interface TrafficSplitConfig {
  type: "weight" | "header" | "cookie" | "tenant" | "composite";

  // Weight-based (percentage)
  weights?: { A: number; B: number };

  // Header-based
  headerName?: string;
  headerValueA?: string;
  headerValueB?: string;

  // Cookie-based
  cookieName?: string;
  cookieValueA?: string;
  cookieValueB?: string;

  // Tenant-based (by host/path)
  tenantRules?: TenantRule[];
}
```
### Rollout Strategy

```typescript
interface RouterRolloutStrategy {
  type: "manual" | "time-based" | "health-based" | "composite";
  stages: RouterRolloutStage[];
}

interface RouterRolloutStage {
  name: string;
  trafficPercentageB: number;  // % of traffic to variation B

  // Advancement criteria
  duration?: number;           // Auto-advance after duration
  healthThreshold?: number;    // Required health %
  errorRateThreshold?: number; // Max error rate %
  latencyThreshold?: number;   // Max p99 latency ms
  requireApproval?: boolean;

  // Optional: specific routing rules for this stage
  routingOverrides?: TrafficSplitConfig;
}
```
### Example: API Canary with Health-Based Advancement

```typescript
const apiCanaryConfig: RouterBasedABConfig = {
  type: "router-based",
  routerIntegrationId: "nginx-prod",
  upstreamName: "api-backend",
  variationA: { serviceName: "api-v1" },
  variationB: { serviceName: "api-v2" },
  trafficSplit: { type: "weight", weights: { A: 100, B: 0 } },
  rolloutStrategy: {
    type: "health-based",
    stages: [
      { name: "canary-10", trafficPercentageB: 10,
        duration: 300, healthThreshold: 99, errorRateThreshold: 1 },
      { name: "canary-25", trafficPercentageB: 25,
        duration: 600, healthThreshold: 99, errorRateThreshold: 1 },
      { name: "canary-50", trafficPercentageB: 50,
        duration: 900, healthThreshold: 99, errorRateThreshold: 1 },
      { name: "promote", trafficPercentageB: 100,
        requireApproval: true },
    ],
  },
};
```
### Use Cases

- API services with external traffic
- Web applications with user sessions
- Dynamic traffic distribution
- User-based variants (A/B testing)
- Feature flags and gradual rollouts

---
## Routing Strategies

### Weight-Based Routing

Splits traffic by percentage across variations.

```yaml
trafficSplit:
  type: weight
  weights:
    A: 90
    B: 10
```

### Header-Based Routing

Routes based on request header values.

```yaml
trafficSplit:
  type: header
  headerName: X-Feature-Flag
  headerValueA: "control"
  headerValueB: "experiment"
```

### Cookie-Based Routing

Routes based on cookie values for sticky sessions.

```yaml
trafficSplit:
  type: cookie
  cookieName: ab_variation
  cookieValueA: "A"
  cookieValueB: "B"
```
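
A request-time resolver for the three split types above could look like the following sketch. The function, the request shape, and the discriminated-union type are all illustrative; in production the router itself (Nginx, HAProxy, ALB) evaluates these rules in its own configuration language:

```typescript
// Illustrative resolver for the weight/header/cookie split types above.
type Split =
  | { type: "weight"; weights: { A: number; B: number } }
  | { type: "header"; headerName: string; headerValueB: string }
  | { type: "cookie"; cookieName: string; cookieValueB: string };

interface AbRequest {
  headers: Record<string, string>;
  cookies: Record<string, string>;
}

// Returns "A" or "B" for a request; weight-based routing takes a caller-supplied
// random value so a sticky-session layer can pin the result per user.
function resolveVariation(split: Split, req: AbRequest, rand: number = Math.random()): "A" | "B" {
  switch (split.type) {
    case "weight":
      return rand * 100 < split.weights.B ? "B" : "A";
    case "header":
      return req.headers[split.headerName] === split.headerValueB ? "B" : "A";
    case "cookie":
      return req.cookies[split.cookieName] === split.cookieValueB ? "B" : "A";
  }
}
```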
---

## Comparison Matrix

| Aspect | Target-Group A/B | Router-Based A/B |
|--------|------------------|------------------|
| **Traffic Control** | By scaling targets | By routing rules |
| **Sticky Sessions** | Not supported | Supported |
| **Granularity** | Target-level | Request-level |
| **External Traffic** | Not required | Required |
| **Infrastructure** | Target groups | Traffic router |
| **Use Case** | Workers, batch jobs | APIs, web apps |
| **Rollback Speed** | Slower (scaling) | Immediate (routing) |

---
## See Also

- [Progressive Delivery Module](../modules/progressive-delivery.md)
- [Canary Controller](canary.md)
- [Router Plugins](routers.md)
- [Deployment Strategies](../deployment/strategies.md)

270
docs/modules/release-orchestrator/progressive-delivery/canary.md
Normal file
@@ -0,0 +1,270 @@
# Canary Controller

> Automated canary deployment controller with health-based stage advancement and automatic rollback.

**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 11.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Progressive Delivery Module](../modules/progressive-delivery.md), [Deployment Strategies](../deployment/strategies.md)
**Sprint:** [110_003 Canary Controller](../../../../implplan/SPRINT_20260110_110_003_PROGDL_canary_controller.md)

## Overview

The Canary Controller automates progressive rollout of new versions by gradually shifting traffic, monitoring health metrics, and automatically rolling back if issues are detected.

---

## Canary State Machine

### States

```
CREATED -> DEPLOYING -> EVALUATING -> PROMOTING/ROLLING_BACK -> COMPLETED
```
| State | Description |
|-------|-------------|
| `CREATED` | Canary release defined, not started |
| `DEPLOYING` | Deploying variation B to targets |
| `EVALUATING` | Monitoring health metrics at current stage |
| `PROMOTING` | Advancing to next stage |
| `ROLLING_BACK` | Reverting to variation A |
| `COMPLETED` | Final state (promoted or rolled back) |

---
## Implementation

### Canary Controller Class

```typescript
class CanaryController {
  async executeRollout(abRelease: ABRelease): Promise<void> {
    const strategy = abRelease.rolloutStrategy;

    for (let i = 0; i < strategy.stages.length; i++) {
      const stage = strategy.stages[i];
      const stageRecord = await this.startStage(abRelease, stage, i);

      try {
        // 1. Apply traffic configuration for this stage
        await this.applyStageTraffic(abRelease, stage);
        this.emit("canary.stage_started", { abRelease, stage, stageNumber: i });

        // 2. Wait for stage completion based on criteria
        const result = await this.waitForStageCompletion(abRelease, stage);

        if (!result.success) {
          // Health check failed - roll back
          this.log(`Stage ${stage.name} failed health check: ${result.reason}`);
          await this.rollback(abRelease, result.reason);
          return;
        }

        // 3. Check if approval required
        if (stage.requireApproval) {
          this.log(`Stage ${stage.name} requires approval`);
          await this.pauseForApproval(abRelease, stage);

          // Wait for approval
          const approval = await this.waitForApproval(abRelease, stage);
          if (!approval.approved) {
            await this.rollback(abRelease, "Approval denied");
            return;
          }
        }

        await this.completeStage(stageRecord, "succeeded");
        this.emit("canary.stage_completed", { abRelease, stage, stageNumber: i });

      } catch (error) {
        await this.completeStage(stageRecord, "failed", error.message);
        await this.rollback(abRelease, error.message);
        return;
      }
    }

    // Rollout complete
    await this.completeRollout(abRelease);
    this.emit("canary.promoted", { abRelease });
  }
}
```
### Stage Completion Logic

```typescript
private async waitForStageCompletion(
  abRelease: ABRelease,
  stage: RolloutStage
): Promise<StageCompletionResult> {
  const startTime = Date.now();
  const checkInterval = 30000; // 30 seconds

  while (true) {
    // Check health metrics
    const health = await this.checkHealth(abRelease, stage);

    if (!health.healthy) {
      return {
        success: false,
        reason: `Health check failed: ${health.reason}`
      };
    }

    // Check error rate (if threshold configured)
    if (stage.errorRateThreshold !== undefined) {
      const errorRate = await this.getErrorRate(abRelease);
      if (errorRate > stage.errorRateThreshold) {
        return {
          success: false,
          reason: `Error rate ${errorRate}% exceeds threshold ${stage.errorRateThreshold}%`
        };
      }
    }

    // Check latency (if threshold configured)
    if (stage.latencyThreshold !== undefined) {
      const latency = await this.getP99Latency(abRelease);
      if (latency > stage.latencyThreshold) {
        return {
          success: false,
          reason: `P99 latency ${latency}ms exceeds threshold ${stage.latencyThreshold}ms`
        };
      }
    }

    // Check duration (auto-advance)
    if (stage.duration !== undefined) {
      const elapsed = (Date.now() - startTime) / 1000;
      if (elapsed >= stage.duration) {
        return { success: true };
      }
    }

    // Wait before next check
    await sleep(checkInterval);
  }
}
```
### Traffic Application

```typescript
private async applyStageTraffic(abRelease: ABRelease, stage: RolloutStage): Promise<void> {
  if (abRelease.config.type === "router-based") {
    const router = await this.getRouterConnector(abRelease.config.routerIntegrationId);

    await router.shiftTraffic(
      abRelease.config.variationA.serviceName,
      abRelease.config.variationB.serviceName,
      stage.trafficPercentageB
    );

  } else if (abRelease.config.type === "target-group") {
    // Scale target groups
    await this.scaleTargetGroup(
      abRelease.config.groupA,
      stage.groupAPercentage
    );
    await this.scaleTargetGroup(
      abRelease.config.groupB,
      stage.groupBPercentage
    );
  }
}
```
### Rollback

```typescript
async rollback(abRelease: ABRelease, reason: string): Promise<void> {
  this.log(`Rolling back A/B release: ${reason}`);
  this.emit("canary.rollback_started", { abRelease, reason });

  if (abRelease.config.type === "router-based") {
    // Shift all traffic back to A
    const router = await this.getRouterConnector(abRelease.config.routerIntegrationId);
    await router.shiftTraffic(
      abRelease.config.variationB.serviceName,
      abRelease.config.variationA.serviceName,
      100
    );

  } else if (abRelease.config.type === "target-group") {
    // Scale B to 0, A to 100
    await this.scaleTargetGroup(abRelease.config.groupA, 100);
    await this.scaleTargetGroup(abRelease.config.groupB, 0);
  }

  abRelease.status = "rolled_back";
  await this.save(abRelease);

  this.emit("canary.rolled_back", { abRelease, reason });
}
```
---

## Configuration

### Canary Stages

```yaml
rolloutStrategy:
  type: health-based
  stages:
    - name: canary-5
      trafficPercentageB: 5
      duration: 300 # 5 minutes
      healthThreshold: 99
      errorRateThreshold: 0.5

    - name: canary-25
      trafficPercentageB: 25
      duration: 600 # 10 minutes
      healthThreshold: 99
      errorRateThreshold: 1.0

    - name: canary-50
      trafficPercentageB: 50
      duration: 900 # 15 minutes
      healthThreshold: 99
      errorRateThreshold: 1.0

    - name: promote
      trafficPercentageB: 100
      requireApproval: true
```
### Health Metrics

| Metric | Description | Typical Threshold |
|--------|-------------|-------------------|
| Success Rate | % of successful requests | > 99% |
| Error Rate | % of failed requests | < 1% |
| P99 Latency | 99th percentile response time | < 500ms |
| Health Check | Container/service health | Healthy |
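
The Error Rate check reduces to a percentage comparison; a small sketch of the arithmetic (the function names are illustrative, not part of the controller's API):

```typescript
// Illustrative evaluation of the Error Rate metric against a stage threshold.
function errorRatePercent(failed: number, total: number): number {
  return total === 0 ? 0 : (failed / total) * 100;
}

function violatesThreshold(failed: number, total: number, thresholdPercent: number): boolean {
  return errorRatePercent(failed, total) > thresholdPercent;
}

// 12 failures out of 1000 requests is 1.2%, which breaches a 1% threshold
// and would trigger the controller's rollback path.
```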
---

## Events

The canary controller emits events for observability:

| Event | Description |
|-------|-------------|
| `canary.stage_started` | Stage execution began |
| `canary.stage_completed` | Stage completed successfully |
| `canary.rollback_started` | Rollback initiated |
| `canary.rolled_back` | Rollback completed |
| `canary.promoted` | Full promotion completed |

---

## See Also

- [Progressive Delivery Module](../modules/progressive-delivery.md)
- [A/B Release Models](ab-releases.md)
- [Router Plugins](routers.md)
- [Metrics](../operations/metrics.md)

@@ -0,0 +1,348 @@
# Router Plugins

> Traffic router plugins for progressive delivery (Nginx, AWS ALB, and custom implementations).

**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 11.4](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Progressive Delivery Module](../modules/progressive-delivery.md), [Plugin System](../modules/plugin-system.md)
**Sprint:** [110_004 Router Plugins](../../../../implplan/SPRINT_20260110_110_004_PROGDL_nginx_router.md)

## Overview

Router plugins enable traffic shifting for progressive delivery. The orchestrator ships with an Nginx router plugin for v1, with HAProxy, Traefik, and AWS ALB available as additional plugins.

---

## Router Plugin Interface

All router plugins implement the `TrafficRouterPlugin` interface:
```typescript
interface TrafficRouterPlugin {
  // Configuration
  configureRoute(config: RouteConfig): Promise<void>;

  // Traffic operations
  shiftTraffic(from: string, to: string, percentage: number): Promise<void>;
  getTrafficDistribution(): Promise<TrafficDistribution>;

  // Health
  validateConfig(): Promise<ValidationResult>;
  reload(): Promise<void>;
}

interface RouteConfig {
  upstream: string;
  serverName: string;
  variations: Variation[];
  splitType: "weight" | "header" | "cookie";
  headerName?: string;
  headerValueB?: string;
  stickySession?: boolean;
  stickyDuration?: number;
}

interface Variation {
  name: string;
  targets: string[];
  weight: number;
}

interface TrafficDistribution {
  variations: {
    name: string;
    percentage: number;
    targets: string[];
  }[];
}
```
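
A minimal in-memory implementation of the interface is useful when unit-testing consumers such as the canary controller. This is a hypothetical sketch: it assumes `ValidationResult` is `{ valid: boolean; error?: string }` and inlines a reduced `RouteConfig` rather than depending on the full types above:

```typescript
// Hypothetical in-memory router plugin for tests (not shipped with the orchestrator).
// Assumes ValidationResult = { valid: boolean; error?: string }.
class InMemoryRouterPlugin {
  private weights = new Map<string, number>();

  async configureRoute(config: { variations: { name: string; weight: number }[] }): Promise<void> {
    for (const v of config.variations) this.weights.set(v.name, v.weight);
  }

  async shiftTraffic(from: string, to: string, percentage: number): Promise<void> {
    // Two-variation assumption, mirroring the A/B model: the rest goes back to `from`.
    this.weights.set(to, percentage);
    this.weights.set(from, 100 - percentage);
  }

  async getTrafficDistribution(): Promise<{ variations: { name: string; percentage: number; targets: string[] }[] }> {
    return {
      variations: [...this.weights.entries()].map(([name, percentage]) => ({
        name, percentage, targets: [],
      })),
    };
  }

  async validateConfig(): Promise<{ valid: boolean; error?: string }> {
    return { valid: true }; // nothing to validate in memory
  }

  async reload(): Promise<void> { /* nothing to reload in memory */ }
}
```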
---

## Nginx Router Plugin (v1 Built-in)

The Nginx plugin generates and manages Nginx configuration for traffic splitting.

### Implementation

```typescript
class NginxRouterPlugin implements TrafficRouterPlugin {
  async configureRoute(config: RouteConfig): Promise<void> {
    const upstreamConfig = this.generateUpstreamConfig(config);
    const serverConfig = this.generateServerConfig(config);

    // Write configuration files
    await this.writeConfig(
      `/etc/nginx/conf.d/upstream-${config.upstream}.conf`,
      upstreamConfig
    );
    await this.writeConfig(
      `/etc/nginx/conf.d/server-${config.upstream}.conf`,
      serverConfig
    );

    // Validate configuration
    const validation = await this.validateConfig();
    if (!validation.valid) {
      throw new Error(`Nginx config validation failed: ${validation.error}`);
    }

    // Reload nginx
    await this.reload();
  }
}
```
### Upstream Configuration

```typescript
private generateUpstreamConfig(config: RouteConfig): string {
  const lines: string[] = [];

  for (const variation of config.variations) {
    lines.push(`upstream ${config.upstream}_${variation.name} {`);

    for (const target of variation.targets) {
      lines.push(`  server ${target};`);
    }

    lines.push(`}`);
    lines.push(``);
  }

  // Combined upstream with weights (for percentage-based routing)
  if (config.splitType === "weight") {
    lines.push(`upstream ${config.upstream} {`);

    for (const variation of config.variations) {
      const weight = variation.weight;
      for (const target of variation.targets) {
        lines.push(`  server ${target} weight=${weight};`);
      }
    }

    lines.push(`}`);
  }

  return lines.join("\n");
}
```
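
Fed the earlier 90/10 weight-based split for an `api-backend` upstream, with one hypothetical target per variation (the addresses below are illustrative), the generator above would emit configuration along these lines:

```
upstream api-backend_A {
  server 10.0.0.1:8080;
}

upstream api-backend_B {
  server 10.0.0.2:8080;
}

upstream api-backend {
  server 10.0.0.1:8080 weight=90;
  server 10.0.0.2:8080 weight=10;
}
```

Nginx distributes requests across the combined upstream's servers in proportion to their `weight` values.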
### Server Configuration

```typescript
private generateServerConfig(config: RouteConfig): string {
  if (config.splitType === "header" || config.splitType === "cookie") {
    // Split block based on header/cookie
    return `
map $http_${config.headerName || "x-variation"} $${config.upstream}_backend {
  default ${config.upstream}_A;
  "${config.headerValueB || "B"}" ${config.upstream}_B;
}

server {
  listen 80;
  server_name ${config.serverName};

  location / {
    proxy_pass http://$${config.upstream}_backend;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
  }
}
`;
  } else {
    // Weight-based (default)
    return `
server {
  listen 80;
  server_name ${config.serverName};

  location / {
    proxy_pass http://${config.upstream};
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
  }
}
`;
  }
}
```
### Traffic Shifting

```typescript
async shiftTraffic(from: string, to: string, percentage: number): Promise<void> {
  const config = await this.getCurrentConfig();

  // Update weights
  for (const variation of config.variations) {
    if (variation.name === to) {
      variation.weight = percentage;
    } else {
      variation.weight = 100 - percentage;
    }
  }

  await this.configureRoute(config);
}

async getTrafficDistribution(): Promise<TrafficDistribution> {
  // Parse current nginx config to get weights
  const config = await this.parseCurrentConfig();

  return {
    variations: config.variations.map(v => ({
      name: v.name,
      percentage: v.weight,
      targets: v.targets,
    })),
  };
}
```
|
||||
|
||||
---
|
||||
|
||||
## AWS ALB Router Plugin
|
||||
|
||||
The AWS ALB plugin manages weighted target groups for traffic splitting.
|
||||
|
||||
### Implementation
|
||||
|
||||
```typescript
|
||||
class AWSALBRouterPlugin implements TrafficRouterPlugin {
|
||||
private alb: AWS.ELBv2;
|
||||
|
||||
async configureRoute(config: RouteConfig): Promise<void> {
|
||||
    const listenerArn = config.listenerArn;

    // Create/update target groups for each variation
    const targetGroupArns: Record<string, string> = {};

    for (const variation of config.variations) {
      const tgArn = await this.ensureTargetGroup(
        `${config.upstream}-${variation.name}`,
        variation.targets
      );
      targetGroupArns[variation.name] = tgArn;
    }

    // Update listener rule with weighted target groups
    await this.alb.modifyRule({
      RuleArn: config.ruleArn,
      Actions: [{
        Type: "forward",
        ForwardConfig: {
          TargetGroups: config.variations.map(v => ({
            TargetGroupArn: targetGroupArns[v.name],
            Weight: v.weight,
          })),
          TargetGroupStickinessConfig: {
            Enabled: config.stickySession || false,
            DurationSeconds: config.stickyDuration || 3600,
          },
        },
      }],
    }).promise();
  }

  async shiftTraffic(from: string, to: string, percentage: number): Promise<void> {
    const rule = await this.getRule();
    const forwardConfig = rule.Actions[0].ForwardConfig;

    // Update weights
    for (const tg of forwardConfig.TargetGroups) {
      if (tg.TargetGroupArn.includes(`-${to}`)) {
        tg.Weight = percentage;
      } else {
        tg.Weight = 100 - percentage;
      }
    }

    await this.alb.modifyRule({
      RuleArn: rule.RuleArn,
      Actions: rule.Actions,
    }).promise();
  }

  async getTrafficDistribution(): Promise<TrafficDistribution> {
    const rule = await this.getRule();
    const forwardConfig = rule.Actions[0].ForwardConfig;

    const variations = [];
    for (const tg of forwardConfig.TargetGroups) {
      const targets = await this.getTargetGroupTargets(tg.TargetGroupArn);
      const name = tg.TargetGroupArn.split("-").pop();

      variations.push({
        name,
        percentage: tg.Weight,
        targets: targets.map(t => t.Id),
      });
    }

    return { variations };
  }
}
```
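The `shiftTraffic` method above splits a single percentage between the promoted variation and every other target group. The arithmetic can be isolated as a pure helper for a quick sanity check (`computeWeights` is illustrative, not part of the plugin API; note that ALB treats forward-action weights as relative values):

```typescript
// Illustrative helper mirroring the weight loop in shiftTraffic above:
// the `to` variation receives `percentage`, every other group receives
// the remainder.
function computeWeights(
  groups: string[],
  to: string,
  percentage: number
): Record<string, number> {
  const weights: Record<string, number> = {};
  for (const name of groups) {
    weights[name] = name === to ? percentage : 100 - percentage;
  }
  return weights;
}

// Ramping a canary from 10% to 50% of traffic:
const at10 = computeWeights(["stable", "canary"], "canary", 10);
// at10 => { stable: 90, canary: 10 }
const at50 = computeWeights(["stable", "canary"], "canary", 50);
// at50 => { stable: 50, canary: 50 }
```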

---

## Router Plugin Catalog

| Plugin | Status | Description |
|--------|--------|-------------|
| Nginx | v1 Built-in | Configuration-based weight/header routing |
| HAProxy | Plugin | Runtime API for traffic management |
| Traefik | Plugin | Dynamic configuration via API |
| AWS ALB | Plugin | Weighted target groups |
| Envoy | Planned | xDS API integration |

---

## Creating Custom Router Plugins

To create a custom router plugin:

1. **Implement Interface:** Create a class implementing `TrafficRouterPlugin`
2. **Register Plugin:** Add to plugin registry with capabilities
3. **Configuration Schema:** Define JSON Schema for plugin config
4. **Health Checks:** Implement connection testing
5. **Rollback Support:** Handle traffic reversion on failures
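The steps above can be sketched as a minimal plugin skeleton. The `TrafficRouterPlugin` interface name comes from this document, but the method names (`setWeights`, `healthCheck`, `rollback`) and their shapes are assumptions made for illustration:

```typescript
// Hypothetical sketch of steps 1, 4, and 5; method signatures are assumed.
interface TrafficRouterPlugin {
  setWeights(weights: Record<string, number>): Promise<void>;
  healthCheck(): Promise<boolean>;
  rollback(): Promise<void>;
}

class MyRouterPlugin implements TrafficRouterPlugin {
  private current: Record<string, number> = {};
  private previous: Record<string, number> = {};

  async setWeights(weights: Record<string, number>): Promise<void> {
    this.previous = this.current;  // remember last state for rollback
    this.current = { ...weights }; // a real plugin would call the router API here
  }

  async healthCheck(): Promise<boolean> {
    return true; // a real plugin would test connectivity to its endpoint
  }

  async rollback(): Promise<void> {
    this.current = this.previous; // revert traffic on failure (step 5)
  }

  weights(): Record<string, number> {
    return this.current;
  }
}
```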

### Example Plugin Manifest

```yaml
plugin:
  name: my-router
  version: 1.0.0
  type: router

capabilities:
  - traffic-routing
  - weight-based
  - header-based

config:
  type: object
  properties:
    endpoint:
      type: string
      description: Router API endpoint
    auth:
      type: object
      properties:
        type:
          enum: [basic, token]
        credentialRef:
          type: string
```

---

## See Also

- [Progressive Delivery Module](../modules/progressive-delivery.md)
- [Plugin System](../modules/plugin-system.md)
- [Canary Controller](canary.md)
- [A/B Release Models](ab-releases.md)
246
docs/modules/release-orchestrator/roadmap.md
Normal file
@@ -0,0 +1,246 @@

# Implementation Roadmap

> Phased delivery plan for the Release Orchestrator implementation.

**Status:** Planned
**Source:** [Architecture Advisory Section 14](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related:** [Implementation Guide](implementation-guide.md), [Test Structure](test-structure.md)

## Overview

The Release Orchestrator is delivered in 8 phases over 34 weeks, progressively building from foundational infrastructure to full plugin ecosystem support.

---

## Phased Delivery Plan

### Phase 1: Foundation (Weeks 1-4)

**Goal:** Core infrastructure and basic release management

| Week | Deliverables |
|------|--------------|
| Week 1 | Database schema migration; INTHUB integration-manager; connection-profiles |
| Week 2 | ENVMGR environment-manager; target-registry (basic) |
| Week 3 | RELMAN component-registry; version-manager; release-manager |
| Week 4 | Basic release CRUD APIs; CLI commands; integration tests |

**Exit Criteria:**
- Can create environments with config
- Can register components with image repos
- Can create releases with pinned digests
- Can list/search releases

**Certified Path:** Manual release creation; no deployment yet

---

### Phase 2: Workflow Engine (Weeks 5-8)

**Goal:** Workflow execution capability

| Week | Deliverables |
|------|--------------|
| Week 5 | WORKFL step-registry; built-in step types (approval, policy-gate, notify) |
| Week 6 | WORKFL workflow-designer; workflow template CRUD |
| Week 7 | WORKFL workflow-engine; DAG execution; state machine |
| Week 8 | Step executor; retry logic; timeout handling; workflow run APIs |

**Exit Criteria:**
- Can create workflow templates via API
- Can execute workflows with approval steps
- Workflow state machine handles all transitions
- Step retries work correctly

**Certified Path:** Approval-only workflows; no deployment execution yet

---

### Phase 3: Promotion & Decision (Weeks 9-12)

**Goal:** Promotion workflow with security gates

| Week | Deliverables |
|------|--------------|
| Week 9 | PROMOT promotion-manager; approval-gateway |
| Week 10 | PROMOT decision-engine; security gate integration with SCANENG |
| Week 11 | Gate registry; freeze window gate; SoD enforcement |
| Week 12 | Promotion APIs; "Why blocked?" endpoint; decision record |

**Exit Criteria:**
- Can request promotion
- Security gates evaluate scan verdicts
- Approval workflow enforces SoD
- Decision record captures gate results

**Certified Path:** Promotions with security + approval gates; no deployment yet

---

### Phase 4: Deployment Execution (Weeks 13-18)

**Goal:** Deploy to Docker/Compose targets

| Week | Deliverables |
|------|--------------|
| Week 13 | AGENTS agent-core; agent registration; heartbeat |
| Week 14 | AGENTS agent-docker; Docker host deployment |
| Week 15 | AGENTS agent-compose; Compose deployment |
| Week 16 | DEPLOY deploy-orchestrator; artifact-generator |
| Week 17 | DEPLOY rollback-manager; version sticker writing |
| Week 18 | RELEVI evidence-collector; evidence-signer; audit-exporter |

**Exit Criteria:**
- Agents can register and receive tasks
- Docker deployment works with digest verification
- Compose deployment writes lock files
- Rollback restores previous version
- Evidence packets generated for deployments

**Certified Path:** Full promotion -> deployment flow for Docker/Compose

---

### Phase 5: UI & Polish (Weeks 19-22)

**Goal:** Web console for release orchestration

| Week | Deliverables |
|------|--------------|
| Week 19 | Dashboard components; metrics widgets |
| Week 20 | Environment overview; release detail screens |
| Week 21 | Workflow editor (graph); run visualization |
| Week 22 | Promotion UI; approval queue; "Why blocked?" modal |

**Exit Criteria:**
- Dashboard shows operational metrics
- Can manage environments/releases via UI
- Can create/edit workflows in graph editor
- Can approve promotions via UI

**Certified Path:** Complete v1 user experience

---

### Phase 6: Progressive Delivery (Weeks 23-26)

**Goal:** A/B releases and canary deployments

| Week | Deliverables |
|------|--------------|
| Week 23 | PROGDL ab-manager; target-group A/B |
| Week 24 | PROGDL canary-controller; stage execution |
| Week 25 | PROGDL traffic-router; Nginx plugin |
| Week 26 | Canary UI; traffic visualization; health monitoring |

**Exit Criteria:**
- Can create A/B release with variations
- Canary controller advances stages based on health
- Traffic router shifts weights
- Rollback on health failure works

**Certified Path:** Target-group A/B; Nginx router-based A/B

---

### Phase 7: Extended Targets (Weeks 27-30)

**Goal:** ECS and Nomad support; SSH/WinRM agentless

| Week | Deliverables |
|------|--------------|
| Week 27 | AGENTS agent-ssh; SSH remote executor |
| Week 28 | AGENTS agent-winrm; WinRM remote executor |
| Week 29 | AGENTS agent-ecs; ECS deployment |
| Week 30 | AGENTS agent-nomad; Nomad deployment |

**Exit Criteria:**
- SSH deployment works with script execution
- WinRM deployment works with PowerShell
- ECS task definition updates work
- Nomad job submissions work

**Certified Path:** All target types operational

---

### Phase 8: Plugin Ecosystem (Weeks 31-34)

**Goal:** Full plugin system; external integrations

| Week | Deliverables |
|------|--------------|
| Week 31 | PLUGIN plugin-registry; plugin-loader |
| Week 32 | PLUGIN plugin-sandbox; plugin-sdk |
| Week 33 | GitHub plugin; GitLab plugin |
| Week 34 | Jenkins plugin; Vault plugin |

**Exit Criteria:**
- Can install and configure plugins
- Plugins can contribute step types
- Plugins can contribute integrations
- Plugin sandbox enforces limits

**Certified Path:** GitHub + Harbor + Docker/Compose + Vault

---

## Resource Requirements

### Team Structure

| Role | Count | Responsibilities |
|------|-------|------------------|
| Tech Lead | 1 | Architecture decisions; code review; unblocking |
| Backend Engineers | 4 | Module development; API implementation |
| Frontend Engineers | 2 | Web console; dashboard; workflow editor |
| DevOps Engineer | 1 | CI/CD; infrastructure; agent deployment |
| QA Engineer | 1 | Test automation; integration testing |
| Technical Writer | 0.5 | Documentation; API docs; user guides |

### Infrastructure Requirements

| Component | Specification |
|-----------|---------------|
| PostgreSQL | Primary database; 16+ recommended; read replicas for scale |
| Redis | Job queues; caching; session storage |
| Object Storage | S3-compatible; evidence packets; large artifacts |
| Container Runtime | Docker; for plugin sandboxes |
| Kubernetes | Optional; for Stella core deployment (not required for targets) |

---

## Risk Mitigation

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Agent security complexity | High | High | Early security review; penetration testing; mTLS implementation in Phase 4 |
| Workflow state machine edge cases | Medium | High | Comprehensive state transition tests; chaos testing |
| Plugin sandbox escapes | Low | Critical | Security audit; capability restrictions; resource limits |
| Database migration issues | Medium | Medium | Staged rollout; rollback scripts; data validation |
| UI performance with large workflows | Medium | Medium | Virtual rendering; lazy loading; performance testing |
| Integration compatibility | High | Medium | Abstract connector interface; extensive integration tests |

---

## Success Metrics

| Phase | Key Metrics |
|-------|-------------|
| Phase 1 | Release creation time < 5s; API latency p99 < 200ms |
| Phase 2 | Workflow execution reliability > 99.9% |
| Phase 3 | Gate evaluation time < 500ms; SoD enforcement 100% |
| Phase 4 | Deployment success rate > 99%; rollback time < 60s |
| Phase 5 | UI initial load < 2s; real-time update latency < 1s |
| Phase 6 | Canary rollback trigger time < 30s |
| Phase 7 | All target type coverage with unified API |
| Phase 8 | Plugin sandbox isolation verified by security audit |

---

## References

- [Sprint Index](../../implplan/SPRINT_20260110_100_000_INDEX_release_orchestrator.md)
- [Implementation Guide](implementation-guide.md)
- [Test Structure](test-structure.md)
- [Architecture Overview](architecture.md)
286
docs/modules/release-orchestrator/security/agent-security.md
Normal file
@@ -0,0 +1,286 @@

# Agent Security Model

## Overview

Agents are trusted components that execute deployment tasks on targets. Their security model ensures:

- Strong identity through mTLS certificates
- Minimal privilege through scoped task credentials
- Audit trail through signed task receipts
- Isolation through process sandboxing

## Agent Registration Flow

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                          AGENT REGISTRATION FLOW                            │
│                                                                             │
│  1. Admin generates registration token (one-time use)                       │
│     POST /api/v1/admin/agent-tokens                                         │
│     Response: { token: "reg_xxx", expiresAt: "..." }                        │
│                                                                             │
│  2. Agent starts with registration token                                    │
│     ./stella-agent --register --token=reg_xxx                               │
│                                                                             │
│  3. Agent requests mTLS certificate                                         │
│     POST /api/v1/agents/register                                            │
│     Headers: X-Registration-Token: reg_xxx                                  │
│     Body: { name, version, capabilities, csr }                              │
│     Response: { agentId, certificate, caCertificate }                       │
│                                                                             │
│  4. Agent establishes mTLS connection                                       │
│     Uses issued certificate for all subsequent requests                     │
│                                                                             │
│  5. Agent requests short-lived JWT for task execution                       │
│     POST /api/v1/agents/token (over mTLS)                                   │
│     Response: { token, expiresIn: 3600 }                                    │
│                                                                             │
│  6. Agent refreshes token before expiration                                 │
│     Token refresh only over mTLS connection                                 │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## mTLS Communication

All agent-to-core communication uses mutual TLS:

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                       AGENT COMMUNICATION SECURITY                          │
│                                                                             │
│  ┌──────────────┐                          ┌──────────────┐                 │
│  │    AGENT     │                          │ STELLA CORE  │                 │
│  └──────┬───────┘                          └──────┬───────┘                 │
│         │                                         │                         │
│         │  mTLS (mutual TLS)                      │                         │
│         │  - Agent cert signed by Stella CA       │                         │
│         │  - Server cert verified by Agent        │                         │
│         │  - TLS 1.3 only                         │                         │
│         │  - Perfect forward secrecy              │                         │
│         │◄───────────────────────────────────────►│                         │
│         │                                         │                         │
│         │  Encrypted payload                      │                         │
│         │  - Task payloads encrypted with         │                         │
│         │    agent-specific key                   │                         │
│         │  - Logs encrypted in transit            │                         │
│         │◄───────────────────────────────────────►│                         │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

### TLS Requirements

| Requirement | Value |
|-------------|-------|
| Protocol | TLS 1.3 only |
| Cipher Suites | TLS_AES_256_GCM_SHA384, TLS_CHACHA20_POLY1305_SHA256 |
| Key Exchange | ECDHE with P-384 or X25519 |
| Certificate Key | RSA 4096-bit or ECDSA P-384 |
| Certificate Validity | 90 days (auto-renewed) |

## Certificate Management

### Certificate Structure

```typescript
interface AgentCertificate {
  subject: {
    CN: string;  // Agent name
    O: string;   // "Stella Ops"
    OU: string;  // Tenant ID
  };
  serialNumber: string;
  issuer: string;  // Stella CA
  validFrom: DateTime;
  validTo: DateTime;
  extensions: {
    keyUsage: ["digitalSignature", "keyEncipherment"];
    extendedKeyUsage: ["clientAuth"];
    subjectAltName: string[];  // Agent ID as URI
  };
}
```

### Certificate Renewal

Agents automatically renew certificates before expiration:

1. Agent detects certificate expiring within 30 days
2. Agent generates new CSR with same identity
3. Agent submits renewal request over existing mTLS connection
4. Authority issues new certificate
5. Agent transitions to new certificate seamlessly
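The renewal trigger in step 1 reduces to a date comparison against the 30-day window. A minimal sketch (the `shouldRenew` helper is illustrative, not a documented API):

```typescript
// Illustrative check for renewal step 1: renew when the certificate
// expires within the 30-day window described above.
const RENEWAL_WINDOW_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function shouldRenew(validTo: Date, now: Date): boolean {
  const daysLeft = (validTo.getTime() - now.getTime()) / MS_PER_DAY;
  return daysLeft <= RENEWAL_WINDOW_DAYS;
}

const now = new Date("2026-01-09T00:00:00Z");
shouldRenew(new Date("2026-01-20T00:00:00Z"), now); // 11 days left → true
shouldRenew(new Date("2026-06-01T00:00:00Z"), now); // months left → false
```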

## Secrets Management

Secrets are NEVER stored in the Stella database. Only vault references are stored.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                     SECRETS FLOW (NEVER STORED IN DB)                       │
│                                                                             │
│  ┌──────────────┐      ┌──────────────┐      ┌──────────────┐               │
│  │    VAULT     │      │ STELLA CORE  │      │    AGENT     │               │
│  │   (Source)   │      │   (Broker)   │      │  (Consumer)  │               │
│  └──────┬───────┘      └──────┬───────┘      └──────┬───────┘               │
│         │                     │                     │                       │
│         │                     │  Task requires secret                       │
│         │                     │                     │                       │
│         │  Fetch with service │                     │                       │
│         │  account token      │                     │                       │
│         │◄─────────────────── │                     │                       │
│         │                     │                     │                       │
│         │  Return secret      │                     │                       │
│         │  (wrapped, short TTL)                     │                       │
│         │────────────────────►│                     │                       │
│         │                     │                     │                       │
│         │                     │  Embed in task payload                      │
│         │                     │  (encrypted)        │                       │
│         │                     │────────────────────►│                       │
│         │                     │                     │                       │
│         │                     │                     │  Decrypt              │
│         │                     │                     │  Use for task         │
│         │                     │                     │  Discard              │
│                                                                             │
│  Rules:                                                                     │
│  - Secrets NEVER stored in Stella database                                  │
│  - Only Vault references stored                                             │
│  - Secrets fetched at execution time only                                   │
│  - Secrets not logged (masked in logs)                                      │
│  - Secrets not persisted in agent memory beyond task scope                  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Task Security

### Task Assignment

```typescript
interface AgentTask {
  id: UUID;
  type: TaskType;
  targetId: UUID;
  payload: TaskPayload;
  credentials: EncryptedCredentials;  // Encrypted with agent's public key
  timeout: number;
  priority: TaskPriority;
  idempotencyKey: string;
  assignedAt: DateTime;
  expiresAt: DateTime;
}
```

### Credential Scoping

Task credentials are:
- Scoped to specific target only
- Valid only for task duration
- Encrypted with agent's public key
- Logged when accessed (without values)
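The first two scoping rules can be enforced as a check before any credential use. This is a sketch under assumptions: the `ScopedCredential` field names and the `credentialUsable` helper are illustrative, not part of the documented task schema:

```typescript
// Hypothetical shape for a task-scoped credential; fields are illustrative.
interface ScopedCredential {
  targetId: string;
  notAfter: Date; // end of the task validity window
}

// Enforce the first two scoping rules: right target, still within window.
function credentialUsable(
  cred: ScopedCredential,
  targetId: string,
  now: Date
): boolean {
  return cred.targetId === targetId && now < cred.notAfter;
}
```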

### Task Execution Isolation

Agents execute tasks with isolation:

```typescript
interface TaskExecutionContext {
  // Process isolation
  workingDirectory: string;   // Unique per task
  processUser: string;        // Non-root user
  networkNamespace: string;   // If network isolation enabled

  // Resource limits
  memoryLimit: number;        // Bytes
  cpuLimit: number;           // Millicores
  diskLimit: number;          // Bytes
  networkEgress: string[];    // Allowed destinations

  // Cleanup
  cleanupOnComplete: boolean;
  cleanupTimeout: number;
}
```

## Agent Capabilities

Agents declare capabilities that determine what tasks they can execute:

```typescript
interface AgentCapabilities {
  docker?: DockerCapability;
  compose?: ComposeCapability;
  ssh?: SshCapability;
  winrm?: WinrmCapability;
  ecs?: EcsCapability;
  nomad?: NomadCapability;
}

interface DockerCapability {
  version: string;
  apiVersion: string;
  runtimes: string[];
  registryAuth: boolean;
}

interface ComposeCapability {
  version: string;
  fileFormats: string[];
}
```

## Heartbeat Protocol

```typescript
interface AgentHeartbeat {
  agentId: UUID;
  timestamp: DateTime;
  status: "healthy" | "degraded";
  resourceUsage: {
    cpuPercent: number;
    memoryPercent: number;
    diskPercent: number;
    networkRxBytes: number;
    networkTxBytes: number;
  };
  activeTaskCount: number;
  completedTasks: number;
  failedTasks: number;
  errors: string[];
  signature: string;  // HMAC of heartbeat data
}
```

### Heartbeat Validation

1. Verify signature matches expected HMAC
2. Check timestamp is within acceptable skew (30s)
3. Update agent status based on heartbeat content
4. Trigger alerts if heartbeat missing for >90s
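Checks 1 and 2 above can be sketched with Node's `crypto` module. The canonical payload format and the `validateHeartbeat` helper are assumptions for illustration; the document only specifies "HMAC of heartbeat data" and a 30-second skew:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const MAX_SKEW_MS = 30_000; // 30s acceptable skew (check 2)

// HMAC-SHA256 over a canonical payload string (format is an assumption).
function sign(payload: string, key: string): string {
  return createHmac("sha256", key).update(payload).digest("hex");
}

function validateHeartbeat(
  payload: string,
  signature: string,
  timestamp: Date,
  now: Date,
  key: string
): boolean {
  // Check 1: constant-time signature comparison
  const expected = Buffer.from(sign(payload, key), "hex");
  const actual = Buffer.from(signature, "hex");
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) {
    return false;
  }
  // Check 2: timestamp within acceptable skew
  return Math.abs(now.getTime() - timestamp.getTime()) <= MAX_SKEW_MS;
}
```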

## Agent Revocation

When an agent is compromised or decommissioned:

1. Certificate added to CRL (Certificate Revocation List)
2. All pending tasks for agent cancelled
3. Agent removed from target assignments
4. Audit event logged
5. New agent can be registered with same name (new identity)

## Security Checklist

| Control | Implementation |
|---------|----------------|
| Identity | mTLS certificates signed by internal CA |
| Authentication | Certificate-based + short-lived JWT |
| Authorization | Task-scoped credentials |
| Encryption | TLS 1.3 for transport, envelope encryption for secrets |
| Isolation | Process sandboxing, resource limits |
| Audit | All task assignments and completions logged |
| Revocation | CRL for compromised agents |
| Secret handling | Vault integration, no persistence |

## References

- [Security Overview](overview.md)
- [Authentication & Authorization](auth.md)
- [Threat Model](threat-model.md)
239
docs/modules/release-orchestrator/security/audit-trail.md
Normal file
@@ -0,0 +1,239 @@

# Audit Trail

> Audit event structure and audited operations for compliance and forensics.

**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 8.5](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Evidence Module](../modules/evidence.md), [Security Overview](overview.md)
**Sprints:** [109_001 Evidence Collector](../../../../implplan/SPRINT_20260110_109_001_RELEVI_evidence_collector.md)

## Overview

The Release Orchestrator maintains a tamper-evident audit trail of all security-relevant operations. Audit events are cryptographically chained to detect tampering.

---

## Audit Event Structure

### TypeScript Interface

```typescript
interface AuditEvent {
  id: UUID;
  timestamp: DateTime;
  tenantId: UUID;

  // Actor
  actorType: "user" | "agent" | "system" | "plugin";
  actorId: UUID;
  actorName: string;
  actorIp?: string;

  // Action
  action: string;     // "promotion.approved", "deployment.started"
  resource: string;   // "promotion"
  resourceId: UUID;

  // Context
  environmentId?: UUID;
  releaseId?: UUID;
  promotionId?: UUID;

  // Details
  before?: object;    // State before (for updates)
  after?: object;     // State after
  metadata?: object;  // Additional context

  // Integrity
  previousEventHash: string;  // Hash chain for tamper detection
  eventHash: string;
}
```

---

## Audited Operations

| Category | Operations |
|----------|------------|
| **Authentication** | Login, logout, token refresh, failed attempts |
| **Authorization** | Permission denied events |
| **Environments** | Create, update, delete, freeze window changes |
| **Releases** | Create, deprecate, archive |
| **Promotions** | Request, approve, reject, cancel |
| **Deployments** | Start, complete, fail, rollback |
| **Targets** | Register, update, delete, health changes |
| **Agents** | Register, heartbeat gaps, capability changes |
| **Integrations** | Create, update, delete, test |
| **Plugins** | Enable, disable, config changes |
| **Evidence** | Create (never update/delete) |

---

## Hash Chain

### Chain Verification

The audit trail uses SHA-256 hash chaining for tamper detection:

```typescript
interface HashChainEntry {
  eventId: UUID;
  eventHash: string;
  previousEventHash: string;
}

function computeEventHash(event: AuditEvent): string {
  const payload = JSON.stringify({
    id: event.id,
    timestamp: event.timestamp,
    tenantId: event.tenantId,
    actorType: event.actorType,
    actorId: event.actorId,
    action: event.action,
    resource: event.resource,
    resourceId: event.resourceId,
    previousEventHash: event.previousEventHash,
  });

  return sha256(payload);
}

function verifyChain(events: AuditEvent[]): VerificationResult {
  for (let i = 1; i < events.length; i++) {
    const current = events[i];
    const previous = events[i - 1];

    if (current.previousEventHash !== previous.eventHash) {
      return {
        valid: false,
        brokenAt: i,
        reason: "Hash chain broken"
      };
    }

    const computed = computeEventHash(current);
    if (computed !== current.eventHash) {
      return {
        valid: false,
        brokenAt: i,
        reason: "Event hash mismatch"
      };
    }
  }

  return { valid: true };
}
```
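The helpers above assume a `sha256` function and the full `AuditEvent` shape. The same chaining idea can be shown self-contained with simplified events and Node's `crypto` for SHA-256 (a runnable sketch, not the production schema):

```typescript
import { createHash } from "node:crypto";

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

// Simplified event carrying only the chained fields.
interface MiniEvent {
  id: string;
  action: string;
  previousEventHash: string;
  eventHash: string;
}

// Append an event, linking it to the previous event's hash.
function appendEvent(chain: MiniEvent[], id: string, action: string): void {
  const previousEventHash =
    chain.length ? chain[chain.length - 1].eventHash : "genesis";
  const eventHash = sha256(JSON.stringify({ id, action, previousEventHash }));
  chain.push({ id, action, previousEventHash, eventHash });
}

// Recompute every link; any edit to a stored event breaks verification.
function chainValid(chain: MiniEvent[]): boolean {
  return chain.every((e, i) => {
    const prev = i === 0 ? "genesis" : chain[i - 1].eventHash;
    const recomputed = sha256(
      JSON.stringify({ id: e.id, action: e.action, previousEventHash: prev })
    );
    return e.previousEventHash === prev && e.eventHash === recomputed;
  });
}

const chain: MiniEvent[] = [];
appendEvent(chain, "evt-1", "promotion.approved");
appendEvent(chain, "evt-2", "deployment.started");
// chainValid(chain) → true; tampering with evt-1 afterwards:
chain[0].action = "promotion.rejected";
// chainValid(chain) → false
```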

---

## Example Audit Events

### Promotion Approved

```json
{
  "id": "evt-123",
  "timestamp": "2026-01-09T14:32:15Z",
  "tenantId": "tenant-uuid",
  "actorType": "user",
  "actorId": "user-uuid",
  "actorName": "jane@example.com",
  "actorIp": "192.168.1.100",
  "action": "promotion.approved",
  "resource": "promotion",
  "resourceId": "promo-uuid",
  "environmentId": "env-uuid",
  "releaseId": "rel-uuid",
  "promotionId": "promo-uuid",
  "before": {
    "status": "pending"
  },
  "after": {
    "status": "approved",
    "approvals": 2
  },
  "metadata": {
    "comment": "LGTM"
  },
  "previousEventHash": "sha256:abc...",
  "eventHash": "sha256:def..."
}
```

### Deployment Started

```json
{
  "id": "evt-124",
  "timestamp": "2026-01-09T14:32:20Z",
  "tenantId": "tenant-uuid",
  "actorType": "system",
  "actorId": "system",
  "actorName": "deployment-orchestrator",
  "action": "deployment.started",
  "resource": "deployment",
  "resourceId": "deploy-uuid",
  "environmentId": "env-uuid",
  "releaseId": "rel-uuid",
  "promotionId": "promo-uuid",
  "after": {
    "status": "deploying",
    "strategy": "rolling",
    "targetCount": 5
  },
  "previousEventHash": "sha256:def...",
  "eventHash": "sha256:ghi..."
}
```

---

## Retention Policy

| Environment | Retention Period |
|-------------|------------------|
| All tenants | 7 years (compliance) |
| After tenant deletion | 7 years (legal hold) |
| Archive format | NDJSON, signed |

---

## Export Format

Audit events can be exported for compliance reporting:

```bash
# Export audit trail for a date range
GET /api/v1/audit/export?
  start=2026-01-01T00:00:00Z&
  end=2026-01-31T23:59:59Z&
  format=ndjson
```

Response includes a signed digest for verification:

```json
{
  "export": {
    "startDate": "2026-01-01T00:00:00Z",
    "endDate": "2026-01-31T23:59:59Z",
    "eventCount": 15234,
    "firstEventHash": "sha256:abc...",
    "lastEventHash": "sha256:xyz...",
    "downloadUrl": "https://..."
  },
  "signature": "base64-signature",
  "signedAt": "2026-02-01T00:00:00Z"
}
```

---

## See Also

- [Security Overview](overview.md)
- [Evidence](../modules/evidence.md)
- [Logging](../operations/logging.md)
- [Evidence Schema](../appendices/evidence-schema.md)
305
docs/modules/release-orchestrator/security/auth.md
Normal file
@@ -0,0 +1,305 @@
|
||||
# Authentication & Authorization
|
||||
|
||||
## Authentication Methods
|
||||
|
||||
### OAuth 2.0 for Human Users
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────────────────────────────────────┐
|
||||
│ OAUTH 2.0 AUTHORIZATION CODE FLOW │
|
||||
│ │
|
||||
│ ┌──────────┐ ┌──────────────┐ │
|
||||
│ │ Browser │ │ Authority │ │
|
||||
│ └────┬─────┘ └──────┬───────┘ │
|
||||
│ │ │ │
|
||||
│ │ 1. Login request │ │
|
||||
│ │ ────────────────────────────────────► │ │
|
||||
│ │ │ │
|
||||
│ │ 2. Redirect to IdP │ │
|
||||
│ │ ◄──────────────────────────────────── │ │
|
||||
│ │ │ │
|
||||
│ │ 3. User authenticates at IdP │ │
|
||||
│ │ ─────────────────────────────────► │ │
|
||||
│ │ │ │
|
||||
│ │ 4. IdP callback with code │ │
|
||||
│ │ ◄──────────────────────────────────── │ │
|
||||
│ │ │ │
|
||||
│ │ 5. Exchange code for tokens │ │
|
||||
│ │ ────────────────────────────────────► │ │
|
||||
│ │ │ │
|
||||
│ │ 6. Access token + refresh token │ │
|
||||
│ │ ◄──────────────────────────────────── │ │
|
||||
│ │ │ │
|
||||
└──────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### mTLS for Agents
|
||||
|
||||
Agents authenticate using mutual TLS with certificates issued by Stella's internal CA.
|
||||
|
||||
**Registration Flow:**
|
||||
1. Admin generates one-time registration token
|
||||
2. Agent starts with registration token
|
||||
3. Agent submits CSR (Certificate Signing Request)
|
||||
4. Authority issues certificate signed by Stella CA
|
||||
5. Agent uses certificate for all subsequent requests
|
||||
|
||||
### API Keys for Service-to-Service
|
||||
|
||||
External services can use API keys for programmatic access:
|
||||
- Keys are tenant-scoped
|
||||
- Keys can have restricted permissions
|
||||
- Keys can have expiration dates
|
||||
- Key usage is audited
|
||||
|
||||
## JWT Token Structure
|
||||
|
||||
### Access Token Claims
|
||||
|
||||
```typescript
|
||||
interface AccessTokenClaims {
|
||||
// Standard claims
|
||||
iss: string; // "https://authority.stella.local"
|
||||
sub: string; // User ID
|
||||
aud: string[]; // ["stella-api"]
|
||||
exp: number; // Expiration timestamp
|
||||
iat: number; // Issued at timestamp
|
||||
jti: string; // Unique token ID
|
||||
|
||||
// Custom claims
|
||||
tenant_id: string;
|
||||
roles: string[];
|
||||
permissions: Permission[];
|
||||
email?: string;
|
||||
name?: string;
|
||||
}
|
||||
```
|
||||
|
||||
### Token Lifetimes
|
||||
|
||||
| Token Type | Lifetime | Refresh |
|
||||
|------------|----------|---------|
|
||||
| Access Token | 15 minutes | Via refresh token |
|
||||
| Refresh Token | 7 days | Rotated on use |
|
||||
| Agent Token | 1 hour | Via mTLS connection |
|
||||
| API Key | Configurable | Not refreshed |
|
||||
|
||||
## Authorization Model
|
||||
|
||||
### Resource Types
|
||||
|
||||
```typescript
|
||||
type ResourceType =
|
||||
| "environment"
|
||||
| "release"
|
||||
| "promotion"
|
||||
| "target"
|
||||
| "agent"
|
||||
| "workflow"
|
||||
| "plugin"
|
||||
| "integration"
|
||||
| "evidence";
|
||||
```
|
||||
|
||||
### Action Types

```typescript
type ActionType =
  | "create"
  | "read"
  | "update"
  | "delete"
  | "execute"
  | "approve"
  | "deploy"
  | "rollback";
```

### Permission Structure

```typescript
interface Permission {
  resource: ResourceType;
  action: ActionType;
  scope?: PermissionScope;
  conditions?: Condition[];
}

type PermissionScope =
  | "*"                                  // All resources
  | { environmentId: UUID }              // Specific environment
  | { labels: Record<string, string> };  // Label-based
```

### Built-in Roles

| Role | Description | Key Permissions |
|------|-------------|-----------------|
| `admin` | Full access | All permissions |
| `release_manager` | Manage releases and promotions | Create releases, request promotions |
| `deployer` | Execute deployments | Approve promotions (where allowed), view releases |
| `approver` | Approve promotions | Approve promotions (SoD respected) |
| `viewer` | Read-only access | Read all resources |
| `agent` | Agent service account | Execute deployment tasks |

### Role Definitions

```typescript
const roles = {
  admin: {
    permissions: [
      { resource: "*", action: "*" }
    ]
  },
  release_manager: {
    permissions: [
      { resource: "release", action: "create" },
      { resource: "release", action: "read" },
      { resource: "release", action: "update" },
      { resource: "promotion", action: "create" },
      { resource: "promotion", action: "read" },
      { resource: "environment", action: "read" },
      { resource: "workflow", action: "read" },
      { resource: "workflow", action: "execute" }
    ]
  },
  deployer: {
    permissions: [
      { resource: "release", action: "read" },
      { resource: "promotion", action: "read" },
      { resource: "promotion", action: "approve" },
      { resource: "environment", action: "read" },
      { resource: "target", action: "read" },
      { resource: "agent", action: "read" }
    ]
  },
  approver: {
    permissions: [
      { resource: "promotion", action: "read" },
      { resource: "promotion", action: "approve" },
      { resource: "release", action: "read" },
      { resource: "environment", action: "read" }
    ]
  },
  viewer: {
    permissions: [
      { resource: "*", action: "read" }
    ]
  }
};
```

## Environment-Scoped Permissions

Permissions can be scoped to specific environments:

```typescript
// User can approve promotions only to staging
{
  resource: "promotion",
  action: "approve",
  scope: { environmentId: "staging-env-id" }
}

// User can deploy only to targets with specific labels
{
  resource: "target",
  action: "deploy",
  scope: { labels: { "tier": "frontend" } }
}
```

## Separation of Duties (SoD)

When SoD is enabled for an environment:

- The user who requested a promotion cannot approve it
- The user who created a release cannot be the sole approver
- Approval records include SoD verification status

```typescript
interface ApprovalValidation {
  promotionId: UUID;
  approverId: UUID;
  requesterId: UUID;
  sodRequired: boolean;
  sodSatisfied: boolean;
  validationResult: "valid" | "self_approval_denied" | "sod_violation";
}
```

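The SoD rules above can be sketched as a pure check that yields the `validationResult` values from the interface. This is a minimal sketch; the `validateApproval` signature is an assumption, not the shipped implementation.

```typescript
type ValidationResult = "valid" | "self_approval_denied" | "sod_violation";

// Hypothetical SoD check: requester may not approve, and the release
// creator may not be the sole approver.
function validateApproval(
  requesterId: string,
  approverId: string,
  releaseCreatorId: string,
  isSoleApprover: boolean,
  sodRequired: boolean
): ValidationResult {
  if (!sodRequired) return "valid";
  if (approverId === requesterId) return "self_approval_denied";
  if (approverId === releaseCreatorId && isSoleApprover) return "sod_violation";
  return "valid";
}
```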
## Permission Checking Algorithm

```typescript
async function checkPermission(
  userId: UUID,
  resource: ResourceType,
  action: ActionType,
  resourceId?: UUID
): Promise<boolean> {
  // 1. Get user's roles and direct permissions
  const userRoles = await getUserRoles(userId);
  const userPermissions = await getUserPermissions(userId);

  // 2. Expand role permissions
  const rolePermissions = userRoles.flatMap(r => roles[r].permissions);
  const allPermissions = [...rolePermissions, ...userPermissions];

  // 3. Check for matching permission
  for (const perm of allPermissions) {
    if (matchesResource(perm.resource, resource) &&
        matchesAction(perm.action, action) &&
        matchesScope(perm.scope, resourceId) &&
        evaluateConditions(perm.conditions)) {
      return true;
    }
  }

  return false;
}

function matchesResource(pattern: string, resource: string): boolean {
  return pattern === "*" || pattern === resource;
}

function matchesAction(pattern: string, action: string): boolean {
  return pattern === "*" || pattern === action;
}
```

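The algorithm above leaves `matchesScope` undefined. A minimal sketch, assuming the caller can resolve the target resource's environment and labels into a context object (the `ResourceContext` shape is an assumption):

```typescript
type PermissionScope =
  | "*"
  | { environmentId: string }
  | { labels: Record<string, string> };

interface ResourceContext {
  environmentId?: string;
  labels?: Record<string, string>;
}

function matchesScope(scope: PermissionScope | undefined, ctx: ResourceContext): boolean {
  if (scope === undefined || scope === "*") return true;  // unscoped permission applies everywhere
  if ("environmentId" in scope) return scope.environmentId === ctx.environmentId;
  // Label scope: every required label must be present on the resource.
  return Object.entries(scope.labels).every(([k, v]) => ctx.labels?.[k] === v);
}
```

`evaluateConditions` would follow the same shape, returning `true` when the condition list is empty or undefined.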
## API Authorization Headers

All API requests require:

```http
Authorization: Bearer <access_token>
```

For agent requests (over mTLS):

```http
X-Agent-Id: <agent_id>
Authorization: Bearer <agent_token>
```

## Permission Denied Response

```json
{
  "success": false,
  "error": {
    "code": "PERMISSION_DENIED",
    "message": "User does not have permission to approve promotions to production",
    "details": {
      "resource": "promotion",
      "action": "approve",
      "scope": { "environmentId": "prod-env-id" },
      "requiredRoles": ["admin", "approver"],
      "userRoles": ["viewer"]
    }
  }
}
```

## References

- [Security Overview](overview.md)
- [Agent Security](agent-security.md)
- [Authority Module](../../../authority/architecture.md)

281
docs/modules/release-orchestrator/security/overview.md
Normal file
@@ -0,0 +1,281 @@

# Security Architecture Overview

## Security Principles

| Principle | Implementation |
|-----------|----------------|
| **Defense in depth** | Multiple layers: network, auth, authz, audit |
| **Least privilege** | Role-based access; minimal permissions |
| **Zero trust** | All requests authenticated; mTLS for agents |
| **Secrets hygiene** | Secrets in vault; never in DB; ephemeral injection |
| **Audit everything** | All mutations logged; evidence trail |
| **Immutable evidence** | Evidence packets append-only; cryptographically signed |

## Authentication Architecture

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                         AUTHENTICATION ARCHITECTURE                         │
│                                                                             │
│   Human Users                            Service/Agent                      │
│   ┌──────────┐                           ┌──────────┐                       │
│   │ Browser  │                           │  Agent   │                       │
│   └────┬─────┘                           └────┬─────┘                       │
│        │                                      │                             │
│        │ OAuth 2.0                            │ mTLS + JWT                  │
│        │ Authorization Code                   │                             │
│        ▼                                      ▼                             │
│   ┌──────────────────────────────────────────────────────────────────┐      │
│   │                         AUTHORITY MODULE                         │      │
│   │                                                                  │      │
│   │   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │      │
│   │   │  OAuth 2.0  │    │    mTLS     │    │   API Key   │          │      │
│   │   │  Provider   │    │  Validator  │    │  Validator  │          │      │
│   │   └─────────────┘    └─────────────┘    └─────────────┘          │      │
│   │                                                                  │      │
│   │   ┌──────────────────────────────────────────────────────────┐   │      │
│   │   │                       TOKEN ISSUER                       │   │      │
│   │   │   - Short-lived JWT (15 min)                             │   │      │
│   │   │   - Contains: user_id, tenant_id, roles, permissions     │   │      │
│   │   │   - Signed with RS256                                    │   │      │
│   │   └──────────────────────────────────────────────────────────┘   │      │
│   └──────────────────────────────────────────────────────────────────┘      │
│                                  │                                          │
│                                  ▼                                          │
│   ┌──────────────────────────────────────────────────────────────────┐      │
│   │                           API GATEWAY                            │      │
│   │                                                                  │      │
│   │   - Validate JWT signature                                       │      │
│   │   - Check token expiration                                       │      │
│   │   - Extract tenant context                                       │      │
│   │   - Enforce rate limits                                          │      │
│   └──────────────────────────────────────────────────────────────────┘      │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

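The gateway steps in the diagram (validate signature, check expiry, extract tenant) can be sketched as follows. This is an illustrative outline only: RS256 verification is passed in as a stand-in callback, and the claim names come from the access-token structure documented elsewhere in this module.

```typescript
// Illustrative gateway-side checks; verifySignature stands in for real RS256 verification.
interface GatewayContext { tenantId: string; userId: string; }

function decodePayload(jwt: string): any {
  const payload = jwt.split(".")[1];
  return JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
}

function authenticate(
  jwt: string,
  nowSeconds: number,
  verifySignature: (token: string) => boolean
): GatewayContext | null {
  if (!verifySignature(jwt)) return null;               // 1. validate JWT signature
  const claims = decodePayload(jwt);
  if (claims.exp <= nowSeconds) return null;            // 2. check token expiration
  if (!claims.aud?.includes("stella-api")) return null; //    audience check
  return { tenantId: claims.tenant_id, userId: claims.sub }; // 3. extract tenant context
}
```

Rate limiting (step 4) would sit after authentication, keyed by the extracted tenant.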
## Authorization Model

### Permission Structure

```typescript
interface Permission {
  resource: ResourceType;
  action: ActionType;
  scope?: ScopeType;
  conditions?: Condition[];
}

type ResourceType =
  | "environment"
  | "release"
  | "promotion"
  | "target"
  | "agent"
  | "workflow"
  | "plugin"
  | "integration"
  | "evidence";

type ActionType =
  | "create"
  | "read"
  | "update"
  | "delete"
  | "execute"
  | "approve"
  | "deploy"
  | "rollback";

type ScopeType =
  | "*"                                  // All resources
  | { environmentId: UUID }              // Specific environment
  | { labels: Record<string, string> };  // Label-based
```

### Role Definitions

| Role | Permissions |
|------|-------------|
| `admin` | All permissions on all resources |
| `release_manager` | Full access to releases, promotions; read environments/targets |
| `deployer` | Read releases; create/read promotions; read targets |
| `approver` | Read/approve promotions |
| `viewer` | Read-only access to all resources |

### Environment-Scoped Roles

Roles can be scoped to specific environments:

```typescript
// Example: Production deployer can only deploy to production
const prodDeployer = {
  role: "deployer",
  scope: { environmentId: "prod-environment-uuid" }
};
```

## Policy Enforcement Points

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                         POLICY ENFORCEMENT POINTS                           │
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                          API LAYER (PEP 1)                          │   │
│   │   - Authenticate request                                            │   │
│   │   - Check resource-level permissions                                │   │
│   │   - Enforce tenant isolation                                        │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                   │                                         │
│                                   ▼                                         │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                        SERVICE LAYER (PEP 2)                        │   │
│   │   - Check business-level permissions                                │   │
│   │   - Validate separation of duties                                   │   │
│   │   - Enforce approval policies                                       │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                   │                                         │
│                                   ▼                                         │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                       DECISION ENGINE (PEP 3)                       │   │
│   │   - Evaluate security gates                                         │   │
│   │   - Evaluate custom OPA policies                                    │   │
│   │   - Produce signed decision records                                 │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                   │                                         │
│                                   ▼                                         │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                         DATA LAYER (PEP 4)                          │   │
│   │   - Row-level security (tenant_id)                                  │   │
│   │   - Append-only enforcement (evidence)                              │   │
│   │   - Encryption at rest                                              │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Agent Security Model

See [Agent Security](agent-security.md) for detailed agent security architecture.

Key features:

- mTLS authentication with CA-signed certificates
- One-time registration tokens
- Short-lived JWT for task execution
- Encrypted task payloads
- Scoped credentials per task

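The "one-time registration token" property can be captured in a few lines: consuming a token removes it, so a captured token replayed after legitimate use is rejected. A hypothetical in-memory sketch (real tokens would be hashed and persisted):

```typescript
// Hypothetical one-time registration token store; names are illustrative.
const unusedRegistrationTokens = new Set<string>();

function issueRegistrationToken(token: string): void {
  unusedRegistrationTokens.add(token);
}

// delete() returns true only for the first redemption; replays return false.
function redeemRegistrationToken(token: string): boolean {
  return unusedRegistrationTokens.delete(token);
}
```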
## Secrets Management

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                      SECRETS FLOW (NEVER STORED IN DB)                      │
│                                                                             │
│   ┌──────────────┐        ┌──────────────┐        ┌──────────────┐          │
│   │    VAULT     │        │ STELLA CORE  │        │    AGENT     │          │
│   │   (Source)   │        │   (Broker)   │        │  (Consumer)  │          │
│   └──────┬───────┘        └──────┬───────┘        └──────┬───────┘          │
│          │                       │                       │                  │
│          │                       │  Task requires secret │                  │
│          │                       │                       │                  │
│          │  Fetch with service   │                       │                  │
│          │  account token        │                       │                  │
│          │◄───────────────────── │                       │                  │
│          │                       │                       │                  │
│          │  Return secret        │                       │                  │
│          │  (wrapped, short TTL) │                       │                  │
│          │─────────────────────► │                       │                  │
│          │                       │                       │                  │
│          │                       │  Embed in task payload│                  │
│          │                       │  (encrypted)          │                  │
│          │                       │─────────────────────► │                  │
│          │                       │                       │                  │
│          │                       │                       │  Decrypt         │
│          │                       │                       │  Use for task    │
│          │                       │                       │  Discard         │
│                                                                             │
│   Rules:                                                                    │
│   - Secrets NEVER stored in Stella database                                 │
│   - Only Vault references stored                                            │
│   - Secrets fetched at execution time only                                  │
│   - Secrets not logged (masked in logs)                                     │
│   - Secrets not persisted in agent memory beyond task scope                 │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

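The "masked in logs" rule implies redaction before a log line is emitted. A minimal sketch; the pattern list below is an assumption for illustration, not the shipped rule set:

```typescript
// Illustrative log masking: redact values attached to known secret-bearing keys.
const SECRET_KEYS = /("?(password|token|secret|api_key)"?\s*[:=]\s*)("[^"]*"|\S+)/gi;

function maskSecrets(line: string): string {
  return line.replace(SECRET_KEYS, '$1"***"');
}
```

In practice a deny-list like this would be paired with structured logging, so secret fields never reach the formatter in the first place.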
## Threat Model

| Threat | Attack Vector | Mitigation |
|--------|---------------|------------|
| **Credential theft** | Database breach | Secrets never in DB; only vault refs |
| **Token replay** | Stolen JWT | Short-lived tokens (15 min); refresh tokens rotated |
| **Agent impersonation** | Fake agent | mTLS with CA-signed certs; registration token one-time |
| **Digest tampering** | Modified image | Digest verification at pull time; mismatch = failure |
| **Evidence tampering** | Modified audit records | Append-only table; cryptographic signing |
| **Privilege escalation** | Compromised account | Role-based access; SoD enforcement; audit logs |
| **Supply chain attack** | Malicious plugin | Plugin sandbox; capability declarations; review process |
| **Lateral movement** | Compromised target | Short-lived task credentials; scoped permissions |
| **Data exfiltration** | Log/artifact theft | Encryption at rest; network segmentation |
| **Denial of service** | Resource exhaustion | Rate limiting; resource quotas; circuit breakers |

## Audit Trail

### Audit Event Structure

```typescript
interface AuditEvent {
  id: UUID;
  timestamp: DateTime;
  tenantId: UUID;

  // Actor
  actorType: "user" | "agent" | "system" | "plugin";
  actorId: UUID;
  actorName: string;
  actorIp?: string;

  // Action
  action: string;        // "promotion.approved", "deployment.started"
  resource: string;      // "promotion"
  resourceId: UUID;

  // Context
  environmentId?: UUID;
  releaseId?: UUID;
  promotionId?: UUID;

  // Details
  before?: object;       // State before (for updates)
  after?: object;        // State after
  metadata?: object;     // Additional context

  // Integrity
  previousEventHash: string;  // Hash chain for tamper detection
  eventHash: string;
}
```

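The `previousEventHash`/`eventHash` pair forms a hash chain: each event's hash covers its payload plus the previous hash, so editing any earlier event breaks every later link. A simplified sketch (real events would serialize all fields canonically, not a single payload string):

```typescript
import { createHash } from "node:crypto";

interface ChainedEvent { payload: string; previousEventHash: string; eventHash: string; }

function appendEvent(chain: ChainedEvent[], payload: string): ChainedEvent {
  const previousEventHash =
    chain.length > 0 ? chain[chain.length - 1].eventHash : "0".repeat(64);
  const eventHash = createHash("sha256").update(previousEventHash + payload).digest("hex");
  const event = { payload, previousEventHash, eventHash };
  chain.push(event);
  return event;
}

// Recompute every link; any in-place modification is detected.
function verifyChain(chain: ChainedEvent[]): boolean {
  let prev = "0".repeat(64);
  for (const e of chain) {
    if (e.previousEventHash !== prev) return false;
    const expected = createHash("sha256").update(e.previousEventHash + e.payload).digest("hex");
    if (e.eventHash !== expected) return false;
    prev = e.eventHash;
  }
  return true;
}
```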
### Audited Operations

| Category | Operations |
|----------|------------|
| **Authentication** | Login, logout, token refresh, failed attempts |
| **Authorization** | Permission denied events |
| **Environments** | Create, update, delete, freeze window changes |
| **Releases** | Create, deprecate, archive |
| **Promotions** | Request, approve, reject, cancel |
| **Deployments** | Start, complete, fail, rollback |
| **Targets** | Register, update, delete, health changes |
| **Agents** | Register, heartbeat gaps, capability changes |
| **Integrations** | Create, update, delete, test |
| **Plugins** | Enable, disable, config changes |
| **Evidence** | Create (never update/delete) |

## References

- [Authentication & Authorization](auth.md)
- [Agent Security](agent-security.md)
- [Threat Model](threat-model.md)
- [Audit Trail](audit-trail.md)

207
docs/modules/release-orchestrator/security/threat-model.md
Normal file
@@ -0,0 +1,207 @@

# Threat Model

## Overview

This document identifies threats to the Release Orchestrator and their mitigations.

## Threat Categories

### T1: Credential Theft

| Aspect | Description |
|--------|-------------|
| **Threat** | Attacker gains access to credentials through database breach |
| **Attack Vector** | SQL injection, database backup theft, insider threat |
| **Assets at Risk** | Registry credentials, vault tokens, SSH keys |
| **Mitigation** | Secrets NEVER stored in database; only vault references stored |
| **Detection** | Anomalous vault access patterns, failed authentication attempts |

### T2: Token Replay

| Aspect | Description |
|--------|-------------|
| **Threat** | Attacker captures and reuses valid JWT tokens |
| **Attack Vector** | Man-in-the-middle, log file exposure, memory dump |
| **Assets at Risk** | User sessions, API access |
| **Mitigation** | Short-lived tokens (15 min), refresh token rotation, TLS everywhere |
| **Detection** | Token used from unusual IP, concurrent sessions |

### T3: Agent Impersonation

| Aspect | Description |
|--------|-------------|
| **Threat** | Attacker registers fake agent to receive deployment tasks |
| **Attack Vector** | Stolen registration token, certificate forgery |
| **Assets at Risk** | Deployment credentials, target access |
| **Mitigation** | One-time registration tokens, mTLS with CA-signed certs |
| **Detection** | Registration from unexpected network, capability mismatch |

### T4: Digest Tampering

| Aspect | Description |
|--------|-------------|
| **Threat** | Attacker modifies container image after release creation |
| **Attack Vector** | Registry compromise, man-in-the-middle at pull time |
| **Assets at Risk** | Application integrity, supply chain |
| **Mitigation** | Digest verification at pull time; mismatch = deployment failure |
| **Detection** | Pull failures due to digest mismatch |

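The T4 mitigation, digest verification at pull time, reduces to recomputing the content digest of the pulled bytes and comparing it with the digest pinned in the release record. A minimal sketch (the function name is illustrative):

```typescript
import { createHash } from "node:crypto";

// Verify pulled image bytes against the digest pinned in the release record.
// A mismatch means the image changed after release creation: fail the deployment.
function verifyDigest(imageBytes: Buffer, pinnedDigest: string): boolean {
  const actual = "sha256:" + createHash("sha256").update(imageBytes).digest("hex");
  return actual === pinnedDigest;
}
```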
### T5: Evidence Tampering

| Aspect | Description |
|--------|-------------|
| **Threat** | Attacker modifies audit records to hide malicious activity |
| **Attack Vector** | Database admin access, SQL injection |
| **Assets at Risk** | Audit integrity, compliance |
| **Mitigation** | Append-only table, cryptographic signing, no UPDATE/DELETE |
| **Detection** | Signature verification failure, hash chain break |

### T6: Privilege Escalation

| Aspect | Description |
|--------|-------------|
| **Threat** | User gains permissions beyond their role |
| **Attack Vector** | Role assignment exploit, permission bypass |
| **Assets at Risk** | Environment access, approval authority |
| **Mitigation** | Role-based access, SoD enforcement, audit logs |
| **Detection** | Unusual permission patterns, SoD violation attempts |

### T7: Supply Chain Attack

| Aspect | Description |
|--------|-------------|
| **Threat** | Malicious plugin injected into workflow |
| **Attack Vector** | Plugin repository compromise, typosquatting |
| **Assets at Risk** | All environments, all credentials |
| **Mitigation** | Plugin sandbox, capability declarations, signed manifests |
| **Detection** | Unexpected network egress, resource anomalies |

### T8: Lateral Movement

| Aspect | Description |
|--------|-------------|
| **Threat** | Attacker uses compromised target to access others |
| **Attack Vector** | Target compromise, credential reuse |
| **Assets at Risk** | Other targets, environments |
| **Mitigation** | Short-lived task credentials, scoped permissions |
| **Detection** | Cross-target credential use, unexpected connections |

### T9: Data Exfiltration

| Aspect | Description |
|--------|-------------|
| **Threat** | Attacker extracts logs, artifacts, or configuration |
| **Attack Vector** | API abuse, log aggregator compromise |
| **Assets at Risk** | Application data, deployment configurations |
| **Mitigation** | Encryption at rest, network segmentation, audit logging |
| **Detection** | Large data transfers, unusual API patterns |

### T10: Denial of Service

| Aspect | Description |
|--------|-------------|
| **Threat** | Attacker exhausts resources to prevent deployments |
| **Attack Vector** | API flooding, workflow loop, agent task spam |
| **Assets at Risk** | Service availability |
| **Mitigation** | Rate limiting, resource quotas, circuit breakers |
| **Detection** | Resource exhaustion alerts, traffic spikes |

## STRIDE Analysis

| Category | Threats | Primary Mitigations |
|----------|---------|---------------------|
| **Spoofing** | T3 Agent Impersonation | mTLS, registration tokens |
| **Tampering** | T4 Digest, T5 Evidence | Digest verification, append-only tables |
| **Repudiation** | Evidence manipulation | Signed evidence packets |
| **Information Disclosure** | T1 Credentials, T9 Exfiltration | Vault integration, encryption |
| **Denial of Service** | T10 Resource exhaustion | Rate limits, quotas |
| **Elevation of Privilege** | T6 Escalation | RBAC, SoD enforcement |

## Trust Boundaries

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                              TRUST BOUNDARIES                               │
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                      PUBLIC NETWORK (Untrusted)                     │   │
│   │                                                                     │   │
│   │   Internet, External Users, External Services                       │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                   │                                         │
│                                   │ TLS + Authentication                    │
│                                   ▼                                         │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                          DMZ (Semi-trusted)                         │   │
│   │                                                                     │   │
│   │   API Gateway, Webhook Gateway                                      │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                   │                                         │
│                                   │ Internal mTLS                           │
│                                   ▼                                         │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                      INTERNAL NETWORK (Trusted)                     │   │
│   │                                                                     │   │
│   │   Stella Core Services, Database, Internal Vault                    │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                   │                                         │
│                                   │ Agent mTLS                              │
│                                   ▼                                         │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                    DEPLOYMENT NETWORK (Controlled)                  │   │
│   │                                                                     │   │
│   │   Agents, Targets                                                   │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Data Classification

| Classification | Examples | Protection Requirements |
|---------------|----------|------------------------|
| **Critical** | Vault credentials, signing keys | Hardware security, minimal access |
| **Sensitive** | User tokens, agent certificates | Encryption, access logging |
| **Internal** | Release configs, workflow definitions | Encryption at rest |
| **Public** | API documentation, release names | Integrity protection |

## Security Controls Summary

| Control | Implementation | Threats Addressed |
|---------|----------------|-------------------|
| mTLS | Agent communication | T3 |
| Short-lived tokens | 15-min access tokens | T2 |
| Vault integration | No secrets in DB | T1 |
| Digest verification | Pull-time validation | T4 |
| Append-only tables | Evidence immutability | T5 |
| RBAC + SoD | Permission enforcement | T6 |
| Plugin sandbox | Resource limits, capability control | T7 |
| Scoped credentials | Task-specific access | T8 |
| Encryption | At rest and in transit | T9 |
| Rate limiting | API and resource quotas | T10 |

## Incident Response

### Detection Signals

| Signal | Indicates | Response |
|--------|-----------|----------|
| Digest mismatch at pull | T4 Tampering | Halt deployment, investigate registry |
| Evidence signature failure | T5 Tampering | Preserve logs, forensic analysis |
| Unusual agent registration | T3 Impersonation | Revoke agent, review access |
| SoD violation attempt | T6 Escalation | Block action, alert admin |
| Plugin network egress | T7 Supply chain | Isolate plugin, review manifest |

### Response Procedures

1. **Contain** - Isolate affected component (revoke token, disable agent)
2. **Investigate** - Collect logs, evidence packets, audit trail
3. **Remediate** - Patch vulnerability, rotate credentials
4. **Recover** - Restore service, verify integrity
5. **Report** - Document incident, update threat model

## References

- [Security Overview](overview.md)
- [Agent Security](agent-security.md)
- [Audit Trail](audit-trail.md)

508
docs/modules/release-orchestrator/test-structure.md
Normal file
@@ -0,0 +1,508 @@

# Test Structure & Guidelines

> Test organization, categorization, and patterns for Release Orchestrator modules.

---

## Test Directory Layout

Release Orchestrator tests follow the Stella Ops standard test structure:

```
src/ReleaseOrchestrator/
├── __Libraries/
│   ├── StellaOps.ReleaseOrchestrator.Core/
│   ├── StellaOps.ReleaseOrchestrator.Workflow/
│   ├── StellaOps.ReleaseOrchestrator.Promotion/
│   └── StellaOps.ReleaseOrchestrator.Deploy/
├── __Tests/
│   ├── StellaOps.ReleaseOrchestrator.Core.Tests/          # Unit tests for Core
│   ├── StellaOps.ReleaseOrchestrator.Workflow.Tests/      # Unit tests for Workflow
│   ├── StellaOps.ReleaseOrchestrator.Promotion.Tests/     # Unit tests for Promotion
│   ├── StellaOps.ReleaseOrchestrator.Deploy.Tests/        # Unit tests for Deploy
│   ├── StellaOps.ReleaseOrchestrator.Integration.Tests/   # Integration tests
│   └── StellaOps.ReleaseOrchestrator.Acceptance.Tests/    # End-to-end tests
└── StellaOps.ReleaseOrchestrator.WebService/
```

**Shared test infrastructure**:

```
src/__Tests/__Libraries/
├── StellaOps.Infrastructure.Postgres.Testing/   # PostgreSQL Testcontainers fixtures
└── StellaOps.Testing.Common/                    # Common test utilities
```

---

## Test Categories

Tests **MUST** be categorized using xUnit traits to enable selective execution:

### Unit Tests

```csharp
[Trait("Category", "Unit")]
public class PromotionValidatorTests
{
    [Fact]
    public void Validate_MissingReleaseId_ReturnsFalse()
    {
        // Arrange
        var validator = new PromotionValidator();
        var promotion = new Promotion { ReleaseId = Guid.Empty };

        // Act
        var result = validator.Validate(promotion);

        // Assert
        Assert.False(result.IsValid);
        Assert.Contains("ReleaseId is required", result.Errors);
    }
}
```

**Characteristics**:

- No database, network, or file system access
- Fast execution (< 100 ms per test)
- Isolated from external dependencies
- Deterministic and repeatable

### Integration Tests

```csharp
[Trait("Category", "Integration")]
public class PromotionRepositoryTests : IClassFixture<PostgresFixture>
{
    private readonly PostgresFixture _fixture;

    public PromotionRepositoryTests(PostgresFixture fixture)
    {
        _fixture = fixture;
    }

    [Fact]
    public async Task SaveAsync_ValidPromotion_PersistsToDatabase()
    {
        // Arrange
        await using var connection = _fixture.CreateConnection();
        var repository = new PromotionRepository(connection, _fixture.TimeProvider);

        var promotion = new Promotion
        {
            Id = Guid.NewGuid(),
            TenantId = _fixture.DefaultTenantId,
            ReleaseId = Guid.NewGuid(),
            TargetEnvironmentId = Guid.NewGuid(),
            Status = PromotionState.PendingApproval,
            RequestedAt = _fixture.TimeProvider.GetUtcNow(),
            RequestedBy = Guid.NewGuid()
        };

        // Act
        await repository.SaveAsync(promotion, CancellationToken.None);

        // Assert
        var retrieved = await repository.GetByIdAsync(promotion.Id, CancellationToken.None);
        Assert.NotNull(retrieved);
        Assert.Equal(promotion.ReleaseId, retrieved.ReleaseId);
    }
}
```

**Characteristics**:

- Uses Testcontainers for PostgreSQL
- Requires Docker to be running
- Slower execution (hundreds of ms per test)
- Tests data access layer and database constraints

### Acceptance Tests

```csharp
[Trait("Category", "Acceptance")]
public class PromotionWorkflowTests : IClassFixture<WebApplicationFactory<Program>>
{
    private readonly WebApplicationFactory<Program> _factory;
    private readonly HttpClient _client;

    public PromotionWorkflowTests(WebApplicationFactory<Program> factory)
    {
        _factory = factory;
        _client = factory.CreateClient();
    }

    [Fact]
    public async Task PromotionWorkflow_EndToEnd_SuccessfullyDeploysRelease()
    {
        // Arrange: Create environment, release, and promotion
        var envId = await CreateEnvironmentAsync("Production");
        var releaseId = await CreateReleaseAsync("v2.3.1");

        // Act: Request promotion
        var promotionResponse = await _client.PostAsJsonAsync(
            "/api/v1/promotions",
            new { releaseId, targetEnvironmentId = envId });

        promotionResponse.EnsureSuccessStatusCode();
        var promotion = await promotionResponse.Content.ReadFromJsonAsync<PromotionDto>();

        // Act: Approve promotion
        var approveResponse = await _client.PostAsync(
            $"/api/v1/promotions/{promotion.Id}/approve", null);

        approveResponse.EnsureSuccessStatusCode();

        // Assert: Verify deployment completed
        var status = await GetPromotionStatusAsync(promotion.Id);
        Assert.Equal("deployed", status.Status);
    }
}
```

**Characteristics**:

- Tests full API surface and workflows
- Uses `WebApplicationFactory` for in-memory hosting
- Tests end-to-end scenarios
- May involve multiple services

---

## PostgreSQL Test Fixtures

### Testcontainers Fixture

```csharp
public class PostgresFixture : IAsyncLifetime
{
    private PostgreSqlContainer? _container;
    private NpgsqlConnection? _connection;
    public TimeProvider TimeProvider { get; private set; } = null!;
    public IGuidGenerator GuidGenerator { get; private set; } = null!;
    public Guid DefaultTenantId { get; private set; }

    public async Task InitializeAsync()
    {
        // Start PostgreSQL container
        _container = new PostgreSqlBuilder()
            .WithImage("postgres:16")
            .WithDatabase("stellaops_test")
            .WithUsername("postgres")
            .WithPassword("postgres")
            .Build();

        await _container.StartAsync();

        // Create connection
        _connection = new NpgsqlConnection(_container.GetConnectionString());
        await _connection.OpenAsync();

        // Run migrations
        await ApplyMigrationsAsync();

        // Setup test infrastructure
        TimeProvider = new ManualTimeProvider();
        GuidGenerator = new SequentialGuidGenerator();
        DefaultTenantId = Guid.Parse("00000000-0000-0000-0000-000000000001");

        // Seed test data
        await SeedTestDataAsync();
    }

    public NpgsqlConnection CreateConnection()
    {
        if (_container == null)
            throw new InvalidOperationException("Container not initialized");

        return new NpgsqlConnection(_container.GetConnectionString());
    }

    private async Task ApplyMigrationsAsync()
    {
        // Apply schema migrations
        await ExecuteSqlFileAsync("schema/release-orchestrator-schema.sql");
    }

    private async Task SeedTestDataAsync()
    {
        // Create default tenant
        await using var cmd = _connection!.CreateCommand();
        cmd.CommandText = @"
            INSERT INTO tenants (id, name, created_at)
            VALUES (@id, @name, @created_at)
            ON CONFLICT DO NOTHING";
        cmd.Parameters.AddWithValue("id", DefaultTenantId);
        cmd.Parameters.AddWithValue("name", "Test Tenant");
        cmd.Parameters.AddWithValue("created_at", TimeProvider.GetUtcNow());
        await cmd.ExecuteNonQueryAsync();
    }

    public async Task DisposeAsync()
    {
        if (_connection != null)
        {
            await _connection.DisposeAsync();
        }

        if (_container != null)
        {
            await _container.DisposeAsync();
        }
    }
}
```

---

## Test Patterns

### Deterministic Time in Tests

```csharp
public class PromotionTimingTests
{
    [Fact]
    public void CreatePromotion_SetsCorrectTimestamp()
    {
        // Arrange
        var manualTime = new ManualTimeProvider();
        manualTime.SetUtcNow(new DateTimeOffset(2026, 1, 10, 14, 30, 0, TimeSpan.Zero));

        var guidGen = new SequentialGuidGenerator();
        var manager = new PromotionManager(manualTime, guidGen);

        // Act
        var promotion = manager.CreatePromotion(
            releaseId: Guid.Parse("00000000-0000-0000-0000-000000000001"),
            targetEnvId: Guid.Parse("00000000-0000-0000-0000-000000000002"));

        // Assert
        Assert.Equal(
            new DateTimeOffset(2026, 1, 10, 14, 30, 0, TimeSpan.Zero),
            promotion.RequestedAt);
    }
}
```

### Testing CancellationToken Propagation

```csharp
public class PromotionCancellationTests
{
    [Fact]
    public async Task ApprovePromotionAsync_CancellationRequested_ThrowsOperationCanceledException()
    {
        // Arrange
        var cts = new CancellationTokenSource();
        var repository = new Mock<IPromotionRepository>();

        repository
            .Setup(r => r.GetByIdAsync(It.IsAny<Guid>(), It.IsAny<CancellationToken>()))
            .Returns(async (Guid id, CancellationToken ct) =>
            {
                await Task.Delay(100, ct); // Simulate delay
                return new Promotion { Id = id };
            });

        var manager = new PromotionManager(repository.Object, TimeProvider.System, new SystemGuidGenerator());

        // Act & Assert
        cts.Cancel(); // Cancel before the operation completes

        // ThrowsAnyAsync matches derived types: Task.Delay throws
        // TaskCanceledException, which derives from OperationCanceledException.
        await Assert.ThrowsAnyAsync<OperationCanceledException>(async () =>
            await manager.ApprovePromotionAsync(Guid.NewGuid(), Guid.NewGuid(), cts.Token));
    }
}
```

### Testing Immutability

```csharp
public class ReleaseImmutabilityTests
{
    [Fact]
    public void GetComponents_ReturnsImmutableCollection()
    {
        // Arrange
        var release = new Release
        {
            Components = new Dictionary<string, ComponentDigest>
            {
                ["api"] = new ComponentDigest("registry.io/api", "sha256:abc123", "v1.0.0")
            }.ToImmutableDictionary()
        };

        // Act
        var components = release.Components;

        // Assert: Attempting to modify throws
        Assert.Throws<NotSupportedException>(() =>
        {
            var mutable = (IDictionary<string, ComponentDigest>)components;
            mutable["web"] = new ComponentDigest("registry.io/web", "sha256:def456", "v1.0.0");
        });
    }
}
```

### Testing Evidence Hash Determinism

```csharp
public class EvidenceHashDeterminismTests
{
    [Fact]
    public void ComputeEvidenceHash_SameInputs_ProducesSameHash()
    {
        // Arrange
        var decisionRecord = new DecisionRecord
        {
            PromotionId = Guid.Parse("00000000-0000-0000-0000-000000000001"),
            DecidedAt = new DateTimeOffset(2026, 1, 10, 12, 0, 0, TimeSpan.Zero),
            Outcome = "approved",
            GateResults = ImmutableArray.Create(
                new GateResult("security", "pass", null))
        };

        // Act: Compute hash multiple times
        var hash1 = EvidenceHasher.ComputeHash(decisionRecord);
        var hash2 = EvidenceHasher.ComputeHash(decisionRecord);

        // Assert: Hashes are identical
        Assert.Equal(hash1, hash2);
    }
}
```

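The determinism property above treats `EvidenceHasher` as a black box; it typically holds because the hash is computed over a *canonical* encoding of the record, so key order cannot leak into the digest. A minimal illustrative sketch of that idea (the `canonicalize` and `computeEvidenceHash` helpers here are hypothetical and are not the project's C# `EvidenceHasher`):

```typescript
import { createHash } from "node:crypto";

// Recursively serialize with object keys sorted, so the same logical record
// always produces the same bytes regardless of insertion order.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : 1))
      .map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// SHA-256 over the canonical encoding: repeated calls on equal records agree.
function computeEvidenceHash(record: unknown): string {
  return createHash("sha256").update(canonicalize(record), "utf8").digest("hex");
}
```

The same design choice applies in any language: hash the canonical bytes, never an order-dependent default serialization.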
---
## Running Tests

### Run All Tests

```bash
dotnet test src/StellaOps.sln
```

### Run Only Unit Tests

```bash
dotnet test src/StellaOps.sln --filter "Category=Unit"
```

### Run Only Integration Tests

```bash
dotnet test src/StellaOps.sln --filter "Category=Integration"
```

### Run a Specific Test Class

```bash
dotnet test --filter "FullyQualifiedName~PromotionValidatorTests"
```

### Run with Coverage

```bash
dotnet test src/StellaOps.sln --collect:"XPlat Code Coverage"
```

---

## Test Data Builders

Use the builder pattern for complex test data:

```csharp
public class PromotionBuilder
{
    private Guid _id = Guid.NewGuid();
    private Guid _tenantId = Guid.NewGuid();
    private Guid _releaseId = Guid.NewGuid();
    private Guid _targetEnvId = Guid.NewGuid();
    private PromotionState _status = PromotionState.PendingApproval;
    private DateTimeOffset _requestedAt = DateTimeOffset.UtcNow;

    public PromotionBuilder WithId(Guid id)
    {
        _id = id;
        return this;
    }

    public PromotionBuilder WithStatus(PromotionState status)
    {
        _status = status;
        return this;
    }

    public PromotionBuilder WithReleaseId(Guid releaseId)
    {
        _releaseId = releaseId;
        return this;
    }

    public Promotion Build()
    {
        return new Promotion
        {
            Id = _id,
            TenantId = _tenantId,
            ReleaseId = _releaseId,
            TargetEnvironmentId = _targetEnvId,
            Status = _status,
            RequestedAt = _requestedAt,
            RequestedBy = Guid.NewGuid()
        };
    }
}

// Usage in tests
[Fact]
public void ApprovePromotion_PendingStatus_TransitionsToApproved()
{
    var promotion = new PromotionBuilder()
        .WithStatus(PromotionState.PendingApproval)
        .Build();

    // ... test logic
}
```

---

## Code Coverage Requirements

- **Unit tests**: Aim for 80%+ coverage of business logic
- **Integration tests**: Cover all data access paths and constraints
- **Acceptance tests**: Cover critical user journeys

**Exclusions from coverage**:

- `Program.cs` / `Startup.cs` configuration code
- DTOs and simple data classes
- Generated code

---

## Summary Checklist

Before merging:

- [ ] All tests categorized with `[Trait("Category", "...")]`
- [ ] Unit tests use `TimeProvider` and `IGuidGenerator` for determinism
- [ ] Integration tests use `PostgresFixture` with Testcontainers
- [ ] `CancellationToken` propagation tested where applicable
- [ ] Evidence hash determinism verified
- [ ] No test reimplements production logic
- [ ] All tests pass locally and in CI
- [ ] Code coverage meets requirements

---

## References

- [Implementation Guide](./implementation-guide.md) — .NET implementation patterns
- [CLAUDE.md](../../../CLAUDE.md) — Stella Ops coding rules
- [PostgreSQL Testing Guide](../../infrastructure/Postgres.Testing/README.md) — Testcontainers setup
- [src/__Tests/AGENTS.md](../../../src/__Tests/AGENTS.md) — Global test infrastructure
207
docs/modules/release-orchestrator/ui/dashboard.md
Normal file
@@ -0,0 +1,207 @@
# Dashboard Specification

> Main dashboard layout and metrics specification for the Release Orchestrator UI.

**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 12.1](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [WebSocket APIs](../api/websockets.md), [Metrics](../operations/metrics.md)
**Sprint:** [111_001 Dashboard Overview](../../../../implplan/SPRINT_20260110_111_001_FE_dashboard_overview.md)

## Overview

The dashboard provides a real-time overview of security posture, release operations, estate health, and compliance status.

---

## Dashboard Layout

```
+-----------------------------------------------------------------------------+
| STELLA OPS SUITE                                                            |
| +-----+                                                      [User Menu v]  |
| |Logo |  Dashboard  Releases  Environments  Workflows  Integrations         |
+-----------------------------------------------------------------------------+
|                                                                             |
| +-------------------------------+  +-----------------------------------+   |
| | SECURITY POSTURE              |  | RELEASE OPERATIONS                |   |
| |                               |  |                                   |   |
| | +---------+  +---------+      |  | +---------+  +---------+          |   |
| | |Critical |  | High    |      |  | |In Flight|  |Completed|          |   |
| | |   0 *   |  |   3 *   |      |  | |    2    |  |   47    |          |   |
| | |reachable|  |reachable|      |  | |deploys  |  | today   |          |   |
| | +---------+  +---------+      |  | +---------+  +---------+          |   |
| |                               |  |                                   |   |
| | Blocked: 2 releases           |  | Pending Approval: 3               |   |
| | Risk Drift: 1 env             |  | Failed (24h): 1                   |   |
| |                               |  |                                   |   |
| +-------------------------------+  +-----------------------------------+   |
|                                                                             |
| +-------------------------------+  +-----------------------------------+   |
| | ESTATE HEALTH                 |  | COMPLIANCE/AUDIT                  |   |
| |                               |  |                                   |   |
| | Agents:  12 online, 1 offline |  | Evidence Complete: 98%            |   |
| | Targets: 45/47 healthy        |  | Policy Changes: 2 (this week)     |   |
| | Drift Detected: 2 targets     |  | Audit Exports: 5 (this month)     |   |
| |                               |  |                                   |   |
| +-------------------------------+  +-----------------------------------+   |
|                                                                             |
| +-----------------------------------------------------------------------+  |
| | RECENT ACTIVITY                                                       |  |
| |                                                                       |  |
| | * 14:32  myapp-v2.3.1 deployed to prod (jane@example.com)             |  |
| | o 14:28  myapp-v2.3.1 promoted to stage (auto)                        |  |
| | * 14:15  api-v1.2.0 blocked: critical vuln CVE-2024-1234              |  |
| | o 13:45  worker-v3.0.0 release created (john@example.com)             |  |
| | * 13:30  Target prod-web-03 health: degraded                          |  |
| |                                                                       |  |
| +-----------------------------------------------------------------------+  |
|                                                                             |
+-----------------------------------------------------------------------------+
```

---

## Dashboard Metrics

### TypeScript Interfaces

```typescript
interface DashboardMetrics {
  // Security posture
  security: {
    criticalReachable: number;
    highReachable: number;
    blockedReleases: number;
    riskDriftEnvironments: number;
    digestsAnalyzedToday: number;
    digestQuota: number;
  };

  // Release operations
  operations: {
    deploymentsInFlight: number;
    deploymentsCompletedToday: number;
    deploymentsFailed24h: number;
    pendingApprovals: number;
    averageDeployTime: number; // seconds
  };

  // Estate health
  estate: {
    agentsOnline: number;
    agentsOffline: number;
    agentsDegraded: number;
    targetsHealthy: number;
    targetsUnhealthy: number;
    targetsDrift: number;
  };

  // Compliance/audit
  compliance: {
    evidenceCompleteness: number; // percentage
    policyChangesThisWeek: number;
    auditExportsThisMonth: number;
    lastExportDate: string; // ISO 8601 timestamp
  };
}
```

---

## Dashboard Panels

### 1. Security Posture Panel

Displays current security state across all releases:

| Metric | Description |
|--------|-------------|
| Critical Reachable | Critical vulnerabilities with confirmed reachability |
| High Reachable | High-severity vulnerabilities with confirmed reachability |
| Blocked Releases | Releases blocked by security gates |
| Risk Drift | Environments with changed risk since deployment |

### 2. Release Operations Panel

Shows active deployment operations:

| Metric | Description |
|--------|-------------|
| In Flight | Deployments currently in progress |
| Completed Today | Successful deployments in the last 24h |
| Pending Approval | Promotions awaiting approval |
| Failed (24h) | Failed deployments in the last 24h |

### 3. Estate Health Panel

Displays agent and target health:

| Metric | Description |
|--------|-------------|
| Agents Online | Number of agents reporting healthy |
| Agents Offline | Agents that missed heartbeats |
| Targets Healthy | Targets passing health checks |
| Drift Detected | Targets with version drift |

### 4. Compliance/Audit Panel

Shows audit and compliance status:

| Metric | Description |
|--------|-------------|
| Evidence Complete | % of deployments with full evidence |
| Policy Changes | Policy modifications this week |
| Audit Exports | Evidence exports this month |

---

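The Evidence Complete metric is a simple ratio over deployments. A minimal sketch (the helper name is hypothetical; it assumes the value is rounded to a whole percentage, as shown in the dashboard mock):

```typescript
// Percentage of deployments that carry a full evidence bundle.
function evidenceCompleteness(totalDeployments: number, withFullEvidence: number): number {
  if (totalDeployments === 0) {
    return 100; // vacuously complete when nothing has been deployed yet
  }
  return Math.round((withFullEvidence / totalDeployments) * 100);
}
```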
## Real-Time Updates

### WebSocket Integration

```typescript
interface DashboardStreamMessage {
  type: "metric_update" | "activity" | "alert";
  timestamp: string; // ISO 8601 timestamp
  payload: MetricUpdate | ActivityEvent | Alert;
}

// Subscribe to the dashboard stream
const ws = new WebSocket("/api/v1/dashboard/stream");

ws.onmessage = (event) => {
  const message: DashboardStreamMessage = JSON.parse(event.data);

  switch (message.type) {
    case "metric_update":
      updateMetrics(message.payload);
      break;
    case "activity":
      addActivityItem(message.payload);
      break;
    case "alert":
      showAlert(message.payload);
      break;
  }
};
```

---

## Performance Targets

| Metric | Target |
|--------|--------|
| Initial Load | < 2 seconds |
| Metric Refresh | Every 30 seconds |
| WebSocket Reconnect | Exponential backoff (1s, 2s, 4s, ... 30s max) |
| Activity History | Last 50 events |

---

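The reconnect target in the table can be sketched as a small client-side helper. Only the delay schedule (1s doubling to a 30s cap) comes from the table above; the function names are hypothetical:

```typescript
// Exponential backoff schedule for WebSocket reconnects: 1s, 2s, 4s, ... capped at 30s.
function reconnectDelayMs(attempt: number): number {
  const baseMs = 1_000;
  const capMs = 30_000;
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Reconnect loop: reset the attempt counter once a connection succeeds.
function connectWithBackoff(url: string, attempt = 0): WebSocket {
  const ws = new WebSocket(url);
  ws.onopen = () => {
    attempt = 0; // healthy again; next failure starts back at 1s
  };
  ws.onclose = () => {
    setTimeout(() => connectWithBackoff(url, attempt + 1), reconnectDelayMs(attempt));
  };
  return ws;
}
```

Adding jitter to each delay is a common refinement to avoid thundering-herd reconnects after a server restart.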
## See Also

- [WebSocket APIs](../api/websockets.md)
- [Metrics](../operations/metrics.md)
- [Workflow Editor](workflow-editor.md)
- [Key Screens](screens.md)
332
docs/modules/release-orchestrator/ui/overview.md
Normal file
@@ -0,0 +1,332 @@
# UI Overview

## Status

**Planned** - UI implementation has not started.

## Design Principles

| Principle | Implementation |
|-----------|----------------|
| **Clarity** | Clear status indicators, intuitive navigation |
| **Real-time** | Live updates via WebSocket for deployments |
| **Actionable** | One-click approvals, quick actions |
| **Audit-friendly** | Full history visibility, evidence access |
| **Mobile-aware** | Responsive design for on-call scenarios |

## Main Screens

### Dashboard

The main dashboard provides an at-a-glance view of deployment health across environments.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ RELEASE ORCHESTRATOR                                    [User] [Settings]   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ ENVIRONMENT PIPELINE                                                │   │
│  │                                                                     │   │
│  │  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐       │   │
│  │  │   DEV    │───►│ STAGING  │───►│   UAT    │───►│   PROD   │       │   │
│  │  │  v1.5.0  │    │  v1.4.2  │    │  v1.4.1  │    │  v1.4.0  │       │   │
│  │  │  3/3 OK  │    │  2/2 OK  │    │  2/2 OK  │    │  5/5 OK  │       │   │
│  │  └──────────┘    └──────────┘    └──────────┘    └──────────┘       │   │
│  │                                                                     │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│  ┌──────────────────────────────┐  ┌──────────────────────────────┐        │
│  │ PENDING APPROVALS (3)        │  │ RECENT DEPLOYMENTS           │        │
│  │                              │  │                              │        │
│  │ ● myapp → prod   [Approve]   │  │ ✓ api v1.5.0 → dev      2m   │        │
│  │   Requested by: John         │  │ ✓ web v1.4.2 → staging 15m   │        │
│  │   2 hours ago                │  │ ✗ api v1.4.1 → uat      1h   │        │
│  │                              │  │ ✓ web v1.4.0 → prod     2h   │        │
│  │ ● web → uat      [Approve]   │  │                              │        │
│  │   Requested by: Jane         │  │ [View All]                   │        │
│  │   30 minutes ago             │  │                              │        │
│  │                              │  │                              │        │
│  └──────────────────────────────┘  └──────────────────────────────┘        │
│                                                                             │
│  ┌──────────────────────────────┐  ┌──────────────────────────────┐        │
│  │ AGENT STATUS                 │  │ ACTIVE WORKFLOWS             │        │
│  │                              │  │                              │        │
│  │ ● 12 Online                  │  │ ● Deploy api v1.5.0          │        │
│  │ ○  1 Offline                 │  │   Step: Health Check (3/5)   │        │
│  │ ◐  2 Degraded                │  │                              │        │
│  │                              │  │ ● Promote web to UAT         │        │
│  │ [View Details]               │  │   Step: Awaiting Approval    │        │
│  │                              │  │                              │        │
│  └──────────────────────────────┘  └──────────────────────────────┘        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Releases View

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ RELEASES                                              [+ Create Release]    │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Filter: [All ▼]  Status: [All ▼]  Search: [________________]               │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ NAME           STATUS      COMPONENTS   ENVIRONMENTS      CREATED   │   │
│  ├─────────────────────────────────────────────────────────────────────┤   │
│  │ myapp-v1.5.0   Ready       3            dev               2h ago    │   │
│  │ myapp-v1.4.2   Deployed    3            staging, uat      1d ago    │   │
│  │ myapp-v1.4.1   Deployed    3            prod              3d ago    │   │
│  │ myapp-v1.4.0   Deprecated  3            -                 1w ago    │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ RELEASE DETAIL: myapp-v1.5.0                          [Promote ▼]   │   │
│  ├─────────────────────────────────────────────────────────────────────┤   │
│  │                                                                     │   │
│  │ Components:                                                         │   │
│  │  ┌────────────────────────────────────────────────────────────┐     │   │
│  │  │ api     sha256:abc123...   registry.io/myorg/api           │     │   │
│  │  │ web     sha256:def456...   registry.io/myorg/web           │     │   │
│  │  │ worker  sha256:ghi789...   registry.io/myorg/worker        │     │   │
│  │  └────────────────────────────────────────────────────────────┘     │   │
│  │                                                                     │   │
│  │ Source: https://github.com/myorg/myapp @ v1.5.0                     │   │
│  │ Created: 2h ago by john@example.com                                 │   │
│  │                                                                     │   │
│  │ Promotion History:                                                  │   │
│  │  dev (✓) → staging (pending) → uat (-) → prod (-)                   │   │
│  │                                                                     │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Promotion Detail

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ PROMOTION: myapp-v1.5.0 → production                                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Status: PENDING APPROVAL                        [Approve]  [Reject]        │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ GATE EVALUATION                                                     │   │
│  ├─────────────────────────────────────────────────────────────────────┤   │
│  │                                                                     │   │
│  │ ✓ Security Gate              Passed                                 │   │
│  │   No critical vulnerabilities                                       │   │
│  │                                                                     │   │
│  │ ✓ Freeze Window Check        Passed                                 │   │
│  │   No active freeze windows                                          │   │
│  │                                                                     │   │
│  │ ◐ Approval Gate              1/2 Approvals                          │   │
│  │   Jane approved 30m ago                                             │   │
│  │   Waiting for 1 more approval                                       │   │
│  │                                                                     │   │
│  │ ○ Separation of Duties       Pending                                │   │
│  │   Requester: John (cannot approve)                                  │   │
│  │                                                                     │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ PROMOTION TIMELINE                                                  │   │
│  ├─────────────────────────────────────────────────────────────────────┤   │
│  │                                                                     │   │
│  │ 10:00  John requested promotion                                     │   │
│  │ 10:05  Security gate evaluated: PASSED                              │   │
│  │ 10:05  Freeze check: PASSED                                         │   │
│  │ 10:30  Jane approved                                                │   │
│  │ 11:00  Waiting for additional approval...                           │   │
│  │                                                                     │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Workflow Editor

Visual editor for creating and modifying workflow templates.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ WORKFLOW EDITOR: standard-deploy                          [Save]  [Run]     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────┐  ┌─────────────────────────────────────────────────┐  │
│  │ STEP PALETTE    │  │                                                 │  │
│  │                 │  │                                                 │  │
│  │ Control         │  │   ┌──────────┐                                  │  │
│  │ ├─ Approval     │  │   │ Approval │                                  │  │
│  │ ├─ Wait         │  │   │   Gate   │                                  │  │
│  │ └─ Condition    │  │   └────┬─────┘                                  │  │
│  │                 │  │        │                                        │  │
│  │ Gates           │  │        ▼                                        │  │
│  │ ├─ Security     │  │   ┌──────────┐                                  │  │
│  │ ├─ Freeze       │  │   │ Security │                                  │  │
│  │ └─ Custom       │  │   │   Gate   │                                  │  │
│  │                 │  │   └────┬─────┘                                  │  │
│  │ Deploy          │  │        │                                        │  │
│  │ ├─ Docker       │  │        ▼                                        │  │
│  │ ├─ Compose      │  │   ┌──────────┐                                  │  │
│  │ └─ ECS          │  │   │  Deploy  │                                  │  │
│  │                 │  │   │  Targets │                                  │  │
│  │ Verify          │  │   └────┬─────┘                                  │  │
│  │ ├─ Health       │  │        │                                        │  │
│  │ └─ Smoke Test   │  │   ┌────┴────┐                                   │  │
│  │                 │  │   │         │                                   │  │
│  │ Notify          │  │   ▼         ▼                                   │  │
│  │ ├─ Slack        │  │ ┌──────┐  ┌──────────┐                          │  │
│  │ └─ Email        │  │ │Health│  │ Rollback │◄──[on failure]           │  │
│  │                 │  │ │Check │  │ Handler  │                          │  │
│  │                 │  │ └──┬───┘  └────┬─────┘                          │  │
│  │                 │  │    │           │                                │  │
│  │                 │  │    ▼           ▼                                │  │
│  │                 │  │ ┌──────┐  ┌──────────┐                          │  │
│  │                 │  │ │Notify│  │  Notify  │                          │  │
│  │                 │  │ │Success│ │  Failure │                          │  │
│  │                 │  │ └──────┘  └──────────┘                          │  │
│  │                 │  │                                                 │  │
│  └─────────────────┘  └─────────────────────────────────────────────────┘  │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ STEP PROPERTIES: Deploy Targets                                     │   │
│  ├─────────────────────────────────────────────────────────────────────┤   │
│  │ Type: deploy-compose                                                │   │
│  │ Strategy: [Rolling ▼]                                               │   │
│  │ Parallelism: [2]                                                    │   │
│  │ Timeout: [600] seconds                                              │   │
│  │ On Failure: [Rollback ▼]                                            │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Deployment Live View

Real-time view of an active deployment.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ DEPLOYMENT: myapp-v1.5.0 → production                               [Abort] │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Status: RUNNING                     Progress: ████████░░ 80%               │
│  Strategy: Rolling (batch 4/5)       Duration: 5m 23s                       │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ TARGET STATUS                                                       │   │
│  ├─────────────────────────────────────────────────────────────────────┤   │
│  │                                                                     │   │
│  │ ✓ prod-host-1   sha256:abc123   Deployed    Health: OK              │   │
│  │ ✓ prod-host-2   sha256:abc123   Deployed    Health: OK              │   │
│  │ ✓ prod-host-3   sha256:abc123   Deployed    Health: OK              │   │
│  │ ● prod-host-4   sha256:abc123   Deploying   Health: Checking...     │   │
│  │ ○ prod-host-5   -               Pending     Health: -               │   │
│  │                                                                     │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ LIVE LOGS: prod-host-4                                              │   │
│  ├─────────────────────────────────────────────────────────────────────┤   │
│  │ 10:25:15 Pulling image sha256:abc123...                             │   │
│  │ 10:25:18 Image pulled successfully                                  │   │
│  │ 10:25:19 Stopping existing container...                             │   │
│  │ 10:25:20 Starting new container...                                  │   │
│  │ 10:25:21 Container started                                          │   │
│  │ 10:25:22 Running health check...                                    │   │
│  │ 10:25:25 Health check passed (1/3)                                  │   │
│  │ 10:25:28 Health check passed (2/3)                                  │   │
│  │ ...                                                                 │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Environment Management

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ ENVIRONMENTS                                        [+ Add Environment]     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ NAME         ORDER  TARGETS  CURRENT RELEASE   APPROVALS   STATUS   │   │
│  ├─────────────────────────────────────────────────────────────────────┤   │
│  │ development  1      3        myapp-v1.5.0      0           Active   │   │
│  │ staging      2      2        myapp-v1.4.2      1           Active   │   │
│  │ uat          3      2        myapp-v1.4.1      1           Active   │   │
│  │ production   4      5        myapp-v1.4.0      2 + SoD     Active   │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ ENVIRONMENT DETAIL: production                             [Edit]   │   │
│  ├─────────────────────────────────────────────────────────────────────┤   │
│  │                                                                     │   │
│  │ Approval Policy:                                                    │   │
│  │  - Required approvals: 2                                            │   │
│  │  - Separation of duties: Enabled                                    │   │
│  │  - Approver roles: release-manager, tech-lead                       │   │
│  │                                                                     │   │
│  │ Freeze Windows:                                                     │   │
│  │  ┌────────────────────────────────────────────────────────────┐     │   │
│  │  │ Holiday Freeze   Dec 20 - Jan 5    Active    [Remove]      │     │   │
│  │  │ Weekend Freeze   Sat-Sun           Active    [Remove]      │     │   │
│  │  └────────────────────────────────────────────────────────────┘     │   │
│  │  [+ Add Freeze Window]                                              │   │
│  │                                                                     │   │
│  │ Targets:                                                            │   │
│  │  ┌────────────────────────────────────────────────────────────┐     │   │
│  │  │ prod-host-1   docker_host   healthy    sha256:abc...       │     │   │
│  │  │ prod-host-2   docker_host   healthy    sha256:abc...       │     │   │
│  │  │ prod-host-3   docker_host   healthy    sha256:abc...       │     │   │
│  │  │ prod-host-4   docker_host   healthy    sha256:abc...       │     │   │
│  │  │ prod-host-5   docker_host   degraded   sha256:abc...       │     │   │
│  │  └────────────────────────────────────────────────────────────┘     │   │
│  │                                                                     │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

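The freeze windows shown in the environment detail come in two flavors: fixed date ranges ("Dec 20 - Jan 5") and recurring weekdays ("Sat-Sun"). Evaluating them reduces to a small predicate; a sketch under the assumption that a window is either a UTC date range or a set of UTC weekdays (the `FreezeWindow` shape here is hypothetical, not the project's schema):

```typescript
interface FreezeWindow {
  name: string;
  start?: Date;        // fixed range, inclusive (UTC)
  end?: Date;
  weekdays?: number[]; // recurring days, 0 = Sunday ... 6 = Saturday (UTC)
}

// True when `now` falls inside any active freeze window.
function isFrozen(now: Date, windows: FreezeWindow[]): boolean {
  return windows.some((w) => {
    if (w.start && w.end) {
      return now >= w.start && now <= w.end;
    }
    if (w.weekdays) {
      return w.weekdays.includes(now.getUTCDay());
    }
    return false;
  });
}
```

A real implementation would also carry the environment's time zone; evaluating recurring windows in UTC is a simplification made here for brevity.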
## Key Interactions

### Approval Flow

1. User sees pending approval notification on dashboard
2. Clicks to view promotion detail
3. Reviews gate evaluation results and change details
4. Clicks "Approve" or "Reject" with optional comment
5. System validates SoD requirements
6. Promotion advances or notification sent

### Quick Promote

1. From release detail, user clicks "Promote"
2. Selects target environment from dropdown
3. Confirms promotion request
4. System evaluates gates immediately
5. If auto-approved, deployment begins
6. If approval required, notification sent to approvers

### Emergency Rollback

1. From deployment history or alert, user clicks "Rollback"
2. System shows previous healthy version
3. User confirms rollback
4. System creates rollback deployment job
5. Real-time progress shown

## Mobile Considerations

- Responsive design for smaller screens
- Critical actions (approve/reject) accessible on mobile
- Push notifications for pending approvals
- Simplified views for monitoring on the go

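The SoD validation in step 5 of the approval flow reduces to a small predicate. A sketch assuming a hypothetical policy shape in which, when separation of duties is enabled, the requester's own approval never counts toward the quorum:

```typescript
interface ApprovalPolicy {
  requiredApprovals: number;
  separationOfDuties: boolean; // requester may not approve their own promotion
}

interface PromotionRequest {
  requestedBy: string;
  approvedBy: string[]; // distinct approvals recorded so far
}

// True when the promotion may advance under the environment's policy.
function satisfiesApprovalPolicy(p: PromotionRequest, policy: ApprovalPolicy): boolean {
  const approvers = new Set(p.approvedBy); // deduplicate repeated approvals
  if (policy.separationOfDuties) {
    approvers.delete(p.requestedBy); // the requester's approval never counts
  }
  return approvers.size >= policy.requiredApprovals;
}
```

Note that deduplicating via a `Set` also prevents one approver from satisfying a two-approval quorum by approving twice.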
## References

- [API Overview](../api/overview.md)
- [Workflow Templates](../workflow/templates.md)
232
docs/modules/release-orchestrator/ui/screens.md
Normal file
@@ -0,0 +1,232 @@
# Key UI Screens

> Specification for key UI screens: Environment Overview, Release Detail, and the "Why Blocked?" modal.

**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 12.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Environment Manager](../modules/environment-manager.md), [Release Manager](../modules/release-manager.md)
**Sprints:** [111_002 - 111_007](../../../../implplan/)

## Overview

This document specifies the key UI screens for release orchestration.

---

## Environment Overview Screen

The environment overview shows the deployment pipeline and the current state of each environment.

```
+-----------------------------------------------------------------------------+
| ENVIRONMENTS                                          [+ New Environment]   |
+-----------------------------------------------------------------------------+
|                                                                             |
| +------------------------------------------------------------------------+ |
| | ENVIRONMENT PIPELINE                                                   | |
| |                                                                        | |
| | +---------+      +---------+      +---------+      +---------+        | |
| | |   DEV   | ---> |  TEST   | ---> |  STAGE  | ---> |  PROD   |        | |
| | |         |      |         |      |         |      |         |        | |
| | | v2.4.0  |      | v2.3.1  |      | v2.3.1  |      | v2.3.0  |        | |
| | | * 5 min |      | * 2h    |      | * 1d    |      | * 3d    |        | |
| | +---------+      +---------+      +---------+      +---------+        | |
| |                                                                        | |
| +------------------------------------------------------------------------+ |
|                                                                             |
| +------------------------------------------------------------------------+ |
| | PRODUCTION                                           [Manage] [View]   | |
| |                                                                        | |
| | Current Release: myapp-v2.3.0                                          | |
| | Deployed: 3 days ago by jane@example.com                               | |
| | Targets: 5 healthy, 0 unhealthy                                        | |
| |                                                                        | |
| | +---------------------------------------------------------------+     | |
| | | Pending Promotion: myapp-v2.3.1                    [Review]    |     | |
| | | Waiting: 2 approvals (1/2)                                     |     | |
| | | Security: ✓ All gates pass                                     |     | |
| | +---------------------------------------------------------------+     | |
| |                                                                        | |
| | Freeze Windows: None active                                            | |
| | Required Approvals: 2                                                  | |
| +------------------------------------------------------------------------+ |
|                                                                             |
+-----------------------------------------------------------------------------+
```

### Features

- **Environment Pipeline:** Visual flow showing version progression
- **Environment Cards:** Detailed view of each environment
- **Target Health:** Real-time target health indicators
- **Pending Promotions:** Promotions awaiting action
- **Freeze Windows:** Active and scheduled freeze windows
- **Approval Status:** Current approval count vs. required

---

## Release Detail Screen

The release detail screen shows all information about a specific release.

```
+-----------------------------------------------------------------------------+
| RELEASE: myapp-v2.3.1                                                       |
| Created: 2 hours ago by jane@example.com                                    |
+-----------------------------------------------------------------------------+
|                                                                             |
| [Overview] [Components] [Security] [Deployments] [Evidence]                 |
|                                                                             |
| +------------------------------------------------------------------------+ |
| | COMPONENTS                                                             | |
| |                                                                        | |
| | +------------------------------------------------------------------+  | |
| | | api                                                              |  | |
| | | Version: 2.3.1    Digest: sha256:abc123...                       |  | |
| | | Security: ✓ 0 critical, 0 high (0 reachable)                     |  | |
| | | Image: registry.example.com/myapp/api@sha256:abc123              |  | |
| | +------------------------------------------------------------------+  | |
| |                                                                        | |
| | +------------------------------------------------------------------+  | |
| | | worker                                                           |  | |
| | | Version: 2.3.1    Digest: sha256:def456...                       |  | |
| | | Security: ✓ 0 critical, 0 high (0 reachable)                     |  | |
| | | Image: registry.example.com/myapp/worker@sha256:def456           |  | |
| | +------------------------------------------------------------------+  | |
| |                                                                        | |
| +------------------------------------------------------------------------+ |
|                                                                             |
| +------------------------------------------------------------------------+ |
| | DEPLOYMENT STATUS                                                      | |
| |                                                                        | |
| | dev    *--------------------------------------------*  Deployed (2h)  | |
| | test   *--------------------------------------------*  Deployed (1h)  | |
| | stage  o--------------------------------------------*  Deploying...   | |
| | prod   o                                               Not deployed   | |
| |                                                                        | |
| +------------------------------------------------------------------------+ |
|                                                                             |
| [Promote to Stage v] [Compare with Production] [Download Evidence]          |
|                                                                             |
+-----------------------------------------------------------------------------+
```

### Tabs

1. **Overview:** Release metadata and summary
2. **Components:** Component list with digests and versions
3. **Security:** Vulnerability summary and reachability analysis
4. **Deployments:** Deployment history across environments
5. **Evidence:** Evidence packets for compliance

### Features

- **Digest Display:** Full OCI digests for each component
- **Security Summary:** Vulnerability counts by severity
- **Deployment Timeline:** Visual progress across environments
- **Quick Actions:** Promote, compare, and export options

---

## "Why Blocked?" Modal

The "Why Blocked?" modal explains why a promotion cannot proceed.

```
+-----------------------------------------------------------------------------+
| WHY IS THIS PROMOTION BLOCKED?                                     [Close]  |
+-----------------------------------------------------------------------------+
|                                                                             |
| Release: myapp-v2.4.0 -> Production                                         |
|                                                                             |
| +------------------------------------------------------------------------+ |
| | X SECURITY GATE FAILED                                                 | |
| |                                                                        | |
| | Component 'api' has 1 critical reachable vulnerability:                | |
| |                                                                        | |
| | - CVE-2024-1234 (Critical, CVSS 9.8)                                   | |
| |   Package: log4j 2.14.0                                                | |
| |   Reachability: ✓ Confirmed reachable via api/logging/Logger.java      | |
| |   Fixed in: 2.17.1                                                     | |
| |   [View Details] [View Evidence]                                       | |
| |                                                                        | |
| | Remediation: Update log4j to version 2.17.1 or later                   | |
| |                                                                        | |
| +------------------------------------------------------------------------+ |
|                                                                             |
| +------------------------------------------------------------------------+ |
| | ✓ APPROVAL GATE PASSED                                                 | |
| |                                                                        | |
| | Required: 2 approvals                                                  | |
| | Received: 2 approvals                                                  | |
| | - john@example.com (2h ago): "LGTM"                                    | |
| | - sarah@example.com (1h ago): "Approved for prod"                      | |
| |                                                                        | |
| +------------------------------------------------------------------------+ |
|                                                                             |
| +------------------------------------------------------------------------+ |
| | ✓ FREEZE WINDOW GATE PASSED                                            | |
| |                                                                        | |
| | No active freeze windows for production                                | |
| |                                                                        | |
| +------------------------------------------------------------------------+ |
|                                                                             |
| Policy evaluated at: 2026-01-09T14:32:15Z                                   |
| Policy hash: sha256:789xyz...                                               |
| [View Full Decision Record]                                                 |
|                                                                             |
+-----------------------------------------------------------------------------+
```

### Features

- **Gate-by-Gate Status:** Shows each gate with pass/fail status
- **Failure Details:** Specific information about why a gate failed
- **Vulnerability Details:** CVE info, package, version, and remediation
- **Reachability Evidence:** Links to reachability analysis
- **Approval History:** List of approvers and their comments
- **Override Mechanism:** Request override for authorized users
- **Decision Record:** Link to full evidence packet

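The gate-by-gate report above implies that every gate is evaluated even after one fails, so the modal can show all results at once. A minimal sketch of that aggregation (the `(name, check)` pairs and the context dict are illustrative assumptions, not the real gate API):

```python
def evaluate_gates(gates, context):
    """Run every gate without short-circuiting; collect per-gate results.

    `gates` is a list of (name, check) pairs where check(context)
    returns (passed, reason). The promotion may proceed only if all pass.
    """
    results = []
    for name, check in gates:
        passed, reason = check(context)
        results.append({"gate": name, "passed": passed, "reason": reason})
    return all(r["passed"] for r in results), results
```

Because failures never short-circuit, a single blocked gate still yields a complete report for the other gates, which is what the modal renders.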
---

## Navigation Structure

```
Dashboard
+-- Releases
|   +-- [Release Detail]
|   +-- Create Release
|   +-- Compare Releases
|
+-- Environments
|   +-- [Environment Overview]
|   +-- Create Environment
|   +-- Manage Targets
|
+-- Workflows
|   +-- [Workflow Editor]
|   +-- Workflow Runs
|   +-- Step Types
|
+-- Integrations
|   +-- Connectors
|   +-- Plugins
|   +-- Vault
|
+-- Settings
    +-- Users & Teams
    +-- Policies
    +-- Audit Log
```

---

## See Also

- [Dashboard](dashboard.md)
- [Workflow Editor](workflow-editor.md)
- [Environment Manager](../modules/environment-manager.md)
- [Release Manager](../modules/release-manager.md)
- [Promotion Manager](../modules/promotion-manager.md)
296
docs/modules/release-orchestrator/ui/workflow-editor.md
Normal file
@@ -0,0 +1,296 @@
# Workflow Editor Specification

> Visual workflow editor for creating and editing DAG-based workflow templates.

**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 12.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Workflow Engine](../modules/workflow-engine.md), [Workflow Templates](../workflow/templates.md)
**Sprint:** [111_004 Workflow Editor](../../../../implplan/SPRINT_20260110_111_004_FE_workflow_editor.md)

## Overview

The workflow editor provides a visual graph editor for creating and editing workflow templates. It supports drag-and-drop node placement, connection creation, real-time run visualization, and bidirectional YAML synchronization.

---

## Graph Editor Component

### Editor State

```typescript
interface WorkflowEditorState {
  template: WorkflowTemplate;
  selectedNode: string | null;
  selectedEdge: string | null;
  zoom: number;
  pan: { x: number; y: number };
  mode: "select" | "pan" | "connect";
  clipboard: StepNode[] | null;
  undoStack: WorkflowTemplate[];
  redoStack: WorkflowTemplate[];
}

interface WorkflowEditorProps {
  template: WorkflowTemplate;
  stepTypes: StepType[];
  readOnly: boolean;
  onSave: (template: WorkflowTemplate) => void;
  onValidate: (template: WorkflowTemplate) => ValidationResult;
}
```

### Node Renderer

```typescript
interface NodeRendererProps {
  node: StepNode;
  stepType: StepType;
  status?: StepRunStatus; // For run visualization
  selected: boolean;
  onSelect: () => void;
  onMove: (position: Position) => void;
  onConnect: (sourceHandle: string) => void;
}

const NodeRenderer: React.FC<NodeRendererProps> = ({
  node, stepType, status, selected
}) => {
  const statusColor = getStatusColor(status);

  return (
    <div className={`workflow-node ${selected ? 'selected' : ''}`}
         style={{ borderColor: statusColor }}>

      {/* Node header */}
      <div className="node-header" style={{ backgroundColor: stepType.color }}>
        <Icon name={stepType.icon} />
        <span className="node-name">{node.name}</span>
        {status && <StatusBadge status={status} />}
      </div>

      {/* Node body */}
      <div className="node-body">
        <span className="node-type">{stepType.name}</span>
        {node.timeout && <span className="node-timeout">⏱ {node.timeout}s</span>}
      </div>

      {/* Connection handles */}
      <Handle type="target" position="top" />
      <Handle type="source" position="bottom" />

      {/* Conditional indicator */}
      {node.condition && (
        <div className="condition-badge" title={node.condition}>
          <Icon name="condition" />
        </div>
      )}
    </div>
  );
};
```

---

## Run Visualization Overlay

### Real-Time Execution Display

```typescript
interface RunVisualizationProps {
  template: WorkflowTemplate;
  workflowRun: WorkflowRun;
  stepRuns: StepRun[];
  onNodeClick: (nodeId: string) => void;
}

const RunVisualization: React.FC<RunVisualizationProps> = ({
  template, workflowRun, stepRuns, onNodeClick
}) => {
  // WebSocket for real-time updates
  const { subscribe, unsubscribe } = useWorkflowStream(workflowRun.id);

  useEffect(() => {
    const handlers = {
      'step_started': (data) => updateStepStatus(data.nodeId, 'running'),
      'step_completed': (data) => updateStepStatus(data.nodeId, data.status),
      'step_log': (data) => appendLog(data.nodeId, data.line),
    };

    subscribe(handlers);
    return () => unsubscribe();
  }, [workflowRun.id]);

  return (
    <div className="run-visualization">
      {/* Workflow graph with status overlay */}
      <WorkflowGraph
        template={template}
        nodeRenderer={(node) => (
          <NodeRenderer
            node={node}
            stepType={getStepType(node.type)}
            status={getStepRunStatus(node.id)}
            selected={selectedNode === node.id}
            onSelect={() => setSelectedNode(node.id)}
          />
        )}
        edgeRenderer={(edge) => (
          <EdgeRenderer
            edge={edge}
            animated={isEdgeActive(edge)}
          />
        )}
      />

      {/* Log panel */}
      {selectedNode && (
        <LogPanel
          stepRun={getStepRun(selectedNode)}
          streaming={isStepRunning(selectedNode)}
        />
      )}

      {/* Progress bar */}
      <ProgressBar
        completed={completedSteps}
        total={totalSteps}
        status={workflowRun.status}
      />
    </div>
  );
};
```

### Status Indicators

| Status | Visual |
|--------|--------|
| Pending | Gray circle |
| Running | Blue spinner |
| Success | Green checkmark |
| Failed | Red X |
| Skipped | Yellow dash |

---

## Canvas Operations

### Drag and Drop

- Drag steps from the palette onto the canvas
- Dropping creates a new node at that position
- Connect nodes by dragging from a source handle to a target handle
- Multi-select with Shift+click or box selection

### Validation

The editor performs real-time validation:

- **DAG Cycle Detection:** Prevent circular dependencies
- **Orphan Node Detection:** Warn about unconnected nodes
- **Required Inputs:** Highlight missing required configuration
- **Type Compatibility:** Validate edge connections between compatible types

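Cycle detection on the template graph can be done with Kahn's algorithm: repeatedly remove zero-in-degree nodes; whatever cannot be removed sits on (or behind) a cycle. A minimal sketch, assuming nodes are ids and edges are `(src, dst)` pairs:

```python
from collections import defaultdict, deque

def nodes_on_cycles(nodes, edges):
    """Return the set of nodes on (or downstream of) a cycle; empty for a valid DAG."""
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    # Peel off nodes with no remaining incoming edges.
    queue = deque(n for n, d in indegree.items() if d == 0)
    while queue:
        n = queue.popleft()
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)

    # Nodes still holding incoming edges could never be scheduled.
    return {n for n, d in indegree.items() if d > 0}
```

The editor would flag the returned nodes and refuse to save the template until the set is empty.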
### Zoom and Pan

| Action | Control |
|--------|---------|
| Zoom In | Ctrl + Mouse Wheel Up |
| Zoom Out | Ctrl + Mouse Wheel Down |
| Fit View | Ctrl + 0 |
| Pan | Middle Mouse Drag / Space + Drag |
| Reset | Ctrl + R |

---

## YAML Editor Mode

### Monaco Editor Integration

The editor supports a bidirectional YAML mode for power users:

```typescript
interface YAMLEditorProps {
  template: WorkflowTemplate;
  onChange: (template: WorkflowTemplate) => void;
  onValidate: (yaml: string) => ValidationResult;
}

const YAMLEditor: React.FC<YAMLEditorProps> = ({ template, onChange, onValidate }) => {
  const [yaml, setYaml] = useState(templateToYaml(template));

  return (
    <MonacoEditor
      language="yaml"
      value={yaml}
      onChange={(value) => {
        setYaml(value);
        const result = onValidate(value);
        if (result.valid) {
          onChange(yamlToTemplate(value));
        }
      }}
      options={{
        minimap: { enabled: false },
        lineNumbers: 'on',
        scrollBeyondLastLine: false,
      }}
    />
  );
};
```

### Bidirectional Sync

Changes in either view (graph or YAML) are synchronized:

- Graph changes update the YAML immediately
- Valid YAML changes update the graph
- Invalid YAML shows error markers without updating the graph

---

## Step Palette

### Available Step Types

The palette shows all available step types from core and plugins:

```typescript
interface StepPaletteProps {
  stepTypes: StepType[];
  onDragStart: (stepType: string) => void;
  filter: string;
}

const categories = [
  { name: "Deployment", types: ["deploy", "rollback"] },
  { name: "Gates", types: ["security-gate", "approval", "freeze-window-gate"] },
  { name: "Utility", types: ["script", "wait", "notify"] },
  { name: "Plugins", types: [] }, // Dynamically loaded
];
```

---

## Keyboard Shortcuts

| Shortcut | Action |
|----------|--------|
| Ctrl + S | Save template |
| Ctrl + Z | Undo |
| Ctrl + Shift + Z | Redo |
| Delete | Delete selected |
| Ctrl + C | Copy selected |
| Ctrl + V | Paste |
| Ctrl + A | Select all |
| Escape | Deselect / Cancel |

---

## See Also

- [Workflow Templates](../workflow/templates.md)
- [Workflow APIs](../api/workflows.md)
- [Dashboard](dashboard.md)
- [Key Screens](screens.md)
591
docs/modules/release-orchestrator/workflow/execution.md
Normal file
@@ -0,0 +1,591 @@
# Workflow Execution

## Overview

The Workflow Engine executes workflow templates as DAGs (Directed Acyclic Graphs) of steps, managing state transitions, parallelism, retries, and failure handling.

## Execution Architecture

```
                      WORKFLOW EXECUTION ARCHITECTURE

┌─────────────────────────────────────────────────────────────────────────────┐
│                              WORKFLOW ENGINE                                │
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                         WORKFLOW RUNNER                             │   │
│   │                                                                     │   │
│   │   ┌────────────┐    ┌────────────┐    ┌────────────┐               │   │
│   │   │  Template  │───►│ Execution  │───►│  Context   │               │   │
│   │   │   Parser   │    │  Planner   │    │  Builder   │               │   │
│   │   └────────────┘    └────────────┘    └────────────┘               │   │
│   │         │                 │                 │                       │   │
│   │         └────────────────┼─────────────────┘                       │   │
│   │                          ▼                                          │   │
│   │   ┌─────────────────────────────────────────────────────────────┐   │   │
│   │   │                      DAG EXECUTOR                           │   │   │
│   │   │                                                             │   │   │
│   │   │   ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐      │   │   │
│   │   │   │  Ready   │ │ Running  │ │ Waiting  │ │ Completed│      │   │   │
│   │   │   │  Queue   │ │   Set    │ │   Set    │ │   Set    │      │   │   │
│   │   │   └──────────┘ └──────────┘ └──────────┘ └──────────┘      │   │   │
│   │   │                                                             │   │   │
│   │   │   ┌──────────────────────────────────────────────────────┐  │   │   │
│   │   │   │                  STEP DISPATCHER                     │  │   │   │
│   │   │   └──────────────────────────────────────────────────────┘  │   │   │
│   │   └─────────────────────────────────────────────────────────────┘   │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                    │                                        │
│                                    ▼                                        │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                       STEP EXECUTOR POOL                            │   │
│   │                                                                     │   │
│   │   ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐      │   │
│   │   │ Executor 1 │ │ Executor 2 │ │ Executor 3 │ │ Executor N │      │   │
│   │   └────────────┘ └────────────┘ └────────────┘ └────────────┘      │   │
│   │                                                                     │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Workflow Run State Machine

```
                         WORKFLOW RUN STATES

                          ┌──────────┐
                          │ CREATED  │
                          └────┬─────┘
                               │ start()
                               ▼
                          ┌──────────┐
                          │ RUNNING  │◄──────────────────┐
                          └────┬─────┘                   │
                               │                         │
           ┌───────────────────┼───────────────────┐     │
           │                   │                   │     │
           ▼                   ▼                   ▼     │
      ┌──────────┐        ┌──────────┐        ┌──────────┐│
      │ WAITING  │        │  PAUSED  │        │ FAILING  ││
      │ APPROVAL │        │          │        │          ││
      └────┬─────┘        └────┬─────┘        └────┬─────┘│
           │                   │                   │      │
           │ approve()         │ resume()          │      │
           │                   │                   │      │
           └───────────────►──┴──────────────────┘      │
                               │                          │
                               └─────────────────────────┘
                               │
       ┌───────────────────────┼───────────────────┐
       │                       │                   │
       ▼                       ▼                   ▼
  ┌──────────┐            ┌──────────┐        ┌──────────┐
  │COMPLETED │            │  FAILED  │        │ CANCELLED│
  └──────────┘            └──────────┘        └──────────┘
```

### State Transitions

| Current State | Event | Next State | Description |
|---------------|-------|------------|-------------|
| `created` | `start()` | `running` | Begin workflow execution |
| `running` | Step requires approval | `waiting_approval` | Pause for human approval |
| `running` | `pause()` | `paused` | Manual pause requested |
| `running` | Step fails | `failing` | Handle failure path |
| `running` | All steps complete | `completed` | Workflow success |
| `waiting_approval` | `approve()` | `running` | Resume after approval |
| `waiting_approval` | `reject()` | `failed` | Rejection ends workflow |
| `paused` | `resume()` | `running` | Resume execution |
| `paused` | `cancel()` | `cancelled` | Cancel workflow |
| `failing` | Rollback completes | `failed` | Failure handling done |
| `failing` | Recovery path succeeds | `running` | Resume with fallback |

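The transition table above is easy to enforce as data: a map from state to the events it accepts. A minimal sketch (state and event names follow the table; the rejection of unknown transitions is an illustrative design choice):

```python
# Allowed workflow-run transitions, keyed by current state then event.
ALLOWED_TRANSITIONS = {
    "created": {"start": "running"},
    "running": {
        "wait_approval": "waiting_approval",
        "pause": "paused",
        "step_failed": "failing",
        "all_complete": "completed",
    },
    "waiting_approval": {"approve": "running", "reject": "failed"},
    "paused": {"resume": "running", "cancel": "cancelled"},
    "failing": {"rollback_complete": "failed", "recovered": "running"},
}

def transition(state, event):
    """Return the next state, or raise ValueError on an illegal transition."""
    try:
        return ALLOWED_TRANSITIONS[state][event]
    except KeyError:
        raise ValueError(f"illegal transition: {state} + {event}")
```

Keeping the table as data means the same structure can drive both the engine and the "Why Blocked?" style explanations of what a run may do next.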
## Step Execution State Machine

```
                              STEP STATES

                          ┌──────────┐
                          │ PENDING  │
                          └────┬─────┘
                               │ schedule()
                               ▼
                          ┌──────────┐
                          │  QUEUED  │
                          └────┬─────┘
                               │ dispatch()
                               ▼
                          ┌──────────┐
                          │ RUNNING  │◄─────────┐
                          └────┬─────┘          │
                               │                │ retry()
           ┌───────────────────┼───────────────┐│
           │                   │               ││
           ▼                   ▼               ▼│
      ┌──────────┐        ┌──────────┐    ┌──────────┐
      │SUCCEEDED │        │  FAILED  │    │ RETRYING │
      └──────────┘        └────┬─────┘    └──────────┘
                               │
                               ▼
                    ┌─────────────────────┐
                    │   FAILURE HANDLER   │
                    │  ┌───────────────┐  │
                    │  │ fail          │──┼─► Mark workflow failing
                    │  │ continue      │──┼─► Continue to next step
                    │  │ rollback      │──┼─► Trigger rollback path
                    │  │ goto:{nodeId} │──┼─► Jump to specific node
                    │  └───────────────┘  │
                    └─────────────────────┘
```

### Step States

| State | Description |
|-------|-------------|
| `pending` | Step not yet ready (dependencies incomplete) |
| `queued` | Ready for execution, waiting for an executor |
| `running` | Currently executing |
| `succeeded` | Completed successfully |
| `failed` | Failed after all retries exhausted |
| `retrying` | Failed, waiting for retry |
| `skipped` | Condition evaluated to false |

## DAG Execution Algorithm

```python
class DAGExecutor:
    def __init__(self, workflow_run: WorkflowRun):
        self.run = workflow_run
        self.template = workflow_run.template
        self.pending = set(node.id for node in self.template.nodes)
        self.running = set()
        self.completed = set()
        self.failed = set()
        self.outputs = {}  # nodeId -> outputs

    async def execute(self):
        """Main execution loop."""
        self.run.status = WorkflowStatus.RUNNING
        self.run.started_at = datetime.utcnow()

        while self.pending or self.running:
            # Find ready nodes (all dependencies satisfied)
            ready = self.find_ready_nodes()

            # Dispatch ready nodes
            for node_id in ready:
                asyncio.create_task(self.execute_node(node_id))
                self.pending.remove(node_id)
                self.running.add(node_id)

            # Wait for any node to complete
            if self.running:
                await self.wait_for_completion()

            # Check for deadlock
            if not ready and self.pending and not self.running:
                raise DeadlockException(self.pending)

        # Determine final status
        if self.failed:
            self.run.status = WorkflowStatus.FAILED
        else:
            self.run.status = WorkflowStatus.COMPLETED

        self.run.completed_at = datetime.utcnow()

    def find_ready_nodes(self) -> List[str]:
        """Find nodes whose dependencies are all complete."""
        ready = []
        for node_id in self.pending:
            node = self.template.get_node(node_id)

            # Check condition
            if node.condition:
                if not self.evaluate_condition(node.condition):
                    self.mark_skipped(node_id)
                    continue

            # Check all incoming edges
            incoming = self.template.get_incoming_edges(node_id)
            dependencies_met = all(
                edge.from_node in self.completed
                for edge in incoming
                if self.evaluate_edge_condition(edge)
            )

            if dependencies_met:
                ready.append(node_id)

        return ready

    async def execute_node(self, node_id: str):
        """Execute a single node."""
        node = self.template.get_node(node_id)
        step_run = StepRun(
            workflow_run_id=self.run.id,
            node_id=node_id,
            status=StepStatus.RUNNING
        )

        try:
            # Resolve inputs
            inputs = self.resolve_inputs(node)

            # Get step executor
            executor = self.step_registry.get_executor(node.type)

            # Execute with timeout
            async with asyncio.timeout(node.timeout):
                outputs = await executor.execute(inputs, node.config)

            # Store outputs
            self.outputs[node_id] = outputs
            step_run.outputs = outputs
            step_run.status = StepStatus.SUCCEEDED

            self.running.remove(node_id)
            self.completed.add(node_id)

        except Exception as e:
            await self.handle_step_failure(node, step_run, e)

    async def handle_step_failure(self, node, step_run, error):
        """Handle step failure according to retry and failure policies."""
        step_run.attempt_number += 1

        # Check retry policy
        if step_run.attempt_number <= node.retry_policy.max_retries:
            if self.is_retryable(error, node.retry_policy):
                step_run.status = StepStatus.RETRYING
                delay = self.calculate_backoff(node.retry_policy, step_run.attempt_number)
                await asyncio.sleep(delay)
                await self.execute_node(node.id)  # Retry
                return

        # No more retries - handle failure
        step_run.status = StepStatus.FAILED
        step_run.error = str(error)

        match node.on_failure:
            case "fail":
                self.run.status = WorkflowStatus.FAILING
                self.failed.add(node.id)
            case "continue":
                self.completed.add(node.id)  # Continue as if succeeded
            case "rollback":
                await self.trigger_rollback(node)
            case _ if node.on_failure.startswith("goto:"):
                target = node.on_failure.split(":")[1]
                self.pending.add(target)  # Add target to pending

        self.running.remove(node.id)
```

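The retry path above calls a `calculate_backoff` helper whose policy the spec does not fix. One common shape, shown here as an assumption, is capped exponential backoff with optional full jitter:

```python
import random

def calculate_backoff(base_delay, attempt, max_delay=300.0, jitter=False):
    """Delay before retry `attempt` (1-based): base * 2^(attempt-1), capped.

    With jitter=True the delay is drawn uniformly from [0, capped delay],
    which spreads out retries from many steps failing at once.
    """
    delay = min(base_delay * 2 ** (attempt - 1), max_delay)
    return random.uniform(0, delay) if jitter else delay
```

For example, with a 1-second base the deterministic delays run 1s, 2s, 4s, 8s, ... until the cap.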
## Input Resolution

Inputs to steps can come from multiple sources:

```typescript
interface InputResolver {
  resolve(binding: InputBinding, context: ExecutionContext): any;
}

class StandardInputResolver implements InputResolver {
  resolve(binding: InputBinding, context: ExecutionContext): any {
    switch (binding.source.type) {
      case "literal":
        return binding.source.value;

      case "context":
        // Navigate context path: "release.name" -> context.release.name
        return this.navigatePath(context, binding.source.path);

      case "output":
        // Get output from previous step
        const stepOutputs = context.stepOutputs[binding.source.nodeId];
        return stepOutputs?.[binding.source.outputName];

      case "secret":
        // Fetch from vault (never cached)
        return this.secretsClient.fetch(binding.source.secretName);

      case "expression":
        // Evaluate JavaScript expression
        return this.expressionEvaluator.evaluate(
          binding.source.expression,
          context
        );
    }
  }
}
```

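The `navigatePath` call for `"context"` bindings walks a dotted path through nested data. A minimal sketch of that traversal (the missing-segment-returns-None behavior is an assumption; the spec does not define error handling):

```python
def navigate_path(obj, path):
    """Resolve a dotted context path such as "release.name" against
    nested dicts or objects; return None when any segment is missing."""
    for part in path.split("."):
        if isinstance(obj, dict):
            obj = obj.get(part)
        else:
            obj = getattr(obj, part, None)
        if obj is None:
            return None
    return obj
```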
## Execution Context

The execution context provides data available to all steps:

```typescript
interface ExecutionContext {
  // Workflow identifiers
  workflowRunId: UUID;
  templateId: UUID;
  templateVersion: number;

  // Input values
  inputs: Record<string, any>;

  // Domain objects (loaded at start)
  release?: Release;
  promotion?: Promotion;
  environment?: Environment;
  targets?: Target[];

  // Step outputs (accumulated during execution)
  stepOutputs: Record<string, Record<string, any>>;

  // Tenant context
  tenantId: UUID;
  userId: UUID;

  // Metadata
  startedAt: DateTime;
  correlationId: string;
}
```

## Concurrency Control

### Parallelism Within Workflows

```typescript
interface ParallelConfig {
  maxConcurrency: number;  // Max simultaneous steps
  failFast: boolean;       // Stop all on first failure
}

// Example: Parallel deployment to multiple targets
const parallelDeploy: StepNode = {
  id: "parallel-deploy",
  type: "parallel",
  config: {
    maxConcurrency: 5,
    failFast: false
  },
  children: [
    { id: "deploy-target-1", type: "deploy-docker", ... },
    { id: "deploy-target-2", type: "deploy-docker", ... },
    { id: "deploy-target-3", type: "deploy-docker", ... },
  ]
};
```

### Global Concurrency Limits

```typescript
interface ConcurrencyLimits {
  maxWorkflowsPerTenant: number;         // Concurrent workflow runs
  maxStepsPerWorkflow: number;           // Concurrent steps per workflow
  maxDeploymentsPerEnvironment: number;  // Prevent deployment conflicts
}

// Default limits
const defaults: ConcurrencyLimits = {
  maxWorkflowsPerTenant: 10,
  maxStepsPerWorkflow: 20,
  maxDeploymentsPerEnvironment: 1  // One deployment at a time
};
```

## Checkpoint and Resume

Workflows support checkpointing for long-running executions:

```typescript
interface WorkflowCheckpoint {
  workflowRunId: UUID;
  checkpointedAt: DateTime;

  // Execution state
  pendingNodes: string[];
  completedNodes: string[];
  failedNodes: string[];

  // Accumulated data
  stepOutputs: Record<string, Record<string, any>>;

  // Context snapshot
  contextSnapshot: ExecutionContext;
}

class CheckpointManager {
  // Save checkpoint after each step completion
  async saveCheckpoint(run: WorkflowRun): Promise<void> {
    const checkpoint: WorkflowCheckpoint = {
      workflowRunId: run.id,
      checkpointedAt: new Date(),
      pendingNodes: Array.from(run.executor.pending),
      completedNodes: Array.from(run.executor.completed),
      failedNodes: Array.from(run.executor.failed),
      stepOutputs: run.executor.outputs,
      contextSnapshot: run.context
    };

    await this.repository.save(checkpoint);
  }

  // Resume from checkpoint after service restart
  async resumeFromCheckpoint(workflowRunId: UUID): Promise<WorkflowRun> {
    const checkpoint = await this.repository.get(workflowRunId);

    const run = new WorkflowRun();
    run.executor.pending = new Set(checkpoint.pendingNodes);
    run.executor.completed = new Set(checkpoint.completedNodes);
    run.executor.failed = new Set(checkpoint.failedNodes);
    run.executor.outputs = checkpoint.stepOutputs;
    run.context = checkpoint.contextSnapshot;

    // Resume execution
    await run.executor.execute();
    return run;
  }
}
```

## Timeout Handling

```typescript
interface TimeoutConfig {
  stepTimeout: number;      // Per-step timeout (seconds)
  workflowTimeout: number;  // Total workflow timeout (seconds)
}

class TimeoutHandler {
  // The abort signal is passed into the operation so cancellation actually
  // propagates; without it, aborting the controller would never surface as
  // an AbortError inside the operation.
  async executeWithTimeout<T>(
    operation: (signal: AbortSignal) => Promise<T>,
    timeoutSeconds: number,
    onTimeout: () => Promise<void>
  ): Promise<T> {
    const controller = new AbortController();
    const timeoutId = setTimeout(
      () => controller.abort(),
      timeoutSeconds * 1000
    );

    try {
      return await operation(controller.signal);
    } catch (error) {
      if (error.name === 'AbortError') {
        await onTimeout();
        throw new TimeoutException(timeoutSeconds);
      }
      throw error;
    } finally {
      clearTimeout(timeoutId);
    }
  }
}
```

## Event Emission

The workflow engine emits events for observability:

```typescript
type WorkflowEvent =
  | { type: "workflow.started"; workflowRunId: UUID; templateId: UUID }
  | { type: "workflow.completed"; workflowRunId: UUID; status: string }
  | { type: "workflow.failed"; workflowRunId: UUID; error: string }
  | { type: "step.started"; workflowRunId: UUID; nodeId: string }
  | { type: "step.completed"; workflowRunId: UUID; nodeId: string; outputs: any }
  | { type: "step.failed"; workflowRunId: UUID; nodeId: string; error: string }
  | { type: "step.retrying"; workflowRunId: UUID; nodeId: string; attempt: number };

class WorkflowEventEmitter {
  private subscribers: Map<string, ((event: WorkflowEvent) => void)[]> = new Map();

  constructor(private readonly eventBus: EventBus) {}

  emit(event: WorkflowEvent): void {
    const handlers = this.subscribers.get(event.type) || [];
    for (const handler of handlers) {
      handler(event);
    }

    // Also emit to event bus for external consumers
    this.eventBus.publish("workflow.events", event);
  }
}
```

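The emitter above shows only the publish side; a sketch of what the matching subscription helper might look like (the `on` method name and the `TypedEmitter` class are assumptions, not the documented API):

```typescript
type Handler<E> = (event: E) => void;

// Minimal typed emitter: handlers are keyed by the event's `type` field.
class TypedEmitter<E extends { type: string }> {
  private subscribers = new Map<string, Handler<E>[]>();

  // Register a handler for one event type.
  on(type: E["type"], handler: Handler<E>): void {
    const list = this.subscribers.get(type) ?? [];
    list.push(handler);
    this.subscribers.set(type, list);
  }

  emit(event: E): void {
    for (const handler of this.subscribers.get(event.type) ?? []) {
      handler(event);
    }
  }
}
```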
## Execution Monitoring

### Real-time Progress

```typescript
interface WorkflowProgress {
  workflowRunId: UUID;
  status: WorkflowStatus;

  // Step progress
  totalSteps: number;
  completedSteps: number;
  runningSteps: number;
  failedSteps: number;

  // Current activity
  currentNodes: string[];

  // Timing
  startedAt: DateTime;
  estimatedCompletion?: DateTime;

  // Step details
  steps: StepProgress[];
}

interface StepProgress {
  nodeId: string;
  nodeName: string;
  status: StepStatus;
  startedAt?: DateTime;
  completedAt?: DateTime;
  attempt: number;
  logs?: string;
}
```

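One way `estimatedCompletion` could be derived is from the average duration of the steps finished so far; a purely illustrative heuristic (the `estimateCompletion` helper is not part of the spec):

```typescript
// Assumes remaining steps take the average duration of completed ones.
// Returns undefined until at least one step has completed.
function estimateCompletion(
  startedAt: Date,
  completedSteps: number,
  totalSteps: number,
  now: Date = new Date()
): Date | undefined {
  if (completedSteps === 0) return undefined;  // no data yet
  const elapsedMs = now.getTime() - startedAt.getTime();
  const perStepMs = elapsedMs / completedSteps;
  const remaining = totalSteps - completedSteps;
  return new Date(now.getTime() + remaining * perStepMs);
}
```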
### WebSocket Streaming

```typescript
// Client subscribes to workflow progress
const ws = new WebSocket(`/api/v1/workflow-runs/${runId}/stream`);

ws.onmessage = (event) => {
  const progress: WorkflowProgress = JSON.parse(event.data);
  updateUI(progress);
};

// Server streams updates
class WorkflowStreamHandler {
  async stream(runId: UUID, connection: WebSocket): Promise<void> {
    const subscription = this.eventBus.subscribe(`workflow.${runId}.*`);

    for await (const event of subscription) {
      const progress = await this.buildProgress(runId);
      connection.send(JSON.stringify(progress));

      if (progress.status === 'completed' || progress.status === 'failed') {
        break;
      }
    }

    connection.close();
  }
}
```

## References

- [Workflow Templates](templates.md)
- [Workflow Engine Module](../modules/workflow-engine.md)
- [Promotion Manager](../modules/promotion-manager.md)

405
docs/modules/release-orchestrator/workflow/promotion.md
Normal file
@@ -0,0 +1,405 @@

# Promotion State Machine

## Overview

Promotions move releases through environments (Dev -> Staging -> Production). The promotion state machine manages the lifecycle from request to completion.

## Promotion States

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           PROMOTION STATE MACHINE                           │
│                                                                             │
│                          ┌──────────────────┐                               │
│                          │ PENDING_APPROVAL │ (initial)                     │
│                          └────────┬─────────┘                               │
│                                   │                                         │
│                ┌──────────────────┼──────────────────┐                      │
│                │                  │                  │                      │
│                ▼                  ▼                  ▼                      │
│       ┌────────────────┐ ┌────────────────┐ ┌────────────────┐              │
│       │    REJECTED    │ │  PENDING_GATE  │ │   CANCELLED    │              │
│       └────────────────┘ └────────┬───────┘ └────────────────┘              │
│                                   │                                         │
│                                   │ gates pass                              │
│                                   ▼                                         │
│                          ┌────────────────┐                                 │
│                          │    APPROVED    │                                 │
│                          └────────┬───────┘                                 │
│                                   │                                         │
│                                   │ start deployment                        │
│                                   ▼                                         │
│                          ┌────────────────┐                                 │
│                          │   DEPLOYING    │                                 │
│                          └────────┬───────┘                                 │
│                                   │                                         │
│                ┌──────────────────┼──────────────────┐                      │
│                │                  │                  │                      │
│                ▼                  ▼                  ▼                      │
│       ┌────────────────┐ ┌────────────────┐ ┌────────────────┐              │
│       │     FAILED     │ │    DEPLOYED    │ │  ROLLED_BACK   │              │
│       └────────────────┘ └────────────────┘ └────────────────┘              │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## State Definitions

| State | Description |
|-------|-------------|
| `pending_approval` | Awaiting human approval (if required) |
| `pending_gate` | Awaiting automated gate evaluation |
| `approved` | All approvals and gates satisfied; ready for deployment |
| `rejected` | Blocked by approval rejection or gate failure |
| `deploying` | Deployment in progress |
| `deployed` | Successfully deployed to target environment |
| `failed` | Deployment failed (not rolled back) |
| `cancelled` | Cancelled by user before completion |
| `rolled_back` | Deployment rolled back to previous version |

## State Transitions

### Valid Transitions

```typescript
const validTransitions: Record<PromotionStatus, PromotionStatus[]> = {
  pending_approval: ["pending_gate", "approved", "rejected", "cancelled"],
  pending_gate: ["approved", "rejected", "cancelled"],
  approved: ["deploying", "cancelled"],
  deploying: ["deployed", "failed", "rolled_back"],
  rejected: [],             // terminal
  cancelled: [],            // terminal
  deployed: [],             // terminal (for this promotion)
  failed: ["rolled_back"],  // can trigger rollback
  rolled_back: []           // terminal
};
```
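
The transition table lends itself to a small runtime guard; a minimal sketch (the `canTransition`/`assertTransition` helpers are illustrative, not part of the documented API):

```typescript
type PromotionStatus =
  | "pending_approval" | "pending_gate" | "approved" | "deploying"
  | "rejected" | "cancelled" | "deployed" | "failed" | "rolled_back";

const validTransitions: Record<PromotionStatus, PromotionStatus[]> = {
  pending_approval: ["pending_gate", "approved", "rejected", "cancelled"],
  pending_gate: ["approved", "rejected", "cancelled"],
  approved: ["deploying", "cancelled"],
  deploying: ["deployed", "failed", "rolled_back"],
  rejected: [],
  cancelled: [],
  deployed: [],
  failed: ["rolled_back"],
  rolled_back: []
};

// Returns true when the move is allowed by the table above.
function canTransition(from: PromotionStatus, to: PromotionStatus): boolean {
  return validTransitions[from].includes(to);
}

// Throws on an illegal move so state corruption fails fast.
function assertTransition(from: PromotionStatus, to: PromotionStatus): void {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal promotion transition: ${from} -> ${to}`);
  }
}
```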

### Transition Events

```typescript
interface PromotionTransition {
  promotionId: UUID;
  fromState: PromotionStatus;
  toState: PromotionStatus;
  trigger: TransitionTrigger;
  triggeredBy: UUID;  // user or system
  timestamp: DateTime;
  details: object;
}

type TransitionTrigger =
  | "approval_granted"
  | "approval_rejected"
  | "gate_passed"
  | "gate_failed"
  | "deployment_started"
  | "deployment_completed"
  | "deployment_failed"
  | "rollback_triggered"
  | "rollback_completed"
  | "user_cancelled";
```

## Promotion Flow

### 1. Request Promotion

```typescript
async function requestPromotion(request: PromotionRequest): Promise<Promotion> {
  // Validate release exists and is ready
  const release = await getRelease(request.releaseId);
  if (release.status !== "ready" && release.status !== "deployed") {
    throw new Error("Release not ready for promotion");
  }

  // Validate target environment
  const environment = await getEnvironment(request.targetEnvironmentId);

  // Check freeze windows
  if (await isEnvironmentFrozen(environment.id)) {
    throw new Error("Environment is frozen");
  }

  // Determine initial state
  const requiresApproval = environment.requiredApprovals > 0;
  const initialStatus = requiresApproval ? "pending_approval" : "pending_gate";

  // Create promotion
  const promotion = await createPromotion({
    releaseId: request.releaseId,
    sourceEnvironmentId: release.currentEnvironmentId,
    targetEnvironmentId: environment.id,
    status: initialStatus,
    requestedBy: request.userId,
    requestReason: request.reason
  });

  // Emit event
  await emitEvent("promotion.requested", promotion);

  return promotion;
}
```


### 2. Approval Phase

```typescript
async function processApproval(
  promotionId: UUID,
  approverId: UUID,
  action: "approve" | "reject",
  comment?: string
): Promise<Promotion> {
  const promotion = await getPromotion(promotionId);
  const environment = await getEnvironment(promotion.targetEnvironmentId);

  // Validate approver can approve
  await validateApproverPermission(approverId, environment.id);

  // Check separation of duties
  if (environment.requireSeparationOfDuties) {
    if (approverId === promotion.requestedBy) {
      throw new Error("Separation of duties violation: requester cannot approve");
    }
  }

  // Record approval
  await recordApproval({
    promotionId,
    approverId,
    action,
    comment
  });

  if (action === "reject") {
    return await transitionState(promotion, "rejected", {
      trigger: "approval_rejected",
      triggeredBy: approverId,
      details: { reason: comment }
    });
  }

  // Check if all required approvals received
  const approvalCount = await countApprovals(promotionId);
  if (approvalCount >= environment.requiredApprovals) {
    return await transitionState(promotion, "pending_gate", {
      trigger: "approval_granted",
      triggeredBy: approverId
    });
  }

  return promotion;
}
```


### 3. Gate Evaluation

```typescript
async function evaluateGates(promotionId: UUID): Promise<GateEvaluationResult> {
  const promotion = await getPromotion(promotionId);
  const environment = await getEnvironment(promotion.targetEnvironmentId);
  const release = await getRelease(promotion.releaseId);

  const gateResults: GateResult[] = [];

  // Security gate
  const securityResult = await evaluateSecurityGate(release, environment);
  gateResults.push(securityResult);

  // Custom policy gates
  for (const policy of environment.policies) {
    const policyResult = await evaluatePolicyGate(release, environment, policy);
    gateResults.push(policyResult);
  }

  // Aggregate results
  const allPassed = gateResults.every(g => g.passed);
  const blockingFailures = gateResults.filter(g => !g.passed && g.blocking);

  // Create decision record
  const decisionRecord = await createDecisionRecord({
    promotionId,
    gateResults,
    decision: allPassed ? "allow" : "block",
    decidedAt: new Date()
  });

  // Transition state
  if (allPassed) {
    await transitionState(promotion, "approved", {
      trigger: "gate_passed",
      triggeredBy: "system",
      details: { decisionRecordId: decisionRecord.id }
    });
  } else {
    await transitionState(promotion, "rejected", {
      trigger: "gate_failed",
      triggeredBy: "system",
      details: { blockingGates: blockingFailures }
    });
  }

  return { passed: allPassed, gateResults, decisionRecord };
}
```


### 4. Deployment Execution

```typescript
async function executeDeployment(promotionId: UUID): Promise<DeploymentJob> {
  const promotion = await getPromotion(promotionId);

  // Transition to deploying
  await transitionState(promotion, "deploying", {
    trigger: "deployment_started",
    triggeredBy: "system"
  });

  // Generate artifacts
  const artifacts = await generateArtifacts(promotion);

  // Create deployment job
  const job = await createDeploymentJob({
    promotionId,
    releaseId: promotion.releaseId,
    environmentId: promotion.targetEnvironmentId,
    artifacts
  });

  // Execute via workflow or direct
  const workflowRun = await startDeploymentWorkflow(job);

  // Update promotion with workflow reference
  await updatePromotion(promotionId, { workflowRunId: workflowRun.id });

  return job;
}
```


### 5. Completion Handling

```typescript
async function handleDeploymentCompletion(
  jobId: UUID,
  status: "succeeded" | "failed"
): Promise<Promotion> {
  const job = await getDeploymentJob(jobId);
  const promotion = await getPromotion(job.promotionId);

  if (status === "succeeded") {
    // Generate evidence packet
    const evidence = await generateEvidencePacket(promotion, job);

    // Update release environment state
    await updateReleaseEnvironmentState({
      releaseId: promotion.releaseId,
      environmentId: promotion.targetEnvironmentId,
      status: "deployed",
      promotionId: promotion.id,
      evidenceRef: evidence.id
    });

    return await transitionState(promotion, "deployed", {
      trigger: "deployment_completed",
      triggeredBy: "system",
      details: { evidencePacketId: evidence.id }
    });
  } else {
    return await transitionState(promotion, "failed", {
      trigger: "deployment_failed",
      triggeredBy: "system",
      details: { jobId, error: job.errorMessage }
    });
  }
}
```

## Decision Record

Every promotion produces a decision record:

```typescript
interface DecisionRecord {
  id: UUID;
  promotionId: UUID;
  decision: "allow" | "block";
  decidedAt: DateTime;

  // Inputs
  release: {
    id: UUID;
    name: string;
    components: Array<{ name: string; digest: string }>;
  };
  environment: {
    id: UUID;
    name: string;
  };

  // Gate results
  gateResults: Array<{
    gateName: string;
    gateType: string;
    passed: boolean;
    blocking: boolean;
    message: string;
    details: object;
    evaluatedAt: DateTime;
  }>;

  // Approvals
  approvals: Array<{
    approverId: UUID;
    approverName: string;
    action: "approved" | "rejected";
    comment?: string;
    timestamp: DateTime;
  }>;

  // Context
  requester: {
    id: UUID;
    name: string;
  };
  requestReason: string;

  // Signature
  contentHash: string;
  signature: string;
}
```
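
The `contentHash` field implies a canonical serialization, so the hash does not depend on property order; one possible sketch using Node's crypto module (the key-sorting canonicalization here is an assumption, not the documented scheme):

```typescript
import { createHash } from "node:crypto";

// Serialize with object keys sorted so logically-equal records hash equally.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => a.localeCompare(b))
      .map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// SHA-256 over the canonical form, hex-encoded.
function contentHash(record: object): string {
  return createHash("sha256").update(canonicalize(record)).digest("hex");
}
```

The signature would then be computed over `contentHash` rather than over the raw JSON, so re-serialization cannot invalidate it.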

## API Endpoints

```yaml
# Request promotion
POST /api/v1/promotions
Body: { releaseId, targetEnvironmentId, reason? }
Response: Promotion

# Approve/reject promotion
POST /api/v1/promotions/{id}/approve
POST /api/v1/promotions/{id}/reject
Body: { comment? }
Response: Promotion

# Cancel promotion
POST /api/v1/promotions/{id}/cancel
Response: Promotion

# Get decision record
GET /api/v1/promotions/{id}/decision
Response: DecisionRecord

# Preview gates (dry run)
POST /api/v1/promotions/preview-gates
Body: { releaseId, targetEnvironmentId }
Response: { wouldPass: boolean, gates: GateResult[] }
```


## References

- [Workflow Templates](templates.md)
- [Workflow Execution](execution.md)
- [Evidence Schema](../appendices/evidence-schema.md)

327
docs/modules/release-orchestrator/workflow/templates.md
Normal file
@@ -0,0 +1,327 @@

# Workflow Template Structure

## Overview

Workflow templates define the DAG (Directed Acyclic Graph) of steps to execute during deployment, promotion, and other automated processes.

## Template Structure

```typescript
interface WorkflowTemplate {
  id: UUID;
  tenantId: UUID;
  name: string;         // "standard-deploy"
  displayName: string;  // "Standard Deployment"
  description: string;
  version: number;      // Auto-incremented

  // DAG structure
  nodes: StepNode[];
  edges: StepEdge[];

  // I/O definitions
  inputs: InputDefinition[];
  outputs: OutputDefinition[];

  // Metadata
  tags: string[];
  isBuiltin: boolean;
  createdAt: DateTime;
  createdBy: UUID;
}
```

## Node Types

### Step Node

```typescript
interface StepNode {
  id: string;                          // Unique within template (e.g., "deploy-api")
  type: string;                        // Step type from registry
  name: string;                        // Display name
  config: Record<string, any>;         // Step-specific configuration
  inputs: InputBinding[];              // Input value bindings
  outputs: OutputBinding[];            // Output declarations
  position: { x: number; y: number };  // UI position

  // Execution settings
  timeout: number;                     // Seconds (default from step type)
  retryPolicy: RetryPolicy;
  onFailure: FailureAction;
  condition?: string;                  // JS expression for conditional execution

  // Documentation
  description?: string;
  documentation?: string;
}

type FailureAction = "fail" | "continue" | "rollback" | "goto:{nodeId}";

interface RetryPolicy {
  maxRetries: number;
  backoffType: "fixed" | "exponential";
  backoffSeconds: number;
  retryableErrors: string[];
}
```

### Input Bindings

```typescript
interface InputBinding {
  name: string;  // Input parameter name
  source: InputSource;
}

type InputSource =
  | { type: "literal"; value: any }
  | { type: "context"; path: string }                       // e.g., "release.name"
  | { type: "output"; nodeId: string; outputName: string }
  | { type: "secret"; secretName: string }
  | { type: "expression"; expression: string };             // JS expression
```

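Binding resolution can be sketched as a switch over the source variants; a minimal illustration covering three of the five source types (the `resolveInput` name is an assumption, and secret/expression handling is omitted for brevity):

```typescript
type ResolvableSource =
  | { type: "literal"; value: unknown }
  | { type: "context"; path: string }
  | { type: "output"; nodeId: string; outputName: string };

// Resolve one binding against the execution context and accumulated outputs.
function resolveInput(
  source: ResolvableSource,
  context: Record<string, any>,
  stepOutputs: Record<string, Record<string, unknown>>
): unknown {
  switch (source.type) {
    case "literal":
      return source.value;
    case "context":
      // A path like "release.name" walks nested context properties.
      return source.path.split(".").reduce((acc, key) => acc?.[key], context);
    case "output":
      return stepOutputs[source.nodeId]?.[source.outputName];
  }
}
```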
### Edge Types

```typescript
interface StepEdge {
  id: string;
  from: string;        // Source node ID
  to: string;          // Target node ID
  condition?: string;  // Optional condition expression
  label?: string;      // Display label for conditional edges
}
```

## Built-in Step Types

### Control Steps

| Type | Description | Config |
|------|-------------|--------|
| `approval` | Wait for human approval | `promotionId` |
| `wait` | Wait for specified duration | `durationSeconds` |
| `condition` | Branch based on condition | `expression` |
| `parallel` | Execute children in parallel | `maxConcurrency` |

### Gate Steps

| Type | Description | Config |
|------|-------------|--------|
| `security-gate` | Evaluate security policy | `blockOnCritical`, `blockOnHigh` |
| `custom-gate` | Custom OPA policy evaluation | `policyName` |
| `freeze-check` | Check freeze windows | - |
| `approval-check` | Check approval status | `requiredCount` |

### Deploy Steps

| Type | Description | Config |
|------|-------------|--------|
| `deploy-docker` | Deploy single container | `containerName`, `strategy` |
| `deploy-compose` | Deploy Docker Compose stack | `composePath`, `strategy` |
| `deploy-ecs` | Deploy to AWS ECS | `cluster`, `service` |
| `deploy-nomad` | Deploy to HashiCorp Nomad | `jobName` |

### Verification Steps

| Type | Description | Config |
|------|-------------|--------|
| `health-check` | HTTP/TCP health check | `type`, `path`, `expectedStatus` |
| `smoke-test` | Run smoke test suite | `testSuite`, `timeout` |
| `verify-digest` | Verify deployed digest | `expectedDigest` |

### Integration Steps

| Type | Description | Config |
|------|-------------|--------|
| `webhook` | Call external webhook | `url`, `method`, `headers` |
| `trigger-ci` | Trigger CI pipeline | `integrationId`, `pipelineId` |
| `wait-ci` | Wait for CI pipeline | `runId`, `timeout` |

### Notification Steps

| Type | Description | Config |
|------|-------------|--------|
| `notify` | Send notification | `channel`, `template` |
| `slack` | Send Slack message | `channel`, `message` |
| `email` | Send email | `recipients`, `template` |

### Recovery Steps

| Type | Description | Config |
|------|-------------|--------|
| `rollback` | Rollback deployment | `strategy`, `targetReleaseId` |
| `execute-script` | Run recovery script | `scriptType`, `scriptRef` |

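Step types like these are resolved through the step registry at execution time; a minimal sketch of what handler registration could look like (the `StepHandler` interface and `registerStep` function are assumptions, not the documented registry API):

```typescript
// Hypothetical handler shape: one implementation per step type.
interface StepHandler {
  type: string;  // e.g. "wait", "health-check"
  execute(config: Record<string, unknown>): Promise<Record<string, unknown>>;
}

const registry = new Map<string, StepHandler>();

// Registration rejects duplicates so two plugins cannot claim one type.
function registerStep(handler: StepHandler): void {
  if (registry.has(handler.type)) {
    throw new Error(`Step type already registered: ${handler.type}`);
  }
  registry.set(handler.type, handler);
}

// Example: the built-in "wait" control step from the table above.
registerStep({
  type: "wait",
  async execute(config) {
    const seconds = Number(config.durationSeconds ?? 0);
    await new Promise((resolve) => setTimeout(resolve, seconds * 1000));
    return { waitedSeconds: seconds };
  }
});
```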
## Template Example: Standard Deployment

```json
{
  "id": "template-standard-deploy",
  "name": "standard-deploy",
  "displayName": "Standard Deployment",
  "version": 1,
  "inputs": [
    { "name": "releaseId", "type": "uuid", "required": true },
    { "name": "environmentId", "type": "uuid", "required": true },
    { "name": "promotionId", "type": "uuid", "required": true }
  ],
  "nodes": [
    {
      "id": "approval",
      "type": "approval",
      "name": "Approval Gate",
      "config": {},
      "inputs": [
        { "name": "promotionId", "source": { "type": "context", "path": "promotionId" } }
      ],
      "position": { "x": 100, "y": 100 }
    },
    {
      "id": "security-gate",
      "type": "security-gate",
      "name": "Security Verification",
      "config": {
        "blockOnCritical": true,
        "blockOnHigh": true
      },
      "inputs": [
        { "name": "releaseId", "source": { "type": "context", "path": "releaseId" } }
      ],
      "position": { "x": 100, "y": 200 }
    },
    {
      "id": "pre-deploy-hook",
      "type": "execute-script",
      "name": "Pre-Deploy Hook",
      "config": {
        "scriptType": "csharp",
        "scriptRef": "hooks/pre-deploy.csx"
      },
      "inputs": [
        { "name": "release", "source": { "type": "context", "path": "release" } },
        { "name": "environment", "source": { "type": "context", "path": "environment" } }
      ],
      "timeout": 300,
      "onFailure": "fail",
      "position": { "x": 100, "y": 300 }
    },
    {
      "id": "deploy-targets",
      "type": "deploy-compose",
      "name": "Deploy to Targets",
      "config": {
        "strategy": "rolling",
        "parallelism": 2
      },
      "inputs": [
        { "name": "releaseId", "source": { "type": "context", "path": "releaseId" } },
        { "name": "environmentId", "source": { "type": "context", "path": "environmentId" } }
      ],
      "timeout": 600,
      "retryPolicy": {
        "maxRetries": 2,
        "backoffType": "exponential",
        "backoffSeconds": 30
      },
      "onFailure": "rollback",
      "position": { "x": 100, "y": 400 }
    },
    {
      "id": "health-check",
      "type": "health-check",
      "name": "Health Verification",
      "config": {
        "type": "http",
        "path": "/health",
        "expectedStatus": 200,
        "timeout": 30,
        "retries": 5
      },
      "inputs": [
        { "name": "targets", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "deployedTargets" } }
      ],
      "onFailure": "rollback",
      "position": { "x": 100, "y": 500 }
    },
    {
      "id": "post-deploy-hook",
      "type": "execute-script",
      "name": "Post-Deploy Hook",
      "config": {
        "scriptType": "bash",
        "inline": "echo 'Deployment complete'"
      },
      "timeout": 300,
      "onFailure": "continue",
      "position": { "x": 100, "y": 600 }
    },
    {
      "id": "notify-success",
      "type": "notify",
      "name": "Success Notification",
      "config": {
        "channel": "slack",
        "template": "deployment-success"
      },
      "inputs": [
        { "name": "release", "source": { "type": "context", "path": "release" } },
        { "name": "environment", "source": { "type": "context", "path": "environment" } }
      ],
      "onFailure": "continue",
      "position": { "x": 100, "y": 700 }
    },
    {
      "id": "rollback-handler",
      "type": "rollback",
      "name": "Rollback Handler",
      "config": {
        "strategy": "to-previous"
      },
      "inputs": [
        { "name": "deploymentJobId", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "jobId" } }
      ],
      "position": { "x": 300, "y": 450 }
    },
    {
      "id": "notify-failure",
      "type": "notify",
      "name": "Failure Notification",
      "config": {
        "channel": "slack",
        "template": "deployment-failure"
      },
      "onFailure": "continue",
      "position": { "x": 300, "y": 550 }
    }
  ],
  "edges": [
    { "id": "e1", "from": "approval", "to": "security-gate" },
    { "id": "e2", "from": "security-gate", "to": "pre-deploy-hook" },
    { "id": "e3", "from": "pre-deploy-hook", "to": "deploy-targets" },
    { "id": "e4", "from": "deploy-targets", "to": "health-check" },
    { "id": "e5", "from": "health-check", "to": "post-deploy-hook" },
    { "id": "e6", "from": "post-deploy-hook", "to": "notify-success" },
    { "id": "e7", "from": "deploy-targets", "to": "rollback-handler", "condition": "status === 'failed'" },
    { "id": "e8", "from": "health-check", "to": "rollback-handler", "condition": "status === 'failed'" },
    { "id": "e9", "from": "rollback-handler", "to": "notify-failure" }
  ]
}
```

## Template Validation

Templates are validated for:

1. **Structural validity**: Valid JSON/YAML, required fields present
2. **DAG validity**: No cycles, all edges reference valid nodes
3. **Type validity**: All step types exist in registry
4. **Schema validity**: Step configs match type schemas
5. **Input validity**: All required inputs are bindable

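Check 2 (acyclicity) can be implemented with a Kahn-style topological sort over the edge list; a minimal sketch, assuming the node/edge shapes defined earlier (the `isAcyclic` helper is illustrative):

```typescript
interface Edge { from: string; to: string; }

// Returns true when the edge set over the given node ids contains no cycle.
// Kahn's algorithm: repeatedly remove nodes with in-degree zero; if any node
// is left unvisited, it must sit on a cycle.
function isAcyclic(nodeIds: string[], edges: Edge[]): boolean {
  const inDegree = new Map<string, number>(nodeIds.map((id) => [id, 0]));
  for (const e of edges) {
    inDegree.set(e.to, (inDegree.get(e.to) ?? 0) + 1);
  }

  const queue = nodeIds.filter((id) => inDegree.get(id) === 0);
  let visited = 0;
  while (queue.length > 0) {
    const id = queue.pop()!;
    visited++;
    for (const e of edges) {
      if (e.from !== id) continue;
      const d = (inDegree.get(e.to) ?? 0) - 1;
      inDegree.set(e.to, d);
      if (d === 0) queue.push(e.to);
    }
  }
  return visited === nodeIds.length;  // leftover nodes imply a cycle
}
```

The same pass yields a topological order, which the executor can reuse as the default scheduling order for sequential runs.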
## References

- [Workflow Engine](../modules/workflow-engine.md)
- [Execution State Machine](execution.md)
- [Step Registry](../modules/workflow-engine.md#module-step-registry)