This commit is contained in:
master
2026-01-11 11:19:42 +02:00
150 changed files with 76353 additions and 721 deletions

# Release Orchestrator
> Central release control plane for non-Kubernetes container estates.
**Status:** Planned (not yet implemented)
**Source:** [Full Architecture Specification](../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
## Purpose
The Release Orchestrator extends Stella Ops from a vulnerability scanning platform into **Stella Ops Suite** — a unified release control plane for non-Kubernetes container environments. It integrates:
- **Existing capabilities**: SBOM generation, reachability-aware vulnerability analysis, VEX support, policy engine, evidence locker, deterministic replay
- **New capabilities**: Environment management, release orchestration, promotion workflows, deployment execution, progressive delivery, audit-grade release governance
## Scope
| In Scope | Out of Scope |
|----------|--------------|
| Non-K8s container deployments (Docker, Compose, ECS, Nomad) | Kubernetes deployments (use ArgoCD, Flux) |
| Release identity via OCI digests | Tag-based release identity |
| Plugin-extensible integrations | Hard-coded vendor integrations |
| SSH/WinRM + agent-based deployment | Cloud-native serverless deployments |
| L4/L7 traffic management via router plugins | Built-in service mesh |
## Documentation Structure
### Design & Principles
- [Design Principles](design/principles.md) — Core principles and invariants
- [Key Decisions](design/decisions.md) — Architectural decision record
### Implementation
- [Implementation Guide](implementation-guide.md) — .NET 10 patterns and best practices
- [Test Structure](test-structure.md) — Test organization and guidelines
### Module Architecture
- [Module Overview](modules/overview.md) — All modules and themes
- [Integration Hub (INTHUB)](modules/integration-hub.md) — External integrations
- [Environment Manager (ENVMGR)](modules/environment-manager.md) — Environments and targets
- [Release Manager (RELMAN)](modules/release-manager.md) — Release bundles and versions
- [Workflow Engine (WORKFL)](modules/workflow-engine.md) — DAG execution
- [Promotion Manager (PROMOT)](modules/promotion-manager.md) — Approvals and gates
- [Deploy Orchestrator (DEPLOY)](modules/deploy-orchestrator.md) — Deployment execution
- [Agents (AGENTS)](modules/agents.md) — Deployment agents
- [Progressive Delivery (PROGDL)](modules/progressive-delivery.md) — A/B and canary
- [Release Evidence (RELEVI)](modules/evidence.md) — Evidence packets
- [Plugin System (PLUGIN)](modules/plugin-system.md) — Plugin infrastructure
### Data Model
- [Database Schema](data-model/schema.md) — PostgreSQL schema specification
- [Entity Definitions](data-model/entities.md) — Entity descriptions
### API Specification
- [API Overview](api/overview.md) — API design principles
- [Environment APIs](api/environments.md) — Environment endpoints
- [Release APIs](api/releases.md) — Release endpoints
- [Promotion APIs](api/promotions.md) — Promotion endpoints
- [Workflow APIs](api/workflows.md) — Workflow endpoints
- [Agent APIs](api/agents.md) — Agent endpoints
- [WebSocket APIs](api/websockets.md) — Real-time endpoints
### Workflow Engine
- [Template Structure](workflow/templates.md) — Workflow template specification
- [Execution State Machine](workflow/execution.md) — Workflow state machine
- [Promotion State Machine](workflow/promotion.md) — Promotion state machine
### Security
- [Security Overview](security/overview.md) — Security principles
- [Authentication & Authorization](security/auth.md) — AuthN/AuthZ
- [Agent Security](security/agent-security.md) — Agent security model
- [Threat Model](security/threat-model.md) — Threats and mitigations
- [Audit Trail](security/audit-trail.md) — Audit logging
### Integrations
- [Integration Overview](integrations/overview.md) — Integration types
- [Connector Interface](integrations/connectors.md) — Connector specification
- [Webhook Architecture](integrations/webhooks.md) — Webhook handling
- [CI/CD Patterns](integrations/ci-cd.md) — CI/CD integration patterns
### Deployment
- [Deployment Overview](deployment/overview.md) — Architecture overview
- [Deployment Strategies](deployment/strategies.md) — Deployment strategies
- [Agent-Based Deployment](deployment/agent-based.md) — Agent deployment
- [Agentless Deployment](deployment/agentless.md) — SSH/WinRM deployment
- [Artifact Generation](deployment/artifacts.md) — Generated artifacts
### Progressive Delivery
- [Progressive Overview](progressive-delivery/overview.md) — Progressive delivery architecture
- [A/B Releases](progressive-delivery/ab-releases.md) — A/B release models
- [Canary Controller](progressive-delivery/canary.md) — Canary implementation
- [Router Plugins](progressive-delivery/routers.md) — Traffic routing plugins
### UI/UX
- [Dashboard Specification](ui/dashboard.md) — Dashboard screens
- [Workflow Editor](ui/workflow-editor.md) — Workflow editor
- [Screen Reference](ui/screens.md) — Key UI screens
### Operations
- [Metrics](operations/metrics.md) — Metrics specification
- [Logging](operations/logging.md) — Logging patterns
- [Tracing](operations/tracing.md) — Distributed tracing
- [Alerting](operations/alerting.md) — Alert rules
### Roadmap
- [Roadmap](roadmap.md) — Implementation phases
- [Resource Requirements](roadmap.md#resource-requirements) — Sizing
### Appendices
- [Glossary](appendices/glossary.md) — Term definitions
- [Configuration Reference](appendices/config.md) — Configuration options
- [Error Codes](appendices/errors.md) — API error codes
- [Evidence Schema](appendices/evidence-schema.md) — Evidence packet format
## Quick Reference
### Key Principles
1. **Digest-first release identity** — Releases are immutable OCI digests, not tags
2. **Evidence for every decision** — Every promotion/deployment produces sealed evidence
3. **Pluggable everything, stable core** — Integrations are plugins; core is stable
4. **No feature gating** — All plans include all features
5. **Offline-first operation** — Core works in air-gapped environments
6. **Immutable generated artifacts** — Every deployment generates stored artifacts
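The digest-first principle can be made concrete with a small check — a hypothetical helper (not part of any specified module) that accepts only digest-pinned image references:

```typescript
// Hypothetical helper illustrating "digest-first release identity":
// a reference is release-grade only when pinned by an OCI sha256 digest.
const DIGEST_RE = /@sha256:[a-f0-9]{64}$/;

function isDigestPinned(imageRef: string): boolean {
  return DIGEST_RE.test(imageRef);
}
```

Tags such as `v2.3.1` remain useful as human-friendly aliases, but only the digest form would be accepted as release identity.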
### Platform Themes
| Theme | Purpose |
|-------|---------|
| **INTHUB** | Integration hub — external system connections |
| **ENVMGR** | Environment management — environments, targets, agents |
| **RELMAN** | Release management — components, versions, releases |
| **WORKFL** | Workflow engine — DAG execution, steps |
| **PROMOT** | Promotion — approvals, gates, decisions |
| **DEPLOY** | Deployment — execution, artifacts, rollback |
| **AGENTS** | Agents — Docker, Compose, ECS, Nomad |
| **PROGDL** | Progressive delivery — A/B, canary |
| **RELEVI** | Evidence — packets, stickers, audit |
| **PLUGIN** | Plugins — registry, loader, SDK |

# Agent APIs
> API endpoints for agent registration, lifecycle management, and task coordination.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 6.3.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Agents Module](../modules/agents.md), [Agent Security](../security/agent-security.md)
## Overview
The Agent API provides endpoints for registering deployment agents, managing their lifecycle, and coordinating task execution. Agents use mTLS for secure communication after initial registration.
---
## Registration Endpoints
### Register Agent
**Endpoint:** `POST /api/v1/agents/register`
Registers a new agent with the orchestrator. Requires a one-time registration token.
**Headers:**
```
X-Agent-Token: {registration-token}
```
**Request:**
```json
{
"name": "agent-prod-01",
"version": "1.0.0",
"capabilities": ["docker", "compose"],
"labels": {
"datacenter": "us-east-1",
"role": "deployment"
}
}
```
**Response:** `201 Created`
```json
{
"agentId": "uuid",
"token": "jwt-token-for-subsequent-requests",
"config": {
"heartbeatInterval": 30,
"taskPollInterval": 5,
"logLevel": "info"
},
"certificate": {
"cert": "-----BEGIN CERTIFICATE-----...",
"key": "-----BEGIN PRIVATE KEY-----...",
"ca": "-----BEGIN CERTIFICATE-----...",
"expiresAt": "2026-01-11T14:23:45Z"
}
}
```
**Notes:**
- The registration token is single-use and expires after 24 hours
- After registration, the agent must use mTLS for all subsequent requests
- The certificate is short-lived (24h) and must be renewed via heartbeat
---
## Lifecycle Endpoints
### List Agents
**Endpoint:** `GET /api/v1/agents`
**Query Parameters:**
- `status` (string): Filter by status (`online`, `offline`, `degraded`)
- `capability` (string): Filter by capability (`docker`, `compose`, `ssh`, `winrm`, `ecs`, `nomad`)
**Response:** `200 OK`
```json
[
{
"id": "uuid",
"name": "agent-prod-01",
"version": "1.0.0",
"status": "online",
"capabilities": ["docker", "compose"],
"lastHeartbeat": "2026-01-10T14:23:45Z",
"resourceUsage": {
"cpu": 15.5,
"memory": 45.2
}
}
]
```
### Get Agent
**Endpoint:** `GET /api/v1/agents/{id}`
**Response:** `200 OK` - Full agent details including assigned targets
### Update Agent
**Endpoint:** `PUT /api/v1/agents/{id}`
**Request:**
```json
{
"labels": {
"datacenter": "us-west-2"
},
"capabilities": ["docker", "compose", "ssh"]
}
```
**Response:** `200 OK` - Updated agent
### Delete Agent
**Endpoint:** `DELETE /api/v1/agents/{id}`
Revokes agent credentials and removes registration.
**Response:** `200 OK`
```json
{ "deleted": true }
```
---
## Heartbeat Endpoints
### Send Heartbeat
**Endpoint:** `POST /api/v1/agents/{id}/heartbeat`
Agents must send heartbeats at the configured interval to maintain online status and receive pending tasks.
**Request:**
```json
{
"status": "healthy",
"resourceUsage": {
"cpu": 15.5,
"memory": 45.2,
"disk": 60.0
},
"capabilities": ["docker", "compose"],
"runningTasks": 2
}
```
**Response:** `200 OK`
```json
{
"tasks": [
{
"taskId": "uuid",
"taskType": "docker.pull",
"payload": {
"image": "myapp",
"tag": "v2.3.1",
"digest": "sha256:abc123..."
},
"credentials": {
"registry.username": "user",
"registry.password": "token"
},
"timeout": 300
}
],
"certificateRenewal": {
"cert": "-----BEGIN CERTIFICATE-----...",
"expiresAt": "2026-01-11T14:23:45Z"
}
}
```
**Notes:**
- Certificate renewal is included when the current certificate is within 1 hour of expiration
- The `tasks` array contains pending work for the agent
- Missing heartbeats for 3 consecutive intervals mark the agent as `offline`
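The liveness and renewal rules above can be sketched as pure functions (names are hypothetical; the 3-interval and 1-hour thresholds come from the notes above):

```typescript
// Sketch of the server-side heartbeat rules: an agent is `offline`
// after 3 missed heartbeat intervals, and a certificate renewal is
// bundled once expiry is within 1 hour.
type AgentStatus = "online" | "offline";

function agentStatus(lastHeartbeatMs: number, nowMs: number, intervalSec: number): AgentStatus {
  return nowMs - lastHeartbeatMs > 3 * intervalSec * 1000 ? "offline" : "online";
}

function needsCertRenewal(certExpiresAtMs: number, nowMs: number): boolean {
  return certExpiresAtMs - nowMs <= 60 * 60 * 1000; // within 1 hour of expiry
}
```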
---
## Task Endpoints
### Complete Task
**Endpoint:** `POST /api/v1/agents/{id}/tasks/{taskId}/complete`
Reports task completion status back to the orchestrator.
**Request:**
```json
{
"success": true,
"result": {
"imageId": "sha256:abc123...",
"containerId": "container-uuid"
},
"logs": [
{ "timestamp": "2026-01-10T14:23:45Z", "level": "info", "message": "Pulling image..." },
{ "timestamp": "2026-01-10T14:23:50Z", "level": "info", "message": "Image pulled successfully" }
]
}
```
**Response:** `200 OK`
```json
{ "acknowledged": true }
```
### Get Pending Tasks
**Endpoint:** `GET /api/v1/agents/{id}/tasks`
An alternative to the heartbeat endpoint for polling pending tasks.
**Response:** `200 OK`
```json
{
"tasks": [
{
"taskId": "uuid",
"taskType": "docker.run",
"priority": 10,
"createdAt": "2026-01-10T14:20:00Z"
}
]
}
```
---
## WebSocket Endpoints
### Task Stream
**Endpoint:** `WS /api/v1/agents/{id}/task-stream`
Real-time task assignment stream for agents.
**Messages (Server to Agent):**
```json
{ "type": "task_assigned", "task": { "taskId": "uuid", "taskType": "docker.pull", ... } }
{ "type": "task_cancelled", "taskId": "uuid" }
```
**Messages (Agent to Server):**
```json
{ "type": "task_progress", "taskId": "uuid", "progress": 50, "message": "Pulling layer 3/5" }
{ "type": "task_log", "taskId": "uuid", "level": "info", "message": "..." }
```
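A minimal sketch of how an agent might route the server-to-agent messages by their `type` field (handler names are hypothetical):

```typescript
// Hypothetical agent-side dispatcher for the server->agent messages above.
type ServerMessage =
  | { type: "task_assigned"; task: { taskId: string; taskType: string } }
  | { type: "task_cancelled"; taskId: string };

function dispatch(raw: string, handlers: {
  onAssigned: (taskId: string) => void;
  onCancelled: (taskId: string) => void;
}): void {
  const msg = JSON.parse(raw) as ServerMessage;
  switch (msg.type) {
    case "task_assigned":
      handlers.onAssigned(msg.task.taskId);
      break;
    case "task_cancelled":
      handlers.onCancelled(msg.taskId);
      break;
  }
}
```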
---
## Error Responses
| Status Code | Description |
|-------------|-------------|
| `401` | Invalid or expired registration token |
| `403` | Agent not authorized for this operation |
| `404` | Agent not found |
| `409` | Agent name already registered |
| `503` | Agent offline or unreachable |
---
## See Also
- [Environments API](environments.md)
- [Agents Module](../modules/agents.md)
- [Agent Security](../security/agent-security.md)
- [WebSocket APIs](websockets.md)

# Environment Management APIs
> API endpoints for managing environments, targets, agents, freeze windows, and inventory.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 6.3.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Environment Manager](../modules/environment-manager.md), [Agents](../modules/agents.md)
## Overview
The Environment Management API provides CRUD operations for environments, target groups, deployment targets, agents, freeze windows, and inventory synchronization. All endpoints require authentication and respect tenant isolation via Row-Level Security.
---
## Environment Endpoints
### Create Environment
**Endpoint:** `POST /api/v1/environments`
**Request:**
```json
{
"name": "production",
"displayName": "Production",
"orderIndex": 3,
"config": {
"deploymentTimeout": 600,
"healthCheckInterval": 30
},
"requiredApprovals": 2,
"requireSod": true,
"promotionPolicy": "default"
}
```
**Response:** `201 Created`
```json
{
"id": "uuid",
"name": "production",
"displayName": "Production",
"orderIndex": 3,
"isProduction": true,
"requiredApprovals": 2,
"requireSeparationOfDuties": true,
"createdAt": "2026-01-10T14:23:45Z"
}
```
### List Environments
**Endpoint:** `GET /api/v1/environments`
**Query Parameters:**
- `includeState` (boolean): Include current release state
**Response:** `200 OK`
```json
[
{
"id": "uuid",
"name": "development",
"displayName": "Development",
"orderIndex": 1,
"currentRelease": {
"id": "release-uuid",
"name": "myapp-v2.3.1",
"deployedAt": "2026-01-09T10:00:00Z"
}
}
]
```
### Get Environment
**Endpoint:** `GET /api/v1/environments/{id}`
**Response:** `200 OK` - Full environment details
### Update Environment
**Endpoint:** `PUT /api/v1/environments/{id}`
**Request:** Partial environment object
**Response:** `200 OK` - Updated environment
### Delete Environment
**Endpoint:** `DELETE /api/v1/environments/{id}`
**Response:** `200 OK`
```json
{ "deleted": true }
```
---
## Freeze Window Endpoints
### Create Freeze Window
**Endpoint:** `POST /api/v1/environments/{envId}/freeze-windows`
**Request:**
```json
{
"start": "2026-01-15T00:00:00Z",
"end": "2026-01-20T00:00:00Z",
"reason": "Holiday freeze",
"exceptions": ["user-uuid-1", "user-uuid-2"]
}
```
**Response:** `201 Created`
```json
{
"id": "uuid",
"environmentId": "env-uuid",
"start": "2026-01-15T00:00:00Z",
"end": "2026-01-20T00:00:00Z",
"reason": "Holiday freeze",
"createdBy": "user-uuid"
}
```
### List Freeze Windows
**Endpoint:** `GET /api/v1/environments/{envId}/freeze-windows`
**Query Parameters:**
- `active` (boolean): Filter to active freeze windows only
**Response:** `200 OK` - Array of freeze windows
### Delete Freeze Window
**Endpoint:** `DELETE /api/v1/environments/{envId}/freeze-windows/{windowId}`
**Response:** `200 OK`
```json
{ "deleted": true }
```
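The intended freeze-window semantics can be sketched as a predicate — here assuming users listed in `exceptions` may still proceed during the window, and that the window includes `start` and excludes `end`:

```typescript
// Sketch of the freeze-window check (assumptions: `exceptions` users
// bypass the freeze; interval is [start, end) in UTC).
interface FreezeWindow {
  start: string;       // ISO 8601
  end: string;         // ISO 8601
  exceptions: string[]; // user IDs allowed to bypass
}

function isBlockedByFreeze(w: FreezeWindow, atIso: string, userId: string): boolean {
  const at = Date.parse(atIso);
  const inWindow = at >= Date.parse(w.start) && at < Date.parse(w.end);
  return inWindow && !w.exceptions.includes(userId);
}
```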
---
## Target Group Endpoints
### Create Target Group
**Endpoint:** `POST /api/v1/environments/{envId}/target-groups`
### List Target Groups
**Endpoint:** `GET /api/v1/environments/{envId}/target-groups`
### Get Target Group
**Endpoint:** `GET /api/v1/target-groups/{id}`
### Update Target Group
**Endpoint:** `PUT /api/v1/target-groups/{id}`
### Delete Target Group
**Endpoint:** `DELETE /api/v1/target-groups/{id}`
---
## Target Endpoints
### Create Target
**Endpoint:** `POST /api/v1/targets`
**Request:**
```json
{
"environmentId": "env-uuid",
"targetGroupId": "group-uuid",
"name": "prod-web-01",
"targetType": "docker_host",
"connection": {
"host": "192.168.1.100",
"port": 2376,
"tlsEnabled": true
},
"labels": {
"role": "web",
"datacenter": "us-east-1"
},
"deploymentDirectory": "/opt/deployments"
}
```
**Response:** `201 Created`
```json
{
"id": "uuid",
"name": "prod-web-01",
"targetType": "docker_host",
"healthStatus": "unknown",
"createdAt": "2026-01-10T14:23:45Z"
}
```
### List Targets
**Endpoint:** `GET /api/v1/targets`
**Query Parameters:**
- `environmentId` (UUID): Filter by environment
- `targetType` (string): Filter by type (`docker_host`, `compose_host`, `ecs_service`, `nomad_job`)
- `labels` (JSON): Filter by labels
- `healthStatus` (string): Filter by health status
**Response:** `200 OK` - Array of targets
### Get Target
**Endpoint:** `GET /api/v1/targets/{id}`
### Update Target
**Endpoint:** `PUT /api/v1/targets/{id}`
### Delete Target
**Endpoint:** `DELETE /api/v1/targets/{id}`
### Trigger Health Check
**Endpoint:** `POST /api/v1/targets/{id}/health-check`
**Response:** `200 OK`
```json
{
"status": "healthy",
"message": "Docker daemon responding",
"checkedAt": "2026-01-10T14:23:45Z"
}
```
### Get Version Sticker
**Endpoint:** `GET /api/v1/targets/{id}/sticker`
**Response:** `200 OK`
```json
{
"releaseId": "uuid",
"releaseName": "myapp-v2.3.1",
"components": [
{
"componentId": "uuid",
"componentName": "api",
"digest": "sha256:abc123..."
}
],
"deployedAt": "2026-01-09T10:00:00Z",
"deployedBy": "user-uuid"
}
```
### Check Drift
**Endpoint:** `GET /api/v1/targets/{id}/drift`
**Response:** `200 OK`
```json
{
"hasDrift": true,
"expected": { "releaseId": "uuid", "digest": "sha256:abc..." },
"actual": { "digest": "sha256:def..." },
"differences": [
{ "component": "api", "expected": "sha256:abc...", "actual": "sha256:def..." }
]
}
```
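The drift response above can be derived from a per-component digest comparison; a sketch (hypothetical helper, assuming one digest per component name):

```typescript
// Sketch: compare expected vs actual digests per component and report
// only the components whose digests differ (or are missing).
interface Difference { component: string; expected: string; actual: string; }

function computeDrift(
  expected: Record<string, string>, // componentName -> expected digest
  actual: Record<string, string>,   // componentName -> observed digest
): Difference[] {
  return Object.entries(expected)
    .filter(([name, digest]) => actual[name] !== digest)
    .map(([name, digest]) => ({
      component: name,
      expected: digest,
      actual: actual[name] ?? "missing",
    }));
}
```

`hasDrift` in the response is then simply `differences.length > 0`.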
---
## See Also
- [Agents API](agents.md)
- [Environment Manager Module](../modules/environment-manager.md)
- [Agent Security](../security/agent-security.md)

# API Overview
**Version**: v1
**Base Path**: `/api/v1`
## Design Principles
| Principle | Implementation |
|-----------|----------------|
| **RESTful** | Resource-oriented URLs, standard HTTP methods |
| **Versioned** | `/api/v1/...` prefix; breaking changes require version bump |
| **Consistent** | Standard response envelope, error format, pagination |
| **Authenticated** | OAuth 2.0 Bearer tokens via Authority module |
| **Tenant-scoped** | Tenant ID from token; all operations scoped to tenant |
| **Audited** | All mutating operations logged with user/timestamp |
## Authentication
All API requests require a valid JWT Bearer token:
```http
Authorization: Bearer <token>
```
Tokens are issued by the Authority module and contain:
- `user_id`: User identifier
- `tenant_id`: Tenant scope
- `roles`: User roles
- `permissions`: Specific permissions
## Standard Response Envelope
### Success Response
```typescript
interface ApiResponse<T> {
success: true;
data: T;
meta?: {
pagination?: PaginationMeta;
requestId: string;
timestamp: string;
};
}
```
### Error Response
```typescript
interface ApiErrorResponse {
success: false;
error: {
code: string; // e.g., "PROMOTION_BLOCKED"
message: string; // Human-readable message
details?: object; // Additional context
validationErrors?: ValidationError[];
};
meta: {
requestId: string;
timestamp: string;
};
}
interface ValidationError {
field: string;
message: string;
code: string;
}
```
### Pagination
```typescript
interface PaginationMeta {
page: number;
pageSize: number;
totalItems: number;
totalPages: number;
hasNext: boolean;
hasPrevious: boolean;
}
```
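The derived fields of `PaginationMeta` follow directly from `totalItems`, `page`, and `pageSize`; a sketch:

```typescript
// Sketch: deriving the PaginationMeta fields above from a total count.
interface PaginationMeta {
  page: number;
  pageSize: number;
  totalItems: number;
  totalPages: number;
  hasNext: boolean;
  hasPrevious: boolean;
}

function paginate(totalItems: number, page: number, pageSize: number): PaginationMeta {
  const totalPages = Math.max(1, Math.ceil(totalItems / pageSize));
  return {
    page,
    pageSize,
    totalItems,
    totalPages,
    hasNext: page < totalPages,
    hasPrevious: page > 1,
  };
}
```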
## HTTP Status Codes
| Code | Description |
|------|-------------|
| `200` | Success |
| `201` | Created |
| `204` | No Content |
| `400` | Bad Request - validation error |
| `401` | Unauthorized - invalid/missing token |
| `403` | Forbidden - insufficient permissions |
| `404` | Not Found |
| `409` | Conflict - resource state conflict |
| `422` | Unprocessable Entity - business rule violation |
| `429` | Too Many Requests - rate limited |
| `500` | Internal Server Error |
## Common Query Parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| `page` | integer | Page number (1-indexed) |
| `pageSize` | integer | Items per page (max 100) |
| `sort` | string | Sort field (prefix `-` for descending) |
| `filter` | string | JSON filter expression |
## API Modules
### Integration Hub (INTHUB)
```
GET /api/v1/integration-types
GET /api/v1/integration-types/{typeId}
POST /api/v1/integrations
GET /api/v1/integrations
GET /api/v1/integrations/{id}
PUT /api/v1/integrations/{id}
DELETE /api/v1/integrations/{id}
POST /api/v1/integrations/{id}/test
POST /api/v1/integrations/{id}/discover
GET /api/v1/integrations/{id}/health
```
### Environment & Inventory (ENVMGR)
```
POST /api/v1/environments
GET /api/v1/environments
GET /api/v1/environments/{id}
PUT /api/v1/environments/{id}
DELETE /api/v1/environments/{id}
POST /api/v1/environments/{envId}/freeze-windows
GET /api/v1/environments/{envId}/freeze-windows
DELETE /api/v1/environments/{envId}/freeze-windows/{windowId}
POST /api/v1/targets
GET /api/v1/targets
GET /api/v1/targets/{id}
PUT /api/v1/targets/{id}
DELETE /api/v1/targets/{id}
POST /api/v1/targets/{id}/health-check
GET /api/v1/targets/{id}/sticker
GET /api/v1/targets/{id}/drift
POST /api/v1/agents/register
GET /api/v1/agents
GET /api/v1/agents/{id}
PUT /api/v1/agents/{id}
DELETE /api/v1/agents/{id}
POST /api/v1/agents/{id}/heartbeat
```
### Release Management (RELMAN)
```
POST /api/v1/components
GET /api/v1/components
GET /api/v1/components/{id}
PUT /api/v1/components/{id}
DELETE /api/v1/components/{id}
POST /api/v1/components/{id}/sync-versions
GET /api/v1/components/{id}/versions
POST /api/v1/releases
GET /api/v1/releases
GET /api/v1/releases/{id}
PUT /api/v1/releases/{id}
DELETE /api/v1/releases/{id}
GET /api/v1/releases/{id}/state
POST /api/v1/releases/{id}/deprecate
GET /api/v1/releases/{id}/compare/{otherId}
POST /api/v1/releases/from-latest
```
### Workflow Engine (WORKFL)
```
POST /api/v1/workflow-templates
GET /api/v1/workflow-templates
GET /api/v1/workflow-templates/{id}
PUT /api/v1/workflow-templates/{id}
DELETE /api/v1/workflow-templates/{id}
POST /api/v1/workflow-templates/{id}/validate
GET /api/v1/step-types
GET /api/v1/step-types/{type}
POST /api/v1/workflow-runs
GET /api/v1/workflow-runs
GET /api/v1/workflow-runs/{id}
POST /api/v1/workflow-runs/{id}/pause
POST /api/v1/workflow-runs/{id}/resume
POST /api/v1/workflow-runs/{id}/cancel
GET /api/v1/workflow-runs/{id}/steps
GET /api/v1/workflow-runs/{id}/steps/{nodeId}
GET /api/v1/workflow-runs/{id}/steps/{nodeId}/logs
GET /api/v1/workflow-runs/{id}/steps/{nodeId}/artifacts
```
### Promotion & Approval (PROMOT)
```
POST /api/v1/promotions
GET /api/v1/promotions
GET /api/v1/promotions/{id}
POST /api/v1/promotions/{id}/approve
POST /api/v1/promotions/{id}/reject
POST /api/v1/promotions/{id}/cancel
GET /api/v1/promotions/{id}/decision
GET /api/v1/promotions/{id}/approvals
GET /api/v1/promotions/{id}/evidence
POST /api/v1/promotions/preview-gates
POST /api/v1/approval-policies
GET /api/v1/approval-policies
GET /api/v1/my/pending-approvals
```
### Deployment (DEPLOY)
```
GET /api/v1/deployment-jobs
GET /api/v1/deployment-jobs/{id}
GET /api/v1/deployment-jobs/{id}/tasks
GET /api/v1/deployment-jobs/{id}/tasks/{taskId}
GET /api/v1/deployment-jobs/{id}/tasks/{taskId}/logs
GET /api/v1/deployment-jobs/{id}/artifacts
GET /api/v1/deployment-jobs/{id}/artifacts/{artifactId}
POST /api/v1/rollbacks
GET /api/v1/rollbacks
```
### Progressive Delivery (PROGDL)
```
POST /api/v1/ab-releases
GET /api/v1/ab-releases
GET /api/v1/ab-releases/{id}
POST /api/v1/ab-releases/{id}/start
POST /api/v1/ab-releases/{id}/advance
POST /api/v1/ab-releases/{id}/promote
POST /api/v1/ab-releases/{id}/rollback
GET /api/v1/ab-releases/{id}/traffic
GET /api/v1/ab-releases/{id}/health
GET /api/v1/rollout-strategies
```
### Release Evidence (RELEVI)
```
GET /api/v1/evidence-packets
GET /api/v1/evidence-packets/{id}
GET /api/v1/evidence-packets/{id}/download
POST /api/v1/audit-reports
GET /api/v1/audit-reports/{id}
GET /api/v1/audit-reports/{id}/download
GET /api/v1/version-stickers
GET /api/v1/version-stickers/{id}
```
### Plugin Infrastructure (PLUGIN)
```
GET /api/v1/plugins
GET /api/v1/plugins/{id}
POST /api/v1/plugins/{id}/enable
POST /api/v1/plugins/{id}/disable
GET /api/v1/plugins/{id}/health
POST /api/v1/plugin-instances
GET /api/v1/plugin-instances
PUT /api/v1/plugin-instances/{id}
DELETE /api/v1/plugin-instances/{id}
```
## WebSocket Endpoints
```
WS /api/v1/workflow-runs/{id}/stream
WS /api/v1/deployment-jobs/{id}/stream
WS /api/v1/agents/{id}/task-stream
WS /api/v1/dashboard/stream
```
## Rate Limits
| Tier | Requests/minute | Burst |
|------|-----------------|-------|
| Standard | 1000 | 100 |
| Premium | 5000 | 500 |
Rate limit headers:
- `X-RateLimit-Limit`: Request limit
- `X-RateLimit-Remaining`: Remaining requests
- `X-RateLimit-Reset`: Reset timestamp
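A client might use these headers for backoff as follows — a sketch assuming `X-RateLimit-Reset` is a Unix timestamp in seconds (the unit is not fixed above):

```typescript
// Sketch of client-side backoff from the rate-limit headers
// (assumption: X-RateLimit-Reset is a Unix timestamp in seconds).
function msUntilRetry(headers: Record<string, string>, nowMs: number): number {
  const remaining = Number(headers["X-RateLimit-Remaining"]);
  if (remaining > 0) return 0; // budget left, no need to wait
  const resetMs = Number(headers["X-RateLimit-Reset"]) * 1000;
  return Math.max(0, resetMs - nowMs);
}
```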
## References
- [Environments API](environments.md)
- [Releases API](releases.md)
- [Promotions API](promotions.md)
- [Workflows API](workflows.md)
- [Agents API](agents.md)
- [WebSocket API](websockets.md)

# Promotion & Approval APIs
> API endpoints for managing promotions, approvals, and gate evaluations.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 6.3.5](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Promotion Manager Module](../modules/promotion-manager.md), [Workflow Promotion](../workflow/promotion.md)
## Overview
The Promotion API provides endpoints for requesting release promotions between environments, managing approvals, and evaluating promotion gates. Promotions enforce separation of duties (SoD) and require the configured number of approvals before deployment proceeds.
---
## Promotion Endpoints
### Create Promotion Request
**Endpoint:** `POST /api/v1/promotions`
Initiates a promotion request for a release to a target environment.
**Request:**
```json
{
"releaseId": "uuid",
"targetEnvironmentId": "uuid",
"reason": "Deploying v2.3.1 with critical bug fix"
}
```
**Response:** `201 Created`
```json
{
"id": "uuid",
"releaseId": "uuid",
"releaseName": "myapp-v2.3.1",
"sourceEnvironmentId": "uuid",
"sourceEnvironmentName": "Staging",
"targetEnvironmentId": "uuid",
"targetEnvironmentName": "Production",
"status": "pending",
"requestedBy": "user-uuid",
"requestedAt": "2026-01-10T14:23:45Z",
"reason": "Deploying v2.3.1 with critical bug fix"
}
```
**Status Flow:**
```
pending -> awaiting_approval -> approved -> deploying -> deployed
awaiting_approval -> rejected
pending / awaiting_approval -> cancelled
deploying -> failed
deployed -> rolled_back
```
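The flow above can be encoded as a transition table; the branch origins here are assumptions (reject and cancel act before approval, failure occurs during deployment, rollback after deployment):

```typescript
// Sketch of the promotion status flow as a transition table.
// Branch origins are assumptions, not a normative specification.
const TRANSITIONS: Record<string, string[]> = {
  pending: ["awaiting_approval", "cancelled"],
  awaiting_approval: ["approved", "rejected", "cancelled"],
  approved: ["deploying"],
  deploying: ["deployed", "failed"],
  deployed: ["rolled_back"],
};

function canTransition(from: string, to: string): boolean {
  return (TRANSITIONS[from] ?? []).includes(to);
}
```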
### List Promotions
**Endpoint:** `GET /api/v1/promotions`
**Query Parameters:**
- `status` (string): Filter by status
- `releaseId` (UUID): Filter by release
- `environmentId` (UUID): Filter by target environment
- `page` (number): Page number
**Response:** `200 OK`
```json
{
"data": [
{
"id": "uuid",
"releaseName": "myapp-v2.3.1",
"targetEnvironmentName": "Production",
"status": "awaiting_approval",
"requestedAt": "2026-01-10T14:23:45Z"
}
],
"meta": { "page": 1, "totalCount": 25 }
}
```
### Get Promotion
**Endpoint:** `GET /api/v1/promotions/{id}`
**Response:** `200 OK` - Full promotion with decision record and approvals
### Approve Promotion
**Endpoint:** `POST /api/v1/promotions/{id}/approve`
**Request:**
```json
{
"comment": "Approved after reviewing security scan results"
}
```
**Response:** `200 OK`
```json
{
"id": "uuid",
"status": "approved",
"approvalCount": 2,
"requiredApprovals": 2,
"decidedAt": "2026-01-10T14:30:00Z"
}
```
**Notes:**
- Separation of Duties (SoD): The user who requested the promotion cannot approve it if `requireSod` is enabled on the environment
- Multi-party approval: Promotion proceeds when `approvalCount >= requiredApprovals`
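The two rules above reduce to simple predicates; a sketch (function names are hypothetical):

```typescript
// Sketch of the approval rules: SoD blocks the requester from
// approving, and the promotion proceeds once enough approvals exist.
function canApprove(userId: string, requestedBy: string, requireSod: boolean): boolean {
  return !(requireSod && userId === requestedBy);
}

function isApproved(approvalCount: number, requiredApprovals: number): boolean {
  return approvalCount >= requiredApprovals;
}
```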
### Reject Promotion
**Endpoint:** `POST /api/v1/promotions/{id}/reject`
**Request:**
```json
{
"reason": "Security vulnerabilities not addressed"
}
```
**Response:** `200 OK` - Updated promotion with `status: rejected`
### Cancel Promotion
**Endpoint:** `POST /api/v1/promotions/{id}/cancel`
Cancels a pending or awaiting_approval promotion.
**Response:** `200 OK` - Updated promotion with `status: cancelled`
---
## Decision & Evidence Endpoints
### Get Decision Record
**Endpoint:** `GET /api/v1/promotions/{id}/decision`
Returns the full decision record including gate evaluations.
**Response:** `200 OK`
```json
{
"promotionId": "uuid",
"decision": "allow",
"decidedAt": "2026-01-10T14:30:00Z",
"gates": [
{
"gateName": "security-gate",
"passed": true,
"details": {
"criticalCount": 0,
"highCount": 3,
"maxCritical": 0,
"maxHigh": 5
}
},
{
"gateName": "freeze-window-gate",
"passed": true,
"details": {
"activeFreezeWindow": null
}
}
],
"approvals": [
{
"approverId": "uuid",
"approverName": "John Doe",
"decision": "approved",
"comment": "LGTM",
"approvedAt": "2026-01-10T14:28:00Z"
}
]
}
```
### Get Approvals
**Endpoint:** `GET /api/v1/promotions/{id}/approvals`
**Response:** `200 OK` - Array of approval records
### Get Evidence Packet
**Endpoint:** `GET /api/v1/promotions/{id}/evidence`
Returns the signed evidence packet for the promotion decision.
**Response:** `200 OK`
```json
{
"id": "uuid",
"type": "release_decision",
"version": "1.0",
"content": { ... },
"contentHash": "sha256:abc...",
"signature": "base64-signature",
"signatureAlgorithm": "ECDSA-P256-SHA256",
"signerKeyRef": "key-id",
"generatedAt": "2026-01-10T14:30:00Z"
}
```
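A consumer could verify `contentHash` roughly as follows — a sketch assuming the hash is SHA-256 over the JSON serialization of `content` (the exact canonicalization is not specified here):

```typescript
// Sketch of contentHash verification (assumption: sha256 over the
// JSON serialization of `content`; real canonicalization may differ).
import { createHash } from "node:crypto";

function contentHash(content: unknown): string {
  const json = JSON.stringify(content);
  return "sha256:" + createHash("sha256").update(json).digest("hex");
}

function hashMatches(content: unknown, expected: string): boolean {
  return contentHash(content) === expected;
}
```

Signature verification over `contentHash` with the referenced signer key would be a separate step.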
---
## Gate Preview Endpoints
### Preview Gate Evaluation
**Endpoint:** `POST /api/v1/promotions/preview-gates`
Evaluates gates without creating a promotion (dry run).
**Request:**
```json
{
"releaseId": "uuid",
"targetEnvironmentId": "uuid"
}
```
**Response:** `200 OK`
```json
{
"wouldPass": false,
"gates": [
{
"gateName": "security-gate",
"passed": false,
"blocking": true,
"message": "3 critical vulnerabilities exceed threshold (max: 0)"
},
{
"gateName": "freeze-window-gate",
"passed": true,
"blocking": false,
"message": "No active freeze window"
}
]
}
```
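In the response above, `wouldPass` appears to depend only on blocking gates; a sketch of that aggregation (treating non-blocking failures as warnings is an assumption):

```typescript
// Sketch: a dry run would pass unless some *blocking* gate failed.
// Non-blocking failures are assumed to warn without blocking.
interface GateResult { gateName: string; passed: boolean; blocking: boolean; }

function wouldPass(gates: GateResult[]): boolean {
  return gates.every(g => g.passed || !g.blocking);
}
```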
---
## Approval Policy Endpoints
### Create Approval Policy
**Endpoint:** `POST /api/v1/approval-policies`
**Request:**
```json
{
"name": "production-policy",
"environmentId": "uuid",
"requiredApprovals": 2,
"approverGroups": ["release-managers", "sre-team"],
"requireSeparationOfDuties": true,
"autoExpireHours": 24
}
```
### List Approval Policies
**Endpoint:** `GET /api/v1/approval-policies`
### Get Approval Policy
**Endpoint:** `GET /api/v1/approval-policies/{id}`
### Update Approval Policy
**Endpoint:** `PUT /api/v1/approval-policies/{id}`
### Delete Approval Policy
**Endpoint:** `DELETE /api/v1/approval-policies/{id}`
---
## Current User Endpoints
### Get My Pending Approvals
**Endpoint:** `GET /api/v1/my/pending-approvals`
Returns promotions awaiting approval from the current user.
**Response:** `200 OK` - Array of promotions
---
## Error Responses
| Status Code | Description |
|-------------|-------------|
| `400` | Invalid promotion request |
| `403` | User cannot approve (SoD violation or not in approver list) |
| `404` | Promotion not found |
| `409` | Promotion already decided |
| `422` | Gate evaluation failed |
---
## See Also
- [Workflows API](workflows.md)
- [Releases API](releases.md)
- [Promotion Manager Module](../modules/promotion-manager.md)
- [Security Gates](../modules/promotion-manager.md#security-gate)

# Release Management APIs
> API endpoints for managing components, versions, and release bundles.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 6.3.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Release Manager Module](../modules/release-manager.md), [Integration Hub](../modules/integration-hub.md)
## Overview
The Release Management API provides endpoints for managing container components, version tracking, and release bundle creation. All releases are identified by immutable OCI digests, ensuring cryptographic verification throughout the deployment pipeline.
> **Design Principle:** Release identity is established via digest, not tag. Tags are human-friendly aliases; digests are the source of truth.
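The tag/digest split above can be made concrete with a small reference parser. This is an illustrative sketch, not part of the API: `parseImageRef` and its return shape are hypothetical, and the parsing is simplified (it ignores registry hostnames with ports).

```javascript
// Hypothetical helper showing digest-pinned vs tag-based image references.
// A digest reference ("repo@sha256:<64 hex>") pins content immutably; a tag
// reference ("repo:tag") is a mutable, human-friendly alias.
const DIGEST_RE = /^sha256:[a-f0-9]{64}$/;

function parseImageRef(ref) {
  const atIdx = ref.indexOf('@');
  if (atIdx !== -1) {
    const digest = ref.slice(atIdx + 1);
    if (!DIGEST_RE.test(digest)) throw new Error(`invalid digest: ${digest}`);
    return { repository: ref.slice(0, atIdx), digest, pinned: true };
  }
  // Simplified: assumes no port in the registry host part.
  const colonIdx = ref.lastIndexOf(':');
  const repository = colonIdx === -1 ? ref : ref.slice(0, colonIdx);
  const tag = colonIdx === -1 ? 'latest' : ref.slice(colonIdx + 1);
  return { repository, tag, pinned: false };
}
```

Only the `pinned: true` form is a release identity; a tag must be resolved to a digest before it can appear in a release bundle.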
---
## Component Endpoints
### Create Component
**Endpoint:** `POST /api/v1/components`
Registers a new container component for release management.
**Request:**
```json
{
"name": "api",
"displayName": "API Service",
"imageRepository": "myorg/api",
"registryIntegrationId": "uuid",
"versioningStrategy": "semver",
"defaultChannel": "stable"
}
```
**Response:** `201 Created`
```json
{
"id": "uuid",
"name": "api",
"displayName": "API Service",
"imageRepository": "myorg/api",
"registryIntegrationId": "uuid",
"versioningStrategy": "semver",
"createdAt": "2026-01-10T14:23:45Z"
}
```
### List Components
**Endpoint:** `GET /api/v1/components`
**Response:** `200 OK` - Array of components
### Get Component
**Endpoint:** `GET /api/v1/components/{id}`
### Update Component
**Endpoint:** `PUT /api/v1/components/{id}`
### Delete Component
**Endpoint:** `DELETE /api/v1/components/{id}`
### Sync Versions
**Endpoint:** `POST /api/v1/components/{id}/sync-versions`
Triggers a refresh of available versions from the container registry.
**Request:**
```json
{
"forceRefresh": true
}
```
**Response:** `200 OK`
```json
{
"synced": 15,
"versions": [
{
"tag": "v2.3.1",
"digest": "sha256:abc123...",
"semver": "2.3.1",
"channel": "stable",
"pushedAt": "2026-01-09T10:00:00Z"
}
]
}
```
### List Component Versions
**Endpoint:** `GET /api/v1/components/{id}/versions`
**Query Parameters:**
- `channel` (string): Filter by channel (`stable`, `beta`, `rc`)
- `limit` (number): Maximum versions to return
**Response:** `200 OK` - Array of version maps
---
## Version Map Endpoints
### Create Version Map
**Endpoint:** `POST /api/v1/version-maps`
Manually assigns a semver and channel to a tag/digest.
**Request:**
```json
{
"componentId": "uuid",
"tag": "v2.3.1",
"semver": "2.3.1",
"channel": "stable"
}
```
**Response:** `201 Created`
### List Version Maps
**Endpoint:** `GET /api/v1/version-maps`
**Query Parameters:**
- `componentId` (UUID): Filter by component
- `channel` (string): Filter by channel
---
## Release Endpoints
### Create Release
**Endpoint:** `POST /api/v1/releases`
Creates a new release bundle with specified component versions.
**Request:**
```json
{
"name": "myapp-v2.3.1",
"displayName": "My App 2.3.1",
"components": [
{ "componentId": "uuid", "version": "2.3.1" },
{ "componentId": "uuid", "digest": "sha256:def456..." },
{ "componentId": "uuid", "channel": "stable" }
],
"sourceRef": {
"scmIntegrationId": "uuid",
"repository": "myorg/myapp",
"branch": "main",
"commitSha": "abc123"
}
}
```
**Response:** `201 Created`
```json
{
"id": "uuid",
"name": "myapp-v2.3.1",
"displayName": "My App 2.3.1",
"status": "draft",
"components": [
{
"componentId": "uuid",
"componentName": "api",
"version": "2.3.1",
"digest": "sha256:abc123...",
"channel": "stable"
}
],
"createdAt": "2026-01-10T14:23:45Z",
"createdBy": "user-uuid"
}
```
### Create Release from Latest
**Endpoint:** `POST /api/v1/releases/from-latest`
Convenience endpoint to create a release from the latest versions of all (or specified) components.
**Request:**
```json
{
"name": "myapp-latest",
"channel": "stable",
"componentIds": ["uuid1", "uuid2"],
"pinFrom": {
"environmentId": "uuid"
}
}
```
**Response:** `201 Created` - Release with resolved digests
### List Releases
**Endpoint:** `GET /api/v1/releases`
**Query Parameters:**
- `status` (string): Filter by status (`draft`, `ready`, `promoting`, `deployed`, `deprecated`)
- `componentId` (UUID): Filter by component inclusion
- `page` (number): Page number
- `pageSize` (number): Items per page
**Response:** `200 OK`
```json
{
"data": [
{
"id": "uuid",
"name": "myapp-v2.3.1",
"status": "deployed",
"componentCount": 3,
"createdAt": "2026-01-10T14:23:45Z"
}
],
"meta": {
"page": 1,
"pageSize": 20,
"totalCount": 150,
"totalPages": 8
}
}
```
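The `meta` envelope follows a simple convention, sketched below. The helper name is hypothetical; the assumption is that `totalPages` is the ceiling of `totalCount / pageSize`, which matches the example above (150 items at 20 per page gives 8 pages).

```javascript
// Assumed pagination convention for list endpoints: totalPages is
// ceil(totalCount / pageSize).
function pageMeta(page, pageSize, totalCount) {
  return {
    page,
    pageSize,
    totalCount,
    totalPages: Math.ceil(totalCount / pageSize),
  };
}
```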
### Get Release
**Endpoint:** `GET /api/v1/releases/{id}`
**Response:** `200 OK` - Full release with component details
### Update Release
**Endpoint:** `PUT /api/v1/releases/{id}`
**Request:**
```json
{
"displayName": "Updated Display Name",
"metadata": { "key": "value" },
"status": "ready"
}
```
### Delete Release
**Endpoint:** `DELETE /api/v1/releases/{id}`
### Get Release State
**Endpoint:** `GET /api/v1/releases/{id}/state`
Returns the deployment state of a release across environments.
**Response:** `200 OK`
```json
{
"environments": [
{
"environmentId": "uuid",
"environmentName": "Development",
"status": "deployed",
"deployedAt": "2026-01-09T10:00:00Z"
},
{
"environmentId": "uuid",
"environmentName": "Staging",
"status": "deployed",
"deployedAt": "2026-01-10T08:00:00Z"
},
{
"environmentId": "uuid",
"environmentName": "Production",
"status": "not_deployed"
}
]
}
```
### Deprecate Release
**Endpoint:** `POST /api/v1/releases/{id}/deprecate`
Marks a release as deprecated, preventing new promotions.
**Response:** `200 OK` - Updated release with `status: deprecated`
### Compare Releases
**Endpoint:** `GET /api/v1/releases/{id}/compare/{otherId}`
Compares two releases to identify component differences.
**Response:** `200 OK`
```json
{
"added": [
{ "componentId": "uuid", "componentName": "worker" }
],
"removed": [
{ "componentId": "uuid", "componentName": "legacy-service" }
],
"changed": [
{
"component": "api",
"fromVersion": "2.3.0",
"toVersion": "2.3.1",
"fromDigest": "sha256:old...",
"toDigest": "sha256:new..."
}
]
}
```
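The compare semantics can be sketched client-side as follows. Components are matched by `componentId`; digests (not tags) decide whether a shared component changed, in line with the digest-as-identity principle. The field names follow the response example, but the function itself is hypothetical.

```javascript
// Illustrative diff of two release bundles. `from` and `to` are objects with a
// `components` array, each entry carrying componentId, componentName, version,
// and digest (as in the release responses above).
function compareReleases(from, to) {
  const fromById = new Map(from.components.map(c => [c.componentId, c]));
  const toById = new Map(to.components.map(c => [c.componentId, c]));
  const added = [], removed = [], changed = [];
  for (const [id, c] of toById) {
    if (!fromById.has(id)) added.push({ componentId: id, componentName: c.componentName });
  }
  for (const [id, c] of fromById) {
    const other = toById.get(id);
    if (!other) {
      removed.push({ componentId: id, componentName: c.componentName });
    } else if (other.digest !== c.digest) {
      // Digest comparison, not version comparison, detects change.
      changed.push({
        component: c.componentName,
        fromVersion: c.version,
        toVersion: other.version,
        fromDigest: c.digest,
        toDigest: other.digest,
      });
    }
  }
  return { added, removed, changed };
}
```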
---
## Error Responses
| Status Code | Description |
|-------------|-------------|
| `400` | Invalid release configuration |
| `404` | Release or component not found |
| `409` | Release name already exists |
| `422` | Cannot resolve component version |
---
## See Also
- [Promotions API](promotions.md)
- [Release Manager Module](../modules/release-manager.md)
- [Integration Hub](../modules/integration-hub.md)
- [Design Principles](../design/principles.md)

# Real-Time APIs (WebSocket/SSE)
> WebSocket and Server-Sent Events endpoints for real-time updates.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 6.4](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Workflow Execution](../workflow/execution.md), [UI Dashboard](../ui/dashboard.md)
## Overview
The Release Orchestrator provides real-time streaming endpoints for workflow runs, deployment progress, agent tasks, and dashboard metrics. These endpoints support both WebSocket connections and Server-Sent Events (SSE) for browser compatibility.
---
## Authentication
All WebSocket and SSE connections require authentication via JWT token:
**WebSocket:** Token in query parameter or first message
```
wss://<host>/api/v1/workflow-runs/{id}/stream?token=jwt-token
```
**SSE:** Token in Authorization header
```
GET /api/v1/dashboard/stream
Authorization: Bearer jwt-token
```
---
## Workflow Run Stream
**Endpoint:** `WS /api/v1/workflow-runs/{id}/stream`
Streams real-time updates for a workflow run including step progress and logs.
### Message Types (Server to Client)
**Step Started:**
```json
{
"type": "step_started",
"nodeId": "security-check",
"stepType": "security-gate",
"timestamp": "2026-01-10T14:23:45Z"
}
```
**Step Progress:**
```json
{
"type": "step_progress",
"nodeId": "deploy",
"progress": 50,
"message": "Deploying to target 3/6"
}
```
**Step Log:**
```json
{
"type": "step_log",
"nodeId": "deploy",
"line": "Pulling image sha256:abc123...",
"level": "info",
"timestamp": "2026-01-10T14:23:50Z"
}
```
**Step Completed:**
```json
{
"type": "step_completed",
"nodeId": "security-check",
"status": "succeeded",
"outputs": {
"criticalCount": 0,
"highCount": 3
},
"duration": 5.2,
"timestamp": "2026-01-10T14:23:50Z"
}
```
**Workflow Completed:**
```json
{
"type": "workflow_completed",
"status": "succeeded",
"duration": 125.5,
"outputs": {
"deploymentId": "uuid"
},
"timestamp": "2026-01-10T14:25:50Z"
}
```
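Every stream message carries a `type` discriminator, so a client can route messages through a small lookup table. The sketch below assumes raw JSON text frames as shown above; unknown types are ignored so newer servers remain compatible with older clients.

```javascript
// Minimal message dispatcher for the workflow-run stream. `handlers` maps a
// message type (e.g. "step_log") to a callback receiving the parsed message.
function makeDispatcher(handlers) {
  return raw => {
    const msg = JSON.parse(raw);
    const handler = handlers[msg.type];
    if (handler) handler(msg);      // unknown types are silently skipped
    return msg.type;
  };
}
```

Usage: pass the returned function as the WebSocket `onmessage` body, e.g. `ws.onmessage = e => dispatch(e.data)`.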
---
## Deployment Job Stream
**Endpoint:** `WS /api/v1/deployment-jobs/{id}/stream`
Streams real-time updates for deployment job execution.
### Message Types (Server to Client)
**Task Started:**
```json
{
"type": "task_started",
"taskId": "uuid",
"targetId": "uuid",
"targetName": "prod-web-01",
"taskType": "docker.pull",
"timestamp": "2026-01-10T14:23:45Z"
}
```
**Task Progress:**
```json
{
"type": "task_progress",
"taskId": "uuid",
"progress": 75,
"message": "Pulling layer 4/5"
}
```
**Task Log:**
```json
{
"type": "task_log",
"taskId": "uuid",
"line": "Container started successfully",
"level": "info"
}
```
**Task Completed:**
```json
{
"type": "task_completed",
"taskId": "uuid",
"targetId": "uuid",
"status": "succeeded",
"duration": 45.2,
"result": {
"containerId": "abc123",
"digest": "sha256:..."
},
"timestamp": "2026-01-10T14:24:30Z"
}
```
**Job Completed:**
```json
{
"type": "job_completed",
"status": "succeeded",
"targetsDeployed": 4,
"targetsFailed": 0,
"duration": 180.5,
"timestamp": "2026-01-10T14:26:45Z"
}
```
---
## Agent Task Stream
**Endpoint:** `WS /api/v1/agents/{id}/task-stream`
Bidirectional stream for agent task assignment and progress reporting.
### Message Types (Server to Agent)
**Task Assigned:**
```json
{
"type": "task_assigned",
"task": {
"taskId": "uuid",
"taskType": "docker.pull",
"payload": {
"image": "myapp",
"digest": "sha256:abc123..."
},
"credentials": {
"registry.username": "user",
"registry.password": "token"
},
"timeout": 300
}
}
```
**Task Cancelled:**
```json
{
"type": "task_cancelled",
"taskId": "uuid",
"reason": "Deployment cancelled by user"
}
```
### Message Types (Agent to Server)
**Task Progress:**
```json
{
"type": "task_progress",
"taskId": "uuid",
"progress": 50,
"message": "Pulling image layer 3/5"
}
```
**Task Log:**
```json
{
"type": "task_log",
"taskId": "uuid",
"level": "info",
"message": "Image layer downloaded: sha256:def456..."
}
```
**Task Completed:**
```json
{
"type": "task_completed",
"taskId": "uuid",
"success": true,
"result": {
"imageId": "sha256:abc123..."
}
}
```
---
## Dashboard Metrics Stream
**Endpoint:** `WS /api/v1/dashboard/stream`
Streams real-time dashboard metrics and alerts.
### Message Types (Server to Client)
**Metric Update:**
```json
{
"type": "metric_update",
"metrics": {
"pipelineStatus": [
{ "environmentId": "uuid", "name": "Production", "health": "healthy" }
],
"pendingApprovals": 3,
"activeDeployments": 1,
"recentReleases": 12,
"systemHealth": {
"agentsOnline": 8,
"agentsTotal": 10,
"queueDepth": 5
}
},
"timestamp": "2026-01-10T14:23:45Z"
}
```
**Alert:**
```json
{
"type": "alert",
"alert": {
"id": "uuid",
"severity": "warning",
"title": "Deployment Failed",
"message": "Deployment to Production failed: health check timeout",
"resourceType": "deployment",
"resourceId": "uuid",
"timestamp": "2026-01-10T14:23:45Z"
}
}
```
**Promotion Update:**
```json
{
"type": "promotion_update",
"promotion": {
"id": "uuid",
"releaseName": "myapp-v2.3.1",
"targetEnvironment": "Production",
"status": "awaiting_approval",
"requestedBy": "John Doe"
}
}
```
---
## Connection Management
### Reconnection
Clients should implement exponential backoff reconnection:
```javascript
const connect = (retryCount = 0) => {
  const ws = new WebSocket(url);
  ws.onclose = () => {
    // Double the delay on each failed attempt, capped at 30 seconds.
    const delay = Math.min(1000 * Math.pow(2, retryCount), 30000);
    setTimeout(() => connect(retryCount + 1), delay);
  };
  ws.onopen = () => {
    retryCount = 0; // reset the backoff once a connection succeeds
  };
};
connect();
```
### Heartbeat
WebSocket connections receive periodic heartbeat messages:
```json
{
"type": "heartbeat",
"timestamp": "2026-01-10T14:23:45Z"
}
```
Clients should respond with:
```json
{
"type": "pong"
}
```
Connections that do not respond with a pong within 30 seconds are terminated.
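The heartbeat contract can be handled with a few lines of client code. This is a sketch under the assumption that the client also wants to detect a stalled connection locally; `send` stands in for `ws.send`, and the injectable clock exists only to make the helper testable.

```javascript
// Replies to each heartbeat with a pong and records when the last heartbeat
// arrived, so the client can treat a silent connection as stale.
function makeHeartbeatHandler(send, now = () => Date.now()) {
  let lastHeartbeat = now();
  return {
    onMessage(msg) {
      if (msg.type === 'heartbeat') {
        lastHeartbeat = now();
        send(JSON.stringify({ type: 'pong' }));
      }
    },
    // True when no heartbeat has been seen within the given window (ms).
    isStale(windowMs) {
      return now() - lastHeartbeat > windowMs;
    },
  };
}
```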
---
## Error Messages
```json
{
"type": "error",
"code": "unauthorized",
"message": "Token expired",
"timestamp": "2026-01-10T14:23:45Z"
}
```
| Error Code | Description |
|------------|-------------|
| `unauthorized` | Invalid or expired token |
| `forbidden` | No access to resource |
| `not_found` | Resource not found |
| `rate_limited` | Too many connections |
| `internal_error` | Server error |
---
## See Also
- [Workflows API](workflows.md)
- [Agents API](agents.md)
- [UI Dashboard](../ui/dashboard.md)
- [Workflow Execution](../workflow/execution.md)

# Workflow APIs
> API endpoints for managing workflow templates, step registry, and workflow runs.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 6.3.4](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Workflow Engine Module](../modules/workflow-engine.md), [Workflow Templates](../workflow/templates.md)
## Overview
The Workflow API provides endpoints for managing workflow templates (DAG definitions), discovering available step types, and executing workflow runs. Workflows are directed acyclic graphs (DAGs) of steps that orchestrate promotions, deployments, and other automation tasks.
---
## Workflow Template Endpoints
### Create Workflow Template
**Endpoint:** `POST /api/v1/workflow-templates`
**Request:**
```json
{
"name": "standard-promotion",
"displayName": "Standard Promotion Workflow",
"description": "Default workflow for promoting releases",
"nodes": [
{
"id": "security-check",
"type": "security-gate",
"name": "Security Check",
"config": {
"maxCritical": 0,
"maxHigh": 5
},
"position": { "x": 100, "y": 100 }
},
{
"id": "approval",
"type": "approval",
"name": "Manager Approval",
"config": {
"approvers": ["manager-group"],
"minApprovals": 1
},
"position": { "x": 300, "y": 100 }
},
{
"id": "deploy",
"type": "deploy",
"name": "Deploy to Target",
"config": {
"strategy": "rolling",
"batchSize": "25%"
},
"position": { "x": 500, "y": 100 }
}
],
"edges": [
{ "from": "security-check", "to": "approval" },
{ "from": "approval", "to": "deploy" }
],
"inputs": [
{ "name": "releaseId", "type": "uuid", "required": true },
{ "name": "environmentId", "type": "uuid", "required": true }
],
"outputs": [
{ "name": "deploymentId", "type": "uuid" }
]
}
```
**Response:** `201 Created`
```json
{
"id": "uuid",
"name": "standard-promotion",
"displayName": "Standard Promotion Workflow",
"version": 1,
"nodeCount": 3,
"isActive": true,
"createdAt": "2026-01-10T14:23:45Z"
}
```
### List Workflow Templates
**Endpoint:** `GET /api/v1/workflow-templates`
**Query Parameters:**
- `includeBuiltin` (boolean): Include system-provided templates
- `tags` (string): Filter by tags
**Response:** `200 OK` - Array of workflow templates
### Get Workflow Template
**Endpoint:** `GET /api/v1/workflow-templates/{id}`
**Response:** `200 OK` - Full template with nodes and edges
### Update Workflow Template
**Endpoint:** `PUT /api/v1/workflow-templates/{id}`
Creates a new version of the template.
**Request:** Partial or full template definition
**Response:** `200 OK` - New version of template
### Delete Workflow Template
**Endpoint:** `DELETE /api/v1/workflow-templates/{id}`
**Response:** `200 OK`
```json
{ "deleted": true }
```
### Validate Workflow Template
**Endpoint:** `POST /api/v1/workflow-templates/{id}/validate`
Validates a template with sample inputs.
**Request:**
```json
{
"inputs": {
"releaseId": "sample-uuid",
"environmentId": "sample-uuid"
}
}
```
**Response:** `200 OK`
```json
{
"valid": true,
"errors": []
}
```
Or on validation failure:
```json
{
"valid": false,
"errors": [
{ "nodeId": "deploy", "field": "config.strategy", "message": "Invalid strategy: unknown" },
{ "type": "dag", "message": "Cycle detected: node-a -> node-b -> node-a" }
]
}
```
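The "Cycle detected" validation error above can be produced with a standard topological sort over the template's `edges` list. The sketch below uses Kahn's algorithm; it is illustrative, not the service's actual implementation, and assumes every edge endpoint appears in `nodeIds`.

```javascript
// Returns null when the graph is a valid DAG, otherwise the node ids that
// participate in (or depend on) a cycle. Kahn's algorithm: repeatedly consume
// nodes with indegree 0; anything left over is cyclic.
function findCycle(nodeIds, edges) {
  const indegree = new Map(nodeIds.map(id => [id, 0]));
  const out = new Map(nodeIds.map(id => [id, []]));
  for (const { from, to } of edges) {
    out.get(from).push(to);
    indegree.set(to, indegree.get(to) + 1);
  }
  const queue = nodeIds.filter(id => indegree.get(id) === 0);
  let visited = 0;
  while (queue.length) {
    const id = queue.shift();
    visited++;
    for (const next of out.get(id)) {
      indegree.set(next, indegree.get(next) - 1);
      if (indegree.get(next) === 0) queue.push(next);
    }
  }
  return visited === nodeIds.length
    ? null
    : nodeIds.filter(id => indegree.get(id) > 0);
}
```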
---
## Step Registry Endpoints
### List Step Types
**Endpoint:** `GET /api/v1/step-types`
Lists all available step types from core and plugins.
**Query Parameters:**
- `category` (string): Filter by category (`deployment`, `gate`, `notification`, `utility`)
- `provider` (string): Filter by provider (`builtin`, `plugin-id`)
**Response:** `200 OK`
```json
[
{
"type": "script",
"displayName": "Script",
"description": "Execute shell script on target",
"category": "utility",
"provider": "builtin",
"configSchema": { ... }
},
{
"type": "security-gate",
"displayName": "Security Gate",
"description": "Check vulnerability thresholds",
"category": "gate",
"provider": "builtin",
"configSchema": { ... }
}
]
```
### Get Step Type
**Endpoint:** `GET /api/v1/step-types/{type}`
**Response:** `200 OK` - Full step type with configuration schema
---
## Workflow Run Endpoints
### Start Workflow Run
**Endpoint:** `POST /api/v1/workflow-runs`
**Request:**
```json
{
"templateId": "uuid",
"context": {
"releaseId": "uuid",
"environmentId": "uuid",
"variables": {
"deploymentTimeout": 600
}
}
}
```
**Response:** `201 Created`
```json
{
"id": "uuid",
"templateId": "uuid",
"templateVersion": 1,
"status": "running",
"startedAt": "2026-01-10T14:23:45Z"
}
```
### List Workflow Runs
**Endpoint:** `GET /api/v1/workflow-runs`
**Query Parameters:**
- `status` (string): Filter by status (`pending`, `running`, `succeeded`, `failed`, `cancelled`)
- `templateId` (UUID): Filter by template
- `page` (number): Page number
**Response:** `200 OK`
```json
{
"data": [
{
"id": "uuid",
"templateName": "standard-promotion",
"status": "running",
"progress": 66,
"startedAt": "2026-01-10T14:23:45Z"
}
],
"meta": { "page": 1, "totalCount": 50 }
}
```
### Get Workflow Run
**Endpoint:** `GET /api/v1/workflow-runs/{id}`
**Response:** `200 OK` - Full run with step statuses
### Pause Workflow Run
**Endpoint:** `POST /api/v1/workflow-runs/{id}/pause`
Pauses a running workflow at the next step boundary.
**Response:** `200 OK` - Updated workflow run
### Resume Workflow Run
**Endpoint:** `POST /api/v1/workflow-runs/{id}/resume`
Resumes a paused workflow.
**Response:** `200 OK` - Updated workflow run
### Cancel Workflow Run
**Endpoint:** `POST /api/v1/workflow-runs/{id}/cancel`
Cancels a running or paused workflow.
**Response:** `200 OK` - Updated workflow run
### List Step Runs
**Endpoint:** `GET /api/v1/workflow-runs/{id}/steps`
**Response:** `200 OK`
```json
[
{
"nodeId": "security-check",
"stepType": "security-gate",
"status": "succeeded",
"startedAt": "2026-01-10T14:23:45Z",
"completedAt": "2026-01-10T14:23:50Z"
},
{
"nodeId": "approval",
"stepType": "approval",
"status": "running",
"startedAt": "2026-01-10T14:23:50Z"
}
]
```
### Get Step Run
**Endpoint:** `GET /api/v1/workflow-runs/{id}/steps/{nodeId}`
**Response:** `200 OK` - Step run with logs
### Get Step Logs
**Endpoint:** `GET /api/v1/workflow-runs/{id}/steps/{nodeId}/logs`
**Query Parameters:**
- `follow` (boolean): Stream logs in real-time via SSE
**Response:** `200 OK` - Log content or SSE stream
### List Step Artifacts
**Endpoint:** `GET /api/v1/workflow-runs/{id}/steps/{nodeId}/artifacts`
**Response:** `200 OK` - Array of artifacts
### Download Artifact
**Endpoint:** `GET /api/v1/workflow-runs/{id}/steps/{nodeId}/artifacts/{artifactId}`
**Response:** Binary download
---
## Error Responses
| Status Code | Description |
|-------------|-------------|
| `400` | Invalid workflow template |
| `404` | Template or run not found |
| `409` | Workflow already running |
| `422` | DAG validation failed |
---
## See Also
- [WebSocket APIs](websockets.md) - Real-time workflow updates
- [Workflow Engine Module](../modules/workflow-engine.md)
- [Workflow Templates](../workflow/templates.md)
- [Workflow Execution](../workflow/execution.md)

# Configuration Reference
> Environment variables and OPA policy examples for the Release Orchestrator.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 15.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Security Overview](../security/overview.md), [Promotion Manager](../modules/promotion-manager.md)
**Sprint:** [101_001 Foundation](../../../../implplan/SPRINT_20260110_101_001_DB_schema_core_tables.md)
## Overview
This document provides the configuration reference for the Release Orchestrator, including environment variables and OPA policy examples.
---
## Environment Variables
### Core Configuration
```bash
# Database
STELLA_DATABASE_URL=postgresql://user:pass@host:5432/stella
STELLA_REDIS_URL=redis://host:6379
STELLA_SECRET_KEY=base64-encoded-32-bytes
STELLA_LOG_LEVEL=info
STELLA_LOG_FORMAT=json
```
### Authentication (Authority)
```bash
# OAuth/OIDC
STELLA_OAUTH_ISSUER=https://auth.example.com
STELLA_OAUTH_CLIENT_ID=stella-app
STELLA_OAUTH_CLIENT_SECRET=secret
```
### Agents
```bash
# Agent TLS
STELLA_AGENT_LISTEN_PORT=8443
STELLA_AGENT_TLS_CERT=/path/to/cert.pem
STELLA_AGENT_TLS_KEY=/path/to/key.pem
STELLA_AGENT_CA_CERT=/path/to/ca.pem
```
### Plugins
```bash
# Plugin configuration
STELLA_PLUGIN_DIR=/var/stella/plugins
STELLA_PLUGIN_SANDBOX_MEMORY=512m
STELLA_PLUGIN_SANDBOX_CPU=1
```
### Integrations
```bash
# Vault integration
STELLA_VAULT_ADDR=https://vault.example.com
STELLA_VAULT_TOKEN=hvs.xxx
```
---
## Full Configuration File
```yaml
# stella-config.yaml
database:
url: postgresql://user:pass@host:5432/stella
pool_size: 20
ssl_mode: require
redis:
url: redis://host:6379
prefix: stella
auth:
issuer: https://auth.example.com
client_id: stella-app
client_secret_ref: vault://secrets/oauth-client-secret
agents:
listen_port: 8443
tls:
cert_path: /etc/stella/agent.crt
key_path: /etc/stella/agent.key
ca_path: /etc/stella/ca.crt
heartbeat_interval: 30
task_timeout: 600
plugins:
directory: /var/stella/plugins
sandbox:
memory: 512m
cpu: 1
network: restricted
evidence:
storage_path: /var/stella/evidence
signing_key_ref: vault://secrets/evidence-signing-key
retention_days: 2555 # 7 years
logging:
level: info
format: json
output: stdout
telemetry:
enabled: true
otlp_endpoint: otel-collector:4317
service_name: stella-release-orchestrator
```
---
## OPA Policy Examples
### Security Gate Policy
```rego
# security_gate.rego
package stella.gates.security

default allow = false

# Allow only when no deny reason exists. Note: a bare
# `input.release.components[_].security.reachable_critical == 0` would pass if
# *any* single component had zero findings; counting deny reasons enforces the
# threshold for every component.
allow {
    count(deny) == 0
}

deny[msg] {
    component := input.release.components[_]
    component.security.reachable_critical > 0
    msg := sprintf("Component %s has %d reachable critical vulnerabilities",
                   [component.name, component.security.reachable_critical])
}

deny[msg] {
    component := input.release.components[_]
    component.security.reachable_high > 0
    msg := sprintf("Component %s has %d reachable high vulnerabilities",
                   [component.name, component.security.reachable_high])
}
```
### Approval Gate Policy
```rego
# approval_gate.rego
package stella.gates.approval

import future.keywords.in
default allow = false
allow {
count(input.approvals) >= input.environment.required_approvals
separation_of_duties_met
}
separation_of_duties_met {
not input.environment.require_sod
}
separation_of_duties_met {
input.environment.require_sod
approver_ids := {a.approver_id | a := input.approvals[_]; a.action == "approved"}
not input.promotion.requested_by in approver_ids
}
```
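For clients that want to pre-check separation of duties before calling the approve endpoint, the same rule can be mirrored imperatively. Field names follow the policy input above; this sketch complements, rather than replaces, the server-side OPA evaluation.

```javascript
// Mirrors the separation-of-duties rule: when the environment requires SoD,
// the promotion requester must not appear among the approvers.
function sodSatisfied(promotion, approvals, environment) {
  if (!environment.require_sod) return true;
  const approverIds = new Set(
    approvals.filter(a => a.action === 'approved').map(a => a.approver_id)
  );
  return !approverIds.has(promotion.requested_by);
}
```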
### Freeze Window Gate Policy
```rego
# freeze_window_gate.rego
package stella.gates.freeze

import future.keywords.in
default allow = true
allow = false {
window := input.environment.freeze_windows[_]
time.now_ns() >= time.parse_rfc3339_ns(window.start)
time.now_ns() <= time.parse_rfc3339_ns(window.end)
not input.promotion.requested_by in window.exceptions
}
```
---
## API Error Codes
| Code | HTTP Status | Description |
|------|-------------|-------------|
| `RELEASE_NOT_FOUND` | 404 | Release with specified ID does not exist |
| `ENVIRONMENT_NOT_FOUND` | 404 | Environment with specified ID does not exist |
| `PROMOTION_BLOCKED` | 403 | Promotion blocked by policy gates |
| `APPROVAL_REQUIRED` | 403 | Additional approvals required |
| `FREEZE_WINDOW_ACTIVE` | 403 | Environment is in freeze window |
| `DIGEST_MISMATCH` | 400 | Image digest does not match expected |
| `AGENT_OFFLINE` | 503 | Required agent is offline |
| `WORKFLOW_FAILED` | 500 | Workflow execution failed |
| `PLUGIN_ERROR` | 500 | Plugin returned an error |
| `QUOTA_EXCEEDED` | 429 | Digest analysis quota exceeded |
| `VALIDATION_ERROR` | 400 | Request validation failed |
| `UNAUTHORIZED` | 401 | Authentication required |
| `FORBIDDEN` | 403 | Insufficient permissions |
---
## Default Values
| Setting | Default | Description |
|---------|---------|-------------|
| Agent heartbeat interval | 30s | Frequency of agent heartbeats |
| Task timeout | 600s | Maximum time for agent task |
| Deployment batch size | 25% | Percentage of targets per batch |
| Health check timeout | 60s | Timeout for health checks |
| Evidence retention | 7 years | Audit compliance requirement |
| Max workflow steps | 50 | Maximum steps per workflow |
| Max parallel tasks | 10 | Per-agent concurrent tasks |
---
## See Also
- [Security Overview](../security/overview.md)
- [Promotion Manager](../modules/promotion-manager.md)
- [Database Schema](../data-model/schema.md)
- [Glossary](glossary.md)

# API Error Codes
## Overview
All API errors follow a consistent format with error codes for programmatic handling.
## Error Response Format
```typescript
interface ApiErrorResponse {
success: false;
error: {
code: string; // Machine-readable error code
message: string; // Human-readable message
details?: object; // Additional context
validationErrors?: ValidationError[];
};
meta: {
requestId: string;
timestamp: string;
};
}
interface ValidationError {
field: string;
message: string;
code: string;
}
```
## Error Code Categories
| Prefix | Category | HTTP Status Range |
|--------|----------|-------------------|
| `AUTH_` | Authentication | 401 |
| `PERM_` | Authorization/Permission | 403 |
| `VAL_` | Validation | 400 |
| `RES_` | Resource | 404, 409 |
| `ENV_` | Environment | 422 |
| `REL_` | Release | 422 |
| `PROM_` | Promotion | 422 |
| `DEPLOY_` | Deployment | 422 |
| `GATE_` | Gate | 422 |
| `AGT_` | Agent | 422 |
| `INT_` | Integration | 422 |
| `WF_` | Workflow | 422 |
| `SYS_` | System | 500 |
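The prefix table above lends itself to simple programmatic handling: take the code up to and including the first underscore and look it up. The function below is an illustrative client-side sketch, not part of the API surface.

```javascript
// Maps an error code to its category via the documented prefix table.
// Unknown prefixes fall back to 'Unknown' rather than throwing.
const ERROR_CATEGORIES = {
  AUTH_: 'Authentication',
  PERM_: 'Authorization/Permission',
  VAL_: 'Validation',
  RES_: 'Resource',
  ENV_: 'Environment',
  REL_: 'Release',
  PROM_: 'Promotion',
  DEPLOY_: 'Deployment',
  GATE_: 'Gate',
  AGT_: 'Agent',
  INT_: 'Integration',
  WF_: 'Workflow',
  SYS_: 'System',
};

function categorizeError(code) {
  const idx = code.indexOf('_');
  if (idx === -1) return 'Unknown';
  return ERROR_CATEGORIES[code.slice(0, idx + 1)] || 'Unknown';
}
```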
## Authentication Errors (401)
| Code | Message | Description |
|------|---------|-------------|
| `AUTH_TOKEN_MISSING` | Authentication token required | No token provided |
| `AUTH_TOKEN_INVALID` | Invalid authentication token | Token cannot be parsed |
| `AUTH_TOKEN_EXPIRED` | Authentication token expired | Token has expired |
| `AUTH_TOKEN_REVOKED` | Authentication token revoked | Token has been revoked |
| `AUTH_AGENT_CERT_INVALID` | Invalid agent certificate | Agent mTLS cert invalid |
| `AUTH_AGENT_CERT_EXPIRED` | Agent certificate expired | Agent cert has expired |
| `AUTH_API_KEY_INVALID` | Invalid API key | API key not recognized |
## Permission Errors (403)
| Code | Message | Description |
|------|---------|-------------|
| `PERM_DENIED` | Permission denied | Generic permission denial |
| `PERM_RESOURCE_DENIED` | Access to resource denied | Cannot access specific resource |
| `PERM_ACTION_DENIED` | Action not permitted | Cannot perform specific action |
| `PERM_SCOPE_DENIED` | Outside permitted scope | Action outside user's scope |
| `PERM_SOD_VIOLATION` | Separation of duties violation | SoD prevents action |
| `PERM_SELF_APPROVAL` | Cannot approve own request | Self-approval not allowed |
| `PERM_TENANT_MISMATCH` | Tenant mismatch | Resource belongs to different tenant |
## Validation Errors (400)
| Code | Message | Description |
|------|---------|-------------|
| `VAL_REQUIRED_FIELD` | Required field missing | Field is required |
| `VAL_INVALID_FORMAT` | Invalid field format | Field format incorrect |
| `VAL_INVALID_VALUE` | Invalid field value | Value not in allowed set |
| `VAL_TOO_LONG` | Field value too long | Exceeds max length |
| `VAL_TOO_SHORT` | Field value too short | Below min length |
| `VAL_INVALID_UUID` | Invalid UUID format | Not a valid UUID |
| `VAL_INVALID_DIGEST` | Invalid digest format | Not a valid OCI digest |
| `VAL_INVALID_SEMVER` | Invalid semver format | Not valid semantic version |
| `VAL_INVALID_JSON` | Invalid JSON | Request body not valid JSON |
| `VAL_SCHEMA_MISMATCH` | Schema validation failed | Doesn't match schema |
## Resource Errors (404, 409)
| Code | Message | HTTP | Description |
|------|---------|------|-------------|
| `RES_NOT_FOUND` | Resource not found | 404 | Generic not found |
| `RES_ENVIRONMENT_NOT_FOUND` | Environment not found | 404 | Environment doesn't exist |
| `RES_RELEASE_NOT_FOUND` | Release not found | 404 | Release doesn't exist |
| `RES_PROMOTION_NOT_FOUND` | Promotion not found | 404 | Promotion doesn't exist |
| `RES_TARGET_NOT_FOUND` | Target not found | 404 | Target doesn't exist |
| `RES_AGENT_NOT_FOUND` | Agent not found | 404 | Agent doesn't exist |
| `RES_CONFLICT` | Resource conflict | 409 | Resource state conflict |
| `RES_ALREADY_EXISTS` | Resource already exists | 409 | Duplicate resource |
| `RES_VERSION_CONFLICT` | Version conflict | 409 | Optimistic lock failure |
## Environment Errors (422)
| Code | Message | Description |
|------|---------|-------------|
| `ENV_FROZEN` | Environment is frozen | Deployment blocked by freeze window |
| `ENV_FREEZE_ACTIVE` | Active freeze window | Cannot modify during freeze |
| `ENV_INVALID_ORDER` | Invalid environment order | Order index conflict |
| `ENV_CIRCULAR_PROMOTION` | Circular promotion path | Auto-promote creates cycle |
| `ENV_QUOTA_EXCEEDED` | Environment quota exceeded | Max environments reached |
## Release Errors (422)
| Code | Message | Description |
|------|---------|-------------|
| `REL_ALREADY_FINALIZED` | Release already finalized | Cannot modify finalized release |
| `REL_NOT_READY` | Release not ready | Release not in ready state |
| `REL_DIGEST_MISMATCH` | Digest mismatch | Resolved digest differs |
| `REL_TAG_NOT_FOUND` | Tag not found in registry | Cannot resolve tag |
| `REL_COMPONENT_MISSING` | Component not found | Referenced component missing |
| `REL_INVALID_STATUS_TRANSITION` | Invalid status transition | Status change not allowed |
| `REL_DEPRECATED` | Release deprecated | Cannot promote deprecated release |
## Promotion Errors (422)
| Code | Message | Description |
|------|---------|-------------|
| `PROM_ALREADY_EXISTS` | Promotion already pending | Duplicate promotion request |
| `PROM_NOT_PENDING` | Promotion not pending | Cannot approve/reject |
| `PROM_ALREADY_APPROVED` | Promotion already approved | Already approved |
| `PROM_ALREADY_REJECTED` | Promotion already rejected | Already rejected |
| `PROM_ALREADY_CANCELLED` | Promotion already cancelled | Already cancelled |
| `PROM_DEPLOYING` | Promotion is deploying | Cannot cancel during deploy |
| `PROM_INVALID_STATE` | Invalid promotion state | State doesn't allow action |
| `PROM_APPROVER_REQUIRED` | Additional approvers required | Insufficient approvals |
| `PROM_SKIP_ENVIRONMENT` | Cannot skip environments | Must promote sequentially |
## Deployment Errors (422)
| Code | Message | Description |
|------|---------|-------------|
| `DEPLOY_IN_PROGRESS` | Deployment in progress | Another deployment running |
| `DEPLOY_NO_TARGETS` | No targets available | No targets in environment |
| `DEPLOY_TARGET_UNHEALTHY` | Target unhealthy | Target failed health check |
| `DEPLOY_AGENT_UNAVAILABLE` | Agent unavailable | Required agent offline |
| `DEPLOY_ARTIFACT_MISSING` | Deployment artifact missing | Required artifact not found |
| `DEPLOY_TIMEOUT` | Deployment timeout | Exceeded timeout |
| `DEPLOY_PULL_FAILED` | Image pull failed | Cannot pull container image |
| `DEPLOY_DIGEST_VERIFICATION_FAILED` | Digest verification failed | Image tampered |
| `DEPLOY_HEALTH_CHECK_FAILED` | Health check failed | Post-deploy health failed |
| `DEPLOY_ROLLBACK_IN_PROGRESS` | Rollback in progress | Already rolling back |
| `DEPLOY_NOTHING_TO_ROLLBACK` | Nothing to rollback | No previous deployment |
## Gate Errors (422)
| Code | Message | Description |
|------|---------|-------------|
| `GATE_EVALUATION_FAILED` | Gate evaluation failed | Gate cannot be evaluated |
| `GATE_SECURITY_BLOCKED` | Blocked by security gate | Security policy violation |
| `GATE_POLICY_BLOCKED` | Blocked by policy gate | Custom policy violation |
| `GATE_APPROVAL_BLOCKED` | Blocked pending approval | Awaiting approval |
| `GATE_TIMEOUT` | Gate evaluation timeout | Evaluation exceeded timeout |
## Agent Errors (422)
| Code | Message | Description |
|------|---------|-------------|
| `AGT_REGISTRATION_FAILED` | Agent registration failed | Cannot register agent |
| `AGT_TOKEN_INVALID` | Invalid registration token | Bad or expired token |
| `AGT_TOKEN_USED` | Registration token already used | One-time token reused |
| `AGT_CERTIFICATE_FAILED` | Certificate issuance failed | Cannot issue certificate |
| `AGT_OFFLINE` | Agent offline | Agent not responding |
| `AGT_CAPABILITY_MISSING` | Missing capability | Agent lacks required capability |
| `AGT_TASK_FAILED` | Task execution failed | Agent task failed |
| `AGT_HEARTBEAT_TIMEOUT` | Heartbeat timeout | Agent heartbeat overdue |
## Integration Errors (422)
| Code | Message | Description |
|------|---------|-------------|
| `INT_CONNECTION_FAILED` | Connection failed | Cannot connect to integration |
| `INT_AUTH_FAILED` | Authentication failed | Integration auth failed |
| `INT_RATE_LIMITED` | Rate limited | Integration rate limit hit |
| `INT_TIMEOUT` | Integration timeout | Request timeout |
| `INT_INVALID_RESPONSE` | Invalid response | Unexpected response format |
| `INT_RESOURCE_NOT_FOUND` | External resource not found | Registry/SCM resource missing |
## Workflow Errors (422)
| Code | Message | Description |
|------|---------|-------------|
| `WF_TEMPLATE_NOT_FOUND` | Workflow template not found | Template doesn't exist |
| `WF_TEMPLATE_INVALID` | Invalid workflow template | Template validation failed |
| `WF_CYCLE_DETECTED` | Cycle detected in workflow | DAG contains cycle |
| `WF_STEP_FAILED` | Workflow step failed | Step execution failed |
| `WF_ALREADY_RUNNING` | Workflow already running | Duplicate workflow run |
| `WF_INVALID_STATE` | Invalid workflow state | Cannot perform action |
| `WF_EXPRESSION_ERROR` | Expression evaluation error | Bad expression |
## System Errors (500)
| Code | Message | Description |
|------|---------|-------------|
| `SYS_INTERNAL_ERROR` | Internal server error | Unexpected error |
| `SYS_DATABASE_ERROR` | Database error | Database operation failed |
| `SYS_STORAGE_ERROR` | Storage error | Storage operation failed |
| `SYS_VAULT_ERROR` | Vault error | Secret retrieval failed |
| `SYS_QUEUE_ERROR` | Queue error | Message queue failed |
| `SYS_SERVICE_UNAVAILABLE` | Service unavailable | Dependency unavailable |
| `SYS_OVERLOADED` | System overloaded | Capacity exceeded |
## Example Error Responses
### Validation Error
```json
{
  "success": false,
  "error": {
    "code": "VAL_REQUIRED_FIELD",
    "message": "Validation failed",
    "validationErrors": [
      {
        "field": "releaseId",
        "message": "Release ID is required",
        "code": "VAL_REQUIRED_FIELD"
      },
      {
        "field": "targetEnvironmentId",
        "message": "Invalid UUID format",
        "code": "VAL_INVALID_UUID"
      }
    ]
  },
  "meta": {
    "requestId": "req-12345",
    "timestamp": "2026-01-10T14:30:00Z"
  }
}
```
### Permission Error
```json
{
  "success": false,
  "error": {
    "code": "PERM_SOD_VIOLATION",
    "message": "Separation of duties violation: requester cannot approve their own promotion",
    "details": {
      "promotionId": "promo-uuid",
      "requesterId": "user-uuid",
      "approverId": "user-uuid",
      "environmentId": "env-uuid",
      "requiresSoD": true
    }
  },
  "meta": {
    "requestId": "req-12345",
    "timestamp": "2026-01-10T14:30:00Z"
  }
}
```
### Gate Block Error
```json
{
  "success": false,
  "error": {
    "code": "GATE_SECURITY_BLOCKED",
    "message": "Promotion blocked by security gate",
    "details": {
      "gateName": "security-gate",
      "releaseId": "rel-uuid",
      "targetEnvironment": "production",
      "violations": [
        {
          "type": "critical_vulnerability",
          "count": 3,
          "threshold": 0
        }
      ]
    }
  },
  "meta": {
    "requestId": "req-12345",
    "timestamp": "2026-01-10T14:30:00Z"
  }
}
```
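The examples above share a common envelope, so a client can key its retry behavior off `error.code`. A minimal sketch — the helper name and the classification rules are illustrative assumptions, not part of the documented API:

```typescript
// Hypothetical client-side helper: maps a Release Orchestrator error
// envelope to a coarse retry decision, following the envelope shape
// shown in the examples above.
type ErrorEnvelope = {
  success: false;
  error: { code: string; message: string };
  meta: { requestId: string; timestamp: string };
};

function isRetryable(envelope: ErrorEnvelope): boolean {
  const code = envelope.error.code;
  // Rate limiting and system-level failures (SYS_*) are usually transient;
  // validation, permission, and gate blocks are not.
  if (code === "INT_RATE_LIMITED" || code === "SYS_OVERLOADED") return true;
  if (code.startsWith("SYS_") && code !== "SYS_INTERNAL_ERROR") return true;
  return false;
}
```

In practice the retry set would come from operator configuration rather than a hard-coded list.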
## References
- [API Overview](../api/overview.md)
- [Security Overview](../security/overview.md)

# Evidence Packet Schema
## Overview
Evidence packets are cryptographically signed, immutable records of deployment decisions and outcomes. They provide audit-grade proof of who did what, when, and why.
## Evidence Packet Types
| Type | Description | Generated When |
|------|-------------|----------------|
| `release_decision` | Promotion decision evidence | Promotion approved/rejected |
| `deployment` | Deployment execution evidence | Deployment completes |
| `rollback` | Rollback evidence | Rollback completes |
| `ab_promotion` | A/B release promotion evidence | A/B promotion completes |
## Schema Definition
### Evidence Packet Structure
```typescript
interface EvidencePacket {
  // Identification
  id: UUID;
  version: "1.0";
  type: EvidencePacketType;

  // Metadata
  generatedAt: DateTime;
  generatorVersion: string;
  tenantId: UUID;

  // Content
  content: EvidenceContent;

  // Integrity
  contentHash: string;         // SHA-256 of canonical JSON content
  signature: string;           // Base64-encoded signature
  signatureAlgorithm: string;  // "RS256", "ES256"
  signerKeyRef: string;        // Reference to signing key
}

type EvidencePacketType =
  | "release_decision"
  | "deployment"
  | "rollback"
  | "ab_promotion";
```
### Evidence Content
```typescript
interface EvidenceContent {
  // What was released
  release: ReleaseEvidence;

  // Where it was released
  environment: EnvironmentEvidence;

  // Who requested and approved
  actors: ActorEvidence;

  // Why it was allowed
  decision: DecisionEvidence;

  // How it was executed (deployment only)
  execution?: ExecutionEvidence;

  // Previous state (for rollback)
  previous?: PreviousStateEvidence;
}
```
### Release Evidence
```typescript
interface ReleaseEvidence {
  id: UUID;
  name: string;
  displayName: string;
  createdAt: DateTime;
  createdBy: ActorRef;

  components: Array<{
    id: UUID;
    name: string;
    digest: string;
    semver: string;
    tag: string;
    role: "primary" | "sidecar" | "init" | "migration";
  }>;

  sourceRef?: {
    scmIntegrationId?: UUID;
    repository?: string;
    commitSha?: string;
    branch?: string;
    ciIntegrationId?: UUID;
    buildId?: string;
    pipelineUrl?: string;
  };
}
```
### Environment Evidence
```typescript
interface EnvironmentEvidence {
  id: UUID;
  name: string;
  displayName: string;
  orderIndex: number;

  targets: Array<{
    id: UUID;
    name: string;
    type: string;
    healthStatus: string;
  }>;

  configuration: {
    requiredApprovals: number;
    requireSeparationOfDuties: boolean;
    promotionPolicy?: string;
    deploymentTimeout: number;
  };
}
```
### Actor Evidence
```typescript
interface ActorEvidence {
  requester: ActorRef;
  requestReason: string;
  requestedAt: DateTime;

  approvers: Array<{
    actor: ActorRef;
    action: "approved" | "rejected";
    comment?: string;
    timestamp: DateTime;
    roles: string[];
  }>;

  deployer?: {
    agent: AgentRef;
    triggeredBy: ActorRef;
    startedAt: DateTime;
  };
}

interface ActorRef {
  id: UUID;
  type: "user" | "system" | "agent";
  name: string;
  email?: string;
}

interface AgentRef {
  id: UUID;
  name: string;
  version: string;
}
```
### Decision Evidence
```typescript
interface DecisionEvidence {
  promotionId: UUID;
  decision: "allow" | "block";
  decidedAt: DateTime;

  gateResults: Array<{
    gateName: string;
    gateType: string;
    passed: boolean;
    blocking: boolean;
    message: string;
    evaluatedAt: DateTime;
    details: object;
  }>;

  freezeWindowCheck: {
    checked: boolean;
    windowActive: boolean;
    windowId?: UUID;
    exemption?: {
      grantedBy: UUID;
      reason: string;
    };
  };

  separationOfDuties: {
    required: boolean;
    satisfied: boolean;
    requesterIds: UUID[];
    approverIds: UUID[];
  };
}
```
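The `separationOfDuties` block is fully derivable from the requester and approver ID sets. A minimal sketch — the helper name and shape are assumptions for illustration, not part of the documented API:

```typescript
// Illustrative: derive the separationOfDuties evidence block from
// requester and approver IDs.
interface SodEvidence {
  required: boolean;
  satisfied: boolean;
  requesterIds: string[];
  approverIds: string[];
}

function buildSodEvidence(
  required: boolean,
  requesterIds: string[],
  approverIds: string[]
): SodEvidence {
  // SoD is satisfied when no requester also appears among the approvers
  // (or when SoD is not required for the target environment).
  const requesters = new Set(requesterIds);
  const overlap = approverIds.some((id) => requesters.has(id));
  return {
    required,
    satisfied: !required || !overlap,
    requesterIds,
    approverIds,
  };
}
```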
### Execution Evidence
```typescript
interface ExecutionEvidence {
  deploymentJobId: UUID;
  strategy: string;
  startedAt: DateTime;
  completedAt: DateTime;
  status: "succeeded" | "failed" | "rolled_back";

  tasks: Array<{
    targetId: UUID;
    targetName: string;
    agentId: UUID;
    status: string;
    startedAt: DateTime;
    completedAt: DateTime;
    digest: string;
    stickerWritten: boolean;
    error?: string;
  }>;

  artifacts: Array<{
    name: string;
    type: string;
    sha256: string;
    storageRef: string;
  }>;

  metrics: {
    totalTasks: number;
    succeededTasks: number;
    failedTasks: number;
    totalDurationSeconds: number;
  };
}
```
### Previous State Evidence
```typescript
interface PreviousStateEvidence {
  releaseId: UUID;
  releaseName: string;
  deployedAt: DateTime;
  deployedBy: ActorRef;

  components: Array<{
    name: string;
    digest: string;
  }>;
}
```
## Example Evidence Packet
```json
{
  "id": "evid-12345-uuid",
  "version": "1.0",
  "type": "deployment",
  "generatedAt": "2026-01-10T14:35:00Z",
  "generatorVersion": "stella-evidence-generator@1.5.0",
  "tenantId": "tenant-uuid",
  "content": {
    "release": {
      "id": "rel-uuid",
      "name": "myapp-v2.3.1",
      "displayName": "MyApp v2.3.1",
      "createdAt": "2026-01-10T10:00:00Z",
      "createdBy": {
        "id": "user-uuid",
        "type": "user",
        "name": "John Doe",
        "email": "john@example.com"
      },
      "components": [
        {
          "id": "comp-api-uuid",
          "name": "api",
          "digest": "sha256:abc123def456...",
          "semver": "2.3.1",
          "tag": "v2.3.1",
          "role": "primary"
        },
        {
          "id": "comp-worker-uuid",
          "name": "worker",
          "digest": "sha256:789xyz...",
          "semver": "2.3.1",
          "tag": "v2.3.1",
          "role": "primary"
        }
      ],
      "sourceRef": {
        "repository": "github.com/myorg/myapp",
        "commitSha": "abc123",
        "branch": "main",
        "buildId": "build-456"
      }
    },
    "environment": {
      "id": "env-prod-uuid",
      "name": "production",
      "displayName": "Production",
      "orderIndex": 2,
      "targets": [
        {
          "id": "target-1-uuid",
          "name": "prod-web-01",
          "type": "compose_host",
          "healthStatus": "healthy"
        },
        {
          "id": "target-2-uuid",
          "name": "prod-web-02",
          "type": "compose_host",
          "healthStatus": "healthy"
        }
      ],
      "configuration": {
        "requiredApprovals": 2,
        "requireSeparationOfDuties": true,
        "deploymentTimeout": 600
      }
    },
    "actors": {
      "requester": {
        "id": "user-john-uuid",
        "type": "user",
        "name": "John Doe",
        "email": "john@example.com"
      },
      "requestReason": "Release v2.3.1 with performance improvements",
      "requestedAt": "2026-01-10T12:00:00Z",
      "approvers": [
        {
          "actor": {
            "id": "user-jane-uuid",
            "type": "user",
            "name": "Jane Smith",
            "email": "jane@example.com"
          },
          "action": "approved",
          "comment": "LGTM, tests passed",
          "timestamp": "2026-01-10T13:00:00Z",
          "roles": ["release_manager"]
        },
        {
          "actor": {
            "id": "user-bob-uuid",
            "type": "user",
            "name": "Bob Johnson",
            "email": "bob@example.com"
          },
          "action": "approved",
          "comment": "Approved for production",
          "timestamp": "2026-01-10T13:30:00Z",
          "roles": ["approver"]
        }
      ],
      "deployer": {
        "agent": {
          "id": "agent-prod-uuid",
          "name": "prod-agent-01",
          "version": "1.5.0"
        },
        "triggeredBy": {
          "id": "system",
          "type": "system",
          "name": "Stella Orchestrator"
        },
        "startedAt": "2026-01-10T14:00:00Z"
      }
    },
    "decision": {
      "promotionId": "promo-uuid",
      "decision": "allow",
      "decidedAt": "2026-01-10T13:55:00Z",
      "gateResults": [
        {
          "gateName": "security-gate",
          "gateType": "security",
          "passed": true,
          "blocking": true,
          "message": "No critical or high vulnerabilities",
          "evaluatedAt": "2026-01-10T13:50:00Z",
          "details": {
            "critical": 0,
            "high": 0,
            "medium": 5,
            "low": 12
          }
        },
        {
          "gateName": "approval-gate",
          "gateType": "approval",
          "passed": true,
          "blocking": true,
          "message": "2/2 required approvals received",
          "evaluatedAt": "2026-01-10T13:55:00Z",
          "details": {
            "required": 2,
            "received": 2
          }
        }
      ],
      "freezeWindowCheck": {
        "checked": true,
        "windowActive": false
      },
      "separationOfDuties": {
        "required": true,
        "satisfied": true,
        "requesterIds": ["user-john-uuid"],
        "approverIds": ["user-jane-uuid", "user-bob-uuid"]
      }
    },
    "execution": {
      "deploymentJobId": "job-uuid",
      "strategy": "rolling",
      "startedAt": "2026-01-10T14:00:00Z",
      "completedAt": "2026-01-10T14:35:00Z",
      "status": "succeeded",
      "tasks": [
        {
          "targetId": "target-1-uuid",
          "targetName": "prod-web-01",
          "agentId": "agent-prod-uuid",
          "status": "succeeded",
          "startedAt": "2026-01-10T14:00:00Z",
          "completedAt": "2026-01-10T14:15:00Z",
          "digest": "sha256:abc123def456...",
          "stickerWritten": true
        },
        {
          "targetId": "target-2-uuid",
          "targetName": "prod-web-02",
          "agentId": "agent-prod-uuid",
          "status": "succeeded",
          "startedAt": "2026-01-10T14:20:00Z",
          "completedAt": "2026-01-10T14:35:00Z",
          "digest": "sha256:abc123def456...",
          "stickerWritten": true
        }
      ],
      "artifacts": [
        {
          "name": "compose.stella.lock.yml",
          "type": "compose-lock",
          "sha256": "checksum...",
          "storageRef": "s3://artifacts/job-uuid/compose.stella.lock.yml"
        }
      ],
      "metrics": {
        "totalTasks": 2,
        "succeededTasks": 2,
        "failedTasks": 0,
        "totalDurationSeconds": 2100
      }
    }
  },
  "contentHash": "sha256:content-hash...",
  "signature": "base64-signature...",
  "signatureAlgorithm": "RS256",
  "signerKeyRef": "stella/signing/prod-key-2026"
}
```
## Signature Verification
```typescript
async function verifyEvidencePacket(packet: EvidencePacket): Promise<VerificationResult> {
  // 1. Recompute the content hash over the canonical JSON form.
  //    `canonicalize` and `sha256` are helpers (e.g. RFC 8785 canonical
  //    JSON plus a SHA-256 digest from the platform crypto provider).
  const canonicalContent = canonicalize(packet.content);
  const computedHash = sha256(canonicalContent);
  if (computedHash !== packet.contentHash) {
    return { valid: false, error: "Content hash mismatch" };
  }

  // 2. Resolve the public key referenced by the packet.
  const publicKey = await getPublicKey(packet.signerKeyRef);

  // 3. Verify the signature over the content hash.
  const signatureValid = await verify(
    packet.signature,
    packet.contentHash,
    publicKey,
    packet.signatureAlgorithm
  );
  if (!signatureValid) {
    return { valid: false, error: "Invalid signature" };
  }

  return { valid: true };
}
```
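The `canonicalize` helper used above must produce byte-identical output for semantically equal content. A minimal sketch — recursively sorting object keys — is shown below; a real deployment would use a full canonical-JSON scheme such as RFC 8785 (JCS), so treat this as an assumption, not the product's actual implementation:

```typescript
// Deterministic JSON serialization: object keys sorted recursively so the
// same content always hashes to the same bytes regardless of insertion order.
function canonicalize(value: unknown): string {
  if (value === null || typeof value !== "object") {
    // Primitives serialize as plain JSON.
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    // Array order is significant and preserved.
    return "[" + value.map((v) => canonicalize(v)).join(",") + "]";
  }
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj).sort();
  const body = keys.map((k) => JSON.stringify(k) + ":" + canonicalize(obj[k]));
  return "{" + body.join(",") + "}";
}
```

Two packets whose `content` differs only in key order then produce the same `contentHash`.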
## Storage
Evidence packets are stored in an append-only table:
```sql
CREATE TABLE release.evidence_packets (
    id                  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id           UUID NOT NULL REFERENCES tenants(id),
    promotion_id        UUID NOT NULL REFERENCES release.promotions(id),
    type                TEXT NOT NULL,
    version             TEXT NOT NULL DEFAULT '1.0',
    content             JSONB NOT NULL,
    content_hash        TEXT NOT NULL,
    signature           TEXT NOT NULL,
    signature_algorithm TEXT NOT NULL,
    signer_key_ref      TEXT NOT NULL,
    generated_at        TIMESTAMPTZ NOT NULL,
    generator_version   TEXT NOT NULL,
    created_at          TIMESTAMPTZ NOT NULL DEFAULT now()
    -- Note: No updated_at - packets are immutable
);

-- Prevent modifications
REVOKE UPDATE, DELETE ON release.evidence_packets FROM app_role;
```
## Export Formats
Evidence packets can be exported in multiple formats:
| Format | Use Case |
|--------|----------|
| JSON | API consumption, archival |
| PDF | Human-readable compliance reports |
| CSV | Spreadsheet analysis |
| SLSA | SLSA provenance format |
## References
- [Security Overview](../security/overview.md)
- [Deployment Artifacts](../deployment/artifacts.md)
- [Audit Trail](../security/audit-trail.md)

# Glossary
## Core Concepts
### Agent
A software component installed on deployment targets that receives and executes deployment tasks. Agents communicate with the orchestrator via mTLS and execute deployments locally on the target.
### Approval
A human decision to authorize a promotion request. Approvals may require multiple approvers and enforce separation of duties.
### Approval Policy
Rules defining who can approve promotions to specific environments, including required approval counts and SoD requirements.
### Blue-Green Deployment
A deployment strategy using two identical production environments. Traffic switches from "blue" (current) to "green" (new) after validation.
### Canary Deployment
A deployment strategy that gradually rolls out changes to a small subset of targets before full deployment, allowing validation with real traffic.
### Channel
A version stream for components (e.g., "stable", "beta", "nightly"). Each channel tracks the latest compatible version.
### Component
A deployable unit mapped to a container image repository. Components have versions tracked via digest.
### Compose Lock
A Docker Compose file with all image references pinned to specific digests, ensuring reproducible deployments.
### Connector
A plugin that integrates Release Orchestrator with external systems (registries, CI/CD, notifications, etc.).
### Decision Record
An immutable record of all gate evaluations and conditions considered when making a promotion decision.
### Deployment Job
A unit of work representing the deployment of a release to an environment. Contains multiple deployment tasks.
### Deployment Task
A single target-level deployment operation within a deployment job.
### Digest
A cryptographic hash (SHA-256) that uniquely identifies a container image. Format: `sha256:abc123...`
### Drift
A mismatch between the expected deployed version (from version sticker) and the actual running version on a target.
### Environment
A logical grouping of deployment targets representing a stage in the promotion pipeline (e.g., dev, staging, production).
### Evidence Packet
An immutable, cryptographically signed record of deployment decisions and outcomes for audit purposes.
### Freeze Window
A time period during which deployments to an environment are blocked (e.g., holiday code freeze).
### Gate
A checkpoint in the promotion workflow that must pass before deployment proceeds. Types include security gates, approval gates, and custom policy gates.
### Promotion
The process of moving a release from one environment to another, subject to gates and approvals.
### Release
A versioned bundle of component digests representing a deployable unit. Releases are immutable once created.
### Rollback
The process of reverting to a previous release version when a deployment fails or causes issues.
### Rolling Deployment
A deployment strategy that updates targets in batches, maintaining availability throughout the process.
### Security Gate
An automated gate that evaluates security policies (vulnerability thresholds, compliance requirements) before allowing promotion.
### Separation of Duties (SoD)
A security principle requiring that the person who requests a promotion cannot be the same person who approves it.
### Step
A single unit of work within a workflow template. Steps have types (deploy, approve, notify, etc.) and can have dependencies.
### Target
A specific deployment destination (host, service, container) within an environment.
### Tenant
An isolated organizational unit with its own environments, releases, and configurations. Multi-tenancy ensures data isolation.
### Version Map
A mapping of image tags to digests for a component, allowing tag-based references while maintaining digest-based deployments.
### Version Sticker
Metadata placed on deployment targets indicating the currently deployed release and digest.
### Workflow
A DAG (Directed Acyclic Graph) of steps defining the deployment process, including gates, approvals, and verification.
### Workflow Template
A reusable workflow definition that can be customized for specific deployment scenarios.
## Module Abbreviations
| Abbreviation | Full Name | Description |
|--------------|-----------|-------------|
| INTHUB | Integration Hub | External system integration |
| ENVMGR | Environment Manager | Environment and target management |
| RELMAN | Release Management | Component and release management |
| WORKFL | Workflow Engine | Workflow execution |
| PROMOT | Promotion & Approval | Promotion and approval handling |
| DEPLOY | Deployment Execution | Deployment orchestration |
| AGENTS | Deployment Agents | Agent management |
| PROGDL | Progressive Delivery | A/B and canary releases |
| RELEVI | Release Evidence | Audit and compliance |
| PLUGIN | Plugin Infrastructure | Plugin system |
## Deployment Strategies
| Strategy | Description |
|----------|-------------|
| All-at-once | Deploy to all targets simultaneously |
| Rolling | Deploy in batches with availability |
| Canary | Gradual rollout with metrics validation |
| Blue-Green | Parallel environment with traffic switch |
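The rolling strategy above hinges on batching: only one batch is out of service at a time. A sketch of the batching step, with batch-size semantics that are illustrative only:

```typescript
// Slice targets into fixed-size batches. Each batch is drained, updated,
// and health-checked before the next batch starts, so the remaining
// targets keep serving traffic throughout the rollout.
function rollingBatches<T>(targets: T[], batchSize: number): T[][] {
  if (batchSize < 1) throw new Error("batchSize must be >= 1");
  const batches: T[][] = [];
  for (let i = 0; i < targets.length; i += batchSize) {
    batches.push(targets.slice(i, i + batchSize));
  }
  return batches;
}
```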
## Status Values
### Promotion Status
| Status | Description |
|--------|-------------|
| `pending` | Promotion created, not yet evaluated |
| `pending_approval` | Waiting for human approval |
| `approved` | Approved, ready for deployment |
| `rejected` | Rejected by approver |
| `deploying` | Deployment in progress |
| `completed` | Successfully deployed |
| `failed` | Deployment failed |
| `cancelled` | Cancelled by user |
### Deployment Job Status
| Status | Description |
|--------|-------------|
| `pending` | Job created, not started |
| `preparing` | Generating artifacts |
| `running` | Tasks executing |
| `completing` | Verifying deployment |
| `completed` | Successfully completed |
| `failed` | Deployment failed |
| `rolling_back` | Rollback in progress |
| `rolled_back` | Rollback completed |
### Agent Status
| Status | Description |
|--------|-------------|
| `online` | Agent connected and healthy |
| `offline` | Agent not connected |
| `degraded` | Agent connected but reporting issues |
### Target Health Status
| Status | Description |
|--------|-------------|
| `healthy` | Target responding correctly |
| `unhealthy` | Target failing health checks |
| `unknown` | Health status not determined |
## API Error Codes
| Code | Description |
|------|-------------|
| `RELEASE_NOT_FOUND` | Release ID does not exist |
| `ENVIRONMENT_NOT_FOUND` | Environment ID does not exist |
| `PROMOTION_BLOCKED` | Promotion blocked by gate or freeze |
| `APPROVAL_REQUIRED` | Promotion requires approval |
| `INSUFFICIENT_APPROVALS` | Not enough approvals |
| `SOD_VIOLATION` | Separation of duties violated |
| `FREEZE_WINDOW_ACTIVE` | Environment in freeze window |
| `SECURITY_GATE_FAILED` | Security requirements not met |
| `NO_AGENT_AVAILABLE` | No agent available for target |
| `DEPLOYMENT_IN_PROGRESS` | Another deployment running |
| `ROLLBACK_NOT_POSSIBLE` | No previous version to roll back to |
## Integration Types
| Type | Category | Description |
|------|----------|-------------|
| `docker-registry` | Registry | Docker Registry v2 |
| `ecr` | Registry | AWS ECR |
| `acr` | Registry | Azure Container Registry |
| `gcr` | Registry | Google Container Registry |
| `harbor` | Registry | Harbor Registry |
| `gitlab-ci` | CI/CD | GitLab CI/CD |
| `github-actions` | CI/CD | GitHub Actions |
| `jenkins` | CI/CD | Jenkins |
| `slack` | Notification | Slack |
| `teams` | Notification | Microsoft Teams |
| `email` | Notification | Email (SMTP) |
| `hashicorp-vault` | Secrets | HashiCorp Vault |
| `prometheus` | Metrics | Prometheus |
## Workflow Step Types
| Type | Category | Description |
|------|----------|-------------|
| `approval` | Control | Wait for human approval |
| `wait` | Control | Wait for duration |
| `condition` | Control | Branch based on condition |
| `parallel` | Control | Execute children in parallel |
| `security-gate` | Gate | Evaluate security policy |
| `custom-gate` | Gate | Custom OPA policy |
| `freeze-check` | Gate | Check freeze windows |
| `deploy-docker` | Deploy | Deploy single container |
| `deploy-compose` | Deploy | Deploy Compose stack |
| `health-check` | Verify | HTTP/TCP health check |
| `smoke-test` | Verify | Run smoke tests |
| `notify` | Notify | Send notification |
| `webhook` | Integration | Call external webhook |
| `trigger-ci` | Integration | Trigger CI pipeline |
| `rollback` | Recovery | Rollback deployment |
## Security Terms
| Term | Description |
|------|-------------|
| mTLS | Mutual TLS - both client and server authenticate with certificates |
| JWT | JSON Web Token - used for API authentication |
| RBAC | Role-Based Access Control |
| OPA | Open Policy Agent - policy evaluation engine |
| SoD | Separation of Duties |
| PEP | Policy Enforcement Point |
## References
- [Design Principles](../design/principles.md)
- [API Overview](../api/overview.md)
- [Security Overview](../security/overview.md)

# Release Orchestrator Architecture
> Technical architecture specification for the Release Orchestrator — Stella Ops Suite's central release control plane for non-Kubernetes container estates.
**Status:** Planned (not yet implemented)
## Overview
The Release Orchestrator transforms Stella Ops Suite from a vulnerability scanning platform into a centralized, auditable release control plane. It sits between CI systems and runtime targets, governing promotion across environments, enforcing security and policy gates, and producing verifiable evidence for every release decision.
### Core Value Proposition
- **Release orchestration** — UI-driven promotion (Dev → Stage → Prod), approvals, policy gates, rollbacks
- **Security decisioning as a gate** — Scan on build, evaluate on release, re-evaluate on CVE updates
- **OCI-digest-first releases** — Immutable digest-based release identity
- **Toolchain-agnostic integrations** — Plug into any SCM, CI, registry, secrets system
- **Auditability + standards** — Evidence packets, SBOM/VEX/attestation support, deterministic replay
## Design Principles
1. **Digest-First Release Identity** — A release is an immutable set of OCI digests, never mutable tags. Tags are resolved to digests at release creation time.
2. **Pluggable Everything, Stable Core** — Integrations are plugins; the core orchestration engine is stable. Plugins contribute UI screens, connector logic, step types, and agent types.
3. **Evidence for Every Decision** — Every deployment/promotion produces an immutable evidence record containing who, what, why, how, and when.
4. **No Feature Gating** — All plans include all features. Limits are only: environments, new digests/day, fair use on deployments.
5. **Offline-First Operation** — All core operations work in air-gapped environments. Plugins may require connectivity; core does not.
6. **Immutable Generated Artifacts** — Every deployment generates and stores immutable artifacts (compose lockfiles, scripts, evidence).
## Platform Themes
The Release Orchestrator introduces ten new functional themes:
| Theme | Purpose | Key Modules |
|-------|---------|-------------|
| **INTHUB** | Integration hub | Integration Manager, Connection Profiles, Connector Runtime |
| **ENVMGR** | Environment management | Environment Manager, Target Registry, Agent Manager |
| **RELMAN** | Release management | Component Registry, Version Manager, Release Manager |
| **WORKFL** | Workflow engine | Workflow Designer, Workflow Engine, Step Executor |
| **PROMOT** | Promotion and approval | Promotion Manager, Approval Gateway, Decision Engine |
| **DEPLOY** | Deployment execution | Deploy Orchestrator, Target Executor, Artifact Generator |
| **AGENTS** | Deployment agents | Agent Core, Docker/Compose/ECS/Nomad agents |
| **PROGDL** | Progressive delivery | A/B Manager, Traffic Router, Canary Controller |
| **RELEVI** | Release evidence | Evidence Collector, Sticker Writer, Audit Exporter |
| **PLUGIN** | Plugin infrastructure | Plugin Registry, Plugin Loader, Plugin SDK |
## Components
```
ReleaseOrchestrator/
├── __Libraries/
│ ├── StellaOps.ReleaseOrchestrator.Core/ # Core domain models
│ ├── StellaOps.ReleaseOrchestrator.Workflow/ # DAG workflow engine
│ ├── StellaOps.ReleaseOrchestrator.Promotion/ # Promotion logic
│ ├── StellaOps.ReleaseOrchestrator.Deploy/ # Deployment coordination
│ ├── StellaOps.ReleaseOrchestrator.Evidence/ # Evidence generation
│ ├── StellaOps.ReleaseOrchestrator.Plugin/ # Plugin infrastructure
│ └── StellaOps.ReleaseOrchestrator.Integration/ # Integration connectors
├── StellaOps.ReleaseOrchestrator.WebService/ # HTTP API
├── StellaOps.ReleaseOrchestrator.Worker/ # Background processing
├── StellaOps.Agent.Core/ # Agent base framework
├── StellaOps.Agent.Docker/ # Docker host agent
├── StellaOps.Agent.Compose/ # Docker Compose agent
├── StellaOps.Agent.SSH/ # SSH agentless executor
├── StellaOps.Agent.WinRM/ # WinRM agentless executor
├── StellaOps.Agent.ECS/ # AWS ECS agent
├── StellaOps.Agent.Nomad/ # HashiCorp Nomad agent
└── __Tests/
    └── StellaOps.ReleaseOrchestrator.*.Tests/
```
## Data Flow
### Release Orchestration Flow
```
CI Build → Registry Push → Webhook → Stella Scan → Create Release →
Request Promotion → Gate Evaluation → Decision Record →
Deploy via Agent → Version Sticker → Evidence Packet
```
### Detailed Flow
1. **CI pushes image** to registry by digest; triggers webhook to Stella
2. **Stella scans** the new digest (if not already scanned); stores verdict
3. **Release created** bundling component digests with semantic version
4. **Promotion requested** to move release from source → target environment
5. **Gate evaluation** runs: security verdict, approval count, freeze windows, custom policies
6. **Decision record** produced with evidence refs and signed
7. **Deployment executed** via agent to target (Docker/Compose/ECS/Nomad)
8. **Version sticker** written to target for drift detection
9. **Evidence packet** sealed and stored
## Key Abstractions
### Environment
```csharp
public sealed record Environment
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required string Name { get; init; }           // "dev", "stage", "prod"
    public required string Slug { get; init; }           // URL-safe identifier
    public required int PromotionOrder { get; init; }    // 1, 2, 3...
    public required FreezeWindow[] FreezeWindows { get; init; }
    public required ApprovalPolicy ApprovalPolicy { get; init; }
    public required bool IsProduction { get; init; }
    public EnvironmentState State { get; init; }         // Active, Frozen, Retired
}
```
### Release
```csharp
public sealed record Release
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required string Version { get; init; }        // SemVer: "2.3.1"
    public required string Name { get; init; }           // Display name
    public required ImmutableDictionary<string, ComponentDigest> Components { get; init; }
    public required string SourceRef { get; init; }      // Git SHA or tag
    public required DateTimeOffset CreatedAt { get; init; }
    public required Guid CreatedBy { get; init; }
    public ReleaseState State { get; init; }             // Draft, Active, Deprecated
}

public sealed record ComponentDigest
{
    public required string Repository { get; init; }       // registry.example.com/app/api
    public required string Digest { get; init; }           // sha256:abc123...
    public required string? ResolvedFromTag { get; init; } // Optional: "v2.3.1"
}
```
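Digest-first identity means a tag reference is resolved to a digest exactly once, at release creation, and only the digest drives deployment afterwards. A validation sketch — the helper names and the strict `sha256:` rule are assumptions for illustration:

```typescript
// An OCI sha256 digest: "sha256:" followed by 64 lowercase hex characters.
const DIGEST_PATTERN = /^sha256:[0-9a-f]{64}$/;

function assertValidDigest(digest: string): void {
  if (!DIGEST_PATTERN.test(digest)) {
    throw new Error(`Not a valid OCI sha256 digest: ${digest}`);
  }
}

// Mirrors the ComponentDigest record: the tag is kept only as provenance
// metadata; deployment always uses the digest, never the (mutable) tag.
function toComponentDigest(
  repository: string,
  digest: string,
  resolvedFromTag?: string
): { repository: string; digest: string; resolvedFromTag: string | null } {
  assertValidDigest(digest);
  return { repository, digest, resolvedFromTag: resolvedFromTag ?? null };
}
```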
### Promotion
```csharp
public sealed record Promotion
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required Guid ReleaseId { get; init; }
    public required Guid SourceEnvironmentId { get; init; }
    public required Guid TargetEnvironmentId { get; init; }
    public required Guid RequestedBy { get; init; }
    public required DateTimeOffset RequestedAt { get; init; }
    public PromotionState State { get; init; }           // Pending, Approved, Rejected, Deployed, RolledBack
    public required ImmutableArray<GateResult> GateResults { get; init; }
    public required ImmutableArray<ApprovalRecord> Approvals { get; init; }
    public required DecisionRecord? Decision { get; init; }
}
```
### Workflow
```csharp
public sealed record Workflow
{
    public required Guid Id { get; init; }
    public required string Name { get; init; }
    public required ImmutableArray<WorkflowStep> Steps { get; init; }
    public required ImmutableDictionary<string, string[]> DependencyGraph { get; init; }
}

public sealed record WorkflowStep
{
    public required string Id { get; init; }
    public required string Type { get; init; }           // "script", "approval", "deploy", "gate"
    public required StepProvider Provider { get; init; }
    public required ImmutableDictionary<string, object> Config { get; init; }
    public required string[] DependsOn { get; init; }
    public StepState State { get; init; }
}
```
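Because workflows are DAGs, the engine must reject any template whose `DependencyGraph` contains a cycle (surfaced as `WF_CYCLE_DETECTED`). A standard depth-first check over the step-id → dependencies shape above, as a sketch:

```typescript
// Returns true if the dependency graph (node -> ids it depends on)
// contains a cycle. Uses three-state DFS: unvisited, visiting, done.
function hasCycle(graph: Record<string, string[]>): boolean {
  const VISITING = 1;
  const DONE = 2;
  const state = new Map<string, number>();

  const visit = (node: string): boolean => {
    if (state.get(node) === DONE) return false;
    if (state.get(node) === VISITING) return true; // back edge => cycle
    state.set(node, VISITING);
    for (const dep of graph[node] ?? []) {
      if (visit(dep)) return true;
    }
    state.set(node, DONE);
    return false;
  };

  return Object.keys(graph).some((node) => visit(node));
}
```

Running this at template-save time keeps invalid workflows out of execution entirely.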
### Target
```csharp
public sealed record Target
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required Guid EnvironmentId { get; init; }
    public required string Name { get; init; }
    public required TargetType Type { get; init; }       // DockerHost, ComposeHost, ECSService, NomadJob
    public required ImmutableDictionary<string, string> Labels { get; init; }
    public required Guid? AgentId { get; init; }         // Null for agentless
    public required TargetState State { get; init; }
    public required HealthStatus Health { get; init; }
}

public enum TargetType
{
    DockerHost,
    ComposeHost,
    ECSService,
    NomadJob,
    SSHRemote,
    WinRMRemote
}
```
### Agent
```csharp
public sealed record Agent
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required string Name { get; init; }
    public required string Version { get; init; }
    public required ImmutableArray<string> Capabilities { get; init; }
    public required DateTimeOffset LastHeartbeat { get; init; }
    public required AgentState State { get; init; }      // Online, Offline, Degraded
    public required ImmutableDictionary<string, string> Labels { get; init; }
}
```
## Database Schema
| Table | Purpose |
|-------|---------|
| `release.environments` | Environment definitions with freeze windows |
| `release.targets` | Deployment targets within environments |
| `release.agents` | Registered deployment agents |
| `release.components` | Component definitions (service → repository mapping) |
| `release.releases` | Release bundles (version → component digests) |
| `release.promotions` | Promotion requests and state |
| `release.approvals` | Approval records |
| `release.workflows` | Workflow templates |
| `release.workflow_runs` | Workflow execution state |
| `release.deployment_jobs` | Deployment job records |
| `release.evidence_packets` | Sealed evidence records |
| `release.integrations` | Integration configurations |
| `release.plugins` | Plugin registrations |
## Gate Types
| Gate | Purpose | Evaluation |
|------|---------|------------|
| **Security** | Check scan verdict | Query latest scan for release digest; block on critical/high reachable |
| **Approval** | Human sign-off | Count approvals; check SoD rules |
| **FreezeWindow** | Calendar-based blocking | Check target environment freeze windows |
| **PreviousEnvironment** | Require prior deployment | Verify release deployed to source environment |
| **Policy** | Custom OPA/Rego rules | Evaluate policy with promotion context |
| **HealthCheck** | Target health | Verify target is healthy before deploy |
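Each gate in the table above can be modeled behind a common evaluation contract. The sketch below is illustrative only — the interface name and members are assumptions, not part of the spec (the spec defines `GateResult` on the promotion record):

```csharp
// Hypothetical gate contract — names are illustrative, not normative.
public interface IPromotionGate
{
    // One of: "security", "approval", "freeze-window", "previous-environment",
    // "policy", "health-check".
    string GateType { get; }

    // Evaluates the gate against the promotion context (release, target
    // environment, requester). Must be side-effect free and deterministic.
    Task<GateResult> EvaluateAsync(PromotionContext context, CancellationToken ct);
}
```

A registry keyed by `GateType` would let plugins contribute additional gates (e.g. the `github-check` gate declared in the plugin manifest below) without changes to the promotion engine.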
## Plugin System (Three-Surface Model)
Plugins contribute through three surfaces:
### 1. Manifest (Static Declaration)
```yaml
# plugin-manifest.yaml
name: github-integration
version: 1.0.0
provider: StellaOps.Integration.GitHub.Plugin
capabilities:
integrations:
- type: scm
id: github
displayName: GitHub
steps:
- type: github-status
displayName: Update GitHub Status
gates:
- type: github-check
displayName: GitHub Check Required
```
### 2. Connector Runtime (Dynamic Execution)
```csharp
public interface IIntegrationConnector
{
Task<ConnectionTestResult> TestConnectionAsync(CancellationToken ct);
Task<HealthStatus> GetHealthAsync(CancellationToken ct);
Task<IReadOnlyList<Resource>> DiscoverResourcesAsync(string resourceType, CancellationToken ct);
}
public interface ISCMConnector : IIntegrationConnector
{
Task<CommitInfo> GetCommitAsync(string commitRef, CancellationToken ct); // note: "ref" is a reserved C# keyword

Task CreateCommitStatusAsync(string commit, CommitStatus status, CancellationToken ct);
}
public interface IRegistryConnector : IIntegrationConnector
{
Task<string> ResolveDigestAsync(string imageRef, CancellationToken ct);
Task<bool> VerifyDigestAsync(string imageRef, string expectedDigest, CancellationToken ct);
}
```
### 3. Step Provider (Execution Contract)
```csharp
public interface IStepProvider
{
StepExecutionCharacteristics Characteristics { get; }
Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct);
Task<StepResult> RollbackAsync(StepContext context, CancellationToken ct);
}
public sealed record StepExecutionCharacteristics
{
public bool IsIdempotent { get; init; }
public bool SupportsRollback { get; init; }
public TimeSpan DefaultTimeout { get; init; }
public ResourceRequirements Resources { get; init; }
}
```
## Invariants
1. **Release identity is immutable** — Once created, a release's component digests cannot be changed. Create a new release instead.
2. **Promotions are append-only** — Promotion state transitions are recorded; no edits or deletions.
3. **Evidence packets are sealed** — Evidence is cryptographically signed and stored immutably.
4. **Digest verification at deploy time** — Agents verify image digests at pull time; mismatch fails deployment.
5. **Separation of duties enforced** — Requester cannot be sole approver for production promotions.
6. **Workflow execution is deterministic** — Same inputs produce same execution order and outputs.
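Invariant 4 is the simplest to make concrete. A minimal sketch of the deploy-time check an agent would perform after pulling an image (class and method names are illustrative):

```csharp
// Sketch of invariant 4: the agent compares the digest it actually pulled
// against the digest recorded in the release; any mismatch fails the deployment.
public static class DigestVerifier
{
    public static void EnsureMatch(string expectedDigest, string pulledDigest)
    {
        // Registries normalize digests to lowercase hex; compare case-insensitively.
        if (!string.Equals(expectedDigest, pulledDigest, StringComparison.OrdinalIgnoreCase))
        {
            throw new InvalidOperationException(
                $"Digest mismatch: release pins {expectedDigest} but registry returned {pulledDigest}; failing deployment.");
        }
    }
}
```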
## Error Handling
- **Transient failures** — Retry with exponential backoff; circuit breaker for repeated failures
- **Agent disconnection** — Mark agent offline; reassign pending tasks to other agents
- **Deployment failure** — Automatic rollback if configured; otherwise mark promotion as failed
- **Gate failure** — Block promotion; require manual intervention or re-evaluation
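The transient-failure policy above can be sketched as exponential backoff with full jitter. The cap (60 s) and the shape of the helper are illustrative defaults, not spec values:

```csharp
// Minimal retry helper: exponential backoff with full jitter.
// Attempt counts, base delay, and cap are illustrative, not normative.
public static async Task<T> RetryAsync<T>(
    Func<CancellationToken, Task<T>> operation,
    int maxAttempts,
    CancellationToken ct)
{
    var backoff = TimeSpan.FromSeconds(1);
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            return await operation(ct);
        }
        catch (Exception) when (attempt < maxAttempts)
        {
            // Full jitter: wait a random fraction of the current backoff window,
            // then double the window up to a 60-second cap.
            var jittered = TimeSpan.FromMilliseconds(
                Random.Shared.NextDouble() * backoff.TotalMilliseconds);
            await Task.Delay(jittered, ct);
            backoff = TimeSpan.FromSeconds(Math.Min(backoff.TotalSeconds * 2, 60));
        }
    }
}
```

In practice a resilience library (e.g. Polly in the .NET ecosystem) would supply both this policy and the circuit breaker; the sketch only shows the shape of the behavior.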
## Observability
### Metrics
- `release_promotions_total` — Counter by environment and outcome
- `release_deployments_duration_seconds` — Histogram of deployment times
- `release_gate_evaluations_total` — Counter by gate type and result
- `release_agents_online` — Gauge of online agents
- `release_workflow_steps_duration_seconds` — Histogram by step type
### Traces
- `promotion.request` — Span for promotion request handling
- `gate.evaluate` — Span for each gate evaluation
- `deployment.execute` — Span for deployment execution
- `agent.task` — Span for agent task execution
### Logs
- Structured logs with correlation IDs
- Promotion ID, release ID, environment ID in all relevant logs
- Sensitive data (secrets, credentials) masked
## Security Considerations
### Agent Security
- **mTLS authentication** — Agents authenticate with CA-signed certificates
- **Short-lived credentials** — Task credentials expire after execution
- **Capability-based authorization** — Agents only receive tasks matching their capabilities
- **Heartbeat monitoring** — Detect and flag agent disconnections
### Secrets Management
- **Never stored in database** — Only vault references stored
- **Fetched at execution time** — Secrets retrieved just-in-time for deployment
- **Short-lived** — Dynamic credentials with minimal TTL
- **Masked in logs** — Secret values never logged
### Plugin Sandbox
- **Resource limits** — CPU, memory, timeout limits per plugin
- **Capability restrictions** — Plugins declare required capabilities
- **Network isolation** — Optional network restrictions for plugins
## Performance Characteristics
- **Promotion evaluation** — < 5 seconds for typical gate evaluation
- **Deployment latency** — Dominated by image pull time; orchestration overhead < 10 seconds
- **Agent heartbeat** — 30-second interval; offline detection within 90 seconds
- **Workflow step timeout** — Configurable; default 5 minutes per step
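The heartbeat numbers above imply a simple classification rule: three consecutive missed 30-second heartbeats (90 s of silence) mark an agent offline. The intermediate `Degraded` threshold below is an assumption for illustration, not a spec value:

```csharp
// Sketch of heartbeat-based state classification. The 90 s offline threshold
// follows the stated performance characteristics; the 60 s degraded threshold
// (one missed beat) is an assumption.
public static AgentState Classify(DateTimeOffset lastHeartbeat, DateTimeOffset now)
{
    var silence = now - lastHeartbeat;
    if (silence >= TimeSpan.FromSeconds(90)) return AgentState.Offline;
    if (silence >= TimeSpan.FromSeconds(60)) return AgentState.Degraded;
    return AgentState.Online;
}
```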
## Implementation Roadmap
| Phase | Focus | Key Deliverables |
|-------|-------|------------------|
| **Phase 1** | Foundation | Environment management, integration hub, release bundles |
| **Phase 2** | Workflow Engine | DAG execution, step registry, workflow templates |
| **Phase 3** | Promotion & Decision | Approval gateway, security gates, decision records |
| **Phase 4** | Deployment Execution | Docker/Compose agents, artifact generation, rollback |
| **Phase 5** | UI & Polish | Release dashboard, promotion UI, environment management |
| **Phase 6** | Progressive Delivery | A/B releases, canary, traffic routing |
| **Phase 7** | Extended Targets | ECS, Nomad, SSH/WinRM agentless |
| **Phase 8** | Plugin Ecosystem | Full plugin system, marketplace |
## References
- [Product Vision](../../product/VISION.md)
- [Architecture Overview](../../ARCHITECTURE_OVERVIEW.md)
- [Full Orchestrator Specification](../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
- [Competitive Landscape](../../product/competitive-landscape.md)

# Entity Definitions
This document describes the core entities in the Release Orchestrator data model.
## Entity Relationship Overview
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ ENTITY RELATIONSHIPS │
│ │
│ ┌──────────┐ ┌──────────────┐ ┌────────────┐ │
│ │ Tenant │───────│ Environment │───────│ Target │ │
│ └──────────┘ └──────────────┘ └────────────┘ │
│ │ │ │ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────────┐ ┌────────────┐ │
│ │ Component│ │ Approval │ │ Agent │ │
│ └──────────┘ │ Policy │ └────────────┘ │
│ │ └──────────────┘ │ │
│ │ │ │ │
│ ▼ │ ▼ │
│ ┌──────────┐ │ ┌─────────────┐ │
│ │ Version │ │ │ Deployment │ │
│ │ Map │ │ │ Task │ │
│ └──────────┘ │ └─────────────┘ │
│ │ │ │ │
│ │ │ │ │
│ ▼ │ ▼ │
│ ┌─────────────────────────┼─────────────────────────────┐ │
│ │ │ │ │
│ │ ┌──────────┐ ┌─────▼─────┐ ┌─────────────┐ │ │
│ │ │ Release │─────│ Promotion │─────│ Deployment │ │ │
│ │ └──────────┘ └───────────┘ │ Job │ │ │
│ │ │ │ └─────────────┘ │ │
│ │ │ │ │ │ │
│ │ │ ▼ │ │ │
│ │ │ ┌───────────┐ │ │ │
│ │ │ │ Approval │ │ │ │
│ │ │ └───────────┘ │ │ │
│ │ │ │ │ │ │
│ │ │ ▼ ▼ │ │
│ │ │ ┌───────────┐ ┌───────────┐ │ │
│ │ │ │ Decision │ │ Generated │ │ │
│ │ │ │ Record │ │ Artifacts │ │ │
│ │ │ └───────────┘ └───────────┘ │ │
│ │ │ │ │ │ │
│ │ │ └────────┬────────┘ │ │
│ │ │ │ │ │
│ │ │ ▼ │ │
│ │ │ ┌───────────┐ │ │
│ │ └───────────────────►│ Evidence │◄────────────┘ │
│ │ │ Packet │ │
│ │ └───────────┘ │
│ │ │ │
│ │ ▼ │
│ │ ┌───────────┐ │
│ │ │ Version │ │
│ │ │ Sticker │ │
│ │ └───────────┘ │
│ │ │
│ └─────────────────────────────────────────────────────────────────────────┘
└─────────────────────────────────────────────────────────────────────────────┘
```
## Core Entities
### Environment
Represents a deployment target environment (dev, staging, production).
| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference |
| `name` | string | Unique name (e.g., "prod") |
| `display_name` | string | Display name (e.g., "Production") |
| `order_index` | integer | Promotion order |
| `config` | JSONB | Environment configuration |
| `freeze_windows` | JSONB | Active freeze windows |
| `required_approvals` | integer | Approvals needed for promotion |
| `require_sod` | boolean | Require separation of duties |
| `created_at` | timestamp | Creation time |
### Target
Represents a deployment target (host, service).
| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference |
| `environment_id` | UUID | Environment reference |
| `name` | string | Target name |
| `target_type` | string | Type (docker_host, compose_host, etc.) |
| `connection` | JSONB | Connection configuration |
| `labels` | JSONB | Target labels |
| `health_status` | string | Current health status |
| `current_digest` | string | Currently deployed digest |
### Agent
Represents a deployment agent.
| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference |
| `name` | string | Agent name |
| `version` | string | Agent version |
| `capabilities` | JSONB | Agent capabilities |
| `status` | string | online/offline/degraded |
| `last_heartbeat` | timestamp | Last heartbeat time |
### Component
Represents a deployable component (maps to an image repository).
| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference |
| `name` | string | Component name |
| `display_name` | string | Display name |
| `image_repository` | string | Image repository URL |
| `versioning_strategy` | JSONB | How versions are determined |
| `default_channel` | string | Default version channel |
### Version Map
Maps image tags to digests and semantic versions.
| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `component_id` | UUID | Component reference |
| `tag` | string | Image tag |
| `digest` | string | Image digest (sha256:...) |
| `semver` | string | Semantic version |
| `channel` | string | Version channel (stable, beta) |
### Release
A versioned bundle of component digests.
| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference |
| `name` | string | Release name |
| `display_name` | string | Display name |
| `components` | JSONB | Component/digest mappings |
| `source_ref` | JSONB | Source code reference |
| `status` | string | draft/ready/promoting/deployed/deprecated/archived |
| `created_by` | UUID | Creator user reference |
### Promotion
A request to promote a release to an environment.
| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference |
| `release_id` | UUID | Release reference |
| `source_environment_id` | UUID | Source environment (nullable) |
| `target_environment_id` | UUID | Target environment |
| `status` | string | Promotion status |
| `decision_record` | JSONB | Gate evaluation results |
| `workflow_run_id` | UUID | Associated workflow run |
| `requested_by` | UUID | Requesting user |
| `requested_at` | timestamp | Request time |
### Approval
An approval or rejection of a promotion.
| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `promotion_id` | UUID | Promotion reference |
| `approver_id` | UUID | Approving user |
| `action` | string | approved/rejected |
| `comment` | string | Approval comment |
| `approved_at` | timestamp | Approval time |
### Deployment Job
A deployment execution job.
| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `promotion_id` | UUID | Promotion reference |
| `release_id` | UUID | Release reference |
| `environment_id` | UUID | Environment reference |
| `status` | string | Job status |
| `strategy` | string | Deployment strategy |
| `artifacts` | JSONB | Generated artifacts |
| `rollback_of` | UUID | Original job being rolled back (null if not a rollback) |
### Deployment Task
A task to deploy to a single target.
| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `job_id` | UUID | Job reference |
| `target_id` | UUID | Target reference |
| `digest` | string | Digest to deploy |
| `status` | string | Task status |
| `agent_id` | UUID | Assigned agent |
| `logs` | text | Execution logs |
| `previous_digest` | string | Previous digest (for rollback) |
### Evidence Packet
Immutable audit evidence for a promotion/deployment.
| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `promotion_id` | UUID | Promotion reference |
| `packet_type` | string | Type of evidence |
| `content` | JSONB | Evidence content |
| `content_hash` | string | SHA-256 of content |
| `signature` | string | Cryptographic signature |
| `signer_key_ref` | string | Signing key reference |
| `created_at` | timestamp | Creation time (no update) |
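The `content_hash` and `signature` fields together seal the packet: the content is canonicalized, hashed, and the hash is signed with the key named by `signer_key_ref`. A minimal sketch of the hashing step (the canonicalization scheme and `sha256:` prefix convention are assumptions):

```csharp
using System.Security.Cryptography;
using System.Text;

// Illustrative sealing step: hash canonical JSON content; the resulting
// digest string is what gets signed and stored in content_hash.
public static string ComputeContentHash(string canonicalJson)
{
    byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(canonicalJson));
    return "sha256:" + Convert.ToHexString(hash).ToLowerInvariant();
}
```

Verification recomputes the hash from stored `content` and checks both the hash equality and the signature, so any post-hoc edit to the packet is detectable.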
### Version Sticker
Version marker placed on deployment targets.
| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `target_id` | UUID | Target reference |
| `release_id` | UUID | Release reference |
| `promotion_id` | UUID | Promotion reference |
| `sticker_content` | JSONB | Sticker JSON content |
| `content_hash` | string | Content hash |
| `written_at` | timestamp | Write time |
| `drift_detected` | boolean | Drift detection flag |
## Workflow Entities
### Workflow Template
A reusable workflow definition.
| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference (null for builtin) |
| `name` | string | Template name |
| `version` | integer | Template version |
| `nodes` | JSONB | Step nodes |
| `edges` | JSONB | Step edges |
| `inputs` | JSONB | Input definitions |
| `outputs` | JSONB | Output definitions |
| `is_builtin` | boolean | Is built-in template |
### Workflow Run
An execution of a workflow template.
| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `template_id` | UUID | Template reference |
| `template_version` | integer | Template version at execution |
| `status` | string | Run status |
| `context` | JSONB | Execution context |
| `inputs` | JSONB | Input values |
| `outputs` | JSONB | Output values |
| `started_at` | timestamp | Start time |
| `completed_at` | timestamp | Completion time |
### Step Run
Execution of a single step within a workflow run.
| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `workflow_run_id` | UUID | Workflow run reference |
| `node_id` | string | Node ID from template |
| `status` | string | Step status |
| `inputs` | JSONB | Resolved inputs |
| `outputs` | JSONB | Produced outputs |
| `logs` | text | Execution logs |
| `attempt_number` | integer | Retry attempt number |
## Plugin Entities
### Plugin
A registered plugin.
| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `plugin_id` | string | Unique plugin identifier |
| `version` | string | Plugin version |
| `vendor` | string | Plugin vendor |
| `manifest` | JSONB | Plugin manifest |
| `status` | string | Plugin status |
| `entrypoint` | string | Plugin entrypoint path |
### Plugin Instance
A tenant-specific plugin configuration.
| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `plugin_id` | UUID | Plugin reference |
| `tenant_id` | UUID | Tenant reference |
| `config` | JSONB | Tenant configuration |
| `enabled` | boolean | Is enabled for tenant |
## Integration Entities
### Integration
A configured external integration.
| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference |
| `type_id` | string | Integration type |
| `name` | string | Integration name |
| `config` | JSONB | Integration configuration |
| `credential_ref` | string | Vault credential reference |
| `health_status` | string | Connection health |
## References
- [Database Schema](schema.md)
- [Module Overview](../modules/overview.md)

# Database Schema (PostgreSQL)
This document specifies the complete PostgreSQL schema for the Release Orchestrator.
## Schema Organization
All release orchestration tables reside in the `release` schema:
```sql
CREATE SCHEMA IF NOT EXISTS release;
SET search_path TO release, public;
```
## Core Tables
### Tenant and Authority Extensions
```sql
-- Extended: Add release-related permissions
ALTER TABLE permissions ADD COLUMN IF NOT EXISTS
resource_type VARCHAR(50) CHECK (resource_type IN (
'environment', 'release', 'promotion', 'target', 'workflow', 'plugin'
));
```
---
## Integration Hub
```sql
CREATE TABLE integration_types (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(100) NOT NULL UNIQUE,
category VARCHAR(50) NOT NULL CHECK (category IN (
'scm', 'ci', 'registry', 'vault', 'target', 'router'
)),
plugin_id UUID REFERENCES plugins(id),
config_schema JSONB NOT NULL,
secrets_schema JSONB NOT NULL,
is_builtin BOOLEAN NOT NULL DEFAULT FALSE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE integrations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
integration_type_id UUID NOT NULL REFERENCES integration_types(id),
name VARCHAR(255) NOT NULL,
config JSONB NOT NULL,
credential_ref VARCHAR(500), -- Vault path or encrypted ref
status VARCHAR(50) NOT NULL DEFAULT 'unknown' CHECK (status IN (
'healthy', 'degraded', 'unhealthy', 'unknown'
)),
last_health_check TIMESTAMPTZ,
last_health_message TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by UUID REFERENCES users(id),
UNIQUE (tenant_id, name)
);
CREATE INDEX idx_integrations_tenant ON integrations(tenant_id);
CREATE INDEX idx_integrations_type ON integrations(integration_type_id);
CREATE TABLE connection_profiles (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
user_id UUID NOT NULL REFERENCES users(id),
integration_type_id UUID NOT NULL REFERENCES integration_types(id),
name VARCHAR(255) NOT NULL,
config_defaults JSONB NOT NULL,
is_default BOOLEAN NOT NULL DEFAULT FALSE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (tenant_id, user_id, integration_type_id, name)
);
```
---
## Environment & Inventory
```sql
CREATE TABLE environments (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
name VARCHAR(100) NOT NULL,
display_name VARCHAR(255) NOT NULL,
order_index INTEGER NOT NULL,
config JSONB NOT NULL DEFAULT '{}',
freeze_windows JSONB NOT NULL DEFAULT '[]',
required_approvals INTEGER NOT NULL DEFAULT 0,
require_sod BOOLEAN NOT NULL DEFAULT FALSE,
auto_promote_from UUID REFERENCES environments(id),
promotion_policy VARCHAR(255),
deployment_timeout INTEGER NOT NULL DEFAULT 600,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (tenant_id, name)
);
CREATE INDEX idx_environments_tenant ON environments(tenant_id);
CREATE INDEX idx_environments_order ON environments(tenant_id, order_index);
CREATE TABLE target_groups (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
labels JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (tenant_id, environment_id, name)
);
CREATE TABLE targets (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
target_group_id UUID REFERENCES target_groups(id),
name VARCHAR(255) NOT NULL,
target_type VARCHAR(100) NOT NULL,
connection JSONB NOT NULL,
capabilities JSONB NOT NULL DEFAULT '[]',
labels JSONB NOT NULL DEFAULT '{}',
deployment_directory VARCHAR(500),
health_status VARCHAR(50) NOT NULL DEFAULT 'unknown' CHECK (health_status IN (
'healthy', 'degraded', 'unhealthy', 'unknown'
)),
last_health_check TIMESTAMPTZ,
current_digest VARCHAR(100),
agent_id UUID REFERENCES agents(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (tenant_id, environment_id, name)
);
CREATE INDEX idx_targets_tenant_env ON targets(tenant_id, environment_id);
CREATE INDEX idx_targets_type ON targets(target_type);
CREATE INDEX idx_targets_labels ON targets USING GIN (labels);
CREATE TABLE agents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
version VARCHAR(50) NOT NULL,
capabilities JSONB NOT NULL DEFAULT '[]',
labels JSONB NOT NULL DEFAULT '{}',
status VARCHAR(50) NOT NULL DEFAULT 'offline' CHECK (status IN (
'online', 'offline', 'degraded'
)),
last_heartbeat TIMESTAMPTZ,
resource_usage JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (tenant_id, name)
);
CREATE INDEX idx_agents_tenant ON agents(tenant_id);
CREATE INDEX idx_agents_status ON agents(status);
CREATE INDEX idx_agents_capabilities ON agents USING GIN (capabilities);
```
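The GIN indexes on `labels` and `capabilities` exist to serve containment queries during task scheduling. An illustrative query (parameter placeholders assumed) that finds candidate agents for a task requiring the `docker` capability:

```sql
-- Illustrative scheduling query: online agents advertising the "docker"
-- capability. The @> containment test is served by idx_agents_capabilities.
SELECT id, name, version
FROM release.agents
WHERE tenant_id = $1
  AND status = 'online'
  AND capabilities @> '["docker"]'::jsonb;
```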
---
## Release Management
```sql
CREATE TABLE components (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
display_name VARCHAR(255) NOT NULL,
image_repository VARCHAR(500) NOT NULL,
registry_integration_id UUID REFERENCES integrations(id),
versioning_strategy JSONB NOT NULL DEFAULT '{"type": "semver"}',
deployment_template VARCHAR(255),
default_channel VARCHAR(50) NOT NULL DEFAULT 'stable',
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (tenant_id, name)
);
CREATE INDEX idx_components_tenant ON components(tenant_id);
CREATE TABLE version_maps (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
component_id UUID NOT NULL REFERENCES components(id) ON DELETE CASCADE,
tag VARCHAR(255) NOT NULL,
digest VARCHAR(100) NOT NULL,
semver VARCHAR(50),
channel VARCHAR(50) NOT NULL DEFAULT 'stable',
prerelease BOOLEAN NOT NULL DEFAULT FALSE,
build_metadata VARCHAR(255),
resolved_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
source VARCHAR(50) NOT NULL DEFAULT 'auto' CHECK (source IN ('auto', 'manual')),
UNIQUE (tenant_id, component_id, digest)
);
CREATE INDEX idx_version_maps_component ON version_maps(component_id);
CREATE INDEX idx_version_maps_digest ON version_maps(digest);
CREATE INDEX idx_version_maps_semver ON version_maps(semver);
CREATE TABLE releases (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
display_name VARCHAR(255) NOT NULL,
components JSONB NOT NULL, -- [{componentId, digest, semver, tag, role}]
source_ref JSONB, -- {scmIntegrationId, commitSha, ciIntegrationId, buildId}
status VARCHAR(50) NOT NULL DEFAULT 'draft' CHECK (status IN (
'draft', 'ready', 'promoting', 'deployed', 'deprecated', 'archived'
)),
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by UUID REFERENCES users(id),
UNIQUE (tenant_id, name)
);
CREATE INDEX idx_releases_tenant ON releases(tenant_id);
CREATE INDEX idx_releases_status ON releases(status);
CREATE INDEX idx_releases_created ON releases(created_at DESC);
CREATE TABLE release_environment_state (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
release_id UUID NOT NULL REFERENCES releases(id),
status VARCHAR(50) NOT NULL CHECK (status IN (
'deployed', 'deploying', 'failed', 'rolling_back', 'rolled_back'
)),
deployed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
deployed_by UUID REFERENCES users(id),
promotion_id UUID, -- will reference promotions
evidence_ref VARCHAR(255),
UNIQUE (tenant_id, environment_id)
);
CREATE INDEX idx_release_env_state_env ON release_environment_state(environment_id);
CREATE INDEX idx_release_env_state_release ON release_environment_state(release_id);
```
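Because `release_environment_state` carries `UNIQUE (tenant_id, environment_id)`, each environment has exactly one current row, which makes the "what is deployed where" dashboard query a straight join (sketch, parameter placeholder assumed):

```sql
-- Illustrative query: current release per environment for one tenant,
-- ordered by promotion order. One row per environment is guaranteed by
-- the UNIQUE (tenant_id, environment_id) constraint.
SELECT e.name AS environment,
       r.name AS release,
       s.status,
       s.deployed_at
FROM release.release_environment_state s
JOIN release.environments e ON e.id = s.environment_id
JOIN release.releases     r ON r.id = s.release_id
WHERE s.tenant_id = $1
ORDER BY e.order_index;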
---
## Workflow Engine
```sql
CREATE TABLE workflow_templates (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID REFERENCES tenants(id) ON DELETE CASCADE, -- NULL for builtin
name VARCHAR(255) NOT NULL,
display_name VARCHAR(255) NOT NULL,
description TEXT,
version INTEGER NOT NULL DEFAULT 1,
nodes JSONB NOT NULL,
edges JSONB NOT NULL,
inputs JSONB NOT NULL DEFAULT '[]',
outputs JSONB NOT NULL DEFAULT '[]',
is_builtin BOOLEAN NOT NULL DEFAULT FALSE,
tags JSONB NOT NULL DEFAULT '[]',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by UUID REFERENCES users(id),
UNIQUE (tenant_id, name, version)
);
CREATE INDEX idx_workflow_templates_tenant ON workflow_templates(tenant_id);
CREATE INDEX idx_workflow_templates_builtin ON workflow_templates(is_builtin);
CREATE TABLE workflow_runs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
template_id UUID NOT NULL REFERENCES workflow_templates(id),
template_version INTEGER NOT NULL,
status VARCHAR(50) NOT NULL DEFAULT 'created' CHECK (status IN (
'created', 'running', 'paused', 'succeeded', 'failed', 'cancelled'
)),
context JSONB NOT NULL, -- inputs, variables, release info
outputs JSONB,
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
error_message TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
triggered_by UUID REFERENCES users(id)
);
CREATE INDEX idx_workflow_runs_tenant ON workflow_runs(tenant_id);
CREATE INDEX idx_workflow_runs_status ON workflow_runs(status);
CREATE INDEX idx_workflow_runs_template ON workflow_runs(template_id);
CREATE TABLE step_runs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
workflow_run_id UUID NOT NULL REFERENCES workflow_runs(id) ON DELETE CASCADE,
node_id VARCHAR(100) NOT NULL,
step_type VARCHAR(100) NOT NULL,
status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN (
'pending', 'running', 'succeeded', 'failed', 'skipped', 'retrying', 'cancelled'
)),
inputs JSONB NOT NULL,
config JSONB NOT NULL,
outputs JSONB,
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
attempt_number INTEGER NOT NULL DEFAULT 1,
error_message TEXT,
error_type VARCHAR(100),
logs TEXT,
artifacts JSONB NOT NULL DEFAULT '[]',
t_hlc BIGINT, -- Hybrid Logical Clock for ordering (optional)
ts_wall TIMESTAMPTZ, -- Wall-clock timestamp for debugging (optional)
UNIQUE (workflow_run_id, node_id, attempt_number)
);
CREATE INDEX idx_step_runs_workflow ON step_runs(workflow_run_id);
CREATE INDEX idx_step_runs_status ON step_runs(status);
```
---
## Promotion & Approval
```sql
CREATE TABLE promotions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
release_id UUID NOT NULL REFERENCES releases(id),
source_environment_id UUID REFERENCES environments(id),
target_environment_id UUID NOT NULL REFERENCES environments(id),
status VARCHAR(50) NOT NULL DEFAULT 'pending_approval' CHECK (status IN (
'pending_approval', 'pending_gate', 'approved', 'rejected',
'deploying', 'deployed', 'failed', 'cancelled', 'rolled_back'
)),
decision_record JSONB,
workflow_run_id UUID REFERENCES workflow_runs(id),
requested_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
requested_by UUID NOT NULL REFERENCES users(id),
request_reason TEXT,
decided_at TIMESTAMPTZ,
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
evidence_packet_id UUID,
t_hlc BIGINT, -- Hybrid Logical Clock for ordering (optional)
ts_wall TIMESTAMPTZ -- Wall-clock timestamp for debugging (optional)
);
CREATE INDEX idx_promotions_tenant ON promotions(tenant_id);
CREATE INDEX idx_promotions_release ON promotions(release_id);
CREATE INDEX idx_promotions_status ON promotions(status);
CREATE INDEX idx_promotions_target_env ON promotions(target_environment_id);
-- Add FK to release_environment_state
ALTER TABLE release_environment_state
ADD CONSTRAINT fk_release_env_state_promotion
FOREIGN KEY (promotion_id) REFERENCES promotions(id);
CREATE TABLE approvals (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
promotion_id UUID NOT NULL REFERENCES promotions(id) ON DELETE CASCADE,
approver_id UUID NOT NULL REFERENCES users(id),
action VARCHAR(50) NOT NULL CHECK (action IN ('approved', 'rejected')),
comment TEXT,
approved_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
approver_role VARCHAR(255),
approver_groups JSONB NOT NULL DEFAULT '[]'
);
CREATE INDEX idx_approvals_promotion ON approvals(promotion_id);
CREATE INDEX idx_approvals_approver ON approvals(approver_id);
CREATE TABLE approval_policies (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
required_count INTEGER NOT NULL DEFAULT 1,
required_roles JSONB NOT NULL DEFAULT '[]',
required_groups JSONB NOT NULL DEFAULT '[]',
require_sod BOOLEAN NOT NULL DEFAULT FALSE,
allow_self_approval BOOLEAN NOT NULL DEFAULT FALSE,
expiration_minutes INTEGER NOT NULL DEFAULT 1440,
UNIQUE (tenant_id, environment_id)
);
```
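The `approvals` and `approval_policies` tables support the separation-of-duties invariant directly in SQL. A sketch of the check (join shape and placeholder are illustrative; a real implementation would also honor `required_roles` and `allow_self_approval`):

```sql
-- Illustrative SoD check: count distinct approvers other than the requester
-- and compare against the target environment's required_count.
SELECT COUNT(DISTINCT a.approver_id) >= ap.required_count AS sod_satisfied
FROM release.promotions p
JOIN release.approval_policies ap
  ON ap.tenant_id = p.tenant_id
 AND ap.environment_id = p.target_environment_id
LEFT JOIN release.approvals a
  ON a.promotion_id = p.id
 AND a.action = 'approved'
 AND a.approver_id <> p.requested_by
WHERE p.id = $1
GROUP BY ap.required_count;
```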
---
## Deployment
```sql
CREATE TABLE deployment_jobs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
promotion_id UUID NOT NULL REFERENCES promotions(id),
release_id UUID NOT NULL REFERENCES releases(id),
environment_id UUID NOT NULL REFERENCES environments(id),
status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN (
'pending', 'running', 'succeeded', 'failed', 'cancelled', 'rolling_back', 'rolled_back'
)),
strategy VARCHAR(50) NOT NULL DEFAULT 'all-at-once',
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
artifacts JSONB NOT NULL DEFAULT '[]',
rollback_of UUID REFERENCES deployment_jobs(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
t_hlc BIGINT, -- Hybrid Logical Clock for ordering (optional)
ts_wall TIMESTAMPTZ -- Wall-clock timestamp for debugging (optional)
);
CREATE INDEX idx_deployment_jobs_promotion ON deployment_jobs(promotion_id);
CREATE INDEX idx_deployment_jobs_status ON deployment_jobs(status);
CREATE TABLE deployment_tasks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
job_id UUID NOT NULL REFERENCES deployment_jobs(id) ON DELETE CASCADE,
target_id UUID NOT NULL REFERENCES targets(id),
digest VARCHAR(100) NOT NULL,
status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN (
'pending', 'running', 'succeeded', 'failed', 'cancelled', 'skipped'
)),
agent_id UUID REFERENCES agents(id),
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
exit_code INTEGER,
logs TEXT,
previous_digest VARCHAR(100),
sticker_written BOOLEAN NOT NULL DEFAULT FALSE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_deployment_tasks_job ON deployment_tasks(job_id);
CREATE INDEX idx_deployment_tasks_target ON deployment_tasks(target_id);
CREATE INDEX idx_deployment_tasks_status ON deployment_tasks(status);
CREATE TABLE generated_artifacts (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
deployment_job_id UUID REFERENCES deployment_jobs(id) ON DELETE CASCADE,
artifact_type VARCHAR(50) NOT NULL CHECK (artifact_type IN (
'compose_lock', 'script', 'sticker', 'evidence', 'config'
)),
name VARCHAR(255) NOT NULL,
content_hash VARCHAR(100) NOT NULL,
content BYTEA, -- for small artifacts
storage_ref VARCHAR(500), -- for large artifacts (S3, etc.)
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_generated_artifacts_job ON generated_artifacts(deployment_job_id);
```
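The `content`/`storage_ref` split above implies a routing decision: small artifacts are stored inline as `BYTEA`, large ones go to object storage by reference. A minimal sketch of that decision follows; the 256 KiB threshold and the `s3://` key layout are assumptions of this example, not part of the schema.

```typescript
import { createHash } from "crypto";

const INLINE_LIMIT_BYTES = 256 * 1024; // assumed threshold, not schema-defined

interface StoredArtifact {
  contentHash: string;   // always recorded in content_hash
  content?: Buffer;      // small artifacts -> content BYTEA
  storageRef?: string;   // large artifacts -> storage_ref
}

function routeArtifact(data: Buffer, bucket: string): StoredArtifact {
  const contentHash = "sha256:" + createHash("sha256").update(data).digest("hex");
  if (data.length <= INLINE_LIMIT_BYTES) {
    return { contentHash, content: data };
  }
  // Content-addressed key: identical artifacts dedupe naturally in object storage.
  return { contentHash, storageRef: `s3://${bucket}/${contentHash}` };
}
```

Keying large artifacts by their hash keeps the storage append-only and makes `content_hash` sufficient to verify either storage path.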
---
## Progressive Delivery
```sql
CREATE TABLE ab_releases (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
environment_id UUID NOT NULL REFERENCES environments(id),
name VARCHAR(255) NOT NULL,
variations JSONB NOT NULL, -- [{name, releaseId, targetGroupId, trafficPercentage}]
active_variation VARCHAR(50) NOT NULL DEFAULT 'A',
traffic_split JSONB NOT NULL,
rollout_strategy JSONB NOT NULL,
status VARCHAR(50) NOT NULL DEFAULT 'created' CHECK (status IN (
'created', 'deploying', 'running', 'promoting', 'completed', 'rolled_back'
)),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
completed_at TIMESTAMPTZ,
created_by UUID REFERENCES users(id)
);
CREATE INDEX idx_ab_releases_tenant_env ON ab_releases(tenant_id, environment_id);
CREATE INDEX idx_ab_releases_status ON ab_releases(status);
CREATE TABLE canary_stages (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
ab_release_id UUID NOT NULL REFERENCES ab_releases(id) ON DELETE CASCADE,
stage_number INTEGER NOT NULL,
traffic_percentage INTEGER NOT NULL,
status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN (
'pending', 'running', 'succeeded', 'failed', 'skipped'
)),
health_threshold DECIMAL(5,2),
duration_seconds INTEGER,
require_approval BOOLEAN NOT NULL DEFAULT FALSE,
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
health_result JSONB,
UNIQUE (ab_release_id, stage_number)
);
```
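The `UNIQUE (ab_release_id, stage_number)` constraint guarantees distinct stages but not a sensible rollout shape, so stage plans need application-level validation. A sketch follows; treating "strictly increasing traffic, ending at 100%" as the invariant is an assumption of this example rather than something the schema enforces.

```typescript
interface CanaryStagePlan {
  stageNumber: number;
  trafficPercentage: number;
}

// Validate a canary plan before persisting it: consecutive stage numbers,
// strictly increasing traffic, and a final stage that routes all traffic.
function validateStagePlan(stages: CanaryStagePlan[]): string[] {
  const errors: string[] = [];
  const sorted = [...stages].sort((a, b) => a.stageNumber - b.stageNumber);
  let prev = 0;
  sorted.forEach((s, i) => {
    if (s.stageNumber !== i + 1) errors.push("stage numbers must be consecutive from 1");
    if (s.trafficPercentage <= prev || s.trafficPercentage > 100) {
      errors.push(`stage ${s.stageNumber}: traffic must increase and stay within (0, 100]`);
    }
    prev = s.trafficPercentage;
  });
  if (sorted.length > 0 && sorted[sorted.length - 1].trafficPercentage !== 100) {
    errors.push("final stage must route 100% of traffic");
  }
  return errors;
}
```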
---
## Release Evidence
```sql
CREATE TABLE evidence_packets (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
promotion_id UUID NOT NULL REFERENCES promotions(id),
packet_type VARCHAR(50) NOT NULL CHECK (packet_type IN (
'release_decision', 'deployment', 'rollback', 'ab_promotion'
)),
content JSONB NOT NULL,
content_hash VARCHAR(100) NOT NULL,
signature TEXT,
signer_key_ref VARCHAR(255),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
-- Note: No UPDATE or DELETE allowed (append-only)
);
CREATE INDEX idx_evidence_packets_promotion ON evidence_packets(promotion_id);
CREATE INDEX idx_evidence_packets_created ON evidence_packets(created_at DESC);
-- Append-only enforcement via trigger
CREATE OR REPLACE FUNCTION prevent_evidence_modification()
RETURNS TRIGGER AS $$
BEGIN
RAISE EXCEPTION 'Evidence packets are immutable and cannot be modified or deleted';
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER evidence_packets_immutable
BEFORE UPDATE OR DELETE ON evidence_packets
FOR EACH ROW EXECUTE FUNCTION prevent_evidence_modification();
CREATE TABLE version_stickers (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
target_id UUID NOT NULL REFERENCES targets(id),
deployment_job_id UUID REFERENCES deployment_jobs(id),
release_id UUID NOT NULL REFERENCES releases(id),
digest VARCHAR(100) NOT NULL,
sticker_content JSONB NOT NULL,
written_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
verified_at TIMESTAMPTZ,
verification_status VARCHAR(50) CHECK (verification_status IN ('valid', 'mismatch', 'missing'))
);
CREATE INDEX idx_version_stickers_target ON version_stickers(target_id);
CREATE INDEX idx_version_stickers_release ON version_stickers(release_id);
```
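For `content_hash` to be reproducible across replays, the JSONB `content` must be hashed in a canonical form. A minimal sketch is below; key-sorted compact JSON as the canonical encoding is an assumption of this example — the schema only requires that the hash be stable.

```typescript
import { createHash } from "crypto";

// Serialize with object keys sorted recursively so semantically equal
// packets always hash identically; array order is preserved (it is data).
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : 1))
      .map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

function contentHash(content: object): string {
  return "sha256:" + createHash("sha256").update(canonicalize(content)).digest("hex");
}
```

The same canonical bytes would be the input to the detached `signature`, so signing and hashing agree on one encoding.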
---
## Plugin Infrastructure
```sql
CREATE TABLE plugins (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL UNIQUE,
display_name VARCHAR(255) NOT NULL,
version VARCHAR(50) NOT NULL,
description TEXT,
manifest JSONB NOT NULL,
status VARCHAR(50) NOT NULL DEFAULT 'inactive' CHECK (status IN (
'active', 'inactive', 'error'
)),
error_message TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE plugin_instances (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
plugin_id UUID NOT NULL REFERENCES plugins(id),
config JSONB NOT NULL DEFAULT '{}',
enabled BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (tenant_id, plugin_id)
);
CREATE INDEX idx_plugin_instances_tenant ON plugin_instances(tenant_id);
```
---
## Hybrid Logical Clock (HLC) for Distributed Ordering
**Optional Enhancement**: For strict distributed ordering and multi-region support, the following tables include optional `t_hlc` (Hybrid Logical Clock timestamp) and `ts_wall` (wall-clock timestamp) columns:
- `promotions` — Promotion state transitions
- `deployment_jobs` — Deployment task ordering
- `step_runs` — Workflow step execution ordering
**When to use HLC**:
- Multi-region deployments requiring strict causal ordering
- Deterministic replay across distributed systems
- Timeline event ordering in audit logs
**HLC Schema**:
```sql
t_hlc BIGINT -- HLC timestamp (monotonic, skew-tolerant)
ts_wall TIMESTAMPTZ -- Wall-clock timestamp (informational)
```
**Usage**:
- `t_hlc` is generated by `IHybridLogicalClock.Tick()` on state transitions
- `ts_wall` is populated by `TimeProvider.GetUtcNow()` for debugging
- Index on `t_hlc` for ordering queries: `CREATE INDEX idx_promotions_hlc ON promotions(t_hlc);`
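The tick rule can be sketched in a few lines. Packing the clock as "48-bit wall-clock milliseconds plus a 16-bit logical counter" into one `BIGINT` is an assumption of this example; `IHybridLogicalClock` is the planned .NET interface, shown here in TypeScript for brevity.

```typescript
class HybridLogicalClock {
  private lastPhysical = 0n;
  private counter = 0n;

  constructor(private now: () => bigint = () => BigInt(Date.now())) {}

  // Returns a strictly increasing value even if the wall clock stalls
  // or steps backwards: the logical counter absorbs the difference.
  tick(): bigint {
    const wall = this.now();
    if (wall > this.lastPhysical) {
      this.lastPhysical = wall;
      this.counter = 0n;
    } else {
      this.counter += 1n;
    }
    return (this.lastPhysical << 16n) | this.counter;
  }
}
```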
**Reference**: See [Implementation Guide](../implementation-guide.md#hybrid-logical-clock-hlc-for-distributed-ordering) for HLC usage patterns.
---
## Row-Level Security (Multi-Tenancy)
All tables with `tenant_id` should have RLS enabled:
```sql
-- Enable RLS on all release tables
ALTER TABLE integrations ENABLE ROW LEVEL SECURITY;
ALTER TABLE environments ENABLE ROW LEVEL SECURITY;
ALTER TABLE targets ENABLE ROW LEVEL SECURITY;
ALTER TABLE releases ENABLE ROW LEVEL SECURITY;
ALTER TABLE promotions ENABLE ROW LEVEL SECURITY;
-- ... etc.
-- Example policy
CREATE POLICY tenant_isolation ON integrations
FOR ALL
USING (tenant_id = current_setting('app.tenant_id')::UUID);
```
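For the policy above to take effect, the application must set `app.tenant_id` before querying. A minimal sketch, using `SET LOCAL` so the setting is scoped to the transaction and cannot leak between requests on a pooled connection (the UUID shown is a placeholder):

```sql
BEGIN;
SET LOCAL app.tenant_id = '00000000-0000-0000-0000-000000000000';
SELECT * FROM integrations;  -- RLS filters rows to this tenant only
COMMIT;
```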

# Agent-Based Deployment
> Agent-based deployment using Docker and Compose agents for executing tasks on targets.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 10.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Agents Module](../modules/agents.md), [Deploy Orchestrator](../modules/deploy-orchestrator.md)
**Sprints:** [108_002 Docker Agent](../../../../implplan/SPRINT_20260110_108_002_AGENTS_docker.md), [108_003 Compose Agent](../../../../implplan/SPRINT_20260110_108_003_AGENTS_compose.md)
## Overview
Agent-based deployment uses lightweight agents installed on target hosts to execute deployment tasks. Agents communicate with the orchestrator over mTLS and receive tasks through heartbeat polling or WebSocket streams.
---
## Agent Task Protocol
### Task Payload Structure
```typescript
// Task assignment (Core -> Agent)
interface AgentTask {
id: UUID;
type: TaskType;
targetId: UUID;
payload: TaskPayload;
credentials: EncryptedCredentials;
timeout: number;
priority: TaskPriority;
idempotencyKey: string;
assignedAt: DateTime;
expiresAt: DateTime;
}
type TaskType =
| "deploy"
| "rollback"
| "health-check"
| "inspect"
| "execute-command"
| "upload-files"
| "write-sticker"
| "read-sticker";
interface DeployTaskPayload {
image: string;
digest: string;
config: DeployConfig;
artifacts: ArtifactReference[];
previousDigest?: string;
hooks: {
preDeploy?: HookConfig;
postDeploy?: HookConfig;
};
}
```
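The `idempotencyKey` above lets an agent recognize a redelivered task and skip re-execution, so the key must be identical across retries of the same logical operation. One way to guarantee that is to derive it from the fields that define the operation rather than generate it randomly. The exact field set below (including a `promotionId`) is an assumption of this example:

```typescript
import { createHash } from "crypto";

// Deterministic key: retries of the same (type, target, digest, promotion)
// produce the same key, so the agent can dedupe safely.
function deriveIdempotencyKey(
  taskType: string,
  targetId: string,
  digest: string,
  promotionId: string
): string {
  return createHash("sha256")
    .update(`${taskType}|${targetId}|${digest}|${promotionId}`)
    .digest("hex");
}
```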
### Task Result Structure
```typescript
// Task result (Agent -> Core)
interface TaskResult {
taskId: UUID;
success: boolean;
startedAt: DateTime;
completedAt: DateTime;
// Success details
outputs?: Record<string, any>;
artifacts?: ArtifactReference[];
// Failure details
error?: string;
errorType?: string;
retriable?: boolean;
// Logs
logs: string;
// Metrics
metrics: {
pullDurationMs?: number;
deployDurationMs?: number;
healthCheckDurationMs?: number;
};
}
```
---
## Docker Agent Implementation
The Docker agent deploys single containers to Docker hosts with digest verification.
### Docker Agent Capabilities
- Pull images with digest verification
- Create and start containers
- Stop and remove containers
- Health check monitoring
- Version sticker management
- Rollback to previous container
### Deploy Task Flow
```typescript
class DockerAgent implements TargetExecutor {
private docker: Docker;
async deploy(task: DeployTaskPayload): Promise<DeployResult> {
const { image, digest, config, previousDigest } = task;
const containerName = config.containerName;
// 1. Pull image and verify digest
this.log(`Pulling image ${image}@${digest}`);
await this.docker.pull(`${image}@${digest}`); // pull by digest reference, not tag
const pulledDigest = await this.getImageDigest(image);
if (pulledDigest !== digest) {
throw new DigestMismatchError(
`Expected digest ${digest}, got ${pulledDigest}. Possible tampering detected.`
);
}
// 2. Run pre-deploy hook
if (task.hooks?.preDeploy) {
await this.runHook(task.hooks.preDeploy, "pre-deploy");
}
// 3. Stop and rename existing container
const existingContainer = await this.findContainer(containerName);
if (existingContainer) {
this.log(`Stopping existing container ${containerName}`);
await existingContainer.stop({ t: 10 });
await existingContainer.rename(`${containerName}-previous-${Date.now()}`);
}
// 4. Create new container
this.log(`Creating container ${containerName} from ${image}@${digest}`);
const container = await this.docker.createContainer({
name: containerName,
Image: `${image}@${digest}`, // Always use digest, not tag
Env: this.buildEnvVars(config.environment),
HostConfig: {
PortBindings: this.buildPortBindings(config.ports),
Binds: this.buildBindMounts(config.volumes),
RestartPolicy: { Name: config.restartPolicy || "unless-stopped" },
Memory: config.memoryLimit,
CpuQuota: config.cpuLimit,
},
Labels: {
"stella.release.id": config.releaseId,
"stella.release.name": config.releaseName,
"stella.digest": digest,
"stella.deployed.at": new Date().toISOString(),
},
});
// 5. Start container
this.log(`Starting container ${containerName}`);
await container.start();
// 6. Wait for container to be healthy (if health check configured)
if (config.healthCheck) {
this.log(`Waiting for container health check`);
const healthy = await this.waitForHealthy(container, config.healthCheck.timeout);
if (!healthy) {
// Rollback to previous container
await this.rollbackContainer(containerName, existingContainer);
throw new HealthCheckFailedError(`Container ${containerName} failed health check`);
}
}
// 7. Run post-deploy hook
if (task.hooks?.postDeploy) {
await this.runHook(task.hooks.postDeploy, "post-deploy");
}
// 8. Cleanup previous container
if (existingContainer && config.cleanupPrevious !== false) {
this.log(`Removing previous container`);
await existingContainer.remove({ force: true });
}
return {
success: true,
containerId: container.id,
previousDigest: previousDigest,
logs: this.getLogs(),
durationMs: this.getDuration(),
};
}
}
```
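Step 6 above relies on `waitForHealthy`, which polls the container's health status until the timeout. The polling cadence can be isolated as a pure helper; the exponential-backoff-with-cap schedule below is a hypothetical choice for this sketch, not a mandated behavior.

```typescript
// Delays (ms) between health probes: exponential backoff from baseMs,
// capped at capMs, never overshooting the overall timeout budget.
function healthPollSchedule(timeoutMs: number, baseMs = 1000, capMs = 10000): number[] {
  const delays: number[] = [];
  let elapsed = 0;
  let delay = baseMs;
  while (elapsed + delay <= timeoutMs) {
    delays.push(delay);
    elapsed += delay;
    delay = Math.min(delay * 2, capMs);
  }
  return delays;
}
```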
### Rollback Implementation
```typescript
async rollback(task: RollbackTaskPayload): Promise<DeployResult> {
const { containerName, targetDigest } = task;
// Find previous container or use specified digest
if (targetDigest) {
// Deploy specific digest
return this.deploy({
...task,
digest: targetDigest,
});
}
// Find and restore previous container
const previousContainer = await this.findContainer(`${containerName}-previous-*`);
if (!previousContainer) {
throw new RollbackError(`No previous container found for ${containerName}`);
}
// Stop current, rename, start previous
const currentContainer = await this.findContainer(containerName);
if (currentContainer) {
await currentContainer.stop({ t: 10 });
await currentContainer.rename(`${containerName}-failed-${Date.now()}`);
}
await previousContainer.rename(containerName);
await previousContainer.start();
return {
success: true,
containerId: previousContainer.id,
logs: this.getLogs(),
durationMs: this.getDuration(),
};
}
```
### Version Sticker Management
```typescript
async writeSticker(sticker: VersionSticker): Promise<void> {
const stickerPath = this.config.stickerPath || "/var/stella/version.json";
const stickerContent = JSON.stringify(sticker, null, 2);
// Write to host filesystem or container volume
if (this.config.stickerLocation === "volume") {
// Write to shared volume; base64-encode so quotes in the JSON cannot break the shell command
const encoded = Buffer.from(stickerContent).toString("base64");
await this.docker.run("alpine", [
"sh", "-c",
`echo '${encoded}' | base64 -d > ${stickerPath}`
], {
HostConfig: {
Binds: [`${this.config.stickerVolume}:/var/stella`]
}
});
} else {
// Write directly to host
fs.writeFileSync(stickerPath, stickerContent);
}
}
```
---
## Compose Agent Implementation
The Compose agent deploys multi-container applications defined in Docker Compose files.
### Compose Agent Capabilities
- Pull images for all services
- Verify digests for all services
- Deploy using compose lock files
- Health check all services
- Rollback to previous deployment
- Version sticker management
### Deploy Task Flow
```typescript
class ComposeAgent implements TargetExecutor {
async deploy(task: DeployTaskPayload): Promise<DeployResult> {
const { artifacts, config } = task;
const deployDir = config.deploymentDirectory;
// 1. Write compose lock file
const composeLock = artifacts.find(a => a.type === "compose_lock");
const composeContent = await this.fetchArtifact(composeLock);
const composePath = path.join(deployDir, "compose.stella.lock.yml");
await fs.writeFile(composePath, composeContent);
// 2. Write any additional config files
for (const artifact of artifacts.filter(a => a.type === "config")) {
const content = await this.fetchArtifact(artifact);
await fs.writeFile(path.join(deployDir, artifact.name), content);
}
// 3. Run pre-deploy hook
if (task.hooks?.preDeploy) {
await this.runHook(task.hooks.preDeploy, deployDir);
}
// 4. Pull images
this.log("Pulling images...");
const pullResult = await this.runCompose(deployDir, ["pull"]);
if (!pullResult.success) {
throw new Error(`Failed to pull images: ${pullResult.stderr}`);
}
// 5. Verify digests
await this.verifyDigests(composePath, config.expectedDigests);
// 6. Deploy
this.log("Deploying services...");
const upResult = await this.runCompose(deployDir, [
"up", "-d",
"--remove-orphans",
"--force-recreate"
]);
if (!upResult.success) {
throw new Error(`Failed to deploy: ${upResult.stderr}`);
}
// 7. Wait for services to be healthy
if (config.healthCheck) {
this.log("Waiting for services to be healthy...");
const healthy = await this.waitForServicesHealthy(
deployDir,
config.healthCheck.timeout
);
if (!healthy) {
// Rollback
await this.rollbackToBackup(deployDir);
throw new HealthCheckFailedError("Services failed health check");
}
}
// 8. Run post-deploy hook
if (task.hooks?.postDeploy) {
await this.runHook(task.hooks.postDeploy, deployDir);
}
// 9. Write version sticker
await this.writeSticker(config.sticker, deployDir);
return {
success: true,
logs: this.getLogs(),
durationMs: this.getDuration(),
};
}
}
```
### Digest Verification
```typescript
private async verifyDigests(
composePath: string,
expectedDigests: Record<string, string>
): Promise<void> {
const composeContent = yaml.parse(await fs.readFile(composePath, "utf-8"));
for (const [service, expectedDigest] of Object.entries(expectedDigests)) {
const serviceConfig = composeContent.services[service];
if (!serviceConfig) {
throw new Error(`Service ${service} not found in compose file`);
}
const image = serviceConfig.image;
if (!image.includes("@sha256:")) {
throw new Error(`Service ${service} image not pinned to digest: ${image}`);
}
const actualDigest = image.split("@")[1];
if (actualDigest !== expectedDigest) {
throw new DigestMismatchError(
`Service ${service}: expected ${expectedDigest}, got ${actualDigest}`
);
}
}
}
```
---
## Security Considerations
1. **Digest Verification:** All deployments verify image digests before execution
2. **Credential Encryption:** Credentials are encrypted in transit and at rest
3. **mTLS Communication:** All agent-server communication uses mutual TLS
4. **Hook Sandboxing:** Pre/post-deploy hooks run in isolated environments
5. **Audit Logging:** All deployment actions are logged with actor context
---
## See Also
- [Agents Module](../modules/agents.md)
- [Agent Security](../security/agent-security.md)
- [Deployment Orchestrator](../modules/deploy-orchestrator.md)
- [Agentless Deployment](agentless.md)

# Agentless Deployment (SSH/WinRM)
> Agentless deployment using SSH and WinRM for remote execution without installing agents.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 10.4](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Agents Module](../modules/agents.md), [Deploy Orchestrator](../modules/deploy-orchestrator.md)
**Sprints:** [108_004 SSH Agent](../../../../implplan/SPRINT_20260110_108_004_AGENTS_ssh.md), [108_005 WinRM Agent](../../../../implplan/SPRINT_20260110_108_005_AGENTS_winrm.md)
## Overview
Agentless deployment enables deployment to targets without requiring a pre-installed agent. The orchestrator connects directly to targets using SSH (Linux/Unix) or WinRM (Windows) to execute deployment commands.
---
## SSH Remote Executor
### Capabilities
- SSH key-based authentication
- File transfer via SFTP
- Remote command execution
- Docker operations over SSH
- Script execution
- Backup and rollback
### Connection Management
```typescript
class SSHRemoteExecutor implements TargetExecutor {
private ssh: SSHClient;
async connect(config: SSHConnectionConfig): Promise<void> {
const privateKey = await this.secrets.getSecret(config.privateKeyRef);
this.ssh = new SSHClient();
await this.ssh.connect({
host: config.host,
port: config.port || 22,
username: config.username,
privateKey: privateKey.value,
readyTimeout: config.connectionTimeout || 30000,
keepaliveInterval: 10000,
});
}
}
```
### Deploy Task Flow
```typescript
async deploy(task: DeployTaskPayload): Promise<DeployResult> {
const { artifacts, config } = task;
const deployDir = config.deploymentDirectory;
try {
// 1. Ensure deployment directory exists
await this.exec(`mkdir -p ${deployDir}`);
await this.exec(`mkdir -p ${deployDir}/.stella-backup`);
// 2. Backup current deployment
await this.exec(`cp -r ${deployDir}/* ${deployDir}/.stella-backup/ 2>/dev/null || true`);
// 3. Upload artifacts
for (const artifact of artifacts) {
const content = await this.fetchArtifact(artifact);
const remotePath = path.join(deployDir, artifact.name);
await this.uploadFile(content, remotePath);
}
// 4. Run pre-deploy hook
if (task.hooks?.preDeploy) {
await this.runRemoteHook(task.hooks.preDeploy, deployDir);
}
// 5. Execute deployment script
const deployScript = artifacts.find(a => a.type === "deploy_script");
if (deployScript) {
const scriptPath = path.join(deployDir, deployScript.name);
await this.exec(`chmod +x ${scriptPath}`);
const result = await this.exec(scriptPath, {
cwd: deployDir,
timeout: config.deploymentTimeout,
env: config.environment,
});
if (result.exitCode !== 0) {
throw new DeploymentError(`Deploy script failed: ${result.stderr}`);
}
}
// 6. Run post-deploy hook
if (task.hooks?.postDeploy) {
await this.runRemoteHook(task.hooks.postDeploy, deployDir);
}
// 7. Health check
if (config.healthCheck) {
const healthy = await this.runHealthCheck(config.healthCheck);
if (!healthy) {
await this.rollback(task);
throw new HealthCheckFailedError("Health check failed");
}
}
// 8. Write version sticker
await this.writeSticker(config.sticker, deployDir);
// 9. Cleanup backup
await this.exec(`rm -rf ${deployDir}/.stella-backup`);
return {
success: true,
logs: this.getLogs(),
durationMs: this.getDuration(),
};
} finally {
this.ssh.end();
}
}
```
### Command Execution
```typescript
private async exec(
command: string,
options?: ExecOptions
): Promise<CommandResult> {
return new Promise((resolve, reject) => {
const timeout = options?.timeout || 60000;
let stdout = "";
let stderr = "";
this.ssh.exec(command, { cwd: options?.cwd }, (err, stream) => {
if (err) {
reject(err);
return;
}
const timer = setTimeout(() => {
stream.close();
reject(new TimeoutError(`Command timed out after ${timeout}ms`));
}, timeout);
stream.on("data", (data: Buffer) => {
stdout += data.toString();
this.log(data.toString());
});
stream.stderr.on("data", (data: Buffer) => {
stderr += data.toString();
this.log(`[stderr] ${data.toString()}`);
});
stream.on("close", (code: number) => {
clearTimeout(timer);
resolve({ exitCode: code, stdout, stderr });
});
});
});
}
```
### File Upload via SFTP
```typescript
private async uploadFile(content: Buffer | string, remotePath: string): Promise<void> {
return new Promise((resolve, reject) => {
this.ssh.sftp((err, sftp) => {
if (err) {
reject(err);
return;
}
const writeStream = sftp.createWriteStream(remotePath);
writeStream.on("close", () => resolve());
writeStream.on("error", reject);
writeStream.end(content);
});
});
}
```
### Rollback
```typescript
async rollback(task: RollbackTaskPayload): Promise<DeployResult> {
const deployDir = task.config.deploymentDirectory;
// Restore from backup
await this.exec(`rm -rf ${deployDir}/*`);
await this.exec(`cp -r ${deployDir}/.stella-backup/* ${deployDir}/`);
// Re-run deployment from backup
const deployScript = path.join(deployDir, "deploy.sh");
await this.exec(deployScript, { cwd: deployDir });
return {
success: true,
logs: this.getLogs(),
durationMs: this.getDuration(),
};
}
```
---
## WinRM Remote Executor
### Capabilities
- NTLM/Kerberos authentication
- PowerShell script execution
- File transfer via base64 encoding
- Windows container operations
- Windows service management
### Connection Management
```typescript
class WinRMRemoteExecutor implements TargetExecutor {
private winrm: WinRMClient;
async connect(config: WinRMConnectionConfig): Promise<void> {
const credential = await this.secrets.getSecret(config.credentialRef);
this.winrm = new WinRMClient({
host: config.host,
port: config.port ?? (config.useHttps ? 5986 : 5985), // 5986 = WinRM over HTTPS, 5985 = HTTP
username: credential.username,
password: credential.password,
protocol: config.useHttps ? "https" : "http",
authentication: config.authType || "ntlm", // ntlm, kerberos, basic
});
await this.winrm.openShell();
}
}
```
### Deploy Task Flow
```typescript
async deploy(task: DeployTaskPayload): Promise<DeployResult> {
const { artifacts, config } = task;
const deployDir = config.deploymentDirectory;
try {
// 1. Ensure deployment directory exists
await this.execPowerShell(`
if (-not (Test-Path "${deployDir}")) {
New-Item -ItemType Directory -Path "${deployDir}" -Force
}
if (-not (Test-Path "${deployDir}\\.stella-backup")) {
New-Item -ItemType Directory -Path "${deployDir}\\.stella-backup" -Force
}
`);
// 2. Backup current deployment
await this.execPowerShell(`
Get-ChildItem "${deployDir}" -Exclude ".stella-backup" |
Copy-Item -Destination "${deployDir}\\.stella-backup" -Recurse -Force
`);
// 3. Upload artifacts
for (const artifact of artifacts) {
const content = await this.fetchArtifact(artifact);
const remotePath = `${deployDir}\\${artifact.name}`;
await this.uploadFile(content, remotePath);
}
// 4. Run pre-deploy hook
if (task.hooks?.preDeploy) {
await this.runRemoteHook(task.hooks.preDeploy, deployDir);
}
// 5. Execute deployment script
const deployScript = artifacts.find(a => a.type === "deploy_script");
if (deployScript) {
const scriptPath = `${deployDir}\\${deployScript.name}`;
const result = await this.execPowerShell(`
Set-Location "${deployDir}"
& "${scriptPath}"
exit $LASTEXITCODE
`, { timeout: config.deploymentTimeout });
if (result.exitCode !== 0) {
throw new DeploymentError(`Deploy script failed: ${result.stderr}`);
}
}
// 6. Run post-deploy hook
if (task.hooks?.postDeploy) {
await this.runRemoteHook(task.hooks.postDeploy, deployDir);
}
// 7. Health check
if (config.healthCheck) {
const healthy = await this.runHealthCheck(config.healthCheck);
if (!healthy) {
await this.rollback(task);
throw new HealthCheckFailedError("Health check failed");
}
}
// 8. Write version sticker
await this.writeSticker(config.sticker, deployDir);
// 9. Cleanup backup
await this.execPowerShell(`
Remove-Item -Path "${deployDir}\\.stella-backup" -Recurse -Force
`);
return {
success: true,
logs: this.getLogs(),
durationMs: this.getDuration(),
};
} finally {
this.winrm.closeShell();
}
}
```
### PowerShell Execution
```typescript
private async execPowerShell(
script: string,
options?: ExecOptions
): Promise<CommandResult> {
const encoded = Buffer.from(script, "utf16le").toString("base64");
return this.winrm.runCommand(
`powershell -EncodedCommand ${encoded}`,
{ timeout: options?.timeout || 60000 }
);
}
```
### File Upload
```typescript
private async uploadFile(content: Buffer | string, remotePath: string): Promise<void> {
// Use PowerShell to write file content
const base64Content = Buffer.from(content).toString("base64");
await this.execPowerShell(`
$bytes = [Convert]::FromBase64String("${base64Content}")
[IO.File]::WriteAllBytes("${remotePath}", $bytes)
`);
}
```
---
## Security Considerations
### SSH Security
1. **Key-Based Authentication:** Always use SSH keys, never passwords
2. **Key Rotation:** Regularly rotate SSH keys
3. **Bastion Hosts:** Use jump hosts for network isolation
4. **Connection Timeouts:** Enforce strict connection timeouts
5. **Known Hosts:** Verify host fingerprints
### WinRM Security
1. **HTTPS Required:** Always use WinRM over HTTPS in production
2. **Certificate Validation:** Validate server certificates
3. **Kerberos Preferred:** Use Kerberos when available, NTLM as fallback
4. **Credential Protection:** Store credentials in vault
5. **Session Cleanup:** Always close sessions after use
---
## Configuration Examples
### SSH Target Configuration
```yaml
target:
name: web-server-01
type: ssh
connection:
host: 192.168.1.100
port: 22
username: deploy
privateKeyRef: vault://ssh-keys/deploy-key
deployment:
directory: /opt/myapp
healthCheck:
command: curl -f http://localhost:8080/health
timeout: 30
```
### WinRM Target Configuration
```yaml
target:
name: windows-server-01
type: winrm
connection:
host: 192.168.1.200
port: 5986
useHttps: true
authType: kerberos
credentialRef: vault://windows-creds/deploy-user
deployment:
directory: C:\Apps\MyApp
healthCheck:
command: Invoke-WebRequest -Uri http://localhost:8080/health -UseBasicParsing
timeout: 30
```
---
## See Also
- [Agent-Based Deployment](agent-based.md)
- [Agents Module](../modules/agents.md)
- [Deployment Orchestrator](../modules/deploy-orchestrator.md)
- [Security Overview](../security/overview.md)

# Artifact Generation
## Overview
Every deployment generates immutable artifacts that enable reproducibility, audit, and rollback.
## Generated Artifacts
### 1. Compose Lock File
**File:** `compose.stella.lock.yml`
A Docker Compose file with all image references pinned to specific digests.
```yaml
# compose.stella.lock.yml
# Generated by Stella Ops - DO NOT EDIT
# Release: myapp-v2.3.1
# Generated: 2026-01-10T14:30:00Z
# Generator: stella-artifact-generator@1.5.0
version: "3.8"
services:
api:
image: registry.example.com/myapp/api@sha256:abc123...
# Original tag: v2.3.1
deploy:
replicas: 2
environment:
- DATABASE_URL=${DATABASE_URL}
- REDIS_URL=${REDIS_URL}
labels:
stella.component.id: "comp-api-uuid"
stella.release.id: "rel-uuid"
stella.digest: "sha256:abc123..."
worker:
image: registry.example.com/myapp/worker@sha256:def456...
# Original tag: v2.3.1
deploy:
replicas: 1
labels:
stella.component.id: "comp-worker-uuid"
stella.release.id: "rel-uuid"
stella.digest: "sha256:def456..."
# Stella metadata
x-stella:
release:
id: "rel-uuid"
name: "myapp-v2.3.1"
created_at: "2026-01-10T14:00:00Z"
environment:
id: "env-uuid"
name: "production"
deployment:
id: "deploy-uuid"
started_at: "2026-01-10T14:30:00Z"
checksums:
sha256: "checksum-of-this-file"
```
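The `x-stella.checksums.sha256` field is self-referential: the hash covers the file itself, so it must be computed with the checksum field blanked and then filled in. A sketch of seal/verify under that blanking convention, which is an assumption of this example:

```typescript
import { createHash } from "crypto";

const PLACEHOLDER = 'sha256: ""';

// Hash the draft (checksum blank), then write the digest into the field.
function sealLockFile(contentWithPlaceholder: string): string {
  const digest = createHash("sha256").update(contentWithPlaceholder).digest("hex");
  return contentWithPlaceholder.replace(PLACEHOLDER, `sha256: "${digest}"`);
}

// Verify by blanking the recorded digest and recomputing.
function verifyLockFile(sealed: string): boolean {
  const match = sealed.match(/sha256: "([0-9a-f]{64})"/);
  if (!match) return false;
  const blanked = sealed.replace(match[0], PLACEHOLDER);
  return createHash("sha256").update(blanked).digest("hex") === match[1];
}
```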
### 2. Version Sticker
**File:** `stella.version.json`
Metadata file placed on deployment targets indicating current deployment state.
```json
{
"version": "1.0",
"generatedAt": "2026-01-10T14:35:00Z",
"generator": "stella-artifact-generator@1.5.0",
"release": {
"id": "rel-uuid",
"name": "myapp-v2.3.1",
"createdAt": "2026-01-10T14:00:00Z",
"components": [
{
"name": "api",
"digest": "sha256:abc123...",
"semver": "2.3.1",
"tag": "v2.3.1"
},
{
"name": "worker",
"digest": "sha256:def456...",
"semver": "2.3.1",
"tag": "v2.3.1"
}
]
},
"deployment": {
"id": "deploy-uuid",
"promotionId": "promo-uuid",
"environmentId": "env-uuid",
"environmentName": "production",
"targetId": "target-uuid",
"targetName": "prod-web-01",
"strategy": "rolling",
"startedAt": "2026-01-10T14:30:00Z",
"completedAt": "2026-01-10T14:35:00Z"
},
"deployer": {
"userId": "user-uuid",
"userName": "john.doe",
"agentId": "agent-uuid",
"agentName": "prod-agent-01"
},
"previous": {
"releaseId": "prev-rel-uuid",
"releaseName": "myapp-v2.3.0",
"digest": "sha256:789..."
},
"signature": "base64-encoded-signature",
"signatureAlgorithm": "RS256",
"signerKeyRef": "stella/signing/prod-key-2026"
}
```
### 3. Evidence Packet
**Storage:** Database rows (exportable as JSON or PDF)
See [Evidence Schema](../appendices/evidence-schema.md) for full specification.
### 4. Deployment Script (Optional)
**File:** `deploy.stella.script.dll` or `deploy.stella.sh`
Generated when a deployment defines C# or shell hook scripts:
```csharp
// deploy.stella.script.csx (source, compiled to deploy.stella.script.dll)
#r "nuget: StellaOps.Sdk, 1.0.0"
using StellaOps.Sdk;
// Pre-deploy hook
await Context.RunPreDeployHook(async (ctx) => {
await ctx.ExecuteCommand("./scripts/backup-database.sh");
await ctx.HealthCheck("/ready", timeout: 30);
});
// Deploy
await Context.Deploy();
// Post-deploy hook
await Context.RunPostDeployHook(async (ctx) => {
await ctx.ExecuteCommand("./scripts/warm-cache.sh");
await ctx.Notify("slack", "Deployment complete");
});
```
## Artifact Storage
### Storage Structure
```
artifacts/
├── {tenant_id}/
│ ├── {deployment_id}/
│ │ ├── compose.stella.lock.yml
│ │ ├── deploy.stella.script.dll (if applicable)
│ │ ├── deploy.stella.script.csx (source)
│ │ ├── manifest.json
│ │ └── checksums.sha256
│ └── ...
└── ...
```
### Manifest File
```json
{
"version": "1.0",
"deploymentId": "deploy-uuid",
"createdAt": "2026-01-10T14:30:00Z",
"artifacts": [
{
"name": "compose.stella.lock.yml",
"type": "compose-lock",
"size": 2048,
"sha256": "abc123..."
},
{
"name": "deploy.stella.script.dll",
"type": "script-compiled",
"size": 8192,
"sha256": "def456..."
}
],
"totalSize": 10240,
"signature": "base64-signature"
}
```
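Verifying a downloaded artifact against its manifest entry reduces to checking the recorded size and SHA-256, as sketched below using the entry shape from the manifest above:

```typescript
import { createHash } from "crypto";

interface ManifestEntry {
  name: string;
  size: number;
  sha256: string;
}

// True only when the bytes match both the declared size and digest.
function verifyEntry(entry: ManifestEntry, data: Buffer): boolean {
  return (
    data.length === entry.size &&
    createHash("sha256").update(data).digest("hex") === entry.sha256
  );
}
```

The size check is cheap and catches truncated downloads before hashing; the digest check catches tampering.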
## Artifact Generation Process
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ ARTIFACT GENERATION FLOW │
│ │
│ ┌─────────────────┐ │
│ │ Promotion │ │
│ │ Approved │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ ARTIFACT GENERATOR │ │
│ │ │ │
│ │ 1. Load release bundle (components, digests) │ │
│ │ 2. Load environment configuration (variables, secrets refs) │ │
│ │ 3. Load workflow template (hooks, scripts) │ │
│ │ 4. Generate compose.stella.lock.yml │ │
│ │ 5. Compile scripts (if any) │ │
│ │ 6. Generate version sticker template │ │
│ │ 7. Compute checksums │ │
│ │ 8. Sign artifacts │ │
│ │ 9. Store in artifact storage │ │
│ │ │ │
│ └────────────────────────────┬────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ DEPLOYMENT ORCHESTRATOR │ │
│ │ │ │
│ │ Artifacts distributed to targets via agents │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
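Steps 4 through 7 of the flow can be sketched as a single fail-fast function. The renderer map stands in for the compose and script generators; signing (step 8) and storage (step 9) are elided:

```typescript
import * as crypto from "node:crypto";

interface GeneratedArtifact { name: string; content: string; sha256: string; }

// Render each artifact, checksum it, then emit a checksums.sha256 file
// covering the whole set. Any renderer that throws aborts generation,
// so a partially built artifact set is never stored.
function generateArtifacts(
  renderers: Record<string, () => string>
): GeneratedArtifact[] {
  const artifacts: GeneratedArtifact[] = [];
  for (const [name, render] of Object.entries(renderers)) {
    const content = render();
    const sha256 = crypto.createHash("sha256").update(content).digest("hex");
    artifacts.push({ name, content, sha256 });
  }
  const checksums = artifacts.map(a => `${a.sha256}  ${a.name}`).join("\n");
  artifacts.push({
    name: "checksums.sha256",
    content: checksums,
    sha256: crypto.createHash("sha256").update(checksums).digest("hex"),
  });
  return artifacts;
}
```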
## Artifact Properties
### Immutability
Once generated, artifacts are never modified:
- Content-addressed storage (hash in path/metadata)
- No overwrite capability
- Append-only storage pattern
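In practice, content addressing means the artifact's SHA-256 becomes part of its storage key, so a given key can only ever refer to one exact byte sequence. A sketch following the storage layout above (the hash-in-path placement is an assumption):

```typescript
import * as crypto from "node:crypto";

// Content-addressed artifact key: writing different bytes necessarily
// produces a different key, so overwrite is impossible by construction.
function artifactKey(
  tenantId: string,
  deploymentId: string,
  name: string,
  content: Buffer
): string {
  const sha256 = crypto.createHash("sha256").update(content).digest("hex");
  return `artifacts/${tenantId}/${deploymentId}/${sha256}/${name}`;
}
```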
### Integrity
All artifacts are:
- Checksummed (SHA-256)
- Signed with deployment key
- Verifiable at deployment time
### Retention
| Environment | Retention Period |
|-------------|------------------|
| Development | 30 days |
| Staging | 90 days |
| Production | 7 years (compliance) |
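The table reduces to a simple lookup used by a retention sweeper. The environment class names are assumptions mapping onto the rows above:

```typescript
type EnvironmentClass = "development" | "staging" | "production";

// Retention periods from the table above, in days.
const RETENTION_DAYS: Record<EnvironmentClass, number> = {
  development: 30,
  staging: 90,
  production: 365 * 7, // 7 years for compliance
};

// An artifact is eligible for deletion only once it outlives its class's window.
function isExpired(env: EnvironmentClass, createdAt: Date, now: Date): boolean {
  const ageDays = (now.getTime() - createdAt.getTime()) / 86_400_000;
  return ageDays > RETENTION_DAYS[env];
}
```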
## API Operations
```yaml
# List artifacts for deployment
GET /api/v1/deployment-jobs/{id}/artifacts
Response: Artifact[]
# Download specific artifact
GET /api/v1/deployment-jobs/{id}/artifacts/{name}
Response: binary
# Get artifact manifest
GET /api/v1/deployment-jobs/{id}/artifacts/manifest
Response: ArtifactManifest
# Verify artifact integrity
POST /api/v1/deployment-jobs/{id}/artifacts/{name}/verify
Response: { valid: boolean, checksum: string, signature: string }
```
## Drift Detection
Version stickers enable drift detection:
```typescript
interface DriftCheck {
targetId: UUID;
expectedSticker: VersionSticker;
actualSticker: VersionSticker | null;
driftDetected: boolean;
driftType?: "missing" | "corrupted" | "mismatch";
details?: {
expectedDigest: string;
actualDigest: string;
field: string;
};
}
```
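A minimal comparison sketch for the interface above, assuming a simplified `VersionSticker` shape; detecting the `corrupted` case would additionally require signature verification over the sticker, which is omitted here:

```typescript
interface VersionSticker { releaseId: string; digest: string; deployedAt: string; }

type DriftType = "missing" | "mismatch";

// Compare the sticker the orchestrator expects with what is actually on the
// target. The digest is checked first because it is the authoritative field.
function checkDrift(
  expected: VersionSticker,
  actual: VersionSticker | null
): { driftDetected: boolean; driftType?: DriftType; field?: string } {
  if (actual === null) return { driftDetected: true, driftType: "missing" };
  if (actual.digest !== expected.digest) {
    return { driftDetected: true, driftType: "mismatch", field: "digest" };
  }
  if (actual.releaseId !== expected.releaseId) {
    return { driftDetected: true, driftType: "mismatch", field: "releaseId" };
  }
  return { driftDetected: false };
}
```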
## References
- [Deployment Overview](overview.md)
- [Deployment Strategies](strategies.md)
- [Evidence Schema](../appendices/evidence-schema.md)

# Deployment Overview
## Purpose
The Deployment system executes releases against target environments, managing deployment jobs and tasks, artifact generation, and rollback.
## Deployment Architecture
```
DEPLOYMENT ARCHITECTURE
┌─────────────────────────────────────────────────────────────────────────────┐
│ DEPLOY ORCHESTRATOR │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ DEPLOYMENT JOB MANAGER │ │
│ │ │ │
│ │ Promotion ───► Create Job ───► Plan Tasks ───► Execute Tasks │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────────┼───────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────┐ ┌─────────────────┐ ┌─────────────────────┐ │
│ │ TARGET EXECUTOR │ │ RUNNER EXECUTOR │ │ ARTIFACT GENERATOR │ │
│ │ │ │ │ │ │ │
│ │ - Task dispatch │ │ - Agent tasks │ │ - Compose files │ │
│ │ - Status tracking │ │ - SSH tasks │ │ - Env configs │ │
│ │ - Log aggregation │ │ - API tasks │ │ - Manifests │ │
│ └─────────────────────┘ └─────────────────┘ └─────────────────────┘ │
│ │ │
└─────────────────────────────────────────────────────────────────────────────┘
┌────────────────────────────┼────────────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Agent │ │ Agentless │ │ API │
│ Execution │ │ Execution │ │ Execution │
│ │ │ │ │ │
│ Docker, │ │ SSH, │ │ ECS, │
│ Compose │ │ WinRM │ │ Nomad │
└─────────────┘ └─────────────┘ └─────────────┘
```
## Deployment Flow
### Standard Deployment Flow
```
DEPLOYMENT FLOW
Promotion Deployment Task Agent/Target
Approved Job Execution
│ │ │ │
│ Create Job │ │ │
├───────────────►│ │ │
│ │ │ │
│ │ Generate │ │
│ │ Artifacts │ │
│ ├────────────────►│ │
│ │ │ │
│ │ Create Tasks │ │
│ │ per Target │ │
│ ├────────────────►│ │
│ │ │ │
│ │ │ Dispatch Task │
│ │ ├────────────────►│
│ │ │ │
│ │ │ Execute │
│ │ │ (Pull, Deploy) │
│ │ │ │
│ │ │ Report Status │
│ │ │◄────────────────┤
│ │ │ │
│ │ Aggregate │ │
│ │ Results │ │
│ │◄────────────────┤ │
│ │ │ │
│ Job Complete │ │ │
│◄───────────────┤ │ │
│ │ │ │
```
## Deployment Job
### Job Entity
```typescript
interface DeploymentJob {
id: UUID;
promotionId: UUID;
releaseId: UUID;
environmentId: UUID;
// Execution configuration
strategy: DeploymentStrategy;
parallelism: number;
// Status tracking
status: JobStatus;
startedAt?: DateTime;
completedAt?: DateTime;
// Artifacts
artifacts: GeneratedArtifact[];
// Rollback reference
rollbackOf?: UUID; // If this is a rollback job
previousJobId?: UUID; // Previous successful job
// Tasks
tasks: DeploymentTask[];
}
type JobStatus =
| "pending"
| "preparing"
| "running"
| "completing"
| "completed"
| "failed"
| "rolling_back"
| "rolled_back";
type DeploymentStrategy =
| "all-at-once"
| "rolling"
| "canary"
| "blue-green";
```
### Job State Machine
```
JOB STATE MACHINE
┌──────────┐
│ PENDING │
└────┬─────┘
│ start()
┌──────────┐
│PREPARING │
│ │
│ Generate │
│ artifacts│
└────┬─────┘
┌──────────┐
│ RUNNING │◄────────────────┐
│ │ │
│ Execute │ │
│ tasks │ │
└────┬─────┘ │
│ │
┌───────────────┼───────────────┐ │
│ │ │ │
▼ ▼ ▼ │
┌──────────┐ ┌──────────┐ ┌──────────┐ │
│COMPLETING│ │ FAILED │ │ ROLLING │ │
│ │ │ │ │ BACK │──┘
│ Verify │ │ │ │ │
│ health │ │ │ │ │
└────┬─────┘ └────┬─────┘ └────┬─────┘
│ │ │
▼ │ ▼
┌──────────┐ │ ┌──────────┐
│COMPLETED │ │ │ ROLLED │
└──────────┘ │ │ BACK │
│ └──────────┘
[Failure
handling]
```
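The diagram can be encoded as a transition table so illegal transitions are rejected at one choke point. The edges are read from the diagram above; the `failed` to `rolling_back` edge is an assumption based on the "[Failure handling]" note:

```typescript
type JobStatus =
  | "pending" | "preparing" | "running" | "completing"
  | "completed" | "failed" | "rolling_back" | "rolled_back";

// Legal transitions; terminal states map to empty arrays.
const TRANSITIONS: Record<JobStatus, JobStatus[]> = {
  pending: ["preparing"],
  preparing: ["running", "failed"],
  running: ["completing", "failed", "rolling_back"],
  completing: ["completed", "failed"],
  completed: [],
  failed: ["rolling_back"],
  rolling_back: ["rolled_back", "running"], // diagram loops back into RUNNING
  rolled_back: [],
};

function canTransition(from: JobStatus, to: JobStatus): boolean {
  return TRANSITIONS[from].includes(to);
}
```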
## Deployment Task
### Task Entity
```typescript
interface DeploymentTask {
id: UUID;
jobId: UUID;
targetId: UUID;
// What to deploy
componentId: UUID;
digest: string;
// Execution
status: TaskStatus;
agentId?: UUID;
startedAt?: DateTime;
completedAt?: DateTime;
// Results
logs: string;
previousDigest?: string; // For rollback
error?: string;
// Retry tracking
attemptNumber: number;
maxAttempts: number;
}
type TaskStatus =
| "pending"
| "queued"
| "dispatched"
| "running"
| "verifying"
| "succeeded"
| "failed"
| "retrying";
```
### Task Dispatch
```typescript
class TaskDispatcher {
async dispatchTask(task: DeploymentTask): Promise<void> {
const target = await this.targetRepository.get(task.targetId);
switch (target.executionModel) {
case "agent":
await this.dispatchToAgent(task, target);
break;
case "ssh":
await this.dispatchViaSsh(task, target);
        break;
      case "winrm":
        await this.dispatchViaWinRm(task, target);
        break;
case "api":
await this.dispatchViaApi(task, target);
break;
}
}
private async dispatchToAgent(
task: DeploymentTask,
target: Target
): Promise<void> {
// Find available agent for target
const agent = await this.agentManager.findAgentForTarget(target);
if (!agent) {
throw new NoAgentAvailableError(target.id);
}
// Create task payload
const payload: AgentTaskPayload = {
taskId: task.id,
targetId: target.id,
action: "deploy",
digest: task.digest,
config: target.connection,
credentials: await this.fetchTaskCredentials(target)
};
// Dispatch to agent
await this.agentClient.dispatchTask(agent.id, payload);
// Update task status
task.status = "dispatched";
task.agentId = agent.id;
await this.taskRepository.update(task);
}
}
```
## Generated Artifacts
### Artifact Types
| Type | Description | Format |
|------|-------------|--------|
| `compose-file` | Docker Compose file | YAML |
| `compose-lock` | Pinned compose file | YAML |
| `env-file` | Environment variables | .env |
| `systemd-unit` | Systemd service unit | .service |
| `nginx-config` | Nginx configuration | .conf |
| `manifest` | Deployment manifest | JSON |
### Compose Lock Generation
```typescript
import * as crypto from "node:crypto";
import * as yaml from "yaml"; // the "yaml" npm package

interface ComposeLock {
version: string;
services: Record<string, LockedService>;
generated: {
releaseId: string;
promotionId: string;
timestamp: string;
digest: string; // Hash of this file
};
}
interface LockedService {
image: string; // Full image reference with digest
environment?: Record<string, string>;
labels: Record<string, string>;
}
class ComposeArtifactGenerator {
async generateLock(
release: Release,
target: Target,
template: ComposeTemplate
): Promise<ComposeLock> {
const services: Record<string, LockedService> = {};
for (const [serviceName, serviceConfig] of Object.entries(template.services)) {
// Find component for this service
const componentDigest = release.components.find(
c => c.name === serviceConfig.componentName
);
if (!componentDigest) {
throw new Error(`No component found for service ${serviceName}`);
}
// Build locked image reference
const imageRef = `${componentDigest.repository}@${componentDigest.digest}`;
services[serviceName] = {
image: imageRef,
environment: {
...serviceConfig.environment,
STELLA_RELEASE_ID: release.id,
STELLA_DIGEST: componentDigest.digest
},
labels: {
"stella.release.id": release.id,
"stella.component.name": componentDigest.name,
"stella.digest": componentDigest.digest,
"stella.deployed.at": new Date().toISOString()
}
};
}
const lock: ComposeLock = {
version: "3.8",
services,
generated: {
releaseId: release.id,
promotionId: target.promotionId,
timestamp: new Date().toISOString(),
digest: "" // Computed below
}
};
    // Hash the serialized lock while the digest field is still empty; verifiers
    // must blank generated.digest before re-hashing to reproduce this value.
    const content = yaml.stringify(lock);
    lock.generated.digest = crypto.createHash("sha256").update(content).digest("hex");
return lock;
}
}
```
## Deployment Execution
### Execution Models
| Model | Description | Use Case |
|-------|-------------|----------|
| `agent` | Stella agent on target | Docker hosts, servers |
| `ssh` | SSH-based agentless | Unix servers |
| `winrm` | WinRM-based agentless | Windows servers |
| `api` | API-based | ECS, Nomad, K8s |
### Agent-Based Execution
```typescript
const sleep = (ms: number) => new Promise<void>(r => setTimeout(r, ms));

class AgentExecutor {
async execute(task: DeploymentTask): Promise<ExecutionResult> {
const agent = await this.agentManager.get(task.agentId);
const target = await this.targetRepository.get(task.targetId);
// Prepare task payload with secrets
const payload: TaskPayload = {
taskId: task.id,
targetId: target.id,
action: "deploy",
digest: task.digest,
config: target.connection,
artifacts: await this.getArtifacts(task.jobId),
credentials: await this.secretsManager.fetchForTask(target)
};
// Dispatch to agent
const taskRef = await this.agentClient.dispatchTask(agent.id, payload);
// Wait for completion
const result = await this.waitForTaskCompletion(taskRef, task.timeout);
return result;
}
private async waitForTaskCompletion(
taskRef: TaskReference,
timeout: number
): Promise<ExecutionResult> {
const deadline = Date.now() + timeout * 1000;
while (Date.now() < deadline) {
const status = await this.agentClient.getTaskStatus(taskRef);
if (status.completed) {
return {
success: status.success,
logs: status.logs,
deployedDigest: status.deployedDigest,
error: status.error
};
}
await sleep(1000);
}
throw new TimeoutError(`Task did not complete within ${timeout} seconds`);
}
}
```
### SSH-Based Execution
```typescript
import { NodeSSH } from "node-ssh";

class SshExecutor {
async execute(task: DeploymentTask): Promise<ExecutionResult> {
const target = await this.targetRepository.get(task.targetId);
const sshConfig = target.connection as SshConnectionConfig;
// Get SSH credentials from vault
const creds = await this.secretsManager.fetchSshCredentials(
sshConfig.credentialRef
);
// Connect via SSH
const ssh = new NodeSSH();
await ssh.connect({
host: sshConfig.host,
port: sshConfig.port || 22,
username: creds.username,
privateKey: creds.privateKey
});
try {
// Upload artifacts
const artifacts = await this.getArtifacts(task.jobId);
for (const artifact of artifacts) {
await ssh.putFile(artifact.localPath, artifact.remotePath);
}
// Execute deployment script
const result = await ssh.execCommand(
this.buildDeployCommand(task, target),
{ cwd: sshConfig.workDir }
);
return {
success: result.code === 0,
logs: `${result.stdout}\n${result.stderr}`,
error: result.code !== 0 ? result.stderr : undefined
};
} finally {
ssh.dispose();
}
}
private buildDeployCommand(task: DeploymentTask, target: Target): string {
// Build deployment command based on target type
switch (target.targetType) {
case "compose_host":
return `cd ${target.connection.workDir} && docker-compose pull && docker-compose up -d`;
      case "docker_host": {
        // A bare digest is not a pullable reference; pin as repo@digest
        // (target.repository is assumed to carry the component's repository).
        const imageRef = `${target.repository}@${task.digest}`;
        return `docker pull ${imageRef} && docker stop ${target.containerName} && docker rm ${target.containerName} && docker run -d --name ${target.containerName} ${imageRef}`;
      }
default:
throw new Error(`Unsupported target type: ${target.targetType}`);
}
}
}
```
## Health Verification
```typescript
interface HealthCheckConfig {
type: "http" | "tcp" | "command";
timeout: number;
retries: number;
interval: number;
// HTTP-specific
path?: string;
expectedStatus?: number;
expectedBody?: string;
// TCP-specific
port?: number;
// Command-specific
command?: string;
}
const sleep = (ms: number) => new Promise<void>(r => setTimeout(r, ms));

class HealthVerifier {
async verify(
target: Target,
config: HealthCheckConfig
): Promise<HealthCheckResult> {
let lastError: Error | undefined;
for (let attempt = 0; attempt < config.retries; attempt++) {
try {
const result = await this.performCheck(target, config);
if (result.healthy) {
return result;
}
lastError = new Error(result.message);
} catch (error) {
lastError = error as Error;
}
if (attempt < config.retries - 1) {
await sleep(config.interval * 1000);
}
}
return {
healthy: false,
message: lastError?.message || "Health check failed",
attempts: config.retries
};
}
private async performCheck(
target: Target,
config: HealthCheckConfig
): Promise<HealthCheckResult> {
switch (config.type) {
case "http":
return this.httpCheck(target, config);
case "tcp":
return this.tcpCheck(target, config);
case "command":
return this.commandCheck(target, config);
}
}
private async httpCheck(
target: Target,
config: HealthCheckConfig
): Promise<HealthCheckResult> {
const url = `${target.healthEndpoint}${config.path || "/health"}`;
try {
const response = await fetch(url, {
signal: AbortSignal.timeout(config.timeout * 1000)
});
const healthy = response.status === (config.expectedStatus || 200);
return {
healthy,
message: healthy ? "OK" : `Status ${response.status}`,
statusCode: response.status
};
} catch (error) {
return {
healthy: false,
message: (error as Error).message
};
}
}
}
```
## Rollback Management
```typescript
import { v4 as uuidv4 } from "uuid";

class RollbackManager {
async initiateRollback(
jobId: UUID,
reason: string
): Promise<DeploymentJob> {
const failedJob = await this.jobRepository.get(jobId);
const previousJob = await this.findPreviousSuccessfulJob(
failedJob.environmentId,
failedJob.releaseId
);
if (!previousJob) {
throw new NoRollbackTargetError(jobId);
}
// Create rollback job
const rollbackJob: DeploymentJob = {
id: uuidv4(),
promotionId: failedJob.promotionId,
releaseId: previousJob.releaseId, // Previous release
environmentId: failedJob.environmentId,
strategy: "all-at-once", // Fast rollback
parallelism: 10,
status: "pending",
rollbackOf: jobId,
previousJobId: previousJob.id,
artifacts: [],
tasks: []
};
// Create tasks to restore previous state
for (const task of failedJob.tasks) {
const previousTask = previousJob.tasks.find(
t => t.targetId === task.targetId
);
if (previousTask) {
rollbackJob.tasks.push({
id: uuidv4(),
jobId: rollbackJob.id,
targetId: task.targetId,
componentId: previousTask.componentId,
digest: previousTask.digest, // redeploy what the previous successful job shipped
status: "pending",
logs: "",
attemptNumber: 0,
maxAttempts: 3
});
}
}
await this.jobRepository.save(rollbackJob);
// Execute rollback
await this.executeJob(rollbackJob);
return rollbackJob;
}
private async findPreviousSuccessfulJob(
environmentId: UUID,
excludeReleaseId: UUID
): Promise<DeploymentJob | null> {
return this.jobRepository.findOne({
environmentId,
status: "completed",
releaseId: { $ne: excludeReleaseId }
}, {
orderBy: { completedAt: "desc" }
});
}
}
```
## References
- [Deployment Strategies](strategies.md)
- [Agent-Based Deployment](agent-based.md)
- [Agentless Deployment](agentless.md)
- [Generated Artifacts](artifacts.md)
- [Deploy Orchestrator Module](../modules/deploy-orchestrator.md)

# Deployment Strategies
## Overview
Release Orchestrator supports multiple deployment strategies to balance deployment speed, risk, and availability requirements.
## Strategy Comparison
| Strategy | Description | Risk Level | Downtime | Rollback Speed |
|----------|-------------|------------|----------|----------------|
| All-at-once | Deploy to all targets simultaneously | High | Brief | Fast |
| Rolling | Deploy to targets in batches | Medium | None | Medium |
| Canary | Deploy to subset, then expand | Low | None | Fast |
| Blue-Green | Deploy to parallel environment | Low | None | Instant |
## All-at-Once Strategy
### Description
Deploys to all targets simultaneously. Simple and fast, but highest risk.
```
ALL-AT-ONCE DEPLOYMENT
Time 0 Time 1
┌─────────────────┐ ┌─────────────────┐
│ Target 1 [v1] │ │ Target 1 [v2] │
├─────────────────┤ ├─────────────────┤
│ Target 2 [v1] │ ───► │ Target 2 [v2] │
├─────────────────┤ ├─────────────────┤
│ Target 3 [v1] │ │ Target 3 [v2] │
└─────────────────┘ └─────────────────┘
```
### Configuration
```typescript
interface AllAtOnceConfig {
strategy: "all-at-once";
// Concurrency limit (0 = unlimited)
maxConcurrent: number;
// Health check after deployment
healthCheck: HealthCheckConfig;
// Failure behavior
failureBehavior: "rollback" | "continue" | "pause";
}
// Example
const config: AllAtOnceConfig = {
strategy: "all-at-once",
maxConcurrent: 0,
healthCheck: {
type: "http",
path: "/health",
timeout: 30,
retries: 3,
interval: 10
},
failureBehavior: "rollback"
};
```
### Execution
```typescript
import pMap from "p-map";

class AllAtOnceExecutor {
async execute(job: DeploymentJob, config: AllAtOnceConfig): Promise<void> {
const tasks = job.tasks;
const concurrency = config.maxConcurrent || tasks.length;
// Execute all tasks with concurrency limit
const results = await pMap(
tasks,
async (task) => {
try {
await this.executeTask(task);
return { taskId: task.id, success: true };
} catch (error) {
return { taskId: task.id, success: false, error };
}
},
{ concurrency }
);
// Check for failures
const failures = results.filter(r => !r.success);
if (failures.length > 0) {
if (config.failureBehavior === "rollback") {
await this.rollbackAll(job);
throw new DeploymentFailedError(failures);
} else if (config.failureBehavior === "pause") {
job.status = "failed";
throw new DeploymentFailedError(failures);
}
// "continue" - proceed despite failures
}
// Health check all targets
await this.verifyAllTargets(job, config.healthCheck);
}
}
```
### Use Cases
- Development environments
- Small deployments
- Time-critical updates
- Stateless services with fast startup
## Rolling Strategy
### Description
Deploys to targets in configurable batches, maintaining availability throughout.
```
ROLLING DEPLOYMENT (batch size: 1)
Time 0 Time 1 Time 2 Time 3
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ T1 [v1] │ │ T1 [v2] ✓ │ │ T1 [v2] ✓ │ │ T1 [v2] ✓ │
├─────────────┤ ├─────────────┤ ├─────────────┤ ├─────────────┤
│ T2 [v1] │──►│ T2 [v1] │──►│ T2 [v2] ✓ │──►│ T2 [v2] ✓ │
├─────────────┤ ├─────────────┤ ├─────────────┤ ├─────────────┤
│ T3 [v1] │ │ T3 [v1] │ │ T3 [v1] │ │ T3 [v2] ✓ │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
```
### Configuration
```typescript
interface RollingConfig {
strategy: "rolling";
// Batch configuration
batchSize: number; // Targets per batch
batchPercent?: number; // Alternative: percentage of targets
// Timing
batchDelay: number; // Seconds between batches
stabilizationTime: number; // Wait after health check passes
// Health check
healthCheck: HealthCheckConfig;
// Failure handling
maxFailedBatches: number; // Failures before stopping
failureBehavior: "rollback" | "pause" | "skip";
// Ordering
targetOrder: "default" | "shuffle" | "priority";
}
// Example
const config: RollingConfig = {
strategy: "rolling",
batchSize: 2,
batchDelay: 30,
stabilizationTime: 60,
healthCheck: {
type: "http",
path: "/health",
timeout: 30,
retries: 5,
interval: 10
},
maxFailedBatches: 1,
failureBehavior: "rollback",
targetOrder: "default"
};
```
### Execution
```typescript
const sleep = (ms: number) => new Promise<void>(r => setTimeout(r, ms));

class RollingExecutor {
async execute(job: DeploymentJob, config: RollingConfig): Promise<void> {
const tasks = this.orderTasks(job.tasks, config.targetOrder);
const batches = this.createBatches(tasks, config);
let failedBatches = 0;
const completedTasks: DeploymentTask[] = [];
for (const batch of batches) {
this.emitProgress(job, {
phase: "deploying",
currentBatch: batches.indexOf(batch) + 1,
totalBatches: batches.length,
completedTargets: completedTasks.length,
totalTargets: tasks.length
});
// Execute batch
const results = await Promise.all(
batch.map(task => this.executeTask(task))
);
// Check batch results
const failures = results.filter(r => !r.success);
if (failures.length > 0) {
failedBatches++;
if (failedBatches > config.maxFailedBatches) {
if (config.failureBehavior === "rollback") {
await this.rollbackCompleted(completedTasks);
}
throw new DeploymentFailedError(failures);
}
if (config.failureBehavior === "pause") {
job.status = "failed";
throw new DeploymentFailedError(failures);
}
// "skip" - continue to next batch
}
// Health check batch targets
await this.verifyBatch(batch, config.healthCheck);
// Wait for stabilization
if (config.stabilizationTime > 0) {
await sleep(config.stabilizationTime * 1000);
}
completedTasks.push(...batch);
// Wait before next batch
if (batches.indexOf(batch) < batches.length - 1) {
await sleep(config.batchDelay * 1000);
}
}
}
private createBatches(
tasks: DeploymentTask[],
config: RollingConfig
): DeploymentTask[][] {
const batchSize = config.batchPercent
? Math.ceil(tasks.length * config.batchPercent / 100)
: config.batchSize;
const batches: DeploymentTask[][] = [];
for (let i = 0; i < tasks.length; i += batchSize) {
batches.push(tasks.slice(i, i + batchSize));
}
return batches;
}
}
```
### Use Cases
- Production deployments
- High-availability requirements
- Large target counts
- Services requiring gradual rollout
## Canary Strategy
### Description
Deploys to a small subset of targets first, validates, then expands to remaining targets.
```
CANARY DEPLOYMENT
Phase 1: Canary (10%) Phase 2: Expand (50%) Phase 3: Full (100%)
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ T1 [v2] ✓ │ ◄─canary │ T1 [v2] ✓ │ │ T1 [v2] ✓ │
├─────────────┤ ├─────────────┤ ├─────────────┤
│ T2 [v1] │ │ T2 [v2] ✓ │ │ T2 [v2] ✓ │
├─────────────┤ ├─────────────┤ ├─────────────┤
│ T3 [v1] │ │ T3 [v2] ✓ │ │ T3 [v2] ✓ │
├─────────────┤ ├─────────────┤ ├─────────────┤
│ T4 [v1] │ │ T4 [v2] ✓ │ │ T4 [v2] ✓ │
├─────────────┤ ├─────────────┤ ├─────────────┤
│ T5 [v1] │ │ T5 [v1] │ │ T5 [v2] ✓ │
└─────────────┘ └─────────────┘ └─────────────┘
│ │ │
▼ ▼ ▼
Health Check Health Check Health Check
Error Rate Check Error Rate Check Error Rate Check
```
### Configuration
```typescript
interface CanaryConfig {
strategy: "canary";
// Canary stages
stages: CanaryStage[];
// Canary selection
canarySelector: "random" | "labeled" | "first";
canaryLabel?: string; // Label for canary targets
// Automatic vs manual progression
autoProgress: boolean;
// Health and metrics checks
healthCheck: HealthCheckConfig;
metricsCheck?: MetricsCheckConfig;
}
interface CanaryStage {
name: string;
percentage: number; // Target percentage
duration: number; // Minimum time at this stage (seconds)
autoProgress: boolean; // Auto-advance after duration
}
interface MetricsCheckConfig {
integrationId: UUID; // Metrics integration
queries: MetricQuery[];
failureThreshold: number; // Percentage deviation to fail
}
interface MetricQuery {
name: string;
query: string; // PromQL or similar
operator: "lt" | "gt" | "eq";
threshold: number;
}
// Example
const config: CanaryConfig = {
strategy: "canary",
stages: [
{ name: "canary", percentage: 10, duration: 300, autoProgress: false },
{ name: "expand", percentage: 50, duration: 300, autoProgress: true },
{ name: "full", percentage: 100, duration: 0, autoProgress: true }
],
canarySelector: "labeled",
canaryLabel: "canary=true",
autoProgress: false,
healthCheck: {
type: "http",
path: "/health",
timeout: 30,
retries: 5,
interval: 10
},
metricsCheck: {
integrationId: "prometheus-uuid",
queries: [
{
name: "error_rate",
query: "rate(http_requests_total{status=~\"5..\"}[5m]) / rate(http_requests_total[5m])",
operator: "lt",
threshold: 0.01 // Less than 1% error rate
}
],
failureThreshold: 10
}
};
```
### Execution
```typescript
class CanaryExecutor {
async execute(job: DeploymentJob, config: CanaryConfig): Promise<void> {
const tasks = this.orderTasks(job.tasks, config);
for (const stage of config.stages) {
const targetCount = Math.ceil(tasks.length * stage.percentage / 100);
const stageTasks = tasks.slice(0, targetCount);
const newTasks = stageTasks.filter(t => t.status === "pending");
this.emitProgress(job, {
phase: "canary",
stage: stage.name,
percentage: stage.percentage,
targets: stageTasks.length
});
// Deploy to new targets in this stage
await Promise.all(newTasks.map(task => this.executeTask(task)));
// Health check stage targets
await this.verifyTargets(stageTasks, config.healthCheck);
// Metrics check if configured
if (config.metricsCheck) {
await this.checkMetrics(stageTasks, config.metricsCheck);
}
// Wait for stage duration
if (stage.duration > 0) {
await this.waitWithMonitoring(
stageTasks,
stage.duration,
config.metricsCheck
);
}
// Wait for manual approval if not auto-progress
if (!stage.autoProgress && stage.percentage < 100) {
await this.waitForApproval(job, stage.name);
}
}
}
private async checkMetrics(
targets: DeploymentTask[],
config: MetricsCheckConfig
): Promise<void> {
const metricsClient = await this.getMetricsClient(config.integrationId);
for (const query of config.queries) {
const result = await metricsClient.query(query.query);
const passed = this.evaluateMetric(result, query);
if (!passed) {
throw new CanaryMetricsFailedError(query.name, result, query.threshold);
}
}
}
}
```
### Use Cases
- Risk-sensitive deployments
- Services with real user traffic
- Deployments with metrics-based validation
- Gradual feature rollouts
## Blue-Green Strategy
### Description
Deploys to a parallel "green" environment while "blue" continues serving traffic, then switches.
```
BLUE-GREEN DEPLOYMENT
Phase 1: Deploy Green Phase 2: Switch Traffic
┌─────────────────────────┐ ┌─────────────────────────┐
│ Load Balancer │ │ Load Balancer │
│ │ │ │ │ │
│ ▼ │ │ ▼ │
│ ┌─────────────┐ │ │ ┌─────────────┐ │
│ │ Blue [v1] │◄─active│ │ │ Blue [v1] │ │
│ │ T1, T2, T3 │ │ │ │ T1, T2, T3 │ │
│ └─────────────┘ │ │ └─────────────┘ │
│ │ │ │
│ ┌─────────────┐ │ │ ┌─────────────┐ │
│ │ Green [v2] │◄─deploy│ │ │ Green [v2] │◄─active│
│ │ T4, T5, T6 │ │ │ │ T4, T5, T6 │ │
│ └─────────────┘ │ │ └─────────────┘ │
│ │ │ │
└─────────────────────────┘ └─────────────────────────┘
```
### Configuration
```typescript
interface BlueGreenConfig {
strategy: "blue-green";
// Environment labels
blueLabel: string; // Label for blue targets
greenLabel: string; // Label for green targets
// Traffic routing
routerIntegration: UUID; // Router/LB integration
routingConfig: RoutingConfig;
// Validation
healthCheck: HealthCheckConfig;
warmupTime: number; // Seconds to warm up green
validationTests?: string[]; // Test suites to run
// Switchover
switchoverMode: "instant" | "gradual";
gradualSteps?: number[]; // Percentage steps for gradual
// Rollback
keepBlueActive: number; // Seconds to keep blue ready
}
// Example
const config: BlueGreenConfig = {
strategy: "blue-green",
blueLabel: "deployment=blue",
greenLabel: "deployment=green",
routerIntegration: "nginx-lb-uuid",
routingConfig: {
upstreamName: "myapp",
healthEndpoint: "/health"
},
healthCheck: {
type: "http",
path: "/health",
timeout: 30,
retries: 5,
interval: 10
},
warmupTime: 60,
validationTests: ["smoke-test-suite"],
switchoverMode: "instant",
keepBlueActive: 1800 // 30 minutes
};
```
### Execution
```typescript
const sleep = (ms: number) => new Promise<void>(r => setTimeout(r, ms));

class BlueGreenExecutor {
async execute(job: DeploymentJob, config: BlueGreenConfig): Promise<void> {
// Identify blue and green targets
const { blue, green } = this.categorizeTargets(job.tasks, config);
// Phase 1: Deploy to green
this.emitProgress(job, { phase: "deploying-green" });
await Promise.all(green.map(task => this.executeTask(task)));
// Health check green targets
await this.verifyTargets(green, config.healthCheck);
// Warmup period
if (config.warmupTime > 0) {
this.emitProgress(job, { phase: "warming-up" });
await sleep(config.warmupTime * 1000);
}
// Run validation tests
if (config.validationTests?.length) {
this.emitProgress(job, { phase: "validating" });
await this.runValidationTests(green, config.validationTests);
}
// Phase 2: Switch traffic
this.emitProgress(job, { phase: "switching-traffic" });
if (config.switchoverMode === "instant") {
await this.instantSwitchover(config, blue, green);
} else {
await this.gradualSwitchover(config, blue, green);
}
// Verify traffic routing
await this.verifyRouting(green, config);
// Schedule blue decommission
if (config.keepBlueActive > 0) {
this.scheduleBlueDecommission(blue, config.keepBlueActive);
}
}
private async instantSwitchover(
config: BlueGreenConfig,
blue: DeploymentTask[],
green: DeploymentTask[]
): Promise<void> {
const router = await this.getRouter(config.routerIntegration);
// Update upstream to green targets
await router.updateUpstream(config.routingConfig.upstreamName, {
servers: green.map(t => ({
address: t.target.address,
weight: 1
}))
});
// Remove blue from rotation
await router.removeServers(
config.routingConfig.upstreamName,
blue.map(t => t.target.address)
);
}
private async gradualSwitchover(
config: BlueGreenConfig,
blue: DeploymentTask[],
green: DeploymentTask[]
): Promise<void> {
const router = await this.getRouter(config.routerIntegration);
const steps = config.gradualSteps || [25, 50, 75, 100];
for (const percentage of steps) {
await router.setTrafficSplit(config.routingConfig.upstreamName, {
blue: 100 - percentage,
green: percentage
});
// Monitor for errors
await this.monitorTraffic(30);
}
}
}
```
### Use Cases
- Zero-downtime deployments
- Database migration deployments
- High-stakes production updates
- Instant rollback requirements
## Strategy Selection Guide
```
STRATEGY SELECTION

Zero downtime needed?
├── No  → All-at-once
└── Yes → Metrics-based validation needed?
    ├── Yes → Canary
    └── No  → Instant rollback required?
        ├── Yes → Blue-Green
        └── No  → Rolling
```
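The same decision tree as a function, useful as a default when an environment's deployment policy leaves the strategy unspecified:

```typescript
type Strategy = "all-at-once" | "rolling" | "canary" | "blue-green";

// Encodes the selection guide above; each question maps to one flag.
function selectStrategy(opts: {
  zeroDowntime: boolean;
  metricsValidation: boolean;
  instantRollback: boolean;
}): Strategy {
  if (!opts.zeroDowntime) return "all-at-once";
  if (opts.metricsValidation) return "canary";
  if (opts.instantRollback) return "blue-green";
  return "rolling";
}
```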
## References
- [Deployment Overview](overview.md)
- [Progressive Delivery](../modules/progressive-delivery.md)
- [Rollback Management](overview.md#rollback-management)

# Key Architectural Decisions
This document records significant architectural decisions and their rationale.
## ADR-001: Digest-First Release Identity
**Status:** Accepted
**Context:**
Container images can be referenced by tags (e.g., `v1.2.3`) or digests (e.g., `sha256:abc123...`). Tags are mutable - the same tag can point to different images over time.
**Decision:**
All releases are identified by immutable OCI digests, never tags. Tags are accepted as input but immediately resolved to digests at release creation time.
**Consequences:**
- Releases are immutable and reproducible
- Digest mismatch at pull time indicates tampering (deployment fails)
- Rollback targets specific digest, not "previous tag"
- Requires registry integration for tag resolution
- Users see both tag (friendly) and digest (authoritative) in UI
---
## ADR-002: Evidence for Every Decision
**Status:** Accepted
**Context:**
Compliance and audit requirements demand proof of what was deployed, when, by whom, and why.
**Decision:**
Every promotion and deployment produces a cryptographically signed evidence packet that is immutable and append-only.
**Consequences:**
- Evidence table has no UPDATE/DELETE permissions
- Evidence enables audit-grade compliance reporting
- Evidence enables deterministic replay (same inputs + policy = same decision)
- Evidence packets are exportable for external audit systems
- Storage requirements increase over time
---
## ADR-003: Plugin Architecture for Integrations
**Status:** Accepted
**Context:**
Organizations use diverse toolchains (registries, CI/CD, vaults, notification systems). Hard-coding integrations limits adoption.
**Decision:**
All integrations are implemented as plugins via a three-surface contract (Manifest, Connector Runtime, Step Provider). Core orchestration is stable and plugin-agnostic.
**Consequences:**
- Core has no hard-coded vendor integrations
- New integrations can be added without core changes
- Plugin failures cannot crash core (sandbox isolation)
- Plugin interface must be versioned and stable
- Additional complexity in plugin lifecycle management
---
## ADR-004: No Feature Gating
**Status:** Accepted
**Context:**
Enterprise software often gates security features behind premium tiers, creating "pay for security" anti-patterns.
**Decision:**
All plans include all features. Pricing is based only on:
- Number of environments
- New digests analyzed per day
- Fair use on deployments
**Consequences:**
- No feature flags tied to billing tier
- Transparent pricing without feature fragmentation
- May limit revenue optimization per customer
- Quota enforcement must be clear and user-friendly
---
## ADR-005: Offline-First Operation
**Status:** Accepted
**Context:**
Many organizations operate in air-gapped or restricted network environments. Dependency on external services limits adoption.
**Decision:**
All core operations must work in air-gapped environments. External data is synced via mirror bundles. Plugins may require connectivity; core does not.
**Consequences:**
- No runtime calls to external APIs for core decisions
- Advisory data synced via offline bundles
- Plugin connectivity requirements are declared in manifest
- Evidence packets exportable for external submission
- Additional complexity in data synchronization
---
## ADR-006: Agent-Based and Agentless Deployment
**Status:** Accepted
**Context:**
Some organizations prefer agents for security isolation; others prefer agentless for simplicity.
**Decision:**
Support both agent-based (persistent daemon on targets) and agentless (SSH/WinRM on demand) deployment models.
**Consequences:**
- Agent provides better performance and reliability
- Agentless reduces infrastructure footprint
- Unified task model abstracts deployment details
- Security model must handle both patterns
- Higher testing matrix
---
## ADR-007: PostgreSQL as Primary Database
**Status:** Accepted
**Context:**
Database choice affects scalability, operations, and feature availability.
**Decision:**
PostgreSQL (16+) as the primary database with:
- Per-module schema isolation
- Row-level security for multi-tenancy
- JSONB for flexible configuration
- Append-only triggers for evidence tables
**Consequences:**
- Proven scalability and reliability
- Rich feature set (JSONB, RLS, triggers)
- Single database technology to operate
- Requires PostgreSQL expertise
- Schema migrations must be carefully managed
---
## ADR-008: Workflow Engine with DAG Execution
**Status:** Accepted
**Context:**
Deployment workflows need conditional logic, parallel execution, error handling, and rollback support.
**Decision:**
Implement a DAG-based workflow engine where:
- Workflows are templates with nodes (steps) and edges (dependencies)
- Steps execute when all dependencies are satisfied
- Expressions reference previous step outputs
- Built-in support for approval, retry, timeout, and rollback
**Consequences:**
- Flexible workflow composition
- Visual representation in UI
- Complex error handling scenarios supported
- Learning curve for workflow authors
- Expression engine security considerations
---
## ADR-009: Separation of Duties Enforcement
**Status:** Accepted
**Context:**
Compliance requires that the person requesting a change cannot be the same person approving it.
**Decision:**
Separation of Duties (SoD) is enforced at the approval gateway level, preventing self-approval when SoD is enabled for an environment.
**Consequences:**
- Prevents single-person deployment to sensitive environments
- Configurable per environment
- May slow down deployments
- Requires minimum team size for SoD-enabled environments
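The gateway check itself is small; a sketch under assumed type and property names (illustrative, not the actual model):

```csharp
// Sketch of the SoD check at the approval gateway (names illustrative).
public void RecordApproval(Promotion promotion, Guid approverId, EnvironmentPolicy policy)
{
    if (policy.SeparationOfDuties && promotion.RequestedBy == approverId)
    {
        throw new InvalidOperationException(
            "Separation of Duties: the requester cannot approve their own promotion.");
    }
    promotion.Approvals.Add(new Approval { ApproverId = approverId });
}
```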
---
## ADR-010: Version Stickers for Drift Detection
**Status:** Accepted
**Context:**
Knowing what's actually deployed on targets is essential for audit and troubleshooting.
**Decision:**
Every deployment writes a `stella.version.json` sticker file on the target containing release ID, digests, deployment timestamp, and deployer identity.
**Consequences:**
- Enables drift detection (expected vs actual)
- Provides audit trail on target hosts
- Enables accurate "what's deployed where" queries
- Requires file access on targets
- Sticker corruption/deletion must be handled
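An illustrative sticker payload (field names and values are a sketch; the actual schema may differ):

```json
{
  "releaseId": "a1b2c3d4-0000-0000-0000-000000000000",
  "components": {
    "api": "sha256:abc123...",
    "worker": "sha256:def456..."
  },
  "deployedAt": "2026-01-10T12:00:00Z",
  "deployedBy": "user:jane.doe"
}
```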
---
## ADR-011: Security Gate Integration
**Status:** Accepted
**Context:**
Security scanning exists as a separate concern; release orchestration should leverage but not duplicate it.
**Decision:**
Security scanning remains in existing modules (Scanner, VEX). Release orchestration consumes scan results through a security gate that evaluates vulnerability thresholds.
**Consequences:**
- Clear separation of concerns
- Existing scanning investment preserved
- Gate configuration determines block thresholds
- Requires API integration with scanning modules
- Policy engine evaluates security verdicts
---
## ADR-012: gRPC for Agent Communication
**Status:** Accepted
**Context:**
Agent communication requires efficient, bidirectional, and secure data transfer.
**Decision:**
Use gRPC for agent communication with:
- mTLS for transport security
- Bidirectional streaming for logs and progress
- Protocol buffers for efficient serialization
**Consequences:**
- Efficient binary protocol
- Strong typing via protobuf
- Built-in streaming support
- Requires gRPC infrastructure
- Firewall considerations for gRPC traffic
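An illustrative shape for the bidirectional channel (service and message names here are a sketch, not the actual contract):

```protobuf
// Illustrative only; the real proto contract may differ.
syntax = "proto3";

service AgentChannel {
  // Bidirectional stream: server sends tasks, agent streams progress and logs back.
  rpc Attach (stream AgentMessage) returns (stream ServerMessage);
}

message AgentMessage {
  string agent_id = 1;
  oneof payload {
    TaskProgress progress = 2;
    LogChunk log = 3;
  }
}
message ServerMessage { DeploymentTask task = 1; }
message TaskProgress { string task_id = 1; int32 percent = 2; }
message LogChunk { string task_id = 1; bytes data = 2; }
message DeploymentTask { string task_id = 1; string release_digest = 2; }
```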
---
## References
- [Design Principles](principles.md)
- [Security Architecture](../security/overview.md)
- [Plugin System](../modules/plugin-system.md)

# Design Principles & Invariants
> These principles are **inviolable** and MUST be reflected in all code, UI, documentation, and audit artifacts.
## Core Principles
### Principle 1: Release Identity via Digest
```
INVARIANT: A release is a set of OCI image digests (component → digest mapping), never tags.
```
- Tags are convenience inputs for resolution
- Tags are resolved to digests at release creation time
- All downstream operations (promotion, deployment, rollback) use digests
- Digest mismatch at pull time = deployment failure (tamper detection)
**Implementation Requirements:**
- Release creation API accepts tags but immediately resolves to digests
- All internal references use `sha256:` prefixed digests
- Agent deployment verifies digest at pull time
- Rollback targets specific digest, not "previous tag"
### Principle 2: Determinism and Evidence
```
INVARIANT: Every deployment/promotion produces an immutable evidence record.
```
Evidence record contains:
- **Who**: User identity (from Authority)
- **What**: Release bundle (digests), target environment, target hosts
- **Why**: Policy evaluation result, approval records, decision reasons
- **How**: Generated artifacts (compose files, scripts), execution logs
- **When**: Timestamps for request, decision, execution, completion
Evidence enables:
- Audit-grade compliance reporting
- Deterministic replay (same inputs + policy → same decision)
- "Why blocked?" explainability
**Implementation Requirements:**
- Evidence is generated synchronously with decision
- Evidence is signed before storage
- Evidence table is append-only (no UPDATE/DELETE)
- Evidence includes hash of all inputs for replay verification
### Principle 3: Pluggable Everything, Stable Core
```
INVARIANT: Integrations are plugins; the core orchestration engine is stable.
```
**Plugins contribute:**
- Configuration screens (UI)
- Connector logic (runtime)
- Step node types (workflow)
- Doctor checks (diagnostics)
- Agent types (deployment)
**Core engine provides:**
- Workflow execution (DAG processing)
- State machine management
- Evidence generation
- Policy evaluation
- Credential brokering
**Implementation Requirements:**
- Core has no hard-coded integrations
- Plugin interface is versioned and stable
- Plugin failures cannot crash core
- Core provides fallback behavior when plugins unavailable
### Principle 4: No Feature Gating
```
INVARIANT: All plans include all features. Limits are only:
- Number of environments
- Number of new digests analyzed per day
- Fair use on deployments
```
This prevents:
- "Pay for security" anti-pattern
- Per-project/per-seat billing landmines
- Feature fragmentation across tiers
**Implementation Requirements:**
- No feature flags tied to billing tier
- Quota enforcement is transparent (clear error messages)
- Usage metrics exposed for customer visibility
- Overage handling is graceful (soft limits with warnings)
### Principle 5: Offline-First Operation
```
INVARIANT: All core operations MUST work in air-gapped environments.
```
Implications:
- No runtime calls to external APIs for core decisions
- Vulnerability data synced via mirror bundles
- Plugins may require connectivity; core does not
- Evidence packets exportable for external audit
**Implementation Requirements:**
- Core decision logic has no external HTTP calls
- All external data is pre-synced and cached
- Plugin connectivity requirements are declared in manifest
- Offline mode is explicit configuration, not degraded fallback
### Principle 6: Immutable Generated Artifacts
```
INVARIANT: Every deployment generates and stores immutable artifacts.
```
Generated artifacts:
- `compose.stella.lock.yml`: Pinned digests, resolved env refs
- `deploy.stella.script.dll`: Compiled C# script (or hash reference)
- `release.evidence.json`: Decision record
- `stella.version.json`: Version sticker placed on target
Version sticker enables:
- Drift detection (expected vs actual)
- Audit trail on target host
- Rollback reference
**Implementation Requirements:**
- Artifacts are content-addressed (hash in filename or metadata)
- Artifacts are stored before deployment execution
- Artifact storage is immutable (no overwrites)
- Version sticker is atomic write on target
---
## Architectural Invariants (Enforced by Design)
These invariants are enforced through database constraints, code architecture, and operational controls.
| Invariant | Enforcement Mechanism |
|-----------|----------------------|
| Digests are immutable | Database constraint: digest column is unique, no updates |
| Evidence packets are append-only | Evidence table has no UPDATE/DELETE permissions |
| Secrets never in database | Vault integration; only references stored |
| Plugins cannot bypass policy | Policy evaluation in core, not plugin |
| Multi-tenant isolation | `tenant_id` FK on all tables; row-level security |
| Workflow state is auditable | State transitions logged; no direct state manipulation |
| Approvals are tamper-evident | Approval records are signed and append-only |
### Database Enforcement
```sql
-- Example: Evidence table with no UPDATE/DELETE
CREATE TABLE release.evidence_packets (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
promotion_id UUID NOT NULL REFERENCES release.promotions(id),
content_hash TEXT NOT NULL,
content JSONB NOT NULL,
signature TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
-- No updated_at column; immutable by design
);
-- Revoke UPDATE/DELETE from application role
REVOKE UPDATE, DELETE ON release.evidence_packets FROM app_role;
```
### Code Architecture Enforcement
```csharp
// Policy evaluation is ALWAYS in core, never delegated to plugins
public sealed class PromotionDecisionEngine
{
// Plugins provide gate implementations, but core orchestrates evaluation
public async Task<DecisionResult> EvaluateAsync(
Promotion promotion,
IReadOnlyList<IGateProvider> gates,
CancellationToken ct)
{
// Core controls evaluation order and aggregation
var results = new List<GateResult>();
foreach (var gate in gates)
{
// Plugin provides evaluation logic
var result = await gate.EvaluateAsync(promotion, ct);
results.Add(result);
// Core decides how to aggregate (plugins cannot override)
if (result.IsBlocking && _policy.FailFast)
break;
}
// Core makes final decision
return _decisionAggregator.Aggregate(results);
}
}
```
---
## Document Conventions
Throughout the Release Orchestrator documentation:
- **MUST**: Mandatory requirement; non-compliance is a bug
- **SHOULD**: Recommended but not mandatory; deviation requires justification
- **MAY**: Optional; implementation decision
- **Entity names**: `PascalCase` (e.g., `ReleaseBundle`)
- **Table names**: `snake_case` (e.g., `release_bundles`)
- **API paths**: `/api/v1/resource-name`
- **Module names**: `kebab-case` (e.g., `release-manager`)
---
## References
- [Key Architectural Decisions](decisions.md)
- [Module Architecture](../modules/overview.md)
- [Security Architecture](../security/overview.md)

# Implementation Guide
> .NET 10 implementation patterns and best practices for Release Orchestrator modules.
**Target Audience**: Development team implementing Release Orchestrator modules
**Prerequisites**: Familiarity with [CLAUDE.md](../../../CLAUDE.md) coding rules
---
## Overview
This guide supplements the architecture documentation with .NET 10-specific implementation patterns required for all Release Orchestrator modules. These patterns ensure:
- Deterministic behavior for evidence reproducibility
- Testability through dependency injection
- Compliance with Stella Ops coding standards
- Performance and reliability
---
## Code Quality Requirements
### Compiler Configuration
All Release Orchestrator projects **MUST** enforce warnings as errors:
```xml
<!-- In Directory.Build.props or .csproj -->
<PropertyGroup>
<TreatWarningsAsErrors>true</TreatWarningsAsErrors>
<Nullable>enable</Nullable>
<ImplicitUsings>disable</ImplicitUsings>
</PropertyGroup>
```
**Rationale**: Warnings indicate potential bugs, regressions, or code quality drift. Treating them as errors prevents them from being ignored.
---
## Determinism & Time Handling
### TimeProvider Injection
**Never** use `DateTime.UtcNow`, `DateTimeOffset.UtcNow`, or `DateTimeOffset.Now` directly. Always inject `TimeProvider`.
```csharp
// ❌ BAD - non-deterministic, hard to test
public class PromotionManager
{
public Promotion CreatePromotion(Guid releaseId, Guid targetEnvId)
{
return new Promotion
{
Id = Guid.NewGuid(),
ReleaseId = releaseId,
TargetEnvironmentId = targetEnvId,
RequestedAt = DateTimeOffset.UtcNow // ❌ Hard-coded time
};
}
}
// ✅ GOOD - injectable, testable, deterministic
public class PromotionManager
{
private readonly TimeProvider _timeProvider;
private readonly IGuidGenerator _guidGenerator;
public PromotionManager(TimeProvider timeProvider, IGuidGenerator guidGenerator)
{
_timeProvider = timeProvider;
_guidGenerator = guidGenerator;
}
public Promotion CreatePromotion(Guid releaseId, Guid targetEnvId)
{
return new Promotion
{
Id = _guidGenerator.NewGuid(),
ReleaseId = releaseId,
TargetEnvironmentId = targetEnvId,
RequestedAt = _timeProvider.GetUtcNow() // ✅ Injected, testable
};
}
}
```
**Registration**:
```csharp
// Production: use system time
services.AddSingleton(TimeProvider.System);
// Testing: use manual time for deterministic tests
var manualTime = new ManualTimeProvider();
manualTime.SetUtcNow(new DateTimeOffset(2026, 1, 10, 12, 0, 0, TimeSpan.Zero));
services.AddSingleton<TimeProvider>(manualTime);
```
---
### GUID Generation
**Never** use `Guid.NewGuid()` directly. Always inject `IGuidGenerator`.
```csharp
// ❌ BAD
var releaseId = Guid.NewGuid();
// ✅ GOOD
var releaseId = _guidGenerator.NewGuid();
```
**Interface**:
```csharp
public interface IGuidGenerator
{
Guid NewGuid();
}
// Production implementation
public sealed class SystemGuidGenerator : IGuidGenerator
{
public Guid NewGuid() => Guid.NewGuid();
}
// Deterministic test implementation
public sealed class SequentialGuidGenerator : IGuidGenerator
{
private int _counter;
public Guid NewGuid()
{
var bytes = new byte[16];
BitConverter.GetBytes(_counter++).CopyTo(bytes, 0);
return new Guid(bytes);
}
}
```
---
## Async & Cancellation
### CancellationToken Propagation
**Always** propagate `CancellationToken` through async call chains. Never use `CancellationToken.None` except at entry points where no token is available.
```csharp
// ❌ BAD - ignores cancellation
public async Task<Promotion> ApprovePromotionAsync(Guid promotionId, Guid userId, CancellationToken ct)
{
var promotion = await _repository.GetByIdAsync(promotionId, CancellationToken.None); // ❌ Wrong
promotion.Approvals.Add(new Approval
{
ApproverId = userId,
ApprovedAt = _timeProvider.GetUtcNow()
});
await _repository.SaveAsync(promotion, CancellationToken.None); // ❌ Wrong
await Task.Delay(1000); // ❌ Missing ct
return promotion;
}
// ✅ GOOD - propagates cancellation
public async Task<Promotion> ApprovePromotionAsync(Guid promotionId, Guid userId, CancellationToken ct)
{
var promotion = await _repository.GetByIdAsync(promotionId, ct); // ✅ Propagated
promotion.Approvals.Add(new Approval
{
ApproverId = userId,
ApprovedAt = _timeProvider.GetUtcNow()
});
await _repository.SaveAsync(promotion, ct); // ✅ Propagated
await Task.Delay(1000, ct); // ✅ Cancellable
return promotion;
}
```
---
## HTTP Client Usage
### IHttpClientFactory for Connector Runtime
**Never** instantiate `HttpClient` directly. Always use `IHttpClientFactory` with configured timeouts and resilience policies.
```csharp
// ❌ BAD - direct instantiation risks socket exhaustion
public class GitHubConnector
{
public async Task<string> GetCommitAsync(string sha)
{
using var client = new HttpClient(); // ❌ Socket exhaustion risk
var response = await client.GetAsync($"https://api.github.com/commits/{sha}");
return await response.Content.ReadAsStringAsync();
}
}
// ✅ GOOD - factory with resilience
public class GitHubConnector
{
private readonly IHttpClientFactory _httpClientFactory;
public GitHubConnector(IHttpClientFactory httpClientFactory)
{
_httpClientFactory = httpClientFactory;
}
public async Task<string> GetCommitAsync(string sha, CancellationToken ct)
{
var client = _httpClientFactory.CreateClient("GitHub");
var response = await client.GetAsync($"/commits/{sha}", ct);
response.EnsureSuccessStatusCode();
return await response.Content.ReadAsStringAsync(ct);
}
}
```
**Registration with resilience**:
```csharp
services.AddHttpClient("GitHub", client =>
{
client.BaseAddress = new Uri("https://api.github.com");
client.Timeout = TimeSpan.FromSeconds(30);
client.DefaultRequestHeaders.Add("User-Agent", "StellaOps/1.0");
})
.AddStandardResilienceHandler(options =>
{
options.Retry.MaxRetryAttempts = 3;
options.CircuitBreaker.SamplingDuration = TimeSpan.FromSeconds(30);
options.TotalRequestTimeout.Timeout = TimeSpan.FromMinutes(1);
});
```
---
## Culture & Formatting
### Invariant Culture for Parsing
**Always** use `CultureInfo.InvariantCulture` for parsing and formatting dates, numbers, and any string that will be persisted, hashed, or compared.
```csharp
// ❌ BAD - culture-sensitive
var percentage = double.Parse(input);
var formatted = value.ToString("P2");
var dateStr = date.ToString("yyyy-MM-dd");
// ✅ GOOD - invariant culture
var percentage = double.Parse(input, CultureInfo.InvariantCulture);
var formatted = value.ToString("P2", CultureInfo.InvariantCulture);
var dateStr = date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
```
---
## JSON Handling
### RFC 8785 Canonical JSON for Evidence
For evidence packets and decision records that will be hashed or signed, use **RFC 8785-compliant** canonical JSON serialization.
```csharp
// ❌ BAD - non-canonical JSON
var json = JsonSerializer.Serialize(decisionRecord, new JsonSerializerOptions
{
Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping,
PropertyNamingPolicy = JsonNamingPolicy.CamelCase
});
var hash = ComputeHash(json); // ❌ Non-deterministic
// ✅ GOOD - use shared canonicalizer
var canonicalJson = CanonicalJsonSerializer.Serialize(decisionRecord);
var hash = ComputeHash(canonicalJson); // ✅ Deterministic
```
**Canonical JSON Requirements**:
- Keys sorted alphabetically
- Minimal escaping per RFC 8785 spec
- No exponent notation for numbers
- No trailing/leading zeros
- No whitespace
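The key-sorting core can be sketched as below. This is a simplified illustration only: full RFC 8785 compliance also constrains number and string encoding, so production code should use the shared `CanonicalJsonSerializer`.

```csharp
using System;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Nodes;

public static class CanonicalJsonSketch
{
    // Recursively serialize with object keys sorted ordinally and no whitespace.
    public static string Serialize(JsonNode? node)
    {
        return node switch
        {
            JsonObject obj => "{" + string.Join(",",
                obj.OrderBy(p => p.Key, StringComparer.Ordinal)
                   .Select(p => JsonSerializer.Serialize(p.Key) + ":" + Serialize(p.Value))) + "}",
            JsonArray arr => "[" + string.Join(",", arr.Select(Serialize)) + "]",
            null => "null",
            _ => node.ToJsonString() // scalars: numbers, strings, booleans
        };
    }
}
```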
---
## Database Interaction
### DateTimeOffset for PostgreSQL timestamptz
PostgreSQL `timestamptz` columns **MUST** be read and written as `DateTimeOffset`, not `DateTime`.
```csharp
// ❌ BAD - loses offset information
await using var reader = await command.ExecuteReaderAsync(ct);
while (await reader.ReadAsync(ct))
{
var createdAt = reader.GetDateTime(reader.GetOrdinal("created_at")); // ❌ Loses offset
}
// ✅ GOOD - preserves offset
await using var reader = await command.ExecuteReaderAsync(ct);
while (await reader.ReadAsync(ct))
{
var createdAt = reader.GetFieldValue<DateTimeOffset>(reader.GetOrdinal("created_at")); // ✅ Correct
}
```
**Insertion**:
```csharp
// ✅ Always bind a UTC DateTimeOffset parameter
var createdAt = _timeProvider.GetUtcNow(); // Returns DateTimeOffset
command.Parameters.AddWithValue("created_at", createdAt);
await command.ExecuteNonQueryAsync(ct);
```
---
## Hybrid Logical Clock (HLC) for Distributed Ordering
For distributed ordering and audit-safe sequencing, use `IHybridLogicalClock` from `StellaOps.HybridLogicalClock`.
**When to use HLC**:
- Promotion state transitions
- Workflow step execution ordering
- Deployment task sequencing
- Timeline event ordering
```csharp
public class PromotionStateTransition
{
private readonly IHybridLogicalClock _hlc;
private readonly TimeProvider _timeProvider;
public async Task TransitionStateAsync(
Promotion promotion,
PromotionState newState,
CancellationToken ct)
{
var transition = new StateTransition
{
PromotionId = promotion.Id,
FromState = promotion.Status,
ToState = newState,
THlc = _hlc.Tick(), // ✅ Monotonic, skew-tolerant ordering
TsWall = _timeProvider.GetUtcNow(), // ✅ Informational timestamp
TransitionedBy = _currentUser.Id
};
await _repository.RecordTransitionAsync(transition, ct);
}
}
```
**HLC State Persistence**:
```csharp
// Service startup
public async Task StartAsync(CancellationToken ct)
{
await _hlc.InitializeFromStateAsync(ct); // Restore monotonicity
}
// Service shutdown
public async Task StopAsync(CancellationToken ct)
{
await _hlc.PersistStateAsync(ct); // Persist HLC state
}
```
---
## Configuration & Options
### Options Validation at Startup
Use `ValidateDataAnnotations()` and `ValidateOnStart()` for all options classes.
```csharp
// Options class
public sealed class PromotionManagerOptions
{
[Required]
[Range(1, 10)]
public int MaxConcurrentPromotions { get; set; } = 3;
[Required]
[Range(1, 3600)]
public int ApprovalExpirationSeconds { get; set; } = 1440;
}
// Registration with validation
services.AddOptions<PromotionManagerOptions>()
.Bind(configuration.GetSection("PromotionManager"))
.ValidateDataAnnotations()
.ValidateOnStart();
// Complex validation
public class PromotionManagerOptionsValidator : IValidateOptions<PromotionManagerOptions>
{
public ValidateOptionsResult Validate(string? name, PromotionManagerOptions options)
{
if (options.MaxConcurrentPromotions <= 0)
return ValidateOptionsResult.Fail("MaxConcurrentPromotions must be positive");
return ValidateOptionsResult.Success;
}
}
services.AddSingleton<IValidateOptions<PromotionManagerOptions>, PromotionManagerOptionsValidator>();
```
---
## Immutability & Collections
### Return Immutable Collections from Public APIs
Public APIs **MUST** return `IReadOnlyList<T>`, `ImmutableArray<T>`, or defensive copies. Never expose mutable backing stores.
```csharp
// ❌ BAD - exposes mutable backing store
public class ReleaseManager
{
private readonly List<Component> _components = new();
public List<Component> Components => _components; // ❌ Callers can mutate!
}
// ✅ GOOD - immutable return
public class ReleaseManager
{
private readonly List<Component> _components = new();
public IReadOnlyList<Component> Components => _components.AsReadOnly(); // ✅ Read-only
// Or using ImmutableArray
public ImmutableArray<Component> GetComponents() => _components.ToImmutableArray();
}
```
---
## Error Handling
### No Silent Stubs
Placeholder code **MUST** throw `NotImplementedException` or return an explicit error. Never return success from unimplemented paths.
```csharp
// ❌ BAD - silent stub masks missing implementation
public async Task<Result> DeployToNomadAsync(Deployment deployment, CancellationToken ct)
{
// TODO: implement Nomad deployment
return Result.Success(); // ❌ Ships broken feature!
}
// ✅ GOOD - explicit failure
public Task<Result> DeployToNomadAsync(Deployment deployment, CancellationToken ct)
{
    throw new NotImplementedException(
        "Nomad deployment not yet implemented. See SPRINT_20260115_003_AGENTS_nomad_support.md");
}
// ✅ Alternative: return unsupported result
public Task<Result> DeployToNomadAsync(Deployment deployment, CancellationToken ct)
{
    return Task.FromResult(
        Result.Failure("Nomad deployment target not yet supported. Use Docker or Compose."));
}
```
---
## Caching
### Bounded Caches with Eviction
**Do not** use `ConcurrentDictionary` or `Dictionary` for caching without eviction policies. Use bounded caches with TTL/LRU eviction.
```csharp
// ❌ BAD - unbounded growth
public class VersionMapCache
{
private readonly ConcurrentDictionary<string, DigestMapping> _cache = new();
public void Add(string tag, DigestMapping mapping)
{
_cache[tag] = mapping; // ❌ Never evicts, memory grows forever
}
}
// ✅ GOOD - bounded with eviction
public class VersionMapCache
{
private readonly MemoryCache _cache;
public VersionMapCache()
{
_cache = new MemoryCache(new MemoryCacheOptions
{
SizeLimit = 10_000 // Max 10k entries
});
}
public void Add(string tag, DigestMapping mapping)
{
_cache.Set(tag, mapping, new MemoryCacheEntryOptions
{
Size = 1,
SlidingExpiration = TimeSpan.FromHours(1) // ✅ 1 hour TTL
});
}
public DigestMapping? Get(string tag) => _cache.Get<DigestMapping>(tag);
}
```
**Cache TTL Recommendations**:
- **Integration health checks**: 5 minutes
- **Version maps (tag → digest)**: 1 hour
- **Environment configs**: 30 minutes
- **Agent capabilities**: 10 minutes
---
## Testing
### Test Helpers Must Call Production Code
Test helpers **MUST** call production code, not reimplement algorithms. Only mock I/O and network boundaries.
```csharp
// ❌ BAD - test reimplements production logic
public static string ComputeEvidenceHash(DecisionRecord record)
{
// Custom hash implementation in test
var json = JsonSerializer.Serialize(record); // ❌ Different from production!
return SHA256.HashData(Encoding.UTF8.GetBytes(json)).ToHexString();
}
// ✅ GOOD - test uses production code
public static string ComputeEvidenceHash(DecisionRecord record)
{
// Calls production EvidenceHasher
return EvidenceHasher.ComputeHash(record); // ✅ Same as production
}
```
---
## Path Resolution
### Explicit CLI Options for Paths
**Do not** derive paths from `AppContext.BaseDirectory` with parent directory walks. Use explicit CLI options or environment variables.
```csharp
// ❌ BAD - fragile parent walks
var repoRoot = Path.GetFullPath(Path.Combine(
AppContext.BaseDirectory, "..", "..", "..", ".."));
// ✅ GOOD - explicit option with fallback
[Option("--repo-root", Description = "Repository root path")]
public string? RepoRoot { get; set; }
public string GetRepoRoot() =>
RepoRoot
?? Environment.GetEnvironmentVariable("STELLAOPS_REPO_ROOT")
?? throw new InvalidOperationException(
"Repository root not specified. Use --repo-root or set STELLAOPS_REPO_ROOT.");
```
---
## Summary Checklist
Before submitting a pull request, verify:
- [ ] `TreatWarningsAsErrors` enabled in project file
- [ ] All timestamps use `TimeProvider`, never `DateTime.UtcNow`
- [ ] All GUIDs use `IGuidGenerator`, never `Guid.NewGuid()`
- [ ] `CancellationToken` propagated through all async methods
- [ ] HTTP clients use `IHttpClientFactory`, never `new HttpClient()`
- [ ] Culture-invariant parsing for all formatted strings
- [ ] Canonical JSON for evidence/decision records
- [ ] `DateTimeOffset` for all PostgreSQL `timestamptz` columns
- [ ] HLC used for distributed ordering where applicable
- [ ] Options classes validated at startup with `ValidateOnStart()`
- [ ] Public APIs return immutable collections
- [ ] No silent stubs; unimplemented code throws `NotImplementedException`
- [ ] Caches have bounded size and TTL eviction
- [ ] Tests exercise production code, not reimplementations
---
## References
- [CLAUDE.md](../../../CLAUDE.md) — Stella Ops coding rules
- [Test Structure](./test-structure.md) — Test organization guidelines
- [Database Schema](./data-model/schema.md) — Schema patterns
- [HLC Documentation](../../eventing/event-envelope-schema.md) — Event ordering with HLC

# CI/CD Integration
## Overview
Release Orchestrator integrates with CI/CD systems to:
- Receive build completion notifications
- Trigger additional pipelines during deployment
- Create releases from CI artifacts
- Report deployment status back to CI systems
## Integration Patterns
### Pattern 1: CI Triggers Release
```
CI TRIGGERS RELEASE
┌────────────┐ ┌────────────┐ ┌────────────────────┐
│ CI/CD │ │ Container │ │ Release │
│ System │ │ Registry │ │ Orchestrator │
└─────┬──────┘ └─────┬──────┘ └─────────┬──────────┘
│ │ │
│ Build & Push │ │
│─────────────────►│ │
│ │ │
│ │ Webhook: image pushed
│ │─────────────────────►│
│ │ │
│ │ │ Create/Update
│ │ │ Version Map
│ │ │
│ │ │ Auto-create
│ │ │ Release (if configured)
│ │ │
│ API: Create Release (optional) │
│────────────────────────────────────────►│
│ │ │
│ │ │ Start Promotion
│ │ │ Workflow
│ │ │
```
### Pattern 2: Orchestrator Triggers CI
```
ORCHESTRATOR TRIGGERS CI
┌────────────────────┐ ┌────────────┐ ┌────────────┐
│ Release │ │ CI/CD │ │ Target │
│ Orchestrator │ │ System │ │ Systems │
└─────────┬──────────┘ └─────┬──────┘ └─────┬──────┘
│ │ │
│ Pre-deploy: Trigger │ │
│ Integration Tests │ │
│─────────────────────►│ │
│ │ │
│ │ Run Tests │
│ │─────────────────►│
│ │ │
│ Wait for completion │ │
│◄─────────────────────│ │
│ │ │
│ If passed: Deploy │ │
│─────────────────────────────────────────►
│ │ │
```
### Pattern 3: Bidirectional Integration
```
BIDIRECTIONAL INTEGRATION
┌────────────┐ ┌────────────────────┐
│ CI/CD │◄───────────────────────►│ Release │
│ System │ │ Orchestrator │
└─────┬──────┘ └─────────┬──────────┘
│ │
│══════════════════════════════════════════│
│ Events (both directions) │
│══════════════════════════════════════════│
│ │
│ CI Events: │
│ - Pipeline completed │
│ - Tests passed/failed │
│ - Artifacts ready │
│ │
│ Orchestrator Events: │
│ - Deployment started │
│ - Deployment completed │
│ - Rollback initiated │
│ │
```
## CI/CD System Configuration
### GitLab CI Integration
```yaml
# .gitlab-ci.yml
stages:
- build
- push
- release
variables:
STELLA_API_URL: https://stella.example.com/api/v1
COMPONENT_NAME: myapp
build:
stage: build
script:
- docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
push:
  stage: push
  script:
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    - docker tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG
    # Capture the digest here, where the docker CLI is available
    - DIGEST=$(docker inspect --format='{{index .RepoDigests 0}}' $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG | cut -d@ -f2)
    - echo "DIGEST=$DIGEST" >> release.env
  artifacts:
    reports:
      dotenv: release.env
  rules:
    - if: $CI_COMMIT_TAG
release:
  stage: release
  image: curlimages/curl:latest # curl-only image; the docker CLI is not available here
  script:
    - |
      # Create release in Stella using the digest captured in the push stage
      curl -X POST "$STELLA_API_URL/releases" \
        -H "Authorization: Bearer $STELLA_TOKEN" \
        -H "Content-Type: application/json" \
        -d "{
          \"name\": \"$COMPONENT_NAME-$CI_COMMIT_TAG\",
          \"components\": [{
            \"componentId\": \"$STELLA_COMPONENT_ID\",
            \"digest\": \"$DIGEST\"
          }],
          \"sourceRef\": {
            \"type\": \"git\",
            \"repository\": \"$CI_PROJECT_URL\",
            \"commit\": \"$CI_COMMIT_SHA\",
            \"tag\": \"$CI_COMMIT_TAG\"
          }
        }"
  rules:
    - if: $CI_COMMIT_TAG
```
### GitHub Actions Integration
```yaml
# .github/workflows/release.yml
name: Release to Stella

on:
  push:
    tags:
      - 'v*'

jobs:
  build-and-release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Login to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push
        id: build
        uses: docker/build-push-action@v5
        with:
          push: true
          tags: |
            ghcr.io/${{ github.repository }}:${{ github.sha }}
            ghcr.io/${{ github.repository }}:${{ github.ref_name }}
      - name: Create Stella Release
        uses: stella-ops/create-release-action@v1
        with:
          stella-url: ${{ vars.STELLA_API_URL }}
          stella-token: ${{ secrets.STELLA_TOKEN }}
          release-name: ${{ github.event.repository.name }}-${{ github.ref_name }}
          # build-push-action exposes the pushed digest as an output; buildx does
          # not load the image into the local daemon, so `docker inspect` would fail
          components: |
            - componentId: ${{ vars.STELLA_COMPONENT_ID }}
              digest: ${{ steps.build.outputs.digest }}
          source-ref: |
            type: git
            repository: ${{ github.server_url }}/${{ github.repository }}
            commit: ${{ github.sha }}
            tag: ${{ github.ref_name }}
```
### Jenkins Integration
```groovy
// Jenkinsfile
pipeline {
    agent any

    environment {
        STELLA_API_URL = 'https://stella.example.com/api/v1'
        STELLA_TOKEN   = credentials('stella-api-token')
        REGISTRY       = 'registry.example.com'
        IMAGE_NAME     = 'myorg/myapp'
    }

    stages {
        stage('Build') {
            steps {
                script {
                    docker.build("${REGISTRY}/${IMAGE_NAME}:${env.BUILD_TAG}")
                }
            }
        }
        stage('Push') {
            steps {
                script {
                    docker.withRegistry("https://${REGISTRY}", 'registry-creds') {
                        docker.image("${REGISTRY}/${IMAGE_NAME}:${env.BUILD_TAG}").push()
                    }
                }
            }
        }
        stage('Create Release') {
            when {
                tag pattern: "v\\d+\\.\\d+\\.\\d+", comparator: "REGEXP"
            }
            steps {
                script {
                    // Inspect the tag that was actually built and pushed (BUILD_TAG);
                    // the digest is the same for every tag of this image
                    def digest = sh(
                        script: "docker inspect --format='{{index .RepoDigests 0}}' ${REGISTRY}/${IMAGE_NAME}:${env.BUILD_TAG} | cut -d@ -f2",
                        returnStdout: true
                    ).trim()
                    def response = httpRequest(
                        url: "${STELLA_API_URL}/releases",
                        httpMode: 'POST',
                        contentType: 'APPLICATION_JSON',
                        customHeaders: [[name: 'Authorization', value: "Bearer ${STELLA_TOKEN}"]],
                        requestBody: """
                            {
                              "name": "${IMAGE_NAME}-${env.TAG_NAME}",
                              "components": [{
                                "componentId": "${env.STELLA_COMPONENT_ID}",
                                "digest": "${digest}"
                              }],
                              "sourceRef": {
                                "type": "git",
                                "repository": "${env.GIT_URL}",
                                "commit": "${env.GIT_COMMIT}",
                                "tag": "${env.TAG_NAME}"
                              }
                            }
                        """
                    )
                    echo "Release created: ${response.content}"
                }
            }
        }
    }

    post {
        success {
            // Notify Stella of successful build
            httpRequest(
                url: "${STELLA_API_URL}/webhooks/ci-status",
                httpMode: 'POST',
                contentType: 'APPLICATION_JSON',
                customHeaders: [[name: 'Authorization', value: "Bearer ${STELLA_TOKEN}"]],
                requestBody: """
                    {
                      "buildId": "${env.BUILD_ID}",
                      "status": "success",
                      "commit": "${env.GIT_COMMIT}"
                    }
                """
            )
        }
    }
}
```
## Workflow Step Integration
### Trigger CI Pipeline Step
```typescript
// Step type: trigger-ci
interface TriggerCIConfig {
  integrationId: UUID;               // CI integration reference
  pipelineId: string;                // Pipeline to trigger
  ref?: string;                      // Branch/tag reference
  variables?: Record<string, string>;
  waitForCompletion: boolean;
  timeout?: number;
}

// Promise-based delay used by the polling loop below
const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

class TriggerCIStep implements IStepExecutor {
  async execute(
    inputs: StepInputs,
    config: TriggerCIConfig,
    context: ExecutionContext
  ): Promise<StepOutputs> {
    const connector = await this.getConnector(config.integrationId);

    // Trigger pipeline
    const run = await connector.triggerPipeline(
      config.pipelineId,
      {
        ref: config.ref || context.release?.sourceRef?.tag,
        variables: {
          ...config.variables,
          STELLA_RELEASE_ID: context.release?.id,
          STELLA_PROMOTION_ID: context.promotion?.id,
          STELLA_ENVIRONMENT: context.environment?.name
        }
      }
    );

    if (!config.waitForCompletion) {
      return {
        pipelineRunId: run.id,
        status: run.status,
        webUrl: run.webUrl
      };
    }

    // Wait for completion
    const finalStatus = await this.waitForPipeline(
      connector,
      run.id,
      config.timeout || 3600
    );

    if (finalStatus.status !== "success") {
      throw new StepError(
        `Pipeline failed with status: ${finalStatus.status}`,
        { pipelineRunId: run.id, status: finalStatus }
      );
    }

    return {
      pipelineRunId: run.id,
      status: finalStatus.status,
      webUrl: run.webUrl
    };
  }

  private async waitForPipeline(
    connector: ICICDConnector,
    runId: string,
    timeout: number
  ): Promise<PipelineRun> {
    const deadline = Date.now() + timeout * 1000;
    while (Date.now() < deadline) {
      const run = await connector.getPipelineRun(runId);
      if (run.status === "success" || run.status === "failed" || run.status === "cancelled") {
        return run;
      }
      await sleep(10000); // Poll every 10 seconds
    }
    throw new TimeoutError(`Pipeline did not complete within ${timeout} seconds`);
  }
}
```
### Wait for CI Step
```typescript
// Step type: wait-ci
interface WaitCIConfig {
  integrationId: UUID;
  runId?: string;          // If known, or from input
  runIdInput?: string;     // Input name containing run ID
  timeout: number;
  failOnError: boolean;
}

class WaitCIStep implements IStepExecutor {
  async execute(
    inputs: StepInputs,
    config: WaitCIConfig,
    context: ExecutionContext
  ): Promise<StepOutputs> {
    const runId = config.runId || inputs[config.runIdInput!];
    if (!runId) {
      throw new StepError("Pipeline run ID not provided");
    }

    const connector = await this.getConnector(config.integrationId);

    // waitForPipeline is the same polling helper shown in TriggerCIStep
    const finalStatus = await this.waitForPipeline(
      connector,
      runId,
      config.timeout
    );

    const success = finalStatus.status === "success";
    if (!success && config.failOnError) {
      throw new StepError(
        `Pipeline failed with status: ${finalStatus.status}`,
        { pipelineRunId: runId, status: finalStatus }
      );
    }

    return {
      status: finalStatus.status,
      success,
      pipelineRun: finalStatus
    };
  }
}
```
## Deployment Status Reporting
### GitHub Deployment Status
```typescript
class GitHubStatusReporter {
  async reportDeploymentStart(
    integration: Integration,
    deployment: DeploymentContext
  ): Promise<void> {
    const client = await this.getClient(integration);

    // Create deployment
    const { data: ghDeployment } = await client.repos.createDeployment({
      owner: deployment.repository.owner,
      repo: deployment.repository.name,
      ref: deployment.sourceRef.commit,
      environment: deployment.environment.name,
      auto_merge: false,
      required_contexts: [],
      payload: {
        stellaReleaseId: deployment.release.id,
        stellaPromotionId: deployment.promotion.id
      }
    });

    // Set status to in_progress
    await client.repos.createDeploymentStatus({
      owner: deployment.repository.owner,
      repo: deployment.repository.name,
      deployment_id: ghDeployment.id,
      state: "in_progress",
      log_url: `${this.stellaUrl}/deployments/${deployment.jobId}`,
      description: "Deployment in progress"
    });

    // Store deployment ID for later status update
    await this.storeMapping(deployment.jobId, ghDeployment.id);
  }

  async reportDeploymentComplete(
    integration: Integration,
    deployment: DeploymentContext,
    success: boolean
  ): Promise<void> {
    const client = await this.getClient(integration);
    const ghDeploymentId = await this.getMapping(deployment.jobId);

    await client.repos.createDeploymentStatus({
      owner: deployment.repository.owner,
      repo: deployment.repository.name,
      deployment_id: ghDeploymentId,
      state: success ? "success" : "failure",
      log_url: `${this.stellaUrl}/deployments/${deployment.jobId}`,
      environment_url: deployment.environment.url,
      description: success
        ? "Deployment completed successfully"
        : "Deployment failed"
    });
  }
}
```
### GitLab Pipeline Status
```typescript
class GitLabStatusReporter {
  async reportDeploymentStatus(
    integration: Integration,
    deployment: DeploymentContext,
    state: "running" | "success" | "failed" | "canceled"
  ): Promise<void> {
    const client = await this.getClient(integration);

    await client.post(
      `/projects/${integration.config.projectId}/statuses/${deployment.sourceRef.commit}`,
      {
        state,
        ref: deployment.sourceRef.tag || deployment.sourceRef.branch,
        name: `stella/${deployment.environment.name}`,
        target_url: `${this.stellaUrl}/deployments/${deployment.jobId}`,
        description: this.getDescription(state, deployment)
      }
    );
  }

  private getDescription(state: string, deployment: DeploymentContext): string {
    switch (state) {
      case "running":
        return `Deploying to ${deployment.environment.name}`;
      case "success":
        return `Deployed to ${deployment.environment.name}`;
      case "failed":
        return `Deployment to ${deployment.environment.name} failed`;
      case "canceled":
        return `Deployment to ${deployment.environment.name} cancelled`;
      default:
        return "";
    }
  }
}
```
## API for CI Systems
### Create Release from CI
```http
POST /api/v1/releases
Authorization: Bearer <ci-token>
Content-Type: application/json

{
  "name": "myapp-v1.2.0",
  "components": [
    {
      "componentId": "component-uuid",
      "digest": "sha256:abc123..."
    }
  ],
  "sourceRef": {
    "type": "git",
    "repository": "https://github.com/myorg/myapp",
    "commit": "abc123def456",
    "tag": "v1.2.0",
    "branch": "main"
  },
  "metadata": {
    "buildId": "12345",
    "buildUrl": "https://ci.example.com/builds/12345",
    "triggeredBy": "ci-pipeline"
  }
}
```
### Report Build Status
```http
POST /api/v1/ci-events/build-complete
Authorization: Bearer <ci-token>
Content-Type: application/json

{
  "integrationId": "integration-uuid",
  "buildId": "12345",
  "status": "success",
  "commit": "abc123def456",
  "artifacts": [
    {
      "name": "myapp",
      "digest": "sha256:abc123...",
      "repository": "registry.example.com/myorg/myapp"
    }
  ],
  "testResults": {
    "passed": 150,
    "failed": 0,
    "skipped": 5
  }
}
```
## Service Account for CI
### Creating CI Service Account
```http
POST /api/v1/service-accounts
Authorization: Bearer <admin-token>
Content-Type: application/json

{
  "name": "ci-pipeline",
  "description": "Service account for CI/CD integration",
  "roles": ["release-creator"],
  "permissions": [
    { "resource": "release", "action": "create" },
    { "resource": "component", "action": "read" },
    { "resource": "version-map", "action": "read" }
  ],
  "expiresIn": "365d"
}
```
Response:
```json
{
  "success": true,
  "data": {
    "id": "sa-uuid",
    "name": "ci-pipeline",
    "token": "stella_sa_xxxxxxxxxxxxx",
    "expiresAt": "2027-01-09T00:00:00Z"
  }
}
```
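A CI script or SDK then presents the issued token as a bearer credential on every call. A minimal sketch of assembling the release-creation request (the helper name is hypothetical; the endpoint and header shapes mirror the examples above):

```typescript
// Build the HTTP options for POST /api/v1/releases using a service-account token.
// Sending is left to the caller, e.g. fetch(`${STELLA_API_URL}/releases`, options).
function buildReleaseRequest(
  token: string,
  body: object
): { method: string; headers: Record<string, string>; body: string } {
  return {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify(body)
  };
}
```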
## References
- [Integrations Overview](overview.md)
- [Connectors](connectors.md)
- [Webhooks](webhooks.md)
- [Workflow Templates](../workflow/templates.md)

# Connector Development
## Overview
Connectors are the integration layer between Release Orchestrator and external systems. Each connector implements a standard interface for its integration type.
## Connector Architecture
```
CONNECTOR ARCHITECTURE
┌─────────────────────────────────────────────────────────────────────────────┐
│ CONNECTOR RUNTIME │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ CONNECTOR INTERFACE │ │
│ │ │ │
│ │ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │ │
│ │ │ getCapabilities()│ │ ping() │ │ authenticate() │ │ │
│ │ └──────────────────┘ └──────────────────┘ └──────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │ │
│ │ │ discover() │ │ execute() │ │ healthCheck() │ │ │
│ │ └──────────────────┘ └──────────────────┘ └──────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ CONNECTOR IMPLEMENTATIONS │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Registry │ │ CI/CD │ │ Notification│ │ Secret │ │ │
│ │ │ Connectors │ │ Connectors │ │ Connectors │ │ Connectors │ │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ │ │ - Docker │ │ - GitLab │ │ - Slack │ │ - Vault │ │ │
│ │ │ - ECR │ │ - GitHub │ │ - Teams │ │ - AWS SM │ │ │
│ │ │ - ACR │ │ - Jenkins │ │ - Email │ │ - Azure KV │ │ │
│ │ │ - Harbor │ │ - Azure DO │ │ - PagerDuty │ │ │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Base Connector Interface
```typescript
interface IConnector {
  // Metadata
  readonly typeId: string;
  readonly displayName: string;
  readonly version: string;
  readonly capabilities: ConnectorCapabilities;

  // Lifecycle
  initialize(config: IntegrationConfig): Promise<void>;
  dispose(): Promise<void>;

  // Health
  ping(config: IntegrationConfig): Promise<void>;
  healthCheck(config: IntegrationConfig, creds: Credential): Promise<HealthCheckResult>;

  // Authentication
  authenticate(config: IntegrationConfig, creds: Credential): Promise<AuthContext>;

  // Discovery (optional)
  discover?(
    config: IntegrationConfig,
    authContext: AuthContext,
    resourceType: string,
    filter?: DiscoveryFilter
  ): Promise<DiscoveredResource[]>;
}

interface ConnectorCapabilities {
  discovery: boolean;
  webhooks: boolean;
  streaming: boolean;
  batchOperations: boolean;
  customActions: string[];
}
```
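Because `discover()` is optional, callers should gate on the advertised capabilities rather than probing for methods. A small guard, with the capability shape restated so the sketch is self-contained:

```typescript
interface ConnectorCapabilities {
  discovery: boolean;
  webhooks: boolean;
  streaming: boolean;
  batchOperations: boolean;
  customActions: string[];
}

// A connector supports discovery only if it advertises the capability
// and actually provides the optional discover() implementation.
function canDiscover(
  capabilities: ConnectorCapabilities,
  hasDiscoverFn: boolean
): boolean {
  return capabilities.discovery && hasDiscoverFn;
}

// Custom actions are looked up by name before dispatch.
function supportsAction(capabilities: ConnectorCapabilities, action: string): boolean {
  return capabilities.customActions.includes(action);
}
```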
## Registry Connectors
### IRegistryConnector
```typescript
interface IRegistryConnector extends IConnector {
  // Repository operations
  listRepositories(authContext: AuthContext): Promise<Repository[]>;

  // Tag operations
  listTags(authContext: AuthContext, repository: string): Promise<Tag[]>;
  getManifest(authContext: AuthContext, repository: string, reference: string): Promise<Manifest>;
  getDigest(authContext: AuthContext, repository: string, tag: string): Promise<string>;

  // Image operations
  imageExists(authContext: AuthContext, repository: string, digest: string): Promise<boolean>;
  getImageMetadata(authContext: AuthContext, repository: string, digest: string): Promise<ImageMetadata>;
}

interface Repository {
  name: string;
  fullName: string;
  tagCount?: number;
  lastUpdated?: DateTime;
}

interface Tag {
  name: string;
  digest: string;
  createdAt?: DateTime;
  size?: number;
}

interface ImageMetadata {
  digest: string;
  mediaType: string;
  size: number;
  architecture: string;
  os: string;
  created: DateTime;
  labels: Record<string, string>;
  layers: LayerInfo[];
}
```
### Docker Registry Connector
```typescript
class DockerRegistryConnector implements IRegistryConnector {
  readonly typeId = "docker-registry";
  readonly displayName = "Docker Registry";
  readonly version = "1.0.0";
  readonly capabilities: ConnectorCapabilities = {
    discovery: true,
    webhooks: true,
    streaming: false,
    batchOperations: false,
    customActions: []
  };

  private httpClient: HttpClient;

  async initialize(config: DockerRegistryConfig): Promise<void> {
    this.httpClient = new HttpClient({
      baseUrl: config.url,
      timeout: config.timeout || 30000,
      insecureSkipVerify: config.insecureSkipVerify
    });
  }

  async ping(config: DockerRegistryConfig): Promise<void> {
    const response = await this.httpClient.get("/v2/");
    if (response.status !== 200 && response.status !== 401) {
      throw new Error(`Registry unavailable: ${response.status}`);
    }
  }

  async authenticate(
    config: DockerRegistryConfig,
    creds: BasicCredential
  ): Promise<AuthContext> {
    // Get auth challenge from /v2/
    const challenge = await this.getAuthChallenge();

    if (challenge.type === "bearer") {
      // OAuth2 token flow
      const token = await this.getToken(challenge, creds);
      return { type: "bearer", token };
    } else {
      // Basic auth
      return {
        type: "basic",
        credentials: Buffer.from(`${creds.username}:${creds.password}`).toString("base64")
      };
    }
  }

  async getDigest(
    authContext: AuthContext,
    repository: string,
    tag: string
  ): Promise<string> {
    const response = await this.httpClient.head(
      `/v2/${repository}/manifests/${tag}`,
      {
        headers: {
          ...this.authHeader(authContext),
          Accept: "application/vnd.docker.distribution.manifest.v2+json"
        }
      }
    );

    const digest = response.headers.get("docker-content-digest");
    if (!digest) {
      throw new Error("No digest header in response");
    }
    return digest;
  }

  async getImageMetadata(
    authContext: AuthContext,
    repository: string,
    digest: string
  ): Promise<ImageMetadata> {
    // Fetch manifest
    const manifest = await this.getManifest(authContext, repository, digest);

    // Fetch config blob
    const configDigest = manifest.config.digest;
    const configResponse = await this.httpClient.get(
      `/v2/${repository}/blobs/${configDigest}`,
      { headers: this.authHeader(authContext) }
    );
    const config = await configResponse.json();

    return {
      digest,
      mediaType: manifest.mediaType,
      size: manifest.config.size,
      architecture: config.architecture,
      os: config.os,
      created: new Date(config.created),
      labels: config.config?.Labels || {},
      layers: manifest.layers.map(l => ({
        digest: l.digest,
        size: l.size,
        mediaType: l.mediaType
      }))
    };
  }
}
```
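Because releases are identified by OCI digests, values returned by `getDigest()` are worth validating before they enter release identity. A sketch validator for the common `sha256:` form (other registered digest algorithms exist and are not covered here):

```typescript
// Matches the common OCI digest form: "sha256:" followed by 64 lowercase hex characters.
const SHA256_DIGEST = /^sha256:[a-f0-9]{64}$/;

function isValidDigest(digest: string): boolean {
  return SHA256_DIGEST.test(digest);
}
```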
### ECR Connector
```typescript
class ECRConnector implements IRegistryConnector {
  readonly typeId = "ecr";
  readonly displayName = "AWS ECR";
  readonly version = "1.0.0";
  readonly capabilities: ConnectorCapabilities = {
    discovery: true,
    webhooks: false,
    streaming: false,
    batchOperations: true,
    customActions: ["createRepository", "setLifecyclePolicy"]
  };

  private ecrClient: ECRClient;

  async initialize(config: ECRConfig): Promise<void> {
    this.ecrClient = new ECRClient({
      region: config.region,
      credentials: {
        accessKeyId: config.accessKeyId,
        secretAccessKey: config.secretAccessKey
      }
    });
  }

  async authenticate(
    config: ECRConfig,
    creds: AWSCredential
  ): Promise<AuthContext> {
    const command = new GetAuthorizationTokenCommand({});
    const response = await this.ecrClient.send(command);

    const authData = response.authorizationData?.[0];
    if (!authData?.authorizationToken) {
      throw new Error("Failed to get ECR authorization token");
    }

    return {
      type: "bearer",
      token: authData.authorizationToken,
      expiresAt: authData.expiresAt
    };
  }

  async listRepositories(authContext: AuthContext): Promise<Repository[]> {
    const repositories: Repository[] = [];
    let nextToken: string | undefined;

    do {
      const command = new DescribeRepositoriesCommand({
        nextToken
      });
      const response = await this.ecrClient.send(command);

      for (const repo of response.repositories || []) {
        repositories.push({
          name: repo.repositoryName!,
          fullName: repo.repositoryUri!,
          lastUpdated: repo.createdAt
        });
      }
      nextToken = response.nextToken;
    } while (nextToken);

    return repositories;
  }
}
```
## CI/CD Connectors
### ICICDConnector
```typescript
interface ICICDConnector extends IConnector {
  // Pipeline operations
  listPipelines(authContext: AuthContext): Promise<Pipeline[]>;
  getPipeline(authContext: AuthContext, pipelineId: string): Promise<Pipeline>;

  // Trigger operations
  triggerPipeline(
    authContext: AuthContext,
    pipelineId: string,
    params: TriggerParams
  ): Promise<PipelineRun>;

  // Run operations
  getPipelineRun(authContext: AuthContext, runId: string): Promise<PipelineRun>;
  cancelPipelineRun(authContext: AuthContext, runId: string): Promise<void>;
  getPipelineRunLogs(authContext: AuthContext, runId: string): Promise<string>;
}

interface Pipeline {
  id: string;
  name: string;
  ref?: string;
  webUrl?: string;
}

interface TriggerParams {
  ref?: string;                        // Branch/tag
  variables?: Record<string, string>;
}

interface PipelineRun {
  id: string;
  pipelineId: string;
  status: PipelineStatus;
  ref?: string;
  webUrl?: string;
  startedAt?: DateTime;
  finishedAt?: DateTime;
}

type PipelineStatus =
  | "pending"
  | "running"
  | "success"
  | "failed"
  | "cancelled";
```
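Polling loops such as the trigger-ci step only need to distinguish terminal from in-flight statuses. Restating the union above so the helper compiles standalone:

```typescript
type PipelineStatus = "pending" | "running" | "success" | "failed" | "cancelled";

// Terminal statuses end a polling loop; pending/running keep it going.
function isTerminal(status: PipelineStatus): boolean {
  return status === "success" || status === "failed" || status === "cancelled";
}
```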
### GitLab CI Connector
```typescript
class GitLabCIConnector implements ICICDConnector {
  readonly typeId = "gitlab-ci";
  readonly displayName = "GitLab CI/CD";
  readonly version = "1.0.0";
  readonly capabilities: ConnectorCapabilities = {
    discovery: true,
    webhooks: true,
    streaming: false,
    batchOperations: false,
    customActions: ["retryPipeline"]
  };

  private apiClient: GitLabClient;
  private projectId: string;
  private defaultBranch: string;

  async initialize(config: GitLabCIConfig): Promise<void> {
    this.projectId = config.projectId;
    // Fallback branch when a trigger provides no ref (config field is optional)
    this.defaultBranch = config.defaultBranch || "main";
    this.apiClient = new GitLabClient({
      baseUrl: config.url,
      projectId: config.projectId
    });
  }

  async authenticate(
    config: GitLabCIConfig,
    creds: TokenCredential
  ): Promise<AuthContext> {
    // Validate token with user endpoint
    this.apiClient.setToken(creds.token);
    await this.apiClient.get("/user");

    return {
      type: "bearer",
      token: creds.token
    };
  }

  async triggerPipeline(
    authContext: AuthContext,
    pipelineId: string,
    params: TriggerParams
  ): Promise<PipelineRun> {
    const response = await this.apiClient.post(
      `/projects/${this.projectId}/pipeline`,
      {
        ref: params.ref || this.defaultBranch,
        variables: Object.entries(params.variables || {}).map(([key, value]) => ({
          key,
          value,
          variable_type: "env_var"
        }))
      },
      { headers: { Authorization: `Bearer ${authContext.token}` } }
    );

    return {
      id: response.id.toString(),
      pipelineId: pipelineId,
      status: this.mapStatus(response.status),
      ref: response.ref,
      webUrl: response.web_url,
      startedAt: response.started_at ? new Date(response.started_at) : undefined
    };
  }

  async getPipelineRun(
    authContext: AuthContext,
    runId: string
  ): Promise<PipelineRun> {
    const response = await this.apiClient.get(
      `/projects/${this.projectId}/pipelines/${runId}`,
      { headers: { Authorization: `Bearer ${authContext.token}` } }
    );

    return {
      id: response.id.toString(),
      pipelineId: response.id.toString(),
      status: this.mapStatus(response.status),
      ref: response.ref,
      webUrl: response.web_url,
      startedAt: response.started_at ? new Date(response.started_at) : undefined,
      finishedAt: response.finished_at ? new Date(response.finished_at) : undefined
    };
  }

  private mapStatus(gitlabStatus: string): PipelineStatus {
    const statusMap: Record<string, PipelineStatus> = {
      created: "pending",
      waiting_for_resource: "pending",
      preparing: "pending",
      pending: "pending",
      running: "running",
      success: "success",
      failed: "failed",
      canceled: "cancelled",
      skipped: "cancelled",
      manual: "pending"
    };
    return statusMap[gitlabStatus] || "pending";
  }
}
}
```
## Notification Connectors
### INotificationConnector
```typescript
interface INotificationConnector extends IConnector {
  // Channel operations
  listChannels(authContext: AuthContext): Promise<Channel[]>;

  // Send operations
  sendMessage(
    authContext: AuthContext,
    channel: string,
    message: NotificationMessage
  ): Promise<MessageResult>;

  sendTemplate(
    authContext: AuthContext,
    channel: string,
    templateId: string,
    data: Record<string, any>
  ): Promise<MessageResult>;
}

interface Channel {
  id: string;
  name: string;
  type: string;
}

interface NotificationMessage {
  text: string;
  title?: string;
  color?: string;
  fields?: MessageField[];
  actions?: MessageAction[];
}

interface MessageField {
  name: string;
  value: string;
  inline?: boolean;
}

interface MessageAction {
  type: "button" | "link";
  text: string;
  url?: string;
  style?: "primary" | "danger" | "default";
}
```
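As a sketch of how a promotion step might assemble a `NotificationMessage` for a deployment outcome (the colors and wording are illustrative, and the interfaces are restated so the example compiles standalone):

```typescript
interface MessageField { name: string; value: string; inline?: boolean; }
interface MessageAction { type: "button" | "link"; text: string; url?: string; style?: "primary" | "danger" | "default"; }
interface NotificationMessage { text: string; title?: string; color?: string; fields?: MessageField[]; actions?: MessageAction[]; }

// Build a deployment-completed message; green for success, red for failure.
function buildDeploymentMessage(environment: string, release: string, success: boolean): NotificationMessage {
  return {
    title: success ? "Deployment succeeded" : "Deployment failed",
    text: `Release ${release} -> ${environment}`,
    color: success ? "#2eb886" : "#e01e5a",
    fields: [
      { name: "Environment", value: environment, inline: true },
      { name: "Release", value: release, inline: true }
    ]
  };
}
```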
### Slack Connector
```typescript
class SlackConnector implements INotificationConnector {
  readonly typeId = "slack";
  readonly displayName = "Slack";
  readonly version = "1.0.0";
  readonly capabilities: ConnectorCapabilities = {
    discovery: true,
    webhooks: true,
    streaming: false,
    batchOperations: false,
    customActions: ["addReaction", "updateMessage"]
  };

  private slackClient: WebClient;

  async initialize(config: SlackConfig): Promise<void> {
    // Client initialized in authenticate
  }

  async authenticate(
    config: SlackConfig,
    creds: TokenCredential
  ): Promise<AuthContext> {
    this.slackClient = new WebClient(creds.token);

    // Test authentication
    const result = await this.slackClient.auth.test();
    if (!result.ok) {
      throw new Error("Slack authentication failed");
    }

    return {
      type: "bearer",
      token: creds.token,
      teamId: result.team_id,
      userId: result.user_id
    };
  }

  async listChannels(authContext: AuthContext): Promise<Channel[]> {
    const channels: Channel[] = [];
    let cursor: string | undefined;

    do {
      const result = await this.slackClient.conversations.list({
        types: "public_channel,private_channel",
        cursor
      });
      for (const channel of result.channels || []) {
        channels.push({
          id: channel.id!,
          name: channel.name!,
          type: channel.is_private ? "private" : "public"
        });
      }
      cursor = result.response_metadata?.next_cursor;
    } while (cursor);

    return channels;
  }

  async sendMessage(
    authContext: AuthContext,
    channel: string,
    message: NotificationMessage
  ): Promise<MessageResult> {
    const blocks = this.buildBlocks(message);

    // When a color is set, the blocks must go inside the attachment only;
    // sending them at the top level as well would render the content twice
    const result = await this.slackClient.chat.postMessage({
      channel,
      text: message.text,
      ...(message.color
        ? { attachments: [{ color: message.color, blocks }] }
        : { blocks })
    });

    return {
      messageId: result.ts!,
      channel: result.channel!,
      success: result.ok
    };
  }

  private buildBlocks(message: NotificationMessage): KnownBlock[] {
    const blocks: KnownBlock[] = [];

    if (message.title) {
      blocks.push({
        type: "header",
        text: {
          type: "plain_text",
          text: message.title
        }
      });
    }

    blocks.push({
      type: "section",
      text: {
        type: "mrkdwn",
        text: message.text
      }
    });

    if (message.fields?.length) {
      blocks.push({
        type: "section",
        fields: message.fields.map(f => ({
          type: "mrkdwn",
          text: `*${f.name}*\n${f.value}`
        }))
      });
    }

    if (message.actions?.length) {
      blocks.push({
        type: "actions",
        elements: message.actions.map(a => ({
          type: "button",
          text: {
            type: "plain_text",
            text: a.text
          },
          url: a.url,
          // Slack buttons have no explicit "default" style; omit the field for it
          ...(a.style && a.style !== "default" ? { style: a.style } : {})
        }))
      });
    }

    return blocks;
  }
}
}
```
## Secret Store Connectors
### ISecretConnector
```typescript
interface ISecretConnector extends IConnector {
  // Secret operations
  getSecret(
    authContext: AuthContext,
    path: string,
    key?: string
  ): Promise<SecretValue>;

  listSecrets(
    authContext: AuthContext,
    path: string
  ): Promise<string[]>;
}

interface SecretValue {
  value: string;
  version?: string;
  createdAt?: DateTime;
  expiresAt?: DateTime;
}
```
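Orchestrator configuration commonly references a secret as a single string that is split into a path and key before calling `getSecret()`. A small parser for one such convention — the `path#key` format here is an assumption of this sketch, not a fixed product format:

```typescript
// Parse "kv/app#db-password" into { path, key }; the key part is optional.
function parseSecretRef(ref: string): { path: string; key?: string } {
  const hashIndex = ref.indexOf("#");
  if (hashIndex === -1) {
    return { path: ref };
  }
  return { path: ref.slice(0, hashIndex), key: ref.slice(hashIndex + 1) };
}
```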
### HashiCorp Vault Connector
```typescript
class VaultConnector implements ISecretConnector {
  readonly typeId = "hashicorp-vault";
  readonly displayName = "HashiCorp Vault";
  readonly version = "1.0.0";
  readonly capabilities: ConnectorCapabilities = {
    discovery: true,
    webhooks: false,
    streaming: false,
    batchOperations: false,
    customActions: ["renewToken"]
  };

  private vaultClient: VaultClient;
  private mountPath: string;

  async initialize(config: VaultConfig): Promise<void> {
    // KV v2 mount to read from; "secret" is the Vault default
    this.mountPath = config.mountPath || "secret";
    this.vaultClient = new VaultClient({
      endpoint: config.url,
      namespace: config.namespace
    });
  }

  async authenticate(
    config: VaultConfig,
    creds: Credential
  ): Promise<AuthContext> {
    let token: string;

    switch (config.authMethod) {
      case "token":
        token = (creds as TokenCredential).token;
        break;
      case "approle": {
        const approle = creds as AppRoleCredential;
        const result = await this.vaultClient.auth.approle.login({
          role_id: approle.roleId,
          secret_id: approle.secretId
        });
        token = result.auth.client_token;
        break;
      }
      case "kubernetes": {
        const k8s = creds as KubernetesCredential;
        const k8sResult = await this.vaultClient.auth.kubernetes.login({
          role: k8s.role,
          jwt: k8s.serviceAccountToken
        });
        token = k8sResult.auth.client_token;
        break;
      }
      default:
        throw new Error(`Unsupported auth method: ${config.authMethod}`);
    }

    this.vaultClient.token = token;
    return {
      type: "bearer",
      token,
      renewable: true
    };
  }

  async getSecret(
    authContext: AuthContext,
    path: string,
    key?: string
  ): Promise<SecretValue> {
    const result = await this.vaultClient.kv.v2.read({
      mount_path: this.mountPath,
      path
    });

    const data = result.data.data;
    const value = key ? data[key] : JSON.stringify(data);

    return {
      value,
      version: result.data.metadata.version.toString(),
      createdAt: new Date(result.data.metadata.created_time)
    };
  }

  async listSecrets(
    authContext: AuthContext,
    path: string
  ): Promise<string[]> {
    const result = await this.vaultClient.kv.v2.list({
      mount_path: this.mountPath,
      path
    });
    return result.data.keys;
  }
}
}
```
## Custom Connector Development
### Plugin Structure
```
my-connector/
├── manifest.yaml
├── src/
│   ├── connector.ts
│   ├── config.ts
│   └── types.ts
└── package.json
```
### Manifest
```yaml
# manifest.yaml
id: my-custom-connector
version: 1.0.0
name: My Custom Connector
description: Custom connector for XYZ service
author: Your Name

connector:
  typeId: my-service
  displayName: My Service
  entrypoint: ./src/connector.js
  capabilities:
    discovery: true
    webhooks: false
    streaming: false
    batchOperations: false

config_schema:
  type: object
  properties:
    url:
      type: string
      format: uri
      description: Service URL
    timeout:
      type: integer
      default: 30000
  required:
    - url

credential_types:
  - api-key
  - oauth2
```
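At registration time the hub can validate an integration's config against `config_schema`. Full JSON Schema validation needs a library; a minimal required-fields check is enough to illustrate the idea (the helper name is hypothetical):

```typescript
// Report which schema-required fields are missing from a config object.
function checkRequired(
  schema: { required?: string[] },
  config: Record<string, unknown>
): string[] {
  return (schema.required || []).filter(field => !(field in config));
}
```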
### Implementation
```typescript
// connector.ts
import { IConnector, ConnectorCapabilities } from "@stella-ops/connector-sdk";

export class MyConnector implements IConnector {
  readonly typeId = "my-service";
  readonly displayName = "My Service";
  readonly version = "1.0.0";
  readonly capabilities: ConnectorCapabilities = {
    discovery: true,
    webhooks: false,
    streaming: false,
    batchOperations: false,
    customActions: []
  };

  async initialize(config: MyConfig): Promise<void> {
    // Initialize your connector
  }

  async dispose(): Promise<void> {
    // Cleanup resources
  }

  async ping(config: MyConfig): Promise<void> {
    // Check connectivity
  }

  async healthCheck(config: MyConfig, creds: Credential): Promise<HealthCheckResult> {
    // Full health check
    throw new Error("Not implemented");
  }

  async authenticate(config: MyConfig, creds: Credential): Promise<AuthContext> {
    // Authenticate and return context
    throw new Error("Not implemented");
  }

  async discover(
    config: MyConfig,
    authContext: AuthContext,
    resourceType: string,
    filter?: DiscoveryFilter
  ): Promise<DiscoveredResource[]> {
    // Discover resources
    throw new Error("Not implemented");
  }
}

// Export connector factory
export default function createConnector(): IConnector {
  return new MyConnector();
}
```
## References
- [Integrations Overview](overview.md)
- [Webhooks](webhooks.md)
- [Plugin System](../modules/plugin-system.md)

# Integrations Overview
## Purpose
The Integration Hub (INTHUB) provides a unified interface for connecting Release Orchestrator to external systems including container registries, CI/CD pipelines, notification services, secret stores, and metrics providers.
## Integration Architecture
```
INTEGRATION HUB ARCHITECTURE
┌─────────────────────────────────────────────────────────────────────────────┐
│ INTEGRATION HUB │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ INTEGRATION MANAGER │ │
│ │ │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │
│ │ │ Type │ │ Instance │ │ Health │ │ Discovery │ │ │
│ │ │ Registry │ │ Manager │ │ Monitor │ │ Service │ │ │
│ │ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ CONNECTOR RUNTIME │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────────────┐ │ │
│ │ │ CONNECTOR POOL │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ │
│ │ │ │ Docker │ │ GitLab │ │ Slack │ │ Vault │ │ │ │
│ │ │ │ Registry │ │ CI │ │ │ │ │ │ │ │
│ │ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │ │
│ │ │ │ │ │
│ │ └──────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────┬─────────────────┼─────────────────┬─────────────┐
│ │ │ │ │
▼ ▼ ▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│Container│ │ CI/CD │ │ Notifi- │ │ Secret │ │ Metrics │
│Registry │ │ Systems │ │ cations │ │ Stores │ │ Systems │
└─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘
```
## Integration Types
### Container Registries
| Type ID | Description | Discovery Support |
|---------|-------------|-------------------|
| `docker-registry` | Docker Registry v2 API | Yes |
| `docker-hub` | Docker Hub | Yes |
| `gcr` | Google Container Registry | Yes |
| `ecr` | AWS Elastic Container Registry | Yes |
| `acr` | Azure Container Registry | Yes |
| `ghcr` | GitHub Container Registry | Yes |
| `harbor` | Harbor Registry | Yes |
| `jfrog` | JFrog Artifactory | Yes |
| `nexus` | Sonatype Nexus | Yes |
| `quay` | Quay.io | Yes |
### CI/CD Systems
| Type ID | Description | Trigger Support |
|---------|-------------|-----------------|
| `gitlab-ci` | GitLab CI/CD | Yes |
| `github-actions` | GitHub Actions | Yes |
| `jenkins` | Jenkins | Yes |
| `azure-devops` | Azure DevOps Pipelines | Yes |
| `circleci` | CircleCI | Yes |
| `teamcity` | TeamCity | Yes |
| `drone` | Drone CI | Yes |
### Notification Services
| Type ID | Description | Features |
|---------|-------------|----------|
| `slack` | Slack | Channels, threads, reactions |
| `teams` | Microsoft Teams | Channels, cards |
| `email` | Email (SMTP) | Templates, attachments |
| `webhook` | Generic Webhook | JSON payloads |
| `pagerduty` | PagerDuty | Incidents, alerts |
| `opsgenie` | OpsGenie | Alerts, on-call |
### Secret Stores
| Type ID | Description | Features |
|---------|-------------|----------|
| `hashicorp-vault` | HashiCorp Vault | KV, Transit, PKI |
| `aws-secrets-manager` | AWS Secrets Manager | Rotation, versioning |
| `azure-key-vault` | Azure Key Vault | Keys, secrets, certs |
| `gcp-secret-manager` | GCP Secret Manager | Versions, labels |
### Metrics & Monitoring
| Type ID | Description | Use Case |
|---------|-------------|----------|
| `prometheus` | Prometheus | Canary metrics |
| `datadog` | Datadog | APM, logs, metrics |
| `newrelic` | New Relic | APM, infra monitoring |
| `dynatrace` | Dynatrace | Full-stack monitoring |
## Integration Configuration
### Integration Entity
```typescript
interface Integration {
id: UUID;
tenantId: UUID;
typeId: string; // e.g., "docker-registry"
name: string; // Display name
description?: string;
// Connection configuration
config: IntegrationConfig;
// Credential reference (stored in vault)
credentialRef: string;
// Health tracking
healthStatus: "healthy" | "degraded" | "unhealthy" | "unknown";
lastHealthCheck?: DateTime;
// Metadata
labels: Record<string, string>;
createdAt: DateTime;
updatedAt: DateTime;
}
interface IntegrationConfig {
// Common fields
url?: string;
timeout?: number;
retries?: number;
// Type-specific fields
[key: string]: any;
}
```
### Type-Specific Configuration
```typescript
// Docker Registry
interface DockerRegistryConfig extends IntegrationConfig {
url: string; // https://registry.example.com
repository?: string; // Optional default repository
insecureSkipVerify?: boolean; // Skip TLS verification
}
// GitLab CI
interface GitLabCIConfig extends IntegrationConfig {
url: string; // https://gitlab.example.com
projectId: string; // Project ID or path
defaultBranch?: string; // Default ref for triggers
}
// Slack
interface SlackConfig extends IntegrationConfig {
workspace?: string; // Workspace identifier
defaultChannel?: string; // Default channel for notifications
iconEmoji?: string; // Bot icon
}
// HashiCorp Vault
interface VaultConfig extends IntegrationConfig {
url: string; // https://vault.example.com
namespace?: string; // Vault namespace
mountPath: string; // Secret mount path
authMethod: "token" | "approle" | "kubernetes";
}
```
## Credential Management
Credentials are never stored in the Release Orchestrator database. Instead, each integration holds a reference that resolves to an entry in an external secret store.
### Credential Reference Format
```
vault://vault-integration-id/path/to/secret#key
        └─────────┬────────┘ └─────┬──────┘ └┬┘
              Vault ID        Secret path   Key
```
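Parsing the reference format above takes only a few lines. The following is a sketch; the `parseCredentialRef` name and the returned field names are illustrative, not part of the spec:

```typescript
// Parse "vault://<integration-id>/<secret-path>#<key>" into its parts.
// parseCredentialRef is an illustrative name, not spec-defined.
interface ParsedCredentialRef {
  integrationId: string;
  secretPath: string;
  key: string;
}

function parseCredentialRef(ref: string): ParsedCredentialRef {
  const match = /^vault:\/\/([^/]+)\/(.+)#(.+)$/.exec(ref);
  if (!match) {
    throw new Error(`Invalid credential reference: ${ref}`);
  }
  return { integrationId: match[1], secretPath: match[2], key: match[3] };
}
```

The integration ID segment identifies which vault integration to query; the path and key are passed through to that store's API.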
### Credential Types
```typescript
type CredentialType =
| "basic" // Username/password
| "token" // Bearer token
| "api-key" // API key
| "oauth2" // OAuth2 credentials
| "service-account" // GCP/K8s service account
| "certificate"; // Client certificate
interface CredentialReference {
type: CredentialType;
ref: string; // Vault reference
}
// Examples
const dockerCreds: CredentialReference = {
type: "basic",
ref: "vault://vault-1/docker/registry.example.com#credentials"
};
const gitlabToken: CredentialReference = {
type: "token",
ref: "vault://vault-1/ci/gitlab#access_token"
};
```
## Health Monitoring
### Health Check Types
| Check Type | Description | Frequency |
|------------|-------------|-----------|
| `connectivity` | TCP/HTTP connectivity | 1 min |
| `authentication` | Credential validity | 5 min |
| `functionality` | Full operation test | 15 min |
### Health Check Flow
```typescript
interface HealthCheckResult {
integrationId: UUID;
checkType: string;
status: "healthy" | "degraded" | "unhealthy";
latencyMs: number;
message?: string;
checkedAt: DateTime;
}
class IntegrationHealthMonitor {
async checkHealth(integration: Integration): Promise<HealthCheckResult> {
const connector = this.connectorPool.get(integration.typeId);
const startTime = Date.now();
try {
// Connectivity check
await connector.ping(integration.config);
// Authentication check
const creds = await this.fetchCredentials(integration.credentialRef);
await connector.authenticate(integration.config, creds);
return {
integrationId: integration.id,
checkType: "full",
status: "healthy",
latencyMs: Date.now() - startTime,
checkedAt: new Date()
};
} catch (error) {
return {
integrationId: integration.id,
checkType: "full",
status: this.classifyError(error),
latencyMs: Date.now() - startTime,
message: error.message,
checkedAt: new Date()
};
}
}
}
```
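The `classifyError` helper referenced above is not spelled out in the spec. One plausible mapping, treating credential failures as `unhealthy` and transient network errors as `degraded`, might look like this (the specific message patterns are assumptions):

```typescript
// Sketch of the classifyError helper used by IntegrationHealthMonitor.
// The matched message patterns are assumptions, not spec-defined.
function classifyError(error: Error): "degraded" | "unhealthy" {
  const message = error.message.toLowerCase();
  // Credential problems mean the integration cannot operate at all.
  if (message.includes("401") || message.includes("unauthorized")) {
    return "unhealthy";
  }
  // Timeouts and connection resets are often transient.
  if (message.includes("timeout") || message.includes("econnreset")) {
    return "degraded";
  }
  return "unhealthy";
}
```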
## Discovery Service
Integrations can discover resources from connected systems.
### Discovery Operations
```typescript
interface DiscoveryService {
// Discover available repositories
discoverRepositories(integrationId: UUID): Promise<Repository[]>;
// Discover tags/versions
discoverTags(integrationId: UUID, repository: string): Promise<Tag[]>;
// Discover pipelines
discoverPipelines(integrationId: UUID): Promise<Pipeline[]>;
// Discover notification channels
discoverChannels(integrationId: UUID): Promise<Channel[]>;
}
// Example: Discover Docker repositories
const repos = await discoveryService.discoverRepositories(dockerIntegrationId);
// Returns: [{ name: "myapp", tags: ["latest", "v1.0.0", ...] }, ...]
```
### Discovery Caching
```typescript
interface DiscoveryCache {
key: string; // integration_id:resource_type
data: any;
discoveredAt: DateTime;
ttlSeconds: number;
}
// Cache TTLs by resource type
const cacheTTLs = {
repositories: 3600, // 1 hour
tags: 300, // 5 minutes
pipelines: 3600, // 1 hour
channels: 86400 // 24 hours
};
```
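A cache lookup against these TTLs reduces to an age check on the stored entry. A minimal sketch, assuming an in-memory store keyed by `integration_id:resource_type` (the `DiscoveryCacheStore` name is illustrative):

```typescript
// Sketch of a TTL-checked cache for discovery results, keyed by
// "integration_id:resource_type". Names here are illustrative.
interface CacheEntry {
  data: unknown;
  discoveredAt: number; // epoch ms
  ttlSeconds: number;
}

class DiscoveryCacheStore {
  private entries = new Map<string, CacheEntry>();

  set(integrationId: string, resourceType: string, data: unknown, ttlSeconds: number): void {
    const key = `${integrationId}:${resourceType}`;
    this.entries.set(key, { data, discoveredAt: Date.now(), ttlSeconds });
  }

  // Returns cached data while fresh; undefined signals the caller to re-discover.
  lookup(integrationId: string, resourceType: string): unknown | undefined {
    const entry = this.entries.get(`${integrationId}:${resourceType}`);
    if (!entry) return undefined;
    const ageSeconds = (Date.now() - entry.discoveredAt) / 1000;
    return ageSeconds < entry.ttlSeconds ? entry.data : undefined;
  }
}
```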
## API Reference
### Create Integration
```http
POST /api/v1/integrations
Content-Type: application/json
{
"typeId": "docker-registry",
"name": "Production Registry",
"config": {
"url": "https://registry.example.com",
"repository": "myorg"
},
"credentialRef": "vault://vault-1/docker/prod-registry#credentials",
"labels": {
"environment": "production"
}
}
```
### Test Integration
```http
POST /api/v1/integrations/{id}/test
```
Response:
```json
{
"success": true,
"data": {
"connectivityTest": { "status": "passed", "latencyMs": 45 },
"authenticationTest": { "status": "passed", "latencyMs": 120 },
"functionalityTest": { "status": "passed", "latencyMs": 230 }
}
}
```
### Discover Resources
```http
POST /api/v1/integrations/{id}/discover
Content-Type: application/json
{
"resourceType": "repositories",
"filter": {
"namePattern": "myapp-*"
}
}
```
## Error Handling
### Integration Errors
| Error Code | Description | Retry Strategy |
|------------|-------------|----------------|
| `INTEGRATION_NOT_FOUND` | Integration ID not found | No retry |
| `INTEGRATION_UNHEALTHY` | Integration health check failing | Backoff retry |
| `CREDENTIAL_FETCH_FAILED` | Cannot fetch credentials | Retry with backoff |
| `CONNECTION_REFUSED` | Cannot connect to endpoint | Retry with backoff |
| `AUTHENTICATION_FAILED` | Invalid credentials | No retry |
| `RATE_LIMITED` | Too many requests | Retry after delay |
### Circuit Breaker
```typescript
interface CircuitBreakerConfig {
failureThreshold: number; // Failures before opening
successThreshold: number; // Successes to close
timeout: number; // Time in open state (ms)
}
// Default configuration
const defaultCircuitBreaker: CircuitBreakerConfig = {
failureThreshold: 5,
successThreshold: 3,
timeout: 60000
};
```
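The configuration above can drive a standard three-state breaker (closed → open → half-open). The following is a minimal sketch of that state machine, with `CircuitBreakerConfig` repeated for self-containment; the class and method names are illustrative:

```typescript
// Minimal circuit-breaker state machine for the config shape above.
// Class and method names are illustrative, not spec-defined.
interface CircuitBreakerConfig {
  failureThreshold: number; // Failures before opening
  successThreshold: number; // Successes to close
  timeout: number;          // Time in open state (ms)
}

type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private successes = 0;
  private openedAt = 0;

  constructor(private config: CircuitBreakerConfig) {}

  canExecute(now: number = Date.now()): boolean {
    // After the open-state timeout, allow a trial request (half-open).
    if (this.state === "open" && now - this.openedAt >= this.config.timeout) {
      this.state = "half-open";
      this.successes = 0;
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    if (this.state === "half-open" && ++this.successes >= this.config.successThreshold) {
      this.state = "closed";
      this.failures = 0;
    }
  }

  recordFailure(now: number = Date.now()): void {
    // Any failure in half-open, or too many in closed, re-opens the breaker.
    if (this.state === "half-open" || ++this.failures >= this.config.failureThreshold) {
      this.state = "open";
      this.openedAt = now;
    }
  }
}
```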
## References
- [Connectors](connectors.md)
- [Webhooks](webhooks.md)
- [CI/CD Integration](ci-cd.md)
- [Integration Hub Module](../modules/integration-hub.md)

# Webhooks
## Overview
Release Orchestrator supports both inbound webhooks (receiving events from external systems) and outbound webhooks (sending events to external systems).
## Inbound Webhooks
### Webhook Types
| Type | Source | Triggers |
|------|--------|----------|
| `registry-push` | Container registries | Image push events |
| `ci-pipeline` | CI/CD systems | Pipeline completion |
| `github-app` | GitHub | PR, push, workflow events |
| `gitlab-webhook` | GitLab | Pipeline, push, MR events |
| `generic` | Any system | Custom payloads |
### Registry Push Webhook
Receives events when new images are pushed to registries.
```
POST /api/v1/webhooks/registry/{integrationId}
Content-Type: application/json
# Docker Hub
{
"push_data": {
"tag": "v1.2.0",
"images": ["sha256:abc123..."],
"pushed_at": 1704067200
},
"repository": {
"name": "myapp",
"namespace": "myorg",
"repo_url": "https://hub.docker.com/r/myorg/myapp"
}
}
# Harbor
{
"type": "PUSH_ARTIFACT",
"occur_at": 1704067200,
"event_data": {
"repository": {
"name": "myapp",
"repo_full_name": "myorg/myapp"
},
"resources": [{
"digest": "sha256:abc123...",
"tag": "v1.2.0"
}]
}
}
```
### Webhook Handler
```typescript
interface WebhookHandler {
handleRegistryPush(
integrationId: UUID,
payload: RegistryPushPayload
): Promise<WebhookResponse>;
handleCIPipeline(
integrationId: UUID,
payload: CIPipelinePayload
): Promise<WebhookResponse>;
}
class RegistryWebhookHandler implements WebhookHandler {
async handleRegistryPush(
integrationId: UUID,
payload: RegistryPushPayload
): Promise<WebhookResponse> {
// Normalize payload from different registries
const normalized = this.normalizePayload(payload);
// Find matching component
const component = await this.componentRegistry.findByRepository(
normalized.repository
);
if (!component) {
return {
success: true,
action: "ignored",
reason: "No matching component"
};
}
// Update version map
await this.versionManager.addVersion({
componentId: component.id,
tag: normalized.tag,
digest: normalized.digest,
channel: this.determineChannel(normalized.tag)
});
// Check for auto-release triggers
const triggers = await this.getTriggers(component.id, normalized.tag);
for (const trigger of triggers) {
await this.triggerRelease(trigger, normalized);
}
return {
success: true,
action: "processed",
componentId: component.id,
versionsAdded: 1,
triggersActivated: triggers.length
};
}
private normalizePayload(payload: any): NormalizedPushEvent {
// Detect registry type and normalize
if (payload.push_data) {
// Docker Hub format
return {
repository: `${payload.repository.namespace}/${payload.repository.name}`,
tag: payload.push_data.tag,
digest: payload.push_data.images[0],
pushedAt: new Date(payload.push_data.pushed_at * 1000)
};
}
if (payload.type === "PUSH_ARTIFACT") {
// Harbor format
return {
repository: payload.event_data.repository.repo_full_name,
tag: payload.event_data.resources[0].tag,
digest: payload.event_data.resources[0].digest,
pushedAt: new Date(payload.occur_at * 1000)
};
}
// Generic format
return payload as NormalizedPushEvent;
}
}
```
### Webhook Authentication
```typescript
interface WebhookAuth {
// Signature validation
validateSignature(
payload: Buffer,
signature: string,
secret: string,
algorithm: SignatureAlgorithm
): boolean;
// Token validation
validateToken(
token: string,
expectedToken: string
): boolean;
}
type SignatureAlgorithm = "hmac-sha256" | "hmac-sha1";
class WebhookAuthenticator implements WebhookAuth {
validateSignature(
payload: Buffer,
signature: string,
secret: string,
algorithm: SignatureAlgorithm
): boolean {
const algo = algorithm === "hmac-sha256" ? "sha256" : "sha1";
const expected = crypto
.createHmac(algo, secret)
.update(payload)
.digest("hex");
// Constant-time comparison; timingSafeEqual throws if lengths differ,
// so guard the length explicitly
const sigBuffer = Buffer.from(signature);
const expectedBuffer = Buffer.from(expected);
if (sigBuffer.length !== expectedBuffer.length) {
return false;
}
return crypto.timingSafeEqual(sigBuffer, expectedBuffer);
}
}
```
### Webhook Configuration
```typescript
interface WebhookConfig {
id: UUID;
integrationId: UUID;
type: WebhookType;
// Security
secretRef: string; // Vault reference for signature secret
signatureHeader?: string; // Header containing signature
signatureAlgorithm?: SignatureAlgorithm;
// Processing
enabled: boolean;
filters?: WebhookFilter[]; // Filter events
// Retry
retryPolicy: RetryPolicy;
}
interface WebhookFilter {
field: string; // JSONPath to field
operator: "equals" | "contains" | "matches";
value: string;
}
// Example: Only process tags matching semver
const semverFilter: WebhookFilter = {
field: "$.tag",
operator: "matches",
value: "^v\\d+\\.\\d+\\.\\d+$"
};
```
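Evaluating a `WebhookFilter` against an incoming payload is a path lookup followed by an operator check. A sketch, handling only the simple dotted `$.a.b` subset of JSONPath (full JSONPath support is left to the implementation; `matchesFilter` is an illustrative name):

```typescript
// Sketch of WebhookFilter evaluation. Only dotted "$.a.b" paths are
// handled; full JSONPath support is an assumption left open here.
interface WebhookFilter {
  field: string; // JSONPath to field
  operator: "equals" | "contains" | "matches";
  value: string;
}

function matchesFilter(payload: Record<string, unknown>, filter: WebhookFilter): boolean {
  // Resolve a "$.a.b" path against the payload.
  const pathParts = filter.field.replace(/^\$\.?/, "").split(".").filter(Boolean);
  let current: unknown = payload;
  for (const part of pathParts) {
    if (current === null || typeof current !== "object") return false;
    current = (current as Record<string, unknown>)[part];
  }
  if (typeof current !== "string") return false;
  switch (filter.operator) {
    case "equals":
      return current === filter.value;
    case "contains":
      return current.includes(filter.value);
    case "matches":
      return new RegExp(filter.value).test(current);
  }
}
```

Events failing any configured filter are acknowledged but not processed.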
## Outbound Webhooks
### Event Types
| Event | Description | Payload |
|-------|-------------|---------|
| `release.created` | New release created | Release details |
| `promotion.requested` | Promotion requested | Promotion details |
| `promotion.approved` | Promotion approved | Approval details |
| `promotion.rejected` | Promotion rejected | Rejection details |
| `deployment.started` | Deployment started | Job details |
| `deployment.completed` | Deployment completed | Job details, results |
| `deployment.failed` | Deployment failed | Job details, error |
| `rollback.initiated` | Rollback initiated | Rollback details |
### Webhook Subscription
```typescript
interface WebhookSubscription {
id: UUID;
tenantId: UUID;
name: string;
// Target
url: string;
method: "POST" | "PUT";
headers?: Record<string, string>;
// Authentication
authType: "none" | "basic" | "bearer" | "signature";
credentialRef?: string;
signatureSecret?: string;
// Events
events: string[]; // Event types to subscribe
filters?: EventFilter[]; // Filter events
// Delivery
retryPolicy: RetryPolicy;
timeout: number;
// Status
enabled: boolean;
lastDelivery?: DateTime;
lastStatus?: number;
}
interface EventFilter {
field: string;
operator: string;
value: any;
}
```
### Webhook Delivery
```typescript
interface WebhookPayload {
id: string; // Delivery ID
timestamp: string; // ISO-8601
event: string; // Event type
tenantId: string;
data: Record<string, any>; // Event-specific data
}
class WebhookDeliveryService {
async deliver(
subscription: WebhookSubscription,
event: DomainEvent
): Promise<DeliveryResult> {
const payload: WebhookPayload = {
id: uuidv4(),
timestamp: new Date().toISOString(),
event: event.type,
tenantId: subscription.tenantId,
data: this.buildEventData(event)
};
const headers = this.buildHeaders(subscription, payload);
const body = JSON.stringify(payload);
// Attempt delivery with retries
return this.deliverWithRetry(subscription, headers, body);
}
private buildHeaders(
subscription: WebhookSubscription,
payload: WebhookPayload
): Record<string, string> {
const headers: Record<string, string> = {
"Content-Type": "application/json",
"X-Stella-Event": payload.event,
"X-Stella-Delivery": payload.id,
"X-Stella-Timestamp": payload.timestamp,
...subscription.headers
};
// Add signature if configured
if (subscription.authType === "signature") {
const signature = this.computeSignature(
JSON.stringify(payload),
subscription.signatureSecret!
);
headers["X-Stella-Signature"] = signature;
}
return headers;
}
private async deliverWithRetry(
subscription: WebhookSubscription,
headers: Record<string, string>,
body: string
): Promise<DeliveryResult> {
const policy = subscription.retryPolicy;
let lastError: Error | undefined;
for (let attempt = 0; attempt <= policy.maxRetries; attempt++) {
try {
const response = await fetch(subscription.url, {
method: subscription.method,
headers,
body,
signal: AbortSignal.timeout(subscription.timeout)
});
// Record delivery
await this.recordDelivery(subscription.id, {
attempt,
statusCode: response.status,
success: response.ok
});
if (response.ok) {
return { success: true, statusCode: response.status, attempts: attempt + 1 };
}
// Client errors are non-retryable, except 429 (rate limited)
if (response.status >= 400 && response.status < 500 && response.status !== 429) {
return {
success: false,
statusCode: response.status,
attempts: attempt + 1,
error: `Client error: ${response.status}`
};
}
lastError = new Error(`Server error: ${response.status}`);
} catch (error) {
lastError = error as Error;
}
// Wait before retry
if (attempt < policy.maxRetries) {
const delay = this.calculateDelay(policy, attempt);
await sleep(delay);
}
}
return {
success: false,
attempts: policy.maxRetries + 1,
error: lastError?.message
};
}
}
```
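The `calculateDelay` helper used in the retry loop above is not spelled out. A plausible implementation for the `retryPolicy` shape shown in the subscription example (`backoffType`, `backoffSeconds`), with a cap that is an assumption rather than a spec value:

```typescript
// Sketch of calculateDelay for the delivery retry loop. The 5-minute
// cap is an assumption, not spec-defined.
interface RetryPolicy {
  maxRetries: number;
  backoffType: "fixed" | "exponential";
  backoffSeconds: number;
}

function calculateDelay(policy: RetryPolicy, attempt: number): number {
  const baseMs = policy.backoffSeconds * 1000;
  if (policy.backoffType === "fixed") {
    return baseMs;
  }
  // Exponential: base * 2^attempt, capped at 5 minutes.
  return Math.min(baseMs * 2 ** attempt, 300_000);
}
```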
### Delivery Logging
```typescript
interface WebhookDeliveryLog {
id: UUID;
subscriptionId: UUID;
deliveryId: string;
// Request
url: string;
method: string;
headers: Record<string, string>;
body: string;
// Response
statusCode?: number;
responseBody?: string;
responseTime: number;
// Result
success: boolean;
attempt: number;
error?: string;
// Timing
createdAt: DateTime;
}
```
## Webhook API
### Register Subscription
```http
POST /api/v1/webhook-subscriptions
Content-Type: application/json
{
"name": "Deployment Notifications",
"url": "https://api.example.com/webhooks/stella",
"method": "POST",
"authType": "signature",
"signatureSecret": "my-secret-key",
"events": [
"deployment.started",
"deployment.completed",
"deployment.failed"
],
"filters": [
{
"field": "data.environment.name",
"operator": "equals",
"value": "production"
}
],
"retryPolicy": {
"maxRetries": 3,
"backoffType": "exponential",
"backoffSeconds": 10
},
"timeout": 30000
}
```
### Test Subscription
```http
POST /api/v1/webhook-subscriptions/{id}/test
Content-Type: application/json
{
"event": "deployment.completed"
}
```
Response:
```json
{
"success": true,
"data": {
"deliveryId": "d1234567-...",
"statusCode": 200,
"responseTime": 245,
"response": "OK"
}
}
```
### List Deliveries
```http
GET /api/v1/webhook-subscriptions/{id}/deliveries?page=1&pageSize=20
```
## Event Payloads
### deployment.completed
```json
{
"id": "delivery-uuid",
"timestamp": "2026-01-09T10:30:00Z",
"event": "deployment.completed",
"tenantId": "tenant-uuid",
"data": {
"deploymentJob": {
"id": "job-uuid",
"status": "completed"
},
"release": {
"id": "release-uuid",
"name": "myapp-v1.2.0",
"components": [
{
"name": "api",
"digest": "sha256:abc123..."
}
]
},
"environment": {
"id": "env-uuid",
"name": "production"
},
"promotion": {
"id": "promo-uuid",
"requestedBy": "user@example.com"
},
"targets": [
{
"id": "target-uuid",
"name": "prod-host-1",
"status": "succeeded"
}
],
"timing": {
"startedAt": "2026-01-09T10:25:00Z",
"completedAt": "2026-01-09T10:30:00Z",
"durationSeconds": 300
}
}
}
```
### promotion.requested
```json
{
"id": "delivery-uuid",
"timestamp": "2026-01-09T10:00:00Z",
"event": "promotion.requested",
"tenantId": "tenant-uuid",
"data": {
"promotion": {
"id": "promo-uuid",
"status": "pending_approval"
},
"release": {
"id": "release-uuid",
"name": "myapp-v1.2.0"
},
"sourceEnvironment": {
"id": "staging-uuid",
"name": "staging"
},
"targetEnvironment": {
"id": "prod-uuid",
"name": "production"
},
"requestedBy": {
"id": "user-uuid",
"email": "user@example.com",
"name": "John Doe"
},
"approvalRequired": {
"count": 2,
"currentApprovals": 0
}
}
}
```
## Security Considerations
### Signature Verification
Receivers should verify webhook signatures:
```python
import hmac
import hashlib
def verify_signature(payload: bytes, signature: str, secret: str) -> bool:
expected = hmac.new(
secret.encode(),
payload,
hashlib.sha256
).hexdigest()
return hmac.compare_digest(signature, expected)
# In webhook handler
@app.route("/webhooks/stella", methods=["POST"])
def handle_webhook():
signature = request.headers.get("X-Stella-Signature")
if not verify_signature(request.data, signature, WEBHOOK_SECRET):
return "Invalid signature", 401
payload = request.json
# Process event...
```
### IP Allowlisting
Configure firewall rules so that webhooks are accepted only from Stella IP ranges:
- Document IP ranges in deployment configuration
- Use VPN or private networking where possible
### Replay Protection
Check delivery timestamps to prevent replay attacks:
```python
from datetime import datetime, timedelta
MAX_TIMESTAMP_AGE = timedelta(minutes=5)
def check_timestamp(timestamp_str: str) -> bool:
timestamp = datetime.fromisoformat(timestamp_str.replace("Z", "+00:00"))
now = datetime.now(timestamp.tzinfo)
return abs(now - timestamp) < MAX_TIMESTAMP_AGE
```
## References
- [Integrations Overview](overview.md)
- [Connectors](connectors.md)
- [CI/CD Integration](ci-cd.md)

# AGENTS: Deployment Agents
**Purpose**: Lightweight deployment agents for target execution.
## Agent Types
| Agent Type | Transport | Target Types |
|------------|-----------|--------------|
| `agent-docker` | gRPC | Docker hosts |
| `agent-compose` | gRPC | Docker Compose hosts |
| `agent-ssh` | SSH | Linux remote hosts |
| `agent-winrm` | WinRM | Windows remote hosts |
| `agent-ecs` | AWS API | AWS ECS services |
| `agent-nomad` | Nomad API | HashiCorp Nomad jobs |
## Modules
### Module: `agent-core`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Shared agent runtime; task execution framework |
| **Protocol** | gRPC for communication with Stella Core |
| **Security** | mTLS authentication; short-lived JWT for tasks |
**Agent Lifecycle**:
1. Agent starts with registration token
2. Agent registers with capabilities and labels
3. Agent sends heartbeats (default: 30s interval)
4. Agent receives tasks from Stella Core
5. Agent reports task completion/failure
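The heartbeat step above implies two small contracts: the payload an agent sends, and the staleness rule Core applies when heartbeats stop. A sketch follows; the field names and the three-missed-intervals offline rule are assumptions, not part of the spec:

```typescript
// Sketch of the heartbeat contract implied by the agent lifecycle.
// Field names and the 3-missed-intervals rule are assumptions.
interface Heartbeat {
  agentId: string;
  capabilities: string[];
  labels: Record<string, string>;
  sentAt: string; // ISO-8601
}

function buildHeartbeat(
  agentId: string,
  capabilities: string[],
  labels: Record<string, string>,
  now: Date = new Date()
): Heartbeat {
  return { agentId, capabilities, labels, sentAt: now.toISOString() };
}

// Core side: an agent that misses several consecutive heartbeats
// (default 30s interval) is treated as offline.
function isAgentOffline(
  lastHeartbeatMs: number,
  nowMs: number,
  intervalMs = 30_000,
  missedIntervals = 3
): boolean {
  return nowMs - lastHeartbeatMs > intervalMs * missedIntervals;
}
```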
**Agent Task Protocol**:
```typescript
// Task assignment (Core → Agent)
interface AgentTask {
id: UUID;
type: TaskType;
targetId: UUID;
payload: TaskPayload;
credentials: EncryptedCredentials;
timeout: number;
priority: TaskPriority;
idempotencyKey: string;
assignedAt: DateTime;
expiresAt: DateTime;
}
type TaskType =
| "deploy"
| "rollback"
| "health-check"
| "inspect"
| "execute-command"
| "upload-files"
| "write-sticker"
| "read-sticker";
interface DeployTaskPayload {
image: string;
digest: string;
config: DeployConfig;
artifacts: ArtifactReference[];
previousDigest?: string;
hooks: {
preDeploy?: HookConfig;
postDeploy?: HookConfig;
};
}
// Task result (Agent → Core)
interface TaskResult {
taskId: UUID;
success: boolean;
startedAt: DateTime;
completedAt: DateTime;
// Success details
outputs?: Record<string, any>;
artifacts?: ArtifactReference[];
// Failure details
error?: string;
errorType?: string;
retriable?: boolean;
// Logs
logs: string;
// Metrics
metrics: {
pullDurationMs?: number;
deployDurationMs?: number;
healthCheckDurationMs?: number;
};
}
```
---
### Module: `agent-docker`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Docker container deployment |
| **Dependencies** | Docker Engine API |
| **Capabilities** | `docker.deploy`, `docker.rollback`, `docker.inspect` |
**Docker Agent Implementation**:
```typescript
class DockerAgent implements TargetExecutor {
private docker: Docker;
async deploy(task: DeployTaskPayload): Promise<DeployResult> {
const { image, digest, config, previousDigest } = task;
const containerName = config.containerName;
// 1. Pull image and verify digest
this.log(`Pulling image ${image}@${digest}`);
await this.docker.pull(image, { digest });
const pulledDigest = await this.getImageDigest(image);
if (pulledDigest !== digest) {
throw new DigestMismatchError(
`Expected digest ${digest}, got ${pulledDigest}. Possible tampering detected.`
);
}
// 2. Run pre-deploy hook
if (task.hooks?.preDeploy) {
await this.runHook(task.hooks.preDeploy, "pre-deploy");
}
// 3. Stop and rename existing container
const existingContainer = await this.findContainer(containerName);
if (existingContainer) {
this.log(`Stopping existing container ${containerName}`);
await existingContainer.stop({ t: 10 });
await existingContainer.rename(`${containerName}-previous-${Date.now()}`);
}
// 4. Create new container
this.log(`Creating container ${containerName} from ${image}@${digest}`);
const container = await this.docker.createContainer({
name: containerName,
Image: `${image}@${digest}`, // Always use digest, not tag
Env: this.buildEnvVars(config.environment),
HostConfig: {
PortBindings: this.buildPortBindings(config.ports),
Binds: this.buildBindMounts(config.volumes),
RestartPolicy: { Name: config.restartPolicy || "unless-stopped" },
Memory: config.memoryLimit,
CpuQuota: config.cpuLimit,
},
Labels: {
"stella.release.id": config.releaseId,
"stella.release.name": config.releaseName,
"stella.digest": digest,
"stella.deployed.at": new Date().toISOString(),
},
});
// 5. Start container
this.log(`Starting container ${containerName}`);
await container.start();
// 6. Wait for container to be healthy
if (config.healthCheck) {
this.log(`Waiting for container health check`);
const healthy = await this.waitForHealthy(container, config.healthCheck.timeout);
if (!healthy) {
await this.rollbackContainer(containerName, existingContainer);
throw new HealthCheckFailedError(`Container ${containerName} failed health check`);
}
}
// 7. Run post-deploy hook
if (task.hooks?.postDeploy) {
await this.runHook(task.hooks.postDeploy, "post-deploy");
}
// 8. Cleanup previous container
if (existingContainer && config.cleanupPrevious !== false) {
this.log(`Removing previous container`);
await existingContainer.remove({ force: true });
}
return {
success: true,
containerId: container.id,
previousDigest: previousDigest,
};
}
async rollback(task: RollbackTaskPayload): Promise<DeployResult> {
const { containerName, targetDigest } = task;
if (targetDigest) {
// Deploy specific digest
return this.deploy({ ...task, digest: targetDigest });
}
// Find and restore previous container
const previousContainer = await this.findContainer(`${containerName}-previous-*`);
if (!previousContainer) {
throw new RollbackError(`No previous container found for ${containerName}`);
}
const currentContainer = await this.findContainer(containerName);
if (currentContainer) {
await currentContainer.stop({ t: 10 });
await currentContainer.rename(`${containerName}-failed-${Date.now()}`);
}
await previousContainer.rename(containerName);
await previousContainer.start();
return { success: true, containerId: previousContainer.id };
}
async writeSticker(sticker: VersionSticker): Promise<void> {
const stickerPath = this.config.stickerPath || "/var/stella/version.json";
const stickerContent = JSON.stringify(sticker, null, 2);
if (this.config.stickerLocation === "volume") {
await this.docker.run("alpine", [
"sh", "-c",
`echo '${stickerContent}' > ${stickerPath}`
], {
HostConfig: { Binds: [`${this.config.stickerVolume}:/var/stella`] }
});
} else {
fs.writeFileSync(stickerPath, stickerContent);
}
}
}
```
---
### Module: `agent-compose`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Docker Compose stack deployment |
| **Dependencies** | Docker Compose CLI |
| **Capabilities** | `compose.deploy`, `compose.rollback`, `compose.inspect` |
**Compose Agent Implementation**:
```typescript
class ComposeAgent implements TargetExecutor {
async deploy(task: DeployTaskPayload): Promise<DeployResult> {
const { artifacts, config } = task;
const deployDir = config.deploymentDirectory;
// 1. Write compose lock file
const composeLock = artifacts.find(a => a.type === "compose_lock");
const composeContent = await this.fetchArtifact(composeLock);
const composePath = path.join(deployDir, "compose.stella.lock.yml");
await fs.writeFile(composePath, composeContent);
// 2. Run pre-deploy hook
if (task.hooks?.preDeploy) {
await this.runHook(task.hooks.preDeploy, deployDir);
}
// 3. Pull images
this.log("Pulling images...");
await this.runCompose(deployDir, ["pull"]);
// 4. Verify digests
await this.verifyDigests(composePath, config.expectedDigests);
// 5. Deploy
this.log("Deploying services...");
await this.runCompose(deployDir, ["up", "-d", "--remove-orphans", "--force-recreate"]);
// 6. Wait for services to be healthy
if (config.healthCheck) {
const healthy = await this.waitForServicesHealthy(deployDir, config.healthCheck.timeout);
if (!healthy) {
await this.rollbackToBackup(deployDir);
throw new HealthCheckFailedError("Services failed health check");
}
}
// 7. Run post-deploy hook
if (task.hooks?.postDeploy) {
await this.runHook(task.hooks.postDeploy, deployDir);
}
// 8. Write version sticker
await this.writeSticker(config.sticker, deployDir);
return { success: true };
}
private async verifyDigests(
composePath: string,
expectedDigests: Record<string, string>
): Promise<void> {
const composeContent = yaml.parse(await fs.readFile(composePath, "utf-8"));
for (const [service, expectedDigest] of Object.entries(expectedDigests)) {
const serviceConfig = composeContent.services[service];
if (!serviceConfig) {
throw new Error(`Service ${service} not found in compose file`);
}
const image = serviceConfig.image;
if (!image.includes("@sha256:")) {
throw new Error(`Service ${service} image not pinned to digest: ${image}`);
}
const actualDigest = image.split("@")[1];
if (actualDigest !== expectedDigest) {
throw new DigestMismatchError(
`Service ${service}: expected ${expectedDigest}, got ${actualDigest}`
);
}
}
}
}
```
---
### Module: `agent-ssh`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | SSH remote execution (agentless) |
| **Dependencies** | SSH client library |
| **Capabilities** | `ssh.deploy`, `ssh.execute`, `ssh.upload` |
**SSH Remote Executor**:
```typescript
class SSHRemoteExecutor implements TargetExecutor {
async connect(config: SSHConnectionConfig): Promise<void> {
const privateKey = await this.secrets.getSecret(config.privateKeyRef);
this.ssh = new SSHClient();
await this.ssh.connect({
host: config.host,
port: config.port || 22,
username: config.username,
privateKey: privateKey.value,
readyTimeout: config.connectionTimeout || 30000,
});
}
async deploy(task: DeployTaskPayload): Promise<DeployResult> {
const { artifacts, config } = task;
const deployDir = config.deploymentDirectory;
try {
// 1. Ensure deployment directory exists
await this.exec(`mkdir -p ${deployDir}`);
await this.exec(`mkdir -p ${deployDir}/.stella-backup`);
// 2. Backup current deployment
await this.exec(`cp -r ${deployDir}/* ${deployDir}/.stella-backup/ 2>/dev/null || true`);
// 3. Upload artifacts
for (const artifact of artifacts) {
const content = await this.fetchArtifact(artifact);
const remotePath = path.join(deployDir, artifact.name);
await this.uploadFile(content, remotePath);
}
// 4. Run pre-deploy hook
if (task.hooks?.preDeploy) {
await this.runRemoteHook(task.hooks.preDeploy, deployDir);
}
// 5. Execute deployment script
const deployScript = artifacts.find(a => a.type === "deploy_script");
if (deployScript) {
const scriptPath = path.join(deployDir, deployScript.name);
await this.exec(`chmod +x ${scriptPath}`);
const result = await this.exec(scriptPath, { cwd: deployDir, timeout: config.deploymentTimeout });
if (result.exitCode !== 0) {
throw new DeploymentError(`Deploy script failed: ${result.stderr}`);
}
}
// 6. Run post-deploy hook
if (task.hooks?.postDeploy) {
await this.runRemoteHook(task.hooks.postDeploy, deployDir);
}
// 7. Health check
if (config.healthCheck) {
const healthy = await this.runHealthCheck(config.healthCheck);
if (!healthy) {
await this.rollback(task);
throw new HealthCheckFailedError("Health check failed");
}
}
// 8. Write version sticker
await this.writeSticker(config.sticker, deployDir);
// 9. Cleanup backup
await this.exec(`rm -rf ${deployDir}/.stella-backup`);
return { success: true };
} finally {
this.ssh.end();
}
}
}
```
---
### Module: `agent-winrm`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | WinRM remote execution (agentless) |
| **Dependencies** | WinRM client library |
| **Capabilities** | `winrm.deploy`, `winrm.execute`, `winrm.upload` |
| **Authentication** | NTLM, Kerberos, Basic |
---
### Module: `agent-ecs`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | AWS ECS service deployment |
| **Dependencies** | AWS SDK |
| **Capabilities** | `ecs.deploy`, `ecs.rollback`, `ecs.inspect` |
---
### Module: `agent-nomad`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | HashiCorp Nomad job deployment |
| **Dependencies** | Nomad API client |
| **Capabilities** | `nomad.deploy`, `nomad.rollback`, `nomad.inspect` |
---
## Agent Security Model
### Registration Flow
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ AGENT REGISTRATION FLOW │
│ │
│ 1. Admin generates registration token (one-time use) │
│ POST /api/v1/admin/agent-tokens │
│ → { token: "reg_xxx", expiresAt: "..." } │
│ │
│ 2. Agent starts with registration token │
│ ./stella-agent --register --token=reg_xxx │
│ │
│ 3. Agent requests mTLS certificate │
│ POST /api/v1/agents/register │
│ Headers: X-Registration-Token: reg_xxx │
│ Body: { name, version, capabilities, csr } │
│ → { agentId, certificate, caCertificate } │
│ │
│ 4. Agent establishes mTLS connection │
│ Uses issued certificate for all subsequent requests │
│ │
│ 5. Agent requests short-lived JWT for task execution │
│ POST /api/v1/agents/token (over mTLS) │
│ → { token, expiresIn: 3600 } // 1 hour │
│ │
│ 6. Agent refreshes token before expiration │
│ Token refresh only over mTLS connection │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
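Step 3 of the flow can be sketched as a request builder. The endpoint path, header, and body fields come from the diagram above; the helper name and the request-descriptor shape are illustrative assumptions, and transport (fetch, an HTTP client, etc.) is left to the caller:

```typescript
interface RegistrationRequest {
  name: string;
  version: string;
  capabilities: string[];
  csr: string; // PEM-encoded certificate signing request
}

// Shape the HTTP call for POST /api/v1/agents/register.
function buildRegisterCall(baseUrl: string, token: string, req: RegistrationRequest) {
  return {
    method: "POST" as const,
    url: `${baseUrl}/api/v1/agents/register`,
    headers: {
      "Content-Type": "application/json",
      "X-Registration-Token": token, // one-time token; rejected after first use
    },
    body: JSON.stringify(req),
  };
}
```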
### Communication Security
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ AGENT COMMUNICATION SECURITY │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ AGENT │ │ STELLA CORE │ │
│ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │
│ │ mTLS (mutual TLS) │ │
│ │ - Agent cert signed by Stella CA │ │
│ │ - Server cert verified by Agent │ │
│ │ - TLS 1.3 only │ │
│ │ - Perfect forward secrecy │ │
│ │◄───────────────────────────────────────►│ │
│ │ │ │
│ │ Encrypted payload │ │
│ │ - Task payloads encrypted with │ │
│ │ agent-specific key │ │
│ │ - Logs encrypted in transit │ │
│ │◄───────────────────────────────────────►│ │
│ │ │ │
│ │ Heartbeat + capability refresh │ │
│ │ - Every 30 seconds │ │
│ │ - Signed with agent key │ │
│ │─────────────────────────────────────────►│ │
│ │ │ │
│ │ Task assignment │ │
│ │ - Contains short-lived credentials │ │
│ │ - Scoped to specific target │ │
│ │ - Expires after task timeout │ │
│ │◄─────────────────────────────────────────│ │
│ │ │ │
└─────────────────────────────────────────────────────────────────────────────┘
```
---
## Database Schema
```sql
-- Agents
CREATE TABLE release.agents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
version VARCHAR(50) NOT NULL,
capabilities JSONB NOT NULL DEFAULT '[]',
labels JSONB NOT NULL DEFAULT '{}',
status VARCHAR(50) NOT NULL DEFAULT 'offline' CHECK (status IN (
'online', 'offline', 'degraded'
)),
last_heartbeat TIMESTAMPTZ,
resource_usage JSONB,
certificate_fingerprint VARCHAR(64),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (tenant_id, name)
);
CREATE INDEX idx_agents_tenant ON release.agents(tenant_id);
CREATE INDEX idx_agents_status ON release.agents(status);
CREATE INDEX idx_agents_capabilities ON release.agents USING GIN (capabilities);
```
---
## API Endpoints
```yaml
# Agent Registration
POST /api/v1/agents/register
Headers: X-Registration-Token: {token}
Body: { name, version, capabilities, csr }
Response: { agentId, certificate, caCertificate }
# Agent Management
GET /api/v1/agents
Query: ?status={online|offline|degraded}&capability={type}
Response: Agent[]
GET /api/v1/agents/{id}
Response: Agent
PUT /api/v1/agents/{id}
Body: { labels?, capabilities? }
Response: Agent
DELETE /api/v1/agents/{id}
Response: { deleted: true }
# Agent Communication
POST /api/v1/agents/{id}/heartbeat
Body: { status, resourceUsage, capabilities }
Response: { tasks: AgentTask[] }
POST /api/v1/agents/{id}/tasks/{taskId}/complete
Body: { success, result, logs }
Response: { acknowledged: true }
# WebSocket for real-time task stream
WS /api/v1/agents/{id}/task-stream
Messages:
- { type: "task_assigned", task: AgentTask }
- { type: "task_cancelled", taskId }
```
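The two task-stream message types lend themselves to a discriminated union. The reducer below is a client-side sketch; the `active` task set is an assumed agent-local structure, not part of the API:

```typescript
type StreamMessage =
  | { type: "task_assigned"; task: { id: string } }
  | { type: "task_cancelled"; taskId: string };

// Pure reducer over the set of task IDs the agent currently considers active.
function reduceTaskStream(active: Set<string>, msg: StreamMessage): Set<string> {
  const next = new Set(active);
  if (msg.type === "task_assigned") {
    next.add(msg.task.id);
  } else {
    next.delete(msg.taskId);
  }
  return next;
}
```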
---
## References
- [Module Overview](overview.md)
- [Deploy Orchestrator](deploy-orchestrator.md)
- [Agent Security](../security/agent-security.md)
- [API Documentation](../api/agents.md)

---
# DEPLOY: Deployment Execution
**Purpose**: Orchestrate deployment jobs, execute on targets, manage rollbacks, and generate artifacts.
## Modules
### Module: `deploy-orchestrator`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Deployment job coordination; strategy execution |
| **Dependencies** | `target-executor`, `artifact-generator`, `agent-manager` |
| **Data Entities** | `DeploymentJob`, `DeploymentTask` |
| **Events Produced** | `deployment.started`, `deployment.task_started`, `deployment.task_completed`, `deployment.completed`, `deployment.failed` |
**Deployment Job Entity**:
```typescript
interface DeploymentJob {
id: UUID;
tenantId: UUID;
promotionId: UUID;
releaseId: UUID;
environmentId: UUID;
status: DeploymentStatus;
strategy: DeploymentStrategy;
  startedAt: DateTime | null;   // null until the job starts
  completedAt: DateTime | null; // null until the job finishes
artifacts: GeneratedArtifact[];
rollbackOf: UUID | null; // If this is a rollback job
tasks: DeploymentTask[];
}
type DeploymentStatus =
| "pending" // Waiting to start
| "running" // Deployment in progress
| "succeeded" // All tasks succeeded
| "failed" // One or more tasks failed
| "cancelled" // User cancelled
| "rolling_back" // Rollback in progress
| "rolled_back"; // Rollback complete
interface DeploymentTask {
id: UUID;
jobId: UUID;
targetId: UUID;
digest: string;
status: TaskStatus;
agentId: UUID | null;
  startedAt: DateTime | null;
  completedAt: DateTime | null;
exitCode: number | null;
logs: string;
previousDigest: string | null;
stickerWritten: boolean;
}
type TaskStatus =
| "pending"
| "running"
| "succeeded"
| "failed"
| "cancelled"
| "skipped";
```
---
### Module: `target-executor`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Target-specific deployment logic |
| **Dependencies** | `agent-manager`, `connector-runtime` |
| **Protocol** | gRPC for agents, SSH/WinRM for agentless |
**Executor Types**:
| Type | Transport | Use Case |
|------|-----------|----------|
| `agent-docker` | gRPC | Docker hosts with agent |
| `agent-compose` | gRPC | Compose hosts with agent |
| `ssh-remote` | SSH | Agentless Linux hosts |
| `winrm-remote` | WinRM | Agentless Windows hosts |
| `ecs-api` | AWS API | AWS ECS services |
| `nomad-api` | Nomad API | HashiCorp Nomad jobs |
---
### Module: `runner-executor`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Script/hook execution in sandbox |
| **Dependencies** | `plugin-sandbox` |
| **Supported Scripts** | C# (.csx), Bash, PowerShell |
**Hook Types**:
- `pre-deploy`: Run before deployment starts
- `post-deploy`: Run after deployment succeeds
- `on-failure`: Run when deployment fails
- `on-rollback`: Run during rollback
---
### Module: `artifact-generator`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Generate immutable deployment artifacts |
| **Dependencies** | `release-manager`, `environment-manager` |
| **Data Entities** | `GeneratedArtifact`, `ComposeLock`, `VersionSticker` |
**Generated Artifacts**:
| Artifact Type | Description |
|---------------|-------------|
| `compose_lock` | `compose.stella.lock.yml` - Pinned digests |
| `script` | Compiled deployment script |
| `sticker` | `stella.version.json` - Version marker |
| `evidence` | Decision and execution evidence |
| `config` | Environment-specific config files |
**Compose Lock File Generation**:
```typescript
class ComposeLockGenerator {
async generate(
release: Release,
environment: Environment,
targets: Target[]
): Promise<GeneratedArtifact> {
const services: Record<string, any> = {};
for (const component of release.components) {
services[component.componentName] = {
// CRITICAL: Always use digest, never tag
image: `${component.imageRepository}@${component.digest}`,
// Environment variables
environment: this.mergeEnvironment(
environment.config.variables,
this.buildStellaEnv(release, environment)
),
// Labels for Stella tracking
labels: {
"stella.release.id": release.id,
"stella.release.name": release.name,
"stella.component.name": component.componentName,
"stella.component.digest": component.digest,
"stella.environment": environment.name,
"stella.deployed.at": new Date().toISOString(),
},
};
}
const composeLock = {
version: "3.8",
services,
"x-stella": {
release_id: release.id,
release_name: release.name,
environment: environment.name,
generated_at: new Date().toISOString(),
inputs_hash: this.computeInputsHash(release, environment),
components: release.components.map(c => ({
name: c.componentName,
digest: c.digest,
semver: c.semver,
})),
},
};
const content = yaml.stringify(composeLock);
const hash = crypto.createHash("sha256").update(content).digest("hex");
return {
type: "compose_lock",
name: "compose.stella.lock.yml",
content: Buffer.from(content),
contentHash: `sha256:${hash}`,
};
}
}
```
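The `computeInputsHash` helper referenced above is not defined in this spec. One plausible sketch hashes a canonicalized view of the pinned digests and environment name, sorted so that component ordering cannot change the result and identical inputs always reproduce the same `inputs_hash`:

```typescript
import * as crypto from "crypto";

// Hypothetical implementation: sort components by name, then SHA-256 the
// canonical JSON of the digest set plus environment name.
function computeInputsHash(
  components: Array<{ name: string; digest: string }>,
  environmentName: string
): string {
  const canonical = JSON.stringify({
    components: [...components].sort((a, b) => a.name.localeCompare(b.name)),
    environment: environmentName,
  });
  return "sha256:" + crypto.createHash("sha256").update(canonical).digest("hex");
}
```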
**Version Sticker Generation**:
```typescript
interface VersionSticker {
stella_version: "1.0";
release_id: UUID;
release_name: string;
components: Array<{
name: string;
digest: string;
semver: string;
tag: string;
image_repository: string;
}>;
environment: string;
environment_id: UUID;
deployed_at: string;
deployed_by: UUID;
promotion_id: UUID;
workflow_run_id: UUID;
evidence_packet_id: UUID;
evidence_packet_hash: string;
orchestrator_version: string;
source_ref?: {
commit_sha: string;
branch: string;
repository: string;
};
}
```
---
### Module: `rollback-manager`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Rollback orchestration; previous state recovery |
| **Dependencies** | `deploy-orchestrator`, `target-registry` |
**Rollback Strategies**:
| Strategy | Description |
|----------|-------------|
| `to-previous` | Roll back to last successful deployment |
| `to-release` | Roll back to specific release ID |
| `to-sticker` | Roll back to version in sticker on target |
**Rollback Flow**:
1. Identify rollback target (previous release or specified)
2. Create rollback deployment job
3. Execute deployment with rollback artifacts
4. Update target state and sticker
5. Record rollback evidence
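Step 1 of the flow can be sketched as a pure resolver over the three strategies. The input names (`previousReleaseId`, `stickerReleaseId`) are assumptions about where the manager reads prior state from:

```typescript
type RollbackStrategy = "to-previous" | "to-release" | "to-sticker";

function resolveRollbackRelease(
  strategy: RollbackStrategy,
  targetReleaseId: string | null,   // required for "to-release"
  previousReleaseId: string | null, // last successful deployment, if any
  stickerReleaseId: string | null   // release_id read from stella.version.json
): string {
  switch (strategy) {
    case "to-previous":
      if (!previousReleaseId) throw new Error("No previous successful deployment");
      return previousReleaseId;
    case "to-release":
      if (!targetReleaseId) throw new Error("targetReleaseId is required for to-release");
      return targetReleaseId;
    case "to-sticker":
      if (!stickerReleaseId) throw new Error("No version sticker found on target");
      return stickerReleaseId;
  }
}
```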
---
## Deployment Strategies
### All-at-Once
Deploy to all targets simultaneously.
```typescript
interface AllAtOnceConfig {
parallelism: number; // Max concurrent deployments (0 = unlimited)
continueOnFailure: boolean; // Continue if some targets fail
failureThreshold: number; // Max failures before abort
}
```
### Rolling
Deploy to targets sequentially with health checks.
```typescript
interface RollingConfig {
batchSize: number; // Targets per batch
batchDelay: number; // Seconds between batches
healthCheckBetweenBatches: boolean;
rollbackOnFailure: boolean;
maxUnavailable: number; // Max targets unavailable at once
}
```
### Canary
Deploy to subset, verify, then proceed.
```typescript
interface CanaryConfig {
canaryTargets: number; // Number or percentage for canary
canaryDuration: number; // Seconds to run canary
healthThreshold: number; // Required health percentage
autoPromote: boolean; // Auto-proceed if healthy
requireApproval: boolean; // Require manual approval
}
```
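`canaryTargets` can be a count or a percentage, and the spec leaves the encoding open. The sketch below assumes values below 1 are fractions and values of 1 or more are absolute counts, clamped to the target pool:

```typescript
function selectCanaryTargets<T>(targets: T[], canaryTargets: number): T[] {
  const count =
    canaryTargets < 1
      ? Math.max(1, Math.round(targets.length * canaryTargets)) // fraction, e.g. 0.1
      : Math.min(targets.length, Math.floor(canaryTargets));    // absolute count
  return targets.slice(0, count);
}
```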
### Blue-Green
Deploy to B, switch traffic, retire A.
```typescript
interface BlueGreenConfig {
targetGroupA: UUID; // Current (blue) target group
targetGroupB: UUID; // New (green) target group
trafficShiftType: "instant" | "gradual";
gradualShiftSteps?: number[]; // e.g., [10, 25, 50, 100]
rollbackOnHealthFailure: boolean;
}
```
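A gradual shift walks `gradualShiftSteps` as cumulative percentages routed to the green group, reverting all traffic to blue on a failed health check. The router and health-probe callbacks below are placeholders, not a fixed plugin API:

```typescript
function shiftTrafficGradually(
  steps: number[],                           // e.g. [10, 25, 50, 100]
  setGreenWeight: (percent: number) => void, // router plugin call (placeholder)
  isHealthy: () => boolean                   // health probe (placeholder)
): boolean {
  for (const percent of steps) {
    setGreenWeight(percent);
    if (!isHealthy()) {
      setGreenWeight(0); // rollbackOnHealthFailure: all traffic back to blue
      return false;
    }
  }
  return true; // 100% on green; the blue group can be retired
}
```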
---
## Rolling Deployment Algorithm
```python
class RollingDeploymentExecutor:
def execute(self, job: DeploymentJob, config: RollingConfig) -> DeploymentResult:
targets = self.get_targets(job.environment_id)
batches = self.create_batches(targets, config.batch_size)
deployed_targets = []
failed_targets = []
for batch_index, batch in enumerate(batches):
self.log(f"Starting batch {batch_index + 1} of {len(batches)}")
            # Deploy batch in parallel
            batch_results = self.deploy_batch(job, batch)
            batch_deployed = []
            for target, result in batch_results:
                if result.success:
                    deployed_targets.append(target)
                    batch_deployed.append(target)
                    # Write version sticker
                    self.write_sticker(target, job.release)
                else:
                    failed_targets.append(target)
                    if config.rollback_on_failure:
                        # Roll back all targets deployed so far
                        self.rollback_targets(deployed_targets, job.previous_release)
                        return DeploymentResult(
                            success=False,
                            error=f"Batch {batch_index + 1} failed, rolled back",
                            deployed=deployed_targets,
                            failed=failed_targets,
                            rolled_back=deployed_targets
                        )
            # Health check between batches (only targets that succeeded in this batch)
            if config.health_check_between_batches and batch_index < len(batches) - 1:
                health_result = self.check_batch_health(batch_deployed)
if not health_result.healthy:
if config.rollback_on_failure:
self.rollback_targets(deployed_targets, job.previous_release)
return DeploymentResult(
success=False,
error=f"Health check failed after batch {batch_index + 1}",
deployed=deployed_targets,
failed=failed_targets,
rolled_back=deployed_targets
)
# Delay between batches
if config.batch_delay > 0 and batch_index < len(batches) - 1:
time.sleep(config.batch_delay)
return DeploymentResult(
success=len(failed_targets) == 0,
deployed=deployed_targets,
failed=failed_targets
)
```
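The `create_batches` helper used above is plain chunking; a TypeScript sketch:

```typescript
function createBatches<T>(targets: T[], batchSize: number): T[][] {
  if (batchSize < 1) throw new Error("batchSize must be >= 1");
  const batches: T[][] = [];
  for (let i = 0; i < targets.length; i += batchSize) {
    batches.push(targets.slice(i, i + batchSize));
  }
  return batches;
}
```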
---
## Database Schema
```sql
-- Deployment Jobs
CREATE TABLE release.deployment_jobs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
promotion_id UUID NOT NULL REFERENCES release.promotions(id),
release_id UUID NOT NULL REFERENCES release.releases(id),
environment_id UUID NOT NULL REFERENCES release.environments(id),
status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN (
'pending', 'running', 'succeeded', 'failed', 'cancelled', 'rolling_back', 'rolled_back'
)),
strategy VARCHAR(50) NOT NULL DEFAULT 'all-at-once',
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
artifacts JSONB NOT NULL DEFAULT '[]',
rollback_of UUID REFERENCES release.deployment_jobs(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_deployment_jobs_promotion ON release.deployment_jobs(promotion_id);
CREATE INDEX idx_deployment_jobs_status ON release.deployment_jobs(status);
-- Deployment Tasks
CREATE TABLE release.deployment_tasks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
job_id UUID NOT NULL REFERENCES release.deployment_jobs(id) ON DELETE CASCADE,
target_id UUID NOT NULL REFERENCES release.targets(id),
digest VARCHAR(100) NOT NULL,
status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN (
'pending', 'running', 'succeeded', 'failed', 'cancelled', 'skipped'
)),
agent_id UUID REFERENCES release.agents(id),
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
exit_code INTEGER,
logs TEXT,
previous_digest VARCHAR(100),
sticker_written BOOLEAN NOT NULL DEFAULT FALSE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_deployment_tasks_job ON release.deployment_tasks(job_id);
CREATE INDEX idx_deployment_tasks_target ON release.deployment_tasks(target_id);
CREATE INDEX idx_deployment_tasks_status ON release.deployment_tasks(status);
-- Generated Artifacts
CREATE TABLE release.generated_artifacts (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
deployment_job_id UUID REFERENCES release.deployment_jobs(id) ON DELETE CASCADE,
artifact_type VARCHAR(50) NOT NULL CHECK (artifact_type IN (
'compose_lock', 'script', 'sticker', 'evidence', 'config'
)),
name VARCHAR(255) NOT NULL,
content_hash VARCHAR(100) NOT NULL,
content BYTEA, -- for small artifacts
storage_ref VARCHAR(500), -- for large artifacts (S3, etc.)
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_generated_artifacts_job ON release.generated_artifacts(deployment_job_id);
```
---
## API Endpoints
```yaml
# Deployment Jobs (mostly read-only; created by promotions)
GET /api/v1/deployment-jobs
Query: ?promotionId={uuid}&status={status}&environmentId={uuid}
Response: DeploymentJob[]
GET /api/v1/deployment-jobs/{id}
Response: DeploymentJob (with tasks)
GET /api/v1/deployment-jobs/{id}/tasks
Response: DeploymentTask[]
GET /api/v1/deployment-jobs/{id}/tasks/{taskId}
Response: DeploymentTask (with logs)
GET /api/v1/deployment-jobs/{id}/tasks/{taskId}/logs
Query: ?follow=true
Response: string | SSE stream
GET /api/v1/deployment-jobs/{id}/artifacts
Response: GeneratedArtifact[]
GET /api/v1/deployment-jobs/{id}/artifacts/{artifactId}
Response: binary (download)
# Rollback
POST /api/v1/rollbacks
Body: {
environmentId: UUID,
strategy: "to-previous" | "to-release" | "to-sticker",
targetReleaseId?: UUID # for to-release strategy
}
Response: DeploymentJob (rollback job)
GET /api/v1/rollbacks
Query: ?environmentId={uuid}
Response: DeploymentJob[] (rollback jobs only)
```
---
## References
- [Module Overview](overview.md)
- [Agents Specification](agents.md)
- [Deployment Strategies](../deployment/strategies.md)
- [Artifact Generation](../deployment/artifacts.md)
- [API Documentation](../api/deployments.md)

---
# ENVMGR: Environment & Inventory Manager
**Purpose**: Model environments, targets, agents, and their relationships.
## Modules
### Module: `environment-manager`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Environment CRUD, ordering, configuration, freeze windows |
| **Dependencies** | `authority` |
| **Data Entities** | `Environment`, `EnvironmentConfig`, `FreezeWindow` |
| **Events Produced** | `environment.created`, `environment.updated`, `environment.freeze_started`, `environment.freeze_ended` |
**Key Operations**:
```
CreateEnvironment(name, displayName, orderIndex, config) → Environment
UpdateEnvironment(id, config) → Environment
DeleteEnvironment(id) → void
SetFreezeWindow(environmentId, start, end, reason, exceptions) → FreezeWindow
ClearFreezeWindow(environmentId, windowId) → void
ListEnvironments(tenantId) → Environment[]
GetEnvironmentState(id) → EnvironmentState
```
**Environment Entity**:
```typescript
interface Environment {
id: UUID;
tenantId: UUID;
name: string; // "dev", "stage", "prod"
displayName: string; // "Development"
orderIndex: number; // 0, 1, 2 for promotion order
config: EnvironmentConfig;
freezeWindows: FreezeWindow[];
requiredApprovals: number; // 0 for dev, 1+ for prod
requireSeparationOfDuties: boolean;
autoPromoteFrom: UUID | null; // auto-promote from this env
promotionPolicy: string; // OPA policy name
createdAt: DateTime;
updatedAt: DateTime;
}
interface EnvironmentConfig {
variables: Record<string, string>; // env-specific variables
secrets: SecretReference[]; // vault references
registryOverrides: RegistryOverride[]; // per-env registry
agentLabels: string[]; // required agent labels
deploymentTimeout: number; // seconds
healthCheckConfig: HealthCheckConfig;
}
interface FreezeWindow {
id: UUID;
start: DateTime;
end: DateTime;
reason: string;
createdBy: UUID;
exceptions: UUID[]; // users who can override
}
```
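A freeze-window gate then reduces to a containment check. A minimal sketch, following the `FreezeWindow` fields above (the trimmed-down type is an assumption):

```typescript
interface FreezeWindowLite {
  start: Date;
  end: Date;
  exceptions: string[]; // user IDs allowed to override
}

// A promotion at `now` is frozen if any window contains `now`
// and the requesting user is not in that window's exception list.
function isFrozen(windows: FreezeWindowLite[], now: Date, userId: string): boolean {
  return windows.some(
    w => now >= w.start && now <= w.end && !w.exceptions.includes(userId)
  );
}
```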
---
### Module: `target-registry`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Deployment target inventory; capability tracking |
| **Dependencies** | `environment-manager`, `agent-manager` |
| **Data Entities** | `Target`, `TargetGroup`, `TargetCapability` |
| **Events Produced** | `target.created`, `target.updated`, `target.deleted`, `target.health_changed` |
**Target Types** (plugin-provided):
| Type | Description |
|------|-------------|
| `docker_host` | Single Docker host |
| `compose_host` | Docker Compose host |
| `ssh_remote` | Generic SSH target |
| `winrm_remote` | Windows remote target |
| `ecs_service` | AWS ECS service |
| `nomad_job` | HashiCorp Nomad job |
**Target Entity**:
```typescript
interface Target {
id: UUID;
tenantId: UUID;
environmentId: UUID;
name: string; // "prod-web-01"
targetType: string; // "docker_host"
connection: TargetConnection; // type-specific
capabilities: TargetCapability[];
labels: Record<string, string>; // for grouping
healthStatus: HealthStatus;
lastHealthCheck: DateTime;
deploymentDirectory: string; // where artifacts are placed
currentDigest: string | null; // what's currently deployed
agentId: UUID | null; // assigned agent
}
interface TargetConnection {
// Common fields
host: string;
port: number;
// Type-specific (examples)
// docker_host:
dockerSocket?: string;
tlsCert?: SecretReference;
// ssh_remote:
username?: string;
privateKey?: SecretReference;
// ecs_service:
cluster?: string;
service?: string;
region?: string;
roleArn?: string;
}
interface TargetGroup {
id: UUID;
tenantId: UUID;
environmentId: UUID;
name: string;
labels: Record<string, string>;
createdAt: DateTime;
}
```
---
### Module: `agent-manager`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Agent registration, heartbeat, capability advertisement |
| **Dependencies** | `authority` (for agent tokens) |
| **Data Entities** | `Agent`, `AgentCapability`, `AgentHeartbeat` |
| **Events Produced** | `agent.registered`, `agent.online`, `agent.offline`, `agent.capability_changed` |
**Agent Lifecycle**:
1. Agent starts, requests registration token from Authority
2. Agent registers with capabilities and labels
3. Agent sends heartbeats (default: 30s interval)
4. Agent pulls tasks from task queue
5. Agent reports task completion/failure
**Agent Entity**:
```typescript
interface Agent {
id: UUID;
tenantId: UUID;
name: string;
version: string;
capabilities: AgentCapability[];
labels: Record<string, string>;
status: "online" | "offline" | "degraded";
lastHeartbeat: DateTime;
assignedTargets: UUID[];
resourceUsage: ResourceUsage;
}
interface AgentCapability {
type: string; // "docker", "compose", "ssh", "winrm"
version: string; // capability version
config: object; // capability-specific config
}
interface ResourceUsage {
cpuPercent: number;
memoryPercent: number;
diskPercent: number;
activeTasks: number;
}
```
**Agent Registration Protocol**:
```
1. Admin generates registration token (one-time use)
POST /api/v1/admin/agent-tokens
→ { token: "reg_xxx", expiresAt: "..." }
2. Agent starts with registration token
./stella-agent --register --token=reg_xxx
3. Agent requests mTLS certificate
POST /api/v1/agents/register
Headers: X-Registration-Token: reg_xxx
Body: { name, version, capabilities, csr }
→ { agentId, certificate, caCertificate }
4. Agent establishes mTLS connection
Uses issued certificate for all subsequent requests
5. Agent requests short-lived JWT for task execution
POST /api/v1/agents/token (over mTLS)
→ { token, expiresIn: 3600 } // 1 hour
```
---
### Module: `inventory-sync`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Drift detection; expected vs actual state reconciliation |
| **Dependencies** | `target-registry`, `agent-manager` |
| **Events Produced** | `inventory.drift_detected`, `inventory.reconciled` |
**Drift Detection Process**:
1. Read `stella.version.json` from target deployment directory
2. Compare with expected state in database
3. Flag discrepancies (digest mismatch, missing sticker, unexpected files)
4. Report on dashboard
**Drift Detection Types**:
| Drift Type | Description | Severity |
|------------|-------------|----------|
| `digest_mismatch` | Running digest differs from expected | Critical |
| `missing_sticker` | No version sticker found on target | Warning |
| `stale_sticker` | Sticker timestamp older than last deployment | Warning |
| `orphan_container` | Container not managed by Stella | Info |
| `extra_files` | Unexpected files in deployment directory | Info |
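Steps 2-3 can be sketched as a pure classifier over the sticker contents versus the expected state. Field names mirror `stella.version.json` and the `Target` entity; the summary type is an assumption:

```typescript
interface StickerSummary {
  releaseId: string;
  digests: Record<string, string>; // component name -> digest
  deployedAt: string;              // ISO 8601
}

type DriftType = "digest_mismatch" | "missing_sticker" | "stale_sticker" | null;

function classifyDrift(
  sticker: StickerSummary | null,
  expectedDigests: Record<string, string>,
  lastDeploymentAt: Date
): DriftType {
  if (!sticker) return "missing_sticker";                           // Warning
  for (const [name, digest] of Object.entries(expectedDigests)) {
    if (sticker.digests[name] !== digest) return "digest_mismatch"; // Critical
  }
  if (new Date(sticker.deployedAt) < lastDeploymentAt) return "stale_sticker";
  return null; // target matches expected state
}
```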
---
## Cache Eviction Policies
Environment configurations and target states are cached to improve performance. **All caches MUST have bounded size and TTL-based eviction**:
| Cache Type | Purpose | TTL | Max Size | Eviction Strategy |
|-----------|---------|-----|----------|-------------------|
| **Environment Configs** | Environment configuration data | 30 minutes | 500 entries | Sliding expiration |
| **Target Health** | Target health status | 5 minutes | 2,000 entries | Sliding expiration |
| **Agent Capabilities** | Agent capability advertisement | 10 minutes | 1,000 entries | Sliding expiration |
| **Freeze Windows** | Active freeze window checks | 15 minutes | 100 entries | Absolute expiration |
**Implementation**:
```csharp
public class EnvironmentConfigCache
{
private readonly MemoryCache _cache;
public EnvironmentConfigCache()
{
_cache = new MemoryCache(new MemoryCacheOptions
{
SizeLimit = 500 // Max 500 environment configs
});
}
public void CacheConfig(Guid environmentId, EnvironmentConfig config)
{
_cache.Set(environmentId, config, new MemoryCacheEntryOptions
{
Size = 1,
SlidingExpiration = TimeSpan.FromMinutes(30) // 30-minute TTL
});
}
public EnvironmentConfig? GetCachedConfig(Guid environmentId)
=> _cache.Get<EnvironmentConfig>(environmentId);
public void InvalidateConfig(Guid environmentId)
=> _cache.Remove(environmentId);
}
```
**Cache Invalidation**:
- Environment configs: Invalidate on update
- Target health: Invalidate on health check or deployment
- Agent capabilities: Invalidate on capability change event
- Freeze windows: Invalidate on window creation/deletion
**Reference**: See [Implementation Guide](../implementation-guide.md#caching) for cache implementation patterns.
---
## Database Schema
```sql
-- Environments
CREATE TABLE release.environments (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
name VARCHAR(100) NOT NULL,
display_name VARCHAR(255) NOT NULL,
order_index INTEGER NOT NULL,
config JSONB NOT NULL DEFAULT '{}',
freeze_windows JSONB NOT NULL DEFAULT '[]',
required_approvals INTEGER NOT NULL DEFAULT 0,
require_sod BOOLEAN NOT NULL DEFAULT FALSE,
auto_promote_from UUID REFERENCES release.environments(id),
promotion_policy VARCHAR(255),
deployment_timeout INTEGER NOT NULL DEFAULT 600,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (tenant_id, name)
);
CREATE INDEX idx_environments_tenant ON release.environments(tenant_id);
CREATE INDEX idx_environments_order ON release.environments(tenant_id, order_index);
-- Target Groups
CREATE TABLE release.target_groups (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
environment_id UUID NOT NULL REFERENCES release.environments(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
labels JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (tenant_id, environment_id, name)
);
-- Targets
CREATE TABLE release.targets (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
environment_id UUID NOT NULL REFERENCES release.environments(id) ON DELETE CASCADE,
target_group_id UUID REFERENCES release.target_groups(id),
name VARCHAR(255) NOT NULL,
target_type VARCHAR(100) NOT NULL,
connection JSONB NOT NULL,
capabilities JSONB NOT NULL DEFAULT '[]',
labels JSONB NOT NULL DEFAULT '{}',
deployment_directory VARCHAR(500),
health_status VARCHAR(50) NOT NULL DEFAULT 'unknown',
last_health_check TIMESTAMPTZ,
current_digest VARCHAR(100),
agent_id UUID REFERENCES release.agents(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (tenant_id, environment_id, name)
);
CREATE INDEX idx_targets_tenant_env ON release.targets(tenant_id, environment_id);
CREATE INDEX idx_targets_type ON release.targets(target_type);
CREATE INDEX idx_targets_labels ON release.targets USING GIN (labels);
-- Agents
CREATE TABLE release.agents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
version VARCHAR(50) NOT NULL,
capabilities JSONB NOT NULL DEFAULT '[]',
labels JSONB NOT NULL DEFAULT '{}',
status VARCHAR(50) NOT NULL DEFAULT 'offline',
last_heartbeat TIMESTAMPTZ,
resource_usage JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (tenant_id, name)
);
CREATE INDEX idx_agents_tenant ON release.agents(tenant_id);
CREATE INDEX idx_agents_status ON release.agents(status);
CREATE INDEX idx_agents_capabilities ON release.agents USING GIN (capabilities);
```
---
## API Endpoints
```yaml
# Environments
POST /api/v1/environments
GET /api/v1/environments
GET /api/v1/environments/{id}
PUT /api/v1/environments/{id}
DELETE /api/v1/environments/{id}
# Freeze Windows
POST /api/v1/environments/{envId}/freeze-windows
GET /api/v1/environments/{envId}/freeze-windows
DELETE /api/v1/environments/{envId}/freeze-windows/{windowId}
# Target Groups
POST /api/v1/environments/{envId}/target-groups
GET /api/v1/environments/{envId}/target-groups
GET /api/v1/target-groups/{id}
PUT /api/v1/target-groups/{id}
DELETE /api/v1/target-groups/{id}
# Targets
POST /api/v1/targets
GET /api/v1/targets
GET /api/v1/targets/{id}
PUT /api/v1/targets/{id}
DELETE /api/v1/targets/{id}
POST /api/v1/targets/{id}/health-check
GET /api/v1/targets/{id}/sticker
GET /api/v1/targets/{id}/drift
# Agents
POST /api/v1/agents/register
GET /api/v1/agents
GET /api/v1/agents/{id}
PUT /api/v1/agents/{id}
DELETE /api/v1/agents/{id}
POST /api/v1/agents/{id}/heartbeat
POST /api/v1/agents/{id}/tasks/{taskId}/complete
```
---
## References
- [Module Overview](overview.md)
- [Agent Specification](agents.md)
- [API Documentation](../api/environments.md)
- [Agent Security](../security/agent-security.md)

---
# RELEVI: Release Evidence
**Purpose**: Cryptographically sealed evidence packets for audit-grade release governance.
## Modules
### Module: `evidence-collector`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Evidence aggregation; packet composition |
| **Dependencies** | `promotion-manager`, `deploy-orchestrator`, `decision-engine` |
| **Data Entities** | `EvidencePacket`, `EvidenceContent` |
| **Events Produced** | `evidence.collected`, `evidence.packet_created` |
**Evidence Packet Structure**:
```typescript
interface EvidencePacket {
id: UUID;
tenantId: UUID;
promotionId: UUID;
packetType: EvidencePacketType;
content: EvidenceContent;
contentHash: string; // SHA-256 of content
signature: string; // Cryptographic signature
signerKeyRef: string; // Reference to signing key
createdAt: DateTime;
// Note: No updatedAt - packets are immutable
}
type EvidencePacketType =
| "release_decision" // Promotion decision evidence
| "deployment" // Deployment execution evidence
| "rollback" // Rollback evidence
| "ab_promotion"; // A/B promotion evidence
interface EvidenceContent {
// Metadata
version: "1.0";
generatedAt: DateTime;
generatorVersion: string;
// What
release: {
id: UUID;
name: string;
components: Array<{
name: string;
digest: string;
semver: string;
imageRepository: string;
}>;
sourceRef: SourceReference | null;
};
// Where
environment: {
id: UUID;
name: string;
targets: Array<{
id: UUID;
name: string;
type: string;
}>;
};
// Who
actors: {
requester: {
id: UUID;
name: string;
email: string;
};
approvers: Array<{
id: UUID;
name: string;
action: string;
at: DateTime;
comment: string | null;
}>;
};
// Why
decision: {
result: "allow" | "deny";
gates: Array<{
type: string;
name: string;
status: string;
message: string;
details: Record<string, any>;
}>;
reasons: string[];
};
// How
execution: {
workflowRunId: UUID | null;
deploymentJobId: UUID | null;
artifacts: Array<{
type: string;
name: string;
contentHash: string;
}>;
logs: string | null; // Compressed/truncated
};
// When
timeline: {
requestedAt: DateTime;
decidedAt: DateTime | null;
startedAt: DateTime | null;
completedAt: DateTime | null;
};
// Integrity
inputsHash: string; // Hash of all inputs for replay
previousEvidenceId: UUID | null; // Chain to previous evidence
}
```
---
### Module: `evidence-signer`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Cryptographic signing of evidence packets |
| **Dependencies** | `authority`, `vault` (for key storage) |
| **Algorithms** | RS256, ES256, Ed25519 |
**Signing Process**:
```typescript
class EvidenceSigner {
async sign(content: EvidenceContent): Promise<SignedEvidence> {
// 1. Canonicalize content (RFC 8785)
const canonicalJson = canonicalize(content);
// 2. Compute content hash
const contentHash = crypto
.createHash("sha256")
.update(canonicalJson)
.digest("hex");
// 3. Get signing key from vault
const keyRef = await this.getActiveSigningKey();
const privateKey = await this.vault.getPrivateKey(keyRef);
// 4. Sign the content hash
const signature = await this.signWithKey(contentHash, privateKey);
return {
content,
contentHash: `sha256:${contentHash}`,
signature: base64Encode(signature),
signerKeyRef: keyRef,
algorithm: this.config.signatureAlgorithm,
};
}
async verify(packet: EvidencePacket): Promise<VerificationResult> {
// 1. Canonicalize stored content
const canonicalJson = canonicalize(packet.content);
// 2. Verify content hash
const computedHash = crypto
.createHash("sha256")
.update(canonicalJson)
.digest("hex");
if (`sha256:${computedHash}` !== packet.contentHash) {
return { valid: false, error: "Content hash mismatch" };
}
// 3. Get public key
const publicKey = await this.vault.getPublicKey(packet.signerKeyRef);
// 4. Verify signature
const signatureValid = await this.verifySignature(
computedHash,
base64Decode(packet.signature),
publicKey
);
return {
valid: signatureValid,
signerKeyRef: packet.signerKeyRef,
signedAt: packet.createdAt,
};
}
}
```
---
### Module: `sticker-writer`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Version sticker generation and placement |
| **Dependencies** | `deploy-orchestrator`, `agent-manager` |
| **Data Entities** | `VersionSticker` |
**Version Sticker Schema**:
```typescript
interface VersionSticker {
stella_version: "1.0";
// Release identity
release_id: UUID;
release_name: string;
// Component details
components: Array<{
name: string;
digest: string;
semver: string;
tag: string;
image_repository: string;
}>;
// Deployment context
environment: string;
environment_id: UUID;
deployed_at: string; // ISO 8601
deployed_by: UUID;
// Traceability
promotion_id: UUID;
workflow_run_id: UUID;
// Evidence chain
evidence_packet_id: UUID;
evidence_packet_hash: string;
policy_decision_hash: string;
// Orchestrator info
orchestrator_version: string;
// Source reference
source_ref?: {
commit_sha: string;
branch: string;
repository: string;
};
}
```
**Sticker Placement**:
- Written to `/var/stella/version.json` on each target
- Atomic write (write to temp, rename)
- Read during drift detection
- Verified against expected state
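The write-to-temp-then-rename step above can be sketched as follows. This is an illustrative helper, not part of the agent API; the function name and temp-file naming are assumptions.

```typescript
import { writeFileSync, renameSync, readFileSync, mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical sketch: write a version sticker so that readers never observe
// a partially written file. rename() within the same filesystem is atomic on
// POSIX, so the sticker path always holds either the old or the new content.
export function writeStickerAtomically(stickerPath: string, sticker: object): void {
  const tmpPath = `${stickerPath}.tmp-${process.pid}`;
  writeFileSync(tmpPath, JSON.stringify(sticker, null, 2), "utf8"); // full content to temp file
  renameSync(tmpPath, stickerPath);                                 // atomic swap into place
}
```

On a real target the path would be `/var/stella/version.json`, and a production implementation would also fsync the file and its directory for crash safety.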
---
### Module: `audit-exporter`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Compliance report generation; evidence export |
| **Dependencies** | `evidence-collector` |
| **Export Formats** | JSON, PDF, CSV |
**Audit Report Types**:
| Report Type | Description |
|-------------|-------------|
| `release_audit` | Full audit trail for a release |
| `environment_audit` | All deployments to an environment |
| `compliance_summary` | Summary for compliance review |
| `change_log` | Chronological change log |
**Report Generation**:
```typescript
interface AuditReportRequest {
type: AuditReportType;
scope: {
releaseId?: UUID;
environmentId?: UUID;
from?: DateTime;
to?: DateTime;
};
format: "json" | "pdf" | "csv";
options?: {
includeDecisionDetails: boolean;
includeApproverDetails: boolean;
includeLogs: boolean;
includeArtifacts: boolean;
};
}
interface AuditReport {
id: UUID;
type: AuditReportType;
scope: ReportScope;
generatedAt: DateTime;
generatedBy: UUID;
summary: {
totalPromotions: number;
successfulDeployments: number;
failedDeployments: number;
rollbacks: number;
averageDeploymentTime: number;
};
entries: AuditEntry[];
// For compliance
signatureChain: {
valid: boolean;
verifiedPackets: number;
invalidPackets: number;
};
}
```
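The `signatureChain` summary can be folded from per-packet verification results. A minimal sketch, assuming a boolean `valid` flag per packet; the helper and its names are illustrative, not part of the spec:

```typescript
// Hypothetical helper: reduce per-packet verification results into the
// signatureChain summary carried by an AuditReport.
interface PacketCheck {
  packetId: string;
  valid: boolean;
}

export function summarizeSignatureChain(checks: PacketCheck[]) {
  const verifiedPackets = checks.filter(c => c.valid).length;
  const invalidPackets = checks.length - verifiedPackets;
  // The chain as a whole is valid only when every packet verified.
  return { valid: invalidPackets === 0, verifiedPackets, invalidPackets };
}
```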
---
## Immutability Enforcement
Evidence packets are append-only. This is enforced at multiple levels:
### Database Level
```sql
-- Evidence packets table with no UPDATE/DELETE
CREATE TABLE release.evidence_packets (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
promotion_id UUID NOT NULL REFERENCES release.promotions(id),
packet_type VARCHAR(50) NOT NULL CHECK (packet_type IN (
'release_decision', 'deployment', 'rollback', 'ab_promotion'
)),
content JSONB NOT NULL,
content_hash VARCHAR(100) NOT NULL,
signature TEXT,
signer_key_ref VARCHAR(255),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
-- Note: No updated_at column; immutable by design
);
-- Append-only enforcement via trigger
CREATE OR REPLACE FUNCTION prevent_evidence_modification()
RETURNS TRIGGER AS $$
BEGIN
RAISE EXCEPTION 'Evidence packets are immutable and cannot be modified or deleted';
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER evidence_packets_immutable
BEFORE UPDATE OR DELETE ON release.evidence_packets
FOR EACH ROW EXECUTE FUNCTION prevent_evidence_modification();
-- Revoke UPDATE/DELETE from application role
REVOKE UPDATE, DELETE ON release.evidence_packets FROM app_role;
-- Version stickers table
CREATE TABLE release.version_stickers (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
target_id UUID NOT NULL REFERENCES release.targets(id),
release_id UUID NOT NULL REFERENCES release.releases(id),
promotion_id UUID NOT NULL REFERENCES release.promotions(id),
sticker_content JSONB NOT NULL,
content_hash VARCHAR(100) NOT NULL,
written_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
verified_at TIMESTAMPTZ,
drift_detected BOOLEAN NOT NULL DEFAULT FALSE
);
CREATE INDEX idx_version_stickers_target ON release.version_stickers(target_id);
CREATE INDEX idx_version_stickers_release ON release.version_stickers(release_id);
CREATE INDEX idx_evidence_packets_promotion ON release.evidence_packets(promotion_id);
CREATE INDEX idx_evidence_packets_created ON release.evidence_packets(created_at DESC);
```
### Application Level
```csharp
// Evidence service enforces immutability
public sealed class EvidenceService
{
// Only Create method - no Update or Delete
public async Task<EvidencePacket> CreateAsync(
EvidenceContent content,
CancellationToken ct)
{
// Sign content
var signed = await _signer.SignAsync(content, ct);
// Store (append-only)
var packet = new EvidencePacket
{
Id = Guid.NewGuid(),
TenantId = content.TenantId,
PromotionId = content.PromotionId,
PacketType = content.PacketType,
Content = content,
ContentHash = signed.ContentHash,
Signature = signed.Signature,
SignerKeyRef = signed.SignerKeyRef,
CreatedAt = DateTime.UtcNow,
};
await _repository.InsertAsync(packet, ct);
return packet;
}
// Read methods only
public async Task<EvidencePacket> GetAsync(Guid id, CancellationToken ct);
public async Task<IReadOnlyList<EvidencePacket>> ListAsync(
EvidenceFilter filter, CancellationToken ct);
public async Task<VerificationResult> VerifyAsync(
Guid id, CancellationToken ct);
// No Update or Delete methods exist
}
```
---
## Evidence Chain
Evidence packets form a verifiable chain:
```
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Evidence #1 │ │ Evidence #2 │ │ Evidence #3 │
│ (Dev Deploy) │────►│ (Stage Deploy) │────►│ (Prod Deploy) │
│ │ │ │ │ │
│ prevEvidenceId: │ │ prevEvidenceId: │ │ prevEvidenceId: │
│ null │ │ #1 │ │ #2 │
│ │ │ │ │ │
│ contentHash: │ │ contentHash: │ │ contentHash: │
│ sha256:abc... │ │ sha256:def... │ │ sha256:ghi... │
└─────────────────┘ └─────────────────┘ └─────────────────┘
```
**Chain Verification**:
```typescript
async function verifyEvidenceChain(releaseId: UUID): Promise<ChainVerificationResult> {
const packets = await getPacketsForRelease(releaseId);
const results: PacketVerificationResult[] = [];
let previousHash: string | null = null;
for (const packet of packets) {
// 1. Verify packet signature
const signatureValid = await verifySignature(packet);
// 2. Verify content hash
const contentValid = await verifyContentHash(packet);
// 3. Verify chain link
const chainValid = packet.content.previousEvidenceId === null
? previousHash === null
: await verifyPreviousLink(packet, previousHash);
results.push({
packetId: packet.id,
signatureValid,
contentValid,
chainValid,
valid: signatureValid && contentValid && chainValid,
});
previousHash = packet.contentHash;
}
return {
valid: results.every(r => r.valid),
packets: results,
};
}
```
---
## API Endpoints
```yaml
# Evidence Packets
GET /api/v1/evidence-packets
Query: ?promotionId={uuid}&type={type}&from={date}&to={date}
Response: EvidencePacket[]
GET /api/v1/evidence-packets/{id}
Response: EvidencePacket (full content)
GET /api/v1/evidence-packets/{id}/verify
Response: VerificationResult
GET /api/v1/evidence-packets/{id}/download
Query: ?format={json|pdf}
Response: binary
# Evidence Chain
GET /api/v1/releases/{id}/evidence-chain
Response: EvidenceChain
GET /api/v1/releases/{id}/evidence-chain/verify
Response: ChainVerificationResult
# Audit Reports
POST /api/v1/audit-reports
Body: {
type: "release_audit" | "environment_audit" | "compliance_summary" | "change_log",
scope: { releaseId?, environmentId?, from?, to? },
format: "json" | "pdf" | "csv"
}
Response: { reportId: UUID, status: "generating" }
GET /api/v1/audit-reports/{id}
Response: { status, downloadUrl? }
GET /api/v1/audit-reports/{id}/download
Response: binary
# Version Stickers
GET /api/v1/version-stickers
Query: ?targetId={uuid}&releaseId={uuid}
Response: VersionSticker[]
GET /api/v1/version-stickers/{id}
Response: VersionSticker
```
---
## Deterministic Replay
Evidence packets enable deterministic replay. Given the same inputs and policy version, re-evaluation produces the same decision:
```typescript
async function replayDecision(evidencePacket: EvidencePacket): Promise<ReplayResult> {
const content = evidencePacket.content;
// 1. Verify inputs hash
const currentInputsHash = computeInputsHash(
content.release,
content.environment,
content.decision.gates
);
if (currentInputsHash !== content.inputsHash) {
return { valid: false, error: "Inputs have changed since original decision" };
}
// 2. Re-evaluate decision with same inputs
const replayedDecision = await evaluateDecision(
content.release,
content.environment,
{ asOf: content.timeline.decidedAt } // Use policy version from that time
);
// 3. Compare decisions
const decisionsMatch = replayedDecision.result === content.decision.result;
return {
valid: decisionsMatch,
originalDecision: content.decision.result,
replayedDecision: replayedDecision.result,
differences: decisionsMatch ? [] : computeDifferences(content.decision, replayedDecision),
};
}
```
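The replay code relies on `computeInputsHash`, which the spec does not define. A minimal sketch, assuming RFC 8785-style canonicalization is approximated by key-sorted JSON (a production implementation would use full JCS canonicalization):

```typescript
import { createHash } from "node:crypto";

// Recursively sort object keys so property order cannot change the digest.
function sortKeys(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(sortKeys);
  if (value && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    return Object.fromEntries(
      Object.keys(obj).sort().map(k => [k, sortKeys(obj[k])])
    );
  }
  return value;
}

// Hypothetical sketch of computeInputsHash: digest the canonicalized inputs.
export function computeInputsHash(...inputs: unknown[]): string {
  const canonical = JSON.stringify(inputs.map(sortKeys));
  return "sha256:" + createHash("sha256").update(canonical).digest("hex");
}
```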
---
## References
- [Module Overview](overview.md)
- [Design Principles](../design/principles.md)
- [Security Architecture](../security/overview.md)
- [Evidence Schema](../appendices/evidence-schema.md)

# INTHUB: Integration Hub
**Purpose**: Central management of all external integrations (SCM, CI, registries, vaults, targets).
## Modules
### Module: `integration-manager`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | CRUD for integration instances; plugin type registry |
| **Dependencies** | `plugin-registry`, `authority` (for credentials) |
| **Data Entities** | `Integration`, `IntegrationType`, `IntegrationCredential` |
| **Events Produced** | `integration.created`, `integration.updated`, `integration.deleted`, `integration.health_changed` |
| **Events Consumed** | `plugin.registered`, `plugin.unregistered` |
**Key Operations**:
```
CreateIntegration(type, name, config, credentials) → Integration
UpdateIntegration(id, config, credentials) → Integration
DeleteIntegration(id) → void
TestConnection(id) → ConnectionTestResult
DiscoverResources(id, resourceType) → Resource[]
GetIntegrationHealth(id) → HealthStatus
ListIntegrations(filter) → Integration[]
```
**Integration Entity**:
```typescript
interface Integration {
id: UUID;
tenantId: UUID;
type: string; // "scm.github", "registry.harbor"
name: string; // user-defined name
config: IntegrationConfig; // type-specific config
credentialId: UUID; // reference to vault
healthStatus: HealthStatus;
lastHealthCheck: DateTime;
createdAt: DateTime;
updatedAt: DateTime;
}
interface IntegrationConfig {
endpoint: string;
authMode: "token" | "oauth" | "mtls" | "iam";
timeout: number;
retryPolicy: RetryPolicy;
customHeaders?: Record<string, string>;
// Type-specific fields added by plugin
[key: string]: any;
}
```
---
### Module: `connection-profiles`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Default settings management; "last used" pattern |
| **Dependencies** | `integration-manager` |
| **Data Entities** | `ConnectionProfile`, `ProfileTemplate` |
**Behavior**: When a user adds a new integration instance:
1. Wizard defaults to last used endpoint, auth mode, network settings
2. Secrets are **never** auto-reused (explicit confirmation required)
3. User can save as named profile for reuse
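Steps 1-2 can be sketched as a merge that seeds the wizard from profile defaults while stripping anything secret-shaped, so credentials always require explicit re-entry. The secret-key list and function name are assumptions for illustration:

```typescript
// Keys treated as secrets; profiles never carry these forward automatically.
const SECRET_KEYS = new Set(["token", "password", "clientSecret", "privateKey"]);

// Hypothetical sketch: apply last-used defaults, with user input taking
// precedence, and secrets excluded from the defaults entirely.
export function applyProfileDefaults(
  profileDefaults: Record<string, unknown>,
  userInput: Record<string, unknown>
): Record<string, unknown> {
  const safeDefaults = Object.fromEntries(
    Object.entries(profileDefaults).filter(([k]) => !SECRET_KEYS.has(k))
  );
  return { ...safeDefaults, ...userInput }; // user input wins over defaults
}
```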
**Profile Entity**:
```typescript
interface ConnectionProfile {
id: UUID;
tenantId: UUID;
name: string; // "Production GitHub"
integrationType: string;
defaultConfig: Partial<IntegrationConfig>;
isDefault: boolean;
lastUsedAt: DateTime;
createdBy: UUID;
}
```
---
### Module: `connector-runtime`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Execute plugin connector logic in controlled environment |
| **Dependencies** | `plugin-loader`, `plugin-sandbox` |
| **Protocol** | gRPC (preferred) or HTTP/REST |
**Connector Interface** (implemented by plugins):
```protobuf
service Connector {
// Connection management
rpc TestConnection(TestConnectionRequest) returns (TestConnectionResponse);
rpc GetHealth(HealthRequest) returns (HealthResponse);
// Resource discovery
rpc DiscoverResources(DiscoverRequest) returns (DiscoverResponse);
rpc ListRepositories(ListReposRequest) returns (ListReposResponse);
rpc ListBranches(ListBranchesRequest) returns (ListBranchesResponse);
rpc ListTags(ListTagsRequest) returns (ListTagsResponse);
// Registry operations
rpc ResolveTagToDigest(ResolveRequest) returns (ResolveResponse);
rpc FetchManifest(ManifestRequest) returns (ManifestResponse);
rpc VerifyDigest(VerifyRequest) returns (VerifyResponse);
// Secrets operations
rpc GetSecretsRef(SecretsRequest) returns (SecretsResponse);
rpc FetchSecret(FetchSecretRequest) returns (FetchSecretResponse);
// Workflow step execution
rpc ExecuteStep(StepRequest) returns (stream StepResponse);
rpc CancelStep(CancelRequest) returns (CancelResponse);
}
```
**Request/Response Types**:
```protobuf
message TestConnectionRequest {
string integration_id = 1;
map<string, string> config = 2;
string credential_ref = 3;
}
message TestConnectionResponse {
bool success = 1;
string error_message = 2;
map<string, string> details = 3;
int64 latency_ms = 4;
}
message ResolveRequest {
string integration_id = 1;
string image_ref = 2; // "myapp:v2.3.1"
}
message ResolveResponse {
string digest = 1; // "sha256:abc123..."
string manifest_type = 2;
int64 size_bytes = 3;
google.protobuf.Timestamp pushed_at = 4;
}
```
---
### Module: `doctor-checks`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Integration health diagnostics; troubleshooting |
| **Dependencies** | `integration-manager`, `connector-runtime` |
**Doctor Check Types**:
| Check | Purpose | Pass Criteria |
|-------|---------|---------------|
| **Connectivity** | Can reach endpoint | TCP connect succeeds |
| **TLS** | Certificate valid | Chain validates, not expired |
| **Authentication** | Credentials valid | Auth request succeeds |
| **Authorization** | Permissions sufficient | Required scopes present |
| **Version** | API version supported | Version in supported range |
| **Rate Limit** | Quota available | >10% remaining |
| **Latency** | Response time acceptable | <5s p99 |
**Doctor Check Output**:
```typescript
interface DoctorCheckResult {
checkType: string;
status: "pass" | "warn" | "fail";
message: string;
details: Record<string, any>;
suggestions: string[];
runAt: DateTime;
durationMs: number;
}
interface DoctorReport {
integrationId: UUID;
overallStatus: "healthy" | "degraded" | "unhealthy";
checks: DoctorCheckResult[];
generatedAt: DateTime;
}
```
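The mapping from individual check results to `overallStatus` is not spelled out above; a plausible sketch, assuming any failure makes the integration unhealthy and any warning degrades it:

```typescript
type CheckStatus = "pass" | "warn" | "fail";

// Hypothetical aggregation rule (assumed, not specified): fail → unhealthy,
// warn → degraded, otherwise healthy.
export function overallStatus(
  checks: { status: CheckStatus }[]
): "healthy" | "degraded" | "unhealthy" {
  if (checks.some(c => c.status === "fail")) return "unhealthy";
  if (checks.some(c => c.status === "warn")) return "degraded";
  return "healthy";
}
```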
---
## Cache Eviction Policies
Integration health status and connector results are cached to reduce load on external systems. **All caches MUST have bounded size and TTL-based eviction**:
| Cache Type | Purpose | TTL | Max Size | Eviction Strategy |
|-----------|---------|-----|----------|-------------------|
| **Health Checks** | Integration health status | 5 minutes | 1,000 entries | Sliding expiration |
| **Connection Tests** | Test connection results | 2 minutes | 500 entries | Sliding expiration |
| **Resource Discovery** | Discovered resources (repos, tags) | 10 minutes | 5,000 entries | Sliding expiration |
| **Tag Resolution** | Tag digest mappings | 1 hour | 10,000 entries | Absolute expiration |
**Implementation**:
```csharp
public class IntegrationHealthCache
{
private readonly MemoryCache _cache;
public IntegrationHealthCache()
{
_cache = new MemoryCache(new MemoryCacheOptions
{
SizeLimit = 1_000 // Max 1,000 integration health entries
});
}
public void CacheHealthStatus(Guid integrationId, HealthStatus status)
{
_cache.Set(integrationId, status, new MemoryCacheEntryOptions
{
Size = 1,
SlidingExpiration = TimeSpan.FromMinutes(5) // 5-minute TTL
});
}
public HealthStatus? GetCachedHealthStatus(Guid integrationId)
=> _cache.TryGetValue(integrationId, out HealthStatus status) ? status : null;
}
```
**Reference**: See [Implementation Guide](../implementation-guide.md#caching) for cache implementation patterns.
---
## Integration Types
The following integration types are supported (via plugins):
### SCM Integrations
| Type | Plugin | Capabilities |
|------|--------|--------------|
| `scm.github` | Built-in | repos, branches, commits, webhooks, status |
| `scm.gitlab` | Built-in | repos, branches, commits, webhooks, pipelines |
| `scm.bitbucket` | Plugin | repos, branches, commits, webhooks |
| `scm.azure_repos` | Plugin | repos, branches, commits, pipelines |
### Registry Integrations
| Type | Plugin | Capabilities |
|------|--------|--------------|
| `registry.harbor` | Built-in | repos, tags, digests, scanning status |
| `registry.ecr` | Plugin | repos, tags, digests, IAM auth |
| `registry.gcr` | Plugin | repos, tags, digests |
| `registry.dockerhub` | Plugin | repos, tags, digests |
| `registry.ghcr` | Plugin | repos, tags, digests |
| `registry.acr` | Plugin | repos, tags, digests |
### Vault Integrations
| Type | Plugin | Capabilities |
|------|--------|--------------|
| `vault.hashicorp` | Built-in | KV, transit, PKI |
| `vault.aws_secrets` | Plugin | secrets, IAM auth |
| `vault.azure_keyvault` | Plugin | secrets, certificates |
| `vault.gcp_secrets` | Plugin | secrets, IAM auth |
### CI Integrations
| Type | Plugin | Capabilities |
|------|--------|--------------|
| `ci.github_actions` | Built-in | workflows, runs, artifacts, status |
| `ci.gitlab_ci` | Built-in | pipelines, jobs, artifacts |
| `ci.jenkins` | Plugin | jobs, builds, artifacts |
| `ci.azure_pipelines` | Plugin | pipelines, runs, artifacts |
### Router Integrations (for Progressive Delivery)
| Type | Plugin | Capabilities |
|------|--------|--------------|
| `router.nginx` | Plugin | upstream config, reload |
| `router.haproxy` | Plugin | backend config, reload |
| `router.traefik` | Plugin | dynamic config |
| `router.aws_alb` | Plugin | target groups, listener rules |
---
## Database Schema
```sql
-- Integration types (populated by plugins)
CREATE TABLE release.integration_types (
id TEXT PRIMARY KEY, -- "scm.github"
plugin_id UUID REFERENCES release.plugins(id),
display_name TEXT NOT NULL,
description TEXT,
icon_url TEXT,
config_schema JSONB NOT NULL, -- JSON Schema for config
capabilities TEXT[] NOT NULL, -- ["repos", "webhooks", "status"]
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Integration instances
CREATE TABLE release.integrations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
type_id TEXT NOT NULL REFERENCES release.integration_types(id),
name TEXT NOT NULL,
config JSONB NOT NULL,
credential_ref TEXT NOT NULL, -- vault reference
health_status TEXT NOT NULL DEFAULT 'unknown',
last_health_check TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
created_by UUID NOT NULL REFERENCES users(id),
UNIQUE(tenant_id, name)
);
-- Connection profiles
CREATE TABLE release.connection_profiles (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
name TEXT NOT NULL,
integration_type TEXT NOT NULL,
default_config JSONB NOT NULL,
is_default BOOLEAN NOT NULL DEFAULT false,
last_used_at TIMESTAMPTZ,
created_by UUID NOT NULL REFERENCES users(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE(tenant_id, name)
);
-- Doctor check history
CREATE TABLE release.doctor_checks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
integration_id UUID NOT NULL REFERENCES release.integrations(id),
check_type TEXT NOT NULL,
status TEXT NOT NULL,
message TEXT,
details JSONB,
duration_ms INTEGER NOT NULL,
run_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_doctor_checks_integration ON release.doctor_checks(integration_id, run_at DESC);
```
---
## API Endpoints
See [API Documentation](../api/overview.md) for full specification.
```
GET /api/v1/integration-types # List available types
GET /api/v1/integration-types/{type} # Get type details
GET /api/v1/integrations # List integrations
POST /api/v1/integrations # Create integration
GET /api/v1/integrations/{id} # Get integration
PUT /api/v1/integrations/{id} # Update integration
DELETE /api/v1/integrations/{id} # Delete integration
POST /api/v1/integrations/{id}/test # Test connection
GET /api/v1/integrations/{id}/health # Get health status
POST /api/v1/integrations/{id}/doctor # Run doctor checks
GET /api/v1/integrations/{id}/resources # Discover resources
GET /api/v1/connection-profiles # List profiles
POST /api/v1/connection-profiles # Create profile
GET /api/v1/connection-profiles/{id} # Get profile
PUT /api/v1/connection-profiles/{id} # Update profile
DELETE /api/v1/connection-profiles/{id} # Delete profile
```

# Module Landscape Overview
The Stella Ops Suite comprises existing modules (vulnerability scanning) and new modules (release orchestration). Modules are organized into **themes** (functional areas).
## Architecture Diagram
```
┌─────────────────────────────────────────────────────────────────────────────────┐
│ STELLA OPS SUITE │
│ │
│ ┌───────────────────────────────────────────────────────────────────────────┐ │
│ │ EXISTING THEMES (Vulnerability) │ │
│ │ │ │
│ │ INGEST VEXOPS REASON SCANENG EVIDENCE │ │
│ │ ├─concelier ├─excititor ├─policy ├─scanner ├─locker │ │
│ │ └─advisory-ai └─linksets └─opa-runtime ├─sbom-gen ├─export │ │
│ │ └─reachability └─timeline │ │
│ │ │ │
│ │ RUNTIME JOBCTRL OBSERVE REPLAY DEVEXP │ │
│ │ ├─signals ├─scheduler ├─notifier └─replay-core ├─cli │ │
│ │ ├─graph ├─orchestrator └─telemetry ├─web-ui │ │
│ │ └─zastava └─task-runner └─sdk │ │
│ └───────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────────────────────┐ │
│ │ NEW THEMES (Release Orchestration) │ │
│ │ │ │
│ │ INTHUB (Integration Hub) │ │
│ │ ├─integration-manager Central registry of configured integrations │ │
│ │ ├─connection-profiles Default settings + credential management │ │
│ │ ├─connector-runtime Plugin connector execution environment │ │
│ │ └─doctor-checks Integration health diagnostics │ │
│ │ │ │
│ │ ENVMGR (Environment & Inventory) │ │
│ │ ├─environment-manager Environment CRUD, ordering, config │ │
│ │ ├─target-registry Deployment targets (hosts/services) │ │
│ │ ├─agent-manager Agent registration, health, capabilities │ │
│ │ └─inventory-sync Drift detection, state reconciliation │ │
│ │ │ │
│ │ RELMAN (Release Management) │ │
│ │ ├─component-registry Image repos → components mapping │ │
│ │ ├─version-manager Tag/digest → semver mapping │ │
│ │ ├─release-manager Release bundle lifecycle │ │
│ │ └─release-catalog Release history, search, compare │ │
│ │ │ │
│ │ WORKFL (Workflow Engine) │ │
│ │ ├─workflow-designer Template creation, step graph editor │ │
│ │ ├─workflow-engine DAG execution, state machine │ │
│ │ ├─step-executor Step dispatch, retry, timeout │ │
│ │ └─step-registry Built-in + plugin-provided steps │ │
│ │ │ │
│ │ PROMOT (Promotion & Approval) │ │
│ │ ├─promotion-manager Promotion request lifecycle │ │
│ │ ├─approval-gateway Approval collection, SoD enforcement │ │
│ │ ├─decision-engine Gate evaluation, policy integration │ │
│ │ └─gate-registry Built-in + custom gates │ │
│ │ │ │
│ │ DEPLOY (Deployment Execution) │ │
│ │ ├─deploy-orchestrator Deployment job coordination │ │
│ │ ├─target-executor Target-specific deployment logic │ │
│ │ ├─runner-executor Script/hook execution sandbox │ │
│ │ ├─artifact-generator Compose/script artifact generation │ │
│ │ └─rollback-manager Rollback orchestration │ │
│ │ │ │
│ │ AGENTS (Deployment Agents) │ │
│ │ ├─agent-core Shared agent runtime │ │
│ │ ├─agent-docker Docker host agent │ │
│ │ ├─agent-compose Docker Compose agent │ │
│ │ ├─agent-ssh SSH remote executor │ │
│ │ ├─agent-winrm WinRM remote executor │ │
│ │ ├─agent-ecs AWS ECS agent │ │
│ │ └─agent-nomad HashiCorp Nomad agent │ │
│ │ │ │
│ │ PROGDL (Progressive Delivery) │ │
│ │ ├─ab-manager A/B release coordination │ │
│ │ ├─traffic-router Router plugin orchestration │ │
│ │ ├─canary-controller Canary ramp automation │ │
│ │ └─rollout-strategy Strategy templates │ │
│ │ │ │
│ │ RELEVI (Release Evidence) │ │
│ │ ├─evidence-collector Evidence aggregation │ │
│ │ ├─evidence-signer Cryptographic signing │ │
│ │ ├─sticker-writer Version sticker generation │ │
│ │ └─audit-exporter Compliance report generation │ │
│ │ │ │
│ │ PLUGIN (Plugin Infrastructure) │ │
│ │ ├─plugin-registry Plugin discovery, versioning │ │
│ │ ├─plugin-loader Plugin lifecycle management │ │
│ │ ├─plugin-sandbox Isolation, resource limits │ │
│ │ └─plugin-sdk SDK for plugin development │ │
│ └───────────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────────┘
```
## Theme Summary
### Existing Themes (Vulnerability Scanning)
| Theme | Purpose | Key Modules |
|-------|---------|-------------|
| **INGEST** | Advisory ingestion | concelier, advisory-ai |
| **VEXOPS** | VEX document handling | excititor, linksets |
| **REASON** | Policy and decisioning | policy, opa-runtime |
| **SCANENG** | Scanning and SBOM | scanner, sbom-gen, reachability |
| **EVIDENCE** | Evidence and attestation | locker, export, timeline |
| **RUNTIME** | Runtime signals | signals, graph, zastava |
| **JOBCTRL** | Job orchestration | scheduler, orchestrator, task-runner |
| **OBSERVE** | Observability | notifier, telemetry |
| **REPLAY** | Deterministic replay | replay-core |
| **DEVEXP** | Developer experience | cli, web-ui, sdk |
### New Themes (Release Orchestration)
| Theme | Purpose | Key Modules | Documentation |
|-------|---------|-------------|---------------|
| **INTHUB** | Integration hub | integration-manager, connection-profiles, connector-runtime, doctor-checks | [Details](integration-hub.md) |
| **ENVMGR** | Environment & inventory | environment-manager, target-registry, agent-manager, inventory-sync | [Details](environment-manager.md) |
| **RELMAN** | Release management | component-registry, version-manager, release-manager, release-catalog | [Details](release-manager.md) |
| **WORKFL** | Workflow engine | workflow-designer, workflow-engine, step-executor, step-registry | [Details](workflow-engine.md) |
| **PROMOT** | Promotion & approval | promotion-manager, approval-gateway, decision-engine, gate-registry | [Details](promotion-manager.md) |
| **DEPLOY** | Deployment execution | deploy-orchestrator, target-executor, runner-executor, artifact-generator, rollback-manager | [Details](deploy-orchestrator.md) |
| **AGENTS** | Deployment agents | agent-core, agent-docker, agent-compose, agent-ssh, agent-winrm, agent-ecs, agent-nomad | [Details](agents.md) |
| **PROGDL** | Progressive delivery | ab-manager, traffic-router, canary-controller, rollout-strategy | [Details](progressive-delivery.md) |
| **RELEVI** | Release evidence | evidence-collector, evidence-signer, sticker-writer, audit-exporter | [Details](evidence.md) |
| **PLUGIN** | Plugin infrastructure | plugin-registry, plugin-loader, plugin-sandbox, plugin-sdk | [Details](plugin-system.md) |
## Module Dependencies
```
┌──────────────┐
│ AUTHORITY │
└──────┬───────┘
┌──────────────────┼──────────────────┐
│ │ │
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ INTHUB │ │ ENVMGR │ │ PLUGIN │
│ (Integrations)│ │ (Environments)│ │ (Plugins) │
└───────┬───────┘ └───────┬───────┘ └───────┬───────┘
│ │ │
└──────────┬───────┴──────────────────┘
┌───────────────┐
│ RELMAN │
│ (Releases) │
└───────┬───────┘
┌───────────────┐
│ WORKFL │
│ (Workflows) │
└───────┬───────┘
┌──────────┴──────────┐
│ │
▼ ▼
┌───────────────┐ ┌───────────────┐
│ PROMOT │ │ DEPLOY │
│ (Promotion) │ │ (Deployment) │
└───────┬───────┘ └───────┬───────┘
│ │
│ ▼
│ ┌───────────────┐
│ │ AGENTS │
│ │ (Agents) │
│ └───────┬───────┘
│ │
└──────────┬──────────┘
┌───────────────┐
│ RELEVI │
│ (Evidence) │
└───────────────┘
```
## Communication Patterns
| Pattern | Usage |
|---------|-------|
| **Synchronous API** | User-initiated operations (CRUD, queries) |
| **Event Bus** | Cross-module notifications (domain events) |
| **Task Queue** | Long-running operations (deployments, syncs) |
| **WebSocket/SSE** | Real-time UI updates |
| **gRPC Streams** | Agent communication |
## Database Schema Organization
Each theme owns a PostgreSQL schema:
| Schema | Owner Theme |
|--------|-------------|
| `release.integrations` | INTHUB |
| `release.environments` | ENVMGR |
| `release.components` | RELMAN |
| `release.workflows` | WORKFL |
| `release.promotions` | PROMOT |
| `release.deployments` | DEPLOY |
| `release.agents` | AGENTS |
| `release.evidence` | RELEVI |
| `release.plugins` | PLUGIN |

# PLUGIN: Plugin Infrastructure
**Purpose**: Extensible plugin system for integrations, steps, and custom functionality.
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ PLUGIN ARCHITECTURE │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ PLUGIN REGISTRY │ │
│ │ │ │
│ │ - Plugin discovery and versioning │ │
│ │ - Manifest validation │ │
│ │ - Dependency resolution │ │
│ └──────────────────────────────┬──────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ PLUGIN LOADER │ │
│ │ │ │
│ │ - Lifecycle management (load, start, stop, unload) │ │
│ │ - Health monitoring │ │
│ │ - Hot reload support │ │
│ └──────────────────────────────┬──────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ PLUGIN SANDBOX │ │
│ │ │ │
│ │ - Process isolation │ │
│ │ - Resource limits (CPU, memory, network) │ │
│ │ - Capability enforcement │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ Plugin Types: │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Connector │ │ Step │ │ Gate │ │ Agent │ │
│ │ Plugins │ │ Providers │ │ Providers │ │ Plugins │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Modules
### Module: `plugin-registry`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Plugin discovery; versioning; manifest management |
| **Data Entities** | `Plugin`, `PluginManifest`, `PluginVersion` |
| **Events Produced** | `plugin.discovered`, `plugin.registered`, `plugin.unregistered` |
**Plugin Entity**:
```typescript
interface Plugin {
id: UUID;
pluginId: string; // "com.example.my-connector"
version: string; // "1.2.3"
vendor: string;
license: string;
manifest: PluginManifest;
status: PluginStatus;
entrypoint: string; // Path to plugin executable/module
lastHealthCheck: DateTime;
healthMessage: string | null;
installedAt: DateTime;
updatedAt: DateTime;
}
type PluginStatus =
| "discovered" // Found but not loaded
| "loaded" // Loaded but not active
| "active" // Running and healthy
| "stopped" // Manually stopped
| "failed" // Failed to load or crashed
| "degraded"; // Running but with issues
```
---
### Module: `plugin-loader`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Plugin lifecycle management |
| **Dependencies** | `plugin-registry`, `plugin-sandbox` |
| **Events Produced** | `plugin.loaded`, `plugin.started`, `plugin.stopped`, `plugin.failed` |
**Plugin Lifecycle**:
```
┌──────────────┐
│ DISCOVERED │ ──── Plugin found in registry
└──────┬───────┘
│ load()
┌──────────────┐
│ LOADED │ ──── Plugin validated and prepared
└──────┬───────┘
│ start()
┌──────────────┐ ┌──────────────┐
│ ACTIVE │ ──── │ DEGRADED │ ◄── Health issues
└──────┬───────┘ └──────────────┘
│ stop() │
▼ │
┌──────────────┐ │
│ STOPPED │ ◄───────────┘ manual stop
└──────────────┘
│ unload()
┌──────────────┐
│ UNLOADED │
└──────────────┘
```
**Lifecycle Operations**:
```typescript
interface PluginLoader {
// Discovery
discover(): Promise<Plugin[]>;
refresh(): Promise<void>;
// Lifecycle
load(pluginId: string): Promise<Plugin>;
start(pluginId: string): Promise<void>;
stop(pluginId: string): Promise<void>;
unload(pluginId: string): Promise<void>;
restart(pluginId: string): Promise<void>;
// Health
checkHealth(pluginId: string): Promise<HealthStatus>;
getStatus(pluginId: string): Promise<PluginStatus>;
// Hot reload
reload(pluginId: string): Promise<void>;
}
```
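The transition graph implied by the diagram can be encoded as data, which keeps `PluginLoader` implementations from drifting out of sync with the documented lifecycle. A minimal sketch, with the status union repeated for self-containment; the `unloaded` entry and the recovery/retry edges (`failed → loaded`, `degraded → active`) are assumptions beyond the diagram:

```typescript
type PluginStatus =
  | "discovered" | "loaded" | "active"
  | "stopped" | "failed" | "degraded" | "unloaded";

// Legal transitions derived from the lifecycle diagram above.
const transitions: Record<PluginStatus, PluginStatus[]> = {
  discovered: ["loaded", "failed"],
  loaded: ["active", "failed", "unloaded"],
  active: ["degraded", "stopped", "failed"],
  degraded: ["active", "stopped", "failed"],
  stopped: ["active", "unloaded"],
  failed: ["loaded"],        // assumed retry path via reload
  unloaded: ["discovered"],  // assumed re-discovery path
};

function assertTransition(from: PluginStatus, to: PluginStatus): void {
  if (!transitions[from].includes(to)) {
    throw new Error(`Illegal plugin transition: ${from} -> ${to}`);
  }
}
```

A loader would call `assertTransition` at the top of `load()`, `start()`, `stop()`, and `unload()` before mutating state.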
---
### Module: `plugin-sandbox`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Isolation; resource limits; security |
| **Enforcement** | Process isolation, capability-based security |
**Sandbox Configuration**:
```typescript
interface SandboxConfig {
// Process isolation
processIsolation: boolean; // Run in separate process
containerIsolation: boolean; // Run in container
// Resource limits
resourceLimits: {
maxMemoryMb: number; // Memory limit
maxCpuPercent: number; // CPU limit
maxDiskMb: number; // Disk quota
maxNetworkBandwidth: number; // Network bandwidth limit
};
// Network restrictions
networkPolicy: {
allowedHosts: string[]; // Allowed outbound hosts
blockedHosts: string[]; // Blocked hosts
allowOutbound: boolean; // Allow any outbound
};
// Filesystem restrictions
filesystemPolicy: {
readOnlyPaths: string[];
writablePaths: string[];
blockedPaths: string[];
};
// Timeouts
timeouts: {
initializationMs: number;
operationMs: number;
shutdownMs: number;
};
}
```
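To illustrate how a host might apply `resourceLimits`, here is a sketch that clamps a plugin's requested limits to platform ceilings before enforcement; the ceiling values and the `clampLimits` helper are hypothetical, not part of the specification:

```typescript
interface ResourceLimits {
  maxMemoryMb: number;
  maxCpuPercent: number;
  maxDiskMb: number;
  maxNetworkBandwidth: number;
}

// Hypothetical platform-wide ceilings a host might enforce.
const ceilings: ResourceLimits = {
  maxMemoryMb: 2048,
  maxCpuPercent: 50,
  maxDiskMb: 10_240,
  maxNetworkBandwidth: 100,
};

// A plugin may request less than the ceiling, never more.
function clampLimits(requested: ResourceLimits): ResourceLimits {
  return {
    maxMemoryMb: Math.min(requested.maxMemoryMb, ceilings.maxMemoryMb),
    maxCpuPercent: Math.min(requested.maxCpuPercent, ceilings.maxCpuPercent),
    maxDiskMb: Math.min(requested.maxDiskMb, ceilings.maxDiskMb),
    maxNetworkBandwidth: Math.min(
      requested.maxNetworkBandwidth,
      ceilings.maxNetworkBandwidth,
    ),
  };
}
```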
**Capability Enforcement**:
```typescript
interface PluginCapabilities {
// Integration capabilities
integrations: {
scm: boolean;
ci: boolean;
registry: boolean;
vault: boolean;
router: boolean;
};
// Step capabilities
steps: {
deploy: boolean;
gate: boolean;
notify: boolean;
custom: boolean;
};
// System capabilities
system: {
network: boolean;
filesystem: boolean;
secrets: boolean;
database: boolean;
};
}
```
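Capability enforcement reduces to a subset check between what a plugin's manifest declares and what the operator grants. A sketch using dotted capability paths (the path notation is an illustrative flattening of the nested interface above):

```typescript
type CapabilityPath = string; // e.g. "system.network", "steps.deploy"

// Returns the capabilities that were required but not granted;
// an empty result means the plugin may be loaded.
function verifyCapabilities(
  required: CapabilityPath[],
  granted: Set<CapabilityPath>,
): CapabilityPath[] {
  return required.filter((cap) => !granted.has(cap));
}
```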
---
### Module: `plugin-sdk`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | SDK for plugin development |
| **Languages** | C#, TypeScript, Go |
**Plugin SDK Interface**:
```typescript
// Base plugin interface
interface StellaPlugin {
// Lifecycle
initialize(config: PluginConfig): Promise<void>;
start(): Promise<void>;
stop(): Promise<void>;
dispose(): Promise<void>;
// Health
getHealth(): Promise<HealthStatus>;
// Metadata
getManifest(): PluginManifest;
}
// Connector plugin interface
interface ConnectorPlugin extends StellaPlugin {
createConnector(config: ConnectorConfig): Promise<Connector>;
}
// Step provider plugin interface
interface StepProviderPlugin extends StellaPlugin {
getStepTypes(): StepType[];
executeStep(
stepType: string,
config: StepConfig,
inputs: StepInputs,
context: StepContext
): AsyncGenerator<StepEvent>;
}
// Gate provider plugin interface
interface GateProviderPlugin extends StellaPlugin {
getGateTypes(): GateType[];
evaluateGate(
gateType: string,
config: GateConfig,
context: GateContext
): Promise<GateResult>;
}
```
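A do-nothing plugin built against the base interface shows the minimum an SDK consumer must implement. The auxiliary types are repeated here in simplified form so the sketch is self-contained; the plugin id is a placeholder:

```typescript
interface PluginConfig { [key: string]: unknown }
interface HealthStatus { healthy: boolean; message: string }
interface PluginManifest { id: string; version: string }

interface StellaPlugin {
  initialize(config: PluginConfig): Promise<void>;
  start(): Promise<void>;
  stop(): Promise<void>;
  dispose(): Promise<void>;
  getHealth(): Promise<HealthStatus>;
  getManifest(): PluginManifest;
}

class NoopPlugin implements StellaPlugin {
  private running = false;

  async initialize(_config: PluginConfig): Promise<void> {}
  async start(): Promise<void> { this.running = true; }
  async stop(): Promise<void> { this.running = false; }
  async dispose(): Promise<void> {}

  // Health simply reflects whether start() has been called.
  async getHealth(): Promise<HealthStatus> {
    return { healthy: this.running, message: this.running ? "ok" : "stopped" };
  }

  getManifest(): PluginManifest {
    return { id: "com.example.noop", version: "0.0.1" };
  }
}
```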
---
## Three-Surface Plugin Model
Plugins contribute to the system through three distinct surfaces:
### 1. Manifest Surface (Static)
The plugin manifest declares:
- Plugin identity and version
- Required capabilities
- Provided integrations/steps/gates
- Configuration schema
- UI components (optional)
```yaml
# plugin.stella.yaml
plugin:
id: "com.example.jenkins-connector"
version: "1.0.0"
vendor: "Example Corp"
license: "Apache-2.0"
description: "Jenkins CI integration for Stella Ops"
capabilities:
required:
- network
optional:
- secrets
provides:
integrations:
- type: "ci.jenkins"
displayName: "Jenkins"
configSchema: "./schemas/jenkins-config.json"
capabilities:
- "pipelines"
- "builds"
- "artifacts"
steps:
- type: "jenkins-trigger"
displayName: "Trigger Jenkins Build"
category: "integration"
configSchema: "./schemas/jenkins-trigger-config.json"
inputSchema: "./schemas/jenkins-trigger-input.json"
outputSchema: "./schemas/jenkins-trigger-output.json"
ui:
configScreen: "./ui/config.html"
icon: "./assets/jenkins-icon.svg"
dependencies:
stellaCore: ">=1.0.0"
```
### 2. Connector Runtime Surface (Dynamic)
Plugins implement connector interfaces for runtime operations:
```typescript
// Jenkins connector implementation
class JenkinsConnector implements CIConnector {
private client: JenkinsClient;
async initialize(config: ConnectorConfig, secrets: SecretHandle[]): Promise<void> {
const apiToken = await this.getSecret(secrets, "api_token");
this.client = new JenkinsClient({
baseUrl: config.endpoint,
username: config.username,
apiToken: apiToken,
});
}
async testConnection(): Promise<ConnectionTestResult> {
try {
      await this.client.getCrumb(); // fetching the CSRF crumb doubles as an auth check
return { success: true, message: "Connected to Jenkins" };
} catch (error) {
return { success: false, message: error.message };
}
}
async listPipelines(): Promise<PipelineInfo[]> {
const jobs = await this.client.getJobs();
return jobs.map(job => ({
id: job.name,
name: job.displayName,
url: job.url,
lastBuild: job.lastBuild?.number,
}));
}
async triggerPipeline(pipelineId: string, params: object): Promise<PipelineRun> {
const queueItem = await this.client.build(pipelineId, params);
return {
id: queueItem.id.toString(),
pipelineId,
status: "queued",
startedAt: new Date(),
};
}
async getPipelineRun(runId: string): Promise<PipelineRun> {
const build = await this.client.getBuild(runId);
return {
id: build.number.toString(),
pipelineId: build.job,
status: this.mapStatus(build.result),
startedAt: new Date(build.timestamp),
completedAt: build.result ? new Date(build.timestamp + build.duration) : null,
};
}
}
```
### 3. Step Provider Surface (Execution)
Plugins implement step execution logic:
```typescript
// Jenkins trigger step implementation
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

class JenkinsTriggerStep implements StepExecutor {
async *execute(
config: StepConfig,
inputs: StepInputs,
context: StepContext
): AsyncGenerator<StepEvent> {
const connector = await context.getConnector<JenkinsConnector>(config.integrationId);
yield { type: "log", line: `Triggering Jenkins job: ${config.jobName}` };
// Trigger build
const run = await connector.triggerPipeline(config.jobName, inputs.parameters);
yield { type: "output", name: "buildId", value: run.id };
yield { type: "log", line: `Build queued: ${run.id}` };
// Wait for completion if configured
if (config.waitForCompletion) {
yield { type: "log", line: "Waiting for build to complete..." };
while (true) {
const status = await connector.getPipelineRun(run.id);
if (status.status === "succeeded") {
yield { type: "output", name: "status", value: "succeeded" };
yield { type: "result", success: true };
return;
}
if (status.status === "failed") {
yield { type: "output", name: "status", value: "failed" };
yield { type: "result", success: false, message: "Build failed" };
return;
}
yield { type: "progress", progress: 50, message: `Build running: ${status.status}` };
await sleep(config.pollIntervalSeconds * 1000);
}
}
yield { type: "result", success: true };
}
}
```
---
## Database Schema
```sql
-- Plugins
CREATE TABLE release.plugins (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
plugin_id VARCHAR(255) NOT NULL UNIQUE,
version VARCHAR(50) NOT NULL,
vendor VARCHAR(255) NOT NULL,
license VARCHAR(100),
manifest JSONB NOT NULL,
status VARCHAR(50) NOT NULL DEFAULT 'discovered' CHECK (status IN (
'discovered', 'loaded', 'active', 'stopped', 'failed', 'degraded'
)),
entrypoint VARCHAR(500) NOT NULL,
last_health_check TIMESTAMPTZ,
health_message TEXT,
installed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_plugins_status ON release.plugins(status);
-- Plugin Instances (per-tenant configuration)
CREATE TABLE release.plugin_instances (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
plugin_id UUID NOT NULL REFERENCES release.plugins(id) ON DELETE CASCADE,
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
config JSONB NOT NULL DEFAULT '{}',
enabled BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_plugin_instances_tenant ON release.plugin_instances(tenant_id);
-- Integration types (populated by plugins)
CREATE TABLE release.integration_types (
id TEXT PRIMARY KEY, -- "scm.github", "ci.jenkins"
plugin_id UUID REFERENCES release.plugins(id),
display_name TEXT NOT NULL,
description TEXT,
icon_url TEXT,
config_schema JSONB NOT NULL, -- JSON Schema for config
capabilities TEXT[] NOT NULL, -- ["repos", "webhooks", "status"]
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```
---
## API Endpoints
```yaml
# Plugin Registry
GET /api/v1/plugins
Query: ?status={status}&capability={type}
Response: Plugin[]
GET /api/v1/plugins/{id}
Response: Plugin (with manifest)
POST /api/v1/plugins/{id}/enable
Response: Plugin
POST /api/v1/plugins/{id}/disable
Response: Plugin
GET /api/v1/plugins/{id}/health
Response: { status, message, diagnostics[] }
# Plugin Instances (per-tenant config)
POST /api/v1/plugin-instances
Body: { pluginId: UUID, config: object }
Response: PluginInstance
GET /api/v1/plugin-instances
Response: PluginInstance[]
PUT /api/v1/plugin-instances/{id}
Body: { config: object, enabled: boolean }
Response: PluginInstance
DELETE /api/v1/plugin-instances/{id}
Response: { deleted: true }
```
---
## Plugin Security
### Capability Declaration
Plugins must declare all required capabilities in their manifest. The system enforces:
1. **Network Access**: Plugins can only access declared hosts
2. **Secret Access**: Plugins receive secrets through controlled injection
3. **Database Access**: No direct database access; API only
4. **Filesystem Access**: Limited to declared paths
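As an example of rule 1, outbound host checks can be expressed as a small pure function over the `networkPolicy` shape from `SandboxConfig`. The precedence used here (blocklist wins, then allowlist, then the outbound default) is a plausible reading, not a normative ruling:

```typescript
interface NetworkPolicy {
  allowedHosts: string[];
  blockedHosts: string[];
  allowOutbound: boolean;
}

function isHostAllowed(host: string, policy: NetworkPolicy): boolean {
  if (policy.blockedHosts.includes(host)) return false; // explicit deny wins
  if (policy.allowedHosts.includes(host)) return true;  // explicit allow
  return policy.allowOutbound;                          // fall back to default
}
```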
### Sandbox Enforcement
```typescript
// Plugin execution is sandboxed
class PluginSandbox {
async execute<T>(
plugin: Plugin,
operation: () => Promise<T>
): Promise<T> {
// 1. Verify capabilities
this.verifyCapabilities(plugin);
// 2. Set resource limits
const limits = this.getResourceLimits(plugin);
await this.applyLimits(limits);
// 3. Create isolated context
const context = await this.createIsolatedContext(plugin);
try {
// 4. Execute with timeout
return await this.withTimeout(
operation(),
plugin.manifest.timeouts.operationMs
);
} catch (error) {
// 5. Log and handle errors
await this.handlePluginError(plugin, error);
throw error;
} finally {
// 6. Cleanup
await context.dispose();
}
}
}
```
### Plugin Failures Cannot Crash Core
```csharp
// Core orchestration is protected from plugin failures
public sealed class PromotionDecisionEngine
{
public async Task<DecisionResult> EvaluateAsync(
Promotion promotion,
IReadOnlyList<IGateProvider> gates,
CancellationToken ct)
{
var results = new List<GateResult>();
foreach (var gate in gates)
{
try
{
// Plugin provides evaluation logic
var result = await gate.EvaluateAsync(promotion, ct);
results.Add(result);
}
catch (Exception ex)
{
// Plugin failure is logged but doesn't crash core
_logger.LogError(ex, "Gate {GateType} failed", gate.Type);
results.Add(new GateResult
{
GateType = gate.Type,
Status = GateStatus.Failed,
Message = $"Gate evaluation failed: {ex.Message}",
IsBlocking = gate.IsBlocking,
});
}
            // Core decides how to aggregate (plugins cannot override)
            var last = results[^1];
            if (last.Status == GateStatus.Failed && last.IsBlocking && _policy.FailFast)
                break;
}
// Core makes final decision
return _decisionAggregator.Aggregate(results);
}
}
```
---
## References
- [Module Overview](overview.md)
- [Integration Hub](integration-hub.md)
- [Workflow Engine](workflow-engine.md)
- [Connector Interface](../integrations/connectors.md)

# PROGDL: Progressive Delivery
**Purpose**: A/B releases, canary deployments, and traffic management.
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ PROGRESSIVE DELIVERY ARCHITECTURE │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ A/B RELEASE MANAGER │ │
│ │ │ │
│ │ - Create A/B release with variations │ │
│ │ - Manage traffic split configuration │ │
│ │ - Coordinate rollout stages │ │
│ │ - Handle promotion/rollback │ │
│ └──────────────────────────────┬──────────────────────────────────────┘ │
│ │ │
│ ┌──────────────────┴──────────────────┐ │
│ │ │ │
│ ▼ ▼ │
│ ┌───────────────────────┐ ┌───────────────────────┐ │
│ │ TARGET-GROUP A/B │ │ ROUTER-BASED A/B │ │
│ │ │ │ │ │
│ │ Deploy to groups │ │ Configure traffic │ │
│ │ by labels/membership │ │ via load balancer │ │
│ │ │ │ │ │
│ │ Good for: │ │ Good for: │ │
│ │ - Background workers │ │ - Web/API traffic │ │
│ │ - Batch processors │ │ - Customer-facing │ │
│ │ - Internal services │ │ - L7 routing │ │
│ └───────────────────────┘ └───────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ CANARY CONTROLLER │ │
│ │ │ │
│ │ - Execute rollout stages │ │
│ │ - Monitor health metrics │ │
│ │ - Auto-advance or pause │ │
│ │ - Trigger rollback on failure │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ TRAFFIC ROUTER INTEGRATION │ │
│ │ │ │
│ │ Plugin-based integration with: │ │
│ │ - Nginx (config generation + reload) │ │
│ │ - HAProxy (config generation + reload) │ │
│ │ - Traefik (dynamic config API) │ │
│ │ - AWS ALB (target group weights) │ │
│ │ - Custom (webhook) │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Modules
### Module: `ab-manager`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | A/B release lifecycle; variation management |
| **Dependencies** | `release-manager`, `environment-manager`, `deploy-orchestrator` |
| **Data Entities** | `ABRelease`, `Variation`, `TrafficSplit` |
| **Events Produced** | `ab.created`, `ab.started`, `ab.stage_advanced`, `ab.promoted`, `ab.rolled_back` |
**A/B Release Entity**:
```typescript
interface ABRelease {
id: UUID;
tenantId: UUID;
environmentId: UUID;
name: string;
variations: Variation[];
activeVariation: string; // "A" or "B"
trafficSplit: TrafficSplit;
rolloutStrategy: RolloutStrategy;
status: ABReleaseStatus;
createdAt: DateTime;
completedAt: DateTime | null;
createdBy: UUID;
}
interface Variation {
name: string; // "A", "B"
releaseId: UUID;
targetGroupId: UUID | null; // for target-group based A/B
trafficPercentage: number;
deploymentJobId: UUID | null;
}
interface TrafficSplit {
type: "percentage" | "sticky" | "header";
percentages: Record<string, number>; // {"A": 90, "B": 10}
stickyKey?: string; // cookie or header name
headerMatch?: { // for header-based routing
header: string;
values: Record<string, string>; // value -> variation
};
}
type ABReleaseStatus =
| "created" // Configured, not started
| "deploying" // Deploying variations
| "running" // Active with traffic split
| "promoting" // Promoting winner to 100%
| "completed" // Successfully completed
| "rolled_back"; // Rolled back to original
```
**A/B Release Models**:
| Model | Description | Use Case |
|-------|-------------|----------|
| **Target-Group A/B** | Deploy different releases to different target groups | Background workers, internal services |
| **Router-Based A/B** | Use load balancer to split traffic | Web/API traffic, customer-facing |
| **Hybrid A/B** | Combination of both | Complex deployments |
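For the router-based model, a percentage split must map each request to a variation deterministically so sticky sessions stay on one side. A sketch that hashes the sticky value into a 0–99 bucket and walks the cumulative percentages (the FNV-1a hash choice is illustrative; the `TrafficSplit` shape is repeated in reduced form for self-containment):

```typescript
interface TrafficSplit {
  type: "percentage" | "sticky" | "header";
  percentages: Record<string, number>; // e.g. {"A": 90, "B": 10}
  stickyKey?: string;
}

// FNV-1a hash so the same sticky key always lands in the same bucket.
function bucket(key: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h % 100;
}

function pickVariation(split: TrafficSplit, stickyValue: string): string {
  const point = bucket(stickyValue);
  let cumulative = 0;
  for (const [name, pct] of Object.entries(split.percentages)) {
    cumulative += pct;
    if (point < cumulative) return name;
  }
  // Percentages should sum to 100; fall back to the last variation.
  const names = Object.keys(split.percentages);
  return names[names.length - 1];
}
```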
---
### Module: `traffic-router`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Router plugin orchestration; traffic shifting |
| **Dependencies** | `integration-manager`, `connector-runtime` |
| **Protocol** | Plugin-specific (API calls, config generation) |
**Router Connector Interface**:
```typescript
interface RouterConnector extends BaseConnector {
// Traffic management
configureRoute(config: RouteConfig): Promise<void>;
getTrafficDistribution(): Promise<TrafficDistribution>;
shiftTraffic(from: string, to: string, percentage: number): Promise<void>;
// Configuration
reloadConfig(): Promise<void>;
validateConfig(config: string): Promise<ValidationResult>;
}
interface RouteConfig {
upstream: string;
backends: Array<{
name: string;
targets: string[];
weight: number;
}>;
healthCheck?: {
path: string;
interval: number;
timeout: number;
};
}
interface TrafficDistribution {
backends: Array<{
name: string;
weight: number;
healthyTargets: number;
totalTargets: number;
}>;
timestamp: DateTime;
}
```
**Router Plugins**:
| Plugin | Capabilities |
|--------|-------------|
| `router.nginx` | Config generation, reload via signal/API |
| `router.haproxy` | Config generation, reload via socket |
| `router.traefik` | Dynamic config API |
| `router.aws_alb` | Target group weights via AWS API |
| `router.custom` | Webhook-based custom integration |
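As an example of config generation, the `router.nginx` plugin could render a weighted `upstream` block from a `RouteConfig`; the output format here is a plausible sketch, not the plugin's actual template:

```typescript
interface RouteConfig {
  upstream: string;
  backends: Array<{ name: string; targets: string[]; weight: number }>;
}

function renderNginxUpstream(config: RouteConfig): string {
  const lines = [`upstream ${config.upstream} {`];
  for (const backend of config.backends) {
    for (const target of backend.targets) {
      // nginx distributes by server weight, so per-target weight approximates the split.
      lines.push(`    server ${target} weight=${backend.weight};`);
    }
  }
  lines.push("}");
  return lines.join("\n");
}
```

After writing the rendered block, the connector's `validateConfig` and `reloadConfig` operations would complete the traffic shift.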
---
### Module: `canary-controller`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Canary ramp automation; health monitoring |
| **Dependencies** | `ab-manager`, `traffic-router` |
| **Data Entities** | `CanaryStage`, `HealthResult` |
| **Events Produced** | `canary.stage_started`, `canary.stage_passed`, `canary.stage_failed` |
**Canary Stage Entity**:
```typescript
interface CanaryStage {
id: UUID;
abReleaseId: UUID;
stageNumber: number;
trafficPercentage: number;
status: CanaryStageStatus;
healthThreshold: number; // Required health % to pass
durationSeconds: number; // How long to run stage
requireApproval: boolean; // Require manual approval
startedAt: DateTime | null;
completedAt: DateTime | null;
healthResult: HealthResult | null;
}
type CanaryStageStatus =
| "pending"
| "running"
| "succeeded"
| "failed"
| "skipped";
interface HealthResult {
healthy: boolean;
healthPercentage: number;
metrics: {
successRate: number;
errorRate: number;
latencyP50: number;
latencyP99: number;
};
samples: number;
evaluatedAt: DateTime;
}
```
**Canary Rollout Execution**:
```typescript
class CanaryController {
async executeRollout(abRelease: ABRelease): Promise<void> {
const stages = abRelease.rolloutStrategy.stages;
for (const stage of stages) {
this.log(`Starting canary stage ${stage.stageNumber}: ${stage.trafficPercentage}%`);
// 1. Shift traffic to canary percentage
await this.trafficRouter.shiftTraffic(
abRelease.variations[0].name, // baseline
abRelease.variations[1].name, // canary
stage.trafficPercentage
);
// 2. Update stage status
stage.status = "running";
stage.startedAt = new Date();
await this.save(stage);
// 3. Wait for stage duration
await this.waitForDuration(stage.durationSeconds);
// 4. Evaluate health
const healthResult = await this.evaluateHealth(abRelease, stage);
stage.healthResult = healthResult;
if (!healthResult.healthy || healthResult.healthPercentage < stage.healthThreshold) {
stage.status = "failed";
await this.save(stage);
// Rollback
await this.rollback(abRelease);
throw new CanaryFailedError(`Stage ${stage.stageNumber} failed health check`);
}
// 5. Check if approval required
if (stage.requireApproval) {
await this.waitForApproval(abRelease, stage);
}
stage.status = "succeeded";
stage.completedAt = new Date();
await this.save(stage);
      // 6. Pause for manual advance unless the strategy auto-advances
if (!abRelease.rolloutStrategy.autoAdvance) {
await this.waitForManualAdvance(abRelease);
}
}
// All stages passed - promote canary to 100%
await this.promote(abRelease, abRelease.variations[1].name);
}
private async evaluateHealth(abRelease: ABRelease, stage: CanaryStage): Promise<HealthResult> {
// Collect metrics from targets
    const canaryVariation = abRelease.variations.find(v => v.name === "B");
    if (!canaryVariation?.targetGroupId) {
      throw new Error("Canary variation has no target group");
    }
    const targets = await this.getTargets(canaryVariation.targetGroupId);
let healthyCount = 0;
let totalLatency = 0;
let errorCount = 0;
for (const target of targets) {
const health = await this.checkTargetHealth(target);
if (health.healthy) healthyCount++;
totalLatency += health.latencyMs;
if (health.errorRate > 0) errorCount++;
}
return {
healthy: healthyCount >= targets.length * (stage.healthThreshold / 100),
healthPercentage: (healthyCount / targets.length) * 100,
metrics: {
successRate: ((targets.length - errorCount) / targets.length) * 100,
errorRate: (errorCount / targets.length) * 100,
latencyP50: totalLatency / targets.length,
latencyP99: totalLatency / targets.length * 1.5, // simplified
},
samples: targets.length,
evaluatedAt: new Date(),
};
}
}
```
---
### Module: `rollout-strategy`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Strategy templates; configuration |
| **Data Entities** | `RolloutStrategyTemplate` |
**Built-in Strategy Templates**:
| Template | Stages | Description |
|----------|--------|-------------|
| `canary-10-25-50-100` | 4 | Standard canary: 10%, 25%, 50%, 100% |
| `canary-1-5-10-50-100` | 5 | Conservative: 1%, 5%, 10%, 50%, 100% |
| `blue-green-instant` | 2 | Deploy 100% to green, instant switch |
| `blue-green-gradual` | 4 | Gradual shift: 25%, 50%, 75%, 100% |
**Rollout Strategy Definition**:
```typescript
interface RolloutStrategy {
id: UUID;
name: string;
stages: Array<{
trafficPercentage: number;
durationSeconds: number;
healthThreshold: number;
requireApproval: boolean;
}>;
autoAdvance: boolean;
rollbackOnFailure: boolean;
healthCheckInterval: number;
}
// Example: Standard Canary
const standardCanary: RolloutStrategy = {
name: "canary-10-25-50-100",
stages: [
{ trafficPercentage: 10, durationSeconds: 300, healthThreshold: 95, requireApproval: false },
{ trafficPercentage: 25, durationSeconds: 600, healthThreshold: 95, requireApproval: false },
{ trafficPercentage: 50, durationSeconds: 900, healthThreshold: 95, requireApproval: true },
{ trafficPercentage: 100, durationSeconds: 0, healthThreshold: 95, requireApproval: false },
],
autoAdvance: true,
rollbackOnFailure: true,
healthCheckInterval: 30,
};
```
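Strategy definitions benefit from validation before a rollout starts: traffic percentages should increase monotonically and the final stage should reach 100%. A sketch, with a second helper that sums the automated wait time across stages (both helpers are illustrative, not part of the module contract):

```typescript
interface StageSpec {
  trafficPercentage: number;
  durationSeconds: number;
  healthThreshold: number;
  requireApproval: boolean;
}

// Returns human-readable problems; an empty array means the strategy is usable.
function validateStages(stages: StageSpec[]): string[] {
  const errors: string[] = [];
  stages.forEach((s, i) => {
    if (s.trafficPercentage <= 0 || s.trafficPercentage > 100)
      errors.push(`stage ${i + 1}: percentage out of range`);
    if (i > 0 && s.trafficPercentage <= stages[i - 1].trafficPercentage)
      errors.push(`stage ${i + 1}: percentage must increase`);
  });
  if (stages.length > 0 && stages[stages.length - 1].trafficPercentage !== 100)
    errors.push("final stage must reach 100%");
  return errors;
}

// Minimum wall-clock time a rollout spends in automated stage waits.
function minimumDurationSeconds(stages: StageSpec[]): number {
  return stages.reduce((sum, s) => sum + s.durationSeconds, 0);
}
```

Applied to the `canary-10-25-50-100` example above, validation passes and the automated waits total 1800 seconds, excluding any time spent awaiting approvals.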
---
## Database Schema
```sql
-- A/B Releases
CREATE TABLE release.ab_releases (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
environment_id UUID NOT NULL REFERENCES release.environments(id),
name VARCHAR(255) NOT NULL,
variations JSONB NOT NULL, -- [{name, releaseId, targetGroupId, trafficPercentage}]
active_variation VARCHAR(50) NOT NULL DEFAULT 'A',
traffic_split JSONB NOT NULL,
rollout_strategy JSONB NOT NULL,
status VARCHAR(50) NOT NULL DEFAULT 'created' CHECK (status IN (
'created', 'deploying', 'running', 'promoting', 'completed', 'rolled_back'
)),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
completed_at TIMESTAMPTZ,
created_by UUID REFERENCES users(id)
);
CREATE INDEX idx_ab_releases_tenant_env ON release.ab_releases(tenant_id, environment_id);
CREATE INDEX idx_ab_releases_status ON release.ab_releases(status);
-- Canary Stages
CREATE TABLE release.canary_stages (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
ab_release_id UUID NOT NULL REFERENCES release.ab_releases(id) ON DELETE CASCADE,
stage_number INTEGER NOT NULL,
traffic_percentage INTEGER NOT NULL,
status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN (
'pending', 'running', 'succeeded', 'failed', 'skipped'
)),
health_threshold DECIMAL(5,2),
duration_seconds INTEGER,
require_approval BOOLEAN NOT NULL DEFAULT FALSE,
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
health_result JSONB,
UNIQUE (ab_release_id, stage_number)
);
```
---
## API Endpoints
```yaml
# A/B Releases
POST /api/v1/ab-releases
Body: {
environmentId: UUID,
name: string,
variations: [
{ name: "A", releaseId: UUID, targetGroupId?: UUID },
{ name: "B", releaseId: UUID, targetGroupId?: UUID }
],
trafficSplit: TrafficSplit,
rolloutStrategy: RolloutStrategy
}
Response: ABRelease
GET /api/v1/ab-releases
Query: ?environmentId={uuid}&status={status}
Response: ABRelease[]
GET /api/v1/ab-releases/{id}
Response: ABRelease (with stages)
POST /api/v1/ab-releases/{id}/start
Response: ABRelease
POST /api/v1/ab-releases/{id}/advance
Body: { stageNumber?: number } # advance to next or specific stage
Response: ABRelease
POST /api/v1/ab-releases/{id}/promote
Body: { variation: "A" | "B" } # promote to 100%
Response: ABRelease
POST /api/v1/ab-releases/{id}/rollback
Response: ABRelease
GET /api/v1/ab-releases/{id}/traffic
Response: { currentSplit: TrafficDistribution, history: TrafficHistory[] }
GET /api/v1/ab-releases/{id}/health
Response: { variations: [{ name, healthStatus, metrics }] }
# Rollout Strategies
GET /api/v1/rollout-strategies
Response: RolloutStrategyTemplate[]
GET /api/v1/rollout-strategies/{id}
Response: RolloutStrategyTemplate
```
---
## References
- [Module Overview](overview.md)
- [Deploy Orchestrator](deploy-orchestrator.md)
- [A/B Releases](../progressive-delivery/ab-releases.md)
- [Canary Controller](../progressive-delivery/canary.md)
- [Router Plugins](../progressive-delivery/routers.md)

# PROMOT: Promotion & Approval Manager
**Purpose**: Manage promotion requests, approvals, gates, and decision records.
## Modules
### Module: `promotion-manager`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Promotion request lifecycle; state management |
| **Dependencies** | `release-manager`, `environment-manager`, `workflow-engine` |
| **Data Entities** | `Promotion`, `PromotionState` |
| **Events Produced** | `promotion.requested`, `promotion.approved`, `promotion.rejected`, `promotion.started`, `promotion.completed`, `promotion.failed`, `promotion.rolled_back` |
**Key Operations**:
```
RequestPromotion(releaseId, targetEnvironmentId, reason) → Promotion
ApprovePromotion(promotionId, comment) → Promotion
RejectPromotion(promotionId, reason) → Promotion
CancelPromotion(promotionId) → Promotion
GetPromotionStatus(promotionId) → PromotionState
GetDecisionRecord(promotionId) → DecisionRecord
```
**Promotion Entity**:
```typescript
interface Promotion {
id: UUID;
tenantId: UUID;
releaseId: UUID;
sourceEnvironmentId: UUID | null; // null for first deployment
targetEnvironmentId: UUID;
status: PromotionStatus;
decisionRecord: DecisionRecord;
workflowRunId: UUID | null;
requestedAt: DateTime;
requestedBy: UUID;
requestReason: string;
decidedAt: DateTime | null;
startedAt: DateTime | null;
completedAt: DateTime | null;
evidencePacketId: UUID | null;
}
type PromotionStatus =
| "pending_approval" // Waiting for human approval
| "pending_gate" // Waiting for gate evaluation
| "approved" // Ready for deployment
| "rejected" // Blocked by approval or gate
| "deploying" // Deployment in progress
| "deployed" // Successfully deployed
| "failed" // Deployment failed
| "cancelled" // User cancelled
| "rolled_back"; // Rolled back after failure
```
---
### Module: `approval-gateway`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Approval collection; separation of duties enforcement |
| **Dependencies** | `authority` (for user/group lookup) |
| **Data Entities** | `Approval`, `ApprovalPolicy` |
| **Events Produced** | `approval.granted`, `approval.denied` |
**Approval Policy Entity**:
```typescript
interface ApprovalPolicy {
id: UUID;
tenantId: UUID;
environmentId: UUID;
requiredCount: number; // Minimum approvals required
requiredRoles: string[]; // At least one approver must have role
requiredGroups: string[]; // At least one approver must be in group
requireSeparationOfDuties: boolean; // Requester cannot approve
allowSelfApproval: boolean; // Override SoD for specific users
expirationMinutes: number; // Approval expires after N minutes
}
interface Approval {
id: UUID;
tenantId: UUID;
promotionId: UUID;
approverId: UUID;
action: "approved" | "rejected";
comment: string;
approvedAt: DateTime;
approverRole: string;
approverGroups: string[];
}
```
**Separation of Duties (SoD) Rules**:
1. Requester cannot approve their own promotion (if `requireSeparationOfDuties` is true)
2. Same user cannot approve twice
3. At least N different users must approve (based on `requiredCount`)
4. At least one approver must match `requiredRoles` if specified
5. At least one approver must be in `requiredGroups` if specified
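The five rules can be checked in one pass over the collected approvals. A sketch covering rules 1 through 4 (rule 5, group membership, follows the same pattern as roles; the vote and policy shapes are simplified from the entities above):

```typescript
interface ApprovalVote {
  approverId: string;
  approverRoles: string[];
}

interface PolicySpec {
  requiredCount: number;
  requiredRoles: string[];
  requireSeparationOfDuties: boolean;
}

function isSatisfied(
  requesterId: string,
  votes: ApprovalVote[],
  policy: PolicySpec,
): boolean {
  // Rule 2: same user cannot approve twice — dedupe by approver.
  const byUser = new Map(votes.map((v) => [v.approverId, v]));
  // Rule 1: requester cannot approve their own promotion.
  if (policy.requireSeparationOfDuties) byUser.delete(requesterId);
  // Rule 3: at least N distinct approvers.
  if (byUser.size < policy.requiredCount) return false;
  // Rule 4: at least one approver holds a required role, if any are specified.
  if (policy.requiredRoles.length > 0) {
    const hasRole = [...byUser.values()].some((v) =>
      v.approverRoles.some((r) => policy.requiredRoles.includes(r)),
    );
    if (!hasRole) return false;
  }
  return true;
}
```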
---
### Module: `decision-engine`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Gate evaluation; policy integration; decision record generation |
| **Dependencies** | `gate-registry`, `policy` (OPA integration), `scanner` (security data) |
| **Data Entities** | `DecisionRecord`, `GateResult` |
| **Events Produced** | `decision.evaluated`, `decision.recorded` |
**Decision Record Structure**:
```typescript
interface DecisionRecord {
promotionId: UUID;
evaluatedAt: DateTime;
decision: "allow" | "deny" | "pending";
// What was evaluated
release: {
id: UUID;
name: string;
components: Array<{
name: string;
digest: string;
semver: string;
}>;
};
environment: {
id: UUID;
name: string;
requiredApprovals: number;
freezeWindow: boolean;
};
// Gate evaluation results
gates: GateResult[];
// Approval status
approvalStatus: {
required: number;
received: number;
approvers: Array<{
userId: UUID;
action: string;
at: DateTime;
}>;
sodViolation: boolean;
};
// Reason for decision
reasons: string[];
// Hash of all inputs for replay verification
inputsHash: string;
}
interface GateResult {
gateType: string;
gateName: string;
status: "passed" | "failed" | "warning" | "skipped";
message: string;
details: Record<string, any>;
evaluatedAt: DateTime;
durationMs: number;
}
```
**Gate Evaluation Order**:
1. **Freeze Window Check**: Is environment in freeze?
2. **Approval Check**: All required approvals received?
3. **Security Gate**: No blocking vulnerabilities?
4. **Custom Policy Gates**: All OPA policies pass?
5. **Integration Gates**: External system checks pass?
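Under this ordering, the aggregation itself stays simple: the first blocking failure denies the promotion, and warnings, skips, and non-blocking failures pass through. A sketch (the per-result `blocking` flag mirrors `GateDefinition.blocking` and is an assumption here):

```typescript
interface GateOutcome {
  gateType: string;
  status: "passed" | "failed" | "warning" | "skipped";
  blocking: boolean;
}

function aggregateDecision(gates: GateOutcome[]): "allow" | "deny" {
  for (const gate of gates) {
    // Only a blocking failure denies; everything else is advisory.
    if (gate.status === "failed" && gate.blocking) return "deny";
  }
  return "allow";
}
```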
---
### Module: `gate-registry`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Built-in + custom gate registration |
| **Dependencies** | `plugin-registry` |
| **Data Entities** | `GateDefinition`, `GateConfig` |
**Built-in Gates**:
| Gate Type | Description |
|-----------|-------------|
| `freeze-window` | Check if environment is in freeze |
| `approval` | Check if required approvals received |
| `security-scan` | Check for blocking vulnerabilities |
| `scan-freshness` | Check if scan is recent enough |
| `digest-verification` | Verify digests haven't changed |
| `environment-sequence` | Enforce promotion order |
| `custom-opa` | Custom OPA/Rego policy |
| `webhook` | External webhook gate |
**Gate Definition**:
```typescript
interface GateDefinition {
type: string;
displayName: string;
description: string;
configSchema: JSONSchema;
evaluator: "builtin" | UUID; // builtin or plugin ID
blocking: boolean; // Can block promotion
cacheable: boolean; // Can cache result
cacheTtlSeconds: number;
}
```
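One implication of `cacheable`/`cacheTtlSeconds` is that a prior result may be reused within the TTL. A minimal sketch of that check (the `isCacheFresh` helper and its argument shapes are assumptions for illustration):

```typescript
interface CachedGateResult {
  evaluatedAt: number; // epoch milliseconds
}

// A cached gate result may be reused only when the definition allows
// caching and the result is younger than the configured TTL.
function isCacheFresh(
  def: { cacheable: boolean; cacheTtlSeconds: number },
  cached: CachedGateResult,
  nowMs: number,
): boolean {
  if (!def.cacheable) return false;
  return (nowMs - cached.evaluatedAt) / 1000 < def.cacheTtlSeconds;
}
```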
---
## Promotion State Machine
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ PROMOTION STATE MACHINE │
│ │
│ ┌───────────────┐ │
│ │ REQUESTED │ ◄──── User requests promotion │
│ └───────┬───────┘ │
│ │ │
│ ▼ │
│ ┌───────────────┐ ┌───────────────┐ │
│ │ PENDING │─────►│ REJECTED │ ◄──── Approver rejects │
│ │ APPROVAL │ └───────────────┘ │
│ └───────┬───────┘ │
│ │ approval received │
│ ▼ │
│ ┌───────────────┐ ┌───────────────┐ │
│ │ PENDING │─────►│ REJECTED │ ◄──── Gate fails │
│ │ GATE │ └───────────────┘ │
│ └───────┬───────┘ │
│ │ all gates pass │
│ ▼ │
│ ┌───────────────┐ │
│ │ APPROVED │ ◄──── Ready for deployment │
│ └───────┬───────┘ │
│ │ workflow starts │
│ ▼ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ DEPLOYING │─────►│ FAILED │─────►│ ROLLED_BACK │ │
│ └───────┬───────┘ └───────────────┘ └───────────────┘ │
│ │ │
│ │ deployment complete │
│ ▼ │
│ ┌───────────────┐ │
│ │ DEPLOYED │ ◄──── Success! │
│ └───────────────┘ │
│ │
│ Additional transitions: │
│ - Any non-terminal → CANCELLED: user cancels │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
---
## Database Schema
```sql
-- Promotions
CREATE TABLE release.promotions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
release_id UUID NOT NULL REFERENCES release.releases(id),
source_environment_id UUID REFERENCES release.environments(id),
target_environment_id UUID NOT NULL REFERENCES release.environments(id),
status VARCHAR(50) NOT NULL DEFAULT 'pending_approval' CHECK (status IN (
'pending_approval', 'pending_gate', 'approved', 'rejected',
'deploying', 'deployed', 'failed', 'cancelled', 'rolled_back'
)),
decision_record JSONB,
workflow_run_id UUID REFERENCES release.workflow_runs(id),
requested_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
requested_by UUID NOT NULL REFERENCES users(id),
request_reason TEXT,
decided_at TIMESTAMPTZ,
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
evidence_packet_id UUID
);
CREATE INDEX idx_promotions_tenant ON release.promotions(tenant_id);
CREATE INDEX idx_promotions_release ON release.promotions(release_id);
CREATE INDEX idx_promotions_status ON release.promotions(status);
CREATE INDEX idx_promotions_target_env ON release.promotions(target_environment_id);
-- Approvals
CREATE TABLE release.approvals (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
promotion_id UUID NOT NULL REFERENCES release.promotions(id) ON DELETE CASCADE,
approver_id UUID NOT NULL REFERENCES users(id),
action VARCHAR(50) NOT NULL CHECK (action IN ('approved', 'rejected')),
comment TEXT,
approved_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
approver_role VARCHAR(255),
approver_groups JSONB NOT NULL DEFAULT '[]'
);
CREATE INDEX idx_approvals_promotion ON release.approvals(promotion_id);
CREATE INDEX idx_approvals_approver ON release.approvals(approver_id);
-- Approval Policies
CREATE TABLE release.approval_policies (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
environment_id UUID NOT NULL REFERENCES release.environments(id) ON DELETE CASCADE,
required_count INTEGER NOT NULL DEFAULT 1,
required_roles JSONB NOT NULL DEFAULT '[]',
required_groups JSONB NOT NULL DEFAULT '[]',
require_sod BOOLEAN NOT NULL DEFAULT FALSE,
allow_self_approval BOOLEAN NOT NULL DEFAULT FALSE,
expiration_minutes INTEGER NOT NULL DEFAULT 1440,
UNIQUE (tenant_id, environment_id)
);
```
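The `require_sod` and `allow_self_approval` columns imply a tally like the following when counting approvals against a policy. This is a sketch under assumed names (`tallyApprovals`, camel-cased fields), not the actual implementation:

```typescript
interface ApprovalPolicy {
  requiredCount: number;
  requireSod: boolean;       // separation of duties
  allowSelfApproval: boolean;
}

interface ApprovalVote {
  approverId: string;
  action: "approved" | "rejected";
}

// Tally approvals for a promotion: the requester's own vote only
// counts when self-approval is allowed, and under SoD the requester
// approving at all is flagged as a violation.
function tallyApprovals(
  policy: ApprovalPolicy,
  requestedBy: string,
  votes: ApprovalVote[],
) {
  const approvals = votes.filter(v => v.action === "approved");
  const selfApproved = approvals.some(v => v.approverId === requestedBy);
  const counted = approvals.filter(
    v => policy.allowSelfApproval || v.approverId !== requestedBy,
  );
  return {
    received: counted.length,
    satisfied: counted.length >= policy.requiredCount,
    sodViolation: policy.requireSod && selfApproved,
  };
}
```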
---
## API Endpoints
```yaml
# Promotions
POST /api/v1/promotions
Body: { releaseId, targetEnvironmentId, reason? }
Response: Promotion
GET /api/v1/promotions
Query: ?status={status}&releaseId={uuid}&environmentId={uuid}&page={n}
Response: { data: Promotion[], meta: PaginationMeta }
GET /api/v1/promotions/{id}
Response: Promotion (with decision record, approvals)
POST /api/v1/promotions/{id}/approve
Body: { comment? }
Response: Promotion
POST /api/v1/promotions/{id}/reject
Body: { reason }
Response: Promotion
POST /api/v1/promotions/{id}/cancel
Response: Promotion
GET /api/v1/promotions/{id}/decision
Response: DecisionRecord
GET /api/v1/promotions/{id}/approvals
Response: Approval[]
GET /api/v1/promotions/{id}/evidence
Response: EvidencePacket
# Gate Evaluation Preview
POST /api/v1/promotions/preview-gates
Body: { releaseId, targetEnvironmentId }
Response: { wouldPass: boolean, gates: GateResult[] }
# Approval Policies
POST /api/v1/approval-policies
GET /api/v1/approval-policies
GET /api/v1/approval-policies/{id}
PUT /api/v1/approval-policies/{id}
DELETE /api/v1/approval-policies/{id}
# Pending Approvals (for current user)
GET /api/v1/my/pending-approvals
Response: Promotion[]
```
---
## Security Gate Integration
The security gate evaluates the release against vulnerability data from the Scanner module:
```typescript
interface SecurityGateConfig {
blockOnCritical: boolean; // Block if any critical severity
blockOnHigh: boolean; // Block if any high severity
maxCritical: number; // Max allowed critical (0 for strict)
maxHigh: number; // Max allowed high
requireFreshScan: boolean; // Require scan within N hours
scanFreshnessHours: number; // How recent scan must be
allowExceptions: boolean; // Allow VEX exceptions
requireVexJustification: boolean; // Require VEX for exceptions
}
interface SecurityGateResult {
passed: boolean;
summary: {
critical: number;
high: number;
medium: number;
low: number;
};
blocking: Array<{
cve: string;
severity: string;
component: string;
digest: string;
fixAvailable: boolean;
}>;
exceptions: Array<{
cve: string;
vexStatus: string;
justification: string;
}>;
  scanAge: Array<{
    component: string;
    scannedAt: DateTime;
    ageHours: number;
    fresh: boolean;
  }>;
}
```
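A minimal evaluation of `SecurityGateConfig` against severity counts might look like this. The sketch covers only the count thresholds; VEX exceptions and scan freshness are omitted, and `securityGatePasses` is an assumed name:

```typescript
interface SecurityGateThresholds {
  blockOnCritical: boolean;
  blockOnHigh: boolean;
  maxCritical: number;
  maxHigh: number;
}

interface SeveritySummary {
  critical: number;
  high: number;
}

// The gate passes when counts stay within the configured ceilings;
// the blockOn* flags act as shorthand for a ceiling of zero.
function securityGatePasses(
  cfg: SecurityGateThresholds,
  s: SeveritySummary,
): boolean {
  const criticalLimit = cfg.blockOnCritical ? 0 : cfg.maxCritical;
  const highLimit = cfg.blockOnHigh ? 0 : cfg.maxHigh;
  return s.critical <= criticalLimit && s.high <= highLimit;
}
```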
---
## References
- [Module Overview](overview.md)
- [Workflow Engine](workflow-engine.md)
- [Security Architecture](../security/overview.md)
- [API Documentation](../api/promotions.md)

# RELMAN: Release Management
**Purpose**: Manage components, versions, and release bundles.
## Modules
### Module: `component-registry`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Map image repositories to logical components |
| **Dependencies** | `integration-manager` (for registry access) |
| **Data Entities** | `Component`, `ComponentVersion` |
| **Events Produced** | `component.created`, `component.updated`, `component.deleted` |
**Key Operations**:
```
CreateComponent(name, displayName, imageRepository, registryId) → Component
UpdateComponent(id, config) → Component
DeleteComponent(id) → void
SyncVersions(componentId, forceRefresh) → VersionMap[]
ListComponents(tenantId) → Component[]
```
**Component Entity**:
```typescript
interface Component {
id: UUID;
tenantId: UUID;
name: string; // "api", "worker", "frontend"
displayName: string; // "API Service"
imageRepository: string; // "registry.example.com/myapp/api"
registryIntegrationId: UUID; // which registry integration
versioningStrategy: VersionStrategy;
deploymentTemplate: string; // which workflow template to use
defaultChannel: string; // "stable", "beta"
metadata: Record<string, string>;
}
interface VersionStrategy {
type: "semver" | "date" | "sequential" | "manual";
tagPattern?: string; // regex for tag extraction
semverExtract?: string; // regex capture group
}
```
---
### Module: `version-manager`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Tag/digest mapping; version rules |
| **Dependencies** | `component-registry`, `connector-runtime` |
| **Data Entities** | `VersionMap`, `VersionRule`, `Channel` |
| **Events Produced** | `version.resolved`, `version.updated` |
**Version Resolution**:
```typescript
interface VersionMap {
id: UUID;
componentId: UUID;
tag: string; // "v2.3.1"
digest: string; // "sha256:abc123..."
semver: string; // "2.3.1"
channel: string; // "stable"
prerelease: boolean;
buildMetadata: string;
resolvedAt: DateTime;
source: "auto" | "manual";
}
interface VersionRule {
id: UUID;
componentId: UUID;
pattern: string; // "^v(\\d+\\.\\d+\\.\\d+)$"
channel: string; // "stable"
prereleasePattern: string;// ".*-(alpha|beta|rc).*"
}
```
**Version Resolution Algorithm**:
1. Fetch tags from registry (via connector)
2. Apply version rules to extract semver
3. Resolve each tag to digest
4. Store in version map
5. Update channels ("latest stable", "latest beta")
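Steps 2 and 5 of the algorithm can be sketched as rule application over raw tags, following the `VersionRule` shape above (the `classifyTag` helper is an illustrative assumption):

```typescript
interface VersionRuleSketch {
  pattern: string;            // e.g. "^v(\\d+\\.\\d+\\.\\d+)"
  channel: string;            // e.g. "stable"
  prereleasePattern?: string; // e.g. "-(alpha|beta|rc)"
}

// Apply the first matching rule: extract the semver from the capture
// group and classify the tag into a channel / prerelease flag.
function classifyTag(tag: string, rules: VersionRuleSketch[]) {
  for (const rule of rules) {
    const m = tag.match(new RegExp(rule.pattern));
    if (!m) continue;
    const prerelease =
      !!rule.prereleasePattern && new RegExp(rule.prereleasePattern).test(tag);
    return { semver: m[1] ?? null, channel: rule.channel, prerelease };
  }
  return null; // tag carries no version for this component
}
```

Tags that match no rule are simply skipped, so arbitrary registry tags (e.g. CI build stamps) never pollute the version map.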
---
### Module: `release-manager`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Release bundle lifecycle; composition |
| **Dependencies** | `component-registry`, `version-manager` |
| **Data Entities** | `Release`, `ReleaseComponent` |
| **Events Produced** | `release.created`, `release.promoted`, `release.deprecated` |
**Release Entity**:
```typescript
interface Release {
id: UUID;
tenantId: UUID;
name: string; // "myapp-v2.3.1"
displayName: string; // "MyApp 2.3.1"
components: ReleaseComponent[];
sourceRef: SourceReference;
status: ReleaseStatus;
createdAt: DateTime;
createdBy: UUID;
deployedEnvironments: UUID[]; // where currently deployed
metadata: Record<string, string>;
}
interface ReleaseComponent {
componentId: UUID;
componentName: string;
digest: string; // sha256:...
semver: string; // resolved semver
tag: string; // original tag (for display)
role: "primary" | "sidecar" | "init" | "migration";
}
interface SourceReference {
scmIntegrationId?: UUID;
commitSha?: string;
branch?: string;
ciIntegrationId?: UUID;
buildId?: string;
pipelineUrl?: string;
}
type ReleaseStatus =
| "draft" // being composed
| "ready" // ready for promotion
| "promoting" // promotion in progress
| "deployed" // deployed to at least one env
| "deprecated" // marked as deprecated
| "archived"; // no longer active
```
**Release Creation Modes**:
| Mode | Description |
|------|-------------|
| **Full Release** | All components, latest versions |
| **Partial Release** | Subset of components updated; others pinned from last deployment |
| **Pinned Release** | All versions explicitly specified |
| **Channel Release** | All components from specific channel ("beta") |
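A **Partial Release** merges freshly selected versions with digests pinned from the last deployment. A sketch of that merge (names are illustrative):

```typescript
interface PinnedComponent {
  componentId: string;
  digest: string;
}

// Components explicitly updated take their new digest; every other
// component is carried over ("pinned") from the previous deployment.
function composePartialRelease(
  pinned: PinnedComponent[],
  updates: PinnedComponent[],
): PinnedComponent[] {
  const merged = new Map<string, PinnedComponent>(
    pinned.map(c => [c.componentId, c]),
  );
  for (const u of updates) merged.set(u.componentId, u);
  return [...merged.values()];
}
```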
---
### Module: `release-catalog`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Release history, search, comparison |
| **Dependencies** | `release-manager` |
**Key Operations**:
```
SearchReleases(filter, pagination) → Release[]
CompareReleases(releaseA, releaseB) → ReleaseDiff
GetReleaseHistory(componentId) → Release[]
GetReleaseLineage(releaseId) → ReleaseLineage // promotion path
```
**Release Comparison**:
```typescript
interface ReleaseDiff {
releaseA: UUID;
releaseB: UUID;
added: ComponentDiff[]; // Components in B not in A
removed: ComponentDiff[]; // Components in A not in B
changed: ComponentChange[]; // Components with different versions
unchanged: ComponentDiff[]; // Components with same version
}
interface ComponentChange {
componentId: UUID;
componentName: string;
fromVersion: string;
toVersion: string;
fromDigest: string;
toDigest: string;
}
```
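Comparison reduces to a keyed set difference over component digests. A sketch consistent with `ReleaseDiff`, simplified to names and digests:

```typescript
interface Comp { name: string; digest: string }

// Diff two releases by component name: present only in B = added,
// only in A = removed, both but with a different digest = changed.
function diffReleases(a: Comp[], b: Comp[]) {
  const byName = (xs: Comp[]) => new Map(xs.map(c => [c.name, c.digest] as const));
  const mapA = byName(a);
  const mapB = byName(b);
  return {
    added: b.filter(c => !mapA.has(c.name)),
    removed: a.filter(c => !mapB.has(c.name)),
    changed: b.filter(c => mapA.has(c.name) && mapA.get(c.name) !== c.digest),
    unchanged: b.filter(c => mapA.get(c.name) === c.digest),
  };
}
```

Diffing on digests rather than tags keeps the comparison honest even when two releases reuse the same tag string.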
---
## Database Schema
```sql
-- Components
CREATE TABLE release.components (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
display_name VARCHAR(255) NOT NULL,
image_repository VARCHAR(500) NOT NULL,
registry_integration_id UUID REFERENCES release.integrations(id),
versioning_strategy JSONB NOT NULL DEFAULT '{"type": "semver"}',
deployment_template VARCHAR(255),
default_channel VARCHAR(50) NOT NULL DEFAULT 'stable',
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (tenant_id, name)
);
CREATE INDEX idx_components_tenant ON release.components(tenant_id);
-- Version Maps
CREATE TABLE release.version_maps (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
component_id UUID NOT NULL REFERENCES release.components(id) ON DELETE CASCADE,
tag VARCHAR(255) NOT NULL,
digest VARCHAR(100) NOT NULL,
semver VARCHAR(50),
channel VARCHAR(50) NOT NULL DEFAULT 'stable',
prerelease BOOLEAN NOT NULL DEFAULT FALSE,
build_metadata VARCHAR(255),
resolved_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
source VARCHAR(50) NOT NULL DEFAULT 'auto',
UNIQUE (tenant_id, component_id, digest)
);
CREATE INDEX idx_version_maps_component ON release.version_maps(component_id);
CREATE INDEX idx_version_maps_digest ON release.version_maps(digest);
CREATE INDEX idx_version_maps_semver ON release.version_maps(semver);
-- Releases
CREATE TABLE release.releases (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
display_name VARCHAR(255) NOT NULL,
components JSONB NOT NULL, -- [{componentId, digest, semver, tag, role}]
source_ref JSONB, -- {scmIntegrationId, commitSha, ciIntegrationId, buildId}
status VARCHAR(50) NOT NULL DEFAULT 'draft',
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by UUID REFERENCES users(id),
UNIQUE (tenant_id, name)
);
CREATE INDEX idx_releases_tenant ON release.releases(tenant_id);
CREATE INDEX idx_releases_status ON release.releases(status);
CREATE INDEX idx_releases_created ON release.releases(created_at DESC);
-- Release Environment State
CREATE TABLE release.release_environment_state (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
environment_id UUID NOT NULL REFERENCES release.environments(id) ON DELETE CASCADE,
release_id UUID NOT NULL REFERENCES release.releases(id),
status VARCHAR(50) NOT NULL,
deployed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
deployed_by UUID REFERENCES users(id),
promotion_id UUID,
evidence_ref VARCHAR(255),
UNIQUE (tenant_id, environment_id)
);
CREATE INDEX idx_release_env_state_env ON release.release_environment_state(environment_id);
CREATE INDEX idx_release_env_state_release ON release.release_environment_state(release_id);
```
---
## API Endpoints
```yaml
# Components
POST /api/v1/components
Body: { name, displayName, imageRepository, registryIntegrationId, versioningStrategy?, defaultChannel? }
Response: Component
GET /api/v1/components
Response: Component[]
GET /api/v1/components/{id}
Response: Component
PUT /api/v1/components/{id}
Response: Component
DELETE /api/v1/components/{id}
Response: { deleted: true }
POST /api/v1/components/{id}/sync-versions
Body: { forceRefresh?: boolean }
Response: { synced: number, versions: VersionMap[] }
GET /api/v1/components/{id}/versions
Query: ?channel={stable|beta}&limit={n}
Response: VersionMap[]
# Version Maps
POST /api/v1/version-maps
Body: { componentId, tag, semver, channel } # manual version assignment
Response: VersionMap
GET /api/v1/version-maps
Query: ?componentId={uuid}&channel={channel}
Response: VersionMap[]
# Releases
POST /api/v1/releases
Body: {
name: string,
displayName?: string,
components: [
{ componentId: UUID, version?: string, digest?: string, channel?: string }
],
sourceRef?: SourceReference
}
Response: Release
GET /api/v1/releases
Query: ?status={status}&componentId={uuid}&page={n}&pageSize={n}
Response: { data: Release[], meta: PaginationMeta }
GET /api/v1/releases/{id}
Response: Release (with full component details)
PUT /api/v1/releases/{id}
Body: { displayName?, metadata?, status? }
Response: Release
DELETE /api/v1/releases/{id}
Response: { deleted: true }
GET /api/v1/releases/{id}/state
Response: { environments: [{ environmentId, status, deployedAt }] }
POST /api/v1/releases/{id}/deprecate
Response: Release
GET /api/v1/releases/{id}/compare/{otherId}
Response: ReleaseDiff
# Quick release creation
POST /api/v1/releases/from-latest
Body: {
name: string,
channel?: string, # default: stable
componentIds?: UUID[], # default: all
pinFrom?: { environmentId: UUID } # for partial release
}
Response: Release
```
---
## Release Identity: Digest-First Principle
A core design invariant of the Release Orchestrator:
```
INVARIANT: A release is a set of OCI image digests (component -> digest mapping), never tags.
```
**Implementation Requirements**:
- Tags are convenience inputs for resolution
- Tags are resolved to digests at release creation time
- All downstream operations (promotion, deployment, rollback) use digests
- Digest mismatch at pull time = deployment failure (tamper detection)
**Example**:
```json
{
"id": "release-uuid",
"name": "myapp-v2.3.1",
"components": [
{
"componentId": "api-component-uuid",
"componentName": "api",
"tag": "v2.3.1",
"digest": "sha256:abc123def456...",
"semver": "2.3.1",
"role": "primary"
},
{
"componentId": "worker-component-uuid",
"componentName": "worker",
"tag": "v2.3.1",
"digest": "sha256:789xyz123abc...",
"semver": "2.3.1",
"role": "primary"
}
]
}
```
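The tamper-detection requirement above reduces to comparing the digest frozen into the release at creation time with the digest actually resolved at pull time (a minimal sketch; the helper name is an assumption):

```typescript
// Deployment must abort when the digest resolved at pull time does
// not match the digest pinned into the release at creation time.
function verifyDigest(expected: string, pulled: string): void {
  if (expected !== pulled) {
    throw new Error(
      `digest mismatch: release pinned ${expected} but registry returned ${pulled}`,
    );
  }
}
```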
---
## References
- [Module Overview](overview.md)
- [Design Principles](../design/principles.md)
- [API Documentation](../api/releases.md)
- [Promotion Manager](promotion-manager.md)

# WORKFL: Workflow Engine
**Purpose**: DAG-based workflow execution for deployments, approvals, and custom automation.
## Modules
### Module: `workflow-designer`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Template creation; DAG graph editor; validation |
| **Dependencies** | `step-registry` |
| **Data Entities** | `WorkflowTemplate`, `StepNode`, `StepEdge` |
**Workflow Template Structure**:
```typescript
interface WorkflowTemplate {
id: UUID;
tenantId: UUID;
name: string;
displayName: string;
description: string;
version: number;
// DAG structure
nodes: StepNode[];
edges: StepEdge[];
// I/O
inputs: InputDefinition[];
outputs: OutputDefinition[];
// Metadata
tags: string[];
isBuiltin: boolean;
createdAt: DateTime;
createdBy: UUID;
}
```
---
### Module: `workflow-engine`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | DAG execution; state machine; pause/resume |
| **Dependencies** | `step-executor`, `step-registry` |
| **Data Entities** | `WorkflowRun`, `WorkflowState` |
| **Events Produced** | `workflow.started`, `workflow.paused`, `workflow.resumed`, `workflow.completed`, `workflow.failed` |
**Workflow Execution Algorithm**:
```python
class WorkflowEngine:
def execute(self, workflow_run: WorkflowRun) -> None:
"""Main workflow execution loop."""
# Initialize
workflow_run.status = "running"
workflow_run.started_at = now()
self.save(workflow_run)
try:
while not self.is_terminal(workflow_run):
# Handle pause state
if workflow_run.status == "paused":
self.wait_for_resume(workflow_run)
continue
# Get nodes ready for execution
ready_nodes = self.get_ready_nodes(workflow_run)
if not ready_nodes:
# Check if we're waiting on approvals
if self.has_pending_approvals(workflow_run):
workflow_run.status = "paused"
self.save(workflow_run)
continue
# Check if all nodes are complete
if self.all_nodes_complete(workflow_run):
break
# Deadlock detection
raise WorkflowDeadlockError(workflow_run.id)
# Execute ready nodes in parallel
futures = []
for node in ready_nodes:
future = self.executor.submit(
self.execute_node,
workflow_run,
node
)
futures.append((node, future))
# Wait for at least one to complete
completed = self.wait_any(futures)
for node, result in completed:
step_run = self.get_step_run(workflow_run, node.id)
if result.success:
step_run.status = "succeeded"
step_run.outputs = result.outputs
self.propagate_outputs(workflow_run, node, result.outputs)
                    else:
                        step_run.status = "failed"
                        step_run.error_message = result.error
                        # Handle failure action
                        if node.on_failure == "fail":
                            workflow_run.status = "failed"
                            workflow_run.error_message = f"Step {node.name} failed: {result.error}"
                            workflow_run.completed_at = now()
                            # Persist the failed step before aborting the run
                            step_run.completed_at = now()
                            self.save(step_run)
                            self.cancel_pending_steps(workflow_run)
                            self.save(workflow_run)
                            return
                        elif node.on_failure == "rollback":
                            self.trigger_rollback(workflow_run, node)
                        elif node.on_failure.startswith("goto:"):
                            target = node.on_failure.split(":", 1)[1]
                            self.add_ready_node(workflow_run, target)
                        # "continue" simply proceeds to the next ready nodes
                    step_run.completed_at = now()
                    self.save(step_run)
# Workflow completed successfully
workflow_run.status = "succeeded"
workflow_run.completed_at = now()
self.save(workflow_run)
except WorkflowCancelledError:
workflow_run.status = "cancelled"
workflow_run.completed_at = now()
self.save(workflow_run)
except Exception as e:
workflow_run.status = "failed"
workflow_run.error_message = str(e)
workflow_run.completed_at = now()
self.save(workflow_run)
```
---
### Module: `step-executor`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Step dispatch; retry logic; timeout handling |
| **Dependencies** | `step-registry`, `plugin-sandbox` |
| **Data Entities** | `StepRun`, `StepResult` |
| **Events Produced** | `step.started`, `step.progress`, `step.completed`, `step.failed`, `step.retrying` |
**Step Node Structure**:
```typescript
interface StepNode {
id: string; // Unique within template (e.g., "deploy-api")
type: string; // Step type from registry
name: string; // Display name
config: Record<string, any>; // Step-specific configuration
inputs: InputBinding[]; // Input value bindings
outputs: OutputBinding[]; // Output declarations
position: { x: number; y: number }; // UI position
// Execution settings
timeout: number; // Seconds (default from step type)
retryPolicy: RetryPolicy;
onFailure: FailureAction;
condition?: string; // JS expression for conditional execution
// Documentation
description?: string;
documentation?: string;
}
type FailureAction = "fail" | "continue" | "rollback" | `goto:${string}`; // e.g. "goto:rollback-handler"
interface InputBinding {
name: string; // Input parameter name
source: InputSource;
}
type InputSource =
| { type: "literal"; value: any }
| { type: "context"; path: string } // e.g., "release.name"
| { type: "output"; nodeId: string; outputName: string }
| { type: "secret"; secretName: string }
| { type: "expression"; expression: string }; // JS expression
interface StepEdge {
id: string;
from: string; // Source node ID
to: string; // Target node ID
condition?: string; // Optional condition expression
label?: string; // Display label for conditional edges
}
interface RetryPolicy {
maxRetries: number;
backoffType: "fixed" | "exponential";
backoffSeconds: number;
retryableErrors: string[];
}
```
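Resolving an `InputBinding` is a dispatch over the source variants. A sketch covering the `literal`, `context`, and `output` cases (secrets and expressions omitted; `resolveInput` and `getPath` are illustrative names):

```typescript
type InputSourceSketch =
  | { type: "literal"; value: unknown }
  | { type: "context"; path: string }
  | { type: "output"; nodeId: string; outputName: string };

// Walk a dotted path such as "release.name" through the run context.
function getPath(obj: Record<string, unknown>, path: string): unknown {
  return path.split(".").reduce<unknown>(
    (cur, key) => (cur as Record<string, unknown> | undefined)?.[key],
    obj,
  );
}

function resolveInput(
  source: InputSourceSketch,
  context: Record<string, unknown>,
  outputs: Record<string, Record<string, unknown>>, // nodeId -> outputs
): unknown {
  switch (source.type) {
    case "literal": return source.value;
    case "context": return getPath(context, source.path);
    case "output":  return outputs[source.nodeId]?.[source.outputName];
  }
}
```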
---
### Module: `step-registry`
| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Built-in + plugin-provided step types |
| **Dependencies** | `plugin-registry` |
| **Data Entities** | `StepType`, `StepSchema` |
**Built-in Step Types**:
| Step Type | Category | Description |
|-----------|----------|-------------|
| `approval` | Control | Wait for human approval |
| `security-gate` | Gate | Evaluate security policy |
| `custom-gate` | Gate | Custom OPA policy evaluation |
| `deploy-docker` | Deploy | Deploy single container |
| `deploy-compose` | Deploy | Deploy Docker Compose stack |
| `deploy-ecs` | Deploy | Deploy to AWS ECS |
| `deploy-nomad` | Deploy | Deploy to HashiCorp Nomad |
| `health-check` | Verify | HTTP/TCP health check |
| `smoke-test` | Verify | Run smoke test suite |
| `execute-script` | Custom | Run C#/Bash script |
| `webhook` | Integration | Call external webhook |
| `trigger-ci` | Integration | Trigger CI pipeline |
| `wait-ci` | Integration | Wait for CI pipeline |
| `notify` | Notification | Send notification |
| `rollback` | Recovery | Rollback deployment |
| `traffic-shift` | Progressive | Shift traffic percentage |
**Step Type Definition**:
```typescript
interface StepType {
type: string; // "deploy-compose"
displayName: string; // "Deploy Compose Stack"
description: string;
category: StepCategory;
icon: string;
// Schema
configSchema: JSONSchema; // Step configuration schema
inputSchema: JSONSchema; // Required inputs schema
outputSchema: JSONSchema; // Produced outputs schema
// Execution
executor: "builtin" | UUID; // builtin or plugin ID
defaultTimeout: number;
safeToRetry: boolean;
retryableErrors: string[];
// Documentation
documentation: string;
examples: StepExample[];
}
```
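The retry-related fields (`safeToRetry`, `retryableErrors`, and the per-node `RetryPolicy`) combine into a decision like the following. This is a sketch; `shouldRetry` and `backoffDelaySeconds` are illustrative names:

```typescript
interface RetryPolicySketch {
  maxRetries: number;
  backoffType: "fixed" | "exponential";
  backoffSeconds: number;
  retryableErrors: string[]; // error codes eligible for retry
}

// A failed attempt is retried only while attempts remain and the
// error code is declared retryable (empty list = retry anything).
function shouldRetry(policy: RetryPolicySketch, attempt: number, errorCode: string): boolean {
  if (attempt > policy.maxRetries) return false;
  return policy.retryableErrors.length === 0
    || policy.retryableErrors.includes(errorCode);
}

// Delay before retry attempt N (1-based): fixed, or doubling each time.
function backoffDelaySeconds(policy: RetryPolicySketch, attempt: number): number {
  return policy.backoffType === "fixed"
    ? policy.backoffSeconds
    : policy.backoffSeconds * 2 ** (attempt - 1);
}
```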
---
## Workflow Run State Machine
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ WORKFLOW RUN STATE MACHINE │
│ │
│ ┌──────────┐ │
│ │ CREATED │ │
│ └────┬─────┘ │
│ │ start() │
│ ▼ │
│ ┌─────────────────────────────┐ │
│ │ │ │
│ pause() ┌──┴──────────┐ │ │
│ ┌────────►│ PAUSED │◄─────────┐ │ │
│ │ └──────┬──────┘ │ │ │
│ │ │ resume() │ │ │
│ │ ▼ │ │ │
│ │ ┌─────────────┐ │ │ │
│ └─────────│ RUNNING │──────────┘ │ │
│ └──────┬──────┘ (waiting for │ │
│ │ approval) │ │
│ ┌────────────┼────────────┐ │ │
│ │ │ │ │ │
│ ▼ ▼ ▼ │ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ │
│ │ SUCCEEDED │ │ FAILED │ │ CANCELLED │ │ │
│ └───────────┘ └───────────┘ └───────────┘ │ │
│ │
│ Transitions: │
│ - CREATED → RUNNING: start() │
│ - RUNNING → PAUSED: pause(), waiting approval │
│ - PAUSED → RUNNING: resume(), approval granted │
│ - RUNNING → SUCCEEDED: all nodes complete │
│ - RUNNING → FAILED: node fails with fail action │
│ - RUNNING → CANCELLED: cancel() │
│ - PAUSED → CANCELLED: cancel() │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Step Run State Machine
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP RUN STATE MACHINE │
│ │
│ ┌──────────┐ │
│ │ PENDING │ ◄──── Initial state; dependencies not met │
│ └────┬─────┘ │
│ │ dependencies met + condition true │
│ ▼ │
│ ┌──────────┐ │
│ │ RUNNING │ ◄──── Step is executing │
│ └────┬─────┘ │
│ │ │
│ ┌────┴────────────────┬─────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ SUCCEEDED │ │ FAILED │ │ SKIPPED │ │
│ └───────────┘ └─────┬─────┘ └───────────┘ │
│ │ ▲ │
│ │ │ condition false │
│ ▼ │ │
│ ┌───────────┐ │ │
│ │ RETRYING │──────┘ (max retries exceeded) │
│ └─────┬─────┘ │
│ │ │
│ │ retry attempt │
│ └──────────────────┐ │
│ │ │
│ ▼ │
│ ┌──────────┐ │
│ │ RUNNING │ (retry) │
│ └──────────┘ │
│ │
│ Additional transitions: │
│ - Any state → CANCELLED: workflow cancelled │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
---
## Database Schema
```sql
-- Workflow Templates
CREATE TABLE release.workflow_templates (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID REFERENCES tenants(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
display_name VARCHAR(255) NOT NULL,
description TEXT,
version INTEGER NOT NULL DEFAULT 1,
nodes JSONB NOT NULL,
edges JSONB NOT NULL,
inputs JSONB NOT NULL DEFAULT '[]',
outputs JSONB NOT NULL DEFAULT '[]',
tags JSONB NOT NULL DEFAULT '[]',
is_builtin BOOLEAN NOT NULL DEFAULT FALSE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by UUID REFERENCES users(id)
);
CREATE INDEX idx_workflow_templates_tenant ON release.workflow_templates(tenant_id);
CREATE INDEX idx_workflow_templates_name ON release.workflow_templates(name);
-- Workflow Runs
CREATE TABLE release.workflow_runs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
template_id UUID NOT NULL REFERENCES release.workflow_templates(id),
template_version INTEGER NOT NULL,
status VARCHAR(50) NOT NULL DEFAULT 'created',
context JSONB NOT NULL,
inputs JSONB NOT NULL DEFAULT '{}',
outputs JSONB NOT NULL DEFAULT '{}',
error_message TEXT,
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by UUID REFERENCES users(id)
);
CREATE INDEX idx_workflow_runs_tenant ON release.workflow_runs(tenant_id);
CREATE INDEX idx_workflow_runs_template ON release.workflow_runs(template_id);
CREATE INDEX idx_workflow_runs_status ON release.workflow_runs(status);
-- Step Runs
CREATE TABLE release.step_runs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
workflow_run_id UUID NOT NULL REFERENCES release.workflow_runs(id) ON DELETE CASCADE,
node_id VARCHAR(255) NOT NULL,
status VARCHAR(50) NOT NULL DEFAULT 'pending',
inputs JSONB NOT NULL DEFAULT '{}',
outputs JSONB NOT NULL DEFAULT '{}',
error_message TEXT,
logs TEXT,
attempt_number INTEGER NOT NULL DEFAULT 1,
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
UNIQUE (workflow_run_id, node_id)
);
CREATE INDEX idx_step_runs_workflow ON release.step_runs(workflow_run_id);
CREATE INDEX idx_step_runs_status ON release.step_runs(status);
-- Step Registry
CREATE TABLE release.step_types (
type VARCHAR(255) PRIMARY KEY,
display_name VARCHAR(255) NOT NULL,
description TEXT,
category VARCHAR(100) NOT NULL,
icon VARCHAR(255),
config_schema JSONB NOT NULL,
input_schema JSONB NOT NULL,
output_schema JSONB NOT NULL,
executor VARCHAR(255) NOT NULL DEFAULT 'builtin',
default_timeout INTEGER NOT NULL DEFAULT 300,
safe_to_retry BOOLEAN NOT NULL DEFAULT FALSE,
retryable_errors JSONB NOT NULL DEFAULT '[]',
documentation TEXT,
examples JSONB NOT NULL DEFAULT '[]',
plugin_id UUID REFERENCES release.plugins(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_step_types_category ON release.step_types(category);
CREATE INDEX idx_step_types_plugin ON release.step_types(plugin_id);
```
---
## Workflow Template Example: Standard Deployment
```json
{
"id": "template-standard-deploy",
"name": "standard-deploy",
"displayName": "Standard Deployment",
"version": 1,
"inputs": [
{ "name": "releaseId", "type": "uuid", "required": true },
{ "name": "environmentId", "type": "uuid", "required": true },
{ "name": "promotionId", "type": "uuid", "required": true }
],
"nodes": [
{
"id": "approval",
"type": "approval",
"name": "Approval Gate",
"config": {},
"inputs": [
{ "name": "promotionId", "source": { "type": "context", "path": "promotionId" } }
],
"position": { "x": 100, "y": 100 }
},
{
"id": "security-gate",
"type": "security-gate",
"name": "Security Verification",
"config": {
"blockOnCritical": true,
"blockOnHigh": true
},
"inputs": [
{ "name": "releaseId", "source": { "type": "context", "path": "releaseId" } }
],
"position": { "x": 100, "y": 200 }
},
{
"id": "deploy-targets",
"type": "deploy-compose",
"name": "Deploy to Targets",
"config": {
"strategy": "rolling",
"parallelism": 2
},
"inputs": [
{ "name": "releaseId", "source": { "type": "context", "path": "releaseId" } },
{ "name": "environmentId", "source": { "type": "context", "path": "environmentId" } }
],
"timeout": 600,
"retryPolicy": {
"maxRetries": 2,
"backoffType": "exponential",
"backoffSeconds": 30
},
"onFailure": "rollback",
"position": { "x": 100, "y": 400 }
},
{
"id": "health-check",
"type": "health-check",
"name": "Health Verification",
"config": {
"type": "http",
"path": "/health",
"expectedStatus": 200,
"timeout": 30,
"retries": 5
},
"inputs": [
{ "name": "targets", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "deployedTargets" } }
],
"onFailure": "rollback",
"position": { "x": 100, "y": 500 }
},
{
"id": "notify-success",
"type": "notify",
"name": "Success Notification",
"config": {
"channel": "slack",
"template": "deployment-success"
},
"onFailure": "continue",
"position": { "x": 100, "y": 700 }
},
{
"id": "rollback-handler",
"type": "rollback",
"name": "Rollback Handler",
"config": {
"strategy": "to-previous"
},
"inputs": [
{ "name": "deploymentJobId", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "jobId" } }
],
"position": { "x": 300, "y": 450 }
}
],
"edges": [
{ "id": "e1", "from": "approval", "to": "security-gate" },
{ "id": "e2", "from": "security-gate", "to": "deploy-targets" },
{ "id": "e3", "from": "deploy-targets", "to": "health-check" },
{ "id": "e4", "from": "health-check", "to": "notify-success" },
{ "id": "e5", "from": "deploy-targets", "to": "rollback-handler", "condition": "status === 'failed'" },
{ "id": "e6", "from": "health-check", "to": "rollback-handler", "condition": "status === 'failed'" }
]
}
```
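The engine's `get_ready_nodes` step can be sketched against templates like the one above: a node is ready when it is still pending and every unconditional incoming edge comes from a succeeded node. Conditional edges (such as `status === 'failed'` routing to the rollback handler) are evaluated separately and are ignored in this sketch:

```typescript
interface DagEdge { from: string; to: string; condition?: string }
type StepStatus = "pending" | "running" | "succeeded" | "failed" | "skipped";

// A node is ready when it is pending and all of its unconditional
// upstream dependencies have succeeded; nodes with no incoming
// unconditional edges are ready immediately.
function getReadyNodes(
  nodeIds: string[],
  edges: DagEdge[],
  status: Record<string, StepStatus>,
): string[] {
  return nodeIds.filter(id => {
    if (status[id] !== "pending") return false;
    const incoming = edges.filter(e => e.to === id && !e.condition);
    return incoming.every(e => status[e.from] === "succeeded");
  });
}
```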
---
## API Endpoints
See [API Documentation](../api/workflows.md) for full specification.
```yaml
# Workflow Templates
POST /api/v1/workflow-templates
GET /api/v1/workflow-templates
GET /api/v1/workflow-templates/{id}
PUT /api/v1/workflow-templates/{id}
DELETE /api/v1/workflow-templates/{id}
POST /api/v1/workflow-templates/{id}/validate
# Step Registry
GET /api/v1/step-types
GET /api/v1/step-types/{type}
# Workflow Runs
POST /api/v1/workflow-runs
GET /api/v1/workflow-runs
GET /api/v1/workflow-runs/{id}
POST /api/v1/workflow-runs/{id}/pause
POST /api/v1/workflow-runs/{id}/resume
POST /api/v1/workflow-runs/{id}/cancel
GET /api/v1/workflow-runs/{id}/steps
GET /api/v1/workflow-runs/{id}/steps/{nodeId}
GET /api/v1/workflow-runs/{id}/steps/{nodeId}/logs
GET /api/v1/workflow-runs/{id}/steps/{nodeId}/artifacts
```
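The run and step endpoints above nest predictably, so client code can derive URLs from a couple of helpers. A small sketch; the function names and base-path handling are assumptions, not part of the API contract:

```typescript
// Hypothetical URL helpers for the workflow-run endpoints listed above.
const API_BASE = "/api/v1";

function workflowRunPath(
  runId: string,
  sub?: "steps" | "pause" | "resume" | "cancel",
): string {
  const base = `${API_BASE}/workflow-runs/${encodeURIComponent(runId)}`;
  return sub ? `${base}/${sub}` : base;
}

function stepPath(runId: string, nodeId: string, sub?: "logs" | "artifacts"): string {
  const base = `${workflowRunPath(runId, "steps")}/${encodeURIComponent(nodeId)}`;
  return sub ? `${base}/${sub}` : base;
}
```

For example, `stepPath("run-1", "deploy-targets", "logs")` resolves to `/api/v1/workflow-runs/run-1/steps/deploy-targets/logs`.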
---
## References
- [Module Overview](overview.md)
- [Workflow Templates](../workflow/templates.md)
- [Execution State Machine](../workflow/execution.md)
- [API Documentation](../api/workflows.md)

View File

@@ -0,0 +1,246 @@
# Alerting Rules
> Prometheus alerting rules for the Release Orchestrator.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 13.4](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Metrics](metrics.md), [Observability Overview](overview.md)
## Overview
The Release Orchestrator provides Prometheus alerting rules for monitoring promotions, deployments, agents, and integrations.
---
## High Priority Alerts
### Security Gate Block Rate
```yaml
- alert: PromotionGateBlockRate
expr: |
rate(stella_security_gate_results_total{result="blocked"}[1h]) /
rate(stella_security_gate_results_total[1h]) > 0.5
for: 15m
labels:
severity: warning
annotations:
summary: "High rate of security gate blocks"
description: "More than 50% of promotions are being blocked by security gates"
```
### Deployment Failure Rate
```yaml
- alert: DeploymentFailureRate
expr: |
rate(stella_deployments_total{status="failed"}[1h]) /
rate(stella_deployments_total[1h]) > 0.1
for: 10m
labels:
severity: critical
annotations:
summary: "High deployment failure rate"
description: "More than 10% of deployments are failing"
```
### Agent Offline
```yaml
- alert: AgentOffline
expr: |
stella_agents_status{status="offline"} == 1
for: 5m
labels:
severity: warning
annotations:
summary: "Agent offline"
description: "Agent {{ $labels.agent_id }} has been offline for 5 minutes"
```
### Promotion Stuck
```yaml
- alert: PromotionStuck
expr: |
time() - stella_promotion_start_time{status="deploying"} > 1800
for: 5m
labels:
severity: warning
annotations:
summary: "Promotion stuck in deploying state"
description: "Promotion {{ $labels.promotion_id }} has been deploying for more than 30 minutes"
```
### Integration Unhealthy
```yaml
- alert: IntegrationUnhealthy
expr: |
stella_integration_health{status="unhealthy"} == 1
for: 10m
labels:
severity: warning
annotations:
summary: "Integration unhealthy"
description: "Integration {{ $labels.integration_name }} has been unhealthy for 10 minutes"
```
---
## Medium Priority Alerts
### Workflow Step Timeout
```yaml
- alert: WorkflowStepTimeout
  expr: |
    histogram_quantile(0.99,
      sum(rate(stella_workflow_step_duration_seconds_bucket[5m])) by (le, step_type)
    ) > 600
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "Workflow step taking too long"
    description: "P99 duration for step type {{ $labels.step_type }} has exceeded 10 minutes"
```
### Evidence Generation Failure
```yaml
- alert: EvidenceGenerationFailure
expr: |
rate(stella_evidence_generation_failures_total[1h]) > 0
for: 5m
labels:
severity: warning
annotations:
summary: "Evidence generation failures"
description: "Evidence generation is failing, affecting audit compliance"
```
### Target Health Degraded
```yaml
- alert: TargetHealthDegraded
expr: |
stella_target_health{status!="healthy"} == 1
for: 5m
labels:
severity: warning
annotations:
summary: "Target health degraded"
description: "Target {{ $labels.target_name }} is reporting {{ $labels.status }}"
```
### Approval Timeout
```yaml
- alert: ApprovalTimeout
expr: |
time() - stella_promotion_approval_requested_time > 86400
for: 1h
labels:
severity: warning
annotations:
summary: "Promotion awaiting approval for too long"
description: "Promotion {{ $labels.promotion_id }} has been waiting for approval for more than 24 hours"
```
---
## Low Priority Alerts
### Database Connection Pool
```yaml
- alert: DatabaseConnectionPoolExhausted
expr: |
stella_db_connection_pool_available < 5
for: 5m
labels:
severity: warning
annotations:
summary: "Database connection pool running low"
description: "Only {{ $value }} database connections available"
```
### Plugin Error Rate
```yaml
- alert: PluginErrorRate
expr: |
rate(stella_plugin_errors_total[5m]) > 1
for: 5m
labels:
severity: warning
annotations:
summary: "Plugin errors detected"
description: "Plugin {{ $labels.plugin_id }} is experiencing errors"
```
---
## Alert Routing
### Example AlertManager Configuration
```yaml
# alertmanager.yaml
route:
receiver: default
group_by: [alertname, severity]
group_wait: 30s
group_interval: 5m
repeat_interval: 4h
routes:
- match:
severity: critical
receiver: pagerduty
continue: true
- match:
severity: warning
receiver: slack
receivers:
- name: default
webhook_configs:
- url: http://webhook.example.com/alerts
- name: pagerduty
pagerduty_configs:
- service_key: ${PAGERDUTY_KEY}
severity: critical
- name: slack
slack_configs:
- channel: '#alerts'
api_url: ${SLACK_WEBHOOK_URL}
title: '{{ .CommonAnnotations.summary }}'
text: '{{ .CommonAnnotations.description }}'
```
---
## Dashboard Integration
### Grafana Alert Panels
Recommended dashboard panels for alerts:
| Panel | Query |
|-------|-------|
| Active Alerts | `count(ALERTS{alertstate="firing"})` |
| Alert History | `count_over_time(ALERTS{alertstate="firing"}[24h])` |
| By Severity | `count(ALERTS{alertstate="firing"}) by (severity)` |
| By Component | `count(ALERTS{alertstate="firing"}) by (alertname)` |
---
## See Also
- [Metrics](metrics.md)
- [Observability Overview](overview.md)
- [Logging](logging.md)
- [Tracing](tracing.md)

View File

@@ -0,0 +1,220 @@
# Logging Specification
> Structured logging format and categories for the Release Orchestrator.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 13.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Observability Overview](overview.md), [Tracing](tracing.md)
## Overview
The Release Orchestrator uses structured JSON logging with consistent format, correlation IDs, and context propagation for all components.
---
## Structured Log Format
### JSON Schema
```json
{
"timestamp": "2026-01-09T14:32:15.123Z",
"level": "info",
"module": "promotion-manager",
"message": "Promotion approved",
"context": {
"tenant_id": "uuid",
"promotion_id": "uuid",
"release_id": "uuid",
"environment": "prod",
"user_id": "uuid"
},
"details": {
"approvals_count": 2,
"gates_passed": ["security", "approval", "freeze"],
"decision": "allow"
},
"trace_id": "abc123",
"span_id": "def456",
"duration_ms": 45
}
```
---
## Log Levels
| Level | Usage |
|-------|-------|
| `error` | Errors requiring attention; failures that impact functionality |
| `warn` | Potential issues; degraded functionality; approaching limits |
| `info` | Significant events; state changes; audit-relevant actions |
| `debug` | Detailed debugging info; request/response bodies |
| `trace` | Very detailed tracing; internal state; performance profiling |
---
## Log Categories
| Category | Examples |
|----------|----------|
| `api` | Request received, response sent, validation errors |
| `promotion` | Promotion requested, approved, rejected, completed |
| `deployment` | Deployment started, task assigned, completed, failed |
| `security` | Gate evaluation, vulnerability found, policy violation |
| `agent` | Agent registered, heartbeat, task execution |
| `workflow` | Workflow started, step executed, completed |
| `integration` | Integration tested, resource discovered, webhook received |
---
## Logging Examples
### API Request
```json
{
"timestamp": "2026-01-09T14:32:15.123Z",
"level": "info",
"module": "api",
"message": "Request received",
"context": {
"tenant_id": "uuid",
"user_id": "uuid"
},
"details": {
"method": "POST",
"path": "/api/v1/promotions",
"status": 201,
"duration_ms": 125
},
"trace_id": "abc123",
"span_id": "def456"
}
```
### Promotion Event
```json
{
"timestamp": "2026-01-09T14:32:15.123Z",
"level": "info",
"module": "promotion-manager",
"message": "Promotion approved",
"context": {
"tenant_id": "uuid",
"promotion_id": "uuid",
"release_id": "uuid",
"environment": "prod",
"user_id": "uuid"
},
"details": {
"approvals_count": 2,
"gates_passed": ["security", "approval", "freeze"],
"decision": "allow"
},
"trace_id": "abc123",
"span_id": "def456",
"duration_ms": 45
}
```
### Security Gate Failure
```json
{
"timestamp": "2026-01-09T14:32:15.123Z",
"level": "warn",
"module": "security",
"message": "Security gate blocked promotion",
"context": {
"tenant_id": "uuid",
"promotion_id": "uuid",
"release_id": "uuid",
"environment": "prod"
},
"details": {
"gate_name": "security-gate",
"reason": "Critical vulnerability found",
"vulnerabilities": {
"critical": 1,
"high": 3
}
},
"trace_id": "abc123",
"span_id": "def456"
}
```
---
## Sensitive Data Masking
The following fields are automatically masked in logs:
| Field Type | Masking Strategy |
|------------|------------------|
| Passwords | Not logged |
| API Keys | First 4 and last 4 chars only |
| Tokens | Hash only |
| PII | Redacted |
| Credentials | Not logged |
### Example
```json
{
"message": "Authentication succeeded",
"details": {
"api_key": "sk_l...abcd",
"token_hash": "sha256:abc123..."
}
}
```
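The masking strategies in the table can be sketched as small pure functions. The names `maskApiKey` and `hashToken` are illustrative, not part of the logging spec:

```typescript
import { createHash } from "node:crypto";

// Illustrative masking helpers matching the table above.
function maskApiKey(key: string): string {
  // "First 4 and last 4 chars only"; short keys are fully redacted.
  if (key.length <= 8) return "[REDACTED]";
  return `${key.slice(0, 4)}...${key.slice(-4)}`;
}

function hashToken(token: string): string {
  // Tokens are logged as a hash only, never in the clear.
  return `sha256:${createHash("sha256").update(token).digest("hex")}`;
}

// maskApiKey("sk_live_1234abcd") yields "sk_l...abcd", as in the example above.
```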
---
## Correlation IDs
All logs include correlation IDs for request tracing:
| Field | Description |
|-------|-------------|
| `trace_id` | W3C Trace Context trace ID |
| `span_id` | Current operation span ID |
| `correlation_id` | Business-level correlation (optional) |
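A child-logger sketch that stamps these fields onto every entry, matching the JSON schema earlier in this document (the `makeLogger` helper is illustrative, not a prescribed API):

```typescript
// Illustrative child-logger sketch: every entry carries the correlation
// fields from the table above.
interface LogContext {
  trace_id: string;
  span_id: string;
  correlation_id?: string;
}

function makeLogger(ctx: LogContext) {
  return {
    info(module: string, message: string, details?: Record<string, unknown>) {
      const entry = {
        timestamp: new Date().toISOString(),
        level: "info",
        module,
        message,
        ...(details ? { details } : {}),
        ...ctx, // trace_id / span_id / correlation_id on every line
      };
      console.log(JSON.stringify(entry));
      return entry;
    },
  };
}
```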
---
## Log Aggregation
Recommended log aggregation setup:
```yaml
# Fluent Bit configuration
[INPUT]
Name tail
Path /var/log/stella/*.log
Parser json
[FILTER]
Name nest
Match *
Operation lift
Nested_under context
[OUTPUT]
Name opensearch
Match *
Host opensearch.example.com
Index stella-logs
```
---
## See Also
- [Observability Overview](overview.md)
- [Tracing](tracing.md)
- [Alerting](alerting.md)
- [Security Overview](../security/overview.md)

View File

@@ -0,0 +1,274 @@
# Metrics Specification
## Overview
The Release Orchestrator exposes Prometheus-compatible metrics for monitoring deployment health, performance, and operational status.
## Core Metrics
### Release Metrics
| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_releases_total` | counter | Total releases created | `tenant`, `status` |
| `stella_releases_active` | gauge | Currently active releases | `tenant`, `status` |
| `stella_release_components_count` | histogram | Components per release | `tenant` |
### Promotion Metrics
| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_promotions_total` | counter | Total promotions | `tenant`, `env`, `status` |
| `stella_promotions_in_progress` | gauge | Promotions currently in progress | `tenant`, `env` |
| `stella_promotion_duration_seconds` | histogram | Time from request to completion | `tenant`, `env`, `status` |
| `stella_approval_pending_count` | gauge | Pending approvals | `tenant`, `env` |
| `stella_approval_duration_seconds` | histogram | Time to approve | `tenant`, `env` |
### Deployment Metrics
| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_deployments_total` | counter | Total deployments | `tenant`, `env`, `strategy`, `status` |
| `stella_deployment_duration_seconds` | histogram | Deployment duration | `tenant`, `env`, `strategy` |
| `stella_deployment_tasks_total` | counter | Total deployment tasks | `tenant`, `status` |
| `stella_deployment_task_duration_seconds` | histogram | Task duration | `target_type` |
| `stella_rollbacks_total` | counter | Total rollbacks | `tenant`, `env`, `reason` |
### Agent Metrics
| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_agents_connected` | gauge | Connected agents | `tenant` |
| `stella_agents_by_status` | gauge | Agents by status | `tenant`, `status` |
| `stella_agent_tasks_total` | counter | Tasks executed by agents | `agent`, `type`, `status` |
| `stella_agent_task_duration_seconds` | histogram | Agent task duration | `agent`, `type` |
| `stella_agent_heartbeat_age_seconds` | gauge | Seconds since last heartbeat | `agent` |
| `stella_agent_resource_cpu_percent` | gauge | Agent CPU usage | `agent` |
| `stella_agent_resource_memory_percent` | gauge | Agent memory usage | `agent` |
### Workflow Metrics
| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_workflow_runs_total` | counter | Workflow executions | `tenant`, `template`, `status` |
| `stella_workflow_runs_active` | gauge | Currently running workflows | `tenant`, `template` |
| `stella_workflow_duration_seconds` | histogram | Workflow duration | `template`, `status` |
| `stella_workflow_step_duration_seconds` | histogram | Step execution time | `step_type`, `status` |
| `stella_workflow_step_retries_total` | counter | Step retry count | `step_type` |
### Target Metrics
| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_targets_total` | gauge | Total targets | `tenant`, `env`, `type` |
| `stella_targets_by_health` | gauge | Targets by health status | `tenant`, `env`, `health` |
| `stella_target_drift_detected` | gauge | Targets with drift | `tenant`, `env` |
### Integration Metrics
| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_integrations_total` | gauge | Configured integrations | `tenant`, `type` |
| `stella_integration_health` | gauge | Integration health (1=healthy) | `tenant`, `integration` |
| `stella_integration_requests_total` | counter | Requests to integrations | `integration`, `operation`, `status` |
| `stella_integration_latency_seconds` | histogram | Integration request latency | `integration`, `operation` |
### Gate Metrics
| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_gate_evaluations_total` | counter | Gate evaluations | `tenant`, `gate_type`, `result` |
| `stella_gate_evaluation_duration_seconds` | histogram | Gate evaluation time | `gate_type` |
| `stella_gate_blocks_total` | counter | Blocked promotions by gate | `tenant`, `gate_type`, `env` |
## API Metrics
| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_http_requests_total` | counter | HTTP requests | `method`, `path`, `status` |
| `stella_http_request_duration_seconds` | histogram | Request latency | `method`, `path` |
| `stella_http_requests_in_flight` | gauge | Active requests | `method` |
| `stella_http_request_size_bytes` | histogram | Request size | `method`, `path` |
| `stella_http_response_size_bytes` | histogram | Response size | `method`, `path` |
## Evidence Metrics
| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_evidence_packets_total` | counter | Evidence packets generated | `tenant`, `type` |
| `stella_evidence_packet_size_bytes` | histogram | Evidence packet size | `type` |
| `stella_evidence_verification_total` | counter | Evidence verifications | `result` |
## Prometheus Configuration
```yaml
# prometheus.yml
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: 'stella-orchestrator'
static_configs:
- targets: ['stella-orchestrator:9090']
metrics_path: /metrics
scheme: https
tls_config:
ca_file: /etc/prometheus/ca.crt
- job_name: 'stella-agents'
kubernetes_sd_configs:
- role: pod
selectors:
- role: pod
label: "app.kubernetes.io/name=stella-agent"
relabel_configs:
- source_labels: [__meta_kubernetes_pod_label_agent_id]
target_label: agent_id
```
## Histogram Buckets
### Duration Buckets (seconds)
```yaml
# Short operations (API calls, gate evaluations)
short_duration_buckets: [0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
# Medium operations (workflow steps)
medium_duration_buckets: [0.1, 0.5, 1, 2.5, 5, 10, 30, 60, 120, 300]
# Long operations (deployments)
long_duration_buckets: [1, 5, 10, 30, 60, 120, 300, 600, 1200, 3600]
```
### Size Buckets (bytes)
```yaml
# Request/response sizes
size_buckets: [100, 1000, 10000, 100000, 1000000, 10000000]
# Evidence packet sizes
evidence_buckets: [1000, 10000, 100000, 500000, 1000000, 5000000]
```
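Prometheus histograms are cumulative: an observation increments every bucket whose `le` bound is at or above the value, plus the implicit `+Inf` bucket. A toy sketch of that semantics using the long-operation buckets above (the `observe` helper is illustrative; a real exporter handles this internally):

```typescript
// Cumulative histogram observation sketch: every bucket whose upper bound
// is >= the observed value is incremented, plus the implicit +Inf bucket.
const longDurationBuckets = [1, 5, 10, 30, 60, 120, 300, 600, 1200, 3600];

function observe(counts: number[], buckets: number[], value: number): void {
  // counts has buckets.length + 1 entries; the last one is +Inf.
  buckets.forEach((le, i) => {
    if (value <= le) counts[i]++;
  });
  counts[counts.length - 1]++; // +Inf always counts
}

const counts = new Array(longDurationBuckets.length + 1).fill(0);
observe(counts, longDurationBuckets, 45); // a 45-second deployment
// Buckets le=60 through le=3600, and +Inf, now read 1; le=30 and below read 0.
```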
## SLI Definitions
### Availability SLI
```promql
# API availability (99.9% target)
sum(rate(stella_http_requests_total{status!~"5.."}[5m]))
/
sum(rate(stella_http_requests_total[5m]))
```
### Latency SLI
```promql
# API latency P99 < 500ms
histogram_quantile(0.99,
sum(rate(stella_http_request_duration_seconds_bucket[5m])) by (le)
)
```
### Deployment Success SLI
```promql
# Deployment success rate (99% target)
sum(rate(stella_deployments_total{status="succeeded"}[24h]))
/
sum(rate(stella_deployments_total[24h]))
```
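Each SLI target implies an error budget, the fraction of requests (or minutes) allowed to fail over a window. A quick arithmetic sketch; the helper name is an assumption:

```typescript
// Error-budget arithmetic for the SLI targets above: the allowed
// "bad minutes" over a rolling window, given an SLO target.
function errorBudgetMinutes(sloTarget: number, windowDays: number): number {
  const totalMinutes = windowDays * 24 * 60;
  return (1 - sloTarget) * totalMinutes;
}

// A 99.9% availability target over 30 days leaves roughly 43.2 minutes
// of budget; 99% deployment success over the same window leaves 432.
const availabilityBudget = errorBudgetMinutes(0.999, 30);
```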
## Alert Rules
```yaml
groups:
- name: stella-orchestrator
rules:
- alert: HighDeploymentFailureRate
expr: |
sum(rate(stella_deployments_total{status="failed"}[1h]))
/
sum(rate(stella_deployments_total[1h])) > 0.1
for: 5m
labels:
severity: critical
annotations:
summary: High deployment failure rate
description: More than 10% of deployments failing in the last hour
- alert: AgentOffline
expr: stella_agent_heartbeat_age_seconds > 120
for: 2m
labels:
severity: warning
annotations:
summary: Agent {{ $labels.agent }} offline
description: Agent has not sent heartbeat for > 2 minutes
- alert: PendingApprovalsStale
expr: |
stella_approval_pending_count > 0
and
time() - stella_promotion_request_timestamp > 3600
for: 5m
labels:
severity: warning
annotations:
summary: Stale pending approvals
description: Approvals pending for more than 1 hour
- alert: IntegrationUnhealthy
expr: stella_integration_health == 0
for: 5m
labels:
severity: warning
annotations:
summary: Integration {{ $labels.integration }} unhealthy
description: Integration health check failing
- alert: HighAPILatency
expr: |
histogram_quantile(0.99,
sum(rate(stella_http_request_duration_seconds_bucket[5m])) by (le, path)
) > 1
for: 5m
labels:
severity: warning
annotations:
summary: High API latency on {{ $labels.path }}
description: P99 latency exceeds 1 second
```
## Grafana Dashboards
### Main Dashboard Panels
1. **Deployment Pipeline Overview**
- Promotions per environment (time series)
- Success/failure rates (gauge)
- Active deployments (stat)
2. **Agent Health**
- Connected agents (stat)
- Agent status distribution (pie chart)
- Heartbeat age (table)
3. **Gate Performance**
- Gate evaluation counts (bar chart)
- Block rate by gate type (time series)
- Evaluation latency (heatmap)
4. **API Performance**
- Request rate (time series)
- Error rate (time series)
- Latency distribution (heatmap)
## References
- [Operations Overview](overview.md)
- [Logging](logging.md)
- [Tracing](tracing.md)
- [Alerting](alerting.md)

View File

@@ -0,0 +1,508 @@
# Operations Overview
## Observability Stack
The Release Orchestrator provides comprehensive observability through metrics, logging, and distributed tracing.
```
OBSERVABILITY ARCHITECTURE
┌─────────────────────────────────────────────────────────────────────────────┐
│ RELEASE ORCHESTRATOR │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Metrics │ │ Logs │ │ Traces │ │ Events │ │
│ │ Exporter │ │ Collector │ │ Exporter │ │ Publisher │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │ │
└─────────┼────────────────┼────────────────┼────────────────┼────────────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ OBSERVABILITY BACKENDS │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Prometheus │ │ Loki / │ │ Jaeger / │ │ Event │ │
│ │ / Mimir │ │ Elasticsearch│ │ Tempo │ │ Bus │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │ │
│ └────────────────┴────────────────┴────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Grafana │ │
│ │ Dashboards │ │
│ └─────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Metrics
### Core Metrics
| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_releases_total` | counter | Total releases created | `tenant`, `status` |
| `stella_promotions_total` | counter | Total promotions | `tenant`, `env`, `status` |
| `stella_deployments_total` | counter | Total deployments | `tenant`, `env`, `strategy` |
| `stella_deployment_duration_seconds` | histogram | Deployment duration | `tenant`, `env`, `strategy` |
| `stella_rollbacks_total` | counter | Total rollbacks | `tenant`, `env`, `reason` |
| `stella_agents_connected` | gauge | Connected agents | `tenant` |
| `stella_targets_total` | gauge | Total targets | `tenant`, `env`, `type` |
| `stella_workflow_runs_total` | counter | Workflow executions | `tenant`, `template`, `status` |
| `stella_workflow_step_duration_seconds` | histogram | Step execution time | `step_type` |
| `stella_approval_pending_count` | gauge | Pending approvals | `tenant`, `env` |
| `stella_approval_duration_seconds` | histogram | Time to approve | `tenant`, `env` |
### API Metrics
| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_http_requests_total` | counter | HTTP requests | `method`, `path`, `status` |
| `stella_http_request_duration_seconds` | histogram | Request latency | `method`, `path` |
| `stella_http_requests_in_flight` | gauge | Active requests | `method` |
### Agent Metrics
| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_agent_tasks_total` | counter | Tasks executed | `agent`, `type`, `status` |
| `stella_agent_task_duration_seconds` | histogram | Task duration | `agent`, `type` |
| `stella_agent_heartbeat_age_seconds` | gauge | Since last heartbeat | `agent` |
### Prometheus Configuration
```yaml
# prometheus.yml
scrape_configs:
- job_name: 'stella-orchestrator'
static_configs:
- targets: ['stella-orchestrator:9090']
metrics_path: /metrics
scheme: https
tls_config:
ca_file: /etc/prometheus/ca.crt
- job_name: 'stella-agents'
kubernetes_sd_configs:
- role: pod
selectors:
- role: pod
label: "app.kubernetes.io/name=stella-agent"
relabel_configs:
- source_labels: [__meta_kubernetes_pod_label_agent_id]
target_label: agent_id
```
## Logging
### Log Format
```json
{
"timestamp": "2026-01-09T10:30:00.123Z",
"level": "info",
"message": "Deployment started",
"service": "deploy-orchestrator",
"version": "1.0.0",
"traceId": "abc123def456",
"spanId": "789ghi",
"tenantId": "tenant-uuid",
"correlationId": "corr-uuid",
"context": {
"deploymentJobId": "job-uuid",
"releaseId": "release-uuid",
"environmentId": "env-uuid"
}
}
```
### Log Levels
| Level | Usage |
|-------|-------|
| `error` | Failures requiring attention |
| `warn` | Degraded operation, recoverable issues |
| `info` | Business events (deployment started, approval granted) |
| `debug` | Detailed operational info |
| `trace` | Very detailed debugging |
### Structured Logging Configuration
```typescript
// Logging configuration
const loggerConfig = {
level: process.env.LOG_LEVEL || 'info',
format: 'json',
outputs: [
{
type: 'stdout',
format: 'json'
},
{
type: 'file',
path: '/var/log/stella/orchestrator.log',
rotation: {
maxSize: '100MB',
maxFiles: 10
}
}
],
// Sensitive field masking
redact: [
'password',
'token',
'secret',
'credentials',
'authorization'
]
};
```
### Important Log Events
| Event | Level | Description |
|-------|-------|-------------|
| `deployment.started` | info | Deployment job started |
| `deployment.completed` | info | Deployment successful |
| `deployment.failed` | error | Deployment failed |
| `rollback.initiated` | warn | Rollback triggered |
| `approval.granted` | info | Promotion approved |
| `approval.denied` | info | Promotion rejected |
| `agent.connected` | info | Agent came online |
| `agent.disconnected` | warn | Agent went offline |
| `security.gate.failed` | warn | Security check blocked |
## Distributed Tracing
### Trace Context Propagation
```typescript
// Trace context in requests
interface TraceContext {
traceId: string;
spanId: string;
parentSpanId?: string;
sampled: boolean;
baggage?: Record<string, string>;
}
// W3C Trace Context headers
// traceparent: 00-{traceId}-{spanId}-{flags}
// tracestate: stella=...
// Example trace propagation
class TracingMiddleware {
handle(req: Request, res: Response, next: NextFunction): void {
const traceparent = req.headers['traceparent'];
const traceContext = this.parseTraceParent(traceparent);
// Start span for this request
const span = this.tracer.startSpan('http.request', {
parent: traceContext,
attributes: {
'http.method': req.method,
'http.url': req.url,
'http.user_agent': req.headers['user-agent'],
'tenant.id': req.tenantId
}
});
// Attach to request for downstream use
req.span = span;
res.on('finish', () => {
span.setAttribute('http.status_code', res.statusCode);
span.end();
});
next();
}
}
```
### Key Spans
| Span Name | Description | Attributes |
|-----------|-------------|------------|
| `deployment.execute` | Full deployment | `release_id`, `environment` |
| `task.dispatch` | Task dispatch to agent | `target_id`, `agent_id` |
| `agent.execute` | Agent task execution | `task_type`, `duration` |
| `workflow.run` | Workflow execution | `template_id`, `status` |
| `workflow.step` | Individual step | `step_type`, `node_id` |
| `approval.wait` | Waiting for approval | `promotion_id`, `duration` |
| `gate.evaluate` | Gate evaluation | `gate_type`, `result` |
### Jaeger Configuration
```yaml
# jaeger-config.yaml
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
name: stella-jaeger
spec:
strategy: production
collector:
maxReplicas: 5
storage:
type: elasticsearch
options:
es:
server-urls: https://elasticsearch:9200
secretName: jaeger-es-secret
ingress:
enabled: true
```
## Alerting
### Alert Rules
```yaml
# prometheus-rules.yaml
groups:
- name: stella.deployment
rules:
- alert: DeploymentFailureRateHigh
expr: |
sum(rate(stella_deployments_total{status="failed"}[5m])) /
sum(rate(stella_deployments_total[5m])) > 0.1
for: 5m
labels:
severity: critical
annotations:
summary: "High deployment failure rate"
description: "More than 10% of deployments are failing"
- alert: DeploymentDurationHigh
expr: |
histogram_quantile(0.95, sum(rate(stella_deployment_duration_seconds_bucket[5m])) by (le, tenant)) > 600
for: 10m
labels:
severity: warning
annotations:
summary: "Deployment duration high"
description: "P95 deployment duration exceeds 10 minutes"
- alert: RollbackRateHigh
expr: |
sum(rate(stella_rollbacks_total[1h])) > 3
for: 5m
labels:
severity: warning
annotations:
summary: "High rollback rate"
description: "More than 3 rollbacks in the last hour"
- name: stella.agents
rules:
- alert: AgentOffline
expr: |
stella_agent_heartbeat_age_seconds > 120
for: 2m
labels:
severity: critical
annotations:
summary: "Agent offline"
description: "Agent {{ $labels.agent }} has not sent heartbeat for 2 minutes"
- alert: AgentPoolLow
expr: |
          sum(stella_agents_connected) by (tenant) < 2
for: 5m
labels:
severity: warning
annotations:
summary: "Low agent count"
description: "Fewer than 2 agents online for tenant {{ $labels.tenant }}"
- name: stella.approvals
rules:
- alert: ApprovalBacklogHigh
expr: |
stella_approval_pending_count > 10
for: 1h
labels:
severity: warning
annotations:
summary: "Approval backlog growing"
description: "More than 10 pending approvals for over an hour"
- alert: ApprovalWaitLong
expr: |
          histogram_quantile(0.90, sum(rate(stella_approval_duration_seconds_bucket[6h])) by (le)) > 86400
for: 1h
labels:
severity: info
annotations:
summary: "Long approval wait times"
description: "P90 approval wait time exceeds 24 hours"
```
### PagerDuty Integration
```typescript
const alertManagerConfig = {
receivers: [
{
name: "stella-critical",
pagerduty_configs: [
{
service_key: "${PAGERDUTY_SERVICE_KEY}",
severity: "critical"
}
]
},
{
name: "stella-warning",
slack_configs: [
{
api_url: "${SLACK_WEBHOOK_URL}",
channel: "#stella-alerts",
send_resolved: true
}
]
}
],
route: {
receiver: "stella-warning",
routes: [
{
match: { severity: "critical" },
receiver: "stella-critical"
}
]
}
}
```
## Dashboards
### Deployment Dashboard
Key panels:
- Deployment rate over time
- Success/failure ratio
- Average deployment duration
- Deployment duration histogram
- Active deployments by environment
- Recent deployment list
### Agent Health Dashboard
Key panels:
- Connected agents count
- Agent heartbeat status
- Tasks per agent
- Task success rate by agent
- Agent resource utilization
### Approval Dashboard
Key panels:
- Pending approvals count
- Approval response time
- Approvals by user
- Rejection reasons breakdown
## Health Endpoints
### Application Health
```http
GET /health
```
Response:
```json
{
"status": "healthy",
"version": "1.0.0",
"uptime": 86400,
"checks": {
"database": { "status": "healthy", "latency": 5 },
"redis": { "status": "healthy", "latency": 2 },
"vault": { "status": "healthy", "latency": 10 }
}
}
```
### Readiness Probe
```http
GET /health/ready
```
### Liveness Probe
```http
GET /health/live
```
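A handler backing `/health` can aggregate per-dependency checks into the response shape shown above. A sketch under assumptions: the `CheckResult` shape and the `degraded` overall status are illustrative, since the spec only shows the all-healthy case:

```typescript
// Sketch of aggregating dependency checks into the /health response
// shape shown above; names and the "degraded" status are assumptions.
type CheckResult = { status: "healthy" | "unhealthy"; latency: number };

function buildHealthResponse(
  version: string,
  uptime: number,
  checks: Record<string, CheckResult>,
) {
  const allHealthy = Object.values(checks).every((c) => c.status === "healthy");
  return {
    status: allHealthy ? "healthy" : "degraded",
    version,
    uptime,
    checks,
  };
}
```

Readiness (`/health/ready`) would typically gate on the same dependency checks, while liveness (`/health/live`) stays cheap and dependency-free so a slow database cannot trigger restarts.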
## Performance Tuning
### Database Connection Pool
```typescript
const poolConfig = {
min: 5,
max: 20,
acquireTimeout: 30000,
idleTimeout: 600000,
connectionTimeout: 10000
};
```
### Cache Configuration
```typescript
const cacheConfig = {
// Release cache
releases: {
ttl: 300, // 5 minutes
maxSize: 1000
},
// Target cache
targets: {
ttl: 60, // 1 minute
maxSize: 5000
},
// Workflow template cache
templates: {
ttl: 3600, // 1 hour
maxSize: 100
}
};
```
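Each cache above pairs a TTL with a max size. A minimal sketch of such a bounded cache (a production service would more likely use an LRU library; the class and its insertion-order eviction are illustrative):

```typescript
// Minimal TTL + max-size cache sketch backing the config above.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  constructor(private ttlSeconds: number, private maxSize: number) {}

  get(key: string, now = Date.now()): V | undefined {
    const e = this.entries.get(key);
    if (!e) return undefined;
    if (e.expiresAt <= now) {
      this.entries.delete(key); // lazily drop expired entries
      return undefined;
    }
    return e.value;
  }

  set(key: string, value: V, now = Date.now()): void {
    if (this.entries.size >= this.maxSize && !this.entries.has(key)) {
      // Evict the oldest insertion (Map preserves insertion order).
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: now + this.ttlSeconds * 1000 });
  }
}
```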
### Rate Limiting
```typescript
const rateLimitConfig = {
// API rate limits
api: {
windowMs: 60000, // 1 minute
max: 1000, // requests per window
burst: 100 // burst allowance
},
// Webhook rate limits
webhooks: {
windowMs: 60000,
max: 100
},
// Per-tenant limits
tenant: {
windowMs: 60000,
max: 500
}
};
```
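The `windowMs`/`max` pairs above describe fixed-window counting. A minimal limiter sketch (the class name is illustrative, and the `burst` allowance from the API config is omitted for brevity):

```typescript
// Fixed-window rate limiter sketch matching the windowMs/max shape above.
class FixedWindowLimiter {
  private windowStart = 0;
  private count = 0;
  constructor(private windowMs: number, private max: number) {}

  allow(now = Date.now()): boolean {
    if (now - this.windowStart >= this.windowMs) {
      // A new window begins: reset the counter.
      this.windowStart = now;
      this.count = 0;
    }
    return ++this.count <= this.max;
  }
}
```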
## References
- [Metrics Reference](metrics.md)
- [Logging Guide](logging.md)
- [Tracing Setup](tracing.md)
- [Alert Configuration](alerting.md)

View File

@@ -0,0 +1,222 @@
# Distributed Tracing Specification
> OpenTelemetry-based distributed tracing for the Release Orchestrator.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 13.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Observability Overview](overview.md), [Logging](logging.md)
## Overview
The Release Orchestrator uses OpenTelemetry for distributed tracing, enabling end-to-end visibility of promotion workflows, deployments, and agent tasks.
---
## Trace Context Propagation
### W3C Trace Context
```typescript
// Trace context structure
interface TraceContext {
traceId: string; // 32-char hex
spanId: string; // 16-char hex
parentSpanId?: string;
sampled: boolean;
baggage: Record<string, string>;
}
// Propagation headers
const TRACE_HEADERS = {
W3C_TRACEPARENT: "traceparent",
W3C_TRACESTATE: "tracestate",
BAGGAGE: "baggage",
};
// Example traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```
### Header Format
```
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             ^  ^                                ^                ^
             |  |                                |                |
             |  trace-id (32 hex)                span-id (16 hex) flags
       version
```
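A parser for the header format above can be sketched as follows; the function and interface names are illustrative, not part of the spec:

```typescript
// Sketch parser for the W3C traceparent header shown above.
interface ParsedTraceParent {
  version: string;
  traceId: string;
  spanId: string;
  sampled: boolean;
}

function parseTraceParent(header: string): ParsedTraceParent | null {
  const m = header.match(
    /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/,
  );
  if (!m) return null;
  return {
    version: m[1],
    traceId: m[2],
    spanId: m[3],
    sampled: (parseInt(m[4], 16) & 0x01) === 1, // sampled flag is bit 0
  };
}
```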
---
## Key Traces
| Operation | Span Name | Attributes |
|-----------|-----------|------------|
| Promotion request | `promotion.request` | promotion_id, release_id, environment |
| Gate evaluation | `promotion.evaluate_gates` | gate_names, result |
| Workflow execution | `workflow.execute` | workflow_run_id, template_name |
| Step execution | `workflow.step.{type}` | step_run_id, node_id, inputs |
| Deployment job | `deployment.execute` | job_id, environment, strategy |
| Agent task | `agent.task.{type}` | task_id, agent_id, target_id |
| Plugin call | `plugin.{method}` | plugin_id, method, duration |
---
## Trace Hierarchy
### Promotion Flow
```
promotion.request (root)
+-- promotion.evaluate_gates
| +-- gate.security
| +-- gate.approval
| +-- gate.freeze_window
|
+-- workflow.execute
| +-- workflow.step.security-check
| +-- workflow.step.approval
| +-- workflow.step.deploy
| +-- deployment.execute
| +-- deployment.assign_tasks
| +-- agent.task.pull
| +-- agent.task.deploy
| +-- agent.task.health_check
|
+-- evidence.generate
+-- evidence.sign
```
---
## Span Attributes
### Common Attributes
| Attribute | Type | Description |
|-----------|------|-------------|
| `tenant.id` | string | Tenant UUID |
| `user.id` | string | User UUID (if authenticated) |
| `release.id` | string | Release UUID |
| `environment.name` | string | Environment name |
| `error` | boolean | Whether error occurred |
| `error.type` | string | Error type/class |
### Promotion Attributes
| Attribute | Type | Description |
|-----------|------|-------------|
| `promotion.id` | string | Promotion UUID |
| `promotion.status` | string | Current status |
| `promotion.gates` | string[] | Gates evaluated |
| `promotion.decision` | string | allow/deny |
### Deployment Attributes
| Attribute | Type | Description |
|-----------|------|-------------|
| `deployment.job_id` | string | Deployment job UUID |
| `deployment.strategy` | string | Deployment strategy |
| `deployment.target_count` | int | Number of targets |
| `deployment.batch_size` | int | Batch size |
### Agent Task Attributes
| Attribute | Type | Description |
|-----------|------|-------------|
| `task.id` | string | Task UUID |
| `task.type` | string | Task type |
| `agent.id` | string | Agent UUID |
| `target.id` | string | Target UUID |
---
## OpenTelemetry Configuration
### SDK Configuration
```yaml
# otel-config.yaml
service:
name: stella-release-orchestrator
version: ${VERSION}
exporters:
otlp:
endpoint: otel-collector:4317
protocol: grpc
processors:
batch:
timeout: 10s
send_batch_size: 1024
resource:
attributes:
- key: service.namespace
value: stella-ops
- key: deployment.environment
value: ${ENVIRONMENT}
```
### Environment Variables
```bash
OTEL_SERVICE_NAME=stella-release-orchestrator
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1
```
---
## Sampling Strategy
| Environment | Sampling Rate | Reason |
|-------------|---------------|--------|
| Development | 100% | Full visibility |
| Staging | 100% | Full visibility |
| Production | 10% | Cost/performance |
| Production (errors) | 100% | Always sample errors |
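The table above combines a base ratio with an always-sample-errors override. A sketch of that decision, deriving the ratio decision deterministically from the trace id so every service in a trace agrees (a simplified illustration, not the OpenTelemetry SDK's sampler API):

```typescript
// Decide whether to record a span: errors are always sampled; otherwise
// sample by hashing the trace id into [0, 1) and comparing to the ratio,
// so all participants make the same decision for a given trace.
function shouldSample(traceId: string, ratio: number, isError: boolean): boolean {
  if (isError) return true;
  // Use the upper 8 hex chars of the trace id as a uniform value in [0, 1).
  const bucket = parseInt(traceId.slice(0, 8), 16) / 0x100000000;
  return bucket < ratio;
}
```

With `OTEL_TRACES_SAMPLER=parentbased_traceidratio` the SDK performs an equivalent trace-id-ratio check; the error override is typically implemented downstream via tail sampling in the collector.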
---
## Example Trace
```json
{
"traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
"spans": [
{
"spanId": "00f067aa0ba902b7",
"name": "promotion.request",
"duration_ms": 5234,
"attributes": {
"promotion.id": "promo-123",
"release.id": "rel-456",
"environment.name": "production"
}
},
{
"spanId": "00f067aa0ba902b8",
"parentSpanId": "00f067aa0ba902b7",
"name": "gate.security",
"duration_ms": 234,
"attributes": {
"gate.result": "passed",
"vulnerabilities.critical": 0
}
}
]
}
```
---
## See Also
- [Observability Overview](overview.md)
- [Logging](logging.md)
- [Metrics](metrics.md)
- [Alerting](alerting.md)


@@ -0,0 +1,266 @@
# A/B Release Models
> Two models for A/B releases: target-group based and router-based traffic splitting.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 11.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Progressive Delivery Module](../modules/progressive-delivery.md), [Traffic Router](routers.md)
**Sprint:** [110_001 A/B Release Manager](../../../../implplan/SPRINT_20260110_110_001_PROGDL_ab_release_manager.md)
## Overview
Stella Ops supports two distinct models for A/B releases:
1. **Target-Group A/B:** Scale different target groups to shift workload
2. **Router-Based A/B:** Use traffic routers to split requests between variations
Each model has different use cases, trade-offs, and implementation requirements.
---
## Model 1: Target-Group A/B
Target-group A/B splits traffic by scaling different groups of targets. Suitable for worker services, background processors, and scenarios where sticky sessions are not required.
### Configuration
```typescript
interface TargetGroupABConfig {
type: "target-group";
// Group definitions
groupA: {
targetGroupId: UUID;
labels?: Record<string, string>;
};
groupB: {
targetGroupId: UUID;
labels?: Record<string, string>;
};
// Rollout by scaling groups
rolloutStrategy: {
type: "scale-groups";
stages: ScaleStage[];
};
}
interface ScaleStage {
name: string;
groupAPercentage: number; // Percentage of group A targets active
groupBPercentage: number; // Percentage of group B targets active
duration?: number; // Auto-advance after duration (seconds)
healthThreshold?: number; // Required health % to advance
requireApproval?: boolean;
}
```
### Example: Worker Service Canary
```typescript
const workerCanaryConfig: TargetGroupABConfig = {
type: "target-group",
groupA: { labels: { "worker-group": "A" } },
groupB: { labels: { "worker-group": "B" } },
rolloutStrategy: {
type: "scale-groups",
stages: [
// Stage 1: 100% A, 10% B (canary)
{ name: "canary", groupAPercentage: 100, groupBPercentage: 10,
duration: 300, healthThreshold: 95 },
// Stage 2: 100% A, 50% B
{ name: "expand", groupAPercentage: 100, groupBPercentage: 50,
duration: 600, healthThreshold: 95 },
// Stage 3: 50% A, 100% B
{ name: "shift", groupAPercentage: 50, groupBPercentage: 100,
duration: 600, healthThreshold: 95 },
// Stage 4: 0% A, 100% B (complete)
{ name: "complete", groupAPercentage: 0, groupBPercentage: 100,
requireApproval: true },
],
},
};
```
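Each stage's percentages translate into a concrete number of active targets per group. A minimal sketch of that calculation, rounding up so any non-zero percentage activates at least one target (illustrative helper, not the scheduler's actual code):

```typescript
interface GroupSizes {
  totalA: number; // registered targets in group A
  totalB: number; // registered targets in group B
}

// Number of targets to keep active in each group for a given stage.
function activeTargets(
  sizes: GroupSizes,
  groupAPercentage: number,
  groupBPercentage: number
): { activeA: number; activeB: number } {
  const scale = (total: number, pct: number) =>
    pct <= 0 ? 0 : Math.max(1, Math.ceil((total * pct) / 100));
  return {
    activeA: scale(sizes.totalA, groupAPercentage),
    activeB: scale(sizes.totalB, groupBPercentage),
  };
}
```

For the "canary" stage above with ten targets per group, this keeps all ten group-A workers running while activating a single group-B worker.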
### Use Cases
- Background job processors
- Worker services without external traffic
- Infrastructure-level splitting
- Static traffic distribution
- Hardware-based variants
---
## Model 2: Router-Based A/B
Router-based A/B uses traffic routers (Nginx, HAProxy, ALB) to split incoming requests between variations. Suitable for APIs, web services, and scenarios requiring sticky sessions.
### Configuration
```typescript
interface RouterBasedABConfig {
type: "router-based";
// Router integration
routerIntegrationId: UUID;
// Upstream configuration
upstreamName: string;
variationA: {
targets: string[];
serviceName?: string;
};
variationB: {
targets: string[];
serviceName?: string;
};
// Traffic split configuration
trafficSplit: TrafficSplitConfig;
// Rollout strategy
rolloutStrategy: RouterRolloutStrategy;
}
interface TrafficSplitConfig {
type: "weight" | "header" | "cookie" | "tenant" | "composite";
// Weight-based (percentage)
weights?: { A: number; B: number };
// Header-based
headerName?: string;
headerValueA?: string;
headerValueB?: string;
// Cookie-based
cookieName?: string;
cookieValueA?: string;
cookieValueB?: string;
// Tenant-based (by host/path)
tenantRules?: TenantRule[];
}
```
### Rollout Strategy
```typescript
interface RouterRolloutStrategy {
type: "manual" | "time-based" | "health-based" | "composite";
stages: RouterRolloutStage[];
}
interface RouterRolloutStage {
name: string;
trafficPercentageB: number; // % of traffic to variation B
// Advancement criteria
duration?: number; // Auto-advance after duration
healthThreshold?: number; // Required health %
errorRateThreshold?: number; // Max error rate %
latencyThreshold?: number; // Max p99 latency ms
requireApproval?: boolean;
// Optional: specific routing rules for this stage
routingOverrides?: TrafficSplitConfig;
}
```
### Example: API Canary with Health-Based Advancement
```typescript
const apiCanaryConfig: RouterBasedABConfig = {
type: "router-based",
routerIntegrationId: "nginx-prod",
upstreamName: "api-backend",
variationA: { serviceName: "api-v1" },
variationB: { serviceName: "api-v2" },
trafficSplit: { type: "weight", weights: { A: 100, B: 0 } },
rolloutStrategy: {
type: "health-based",
stages: [
{ name: "canary-10", trafficPercentageB: 10,
duration: 300, healthThreshold: 99, errorRateThreshold: 1 },
{ name: "canary-25", trafficPercentageB: 25,
duration: 600, healthThreshold: 99, errorRateThreshold: 1 },
{ name: "canary-50", trafficPercentageB: 50,
duration: 900, healthThreshold: 99, errorRateThreshold: 1 },
{ name: "promote", trafficPercentageB: 100,
requireApproval: true },
],
},
};
```
### Use Cases
- API services with external traffic
- Web applications with user sessions
- Dynamic traffic distribution
- User-based variants (A/B testing)
- Feature flags and gradual rollouts
---
## Routing Strategies
### Weight-Based Routing
Splits traffic by percentage across variations.
```yaml
trafficSplit:
type: weight
weights:
A: 90
B: 10
```
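Weight-based splitting can be made deterministic per client by hashing a stable key (for example a session id) into the weight space, so the same client keeps hitting the same variation. A sketch under that assumption (the hash choice is illustrative; routers like Nginx implement weighted selection natively):

```typescript
// Pick "A" or "B" from a 0-100 weight split, deterministically per key.
function pickVariation(key: string, weightA: number): "A" | "B" {
  // Tiny FNV-1a hash; any stable hash works.
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  // Map the hash into [0, 100) and compare against A's weight.
  return h % 100 < weightA ? "A" : "B";
}
```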
### Header-Based Routing
Routes based on request header values.
```yaml
trafficSplit:
type: header
headerName: X-Feature-Flag
headerValueA: "control"
headerValueB: "experiment"
```
### Cookie-Based Routing
Routes based on cookie values for sticky sessions.
```yaml
trafficSplit:
type: cookie
cookieName: ab_variation
cookieValueA: "A"
cookieValueB: "B"
```
---
## Comparison Matrix
| Aspect | Target-Group A/B | Router-Based A/B |
|--------|------------------|------------------|
| **Traffic Control** | By scaling targets | By routing rules |
| **Sticky Sessions** | Not supported | Supported |
| **Granularity** | Target-level | Request-level |
| **External Traffic** | Not required | Required |
| **Infrastructure** | Target groups | Traffic router |
| **Use Case** | Workers, batch jobs | APIs, web apps |
| **Rollback Speed** | Slower (scaling) | Immediate (routing) |
---
## See Also
- [Progressive Delivery Module](../modules/progressive-delivery.md)
- [Canary Controller](canary.md)
- [Router Plugins](routers.md)
- [Deployment Strategies](../deployment/strategies.md)


@@ -0,0 +1,270 @@
# Canary Controller
> Automated canary deployment controller with health-based stage advancement and automatic rollback.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 11.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Progressive Delivery Module](../modules/progressive-delivery.md), [Deployment Strategies](../deployment/strategies.md)
**Sprint:** [110_003 Canary Controller](../../../../implplan/SPRINT_20260110_110_003_PROGDL_canary_controller.md)
## Overview
The Canary Controller automates progressive rollout of new versions by gradually shifting traffic, monitoring health metrics, and automatically rolling back if issues are detected.
---
## Canary State Machine
### States
```
CREATED -> DEPLOYING -> EVALUATING -> PROMOTING/ROLLING_BACK -> COMPLETED
```
| State | Description |
|-------|-------------|
| `CREATED` | Canary release defined, not started |
| `DEPLOYING` | Deploying variation B to targets |
| `EVALUATING` | Monitoring health metrics at current stage |
| `PROMOTING` | Advancing to next stage |
| `ROLLING_BACK` | Reverting to variation A |
| `COMPLETED` | Final state (promoted or rolled back) |
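The transitions above can be enforced with a simple allowed-transitions map so the controller rejects illegal moves. A sketch (state names match the table; the `PROMOTING -> EVALUATING` edge models advancing to the next stage):

```typescript
type CanaryState =
  | "CREATED" | "DEPLOYING" | "EVALUATING"
  | "PROMOTING" | "ROLLING_BACK" | "COMPLETED";

const ALLOWED: Record<CanaryState, CanaryState[]> = {
  CREATED: ["DEPLOYING"],
  DEPLOYING: ["EVALUATING", "ROLLING_BACK"],
  EVALUATING: ["PROMOTING", "ROLLING_BACK"],
  PROMOTING: ["EVALUATING", "COMPLETED"], // next stage, or final promotion
  ROLLING_BACK: ["COMPLETED"],
  COMPLETED: [], // terminal
};

function canTransition(from: CanaryState, to: CanaryState): boolean {
  return ALLOWED[from].includes(to);
}
```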
---
## Implementation
### Canary Controller Class
```typescript
class CanaryController {
async executeRollout(abRelease: ABRelease): Promise<void> {
const strategy = abRelease.rolloutStrategy;
for (let i = 0; i < strategy.stages.length; i++) {
const stage = strategy.stages[i];
const stageRecord = await this.startStage(abRelease, stage, i);
try {
// 1. Apply traffic configuration for this stage
await this.applyStageTraffic(abRelease, stage);
this.emit("canary.stage_started", { abRelease, stage, stageNumber: i });
// 2. Wait for stage completion based on criteria
const result = await this.waitForStageCompletion(abRelease, stage);
if (!result.success) {
// Health check failed - rollback
this.log(`Stage ${stage.name} failed health check: ${result.reason}`);
await this.rollback(abRelease, result.reason);
return;
}
// 3. Check if approval required
if (stage.requireApproval) {
this.log(`Stage ${stage.name} requires approval`);
await this.pauseForApproval(abRelease, stage);
// Wait for approval
const approval = await this.waitForApproval(abRelease, stage);
if (!approval.approved) {
await this.rollback(abRelease, "Approval denied");
return;
}
}
await this.completeStage(stageRecord, "succeeded");
this.emit("canary.stage_completed", { abRelease, stage, stageNumber: i });
} catch (error) {
await this.completeStage(stageRecord, "failed", error.message);
await this.rollback(abRelease, error.message);
return;
}
}
// Rollout complete
await this.completeRollout(abRelease);
this.emit("canary.promoted", { abRelease });
}
}
```
### Stage Completion Logic
```typescript
private async waitForStageCompletion(
abRelease: ABRelease,
stage: RolloutStage
): Promise<StageCompletionResult> {
const startTime = Date.now();
  const checkInterval = 30000; // 30 seconds
  // A stage should define `duration` (or be cancelled externally);
  // without it, this loop exits only on a health/error/latency failure.
  while (true) {
// Check health metrics
const health = await this.checkHealth(abRelease, stage);
if (!health.healthy) {
return {
success: false,
reason: `Health check failed: ${health.reason}`
};
}
// Check error rate (if threshold configured)
if (stage.errorRateThreshold !== undefined) {
const errorRate = await this.getErrorRate(abRelease);
if (errorRate > stage.errorRateThreshold) {
return {
success: false,
reason: `Error rate ${errorRate}% exceeds threshold ${stage.errorRateThreshold}%`
};
}
}
// Check latency (if threshold configured)
if (stage.latencyThreshold !== undefined) {
const latency = await this.getP99Latency(abRelease);
if (latency > stage.latencyThreshold) {
return {
success: false,
reason: `P99 latency ${latency}ms exceeds threshold ${stage.latencyThreshold}ms`
};
}
}
// Check duration (auto-advance)
if (stage.duration !== undefined) {
const elapsed = (Date.now() - startTime) / 1000;
if (elapsed >= stage.duration) {
return { success: true };
}
}
// Wait before next check
await sleep(checkInterval);
}
}
```
### Traffic Application
```typescript
private async applyStageTraffic(abRelease: ABRelease, stage: RolloutStage): Promise<void> {
if (abRelease.config.type === "router-based") {
const router = await this.getRouterConnector(abRelease.config.routerIntegrationId);
await router.shiftTraffic(
abRelease.config.variationA.serviceName,
abRelease.config.variationB.serviceName,
stage.trafficPercentageB
);
} else if (abRelease.config.type === "target-group") {
// Scale target groups
await this.scaleTargetGroup(
abRelease.config.groupA,
stage.groupAPercentage
);
await this.scaleTargetGroup(
abRelease.config.groupB,
stage.groupBPercentage
);
}
}
```
### Rollback
```typescript
async rollback(abRelease: ABRelease, reason: string): Promise<void> {
this.log(`Rolling back A/B release: ${reason}`);
this.emit("canary.rollback_started", { abRelease, reason });
if (abRelease.config.type === "router-based") {
// Shift all traffic back to A
const router = await this.getRouterConnector(abRelease.config.routerIntegrationId);
await router.shiftTraffic(
abRelease.config.variationB.serviceName,
abRelease.config.variationA.serviceName,
100
);
} else if (abRelease.config.type === "target-group") {
// Scale B to 0, A to 100
await this.scaleTargetGroup(abRelease.config.groupA, 100);
await this.scaleTargetGroup(abRelease.config.groupB, 0);
}
abRelease.status = "rolled_back";
await this.save(abRelease);
this.emit("canary.rolled_back", { abRelease, reason });
}
```
---
## Configuration
### Canary Stages
```yaml
rolloutStrategy:
type: health-based
stages:
- name: canary-5
trafficPercentageB: 5
duration: 300 # 5 minutes
healthThreshold: 99
errorRateThreshold: 0.5
- name: canary-25
trafficPercentageB: 25
duration: 600 # 10 minutes
healthThreshold: 99
errorRateThreshold: 1.0
- name: canary-50
trafficPercentageB: 50
duration: 900 # 15 minutes
healthThreshold: 99
errorRateThreshold: 1.0
- name: promote
trafficPercentageB: 100
requireApproval: true
```
### Health Metrics
| Metric | Description | Typical Threshold |
|--------|-------------|-------------------|
| Success Rate | % of successful requests | > 99% |
| Error Rate | % of failed requests | < 1% |
| P99 Latency | 99th percentile response time | < 500ms |
| Health Check | Container/service health | Healthy |
---
## Events
The canary controller emits events for observability:
| Event | Description |
|-------|-------------|
| `canary.stage_started` | Stage execution began |
| `canary.stage_completed` | Stage completed successfully |
| `canary.rollback_started` | Rollback initiated |
| `canary.rolled_back` | Rollback completed |
| `canary.promoted` | Full promotion completed |
---
## See Also
- [Progressive Delivery Module](../modules/progressive-delivery.md)
- [A/B Release Models](ab-releases.md)
- [Router Plugins](routers.md)
- [Metrics](../operations/metrics.md)


@@ -0,0 +1,348 @@
# Router Plugins
> Traffic router plugins for progressive delivery (Nginx, AWS ALB, and custom implementations).
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 11.4](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Progressive Delivery Module](../modules/progressive-delivery.md), [Plugin System](../modules/plugin-system.md)
**Sprint:** [110_004 Router Plugins](../../../../implplan/SPRINT_20260110_110_004_PROGDL_nginx_router.md)
## Overview
Router plugins enable traffic shifting for progressive delivery. The orchestrator ships with an Nginx router plugin for v1, with HAProxy, Traefik, and AWS ALB available as additional plugins.
---
## Router Plugin Interface
All router plugins implement the `TrafficRouterPlugin` interface:
```typescript
interface TrafficRouterPlugin {
// Configuration
configureRoute(config: RouteConfig): Promise<void>;
// Traffic operations
shiftTraffic(from: string, to: string, percentage: number): Promise<void>;
getTrafficDistribution(): Promise<TrafficDistribution>;
// Health
validateConfig(): Promise<ValidationResult>;
reload(): Promise<void>;
}
interface RouteConfig {
upstream: string;
serverName: string;
variations: Variation[];
splitType: "weight" | "header" | "cookie";
headerName?: string;
headerValueB?: string;
stickySession?: boolean;
stickyDuration?: number;
}
interface Variation {
name: string;
targets: string[];
weight: number;
}
interface TrafficDistribution {
variations: {
name: string;
percentage: number;
targets: string[];
}[];
}
```
---
## Nginx Router Plugin (v1 Built-in)
The Nginx plugin generates and manages Nginx configuration for traffic splitting.
### Implementation
```typescript
class NginxRouterPlugin implements TrafficRouterPlugin {
async configureRoute(config: RouteConfig): Promise<void> {
const upstreamConfig = this.generateUpstreamConfig(config);
const serverConfig = this.generateServerConfig(config);
// Write configuration files
await this.writeConfig(
`/etc/nginx/conf.d/upstream-${config.upstream}.conf`,
upstreamConfig
);
await this.writeConfig(
`/etc/nginx/conf.d/server-${config.upstream}.conf`,
serverConfig
);
// Validate configuration
const validation = await this.validateConfig();
if (!validation.valid) {
throw new Error(`Nginx config validation failed: ${validation.error}`);
}
// Reload nginx
await this.reload();
}
}
```
### Upstream Configuration
```typescript
private generateUpstreamConfig(config: RouteConfig): string {
const lines: string[] = [];
for (const variation of config.variations) {
lines.push(`upstream ${config.upstream}_${variation.name} {`);
for (const target of variation.targets) {
lines.push(` server ${target};`);
}
lines.push(`}`);
lines.push(``);
}
// Combined upstream with weights (for percentage-based routing)
if (config.splitType === "weight") {
lines.push(`upstream ${config.upstream} {`);
for (const variation of config.variations) {
const weight = variation.weight;
for (const target of variation.targets) {
lines.push(` server ${target} weight=${weight};`);
}
}
lines.push(`}`);
}
return lines.join("\n");
}
```
### Server Configuration
```typescript
private generateServerConfig(config: RouteConfig): string {
  if (config.splitType === "header" || config.splitType === "cookie") {
    // Nginx variable names are lowercase with dashes mapped to underscores;
    // headers are read via $http_<name>, cookies via $cookie_<name>
    // (e.g. X-Feature-Flag -> $http_x_feature_flag).
    const name = (config.headerName || "x-variation").toLowerCase().replace(/-/g, "_");
    const sourceVar = config.splitType === "cookie" ? `cookie_${name}` : `http_${name}`;
    // Split block based on header/cookie value
    return `
map $${sourceVar} $${config.upstream}_backend {
  default ${config.upstream}_A;
  "${config.headerValueB || "B"}" ${config.upstream}_B;
}
server {
listen 80;
server_name ${config.serverName};
location / {
proxy_pass http://$${config.upstream}_backend;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}
`;
} else {
// Weight-based (default)
return `
server {
listen 80;
server_name ${config.serverName};
location / {
proxy_pass http://${config.upstream};
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}
`;
}
}
```
### Traffic Shifting
```typescript
async shiftTraffic(from: string, to: string, percentage: number): Promise<void> {
const config = await this.getCurrentConfig();
// Update weights
for (const variation of config.variations) {
if (variation.name === to) {
variation.weight = percentage;
} else {
variation.weight = 100 - percentage;
}
}
await this.configureRoute(config);
}
async getTrafficDistribution(): Promise<TrafficDistribution> {
// Parse current nginx config to get weights
const config = await this.parseCurrentConfig();
return {
variations: config.variations.map(v => ({
name: v.name,
percentage: v.weight,
targets: v.targets,
})),
};
}
```
---
## AWS ALB Router Plugin
The AWS ALB plugin manages weighted target groups for traffic splitting.
### Implementation
```typescript
class AWSALBRouterPlugin implements TrafficRouterPlugin {
private alb: AWS.ELBv2;
async configureRoute(config: RouteConfig): Promise<void> {
    // Assumes an ALB-specific config that extends RouteConfig with the
    // listener rule ARN (`config.ruleArn`) used below.
// Create/update target groups for each variation
const targetGroupArns: Record<string, string> = {};
for (const variation of config.variations) {
const tgArn = await this.ensureTargetGroup(
`${config.upstream}-${variation.name}`,
variation.targets
);
targetGroupArns[variation.name] = tgArn;
}
// Update listener rule with weighted target groups
await this.alb.modifyRule({
RuleArn: config.ruleArn,
Actions: [{
Type: "forward",
ForwardConfig: {
TargetGroups: config.variations.map(v => ({
TargetGroupArn: targetGroupArns[v.name],
Weight: v.weight,
})),
TargetGroupStickinessConfig: {
Enabled: config.stickySession || false,
DurationSeconds: config.stickyDuration || 3600,
},
},
}],
}).promise();
}
async shiftTraffic(from: string, to: string, percentage: number): Promise<void> {
const rule = await this.getRule();
const forwardConfig = rule.Actions[0].ForwardConfig;
// Update weights
for (const tg of forwardConfig.TargetGroups) {
if (tg.TargetGroupArn.includes(`-${to}`)) {
tg.Weight = percentage;
} else {
tg.Weight = 100 - percentage;
}
}
await this.alb.modifyRule({
RuleArn: rule.RuleArn,
Actions: rule.Actions,
}).promise();
}
async getTrafficDistribution(): Promise<TrafficDistribution> {
const rule = await this.getRule();
const forwardConfig = rule.Actions[0].ForwardConfig;
const variations = [];
for (const tg of forwardConfig.TargetGroups) {
const targets = await this.getTargetGroupTargets(tg.TargetGroupArn);
      // Derive the variation name from the target-group name suffix
      // (e.g. "api-backend-B"); real ARN parsing is more involved.
      const name = tg.TargetGroupArn.split("-").pop();
variations.push({
name,
percentage: tg.Weight,
targets: targets.map(t => t.Id),
});
}
return { variations };
}
}
```
---
## Router Plugin Catalog
| Plugin | Status | Description |
|--------|--------|-------------|
| Nginx | v1 Built-in | Configuration-based weight/header routing |
| HAProxy | Plugin | Runtime API for traffic management |
| Traefik | Plugin | Dynamic configuration via API |
| AWS ALB | Plugin | Weighted target groups |
| Envoy | Planned | xDS API integration |
---
## Creating Custom Router Plugins
To create a custom router plugin:
1. **Implement Interface:** Create a class implementing `TrafficRouterPlugin`
2. **Register Plugin:** Add to plugin registry with capabilities
3. **Configuration Schema:** Define JSON Schema for plugin config
4. **Health Checks:** Implement connection testing
5. **Rollback Support:** Handle traffic reversion on failures
### Example Plugin Manifest
```yaml
plugin:
name: my-router
version: 1.0.0
type: router
capabilities:
- traffic-routing
- weight-based
- header-based
config:
type: object
properties:
endpoint:
type: string
description: Router API endpoint
auth:
type: object
properties:
type:
enum: [basic, token]
credentialRef:
type: string
```
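When a custom plugin is registered, the orchestrator can check its declared capabilities against the requested traffic-split type before use. A hedged sketch (capability names mirror the manifest above; the actual registry API may differ):

```typescript
interface PluginManifest {
  name: string;
  type: string;          // e.g. "router"
  capabilities: string[]; // e.g. ["traffic-routing", "weight-based"]
}

// Map each traffic-split type to the capability a router plugin must declare.
const SPLIT_CAPABILITY: Record<string, string> = {
  weight: "weight-based",
  header: "header-based",
  cookie: "cookie-based",
};

function supportsSplit(manifest: PluginManifest, splitType: string): boolean {
  if (manifest.type !== "router") return false;
  const needed = SPLIT_CAPABILITY[splitType];
  return needed !== undefined && manifest.capabilities.includes(needed);
}
```

Rejecting unsupported split types at registration time turns a mid-rollout routing failure into an immediate configuration error.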
---
## See Also
- [Progressive Delivery Module](../modules/progressive-delivery.md)
- [Plugin System](../modules/plugin-system.md)
- [Canary Controller](canary.md)
- [A/B Release Models](ab-releases.md)


@@ -0,0 +1,246 @@
# Implementation Roadmap
> Phased delivery plan for the Release Orchestrator implementation.
**Status:** Planned
**Source:** [Architecture Advisory Section 14](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related:** [Implementation Guide](implementation-guide.md), [Test Structure](test-structure.md)
## Overview
The Release Orchestrator is delivered in 8 phases over 34 weeks, progressively building from foundational infrastructure to full plugin ecosystem support.
---
## Phased Delivery Plan
### Phase 1: Foundation (Weeks 1-4)
**Goal:** Core infrastructure and basic release management
| Week | Deliverables |
|------|--------------|
| Week 1 | Database schema migration; INTHUB integration-manager; connection-profiles |
| Week 2 | ENVMGR environment-manager; target-registry (basic) |
| Week 3 | RELMAN component-registry; version-manager; release-manager |
| Week 4 | Basic release CRUD APIs; CLI commands; integration tests |
**Exit Criteria:**
- Can create environments with config
- Can register components with image repos
- Can create releases with pinned digests
- Can list/search releases
**Certified Path:** Manual release creation; no deployment yet
---
### Phase 2: Workflow Engine (Weeks 5-8)
**Goal:** Workflow execution capability
| Week | Deliverables |
|------|--------------|
| Week 5 | WORKFL step-registry; built-in step types (approval, policy-gate, notify) |
| Week 6 | WORKFL workflow-designer; workflow template CRUD |
| Week 7 | WORKFL workflow-engine; DAG execution; state machine |
| Week 8 | Step executor; retry logic; timeout handling; workflow run APIs |
**Exit Criteria:**
- Can create workflow templates via API
- Can execute workflows with approval steps
- Workflow state machine handles all transitions
- Step retries work correctly
**Certified Path:** Approval-only workflows; no deployment execution yet
---
### Phase 3: Promotion & Decision (Weeks 9-12)
**Goal:** Promotion workflow with security gates
| Week | Deliverables |
|------|--------------|
| Week 9 | PROMOT promotion-manager; approval-gateway |
| Week 10 | PROMOT decision-engine; security gate integration with SCANENG |
| Week 11 | Gate registry; freeze window gate; SoD enforcement |
| Week 12 | Promotion APIs; "Why blocked?" endpoint; decision record |
**Exit Criteria:**
- Can request promotion
- Security gates evaluate scan verdicts
- Approval workflow enforces SoD
- Decision record captures gate results
**Certified Path:** Promotions with security + approval gates; no deployment yet
---
### Phase 4: Deployment Execution (Weeks 13-18)
**Goal:** Deploy to Docker/Compose targets
| Week | Deliverables |
|------|--------------|
| Week 13 | AGENTS agent-core; agent registration; heartbeat |
| Week 14 | AGENTS agent-docker; Docker host deployment |
| Week 15 | AGENTS agent-compose; Compose deployment |
| Week 16 | DEPLOY deploy-orchestrator; artifact-generator |
| Week 17 | DEPLOY rollback-manager; version sticker writing |
| Week 18 | RELEVI evidence-collector; evidence-signer; audit-exporter |
**Exit Criteria:**
- Agents can register and receive tasks
- Docker deployment works with digest verification
- Compose deployment writes lock files
- Rollback restores previous version
- Evidence packets generated for deployments
**Certified Path:** Full promotion -> deployment flow for Docker/Compose
---
### Phase 5: UI & Polish (Weeks 19-22)
**Goal:** Web console for release orchestration
| Week | Deliverables |
|------|--------------|
| Week 19 | Dashboard components; metrics widgets |
| Week 20 | Environment overview; release detail screens |
| Week 21 | Workflow editor (graph); run visualization |
| Week 22 | Promotion UI; approval queue; "Why blocked?" modal |
**Exit Criteria:**
- Dashboard shows operational metrics
- Can manage environments/releases via UI
- Can create/edit workflows in graph editor
- Can approve promotions via UI
**Certified Path:** Complete v1 user experience
---
### Phase 6: Progressive Delivery (Weeks 23-26)
**Goal:** A/B releases and canary deployments
| Week | Deliverables |
|------|--------------|
| Week 23 | PROGDL ab-manager; target-group A/B |
| Week 24 | PROGDL canary-controller; stage execution |
| Week 25 | PROGDL traffic-router; Nginx plugin |
| Week 26 | Canary UI; traffic visualization; health monitoring |
**Exit Criteria:**
- Can create A/B release with variations
- Canary controller advances stages based on health
- Traffic router shifts weights
- Rollback on health failure works
**Certified Path:** Target-group A/B; Nginx router-based A/B
---
### Phase 7: Extended Targets (Weeks 27-30)
**Goal:** ECS and Nomad support; SSH/WinRM agentless
| Week | Deliverables |
|------|--------------|
| Week 27 | AGENTS agent-ssh; SSH remote executor |
| Week 28 | AGENTS agent-winrm; WinRM remote executor |
| Week 29 | AGENTS agent-ecs; ECS deployment |
| Week 30 | AGENTS agent-nomad; Nomad deployment |
**Exit Criteria:**
- SSH deployment works with script execution
- WinRM deployment works with PowerShell
- ECS task definition updates work
- Nomad job submissions work
**Certified Path:** All target types operational
---
### Phase 8: Plugin Ecosystem (Weeks 31-34)
**Goal:** Full plugin system; external integrations
| Week | Deliverables |
|------|--------------|
| Week 31 | PLUGIN plugin-registry; plugin-loader |
| Week 32 | PLUGIN plugin-sandbox; plugin-sdk |
| Week 33 | GitHub plugin; GitLab plugin |
| Week 34 | Jenkins plugin; Vault plugin |
**Exit Criteria:**
- Can install and configure plugins
- Plugins can contribute step types
- Plugins can contribute integrations
- Plugin sandbox enforces limits
**Certified Path:** GitHub + Harbor + Docker/Compose + Vault
---
## Resource Requirements
### Team Structure
| Role | Count | Responsibilities |
|------|-------|------------------|
| Tech Lead | 1 | Architecture decisions; code review; unblocking |
| Backend Engineers | 4 | Module development; API implementation |
| Frontend Engineers | 2 | Web console; dashboard; workflow editor |
| DevOps Engineer | 1 | CI/CD; infrastructure; agent deployment |
| QA Engineer | 1 | Test automation; integration testing |
| Technical Writer | 0.5 | Documentation; API docs; user guides |
### Infrastructure Requirements
| Component | Specification |
|-----------|---------------|
| PostgreSQL | Primary database; 16+ recommended; read replicas for scale |
| Redis | Job queues; caching; session storage |
| Object Storage | S3-compatible; evidence packets; large artifacts |
| Container Runtime | Docker; for plugin sandboxes |
| Kubernetes | Optional; for Stella core deployment (not required for targets) |
---
## Risk Mitigation
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Agent security complexity | High | High | Early security review; penetration testing; mTLS implementation in Phase 4 |
| Workflow state machine edge cases | Medium | High | Comprehensive state transition tests; chaos testing |
| Plugin sandbox escapes | Low | Critical | Security audit; capability restrictions; resource limits |
| Database migration issues | Medium | Medium | Staged rollout; rollback scripts; data validation |
| UI performance with large workflows | Medium | Medium | Virtual rendering; lazy loading; performance testing |
| Integration compatibility | High | Medium | Abstract connector interface; extensive integration tests |
---
## Success Metrics
| Phase | Key Metrics |
|-------|-------------|
| Phase 1 | Release creation time < 5s; API latency p99 < 200ms |
| Phase 2 | Workflow execution reliability > 99.9% |
| Phase 3 | Gate evaluation time < 500ms; SoD enforcement 100% |
| Phase 4 | Deployment success rate > 99%; rollback time < 60s |
| Phase 5 | UI initial load < 2s; real-time update latency < 1s |
| Phase 6 | Canary rollback trigger time < 30s |
| Phase 7 | All target type coverage with unified API |
| Phase 8 | Plugin sandbox isolation verified by security audit |
---
## References
- [Sprint Index](../../implplan/SPRINT_20260110_100_000_INDEX_release_orchestrator.md)
- [Implementation Guide](implementation-guide.md)
- [Test Structure](test-structure.md)
- [Architecture Overview](architecture.md)

# Agent Security Model
## Overview
Agents are trusted components that execute deployment tasks on targets. Their security model ensures:
- Strong identity through mTLS certificates
- Minimal privilege through scoped task credentials
- Audit trail through signed task receipts
- Isolation through process sandboxing
## Agent Registration Flow
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ AGENT REGISTRATION FLOW │
│ │
│ 1. Admin generates registration token (one-time use) │
│ POST /api/v1/admin/agent-tokens │
│ Response: { token: "reg_xxx", expiresAt: "..." } │
│ │
│ 2. Agent starts with registration token │
│ ./stella-agent --register --token=reg_xxx │
│ │
│ 3. Agent requests mTLS certificate │
│ POST /api/v1/agents/register │
│ Headers: X-Registration-Token: reg_xxx │
│ Body: { name, version, capabilities, csr } │
│ Response: { agentId, certificate, caCertificate } │
│ │
│ 4. Agent establishes mTLS connection │
│ Uses issued certificate for all subsequent requests │
│ │
│ 5. Agent requests short-lived JWT for task execution │
│ POST /api/v1/agents/token (over mTLS) │
│ Response: { token, expiresIn: 3600 } │
│ │
│ 6. Agent refreshes token before expiration │
│ Token refresh only over mTLS connection │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
## mTLS Communication
All agent-to-core communication uses mutual TLS:
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ AGENT COMMUNICATION SECURITY │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ AGENT │ │ STELLA CORE │ │
│ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │
│ │ mTLS (mutual TLS) │ │
│ │ - Agent cert signed by Stella CA │ │
│ │ - Server cert verified by Agent │ │
│ │ - TLS 1.3 only │ │
│ │ - Perfect forward secrecy │ │
│ │◄────────────────────────────────────────►│ │
│ │ │ │
│ │ Encrypted payload │ │
│ │ - Task payloads encrypted with │ │
│ │ agent-specific key │ │
│ │ - Logs encrypted in transit │ │
│ │◄────────────────────────────────────────►│ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
### TLS Requirements
| Requirement | Value |
|-------------|-------|
| Protocol | TLS 1.3 only |
| Cipher Suites | TLS_AES_256_GCM_SHA384, TLS_CHACHA20_POLY1305_SHA256 |
| Key Exchange | ECDHE with P-384 or X25519 |
| Certificate Key | RSA 4096-bit or ECDSA P-384 |
| Certificate Validity | 90 days (auto-renewed) |
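The pins in the table map directly onto listener configuration. A minimal sketch using Node's `tls` option types — illustrative only: the core service is not necessarily Node-based, and the `PemMaterial` shape is an assumption, not part of the spec:

```typescript
import type { TlsOptions } from "node:tls";

// Illustrative PEM bundle shape (assumption, not the spec's wire format).
export interface PemMaterial {
  ca: string;   // Stella CA bundle (PEM)
  cert: string; // server certificate (PEM)
  key: string;  // server private key (PEM)
}

// Listener options enforcing the requirements table above.
export function agentTlsOptions(pem: PemMaterial): TlsOptions {
  return {
    minVersion: "TLSv1.3",
    maxVersion: "TLSv1.3",            // TLS 1.3 only
    ciphers: "TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256",
    ecdhCurve: "X25519:secp384r1",    // X25519 or ECDHE P-384
    requestCert: true,                // mutual TLS: client certificate required
    rejectUnauthorized: true,         // reject certs not signed by the Stella CA
    ca: pem.ca,
    cert: pem.cert,
    key: pem.key,
  };
}
```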
## Certificate Management
### Certificate Structure
```typescript
interface AgentCertificate {
subject: {
CN: string; // Agent name
O: string; // "Stella Ops"
OU: string; // Tenant ID
};
serialNumber: string;
issuer: string; // Stella CA
validFrom: DateTime;
validTo: DateTime;
extensions: {
keyUsage: ["digitalSignature", "keyEncipherment"];
extendedKeyUsage: ["clientAuth"];
subjectAltName: string[]; // Agent ID as URI
};
}
```
### Certificate Renewal
Agents automatically renew certificates before expiration:
1. Agent detects certificate expiring within 30 days
2. Agent generates new CSR with same identity
3. Agent submits renewal request over existing mTLS connection
4. Authority issues new certificate
5. Agent transitions to new certificate seamlessly
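Step 1 reduces to a simple expiry-window check; a sketch (the function name is illustrative, the 30-day window mirrors the text above):

```typescript
// Renew once the certificate is within 30 days of expiry (step 1 above).
const RENEWAL_WINDOW_DAYS = 30;

export function needsRenewal(validTo: Date, now: Date): boolean {
  // Remaining lifetime in days (86_400_000 ms per day).
  const daysLeft = (validTo.getTime() - now.getTime()) / 86_400_000;
  return daysLeft <= RENEWAL_WINDOW_DAYS;
}
```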
## Secrets Management
Secrets are NEVER stored in the Stella database. Only vault references are stored.
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ SECRETS FLOW (NEVER STORED IN DB) │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ VAULT │ │ STELLA CORE │ │ AGENT │ │
│ │ (Source) │ │ (Broker) │ │ (Consumer) │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ │ │ Task requires secret │ │
│ │ │ │ │
│ │ Fetch with service │ │ │
│ │ account token │ │ │
│ │◄─────────────────────── │ │
│ │ │ │ │
│ │ Return secret │ │ │
│ │ (wrapped, short TTL) │ │ │
│ │────────────────────────► │ │
│ │ │ │ │
│ │ │ Embed in task payload │ │
│ │ │ (encrypted) │ │
│ │ │────────────────────────► │
│ │ │ │ │
│ │ │ │ Decrypt │
│ │ │ │ Use for task │
│ │ │ │ Discard │
│ │
│ Rules: │
│ - Secrets NEVER stored in Stella database │
│ - Only Vault references stored │
│ - Secrets fetched at execution time only │
│ - Secrets not logged (masked in logs) │
│ - Secrets not persisted in agent memory beyond task scope │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Task Security
### Task Assignment
```typescript
interface AgentTask {
id: UUID;
type: TaskType;
targetId: UUID;
payload: TaskPayload;
credentials: EncryptedCredentials; // Encrypted with agent's public key
timeout: number;
priority: TaskPriority;
idempotencyKey: string;
assignedAt: DateTime;
expiresAt: DateTime;
}
```
### Credential Scoping
Task credentials are:
- Scoped to specific target only
- Valid only for task duration
- Encrypted with agent's public key
- Logged when accessed (without values)
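The first two rules can be sketched as a usability check the agent runs before touching a credential — the `ScopedCredential` shape below is an assumption for illustration, not the wire format:

```typescript
// Illustrative shape: a credential bound to one target and one task window.
export interface ScopedCredential {
  targetId: string;
  notBefore: Date;
  notAfter: Date;
}

// A task credential is usable only for its bound target (rule 1)
// and only while the task is running (rule 2).
export function credentialUsable(
  cred: ScopedCredential,
  targetId: string,
  now: Date,
): boolean {
  return (
    cred.targetId === targetId &&
    now >= cred.notBefore &&
    now <= cred.notAfter
  );
}
```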
### Task Execution Isolation
Agents execute tasks with isolation:
```typescript
interface TaskExecutionContext {
// Process isolation
workingDirectory: string; // Unique per task
processUser: string; // Non-root user
networkNamespace: string; // If network isolation enabled
// Resource limits
memoryLimit: number; // Bytes
cpuLimit: number; // Millicores
diskLimit: number; // Bytes
networkEgress: string[]; // Allowed destinations
// Cleanup
cleanupOnComplete: boolean;
cleanupTimeout: number;
}
```
## Agent Capabilities
Agents declare capabilities that determine what tasks they can execute:
```typescript
interface AgentCapabilities {
docker?: DockerCapability;
compose?: ComposeCapability;
ssh?: SshCapability;
winrm?: WinrmCapability;
ecs?: EcsCapability;
nomad?: NomadCapability;
}
interface DockerCapability {
version: string;
apiVersion: string;
runtimes: string[];
registryAuth: boolean;
}
interface ComposeCapability {
version: string;
fileFormats: string[];
}
```
## Heartbeat Protocol
```typescript
interface AgentHeartbeat {
agentId: UUID;
timestamp: DateTime;
status: "healthy" | "degraded";
resourceUsage: {
cpuPercent: number;
memoryPercent: number;
diskPercent: number;
networkRxBytes: number;
networkTxBytes: number;
};
activeTaskCount: number;
completedTasks: number;
failedTasks: number;
errors: string[];
signature: string; // HMAC of heartbeat data
}
```
### Heartbeat Validation
1. Verify signature matches expected HMAC
2. Check timestamp is within acceptable skew (30s)
3. Update agent status based on heartbeat content
4. Trigger alerts if heartbeat missing for >90s
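Steps 1 and 2 can be sketched as follows, assuming an HMAC-SHA256 signature over the canonical heartbeat payload (the payload/key handling here is illustrative):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const MAX_SKEW_MS = 30_000; // step 2: accept at most 30s of clock skew

export function validateHeartbeat(
  payload: string,       // canonical heartbeat JSON without the `signature` field
  signatureHex: string,  // heartbeat.signature
  sharedKey: Buffer,
  sentAt: Date,          // heartbeat.timestamp
  now: Date,
): boolean {
  // Step 1: recompute the HMAC and compare in constant time.
  const expected = createHmac("sha256", sharedKey).update(payload).digest();
  const received = Buffer.from(signatureHex, "hex");
  if (received.length !== expected.length) return false;
  if (!timingSafeEqual(received, expected)) return false;
  // Step 2: reject stale or future-dated heartbeats.
  return Math.abs(now.getTime() - sentAt.getTime()) <= MAX_SKEW_MS;
}
```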
## Agent Revocation
When an agent is compromised or decommissioned:
1. Certificate added to CRL (Certificate Revocation List)
2. All pending tasks for agent cancelled
3. Agent removed from target assignments
4. Audit event logged
5. New agent can be registered with same name (new identity)
## Security Checklist
| Control | Implementation |
|---------|----------------|
| Identity | mTLS certificates signed by internal CA |
| Authentication | Certificate-based + short-lived JWT |
| Authorization | Task-scoped credentials |
| Encryption | TLS 1.3 for transport, envelope encryption for secrets |
| Isolation | Process sandboxing, resource limits |
| Audit | All task assignments and completions logged |
| Revocation | CRL for compromised agents |
| Secret handling | Vault integration, no persistence |
## References
- [Security Overview](overview.md)
- [Authentication & Authorization](auth.md)
- [Threat Model](threat-model.md)

# Audit Trail
> Audit event structure and audited operations for compliance and forensics.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 8.5](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Evidence Module](../modules/evidence.md), [Security Overview](overview.md)
**Sprints:** [109_001 Evidence Collector](../../../../implplan/SPRINT_20260110_109_001_RELEVI_evidence_collector.md)
## Overview
The Release Orchestrator maintains a tamper-evident audit trail of all security-relevant operations. Audit events are cryptographically chained to detect tampering.
---
## Audit Event Structure
### TypeScript Interface
```typescript
interface AuditEvent {
id: UUID;
timestamp: DateTime;
tenantId: UUID;
// Actor
actorType: "user" | "agent" | "system" | "plugin";
actorId: UUID;
actorName: string;
actorIp?: string;
// Action
action: string; // "promotion.approved", "deployment.started"
resource: string; // "promotion"
resourceId: UUID;
// Context
environmentId?: UUID;
releaseId?: UUID;
promotionId?: UUID;
// Details
before?: object; // State before (for updates)
after?: object; // State after
metadata?: object; // Additional context
// Integrity
previousEventHash: string; // Hash chain for tamper detection
eventHash: string;
}
```
---
## Audited Operations
| Category | Operations |
|----------|------------|
| **Authentication** | Login, logout, token refresh, failed attempts |
| **Authorization** | Permission denied events |
| **Environments** | Create, update, delete, freeze window changes |
| **Releases** | Create, deprecate, archive |
| **Promotions** | Request, approve, reject, cancel |
| **Deployments** | Start, complete, fail, rollback |
| **Targets** | Register, update, delete, health changes |
| **Agents** | Register, heartbeat gaps, capability changes |
| **Integrations** | Create, update, delete, test |
| **Plugins** | Enable, disable, config changes |
| **Evidence** | Create (never update/delete) |
---
## Hash Chain
### Chain Verification
The audit trail uses SHA-256 hash chaining for tamper detection:
```typescript
import { createHash } from "node:crypto";

interface HashChainEntry {
  eventId: UUID;
  eventHash: string;
  previousEventHash: string;
}

interface VerificationResult {
  valid: boolean;
  brokenAt?: number;
  reason?: string;
}

function sha256(payload: string): string {
  return "sha256:" + createHash("sha256").update(payload, "utf8").digest("hex");
}

function computeEventHash(event: AuditEvent): string {
  // Field order is fixed so independently recomputed hashes agree;
  // serialization must be canonical.
  const payload = JSON.stringify({
    id: event.id,
    timestamp: event.timestamp,
    tenantId: event.tenantId,
    actorType: event.actorType,
    actorId: event.actorId,
    action: event.action,
    resource: event.resource,
    resourceId: event.resourceId,
    previousEventHash: event.previousEventHash,
  });
  return sha256(payload);
}

function verifyChain(events: AuditEvent[]): VerificationResult {
  for (let i = 1; i < events.length; i++) {
    const current = events[i];
    const previous = events[i - 1];
    if (current.previousEventHash !== previous.eventHash) {
      return {
        valid: false,
        brokenAt: i,
        reason: "Hash chain broken"
      };
    }
    const computed = computeEventHash(current);
    if (computed !== current.eventHash) {
      return {
        valid: false,
        brokenAt: i,
        reason: "Event hash mismatch"
      };
    }
  }
  return { valid: true };
}
```
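Verification only holds if each event was linked at write time: the new event records the previous event's hash before its own hash is computed. A minimal append-side sketch with a reduced, illustrative event shape (the `sha256:genesis` sentinel for the first event is an assumption):

```typescript
import { createHash } from "node:crypto";

// Reduced event shape for illustration only.
export interface ChainedEvent {
  id: string;
  action: string;
  previousEventHash: string;
  eventHash: string;
}

function hashOf(e: Omit<ChainedEvent, "eventHash">): string {
  const payload = JSON.stringify({
    id: e.id,
    action: e.action,
    previousEventHash: e.previousEventHash,
  });
  return "sha256:" + createHash("sha256").update(payload).digest("hex");
}

// Link the new event to the tail of the chain, then hash it.
export function appendEvent(chain: ChainedEvent[], id: string, action: string): ChainedEvent {
  const previousEventHash =
    chain.length > 0 ? chain[chain.length - 1].eventHash : "sha256:genesis";
  const partial = { id, action, previousEventHash };
  const event = { ...partial, eventHash: hashOf(partial) };
  chain.push(event);
  return event;
}
```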
---
## Example Audit Events
### Promotion Approved
```json
{
"id": "evt-123",
"timestamp": "2026-01-09T14:32:15Z",
"tenantId": "tenant-uuid",
"actorType": "user",
"actorId": "user-uuid",
"actorName": "jane@example.com",
"actorIp": "192.168.1.100",
"action": "promotion.approved",
"resource": "promotion",
"resourceId": "promo-uuid",
"environmentId": "env-uuid",
"releaseId": "rel-uuid",
"promotionId": "promo-uuid",
"before": {
"status": "pending"
},
"after": {
"status": "approved",
"approvals": 2
},
"metadata": {
"comment": "LGTM"
},
"previousEventHash": "sha256:abc...",
"eventHash": "sha256:def..."
}
```
### Deployment Started
```json
{
"id": "evt-124",
"timestamp": "2026-01-09T14:32:20Z",
"tenantId": "tenant-uuid",
"actorType": "system",
"actorId": "system",
"actorName": "deployment-orchestrator",
"action": "deployment.started",
"resource": "deployment",
"resourceId": "deploy-uuid",
"environmentId": "env-uuid",
"releaseId": "rel-uuid",
"promotionId": "promo-uuid",
"after": {
"status": "deploying",
"strategy": "rolling",
"targetCount": 5
},
"previousEventHash": "sha256:def...",
"eventHash": "sha256:ghi..."
}
```
---
## Retention Policy
| Environment | Retention Period |
|-------------|------------------|
| All tenants | 7 years (compliance) |
| After tenant deletion | 7 years (legal hold) |
| Archive format | NDJSON, signed |
---
## Export Format
Audit events can be exported for compliance reporting:
```bash
# Export audit trail for a date range
GET /api/v1/audit/export?
start=2026-01-01T00:00:00Z&
end=2026-01-31T23:59:59Z&
format=ndjson
```
Response includes signed digest for verification:
```json
{
"export": {
"startDate": "2026-01-01T00:00:00Z",
"endDate": "2026-01-31T23:59:59Z",
"eventCount": 15234,
"firstEventHash": "sha256:abc...",
"lastEventHash": "sha256:xyz...",
"downloadUrl": "https://..."
},
"signature": "base64-signature",
"signedAt": "2026-02-01T00:00:00Z"
}
```
---
## See Also
- [Security Overview](overview.md)
- [Evidence](../modules/evidence.md)
- [Logging](../operations/logging.md)
- [Evidence Schema](../appendices/evidence-schema.md)

# Authentication & Authorization
## Authentication Methods
### OAuth 2.0 for Human Users
```
┌──────────────────────────────────────────────────────────────────────────────┐
│ OAUTH 2.0 AUTHORIZATION CODE FLOW │
│ │
│ ┌──────────┐ ┌──────────────┐ │
│ │ Browser │ │ Authority │ │
│ └────┬─────┘ └──────┬───────┘ │
│ │ │ │
│ │ 1. Login request │ │
│ │ ────────────────────────────────────► │ │
│ │ │ │
│ │ 2. Redirect to IdP │ │
│ │ ◄──────────────────────────────────── │ │
│ │ │ │
│ │ 3. User authenticates at IdP │ │
│ │ ─────────────────────────────────► │ │
│ │ │ │
│ │ 4. IdP callback with code │ │
│ │ ◄──────────────────────────────────── │ │
│ │ │ │
│ │ 5. Exchange code for tokens │ │
│ │ ────────────────────────────────────► │ │
│ │ │ │
│ │ 6. Access token + refresh token │ │
│ │ ◄──────────────────────────────────── │ │
│ │ │ │
└──────────────────────────────────────────────────────────────────────────────┘
```
### mTLS for Agents
Agents authenticate using mutual TLS with certificates issued by Stella's internal CA.
**Registration Flow:**
1. Admin generates one-time registration token
2. Agent starts with registration token
3. Agent submits CSR (Certificate Signing Request)
4. Authority issues certificate signed by Stella CA
5. Agent uses certificate for all subsequent requests
### API Keys for Service-to-Service
External services can use API keys for programmatic access:
- Keys are tenant-scoped
- Keys can have restricted permissions
- Keys can have expiration dates
- Key usage is audited
## JWT Token Structure
### Access Token Claims
```typescript
interface AccessTokenClaims {
// Standard claims
iss: string; // "https://authority.stella.local"
sub: string; // User ID
aud: string[]; // ["stella-api"]
exp: number; // Expiration timestamp
iat: number; // Issued at timestamp
jti: string; // Unique token ID
// Custom claims
tenant_id: string;
roles: string[];
permissions: Permission[];
email?: string;
name?: string;
}
```
### Token Lifetimes
| Token Type | Lifetime | Refresh |
|------------|----------|---------|
| Access Token | 15 minutes | Via refresh token |
| Refresh Token | 7 days | Rotated on use |
| Agent Token | 1 hour | Via mTLS connection |
| API Key | Configurable | Not refreshed |
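Given the 15-minute access token, clients typically refresh proactively rather than waiting for a 401. A sketch of that scheduling decision — the 80% threshold is an assumption, not part of the spec:

```typescript
// Refresh once a configurable fraction of the token's lifetime has elapsed,
// so an in-flight request never races expiry.
export function shouldRefresh(
  issuedAt: Date,
  expiresAt: Date,
  now: Date,
  fraction = 0.8, // assumed threshold: refresh at 80% of lifetime
): boolean {
  const lifetime = expiresAt.getTime() - issuedAt.getTime();
  const elapsed = now.getTime() - issuedAt.getTime();
  return elapsed >= lifetime * fraction;
}
```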
## Authorization Model
### Resource Types
```typescript
type ResourceType =
| "environment"
| "release"
| "promotion"
| "target"
| "agent"
| "workflow"
| "plugin"
| "integration"
| "evidence";
```
### Action Types
```typescript
type ActionType =
| "create"
| "read"
| "update"
| "delete"
| "execute"
| "approve"
| "deploy"
| "rollback";
```
### Permission Structure
```typescript
interface Permission {
resource: ResourceType;
action: ActionType;
scope?: PermissionScope;
conditions?: Condition[];
}
type PermissionScope =
| "*" // All resources
| { environmentId: UUID } // Specific environment
| { labels: Record<string, string> }; // Label-based
```
### Built-in Roles
| Role | Description | Key Permissions |
|------|-------------|-----------------|
| `admin` | Full access | All permissions |
| `release_manager` | Manage releases and promotions | Create releases, request promotions |
| `deployer` | Execute and monitor deployments | Approve promotions (where allowed); read releases, promotions, targets, agents |
| `approver` | Approve promotions | Approve promotions (SoD respected) |
| `viewer` | Read-only access | Read all resources |
| `agent` | Agent service account | Execute deployment tasks |
### Role Definitions
```typescript
const roles = {
admin: {
permissions: [
{ resource: "*", action: "*" }
]
},
release_manager: {
permissions: [
{ resource: "release", action: "create" },
{ resource: "release", action: "read" },
{ resource: "release", action: "update" },
{ resource: "promotion", action: "create" },
{ resource: "promotion", action: "read" },
{ resource: "environment", action: "read" },
{ resource: "workflow", action: "read" },
{ resource: "workflow", action: "execute" }
]
},
deployer: {
permissions: [
{ resource: "release", action: "read" },
{ resource: "promotion", action: "read" },
{ resource: "promotion", action: "approve" },
{ resource: "environment", action: "read" },
{ resource: "target", action: "read" },
{ resource: "agent", action: "read" }
]
},
approver: {
permissions: [
{ resource: "promotion", action: "read" },
{ resource: "promotion", action: "approve" },
{ resource: "release", action: "read" },
{ resource: "environment", action: "read" }
]
},
viewer: {
permissions: [
{ resource: "*", action: "read" }
]
}
};
```
## Environment-Scoped Permissions
Permissions can be scoped to specific environments:
```typescript
// User can approve promotions only to staging
{
resource: "promotion",
action: "approve",
scope: { environmentId: "staging-env-id" }
}
// User can deploy only to targets with specific labels
{
resource: "target",
action: "deploy",
scope: { labels: { "tier": "frontend" } }
}
```
## Separation of Duties (SoD)
When SoD is enabled for an environment:
- The user who requested a promotion cannot approve it
- The user who created a release cannot be the sole approver
- Approval records include SoD verification status
```typescript
interface ApprovalValidation {
promotionId: UUID;
approverId: UUID;
requesterId: UUID;
sodRequired: boolean;
sodSatisfied: boolean;
validationResult: "valid" | "self_approval_denied" | "sod_violation";
}
```
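The SoD rules above can be sketched as a pure check that feeds `ApprovalValidation` — a simplified sketch: the mapping of the two failure codes and the sole-approver test against prior approvals are assumptions, not the spec:

```typescript
export type ValidationResult = "valid" | "self_approval_denied" | "sod_violation";

export function validateApproval(p: {
  approverId: string;
  requesterId: string;
  releaseCreatorId: string;
  priorApprovers: string[]; // approvals already recorded on this promotion
  sodRequired: boolean;
}): ValidationResult {
  // Self-approval is rejected regardless of SoD configuration.
  if (p.approverId === p.requesterId) return "self_approval_denied";
  if (!p.sodRequired) return "valid";
  // The release creator may approve, but not as the sole approver
  // (simplified: require at least one other approval on record first).
  if (p.approverId === p.releaseCreatorId && p.priorApprovers.length === 0) {
    return "sod_violation";
  }
  return "valid";
}
```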
## Permission Checking Algorithm
```typescript
async function checkPermission(
  userId: UUID,
  resource: ResourceType,
  action: ActionType,
  resourceId?: UUID
): Promise<boolean> {
  // 1. Get user's roles and direct permissions
  const userRoles = await getUserRoles(userId);
  const userPermissions = await getUserPermissions(userId);
  // 2. Expand role permissions
  const rolePermissions = userRoles.flatMap(r => roles[r].permissions);
  const allPermissions = [...rolePermissions, ...userPermissions];
  // 3. Check for matching permission
  for (const perm of allPermissions) {
    if (matchesResource(perm.resource, resource) &&
        matchesAction(perm.action, action) &&
        matchesScope(perm.scope, resourceId) &&
        evaluateConditions(perm.conditions)) {
      return true;
    }
  }
  return false;
}

function matchesResource(pattern: string, resource: string): boolean {
  return pattern === "*" || pattern === resource;
}

function matchesAction(pattern: string, action: string): boolean {
  return pattern === "*" || pattern === action;
}

// Simplified sketches of the remaining helpers; the real implementations
// also resolve label selectors and evaluate condition expressions.
function matchesScope(scope: PermissionScope | undefined, resourceId?: UUID): boolean {
  if (!scope || scope === "*") return true;              // unscoped permission applies everywhere
  if ("environmentId" in scope) return scope.environmentId === resourceId;
  return true;                                           // label scopes need the resource's labels
}

function evaluateConditions(conditions?: Condition[]): boolean {
  return !conditions || conditions.length === 0;         // no conditions means allowed
}
```
## API Authorization Headers
All API requests require:
```http
Authorization: Bearer <access_token>
```
For agent requests (over mTLS):
```http
X-Agent-Id: <agent_id>
Authorization: Bearer <agent_token>
```
## Permission Denied Response
```json
{
"success": false,
"error": {
"code": "PERMISSION_DENIED",
"message": "User does not have permission to approve promotions to production",
"details": {
"resource": "promotion",
"action": "approve",
"scope": { "environmentId": "prod-env-id" },
"requiredRoles": ["admin", "approver"],
"userRoles": ["viewer"]
}
}
}
```
## References
- [Security Overview](overview.md)
- [Agent Security](agent-security.md)
- [Authority Module](../../../authority/architecture.md)

# Security Architecture Overview
## Security Principles
| Principle | Implementation |
|-----------|----------------|
| **Defense in depth** | Multiple layers: network, auth, authz, audit |
| **Least privilege** | Role-based access; minimal permissions |
| **Zero trust** | All requests authenticated; mTLS for agents |
| **Secrets hygiene** | Secrets in vault; never in DB; ephemeral injection |
| **Audit everything** | All mutations logged; evidence trail |
| **Immutable evidence** | Evidence packets append-only; cryptographically signed |
## Authentication Architecture
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ AUTHENTICATION ARCHITECTURE │
│ │
│ Human Users Service/Agent │
│ ┌──────────┐ ┌──────────┐ │
│ │ Browser │ │ Agent │ │
│ └────┬─────┘ └────┬─────┘ │
│ │ │ │
│ │ OAuth 2.0 │ mTLS + JWT │
│ │ Authorization Code │ │
│ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ AUTHORITY MODULE │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ OAuth 2.0 │ │ mTLS │ │ API Key │ │ │
│ │ │ Provider │ │ Validator │ │ Validator │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ TOKEN ISSUER │ │ │
│ │ │ - Short-lived JWT (15 min) │ │ │
│ │ │ - Contains: user_id, tenant_id, roles, permissions │ │ │
│ │ │ - Signed with RS256 │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ API GATEWAY │ │
│ │ │ │
│ │ - Validate JWT signature │ │
│ │ - Check token expiration │ │
│ │ - Extract tenant context │ │
│ │ - Enforce rate limits │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Authorization Model
### Permission Structure
```typescript
interface Permission {
resource: ResourceType;
action: ActionType;
scope?: ScopeType;
conditions?: Condition[];
}
type ResourceType =
| "environment"
| "release"
| "promotion"
| "target"
| "agent"
| "workflow"
| "plugin"
| "integration"
| "evidence";
type ActionType =
| "create"
| "read"
| "update"
| "delete"
| "execute"
| "approve"
| "deploy"
| "rollback";
type ScopeType =
| "*" // All resources
| { environmentId: UUID } // Specific environment
| { labels: Record<string, string> }; // Label-based
```
### Role Definitions
| Role | Permissions |
|------|-------------|
| `admin` | All permissions on all resources |
| `release_manager` | Full access to releases, promotions; read environments/targets |
| `deployer` | Read releases; create/read promotions; read targets |
| `approver` | Read/approve promotions |
| `viewer` | Read-only access to all resources |
### Environment-Scoped Roles
Roles can be scoped to specific environments:
```typescript
// Example: Production deployer can only deploy to production
const prodDeployer = {
role: "deployer",
scope: { environmentId: "prod-environment-uuid" }
};
```
## Policy Enforcement Points
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ POLICY ENFORCEMENT POINTS │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ API LAYER (PEP 1) │ │
│ │ - Authenticate request │ │
│ │ - Check resource-level permissions │ │
│ │ - Enforce tenant isolation │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ SERVICE LAYER (PEP 2) │ │
│ │ - Check business-level permissions │ │
│ │ - Validate separation of duties │ │
│ │ - Enforce approval policies │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ DECISION ENGINE (PEP 3) │ │
│ │ - Evaluate security gates │ │
│ │ - Evaluate custom OPA policies │ │
│ │ - Produce signed decision records │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ DATA LAYER (PEP 4) │ │
│ │ - Row-level security (tenant_id) │ │
│ │ - Append-only enforcement (evidence) │ │
│ │ - Encryption at rest │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Agent Security Model
See [Agent Security](agent-security.md) for detailed agent security architecture.
Key features:
- mTLS authentication with CA-signed certificates
- One-time registration tokens
- Short-lived JWT for task execution
- Encrypted task payloads
- Scoped credentials per task
## Secrets Management
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ SECRETS FLOW (NEVER STORED IN DB) │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ VAULT │ │ STELLA CORE │ │ AGENT │ │
│ │ (Source) │ │ (Broker) │ │ (Consumer) │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ │ │ Task requires secret │ │
│ │ │ │ │
│ │ Fetch with service │ │ │
│ │ account token │ │ │
│ │◄─────────────────────── │ │
│ │ │ │ │
│ │ Return secret │ │ │
│ │ (wrapped, short TTL) │ │ │
│ │───────────────────────► │ │
│ │ │ │ │
│ │ │ Embed in task payload │ │
│ │ │ (encrypted) │ │
│ │ │───────────────────────► │
│ │ │ │ │
│ │ │ │ Decrypt │
│ │ │ │ Use for task │
│ │ │ │ Discard │
│ │
│ Rules: │
│ - Secrets NEVER stored in Stella database │
│ - Only Vault references stored │
│ - Secrets fetched at execution time only │
│ - Secrets not logged (masked in logs) │
│ - Secrets not persisted in agent memory beyond task scope │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Threat Model
| Threat | Attack Vector | Mitigation |
|--------|---------------|------------|
| **Credential theft** | Database breach | Secrets never in DB; only vault refs |
| **Token replay** | Stolen JWT | Short-lived tokens (15 min); refresh tokens rotated |
| **Agent impersonation** | Fake agent | mTLS with CA-signed certs; registration token one-time |
| **Digest tampering** | Modified image | Digest verification at pull time; mismatch = failure |
| **Evidence tampering** | Modified audit records | Append-only table; cryptographic signing |
| **Privilege escalation** | Compromised account | Role-based access; SoD enforcement; audit logs |
| **Supply chain attack** | Malicious plugin | Plugin sandbox; capability declarations; review process |
| **Lateral movement** | Compromised target | Short-lived task credentials; scoped permissions |
| **Data exfiltration** | Log/artifact theft | Encryption at rest; network segmentation |
| **Denial of service** | Resource exhaustion | Rate limiting; resource quotas; circuit breakers |
## Audit Trail
### Audit Event Structure
```typescript
interface AuditEvent {
id: UUID;
timestamp: DateTime;
tenantId: UUID;
// Actor
actorType: "user" | "agent" | "system" | "plugin";
actorId: UUID;
actorName: string;
actorIp?: string;
// Action
action: string; // "promotion.approved", "deployment.started"
resource: string; // "promotion"
resourceId: UUID;
// Context
environmentId?: UUID;
releaseId?: UUID;
promotionId?: UUID;
// Details
before?: object; // State before (for updates)
after?: object; // State after
metadata?: object; // Additional context
// Integrity
previousEventHash: string; // Hash chain for tamper detection
eventHash: string;
}
```
### Audited Operations
| Category | Operations |
|----------|------------|
| **Authentication** | Login, logout, token refresh, failed attempts |
| **Authorization** | Permission denied events |
| **Environments** | Create, update, delete, freeze window changes |
| **Releases** | Create, deprecate, archive |
| **Promotions** | Request, approve, reject, cancel |
| **Deployments** | Start, complete, fail, rollback |
| **Targets** | Register, update, delete, health changes |
| **Agents** | Register, heartbeat gaps, capability changes |
| **Integrations** | Create, update, delete, test |
| **Plugins** | Enable, disable, config changes |
| **Evidence** | Create (never update/delete) |
## References
- [Authentication & Authorization](auth.md)
- [Agent Security](agent-security.md)
- [Threat Model](threat-model.md)
- [Audit Trail](audit-trail.md)

# Threat Model
## Overview
This document identifies threats to the Release Orchestrator and their mitigations.
## Threat Categories
### T1: Credential Theft
| Aspect | Description |
|--------|-------------|
| **Threat** | Attacker gains access to credentials through database breach |
| **Attack Vector** | SQL injection, database backup theft, insider threat |
| **Assets at Risk** | Registry credentials, vault tokens, SSH keys |
| **Mitigation** | Secrets NEVER stored in database; only vault references stored |
| **Detection** | Anomalous vault access patterns, failed authentication attempts |
### T2: Token Replay
| Aspect | Description |
|--------|-------------|
| **Threat** | Attacker captures and reuses valid JWT tokens |
| **Attack Vector** | Man-in-the-middle, log file exposure, memory dump |
| **Assets at Risk** | User sessions, API access |
| **Mitigation** | Short-lived tokens (15 min), refresh token rotation, TLS everywhere |
| **Detection** | Token used from unusual IP, concurrent sessions |
### T3: Agent Impersonation
| Aspect | Description |
|--------|-------------|
| **Threat** | Attacker registers fake agent to receive deployment tasks |
| **Attack Vector** | Stolen registration token, certificate forgery |
| **Assets at Risk** | Deployment credentials, target access |
| **Mitigation** | One-time registration tokens, mTLS with CA-signed certs |
| **Detection** | Registration from unexpected network, capability mismatch |
### T4: Digest Tampering
| Aspect | Description |
|--------|-------------|
| **Threat** | Attacker modifies container image after release creation |
| **Attack Vector** | Registry compromise, man-in-the-middle at pull time |
| **Assets at Risk** | Application integrity, supply chain |
| **Mitigation** | Digest verification at pull time; mismatch = deployment failure |
| **Detection** | Pull failures due to digest mismatch |
### T5: Evidence Tampering
| Aspect | Description |
|--------|-------------|
| **Threat** | Attacker modifies audit records to hide malicious activity |
| **Attack Vector** | Database admin access, SQL injection |
| **Assets at Risk** | Audit integrity, compliance |
| **Mitigation** | Append-only table, cryptographic signing, no UPDATE/DELETE |
| **Detection** | Signature verification failure, hash chain break |
### T6: Privilege Escalation
| Aspect | Description |
|--------|-------------|
| **Threat** | User gains permissions beyond their role |
| **Attack Vector** | Role assignment exploit, permission bypass |
| **Assets at Risk** | Environment access, approval authority |
| **Mitigation** | Role-based access, SoD enforcement, audit logs |
| **Detection** | Unusual permission patterns, SoD violation attempts |
### T7: Supply Chain Attack
| Aspect | Description |
|--------|-------------|
| **Threat** | Malicious plugin injected into workflow |
| **Attack Vector** | Plugin repository compromise, typosquatting |
| **Assets at Risk** | All environments, all credentials |
| **Mitigation** | Plugin sandbox, capability declarations, signed manifests |
| **Detection** | Unexpected network egress, resource anomalies |
### T8: Lateral Movement
| Aspect | Description |
|--------|-------------|
| **Threat** | Attacker uses compromised target to access others |
| **Attack Vector** | Target compromise, credential reuse |
| **Assets at Risk** | Other targets, environments |
| **Mitigation** | Short-lived task credentials, scoped permissions |
| **Detection** | Cross-target credential use, unexpected connections |
### T9: Data Exfiltration
| Aspect | Description |
|--------|-------------|
| **Threat** | Attacker extracts logs, artifacts, or configuration |
| **Attack Vector** | API abuse, log aggregator compromise |
| **Assets at Risk** | Application data, deployment configurations |
| **Mitigation** | Encryption at rest, network segmentation, audit logging |
| **Detection** | Large data transfers, unusual API patterns |
### T10: Denial of Service
| Aspect | Description |
|--------|-------------|
| **Threat** | Attacker exhausts resources to prevent deployments |
| **Attack Vector** | API flooding, workflow loop, agent task spam |
| **Assets at Risk** | Service availability |
| **Mitigation** | Rate limiting, resource quotas, circuit breakers |
| **Detection** | Resource exhaustion alerts, traffic spikes |
## STRIDE Analysis
| Category | Threats | Primary Mitigations |
|----------|---------|---------------------|
| **Spoofing** | T3 Agent Impersonation | mTLS, registration tokens |
| **Tampering** | T4 Digest, T5 Evidence | Digest verification, append-only tables |
| **Repudiation** | Evidence manipulation | Signed evidence packets |
| **Information Disclosure** | T1 Credentials, T9 Exfiltration | Vault integration, encryption |
| **Denial of Service** | T10 Resource exhaustion | Rate limits, quotas |
| **Elevation of Privilege** | T6 Escalation | RBAC, SoD enforcement |
## Trust Boundaries
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ TRUST BOUNDARIES │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ PUBLIC NETWORK (Untrusted) │ │
│ │ │ │
│ │ Internet, External Users, External Services │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ │ TLS + Authentication │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ DMZ (Semi-trusted) │ │
│ │ │ │
│ │ API Gateway, Webhook Gateway │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ │ Internal mTLS │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ INTERNAL NETWORK (Trusted) │ │
│ │ │ │
│ │ Stella Core Services, Database, Internal Vault │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ │ Agent mTLS │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ DEPLOYMENT NETWORK (Controlled) │ │
│ │ │ │
│ │ Agents, Targets │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Data Classification
| Classification | Examples | Protection Requirements |
|---------------|----------|------------------------|
| **Critical** | Vault credentials, signing keys | Hardware security, minimal access |
| **Sensitive** | User tokens, agent certificates | Encryption, access logging |
| **Internal** | Release configs, workflow definitions | Encryption at rest |
| **Public** | API documentation, release names | Integrity protection |
## Security Controls Summary
| Control | Implementation | Threats Addressed |
|---------|----------------|-------------------|
| mTLS | Agent communication | T3 |
| Short-lived tokens | 15-min access tokens | T2 |
| Vault integration | No secrets in DB | T1 |
| Digest verification | Pull-time validation | T4 |
| Append-only tables | Evidence immutability | T5 |
| RBAC + SoD | Permission enforcement | T6 |
| Plugin sandbox | Resource limits, capability control | T7 |
| Scoped credentials | Task-specific access | T8 |
| Encryption | At rest and in transit | T9 |
| Rate limiting | API and resource quotas | T10 |
## Incident Response
### Detection Signals
| Signal | Indicates | Response |
|--------|-----------|----------|
| Digest mismatch at pull | T4 Tampering | Halt deployment, investigate registry |
| Evidence signature failure | T5 Tampering | Preserve logs, forensic analysis |
| Unusual agent registration | T3 Impersonation | Revoke agent, review access |
| SoD violation attempt | T6 Escalation | Block action, alert admin |
| Plugin network egress | T7 Supply chain | Isolate plugin, review manifest |
### Response Procedures
1. **Contain** - Isolate affected component (revoke token, disable agent)
2. **Investigate** - Collect logs, evidence packets, audit trail
3. **Remediate** - Patch vulnerability, rotate credentials
4. **Recover** - Restore service, verify integrity
5. **Report** - Document incident, update threat model
## References
- [Security Overview](overview.md)
- [Agent Security](agent-security.md)
- [Audit Trail](audit-trail.md)

# Test Structure & Guidelines
> Test organization, categorization, and patterns for Release Orchestrator modules.
---
## Test Directory Layout
Release Orchestrator tests follow the Stella Ops standard test structure:
```
src/ReleaseOrchestrator/
├── __Libraries/
│ ├── StellaOps.ReleaseOrchestrator.Core/
│ ├── StellaOps.ReleaseOrchestrator.Workflow/
│ ├── StellaOps.ReleaseOrchestrator.Promotion/
│ └── StellaOps.ReleaseOrchestrator.Deploy/
├── __Tests/
│ ├── StellaOps.ReleaseOrchestrator.Core.Tests/ # Unit tests for Core
│ ├── StellaOps.ReleaseOrchestrator.Workflow.Tests/ # Unit tests for Workflow
│ ├── StellaOps.ReleaseOrchestrator.Promotion.Tests/ # Unit tests for Promotion
│ ├── StellaOps.ReleaseOrchestrator.Deploy.Tests/ # Unit tests for Deploy
│ ├── StellaOps.ReleaseOrchestrator.Integration.Tests/ # Integration tests
│ └── StellaOps.ReleaseOrchestrator.Acceptance.Tests/ # End-to-end tests
└── StellaOps.ReleaseOrchestrator.WebService/
```
**Shared test infrastructure**:
```
src/__Tests/__Libraries/
├── StellaOps.Infrastructure.Postgres.Testing/ # PostgreSQL Testcontainers fixtures
└── StellaOps.Testing.Common/ # Common test utilities
```
---
## Test Categories
Tests **MUST** be categorized using xUnit traits to enable selective execution:
### Unit Tests
```csharp
[Trait("Category", "Unit")]
public class PromotionValidatorTests
{
[Fact]
public void Validate_MissingReleaseId_ReturnsFalse()
{
// Arrange
var validator = new PromotionValidator();
var promotion = new Promotion { ReleaseId = Guid.Empty };
// Act
var result = validator.Validate(promotion);
// Assert
Assert.False(result.IsValid);
Assert.Contains("ReleaseId is required", result.Errors);
}
}
```
**Characteristics**:
- No database, network, or file system access
- Fast execution (< 100ms per test)
- Isolated from external dependencies
- Deterministic and repeatable
### Integration Tests
```csharp
[Trait("Category", "Integration")]
public class PromotionRepositoryTests : IClassFixture<PostgresFixture>
{
private readonly PostgresFixture _fixture;
public PromotionRepositoryTests(PostgresFixture fixture)
{
_fixture = fixture;
}
[Fact]
public async Task SaveAsync_ValidPromotion_PersistsToDatabase()
{
// Arrange
await using var connection = _fixture.CreateConnection();
var repository = new PromotionRepository(connection, _fixture.TimeProvider);
var promotion = new Promotion
{
Id = Guid.NewGuid(),
TenantId = _fixture.DefaultTenantId,
ReleaseId = Guid.NewGuid(),
TargetEnvironmentId = Guid.NewGuid(),
Status = PromotionState.PendingApproval,
RequestedAt = _fixture.TimeProvider.GetUtcNow(),
RequestedBy = Guid.NewGuid()
};
// Act
await repository.SaveAsync(promotion, CancellationToken.None);
// Assert
var retrieved = await repository.GetByIdAsync(promotion.Id, CancellationToken.None);
Assert.NotNull(retrieved);
Assert.Equal(promotion.ReleaseId, retrieved.ReleaseId);
}
}
```
**Characteristics**:
- Uses Testcontainers for PostgreSQL
- Requires Docker to be running
- Slower execution (hundreds of ms per test)
- Tests data access layer and database constraints
### Acceptance Tests
```csharp
[Trait("Category", "Acceptance")]
public class PromotionWorkflowTests : IClassFixture<WebApplicationFactory<Program>>
{
private readonly WebApplicationFactory<Program> _factory;
private readonly HttpClient _client;
public PromotionWorkflowTests(WebApplicationFactory<Program> factory)
{
_factory = factory;
_client = factory.CreateClient();
}
[Fact]
public async Task PromotionWorkflow_EndToEnd_SuccessfullyDeploysRelease()
{
// Arrange: Create environment, release, and promotion
var envId = await CreateEnvironmentAsync("Production");
var releaseId = await CreateReleaseAsync("v2.3.1");
// Act: Request promotion
var promotionResponse = await _client.PostAsJsonAsync(
"/api/v1/promotions",
new { releaseId, targetEnvironmentId = envId });
promotionResponse.EnsureSuccessStatusCode();
var promotion = await promotionResponse.Content.ReadFromJsonAsync<PromotionDto>();
// Act: Approve promotion
var approveResponse = await _client.PostAsync(
$"/api/v1/promotions/{promotion.Id}/approve", null);
approveResponse.EnsureSuccessStatusCode();
// Assert: Verify deployment completed
var status = await GetPromotionStatusAsync(promotion.Id);
Assert.Equal("deployed", status.Status);
}
}
```
**Characteristics**:
- Tests full API surface and workflows
- Uses `WebApplicationFactory` for in-memory hosting
- Tests end-to-end scenarios
- May involve multiple services
---
## PostgreSQL Test Fixtures
### Testcontainers Fixture
```csharp
public class PostgresFixture : IAsyncLifetime
{
private PostgreSqlContainer? _container;
private NpgsqlConnection? _connection;
public TimeProvider TimeProvider { get; private set; } = null!;
public IGuidGenerator GuidGenerator { get; private set; } = null!;
public Guid DefaultTenantId { get; private set; }
public async Task InitializeAsync()
{
// Start PostgreSQL container
_container = new PostgreSqlBuilder()
.WithImage("postgres:16")
.WithDatabase("stellaops_test")
.WithUsername("postgres")
.WithPassword("postgres")
.Build();
await _container.StartAsync();
// Create connection
_connection = new NpgsqlConnection(_container.GetConnectionString());
await _connection.OpenAsync();
// Run migrations
await ApplyMigrationsAsync();
// Setup test infrastructure
TimeProvider = new ManualTimeProvider();
GuidGenerator = new SequentialGuidGenerator();
DefaultTenantId = Guid.Parse("00000000-0000-0000-0000-000000000001");
// Seed test data
await SeedTestDataAsync();
}
public NpgsqlConnection CreateConnection()
{
if (_container == null)
throw new InvalidOperationException("Container not initialized");
return new NpgsqlConnection(_container.GetConnectionString());
}
private async Task ApplyMigrationsAsync()
{
// Apply schema migrations
await ExecuteSqlFileAsync("schema/release-orchestrator-schema.sql");
}
private async Task SeedTestDataAsync()
{
// Create default tenant
await using var cmd = _connection!.CreateCommand();
cmd.CommandText = @"
INSERT INTO tenants (id, name, created_at)
VALUES (@id, @name, @created_at)
ON CONFLICT DO NOTHING";
cmd.Parameters.AddWithValue("id", DefaultTenantId);
cmd.Parameters.AddWithValue("name", "Test Tenant");
cmd.Parameters.AddWithValue("created_at", TimeProvider.GetUtcNow());
await cmd.ExecuteNonQueryAsync();
}
public async Task DisposeAsync()
{
if (_connection != null)
{
await _connection.DisposeAsync();
}
if (_container != null)
{
await _container.DisposeAsync();
}
}
}
```
---
## Test Patterns
### Deterministic Time in Tests
```csharp
public class PromotionTimingTests
{
[Fact]
public void CreatePromotion_SetsCorrectTimestamp()
{
// Arrange
var manualTime = new ManualTimeProvider();
manualTime.SetUtcNow(new DateTimeOffset(2026, 1, 10, 14, 30, 0, TimeSpan.Zero));
var guidGen = new SequentialGuidGenerator();
var manager = new PromotionManager(manualTime, guidGen);
// Act
var promotion = manager.CreatePromotion(
releaseId: Guid.Parse("00000000-0000-0000-0000-000000000001"),
targetEnvId: Guid.Parse("00000000-0000-0000-0000-000000000002")
);
// Assert
Assert.Equal(
new DateTimeOffset(2026, 1, 10, 14, 30, 0, TimeSpan.Zero),
promotion.RequestedAt
);
}
}
```
### Testing CancellationToken Propagation
```csharp
public class PromotionCancellationTests
{
[Fact]
public async Task ApprovePromotionAsync_CancellationRequested_ThrowsOperationCanceledException()
{
// Arrange
var cts = new CancellationTokenSource();
var repository = new Mock<IPromotionRepository>();
repository
.Setup(r => r.GetByIdAsync(It.IsAny<Guid>(), It.IsAny<CancellationToken>()))
.Returns(async (Guid id, CancellationToken ct) =>
{
await Task.Delay(100, ct); // Simulate delay
return new Promotion { Id = id };
});
var manager = new PromotionManager(repository.Object, TimeProvider.System, new SystemGuidGenerator());
// Act & Assert
cts.Cancel(); // Cancel before operation completes
await Assert.ThrowsAsync<OperationCanceledException>(async () =>
await manager.ApprovePromotionAsync(Guid.NewGuid(), Guid.NewGuid(), cts.Token)
);
}
}
```
### Testing Immutability
```csharp
public class ReleaseImmutabilityTests
{
[Fact]
public void GetComponents_ReturnsImmutableCollection()
{
// Arrange
var release = new Release
{
Components = new Dictionary<string, ComponentDigest>
{
["api"] = new ComponentDigest("registry.io/api", "sha256:abc123", "v1.0.0")
}.ToImmutableDictionary()
};
// Act
var components = release.Components;
// Assert: Attempting to modify throws
Assert.Throws<NotSupportedException>(() =>
{
var mutable = (IDictionary<string, ComponentDigest>)components;
mutable["web"] = new ComponentDigest("registry.io/web", "sha256:def456", "v1.0.0");
});
}
}
```
### Testing Evidence Hash Determinism
```csharp
public class EvidenceHashDeterminismTests
{
[Fact]
public void ComputeEvidenceHash_SameInputs_ProducesSameHash()
{
// Arrange
var decisionRecord = new DecisionRecord
{
PromotionId = Guid.Parse("00000000-0000-0000-0000-000000000001"),
DecidedAt = new DateTimeOffset(2026, 1, 10, 12, 0, 0, TimeSpan.Zero),
Outcome = "approved",
GateResults = ImmutableArray.Create(
new GateResult("security", "pass", null)
)
};
// Act: Compute hash multiple times
var hash1 = EvidenceHasher.ComputeHash(decisionRecord);
var hash2 = EvidenceHasher.ComputeHash(decisionRecord);
// Assert: Hashes are identical
Assert.Equal(hash1, hash2);
}
}
```
---
## Running Tests
### Run All Tests
```bash
dotnet test src/StellaOps.sln
```
### Run Only Unit Tests
```bash
dotnet test src/StellaOps.sln --filter "Category=Unit"
```
### Run Only Integration Tests
```bash
dotnet test src/StellaOps.sln --filter "Category=Integration"
```
### Run Specific Test Class
```bash
dotnet test --filter "FullyQualifiedName~PromotionValidatorTests"
```
### Run with Coverage
```bash
dotnet test src/StellaOps.sln --collect:"XPlat Code Coverage"
```
---
## Test Data Builders
Use builder pattern for complex test data:
```csharp
public class PromotionBuilder
{
private Guid _id = Guid.NewGuid();
private Guid _tenantId = Guid.NewGuid();
private Guid _releaseId = Guid.NewGuid();
private Guid _targetEnvId = Guid.NewGuid();
private PromotionState _status = PromotionState.PendingApproval;
private DateTimeOffset _requestedAt = DateTimeOffset.UtcNow;
public PromotionBuilder WithId(Guid id)
{
_id = id;
return this;
}
public PromotionBuilder WithStatus(PromotionState status)
{
_status = status;
return this;
}
public PromotionBuilder WithReleaseId(Guid releaseId)
{
_releaseId = releaseId;
return this;
}
public Promotion Build()
{
return new Promotion
{
Id = _id,
TenantId = _tenantId,
ReleaseId = _releaseId,
TargetEnvironmentId = _targetEnvId,
Status = _status,
RequestedAt = _requestedAt,
RequestedBy = Guid.NewGuid()
};
}
}
// Usage in tests
[Fact]
public void ApprovePromotion_PendingStatus_TransitionsToApproved()
{
var promotion = new PromotionBuilder()
.WithStatus(PromotionState.PendingApproval)
.Build();
// ... test logic
}
```
---
## Code Coverage Requirements
- **Unit tests**: Aim for 80%+ coverage of business logic
- **Integration tests**: Cover all data access paths and constraints
- **Acceptance tests**: Cover critical user journeys
**Exclusions from coverage**:
- Program.cs / Startup.cs configuration code
- DTOs and simple data classes
- Generated code
---
## Summary Checklist
Before merging:
- [ ] All tests categorized with `[Trait("Category", "...")]`
- [ ] Unit tests use `TimeProvider` and `IGuidGenerator` for determinism
- [ ] Integration tests use `PostgresFixture` with Testcontainers
- [ ] `CancellationToken` propagation tested where applicable
- [ ] Evidence hash determinism verified
- [ ] No test reimplements production logic
- [ ] All tests pass locally and in CI
- [ ] Code coverage meets requirements
---
## References
- [Implementation Guide](./implementation-guide.md) – .NET implementation patterns
- [CLAUDE.md](../../../CLAUDE.md) – Stella Ops coding rules
- [PostgreSQL Testing Guide](../../infrastructure/Postgres.Testing/README.md) – Testcontainers setup
- [src/__Tests/AGENTS.md](../../../src/__Tests/AGENTS.md) – Global test infrastructure

# Dashboard Specification
> Main dashboard layout and metrics specification for the Release Orchestrator UI.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 12.1](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [WebSocket APIs](../api/websockets.md), [Metrics](../operations/metrics.md)
**Sprint:** [111_001 Dashboard Overview](../../../../implplan/SPRINT_20260110_111_001_FE_dashboard_overview.md)
## Overview
The dashboard provides a real-time overview of security posture, release operations, estate health, and compliance status.
---
## Dashboard Layout
```
+-----------------------------------------------------------------------------+
| STELLA OPS SUITE |
| +-----+ [User Menu v] |
| |Logo | Dashboard Releases Environments Workflows Integrations |
+-----------------------------------------------------------------------------+
| |
| +-------------------------------+ +-----------------------------------+ |
| | SECURITY POSTURE | | RELEASE OPERATIONS | |
| | | | | |
| | +---------+ +---------+ | | +---------+ +---------+ | |
| | |Critical | | High | | | |In Flight| |Completed| | |
| | | 0 * | | 3 * | | | | 2 | | 47 | | |
| | |reachable| |reachable| | | |deploys | | today | | |
| | +---------+ +---------+ | | +---------+ +---------+ | |
| | | | | |
| | Blocked: 2 releases | | Pending Approval: 3 | |
| | Risk Drift: 1 env | | Failed (24h): 1 | |
| | | | | |
| +-------------------------------+ +-----------------------------------+ |
| |
| +-------------------------------+ +-----------------------------------+ |
| | ESTATE HEALTH | | COMPLIANCE/AUDIT | |
| | | | | |
| | Agents: 12 online, 1 offline| | Evidence Complete: 98% | |
| | Targets: 45/47 healthy | | Policy Changes: 2 (this week) | |
| | Drift Detected: 2 targets | | Audit Exports: 5 (this month) | |
| | | | | |
| +-------------------------------+ +-----------------------------------+ |
| |
| +-----------------------------------------------------------------------+ |
| | RECENT ACTIVITY | |
| | | |
| | * 14:32 myapp-v2.3.1 deployed to prod (jane@example.com) | |
| | o 14:28 myapp-v2.3.1 promoted to stage (auto) | |
| | * 14:15 api-v1.2.0 blocked: critical vuln CVE-2024-1234 | |
| | o 13:45 worker-v3.0.0 release created (john@example.com) | |
| | * 13:30 Target prod-web-03 health: degraded | |
| | | |
| +-----------------------------------------------------------------------+ |
| |
+-----------------------------------------------------------------------------+
```
---
## Dashboard Metrics
### TypeScript Interfaces
```typescript
interface DashboardMetrics {
// Security Posture
security: {
criticalReachable: number;
highReachable: number;
blockedReleases: number;
riskDriftEnvironments: number;
digestsAnalyzedToday: number;
digestQuota: number;
};
// Release Operations
operations: {
deploymentsInFlight: number;
deploymentsCompletedToday: number;
deploymentsFailed24h: number;
pendingApprovals: number;
averageDeployTime: number; // seconds
};
// Estate Health
estate: {
agentsOnline: number;
agentsOffline: number;
agentsDegraded: number;
targetsHealthy: number;
targetsUnhealthy: number;
targetsDrift: number;
};
// Compliance/Audit
compliance: {
evidenceCompleteness: number; // percentage
policyChangesThisWeek: number;
auditExportsThisMonth: number;
    lastExportDate: string; // ISO 8601 timestamp (TypeScript has no DateTime type)
};
}
```
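The spec does not prescribe how panels map metrics to visual state. As one possible mapping for the Security Posture panel, a minimal sketch — the `PanelStatus` type, `securityPanelStatus` name, and thresholds are assumptions, not part of the spec:

```typescript
type PanelStatus = "ok" | "warning" | "critical";

// Hypothetical mapping: any reachable critical vulnerability turns the panel
// red; reachable highs or blocked releases turn it amber; otherwise green.
export function securityPanelStatus(s: {
  criticalReachable: number;
  highReachable: number;
  blockedReleases: number;
}): PanelStatus {
  if (s.criticalReachable > 0) return "critical";
  if (s.highReachable > 0 || s.blockedReleases > 0) return "warning";
  return "ok";
}
```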
---
## Dashboard Panels
### 1. Security Posture Panel
Displays current security state across all releases:
| Metric | Description |
|--------|-------------|
| Critical Reachable | Critical vulnerabilities with confirmed reachability |
| High Reachable | High severity vulnerabilities with confirmed reachability |
| Blocked Releases | Releases blocked by security gates |
| Risk Drift | Environments with changed risk since deployment |
### 2. Release Operations Panel
Shows active deployment operations:
| Metric | Description |
|--------|-------------|
| In Flight | Deployments currently in progress |
| Completed Today | Successful deployments in last 24h |
| Pending Approval | Promotions awaiting approval |
| Failed (24h) | Failed deployments in last 24h |
### 3. Estate Health Panel
Displays agent and target health:
| Metric | Description |
|--------|-------------|
| Agents Online | Number of agents reporting healthy |
| Agents Offline | Agents that missed heartbeats |
| Targets Healthy | Targets passing health checks |
| Drift Detected | Targets with version drift |
### 4. Compliance/Audit Panel
Shows audit and compliance status:
| Metric | Description |
|--------|-------------|
| Evidence Complete | % of deployments with full evidence |
| Policy Changes | Policy modifications this week |
| Audit Exports | Evidence exports this month |
---
## Real-Time Updates
### WebSocket Integration
```typescript
interface DashboardStreamMessage {
type: "metric_update" | "activity" | "alert";
  timestamp: string; // ISO 8601 timestamp (TypeScript has no DateTime type)
payload: MetricUpdate | ActivityEvent | Alert;
}
// Subscribe to the dashboard stream (browsers require an absolute ws:// or wss:// URL)
const ws = new WebSocket(`wss://${location.host}/api/v1/dashboard/stream`);
ws.onmessage = (event) => {
const message: DashboardStreamMessage = JSON.parse(event.data);
switch (message.type) {
case "metric_update":
updateMetrics(message.payload);
break;
case "activity":
addActivityItem(message.payload);
break;
case "alert":
showAlert(message.payload);
break;
}
};
```
---
## Performance Targets
| Metric | Target |
|--------|--------|
| Initial Load | < 2 seconds |
| Metric Refresh | Every 30 seconds |
| WebSocket Reconnect | Exponential backoff (1s, 2s, 4s, ... 30s max) |
| Activity History | Last 50 events |
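The reconnect target in the table above can be sketched as a small delay schedule: exponential backoff starting at 1 s, doubling per attempt, capped at 30 s. The `reconnectDelayMs` name is illustrative:

```typescript
// Delay before reconnect attempt N (0-based): 1s, 2s, 4s, ... capped at 30s.
export function reconnectDelayMs(attempt: number): number {
  const base = 1_000;  // first retry after 1 s
  const cap = 30_000;  // never wait longer than 30 s
  return Math.min(base * 2 ** attempt, cap);
}

// A client would schedule reconnects with it, resetting `attempt` to 0 on a
// successful open:
//   ws.onclose = () => setTimeout(reconnect, reconnectDelayMs(attempt++));
```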
---
## See Also
- [WebSocket APIs](../api/websockets.md)
- [Metrics](../operations/metrics.md)
- [Workflow Editor](workflow-editor.md)
- [Key Screens](screens.md)

# UI Overview
## Status
**Planned** - UI implementation has not started.
## Design Principles
| Principle | Implementation |
|-----------|----------------|
| **Clarity** | Clear status indicators, intuitive navigation |
| **Real-time** | Live updates via WebSocket for deployments |
| **Actionable** | One-click approvals, quick actions |
| **Audit-friendly** | Full history visibility, evidence access |
| **Mobile-aware** | Responsive design for on-call scenarios |
## Main Screens
### Dashboard
The main dashboard provides an at-a-glance view of deployment health across environments.
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ RELEASE ORCHESTRATOR [User] [Settings] │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ ENVIRONMENT PIPELINE │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ DEV │───►│ STAGING │───►│ UAT │───►│ PROD │ │ │
│ │ │ v1.5.0 │ │ v1.4.2 │ │ v1.4.1 │ │ v1.4.0 │ │ │
│ │ │ 3/3 OK │ │ 2/2 OK │ │ 2/2 OK │ │ 5/5 OK │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────┐ ┌──────────────────────────────┐ │
│ │ PENDING APPROVALS (3) │ │ RECENT DEPLOYMENTS │ │
│ │ │ │ │ │
│ │ ● myapp → prod [Approve] │ │ ✓ api v1.5.0 → dev 2m │ │
│ │ Requested by: John │ │ ✓ web v1.4.2 → staging 15m │ │
│ │ 2 hours ago │ │ ✗ api v1.4.1 → uat 1h │ │
│ │ │ │ ✓ web v1.4.0 → prod 2h │ │
│ │ ● web → uat [Approve] │ │ │ │
│ │ Requested by: Jane │ │ [View All] │ │
│ │ 30 minutes ago │ │ │ │
│ │ │ │ │ │
│ └──────────────────────────────┘ └──────────────────────────────┘ │
│ │
│ ┌──────────────────────────────┐ ┌──────────────────────────────┐ │
│ │ AGENT STATUS │ │ ACTIVE WORKFLOWS │ │
│ │ │ │ │ │
│ │ ● 12 Online │ │ ● Deploy api v1.5.0 │ │
│ │ ○ 1 Offline │ │ Step: Health Check (3/5) │ │
│ │ ◐ 2 Degraded │ │ │ │
│ │ │ │ ● Promote web to UAT │ │
│ │ [View Details] │ │ Step: Awaiting Approval │ │
│ │ │ │ │ │
│ └──────────────────────────────┘ └──────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
### Releases View
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ RELEASES [+ Create Release] │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Filter: [All ▼] Status: [All ▼] Search: [________________] │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ NAME STATUS COMPONENTS ENVIRONMENTS CREATED │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ myapp-v1.5.0 Ready 3 dev 2h ago │ │
│ │ myapp-v1.4.2 Deployed 3 staging, uat 1d ago │ │
│ │ myapp-v1.4.1 Deployed 3 prod 3d ago │ │
│ │ myapp-v1.4.0 Deprecated 3 - 1w ago │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ RELEASE DETAIL: myapp-v1.5.0 [Promote ▼] │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ Components: │ │
│ │ ┌────────────────────────────────────────────────────────────┐ │ │
│ │ │ api sha256:abc123... registry.io/myorg/api │ │ │
│ │ │ web sha256:def456... registry.io/myorg/web │ │ │
│ │ │ worker sha256:ghi789... registry.io/myorg/worker │ │ │
│ │ └────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ Source: https://github.com/myorg/myapp @ v1.5.0 │ │
│ │ Created: 2h ago by john@example.com │ │
│ │ │ │
│ │ Promotion History: │ │
│ │ dev (✓) → staging (pending) → uat (-) → prod (-) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
### Promotion Detail
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ PROMOTION: myapp-v1.5.0 → production │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Status: PENDING APPROVAL [Approve] [Reject] │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ GATE EVALUATION │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ ✓ Security Gate Passed │ │
│ │ No critical vulnerabilities │ │
│ │ │ │
│ │ ✓ Freeze Window Check Passed │ │
│ │ No active freeze windows │ │
│ │ │ │
│ │ ◐ Approval Gate 1/2 Approvals │ │
│ │ Jane approved 30m ago │ │
│ │ Waiting for 1 more approval │ │
│ │ │ │
│ │ ○ Separation of Duties Pending │ │
│ │ Requester: John (cannot approve) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ PROMOTION TIMELINE │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ 10:00 John requested promotion │ │
│ │ 10:05 Security gate evaluated: PASSED │ │
│ │ 10:05 Freeze check: PASSED │ │
│ │ 10:30 Jane approved │ │
│ │ 11:00 Waiting for additional approval... │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
### Workflow Editor
Visual editor for creating and modifying workflow templates.
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ WORKFLOW EDITOR: standard-deploy [Save] [Run] │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌─────────────────────────────────────────────────┐ │
│ │ STEP PALETTE │ │ │ │
│ │ │ │ │ │
│ │ Control │ │ ┌──────────┐ │ │
│ │ ├─ Approval │ │ │ Approval │ │ │
│ │ ├─ Wait │ │ │ Gate │ │ │
│ │ └─ Condition │ │ └────┬─────┘ │ │
│ │ │ │ │ │ │
│ │ Gates │ │ ▼ │ │
│ │ ├─ Security │ │ ┌──────────┐ │ │
│ │ ├─ Freeze │ │ │ Security │ │ │
│ │ └─ Custom │ │ │ Gate │ │ │
│ │ │ │ └────┬─────┘ │ │
│ │ Deploy │ │ │ │ │
│ │ ├─ Docker │ │ ▼ │ │
│ │ ├─ Compose │ │ ┌──────────┐ │ │
│ │ └─ ECS │ │ │ Deploy │ │ │
│ │ │ │ │ Targets │ │ │
│ │ Verify │ │ └────┬─────┘ │ │
│ │ ├─ Health │ │ │ │ │
│ │ └─ Smoke Test │ │ ┌────┴────┐ │ │
│ │ │ │ │ │ │ │
│ │ Notify │ │ ▼ ▼ │ │
│ │ ├─ Slack │ │ ┌──────┐ ┌──────────┐ │ │
│ │ └─ Email │ │ │Health│ │ Rollback │◄──[on failure] │ │
│ │ │ │ │Check │ │ Handler │ │ │
│ │ │ │ └──┬───┘ └────┬─────┘ │ │
│ │ │ │ │ │ │ │
│ │ │ │ ▼ ▼ │ │
│ │ │ │ ┌──────┐ ┌──────────┐ │ │
│ │ │ │ │Notify│ │ Notify │ │ │
│ │ │ │ │Success│ │ Failure │ │ │
│ │ │ │ └──────┘ └──────────┘ │ │
│ │ │ │ │ │
│ └─────────────────┘ └─────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ STEP PROPERTIES: Deploy Targets │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ Type: deploy-compose │ │
│ │ Strategy: [Rolling ▼] │ │
│ │ Parallelism: [2] │ │
│ │ Timeout: [600] seconds │ │
│ │ On Failure: [Rollback ▼] │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
### Deployment Live View
Real-time view of an active deployment.
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ DEPLOYMENT: myapp-v1.5.0 → production [Abort]│
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Status: RUNNING Progress: ████████░░ 80% │
│ Strategy: Rolling (batch 4/5) Duration: 5m 23s │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ TARGET STATUS │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ ✓ prod-host-1 sha256:abc123 Deployed Health: OK │ │
│ │ ✓ prod-host-2 sha256:abc123 Deployed Health: OK │ │
│ │ ✓ prod-host-3 sha256:abc123 Deployed Health: OK │ │
│ │ ● prod-host-4 sha256:abc123 Deploying Health: Checking... │ │
│ │ ○ prod-host-5 - Pending Health: - │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ LIVE LOGS: prod-host-4 │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ 10:25:15 Pulling image sha256:abc123... │ │
│ │ 10:25:18 Image pulled successfully │ │
│ │ 10:25:19 Stopping existing container... │ │
│ │ 10:25:20 Starting new container... │ │
│ │ 10:25:21 Container started │ │
│ │ 10:25:22 Running health check... │ │
│ │ 10:25:25 Health check passed (1/3) │ │
│ │ 10:25:28 Health check passed (2/3) │ │
│ │ ... │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
### Environment Management
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ ENVIRONMENTS [+ Add Environment] │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ NAME ORDER TARGETS CURRENT RELEASE APPROVALS STATUS │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ development 1 3 myapp-v1.5.0 0 Active │ │
│ │ staging 2 2 myapp-v1.4.2 1 Active │ │
│ │ uat 3 2 myapp-v1.4.1 1 Active │ │
│ │ production 4 5 myapp-v1.4.0 2 + SoD Active │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ ENVIRONMENT DETAIL: production [Edit] │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ Approval Policy: │ │
│ │ - Required approvals: 2 │ │
│ │ - Separation of duties: Enabled │ │
│ │ - Approver roles: release-manager, tech-lead │ │
│ │ │ │
│ │ Freeze Windows: │ │
│ │ ┌────────────────────────────────────────────────────────────┐ │ │
│ │ │ Holiday Freeze Dec 20 - Jan 5 Active [Remove] │ │ │
│ │ │ Weekend Freeze Sat-Sun Active [Remove] │ │ │
│ │ └────────────────────────────────────────────────────────────┘ │ │
│ │ [+ Add Freeze Window] │ │
│ │ │ │
│ │ Targets: │ │
│ │ ┌────────────────────────────────────────────────────────────┐ │ │
│ │ │ prod-host-1 docker_host healthy sha256:abc... │ │ │
│ │ │ prod-host-2 docker_host healthy sha256:abc... │ │ │
│ │ │ prod-host-3 docker_host healthy sha256:abc... │ │ │
│ │ │ prod-host-4 docker_host healthy sha256:abc... │ │ │
│ │ │ prod-host-5 docker_host degraded sha256:abc... │ │ │
│ │ └────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
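The environment detail above lists two freeze windows ("Holiday Freeze Dec 20 - Jan 5" and "Weekend Freeze Sat-Sun"). A hedged sketch of how a gate could test whether a deployment timestamp falls inside such a window (function names are illustrative; note the holiday window wraps across the year boundary):

```python
from datetime import datetime

def in_weekend_freeze(ts: datetime) -> bool:
    # Sat-Sun freeze: Python's weekday() is Mon=0 .. Sun=6
    return ts.weekday() >= 5

def in_date_freeze(ts: datetime, start_md: tuple, end_md: tuple) -> bool:
    """Check a recurring (month, day) window; handles year-crossing ranges like Dec 20 - Jan 5."""
    md = (ts.month, ts.day)
    if start_md <= end_md:
        return start_md <= md <= end_md
    return md >= start_md or md <= end_md  # window wraps around the year end

# Dec 24 falls inside the Holiday Freeze (Dec 20 - Jan 5)
blocked = in_date_freeze(datetime(2025, 12, 24), (12, 20), (1, 5))
```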
## Key Interactions
### Approval Flow
1. User sees pending approval notification on dashboard
2. Clicks to view promotion detail
3. Reviews gate evaluation results and change details
4. Clicks "Approve" or "Reject" with optional comment
5. System validates SoD requirements
6. Promotion advances or notification sent
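Step 5's separation-of-duties validation can be sketched as a simple rule: the promotion requester may not approve their own promotion, and repeat approvals from the same user count once (a minimal illustration, not the shipped policy engine):

```python
def validate_approval(requested_by: str, approvals: list, required: int) -> bool:
    """Separation of duties: requester cannot approve; only distinct approvers count."""
    distinct = {a for a in approvals if a != requested_by}
    return len(distinct) >= required

ok = validate_approval("jane@example.com",
                       ["john@example.com", "sarah@example.com"], required=2)   # passes
bad = validate_approval("jane@example.com",
                        ["jane@example.com", "john@example.com"], required=2)   # blocked: self-approval
```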
### Quick Promote
1. From release detail, user clicks "Promote"
2. Selects target environment from dropdown
3. Confirms promotion request
4. System evaluates gates immediately
5. If auto-approved, deployment begins
6. If approval required, notification sent to approvers
### Emergency Rollback
1. From deployment history or alert, user clicks "Rollback"
2. System shows previous healthy version
3. User confirms rollback
4. System creates rollback deployment job
5. Real-time progress shown
## Mobile Considerations
- Responsive design for smaller screens
- Critical actions (approve/reject) accessible on mobile
- Push notifications for pending approvals
- Simplified views for monitoring on-the-go
## References
- [API Overview](../api/overview.md)
- [Workflow Templates](../workflow/templates.md)

# Key UI Screens
> Specification for key UI screens: Environment Overview, Release Detail, and Why Blocked modal.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 12.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Environment Manager](../modules/environment-manager.md), [Release Manager](../modules/release-manager.md)
**Sprints:** [111_002 - 111_007](../../../../implplan/)
## Overview
This document specifies the key UI screens for release orchestration.
---
## Environment Overview Screen
The environment overview shows the deployment pipeline and current state of each environment.
```
+-----------------------------------------------------------------------------+
| ENVIRONMENTS [+ New Environment] |
+-----------------------------------------------------------------------------+
| |
| +------------------------------------------------------------------------+ |
| | ENVIRONMENT PIPELINE | |
| | | |
| | +---------+ +---------+ +---------+ +---------+ | |
| | | DEV | ---> | TEST | ---> | STAGE | ---> | PROD | | |
| | | | | | | | | | | |
| | | v2.4.0 | | v2.3.1 | | v2.3.1 | | v2.3.0 | | |
| | | * 5 min | | * 2h | | * 1d | | * 3d | | |
| | +---------+ +---------+ +---------+ +---------+ | |
| | | |
| +------------------------------------------------------------------------+ |
| |
| +------------------------------------------------------------------------+ |
| | PRODUCTION [Manage] [View] | |
| | | |
| | Current Release: myapp-v2.3.0 | |
| | Deployed: 3 days ago by jane@example.com | |
| | Targets: 5 healthy, 0 unhealthy | |
| | | |
| | +---------------------------------------------------------------+ | |
| | | Pending Promotion: myapp-v2.3.1 [Review] | | |
| | | Waiting: 2 approvals (1/2) | | |
| | | Security: V All gates pass | | |
| | +---------------------------------------------------------------+ | |
| | | |
| | Freeze Windows: None active | |
| | Required Approvals: 2 | |
| | | |
| +------------------------------------------------------------------------+ |
| |
+-----------------------------------------------------------------------------+
```
### Features
- **Environment Pipeline:** Visual flow showing version progression
- **Environment Cards:** Detailed view of each environment
- **Target Health:** Real-time target health indicators
- **Pending Promotions:** Promotions awaiting action
- **Freeze Windows:** Active and scheduled freeze windows
- **Approval Status:** Current approval count vs required
---
## Release Detail Screen
The release detail screen shows all information about a specific release.
```
+-----------------------------------------------------------------------------+
| RELEASE: myapp-v2.3.1 |
| Created: 2 hours ago by jane@example.com |
+-----------------------------------------------------------------------------+
| |
| [Overview] [Components] [Security] [Deployments] [Evidence] |
| |
| +------------------------------------------------------------------------+ |
| | COMPONENTS | |
| | | |
| | +------------------------------------------------------------------+ | |
| | | api | | |
| | | Version: 2.3.1 Digest: sha256:abc123... | | |
| | | Security: V 0 critical, 0 high (0 reachable) | | |
| | | Image: registry.example.com/myapp/api@sha256:abc123 | | |
| | +------------------------------------------------------------------+ | |
| | | |
| | +------------------------------------------------------------------+ | |
| | | worker | | |
| | | Version: 2.3.1 Digest: sha256:def456... | | |
| | | Security: V 0 critical, 0 high (0 reachable) | | |
| | | Image: registry.example.com/myapp/worker@sha256:def456 | | |
| | +------------------------------------------------------------------+ | |
| | | |
| +------------------------------------------------------------------------+ |
| |
| +------------------------------------------------------------------------+ |
| | DEPLOYMENT STATUS | |
| | | |
| | dev *--------------------------------------------* Deployed (2h) | |
| | test *--------------------------------------------* Deployed (1h) | |
| | stage o--------------------------------------------* Deploying... | |
| | prod o Not deployed | |
| | | |
| +------------------------------------------------------------------------+ |
| |
| [Promote to Stage v] [Compare with Production] [Download Evidence] |
| |
+-----------------------------------------------------------------------------+
```
### Tabs
1. **Overview:** Release metadata and summary
2. **Components:** Component list with digests and versions
3. **Security:** Vulnerability summary and reachability analysis
4. **Deployments:** Deployment history across environments
5. **Evidence:** Evidence packets for compliance
### Features
- **Digest Display:** Full OCI digests for each component
- **Security Summary:** Vulnerability counts by severity
- **Deployment Timeline:** Visual progress across environments
- **Quick Actions:** Promote, compare, and export options
---
## "Why Blocked?" Modal
The "Why Blocked?" modal explains why a promotion cannot proceed.
```
+-----------------------------------------------------------------------------+
| WHY IS THIS PROMOTION BLOCKED? [Close] |
+-----------------------------------------------------------------------------+
| |
| Release: myapp-v2.4.0 -> Production |
| |
| +------------------------------------------------------------------------+ |
| | X SECURITY GATE FAILED | |
| | | |
| | Component 'api' has 1 critical reachable vulnerability: | |
| | | |
| | - CVE-2024-1234 (Critical, CVSS 9.8) | |
| | Package: log4j 2.14.0 | |
| | Reachability: V Confirmed reachable via api/logging/Logger.java | |
| | Fixed in: 2.17.1 | |
| | [View Details] [View Evidence] | |
| | | |
| | Remediation: Update log4j to version 2.17.1 or later | |
| | | |
| +------------------------------------------------------------------------+ |
| |
| +------------------------------------------------------------------------+ |
| | V APPROVAL GATE PASSED | |
| | | |
| | Required: 2 approvals | |
| | Received: 2 approvals | |
| | - john@example.com (2h ago): "LGTM" | |
| | - sarah@example.com (1h ago): "Approved for prod" | |
| | | |
| +------------------------------------------------------------------------+ |
| |
| +------------------------------------------------------------------------+ |
| | V FREEZE WINDOW GATE PASSED | |
| | | |
| | No active freeze windows for production | |
| | | |
| +------------------------------------------------------------------------+ |
| |
| Policy evaluated at: 2026-01-09T14:32:15Z |
| Policy hash: sha256:789xyz... |
| [View Full Decision Record] |
| |
+-----------------------------------------------------------------------------+
```
### Features
- **Gate-by-Gate Status:** Shows each gate with pass/fail status
- **Failure Details:** Specific information about why a gate failed
- **Vulnerability Details:** CVE info, package, version, and remediation
- **Reachability Evidence:** Links to reachability analysis
- **Approval History:** List of approvers and their comments
- **Override Mechanism:** Request override for authorized users
- **Decision Record:** Link to full evidence packet
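The gate-by-gate view above aggregates independent gate results: the promotion proceeds only when every gate passes, and each failing gate carries its own explanation. A minimal sketch of that aggregation (gate names and detail strings are illustrative):

```python
def evaluate_gates(gates):
    """gates: dict of gate name -> (passed, detail). Blocked if any gate fails."""
    failures = {name: detail for name, (passed, detail) in gates.items() if not passed}
    return (len(failures) == 0, failures)

ok, failures = evaluate_gates({
    "security": (False, "CVE-2024-1234 critical, reachable"),
    "approval": (True, "2/2 approvals"),
    "freeze_window": (True, "no active freeze"),
})
# ok is False; `failures` carries only the security gate's explanation
```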
---
## Navigation Structure
```
Dashboard
+-- Releases
| +-- [Release Detail]
| +-- Create Release
| +-- Compare Releases
|
+-- Environments
| +-- [Environment Overview]
| +-- Create Environment
| +-- Manage Targets
|
+-- Workflows
| +-- [Workflow Editor]
| +-- Workflow Runs
| +-- Step Types
|
+-- Integrations
| +-- Connectors
| +-- Plugins
| +-- Vault
|
+-- Settings
+-- Users & Teams
+-- Policies
+-- Audit Log
```
---
## See Also
- [Dashboard](dashboard.md)
- [Workflow Editor](workflow-editor.md)
- [Environment Manager](../modules/environment-manager.md)
- [Release Manager](../modules/release-manager.md)
- [Promotion Manager](../modules/promotion-manager.md)

# Workflow Editor Specification
> Visual workflow editor for creating and editing DAG-based workflow templates.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 12.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Workflow Engine](../modules/workflow-engine.md), [Workflow Templates](../workflow/templates.md)
**Sprint:** [111_004 Workflow Editor](../../../../implplan/SPRINT_20260110_111_004_FE_workflow_editor.md)
## Overview
The workflow editor provides a visual graph editor for creating and editing workflow templates. It supports drag-and-drop node placement, connection creation, real-time run visualization, and bidirectional YAML synchronization.
---
## Graph Editor Component
### Editor State
```typescript
interface WorkflowEditorState {
template: WorkflowTemplate;
selectedNode: string | null;
selectedEdge: string | null;
zoom: number;
pan: { x: number; y: number };
mode: "select" | "pan" | "connect";
clipboard: StepNode[] | null;
undoStack: WorkflowTemplate[];
redoStack: WorkflowTemplate[];
}
interface WorkflowEditorProps {
template: WorkflowTemplate;
stepTypes: StepType[];
readOnly: boolean;
onSave: (template: WorkflowTemplate) => void;
onValidate: (template: WorkflowTemplate) => ValidationResult;
}
```
### Node Renderer
```typescript
interface NodeRendererProps {
node: StepNode;
stepType: StepType;
status?: StepRunStatus; // For run visualization
selected: boolean;
onSelect: () => void;
onMove: (position: Position) => void;
onConnect: (sourceHandle: string) => void;
}
const NodeRenderer: React.FC<NodeRendererProps> = ({
node, stepType, status, selected
}) => {
const statusColor = getStatusColor(status);
return (
<div className={`workflow-node ${selected ? 'selected' : ''}`}
style={{ borderColor: statusColor }}>
{/* Node header */}
<div className="node-header" style={{ backgroundColor: stepType.color }}>
<Icon name={stepType.icon} />
<span className="node-name">{node.name}</span>
{status && <StatusBadge status={status} />}
</div>
{/* Node body */}
<div className="node-body">
<span className="node-type">{stepType.name}</span>
      {node.timeout && <span className="node-timeout">⏱ {node.timeout}s</span>}
</div>
{/* Connection handles */}
<Handle type="target" position="top" />
<Handle type="source" position="bottom" />
{/* Conditional indicator */}
{node.condition && (
<div className="condition-badge" title={node.condition}>
<Icon name="condition" />
</div>
)}
</div>
);
};
```
---
## Run Visualization Overlay
### Real-Time Execution Display
```typescript
interface RunVisualizationProps {
template: WorkflowTemplate;
workflowRun: WorkflowRun;
stepRuns: StepRun[];
onNodeClick: (nodeId: string) => void;
}
const RunVisualization: React.FC<RunVisualizationProps> = ({
template, workflowRun, stepRuns, onNodeClick
}) => {
// WebSocket for real-time updates
const { subscribe, unsubscribe } = useWorkflowStream(workflowRun.id);
useEffect(() => {
const handlers = {
'step_started': (data) => updateStepStatus(data.nodeId, 'running'),
'step_completed': (data) => updateStepStatus(data.nodeId, data.status),
'step_log': (data) => appendLog(data.nodeId, data.line),
};
subscribe(handlers);
return () => unsubscribe();
}, [workflowRun.id]);
return (
<div className="run-visualization">
{/* Workflow graph with status overlay */}
<WorkflowGraph
template={template}
nodeRenderer={(node) => (
<NodeRenderer
node={node}
stepType={getStepType(node.type)}
status={getStepRunStatus(node.id)}
selected={selectedNode === node.id}
onSelect={() => setSelectedNode(node.id)}
/>
)}
edgeRenderer={(edge) => (
<EdgeRenderer
edge={edge}
animated={isEdgeActive(edge)}
/>
)}
/>
{/* Log panel */}
{selectedNode && (
<LogPanel
stepRun={getStepRun(selectedNode)}
streaming={isStepRunning(selectedNode)}
/>
)}
{/* Progress bar */}
<ProgressBar
completed={completedSteps}
total={totalSteps}
status={workflowRun.status}
/>
</div>
);
};
```
### Status Indicators
| Status | Visual |
|--------|--------|
| Pending | Gray circle |
| Running | Blue spinner |
| Success | Green checkmark |
| Failed | Red X |
| Skipped | Yellow dash |
---
## Canvas Operations
### Drag and Drop
- Drag steps from palette to canvas
- Drop creates new node at position
- Connect nodes by dragging from source to target handle
- Multi-select with Shift+click or box selection
### Validation
The editor performs real-time validation:
- **DAG Cycle Detection:** Prevent circular dependencies
- **Orphan Node Detection:** Warn about unconnected nodes
- **Required Inputs:** Highlight missing required configuration
- **Type Compatibility:** Validate edge connections between compatible types
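The DAG cycle detection listed above can be implemented with a depth-first search that colors nodes as unvisited, on the current path, or fully explored; a back edge to an on-path node means a cycle. A minimal sketch (not the editor's actual validator):

```python
def has_cycle(nodes, edges):
    """Detect a cycle via DFS coloring; edges is a list of (from_node, to_node) pairs."""
    adjacency = {n: [] for n in nodes}
    for src, dst in edges:
        adjacency[src].append(dst)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    def visit(n):
        color[n] = GRAY                    # on the current DFS path
        for m in adjacency[n]:
            if color[m] == GRAY:           # back edge: cycle found
                return True
            if color[m] == WHITE and visit(m):
                return True
        color[n] = BLACK                   # fully explored
        return False

    return any(visit(n) for n in nodes if color[n] == WHITE)

acyclic = has_cycle(["a", "b", "c"], [("a", "b"), ("b", "c")])             # False
cyclic = has_cycle(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")])  # True
```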
### Zoom and Pan
| Action | Control |
|--------|---------|
| Zoom In | Ctrl + Mouse Wheel Up |
| Zoom Out | Ctrl + Mouse Wheel Down |
| Fit View | Ctrl + 0 |
| Pan | Middle Mouse Drag / Space + Drag |
| Reset | Ctrl + R |
---
## YAML Editor Mode
### Monaco Editor Integration
The editor supports a bidirectional YAML mode for power users:
```typescript
interface YAMLEditorProps {
template: WorkflowTemplate;
onChange: (template: WorkflowTemplate) => void;
onValidate: (yaml: string) => ValidationResult;
}
const YAMLEditor: React.FC<YAMLEditorProps> = ({ template, onChange, onValidate }) => {
const [yaml, setYaml] = useState(templateToYaml(template));
return (
<MonacoEditor
language="yaml"
value={yaml}
onChange={(value) => {
setYaml(value);
const result = onValidate(value);
if (result.valid) {
onChange(yamlToTemplate(value));
}
}}
options={{
minimap: { enabled: false },
lineNumbers: 'on',
scrollBeyondLastLine: false,
}}
/>
);
};
```
### Bidirectional Sync
Changes in either view (graph or YAML) are synchronized:
- Graph changes update YAML immediately
- Valid YAML changes update graph
- Invalid YAML shows error markers without updating graph
---
## Step Palette
### Available Step Types
The palette shows all available step types from core and plugins:
```typescript
interface StepPaletteProps {
stepTypes: StepType[];
onDragStart: (stepType: string) => void;
filter: string;
}
const categories = [
{ name: "Deployment", types: ["deploy", "rollback"] },
{ name: "Gates", types: ["security-gate", "approval", "freeze-window-gate"] },
{ name: "Utility", types: ["script", "wait", "notify"] },
{ name: "Plugins", types: [] }, // Dynamically loaded
];
```
---
## Keyboard Shortcuts
| Shortcut | Action |
|----------|--------|
| Ctrl + S | Save template |
| Ctrl + Z | Undo |
| Ctrl + Shift + Z | Redo |
| Delete | Delete selected |
| Ctrl + C | Copy selected |
| Ctrl + V | Paste |
| Ctrl + A | Select all |
| Escape | Deselect / Cancel |
---
## See Also
- [Workflow Templates](../workflow/templates.md)
- [Workflow APIs](../api/workflows.md)
- [Dashboard](dashboard.md)
- [Key Screens](screens.md)

# Workflow Execution
## Overview
The Workflow Engine executes workflow templates as DAGs (Directed Acyclic Graphs) of steps, managing state transitions, parallelism, retries, and failure handling.
## Execution Architecture
```
WORKFLOW EXECUTION ARCHITECTURE
┌─────────────────────────────────────────────────────────────────────────────┐
│ WORKFLOW ENGINE │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ WORKFLOW RUNNER │ │
│ │ │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │
│ │ │ Template │───►│ Execution │───►│ Context │ │ │
│ │ │ Parser │ │ Planner │ │ Builder │ │ │
│ │ └────────────┘ └────────────┘ └────────────┘ │ │
│ │ │ │ │ │ │
│ │ └────────────────┼─────────────────┘ │ │
│ │ ▼ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ DAG EXECUTOR │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ │
│ │ │ │ Ready │ │ Running │ │ Waiting │ │ Completed│ │ │ │
│ │ │ │ Queue │ │ Set │ │ Set │ │ Set │ │ │ │
│ │ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────────────────────────────────────────────────┐ │ │ │
│ │ │ │ STEP DISPATCHER │ │ │ │
│ │ │ └──────────────────────────────────────────────────────┘ │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ STEP EXECUTOR POOL │ │
│ │ │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │
│ │ │ Executor 1 │ │ Executor 2 │ │ Executor 3 │ │ Executor N │ │ │
│ │ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Workflow Run State Machine
```
WORKFLOW RUN STATES
┌──────────┐
│ CREATED │
└────┬─────┘
│ start()
┌──────────┐
│ RUNNING │◄──────────────────┐
└────┬─────┘ │
│ │
┌───────────────────┼───────────────────┐ │
│ │ │ │
▼ ▼ ▼ │
┌──────────┐ ┌──────────┐ ┌──────────┐│
│ WAITING │ │ PAUSED │ │ FAILING ││
│ APPROVAL │ │ │ │ ││
└────┬─────┘ └────┬─────┘ └────┬─────┘│
│ │ │ │
│ approve() │ resume() │ │
│ │ │ │
└───────────────►──┴──────────────────┘ │
│ │
└─────────────────────────┘
┌───────────────────────┼───────────────────┐
│ │ │
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│COMPLETED │ │ FAILED │ │ CANCELLED│
└──────────┘ └──────────┘ └──────────┘
```
### State Transitions
| Current State | Event | Next State | Description |
|---------------|-------|------------|-------------|
| `created` | `start()` | `running` | Begin workflow execution |
| `running` | Step requires approval | `waiting_approval` | Pause for human approval |
| `running` | `pause()` | `paused` | Manual pause requested |
| `running` | Step fails | `failing` | Handle failure path |
| `running` | All steps complete | `completed` | Workflow success |
| `waiting_approval` | `approve()` | `running` | Resume after approval |
| `waiting_approval` | `reject()` | `failed` | Rejection ends workflow |
| `paused` | `resume()` | `running` | Resume execution |
| `paused` | `cancel()` | `cancelled` | Cancel workflow |
| `failing` | Rollback complete | `failed` | Failure handling done |
| `failing` | Fallback path succeeds | `running` | Resume execution via fallback path |
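The transition table can be represented as a lookup keyed by (state, event), with anything outside the table rejected as an illegal transition. A minimal sketch (event names are illustrative labels for the table rows, not a defined API):

```python
TRANSITIONS = {
    ("created", "start"): "running",
    ("running", "needs_approval"): "waiting_approval",
    ("running", "pause"): "paused",
    ("running", "step_failed"): "failing",
    ("running", "all_steps_complete"): "completed",
    ("waiting_approval", "approve"): "running",
    ("waiting_approval", "reject"): "failed",
    ("paused", "resume"): "running",
    ("paused", "cancel"): "cancelled",
    ("failing", "rollback_complete"): "failed",
}

def transition(state: str, event: str) -> str:
    """Return the next workflow state, or raise on an illegal transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} in state {state!r}") from None
```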
## Step Execution State Machine
```
STEP STATES
┌──────────┐
│ PENDING │
└────┬─────┘
│ schedule()
┌──────────┐
│ QUEUED │
└────┬─────┘
│ dispatch()
┌──────────┐
│ RUNNING │◄─────────┐
└────┬─────┘ │
│ │ retry()
┌───────────────────┼───────────────┐│
│ │ ││
▼ ▼ ▼│
┌──────────┐ ┌──────────┐ ┌──────────┐
│SUCCEEDED │ │ FAILED │ │ RETRYING │
└──────────┘ └────┬─────┘ └──────────┘
┌─────────────────────┐
│ FAILURE HANDLER │
│ ┌───────────────┐ │
│ │ fail │──┼─► Mark workflow failing
│ │ continue │──┼─► Continue to next step
│ │ rollback │──┼─► Trigger rollback path
│ │ goto:{nodeId} │──┼─► Jump to specific node
│ └───────────────┘ │
└─────────────────────┘
```
### Step States
| State | Description |
|-------|-------------|
| `pending` | Step not yet ready (dependencies incomplete) |
| `queued` | Ready for execution, waiting for executor |
| `running` | Currently executing |
| `succeeded` | Completed successfully |
| `failed` | Failed after all retries exhausted |
| `retrying` | Failed, waiting for retry |
| `skipped` | Condition evaluated to false |
## DAG Execution Algorithm
```python
class DAGExecutor:
def __init__(self, workflow_run: WorkflowRun):
self.run = workflow_run
self.template = workflow_run.template
        self.pending = {node.id for node in self.template.nodes}
self.running = set()
self.completed = set()
self.failed = set()
self.outputs = {} # nodeId -> outputs
async def execute(self):
"""Main execution loop."""
self.run.status = WorkflowStatus.RUNNING
self.run.started_at = datetime.utcnow()
while self.pending or self.running:
# Find ready nodes (all dependencies satisfied)
ready = self.find_ready_nodes()
# Dispatch ready nodes
for node_id in ready:
asyncio.create_task(self.execute_node(node_id))
self.pending.remove(node_id)
self.running.add(node_id)
# Wait for any node to complete
if self.running:
await self.wait_for_completion()
# Check for deadlock
if not ready and self.pending and not self.running:
raise DeadlockException(self.pending)
# Determine final status
if self.failed:
self.run.status = WorkflowStatus.FAILED
else:
self.run.status = WorkflowStatus.COMPLETED
self.run.completed_at = datetime.utcnow()
def find_ready_nodes(self) -> List[str]:
"""Find nodes whose dependencies are all complete."""
ready = []
for node_id in self.pending:
node = self.template.get_node(node_id)
# Check condition
if node.condition:
if not self.evaluate_condition(node.condition):
self.mark_skipped(node_id)
continue
# Check all incoming edges
incoming = self.template.get_incoming_edges(node_id)
dependencies_met = all(
edge.from_node in self.completed
for edge in incoming
if self.evaluate_edge_condition(edge)
)
if dependencies_met:
ready.append(node_id)
return ready
async def execute_node(self, node_id: str):
"""Execute a single node."""
node = self.template.get_node(node_id)
step_run = StepRun(
workflow_run_id=self.run.id,
node_id=node_id,
status=StepStatus.RUNNING
)
try:
# Resolve inputs
inputs = self.resolve_inputs(node)
# Get step executor
executor = self.step_registry.get_executor(node.type)
# Execute with timeout
async with asyncio.timeout(node.timeout):
outputs = await executor.execute(inputs, node.config)
# Store outputs
self.outputs[node_id] = outputs
step_run.outputs = outputs
step_run.status = StepStatus.SUCCEEDED
self.running.remove(node_id)
self.completed.add(node_id)
except Exception as e:
await self.handle_step_failure(node, step_run, e)
async def handle_step_failure(self, node, step_run, error):
"""Handle step failure according to retry and failure policies."""
step_run.attempt_number += 1
# Check retry policy
if step_run.attempt_number <= node.retry_policy.max_retries:
if self.is_retryable(error, node.retry_policy):
step_run.status = StepStatus.RETRYING
delay = self.calculate_backoff(node.retry_policy, step_run.attempt_number)
await asyncio.sleep(delay)
await self.execute_node(node.id) # Retry
return
# No more retries - handle failure
step_run.status = StepStatus.FAILED
step_run.error = str(error)
match node.on_failure:
case "fail":
self.run.status = WorkflowStatus.FAILING
self.failed.add(node.id)
case "continue":
self.completed.add(node.id) # Continue as if succeeded
case "rollback":
await self.trigger_rollback(node)
case _ if node.on_failure.startswith("goto:"):
target = node.on_failure.split(":")[1]
self.pending.add(target) # Add target to pending
self.running.remove(node.id)
```
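The `calculate_backoff` helper referenced above is left unspecified; a typical choice is exponential backoff with a cap, sketched here as an assumption rather than the spec'd retry policy:

```python
def calculate_backoff(base_delay: float, attempt: int, max_delay: float = 300.0) -> float:
    """Exponential backoff: base * 2^(attempt - 1), capped at max_delay seconds."""
    return min(base_delay * (2 ** (attempt - 1)), max_delay)

# attempts 1..4 with a 5-second base delay: 5, 10, 20, 40 seconds
delays = [calculate_backoff(5.0, n) for n in range(1, 5)]
```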
## Input Resolution
Inputs to steps can come from multiple sources:
```typescript
interface InputResolver {
resolve(binding: InputBinding, context: ExecutionContext): any;
}
class StandardInputResolver implements InputResolver {
resolve(binding: InputBinding, context: ExecutionContext): any {
switch (binding.source.type) {
case "literal":
return binding.source.value;
case "context":
// Navigate context path: "release.name" -> context.release.name
return this.navigatePath(context, binding.source.path);
case "output":
// Get output from previous step
const stepOutputs = context.stepOutputs[binding.source.nodeId];
return stepOutputs?.[binding.source.outputName];
case "secret":
// Fetch from vault (never cached)
return this.secretsClient.fetch(binding.source.secretName);
case "expression":
// Evaluate JavaScript expression
return this.expressionEvaluator.evaluate(
binding.source.expression,
context
);
}
}
}
```
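The `navigatePath` call for the `context` source ("release.name" → `context.release.name`) can be sketched as a dotted-path walk over nested maps, returning nothing when a segment is missing (a minimal illustration):

```python
def navigate_path(context: dict, path: str):
    """Resolve a dotted path such as 'release.name' against nested dicts; None if any segment is missing."""
    current = context
    for segment in path.split("."):
        if not isinstance(current, dict) or segment not in current:
            return None
        current = current[segment]
    return current

ctx = {"release": {"name": "myapp-v2.3.1"}, "inputs": {"strategy": "rolling"}}
name = navigate_path(ctx, "release.name")       # "myapp-v2.3.1"
missing = navigate_path(ctx, "release.digest")  # None: segment not present
```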
## Execution Context
The execution context provides data available to all steps:
```typescript
interface ExecutionContext {
// Workflow identifiers
workflowRunId: UUID;
templateId: UUID;
templateVersion: number;
// Input values
inputs: Record<string, any>;
// Domain objects (loaded at start)
release?: Release;
promotion?: Promotion;
environment?: Environment;
targets?: Target[];
// Step outputs (accumulated during execution)
stepOutputs: Record<string, Record<string, any>>;
// Tenant context
tenantId: UUID;
userId: UUID;
// Metadata
startedAt: DateTime;
correlationId: string;
}
```
## Concurrency Control
### Parallelism Within Workflows
```typescript
interface ParallelConfig {
maxConcurrency: number; // Max simultaneous steps
failFast: boolean; // Stop all on first failure
}
// Example: Parallel deployment to multiple targets
const parallelDeploy: StepNode = {
id: "parallel-deploy",
type: "parallel",
config: {
maxConcurrency: 5,
failFast: false
},
children: [
{ id: "deploy-target-1", type: "deploy-docker", ... },
{ id: "deploy-target-2", type: "deploy-docker", ... },
{ id: "deploy-target-3", type: "deploy-docker", ... },
]
};
```
### Global Concurrency Limits
```typescript
interface ConcurrencyLimits {
maxWorkflowsPerTenant: number; // Concurrent workflow runs
maxStepsPerWorkflow: number; // Concurrent steps per workflow
maxDeploymentsPerEnvironment: number; // Prevent deployment conflicts
}
// Default limits
const defaults: ConcurrencyLimits = {
maxWorkflowsPerTenant: 10,
maxStepsPerWorkflow: 20,
maxDeploymentsPerEnvironment: 1 // One deployment at a time
};
```
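A per-workflow step limit like `maxStepsPerWorkflow` can be enforced with a counting semaphore around step dispatch; a minimal asyncio sketch (names are illustrative, not the engine's actual scheduler):

```python
import asyncio

async def run_steps(steps, max_concurrent=20):
    """Run step coroutine factories with at most `max_concurrent` in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(step):
        async with semaphore:
            return await step()

    return await asyncio.gather(*(guarded(s) for s in steps))

async def demo_step():
    await asyncio.sleep(0)  # stand-in for real step work
    return "ok"

results = asyncio.run(run_steps([demo_step] * 3, max_concurrent=2))
# all three steps complete, with no more than two running at once
```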
## Checkpoint and Resume
Workflows support checkpointing for long-running executions:
```typescript
interface WorkflowCheckpoint {
workflowRunId: UUID;
checkpointedAt: DateTime;
// Execution state
pendingNodes: string[];
completedNodes: string[];
failedNodes: string[];
// Accumulated data
stepOutputs: Record<string, Record<string, any>>;
// Context snapshot
contextSnapshot: ExecutionContext;
}
class CheckpointManager {
// Save checkpoint after each step completion
async saveCheckpoint(run: WorkflowRun): Promise<void> {
const checkpoint: WorkflowCheckpoint = {
workflowRunId: run.id,
checkpointedAt: new Date(),
pendingNodes: Array.from(run.executor.pending),
completedNodes: Array.from(run.executor.completed),
failedNodes: Array.from(run.executor.failed),
stepOutputs: run.executor.outputs,
contextSnapshot: run.context
};
await this.repository.save(checkpoint);
}
// Resume from checkpoint after service restart
async resumeFromCheckpoint(workflowRunId: UUID): Promise<WorkflowRun> {
const checkpoint = await this.repository.get(workflowRunId);
const run = new WorkflowRun();
run.executor.pending = new Set(checkpoint.pendingNodes);
run.executor.completed = new Set(checkpoint.completedNodes);
run.executor.failed = new Set(checkpoint.failedNodes);
run.executor.outputs = checkpoint.stepOutputs;
run.context = checkpoint.contextSnapshot;
// Resume execution
await run.executor.execute();
return run;
}
}
```
## Timeout Handling
```typescript
interface TimeoutConfig {
stepTimeout: number; // Per-step timeout (seconds)
workflowTimeout: number; // Total workflow timeout (seconds)
}
class TimeoutHandler {
async executeWithTimeout<T>(
operation: () => Promise<T>,
timeoutSeconds: number,
onTimeout: () => Promise<void>
): Promise<T> {
const controller = new AbortController();
const timeoutId = setTimeout(
() => controller.abort(),
timeoutSeconds * 1000
);
try {
const result = await operation();
clearTimeout(timeoutId);
return result;
} catch (error) {
if (error.name === 'AbortError') {
await onTimeout();
throw new TimeoutException(timeoutSeconds);
}
throw error;
}
}
}
```
## Event Emission
The workflow engine emits events for observability:
```typescript
type WorkflowEvent =
| { type: "workflow.started"; workflowRunId: UUID; templateId: UUID }
| { type: "workflow.completed"; workflowRunId: UUID; status: string }
| { type: "workflow.failed"; workflowRunId: UUID; error: string }
| { type: "step.started"; workflowRunId: UUID; nodeId: string }
| { type: "step.completed"; workflowRunId: UUID; nodeId: string; outputs: any }
| { type: "step.failed"; workflowRunId: UUID; nodeId: string; error: string }
| { type: "step.retrying"; workflowRunId: UUID; nodeId: string; attempt: number };
class WorkflowEventEmitter {
private subscribers: Map<string, ((event: WorkflowEvent) => void)[]> = new Map();
emit(event: WorkflowEvent): void {
const handlers = this.subscribers.get(event.type) || [];
for (const handler of handlers) {
handler(event);
}
// Also emit to event bus for external consumers
this.eventBus.publish("workflow.events", event);
}
}
```
## Execution Monitoring
### Real-time Progress
```typescript
interface WorkflowProgress {
workflowRunId: UUID;
status: WorkflowStatus;
// Step progress
totalSteps: number;
completedSteps: number;
runningSteps: number;
failedSteps: number;
// Current activity
currentNodes: string[];
// Timing
startedAt: DateTime;
estimatedCompletion?: DateTime;
// Step details
steps: StepProgress[];
}
interface StepProgress {
nodeId: string;
nodeName: string;
status: StepStatus;
startedAt?: DateTime;
completedAt?: DateTime;
attempt: number;
logs?: string;
}
```
### WebSocket Streaming
```typescript
// Client subscribes to workflow progress
const ws = new WebSocket(`/api/v1/workflow-runs/${runId}/stream`);
ws.onmessage = (event) => {
const progress: WorkflowProgress = JSON.parse(event.data);
updateUI(progress);
};
// Server streams updates
class WorkflowStreamHandler {
async stream(runId: UUID, connection: WebSocket): Promise<void> {
    const subscription = this.eventBus.subscribe(`workflow.${runId}.*`);
    // Each incoming event triggers a fresh progress snapshot for the client.
    for await (const _event of subscription) {
const progress = await this.buildProgress(runId);
connection.send(JSON.stringify(progress));
if (progress.status === 'completed' || progress.status === 'failed') {
break;
}
}
connection.close();
}
}
```
## References
- [Workflow Templates](templates.md)
- [Workflow Engine Module](../modules/workflow-engine.md)
- [Promotion Manager](../modules/promotion-manager.md)

# Promotion State Machine
## Overview
Promotions move a release from one environment to the next (for example, Dev -> Staging -> Production). The promotion state machine manages the promotion lifecycle from request through approval, gating, deployment, and completion.
## Promotion States
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ PROMOTION STATE MACHINE │
│ │
│ ┌──────────────────┐ │
│ │ PENDING_APPROVAL │ (initial) │
│ └────────┬─────────┘ │
│ │ │
│ ┌──────────────────┼──────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │
│ │ REJECTED │ │ PENDING_GATE │ │ CANCELLED │ │
│ └────────────────┘ └────────┬───────┘ └────────────────┘ │
│ │ │
│ │ gates pass │
│ ▼ │
│ ┌────────────────┐ │
│ │ APPROVED │ │
│ └────────┬───────┘ │
│ │ │
│ │ start deployment │
│ ▼ │
│ ┌────────────────┐ │
│ │ DEPLOYING │ │
│ └────────┬───────┘ │
│ │ │
│ ┌──────────────────┼──────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │
│ │ FAILED │ │ DEPLOYED │ │ ROLLED_BACK │ │
│ └────────────────┘ └────────────────┘ └────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
## State Definitions
| State | Description |
|-------|-------------|
| `pending_approval` | Awaiting human approval (if required) |
| `pending_gate` | Awaiting automated gate evaluation |
| `approved` | All approvals and gates satisfied; ready for deployment |
| `rejected` | Blocked by approval rejection or gate failure |
| `deploying` | Deployment in progress |
| `deployed` | Successfully deployed to target environment |
| `failed` | Deployment failed (not rolled back) |
| `cancelled` | Cancelled by user before completion |
| `rolled_back` | Deployment rolled back to previous version |
## State Transitions
### Valid Transitions
```typescript
const validTransitions: Record<PromotionStatus, PromotionStatus[]> = {
  pending_approval: ["pending_gate", "approved", "rejected", "cancelled"], // direct "approved" when no approvals or gates apply
pending_gate: ["approved", "rejected", "cancelled"],
approved: ["deploying", "cancelled"],
deploying: ["deployed", "failed", "rolled_back"],
rejected: [], // terminal
cancelled: [], // terminal
deployed: [], // terminal (for this promotion)
failed: ["rolled_back"], // can trigger rollback
rolled_back: [] // terminal
};
```
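The table above can be enforced with a small guard that runs before any state write (a sketch):

```typescript
type PromotionStatus =
  | "pending_approval" | "pending_gate" | "approved" | "deploying"
  | "rejected" | "cancelled" | "deployed" | "failed" | "rolled_back";

const validTransitions: Record<PromotionStatus, PromotionStatus[]> = {
  pending_approval: ["pending_gate", "approved", "rejected", "cancelled"],
  pending_gate: ["approved", "rejected", "cancelled"],
  approved: ["deploying", "cancelled"],
  deploying: ["deployed", "failed", "rolled_back"],
  rejected: [],
  cancelled: [],
  deployed: [],
  failed: ["rolled_back"],
  rolled_back: []
};

// Throws on an illegal transition; call before persisting the new state.
function assertTransition(from: PromotionStatus, to: PromotionStatus): void {
  if (!validTransitions[from].includes(to)) {
    throw new Error(`Illegal promotion transition: ${from} -> ${to}`);
  }
}
```

Centralizing the check means every code path that mutates a promotion (approval, gates, deployment, rollback) shares one source of truth for legality.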
### Transition Events
```typescript
interface PromotionTransition {
promotionId: UUID;
fromState: PromotionStatus;
toState: PromotionStatus;
trigger: TransitionTrigger;
triggeredBy: UUID; // user or system
timestamp: DateTime;
details: object;
}
type TransitionTrigger =
| "approval_granted"
| "approval_rejected"
| "gate_passed"
| "gate_failed"
| "deployment_started"
| "deployment_completed"
| "deployment_failed"
| "rollback_triggered"
| "rollback_completed"
| "user_cancelled";
```
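Because every transition is recorded, the current status can be re-derived by replaying the transition log, which doubles as an audit-chain integrity check. A sketch (the replay helper is an assumption; field names are trimmed from `PromotionTransition` above):

```typescript
interface TransitionEntry {
  fromState: string;
  toState: string;
  timestamp: number;
}

// Replay an ordered transition log from the initial state; verify continuity.
function replayStatus(initial: string, log: TransitionEntry[]): string {
  let state = initial;
  for (const t of log) {
    if (t.fromState !== state) {
      throw new Error(`Broken audit chain: expected from=${state}, got ${t.fromState}`);
    }
    state = t.toState;
  }
  return state;
}
```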
## Promotion Flow
### 1. Request Promotion
```typescript
async function requestPromotion(request: PromotionRequest): Promise<Promotion> {
// Validate release exists and is ready
const release = await getRelease(request.releaseId);
if (release.status !== "ready" && release.status !== "deployed") {
throw new Error("Release not ready for promotion");
}
// Validate target environment
const environment = await getEnvironment(request.targetEnvironmentId);
// Check freeze windows
if (await isEnvironmentFrozen(environment.id)) {
throw new Error("Environment is frozen");
}
// Determine initial state
const requiresApproval = environment.requiredApprovals > 0;
const initialStatus = requiresApproval ? "pending_approval" : "pending_gate";
// Create promotion
const promotion = await createPromotion({
releaseId: request.releaseId,
sourceEnvironmentId: release.currentEnvironmentId,
targetEnvironmentId: environment.id,
status: initialStatus,
requestedBy: request.userId,
requestReason: request.reason
});
// Emit event
await emitEvent("promotion.requested", promotion);
return promotion;
}
```
### 2. Approval Phase
```typescript
async function processApproval(
promotionId: UUID,
approverId: UUID,
action: "approve" | "reject",
comment?: string
): Promise<Promotion> {
const promotion = await getPromotion(promotionId);
const environment = await getEnvironment(promotion.targetEnvironmentId);
// Validate approver can approve
await validateApproverPermission(approverId, environment.id);
// Check separation of duties
if (environment.requireSeparationOfDuties) {
if (approverId === promotion.requestedBy) {
throw new Error("Separation of duties violation: requester cannot approve");
}
}
// Record approval
await recordApproval({
promotionId,
approverId,
action,
comment
});
if (action === "reject") {
return await transitionState(promotion, "rejected", {
trigger: "approval_rejected",
triggeredBy: approverId,
details: { reason: comment }
});
}
// Check if all required approvals received
const approvalCount = await countApprovals(promotionId);
if (approvalCount >= environment.requiredApprovals) {
return await transitionState(promotion, "pending_gate", {
trigger: "approval_granted",
triggeredBy: approverId
});
}
return promotion;
}
```
### 3. Gate Evaluation
```typescript
async function evaluateGates(promotionId: UUID): Promise<GateEvaluationResult> {
const promotion = await getPromotion(promotionId);
const environment = await getEnvironment(promotion.targetEnvironmentId);
const release = await getRelease(promotion.releaseId);
const gateResults: GateResult[] = [];
// Security gate
const securityResult = await evaluateSecurityGate(release, environment);
gateResults.push(securityResult);
// Custom policy gates
for (const policy of environment.policies) {
const policyResult = await evaluatePolicyGate(release, environment, policy);
gateResults.push(policyResult);
}
  // Aggregate results: only blocking gate failures block the promotion;
  // non-blocking failures are advisory and recorded in the decision record.
  const blockingFailures = gateResults.filter(g => !g.passed && g.blocking);
  const allPassed = blockingFailures.length === 0;
// Create decision record
const decisionRecord = await createDecisionRecord({
promotionId,
gateResults,
decision: allPassed ? "allow" : "block",
decidedAt: new Date()
});
// Transition state
if (allPassed) {
await transitionState(promotion, "approved", {
trigger: "gate_passed",
triggeredBy: "system",
details: { decisionRecordId: decisionRecord.id }
});
} else {
await transitionState(promotion, "rejected", {
trigger: "gate_failed",
triggeredBy: "system",
details: { blockingGates: blockingFailures }
});
}
return { passed: allPassed, gateResults, decisionRecord };
}
```
### 4. Deployment Execution
```typescript
async function executeDeployment(promotionId: UUID): Promise<DeploymentJob> {
const promotion = await getPromotion(promotionId);
// Transition to deploying
await transitionState(promotion, "deploying", {
trigger: "deployment_started",
triggeredBy: "system"
});
// Generate artifacts
const artifacts = await generateArtifacts(promotion);
// Create deployment job
const job = await createDeploymentJob({
promotionId,
releaseId: promotion.releaseId,
environmentId: promotion.targetEnvironmentId,
artifacts
});
// Execute via workflow or direct
const workflowRun = await startDeploymentWorkflow(job);
// Update promotion with workflow reference
await updatePromotion(promotionId, { workflowRunId: workflowRun.id });
return job;
}
```
### 5. Completion Handling
```typescript
async function handleDeploymentCompletion(
jobId: UUID,
status: "succeeded" | "failed"
): Promise<Promotion> {
const job = await getDeploymentJob(jobId);
const promotion = await getPromotion(job.promotionId);
if (status === "succeeded") {
// Generate evidence packet
const evidence = await generateEvidencePacket(promotion, job);
// Update release environment state
await updateReleaseEnvironmentState({
releaseId: promotion.releaseId,
environmentId: promotion.targetEnvironmentId,
status: "deployed",
promotionId: promotion.id,
evidenceRef: evidence.id
});
return await transitionState(promotion, "deployed", {
trigger: "deployment_completed",
triggeredBy: "system",
details: { evidencePacketId: evidence.id }
});
} else {
return await transitionState(promotion, "failed", {
trigger: "deployment_failed",
triggeredBy: "system",
details: { jobId, error: job.errorMessage }
});
}
}
```
## Decision Record
Every promotion produces a decision record:
```typescript
interface DecisionRecord {
id: UUID;
promotionId: UUID;
decision: "allow" | "block";
decidedAt: DateTime;
// Inputs
release: {
id: UUID;
name: string;
components: Array<{ name: string; digest: string }>;
};
environment: {
id: UUID;
name: string;
};
// Gate results
gateResults: Array<{
gateName: string;
gateType: string;
passed: boolean;
blocking: boolean;
message: string;
details: object;
evaluatedAt: DateTime;
}>;
// Approvals
approvals: Array<{
approverId: UUID;
approverName: string;
action: "approved" | "rejected";
comment?: string;
timestamp: DateTime;
}>;
// Context
requester: {
id: UUID;
name: string;
};
requestReason: string;
// Signature
contentHash: string;
signature: string;
}
```
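The `contentHash` and `signature` fields imply the record is canonicalized, hashed, then signed. A sketch of the hashing half — sorted-key JSON canonicalization is an assumption, and the signing key and algorithm are deployment-specific, so signing is omitted:

```typescript
import { createHash } from "node:crypto";

// Serialize with sorted keys so the hash is independent of property order.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.keys(value as object).sort()
      .map(k => `${JSON.stringify(k)}:${canonicalize((value as Record<string, unknown>)[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

function contentHash(record: object): string {
  return "sha256:" + createHash("sha256").update(canonicalize(record)).digest("hex");
}
```

Order-independence matters: two services serializing the same decision record must produce the same hash for signature verification to succeed.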
## API Endpoints
```yaml
# Request promotion
POST /api/v1/promotions
Body: { releaseId, targetEnvironmentId, reason? }
Response: Promotion
# Approve/reject promotion
POST /api/v1/promotions/{id}/approve
POST /api/v1/promotions/{id}/reject
Body: { comment? }
Response: Promotion
# Cancel promotion
POST /api/v1/promotions/{id}/cancel
Response: Promotion
# Get decision record
GET /api/v1/promotions/{id}/decision
Response: DecisionRecord
# Preview gates (dry run)
POST /api/v1/promotions/preview-gates
Body: { releaseId, targetEnvironmentId }
Response: { wouldPass: boolean, gates: GateResult[] }
```
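A client-side summary of the `preview-gates` dry run might look like this (a sketch; the response shape follows the YAML above, and the `fetch` usage shown in comments assumes an authenticated session):

```typescript
interface GateResult { gateName: string; passed: boolean; blocking: boolean; }
interface PreviewResponse { wouldPass: boolean; gates: GateResult[]; }

// Turn a preview-gates response into a one-line summary for display.
function summarizePreview(r: PreviewResponse): string {
  const failing = r.gates.filter(g => !g.passed).map(g => g.gateName);
  return r.wouldPass
    ? "Promotion would pass all gates"
    : `Promotion would be blocked: ${failing.join(", ")}`;
}

// Usage sketch:
// const res = await fetch("/api/v1/promotions/preview-gates", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify({ releaseId, targetEnvironmentId })
// });
// console.log(summarizePreview(await res.json()));
```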
## References
- [Workflow Templates](templates.md)
- [Workflow Execution](execution.md)
- [Evidence Schema](../appendices/evidence-schema.md)

# Workflow Template Structure
## Overview
Workflow templates define the DAG (Directed Acyclic Graph) of steps to execute during deployment, promotion, and other automated processes.
## Template Structure
```typescript
interface WorkflowTemplate {
id: UUID;
tenantId: UUID;
name: string; // "standard-deploy"
displayName: string; // "Standard Deployment"
description: string;
version: number; // Auto-incremented
// DAG structure
nodes: StepNode[];
edges: StepEdge[];
// I/O definitions
inputs: InputDefinition[];
outputs: OutputDefinition[];
// Metadata
tags: string[];
isBuiltin: boolean;
createdAt: DateTime;
createdBy: UUID;
}
```
## Node Types
### Step Node
```typescript
interface StepNode {
id: string; // Unique within template (e.g., "deploy-api")
type: string; // Step type from registry
name: string; // Display name
config: Record<string, any>; // Step-specific configuration
inputs: InputBinding[]; // Input value bindings
outputs: OutputBinding[]; // Output declarations
position: { x: number; y: number }; // UI position
// Execution settings
timeout: number; // Seconds (default from step type)
retryPolicy: RetryPolicy;
onFailure: FailureAction;
condition?: string; // JS expression for conditional execution
// Documentation
description?: string;
documentation?: string;
}
type FailureAction = "fail" | "continue" | "rollback" | `goto:${string}`; // e.g. "goto:rollback-handler"
interface RetryPolicy {
maxRetries: number;
backoffType: "fixed" | "exponential";
backoffSeconds: number;
retryableErrors: string[];
}
```
### Input Bindings
```typescript
interface InputBinding {
name: string; // Input parameter name
source: InputSource;
}
type InputSource =
| { type: "literal"; value: any }
| { type: "context"; path: string } // e.g., "release.name"
| { type: "output"; nodeId: string; outputName: string }
| { type: "secret"; secretName: string }
| { type: "expression"; expression: string }; // JS expression
```
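At execution time, resolving an `InputSource` is a switch over the variants. A sketch covering three of the five variants — the dotted-path lookup is a simplifying assumption, and `secret`/`expression` are omitted because they depend on the secret store and expression sandbox:

```typescript
type InputSource =
  | { type: "literal"; value: unknown }
  | { type: "context"; path: string }
  | { type: "output"; nodeId: string; outputName: string };

interface ResolutionEnv {
  context: Record<string, unknown>;
  outputs: Record<string, Record<string, unknown>>; // nodeId -> outputs
}

// Walk a dotted path like "release.name" through the context object.
function lookupPath(obj: Record<string, unknown>, path: string): unknown {
  return path.split(".").reduce<unknown>(
    (cur, key) => (cur as Record<string, unknown> | undefined)?.[key], obj);
}

function resolveInput(source: InputSource, env: ResolutionEnv): unknown {
  switch (source.type) {
    case "literal": return source.value;
    case "context": return lookupPath(env.context, source.path);
    case "output": return env.outputs[source.nodeId]?.[source.outputName];
  }
}
```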
### Edge Types
```typescript
interface StepEdge {
id: string;
from: string; // Source node ID
to: string; // Target node ID
condition?: string; // Optional condition expression
label?: string; // Display label for conditional edges
}
```
## Built-in Step Types
### Control Steps
| Type | Description | Config |
|------|-------------|--------|
| `approval` | Wait for human approval | `promotionId` |
| `wait` | Wait for specified duration | `durationSeconds` |
| `condition` | Branch based on condition | `expression` |
| `parallel` | Execute children in parallel | `maxConcurrency` |
### Gate Steps
| Type | Description | Config |
|------|-------------|--------|
| `security-gate` | Evaluate security policy | `blockOnCritical`, `blockOnHigh` |
| `custom-gate` | Custom OPA policy evaluation | `policyName` |
| `freeze-check` | Check freeze windows | - |
| `approval-check` | Check approval status | `requiredCount` |
### Deploy Steps
| Type | Description | Config |
|------|-------------|--------|
| `deploy-docker` | Deploy single container | `containerName`, `strategy` |
| `deploy-compose` | Deploy Docker Compose stack | `composePath`, `strategy` |
| `deploy-ecs` | Deploy to AWS ECS | `cluster`, `service` |
| `deploy-nomad` | Deploy to HashiCorp Nomad | `jobName` |
### Verification Steps
| Type | Description | Config |
|------|-------------|--------|
| `health-check` | HTTP/TCP health check | `type`, `path`, `expectedStatus` |
| `smoke-test` | Run smoke test suite | `testSuite`, `timeout` |
| `verify-digest` | Verify deployed digest | `expectedDigest` |
### Integration Steps
| Type | Description | Config |
|------|-------------|--------|
| `webhook` | Call external webhook | `url`, `method`, `headers` |
| `trigger-ci` | Trigger CI pipeline | `integrationId`, `pipelineId` |
| `wait-ci` | Wait for CI pipeline | `runId`, `timeout` |
### Notification Steps
| Type | Description | Config |
|------|-------------|--------|
| `notify` | Send notification | `channel`, `template` |
| `slack` | Send Slack message | `channel`, `message` |
| `email` | Send email | `recipients`, `template` |
### Recovery Steps
| Type | Description | Config |
|------|-------------|--------|
| `rollback` | Rollback deployment | `strategy`, `targetReleaseId` |
| `execute-script` | Run recovery script | `scriptType`, `scriptRef` |
## Template Example: Standard Deployment
```json
{
"id": "template-standard-deploy",
"name": "standard-deploy",
"displayName": "Standard Deployment",
"version": 1,
"inputs": [
{ "name": "releaseId", "type": "uuid", "required": true },
{ "name": "environmentId", "type": "uuid", "required": true },
{ "name": "promotionId", "type": "uuid", "required": true }
],
"nodes": [
{
"id": "approval",
"type": "approval",
"name": "Approval Gate",
"config": {},
"inputs": [
{ "name": "promotionId", "source": { "type": "context", "path": "promotionId" } }
],
"position": { "x": 100, "y": 100 }
},
{
"id": "security-gate",
"type": "security-gate",
"name": "Security Verification",
"config": {
"blockOnCritical": true,
"blockOnHigh": true
},
"inputs": [
{ "name": "releaseId", "source": { "type": "context", "path": "releaseId" } }
],
"position": { "x": 100, "y": 200 }
},
{
"id": "pre-deploy-hook",
"type": "execute-script",
"name": "Pre-Deploy Hook",
"config": {
"scriptType": "csharp",
"scriptRef": "hooks/pre-deploy.csx"
},
"inputs": [
{ "name": "release", "source": { "type": "context", "path": "release" } },
{ "name": "environment", "source": { "type": "context", "path": "environment" } }
],
"timeout": 300,
"onFailure": "fail",
"position": { "x": 100, "y": 300 }
},
{
"id": "deploy-targets",
"type": "deploy-compose",
"name": "Deploy to Targets",
"config": {
"strategy": "rolling",
"parallelism": 2
},
"inputs": [
{ "name": "releaseId", "source": { "type": "context", "path": "releaseId" } },
{ "name": "environmentId", "source": { "type": "context", "path": "environmentId" } }
],
"timeout": 600,
"retryPolicy": {
"maxRetries": 2,
"backoffType": "exponential",
"backoffSeconds": 30
},
"onFailure": "rollback",
"position": { "x": 100, "y": 400 }
},
{
"id": "health-check",
"type": "health-check",
"name": "Health Verification",
"config": {
"type": "http",
"path": "/health",
"expectedStatus": 200,
"timeout": 30,
"retries": 5
},
"inputs": [
{ "name": "targets", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "deployedTargets" } }
],
"onFailure": "rollback",
"position": { "x": 100, "y": 500 }
},
{
"id": "post-deploy-hook",
"type": "execute-script",
"name": "Post-Deploy Hook",
"config": {
"scriptType": "bash",
"inline": "echo 'Deployment complete'"
},
"timeout": 300,
"onFailure": "continue",
"position": { "x": 100, "y": 600 }
},
{
"id": "notify-success",
"type": "notify",
"name": "Success Notification",
"config": {
"channel": "slack",
"template": "deployment-success"
},
"inputs": [
{ "name": "release", "source": { "type": "context", "path": "release" } },
{ "name": "environment", "source": { "type": "context", "path": "environment" } }
],
"onFailure": "continue",
"position": { "x": 100, "y": 700 }
},
{
"id": "rollback-handler",
"type": "rollback",
"name": "Rollback Handler",
"config": {
"strategy": "to-previous"
},
"inputs": [
{ "name": "deploymentJobId", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "jobId" } }
],
"position": { "x": 300, "y": 450 }
},
{
"id": "notify-failure",
"type": "notify",
"name": "Failure Notification",
"config": {
"channel": "slack",
"template": "deployment-failure"
},
"onFailure": "continue",
"position": { "x": 300, "y": 550 }
}
],
"edges": [
{ "id": "e1", "from": "approval", "to": "security-gate" },
{ "id": "e2", "from": "security-gate", "to": "pre-deploy-hook" },
{ "id": "e3", "from": "pre-deploy-hook", "to": "deploy-targets" },
{ "id": "e4", "from": "deploy-targets", "to": "health-check" },
{ "id": "e5", "from": "health-check", "to": "post-deploy-hook" },
{ "id": "e6", "from": "post-deploy-hook", "to": "notify-success" },
{ "id": "e7", "from": "deploy-targets", "to": "rollback-handler", "condition": "status === 'failed'" },
{ "id": "e8", "from": "health-check", "to": "rollback-handler", "condition": "status === 'failed'" },
{ "id": "e9", "from": "rollback-handler", "to": "notify-failure" }
]
}
```
## Template Validation
Templates are validated for:
1. **Structural validity**: Valid JSON/YAML, required fields present
2. **DAG validity**: No cycles, all edges reference valid nodes
3. **Type validity**: All step types exist in registry
4. **Schema validity**: Step configs match type schemas
5. **Input validity**: All required inputs are bindable
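DAG validity (rule 2) is typically checked with a topological sort: if the sort cannot consume every node, a cycle exists. A sketch using Kahn's algorithm:

```typescript
interface Edge { from: string; to: string; }

// Kahn's algorithm: returns true when the edge set over `nodes` is acyclic.
function isAcyclic(nodes: string[], edges: Edge[]): boolean {
  const indegree = new Map(nodes.map(n => [n, 0]));
  for (const e of edges) indegree.set(e.to, (indegree.get(e.to) ?? 0) + 1);
  const queue = nodes.filter(n => indegree.get(n) === 0);
  let visited = 0;
  while (queue.length > 0) {
    const n = queue.shift()!;
    visited++;
    for (const e of edges.filter(edge => edge.from === n)) {
      const d = (indegree.get(e.to) ?? 0) - 1;
      indegree.set(e.to, d);
      if (d === 0) queue.push(e.to);
    }
  }
  return visited === nodes.length; // unvisited leftovers mean a cycle
}
```

The same pass also catches edges referencing unknown nodes if the indegree map is restricted to declared node IDs before counting.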
## References
- [Workflow Engine](../modules/workflow-engine.md)
- [Execution State Machine](execution.md)
- [Step Registry](../modules/workflow-engine.md#module-step-registry)