Add release orchestrator docs and fill sprint documentation gaps

2026-01-11 01:05:17 +02:00
parent d58c093887
commit a62974a8c2
37 changed files with 6061 additions and 0 deletions

@@ -325,10 +325,17 @@ public sealed record FreezeWindowDeactivated(
) : IDomainEvent;
```
### Documentation Deliverables
| Deliverable | Type | Description |
|-------------|------|-------------|
| `docs/modules/release-orchestrator/api/environments.md` (partial) | Markdown | API endpoint documentation for environment management (CRUD, freeze windows) |
---
## Acceptance Criteria
### Code
- [ ] Create environment with all fields
- [ ] Update environment preserves audit fields
- [ ] Delete environment checks for targets/releases
@@ -341,6 +348,12 @@ public sealed record FreezeWindowDeactivated(
- [ ] Domain events published
- [ ] Unit test coverage ≥85%
### Documentation
- [ ] API documentation created for environment endpoints
- [ ] All environment CRUD endpoints documented with request/response schemas
- [ ] Freeze window endpoints documented
- [ ] Cross-references to environment-manager.md added
---
## Test Plan
@@ -399,3 +412,4 @@ public sealed record FreezeWindowDeactivated(
| Date | Entry |
|------|-------|
| 10-Jan-2026 | Sprint created |
| 11-Jan-2026 | Added documentation deliverable: api/environments.md (partial) |

@@ -357,10 +357,17 @@ public sealed class HealthCheckScheduler : IHostedService, IDisposable
}
```
### Documentation Deliverables
| Deliverable | Type | Description |
|-------------|------|-------------|
| `docs/modules/release-orchestrator/api/environments.md` (partial) | Markdown | API endpoint documentation for target management (target groups, targets, health) |
---
## Acceptance Criteria
### Code
- [ ] Register target with connection config
- [ ] Update target preserves encrypted config
- [ ] Unregister checks for active deployments
@@ -372,6 +379,12 @@ public sealed class HealthCheckScheduler : IHostedService, IDisposable
- [ ] Scheduled health checks run
- [ ] Unit test coverage ≥85%
### Documentation
- [ ] API documentation created for target endpoints
- [ ] All target CRUD endpoints documented
- [ ] Target group endpoints documented
- [ ] Health check endpoints documented
---
## Dependencies
@@ -405,3 +418,4 @@ public sealed class HealthCheckScheduler : IHostedService, IDisposable
| Date | Entry |
|------|-------|
| 10-Jan-2026 | Sprint created |
| 11-Jan-2026 | Added documentation deliverable: api/environments.md (partial - targets) |

@@ -490,10 +490,17 @@ public sealed class HeartbeatTimeoutMonitor : IHostedService, IDisposable
}
```
### Documentation Deliverables
| Deliverable | Type | Description |
|-------------|------|-------------|
| `docs/modules/release-orchestrator/api/agents.md` | Markdown | API endpoint documentation for agent registration, heartbeat, task management |
---
## Acceptance Criteria
### Code
- [ ] Registration token created with expiry
- [ ] Token can only be used once
- [ ] Agent registered with certificate
@@ -505,6 +512,13 @@ public sealed class HeartbeatTimeoutMonitor : IHostedService, IDisposable
- [ ] Agent capabilities stored correctly
- [ ] Unit test coverage ≥85%
### Documentation
- [ ] API documentation file created (api/agents.md)
- [ ] Agent registration endpoint documented
- [ ] Heartbeat endpoint documented
- [ ] Task endpoints documented
- [ ] mTLS flow documented
---
## Dependencies
@@ -537,3 +551,4 @@ public sealed class HeartbeatTimeoutMonitor : IHostedService, IDisposable
| Date | Entry |
|------|-------|
| 10-Jan-2026 | Sprint created |
| 11-Jan-2026 | Added documentation deliverable: api/agents.md |

@@ -451,10 +451,17 @@ public sealed record ComponentDeleted(
) : IDomainEvent;
```
### Documentation Deliverables
| Deliverable | Type | Description |
|-------------|------|-------------|
| `docs/modules/release-orchestrator/api/releases.md` (partial) | Markdown | API endpoint documentation for component registry (list, create, update components) |
---
## Acceptance Criteria
### Code
- [ ] Register component with registry/repository
- [ ] Validate registry connectivity on register
- [ ] Check for duplicate components
@@ -466,6 +473,11 @@ public sealed record ComponentDeleted(
- [ ] Import discovered components
- [ ] Unit test coverage >=85%
### Documentation
- [ ] Component API endpoints documented
- [ ] List/Get/Create/Update/Delete component endpoints included
- [ ] Component version strategy schema documented
---
## Test Plan
@@ -520,3 +532,4 @@ public sealed record ComponentDeleted(
| Date | Entry |
|------|-------|
| 10-Jan-2026 | Sprint created |
| 11-Jan-2026 | Added documentation deliverable: api/releases.md (partial - components) |

@@ -455,10 +455,17 @@ public sealed record VersionResolved(
) : IDomainEvent;
```
### Documentation Deliverables
| Deliverable | Type | Description |
|-------------|------|-------------|
| `docs/modules/release-orchestrator/api/releases.md` (partial) | Markdown | API endpoint documentation for version resolution (tag to digest, version maps) |
---
## Acceptance Criteria
### Code
- [ ] Resolve tag to digest
- [ ] Resolve digest returns same digest
- [ ] Record new version with metadata
@@ -469,6 +476,12 @@ public sealed record VersionResolved(
- [ ] List versions with pagination
- [ ] Unit test coverage >=85%
### Documentation
- [ ] Version API endpoints documented
- [ ] Tag resolution endpoint documented
- [ ] Version map listing documented
- [ ] Digest-first principle explained
---
## Test Plan
@@ -525,3 +538,4 @@ public sealed record VersionResolved(
| Date | Entry |
|------|-------|
| 10-Jan-2026 | Sprint created |
| 11-Jan-2026 | Added documentation deliverable: api/releases.md (partial - versions) |

@@ -554,10 +554,17 @@ public sealed record ReleaseDeleted(
) : IDomainEvent;
```
### Documentation Deliverables
| Deliverable | Type | Description |
|-------------|------|-------------|
| `docs/modules/release-orchestrator/api/releases.md` (partial) | Markdown | API endpoint documentation for release management (create, quick create, compare) |
---
## Acceptance Criteria
### Code
- [ ] Create draft release
- [ ] Add components to draft release
- [ ] Remove components from draft release
@@ -569,6 +576,13 @@ public sealed record ReleaseDeleted(
- [ ] Delete only draft releases
- [ ] Unit test coverage >=85%
### Documentation
- [ ] Release API endpoints documented
- [ ] Create release endpoint documented with full schema
- [ ] Quick create release endpoint documented
- [ ] Compare releases endpoint documented
- [ ] Release creation modes explained
---
## Test Plan
@@ -626,3 +640,4 @@ public sealed record ReleaseDeleted(
| Date | Entry |
|------|-------|
| 10-Jan-2026 | Sprint created |
| 11-Jan-2026 | Added documentation deliverable: api/releases.md (partial - releases) |

@@ -596,10 +596,18 @@ public sealed record WorkflowTemplateDeprecated(
) : IDomainEvent;
```
### Documentation Deliverables
| Deliverable | Type | Description |
|-------------|------|-------------|
| `docs/modules/release-orchestrator/api/workflows.md` (partial) | Markdown | API endpoint documentation for workflow templates (CRUD, validate) |
---
## Acceptance Criteria
### Code
- [ ] Parse YAML workflow definitions
- [ ] Parse JSON workflow definitions
- [ ] Validate step types exist
@@ -611,6 +619,12 @@ public sealed record WorkflowTemplateDeprecated(
- [ ] Deprecate workflow templates
- [ ] Unit test coverage >=85%
### Documentation
- [ ] Workflow template API endpoints documented
- [ ] Template validation endpoint documented
- [ ] Full workflow template JSON schema included
- [ ] DAG validation rules documented
---
## Test Plan
@@ -669,3 +683,4 @@ public sealed record WorkflowTemplateDeprecated(
| Date | Entry |
|------|-------|
| 10-Jan-2026 | Sprint created |
| 11-Jan-2026 | Added documentation deliverable: api/workflows.md (partial - templates) |

@@ -478,10 +478,18 @@ public sealed class StepRegistryInitializer : IHostedService
}
```
### Documentation Deliverables
| Deliverable | Type | Description |
|-------------|------|-------------|
| `docs/modules/release-orchestrator/api/workflows.md` (partial) | Markdown | API endpoint documentation for step registry (list available steps, get step schema) |
---
## Acceptance Criteria
### Code
- [ ] Register built-in step types
- [ ] Load plugin step types
- [ ] Validate step configurations against schema
@@ -492,6 +500,12 @@ public sealed class StepRegistryInitializer : IHostedService
- [ ] Required property validation works
- [ ] Unit test coverage >=85%
### Documentation
- [ ] Step registry API endpoints documented
- [ ] List steps endpoint documented (GET /api/v1/steps)
- [ ] Built-in step types listed
- [ ] Plugin-provided step discovery explained
---
## Test Plan
@@ -547,3 +561,4 @@ public sealed class StepRegistryInitializer : IHostedService
| Date | Entry |
|------|-------|
| 10-Jan-2026 | Sprint created |
| 11-Jan-2026 | Added documentation deliverable: api/workflows.md (partial - step registry) |

@@ -643,10 +643,18 @@ public sealed record WorkflowStepFailed(
) : IDomainEvent;
```
### Documentation Deliverables
| Deliverable | Type | Description |
|-------------|------|-------------|
| `docs/modules/release-orchestrator/api/workflows.md` (partial) | Markdown | API endpoint documentation for workflow runs (start, pause, resume, cancel) |
---
## Acceptance Criteria
### Code
- [ ] Start workflow from template
- [ ] Execute steps in dependency order
- [ ] Execute independent steps in parallel
@@ -658,6 +666,12 @@ public sealed record WorkflowStepFailed(
- [ ] Evaluate step conditions
- [ ] Unit test coverage >=85%
### Documentation
- [ ] Workflow run API endpoints documented
- [ ] Start workflow run endpoint documented
- [ ] Pause/Resume/Cancel endpoints documented
- [ ] Run status response schema included
---
## Test Plan
@@ -717,3 +731,4 @@ public sealed record WorkflowStepFailed(
| Date | Entry |
|------|-------|
| 10-Jan-2026 | Sprint created |
| 11-Jan-2026 | Added documentation deliverable: api/workflows.md (partial - workflow runs) |

@@ -513,10 +513,18 @@ public sealed record PromotionDeployed(
) : IDomainEvent;
```
### Documentation Deliverables
| Deliverable | Type | Description |
|-------------|------|-------------|
| `docs/modules/release-orchestrator/api/promotions.md` (partial) | Markdown | API endpoint documentation for promotion requests (create, list, get, cancel) |
---
## Acceptance Criteria
### Code
- [ ] Create promotion request
- [ ] Validate release is finalized
- [ ] Validate environment order
@@ -528,6 +536,12 @@ public sealed record PromotionDeployed(
- [ ] List pending approvals
- [ ] Unit test coverage >=85%
### Documentation
- [ ] Promotion API endpoints documented
- [ ] Create promotion request documented with full schema
- [ ] List/Get/Cancel promotion endpoints documented
- [ ] Promotion state machine referenced
---
## Test Plan
@@ -583,3 +597,4 @@ public sealed record PromotionDeployed(
| Date | Entry |
|------|-------|
| 10-Jan-2026 | Sprint created |
| 11-Jan-2026 | Added documentation deliverable: api/promotions.md (partial - promotions) |

@@ -560,10 +560,18 @@ public sealed record ApprovalThresholdMet(
) : IDomainEvent;
```
### Documentation Deliverables
| Deliverable | Type | Description |
|-------------|------|-------------|
| `docs/modules/release-orchestrator/api/promotions.md` (partial) | Markdown | API endpoint documentation for approvals (approve, reject, SoD enforcement) |
---
## Acceptance Criteria
### Code
- [ ] Approve promotion with comment
- [ ] Reject promotion with reason
- [ ] Enforce separation of duties
@@ -574,6 +582,13 @@ public sealed record ApprovalThresholdMet(
- [ ] Notify approvers on request
- [ ] Unit test coverage >=85%
### Documentation
- [ ] Approval API endpoints documented
- [ ] Approve promotion endpoint documented (POST /api/v1/promotions/{id}/approve)
- [ ] Reject promotion endpoint documented
- [ ] Separation of duties rules explained
- [ ] Approval record schema included
---
## Test Plan
@@ -630,3 +645,4 @@ public sealed record ApprovalThresholdMet(
| Date | Entry |
|------|-------|
| 10-Jan-2026 | Sprint created |
| 11-Jan-2026 | Added documentation deliverable: api/promotions.md (partial - approvals) |

@@ -871,10 +871,18 @@ public sealed class ContainerLogStreamer
}
```
### Documentation Deliverables
| Deliverable | Type | Description |
|-------------|------|-------------|
| `docs/modules/release-orchestrator/deployment/agent-based.md` (partial) | Markdown | Agent-based deployment documentation (Docker agent with 9 operations) |
---
## Acceptance Criteria
### Code
- [ ] Pull images with digest references
- [ ] Pull from authenticated registries
- [ ] Create containers with environment variables
@@ -888,6 +896,12 @@ public sealed class ContainerLogStreamer
- [ ] Stream container logs
- [ ] Unit test coverage >=85%
### Documentation
- [ ] Docker agent documentation section created
- [ ] All Docker operations documented (pull, run, stop, remove, health check, logs)
- [ ] TypeScript implementation code included
- [ ] Digest verification flow documented
---
## Dependencies
@@ -919,3 +933,4 @@ public sealed class ContainerLogStreamer
| Date | Entry |
|------|-------|
| 10-Jan-2026 | Sprint created |
| 11-Jan-2026 | Added documentation deliverable: deployment/agent-based.md (partial - Docker) |

@@ -910,10 +910,18 @@ public sealed class ComposeHealthCheckTask
}
```
### Documentation Deliverables
| Deliverable | Type | Description |
|-------------|------|-------------|
| `docs/modules/release-orchestrator/deployment/agent-based.md` (partial) | Markdown | Agent-based deployment documentation (Compose agent with 8 operations) |
---
## Acceptance Criteria
### Code
- [ ] Deploy compose stack from compose.stella.lock.yml
- [ ] Pull images before deployment
- [ ] Support authenticated registries
@@ -927,6 +935,12 @@ public sealed class ComposeHealthCheckTask
- [ ] Backup existing deployment before update
- [ ] Unit test coverage >=85%
### Documentation
- [ ] Compose agent documentation section created
- [ ] All Compose operations documented (pull, up, down, scale, health, backup)
- [ ] TypeScript implementation code included
- [ ] Compose lock file usage documented
---
## Dependencies
@@ -959,3 +973,4 @@ public sealed class ComposeHealthCheckTask
| Date | Entry |
|------|-------|
| 10-Jan-2026 | Sprint created |
| 11-Jan-2026 | Added documentation deliverable: deployment/agent-based.md (partial - Compose) |

@@ -748,10 +748,18 @@ public sealed class SshTunnelTask
}
```
### Documentation Deliverables
| Deliverable | Type | Description |
|-------------|------|-------------|
| `docs/modules/release-orchestrator/deployment/agentless.md` (partial) | Markdown | Agentless deployment documentation (SSH remote executor) |
---
## Acceptance Criteria
### Code
- [ ] Execute remote commands via SSH
- [ ] Support password authentication
- [ ] Support private key authentication
@@ -765,6 +773,13 @@ public sealed class SshTunnelTask
- [ ] Timeout handling for commands
- [ ] Unit test coverage >=85%
### Documentation
- [ ] SSH remote executor documentation created
- [ ] All SSH operations documented (execute, upload, download, tunnel)
- [ ] SFTP file transfer flow documented
- [ ] SSH key authentication documented
- [ ] TypeScript implementation included
---
## Dependencies
@@ -795,3 +810,4 @@ public sealed class SshTunnelTask
| Date | Entry |
|------|-------|
| 10-Jan-2026 | Sprint created |
| 11-Jan-2026 | Added documentation deliverable: deployment/agentless.md (partial - SSH) |

@@ -718,10 +718,19 @@ export class DashboardService {
}
```
### Documentation Deliverables
| Deliverable | Type | Description |
|-------------|------|-------------|
| `docs/modules/release-orchestrator/api/websockets.md` | Markdown | WebSocket/SSE endpoint documentation for real-time updates (workflow runs, deployments, dashboard metrics, agent tasks) |
| `docs/modules/release-orchestrator/ui/dashboard.md` | Markdown | Dashboard specification with layout, metrics, TypeScript interfaces |
---
## Acceptance Criteria
### Code
- [ ] Dashboard loads within 2 seconds
- [ ] Pipeline overview shows all environments
- [ ] Environment health status displayed correctly
@@ -735,6 +744,16 @@ export class DashboardService {
- [ ] Responsive layout on tablet/desktop
- [ ] Unit test coverage >=80%
### Documentation
- [ ] WebSocket API documentation file created
- [ ] All 4 real-time streams documented (workflow, deployment, dashboard, agent)
- [ ] WebSocket authentication flow documented
- [ ] Message format schemas included
- [ ] Dashboard specification file created
- [ ] Dashboard layout diagram included
- [ ] Metrics TypeScript interfaces documented
---
## Dependencies
@@ -770,3 +789,4 @@ export class DashboardService {
| Date | Entry |
|------|-------|
| 10-Jan-2026 | Sprint created |
| 11-Jan-2026 | Added documentation deliverables: api/websockets.md, ui/dashboard.md |

@@ -921,10 +921,18 @@ export class EnvironmentSettingsComponent implements OnChanges {
</div>
```
### Documentation Deliverables
| Deliverable | Type | Description |
|-------------|------|-------------|
| `docs/modules/release-orchestrator/ui/screens.md` (partial) | Markdown | Key UI screens reference (environment overview, release detail, "Why Blocked?" modal) |
---
## Acceptance Criteria
### Code
- [ ] Environment list displays all environments
- [ ] Environment cards show health status
- [ ] Create environment dialog works
@@ -940,6 +948,14 @@ export class EnvironmentSettingsComponent implements OnChanges {
- [ ] Form validation works
- [ ] Unit test coverage >=80%
### Documentation
- [ ] UI screens specification file created
- [ ] Environment overview screen documented with ASCII mockup
- [ ] Target management screens documented
- [ ] Freeze window editor documented
- [ ] All screen wireframes included
---
## Dependencies
@@ -974,3 +990,4 @@ export class EnvironmentSettingsComponent implements OnChanges {
| Date | Entry |
|------|-------|
| 10-Jan-2026 | Sprint created |
| 11-Jan-2026 | Added documentation deliverable: ui/screens.md (partial - environment screens) |

@@ -1130,10 +1130,18 @@ export class YamlEditorComponent implements AfterViewInit, OnChanges {
</div>
```
### Documentation Deliverables
| Deliverable | Type | Description |
|-------------|------|-------------|
| `docs/modules/release-orchestrator/ui/workflow-editor.md` | Markdown | Workflow editor specification (graph editor, DAG visualization, Monaco integration) |
---
## Acceptance Criteria
### Code
- [ ] Workflow list displays all workflows
- [ ] Create new workflow initializes empty canvas
- [ ] Load existing workflow displays DAG
@@ -1151,6 +1159,15 @@ export class YamlEditorComponent implements AfterViewInit, OnChanges {
- [ ] Validation errors displayed
- [ ] Unit test coverage >=80%
### Documentation
- [ ] Workflow editor specification file created
- [ ] Graph editor component interface documented
- [ ] DAG visualization documented (D3.js integration)
- [ ] Run visualization overlay documented
- [ ] WebSocket integration for real-time updates documented
- [ ] YAML editor bidirectional sync documented
---
## Dependencies
@@ -1187,3 +1204,4 @@ export class YamlEditorComponent implements AfterViewInit, OnChanges {
| Date | Entry |
|------|-------|
| 10-Jan-2026 | Sprint created |
| 11-Jan-2026 | Added documentation deliverable: ui/workflow-editor.md |

@@ -0,0 +1,274 @@
# Agent APIs
> API endpoints for agent registration, lifecycle management, and task coordination.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 6.3.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Agents Module](../modules/agents.md), [Agent Security](../security/agent-security.md)
## Overview
The Agent API provides endpoints for registering deployment agents, managing their lifecycle, and coordinating task execution. Agents use mTLS for secure communication after initial registration.
---
## Registration Endpoints
### Register Agent
**Endpoint:** `POST /api/v1/agents/register`
Registers a new agent with the orchestrator. Requires a one-time registration token.
**Headers:**
```
X-Agent-Token: {registration-token}
```
**Request:**
```json
{
  "name": "agent-prod-01",
  "version": "1.0.0",
  "capabilities": ["docker", "compose"],
  "labels": {
    "datacenter": "us-east-1",
    "role": "deployment"
  }
}
```
**Response:** `201 Created`
```json
{
  "agentId": "uuid",
  "token": "jwt-token-for-subsequent-requests",
  "config": {
    "heartbeatInterval": 30,
    "taskPollInterval": 5,
    "logLevel": "info"
  },
  "certificate": {
    "cert": "-----BEGIN CERTIFICATE-----...",
    "key": "-----BEGIN PRIVATE KEY-----...",
    "ca": "-----BEGIN CERTIFICATE-----...",
    "expiresAt": "2026-01-11T14:23:45Z"
  }
}
```
**Notes:**
- Registration token is single-use and expires after 24 hours
- After registration, agent must use mTLS for all subsequent requests
- Certificate is short-lived (24h) and must be renewed via heartbeat
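The token rules in the notes above can be sketched as a small check. This is an illustrative sketch only, not the orchestrator's implementation: the `RegistrationToken` shape and the `isTokenUsable` helper are hypothetical names, and treating a token as expired at exactly 24 hours is an assumption.

```typescript
// Hypothetical shape of a stored registration token (not defined by this API doc).
interface RegistrationToken {
  value: string;
  issuedAt: Date;
  usedAt: Date | null; // set once an agent has registered with this token
}

const TOKEN_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours, per the notes above

// A token is usable only if it has never been used and has not yet expired.
function isTokenUsable(token: RegistrationToken, now: Date): boolean {
  if (token.usedAt !== null) return false; // single-use rule
  return now.getTime() - token.issuedAt.getTime() < TOKEN_TTL_MS;
}
```

A registration handler would presumably run a check like this before issuing the certificate, then record `usedAt` so the same token cannot be replayed.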
---
## Lifecycle Endpoints
### List Agents
**Endpoint:** `GET /api/v1/agents`
**Query Parameters:**
- `status` (string): Filter by status (`online`, `offline`, `degraded`)
- `capability` (string): Filter by capability (`docker`, `compose`, `ssh`, `winrm`, `ecs`, `nomad`)
**Response:** `200 OK`
```json
[
  {
    "id": "uuid",
    "name": "agent-prod-01",
    "version": "1.0.0",
    "status": "online",
    "capabilities": ["docker", "compose"],
    "lastHeartbeat": "2026-01-10T14:23:45Z",
    "resourceUsage": {
      "cpu": 15.5,
      "memory": 45.2
    }
  }
]
```
### Get Agent
**Endpoint:** `GET /api/v1/agents/{id}`
**Response:** `200 OK` - Full agent details including assigned targets
### Update Agent
**Endpoint:** `PUT /api/v1/agents/{id}`
**Request:**
```json
{
  "labels": {
    "datacenter": "us-west-2"
  },
  "capabilities": ["docker", "compose", "ssh"]
}
```
**Response:** `200 OK` - Updated agent
### Delete Agent
**Endpoint:** `DELETE /api/v1/agents/{id}`
Revokes agent credentials and removes registration.
**Response:** `200 OK`
```json
{ "deleted": true }
```
---
## Heartbeat Endpoints
### Send Heartbeat
**Endpoint:** `POST /api/v1/agents/{id}/heartbeat`
Agents must send heartbeats at the configured interval to maintain online status and receive pending tasks.
**Request:**
```json
{
  "status": "healthy",
  "resourceUsage": {
    "cpu": 15.5,
    "memory": 45.2,
    "disk": 60.0
  },
  "capabilities": ["docker", "compose"],
  "runningTasks": 2
}
```
**Response:** `200 OK`
```json
{
  "tasks": [
    {
      "taskId": "uuid",
      "taskType": "docker.pull",
      "payload": {
        "image": "myapp",
        "tag": "v2.3.1",
        "digest": "sha256:abc123..."
      },
      "credentials": {
        "registry.username": "user",
        "registry.password": "token"
      },
      "timeout": 300
    }
  ],
  "certificateRenewal": {
    "cert": "-----BEGIN CERTIFICATE-----...",
    "expiresAt": "2026-01-11T14:23:45Z"
  }
}
```
**Notes:**
- Certificate renewal is included when current certificate is within 1 hour of expiration
- Tasks array contains pending work for the agent
- An agent that misses heartbeats for 3 intervals is marked `offline`
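As a rough illustration of the two timing rules in these notes, the following sketch shows one way they might be evaluated. The function names and the exact boundary handling (`<=` and `>=`) are assumptions; the 1-hour renewal window and the 3-interval threshold come from the notes.

```typescript
const RENEWAL_WINDOW_MS = 60 * 60 * 1000; // renew within 1 hour of expiry
const MISSED_INTERVALS_BEFORE_OFFLINE = 3;

// True when the certificate expires within the renewal window (or already has).
function needsCertificateRenewal(expiresAt: Date, now: Date): boolean {
  return expiresAt.getTime() - now.getTime() <= RENEWAL_WINDOW_MS;
}

// True once at least 3 heartbeat intervals have passed without a heartbeat.
function isAgentOffline(lastHeartbeat: Date, heartbeatIntervalSec: number, now: Date): boolean {
  const silenceMs = now.getTime() - lastHeartbeat.getTime();
  return silenceMs >= MISSED_INTERVALS_BEFORE_OFFLINE * heartbeatIntervalSec * 1000;
}
```

With the `heartbeatInterval` of 30 seconds shown in the registration response, this sketch would flag an agent as offline after 90 seconds of silence.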
---
## Task Endpoints
### Complete Task
**Endpoint:** `POST /api/v1/agents/{id}/tasks/{taskId}/complete`
Reports task completion status back to the orchestrator.
**Request:**
```json
{
  "success": true,
  "result": {
    "imageId": "sha256:abc123...",
    "containerId": "container-uuid"
  },
  "logs": [
    { "timestamp": "2026-01-10T14:23:45Z", "level": "info", "message": "Pulling image..." },
    { "timestamp": "2026-01-10T14:23:50Z", "level": "info", "message": "Image pulled successfully" }
  ]
}
```
**Response:** `200 OK`
```json
{ "acknowledged": true }
```
### Get Pending Tasks
**Endpoint:** `GET /api/v1/agents/{id}/tasks`
Alternative to heartbeat for polling pending tasks.
**Response:** `200 OK`
```json
{
  "tasks": [
    {
      "taskId": "uuid",
      "taskType": "docker.run",
      "priority": 10,
      "createdAt": "2026-01-10T14:20:00Z"
    }
  ]
}
```
---
## WebSocket Endpoints
### Task Stream
**Endpoint:** `WS /api/v1/agents/{id}/task-stream`
Real-time task assignment stream for agents.
**Messages (Server to Agent):**
```json
{ "type": "task_assigned", "task": { "taskId": "uuid", "taskType": "docker.pull", ... } }
{ "type": "task_cancelled", "taskId": "uuid" }
```
**Messages (Agent to Server):**
```json
{ "type": "task_progress", "taskId": "uuid", "progress": 50, "message": "Pulling layer 3/5" }
{ "type": "task_log", "taskId": "uuid", "level": "info", "message": "..." }
```
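The message shapes above map naturally onto discriminated unions. The sketch below is illustrative only: the type names, `applyServerMessage`, and `progressMessage` are hypothetical, and the task object is reduced to the two fields shown in the example because the rest of its shape is elided.

```typescript
// Message shapes taken from the examples above; the full task payload is left open.
type ServerMessage =
  | { type: "task_assigned"; task: { taskId: string; taskType: string } }
  | { type: "task_cancelled"; taskId: string };

type AgentMessage =
  | { type: "task_progress"; taskId: string; progress: number; message: string }
  | { type: "task_log"; taskId: string; level: string; message: string };

// Hypothetical dispatcher: returns the task IDs the agent should be working on
// after applying one raw server message.
function applyServerMessage(activeTaskIds: string[], raw: string): string[] {
  const msg = JSON.parse(raw) as ServerMessage;
  switch (msg.type) {
    case "task_assigned":
      return activeTaskIds.concat([msg.task.taskId]);
    case "task_cancelled":
      return activeTaskIds.filter((id) => id !== msg.taskId);
    default:
      return activeTaskIds; // ignore message types this sketch does not know
  }
}

// Builds a progress message in the shape shown above.
function progressMessage(taskId: string, progress: number, message: string): AgentMessage {
  return { type: "task_progress", taskId, progress, message };
}
```

Switching on the `type` field lets the compiler narrow each branch, so a handler cannot read `task` from a `task_cancelled` message by mistake.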
---
## Error Responses
| Status Code | Description |
|-------------|-------------|
| `401` | Invalid or expired registration token |
| `403` | Agent not authorized for this operation |
| `404` | Agent not found |
| `409` | Agent name already registered |
| `503` | Agent offline or unreachable |
---
## See Also
- [Environments API](environments.md)
- [Agents Module](../modules/agents.md)
- [Agent Security](../security/agent-security.md)
- [WebSocket APIs](websockets.md)

@@ -0,0 +1,289 @@
# Environment Management APIs
> API endpoints for managing environments, targets, agents, freeze windows, and inventory.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 6.3.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Environment Manager](../modules/environment-manager.md), [Agents](../modules/agents.md)
## Overview
The Environment Management API provides CRUD operations for environments, target groups, deployment targets, agents, freeze windows, and inventory synchronization. All endpoints require authentication and respect tenant isolation via Row-Level Security.
---
## Environment Endpoints
### Create Environment
**Endpoint:** `POST /api/v1/environments`
**Request:**
```json
{
  "name": "production",
  "displayName": "Production",
  "orderIndex": 3,
  "config": {
    "deploymentTimeout": 600,
    "healthCheckInterval": 30
  },
  "requiredApprovals": 2,
  "requireSod": true,
  "promotionPolicy": "default"
}
```
**Response:** `201 Created`
```json
{
  "id": "uuid",
  "name": "production",
  "displayName": "Production",
  "orderIndex": 3,
  "isProduction": true,
  "requiredApprovals": 2,
  "requireSeparationOfDuties": true,
  "createdAt": "2026-01-10T14:23:45Z"
}
```
### List Environments
**Endpoint:** `GET /api/v1/environments`
**Query Parameters:**
- `includeState` (boolean): Include current release state
**Response:** `200 OK`
```json
[
  {
    "id": "uuid",
    "name": "development",
    "displayName": "Development",
    "orderIndex": 1,
    "currentRelease": {
      "id": "release-uuid",
      "name": "myapp-v2.3.1",
      "deployedAt": "2026-01-09T10:00:00Z"
    }
  }
]
```
### Get Environment
**Endpoint:** `GET /api/v1/environments/{id}`
**Response:** `200 OK` - Full environment details
### Update Environment
**Endpoint:** `PUT /api/v1/environments/{id}`
**Request:** Partial environment object
**Response:** `200 OK` - Updated environment
### Delete Environment
**Endpoint:** `DELETE /api/v1/environments/{id}`
**Response:** `200 OK`
```json
{ "deleted": true }
```
---
## Freeze Window Endpoints
### Create Freeze Window
**Endpoint:** `POST /api/v1/environments/{envId}/freeze-windows`
**Request:**
```json
{
  "start": "2026-01-15T00:00:00Z",
  "end": "2026-01-20T00:00:00Z",
  "reason": "Holiday freeze",
  "exceptions": ["user-uuid-1", "user-uuid-2"]
}
```
**Response:** `201 Created`
```json
{
  "id": "uuid",
  "environmentId": "env-uuid",
  "start": "2026-01-15T00:00:00Z",
  "end": "2026-01-20T00:00:00Z",
  "reason": "Holiday freeze",
  "createdBy": "user-uuid"
}
```
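One way a freeze window like the one above might be evaluated is sketched below. This is not the documented behavior: the helper names are hypothetical, the half-open interval (`start <= now < end`) is an assumption, and so is the reading of `exceptions` as a list of user IDs exempt from the freeze.

```typescript
interface FreezeWindow {
  start: string; // ISO 8601 timestamps, as in the request above
  end: string;
  reason: string;
  exceptions: string[]; // user IDs; assumed here to be exempt from the freeze
}

// Assumption: the window is active for start <= now < end.
function isFreezeActive(w: FreezeWindow, now: Date): boolean {
  const t = now.getTime();
  return t >= Date.parse(w.start) && t < Date.parse(w.end);
}

// A deployment is blocked when the freeze is active and the user is not exempt.
function isBlockedByFreeze(w: FreezeWindow, userId: string, now: Date): boolean {
  return isFreezeActive(w, now) && w.exceptions.indexOf(userId) === -1;
}
```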
### List Freeze Windows
**Endpoint:** `GET /api/v1/environments/{envId}/freeze-windows`
**Query Parameters:**
- `active` (boolean): Filter to active freeze windows only
**Response:** `200 OK` - Array of freeze windows
### Delete Freeze Window
**Endpoint:** `DELETE /api/v1/environments/{envId}/freeze-windows/{windowId}`
**Response:** `200 OK`
```json
{ "deleted": true }
```
---
## Target Group Endpoints
### Create Target Group
**Endpoint:** `POST /api/v1/environments/{envId}/target-groups`
### List Target Groups
**Endpoint:** `GET /api/v1/environments/{envId}/target-groups`
### Get Target Group
**Endpoint:** `GET /api/v1/target-groups/{id}`
### Update Target Group
**Endpoint:** `PUT /api/v1/target-groups/{id}`
### Delete Target Group
**Endpoint:** `DELETE /api/v1/target-groups/{id}`
---
## Target Endpoints
### Create Target
**Endpoint:** `POST /api/v1/targets`
**Request:**
```json
{
  "environmentId": "env-uuid",
  "targetGroupId": "group-uuid",
  "name": "prod-web-01",
  "targetType": "docker_host",
  "connection": {
    "host": "192.168.1.100",
    "port": 2376,
    "tlsEnabled": true
  },
  "labels": {
    "role": "web",
    "datacenter": "us-east-1"
  },
  "deploymentDirectory": "/opt/deployments"
}
```
**Response:** `201 Created`
```json
{
  "id": "uuid",
  "name": "prod-web-01",
  "targetType": "docker_host",
  "healthStatus": "unknown",
  "createdAt": "2026-01-10T14:23:45Z"
}
```
### List Targets
**Endpoint:** `GET /api/v1/targets`
**Query Parameters:**
- `environmentId` (UUID): Filter by environment
- `targetType` (string): Filter by type (`docker_host`, `compose_host`, `ecs_service`, `nomad_job`)
- `labels` (JSON): Filter by labels
- `healthStatus` (string): Filter by health status
**Response:** `200 OK` - Array of targets
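A client might assemble the filters above into a request URL along these lines. This is a sketch under assumptions: `TargetFilter` and `buildTargetsUrl` are hypothetical names, and sending `labels` as a URL-encoded JSON object is one plausible reading of the `(JSON)` parameter type rather than a confirmed wire format.

```typescript
// Filter names mirror the query parameters listed above.
interface TargetFilter {
  environmentId?: string;
  targetType?: "docker_host" | "compose_host" | "ecs_service" | "nomad_job";
  labels?: { [key: string]: string };
  healthStatus?: string;
}

// Assumption: `labels` is serialized as JSON and then URL-encoded.
function buildTargetsUrl(filter: TargetFilter): string {
  const parts: string[] = [];
  if (filter.environmentId) parts.push("environmentId=" + encodeURIComponent(filter.environmentId));
  if (filter.targetType) parts.push("targetType=" + encodeURIComponent(filter.targetType));
  if (filter.labels) parts.push("labels=" + encodeURIComponent(JSON.stringify(filter.labels)));
  if (filter.healthStatus) parts.push("healthStatus=" + encodeURIComponent(filter.healthStatus));
  const base = "/api/v1/targets";
  return parts.length > 0 ? base + "?" + parts.join("&") : base;
}
```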
### Get Target
**Endpoint:** `GET /api/v1/targets/{id}`
### Update Target
**Endpoint:** `PUT /api/v1/targets/{id}`
### Delete Target
**Endpoint:** `DELETE /api/v1/targets/{id}`
### Trigger Health Check
**Endpoint:** `POST /api/v1/targets/{id}/health-check`
**Response:** `200 OK`
```json
{
  "status": "healthy",
  "message": "Docker daemon responding",
  "checkedAt": "2026-01-10T14:23:45Z"
}
```
### Get Version Sticker
**Endpoint:** `GET /api/v1/targets/{id}/sticker`
**Response:** `200 OK`
```json
{
  "releaseId": "uuid",
  "releaseName": "myapp-v2.3.1",
  "components": [
    {
      "componentId": "uuid",
      "componentName": "api",
      "digest": "sha256:abc123..."
    }
  ],
  "deployedAt": "2026-01-09T10:00:00Z",
  "deployedBy": "user-uuid"
}
```
### Check Drift
**Endpoint:** `GET /api/v1/targets/{id}/drift`
**Response:** `200 OK`
```json
{
  "hasDrift": true,
  "expected": { "releaseId": "uuid", "digest": "sha256:abc..." },
  "actual": { "digest": "sha256:def..." },
  "differences": [
    { "component": "api", "expected": "sha256:abc...", "actual": "sha256:def..." }
  ]
}
```
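The `differences` array in the response above could be derived by comparing expected digests (for example, from the version sticker) with the digests observed on the target. The sketch below is a hypothetical illustration: `computeDrift` and its types are not part of the documented API, and reporting a missing component with `actual: null` is an assumption.

```typescript
interface ComponentDigest {
  component: string;
  digest: string;
}

interface DriftDifference {
  component: string;
  expected: string;
  actual: string | null; // null when the component is absent on the target (assumption)
}

// Compares expected digests with those observed on the target, per component.
function computeDrift(expected: ComponentDigest[], actual: ComponentDigest[]) {
  const actualByName: { [name: string]: string } = {};
  for (const a of actual) actualByName[a.component] = a.digest;

  const differences: DriftDifference[] = [];
  for (const e of expected) {
    const present = Object.prototype.hasOwnProperty.call(actualByName, e.component);
    const found = present ? actualByName[e.component] : undefined;
    if (found !== e.digest) {
      differences.push({
        component: e.component,
        expected: e.digest,
        actual: found === undefined ? null : found,
      });
    }
  }
  return { hasDrift: differences.length > 0, differences };
}
```

Comparing digests rather than tags means a re-pushed tag with different content still shows up as drift.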
---
## See Also
- [Agents API](agents.md)
- [Environment Manager Module](../modules/environment-manager.md)
- [Agent Security](../security/agent-security.md)

@@ -0,0 +1,317 @@
# Promotion & Approval APIs
> API endpoints for managing promotions, approvals, and gate evaluations.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 6.3.5](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Promotion Manager Module](../modules/promotion-manager.md), [Workflow Promotion](../workflow/promotion.md)
## Overview
The Promotion API provides endpoints for requesting release promotions between environments, managing approvals, and evaluating promotion gates. Promotions enforce separation of duties (SoD) and require configured approvals before deployment proceeds.
---
## Promotion Endpoints
### Create Promotion Request
**Endpoint:** `POST /api/v1/promotions`
Initiates a promotion request for a release to a target environment.
**Request:**
```json
{
  "releaseId": "uuid",
  "targetEnvironmentId": "uuid",
  "reason": "Deploying v2.3.1 with critical bug fix"
}
```
**Response:** `201 Created`
```json
{
  "id": "uuid",
  "releaseId": "uuid",
  "releaseName": "myapp-v2.3.1",
  "sourceEnvironmentId": "uuid",
  "sourceEnvironmentName": "Staging",
  "targetEnvironmentId": "uuid",
  "targetEnvironmentName": "Production",
  "status": "pending",
  "requestedBy": "user-uuid",
  "requestedAt": "2026-01-10T14:23:45Z",
  "reason": "Deploying v2.3.1 with critical bug fix"
}
```
**Status Flow:**
```
pending -> awaiting_approval -> approved -> deploying -> deployed
-> rejected
-> cancelled
-> failed
-> rolled_back
```
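The status flow can be read as a transition table. The following is a minimal TypeScript sketch, not the orchestrator's implementation. It assumes (the diagram does not state branch origins) that `rejected` branches off `awaiting_approval`, `cancelled` off `pending` or `awaiting_approval`, `failed` off `deploying`, and `rolled_back` off `deployed`. `canTransition` is a hypothetical helper.

```typescript
type PromotionStatus =
  | "pending" | "awaiting_approval" | "approved" | "deploying" | "deployed"
  | "rejected" | "cancelled" | "failed" | "rolled_back";

// Assumed transition table: branch origins are inferred, not specified by the API.
const transitions: Record<PromotionStatus, PromotionStatus[]> = {
  pending: ["awaiting_approval", "cancelled"],
  awaiting_approval: ["approved", "rejected", "cancelled"],
  approved: ["deploying"],
  deploying: ["deployed", "failed"],
  deployed: ["rolled_back"],
  rejected: [],
  cancelled: [],
  failed: [],
  rolled_back: [],
};

// Hypothetical helper: true when `to` is a direct successor of `from`.
function canTransition(from: PromotionStatus, to: PromotionStatus): boolean {
  return transitions[from].indexOf(to) !== -1;
}
```

A client could use such a table to hide actions (approve, cancel) that the current status does not permit.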
### List Promotions
**Endpoint:** `GET /api/v1/promotions`
**Query Parameters:**
- `status` (string): Filter by status
- `releaseId` (UUID): Filter by release
- `environmentId` (UUID): Filter by target environment
- `page` (number): Page number
**Response:** `200 OK`
```json
{
"data": [
{
"id": "uuid",
"releaseName": "myapp-v2.3.1",
"targetEnvironmentName": "Production",
"status": "awaiting_approval",
"requestedAt": "2026-01-10T14:23:45Z"
}
],
"meta": { "page": 1, "totalCount": 25 }
}
```
### Get Promotion
**Endpoint:** `GET /api/v1/promotions/{id}`
**Response:** `200 OK` - Full promotion with decision record and approvals
### Approve Promotion
**Endpoint:** `POST /api/v1/promotions/{id}/approve`
**Request:**
```json
{
"comment": "Approved after reviewing security scan results"
}
```
**Response:** `200 OK`
```json
{
"id": "uuid",
"status": "approved",
"approvalCount": 2,
"requiredApprovals": 2,
"decidedAt": "2026-01-10T14:30:00Z"
}
```
**Notes:**
- Separation of Duties (SoD): The user who requested the promotion cannot approve it if `requireSod` is enabled on the environment
- Multi-party approval: Promotion proceeds when `approvalCount >= requiredApprovals`
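
The two rules above can be sketched as a single decision function. This is an illustrative TypeScript sketch under assumptions: `applyApproval` and `ApprovalContext` are hypothetical names, and only `requireSod`, `approvalCount`, and `requiredApprovals` come from this document.

```typescript
interface ApprovalContext {
  requestedBy: string;       // user who created the promotion
  approverId: string;        // user attempting to approve
  requireSod: boolean;       // environment setting named in the notes
  approvalCount: number;     // approvals already recorded
  requiredApprovals: number; // threshold from the approval policy
}

interface ApprovalOutcome {
  allowed: boolean;      // false when the attempt is rejected (SoD violation)
  approved: boolean;     // true once the threshold is reached
  approvalCount: number; // count after this attempt
}

// Hypothetical helper: evaluates one approval attempt.
function applyApproval(ctx: ApprovalContext): ApprovalOutcome {
  if (ctx.requireSod && ctx.approverId === ctx.requestedBy) {
    // Separation of duties: the requester may not approve their own promotion.
    return { allowed: false, approved: false, approvalCount: ctx.approvalCount };
  }
  const approvalCount = ctx.approvalCount + 1;
  return { allowed: true, approved: approvalCount >= ctx.requiredApprovals, approvalCount };
}
```

In the API, a disallowed attempt corresponds to the `403` row in the error table below.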
### Reject Promotion
**Endpoint:** `POST /api/v1/promotions/{id}/reject`
**Request:**
```json
{
"reason": "Security vulnerabilities not addressed"
}
```
**Response:** `200 OK` - Updated promotion with `status: rejected`
### Cancel Promotion
**Endpoint:** `POST /api/v1/promotions/{id}/cancel`
Cancels a promotion whose status is `pending` or `awaiting_approval`.
**Response:** `200 OK` - Updated promotion with `status: cancelled`
---
## Decision & Evidence Endpoints
### Get Decision Record
**Endpoint:** `GET /api/v1/promotions/{id}/decision`
Returns the full decision record including gate evaluations.
**Response:** `200 OK`
```json
{
"promotionId": "uuid",
"decision": "allow",
"decidedAt": "2026-01-10T14:30:00Z",
"gates": [
{
"gateName": "security-gate",
"passed": true,
"details": {
"criticalCount": 0,
"highCount": 3,
"maxCritical": 0,
"maxHigh": 5
}
},
{
"gateName": "freeze-window-gate",
"passed": true,
"details": {
"activeFreezeWindow": null
}
}
],
"approvals": [
{
"approverId": "uuid",
"approverName": "John Doe",
"decision": "approved",
"comment": "LGTM",
"approvedAt": "2026-01-10T14:28:00Z"
}
]
}
```
### Get Approvals
**Endpoint:** `GET /api/v1/promotions/{id}/approvals`
**Response:** `200 OK` - Array of approval records
### Get Evidence Packet
**Endpoint:** `GET /api/v1/promotions/{id}/evidence`
Returns the signed evidence packet for the promotion decision.
**Response:** `200 OK`
```json
{
"id": "uuid",
"type": "release_decision",
"version": "1.0",
"content": { ... },
"contentHash": "sha256:abc...",
"signature": "base64-signature",
"signatureAlgorithm": "ECDSA-P256-SHA256",
"signerKeyRef": "key-id",
"generatedAt": "2026-01-10T14:30:00Z"
}
```
---
## Gate Preview Endpoints
### Preview Gate Evaluation
**Endpoint:** `POST /api/v1/promotions/preview-gates`
Evaluates gates without creating a promotion (dry run).
**Request:**
```json
{
"releaseId": "uuid",
"targetEnvironmentId": "uuid"
}
```
**Response:** `200 OK`
```json
{
"wouldPass": false,
"gates": [
{
"gateName": "security-gate",
"passed": false,
"blocking": true,
"message": "3 critical vulnerabilities exceed threshold (max: 0)"
},
{
"gateName": "freeze-window-gate",
"passed": true,
"blocking": false,
"message": "No active freeze window"
}
]
}
```
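One plausible way to derive `wouldPass` from the gate list is sketched below in TypeScript. It assumes that a promotion would pass when no blocking gate has failed; the actual aggregation rule is not specified here, and `wouldPass` as a function is hypothetical.

```typescript
interface GateResult {
  gateName: string;
  passed: boolean;
  blocking: boolean;
  message: string;
}

// Assumption: a failed gate only stops the promotion when it is blocking.
function wouldPass(gates: GateResult[]): boolean {
  return gates.every((g) => g.passed || !g.blocking);
}
```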
---
## Approval Policy Endpoints
### Create Approval Policy
**Endpoint:** `POST /api/v1/approval-policies`
**Request:**
```json
{
"name": "production-policy",
"environmentId": "uuid",
"requiredApprovals": 2,
"approverGroups": ["release-managers", "sre-team"],
"requireSeparationOfDuties": true,
"autoExpireHours": 24
}
```
### List Approval Policies
**Endpoint:** `GET /api/v1/approval-policies`
### Get Approval Policy
**Endpoint:** `GET /api/v1/approval-policies/{id}`
### Update Approval Policy
**Endpoint:** `PUT /api/v1/approval-policies/{id}`
### Delete Approval Policy
**Endpoint:** `DELETE /api/v1/approval-policies/{id}`
---
## Current User Endpoints
### Get My Pending Approvals
**Endpoint:** `GET /api/v1/my/pending-approvals`
Returns promotions awaiting approval from the current user.
**Response:** `200 OK` - Array of promotions
---
## Error Responses
| Status Code | Description |
|-------------|-------------|
| `400` | Invalid promotion request |
| `403` | User cannot approve (SoD violation or not in approver list) |
| `404` | Promotion not found |
| `409` | Promotion already decided |
| `422` | Gate evaluation failed |
---
## See Also
- [Workflows API](workflows.md)
- [Releases API](releases.md)
- [Promotion Manager Module](../modules/promotion-manager.md)
- [Security Gates](../modules/promotion-manager.md#security-gate)


@@ -0,0 +1,345 @@
# Release Management APIs
> API endpoints for managing components, versions, and release bundles.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 6.3.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Release Manager Module](../modules/release-manager.md), [Integration Hub](../modules/integration-hub.md)
## Overview
The Release Management API provides endpoints for managing container components, version tracking, and release bundle creation. All releases are identified by immutable OCI digests, ensuring cryptographic verification throughout the deployment pipeline.
> **Design Principle:** Release identity is established via digest, not tag. Tags are human-friendly aliases; digests are the source of truth.
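
As an illustration of the principle, a client can refuse to build an image reference from anything other than a digest. This is a small TypeScript sketch; `imageRefByDigest` is a hypothetical helper, and restricting digests to `sha256` with 64 hex characters is an assumption made for the example.

```typescript
// Builds an immutable image reference (repository@digest).
// Rejects tags and malformed digests so that a tag can never stand in for identity.
function imageRefByDigest(repository: string, digest: string): string {
  if (!/^sha256:[0-9a-f]{64}$/.test(digest)) {
    throw new Error("expected a sha256 digest, got: " + digest);
  }
  return repository + "@" + digest;
}
```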
---
## Component Endpoints
### Create Component
**Endpoint:** `POST /api/v1/components`
Registers a new container component for release management.
**Request:**
```json
{
"name": "api",
"displayName": "API Service",
"imageRepository": "myorg/api",
"registryIntegrationId": "uuid",
"versioningStrategy": "semver",
"defaultChannel": "stable"
}
```
**Response:** `201 Created`
```json
{
"id": "uuid",
"name": "api",
"displayName": "API Service",
"imageRepository": "myorg/api",
"registryIntegrationId": "uuid",
"versioningStrategy": "semver",
"createdAt": "2026-01-10T14:23:45Z"
}
```
### List Components
**Endpoint:** `GET /api/v1/components`
**Response:** `200 OK` - Array of components
### Get Component
**Endpoint:** `GET /api/v1/components/{id}`
### Update Component
**Endpoint:** `PUT /api/v1/components/{id}`
### Delete Component
**Endpoint:** `DELETE /api/v1/components/{id}`
### Sync Versions
**Endpoint:** `POST /api/v1/components/{id}/sync-versions`
Triggers a refresh of available versions from the container registry.
**Request:**
```json
{
"forceRefresh": true
}
```
**Response:** `200 OK`
```json
{
"synced": 15,
"versions": [
{
"tag": "v2.3.1",
"digest": "sha256:abc123...",
"semver": "2.3.1",
"channel": "stable",
"pushedAt": "2026-01-09T10:00:00Z"
}
]
}
```
### List Component Versions
**Endpoint:** `GET /api/v1/components/{id}/versions`
**Query Parameters:**
- `channel` (string): Filter by channel (`stable`, `beta`, `rc`)
- `limit` (number): Maximum versions to return
**Response:** `200 OK` - Array of version maps
---
## Version Map Endpoints
### Create Version Map
**Endpoint:** `POST /api/v1/version-maps`
Manually assigns a semantic version and channel to a tag or digest.
**Request:**
```json
{
"componentId": "uuid",
"tag": "v2.3.1",
"semver": "2.3.1",
"channel": "stable"
}
```
**Response:** `201 Created`
### List Version Maps
**Endpoint:** `GET /api/v1/version-maps`
**Query Parameters:**
- `componentId` (UUID): Filter by component
- `channel` (string): Filter by channel
---
## Release Endpoints
### Create Release
**Endpoint:** `POST /api/v1/releases`
Creates a new release bundle with specified component versions.
**Request:**
```json
{
"name": "myapp-v2.3.1",
"displayName": "My App 2.3.1",
"components": [
{ "componentId": "uuid", "version": "2.3.1" },
{ "componentId": "uuid", "digest": "sha256:def456..." },
{ "componentId": "uuid", "channel": "stable" }
],
"sourceRef": {
"scmIntegrationId": "uuid",
"repository": "myorg/myapp",
"branch": "main",
"commitSha": "abc123"
}
}
```
**Response:** `201 Created`
```json
{
"id": "uuid",
"name": "myapp-v2.3.1",
"displayName": "My App 2.3.1",
"status": "draft",
"components": [
{
"componentId": "uuid",
"componentName": "api",
"version": "2.3.1",
"digest": "sha256:abc123...",
"channel": "stable"
}
],
"createdAt": "2026-01-10T14:23:45Z",
"createdBy": "user-uuid"
}
```
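The request accepts three kinds of component selector (`version`, `digest`, `channel`), and the response always carries a resolved digest. The TypeScript sketch below shows one way such resolution could work. The precedence (explicit digest, then exact semver, then newest version in the channel) is an assumption, and `resolveDigest` is a hypothetical helper.

```typescript
interface VersionMap {
  tag: string;
  digest: string;
  semver: string;
  channel: string;
  pushedAt: string; // ISO-8601 UTC, so lexical order matches time order
}

interface ComponentSelector {
  componentId: string;
  version?: string;
  digest?: string;
  channel?: string;
}

// Resolves one selector to a digest using the known version maps of that component.
function resolveDigest(sel: ComponentSelector, versions: VersionMap[]): string {
  if (sel.digest) return sel.digest; // already pinned
  let candidates: VersionMap[] = [];
  if (sel.version) {
    candidates = versions.filter((v) => v.semver === sel.version);
  } else if (sel.channel) {
    candidates = versions.filter((v) => v.channel === sel.channel);
  }
  // Newest first.
  candidates.sort((a, b) => (a.pushedAt < b.pushedAt ? 1 : a.pushedAt > b.pushedAt ? -1 : 0));
  const newest = candidates[0];
  if (!newest) {
    // Corresponds to the 422 "Cannot resolve component version" error.
    throw new Error("Cannot resolve component version for " + sel.componentId);
  }
  return newest.digest;
}
```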
### Create Release from Latest
**Endpoint:** `POST /api/v1/releases/from-latest`
Convenience endpoint to create a release from the latest versions of all (or specified) components.
**Request:**
```json
{
"name": "myapp-latest",
"channel": "stable",
"componentIds": ["uuid1", "uuid2"],
"pinFrom": {
"environmentId": "uuid"
}
}
```
**Response:** `201 Created` - Release with resolved digests
### List Releases
**Endpoint:** `GET /api/v1/releases`
**Query Parameters:**
- `status` (string): Filter by status (`draft`, `ready`, `promoting`, `deployed`, `deprecated`)
- `componentId` (UUID): Filter by component inclusion
- `page` (number): Page number
- `pageSize` (number): Items per page
**Response:** `200 OK`
```json
{
"data": [
{
"id": "uuid",
"name": "myapp-v2.3.1",
"status": "deployed",
"componentCount": 3,
"createdAt": "2026-01-10T14:23:45Z"
}
],
"meta": {
"page": 1,
"pageSize": 20,
"totalCount": 150,
"totalPages": 8
}
}
```
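The `meta` values in this example are consistent with ceiling division: 150 items at 20 per page gives 7.5, which rounds up to 8 pages. A one-function TypeScript sketch, assuming that is how `totalPages` is derived (`totalPages` as a function is hypothetical):

```typescript
// Number of pages needed to hold totalCount items at pageSize items per page.
function totalPages(totalCount: number, pageSize: number): number {
  if (pageSize <= 0) throw new Error("pageSize must be positive");
  return Math.ceil(totalCount / pageSize);
}
```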
### Get Release
**Endpoint:** `GET /api/v1/releases/{id}`
**Response:** `200 OK` - Full release with component details
### Update Release
**Endpoint:** `PUT /api/v1/releases/{id}`
**Request:**
```json
{
"displayName": "Updated Display Name",
"metadata": { "key": "value" },
"status": "ready"
}
```
### Delete Release
**Endpoint:** `DELETE /api/v1/releases/{id}`
### Get Release State
**Endpoint:** `GET /api/v1/releases/{id}/state`
Returns the deployment state of a release across environments.
**Response:** `200 OK`
```json
{
"environments": [
{
"environmentId": "uuid",
"environmentName": "Development",
"status": "deployed",
"deployedAt": "2026-01-09T10:00:00Z"
},
{
"environmentId": "uuid",
"environmentName": "Staging",
"status": "deployed",
"deployedAt": "2026-01-10T08:00:00Z"
},
{
"environmentId": "uuid",
"environmentName": "Production",
"status": "not_deployed"
}
]
}
```
### Deprecate Release
**Endpoint:** `POST /api/v1/releases/{id}/deprecate`
Marks a release as deprecated, preventing new promotions.
**Response:** `200 OK` - Updated release with `status: deprecated`
### Compare Releases
**Endpoint:** `GET /api/v1/releases/{id}/compare/{otherId}`
Compares two releases to identify component differences.
**Response:** `200 OK`
```json
{
"added": [
{ "componentId": "uuid", "componentName": "worker" }
],
"removed": [
{ "componentId": "uuid", "componentName": "legacy-service" }
],
"changed": [
{
"component": "api",
"fromVersion": "2.3.0",
"toVersion": "2.3.1",
"fromDigest": "sha256:old...",
"toDigest": "sha256:new..."
}
]
}
```
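The comparison can be sketched client-side as a diff keyed by component ID. This TypeScript example is illustrative only: `compareReleases` and `indexById` are hypothetical helpers, and treating a digest difference (rather than a version difference) as the change signal is an assumption that follows the digest-first design principle.

```typescript
interface ReleaseComponent {
  componentId: string;
  componentName: string;
  version: string;
  digest: string;
}

interface ChangedComponent {
  component: string;
  fromVersion: string;
  toVersion: string;
  fromDigest: string;
  toDigest: string;
}

// Index components by ID; Object.create(null) avoids inherited keys.
function indexById(list: ReleaseComponent[]): Record<string, ReleaseComponent | undefined> {
  const index: Record<string, ReleaseComponent | undefined> = Object.create(null);
  for (const c of list) index[c.componentId] = c;
  return index;
}

function compareReleases(from: ReleaseComponent[], to: ReleaseComponent[]) {
  const before = indexById(from);
  const after = indexById(to);
  const changed: ChangedComponent[] = [];
  for (const c of to) {
    const prev = before[c.componentId];
    // Digest, not version, decides whether a component changed.
    if (prev && prev.digest !== c.digest) {
      changed.push({
        component: c.componentName,
        fromVersion: prev.version,
        toVersion: c.version,
        fromDigest: prev.digest,
        toDigest: c.digest,
      });
    }
  }
  return {
    added: to.filter((c) => !before[c.componentId]),
    removed: from.filter((c) => !after[c.componentId]),
    changed,
  };
}
```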
---
## Error Responses
| Status Code | Description |
|-------------|-------------|
| `400` | Invalid release configuration |
| `404` | Release or component not found |
| `409` | Release name already exists |
| `422` | Cannot resolve component version |
---
## See Also
- [Promotions API](promotions.md)
- [Release Manager Module](../modules/release-manager.md)
- [Integration Hub](../modules/integration-hub.md)
- [Design Principles](../design/principles.md)


@@ -0,0 +1,374 @@
# Real-Time APIs (WebSocket/SSE)
> WebSocket and Server-Sent Events endpoints for real-time updates.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 6.4](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Workflow Execution](../workflow/execution.md), [UI Dashboard](../ui/dashboard.md)
## Overview
The Release Orchestrator provides real-time streaming endpoints for workflow runs, deployment progress, agent tasks, and dashboard metrics. These endpoints support both WebSocket connections and Server-Sent Events (SSE) for browser compatibility.
---
## Authentication
All WebSocket and SSE connections require authentication via JWT token:
**WebSocket:** Token in query parameter or first message
```
wss://{host}/api/v1/workflow-runs/{id}/stream?token=jwt-token
```
**SSE:** Token in Authorization header
```
GET /api/v1/dashboard/stream
Authorization: Bearer jwt-token
```
---
## Workflow Run Stream
**Endpoint:** `WS /api/v1/workflow-runs/{id}/stream`
Streams real-time updates for a workflow run including step progress and logs.
### Message Types (Server to Client)
**Step Started:**
```json
{
"type": "step_started",
"nodeId": "security-check",
"stepType": "security-gate",
"timestamp": "2026-01-10T14:23:45Z"
}
```
**Step Progress:**
```json
{
"type": "step_progress",
"nodeId": "deploy",
"progress": 50,
"message": "Deploying to target 3/6"
}
```
**Step Log:**
```json
{
"type": "step_log",
"nodeId": "deploy",
"line": "Pulling image sha256:abc123...",
"level": "info",
"timestamp": "2026-01-10T14:23:50Z"
}
```
**Step Completed:**
```json
{
"type": "step_completed",
"nodeId": "security-check",
"status": "succeeded",
"outputs": {
"criticalCount": 0,
"highCount": 3
},
"duration": 5.2,
"timestamp": "2026-01-10T14:23:50Z"
}
```
**Workflow Completed:**
```json
{
"type": "workflow_completed",
"status": "succeeded",
"duration": 125.5,
"outputs": {
"deploymentId": "uuid"
},
"timestamp": "2026-01-10T14:25:50Z"
}
```
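Because every message carries a `type` field, a client can model the stream as a discriminated union and fold messages into a view of the run. The TypeScript sketch below assumes only the fields shown in the examples above; `RunView` and `applyMessage` are hypothetical names, and setting progress to 100 on completion is a choice made for the example.

```typescript
type WorkflowMessage =
  | { type: "step_started"; nodeId: string; stepType: string; timestamp: string }
  | { type: "step_progress"; nodeId: string; progress: number; message: string }
  | { type: "step_log"; nodeId: string; line: string; level: string; timestamp: string }
  | { type: "step_completed"; nodeId: string; status: string; duration: number; timestamp: string; outputs?: Record<string, unknown> }
  | { type: "workflow_completed"; status: string; duration: number; timestamp: string; outputs?: Record<string, unknown> };

interface StepView { status: string; progress: number; }
interface RunView { steps: Record<string, StepView | undefined>; done: boolean; }

// Folds one server message into the client-side view of the run.
function applyMessage(view: RunView, msg: WorkflowMessage): RunView {
  switch (msg.type) {
    case "step_started":
      view.steps[msg.nodeId] = { status: "running", progress: 0 };
      break;
    case "step_progress": {
      const step = view.steps[msg.nodeId];
      if (step) step.progress = msg.progress;
      break;
    }
    case "step_completed":
      view.steps[msg.nodeId] = { status: msg.status, progress: 100 };
      break;
    case "workflow_completed":
      view.done = true;
      break;
    case "step_log":
      break; // logs are ignored in this sketch
  }
  return view;
}
```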
---
## Deployment Job Stream
**Endpoint:** `WS /api/v1/deployment-jobs/{id}/stream`
Streams real-time updates for deployment job execution.
### Message Types (Server to Client)
**Task Started:**
```json
{
"type": "task_started",
"taskId": "uuid",
"targetId": "uuid",
"targetName": "prod-web-01",
"taskType": "docker.pull",
"timestamp": "2026-01-10T14:23:45Z"
}
```
**Task Progress:**
```json
{
"type": "task_progress",
"taskId": "uuid",
"progress": 75,
"message": "Pulling layer 4/5"
}
```
**Task Log:**
```json
{
"type": "task_log",
"taskId": "uuid",
"line": "Container started successfully",
"level": "info"
}
```
**Task Completed:**
```json
{
"type": "task_completed",
"taskId": "uuid",
"targetId": "uuid",
"status": "succeeded",
"duration": 45.2,
"result": {
"containerId": "abc123",
"digest": "sha256:..."
},
"timestamp": "2026-01-10T14:24:30Z"
}
```
**Job Completed:**
```json
{
"type": "job_completed",
"status": "succeeded",
"targetsDeployed": 4,
"targetsFailed": 0,
"duration": 180.5,
"timestamp": "2026-01-10T14:26:45Z"
}
```
---
## Agent Task Stream
**Endpoint:** `WS /api/v1/agents/{id}/task-stream`
Bidirectional stream for agent task assignment and progress reporting.
### Message Types (Server to Agent)
**Task Assigned:**
```json
{
"type": "task_assigned",
"task": {
"taskId": "uuid",
"taskType": "docker.pull",
"payload": {
"image": "myapp",
"digest": "sha256:abc123..."
},
"credentials": {
"registry.username": "user",
"registry.password": "token"
},
"timeout": 300
}
}
```
**Task Cancelled:**
```json
{
"type": "task_cancelled",
"taskId": "uuid",
"reason": "Deployment cancelled by user"
}
```
### Message Types (Agent to Server)
**Task Progress:**
```json
{
"type": "task_progress",
"taskId": "uuid",
"progress": 50,
"message": "Pulling image layer 3/5"
}
```
**Task Log:**
```json
{
"type": "task_log",
"taskId": "uuid",
"level": "info",
"message": "Image layer downloaded: sha256:def456..."
}
```
**Task Completed:**
```json
{
"type": "task_completed",
"taskId": "uuid",
"success": true,
"result": {
"imageId": "sha256:abc123..."
}
}
```
---
## Dashboard Metrics Stream
**Endpoint:** `WS /api/v1/dashboard/stream`
Streams real-time dashboard metrics and alerts.
### Message Types (Server to Client)
**Metric Update:**
```json
{
"type": "metric_update",
"metrics": {
"pipelineStatus": [
{ "environmentId": "uuid", "name": "Production", "health": "healthy" }
],
"pendingApprovals": 3,
"activeDeployments": 1,
"recentReleases": 12,
"systemHealth": {
"agentsOnline": 8,
"agentsTotal": 10,
"queueDepth": 5
}
},
"timestamp": "2026-01-10T14:23:45Z"
}
```
**Alert:**
```json
{
"type": "alert",
"alert": {
"id": "uuid",
"severity": "warning",
"title": "Deployment Failed",
"message": "Deployment to Production failed: health check timeout",
"resourceType": "deployment",
"resourceId": "uuid",
"timestamp": "2026-01-10T14:23:45Z"
}
}
```
**Promotion Update:**
```json
{
"type": "promotion_update",
"promotion": {
"id": "uuid",
"releaseName": "myapp-v2.3.1",
"targetEnvironment": "Production",
"status": "awaiting_approval",
"requestedBy": "John Doe"
}
}
```
---
## Connection Management
### Reconnection
Clients should implement exponential backoff reconnection:
```javascript
const connect = (retryCount = 0) => {
  const ws = new WebSocket(url);
  ws.onclose = () => {
    const delay = Math.min(1000 * Math.pow(2, retryCount), 30000);
    setTimeout(() => connect(retryCount + 1), delay);
  };
  ws.onopen = () => {
    retryCount = 0;
  };
};
```
### Heartbeat
WebSocket connections receive periodic heartbeat messages:
```json
{
"type": "heartbeat",
"timestamp": "2026-01-10T14:23:45Z"
}
```
Clients should respond with:
```json
{
"type": "pong"
}
```
Connections that do not respond with a pong within 30 seconds are terminated.
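
A client-side handler for this exchange might look like the following TypeScript sketch. `onServerMessage` is a hypothetical helper that takes the raw message text and a `send` callback, so it does not depend on a particular WebSocket library.

```typescript
// Replies to heartbeat messages with a pong.
// Returns true when the message was a heartbeat and has been handled.
function onServerMessage(raw: string, send: (reply: string) => void): boolean {
  const msg = JSON.parse(raw) as { type?: string };
  if (msg.type === "heartbeat") {
    send(JSON.stringify({ type: "pong" }));
    return true;
  }
  return false; // not a heartbeat: leave it to the regular message handlers
}
```

In a browser this could be wired up as `ws.onmessage = (e) => onServerMessage(e.data, (r) => ws.send(r))`.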
---
## Error Messages
```json
{
"type": "error",
"code": "unauthorized",
"message": "Token expired",
"timestamp": "2026-01-10T14:23:45Z"
}
```
| Error Code | Description |
|------------|-------------|
| `unauthorized` | Invalid or expired token |
| `forbidden` | No access to resource |
| `not_found` | Resource not found |
| `rate_limited` | Too many connections |
| `internal_error` | Server error |
---
## See Also
- [Workflows API](workflows.md)
- [Agents API](agents.md)
- [UI Dashboard](../ui/dashboard.md)
- [Workflow Execution](../workflow/execution.md)


@@ -0,0 +1,354 @@
# Workflow APIs
> API endpoints for managing workflow templates, step registry, and workflow runs.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 6.3.4](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Workflow Engine Module](../modules/workflow-engine.md), [Workflow Templates](../workflow/templates.md)
## Overview
The Workflow API provides endpoints for managing workflow templates (DAG definitions), discovering available step types, and executing workflow runs. Workflows are directed acyclic graphs (DAGs) of steps that orchestrate promotions, deployments, and other automation tasks.
---
## Workflow Template Endpoints
### Create Workflow Template
**Endpoint:** `POST /api/v1/workflow-templates`
**Request:**
```json
{
"name": "standard-promotion",
"displayName": "Standard Promotion Workflow",
"description": "Default workflow for promoting releases",
"nodes": [
{
"id": "security-check",
"type": "security-gate",
"name": "Security Check",
"config": {
"maxCritical": 0,
"maxHigh": 5
},
"position": { "x": 100, "y": 100 }
},
{
"id": "approval",
"type": "approval",
"name": "Manager Approval",
"config": {
"approvers": ["manager-group"],
"minApprovals": 1
},
"position": { "x": 300, "y": 100 }
},
{
"id": "deploy",
"type": "deploy",
"name": "Deploy to Target",
"config": {
"strategy": "rolling",
"batchSize": "25%"
},
"position": { "x": 500, "y": 100 }
}
],
"edges": [
{ "from": "security-check", "to": "approval" },
{ "from": "approval", "to": "deploy" }
],
"inputs": [
{ "name": "releaseId", "type": "uuid", "required": true },
{ "name": "environmentId", "type": "uuid", "required": true }
],
"outputs": [
{ "name": "deploymentId", "type": "uuid" }
]
}
```
**Response:** `201 Created`
```json
{
"id": "uuid",
"name": "standard-promotion",
"displayName": "Standard Promotion Workflow",
"version": 1,
"nodeCount": 3,
"isActive": true,
"createdAt": "2026-01-10T14:23:45Z"
}
```
### List Workflow Templates
**Endpoint:** `GET /api/v1/workflow-templates`
**Query Parameters:**
- `includeBuiltin` (boolean): Include system-provided templates
- `tags` (string): Filter by tags
**Response:** `200 OK` - Array of workflow templates
### Get Workflow Template
**Endpoint:** `GET /api/v1/workflow-templates/{id}`
**Response:** `200 OK` - Full template with nodes and edges
### Update Workflow Template
**Endpoint:** `PUT /api/v1/workflow-templates/{id}`
Creates a new version of the template.
**Request:** Partial or full template definition
**Response:** `200 OK` - New version of template
### Delete Workflow Template
**Endpoint:** `DELETE /api/v1/workflow-templates/{id}`
**Response:** `200 OK`
```json
{ "deleted": true }
```
### Validate Workflow Template
**Endpoint:** `POST /api/v1/workflow-templates/{id}/validate`
Validates a template with sample inputs.
**Request:**
```json
{
"inputs": {
"releaseId": "sample-uuid",
"environmentId": "sample-uuid"
}
}
```
**Response:** `200 OK`
```json
{
"valid": true,
"errors": []
}
```
Or on validation failure:
```json
{
"valid": false,
"errors": [
{ "nodeId": "deploy", "field": "config.strategy", "message": "Invalid strategy: unknown" },
{ "type": "dag", "message": "Cycle detected: node-a -> node-b -> node-a" }
]
}
```
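The `dag` error above implies that validation checks the template graph for cycles. How the server performs this check is not specified; the TypeScript sketch below uses Kahn's algorithm as one standard approach. `isAcyclic` is a hypothetical helper and assumes every edge endpoint appears in `nodeIds`.

```typescript
interface Edge { from: string; to: string; }

// Returns true when the graph has no cycle (Kahn's algorithm):
// repeatedly remove nodes with no incoming edges; a cycle leaves nodes behind.
function isAcyclic(nodeIds: string[], edges: Edge[]): boolean {
  const indegree: Record<string, number> = Object.create(null);
  for (const id of nodeIds) indegree[id] = 0;
  for (const e of edges) indegree[e.to] = (indegree[e.to] || 0) + 1;

  const ready = nodeIds.filter((id) => indegree[id] === 0);
  let visited = 0;
  while (ready.length > 0) {
    const id = ready.pop() as string;
    visited++;
    for (const e of edges) {
      if (e.from !== id) continue;
      indegree[e.to] = (indegree[e.to] || 0) - 1;
      if (indegree[e.to] === 0) ready.push(e.to);
    }
  }
  return visited === nodeIds.length;
}
```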
---
## Step Registry Endpoints
### List Step Types
**Endpoint:** `GET /api/v1/step-types`
Lists all available step types from core and plugins.
**Query Parameters:**
- `category` (string): Filter by category (`deployment`, `gate`, `notification`, `utility`)
- `provider` (string): Filter by provider (`builtin`, `plugin-id`)
**Response:** `200 OK`
```json
[
{
"type": "script",
"displayName": "Script",
"description": "Execute shell script on target",
"category": "utility",
"provider": "builtin",
"configSchema": { ... }
},
{
"type": "security-gate",
"displayName": "Security Gate",
"description": "Check vulnerability thresholds",
"category": "gate",
"provider": "builtin",
"configSchema": { ... }
}
]
```
### Get Step Type
**Endpoint:** `GET /api/v1/step-types/{type}`
**Response:** `200 OK` - Full step type with configuration schema
---
## Workflow Run Endpoints
### Start Workflow Run
**Endpoint:** `POST /api/v1/workflow-runs`
**Request:**
```json
{
"templateId": "uuid",
"context": {
"releaseId": "uuid",
"environmentId": "uuid",
"variables": {
"deploymentTimeout": 600
}
}
}
```
**Response:** `201 Created`
```json
{
"id": "uuid",
"templateId": "uuid",
"templateVersion": 1,
"status": "running",
"startedAt": "2026-01-10T14:23:45Z"
}
```
### List Workflow Runs
**Endpoint:** `GET /api/v1/workflow-runs`
**Query Parameters:**
- `status` (string): Filter by status (`pending`, `running`, `succeeded`, `failed`, `cancelled`)
- `templateId` (UUID): Filter by template
- `page` (number): Page number
**Response:** `200 OK`
```json
{
"data": [
{
"id": "uuid",
"templateName": "standard-promotion",
"status": "running",
"progress": 66,
"startedAt": "2026-01-10T14:23:45Z"
}
],
"meta": { "page": 1, "totalCount": 50 }
}
```
### Get Workflow Run
**Endpoint:** `GET /api/v1/workflow-runs/{id}`
**Response:** `200 OK` - Full run with step statuses
### Pause Workflow Run
**Endpoint:** `POST /api/v1/workflow-runs/{id}/pause`
Pauses a running workflow at the next step boundary.
**Response:** `200 OK` - Updated workflow run
### Resume Workflow Run
**Endpoint:** `POST /api/v1/workflow-runs/{id}/resume`
Resumes a paused workflow.
**Response:** `200 OK` - Updated workflow run
### Cancel Workflow Run
**Endpoint:** `POST /api/v1/workflow-runs/{id}/cancel`
Cancels a running or paused workflow.
**Response:** `200 OK` - Updated workflow run
### List Step Runs
**Endpoint:** `GET /api/v1/workflow-runs/{id}/steps`
**Response:** `200 OK`
```json
[
{
"nodeId": "security-check",
"stepType": "security-gate",
"status": "succeeded",
"startedAt": "2026-01-10T14:23:45Z",
"completedAt": "2026-01-10T14:23:50Z"
},
{
"nodeId": "approval",
"stepType": "approval",
"status": "running",
"startedAt": "2026-01-10T14:23:50Z"
}
]
```
### Get Step Run
**Endpoint:** `GET /api/v1/workflow-runs/{id}/steps/{nodeId}`
**Response:** `200 OK` - Step run with logs
### Get Step Logs
**Endpoint:** `GET /api/v1/workflow-runs/{id}/steps/{nodeId}/logs`
**Query Parameters:**
- `follow` (boolean): Stream logs in real time via SSE
**Response:** `200 OK` - Log content or SSE stream
### List Step Artifacts
**Endpoint:** `GET /api/v1/workflow-runs/{id}/steps/{nodeId}/artifacts`
**Response:** `200 OK` - Array of artifacts
### Download Artifact
**Endpoint:** `GET /api/v1/workflow-runs/{id}/steps/{nodeId}/artifacts/{artifactId}`
**Response:** Binary download
---
## Error Responses
| Status Code | Description |
|-------------|-------------|
| `400` | Invalid workflow template |
| `404` | Template or run not found |
| `409` | Workflow already running |
| `422` | DAG validation failed |
---
## See Also
- [WebSocket APIs](websockets.md) - Real-time workflow updates
- [Workflow Engine Module](../modules/workflow-engine.md)
- [Workflow Templates](../workflow/templates.md)
- [Workflow Execution](../workflow/execution.md)


@@ -0,0 +1,224 @@
# Configuration Reference
> Environment variables and OPA policy examples for the Release Orchestrator.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 15.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Security Overview](../security/overview.md), [Promotion Manager](../modules/promotion-manager.md)
**Sprint:** [101_001 Foundation](../../../../implplan/SPRINT_20260110_101_001_DB_schema_core_tables.md)
## Overview
This document provides the configuration reference for the Release Orchestrator, including environment variables and OPA policy examples.
---
## Environment Variables
### Core Configuration
```bash
# Database
STELLA_DATABASE_URL=postgresql://user:pass@host:5432/stella
STELLA_REDIS_URL=redis://host:6379
STELLA_SECRET_KEY=base64-encoded-32-bytes
STELLA_LOG_LEVEL=info
STELLA_LOG_FORMAT=json
```
### Authentication (Authority)
```bash
# OAuth/OIDC
STELLA_OAUTH_ISSUER=https://auth.example.com
STELLA_OAUTH_CLIENT_ID=stella-app
STELLA_OAUTH_CLIENT_SECRET=secret
```
### Agents
```bash
# Agent TLS
STELLA_AGENT_LISTEN_PORT=8443
STELLA_AGENT_TLS_CERT=/path/to/cert.pem
STELLA_AGENT_TLS_KEY=/path/to/key.pem
STELLA_AGENT_CA_CERT=/path/to/ca.pem
```
### Plugins
```bash
# Plugin configuration
STELLA_PLUGIN_DIR=/var/stella/plugins
STELLA_PLUGIN_SANDBOX_MEMORY=512m
STELLA_PLUGIN_SANDBOX_CPU=1
```
### Integrations
```bash
# Vault integration
STELLA_VAULT_ADDR=https://vault.example.com
STELLA_VAULT_TOKEN=hvs.xxx
```
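A service could read these variables at startup along the lines of the TypeScript sketch below. It is illustrative only: `loadConfig` and `CoreConfig` are hypothetical, which variables are mandatory is an assumption, and the fallback values (`info`, `json`, `8443`) are simply the example values shown above.

```typescript
interface CoreConfig {
  databaseUrl: string;
  redisUrl: string;
  logLevel: string;
  logFormat: string;
  agentListenPort: number;
}

// Builds a typed config from an environment map, failing fast on missing values.
function loadConfig(env: Record<string, string | undefined>): CoreConfig {
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error("missing required variable " + name);
    return value;
  };
  const port = parseInt(env["STELLA_AGENT_LISTEN_PORT"] || "8443", 10);
  if (isNaN(port)) throw new Error("STELLA_AGENT_LISTEN_PORT must be a number");
  return {
    databaseUrl: required("STELLA_DATABASE_URL"), // assumed mandatory
    redisUrl: required("STELLA_REDIS_URL"),       // assumed mandatory
    logLevel: env["STELLA_LOG_LEVEL"] || "info",
    logFormat: env["STELLA_LOG_FORMAT"] || "json",
    agentListenPort: port,
  };
}
```

Under Node.js this would typically be called as `loadConfig(process.env)`.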
---
## Full Configuration File
```yaml
# stella-config.yaml
database:
  url: postgresql://user:pass@host:5432/stella
  pool_size: 20
  ssl_mode: require
redis:
  url: redis://host:6379
  prefix: stella
auth:
  issuer: https://auth.example.com
  client_id: stella-app
  client_secret_ref: vault://secrets/oauth-client-secret
agents:
  listen_port: 8443
  tls:
    cert_path: /etc/stella/agent.crt
    key_path: /etc/stella/agent.key
    ca_path: /etc/stella/ca.crt
  heartbeat_interval: 30
  task_timeout: 600
plugins:
  directory: /var/stella/plugins
  sandbox:
    memory: 512m
    cpu: 1
    network: restricted
evidence:
  storage_path: /var/stella/evidence
  signing_key_ref: vault://secrets/evidence-signing-key
  retention_days: 2555 # 7 years
logging:
  level: info
  format: json
  output: stdout
telemetry:
  enabled: true
  otlp_endpoint: otel-collector:4317
  service_name: stella-release-orchestrator
```
---
## OPA Policy Examples
### Security Gate Policy
```rego
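# security_gate.rego
package stella.gates.security

default allow = false

# Allow only when no component produces a denial.
# (Testing components[_] directly in allow would pass as soon as any one component is clean.)
allow {
    count(deny) == 0
}

deny[msg] {
    component := input.release.components[_]
    component.security.reachable_critical > 0
    msg := sprintf("Component %s has %d reachable critical vulnerabilities",
        [component.name, component.security.reachable_critical])
}

deny[msg] {
    component := input.release.components[_]
    component.security.reachable_high > 0
    msg := sprintf("Component %s has %d reachable high vulnerabilities",
        [component.name, component.security.reachable_high])
}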
```
### Approval Gate Policy
```rego
# approval_gate.rego
package stella.gates.approval

# The `in` operator used below requires this import
import future.keywords.in

default allow = false

allow {
    count(input.approvals) >= input.environment.required_approvals
    separation_of_duties_met
}

separation_of_duties_met {
    not input.environment.require_sod
}

separation_of_duties_met {
    input.environment.require_sod
    approver_ids := {a.approver_id | a := input.approvals[_]; a.action == "approved"}
    not input.promotion.requested_by in approver_ids
}
```
### Freeze Window Gate Policy
```rego
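# freeze_window_gate.rego
package stella.gates.freeze

# The `in` operator used below requires this import
import future.keywords.in

default allow = true

# Deny while any freeze window is active, unless the requester is exempt
allow = false {
    window := input.environment.freeze_windows[_]
    time.now_ns() >= time.parse_rfc3339_ns(window.start)
    time.now_ns() <= time.parse_rfc3339_ns(window.end)
    not input.promotion.requested_by in window.exceptions
}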
```
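For client code that wants to warn users before they submit a promotion, the freeze-window rule can be mirrored outside OPA. The TypeScript sketch below follows the policy's input shape (`start`, `end`, `exceptions`); `isBlockedByFreeze` is a hypothetical helper, and the policy itself remains the authority.

```typescript
interface FreezeWindow {
  start: string;        // RFC 3339 timestamp
  end: string;          // RFC 3339 timestamp
  exceptions: string[]; // users exempt from this window
}

// True when `now` falls inside any window for which the requester is not exempt.
function isBlockedByFreeze(windows: FreezeWindow[], requestedBy: string, now: Date): boolean {
  const t = now.getTime();
  return windows.some((w) =>
    t >= Date.parse(w.start) &&
    t <= Date.parse(w.end) &&
    w.exceptions.indexOf(requestedBy) === -1
  );
}
```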
---
## API Error Codes
| Code | HTTP Status | Description |
|------|-------------|-------------|
| `RELEASE_NOT_FOUND` | 404 | Release with specified ID does not exist |
| `ENVIRONMENT_NOT_FOUND` | 404 | Environment with specified ID does not exist |
| `PROMOTION_BLOCKED` | 403 | Promotion blocked by policy gates |
| `APPROVAL_REQUIRED` | 403 | Additional approvals required |
| `FREEZE_WINDOW_ACTIVE` | 403 | Environment is in freeze window |
| `DIGEST_MISMATCH` | 400 | Image digest does not match expected |
| `AGENT_OFFLINE` | 503 | Required agent is offline |
| `WORKFLOW_FAILED` | 500 | Workflow execution failed |
| `PLUGIN_ERROR` | 500 | Plugin returned an error |
| `QUOTA_EXCEEDED` | 429 | Digest analysis quota exceeded |
| `VALIDATION_ERROR` | 400 | Request validation failed |
| `UNAUTHORIZED` | 401 | Authentication required |
| `FORBIDDEN` | 403 | Insufficient permissions |
---
## Default Values
| Setting | Default | Description |
|---------|---------|-------------|
| Agent heartbeat interval | 30s | Frequency of agent heartbeats |
| Task timeout | 600s | Maximum time for agent task |
| Deployment batch size | 25% | Percentage of targets per batch |
| Health check timeout | 60s | Timeout for health checks |
| Evidence retention | 7 years | Audit compliance requirement |
| Max workflow steps | 50 | Maximum steps per workflow |
| Max parallel tasks | 10 | Per-agent concurrent tasks |
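
To show what a percentage batch size means in practice, the TypeScript sketch below splits a target list into batches. It is an illustration under assumptions: `planBatches` is hypothetical, and rounding the batch size up (with a minimum of one target) is a choice made for the example, not documented behavior.

```typescript
// Splits targets into consecutive batches sized as a percentage of the total.
function planBatches(targets: string[], batchSize: string): string[][] {
  if (!/^\d+%$/.test(batchSize)) throw new Error("batchSize must look like '25%'");
  const percent = parseInt(batchSize, 10);
  if (percent <= 0 || percent > 100) throw new Error("batchSize must be between 1% and 100%");
  // Assumption: round up so that a batch always contains at least one target.
  const perBatch = Math.max(1, Math.ceil((targets.length * percent) / 100));
  const batches: string[][] = [];
  for (let i = 0; i < targets.length; i += perBatch) {
    batches.push(targets.slice(i, i + perBatch));
  }
  return batches;
}
```

With six targets and the default of 25%, this yields three batches of two targets each.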
---
## See Also
- [Security Overview](../security/overview.md)
- [Promotion Manager](../modules/promotion-manager.md)
- [Database Schema](../data-model/schema.md)
- [Glossary](glossary.md)


@@ -0,0 +1,403 @@
# Agent-Based Deployment
> Agent-based deployment using Docker and Compose agents for executing tasks on targets.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 10.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Agents Module](../modules/agents.md), [Deploy Orchestrator](../modules/deploy-orchestrator.md)
**Sprints:** [108_002 Docker Agent](../../../../implplan/SPRINT_20260110_108_002_AGENTS_docker.md), [108_003 Compose Agent](../../../../implplan/SPRINT_20260110_108_003_AGENTS_compose.md)
## Overview
Agent-based deployment uses lightweight agents installed on target hosts to execute deployment tasks. Agents communicate with the orchestrator over mTLS and receive tasks through heartbeat polling or WebSocket streams.
---
## Agent Task Protocol
### Task Payload Structure
```typescript
// Task assignment (Core -> Agent)
interface AgentTask {
  id: UUID;
  type: TaskType;
  targetId: UUID;
  payload: TaskPayload;
  credentials: EncryptedCredentials;
  timeout: number;
  priority: TaskPriority;
  idempotencyKey: string;
  assignedAt: DateTime;
  expiresAt: DateTime;
}

type TaskType =
  | "deploy"
  | "rollback"
  | "health-check"
  | "inspect"
  | "execute-command"
  | "upload-files"
  | "write-sticker"
  | "read-sticker";

interface DeployTaskPayload {
  image: string;
  digest: string;
  config: DeployConfig;
  artifacts: ArtifactReference[];
  previousDigest?: string;
  hooks: {
    preDeploy?: HookConfig;
    postDeploy?: HookConfig;
  };
}
```
### Task Result Structure
```typescript
// Task result (Agent -> Core)
interface TaskResult {
  taskId: UUID;
  success: boolean;
  startedAt: DateTime;
  completedAt: DateTime;

  // Success details
  outputs?: Record<string, any>;
  artifacts?: ArtifactReference[];

  // Failure details
  error?: string;
  errorType?: string;
  retriable?: boolean;

  // Logs
  logs: string;

  // Metrics
  metrics: {
    pullDurationMs?: number;
    deployDurationMs?: number;
    healthCheckDurationMs?: number;
  };
}
```
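The `retriable` flag lets the agent tell the orchestrator whether a failure is worth another attempt. A minimal sketch of how the orchestrator side might act on it follows; `TaskOutcome`, `nextAction`, and the `maxAttempts` limit are assumptions made for this example and are not part of the protocol above.

```typescript
// Hypothetical subset of TaskResult; only the fields the decision needs.
interface TaskOutcome {
  success: boolean;
  retriable?: boolean;
  errorType?: string;
}

type NextAction = "complete" | "retry" | "fail";

// attempt is 1-based; maxAttempts is an assumed policy knob.
function nextAction(
  result: TaskOutcome,
  attempt: number,
  maxAttempts = 3
): NextAction {
  if (result.success) return "complete";
  // Retry only when the agent explicitly marked the failure as retriable.
  if (result.retriable === true && attempt < maxAttempts) return "retry";
  return "fail";
}
```

Treating a missing `retriable` as non-retriable is the conservative choice here: a deploy that failed for an unknown reason is not repeated automatically.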
---
## Docker Agent Implementation
The Docker agent deploys single containers to Docker hosts with digest verification.
### Docker Agent Capabilities
- Pull images with digest verification
- Create and start containers
- Stop and remove containers
- Health check monitoring
- Version sticker management
- Rollback to previous container
### Deploy Task Flow
```typescript
class DockerAgent implements TargetExecutor {
private docker: Docker;
async deploy(task: DeployTaskPayload): Promise<DeployResult> {
const { image, digest, config, previousDigest } = task;
const containerName = config.containerName;
// 1. Pull image and verify digest
this.log(`Pulling image ${image}@${digest}`);
await this.docker.pull(image, { digest });
const pulledDigest = await this.getImageDigest(image);
if (pulledDigest !== digest) {
throw new DigestMismatchError(
`Expected digest ${digest}, got ${pulledDigest}. Possible tampering detected.`
);
}
// 2. Run pre-deploy hook
if (task.hooks?.preDeploy) {
await this.runHook(task.hooks.preDeploy, "pre-deploy");
}
// 3. Stop and rename existing container
const existingContainer = await this.findContainer(containerName);
if (existingContainer) {
this.log(`Stopping existing container ${containerName}`);
await existingContainer.stop({ t: 10 });
await existingContainer.rename(`${containerName}-previous-${Date.now()}`);
}
// 4. Create new container
this.log(`Creating container ${containerName} from ${image}@${digest}`);
const container = await this.docker.createContainer({
name: containerName,
Image: `${image}@${digest}`, // Always use digest, not tag
Env: this.buildEnvVars(config.environment),
HostConfig: {
PortBindings: this.buildPortBindings(config.ports),
Binds: this.buildBindMounts(config.volumes),
RestartPolicy: { Name: config.restartPolicy || "unless-stopped" },
Memory: config.memoryLimit,
CpuQuota: config.cpuLimit,
},
Labels: {
"stella.release.id": config.releaseId,
"stella.release.name": config.releaseName,
"stella.digest": digest,
"stella.deployed.at": new Date().toISOString(),
},
});
// 5. Start container
this.log(`Starting container ${containerName}`);
await container.start();
// 6. Wait for container to be healthy (if health check configured)
if (config.healthCheck) {
this.log(`Waiting for container health check`);
const healthy = await this.waitForHealthy(container, config.healthCheck.timeout);
if (!healthy) {
// Rollback to previous container
await this.rollbackContainer(containerName, existingContainer);
throw new HealthCheckFailedError(`Container ${containerName} failed health check`);
}
}
// 7. Run post-deploy hook
if (task.hooks?.postDeploy) {
await this.runHook(task.hooks.postDeploy, "post-deploy");
}
// 8. Cleanup previous container
if (existingContainer && config.cleanupPrevious !== false) {
this.log(`Removing previous container`);
await existingContainer.remove({ force: true });
}
return {
success: true,
containerId: container.id,
previousDigest: previousDigest,
logs: this.getLogs(),
durationMs: this.getDuration(),
};
}
}
```
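Step 6 of the flow above relies on a `waitForHealthy` helper that is not shown. One possible shape is a polling loop with a deadline, sketched below under stated assumptions: the `probe` callback, the millisecond units, and the `intervalMs` default are all hypothetical, and the document does not say which unit `config.healthCheck.timeout` uses.

```typescript
// probe is a hypothetical callback that resolves to true once the container is healthy.
async function waitForHealthy(
  probe: () => Promise<boolean>,
  timeoutMs: number,
  intervalMs = 1000
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await probe()) return true;
    // Give up if the next poll would start after the deadline.
    if (Date.now() + intervalMs > deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The probe always runs at least once, so a very short timeout still reports a container that is already healthy.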
### Rollback Implementation
```typescript
async rollback(task: RollbackTaskPayload): Promise<DeployResult> {
const { containerName, targetDigest } = task;
// Find previous container or use specified digest
if (targetDigest) {
// Deploy specific digest
return this.deploy({
...task,
digest: targetDigest,
});
}
// Find and restore previous container
const previousContainer = await this.findContainer(`${containerName}-previous-*`);
if (!previousContainer) {
throw new RollbackError(`No previous container found for ${containerName}`);
}
// Stop current, rename, start previous
const currentContainer = await this.findContainer(containerName);
if (currentContainer) {
await currentContainer.stop({ t: 10 });
await currentContainer.rename(`${containerName}-failed-${Date.now()}`);
}
await previousContainer.rename(containerName);
await previousContainer.start();
return {
success: true,
containerId: previousContainer.id,
logs: this.getLogs(),
durationMs: this.getDuration(),
};
}
```
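The rollback above looks up the previous container with a wildcard name. Because the deploy flow appends `Date.now()` to the renamed container, several candidates can exist at once. The sketch below shows one way to pick the newest; `latestPrevious` is a hypothetical helper that takes a plain list of container names, and how that list is obtained is left open.

```typescript
// Picks the newest "<name>-previous-<epochMillis>" entry from a list of container names.
function latestPrevious(
  containerName: string,
  names: string[]
): string | undefined {
  const prefix = `${containerName}-previous-`;
  let best: { name: string; ts: number } | undefined;
  for (const name of names) {
    if (!name.startsWith(prefix)) continue;
    const suffix = name.slice(prefix.length);
    // Ignore names whose suffix is not a plain numeric timestamp.
    if (!/^\d+$/.test(suffix)) continue;
    const ts = Number(suffix);
    if (!best || ts > best.ts) best = { name, ts };
  }
  return best?.name;
}
```

Note that the deploy flow removes the previous container by default (`cleanupPrevious`), in which case no candidate remains and rollback has to fall back to `targetDigest`.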
### Version Sticker Management
```typescript
async writeSticker(sticker: VersionSticker): Promise<void> {
  const stickerPath = this.config.stickerPath || "/var/stella/version.json";
  const stickerContent = JSON.stringify(sticker, null, 2);

  // Write to host filesystem or container volume
  if (this.config.stickerLocation === "volume") {
    // Escape single quotes so the JSON survives the single-quoted shell string
    const quoted = stickerContent.replace(/'/g, `'\\''`);

    // Write to shared volume; stickerPath must resolve inside the /var/stella mount.
    // printf '%s' avoids echo's implementation-specific backslash handling.
    await this.docker.run("alpine", [
      "sh", "-c",
      `printf '%s' '${quoted}' > ${stickerPath}`
    ], {
      HostConfig: {
        Binds: [`${this.config.stickerVolume}:/var/stella`]
      }
    });
  } else {
    // Write directly to host
    fs.writeFileSync(stickerPath, stickerContent);
  }
}
```
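Any value interpolated into a `sh -c` string, such as the sticker JSON or a path, needs shell quoting. A minimal sketch of a reusable POSIX quoting helper follows; `shellQuote` and `stickerWriteCommand` are hypothetical names, and the sketch assumes a POSIX `sh` on the receiving side.

```typescript
// POSIX single-quote escaping: close the quote, emit an escaped quote, reopen the quote.
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Builds the command that writes the sticker; both arguments are quoted.
function stickerWriteCommand(stickerJson: string, path: string): string {
  return `printf '%s' ${shellQuote(stickerJson)} > ${shellQuote(path)}`;
}
```

For example, `shellQuote("it's")` yields `'it'\''s'`, which the shell reads back as the original string.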
---
## Compose Agent Implementation
The Compose agent deploys multi-container applications defined in Docker Compose files.
### Compose Agent Capabilities
- Pull images for all services
- Verify digests for all services
- Deploy using compose lock files
- Health check all services
- Rollback to previous deployment
- Version sticker management
### Deploy Task Flow
```typescript
class ComposeAgent implements TargetExecutor {
async deploy(task: DeployTaskPayload): Promise<DeployResult> {
const { artifacts, config } = task;
const deployDir = config.deploymentDirectory;
// 1. Write compose lock file
const composeLock = artifacts.find(a => a.type === "compose_lock");
if (!composeLock) {
throw new Error("Deploy task is missing a compose_lock artifact");
}
const composeContent = await this.fetchArtifact(composeLock);
const composePath = path.join(deployDir, "compose.stella.lock.yml");
await fs.writeFile(composePath, composeContent);
// 2. Write any additional config files
for (const artifact of artifacts.filter(a => a.type === "config")) {
const content = await this.fetchArtifact(artifact);
await fs.writeFile(path.join(deployDir, artifact.name), content);
}
// 3. Run pre-deploy hook
if (task.hooks?.preDeploy) {
await this.runHook(task.hooks.preDeploy, deployDir);
}
// 4. Pull images
this.log("Pulling images...");
const pullResult = await this.runCompose(deployDir, ["pull"]);
if (!pullResult.success) {
throw new Error(`Failed to pull images: ${pullResult.stderr}`);
}
// 5. Verify digests
await this.verifyDigests(composePath, config.expectedDigests);
// 6. Deploy
this.log("Deploying services...");
const upResult = await this.runCompose(deployDir, [
"up", "-d",
"--remove-orphans",
"--force-recreate"
]);
if (!upResult.success) {
throw new Error(`Failed to deploy: ${upResult.stderr}`);
}
// 7. Wait for services to be healthy
if (config.healthCheck) {
this.log("Waiting for services to be healthy...");
const healthy = await this.waitForServicesHealthy(
deployDir,
config.healthCheck.timeout
);
if (!healthy) {
// Rollback
await this.rollbackToBackup(deployDir);
throw new HealthCheckFailedError("Services failed health check");
}
}
// 8. Run post-deploy hook
if (task.hooks?.postDeploy) {
await this.runHook(task.hooks.postDeploy, deployDir);
}
// 9. Write version sticker
await this.writeSticker(config.sticker, deployDir);
return {
success: true,
logs: this.getLogs(),
durationMs: this.getDuration(),
};
}
}
```
### Digest Verification
```typescript
private async verifyDigests(
composePath: string,
expectedDigests: Record<string, string>
): Promise<void> {
const composeContent = yaml.parse(await fs.readFile(composePath, "utf-8"));
for (const [service, expectedDigest] of Object.entries(expectedDigests)) {
const serviceConfig = composeContent.services[service];
if (!serviceConfig) {
throw new Error(`Service ${service} not found in compose file`);
}
const image = serviceConfig.image;
if (!image.includes("@sha256:")) {
throw new Error(`Service ${service} image not pinned to digest: ${image}`);
}
const actualDigest = image.split("@")[1];
if (actualDigest !== expectedDigest) {
throw new DigestMismatchError(
`Service ${service}: expected ${expectedDigest}, got ${actualDigest}`
);
}
}
}
```
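The check above splits the image string on `@` and compares the remainder. A slightly stricter variant could also validate the digest format before comparing. The sketch below assumes SHA-256 digests only; `PinnedImage` and `parsePinnedImage` are hypothetical names. Like the original, it inspects the reference text and does not prove which image was actually pulled.

```typescript
interface PinnedImage {
  repository: string; // everything before "@", including any tag
  digest: string;     // "sha256:<64 hex chars>"
}

function parsePinnedImage(image: string): PinnedImage {
  const at = image.lastIndexOf("@");
  if (at <= 0) {
    throw new Error(`Image not pinned to digest: ${image}`);
  }
  const repository = image.slice(0, at);
  const digest = image.slice(at + 1);
  // Assumption: only lowercase-hex SHA-256 digests are accepted.
  if (!/^sha256:[0-9a-f]{64}$/.test(digest)) {
    throw new Error(`Malformed digest in image reference: ${image}`);
  }
  return { repository, digest };
}
```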
---
## Security Considerations
1. **Digest Verification:** All deployments verify image digests before execution
2. **Credential Encryption:** Credentials are encrypted in transit and at rest
3. **mTLS Communication:** All agent-server communication uses mutual TLS
4. **Hook Sandboxing:** Pre/post-deploy hooks run in isolated environments
5. **Audit Logging:** All deployment actions are logged with actor context
---
## See Also
- [Agents Module](../modules/agents.md)
- [Agent Security](../security/agent-security.md)
- [Deployment Orchestrator](../modules/deploy-orchestrator.md)
- [Agentless Deployment](agentless.md)

@@ -0,0 +1,427 @@
# Agentless Deployment (SSH/WinRM)
> Agentless deployment using SSH and WinRM for remote execution without installing agents.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 10.4](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Agents Module](../modules/agents.md), [Deploy Orchestrator](../modules/deploy-orchestrator.md)
**Sprints:** [108_004 SSH Agent](../../../../implplan/SPRINT_20260110_108_004_AGENTS_ssh.md), [108_005 WinRM Agent](../../../../implplan/SPRINT_20260110_108_005_AGENTS_winrm.md)
## Overview
Agentless deployment enables deployment to targets without requiring a pre-installed agent. The orchestrator connects directly to targets using SSH (Linux/Unix) or WinRM (Windows) to execute deployment commands.
---
## SSH Remote Executor
### Capabilities
- SSH key-based authentication
- File transfer via SFTP
- Remote command execution
- Docker operations over SSH
- Script execution
- Backup and rollback
### Connection Management
```typescript
class SSHRemoteExecutor implements TargetExecutor {
private ssh: SSHClient;
async connect(config: SSHConnectionConfig): Promise<void> {
const privateKey = await this.secrets.getSecret(config.privateKeyRef);
this.ssh = new SSHClient();
await this.ssh.connect({
host: config.host,
port: config.port || 22,
username: config.username,
privateKey: privateKey.value,
readyTimeout: config.connectionTimeout || 30000,
keepaliveInterval: 10000,
});
}
}
```
### Deploy Task Flow
```typescript
async deploy(task: DeployTaskPayload): Promise<DeployResult> {
const { artifacts, config } = task;
const deployDir = config.deploymentDirectory;
try {
// 1. Ensure deployment directory exists
await this.exec(`mkdir -p ${deployDir}`);
await this.exec(`mkdir -p ${deployDir}/.stella-backup`);
// 2. Backup current deployment
await this.exec(`cp -r ${deployDir}/* ${deployDir}/.stella-backup/ 2>/dev/null || true`);
// 3. Upload artifacts
for (const artifact of artifacts) {
const content = await this.fetchArtifact(artifact);
const remotePath = path.join(deployDir, artifact.name);
await this.uploadFile(content, remotePath);
}
// 4. Run pre-deploy hook
if (task.hooks?.preDeploy) {
await this.runRemoteHook(task.hooks.preDeploy, deployDir);
}
// 5. Execute deployment script
const deployScript = artifacts.find(a => a.type === "deploy_script");
if (deployScript) {
const scriptPath = path.join(deployDir, deployScript.name);
await this.exec(`chmod +x ${scriptPath}`);
const result = await this.exec(scriptPath, {
cwd: deployDir,
timeout: config.deploymentTimeout,
env: config.environment,
});
if (result.exitCode !== 0) {
throw new DeploymentError(`Deploy script failed: ${result.stderr}`);
}
}
// 6. Run post-deploy hook
if (task.hooks?.postDeploy) {
await this.runRemoteHook(task.hooks.postDeploy, deployDir);
}
// 7. Health check
if (config.healthCheck) {
const healthy = await this.runHealthCheck(config.healthCheck);
if (!healthy) {
await this.rollback(task);
throw new HealthCheckFailedError("Health check failed");
}
}
// 8. Write version sticker
await this.writeSticker(config.sticker, deployDir);
// 9. Cleanup backup
await this.exec(`rm -rf ${deployDir}/.stella-backup`);
return {
success: true,
logs: this.getLogs(),
durationMs: this.getDuration(),
};
} finally {
this.ssh.end();
}
}
```
### Command Execution
```typescript
private async exec(
  command: string,
  options?: ExecOptions
): Promise<CommandResult> {
  return new Promise((resolve, reject) => {
    const timeout = options?.timeout || 60000;
    let stdout = "";
    let stderr = "";

    // An ssh2-style exec() accepts no cwd option, so change directory inside the command
    const fullCommand = options?.cwd
      ? `cd "${options.cwd}" && ${command}`
      : command;

    // env is forwarded only if the server permits it (for example via AcceptEnv)
    this.ssh.exec(fullCommand, { env: options?.env }, (err, stream) => {
      if (err) {
        reject(err);
        return;
      }

      // Abort the command if it runs longer than the timeout
      const timer = setTimeout(() => {
        stream.close();
        reject(new TimeoutError(`Command timed out after ${timeout}ms`));
      }, timeout);

      stream.on("data", (data: Buffer) => {
        stdout += data.toString();
        this.log(data.toString());
      });

      stream.stderr.on("data", (data: Buffer) => {
        stderr += data.toString();
        this.log(`[stderr] ${data.toString()}`);
      });

      stream.on("close", (code: number) => {
        clearTimeout(timer);
        resolve({ exitCode: code, stdout, stderr });
      });
    });
  });
}
```
### File Upload via SFTP
```typescript
private async uploadFile(content: Buffer | string, remotePath: string): Promise<void> {
return new Promise((resolve, reject) => {
this.ssh.sftp((err, sftp) => {
if (err) {
reject(err);
return;
}
const writeStream = sftp.createWriteStream(remotePath);
writeStream.on("close", () => resolve());
writeStream.on("error", reject);
writeStream.end(content);
});
});
}
```
### Rollback
```typescript
async rollback(task: RollbackTaskPayload): Promise<DeployResult> {
const deployDir = task.config.deploymentDirectory;
// Restore from backup
await this.exec(`rm -rf ${deployDir}/*`);
await this.exec(`cp -r ${deployDir}/.stella-backup/* ${deployDir}/`);
// Re-run deployment from backup
const deployScript = path.join(deployDir, "deploy.sh");
await this.exec(deployScript, { cwd: deployDir });
return {
success: true,
logs: this.getLogs(),
durationMs: this.getDuration(),
};
}
```
---
## WinRM Remote Executor
### Capabilities
- NTLM/Kerberos authentication
- PowerShell script execution
- File transfer via base64 encoding
- Windows container operations
- Windows service management
### Connection Management
```typescript
class WinRMRemoteExecutor implements TargetExecutor {
  private winrm: WinRMClient;

  async connect(config: WinRMConnectionConfig): Promise<void> {
    const credential = await this.secrets.getSecret(config.credentialRef);

    this.winrm = new WinRMClient({
      host: config.host,
      // WinRM listens on 5986 for HTTPS and 5985 for HTTP by default
      port: config.port || (config.useHttps ? 5986 : 5985),
      username: credential.username,
      password: credential.password,
      protocol: config.useHttps ? "https" : "http",
      authentication: config.authType || "ntlm", // ntlm, kerberos, basic
    });

    await this.winrm.openShell();
  }
}
```
### Deploy Task Flow
```typescript
async deploy(task: DeployTaskPayload): Promise<DeployResult> {
const { artifacts, config } = task;
const deployDir = config.deploymentDirectory;
try {
// 1. Ensure deployment directory exists
await this.execPowerShell(`
if (-not (Test-Path "${deployDir}")) {
New-Item -ItemType Directory -Path "${deployDir}" -Force
}
if (-not (Test-Path "${deployDir}\\.stella-backup")) {
New-Item -ItemType Directory -Path "${deployDir}\\.stella-backup" -Force
}
`);
// 2. Backup current deployment
await this.execPowerShell(`
Get-ChildItem "${deployDir}" -Exclude ".stella-backup" |
Copy-Item -Destination "${deployDir}\\.stella-backup" -Recurse -Force
`);
// 3. Upload artifacts
for (const artifact of artifacts) {
const content = await this.fetchArtifact(artifact);
const remotePath = `${deployDir}\\${artifact.name}`;
await this.uploadFile(content, remotePath);
}
// 4. Run pre-deploy hook
if (task.hooks?.preDeploy) {
await this.runRemoteHook(task.hooks.preDeploy, deployDir);
}
// 5. Execute deployment script
const deployScript = artifacts.find(a => a.type === "deploy_script");
if (deployScript) {
const scriptPath = `${deployDir}\\${deployScript.name}`;
const result = await this.execPowerShell(`
Set-Location "${deployDir}"
& "${scriptPath}"
exit $LASTEXITCODE
`, { timeout: config.deploymentTimeout });
if (result.exitCode !== 0) {
throw new DeploymentError(`Deploy script failed: ${result.stderr}`);
}
}
// 6. Run post-deploy hook
if (task.hooks?.postDeploy) {
await this.runRemoteHook(task.hooks.postDeploy, deployDir);
}
// 7. Health check
if (config.healthCheck) {
const healthy = await this.runHealthCheck(config.healthCheck);
if (!healthy) {
await this.rollback(task);
throw new HealthCheckFailedError("Health check failed");
}
}
// 8. Write version sticker
await this.writeSticker(config.sticker, deployDir);
// 9. Cleanup backup
await this.execPowerShell(`
Remove-Item -Path "${deployDir}\\.stella-backup" -Recurse -Force
`);
return {
success: true,
logs: this.getLogs(),
durationMs: this.getDuration(),
};
} finally {
this.winrm.closeShell();
}
}
```
### PowerShell Execution
```typescript
private async execPowerShell(
script: string,
options?: ExecOptions
): Promise<CommandResult> {
const encoded = Buffer.from(script, "utf16le").toString("base64");
return this.winrm.runCommand(
`powershell -EncodedCommand ${encoded}`,
{ timeout: options?.timeout || 60000 }
);
}
```
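`powershell -EncodedCommand` expects the script as Base64 over UTF-16LE bytes, which is why the helper above passes `"utf16le"` to `Buffer.from`. The sketch below isolates that encoding step so it can be checked on its own; `encodePowerShell` and `decodePowerShell` are hypothetical helper names, and Node's `Buffer` is assumed to be available.

```typescript
// Encodes a script the way -EncodedCommand expects: UTF-16LE bytes, then Base64.
function encodePowerShell(script: string): string {
  return Buffer.from(script, "utf16le").toString("base64");
}

// Inverse operation, handy for logging or debugging what was sent.
function decodePowerShell(encoded: string): string {
  return Buffer.from(encoded, "base64").toString("utf16le");
}
```

For instance, `encodePowerShell("dir")` produces `ZABpAHIA`, because each ASCII character becomes two bytes (the character code followed by `0x00`) before Base64 encoding.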
### File Upload
```typescript
private async uploadFile(content: Buffer | string, remotePath: string): Promise<void> {
// Use PowerShell to write file content
const base64Content = Buffer.from(content).toString("base64");
await this.execPowerShell(`
$bytes = [Convert]::FromBase64String("${base64Content}")
[IO.File]::WriteAllBytes("${remotePath}", $bytes)
`);
}
```
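Embedding a whole file in one encoded command only works for small payloads, because the Windows command line has a length limit. One way around this is to send the file in chunks and append each chunk on the remote side. The following is a sketch under stated assumptions: `chunkBase64` and `chunkScript` are hypothetical helpers, the chunk size is left to the caller, and each generated script would be sent through a call such as `execPowerShell`.

```typescript
// Splits content into independently decodable Base64 chunks of at most chunkBytes raw bytes.
function chunkBase64(content: Buffer, chunkBytes: number): string[] {
  if (chunkBytes <= 0) throw new Error("chunkBytes must be positive");
  const chunks: string[] = [];
  for (let offset = 0; offset < content.length; offset += chunkBytes) {
    chunks.push(content.subarray(offset, offset + chunkBytes).toString("base64"));
  }
  return chunks;
}

// Builds the PowerShell that writes one chunk: the first chunk creates the file, later ones append.
function chunkScript(base64Chunk: string, remotePath: string, first: boolean): string {
  const mode = first ? "Create" : "Append";
  return [
    `$bytes = [Convert]::FromBase64String("${base64Chunk}")`,
    `$stream = [IO.File]::Open("${remotePath}", [IO.FileMode]::${mode})`,
    `try { $stream.Write($bytes, 0, $bytes.Length) } finally { $stream.Close() }`,
  ].join("\n");
}
```

Encoding each chunk separately, rather than slicing one long Base64 string, keeps every chunk valid on its own regardless of the chunk size.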
---
## Security Considerations
### SSH Security
1. **Key-Based Authentication:** Always use SSH keys, never passwords
2. **Key Rotation:** Regularly rotate SSH keys
3. **Bastion Hosts:** Use jump hosts for network isolation
4. **Connection Timeouts:** Enforce strict connection timeouts
5. **Known Hosts:** Verify host fingerprints
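Host fingerprint verification (item 5) can be sketched as a comparison between the presented host key and a pinned value. The example below computes an OpenSSH-style `SHA256:` fingerprint with Node's `crypto` module; `sha256Fingerprint` and `hostKeyMatches` are hypothetical helpers, and where the pinned fingerprint is stored and how the client library hands over the host key are left open.

```typescript
import { createHash } from "node:crypto";

// OpenSSH-style fingerprint: "SHA256:" followed by unpadded Base64 of the key's SHA-256 hash.
function sha256Fingerprint(hostKey: Buffer): string {
  const digest = createHash("sha256").update(hostKey).digest("base64");
  return `SHA256:${digest.replace(/=+$/, "")}`;
}

// Returns true only when the presented key hashes to the pinned fingerprint.
function hostKeyMatches(hostKey: Buffer, pinnedFingerprint: string): boolean {
  return sha256Fingerprint(hostKey) === pinnedFingerprint;
}
```

A connection attempt would be rejected whenever `hostKeyMatches` returns `false`, before any credentials are sent.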
### WinRM Security
1. **HTTPS Required:** Always use WinRM over HTTPS in production
2. **Certificate Validation:** Validate server certificates
3. **Kerberos Preferred:** Use Kerberos when available, NTLM as fallback
4. **Credential Protection:** Store credentials in vault
5. **Session Cleanup:** Always close sessions after use
---
## Configuration Examples
### SSH Target Configuration
```yaml
target:
  name: web-server-01
  type: ssh
  connection:
    host: 192.168.1.100
    port: 22
    username: deploy
    privateKeyRef: vault://ssh-keys/deploy-key
  deployment:
    directory: /opt/myapp
  healthCheck:
    command: curl -f http://localhost:8080/health
    timeout: 30
```
### WinRM Target Configuration
```yaml
target:
  name: windows-server-01
  type: winrm
  connection:
    host: 192.168.1.200
    port: 5986
    useHttps: true
    authType: kerberos
    credentialRef: vault://windows-creds/deploy-user
  deployment:
    directory: C:\Apps\MyApp
  healthCheck:
    command: Invoke-WebRequest -Uri http://localhost:8080/health -UseBasicParsing
    timeout: 30
```
---
## See Also
- [Agent-Based Deployment](agent-based.md)
- [Agents Module](../modules/agents.md)
- [Deployment Orchestrator](../modules/deploy-orchestrator.md)
- [Security Overview](../security/overview.md)

@@ -0,0 +1,246 @@
# Alerting Rules
> Prometheus alerting rules for the Release Orchestrator.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 13.4](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Metrics](metrics.md), [Observability Overview](overview.md)
## Overview
The Release Orchestrator provides Prometheus alerting rules for monitoring promotions, deployments, agents, and integrations.
---
## High Priority Alerts
### Security Gate Block Rate
```yaml
- alert: PromotionGateBlockRate
  # sum() on both sides so the ratio spans all result labels;
  # without it, only result="blocked" series match each other and the ratio is always 1
  expr: |
    sum(rate(stella_security_gate_results_total{result="blocked"}[1h])) /
    sum(rate(stella_security_gate_results_total[1h])) > 0.5
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: "High rate of security gate blocks"
    description: "More than 50% of promotions are being blocked by security gates"
```
### Deployment Failure Rate
```yaml
- alert: DeploymentFailureRate
  # sum() on both sides so failed deployments are compared against all deployments
  expr: |
    sum(rate(stella_deployments_total{status="failed"}[1h])) /
    sum(rate(stella_deployments_total[1h])) > 0.1
  for: 10m
  labels:
    severity: critical
  annotations:
    summary: "High deployment failure rate"
    description: "More than 10% of deployments are failing"
```
### Agent Offline
```yaml
- alert: AgentOffline
  expr: |
    stella_agents_status{status="offline"} == 1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Agent offline"
    description: "Agent {{ $labels.agent_id }} has been offline for 5 minutes"
```
### Promotion Stuck
```yaml
- alert: PromotionStuck
  expr: |
    time() - stella_promotion_start_time{status="deploying"} > 1800
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Promotion stuck in deploying state"
    description: "Promotion {{ $labels.promotion_id }} has been deploying for more than 30 minutes"
```
### Integration Unhealthy
```yaml
- alert: IntegrationUnhealthy
  expr: |
    stella_integration_health{status="unhealthy"} == 1
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Integration unhealthy"
    description: "Integration {{ $labels.integration_name }} has been unhealthy for 10 minutes"
```
---
## Medium Priority Alerts
### Workflow Step Timeout
```yaml
- alert: WorkflowStepTimeout
  expr: |
    stella_workflow_step_duration_seconds > 600
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "Workflow step taking too long"
    description: "Step {{ $labels.step_type }} in workflow {{ $labels.workflow_run_id }} has been running for more than 10 minutes"
```
### Evidence Generation Failure
```yaml
- alert: EvidenceGenerationFailure
  expr: |
    rate(stella_evidence_generation_failures_total[1h]) > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Evidence generation failures"
    description: "Evidence generation is failing, affecting audit compliance"
```
### Target Health Degraded
```yaml
- alert: TargetHealthDegraded
  expr: |
    stella_target_health{status!="healthy"} == 1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Target health degraded"
    description: "Target {{ $labels.target_name }} is reporting {{ $labels.status }}"
```
### Approval Timeout
```yaml
- alert: ApprovalTimeout
  expr: |
    time() - stella_promotion_approval_requested_time > 86400
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "Promotion awaiting approval for too long"
    description: "Promotion {{ $labels.promotion_id }} has been waiting for approval for more than 24 hours"
```
---
## Low Priority Alerts
### Database Connection Pool
```yaml
- alert: DatabaseConnectionPoolExhausted
  expr: |
    stella_db_connection_pool_available < 5
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Database connection pool running low"
    description: "Only {{ $value }} database connections available"
```
### Plugin Error Rate
```yaml
- alert: PluginErrorRate
  expr: |
    rate(stella_plugin_errors_total[5m]) > 1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Plugin errors detected"
    description: "Plugin {{ $labels.plugin_id }} is experiencing errors"
```
---
## Alert Routing
### Example AlertManager Configuration
```yaml
# alertmanager.yaml
# Note: Alertmanager does not expand ${VAR} placeholders itself; substitute them
# (for example with envsubst) before loading this file.
route:
  receiver: default
  group_by: [alertname, severity]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - match:
        severity: critical
      receiver: pagerduty
      continue: true
    - match:
        severity: warning
      receiver: slack
receivers:
  - name: default
    webhook_configs:
      - url: http://webhook.example.com/alerts
  - name: pagerduty
    pagerduty_configs:
      - service_key: ${PAGERDUTY_KEY}
        severity: critical
  - name: slack
    slack_configs:
      - channel: '#alerts'
        api_url: ${SLACK_WEBHOOK_URL}
        title: '{{ .CommonAnnotations.summary }}'
        text: '{{ .CommonAnnotations.description }}'
```
---
## Dashboard Integration
### Grafana Alert Panels
Recommended dashboard panels for alerts:
| Panel | Query |
|-------|-------|
| Active Alerts | `count(ALERTS{alertstate="firing"})` |
| Alert History | `count_over_time(ALERTS{alertstate="firing"}[24h])` |
| By Severity | `count(ALERTS{alertstate="firing"}) by (severity)` |
| By Component | `count(ALERTS{alertstate="firing"}) by (alertname)` |
---
## See Also
- [Metrics](metrics.md)
- [Observability Overview](overview.md)
- [Logging](logging.md)
- [Tracing](tracing.md)

@@ -0,0 +1,220 @@
# Logging Specification
> Structured logging format and categories for the Release Orchestrator.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 13.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Observability Overview](overview.md), [Tracing](tracing.md)
## Overview
The Release Orchestrator uses structured JSON logging with consistent format, correlation IDs, and context propagation for all components.
---
## Structured Log Format
### JSON Schema
```json
{
"timestamp": "2026-01-09T14:32:15.123Z",
"level": "info",
"module": "promotion-manager",
"message": "Promotion approved",
"context": {
"tenant_id": "uuid",
"promotion_id": "uuid",
"release_id": "uuid",
"environment": "prod",
"user_id": "uuid"
},
"details": {
"approvals_count": 2,
"gates_passed": ["security", "approval", "freeze"],
"decision": "allow"
},
"trace_id": "abc123",
"span_id": "def456",
"duration_ms": 45
}
```
---
## Log Levels
| Level | Usage |
|-------|-------|
| `error` | Errors requiring attention; failures that impact functionality |
| `warn` | Potential issues; degraded functionality; approaching limits |
| `info` | Significant events; state changes; audit-relevant actions |
| `debug` | Detailed debugging info; request/response bodies |
| `trace` | Very detailed tracing; internal state; performance profiling |
---
## Log Categories
| Category | Examples |
|----------|----------|
| `api` | Request received, response sent, validation errors |
| `promotion` | Promotion requested, approved, rejected, completed |
| `deployment` | Deployment started, task assigned, completed, failed |
| `security` | Gate evaluation, vulnerability found, policy violation |
| `agent` | Agent registered, heartbeat, task execution |
| `workflow` | Workflow started, step executed, completed |
| `integration` | Integration tested, resource discovered, webhook received |
---
## Logging Examples
### API Request
```json
{
"timestamp": "2026-01-09T14:32:15.123Z",
"level": "info",
"module": "api",
"message": "Request completed",
"context": {
"tenant_id": "uuid",
"user_id": "uuid"
},
"details": {
"method": "POST",
"path": "/api/v1/promotions",
"status": 201,
"duration_ms": 125
},
"trace_id": "abc123",
"span_id": "def456"
}
```
### Promotion Event
```json
{
"timestamp": "2026-01-09T14:32:15.123Z",
"level": "info",
"module": "promotion-manager",
"message": "Promotion approved",
"context": {
"tenant_id": "uuid",
"promotion_id": "uuid",
"release_id": "uuid",
"environment": "prod",
"user_id": "uuid"
},
"details": {
"approvals_count": 2,
"gates_passed": ["security", "approval", "freeze"],
"decision": "allow"
},
"trace_id": "abc123",
"span_id": "def456",
"duration_ms": 45
}
```
### Security Gate Failure
```json
{
"timestamp": "2026-01-09T14:32:15.123Z",
"level": "warn",
"module": "security",
"message": "Security gate blocked promotion",
"context": {
"tenant_id": "uuid",
"promotion_id": "uuid",
"release_id": "uuid",
"environment": "prod"
},
"details": {
"gate_name": "security-gate",
"reason": "Critical vulnerability found",
"vulnerabilities": {
"critical": 1,
"high": 3
}
},
"trace_id": "abc123",
"span_id": "def456"
}
```
---
## Sensitive Data Masking
The following fields are automatically masked in logs:
| Field Type | Masking Strategy |
|------------|------------------|
| Passwords | Not logged |
| API Keys | First 4 and last 4 chars only |
| Tokens | Hash only |
| PII | Redacted |
| Credentials | Not logged |
### Example
```json
{
"message": "Authentication succeeded",
"details": {
"api_key": "sk_l...abcd",
"token_hash": "sha256:abc123..."
}
}
```
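The masking table can be read as a small transformation applied to the `details` object before a log entry is written. The sketch below covers two of the rows (API keys and fields that are never logged); `maskApiKey`, `sanitizeDetails`, and the field names `api_key`, `password`, and `credentials` are assumptions chosen for illustration.

```typescript
// Keeps the first 4 and last 4 characters; keys too short to mask safely are fully hidden.
function maskApiKey(key: string): string {
  if (key.length <= 8) return "****";
  return `${key.slice(0, 4)}...${key.slice(-4)}`;
}

// Field names that must never reach the log (assumed names).
const DROPPED_FIELDS = new Set(["password", "credentials"]);

function sanitizeDetails(details: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(details)) {
    if (DROPPED_FIELDS.has(key)) continue; // "Not logged" rows
    out[key] = key === "api_key" && typeof value === "string" ? maskApiKey(value) : value;
  }
  return out;
}
```

With these helpers, `maskApiKey("sk_live_1234abcd")` yields `sk_l...abcd`, the same shape as the example above.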
---
## Correlation IDs
All logs include correlation IDs for request tracing:
| Field | Description |
|-------|-------------|
| `trace_id` | W3C Trace Context trace ID |
| `span_id` | Current operation span ID |
| `correlation_id` | Business-level correlation (optional) |
---
## Log Aggregation
Recommended log aggregation setup:
```ini
# Fluent Bit configuration
[INPUT]
    Name    tail
    Path    /var/log/stella/*.log
    Parser  json

[FILTER]
    Name          nest
    Match         *
    Operation     lift
    Nested_under  context

[OUTPUT]
    Name   opensearch
    Match  *
    Host   opensearch.example.com
    Index  stella-logs
```
---
## See Also
- [Observability Overview](overview.md)
- [Tracing](tracing.md)
- [Alerting](alerting.md)
- [Security Overview](../security/overview.md)

@@ -0,0 +1,222 @@
# Distributed Tracing Specification
> OpenTelemetry-based distributed tracing for the Release Orchestrator.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 13.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Observability Overview](overview.md), [Logging](logging.md)
## Overview
The Release Orchestrator uses OpenTelemetry for distributed tracing, enabling end-to-end visibility of promotion workflows, deployments, and agent tasks.
---
## Trace Context Propagation
### W3C Trace Context
```typescript
// Trace context structure
interface TraceContext {
traceId: string; // 32-char hex
spanId: string; // 16-char hex
parentSpanId?: string;
sampled: boolean;
baggage: Record<string, string>;
}
// Propagation headers
const TRACE_HEADERS = {
W3C_TRACEPARENT: "traceparent",
W3C_TRACESTATE: "tracestate",
BAGGAGE: "baggage",
};
// Example traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```
### Header Format
```
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             ^  ^                                ^                ^
             |  |                                |                |
             |  trace-id (32 hex)                span-id (16 hex) flags
             version
```
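The header layout above can be turned into a small parser. The sketch below handles the four-field form shown here and applies the validity rules from the W3C Trace Context specification (version `ff` is invalid, all-zero IDs are invalid, and the lowest flag bit means "sampled"); `ParsedTraceparent` and `parseTraceparent` are hypothetical names, and future header versions with extra fields are deliberately not handled.

```typescript
interface ParsedTraceparent {
  version: string;
  traceId: string;
  spanId: string;
  sampled: boolean;
}

function parseTraceparent(header: string): ParsedTraceparent | undefined {
  // version (2 hex) - trace-id (32 hex) - span-id (16 hex) - flags (2 hex), lowercase only
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header.trim());
  if (!match) return undefined;
  const [, version, traceId, spanId, flags] = match;
  if (version === "ff") return undefined; // reserved, always invalid
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return undefined; // all-zero IDs are invalid
  return { version, traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}
```

Returning `undefined` for a malformed header lets the caller start a fresh trace instead of propagating a broken context.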
---
## Key Traces
| Operation | Span Name | Attributes |
|-----------|-----------|------------|
| Promotion request | `promotion.request` | promotion_id, release_id, environment |
| Gate evaluation | `promotion.evaluate_gates` | gate_names, result |
| Workflow execution | `workflow.execute` | workflow_run_id, template_name |
| Step execution | `workflow.step.{type}` | step_run_id, node_id, inputs |
| Deployment job | `deployment.execute` | job_id, environment, strategy |
| Agent task | `agent.task.{type}` | task_id, agent_id, target_id |
| Plugin call | `plugin.{method}` | plugin_id, method, duration |
---
## Trace Hierarchy
### Promotion Flow
```
promotion.request (root)
+-- promotion.evaluate_gates
|   +-- gate.security
|   +-- gate.approval
|   +-- gate.freeze_window
|
+-- workflow.execute
|   +-- workflow.step.security-check
|   +-- workflow.step.approval
|   +-- workflow.step.deploy
|       +-- deployment.execute
|           +-- deployment.assign_tasks
|           +-- agent.task.pull
|           +-- agent.task.deploy
|           +-- agent.task.health_check
|
+-- evidence.generate
    +-- evidence.sign
```
---
## Span Attributes
### Common Attributes
| Attribute | Type | Description |
|-----------|------|-------------|
| `tenant.id` | string | Tenant UUID |
| `user.id` | string | User UUID (if authenticated) |
| `release.id` | string | Release UUID |
| `environment.name` | string | Environment name |
| `error` | boolean | Whether error occurred |
| `error.type` | string | Error type/class |
### Promotion Attributes
| Attribute | Type | Description |
|-----------|------|-------------|
| `promotion.id` | string | Promotion UUID |
| `promotion.status` | string | Current status |
| `promotion.gates` | string[] | Gates evaluated |
| `promotion.decision` | string | allow/deny |
### Deployment Attributes
| Attribute | Type | Description |
|-----------|------|-------------|
| `deployment.job_id` | string | Deployment job UUID |
| `deployment.strategy` | string | Deployment strategy |
| `deployment.target_count` | int | Number of targets |
| `deployment.batch_size` | int | Batch size |
### Agent Task Attributes
| Attribute | Type | Description |
|-----------|------|-------------|
| `task.id` | string | Task UUID |
| `task.type` | string | Task type |
| `agent.id` | string | Agent UUID |
| `target.id` | string | Target UUID |
---
## OpenTelemetry Configuration
### SDK Configuration
```yaml
# otel-config.yaml
service:
  name: stella-release-orchestrator
  version: ${VERSION}
exporters:
  otlp:
    endpoint: otel-collector:4317
    protocol: grpc
processors:
  batch:
    timeout: 10s
    send_batch_size: 1024
resource:
  attributes:
    - key: service.namespace
      value: stella-ops
    - key: deployment.environment
      value: ${ENVIRONMENT}
```
### Environment Variables
```bash
OTEL_SERVICE_NAME=stella-release-orchestrator
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1
```
---
## Sampling Strategy
| Environment | Sampling Rate | Reason |
|-------------|---------------|--------|
| Development | 100% | Full visibility |
| Staging | 100% | Full visibility |
| Production | 10% | Cost/performance |
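| Production (errors) | 100% | Always sample errors; needs tail-based sampling (for example in the collector), since a head-based ratio sampler decides before errors are known |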
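
The ratio-based rows above can be pictured with a small sketch. It assumes a simplified head-based decision that maps the first 32 bits of the trace ID onto a threshold; it illustrates the idea behind `parentbased_traceidratio` and is not the OpenTelemetry SDK's exact algorithm. The names `SAMPLING_RATES`, `shouldSampleTrace`, and `sampleForEnvironment` are hypothetical.

```typescript
// Per-environment ratios from the table above (keys are assumed names).
const SAMPLING_RATES: Record<string, number> = {
  development: 1.0,
  staging: 1.0,
  production: 0.1,
};

// Simplified head-based decision: read the first 8 hex digits of the trace ID
// as a 32-bit bucket and keep the trace when the bucket is below the ratio.
function shouldSampleTrace(traceId: string, ratio: number): boolean {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  const bucket = parseInt(traceId.slice(0, 8), 16);
  return bucket < Math.floor(ratio * 0x100000000);
}

// Unknown environments fall back to full sampling (an assumption).
function sampleForEnvironment(traceId: string, environment: string): boolean {
  const ratio = SAMPLING_RATES[environment];
  return shouldSampleTrace(traceId, ratio === undefined ? 1.0 : ratio);
}
```

Because the decision depends only on the trace ID, every service that sees the same trace reaches the same answer without coordination.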
---
## Example Trace
```json
{
"traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
"spans": [
{
"spanId": "00f067aa0ba902b7",
"name": "promotion.request",
"duration_ms": 5234,
"attributes": {
"promotion.id": "promo-123",
"release.id": "rel-456",
"environment.name": "production"
}
},
{
"spanId": "00f067aa0ba902b8",
"parentSpanId": "00f067aa0ba902b7",
"name": "gate.security",
"duration_ms": 234,
"attributes": {
"gate.result": "passed",
"vulnerabilities.critical": 0
}
}
]
}
```
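A flat span list like the one above can be turned back into a hierarchy by following `parentSpanId`. The sketch below is illustrative only: `FlatSpan` and `renderSpanTree` are hypothetical names, and it assumes that a span whose parent is missing from the export should be shown as a root.

```typescript
interface FlatSpan {
  spanId: string;
  parentSpanId?: string;
  name: string;
}

// Render spans as indented lines: roots first, children in input order.
function renderSpanTree(spans: FlatSpan[]): string[] {
  const known: Record<string, boolean> = {};
  for (const span of spans) known[span.spanId] = true;

  const childrenOf: Record<string, FlatSpan[]> = {};
  const roots: FlatSpan[] = [];
  for (const span of spans) {
    const parent = span.parentSpanId;
    if (parent !== undefined && known[parent]) {
      (childrenOf[parent] = childrenOf[parent] || []).push(span);
    } else {
      // No parent, or the parent was not exported: treat as a root.
      roots.push(span);
    }
  }

  const lines: string[] = [];
  const visit = (span: FlatSpan, indent: string): void => {
    lines.push(indent + span.name);
    for (const child of childrenOf[span.spanId] || []) visit(child, indent + "  ");
  };
  for (const root of roots) visit(root, "");
  return lines;
}

const exampleTreeLines = renderSpanTree([
  { spanId: "00f067aa0ba902b7", name: "promotion.request" },
  { spanId: "00f067aa0ba902b8", parentSpanId: "00f067aa0ba902b7", name: "gate.security" },
]);
```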
---
## See Also
- [Observability Overview](overview.md)
- [Logging](logging.md)
- [Metrics](metrics.md)
- [Alerting](alerting.md)

@@ -0,0 +1,266 @@
# A/B Release Models
> Two models for A/B releases: target-group based and router-based traffic splitting.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 11.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Progressive Delivery Module](../modules/progressive-delivery.md), [Traffic Router](routers.md)
**Sprint:** [110_001 A/B Release Manager](../../../../implplan/SPRINT_20260110_110_001_PROGDL_ab_release_manager.md)
## Overview
Stella Ops supports two distinct models for A/B releases:
1. **Target-Group A/B:** Scale different target groups to shift workload
2. **Router-Based A/B:** Use traffic routers to split requests between variations
Each model has different use cases, trade-offs, and implementation requirements.
---
## Model 1: Target-Group A/B
Target-group A/B splits traffic by scaling different groups of targets. Suitable for worker services, background processors, and scenarios where sticky sessions are not required.
### Configuration
```typescript
interface TargetGroupABConfig {
type: "target-group";
// Group definitions
groupA: {
targetGroupId: UUID;
labels?: Record<string, string>;
};
groupB: {
targetGroupId: UUID;
labels?: Record<string, string>;
};
// Rollout by scaling groups
rolloutStrategy: {
type: "scale-groups";
stages: ScaleStage[];
};
}
interface ScaleStage {
name: string;
groupAPercentage: number; // Percentage of group A targets active
groupBPercentage: number; // Percentage of group B targets active
duration?: number; // Auto-advance after duration (seconds)
healthThreshold?: number; // Required health % to advance
requireApproval?: boolean;
}
```
### Example: Worker Service Canary
```typescript
const workerCanaryConfig: TargetGroupABConfig = {
  type: "target-group",
  // targetGroupId is required by TargetGroupABConfig; the values here are placeholders
  groupA: { targetGroupId: "<group-a-uuid>", labels: { "worker-group": "A" } },
  groupB: { targetGroupId: "<group-b-uuid>", labels: { "worker-group": "B" } },
  rolloutStrategy: {
    type: "scale-groups",
    stages: [
      // Stage 1: 100% A, 10% B (canary)
      { name: "canary", groupAPercentage: 100, groupBPercentage: 10,
        duration: 300, healthThreshold: 95 },
      // Stage 2: 100% A, 50% B
      { name: "expand", groupAPercentage: 100, groupBPercentage: 50,
        duration: 600, healthThreshold: 95 },
      // Stage 3: 50% A, 100% B
      { name: "shift", groupAPercentage: 50, groupBPercentage: 100,
        duration: 600, healthThreshold: 95 },
      // Stage 4: 0% A, 100% B (complete)
      { name: "complete", groupAPercentage: 0, groupBPercentage: 100,
        requireApproval: true },
    ],
  },
};
```
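To make the stage percentages concrete, the sketch below converts them into target counts per group. It is an illustration under an assumption that the source does not state: percentages are rounded up, so any non-zero percentage keeps at least one target active. `ScaleStageLike`, `activeTargetCount`, and `planScaleStage` are hypothetical names.

```typescript
interface ScaleStageLike {
  name: string;
  groupAPercentage: number;
  groupBPercentage: number;
}

// Assumed rounding rule: round up, so 10% of 5 targets is 1 target, not 0.
function activeTargetCount(totalTargets: number, percentage: number): number {
  if (percentage < 0 || percentage > 100) {
    throw new RangeError(`percentage must be between 0 and 100, got ${percentage}`);
  }
  return Math.ceil((totalTargets * percentage) / 100);
}

// Number of targets to keep active in each group for one stage.
function planScaleStage(stage: ScaleStageLike, groupASize: number, groupBSize: number) {
  return {
    stage: stage.name,
    activeA: activeTargetCount(groupASize, stage.groupAPercentage),
    activeB: activeTargetCount(groupBSize, stage.groupBPercentage),
  };
}
```

With two groups of 20 targets, the `canary` stage keeps all 20 group A targets and 2 group B targets active.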
### Use Cases
- Background job processors
- Worker services without external traffic
- Infrastructure-level splitting
- Static traffic distribution
- Hardware-based variants
---
## Model 2: Router-Based A/B
Router-based A/B uses traffic routers (Nginx, HAProxy, ALB) to split incoming requests between variations. Suitable for APIs, web services, and scenarios requiring sticky sessions.
### Configuration
```typescript
interface RouterBasedABConfig {
type: "router-based";
// Router integration
routerIntegrationId: UUID;
// Upstream configuration
upstreamName: string;
variationA: {
targets: string[];
serviceName?: string;
};
variationB: {
targets: string[];
serviceName?: string;
};
// Traffic split configuration
trafficSplit: TrafficSplitConfig;
// Rollout strategy
rolloutStrategy: RouterRolloutStrategy;
}
interface TrafficSplitConfig {
type: "weight" | "header" | "cookie" | "tenant" | "composite";
// Weight-based (percentage)
weights?: { A: number; B: number };
// Header-based
headerName?: string;
headerValueA?: string;
headerValueB?: string;
// Cookie-based
cookieName?: string;
cookieValueA?: string;
cookieValueB?: string;
// Tenant-based (by host/path)
tenantRules?: TenantRule[];
}
```
### Rollout Strategy
```typescript
interface RouterRolloutStrategy {
type: "manual" | "time-based" | "health-based" | "composite";
stages: RouterRolloutStage[];
}
interface RouterRolloutStage {
name: string;
trafficPercentageB: number; // % of traffic to variation B
// Advancement criteria
duration?: number; // Auto-advance after duration
healthThreshold?: number; // Required health %
errorRateThreshold?: number; // Max error rate %
latencyThreshold?: number; // Max p99 latency ms
requireApproval?: boolean;
// Optional: specific routing rules for this stage
routingOverrides?: TrafficSplitConfig;
}
```
### Example: API Canary with Health-Based Advancement
```typescript
const apiCanaryConfig: RouterBasedABConfig = {
  type: "router-based",
  routerIntegrationId: "nginx-prod",
  upstreamName: "api-backend",
  // targets is required by RouterBasedABConfig; the addresses here are placeholders
  variationA: { targets: ["<api-v1-host:port>"], serviceName: "api-v1" },
  variationB: { targets: ["<api-v2-host:port>"], serviceName: "api-v2" },
  trafficSplit: { type: "weight", weights: { A: 100, B: 0 } },
  rolloutStrategy: {
    type: "health-based",
    stages: [
      { name: "canary-10", trafficPercentageB: 10,
        duration: 300, healthThreshold: 99, errorRateThreshold: 1 },
      { name: "canary-25", trafficPercentageB: 25,
        duration: 600, healthThreshold: 99, errorRateThreshold: 1 },
      { name: "canary-50", trafficPercentageB: 50,
        duration: 900, healthThreshold: 99, errorRateThreshold: 1 },
      { name: "promote", trafficPercentageB: 100,
        requireApproval: true },
    ],
  },
};
```
### Use Cases
- API services with external traffic
- Web applications with user sessions
- Dynamic traffic distribution
- User-based variants (A/B testing)
- Feature flags and gradual rollouts
---
## Routing Strategies
### Weight-Based Routing
Splits traffic by percentage across variations.
```yaml
trafficSplit:
  type: weight
  weights:
    A: 90
    B: 10
```
### Header-Based Routing
Routes based on request header values.
```yaml
trafficSplit:
  type: header
  headerName: X-Feature-Flag
  headerValueA: "control"
  headerValueB: "experiment"
```
### Cookie-Based Routing
Routes based on cookie values for sticky sessions.
```yaml
trafficSplit:
  type: cookie
  cookieName: ab_variation
  cookieValueA: "A"
  cookieValueB: "B"
```
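The three strategies can be compared side by side with a small resolver. This is a sketch, not the router's implementation: `SplitRule`, `IncomingRequest`, and `pickVariation` are hypothetical names, header keys are assumed to be lower-cased already, and variation A is assumed to be the default when nothing matches.

```typescript
type SplitRule =
  | { type: "weight"; weights: { A: number; B: number } }
  | { type: "header"; headerName: string; headerValueB: string }
  | { type: "cookie"; cookieName: string; cookieValueB: string };

interface IncomingRequest {
  headers: Record<string, string>; // keys assumed lower-case
  cookies: Record<string, string>;
}

// `roll` is a number in [0, 100) supplied by the caller (for example
// Math.random() * 100), which keeps the function deterministic under test.
function pickVariation(rule: SplitRule, req: IncomingRequest, roll: number): "A" | "B" {
  switch (rule.type) {
    case "weight":
      return roll < rule.weights.B ? "B" : "A";
    case "header":
      return req.headers[rule.headerName.toLowerCase()] === rule.headerValueB ? "B" : "A";
    case "cookie":
      return req.cookies[rule.cookieName] === rule.cookieValueB ? "B" : "A";
  }
}
```

Only the weight rule uses `roll`; header and cookie rules are fully determined by the request, which is what makes them suitable for sticky behaviour.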
---
## Comparison Matrix
| Aspect | Target-Group A/B | Router-Based A/B |
|--------|------------------|------------------|
| **Traffic Control** | By scaling targets | By routing rules |
| **Sticky Sessions** | Not supported | Supported |
| **Granularity** | Target-level | Request-level |
| **External Traffic** | Not required | Required |
| **Infrastructure** | Target groups | Traffic router |
| **Use Case** | Workers, batch jobs | APIs, web apps |
| **Rollback Speed** | Slower (scaling) | Immediate (routing) |
---
## See Also
- [Progressive Delivery Module](../modules/progressive-delivery.md)
- [Canary Controller](canary.md)
- [Router Plugins](routers.md)
- [Deployment Strategies](../deployment/strategies.md)

@@ -0,0 +1,270 @@
# Canary Controller
> Automated canary deployment controller with health-based stage advancement and automatic rollback.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 11.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Progressive Delivery Module](../modules/progressive-delivery.md), [Deployment Strategies](../deployment/strategies.md)
**Sprint:** [110_003 Canary Controller](../../../../implplan/SPRINT_20260110_110_003_PROGDL_canary_controller.md)
## Overview
The Canary Controller automates progressive rollout of new versions by gradually shifting traffic, monitoring health metrics, and automatically rolling back if issues are detected.
---
## Canary State Machine
### States
```
CREATED -> DEPLOYING -> EVALUATING -> PROMOTING/ROLLING_BACK -> COMPLETED
```
| State | Description |
|-------|-------------|
| `CREATED` | Canary release defined, not started |
| `DEPLOYING` | Deploying variation B to targets |
| `EVALUATING` | Monitoring health metrics at current stage |
| `PROMOTING` | Advancing to next stage |
| `ROLLING_BACK` | Reverting to variation A |
| `COMPLETED` | Final state (promoted or rolled back) |
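
The states above can be encoded as a transition table. The sketch below is one possible reading of the diagram, not a confirmed design: it assumes that `PROMOTING` returns to `EVALUATING` for the next stage and that a failed deployment may roll back directly. `CanaryState`, `CANARY_TRANSITIONS`, and `canTransition` are hypothetical names.

```typescript
type CanaryState =
  | "CREATED"
  | "DEPLOYING"
  | "EVALUATING"
  | "PROMOTING"
  | "ROLLING_BACK"
  | "COMPLETED";

// Assumed edges; the diagram only shows the main path.
const CANARY_TRANSITIONS: Record<CanaryState, CanaryState[]> = {
  CREATED: ["DEPLOYING"],
  DEPLOYING: ["EVALUATING", "ROLLING_BACK"],
  EVALUATING: ["PROMOTING", "ROLLING_BACK"],
  PROMOTING: ["EVALUATING", "COMPLETED"],
  ROLLING_BACK: ["COMPLETED"],
  COMPLETED: [],
};

function canTransition(from: CanaryState, to: CanaryState): boolean {
  return CANARY_TRANSITIONS[from].indexOf(to) !== -1;
}
```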
---
## Implementation
### Canary Controller Class
```typescript
class CanaryController {
  async executeRollout(abRelease: ABRelease): Promise<void> {
    const strategy = abRelease.rolloutStrategy;
    for (let i = 0; i < strategy.stages.length; i++) {
      const stage = strategy.stages[i];
      const stageRecord = await this.startStage(abRelease, stage, i);
      try {
        // 1. Apply traffic configuration for this stage
        await this.applyStageTraffic(abRelease, stage);
        this.emit("canary.stage_started", { abRelease, stage, stageNumber: i });
        // 2. Wait for stage completion based on criteria
        const result = await this.waitForStageCompletion(abRelease, stage);
        if (!result.success) {
          // Health check failed - rollback
          this.log(`Stage ${stage.name} failed health check: ${result.reason}`);
          await this.rollback(abRelease, result.reason);
          return;
        }
        // 3. Check if approval required
        if (stage.requireApproval) {
          this.log(`Stage ${stage.name} requires approval`);
          await this.pauseForApproval(abRelease, stage);
          // Wait for approval
          const approval = await this.waitForApproval(abRelease, stage);
          if (!approval.approved) {
            await this.rollback(abRelease, "Approval denied");
            return;
          }
        }
        await this.completeStage(stageRecord, "succeeded");
        this.emit("canary.stage_completed", { abRelease, stage, stageNumber: i });
      } catch (error) {
        // `error` is `unknown` under strict TypeScript, so narrow before reading `.message`
        const message = error instanceof Error ? error.message : String(error);
        await this.completeStage(stageRecord, "failed", message);
        await this.rollback(abRelease, message);
        return;
      }
    }
    // Rollout complete
    await this.completeRollout(abRelease);
    this.emit("canary.promoted", { abRelease });
  }
}
```
### Stage Completion Logic
```typescript
private async waitForStageCompletion(
  abRelease: ABRelease,
  stage: RolloutStage
): Promise<StageCompletionResult> {
  const startTime = Date.now();
  const checkInterval = 30000; // 30 seconds
  while (true) {
    // Check health metrics
    const health = await this.checkHealth(abRelease, stage);
    if (!health.healthy) {
      return {
        success: false,
        reason: `Health check failed: ${health.reason}`
      };
    }
    // Check error rate (if threshold configured)
    if (stage.errorRateThreshold !== undefined) {
      const errorRate = await this.getErrorRate(abRelease);
      if (errorRate > stage.errorRateThreshold) {
        return {
          success: false,
          reason: `Error rate ${errorRate}% exceeds threshold ${stage.errorRateThreshold}%`
        };
      }
    }
    // Check latency (if threshold configured)
    if (stage.latencyThreshold !== undefined) {
      const latency = await this.getP99Latency(abRelease);
      if (latency > stage.latencyThreshold) {
        return {
          success: false,
          reason: `P99 latency ${latency}ms exceeds threshold ${stage.latencyThreshold}ms`
        };
      }
    }
    // No duration configured (for example an approval-only stage): there is
    // nothing to wait for, so a single passing check completes the stage.
    // Without this, such a stage would loop forever and never reach approval.
    if (stage.duration === undefined) {
      return { success: true };
    }
    // Check duration (auto-advance)
    const elapsed = (Date.now() - startTime) / 1000;
    if (elapsed >= stage.duration) {
      return { success: true };
    }
    // Wait before next check
    await new Promise((resolve) => setTimeout(resolve, checkInterval));
  }
}
```
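The per-iteration decision in the loop above can also be expressed as a pure function, which is easier to unit test than a polling loop. This is a sketch under assumptions: metrics arrive as a single sample object, and a stage without a `duration` advances on the first passing sample. `StageCriteria`, `StageSample`, `StageVerdict`, and `evaluateStageSample` are hypothetical names.

```typescript
interface StageCriteria {
  duration?: number;           // seconds
  healthThreshold?: number;    // minimum health %
  errorRateThreshold?: number; // maximum error rate %
  latencyThreshold?: number;   // maximum p99 latency in ms
}

interface StageSample {
  healthPercent: number;
  errorRatePercent: number;
  p99LatencyMs: number;
  elapsedSeconds: number;
}

type StageVerdict =
  | { kind: "fail"; reason: string }
  | { kind: "advance" }
  | { kind: "wait" };

function evaluateStageSample(criteria: StageCriteria, sample: StageSample): StageVerdict {
  if (criteria.healthThreshold !== undefined && sample.healthPercent < criteria.healthThreshold) {
    return { kind: "fail", reason: `health ${sample.healthPercent}% below ${criteria.healthThreshold}%` };
  }
  if (criteria.errorRateThreshold !== undefined && sample.errorRatePercent > criteria.errorRateThreshold) {
    return { kind: "fail", reason: `error rate ${sample.errorRatePercent}% above ${criteria.errorRateThreshold}%` };
  }
  if (criteria.latencyThreshold !== undefined && sample.p99LatencyMs > criteria.latencyThreshold) {
    return { kind: "fail", reason: `p99 ${sample.p99LatencyMs}ms above ${criteria.latencyThreshold}ms` };
  }
  // Assumption: no duration means there is nothing to wait for.
  if (criteria.duration === undefined || sample.elapsedSeconds >= criteria.duration) {
    return { kind: "advance" };
  }
  return { kind: "wait" };
}
```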
### Traffic Application
```typescript
private async applyStageTraffic(abRelease: ABRelease, stage: RolloutStage): Promise<void> {
if (abRelease.config.type === "router-based") {
const router = await this.getRouterConnector(abRelease.config.routerIntegrationId);
await router.shiftTraffic(
abRelease.config.variationA.serviceName,
abRelease.config.variationB.serviceName,
stage.trafficPercentageB
);
} else if (abRelease.config.type === "target-group") {
// Scale target groups
await this.scaleTargetGroup(
abRelease.config.groupA,
stage.groupAPercentage
);
await this.scaleTargetGroup(
abRelease.config.groupB,
stage.groupBPercentage
);
}
}
```
### Rollback
```typescript
async rollback(abRelease: ABRelease, reason: string): Promise<void> {
this.log(`Rolling back A/B release: ${reason}`);
this.emit("canary.rollback_started", { abRelease, reason });
if (abRelease.config.type === "router-based") {
// Shift all traffic back to A
const router = await this.getRouterConnector(abRelease.config.routerIntegrationId);
await router.shiftTraffic(
abRelease.config.variationB.serviceName,
abRelease.config.variationA.serviceName,
100
);
} else if (abRelease.config.type === "target-group") {
// Scale B to 0, A to 100
await this.scaleTargetGroup(abRelease.config.groupA, 100);
await this.scaleTargetGroup(abRelease.config.groupB, 0);
}
abRelease.status = "rolled_back";
await this.save(abRelease);
this.emit("canary.rolled_back", { abRelease, reason });
}
```
---
## Configuration
### Canary Stages
```yaml
rolloutStrategy:
  type: health-based
  stages:
    - name: canary-5
      trafficPercentageB: 5
      duration: 300 # 5 minutes
      healthThreshold: 99
      errorRateThreshold: 0.5
    - name: canary-25
      trafficPercentageB: 25
      duration: 600 # 10 minutes
      healthThreshold: 99
      errorRateThreshold: 1.0
    - name: canary-50
      trafficPercentageB: 50
      duration: 900 # 15 minutes
      healthThreshold: 99
      errorRateThreshold: 1.0
    - name: promote
      trafficPercentageB: 100
      requireApproval: true
```
### Health Metrics
| Metric | Description | Typical Threshold |
|--------|-------------|-------------------|
| Success Rate | % of successful requests | > 99% |
| Error Rate | % of failed requests | < 1% |
| P99 Latency | 99th percentile response time | < 500ms |
| Health Check | Container/service health | Healthy |
---
## Events
The canary controller emits events for observability:
| Event | Description |
|-------|-------------|
| `canary.stage_started` | Stage execution began |
| `canary.stage_completed` | Stage completed successfully |
| `canary.rollback_started` | Rollback initiated |
| `canary.rolled_back` | Rollback completed |
| `canary.promoted` | Full promotion completed |
---
## See Also
- [Progressive Delivery Module](../modules/progressive-delivery.md)
- [A/B Release Models](ab-releases.md)
- [Router Plugins](routers.md)
- [Metrics](../operations/metrics.md)

@@ -0,0 +1,348 @@
# Router Plugins
> Traffic router plugins for progressive delivery (Nginx, AWS ALB, and custom implementations).
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 11.4](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Progressive Delivery Module](../modules/progressive-delivery.md), [Plugin System](../modules/plugin-system.md)
**Sprint:** [110_004 Router Plugins](../../../../implplan/SPRINT_20260110_110_004_PROGDL_nginx_router.md)
## Overview
Router plugins enable traffic shifting for progressive delivery. The orchestrator ships with an Nginx router plugin for v1, with HAProxy, Traefik, and AWS ALB available as additional plugins.
---
## Router Plugin Interface
All router plugins implement the `TrafficRouterPlugin` interface:
```typescript
interface TrafficRouterPlugin {
// Configuration
configureRoute(config: RouteConfig): Promise<void>;
// Traffic operations
shiftTraffic(from: string, to: string, percentage: number): Promise<void>;
getTrafficDistribution(): Promise<TrafficDistribution>;
// Health
validateConfig(): Promise<ValidationResult>;
reload(): Promise<void>;
}
interface RouteConfig {
upstream: string;
serverName: string;
variations: Variation[];
splitType: "weight" | "header" | "cookie";
headerName?: string;
headerValueB?: string;
stickySession?: boolean;
stickyDuration?: number;
}
interface Variation {
name: string;
targets: string[];
weight: number;
}
interface TrafficDistribution {
variations: {
name: string;
percentage: number;
targets: string[];
}[];
}
```
---
## Nginx Router Plugin (v1 Built-in)
The Nginx plugin generates and manages Nginx configuration for traffic splitting.
### Implementation
```typescript
class NginxRouterPlugin implements TrafficRouterPlugin {
async configureRoute(config: RouteConfig): Promise<void> {
const upstreamConfig = this.generateUpstreamConfig(config);
const serverConfig = this.generateServerConfig(config);
// Write configuration files
await this.writeConfig(
`/etc/nginx/conf.d/upstream-${config.upstream}.conf`,
upstreamConfig
);
await this.writeConfig(
`/etc/nginx/conf.d/server-${config.upstream}.conf`,
serverConfig
);
// Validate configuration
const validation = await this.validateConfig();
if (!validation.valid) {
throw new Error(`Nginx config validation failed: ${validation.error}`);
}
// Reload nginx
await this.reload();
}
}
```
### Upstream Configuration
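```typescript
private generateUpstreamConfig(config: RouteConfig): string {
  const lines: string[] = [];
  // One named upstream per variation (used by header/cookie routing)
  for (const variation of config.variations) {
    lines.push(`upstream ${config.upstream}_${variation.name} {`);
    for (const target of variation.targets) {
      lines.push(`    server ${target};`);
    }
    lines.push(`}`);
    lines.push(``);
  }
  // Combined upstream with weights (for percentage-based routing)
  if (config.splitType === "weight") {
    lines.push(`upstream ${config.upstream} {`);
    for (const variation of config.variations) {
      const weight = variation.weight;
      // Nginx rejects weight=0, so a variation at 0% is left out of the
      // combined upstream instead of being written with a zero weight.
      if (weight <= 0) continue;
      for (const target of variation.targets) {
        // Weights apply per server: the split matches the configured
        // percentages only when variations have the same number of targets.
        lines.push(`    server ${target} weight=${weight};`);
      }
    }
    lines.push(`}`);
  }
  return lines.join("\n");
}
```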
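Nginx `weight` values apply to individual servers, so a variation with more targets receives a larger share than its configured percentage unless the weights are adjusted. The sketch below shows one way to derive per-server weights that preserve the variation-level split. It assumes integer percentages, and `VariationTargets` and `perServerWeights` are hypothetical names rather than part of the plugin.

```typescript
interface VariationTargets {
  name: string;
  percentage: number;  // integer share of traffic for the whole variation
  targetCount: number; // number of servers in the variation
}

function gcd(a: number, b: number): number {
  return b === 0 ? a : gcd(b, a % b);
}

// Per-server weight such that each variation's total weight
// (weight * targetCount) is proportional to its percentage.
function perServerWeights(variations: VariationTargets[]): Record<string, number> {
  // Variations at 0% or without targets get no entry at all.
  const active = variations.filter((v) => v.percentage > 0 && v.targetCount > 0);
  const product = active.reduce((acc, v) => acc * v.targetCount, 1);
  const raw = active.map((v) => ({
    name: v.name,
    weight: (v.percentage * product) / v.targetCount,
  }));
  // Reduce to the smallest equivalent integers.
  const divisor = raw.reduce((acc, r) => gcd(acc, r.weight), 0);
  const result: Record<string, number> = {};
  for (const r of raw) result[r.name] = r.weight / divisor;
  return result;
}
```

For a 90/10 split where A has three servers and B has one, each A server gets weight 3 and the B server gets weight 1, so the totals are 9 and 1.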
### Server Configuration
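```typescript
private generateServerConfig(config: RouteConfig): string {
  if (config.splitType === "header" || config.splitType === "cookie") {
    // Nginx exposes a request header as $http_<name>, lowercased and with
    // dashes replaced by underscores (X-Feature-Flag -> $http_x_feature_flag).
    // RouteConfig has no cookie field, so cookie splits also key off the header here.
    const headerVar = (config.headerName || "x-variation")
      .toLowerCase()
      .replace(/-/g, "_");
    // Nginx variable names allow only letters, digits, and underscores.
    const backendVar = `${config.upstream.replace(/[^A-Za-z0-9_]/g, "_")}_backend`;
    // Split block based on header/cookie
    return `
map $http_${headerVar} $${backendVar} {
    default ${config.upstream}_A;
    "${config.headerValueB || "B"}" ${config.upstream}_B;
}
server {
    listen 80;
    server_name ${config.serverName};
    location / {
        proxy_pass http://$${backendVar};
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
`;
  } else {
    // Weight-based (default)
    return `
server {
    listen 80;
    server_name ${config.serverName};
    location / {
        proxy_pass http://${config.upstream};
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
`;
  }
}
```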
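Nginx derives the variable for a request header by lower-casing the header name and replacing dashes with underscores, and it exposes cookies as `$cookie_<name>`. The helpers below sketch that mapping for the `map` block; `nginxHeaderVariable`, `nginxCookieVariable`, and `nginxMapBlock` are hypothetical names, and the `_A`/`_B` upstream suffixes follow the convention used in this document.

```typescript
// "X-Feature-Flag" -> "$http_x_feature_flag"
function nginxHeaderVariable(headerName: string): string {
  return `$http_${headerName.toLowerCase().replace(/-/g, "_")}`;
}

// "ab_variation" -> "$cookie_ab_variation"
function nginxCookieVariable(cookieName: string): string {
  return `$cookie_${cookieName}`;
}

// Builds a map block that selects the B upstream when `sourceVar` equals
// `valueB` and falls back to the A upstream otherwise.
function nginxMapBlock(sourceVar: string, upstream: string, valueB: string): string {
  // Variable names may contain only letters, digits, and underscores,
  // so an upstream such as "api-backend" needs sanitising first.
  const backendVar = `$${upstream.replace(/[^A-Za-z0-9_]/g, "_")}_backend`;
  return [
    `map ${sourceVar} ${backendVar} {`,
    `    default ${upstream}_A;`,
    `    "${valueB}" ${upstream}_B;`,
    `}`,
  ].join("\n");
}
```

A cookie-based split would pass `nginxCookieVariable("ab_variation")` as `sourceVar` instead of a header variable.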
### Traffic Shifting
```typescript
async shiftTraffic(from: string, to: string, percentage: number): Promise<void> {
const config = await this.getCurrentConfig();
// Update weights
for (const variation of config.variations) {
if (variation.name === to) {
variation.weight = percentage;
} else {
variation.weight = 100 - percentage;
}
}
await this.configureRoute(config);
}
async getTrafficDistribution(): Promise<TrafficDistribution> {
// Parse current nginx config to get weights
const config = await this.parseCurrentConfig();
return {
variations: config.variations.map(v => ({
name: v.name,
percentage: v.weight,
targets: v.targets,
})),
};
}
```
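The weight update inside `shiftTraffic` can be isolated as a pure function with input validation, which makes the rule easy to test without touching Nginx. This is a sketch for the two-variation case; `WeightedVariation` and `applyTrafficShift` are hypothetical names, and leaving any other variations untouched is an assumption.

```typescript
interface WeightedVariation {
  name: string;
  weight: number;
}

// Returns a new list where `to` receives `percentage` and `from` receives the
// remainder. Any other variation keeps its current weight.
function applyTrafficShift(
  variations: WeightedVariation[],
  from: string,
  to: string,
  percentage: number
): WeightedVariation[] {
  if (!isFinite(percentage) || percentage < 0 || percentage > 100) {
    throw new RangeError(`percentage must be between 0 and 100, got ${percentage}`);
  }
  const names = variations.map((v) => v.name);
  if (from === to || names.indexOf(from) === -1 || names.indexOf(to) === -1) {
    throw new Error(`unknown or identical variations: ${from} -> ${to}`);
  }
  return variations.map((v) => {
    if (v.name === to) return { name: v.name, weight: percentage };
    if (v.name === from) return { name: v.name, weight: 100 - percentage };
    return v;
  });
}
```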
---
## AWS ALB Router Plugin
The AWS ALB plugin manages weighted target groups for traffic splitting.
### Implementation
```typescript
class AWSALBRouterPlugin implements TrafficRouterPlugin {
  private alb: AWS.ELBv2;

  // Target group ARNs have the form
  // arn:aws:elasticloadbalancing:<region>:<account>:targetgroup/<name>/<id>.
  // Groups are named `${upstream}-${variation}`, so the variation is the
  // suffix of the name segment, not of the whole ARN.
  private variationFromArn(arn: string): string {
    const groupName = arn.split("/")[1] ?? "";
    return groupName.slice(groupName.lastIndexOf("-") + 1);
  }

  async configureRoute(config: RouteConfig): Promise<void> {
    // Note: ruleArn is ALB-specific and is not declared on the RouteConfig
    // interface shown earlier.
    // Create/update target groups for each variation
    const targetGroupArns: Record<string, string> = {};
    for (const variation of config.variations) {
      const tgArn = await this.ensureTargetGroup(
        `${config.upstream}-${variation.name}`,
        variation.targets
      );
      targetGroupArns[variation.name] = tgArn;
    }
    // Update listener rule with weighted target groups
    await this.alb.modifyRule({
      RuleArn: config.ruleArn,
      Actions: [{
        Type: "forward",
        ForwardConfig: {
          TargetGroups: config.variations.map(v => ({
            TargetGroupArn: targetGroupArns[v.name],
            Weight: v.weight,
          })),
          TargetGroupStickinessConfig: {
            Enabled: config.stickySession || false,
            DurationSeconds: config.stickyDuration || 3600,
          },
        },
      }],
    }).promise();
  }

  async shiftTraffic(from: string, to: string, percentage: number): Promise<void> {
    const rule = await this.getRule();
    const forwardConfig = rule.Actions[0].ForwardConfig;
    // Update weights
    for (const tg of forwardConfig.TargetGroups) {
      if (this.variationFromArn(tg.TargetGroupArn) === to) {
        tg.Weight = percentage;
      } else {
        tg.Weight = 100 - percentage;
      }
    }
    await this.alb.modifyRule({
      RuleArn: rule.RuleArn,
      Actions: rule.Actions,
    }).promise();
  }

  async getTrafficDistribution(): Promise<TrafficDistribution> {
    const rule = await this.getRule();
    const forwardConfig = rule.Actions[0].ForwardConfig;
    const variations: TrafficDistribution["variations"] = [];
    for (const tg of forwardConfig.TargetGroups) {
      const targets = await this.getTargetGroupTargets(tg.TargetGroupArn);
      variations.push({
        name: this.variationFromArn(tg.TargetGroupArn),
        percentage: tg.Weight,
        targets: targets.map(t => t.Id),
      });
    }
    return { variations };
  }
}
```
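Target group ARNs end in `targetgroup/<name>/<id>`, so recovering a variation name means parsing the name segment rather than splitting the whole ARN on dashes. The sketch below shows that parsing under the `${upstream}-${variation}` naming convention used above. `targetGroupNameFromArn` and `variationFromGroupName` are hypothetical helpers, and the ARN in the usage line is a made-up example value.

```typescript
// Extracts "<name>" from arn:aws:elasticloadbalancing:<region>:<account>:targetgroup/<name>/<id>
function targetGroupNameFromArn(arn: string): string {
  const resource = arn.split(":").slice(5).join(":");
  const [kind, groupName, id] = resource.split("/");
  if (kind !== "targetgroup" || !groupName || !id) {
    throw new Error(`Not a target group ARN: ${arn}`);
  }
  return groupName;
}

// Strips the "<upstream>-" prefix to recover the variation name.
function variationFromGroupName(groupName: string, upstream: string): string {
  const prefix = `${upstream}-`;
  if (groupName.indexOf(prefix) !== 0 || groupName.length === prefix.length) {
    throw new Error(`Group ${groupName} does not belong to upstream ${upstream}`);
  }
  return groupName.slice(prefix.length);
}

// Made-up ARN for illustration only.
const exampleArn =
  "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/api-backend-B/73e2d6bc24d8a067";
const exampleVariation = variationFromGroupName(targetGroupNameFromArn(exampleArn), "api-backend");
```

Matching on the known upstream prefix also avoids false positives from dashes elsewhere in the ARN, such as the region.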
---
## Router Plugin Catalog
| Plugin | Status | Description |
|--------|--------|-------------|
| Nginx | v1 Built-in | Configuration-based weight/header routing |
| HAProxy | Plugin | Runtime API for traffic management |
| Traefik | Plugin | Dynamic configuration via API |
| AWS ALB | Plugin | Weighted target groups |
| Envoy | Planned | xDS API integration |
---
## Creating Custom Router Plugins
To create a custom router plugin:
1. **Implement Interface:** Create a class implementing `TrafficRouterPlugin`
2. **Register Plugin:** Add to plugin registry with capabilities
3. **Configuration Schema:** Define JSON Schema for plugin config
4. **Health Checks:** Implement connection testing
5. **Rollback Support:** Handle traffic reversion on failures
### Example Plugin Manifest
```yaml
plugin:
  name: my-router
  version: 1.0.0
  type: router
  capabilities:
    - traffic-routing
    - weight-based
    - header-based
  config:
    type: object
    properties:
      endpoint:
        type: string
        description: Router API endpoint
      auth:
        type: object
        properties:
          type:
            enum: [basic, token]
          credentialRef:
            type: string
```
---
## See Also
- [Progressive Delivery Module](../modules/progressive-delivery.md)
- [Plugin System](../modules/plugin-system.md)
- [Canary Controller](canary.md)
- [A/B Release Models](ab-releases.md)

@@ -0,0 +1,246 @@
# Implementation Roadmap
> Phased delivery plan for the Release Orchestrator implementation.
**Status:** Planned
**Source:** [Architecture Advisory Section 14](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related:** [Implementation Guide](implementation-guide.md), [Test Structure](test-structure.md)
## Overview
The Release Orchestrator is delivered in 8 phases over 34 weeks, progressively building from foundational infrastructure to full plugin ecosystem support.
---
## Phased Delivery Plan
### Phase 1: Foundation (Weeks 1-4)
**Goal:** Core infrastructure and basic release management
| Week | Deliverables |
|------|--------------|
| Week 1 | Database schema migration; INTHUB integration-manager; connection-profiles |
| Week 2 | ENVMGR environment-manager; target-registry (basic) |
| Week 3 | RELMAN component-registry; version-manager; release-manager |
| Week 4 | Basic release CRUD APIs; CLI commands; integration tests |
**Exit Criteria:**
- Can create environments with config
- Can register components with image repos
- Can create releases with pinned digests
- Can list/search releases
**Certified Path:** Manual release creation; no deployment yet
---
### Phase 2: Workflow Engine (Weeks 5-8)
**Goal:** Workflow execution capability
| Week | Deliverables |
|------|--------------|
| Week 5 | WORKFL step-registry; built-in step types (approval, policy-gate, notify) |
| Week 6 | WORKFL workflow-designer; workflow template CRUD |
| Week 7 | WORKFL workflow-engine; DAG execution; state machine |
| Week 8 | Step executor; retry logic; timeout handling; workflow run APIs |
**Exit Criteria:**
- Can create workflow templates via API
- Can execute workflows with approval steps
- Workflow state machine handles all transitions
- Step retries work correctly
**Certified Path:** Approval-only workflows; no deployment execution yet
---
### Phase 3: Promotion & Decision (Weeks 9-12)
**Goal:** Promotion workflow with security gates
| Week | Deliverables |
|------|--------------|
| Week 9 | PROMOT promotion-manager; approval-gateway |
| Week 10 | PROMOT decision-engine; security gate integration with SCANENG |
| Week 11 | Gate registry; freeze window gate; SoD enforcement |
| Week 12 | Promotion APIs; "Why blocked?" endpoint; decision record |
**Exit Criteria:**
- Can request promotion
- Security gates evaluate scan verdicts
- Approval workflow enforces SoD
- Decision record captures gate results
**Certified Path:** Promotions with security + approval gates; no deployment yet
---
### Phase 4: Deployment Execution (Weeks 13-18)
**Goal:** Deploy to Docker/Compose targets
| Week | Deliverables |
|------|--------------|
| Week 13 | AGENTS agent-core; agent registration; heartbeat |
| Week 14 | AGENTS agent-docker; Docker host deployment |
| Week 15 | AGENTS agent-compose; Compose deployment |
| Week 16 | DEPLOY deploy-orchestrator; artifact-generator |
| Week 17 | DEPLOY rollback-manager; version sticker writing |
| Week 18 | RELEVI evidence-collector; evidence-signer; audit-exporter |
**Exit Criteria:**
- Agents can register and receive tasks
- Docker deployment works with digest verification
- Compose deployment writes lock files
- Rollback restores previous version
- Evidence packets generated for deployments
**Certified Path:** Full promotion -> deployment flow for Docker/Compose
---
### Phase 5: UI & Polish (Weeks 19-22)
**Goal:** Web console for release orchestration
| Week | Deliverables |
|------|--------------|
| Week 19 | Dashboard components; metrics widgets |
| Week 20 | Environment overview; release detail screens |
| Week 21 | Workflow editor (graph); run visualization |
| Week 22 | Promotion UI; approval queue; "Why blocked?" modal |
**Exit Criteria:**
- Dashboard shows operational metrics
- Can manage environments/releases via UI
- Can create/edit workflows in graph editor
- Can approve promotions via UI
**Certified Path:** Complete v1 user experience
---
### Phase 6: Progressive Delivery (Weeks 23-26)
**Goal:** A/B releases and canary deployments
| Week | Deliverables |
|------|--------------|
| Week 23 | PROGDL ab-manager; target-group A/B |
| Week 24 | PROGDL canary-controller; stage execution |
| Week 25 | PROGDL traffic-router; Nginx plugin |
| Week 26 | Canary UI; traffic visualization; health monitoring |
**Exit Criteria:**
- Can create A/B release with variations
- Canary controller advances stages based on health
- Traffic router shifts weights
- Rollback on health failure works
**Certified Path:** Target-group A/B; Nginx router-based A/B
---
### Phase 7: Extended Targets (Weeks 27-30)
**Goal:** ECS and Nomad support; agentless deployment over SSH/WinRM
| Week | Deliverables |
|------|--------------|
| Week 27 | AGENTS agent-ssh; SSH remote executor |
| Week 28 | AGENTS agent-winrm; WinRM remote executor |
| Week 29 | AGENTS agent-ecs; ECS deployment |
| Week 30 | AGENTS agent-nomad; Nomad deployment |
**Exit Criteria:**
- SSH deployment works with script execution
- WinRM deployment works with PowerShell
- ECS task definition updates work
- Nomad job submissions work
**Certified Path:** All target types operational
---
### Phase 8: Plugin Ecosystem (Weeks 31-34)
**Goal:** Full plugin system; external integrations
| Week | Deliverables |
|------|--------------|
| Week 31 | PLUGIN plugin-registry; plugin-loader |
| Week 32 | PLUGIN plugin-sandbox; plugin-sdk |
| Week 33 | GitHub plugin; GitLab plugin |
| Week 34 | Jenkins plugin; Vault plugin |
**Exit Criteria:**
- Can install and configure plugins
- Plugins can contribute step types
- Plugins can contribute integrations
- Plugin sandbox enforces limits
**Certified Path:** GitHub + Harbor + Docker/Compose + Vault
---
## Resource Requirements
### Team Structure
| Role | Count | Responsibilities |
|------|-------|------------------|
| Tech Lead | 1 | Architecture decisions; code review; unblocking |
| Backend Engineers | 4 | Module development; API implementation |
| Frontend Engineers | 2 | Web console; dashboard; workflow editor |
| DevOps Engineer | 1 | CI/CD; infrastructure; agent deployment |
| QA Engineer | 1 | Test automation; integration testing |
| Technical Writer | 0.5 | Documentation; API docs; user guides |
### Infrastructure Requirements
| Component | Specification |
|-----------|---------------|
| PostgreSQL | Primary database; 16+ recommended; read replicas for scale |
| Redis | Job queues; caching; session storage |
| Object Storage | S3-compatible; evidence packets; large artifacts |
| Container Runtime | Docker; for plugin sandboxes |
| Kubernetes | Optional; for Stella core deployment (not required for targets) |
---
## Risk Mitigation
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Agent security complexity | High | High | Early security review; penetration testing; mTLS implementation in Phase 4 |
| Workflow state machine edge cases | Medium | High | Comprehensive state transition tests; chaos testing |
| Plugin sandbox escapes | Low | Critical | Security audit; capability restrictions; resource limits |
| Database migration issues | Medium | Medium | Staged rollout; rollback scripts; data validation |
| UI performance with large workflows | Medium | Medium | Virtual rendering; lazy loading; performance testing |
| Integration compatibility | High | Medium | Abstract connector interface; extensive integration tests |
---
## Success Metrics
| Phase | Key Metrics |
|-------|-------------|
| Phase 1 | Release creation time < 5s; API latency p99 < 200ms |
| Phase 2 | Workflow execution reliability > 99.9% |
| Phase 3 | Gate evaluation time < 500ms; SoD enforcement 100% |
| Phase 4 | Deployment success rate > 99%; rollback time < 60s |
| Phase 5 | UI initial load < 2s; real-time update latency < 1s |
| Phase 6 | Canary rollback trigger time < 30s |
| Phase 7 | All target types covered by a unified API |
| Phase 8 | Plugin sandbox isolation verified by security audit |
---
## References
- [Sprint Index](../../implplan/SPRINT_20260110_100_000_INDEX_release_orchestrator.md)
- [Implementation Guide](implementation-guide.md)
- [Test Structure](test-structure.md)
- [Architecture Overview](architecture.md)

@@ -0,0 +1,239 @@
# Audit Trail
> Audit event structure and audited operations for compliance and forensics.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 8.5](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Evidence Module](../modules/evidence.md), [Security Overview](overview.md)
**Sprints:** [109_001 Evidence Collector](../../../../implplan/SPRINT_20260110_109_001_RELEVI_evidence_collector.md)
## Overview
The Release Orchestrator maintains a tamper-evident audit trail of all security-relevant operations. Audit events are cryptographically chained to detect tampering.
---
## Audit Event Structure
### TypeScript Interface
```typescript
interface AuditEvent {
  id: UUID;
  timestamp: DateTime;
  tenantId: UUID;
  // Actor
  actorType: "user" | "agent" | "system" | "plugin";
  actorId: UUID;
  actorName: string;
  actorIp?: string;
  // Action
  action: string;            // "promotion.approved", "deployment.started"
  resource: string;          // "promotion"
  resourceId: UUID;
  // Context
  environmentId?: UUID;
  releaseId?: UUID;
  promotionId?: UUID;
  // Details
  before?: object;           // State before (for updates)
  after?: object;            // State after
  metadata?: object;         // Additional context
  // Integrity
  previousEventHash: string; // Hash chain for tamper detection
  eventHash: string;
}
```
---
## Audited Operations
| Category | Operations |
|----------|------------|
| **Authentication** | Login, logout, token refresh, failed attempts |
| **Authorization** | Permission denied events |
| **Environments** | Create, update, delete, freeze window changes |
| **Releases** | Create, deprecate, archive |
| **Promotions** | Request, approve, reject, cancel |
| **Deployments** | Start, complete, fail, rollback |
| **Targets** | Register, update, delete, health changes |
| **Agents** | Register, heartbeat gaps, capability changes |
| **Integrations** | Create, update, delete, test |
| **Plugins** | Enable, disable, config changes |
| **Evidence** | Create (never update/delete) |
---
## Hash Chain
### Chain Verification
The audit trail uses SHA-256 hash chaining for tamper detection:
```typescript
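// Assumes a Node.js runtime for the SHA-256 primitive.
import { createHash } from "node:crypto";

interface HashChainEntry {
  eventId: UUID;
  eventHash: string;
  previousEventHash: string;
}

interface VerificationResult {
  valid: boolean;
  brokenAt?: number; // index of the first event that failed a check
  reason?: string;
}

function computeEventHash(event: AuditEvent): string {
  // Keys are listed explicitly so the serialized field order is stable.
  // Note: before/after/metadata are not part of the hashed payload,
  // so changes to those fields are not detected by this hash.
  const payload = JSON.stringify({
    id: event.id,
    timestamp: event.timestamp,
    tenantId: event.tenantId,
    actorType: event.actorType,
    actorId: event.actorId,
    action: event.action,
    resource: event.resource,
    resourceId: event.resourceId,
    previousEventHash: event.previousEventHash,
  });
  // Prefix matches the "sha256:..." form used in the example events.
  return "sha256:" + createHash("sha256").update(payload).digest("hex");
}

function verifyChain(events: AuditEvent[]): VerificationResult {
  // Start at 0 so the first event's own hash is also recomputed.
  for (let i = 0; i < events.length; i++) {
    const current = events[i];
    // Link check; the first event in the slice has no predecessor to compare with.
    if (i > 0 && current.previousEventHash !== events[i - 1].eventHash) {
      return { valid: false, brokenAt: i, reason: "Hash chain broken" };
    }
    if (computeEventHash(current) !== current.eventHash) {
      return { valid: false, brokenAt: i, reason: "Event hash mismatch" };
    }
  }
  return { valid: true };
}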
```
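The chain verified above is built on the write path by linking each new event to the current tail. The following is a minimal sketch of that step, assuming a Node.js runtime and a simplified event shape. `ChainedEvent`, `hashOf`, `appendEvent`, and the all-zero `GENESIS_HASH` sentinel for the first event are illustrative names and choices, not part of the specification.

```typescript
import { createHash } from "node:crypto";

// Simplified stand-in for AuditEvent; only the fields needed for chaining.
interface ChainedEvent {
  id: string;
  action: string;
  previousEventHash: string;
  eventHash: string;
}

// Assumption: the first event in a chain points at a fixed sentinel value.
const GENESIS_HASH = "sha256:" + "0".repeat(64);

// Hash the event content together with the link to its predecessor.
function hashOf(e: Omit<ChainedEvent, "eventHash">): string {
  const payload = JSON.stringify({
    id: e.id,
    action: e.action,
    previousEventHash: e.previousEventHash,
  });
  return "sha256:" + createHash("sha256").update(payload).digest("hex");
}

// Returns a new chain with the event linked to the current tail.
function appendEvent(chain: ChainedEvent[], id: string, action: string): ChainedEvent[] {
  const previousEventHash =
    chain.length > 0 ? chain[chain.length - 1].eventHash : GENESIS_HASH;
  const unsigned = { id, action, previousEventHash };
  return [...chain, { ...unsigned, eventHash: hashOf(unsigned) }];
}

let auditChain: ChainedEvent[] = [];
auditChain = appendEvent(auditChain, "evt-1", "promotion.approved");
auditChain = appendEvent(auditChain, "evt-2", "deployment.started");
```

Because each hash covers `previousEventHash`, rewriting an earlier event changes its hash and breaks the link stored in every later event.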
---
## Example Audit Events
### Promotion Approved
```json
{
  "id": "evt-123",
  "timestamp": "2026-01-09T14:32:15Z",
  "tenantId": "tenant-uuid",
  "actorType": "user",
  "actorId": "user-uuid",
  "actorName": "jane@example.com",
  "actorIp": "192.168.1.100",
  "action": "promotion.approved",
  "resource": "promotion",
  "resourceId": "promo-uuid",
  "environmentId": "env-uuid",
  "releaseId": "rel-uuid",
  "promotionId": "promo-uuid",
  "before": {
    "status": "pending"
  },
  "after": {
    "status": "approved",
    "approvals": 2
  },
  "metadata": {
    "comment": "LGTM"
  },
  "previousEventHash": "sha256:abc...",
  "eventHash": "sha256:def..."
}
```
### Deployment Started
```json
{
  "id": "evt-124",
  "timestamp": "2026-01-09T14:32:20Z",
  "tenantId": "tenant-uuid",
  "actorType": "system",
  "actorId": "system",
  "actorName": "deployment-orchestrator",
  "action": "deployment.started",
  "resource": "deployment",
  "resourceId": "deploy-uuid",
  "environmentId": "env-uuid",
  "releaseId": "rel-uuid",
  "promotionId": "promo-uuid",
  "after": {
    "status": "deploying",
    "strategy": "rolling",
    "targetCount": 5
  },
  "previousEventHash": "sha256:def...",
  "eventHash": "sha256:ghi..."
}
```
---
## Retention Policy
| Item | Policy |
|------|--------|
| All tenants | 7 years (compliance) |
| After tenant deletion | 7 years (legal hold) |
| Archive format | NDJSON, signed |
---
## Export Format
Audit events can be exported for compliance reporting:
```http
# Export audit trail for a date range
GET /api/v1/audit/export?start=2026-01-01T00:00:00Z&end=2026-01-31T23:59:59Z&format=ndjson
```
The response includes a signed digest for verification:
```json
{
  "export": {
    "startDate": "2026-01-01T00:00:00Z",
    "endDate": "2026-01-31T23:59:59Z",
    "eventCount": 15234,
    "firstEventHash": "sha256:abc...",
    "lastEventHash": "sha256:xyz...",
    "downloadUrl": "https://..."
  },
  "signature": "base64-signature",
  "signedAt": "2026-02-01T00:00:00Z"
}
```
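A consumer can cross-check the downloaded file against the `export` block before relying on it. The sketch below assumes the NDJSON file holds one audit event per line in chain order and that each line carries an `eventHash` field. Verifying `signature` is left out because the signing scheme is not specified here. `checkExport`, `ExportManifest`, and `ExportCheck` are hypothetical names.

```typescript
// Subset of the "export" block from the response above.
interface ExportManifest {
  eventCount: number;
  firstEventHash: string;
  lastEventHash: string;
}

interface ExportCheck {
  ok: boolean;
  problems: string[];
}

// Compare event count and boundary hashes of the NDJSON body with the manifest.
function checkExport(ndjson: string, manifest: ExportManifest): ExportCheck {
  const rows = ndjson.split("\n").filter((row) => row.trim().length > 0);
  const hashes = rows.map((row) => (JSON.parse(row) as { eventHash: string }).eventHash);
  const problems: string[] = [];
  if (hashes.length !== manifest.eventCount) {
    problems.push(`expected ${manifest.eventCount} events, found ${hashes.length}`);
  }
  if (hashes.length > 0) {
    if (hashes[0] !== manifest.firstEventHash) {
      problems.push("first event hash mismatch");
    }
    if (hashes[hashes.length - 1] !== manifest.lastEventHash) {
      problems.push("last event hash mismatch");
    }
  }
  return { ok: problems.length === 0, problems };
}

const sampleExport = ['{"eventHash":"sha256:aaa"}', '{"eventHash":"sha256:bbb"}'].join("\n");
const matching = checkExport(sampleExport, {
  eventCount: 2,
  firstEventHash: "sha256:aaa",
  lastEventHash: "sha256:bbb",
});
const mismatched = checkExport(sampleExport, {
  eventCount: 3,
  firstEventHash: "sha256:aaa",
  lastEventHash: "sha256:zzz",
});
```

This check only confirms that the file matches the manifest; full tamper detection still requires walking the hash chain across all events.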
---
## See Also
- [Security Overview](overview.md)
- [Evidence](../modules/evidence.md)
- [Logging](../operations/logging.md)
- [Evidence Schema](../appendices/evidence-schema.md)


@@ -0,0 +1,207 @@
# Dashboard Specification
> Main dashboard layout and metrics specification for the Release Orchestrator UI.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 12.1](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [WebSocket APIs](../api/websockets.md), [Metrics](../operations/metrics.md)
**Sprint:** [111_001 Dashboard Overview](../../../../implplan/SPRINT_20260110_111_001_FE_dashboard_overview.md)
## Overview
The dashboard provides a real-time overview of security posture, release operations, estate health, and compliance status.
---
## Dashboard Layout
```
+-----------------------------------------------------------------------------+
| STELLA OPS SUITE |
| +-----+ [User Menu v] |
| |Logo | Dashboard Releases Environments Workflows Integrations |
+-----------------------------------------------------------------------------+
| |
| +-------------------------------+ +-----------------------------------+ |
| | SECURITY POSTURE | | RELEASE OPERATIONS | |
| | | | | |
| | +---------+ +---------+ | | +---------+ +---------+ | |
| | |Critical | | High | | | |In Flight| |Completed| | |
| | | 0 * | | 3 * | | | | 2 | | 47 | | |
| | |reachable| |reachable| | | |deploys | | today | | |
| | +---------+ +---------+ | | +---------+ +---------+ | |
| | | | | |
| | Blocked: 2 releases | | Pending Approval: 3 | |
| | Risk Drift: 1 env | | Failed (24h): 1 | |
| | | | | |
| +-------------------------------+ +-----------------------------------+ |
| |
| +-------------------------------+ +-----------------------------------+ |
| | ESTATE HEALTH | | COMPLIANCE/AUDIT | |
| | | | | |
| | Agents: 12 online, 1 offline| | Evidence Complete: 98% | |
| | Targets: 45/47 healthy | | Policy Changes: 2 (this week) | |
| | Drift Detected: 2 targets | | Audit Exports: 5 (this month) | |
| | | | | |
| +-------------------------------+ +-----------------------------------+ |
| |
| +-----------------------------------------------------------------------+ |
| | RECENT ACTIVITY | |
| | | |
| | * 14:32 myapp-v2.3.1 deployed to prod (jane@example.com) | |
| | o 14:28 myapp-v2.3.1 promoted to stage (auto) | |
| | * 14:15 api-v1.2.0 blocked: critical vuln CVE-2024-1234 | |
| | o 13:45 worker-v3.0.0 release created (john@example.com) | |
| | * 13:30 Target prod-web-03 health: degraded | |
| | | |
| +-----------------------------------------------------------------------+ |
| |
+-----------------------------------------------------------------------------+
```
---
## Dashboard Metrics
### TypeScript Interfaces
```typescript
interface DashboardMetrics {
  // Security Posture
  security: {
    criticalReachable: number;
    highReachable: number;
    blockedReleases: number;
    riskDriftEnvironments: number;
    digestsAnalyzedToday: number;
    digestQuota: number;
  };
  // Release Operations
  operations: {
    deploymentsInFlight: number;
    deploymentsCompletedToday: number;
    deploymentsFailed24h: number;
    pendingApprovals: number;
    averageDeployTime: number; // seconds
  };
  // Estate Health
  estate: {
    agentsOnline: number;
    agentsOffline: number;
    agentsDegraded: number;
    targetsHealthy: number;
    targetsUnhealthy: number;
    targetsDrift: number;
  };
  // Compliance/Audit
  compliance: {
    evidenceCompleteness: number; // percentage
    policyChangesThisWeek: number;
    auditExportsThisMonth: number;
    lastExportDate: DateTime;
  };
}
```
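The Estate Health lines in the layout can be derived from the `estate` block. The following sketch assumes that the target total is `targetsHealthy + targetsUnhealthy` (the specification does not say whether drifted targets are counted separately). `EstateMetrics` and `estateSummary` are illustrative names.

```typescript
// Mirrors the `estate` block of DashboardMetrics.
interface EstateMetrics {
  agentsOnline: number;
  agentsOffline: number;
  agentsDegraded: number;
  targetsHealthy: number;
  targetsUnhealthy: number;
  targetsDrift: number;
}

// Build the three summary lines shown in the Estate Health panel.
function estateSummary(e: EstateMetrics): string[] {
  // Assumption: every target is either healthy or unhealthy.
  const totalTargets = e.targetsHealthy + e.targetsUnhealthy;
  return [
    `Agents: ${e.agentsOnline} online, ${e.agentsOffline} offline`,
    `Targets: ${e.targetsHealthy}/${totalTargets} healthy`,
    `Drift Detected: ${e.targetsDrift} targets`,
  ];
}

const estateLines = estateSummary({
  agentsOnline: 12,
  agentsOffline: 1,
  agentsDegraded: 0,
  targetsHealthy: 45,
  targetsUnhealthy: 2,
  targetsDrift: 2,
});
```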
---
## Dashboard Panels
### 1. Security Posture Panel
Displays current security state across all releases:
| Metric | Description |
|--------|-------------|
| Critical Reachable | Critical vulnerabilities with confirmed reachability |
| High Reachable | High severity vulnerabilities with confirmed reachability |
| Blocked Releases | Releases blocked by security gates |
| Risk Drift | Environments with changed risk since deployment |
### 2. Release Operations Panel
Shows active deployment operations:
| Metric | Description |
|--------|-------------|
| In Flight | Deployments currently in progress |
| Completed Today | Deployments completed successfully today |
| Pending Approval | Promotions awaiting approval |
| Failed (24h) | Failed deployments in last 24h |
### 3. Estate Health Panel
Displays agent and target health:
| Metric | Description |
|--------|-------------|
| Agents Online | Number of agents reporting healthy |
| Agents Offline | Agents that missed heartbeats |
| Targets Healthy | Targets passing health checks |
| Drift Detected | Targets with version drift |
### 4. Compliance/Audit Panel
Shows audit and compliance status:
| Metric | Description |
|--------|-------------|
| Evidence Complete | % of deployments with full evidence |
| Policy Changes | Policy modifications this week |
| Audit Exports | Evidence exports this month |
---
## Real-Time Updates
### WebSocket Integration
```typescript
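// Discriminated union so each case below receives the matching payload type.
type DashboardStreamMessage =
  | { type: "metric_update"; timestamp: DateTime; payload: MetricUpdate }
  | { type: "activity"; timestamp: DateTime; payload: ActivityEvent }
  | { type: "alert"; timestamp: DateTime; payload: Alert };

// Subscribe to dashboard stream.
// Build an absolute ws(s):// URL; relative URLs are not accepted by every WebSocket implementation.
const streamUrl = new URL("/api/v1/dashboard/stream", window.location.href);
streamUrl.protocol = streamUrl.protocol === "https:" ? "wss:" : "ws:";
const ws = new WebSocket(streamUrl.toString());

ws.onmessage = (event) => {
  const message: DashboardStreamMessage = JSON.parse(event.data);
  switch (message.type) {
    case "metric_update":
      updateMetrics(message.payload);
      break;
    case "activity":
      addActivityItem(message.payload);
      break;
    case "alert":
      showAlert(message.payload);
      break;
  }
};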
```
---
## Performance Targets
| Metric | Target |
|--------|--------|
| Initial Load | < 2 seconds |
| Metric Refresh | Every 30 seconds |
| WebSocket Reconnect | Exponential backoff (1s, 2s, 4s, ... 30s max) |
| Activity History | Last 50 events |
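The reconnect schedule in the table can be expressed as a small pure function. This is a sketch only: `reconnectDelayMs` and its parameter names are hypothetical, and jitter is not applied because the table does not mention it.

```typescript
// Delay before reconnect attempt `attempt` (0-based): 1s, 2s, 4s, ... capped at 30s.
function reconnectDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Delays for the first seven attempts.
const delaySchedule = [0, 1, 2, 3, 4, 5, 6].map((attempt) => reconnectDelayMs(attempt));
// → [1000, 2000, 4000, 8000, 16000, 30000, 30000]
```

A client would typically reset the attempt counter to zero after a successful connection so that a later drop starts again from the 1 second delay.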
---
## See Also
- [WebSocket APIs](../api/websockets.md)
- [Metrics](../operations/metrics.md)
- [Workflow Editor](workflow-editor.md)
- [Key Screens](screens.md)


@@ -0,0 +1,232 @@
# Key UI Screens
> Specification for key UI screens: Environment Overview, Release Detail, and Why Blocked modal.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 12.3](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Environment Manager](../modules/environment-manager.md), [Release Manager](../modules/release-manager.md)
**Sprints:** [111_002 - 111_007](../../../../implplan/)
## Overview
This document specifies the key UI screens for release orchestration.
---
## Environment Overview Screen
The environment overview shows the deployment pipeline and current state of each environment.
```
+-----------------------------------------------------------------------------+
| ENVIRONMENTS [+ New Environment] |
+-----------------------------------------------------------------------------+
| |
| +------------------------------------------------------------------------+ |
| | ENVIRONMENT PIPELINE | |
| | | |
| | +---------+ +---------+ +---------+ +---------+ | |
| | | DEV | ---> | TEST | ---> | STAGE | ---> | PROD | | |
| | | | | | | | | | | |
| | | v2.4.0 | | v2.3.1 | | v2.3.1 | | v2.3.0 | | |
| | | * 5 min | | * 2h | | * 1d | | * 3d | | |
| | +---------+ +---------+ +---------+ +---------+ | |
| | | |
| +------------------------------------------------------------------------+ |
| |
| +------------------------------------------------------------------------+ |
| | PRODUCTION [Manage] [View] | |
| | | |
| | Current Release: myapp-v2.3.0 | |
| | Deployed: 3 days ago by jane@example.com | |
| | Targets: 5 healthy, 0 unhealthy | |
| | | |
| | +---------------------------------------------------------------+ | |
| | | Pending Promotion: myapp-v2.3.1 [Review] | | |
| | | Waiting: 2 approvals (1/2) | | |
| | | Security: V All gates pass | | |
| | +---------------------------------------------------------------+ | |
| | | |
| | Freeze Windows: None active | |
| | Required Approvals: 2 | |
| | | |
| +------------------------------------------------------------------------+ |
| |
+-----------------------------------------------------------------------------+
```
### Features
- **Environment Pipeline:** Visual flow showing version progression
- **Environment Cards:** Detailed view of each environment
- **Target Health:** Real-time target health indicators
- **Pending Promotions:** Promotions awaiting action
- **Freeze Windows:** Active and scheduled freeze windows
- **Approval Status:** Current approval count vs required
---
## Release Detail Screen
The release detail screen shows all information about a specific release.
```
+-----------------------------------------------------------------------------+
| RELEASE: myapp-v2.3.1 |
| Created: 2 hours ago by jane@example.com |
+-----------------------------------------------------------------------------+
| |
| [Overview] [Components] [Security] [Deployments] [Evidence] |
| |
| +------------------------------------------------------------------------+ |
| | COMPONENTS | |
| | | |
| | +------------------------------------------------------------------+ | |
| | | api | | |
| | | Version: 2.3.1 Digest: sha256:abc123... | | |
| | | Security: V 0 critical, 0 high (0 reachable) | | |
| | | Image: registry.example.com/myapp/api@sha256:abc123 | | |
| | +------------------------------------------------------------------+ | |
| | | |
| | +------------------------------------------------------------------+ | |
| | | worker | | |
| | | Version: 2.3.1 Digest: sha256:def456... | | |
| | | Security: V 0 critical, 0 high (0 reachable) | | |
| | | Image: registry.example.com/myapp/worker@sha256:def456 | | |
| | +------------------------------------------------------------------+ | |
| | | |
| +------------------------------------------------------------------------+ |
| |
| +------------------------------------------------------------------------+ |
| | DEPLOYMENT STATUS | |
| | | |
| | dev *--------------------------------------------* Deployed (2h) | |
| | test *--------------------------------------------* Deployed (1h) | |
| | stage o--------------------------------------------* Deploying... | |
| | prod o Not deployed | |
| | | |
| +------------------------------------------------------------------------+ |
| |
| [Promote to Stage v] [Compare with Production] [Download Evidence] |
| |
+-----------------------------------------------------------------------------+
```
### Tabs
1. **Overview:** Release metadata and summary
2. **Components:** Component list with digests and versions
3. **Security:** Vulnerability summary and reachability analysis
4. **Deployments:** Deployment history across environments
5. **Evidence:** Evidence packets for compliance
### Features
- **Digest Display:** Full OCI digests for each component
- **Security Summary:** Vulnerability counts by severity
- **Deployment Timeline:** Visual progress across environments
- **Quick Actions:** Promote, compare, and export options
---
## "Why Blocked?" Modal
The "Why Blocked?" modal explains why a promotion cannot proceed.
```
+-----------------------------------------------------------------------------+
| WHY IS THIS PROMOTION BLOCKED? [Close] |
+-----------------------------------------------------------------------------+
| |
| Release: myapp-v2.4.0 -> Production |
| |
| +------------------------------------------------------------------------+ |
| | X SECURITY GATE FAILED | |
| | | |
| | Component 'api' has 1 critical reachable vulnerability: | |
| | | |
| | - CVE-2024-1234 (Critical, CVSS 9.8) | |
| | Package: log4j 2.14.0 | |
| | Reachability: V Confirmed reachable via api/logging/Logger.java | |
| | Fixed in: 2.17.1 | |
| | [View Details] [View Evidence] | |
| | | |
| | Remediation: Update log4j to version 2.17.1 or later | |
| | | |
| +------------------------------------------------------------------------+ |
| |
| +------------------------------------------------------------------------+ |
| | V APPROVAL GATE PASSED | |
| | | |
| | Required: 2 approvals | |
| | Received: 2 approvals | |
| | - john@example.com (2h ago): "LGTM" | |
| | - sarah@example.com (1h ago): "Approved for prod" | |
| | | |
| +------------------------------------------------------------------------+ |
| |
| +------------------------------------------------------------------------+ |
| | V FREEZE WINDOW GATE PASSED | |
| | | |
| | No active freeze windows for production | |
| | | |
| +------------------------------------------------------------------------+ |
| |
| Policy evaluated at: 2026-01-09T14:32:15Z |
| Policy hash: sha256:789xyz... |
| [View Full Decision Record] |
| |
+-----------------------------------------------------------------------------+
```
### Features
- **Gate-by-Gate Status:** Shows each gate with pass/fail status
- **Failure Details:** Specific information about why a gate failed
- **Vulnerability Details:** CVE info, package, version, and remediation
- **Reachability Evidence:** Links to reachability analysis
- **Approval History:** List of approvers and their comments
- **Override Mechanism:** Request override for authorized users
- **Decision Record:** Link to full evidence packet
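The gate-by-gate view implies a per-gate result that the modal aggregates into a single blocked/not-blocked answer. The sketch below shows one possible shape. `GateResult`, `BlockExplanation`, and `explainBlock` are hypothetical names and are not taken from the Promotion Manager API.

```typescript
interface GateResult {
  gate: string;                 // e.g. "security", "approval", "freeze-window"
  status: "passed" | "failed";
  reasons: string[];            // human-readable details shown in the modal
}

interface BlockExplanation {
  blocked: boolean;
  failedGates: string[];
  reasons: string[];
}

// A promotion is blocked when at least one gate failed.
function explainBlock(results: GateResult[]): BlockExplanation {
  const failedGates: string[] = [];
  const reasons: string[] = [];
  for (const result of results) {
    if (result.status === "failed") {
      failedGates.push(result.gate);
      reasons.push(...result.reasons);
    }
  }
  return { blocked: failedGates.length > 0, failedGates, reasons };
}

// Mirrors the wireframe: security gate failed, the other two passed.
const explanation = explainBlock([
  { gate: "security", status: "failed", reasons: ["CVE-2024-1234 (Critical, CVSS 9.8) is reachable in component 'api'"] },
  { gate: "approval", status: "passed", reasons: [] },
  { gate: "freeze-window", status: "passed", reasons: [] },
]);
```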
---
## Navigation Structure
```
Dashboard
+-- Releases
|   +-- [Release Detail]
|   +-- Create Release
|   +-- Compare Releases
|
+-- Environments
|   +-- [Environment Overview]
|   +-- Create Environment
|   +-- Manage Targets
|
+-- Workflows
|   +-- [Workflow Editor]
|   +-- Workflow Runs
|   +-- Step Types
|
+-- Integrations
|   +-- Connectors
|   +-- Plugins
|   +-- Vault
|
+-- Settings
    +-- Users & Teams
    +-- Policies
    +-- Audit Log
---
## See Also
- [Dashboard](dashboard.md)
- [Workflow Editor](workflow-editor.md)
- [Environment Manager](../modules/environment-manager.md)
- [Release Manager](../modules/release-manager.md)
- [Promotion Manager](../modules/promotion-manager.md)


@@ -0,0 +1,296 @@
# Workflow Editor Specification
> Visual workflow editor for creating and editing DAG-based workflow templates.
**Status:** Planned (not yet implemented)
**Source:** [Architecture Advisory Section 12.2](../../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
**Related Modules:** [Workflow Engine](../modules/workflow-engine.md), [Workflow Templates](../workflow/templates.md)
**Sprint:** [111_004 Workflow Editor](../../../../implplan/SPRINT_20260110_111_004_FE_workflow_editor.md)
## Overview
The workflow editor provides a visual graph editor for creating and editing workflow templates. It supports drag-and-drop node placement, connection creation, real-time run visualization, and bidirectional YAML synchronization.
---
## Graph Editor Component
### Editor State
```typescript
interface WorkflowEditorState {
  template: WorkflowTemplate;
  selectedNode: string | null;
  selectedEdge: string | null;
  zoom: number;
  pan: { x: number; y: number };
  mode: "select" | "pan" | "connect";
  clipboard: StepNode[] | null;
  undoStack: WorkflowTemplate[];
  redoStack: WorkflowTemplate[];
}

interface WorkflowEditorProps {
  template: WorkflowTemplate;
  stepTypes: StepType[];
  readOnly: boolean;
  onSave: (template: WorkflowTemplate) => void;
  onValidate: (template: WorkflowTemplate) => ValidationResult;
}
```
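The `undoStack` and `redoStack` fields suggest snapshot-based history. A minimal sketch of the three operations follows, assuming whole-template snapshots and that any new edit clears the redo stack. `EditHistory`, `applyEdit`, `undoEdit`, and `redoEdit` are illustrative names; `present` stands in for the `template` field.

```typescript
interface EditHistory<T> {
  present: T;
  undoStack: T[];
  redoStack: T[];
}

// A new edit pushes the current snapshot and invalidates any redo history.
function applyEdit<T>(h: EditHistory<T>, next: T): EditHistory<T> {
  return { present: next, undoStack: [...h.undoStack, h.present], redoStack: [] };
}

// Undo restores the most recent snapshot and makes the current one redoable.
function undoEdit<T>(h: EditHistory<T>): EditHistory<T> {
  if (h.undoStack.length === 0) return h;
  const previous = h.undoStack[h.undoStack.length - 1];
  return {
    present: previous,
    undoStack: h.undoStack.slice(0, -1),
    redoStack: [...h.redoStack, h.present],
  };
}

// Redo reapplies the most recently undone snapshot.
function redoEdit<T>(h: EditHistory<T>): EditHistory<T> {
  if (h.redoStack.length === 0) return h;
  const next = h.redoStack[h.redoStack.length - 1];
  return {
    present: next,
    undoStack: [...h.undoStack, h.present],
    redoStack: h.redoStack.slice(0, -1),
  };
}

// Strings stand in for WorkflowTemplate snapshots.
const initial: EditHistory<string> = { present: "v1", undoStack: [], redoStack: [] };
const edited = applyEdit(applyEdit(initial, "v2"), "v3");
const undone = undoEdit(edited);
const redone = redoEdit(undone);
const branched = applyEdit(undone, "v2b");
```

Storing full snapshots keeps the logic simple; a production editor might store diffs or cap the stack size to bound memory use.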
### Node Renderer
```typescript
interface NodeRendererProps {
  node: StepNode;
  stepType: StepType;
  status?: StepRunStatus; // For run visualization
  selected: boolean;
  onSelect: () => void;
  onMove: (position: Position) => void;
  onConnect: (sourceHandle: string) => void;
}

const NodeRenderer: React.FC<NodeRendererProps> = ({
  node, stepType, status, selected, onSelect
}) => {
  const statusColor = getStatusColor(status);
  return (
    <div className={`workflow-node ${selected ? 'selected' : ''}`}
         style={{ borderColor: statusColor }}
         onClick={onSelect}>
      {/* Node header */}
      <div className="node-header" style={{ backgroundColor: stepType.color }}>
        <Icon name={stepType.icon} />
        <span className="node-name">{node.name}</span>
        {status && <StatusBadge status={status} />}
      </div>
      {/* Node body */}
      <div className="node-body">
        <span className="node-type">{stepType.name}</span>
        {node.timeout && <span className="node-timeout">T {node.timeout}s</span>}
      </div>
      {/* Connection handles */}
      <Handle type="target" position="top" />
      <Handle type="source" position="bottom" />
      {/* Conditional indicator */}
      {node.condition && (
        <div className="condition-badge" title={node.condition}>
          <Icon name="condition" />
        </div>
      )}
    </div>
  );
};
```
---
## Run Visualization Overlay
### Real-Time Execution Display
```typescript
interface RunVisualizationProps {
  template: WorkflowTemplate;
  workflowRun: WorkflowRun;
  stepRuns: StepRun[];
  onNodeClick: (nodeId: string) => void;
}

// Helpers such as updateStepStatus, appendLog, getStepType, getStepRunStatus,
// getStepRun, isStepRunning and isEdgeActive, and the completedSteps/totalSteps
// counters, are assumed to be provided by the surrounding module (not shown).
const RunVisualization: React.FC<RunVisualizationProps> = ({
  template, workflowRun, stepRuns, onNodeClick
}) => {
  // Currently selected node; drives the log panel below
  const [selectedNode, setSelectedNode] = useState<string | null>(null);

  // WebSocket for real-time updates
  const { subscribe, unsubscribe } = useWorkflowStream(workflowRun.id);

  useEffect(() => {
    const handlers = {
      'step_started': (data) => updateStepStatus(data.nodeId, 'running'),
      'step_completed': (data) => updateStepStatus(data.nodeId, data.status),
      'step_log': (data) => appendLog(data.nodeId, data.line),
    };
    subscribe(handlers);
    return () => unsubscribe();
  }, [workflowRun.id]);

  return (
    <div className="run-visualization">
      {/* Workflow graph with status overlay */}
      <WorkflowGraph
        template={template}
        nodeRenderer={(node) => (
          <NodeRenderer
            node={node}
            stepType={getStepType(node.type)}
            status={getStepRunStatus(node.id)}
            selected={selectedNode === node.id}
            onSelect={() => {
              setSelectedNode(node.id);
              onNodeClick(node.id);
            }}
            // The run view is read-only, so move/connect are no-ops
            onMove={() => {}}
            onConnect={() => {}}
          />
        )}
        edgeRenderer={(edge) => (
          <EdgeRenderer
            edge={edge}
            animated={isEdgeActive(edge)}
          />
        )}
      />
      {/* Log panel */}
      {selectedNode && (
        <LogPanel
          stepRun={getStepRun(selectedNode)}
          streaming={isStepRunning(selectedNode)}
        />
      )}
      {/* Progress bar */}
      <ProgressBar
        completed={completedSteps}
        total={totalSteps}
        status={workflowRun.status}
      />
    </div>
  );
};
```
### Status Indicators
| Status | Visual |
|--------|--------|
| Pending | Gray circle |
| Running | Blue spinner |
| Success | Green checkmark |
| Failed | Red X |
| Skipped | Yellow dash |
---
## Canvas Operations
### Drag and Drop
- Drag steps from palette to canvas
- Drop creates new node at position
- Connect nodes by dragging from source to target handle
- Multi-select with Shift+click or box selection
### Validation
The editor performs real-time validation:
- **DAG Cycle Detection:** Prevent circular dependencies
- **Orphan Node Detection:** Warn about unconnected nodes
- **Required Inputs:** Highlight missing required configuration
- **Type Compatibility:** Validate edge connections between compatible types
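The first two checks can be sketched with standard graph routines: Kahn's topological sort for cycles, and a connected-endpoint scan for orphans. The node and edge shapes below (`GraphNode`, `GraphEdge`) are simplified assumptions and do not reflect the actual `WorkflowTemplate` schema.

```typescript
interface GraphNode { id: string }
interface GraphEdge { from: string; to: string }

// Kahn's algorithm: nodes that never reach in-degree 0 are part of a cycle
// or only reachable through one. An empty result means the graph is a DAG.
function nodesBlockedByCycle(nodes: GraphNode[], edges: GraphEdge[]): string[] {
  const inDegree = new Map<string, number>();
  const outgoing = new Map<string, string[]>();
  for (const n of nodes) {
    inDegree.set(n.id, 0);
    outgoing.set(n.id, []);
  }
  for (const e of edges) {
    outgoing.get(e.from)?.push(e.to);
    inDegree.set(e.to, (inDegree.get(e.to) ?? 0) + 1);
  }
  const queue = nodes.filter((n) => inDegree.get(n.id) === 0).map((n) => n.id);
  const visited = new Set<string>();
  while (queue.length > 0) {
    const id = queue.shift() as string;
    visited.add(id);
    for (const next of outgoing.get(id) ?? []) {
      const remaining = (inDegree.get(next) ?? 0) - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }
  return nodes.map((n) => n.id).filter((id) => !visited.has(id));
}

// Orphans are nodes that appear in no edge at all.
function findOrphans(nodes: GraphNode[], edges: GraphEdge[]): string[] {
  const connected = new Set<string>();
  for (const e of edges) {
    connected.add(e.from);
    connected.add(e.to);
  }
  return nodes.map((n) => n.id).filter((id) => !connected.has(id));
}

const demoNodes = ["build", "gate", "deploy", "notify", "lonely"].map((id) => ({ id }));
const demoEdges: GraphEdge[] = [
  { from: "build", to: "gate" },
  { from: "gate", to: "deploy" },
  { from: "deploy", to: "gate" }, // introduces a cycle gate -> deploy -> gate
  { from: "deploy", to: "notify" },
];
const blockedNodes = nodesBlockedByCycle(demoNodes, demoEdges);
const orphanNodes = findOrphans(demoNodes, demoEdges);
```

To prevent cycles rather than report them, the editor could run the same check on a candidate edge before committing the connection and reject it when the result is non-empty.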
### Zoom and Pan
| Action | Control |
|--------|---------|
| Zoom In | Ctrl + Mouse Wheel Up |
| Zoom Out | Ctrl + Mouse Wheel Down |
| Fit View | Ctrl + 0 |
| Pan | Middle Mouse Drag / Space + Drag |
| Reset | Ctrl + R |
---
## YAML Editor Mode
### Monaco Editor Integration
The editor supports a bidirectional YAML mode for power users:
```typescript
interface YAMLEditorProps {
  template: WorkflowTemplate;
  onChange: (template: WorkflowTemplate) => void;
  onValidate: (yaml: string) => ValidationResult;
}

const YAMLEditor: React.FC<YAMLEditorProps> = ({ template, onChange, onValidate }) => {
  const [yaml, setYaml] = useState(templateToYaml(template));
  return (
    <MonacoEditor
      language="yaml"
      value={yaml}
      onChange={(value) => {
        setYaml(value);
        const result = onValidate(value);
        if (result.valid) {
          onChange(yamlToTemplate(value));
        }
      }}
      options={{
        minimap: { enabled: false },
        lineNumbers: 'on',
        scrollBeyondLastLine: false,
      }}
    />
  );
};
```
### Bidirectional Sync
Changes in either view (graph or YAML) are synchronized:
- Graph changes update YAML immediately
- Valid YAML changes update graph
- Invalid YAML shows error markers without updating graph
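These three rules can be written as two pure state transitions. The sketch below is parser-agnostic: the serializer and parser are passed in, and the demo uses `JSON.stringify`/`JSON.parse` purely as a stand-in for a YAML library. `SyncState`, `ParseResult`, `onGraphEdit`, and `onYamlEdit` are hypothetical names.

```typescript
interface SyncState<T> {
  template: T;       // model rendered by the graph view
  yamlText: string;  // text shown in the YAML editor
  errors: string[];  // markers shown when the text does not parse
}

interface ParseResult<T> {
  value: T | null;
  errors: string[];
}

// Graph edit: the template changes and the text is regenerated immediately.
function onGraphEdit<T>(template: T, serialize: (t: T) => string): SyncState<T> {
  return { template, yamlText: serialize(template), errors: [] };
}

// Text edit: the text always updates; the template only when the text parses.
function onYamlEdit<T>(
  state: SyncState<T>,
  text: string,
  parse: (source: string) => ParseResult<T>
): SyncState<T> {
  const result = parse(text);
  if (result.value === null || result.errors.length > 0) {
    return { template: state.template, yamlText: text, errors: result.errors };
  }
  return { template: result.value, yamlText: text, errors: [] };
}

// Demo with JSON standing in for YAML.
type DemoTemplate = { steps: string[] };
const parseDemo = (source: string): ParseResult<DemoTemplate> => {
  try {
    return { value: JSON.parse(source) as DemoTemplate, errors: [] };
  } catch (err) {
    return { value: null, errors: [String(err)] };
  }
};

const afterGraphEdit = onGraphEdit<DemoTemplate>({ steps: ["deploy"] }, (t) => JSON.stringify(t));
const afterValidText = onYamlEdit(afterGraphEdit, '{"steps":["deploy","notify"]}', parseDemo);
const afterInvalidText = onYamlEdit(afterValidText, '{"steps":', parseDemo);
```

Keeping the last valid template while the text is invalid means the graph never renders a half-typed document.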
---
## Step Palette
### Available Step Types
The palette shows all available step types from core and plugins:
```typescript
interface StepPaletteProps {
  stepTypes: StepType[];
  onDragStart: (stepType: string) => void;
  filter: string;
}

const categories = [
  { name: "Deployment", types: ["deploy", "rollback"] },
  { name: "Gates", types: ["security-gate", "approval", "freeze-window-gate"] },
  { name: "Utility", types: ["script", "wait", "notify"] },
  { name: "Plugins", types: [] }, // Dynamically loaded
];
```
---
## Keyboard Shortcuts
| Shortcut | Action |
|----------|--------|
| Ctrl + S | Save template |
| Ctrl + Z | Undo |
| Ctrl + Shift + Z | Redo |
| Delete | Delete selected |
| Ctrl + C | Copy selected |
| Ctrl + V | Paste |
| Ctrl + A | Select all |
| Escape | Deselect / Cancel |
---
## See Also
- [Workflow Templates](../workflow/templates.md)
- [Workflow APIs](../api/workflows.md)
- [Dashboard](dashboard.md)
- [Key Screens](screens.md)