Release orchestrator pivot: architecture and planning
137
docs/modules/release-orchestrator/README.md
Normal file
@@ -0,0 +1,137 @@
# Release Orchestrator

> Central release control plane for non-Kubernetes container estates.

**Status:** Planned (not yet implemented)

**Source:** [Full Architecture Specification](../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)

## Purpose

The Release Orchestrator extends Stella Ops from a vulnerability scanning platform into **Stella Ops Suite** — a unified release control plane for non-Kubernetes container environments. It integrates:

- **Existing capabilities**: SBOM generation, reachability-aware vulnerability analysis, VEX support, policy engine, evidence locker, deterministic replay
- **New capabilities**: environment management, release orchestration, promotion workflows, deployment execution, progressive delivery, audit-grade release governance

## Scope

| In Scope | Out of Scope |
|----------|--------------|
| Non-K8s container deployments (Docker, Compose, ECS, Nomad) | Kubernetes deployments (use ArgoCD, Flux) |
| Release identity via OCI digests | Tag-based release identity |
| Plugin-extensible integrations | Hard-coded vendor integrations |
| SSH/WinRM + agent-based deployment | Cloud-native serverless deployments |
| L4/L7 traffic management via router plugins | Built-in service mesh |

## Documentation Structure

### Design & Principles

- [Design Principles](design/principles.md) — Core principles and invariants
- [Key Decisions](design/decisions.md) — Architectural decision record

### Implementation

- [Implementation Guide](implementation-guide.md) — .NET 10 patterns and best practices
- [Test Structure](test-structure.md) — Test organization and guidelines

### Module Architecture

- [Module Overview](modules/overview.md) — All modules and themes
- [Integration Hub (INTHUB)](modules/integration-hub.md) — External integrations
- [Environment Manager (ENVMGR)](modules/environment-manager.md) — Environments and targets
- [Release Manager (RELMAN)](modules/release-manager.md) — Release bundles and versions
- [Workflow Engine (WORKFL)](modules/workflow-engine.md) — DAG execution
- [Promotion Manager (PROMOT)](modules/promotion-manager.md) — Approvals and gates
- [Deploy Orchestrator (DEPLOY)](modules/deploy-orchestrator.md) — Deployment execution
- [Agents (AGENTS)](modules/agents.md) — Deployment agents
- [Progressive Delivery (PROGDL)](modules/progressive-delivery.md) — A/B and canary releases
- [Release Evidence (RELEVI)](modules/evidence.md) — Evidence packets
- [Plugin System (PLUGIN)](modules/plugin-system.md) — Plugin infrastructure

### Data Model

- [Database Schema](data-model/schema.md) — PostgreSQL schema specification
- [Entity Definitions](data-model/entities.md) — Entity descriptions

### API Specification

- [API Overview](api/overview.md) — API design principles
- [Environment APIs](api/environments.md) — Environment endpoints
- [Release APIs](api/releases.md) — Release endpoints
- [Promotion APIs](api/promotions.md) — Promotion endpoints
- [Workflow APIs](api/workflows.md) — Workflow endpoints
- [Agent APIs](api/agents.md) — Agent endpoints
- [WebSocket APIs](api/websockets.md) — Real-time endpoints

### Workflow Engine

- [Template Structure](workflow/templates.md) — Workflow template specification
- [Execution State Machine](workflow/execution.md) — Workflow state machine
- [Promotion State Machine](workflow/promotion.md) — Promotion state machine

### Security

- [Security Overview](security/overview.md) — Security principles
- [Authentication & Authorization](security/auth.md) — AuthN/AuthZ
- [Agent Security](security/agent-security.md) — Agent security model
- [Threat Model](security/threat-model.md) — Threats and mitigations
- [Audit Trail](security/audit-trail.md) — Audit logging

### Integrations

- [Integration Overview](integrations/overview.md) — Integration types
- [Connector Interface](integrations/connectors.md) — Connector specification
- [Webhook Architecture](integrations/webhooks.md) — Webhook handling
- [CI/CD Patterns](integrations/ci-cd.md) — CI/CD integration patterns

### Deployment

- [Deployment Overview](deployment/overview.md) — Architecture overview
- [Deployment Strategies](deployment/strategies.md) — Deployment strategies
- [Agent-Based Deployment](deployment/agent-based.md) — Agent deployment
- [Agentless Deployment](deployment/agentless.md) — SSH/WinRM deployment
- [Artifact Generation](deployment/artifacts.md) — Generated artifacts

### Progressive Delivery

- [Progressive Overview](progressive-delivery/overview.md) — Progressive delivery architecture
- [A/B Releases](progressive-delivery/ab-releases.md) — A/B release models
- [Canary Controller](progressive-delivery/canary.md) — Canary implementation
- [Router Plugins](progressive-delivery/routers.md) — Traffic routing plugins

### UI/UX

- [Dashboard Specification](ui/dashboard.md) — Dashboard screens
- [Workflow Editor](ui/workflow-editor.md) — Workflow editor
- [Screen Reference](ui/screens.md) — Key UI screens

### Operations

- [Metrics](operations/metrics.md) — Metrics specification
- [Logging](operations/logging.md) — Logging patterns
- [Tracing](operations/tracing.md) — Distributed tracing
- [Alerting](operations/alerting.md) — Alert rules

### Roadmap

- [Roadmap](roadmap.md) — Implementation phases
- [Resource Requirements](roadmap.md#resource-requirements) — Sizing

### Appendices

- [Glossary](appendices/glossary.md) — Term definitions
- [Configuration Reference](appendices/config.md) — Configuration options
- [Error Codes](appendices/errors.md) — API error codes
- [Evidence Schema](appendices/evidence-schema.md) — Evidence packet format

## Quick Reference

### Key Principles

1. **Digest-first release identity** — Releases are identified by immutable OCI digests, not mutable tags
2. **Evidence for every decision** — Every promotion and deployment produces sealed evidence
3. **Pluggable everything, stable core** — Integrations are plugins; the core stays stable
4. **No feature gating** — All plans include all features
5. **Offline-first operation** — The core works in air-gapped environments
6. **Immutable generated artifacts** — Every deployment generates stored, immutable artifacts
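The digest-first principle can be sketched in a few lines. The helper below is hypothetical (its name and error messages are illustrative, not part of the spec); it shows the intended invariant — a release component is identified only by a pinned `algorithm:hex` digest, never by a mutable tag:

```typescript
// Illustrative digest-first check: accept only digest-pinned image refs
// (e.g. "registry.example.com/app@sha256:<64 hex>"), reject tag refs.
const OCI_DIGEST = /^sha256:[a-f0-9]{64}$/;

function assertDigestRef(ref: string): string {
  const at = ref.indexOf("@");
  if (at === -1) {
    throw new Error(`release identity must be digest-pinned, got tag ref: ${ref}`);
  }
  const digest = ref.slice(at + 1);
  if (!OCI_DIGEST.test(digest)) {
    throw new Error(`invalid OCI digest: ${digest}`);
  }
  return digest; // canonical identity for the release component
}
```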

### Platform Themes

| Theme | Purpose |
|-------|---------|
| **INTHUB** | Integration hub — external system connections |
| **ENVMGR** | Environment management — environments, targets, agents |
| **RELMAN** | Release management — components, versions, releases |
| **WORKFL** | Workflow engine — DAG execution, steps |
| **PROMOT** | Promotion — approvals, gates, decisions |
| **DEPLOY** | Deployment — execution, artifacts, rollback |
| **AGENTS** | Agents — Docker, Compose, ECS, Nomad |
| **PROGDL** | Progressive delivery — A/B, canary |
| **RELEVI** | Evidence — packets, stickers, audit |
| **PLUGIN** | Plugins — registry, loader, SDK |
299
docs/modules/release-orchestrator/api/overview.md
Normal file
@@ -0,0 +1,299 @@
# API Overview

**Version**: v1

**Base Path**: `/api/v1`

## Design Principles

| Principle | Implementation |
|-----------|----------------|
| **RESTful** | Resource-oriented URLs, standard HTTP methods |
| **Versioned** | `/api/v1/...` prefix; breaking changes require a version bump |
| **Consistent** | Standard response envelope, error format, pagination |
| **Authenticated** | OAuth 2.0 Bearer tokens via the Authority module |
| **Tenant-scoped** | Tenant ID from token; all operations scoped to tenant |
| **Audited** | All mutating operations logged with user/timestamp |

## Authentication

All API requests require a valid JWT Bearer token:

```http
Authorization: Bearer <token>
```

Tokens are issued by the Authority module and contain:

- `user_id`: User identifier
- `tenant_id`: Tenant scope
- `roles`: User roles
- `permissions`: Specific permissions
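For illustration, a minimal sketch of reading these claims out of a JWT payload. This is decode-only — signature verification is deliberately omitted (that is the Authority module's job), and the claim shape is taken from the list above:

```typescript
// Decode the payload segment of a JWT and extract the claims listed above.
// NOTE: does NOT verify the signature; purely illustrates the claim shape.
interface TokenClaims {
  user_id: string;
  tenant_id: string;
  roles: string[];
  permissions: string[];
}

function decodeClaims(token: string): TokenClaims {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("not a JWT");
  const payload = Buffer.from(parts[1], "base64url").toString("utf8");
  return JSON.parse(payload) as TokenClaims;
}
```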

## Standard Response Envelope

### Success Response

```typescript
interface ApiResponse<T> {
  success: true;
  data: T;
  meta?: {
    pagination?: PaginationMeta;
    requestId: string;
    timestamp: string;
  };
}
```

### Error Response

```typescript
interface ApiErrorResponse {
  success: false;
  error: {
    code: string;             // e.g., "PROMOTION_BLOCKED"
    message: string;          // Human-readable message
    details?: object;         // Additional context
    validationErrors?: ValidationError[];
  };
  meta: {
    requestId: string;
    timestamp: string;
  };
}

interface ValidationError {
  field: string;
  message: string;
  code: string;
}
```

### Pagination

```typescript
interface PaginationMeta {
  page: number;
  pageSize: number;
  totalItems: number;
  totalPages: number;
  hasNext: boolean;
  hasPrevious: boolean;
}
```
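A small sketch of how `PaginationMeta` can be derived from a total row count and the `page`/`pageSize` query parameters (the helper name is illustrative, not part of the API):

```typescript
interface PaginationMeta {
  page: number;
  pageSize: number;
  totalItems: number;
  totalPages: number;
  hasNext: boolean;
  hasPrevious: boolean;
}

// Derive pagination metadata for a 1-indexed page over totalItems rows.
function paginate(totalItems: number, page: number, pageSize: number): PaginationMeta {
  const totalPages = Math.max(1, Math.ceil(totalItems / pageSize));
  return {
    page,
    pageSize,
    totalItems,
    totalPages,
    hasNext: page < totalPages,
    hasPrevious: page > 1,
  };
}
```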

## HTTP Status Codes

| Code | Description |
|------|-------------|
| `200` | Success |
| `201` | Created |
| `204` | No Content |
| `400` | Bad Request - validation error |
| `401` | Unauthorized - invalid/missing token |
| `403` | Forbidden - insufficient permissions |
| `404` | Not Found |
| `409` | Conflict - resource state conflict |
| `422` | Unprocessable Entity - business rule violation |
| `429` | Too Many Requests - rate limited |
| `500` | Internal Server Error |

## Common Query Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `page` | integer | Page number (1-indexed) |
| `pageSize` | integer | Items per page (max 100) |
| `sort` | string | Sort field (prefix `-` for descending) |
| `filter` | string | JSON filter expression |
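The `sort` convention (leading `-` for descending) can be parsed with a one-line helper; the `SortSpec` shape below is an assumption for illustration:

```typescript
interface SortSpec { field: string; direction: "asc" | "desc"; }

// "-createdAt" -> { field: "createdAt", direction: "desc" }
// "name"       -> { field: "name", direction: "asc" }
function parseSort(sort: string): SortSpec {
  return sort.startsWith("-")
    ? { field: sort.slice(1), direction: "desc" }
    : { field: sort, direction: "asc" };
}
```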

## API Modules

### Integration Hub (INTHUB)

```
GET    /api/v1/integration-types
GET    /api/v1/integration-types/{typeId}
POST   /api/v1/integrations
GET    /api/v1/integrations
GET    /api/v1/integrations/{id}
PUT    /api/v1/integrations/{id}
DELETE /api/v1/integrations/{id}
POST   /api/v1/integrations/{id}/test
POST   /api/v1/integrations/{id}/discover
GET    /api/v1/integrations/{id}/health
```

### Environment & Inventory (ENVMGR)

```
POST   /api/v1/environments
GET    /api/v1/environments
GET    /api/v1/environments/{id}
PUT    /api/v1/environments/{id}
DELETE /api/v1/environments/{id}
POST   /api/v1/environments/{envId}/freeze-windows
GET    /api/v1/environments/{envId}/freeze-windows
DELETE /api/v1/environments/{envId}/freeze-windows/{windowId}
POST   /api/v1/targets
GET    /api/v1/targets
GET    /api/v1/targets/{id}
PUT    /api/v1/targets/{id}
DELETE /api/v1/targets/{id}
POST   /api/v1/targets/{id}/health-check
GET    /api/v1/targets/{id}/sticker
GET    /api/v1/targets/{id}/drift
POST   /api/v1/agents/register
GET    /api/v1/agents
GET    /api/v1/agents/{id}
PUT    /api/v1/agents/{id}
DELETE /api/v1/agents/{id}
POST   /api/v1/agents/{id}/heartbeat
```

### Release Management (RELMAN)

```
POST   /api/v1/components
GET    /api/v1/components
GET    /api/v1/components/{id}
PUT    /api/v1/components/{id}
DELETE /api/v1/components/{id}
POST   /api/v1/components/{id}/sync-versions
GET    /api/v1/components/{id}/versions
POST   /api/v1/releases
GET    /api/v1/releases
GET    /api/v1/releases/{id}
PUT    /api/v1/releases/{id}
DELETE /api/v1/releases/{id}
GET    /api/v1/releases/{id}/state
POST   /api/v1/releases/{id}/deprecate
GET    /api/v1/releases/{id}/compare/{otherId}
POST   /api/v1/releases/from-latest
```

### Workflow Engine (WORKFL)

```
POST   /api/v1/workflow-templates
GET    /api/v1/workflow-templates
GET    /api/v1/workflow-templates/{id}
PUT    /api/v1/workflow-templates/{id}
DELETE /api/v1/workflow-templates/{id}
POST   /api/v1/workflow-templates/{id}/validate
GET    /api/v1/step-types
GET    /api/v1/step-types/{type}
POST   /api/v1/workflow-runs
GET    /api/v1/workflow-runs
GET    /api/v1/workflow-runs/{id}
POST   /api/v1/workflow-runs/{id}/pause
POST   /api/v1/workflow-runs/{id}/resume
POST   /api/v1/workflow-runs/{id}/cancel
GET    /api/v1/workflow-runs/{id}/steps
GET    /api/v1/workflow-runs/{id}/steps/{nodeId}
GET    /api/v1/workflow-runs/{id}/steps/{nodeId}/logs
GET    /api/v1/workflow-runs/{id}/steps/{nodeId}/artifacts
```

### Promotion & Approval (PROMOT)

```
POST   /api/v1/promotions
GET    /api/v1/promotions
GET    /api/v1/promotions/{id}
POST   /api/v1/promotions/{id}/approve
POST   /api/v1/promotions/{id}/reject
POST   /api/v1/promotions/{id}/cancel
GET    /api/v1/promotions/{id}/decision
GET    /api/v1/promotions/{id}/approvals
GET    /api/v1/promotions/{id}/evidence
POST   /api/v1/promotions/preview-gates
POST   /api/v1/approval-policies
GET    /api/v1/approval-policies
GET    /api/v1/my/pending-approvals
```

### Deployment (DEPLOY)

```
GET    /api/v1/deployment-jobs
GET    /api/v1/deployment-jobs/{id}
GET    /api/v1/deployment-jobs/{id}/tasks
GET    /api/v1/deployment-jobs/{id}/tasks/{taskId}
GET    /api/v1/deployment-jobs/{id}/tasks/{taskId}/logs
GET    /api/v1/deployment-jobs/{id}/artifacts
GET    /api/v1/deployment-jobs/{id}/artifacts/{artifactId}
POST   /api/v1/rollbacks
GET    /api/v1/rollbacks
```

### Progressive Delivery (PROGDL)

```
POST   /api/v1/ab-releases
GET    /api/v1/ab-releases
GET    /api/v1/ab-releases/{id}
POST   /api/v1/ab-releases/{id}/start
POST   /api/v1/ab-releases/{id}/advance
POST   /api/v1/ab-releases/{id}/promote
POST   /api/v1/ab-releases/{id}/rollback
GET    /api/v1/ab-releases/{id}/traffic
GET    /api/v1/ab-releases/{id}/health
GET    /api/v1/rollout-strategies
```

### Release Evidence (RELEVI)

```
GET    /api/v1/evidence-packets
GET    /api/v1/evidence-packets/{id}
GET    /api/v1/evidence-packets/{id}/download
POST   /api/v1/audit-reports
GET    /api/v1/audit-reports/{id}
GET    /api/v1/audit-reports/{id}/download
GET    /api/v1/version-stickers
GET    /api/v1/version-stickers/{id}
```

### Plugin Infrastructure (PLUGIN)

```
GET    /api/v1/plugins
GET    /api/v1/plugins/{id}
POST   /api/v1/plugins/{id}/enable
POST   /api/v1/plugins/{id}/disable
GET    /api/v1/plugins/{id}/health
POST   /api/v1/plugin-instances
GET    /api/v1/plugin-instances
PUT    /api/v1/plugin-instances/{id}
DELETE /api/v1/plugin-instances/{id}
```

## WebSocket Endpoints

```
WS /api/v1/workflow-runs/{id}/stream
WS /api/v1/deployment-jobs/{id}/stream
WS /api/v1/agents/{id}/task-stream
WS /api/v1/dashboard/stream
```

## Rate Limits

| Tier | Requests/minute | Burst |
|------|-----------------|-------|
| Standard | 1000 | 100 |
| Premium | 5000 | 500 |

Rate limit headers:

- `X-RateLimit-Limit`: Request limit
- `X-RateLimit-Remaining`: Remaining requests
- `X-RateLimit-Reset`: Reset timestamp
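A client hitting `429` can use these headers to decide how long to back off. A small sketch, assuming `X-RateLimit-Reset` carries a Unix epoch timestamp in seconds (the exact unit is not stated above, so treat this as illustrative):

```typescript
// Given a 429 response's rate-limit headers, compute how long (in ms)
// to wait before retrying. Clamps to zero if the window already reset.
function retryDelayMs(headers: Map<string, string>, nowMs: number): number {
  const reset = Number(headers.get("X-RateLimit-Reset") ?? "0");
  return Math.max(0, reset * 1000 - nowMs);
}
```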

## References

- [Environments API](environments.md)
- [Releases API](releases.md)
- [Promotions API](promotions.md)
- [Workflows API](workflows.md)
- [Agents API](agents.md)
- [WebSocket API](websockets.md)
296
docs/modules/release-orchestrator/appendices/errors.md
Normal file
@@ -0,0 +1,296 @@
# API Error Codes

## Overview

All API errors follow a consistent format with error codes for programmatic handling.

## Error Response Format

```typescript
interface ApiErrorResponse {
  success: false;
  error: {
    code: string;             // Machine-readable error code
    message: string;          // Human-readable message
    details?: object;         // Additional context
    validationErrors?: ValidationError[];
  };
  meta: {
    requestId: string;
    timestamp: string;
  };
}

interface ValidationError {
  field: string;
  message: string;
  code: string;
}
```

## Error Code Categories

| Prefix | Category | HTTP Status Range |
|--------|----------|-------------------|
| `AUTH_` | Authentication | 401 |
| `PERM_` | Authorization/Permission | 403 |
| `VAL_` | Validation | 400 |
| `RES_` | Resource | 404, 409 |
| `ENV_` | Environment | 422 |
| `REL_` | Release | 422 |
| `PROM_` | Promotion | 422 |
| `DEPLOY_` | Deployment | 422 |
| `GATE_` | Gate | 422 |
| `AGT_` | Agent | 422 |
| `INT_` | Integration | 422 |
| `WF_` | Workflow | 422 |
| `SYS_` | System | 500 |
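The prefix convention lends itself to simple programmatic categorization. A sketch (the mapping is transcribed from the table above; the helper name and category strings are illustrative):

```typescript
// Map an error code to its category via its prefix (see the table above).
const CATEGORY_BY_PREFIX: Record<string, string> = {
  AUTH_: "authentication",
  PERM_: "authorization",
  VAL_: "validation",
  RES_: "resource",
  ENV_: "environment",
  REL_: "release",
  PROM_: "promotion",
  DEPLOY_: "deployment",
  GATE_: "gate",
  AGT_: "agent",
  INT_: "integration",
  WF_: "workflow",
  SYS_: "system",
};

function categorize(code: string): string {
  for (const [prefix, category] of Object.entries(CATEGORY_BY_PREFIX)) {
    if (code.startsWith(prefix)) return category;
  }
  return "unknown";
}
```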

## Authentication Errors (401)

| Code | Message | Description |
|------|---------|-------------|
| `AUTH_TOKEN_MISSING` | Authentication token required | No token provided |
| `AUTH_TOKEN_INVALID` | Invalid authentication token | Token cannot be parsed |
| `AUTH_TOKEN_EXPIRED` | Authentication token expired | Token has expired |
| `AUTH_TOKEN_REVOKED` | Authentication token revoked | Token has been revoked |
| `AUTH_AGENT_CERT_INVALID` | Invalid agent certificate | Agent mTLS cert invalid |
| `AUTH_AGENT_CERT_EXPIRED` | Agent certificate expired | Agent cert has expired |
| `AUTH_API_KEY_INVALID` | Invalid API key | API key not recognized |

## Permission Errors (403)

| Code | Message | Description |
|------|---------|-------------|
| `PERM_DENIED` | Permission denied | Generic permission denial |
| `PERM_RESOURCE_DENIED` | Access to resource denied | Cannot access specific resource |
| `PERM_ACTION_DENIED` | Action not permitted | Cannot perform specific action |
| `PERM_SCOPE_DENIED` | Outside permitted scope | Action outside user's scope |
| `PERM_SOD_VIOLATION` | Separation of duties violation | SoD prevents action |
| `PERM_SELF_APPROVAL` | Cannot approve own request | Self-approval not allowed |
| `PERM_TENANT_MISMATCH` | Tenant mismatch | Resource belongs to different tenant |

## Validation Errors (400)

| Code | Message | Description |
|------|---------|-------------|
| `VAL_REQUIRED_FIELD` | Required field missing | Field is required |
| `VAL_INVALID_FORMAT` | Invalid field format | Field format incorrect |
| `VAL_INVALID_VALUE` | Invalid field value | Value not in allowed set |
| `VAL_TOO_LONG` | Field value too long | Exceeds max length |
| `VAL_TOO_SHORT` | Field value too short | Below min length |
| `VAL_INVALID_UUID` | Invalid UUID format | Not a valid UUID |
| `VAL_INVALID_DIGEST` | Invalid digest format | Not a valid OCI digest |
| `VAL_INVALID_SEMVER` | Invalid semver format | Not a valid semantic version |
| `VAL_INVALID_JSON` | Invalid JSON | Request body not valid JSON |
| `VAL_SCHEMA_MISMATCH` | Schema validation failed | Doesn't match schema |

## Resource Errors (404, 409)

| Code | Message | HTTP | Description |
|------|---------|------|-------------|
| `RES_NOT_FOUND` | Resource not found | 404 | Generic not found |
| `RES_ENVIRONMENT_NOT_FOUND` | Environment not found | 404 | Environment doesn't exist |
| `RES_RELEASE_NOT_FOUND` | Release not found | 404 | Release doesn't exist |
| `RES_PROMOTION_NOT_FOUND` | Promotion not found | 404 | Promotion doesn't exist |
| `RES_TARGET_NOT_FOUND` | Target not found | 404 | Target doesn't exist |
| `RES_AGENT_NOT_FOUND` | Agent not found | 404 | Agent doesn't exist |
| `RES_CONFLICT` | Resource conflict | 409 | Resource state conflict |
| `RES_ALREADY_EXISTS` | Resource already exists | 409 | Duplicate resource |
| `RES_VERSION_CONFLICT` | Version conflict | 409 | Optimistic lock failure |

## Environment Errors (422)

| Code | Message | Description |
|------|---------|-------------|
| `ENV_FROZEN` | Environment is frozen | Deployment blocked by freeze window |
| `ENV_FREEZE_ACTIVE` | Active freeze window | Cannot modify during freeze |
| `ENV_INVALID_ORDER` | Invalid environment order | Order index conflict |
| `ENV_CIRCULAR_PROMOTION` | Circular promotion path | Auto-promote creates a cycle |
| `ENV_QUOTA_EXCEEDED` | Environment quota exceeded | Max environments reached |

## Release Errors (422)

| Code | Message | Description |
|------|---------|-------------|
| `REL_ALREADY_FINALIZED` | Release already finalized | Cannot modify finalized release |
| `REL_NOT_READY` | Release not ready | Release not in ready state |
| `REL_DIGEST_MISMATCH` | Digest mismatch | Resolved digest differs |
| `REL_TAG_NOT_FOUND` | Tag not found in registry | Cannot resolve tag |
| `REL_COMPONENT_MISSING` | Component not found | Referenced component missing |
| `REL_INVALID_STATUS_TRANSITION` | Invalid status transition | Status change not allowed |
| `REL_DEPRECATED` | Release deprecated | Cannot promote deprecated release |

## Promotion Errors (422)

| Code | Message | Description |
|------|---------|-------------|
| `PROM_ALREADY_EXISTS` | Promotion already pending | Duplicate promotion request |
| `PROM_NOT_PENDING` | Promotion not pending | Cannot approve/reject |
| `PROM_ALREADY_APPROVED` | Promotion already approved | Already approved |
| `PROM_ALREADY_REJECTED` | Promotion already rejected | Already rejected |
| `PROM_ALREADY_CANCELLED` | Promotion already cancelled | Already cancelled |
| `PROM_DEPLOYING` | Promotion is deploying | Cannot cancel during deploy |
| `PROM_INVALID_STATE` | Invalid promotion state | State doesn't allow action |
| `PROM_APPROVER_REQUIRED` | Additional approvers required | Insufficient approvals |
| `PROM_SKIP_ENVIRONMENT` | Cannot skip environments | Must promote sequentially |

## Deployment Errors (422)

| Code | Message | Description |
|------|---------|-------------|
| `DEPLOY_IN_PROGRESS` | Deployment in progress | Another deployment running |
| `DEPLOY_NO_TARGETS` | No targets available | No targets in environment |
| `DEPLOY_TARGET_UNHEALTHY` | Target unhealthy | Target failed health check |
| `DEPLOY_AGENT_UNAVAILABLE` | Agent unavailable | Required agent offline |
| `DEPLOY_ARTIFACT_MISSING` | Deployment artifact missing | Required artifact not found |
| `DEPLOY_TIMEOUT` | Deployment timeout | Exceeded timeout |
| `DEPLOY_PULL_FAILED` | Image pull failed | Cannot pull container image |
| `DEPLOY_DIGEST_VERIFICATION_FAILED` | Digest verification failed | Image may have been tampered with |
| `DEPLOY_HEALTH_CHECK_FAILED` | Health check failed | Post-deploy health failed |
| `DEPLOY_ROLLBACK_IN_PROGRESS` | Rollback in progress | Already rolling back |
| `DEPLOY_NOTHING_TO_ROLLBACK` | Nothing to roll back | No previous deployment |

## Gate Errors (422)

| Code | Message | Description |
|------|---------|-------------|
| `GATE_EVALUATION_FAILED` | Gate evaluation failed | Gate cannot be evaluated |
| `GATE_SECURITY_BLOCKED` | Blocked by security gate | Security policy violation |
| `GATE_POLICY_BLOCKED` | Blocked by policy gate | Custom policy violation |
| `GATE_APPROVAL_BLOCKED` | Blocked pending approval | Awaiting approval |
| `GATE_TIMEOUT` | Gate evaluation timeout | Evaluation exceeded timeout |

## Agent Errors (422)

| Code | Message | Description |
|------|---------|-------------|
| `AGT_REGISTRATION_FAILED` | Agent registration failed | Cannot register agent |
| `AGT_TOKEN_INVALID` | Invalid registration token | Bad or expired token |
| `AGT_TOKEN_USED` | Registration token already used | One-time token reused |
| `AGT_CERTIFICATE_FAILED` | Certificate issuance failed | Cannot issue certificate |
| `AGT_OFFLINE` | Agent offline | Agent not responding |
| `AGT_CAPABILITY_MISSING` | Missing capability | Agent lacks required capability |
| `AGT_TASK_FAILED` | Task execution failed | Agent task failed |
| `AGT_HEARTBEAT_TIMEOUT` | Heartbeat timeout | Agent heartbeat overdue |

## Integration Errors (422)

| Code | Message | Description |
|------|---------|-------------|
| `INT_CONNECTION_FAILED` | Connection failed | Cannot connect to integration |
| `INT_AUTH_FAILED` | Authentication failed | Integration auth failed |
| `INT_RATE_LIMITED` | Rate limited | Integration rate limit hit |
| `INT_TIMEOUT` | Integration timeout | Request timeout |
| `INT_INVALID_RESPONSE` | Invalid response | Unexpected response format |
| `INT_RESOURCE_NOT_FOUND` | External resource not found | Registry/SCM resource missing |

## Workflow Errors (422)

| Code | Message | Description |
|------|---------|-------------|
| `WF_TEMPLATE_NOT_FOUND` | Workflow template not found | Template doesn't exist |
| `WF_TEMPLATE_INVALID` | Invalid workflow template | Template validation failed |
| `WF_CYCLE_DETECTED` | Cycle detected in workflow | DAG contains a cycle |
| `WF_STEP_FAILED` | Workflow step failed | Step execution failed |
| `WF_ALREADY_RUNNING` | Workflow already running | Duplicate workflow run |
| `WF_INVALID_STATE` | Invalid workflow state | Cannot perform action |
| `WF_EXPRESSION_ERROR` | Expression evaluation error | Bad expression |

## System Errors (500)

| Code | Message | Description |
|------|---------|-------------|
| `SYS_INTERNAL_ERROR` | Internal server error | Unexpected error |
| `SYS_DATABASE_ERROR` | Database error | Database operation failed |
| `SYS_STORAGE_ERROR` | Storage error | Storage operation failed |
| `SYS_VAULT_ERROR` | Vault error | Secret retrieval failed |
| `SYS_QUEUE_ERROR` | Queue error | Message queue failed |
| `SYS_SERVICE_UNAVAILABLE` | Service unavailable | Dependency unavailable |
| `SYS_OVERLOADED` | System overloaded | Capacity exceeded |

## Example Error Responses

### Validation Error

```json
{
  "success": false,
  "error": {
    "code": "VAL_REQUIRED_FIELD",
    "message": "Validation failed",
    "validationErrors": [
      {
        "field": "releaseId",
        "message": "Release ID is required",
        "code": "VAL_REQUIRED_FIELD"
      },
      {
        "field": "targetEnvironmentId",
        "message": "Invalid UUID format",
        "code": "VAL_INVALID_UUID"
      }
    ]
  },
  "meta": {
    "requestId": "req-12345",
    "timestamp": "2026-01-10T14:30:00Z"
  }
}
```

### Permission Error

```json
{
  "success": false,
  "error": {
    "code": "PERM_SOD_VIOLATION",
    "message": "Separation of duties violation: requester cannot approve their own promotion",
    "details": {
      "promotionId": "promo-uuid",
      "requesterId": "user-uuid",
      "approverId": "user-uuid",
      "environmentId": "env-uuid",
      "requiresSoD": true
    }
  },
  "meta": {
    "requestId": "req-12345",
    "timestamp": "2026-01-10T14:30:00Z"
  }
}
```

### Gate Block Error

```json
{
  "success": false,
  "error": {
    "code": "GATE_SECURITY_BLOCKED",
    "message": "Promotion blocked by security gate",
    "details": {
      "gateName": "security-gate",
      "releaseId": "rel-uuid",
      "targetEnvironment": "production",
      "violations": [
        {
          "type": "critical_vulnerability",
          "count": 3,
          "threshold": 0
        }
      ]
    }
  },
  "meta": {
    "requestId": "req-12345",
    "timestamp": "2026-01-10T14:30:00Z"
  }
}
```

## References

- [API Overview](../api/overview.md)
- [Security Overview](../security/overview.md)
549
docs/modules/release-orchestrator/appendices/evidence-schema.md
Normal file
@@ -0,0 +1,549 @@
# Evidence Packet Schema

## Overview

Evidence packets are cryptographically signed, immutable records of deployment decisions and outcomes. They provide audit-grade proof of who did what, when, and why.

## Evidence Packet Types

| Type | Description | Generated When |
|------|-------------|----------------|
| `release_decision` | Promotion decision evidence | Promotion approved/rejected |
| `deployment` | Deployment execution evidence | Deployment completes |
| `rollback` | Rollback evidence | Rollback completes |
| `ab_promotion` | A/B release promotion evidence | A/B promotion completes |

## Schema Definition

### Evidence Packet Structure

```typescript
interface EvidencePacket {
  // Identification
  id: UUID;
  version: "1.0";
  type: EvidencePacketType;

  // Metadata
  generatedAt: DateTime;
  generatorVersion: string;
  tenantId: UUID;

  // Content
  content: EvidenceContent;

  // Integrity
  contentHash: string;        // SHA-256 of canonical JSON content
  signature: string;          // Base64-encoded signature
  signatureAlgorithm: string; // "RS256", "ES256"
  signerKeyRef: string;       // Reference to signing key
}

type EvidencePacketType =
  | "release_decision"
  | "deployment"
  | "rollback"
  | "ab_promotion";
```
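
The `contentHash` field is computed over a canonical JSON serialization of `content`, so that key ordering and whitespace differences never change the hash. A minimal sketch of one such canonicalization (recursively sorting object keys before hashing; the helper names are illustrative, not part of the schema):

```typescript
import { createHash } from "node:crypto";

// Recursively sort object keys so two semantically equal contents
// always serialize to the same byte sequence.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => JSON.stringify(k) + ":" + canonicalize(obj[k]));
    return "{" + entries.join(",") + "}";
  }
  return JSON.stringify(value);
}

function contentHash(content: object): string {
  return "sha256:" + createHash("sha256").update(canonicalize(content)).digest("hex");
}
```

With this scheme, `contentHash({ a: 1, b: 2 })` and `contentHash({ b: 2, a: 1 })` produce the same digest.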

### Evidence Content

```typescript
interface EvidenceContent {
  // What was released
  release: ReleaseEvidence;

  // Where it was released
  environment: EnvironmentEvidence;

  // Who requested and approved
  actors: ActorEvidence;

  // Why it was allowed
  decision: DecisionEvidence;

  // How it was executed (deployment only)
  execution?: ExecutionEvidence;

  // Previous state (for rollback)
  previous?: PreviousStateEvidence;
}
```

### Release Evidence

```typescript
interface ReleaseEvidence {
  id: UUID;
  name: string;
  displayName: string;
  createdAt: DateTime;
  createdBy: ActorRef;

  components: Array<{
    id: UUID;
    name: string;
    digest: string;
    semver: string;
    tag: string;
    role: "primary" | "sidecar" | "init" | "migration";
  }>;

  sourceRef?: {
    scmIntegrationId?: UUID;
    repository?: string;
    commitSha?: string;
    branch?: string;
    ciIntegrationId?: UUID;
    buildId?: string;
    pipelineUrl?: string;
  };
}
```

### Environment Evidence

```typescript
interface EnvironmentEvidence {
  id: UUID;
  name: string;
  displayName: string;
  orderIndex: number;

  targets: Array<{
    id: UUID;
    name: string;
    type: string;
    healthStatus: string;
  }>;

  configuration: {
    requiredApprovals: number;
    requireSeparationOfDuties: boolean;
    promotionPolicy?: string;
    deploymentTimeout: number;
  };
}
```

### Actor Evidence

```typescript
interface ActorEvidence {
  requester: ActorRef;
  requestReason: string;
  requestedAt: DateTime;

  approvers: Array<{
    actor: ActorRef;
    action: "approved" | "rejected";
    comment?: string;
    timestamp: DateTime;
    roles: string[];
  }>;

  deployer?: {
    agent: AgentRef;
    triggeredBy: ActorRef;
    startedAt: DateTime;
  };
}

interface ActorRef {
  id: UUID;
  type: "user" | "system" | "agent";
  name: string;
  email?: string;
}

interface AgentRef {
  id: UUID;
  name: string;
  version: string;
}
```

### Decision Evidence

```typescript
interface DecisionEvidence {
  promotionId: UUID;
  decision: "allow" | "block";
  decidedAt: DateTime;

  gateResults: Array<{
    gateName: string;
    gateType: string;
    passed: boolean;
    blocking: boolean;
    message: string;
    evaluatedAt: DateTime;
    details: object;
  }>;

  freezeWindowCheck: {
    checked: boolean;
    windowActive: boolean;
    windowId?: UUID;
    exemption?: {
      grantedBy: UUID;
      reason: string;
    };
  };

  separationOfDuties: {
    required: boolean;
    satisfied: boolean;
    requesterIds: UUID[];
    approverIds: UUID[];
  };
}
```
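
Two of the checks recorded above reduce to simple set logic. A hedged sketch of how an evaluator might derive `separationOfDuties.satisfied` and the approval-gate result (the function names are assumptions for illustration, not the real module API):

```typescript
interface SodCheck {
  required: boolean;
  satisfied: boolean;
  requesterIds: string[];
  approverIds: string[];
}

// SoD is satisfied when it is not required, or when no approver
// also appears among the requesters.
function checkSeparationOfDuties(
  required: boolean,
  requesterIds: string[],
  approverIds: string[],
): SodCheck {
  const requesters = new Set(requesterIds);
  const satisfied = !required || approverIds.every((id) => !requesters.has(id));
  return { required, satisfied, requesterIds, approverIds };
}

// The approval gate passes once enough distinct approvers have approved.
function approvalGatePassed(requiredApprovals: number, approverIds: string[]): boolean {
  return new Set(approverIds).size >= requiredApprovals;
}
```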

### Execution Evidence

```typescript
interface ExecutionEvidence {
  deploymentJobId: UUID;
  strategy: string;
  startedAt: DateTime;
  completedAt: DateTime;
  status: "succeeded" | "failed" | "rolled_back";

  tasks: Array<{
    targetId: UUID;
    targetName: string;
    agentId: UUID;
    status: string;
    startedAt: DateTime;
    completedAt: DateTime;
    digest: string;
    stickerWritten: boolean;
    error?: string;
  }>;

  artifacts: Array<{
    name: string;
    type: string;
    sha256: string;
    storageRef: string;
  }>;

  metrics: {
    totalTasks: number;
    succeededTasks: number;
    failedTasks: number;
    totalDurationSeconds: number;
  };
}
```

### Previous State Evidence

```typescript
interface PreviousStateEvidence {
  releaseId: UUID;
  releaseName: string;
  deployedAt: DateTime;
  deployedBy: ActorRef;
  components: Array<{
    name: string;
    digest: string;
  }>;
}
```

## Example Evidence Packet

```json
{
  "id": "evid-12345-uuid",
  "version": "1.0",
  "type": "deployment",
  "generatedAt": "2026-01-10T14:35:00Z",
  "generatorVersion": "stella-evidence-generator@1.5.0",
  "tenantId": "tenant-uuid",

  "content": {
    "release": {
      "id": "rel-uuid",
      "name": "myapp-v2.3.1",
      "displayName": "MyApp v2.3.1",
      "createdAt": "2026-01-10T10:00:00Z",
      "createdBy": {
        "id": "user-uuid",
        "type": "user",
        "name": "John Doe",
        "email": "john@example.com"
      },
      "components": [
        {
          "id": "comp-api-uuid",
          "name": "api",
          "digest": "sha256:abc123def456...",
          "semver": "2.3.1",
          "tag": "v2.3.1",
          "role": "primary"
        },
        {
          "id": "comp-worker-uuid",
          "name": "worker",
          "digest": "sha256:789xyz...",
          "semver": "2.3.1",
          "tag": "v2.3.1",
          "role": "primary"
        }
      ],
      "sourceRef": {
        "repository": "github.com/myorg/myapp",
        "commitSha": "abc123",
        "branch": "main",
        "buildId": "build-456"
      }
    },

    "environment": {
      "id": "env-prod-uuid",
      "name": "production",
      "displayName": "Production",
      "orderIndex": 2,
      "targets": [
        {
          "id": "target-1-uuid",
          "name": "prod-web-01",
          "type": "compose_host",
          "healthStatus": "healthy"
        },
        {
          "id": "target-2-uuid",
          "name": "prod-web-02",
          "type": "compose_host",
          "healthStatus": "healthy"
        }
      ],
      "configuration": {
        "requiredApprovals": 2,
        "requireSeparationOfDuties": true,
        "deploymentTimeout": 600
      }
    },

    "actors": {
      "requester": {
        "id": "user-john-uuid",
        "type": "user",
        "name": "John Doe",
        "email": "john@example.com"
      },
      "requestReason": "Release v2.3.1 with performance improvements",
      "requestedAt": "2026-01-10T12:00:00Z",
      "approvers": [
        {
          "actor": {
            "id": "user-jane-uuid",
            "type": "user",
            "name": "Jane Smith",
            "email": "jane@example.com"
          },
          "action": "approved",
          "comment": "LGTM, tests passed",
          "timestamp": "2026-01-10T13:00:00Z",
          "roles": ["release_manager"]
        },
        {
          "actor": {
            "id": "user-bob-uuid",
            "type": "user",
            "name": "Bob Johnson",
            "email": "bob@example.com"
          },
          "action": "approved",
          "comment": "Approved for production",
          "timestamp": "2026-01-10T13:30:00Z",
          "roles": ["approver"]
        }
      ],
      "deployer": {
        "agent": {
          "id": "agent-prod-uuid",
          "name": "prod-agent-01",
          "version": "1.5.0"
        },
        "triggeredBy": {
          "id": "system",
          "type": "system",
          "name": "Stella Orchestrator"
        },
        "startedAt": "2026-01-10T14:00:00Z"
      }
    },

    "decision": {
      "promotionId": "promo-uuid",
      "decision": "allow",
      "decidedAt": "2026-01-10T13:55:00Z",
      "gateResults": [
        {
          "gateName": "security-gate",
          "gateType": "security",
          "passed": true,
          "blocking": true,
          "message": "No critical or high vulnerabilities",
          "evaluatedAt": "2026-01-10T13:50:00Z",
          "details": {
            "critical": 0,
            "high": 0,
            "medium": 5,
            "low": 12
          }
        },
        {
          "gateName": "approval-gate",
          "gateType": "approval",
          "passed": true,
          "blocking": true,
          "message": "2/2 required approvals received",
          "evaluatedAt": "2026-01-10T13:55:00Z",
          "details": {
            "required": 2,
            "received": 2
          }
        }
      ],
      "freezeWindowCheck": {
        "checked": true,
        "windowActive": false
      },
      "separationOfDuties": {
        "required": true,
        "satisfied": true,
        "requesterIds": ["user-john-uuid"],
        "approverIds": ["user-jane-uuid", "user-bob-uuid"]
      }
    },

    "execution": {
      "deploymentJobId": "job-uuid",
      "strategy": "rolling",
      "startedAt": "2026-01-10T14:00:00Z",
      "completedAt": "2026-01-10T14:35:00Z",
      "status": "succeeded",
      "tasks": [
        {
          "targetId": "target-1-uuid",
          "targetName": "prod-web-01",
          "agentId": "agent-prod-uuid",
          "status": "succeeded",
          "startedAt": "2026-01-10T14:00:00Z",
          "completedAt": "2026-01-10T14:15:00Z",
          "digest": "sha256:abc123def456...",
          "stickerWritten": true
        },
        {
          "targetId": "target-2-uuid",
          "targetName": "prod-web-02",
          "agentId": "agent-prod-uuid",
          "status": "succeeded",
          "startedAt": "2026-01-10T14:20:00Z",
          "completedAt": "2026-01-10T14:35:00Z",
          "digest": "sha256:abc123def456...",
          "stickerWritten": true
        }
      ],
      "artifacts": [
        {
          "name": "compose.stella.lock.yml",
          "type": "compose-lock",
          "sha256": "checksum...",
          "storageRef": "s3://artifacts/job-uuid/compose.stella.lock.yml"
        }
      ],
      "metrics": {
        "totalTasks": 2,
        "succeededTasks": 2,
        "failedTasks": 0,
        "totalDurationSeconds": 2100
      }
    }
  },

  "contentHash": "sha256:content-hash...",
  "signature": "base64-signature...",
  "signatureAlgorithm": "RS256",
  "signerKeyRef": "stella/signing/prod-key-2026"
}
```

## Signature Verification

```typescript
// canonicalize, sha256, getPublicKey, and verify are platform helpers
// assumed to be in scope.
async function verifyEvidencePacket(packet: EvidencePacket): Promise<VerificationResult> {
  // 1. Verify content hash
  const canonicalContent = canonicalize(packet.content);
  const computedHash = sha256(canonicalContent);

  if (computedHash !== packet.contentHash) {
    return { valid: false, error: "Content hash mismatch" };
  }

  // 2. Get signing key
  const publicKey = await getPublicKey(packet.signerKeyRef);

  // 3. Verify signature
  const signatureValid = await verify(
    packet.signature,
    packet.contentHash,
    publicKey,
    packet.signatureAlgorithm
  );

  if (!signatureValid) {
    return { valid: false, error: "Invalid signature" };
  }

  return { valid: true };
}
```
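
The signature step itself can be sketched with Node's built-in crypto as an ES256-style sign/verify roundtrip over the content hash. The key handling below is illustrative only: an ephemeral P-256 keypair stands in for the key that `signerKeyRef` would resolve to in a real deployment.

```typescript
import { generateKeyPairSync, createSign, createVerify } from "node:crypto";

// Illustrative only: ephemeral keypair in place of the signerKeyRef key.
const { publicKey, privateKey } = generateKeyPairSync("ec", { namedCurve: "P-256" });

function signContentHash(contentHash: string): string {
  const signer = createSign("SHA256");
  signer.update(contentHash);
  return signer.sign(privateKey, "base64"); // Base64-encoded signature
}

function verifySignature(contentHash: string, signature: string): boolean {
  const verifier = createVerify("SHA256");
  verifier.update(contentHash);
  return verifier.verify(publicKey, signature, "base64");
}
```

Any change to the hashed content after signing makes `verifySignature` return `false`, which is what makes tampering with a sealed packet detectable.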

## Storage

Evidence packets are stored in an append-only table:

```sql
CREATE TABLE release.evidence_packets (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    promotion_id UUID NOT NULL REFERENCES release.promotions(id),
    type TEXT NOT NULL,
    version TEXT NOT NULL DEFAULT '1.0',
    content JSONB NOT NULL,
    content_hash TEXT NOT NULL,
    signature TEXT NOT NULL,
    signature_algorithm TEXT NOT NULL,
    signer_key_ref TEXT NOT NULL,
    generated_at TIMESTAMPTZ NOT NULL,
    generator_version TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
    -- Note: no updated_at column; packets are immutable
);

-- Prevent modifications
REVOKE UPDATE, DELETE ON release.evidence_packets FROM app_role;
```

## Export Formats

Evidence packets can be exported in multiple formats:

| Format | Use Case |
|--------|----------|
| JSON | API consumption, archival |
| PDF | Human-readable compliance reports |
| CSV | Spreadsheet analysis |
| SLSA | SLSA provenance format |

## References

- [Security Overview](../security/overview.md)
- [Deployment Artifacts](../deployment/artifacts.md)
- [Audit Trail](../security/audit-trail.md)
235
docs/modules/release-orchestrator/appendices/glossary.md
Normal file
@@ -0,0 +1,235 @@

# Glossary

## Core Concepts

### Agent
A software component installed on deployment targets that receives and executes deployment tasks. Agents communicate with the orchestrator via mTLS and execute deployments locally on the target.

### Approval
A human decision to authorize a promotion request. Approvals may require multiple approvers and enforce separation of duties.

### Approval Policy
Rules defining who can approve promotions to specific environments, including required approval counts and SoD requirements.

### Blue-Green Deployment
A deployment strategy using two identical production environments. Traffic switches from "blue" (current) to "green" (new) after validation.

### Canary Deployment
A deployment strategy that gradually rolls out changes to a small subset of targets before full deployment, allowing validation with real traffic.

### Channel
A version stream for components (e.g., "stable", "beta", "nightly"). Each channel tracks the latest compatible version.

### Component
A deployable unit mapped to a container image repository. Component versions are tracked via digest.

### Compose Lock
A Docker Compose file with all image references pinned to specific digests, ensuring reproducible deployments.

### Connector
A plugin that integrates the Release Orchestrator with external systems (registries, CI/CD, notifications, etc.).

### Decision Record
An immutable record of all gate evaluations and conditions considered when making a promotion decision.

### Deployment Job
A unit of work representing the deployment of a release to an environment. Contains multiple deployment tasks.

### Deployment Task
A single target-level deployment operation within a deployment job.

### Digest
A cryptographic hash (SHA-256) that uniquely identifies a container image. Format: `sha256:abc123...`

### Drift
A mismatch between the expected deployed version (from the version sticker) and the actual running version on a target.

### Environment
A logical grouping of deployment targets representing a stage in the promotion pipeline (e.g., dev, staging, production).

### Evidence Packet
An immutable, cryptographically signed record of deployment decisions and outcomes for audit purposes.

### Freeze Window
A time period during which deployments to an environment are blocked (e.g., a holiday code freeze).

### Gate
A checkpoint in the promotion workflow that must pass before deployment proceeds. Types include security gates, approval gates, and custom policy gates.

### Promotion
The process of moving a release from one environment to another, subject to gates and approvals.

### Release
A versioned bundle of component digests representing a deployable unit. Releases are immutable once created.

### Rollback
The process of reverting to a previous release version when a deployment fails or causes issues.

### Rolling Deployment
A deployment strategy that updates targets in batches, maintaining availability throughout the process.

### Security Gate
An automated gate that evaluates security policies (vulnerability thresholds, compliance requirements) before allowing promotion.

### Separation of Duties (SoD)
A security principle requiring that the person who requests a promotion cannot be the same person who approves it.

### Step
A single unit of work within a workflow template. Steps have types (deploy, approve, notify, etc.) and can have dependencies.

### Target
A specific deployment destination (host, service, container) within an environment.

### Tenant
An isolated organizational unit with its own environments, releases, and configurations. Multi-tenancy ensures data isolation.

### Version Map
A mapping of image tags to digests for a component, allowing tag-based references while maintaining digest-based deployments.

### Version Sticker
Metadata placed on deployment targets indicating the currently deployed release and digest.

### Workflow
A DAG (directed acyclic graph) of steps defining the deployment process, including gates, approvals, and verification.

### Workflow Template
A reusable workflow definition that can be customized for specific deployment scenarios.

## Module Abbreviations

| Abbreviation | Full Name | Description |
|--------------|-----------|-------------|
| INTHUB | Integration Hub | External system integration |
| ENVMGR | Environment Manager | Environment and target management |
| RELMAN | Release Management | Component and release management |
| WORKFL | Workflow Engine | Workflow execution |
| PROMOT | Promotion & Approval | Promotion and approval handling |
| DEPLOY | Deployment Execution | Deployment orchestration |
| AGENTS | Deployment Agents | Agent management |
| PROGDL | Progressive Delivery | A/B and canary releases |
| RELEVI | Release Evidence | Audit and compliance |
| PLUGIN | Plugin Infrastructure | Plugin system |

## Deployment Strategies

| Strategy | Description |
|----------|-------------|
| All-at-once | Deploy to all targets simultaneously |
| Rolling | Deploy in batches while maintaining availability |
| Canary | Gradual rollout with metrics validation |
| Blue-Green | Parallel environment with traffic switch |

## Status Values

### Promotion Status

| Status | Description |
|--------|-------------|
| `pending` | Promotion created, not yet evaluated |
| `pending_approval` | Waiting for human approval |
| `approved` | Approved, ready for deployment |
| `rejected` | Rejected by approver |
| `deploying` | Deployment in progress |
| `completed` | Successfully deployed |
| `failed` | Deployment failed |
| `cancelled` | Cancelled by user |

### Deployment Job Status

| Status | Description |
|--------|-------------|
| `pending` | Job created, not started |
| `preparing` | Generating artifacts |
| `running` | Tasks executing |
| `completing` | Verifying deployment |
| `completed` | Successfully completed |
| `failed` | Deployment failed |
| `rolling_back` | Rollback in progress |
| `rolled_back` | Rollback completed |

### Agent Status

| Status | Description |
|--------|-------------|
| `online` | Agent connected and healthy |
| `offline` | Agent not connected |
| `degraded` | Agent connected but reporting issues |

### Target Health Status

| Status | Description |
|--------|-------------|
| `healthy` | Target responding correctly |
| `unhealthy` | Target failing health checks |
| `unknown` | Health status not determined |

## API Error Codes

| Code | Description |
|------|-------------|
| `RELEASE_NOT_FOUND` | Release ID does not exist |
| `ENVIRONMENT_NOT_FOUND` | Environment ID does not exist |
| `PROMOTION_BLOCKED` | Promotion blocked by a gate or freeze window |
| `APPROVAL_REQUIRED` | Promotion requires approval |
| `INSUFFICIENT_APPROVALS` | Not enough approvals received |
| `SOD_VIOLATION` | Separation of duties violated |
| `FREEZE_WINDOW_ACTIVE` | Environment is in a freeze window |
| `SECURITY_GATE_FAILED` | Security requirements not met |
| `NO_AGENT_AVAILABLE` | No agent available for the target |
| `DEPLOYMENT_IN_PROGRESS` | Another deployment is running |
| `ROLLBACK_NOT_POSSIBLE` | No previous version to roll back to |

## Integration Types

| Type | Category | Description |
|------|----------|-------------|
| `docker-registry` | Registry | Docker Registry v2 |
| `ecr` | Registry | AWS ECR |
| `acr` | Registry | Azure Container Registry |
| `gcr` | Registry | Google Container Registry |
| `harbor` | Registry | Harbor Registry |
| `gitlab-ci` | CI/CD | GitLab CI/CD |
| `github-actions` | CI/CD | GitHub Actions |
| `jenkins` | CI/CD | Jenkins |
| `slack` | Notification | Slack |
| `teams` | Notification | Microsoft Teams |
| `email` | Notification | Email (SMTP) |
| `hashicorp-vault` | Secrets | HashiCorp Vault |
| `prometheus` | Metrics | Prometheus |

## Workflow Step Types

| Type | Category | Description |
|------|----------|-------------|
| `approval` | Control | Wait for human approval |
| `wait` | Control | Wait for a duration |
| `condition` | Control | Branch based on a condition |
| `parallel` | Control | Execute children in parallel |
| `security-gate` | Gate | Evaluate security policy |
| `custom-gate` | Gate | Custom OPA policy |
| `freeze-check` | Gate | Check freeze windows |
| `deploy-docker` | Deploy | Deploy a single container |
| `deploy-compose` | Deploy | Deploy a Compose stack |
| `health-check` | Verify | HTTP/TCP health check |
| `smoke-test` | Verify | Run smoke tests |
| `notify` | Notify | Send a notification |
| `webhook` | Integration | Call an external webhook |
| `trigger-ci` | Integration | Trigger a CI pipeline |
| `rollback` | Recovery | Roll back a deployment |

## Security Terms

| Term | Description |
|------|-------------|
| mTLS | Mutual TLS: both client and server authenticate with certificates |
| JWT | JSON Web Token, used for API authentication |
| RBAC | Role-Based Access Control |
| OPA | Open Policy Agent, a policy evaluation engine |
| SoD | Separation of Duties |
| PEP | Policy Enforcement Point |

## References

- [Design Principles](../design/principles.md)
- [API Overview](../api/overview.md)
- [Security Overview](../security/overview.md)
410
docs/modules/release-orchestrator/architecture.md
Normal file
@@ -0,0 +1,410 @@

# Release Orchestrator Architecture

> Technical architecture specification for the Release Orchestrator — Stella Ops Suite's central release control plane for non-Kubernetes container estates.

**Status:** Planned (not yet implemented)

## Overview

The Release Orchestrator transforms Stella Ops Suite from a vulnerability scanning platform into a centralized, auditable release control plane. It sits between CI systems and runtime targets, governing promotion across environments, enforcing security and policy gates, and producing verifiable evidence for every release decision.

### Core Value Proposition

- **Release orchestration** — UI-driven promotion (Dev → Stage → Prod), approvals, policy gates, rollbacks
- **Security decisioning as a gate** — Scan on build, evaluate on release, re-evaluate on CVE updates
- **OCI-digest-first releases** — Immutable digest-based release identity
- **Toolchain-agnostic integrations** — Plug into any SCM, CI, registry, or secrets system
- **Auditability + standards** — Evidence packets, SBOM/VEX/attestation support, deterministic replay

## Design Principles

1. **Digest-First Release Identity** — A release is an immutable set of OCI digests, never mutable tags. Tags are resolved to digests at release creation time.

2. **Pluggable Everything, Stable Core** — Integrations are plugins; the core orchestration engine is stable. Plugins contribute UI screens, connector logic, step types, and agent types.

3. **Evidence for Every Decision** — Every deployment/promotion produces an immutable evidence record containing who, what, why, how, and when.

4. **No Feature Gating** — All plans include all features. The only limits are environments, new digests per day, and fair use on deployments.

5. **Offline-First Operation** — All core operations work in air-gapped environments. Plugins may require connectivity; the core does not.

6. **Immutable Generated Artifacts** — Every deployment generates and stores immutable artifacts (compose lockfiles, scripts, evidence).
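
Principle 1 can be sketched as a small helper that resolves mutable tags to digests exactly once, at release creation, and then freezes the result. Everything here is illustrative: the resolver is a stand-in for a registry lookup, not a real API.

```typescript
type DigestResolver = (repository: string, tag: string) => Promise<string>;

interface PinnedComponent {
  repository: string;
  digest: string;          // sha256:...
  resolvedFromTag: string; // kept for traceability only
}

// Resolve every tag once, then freeze: from this point on, the
// release identity is the set of digests, never the tags.
async function createReleaseComponents(
  components: Array<{ repository: string; tag: string }>,
  resolve: DigestResolver,
): Promise<readonly PinnedComponent[]> {
  const pinned = await Promise.all(
    components.map(async (c) => ({
      repository: c.repository,
      digest: await resolve(c.repository, c.tag),
      resolvedFromTag: c.tag,
    })),
  );
  return Object.freeze(pinned.map((p) => Object.freeze(p)));
}
```

Retagging `v2.3.1` in the registry after release creation has no effect on the release, because only the pinned digests are ever deployed.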

## Platform Themes

The Release Orchestrator introduces ten new functional themes:

| Theme | Purpose | Key Modules |
|-------|---------|-------------|
| **INTHUB** | Integration hub | Integration Manager, Connection Profiles, Connector Runtime |
| **ENVMGR** | Environment management | Environment Manager, Target Registry, Agent Manager |
| **RELMAN** | Release management | Component Registry, Version Manager, Release Manager |
| **WORKFL** | Workflow engine | Workflow Designer, Workflow Engine, Step Executor |
| **PROMOT** | Promotion and approval | Promotion Manager, Approval Gateway, Decision Engine |
| **DEPLOY** | Deployment execution | Deploy Orchestrator, Target Executor, Artifact Generator |
| **AGENTS** | Deployment agents | Agent Core, Docker/Compose/ECS/Nomad agents |
| **PROGDL** | Progressive delivery | A/B Manager, Traffic Router, Canary Controller |
| **RELEVI** | Release evidence | Evidence Collector, Sticker Writer, Audit Exporter |
| **PLUGIN** | Plugin infrastructure | Plugin Registry, Plugin Loader, Plugin SDK |

## Components

```
ReleaseOrchestrator/
├── __Libraries/
│   ├── StellaOps.ReleaseOrchestrator.Core/          # Core domain models
│   ├── StellaOps.ReleaseOrchestrator.Workflow/      # DAG workflow engine
│   ├── StellaOps.ReleaseOrchestrator.Promotion/     # Promotion logic
│   ├── StellaOps.ReleaseOrchestrator.Deploy/        # Deployment coordination
│   ├── StellaOps.ReleaseOrchestrator.Evidence/      # Evidence generation
│   ├── StellaOps.ReleaseOrchestrator.Plugin/        # Plugin infrastructure
│   └── StellaOps.ReleaseOrchestrator.Integration/   # Integration connectors
├── StellaOps.ReleaseOrchestrator.WebService/        # HTTP API
├── StellaOps.ReleaseOrchestrator.Worker/            # Background processing
├── StellaOps.Agent.Core/                            # Agent base framework
├── StellaOps.Agent.Docker/                          # Docker host agent
├── StellaOps.Agent.Compose/                         # Docker Compose agent
├── StellaOps.Agent.SSH/                             # SSH agentless executor
├── StellaOps.Agent.WinRM/                           # WinRM agentless executor
├── StellaOps.Agent.ECS/                             # AWS ECS agent
├── StellaOps.Agent.Nomad/                           # HashiCorp Nomad agent
└── __Tests/
    └── StellaOps.ReleaseOrchestrator.*.Tests/
```

## Data Flow

### Release Orchestration Flow

```
CI Build → Registry Push → Webhook → Stella Scan → Create Release →
Request Promotion → Gate Evaluation → Decision Record →
Deploy via Agent → Version Sticker → Evidence Packet
```

### Detailed Flow

1. **CI pushes image** to the registry by digest; triggers a webhook to Stella
2. **Stella scans** the new digest (if not already scanned); stores the verdict
3. **Release created** bundling component digests with a semantic version
4. **Promotion requested** to move the release from the source to the target environment
5. **Gate evaluation** runs: security verdict, approval count, freeze windows, custom policies
6. **Decision record** produced with evidence refs, then signed
7. **Deployment executed** via an agent to the target (Docker/Compose/ECS/Nomad)
8. **Version sticker** written to the target for drift detection
9. **Evidence packet** sealed and stored
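
Steps 5 and 6 of the flow above reduce to folding individual gate results into a single allow/block outcome. A minimal sketch (types and names are illustrative, not the real module API):

```typescript
interface GateOutcome {
  gateName: string;
  passed: boolean;
  blocking: boolean; // non-blocking gates warn but never block
}

// A promotion is allowed only if every blocking gate passed;
// failed blocking gates are reported for the decision record.
function decide(gateResults: GateOutcome[]): { decision: "allow" | "block"; failedGates: string[] } {
  const failedGates = gateResults
    .filter((g) => g.blocking && !g.passed)
    .map((g) => g.gateName);
  return { decision: failedGates.length === 0 ? "allow" : "block", failedGates };
}
```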
|
||||
|
||||
## Key Abstractions
|
||||
|
||||
### Environment
|
||||
|
||||
```csharp
|
||||
public sealed record Environment
|
||||
{
|
||||
public required Guid Id { get; init; }
|
||||
public required Guid TenantId { get; init; }
|
||||
public required string Name { get; init; } // "dev", "stage", "prod"
|
||||
public required string Slug { get; init; } // URL-safe identifier
|
||||
public required int PromotionOrder { get; init; } // 1, 2, 3...
|
||||
public required FreezeWindow[] FreezeWindows { get; init; }
|
||||
public required ApprovalPolicy ApprovalPolicy { get; init; }
|
||||
public required bool IsProduction { get; init; }
|
||||
public EnvironmentState State { get; init; } // Active, Frozen, Retired
|
||||
}
|
||||
```
|
||||
|
||||
### Release
|
||||
|
||||
```csharp
|
||||
public sealed record Release
|
||||
{
|
||||
public required Guid Id { get; init; }
|
||||
public required Guid TenantId { get; init; }
|
||||
public required string Version { get; init; } // SemVer: "2.3.1"
|
||||
public required string Name { get; init; } // Display name
|
||||
public required ImmutableDictionary<string, ComponentDigest> Components { get; init; }
|
||||
public required string SourceRef { get; init; } // Git SHA or tag
|
||||
public required DateTimeOffset CreatedAt { get; init; }
|
||||
public required Guid CreatedBy { get; init; }
|
||||
public ReleaseState State { get; init; } // Draft, Active, Deprecated
|
||||
}
|
||||
|
||||
public sealed record ComponentDigest
|
||||
{
|
||||
public required string Repository { get; init; } // registry.example.com/app/api
|
||||
public required string Digest { get; init; } // sha256:abc123...
|
||||
public required string? ResolvedFromTag { get; init; } // Optional: "v2.3.1"
|
||||
}
|
||||
```

### Promotion

```csharp
public sealed record Promotion
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required Guid ReleaseId { get; init; }
    public required Guid SourceEnvironmentId { get; init; }
    public required Guid TargetEnvironmentId { get; init; }
    public required Guid RequestedBy { get; init; }
    public required DateTimeOffset RequestedAt { get; init; }
    public PromotionState State { get; init; } // Pending, Approved, Rejected, Deployed, RolledBack
    public required ImmutableArray<GateResult> GateResults { get; init; }
    public required ImmutableArray<ApprovalRecord> Approvals { get; init; }
    public required DecisionRecord? Decision { get; init; }
}
```

### Workflow

```csharp
public sealed record Workflow
{
    public required Guid Id { get; init; }
    public required string Name { get; init; }
    public required ImmutableArray<WorkflowStep> Steps { get; init; }
    public required ImmutableDictionary<string, string[]> DependencyGraph { get; init; }
}

public sealed record WorkflowStep
{
    public required string Id { get; init; }
    public required string Type { get; init; } // "script", "approval", "deploy", "gate"
    public required StepProvider Provider { get; init; }
    public required ImmutableDictionary<string, object> Config { get; init; }
    public required string[] DependsOn { get; init; }
    public StepState State { get; init; }
}
```

### Target

```csharp
public sealed record Target
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required Guid EnvironmentId { get; init; }
    public required string Name { get; init; }
    public required TargetType Type { get; init; } // DockerHost, ComposeHost, ECSService, NomadJob
    public required ImmutableDictionary<string, string> Labels { get; init; }
    public required Guid? AgentId { get; init; } // Null for agentless
    public required TargetState State { get; init; }
    public required HealthStatus Health { get; init; }
}

public enum TargetType
{
    DockerHost,
    ComposeHost,
    ECSService,
    NomadJob,
    SSHRemote,
    WinRMRemote
}
```

### Agent

```csharp
public sealed record Agent
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required string Name { get; init; }
    public required string Version { get; init; }
    public required ImmutableArray<string> Capabilities { get; init; }
    public required DateTimeOffset LastHeartbeat { get; init; }
    public required AgentState State { get; init; } // Online, Offline, Degraded
    public required ImmutableDictionary<string, string> Labels { get; init; }
}
```

## Database Schema

| Table | Purpose |
|-------|---------|
| `release.environments` | Environment definitions with freeze windows |
| `release.targets` | Deployment targets within environments |
| `release.agents` | Registered deployment agents |
| `release.components` | Component definitions (service → repository mapping) |
| `release.releases` | Release bundles (version → component digests) |
| `release.promotions` | Promotion requests and state |
| `release.approvals` | Approval records |
| `release.workflows` | Workflow templates |
| `release.workflow_runs` | Workflow execution state |
| `release.deployment_jobs` | Deployment job records |
| `release.evidence_packets` | Sealed evidence records |
| `release.integrations` | Integration configurations |
| `release.plugins` | Plugin registrations |

## Gate Types

| Gate | Purpose | Evaluation |
|------|---------|------------|
| **Security** | Check scan verdict | Query latest scan for release digest; block on critical/high reachable |
| **Approval** | Human sign-off | Count approvals; check SoD rules |
| **FreezeWindow** | Calendar-based blocking | Check target environment freeze windows |
| **PreviousEnvironment** | Require prior deployment | Verify release deployed to source environment |
| **Policy** | Custom OPA/Rego rules | Evaluate policy with promotion context |
| **HealthCheck** | Target health | Verify target is healthy before deploy |
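
These gate types can share a single evaluation contract. The sketch below is illustrative only (the `IGate` interface, the `PromotionContext` members, and this shape for `GateResult` are assumptions, not part of the specified API); it shows how the FreezeWindow gate from the table might be expressed:

```csharp
// Illustrative contract the gates above could implement (not the specified API).
public interface IGate
{
    string Type { get; }
    Task<GateResult> EvaluateAsync(PromotionContext context, CancellationToken ct);
}

// One possible shape for the GateResult referenced by Promotion.GateResults.
public sealed record GateResult(string GateType, bool Passed, string? Reason);

// Hypothetical freeze-window gate: fail while the target environment is frozen.
// Assumes FreezeWindow exposes Start/End as DateTimeOffset.
public sealed class FreezeWindowGate : IGate
{
    public string Type => "freeze-window";

    public Task<GateResult> EvaluateAsync(PromotionContext context, CancellationToken ct)
    {
        var now = DateTimeOffset.UtcNow;
        var frozen = context.TargetEnvironment.FreezeWindows
            .Any(w => w.Start <= now && now < w.End);
        return Task.FromResult(new GateResult(Type, !frozen,
            frozen ? "Target environment is in a freeze window" : null));
    }
}
```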

## Plugin System (Three-Surface Model)

Plugins contribute through three surfaces:

### 1. Manifest (Static Declaration)

```yaml
# plugin-manifest.yaml
name: github-integration
version: 1.0.0
provider: StellaOps.Integration.GitHub.Plugin
capabilities:
  integrations:
    - type: scm
      id: github
      displayName: GitHub
  steps:
    - type: github-status
      displayName: Update GitHub Status
  gates:
    - type: github-check
      displayName: GitHub Check Required
```

### 2. Connector Runtime (Dynamic Execution)

```csharp
public interface IIntegrationConnector
{
    Task<ConnectionTestResult> TestConnectionAsync(CancellationToken ct);
    Task<HealthStatus> GetHealthAsync(CancellationToken ct);
    Task<IReadOnlyList<Resource>> DiscoverResourcesAsync(string resourceType, CancellationToken ct);
}

public interface ISCMConnector : IIntegrationConnector
{
    Task<CommitInfo> GetCommitAsync(string gitRef, CancellationToken ct); // "ref" is a C# keyword
    Task CreateCommitStatusAsync(string commit, CommitStatus status, CancellationToken ct);
}

public interface IRegistryConnector : IIntegrationConnector
{
    Task<string> ResolveDigestAsync(string imageRef, CancellationToken ct);
    Task<bool> VerifyDigestAsync(string imageRef, string expectedDigest, CancellationToken ct);
}
```
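
For instance, release creation can use `IRegistryConnector` to pin every mutable component tag to an immutable digest before the release is sealed. A hedged sketch (the `BuildComponentsAsync` helper and its parameter shape are illustrative, not part of the specification):

```csharp
// Illustrative helper: pin mutable tags to immutable digests at release-creation time.
public static async Task<ImmutableDictionary<string, ComponentDigest>> BuildComponentsAsync(
    IRegistryConnector registry,
    IReadOnlyDictionary<string, string> tagsByRepository, // repository -> tag
    CancellationToken ct)
{
    var builder = ImmutableDictionary.CreateBuilder<string, ComponentDigest>();
    foreach (var (repository, tag) in tagsByRepository)
    {
        // Resolve e.g. "registry.example.com/app/api:v2.3.1" to its sha256 digest.
        var digest = await registry.ResolveDigestAsync($"{repository}:{tag}", ct);
        builder[repository] = new ComponentDigest
        {
            Repository = repository,
            Digest = digest,
            ResolvedFromTag = tag,
        };
    }
    return builder.ToImmutable();
}
```

Once the resulting dictionary is stored on a `Release`, the tag is only provenance metadata; the digest is the identity.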

### 3. Step Provider (Execution Contract)

```csharp
public interface IStepProvider
{
    StepExecutionCharacteristics Characteristics { get; }
    Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct);
    Task<StepResult> RollbackAsync(StepContext context, CancellationToken ct);
}

public sealed record StepExecutionCharacteristics
{
    public bool IsIdempotent { get; init; }
    public bool SupportsRollback { get; init; }
    public TimeSpan DefaultTimeout { get; init; }
    public ResourceRequirements Resources { get; init; }
}
```
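
A provider that cannot roll back can still satisfy the contract by declaring so in its characteristics. A minimal sketch under stated assumptions — the `StepContext` members (`Config`, `Notifier`, `ReleaseId`) and the `StepResult` factory methods are hypothetical:

```csharp
// Illustrative provider: posts a notification; idempotent, no rollback.
public sealed class NotifyStepProvider : IStepProvider
{
    public StepExecutionCharacteristics Characteristics { get; } = new()
    {
        IsIdempotent = true,
        SupportsRollback = false,
        DefaultTimeout = TimeSpan.FromSeconds(30),
        Resources = new ResourceRequirements(),
    };

    public async Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct)
    {
        // Config comes from WorkflowStep.Config, e.g. { "channel": "#releases" }.
        var channel = (string)context.Config["channel"];
        await context.Notifier.SendAsync(channel, $"Release {context.ReleaseId} deploying", ct);
        return StepResult.Success();
    }

    public Task<StepResult> RollbackAsync(StepContext context, CancellationToken ct)
        => Task.FromResult(StepResult.Skipped("Rollback not supported for notifications"));
}
```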

## Invariants

1. **Release identity is immutable** — Once created, a release's component digests cannot be changed. Create a new release instead.
2. **Promotions are append-only** — Promotion state transitions are recorded; no edits or deletions.
3. **Evidence packets are sealed** — Evidence is cryptographically signed and stored immutably.
4. **Digest verification at deploy time** — Agents verify image digests at pull time; mismatch fails deployment.
5. **Separation of duties enforced** — Requester cannot be sole approver for production promotions.
6. **Workflow execution is deterministic** — Same inputs produce same execution order and outputs.
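
Invariant 4 reduces to a digest comparison against the registry. A sketch using the `IRegistryConnector` interface from the plugin section (the `EnsureDigestAsync` helper and the `"latest"` fallback are illustrative assumptions):

```csharp
// Illustrative enforcement of invariant 4: abort the deployment when the
// registry's digest for the image does not match the sealed release digest.
public static async Task EnsureDigestAsync(
    IRegistryConnector registry,
    ComponentDigest expected,
    CancellationToken ct)
{
    var imageRef = $"{expected.Repository}:{expected.ResolvedFromTag ?? "latest"}";
    if (!await registry.VerifyDigestAsync(imageRef, expected.Digest, ct))
    {
        throw new InvalidOperationException(
            $"Digest mismatch for {imageRef}; expected {expected.Digest}. Deployment aborted.");
    }
}
```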

## Error Handling

- **Transient failures** — Retry with exponential backoff; circuit breaker for repeated failures
- **Agent disconnection** — Mark agent offline; reassign pending tasks to other agents
- **Deployment failure** — Automatic rollback if configured; otherwise mark promotion as failed
- **Gate failure** — Block promotion; require manual intervention or re-evaluation

## Observability

### Metrics

- `release_promotions_total` — Counter by environment and outcome
- `release_deployments_duration_seconds` — Histogram of deployment times
- `release_gate_evaluations_total` — Counter by gate type and result
- `release_agents_online` — Gauge of online agents
- `release_workflow_steps_duration_seconds` — Histogram by step type

### Traces

- `promotion.request` — Span for promotion request handling
- `gate.evaluate` — Span for each gate evaluation
- `deployment.execute` — Span for deployment execution
- `agent.task` — Span for agent task execution

### Logs

- Structured logs with correlation IDs
- Promotion ID, release ID, environment ID in all relevant logs
- Sensitive data (secrets, credentials) masked
## Security Considerations

### Agent Security

- **mTLS authentication** — Agents authenticate with CA-signed certificates
- **Short-lived credentials** — Task credentials expire after execution
- **Capability-based authorization** — Agents only receive tasks matching their capabilities
- **Heartbeat monitoring** — Detect and flag agent disconnections

### Secrets Management

- **Never stored in database** — Only vault references stored
- **Fetched at execution time** — Secrets retrieved just-in-time for deployment
- **Short-lived** — Dynamic credentials with minimal TTL
- **Masked in logs** — Secret values never logged

### Plugin Sandbox

- **Resource limits** — CPU, memory, timeout limits per plugin
- **Capability restrictions** — Plugins declare required capabilities
- **Network isolation** — Optional network restrictions for plugins

## Performance Characteristics

- **Promotion evaluation** — < 5 seconds for typical gate evaluation
- **Deployment latency** — Dominated by image pull time; orchestration overhead < 10 seconds
- **Agent heartbeat** — 30-second interval; offline detection within 90 seconds
- **Workflow step timeout** — Configurable; default 5 minutes per step
## Implementation Roadmap

| Phase | Focus | Key Deliverables |
|-------|-------|------------------|
| **Phase 1** | Foundation | Environment management, integration hub, release bundles |
| **Phase 2** | Workflow Engine | DAG execution, step registry, workflow templates |
| **Phase 3** | Promotion & Decision | Approval gateway, security gates, decision records |
| **Phase 4** | Deployment Execution | Docker/Compose agents, artifact generation, rollback |
| **Phase 5** | UI & Polish | Release dashboard, promotion UI, environment management |
| **Phase 6** | Progressive Delivery | A/B releases, canary, traffic routing |
| **Phase 7** | Extended Targets | ECS, Nomad, SSH/WinRM agentless |
| **Phase 8** | Plugin Ecosystem | Full plugin system, marketplace |

## References

- [Product Vision](../../product/VISION.md)
- [Architecture Overview](../../ARCHITECTURE_OVERVIEW.md)
- [Full Orchestrator Specification](../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
- [Competitive Landscape](../../product/competitive-landscape.md)
343
docs/modules/release-orchestrator/data-model/entities.md
Normal file
@@ -0,0 +1,343 @@

# Entity Definitions

This document describes the core entities in the Release Orchestrator data model.

## Entity Relationship Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ ENTITY RELATIONSHIPS │
│ │
│ ┌──────────┐ ┌──────────────┐ ┌────────────┐ │
│ │ Tenant │───────│ Environment │───────│ Target │ │
│ └──────────┘ └──────────────┘ └────────────┘ │
│ │ │ │ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────────┐ ┌────────────┐ │
│ │ Component│ │ Approval │ │ Agent │ │
│ └──────────┘ │ Policy │ └────────────┘ │
│ │ └──────────────┘ │ │
│ │ │ │ │
│ ▼ │ ▼ │
│ ┌──────────┐ │ ┌─────────────┐ │
│ │ Version │ │ │ Deployment │ │
│ │ Map │ │ │ Task │ │
│ └──────────┘ │ └─────────────┘ │
│ │ │ │ │
│ │ │ │ │
│ ▼ │ ▼ │
│ ┌─────────────────────────┼─────────────────────────────┐ │
│ │ │ │ │
│ │ ┌──────────┐ ┌─────▼─────┐ ┌─────────────┐ │ │
│ │ │ Release │─────│ Promotion │─────│ Deployment │ │ │
│ │ └──────────┘ └───────────┘ │ Job │ │ │
│ │ │ │ └─────────────┘ │ │
│ │ │ │ │ │ │
│ │ │ ▼ │ │ │
│ │ │ ┌───────────┐ │ │ │
│ │ │ │ Approval │ │ │ │
│ │ │ └───────────┘ │ │ │
│ │ │ │ │ │ │
│ │ │ ▼ ▼ │ │
│ │ │ ┌───────────┐ ┌───────────┐ │ │
│ │ │ │ Decision │ │ Generated │ │ │
│ │ │ │ Record │ │ Artifacts │ │ │
│ │ │ └───────────┘ └───────────┘ │ │
│ │ │ │ │ │ │
│ │ │ └────────┬────────┘ │ │
│ │ │ │ │ │
│ │ │ ▼ │ │
│ │ │ ┌───────────┐ │ │
│ │ └───────────────────►│ Evidence │◄────────────┘ │
│ │ │ Packet │ │
│ │ └───────────┘ │
│ │ │ │
│ │ ▼ │
│ │ ┌───────────┐ │
│ │ │ Version │ │
│ │ │ Sticker │ │
│ │ └───────────┘ │
│ │ │
│ └─────────────────────────────────────────────────────────────────────────┘
└─────────────────────────────────────────────────────────────────────────────┘
```

## Core Entities

### Environment

Represents a deployment target environment (dev, staging, production).

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference |
| `name` | string | Unique name (e.g., "prod") |
| `display_name` | string | Display name (e.g., "Production") |
| `order_index` | integer | Promotion order |
| `config` | JSONB | Environment configuration |
| `freeze_windows` | JSONB | Active freeze windows |
| `required_approvals` | integer | Approvals needed for promotion |
| `require_sod` | boolean | Require separation of duties |
| `created_at` | timestamp | Creation time |

### Target

Represents a deployment target (host, service).

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference |
| `environment_id` | UUID | Environment reference |
| `name` | string | Target name |
| `target_type` | string | Type (docker_host, compose_host, etc.) |
| `connection` | JSONB | Connection configuration |
| `labels` | JSONB | Target labels |
| `health_status` | string | Current health status |
| `current_digest` | string | Currently deployed digest |

### Agent

Represents a deployment agent.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference |
| `name` | string | Agent name |
| `version` | string | Agent version |
| `capabilities` | JSONB | Agent capabilities |
| `status` | string | online/offline/degraded |
| `last_heartbeat` | timestamp | Last heartbeat time |

### Component

Represents a deployable component (maps to an image repository).

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference |
| `name` | string | Component name |
| `display_name` | string | Display name |
| `image_repository` | string | Image repository URL |
| `versioning_strategy` | JSONB | How versions are determined |
| `default_channel` | string | Default version channel |

### Version Map

Maps image tags to digests and semantic versions.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `component_id` | UUID | Component reference |
| `tag` | string | Image tag |
| `digest` | string | Image digest (sha256:...) |
| `semver` | string | Semantic version |
| `channel` | string | Version channel (stable, beta) |

### Release

A versioned bundle of component digests.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference |
| `name` | string | Release name |
| `display_name` | string | Display name |
| `components` | JSONB | Component/digest mappings |
| `source_ref` | JSONB | Source code reference |
| `status` | string | draft/ready/deployed/deprecated |
| `created_by` | UUID | Creator user reference |

### Promotion

A request to promote a release to an environment.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference |
| `release_id` | UUID | Release reference |
| `source_environment_id` | UUID | Source environment (nullable) |
| `target_environment_id` | UUID | Target environment |
| `status` | string | Promotion status |
| `decision_record` | JSONB | Gate evaluation results |
| `workflow_run_id` | UUID | Associated workflow run |
| `requested_by` | UUID | Requesting user |
| `requested_at` | timestamp | Request time |

### Approval

An approval or rejection of a promotion.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `promotion_id` | UUID | Promotion reference |
| `approver_id` | UUID | Approving user |
| `action` | string | approved/rejected |
| `comment` | string | Approval comment |
| `approved_at` | timestamp | Approval time |

### Deployment Job

A deployment execution job.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `promotion_id` | UUID | Promotion reference |
| `release_id` | UUID | Release reference |
| `environment_id` | UUID | Environment reference |
| `status` | string | Job status |
| `strategy` | string | Deployment strategy |
| `artifacts` | JSONB | Generated artifacts |
| `rollback_of` | UUID | If rollback, original job |

### Deployment Task

A task to deploy to a single target.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `job_id` | UUID | Job reference |
| `target_id` | UUID | Target reference |
| `digest` | string | Digest to deploy |
| `status` | string | Task status |
| `agent_id` | UUID | Assigned agent |
| `logs` | text | Execution logs |
| `previous_digest` | string | Previous digest (for rollback) |

### Evidence Packet

Immutable audit evidence for a promotion/deployment.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `promotion_id` | UUID | Promotion reference |
| `packet_type` | string | Type of evidence |
| `content` | JSONB | Evidence content |
| `content_hash` | string | SHA-256 of content |
| `signature` | string | Cryptographic signature |
| `signer_key_ref` | string | Signing key reference |
| `created_at` | timestamp | Creation time (no update) |

### Version Sticker

Version marker placed on deployment targets.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `target_id` | UUID | Target reference |
| `release_id` | UUID | Release reference |
| `promotion_id` | UUID | Promotion reference |
| `sticker_content` | JSONB | Sticker JSON content |
| `content_hash` | string | Content hash |
| `written_at` | timestamp | Write time |
| `drift_detected` | boolean | Drift detection flag |

## Workflow Entities

### Workflow Template

A reusable workflow definition.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference (null for builtin) |
| `name` | string | Template name |
| `version` | integer | Template version |
| `nodes` | JSONB | Step nodes |
| `edges` | JSONB | Step edges |
| `inputs` | JSONB | Input definitions |
| `outputs` | JSONB | Output definitions |
| `is_builtin` | boolean | Is built-in template |

### Workflow Run

An execution of a workflow template.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `template_id` | UUID | Template reference |
| `template_version` | integer | Template version at execution |
| `status` | string | Run status |
| `context` | JSONB | Execution context |
| `inputs` | JSONB | Input values |
| `outputs` | JSONB | Output values |
| `started_at` | timestamp | Start time |
| `completed_at` | timestamp | Completion time |

### Step Run

Execution of a single step within a workflow run.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `workflow_run_id` | UUID | Workflow run reference |
| `node_id` | string | Node ID from template |
| `status` | string | Step status |
| `inputs` | JSONB | Resolved inputs |
| `outputs` | JSONB | Produced outputs |
| `logs` | text | Execution logs |
| `attempt_number` | integer | Retry attempt number |

## Plugin Entities

### Plugin

A registered plugin.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `plugin_id` | string | Unique plugin identifier |
| `version` | string | Plugin version |
| `vendor` | string | Plugin vendor |
| `manifest` | JSONB | Plugin manifest |
| `status` | string | Plugin status |
| `entrypoint` | string | Plugin entrypoint path |

### Plugin Instance

A tenant-specific plugin configuration.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `plugin_id` | UUID | Plugin reference |
| `tenant_id` | UUID | Tenant reference |
| `config` | JSONB | Tenant configuration |
| `enabled` | boolean | Is enabled for tenant |

## Integration Entities

### Integration

A configured external integration.

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Primary key |
| `tenant_id` | UUID | Tenant reference |
| `type_id` | string | Integration type |
| `name` | string | Integration name |
| `config` | JSONB | Integration configuration |
| `credential_ref` | string | Vault credential reference |
| `health_status` | string | Connection health |

## References

- [Database Schema](schema.md)
- [Module Overview](../modules/overview.md)
631
docs/modules/release-orchestrator/data-model/schema.md
Normal file
@@ -0,0 +1,631 @@

# Database Schema (PostgreSQL)

This document specifies the complete PostgreSQL schema for the Release Orchestrator.

## Schema Organization

All release orchestration tables reside in the `release` schema:

```sql
CREATE SCHEMA IF NOT EXISTS release;
SET search_path TO release, public;
```

## Core Tables

### Tenant and Authority Extensions

```sql
-- Extended: Add release-related permissions
ALTER TABLE permissions ADD COLUMN IF NOT EXISTS
    resource_type VARCHAR(50) CHECK (resource_type IN (
        'environment', 'release', 'promotion', 'target', 'workflow', 'plugin'
    ));
```

---

## Integration Hub

```sql
CREATE TABLE integration_types (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(100) NOT NULL UNIQUE,
    category VARCHAR(50) NOT NULL CHECK (category IN (
        'scm', 'ci', 'registry', 'vault', 'target', 'router'
    )),
    plugin_id UUID REFERENCES plugins(id),
    config_schema JSONB NOT NULL,
    secrets_schema JSONB NOT NULL,
    is_builtin BOOLEAN NOT NULL DEFAULT FALSE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE integrations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    integration_type_id UUID NOT NULL REFERENCES integration_types(id),
    name VARCHAR(255) NOT NULL,
    config JSONB NOT NULL,
    credential_ref VARCHAR(500), -- Vault path or encrypted ref
    status VARCHAR(50) NOT NULL DEFAULT 'unknown' CHECK (status IN (
        'healthy', 'degraded', 'unhealthy', 'unknown'
    )),
    last_health_check TIMESTAMPTZ,
    last_health_message TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by UUID REFERENCES users(id),
    UNIQUE (tenant_id, name)
);

CREATE INDEX idx_integrations_tenant ON integrations(tenant_id);
CREATE INDEX idx_integrations_type ON integrations(integration_type_id);

CREATE TABLE connection_profiles (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    user_id UUID NOT NULL REFERENCES users(id),
    integration_type_id UUID NOT NULL REFERENCES integration_types(id),
    name VARCHAR(255) NOT NULL,
    config_defaults JSONB NOT NULL,
    is_default BOOLEAN NOT NULL DEFAULT FALSE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, user_id, integration_type_id, name)
);
```

---

## Environment & Inventory

```sql
CREATE TABLE environments (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    name VARCHAR(100) NOT NULL,
    display_name VARCHAR(255) NOT NULL,
    order_index INTEGER NOT NULL,
    config JSONB NOT NULL DEFAULT '{}',
    freeze_windows JSONB NOT NULL DEFAULT '[]',
    required_approvals INTEGER NOT NULL DEFAULT 0,
    require_sod BOOLEAN NOT NULL DEFAULT FALSE,
    auto_promote_from UUID REFERENCES environments(id),
    promotion_policy VARCHAR(255),
    deployment_timeout INTEGER NOT NULL DEFAULT 600,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, name)
);

CREATE INDEX idx_environments_tenant ON environments(tenant_id);
CREATE INDEX idx_environments_order ON environments(tenant_id, order_index);

CREATE TABLE target_groups (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    labels JSONB NOT NULL DEFAULT '{}',
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, environment_id, name)
);

-- agents is created before targets because targets.agent_id references it.
CREATE TABLE agents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    version VARCHAR(50) NOT NULL,
    capabilities JSONB NOT NULL DEFAULT '[]',
    labels JSONB NOT NULL DEFAULT '{}',
    status VARCHAR(50) NOT NULL DEFAULT 'offline' CHECK (status IN (
        'online', 'offline', 'degraded'
    )),
    last_heartbeat TIMESTAMPTZ,
    resource_usage JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, name)
);

CREATE INDEX idx_agents_tenant ON agents(tenant_id);
CREATE INDEX idx_agents_status ON agents(status);
CREATE INDEX idx_agents_capabilities ON agents USING GIN (capabilities);

CREATE TABLE targets (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
    target_group_id UUID REFERENCES target_groups(id),
    name VARCHAR(255) NOT NULL,
    target_type VARCHAR(100) NOT NULL,
    connection JSONB NOT NULL,
    capabilities JSONB NOT NULL DEFAULT '[]',
    labels JSONB NOT NULL DEFAULT '{}',
    deployment_directory VARCHAR(500),
    health_status VARCHAR(50) NOT NULL DEFAULT 'unknown' CHECK (health_status IN (
        'healthy', 'degraded', 'unhealthy', 'unknown'
    )),
    last_health_check TIMESTAMPTZ,
    current_digest VARCHAR(100),
    agent_id UUID REFERENCES agents(id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, environment_id, name)
);

CREATE INDEX idx_targets_tenant_env ON targets(tenant_id, environment_id);
CREATE INDEX idx_targets_type ON targets(target_type);
CREATE INDEX idx_targets_labels ON targets USING GIN (labels);
```
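
Given the GIN index on `capabilities`, capability-matched agent selection can be a JSONB containment query. An illustrative example (the `"docker"` capability value and the `:tenant_id` parameter are assumptions, not part of the schema):

```sql
-- Find online agents in a tenant that advertise the 'docker' capability.
SELECT id, name
FROM agents
WHERE tenant_id = :tenant_id
  AND status = 'online'
  AND capabilities @> '["docker"]'::jsonb;
```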

---

## Release Management

```sql
CREATE TABLE components (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    display_name VARCHAR(255) NOT NULL,
    image_repository VARCHAR(500) NOT NULL,
    registry_integration_id UUID REFERENCES integrations(id),
    versioning_strategy JSONB NOT NULL DEFAULT '{"type": "semver"}',
    deployment_template VARCHAR(255),
    default_channel VARCHAR(50) NOT NULL DEFAULT 'stable',
    metadata JSONB NOT NULL DEFAULT '{}',
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, name)
);

CREATE INDEX idx_components_tenant ON components(tenant_id);

CREATE TABLE version_maps (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    component_id UUID NOT NULL REFERENCES components(id) ON DELETE CASCADE,
    tag VARCHAR(255) NOT NULL,
    digest VARCHAR(100) NOT NULL,
    semver VARCHAR(50),
    channel VARCHAR(50) NOT NULL DEFAULT 'stable',
    prerelease BOOLEAN NOT NULL DEFAULT FALSE,
    build_metadata VARCHAR(255),
    resolved_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    source VARCHAR(50) NOT NULL DEFAULT 'auto' CHECK (source IN ('auto', 'manual')),
    UNIQUE (tenant_id, component_id, digest)
);

CREATE INDEX idx_version_maps_component ON version_maps(component_id);
CREATE INDEX idx_version_maps_digest ON version_maps(digest);
CREATE INDEX idx_version_maps_semver ON version_maps(semver);
||||
|
||||
CREATE TABLE releases (
|
||||
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
||||
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
|
||||
name VARCHAR(255) NOT NULL,
|
||||
display_name VARCHAR(255) NOT NULL,
|
||||
components JSONB NOT NULL, -- [{componentId, digest, semver, tag, role}]
|
||||
source_ref JSONB, -- {scmIntegrationId, commitSha, ciIntegrationId, buildId}
|
||||
status VARCHAR(50) NOT NULL DEFAULT 'draft' CHECK (status IN (
|
||||
'draft', 'ready', 'promoting', 'deployed', 'deprecated', 'archived'
|
||||
)),
|
||||
metadata JSONB NOT NULL DEFAULT '{}',
|
||||
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
created_by UUID REFERENCES users(id),
|
||||
UNIQUE (tenant_id, name)
|
||||
);
|
||||
|
||||
CREATE INDEX idx_releases_tenant ON releases(tenant_id);
|
||||
CREATE INDEX idx_releases_status ON releases(status);
|
||||
CREATE INDEX idx_releases_created ON releases(created_at DESC);
|
||||
|
||||
CREATE TABLE release_environment_state (
|
||||
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
||||
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
|
||||
environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
|
||||
release_id UUID NOT NULL REFERENCES releases(id),
|
||||
status VARCHAR(50) NOT NULL CHECK (status IN (
|
||||
'deployed', 'deploying', 'failed', 'rolling_back', 'rolled_back'
|
||||
)),
|
||||
deployed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
deployed_by UUID REFERENCES users(id),
|
||||
promotion_id UUID, -- will reference promotions
|
||||
evidence_ref VARCHAR(255),
|
||||
UNIQUE (tenant_id, environment_id)
|
||||
);
|
||||
|
||||
CREATE INDEX idx_release_env_state_env ON release_environment_state(environment_id);
|
||||
CREATE INDEX idx_release_env_state_release ON release_environment_state(release_id);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Workflow Engine

```sql
CREATE TABLE workflow_templates (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID REFERENCES tenants(id) ON DELETE CASCADE, -- NULL for builtin
    name VARCHAR(255) NOT NULL,
    display_name VARCHAR(255) NOT NULL,
    description TEXT,
    version INTEGER NOT NULL DEFAULT 1,
    nodes JSONB NOT NULL,
    edges JSONB NOT NULL,
    inputs JSONB NOT NULL DEFAULT '[]',
    outputs JSONB NOT NULL DEFAULT '[]',
    is_builtin BOOLEAN NOT NULL DEFAULT FALSE,
    tags JSONB NOT NULL DEFAULT '[]',
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by UUID REFERENCES users(id),
    UNIQUE (tenant_id, name, version)
);

CREATE INDEX idx_workflow_templates_tenant ON workflow_templates(tenant_id);
CREATE INDEX idx_workflow_templates_builtin ON workflow_templates(is_builtin);

CREATE TABLE workflow_runs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    template_id UUID NOT NULL REFERENCES workflow_templates(id),
    template_version INTEGER NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'created' CHECK (status IN (
        'created', 'running', 'paused', 'succeeded', 'failed', 'cancelled'
    )),
    context JSONB NOT NULL, -- inputs, variables, release info
    outputs JSONB,
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    error_message TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    triggered_by UUID REFERENCES users(id)
);

CREATE INDEX idx_workflow_runs_tenant ON workflow_runs(tenant_id);
CREATE INDEX idx_workflow_runs_status ON workflow_runs(status);
CREATE INDEX idx_workflow_runs_template ON workflow_runs(template_id);

CREATE TABLE step_runs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    workflow_run_id UUID NOT NULL REFERENCES workflow_runs(id) ON DELETE CASCADE,
    node_id VARCHAR(100) NOT NULL,
    step_type VARCHAR(100) NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN (
        'pending', 'running', 'succeeded', 'failed', 'skipped', 'retrying', 'cancelled'
    )),
    inputs JSONB NOT NULL,
    config JSONB NOT NULL,
    outputs JSONB,
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    attempt_number INTEGER NOT NULL DEFAULT 1,
    error_message TEXT,
    error_type VARCHAR(100),
    logs TEXT,
    artifacts JSONB NOT NULL DEFAULT '[]',
    t_hlc BIGINT, -- Hybrid Logical Clock for ordering (optional)
    ts_wall TIMESTAMPTZ, -- Wall-clock timestamp for debugging (optional)
    UNIQUE (workflow_run_id, node_id, attempt_number)
);

CREATE INDEX idx_step_runs_workflow ON step_runs(workflow_run_id);
CREATE INDEX idx_step_runs_status ON step_runs(status);
```

---

## Promotion & Approval

```sql
CREATE TABLE promotions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    release_id UUID NOT NULL REFERENCES releases(id),
    source_environment_id UUID REFERENCES environments(id),
    target_environment_id UUID NOT NULL REFERENCES environments(id),
    status VARCHAR(50) NOT NULL DEFAULT 'pending_approval' CHECK (status IN (
        'pending_approval', 'pending_gate', 'approved', 'rejected',
        'deploying', 'deployed', 'failed', 'cancelled', 'rolled_back'
    )),
    decision_record JSONB,
    workflow_run_id UUID REFERENCES workflow_runs(id),
    requested_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    requested_by UUID NOT NULL REFERENCES users(id),
    request_reason TEXT,
    decided_at TIMESTAMPTZ,
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    evidence_packet_id UUID,
    t_hlc BIGINT, -- Hybrid Logical Clock for ordering (optional)
    ts_wall TIMESTAMPTZ -- Wall-clock timestamp for debugging (optional)
);

CREATE INDEX idx_promotions_tenant ON promotions(tenant_id);
CREATE INDEX idx_promotions_release ON promotions(release_id);
CREATE INDEX idx_promotions_status ON promotions(status);
CREATE INDEX idx_promotions_target_env ON promotions(target_environment_id);

-- Add FK to release_environment_state
ALTER TABLE release_environment_state
    ADD CONSTRAINT fk_release_env_state_promotion
    FOREIGN KEY (promotion_id) REFERENCES promotions(id);

CREATE TABLE approvals (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    promotion_id UUID NOT NULL REFERENCES promotions(id) ON DELETE CASCADE,
    approver_id UUID NOT NULL REFERENCES users(id),
    action VARCHAR(50) NOT NULL CHECK (action IN ('approved', 'rejected')),
    comment TEXT,
    approved_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    approver_role VARCHAR(255),
    approver_groups JSONB NOT NULL DEFAULT '[]'
);

CREATE INDEX idx_approvals_promotion ON approvals(promotion_id);
CREATE INDEX idx_approvals_approver ON approvals(approver_id);

CREATE TABLE approval_policies (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    environment_id UUID NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
    required_count INTEGER NOT NULL DEFAULT 1,
    required_roles JSONB NOT NULL DEFAULT '[]',
    required_groups JSONB NOT NULL DEFAULT '[]',
    require_sod BOOLEAN NOT NULL DEFAULT FALSE,
    allow_self_approval BOOLEAN NOT NULL DEFAULT FALSE,
    expiration_minutes INTEGER NOT NULL DEFAULT 1440,
    UNIQUE (tenant_id, environment_id)
);
```
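
The policy rows above are evaluated in application code. A minimal sketch of that evaluation (TypeScript; the function and type names are illustrative, mirroring the `approval_policies` and `approvals` columns rather than the shipped implementation — here SoD is read as "the requester may not count as an approver"):

```typescript
interface ApprovalPolicyRow {
  requiredCount: number;
  allowSelfApproval: boolean;
  requireSod: boolean; // separation of duties
}

interface ApprovalRow {
  approverId: string;
  action: "approved" | "rejected";
}

// Decide whether a promotion satisfies its environment's approval policy.
function isPromotionApproved(
  policy: ApprovalPolicyRow,
  approvals: ApprovalRow[],
  requestedBy: string
): boolean {
  // Any rejection blocks the promotion outright.
  if (approvals.some((a) => a.action === "rejected")) return false;

  let approvers = approvals
    .filter((a) => a.action === "approved")
    .map((a) => a.approverId);

  // Drop the requester's own approval unless self-approval is allowed.
  if (!policy.allowSelfApproval || policy.requireSod) {
    approvers = approvers.filter((id) => id !== requestedBy);
  }

  // Count distinct approvers against the required quorum.
  return new Set(approvers).size >= policy.requiredCount;
}
```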

---

## Deployment

```sql
CREATE TABLE deployment_jobs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    promotion_id UUID NOT NULL REFERENCES promotions(id),
    release_id UUID NOT NULL REFERENCES releases(id),
    environment_id UUID NOT NULL REFERENCES environments(id),
    status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN (
        'pending', 'running', 'succeeded', 'failed', 'cancelled', 'rolling_back', 'rolled_back'
    )),
    strategy VARCHAR(50) NOT NULL DEFAULT 'all-at-once',
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    artifacts JSONB NOT NULL DEFAULT '[]',
    rollback_of UUID REFERENCES deployment_jobs(id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    t_hlc BIGINT, -- Hybrid Logical Clock for ordering (optional)
    ts_wall TIMESTAMPTZ -- Wall-clock timestamp for debugging (optional)
);

CREATE INDEX idx_deployment_jobs_promotion ON deployment_jobs(promotion_id);
CREATE INDEX idx_deployment_jobs_status ON deployment_jobs(status);

CREATE TABLE deployment_tasks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    job_id UUID NOT NULL REFERENCES deployment_jobs(id) ON DELETE CASCADE,
    target_id UUID NOT NULL REFERENCES targets(id),
    digest VARCHAR(100) NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN (
        'pending', 'running', 'succeeded', 'failed', 'cancelled', 'skipped'
    )),
    agent_id UUID REFERENCES agents(id),
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    exit_code INTEGER,
    logs TEXT,
    previous_digest VARCHAR(100),
    sticker_written BOOLEAN NOT NULL DEFAULT FALSE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_deployment_tasks_job ON deployment_tasks(job_id);
CREATE INDEX idx_deployment_tasks_target ON deployment_tasks(target_id);
CREATE INDEX idx_deployment_tasks_status ON deployment_tasks(status);

CREATE TABLE generated_artifacts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    deployment_job_id UUID REFERENCES deployment_jobs(id) ON DELETE CASCADE,
    artifact_type VARCHAR(50) NOT NULL CHECK (artifact_type IN (
        'compose_lock', 'script', 'sticker', 'evidence', 'config'
    )),
    name VARCHAR(255) NOT NULL,
    content_hash VARCHAR(100) NOT NULL,
    content BYTEA, -- for small artifacts
    storage_ref VARCHAR(500), -- for large artifacts (S3, etc.)
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_generated_artifacts_job ON generated_artifacts(deployment_job_id);
```

---

## Progressive Delivery

```sql
CREATE TABLE ab_releases (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    environment_id UUID NOT NULL REFERENCES environments(id),
    name VARCHAR(255) NOT NULL,
    variations JSONB NOT NULL, -- [{name, releaseId, targetGroupId, trafficPercentage}]
    active_variation VARCHAR(50) NOT NULL DEFAULT 'A',
    traffic_split JSONB NOT NULL,
    rollout_strategy JSONB NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'created' CHECK (status IN (
        'created', 'deploying', 'running', 'promoting', 'completed', 'rolled_back'
    )),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    completed_at TIMESTAMPTZ,
    created_by UUID REFERENCES users(id)
);

CREATE INDEX idx_ab_releases_tenant_env ON ab_releases(tenant_id, environment_id);
CREATE INDEX idx_ab_releases_status ON ab_releases(status);

CREATE TABLE canary_stages (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    ab_release_id UUID NOT NULL REFERENCES ab_releases(id) ON DELETE CASCADE,
    stage_number INTEGER NOT NULL,
    traffic_percentage INTEGER NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN (
        'pending', 'running', 'succeeded', 'failed', 'skipped'
    )),
    health_threshold DECIMAL(5,2),
    duration_seconds INTEGER,
    require_approval BOOLEAN NOT NULL DEFAULT FALSE,
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    health_result JSONB,
    UNIQUE (ab_release_id, stage_number)
);
```

---

## Release Evidence

```sql
CREATE TABLE evidence_packets (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    promotion_id UUID NOT NULL REFERENCES promotions(id),
    packet_type VARCHAR(50) NOT NULL CHECK (packet_type IN (
        'release_decision', 'deployment', 'rollback', 'ab_promotion'
    )),
    content JSONB NOT NULL,
    content_hash VARCHAR(100) NOT NULL,
    signature TEXT,
    signer_key_ref VARCHAR(255),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
    -- Note: No UPDATE or DELETE allowed (append-only)
);

CREATE INDEX idx_evidence_packets_promotion ON evidence_packets(promotion_id);
CREATE INDEX idx_evidence_packets_created ON evidence_packets(created_at DESC);

-- Append-only enforcement via trigger
CREATE OR REPLACE FUNCTION prevent_evidence_modification()
RETURNS TRIGGER AS $$
BEGIN
    RAISE EXCEPTION 'Evidence packets are immutable and cannot be modified or deleted';
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER evidence_packets_immutable
    BEFORE UPDATE OR DELETE ON evidence_packets
    FOR EACH ROW EXECUTE FUNCTION prevent_evidence_modification();

CREATE TABLE version_stickers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    target_id UUID NOT NULL REFERENCES targets(id),
    deployment_job_id UUID REFERENCES deployment_jobs(id),
    release_id UUID NOT NULL REFERENCES releases(id),
    digest VARCHAR(100) NOT NULL,
    sticker_content JSONB NOT NULL,
    written_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    verified_at TIMESTAMPTZ,
    verification_status VARCHAR(50) CHECK (verification_status IN ('valid', 'mismatch', 'missing'))
);

CREATE INDEX idx_version_stickers_target ON version_stickers(target_id);
CREATE INDEX idx_version_stickers_release ON version_stickers(release_id);
```

---

## Plugin Infrastructure

```sql
CREATE TABLE plugins (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL UNIQUE,
    display_name VARCHAR(255) NOT NULL,
    version VARCHAR(50) NOT NULL,
    description TEXT,
    manifest JSONB NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'inactive' CHECK (status IN (
        'active', 'inactive', 'error'
    )),
    error_message TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE plugin_instances (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    plugin_id UUID NOT NULL REFERENCES plugins(id),
    config JSONB NOT NULL DEFAULT '{}',
    enabled BOOLEAN NOT NULL DEFAULT TRUE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, plugin_id)
);

CREATE INDEX idx_plugin_instances_tenant ON plugin_instances(tenant_id);
```

---

## Hybrid Logical Clock (HLC) for Distributed Ordering

**Optional Enhancement**: For strict distributed ordering and multi-region support, the following tables include optional `t_hlc` (Hybrid Logical Clock timestamp) and `ts_wall` (wall-clock timestamp) columns:

- `promotions` — Promotion state transitions
- `deployment_jobs` — Deployment task ordering
- `step_runs` — Workflow step execution ordering

**When to use HLC**:
- Multi-region deployments requiring strict causal ordering
- Deterministic replay across distributed systems
- Timeline event ordering in audit logs

**HLC Schema**:
```sql
t_hlc   BIGINT       -- HLC timestamp (monotonic, skew-tolerant)
ts_wall TIMESTAMPTZ  -- Wall-clock timestamp (informational)
```

**Usage**:
- `t_hlc` is generated by `IHybridLogicalClock.Tick()` on state transitions
- `ts_wall` is populated by `TimeProvider.GetUtcNow()` for debugging
- Index on `t_hlc` for ordering queries: `CREATE INDEX idx_promotions_hlc ON promotions(t_hlc);`
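
The `Tick()` contract can be illustrated with a minimal sketch (TypeScript; packing wall-clock milliseconds and a logical counter into a single `BIGINT`-compatible value is an assumed encoding, not the shipped `IHybridLogicalClock`):

```typescript
// Minimal HLC sketch: wall millis shifted left 16 bits | 16-bit logical counter.
// Values stay strictly increasing even when the wall clock stalls or steps back.
function makeHlc() {
  let lastPhysical = 0;
  let counter = 0;
  return {
    tick(nowMs: number = Date.now()): bigint {
      if (nowMs > lastPhysical) {
        lastPhysical = nowMs; // clock advanced: reset the logical counter
        counter = 0;
      } else {
        counter += 1; // same or earlier wall time: bump the logical part
      }
      return (BigInt(lastPhysical) << 16n) | BigInt(counter & 0xffff);
    },
  };
}
```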

**Reference**: See [Implementation Guide](../implementation-guide.md#hybrid-logical-clock-hlc-for-distributed-ordering) for HLC usage patterns.

---

## Row-Level Security (Multi-Tenancy)

All tables with `tenant_id` should have RLS enabled:

```sql
-- Enable RLS on all release tables
ALTER TABLE integrations ENABLE ROW LEVEL SECURITY;
ALTER TABLE environments ENABLE ROW LEVEL SECURITY;
ALTER TABLE targets ENABLE ROW LEVEL SECURITY;
ALTER TABLE releases ENABLE ROW LEVEL SECURITY;
ALTER TABLE promotions ENABLE ROW LEVEL SECURITY;
-- ... etc.

-- Example policy
CREATE POLICY tenant_isolation ON integrations
    FOR ALL
    USING (tenant_id = current_setting('app.tenant_id')::UUID);
```
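
With that policy in place, each request must set `app.tenant_id` before querying. A sketch of the statement sequence a request handler might issue (TypeScript; the helper name is illustrative, and `SET LOCAL` is used so the setting is confined to the transaction on pooled connections):

```typescript
// Build the statements for one tenant-scoped unit of work.
// SET LOCAL confines app.tenant_id to the enclosing transaction, so a
// pooled connection cannot leak one tenant's setting to the next caller.
function tenantScopedStatements(tenantId: string, work: string[]): string[] {
  if (!/^[0-9a-fA-F-]{36}$/.test(tenantId)) {
    throw new Error("tenantId must be a UUID"); // never interpolate raw input
  }
  return [
    "BEGIN",
    `SET LOCAL app.tenant_id = '${tenantId}'`,
    ...work,
    "COMMIT",
  ];
}
```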
308
docs/modules/release-orchestrator/deployment/artifacts.md
Normal file
@@ -0,0 +1,308 @@

# Artifact Generation

## Overview

Every deployment generates immutable artifacts that enable reproducibility, audit, and rollback.

## Generated Artifacts

### 1. Compose Lock File

**File:** `compose.stella.lock.yml`

A Docker Compose file with all image references pinned to specific digests.

```yaml
# compose.stella.lock.yml
# Generated by Stella Ops - DO NOT EDIT
# Release: myapp-v2.3.1
# Generated: 2026-01-10T14:30:00Z
# Generator: stella-artifact-generator@1.5.0

version: "3.8"

services:
  api:
    image: registry.example.com/myapp/api@sha256:abc123...
    # Original tag: v2.3.1
    deploy:
      replicas: 2
    environment:
      - DATABASE_URL=${DATABASE_URL}
      - REDIS_URL=${REDIS_URL}
    labels:
      stella.component.id: "comp-api-uuid"
      stella.release.id: "rel-uuid"
      stella.digest: "sha256:abc123..."

  worker:
    image: registry.example.com/myapp/worker@sha256:def456...
    # Original tag: v2.3.1
    deploy:
      replicas: 1
    labels:
      stella.component.id: "comp-worker-uuid"
      stella.release.id: "rel-uuid"
      stella.digest: "sha256:def456..."

# Stella metadata
x-stella:
  release:
    id: "rel-uuid"
    name: "myapp-v2.3.1"
    created_at: "2026-01-10T14:00:00Z"
  environment:
    id: "env-uuid"
    name: "production"
  deployment:
    id: "deploy-uuid"
    started_at: "2026-01-10T14:30:00Z"
  checksums:
    sha256: "checksum-of-this-file"
```

### 2. Version Sticker

**File:** `stella.version.json`

Metadata file placed on deployment targets indicating current deployment state.

```json
{
  "version": "1.0",
  "generatedAt": "2026-01-10T14:35:00Z",
  "generator": "stella-artifact-generator@1.5.0",

  "release": {
    "id": "rel-uuid",
    "name": "myapp-v2.3.1",
    "createdAt": "2026-01-10T14:00:00Z",
    "components": [
      {
        "name": "api",
        "digest": "sha256:abc123...",
        "semver": "2.3.1",
        "tag": "v2.3.1"
      },
      {
        "name": "worker",
        "digest": "sha256:def456...",
        "semver": "2.3.1",
        "tag": "v2.3.1"
      }
    ]
  },

  "deployment": {
    "id": "deploy-uuid",
    "promotionId": "promo-uuid",
    "environmentId": "env-uuid",
    "environmentName": "production",
    "targetId": "target-uuid",
    "targetName": "prod-web-01",
    "strategy": "rolling",
    "startedAt": "2026-01-10T14:30:00Z",
    "completedAt": "2026-01-10T14:35:00Z"
  },

  "deployer": {
    "userId": "user-uuid",
    "userName": "john.doe",
    "agentId": "agent-uuid",
    "agentName": "prod-agent-01"
  },

  "previous": {
    "releaseId": "prev-rel-uuid",
    "releaseName": "myapp-v2.3.0",
    "digest": "sha256:789..."
  },

  "signature": "base64-encoded-signature",
  "signatureAlgorithm": "RS256",
  "signerKeyRef": "stella/signing/prod-key-2026"
}
```

### 3. Evidence Packet

**File:** Evidence stored in database (exportable as JSON/PDF)

See [Evidence Schema](../appendices/evidence-schema.md) for full specification.

### 4. Deployment Script (Optional)

**File:** `deploy.stella.script.dll` or `deploy.stella.sh`

When deployments use C# or shell scripts with hooks:

```csharp
// deploy.stella.csx (source, compiled to DLL)
#r "nuget: StellaOps.Sdk, 1.0.0"

using StellaOps.Sdk;

// Pre-deploy hook
await Context.RunPreDeployHook(async (ctx) => {
    await ctx.ExecuteCommand("./scripts/backup-database.sh");
    await ctx.HealthCheck("/ready", timeout: 30);
});

// Deploy
await Context.Deploy();

// Post-deploy hook
await Context.RunPostDeployHook(async (ctx) => {
    await ctx.ExecuteCommand("./scripts/warm-cache.sh");
    await ctx.Notify("slack", "Deployment complete");
});
```

## Artifact Storage

### Storage Structure

```
artifacts/
├── {tenant_id}/
│   ├── {deployment_id}/
│   │   ├── compose.stella.lock.yml
│   │   ├── deploy.stella.script.dll (if applicable)
│   │   ├── deploy.stella.script.csx (source)
│   │   ├── manifest.json
│   │   └── checksums.sha256
│   └── ...
└── ...
```

### Manifest File

```json
{
  "version": "1.0",
  "deploymentId": "deploy-uuid",
  "createdAt": "2026-01-10T14:30:00Z",
  "artifacts": [
    {
      "name": "compose.stella.lock.yml",
      "type": "compose-lock",
      "size": 2048,
      "sha256": "abc123..."
    },
    {
      "name": "deploy.stella.script.dll",
      "type": "script-compiled",
      "size": 8192,
      "sha256": "def456..."
    }
  ],
  "totalSize": 10240,
  "signature": "base64-signature"
}
```
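
Each entry in `artifacts` can be derived mechanically from the file bytes. A sketch (TypeScript with Node's `crypto`; `manifestEntry` is an illustrative helper, not a documented API):

```typescript
import { createHash } from "node:crypto";

// Compute one manifest entry (name/type/size/sha256) for an artifact blob.
function manifestEntry(name: string, type: string, content: Buffer) {
  return {
    name,
    type,
    size: content.length,
    sha256: createHash("sha256").update(content).digest("hex"),
  };
}
```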
## Artifact Generation Process

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                         ARTIFACT GENERATION FLOW                            │
│                                                                             │
│   ┌─────────────────┐                                                       │
│   │   Promotion     │                                                       │
│   │   Approved      │                                                       │
│   └────────┬────────┘                                                       │
│            │                                                                │
│            ▼                                                                │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                       ARTIFACT GENERATOR                            │   │
│   │                                                                     │   │
│   │   1. Load release bundle (components, digests)                      │   │
│   │   2. Load environment configuration (variables, secrets refs)       │   │
│   │   3. Load workflow template (hooks, scripts)                        │   │
│   │   4. Generate compose.stella.lock.yml                               │   │
│   │   5. Compile scripts (if any)                                       │   │
│   │   6. Generate version sticker template                              │   │
│   │   7. Compute checksums                                              │   │
│   │   8. Sign artifacts                                                 │   │
│   │   9. Store in artifact storage                                      │   │
│   │                                                                     │   │
│   └────────────────────────────┬────────────────────────────────────────┘   │
│                                │                                            │
│                                ▼                                            │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                      DEPLOYMENT ORCHESTRATOR                        │   │
│   │                                                                     │   │
│   │   Artifacts distributed to targets via agents                       │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Artifact Properties

### Immutability

Once generated, artifacts are never modified:
- Content-addressed storage (hash in path/metadata)
- No overwrite capability
- Append-only storage pattern

### Integrity

All artifacts are:
- Checksummed (SHA-256)
- Signed with deployment key
- Verifiable at deployment time
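
Checksum verification at deployment time reduces to recomputing the digest over the downloaded bytes and comparing it with the manifest. A sketch (TypeScript with Node's `crypto`; `verifyManifest` and its input shapes are illustrative):

```typescript
import { createHash } from "node:crypto";

interface ManifestArtifact {
  name: string;
  sha256: string;
}

// Return the names of artifacts whose bytes are missing or do not match
// the sha256 recorded in the manifest. Empty array means everything checks out.
function verifyManifest(
  manifest: { artifacts: ManifestArtifact[] },
  files: Map<string, Buffer>
): string[] {
  const bad: string[] = [];
  for (const a of manifest.artifacts) {
    const content = files.get(a.name);
    const actual = content
      ? createHash("sha256").update(content).digest("hex")
      : null;
    if (actual !== a.sha256) bad.push(a.name);
  }
  return bad;
}
```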
### Retention

| Environment | Retention Period |
|-------------|------------------|
| Development | 30 days |
| Staging | 90 days |
| Production | 7 years (compliance) |

## API Operations

```yaml
# List artifacts for deployment
GET /api/v1/deployment-jobs/{id}/artifacts
Response: Artifact[]

# Download specific artifact
GET /api/v1/deployment-jobs/{id}/artifacts/{name}
Response: binary

# Get artifact manifest
GET /api/v1/deployment-jobs/{id}/artifacts/manifest
Response: ArtifactManifest

# Verify artifact integrity
POST /api/v1/deployment-jobs/{id}/artifacts/{name}/verify
Response: { valid: boolean, checksum: string, signature: string }
```
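
A client-side sketch of calling the verify endpoint (TypeScript; only the path shape comes from the listing above — the base URL, bearer-token auth, and function names are illustrative assumptions):

```typescript
// Build the verify endpoint path for one deployment-job artifact.
function verifyUrl(baseUrl: string, jobId: string, name: string): string {
  return `${baseUrl}/api/v1/deployment-jobs/${jobId}/artifacts/${encodeURIComponent(name)}/verify`;
}

// Call it (assumes bearer-token auth; any fetch-capable runtime works).
async function verifyArtifact(baseUrl: string, jobId: string, name: string, token: string) {
  const res = await fetch(verifyUrl(baseUrl, jobId, name), {
    method: "POST",
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`verify failed: HTTP ${res.status}`);
  return res.json() as Promise<{ valid: boolean; checksum: string; signature: string }>;
}
```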
## Drift Detection

Version stickers enable drift detection:

```typescript
interface DriftCheck {
  targetId: UUID;
  expectedSticker: VersionSticker;
  actualSticker: VersionSticker | null;
  driftDetected: boolean;
  driftType?: "missing" | "corrupted" | "mismatch";
  details?: {
    expectedDigest: string;
    actualDigest: string;
    field: string;
  };
}
```
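
The interface above can be driven by a small comparison routine. A sketch (TypeScript; the sticker shape is reduced to the fields needed here, and the names are illustrative rather than the shipped checker):

```typescript
interface StickerLike {
  releaseId: string;
  digest: string;
}

interface Drift {
  targetId: string;
  driftDetected: boolean;
  driftType?: "missing" | "mismatch";
  details?: { expectedDigest: string; actualDigest: string; field: string };
}

// Compare the sticker we expect on a target with what was actually read back.
function checkDrift(targetId: string, expected: StickerLike, actual: StickerLike | null): Drift {
  if (actual === null) {
    return { targetId, driftDetected: true, driftType: "missing" };
  }
  if (actual.digest !== expected.digest) {
    return {
      targetId,
      driftDetected: true,
      driftType: "mismatch",
      details: { expectedDigest: expected.digest, actualDigest: actual.digest, field: "digest" },
    };
  }
  return { targetId, driftDetected: false };
}
```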

## References

- [Deployment Overview](overview.md)
- [Deployment Strategies](strategies.md)
- [Evidence Schema](../appendices/evidence-schema.md)

671
docs/modules/release-orchestrator/deployment/overview.md
Normal file
@@ -0,0 +1,671 @@

# Deployment Overview

## Purpose

The Deployment system executes releases against target environments, managing deployment jobs and tasks, artifact generation, and rollback.

## Deployment Architecture

```
                          DEPLOYMENT ARCHITECTURE

┌─────────────────────────────────────────────────────────────────────────────┐
│                           DEPLOY ORCHESTRATOR                               │
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                      DEPLOYMENT JOB MANAGER                         │   │
│   │                                                                     │   │
│   │   Promotion ───► Create Job ───► Plan Tasks ───► Execute Tasks      │   │
│   │                                                                     │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                   │                                         │
│                   ┌───────────────┼───────────────┐                         │
│                   │               │               │                         │
│                   ▼               ▼               ▼                         │
│   ┌─────────────────────┐ ┌─────────────────┐ ┌─────────────────────┐       │
│   │   TARGET EXECUTOR   │ │ RUNNER EXECUTOR │ │ ARTIFACT GENERATOR  │       │
│   │                     │ │                 │ │                     │       │
│   │ - Task dispatch     │ │ - Agent tasks   │ │ - Compose files     │       │
│   │ - Status tracking   │ │ - SSH tasks     │ │ - Env configs       │       │
│   │ - Log aggregation   │ │ - API tasks     │ │ - Manifests         │       │
│   └─────────────────────┘ └─────────────────┘ └─────────────────────┘       │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
                                   │
      ┌────────────────────────────┼────────────────────────────┐
      │                            │                            │
      ▼                            ▼                            ▼
┌─────────────┐            ┌─────────────┐            ┌─────────────┐
│   Agent     │            │  Agentless  │            │    API      │
│  Execution  │            │  Execution  │            │  Execution  │
│             │            │             │            │             │
│  Docker,    │            │  SSH,       │            │  ECS,       │
│  Compose    │            │  WinRM      │            │  Nomad      │
└─────────────┘            └─────────────┘            └─────────────┘
```

## Deployment Flow

### Standard Deployment Flow

```
                              DEPLOYMENT FLOW

 Promotion          Deployment          Task              Agent/Target
 Approved           Job                 Execution
    │                   │                   │                   │
    │   Create Job      │                   │                   │
    ├──────────────────►│                   │                   │
    │                   │                   │                   │
    │                   │   Generate        │                   │
    │                   │   Artifacts       │                   │
    │                   ├──────────────────►│                   │
    │                   │                   │                   │
    │                   │   Create Tasks    │                   │
    │                   │   per Target      │                   │
    │                   ├──────────────────►│                   │
    │                   │                   │                   │
    │                   │                   │   Dispatch Task   │
    │                   │                   ├──────────────────►│
    │                   │                   │                   │
    │                   │                   │   Execute         │
    │                   │                   │   (Pull, Deploy)  │
    │                   │                   │                   │
    │                   │                   │   Report Status   │
    │                   │                   │◄──────────────────┤
    │                   │                   │                   │
    │                   │   Aggregate       │                   │
    │                   │   Results         │                   │
    │                   │◄──────────────────┤                   │
    │                   │                   │                   │
    │   Job Complete    │                   │                   │
    │◄──────────────────┤                   │                   │
    │                   │                   │                   │
```

## Deployment Job

### Job Entity

```typescript
interface DeploymentJob {
  id: UUID;
  promotionId: UUID;
  releaseId: UUID;
  environmentId: UUID;

  // Execution configuration
  strategy: DeploymentStrategy;
  parallelism: number;

  // Status tracking
  status: JobStatus;
  startedAt?: DateTime;
  completedAt?: DateTime;

  // Artifacts
  artifacts: GeneratedArtifact[];

  // Rollback reference
  rollbackOf?: UUID; // If this is a rollback job
  previousJobId?: UUID; // Previous successful job

  // Tasks
  tasks: DeploymentTask[];
}

type JobStatus =
  | "pending"
  | "preparing"
  | "running"
  | "completing"
  | "completed"
  | "failed"
  | "rolling_back"
  | "rolled_back";

type DeploymentStrategy =
  | "all-at-once"
  | "rolling"
  | "canary"
  | "blue-green";
```

### Job State Machine

```
                         JOB STATE MACHINE

                        ┌──────────┐
                        │ PENDING  │
                        └────┬─────┘
                             │ start()
                             ▼
                        ┌──────────┐
                        │PREPARING │
                        │          │
                        │ Generate │
                        │ artifacts│
                        └────┬─────┘
                             │
                             ▼
                        ┌──────────┐
                        │ RUNNING  │◄────────────────┐
                        │          │                 │
                        │ Execute  │                 │
                        │ tasks    │                 │
                        └────┬─────┘                 │
                             │                       │
             ┌───────────────┼───────────────┐       │
             │               │               │       │
             ▼               ▼               ▼       │
        ┌──────────┐    ┌──────────┐    ┌──────────┐ │
        │COMPLETING│    │  FAILED  │    │ ROLLING  │ │
        │          │    │          │    │  BACK    │─┘
        │ Verify   │    │          │    │          │
        │ health   │    │          │    │          │
        └────┬─────┘    └────┬─────┘    └────┬─────┘
             │               │               │
             ▼               │               ▼
        ┌──────────┐         │          ┌──────────┐
        │COMPLETED │         │          │  ROLLED  │
        └──────────┘         │          │   BACK   │
                             │          └──────────┘
                             │
                             ▼
                        [Failure
                         handling]
```
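
The state machine above can be encoded as a transition table so invalid moves are rejected in one place. A sketch (TypeScript; the allowed transitions are read off the diagram, which is an interpretation rather than the shipped rules):

```typescript
// Allowed JobStatus transitions, mirroring the state machine diagram.
const jobTransitions: Record<string, readonly string[]> = {
  pending: ["preparing"],
  preparing: ["running"],
  running: ["completing", "failed", "rolling_back"],
  completing: ["completed"],
  rolling_back: ["running", "rolled_back"], // rollback can re-enter running
  completed: [],
  failed: [],
  rolled_back: [],
};

function canTransition(from: string, to: string): boolean {
  return (jobTransitions[from] ?? []).includes(to);
}
```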
## Deployment Task
|
||||
|
||||
### Task Entity
|
||||
|
||||
```typescript
|
||||
interface DeploymentTask {
|
||||
id: UUID;
|
||||
jobId: UUID;
|
||||
targetId: UUID;
|
||||
|
||||
// What to deploy
|
||||
componentId: UUID;
|
||||
digest: string;
|
||||
|
||||
// Execution
|
||||
status: TaskStatus;
|
||||
agentId?: UUID;
|
||||
startedAt?: DateTime;
|
||||
completedAt?: DateTime;
|
||||
|
||||
// Results
|
||||
logs: string;
|
||||
previousDigest?: string; // For rollback
|
||||
error?: string;
|
||||
|
||||
// Retry tracking
|
||||
attemptNumber: number;
|
||||
maxAttempts: number;
|
||||
}
|
||||
|
||||
type TaskStatus =
|
||||
| "pending"
|
||||
| "queued"
|
||||
| "dispatched"
|
||||
| "running"
|
||||
| "verifying"
|
||||
| "succeeded"
|
||||
| "failed"
|
||||
| "retrying";
|
||||
```

### Task Dispatch

```typescript
class TaskDispatcher {
  async dispatchTask(task: DeploymentTask): Promise<void> {
    const target = await this.targetRepository.get(task.targetId);

    switch (target.executionModel) {
      case "agent":
        await this.dispatchToAgent(task, target);
        break;

      case "ssh":
        await this.dispatchViaSsh(task, target);
        break;

      case "api":
        await this.dispatchViaApi(task, target);
        break;

      default:
        throw new Error(`Unsupported execution model: ${target.executionModel}`);
    }
  }

  private async dispatchToAgent(
    task: DeploymentTask,
    target: Target
  ): Promise<void> {
    // Find an available agent for the target
    const agent = await this.agentManager.findAgentForTarget(target);

    if (!agent) {
      throw new NoAgentAvailableError(target.id);
    }

    // Create task payload
    const payload: AgentTaskPayload = {
      taskId: task.id,
      targetId: target.id,
      action: "deploy",
      digest: task.digest,
      config: target.connection,
      credentials: await this.fetchTaskCredentials(target)
    };

    // Dispatch to agent
    await this.agentClient.dispatchTask(agent.id, payload);

    // Update task status
    task.status = "dispatched";
    task.agentId = agent.id;
    await this.taskRepository.update(task);
  }
}
```

## Generated Artifacts

### Artifact Types

| Type | Description | Format |
|------|-------------|--------|
| `compose-file` | Docker Compose file | YAML |
| `compose-lock` | Pinned Compose file | YAML |
| `env-file` | Environment variables | .env |
| `systemd-unit` | systemd service unit | .service |
| `nginx-config` | Nginx configuration | .conf |
| `manifest` | Deployment manifest | JSON |
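
For the simpler artifact types, generation is a straightforward render step. A minimal sketch for the `env-file` type follows; the helper and its quoting convention are assumptions for illustration, not part of the spec.

```typescript
// Render a key-value map as a .env artifact.
// Values containing whitespace, '#', or '"' are quoted (assumed convention).
function renderEnvFile(vars: Record<string, string>): string {
  return Object.entries(vars)
    .map(([key, value]) => {
      const needsQuotes = /[\s#"]/.test(value);
      const escaped = value.replace(/"/g, '\\"');
      return `${key}=${needsQuotes ? `"${escaped}"` : value}`;
    })
    .join("\n") + "\n";
}
```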

### Compose Lock Generation

```typescript
interface ComposeLock {
  version: string;
  services: Record<string, LockedService>;
  generated: {
    releaseId: string;
    promotionId: string;
    timestamp: string;
    digest: string; // Hash of this file
  };
}

interface LockedService {
  image: string; // Full image reference with digest
  environment?: Record<string, string>;
  labels: Record<string, string>;
}

class ComposeArtifactGenerator {
  async generateLock(
    release: Release,
    target: Target,
    template: ComposeTemplate
  ): Promise<ComposeLock> {
    const services: Record<string, LockedService> = {};

    for (const [serviceName, serviceConfig] of Object.entries(template.services)) {
      // Find the release component backing this service
      const component = release.components.find(
        c => c.name === serviceConfig.componentName
      );

      if (!component) {
        throw new Error(`No component found for service ${serviceName}`);
      }

      // Build the locked image reference
      const imageRef = `${component.repository}@${component.digest}`;

      services[serviceName] = {
        image: imageRef,
        environment: {
          ...serviceConfig.environment,
          STELLA_RELEASE_ID: release.id,
          STELLA_DIGEST: component.digest
        },
        labels: {
          "stella.release.id": release.id,
          "stella.component.name": component.name,
          "stella.digest": component.digest,
          "stella.deployed.at": new Date().toISOString()
        }
      };
    }

    const lock: ComposeLock = {
      version: "3.8",
      services,
      generated: {
        releaseId: release.id,
        promotionId: target.promotionId,
        timestamp: new Date().toISOString(),
        digest: "" // Computed below
      }
    };

    // Compute content hash over the serialized lock (digest field still empty)
    const content = yaml.stringify(lock);
    lock.generated.digest = crypto.createHash("sha256").update(content).digest("hex");

    return lock;
  }
}
```
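
Because the hash is computed while `generated.digest` is still empty, a verifier must blank that field before re-hashing. A sketch of the check, using JSON instead of the YAML serializer above purely for self-containment (serialization and key order must be stable for the comparison to hold):

```typescript
import { createHash } from "node:crypto";

interface Generated { releaseId: string; promotionId: string; timestamp: string; digest: string; }
interface Lock { version: string; services: Record<string, unknown>; generated: Generated; }

function computeLockDigest(lock: Lock): string {
  // Hash the lock with the digest field blanked, matching generation order
  const copy: Lock = { ...lock, generated: { ...lock.generated, digest: "" } };
  return createHash("sha256").update(JSON.stringify(copy)).digest("hex");
}

function verifyLock(lock: Lock): boolean {
  return computeLockDigest(lock) === lock.generated.digest;
}
```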

## Deployment Execution

### Execution Models

| Model | Description | Use Case |
|-------|-------------|----------|
| `agent` | Stella agent on target | Docker hosts, servers |
| `ssh` | SSH-based agentless | Unix servers |
| `winrm` | WinRM-based agentless | Windows servers |
| `api` | API-based | ECS, Nomad |
### Agent-Based Execution

```typescript
class AgentExecutor {
  async execute(task: DeploymentTask): Promise<ExecutionResult> {
    const agent = await this.agentManager.get(task.agentId);
    const target = await this.targetRepository.get(task.targetId);

    // Prepare task payload with secrets
    const payload: TaskPayload = {
      taskId: task.id,
      targetId: target.id,
      action: "deploy",
      digest: task.digest,
      config: target.connection,
      artifacts: await this.getArtifacts(task.jobId),
      credentials: await this.secretsManager.fetchForTask(target)
    };

    // Dispatch to agent
    const taskRef = await this.agentClient.dispatchTask(agent.id, payload);

    // Wait for completion
    const result = await this.waitForTaskCompletion(taskRef, task.timeout);

    return result;
  }

  private async waitForTaskCompletion(
    taskRef: TaskReference,
    timeout: number
  ): Promise<ExecutionResult> {
    const deadline = Date.now() + timeout * 1000;

    while (Date.now() < deadline) {
      const status = await this.agentClient.getTaskStatus(taskRef);

      if (status.completed) {
        return {
          success: status.success,
          logs: status.logs,
          deployedDigest: status.deployedDigest,
          error: status.error
        };
      }

      await sleep(1000);
    }

    throw new TimeoutError(`Task did not complete within ${timeout} seconds`);
  }
}
```

### SSH-Based Execution

```typescript
class SshExecutor {
  async execute(task: DeploymentTask): Promise<ExecutionResult> {
    const target = await this.targetRepository.get(task.targetId);
    const sshConfig = target.connection as SshConnectionConfig;

    // Get SSH credentials from vault
    const creds = await this.secretsManager.fetchSshCredentials(
      sshConfig.credentialRef
    );

    // Connect via SSH
    const ssh = new NodeSSH();
    await ssh.connect({
      host: sshConfig.host,
      port: sshConfig.port || 22,
      username: creds.username,
      privateKey: creds.privateKey
    });

    try {
      // Upload artifacts
      const artifacts = await this.getArtifacts(task.jobId);
      for (const artifact of artifacts) {
        await ssh.putFile(artifact.localPath, artifact.remotePath);
      }

      // Execute deployment script
      const result = await ssh.execCommand(
        this.buildDeployCommand(task, target),
        { cwd: sshConfig.workDir }
      );

      return {
        success: result.code === 0,
        logs: `${result.stdout}\n${result.stderr}`,
        error: result.code !== 0 ? result.stderr : undefined
      };
    } finally {
      ssh.dispose();
    }
  }

  private buildDeployCommand(task: DeploymentTask, target: Target): string {
    // Build deployment command based on target type.
    // Assumes task.digest carries a full, pullable repo@sha256 reference.
    switch (target.targetType) {
      case "compose_host":
        return `cd ${target.connection.workDir} && docker-compose pull && docker-compose up -d`;

      case "docker_host":
        return `docker pull ${task.digest} && docker stop ${target.containerName} && docker rm ${target.containerName} && docker run -d --name ${target.containerName} ${task.digest}`;

      default:
        throw new Error(`Unsupported target type: ${target.targetType}`);
    }
  }
}
```

## Health Verification

```typescript
interface HealthCheckConfig {
  type: "http" | "tcp" | "command";
  timeout: number;
  retries: number;
  interval: number;

  // HTTP-specific
  path?: string;
  expectedStatus?: number;
  expectedBody?: string;

  // TCP-specific
  port?: number;

  // Command-specific
  command?: string;
}

class HealthVerifier {
  async verify(
    target: Target,
    config: HealthCheckConfig
  ): Promise<HealthCheckResult> {
    let lastError: Error | undefined;

    for (let attempt = 0; attempt < config.retries; attempt++) {
      try {
        const result = await this.performCheck(target, config);

        if (result.healthy) {
          return result;
        }

        lastError = new Error(result.message);
      } catch (error) {
        lastError = error as Error;
      }

      if (attempt < config.retries - 1) {
        await sleep(config.interval * 1000);
      }
    }

    return {
      healthy: false,
      message: lastError?.message || "Health check failed",
      attempts: config.retries
    };
  }

  private async performCheck(
    target: Target,
    config: HealthCheckConfig
  ): Promise<HealthCheckResult> {
    switch (config.type) {
      case "http":
        return this.httpCheck(target, config);

      case "tcp":
        return this.tcpCheck(target, config);

      case "command":
        return this.commandCheck(target, config);
    }
  }

  private async httpCheck(
    target: Target,
    config: HealthCheckConfig
  ): Promise<HealthCheckResult> {
    const url = `${target.healthEndpoint}${config.path || "/health"}`;

    try {
      const response = await fetch(url, {
        signal: AbortSignal.timeout(config.timeout * 1000)
      });

      const healthy = response.status === (config.expectedStatus || 200);

      return {
        healthy,
        message: healthy ? "OK" : `Status ${response.status}`,
        statusCode: response.status
      };
    } catch (error) {
      return {
        healthy: false,
        message: (error as Error).message
      };
    }
  }
}
```
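
Only `httpCheck` is shown above. A `tcpCheck` can be sketched with Node's `net` module; this is an assumption for illustration, since the spec does not prescribe an implementation:

```typescript
import * as net from "node:net";

interface HealthCheckResult { healthy: boolean; message: string; }

// A TCP check succeeds if the port accepts a connection within the timeout.
function tcpCheck(host: string, port: number, timeoutMs: number): Promise<HealthCheckResult> {
  return new Promise(resolve => {
    const socket = net.connect({ host, port });
    const finish = (healthy: boolean, message: string) => {
      socket.destroy(); // idempotent; later events resolve harmlessly
      resolve({ healthy, message });
    };
    socket.setTimeout(timeoutMs, () => finish(false, `Timeout after ${timeoutMs}ms`));
    socket.once("connect", () => finish(true, "OK"));
    socket.once("error", err => finish(false, err.message));
  });
}
```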

## Rollback Management

```typescript
class RollbackManager {
  async initiateRollback(
    jobId: UUID,
    reason: string
  ): Promise<DeploymentJob> {
    const failedJob = await this.jobRepository.get(jobId);
    const previousJob = await this.findPreviousSuccessfulJob(
      failedJob.environmentId,
      failedJob.releaseId
    );

    if (!previousJob) {
      throw new NoRollbackTargetError(jobId);
    }

    // Create rollback job
    const rollbackJob: DeploymentJob = {
      id: uuidv4(),
      promotionId: failedJob.promotionId,
      releaseId: previousJob.releaseId, // Previous release
      environmentId: failedJob.environmentId,
      strategy: "all-at-once", // Fast rollback
      parallelism: 10,
      status: "pending",
      rollbackOf: jobId,
      previousJobId: previousJob.id,
      artifacts: [],
      tasks: []
    };

    // Create tasks to restore previous state
    for (const task of failedJob.tasks) {
      const previousTask = previousJob.tasks.find(
        t => t.targetId === task.targetId
      );

      if (previousTask) {
        rollbackJob.tasks.push({
          id: uuidv4(),
          jobId: rollbackJob.id,
          targetId: task.targetId,
          componentId: previousTask.componentId,
          digest: previousTask.previousDigest || task.previousDigest!,
          status: "pending",
          logs: "",
          attemptNumber: 0,
          maxAttempts: 3
        });
      }
    }

    await this.jobRepository.save(rollbackJob);

    // Execute rollback
    await this.executeJob(rollbackJob);

    return rollbackJob;
  }

  private async findPreviousSuccessfulJob(
    environmentId: UUID,
    excludeReleaseId: UUID
  ): Promise<DeploymentJob | null> {
    return this.jobRepository.findOne({
      environmentId,
      status: "completed",
      releaseId: { $ne: excludeReleaseId }
    }, {
      orderBy: { completedAt: "desc" }
    });
  }
}
```

## References

- [Deployment Strategies](strategies.md)
- [Agent-Based Deployment](agent-based.md)
- [Agentless Deployment](agentless.md)
- [Generated Artifacts](artifacts.md)
- [Deploy Orchestrator Module](../modules/deploy-orchestrator.md)

656
docs/modules/release-orchestrator/deployment/strategies.md
Normal file
656
docs/modules/release-orchestrator/deployment/strategies.md
Normal file
@@ -0,0 +1,656 @@
# Deployment Strategies

## Overview

Release Orchestrator supports multiple deployment strategies to balance deployment speed, risk, and availability requirements.

## Strategy Comparison

| Strategy | Description | Risk Level | Downtime | Rollback Speed |
|----------|-------------|------------|----------|----------------|
| All-at-once | Deploy to all targets simultaneously | High | Brief | Fast |
| Rolling | Deploy to targets in batches | Medium | None | Medium |
| Canary | Deploy to subset, then expand | Low | None | Fast |
| Blue-Green | Deploy to parallel environment | Low | None | Instant |
## All-at-Once Strategy

### Description

Deploys to all targets simultaneously. Simple and fast, but highest risk.

```
ALL-AT-ONCE DEPLOYMENT

Time 0                     Time 1
┌─────────────────┐        ┌─────────────────┐
│ Target 1  [v1]  │        │ Target 1  [v2]  │
├─────────────────┤        ├─────────────────┤
│ Target 2  [v1]  │  ───►  │ Target 2  [v2]  │
├─────────────────┤        ├─────────────────┤
│ Target 3  [v1]  │        │ Target 3  [v2]  │
└─────────────────┘        └─────────────────┘
```
### Configuration

```typescript
interface AllAtOnceConfig {
  strategy: "all-at-once";

  // Concurrency limit (0 = unlimited)
  maxConcurrent: number;

  // Health check after deployment
  healthCheck: HealthCheckConfig;

  // Failure behavior
  failureBehavior: "rollback" | "continue" | "pause";
}

// Example
const config: AllAtOnceConfig = {
  strategy: "all-at-once",
  maxConcurrent: 0,
  healthCheck: {
    type: "http",
    path: "/health",
    timeout: 30,
    retries: 3,
    interval: 10
  },
  failureBehavior: "rollback"
};
```

### Execution

```typescript
// pMap is the "p-map" package: map with a concurrency limit
class AllAtOnceExecutor {
  async execute(job: DeploymentJob, config: AllAtOnceConfig): Promise<void> {
    const tasks = job.tasks;
    const concurrency = config.maxConcurrent || tasks.length;

    // Execute all tasks with the concurrency limit
    const results = await pMap(
      tasks,
      async (task) => {
        try {
          await this.executeTask(task);
          return { taskId: task.id, success: true };
        } catch (error) {
          return { taskId: task.id, success: false, error };
        }
      },
      { concurrency }
    );

    // Check for failures
    const failures = results.filter(r => !r.success);

    if (failures.length > 0) {
      if (config.failureBehavior === "rollback") {
        await this.rollbackAll(job);
        throw new DeploymentFailedError(failures);
      } else if (config.failureBehavior === "pause") {
        job.status = "failed";
        throw new DeploymentFailedError(failures);
      }
      // "continue" - proceed despite failures
    }

    // Health check all targets
    await this.verifyAllTargets(job, config.healthCheck);
  }
}
```

### Use Cases

- Development environments
- Small deployments
- Time-critical updates
- Stateless services with fast startup

## Rolling Strategy

### Description

Deploys to targets in configurable batches, maintaining availability throughout.

```
ROLLING DEPLOYMENT (batch size: 1)

Time 0            Time 1            Time 2            Time 3
┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐
│ T1 [v1]     │   │ T1 [v2] ✓   │   │ T1 [v2] ✓   │   │ T1 [v2] ✓   │
├─────────────┤   ├─────────────┤   ├─────────────┤   ├─────────────┤
│ T2 [v1]     │──►│ T2 [v1]     │──►│ T2 [v2] ✓   │──►│ T2 [v2] ✓   │
├─────────────┤   ├─────────────┤   ├─────────────┤   ├─────────────┤
│ T3 [v1]     │   │ T3 [v1]     │   │ T3 [v1]     │   │ T3 [v2] ✓   │
└─────────────┘   └─────────────┘   └─────────────┘   └─────────────┘
```

### Configuration

```typescript
interface RollingConfig {
  strategy: "rolling";

  // Batch configuration
  batchSize: number;         // Targets per batch
  batchPercent?: number;     // Alternative: percentage of targets

  // Timing
  batchDelay: number;        // Seconds between batches
  stabilizationTime: number; // Wait after health check passes

  // Health check
  healthCheck: HealthCheckConfig;

  // Failure handling
  maxFailedBatches: number;  // Failures before stopping
  failureBehavior: "rollback" | "pause" | "skip";

  // Ordering
  targetOrder: "default" | "shuffle" | "priority";
}

// Example
const config: RollingConfig = {
  strategy: "rolling",
  batchSize: 2,
  batchDelay: 30,
  stabilizationTime: 60,
  healthCheck: {
    type: "http",
    path: "/health",
    timeout: 30,
    retries: 5,
    interval: 10
  },
  maxFailedBatches: 1,
  failureBehavior: "rollback",
  targetOrder: "default"
};
```

### Execution

```typescript
class RollingExecutor {
  async execute(job: DeploymentJob, config: RollingConfig): Promise<void> {
    const tasks = this.orderTasks(job.tasks, config.targetOrder);
    const batches = this.createBatches(tasks, config);
    let failedBatches = 0;
    const completedTasks: DeploymentTask[] = [];

    for (let i = 0; i < batches.length; i++) {
      const batch = batches[i];

      this.emitProgress(job, {
        phase: "deploying",
        currentBatch: i + 1,
        totalBatches: batches.length,
        completedTargets: completedTasks.length,
        totalTargets: tasks.length
      });

      // Execute batch
      const results = await Promise.all(
        batch.map(task => this.executeTask(task))
      );

      // Check batch results
      const failures = results.filter(r => !r.success);

      if (failures.length > 0) {
        failedBatches++;

        if (failedBatches > config.maxFailedBatches) {
          if (config.failureBehavior === "rollback") {
            await this.rollbackCompleted(completedTasks);
          }
          throw new DeploymentFailedError(failures);
        }

        if (config.failureBehavior === "pause") {
          job.status = "failed";
          throw new DeploymentFailedError(failures);
        }
        // "skip" - continue to next batch
      }

      // Health check batch targets
      await this.verifyBatch(batch, config.healthCheck);

      // Wait for stabilization
      if (config.stabilizationTime > 0) {
        await sleep(config.stabilizationTime * 1000);
      }

      completedTasks.push(...batch);

      // Wait before next batch
      if (i < batches.length - 1) {
        await sleep(config.batchDelay * 1000);
      }
    }
  }

  private createBatches(
    tasks: DeploymentTask[],
    config: RollingConfig
  ): DeploymentTask[][] {
    const batchSize = config.batchPercent
      ? Math.ceil(tasks.length * config.batchPercent / 100)
      : config.batchSize;

    const batches: DeploymentTask[][] = [];
    for (let i = 0; i < tasks.length; i += batchSize) {
      batches.push(tasks.slice(i, i + batchSize));
    }

    return batches;
  }
}
```
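
As a worked example of the batching math in `createBatches`: 7 targets with `batchSize: 2` yield batches of sizes 2, 2, 2, 1, while `batchPercent: 50` resolves to `Math.ceil(7 * 50 / 100) = 4`, giving batches of 4 and 3. The helper below mirrors that slicing logic on sizes alone:

```typescript
// Batch sizes produced by slicing `total` tasks into chunks of `batchSize`
function batchSizes(total: number, batchSize: number): number[] {
  const sizes: number[] = [];
  for (let i = 0; i < total; i += batchSize) {
    sizes.push(Math.min(batchSize, total - i));
  }
  return sizes;
}
```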

### Use Cases

- Production deployments
- High-availability requirements
- Large target counts
- Services requiring gradual rollout

## Canary Strategy

### Description

Deploys to a small subset of targets first, validates, then expands to the remaining targets.

```
CANARY DEPLOYMENT

Phase 1: Canary (10%)    Phase 2: Expand (50%)    Phase 3: Full (100%)

┌─────────────┐          ┌─────────────┐          ┌─────────────┐
│ T1 [v2] ✓   │◄─canary  │ T1 [v2] ✓   │          │ T1 [v2] ✓   │
├─────────────┤          ├─────────────┤          ├─────────────┤
│ T2 [v1]     │          │ T2 [v2] ✓   │          │ T2 [v2] ✓   │
├─────────────┤          ├─────────────┤          ├─────────────┤
│ T3 [v1]     │          │ T3 [v2] ✓   │          │ T3 [v2] ✓   │
├─────────────┤          ├─────────────┤          ├─────────────┤
│ T4 [v1]     │          │ T4 [v1]     │          │ T4 [v2] ✓   │
├─────────────┤          ├─────────────┤          ├─────────────┤
│ T5 [v1]     │          │ T5 [v1]     │          │ T5 [v2] ✓   │
└─────────────┘          └─────────────┘          └─────────────┘
      │                        │                        │
      ▼                        ▼                        ▼
 Health Check             Health Check             Health Check
 Error Rate Check         Error Rate Check         Error Rate Check
```

### Configuration

```typescript
interface CanaryConfig {
  strategy: "canary";

  // Canary stages
  stages: CanaryStage[];

  // Canary selection
  canarySelector: "random" | "labeled" | "first";
  canaryLabel?: string; // Label for canary targets

  // Automatic vs manual progression
  autoProgress: boolean;

  // Health and metrics checks
  healthCheck: HealthCheckConfig;
  metricsCheck?: MetricsCheckConfig;
}

interface CanaryStage {
  name: string;
  percentage: number;   // Target percentage
  duration: number;     // Minimum time at this stage (seconds)
  autoProgress: boolean; // Auto-advance after duration
}

interface MetricsCheckConfig {
  integrationId: UUID;     // Metrics integration
  queries: MetricQuery[];
  failureThreshold: number; // Percentage deviation to fail
}

interface MetricQuery {
  name: string;
  query: string; // PromQL or similar
  operator: "lt" | "gt" | "eq";
  threshold: number;
}

// Example
const config: CanaryConfig = {
  strategy: "canary",
  stages: [
    { name: "canary", percentage: 10, duration: 300, autoProgress: false },
    { name: "expand", percentage: 50, duration: 300, autoProgress: true },
    { name: "full", percentage: 100, duration: 0, autoProgress: true }
  ],
  canarySelector: "labeled",
  canaryLabel: "canary=true",
  autoProgress: false,
  healthCheck: {
    type: "http",
    path: "/health",
    timeout: 30,
    retries: 5,
    interval: 10
  },
  metricsCheck: {
    integrationId: "prometheus-uuid",
    queries: [
      {
        name: "error_rate",
        query: "rate(http_requests_total{status=~\"5..\"}[5m]) / rate(http_requests_total[5m])",
        operator: "lt",
        threshold: 0.01 // Less than 1% error rate
      }
    ],
    failureThreshold: 10
  }
};
```

### Execution

```typescript
class CanaryExecutor {
  async execute(job: DeploymentJob, config: CanaryConfig): Promise<void> {
    const tasks = this.orderTasks(job.tasks, config);

    for (const stage of config.stages) {
      const targetCount = Math.ceil(tasks.length * stage.percentage / 100);
      const stageTasks = tasks.slice(0, targetCount);
      const newTasks = stageTasks.filter(t => t.status === "pending");

      this.emitProgress(job, {
        phase: "canary",
        stage: stage.name,
        percentage: stage.percentage,
        targets: stageTasks.length
      });

      // Deploy to new targets in this stage
      await Promise.all(newTasks.map(task => this.executeTask(task)));

      // Health check stage targets
      await this.verifyTargets(stageTasks, config.healthCheck);

      // Metrics check if configured
      if (config.metricsCheck) {
        await this.checkMetrics(stageTasks, config.metricsCheck);
      }

      // Wait for stage duration
      if (stage.duration > 0) {
        await this.waitWithMonitoring(
          stageTasks,
          stage.duration,
          config.metricsCheck
        );
      }

      // Wait for manual approval if not auto-progress
      if (!stage.autoProgress && stage.percentage < 100) {
        await this.waitForApproval(job, stage.name);
      }
    }
  }

  private async checkMetrics(
    targets: DeploymentTask[],
    config: MetricsCheckConfig
  ): Promise<void> {
    const metricsClient = await this.getMetricsClient(config.integrationId);

    for (const query of config.queries) {
      const result = await metricsClient.query(query.query);

      const passed = this.evaluateMetric(result, query);

      if (!passed) {
        throw new CanaryMetricsFailedError(query.name, result, query.threshold);
      }
    }
  }
}
```
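
Stage target counts follow the `Math.ceil` rule in the executor above; because each stage re-slices from the front of the ordered task list, earlier canary targets remain covered as the rollout expands. A small worked sketch (the helper is hypothetical):

```typescript
// Number of targets covered at each canary stage
function stageCounts(totalTargets: number, percentages: number[]): number[] {
  return percentages.map(p => Math.ceil(totalTargets * p / 100));
}
```

With 5 targets and stages of 10%, 50%, and 100%, the covered counts are 1, 3, and 5.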

### Use Cases

- Risk-sensitive deployments
- Services with real user traffic
- Deployments with metrics-based validation
- Gradual feature rollouts

## Blue-Green Strategy

### Description

Deploys to a parallel "green" environment while "blue" continues serving traffic, then switches.

```
BLUE-GREEN DEPLOYMENT

Phase 1: Deploy Green            Phase 2: Switch Traffic

┌─────────────────────────┐      ┌─────────────────────────┐
│      Load Balancer      │      │      Load Balancer      │
│           │             │      │           │             │
│           ▼             │      │           ▼             │
│    ┌─────────────┐      │      │    ┌─────────────┐      │
│    │ Blue  [v1]  │◄─active     │    │ Blue  [v1]  │      │
│    │ T1, T2, T3  │      │      │    │ T1, T2, T3  │      │
│    └─────────────┘      │      │    └─────────────┘      │
│                         │      │                         │
│    ┌─────────────┐      │      │    ┌─────────────┐      │
│    │ Green [v2]  │◄─deploy     │    │ Green [v2]  │◄─active
│    │ T4, T5, T6  │      │      │    │ T4, T5, T6  │      │
│    └─────────────┘      │      │    └─────────────┘      │
└─────────────────────────┘      └─────────────────────────┘
```

### Configuration

```typescript
interface BlueGreenConfig {
  strategy: "blue-green";

  // Environment labels
  blueLabel: string;  // Label for blue targets
  greenLabel: string; // Label for green targets

  // Traffic routing
  routerIntegration: UUID; // Router/LB integration
  routingConfig: RoutingConfig;

  // Validation
  healthCheck: HealthCheckConfig;
  warmupTime: number;         // Seconds to warm up green
  validationTests?: string[]; // Test suites to run

  // Switchover
  switchoverMode: "instant" | "gradual";
  gradualSteps?: number[]; // Percentage steps for gradual

  // Rollback
  keepBlueActive: number; // Seconds to keep blue ready
}

// Example
const config: BlueGreenConfig = {
  strategy: "blue-green",
  blueLabel: "deployment=blue",
  greenLabel: "deployment=green",
  routerIntegration: "nginx-lb-uuid",
  routingConfig: {
    upstreamName: "myapp",
    healthEndpoint: "/health"
  },
  healthCheck: {
    type: "http",
    path: "/health",
    timeout: 30,
    retries: 5,
    interval: 10
  },
  warmupTime: 60,
  validationTests: ["smoke-test-suite"],
  switchoverMode: "instant",
  keepBlueActive: 1800 // 30 minutes
};
```

### Execution

```typescript
class BlueGreenExecutor {
  async execute(job: DeploymentJob, config: BlueGreenConfig): Promise<void> {
    // Identify blue and green targets
    const { blue, green } = this.categorizeTargets(job.tasks, config);

    // Phase 1: Deploy to green
    this.emitProgress(job, { phase: "deploying-green" });

    await Promise.all(green.map(task => this.executeTask(task)));

    // Health check green targets
    await this.verifyTargets(green, config.healthCheck);

    // Warmup period
    if (config.warmupTime > 0) {
      this.emitProgress(job, { phase: "warming-up" });
      await sleep(config.warmupTime * 1000);
    }

    // Run validation tests
    if (config.validationTests?.length) {
      this.emitProgress(job, { phase: "validating" });
      await this.runValidationTests(green, config.validationTests);
    }

    // Phase 2: Switch traffic
    this.emitProgress(job, { phase: "switching-traffic" });

    if (config.switchoverMode === "instant") {
      await this.instantSwitchover(config, blue, green);
    } else {
      await this.gradualSwitchover(config, blue, green);
    }

    // Verify traffic routing
    await this.verifyRouting(green, config);

    // Schedule blue decommission
    if (config.keepBlueActive > 0) {
      this.scheduleBlueDecommission(blue, config.keepBlueActive);
    }
  }

  private async instantSwitchover(
    config: BlueGreenConfig,
    blue: DeploymentTask[],
    green: DeploymentTask[]
  ): Promise<void> {
    const router = await this.getRouter(config.routerIntegration);

    // Update upstream to green targets
    await router.updateUpstream(config.routingConfig.upstreamName, {
      servers: green.map(t => ({
        address: t.target.address,
        weight: 1
      }))
    });

    // Remove blue from rotation
    await router.removeServers(
      config.routingConfig.upstreamName,
      blue.map(t => t.target.address)
    );
  }

  private async gradualSwitchover(
    config: BlueGreenConfig,
    blue: DeploymentTask[],
    green: DeploymentTask[]
  ): Promise<void> {
    const router = await this.getRouter(config.routerIntegration);
    const steps = config.gradualSteps || [25, 50, 75, 100];

    for (const percentage of steps) {
      await router.setTrafficSplit(config.routingConfig.upstreamName, {
        blue: 100 - percentage,
        green: percentage
      });

      // Monitor for errors
      await this.monitorTraffic(30);
    }
  }
}
```
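
`categorizeTargets` is referenced but not shown above. Assuming targets carry a `labels` map and the config labels use `key=value` form (both assumptions for illustration), it could be sketched as:

```typescript
interface Labeled { labels: Record<string, string>; }

// True if the target's labels satisfy a "key=value" selector
function matchesLabel(target: Labeled, selector: string): boolean {
  const [key, value] = selector.split("=");
  return target.labels[key] === value;
}

// Split targets into blue/green pools by label selector
function categorize<T extends Labeled>(
  targets: T[],
  blueLabel: string,
  greenLabel: string
): { blue: T[]; green: T[] } {
  return {
    blue: targets.filter(t => matchesLabel(t, blueLabel)),
    green: targets.filter(t => matchesLabel(t, greenLabel))
  };
}
```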
|
||||
|
||||
### Use Cases

- Zero-downtime deployments
- Database migration deployments
- High-stakes production updates
- Instant rollback requirements

## Strategy Selection Guide

```
STRATEGY SELECTION

START
  │
  ▼
Zero downtime needed?
  ├─ No ──▶ All-at-once
  └─ Yes ─▶ Metrics-based validation needed?
              ├─ Yes ─▶ Canary
              └─ No ──▶ Instant rollback required?
                          ├─ Yes ─▶ Blue-Green
                          └─ No ──▶ Rolling
```

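For tooling that needs to choose a strategy programmatically, the same decision tree can be expressed as a small selector. This is an illustrative sketch; the flag names are assumptions, not part of the orchestrator API.

```typescript
// Illustrative selector for the strategy decision tree. The field names are
// hypothetical; real deployments would derive these flags from environment
// and release configuration.
interface StrategyInputs {
  zeroDowntime: boolean;        // Zero downtime needed?
  metricsValidation: boolean;   // Metrics-based validation needed?
  instantRollback: boolean;     // Instant rollback required?
}

function selectStrategy(i: StrategyInputs): string {
  if (!i.zeroDowntime) return "all-at-once";
  if (i.metricsValidation) return "canary";
  return i.instantRollback ? "blue-green" : "rolling";
}
```

The ordering of the checks mirrors the tree: availability first, then validation needs, then rollback requirements.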
## References

- [Deployment Overview](overview.md)
- [Progressive Delivery](../modules/progressive-delivery.md)
- [Rollback Management](overview.md#rollback-management)
249
docs/modules/release-orchestrator/design/decisions.md
Normal file
@@ -0,0 +1,249 @@
# Key Architectural Decisions

This document records significant architectural decisions and their rationale.

## ADR-001: Digest-First Release Identity

**Status:** Accepted

**Context:**
Container images can be referenced by tags (e.g., `v1.2.3`) or digests (e.g., `sha256:abc123...`). Tags are mutable: the same tag can point to different images over time.

**Decision:**
All releases are identified by immutable OCI digests, never tags. Tags are accepted as input but immediately resolved to digests at release creation time.

**Consequences:**
- Releases are immutable and reproducible
- A digest mismatch at pull time indicates tampering (the deployment fails)
- Rollback targets a specific digest, not "previous tag"
- Requires registry integration for tag resolution
- Users see both the tag (friendly) and the digest (authoritative) in the UI

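As a sketch of the tag-to-digest pinning step (names are hypothetical; a real resolver would query the registry API rather than an in-memory lookup):

```typescript
// Hypothetical sketch of digest-first pinning at release creation time.
type ComponentRef = { name: string; ref: string };       // tag or digest reference
type PinnedComponent = { name: string; digest: string }; // always a digest

const DIGEST_RE = /^sha256:[0-9a-f]{64}$/;

function pinRelease(
  components: ComponentRef[],
  resolveTag: (ref: string) => string | undefined // registry lookup stand-in
): PinnedComponent[] {
  return components.map(({ name, ref }) => {
    // Already digest-pinned (e.g. "registry/app@sha256:..."): accept as-is.
    const digestPart = ref.includes("@") ? ref.split("@")[1] : ref;
    if (DIGEST_RE.test(digestPart)) {
      return { name, digest: digestPart };
    }
    // Tag input: resolve immediately; releases never store tags.
    const digest = resolveTag(ref);
    if (!digest || !DIGEST_RE.test(digest)) {
      throw new Error(`cannot resolve '${ref}' to a sha256 digest`);
    }
    return { name, digest };
  });
}
```

Any component that cannot be resolved to a `sha256:` digest fails release creation, which is what keeps rollback targets unambiguous.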
---

## ADR-002: Evidence for Every Decision

**Status:** Accepted

**Context:**
Compliance and audit requirements demand proof of what was deployed, when, by whom, and why.

**Decision:**
Every promotion and deployment produces a cryptographically signed evidence packet that is immutable and append-only.

**Consequences:**
- The evidence table has no UPDATE/DELETE permissions
- Evidence enables audit-grade compliance reporting
- Evidence enables deterministic replay (same inputs + policy = same decision)
- Evidence packets are exportable for external audit systems
- Storage requirements increase over time

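A minimal sketch of such a packet and its content hash (field names are illustrative, not the real schema; production code would serialize with an RFC 8785 canonicalizer before hashing so that key order cannot change the hash):

```typescript
import { createHash } from "node:crypto";

// Illustrative evidence packet shape; the authoritative schema lives in the
// release.evidence_packets table.
interface EvidencePacket {
  promotionId: string;
  who: string;                                      // identity from Authority
  what: { digests: string[]; environment: string };
  why: { decision: "allow" | "block"; reasons: string[] };
  when: string;                                     // ISO-8601 UTC timestamp
  contentHash: string;                              // hash over the fields above
}

function buildEvidence(input: Omit<EvidencePacket, "contentHash">): EvidencePacket {
  // Sketch only: plain JSON.stringify. Real code would use RFC 8785
  // canonical JSON before hashing and then sign the result.
  const serialized = JSON.stringify(input);
  const contentHash = "sha256:" + createHash("sha256").update(serialized).digest("hex");
  return { ...input, contentHash };
}
```

The hash (and, in the real system, the signature over it) is what makes the packet tamper-evident once stored append-only.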
---

## ADR-003: Plugin Architecture for Integrations

**Status:** Accepted

**Context:**
Organizations use diverse toolchains (registries, CI/CD, vaults, notification systems). Hard-coding integrations limits adoption.

**Decision:**
All integrations are implemented as plugins via a three-surface contract (Manifest, Connector Runtime, Step Provider). Core orchestration is stable and plugin-agnostic.

**Consequences:**
- Core has no hard-coded vendor integrations
- New integrations can be added without core changes
- Plugin failures cannot crash the core (sandbox isolation)
- The plugin interface must be versioned and stable
- Additional complexity in plugin lifecycle management

---

## ADR-004: No Feature Gating

**Status:** Accepted

**Context:**
Enterprise software often gates security features behind premium tiers, creating "pay for security" anti-patterns.

**Decision:**
All plans include all features. Pricing is based only on:
- Number of environments
- New digests analyzed per day
- Fair use on deployments

**Consequences:**
- No feature flags tied to billing tier
- Transparent pricing without feature fragmentation
- May limit revenue optimization per customer
- Quota enforcement must be clear and user-friendly

---

## ADR-005: Offline-First Operation

**Status:** Accepted

**Context:**
Many organizations operate in air-gapped or restricted network environments. Dependency on external services limits adoption.

**Decision:**
All core operations must work in air-gapped environments. External data is synced via mirror bundles. Plugins may require connectivity; core does not.

**Consequences:**
- No runtime calls to external APIs for core decisions
- Advisory data synced via offline bundles
- Plugin connectivity requirements are declared in the manifest
- Evidence packets exportable for external submission
- Additional complexity in data synchronization

---

## ADR-006: Agent-Based and Agentless Deployment

**Status:** Accepted

**Context:**
Some organizations prefer agents for security isolation; others prefer agentless for simplicity.

**Decision:**
Support both agent-based (persistent daemon on targets) and agentless (SSH/WinRM on demand) deployment models.

**Consequences:**
- Agents provide better performance and reliability
- Agentless reduces the infrastructure footprint
- A unified task model abstracts deployment details
- The security model must handle both patterns
- Larger testing matrix

---

## ADR-007: PostgreSQL as Primary Database

**Status:** Accepted

**Context:**
Database choice affects scalability, operations, and feature availability.

**Decision:**
PostgreSQL (16+) as the primary database with:
- Per-module schema isolation
- Row-level security for multi-tenancy
- JSONB for flexible configuration
- Append-only triggers for evidence tables

**Consequences:**
- Proven scalability and reliability
- Rich feature set (JSONB, RLS, triggers)
- Single database technology to operate
- Requires PostgreSQL expertise
- Schema migrations must be carefully managed

---

## ADR-008: Workflow Engine with DAG Execution

**Status:** Accepted

**Context:**
Deployment workflows need conditional logic, parallel execution, error handling, and rollback support.

**Decision:**
Implement a DAG-based workflow engine where:
- Workflows are templates with nodes (steps) and edges (dependencies)
- Steps execute when all dependencies are satisfied
- Expressions reference previous step outputs
- Built-in support for approval, retry, timeout, and rollback

**Consequences:**
- Flexible workflow composition
- Visual representation in UI
- Complex error handling scenarios supported
- Learning curve for workflow authors
- Expression engine security considerations

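"Steps execute when all dependencies are satisfied" amounts to topological scheduling over the DAG. A minimal sketch, with illustrative types rather than the engine's actual ones:

```typescript
// Minimal sketch of dependency-satisfied execution ordering for a workflow
// DAG. Steps whose dependency sets are empty are "ready" and could run in
// parallel; here they are appended in deterministic (sorted) order.
type Step = { id: string; dependsOn: string[] };

function executionOrder(steps: Step[]): string[] {
  const remaining = new Map(steps.map(s => [s.id, new Set(s.dependsOn)]));
  const order: string[] = [];
  while (remaining.size > 0) {
    const ready = [...remaining.entries()]
      .filter(([, deps]) => deps.size === 0)
      .map(([id]) => id)
      .sort(); // deterministic tie-break
    if (ready.length === 0) throw new Error("cycle detected in workflow DAG");
    for (const id of ready) {
      order.push(id);
      remaining.delete(id);
      // Completing a step satisfies it as a dependency of the rest.
      for (const deps of remaining.values()) deps.delete(id);
    }
  }
  return order;
}
```

A cycle (no step ready while steps remain) is a template validation error, which is why the engine can reject bad templates before execution starts.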
---

## ADR-009: Separation of Duties Enforcement

**Status:** Accepted

**Context:**
Compliance requires that the person requesting a change cannot be the same person approving it.

**Decision:**
Separation of Duties (SoD) is enforced at the approval gateway level, preventing self-approval when SoD is enabled for an environment.

**Consequences:**
- Prevents single-person deployment to sensitive environments
- Configurable per environment
- May slow down deployments
- Requires a minimum team size for SoD-enabled environments

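At its core the gateway check is a small predicate; the following sketch uses hypothetical names:

```typescript
// Illustrative SoD check at the approval gateway (names are hypothetical).
interface ApprovalRequest {
  requestedBy: string;           // identity that requested the promotion
  approverId: string;            // identity attempting to approve it
  environmentSodEnabled: boolean;
}

function canApprove(req: ApprovalRequest): { allowed: boolean; reason?: string } {
  if (req.environmentSodEnabled && req.requestedBy === req.approverId) {
    return { allowed: false, reason: "separation-of-duties: self-approval forbidden" };
  }
  return { allowed: true };
}
```

Because the flag is per environment, a dev environment can permit self-approval while production forbids it.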
---

## ADR-010: Version Stickers for Drift Detection

**Status:** Accepted

**Context:**
Knowing what is actually deployed on targets is essential for audit and troubleshooting.

**Decision:**
Every deployment writes a `stella.version.json` sticker file on the target containing the release ID, digests, deployment timestamp, and deployer identity.

**Consequences:**
- Enables drift detection (expected vs actual)
- Provides an audit trail on target hosts
- Enables accurate "what's deployed where" queries
- Requires file access on targets
- Sticker corruption/deletion must be handled

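A sketch of the sticker shape and the drift comparison it enables (field names are assumptions; the architecture specification defines the authoritative schema):

```typescript
// Hypothetical shape of the stella.version.json sticker and a drift check
// comparing expected (release) digests against what the sticker reports.
interface VersionSticker {
  releaseId: string;
  digests: Record<string, string>; // component -> sha256 digest
  deployedAt: string;
  deployedBy: string;
}

function detectDrift(
  expected: Record<string, string>,
  sticker: VersionSticker
): string[] {
  const drifted: string[] = [];
  for (const [component, digest] of Object.entries(expected)) {
    if (sticker.digests[component] !== digest) drifted.push(component);
  }
  // Components present on the target but absent from the release also count.
  for (const component of Object.keys(sticker.digests)) {
    if (!(component in expected)) drifted.push(component);
  }
  return drifted.sort();
}
```

An empty result means the target matches the release; any entries name the drifted components for remediation.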
---

## ADR-011: Security Gate Integration

**Status:** Accepted

**Context:**
Security scanning exists as a separate concern; release orchestration should leverage but not duplicate it.

**Decision:**
Security scanning remains in the existing modules (Scanner, VEX). Release orchestration consumes scan results through a security gate that evaluates vulnerability thresholds.

**Consequences:**
- Clear separation of concerns
- Existing scanning investment preserved
- Gate configuration determines block thresholds
- Requires API integration with scanning modules
- Policy engine evaluates security verdicts

---

## ADR-012: gRPC for Agent Communication

**Status:** Accepted

**Context:**
Agent communication requires efficient, bidirectional, and secure data transfer.

**Decision:**
Use gRPC for agent communication with:
- mTLS for transport security
- Bidirectional streaming for logs and progress
- Protocol Buffers for efficient serialization

**Consequences:**
- Efficient binary protocol
- Strong typing via protobuf
- Built-in streaming support
- Requires gRPC infrastructure
- Firewall considerations for gRPC traffic

---

## References

- [Design Principles](principles.md)
- [Security Architecture](../security/overview.md)
- [Plugin System](../modules/plugin-system.md)
221
docs/modules/release-orchestrator/design/principles.md
Normal file
@@ -0,0 +1,221 @@
# Design Principles & Invariants

> These principles are **inviolable** and MUST be reflected in all code, UI, documentation, and audit artifacts.

## Core Principles

### Principle 1: Release Identity via Digest

```
INVARIANT: A release is a set of OCI image digests (component → digest mapping), never tags.
```

- Tags are convenience inputs for resolution
- Tags are resolved to digests at release creation time
- All downstream operations (promotion, deployment, rollback) use digests
- Digest mismatch at pull time = deployment failure (tamper detection)

**Implementation Requirements:**
- The release creation API accepts tags but immediately resolves them to digests
- All internal references use `sha256:`-prefixed digests
- Agent deployment verifies the digest at pull time
- Rollback targets a specific digest, not "previous tag"

### Principle 2: Determinism and Evidence

```
INVARIANT: Every deployment/promotion produces an immutable evidence record.
```

The evidence record contains:
- **Who**: User identity (from Authority)
- **What**: Release bundle (digests), target environment, target hosts
- **Why**: Policy evaluation result, approval records, decision reasons
- **How**: Generated artifacts (compose files, scripts), execution logs
- **When**: Timestamps for request, decision, execution, completion

Evidence enables:
- Audit-grade compliance reporting
- Deterministic replay (same inputs + policy → same decision)
- "Why blocked?" explainability

**Implementation Requirements:**
- Evidence is generated synchronously with the decision
- Evidence is signed before storage
- The evidence table is append-only (no UPDATE/DELETE)
- Evidence includes a hash of all inputs for replay verification

### Principle 3: Pluggable Everything, Stable Core

```
INVARIANT: Integrations are plugins; the core orchestration engine is stable.
```

**Plugins contribute:**
- Configuration screens (UI)
- Connector logic (runtime)
- Step node types (workflow)
- Doctor checks (diagnostics)
- Agent types (deployment)

**Core engine provides:**
- Workflow execution (DAG processing)
- State machine management
- Evidence generation
- Policy evaluation
- Credential brokering

**Implementation Requirements:**
- Core has no hard-coded integrations
- The plugin interface is versioned and stable
- Plugin failures cannot crash the core
- Core provides fallback behavior when plugins are unavailable

### Principle 4: No Feature Gating

```
INVARIANT: All plans include all features. Limits are only:
- Number of environments
- Number of new digests analyzed per day
- Fair use on deployments
```

This prevents:
- The "pay for security" anti-pattern
- Per-project/per-seat billing landmines
- Feature fragmentation across tiers

**Implementation Requirements:**
- No feature flags tied to billing tier
- Quota enforcement is transparent (clear error messages)
- Usage metrics are exposed for customer visibility
- Overage handling is graceful (soft limits with warnings)

### Principle 5: Offline-First Operation

```
INVARIANT: All core operations MUST work in air-gapped environments.
```

Implications:
- No runtime calls to external APIs for core decisions
- Vulnerability data synced via mirror bundles
- Plugins may require connectivity; the core does not
- Evidence packets exportable for external audit

**Implementation Requirements:**
- Core decision logic makes no external HTTP calls
- All external data is pre-synced and cached
- Plugin connectivity requirements are declared in the manifest
- Offline mode is explicit configuration, not a degraded fallback

### Principle 6: Immutable Generated Artifacts

```
INVARIANT: Every deployment generates and stores immutable artifacts.
```

Generated artifacts:
- `compose.stella.lock.yml`: Pinned digests, resolved env refs
- `deploy.stella.script.dll`: Compiled C# script (or hash reference)
- `release.evidence.json`: Decision record
- `stella.version.json`: Version sticker placed on the target

The version sticker enables:
- Drift detection (expected vs actual)
- Audit trail on the target host
- Rollback reference

**Implementation Requirements:**
- Artifacts are content-addressed (hash in filename or metadata)
- Artifacts are stored before deployment execution
- Artifact storage is immutable (no overwrites)
- The version sticker is written atomically on the target

---

## Architectural Invariants (Enforced by Design)

These invariants are enforced through database constraints, code architecture, and operational controls.

| Invariant | Enforcement Mechanism |
|-----------|----------------------|
| Digests are immutable | Database constraint: digest column is unique, no updates |
| Evidence packets are append-only | Evidence table has no UPDATE/DELETE permissions |
| Secrets never in database | Vault integration; only references stored |
| Plugins cannot bypass policy | Policy evaluation in core, not plugin |
| Multi-tenant isolation | `tenant_id` FK on all tables; row-level security |
| Workflow state is auditable | State transitions logged; no direct state manipulation |
| Approvals are tamper-evident | Approval records are signed and append-only |

### Database Enforcement

```sql
-- Example: Evidence table with no UPDATE/DELETE
CREATE TABLE release.evidence_packets (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    promotion_id UUID NOT NULL REFERENCES release.promotions(id),
    content_hash TEXT NOT NULL,
    content JSONB NOT NULL,
    signature TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
    -- No updated_at column; immutable by design
);

-- Revoke UPDATE/DELETE from the application role
REVOKE UPDATE, DELETE ON release.evidence_packets FROM app_role;
```

### Code Architecture Enforcement

```csharp
// Policy evaluation is ALWAYS in core, never delegated to plugins
public sealed class PromotionDecisionEngine
{
    // Plugins provide gate implementations, but core orchestrates evaluation
    public async Task<DecisionResult> EvaluateAsync(
        Promotion promotion,
        IReadOnlyList<IGateProvider> gates,
        CancellationToken ct)
    {
        // Core controls evaluation order and aggregation
        var results = new List<GateResult>();
        foreach (var gate in gates)
        {
            // Plugin provides evaluation logic
            var result = await gate.EvaluateAsync(promotion, ct);
            results.Add(result);

            // Core decides how to aggregate (plugins cannot override)
            if (result.IsBlocking && _policy.FailFast)
                break;
        }

        // Core makes the final decision
        return _decisionAggregator.Aggregate(results);
    }
}
```

---

## Document Conventions

Throughout the Release Orchestrator documentation:

- **MUST**: Mandatory requirement; non-compliance is a bug
- **SHOULD**: Recommended but not mandatory; deviation requires justification
- **MAY**: Optional; implementation decision
- **Entity names**: `PascalCase` (e.g., `ReleaseBundle`)
- **Table names**: `snake_case` (e.g., `release_bundles`)
- **API paths**: `/api/v1/resource-name`
- **Module names**: `kebab-case` (e.g., `release-manager`)

---

## References

- [Key Architectural Decisions](decisions.md)
- [Module Architecture](../modules/overview.md)
- [Security Architecture](../security/overview.md)
602
docs/modules/release-orchestrator/implementation-guide.md
Normal file
@@ -0,0 +1,602 @@
# Implementation Guide

> .NET 10 implementation patterns and best practices for Release Orchestrator modules.

**Target Audience**: Development team implementing Release Orchestrator modules
**Prerequisites**: Familiarity with [CLAUDE.md](../../../CLAUDE.md) coding rules

---

## Overview

This guide supplements the architecture documentation with .NET 10-specific implementation patterns required for all Release Orchestrator modules. These patterns ensure:

- Deterministic behavior for evidence reproducibility
- Testability through dependency injection
- Compliance with Stella Ops coding standards
- Performance and reliability

---

## Code Quality Requirements

### Compiler Configuration

All Release Orchestrator projects **MUST** enforce warnings as errors:

```xml
<!-- In Directory.Build.props or .csproj -->
<PropertyGroup>
  <TreatWarningsAsErrors>true</TreatWarningsAsErrors>
  <Nullable>enable</Nullable>
  <ImplicitUsings>disable</ImplicitUsings>
</PropertyGroup>
```

**Rationale**: Warnings indicate potential bugs, regressions, or code quality drift. Treating them as errors prevents them from being ignored.

---

## Determinism & Time Handling

### TimeProvider Injection

**Never** use `DateTime.UtcNow`, `DateTimeOffset.UtcNow`, or `DateTimeOffset.Now` directly. Always inject `TimeProvider`.

```csharp
// ❌ BAD - non-deterministic, hard to test
public class PromotionManager
{
    public Promotion CreatePromotion(Guid releaseId, Guid targetEnvId)
    {
        return new Promotion
        {
            Id = Guid.NewGuid(),
            ReleaseId = releaseId,
            TargetEnvironmentId = targetEnvId,
            RequestedAt = DateTimeOffset.UtcNow // ❌ Hard-coded time
        };
    }
}

// ✅ GOOD - injectable, testable, deterministic
public class PromotionManager
{
    private readonly TimeProvider _timeProvider;
    private readonly IGuidGenerator _guidGenerator;

    public PromotionManager(TimeProvider timeProvider, IGuidGenerator guidGenerator)
    {
        _timeProvider = timeProvider;
        _guidGenerator = guidGenerator;
    }

    public Promotion CreatePromotion(Guid releaseId, Guid targetEnvId)
    {
        return new Promotion
        {
            Id = _guidGenerator.NewGuid(),
            ReleaseId = releaseId,
            TargetEnvironmentId = targetEnvId,
            RequestedAt = _timeProvider.GetUtcNow() // ✅ Injected, testable
        };
    }
}
```

**Registration**:
```csharp
// Production: use system time
services.AddSingleton(TimeProvider.System);

// Testing: use manual time for deterministic tests
var manualTime = new ManualTimeProvider();
manualTime.SetUtcNow(new DateTimeOffset(2026, 1, 10, 12, 0, 0, TimeSpan.Zero));
services.AddSingleton<TimeProvider>(manualTime);
```

---

### GUID Generation

**Never** use `Guid.NewGuid()` directly. Always inject `IGuidGenerator`.

```csharp
// ❌ BAD
var releaseId = Guid.NewGuid();

// ✅ GOOD
var releaseId = _guidGenerator.NewGuid();
```

**Interface**:
```csharp
public interface IGuidGenerator
{
    Guid NewGuid();
}

// Production implementation
public sealed class SystemGuidGenerator : IGuidGenerator
{
    public Guid NewGuid() => Guid.NewGuid();
}

// Deterministic test implementation
public sealed class SequentialGuidGenerator : IGuidGenerator
{
    private int _counter;

    public Guid NewGuid()
    {
        var bytes = new byte[16];
        BitConverter.GetBytes(_counter++).CopyTo(bytes, 0);
        return new Guid(bytes);
    }
}
```

---

## Async & Cancellation

### CancellationToken Propagation

**Always** propagate `CancellationToken` through async call chains. Never use `CancellationToken.None` except at entry points where no token is available.

```csharp
// ❌ BAD - ignores cancellation
public async Task<Promotion> ApprovePromotionAsync(Guid promotionId, Guid userId, CancellationToken ct)
{
    var promotion = await _repository.GetByIdAsync(promotionId, CancellationToken.None); // ❌ Wrong

    promotion.Approvals.Add(new Approval
    {
        ApproverId = userId,
        ApprovedAt = _timeProvider.GetUtcNow()
    });

    await _repository.SaveAsync(promotion, CancellationToken.None); // ❌ Wrong
    await Task.Delay(1000); // ❌ Missing ct

    return promotion;
}

// ✅ GOOD - propagates cancellation
public async Task<Promotion> ApprovePromotionAsync(Guid promotionId, Guid userId, CancellationToken ct)
{
    var promotion = await _repository.GetByIdAsync(promotionId, ct); // ✅ Propagated

    promotion.Approvals.Add(new Approval
    {
        ApproverId = userId,
        ApprovedAt = _timeProvider.GetUtcNow()
    });

    await _repository.SaveAsync(promotion, ct); // ✅ Propagated
    await Task.Delay(1000, ct); // ✅ Cancellable

    return promotion;
}
```

---

## HTTP Client Usage

### IHttpClientFactory for Connector Runtime

**Never** instantiate `HttpClient` directly. Always use `IHttpClientFactory` with configured timeouts and resilience policies.

```csharp
// ❌ BAD - direct instantiation risks socket exhaustion
public class GitHubConnector
{
    public async Task<string> GetCommitAsync(string sha)
    {
        using var client = new HttpClient(); // ❌ Socket exhaustion risk
        var response = await client.GetAsync($"https://api.github.com/commits/{sha}");
        return await response.Content.ReadAsStringAsync();
    }
}

// ✅ GOOD - factory with resilience
public class GitHubConnector
{
    private readonly IHttpClientFactory _httpClientFactory;

    public GitHubConnector(IHttpClientFactory httpClientFactory)
    {
        _httpClientFactory = httpClientFactory;
    }

    public async Task<string> GetCommitAsync(string sha, CancellationToken ct)
    {
        var client = _httpClientFactory.CreateClient("GitHub");
        var response = await client.GetAsync($"/commits/{sha}", ct);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync(ct);
    }
}
```

**Registration with resilience**:
```csharp
services.AddHttpClient("GitHub", client =>
{
    client.BaseAddress = new Uri("https://api.github.com");
    client.Timeout = TimeSpan.FromSeconds(30);
    client.DefaultRequestHeaders.Add("User-Agent", "StellaOps/1.0");
})
.AddStandardResilienceHandler(options =>
{
    options.Retry.MaxRetryAttempts = 3;
    options.CircuitBreaker.SamplingDuration = TimeSpan.FromSeconds(30);
    options.TotalRequestTimeout.Timeout = TimeSpan.FromMinutes(1);
});
```

---

## Culture & Formatting

### Invariant Culture for Parsing

**Always** use `CultureInfo.InvariantCulture` for parsing and formatting dates, numbers, and any string that will be persisted, hashed, or compared.

```csharp
// ❌ BAD - culture-sensitive
var percentage = double.Parse(input);
var formatted = value.ToString("P2");
var dateStr = date.ToString("yyyy-MM-dd");

// ✅ GOOD - invariant culture
var percentage = double.Parse(input, CultureInfo.InvariantCulture);
var formatted = value.ToString("P2", CultureInfo.InvariantCulture);
var dateStr = date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
```

---

## JSON Handling

### RFC 8785 Canonical JSON for Evidence

For evidence packets and decision records that will be hashed or signed, use **RFC 8785-compliant** canonical JSON serialization.

```csharp
// ❌ BAD - non-canonical JSON
var json = JsonSerializer.Serialize(decisionRecord, new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping,
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase
});
var hash = ComputeHash(json); // ❌ Non-deterministic

// ✅ GOOD - use shared canonicalizer
var canonicalJson = CanonicalJsonSerializer.Serialize(decisionRecord);
var hash = ComputeHash(canonicalJson); // ✅ Deterministic
```

**Canonical JSON Requirements**:
- Keys sorted alphabetically
- Minimal escaping per the RFC 8785 spec
- No exponent notation for numbers
- No trailing/leading zeros
- No whitespace

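For illustration, here is a minimal key-sorting canonicalizer, shown in TypeScript. It covers key ordering and whitespace only; the full RFC 8785 rules for number formatting and string escaping are not implemented in this sketch.

```typescript
// Minimal canonicalization sketch: recursively sorts object keys and emits
// JSON without whitespace. Full RFC 8785 additionally constrains number
// serialization and string escaping, which this sketch does not implement.
type Json = null | boolean | number | string | Json[] | { [k: string]: Json };

function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value); // primitives: delegate to JSON.stringify
  }
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  const keys = Object.keys(value).sort(); // deterministic key order
  const body = keys.map(k => JSON.stringify(k) + ":" + canonicalize(value[k]));
  return "{" + body.join(",") + "}";
}
```

Because key order is fixed, two semantically equal records always hash to the same value, which is what makes replay verification possible.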
---

## Database Interaction

### DateTimeOffset for PostgreSQL timestamptz

PostgreSQL `timestamptz` columns **MUST** be read and written as `DateTimeOffset`, not `DateTime`.

```csharp
// ❌ BAD - loses offset information
await using var reader = await command.ExecuteReaderAsync(ct);
while (await reader.ReadAsync(ct))
{
    var createdAt = reader.GetDateTime(reader.GetOrdinal("created_at")); // ❌ Loses offset
}

// ✅ GOOD - preserves offset
await using var reader = await command.ExecuteReaderAsync(ct);
while (await reader.ReadAsync(ct))
{
    var createdAt = reader.GetFieldValue<DateTimeOffset>(reader.GetOrdinal("created_at")); // ✅ Correct
}
```

**Insertion**:
```csharp
// ✅ Always use UTC DateTimeOffset
var createdAt = _timeProvider.GetUtcNow(); // Returns DateTimeOffset
await command.ExecuteNonQueryAsync(ct);
```

---

## Hybrid Logical Clock (HLC) for Distributed Ordering

For distributed ordering and audit-safe sequencing, use `IHybridLogicalClock` from `StellaOps.HybridLogicalClock`.

**When to use HLC**:
- Promotion state transitions
- Workflow step execution ordering
- Deployment task sequencing
- Timeline event ordering

```csharp
public class PromotionStateTransition
{
    private readonly IHybridLogicalClock _hlc;
    private readonly TimeProvider _timeProvider;

    public async Task TransitionStateAsync(
        Promotion promotion,
        PromotionState newState,
        CancellationToken ct)
    {
        var transition = new StateTransition
        {
            PromotionId = promotion.Id,
            FromState = promotion.Status,
            ToState = newState,
            THlc = _hlc.Tick(), // ✅ Monotonic, skew-tolerant ordering
            TsWall = _timeProvider.GetUtcNow(), // ✅ Informational timestamp
            TransitionedBy = _currentUser.Id
        };

        await _repository.RecordTransitionAsync(transition, ct);
    }
}
```

**HLC State Persistence**:
```csharp
// Service startup
public async Task StartAsync(CancellationToken ct)
{
    await _hlc.InitializeFromStateAsync(ct); // Restore monotonicity
}

// Service shutdown
public async Task StopAsync(CancellationToken ct)
{
    await _hlc.PersistStateAsync(ct); // Persist HLC state
}
```

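The core HLC behavior can be sketched in a few lines, shown here in TypeScript (illustrative only; the real clock lives in `StellaOps.HybridLogicalClock`): when the wall clock does not advance, the logical counter does, so ticks stay monotonic even across clock stalls or backward jumps.

```typescript
// Minimal single-node HLC tick sketch. Timestamps are (wallMillis, logical)
// pairs; ordering is by wallMillis first, then logical.
interface HlcTimestamp { wallMillis: number; logical: number }

class HybridLogicalClock {
  private last: HlcTimestamp = { wallMillis: 0, logical: 0 };

  constructor(private readonly now: () => number) {}

  tick(): HlcTimestamp {
    const wall = this.now();
    if (wall > this.last.wallMillis) {
      // Wall clock advanced: reset the logical counter.
      this.last = { wallMillis: wall, logical: 0 };
    } else {
      // Wall clock stalled or moved backwards: bump the logical counter so
      // the timestamp still advances.
      this.last = { wallMillis: this.last.wallMillis, logical: this.last.logical + 1 };
    }
    return this.last;
  }
}
```

A full implementation also merges timestamps received from peers and persists state across restarts, which is what `InitializeFromStateAsync`/`PersistStateAsync` are for.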
---
|
||||
|
||||
## Configuration & Options

### Options Validation at Startup

Use `ValidateDataAnnotations()` and `ValidateOnStart()` for all options classes.

```csharp
// Options class
public sealed class PromotionManagerOptions
{
    [Required]
    [Range(1, 10)]
    public int MaxConcurrentPromotions { get; set; } = 3;

    [Required]
    [Range(1, 3600)]
    public int ApprovalExpirationSeconds { get; set; } = 1440;
}

// Registration with validation
services.AddOptions<PromotionManagerOptions>()
    .Bind(configuration.GetSection("PromotionManager"))
    .ValidateDataAnnotations()
    .ValidateOnStart();

// Complex validation
public class PromotionManagerOptionsValidator : IValidateOptions<PromotionManagerOptions>
{
    public ValidateOptionsResult Validate(string? name, PromotionManagerOptions options)
    {
        if (options.MaxConcurrentPromotions <= 0)
            return ValidateOptionsResult.Fail("MaxConcurrentPromotions must be positive");

        return ValidateOptionsResult.Success;
    }
}

services.AddSingleton<IValidateOptions<PromotionManagerOptions>, PromotionManagerOptionsValidator>();
```

---
## Immutability & Collections

### Return Immutable Collections from Public APIs

Public APIs **MUST** return `IReadOnlyList<T>`, `ImmutableArray<T>`, or defensive copies. Never expose mutable backing stores.

```csharp
// ❌ BAD - exposes mutable backing store
public class ReleaseManager
{
    private readonly List<Component> _components = new();

    public List<Component> Components => _components; // ❌ Callers can mutate!
}

// ✅ GOOD - immutable return
public class ReleaseManager
{
    private readonly List<Component> _components = new();

    public IReadOnlyList<Component> Components => _components.AsReadOnly(); // ✅ Read-only

    // Or using ImmutableArray
    public ImmutableArray<Component> GetComponents() => _components.ToImmutableArray();
}
```

---
## Error Handling

### No Silent Stubs

Placeholder code **MUST** throw `NotImplementedException` or return an explicit error. Never return success from unimplemented paths.

```csharp
// ❌ BAD - silent stub masks missing implementation
public async Task<Result> DeployToNomadAsync(Deployment deployment, CancellationToken ct)
{
    // TODO: implement Nomad deployment
    return Result.Success(); // ❌ Ships broken feature!
}

// ✅ GOOD - explicit failure (no `async` keyword, so no CS1998 warning under TreatWarningsAsErrors)
public Task<Result> DeployToNomadAsync(Deployment deployment, CancellationToken ct)
{
    throw new NotImplementedException(
        "Nomad deployment not yet implemented. See SPRINT_20260115_003_AGENTS_nomad_support.md");
}

// ✅ Alternative: return unsupported result
public Task<Result> DeployToNomadAsync(Deployment deployment, CancellationToken ct)
{
    return Task.FromResult(
        Result.Failure("Nomad deployment target not yet supported. Use Docker or Compose."));
}
```

---
## Caching

### Bounded Caches with Eviction

**Do not** use `ConcurrentDictionary` or `Dictionary` for caching without eviction policies. Use bounded caches with TTL/LRU eviction.

```csharp
using Microsoft.Extensions.Caching.Memory;

// ❌ BAD - unbounded growth
public class VersionMapCache
{
    private readonly ConcurrentDictionary<string, DigestMapping> _cache = new();

    public void Add(string tag, DigestMapping mapping)
    {
        _cache[tag] = mapping; // ❌ Never evicts, memory grows forever
    }
}

// ✅ GOOD - bounded with eviction
public class VersionMapCache
{
    private readonly MemoryCache _cache;

    public VersionMapCache()
    {
        _cache = new MemoryCache(new MemoryCacheOptions
        {
            SizeLimit = 10_000 // Max 10k entries
        });
    }

    public void Add(string tag, DigestMapping mapping)
    {
        _cache.Set(tag, mapping, new MemoryCacheEntryOptions
        {
            Size = 1,
            SlidingExpiration = TimeSpan.FromHours(1) // ✅ 1 hour TTL
        });
    }

    public DigestMapping? Get(string tag) => _cache.Get<DigestMapping>(tag);
}
```

**Cache TTL Recommendations**:

- **Integration health checks**: 5 minutes
- **Version maps (tag → digest)**: 1 hour
- **Environment configs**: 30 minutes
- **Agent capabilities**: 10 minutes

---
## Testing

### Test Helpers Must Call Production Code

Test helpers **MUST** call production code, not reimplement algorithms. Only mock I/O and network boundaries.

```csharp
// ❌ BAD - test reimplements production logic
public static string ComputeEvidenceHash(DecisionRecord record)
{
    // Custom hash implementation in test
    var json = JsonSerializer.Serialize(record); // ❌ Different from production!
    return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(json)));
}

// ✅ GOOD - test uses production code
public static string ComputeEvidenceHash(DecisionRecord record)
{
    // Calls production EvidenceHasher
    return EvidenceHasher.ComputeHash(record); // ✅ Same as production
}
```

---
## Path Resolution

### Explicit CLI Options for Paths

**Do not** derive paths from `AppContext.BaseDirectory` with parent directory walks. Use explicit CLI options or environment variables.

```csharp
// ❌ BAD - fragile parent walks
var repoRoot = Path.GetFullPath(Path.Combine(
    AppContext.BaseDirectory, "..", "..", "..", ".."));

// ✅ GOOD - explicit option with fallback
[Option("--repo-root", Description = "Repository root path")]
public string? RepoRoot { get; set; }

public string GetRepoRoot() =>
    RepoRoot
    ?? Environment.GetEnvironmentVariable("STELLAOPS_REPO_ROOT")
    ?? throw new InvalidOperationException(
        "Repository root not specified. Use --repo-root or set STELLAOPS_REPO_ROOT.");
```

---
## Summary Checklist

Before submitting a pull request, verify:

- [ ] `TreatWarningsAsErrors` enabled in project file
- [ ] All timestamps use `TimeProvider`, never `DateTime.UtcNow`
- [ ] All GUIDs use `IGuidGenerator`, never `Guid.NewGuid()`
- [ ] `CancellationToken` propagated through all async methods
- [ ] HTTP clients use `IHttpClientFactory`, never `new HttpClient()`
- [ ] Culture-invariant parsing for all formatted strings
- [ ] Canonical JSON for evidence/decision records
- [ ] `DateTimeOffset` for all PostgreSQL `timestamptz` columns
- [ ] HLC used for distributed ordering where applicable
- [ ] Options classes validated at startup with `ValidateOnStart()`
- [ ] Public APIs return immutable collections
- [ ] No silent stubs; unimplemented code throws `NotImplementedException`
- [ ] Caches have bounded size and TTL eviction
- [ ] Tests exercise production code, not reimplementations

---
## References

- [CLAUDE.md](../../../CLAUDE.md) — Stella Ops coding rules
- [Test Structure](./test-structure.md) — Test organization guidelines
- [Database Schema](./data-model/schema.md) — Schema patterns
- [HLC Documentation](../../eventing/event-envelope-schema.md) — Event ordering with HLC
643
docs/modules/release-orchestrator/integrations/ci-cd.md
Normal file
@@ -0,0 +1,643 @@
# CI/CD Integration

## Overview

The Release Orchestrator integrates with CI/CD systems to:

- Receive build completion notifications
- Trigger additional pipelines during deployment
- Create releases from CI artifacts
- Report deployment status back to CI systems

## Integration Patterns

### Pattern 1: CI Triggers Release

```
CI TRIGGERS RELEASE

┌────────────┐      ┌────────────┐      ┌────────────────────┐
│   CI/CD    │      │ Container  │      │      Release       │
│   System   │      │ Registry   │      │    Orchestrator    │
└─────┬──────┘      └─────┬──────┘      └─────────┬──────────┘
      │                   │                       │
      │  Build & Push     │                       │
      │──────────────────►│                       │
      │                   │                       │
      │                   │  Webhook: image pushed│
      │                   │──────────────────────►│
      │                   │                       │
      │                   │                       │  Create/Update
      │                   │                       │  Version Map
      │                   │                       │
      │                   │                       │  Auto-create
      │                   │                       │  Release (if configured)
      │                   │                       │
      │  API: Create Release (optional)           │
      │──────────────────────────────────────────►│
      │                   │                       │
      │                   │                       │  Start Promotion
      │                   │                       │  Workflow
      │                   │                       │
```
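The orchestrator side of Pattern 1 can be sketched as a small webhook handler that records the tag → digest mapping and optionally auto-creates a release. This is an illustrative assumption, not the real orchestrator API: the `ImagePushedEvent` shape, `VersionMapStore`, and the auto-create callback are hypothetical names.

```typescript
// Hypothetical payload for a registry "image pushed" webhook.
interface ImagePushedEvent {
  repository: string;
  tag: string;
  digest: string; // sha256:...
}

interface VersionMapStore {
  record(repo: string, tag: string, digest: string): void;
  resolve(repo: string, tag: string): string | undefined;
}

// In-memory store for illustration; the real system persists mappings.
class InMemoryVersionMap implements VersionMapStore {
  private map = new Map<string, string>();
  record(repo: string, tag: string, digest: string): void {
    this.map.set(`${repo}:${tag}`, digest);
  }
  resolve(repo: string, tag: string): string | undefined {
    return this.map.get(`${repo}:${tag}`);
  }
}

// Pattern 1 entry point: webhook → version map update → optional auto-release.
function handleImagePushed(
  event: ImagePushedEvent,
  versionMap: VersionMapStore,
  autoCreateRelease?: (digest: string) => void
): void {
  versionMap.record(event.repository, event.tag, event.digest);
  if (autoCreateRelease) {
    autoCreateRelease(event.digest); // release identity is the digest, not the tag
  }
}
```

Note that only the digest ever feeds the release: the tag is recorded for lookup, matching the digest-based release identity rule in the scope table.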
### Pattern 2: Orchestrator Triggers CI

```
ORCHESTRATOR TRIGGERS CI

┌────────────────────┐      ┌────────────┐      ┌────────────┐
│      Release       │      │   CI/CD    │      │   Target   │
│    Orchestrator    │      │   System   │      │  Systems   │
└─────────┬──────────┘      └─────┬──────┘      └─────┬──────┘
          │                       │                   │
          │  Pre-deploy: Trigger  │                   │
          │  Integration Tests    │                   │
          │──────────────────────►│                   │
          │                       │                   │
          │                       │  Run Tests        │
          │                       │──────────────────►│
          │                       │                   │
          │  Wait for completion  │                   │
          │◄──────────────────────│                   │
          │                       │                   │
          │  If passed: Deploy    │                   │
          │──────────────────────────────────────────►│
          │                       │                   │
```

### Pattern 3: Bidirectional Integration

```
BIDIRECTIONAL INTEGRATION

┌────────────┐                          ┌────────────────────┐
│   CI/CD    │◄────────────────────────►│      Release       │
│   System   │                          │    Orchestrator    │
└─────┬──────┘                          └─────────┬──────────┘
      │                                           │
      │═══════════════════════════════════════════│
      │         Events (both directions)          │
      │═══════════════════════════════════════════│
      │                                           │
      │  CI Events:                               │
      │  - Pipeline completed                     │
      │  - Tests passed/failed                    │
      │  - Artifacts ready                        │
      │                                           │
      │  Orchestrator Events:                     │
      │  - Deployment started                     │
      │  - Deployment completed                   │
      │  - Rollback initiated                     │
      │                                           │
```
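The bidirectional event flow above can be modeled as a discriminated union, so one event bus carries both directions type-safely. The event names mirror the lists in the diagram, but the wire format here is an assumption for illustration.

```typescript
// CI → Orchestrator events (names from the diagram; field shapes are illustrative).
type CIEvent =
  | { source: "ci"; kind: "pipeline-completed"; pipelineRunId: string }
  | { source: "ci"; kind: "tests-finished"; passed: boolean }
  | { source: "ci"; kind: "artifacts-ready"; digests: string[] };

// Orchestrator → CI events.
type OrchestratorEvent =
  | { source: "orchestrator"; kind: "deployment-started"; jobId: string }
  | { source: "orchestrator"; kind: "deployment-completed"; jobId: string; success: boolean }
  | { source: "orchestrator"; kind: "rollback-initiated"; jobId: string };

type IntegrationEvent = CIEvent | OrchestratorEvent;

// Narrowing on the `source` discriminant separates the two directions.
function describe(event: IntegrationEvent): string {
  return event.source === "ci"
    ? `CI event: ${event.kind}`
    : `Orchestrator event: ${event.kind}`;
}
```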
## CI/CD System Configuration

### GitLab CI Integration

```yaml
# .gitlab-ci.yml
stages:
  - build
  - push
  - release

variables:
  STELLA_API_URL: https://stella.example.com/api/v1
  COMPONENT_NAME: myapp

build:
  stage: build
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .

push:
  stage: push
  script:
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    - docker tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG
    # Capture the image digest here, where the Docker CLI is available
    - echo "DIGEST=$(docker inspect --format='{{index .RepoDigests 0}}' $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG | cut -d@ -f2)" >> push.env
  artifacts:
    reports:
      dotenv: push.env
  rules:
    - if: $CI_COMMIT_TAG

release:
  stage: release
  image: curlimages/curl:latest
  script:
    - |
      # $DIGEST comes from the push stage's dotenv artifact
      # (the curl image has no Docker CLI, so the digest cannot be read here)
      curl -X POST "$STELLA_API_URL/releases" \
        -H "Authorization: Bearer $STELLA_TOKEN" \
        -H "Content-Type: application/json" \
        -d "{
          \"name\": \"$COMPONENT_NAME-$CI_COMMIT_TAG\",
          \"components\": [{
            \"componentId\": \"$STELLA_COMPONENT_ID\",
            \"digest\": \"$DIGEST\"
          }],
          \"sourceRef\": {
            \"type\": \"git\",
            \"repository\": \"$CI_PROJECT_URL\",
            \"commit\": \"$CI_COMMIT_SHA\",
            \"tag\": \"$CI_COMMIT_TAG\"
          }
        }"
  rules:
    - if: $CI_COMMIT_TAG
```
### GitHub Actions Integration

```yaml
# .github/workflows/release.yml
name: Release to Stella

on:
  push:
    tags:
      - 'v*'

jobs:
  build-and-release:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push
        id: build
        uses: docker/build-push-action@v5
        with:
          push: true
          tags: |
            ghcr.io/${{ github.repository }}:${{ github.sha }}
            ghcr.io/${{ github.repository }}:${{ github.ref_name }}

      # docker/build-push-action exposes the pushed digest as a step output;
      # buildx-pushed images are not in the local daemon, so `docker inspect` would fail
      - name: Create Stella Release
        uses: stella-ops/create-release-action@v1
        with:
          stella-url: ${{ vars.STELLA_API_URL }}
          stella-token: ${{ secrets.STELLA_TOKEN }}
          release-name: ${{ github.event.repository.name }}-${{ github.ref_name }}
          components: |
            - componentId: ${{ vars.STELLA_COMPONENT_ID }}
              digest: ${{ steps.build.outputs.digest }}
          source-ref: |
            type: git
            repository: ${{ github.server_url }}/${{ github.repository }}
            commit: ${{ github.sha }}
            tag: ${{ github.ref_name }}
```
### Jenkins Integration

```groovy
// Jenkinsfile (httpRequest requires the HTTP Request plugin)
pipeline {
    agent any

    environment {
        STELLA_API_URL = 'https://stella.example.com/api/v1'
        STELLA_TOKEN = credentials('stella-api-token')
        REGISTRY = 'registry.example.com'
        IMAGE_NAME = 'myorg/myapp'
    }

    stages {
        stage('Build') {
            steps {
                script {
                    docker.build("${REGISTRY}/${IMAGE_NAME}:${env.BUILD_TAG}")
                }
            }
        }

        stage('Push') {
            steps {
                script {
                    docker.withRegistry("https://${REGISTRY}", 'registry-creds') {
                        docker.image("${REGISTRY}/${IMAGE_NAME}:${env.BUILD_TAG}").push()
                    }
                }
            }
        }

        stage('Create Release') {
            when {
                tag pattern: "v\\d+\\.\\d+\\.\\d+", comparator: "REGEXP"
            }
            steps {
                script {
                    def digest = sh(
                        script: "docker inspect --format='{{index .RepoDigests 0}}' ${REGISTRY}/${IMAGE_NAME}:${env.TAG_NAME} | cut -d@ -f2",
                        returnStdout: true
                    ).trim()

                    def response = httpRequest(
                        url: "${STELLA_API_URL}/releases",
                        httpMode: 'POST',
                        contentType: 'APPLICATION_JSON',
                        customHeaders: [[name: 'Authorization', value: "Bearer ${STELLA_TOKEN}"]],
                        requestBody: """
                        {
                            "name": "${IMAGE_NAME}-${env.TAG_NAME}",
                            "components": [{
                                "componentId": "${env.STELLA_COMPONENT_ID}",
                                "digest": "${digest}"
                            }],
                            "sourceRef": {
                                "type": "git",
                                "repository": "${env.GIT_URL}",
                                "commit": "${env.GIT_COMMIT}",
                                "tag": "${env.TAG_NAME}"
                            }
                        }
                        """
                    )

                    echo "Release created: ${response.content}"
                }
            }
        }
    }

    post {
        success {
            // Notify Stella of successful build
            httpRequest(
                url: "${STELLA_API_URL}/webhooks/ci-status",
                httpMode: 'POST',
                contentType: 'APPLICATION_JSON',
                customHeaders: [[name: 'Authorization', value: "Bearer ${STELLA_TOKEN}"]],
                requestBody: """
                {
                    "buildId": "${env.BUILD_ID}",
                    "status": "success",
                    "commit": "${env.GIT_COMMIT}"
                }
                """
            )
        }
    }
}
```
## Workflow Step Integration

### Trigger CI Pipeline Step

```typescript
// Step type: trigger-ci
interface TriggerCIConfig {
  integrationId: UUID;       // CI integration reference
  pipelineId: string;        // Pipeline to trigger
  ref?: string;              // Branch/tag reference
  variables?: Record<string, string>;
  waitForCompletion: boolean;
  timeout?: number;
}

// Promise-based delay used while polling
const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

class TriggerCIStep implements IStepExecutor {
  async execute(
    inputs: StepInputs,
    config: TriggerCIConfig,
    context: ExecutionContext
  ): Promise<StepOutputs> {
    const connector = await this.getConnector(config.integrationId);

    // Trigger pipeline
    const run = await connector.triggerPipeline(
      config.pipelineId,
      {
        ref: config.ref || context.release?.sourceRef?.tag,
        variables: {
          ...config.variables,
          STELLA_RELEASE_ID: context.release?.id,
          STELLA_PROMOTION_ID: context.promotion?.id,
          STELLA_ENVIRONMENT: context.environment?.name
        }
      }
    );

    if (!config.waitForCompletion) {
      return {
        pipelineRunId: run.id,
        status: run.status,
        webUrl: run.webUrl
      };
    }

    // Wait for completion
    const finalStatus = await this.waitForPipeline(
      connector,
      run.id,
      config.timeout || 3600
    );

    if (finalStatus.status !== "success") {
      throw new StepError(
        `Pipeline failed with status: ${finalStatus.status}`,
        { pipelineRunId: run.id, status: finalStatus }
      );
    }

    return {
      pipelineRunId: run.id,
      status: finalStatus.status,
      webUrl: run.webUrl
    };
  }

  private async waitForPipeline(
    connector: ICICDConnector,
    runId: string,
    timeout: number
  ): Promise<PipelineRun> {
    const deadline = Date.now() + timeout * 1000;

    while (Date.now() < deadline) {
      const run = await connector.getPipelineRun(runId);

      if (run.status === "success" || run.status === "failed" || run.status === "cancelled") {
        return run;
      }

      await sleep(10000); // Poll every 10 seconds
    }

    throw new TimeoutError(`Pipeline did not complete within ${timeout} seconds`);
  }
}
```
### Wait for CI Step

```typescript
// Step type: wait-ci
interface WaitCIConfig {
  integrationId: UUID;
  runId?: string;        // If known, or from input
  runIdInput?: string;   // Input name containing run ID
  timeout: number;
  failOnError: boolean;
}

class WaitCIStep implements IStepExecutor {
  async execute(
    inputs: StepInputs,
    config: WaitCIConfig,
    context: ExecutionContext
  ): Promise<StepOutputs> {
    const runId = config.runId || inputs[config.runIdInput!];

    if (!runId) {
      throw new StepError("Pipeline run ID not provided");
    }

    const connector = await this.getConnector(config.integrationId);

    // Polls the same way as TriggerCIStep.waitForPipeline
    const finalStatus = await this.waitForPipeline(
      connector,
      runId,
      config.timeout
    );

    const success = finalStatus.status === "success";

    if (!success && config.failOnError) {
      throw new StepError(
        `Pipeline failed with status: ${finalStatus.status}`,
        { pipelineRunId: runId, status: finalStatus }
      );
    }

    return {
      status: finalStatus.status,
      success,
      pipelineRun: finalStatus
    };
  }
}
```
## Deployment Status Reporting

### GitHub Deployment Status

```typescript
class GitHubStatusReporter {
  async reportDeploymentStart(
    integration: Integration,
    deployment: DeploymentContext
  ): Promise<void> {
    const client = await this.getClient(integration);

    // Create deployment
    const { data: ghDeployment } = await client.repos.createDeployment({
      owner: deployment.repository.owner,
      repo: deployment.repository.name,
      ref: deployment.sourceRef.commit,
      environment: deployment.environment.name,
      auto_merge: false,
      required_contexts: [],
      payload: {
        stellaReleaseId: deployment.release.id,
        stellaPromotionId: deployment.promotion.id
      }
    });

    // Set status to in_progress
    await client.repos.createDeploymentStatus({
      owner: deployment.repository.owner,
      repo: deployment.repository.name,
      deployment_id: ghDeployment.id,
      state: "in_progress",
      log_url: `${this.stellaUrl}/deployments/${deployment.jobId}`,
      description: "Deployment in progress"
    });

    // Store deployment ID for later status update
    await this.storeMapping(deployment.jobId, ghDeployment.id);
  }

  async reportDeploymentComplete(
    integration: Integration,
    deployment: DeploymentContext,
    success: boolean
  ): Promise<void> {
    const client = await this.getClient(integration);
    const ghDeploymentId = await this.getMapping(deployment.jobId);

    await client.repos.createDeploymentStatus({
      owner: deployment.repository.owner,
      repo: deployment.repository.name,
      deployment_id: ghDeploymentId,
      state: success ? "success" : "failure",
      log_url: `${this.stellaUrl}/deployments/${deployment.jobId}`,
      environment_url: deployment.environment.url,
      description: success
        ? "Deployment completed successfully"
        : "Deployment failed"
    });
  }
}
```
### GitLab Pipeline Status

```typescript
class GitLabStatusReporter {
  async reportDeploymentStatus(
    integration: Integration,
    deployment: DeploymentContext,
    state: "running" | "success" | "failed" | "canceled"
  ): Promise<void> {
    const client = await this.getClient(integration);

    await client.post(
      `/projects/${integration.config.projectId}/statuses/${deployment.sourceRef.commit}`,
      {
        state,
        ref: deployment.sourceRef.tag || deployment.sourceRef.branch,
        name: `stella/${deployment.environment.name}`,
        target_url: `${this.stellaUrl}/deployments/${deployment.jobId}`,
        description: this.getDescription(state, deployment)
      }
    );
  }

  private getDescription(state: string, deployment: DeploymentContext): string {
    switch (state) {
      case "running":
        return `Deploying to ${deployment.environment.name}`;
      case "success":
        return `Deployed to ${deployment.environment.name}`;
      case "failed":
        return `Deployment to ${deployment.environment.name} failed`;
      case "canceled":
        return `Deployment to ${deployment.environment.name} cancelled`;
      default:
        return "";
    }
  }
}
```
## API for CI Systems

### Create Release from CI

```http
POST /api/v1/releases
Authorization: Bearer <ci-token>
Content-Type: application/json

{
  "name": "myapp-v1.2.0",
  "components": [
    {
      "componentId": "component-uuid",
      "digest": "sha256:abc123..."
    }
  ],
  "sourceRef": {
    "type": "git",
    "repository": "https://github.com/myorg/myapp",
    "commit": "abc123def456",
    "tag": "v1.2.0",
    "branch": "main"
  },
  "metadata": {
    "buildId": "12345",
    "buildUrl": "https://ci.example.com/builds/12345",
    "triggeredBy": "ci-pipeline"
  }
}
```
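A CI script can assemble that request body programmatically. The sketch below follows the field names of the example request; the `buildCreateReleaseRequest` helper itself is hypothetical, and the sha256 check reflects the digest-based release identity rule (tag-style references are rejected up front).

```typescript
interface SourceRef {
  type: "git";
  repository: string;
  commit: string;
  tag?: string;
  branch?: string;
}

interface CreateReleaseRequest {
  name: string;
  components: { componentId: string; digest: string }[];
  sourceRef: SourceRef;
  metadata?: Record<string, string>;
}

// Builds a Create Release body; rejects components not pinned by digest.
function buildCreateReleaseRequest(
  name: string,
  components: { componentId: string; digest: string }[],
  sourceRef: SourceRef,
  metadata?: Record<string, string>
): CreateReleaseRequest {
  for (const c of components) {
    if (!c.digest.startsWith("sha256:")) {
      throw new Error(`Component ${c.componentId} must be pinned by digest, got: ${c.digest}`);
    }
  }
  return { name, components, sourceRef, metadata };
}
```

The resulting object is what gets serialized into the `POST /api/v1/releases` body shown above.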
### Report Build Status

```http
POST /api/v1/ci-events/build-complete
Authorization: Bearer <ci-token>
Content-Type: application/json

{
  "integrationId": "integration-uuid",
  "buildId": "12345",
  "status": "success",
  "commit": "abc123def456",
  "artifacts": [
    {
      "name": "myapp",
      "digest": "sha256:abc123...",
      "repository": "registry.example.com/myorg/myapp"
    }
  ],
  "testResults": {
    "passed": 150,
    "failed": 0,
    "skipped": 5
  }
}
```
## Service Account for CI

### Creating CI Service Account

```http
POST /api/v1/service-accounts
Authorization: Bearer <admin-token>
Content-Type: application/json

{
  "name": "ci-pipeline",
  "description": "Service account for CI/CD integration",
  "roles": ["release-creator"],
  "permissions": [
    { "resource": "release", "action": "create" },
    { "resource": "component", "action": "read" },
    { "resource": "version-map", "action": "read" }
  ],
  "expiresIn": "365d"
}
```

Response:

```json
{
  "success": true,
  "data": {
    "id": "sa-uuid",
    "name": "ci-pipeline",
    "token": "stella_sa_xxxxxxxxxxxxx",
    "expiresAt": "2027-01-09T00:00:00Z"
  }
}
```
## References

- [Integrations Overview](overview.md)
- [Connectors](connectors.md)
- [Webhooks](webhooks.md)
- [Workflow Templates](../workflow/templates.md)
900
docs/modules/release-orchestrator/integrations/connectors.md
Normal file
@@ -0,0 +1,900 @@
# Connector Development

## Overview

Connectors are the integration layer between Release Orchestrator and external systems. Each connector implements a standard interface for its integration type.

## Connector Architecture

```
CONNECTOR ARCHITECTURE

┌─────────────────────────────────────────────────────────────────────────────┐
│                              CONNECTOR RUNTIME                              │
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                         CONNECTOR INTERFACE                         │   │
│   │                                                                     │   │
│   │  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐   │   │
│   │  │ getCapabilities()│  │      ping()      │  │  authenticate()  │   │   │
│   │  └──────────────────┘  └──────────────────┘  └──────────────────┘   │   │
│   │                                                                     │   │
│   │  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐   │   │
│   │  │    discover()    │  │    execute()     │  │  healthCheck()   │   │   │
│   │  └──────────────────┘  └──────────────────┘  └──────────────────┘   │   │
│   │                                                                     │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                      │                                      │
│                                      ▼                                      │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                       CONNECTOR IMPLEMENTATIONS                     │   │
│   │                                                                     │   │
│   │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐ │   │
│   │  │  Registry   │  │   CI/CD     │  │ Notification│  │   Secret    │ │   │
│   │  │ Connectors  │  │ Connectors  │  │ Connectors  │  │ Connectors  │ │   │
│   │  │             │  │             │  │             │  │             │ │   │
│   │  │ - Docker    │  │ - GitLab    │  │ - Slack     │  │ - Vault     │ │   │
│   │  │ - ECR       │  │ - GitHub    │  │ - Teams     │  │ - AWS SM    │ │   │
│   │  │ - ACR       │  │ - Jenkins   │  │ - Email     │  │ - Azure KV  │ │   │
│   │  │ - Harbor    │  │ - Azure DO  │  │ - PagerDuty │  │             │ │   │
│   │  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘ │   │
│   │                                                                     │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Base Connector Interface

```typescript
interface IConnector {
  // Metadata
  readonly typeId: string;
  readonly displayName: string;
  readonly version: string;
  readonly capabilities: ConnectorCapabilities;

  // Lifecycle
  initialize(config: IntegrationConfig): Promise<void>;
  dispose(): Promise<void>;

  // Health
  ping(config: IntegrationConfig): Promise<void>;
  healthCheck(config: IntegrationConfig, creds: Credential): Promise<HealthCheckResult>;

  // Authentication
  authenticate(config: IntegrationConfig, creds: Credential): Promise<AuthContext>;

  // Discovery (optional)
  discover?(
    config: IntegrationConfig,
    authContext: AuthContext,
    resourceType: string,
    filter?: DiscoveryFilter
  ): Promise<DiscoveredResource[]>;
}

interface ConnectorCapabilities {
  discovery: boolean;
  webhooks: boolean;
  streaming: boolean;
  batchOperations: boolean;
  customActions: string[];
}
```
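Connector implementations are resolved by `typeId` (for example `"docker-registry"` or `"ecr"` below). The registry itself is not defined by the spec; this is a minimal sketch of what such a lookup table could look like, with the fail-loud behavior the "no silent stubs" rule calls for.

```typescript
// Minimal slice of the connector contract needed for lookup.
interface ConnectorLike {
  readonly typeId: string;
  readonly displayName: string;
}

// Maps integration typeIds to connector instances.
class ConnectorRegistry {
  private connectors = new Map<string, ConnectorLike>();

  register(connector: ConnectorLike): void {
    if (this.connectors.has(connector.typeId)) {
      throw new Error(`Connector already registered: ${connector.typeId}`);
    }
    this.connectors.set(connector.typeId, connector);
  }

  resolve(typeId: string): ConnectorLike {
    const connector = this.connectors.get(typeId);
    if (!connector) {
      throw new Error(`No connector for type: ${typeId}`); // fail loudly, no silent stubs
    }
    return connector;
  }
}
```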
## Registry Connectors

### IRegistryConnector

```typescript
interface IRegistryConnector extends IConnector {
  // Repository operations
  listRepositories(authContext: AuthContext): Promise<Repository[]>;

  // Tag operations
  listTags(authContext: AuthContext, repository: string): Promise<Tag[]>;
  getManifest(authContext: AuthContext, repository: string, reference: string): Promise<Manifest>;
  getDigest(authContext: AuthContext, repository: string, tag: string): Promise<string>;

  // Image operations
  imageExists(authContext: AuthContext, repository: string, digest: string): Promise<boolean>;
  getImageMetadata(authContext: AuthContext, repository: string, digest: string): Promise<ImageMetadata>;
}

interface Repository {
  name: string;
  fullName: string;
  tagCount?: number;
  lastUpdated?: DateTime;
}

interface Tag {
  name: string;
  digest: string;
  createdAt?: DateTime;
  size?: number;
}

interface ImageMetadata {
  digest: string;
  mediaType: string;
  size: number;
  architecture: string;
  os: string;
  created: DateTime;
  labels: Record<string, string>;
  layers: LayerInfo[];
}
```
### Docker Registry Connector

```typescript
class DockerRegistryConnector implements IRegistryConnector {
  readonly typeId = "docker-registry";
  readonly displayName = "Docker Registry";
  readonly version = "1.0.0";
  readonly capabilities: ConnectorCapabilities = {
    discovery: true,
    webhooks: true,
    streaming: false,
    batchOperations: false,
    customActions: []
  };

  private httpClient: HttpClient;

  async initialize(config: DockerRegistryConfig): Promise<void> {
    this.httpClient = new HttpClient({
      baseUrl: config.url,
      timeout: config.timeout || 30000,
      insecureSkipVerify: config.insecureSkipVerify
    });
  }

  async ping(config: DockerRegistryConfig): Promise<void> {
    const response = await this.httpClient.get("/v2/");
    if (response.status !== 200 && response.status !== 401) {
      throw new Error(`Registry unavailable: ${response.status}`);
    }
  }

  async authenticate(
    config: DockerRegistryConfig,
    creds: BasicCredential
  ): Promise<AuthContext> {
    // Get auth challenge from /v2/
    const challenge = await this.getAuthChallenge();

    if (challenge.type === "bearer") {
      // OAuth2 token flow
      const token = await this.getToken(challenge, creds);
      return { type: "bearer", token };
    } else {
      // Basic auth
      return {
        type: "basic",
        credentials: Buffer.from(`${creds.username}:${creds.password}`).toString("base64")
      };
    }
  }

  async getDigest(
    authContext: AuthContext,
    repository: string,
    tag: string
  ): Promise<string> {
    const response = await this.httpClient.head(
      `/v2/${repository}/manifests/${tag}`,
      {
        headers: {
          ...this.authHeader(authContext),
          Accept: "application/vnd.docker.distribution.manifest.v2+json"
        }
      }
    );

    const digest = response.headers.get("docker-content-digest");
    if (!digest) {
      throw new Error("No digest header in response");
    }

    return digest;
  }

  async getImageMetadata(
    authContext: AuthContext,
    repository: string,
    digest: string
  ): Promise<ImageMetadata> {
    // Fetch manifest
    const manifest = await this.getManifest(authContext, repository, digest);

    // Fetch config blob
    const configDigest = manifest.config.digest;
    const configResponse = await this.httpClient.get(
      `/v2/${repository}/blobs/${configDigest}`,
      { headers: this.authHeader(authContext) }
    );

    const config = await configResponse.json();

    return {
      digest,
      mediaType: manifest.mediaType,
      size: manifest.config.size,
      architecture: config.architecture,
      os: config.os,
      created: new Date(config.created),
      labels: config.config?.Labels || {},
      layers: manifest.layers.map(l => ({
        digest: l.digest,
        size: l.size,
        mediaType: l.mediaType
      }))
    };
  }
}
```
### ECR Connector

```typescript
class ECRConnector implements IRegistryConnector {
  readonly typeId = "ecr";
  readonly displayName = "AWS ECR";
  readonly version = "1.0.0";
  readonly capabilities: ConnectorCapabilities = {
    discovery: true,
    webhooks: false,
    streaming: false,
    batchOperations: true,
    customActions: ["createRepository", "setLifecyclePolicy"]
  };

  private ecrClient: ECRClient;

  async initialize(config: ECRConfig): Promise<void> {
    this.ecrClient = new ECRClient({
      region: config.region,
      credentials: {
        accessKeyId: config.accessKeyId,
        secretAccessKey: config.secretAccessKey
      }
    });
  }

  async authenticate(
    config: ECRConfig,
    creds: AWSCredential
  ): Promise<AuthContext> {
    const command = new GetAuthorizationTokenCommand({});
    const response = await this.ecrClient.send(command);

    const authData = response.authorizationData?.[0];
    if (!authData?.authorizationToken) {
      throw new Error("Failed to get ECR authorization token");
    }

    return {
      type: "bearer",
      token: authData.authorizationToken,
      expiresAt: authData.expiresAt
    };
  }

  async listRepositories(authContext: AuthContext): Promise<Repository[]> {
    const repositories: Repository[] = [];
    let nextToken: string | undefined;

    do {
      const command = new DescribeRepositoriesCommand({
        nextToken
      });
      const response = await this.ecrClient.send(command);

      for (const repo of response.repositories || []) {
        repositories.push({
          name: repo.repositoryName!,
          fullName: repo.repositoryUri!,
          lastUpdated: repo.createdAt
        });
      }

      nextToken = response.nextToken;
    } while (nextToken);

    return repositories;
  }
}
```

## CI/CD Connectors

### ICICDConnector

```typescript
interface ICICDConnector extends IConnector {
  // Pipeline operations
  listPipelines(authContext: AuthContext): Promise<Pipeline[]>;
  getPipeline(authContext: AuthContext, pipelineId: string): Promise<Pipeline>;

  // Trigger operations
  triggerPipeline(
    authContext: AuthContext,
    pipelineId: string,
    params: TriggerParams
  ): Promise<PipelineRun>;

  // Run operations
  getPipelineRun(authContext: AuthContext, runId: string): Promise<PipelineRun>;
  cancelPipelineRun(authContext: AuthContext, runId: string): Promise<void>;
  getPipelineRunLogs(authContext: AuthContext, runId: string): Promise<string>;
}

interface Pipeline {
  id: string;
  name: string;
  ref?: string;
  webUrl?: string;
}

interface TriggerParams {
  ref?: string;                        // Branch/tag
  variables?: Record<string, string>;
}

interface PipelineRun {
  id: string;
  pipelineId: string;
  status: PipelineStatus;
  ref?: string;
  webUrl?: string;
  startedAt?: DateTime;
  finishedAt?: DateTime;
}

type PipelineStatus =
  | "pending"
  | "running"
  | "success"
  | "failed"
  | "cancelled";
```

### GitLab CI Connector

```typescript
class GitLabCIConnector implements ICICDConnector {
  readonly typeId = "gitlab-ci";
  readonly displayName = "GitLab CI/CD";
  readonly version = "1.0.0";
  readonly capabilities: ConnectorCapabilities = {
    discovery: true,
    webhooks: true,
    streaming: false,
    batchOperations: false,
    customActions: ["retryPipeline"]
  };

  private apiClient: GitLabClient;
  private config: GitLabCIConfig;

  async initialize(config: GitLabCIConfig): Promise<void> {
    this.config = config;
    this.apiClient = new GitLabClient({
      baseUrl: config.url,
      projectId: config.projectId
    });
  }

  async authenticate(
    config: GitLabCIConfig,
    creds: TokenCredential
  ): Promise<AuthContext> {
    // Validate token with user endpoint
    this.apiClient.setToken(creds.token);
    await this.apiClient.get("/user");

    return {
      type: "bearer",
      token: creds.token
    };
  }

  async triggerPipeline(
    authContext: AuthContext,
    pipelineId: string,
    params: TriggerParams
  ): Promise<PipelineRun> {
    const response = await this.apiClient.post(
      `/projects/${this.config.projectId}/pipeline`,
      {
        ref: params.ref || this.config.defaultBranch,
        variables: Object.entries(params.variables || {}).map(([key, value]) => ({
          key,
          value,
          variable_type: "env_var"
        }))
      },
      { headers: { Authorization: `Bearer ${authContext.token}` } }
    );

    return {
      id: response.id.toString(),
      pipelineId,
      status: this.mapStatus(response.status),
      ref: response.ref,
      webUrl: response.web_url,
      startedAt: response.started_at ? new Date(response.started_at) : undefined
    };
  }

  async getPipelineRun(
    authContext: AuthContext,
    runId: string
  ): Promise<PipelineRun> {
    const response = await this.apiClient.get(
      `/projects/${this.config.projectId}/pipelines/${runId}`,
      { headers: { Authorization: `Bearer ${authContext.token}` } }
    );

    return {
      id: response.id.toString(),
      pipelineId: response.id.toString(),
      status: this.mapStatus(response.status),
      ref: response.ref,
      webUrl: response.web_url,
      startedAt: response.started_at ? new Date(response.started_at) : undefined,
      finishedAt: response.finished_at ? new Date(response.finished_at) : undefined
    };
  }

  private mapStatus(gitlabStatus: string): PipelineStatus {
    const statusMap: Record<string, PipelineStatus> = {
      created: "pending",
      waiting_for_resource: "pending",
      preparing: "pending",
      pending: "pending",
      running: "running",
      success: "success",
      failed: "failed",
      canceled: "cancelled",
      skipped: "cancelled",
      manual: "pending"
    };
    return statusMap[gitlabStatus] || "pending";
  }
}
```

## Notification Connectors

### INotificationConnector

```typescript
interface INotificationConnector extends IConnector {
  // Channel operations
  listChannels(authContext: AuthContext): Promise<Channel[]>;

  // Send operations
  sendMessage(
    authContext: AuthContext,
    channel: string,
    message: NotificationMessage
  ): Promise<MessageResult>;

  sendTemplate(
    authContext: AuthContext,
    channel: string,
    templateId: string,
    data: Record<string, any>
  ): Promise<MessageResult>;
}

interface Channel {
  id: string;
  name: string;
  type: string;
}

interface NotificationMessage {
  text: string;
  title?: string;
  color?: string;
  fields?: MessageField[];
  actions?: MessageAction[];
}

interface MessageField {
  name: string;
  value: string;
  inline?: boolean;
}

interface MessageAction {
  type: "button" | "link";
  text: string;
  url?: string;
  style?: "primary" | "danger" | "default";
}
```

### Slack Connector

```typescript
class SlackConnector implements INotificationConnector {
  readonly typeId = "slack";
  readonly displayName = "Slack";
  readonly version = "1.0.0";
  readonly capabilities: ConnectorCapabilities = {
    discovery: true,
    webhooks: true,
    streaming: false,
    batchOperations: false,
    customActions: ["addReaction", "updateMessage"]
  };

  private slackClient: WebClient;

  async initialize(config: SlackConfig): Promise<void> {
    // Client initialized in authenticate
  }

  async authenticate(
    config: SlackConfig,
    creds: TokenCredential
  ): Promise<AuthContext> {
    this.slackClient = new WebClient(creds.token);

    // Test authentication
    const result = await this.slackClient.auth.test();
    if (!result.ok) {
      throw new Error("Slack authentication failed");
    }

    return {
      type: "bearer",
      token: creds.token,
      teamId: result.team_id,
      userId: result.user_id
    };
  }

  async listChannels(authContext: AuthContext): Promise<Channel[]> {
    const channels: Channel[] = [];
    let cursor: string | undefined;

    do {
      const result = await this.slackClient.conversations.list({
        types: "public_channel,private_channel",
        cursor
      });

      for (const channel of result.channels || []) {
        channels.push({
          id: channel.id!,
          name: channel.name!,
          type: channel.is_private ? "private" : "public"
        });
      }

      cursor = result.response_metadata?.next_cursor;
    } while (cursor);

    return channels;
  }

  async sendMessage(
    authContext: AuthContext,
    channel: string,
    message: NotificationMessage
  ): Promise<MessageResult> {
    const blocks = this.buildBlocks(message);

    const result = await this.slackClient.chat.postMessage({
      channel,
      text: message.text,
      blocks,
      attachments: message.color ? [{
        color: message.color,
        blocks
      }] : undefined
    });

    return {
      messageId: result.ts!,
      channel: result.channel!,
      success: result.ok
    };
  }

  private buildBlocks(message: NotificationMessage): KnownBlock[] {
    const blocks: KnownBlock[] = [];

    if (message.title) {
      blocks.push({
        type: "header",
        text: {
          type: "plain_text",
          text: message.title
        }
      });
    }

    blocks.push({
      type: "section",
      text: {
        type: "mrkdwn",
        text: message.text
      }
    });

    if (message.fields?.length) {
      blocks.push({
        type: "section",
        fields: message.fields.map(f => ({
          type: "mrkdwn",
          text: `*${f.name}*\n${f.value}`
        }))
      });
    }

    if (message.actions?.length) {
      blocks.push({
        type: "actions",
        elements: message.actions.map(a => ({
          type: "button",
          text: {
            type: "plain_text",
            text: a.text
          },
          url: a.url,
          style: a.style === "danger" ? "danger" : "primary"
        }))
      });
    }

    return blocks;
  }
}
```

## Secret Store Connectors

### ISecretConnector

```typescript
interface ISecretConnector extends IConnector {
  // Secret operations
  getSecret(
    authContext: AuthContext,
    path: string,
    key?: string
  ): Promise<SecretValue>;

  listSecrets(
    authContext: AuthContext,
    path: string
  ): Promise<string[]>;
}

interface SecretValue {
  value: string;
  version?: string;
  createdAt?: DateTime;
  expiresAt?: DateTime;
}
```

### HashiCorp Vault Connector

```typescript
class VaultConnector implements ISecretConnector {
  readonly typeId = "hashicorp-vault";
  readonly displayName = "HashiCorp Vault";
  readonly version = "1.0.0";
  readonly capabilities: ConnectorCapabilities = {
    discovery: true,
    webhooks: false,
    streaming: false,
    batchOperations: false,
    customActions: ["renewToken"]
  };

  private vaultClient: VaultClient;
  private config: VaultConfig;

  async initialize(config: VaultConfig): Promise<void> {
    this.config = config;
    this.vaultClient = new VaultClient({
      endpoint: config.url,
      namespace: config.namespace
    });
  }

  async authenticate(
    config: VaultConfig,
    creds: Credential
  ): Promise<AuthContext> {
    let token: string;

    switch (config.authMethod) {
      case "token":
        token = (creds as TokenCredential).token;
        break;

      case "approle": {
        const approle = creds as AppRoleCredential;
        const result = await this.vaultClient.auth.approle.login({
          role_id: approle.roleId,
          secret_id: approle.secretId
        });
        token = result.auth.client_token;
        break;
      }

      case "kubernetes": {
        const k8s = creds as KubernetesCredential;
        const k8sResult = await this.vaultClient.auth.kubernetes.login({
          role: k8s.role,
          jwt: k8s.serviceAccountToken
        });
        token = k8sResult.auth.client_token;
        break;
      }

      default:
        throw new Error(`Unsupported auth method: ${config.authMethod}`);
    }

    this.vaultClient.token = token;

    return {
      type: "bearer",
      token,
      renewable: true
    };
  }

  async getSecret(
    authContext: AuthContext,
    path: string,
    key?: string
  ): Promise<SecretValue> {
    const result = await this.vaultClient.kv.v2.read({
      mount_path: this.config.mountPath,
      path
    });

    const data = result.data.data;
    const value = key ? data[key] : JSON.stringify(data);

    return {
      value,
      version: result.data.metadata.version.toString(),
      createdAt: new Date(result.data.metadata.created_time)
    };
  }

  async listSecrets(
    authContext: AuthContext,
    path: string
  ): Promise<string[]> {
    const result = await this.vaultClient.kv.v2.list({
      mount_path: this.config.mountPath,
      path
    });

    return result.data.keys;
  }
}
```

## Custom Connector Development

### Plugin Structure

```
my-connector/
├── manifest.yaml
├── src/
│   ├── connector.ts
│   ├── config.ts
│   └── types.ts
└── package.json
```

### Manifest

```yaml
# manifest.yaml
id: my-custom-connector
version: 1.0.0
name: My Custom Connector
description: Custom connector for XYZ service
author: Your Name

connector:
  typeId: my-service
  displayName: My Service
  entrypoint: ./src/connector.js

capabilities:
  discovery: true
  webhooks: false
  streaming: false
  batchOperations: false

config_schema:
  type: object
  properties:
    url:
      type: string
      format: uri
      description: Service URL
    timeout:
      type: integer
      default: 30000
  required:
    - url

credential_types:
  - api-key
  - oauth2
```

### Implementation

```typescript
// connector.ts
import { IConnector, ConnectorCapabilities } from "@stella-ops/connector-sdk";

export class MyConnector implements IConnector {
  readonly typeId = "my-service";
  readonly displayName = "My Service";
  readonly version = "1.0.0";
  readonly capabilities: ConnectorCapabilities = {
    discovery: true,
    webhooks: false,
    streaming: false,
    batchOperations: false,
    customActions: []
  };

  async initialize(config: MyConfig): Promise<void> {
    // Initialize your connector
  }

  async dispose(): Promise<void> {
    // Cleanup resources
  }

  async ping(config: MyConfig): Promise<void> {
    // Check connectivity
  }

  async healthCheck(config: MyConfig, creds: Credential): Promise<HealthCheckResult> {
    // Full health check
  }

  async authenticate(config: MyConfig, creds: Credential): Promise<AuthContext> {
    // Authenticate and return context
  }

  async discover(
    config: MyConfig,
    authContext: AuthContext,
    resourceType: string,
    filter?: DiscoveryFilter
  ): Promise<DiscoveredResource[]> {
    // Discover resources
  }
}

// Export connector factory
export default function createConnector(): IConnector {
  return new MyConnector();
}
```

## References

- [Integrations Overview](overview.md)
- [Webhooks](webhooks.md)
- [Plugin System](../modules/plugin-system.md)

412
docs/modules/release-orchestrator/integrations/overview.md
Normal file
@@ -0,0 +1,412 @@

# Integrations Overview

## Purpose

The Integration Hub (INTHUB) provides a unified interface for connecting Release Orchestrator to external systems, including container registries, CI/CD pipelines, notification services, secret stores, and metrics providers.

## Integration Architecture

```
                        INTEGRATION HUB ARCHITECTURE

┌─────────────────────────────────────────────────────────────────────────────┐
│                               INTEGRATION HUB                               │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                         INTEGRATION MANAGER                         │    │
│  │                                                                     │    │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐     │    │
│  │  │    Type    │  │  Instance  │  │   Health   │  │ Discovery  │     │    │
│  │  │  Registry  │  │  Manager   │  │  Monitor   │  │  Service   │     │    │
│  │  └────────────┘  └────────────┘  └────────────┘  └────────────┘     │    │
│  │                                                                     │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                    │                                        │
│                                    ▼                                        │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                          CONNECTOR RUNTIME                          │    │
│  │                                                                     │    │
│  │  ┌──────────────────────────────────────────────────────────────┐   │    │
│  │  │                        CONNECTOR POOL                        │   │    │
│  │  │                                                              │   │    │
│  │  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐      │   │    │
│  │  │  │  Docker  │  │  GitLab  │  │  Slack   │  │  Vault   │      │   │    │
│  │  │  │ Registry │  │    CI    │  │          │  │          │      │   │    │
│  │  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘      │   │    │
│  │  │                                                              │   │    │
│  │  └──────────────────────────────────────────────────────────────┘   │    │
│  │                                                                     │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
       ┌─────────────┬──────────────┼─────────────────┬─────────────┐
       │             │              │                 │             │
       ▼             ▼              ▼                 ▼             ▼
  ┌─────────┐   ┌─────────┐   ┌─────────┐       ┌─────────┐   ┌─────────┐
  │Container│   │  CI/CD  │   │ Notifi- │       │ Secret  │   │ Metrics │
  │Registry │   │ Systems │   │ cations │       │ Stores  │   │ Systems │
  └─────────┘   └─────────┘   └─────────┘       └─────────┘   └─────────┘
```

## Integration Types

### Container Registries

| Type ID | Description | Discovery Support |
|---------|-------------|-------------------|
| `docker-registry` | Docker Registry v2 API | Yes |
| `docker-hub` | Docker Hub | Yes |
| `gcr` | Google Container Registry | Yes |
| `ecr` | AWS Elastic Container Registry | Yes |
| `acr` | Azure Container Registry | Yes |
| `ghcr` | GitHub Container Registry | Yes |
| `harbor` | Harbor Registry | Yes |
| `jfrog` | JFrog Artifactory | Yes |
| `nexus` | Sonatype Nexus | Yes |
| `quay` | Quay.io | Yes |

### CI/CD Systems

| Type ID | Description | Trigger Support |
|---------|-------------|-----------------|
| `gitlab-ci` | GitLab CI/CD | Yes |
| `github-actions` | GitHub Actions | Yes |
| `jenkins` | Jenkins | Yes |
| `azure-devops` | Azure DevOps Pipelines | Yes |
| `circleci` | CircleCI | Yes |
| `teamcity` | TeamCity | Yes |
| `drone` | Drone CI | Yes |

### Notification Services

| Type ID | Description | Features |
|---------|-------------|----------|
| `slack` | Slack | Channels, threads, reactions |
| `teams` | Microsoft Teams | Channels, cards |
| `email` | Email (SMTP) | Templates, attachments |
| `webhook` | Generic Webhook | JSON payloads |
| `pagerduty` | PagerDuty | Incidents, alerts |
| `opsgenie` | OpsGenie | Alerts, on-call |

### Secret Stores

| Type ID | Description | Features |
|---------|-------------|----------|
| `hashicorp-vault` | HashiCorp Vault | KV, Transit, PKI |
| `aws-secrets-manager` | AWS Secrets Manager | Rotation, versioning |
| `azure-key-vault` | Azure Key Vault | Keys, secrets, certs |
| `gcp-secret-manager` | GCP Secret Manager | Versions, labels |

### Metrics & Monitoring

| Type ID | Description | Use Case |
|---------|-------------|----------|
| `prometheus` | Prometheus | Canary metrics |
| `datadog` | Datadog | APM, logs, metrics |
| `newrelic` | New Relic | APM, infra monitoring |
| `dynatrace` | Dynatrace | Full-stack monitoring |

## Integration Configuration

### Integration Entity

```typescript
interface Integration {
  id: UUID;
  tenantId: UUID;
  typeId: string;            // e.g., "docker-registry"
  name: string;              // Display name
  description?: string;

  // Connection configuration
  config: IntegrationConfig;

  // Credential reference (stored in vault)
  credentialRef: string;

  // Health tracking
  healthStatus: "healthy" | "degraded" | "unhealthy" | "unknown";
  lastHealthCheck?: DateTime;

  // Metadata
  labels: Record<string, string>;
  createdAt: DateTime;
  updatedAt: DateTime;
}

interface IntegrationConfig {
  // Common fields
  url?: string;
  timeout?: number;
  retries?: number;

  // Type-specific fields
  [key: string]: any;
}
```

### Type-Specific Configuration

```typescript
// Docker Registry
interface DockerRegistryConfig extends IntegrationConfig {
  url: string;                   // https://registry.example.com
  repository?: string;           // Optional default repository
  insecureSkipVerify?: boolean;  // Skip TLS verification
}

// GitLab CI
interface GitLabCIConfig extends IntegrationConfig {
  url: string;                   // https://gitlab.example.com
  projectId: string;             // Project ID or path
  defaultBranch?: string;        // Default ref for triggers
}

// Slack
interface SlackConfig extends IntegrationConfig {
  workspace?: string;            // Workspace identifier
  defaultChannel?: string;       // Default channel for notifications
  iconEmoji?: string;            // Bot icon
}

// HashiCorp Vault
interface VaultConfig extends IntegrationConfig {
  url: string;                   // https://vault.example.com
  namespace?: string;            // Vault namespace
  mountPath: string;             // Secret mount path
  authMethod: "token" | "approle" | "kubernetes";
}
```

## Credential Management

Credentials are never stored in the Release Orchestrator database. Instead, references to external secret stores are used.

### Credential Reference Format

```
vault://vault-integration-id/path/to/secret#key
        └────────┬─────────┘ └─────┬────┘ └┬┘
             Vault ID        Secret path  Key
```

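The reference format can be split into its parts with a small helper. The sketch below is illustrative only; `parseCredentialRef` and `ParsedCredentialRef` are hypothetical names, not part of the documented API:

```typescript
// Hypothetical helper: splits "vault://<integration-id>/<path>#<key>"
// into its three components. Not part of the documented API surface.
interface ParsedCredentialRef {
  integrationId: string;
  secretPath: string;
  key?: string;
}

function parseCredentialRef(ref: string): ParsedCredentialRef {
  const match = ref.match(/^vault:\/\/([^/]+)\/([^#]+)(?:#(.+))?$/);
  if (!match) {
    throw new Error(`Invalid credential reference: ${ref}`);
  }
  const [, integrationId, secretPath, key] = match;
  return { integrationId, secretPath, key };
}
```

For example, `parseCredentialRef("vault://vault-1/docker/registry.example.com#credentials")` yields the integration ID `vault-1`, the secret path `docker/registry.example.com`, and the key `credentials`.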
### Credential Types

```typescript
type CredentialType =
  | "basic"            // Username/password
  | "token"            // Bearer token
  | "api-key"          // API key
  | "oauth2"           // OAuth2 credentials
  | "service-account"  // GCP/K8s service account
  | "certificate";     // Client certificate

interface CredentialReference {
  type: CredentialType;
  ref: string;  // Vault reference
}

// Examples
const dockerCreds: CredentialReference = {
  type: "basic",
  ref: "vault://vault-1/docker/registry.example.com#credentials"
};

const gitlabToken: CredentialReference = {
  type: "token",
  ref: "vault://vault-1/ci/gitlab#access_token"
};
```

## Health Monitoring

### Health Check Types

| Check Type | Description | Frequency |
|------------|-------------|-----------|
| `connectivity` | TCP/HTTP connectivity | 1 min |
| `authentication` | Credential validity | 5 min |
| `functionality` | Full operation test | 15 min |

### Health Check Flow

```typescript
interface HealthCheckResult {
  integrationId: UUID;
  checkType: string;
  status: "healthy" | "degraded" | "unhealthy";
  latencyMs: number;
  message?: string;
  checkedAt: DateTime;
}

class IntegrationHealthMonitor {
  async checkHealth(integration: Integration): Promise<HealthCheckResult> {
    const connector = this.connectorPool.get(integration.typeId);
    const startTime = Date.now();

    try {
      // Connectivity check
      await connector.ping(integration.config);

      // Authentication check
      const creds = await this.fetchCredentials(integration.credentialRef);
      await connector.authenticate(integration.config, creds);

      return {
        integrationId: integration.id,
        checkType: "full",
        status: "healthy",
        latencyMs: Date.now() - startTime,
        checkedAt: new Date()
      };
    } catch (error) {
      return {
        integrationId: integration.id,
        checkType: "full",
        status: this.classifyError(error),
        latencyMs: Date.now() - startTime,
        message: error.message,
        checkedAt: new Date()
      };
    }
  }
}
```

## Discovery Service

Integrations can discover resources from connected systems.

### Discovery Operations

```typescript
interface DiscoveryService {
  // Discover available repositories
  discoverRepositories(integrationId: UUID): Promise<Repository[]>;

  // Discover tags/versions
  discoverTags(integrationId: UUID, repository: string): Promise<Tag[]>;

  // Discover pipelines
  discoverPipelines(integrationId: UUID): Promise<Pipeline[]>;

  // Discover notification channels
  discoverChannels(integrationId: UUID): Promise<Channel[]>;
}

// Example: Discover Docker repositories
const repos = await discoveryService.discoverRepositories(dockerIntegrationId);
// Returns: [{ name: "myapp", tags: ["latest", "v1.0.0", ...] }, ...]
```

### Discovery Caching

```typescript
interface DiscoveryCache {
  key: string;  // integration_id:resource_type
  data: any;
  discoveredAt: DateTime;
  ttlSeconds: number;
}

// Cache TTLs by resource type
const cacheTTLs = {
  repositories: 3600,  // 1 hour
  tags: 300,           // 5 minutes
  pipelines: 3600,     // 1 hour
  channels: 86400      // 24 hours
};
```

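The cache entry and TTL scheme above can be exercised with a minimal in-memory store. This is a sketch; `DiscoveryCacheStore` and its method names are illustrative, not the orchestrator's actual implementation:

```typescript
// Illustrative in-memory cache honoring per-resource TTLs.
// Keys follow the documented format: integration_id:resource_type.
type CacheEntry = { data: unknown; discoveredAtMs: number; ttlSeconds: number };

class DiscoveryCacheStore {
  private entries = new Map<string, CacheEntry>();

  put(integrationId: string, resourceType: string, data: unknown, ttlSeconds: number): void {
    this.entries.set(`${integrationId}:${resourceType}`, {
      data,
      discoveredAtMs: Date.now(),
      ttlSeconds
    });
  }

  get(integrationId: string, resourceType: string, nowMs: number = Date.now()): unknown | undefined {
    const key = `${integrationId}:${resourceType}`;
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    // Expired entries are treated as misses and evicted
    if (nowMs - entry.discoveredAtMs >= entry.ttlSeconds * 1000) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.data;
  }
}
```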
## API Reference

### Create Integration

```http
POST /api/v1/integrations
Content-Type: application/json

{
  "typeId": "docker-registry",
  "name": "Production Registry",
  "config": {
    "url": "https://registry.example.com",
    "repository": "myorg"
  },
  "credentialRef": "vault://vault-1/docker/prod-registry#credentials",
  "labels": {
    "environment": "production"
  }
}
```

### Test Integration

```http
POST /api/v1/integrations/{id}/test
```

Response:

```json
{
  "success": true,
  "data": {
    "connectivityTest": { "status": "passed", "latencyMs": 45 },
    "authenticationTest": { "status": "passed", "latencyMs": 120 },
    "functionalityTest": { "status": "passed", "latencyMs": 230 }
  }
}
```

### Discover Resources

```http
POST /api/v1/integrations/{id}/discover
Content-Type: application/json

{
  "resourceType": "repositories",
  "filter": {
    "namePattern": "myapp-*"
  }
}
```

## Error Handling

### Integration Errors

| Error Code | Description | Retry Strategy |
|------------|-------------|----------------|
| `INTEGRATION_NOT_FOUND` | Integration ID not found | No retry |
| `INTEGRATION_UNHEALTHY` | Integration health check failing | Backoff retry |
| `CREDENTIAL_FETCH_FAILED` | Cannot fetch credentials | Retry with backoff |
| `CONNECTION_REFUSED` | Cannot connect to endpoint | Retry with backoff |
| `AUTHENTICATION_FAILED` | Invalid credentials | No retry |
| `RATE_LIMITED` | Too many requests | Retry after delay |

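The retry column of the table can be expressed as a small decision function. The sketch below uses illustrative defaults (base delay, 60 s cap, and a 5-attempt limit are assumptions, not documented behavior):

```typescript
// Sketch: map an error code plus attempt count to a retry decision
// with exponential backoff. Defaults are illustrative assumptions.
type RetryDecision = { retry: boolean; delayMs?: number };

// Permanent errors from the table: retrying cannot help
const NO_RETRY = new Set(["INTEGRATION_NOT_FOUND", "AUTHENTICATION_FAILED"]);

function retryDecision(errorCode: string, attempt: number, baseDelayMs = 1000): RetryDecision {
  if (NO_RETRY.has(errorCode)) return { retry: false };
  if (attempt >= 5) return { retry: false };  // assumed give-up threshold
  // Transient errors (including RATE_LIMITED): exponential backoff, capped at 60s
  const delayMs = Math.min(baseDelayMs * 2 ** attempt, 60_000);
  return { retry: true, delayMs };
}
```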
### Circuit Breaker

```typescript
interface CircuitBreakerConfig {
|
||||
failureThreshold: number; // Failures before opening
|
||||
successThreshold: number; // Successes to close
|
||||
timeout: number; // Time in open state (ms)
|
||||
}
|
||||
|
||||
// Default configuration
|
||||
const defaultCircuitBreaker: CircuitBreakerConfig = {
|
||||
failureThreshold: 5,
|
||||
successThreshold: 3,
|
||||
timeout: 60000
|
||||
};
|
||||
```

## References

- [Connectors](connectors.md)
- [Webhooks](webhooks.md)
- [CI/CD Integration](ci-cd.md)
- [Integration Hub Module](../modules/integration-hub.md)

627
docs/modules/release-orchestrator/integrations/webhooks.md
Normal file
@@ -0,0 +1,627 @@

# Webhooks

## Overview

Release Orchestrator supports both inbound webhooks (receiving events from external systems) and outbound webhooks (sending events to external systems).

## Inbound Webhooks

### Webhook Types

| Type | Source | Triggers |
|------|--------|----------|
| `registry-push` | Container registries | Image push events |
| `ci-pipeline` | CI/CD systems | Pipeline completion |
| `github-app` | GitHub | PR, push, workflow events |
| `gitlab-webhook` | GitLab | Pipeline, push, MR events |
| `generic` | Any system | Custom payloads |

### Registry Push Webhook

Receives events when new images are pushed to registries.

```
POST /api/v1/webhooks/registry/{integrationId}
Content-Type: application/json

# Docker Hub
{
  "push_data": {
    "tag": "v1.2.0",
    "images": ["sha256:abc123..."],
    "pushed_at": 1704067200
  },
  "repository": {
    "name": "myapp",
    "namespace": "myorg",
    "repo_url": "https://hub.docker.com/r/myorg/myapp"
  }
}

# Harbor
{
  "type": "PUSH_ARTIFACT",
  "occur_at": 1704067200,
  "event_data": {
    "repository": {
      "name": "myapp",
      "repo_full_name": "myorg/myapp"
    },
    "resources": [{
      "digest": "sha256:abc123...",
      "tag": "v1.2.0"
    }]
  }
}
```

### Webhook Handler

```typescript
interface WebhookHandler {
  handleRegistryPush(
    integrationId: UUID,
    payload: RegistryPushPayload
  ): Promise<WebhookResponse>;

  handleCIPipeline(
    integrationId: UUID,
    payload: CIPipelinePayload
  ): Promise<WebhookResponse>;
}

class RegistryWebhookHandler implements WebhookHandler {
  async handleRegistryPush(
    integrationId: UUID,
    payload: RegistryPushPayload
  ): Promise<WebhookResponse> {
    // Normalize payload from different registries
    const normalized = this.normalizePayload(payload);

    // Find matching component
    const component = await this.componentRegistry.findByRepository(
      normalized.repository
    );

    if (!component) {
      return {
        success: true,
        action: "ignored",
        reason: "No matching component"
      };
    }

    // Update version map
    await this.versionManager.addVersion({
      componentId: component.id,
      tag: normalized.tag,
      digest: normalized.digest,
      channel: this.determineChannel(normalized.tag)
    });

    // Check for auto-release triggers
    const triggers = await this.getTriggers(component.id, normalized.tag);
    for (const trigger of triggers) {
      await this.triggerRelease(trigger, normalized);
    }

    return {
      success: true,
      action: "processed",
      componentId: component.id,
      versionsAdded: 1,
      triggersActivated: triggers.length
    };
  }

  private normalizePayload(payload: any): NormalizedPushEvent {
    // Detect registry type and normalize
    if (payload.push_data) {
      // Docker Hub format
      return {
        repository: `${payload.repository.namespace}/${payload.repository.name}`,
        tag: payload.push_data.tag,
        digest: payload.push_data.images[0],
        pushedAt: new Date(payload.push_data.pushed_at * 1000)
      };
    }

    if (payload.type === "PUSH_ARTIFACT") {
      // Harbor format
      return {
        repository: payload.event_data.repository.repo_full_name,
        tag: payload.event_data.resources[0].tag,
        digest: payload.event_data.resources[0].digest,
        pushedAt: new Date(payload.occur_at * 1000)
      };
    }

    // Generic format
    return payload as NormalizedPushEvent;
  }
}
```

### Webhook Authentication

```typescript
interface WebhookAuth {
  // Signature validation
  validateSignature(
    payload: Buffer,
    signature: string,
    secret: string,
    algorithm: SignatureAlgorithm
  ): boolean;

  // Token validation
  validateToken(
    token: string,
    expectedToken: string
  ): boolean;
}

type SignatureAlgorithm = "hmac-sha256" | "hmac-sha1";

class WebhookAuthenticator implements WebhookAuth {
  validateSignature(
    payload: Buffer,
    signature: string,
    secret: string,
    algorithm: SignatureAlgorithm
  ): boolean {
    const algo = algorithm === "hmac-sha256" ? "sha256" : "sha1";
    const expected = crypto
      .createHmac(algo, secret)
      .update(payload)
      .digest("hex");

    // Constant-time comparison; timingSafeEqual throws if the buffers
    // differ in length, so reject unequal lengths up front.
    const provided = Buffer.from(signature);
    const computed = Buffer.from(expected);
    if (provided.length !== computed.length) {
      return false;
    }
    return crypto.timingSafeEqual(provided, computed);
  }
}
```

### Webhook Configuration

```typescript
interface WebhookConfig {
  id: UUID;
  integrationId: UUID;
  type: WebhookType;

  // Security
  secretRef: string;             // Vault reference for signature secret
  signatureHeader?: string;      // Header containing signature
  signatureAlgorithm?: SignatureAlgorithm;

  // Processing
  enabled: boolean;
  filters?: WebhookFilter[];     // Filter events

  // Retry
  retryPolicy: RetryPolicy;
}

interface WebhookFilter {
  field: string;                 // JSONPath to field
  operator: "equals" | "contains" | "matches";
  value: string;
}

// Example: Only process tags matching semver
const semverFilter: WebhookFilter = {
  field: "$.tag",
  operator: "matches",
  value: "^v\\d+\\.\\d+\\.\\d+$"
};
```
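A filter like `semverFilter` can be evaluated against an incoming payload. A minimal sketch supporting only the dotted `$.a.b` subset of JSONPath used in the examples; the `resolvePath` helper is an illustrative assumption, not a specified API:

```typescript
interface WebhookFilter {
  field: string;
  operator: "equals" | "contains" | "matches";
  value: string;
}

// Resolve a "$.a.b" style dotted path against a plain object.
function resolvePath(payload: Record<string, unknown>, path: string): unknown {
  return path
    .replace(/^\$\.?/, "")   // strip the leading "$." root marker
    .split(".")
    .filter(Boolean)
    .reduce<unknown>((node, key) => (node as any)?.[key], payload);
}

function matchesFilter(
  payload: Record<string, unknown>,
  filter: WebhookFilter
): boolean {
  const actual = resolvePath(payload, filter.field);
  if (typeof actual !== "string") return false;
  if (filter.operator === "equals") return actual === filter.value;
  if (filter.operator === "contains") return actual.includes(filter.value);
  return new RegExp(filter.value).test(actual); // "matches"
}
```

An event passes the webhook only if every configured filter matches; dropped events would typically be acknowledged with `action: "ignored"`.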

## Outbound Webhooks

### Event Types

| Event | Description | Payload |
|-------|-------------|---------|
| `release.created` | New release created | Release details |
| `promotion.requested` | Promotion requested | Promotion details |
| `promotion.approved` | Promotion approved | Approval details |
| `promotion.rejected` | Promotion rejected | Rejection details |
| `deployment.started` | Deployment started | Job details |
| `deployment.completed` | Deployment completed | Job details, results |
| `deployment.failed` | Deployment failed | Job details, error |
| `rollback.initiated` | Rollback initiated | Rollback details |

### Webhook Subscription

```typescript
interface WebhookSubscription {
  id: UUID;
  tenantId: UUID;
  name: string;

  // Target
  url: string;
  method: "POST" | "PUT";
  headers?: Record<string, string>;

  // Authentication
  authType: "none" | "basic" | "bearer" | "signature";
  credentialRef?: string;
  signatureSecret?: string;

  // Events
  events: string[];            // Event types to subscribe to
  filters?: EventFilter[];     // Filter events

  // Delivery
  retryPolicy: RetryPolicy;
  timeout: number;

  // Status
  enabled: boolean;
  lastDelivery?: DateTime;
  lastStatus?: number;
}

interface EventFilter {
  field: string;
  operator: string;
  value: any;
}
```

### Webhook Delivery

```typescript
interface WebhookPayload {
  id: string;                  // Delivery ID
  timestamp: string;           // ISO-8601
  event: string;               // Event type
  tenantId: string;
  data: Record<string, any>;   // Event-specific data
}

class WebhookDeliveryService {
  async deliver(
    subscription: WebhookSubscription,
    event: DomainEvent
  ): Promise<DeliveryResult> {
    const payload: WebhookPayload = {
      id: uuidv4(),
      timestamp: new Date().toISOString(),
      event: event.type,
      tenantId: subscription.tenantId,
      data: this.buildEventData(event)
    };

    const headers = this.buildHeaders(subscription, payload);
    const body = JSON.stringify(payload);

    // Attempt delivery with retries
    return this.deliverWithRetry(subscription, headers, body);
  }

  private buildHeaders(
    subscription: WebhookSubscription,
    payload: WebhookPayload
  ): Record<string, string> {
    const headers: Record<string, string> = {
      "Content-Type": "application/json",
      "X-Stella-Event": payload.event,
      "X-Stella-Delivery": payload.id,
      "X-Stella-Timestamp": payload.timestamp,
      ...subscription.headers
    };

    // Add signature if configured
    if (subscription.authType === "signature") {
      const signature = this.computeSignature(
        JSON.stringify(payload),
        subscription.signatureSecret!
      );
      headers["X-Stella-Signature"] = signature;
    }

    return headers;
  }

  private async deliverWithRetry(
    subscription: WebhookSubscription,
    headers: Record<string, string>,
    body: string
  ): Promise<DeliveryResult> {
    const policy = subscription.retryPolicy;
    let lastError: Error | undefined;

    for (let attempt = 0; attempt <= policy.maxRetries; attempt++) {
      try {
        const response = await fetch(subscription.url, {
          method: subscription.method,
          headers,
          body,
          signal: AbortSignal.timeout(subscription.timeout)
        });

        // Record delivery
        await this.recordDelivery(subscription.id, {
          attempt,
          statusCode: response.status,
          success: response.ok
        });

        if (response.ok) {
          return { success: true, statusCode: response.status, attempts: attempt + 1 };
        }

        // Non-retryable status codes
        if (response.status >= 400 && response.status < 500) {
          return {
            success: false,
            statusCode: response.status,
            attempts: attempt + 1,
            error: `Client error: ${response.status}`
          };
        }

        lastError = new Error(`Server error: ${response.status}`);
      } catch (error) {
        lastError = error as Error;
      }

      // Wait before retry
      if (attempt < policy.maxRetries) {
        const delay = this.calculateDelay(policy, attempt);
        await sleep(delay);
      }
    }

    return {
      success: false,
      attempts: policy.maxRetries + 1,
      error: lastError?.message
    };
  }
}
```
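`calculateDelay` is referenced above but not shown. A minimal sketch covering the `fixed` and `exponential` backoff types implied by the subscription registration example; the `RetryPolicy` field names (`backoffType`, `backoffSeconds`) follow that example, and the doubling schedule is an assumption:

```typescript
interface RetryPolicy {
  maxRetries: number;
  backoffType: "fixed" | "exponential";
  backoffSeconds: number;
}

// Delay before retrying a failed attempt (0-based), in milliseconds.
// Exponential backoff doubles the base delay on each retry.
function calculateDelay(policy: RetryPolicy, attempt: number): number {
  const baseMs = policy.backoffSeconds * 1000;
  if (policy.backoffType === "exponential") {
    return baseMs * 2 ** attempt;
  }
  return baseMs;
}
```

With `backoffSeconds: 10` and three exponential retries, the waits are 10 s, 20 s, and 40 s; production implementations usually also add jitter and a delay cap.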

### Delivery Logging

```typescript
interface WebhookDeliveryLog {
  id: UUID;
  subscriptionId: UUID;
  deliveryId: string;

  // Request
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string;

  // Response
  statusCode?: number;
  responseBody?: string;
  responseTime: number;

  // Result
  success: boolean;
  attempt: number;
  error?: string;

  // Timing
  createdAt: DateTime;
}
```

## Webhook API

### Register Subscription

```http
POST /api/v1/webhook-subscriptions
Content-Type: application/json

{
  "name": "Deployment Notifications",
  "url": "https://api.example.com/webhooks/stella",
  "method": "POST",
  "authType": "signature",
  "signatureSecret": "my-secret-key",
  "events": [
    "deployment.started",
    "deployment.completed",
    "deployment.failed"
  ],
  "filters": [
    {
      "field": "data.environment.name",
      "operator": "equals",
      "value": "production"
    }
  ],
  "retryPolicy": {
    "maxRetries": 3,
    "backoffType": "exponential",
    "backoffSeconds": 10
  },
  "timeout": 30000
}
```

### Test Subscription

```http
POST /api/v1/webhook-subscriptions/{id}/test
Content-Type: application/json

{
  "event": "deployment.completed"
}
```

Response:

```json
{
  "success": true,
  "data": {
    "deliveryId": "d1234567-...",
    "statusCode": 200,
    "responseTime": 245,
    "response": "OK"
  }
}
```

### List Deliveries

```http
GET /api/v1/webhook-subscriptions/{id}/deliveries?page=1&pageSize=20
```

## Event Payloads

### deployment.completed

```json
{
  "id": "delivery-uuid",
  "timestamp": "2026-01-09T10:30:00Z",
  "event": "deployment.completed",
  "tenantId": "tenant-uuid",
  "data": {
    "deploymentJob": {
      "id": "job-uuid",
      "status": "completed"
    },
    "release": {
      "id": "release-uuid",
      "name": "myapp-v1.2.0",
      "components": [
        {
          "name": "api",
          "digest": "sha256:abc123..."
        }
      ]
    },
    "environment": {
      "id": "env-uuid",
      "name": "production"
    },
    "promotion": {
      "id": "promo-uuid",
      "requestedBy": "user@example.com"
    },
    "targets": [
      {
        "id": "target-uuid",
        "name": "prod-host-1",
        "status": "succeeded"
      }
    ],
    "timing": {
      "startedAt": "2026-01-09T10:25:00Z",
      "completedAt": "2026-01-09T10:30:00Z",
      "durationSeconds": 300
    }
  }
}
```

### promotion.requested

```json
{
  "id": "delivery-uuid",
  "timestamp": "2026-01-09T10:00:00Z",
  "event": "promotion.requested",
  "tenantId": "tenant-uuid",
  "data": {
    "promotion": {
      "id": "promo-uuid",
      "status": "pending_approval"
    },
    "release": {
      "id": "release-uuid",
      "name": "myapp-v1.2.0"
    },
    "sourceEnvironment": {
      "id": "staging-uuid",
      "name": "staging"
    },
    "targetEnvironment": {
      "id": "prod-uuid",
      "name": "production"
    },
    "requestedBy": {
      "id": "user-uuid",
      "email": "user@example.com",
      "name": "John Doe"
    },
    "approvalRequired": {
      "count": 2,
      "currentApprovals": 0
    }
  }
}
```

## Security Considerations

### Signature Verification

Receivers should verify webhook signatures:

```python
import hmac
import hashlib

def verify_signature(payload: bytes, signature: str, secret: str) -> bool:
    expected = hmac.new(
        secret.encode(),
        payload,
        hashlib.sha256
    ).hexdigest()

    return hmac.compare_digest(signature, expected)

# In webhook handler
@app.route("/webhooks/stella", methods=["POST"])
def handle_webhook():
    # Default to "" so a missing header fails verification instead of raising
    signature = request.headers.get("X-Stella-Signature", "")
    if not verify_signature(request.data, signature, WEBHOOK_SECRET):
        return "Invalid signature", 401

    payload = request.json
    # Process event...
```

### IP Allowlisting

Configure firewall rules to only accept webhooks from Stella IP ranges:

- Document IP ranges in deployment configuration
- Use VPN or private networking where possible

### Replay Protection

Check delivery timestamps to prevent replay attacks:

```python
from datetime import datetime, timedelta

MAX_TIMESTAMP_AGE = timedelta(minutes=5)

def check_timestamp(timestamp_str: str) -> bool:
    timestamp = datetime.fromisoformat(timestamp_str.replace("Z", "+00:00"))
    now = datetime.now(timestamp.tzinfo)
    return abs(now - timestamp) < MAX_TIMESTAMP_AGE
```

## References

- [Integrations Overview](overview.md)
- [Connectors](connectors.md)
- [CI/CD Integration](ci-cd.md)

597
docs/modules/release-orchestrator/modules/agents.md
Normal file
@@ -0,0 +1,597 @@

# AGENTS: Deployment Agents

**Purpose**: Lightweight deployment agents for target execution.

## Agent Types

| Agent Type | Transport | Target Types |
|------------|-----------|--------------|
| `agent-docker` | gRPC | Docker hosts |
| `agent-compose` | gRPC | Docker Compose hosts |
| `agent-ssh` | SSH | Linux remote hosts |
| `agent-winrm` | WinRM | Windows remote hosts |
| `agent-ecs` | AWS API | AWS ECS services |
| `agent-nomad` | Nomad API | HashiCorp Nomad jobs |

## Modules

### Module: `agent-core`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Shared agent runtime; task execution framework |
| **Protocol** | gRPC for communication with Stella Core |
| **Security** | mTLS authentication; short-lived JWT for tasks |

**Agent Lifecycle**:

1. Agent starts with registration token
2. Agent registers with capabilities and labels
3. Agent sends heartbeats (default: 30s interval)
4. Agent receives tasks from Stella Core
5. Agent reports task completion/failure

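On the Core side, the heartbeat stream doubles as a liveness signal: an agent that misses several consecutive beats can be marked offline and excluded from task scheduling. A minimal sketch; the three-missed-beats threshold is an assumption, not part of the spec:

```typescript
interface AgentRecord {
  id: string;
  lastHeartbeatAt: number; // epoch ms of the last heartbeat received
}

const HEARTBEAT_INTERVAL_MS = 30_000;            // default from the lifecycle above
const MISSED_BEATS_BEFORE_OFFLINE = 3;           // assumption: 3 missed beats => offline

// An agent is considered offline once the time since its last heartbeat
// exceeds several full heartbeat intervals.
function isOffline(agent: AgentRecord, nowMs: number): boolean {
  return nowMs - agent.lastHeartbeatAt >
    HEARTBEAT_INTERVAL_MS * MISSED_BEATS_BEFORE_OFFLINE;
}
```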
**Agent Task Protocol**:

```typescript
// Task assignment (Core → Agent)
interface AgentTask {
  id: UUID;
  type: TaskType;
  targetId: UUID;
  payload: TaskPayload;
  credentials: EncryptedCredentials;
  timeout: number;
  priority: TaskPriority;
  idempotencyKey: string;
  assignedAt: DateTime;
  expiresAt: DateTime;
}

type TaskType =
  | "deploy"
  | "rollback"
  | "health-check"
  | "inspect"
  | "execute-command"
  | "upload-files"
  | "write-sticker"
  | "read-sticker";

interface DeployTaskPayload {
  image: string;
  digest: string;
  config: DeployConfig;
  artifacts: ArtifactReference[];
  previousDigest?: string;
  hooks: {
    preDeploy?: HookConfig;
    postDeploy?: HookConfig;
  };
}

// Task result (Agent → Core)
interface TaskResult {
  taskId: UUID;
  success: boolean;
  startedAt: DateTime;
  completedAt: DateTime;

  // Success details
  outputs?: Record<string, any>;
  artifacts?: ArtifactReference[];

  // Failure details
  error?: string;
  errorType?: string;
  retriable?: boolean;

  // Logs
  logs: string;

  // Metrics
  metrics: {
    pullDurationMs?: number;
    deployDurationMs?: number;
    healthCheckDurationMs?: number;
  };
}
```
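The `idempotencyKey` lets an agent safely ignore re-delivered task assignments (e.g. after a gRPC reconnect). A minimal dedup sketch, assuming in-memory tracking; a real agent would persist completed keys and expire old entries:

```typescript
class TaskDeduplicator {
  // idempotencyKey -> whether the task finished
  private seen = new Map<string, boolean>();

  // True if the task should be executed; false if it is a re-delivery
  // of a task that is already in flight or already completed.
  shouldExecute(idempotencyKey: string): boolean {
    if (this.seen.has(idempotencyKey)) return false;
    this.seen.set(idempotencyKey, false); // mark in flight
    return true;
  }

  markCompleted(idempotencyKey: string): void {
    this.seen.set(idempotencyKey, true);
  }
}
```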

---

### Module: `agent-docker`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Docker container deployment |
| **Dependencies** | Docker Engine API |
| **Capabilities** | `docker.deploy`, `docker.rollback`, `docker.inspect` |

**Docker Agent Implementation**:

```typescript
class DockerAgent implements TargetExecutor {
  private docker: Docker;

  async deploy(task: DeployTaskPayload): Promise<DeployResult> {
    const { image, digest, config, previousDigest } = task;
    const containerName = config.containerName;

    // 1. Pull image and verify digest
    this.log(`Pulling image ${image}@${digest}`);
    await this.docker.pull(image, { digest });

    const pulledDigest = await this.getImageDigest(image);
    if (pulledDigest !== digest) {
      throw new DigestMismatchError(
        `Expected digest ${digest}, got ${pulledDigest}. Possible tampering detected.`
      );
    }

    // 2. Run pre-deploy hook
    if (task.hooks?.preDeploy) {
      await this.runHook(task.hooks.preDeploy, "pre-deploy");
    }

    // 3. Stop and rename existing container
    const existingContainer = await this.findContainer(containerName);
    if (existingContainer) {
      this.log(`Stopping existing container ${containerName}`);
      await existingContainer.stop({ t: 10 });
      await existingContainer.rename(`${containerName}-previous-${Date.now()}`);
    }

    // 4. Create new container
    this.log(`Creating container ${containerName} from ${image}@${digest}`);
    const container = await this.docker.createContainer({
      name: containerName,
      Image: `${image}@${digest}`, // Always use digest, not tag
      Env: this.buildEnvVars(config.environment),
      HostConfig: {
        PortBindings: this.buildPortBindings(config.ports),
        Binds: this.buildBindMounts(config.volumes),
        RestartPolicy: { Name: config.restartPolicy || "unless-stopped" },
        Memory: config.memoryLimit,
        CpuQuota: config.cpuLimit,
      },
      Labels: {
        "stella.release.id": config.releaseId,
        "stella.release.name": config.releaseName,
        "stella.digest": digest,
        "stella.deployed.at": new Date().toISOString(),
      },
    });

    // 5. Start container
    this.log(`Starting container ${containerName}`);
    await container.start();

    // 6. Wait for container to be healthy
    if (config.healthCheck) {
      this.log(`Waiting for container health check`);
      const healthy = await this.waitForHealthy(container, config.healthCheck.timeout);
      if (!healthy) {
        await this.rollbackContainer(containerName, existingContainer);
        throw new HealthCheckFailedError(`Container ${containerName} failed health check`);
      }
    }

    // 7. Run post-deploy hook
    if (task.hooks?.postDeploy) {
      await this.runHook(task.hooks.postDeploy, "post-deploy");
    }

    // 8. Cleanup previous container
    if (existingContainer && config.cleanupPrevious !== false) {
      this.log(`Removing previous container`);
      await existingContainer.remove({ force: true });
    }

    return {
      success: true,
      containerId: container.id,
      previousDigest: previousDigest,
    };
  }

  async rollback(task: RollbackTaskPayload): Promise<DeployResult> {
    const { containerName, targetDigest } = task;

    if (targetDigest) {
      // Deploy specific digest
      return this.deploy({ ...task, digest: targetDigest });
    }

    // Find and restore previous container
    const previousContainer = await this.findContainer(`${containerName}-previous-*`);
    if (!previousContainer) {
      throw new RollbackError(`No previous container found for ${containerName}`);
    }

    const currentContainer = await this.findContainer(containerName);
    if (currentContainer) {
      await currentContainer.stop({ t: 10 });
      await currentContainer.rename(`${containerName}-failed-${Date.now()}`);
    }

    await previousContainer.rename(containerName);
    await previousContainer.start();

    return { success: true, containerId: previousContainer.id };
  }

  async writeSticker(sticker: VersionSticker): Promise<void> {
    const stickerPath = this.config.stickerPath || "/var/stella/version.json";
    const stickerContent = JSON.stringify(sticker, null, 2);

    if (this.config.stickerLocation === "volume") {
      await this.docker.run("alpine", [
        "sh", "-c",
        `echo '${stickerContent}' > ${stickerPath}`
      ], {
        HostConfig: { Binds: [`${this.config.stickerVolume}:/var/stella`] }
      });
    } else {
      fs.writeFileSync(stickerPath, stickerContent);
    }
  }
}
```

---

### Module: `agent-compose`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Docker Compose stack deployment |
| **Dependencies** | Docker Compose CLI |
| **Capabilities** | `compose.deploy`, `compose.rollback`, `compose.inspect` |

**Compose Agent Implementation**:

```typescript
class ComposeAgent implements TargetExecutor {
  async deploy(task: DeployTaskPayload): Promise<DeployResult> {
    const { artifacts, config } = task;
    const deployDir = config.deploymentDirectory;

    // 1. Write compose lock file
    const composeLock = artifacts.find(a => a.type === "compose_lock");
    const composeContent = await this.fetchArtifact(composeLock);
    const composePath = path.join(deployDir, "compose.stella.lock.yml");
    await fs.writeFile(composePath, composeContent);

    // 2. Run pre-deploy hook
    if (task.hooks?.preDeploy) {
      await this.runHook(task.hooks.preDeploy, deployDir);
    }

    // 3. Pull images
    this.log("Pulling images...");
    await this.runCompose(deployDir, ["pull"]);

    // 4. Verify digests
    await this.verifyDigests(composePath, config.expectedDigests);

    // 5. Deploy
    this.log("Deploying services...");
    await this.runCompose(deployDir, ["up", "-d", "--remove-orphans", "--force-recreate"]);

    // 6. Wait for services to be healthy
    if (config.healthCheck) {
      const healthy = await this.waitForServicesHealthy(deployDir, config.healthCheck.timeout);
      if (!healthy) {
        await this.rollbackToBackup(deployDir);
        throw new HealthCheckFailedError("Services failed health check");
      }
    }

    // 7. Run post-deploy hook
    if (task.hooks?.postDeploy) {
      await this.runHook(task.hooks.postDeploy, deployDir);
    }

    // 8. Write version sticker
    await this.writeSticker(config.sticker, deployDir);

    return { success: true };
  }

  private async verifyDigests(
    composePath: string,
    expectedDigests: Record<string, string>
  ): Promise<void> {
    const composeContent = yaml.parse(await fs.readFile(composePath, "utf-8"));

    for (const [service, expectedDigest] of Object.entries(expectedDigests)) {
      const serviceConfig = composeContent.services[service];
      if (!serviceConfig) {
        throw new Error(`Service ${service} not found in compose file`);
      }

      const image = serviceConfig.image;
      if (!image.includes("@sha256:")) {
        throw new Error(`Service ${service} image not pinned to digest: ${image}`);
      }

      const actualDigest = image.split("@")[1];
      if (actualDigest !== expectedDigest) {
        throw new DigestMismatchError(
          `Service ${service}: expected ${expectedDigest}, got ${actualDigest}`
        );
      }
    }
  }
}
```

---

### Module: `agent-ssh`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | SSH remote execution (agentless) |
| **Dependencies** | SSH client library |
| **Capabilities** | `ssh.deploy`, `ssh.execute`, `ssh.upload` |

**SSH Remote Executor**:

```typescript
class SSHRemoteExecutor implements TargetExecutor {
  async connect(config: SSHConnectionConfig): Promise<void> {
    const privateKey = await this.secrets.getSecret(config.privateKeyRef);

    this.ssh = new SSHClient();
    await this.ssh.connect({
      host: config.host,
      port: config.port || 22,
      username: config.username,
      privateKey: privateKey.value,
      readyTimeout: config.connectionTimeout || 30000,
    });
  }

  async deploy(task: DeployTaskPayload): Promise<DeployResult> {
    const { artifacts, config } = task;
    const deployDir = config.deploymentDirectory;

    try {
      // 1. Ensure deployment directory exists
      await this.exec(`mkdir -p ${deployDir}`);
      await this.exec(`mkdir -p ${deployDir}/.stella-backup`);

      // 2. Backup current deployment
      await this.exec(`cp -r ${deployDir}/* ${deployDir}/.stella-backup/ 2>/dev/null || true`);

      // 3. Upload artifacts
      for (const artifact of artifacts) {
        const content = await this.fetchArtifact(artifact);
        const remotePath = path.join(deployDir, artifact.name);
        await this.uploadFile(content, remotePath);
      }

      // 4. Run pre-deploy hook
      if (task.hooks?.preDeploy) {
        await this.runRemoteHook(task.hooks.preDeploy, deployDir);
      }

      // 5. Execute deployment script
      const deployScript = artifacts.find(a => a.type === "deploy_script");
      if (deployScript) {
        const scriptPath = path.join(deployDir, deployScript.name);
        await this.exec(`chmod +x ${scriptPath}`);
        const result = await this.exec(scriptPath, { cwd: deployDir, timeout: config.deploymentTimeout });
        if (result.exitCode !== 0) {
          throw new DeploymentError(`Deploy script failed: ${result.stderr}`);
        }
      }

      // 6. Run post-deploy hook
      if (task.hooks?.postDeploy) {
        await this.runRemoteHook(task.hooks.postDeploy, deployDir);
      }

      // 7. Health check
      if (config.healthCheck) {
        const healthy = await this.runHealthCheck(config.healthCheck);
        if (!healthy) {
          await this.rollback(task);
          throw new HealthCheckFailedError("Health check failed");
        }
      }

      // 8. Write version sticker
      await this.writeSticker(config.sticker, deployDir);

      // 9. Cleanup backup
      await this.exec(`rm -rf ${deployDir}/.stella-backup`);

      return { success: true };
    } finally {
      this.ssh.end();
    }
  }
}
```

---
|
||||
|
||||
### Module: `agent-winrm`
|
||||
|
||||
| Aspect | Specification |
|
||||
|--------|---------------|
|
||||
| **Responsibility** | WinRM remote execution (agentless) |
|
||||
| **Dependencies** | WinRM client library |
|
||||
| **Capabilities** | `winrm.deploy`, `winrm.execute`, `winrm.upload` |
|
||||
| **Authentication** | NTLM, Kerberos, Basic |
|
||||
|
||||
---
|
||||
|
||||
### Module: `agent-ecs`
|
||||
|
||||
| Aspect | Specification |
|
||||
|--------|---------------|
|
||||
| **Responsibility** | AWS ECS service deployment |
|
||||
| **Dependencies** | AWS SDK |
|
||||
| **Capabilities** | `ecs.deploy`, `ecs.rollback`, `ecs.inspect` |
|
||||
|
||||
---
|
||||
|
||||
### Module: `agent-nomad`
|
||||
|
||||
| Aspect | Specification |
|
||||
|--------|---------------|
|
||||
| **Responsibility** | HashiCorp Nomad job deployment |
|
||||
| **Dependencies** | Nomad API client |
|
||||
| **Capabilities** | `nomad.deploy`, `nomad.rollback`, `nomad.inspect` |
|
||||
|
||||
---
|
||||
|
||||
## Agent Security Model
|
||||
|
||||
### Registration Flow
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
│ AGENT REGISTRATION FLOW │
|
||||
│ │
|
||||
│ 1. Admin generates registration token (one-time use) │
|
||||
│ POST /api/v1/admin/agent-tokens │
|
||||
│ → { token: "reg_xxx", expiresAt: "..." } │
|
||||
│ │
|
||||
│ 2. Agent starts with registration token │
|
||||
│ ./stella-agent --register --token=reg_xxx │
|
||||
│ │
|
||||
│ 3. Agent requests mTLS certificate │
|
||||
│ POST /api/v1/agents/register │
|
||||
│ Headers: X-Registration-Token: reg_xxx │
|
||||
│ Body: { name, version, capabilities, csr } │
|
||||
│ → { agentId, certificate, caCertificate } │
|
||||
│ │
|
||||
│ 4. Agent establishes mTLS connection │
|
||||
│ Uses issued certificate for all subsequent requests │
|
||||
│ │
|
||||
│ 5. Agent requests short-lived JWT for task execution │
|
||||
│ POST /api/v1/agents/token (over mTLS) │
|
||||
│ → { token, expiresIn: 3600 } // 1 hour │
|
||||
│ │
|
||||
│ 6. Agent refreshes token before expiration │
|
||||
│ Token refresh only over mTLS connection │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
```
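
Step 6 above says the agent refreshes its short-lived JWT before it expires. A minimal sketch of the scheduling arithmetic, assuming the agent refreshes at a fixed fraction of the token lifetime (the 80% default here is an illustrative choice, not part of the spec):

```typescript
// Compute when the agent should request a fresh task JWT.
// issuedAt: when the current token was issued; expiresInSec matches the
// expiresIn field returned by POST /api/v1/agents/token.
// Refreshing well before expiry leaves headroom for clock skew and retries.
function nextRefreshAt(issuedAt: Date, expiresInSec: number, fraction = 0.8): Date {
  if (expiresInSec <= 0) throw new Error("expiresIn must be positive");
  return new Date(issuedAt.getTime() + expiresInSec * fraction * 1000);
}
```

With the documented `expiresIn: 3600`, an agent using this helper would refresh 48 minutes after issuance, 12 minutes before expiry.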

### Communication Security

```
AGENT COMMUNICATION SECURITY

Agent ◄───────────────────────────► Stella Core

mTLS (mutual TLS)
  - Agent cert signed by Stella CA
  - Server cert verified by Agent
  - TLS 1.3 only
  - Perfect forward secrecy

Encrypted payload
  - Task payloads encrypted with agent-specific key
  - Logs encrypted in transit

Heartbeat + capability refresh  (Agent ──► Core)
  - Every 30 seconds
  - Signed with agent key

Task assignment  (Core ──► Agent)
  - Contains short-lived credentials
  - Scoped to specific target
  - Expires after task timeout
```

---

## Database Schema

```sql
-- Agents
CREATE TABLE release.agents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    version VARCHAR(50) NOT NULL,
    capabilities JSONB NOT NULL DEFAULT '[]',
    labels JSONB NOT NULL DEFAULT '{}',
    status VARCHAR(50) NOT NULL DEFAULT 'offline' CHECK (status IN (
        'online', 'offline', 'degraded'
    )),
    last_heartbeat TIMESTAMPTZ,
    resource_usage JSONB,
    certificate_fingerprint VARCHAR(64),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, name)
);

CREATE INDEX idx_agents_tenant ON release.agents(tenant_id);
CREATE INDEX idx_agents_status ON release.agents(status);
CREATE INDEX idx_agents_capabilities ON release.agents USING GIN (capabilities);
```

---

## API Endpoints

```yaml
# Agent Registration
POST /api/v1/agents/register
  Headers: X-Registration-Token: {token}
  Body: { name, version, capabilities, csr }
  Response: { agentId, certificate, caCertificate }

# Agent Management
GET /api/v1/agents
  Query: ?status={online|offline|degraded}&capability={type}
  Response: Agent[]

GET /api/v1/agents/{id}
  Response: Agent

PUT /api/v1/agents/{id}
  Body: { labels?, capabilities? }
  Response: Agent

DELETE /api/v1/agents/{id}
  Response: { deleted: true }

# Agent Communication
POST /api/v1/agents/{id}/heartbeat
  Body: { status, resourceUsage, capabilities }
  Response: { tasks: AgentTask[] }

POST /api/v1/agents/{id}/tasks/{taskId}/complete
  Body: { success, result, logs }
  Response: { acknowledged: true }

# WebSocket for real-time task stream
WS /api/v1/agents/{id}/task-stream
  Messages:
    - { type: "task_assigned", task: AgentTask }
    - { type: "task_cancelled", taskId }
```

---

## References

- [Module Overview](overview.md)
- [Deploy Orchestrator](deploy-orchestrator.md)
- [Agent Security](../security/agent-security.md)
- [API Documentation](../api/agents.md)

477
docs/modules/release-orchestrator/modules/deploy-orchestrator.md
Normal file
@@ -0,0 +1,477 @@

# DEPLOY: Deployment Execution

**Purpose**: Orchestrate deployment jobs, execute on targets, manage rollbacks, and generate artifacts.

## Modules

### Module: `deploy-orchestrator`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Deployment job coordination; strategy execution |
| **Dependencies** | `target-executor`, `artifact-generator`, `agent-manager` |
| **Data Entities** | `DeploymentJob`, `DeploymentTask` |
| **Events Produced** | `deployment.started`, `deployment.task_started`, `deployment.task_completed`, `deployment.completed`, `deployment.failed` |

**Deployment Job Entity**:
```typescript
interface DeploymentJob {
  id: UUID;
  tenantId: UUID;
  promotionId: UUID;
  releaseId: UUID;
  environmentId: UUID;
  status: DeploymentStatus;
  strategy: DeploymentStrategy;
  startedAt: DateTime | null;    // null until the job starts (matches nullable column)
  completedAt: DateTime | null;  // null until the job finishes
  artifacts: GeneratedArtifact[];
  rollbackOf: UUID | null;       // If this is a rollback job
  tasks: DeploymentTask[];
}

type DeploymentStatus =
  | "pending"       // Waiting to start
  | "running"       // Deployment in progress
  | "succeeded"     // All tasks succeeded
  | "failed"        // One or more tasks failed
  | "cancelled"     // User cancelled
  | "rolling_back"  // Rollback in progress
  | "rolled_back";  // Rollback complete

interface DeploymentTask {
  id: UUID;
  jobId: UUID;
  targetId: UUID;
  digest: string;
  status: TaskStatus;
  agentId: UUID | null;
  startedAt: DateTime | null;
  completedAt: DateTime | null;
  exitCode: number | null;
  logs: string;
  previousDigest: string | null;
  stickerWritten: boolean;
}

type TaskStatus =
  | "pending"
  | "running"
  | "succeeded"
  | "failed"
  | "cancelled"
  | "skipped";
```
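
The status comments above imply a fixed set of legal transitions. A minimal sketch of a transition guard; the transition table here is inferred from the status descriptions and is an assumption, not part of the spec:

```typescript
type DeploymentStatus =
  | "pending" | "running" | "succeeded" | "failed"
  | "cancelled" | "rolling_back" | "rolled_back";

// Hypothetical transition table, derived from the status comments:
// a job only rolls back after it finished or failed, and terminal
// states (cancelled, rolled_back) admit no further transitions.
const allowed: Record<DeploymentStatus, DeploymentStatus[]> = {
  pending:      ["running", "cancelled"],
  running:      ["succeeded", "failed", "cancelled"],
  succeeded:    ["rolling_back"],
  failed:       ["rolling_back"],
  cancelled:    [],
  rolling_back: ["rolled_back", "failed"],
  rolled_back:  [],
};

function canTransition(from: DeploymentStatus, to: DeploymentStatus): boolean {
  return allowed[from].includes(to);
}
```

A guard like this would typically run inside the orchestrator's state update path, rejecting out-of-order events from agents.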

---

### Module: `target-executor`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Target-specific deployment logic |
| **Dependencies** | `agent-manager`, `connector-runtime` |
| **Protocol** | gRPC for agents, SSH/WinRM for agentless |

**Executor Types**:

| Type | Transport | Use Case |
|------|-----------|----------|
| `agent-docker` | gRPC | Docker hosts with agent |
| `agent-compose` | gRPC | Compose hosts with agent |
| `ssh-remote` | SSH | Agentless Linux hosts |
| `winrm-remote` | WinRM | Agentless Windows hosts |
| `ecs-api` | AWS API | AWS ECS services |
| `nomad-api` | Nomad API | HashiCorp Nomad jobs |

---

### Module: `runner-executor`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Script/hook execution in sandbox |
| **Dependencies** | `plugin-sandbox` |
| **Supported Scripts** | C# (.csx), Bash, PowerShell |

**Hook Types**:
- `pre-deploy`: Run before deployment starts
- `post-deploy`: Run after deployment succeeds
- `on-failure`: Run when deployment fails
- `on-rollback`: Run during rollback

---

### Module: `artifact-generator`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Generate immutable deployment artifacts |
| **Dependencies** | `release-manager`, `environment-manager` |
| **Data Entities** | `GeneratedArtifact`, `ComposeLock`, `VersionSticker` |

**Generated Artifacts**:

| Artifact Type | Description |
|---------------|-------------|
| `compose_lock` | `compose.stella.lock.yml` - Pinned digests |
| `script` | Compiled deployment script |
| `sticker` | `stella.version.json` - Version marker |
| `evidence` | Decision and execution evidence |
| `config` | Environment-specific config files |

**Compose Lock File Generation**:
```typescript
class ComposeLockGenerator {
  async generate(
    release: Release,
    environment: Environment,
    targets: Target[]
  ): Promise<GeneratedArtifact> {
    const services: Record<string, any> = {};

    for (const component of release.components) {
      services[component.componentName] = {
        // CRITICAL: Always use digest, never tag
        image: `${component.imageRepository}@${component.digest}`,

        // Environment variables
        environment: this.mergeEnvironment(
          environment.config.variables,
          this.buildStellaEnv(release, environment)
        ),

        // Labels for Stella tracking
        labels: {
          "stella.release.id": release.id,
          "stella.release.name": release.name,
          "stella.component.name": component.componentName,
          "stella.component.digest": component.digest,
          "stella.environment": environment.name,
          "stella.deployed.at": new Date().toISOString(),
        },
      };
    }

    const composeLock = {
      version: "3.8",
      services,
      "x-stella": {
        release_id: release.id,
        release_name: release.name,
        environment: environment.name,
        generated_at: new Date().toISOString(),
        inputs_hash: this.computeInputsHash(release, environment),
        components: release.components.map(c => ({
          name: c.componentName,
          digest: c.digest,
          semver: c.semver,
        })),
      },
    };

    const content = yaml.stringify(composeLock);
    const hash = crypto.createHash("sha256").update(content).digest("hex");

    return {
      type: "compose_lock",
      name: "compose.stella.lock.yml",
      content: Buffer.from(content),
      contentHash: `sha256:${hash}`,
    };
  }
}
```

**Version Sticker Generation**:
```typescript
interface VersionSticker {
  stella_version: "1.0";
  release_id: UUID;
  release_name: string;
  components: Array<{
    name: string;
    digest: string;
    semver: string;
    tag: string;
    image_repository: string;
  }>;
  environment: string;
  environment_id: UUID;
  deployed_at: string;
  deployed_by: UUID;
  promotion_id: UUID;
  workflow_run_id: UUID;
  evidence_packet_id: UUID;
  evidence_packet_hash: string;
  orchestrator_version: string;
  source_ref?: {
    commit_sha: string;
    branch: string;
    repository: string;
  };
}
```
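
During drift checks the sticker has to be read back off the target and validated. A sketch of a parse-and-verify helper, assuming the sticker is stored as JSON text with the field names from the interface above; `StickerSummary` and `summarizeSticker` are illustrative names, not spec'd APIs:

```typescript
interface StickerSummary {
  releaseId: string;
  environment: string;
  digests: string[];
}

// Parse a stella.version.json payload and extract the fields drift
// detection compares against expected state. Throws on malformed input
// so callers can flag a missing/corrupt sticker instead of silently
// treating the target as in sync.
function summarizeSticker(json: string): StickerSummary {
  const sticker = JSON.parse(json);
  if (sticker.stella_version !== "1.0") {
    throw new Error(`Unsupported sticker version: ${sticker.stella_version}`);
  }
  if (!sticker.release_id || !Array.isArray(sticker.components)) {
    throw new Error("Malformed sticker: missing release_id or components");
  }
  return {
    releaseId: sticker.release_id,
    environment: sticker.environment,
    digests: sticker.components.map((c: any) => c.digest),
  };
}
```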

---

### Module: `rollback-manager`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Rollback orchestration; previous state recovery |
| **Dependencies** | `deploy-orchestrator`, `target-registry` |

**Rollback Strategies**:

| Strategy | Description |
|----------|-------------|
| `to-previous` | Roll back to last successful deployment |
| `to-release` | Roll back to specific release ID |
| `to-sticker` | Roll back to version in sticker on target |

**Rollback Flow**:
1. Identify rollback target (previous release or specified)
2. Create rollback deployment job
3. Execute deployment with rollback artifacts
4. Update target state and sticker
5. Record rollback evidence
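
Step 1 of the flow can be sketched as a dispatch over the three strategies. The helper below is illustrative: `previousRelease` and `stickerRelease` stand in for lookups the caller would already have done (last successful deployment, sticker read from the target), and are assumptions rather than spec'd APIs:

```typescript
type RollbackStrategy = "to-previous" | "to-release" | "to-sticker";

interface RollbackRequest {
  strategy: RollbackStrategy;
  targetReleaseId?: string; // required for "to-release"
}

// Resolve which release ID a rollback job should deploy, failing fast
// when the strategy's precondition is not met.
function resolveRollbackRelease(
  req: RollbackRequest,
  previousRelease: string | null,
  stickerRelease: string | null
): string {
  switch (req.strategy) {
    case "to-previous":
      if (!previousRelease) throw new Error("No previous successful deployment");
      return previousRelease;
    case "to-release":
      if (!req.targetReleaseId) throw new Error("targetReleaseId required for to-release");
      return req.targetReleaseId;
    case "to-sticker":
      if (!stickerRelease) throw new Error("No version sticker found on target");
      return stickerRelease;
  }
}
```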

---

## Deployment Strategies

### All-at-Once
Deploy to all targets simultaneously.

```typescript
interface AllAtOnceConfig {
  parallelism: number;         // Max concurrent deployments (0 = unlimited)
  continueOnFailure: boolean;  // Continue if some targets fail
  failureThreshold: number;    // Max failures before abort
}
```
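
"Simultaneously" is still bounded by `parallelism`. A sketch of the worker-pool semantics, simplified to ignore `continueOnFailure` and with `deployOne` as a caller-supplied function (an assumption for illustration):

```typescript
// Deploy all targets with at most `parallelism` in flight; abort new work
// once more than `failureThreshold` targets have failed.
async function allAtOnce<T>(
  targets: T[],
  deployOne: (t: T) => Promise<boolean>,
  parallelism: number,
  failureThreshold: number
): Promise<{ succeeded: number; failed: number; aborted: boolean }> {
  const limit = parallelism === 0 ? targets.length : parallelism;
  let succeeded = 0, failed = 0, aborted = false, next = 0;

  async function worker(): Promise<void> {
    while (next < targets.length && !aborted) {
      const target = targets[next++];
      const ok = await deployOne(target).catch(() => false);
      if (ok) succeeded++;
      else if (++failed > failureThreshold) aborted = true;
    }
  }

  // Spawn the worker pool and wait for all workers to drain the queue.
  await Promise.all(
    Array.from({ length: Math.min(limit, targets.length) }, worker)
  );
  return { succeeded, failed, aborted };
}
```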

### Rolling
Deploy to targets sequentially with health checks.

```typescript
interface RollingConfig {
  batchSize: number;   // Targets per batch
  batchDelay: number;  // Seconds between batches
  healthCheckBetweenBatches: boolean;
  rollbackOnFailure: boolean;
  maxUnavailable: number;  // Max targets unavailable at once
}
```

### Canary
Deploy to subset, verify, then proceed.

```typescript
interface CanaryConfig {
  canaryTargets: number;    // Number or percentage for canary
  canaryDuration: number;   // Seconds to run canary
  healthThreshold: number;  // Required health percentage
  autoPromote: boolean;     // Auto-proceed if healthy
  requireApproval: boolean; // Require manual approval
}
```
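
The promote/hold decision after the canary window combines three of these fields. A sketch of that decision, assuming `healthyPercent` is a measurement the caller aggregated over `canaryDuration`:

```typescript
interface CanaryVerdict {
  promote: boolean;
  reason: string;
}

// Decide whether to proceed past the canary phase, per CanaryConfig:
// health gate first, then manual approval, then auto-promotion.
function evaluateCanary(
  healthyPercent: number,
  cfg: { healthThreshold: number; autoPromote: boolean; requireApproval: boolean }
): CanaryVerdict {
  if (healthyPercent < cfg.healthThreshold) {
    return {
      promote: false,
      reason: `health ${healthyPercent}% below threshold ${cfg.healthThreshold}%`,
    };
  }
  if (cfg.requireApproval) {
    return { promote: false, reason: "healthy; awaiting manual approval" };
  }
  return {
    promote: cfg.autoPromote,
    reason: cfg.autoPromote ? "auto-promoted" : "healthy; promotion left to operator",
  };
}
```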

### Blue-Green
Deploy to B, switch traffic, retire A.

```typescript
interface BlueGreenConfig {
  targetGroupA: UUID;  // Current (blue) target group
  targetGroupB: UUID;  // New (green) target group
  trafficShiftType: "instant" | "gradual";
  gradualShiftSteps?: number[];  // e.g., [10, 25, 50, 100]
  rollbackOnHealthFailure: boolean;
}
```
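
A gradual shift walks `gradualShiftSteps` as the percentage of traffic on the green group, with blue receiving the remainder. A sketch of expanding the config into concrete weight pairs (the validation that steps must end at 100 is an assumption):

```typescript
// Expand a BlueGreenConfig traffic plan into (green%, blue%) steps.
// "instant" is treated as a single 100% step.
function trafficSteps(
  shiftType: "instant" | "gradual",
  gradualShiftSteps?: number[]
): Array<{ green: number; blue: number }> {
  const steps = shiftType === "instant" ? [100] : (gradualShiftSteps ?? []);
  if (steps.length === 0 || steps[steps.length - 1] !== 100) {
    throw new Error("gradual shift must end at 100% green");
  }
  return steps.map((green) => ({ green, blue: 100 - green }));
}
```

A router plugin would apply each step in order, running health checks between steps and reversing the weights if `rollbackOnHealthFailure` is set.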

---

## Rolling Deployment Algorithm

```python
import time


class RollingDeploymentExecutor:
    def execute(self, job: DeploymentJob, config: RollingConfig) -> DeploymentResult:
        targets = self.get_targets(job.environment_id)
        batches = self.create_batches(targets, config.batch_size)

        deployed_targets = []
        failed_targets = []

        for batch_index, batch in enumerate(batches):
            self.log(f"Starting batch {batch_index + 1} of {len(batches)}")

            # Deploy batch in parallel
            batch_results = self.deploy_batch(job, batch)

            batch_succeeded = []
            for target, result in batch_results:
                if result.success:
                    deployed_targets.append(target)
                    batch_succeeded.append(target)
                    # Write version sticker
                    self.write_sticker(target, job.release)
                else:
                    failed_targets.append(target)

                    if config.rollback_on_failure:
                        # Rollback all deployed targets
                        self.rollback_targets(deployed_targets, job.previous_release)
                        return DeploymentResult(
                            success=False,
                            error=f"Batch {batch_index + 1} failed, rolled back",
                            deployed=deployed_targets,
                            failed=failed_targets,
                            rolled_back=deployed_targets
                        )

            # Health check between batches (only the targets that succeeded
            # in this batch, not earlier batches already verified)
            if config.health_check_between_batches and batch_index < len(batches) - 1:
                health_result = self.check_batch_health(batch_succeeded)

                if not health_result.healthy:
                    if config.rollback_on_failure:
                        self.rollback_targets(deployed_targets, job.previous_release)
                        return DeploymentResult(
                            success=False,
                            error=f"Health check failed after batch {batch_index + 1}",
                            deployed=deployed_targets,
                            failed=failed_targets,
                            rolled_back=deployed_targets
                        )

            # Delay between batches
            if config.batch_delay > 0 and batch_index < len(batches) - 1:
                time.sleep(config.batch_delay)

        return DeploymentResult(
            success=len(failed_targets) == 0,
            deployed=deployed_targets,
            failed=failed_targets
        )
```

---

## Database Schema

```sql
-- Deployment Jobs
CREATE TABLE release.deployment_jobs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    promotion_id UUID NOT NULL REFERENCES release.promotions(id),
    release_id UUID NOT NULL REFERENCES release.releases(id),
    environment_id UUID NOT NULL REFERENCES release.environments(id),
    status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN (
        'pending', 'running', 'succeeded', 'failed', 'cancelled', 'rolling_back', 'rolled_back'
    )),
    strategy VARCHAR(50) NOT NULL DEFAULT 'all-at-once',
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    artifacts JSONB NOT NULL DEFAULT '[]',
    rollback_of UUID REFERENCES release.deployment_jobs(id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_deployment_jobs_promotion ON release.deployment_jobs(promotion_id);
CREATE INDEX idx_deployment_jobs_status ON release.deployment_jobs(status);

-- Deployment Tasks
CREATE TABLE release.deployment_tasks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    job_id UUID NOT NULL REFERENCES release.deployment_jobs(id) ON DELETE CASCADE,
    target_id UUID NOT NULL REFERENCES release.targets(id),
    digest VARCHAR(100) NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN (
        'pending', 'running', 'succeeded', 'failed', 'cancelled', 'skipped'
    )),
    agent_id UUID REFERENCES release.agents(id),
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    exit_code INTEGER,
    logs TEXT,
    previous_digest VARCHAR(100),
    sticker_written BOOLEAN NOT NULL DEFAULT FALSE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_deployment_tasks_job ON release.deployment_tasks(job_id);
CREATE INDEX idx_deployment_tasks_target ON release.deployment_tasks(target_id);
CREATE INDEX idx_deployment_tasks_status ON release.deployment_tasks(status);

-- Generated Artifacts
CREATE TABLE release.generated_artifacts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    deployment_job_id UUID REFERENCES release.deployment_jobs(id) ON DELETE CASCADE,
    artifact_type VARCHAR(50) NOT NULL CHECK (artifact_type IN (
        'compose_lock', 'script', 'sticker', 'evidence', 'config'
    )),
    name VARCHAR(255) NOT NULL,
    content_hash VARCHAR(100) NOT NULL,
    content BYTEA,              -- for small artifacts
    storage_ref VARCHAR(500),   -- for large artifacts (S3, etc.)
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_generated_artifacts_job ON release.generated_artifacts(deployment_job_id);
```

---

## API Endpoints

```yaml
# Deployment Jobs (mostly read-only; created by promotions)
GET /api/v1/deployment-jobs
  Query: ?promotionId={uuid}&status={status}&environmentId={uuid}
  Response: DeploymentJob[]

GET /api/v1/deployment-jobs/{id}
  Response: DeploymentJob (with tasks)

GET /api/v1/deployment-jobs/{id}/tasks
  Response: DeploymentTask[]

GET /api/v1/deployment-jobs/{id}/tasks/{taskId}
  Response: DeploymentTask (with logs)

GET /api/v1/deployment-jobs/{id}/tasks/{taskId}/logs
  Query: ?follow=true
  Response: string | SSE stream

GET /api/v1/deployment-jobs/{id}/artifacts
  Response: GeneratedArtifact[]

GET /api/v1/deployment-jobs/{id}/artifacts/{artifactId}
  Response: binary (download)

# Rollback
POST /api/v1/rollbacks
  Body: {
    environmentId: UUID,
    strategy: "to-previous" | "to-release" | "to-sticker",
    targetReleaseId?: UUID  # for to-release strategy
  }
  Response: DeploymentJob (rollback job)

GET /api/v1/rollbacks
  Query: ?environmentId={uuid}
  Response: DeploymentJob[] (rollback jobs only)
```

---

## References

- [Module Overview](overview.md)
- [Agents Specification](agents.md)
- [Deployment Strategies](../deployment/strategies.md)
- [Artifact Generation](../deployment/artifacts.md)
- [API Documentation](../api/deployments.md)

418
docs/modules/release-orchestrator/modules/environment-manager.md
Normal file
@@ -0,0 +1,418 @@

# ENVMGR: Environment & Inventory Manager

**Purpose**: Model environments, targets, agents, and their relationships.

## Modules

### Module: `environment-manager`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Environment CRUD, ordering, configuration, freeze windows |
| **Dependencies** | `authority` |
| **Data Entities** | `Environment`, `EnvironmentConfig`, `FreezeWindow` |
| **Events Produced** | `environment.created`, `environment.updated`, `environment.freeze_started`, `environment.freeze_ended` |

**Key Operations**:
```
CreateEnvironment(name, displayName, orderIndex, config) → Environment
UpdateEnvironment(id, config) → Environment
DeleteEnvironment(id) → void
SetFreezeWindow(environmentId, start, end, reason, exceptions) → FreezeWindow
ClearFreezeWindow(environmentId, windowId) → void
ListEnvironments(tenantId) → Environment[]
GetEnvironmentState(id) → EnvironmentState
```

**Environment Entity**:
```typescript
interface Environment {
  id: UUID;
  tenantId: UUID;
  name: string;                        // "dev", "stage", "prod"
  displayName: string;                 // "Development"
  orderIndex: number;                  // 0, 1, 2 for promotion order
  config: EnvironmentConfig;
  freezeWindows: FreezeWindow[];
  requiredApprovals: number;           // 0 for dev, 1+ for prod
  requireSeparationOfDuties: boolean;
  autoPromoteFrom: UUID | null;        // auto-promote from this env
  promotionPolicy: string;             // OPA policy name
  createdAt: DateTime;
  updatedAt: DateTime;
}

interface EnvironmentConfig {
  variables: Record<string, string>;      // env-specific variables
  secrets: SecretReference[];             // vault references
  registryOverrides: RegistryOverride[];  // per-env registry
  agentLabels: string[];                  // required agent labels
  deploymentTimeout: number;              // seconds
  healthCheckConfig: HealthCheckConfig;
}

interface FreezeWindow {
  id: UUID;
  start: DateTime;
  end: DateTime;
  reason: string;
  createdBy: UUID;
  exceptions: UUID[];  // users who can override
}
```
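
The `FreezeWindow` semantics above (blocked while a window is active, unless the user is an exception) reduce to a small check. A sketch with `Date`/`string` standing in for the spec's `DateTime`/`UUID`:

```typescript
interface FreezeWindow {
  id: string;
  start: Date;
  end: Date;
  reason: string;
  exceptions: string[]; // user IDs allowed to override
}

// Return the window blocking this user's promotion right now,
// or null if promotion is allowed.
function promotionBlockedBy(
  windows: FreezeWindow[],
  now: Date,
  userId: string
): FreezeWindow | null {
  for (const w of windows) {
    const active = now >= w.start && now <= w.end;
    if (active && !w.exceptions.includes(userId)) return w;
  }
  return null;
}
```

The returned window carries the `reason`, which is what the promotion API would surface in its rejection message.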

---

### Module: `target-registry`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Deployment target inventory; capability tracking |
| **Dependencies** | `environment-manager`, `agent-manager` |
| **Data Entities** | `Target`, `TargetGroup`, `TargetCapability` |
| **Events Produced** | `target.created`, `target.updated`, `target.deleted`, `target.health_changed` |

**Target Types** (plugin-provided):

| Type | Description |
|------|-------------|
| `docker_host` | Single Docker host |
| `compose_host` | Docker Compose host |
| `ssh_remote` | Generic SSH target |
| `winrm_remote` | Windows remote target |
| `ecs_service` | AWS ECS service |
| `nomad_job` | HashiCorp Nomad job |

**Target Entity**:
```typescript
interface Target {
  id: UUID;
  tenantId: UUID;
  environmentId: UUID;
  name: string;                    // "prod-web-01"
  targetType: string;              // "docker_host"
  connection: TargetConnection;    // type-specific
  capabilities: TargetCapability[];
  labels: Record<string, string>;  // for grouping
  healthStatus: HealthStatus;
  lastHealthCheck: DateTime;
  deploymentDirectory: string;     // where artifacts are placed
  currentDigest: string | null;    // what's currently deployed
  agentId: UUID | null;            // assigned agent
}

interface TargetConnection {
  // Common fields
  host: string;
  port: number;

  // Type-specific (examples)
  // docker_host:
  dockerSocket?: string;
  tlsCert?: SecretReference;

  // ssh_remote:
  username?: string;
  privateKey?: SecretReference;

  // ecs_service:
  cluster?: string;
  service?: string;
  region?: string;
  roleArn?: string;
}

interface TargetGroup {
  id: UUID;
  tenantId: UUID;
  environmentId: UUID;
  name: string;
  labels: Record<string, string>;
  createdAt: DateTime;
}
```

---

### Module: `agent-manager`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Agent registration, heartbeat, capability advertisement |
| **Dependencies** | `authority` (for agent tokens) |
| **Data Entities** | `Agent`, `AgentCapability`, `AgentHeartbeat` |
| **Events Produced** | `agent.registered`, `agent.online`, `agent.offline`, `agent.capability_changed` |

**Agent Lifecycle**:
1. Agent starts, requests registration token from Authority
2. Agent registers with capabilities and labels
3. Agent sends heartbeats (default: 30s interval)
4. Agent pulls tasks from task queue
5. Agent reports task completion/failure
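
The `online`/`degraded`/`offline` status the manager maintains can be derived from heartbeat age against the 30-second interval above. A sketch; the 2x and 5x missed-heartbeat thresholds are illustrative assumptions, not spec'd values:

```typescript
type AgentStatus = "online" | "offline" | "degraded";

// Classify an agent from the age of its last heartbeat.
// intervalSec defaults to the documented 30s heartbeat interval.
function agentStatus(
  lastHeartbeat: Date | null,
  now: Date,
  intervalSec = 30
): AgentStatus {
  if (lastHeartbeat === null) return "offline"; // never heartbeated
  const ageSec = (now.getTime() - lastHeartbeat.getTime()) / 1000;
  if (ageSec <= 2 * intervalSec) return "online";
  if (ageSec <= 5 * intervalSec) return "degraded";
  return "offline";
}
```

A periodic sweep applying this function is what would emit the `agent.online`/`agent.offline` events when the derived status changes.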

**Agent Entity**:
```typescript
interface Agent {
  id: UUID;
  tenantId: UUID;
  name: string;
  version: string;
  capabilities: AgentCapability[];
  labels: Record<string, string>;
  status: "online" | "offline" | "degraded";
  lastHeartbeat: DateTime;
  assignedTargets: UUID[];
  resourceUsage: ResourceUsage;
}

interface AgentCapability {
  type: string;     // "docker", "compose", "ssh", "winrm"
  version: string;  // capability version
  config: object;   // capability-specific config
}

interface ResourceUsage {
  cpuPercent: number;
  memoryPercent: number;
  diskPercent: number;
  activeTasks: number;
}
```

**Agent Registration Protocol**:
```
1. Admin generates registration token (one-time use)
   POST /api/v1/admin/agent-tokens
   → { token: "reg_xxx", expiresAt: "..." }

2. Agent starts with registration token
   ./stella-agent --register --token=reg_xxx

3. Agent requests mTLS certificate
   POST /api/v1/agents/register
   Headers: X-Registration-Token: reg_xxx
   Body: { name, version, capabilities, csr }
   → { agentId, certificate, caCertificate }

4. Agent establishes mTLS connection
   Uses issued certificate for all subsequent requests

5. Agent requests short-lived JWT for task execution
   POST /api/v1/agents/token  (over mTLS)
   → { token, expiresIn: 3600 }  // 1 hour
```

---

### Module: `inventory-sync`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Drift detection; expected vs actual state reconciliation |
| **Dependencies** | `target-registry`, `agent-manager` |
| **Events Produced** | `inventory.drift_detected`, `inventory.reconciled` |

**Drift Detection Process**:
1. Read `stella.version.json` from target deployment directory
2. Compare with expected state in database
3. Flag discrepancies (digest mismatch, missing sticker, unexpected files)
4. Report on dashboard

**Drift Detection Types**:

| Drift Type | Description | Severity |
|------------|-------------|----------|
| `digest_mismatch` | Running digest differs from expected | Critical |
| `missing_sticker` | No version sticker found on target | Warning |
| `stale_sticker` | Sticker timestamp older than last deployment | Warning |
| `orphan_container` | Container not managed by Stella | Info |
| `extra_files` | Unexpected files in deployment directory | Info |
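
The first three rows of the table map directly onto a comparison between an observation of the target and the expected state from the database. A partial sketch covering those three drift types; the `TargetObservation` shape is an assumption about what the agent reports back:

```typescript
type Severity = "critical" | "warning" | "info";

interface TargetObservation {
  expectedDigest: string | null;  // from the database
  runningDigest: string | null;   // reported by the agent
  stickerFound: boolean;
  stickerTime: Date | null;       // deployed_at from stella.version.json
  lastDeploymentTime: Date;       // last recorded deployment
}

// Classify drift findings per the severity table above
// (orphan_container and extra_files need file/container listings
// and are omitted from this sketch).
function classifyDrift(o: TargetObservation): Array<{ type: string; severity: Severity }> {
  const findings: Array<{ type: string; severity: Severity }> = [];
  if (o.expectedDigest && o.runningDigest && o.expectedDigest !== o.runningDigest) {
    findings.push({ type: "digest_mismatch", severity: "critical" });
  }
  if (!o.stickerFound) {
    findings.push({ type: "missing_sticker", severity: "warning" });
  } else if (o.stickerTime && o.stickerTime < o.lastDeploymentTime) {
    findings.push({ type: "stale_sticker", severity: "warning" });
  }
  return findings;
}
```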

---

## Cache Eviction Policies

Environment configurations and target states are cached to improve performance. **All caches MUST have bounded size and TTL-based eviction**:

| Cache Type | Purpose | TTL | Max Size | Eviction Strategy |
|------------|---------|-----|----------|-------------------|
| **Environment Configs** | Environment configuration data | 30 minutes | 500 entries | Sliding expiration |
| **Target Health** | Target health status | 5 minutes | 2,000 entries | Sliding expiration |
| **Agent Capabilities** | Agent capability advertisement | 10 minutes | 1,000 entries | Sliding expiration |
| **Freeze Windows** | Active freeze window checks | 15 minutes | 100 entries | Absolute expiration |

**Implementation**:
```csharp
public class EnvironmentConfigCache
{
    private readonly MemoryCache _cache;

    public EnvironmentConfigCache()
    {
        _cache = new MemoryCache(new MemoryCacheOptions
        {
            SizeLimit = 500  // Max 500 environment configs
        });
    }

    public void CacheConfig(Guid environmentId, EnvironmentConfig config)
    {
        _cache.Set(environmentId, config, new MemoryCacheEntryOptions
        {
            Size = 1,
            SlidingExpiration = TimeSpan.FromMinutes(30)  // 30-minute TTL
        });
    }

    public EnvironmentConfig? GetCachedConfig(Guid environmentId)
        => _cache.Get<EnvironmentConfig>(environmentId);

    public void InvalidateConfig(Guid environmentId)
        => _cache.Remove(environmentId);
}
```

**Cache Invalidation**:
- Environment configs: Invalidate on update
- Target health: Invalidate on health check or deployment
- Agent capabilities: Invalidate on capability change event
- Freeze windows: Invalidate on window creation/deletion

**Reference**: See [Implementation Guide](../implementation-guide.md#caching) for cache implementation patterns.

---
|
||||
|
||||

## Database Schema

```sql
-- Environments
CREATE TABLE release.environments (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    name VARCHAR(100) NOT NULL,
    display_name VARCHAR(255) NOT NULL,
    order_index INTEGER NOT NULL,
    config JSONB NOT NULL DEFAULT '{}',
    freeze_windows JSONB NOT NULL DEFAULT '[]',
    required_approvals INTEGER NOT NULL DEFAULT 0,
    require_sod BOOLEAN NOT NULL DEFAULT FALSE,
    auto_promote_from UUID REFERENCES release.environments(id),
    promotion_policy VARCHAR(255),
    deployment_timeout INTEGER NOT NULL DEFAULT 600,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, name)
);

CREATE INDEX idx_environments_tenant ON release.environments(tenant_id);
CREATE INDEX idx_environments_order ON release.environments(tenant_id, order_index);

-- Agents (created before targets, which reference them)
CREATE TABLE release.agents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    version VARCHAR(50) NOT NULL,
    capabilities JSONB NOT NULL DEFAULT '[]',
    labels JSONB NOT NULL DEFAULT '{}',
    status VARCHAR(50) NOT NULL DEFAULT 'offline',
    last_heartbeat TIMESTAMPTZ,
    resource_usage JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, name)
);

CREATE INDEX idx_agents_tenant ON release.agents(tenant_id);
CREATE INDEX idx_agents_status ON release.agents(status);
CREATE INDEX idx_agents_capabilities ON release.agents USING GIN (capabilities);

-- Target Groups
CREATE TABLE release.target_groups (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    environment_id UUID NOT NULL REFERENCES release.environments(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    labels JSONB NOT NULL DEFAULT '{}',
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, environment_id, name)
);

-- Targets
CREATE TABLE release.targets (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    environment_id UUID NOT NULL REFERENCES release.environments(id) ON DELETE CASCADE,
    target_group_id UUID REFERENCES release.target_groups(id),
    name VARCHAR(255) NOT NULL,
    target_type VARCHAR(100) NOT NULL,
    connection JSONB NOT NULL,
    capabilities JSONB NOT NULL DEFAULT '[]',
    labels JSONB NOT NULL DEFAULT '{}',
    deployment_directory VARCHAR(500),
    health_status VARCHAR(50) NOT NULL DEFAULT 'unknown',
    last_health_check TIMESTAMPTZ,
    current_digest VARCHAR(100),
    agent_id UUID REFERENCES release.agents(id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, environment_id, name)
);

CREATE INDEX idx_targets_tenant_env ON release.targets(tenant_id, environment_id);
CREATE INDEX idx_targets_type ON release.targets(target_type);
CREATE INDEX idx_targets_labels ON release.targets USING GIN (labels);
```
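
The `freeze_windows` JSONB column above stores an array of windows, so a promotion gate only needs an interval check. A hedged sketch; the window field names (`startsAt`, `endsAt`) are assumed for illustration and may differ from the actual JSONB schema:

```typescript
// Hypothetical freeze-window shape for illustration.
interface FreezeWindow {
  startsAt: string; // ISO 8601
  endsAt: string;   // ISO 8601
  reason?: string;
}

// True if `at` falls inside any freeze window (end exclusive).
function isFrozen(windows: FreezeWindow[], at: Date): boolean {
  const t = at.getTime();
  return windows.some(w =>
    t >= Date.parse(w.startsAt) && t < Date.parse(w.endsAt));
}
```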

---

## API Endpoints

```yaml
# Environments
POST   /api/v1/environments
GET    /api/v1/environments
GET    /api/v1/environments/{id}
PUT    /api/v1/environments/{id}
DELETE /api/v1/environments/{id}

# Freeze Windows
POST   /api/v1/environments/{envId}/freeze-windows
GET    /api/v1/environments/{envId}/freeze-windows
DELETE /api/v1/environments/{envId}/freeze-windows/{windowId}

# Target Groups
POST   /api/v1/environments/{envId}/target-groups
GET    /api/v1/environments/{envId}/target-groups
GET    /api/v1/target-groups/{id}
PUT    /api/v1/target-groups/{id}
DELETE /api/v1/target-groups/{id}

# Targets
POST   /api/v1/targets
GET    /api/v1/targets
GET    /api/v1/targets/{id}
PUT    /api/v1/targets/{id}
DELETE /api/v1/targets/{id}
POST   /api/v1/targets/{id}/health-check
GET    /api/v1/targets/{id}/sticker
GET    /api/v1/targets/{id}/drift

# Agents
POST   /api/v1/agents/register
GET    /api/v1/agents
GET    /api/v1/agents/{id}
PUT    /api/v1/agents/{id}
DELETE /api/v1/agents/{id}
POST   /api/v1/agents/{id}/heartbeat
POST   /api/v1/agents/{id}/tasks/{taskId}/complete
```

---

## References

- [Module Overview](overview.md)
- [Agent Specification](agents.md)
- [API Documentation](../api/environments.md)
- [Agent Security](../security/agent-security.md)
575
docs/modules/release-orchestrator/modules/evidence.md
Normal file
@@ -0,0 +1,575 @@
# RELEVI: Release Evidence

**Purpose**: Cryptographically sealed evidence packets for audit-grade release governance.

## Modules

### Module: `evidence-collector`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Evidence aggregation; packet composition |
| **Dependencies** | `promotion-manager`, `deploy-orchestrator`, `decision-engine` |
| **Data Entities** | `EvidencePacket`, `EvidenceContent` |
| **Events Produced** | `evidence.collected`, `evidence.packet_created` |

**Evidence Packet Structure**:

```typescript
interface EvidencePacket {
  id: UUID;
  tenantId: UUID;
  promotionId: UUID;
  packetType: EvidencePacketType;
  content: EvidenceContent;
  contentHash: string;   // SHA-256 of content
  signature: string;     // Cryptographic signature
  signerKeyRef: string;  // Reference to signing key
  createdAt: DateTime;
  // Note: no updatedAt - packets are immutable
}

type EvidencePacketType =
  | "release_decision"   // Promotion decision evidence
  | "deployment"         // Deployment execution evidence
  | "rollback"           // Rollback evidence
  | "ab_promotion";      // A/B promotion evidence

interface EvidenceContent {
  // Metadata
  version: "1.0";
  generatedAt: DateTime;
  generatorVersion: string;

  // What
  release: {
    id: UUID;
    name: string;
    components: Array<{
      name: string;
      digest: string;
      semver: string;
      imageRepository: string;
    }>;
    sourceRef: SourceReference | null;
  };

  // Where
  environment: {
    id: UUID;
    name: string;
    targets: Array<{
      id: UUID;
      name: string;
      type: string;
    }>;
  };

  // Who
  actors: {
    requester: {
      id: UUID;
      name: string;
      email: string;
    };
    approvers: Array<{
      id: UUID;
      name: string;
      action: string;
      at: DateTime;
      comment: string | null;
    }>;
  };

  // Why
  decision: {
    result: "allow" | "deny";
    gates: Array<{
      type: string;
      name: string;
      status: string;
      message: string;
      details: Record<string, any>;
    }>;
    reasons: string[];
  };

  // How
  execution: {
    workflowRunId: UUID | null;
    deploymentJobId: UUID | null;
    artifacts: Array<{
      type: string;
      name: string;
      contentHash: string;
    }>;
    logs: string | null; // Compressed/truncated
  };

  // When
  timeline: {
    requestedAt: DateTime;
    decidedAt: DateTime | null;
    startedAt: DateTime | null;
    completedAt: DateTime | null;
  };

  // Integrity
  inputsHash: string;              // Hash of all inputs for replay
  previousEvidenceId: UUID | null; // Chain to previous evidence packet
}
```

---

### Module: `evidence-signer`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Cryptographic signing of evidence packets |
| **Dependencies** | `authority`, `vault` (for key storage) |
| **Algorithms** | RS256, ES256, Ed25519 |

**Signing Process**:

```typescript
class EvidenceSigner {
  async sign(content: EvidenceContent): Promise<SignedEvidence> {
    // 1. Canonicalize content (RFC 8785)
    const canonicalJson = canonicalize(content);

    // 2. Compute content hash
    const contentHash = crypto
      .createHash("sha256")
      .update(canonicalJson)
      .digest("hex");

    // 3. Get signing key from vault
    const keyRef = await this.getActiveSigningKey();
    const privateKey = await this.vault.getPrivateKey(keyRef);

    // 4. Sign the content hash
    const signature = await this.signWithKey(contentHash, privateKey);

    return {
      content,
      contentHash: `sha256:${contentHash}`,
      signature: base64Encode(signature),
      signerKeyRef: keyRef,
      algorithm: this.config.signatureAlgorithm,
    };
  }

  async verify(packet: EvidencePacket): Promise<VerificationResult> {
    // 1. Canonicalize stored content
    const canonicalJson = canonicalize(packet.content);

    // 2. Verify content hash
    const computedHash = crypto
      .createHash("sha256")
      .update(canonicalJson)
      .digest("hex");

    if (`sha256:${computedHash}` !== packet.contentHash) {
      return { valid: false, error: "Content hash mismatch" };
    }

    // 3. Get public key
    const publicKey = await this.vault.getPublicKey(packet.signerKeyRef);

    // 4. Verify signature
    const signatureValid = await this.verifySignature(
      computedHash,
      base64Decode(packet.signature),
      publicKey
    );

    return {
      valid: signatureValid,
      signerKeyRef: packet.signerKeyRef,
      signedAt: packet.createdAt,
    };
  }
}
```
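
The hash step above depends on canonicalization: the same logical content must always serialize to the same bytes, regardless of key order. A runnable sketch using sorted-key JSON as a simplified stand-in for full RFC 8785 (JCS) canonicalization:

```typescript
import { createHash } from "node:crypto";

// Simplified canonicalization: objects serialized with sorted keys.
// Real RFC 8785 also pins down number and string formatting.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value as object).sort();
    return `{${keys
      .map(k => `${JSON.stringify(k)}:${canonicalize((value as any)[k])}`)
      .join(",")}}`;
  }
  return JSON.stringify(value);
}

function contentHash(content: unknown): string {
  return "sha256:" + createHash("sha256").update(canonicalize(content)).digest("hex");
}
```

Two objects with the same fields in different order now hash identically, which is what makes the stored `contentHash` verifiable later.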

---

### Module: `sticker-writer`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Version sticker generation and placement |
| **Dependencies** | `deploy-orchestrator`, `agent-manager` |
| **Data Entities** | `VersionSticker` |

**Version Sticker Schema**:

```typescript
interface VersionSticker {
  stella_version: "1.0";

  // Release identity
  release_id: UUID;
  release_name: string;

  // Component details
  components: Array<{
    name: string;
    digest: string;
    semver: string;
    tag: string;
    image_repository: string;
  }>;

  // Deployment context
  environment: string;
  environment_id: UUID;
  deployed_at: string; // ISO 8601
  deployed_by: UUID;

  // Traceability
  promotion_id: UUID;
  workflow_run_id: UUID;

  // Evidence chain
  evidence_packet_id: UUID;
  evidence_packet_hash: string;
  policy_decision_hash: string;

  // Orchestrator info
  orchestrator_version: string;

  // Source reference
  source_ref?: {
    commit_sha: string;
    branch: string;
    repository: string;
  };
}
```

**Sticker Placement**:

- Written to `/var/stella/version.json` on each target
- Atomic write (write to temp, rename)
- Read during drift detection
- Verified against expected state
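
The atomic write above (write to a temp file, then rename) can be sketched with Node's fs API; `rename(2)` is atomic within a filesystem on POSIX, so readers never observe a partially written sticker. The helper name is illustrative:

```typescript
import { writeFileSync, renameSync, readFileSync, mkdtempSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// Write sticker JSON atomically: full content goes to a temp file,
// then a single rename swaps it into place.
function writeStickerAtomically(path: string, sticker: object): void {
  const tmp = path + ".tmp";
  writeFileSync(tmp, JSON.stringify(sticker, null, 2));
  renameSync(tmp, path);
}
```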
---

### Module: `audit-exporter`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Compliance report generation; evidence export |
| **Dependencies** | `evidence-collector` |
| **Export Formats** | JSON, PDF, CSV |

**Audit Report Types**:

| Report Type | Description |
|-------------|-------------|
| `release_audit` | Full audit trail for a release |
| `environment_audit` | All deployments to an environment |
| `compliance_summary` | Summary for compliance review |
| `change_log` | Chronological change log |

**Report Generation**:

```typescript
interface AuditReportRequest {
  type: AuditReportType;
  scope: {
    releaseId?: UUID;
    environmentId?: UUID;
    from?: DateTime;
    to?: DateTime;
  };
  format: "json" | "pdf" | "csv";
  options?: {
    includeDecisionDetails: boolean;
    includeApproverDetails: boolean;
    includeLogs: boolean;
    includeArtifacts: boolean;
  };
}

interface AuditReport {
  id: UUID;
  type: AuditReportType;
  scope: ReportScope;
  generatedAt: DateTime;
  generatedBy: UUID;

  summary: {
    totalPromotions: number;
    successfulDeployments: number;
    failedDeployments: number;
    rollbacks: number;
    averageDeploymentTime: number;
  };

  entries: AuditEntry[];

  // For compliance
  signatureChain: {
    valid: boolean;
    verifiedPackets: number;
    invalidPackets: number;
  };
}
```
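
The `summary` block above is derivable from the report entries. A hedged sketch; the minimal entry shape (`kind`, `outcome`, `durationSeconds`) is assumed for illustration, not taken from the actual `AuditEntry` schema:

```typescript
// Hypothetical minimal audit entry shape for illustration.
interface AuditEntryLike {
  kind: "promotion" | "deployment" | "rollback";
  outcome?: "succeeded" | "failed";
  durationSeconds?: number;
}

function summarize(entries: AuditEntryLike[]) {
  const deployments = entries.filter(e => e.kind === "deployment");
  const ok = deployments.filter(e => e.outcome === "succeeded");
  const durations = ok.map(e => e.durationSeconds ?? 0);
  return {
    totalPromotions: entries.filter(e => e.kind === "promotion").length,
    successfulDeployments: ok.length,
    failedDeployments: deployments.filter(e => e.outcome === "failed").length,
    rollbacks: entries.filter(e => e.kind === "rollback").length,
    // Average over successful deployments only; 0 when there are none.
    averageDeploymentTime: durations.length
      ? durations.reduce((a, b) => a + b, 0) / durations.length
      : 0,
  };
}
```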

---

## Immutability Enforcement

Evidence packets are append-only. This is enforced at multiple levels:

### Database Level

```sql
-- Evidence packets table with no UPDATE/DELETE
CREATE TABLE release.evidence_packets (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    promotion_id UUID NOT NULL REFERENCES release.promotions(id),
    packet_type VARCHAR(50) NOT NULL CHECK (packet_type IN (
        'release_decision', 'deployment', 'rollback', 'ab_promotion'
    )),
    content JSONB NOT NULL,
    content_hash VARCHAR(100) NOT NULL,
    signature TEXT,
    signer_key_ref VARCHAR(255),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
    -- Note: no updated_at column; immutable by design
);

-- Append-only enforcement via trigger
CREATE OR REPLACE FUNCTION prevent_evidence_modification()
RETURNS TRIGGER AS $$
BEGIN
    RAISE EXCEPTION 'Evidence packets are immutable and cannot be modified or deleted';
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER evidence_packets_immutable
    BEFORE UPDATE OR DELETE ON release.evidence_packets
    FOR EACH ROW EXECUTE FUNCTION prevent_evidence_modification();

-- Revoke UPDATE/DELETE from application role
REVOKE UPDATE, DELETE ON release.evidence_packets FROM app_role;

-- Version stickers table
CREATE TABLE release.version_stickers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    target_id UUID NOT NULL REFERENCES release.targets(id),
    release_id UUID NOT NULL REFERENCES release.releases(id),
    promotion_id UUID NOT NULL REFERENCES release.promotions(id),
    sticker_content JSONB NOT NULL,
    content_hash VARCHAR(100) NOT NULL,
    written_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    verified_at TIMESTAMPTZ,
    drift_detected BOOLEAN NOT NULL DEFAULT FALSE
);

CREATE INDEX idx_version_stickers_target ON release.version_stickers(target_id);
CREATE INDEX idx_version_stickers_release ON release.version_stickers(release_id);
CREATE INDEX idx_evidence_packets_promotion ON release.evidence_packets(promotion_id);
CREATE INDEX idx_evidence_packets_created ON release.evidence_packets(created_at DESC);
```

### Application Level

```csharp
// Evidence service enforces immutability.
// Dependency interface names are illustrative.
public sealed class EvidenceService
{
    private readonly IEvidenceSigner _signer;
    private readonly IEvidencePacketRepository _repository;

    public EvidenceService(IEvidenceSigner signer, IEvidencePacketRepository repository)
    {
        _signer = signer;
        _repository = repository;
    }

    // Only a Create method - no Update or Delete
    public async Task<EvidencePacket> CreateAsync(
        EvidenceContent content,
        CancellationToken ct)
    {
        // Sign content
        var signed = await _signer.SignAsync(content, ct);

        // Store (append-only)
        var packet = new EvidencePacket
        {
            Id = Guid.NewGuid(),
            TenantId = content.TenantId,
            PromotionId = content.PromotionId,
            PacketType = content.PacketType,
            Content = content,
            ContentHash = signed.ContentHash,
            Signature = signed.Signature,
            SignerKeyRef = signed.SignerKeyRef,
            CreatedAt = DateTime.UtcNow,
        };

        await _repository.InsertAsync(packet, ct);
        return packet;
    }

    // Read-only query surface (bodies elided):
    //   Task<EvidencePacket> GetAsync(Guid id, CancellationToken ct)
    //   Task<IReadOnlyList<EvidencePacket>> ListAsync(EvidenceFilter filter, CancellationToken ct)
    //   Task<VerificationResult> VerifyAsync(Guid id, CancellationToken ct)

    // No Update or Delete methods exist
}
```

---

## Evidence Chain

Evidence packets form a verifiable chain:

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Evidence #1    │     │  Evidence #2    │     │  Evidence #3    │
│  (Dev Deploy)   │────►│ (Stage Deploy)  │────►│ (Prod Deploy)   │
│                 │     │                 │     │                 │
│ prevEvidenceId: │     │ prevEvidenceId: │     │ prevEvidenceId: │
│ null            │     │ #1              │     │ #2              │
│                 │     │                 │     │                 │
│ contentHash:    │     │ contentHash:    │     │ contentHash:    │
│ sha256:abc...   │     │ sha256:def...   │     │ sha256:ghi...   │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```

**Chain Verification**:

```typescript
async function verifyEvidenceChain(releaseId: UUID): Promise<ChainVerificationResult> {
  const packets = await getPacketsForRelease(releaseId);
  const results: PacketVerificationResult[] = [];

  let previousHash: string | null = null;

  for (const packet of packets) {
    // 1. Verify packet signature
    const signatureValid = await verifySignature(packet);

    // 2. Verify content hash
    const contentValid = await verifyContentHash(packet);

    // 3. Verify chain link
    const chainValid = packet.content.previousEvidenceId === null
      ? previousHash === null
      : await verifyPreviousLink(packet, previousHash);

    results.push({
      packetId: packet.id,
      signatureValid,
      contentValid,
      chainValid,
      valid: signatureValid && contentValid && chainValid,
    });

    previousHash = packet.contentHash;
  }

  return {
    valid: results.every(r => r.valid),
    packets: results,
  };
}
```
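
The chain semantics above can be demonstrated end to end with plain hashes. This is a self-contained toy, not the production verifier; the `ToyPacket` shape is invented for the demonstration:

```typescript
import { createHash } from "node:crypto";

interface ToyPacket {
  payload: string;
  previousHash: string | null; // null for the first packet in the chain
  hash: string;                // sha256 over payload + previousHash
}

function sha256(s: string): string {
  return createHash("sha256").update(s).digest("hex");
}

function appendPacket(chain: ToyPacket[], payload: string): ToyPacket[] {
  const previousHash = chain.length ? chain[chain.length - 1].hash : null;
  const hash = sha256(payload + (previousHash ?? ""));
  return [...chain, { payload, previousHash, hash }];
}

// Every link must reference the hash of its predecessor,
// and every hash must match its own payload.
function verifyChain(chain: ToyPacket[]): boolean {
  let prev: string | null = null;
  for (const p of chain) {
    if (p.previousHash !== prev) return false;
    if (p.hash !== sha256(p.payload + (p.previousHash ?? ""))) return false;
    prev = p.hash;
  }
  return true;
}
```

Editing any packet after the fact breaks its own hash and every link downstream, which is exactly the tamper-evidence property the chain provides.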

---

## API Endpoints

```yaml
# Evidence Packets
GET  /api/v1/evidence-packets
     Query: ?promotionId={uuid}&type={type}&from={date}&to={date}
     Response: EvidencePacket[]

GET  /api/v1/evidence-packets/{id}
     Response: EvidencePacket (full content)

GET  /api/v1/evidence-packets/{id}/verify
     Response: VerificationResult

GET  /api/v1/evidence-packets/{id}/download
     Query: ?format={json|pdf}
     Response: binary

# Evidence Chain
GET  /api/v1/releases/{id}/evidence-chain
     Response: EvidenceChain

GET  /api/v1/releases/{id}/evidence-chain/verify
     Response: ChainVerificationResult

# Audit Reports
POST /api/v1/audit-reports
     Body: {
       type: "release" | "environment" | "compliance",
       scope: { releaseId?, environmentId?, from?, to? },
       format: "json" | "pdf" | "csv"
     }
     Response: { reportId: UUID, status: "generating" }

GET  /api/v1/audit-reports/{id}
     Response: { status, downloadUrl? }

GET  /api/v1/audit-reports/{id}/download
     Response: binary

# Version Stickers
GET  /api/v1/version-stickers
     Query: ?targetId={uuid}&releaseId={uuid}
     Response: VersionSticker[]

GET  /api/v1/version-stickers/{id}
     Response: VersionSticker
```

---

## Deterministic Replay

Evidence packets enable deterministic replay: given the same inputs and policy version, the same decision is produced.

```typescript
async function replayDecision(evidencePacket: EvidencePacket): Promise<ReplayResult> {
  const content = evidencePacket.content;

  // 1. Verify inputs hash
  const currentInputsHash = computeInputsHash(
    content.release,
    content.environment,
    content.decision.gates
  );

  if (currentInputsHash !== content.inputsHash) {
    return { valid: false, error: "Inputs have changed since original decision" };
  }

  // 2. Re-evaluate decision with same inputs
  const replayedDecision = await evaluateDecision(
    content.release,
    content.environment,
    { asOf: content.timeline.decidedAt } // Use the policy version from that time
  );

  // 3. Compare decisions
  const decisionsMatch = replayedDecision.result === content.decision.result;

  return {
    valid: decisionsMatch,
    originalDecision: content.decision.result,
    replayedDecision: replayedDecision.result,
    differences: decisionsMatch ? [] : computeDifferences(content.decision, replayedDecision),
  };
}
```

---

## References

- [Module Overview](overview.md)
- [Design Principles](../design/principles.md)
- [Security Architecture](../security/overview.md)
- [Evidence Schema](../appendices/evidence-schema.md)
373
docs/modules/release-orchestrator/modules/integration-hub.md
Normal file
@@ -0,0 +1,373 @@
# INTHUB: Integration Hub

**Purpose**: Central management of all external integrations (SCM, CI, registries, vaults, targets).

## Modules

### Module: `integration-manager`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | CRUD for integration instances; plugin type registry |
| **Dependencies** | `plugin-registry`, `authority` (for credentials) |
| **Data Entities** | `Integration`, `IntegrationType`, `IntegrationCredential` |
| **Events Produced** | `integration.created`, `integration.updated`, `integration.deleted`, `integration.health_changed` |
| **Events Consumed** | `plugin.registered`, `plugin.unregistered` |

**Key Operations**:

```
CreateIntegration(type, name, config, credentials) → Integration
UpdateIntegration(id, config, credentials) → Integration
DeleteIntegration(id) → void
TestConnection(id) → ConnectionTestResult
DiscoverResources(id, resourceType) → Resource[]
GetIntegrationHealth(id) → HealthStatus
ListIntegrations(filter) → Integration[]
```

**Integration Entity**:

```typescript
interface Integration {
  id: UUID;
  tenantId: UUID;
  type: string;              // "scm.github", "registry.harbor"
  name: string;              // user-defined name
  config: IntegrationConfig; // type-specific config
  credentialId: UUID;        // reference to vault
  healthStatus: HealthStatus;
  lastHealthCheck: DateTime;
  createdAt: DateTime;
  updatedAt: DateTime;
}

interface IntegrationConfig {
  endpoint: string;
  authMode: "token" | "oauth" | "mtls" | "iam";
  timeout: number;
  retryPolicy: RetryPolicy;
  customHeaders?: Record<string, string>;
  // Type-specific fields added by plugin
  [key: string]: any;
}
```
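
The `retryPolicy` field above typically translates to capped exponential backoff against the external system. A sketch under assumed policy fields (`maxAttempts`, `baseDelayMs`, `maxDelayMs` are illustrative, not the actual `RetryPolicy` shape):

```typescript
// Hypothetical retry policy shape; field names are illustrative.
interface RetryPolicy {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
}

// Delay before retrying attempt n (1-based): base * 2^(n-1), capped at maxDelayMs.
function backoffDelayMs(policy: RetryPolicy, attempt: number): number {
  return Math.min(policy.baseDelayMs * 2 ** (attempt - 1), policy.maxDelayMs);
}

async function withRetry<T>(policy: RetryPolicy, op: () => Promise<T>): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= policy.maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (attempt < policy.maxAttempts) {
        // Wait with exponential backoff before the next attempt.
        await new Promise(r => setTimeout(r, backoffDelayMs(policy, attempt)));
      }
    }
  }
  throw lastError;
}
```

Production implementations usually also add jitter so many clients do not retry in lockstep.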

---

### Module: `connection-profiles`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Default-settings management; "last used" pattern |
| **Dependencies** | `integration-manager` |
| **Data Entities** | `ConnectionProfile`, `ProfileTemplate` |

**Behavior**: When a user adds a new integration instance:

1. The wizard defaults to the last used endpoint, auth mode, and network settings
2. Secrets are **never** auto-reused (explicit confirmation required)
3. The user can save the settings as a named profile for reuse

**Profile Entity**:

```typescript
interface ConnectionProfile {
  id: UUID;
  tenantId: UUID;
  name: string; // "Production GitHub"
  integrationType: string;
  defaultConfig: Partial<IntegrationConfig>;
  isDefault: boolean;
  lastUsedAt: DateTime;
  createdBy: UUID;
}
```

---

### Module: `connector-runtime`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Execute plugin connector logic in a controlled environment |
| **Dependencies** | `plugin-loader`, `plugin-sandbox` |
| **Protocol** | gRPC (preferred) or HTTP/REST |

**Connector Interface** (implemented by plugins):

```protobuf
service Connector {
  // Connection management
  rpc TestConnection(TestConnectionRequest) returns (TestConnectionResponse);
  rpc GetHealth(HealthRequest) returns (HealthResponse);

  // Resource discovery
  rpc DiscoverResources(DiscoverRequest) returns (DiscoverResponse);
  rpc ListRepositories(ListReposRequest) returns (ListReposResponse);
  rpc ListBranches(ListBranchesRequest) returns (ListBranchesResponse);
  rpc ListTags(ListTagsRequest) returns (ListTagsResponse);

  // Registry operations
  rpc ResolveTagToDigest(ResolveRequest) returns (ResolveResponse);
  rpc FetchManifest(ManifestRequest) returns (ManifestResponse);
  rpc VerifyDigest(VerifyRequest) returns (VerifyResponse);

  // Secrets operations
  rpc GetSecretsRef(SecretsRequest) returns (SecretsResponse);
  rpc FetchSecret(FetchSecretRequest) returns (FetchSecretResponse);

  // Workflow step execution
  rpc ExecuteStep(StepRequest) returns (stream StepResponse);
  rpc CancelStep(CancelRequest) returns (CancelResponse);
}
```

**Request/Response Types**:

```protobuf
message TestConnectionRequest {
  string integration_id = 1;
  map<string, string> config = 2;
  string credential_ref = 3;
}

message TestConnectionResponse {
  bool success = 1;
  string error_message = 2;
  map<string, string> details = 3;
  int64 latency_ms = 4;
}

message ResolveRequest {
  string integration_id = 1;
  string image_ref = 2; // "myapp:v2.3.1"
}

message ResolveResponse {
  string digest = 1;        // "sha256:abc123..."
  string manifest_type = 2;
  int64 size_bytes = 3;
  google.protobuf.Timestamp pushed_at = 4;
}
```
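
Resolving a reference like `myapp:v2.3.1` first requires splitting it into repository, tag, and optional digest. A simplified parser sketch (the full OCI reference grammar also covers default registries, ports, and validation, which this deliberately skips):

```typescript
interface ImageRef {
  repository: string;
  tag: string | null;
  digest: string | null; // set when the ref is already pinned by digest
}

// Simplified parse of "repo[:tag][@sha256:...]"; not the full OCI grammar.
function parseImageRef(ref: string): ImageRef {
  const atIdx = ref.indexOf("@");
  const digest = atIdx >= 0 ? ref.slice(atIdx + 1) : null;
  const beforeDigest = atIdx >= 0 ? ref.slice(0, atIdx) : ref;
  const colon = beforeDigest.lastIndexOf(":");
  // A colon counts as a tag separator only after the last "/",
  // so registry ports like "host:5000/app" are not mistaken for tags.
  const hasTag = colon > beforeDigest.lastIndexOf("/");
  return {
    repository: hasTag ? beforeDigest.slice(0, colon) : beforeDigest,
    tag: hasTag ? beforeDigest.slice(colon + 1) : null,
    digest,
  };
}
```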

---

### Module: `doctor-checks`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Integration health diagnostics; troubleshooting |
| **Dependencies** | `integration-manager`, `connector-runtime` |

**Doctor Check Types**:

| Check | Purpose | Pass Criteria |
|-------|---------|---------------|
| **Connectivity** | Can reach endpoint | TCP connect succeeds |
| **TLS** | Certificate valid | Chain validates, not expired |
| **Authentication** | Credentials valid | Auth request succeeds |
| **Authorization** | Permissions sufficient | Required scopes present |
| **Version** | API version supported | Version in supported range |
| **Rate Limit** | Quota available | >10% remaining |
| **Latency** | Response time acceptable | <5s p99 |

**Doctor Check Output**:

```typescript
interface DoctorCheckResult {
  checkType: string;
  status: "pass" | "warn" | "fail";
  message: string;
  details: Record<string, any>;
  suggestions: string[];
  runAt: DateTime;
  durationMs: number;
}

interface DoctorReport {
  integrationId: UUID;
  overallStatus: "healthy" | "degraded" | "unhealthy";
  checks: DoctorCheckResult[];
  generatedAt: DateTime;
}
```
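
A natural way to derive `overallStatus` from the individual checks is "worst result wins": any fail makes the integration unhealthy, any warn makes it degraded, otherwise it is healthy. The spec above does not pin this rule down, so treat it as a plausible sketch:

```typescript
type CheckStatus = "pass" | "warn" | "fail";
type OverallStatus = "healthy" | "degraded" | "unhealthy";

// Worst individual result wins: fail > warn > pass.
function overallStatus(checks: { status: CheckStatus }[]): OverallStatus {
  if (checks.some(c => c.status === "fail")) return "unhealthy";
  if (checks.some(c => c.status === "warn")) return "degraded";
  return "healthy";
}
```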

---

## Cache Eviction Policies

Integration health status and connector results are cached to reduce load on external systems. **All caches MUST have bounded size and TTL-based eviction**:

| Cache Type | Purpose | TTL | Max Size | Eviction Strategy |
|-----------|---------|-----|----------|-------------------|
| **Health Checks** | Integration health status | 5 minutes | 1,000 entries | Sliding expiration |
| **Connection Tests** | Test connection results | 2 minutes | 500 entries | Sliding expiration |
| **Resource Discovery** | Discovered resources (repos, tags) | 10 minutes | 5,000 entries | Sliding expiration |
| **Tag Resolution** | Tag → digest mappings | 1 hour | 10,000 entries | Absolute expiration |

**Implementation**:

```csharp
using Microsoft.Extensions.Caching.Memory;

public sealed class IntegrationHealthCache
{
    private readonly MemoryCache _cache;

    public IntegrationHealthCache()
    {
        _cache = new MemoryCache(new MemoryCacheOptions
        {
            SizeLimit = 1_000 // Max 1,000 integration health entries
        });
    }

    public void CacheHealthStatus(Guid integrationId, HealthStatus status)
    {
        _cache.Set(integrationId, status, new MemoryCacheEntryOptions
        {
            Size = 1, // Each entry counts as 1 toward SizeLimit
            SlidingExpiration = TimeSpan.FromMinutes(5) // 5-minute TTL
        });
    }

    public HealthStatus? GetCachedHealthStatus(Guid integrationId)
        => _cache.Get<HealthStatus>(integrationId);
}
```

**Reference**: See [Implementation Guide](../implementation-guide.md#caching) for cache implementation patterns.

---

## Integration Types

The following integration types are supported (via plugins):

### SCM Integrations

| Type | Plugin | Capabilities |
|------|--------|--------------|
| `scm.github` | Built-in | repos, branches, commits, webhooks, status |
| `scm.gitlab` | Built-in | repos, branches, commits, webhooks, pipelines |
| `scm.bitbucket` | Plugin | repos, branches, commits, webhooks |
| `scm.azure_repos` | Plugin | repos, branches, commits, pipelines |

### Registry Integrations

| Type | Plugin | Capabilities |
|------|--------|--------------|
| `registry.harbor` | Built-in | repos, tags, digests, scanning status |
| `registry.ecr` | Plugin | repos, tags, digests, IAM auth |
| `registry.gcr` | Plugin | repos, tags, digests |
| `registry.dockerhub` | Plugin | repos, tags, digests |
| `registry.ghcr` | Plugin | repos, tags, digests |
| `registry.acr` | Plugin | repos, tags, digests |

### Vault Integrations

| Type | Plugin | Capabilities |
|------|--------|--------------|
| `vault.hashicorp` | Built-in | KV, transit, PKI |
| `vault.aws_secrets` | Plugin | secrets, IAM auth |
| `vault.azure_keyvault` | Plugin | secrets, certificates |
| `vault.gcp_secrets` | Plugin | secrets, IAM auth |

### CI Integrations

| Type | Plugin | Capabilities |
|------|--------|--------------|
| `ci.github_actions` | Built-in | workflows, runs, artifacts, status |
| `ci.gitlab_ci` | Built-in | pipelines, jobs, artifacts |
| `ci.jenkins` | Plugin | jobs, builds, artifacts |
| `ci.azure_pipelines` | Plugin | pipelines, runs, artifacts |

### Router Integrations (for Progressive Delivery)

| Type | Plugin | Capabilities |
|------|--------|--------------|
| `router.nginx` | Plugin | upstream config, reload |
| `router.haproxy` | Plugin | backend config, reload |
| `router.traefik` | Plugin | dynamic config |
| `router.aws_alb` | Plugin | target groups, listener rules |
---
|
||||
|
||||
## Database Schema

```sql
-- Integration types (populated by plugins)
CREATE TABLE release.integration_types (
    id TEXT PRIMARY KEY,              -- "scm.github"
    plugin_id UUID REFERENCES release.plugins(id),
    display_name TEXT NOT NULL,
    description TEXT,
    icon_url TEXT,
    config_schema JSONB NOT NULL,     -- JSON Schema for config
    capabilities TEXT[] NOT NULL,     -- ["repos", "webhooks", "status"]
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Integration instances
CREATE TABLE release.integrations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    type_id TEXT NOT NULL REFERENCES release.integration_types(id),
    name TEXT NOT NULL,
    config JSONB NOT NULL,
    credential_ref TEXT NOT NULL,     -- vault reference
    health_status TEXT NOT NULL DEFAULT 'unknown',
    last_health_check TIMESTAMPTZ,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    created_by UUID NOT NULL REFERENCES users(id),
    UNIQUE(tenant_id, name)
);

-- Connection profiles
CREATE TABLE release.connection_profiles (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    name TEXT NOT NULL,
    integration_type TEXT NOT NULL,
    default_config JSONB NOT NULL,
    is_default BOOLEAN NOT NULL DEFAULT false,
    last_used_at TIMESTAMPTZ,
    created_by UUID NOT NULL REFERENCES users(id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE(tenant_id, name)
);

-- Doctor check history
CREATE TABLE release.doctor_checks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    integration_id UUID NOT NULL REFERENCES release.integrations(id),
    check_type TEXT NOT NULL,
    status TEXT NOT NULL,
    message TEXT,
    details JSONB,
    duration_ms INTEGER NOT NULL,
    run_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_doctor_checks_integration ON release.doctor_checks(integration_id, run_at DESC);
```

---

## API Endpoints

See [API Documentation](../api/overview.md) for the full specification.

```
GET    /api/v1/integration-types              # List available types
GET    /api/v1/integration-types/{type}       # Get type details

GET    /api/v1/integrations                   # List integrations
POST   /api/v1/integrations                   # Create integration
GET    /api/v1/integrations/{id}              # Get integration
PUT    /api/v1/integrations/{id}              # Update integration
DELETE /api/v1/integrations/{id}              # Delete integration
POST   /api/v1/integrations/{id}/test         # Test connection
GET    /api/v1/integrations/{id}/health       # Get health status
POST   /api/v1/integrations/{id}/doctor       # Run doctor checks
GET    /api/v1/integrations/{id}/resources    # Discover resources

GET    /api/v1/connection-profiles            # List profiles
POST   /api/v1/connection-profiles            # Create profile
GET    /api/v1/connection-profiles/{id}       # Get profile
PUT    /api/v1/connection-profiles/{id}       # Update profile
DELETE /api/v1/connection-profiles/{id}       # Delete profile
```
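As a sketch of how a client might drive the connection-test endpoint above: the route is taken from this document, while the `HttpPost` wrapper and the result shape are assumptions made for illustration.

```typescript
// Illustrative client helper for POST /api/v1/integrations/{id}/test.
// The transport is injected so the sketch stays independent of any HTTP library.
type HttpPost = (url: string) => Promise<{ status: number; body: unknown }>;

interface ConnectionTestResult {
  success: boolean;
  message: string;
}

async function testIntegration(
  integrationId: string,
  post: HttpPost,
): Promise<ConnectionTestResult> {
  const res = await post(`/api/v1/integrations/${integrationId}/test`);
  if (res.status !== 200) {
    // Treat any non-200 as a failed test rather than throwing.
    return { success: false, message: `HTTP ${res.status}` };
  }
  return res.body as ConnectionTestResult;
}
```

Injecting the transport also makes the helper trivial to exercise against a stub in tests.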
203
docs/modules/release-orchestrator/modules/overview.md
Normal file
@@ -0,0 +1,203 @@
# Module Landscape Overview

The Stella Ops Suite comprises existing modules (vulnerability scanning) and new modules (release orchestration). Modules are organized into **themes** (functional areas).

## Architecture Diagram

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                                STELLA OPS SUITE                                 │
│                                                                                 │
│  ┌───────────────────────────────────────────────────────────────────────────┐ │
│  │                      EXISTING THEMES (Vulnerability)                      │ │
│  │                                                                           │ │
│  │  INGEST        VEXOPS        REASON          SCANENG         EVIDENCE     │ │
│  │  ├─concelier   ├─excititor   ├─policy        ├─scanner       ├─locker     │ │
│  │  └─advisory-ai └─linksets    └─opa-runtime   ├─sbom-gen      ├─export     │ │
│  │                                              └─reachability  └─timeline   │ │
│  │                                                                           │ │
│  │  RUNTIME       JOBCTRL        OBSERVE        REPLAY          DEVEXP       │ │
│  │  ├─signals     ├─scheduler    ├─notifier     └─replay-core   ├─cli        │ │
│  │  ├─graph       ├─orchestrator └─telemetry                    ├─web-ui     │ │
│  │  └─zastava     └─task-runner                                 └─sdk        │ │
│  └───────────────────────────────────────────────────────────────────────────┘ │
│                                                                                 │
│  ┌───────────────────────────────────────────────────────────────────────────┐ │
│  │                    NEW THEMES (Release Orchestration)                     │ │
│  │                                                                           │ │
│  │  INTHUB (Integration Hub)                                                 │ │
│  │  ├─integration-manager   Central registry of configured integrations     │ │
│  │  ├─connection-profiles   Default settings + credential management        │ │
│  │  ├─connector-runtime     Plugin connector execution environment          │ │
│  │  └─doctor-checks         Integration health diagnostics                  │ │
│  │                                                                           │ │
│  │  ENVMGR (Environment & Inventory)                                         │ │
│  │  ├─environment-manager   Environment CRUD, ordering, config              │ │
│  │  ├─target-registry       Deployment targets (hosts/services)             │ │
│  │  ├─agent-manager         Agent registration, health, capabilities        │ │
│  │  └─inventory-sync        Drift detection, state reconciliation           │ │
│  │                                                                           │ │
│  │  RELMAN (Release Management)                                              │ │
│  │  ├─component-registry    Image repos → components mapping                │ │
│  │  ├─version-manager       Tag/digest → semver mapping                     │ │
│  │  ├─release-manager       Release bundle lifecycle                        │ │
│  │  └─release-catalog       Release history, search, compare                │ │
│  │                                                                           │ │
│  │  WORKFL (Workflow Engine)                                                 │ │
│  │  ├─workflow-designer     Template creation, step graph editor            │ │
│  │  ├─workflow-engine       DAG execution, state machine                    │ │
│  │  ├─step-executor         Step dispatch, retry, timeout                   │ │
│  │  └─step-registry         Built-in + plugin-provided steps                │ │
│  │                                                                           │ │
│  │  PROMOT (Promotion & Approval)                                            │ │
│  │  ├─promotion-manager     Promotion request lifecycle                     │ │
│  │  ├─approval-gateway      Approval collection, SoD enforcement            │ │
│  │  ├─decision-engine       Gate evaluation, policy integration             │ │
│  │  └─gate-registry         Built-in + custom gates                         │ │
│  │                                                                           │ │
│  │  DEPLOY (Deployment Execution)                                            │ │
│  │  ├─deploy-orchestrator   Deployment job coordination                     │ │
│  │  ├─target-executor       Target-specific deployment logic                │ │
│  │  ├─runner-executor       Script/hook execution sandbox                   │ │
│  │  ├─artifact-generator    Compose/script artifact generation              │ │
│  │  └─rollback-manager      Rollback orchestration                          │ │
│  │                                                                           │ │
│  │  AGENTS (Deployment Agents)                                               │ │
│  │  ├─agent-core            Shared agent runtime                            │ │
│  │  ├─agent-docker          Docker host agent                               │ │
│  │  ├─agent-compose         Docker Compose agent                            │ │
│  │  ├─agent-ssh             SSH remote executor                             │ │
│  │  ├─agent-winrm           WinRM remote executor                           │ │
│  │  ├─agent-ecs             AWS ECS agent                                   │ │
│  │  └─agent-nomad           HashiCorp Nomad agent                           │ │
│  │                                                                           │ │
│  │  PROGDL (Progressive Delivery)                                            │ │
│  │  ├─ab-manager            A/B release coordination                        │ │
│  │  ├─traffic-router        Router plugin orchestration                     │ │
│  │  ├─canary-controller     Canary ramp automation                          │ │
│  │  └─rollout-strategy      Strategy templates                              │ │
│  │                                                                           │ │
│  │  RELEVI (Release Evidence)                                                │ │
│  │  ├─evidence-collector    Evidence aggregation                            │ │
│  │  ├─evidence-signer       Cryptographic signing                           │ │
│  │  ├─sticker-writer        Version sticker generation                      │ │
│  │  └─audit-exporter        Compliance report generation                    │ │
│  │                                                                           │ │
│  │  PLUGIN (Plugin Infrastructure)                                           │ │
│  │  ├─plugin-registry       Plugin discovery, versioning                    │ │
│  │  ├─plugin-loader         Plugin lifecycle management                     │ │
│  │  ├─plugin-sandbox        Isolation, resource limits                      │ │
│  │  └─plugin-sdk            SDK for plugin development                      │ │
│  └───────────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────────┘
```

## Theme Summary

### Existing Themes (Vulnerability Scanning)

| Theme | Purpose | Key Modules |
|-------|---------|-------------|
| **INGEST** | Advisory ingestion | concelier, advisory-ai |
| **VEXOPS** | VEX document handling | excititor, linksets |
| **REASON** | Policy and decisioning | policy, opa-runtime |
| **SCANENG** | Scanning and SBOM | scanner, sbom-gen, reachability |
| **EVIDENCE** | Evidence and attestation | locker, export, timeline |
| **RUNTIME** | Runtime signals | signals, graph, zastava |
| **JOBCTRL** | Job orchestration | scheduler, orchestrator, task-runner |
| **OBSERVE** | Observability | notifier, telemetry |
| **REPLAY** | Deterministic replay | replay-core |
| **DEVEXP** | Developer experience | cli, web-ui, sdk |

### New Themes (Release Orchestration)

| Theme | Purpose | Key Modules | Documentation |
|-------|---------|-------------|---------------|
| **INTHUB** | Integration hub | integration-manager, connection-profiles, connector-runtime, doctor-checks | [Details](integration-hub.md) |
| **ENVMGR** | Environment & inventory | environment-manager, target-registry, agent-manager, inventory-sync | [Details](environment-manager.md) |
| **RELMAN** | Release management | component-registry, version-manager, release-manager, release-catalog | [Details](release-manager.md) |
| **WORKFL** | Workflow engine | workflow-designer, workflow-engine, step-executor, step-registry | [Details](workflow-engine.md) |
| **PROMOT** | Promotion & approval | promotion-manager, approval-gateway, decision-engine, gate-registry | [Details](promotion-manager.md) |
| **DEPLOY** | Deployment execution | deploy-orchestrator, target-executor, runner-executor, artifact-generator, rollback-manager | [Details](deploy-orchestrator.md) |
| **AGENTS** | Deployment agents | agent-core, agent-docker, agent-compose, agent-ssh, agent-winrm, agent-ecs, agent-nomad | [Details](agents.md) |
| **PROGDL** | Progressive delivery | ab-manager, traffic-router, canary-controller, rollout-strategy | [Details](progressive-delivery.md) |
| **RELEVI** | Release evidence | evidence-collector, evidence-signer, sticker-writer, audit-exporter | [Details](evidence.md) |
| **PLUGIN** | Plugin infrastructure | plugin-registry, plugin-loader, plugin-sandbox, plugin-sdk | [Details](plugin-system.md) |

## Module Dependencies

```
                     ┌──────────────┐
                     │  AUTHORITY   │
                     └──────┬───────┘
                            │
         ┌──────────────────┼──────────────────┐
         │                  │                  │
         ▼                  ▼                  ▼
 ┌───────────────┐  ┌───────────────┐  ┌───────────────┐
 │    INTHUB     │  │    ENVMGR     │  │    PLUGIN     │
 │ (Integrations)│  │ (Environments)│  │   (Plugins)   │
 └───────┬───────┘  └───────┬───────┘  └───────┬───────┘
         │                  │                  │
         └──────────┬───────┴──────────────────┘
                    │
                    ▼
            ┌───────────────┐
            │    RELMAN     │
            │  (Releases)   │
            └───────┬───────┘
                    │
                    ▼
            ┌───────────────┐
            │    WORKFL     │
            │  (Workflows)  │
            └───────┬───────┘
                    │
         ┌──────────┴──────────┐
         │                     │
         ▼                     ▼
 ┌───────────────┐     ┌───────────────┐
 │    PROMOT     │     │    DEPLOY     │
 │  (Promotion)  │     │ (Deployment)  │
 └───────┬───────┘     └───────┬───────┘
         │                     │
         │                     ▼
         │             ┌───────────────┐
         │             │    AGENTS     │
         │             │   (Agents)    │
         │             └───────┬───────┘
         │                     │
         └──────────┬──────────┘
                    │
                    ▼
            ┌───────────────┐
            │    RELEVI     │
            │  (Evidence)   │
            └───────────────┘
```

## Communication Patterns

| Pattern | Usage |
|---------|-------|
| **Synchronous API** | User-initiated operations (CRUD, queries) |
| **Event Bus** | Cross-module notifications (domain events) |
| **Task Queue** | Long-running operations (deployments, syncs) |
| **WebSocket/SSE** | Real-time UI updates |
| **gRPC Streams** | Agent communication |
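For the Event Bus pattern, cross-module notifications carry domain events whose names follow the `module.action` convention used throughout this document (e.g. `plugin.loaded`). A minimal in-process sketch of that pattern; the `EventBus` class itself is illustrative, not the actual messaging layer:

```typescript
// Minimal in-process event bus sketch for cross-module domain events.
// Event names follow the document's "module.action" convention; the bus
// implementation is illustrative only.
type Handler<T> = (payload: T) => void;

class EventBus {
  private handlers = new Map<string, Handler<unknown>[]>();

  subscribe<T>(event: string, handler: Handler<T>): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler as Handler<unknown>);
    this.handlers.set(event, list);
  }

  publish<T>(event: string, payload: T): void {
    // Deliver to every subscriber of this event name, in registration order.
    for (const handler of this.handlers.get(event) ?? []) {
      handler(payload);
    }
  }
}
```

A production bus would add durability and cross-process delivery; the point here is only the decoupling: the publisher of `plugin.loaded` needs no knowledge of which themes consume it.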

## Database Schema Organization

Each theme owns its tables within the `release` PostgreSQL schema:

| Table | Owner Theme |
|-------|-------------|
| `release.integrations` | INTHUB |
| `release.environments` | ENVMGR |
| `release.components` | RELMAN |
| `release.workflows` | WORKFL |
| `release.promotions` | PROMOT |
| `release.deployments` | DEPLOY |
| `release.agents` | AGENTS |
| `release.evidence` | RELEVI |
| `release.plugins` | PLUGIN |
629
docs/modules/release-orchestrator/modules/plugin-system.md
Normal file
@@ -0,0 +1,629 @@
# PLUGIN: Plugin Infrastructure

**Purpose**: Extensible plugin system for integrations, steps, and custom functionality.

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                            PLUGIN ARCHITECTURE                              │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                          PLUGIN REGISTRY                            │   │
│  │                                                                     │   │
│  │  - Plugin discovery and versioning                                  │   │
│  │  - Manifest validation                                              │   │
│  │  - Dependency resolution                                            │   │
│  └──────────────────────────────┬──────────────────────────────────────┘   │
│                                 │                                           │
│                                 ▼                                           │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                           PLUGIN LOADER                             │   │
│  │                                                                     │   │
│  │  - Lifecycle management (load, start, stop, unload)                 │   │
│  │  - Health monitoring                                                │   │
│  │  - Hot reload support                                               │   │
│  └──────────────────────────────┬──────────────────────────────────────┘   │
│                                 │                                           │
│                                 ▼                                           │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                           PLUGIN SANDBOX                            │   │
│  │                                                                     │   │
│  │  - Process isolation                                                │   │
│  │  - Resource limits (CPU, memory, network)                           │   │
│  │  - Capability enforcement                                           │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│  Plugin Types:                                                              │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐       │
│  │  Connector   │ │    Step      │ │    Gate      │ │    Agent     │       │
│  │   Plugins    │ │  Providers   │ │  Providers   │ │   Plugins    │       │
│  └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘       │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Modules

### Module: `plugin-registry`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Plugin discovery; versioning; manifest management |
| **Data Entities** | `Plugin`, `PluginManifest`, `PluginVersion` |
| **Events Produced** | `plugin.discovered`, `plugin.registered`, `plugin.unregistered` |

**Plugin Entity**:
```typescript
interface Plugin {
  id: UUID;
  pluginId: string;            // "com.example.my-connector"
  version: string;             // "1.2.3"
  vendor: string;
  license: string;
  manifest: PluginManifest;
  status: PluginStatus;
  entrypoint: string;          // Path to plugin executable/module
  lastHealthCheck: DateTime;
  healthMessage: string | null;
  installedAt: DateTime;
  updatedAt: DateTime;
}

type PluginStatus =
  | "discovered"   // Found but not loaded
  | "loaded"       // Loaded but not active
  | "active"       // Running and healthy
  | "stopped"      // Manually stopped
  | "failed"       // Failed to load or crashed
  | "degraded";    // Running but with issues
```

---

### Module: `plugin-loader`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Plugin lifecycle management |
| **Dependencies** | `plugin-registry`, `plugin-sandbox` |
| **Events Produced** | `plugin.loaded`, `plugin.started`, `plugin.stopped`, `plugin.failed` |

**Plugin Lifecycle**:
```
┌──────────────┐
│  DISCOVERED  │ ──── Plugin found in registry
└──────┬───────┘
       │ load()
       ▼
┌──────────────┐
│    LOADED    │ ──── Plugin validated and prepared
└──────┬───────┘
       │ start()
       ▼
┌──────────────┐          ┌──────────────┐
│    ACTIVE    │ ◄──────► │   DEGRADED   │ ◄── Health issues
└──────┬───────┘          └──────┬───────┘
       │ stop()                  │ manual stop
       ▼                         │
┌──────────────┐                 │
│   STOPPED    │ ◄───────────────┘
└──────┬───────┘
       │ unload()
       ▼
┌──────────────┐
│   UNLOADED   │
└──────────────┘
```
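The lifecycle diagram can be read as a transition table, which is how a loader would typically reject invalid requests (e.g. starting a plugin that was never loaded). A sketch of that table in TypeScript; the states mirror the diagram, while the validator function is illustrative:

```typescript
// Transition table derived from the plugin lifecycle diagram above.
// `canTransition` is an illustrative helper, not part of the real loader.
type PluginState =
  | "discovered" | "loaded" | "active" | "degraded" | "stopped" | "unloaded";

const allowedTransitions: Record<PluginState, PluginState[]> = {
  discovered: ["loaded"],             // load()
  loaded: ["active"],                 // start()
  active: ["degraded", "stopped"],    // health issues / stop()
  degraded: ["active", "stopped"],    // recovery / manual stop
  stopped: ["unloaded"],              // unload()
  unloaded: [],                       // terminal
};

function canTransition(from: PluginState, to: PluginState): boolean {
  return allowedTransitions[from].includes(to);
}
```

Encoding the state machine as data keeps the loader's validation logic in one place and makes the diagram and the code easy to cross-check.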

**Lifecycle Operations**:
```typescript
interface PluginLoader {
  // Discovery
  discover(): Promise<Plugin[]>;
  refresh(): Promise<void>;

  // Lifecycle
  load(pluginId: string): Promise<Plugin>;
  start(pluginId: string): Promise<void>;
  stop(pluginId: string): Promise<void>;
  unload(pluginId: string): Promise<void>;
  restart(pluginId: string): Promise<void>;

  // Health
  checkHealth(pluginId: string): Promise<HealthStatus>;
  getStatus(pluginId: string): Promise<PluginStatus>;

  // Hot reload
  reload(pluginId: string): Promise<void>;
}
```

---

### Module: `plugin-sandbox`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Isolation; resource limits; security |
| **Enforcement** | Process isolation, capability-based security |

**Sandbox Configuration**:
```typescript
interface SandboxConfig {
  // Process isolation
  processIsolation: boolean;        // Run in separate process
  containerIsolation: boolean;      // Run in container

  // Resource limits
  resourceLimits: {
    maxMemoryMb: number;            // Memory limit
    maxCpuPercent: number;          // CPU limit
    maxDiskMb: number;              // Disk quota
    maxNetworkBandwidth: number;    // Network bandwidth limit
  };

  // Network restrictions
  networkPolicy: {
    allowedHosts: string[];         // Allowed outbound hosts
    blockedHosts: string[];         // Blocked hosts
    allowOutbound: boolean;         // Allow any outbound
  };

  // Filesystem restrictions
  filesystemPolicy: {
    readOnlyPaths: string[];
    writablePaths: string[];
    blockedPaths: string[];
  };

  // Timeouts
  timeouts: {
    initializationMs: number;
    operationMs: number;
    shutdownMs: number;
  };
}
```
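The `networkPolicy` fields interact, so an enforcement sketch helps pin down precedence. The ordering chosen here (explicit block beats explicit allow, which beats the `allowOutbound` default) is an assumption for illustration, not taken from the spec:

```typescript
// Illustrative enforcement of SandboxConfig.networkPolicy.
// Precedence (blocked > allowed > allowOutbound) is an assumption.
interface NetworkPolicy {
  allowedHosts: string[];
  blockedHosts: string[];
  allowOutbound: boolean;
}

function isHostAllowed(policy: NetworkPolicy, host: string): boolean {
  if (policy.blockedHosts.includes(host)) return false; // explicit block always wins
  if (policy.allowedHosts.includes(host)) return true;  // explicit allow
  return policy.allowOutbound;                          // fall back to the default
}
```

Making the precedence explicit matters: a plugin granted broad `allowOutbound` access can still be denied specific hosts via `blockedHosts`.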

**Capability Enforcement**:
```typescript
interface PluginCapabilities {
  // Integration capabilities
  integrations: {
    scm: boolean;
    ci: boolean;
    registry: boolean;
    vault: boolean;
    router: boolean;
  };

  // Step capabilities
  steps: {
    deploy: boolean;
    gate: boolean;
    notify: boolean;
    custom: boolean;
  };

  // System capabilities
  system: {
    network: boolean;
    filesystem: boolean;
    secrets: boolean;
    database: boolean;
  };
}
```

---

### Module: `plugin-sdk`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | SDK for plugin development |
| **Languages** | C#, TypeScript, Go |

**Plugin SDK Interface**:
```typescript
// Base plugin interface
interface StellaPlugin {
  // Lifecycle
  initialize(config: PluginConfig): Promise<void>;
  start(): Promise<void>;
  stop(): Promise<void>;
  dispose(): Promise<void>;

  // Health
  getHealth(): Promise<HealthStatus>;

  // Metadata
  getManifest(): PluginManifest;
}

// Connector plugin interface
interface ConnectorPlugin extends StellaPlugin {
  createConnector(config: ConnectorConfig): Promise<Connector>;
}

// Step provider plugin interface
interface StepProviderPlugin extends StellaPlugin {
  getStepTypes(): StepType[];
  executeStep(
    stepType: string,
    config: StepConfig,
    inputs: StepInputs,
    context: StepContext
  ): AsyncGenerator<StepEvent>;
}

// Gate provider plugin interface
interface GateProviderPlugin extends StellaPlugin {
  getGateTypes(): GateType[];
  evaluateGate(
    gateType: string,
    config: GateConfig,
    context: GateContext
  ): Promise<GateResult>;
}
```
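To show what the base interface asks of a plugin author, here is a minimal no-op plugin against a pared-down version of `StellaPlugin`. The local type aliases stand in for the real SDK types (`PluginConfig`, `HealthStatus`, `PluginManifest`), which are not reproduced here:

```typescript
// Minimal no-op plugin sketch. The local aliases below stand in for the
// real SDK types so the example is self-contained.
type HealthStatus = { status: "healthy" | "unhealthy"; message?: string };
type PluginManifest = { id: string; version: string };

interface StellaPlugin {
  initialize(config: Record<string, unknown>): Promise<void>;
  start(): Promise<void>;
  stop(): Promise<void>;
  getHealth(): Promise<HealthStatus>;
  getManifest(): PluginManifest;
}

class NoopPlugin implements StellaPlugin {
  private running = false;

  async initialize(_config: Record<string, unknown>): Promise<void> {
    // A real plugin would validate config against its declared schema here.
  }

  async start(): Promise<void> { this.running = true; }
  async stop(): Promise<void> { this.running = false; }

  async getHealth(): Promise<HealthStatus> {
    return this.running
      ? { status: "healthy" }
      : { status: "unhealthy", message: "not started" };
  }

  getManifest(): PluginManifest {
    return { id: "com.example.noop", version: "0.0.1" };
  }
}
```

Even this trivial implementation shows the contract the loader relies on: health reflects lifecycle state, and the manifest is available before the plugin is started.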

---

## Three-Surface Plugin Model

Plugins contribute to the system through three distinct surfaces:

### 1. Manifest Surface (Static)

The plugin manifest declares:
- Plugin identity and version
- Required capabilities
- Provided integrations/steps/gates
- Configuration schema
- UI components (optional)

```yaml
# plugin.stella.yaml
plugin:
  id: "com.example.jenkins-connector"
  version: "1.0.0"
  vendor: "Example Corp"
  license: "Apache-2.0"
  description: "Jenkins CI integration for Stella Ops"

capabilities:
  required:
    - network
  optional:
    - secrets

provides:
  integrations:
    - type: "ci.jenkins"
      displayName: "Jenkins"
      configSchema: "./schemas/jenkins-config.json"
      capabilities:
        - "pipelines"
        - "builds"
        - "artifacts"

  steps:
    - type: "jenkins-trigger"
      displayName: "Trigger Jenkins Build"
      category: "integration"
      configSchema: "./schemas/jenkins-trigger-config.json"
      inputSchema: "./schemas/jenkins-trigger-input.json"
      outputSchema: "./schemas/jenkins-trigger-output.json"

ui:
  configScreen: "./ui/config.html"
  icon: "./assets/jenkins-icon.svg"

dependencies:
  stellaCore: ">=1.0.0"
```

### 2. Connector Runtime Surface (Dynamic)

Plugins implement connector interfaces for runtime operations:

```typescript
// Jenkins connector implementation
class JenkinsConnector implements CIConnector {
  private client: JenkinsClient;

  async initialize(config: ConnectorConfig, secrets: SecretHandle[]): Promise<void> {
    // getSecret resolves a SecretHandle through the SDK's controlled injection
    const apiToken = await this.getSecret(secrets, "api_token");
    this.client = new JenkinsClient({
      baseUrl: config.endpoint,
      username: config.username,
      apiToken: apiToken,
    });
  }

  async testConnection(): Promise<ConnectionTestResult> {
    try {
      // Fetching the CSRF crumb doubles as an authentication check
      await this.client.getCrumb();
      return { success: true, message: "Connected to Jenkins" };
    } catch (error) {
      return { success: false, message: error.message };
    }
  }

  async listPipelines(): Promise<PipelineInfo[]> {
    const jobs = await this.client.getJobs();
    return jobs.map(job => ({
      id: job.name,
      name: job.displayName,
      url: job.url,
      lastBuild: job.lastBuild?.number,
    }));
  }

  async triggerPipeline(pipelineId: string, params: object): Promise<PipelineRun> {
    const queueItem = await this.client.build(pipelineId, params);
    return {
      id: queueItem.id.toString(),
      pipelineId,
      status: "queued",
      startedAt: new Date(),
    };
  }

  async getPipelineRun(runId: string): Promise<PipelineRun> {
    const build = await this.client.getBuild(runId);
    return {
      id: build.number.toString(),
      pipelineId: build.job,
      status: this.mapStatus(build.result),
      startedAt: new Date(build.timestamp),
      completedAt: build.result ? new Date(build.timestamp + build.duration) : null,
    };
  }
}
```

### 3. Step Provider Surface (Execution)

Plugins implement step execution logic:

```typescript
// Jenkins trigger step implementation
class JenkinsTriggerStep implements StepExecutor {
  async *execute(
    config: StepConfig,
    inputs: StepInputs,
    context: StepContext
  ): AsyncGenerator<StepEvent> {
    const connector = await context.getConnector<JenkinsConnector>(config.integrationId);

    yield { type: "log", line: `Triggering Jenkins job: ${config.jobName}` };

    // Trigger build
    const run = await connector.triggerPipeline(config.jobName, inputs.parameters);
    yield { type: "output", name: "buildId", value: run.id };
    yield { type: "log", line: `Build queued: ${run.id}` };

    // Wait for completion if configured
    if (config.waitForCompletion) {
      yield { type: "log", line: "Waiting for build to complete..." };

      while (true) {
        const status = await connector.getPipelineRun(run.id);

        if (status.status === "succeeded") {
          yield { type: "output", name: "status", value: "succeeded" };
          yield { type: "result", success: true };
          return;
        }

        if (status.status === "failed") {
          yield { type: "output", name: "status", value: "failed" };
          yield { type: "result", success: false, message: "Build failed" };
          return;
        }

        yield { type: "progress", progress: 50, message: `Build running: ${status.status}` };
        // sleep(ms) is a promise-based delay helper provided by the SDK
        await sleep(config.pollIntervalSeconds * 1000);
      }
    }

    // Fire-and-forget mode: succeed as soon as the build is queued
    yield { type: "result", success: true };
  }
}
```

---

## Database Schema

```sql
-- Plugins
CREATE TABLE release.plugins (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    plugin_id VARCHAR(255) NOT NULL UNIQUE,
    version VARCHAR(50) NOT NULL,
    vendor VARCHAR(255) NOT NULL,
    license VARCHAR(100),
    manifest JSONB NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'discovered' CHECK (status IN (
        'discovered', 'loaded', 'active', 'stopped', 'failed', 'degraded'
    )),
    entrypoint VARCHAR(500) NOT NULL,
    last_health_check TIMESTAMPTZ,
    health_message TEXT,
    installed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_plugins_status ON release.plugins(status);

-- Plugin Instances (per-tenant configuration)
CREATE TABLE release.plugin_instances (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    plugin_id UUID NOT NULL REFERENCES release.plugins(id) ON DELETE CASCADE,
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    config JSONB NOT NULL DEFAULT '{}',
    enabled BOOLEAN NOT NULL DEFAULT TRUE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_plugin_instances_tenant ON release.plugin_instances(tenant_id);

-- Integration types (populated by plugins)
CREATE TABLE release.integration_types (
    id TEXT PRIMARY KEY,              -- "scm.github", "ci.jenkins"
    plugin_id UUID REFERENCES release.plugins(id),
    display_name TEXT NOT NULL,
    description TEXT,
    icon_url TEXT,
    config_schema JSONB NOT NULL,     -- JSON Schema for config
    capabilities TEXT[] NOT NULL,     -- ["repos", "webhooks", "status"]
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```
---

## API Endpoints

```yaml
# Plugin Registry
GET /api/v1/plugins
  Query: ?status={status}&capability={type}
  Response: Plugin[]

GET /api/v1/plugins/{id}
  Response: Plugin (with manifest)

POST /api/v1/plugins/{id}/enable
  Response: Plugin

POST /api/v1/plugins/{id}/disable
  Response: Plugin

GET /api/v1/plugins/{id}/health
  Response: { status, message, diagnostics[] }

# Plugin Instances (per-tenant config)
POST /api/v1/plugin-instances
  Body: { pluginId: UUID, config: object }
  Response: PluginInstance

GET /api/v1/plugin-instances
  Response: PluginInstance[]

PUT /api/v1/plugin-instances/{id}
  Body: { config: object, enabled: boolean }
  Response: PluginInstance

DELETE /api/v1/plugin-instances/{id}
  Response: { deleted: true }
```
---
|
||||
|
||||
## Plugin Security
|
||||
|
||||
### Capability Declaration
|
||||
|
||||
Plugins must declare all required capabilities in their manifest. The system enforces:
|
||||
|
||||
1. **Network Access**: Plugins can only access declared hosts
|
||||
2. **Secret Access**: Plugins receive secrets through controlled injection
|
||||
3. **Database Access**: No direct database access; API only
|
||||
4. **Filesystem Access**: Limited to declared paths

### Sandbox Enforcement

```typescript
// Plugin execution is sandboxed
class PluginSandbox {
  async execute<T>(
    plugin: Plugin,
    operation: () => Promise<T>
  ): Promise<T> {
    // 1. Verify capabilities
    this.verifyCapabilities(plugin);

    // 2. Set resource limits
    const limits = this.getResourceLimits(plugin);
    await this.applyLimits(limits);

    // 3. Create isolated context
    const context = await this.createIsolatedContext(plugin);

    try {
      // 4. Execute with timeout
      return await this.withTimeout(
        operation(),
        plugin.manifest.timeouts.operationMs
      );
    } catch (error) {
      // 5. Log and handle errors
      await this.handlePluginError(plugin, error);
      throw error;
    } finally {
      // 6. Cleanup
      await context.dispose();
    }
  }
}
```

### Plugin Failures Cannot Crash Core

```csharp
// Core orchestration is protected from plugin failures
public sealed class PromotionDecisionEngine
{
    public async Task<DecisionResult> EvaluateAsync(
        Promotion promotion,
        IReadOnlyList<IGateProvider> gates,
        CancellationToken ct)
    {
        var results = new List<GateResult>();

        foreach (var gate in gates)
        {
            try
            {
                // Plugin provides evaluation logic
                var result = await gate.EvaluateAsync(promotion, ct);
                results.Add(result);
            }
            catch (Exception ex)
            {
                // Plugin failure is logged but doesn't crash core
                _logger.LogError(ex, "Gate {GateType} failed", gate.Type);
                results.Add(new GateResult
                {
                    GateType = gate.Type,
                    Status = GateStatus.Failed,
                    Message = $"Gate evaluation failed: {ex.Message}",
                    IsBlocking = gate.IsBlocking,
                });
            }

            // Core decides how to aggregate (plugins cannot override).
            // Fail fast only when a *failed* blocking gate is encountered.
            var last = results[^1];
            if (last.Status == GateStatus.Failed && last.IsBlocking && _policy.FailFast)
                break;
        }

        // Core makes final decision
        return _decisionAggregator.Aggregate(results);
    }
}
```

---

## References

- [Module Overview](overview.md)
- [Integration Hub](integration-hub.md)
- [Workflow Engine](workflow-engine.md)
- [Connector Interface](../integrations/connectors.md)
@@ -0,0 +1,471 @@

# PROGDL: Progressive Delivery

**Purpose**: A/B releases, canary deployments, and traffic management.

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                     PROGRESSIVE DELIVERY ARCHITECTURE                       │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                        A/B RELEASE MANAGER                          │   │
│  │                                                                     │   │
│  │  - Create A/B release with variations                               │   │
│  │  - Manage traffic split configuration                               │   │
│  │  - Coordinate rollout stages                                        │   │
│  │  - Handle promotion/rollback                                        │   │
│  └──────────────────────────────┬──────────────────────────────────────┘   │
│                                 │                                           │
│              ┌──────────────────┴──────────────────┐                        │
│              │                                     │                        │
│              ▼                                     ▼                        │
│  ┌───────────────────────┐          ┌───────────────────────┐              │
│  │   TARGET-GROUP A/B    │          │   ROUTER-BASED A/B    │              │
│  │                       │          │                       │              │
│  │  Deploy to groups     │          │  Configure traffic    │              │
│  │  by labels/membership │          │  via load balancer    │              │
│  │                       │          │                       │              │
│  │  Good for:            │          │  Good for:            │              │
│  │  - Background workers │          │  - Web/API traffic    │              │
│  │  - Batch processors   │          │  - Customer-facing    │              │
│  │  - Internal services  │          │  - L7 routing         │              │
│  └───────────────────────┘          └───────────────────────┘              │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                         CANARY CONTROLLER                           │   │
│  │                                                                     │   │
│  │  - Execute rollout stages                                           │   │
│  │  - Monitor health metrics                                           │   │
│  │  - Auto-advance or pause                                            │   │
│  │  - Trigger rollback on failure                                      │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                     TRAFFIC ROUTER INTEGRATION                      │   │
│  │                                                                     │   │
│  │  Plugin-based integration with:                                     │   │
│  │  - Nginx (config generation + reload)                               │   │
│  │  - HAProxy (config generation + reload)                             │   │
│  │  - Traefik (dynamic config API)                                     │   │
│  │  - AWS ALB (target group weights)                                   │   │
│  │  - Custom (webhook)                                                 │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Modules

### Module: `ab-manager`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | A/B release lifecycle; variation management |
| **Dependencies** | `release-manager`, `environment-manager`, `deploy-orchestrator` |
| **Data Entities** | `ABRelease`, `Variation`, `TrafficSplit` |
| **Events Produced** | `ab.created`, `ab.started`, `ab.stage_advanced`, `ab.promoted`, `ab.rolled_back` |

**A/B Release Entity**:
```typescript
interface ABRelease {
  id: UUID;
  tenantId: UUID;
  environmentId: UUID;
  name: string;
  variations: Variation[];
  activeVariation: string;  // "A" or "B"
  trafficSplit: TrafficSplit;
  rolloutStrategy: RolloutStrategy;
  status: ABReleaseStatus;
  createdAt: DateTime;
  completedAt: DateTime | null;
  createdBy: UUID;
}

interface Variation {
  name: string;                // "A", "B"
  releaseId: UUID;
  targetGroupId: UUID | null;  // for target-group based A/B
  trafficPercentage: number;
  deploymentJobId: UUID | null;
}

interface TrafficSplit {
  type: "percentage" | "sticky" | "header";
  percentages: Record<string, number>;  // {"A": 90, "B": 10}
  stickyKey?: string;                   // cookie or header name
  headerMatch?: {                       // for header-based routing
    header: string;
    values: Record<string, string>;     // value -> variation
  };
}

type ABReleaseStatus =
  | "created"       // Configured, not started
  | "deploying"     // Deploying variations
  | "running"       // Active with traffic split
  | "promoting"     // Promoting winner to 100%
  | "completed"     // Successfully completed
  | "rolled_back";  // Rolled back to original
```
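
A percentage split with a sticky key can be resolved deterministically by hashing the key into the 0-99 range, so the same caller always lands on the same variation. This is a sketch under assumed semantics; `hashToPercent` and `pickVariation` are illustrative helpers, not part of the module contract:

```typescript
// Sketch: deterministic variation assignment for a percentage TrafficSplit.
type Split = { percentages: Record<string, number>; stickyKey?: string };

// Stable string hash mapped into 0..99.
function hashToPercent(key: string): number {
  let h = 0;
  for (const ch of key) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % 100;
}

// Walk the cumulative percentage bands and return the matching variation.
function pickVariation(split: Split, stickyValue: string): string {
  const point = hashToPercent(stickyValue);
  let cumulative = 0;
  for (const [name, pct] of Object.entries(split.percentages)) {
    cumulative += pct;
    if (point < cumulative) return name;
  }
  return Object.keys(split.percentages)[0]; // fallback if percentages < 100
}
```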

**A/B Release Models**:

| Model | Description | Use Case |
|-------|-------------|----------|
| **Target-Group A/B** | Deploy different releases to different target groups | Background workers, internal services |
| **Router-Based A/B** | Use load balancer to split traffic | Web/API traffic, customer-facing |
| **Hybrid A/B** | Combination of both | Complex deployments |

---

### Module: `traffic-router`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Router plugin orchestration; traffic shifting |
| **Dependencies** | `integration-manager`, `connector-runtime` |
| **Protocol** | Plugin-specific (API calls, config generation) |

**Router Connector Interface**:
```typescript
interface RouterConnector extends BaseConnector {
  // Traffic management
  configureRoute(config: RouteConfig): Promise<void>;
  getTrafficDistribution(): Promise<TrafficDistribution>;
  shiftTraffic(from: string, to: string, percentage: number): Promise<void>;

  // Configuration
  reloadConfig(): Promise<void>;
  validateConfig(config: string): Promise<ValidationResult>;
}

interface RouteConfig {
  upstream: string;
  backends: Array<{
    name: string;
    targets: string[];
    weight: number;
  }>;
  healthCheck?: {
    path: string;
    interval: number;
    timeout: number;
  };
}

interface TrafficDistribution {
  backends: Array<{
    name: string;
    weight: number;
    healthyTargets: number;
    totalTargets: number;
  }>;
  timestamp: DateTime;
}
```
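
A router plugin ultimately turns a canary percentage into backend weights in a `RouteConfig`. The sketch below assumes the `RouteConfig` shape above; `buildRoute` and the target addresses are illustrative, not a prescribed API:

```typescript
// Sketch: mapping a canary percentage onto baseline/canary backend weights.
interface Backend { name: string; targets: string[]; weight: number }

function buildRoute(
  upstream: string,
  baselineTargets: string[],
  canaryTargets: string[],
  canaryPct: number
): { upstream: string; backends: Backend[] } {
  if (canaryPct < 0 || canaryPct > 100) throw new Error("percentage out of range");
  return {
    upstream,
    backends: [
      // Weights always sum to 100, so shifting is a single-field change.
      { name: "baseline", targets: baselineTargets, weight: 100 - canaryPct },
      { name: "canary", targets: canaryTargets, weight: canaryPct },
    ],
  };
}
```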

**Router Plugins**:

| Plugin | Capabilities |
|--------|-------------|
| `router.nginx` | Config generation, reload via signal/API |
| `router.haproxy` | Config generation, reload via socket |
| `router.traefik` | Dynamic config API |
| `router.aws_alb` | Target group weights via AWS API |
| `router.custom` | Webhook-based custom integration |

---

### Module: `canary-controller`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Canary ramp automation; health monitoring |
| **Dependencies** | `ab-manager`, `traffic-router` |
| **Data Entities** | `CanaryStage`, `HealthResult` |
| **Events Produced** | `canary.stage_started`, `canary.stage_passed`, `canary.stage_failed` |

**Canary Stage Entity**:
```typescript
interface CanaryStage {
  id: UUID;
  abReleaseId: UUID;
  stageNumber: number;
  trafficPercentage: number;
  status: CanaryStageStatus;
  healthThreshold: number;    // Required health % to pass
  durationSeconds: number;    // How long to run stage
  requireApproval: boolean;   // Require manual approval
  startedAt: DateTime | null;
  completedAt: DateTime | null;
  healthResult: HealthResult | null;
}

type CanaryStageStatus =
  | "pending"
  | "running"
  | "succeeded"
  | "failed"
  | "skipped";

interface HealthResult {
  healthy: boolean;
  healthPercentage: number;
  metrics: {
    successRate: number;
    errorRate: number;
    latencyP50: number;
    latencyP99: number;
  };
  samples: number;
  evaluatedAt: DateTime;
}
```

**Canary Rollout Execution**:
```typescript
class CanaryController {
  async executeRollout(abRelease: ABRelease): Promise<void> {
    const stages = abRelease.rolloutStrategy.stages;

    for (const stage of stages) {
      this.log(`Starting canary stage ${stage.stageNumber}: ${stage.trafficPercentage}%`);

      // 1. Shift traffic to canary percentage
      await this.trafficRouter.shiftTraffic(
        abRelease.variations[0].name, // baseline
        abRelease.variations[1].name, // canary
        stage.trafficPercentage
      );

      // 2. Update stage status
      stage.status = "running";
      stage.startedAt = new Date();
      await this.save(stage);

      // 3. Wait for stage duration
      await this.waitForDuration(stage.durationSeconds);

      // 4. Evaluate health
      const healthResult = await this.evaluateHealth(abRelease, stage);
      stage.healthResult = healthResult;

      if (!healthResult.healthy || healthResult.healthPercentage < stage.healthThreshold) {
        stage.status = "failed";
        await this.save(stage);

        // Rollback
        await this.rollback(abRelease);
        throw new CanaryFailedError(`Stage ${stage.stageNumber} failed health check`);
      }

      // 5. Check if approval required
      if (stage.requireApproval) {
        await this.waitForApproval(abRelease, stage);
      }

      stage.status = "succeeded";
      stage.completedAt = new Date();
      await this.save(stage);

      // 6. Check for auto-advance
      if (!abRelease.rolloutStrategy.autoAdvance) {
        await this.waitForManualAdvance(abRelease);
      }
    }

    // All stages passed - promote canary to 100%
    await this.promote(abRelease, abRelease.variations[1].name);
  }

  private async evaluateHealth(abRelease: ABRelease, stage: CanaryStage): Promise<HealthResult> {
    // Collect metrics from targets
    const canaryVariation = abRelease.variations.find(v => v.name === "B");
    if (!canaryVariation) {
      throw new Error("Canary variation 'B' not found");
    }
    const targets = await this.getTargets(canaryVariation.targetGroupId);

    let healthyCount = 0;
    let totalLatency = 0;
    let errorCount = 0;

    for (const target of targets) {
      const health = await this.checkTargetHealth(target);
      if (health.healthy) healthyCount++;
      totalLatency += health.latencyMs;
      if (health.errorRate > 0) errorCount++;
    }

    return {
      healthy: healthyCount >= targets.length * (stage.healthThreshold / 100),
      healthPercentage: (healthyCount / targets.length) * 100,
      metrics: {
        successRate: ((targets.length - errorCount) / targets.length) * 100,
        errorRate: (errorCount / targets.length) * 100,
        latencyP50: totalLatency / targets.length,
        latencyP99: (totalLatency / targets.length) * 1.5, // simplified
      },
      samples: targets.length,
      evaluatedAt: new Date(),
    };
  }
}
```

---

### Module: `rollout-strategy`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Strategy templates; configuration |
| **Data Entities** | `RolloutStrategyTemplate` |

**Built-in Strategy Templates**:

| Template | Stages | Description |
|----------|--------|-------------|
| `canary-10-25-50-100` | 4 | Standard canary: 10%, 25%, 50%, 100% |
| `canary-1-5-10-50-100` | 5 | Conservative: 1%, 5%, 10%, 50%, 100% |
| `blue-green-instant` | 2 | Deploy 100% to green, instant switch |
| `blue-green-gradual` | 4 | Gradual shift: 25%, 50%, 75%, 100% |

**Rollout Strategy Definition**:
```typescript
interface RolloutStrategy {
  id: UUID;
  name: string;
  stages: Array<{
    trafficPercentage: number;
    durationSeconds: number;
    healthThreshold: number;
    requireApproval: boolean;
  }>;
  autoAdvance: boolean;
  rollbackOnFailure: boolean;
  healthCheckInterval: number;
}

// Example: Standard Canary
const standardCanary: RolloutStrategy = {
  name: "canary-10-25-50-100",
  stages: [
    { trafficPercentage: 10, durationSeconds: 300, healthThreshold: 95, requireApproval: false },
    { trafficPercentage: 25, durationSeconds: 600, healthThreshold: 95, requireApproval: false },
    { trafficPercentage: 50, durationSeconds: 900, healthThreshold: 95, requireApproval: true },
    { trafficPercentage: 100, durationSeconds: 0, healthThreshold: 95, requireApproval: false },
  ],
  autoAdvance: true,
  rollbackOnFailure: true,
  healthCheckInterval: 30,
};
```
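
Before a strategy is accepted, its stages can be validated against the invariants the built-in templates follow (traffic strictly increasing, final stage at 100%). This is a plausible sketch of such a check, not a stated requirement of the module:

```typescript
// Sketch: validating rollout stages against the invariants implied by the
// built-in templates. `validateStages` is an illustrative helper.
interface Stage {
  trafficPercentage: number;
  durationSeconds: number;
  healthThreshold: number;
  requireApproval: boolean;
}

function validateStages(stages: Stage[]): string[] {
  const errors: string[] = [];
  if (stages.length === 0) errors.push("at least one stage required");
  for (let i = 1; i < stages.length; i++) {
    // Traffic must ramp up monotonically from stage to stage.
    if (stages[i].trafficPercentage <= stages[i - 1].trafficPercentage)
      errors.push(`stage ${i + 1} does not increase traffic`);
  }
  if (stages.length > 0 && stages[stages.length - 1].trafficPercentage !== 100)
    errors.push("final stage must route 100% of traffic");
  return errors;
}
```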

---

## Database Schema

```sql
-- A/B Releases
CREATE TABLE release.ab_releases (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    environment_id UUID NOT NULL REFERENCES release.environments(id),
    name VARCHAR(255) NOT NULL,
    variations JSONB NOT NULL, -- [{name, releaseId, targetGroupId, trafficPercentage}]
    active_variation VARCHAR(50) NOT NULL DEFAULT 'A',
    traffic_split JSONB NOT NULL,
    rollout_strategy JSONB NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'created' CHECK (status IN (
        'created', 'deploying', 'running', 'promoting', 'completed', 'rolled_back'
    )),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    completed_at TIMESTAMPTZ,
    created_by UUID REFERENCES users(id)
);

CREATE INDEX idx_ab_releases_tenant_env ON release.ab_releases(tenant_id, environment_id);
CREATE INDEX idx_ab_releases_status ON release.ab_releases(status);

-- Canary Stages
CREATE TABLE release.canary_stages (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    ab_release_id UUID NOT NULL REFERENCES release.ab_releases(id) ON DELETE CASCADE,
    stage_number INTEGER NOT NULL,
    traffic_percentage INTEGER NOT NULL,
    status VARCHAR(50) NOT NULL DEFAULT 'pending' CHECK (status IN (
        'pending', 'running', 'succeeded', 'failed', 'skipped'
    )),
    health_threshold DECIMAL(5,2),
    duration_seconds INTEGER,
    require_approval BOOLEAN NOT NULL DEFAULT FALSE,
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    health_result JSONB,
    UNIQUE (ab_release_id, stage_number)
);
```

---

## API Endpoints

```yaml
# A/B Releases
POST /api/v1/ab-releases
  Body: {
    environmentId: UUID,
    name: string,
    variations: [
      { name: "A", releaseId: UUID, targetGroupId?: UUID },
      { name: "B", releaseId: UUID, targetGroupId?: UUID }
    ],
    trafficSplit: TrafficSplit,
    rolloutStrategy: RolloutStrategy
  }
  Response: ABRelease

GET /api/v1/ab-releases
  Query: ?environmentId={uuid}&status={status}
  Response: ABRelease[]

GET /api/v1/ab-releases/{id}
  Response: ABRelease (with stages)

POST /api/v1/ab-releases/{id}/start
  Response: ABRelease

POST /api/v1/ab-releases/{id}/advance
  Body: { stageNumber?: number }  # advance to next or specific stage
  Response: ABRelease

POST /api/v1/ab-releases/{id}/promote
  Body: { variation: "A" | "B" }  # promote to 100%
  Response: ABRelease

POST /api/v1/ab-releases/{id}/rollback
  Response: ABRelease

GET /api/v1/ab-releases/{id}/traffic
  Response: { currentSplit: TrafficDistribution, history: TrafficHistory[] }

GET /api/v1/ab-releases/{id}/health
  Response: { variations: [{ name, healthStatus, metrics }] }

# Rollout Strategies
GET /api/v1/rollout-strategies
  Response: RolloutStrategyTemplate[]

GET /api/v1/rollout-strategies/{id}
  Response: RolloutStrategyTemplate
```

---

## References

- [Module Overview](overview.md)
- [Deploy Orchestrator](deploy-orchestrator.md)
- [A/B Releases](../progressive-delivery/ab-releases.md)
- [Canary Controller](../progressive-delivery/canary.md)
- [Router Plugins](../progressive-delivery/routers.md)

433
docs/modules/release-orchestrator/modules/promotion-manager.md
Normal file
@@ -0,0 +1,433 @@

# PROMOT: Promotion & Approval Manager

**Purpose**: Manage promotion requests, approvals, gates, and decision records.

## Modules

### Module: `promotion-manager`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Promotion request lifecycle; state management |
| **Dependencies** | `release-manager`, `environment-manager`, `workflow-engine` |
| **Data Entities** | `Promotion`, `PromotionState` |
| **Events Produced** | `promotion.requested`, `promotion.approved`, `promotion.rejected`, `promotion.started`, `promotion.completed`, `promotion.failed`, `promotion.rolled_back` |

**Key Operations**:
```
RequestPromotion(releaseId, targetEnvironmentId, reason) → Promotion
ApprovePromotion(promotionId, comment) → Promotion
RejectPromotion(promotionId, reason) → Promotion
CancelPromotion(promotionId) → Promotion
GetPromotionStatus(promotionId) → PromotionState
GetDecisionRecord(promotionId) → DecisionRecord
```

**Promotion Entity**:
```typescript
interface Promotion {
  id: UUID;
  tenantId: UUID;
  releaseId: UUID;
  sourceEnvironmentId: UUID | null;  // null for first deployment
  targetEnvironmentId: UUID;
  status: PromotionStatus;
  decisionRecord: DecisionRecord;
  workflowRunId: UUID | null;
  requestedAt: DateTime;
  requestedBy: UUID;
  requestReason: string;
  decidedAt: DateTime | null;
  startedAt: DateTime | null;
  completedAt: DateTime | null;
  evidencePacketId: UUID | null;
}

type PromotionStatus =
  | "pending_approval"  // Waiting for human approval
  | "pending_gate"      // Waiting for gate evaluation
  | "approved"          // Ready for deployment
  | "rejected"          // Blocked by approval or gate
  | "deploying"         // Deployment in progress
  | "deployed"          // Successfully deployed
  | "failed"            // Deployment failed
  | "cancelled"         // User cancelled
  | "rolled_back";      // Rolled back after failure
```

---

### Module: `approval-gateway`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Approval collection; separation of duties enforcement |
| **Dependencies** | `authority` (for user/group lookup) |
| **Data Entities** | `Approval`, `ApprovalPolicy` |
| **Events Produced** | `approval.granted`, `approval.denied` |

**Approval Policy Entity**:
```typescript
interface ApprovalPolicy {
  id: UUID;
  tenantId: UUID;
  environmentId: UUID;
  requiredCount: number;               // Minimum approvals required
  requiredRoles: string[];             // At least one approver must have role
  requiredGroups: string[];            // At least one approver must be in group
  requireSeparationOfDuties: boolean;  // Requester cannot approve
  allowSelfApproval: boolean;          // Override SoD for specific users
  expirationMinutes: number;           // Approval expires after N minutes
}

interface Approval {
  id: UUID;
  tenantId: UUID;
  promotionId: UUID;
  approverId: UUID;
  action: "approved" | "rejected";
  comment: string;
  approvedAt: DateTime;
  approverRole: string;
  approverGroups: string[];
}
```

**Separation of Duties (SoD) Rules**:
1. Requester cannot approve their own promotion (if `requireSeparationOfDuties` is true)
2. Same user cannot approve twice
3. At least N different users must approve (based on `requiredCount`)
4. At least one approver must match `requiredRoles` if specified
5. At least one approver must be in `requiredGroups` if specified
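
Rules 1-4 can be sketched as a single predicate over the collected approvals. The types below mirror `ApprovalPolicy` and `Approval` but keep only the fields the rules need; `approvalsSatisfied` is an illustrative helper, not the module's API:

```typescript
// Sketch: evaluating the SoD rules against collected approvals.
interface Vote { approverId: string; roles: string[] }
interface Policy {
  requiredCount: number;
  requiredRoles: string[];
  requireSeparationOfDuties: boolean;
}

function approvalsSatisfied(policy: Policy, requesterId: string, votes: Vote[]): boolean {
  // Rule 1: the requester's own vote does not count under SoD.
  const eligible = policy.requireSeparationOfDuties
    ? votes.filter(v => v.approverId !== requesterId)
    : votes;
  // Rule 2: the same user cannot approve twice (deduplicate by approver).
  const distinct = new Map(eligible.map(v => [v.approverId, v]));
  // Rule 3: at least N different users must approve.
  if (distinct.size < policy.requiredCount) return false;
  // Rule 4: at least one approver must hold a required role, if any are set.
  if (policy.requiredRoles.length > 0) {
    const hasRole = [...distinct.values()].some(v =>
      v.roles.some(r => policy.requiredRoles.includes(r)));
    if (!hasRole) return false;
  }
  return true;
}
```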

---

### Module: `decision-engine`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Gate evaluation; policy integration; decision record generation |
| **Dependencies** | `gate-registry`, `policy` (OPA integration), `scanner` (security data) |
| **Data Entities** | `DecisionRecord`, `GateResult` |
| **Events Produced** | `decision.evaluated`, `decision.recorded` |

**Decision Record Structure**:
```typescript
interface DecisionRecord {
  promotionId: UUID;
  evaluatedAt: DateTime;
  decision: "allow" | "deny" | "pending";

  // What was evaluated
  release: {
    id: UUID;
    name: string;
    components: Array<{
      name: string;
      digest: string;
      semver: string;
    }>;
  };

  environment: {
    id: UUID;
    name: string;
    requiredApprovals: number;
    freezeWindow: boolean;
  };

  // Gate evaluation results
  gates: GateResult[];

  // Approval status
  approvalStatus: {
    required: number;
    received: number;
    approvers: Array<{
      userId: UUID;
      action: string;
      at: DateTime;
    }>;
    sodViolation: boolean;
  };

  // Reason for decision
  reasons: string[];

  // Hash of all inputs for replay verification
  inputsHash: string;
}

interface GateResult {
  gateType: string;
  gateName: string;
  status: "passed" | "failed" | "warning" | "skipped";
  message: string;
  details: Record<string, any>;
  evaluatedAt: DateTime;
  durationMs: number;
}
```

**Gate Evaluation Order**:
1. **Freeze Window Check**: Is the environment in a freeze window?
2. **Approval Check**: Have all required approvals been received?
3. **Security Gate**: Are there no blocking vulnerabilities?
4. **Custom Policy Gates**: Do all OPA policies pass?
5. **Integration Gates**: Do external system checks pass?
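
The ordering above can be sketched as a fail-fast loop over gate checks. The gate functions below are stand-ins; only the fixed evaluation order and the stop-at-first-failure behavior come from this section:

```typescript
// Sketch: evaluating gates in the fixed order, stopping at the first failure.
type Gate = { name: string; check: () => boolean };

function evaluateInOrder(gates: Gate[]): { decision: "allow" | "deny"; failedAt?: string } {
  for (const gate of gates) {
    // The first failing gate determines the outcome; later gates never run.
    if (!gate.check()) return { decision: "deny", failedAt: gate.name };
  }
  return { decision: "allow" };
}

const gates: Gate[] = [
  { name: "freeze-window", check: () => true },
  { name: "approval", check: () => true },
  { name: "security-scan", check: () => false }, // e.g. a blocking CVE was found
  { name: "custom-opa", check: () => true },
  { name: "webhook", check: () => true },
];
```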

---

### Module: `gate-registry`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Built-in + custom gate registration |
| **Dependencies** | `plugin-registry` |
| **Data Entities** | `GateDefinition`, `GateConfig` |

**Built-in Gates**:

| Gate Type | Description |
|-----------|-------------|
| `freeze-window` | Check if environment is in freeze |
| `approval` | Check if required approvals received |
| `security-scan` | Check for blocking vulnerabilities |
| `scan-freshness` | Check if scan is recent enough |
| `digest-verification` | Verify digests haven't changed |
| `environment-sequence` | Enforce promotion order |
| `custom-opa` | Custom OPA/Rego policy |
| `webhook` | External webhook gate |

**Gate Definition**:
```typescript
interface GateDefinition {
  type: string;
  displayName: string;
  description: string;
  configSchema: JSONSchema;
  evaluator: "builtin" | UUID;  // builtin or plugin ID
  blocking: boolean;            // Can block promotion
  cacheable: boolean;           // Can cache result
  cacheTtlSeconds: number;
}
```
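
The `cacheable`/`cacheTtlSeconds` fields imply a TTL cache for gate results. A minimal sketch, assuming wall-clock milliseconds as the time source; the class name is illustrative, not part of the module:

```typescript
// Sketch: a TTL cache for gate results, keyed by gate + evaluation inputs.
interface CachedResult<T> { value: T; storedAt: number }

class GateResultCache<T> {
  private store = new Map<string, CachedResult<T>>();

  get(key: string, ttlSeconds: number, nowMs: number): T | undefined {
    const hit = this.store.get(key);
    if (!hit) return undefined;
    // Entries older than the gate's declared TTL are evicted on read.
    if (nowMs - hit.storedAt > ttlSeconds * 1000) {
      this.store.delete(key);
      return undefined;
    }
    return hit.value;
  }

  put(key: string, value: T, nowMs: number): void {
    this.store.set(key, { value, storedAt: nowMs });
  }
}
```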

---

## Promotion State Machine

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                          PROMOTION STATE MACHINE                            │
│                                                                             │
│  ┌───────────────┐                                                          │
│  │   REQUESTED   │ ◄──── User requests promotion                            │
│  └───────┬───────┘                                                          │
│          │                                                                  │
│          ▼                                                                  │
│  ┌───────────────┐      ┌───────────────┐                                   │
│  │    PENDING    │─────►│   REJECTED    │ ◄──── Approver rejects            │
│  │   APPROVAL    │      └───────────────┘                                   │
│  └───────┬───────┘                                                          │
│          │ approval received                                                │
│          ▼                                                                  │
│  ┌───────────────┐      ┌───────────────┐                                   │
│  │    PENDING    │─────►│   REJECTED    │ ◄──── Gate fails                  │
│  │     GATE      │      └───────────────┘                                   │
│  └───────┬───────┘                                                          │
│          │ all gates pass                                                   │
│          ▼                                                                  │
│  ┌───────────────┐                                                          │
│  │   APPROVED    │ ◄──── Ready for deployment                               │
│  └───────┬───────┘                                                          │
│          │ workflow starts                                                  │
│          ▼                                                                  │
│  ┌───────────────┐      ┌───────────────┐      ┌───────────────┐            │
│  │   DEPLOYING   │─────►│    FAILED     │─────►│  ROLLED_BACK  │            │
│  └───────┬───────┘      └───────────────┘      └───────────────┘            │
│          │                                                                  │
│          │ deployment complete                                              │
│          ▼                                                                  │
│  ┌───────────────┐                                                          │
│  │   DEPLOYED    │ ◄──── Success!                                           │
│  └───────────────┘                                                          │
│                                                                             │
│  Additional transitions:                                                    │
│  - Any non-terminal → CANCELLED: user cancels                               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
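
The diagram can be restated as an explicit transition table over `PromotionStatus` values (the diagram's REQUESTED box corresponds to the initial `pending_approval` status). This is a sketch of the legal transitions as drawn, including the any-non-terminal-to-`cancelled` rule:

```typescript
// Sketch: the promotion state machine as an adjacency map of legal transitions.
const transitions: Record<string, string[]> = {
  pending_approval: ["pending_gate", "rejected", "cancelled"],
  pending_gate: ["approved", "rejected", "cancelled"],
  approved: ["deploying", "cancelled"],
  deploying: ["deployed", "failed", "cancelled"],
  failed: ["rolled_back"],
  // Terminal states allow no further transitions.
  deployed: [],
  rejected: [],
  cancelled: [],
  rolled_back: [],
};

function canTransition(from: string, to: string): boolean {
  return (transitions[from] ?? []).includes(to);
}
```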

---

## Database Schema

```sql
-- Promotions
CREATE TABLE release.promotions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    release_id UUID NOT NULL REFERENCES release.releases(id),
    source_environment_id UUID REFERENCES release.environments(id),
    target_environment_id UUID NOT NULL REFERENCES release.environments(id),
    status VARCHAR(50) NOT NULL DEFAULT 'pending_approval' CHECK (status IN (
        'pending_approval', 'pending_gate', 'approved', 'rejected',
        'deploying', 'deployed', 'failed', 'cancelled', 'rolled_back'
    )),
    decision_record JSONB,
    workflow_run_id UUID REFERENCES release.workflow_runs(id),
    requested_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    requested_by UUID NOT NULL REFERENCES users(id),
    request_reason TEXT,
    decided_at TIMESTAMPTZ,
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    evidence_packet_id UUID
);

CREATE INDEX idx_promotions_tenant ON release.promotions(tenant_id);
CREATE INDEX idx_promotions_release ON release.promotions(release_id);
CREATE INDEX idx_promotions_status ON release.promotions(status);
CREATE INDEX idx_promotions_target_env ON release.promotions(target_environment_id);

-- Approvals
CREATE TABLE release.approvals (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    promotion_id UUID NOT NULL REFERENCES release.promotions(id) ON DELETE CASCADE,
    approver_id UUID NOT NULL REFERENCES users(id),
    action VARCHAR(50) NOT NULL CHECK (action IN ('approved', 'rejected')),
    comment TEXT,
    approved_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    approver_role VARCHAR(255),
    approver_groups JSONB NOT NULL DEFAULT '[]'
);

CREATE INDEX idx_approvals_promotion ON release.approvals(promotion_id);
CREATE INDEX idx_approvals_approver ON release.approvals(approver_id);

-- Approval Policies
CREATE TABLE release.approval_policies (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    environment_id UUID NOT NULL REFERENCES release.environments(id) ON DELETE CASCADE,
    required_count INTEGER NOT NULL DEFAULT 1,
    required_roles JSONB NOT NULL DEFAULT '[]',
    required_groups JSONB NOT NULL DEFAULT '[]',
    require_sod BOOLEAN NOT NULL DEFAULT FALSE,
    allow_self_approval BOOLEAN NOT NULL DEFAULT FALSE,
    expiration_minutes INTEGER NOT NULL DEFAULT 1440,
    UNIQUE (tenant_id, environment_id)
);
```

---

## API Endpoints

```yaml
# Promotions
POST /api/v1/promotions
  Body: { releaseId, targetEnvironmentId, reason? }
  Response: Promotion

GET /api/v1/promotions
  Query: ?status={status}&releaseId={uuid}&environmentId={uuid}&page={n}
  Response: { data: Promotion[], meta: PaginationMeta }

GET /api/v1/promotions/{id}
  Response: Promotion (with decision record, approvals)

POST /api/v1/promotions/{id}/approve
  Body: { comment? }
  Response: Promotion

POST /api/v1/promotions/{id}/reject
  Body: { reason }
  Response: Promotion

POST /api/v1/promotions/{id}/cancel
  Response: Promotion

GET /api/v1/promotions/{id}/decision
  Response: DecisionRecord

GET /api/v1/promotions/{id}/approvals
  Response: Approval[]

GET /api/v1/promotions/{id}/evidence
  Response: EvidencePacket

# Gate Evaluation Preview
POST /api/v1/promotions/preview-gates
  Body: { releaseId, targetEnvironmentId }
  Response: { wouldPass: boolean, gates: GateResult[] }

# Approval Policies
POST   /api/v1/approval-policies
GET    /api/v1/approval-policies
GET    /api/v1/approval-policies/{id}
PUT    /api/v1/approval-policies/{id}
DELETE /api/v1/approval-policies/{id}

# Pending Approvals (for current user)
GET /api/v1/my/pending-approvals
  Response: Promotion[]
```

---

## Security Gate Integration

The security gate evaluates the release against vulnerability data from the Scanner module:
|
||||
|
||||
```typescript
interface SecurityGateConfig {
  blockOnCritical: boolean;         // Block if any critical severity
  blockOnHigh: boolean;             // Block if any high severity
  maxCritical: number;              // Max allowed critical (0 for strict)
  maxHigh: number;                  // Max allowed high
  requireFreshScan: boolean;        // Require scan within N hours
  scanFreshnessHours: number;       // How recent scan must be
  allowExceptions: boolean;         // Allow VEX exceptions
  requireVexJustification: boolean; // Require VEX for exceptions
}

interface SecurityGateResult {
  passed: boolean;
  summary: {
    critical: number;
    high: number;
    medium: number;
    low: number;
  };
  blocking: Array<{
    cve: string;
    severity: string;
    component: string;
    digest: string;
    fixAvailable: boolean;
  }>;
  exceptions: Array<{
    cve: string;
    vexStatus: string;
    justification: string;
  }>;
  scanAge: Array<{
    component: string;
    scannedAt: DateTime;
    ageHours: number;
    fresh: boolean;
  }>;
}
```
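
As an illustration of how such a config might be applied, here is a minimal Python sketch of the threshold and freshness checks (`GateConfig` and `evaluate_gate` are hypothetical names, not the Scanner API; VEX exception handling is omitted):

```python
# Illustrative sketch: apply a security-gate config to a scan summary.
# Field names mirror SecurityGateConfig above; the evaluation order
# (severity thresholds, then scan freshness) is an assumption.
from dataclasses import dataclass

@dataclass
class GateConfig:
    block_on_critical: bool = True
    block_on_high: bool = False
    max_critical: int = 0
    max_high: int = 5
    require_fresh_scan: bool = True
    scan_freshness_hours: int = 24

def evaluate_gate(cfg: GateConfig, summary: dict, scan_age_hours: float) -> dict:
    """Return a pass/fail verdict plus the reasons that blocked it."""
    reasons = []
    if cfg.block_on_critical and summary.get("critical", 0) > cfg.max_critical:
        reasons.append(f"critical count {summary['critical']} exceeds {cfg.max_critical}")
    if cfg.block_on_high and summary.get("high", 0) > cfg.max_high:
        reasons.append(f"high count {summary['high']} exceeds {cfg.max_high}")
    if cfg.require_fresh_scan and scan_age_hours > cfg.scan_freshness_hours:
        reasons.append(f"scan is {scan_age_hours:.0f}h old (limit {cfg.scan_freshness_hours}h)")
    return {"passed": not reasons, "reasons": reasons}

print(evaluate_gate(GateConfig(), {"critical": 1, "high": 2}, scan_age_hours=3.0))
```

Collecting every reason (rather than failing on the first) matches the `blocking` array in `SecurityGateResult`, which reports all offending findings at once.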

---

## References

- [Module Overview](overview.md)
- [Workflow Engine](workflow-engine.md)
- [Security Architecture](../security/overview.md)
- [API Documentation](../api/promotions.md)

406
docs/modules/release-orchestrator/modules/release-manager.md
Normal file
@@ -0,0 +1,406 @@

# RELMAN: Release Management

**Purpose**: Manage components, versions, and release bundles.

## Modules

### Module: `component-registry`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Map image repositories to logical components |
| **Dependencies** | `integration-manager` (for registry access) |
| **Data Entities** | `Component`, `ComponentVersion` |
| **Events Produced** | `component.created`, `component.updated`, `component.deleted` |

**Key Operations**:
```
CreateComponent(name, displayName, imageRepository, registryId) → Component
UpdateComponent(id, config) → Component
DeleteComponent(id) → void
SyncVersions(componentId, forceRefresh) → VersionMap[]
ListComponents(tenantId) → Component[]
```

**Component Entity**:
```typescript
interface Component {
  id: UUID;
  tenantId: UUID;
  name: string;                  // "api", "worker", "frontend"
  displayName: string;           // "API Service"
  imageRepository: string;       // "registry.example.com/myapp/api"
  registryIntegrationId: UUID;   // which registry integration
  versioningStrategy: VersionStrategy;
  deploymentTemplate: string;    // which workflow template to use
  defaultChannel: string;        // "stable", "beta"
  metadata: Record<string, string>;
}

interface VersionStrategy {
  type: "semver" | "date" | "sequential" | "manual";
  tagPattern?: string;           // regex for tag extraction
  semverExtract?: string;        // regex capture group
}
```

---

### Module: `version-manager`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Tag/digest mapping; version rules |
| **Dependencies** | `component-registry`, `connector-runtime` |
| **Data Entities** | `VersionMap`, `VersionRule`, `Channel` |
| **Events Produced** | `version.resolved`, `version.updated` |

**Version Resolution**:
```typescript
interface VersionMap {
  id: UUID;
  componentId: UUID;
  tag: string;               // "v2.3.1"
  digest: string;            // "sha256:abc123..."
  semver: string;            // "2.3.1"
  channel: string;           // "stable"
  prerelease: boolean;
  buildMetadata: string;
  resolvedAt: DateTime;
  source: "auto" | "manual";
}

interface VersionRule {
  id: UUID;
  componentId: UUID;
  pattern: string;           // "^v(\\d+\\.\\d+\\.\\d+)$"
  channel: string;           // "stable"
  prereleasePattern: string; // ".*-(alpha|beta|rc).*"
}
```

**Version Resolution Algorithm**:
1. Fetch tags from registry (via connector)
2. Apply version rules to extract semver
3. Resolve each tag to digest
4. Store in version map
5. Update channels ("latest stable", "latest beta")
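
The five steps can be sketched end to end with stubbed registry data (`resolve` and `latest_stable` are illustrative helpers following the `VersionRule` shape above, not actual module APIs):

```python
# Sketch of the resolution algorithm: classify tags with a rule, pair each
# with its digest, and derive the latest entry per channel. Registry data
# is stubbed; real resolution goes through the connector runtime.
import re

RULE = {"pattern": r"^v(\d+\.\d+\.\d+)", "channel": "stable",
        "prerelease_pattern": r".*-(alpha|beta|rc).*"}

def resolve(tags_to_digests: dict, rule: dict) -> list[dict]:
    version_map = []
    for tag, digest in tags_to_digests.items():          # step 1: fetched tags
        m = re.match(rule["pattern"], tag)               # step 2: apply rule
        if not m:
            continue
        version_map.append({
            "tag": tag,
            "digest": digest,                            # step 3: tag -> digest
            "semver": m.group(1),
            "prerelease": bool(re.match(rule["prerelease_pattern"], tag)),
            "channel": rule["channel"],
        })
    return version_map                                   # step 4: store

def latest_stable(version_map: list[dict]) -> dict:      # step 5: channels
    stable = [v for v in version_map if not v["prerelease"]]
    return max(stable, key=lambda v: tuple(int(p) for p in v["semver"].split(".")))

vm = resolve({"v2.3.1": "sha256:abc", "v2.10.0": "sha256:def",
              "v3.0.0-rc.1": "sha256:ffe", "latest": "sha256:aaa"}, RULE)
print(latest_stable(vm)["tag"])   # numeric compare picks v2.10.0 over v2.3.1
```

Comparing semvers as integer tuples rather than strings is what makes `2.10.0` sort above `2.3.1`.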

---

### Module: `release-manager`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Release bundle lifecycle; composition |
| **Dependencies** | `component-registry`, `version-manager` |
| **Data Entities** | `Release`, `ReleaseComponent` |
| **Events Produced** | `release.created`, `release.promoted`, `release.deprecated` |

**Release Entity**:
```typescript
interface Release {
  id: UUID;
  tenantId: UUID;
  name: string;                  // "myapp-v2.3.1"
  displayName: string;           // "MyApp 2.3.1"
  components: ReleaseComponent[];
  sourceRef: SourceReference;
  status: ReleaseStatus;
  createdAt: DateTime;
  createdBy: UUID;
  deployedEnvironments: UUID[];  // where currently deployed
  metadata: Record<string, string>;
}

interface ReleaseComponent {
  componentId: UUID;
  componentName: string;
  digest: string;                // sha256:...
  semver: string;                // resolved semver
  tag: string;                   // original tag (for display)
  role: "primary" | "sidecar" | "init" | "migration";
}

interface SourceReference {
  scmIntegrationId?: UUID;
  commitSha?: string;
  branch?: string;
  ciIntegrationId?: UUID;
  buildId?: string;
  pipelineUrl?: string;
}

type ReleaseStatus =
  | "draft"        // being composed
  | "ready"        // ready for promotion
  | "promoting"    // promotion in progress
  | "deployed"     // deployed to at least one env
  | "deprecated"   // marked as deprecated
  | "archived";    // no longer active
```

**Release Creation Modes**:

| Mode | Description |
|------|-------------|
| **Full Release** | All components, latest versions |
| **Partial Release** | Subset of components updated; others pinned from last deployment |
| **Pinned Release** | All versions explicitly specified |
| **Channel Release** | All components from a specific channel ("beta") |
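
The Partial Release mode above is essentially a map merge: components pinned from the source environment, overlaid by the explicitly updated ones. A hypothetical sketch (`compose_partial` and the simplified dicts are illustrative):

```python
# Sketch of "Partial Release" composition: take the components currently
# deployed in a pin-from environment, then overlay the updated subset.
# Data shapes are simplified from ReleaseComponent.

def compose_partial(pinned: list[dict], updates: list[dict]) -> list[dict]:
    """Updated components win; everything else stays pinned."""
    merged = {c["componentName"]: c for c in pinned}
    for c in updates:
        merged[c["componentName"]] = c
    return sorted(merged.values(), key=lambda c: c["componentName"])

pinned = [{"componentName": "api", "digest": "sha256:old-api"},
          {"componentName": "worker", "digest": "sha256:old-worker"}]
updates = [{"componentName": "api", "digest": "sha256:new-api"}]
print(compose_partial(pinned, updates))
```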

---

### Module: `release-catalog`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Release history, search, comparison |
| **Dependencies** | `release-manager` |

**Key Operations**:
```
SearchReleases(filter, pagination) → Release[]
CompareReleases(releaseA, releaseB) → ReleaseDiff
GetReleaseHistory(componentId) → Release[]
GetReleaseLineage(releaseId) → ReleaseLineage   // promotion path
```

**Release Comparison**:
```typescript
interface ReleaseDiff {
  releaseA: UUID;
  releaseB: UUID;
  added: ComponentDiff[];       // Components in B not in A
  removed: ComponentDiff[];     // Components in A not in B
  changed: ComponentChange[];   // Components with different versions
  unchanged: ComponentDiff[];   // Components with same version
}

interface ComponentChange {
  componentId: UUID;
  componentName: string;
  fromVersion: string;
  toVersion: string;
  fromDigest: string;
  toDigest: string;
}
```
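
`CompareReleases` reduces to bucketing components by name and digest. A simplified sketch mirroring `ReleaseDiff` (returning component names only, rather than full `ComponentDiff` records):

```python
# Sketch of release comparison: bucket the components of two releases
# into added/removed/changed/unchanged, keyed by name, compared by digest.

def diff_releases(a: list[dict], b: list[dict]) -> dict:
    by_a = {c["componentName"]: c for c in a}
    by_b = {c["componentName"]: c for c in b}
    common = by_a.keys() & by_b.keys()
    return {
        "added": sorted(set(by_b) - set(by_a)),      # in B, not in A
        "removed": sorted(set(by_a) - set(by_b)),    # in A, not in B
        "changed": sorted(n for n in common
                          if by_a[n]["digest"] != by_b[n]["digest"]),
        "unchanged": sorted(n for n in common
                            if by_a[n]["digest"] == by_b[n]["digest"]),
    }

rel_a = [{"componentName": "api", "digest": "sha256:1"},
         {"componentName": "worker", "digest": "sha256:2"}]
rel_b = [{"componentName": "api", "digest": "sha256:9"},
         {"componentName": "frontend", "digest": "sha256:3"}]
print(diff_releases(rel_a, rel_b))
```

Comparing by digest rather than by tag or semver keeps the diff consistent with the digest-first identity principle described later in this document.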

---

## Database Schema

```sql
-- Components
CREATE TABLE release.components (
    id                      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id               UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    name                    VARCHAR(255) NOT NULL,
    display_name            VARCHAR(255) NOT NULL,
    image_repository        VARCHAR(500) NOT NULL,
    registry_integration_id UUID REFERENCES release.integrations(id),
    versioning_strategy     JSONB NOT NULL DEFAULT '{"type": "semver"}',
    deployment_template     VARCHAR(255),
    default_channel         VARCHAR(50) NOT NULL DEFAULT 'stable',
    metadata                JSONB NOT NULL DEFAULT '{}',
    created_at              TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at              TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (tenant_id, name)
);

CREATE INDEX idx_components_tenant ON release.components(tenant_id);

-- Version Maps
CREATE TABLE release.version_maps (
    id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id      UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    component_id   UUID NOT NULL REFERENCES release.components(id) ON DELETE CASCADE,
    tag            VARCHAR(255) NOT NULL,
    digest         VARCHAR(100) NOT NULL,
    semver         VARCHAR(50),
    channel        VARCHAR(50) NOT NULL DEFAULT 'stable',
    prerelease     BOOLEAN NOT NULL DEFAULT FALSE,
    build_metadata VARCHAR(255),
    resolved_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    source         VARCHAR(50) NOT NULL DEFAULT 'auto',
    UNIQUE (tenant_id, component_id, digest)
);

CREATE INDEX idx_version_maps_component ON release.version_maps(component_id);
CREATE INDEX idx_version_maps_digest ON release.version_maps(digest);
CREATE INDEX idx_version_maps_semver ON release.version_maps(semver);

-- Releases
CREATE TABLE release.releases (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id    UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    name         VARCHAR(255) NOT NULL,
    display_name VARCHAR(255) NOT NULL,
    components   JSONB NOT NULL,  -- [{componentId, digest, semver, tag, role}]
    source_ref   JSONB,           -- {scmIntegrationId, commitSha, ciIntegrationId, buildId}
    status       VARCHAR(50) NOT NULL DEFAULT 'draft',
    metadata     JSONB NOT NULL DEFAULT '{}',
    created_at   TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at   TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by   UUID REFERENCES users(id),
    UNIQUE (tenant_id, name)
);

CREATE INDEX idx_releases_tenant ON release.releases(tenant_id);
CREATE INDEX idx_releases_status ON release.releases(status);
CREATE INDEX idx_releases_created ON release.releases(created_at DESC);

-- Release Environment State
CREATE TABLE release.release_environment_state (
    id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id      UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    environment_id UUID NOT NULL REFERENCES release.environments(id) ON DELETE CASCADE,
    release_id     UUID NOT NULL REFERENCES release.releases(id),
    status         VARCHAR(50) NOT NULL,
    deployed_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    deployed_by    UUID REFERENCES users(id),
    promotion_id   UUID,
    evidence_ref   VARCHAR(255),
    UNIQUE (tenant_id, environment_id)
);

CREATE INDEX idx_release_env_state_env ON release.release_environment_state(environment_id);
CREATE INDEX idx_release_env_state_release ON release.release_environment_state(release_id);
```

---

## API Endpoints

```yaml
# Components
POST /api/v1/components
  Body: { name, displayName, imageRepository, registryIntegrationId, versioningStrategy?, defaultChannel? }
  Response: Component

GET /api/v1/components
  Response: Component[]

GET /api/v1/components/{id}
  Response: Component

PUT /api/v1/components/{id}
  Response: Component

DELETE /api/v1/components/{id}
  Response: { deleted: true }

POST /api/v1/components/{id}/sync-versions
  Body: { forceRefresh?: boolean }
  Response: { synced: number, versions: VersionMap[] }

GET /api/v1/components/{id}/versions
  Query: ?channel={stable|beta}&limit={n}
  Response: VersionMap[]

# Version Maps
POST /api/v1/version-maps
  Body: { componentId, tag, semver, channel }   # manual version assignment
  Response: VersionMap

GET /api/v1/version-maps
  Query: ?componentId={uuid}&channel={channel}
  Response: VersionMap[]

# Releases
POST /api/v1/releases
  Body: {
    name: string,
    displayName?: string,
    components: [
      { componentId: UUID, version?: string, digest?: string, channel?: string }
    ],
    sourceRef?: SourceReference
  }
  Response: Release

GET /api/v1/releases
  Query: ?status={status}&componentId={uuid}&page={n}&pageSize={n}
  Response: { data: Release[], meta: PaginationMeta }

GET /api/v1/releases/{id}
  Response: Release (with full component details)

PUT /api/v1/releases/{id}
  Body: { displayName?, metadata?, status? }
  Response: Release

DELETE /api/v1/releases/{id}
  Response: { deleted: true }

GET /api/v1/releases/{id}/state
  Response: { environments: [{ environmentId, status, deployedAt }] }

POST /api/v1/releases/{id}/deprecate
  Response: Release

GET /api/v1/releases/{id}/compare/{otherId}
  Response: ReleaseDiff

# Quick release creation
POST /api/v1/releases/from-latest
  Body: {
    name: string,
    channel?: string,                  # default: stable
    componentIds?: UUID[],             # default: all
    pinFrom?: { environmentId: UUID }  # for partial release
  }
  Response: Release
```

---

## Release Identity: Digest-First Principle

A core design invariant of the Release Orchestrator:

```
INVARIANT: A release is a set of OCI image digests (component -> digest mapping), never tags.
```

**Implementation Requirements**:
- Tags are convenience inputs for resolution
- Tags are resolved to digests at release creation time
- All downstream operations (promotion, deployment, rollback) use digests
- Digest mismatch at pull time = deployment failure (tamper detection)
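
The tamper-detection requirement follows from how OCI content addressing works: the pinned digest is the SHA-256 of the manifest bytes, so any modification changes it. A minimal sketch of the pull-time check (`pinned_digest_ok` and the toy manifest are illustrative, not a registry client):

```python
# Sketch of the pull-time tamper check: the digest of the fetched manifest
# must equal the digest pinned in the release, or the deployment fails.
import hashlib

def pinned_digest_ok(pinned: str, manifest_bytes: bytes) -> bool:
    actual = "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()
    return actual == pinned

manifest = b'{"schemaVersion": 2}'
pin = "sha256:" + hashlib.sha256(manifest).hexdigest()
assert pinned_digest_ok(pin, manifest)
assert not pinned_digest_ok(pin, manifest + b" ")  # any change fails the pull
```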

**Example**:
```json
{
  "id": "release-uuid",
  "name": "myapp-v2.3.1",
  "components": [
    {
      "componentId": "api-component-uuid",
      "componentName": "api",
      "tag": "v2.3.1",
      "digest": "sha256:abc123def456...",
      "semver": "2.3.1",
      "role": "primary"
    },
    {
      "componentId": "worker-component-uuid",
      "componentName": "worker",
      "tag": "v2.3.1",
      "digest": "sha256:789xyz123abc...",
      "semver": "2.3.1",
      "role": "primary"
    }
  ]
}
```

---

## References

- [Module Overview](overview.md)
- [Design Principles](../design/principles.md)
- [API Documentation](../api/releases.md)
- [Promotion Manager](promotion-manager.md)

590
docs/modules/release-orchestrator/modules/workflow-engine.md
Normal file
@@ -0,0 +1,590 @@

# WORKFL: Workflow Engine

**Purpose**: DAG-based workflow execution for deployments, approvals, and custom automation.

## Modules

### Module: `workflow-designer`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Template creation; DAG graph editor; validation |
| **Dependencies** | `step-registry` |
| **Data Entities** | `WorkflowTemplate`, `StepNode`, `StepEdge` |

**Workflow Template Structure**:
```typescript
interface WorkflowTemplate {
  id: UUID;
  tenantId: UUID;
  name: string;
  displayName: string;
  description: string;
  version: number;

  // DAG structure
  nodes: StepNode[];
  edges: StepEdge[];

  // I/O
  inputs: InputDefinition[];
  outputs: OutputDefinition[];

  // Metadata
  tags: string[];
  isBuiltin: boolean;
  createdAt: DateTime;
  createdBy: UUID;
}
```
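
Part of the designer's validation duty is ensuring `nodes` and `edges` really form a DAG. A sketch using Kahn's algorithm over node and edge ids (the validator function itself is an assumption; cycle detection is the standard check):

```python
# Sketch of DAG validation for a workflow template: topologically sort the
# graph; if any node is left unsorted, the edges contain a cycle.
from collections import deque

def is_acyclic(node_ids: list[str], edges: list[tuple[str, str]]) -> bool:
    indegree = {n: 0 for n in node_ids}
    out = {n: [] for n in node_ids}
    for frm, to in edges:
        out[frm].append(to)
        indegree[to] += 1
    queue = deque(n for n, d in indegree.items() if d == 0)
    seen = 0
    while queue:
        n = queue.popleft()
        seen += 1
        for m in out[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    return seen == len(node_ids)   # a leftover node means a cycle

assert is_acyclic(["a", "b", "c"], [("a", "b"), ("b", "c")])
assert not is_acyclic(["a", "b"], [("a", "b"), ("b", "a")])
```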

---

### Module: `workflow-engine`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | DAG execution; state machine; pause/resume |
| **Dependencies** | `step-executor`, `step-registry` |
| **Data Entities** | `WorkflowRun`, `WorkflowState` |
| **Events Produced** | `workflow.started`, `workflow.paused`, `workflow.resumed`, `workflow.completed`, `workflow.failed` |

**Workflow Execution Algorithm**:
```python
class WorkflowEngine:
    def execute(self, workflow_run: WorkflowRun) -> None:
        """Main workflow execution loop."""

        # Initialize
        workflow_run.status = "running"
        workflow_run.started_at = now()
        self.save(workflow_run)

        try:
            while not self.is_terminal(workflow_run):
                # Handle pause state
                if workflow_run.status == "paused":
                    self.wait_for_resume(workflow_run)
                    continue

                # Get nodes ready for execution
                ready_nodes = self.get_ready_nodes(workflow_run)

                if not ready_nodes:
                    # Check if we're waiting on approvals
                    if self.has_pending_approvals(workflow_run):
                        workflow_run.status = "paused"
                        self.save(workflow_run)
                        continue

                    # Check if all nodes are complete
                    if self.all_nodes_complete(workflow_run):
                        break

                    # Deadlock detection
                    raise WorkflowDeadlockError(workflow_run.id)

                # Execute ready nodes in parallel
                futures = []
                for node in ready_nodes:
                    future = self.executor.submit(
                        self.execute_node,
                        workflow_run,
                        node
                    )
                    futures.append((node, future))

                # Wait for at least one to complete
                completed = self.wait_any(futures)

                for node, result in completed:
                    step_run = self.get_step_run(workflow_run, node.id)

                    if result.success:
                        step_run.status = "succeeded"
                        step_run.outputs = result.outputs
                        self.propagate_outputs(workflow_run, node, result.outputs)
                    else:
                        step_run.status = "failed"
                        step_run.error_message = result.error

                        # Handle failure action
                        if node.on_failure == "fail":
                            workflow_run.status = "failed"
                            workflow_run.error_message = f"Step {node.name} failed: {result.error}"
                            self.cancel_pending_steps(workflow_run)
                            self.save(workflow_run)
                            return
                        elif node.on_failure == "rollback":
                            self.trigger_rollback(workflow_run, node)
                        elif node.on_failure.startswith("goto:"):
                            target = node.on_failure.split(":", 1)[1]
                            self.add_ready_node(workflow_run, target)
                        # "continue" just continues to next nodes

                    step_run.completed_at = now()
                    self.save(step_run)

            # Workflow completed successfully
            workflow_run.status = "succeeded"
            workflow_run.completed_at = now()
            self.save(workflow_run)

        except WorkflowCancelledError:
            workflow_run.status = "cancelled"
            workflow_run.completed_at = now()
            self.save(workflow_run)
        except Exception as e:
            workflow_run.status = "failed"
            workflow_run.error_message = str(e)
            workflow_run.completed_at = now()
            self.save(workflow_run)
```
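
The loop above leans on `get_ready_nodes`. A simplified sketch of that dependency check, ignoring conditional edges and skip propagation (the standalone function signature is an illustration of the engine's logic, not its actual API):

```python
# Sketch of ready-node computation: a node is ready when it is still
# pending and every incoming edge comes from a succeeded step.

def get_ready_nodes(nodes, edges, status):
    """nodes: list of ids; edges: (from, to) pairs; status: id -> step state."""
    preds = {n: [] for n in nodes}
    for frm, to in edges:
        preds[to].append(frm)
    return [n for n in nodes
            if status[n] == "pending"
            and all(status[p] == "succeeded" for p in preds[n])]

edges = [("approval", "security-gate"), ("security-gate", "deploy")]
status = {"approval": "succeeded", "security-gate": "pending", "deploy": "pending"}
print(get_ready_nodes(["approval", "security-gate", "deploy"], edges, status))
```

Nodes with no predecessors are ready immediately, which is what lets the engine start several independent branches in parallel.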

---

### Module: `step-executor`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Step dispatch; retry logic; timeout handling |
| **Dependencies** | `step-registry`, `plugin-sandbox` |
| **Data Entities** | `StepRun`, `StepResult` |
| **Events Produced** | `step.started`, `step.progress`, `step.completed`, `step.failed`, `step.retrying` |

**Step Node Structure**:
```typescript
interface StepNode {
  id: string;                         // Unique within template (e.g., "deploy-api")
  type: string;                       // Step type from registry
  name: string;                       // Display name
  config: Record<string, any>;        // Step-specific configuration
  inputs: InputBinding[];             // Input value bindings
  outputs: OutputBinding[];           // Output declarations
  position: { x: number; y: number }; // UI position

  // Execution settings
  timeout: number;                    // Seconds (default from step type)
  retryPolicy: RetryPolicy;
  onFailure: FailureAction;
  condition?: string;                 // JS expression for conditional execution

  // Documentation
  description?: string;
  documentation?: string;
}

type FailureAction = "fail" | "continue" | "rollback" | `goto:${string}`;

interface InputBinding {
  name: string;                       // Input parameter name
  source: InputSource;
}

type InputSource =
  | { type: "literal"; value: any }
  | { type: "context"; path: string }                      // e.g., "release.name"
  | { type: "output"; nodeId: string; outputName: string }
  | { type: "secret"; secretName: string }
  | { type: "expression"; expression: string };            // JS expression

interface StepEdge {
  id: string;
  from: string;                       // Source node ID
  to: string;                         // Target node ID
  condition?: string;                 // Optional condition expression
  label?: string;                     // Display label for conditional edges
}

interface RetryPolicy {
  maxRetries: number;
  backoffType: "fixed" | "exponential";
  backoffSeconds: number;
  retryableErrors: string[];
}
```
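
The delay between attempts under `RetryPolicy` can be sketched as follows (1-based attempt numbers, matching `StepRun.attempt_number`; the doubling base for exponential backoff is an assumption, as is the absence of jitter):

```python
# Sketch of RetryPolicy delay computation in step-executor: fixed delay,
# or exponential doubling indexed by the attempt number.

def backoff_seconds(policy: dict, attempt: int) -> float:
    base = policy["backoffSeconds"]
    if policy["backoffType"] == "exponential":
        return float(base * (2 ** (attempt - 1)))
    return float(base)

policy = {"maxRetries": 2, "backoffType": "exponential", "backoffSeconds": 30}
assert [backoff_seconds(policy, a) for a in (1, 2, 3)] == [30, 60, 120]
assert backoff_seconds({"backoffType": "fixed", "backoffSeconds": 10}, 5) == 10.0
```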

---

### Module: `step-registry`

| Aspect | Specification |
|--------|---------------|
| **Responsibility** | Built-in + plugin-provided step types |
| **Dependencies** | `plugin-registry` |
| **Data Entities** | `StepType`, `StepSchema` |

**Built-in Step Types**:

| Step Type | Category | Description |
|-----------|----------|-------------|
| `approval` | Control | Wait for human approval |
| `security-gate` | Gate | Evaluate security policy |
| `custom-gate` | Gate | Custom OPA policy evaluation |
| `deploy-docker` | Deploy | Deploy single container |
| `deploy-compose` | Deploy | Deploy Docker Compose stack |
| `deploy-ecs` | Deploy | Deploy to AWS ECS |
| `deploy-nomad` | Deploy | Deploy to HashiCorp Nomad |
| `health-check` | Verify | HTTP/TCP health check |
| `smoke-test` | Verify | Run smoke test suite |
| `execute-script` | Custom | Run C#/Bash script |
| `webhook` | Integration | Call external webhook |
| `trigger-ci` | Integration | Trigger CI pipeline |
| `wait-ci` | Integration | Wait for CI pipeline |
| `notify` | Notification | Send notification |
| `rollback` | Recovery | Rollback deployment |
| `traffic-shift` | Progressive | Shift traffic percentage |

**Step Type Definition**:
```typescript
interface StepType {
  type: string;               // "deploy-compose"
  displayName: string;        // "Deploy Compose Stack"
  description: string;
  category: StepCategory;
  icon: string;

  // Schema
  configSchema: JSONSchema;   // Step configuration schema
  inputSchema: JSONSchema;    // Required inputs schema
  outputSchema: JSONSchema;   // Produced outputs schema

  // Execution
  executor: "builtin" | UUID; // builtin or plugin ID
  defaultTimeout: number;
  safeToRetry: boolean;
  retryableErrors: string[];

  // Documentation
  documentation: string;
  examples: StepExample[];
}
```

---

## Workflow Run State Machine

```
CREATED ──start()──► RUNNING

RUNNING ──pause() / waiting approval──► PAUSED
PAUSED  ──resume() / approval granted─► RUNNING

RUNNING ──► SUCCEEDED | FAILED | CANCELLED
PAUSED  ──cancel()──► CANCELLED

Transitions:
- CREATED → RUNNING: start()
- RUNNING → PAUSED: pause(), waiting approval
- PAUSED → RUNNING: resume(), approval granted
- RUNNING → SUCCEEDED: all nodes complete
- RUNNING → FAILED: node fails with fail action
- RUNNING → CANCELLED: cancel()
- PAUSED → CANCELLED: cancel()
```
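
The transition list above can be enforced as a simple allow-set guard (an illustrative sketch, not the engine's actual state handling):

```python
# Sketch of a workflow-run transition guard: only the documented
# transitions are allowed; anything else raises.

ALLOWED = {
    ("created", "running"),    # start()
    ("running", "paused"),     # pause() / waiting approval
    ("paused", "running"),     # resume() / approval granted
    ("running", "succeeded"),  # all nodes complete
    ("running", "failed"),     # node fails with fail action
    ("running", "cancelled"),  # cancel()
    ("paused", "cancelled"),   # cancel()
}

def transition(state: str, target: str) -> str:
    if (state, target) not in ALLOWED:
        raise ValueError(f"illegal transition {state} -> {target}")
    return target

assert transition("created", "running") == "running"
try:
    transition("succeeded", "running")
except ValueError:
    pass
else:
    raise AssertionError("terminal states must not transition")
```

Because SUCCEEDED, FAILED, and CANCELLED appear in no tuple's first position, the terminal states fall out of the table for free.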

## Step Run State Machine

```
PENDING  ──dependencies met + condition true──► RUNNING
PENDING  ──condition false──► SKIPPED

RUNNING  ──► SUCCEEDED
RUNNING  ──► FAILED ──► RETRYING ──retry attempt──► RUNNING
RETRYING ──max retries exceeded──► FAILED

States:
- PENDING: initial state; dependencies not met
- RUNNING: step is executing

Additional transitions:
- Any state → CANCELLED: workflow cancelled
```

---

## Database Schema

```sql
-- Workflow Templates
CREATE TABLE release.workflow_templates (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id    UUID REFERENCES tenants(id) ON DELETE CASCADE,
    name         VARCHAR(255) NOT NULL,
    display_name VARCHAR(255) NOT NULL,
    description  TEXT,
    version      INTEGER NOT NULL DEFAULT 1,
    nodes        JSONB NOT NULL,
    edges        JSONB NOT NULL,
    inputs       JSONB NOT NULL DEFAULT '[]',
    outputs      JSONB NOT NULL DEFAULT '[]',
    tags         JSONB NOT NULL DEFAULT '[]',
    is_builtin   BOOLEAN NOT NULL DEFAULT FALSE,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at   TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by   UUID REFERENCES users(id)
);

CREATE INDEX idx_workflow_templates_tenant ON release.workflow_templates(tenant_id);
CREATE INDEX idx_workflow_templates_name ON release.workflow_templates(name);

-- Workflow Runs
CREATE TABLE release.workflow_runs (
    id               UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id        UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    template_id      UUID NOT NULL REFERENCES release.workflow_templates(id),
    template_version INTEGER NOT NULL,
    status           VARCHAR(50) NOT NULL DEFAULT 'created',
    context          JSONB NOT NULL,
    inputs           JSONB NOT NULL DEFAULT '{}',
    outputs          JSONB NOT NULL DEFAULT '{}',
    error_message    TEXT,
    started_at       TIMESTAMPTZ,
    completed_at     TIMESTAMPTZ,
    created_at       TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by       UUID REFERENCES users(id)
);

CREATE INDEX idx_workflow_runs_tenant ON release.workflow_runs(tenant_id);
CREATE INDEX idx_workflow_runs_template ON release.workflow_runs(template_id);
CREATE INDEX idx_workflow_runs_status ON release.workflow_runs(status);

-- Step Runs
CREATE TABLE release.step_runs (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    workflow_run_id UUID NOT NULL REFERENCES release.workflow_runs(id) ON DELETE CASCADE,
    node_id         VARCHAR(255) NOT NULL,
    status          VARCHAR(50) NOT NULL DEFAULT 'pending',
    inputs          JSONB NOT NULL DEFAULT '{}',
    outputs         JSONB NOT NULL DEFAULT '{}',
    error_message   TEXT,
    logs            TEXT,
    attempt_number  INTEGER NOT NULL DEFAULT 1,
    started_at      TIMESTAMPTZ,
    completed_at    TIMESTAMPTZ,
    UNIQUE (workflow_run_id, node_id)
);

CREATE INDEX idx_step_runs_workflow ON release.step_runs(workflow_run_id);
CREATE INDEX idx_step_runs_status ON release.step_runs(status);

-- Step Registry
CREATE TABLE release.step_types (
    type             VARCHAR(255) PRIMARY KEY,
    display_name     VARCHAR(255) NOT NULL,
    description      TEXT,
    category         VARCHAR(100) NOT NULL,
    icon             VARCHAR(255),
    config_schema    JSONB NOT NULL,
    input_schema     JSONB NOT NULL,
    output_schema    JSONB NOT NULL,
    executor         VARCHAR(255) NOT NULL DEFAULT 'builtin',
    default_timeout  INTEGER NOT NULL DEFAULT 300,
    safe_to_retry    BOOLEAN NOT NULL DEFAULT FALSE,
    retryable_errors JSONB NOT NULL DEFAULT '[]',
    documentation    TEXT,
    examples         JSONB NOT NULL DEFAULT '[]',
    plugin_id        UUID REFERENCES release.plugins(id),
    created_at       TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_step_types_category ON release.step_types(category);
CREATE INDEX idx_step_types_plugin ON release.step_types(plugin_id);
```

---

## Workflow Template Example: Standard Deployment

```json
{
  "id": "template-standard-deploy",
  "name": "standard-deploy",
  "displayName": "Standard Deployment",
  "version": 1,
  "inputs": [
    { "name": "releaseId", "type": "uuid", "required": true },
    { "name": "environmentId", "type": "uuid", "required": true },
    { "name": "promotionId", "type": "uuid", "required": true }
  ],
  "nodes": [
    {
      "id": "approval",
      "type": "approval",
      "name": "Approval Gate",
      "config": {},
      "inputs": [
        { "name": "promotionId", "source": { "type": "context", "path": "promotionId" } }
      ],
      "position": { "x": 100, "y": 100 }
    },
    {
      "id": "security-gate",
      "type": "security-gate",
      "name": "Security Verification",
      "config": {
        "blockOnCritical": true,
        "blockOnHigh": true
      },
      "inputs": [
        { "name": "releaseId", "source": { "type": "context", "path": "releaseId" } }
      ],
      "position": { "x": 100, "y": 200 }
    },
    {
      "id": "deploy-targets",
      "type": "deploy-compose",
      "name": "Deploy to Targets",
      "config": {
        "strategy": "rolling",
        "parallelism": 2
      },
      "inputs": [
        { "name": "releaseId", "source": { "type": "context", "path": "releaseId" } },
        { "name": "environmentId", "source": { "type": "context", "path": "environmentId" } }
      ],
      "timeout": 600,
      "retryPolicy": {
        "maxRetries": 2,
        "backoffType": "exponential",
        "backoffSeconds": 30
      },
      "onFailure": "rollback",
      "position": { "x": 100, "y": 400 }
    },
    {
      "id": "health-check",
      "type": "health-check",
      "name": "Health Verification",
      "config": {
        "type": "http",
        "path": "/health",
        "expectedStatus": 200,
        "timeout": 30,
        "retries": 5
      },
      "inputs": [
        { "name": "targets", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "deployedTargets" } }
      ],
      "onFailure": "rollback",
      "position": { "x": 100, "y": 500 }
    },
    {
      "id": "notify-success",
      "type": "notify",
      "name": "Success Notification",
      "config": {
        "channel": "slack",
        "template": "deployment-success"
      },
      "onFailure": "continue",
      "position": { "x": 100, "y": 700 }
    },
    {
      "id": "rollback-handler",
      "type": "rollback",
      "name": "Rollback Handler",
      "config": {
        "strategy": "to-previous"
      },
      "inputs": [
        { "name": "deploymentJobId", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "jobId" } }
      ],
      "position": { "x": 300, "y": 450 }
    }
  ],
  "edges": [
    { "id": "e1", "from": "approval", "to": "security-gate" },
    { "id": "e2", "from": "security-gate", "to": "deploy-targets" },
    { "id": "e3", "from": "deploy-targets", "to": "health-check" },
    { "id": "e4", "from": "health-check", "to": "notify-success" },
    { "id": "e5", "from": "deploy-targets", "to": "rollback-handler", "condition": "status === 'failed'" },
    { "id": "e6", "from": "health-check", "to": "rollback-handler", "condition": "status === 'failed'" }
  ]
}
```
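
Templates like the one above are easiest to sanity-check structurally before execution: every edge must connect two declared node ids. A minimal sketch of such a check (the `Template` shape here is inferred from the example and is an assumption, not the product's actual type):

```typescript
// Minimal structural validation: report edges that reference undeclared nodes.
// Shapes are inferred from the JSON example above (an assumption).
interface Edge { id: string; from: string; to: string; condition?: string }
interface Node { id: string; type: string }
interface Template { nodes: Node[]; edges: Edge[] }

function findDanglingEdges(t: Template): string[] {
  const ids = new Set(t.nodes.map(n => n.id));
  return t.edges
    .filter(e => !ids.has(e.from) || !ids.has(e.to))
    .map(e => e.id);
}

const template: Template = {
  nodes: [
    { id: "approval", type: "approval" },
    { id: "security-gate", type: "security-gate" }
  ],
  edges: [
    { id: "e1", from: "approval", to: "security-gate" },
    { id: "bad", from: "security-gate", to: "deploy-targets" } // target not declared
  ]
};
// findDanglingEdges(template) → ["bad"]
```

A real validator would also reject cycles and unreachable nodes; this only covers the dangling-edge case.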

---

## API Endpoints

See [API Documentation](../api/workflows.md) for full specification.

```yaml
# Workflow Templates
POST   /api/v1/workflow-templates
GET    /api/v1/workflow-templates
GET    /api/v1/workflow-templates/{id}
PUT    /api/v1/workflow-templates/{id}
DELETE /api/v1/workflow-templates/{id}
POST   /api/v1/workflow-templates/{id}/validate

# Step Registry
GET /api/v1/step-types
GET /api/v1/step-types/{type}

# Workflow Runs
POST /api/v1/workflow-runs
GET  /api/v1/workflow-runs
GET  /api/v1/workflow-runs/{id}
POST /api/v1/workflow-runs/{id}/pause
POST /api/v1/workflow-runs/{id}/resume
POST /api/v1/workflow-runs/{id}/cancel
GET  /api/v1/workflow-runs/{id}/steps
GET  /api/v1/workflow-runs/{id}/steps/{nodeId}
GET  /api/v1/workflow-runs/{id}/steps/{nodeId}/logs
GET  /api/v1/workflow-runs/{id}/steps/{nodeId}/artifacts
```
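
The run-control endpoints share a common shape, so client code can build them from a single helper. A small sketch; the base URL is illustrative and whether these endpoints take a request body is not specified here:

```typescript
// Build workflow-run control URLs from the endpoint list above.
type RunAction = "pause" | "resume" | "cancel";

function runActionUrl(base: string, runId: string, action: RunAction): string {
  // Run ids are URL-encoded defensively; real ids are UUIDs.
  return `${base}/api/v1/workflow-runs/${encodeURIComponent(runId)}/${action}`;
}

// POST to the resulting URL to pause a run:
// runActionUrl("https://stella.example.com", "run-123", "pause")
//   → "https://stella.example.com/api/v1/workflow-runs/run-123/pause"
```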

---

## References

- [Module Overview](overview.md)
- [Workflow Templates](../workflow/templates.md)
- [Execution State Machine](../workflow/execution.md)
- [API Documentation](../api/workflows.md)

274
docs/modules/release-orchestrator/operations/metrics.md
Normal file
@@ -0,0 +1,274 @@

# Metrics Specification

## Overview

Release Orchestrator exposes Prometheus-compatible metrics for monitoring deployment health, performance, and operational status.

## Core Metrics

### Release Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_releases_total` | counter | Total releases created | `tenant`, `status` |
| `stella_releases_active` | gauge | Currently active releases | `tenant`, `status` |
| `stella_release_components_count` | histogram | Components per release | `tenant` |

### Promotion Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_promotions_total` | counter | Total promotions | `tenant`, `env`, `status` |
| `stella_promotions_in_progress` | gauge | Promotions currently in progress | `tenant`, `env` |
| `stella_promotion_duration_seconds` | histogram | Time from request to completion | `tenant`, `env`, `status` |
| `stella_approval_pending_count` | gauge | Pending approvals | `tenant`, `env` |
| `stella_approval_duration_seconds` | histogram | Time to approve | `tenant`, `env` |

### Deployment Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_deployments_total` | counter | Total deployments | `tenant`, `env`, `strategy`, `status` |
| `stella_deployment_duration_seconds` | histogram | Deployment duration | `tenant`, `env`, `strategy` |
| `stella_deployment_tasks_total` | counter | Total deployment tasks | `tenant`, `status` |
| `stella_deployment_task_duration_seconds` | histogram | Task duration | `target_type` |
| `stella_rollbacks_total` | counter | Total rollbacks | `tenant`, `env`, `reason` |

### Agent Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_agents_connected` | gauge | Connected agents | `tenant` |
| `stella_agents_by_status` | gauge | Agents by status | `tenant`, `status` |
| `stella_agent_tasks_total` | counter | Tasks executed by agents | `agent`, `type`, `status` |
| `stella_agent_task_duration_seconds` | histogram | Agent task duration | `agent`, `type` |
| `stella_agent_heartbeat_age_seconds` | gauge | Seconds since last heartbeat | `agent` |
| `stella_agent_resource_cpu_percent` | gauge | Agent CPU usage | `agent` |
| `stella_agent_resource_memory_percent` | gauge | Agent memory usage | `agent` |

### Workflow Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_workflow_runs_total` | counter | Workflow executions | `tenant`, `template`, `status` |
| `stella_workflow_runs_active` | gauge | Currently running workflows | `tenant`, `template` |
| `stella_workflow_duration_seconds` | histogram | Workflow duration | `template`, `status` |
| `stella_workflow_step_duration_seconds` | histogram | Step execution time | `step_type`, `status` |
| `stella_workflow_step_retries_total` | counter | Step retry count | `step_type` |

### Target Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_targets_total` | gauge | Total targets | `tenant`, `env`, `type` |
| `stella_targets_by_health` | gauge | Targets by health status | `tenant`, `env`, `health` |
| `stella_target_drift_detected` | gauge | Targets with drift | `tenant`, `env` |

### Integration Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_integrations_total` | gauge | Configured integrations | `tenant`, `type` |
| `stella_integration_health` | gauge | Integration health (1=healthy) | `tenant`, `integration` |
| `stella_integration_requests_total` | counter | Requests to integrations | `integration`, `operation`, `status` |
| `stella_integration_latency_seconds` | histogram | Integration request latency | `integration`, `operation` |

### Gate Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_gate_evaluations_total` | counter | Gate evaluations | `tenant`, `gate_type`, `result` |
| `stella_gate_evaluation_duration_seconds` | histogram | Gate evaluation time | `gate_type` |
| `stella_gate_blocks_total` | counter | Blocked promotions by gate | `tenant`, `gate_type`, `env` |

## API Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_http_requests_total` | counter | HTTP requests | `method`, `path`, `status` |
| `stella_http_request_duration_seconds` | histogram | Request latency | `method`, `path` |
| `stella_http_requests_in_flight` | gauge | Active requests | `method` |
| `stella_http_request_size_bytes` | histogram | Request size | `method`, `path` |
| `stella_http_response_size_bytes` | histogram | Response size | `method`, `path` |

## Evidence Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_evidence_packets_total` | counter | Evidence packets generated | `tenant`, `type` |
| `stella_evidence_packet_size_bytes` | histogram | Evidence packet size | `type` |
| `stella_evidence_verification_total` | counter | Evidence verifications | `result` |

## Prometheus Configuration

```yaml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'stella-orchestrator'
    static_configs:
      - targets: ['stella-orchestrator:9090']
    metrics_path: /metrics
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/ca.crt

  - job_name: 'stella-agents'
    kubernetes_sd_configs:
      - role: pod
        selectors:
          - role: pod
            label: "app.kubernetes.io/name=stella-agent"
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_agent_id]
        target_label: agent_id
```

## Histogram Buckets

### Duration Buckets (seconds)

```yaml
# Short operations (API calls, gate evaluations)
short_duration_buckets: [0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]

# Medium operations (workflow steps)
medium_duration_buckets: [0.1, 0.5, 1, 2.5, 5, 10, 30, 60, 120, 300]

# Long operations (deployments)
long_duration_buckets: [1, 5, 10, 30, 60, 120, 300, 600, 1200, 3600]
```

### Size Buckets (bytes)

```yaml
# Request/response sizes
size_buckets: [100, 1000, 10000, 100000, 1000000, 10000000]

# Evidence packet sizes
evidence_buckets: [1000, 10000, 100000, 500000, 1000000, 5000000]
```
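
Prometheus histograms are cumulative: an observation increments every `le` bucket whose upper bound is greater than or equal to the observed value. A sketch of how a single observation maps onto the short-duration buckets above:

```typescript
// Cumulative bucket counting as Prometheus histograms do it:
// each observation increments every bucket with le >= value.
function observe(buckets: number[], counts: number[], value: number): void {
  buckets.forEach((le, i) => {
    if (value <= le) counts[i]++;
  });
}

const shortBuckets = [0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10];
const counts = new Array(shortBuckets.length).fill(0);
observe(shortBuckets, counts, 0.3); // first bucket hit is le=0.5
// counts → [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

This cumulative shape is why `histogram_quantile` works on `_bucket` series directly: each bucket already counts everything at or below its bound.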

## SLI Definitions

### Availability SLI

```promql
# API availability (99.9% target)
sum(rate(stella_http_requests_total{status!~"5.."}[5m]))
/
sum(rate(stella_http_requests_total[5m]))
```

### Latency SLI

```promql
# API latency P99 < 500ms
histogram_quantile(0.99,
  sum(rate(stella_http_request_duration_seconds_bucket[5m])) by (le)
)
```

### Deployment Success SLI

```promql
# Deployment success rate (99% target)
sum(rate(stella_deployments_total{status="succeeded"}[24h]))
/
sum(rate(stella_deployments_total[24h]))
```
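
Each SLI above is a good-events-over-total-events ratio. The same arithmetic can be checked against raw counter rates; the sample numbers here are illustrative only:

```typescript
// Availability as the PromQL expression computes it: non-5xx request rate
// divided by total request rate. An empty window is treated as fully available.
function availability(non5xx: number, total: number): number {
  return total === 0 ? 1 : non5xx / total;
}

availability(9990, 10000); // → 0.999, exactly at the 99.9% target
```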

## Alert Rules

```yaml
groups:
  - name: stella-orchestrator
    rules:
      - alert: HighDeploymentFailureRate
        expr: |
          sum(rate(stella_deployments_total{status="failed"}[1h]))
          /
          sum(rate(stella_deployments_total[1h])) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: High deployment failure rate
          description: More than 10% of deployments failing in the last hour

      - alert: AgentOffline
        expr: stella_agent_heartbeat_age_seconds > 120
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Agent {{ $labels.agent }} offline
          description: Agent has not sent heartbeat for > 2 minutes

      - alert: PendingApprovalsStale
        expr: |
          stella_approval_pending_count > 0
          and
          time() - stella_promotion_request_timestamp > 3600
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Stale pending approvals
          description: Approvals pending for more than 1 hour

      - alert: IntegrationUnhealthy
        expr: stella_integration_health == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Integration {{ $labels.integration }} unhealthy
          description: Integration health check failing

      - alert: HighAPILatency
        expr: |
          histogram_quantile(0.99,
            sum(rate(stella_http_request_duration_seconds_bucket[5m])) by (le, path)
          ) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: High API latency on {{ $labels.path }}
          description: P99 latency exceeds 1 second
```

## Grafana Dashboards

### Main Dashboard Panels

1. **Deployment Pipeline Overview**
   - Promotions per environment (time series)
   - Success/failure rates (gauge)
   - Active deployments (stat)

2. **Agent Health**
   - Connected agents (stat)
   - Agent status distribution (pie chart)
   - Heartbeat age (table)

3. **Gate Performance**
   - Gate evaluation counts (bar chart)
   - Block rate by gate type (time series)
   - Evaluation latency (heatmap)

4. **API Performance**
   - Request rate (time series)
   - Error rate (time series)
   - Latency distribution (heatmap)

## References

- [Operations Overview](overview.md)
- [Logging](logging.md)
- [Tracing](tracing.md)
- [Alerting](alerting.md)

508
docs/modules/release-orchestrator/operations/overview.md
Normal file
@@ -0,0 +1,508 @@

# Operations Overview

## Observability Stack

Release Orchestrator provides comprehensive observability through metrics, logging, and distributed tracing.

```
                          OBSERVABILITY ARCHITECTURE

┌─────────────────────────────────────────────────────────────────────────┐
│                          RELEASE ORCHESTRATOR                           │
│                                                                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐     │
│  │   Metrics   │  │    Logs     │  │   Traces    │  │   Events    │     │
│  │  Exporter   │  │  Collector  │  │  Exporter   │  │  Publisher  │     │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘     │
│         │                │                │                │            │
└─────────┼────────────────┼────────────────┼────────────────┼────────────┘
          │                │                │                │
          ▼                ▼                ▼                ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         OBSERVABILITY BACKENDS                          │
│                                                                         │
│  ┌─────────────┐  ┌──────────────┐ ┌─────────────┐  ┌─────────────┐     │
│  │ Prometheus  │  │    Loki /    │ │  Jaeger /   │  │    Event    │     │
│  │   / Mimir   │  │ Elasticsearch│ │    Tempo    │  │     Bus     │     │
│  └──────┬──────┘  └──────┬───────┘ └──────┬──────┘  └──────┬──────┘     │
│         │                │                │                │            │
│         └────────────────┴────────────────┴────────────────┘            │
│                                   │                                     │
│                                   ▼                                     │
│                          ┌─────────────────┐                            │
│                          │     Grafana     │                            │
│                          │   Dashboards    │                            │
│                          └─────────────────┘                            │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```

## Metrics

### Core Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_releases_total` | counter | Total releases created | `tenant`, `status` |
| `stella_promotions_total` | counter | Total promotions | `tenant`, `env`, `status` |
| `stella_deployments_total` | counter | Total deployments | `tenant`, `env`, `strategy` |
| `stella_deployment_duration_seconds` | histogram | Deployment duration | `tenant`, `env`, `strategy` |
| `stella_rollbacks_total` | counter | Total rollbacks | `tenant`, `env`, `reason` |
| `stella_agents_connected` | gauge | Connected agents | `tenant` |
| `stella_targets_total` | gauge | Total targets | `tenant`, `env`, `type` |
| `stella_workflow_runs_total` | counter | Workflow executions | `tenant`, `template`, `status` |
| `stella_workflow_step_duration_seconds` | histogram | Step execution time | `step_type` |
| `stella_approval_pending_count` | gauge | Pending approvals | `tenant`, `env` |
| `stella_approval_duration_seconds` | histogram | Time to approve | `tenant`, `env` |

### API Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_http_requests_total` | counter | HTTP requests | `method`, `path`, `status` |
| `stella_http_request_duration_seconds` | histogram | Request latency | `method`, `path` |
| `stella_http_requests_in_flight` | gauge | Active requests | `method` |

### Agent Metrics

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `stella_agent_tasks_total` | counter | Tasks executed | `agent`, `type`, `status` |
| `stella_agent_task_duration_seconds` | histogram | Task duration | `agent`, `type` |
| `stella_agent_heartbeat_age_seconds` | gauge | Since last heartbeat | `agent` |

### Prometheus Configuration

```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'stella-orchestrator'
    static_configs:
      - targets: ['stella-orchestrator:9090']
    metrics_path: /metrics
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/ca.crt

  - job_name: 'stella-agents'
    kubernetes_sd_configs:
      - role: pod
        selectors:
          - role: pod
            label: "app.kubernetes.io/name=stella-agent"
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_agent_id]
        target_label: agent_id
```

## Logging

### Log Format

```json
{
  "timestamp": "2026-01-09T10:30:00.123Z",
  "level": "info",
  "message": "Deployment started",
  "service": "deploy-orchestrator",
  "version": "1.0.0",
  "traceId": "abc123def456",
  "spanId": "789ghi",
  "tenantId": "tenant-uuid",
  "correlationId": "corr-uuid",
  "context": {
    "deploymentJobId": "job-uuid",
    "releaseId": "release-uuid",
    "environmentId": "env-uuid"
  }
}
```

### Log Levels

| Level | Usage |
|-------|-------|
| `error` | Failures requiring attention |
| `warn` | Degraded operation, recoverable issues |
| `info` | Business events (deployment started, approval granted) |
| `debug` | Detailed operational info |
| `trace` | Very detailed debugging |

### Structured Logging Configuration

```typescript
// Logging configuration
const loggerConfig = {
  level: process.env.LOG_LEVEL || 'info',
  format: 'json',
  outputs: [
    {
      type: 'stdout',
      format: 'json'
    },
    {
      type: 'file',
      path: '/var/log/stella/orchestrator.log',
      rotation: {
        maxSize: '100MB',
        maxFiles: 10
      }
    }
  ],
  // Sensitive field masking
  redact: [
    'password',
    'token',
    'secret',
    'credentials',
    'authorization'
  ]
};
```
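
The `redact` list above has to be applied recursively, since sensitive fields can appear anywhere in a nested log record. A minimal sketch; the `"***"` masking token is an assumption:

```typescript
// Recursively mask any field whose name appears in the redact list,
// wherever it occurs in a log record. Values are replaced with "***".
const REDACT = new Set(["password", "token", "secret", "credentials", "authorization"]);

function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) {
      out[k] = REDACT.has(k.toLowerCase()) ? "***" : redact(v);
    }
    return out;
  }
  return value;
}

redact({ message: "login", context: { token: "abc", user: "jo" } });
// → { message: "login", context: { token: "***", user: "jo" } }
```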

### Important Log Events

| Event | Level | Description |
|-------|-------|-------------|
| `deployment.started` | info | Deployment job started |
| `deployment.completed` | info | Deployment successful |
| `deployment.failed` | error | Deployment failed |
| `rollback.initiated` | warn | Rollback triggered |
| `approval.granted` | info | Promotion approved |
| `approval.denied` | info | Promotion rejected |
| `agent.connected` | info | Agent came online |
| `agent.disconnected` | warn | Agent went offline |
| `security.gate.failed` | warn | Security check blocked |

## Distributed Tracing

### Trace Context Propagation

```typescript
// Trace context in requests
interface TraceContext {
  traceId: string;
  spanId: string;
  parentSpanId?: string;
  sampled: boolean;
  baggage?: Record<string, string>;
}

// W3C Trace Context headers
// traceparent: 00-{traceId}-{spanId}-{flags}
// tracestate: stella=...

// Example trace propagation
class TracingMiddleware {
  handle(req: Request, res: Response, next: NextFunction): void {
    const traceparent = req.headers['traceparent'];
    const traceContext = this.parseTraceParent(traceparent);

    // Start span for this request
    const span = this.tracer.startSpan('http.request', {
      parent: traceContext,
      attributes: {
        'http.method': req.method,
        'http.url': req.url,
        'http.user_agent': req.headers['user-agent'],
        'tenant.id': req.tenantId
      }
    });

    // Attach to request for downstream use
    req.span = span;

    res.on('finish', () => {
      span.setAttribute('http.status_code', res.statusCode);
      span.end();
    });

    next();
  }
}
```
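
The `parseTraceParent` helper used by the middleware is not shown above. A minimal sketch following the W3C `traceparent` format (`00-{traceId}-{spanId}-{flags}`); the trimmed return shape is an assumption:

```typescript
// Parse a W3C traceparent header: version-traceId-spanId-flags.
// Returns undefined for missing or malformed headers.
interface ParsedTraceContext {
  traceId: string;
  spanId: string;
  sampled: boolean;
}

function parseTraceParent(header: string | undefined): ParsedTraceContext | undefined {
  if (!header) return undefined;
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return undefined;
  return {
    traceId: m[2],
    spanId: m[3],
    sampled: (parseInt(m[4], 16) & 0x01) === 1 // sampled flag is bit 0
  };
}

parseTraceParent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01");
// → { traceId: "0af7651916cd43dd8448eb211c80319c", spanId: "b7ad6b7169203331", sampled: true }
```

A production parser would also reject the all-zero trace and span ids that the W3C spec defines as invalid.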

### Key Spans

| Span Name | Description | Attributes |
|-----------|-------------|------------|
| `deployment.execute` | Full deployment | `release_id`, `environment` |
| `task.dispatch` | Task dispatch to agent | `target_id`, `agent_id` |
| `agent.execute` | Agent task execution | `task_type`, `duration` |
| `workflow.run` | Workflow execution | `template_id`, `status` |
| `workflow.step` | Individual step | `step_type`, `node_id` |
| `approval.wait` | Waiting for approval | `promotion_id`, `duration` |
| `gate.evaluate` | Gate evaluation | `gate_type`, `result` |

### Jaeger Configuration

```yaml
# jaeger-config.yaml
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: stella-jaeger
spec:
  strategy: production
  collector:
    maxReplicas: 5
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: https://elasticsearch:9200
    secretName: jaeger-es-secret
  ingress:
    enabled: true
```

## Alerting

### Alert Rules

```yaml
# prometheus-rules.yaml
groups:
  - name: stella.deployment
    rules:
      - alert: DeploymentFailureRateHigh
        expr: |
          sum(rate(stella_deployments_total{status="failed"}[5m])) /
          sum(rate(stella_deployments_total[5m])) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High deployment failure rate"
          description: "More than 10% of deployments are failing"

      - alert: DeploymentDurationHigh
        expr: |
          histogram_quantile(0.95, sum(rate(stella_deployment_duration_seconds_bucket[5m])) by (le, tenant)) > 600
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Deployment duration high"
          description: "P95 deployment duration exceeds 10 minutes"

      - alert: RollbackRateHigh
        expr: |
          sum(rate(stella_rollbacks_total[1h])) > 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High rollback rate"
          description: "More than 3 rollbacks in the last hour"

  - name: stella.agents
    rules:
      - alert: AgentOffline
        expr: |
          stella_agent_heartbeat_age_seconds > 120
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Agent offline"
          description: "Agent {{ $labels.agent }} has not sent heartbeat for 2 minutes"

      - alert: AgentPoolLow
        expr: |
          count(stella_agents_connected{status="online"}) by (tenant) < 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low agent count"
          description: "Fewer than 2 agents online for tenant {{ $labels.tenant }}"

  - name: stella.approvals
    rules:
      - alert: ApprovalBacklogHigh
        expr: |
          stella_approval_pending_count > 10
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Approval backlog growing"
          description: "More than 10 pending approvals for over an hour"

      - alert: ApprovalWaitLong
        expr: |
          histogram_quantile(0.90, stella_approval_duration_seconds_bucket) > 86400
        for: 1h
        labels:
          severity: info
        annotations:
          summary: "Long approval wait times"
          description: "P90 approval wait time exceeds 24 hours"
```

### PagerDuty Integration

```typescript
// Alertmanager routing: critical alerts page via PagerDuty,
// warnings go to Slack with resolution notifications.
const alertManagerConfig = {
  receivers: [
    {
      name: "stella-critical",
      pagerduty_configs: [
        {
          service_key: "${PAGERDUTY_SERVICE_KEY}",
          severity: "critical"
        }
      ]
    },
    {
      name: "stella-warning",
      slack_configs: [
        {
          api_url: "${SLACK_WEBHOOK_URL}",
          channel: "#stella-alerts",
          send_resolved: true
        }
      ]
    }
  ],
  route: {
    receiver: "stella-warning",
    routes: [
      {
        match: { severity: "critical" },
        receiver: "stella-critical"
      }
    ]
  }
};
```

## Dashboards

### Deployment Dashboard

Key panels:
- Deployment rate over time
- Success/failure ratio
- Average deployment duration
- Deployment duration histogram
- Active deployments by environment
- Recent deployment list

### Agent Health Dashboard

Key panels:
- Connected agents count
- Agent heartbeat status
- Tasks per agent
- Task success rate by agent
- Agent resource utilization

### Approval Dashboard

Key panels:
- Pending approvals count
- Approval response time
- Approvals by user
- Rejection reasons breakdown

## Health Endpoints

### Application Health

```http
GET /health
```

Response:
```json
{
  "status": "healthy",
  "version": "1.0.0",
  "uptime": 86400,
  "checks": {
    "database": { "status": "healthy", "latency": 5 },
    "redis": { "status": "healthy", "latency": 2 },
    "vault": { "status": "healthy", "latency": 10 }
  }
}
```
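
The top-level `status` field can be derived as the worst of the individual checks. A sketch; the intermediate `degraded` level is an assumption, since the example response only shows `healthy`:

```typescript
// Overall health is the worst individual check status.
// The "degraded" level is assumed, not taken from the example response.
type CheckStatus = "healthy" | "degraded" | "unhealthy";

function overallStatus(checks: Record<string, { status: CheckStatus }>): CheckStatus {
  const rank: Record<CheckStatus, number> = { healthy: 0, degraded: 1, unhealthy: 2 };
  let worst: CheckStatus = "healthy";
  for (const c of Object.values(checks)) {
    if (rank[c.status] > rank[worst]) worst = c.status;
  }
  return worst;
}

overallStatus({ database: { status: "healthy" }, redis: { status: "degraded" } });
// → "degraded"
```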

### Readiness Probe

```http
GET /health/ready
```

### Liveness Probe

```http
GET /health/live
```

## Performance Tuning

### Database Connection Pool

```typescript
const poolConfig = {
  min: 5,
  max: 20,
  acquireTimeout: 30000,
  idleTimeout: 600000,
  connectionTimeout: 10000
};
```

### Cache Configuration

```typescript
const cacheConfig = {
  // Release cache
  releases: {
    ttl: 300, // 5 minutes
    maxSize: 1000
  },
  // Target cache
  targets: {
    ttl: 60, // 1 minute
    maxSize: 5000
  },
  // Workflow template cache
  templates: {
    ttl: 3600, // 1 hour
    maxSize: 100
  }
};
```
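
A cache honoring these `ttl`/`maxSize` settings can be sketched in a few lines. The FIFO eviction here is an assumption; the real eviction policy is not specified:

```typescript
// Minimal TTL cache matching the ttl/maxSize settings above.
// Eviction is simple FIFO on insertion order (an assumption).
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();
  constructor(private ttlSeconds: number, private maxSize: number) {}

  set(key: string, value: V, now = Date.now()): void {
    if (this.store.size >= this.maxSize && !this.store.has(key)) {
      const oldest = this.store.keys().next().value; // Map preserves insertion order
      if (oldest !== undefined) this.store.delete(oldest);
    }
    this.store.set(key, { value, expiresAt: now + this.ttlSeconds * 1000 });
  }

  get(key: string, now = Date.now()): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (now > entry.expiresAt) {
      this.store.delete(key); // lazy expiry on read
      return undefined;
    }
    return entry.value;
  }
}

const releases = new TtlCache<string>(300, 1000); // matches the release cache settings
```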

### Rate Limiting

```typescript
const rateLimitConfig = {
  // API rate limits
  api: {
    windowMs: 60000, // 1 minute
    max: 1000,       // requests per window
    burst: 100       // burst allowance
  },
  // Webhook rate limits
  webhooks: {
    windowMs: 60000,
    max: 100
  },
  // Per-tenant limits
  tenant: {
    windowMs: 60000,
    max: 500
  }
};
```
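
The `windowMs`/`max` pairs above describe a per-key window counter. A fixed-window sketch (burst handling omitted; whether the real limiter is fixed-window or sliding is not specified):

```typescript
// Fixed-window counter enforcing the windowMs/max settings above.
// Each key (route, tenant, webhook source) gets its own window.
class FixedWindowLimiter {
  private windows = new Map<string, { windowStart: number; count: number }>();
  constructor(private windowMs: number, private max: number) {}

  allow(key: string, now = Date.now()): boolean {
    const w = this.windows.get(key);
    if (!w || now - w.windowStart >= this.windowMs) {
      // First request in a fresh window.
      this.windows.set(key, { windowStart: now, count: 1 });
      return true;
    }
    w.count++;
    return w.count <= this.max;
  }
}

const tenantLimiter = new FixedWindowLimiter(60000, 500); // tenant: 500 req/min
```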

## References

- [Metrics Reference](metrics.md)
- [Logging Guide](logging.md)
- [Tracing Setup](tracing.md)
- [Alert Configuration](alerting.md)

286
docs/modules/release-orchestrator/security/agent-security.md
Normal file
@@ -0,0 +1,286 @@

# Agent Security Model

## Overview

Agents are trusted components that execute deployment tasks on targets. Their security model ensures:
- Strong identity through mTLS certificates
- Minimal privilege through scoped task credentials
- Audit trail through signed task receipts
- Isolation through process sandboxing

## Agent Registration Flow

```
┌─────────────────────────────────────────────────────────────────────────┐
│                         AGENT REGISTRATION FLOW                         │
│                                                                         │
│  1. Admin generates registration token (one-time use)                   │
│     POST /api/v1/admin/agent-tokens                                     │
│     Response: { token: "reg_xxx", expiresAt: "..." }                    │
│                                                                         │
│  2. Agent starts with registration token                                │
│     ./stella-agent --register --token=reg_xxx                           │
│                                                                         │
│  3. Agent requests mTLS certificate                                     │
│     POST /api/v1/agents/register                                        │
│     Headers: X-Registration-Token: reg_xxx                              │
│     Body: { name, version, capabilities, csr }                          │
│     Response: { agentId, certificate, caCertificate }                   │
│                                                                         │
│  4. Agent establishes mTLS connection                                   │
│     Uses issued certificate for all subsequent requests                 │
│                                                                         │
│  5. Agent requests short-lived JWT for task execution                   │
│     POST /api/v1/agents/token (over mTLS)                               │
│     Response: { token, expiresIn: 3600 }                                │
│                                                                         │
│  6. Agent refreshes token before expiration                             │
│     Token refresh only over mTLS connection                             │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```
|
||||
|
||||
## mTLS Communication

All agent-to-core communication uses mutual TLS:

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                        AGENT COMMUNICATION SECURITY                         │
│                                                                             │
│   ┌──────────────┐                          ┌──────────────┐                │
│   │    AGENT     │                          │ STELLA CORE  │                │
│   └──────┬───────┘                          └──────┬───────┘                │
│          │                                         │                        │
│          │  mTLS (mutual TLS)                      │                        │
│          │  - Agent cert signed by Stella CA       │                        │
│          │  - Server cert verified by Agent        │                        │
│          │  - TLS 1.3 only                         │                        │
│          │  - Perfect forward secrecy              │                        │
│          │◄───────────────────────────────────────►│                        │
│          │                                         │                        │
│          │  Encrypted payload                      │                        │
│          │  - Task payloads encrypted with         │                        │
│          │    agent-specific key                   │                        │
│          │  - Logs encrypted in transit            │                        │
│          │◄───────────────────────────────────────►│                        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

### TLS Requirements

| Requirement | Value |
|-------------|-------|
| Protocol | TLS 1.3 only |
| Cipher Suites | TLS_AES_256_GCM_SHA384, TLS_CHACHA20_POLY1305_SHA256 |
| Key Exchange | ECDHE with P-384 or X25519 |
| Certificate Key | RSA 4096-bit or ECDSA P-384 |
| Certificate Validity | 90 days (auto-renewed) |

## Certificate Management

### Certificate Structure

```typescript
interface AgentCertificate {
  subject: {
    CN: string;                        // Agent name
    O: string;                         // "Stella Ops"
    OU: string;                        // Tenant ID
  };
  serialNumber: string;
  issuer: string;                      // Stella CA
  validFrom: DateTime;
  validTo: DateTime;
  extensions: {
    keyUsage: ["digitalSignature", "keyEncipherment"];
    extendedKeyUsage: ["clientAuth"];
    subjectAltName: string[];          // Agent ID as URI
  };
}
```

### Certificate Renewal

Agents automatically renew certificates before expiration:

1. Agent detects certificate expiring within 30 days
2. Agent generates new CSR with same identity
3. Agent submits renewal request over existing mTLS connection
4. Authority issues new certificate
5. Agent transitions to new certificate seamlessly

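The renewal trigger in step 1 comes down to a date comparison; a minimal sketch (the `shouldRenew` helper and its millisecond arithmetic are illustrative, not part of the agent codebase):

```typescript
// Hypothetical helper: decide whether the agent should start renewal.
// The 30-day threshold mirrors step 1 of the renewal flow above.
const RENEWAL_WINDOW_DAYS = 30;

function shouldRenew(validTo: Date, now: Date = new Date()): boolean {
  const msLeft = validTo.getTime() - now.getTime();
  const daysLeft = msLeft / (1000 * 60 * 60 * 24);
  return daysLeft <= RENEWAL_WINDOW_DAYS;
}
```

Because renewal happens over the existing mTLS session (step 3), an expired certificate means re-registration, so the check should run well inside the validity window.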
## Secrets Management

Secrets are NEVER stored in the Stella database. Only vault references are stored.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                     SECRETS FLOW (NEVER STORED IN DB)                       │
│                                                                             │
│   ┌──────────────┐        ┌──────────────┐        ┌──────────────┐          │
│   │    VAULT     │        │ STELLA CORE  │        │    AGENT     │          │
│   │   (Source)   │        │   (Broker)   │        │  (Consumer)  │          │
│   └──────┬───────┘        └──────┬───────┘        └──────┬───────┘          │
│          │                       │                       │                  │
│          │                       │  Task requires secret │                  │
│          │                       │                       │                  │
│          │  Fetch with service   │                       │                  │
│          │  account token        │                       │                  │
│          │◄──────────────────────│                       │                  │
│          │                       │                       │                  │
│          │  Return secret        │                       │                  │
│          │  (wrapped, short TTL) │                       │                  │
│          │──────────────────────►│                       │                  │
│          │                       │                       │                  │
│          │                       │  Embed in task payload│                  │
│          │                       │  (encrypted)          │                  │
│          │                       │──────────────────────►│                  │
│          │                       │                       │                  │
│          │                       │                       │  Decrypt         │
│          │                       │                       │  Use for task    │
│          │                       │                       │  Discard         │
│                                                                             │
│   Rules:                                                                    │
│   - Secrets NEVER stored in Stella database                                 │
│   - Only Vault references stored                                            │
│   - Secrets fetched at execution time only                                  │
│   - Secrets not logged (masked in logs)                                     │
│   - Secrets not persisted in agent memory beyond task scope                 │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

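The broker step amounts to persisting a reference and resolving it just-in-time; a minimal sketch (the `VaultReference` shape and the `fetchFromVault` callback are illustrative assumptions, not the real vault client API):

```typescript
// Illustrative shape only: the database row holds a vault *reference*,
// never the secret material itself.
interface VaultReference {
  provider: "vault";   // secret backend identifier
  path: string;        // e.g. "kv/deploy/registry-creds" (hypothetical path)
  key: string;         // field within the secret
}

// Hypothetical broker step: resolve the reference at execution time.
// `fetchFromVault` stands in for the real vault client call.
async function resolveSecret(
  ref: VaultReference,
  fetchFromVault: (path: string, key: string) => Promise<string>
): Promise<string> {
  // Fetched just-in-time; the value is embedded (encrypted) in the task
  // payload and never written back to the database or logs.
  return fetchFromVault(ref.path, ref.key);
}
```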
## Task Security

### Task Assignment

```typescript
interface AgentTask {
  id: UUID;
  type: TaskType;
  targetId: UUID;
  payload: TaskPayload;
  credentials: EncryptedCredentials;   // Encrypted with agent's public key
  timeout: number;
  priority: TaskPriority;
  idempotencyKey: string;
  assignedAt: DateTime;
  expiresAt: DateTime;
}
```

### Credential Scoping

Task credentials are:
- Scoped to the specific target only
- Valid only for the task duration
- Encrypted with the agent's public key
- Logged when accessed (without values)

### Task Execution Isolation

Agents execute tasks with isolation:

```typescript
interface TaskExecutionContext {
  // Process isolation
  workingDirectory: string;    // Unique per task
  processUser: string;         // Non-root user
  networkNamespace: string;    // If network isolation enabled

  // Resource limits
  memoryLimit: number;         // Bytes
  cpuLimit: number;            // Millicores
  diskLimit: number;           // Bytes
  networkEgress: string[];     // Allowed destinations

  // Cleanup
  cleanupOnComplete: boolean;
  cleanupTimeout: number;
}
```

## Agent Capabilities

Agents declare capabilities that determine what tasks they can execute:

```typescript
interface AgentCapabilities {
  docker?: DockerCapability;
  compose?: ComposeCapability;
  ssh?: SshCapability;
  winrm?: WinrmCapability;
  ecs?: EcsCapability;
  nomad?: NomadCapability;
}

interface DockerCapability {
  version: string;
  apiVersion: string;
  runtimes: string[];
  registryAuth: boolean;
}

interface ComposeCapability {
  version: string;
  fileFormats: string[];
}
```

## Heartbeat Protocol

```typescript
interface AgentHeartbeat {
  agentId: UUID;
  timestamp: DateTime;
  status: "healthy" | "degraded";
  resourceUsage: {
    cpuPercent: number;
    memoryPercent: number;
    diskPercent: number;
    networkRxBytes: number;
    networkTxBytes: number;
  };
  activeTaskCount: number;
  completedTasks: number;
  failedTasks: number;
  errors: string[];
  signature: string;                   // HMAC of heartbeat data
}
```

### Heartbeat Validation

1. Verify signature matches expected HMAC
2. Check timestamp is within acceptable skew (30s)
3. Update agent status based on heartbeat content
4. Trigger alerts if heartbeat missing for >90s

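Steps 1 and 2 can be sketched with Node's `crypto` primitives; the canonicalisation (signing the raw heartbeat body) and the function names are assumptions for illustration, not the documented wire format:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch of heartbeat validation steps 1-2: constant-time HMAC check,
// then timestamp skew check against the 30s window above.
const MAX_SKEW_MS = 30_000;

function signHeartbeat(body: string, key: string): string {
  return createHmac("sha256", key).update(body).digest("hex");
}

function validateHeartbeat(
  body: string,
  signature: string,
  key: string,
  sentAt: Date,
  now: Date = new Date()
): boolean {
  const expected = Buffer.from(signHeartbeat(body, key), "hex");
  const actual = Buffer.from(signature, "hex");
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) {
    return false; // step 1: signature mismatch
  }
  // step 2: reject heartbeats outside the acceptable clock skew
  return Math.abs(now.getTime() - sentAt.getTime()) <= MAX_SKEW_MS;
}
```

`timingSafeEqual` avoids leaking the signature through comparison timing, which matters because heartbeats arrive continuously from every agent.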
## Agent Revocation

When an agent is compromised or decommissioned:

1. Certificate added to CRL (Certificate Revocation List)
2. All pending tasks for agent cancelled
3. Agent removed from target assignments
4. Audit event logged
5. New agent can be registered with same name (new identity)

## Security Checklist

| Control | Implementation |
|---------|----------------|
| Identity | mTLS certificates signed by internal CA |
| Authentication | Certificate-based + short-lived JWT |
| Authorization | Task-scoped credentials |
| Encryption | TLS 1.3 for transport, envelope encryption for secrets |
| Isolation | Process sandboxing, resource limits |
| Audit | All task assignments and completions logged |
| Revocation | CRL for compromised agents |
| Secret handling | Vault integration, no persistence |

## References

- [Security Overview](overview.md)
- [Authentication & Authorization](auth.md)
- [Threat Model](threat-model.md)

305 docs/modules/release-orchestrator/security/auth.md Normal file
@@ -0,0 +1,305 @@

# Authentication & Authorization

## Authentication Methods

### OAuth 2.0 for Human Users

```
┌──────────────────────────────────────────────────────────────────────────────┐
│                     OAUTH 2.0 AUTHORIZATION CODE FLOW                        │
│                                                                              │
│   ┌──────────┐                                       ┌──────────────┐        │
│   │ Browser  │                                       │  Authority   │        │
│   └────┬─────┘                                       └──────┬───────┘        │
│        │                                                    │                │
│        │  1. Login request                                  │                │
│        │  ────────────────────────────────────►             │                │
│        │                                                    │                │
│        │  2. Redirect to IdP                                │                │
│        │  ◄────────────────────────────────────             │                │
│        │                                                    │                │
│        │  3. User authenticates at IdP                      │                │
│        │  ─────────────────────────────────►                │                │
│        │                                                    │                │
│        │  4. IdP callback with code                         │                │
│        │  ◄────────────────────────────────────             │                │
│        │                                                    │                │
│        │  5. Exchange code for tokens                       │                │
│        │  ────────────────────────────────────►             │                │
│        │                                                    │                │
│        │  6. Access token + refresh token                   │                │
│        │  ◄────────────────────────────────────             │                │
│        │                                                    │                │
└──────────────────────────────────────────────────────────────────────────────┘
```

### mTLS for Agents

Agents authenticate using mutual TLS with certificates issued by Stella's internal CA.

**Registration Flow:**
1. Admin generates one-time registration token
2. Agent starts with registration token
3. Agent submits CSR (Certificate Signing Request)
4. Authority issues certificate signed by Stella CA
5. Agent uses certificate for all subsequent requests

### API Keys for Service-to-Service

External services can use API keys for programmatic access:
- Keys are tenant-scoped
- Keys can have restricted permissions
- Keys can have expiration dates
- Key usage is audited

## JWT Token Structure

### Access Token Claims

```typescript
interface AccessTokenClaims {
  // Standard claims
  iss: string;                 // "https://authority.stella.local"
  sub: string;                 // User ID
  aud: string[];               // ["stella-api"]
  exp: number;                 // Expiration timestamp
  iat: number;                 // Issued at timestamp
  jti: string;                 // Unique token ID

  // Custom claims
  tenant_id: string;
  roles: string[];
  permissions: Permission[];
  email?: string;
  name?: string;
}
```

### Token Lifetimes

| Token Type | Lifetime | Refresh |
|------------|----------|---------|
| Access Token | 15 minutes | Via refresh token |
| Refresh Token | 7 days | Rotated on use |
| Agent Token | 1 hour | Via mTLS connection |
| API Key | Configurable | Not refreshed |

## Authorization Model

### Resource Types

```typescript
type ResourceType =
  | "environment"
  | "release"
  | "promotion"
  | "target"
  | "agent"
  | "workflow"
  | "plugin"
  | "integration"
  | "evidence";
```

### Action Types

```typescript
type ActionType =
  | "create"
  | "read"
  | "update"
  | "delete"
  | "execute"
  | "approve"
  | "deploy"
  | "rollback";
```

### Permission Structure

```typescript
interface Permission {
  resource: ResourceType;
  action: ActionType;
  scope?: PermissionScope;
  conditions?: Condition[];
}

type PermissionScope =
  | "*"                                  // All resources
  | { environmentId: UUID }              // Specific environment
  | { labels: Record<string, string> };  // Label-based
```

### Built-in Roles

| Role | Description | Key Permissions |
|------|-------------|-----------------|
| `admin` | Full access | All permissions |
| `release_manager` | Manage releases and promotions | Create releases, request promotions |
| `deployer` | Execute deployments | Approve promotions (where allowed), view releases |
| `approver` | Approve promotions | Approve promotions (SoD respected) |
| `viewer` | Read-only access | Read all resources |
| `agent` | Agent service account | Execute deployment tasks |

### Role Definitions

```typescript
const roles = {
  admin: {
    permissions: [
      { resource: "*", action: "*" }
    ]
  },
  release_manager: {
    permissions: [
      { resource: "release", action: "create" },
      { resource: "release", action: "read" },
      { resource: "release", action: "update" },
      { resource: "promotion", action: "create" },
      { resource: "promotion", action: "read" },
      { resource: "environment", action: "read" },
      { resource: "workflow", action: "read" },
      { resource: "workflow", action: "execute" }
    ]
  },
  deployer: {
    permissions: [
      { resource: "release", action: "read" },
      { resource: "promotion", action: "read" },
      { resource: "promotion", action: "approve" },
      { resource: "environment", action: "read" },
      { resource: "target", action: "read" },
      { resource: "agent", action: "read" }
    ]
  },
  approver: {
    permissions: [
      { resource: "promotion", action: "read" },
      { resource: "promotion", action: "approve" },
      { resource: "release", action: "read" },
      { resource: "environment", action: "read" }
    ]
  },
  viewer: {
    permissions: [
      { resource: "*", action: "read" }
    ]
  }
};
```

## Environment-Scoped Permissions

Permissions can be scoped to specific environments:

```typescript
// User can approve promotions only to staging
{
  resource: "promotion",
  action: "approve",
  scope: { environmentId: "staging-env-id" }
}

// User can deploy only to targets with specific labels
{
  resource: "target",
  action: "deploy",
  scope: { labels: { "tier": "frontend" } }
}
```

## Separation of Duties (SoD)

When SoD is enabled for an environment:
- The user who requested a promotion cannot approve it
- The user who created a release cannot be the sole approver
- Approval records include SoD verification status

```typescript
interface ApprovalValidation {
  promotionId: UUID;
  approverId: UUID;
  requesterId: UUID;
  sodRequired: boolean;
  sodSatisfied: boolean;
  validationResult: "valid" | "self_approval_denied" | "sod_violation";
}
```

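The self-approval rule above can be checked mechanically; a sketch that produces the `ApprovalValidation` shape (the `validateApproval` helper is illustrative, not an existing API, and only covers the first rule):

```typescript
// Sketch of the SoD self-approval rule: the requester of a promotion may
// not approve it. Field names follow the ApprovalValidation interface;
// "sod_violation" is reserved for the other SoD rules (e.g. sole approver).
function validateApproval(
  promotionId: string,
  approverId: string,
  requesterId: string,
  sodRequired: boolean
) {
  const selfApproval = approverId === requesterId;
  const sodSatisfied = !sodRequired || !selfApproval;
  return {
    promotionId,
    approverId,
    requesterId,
    sodRequired,
    sodSatisfied,
    validationResult: sodSatisfied
      ? ("valid" as const)
      : ("self_approval_denied" as const),
  };
}
```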
## Permission Checking Algorithm

```typescript
async function checkPermission(
  userId: UUID,
  resource: ResourceType,
  action: ActionType,
  resourceId?: UUID
): Promise<boolean> {
  // 1. Get user's roles and direct permissions
  const userRoles = await getUserRoles(userId);
  const userPermissions = await getUserPermissions(userId);

  // 2. Expand role permissions
  const rolePermissions = userRoles.flatMap(r => roles[r].permissions);
  const allPermissions = [...rolePermissions, ...userPermissions];

  // 3. Check for matching permission
  for (const perm of allPermissions) {
    if (matchesResource(perm.resource, resource) &&
        matchesAction(perm.action, action) &&
        matchesScope(perm.scope, resourceId) &&
        evaluateConditions(perm.conditions)) {
      return true;
    }
  }

  return false;
}

function matchesResource(pattern: string, resource: string): boolean {
  return pattern === "*" || pattern === resource;
}

function matchesAction(pattern: string, action: string): boolean {
  return pattern === "*" || pattern === action;
}
```

## API Authorization Headers

All API requests require:
```http
Authorization: Bearer <access_token>
```

For agent requests (over mTLS):
```http
X-Agent-Id: <agent_id>
Authorization: Bearer <agent_token>
```

## Permission Denied Response

```json
{
  "success": false,
  "error": {
    "code": "PERMISSION_DENIED",
    "message": "User does not have permission to approve promotions to production",
    "details": {
      "resource": "promotion",
      "action": "approve",
      "scope": { "environmentId": "prod-env-id" },
      "requiredRoles": ["admin", "approver"],
      "userRoles": ["viewer"]
    }
  }
}
```

## References

- [Security Overview](overview.md)
- [Agent Security](agent-security.md)
- [Authority Module](../../../authority/architecture.md)

281 docs/modules/release-orchestrator/security/overview.md Normal file
@@ -0,0 +1,281 @@

# Security Architecture Overview

## Security Principles

| Principle | Implementation |
|-----------|----------------|
| **Defense in depth** | Multiple layers: network, auth, authz, audit |
| **Least privilege** | Role-based access; minimal permissions |
| **Zero trust** | All requests authenticated; mTLS for agents |
| **Secrets hygiene** | Secrets in vault; never in DB; ephemeral injection |
| **Audit everything** | All mutations logged; evidence trail |
| **Immutable evidence** | Evidence packets append-only; cryptographically signed |

## Authentication Architecture

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                        AUTHENTICATION ARCHITECTURE                          │
│                                                                             │
│   Human Users                                 Service/Agent                 │
│   ┌──────────┐                                ┌──────────┐                  │
│   │ Browser  │                                │  Agent   │                  │
│   └────┬─────┘                                └────┬─────┘                  │
│        │                                           │                        │
│        │ OAuth 2.0                                 │ mTLS + JWT             │
│        │ Authorization Code                        │                        │
│        ▼                                           ▼                        │
│   ┌──────────────────────────────────────────────────────────────────┐      │
│   │                        AUTHORITY MODULE                          │      │
│   │                                                                  │      │
│   │   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐            │      │
│   │   │  OAuth 2.0  │   │    mTLS     │   │   API Key   │            │      │
│   │   │  Provider   │   │  Validator  │   │  Validator  │            │      │
│   │   └─────────────┘   └─────────────┘   └─────────────┘            │      │
│   │                                                                  │      │
│   │   ┌──────────────────────────────────────────────────────────┐   │      │
│   │   │                      TOKEN ISSUER                        │   │      │
│   │   │   - Short-lived JWT (15 min)                             │   │      │
│   │   │   - Contains: user_id, tenant_id, roles, permissions     │   │      │
│   │   │   - Signed with RS256                                    │   │      │
│   │   └──────────────────────────────────────────────────────────┘   │      │
│   └──────────────────────────────────────────────────────────────────┘      │
│                                   │                                         │
│                                   ▼                                         │
│   ┌──────────────────────────────────────────────────────────────────┐      │
│   │                          API GATEWAY                             │      │
│   │                                                                  │      │
│   │   - Validate JWT signature                                       │      │
│   │   - Check token expiration                                       │      │
│   │   - Extract tenant context                                       │      │
│   │   - Enforce rate limits                                          │      │
│   └──────────────────────────────────────────────────────────────────┘      │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Authorization Model

### Permission Structure

```typescript
interface Permission {
  resource: ResourceType;
  action: ActionType;
  scope?: ScopeType;
  conditions?: Condition[];
}

type ResourceType =
  | "environment"
  | "release"
  | "promotion"
  | "target"
  | "agent"
  | "workflow"
  | "plugin"
  | "integration"
  | "evidence";

type ActionType =
  | "create"
  | "read"
  | "update"
  | "delete"
  | "execute"
  | "approve"
  | "deploy"
  | "rollback";

type ScopeType =
  | "*"                                  // All resources
  | { environmentId: UUID }              // Specific environment
  | { labels: Record<string, string> };  // Label-based
```

### Role Definitions

| Role | Permissions |
|------|-------------|
| `admin` | All permissions on all resources |
| `release_manager` | Full access to releases, promotions; read environments/targets |
| `deployer` | Read releases; create/read promotions; read targets |
| `approver` | Read/approve promotions |
| `viewer` | Read-only access to all resources |

### Environment-Scoped Roles

Roles can be scoped to specific environments:

```typescript
// Example: Production deployer can only deploy to production
const prodDeployer = {
  role: "deployer",
  scope: { environmentId: "prod-environment-uuid" }
};
```

## Policy Enforcement Points

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                        POLICY ENFORCEMENT POINTS                            │
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                         API LAYER (PEP 1)                           │   │
│   │   - Authenticate request                                            │   │
│   │   - Check resource-level permissions                                │   │
│   │   - Enforce tenant isolation                                        │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                   │                                         │
│                                   ▼                                         │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                       SERVICE LAYER (PEP 2)                         │   │
│   │   - Check business-level permissions                                │   │
│   │   - Validate separation of duties                                   │   │
│   │   - Enforce approval policies                                       │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                   │                                         │
│                                   ▼                                         │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                      DECISION ENGINE (PEP 3)                        │   │
│   │   - Evaluate security gates                                         │   │
│   │   - Evaluate custom OPA policies                                    │   │
│   │   - Produce signed decision records                                 │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                   │                                         │
│                                   ▼                                         │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                        DATA LAYER (PEP 4)                           │   │
│   │   - Row-level security (tenant_id)                                  │   │
│   │   - Append-only enforcement (evidence)                              │   │
│   │   - Encryption at rest                                              │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Agent Security Model

See [Agent Security](agent-security.md) for detailed agent security architecture.

Key features:
- mTLS authentication with CA-signed certificates
- One-time registration tokens
- Short-lived JWT for task execution
- Encrypted task payloads
- Scoped credentials per task

## Secrets Management

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                     SECRETS FLOW (NEVER STORED IN DB)                       │
│                                                                             │
│   ┌──────────────┐        ┌──────────────┐        ┌──────────────┐          │
│   │    VAULT     │        │ STELLA CORE  │        │    AGENT     │          │
│   │   (Source)   │        │   (Broker)   │        │  (Consumer)  │          │
│   └──────┬───────┘        └──────┬───────┘        └──────┬───────┘          │
│          │                       │                       │                  │
│          │                       │  Task requires secret │                  │
│          │                       │                       │                  │
│          │  Fetch with service   │                       │                  │
│          │  account token        │                       │                  │
│          │◄──────────────────────│                       │                  │
│          │                       │                       │                  │
│          │  Return secret        │                       │                  │
│          │  (wrapped, short TTL) │                       │                  │
│          │──────────────────────►│                       │                  │
│          │                       │                       │                  │
│          │                       │  Embed in task payload│                  │
│          │                       │  (encrypted)          │                  │
│          │                       │──────────────────────►│                  │
│          │                       │                       │                  │
│          │                       │                       │  Decrypt         │
│          │                       │                       │  Use for task    │
│          │                       │                       │  Discard         │
│                                                                             │
│   Rules:                                                                    │
│   - Secrets NEVER stored in Stella database                                 │
│   - Only Vault references stored                                            │
│   - Secrets fetched at execution time only                                  │
│   - Secrets not logged (masked in logs)                                     │
│   - Secrets not persisted in agent memory beyond task scope                 │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Threat Model

| Threat | Attack Vector | Mitigation |
|--------|---------------|------------|
| **Credential theft** | Database breach | Secrets never in DB; only vault refs |
| **Token replay** | Stolen JWT | Short-lived tokens (15 min); refresh tokens rotated |
| **Agent impersonation** | Fake agent | mTLS with CA-signed certs; registration token one-time |
| **Digest tampering** | Modified image | Digest verification at pull time; mismatch = failure |
| **Evidence tampering** | Modified audit records | Append-only table; cryptographic signing |
| **Privilege escalation** | Compromised account | Role-based access; SoD enforcement; audit logs |
| **Supply chain attack** | Malicious plugin | Plugin sandbox; capability declarations; review process |
| **Lateral movement** | Compromised target | Short-lived task credentials; scoped permissions |
| **Data exfiltration** | Log/artifact theft | Encryption at rest; network segmentation |
| **Denial of service** | Resource exhaustion | Rate limiting; resource quotas; circuit breakers |

## Audit Trail

### Audit Event Structure

```typescript
interface AuditEvent {
  id: UUID;
  timestamp: DateTime;
  tenantId: UUID;

  // Actor
  actorType: "user" | "agent" | "system" | "plugin";
  actorId: UUID;
  actorName: string;
  actorIp?: string;

  // Action
  action: string;              // "promotion.approved", "deployment.started"
  resource: string;            // "promotion"
  resourceId: UUID;

  // Context
  environmentId?: UUID;
  releaseId?: UUID;
  promotionId?: UUID;

  // Details
  before?: object;             // State before (for updates)
  after?: object;              // State after
  metadata?: object;           // Additional context

  // Integrity
  previousEventHash: string;   // Hash chain for tamper detection
  eventHash: string;
}
```

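The `previousEventHash`/`eventHash` fields form a hash chain: each event's hash covers its own payload plus the previous event's hash, so editing or deleting any record breaks verification of every later one. A minimal sketch (the SHA-256 choice and the JSON canonicalisation are assumptions for illustration, not the documented format):

```typescript
import { createHash } from "node:crypto";

// Hash an event's payload chained to the previous event's hash.
function hashEvent(payload: object, previousEventHash: string): string {
  return createHash("sha256")
    .update(previousEventHash)
    .update(JSON.stringify(payload))
    .digest("hex");
}

// Walk the chain: each event must reference the prior hash and carry a
// hash consistent with its own payload.
function verifyChain(
  events: { payload: object; previousEventHash: string; eventHash: string }[]
): boolean {
  let prev = "";
  for (const e of events) {
    if (e.previousEventHash !== prev) return false;
    if (hashEvent(e.payload, e.previousEventHash) !== e.eventHash) return false;
    prev = e.eventHash;
  }
  return true;
}
```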
### Audited Operations

| Category | Operations |
|----------|------------|
| **Authentication** | Login, logout, token refresh, failed attempts |
| **Authorization** | Permission denied events |
| **Environments** | Create, update, delete, freeze window changes |
| **Releases** | Create, deprecate, archive |
| **Promotions** | Request, approve, reject, cancel |
| **Deployments** | Start, complete, fail, rollback |
| **Targets** | Register, update, delete, health changes |
| **Agents** | Register, heartbeat gaps, capability changes |
| **Integrations** | Create, update, delete, test |
| **Plugins** | Enable, disable, config changes |
| **Evidence** | Create (never update/delete) |

## References

- [Authentication & Authorization](auth.md)
- [Agent Security](agent-security.md)
- [Threat Model](threat-model.md)
- [Audit Trail](audit-trail.md)

207 docs/modules/release-orchestrator/security/threat-model.md Normal file
@@ -0,0 +1,207 @@

# Threat Model

## Overview

This document identifies threats to the Release Orchestrator and their mitigations.

## Threat Categories
|
||||
|
||||
### T1: Credential Theft
|
||||
|
||||
| Aspect | Description |
|
||||
|--------|-------------|
|
||||
| **Threat** | Attacker gains access to credentials through database breach |
|
||||
| **Attack Vector** | SQL injection, database backup theft, insider threat |
|
||||
| **Assets at Risk** | Registry credentials, vault tokens, SSH keys |
|
||||
| **Mitigation** | Secrets NEVER stored in database; only vault references stored |
|
||||
| **Detection** | Anomalous vault access patterns, failed authentication attempts |
|
||||
|
||||
### T2: Token Replay
|
||||
|
||||
| Aspect | Description |
|
||||
|--------|-------------|
|
||||
| **Threat** | Attacker captures and reuses valid JWT tokens |
|
||||
| **Attack Vector** | Man-in-the-middle, log file exposure, memory dump |
|
||||
| **Assets at Risk** | User sessions, API access |
|
||||
| **Mitigation** | Short-lived tokens (15 min), refresh token rotation, TLS everywhere |
|
||||
| **Detection** | Token used from unusual IP, concurrent sessions |
|
||||
|
||||
### T3: Agent Impersonation
|
||||
|
||||
| Aspect | Description |
|
||||
|--------|-------------|
|
||||
| **Threat** | Attacker registers fake agent to receive deployment tasks |
|
||||
| **Attack Vector** | Stolen registration token, certificate forgery |
|
||||
| **Assets at Risk** | Deployment credentials, target access |
|
||||
| **Mitigation** | One-time registration tokens, mTLS with CA-signed certs |
|
||||
| **Detection** | Registration from unexpected network, capability mismatch |
|
||||
|
||||
### T4: Digest Tampering
|
||||
|
||||
| Aspect | Description |
|
||||
|--------|-------------|
|
||||
| **Threat** | Attacker modifies container image after release creation |
|
||||
| **Attack Vector** | Registry compromise, man-in-the-middle at pull time |
|
||||
| **Assets at Risk** | Application integrity, supply chain |
|
||||
| **Mitigation** | Digest verification at pull time; mismatch = deployment failure |
|
||||
| **Detection** | Pull failures due to digest mismatch |
|
||||
|
||||
### T5: Evidence Tampering
|
||||
|
||||
| Aspect | Description |
|
||||
|--------|-------------|
|
||||
| **Threat** | Attacker modifies audit records to hide malicious activity |
|
||||
| **Attack Vector** | Database admin access, SQL injection |
|
||||
| **Assets at Risk** | Audit integrity, compliance |
|
||||
| **Mitigation** | Append-only table, cryptographic signing, no UPDATE/DELETE |
|
||||
| **Detection** | Signature verification failure, hash chain break |
|
||||
|
||||
### T6: Privilege Escalation
|
||||
|
||||
| Aspect | Description |
|
||||
|--------|-------------|
|
||||
| **Threat** | User gains permissions beyond their role |
|
||||
| **Attack Vector** | Role assignment exploit, permission bypass |
|
||||
| **Assets at Risk** | Environment access, approval authority |
|
||||
| **Mitigation** | Role-based access, SoD enforcement, audit logs |
|
||||
| **Detection** | Unusual permission patterns, SoD violation attempts |
|
||||
|
||||
### T7: Supply Chain Attack
|
||||
|
||||
| Aspect | Description |
|
||||
|--------|-------------|
|
||||
| **Threat** | Malicious plugin injected into workflow |
|
||||
| **Attack Vector** | Plugin repository compromise, typosquatting |
|
||||
| **Assets at Risk** | All environments, all credentials |
|
||||
| **Mitigation** | Plugin sandbox, capability declarations, signed manifests |
|
||||
| **Detection** | Unexpected network egress, resource anomalies |
|
||||
|
||||
### T8: Lateral Movement
|
||||
|
||||
| Aspect | Description |
|
||||
|--------|-------------|
|
||||
| **Threat** | Attacker uses compromised target to access others |
|
||||
| **Attack Vector** | Target compromise, credential reuse |
|
||||
| **Assets at Risk** | Other targets, environments |
|
||||
| **Mitigation** | Short-lived task credentials, scoped permissions |
|
||||
| **Detection** | Cross-target credential use, unexpected connections |
|
||||
|
||||
### T9: Data Exfiltration
|
||||
|
||||
| Aspect | Description |
|
||||
|--------|-------------|
|
||||
| **Threat** | Attacker extracts logs, artifacts, or configuration |
|
||||
| **Attack Vector** | API abuse, log aggregator compromise |
|
||||
| **Assets at Risk** | Application data, deployment configurations |
|
||||
| **Mitigation** | Encryption at rest, network segmentation, audit logging |
|
||||
| **Detection** | Large data transfers, unusual API patterns |
|
||||
|
||||
### T10: Denial of Service

| Aspect | Description |
|--------|-------------|
| **Threat** | Attacker exhausts resources to prevent deployments |
| **Attack Vector** | API flooding, workflow loop, agent task spam |
| **Assets at Risk** | Service availability |
| **Mitigation** | Rate limiting, resource quotas, circuit breakers |
| **Detection** | Resource exhaustion alerts, traffic spikes |

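The rate-limiting mitigation for T10 can be sketched as a token bucket. This is an illustrative sketch only — the `TokenBucket` type is not part of the codebase, and a production service would more likely lean on platform middleware — but it shows the shape of the control: requests consume tokens, tokens refill at a fixed rate, and a flood is rejected once the bucket drains.

```csharp
using System;

// Illustrative token-bucket rate limiter (hypothetical type, not the shipped API).
// Capacity bounds burst size; refill rate bounds sustained throughput.
public sealed class TokenBucket
{
    private readonly double _capacity;
    private readonly double _refillPerSecond;
    private readonly TimeProvider _time;
    private double _tokens;
    private DateTimeOffset _lastRefill;

    public TokenBucket(double capacity, double refillPerSecond, TimeProvider time)
    {
        _capacity = capacity;
        _refillPerSecond = refillPerSecond;
        _time = time;
        _tokens = capacity;
        _lastRefill = time.GetUtcNow();
    }

    public bool TryConsume(double tokens = 1)
    {
        // Refill proportionally to elapsed time, capped at capacity.
        var now = _time.GetUtcNow();
        _tokens = Math.Min(_capacity, _tokens + (now - _lastRefill).TotalSeconds * _refillPerSecond);
        _lastRefill = now;

        if (_tokens < tokens)
            return false; // reject: bucket exhausted

        _tokens -= tokens;
        return true;
    }
}
```

Taking `TimeProvider` rather than reading the system clock directly keeps the limiter deterministic under test, consistent with the project's time-injection rules.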
## STRIDE Analysis

| Category | Threats | Primary Mitigations |
|----------|---------|---------------------|
| **Spoofing** | T3 Agent Impersonation | mTLS, registration tokens |
| **Tampering** | T4 Digest, T5 Evidence | Digest verification, append-only tables |
| **Repudiation** | Evidence manipulation | Signed evidence packets |
| **Information Disclosure** | T1 Credentials, T9 Exfiltration | Vault integration, encryption |
| **Denial of Service** | T10 Resource exhaustion | Rate limits, quotas |
| **Elevation of Privilege** | T6 Escalation | RBAC, SoD enforcement |

## Trust Boundaries

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ TRUST BOUNDARIES │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ PUBLIC NETWORK (Untrusted) │ │
│ │ │ │
│ │ Internet, External Users, External Services │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ │ TLS + Authentication │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ DMZ (Semi-trusted) │ │
│ │ │ │
│ │ API Gateway, Webhook Gateway │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ │ Internal mTLS │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ INTERNAL NETWORK (Trusted) │ │
│ │ │ │
│ │ Stella Core Services, Database, Internal Vault │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ │ Agent mTLS │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ DEPLOYMENT NETWORK (Controlled) │ │
│ │ │ │
│ │ Agents, Targets │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Data Classification

| Classification | Examples | Protection Requirements |
|---------------|----------|------------------------|
| **Critical** | Vault credentials, signing keys | Hardware security, minimal access |
| **Sensitive** | User tokens, agent certificates | Encryption, access logging |
| **Internal** | Release configs, workflow definitions | Encryption at rest |
| **Public** | API documentation, release names | Integrity protection |

## Security Controls Summary

| Control | Implementation | Threats Addressed |
|---------|----------------|-------------------|
| mTLS | Agent communication | T3 |
| Short-lived tokens | 15-min access tokens | T2 |
| Vault integration | No secrets in DB | T1 |
| Digest verification | Pull-time validation | T4 |
| Append-only tables | Evidence immutability | T5 |
| RBAC + SoD | Permission enforcement | T6 |
| Plugin sandbox | Resource limits, capability control | T7 |
| Scoped credentials | Task-specific access | T8 |
| Encryption | At rest and in transit | T9 |
| Rate limiting | API and resource quotas | T10 |

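The "digest verification" control above (addressing T4) reduces to one invariant: the bytes actually pulled must hash to the digest the release pinned. A minimal sketch, assuming a hypothetical `DigestVerifier` helper rather than the shipped API:

```csharp
using System;
using System.Security.Cryptography;

// Hedged sketch of pull-time digest verification. OCI digests take the
// form "sha256:<64 lowercase hex chars>"; we hash the pulled content and
// compare against the pinned value.
public static class DigestVerifier
{
    public static bool Matches(byte[] pulledContent, string expectedDigest)
    {
        const string prefix = "sha256:";
        if (!expectedDigest.StartsWith(prefix, StringComparison.Ordinal))
            throw new ArgumentException("Unsupported digest algorithm", nameof(expectedDigest));

        var actual = Convert.ToHexString(SHA256.HashData(pulledContent)).ToLowerInvariant();
        return actual == expectedDigest[prefix.Length..];
    }
}
```

A mismatch here is the "digest mismatch at pull" detection signal listed below: the deployment halts rather than running unverified content.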
## Incident Response

### Detection Signals

| Signal | Indicates | Response |
|--------|-----------|----------|
| Digest mismatch at pull | T4 Tampering | Halt deployment, investigate registry |
| Evidence signature failure | T5 Tampering | Preserve logs, forensic analysis |
| Unusual agent registration | T3 Impersonation | Revoke agent, review access |
| SoD violation attempt | T6 Escalation | Block action, alert admin |
| Plugin network egress | T7 Supply chain | Isolate plugin, review manifest |

### Response Procedures

1. **Contain** - Isolate affected component (revoke token, disable agent)
2. **Investigate** - Collect logs, evidence packets, audit trail
3. **Remediate** - Patch vulnerability, rotate credentials
4. **Recover** - Restore service, verify integrity
5. **Report** - Document incident, update threat model

## References

- [Security Overview](overview.md)
- [Agent Security](agent-security.md)
- [Audit Trail](audit-trail.md)
508
docs/modules/release-orchestrator/test-structure.md
Normal file
@@ -0,0 +1,508 @@

# Test Structure & Guidelines

> Test organization, categorization, and patterns for Release Orchestrator modules.

---

## Test Directory Layout

Release Orchestrator tests follow the Stella Ops standard test structure:

```
src/ReleaseOrchestrator/
├── __Libraries/
│   ├── StellaOps.ReleaseOrchestrator.Core/
│   ├── StellaOps.ReleaseOrchestrator.Workflow/
│   ├── StellaOps.ReleaseOrchestrator.Promotion/
│   └── StellaOps.ReleaseOrchestrator.Deploy/
├── __Tests/
│   ├── StellaOps.ReleaseOrchestrator.Core.Tests/         # Unit tests for Core
│   ├── StellaOps.ReleaseOrchestrator.Workflow.Tests/     # Unit tests for Workflow
│   ├── StellaOps.ReleaseOrchestrator.Promotion.Tests/    # Unit tests for Promotion
│   ├── StellaOps.ReleaseOrchestrator.Deploy.Tests/       # Unit tests for Deploy
│   ├── StellaOps.ReleaseOrchestrator.Integration.Tests/  # Integration tests
│   └── StellaOps.ReleaseOrchestrator.Acceptance.Tests/   # End-to-end tests
└── StellaOps.ReleaseOrchestrator.WebService/
```

**Shared test infrastructure**:

```
src/__Tests/__Libraries/
├── StellaOps.Infrastructure.Postgres.Testing/  # PostgreSQL Testcontainers fixtures
└── StellaOps.Testing.Common/                   # Common test utilities
```

---

## Test Categories

Tests **MUST** be categorized using xUnit traits to enable selective execution:

### Unit Tests

```csharp
[Trait("Category", "Unit")]
public class PromotionValidatorTests
{
    [Fact]
    public void Validate_MissingReleaseId_ReturnsFalse()
    {
        // Arrange
        var validator = new PromotionValidator();
        var promotion = new Promotion { ReleaseId = Guid.Empty };

        // Act
        var result = validator.Validate(promotion);

        // Assert
        Assert.False(result.IsValid);
        Assert.Contains("ReleaseId is required", result.Errors);
    }
}
```

**Characteristics**:
- No database, network, or file system access
- Fast execution (< 100ms per test)
- Isolated from external dependencies
- Deterministic and repeatable

### Integration Tests

```csharp
[Trait("Category", "Integration")]
public class PromotionRepositoryTests : IClassFixture<PostgresFixture>
{
    private readonly PostgresFixture _fixture;

    public PromotionRepositoryTests(PostgresFixture fixture)
    {
        _fixture = fixture;
    }

    [Fact]
    public async Task SaveAsync_ValidPromotion_PersistsToDatabase()
    {
        // Arrange
        await using var connection = _fixture.CreateConnection();
        var repository = new PromotionRepository(connection, _fixture.TimeProvider);

        var promotion = new Promotion
        {
            Id = Guid.NewGuid(),
            TenantId = _fixture.DefaultTenantId,
            ReleaseId = Guid.NewGuid(),
            TargetEnvironmentId = Guid.NewGuid(),
            Status = PromotionState.PendingApproval,
            RequestedAt = _fixture.TimeProvider.GetUtcNow(),
            RequestedBy = Guid.NewGuid()
        };

        // Act
        await repository.SaveAsync(promotion, CancellationToken.None);

        // Assert
        var retrieved = await repository.GetByIdAsync(promotion.Id, CancellationToken.None);
        Assert.NotNull(retrieved);
        Assert.Equal(promotion.ReleaseId, retrieved.ReleaseId);
    }
}
```

**Characteristics**:
- Uses Testcontainers for PostgreSQL
- Requires Docker to be running
- Slower execution (hundreds of ms per test)
- Tests data access layer and database constraints

### Acceptance Tests

```csharp
[Trait("Category", "Acceptance")]
public class PromotionWorkflowTests : IClassFixture<WebApplicationFactory<Program>>
{
    private readonly WebApplicationFactory<Program> _factory;
    private readonly HttpClient _client;

    public PromotionWorkflowTests(WebApplicationFactory<Program> factory)
    {
        _factory = factory;
        _client = factory.CreateClient();
    }

    [Fact]
    public async Task PromotionWorkflow_EndToEnd_SuccessfullyDeploysRelease()
    {
        // Arrange: Create environment, release, and promotion
        var envId = await CreateEnvironmentAsync("Production");
        var releaseId = await CreateReleaseAsync("v2.3.1");

        // Act: Request promotion
        var promotionResponse = await _client.PostAsJsonAsync(
            "/api/v1/promotions",
            new { releaseId, targetEnvironmentId = envId });

        promotionResponse.EnsureSuccessStatusCode();
        var promotion = await promotionResponse.Content.ReadFromJsonAsync<PromotionDto>();

        // Act: Approve promotion
        var approveResponse = await _client.PostAsync(
            $"/api/v1/promotions/{promotion.Id}/approve", null);

        approveResponse.EnsureSuccessStatusCode();

        // Assert: Verify deployment completed
        var status = await GetPromotionStatusAsync(promotion.Id);
        Assert.Equal("deployed", status.Status);
    }
}
```

**Characteristics**:
- Tests full API surface and workflows
- Uses `WebApplicationFactory` for in-memory hosting
- Tests end-to-end scenarios
- May involve multiple services

---

## PostgreSQL Test Fixtures

### Testcontainers Fixture

```csharp
public class PostgresFixture : IAsyncLifetime
{
    private PostgreSqlContainer? _container;
    private NpgsqlConnection? _connection;

    public TimeProvider TimeProvider { get; private set; } = null!;
    public IGuidGenerator GuidGenerator { get; private set; } = null!;
    public Guid DefaultTenantId { get; private set; }

    public async Task InitializeAsync()
    {
        // Start PostgreSQL container
        _container = new PostgreSqlBuilder()
            .WithImage("postgres:16")
            .WithDatabase("stellaops_test")
            .WithUsername("postgres")
            .WithPassword("postgres")
            .Build();

        await _container.StartAsync();

        // Create connection
        _connection = new NpgsqlConnection(_container.GetConnectionString());
        await _connection.OpenAsync();

        // Run migrations
        await ApplyMigrationsAsync();

        // Setup test infrastructure
        TimeProvider = new ManualTimeProvider();
        GuidGenerator = new SequentialGuidGenerator();
        DefaultTenantId = Guid.Parse("00000000-0000-0000-0000-000000000001");

        // Seed test data
        await SeedTestDataAsync();
    }

    public NpgsqlConnection CreateConnection()
    {
        if (_container == null)
            throw new InvalidOperationException("Container not initialized");

        return new NpgsqlConnection(_container.GetConnectionString());
    }

    private async Task ApplyMigrationsAsync()
    {
        // Apply schema migrations
        await ExecuteSqlFileAsync("schema/release-orchestrator-schema.sql");
    }

    private async Task SeedTestDataAsync()
    {
        // Create default tenant
        await using var cmd = _connection!.CreateCommand();
        cmd.CommandText = @"
            INSERT INTO tenants (id, name, created_at)
            VALUES (@id, @name, @created_at)
            ON CONFLICT DO NOTHING";
        cmd.Parameters.AddWithValue("id", DefaultTenantId);
        cmd.Parameters.AddWithValue("name", "Test Tenant");
        cmd.Parameters.AddWithValue("created_at", TimeProvider.GetUtcNow());
        await cmd.ExecuteNonQueryAsync();
    }

    public async Task DisposeAsync()
    {
        if (_connection != null)
        {
            await _connection.DisposeAsync();
        }

        if (_container != null)
        {
            await _container.DisposeAsync();
        }
    }
}
```

---

## Test Patterns

### Deterministic Time in Tests

```csharp
public class PromotionTimingTests
{
    [Fact]
    public void CreatePromotion_SetsCorrectTimestamp()
    {
        // Arrange
        var manualTime = new ManualTimeProvider();
        manualTime.SetUtcNow(new DateTimeOffset(2026, 1, 10, 14, 30, 0, TimeSpan.Zero));

        var guidGen = new SequentialGuidGenerator();
        var manager = new PromotionManager(manualTime, guidGen);

        // Act
        var promotion = manager.CreatePromotion(
            releaseId: Guid.Parse("00000000-0000-0000-0000-000000000001"),
            targetEnvId: Guid.Parse("00000000-0000-0000-0000-000000000002")
        );

        // Assert
        Assert.Equal(
            new DateTimeOffset(2026, 1, 10, 14, 30, 0, TimeSpan.Zero),
            promotion.RequestedAt
        );
    }
}
```

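The `ManualTimeProvider` used above is not shown elsewhere in this chunk; one plausible shape, sketched under the assumption that it is a `TimeProvider` whose clock only moves when the test says so:

```csharp
using System;

// Hedged sketch of a manually-advanced TimeProvider for deterministic tests.
// Illustrative only — the shared test library's actual implementation may differ.
public sealed class ManualTimeProvider : TimeProvider
{
    private DateTimeOffset _utcNow = DateTimeOffset.UnixEpoch;

    // Pin the clock to an exact instant.
    public void SetUtcNow(DateTimeOffset value) => _utcNow = value;

    // Move the clock forward explicitly.
    public void Advance(TimeSpan delta) => _utcNow += delta;

    public override DateTimeOffset GetUtcNow() => _utcNow;
}
```

Because the clock never moves on its own, assertions on timestamps become exact equality checks instead of tolerance windows.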
### Testing CancellationToken Propagation

```csharp
public class PromotionCancellationTests
{
    [Fact]
    public async Task ApprovePromotionAsync_CancellationRequested_ThrowsOperationCanceledException()
    {
        // Arrange
        var cts = new CancellationTokenSource();
        var repository = new Mock<IPromotionRepository>();

        repository
            .Setup(r => r.GetByIdAsync(It.IsAny<Guid>(), It.IsAny<CancellationToken>()))
            .Returns(async (Guid id, CancellationToken ct) =>
            {
                await Task.Delay(100, ct); // Simulate delay
                return new Promotion { Id = id };
            });

        var manager = new PromotionManager(repository.Object, TimeProvider.System, new SystemGuidGenerator());

        // Act & Assert
        cts.Cancel(); // Cancel before operation completes

        await Assert.ThrowsAsync<OperationCanceledException>(async () =>
            await manager.ApprovePromotionAsync(Guid.NewGuid(), Guid.NewGuid(), cts.Token)
        );
    }
}
```

### Testing Immutability

```csharp
public class ReleaseImmutabilityTests
{
    [Fact]
    public void GetComponents_ReturnsImmutableCollection()
    {
        // Arrange
        var release = new Release
        {
            Components = new Dictionary<string, ComponentDigest>
            {
                ["api"] = new ComponentDigest("registry.io/api", "sha256:abc123", "v1.0.0")
            }.ToImmutableDictionary()
        };

        // Act
        var components = release.Components;

        // Assert: Attempting to modify throws
        Assert.Throws<NotSupportedException>(() =>
        {
            var mutable = (IDictionary<string, ComponentDigest>)components;
            mutable["web"] = new ComponentDigest("registry.io/web", "sha256:def456", "v1.0.0");
        });
    }
}
```

### Testing Evidence Hash Determinism

```csharp
public class EvidenceHashDeterminismTests
{
    [Fact]
    public void ComputeEvidenceHash_SameInputs_ProducesSameHash()
    {
        // Arrange
        var decisionRecord = new DecisionRecord
        {
            PromotionId = Guid.Parse("00000000-0000-0000-0000-000000000001"),
            DecidedAt = new DateTimeOffset(2026, 1, 10, 12, 0, 0, TimeSpan.Zero),
            Outcome = "approved",
            GateResults = ImmutableArray.Create(
                new GateResult("security", "pass", null)
            )
        };

        // Act: Compute hash multiple times
        var hash1 = EvidenceHasher.ComputeHash(decisionRecord);
        var hash2 = EvidenceHasher.ComputeHash(decisionRecord);

        // Assert: Hashes are identical
        Assert.Equal(hash1, hash2);
    }
}
```

---

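The test above exercises an `EvidenceHasher` whose implementation is not shown here. The determinism it asserts typically comes from serializing the record with fixed, whitespace-free options before hashing — a minimal sketch, assuming `System.Text.Json` and SHA-256 (not necessarily the real implementation):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

// Hypothetical sketch of an evidence hasher: serialize canonically, then hash.
// Determinism holds because serializer options are fixed and property order
// per type is stable; the shipped hasher may use a stricter canonical form.
public static class EvidenceHasher
{
    private static readonly JsonSerializerOptions Options = new()
    {
        WriteIndented = false, // no cosmetic whitespace in the hashed bytes
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase
    };

    public static string ComputeHash<T>(T record)
    {
        var json = JsonSerializer.Serialize(record, Options);
        var digest = SHA256.HashData(Encoding.UTF8.GetBytes(json));
        return Convert.ToHexString(digest).ToLowerInvariant();
    }
}
```

Any change to the serialization options is a breaking change to stored evidence hashes, which is why the determinism test belongs in the suite.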
## Running Tests

### Run All Tests

```bash
dotnet test src/StellaOps.sln
```

### Run Only Unit Tests

```bash
dotnet test src/StellaOps.sln --filter "Category=Unit"
```

### Run Only Integration Tests

```bash
dotnet test src/StellaOps.sln --filter "Category=Integration"
```

### Run Specific Test Class

```bash
dotnet test --filter "FullyQualifiedName~PromotionValidatorTests"
```

### Run with Coverage

```bash
dotnet test src/StellaOps.sln --collect:"XPlat Code Coverage"
```

---

## Test Data Builders

Use the builder pattern for complex test data:

```csharp
public class PromotionBuilder
{
    private Guid _id = Guid.NewGuid();
    private Guid _tenantId = Guid.NewGuid();
    private Guid _releaseId = Guid.NewGuid();
    private Guid _targetEnvId = Guid.NewGuid();
    private PromotionState _status = PromotionState.PendingApproval;
    private DateTimeOffset _requestedAt = DateTimeOffset.UtcNow;

    public PromotionBuilder WithId(Guid id)
    {
        _id = id;
        return this;
    }

    public PromotionBuilder WithStatus(PromotionState status)
    {
        _status = status;
        return this;
    }

    public PromotionBuilder WithReleaseId(Guid releaseId)
    {
        _releaseId = releaseId;
        return this;
    }

    public Promotion Build()
    {
        return new Promotion
        {
            Id = _id,
            TenantId = _tenantId,
            ReleaseId = _releaseId,
            TargetEnvironmentId = _targetEnvId,
            Status = _status,
            RequestedAt = _requestedAt,
            RequestedBy = Guid.NewGuid()
        };
    }
}

// Usage in tests
[Fact]
public void ApprovePromotion_PendingStatus_TransitionsToApproved()
{
    var promotion = new PromotionBuilder()
        .WithStatus(PromotionState.PendingApproval)
        .Build();

    // ... test logic
}
```

---

## Code Coverage Requirements

- **Unit tests**: Aim for 80%+ coverage of business logic
- **Integration tests**: Cover all data access paths and constraints
- **Acceptance tests**: Cover critical user journeys

**Exclusions from coverage**:
- Program.cs / Startup.cs configuration code
- DTOs and simple data classes
- Generated code

---

## Summary Checklist

Before merging:

- [ ] All tests categorized with `[Trait("Category", "...")]`
- [ ] Unit tests use `TimeProvider` and `IGuidGenerator` for determinism
- [ ] Integration tests use `PostgresFixture` with Testcontainers
- [ ] `CancellationToken` propagation tested where applicable
- [ ] Evidence hash determinism verified
- [ ] No test reimplements production logic
- [ ] All tests pass locally and in CI
- [ ] Code coverage meets requirements

---

## References

- [Implementation Guide](./implementation-guide.md) — .NET implementation patterns
- [CLAUDE.md](../../../CLAUDE.md) — Stella Ops coding rules
- [PostgreSQL Testing Guide](../../infrastructure/Postgres.Testing/README.md) — Testcontainers setup
- [src/__Tests/AGENTS.md](../../../src/__Tests/AGENTS.md) — Global test infrastructure
332
docs/modules/release-orchestrator/ui/overview.md
Normal file
@@ -0,0 +1,332 @@

# UI Overview

## Status

**Planned** - UI implementation has not started.

## Design Principles

| Principle | Implementation |
|-----------|----------------|
| **Clarity** | Clear status indicators, intuitive navigation |
| **Real-time** | Live updates via WebSocket for deployments |
| **Actionable** | One-click approvals, quick actions |
| **Audit-friendly** | Full history visibility, evidence access |
| **Mobile-aware** | Responsive design for on-call scenarios |

## Main Screens

### Dashboard

The main dashboard provides an at-a-glance view of deployment health across environments.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ RELEASE ORCHESTRATOR [User] [Settings] │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ ENVIRONMENT PIPELINE │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ DEV │───►│ STAGING │───►│ UAT │───►│ PROD │ │ │
│ │ │ v1.5.0 │ │ v1.4.2 │ │ v1.4.1 │ │ v1.4.0 │ │ │
│ │ │ 3/3 OK │ │ 2/2 OK │ │ 2/2 OK │ │ 5/5 OK │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────┐ ┌──────────────────────────────┐ │
│ │ PENDING APPROVALS (3) │ │ RECENT DEPLOYMENTS │ │
│ │ │ │ │ │
│ │ ● myapp → prod [Approve] │ │ ✓ api v1.5.0 → dev 2m │ │
│ │ Requested by: John │ │ ✓ web v1.4.2 → staging 15m │ │
│ │ 2 hours ago │ │ ✗ api v1.4.1 → uat 1h │ │
│ │ │ │ ✓ web v1.4.0 → prod 2h │ │
│ │ ● web → uat [Approve] │ │ │ │
│ │ Requested by: Jane │ │ [View All] │ │
│ │ 30 minutes ago │ │ │ │
│ │ │ │ │ │
│ └──────────────────────────────┘ └──────────────────────────────┘ │
│ │
│ ┌──────────────────────────────┐ ┌──────────────────────────────┐ │
│ │ AGENT STATUS │ │ ACTIVE WORKFLOWS │ │
│ │ │ │ │ │
│ │ ● 12 Online │ │ ● Deploy api v1.5.0 │ │
│ │ ○ 1 Offline │ │ Step: Health Check (3/5) │ │
│ │ ◐ 2 Degraded │ │ │ │
│ │ │ │ ● Promote web to UAT │ │
│ │ [View Details] │ │ Step: Awaiting Approval │ │
│ │ │ │ │ │
│ └──────────────────────────────┘ └──────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Releases View

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ RELEASES [+ Create Release] │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Filter: [All ▼] Status: [All ▼] Search: [________________] │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ NAME STATUS COMPONENTS ENVIRONMENTS CREATED │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ myapp-v1.5.0 Ready 3 dev 2h ago │ │
│ │ myapp-v1.4.2 Deployed 3 staging, uat 1d ago │ │
│ │ myapp-v1.4.1 Deployed 3 prod 3d ago │ │
│ │ myapp-v1.4.0 Deprecated 3 - 1w ago │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ RELEASE DETAIL: myapp-v1.5.0 [Promote ▼] │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ Components: │ │
│ │ ┌────────────────────────────────────────────────────────────┐ │ │
│ │ │ api sha256:abc123... registry.io/myorg/api │ │ │
│ │ │ web sha256:def456... registry.io/myorg/web │ │ │
│ │ │ worker sha256:ghi789... registry.io/myorg/worker │ │ │
│ │ └────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ Source: https://github.com/myorg/myapp @ v1.5.0 │ │
│ │ Created: 2h ago by john@example.com │ │
│ │ │ │
│ │ Promotion History: │ │
│ │ dev (✓) → staging (pending) → uat (-) → prod (-) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Promotion Detail

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ PROMOTION: myapp-v1.5.0 → production │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Status: PENDING APPROVAL [Approve] [Reject] │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ GATE EVALUATION │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ ✓ Security Gate Passed │ │
│ │ No critical vulnerabilities │ │
│ │ │ │
│ │ ✓ Freeze Window Check Passed │ │
│ │ No active freeze windows │ │
│ │ │ │
│ │ ◐ Approval Gate 1/2 Approvals │ │
│ │ Jane approved 30m ago │ │
│ │ Waiting for 1 more approval │ │
│ │ │ │
│ │ ○ Separation of Duties Pending │ │
│ │ Requester: John (cannot approve) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ PROMOTION TIMELINE │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ 10:00 John requested promotion │ │
│ │ 10:05 Security gate evaluated: PASSED │ │
│ │ 10:05 Freeze check: PASSED │ │
│ │ 10:30 Jane approved │ │
│ │ 11:00 Waiting for additional approval... │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Workflow Editor

Visual editor for creating and modifying workflow templates.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ WORKFLOW EDITOR: standard-deploy [Save] [Run] │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌─────────────────────────────────────────────────┐ │
│ │ STEP PALETTE │ │ │ │
│ │ │ │ │ │
│ │ Control │ │ ┌──────────┐ │ │
│ │ ├─ Approval │ │ │ Approval │ │ │
│ │ ├─ Wait │ │ │ Gate │ │ │
│ │ └─ Condition │ │ └────┬─────┘ │ │
│ │ │ │ │ │ │
│ │ Gates │ │ ▼ │ │
│ │ ├─ Security │ │ ┌──────────┐ │ │
│ │ ├─ Freeze │ │ │ Security │ │ │
│ │ └─ Custom │ │ │ Gate │ │ │
│ │ │ │ └────┬─────┘ │ │
│ │ Deploy │ │ │ │ │
│ │ ├─ Docker │ │ ▼ │ │
│ │ ├─ Compose │ │ ┌──────────┐ │ │
│ │ └─ ECS │ │ │ Deploy │ │ │
│ │ │ │ │ Targets │ │ │
│ │ Verify │ │ └────┬─────┘ │ │
│ │ ├─ Health │ │ │ │ │
│ │ └─ Smoke Test │ │ ┌────┴────┐ │ │
│ │ │ │ │ │ │ │
│ │ Notify │ │ ▼ ▼ │ │
│ │ ├─ Slack │ │ ┌──────┐ ┌──────────┐ │ │
│ │ └─ Email │ │ │Health│ │ Rollback │◄──[on failure] │ │
│ │ │ │ │Check │ │ Handler │ │ │
│ │ │ │ └──┬───┘ └────┬─────┘ │ │
│ │ │ │ │ │ │ │
│ │ │ │ ▼ ▼ │ │
│ │ │ │ ┌──────┐ ┌──────────┐ │ │
│ │ │ │ │Notify│ │ Notify │ │ │
│ │ │ │ │Success│ │ Failure │ │ │
│ │ │ │ └──────┘ └──────────┘ │ │
│ │ │ │ │ │
│ └─────────────────┘ └─────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ STEP PROPERTIES: Deploy Targets │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ Type: deploy-compose │ │
│ │ Strategy: [Rolling ▼] │ │
│ │ Parallelism: [2] │ │
│ │ Timeout: [600] seconds │ │
│ │ On Failure: [Rollback ▼] │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Deployment Live View

Real-time view of an active deployment.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ DEPLOYMENT: myapp-v1.5.0 → production [Abort]│
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Status: RUNNING Progress: ████████░░ 80% │
│ Strategy: Rolling (batch 4/5) Duration: 5m 23s │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ TARGET STATUS │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ ✓ prod-host-1 sha256:abc123 Deployed Health: OK │ │
│ │ ✓ prod-host-2 sha256:abc123 Deployed Health: OK │ │
│ │ ✓ prod-host-3 sha256:abc123 Deployed Health: OK │ │
│ │ ● prod-host-4 sha256:abc123 Deploying Health: Checking... │ │
│ │ ○ prod-host-5 - Pending Health: - │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ LIVE LOGS: prod-host-4 │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ 10:25:15 Pulling image sha256:abc123... │ │
│ │ 10:25:18 Image pulled successfully │ │
│ │ 10:25:19 Stopping existing container... │ │
│ │ 10:25:20 Starting new container... │ │
│ │ 10:25:21 Container started │ │
│ │ 10:25:22 Running health check... │ │
│ │ 10:25:25 Health check passed (1/3) │ │
│ │ 10:25:28 Health check passed (2/3) │ │
│ │ ... │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Environment Management

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ ENVIRONMENTS [+ Add Environment] │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ NAME ORDER TARGETS CURRENT RELEASE APPROVALS STATUS │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ development 1 3 myapp-v1.5.0 0 Active │ │
│ │ staging 2 2 myapp-v1.4.2 1 Active │ │
│ │ uat 3 2 myapp-v1.4.1 1 Active │ │
│ │ production 4 5 myapp-v1.4.0 2 + SoD Active │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ ENVIRONMENT DETAIL: production [Edit] │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ Approval Policy: │ │
│ │ - Required approvals: 2 │ │
│ │ - Separation of duties: Enabled │ │
│ │ - Approver roles: release-manager, tech-lead │ │
│ │ │ │
│ │ Freeze Windows: │ │
│ │ ┌────────────────────────────────────────────────────────────┐ │ │
│ │ │ Holiday Freeze Dec 20 - Jan 5 Active [Remove] │ │ │
│ │ │ Weekend Freeze Sat-Sun Active [Remove] │ │ │
│ │ └────────────────────────────────────────────────────────────┘ │ │
│ │ [+ Add Freeze Window] │ │
│ │ │ │
│ │ Targets: │ │
│ │ ┌────────────────────────────────────────────────────────────┐ │ │
│ │ │ prod-host-1 docker_host healthy sha256:abc... │ │ │
│ │ │ prod-host-2 docker_host healthy sha256:abc... │ │ │
│ │ │ prod-host-3 docker_host healthy sha256:abc... │ │ │
│ │ │ prod-host-4 docker_host healthy sha256:abc... │ │ │
│ │ │ prod-host-5 docker_host degraded sha256:abc... │ │ │
│ │ └────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Key Interactions

### Approval Flow

1. User sees a pending-approval notification on the dashboard
2. Clicks through to the promotion detail
3. Reviews gate evaluation results and change details
4. Clicks "Approve" or "Reject" with an optional comment
5. System validates separation-of-duties (SoD) requirements
6. Promotion advances, or a notification is sent

### Quick Promote

1. From the release detail, user clicks "Promote"
2. Selects the target environment from a dropdown
3. Confirms the promotion request
4. System evaluates gates immediately
5. If auto-approved, deployment begins
6. If approval is required, a notification is sent to approvers

### Emergency Rollback

1. From deployment history or an alert, user clicks "Rollback"
2. System shows the previous healthy version
3. User confirms the rollback
4. System creates a rollback deployment job
5. Real-time progress is shown

## Mobile Considerations

- Responsive design for smaller screens
- Critical actions (approve/reject) accessible on mobile
- Push notifications for pending approvals
- Simplified views for monitoring on the go

## References

- [API Overview](../api/overview.md)
- [Workflow Templates](../workflow/templates.md)

591
docs/modules/release-orchestrator/workflow/execution.md
Normal file
@@ -0,0 +1,591 @@
# Workflow Execution

## Overview

The Workflow Engine executes workflow templates as directed acyclic graphs (DAGs) of steps, managing state transitions, parallelism, retries, and failure handling.

## Execution Architecture

```
                          WORKFLOW EXECUTION ARCHITECTURE

┌─────────────────────────────────────────────────────────────────────────────┐
│                              WORKFLOW ENGINE                                │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                          WORKFLOW RUNNER                            │   │
│  │                                                                     │   │
│  │   ┌────────────┐    ┌────────────┐    ┌────────────┐                │   │
│  │   │  Template  │───►│ Execution  │───►│  Context   │                │   │
│  │   │   Parser   │    │  Planner   │    │  Builder   │                │   │
│  │   └────────────┘    └────────────┘    └────────────┘                │   │
│  │         │                 │                 │                       │   │
│  │         └─────────────────┼─────────────────┘                       │   │
│  │                           ▼                                         │   │
│  │  ┌─────────────────────────────────────────────────────────────┐   │   │
│  │  │                       DAG EXECUTOR                          │   │   │
│  │  │                                                             │   │   │
│  │  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐     │   │   │
│  │  │  │  Ready   │  │ Running  │  │ Waiting  │  │ Completed│     │   │   │
│  │  │  │  Queue   │  │   Set    │  │   Set    │  │   Set    │     │   │   │
│  │  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘     │   │   │
│  │  │                                                             │   │   │
│  │  │  ┌──────────────────────────────────────────────────────┐   │   │   │
│  │  │  │                   STEP DISPATCHER                    │   │   │   │
│  │  │  └──────────────────────────────────────────────────────┘   │   │   │
│  │  └─────────────────────────────────────────────────────────────┘   │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                    │                                        │
│                                    ▼                                        │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                        STEP EXECUTOR POOL                           │   │
│  │                                                                     │   │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐     │   │
│  │  │ Executor 1 │  │ Executor 2 │  │ Executor 3 │  │ Executor N │     │   │
│  │  └────────────┘  └────────────┘  └────────────┘  └────────────┘     │   │
│  │                                                                     │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Workflow Run State Machine

```
                             WORKFLOW RUN STATES

                              ┌──────────┐
                              │ CREATED  │
                              └────┬─────┘
                                   │ start()
                                   ▼
                              ┌──────────┐
                              │ RUNNING  │◄──────────────────┐
                              └────┬─────┘                   │
                                   │                         │
               ┌───────────────────┼───────────────────┐     │
               │                   │                   │     │
               ▼                   ▼                   ▼     │
          ┌──────────┐        ┌──────────┐        ┌──────────┐
          │ WAITING  │        │  PAUSED  │        │ FAILING  │
          │ APPROVAL │        │          │        │          │
          └────┬─────┘        └────┬─────┘        └────┬─────┘
               │                   │                   │     │
               │ approve()         │ resume()          │     │
               │                   │                   │     │
               └───────────────►───┴───────────────────┘     │
                                   │                         │
                                   └─────────────────────────┘
                                   │
           ┌───────────────────────┼───────────────────┐
           │                       │                   │
           ▼                       ▼                   ▼
      ┌──────────┐            ┌──────────┐        ┌──────────┐
      │COMPLETED │            │  FAILED  │        │ CANCELLED│
      └──────────┘            └──────────┘        └──────────┘
```


### State Transitions

| Current State | Event | Next State | Description |
|---------------|-------|------------|-------------|
| `created` | `start()` | `running` | Begin workflow execution |
| `running` | Step requires approval | `waiting_approval` | Pause for human approval |
| `running` | `pause()` | `paused` | Manual pause requested |
| `running` | Step fails | `failing` | Handle failure path |
| `running` | All steps complete | `completed` | Workflow success |
| `waiting_approval` | `approve()` | `running` | Resume after approval |
| `waiting_approval` | `reject()` | `failed` | Rejection ends workflow |
| `paused` | `resume()` | `running` | Resume execution |
| `paused` | `cancel()` | `cancelled` | Cancel workflow |
| `failing` | Rollback complete, no recovery | `failed` | Failure handling done |
| `failing` | Recovery succeeds | `running` | Resume via fallback path |

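The transition table above can be enforced with a small guard before any state change is persisted. A minimal sketch; the state names mirror the table, while the function and map names are illustrative assumptions:

```python
# Allowed workflow-run transitions, mirroring the table above.
# Terminal states (completed/failed/cancelled) allow no further transitions.
VALID_RUN_TRANSITIONS = {
    "created": {"running"},
    "running": {"waiting_approval", "paused", "failing", "completed"},
    "waiting_approval": {"running", "failed"},
    "paused": {"running", "cancelled"},
    "failing": {"failed", "running"},
    "completed": set(),
    "failed": set(),
    "cancelled": set(),
}


def transition_run(current: str, target: str) -> str:
    """Return the new state, or raise if the transition is not in the table."""
    if target not in VALID_RUN_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```

Centralizing the check like this keeps illegal transitions (e.g. resurrecting a `completed` run) out of the persistence layer.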
## Step Execution State Machine

```
                                 STEP STATES

                              ┌──────────┐
                              │ PENDING  │
                              └────┬─────┘
                                   │ schedule()
                                   ▼
                              ┌──────────┐
                              │  QUEUED  │
                              └────┬─────┘
                                   │ dispatch()
                                   ▼
                              ┌──────────┐
                              │ RUNNING  │◄─────────┐
                              └────┬─────┘          │
                                   │                │ retry()
               ┌───────────────────┼───────────────┐│
               │                   │               ││
               ▼                   ▼               ▼│
          ┌──────────┐        ┌──────────┐   ┌──────────┐
          │SUCCEEDED │        │  FAILED  │   │ RETRYING │
          └──────────┘        └────┬─────┘   └──────────┘
                                   │
                                   ▼
                        ┌─────────────────────┐
                        │   FAILURE HANDLER   │
                        │  ┌───────────────┐  │
                        │  │ fail          │──┼─► Mark workflow failing
                        │  │ continue      │──┼─► Continue to next step
                        │  │ rollback      │──┼─► Trigger rollback path
                        │  │ goto:{nodeId} │──┼─► Jump to specific node
                        │  └───────────────┘  │
                        └─────────────────────┘
```


### Step States

| State | Description |
|-------|-------------|
| `pending` | Step not yet ready (dependencies incomplete) |
| `queued` | Ready for execution, waiting for an executor |
| `running` | Currently executing |
| `succeeded` | Completed successfully |
| `failed` | Failed after all retries exhausted |
| `retrying` | Failed, waiting for retry |
| `skipped` | Condition evaluated to false |

## DAG Execution Algorithm

```python
import asyncio
from datetime import datetime
from typing import List


class DAGExecutor:
    def __init__(self, workflow_run: WorkflowRun, step_registry: StepRegistry):
        self.run = workflow_run
        self.template = workflow_run.template
        self.step_registry = step_registry
        self.pending = set(node.id for node in self.template.nodes)
        self.running = set()
        self.completed = set()
        self.failed = set()
        self.outputs = {}  # nodeId -> outputs

    async def execute(self):
        """Main execution loop."""
        self.run.status = WorkflowStatus.RUNNING
        self.run.started_at = datetime.utcnow()

        while self.pending or self.running:
            # Find ready nodes (all dependencies satisfied)
            ready = self.find_ready_nodes()

            # Dispatch ready nodes
            for node_id in ready:
                asyncio.create_task(self.execute_node(node_id))
                self.pending.remove(node_id)
                self.running.add(node_id)

            # Wait for any node to complete
            if self.running:
                await self.wait_for_completion()

            # Check for deadlock: nothing ready, nothing running, work remains
            if not ready and self.pending and not self.running:
                raise DeadlockException(self.pending)

        # Determine final status
        if self.failed:
            self.run.status = WorkflowStatus.FAILED
        else:
            self.run.status = WorkflowStatus.COMPLETED

        self.run.completed_at = datetime.utcnow()

    def find_ready_nodes(self) -> List[str]:
        """Find nodes whose dependencies are all complete."""
        ready = []
        for node_id in self.pending:
            node = self.template.get_node(node_id)

            # Check condition: a false condition skips the node entirely
            if node.condition:
                if not self.evaluate_condition(node.condition):
                    self.mark_skipped(node_id)
                    continue

            # Check all incoming edges
            incoming = self.template.get_incoming_edges(node_id)
            dependencies_met = all(
                edge.from_node in self.completed
                for edge in incoming
                if self.evaluate_edge_condition(edge)
            )

            if dependencies_met:
                ready.append(node_id)

        return ready

    async def execute_node(self, node_id: str):
        """Execute a single node."""
        node = self.template.get_node(node_id)
        step_run = StepRun(
            workflow_run_id=self.run.id,
            node_id=node_id,
            status=StepStatus.RUNNING
        )

        try:
            # Resolve inputs
            inputs = self.resolve_inputs(node)

            # Get step executor
            executor = self.step_registry.get_executor(node.type)

            # Execute with timeout
            async with asyncio.timeout(node.timeout):
                outputs = await executor.execute(inputs, node.config)

            # Store outputs
            self.outputs[node_id] = outputs
            step_run.outputs = outputs
            step_run.status = StepStatus.SUCCEEDED

            self.running.remove(node_id)
            self.completed.add(node_id)

        except Exception as e:
            await self.handle_step_failure(node, step_run, e)

    async def handle_step_failure(self, node, step_run, error):
        """Handle step failure according to retry and failure policies."""
        step_run.attempt_number += 1

        # Check retry policy
        if step_run.attempt_number <= node.retry_policy.max_retries:
            if self.is_retryable(error, node.retry_policy):
                step_run.status = StepStatus.RETRYING
                delay = self.calculate_backoff(node.retry_policy, step_run.attempt_number)
                await asyncio.sleep(delay)
                await self.execute_node(node.id)  # Retry; the nested call updates sets
                return

        # No more retries - handle failure
        step_run.status = StepStatus.FAILED
        step_run.error = str(error)

        match node.on_failure:
            case "fail":
                self.run.status = WorkflowStatus.FAILING
                self.failed.add(node.id)
            case "continue":
                self.completed.add(node.id)  # Continue as if succeeded
            case "rollback":
                await self.trigger_rollback(node)
            case _ if node.on_failure.startswith("goto:"):
                target = node.on_failure.split(":")[1]
                self.pending.add(target)  # Add target to pending

        self.running.remove(node.id)
```

## Input Resolution

Inputs to steps can come from multiple sources:

```typescript
interface InputResolver {
  resolve(binding: InputBinding, context: ExecutionContext): any;
}

class StandardInputResolver implements InputResolver {
  resolve(binding: InputBinding, context: ExecutionContext): any {
    switch (binding.source.type) {
      case "literal":
        return binding.source.value;

      case "context":
        // Navigate context path: "release.name" -> context.release.name
        return this.navigatePath(context, binding.source.path);

      case "output":
        // Get output from previous step
        const stepOutputs = context.stepOutputs[binding.source.nodeId];
        return stepOutputs?.[binding.source.outputName];

      case "secret":
        // Fetch from vault (never cached)
        return this.secretsClient.fetch(binding.source.secretName);

      case "expression":
        // Evaluate JavaScript expression
        return this.expressionEvaluator.evaluate(
          binding.source.expression,
          context
        );
    }
  }
}
```

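`navigatePath` above resolves a dotted path such as `release.name` against the context object. A minimal sketch of that lookup, shown here in Python for brevity (the null-on-missing behavior is an assumption, matching the optional-chaining style of the `output` case):

```python
def navigate_path(context: dict, path: str):
    """Resolve a dotted path ("release.name") against nested mappings.

    Returns None if any segment along the path is missing, rather than
    raising, so a bad binding degrades to an absent input.
    """
    current = context
    for segment in path.split("."):
        if not isinstance(current, dict) or segment not in current:
            return None
        current = current[segment]
    return current
```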
## Execution Context

The execution context provides data available to all steps:

```typescript
interface ExecutionContext {
  // Workflow identifiers
  workflowRunId: UUID;
  templateId: UUID;
  templateVersion: number;

  // Input values
  inputs: Record<string, any>;

  // Domain objects (loaded at start)
  release?: Release;
  promotion?: Promotion;
  environment?: Environment;
  targets?: Target[];

  // Step outputs (accumulated during execution)
  stepOutputs: Record<string, Record<string, any>>;

  // Tenant context
  tenantId: UUID;
  userId: UUID;

  // Metadata
  startedAt: DateTime;
  correlationId: string;
}
```

## Concurrency Control

### Parallelism Within Workflows

```typescript
interface ParallelConfig {
  maxConcurrency: number;  // Max simultaneous steps
  failFast: boolean;       // Stop all on first failure
}

// Example: Parallel deployment to multiple targets
const parallelDeploy: StepNode = {
  id: "parallel-deploy",
  type: "parallel",
  config: {
    maxConcurrency: 5,
    failFast: false
  },
  children: [
    { id: "deploy-target-1", type: "deploy-docker", ... },
    { id: "deploy-target-2", type: "deploy-docker", ... },
    { id: "deploy-target-3", type: "deploy-docker", ... },
  ]
};
```

### Global Concurrency Limits

```typescript
interface ConcurrencyLimits {
  maxWorkflowsPerTenant: number;         // Concurrent workflow runs
  maxStepsPerWorkflow: number;           // Concurrent steps per workflow
  maxDeploymentsPerEnvironment: number;  // Prevent deployment conflicts
}

// Default limits
const defaults: ConcurrencyLimits = {
  maxWorkflowsPerTenant: 10,
  maxStepsPerWorkflow: 20,
  maxDeploymentsPerEnvironment: 1  // One deployment at a time
};
```

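Per-scope limits like these are typically enforced with a counting semaphore keyed by the scope (tenant id, environment id, and so on). A sketch under that assumption; the class and method names are illustrative, not part of the spec:

```python
import asyncio
from collections import defaultdict


class ScopedLimiter:
    """One counting semaphore per scope key (tenant id, environment id, ...)."""

    def __init__(self, limit: int):
        self.limit = limit
        self._semaphores = defaultdict(lambda: asyncio.Semaphore(limit))

    async def run(self, scope: str, coro_factory):
        """Run coro_factory() while holding the scope's semaphore slot."""
        async with self._semaphores[scope]:
            return await coro_factory()
```

With `limit=1` per environment key, this degenerates to a mutex, which is exactly the "one deployment at a time" default above.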
## Checkpoint and Resume

Workflows support checkpointing for long-running executions:

```typescript
interface WorkflowCheckpoint {
  workflowRunId: UUID;
  checkpointedAt: DateTime;

  // Execution state
  pendingNodes: string[];
  completedNodes: string[];
  failedNodes: string[];

  // Accumulated data
  stepOutputs: Record<string, Record<string, any>>;

  // Context snapshot
  contextSnapshot: ExecutionContext;
}

class CheckpointManager {
  // Save checkpoint after each step completion
  async saveCheckpoint(run: WorkflowRun): Promise<void> {
    const checkpoint: WorkflowCheckpoint = {
      workflowRunId: run.id,
      checkpointedAt: new Date(),
      pendingNodes: Array.from(run.executor.pending),
      completedNodes: Array.from(run.executor.completed),
      failedNodes: Array.from(run.executor.failed),
      stepOutputs: run.executor.outputs,
      contextSnapshot: run.context
    };

    await this.repository.save(checkpoint);
  }

  // Resume from checkpoint after service restart
  async resumeFromCheckpoint(workflowRunId: UUID): Promise<WorkflowRun> {
    const checkpoint = await this.repository.get(workflowRunId);

    const run = new WorkflowRun();
    run.executor.pending = new Set(checkpoint.pendingNodes);
    run.executor.completed = new Set(checkpoint.completedNodes);
    run.executor.failed = new Set(checkpoint.failedNodes);
    run.executor.outputs = checkpoint.stepOutputs;
    run.context = checkpoint.contextSnapshot;

    // Resume execution
    await run.executor.execute();
    return run;
  }
}
```

## Timeout Handling

```typescript
interface TimeoutConfig {
  stepTimeout: number;      // Per-step timeout (seconds)
  workflowTimeout: number;  // Total workflow timeout (seconds)
}

class TimeoutHandler {
  async executeWithTimeout<T>(
    // The operation receives the abort signal so it can actually be cancelled
    operation: (signal: AbortSignal) => Promise<T>,
    timeoutSeconds: number,
    onTimeout: () => Promise<void>
  ): Promise<T> {
    const controller = new AbortController();
    const timeoutId = setTimeout(
      () => controller.abort(),
      timeoutSeconds * 1000
    );

    try {
      const result = await operation(controller.signal);
      clearTimeout(timeoutId);
      return result;
    } catch (error) {
      clearTimeout(timeoutId);
      if (error.name === 'AbortError') {
        await onTimeout();
        throw new TimeoutException(timeoutSeconds);
      }
      throw error;
    }
  }
}
```

## Event Emission

The workflow engine emits events for observability:

```typescript
type WorkflowEvent =
  | { type: "workflow.started"; workflowRunId: UUID; templateId: UUID }
  | { type: "workflow.completed"; workflowRunId: UUID; status: string }
  | { type: "workflow.failed"; workflowRunId: UUID; error: string }
  | { type: "step.started"; workflowRunId: UUID; nodeId: string }
  | { type: "step.completed"; workflowRunId: UUID; nodeId: string; outputs: any }
  | { type: "step.failed"; workflowRunId: UUID; nodeId: string; error: string }
  | { type: "step.retrying"; workflowRunId: UUID; nodeId: string; attempt: number };

class WorkflowEventEmitter {
  private subscribers: Map<string, ((event: WorkflowEvent) => void)[]> = new Map();

  emit(event: WorkflowEvent): void {
    const handlers = this.subscribers.get(event.type) || [];
    for (const handler of handlers) {
      handler(event);
    }

    // Also emit to event bus for external consumers
    this.eventBus.publish("workflow.events", event);
  }
}
```

## Execution Monitoring

### Real-time Progress

```typescript
interface WorkflowProgress {
  workflowRunId: UUID;
  status: WorkflowStatus;

  // Step progress
  totalSteps: number;
  completedSteps: number;
  runningSteps: number;
  failedSteps: number;

  // Current activity
  currentNodes: string[];

  // Timing
  startedAt: DateTime;
  estimatedCompletion?: DateTime;

  // Step details
  steps: StepProgress[];
}

interface StepProgress {
  nodeId: string;
  nodeName: string;
  status: StepStatus;
  startedAt?: DateTime;
  completedAt?: DateTime;
  attempt: number;
  logs?: string;
}
```

### WebSocket Streaming

```typescript
// Client subscribes to workflow progress
const ws = new WebSocket(`/api/v1/workflow-runs/${runId}/stream`);

ws.onmessage = (event) => {
  const progress: WorkflowProgress = JSON.parse(event.data);
  updateUI(progress);
};

// Server streams updates
class WorkflowStreamHandler {
  async stream(runId: UUID, connection: WebSocket): Promise<void> {
    const subscription = this.eventBus.subscribe(`workflow.${runId}.*`);

    for await (const event of subscription) {
      const progress = await this.buildProgress(runId);
      connection.send(JSON.stringify(progress));

      if (progress.status === 'completed' || progress.status === 'failed') {
        break;
      }
    }

    connection.close();
  }
}
```

## References

- [Workflow Templates](templates.md)
- [Workflow Engine Module](../modules/workflow-engine.md)
- [Promotion Manager](../modules/promotion-manager.md)

405
docs/modules/release-orchestrator/workflow/promotion.md
Normal file
@@ -0,0 +1,405 @@
# Promotion State Machine

## Overview

Promotions move releases through environments (Dev -> Staging -> Production). The promotion state machine manages the lifecycle from request to completion.

## Promotion States

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                         PROMOTION STATE MACHINE                             │
│                                                                             │
│                        ┌──────────────────┐                                 │
│                        │ PENDING_APPROVAL │ (initial)                       │
│                        └────────┬─────────┘                                 │
│                                 │                                           │
│              ┌──────────────────┼──────────────────┐                        │
│              │                  │                  │                        │
│              ▼                  ▼                  ▼                        │
│     ┌────────────────┐ ┌────────────────┐ ┌────────────────┐                │
│     │    REJECTED    │ │  PENDING_GATE  │ │   CANCELLED    │                │
│     └────────────────┘ └────────┬───────┘ └────────────────┘                │
│                                 │                                           │
│                                 │ gates pass                                │
│                                 ▼                                           │
│                        ┌────────────────┐                                   │
│                        │    APPROVED    │                                   │
│                        └────────┬───────┘                                   │
│                                 │                                           │
│                                 │ start deployment                          │
│                                 ▼                                           │
│                        ┌────────────────┐                                   │
│                        │   DEPLOYING    │                                   │
│                        └────────┬───────┘                                   │
│                                 │                                           │
│              ┌──────────────────┼──────────────────┐                        │
│              │                  │                  │                        │
│              ▼                  ▼                  ▼                        │
│     ┌────────────────┐ ┌────────────────┐ ┌────────────────┐                │
│     │     FAILED     │ │    DEPLOYED    │ │  ROLLED_BACK   │                │
│     └────────────────┘ └────────────────┘ └────────────────┘                │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## State Definitions

| State | Description |
|-------|-------------|
| `pending_approval` | Awaiting human approval (if required) |
| `pending_gate` | Awaiting automated gate evaluation |
| `approved` | All approvals and gates satisfied; ready for deployment |
| `rejected` | Blocked by approval rejection or gate failure |
| `deploying` | Deployment in progress |
| `deployed` | Successfully deployed to the target environment |
| `failed` | Deployment failed (not rolled back) |
| `cancelled` | Cancelled by user before completion |
| `rolled_back` | Deployment rolled back to the previous version |

## State Transitions

### Valid Transitions

```typescript
const validTransitions: Record<PromotionStatus, PromotionStatus[]> = {
  pending_approval: ["pending_gate", "approved", "rejected", "cancelled"],
  pending_gate: ["approved", "rejected", "cancelled"],
  approved: ["deploying", "cancelled"],
  deploying: ["deployed", "failed", "rolled_back"],
  rejected: [],      // terminal
  cancelled: [],     // terminal
  deployed: [],      // terminal (for this promotion)
  failed: ["rolled_back"],  // can trigger rollback
  rolled_back: []    // terminal
};
```

### Transition Events

### Transition Events

```typescript
interface PromotionTransition {
  promotionId: UUID;
  fromState: PromotionStatus;
  toState: PromotionStatus;
  trigger: TransitionTrigger;
  triggeredBy: UUID;  // user or system
  timestamp: DateTime;
  details: object;
}

type TransitionTrigger =
  | "approval_granted"
  | "approval_rejected"
  | "gate_passed"
  | "gate_failed"
  | "deployment_started"
  | "deployment_completed"
  | "deployment_failed"
  | "rollback_triggered"
  | "rollback_completed"
  | "user_cancelled";
```

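`transitionState` is called throughout the flows below but is not shown on this page. A minimal sketch of what it presumably does, enforcing the valid-transition map above and recording the transition; the in-memory dict representation is an assumption for illustration:

```python
# Allowed promotion transitions, mirroring validTransitions above.
VALID_TRANSITIONS = {
    "pending_approval": ["pending_gate", "approved", "rejected", "cancelled"],
    "pending_gate": ["approved", "rejected", "cancelled"],
    "approved": ["deploying", "cancelled"],
    "deploying": ["deployed", "failed", "rolled_back"],
    "rejected": [],
    "cancelled": [],
    "deployed": [],
    "failed": ["rolled_back"],
    "rolled_back": [],
}


def transition_state(promotion: dict, to_state: str, trigger: str) -> dict:
    """Validate the transition, append it to the history, and apply it."""
    from_state = promotion["status"]
    if to_state not in VALID_TRANSITIONS[from_state]:
        raise ValueError(f"illegal transition {from_state} -> {to_state}")
    promotion["status"] = to_state
    promotion.setdefault("history", []).append(
        {"from": from_state, "to": to_state, "trigger": trigger}
    )
    return promotion
```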
## Promotion Flow

### 1. Request Promotion

```typescript
async function requestPromotion(request: PromotionRequest): Promise<Promotion> {
  // Validate release exists and is ready
  const release = await getRelease(request.releaseId);
  if (release.status !== "ready" && release.status !== "deployed") {
    throw new Error("Release not ready for promotion");
  }

  // Validate target environment
  const environment = await getEnvironment(request.targetEnvironmentId);

  // Check freeze windows
  if (await isEnvironmentFrozen(environment.id)) {
    throw new Error("Environment is frozen");
  }

  // Determine initial state
  const requiresApproval = environment.requiredApprovals > 0;
  const initialStatus = requiresApproval ? "pending_approval" : "pending_gate";

  // Create promotion
  const promotion = await createPromotion({
    releaseId: request.releaseId,
    sourceEnvironmentId: release.currentEnvironmentId,
    targetEnvironmentId: environment.id,
    status: initialStatus,
    requestedBy: request.userId,
    requestReason: request.reason
  });

  // Emit event
  await emitEvent("promotion.requested", promotion);

  return promotion;
}
```

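`isEnvironmentFrozen` above consults the environment's freeze windows (e.g. "Holiday Freeze Dec 20 - Jan 5", "Weekend Freeze Sat-Sun"). A sketch of that check supporting absolute date ranges and recurring weekday windows; the window schema here is an assumption:

```python
from datetime import datetime


def is_frozen(windows: list, now: datetime) -> bool:
    """True if `now` falls inside any active freeze window.

    Assumed window shapes:
      {"type": "range",  "start": datetime, "end": datetime}
      {"type": "weekly", "weekdays": {5, 6}}   # Mon=0 .. Sun=6
    """
    for w in windows:
        if not w.get("active", True):
            continue
        if w["type"] == "range" and w["start"] <= now <= w["end"]:
            return True
        if w["type"] == "weekly" and now.weekday() in w["weekdays"]:
            return True
    return False
```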
### 2. Approval Phase

```typescript
async function processApproval(
  promotionId: UUID,
  approverId: UUID,
  action: "approve" | "reject",
  comment?: string
): Promise<Promotion> {
  const promotion = await getPromotion(promotionId);
  const environment = await getEnvironment(promotion.targetEnvironmentId);

  // Validate approver can approve
  await validateApproverPermission(approverId, environment.id);

  // Check separation of duties
  if (environment.requireSeparationOfDuties) {
    if (approverId === promotion.requestedBy) {
      throw new Error("Separation of duties violation: requester cannot approve");
    }
  }

  // Record approval
  await recordApproval({
    promotionId,
    approverId,
    action,
    comment
  });

  if (action === "reject") {
    return await transitionState(promotion, "rejected", {
      trigger: "approval_rejected",
      triggeredBy: approverId,
      details: { reason: comment }
    });
  }

  // Check if all required approvals received
  const approvalCount = await countApprovals(promotionId);
  if (approvalCount >= environment.requiredApprovals) {
    return await transitionState(promotion, "pending_gate", {
      trigger: "approval_granted",
      triggeredBy: approverId
    });
  }

  return promotion;
}
```

### 3. Gate Evaluation

```typescript
async function evaluateGates(promotionId: UUID): Promise<GateEvaluationResult> {
  const promotion = await getPromotion(promotionId);
  const environment = await getEnvironment(promotion.targetEnvironmentId);
  const release = await getRelease(promotion.releaseId);

  const gateResults: GateResult[] = [];

  // Security gate
  const securityResult = await evaluateSecurityGate(release, environment);
  gateResults.push(securityResult);

  // Custom policy gates
  for (const policy of environment.policies) {
    const policyResult = await evaluatePolicyGate(release, environment, policy);
    gateResults.push(policyResult);
  }

  // Aggregate results
  const allPassed = gateResults.every(g => g.passed);
  const blockingFailures = gateResults.filter(g => !g.passed && g.blocking);

  // Create decision record
  const decisionRecord = await createDecisionRecord({
    promotionId,
    gateResults,
    decision: allPassed ? "allow" : "block",
    decidedAt: new Date()
  });

  // Transition state
  if (allPassed) {
    await transitionState(promotion, "approved", {
      trigger: "gate_passed",
      triggeredBy: "system",
      details: { decisionRecordId: decisionRecord.id }
    });
  } else {
    await transitionState(promotion, "rejected", {
      trigger: "gate_failed",
      triggeredBy: "system",
      details: { blockingGates: blockingFailures }
    });
  }

  return { passed: allPassed, gateResults, decisionRecord };
}
```

### 4. Deployment Execution

```typescript
async function executeDeployment(promotionId: UUID): Promise<DeploymentJob> {
  const promotion = await getPromotion(promotionId);

  // Transition to deploying
  await transitionState(promotion, "deploying", {
    trigger: "deployment_started",
    triggeredBy: "system"
  });

  // Generate artifacts
  const artifacts = await generateArtifacts(promotion);

  // Create deployment job
  const job = await createDeploymentJob({
    promotionId,
    releaseId: promotion.releaseId,
    environmentId: promotion.targetEnvironmentId,
    artifacts
  });

  // Execute via workflow or direct
  const workflowRun = await startDeploymentWorkflow(job);

  // Update promotion with workflow reference
  await updatePromotion(promotionId, { workflowRunId: workflowRun.id });

  return job;
}
```

### 5. Completion Handling

```typescript
async function handleDeploymentCompletion(
  jobId: UUID,
  status: "succeeded" | "failed"
): Promise<Promotion> {
  const job = await getDeploymentJob(jobId);
  const promotion = await getPromotion(job.promotionId);

  if (status === "succeeded") {
    // Generate evidence packet
    const evidence = await generateEvidencePacket(promotion, job);

    // Update release environment state
    await updateReleaseEnvironmentState({
      releaseId: promotion.releaseId,
      environmentId: promotion.targetEnvironmentId,
      status: "deployed",
      promotionId: promotion.id,
      evidenceRef: evidence.id
    });

    return await transitionState(promotion, "deployed", {
      trigger: "deployment_completed",
      triggeredBy: "system",
      details: { evidencePacketId: evidence.id }
    });
  } else {
    return await transitionState(promotion, "failed", {
      trigger: "deployment_failed",
      triggeredBy: "system",
      details: { jobId, error: job.errorMessage }
    });
  }
}
```

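The success/failure branching above implies transition guards on the promotion state machine. Below is a minimal sketch covering only the states visible in this section; the full machine (request, approval, gate evaluation, and so on) has more states, and the `allowedTransitions` table is an assumption, not the specified machine.

```typescript
// Promotion states that appear in the deployment tail of the flow.
type PromotionState = "deploying" | "deployed" | "failed";

// Legal transitions for this fragment of the state machine (assumed):
// deployment_completed moves deploying -> deployed,
// deployment_failed moves deploying -> failed.
const allowedTransitions: Record<PromotionState, PromotionState[]> = {
  deploying: ["deployed", "failed"],
  deployed: [], // terminal for this flow
  failed: [],   // terminal for this flow
};

function canTransition(from: PromotionState, to: PromotionState): boolean {
  return allowedTransitions[from].includes(to);
}
```

A `transitionState` implementation would check `canTransition` before persisting the new state and appending the audit event.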
## Decision Record

Every promotion produces a decision record:

```typescript
interface DecisionRecord {
  id: UUID;
  promotionId: UUID;
  decision: "allow" | "block";
  decidedAt: DateTime;

  // Inputs
  release: {
    id: UUID;
    name: string;
    components: Array<{ name: string; digest: string }>;
  };
  environment: {
    id: UUID;
    name: string;
  };

  // Gate results
  gateResults: Array<{
    gateName: string;
    gateType: string;
    passed: boolean;
    blocking: boolean;
    message: string;
    details: object;
    evaluatedAt: DateTime;
  }>;

  // Approvals
  approvals: Array<{
    approverId: UUID;
    approverName: string;
    action: "approved" | "rejected";
    comment?: string;
    timestamp: DateTime;
  }>;

  // Context
  requester: {
    id: UUID;
    name: string;
  };
  requestReason: string;

  // Signature
  contentHash: string;
  signature: string;
}
```

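The `contentHash` field must be computed over the record with its derived fields excluded, and the serialization must be deterministic or verification will fail on re-hashing. A minimal sketch, assuming SHA-256 over key-sorted JSON (the real scheme may use a different canonical form, e.g. RFC 8785 JCS):

```typescript
import { createHash } from "node:crypto";

// Sort object keys recursively so the serialization (and therefore the
// hash) is independent of property insertion order.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hash the record minus the fields derived from it.
function contentHashOf(record: Record<string, unknown>): string {
  const payload: Record<string, unknown> = { ...record };
  delete payload.contentHash;
  delete payload.signature;
  return createHash("sha256").update(canonicalize(payload)).digest("hex");
}
```

The `signature` would then be produced over `contentHash` with the tenant's signing key.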
## API Endpoints

```yaml
# Request promotion
POST /api/v1/promotions
Body: { releaseId, targetEnvironmentId, reason? }
Response: Promotion

# Approve/reject promotion
POST /api/v1/promotions/{id}/approve
POST /api/v1/promotions/{id}/reject
Body: { comment? }
Response: Promotion

# Cancel promotion
POST /api/v1/promotions/{id}/cancel
Response: Promotion

# Get decision record
GET /api/v1/promotions/{id}/decision
Response: DecisionRecord

# Preview gates (dry run)
POST /api/v1/promotions/preview-gates
Body: { releaseId, targetEnvironmentId }
Response: { wouldPass: boolean, gates: GateResult[] }
```

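As an illustration, a client can assemble the dry-run call from the endpoint listing above. The helper below is a sketch: the builder name is an assumption, and transport concerns (base URL, auth headers) are left to the caller.

```typescript
interface PreviewGatesRequest {
  path: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the preview-gates request; body shape follows the endpoint listing.
function buildPreviewGatesRequest(
  releaseId: string,
  targetEnvironmentId: string
): PreviewGatesRequest {
  return {
    path: "/api/v1/promotions/preview-gates",
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ releaseId, targetEnvironmentId }),
  };
}
```

The response's `wouldPass` flag lets CI pipelines fail fast before requesting a real promotion.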
## References

- [Workflow Templates](templates.md)
- [Workflow Execution](execution.md)
- [Evidence Schema](../appendices/evidence-schema.md)

327
docs/modules/release-orchestrator/workflow/templates.md
Normal file
@@ -0,0 +1,327 @@

# Workflow Template Structure

## Overview

Workflow templates define the DAG (Directed Acyclic Graph) of steps to execute during deployment, promotion, and other automated processes.

## Template Structure

```typescript
interface WorkflowTemplate {
  id: UUID;
  tenantId: UUID;
  name: string;        // "standard-deploy"
  displayName: string; // "Standard Deployment"
  description: string;
  version: number;     // Auto-incremented

  // DAG structure
  nodes: StepNode[];
  edges: StepEdge[];

  // I/O definitions
  inputs: InputDefinition[];
  outputs: OutputDefinition[];

  // Metadata
  tags: string[];
  isBuiltin: boolean;
  createdAt: DateTime;
  createdBy: UUID;
}
```

## Node Types

### Step Node

```typescript
interface StepNode {
  id: string;                         // Unique within template (e.g., "deploy-api")
  type: string;                       // Step type from registry
  name: string;                       // Display name
  config: Record<string, any>;        // Step-specific configuration
  inputs: InputBinding[];             // Input value bindings
  outputs: OutputBinding[];           // Output declarations
  position: { x: number; y: number }; // UI position

  // Execution settings
  timeout: number;                    // Seconds (default from step type)
  retryPolicy: RetryPolicy;
  onFailure: FailureAction;
  condition?: string;                 // JS expression for conditional execution

  // Documentation
  description?: string;
  documentation?: string;
}

type FailureAction = "fail" | "continue" | "rollback" | `goto:${string}`; // e.g., "goto:rollback-handler"

interface RetryPolicy {
  maxRetries: number;
  backoffType: "fixed" | "exponential";
  backoffSeconds: number;
  retryableErrors: string[];
}
```

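One plausible reading of `RetryPolicy` can be sketched as follows: `fixed` waits `backoffSeconds` before every retry, while `exponential` doubles the wait per attempt. The interpretation of an empty `retryableErrors` list as "retry anything" is an assumption.

```typescript
interface RetryPolicy {
  maxRetries: number;
  backoffType: "fixed" | "exponential";
  backoffSeconds: number;
  retryableErrors: string[];
}

// Delay in seconds before the given retry attempt (1-based).
function backoffDelaySeconds(policy: RetryPolicy, attempt: number): number {
  if (attempt > policy.maxRetries) {
    throw new Error("retries exhausted");
  }
  return policy.backoffType === "fixed"
    ? policy.backoffSeconds
    : policy.backoffSeconds * 2 ** (attempt - 1);
}

function isRetryable(policy: RetryPolicy, errorCode: string): boolean {
  // Empty list treated as "retry any error" (assumption).
  return policy.retryableErrors.length === 0 || policy.retryableErrors.includes(errorCode);
}
```

With the `deploy-targets` settings from the example later in this page (`maxRetries: 2`, exponential, 30s), attempts wait 30s then 60s before failing over to `onFailure`.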
### Input Bindings

```typescript
interface InputBinding {
  name: string; // Input parameter name
  source: InputSource;
}

type InputSource =
  | { type: "literal"; value: any }
  | { type: "context"; path: string } // e.g., "release.name"
  | { type: "output"; nodeId: string; outputName: string }
  | { type: "secret"; secretName: string }
  | { type: "expression"; expression: string }; // JS expression
```

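A minimal resolver sketch for `InputSource`: the `ctx`/`outputs`/`secrets` parameter shapes are assumptions, and a production engine would evaluate `expression` in a sandbox rather than via the `Function` constructor used here for brevity.

```typescript
type InputSource =
  | { type: "literal"; value: any }
  | { type: "context"; path: string }
  | { type: "output"; nodeId: string; outputName: string }
  | { type: "secret"; secretName: string }
  | { type: "expression"; expression: string };

function resolveInput(
  source: InputSource,
  ctx: Record<string, any>,
  outputs: Record<string, Record<string, any>>,
  secrets: Record<string, string>
): any {
  switch (source.type) {
    case "literal":
      return source.value;
    case "context":
      // Walk a dotted path such as "release.name" through the context.
      return source.path.split(".").reduce((acc: any, key) => acc?.[key], ctx);
    case "output":
      return outputs[source.nodeId]?.[source.outputName];
    case "secret":
      return secrets[source.secretName];
    case "expression":
      // Placeholder only; real engines must sandbox expression evaluation.
      return Function("ctx", `"use strict"; return (${source.expression});`)(ctx);
  }
}
```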
### Edge Types

```typescript
interface StepEdge {
  id: string;
  from: string;       // Source node ID
  to: string;         // Target node ID
  condition?: string; // Optional condition expression
  label?: string;     // Display label for conditional edges
}
```

## Built-in Step Types

### Control Steps

| Type | Description | Config |
|------|-------------|--------|
| `approval` | Wait for human approval | `promotionId` |
| `wait` | Wait for specified duration | `durationSeconds` |
| `condition` | Branch based on condition | `expression` |
| `parallel` | Execute children in parallel | `maxConcurrency` |

### Gate Steps

| Type | Description | Config |
|------|-------------|--------|
| `security-gate` | Evaluate security policy | `blockOnCritical`, `blockOnHigh` |
| `custom-gate` | Custom OPA policy evaluation | `policyName` |
| `freeze-check` | Check freeze windows | - |
| `approval-check` | Check approval status | `requiredCount` |

### Deploy Steps

| Type | Description | Config |
|------|-------------|--------|
| `deploy-docker` | Deploy single container | `containerName`, `strategy` |
| `deploy-compose` | Deploy Docker Compose stack | `composePath`, `strategy` |
| `deploy-ecs` | Deploy to AWS ECS | `cluster`, `service` |
| `deploy-nomad` | Deploy to HashiCorp Nomad | `jobName` |

### Verification Steps

| Type | Description | Config |
|------|-------------|--------|
| `health-check` | HTTP/TCP health check | `type`, `path`, `expectedStatus` |
| `smoke-test` | Run smoke test suite | `testSuite`, `timeout` |
| `verify-digest` | Verify deployed digest | `expectedDigest` |

### Integration Steps

| Type | Description | Config |
|------|-------------|--------|
| `webhook` | Call external webhook | `url`, `method`, `headers` |
| `trigger-ci` | Trigger CI pipeline | `integrationId`, `pipelineId` |
| `wait-ci` | Wait for CI pipeline | `runId`, `timeout` |

### Notification Steps

| Type | Description | Config |
|------|-------------|--------|
| `notify` | Send notification | `channel`, `template` |
| `slack` | Send Slack message | `channel`, `message` |
| `email` | Send email | `recipients`, `template` |

### Recovery Steps

| Type | Description | Config |
|------|-------------|--------|
| `rollback` | Rollback deployment | `strategy`, `targetReleaseId` |
| `execute-script` | Run recovery script | `scriptType`, `scriptRef` |

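Since a `StepNode.type` must resolve against the registry at both validation and execution time, the lookup can be sketched as a simple map; the handler signature below is an assumption, not the specified interface.

```typescript
// A step handler receives its node config and resolved inputs, and
// returns named outputs for downstream bindings (assumed signature).
type StepHandler = (
  config: Record<string, any>,
  inputs: Record<string, any>
) => Promise<Record<string, any>>;

const registry = new Map<string, StepHandler>();

function registerStep(type: string, handler: StepHandler): void {
  if (registry.has(type)) throw new Error(`duplicate step type: ${type}`);
  registry.set(type, handler);
}

function resolveStep(type: string): StepHandler {
  const handler = registry.get(type);
  if (!handler) throw new Error(`unknown step type: ${type}`);
  return handler;
}
```

Validation rule 3 below ("all step types exist in registry") reduces to calling `resolveStep` for every node's `type`.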
## Template Example: Standard Deployment

```json
{
  "id": "template-standard-deploy",
  "name": "standard-deploy",
  "displayName": "Standard Deployment",
  "version": 1,
  "inputs": [
    { "name": "releaseId", "type": "uuid", "required": true },
    { "name": "environmentId", "type": "uuid", "required": true },
    { "name": "promotionId", "type": "uuid", "required": true }
  ],
  "nodes": [
    {
      "id": "approval",
      "type": "approval",
      "name": "Approval Gate",
      "config": {},
      "inputs": [
        { "name": "promotionId", "source": { "type": "context", "path": "promotionId" } }
      ],
      "position": { "x": 100, "y": 100 }
    },
    {
      "id": "security-gate",
      "type": "security-gate",
      "name": "Security Verification",
      "config": {
        "blockOnCritical": true,
        "blockOnHigh": true
      },
      "inputs": [
        { "name": "releaseId", "source": { "type": "context", "path": "releaseId" } }
      ],
      "position": { "x": 100, "y": 200 }
    },
    {
      "id": "pre-deploy-hook",
      "type": "execute-script",
      "name": "Pre-Deploy Hook",
      "config": {
        "scriptType": "csharp",
        "scriptRef": "hooks/pre-deploy.csx"
      },
      "inputs": [
        { "name": "release", "source": { "type": "context", "path": "release" } },
        { "name": "environment", "source": { "type": "context", "path": "environment" } }
      ],
      "timeout": 300,
      "onFailure": "fail",
      "position": { "x": 100, "y": 300 }
    },
    {
      "id": "deploy-targets",
      "type": "deploy-compose",
      "name": "Deploy to Targets",
      "config": {
        "strategy": "rolling",
        "parallelism": 2
      },
      "inputs": [
        { "name": "releaseId", "source": { "type": "context", "path": "releaseId" } },
        { "name": "environmentId", "source": { "type": "context", "path": "environmentId" } }
      ],
      "timeout": 600,
      "retryPolicy": {
        "maxRetries": 2,
        "backoffType": "exponential",
        "backoffSeconds": 30
      },
      "onFailure": "rollback",
      "position": { "x": 100, "y": 400 }
    },
    {
      "id": "health-check",
      "type": "health-check",
      "name": "Health Verification",
      "config": {
        "type": "http",
        "path": "/health",
        "expectedStatus": 200,
        "timeout": 30,
        "retries": 5
      },
      "inputs": [
        { "name": "targets", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "deployedTargets" } }
      ],
      "onFailure": "rollback",
      "position": { "x": 100, "y": 500 }
    },
    {
      "id": "post-deploy-hook",
      "type": "execute-script",
      "name": "Post-Deploy Hook",
      "config": {
        "scriptType": "bash",
        "inline": "echo 'Deployment complete'"
      },
      "timeout": 300,
      "onFailure": "continue",
      "position": { "x": 100, "y": 600 }
    },
    {
      "id": "notify-success",
      "type": "notify",
      "name": "Success Notification",
      "config": {
        "channel": "slack",
        "template": "deployment-success"
      },
      "inputs": [
        { "name": "release", "source": { "type": "context", "path": "release" } },
        { "name": "environment", "source": { "type": "context", "path": "environment" } }
      ],
      "onFailure": "continue",
      "position": { "x": 100, "y": 700 }
    },
    {
      "id": "rollback-handler",
      "type": "rollback",
      "name": "Rollback Handler",
      "config": {
        "strategy": "to-previous"
      },
      "inputs": [
        { "name": "deploymentJobId", "source": { "type": "output", "nodeId": "deploy-targets", "outputName": "jobId" } }
      ],
      "position": { "x": 300, "y": 450 }
    },
    {
      "id": "notify-failure",
      "type": "notify",
      "name": "Failure Notification",
      "config": {
        "channel": "slack",
        "template": "deployment-failure"
      },
      "onFailure": "continue",
      "position": { "x": 300, "y": 550 }
    }
  ],
  "edges": [
    { "id": "e1", "from": "approval", "to": "security-gate" },
    { "id": "e2", "from": "security-gate", "to": "pre-deploy-hook" },
    { "id": "e3", "from": "pre-deploy-hook", "to": "deploy-targets" },
    { "id": "e4", "from": "deploy-targets", "to": "health-check" },
    { "id": "e5", "from": "health-check", "to": "post-deploy-hook" },
    { "id": "e6", "from": "post-deploy-hook", "to": "notify-success" },
    { "id": "e7", "from": "deploy-targets", "to": "rollback-handler", "condition": "status === 'failed'" },
    { "id": "e8", "from": "health-check", "to": "rollback-handler", "condition": "status === 'failed'" },
    { "id": "e9", "from": "rollback-handler", "to": "notify-failure" }
  ]
}
```

## Template Validation

Templates are validated for:

1. **Structural validity**: Valid JSON/YAML, required fields present
2. **DAG validity**: No cycles, all edges reference valid nodes
3. **Type validity**: All step types exist in registry
4. **Schema validity**: Step configs match type schemas
5. **Input validity**: All required inputs are bindable

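Rule 2 (DAG validity) can be checked with Kahn's algorithm: if a topological order cannot consume every node, a cycle exists. A minimal sketch (the `validateDag` name and error strings are assumptions):

```typescript
interface StepEdge { id: string; from: string; to: string }

function validateDag(nodeIds: string[], edges: StepEdge[]): string[] {
  const errors: string[] = [];
  const ids = new Set(nodeIds);

  // Every edge must reference declared nodes.
  for (const e of edges) {
    if (!ids.has(e.from) || !ids.has(e.to)) {
      errors.push(`edge ${e.id} references unknown node`);
    }
  }

  // Kahn's algorithm: repeatedly remove zero-indegree nodes.
  const indegree = new Map(nodeIds.map((id) => [id, 0]));
  for (const e of edges) {
    if (ids.has(e.from) && ids.has(e.to)) {
      indegree.set(e.to, (indegree.get(e.to) ?? 0) + 1);
    }
  }
  const queue = nodeIds.filter((id) => indegree.get(id) === 0);
  let visited = 0;
  while (queue.length > 0) {
    const id = queue.shift()!;
    visited++;
    for (const e of edges) {
      if (e.from === id && ids.has(e.to)) {
        const d = (indegree.get(e.to) ?? 0) - 1;
        indegree.set(e.to, d);
        if (d === 0) queue.push(e.to);
      }
    }
  }
  if (visited < nodeIds.length) errors.push("cycle detected");
  return errors;
}
```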
## References

- [Workflow Engine](../modules/workflow-engine.md)
- [Execution State Machine](execution.md)
- [Step Registry](../modules/workflow-engine.md#module-step-registry)