release orchestrator pivot, architecture and planning
docs/modules/release-orchestrator/architecture.md
# Release Orchestrator Architecture

> Technical architecture specification for the Release Orchestrator — Stella Ops Suite's central release control plane for non-Kubernetes container estates.

**Status:** Planned (not yet implemented)

## Overview

The Release Orchestrator transforms Stella Ops Suite from a vulnerability scanning platform into a centralized, auditable release control plane. It sits between CI systems and runtime targets, governing promotion across environments, enforcing security and policy gates, and producing verifiable evidence for every release decision.

### Core Value Proposition

- **Release orchestration** — UI-driven promotion (Dev → Stage → Prod), approvals, policy gates, rollbacks
- **Security decisioning as a gate** — Scan on build, evaluate on release, re-evaluate on CVE updates
- **OCI-digest-first releases** — Immutable digest-based release identity
- **Toolchain-agnostic integrations** — Plug into any SCM, CI, registry, secrets system
- **Auditability + standards** — Evidence packets, SBOM/VEX/attestation support, deterministic replay

## Design Principles

1. **Digest-First Release Identity** — A release is an immutable set of OCI digests, never mutable tags. Tags are resolved to digests at release creation time.

2. **Pluggable Everything, Stable Core** — Integrations are plugins; the core orchestration engine is stable. Plugins contribute UI screens, connector logic, step types, and agent types.

3. **Evidence for Every Decision** — Every deployment/promotion produces an immutable evidence record containing who, what, why, how, and when.

4. **No Feature Gating** — All plans include all features. Plans differ only in limits: number of environments, new digests per day, and fair use on deployments.

5. **Offline-First Operation** — All core operations work in air-gapped environments. Plugins may require connectivity; core does not.

6. **Immutable Generated Artifacts** — Every deployment generates and stores immutable artifacts (compose lockfiles, scripts, evidence).

## Platform Themes

The Release Orchestrator introduces ten new functional themes:

| Theme | Purpose | Key Modules |
|-------|---------|-------------|
| **INTHUB** | Integration hub | Integration Manager, Connection Profiles, Connector Runtime |
| **ENVMGR** | Environment management | Environment Manager, Target Registry, Agent Manager |
| **RELMAN** | Release management | Component Registry, Version Manager, Release Manager |
| **WORKFL** | Workflow engine | Workflow Designer, Workflow Engine, Step Executor |
| **PROMOT** | Promotion and approval | Promotion Manager, Approval Gateway, Decision Engine |
| **DEPLOY** | Deployment execution | Deploy Orchestrator, Target Executor, Artifact Generator |
| **AGENTS** | Deployment agents | Agent Core, Docker/Compose/ECS/Nomad agents |
| **PROGDL** | Progressive delivery | A/B Manager, Traffic Router, Canary Controller |
| **RELEVI** | Release evidence | Evidence Collector, Sticker Writer, Audit Exporter |
| **PLUGIN** | Plugin infrastructure | Plugin Registry, Plugin Loader, Plugin SDK |

## Components

```
ReleaseOrchestrator/
├── __Libraries/
│   ├── StellaOps.ReleaseOrchestrator.Core/          # Core domain models
│   ├── StellaOps.ReleaseOrchestrator.Workflow/      # DAG workflow engine
│   ├── StellaOps.ReleaseOrchestrator.Promotion/     # Promotion logic
│   ├── StellaOps.ReleaseOrchestrator.Deploy/        # Deployment coordination
│   ├── StellaOps.ReleaseOrchestrator.Evidence/      # Evidence generation
│   ├── StellaOps.ReleaseOrchestrator.Plugin/        # Plugin infrastructure
│   └── StellaOps.ReleaseOrchestrator.Integration/   # Integration connectors
├── StellaOps.ReleaseOrchestrator.WebService/        # HTTP API
├── StellaOps.ReleaseOrchestrator.Worker/            # Background processing
├── StellaOps.Agent.Core/                            # Agent base framework
├── StellaOps.Agent.Docker/                          # Docker host agent
├── StellaOps.Agent.Compose/                         # Docker Compose agent
├── StellaOps.Agent.SSH/                             # SSH agentless executor
├── StellaOps.Agent.WinRM/                           # WinRM agentless executor
├── StellaOps.Agent.ECS/                             # AWS ECS agent
├── StellaOps.Agent.Nomad/                           # HashiCorp Nomad agent
└── __Tests/
    └── StellaOps.ReleaseOrchestrator.*.Tests/
```

## Data Flow

### Release Orchestration Flow

```
CI Build → Registry Push → Webhook → Stella Scan → Create Release →
Request Promotion → Gate Evaluation → Decision Record →
Deploy via Agent → Version Sticker → Evidence Packet
```

### Detailed Flow

1. **CI pushes image** to the registry by digest, which triggers a webhook to Stella
2. **Stella scans** the new digest (if not already scanned) and stores the verdict
3. **Release created**, bundling component digests under a semantic version
4. **Promotion requested** to move the release from the source environment to the target environment
5. **Gate evaluation** runs: security verdict, approval count, freeze windows, custom policies
6. **Decision record** produced with evidence references, then signed
7. **Deployment executed** via agent to the target (Docker/Compose/ECS/Nomad)
8. **Version sticker** written to the target for drift detection
9. **Evidence packet** sealed and stored

## Key Abstractions

### Environment

```csharp
public sealed record Environment
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required string Name { get; init; }           // "dev", "stage", "prod"
    public required string Slug { get; init; }           // URL-safe identifier
    public required int PromotionOrder { get; init; }    // 1, 2, 3...
    public required FreezeWindow[] FreezeWindows { get; init; }
    public required ApprovalPolicy ApprovalPolicy { get; init; }
    public required bool IsProduction { get; init; }
    public EnvironmentState State { get; init; }         // Active, Frozen, Retired
}
```
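
The `FreezeWindow` type is referenced above but not defined in this document. As a minimal sketch, assuming a window is just a UTC start/end pair with a reason (the record shape and the `FreezeWindowCheck` helper are hypothetical), a freeze check could look like this:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape: this document does not define FreezeWindow.
public sealed record FreezeWindow(DateTimeOffset Start, DateTimeOffset End, string Reason);

public static class FreezeWindowCheck
{
    // True when 'now' falls inside any window (start inclusive, end exclusive).
    public static bool IsFrozen(IEnumerable<FreezeWindow> windows, DateTimeOffset now) =>
        windows.Any(w => w.Start <= now && now < w.End);
}
```

Treating the end as exclusive lets back-to-back windows share a boundary without overlapping; a real implementation might also need recurring windows and time zones, which this sketch ignores.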

### Release

```csharp
public sealed record Release
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required string Version { get; init; }         // SemVer: "2.3.1"
    public required string Name { get; init; }            // Display name
    public required ImmutableDictionary<string, ComponentDigest> Components { get; init; }
    public required string SourceRef { get; init; }       // Git SHA or tag
    public required DateTimeOffset CreatedAt { get; init; }
    public required Guid CreatedBy { get; init; }
    public ReleaseState State { get; init; }              // Draft, Active, Deprecated
}

public sealed record ComponentDigest
{
    public required string Repository { get; init; }      // registry.example.com/app/api
    public required string Digest { get; init; }          // sha256:abc123...
    public required string? ResolvedFromTag { get; init; } // Optional: "v2.3.1"
}
```
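
Because a release is identified by digests rather than tags, it can help to reject anything that is not a well-formed digest before it reaches a `ComponentDigest`. The sketch below assumes only `sha256` digests are accepted; the `DigestReference` helper and its method names are illustrative and not part of the specification.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigestReference
{
    // "sha256:" followed by exactly 64 lowercase hex characters.
    private static readonly Regex Sha256Digest =
        new Regex(@"\Asha256:[a-f0-9]{64}\z", RegexOptions.CultureInvariant);

    public static bool IsValidDigest(string digest) =>
        digest != null && Sha256Digest.IsMatch(digest);

    // Builds the pinned pull reference "repository@sha256:...", refusing tags and malformed digests.
    public static string ToPinnedReference(string repository, string digest)
    {
        if (string.IsNullOrWhiteSpace(repository))
            throw new ArgumentException("Repository is required.", nameof(repository));
        if (!IsValidDigest(digest))
            throw new ArgumentException("Expected a sha256 digest, not a tag.", nameof(digest));
        return repository + "@" + digest;
    }
}
```

Deploying by the pinned `repository@digest` form means a tag that moves after release creation cannot change what gets pulled.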

### Promotion

```csharp
public sealed record Promotion
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required Guid ReleaseId { get; init; }
    public required Guid SourceEnvironmentId { get; init; }
    public required Guid TargetEnvironmentId { get; init; }
    public required Guid RequestedBy { get; init; }
    public required DateTimeOffset RequestedAt { get; init; }
    public PromotionState State { get; init; }            // Pending, Approved, Rejected, Deployed, RolledBack
    public required ImmutableArray<GateResult> GateResults { get; init; }
    public required ImmutableArray<ApprovalRecord> Approvals { get; init; }
    public required DecisionRecord? Decision { get; init; }
}
```
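
The document lists the promotion states but not the transitions between them. The following sketch encodes one plausible transition table; the allowed edges (and treating `Rejected` and `RolledBack` as terminal) are assumptions for illustration, not something the specification states.

```csharp
using System;
using System.Collections.Generic;

public enum PromotionState { Pending, Approved, Rejected, Deployed, RolledBack }

public static class PromotionTransitions
{
    // Assumed transition table; Rejected and RolledBack are treated as terminal states.
    private static readonly Dictionary<PromotionState, PromotionState[]> Allowed = new()
    {
        [PromotionState.Pending] = new[] { PromotionState.Approved, PromotionState.Rejected },
        [PromotionState.Approved] = new[] { PromotionState.Deployed },
        [PromotionState.Deployed] = new[] { PromotionState.RolledBack },
        [PromotionState.Rejected] = Array.Empty<PromotionState>(),
        [PromotionState.RolledBack] = Array.Empty<PromotionState>(),
    };

    public static bool CanTransition(PromotionState from, PromotionState to) =>
        Array.IndexOf(Allowed[from], to) >= 0;
}
```

Keeping the table in one place makes it straightforward to record each accepted transition as a new entry instead of editing the promotion in place.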

### Workflow

```csharp
public sealed record Workflow
{
    public required Guid Id { get; init; }
    public required string Name { get; init; }
    public required ImmutableArray<WorkflowStep> Steps { get; init; }
    public required ImmutableDictionary<string, string[]> DependencyGraph { get; init; }
}

public sealed record WorkflowStep
{
    public required string Id { get; init; }
    public required string Type { get; init; }            // "script", "approval", "deploy", "gate"
    public required StepProvider Provider { get; init; }
    public required ImmutableDictionary<string, object> Config { get; init; }
    public required string[] DependsOn { get; init; }
    public StepState State { get; init; }
}
```
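
One way to turn `DependencyGraph` into a repeatable execution order is Kahn's algorithm with a fixed tie-break. This is a sketch under the assumption that the dictionary maps each step ID to the IDs it depends on; the `WorkflowOrdering` helper is hypothetical and the engine's actual scheduling is not specified here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WorkflowOrdering
{
    // Input maps each step ID to the step IDs it depends on (assumed meaning of DependencyGraph).
    // Returns a deterministic order; throws on unknown dependencies or cycles.
    public static IReadOnlyList<string> TopologicalOrder(IReadOnlyDictionary<string, string[]> dependsOn)
    {
        var remaining = new Dictionary<string, int>(StringComparer.Ordinal);          // unmet dependency counts
        var dependents = new Dictionary<string, List<string>>(StringComparer.Ordinal); // reverse edges
        foreach (var entry in dependsOn)
        {
            remaining[entry.Key] = entry.Value.Length;
            foreach (var dep in entry.Value)
            {
                if (!dependsOn.ContainsKey(dep))
                    throw new InvalidOperationException($"Step '{entry.Key}' depends on unknown step '{dep}'.");
                if (!dependents.TryGetValue(dep, out var list))
                    dependents[dep] = list = new List<string>();
                list.Add(entry.Key);
            }
        }

        // A sorted set makes the choice between ready steps independent of dictionary enumeration order.
        var ready = new SortedSet<string>(
            remaining.Where(kv => kv.Value == 0).Select(kv => kv.Key), StringComparer.Ordinal);
        var order = new List<string>(dependsOn.Count);
        while (ready.Count > 0)
        {
            var next = ready.Min;
            ready.Remove(next);
            order.Add(next);
            if (!dependents.TryGetValue(next, out var waiting)) continue;
            foreach (var step in waiting)
            {
                if (--remaining[step] == 0) ready.Add(step);
            }
        }

        if (order.Count != dependsOn.Count)
            throw new InvalidOperationException("Workflow dependency graph contains a cycle.");
        return order;
    }
}
```

Breaking ties by ordinal step ID is an arbitrary but stable choice: the same graph always yields the same order, regardless of how the dictionary happens to enumerate.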

### Target

```csharp
public sealed record Target
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required Guid EnvironmentId { get; init; }
    public required string Name { get; init; }
    public required TargetType Type { get; init; }        // See TargetType below
    public required ImmutableDictionary<string, string> Labels { get; init; }
    public required Guid? AgentId { get; init; }          // Null for agentless
    public required TargetState State { get; init; }
    public required HealthStatus Health { get; init; }
}

public enum TargetType
{
    DockerHost,
    ComposeHost,
    ECSService,
    NomadJob,
    SSHRemote,
    WinRMRemote
}
```

### Agent

```csharp
public sealed record Agent
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required string Name { get; init; }
    public required string Version { get; init; }
    public required ImmutableArray<string> Capabilities { get; init; }
    public required DateTimeOffset LastHeartbeat { get; init; }
    public required AgentState State { get; init; }       // Online, Offline, Degraded
    public required ImmutableDictionary<string, string> Labels { get; init; }
}
```
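
The `Capabilities` and `State` fields suggest how a task could be routed to an agent. The sketch below filters to online agents that advertise every required capability; `AgentInfo` is a trimmed-down stand-in for the `Agent` record, and capability strings such as `"docker"` or `"compose"` are made-up examples.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum AgentState { Online, Offline, Degraded }

// Trimmed-down stand-in for the Agent record; only the fields needed for routing.
public sealed record AgentInfo(Guid Id, string Name, AgentState State, IReadOnlyCollection<string> Capabilities);

public static class AgentSelector
{
    // Online agents that advertise every required capability, ordered by name for repeatable selection.
    public static IReadOnlyList<AgentInfo> Candidates(
        IEnumerable<AgentInfo> agents, IReadOnlyCollection<string> required) =>
        agents.Where(a => a.State == AgentState.Online)
              .Where(a => required.All(c => a.Capabilities.Contains(c, StringComparer.OrdinalIgnoreCase)))
              .OrderBy(a => a.Name, StringComparer.Ordinal)
              .ToList();
}
```

An empty result means no eligible agent is currently online, so the caller would typically leave the task pending instead of failing it outright.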

## Database Schema

| Table | Purpose |
|-------|---------|
| `release.environments` | Environment definitions with freeze windows |
| `release.targets` | Deployment targets within environments |
| `release.agents` | Registered deployment agents |
| `release.components` | Component definitions (service → repository mapping) |
| `release.releases` | Release bundles (version → component digests) |
| `release.promotions` | Promotion requests and state |
| `release.approvals` | Approval records |
| `release.workflows` | Workflow templates |
| `release.workflow_runs` | Workflow execution state |
| `release.deployment_jobs` | Deployment job records |
| `release.evidence_packets` | Sealed evidence records |
| `release.integrations` | Integration configurations |
| `release.plugins` | Plugin registrations |

## Gate Types

| Gate | Purpose | Evaluation |
|------|---------|------------|
| **Security** | Check scan verdict | Query latest scan for release digest; block on reachable critical/high findings |
| **Approval** | Human sign-off | Count approvals; check separation-of-duties (SoD) rules |
| **FreezeWindow** | Calendar-based blocking | Check target environment freeze windows |
| **PreviousEnvironment** | Require prior deployment | Verify release deployed to source environment |
| **Policy** | Custom OPA/Rego rules | Evaluate policy with promotion context |
| **HealthCheck** | Target health | Verify target is healthy before deploy |
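
To illustrate how several of these gates might combine into a single allow/block decision, here is a minimal sketch. The `PromotionContext` record, the shape of `GateResult`, and the `GateEvaluator` helper are all hypothetical; the `PreviousEnvironment` and `Policy` gates are left out to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shapes: the document names GateResult but does not define it.
public sealed record GateResult(string Gate, bool Passed, string Reason);

public sealed record PromotionContext(
    int ReachableCriticalOrHigh, int Approvals, int RequiredApprovals, bool InFreezeWindow, bool TargetHealthy);

public static class GateEvaluator
{
    private static GateResult Result(string gate, bool passed, string failure) =>
        new GateResult(gate, passed, passed ? "ok" : failure);

    // Evaluates every gate without short-circuiting so all blockers are reported together.
    public static IReadOnlyList<GateResult> Evaluate(PromotionContext ctx) => new[]
    {
        Result("Security", ctx.ReachableCriticalOrHigh == 0,
            $"{ctx.ReachableCriticalOrHigh} reachable critical/high findings"),
        Result("Approval", ctx.Approvals >= ctx.RequiredApprovals,
            $"{ctx.Approvals}/{ctx.RequiredApprovals} approvals"),
        Result("FreezeWindow", !ctx.InFreezeWindow, "target environment is frozen"),
        Result("HealthCheck", ctx.TargetHealthy, "target is unhealthy"),
    };

    // A promotion is allowed only when every gate passed.
    public static bool IsAllowed(IEnumerable<GateResult> results) => results.All(r => r.Passed);
}
```

Running all gates even after the first failure costs a little extra work but gives the requester the complete list of blockers in one pass.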

## Plugin System (Three-Surface Model)

Plugins contribute through three surfaces:

### 1. Manifest (Static Declaration)

```yaml
# plugin-manifest.yaml
name: github-integration
version: 1.0.0
provider: StellaOps.Integration.GitHub.Plugin
capabilities:
  integrations:
    - type: scm
      id: github
      displayName: GitHub
  steps:
    - type: github-status
      displayName: Update GitHub Status
  gates:
    - type: github-check
      displayName: GitHub Check Required
```

### 2. Connector Runtime (Dynamic Execution)

```csharp
public interface IIntegrationConnector
{
    Task<ConnectionTestResult> TestConnectionAsync(CancellationToken ct);
    Task<HealthStatus> GetHealthAsync(CancellationToken ct);
    Task<IReadOnlyList<Resource>> DiscoverResourcesAsync(string resourceType, CancellationToken ct);
}

public interface ISCMConnector : IIntegrationConnector
{
    // "ref" is a reserved keyword in C#, so the parameter is named gitRef.
    Task<CommitInfo> GetCommitAsync(string gitRef, CancellationToken ct);
    Task CreateCommitStatusAsync(string commit, CommitStatus status, CancellationToken ct);
}

public interface IRegistryConnector : IIntegrationConnector
{
    Task<string> ResolveDigestAsync(string imageRef, CancellationToken ct);
    Task<bool> VerifyDigestAsync(string imageRef, string expectedDigest, CancellationToken ct);
}
```
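
For unit tests, a connector can be faked without a real registry. The sketch below redeclares only the two registry-specific members (the `IIntegrationConnector` base is omitted so the example stays self-contained) and backs them with a dictionary; `InMemoryRegistryConnector` is a hypothetical test double, not a planned component.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Registry-specific members only; the IIntegrationConnector base is omitted for brevity.
public interface IRegistryConnector
{
    Task<string> ResolveDigestAsync(string imageRef, CancellationToken ct);
    Task<bool> VerifyDigestAsync(string imageRef, string expectedDigest, CancellationToken ct);
}

// In-memory fake: maps an image reference such as "repo:tag" to a digest instead of calling a registry.
public sealed class InMemoryRegistryConnector : IRegistryConnector
{
    private readonly IReadOnlyDictionary<string, string> _digestsByRef;

    public InMemoryRegistryConnector(IReadOnlyDictionary<string, string> digestsByRef) =>
        _digestsByRef = digestsByRef;

    public Task<string> ResolveDigestAsync(string imageRef, CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();
        return _digestsByRef.TryGetValue(imageRef, out var digest)
            ? Task.FromResult(digest)
            : Task.FromException<string>(new KeyNotFoundException($"Unknown image reference '{imageRef}'."));
    }

    // Verification is resolution plus an exact, case-sensitive comparison.
    public async Task<bool> VerifyDigestAsync(string imageRef, string expectedDigest, CancellationToken ct)
    {
        var actual = await ResolveDigestAsync(imageRef, ct).ConfigureAwait(false);
        return string.Equals(actual, expectedDigest, StringComparison.Ordinal);
    }
}
```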

### 3. Step Provider (Execution Contract)

```csharp
public interface IStepProvider
{
    StepExecutionCharacteristics Characteristics { get; }
    Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct);
    Task<StepResult> RollbackAsync(StepContext context, CancellationToken ct);
}

public sealed record StepExecutionCharacteristics
{
    public bool IsIdempotent { get; init; }
    public bool SupportsRollback { get; init; }
    public TimeSpan DefaultTimeout { get; init; }
    public ResourceRequirements Resources { get; init; }
}
```

## Invariants

1. **Release identity is immutable** — Once created, a release's component digests cannot be changed. Create a new release instead.

2. **Promotions are append-only** — Promotion state transitions are recorded; no edits or deletions.

3. **Evidence packets are sealed** — Evidence is cryptographically signed and stored immutably.

4. **Digest verification at deploy time** — Agents verify image digests at pull time; mismatch fails deployment.

5. **Separation of duties enforced** — Requester cannot be sole approver for production promotions.

6. **Workflow execution is deterministic** — Same inputs produce same execution order and outputs.
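
The separation-of-duties invariant reduces to a small predicate. This is a sketch of one reading of the rule, where a production promotion needs at least one approval from someone other than the requester; the `SeparationOfDuties` helper is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SeparationOfDuties
{
    // Production: at least one approver must differ from the requester (no approvals also fails).
    // Non-production promotions are not restricted by this rule.
    public static bool IsSatisfied(Guid requestedBy, IEnumerable<Guid> approverIds, bool isProduction) =>
        !isProduction || approverIds.Any(id => id != requestedBy);
}
```

Under this reading the requester may still add an approval of their own, but it never counts as sufficient by itself.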

## Error Handling

- **Transient failures** — Retry with exponential backoff; circuit breaker for repeated failures
- **Agent disconnection** — Mark agent offline; reassign pending tasks to other agents
- **Deployment failure** — Automatic rollback if configured; otherwise mark promotion as failed
- **Gate failure** — Block promotion; require manual intervention or re-evaluation
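
The retry policy for transient failures can be sketched as follows. The delay doubles per attempt and is capped; the `Retry` helper, its parameter names, and the caller-supplied `isTransient` predicate are assumptions, and the circuit breaker and jitter that a production policy would likely add are not shown.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Retry
{
    // Delay before retry number 'attempt' (1-based): baseDelay * 2^(attempt - 1), capped at maxDelay.
    public static TimeSpan BackoffDelay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (attempt < 1) throw new ArgumentOutOfRangeException(nameof(attempt));
        var factor = Math.Pow(2, Math.Min(attempt - 1, 30)); // clamp the exponent to keep the product finite
        var ticks = Math.Min(baseDelay.Ticks * factor, maxDelay.Ticks);
        return TimeSpan.FromTicks((long)ticks);
    }

    // Runs 'operation' up to maxAttempts times, sleeping between attempts while failures look transient.
    public static async Task<T> ExecuteAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts,
        TimeSpan baseDelay,
        TimeSpan maxDelay,
        CancellationToken ct)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await operation(ct).ConfigureAwait(false);
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Non-transient errors and the final attempt's error propagate to the caller.
                await Task.Delay(BackoffDelay(attempt, baseDelay, maxDelay), ct).ConfigureAwait(false);
            }
        }
    }
}
```

With a base delay of one second and a 30-second cap, the waits would run 1 s, 2 s, 4 s, 8 s, 16 s, and then stay at 30 s.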

## Observability

### Metrics

- `release_promotions_total` — Counter by environment and outcome
- `release_deployments_duration_seconds` — Histogram of deployment times
- `release_gate_evaluations_total` — Counter by gate type and result
- `release_agents_online` — Gauge of online agents
- `release_workflow_steps_duration_seconds` — Histogram by step type

### Traces

- `promotion.request` — Span for promotion request handling
- `gate.evaluate` — Span for each gate evaluation
- `deployment.execute` — Span for deployment execution
- `agent.task` — Span for agent task execution

### Logs

- Structured logs with correlation IDs
- Promotion ID, release ID, environment ID in all relevant logs
- Sensitive data (secrets, credentials) masked

## Security Considerations

### Agent Security

- **mTLS authentication** — Agents authenticate with CA-signed certificates
- **Short-lived credentials** — Task credentials expire after execution
- **Capability-based authorization** — Agents only receive tasks matching their capabilities
- **Heartbeat monitoring** — Detect and flag agent disconnections

### Secrets Management

- **Never stored in database** — Only vault references stored
- **Fetched at execution time** — Secrets retrieved just-in-time for deployment
- **Short-lived** — Dynamic credentials with minimal TTL
- **Masked in logs** — Secret values never logged

### Plugin Sandbox

- **Resource limits** — CPU, memory, timeout limits per plugin
- **Capability restrictions** — Plugins declare required capabilities
- **Network isolation** — Optional network restrictions for plugins

## Performance Characteristics

- **Promotion evaluation** — < 5 seconds for typical gate evaluation
- **Deployment latency** — Dominated by image pull time; orchestration overhead < 10 seconds
- **Agent heartbeat** — 30-second interval; offline detection within 90 seconds (three missed heartbeats)
- **Workflow step timeout** — Configurable; default 5 minutes per step
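
The heartbeat figures above translate into a simple classification rule. In this sketch the 30-second interval and 90-second offline threshold come from the list; marking an agent as degraded after 60 seconds of silence is an assumed policy, and the `HeartbeatMonitor` helper is hypothetical.

```csharp
using System;

public enum AgentLiveness { Online, Degraded, Offline }

public static class HeartbeatMonitor
{
    public static readonly TimeSpan Interval = TimeSpan.FromSeconds(30);
    public static readonly TimeSpan OfflineAfter = TimeSpan.FromSeconds(90);

    // Classifies an agent by how long it has been silent as of 'now'.
    public static AgentLiveness Classify(DateTimeOffset lastHeartbeat, DateTimeOffset now)
    {
        var silence = now - lastHeartbeat;
        if (silence >= OfflineAfter) return AgentLiveness.Offline;
        // Assumed policy: two full intervals without a heartbeat marks the agent as degraded.
        if (silence >= Interval + Interval) return AgentLiveness.Degraded;
        return AgentLiveness.Online;
    }
}
```

Passing `now` in explicitly keeps the rule testable without a real clock.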

## Implementation Roadmap

| Phase | Focus | Key Deliverables |
|-------|-------|------------------|
| **Phase 1** | Foundation | Environment management, integration hub, release bundles |
| **Phase 2** | Workflow Engine | DAG execution, step registry, workflow templates |
| **Phase 3** | Promotion & Decision | Approval gateway, security gates, decision records |
| **Phase 4** | Deployment Execution | Docker/Compose agents, artifact generation, rollback |
| **Phase 5** | UI & Polish | Release dashboard, promotion UI, environment management |
| **Phase 6** | Progressive Delivery | A/B releases, canary, traffic routing |
| **Phase 7** | Extended Targets | ECS, Nomad, SSH/WinRM agentless |
| **Phase 8** | Plugin Ecosystem | Full plugin system, marketplace |

## References

- [Product Vision](../../product/VISION.md)
- [Architecture Overview](../../ARCHITECTURE_OVERVIEW.md)
- [Full Orchestrator Specification](../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
- [Competitive Landscape](../../product/competitive-landscape.md)