release orchestrator pivot, architecture and planning

2026-01-10 22:37:22 +02:00
parent c84f421e2f
commit d509c44411
130 changed files with 70292 additions and 721 deletions


@@ -0,0 +1,410 @@
# Release Orchestrator Architecture
> Technical architecture specification for the Release Orchestrator, Stella Ops Suite's central release control plane for non-Kubernetes container estates.

**Status:** Planned (not yet implemented)
## Overview
The Release Orchestrator transforms Stella Ops Suite from a vulnerability scanning platform into a centralized, auditable release control plane. It sits between CI systems and runtime targets, governing promotion across environments, enforcing security and policy gates, and producing verifiable evidence for every release decision.
### Core Value Proposition
- **Release orchestration** — UI-driven promotion (Dev → Stage → Prod), approvals, policy gates, rollbacks
- **Security decisioning as a gate** — Scan on build, evaluate on release, re-evaluate on CVE updates
- **OCI-digest-first releases** — Immutable digest-based release identity
- **Toolchain-agnostic integrations** — Plug into any SCM, CI, registry, secrets system
- **Auditability + standards** — Evidence packets, SBOM/VEX/attestation support, deterministic replay
## Design Principles
1. **Digest-First Release Identity** — A release is an immutable set of OCI digests, never mutable tags. Tags are resolved to digests at release creation time.
2. **Pluggable Everything, Stable Core** — Integrations are plugins; the core orchestration engine is stable. Plugins contribute UI screens, connector logic, step types, and agent types.
3. **Evidence for Every Decision** — Every deployment/promotion produces an immutable evidence record containing who, what, why, how, and when.
4. **No Feature Gating** — All plans include all features. The only limits are the number of environments, new digests per day, and fair use of deployments.
5. **Offline-First Operation** — All core operations work in air-gapped environments. Plugins may require connectivity; core does not.
6. **Immutable Generated Artifacts** — Every deployment generates and stores immutable artifacts (compose lockfiles, scripts, evidence).
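
Principle 1 can be illustrated with a small validation helper. This is a minimal sketch, assuming a release accepts only image references pinned as `repository@sha256:<64 hex characters>`; `DigestReference` and `TryParse` are hypothetical names that do not appear in the specification.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the specification: accepts only
// digest-pinned image references such as "registry.example.com/app/api@sha256:<hex>".
public static class DigestReference
{
    private static readonly Regex Sha256Digest =
        new(@"^sha256:[a-f0-9]{64}\z", RegexOptions.Compiled);

    public static bool TryParse(string imageRef, out string repository, out string digest)
    {
        repository = string.Empty;
        digest = string.Empty;
        if (string.IsNullOrEmpty(imageRef)) return false;

        // A digest-pinned reference separates repository and digest with '@'.
        int at = imageRef.LastIndexOf('@');
        if (at <= 0) return false; // tag-only reference or empty repository

        string candidate = imageRef[(at + 1)..];
        if (!Sha256Digest.IsMatch(candidate)) return false;

        repository = imageRef[..at];
        digest = candidate;
        return true;
    }
}
```

A tag-only reference such as `registry.example.com/app/api:v2.3.1` is rejected by this helper; under the principle above it would first have to be resolved to a digest at release creation time.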
## Platform Themes
The Release Orchestrator introduces ten new functional themes:
| Theme | Purpose | Key Modules |
|-------|---------|-------------|
| **INTHUB** | Integration hub | Integration Manager, Connection Profiles, Connector Runtime |
| **ENVMGR** | Environment management | Environment Manager, Target Registry, Agent Manager |
| **RELMAN** | Release management | Component Registry, Version Manager, Release Manager |
| **WORKFL** | Workflow engine | Workflow Designer, Workflow Engine, Step Executor |
| **PROMOT** | Promotion and approval | Promotion Manager, Approval Gateway, Decision Engine |
| **DEPLOY** | Deployment execution | Deploy Orchestrator, Target Executor, Artifact Generator |
| **AGENTS** | Deployment agents | Agent Core, Docker/Compose/ECS/Nomad agents |
| **PROGDL** | Progressive delivery | A/B Manager, Traffic Router, Canary Controller |
| **RELEVI** | Release evidence | Evidence Collector, Sticker Writer, Audit Exporter |
| **PLUGIN** | Plugin infrastructure | Plugin Registry, Plugin Loader, Plugin SDK |
## Components
```
ReleaseOrchestrator/
├── __Libraries/
│ ├── StellaOps.ReleaseOrchestrator.Core/ # Core domain models
│ ├── StellaOps.ReleaseOrchestrator.Workflow/ # DAG workflow engine
│ ├── StellaOps.ReleaseOrchestrator.Promotion/ # Promotion logic
│ ├── StellaOps.ReleaseOrchestrator.Deploy/ # Deployment coordination
│ ├── StellaOps.ReleaseOrchestrator.Evidence/ # Evidence generation
│ ├── StellaOps.ReleaseOrchestrator.Plugin/ # Plugin infrastructure
│ └── StellaOps.ReleaseOrchestrator.Integration/ # Integration connectors
├── StellaOps.ReleaseOrchestrator.WebService/ # HTTP API
├── StellaOps.ReleaseOrchestrator.Worker/ # Background processing
├── StellaOps.Agent.Core/ # Agent base framework
├── StellaOps.Agent.Docker/ # Docker host agent
├── StellaOps.Agent.Compose/ # Docker Compose agent
├── StellaOps.Agent.SSH/ # SSH agentless executor
├── StellaOps.Agent.WinRM/ # WinRM agentless executor
├── StellaOps.Agent.ECS/ # AWS ECS agent
├── StellaOps.Agent.Nomad/ # HashiCorp Nomad agent
└── __Tests/
└── StellaOps.ReleaseOrchestrator.*.Tests/
```
## Data Flow
### Release Orchestration Flow
```
CI Build → Registry Push → Webhook → Stella Scan → Create Release →
Request Promotion → Gate Evaluation → Decision Record →
Deploy via Agent → Version Sticker → Evidence Packet
```
### Detailed Flow
1. **CI pushes image** to registry by digest; triggers webhook to Stella
2. **Stella scans** the new digest (if not already scanned); stores verdict
3. **Release created** bundling component digests with semantic version
4. **Promotion requested** to move release from source → target environment
5. **Gate evaluation** runs: security verdict, approval count, freeze windows, custom policies
6. **Decision record** produced with evidence references, then signed
7. **Deployment executed** on the target via an agent (Docker/Compose/ECS/Nomad) or an agentless executor (SSH/WinRM)
8. **Version sticker** written to target for drift detection
9. **Evidence packet** sealed and stored
## Key Abstractions
### Environment
```csharp
public sealed record Environment
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required string Name { get; init; }            // "dev", "stage", "prod"
    public required string Slug { get; init; }            // URL-safe identifier
    public required int PromotionOrder { get; init; }     // 1, 2, 3...
    public required FreezeWindow[] FreezeWindows { get; init; }
    public required ApprovalPolicy ApprovalPolicy { get; init; }
    public required bool IsProduction { get; init; }
    public EnvironmentState State { get; init; }          // Active, Frozen, Retired
}
```
### Release
```csharp
public sealed record Release
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required string Version { get; init; }         // SemVer: "2.3.1"
    public required string Name { get; init; }            // Display name
    public required ImmutableDictionary<string, ComponentDigest> Components { get; init; }
    public required string SourceRef { get; init; }       // Git SHA or tag
    public required DateTimeOffset CreatedAt { get; init; }
    public required Guid CreatedBy { get; init; }
    public ReleaseState State { get; init; }              // Draft, Active, Deprecated
}

public sealed record ComponentDigest
{
    public required string Repository { get; init; }      // registry.example.com/app/api
    public required string Digest { get; init; }          // sha256:abc123...
    public required string? ResolvedFromTag { get; init; } // Optional: "v2.3.1"
}
```
### Promotion
```csharp
public sealed record Promotion
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required Guid ReleaseId { get; init; }
    public required Guid SourceEnvironmentId { get; init; }
    public required Guid TargetEnvironmentId { get; init; }
    public required Guid RequestedBy { get; init; }
    public required DateTimeOffset RequestedAt { get; init; }
    public PromotionState State { get; init; }            // Pending, Approved, Rejected, Deployed, RolledBack
    public required ImmutableArray<GateResult> GateResults { get; init; }
    public required ImmutableArray<ApprovalRecord> Approvals { get; init; }
    public required DecisionRecord? Decision { get; init; }
}
```
### Workflow
```csharp
public sealed record Workflow
{
    public required Guid Id { get; init; }
    public required string Name { get; init; }
    public required ImmutableArray<WorkflowStep> Steps { get; init; }
    public required ImmutableDictionary<string, string[]> DependencyGraph { get; init; }
}

public sealed record WorkflowStep
{
    public required string Id { get; init; }
    public required string Type { get; init; }            // "script", "approval", "deploy", "gate"
    public required StepProvider Provider { get; init; }
    public required ImmutableDictionary<string, object> Config { get; init; }
    public required string[] DependsOn { get; init; }
    public StepState State { get; init; }
}
```
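
One way to derive a deterministic execution order from `DependencyGraph` is Kahn's algorithm with a sorted ready set. The sketch below assumes the dictionary maps each step ID to the IDs it depends on (mirroring `DependsOn`) and that ties are broken by ordinal string order; `WorkflowOrdering` is a hypothetical name, and the actual engine may schedule steps differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: topological ordering of workflow steps (Kahn's algorithm).
public static class WorkflowOrdering
{
    // dependsOn maps a step ID to the IDs of the steps it depends on.
    public static IReadOnlyList<string> TopologicalOrder(
        IReadOnlyDictionary<string, string[]> dependsOn)
    {
        var remaining = new Dictionary<string, int>(StringComparer.Ordinal);
        var dependents = new Dictionary<string, List<string>>(StringComparer.Ordinal);

        foreach (var (step, deps) in dependsOn)
        {
            remaining[step] = deps.Length;
            foreach (var dep in deps)
            {
                if (!dependsOn.ContainsKey(dep))
                    throw new InvalidOperationException(
                        $"Step '{step}' depends on unknown step '{dep}'.");
                if (!dependents.TryGetValue(dep, out var list))
                    dependents[dep] = list = new List<string>();
                list.Add(step);
            }
        }

        // A sorted set makes the choice among ready steps deterministic.
        var ready = new SortedSet<string>(
            remaining.Where(kv => kv.Value == 0).Select(kv => kv.Key),
            StringComparer.Ordinal);
        var order = new List<string>(dependsOn.Count);

        while (ready.Count > 0)
        {
            string next = ready.First(); // smallest ID among the ready steps
            ready.Remove(next);
            order.Add(next);

            if (!dependents.TryGetValue(next, out var unlocked)) continue;
            foreach (var step in unlocked)
            {
                if (--remaining[step] == 0) ready.Add(step);
            }
        }

        // Steps left unscheduled can only be caused by a dependency cycle.
        if (order.Count != dependsOn.Count)
            throw new InvalidOperationException("Workflow dependency graph contains a cycle.");
        return order;
    }
}
```

Because the ready set is sorted, the same graph always yields the same order regardless of dictionary enumeration order, which is one way to support the determinism the workflow engine aims for.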
### Target
```csharp
public sealed record Target
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required Guid EnvironmentId { get; init; }
    public required string Name { get; init; }
    public required TargetType Type { get; init; }        // See TargetType below
    public required ImmutableDictionary<string, string> Labels { get; init; }
    public required Guid? AgentId { get; init; }          // Null for agentless
    public required TargetState State { get; init; }
    public required HealthStatus Health { get; init; }
}

public enum TargetType
{
    DockerHost,
    ComposeHost,
    ECSService,
    NomadJob,
    SSHRemote,
    WinRMRemote
}
```
### Agent
```csharp
public sealed record Agent
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required string Name { get; init; }
    public required string Version { get; init; }
    public required ImmutableArray<string> Capabilities { get; init; }
    public required DateTimeOffset LastHeartbeat { get; init; }
    public required AgentState State { get; init; }       // Online, Offline, Degraded
    public required ImmutableDictionary<string, string> Labels { get; init; }
}
```
## Database Schema
| Table | Purpose |
|-------|---------|
| `release.environments` | Environment definitions with freeze windows |
| `release.targets` | Deployment targets within environments |
| `release.agents` | Registered deployment agents |
| `release.components` | Component definitions (service → repository mapping) |
| `release.releases` | Release bundles (version → component digests) |
| `release.promotions` | Promotion requests and state |
| `release.approvals` | Approval records |
| `release.workflows` | Workflow templates |
| `release.workflow_runs` | Workflow execution state |
| `release.deployment_jobs` | Deployment job records |
| `release.evidence_packets` | Sealed evidence records |
| `release.integrations` | Integration configurations |
| `release.plugins` | Plugin registrations |
## Gate Types
| Gate | Purpose | Evaluation |
|------|---------|------------|
| **Security** | Check scan verdict | Query latest scan for release digest; block on reachable critical- or high-severity findings |
| **Approval** | Human sign-off | Count approvals; check separation-of-duties (SoD) rules |
| **FreezeWindow** | Calendar-based blocking | Check target environment freeze windows |
| **PreviousEnvironment** | Require prior deployment | Verify release deployed to source environment |
| **Policy** | Custom OPA/Rego rules | Evaluate policy with promotion context |
| **HealthCheck** | Target health | Verify target is healthy before deploy |
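
As an illustration of how gate results might combine, the sketch below evaluates a FreezeWindow gate and aggregates outcomes so that a promotion proceeds only when every gate passes. `GateOutcome`, `FreezeWindowSketch`, and `GateEvaluation` are hypothetical stand-ins; the specification references `GateResult` and `FreezeWindow` without defining their members, and the half-open window `[Start, End)` is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes for illustration; the specification does not define them.
public sealed record GateOutcome(string Gate, bool Passed, string Reason);

public sealed record FreezeWindowSketch(DateTimeOffset Start, DateTimeOffset End);

public static class GateEvaluation
{
    // FreezeWindow gate: blocks when 'now' falls inside any window [Start, End).
    public static GateOutcome EvaluateFreezeWindows(
        IEnumerable<FreezeWindowSketch> windows, DateTimeOffset now)
    {
        var active = windows.FirstOrDefault(w => now >= w.Start && now < w.End);
        return active is null
            ? new GateOutcome("FreezeWindow", true, "No active freeze window.")
            : new GateOutcome("FreezeWindow", false, $"Frozen until {active.End:O}.");
    }

    // A promotion proceeds only when every gate passed.
    // Note: an empty outcome list passes vacuously, so callers should ensure
    // that the expected gates actually ran.
    public static bool AllPassed(IEnumerable<GateOutcome> outcomes) =>
        outcomes.All(o => o.Passed);
}
```

Keeping each gate as a pure function of its inputs (here, the windows and a supplied `now`) makes individual evaluations easy to replay, which fits the evidence-oriented design described earlier.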
## Plugin System (Three-Surface Model)
Plugins contribute through three surfaces:
### 1. Manifest (Static Declaration)
```yaml
# plugin-manifest.yaml
name: github-integration
version: 1.0.0
provider: StellaOps.Integration.GitHub.Plugin
capabilities:
  integrations:
    - type: scm
      id: github
      displayName: GitHub
  steps:
    - type: github-status
      displayName: Update GitHub Status
  gates:
    - type: github-check
      displayName: GitHub Check Required
```
### 2. Connector Runtime (Dynamic Execution)
```csharp
public interface IIntegrationConnector
{
    Task<ConnectionTestResult> TestConnectionAsync(CancellationToken ct);
    Task<HealthStatus> GetHealthAsync(CancellationToken ct);
    Task<IReadOnlyList<Resource>> DiscoverResourcesAsync(string resourceType, CancellationToken ct);
}

public interface ISCMConnector : IIntegrationConnector
{
    Task<CommitInfo> GetCommitAsync(string gitRef, CancellationToken ct);
    Task CreateCommitStatusAsync(string commit, CommitStatus status, CancellationToken ct);
}

public interface IRegistryConnector : IIntegrationConnector
{
    Task<string> ResolveDigestAsync(string imageRef, CancellationToken ct);
    Task<bool> VerifyDigestAsync(string imageRef, string expectedDigest, CancellationToken ct);
}
```
### 3. Step Provider (Execution Contract)
```csharp
public interface IStepProvider
{
    StepExecutionCharacteristics Characteristics { get; }
    Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct);
    Task<StepResult> RollbackAsync(StepContext context, CancellationToken ct);
}

public sealed record StepExecutionCharacteristics
{
    public bool IsIdempotent { get; init; }
    public bool SupportsRollback { get; init; }
    public TimeSpan DefaultTimeout { get; init; }
    public ResourceRequirements Resources { get; init; }
}
```
## Invariants
1. **Release identity is immutable** — Once created, a release's component digests cannot be changed. Create a new release instead.
2. **Promotions are append-only** — Promotion state transitions are recorded; no edits or deletions.
3. **Evidence packets are sealed** — Evidence is cryptographically signed and stored immutably.
4. **Digest verification at deploy time** — Agents verify image digests at pull time; mismatch fails deployment.
5. **Separation of duties enforced** — Requester cannot be sole approver for production promotions.
6. **Workflow execution is deterministic** — Same inputs produce same execution order and outputs.
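
Invariant 5 could be checked along the following lines. This is a sketch under stated assumptions: approvals are reduced to approver IDs, `requiredApprovals` is a hypothetical policy parameter, and a production promotion is taken to need at least one approver other than the requester. `SeparationOfDuties` is not a name from the specification.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the separation-of-duties rule for promotions.
public static class SeparationOfDuties
{
    public static bool IsSatisfied(
        Guid requestedBy,
        IEnumerable<Guid> approverIds,
        bool isProduction,
        int requiredApprovals)
    {
        // Count each approver once, even if they approved repeatedly.
        var distinct = approverIds.Distinct().ToList();
        if (distinct.Count < requiredApprovals) return false;

        // Assumption: the rule applies only to production targets.
        if (!isProduction) return true;

        // The requester may approve, but must not be the only approver.
        return distinct.Any(id => id != requestedBy);
    }
}
```

Under this reading, a production promotion approved only by its requester fails the check even when the approval count is met.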
## Error Handling
- **Transient failures** — Retry with exponential backoff; circuit breaker for repeated failures
- **Agent disconnection** — Mark agent offline; reassign pending tasks to other agents
- **Deployment failure** — Automatic rollback if configured; otherwise mark promotion as failed
- **Gate failure** — Block promotion; require manual intervention or re-evaluation
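
The retry policy for transient failures might look like the sketch below. It assumes a doubling delay with an upper cap and a caller-supplied `isTransient` predicate; it omits jitter and the circuit breaker, and `TransientRetry` with its parameters is a hypothetical name rather than part of the specification.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: retry with capped exponential backoff (no jitter, no circuit breaker).
public static class TransientRetry
{
    // Delay before retry number 'attempt' (1-based): baseDelay * 2^(attempt - 1), capped at maxDelay.
    public static TimeSpan BackoffDelay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (attempt < 1) throw new ArgumentOutOfRangeException(nameof(attempt));
        double factor = Math.Pow(2, attempt - 1);
        double ms = Math.Min(baseDelay.TotalMilliseconds * factor, maxDelay.TotalMilliseconds);
        return TimeSpan.FromMilliseconds(ms);
    }

    public static async Task<T> ExecuteAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts,
        TimeSpan baseDelay,
        TimeSpan maxDelay,
        CancellationToken ct)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation(ct);
            }
            // Non-transient errors and the final attempt's error propagate to the caller.
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                await Task.Delay(BackoffDelay(attempt, baseDelay, maxDelay), ct);
            }
        }
    }
}
```

With a one-second base delay, the waits before successive retries are 1 s, 2 s, 4 s, and so on until the cap is reached.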
## Observability
### Metrics
- `release_promotions_total` — Counter by environment and outcome
- `release_deployments_duration_seconds` — Histogram of deployment times
- `release_gate_evaluations_total` — Counter by gate type and result
- `release_agents_online` — Gauge of online agents
- `release_workflow_steps_duration_seconds` — Histogram by step type
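
Two of these metrics could be declared with the .NET `System.Diagnostics.Metrics` API as sketched below. The meter name `StellaOps.ReleaseOrchestrator`, the tag keys, and the `ReleaseMetrics` class are assumptions for illustration; the specification does not prescribe an instrumentation library.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical sketch: metric instruments for promotions and deployments.
public static class ReleaseMetrics
{
    // The meter name is an assumption, not taken from the specification.
    private static readonly Meter Meter = new("StellaOps.ReleaseOrchestrator");

    public static readonly Counter<long> PromotionsTotal =
        Meter.CreateCounter<long>("release_promotions_total");

    public static readonly Histogram<double> DeploymentDuration =
        Meter.CreateHistogram<double>("release_deployments_duration_seconds", unit: "s");

    // Increments the promotion counter, tagged by environment and outcome.
    public static void RecordPromotion(string environment, string outcome) =>
        PromotionsTotal.Add(1,
            new KeyValuePair<string, object?>("environment", environment),
            new KeyValuePair<string, object?>("outcome", outcome));

    // Records one deployment duration in seconds.
    public static void RecordDeployment(double seconds) =>
        DeploymentDuration.Record(seconds);
}
```

Instruments created this way emit nothing until a listener or exporter subscribes to the meter, so the calls are inexpensive when metrics collection is disabled.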
### Traces
- `promotion.request` — Span for promotion request handling
- `gate.evaluate` — Span for each gate evaluation
- `deployment.execute` — Span for deployment execution
- `agent.task` — Span for agent task execution
### Logs
- Structured logs with correlation IDs
- Promotion ID, release ID, environment ID in all relevant logs
- Sensitive data (secrets, credentials) masked
## Security Considerations
### Agent Security
- **mTLS authentication** — Agents authenticate with CA-signed certificates
- **Short-lived credentials** — Task credentials expire after execution
- **Capability-based authorization** — Agents only receive tasks matching their capabilities
- **Heartbeat monitoring** — Detect and flag agent disconnections
### Secrets Management
- **Never stored in database** — Only vault references stored
- **Fetched at execution time** — Secrets retrieved just-in-time for deployment
- **Short-lived** — Dynamic credentials with minimal TTL
- **Masked in logs** — Secret values never logged
### Plugin Sandbox
- **Resource limits** — CPU, memory, timeout limits per plugin
- **Capability restrictions** — Plugins declare required capabilities
- **Network isolation** — Optional network restrictions for plugins
## Performance Characteristics
- **Promotion evaluation** — < 5 seconds for typical gate evaluation
- **Deployment latency** — Dominated by image pull time; orchestration overhead < 10 seconds
- **Agent heartbeat** — 30-second interval; offline detection within 90 seconds
- **Workflow step timeout** — Configurable; default 5 minutes per step
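
The heartbeat figures above imply that an agent is considered offline after roughly three missed heartbeats. A minimal sketch of that check follows, assuming the 90-second figure is used as a silence threshold measured from `LastHeartbeat`; `AgentLiveness` is a hypothetical name, and how often the check runs is left open.

```csharp
using System;

// Hypothetical sketch: offline detection based on heartbeat silence.
public static class AgentLiveness
{
    public static readonly TimeSpan HeartbeatInterval = TimeSpan.FromSeconds(30);

    // Assumption: 90 seconds of silence (three intervals) marks an agent offline.
    public static readonly TimeSpan OfflineAfter = TimeSpan.FromSeconds(90);

    public static bool IsOffline(DateTimeOffset lastHeartbeat, DateTimeOffset now) =>
        now - lastHeartbeat >= OfflineAfter;
}
```

Passing `now` explicitly rather than reading the system clock inside the method keeps the check easy to test and replay.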
## Implementation Roadmap
| Phase | Focus | Key Deliverables |
|-------|-------|------------------|
| **Phase 1** | Foundation | Environment management, integration hub, release bundles |
| **Phase 2** | Workflow Engine | DAG execution, step registry, workflow templates |
| **Phase 3** | Promotion & Decision | Approval gateway, security gates, decision records |
| **Phase 4** | Deployment Execution | Docker/Compose agents, artifact generation, rollback |
| **Phase 5** | UI & Polish | Release dashboard, promotion UI, environment management |
| **Phase 6** | Progressive Delivery | A/B releases, canary, traffic routing |
| **Phase 7** | Extended Targets | ECS, Nomad, SSH/WinRM agentless |
| **Phase 8** | Plugin Ecosystem | Full plugin system, marketplace |
## References
- [Product Vision](../../product/VISION.md)
- [Architecture Overview](../../ARCHITECTURE_OVERVIEW.md)
- [Full Orchestrator Specification](../../product/advisories/09-Jan-2026%20-%20Stella%20Ops%20Orchestrator%20Architecture.md)
- [Competitive Landscape](../../product/competitive-landscape.md)