Release Orchestrator Architecture

Technical architecture specification for the Release Orchestrator — Stella Ops Suite's central release control plane for non-Kubernetes container estates.

Status: Planned (not yet implemented)

Overview

The Release Orchestrator transforms Stella Ops Suite from a vulnerability scanning platform into a centralized, auditable release control plane. It sits between CI systems and runtime targets, governing promotion across environments, enforcing security and policy gates, and producing verifiable evidence for every release decision.

Core Value Proposition

  • Release orchestration — UI-driven promotion (Dev → Stage → Prod), approvals, policy gates, rollbacks
  • Security decisioning as a gate — Scan on build, evaluate on release, re-evaluate on CVE updates
  • OCI-digest-first releases — Immutable digest-based release identity
  • Toolchain-agnostic integrations — Plug into any SCM, CI, registry, secrets system
  • Auditability + standards — Evidence packets, SBOM/VEX/attestation support, deterministic replay

Design Principles

  1. Digest-First Release Identity — A release is an immutable set of OCI digests, never mutable tags. Tags are resolved to digests at release creation time.

  2. Pluggable Everything, Stable Core — Integrations are plugins; the core orchestration engine is stable. Plugins contribute UI screens, connector logic, step types, and agent types.

  3. Evidence for Every Decision — Every deployment/promotion produces an immutable evidence record containing who, what, why, how, and when.

  4. No Feature Gating — All plans include all features. The only limits are the number of environments, new digests per day, and fair use of deployments.

  5. Offline-First Operation — All core operations work in air-gapped environments. Plugins may require connectivity; core does not.

  6. Immutable Generated Artifacts — Every deployment generates and stores immutable artifacts (compose lockfiles, scripts, evidence).
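Principle 1 means that every release component must be stored as a digest, never as a tag. The following is a minimal sketch of that check. It assumes that only `sha256` digests are accepted; the OCI image spec also allows other algorithms. The `DigestReference` class and its methods are hypothetical names, not part of the planned API.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: validates digests and builds pinned "repository@digest" references.
public static class DigestReference
{
    // "sha256:" followed by exactly 64 lowercase hex characters (\z = true end of string).
    private static readonly Regex Sha256Digest =
        new(@"^sha256:[a-f0-9]{64}\z", RegexOptions.CultureInvariant);

    public static bool IsValidDigest(string digest) =>
        digest is not null && Sha256Digest.IsMatch(digest);

    // Returns e.g. "registry.example.com/app/api@sha256:<64 hex chars>".
    public static string Pin(string repository, string digest)
    {
        if (string.IsNullOrWhiteSpace(repository))
            throw new ArgumentException("Repository is required.", nameof(repository));
        if (repository.Contains('@'))
            throw new ArgumentException("Repository must not already contain a digest.", nameof(repository));
        if (!IsValidDigest(digest))
            throw new ArgumentException($"Not a sha256 digest: '{digest}'.", nameof(digest));
        return $"{repository}@{digest}";
    }

    // A reference is mutable when it carries no digest (tag-only or bare name).
    public static bool IsMutableReference(string imageRef) =>
        !imageRef.Contains("@sha256:", StringComparison.Ordinal);
}
```

Rejecting tag-only references at release creation keeps mutable identifiers out of the release record. Tags survive only as the informational `ResolvedFromTag` field.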

Platform Themes

The Release Orchestrator introduces ten new functional themes:

| Theme | Purpose | Key Modules |
|-------|---------|-------------|
| INTHUB | Integration hub | Integration Manager, Connection Profiles, Connector Runtime |
| ENVMGR | Environment management | Environment Manager, Target Registry, Agent Manager |
| RELMAN | Release management | Component Registry, Version Manager, Release Manager |
| WORKFL | Workflow engine | Workflow Designer, Workflow Engine, Step Executor |
| PROMOT | Promotion and approval | Promotion Manager, Approval Gateway, Decision Engine |
| DEPLOY | Deployment execution | Deploy Orchestrator, Target Executor, Artifact Generator |
| AGENTS | Deployment agents | Agent Core, Docker/Compose/ECS/Nomad agents |
| PROGDL | Progressive delivery | A/B Manager, Traffic Router, Canary Controller |
| RELEVI | Release evidence | Evidence Collector, Sticker Writer, Audit Exporter |
| PLUGIN | Plugin infrastructure | Plugin Registry, Plugin Loader, Plugin SDK |

Components

ReleaseOrchestrator/
├── __Libraries/
│   ├── StellaOps.ReleaseOrchestrator.Core/           # Core domain models
│   ├── StellaOps.ReleaseOrchestrator.Workflow/       # DAG workflow engine
│   ├── StellaOps.ReleaseOrchestrator.Promotion/      # Promotion logic
│   ├── StellaOps.ReleaseOrchestrator.Deploy/         # Deployment coordination
│   ├── StellaOps.ReleaseOrchestrator.Evidence/       # Evidence generation
│   ├── StellaOps.ReleaseOrchestrator.Plugin/         # Plugin infrastructure
│   └── StellaOps.ReleaseOrchestrator.Integration/    # Integration connectors
├── StellaOps.ReleaseOrchestrator.WebService/         # HTTP API
├── StellaOps.ReleaseOrchestrator.Worker/             # Background processing
├── StellaOps.Agent.Core/                             # Agent base framework
├── StellaOps.Agent.Docker/                           # Docker host agent
├── StellaOps.Agent.Compose/                          # Docker Compose agent
├── StellaOps.Agent.SSH/                              # SSH agentless executor
├── StellaOps.Agent.WinRM/                            # WinRM agentless executor
├── StellaOps.Agent.ECS/                              # AWS ECS agent
├── StellaOps.Agent.Nomad/                            # HashiCorp Nomad agent
└── __Tests/
    └── StellaOps.ReleaseOrchestrator.*.Tests/

Data Flow

Release Orchestration Flow

CI Build → Registry Push → Webhook → Stella Scan → Create Release →
Request Promotion → Gate Evaluation → Decision Record →
Deploy via Agent → Version Sticker → Evidence Packet

Detailed Flow

  1. CI pushes the image to the registry and triggers a webhook to Stella that references the pushed digest.
  2. Stella scans the new digest (if it has not already been scanned) and stores the verdict.
  3. A release is created, bundling the component digests under a semantic version.
  4. A promotion is requested to move the release from the source environment to the target environment.
  5. Gate evaluation runs: security verdict, approval count, freeze windows, and custom policies.
  6. A signed decision record is produced, with references to the supporting evidence.
  7. The deployment is executed on the target via an agent (Docker/Compose/ECS/Nomad).
  8. A version sticker is written to the target for drift detection.
  9. The evidence packet is sealed and stored.
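Steps 6 and 9 call for signed decision records and sealed evidence, but the architecture does not specify a signature scheme. The sketch below assumes ECDSA P-256 over SHA-256 via `System.Security.Cryptography`. The `EvidenceRecord`, `SealedEvidence` and `EvidenceSealer` types are hypothetical. The sketch signs the exact serialized bytes and stores them alongside the signature, so verification never depends on re-serializing the record.

```csharp
using System;
using System.Security.Cryptography;
using System.Text.Json;

// Hypothetical evidence payload covering who, what, why, how, and when.
public sealed record EvidenceRecord(
    Guid PromotionId, string Who, string What, string Why, string How, DateTimeOffset When);

// The stored artifact: the exact payload bytes plus their signature.
public sealed record SealedEvidence(byte[] Payload, byte[] Signature);

public static class EvidenceSealer
{
    // Serializes the record once and signs those bytes (ECDSA, SHA-256).
    public static SealedEvidence Seal(EvidenceRecord record, ECDsa signingKey)
    {
        byte[] payload = JsonSerializer.SerializeToUtf8Bytes(record);
        byte[] signature = signingKey.SignData(payload, HashAlgorithmName.SHA256);
        return new SealedEvidence(payload, signature);
    }

    // Any change to the stored payload bytes invalidates the signature.
    public static bool Verify(SealedEvidence sealedEvidence, ECDsa verificationKey) =>
        verificationKey.VerifyData(sealedEvidence.Payload, sealedEvidence.Signature, HashAlgorithmName.SHA256);
}
```

Key custody (HSM, KMS, or offline keys for air-gapped sites) is outside the scope of this sketch.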

Key Abstractions

Environment

public sealed record Environment
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required string Name { get; init; }        // "dev", "stage", "prod"
    public required string Slug { get; init; }        // URL-safe identifier
    public required int PromotionOrder { get; init; } // 1, 2, 3...
    public required FreezeWindow[] FreezeWindows { get; init; }
    public required ApprovalPolicy ApprovalPolicy { get; init; }
    public required bool IsProduction { get; init; }
    public EnvironmentState State { get; init; }      // Active, Frozen, Retired
}

Release

public sealed record Release
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required string Version { get; init; }     // SemVer: "2.3.1"
    public required string Name { get; init; }        // Display name
    public required ImmutableDictionary<string, ComponentDigest> Components { get; init; }
    public required string SourceRef { get; init; }   // Git SHA or tag
    public required DateTimeOffset CreatedAt { get; init; }
    public required Guid CreatedBy { get; init; }
    public ReleaseState State { get; init; }          // Draft, Active, Deprecated
}

public sealed record ComponentDigest
{
    public required string Repository { get; init; }  // registry.example.com/app/api
    public required string Digest { get; init; }      // sha256:abc123...
    public string? ResolvedFromTag { get; init; }     // Optional: "v2.3.1"
}

Promotion

public sealed record Promotion
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required Guid ReleaseId { get; init; }
    public required Guid SourceEnvironmentId { get; init; }
    public required Guid TargetEnvironmentId { get; init; }
    public required Guid RequestedBy { get; init; }
    public required DateTimeOffset RequestedAt { get; init; }
    public PromotionState State { get; init; }        // Pending, Approved, Rejected, Deployed, RolledBack
    public required ImmutableArray<GateResult> GateResults { get; init; }
    public required ImmutableArray<ApprovalRecord> Approvals { get; init; }
    public required DecisionRecord? Decision { get; init; }
}
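Promotions are append-only (see Invariants), so state changes are best modeled as recorded transitions rather than in-place updates. The following is a minimal sketch. It assumes the transition edges in the `Allowed` table; the architecture lists the states but not which moves between them are legal. `PromotionHistory` and `StateTransition` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum PromotionState { Pending, Approved, Rejected, Deployed, RolledBack }

public sealed record StateTransition(PromotionState From, PromotionState To, Guid Actor, DateTimeOffset At);

// Transitions are only ever appended; the current state is derived from the last one.
public sealed class PromotionHistory
{
    // Assumed legal edges (hypothetical; not fixed by the architecture).
    private static readonly Dictionary<PromotionState, PromotionState[]> Allowed = new()
    {
        [PromotionState.Pending] = new[] { PromotionState.Approved, PromotionState.Rejected },
        [PromotionState.Approved] = new[] { PromotionState.Deployed },
        [PromotionState.Deployed] = new[] { PromotionState.RolledBack },
        [PromotionState.Rejected] = Array.Empty<PromotionState>(),
        [PromotionState.RolledBack] = Array.Empty<PromotionState>(),
    };

    private readonly List<StateTransition> _transitions = new();

    // Read-only view: callers can inspect history but not edit or delete it.
    public IReadOnlyList<StateTransition> Transitions => _transitions.AsReadOnly();

    public PromotionState Current =>
        _transitions.Count == 0 ? PromotionState.Pending : _transitions[^1].To;

    public void Append(PromotionState to, Guid actor, DateTimeOffset at)
    {
        var from = Current;
        if (!Allowed[from].Contains(to))
            throw new InvalidOperationException($"Illegal promotion transition {from} -> {to}.");
        _transitions.Add(new StateTransition(from, to, actor, at));
    }
}
```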

Workflow

public sealed record Workflow
{
    public required Guid Id { get; init; }
    public required string Name { get; init; }
    public required ImmutableArray<WorkflowStep> Steps { get; init; }
    public required ImmutableDictionary<string, string[]> DependencyGraph { get; init; }
}

public sealed record WorkflowStep
{
    public required string Id { get; init; }
    public required string Type { get; init; }        // "script", "approval", "deploy", "gate"
    public required StepProvider Provider { get; init; }
    public required ImmutableDictionary<string, object> Config { get; init; }
    public required string[] DependsOn { get; init; }
    public StepState State { get; init; }
}
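Invariant 6 requires deterministic workflow execution. One way to derive a stable execution order from `DependsOn` is Kahn's topological sort, with ties broken by ordinal step ID, so the same graph always yields the same sequence. This sketch assumes step IDs are unique strings. `WorkflowOrder.Sort` is a hypothetical helper that also rejects unknown dependencies and cycles.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WorkflowOrder
{
    // dependsOn maps step id -> ids it depends on (mirrors WorkflowStep.DependsOn).
    public static IReadOnlyList<string> Sort(IReadOnlyDictionary<string, string[]> dependsOn)
    {
        var remaining = new Dictionary<string, int>(StringComparer.Ordinal);      // unmet dependency counts
        var dependents = new Dictionary<string, List<string>>(StringComparer.Ordinal);

        foreach (var (step, deps) in dependsOn)
        {
            remaining[step] = deps.Length;
            foreach (var dep in deps)
            {
                if (!dependsOn.ContainsKey(dep))
                    throw new ArgumentException($"Step '{step}' depends on unknown step '{dep}'.");
                if (!dependents.TryGetValue(dep, out var list))
                    dependents[dep] = list = new List<string>();
                list.Add(step);
            }
        }

        // An ordinal SortedSet acts as a deterministic ready queue.
        var ready = new SortedSet<string>(
            remaining.Where(kv => kv.Value == 0).Select(kv => kv.Key), StringComparer.Ordinal);
        var order = new List<string>(dependsOn.Count);

        while (ready.Count > 0)
        {
            var next = ready.Min!;
            ready.Remove(next);
            order.Add(next);
            if (!dependents.TryGetValue(next, out var children)) continue;
            foreach (var child in children)
                if (--remaining[child] == 0)
                    ready.Add(child);
        }

        // Steps left unprocessed can only be part of (or behind) a cycle.
        if (order.Count != dependsOn.Count)
            throw new InvalidOperationException("Workflow graph contains a cycle.");
        return order;
    }
}
```

A real engine would run independent ready steps in parallel. The ordinal ordering still gives a reproducible sequence for scheduling, logging and replay.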

Target

public sealed record Target
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required Guid EnvironmentId { get; init; }
    public required string Name { get; init; }
    public required TargetType Type { get; init; }    // DockerHost, ComposeHost, ECSService, NomadJob
    public required ImmutableDictionary<string, string> Labels { get; init; }
    public required Guid? AgentId { get; init; }      // Null for agentless
    public required TargetState State { get; init; }
    public required HealthStatus Health { get; init; }
}

public enum TargetType
{
    DockerHost,
    ComposeHost,
    ECSService,
    NomadJob,
    SSHRemote,
    WinRMRemote
}

Agent

public sealed record Agent
{
    public required Guid Id { get; init; }
    public required Guid TenantId { get; init; }
    public required string Name { get; init; }
    public required string Version { get; init; }
    public required ImmutableArray<string> Capabilities { get; init; }
    public required DateTimeOffset LastHeartbeat { get; init; }
    public required AgentState State { get; init; }   // Online, Offline, Degraded
    public required ImmutableDictionary<string, string> Labels { get; init; }
}
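Agents advertise `Capabilities` and `Labels`, and tasks should reach only agents whose capabilities match (see Agent Security). The sketch below shows that selection. It assumes capabilities are plain strings such as `compose` and that labels must match exactly. `AgentInfo`, `DeploymentTask` and `TaskRouter` are hypothetical simplifications of the records above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Agent fields used in routing.
public sealed record AgentInfo(
    Guid Id, bool Online, IReadOnlyCollection<string> Capabilities,
    IReadOnlyDictionary<string, string> Labels);

// Hypothetical task requirements derived from the target.
public sealed record DeploymentTask(
    string RequiredCapability, IReadOnlyDictionary<string, string> RequiredLabels);

public static class TaskRouter
{
    // Eligible = online, advertises the capability, and carries every required label value.
    public static IEnumerable<AgentInfo> EligibleAgents(
        DeploymentTask task, IEnumerable<AgentInfo> agents) =>
        agents
            .Where(a => a.Online)
            .Where(a => a.Capabilities.Contains(task.RequiredCapability, StringComparer.Ordinal))
            .Where(a => task.RequiredLabels.All(l =>
                a.Labels.TryGetValue(l.Key, out var value) && value == l.Value))
            .OrderBy(a => a.Id); // deterministic ordering among equally eligible agents
}
```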

Database Schema

| Table | Purpose |
|-------|---------|
| release.environments | Environment definitions with freeze windows |
| release.targets | Deployment targets within environments |
| release.agents | Registered deployment agents |
| release.components | Component definitions (service → repository mapping) |
| release.releases | Release bundles (version → component digests) |
| release.promotions | Promotion requests and state |
| release.approvals | Approval records |
| release.workflows | Workflow templates |
| release.workflow_runs | Workflow execution state |
| release.deployment_jobs | Deployment job records |
| release.evidence_packets | Sealed evidence records |
| release.integrations | Integration configurations |
| release.plugins | Plugin registrations |

Gate Types

| Gate | Purpose | Evaluation |
|------|---------|------------|
| Security | Check scan verdict | Query the latest scan for each release digest; block on reachable critical/high findings |
| Approval | Human sign-off | Count approvals; check separation-of-duties (SoD) rules |
| FreezeWindow | Calendar-based blocking | Check the target environment's freeze windows |
| PreviousEnvironment | Require prior deployment | Verify the release is deployed to the source environment |
| Policy | Custom OPA/Rego rules | Evaluate the policy against the promotion context |
| HealthCheck | Target health | Verify the target is healthy before deploying |
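Each gate produces a pass/fail result that feeds the decision record. The following is a minimal sketch of the FreezeWindow gate, plus an aggregator that runs every gate and allows the promotion only if all of them pass. The `FreezeWindow` type is not defined in this document. `FreezeWindowSpec`, which represents a half-open `[Start, End)` interval, and the other types here are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes; the architecture names the gates but not their API.
public sealed record FreezeWindowSpec(DateTimeOffset Start, DateTimeOffset End, string Reason);
public sealed record GateContext(DateTimeOffset Now, IReadOnlyList<FreezeWindowSpec> TargetFreezeWindows);
public sealed record GateOutcome(string Gate, bool Passed, string Detail);

public interface IPromotionGate
{
    GateOutcome Evaluate(GateContext context);
}

public sealed class FreezeWindowGate : IPromotionGate
{
    // A window blocks when Start <= Now < End.
    public GateOutcome Evaluate(GateContext context)
    {
        var active = context.TargetFreezeWindows
            .FirstOrDefault(w => context.Now >= w.Start && context.Now < w.End);
        return active is null
            ? new GateOutcome("FreezeWindow", true, "No active freeze window")
            : new GateOutcome("FreezeWindow", false, $"Frozen until {active.End:O}: {active.Reason}");
    }
}

public static class GateEvaluator
{
    // Every gate runs (no short-circuit) so the decision record lists all results.
    public static (bool Allowed, IReadOnlyList<GateOutcome> Results) EvaluateAll(
        IEnumerable<IPromotionGate> gates, GateContext context)
    {
        var results = gates.Select(g => g.Evaluate(context)).ToList();
        return (results.All(r => r.Passed), results);
    }
}
```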

Plugin System (Three-Surface Model)

Plugins contribute through three surfaces:

1. Manifest (Static Declaration)

# plugin-manifest.yaml
name: github-integration
version: 1.0.0
provider: StellaOps.Integration.GitHub.Plugin
capabilities:
  integrations:
    - type: scm
      id: github
      displayName: GitHub
  steps:
    - type: github-status
      displayName: Update GitHub Status
  gates:
    - type: github-check
      displayName: GitHub Check Required

2. Connector Runtime (Dynamic Execution)

public interface IIntegrationConnector
{
    Task<ConnectionTestResult> TestConnectionAsync(CancellationToken ct);
    Task<HealthStatus> GetHealthAsync(CancellationToken ct);
    Task<IReadOnlyList<Resource>> DiscoverResourcesAsync(string resourceType, CancellationToken ct);
}

public interface ISCMConnector : IIntegrationConnector
{
    Task<CommitInfo> GetCommitAsync(string gitRef, CancellationToken ct); // branch, tag, or SHA ("ref" is a C# keyword)
    Task CreateCommitStatusAsync(string commit, CommitStatus status, CancellationToken ct);
}

public interface IRegistryConnector : IIntegrationConnector
{
    Task<string> ResolveDigestAsync(string imageRef, CancellationToken ct);
    Task<bool> VerifyDigestAsync(string imageRef, string expectedDigest, CancellationToken ct);
}
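For tests, a connector can be replaced by an in-memory fake. The sketch below defines a hypothetical `IRegistryDigestOps` interface that mirrors only the two `IRegistryConnector` digest methods. It skips the `IIntegrationConnector` members and their undefined result types. The test shows why verifying against the release's recorded digest catches a tag that moved after the release was created.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical subset of IRegistryConnector (no IIntegrationConnector base).
public interface IRegistryDigestOps
{
    Task<string> ResolveDigestAsync(string imageRef, CancellationToken ct);
    Task<bool> VerifyDigestAsync(string imageRef, string expectedDigest, CancellationToken ct);
}

// In-memory fake: maps "repo:tag" references to digests.
public sealed class InMemoryRegistry : IRegistryDigestOps
{
    private readonly Dictionary<string, string> _tags = new(StringComparer.Ordinal);

    // Simulates a push, or a tag being moved to a new digest.
    public void Tag(string imageRef, string digest) => _tags[imageRef] = digest;

    public Task<string> ResolveDigestAsync(string imageRef, CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();
        return _tags.TryGetValue(imageRef, out var digest)
            ? Task.FromResult(digest)
            : Task.FromException<string>(new KeyNotFoundException($"Unknown image reference '{imageRef}'."));
    }

    public async Task<bool> VerifyDigestAsync(string imageRef, string expectedDigest, CancellationToken ct)
    {
        var actual = await ResolveDigestAsync(imageRef, ct).ConfigureAwait(false);
        // Exact comparison: any difference fails verification.
        return string.Equals(actual, expectedDigest, StringComparison.Ordinal);
    }
}
```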

3. Step Provider (Execution Contract)

public interface IStepProvider
{
    StepExecutionCharacteristics Characteristics { get; }
    Task<StepResult> ExecuteAsync(StepContext context, CancellationToken ct);
    Task<StepResult> RollbackAsync(StepContext context, CancellationToken ct);
}

public sealed record StepExecutionCharacteristics
{
    public bool IsIdempotent { get; init; }
    public bool SupportsRollback { get; init; }
    public TimeSpan DefaultTimeout { get; init; }
    public ResourceRequirements Resources { get; init; }
}
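A step runner can enforce `DefaultTimeout` and invoke rollback when a step fails. This sketch assumes a simplified, hypothetical `ISimpleStep` that reports success as a `bool` instead of using the `StepResult`/`StepContext` types above. The timeout is applied through a linked `CancellationTokenSource`. Rollback uses the caller's token, because the timed-out token is already cancelled. `DelegateStep` is a hypothetical adapter for building steps from delegates.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public interface ISimpleStep
{
    TimeSpan DefaultTimeout { get; }
    bool SupportsRollback { get; }
    Task<bool> ExecuteAsync(CancellationToken ct);
    Task RollbackAsync(CancellationToken ct);
}

public enum StepOutcome { Succeeded, Failed, TimedOut }

public static class StepRunner
{
    public static async Task<(StepOutcome Outcome, bool RolledBack)> RunAsync(ISimpleStep step, CancellationToken ct)
    {
        using var timeout = CancellationTokenSource.CreateLinkedTokenSource(ct);
        timeout.CancelAfter(step.DefaultTimeout);

        StepOutcome outcome;
        try
        {
            outcome = await step.ExecuteAsync(timeout.Token).ConfigureAwait(false)
                ? StepOutcome.Succeeded
                : StepOutcome.Failed;
        }
        catch (OperationCanceledException) when (!ct.IsCancellationRequested)
        {
            // Only the step deadline fired; caller cancellation still propagates.
            outcome = StepOutcome.TimedOut;
        }

        if (outcome == StepOutcome.Succeeded || !step.SupportsRollback)
            return (outcome, false);

        await step.RollbackAsync(ct).ConfigureAwait(false);
        return (outcome, true);
    }
}

public sealed class DelegateStep : ISimpleStep
{
    private readonly Func<CancellationToken, Task<bool>> _execute;
    private readonly Func<CancellationToken, Task> _rollback;

    public DelegateStep(TimeSpan timeout, bool supportsRollback,
        Func<CancellationToken, Task<bool>> execute, Func<CancellationToken, Task> rollback)
    {
        DefaultTimeout = timeout;
        SupportsRollback = supportsRollback;
        _execute = execute;
        _rollback = rollback;
    }

    public TimeSpan DefaultTimeout { get; }
    public bool SupportsRollback { get; }
    public Task<bool> ExecuteAsync(CancellationToken ct) => _execute(ct);
    public Task RollbackAsync(CancellationToken ct) => _rollback(ct);
}
```

In this sketch, exceptions other than cancellation propagate to the caller. A real executor would also record them in the step result.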

Invariants

  1. Release identity is immutable — Once created, a release's component digests cannot be changed. Create a new release instead.

  2. Promotions are append-only — Promotion state transitions are recorded; no edits or deletions.

  3. Evidence packets are sealed — Evidence is cryptographically signed and stored immutably.

  4. Digest verification at deploy time — Agents verify image digests at pull time; mismatch fails deployment.

  5. Separation of duties enforced — Requester cannot be sole approver for production promotions.

  6. Workflow execution is deterministic — Same inputs produce same execution order and outputs.
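Invariant 5 can be checked from the approval records alone. This sketch makes three assumptions. First, the requester's own approval counts toward the required number. Second, production additionally needs at least one approval from someone else. Third, repeated approvals by the same person count once. `Approval` and `SeparationOfDuties` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record Approval(Guid ApproverId, bool Approved);

public static class SeparationOfDuties
{
    public static bool IsSatisfied(
        Guid requestedBy, IEnumerable<Approval> approvals, int requiredApprovals, bool isProduction)
    {
        // Distinct approvers only: one person approving twice counts once.
        var approvers = approvals
            .Where(a => a.Approved)
            .Select(a => a.ApproverId)
            .Distinct()
            .ToList();

        if (approvers.Count < requiredApprovals)
            return false;

        // Production: the requester may not be the sole approver.
        return !isProduction || approvers.Any(id => id != requestedBy);
    }
}
```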

Error Handling

  • Transient failures — Retry with exponential backoff; circuit breaker for repeated failures
  • Agent disconnection — Mark agent offline; reassign pending tasks to other agents
  • Deployment failure — Automatic rollback if configured; otherwise mark promotion as failed
  • Gate failure — Block promotion; require manual intervention or re-evaluation
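The transient-failure policy combines exponential backoff with a circuit breaker. The following is a minimal, single-threaded sketch. It assumes a doubling delay capped at a maximum, with optional full jitter. The breaker opens after N consecutive failures and allows a trial call after a cool-down. `Backoff` and `CircuitBreaker` are hypothetical names, and the thresholds are illustrative.

```csharp
using System;

public static class Backoff
{
    // delay = base * 2^attempt, capped at maxDelay; full jitter if a Random is supplied.
    public static TimeSpan Delay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay, Random? jitter = null)
    {
        if (attempt < 0) throw new ArgumentOutOfRangeException(nameof(attempt));
        double exp = Math.Min(
            baseDelay.TotalMilliseconds * Math.Pow(2, Math.Min(attempt, 30)), // bound the exponent
            maxDelay.TotalMilliseconds);
        double ms = jitter is null ? exp : jitter.NextDouble() * exp;
        return TimeSpan.FromMilliseconds(ms);
    }
}

// Opens after N consecutive failures; after the cool-down a trial call is allowed.
// A failed trial reopens the breaker; a success closes it. Not thread-safe.
public sealed class CircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private int _consecutiveFailures;
    private DateTimeOffset? _openedAt;

    public CircuitBreaker(int failureThreshold, TimeSpan openDuration)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration;
    }

    public bool AllowRequest(DateTimeOffset now) =>
        _openedAt is null || now - _openedAt.Value >= _openDuration;

    public void RecordSuccess()
    {
        _consecutiveFailures = 0;
        _openedAt = null;
    }

    public void RecordFailure(DateTimeOffset now)
    {
        _consecutiveFailures++;
        if (_consecutiveFailures >= _failureThreshold)
            _openedAt = now; // (re)open and restart the cool-down
    }
}
```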

Observability

Metrics

  • release_promotions_total — Counter by environment and outcome
  • release_deployments_duration_seconds — Histogram of deployment times
  • release_gate_evaluations_total — Counter by gate type and result
  • release_agents_online — Gauge of online agents
  • release_workflow_steps_duration_seconds — Histogram by step type
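The metrics above map onto .NET's `System.Diagnostics.Metrics` API, which exporters such as OpenTelemetry can collect. This sketch covers two of them. It assumes a meter named `StellaOps.ReleaseOrchestrator` and the tag keys `environment` and `outcome`, all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

public sealed class ReleaseMetrics : IDisposable
{
    public const string MeterName = "StellaOps.ReleaseOrchestrator";

    private readonly Meter _meter = new(MeterName);
    private readonly Counter<long> _promotions;
    private readonly Histogram<double> _deploymentSeconds;

    public ReleaseMetrics()
    {
        _promotions = _meter.CreateCounter<long>("release_promotions_total");
        _deploymentSeconds = _meter.CreateHistogram<double>("release_deployments_duration_seconds", unit: "s");
    }

    // Counter by environment and outcome.
    public void PromotionCompleted(string environment, string outcome) =>
        _promotions.Add(1,
            new KeyValuePair<string, object?>("environment", environment),
            new KeyValuePair<string, object?>("outcome", outcome));

    // Histogram of deployment times, in seconds.
    public void DeploymentFinished(TimeSpan duration) =>
        _deploymentSeconds.Record(duration.TotalSeconds);

    public void Dispose() => _meter.Dispose();
}
```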

Traces

  • promotion.request — Span for promotion request handling
  • gate.evaluate — Span for each gate evaluation
  • deployment.execute — Span for deployment execution
  • agent.task — Span for agent task execution
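Spans can be produced with `System.Diagnostics.ActivitySource`, which OpenTelemetry for .NET consumes. This sketch covers the first two spans. It assumes a source named `StellaOps.ReleaseOrchestrator` and the tag keys `promotion.id`, `release.id` and `gate.type`, all hypothetical. `StartActivity` returns `null` when nothing is listening, which is why the code uses null-conditional calls.

```csharp
using System;
using System.Diagnostics;

public static class ReleaseTracing
{
    public const string SourceName = "StellaOps.ReleaseOrchestrator";
    private static readonly ActivitySource Source = new(SourceName);

    public static Activity? StartPromotionRequest(Guid promotionId, Guid releaseId)
    {
        var activity = Source.StartActivity("promotion.request", ActivityKind.Server);
        activity?.SetTag("promotion.id", promotionId.ToString());
        activity?.SetTag("release.id", releaseId.ToString());
        return activity;
    }

    // Parents itself on Activity.Current, i.e. the active promotion.request span.
    public static Activity? StartGateEvaluation(string gateType) =>
        Source.StartActivity("gate.evaluate")?.SetTag("gate.type", gateType);
}
```

Because `StartActivity` parents new spans on `Activity.Current`, a `gate.evaluate` span started while a `promotion.request` span is active is nested under it automatically.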

Logs

  • Structured logs with correlation IDs
  • Promotion ID, release ID, environment ID in all relevant logs
  • Sensitive data (secrets, credentials) masked
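Masking can be applied to every message before it reaches a log sink. This sketch assumes the secret values are known at runtime, for example registered when secrets are fetched just in time. `SecretMasker` is hypothetical and handles only exact matches, not encoded or partial forms of a secret.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class SecretMasker
{
    private const string Mask = "***";
    private readonly List<string> _secrets = new();

    public void Register(string secret)
    {
        // Ignore empty values; masking "" would corrupt every message.
        if (!string.IsNullOrEmpty(secret) && !_secrets.Contains(secret))
        {
            _secrets.Add(secret);
            // Longest first, so a secret that contains another is masked whole.
            _secrets.Sort((a, b) => b.Length.CompareTo(a.Length));
        }
    }

    public string Apply(string message) =>
        _secrets.Aggregate(message, (current, secret) => current.Replace(secret, Mask, StringComparison.Ordinal));
}
```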

Security Considerations

Agent Security

  • mTLS authentication — Agents authenticate with CA-signed certificates
  • Short-lived credentials — Task credentials expire after execution
  • Capability-based authorization — Agents only receive tasks matching their capabilities
  • Heartbeat monitoring — Detect and flag agent disconnections

Secrets Management

  • Never stored in database — Only vault references stored
  • Fetched at execution time — Secrets retrieved just-in-time for deployment
  • Short-lived — Dynamic credentials with minimal TTL
  • Masked in logs — Secret values never logged

Plugin Sandbox

  • Resource limits — CPU, memory, timeout limits per plugin
  • Capability restrictions — Plugins declare required capabilities
  • Network isolation — Optional network restrictions for plugins

Performance Characteristics

  • Promotion evaluation — < 5 seconds for typical gate evaluation
  • Deployment latency — Dominated by image pull time; orchestration overhead < 10 seconds
  • Agent heartbeat — 30-second interval; offline detection within 90 seconds
  • Workflow step timeout — Configurable; default 5 minutes per step
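The heartbeat figures imply that an agent is considered offline after three missed 30-second intervals. This sketch encodes that rule, assuming detection compares the time since `LastHeartbeat` against 90 seconds. `HeartbeatMonitor` and `AgentLiveness` are hypothetical, and the `Degraded` agent state is not modeled.

```csharp
using System;

public enum AgentLiveness { Online, Offline }

public static class HeartbeatMonitor
{
    public static readonly TimeSpan Interval = TimeSpan.FromSeconds(30);
    public const int MissedBeforeOffline = 3;

    // Offline once more than 3 x 30 s = 90 s has passed since the last heartbeat.
    public static AgentLiveness Evaluate(DateTimeOffset lastHeartbeat, DateTimeOffset now) =>
        now - lastHeartbeat > Interval * MissedBeforeOffline
            ? AgentLiveness.Offline
            : AgentLiveness.Online;
}
```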

Implementation Roadmap

| Phase | Focus | Key Deliverables |
|-------|-------|------------------|
| Phase 1 | Foundation | Environment management, integration hub, release bundles |
| Phase 2 | Workflow Engine | DAG execution, step registry, workflow templates |
| Phase 3 | Promotion & Decision | Approval gateway, security gates, decision records |
| Phase 4 | Deployment Execution | Docker/Compose agents, artifact generation, rollback |
| Phase 5 | UI & Polish | Release dashboard, promotion UI, environment management |
| Phase 6 | Progressive Delivery | A/B releases, canary, traffic routing |
| Phase 7 | Extended Targets | ECS, Nomad, SSH/WinRM agentless |
| Phase 8 | Plugin Ecosystem | Full plugin system, marketplace |

References