release orchestrator pivot, architecture and planning

2026-01-10 22:37:22 +02:00
parent c84f421e2f
commit d509c44411
130 changed files with 70292 additions and 721 deletions
--- a/docs/ARCHITECTURE_OVERVIEW.md
+++ b/docs/ARCHITECTURE_OVERVIEW.md
@@ -1,41 +1,84 @@
 # Architecture Overview (High-Level)

-This document is the 10-minute tour for StellaOps: what components exist, how they fit together, and what "offline-first + deterministic + evidence-linked decisions" means in practice.
+This document is the 10-minute tour for Stella Ops Suite: what components exist, how they fit together, and what "release control plane + security gates + evidence-linked decisions" means in practice.

 For the full reference map (services, boundaries, detailed flows), see `docs/ARCHITECTURE_REFERENCE.md`.

+## What Stella Ops Suite Is
+
+**Stella Ops Suite is a centralized, auditable release control plane for non-Kubernetes container estates.**
+
+It sits between your CI and your runtime targets, governs promotion across environments, enforces security and policy gates, and produces verifiable evidence for every release decision.
+
+```
+CI Build → Registry → Stella (Scan + Release + Promote + Gate + Deploy) → Targets → Evidence
+```
+
 ## Guiding Principles

- **SBOM-first:** scan and reason over SBOMs; fall back to unpacking only when needed.
+- **Digest-first releases:** a release is an immutable set of OCI digests, never mutable tags.
 - **Deterministic replay:** the same inputs yield the same outputs (stable ordering, canonical hashing, UTC timestamps).
- **Evidence-linked decisions:** policy decisions link back to specific evidence artifacts (SBOM slices, advisory/VEX observations, reachability proofs, attestations).
- **Aggregation-not-merge:** upstream advisories and VEX are stored and exposed with provenance; conflicts are visible, not silently collapsed.
- **Offline-first:** the same workflow runs connected or air-gapped via Offline Kit snapshots and signed bundles.
+- **Evidence-linked decisions:** every release decision links to concrete evidence artifacts (scan verdicts, approvals, policy evaluations).
+- **Pluggable everything:** integrations are plugins; the core orchestration engine is stable.
+- **Offline-first:** all core operations work in air-gapped environments.
+- **No feature gating:** all plans include all features; plans differ only in their limits on environments and new digests per day.

-## System Map (What Runs)
+## System Map
+
+### Release-Centric Flow

 ```
-Build -> Sign -> Store -> Scan -> Decide -> Attest -> Notify/Export
+Build → Scan → Create Release → Request Promotion → Gate Evaluation → Deploy → Evidence
+        ↑                              ↓
+        └── Re-evaluate on CVE Updates ┘
 ```

-At a high level, StellaOps is a set of services grouped by responsibility:
+### Platform Themes

- **Identity and authorization:** Authority (OIDC/OAuth2, scopes/tenancy)
- **Scanning and SBOM:** Scanner WebService + Worker (facts generation)
- **Advisories:** Concelier (ingest/normalize/export vulnerability sources)
- **VEX:** Excititor + VEX Lens (VEX observations/linksets and exploration)
- **Decisioning:** Policy Engine surfaces (lattice-style explainable policy)
- **Signing and transparency:** Signer + Attestor (DSSE/in-toto and optional transparency)
- **Orchestration and delivery:** Scheduler, Notify, Export Center
- **Console:** Web UI for operators and auditors
+Stella Ops Suite organizes capabilities into **themes** (functional areas):

-| Tier | Services | Key responsibilities |
+#### Existing Themes (Operational)
+
+| Theme | Purpose | Key Modules |
+|-------|---------|-------------|
+| **INGEST** | Advisory ingestion | Concelier, Advisory-AI |
+| **VEXOPS** | VEX document handling | Excititor, VEX Lens, VEX Hub |
+| **REASON** | Policy and decisioning | Policy Engine, OPA Runtime |
+| **SCANENG** | Scanning and SBOM | Scanner, SBOM Service, Reachability |
+| **EVIDENCE** | Evidence and attestation | Evidence Locker, Attestor, Export Center |
+| **RUNTIME** | Runtime signals | Signals, Graph, Zastava |
+| **JOBCTRL** | Job orchestration | Scheduler, Orchestrator, TaskRunner |
+| **OBSERVE** | Observability | Notifier, Telemetry |
+| **REPLAY** | Deterministic replay | Replay Engine |
+| **DEVEXP** | Developer experience | CLI, Web UI, SDK |
+
+#### Planned Themes (Release Orchestration)
+
+| Theme | Purpose | Key Modules |
+|-------|---------|-------------|
+| **INTHUB** | Integration hub | Integration Manager, Connection Profiles, Connector Runtime, Doctor Checks |
+| **ENVMGR** | Environment management | Environment Manager, Target Registry, Agent Manager, Inventory Sync |
+| **RELMAN** | Release management | Component Registry, Version Manager, Release Manager, Release Catalog |
+| **WORKFL** | Workflow engine | Workflow Designer, Workflow Engine, Step Executor, Step Registry |
+| **PROMOT** | Promotion and approval | Promotion Manager, Approval Gateway, Decision Engine, Gate Registry |
+| **DEPLOY** | Deployment execution | Deploy Orchestrator, Target Executor, Runner Executor, Artifact Generator, Rollback Manager |
+| **AGENTS** | Deployment agents | Agent Core, Agent Docker, Agent Compose, Agent SSH, Agent WinRM, Agent ECS, Agent Nomad |
+| **PROGDL** | Progressive delivery | A/B Manager, Traffic Router, Canary Controller, Rollout Strategy |
+| **RELEVI** | Release evidence | Evidence Collector, Evidence Signer, Sticker Writer, Audit Exporter |
+| **PLUGIN** | Plugin infrastructure | Plugin Registry, Plugin Loader, Plugin Sandbox, Plugin SDK |
+
+### Service Tiers
+
+| Tier | Services | Key Responsibilities |
 |------|----------|----------------------|
-| **Edge / Identity** | `StellaOps.Authority` | Issues short-lived tokens (DPoP + mTLS), exposes OIDC device-code + auth-code flows, rotates JWKS. |
-| **Scan & attest** | `StellaOps.Scanner` (API + Worker), `StellaOps.Signer`, `StellaOps.Attestor` | Accept SBOMs/images, drive analyzers, produce DSSE bundles, optionally log to a Rekor mirror. |
-| **Evidence graph** | `StellaOps.Concelier`, `StellaOps.Excititor`, `StellaOps.Policy.Engine` | Ingest advisories/VEX, correlate linksets, run lattice policy and VEX-first decisioning. |
-| **Experience** | `StellaOps.Web` (Console), `StellaOps.Cli`, `StellaOps.Notify`, `StellaOps.ExportCenter` | Operator UX, automation, notifications, and offline/mirror packaging. |
-| **Data plane** | PostgreSQL, Valkey, RustFS/object storage (optional NATS JetStream) | Canonical store, counters/queues, and artifact storage with deterministic layouts. |
+| **Edge / Identity** | `StellaOps.Authority` | Issues short-lived tokens (DPoP + mTLS), exposes OIDC flows, rotates JWKS |
+| **Release Control** | `StellaOps.ReleaseManager`, `StellaOps.PromotionManager`, `StellaOps.WorkflowEngine` | Release bundles, promotion workflows, gate evaluation (planned) |
+| **Integration Hub** | `StellaOps.IntegrationManager`, `StellaOps.ConnectorRuntime` | SCM/CI/Registry/Vault connectors (planned) |
+| **Scan & Attest** | `StellaOps.Scanner`, `StellaOps.Signer`, `StellaOps.Attestor` | Accept SBOMs/images, produce DSSE bundles, transparency logging |
+| **Evidence Graph** | `StellaOps.Concelier`, `StellaOps.Excititor`, `StellaOps.Policy.Engine` | Advisories/VEX, linksets, lattice policy |
+| **Deployment** | `StellaOps.DeployOrchestrator`, `StellaOps.Agent.*` | Deployment execution to Docker/Compose/ECS/Nomad (planned) |
+| **Experience** | `StellaOps.Web`, `StellaOps.Cli`, `StellaOps.Notify`, `StellaOps.ExportCenter` | Operator UX, automation, notifications |
+| **Data Plane** | PostgreSQL, Valkey, RustFS/object storage | Canonical store, queues, artifact storage |

 ## Infrastructure (What Is Required)

@@ -50,7 +93,9 @@ At a high level, StellaOps is a set of services grouped by responsibility:
 - **NATS JetStream:** optional messaging transport in some deployments.
 - **Transparency log services:** Rekor mirror (and CA services) when transparency is enabled.

-## End-to-End Flow (Typical)
+## End-to-End Flows
+
+### Current: Vulnerability Scanning Flow

 1. **Evidence enters** via Concelier and Excititor connectors (Aggregation-Only Contract).
 2. **SBOM arrives** from CLI/CI; Scanner deduplicates layers and enqueues work.
@@ -59,22 +104,64 @@ At a high level, StellaOps is a set of services grouped by responsibility:
 5. **Signer + Attestor** wrap outputs into DSSE bundles and (optionally) anchor them in a Rekor mirror.
 6. **Console/CLI/Export** surface findings and package verifiable evidence; Notify emits digests/incidents.

-## Extension Points (Where You Customize)
+### Planned: Release Orchestration Flow
+
+1. **CI pushes image** to registry by digest; triggers webhook to Stella.
+2. **Stella scans** the new digest and stores the verdict.
+3. **Release created**, bundling component digests under a semantic version.
+4. **Promotion requested** to move release from Dev → Stage → Prod.
+5. **Gate evaluation** checks: security verdict, approval count, freeze windows, custom policies.
+6. **Decision record** produced with evidence refs and signed.
+7. **Deployment executed** via agent to target (Docker/Compose/ECS/Nomad).
+8. **Version sticker** written to target for drift detection.
+9. **Evidence packet** sealed and stored.
+
+## Extension Points
+
+### Current Extension Points

 - **Scanner analyzers** (restart-time plug-ins) for ecosystem-specific parsing and facts extraction.
 - **Concelier connectors** for new advisory sources (preserving aggregation-only guardrails).
 - **Policy packs** for organization-specific gating and waivers/justifications.
 - **Export profiles** for output formats and offline bundle shapes.

+### Planned Extension Points (Three-Surface Plugin Model)
+
+Plugins contribute through three surfaces:
+
+1. **Manifest** (static declaration): What the plugin provides (integrations, steps, agents, gates)
+2. **Connector Runtime** (dynamic execution): gRPC interface for runtime operations
+3. **Step Provider** (execution contract): Execution characteristics for workflow steps
+
+Plugin types:
+- **Integration connectors:** SCM, CI, Registry, Vault, Target, Router
+- **Step providers:** Custom workflow steps
+- **Agent types:** New deployment target types
+- **Gate providers:** Custom gate evaluations
+
 ## Offline & Sovereign Notes

 - Offline Kit carries vulnerability feeds, container images, signatures, and verification material so the workflow stays identical when air-gapped.
 - Authority + token verification remain local; quota enforcement is verifiable offline.
 - Attestor can cache transparency proofs for offline verification.
+- Evidence packets are exportable for external audit in air-gapped environments.
+- All release decisions can be replayed with frozen inputs.
+
+## Key Architectural Decisions
+
+| Decision | Rationale |
+|----------|-----------|
+| **Digest-first release identity** | Tags are mutable; digests provide immutable release identity for audit |
+| **Three-surface plugin model** | Enables extensibility without core code changes |
+| **Compiled C# scripts + sandboxed bash** | C# for complex orchestration; bash for simple hooks |
+| **Agent + agentless execution** | Agent-based preferred for reliability; agentless for adoption |
+| **Evidence packets for every decision** | Enables deterministic replay and audit-grade compliance |

 ## References

- `docs/ARCHITECTURE_REFERENCE.md`
- `docs/OFFLINE_KIT.md`
- `docs/API_CLI_REFERENCE.md`
- `docs/modules/platform/architecture-overview.md`
+- `docs/ARCHITECTURE_REFERENCE.md` — Full reference map
+- `docs/modules/release-orchestrator/architecture.md` — Release orchestrator design (planned)
+- `docs/OFFLINE_KIT.md` — Air-gap operations
+- `docs/API_CLI_REFERENCE.md` — API and CLI contracts
+- `docs/modules/platform/architecture-overview.md` — Platform service design
+- `docs/product/advisories/09-Jan-2026 - Stella Ops Orchestrator Architecture.md` — Full orchestrator specification
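
Reviewer note: steps 3 to 6 of the planned release orchestration flow in the diff above (digest-only release, gate evaluation, decision record with evidence refs) can be illustrated with a minimal Python sketch. It is not taken from the commit. Every name in it (`Release`, `make_release`, `evaluate_gates`, `min_approvals`, the record fields) is hypothetical, and the default of two approvals is an assumption. Signing is omitted; a content hash over canonical JSON stands in for it only to show how identical inputs can yield an identical record.

```python
import hashlib
import json
import re
from dataclasses import dataclass

# Hypothetical model; none of these names come from the Stella Ops codebase.
DIGEST_RE = re.compile(r"^sha256:[0-9a-f]{64}$")


@dataclass(frozen=True)
class Release:
    version: str
    components: tuple  # sorted (name, digest) pairs, immutable once created


def make_release(version, components):
    """Build a release from a {name: digest} mapping, rejecting mutable tags."""
    for name, ref in components.items():
        if not DIGEST_RE.match(ref):
            raise ValueError(f"{name}: expected an OCI digest, got {ref!r}")
    return Release(version, tuple(sorted(components.items())))


def evaluate_gates(release, scan_verdicts, approvals, frozen, min_approvals=2):
    """Evaluate the gates and return a decision record with evidence refs."""
    approvers = sorted(set(approvals))  # count each approver once
    gates = {
        "security": all(
            scan_verdicts.get(digest) == "pass"
            for _, digest in release.components
        ),
        "approvals": len(approvers) >= min_approvals,
        "freeze_window": not frozen,
    }
    record = {
        "release": release.version,
        "digests": [digest for _, digest in release.components],
        "gates": gates,
        "allowed": all(gates.values()),
        "evidence": {
            "scan_verdicts": sorted(scan_verdicts.items()),
            "approvals": approvers,
        },
    }
    # Canonical JSON (sorted keys, fixed separators) so that the same
    # inputs always hash to the same value, regardless of input order.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    record["record_sha256"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return record
```

Calling `make_release("1.4.0", {"api": "registry/api:latest"})` raises `ValueError`, which mirrors the "never mutable tags" principle. Passing the same verdicts and approvers in a different order produces the same `record_sha256`, a small-scale version of the deterministic replay the document describes.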