release orchestrator pivot, architecture and planning

2026-01-10 22:37:22 +02:00
parent c84f421e2f
commit d509c44411
130 changed files with 70292 additions and 721 deletions


@@ -1,41 +1,84 @@
# Architecture Overview (High-Level)
This document is the 10-minute tour for StellaOps: what components exist, how they fit together, and what "offline-first + deterministic + evidence-linked decisions" means in practice.
This document is the 10-minute tour for Stella Ops Suite: what components exist, how they fit together, and what "release control plane + security gates + evidence-linked decisions" means in practice.
For the full reference map (services, boundaries, detailed flows), see `docs/ARCHITECTURE_REFERENCE.md`.
## What Stella Ops Suite Is
**Stella Ops Suite is a centralized, auditable release control plane for non-Kubernetes container estates.**
It sits between your CI and your runtime targets, governs promotion across environments, enforces security and policy gates, and produces verifiable evidence for every release decision.
```
CI Build → Registry → Stella (Scan + Release + Promote + Gate + Deploy) → Targets → Evidence
```
## Guiding Principles
- **SBOM-first:** scan and reason over SBOMs; fall back to unpacking only when needed.
- **Digest-first releases:** a release is an immutable set of OCI digests, never mutable tags.
- **Deterministic replay:** the same inputs yield the same outputs (stable ordering, canonical hashing, UTC timestamps).
- **Evidence-linked decisions:** policy decisions link back to specific evidence artifacts (SBOM slices, advisory/VEX observations, reachability proofs, attestations).
- **Aggregation-not-merge:** upstream advisories and VEX are stored and exposed with provenance; conflicts are visible, not silently collapsed.
- **Offline-first:** the same workflow runs connected or air-gapped via Offline Kit snapshots and signed bundles.
- **Evidence-linked decisions:** every release decision links to concrete evidence artifacts (scan verdicts, approvals, policy evaluations).
- **Pluggable everything:** integrations are plugins; the core orchestration engine is stable.
- **Offline-first:** all core operations work in air-gapped environments.
- **No feature gating:** all plans include all features; plans differ only in their limits on environments and new digests per day.
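The digest-first and deterministic-replay principles can be illustrated with a minimal sketch. It assumes a release is modeled as a map from component name to OCI digest; the function name `release_id` and the JSON canonicalization are hypothetical choices for illustration, not the suite's actual scheme.

```python
import hashlib
import json
import re

# Hypothetical helper; only the principles (digests instead of tags, stable
# ordering, canonical hashing) come from the list above.
DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")

def release_id(components: dict) -> str:
    """Derive a stable identifier from a component-name -> OCI digest map."""
    for name, digest in components.items():
        if not DIGEST_RE.fullmatch(digest):
            # Mutable tags such as "api:latest" are rejected outright.
            raise ValueError(f"{name}: expected a sha256 digest, got {digest!r}")
    # Canonical form: keys sorted, no insignificant whitespace.
    canonical = json.dumps(components, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

first = release_id({"api": "sha256:" + "a" * 64, "web": "sha256:" + "b" * 64})
second = release_id({"web": "sha256:" + "b" * 64, "api": "sha256:" + "a" * 64})
assert first == second  # insertion order does not affect the identifier
```

Because the identifier depends only on the sorted digest map, two parties holding the same component digests should derive the same value.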
## System Map (What Runs)
## System Map
### Release-Centric Flow
```
Build -> Sign -> Store -> Scan -> Decide -> Attest -> Notify/Export
Build → Scan → Create Release → Request Promotion → Gate Evaluation → Deploy → Evidence
↑ ↓
└── Re-evaluate on CVE Updates ┘
```
At a high level, StellaOps is a set of services grouped by responsibility:
### Platform Themes
- **Identity and authorization:** Authority (OIDC/OAuth2, scopes/tenancy)
- **Scanning and SBOM:** Scanner WebService + Worker (facts generation)
- **Advisories:** Concelier (ingest/normalize/export vulnerability sources)
- **VEX:** Excititor + VEX Lens (VEX observations/linksets and exploration)
- **Decisioning:** Policy Engine surfaces (lattice-style explainable policy)
- **Signing and transparency:** Signer + Attestor (DSSE/in-toto and optional transparency)
- **Orchestration and delivery:** Scheduler, Notify, Export Center
- **Console:** Web UI for operators and auditors
Stella Ops Suite organizes capabilities into **themes** (functional areas):
| Tier | Services | Key responsibilities |
#### Existing Themes (Operational)
| Theme | Purpose | Key Modules |
|-------|---------|-------------|
| **INGEST** | Advisory ingestion | Concelier, Advisory-AI |
| **VEXOPS** | VEX document handling | Excititor, VEX Lens, VEX Hub |
| **REASON** | Policy and decisioning | Policy Engine, OPA Runtime |
| **SCANENG** | Scanning and SBOM | Scanner, SBOM Service, Reachability |
| **EVIDENCE** | Evidence and attestation | Evidence Locker, Attestor, Export Center |
| **RUNTIME** | Runtime signals | Signals, Graph, Zastava |
| **JOBCTRL** | Job orchestration | Scheduler, Orchestrator, TaskRunner |
| **OBSERVE** | Observability | Notifier, Telemetry |
| **REPLAY** | Deterministic replay | Replay Engine |
| **DEVEXP** | Developer experience | CLI, Web UI, SDK |
#### Planned Themes (Release Orchestration)
| Theme | Purpose | Key Modules |
|-------|---------|-------------|
| **INTHUB** | Integration hub | Integration Manager, Connection Profiles, Connector Runtime, Doctor Checks |
| **ENVMGR** | Environment management | Environment Manager, Target Registry, Agent Manager, Inventory Sync |
| **RELMAN** | Release management | Component Registry, Version Manager, Release Manager, Release Catalog |
| **WORKFL** | Workflow engine | Workflow Designer, Workflow Engine, Step Executor, Step Registry |
| **PROMOT** | Promotion and approval | Promotion Manager, Approval Gateway, Decision Engine, Gate Registry |
| **DEPLOY** | Deployment execution | Deploy Orchestrator, Target Executor, Runner Executor, Artifact Generator, Rollback Manager |
| **AGENTS** | Deployment agents | Agent Core, Agent Docker, Agent Compose, Agent SSH, Agent WinRM, Agent ECS, Agent Nomad |
| **PROGDL** | Progressive delivery | A/B Manager, Traffic Router, Canary Controller, Rollout Strategy |
| **RELEVI** | Release evidence | Evidence Collector, Evidence Signer, Sticker Writer, Audit Exporter |
| **PLUGIN** | Plugin infrastructure | Plugin Registry, Plugin Loader, Plugin Sandbox, Plugin SDK |
### Service Tiers
| Tier | Services | Key Responsibilities |
|------|----------|----------------------|
| **Edge / Identity** | `StellaOps.Authority` | Issues short-lived tokens (DPoP + mTLS), exposes OIDC device-code + auth-code flows, rotates JWKS. |
| **Scan & attest** | `StellaOps.Scanner` (API + Worker), `StellaOps.Signer`, `StellaOps.Attestor` | Accept SBOMs/images, drive analyzers, produce DSSE bundles, optionally log to a Rekor mirror. |
| **Evidence graph** | `StellaOps.Concelier`, `StellaOps.Excititor`, `StellaOps.Policy.Engine` | Ingest advisories/VEX, correlate linksets, run lattice policy and VEX-first decisioning. |
| **Experience** | `StellaOps.Web` (Console), `StellaOps.Cli`, `StellaOps.Notify`, `StellaOps.ExportCenter` | Operator UX, automation, notifications, and offline/mirror packaging. |
| **Data plane** | PostgreSQL, Valkey, RustFS/object storage (optional NATS JetStream) | Canonical store, counters/queues, and artifact storage with deterministic layouts. |
| **Edge / Identity** | `StellaOps.Authority` | Issues short-lived tokens (DPoP + mTLS), exposes OIDC flows, rotates JWKS |
| **Release Control** | `StellaOps.ReleaseManager`, `StellaOps.PromotionManager`, `StellaOps.WorkflowEngine` | Release bundles, promotion workflows, gate evaluation (planned) |
| **Integration Hub** | `StellaOps.IntegrationManager`, `StellaOps.ConnectorRuntime` | SCM/CI/Registry/Vault connectors (planned) |
| **Scan & Attest** | `StellaOps.Scanner`, `StellaOps.Signer`, `StellaOps.Attestor` | Accept SBOMs/images, produce DSSE bundles, transparency logging |
| **Evidence Graph** | `StellaOps.Concelier`, `StellaOps.Excititor`, `StellaOps.Policy.Engine` | Advisories/VEX, linksets, lattice policy |
| **Deployment** | `StellaOps.DeployOrchestrator`, `StellaOps.Agent.*` | Deployment execution to Docker/Compose/ECS/Nomad (planned) |
| **Experience** | `StellaOps.Web`, `StellaOps.Cli`, `StellaOps.Notify`, `StellaOps.ExportCenter` | Operator UX, automation, notifications |
| **Data Plane** | PostgreSQL, Valkey, RustFS/object storage | Canonical store, queues, artifact storage |
## Infrastructure (What Is Required)
@@ -50,7 +93,9 @@ At a high level, StellaOps is a set of services grouped by responsibility:
- **NATS JetStream:** optional messaging transport in some deployments.
- **Transparency log services:** Rekor mirror (and CA services) when transparency is enabled.
## End-to-End Flow (Typical)
## End-to-End Flows
### Current: Vulnerability Scanning Flow
1. **Evidence enters** via Concelier and Excititor connectors (Aggregation-Only Contract).
2. **SBOM arrives** from CLI/CI; Scanner deduplicates layers and enqueues work.
@@ -59,22 +104,64 @@ At a high level, StellaOps is a set of services grouped by responsibility:
5. **Signer + Attestor** wrap outputs into DSSE bundles and (optionally) anchor them in a Rekor mirror.
6. **Console/CLI/Export** surface findings and package verifiable evidence; Notify emits digests/incidents.
## Extension Points (Where You Customize)
### Planned: Release Orchestration Flow
1. **CI pushes the image** to the registry by digest and triggers a webhook to Stella.
2. **Stella scans** the new digest and stores the verdict.
3. **Release created**, bundling component digests under a semantic version.
4. **Promotion requested** to move the release from Dev → Stage → Prod.
5. **Gate evaluation** checks the security verdict, approval count, freeze windows, and custom policies.
6. **Decision record** produced with evidence references and signed.
7. **Deployment executed** via an agent to the target (Docker/Compose/ECS/Nomad).
8. **Version sticker** written to the target for drift detection.
9. **Evidence packet** sealed and stored.
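The gate-evaluation step in the flow above can be sketched roughly as follows. All type names, field names, and gate identifiers here are hypothetical, and custom policies and signing are left out; the sketch only shows how the three named checks could combine into a decision record.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class PromotionRequest:
    release_id: str
    target_env: str
    scan_verdict: str       # assumed values: "pass" or "fail"
    approvals: int
    requested_at: datetime  # timezone-aware, UTC

@dataclass(frozen=True)
class FreezeWindow:
    start: datetime
    end: datetime

def evaluate_gates(req, required_approvals, freeze_windows):
    """Return a decision record listing the result of each gate."""
    frozen = any(w.start <= req.requested_at < w.end for w in freeze_windows)
    gates = [
        {"gate": "security-verdict", "passed": req.scan_verdict == "pass"},
        {"gate": "approval-count", "passed": req.approvals >= required_approvals},
        {"gate": "freeze-window", "passed": not frozen},
    ]
    return {
        "release_id": req.release_id,
        "target_env": req.target_env,
        "allowed": all(g["passed"] for g in gates),
        "gates": gates,
    }

request = PromotionRequest(
    release_id="sha256:" + "a" * 64,
    target_env="prod",
    scan_verdict="pass",
    approvals=2,
    requested_at=datetime(2026, 1, 10, 12, 0, tzinfo=timezone.utc),
)
year_end_freeze = FreezeWindow(
    start=datetime(2025, 12, 20, tzinfo=timezone.utc),
    end=datetime(2026, 1, 2, tzinfo=timezone.utc),
)
decision = evaluate_gates(request, required_approvals=2, freeze_windows=[year_end_freeze])
```

Recording every gate result, not just the final verdict, is what lets a decision record point back to the specific check that blocked or allowed a promotion.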
## Extension Points
### Current Extension Points
- **Scanner analyzers** (restart-time plug-ins) for ecosystem-specific parsing and facts extraction.
- **Concelier connectors** for new advisory sources (preserving aggregation-only guardrails).
- **Policy packs** for organization-specific gating and waivers/justifications.
- **Export profiles** for output formats and offline bundle shapes.
### Planned Extension Points (Three-Surface Plugin Model)
Plugins contribute through three surfaces:
1. **Manifest** (static declaration): What the plugin provides (integrations, steps, agents, gates)
2. **Connector Runtime** (dynamic execution): gRPC interface for runtime operations
3. **Step Provider** (execution contract): Execution characteristics for workflow steps
Plugin types:
- **Integration connectors:** SCM, CI, Registry, Vault, Target, Router
- **Step providers:** Custom workflow steps
- **Agent types:** New deployment target types
- **Gate providers:** Custom gate evaluations
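As a rough illustration of the manifest surface, the sketch below models a static declaration and checks it against the four contribution kinds named above. The class name, field names, and the spelling of the kind identifiers are assumptions; the actual manifest format is not specified in this overview.

```python
from dataclasses import dataclass

# The four kinds come from the manifest description above; their spelling as
# identifiers is an assumption.
ALLOWED_KINDS = {"integration", "step", "agent", "gate"}

@dataclass(frozen=True)
class PluginManifest:
    name: str
    version: str
    provides: tuple  # pairs of (kind, identifier)

def validate_manifest(manifest):
    """Return a list of problems; an empty list means the manifest is acceptable."""
    problems = []
    if not manifest.provides:
        problems.append("manifest declares no contributions")
    for kind, ident in manifest.provides:
        if kind not in ALLOWED_KINDS:
            problems.append(f"unknown contribution kind: {kind}")
        if not ident:
            problems.append(f"empty identifier for kind: {kind}")
    return problems

example = PluginManifest(
    name="example-registry-connector",  # hypothetical plugin
    version="0.1.0",
    provides=(("integration", "registry.example"), ("gate", "image-age")),
)
problems = validate_manifest(example)
```

Validating the static declaration before loading any plugin code keeps obviously malformed plugins away from the runtime surfaces.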
## Offline & Sovereign Notes
- Offline Kit carries vulnerability feeds, container images, signatures, and verification material so the workflow stays identical when air-gapped.
- Authority + token verification remain local; quota enforcement is verifiable offline.
- Attestor can cache transparency proofs for offline verification.
- Evidence packets are exportable for external audit in air-gapped environments.
- All release decisions can be replayed with frozen inputs.
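Replay with frozen inputs can be sketched as sealing a decision together with hashes of its inputs and output, then re-running the same decision function later and comparing. The packet layout, the function names, and the toy policy are all hypothetical; a real evidence packet would also be signed.

```python
import hashlib
import json

def canonical_hash(obj):
    """Hash a JSON-serializable object in a canonical (sorted, compact) form."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()

def seal(inputs, decide):
    """Run a decision function and record inputs, output, and both hashes."""
    output = decide(inputs)
    return {
        "inputs": inputs,
        "inputs_hash": canonical_hash(inputs),
        "output": output,
        "output_hash": canonical_hash(output),
    }

def replay(packet, decide):
    """Re-run the decision on the frozen inputs; True only if nothing drifted."""
    if canonical_hash(packet["inputs"]) != packet["inputs_hash"]:
        return False  # inputs were altered after sealing
    return canonical_hash(decide(packet["inputs"])) == packet["output_hash"]

def decide(inputs):
    # Toy policy (assumption): block on any critical finding.
    return {"allowed": inputs["critical_findings"] == 0}

packet = seal({"critical_findings": 0, "approvals": 2}, decide)
assert replay(packet, decide)
```

The check only holds if the decision function is itself deterministic, which is why the guiding principles call for stable ordering and UTC timestamps.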
## Key Architectural Decisions
| Decision | Rationale |
|----------|-----------|
| **Digest-first release identity** | Tags are mutable; digests provide immutable release identity for audit |
| **3-surface plugin model** | Enables extensibility without core code changes |
| **Compiled C# scripts + sandboxed bash** | C# for complex orchestration; bash for simple hooks |
| **Agent + agentless execution** | Agent-based execution is preferred for reliability; agentless execution lowers the barrier to adoption |
| **Evidence packets for every decision** | Enables deterministic replay and audit-grade compliance |
## References
- `docs/ARCHITECTURE_REFERENCE.md`
- `docs/OFFLINE_KIT.md`
- `docs/API_CLI_REFERENCE.md`
- `docs/modules/platform/architecture-overview.md`
- `docs/ARCHITECTURE_REFERENCE.md` — Full reference map
- `docs/modules/release-orchestrator/architecture.md` — Release orchestrator design (planned)
- `docs/OFFLINE_KIT.md` — Air-gap operations
- `docs/API_CLI_REFERENCE.md` — API and CLI contracts
- `docs/modules/platform/architecture-overview.md` — Platform service design
- `docs/product/advisories/09-Jan-2026 - Stella Ops Orchestrator Architecture.md` — Full orchestrator specification