release orchestrator pivot, architecture and planning

2026-01-10 22:37:22 +02:00
parent c84f421e2f
commit d509c44411
130 changed files with 70292 additions and 721 deletions
--- a/docs/modules/release-orchestrator/design/decisions.md
+++ b/docs/modules/release-orchestrator/design/decisions.md
@@ -0,0 +1,249 @@
+# Key Architectural Decisions
+
+This document records significant architectural decisions and their rationale.
+
+## ADR-001: Digest-First Release Identity
+
+**Status:** Accepted
+
+**Context:**
+Container images can be referenced by tags (e.g., `v1.2.3`) or digests (e.g., `sha256:abc123...`). Tags are mutable: the same tag can point to different images over time.
+
+**Decision:**
+All releases are identified by immutable OCI digests, never tags. Tags are accepted as input but immediately resolved to digests at release creation time.
+
+**Consequences:**
+- Releases are immutable and reproducible
+- A digest mismatch at pull time indicates tampering or corruption, and the deployment fails
+- Rollback targets specific digest, not "previous tag"
+- Requires registry integration for tag resolution
+- Users see both tag (friendly) and digest (authoritative) in UI
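
The resolve-then-verify flow described in ADR-001 can be sketched as follows. This is a minimal illustration, not the project's implementation: `resolve_reference`, `verify_pull`, and the `tag_index` dictionary (a stand-in for a registry lookup) are hypothetical names, and only `sha256` digests are handled.

```python
import hashlib
import re

# OCI digest notation "algorithm:hex"; this sketch only handles sha256.
DIGEST_RE = re.compile(r"^sha256:[0-9a-f]{64}$")


def compute_digest(content: bytes) -> str:
    """Return the sha256 digest of the given bytes in OCI notation."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def resolve_reference(reference: str, tag_index: dict[str, str]) -> str:
    """Resolve a tag or digest reference to a digest (hypothetical helper).

    tag_index stands in for a registry lookup and maps tag -> digest.
    """
    if DIGEST_RE.match(reference):
        return reference  # already a digest, nothing to resolve
    digest = tag_index.get(reference)
    if digest is None:
        raise LookupError(f"unknown tag: {reference}")
    return digest


def verify_pull(content: bytes, expected_digest: str) -> None:
    """Raise if pulled content does not match the digest pinned in the release."""
    actual = compute_digest(content)
    if actual != expected_digest:
        raise ValueError(
            f"digest mismatch: expected {expected_digest}, got {actual}"
        )
```

Resolving once at release creation means later movement of the tag has no effect on what the release deploys.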
+
+---
+
+## ADR-002: Evidence for Every Decision
+
+**Status:** Accepted
+
+**Context:**
+Compliance and audit requirements demand proof of what was deployed, when, by whom, and why.
+
+**Decision:**
+Every promotion and deployment produces a cryptographically signed evidence packet that is immutable and append-only.
+
+**Consequences:**
+- The evidence table grants no UPDATE or DELETE permissions
+- Evidence enables audit-grade compliance reporting
+- Evidence enables deterministic replay (same inputs + policy = same decision)
+- Evidence packets are exportable for external audit systems
+- Storage requirements increase over time
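
One way to picture a signed, append-only evidence trail is a chain in which each packet's signature also covers its predecessor. The sketch below makes several assumptions that are not in the ADR: it uses HMAC-SHA256 from the standard library as a stand-in for whatever signature scheme the system actually uses, and the packet fields (`payload`, `prev`, `signature`) are hypothetical.

```python
import hashlib
import hmac
import json


def sign_packet(payload: dict, prev_signature: str, key: bytes) -> dict:
    """Build a packet whose signature covers the payload and the previous signature."""
    # Canonical JSON so the same payload always yields the same bytes.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    mac = hmac.new(key, (prev_signature + body).encode("utf-8"), hashlib.sha256)
    return {"payload": payload, "prev": prev_signature, "signature": mac.hexdigest()}


def verify_chain(packets: list[dict], key: bytes) -> bool:
    """Return True only if every packet is intact and linked to its predecessor."""
    prev = ""
    for packet in packets:
        expected = sign_packet(packet["payload"], prev, key)
        if packet["prev"] != prev:
            return False
        if not hmac.compare_digest(packet["signature"], expected["signature"]):
            return False
        prev = packet["signature"]
    return True
```

Because each signature depends on the one before it, editing or removing an earlier packet invalidates every later one, which is what makes the trail tamper-evident.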
+
+---
+
+## ADR-003: Plugin Architecture for Integrations
+
+**Status:** Accepted
+
+**Context:**
+Organizations use diverse toolchains (registries, CI/CD, vaults, notification systems). Hard-coding integrations limits adoption.
+
+**Decision:**
+All integrations are implemented as plugins via a three-surface contract (Manifest, Connector Runtime, Step Provider). Core orchestration is stable and plugin-agnostic.
+
+**Consequences:**
+- Core has no hard-coded vendor integrations
+- New integrations can be added without core changes
+- Plugin failures cannot crash core (sandbox isolation)
+- Plugin interface must be versioned and stable
+- Additional complexity in plugin lifecycle management
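
A rough shape for the three-surface contract is sketched below. All field and method names (`interface_version`, `requires_network`, `connect`, `run_step`) are assumptions made for illustration. Note also that catching exceptions in-process is only a stand-in for the sandbox isolation the ADR calls for.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass(frozen=True)
class Manifest:
    """Surface 1: static description of the plugin (fields are hypothetical)."""
    name: str
    interface_version: str
    requires_network: bool = False
    steps: tuple[str, ...] = ()


class ConnectorRuntime(Protocol):
    """Surface 2: manages the connection to the external system."""
    def connect(self, config: dict[str, Any]) -> None: ...


class StepProvider(Protocol):
    """Surface 3: executes the workflow steps the manifest declares."""
    def run_step(self, step: str, inputs: dict[str, Any]) -> dict[str, Any]: ...


@dataclass
class StepResult:
    ok: bool
    outputs: dict[str, Any] = field(default_factory=dict)
    error: str = ""


def run_plugin_step(
    provider: StepProvider, manifest: Manifest, step: str, inputs: dict[str, Any]
) -> StepResult:
    """Run one step and convert any plugin failure into a result value."""
    if step not in manifest.steps:
        return StepResult(ok=False, error=f"step not declared in manifest: {step}")
    try:
        return StepResult(ok=True, outputs=provider.run_step(step, inputs))
    except Exception as exc:  # the failure is reported to core, not propagated
        return StepResult(ok=False, error=str(exc))
```

Core code depends only on the protocols and the manifest, so a new integration needs no change to the orchestration logic.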
+
+---
+
+## ADR-004: No Feature Gating
+
+**Status:** Accepted
+
+**Context:**
+Enterprise software often gates security features behind premium tiers, creating "pay for security" anti-patterns.
+
+**Decision:**
+All plans include all features. Pricing is based only on:
+- Number of environments
+- New digests analyzed per day
+- Fair-use limits on deployments
+
+**Consequences:**
+- No feature flags tied to billing tier
+- Transparent pricing without feature fragmentation
+- May limit revenue optimization per customer
+- Quota enforcement must be clear and user-friendly
+
+---
+
+## ADR-005: Offline-First Operation
+
+**Status:** Accepted
+
+**Context:**
+Many organizations operate in air-gapped or restricted network environments. Dependency on external services limits adoption.
+
+**Decision:**
+All core operations must work in air-gapped environments. External data is synced via mirror bundles. Plugins may require connectivity; core does not.
+
+**Consequences:**
+- No runtime calls to external APIs for core decisions
+- Advisory data synced via offline bundles
+- Plugin connectivity requirements are declared in manifest
+- Evidence packets exportable for external submission
+- Additional complexity in data synchronization
+
+---
+
+## ADR-006: Agent-Based and Agentless Deployment
+
+**Status:** Accepted
+
+**Context:**
+Some organizations prefer agents for security isolation; others prefer agentless for simplicity.
+
+**Decision:**
+Support both agent-based (persistent daemon on targets) and agentless (SSH/WinRM on demand) deployment models.
+
+**Consequences:**
+- Agent provides better performance and reliability
+- Agentless reduces infrastructure footprint
+- Unified task model abstracts deployment details
+- Security model must handle both patterns
+- Larger testing matrix
+
+---
+
+## ADR-007: PostgreSQL as Primary Database
+
+**Status:** Accepted
+
+**Context:**
+Database choice affects scalability, operations, and feature availability.
+
+**Decision:**
+Use PostgreSQL (16+) as the primary database, with:
+- Per-module schema isolation
+- Row-level security for multi-tenancy
+- JSONB for flexible configuration
+- Append-only triggers for evidence tables
+
+**Consequences:**
+- Proven scalability and reliability
+- Rich feature set (JSONB, RLS, triggers)
+- Single database technology to operate
+- Requires PostgreSQL expertise
+- Schema migrations must be carefully managed
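
The append-only trigger idea can be demonstrated with a small runnable sketch. To stay self-contained it uses SQLite from the Python standard library as a stand-in. In PostgreSQL the same effect would come from a trigger function that raises an exception, with different syntax. The table and trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE evidence (id INTEGER PRIMARY KEY, packet TEXT NOT NULL);

    -- Reject any attempt to modify an existing row.
    CREATE TRIGGER evidence_no_update BEFORE UPDATE ON evidence
    BEGIN
        SELECT RAISE(ABORT, 'evidence is append-only');
    END;

    -- Reject any attempt to remove a row.
    CREATE TRIGGER evidence_no_delete BEFORE DELETE ON evidence
    BEGIN
        SELECT RAISE(ABORT, 'evidence is append-only');
    END;
    """
)
conn.execute("INSERT INTO evidence (packet) VALUES (?)", ('{"decision": "allow"}',))


def is_rejected(statement: str) -> bool:
    """Return True if the database refuses to execute the statement."""
    try:
        conn.execute(statement)
    except sqlite3.DatabaseError:
        return True
    return False
```

Enforcing the rule in the database rather than in application code means it holds even for clients that bypass the service layer.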
+
+---
+
+## ADR-008: Workflow Engine with DAG Execution
+
+**Status:** Accepted
+
+**Context:**
+Deployment workflows need conditional logic, parallel execution, error handling, and rollback support.
+
+**Decision:**
+Implement a DAG-based workflow engine where:
+- Workflows are templates with nodes (steps) and edges (dependencies)
+- Steps execute when all dependencies are satisfied
+- Expressions reference previous step outputs
+- Built-in support for approval, retry, timeout, and rollback
+
+**Consequences:**
+- Flexible workflow composition
+- Visual representation in UI
+- Complex error handling scenarios supported
+- Learning curve for workflow authors
+- Expression engine security considerations
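
The rule that a step executes once all of its dependencies are satisfied is topological scheduling. The sketch below uses Python's standard `graphlib.TopologicalSorter`. The `run_workflow` function and the convention that each step receives the outputs of earlier steps are assumptions for illustration. Approval, retry, timeout, and rollback are omitted.

```python
from graphlib import TopologicalSorter
from typing import Callable

# A step receives the outputs of all completed steps and returns its own output.
Step = Callable[[dict[str, dict]], dict]


def run_workflow(
    steps: dict[str, Step], depends_on: dict[str, set[str]]
) -> dict[str, dict]:
    """Execute steps in dependency order and collect their outputs by name."""
    sorter = TopologicalSorter({name: depends_on.get(name, set()) for name in steps})
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    outputs: dict[str, dict] = {}
    while sorter.is_active():
        # Every node returned here has all dependencies done, so a real
        # engine could run this batch in parallel; the sketch runs it serially.
        for name in sorter.get_ready():
            outputs[name] = steps[name](outputs)
            sorter.done(name)
    return outputs
```

Passing the accumulated `outputs` to each step is a simple substitute for an expression engine that references previous step outputs.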
+
+---
+
+## ADR-009: Separation of Duties Enforcement
+
+**Status:** Accepted
+
+**Context:**
+Compliance requires that the person requesting a change cannot be the same person approving it.
+
+**Decision:**
+Separation of Duties (SoD) is enforced at the approval gateway level, preventing self-approval when SoD is enabled for an environment.
+
+**Consequences:**
+- Prevents single-person deployment to sensitive environments
+- Configurable per environment
+- May slow down deployments
+- Requires minimum team size for SoD-enabled environments
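
The self-approval check at the approval gateway reduces to a small predicate. The sketch below assumes a hypothetical `Environment` record with an `sod_enabled` flag and compares identities as plain strings. A real gateway would compare authenticated principals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    name: str
    sod_enabled: bool  # hypothetical per-environment setting


def can_approve(requester: str, approver: str, env: Environment) -> bool:
    """Return False when SoD is enabled and the approver is also the requester."""
    if env.sod_enabled and requester == approver:
        return False
    return True
```

This also shows why SoD-enabled environments need at least two eligible people: with a single user, no request could ever be approved.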
+
+---
+
+## ADR-010: Version Stickers for Drift Detection
+
+**Status:** Accepted
+
+**Context:**
+Knowing what's actually deployed on targets is essential for audit and troubleshooting.
+
+**Decision:**
+Every deployment writes a `stella.version.json` sticker file on the target containing release ID, digests, deployment timestamp, and deployer identity.
+
+**Consequences:**
+- Enables drift detection (expected vs actual)
+- Provides audit trail on target hosts
+- Enables accurate "what's deployed where" queries
+- Requires file access on targets
+- Sticker corruption/deletion must be handled
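
Writing the sticker and comparing it against the expected state might look like the sketch below. Only the file name `stella.version.json` and the four kinds of content come from the ADR. The JSON key names and the `detect_drift` status strings are assumptions.

```python
import json
from pathlib import Path

STICKER_NAME = "stella.version.json"


def write_sticker(
    target_dir: Path,
    release_id: str,
    digests: dict[str, str],
    deployed_at: str,
    deployed_by: str,
) -> Path:
    """Write the version sticker into the target directory and return its path."""
    sticker = {
        "release_id": release_id,    # key names are hypothetical
        "digests": digests,          # component name -> image digest
        "deployed_at": deployed_at,  # ISO 8601 timestamp supplied by the caller
        "deployed_by": deployed_by,
    }
    path = target_dir / STICKER_NAME
    path.write_text(json.dumps(sticker, indent=2, sort_keys=True), encoding="utf-8")
    return path


def detect_drift(target_dir: Path, expected_digests: dict[str, str]) -> str:
    """Compare the sticker on the target with the digests the orchestrator expects."""
    path = target_dir / STICKER_NAME
    try:
        sticker = json.loads(path.read_text(encoding="utf-8"))
        actual = sticker["digests"]
    except FileNotFoundError:
        return "missing"
    except (json.JSONDecodeError, KeyError, TypeError):
        return "corrupt"
    return "in_sync" if actual == expected_digests else "drifted"
```

Returning distinct `missing` and `corrupt` states keeps sticker deletion and corruption separate from genuine drift, so each can be handled differently.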
+
+---
+
+## ADR-011: Security Gate Integration
+
+**Status:** Accepted
+
+**Context:**
+Security scanning exists as a separate concern; release orchestration should leverage but not duplicate it.
+
+**Decision:**
+Security scanning remains in existing modules (Scanner, VEX). Release orchestration consumes scan results through a security gate that evaluates vulnerability thresholds.
+
+**Consequences:**
+- Clear separation of concerns
+- Existing scanning investment preserved
+- Gate configuration determines block thresholds
+- Requires API integration with scanning modules
+- Policy engine evaluates security verdicts
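
A threshold-based gate over scan results could be evaluated as in the sketch below. The severity names, the per-severity count input, and the `GateResult` shape are assumptions. The actual Scanner and VEX outputs and the policy engine's verdict format are not described in this document.

```python
from dataclasses import dataclass

# Severity levels checked by the gate, most severe first (assumed set).
SEVERITIES = ("critical", "high", "medium", "low")


@dataclass(frozen=True)
class GateResult:
    passed: bool
    violations: tuple[str, ...]


def evaluate_gate(counts: dict[str, int], max_allowed: dict[str, int]) -> GateResult:
    """Block when any severity exceeds its configured maximum count."""
    violations = []
    for severity in SEVERITIES:
        limit = max_allowed.get(severity)
        if limit is None:
            continue  # no threshold configured for this severity
        found = counts.get(severity, 0)
        if found > limit:
            violations.append(f"{severity}: {found} > {limit}")
    return GateResult(passed=not violations, violations=tuple(violations))
```

The gate only consumes counts produced elsewhere, which mirrors the separation between scanning and release orchestration.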
+
+---
+
+## ADR-012: gRPC for Agent Communication
+
+**Status:** Accepted
+
+**Context:**
+Agent communication requires efficient, bidirectional, and secure data transfer.
+
+**Decision:**
+Use gRPC for agent communication with:
+- mTLS for transport security
+- Bidirectional streaming for logs and progress
+- Protocol buffers for efficient serialization
+
+**Consequences:**
+- Efficient binary protocol
+- Strong typing via protobuf
+- Built-in streaming support
+- Requires gRPC infrastructure
+- Firewall considerations for gRPC traffic
+
+---
+
+## References
+
+- [Design Principles](principles.md)
+- [Security Architecture](../security/overview.md)
+- [Plugin System](../modules/plugin-system.md)