release orchestrator pivot, architecture and planning
docs/modules/release-orchestrator/design/decisions.md
# Key Architectural Decisions

This document records significant architectural decisions and their rationale.

## ADR-001: Digest-First Release Identity

**Status:** Accepted

**Context:**
Container images can be referenced by tags (e.g., `v1.2.3`) or digests (e.g., `sha256:abc123...`). Tags are mutable: the same tag can point to different images over time.

**Decision:**
All releases are identified by immutable OCI digests, never tags. Tags are accepted as input but immediately resolved to digests at release creation time.

**Consequences:**
- Releases are immutable and reproducible
- A digest mismatch at pull time indicates tampering (deployment fails)
- Rollback targets a specific digest, not "previous tag"
- Requires registry integration for tag resolution
- Users see both the tag (friendly) and the digest (authoritative) in the UI

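
The tag-to-digest pinning described in ADR-001 could look roughly like the sketch below. The function names and the `resolve_tag` callback are hypothetical (the real registry integration is not specified here), and only `sha256` digests are accepted for brevity even though OCI allows other algorithms.

```python
import re
from typing import Callable, Optional, Tuple

# Simplification: accept sha256 digests only.
DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def split_reference(ref: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split 'repo[:tag][@digest]' into (repository, tag, digest)."""
    digest = None
    if "@" in ref:
        ref, digest = ref.split("@", 1)
    # A colon after the last slash separates the tag;
    # an earlier colon belongs to a registry port.
    if ref.rfind(":") > ref.rfind("/"):
        repo, tag = ref.rsplit(":", 1)
        return repo, tag, digest
    return ref, None, digest


def pin_to_digest(ref: str, resolve_tag: Callable[[str, str], str]) -> str:
    """Return 'repo@digest', resolving a tag once at release creation time."""
    repo, tag, digest = split_reference(ref)
    if digest is None:
        if tag is None:
            raise ValueError(f"reference has neither tag nor digest: {ref!r}")
        digest = resolve_tag(repo, tag)  # hypothetical registry lookup
    if not DIGEST_RE.fullmatch(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    return f"{repo}@{digest}"
```
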
---

## ADR-002: Evidence for Every Decision

**Status:** Accepted

**Context:**
Compliance and audit requirements demand proof of what was deployed, when, by whom, and why.

**Decision:**
Every promotion and deployment produces a cryptographically signed evidence packet that is immutable and append-only.

**Consequences:**
- Evidence table has no UPDATE/DELETE permissions
- Evidence enables audit-grade compliance reporting
- Evidence enables deterministic replay (same inputs + policy = same decision)
- Evidence packets are exportable for external audit systems
- Storage requirements increase over time

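
One way to picture an immutable, append-only evidence log is a hash chain in which each packet commits to its predecessor. The sketch below is an illustration only: the packet layout and function names are hypothetical, and HMAC-SHA256 with a shared demo key stands in for whatever signature scheme the system actually uses (a real deployment would more likely use asymmetric signatures).

```python
import hashlib
import hmac
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first packet
DEMO_KEY = b"demo-key-not-for-production"  # hypothetical signing key


def make_packet(prev_hash: str, payload: Dict[str, object],
                key: bytes = DEMO_KEY) -> Dict[str, str]:
    """Build a packet whose body commits to the previous packet's hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload},
                      sort_keys=True, separators=(",", ":"))
    raw = body.encode("utf-8")
    return {
        "body": body,
        "hash": hashlib.sha256(raw).hexdigest(),
        "sig": hmac.new(key, raw, hashlib.sha256).hexdigest(),
    }


def verify_chain(packets: List[Dict[str, str]], key: bytes = DEMO_KEY) -> bool:
    """Check every signature, every hash, and every link to the predecessor."""
    prev = GENESIS
    for packet in packets:
        raw = packet["body"].encode("utf-8")
        expected_sig = hmac.new(key, raw, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected_sig, packet["sig"]):
            return False
        if hashlib.sha256(raw).hexdigest() != packet["hash"]:
            return False
        if json.loads(packet["body"])["prev"] != prev:
            return False
        prev = packet["hash"]
    return True
```
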
---

## ADR-003: Plugin Architecture for Integrations

**Status:** Accepted

**Context:**
Organizations use diverse toolchains (registries, CI/CD, vaults, notification systems). Hard-coding integrations limits adoption.

**Decision:**
All integrations are implemented as plugins via a three-surface contract (Manifest, Connector Runtime, Step Provider). Core orchestration is stable and plugin-agnostic.

**Consequences:**
- Core has no hard-coded vendor integrations
- New integrations can be added without core changes
- Plugin failures cannot crash core (sandbox isolation)
- Plugin interface must be versioned and stable
- Additional complexity in plugin lifecycle management

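
The three-surface contract might be modelled as three small types, as in the sketch below. Every field and method name here is an assumption made for illustration; this document does not define the contents of each surface. The `try`/`except` in `run_step` only illustrates containing a plugin failure and is not a substitute for real sandbox isolation.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Protocol, Tuple


@dataclass(frozen=True)
class Manifest:
    """Surface 1 (hypothetical fields): static plugin metadata."""
    name: str
    version: str
    contract_version: str


class ConnectorRuntime(Protocol):
    """Surface 2 (hypothetical methods): talks to the external system."""
    def health(self) -> bool: ...


class StepProvider(Protocol):
    """Surface 3 (hypothetical methods): exposes workflow steps."""
    def steps(self) -> List[str]: ...
    def run(self, step: str, inputs: Dict[str, Any]) -> Dict[str, Any]: ...


class PluginRegistry:
    SUPPORTED_CONTRACT = "1"  # assumed version of the plugin interface

    def __init__(self) -> None:
        self._plugins: Dict[str, Tuple[Manifest, ConnectorRuntime, StepProvider]] = {}

    def register(self, manifest: Manifest, connector: ConnectorRuntime,
                 provider: StepProvider) -> None:
        if manifest.contract_version != self.SUPPORTED_CONTRACT:
            raise ValueError(f"unsupported contract: {manifest.contract_version}")
        self._plugins[manifest.name] = (manifest, connector, provider)

    def run_step(self, plugin: str, step: str,
                 inputs: Dict[str, Any]) -> Dict[str, Any]:
        _, _, provider = self._plugins[plugin]
        try:
            return {"ok": True, "outputs": provider.run(step, inputs)}
        except Exception as exc:  # a failing plugin must not take core down
            return {"ok": False, "error": str(exc)}
```
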
---

## ADR-004: No Feature Gating

**Status:** Accepted

**Context:**
Enterprise software often gates security features behind premium tiers, creating "pay for security" anti-patterns.

**Decision:**
All plans include all features. Pricing is based only on:
- Number of environments
- New digests analyzed per day
- Fair use on deployments

**Consequences:**
- No feature flags tied to billing tier
- Transparent pricing without feature fragmentation
- May limit revenue optimization per customer
- Quota enforcement must be clear and user-friendly

---

## ADR-005: Offline-First Operation

**Status:** Accepted

**Context:**
Many organizations operate in air-gapped or restricted network environments. Dependency on external services limits adoption.

**Decision:**
All core operations must work in air-gapped environments. External data is synced via mirror bundles. Plugins may require connectivity; core does not.

**Consequences:**
- No runtime calls to external APIs for core decisions
- Advisory data is synced via offline bundles
- Plugin connectivity requirements are declared in the manifest
- Evidence packets are exportable for external submission
- Additional complexity in data synchronization

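
Because connectivity requirements are declared in the plugin manifest, core can decide which plugins are usable without making any network call. A minimal sketch, assuming a hypothetical `requires_network` manifest field and hypothetical plugin names:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PluginManifest:
    name: str
    requires_network: bool  # hypothetical manifest field


def usable_plugins(manifests: Iterable[PluginManifest],
                   air_gapped: bool) -> List[str]:
    """Names of plugins that can run in the current network mode."""
    return [m.name for m in manifests
            if not (air_gapped and m.requires_network)]
```
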
---

## ADR-006: Agent-Based and Agentless Deployment

**Status:** Accepted

**Context:**
Some organizations prefer agents for security isolation; others prefer agentless for simplicity.

**Decision:**
Support both agent-based (persistent daemon on targets) and agentless (SSH/WinRM on demand) deployment models.

**Consequences:**
- Agent provides better performance and reliability
- Agentless reduces infrastructure footprint
- Unified task model abstracts deployment details
- Security model must handle both patterns
- Larger testing matrix

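
The unified task model can be thought of as one task type dispatched through interchangeable executors. The sketch below uses hypothetical class names and returns a description string instead of performing any real deployment; it only shows how the caller stays unaware of the transport.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployTask:
    target: str
    image_digest: str


class Executor(ABC):
    @abstractmethod
    def execute(self, task: DeployTask) -> str:
        """Run the task and return a human-readable summary."""


class AgentExecutor(Executor):
    """Stands in for handing the task to a persistent daemon on the target."""
    def execute(self, task: DeployTask) -> str:
        return f"agent@{task.target}: deploy {task.image_digest}"


class AgentlessExecutor(Executor):
    """Stands in for an on-demand SSH or WinRM session."""
    def __init__(self, transport: str) -> None:
        if transport not in ("ssh", "winrm"):
            raise ValueError(f"unsupported transport: {transport}")
        self.transport = transport

    def execute(self, task: DeployTask) -> str:
        return f"{self.transport}://{task.target}: deploy {task.image_digest}"


def dispatch(task: DeployTask, executor: Executor) -> str:
    # The caller never branches on the deployment model.
    return executor.execute(task)
```
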
---

## ADR-007: PostgreSQL as Primary Database

**Status:** Accepted

**Context:**
Database choice affects scalability, operations, and feature availability.

**Decision:**
Use PostgreSQL (16+) as the primary database, with:
- Per-module schema isolation
- Row-level security for multi-tenancy
- JSONB for flexible configuration
- Append-only triggers for evidence tables

**Consequences:**
- Proven scalability and reliability
- Rich feature set (JSONB, RLS, triggers)
- Single database technology to operate
- Requires PostgreSQL expertise
- Schema migrations must be carefully managed

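
The append-only trigger idea can be demonstrated with a self-contained script. Note the substitution: the sketch uses SQLite from the Python standard library so it runs without a server, not PostgreSQL. In PostgreSQL the same effect would typically come from a `BEFORE UPDATE OR DELETE` trigger whose function raises an exception. The table and trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE evidence (
    id   INTEGER PRIMARY KEY,
    body TEXT NOT NULL
);

-- Reject any attempt to change an existing row.
CREATE TRIGGER evidence_no_update BEFORE UPDATE ON evidence
BEGIN
    SELECT RAISE(ABORT, 'evidence is append-only');
END;

-- Reject any attempt to remove a row.
CREATE TRIGGER evidence_no_delete BEFORE DELETE ON evidence
BEGIN
    SELECT RAISE(ABORT, 'evidence is append-only');
END;
""")

# Inserts are still allowed.
conn.execute("INSERT INTO evidence (body) VALUES (?)",
             ('{"decision": "promote"}',))


def is_rejected(sql: str) -> bool:
    """Return True if the statement is blocked by a trigger."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.DatabaseError:
        return True
```
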
---

## ADR-008: Workflow Engine with DAG Execution

**Status:** Accepted

**Context:**
Deployment workflows need conditional logic, parallel execution, error handling, and rollback support.

**Decision:**
Implement a DAG-based workflow engine where:
- Workflows are templates with nodes (steps) and edges (dependencies)
- Steps execute when all dependencies are satisfied
- Expressions reference previous step outputs
- Built-in support for approval, retry, timeout, and rollback

**Consequences:**
- Flexible workflow composition
- Visual representation in UI
- Complex error handling scenarios supported
- Learning curve for workflow authors
- Expression engine security considerations

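
The core scheduling rule ("a step executes when all dependencies are satisfied") can be sketched with the standard library's `graphlib.TopologicalSorter`. This is an illustration under simplifying assumptions: steps are plain Python callables that receive the outputs of earlier steps, execution is sequential, and approval, retry, timeout, and rollback are left out. The function name and step names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

Outputs = Dict[str, object]
Step = Callable[[Outputs], object]


def run_workflow(steps: Dict[str, Step],
                 depends_on: Dict[str, List[str]]) -> Outputs:
    """Run each step once all of its dependencies have finished."""
    graph = {name: depends_on.get(name, []) for name in steps}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    outputs: Outputs = {}
    while sorter.is_active():
        # Every step returned here has all dependencies satisfied,
        # so a real engine could run this batch in parallel.
        for name in sorter.get_ready():
            outputs[name] = steps[name](outputs)
            sorter.done(name)
    return outputs
```
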
---

## ADR-009: Separation of Duties Enforcement

**Status:** Accepted

**Context:**
Compliance requires that the person requesting a change cannot be the same person approving it.

**Decision:**
Separation of Duties (SoD) is enforced at the approval gateway level, preventing self-approval when SoD is enabled for an environment.

**Consequences:**
- Prevents single-person deployment to sensitive environments
- Configurable per environment
- May slow down deployments
- Requires minimum team size for SoD-enabled environments

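
The self-approval rule reduces to a small check at the approval gateway. A minimal sketch, assuming a hypothetical `Environment` type with an `sod_enabled` flag and user identities compared as plain strings:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    name: str
    sod_enabled: bool  # hypothetical per-environment setting


def can_approve(env: Environment, requester: str, approver: str) -> bool:
    """Block self-approval only where SoD is enabled."""
    if env.sod_enabled and requester == approver:
        return False
    return True
```
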
---

## ADR-010: Version Stickers for Drift Detection

**Status:** Accepted

**Context:**
Knowing what's actually deployed on targets is essential for audit and troubleshooting.

**Decision:**
Every deployment writes a `stella.version.json` sticker file on the target containing release ID, digests, deployment timestamp, and deployer identity.

**Consequences:**
- Enables drift detection (expected vs actual)
- Provides audit trail on target hosts
- Enables accurate "what's deployed where" queries
- Requires file access on targets
- Sticker corruption/deletion must be handled

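
Writing the sticker and comparing it against the expected state might look like the sketch below. Only the file name `stella.version.json` and the four kinds of content come from this document; the JSON key names, function names, and the way drift is reported are assumptions for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List

STICKER_NAME = "stella.version.json"


def write_sticker(target_dir: Path, release_id: str,
                  digests: Dict[str, str], deployer: str) -> Path:
    """Write the sticker file; key names are hypothetical."""
    sticker = {
        "release_id": release_id,
        "digests": digests,  # component name -> image digest
        "deployed_at": datetime.now(timezone.utc).isoformat(),
        "deployed_by": deployer,
    }
    path = target_dir / STICKER_NAME
    path.write_text(json.dumps(sticker, indent=2, sort_keys=True),
                    encoding="utf-8")
    return path


def detect_drift(target_dir: Path, expected: Dict[str, str]) -> List[str]:
    """Return the components whose deployed digest differs from expected."""
    try:
        data = json.loads((target_dir / STICKER_NAME).read_text(encoding="utf-8"))
        if not isinstance(data, dict) or not isinstance(data.get("digests"), dict):
            raise ValueError("malformed sticker")
    except (OSError, ValueError):
        # A missing or corrupt sticker is itself reported as drift.
        return ["sticker missing or unreadable"]
    actual = data["digests"]
    names = expected.keys() | actual.keys()
    return sorted(n for n in names if expected.get(n) != actual.get(n))
```
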
---

## ADR-011: Security Gate Integration

**Status:** Accepted

**Context:**
Security scanning exists as a separate concern; release orchestration should leverage but not duplicate it.

**Decision:**
Security scanning remains in existing modules (Scanner, VEX). Release orchestration consumes scan results through a security gate that evaluates vulnerability thresholds.

**Consequences:**
- Clear separation of concerns
- Existing scanning investment preserved
- Gate configuration determines block thresholds
- Requires API integration with scanning modules
- Policy engine evaluates security verdicts

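
A threshold-based gate can be sketched as a pure function over scan results that were produced elsewhere. The severity names, the shape of the inputs, and the function name are assumptions; the actual Scanner and VEX result formats are not described in this document.

```python
from typing import Dict, List, Tuple

SEVERITIES = ("critical", "high", "medium", "low")  # assumed severity scale


def evaluate_gate(found: Dict[str, int],
                  max_allowed: Dict[str, int]) -> Tuple[bool, List[str]]:
    """Return (passed, violations) for per-severity vulnerability counts.

    A severity with no configured limit never blocks.
    """
    violations = []
    for severity in SEVERITIES:
        limit = max_allowed.get(severity)
        count = found.get(severity, 0)
        if limit is not None and count > limit:
            violations.append(f"{severity}: {count} found, {limit} allowed")
    return (not violations, violations)
```
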
---

## ADR-012: gRPC for Agent Communication

**Status:** Accepted

**Context:**
Agent communication requires efficient, bidirectional, and secure data transfer.

**Decision:**
Use gRPC for agent communication with:
- mTLS for transport security
- Bidirectional streaming for logs and progress
- Protocol buffers for efficient serialization

**Consequences:**
- Efficient binary protocol
- Strong typing via protobuf
- Built-in streaming support
- Requires gRPC infrastructure
- Firewall considerations for gRPC traffic

---

## References

- [Design Principles](principles.md)
- [Security Architecture](../security/overview.md)
- [Plugin System](../modules/plugin-system.md)