release orchestrator pivot, architecture and planning
249
docs/modules/release-orchestrator/design/decisions.md
Normal file
@@ -0,0 +1,249 @@
# Key Architectural Decisions

This document records significant architectural decisions and their rationale.

## ADR-001: Digest-First Release Identity

**Status:** Accepted

**Context:**
Container images can be referenced by tags (e.g., `v1.2.3`) or digests (e.g., `sha256:abc123...`). Tags are mutable: the same tag can point to different images over time.

**Decision:**
All releases are identified by immutable OCI digests, never tags. Tags are accepted as input but immediately resolved to digests at release creation time.

**Consequences:**
- Releases are immutable and reproducible
- Digest mismatch at pull time indicates tampering (deployment fails)
- Rollback targets specific digest, not "previous tag"
- Requires registry integration for tag resolution
- Users see both tag (friendly) and digest (authoritative) in UI
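
As a minimal sketch of the "resolve once, then pin" rule, the following C# validates a resolved digest and builds the pinned reference used downstream. `PinnedImage` and `DigestPinning` are hypothetical names introduced for illustration, and the registry lookup that maps a tag to a digest is assumed to happen elsewhere.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical types for illustration; not taken from the actual codebase.
public sealed record PinnedImage(string Repository, string Digest, string? DisplayTag)
{
    // The authoritative reference uses the digest; the tag is display-only.
    public string Reference => $"{Repository}@{Digest}";
}

public static class DigestPinning
{
    // A sha256 digest is "sha256:" followed by 64 lowercase hex characters.
    private static readonly Regex Sha256Digest =
        new(@"\Asha256:[a-f0-9]{64}\z", RegexOptions.Compiled);

    public static PinnedImage Pin(string repository, string resolvedDigest, string? tag = null)
    {
        if (string.IsNullOrWhiteSpace(repository))
            throw new ArgumentException("Repository is required.", nameof(repository));
        if (resolvedDigest is null || !Sha256Digest.IsMatch(resolvedDigest))
            throw new ArgumentException("Expected a sha256 digest.", nameof(resolvedDigest));
        return new PinnedImage(repository, resolvedDigest, tag);
    }

    // Pull-time check: any mismatch should fail the deployment.
    public static bool MatchesPulled(PinnedImage expected, string pulledDigest) =>
        string.Equals(expected.Digest, pulledDigest, StringComparison.Ordinal);
}
```

Keeping the tag as a separate, optional display field lets the UI show both values while every internal comparison uses the digest only.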

---

## ADR-002: Evidence for Every Decision

**Status:** Accepted

**Context:**
Compliance and audit requirements demand proof of what was deployed, when, by whom, and why.

**Decision:**
Every promotion and deployment produces a cryptographically signed evidence packet that is immutable and append-only.

**Consequences:**
- Evidence table has no UPDATE/DELETE permissions
- Evidence enables audit-grade compliance reporting
- Evidence enables deterministic replay (same inputs + policy = same decision)
- Evidence packets are exportable for external audit systems
- Storage requirements increase over time
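
One way the "hash, sign, then store" step could look is sketched below. It assumes the caller already has a canonical JSON string for the packet and an ECDSA key; `SignedEvidence` and `EvidenceSigner` are hypothetical names, and key management, the actual signature scheme, and the storage layer are out of scope.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical packet shape; only loosely mirrors the evidence columns.
public sealed record SignedEvidence(string ContentHash, string Content, string Signature);

public static class EvidenceSigner
{
    private static string HashOf(byte[] bytes) =>
        "sha256:" + Convert.ToHexString(SHA256.HashData(bytes)).ToLowerInvariant();

    // Hash and sign the UTF-8 bytes of the canonical JSON before storage.
    public static SignedEvidence Sign(string canonicalJson, ECDsa signingKey)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(canonicalJson);
        byte[] signature = signingKey.SignData(bytes, HashAlgorithmName.SHA256);
        return new SignedEvidence(HashOf(bytes), canonicalJson, Convert.ToBase64String(signature));
    }

    // Verification fails if the content, the hash, or the signature was altered.
    public static bool Verify(SignedEvidence evidence, ECDsa publicKey)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(evidence.Content);
        return HashOf(bytes) == evidence.ContentHash
            && publicKey.VerifyData(
                bytes, Convert.FromBase64String(evidence.Signature), HashAlgorithmName.SHA256);
    }
}
```

Because the signature covers the exact bytes that were hashed, an auditor holding only the public key can check an exported packet without access to the database.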

---

## ADR-003: Plugin Architecture for Integrations

**Status:** Accepted

**Context:**
Organizations use diverse toolchains (registries, CI/CD, vaults, notification systems). Hard-coding integrations limits adoption.

**Decision:**
All integrations are implemented as plugins via a three-surface contract (Manifest, Connector Runtime, Step Provider). Core orchestration is stable and plugin-agnostic.

**Consequences:**
- Core has no hard-coded vendor integrations
- New integrations can be added without core changes
- Plugin failures cannot crash core (sandbox isolation)
- Plugin interface must be versioned and stable
- Additional complexity in plugin lifecycle management

---

## ADR-004: No Feature Gating

**Status:** Accepted

**Context:**
Enterprise software often gates security features behind premium tiers, creating "pay for security" anti-patterns.

**Decision:**
All plans include all features. Pricing is based only on:
- Number of environments
- New digests analyzed per day
- Fair use on deployments

**Consequences:**
- No feature flags tied to billing tier
- Transparent pricing without feature fragmentation
- May limit revenue optimization per customer
- Quota enforcement must be clear and user-friendly

---

## ADR-005: Offline-First Operation

**Status:** Accepted

**Context:**
Many organizations operate in air-gapped or restricted network environments. Dependency on external services limits adoption.

**Decision:**
All core operations must work in air-gapped environments. External data is synced via mirror bundles. Plugins may require connectivity; core does not.

**Consequences:**
- No runtime calls to external APIs for core decisions
- Advisory data synced via offline bundles
- Plugin connectivity requirements are declared in manifest
- Evidence packets exportable for external submission
- Additional complexity in data synchronization

---

## ADR-006: Agent-Based and Agentless Deployment

**Status:** Accepted

**Context:**
Some organizations prefer agents for security isolation; others prefer agentless for simplicity.

**Decision:**
Support both agent-based (persistent daemon on targets) and agentless (SSH/WinRM on demand) deployment models.

**Consequences:**
- Agent provides better performance and reliability
- Agentless reduces infrastructure footprint
- Unified task model abstracts deployment details
- Security model must handle both patterns
- Larger testing matrix (both models must be covered)

---

## ADR-007: PostgreSQL as Primary Database

**Status:** Accepted

**Context:**
Database choice affects scalability, operations, and feature availability.

**Decision:**
PostgreSQL (16+) as the primary database with:
- Per-module schema isolation
- Row-level security for multi-tenancy
- JSONB for flexible configuration
- Append-only triggers for evidence tables

**Consequences:**
- Proven scalability and reliability
- Rich feature set (JSONB, RLS, triggers)
- Single database technology to operate
- Requires PostgreSQL expertise
- Schema migrations must be carefully managed

---

## ADR-008: Workflow Engine with DAG Execution

**Status:** Accepted

**Context:**
Deployment workflows need conditional logic, parallel execution, error handling, and rollback support.

**Decision:**
Implement a DAG-based workflow engine where:
- Workflows are templates with nodes (steps) and edges (dependencies)
- Steps execute when all dependencies are satisfied
- Expressions reference previous step outputs
- Built-in support for approval, retry, timeout, and rollback

**Consequences:**
- Flexible workflow composition
- Visual representation in UI
- Complex error handling scenarios supported
- Learning curve for workflow authors
- Expression engine security considerations
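
The rule "steps execute when all dependencies are satisfied" can be illustrated with a small scheduler based on Kahn's topological ordering. This is a sketch under stated assumptions, not the engine's actual design: `DagScheduler` is a hypothetical name, steps are plain strings, and approval, retry, timeout, and rollback handling are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DagScheduler
{
    // `dependencies` maps each step to the steps it depends on.
    // Returns "waves": every step in a wave has all of its dependencies
    // in earlier waves, so steps within a wave could run in parallel.
    public static List<List<string>> PlanWaves(
        IReadOnlyDictionary<string, string[]> dependencies)
    {
        var remaining = dependencies.ToDictionary(
            kv => kv.Key, kv => new HashSet<string>(kv.Value));

        foreach (var (step, deps) in remaining)
            foreach (var dep in deps)
                if (!remaining.ContainsKey(dep))
                    throw new ArgumentException(
                        $"Step '{step}' depends on unknown step '{dep}'.");

        var waves = new List<List<string>>();
        while (remaining.Count > 0)
        {
            // Steps with no unsatisfied dependencies are ready to run.
            var ready = remaining
                .Where(kv => kv.Value.Count == 0)
                .Select(kv => kv.Key)
                .OrderBy(name => name, StringComparer.Ordinal)
                .ToList();

            if (ready.Count == 0)
                throw new InvalidOperationException("Workflow contains a dependency cycle.");

            foreach (var step in ready) remaining.Remove(step);
            foreach (var deps in remaining.Values) deps.ExceptWith(ready);
            waves.Add(ready);
        }
        return waves;
    }
}
```

Rejecting cycles and unknown dependencies at planning time means a malformed template fails before any step has touched a target.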

---

## ADR-009: Separation of Duties Enforcement

**Status:** Accepted

**Context:**
Compliance requires that the person requesting a change cannot be the same person approving it.

**Decision:**
Separation of Duties (SoD) is enforced at the approval gateway level, preventing self-approval when SoD is enabled for an environment.

**Consequences:**
- Prevents single-person deployment to sensitive environments
- Configurable per environment
- May slow down deployments
- Requires minimum team size for SoD-enabled environments
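
The self-approval check itself is small. A minimal sketch, assuming identities are comparable strings and that case-insensitive comparison is appropriate (both are assumptions; `ApprovalRequest` and `ApprovalGateway` are hypothetical names):

```csharp
using System;

// Hypothetical request shape for illustration.
public sealed record ApprovalRequest(string RequestedBy, string ApprovedBy, bool SodEnabled);

public static class ApprovalGateway
{
    // With SoD enabled, the approver must differ from the requester.
    public static bool IsApprovalAllowed(ApprovalRequest request)
    {
        if (!request.SodEnabled) return true;
        return !string.Equals(
            request.RequestedBy, request.ApprovedBy, StringComparison.OrdinalIgnoreCase);
    }
}
```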

---

## ADR-010: Version Stickers for Drift Detection

**Status:** Accepted

**Context:**
Knowing what's actually deployed on targets is essential for audit and troubleshooting.

**Decision:**
Every deployment writes a `stella.version.json` sticker file on the target containing release ID, digests, deployment timestamp, and deployer identity.

**Consequences:**
- Enables drift detection (expected vs actual)
- Provides audit trail on target hosts
- Enables accurate "what's deployed where" queries
- Requires file access on targets
- Sticker corruption/deletion must be handled
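
To make the drift comparison concrete, the sketch below models a sticker with the four fields named in the decision and compares it against the expected component-to-digest mapping. The JSON property names and the `VersionSticker` and `DriftDetector` types are assumptions, not the real `stella.version.json` schema. A missing or unparseable sticker is treated as "everything has drifted".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical sticker shape based on the fields listed in the decision.
public sealed record VersionSticker(
    string ReleaseId,
    Dictionary<string, string> Digests, // component -> digest
    DateTimeOffset DeployedAt,
    string DeployedBy);

public static class DriftDetector
{
    public static string Serialize(VersionSticker sticker) =>
        JsonSerializer.Serialize(sticker, new JsonSerializerOptions { WriteIndented = true });

    // Returns null for a missing or corrupted sticker instead of throwing.
    public static VersionSticker? TryParse(string? json)
    {
        if (string.IsNullOrWhiteSpace(json)) return null;
        try { return JsonSerializer.Deserialize<VersionSticker>(json); }
        catch (JsonException) { return null; }
    }

    // Lists components whose digest on the target is missing or differs.
    public static List<string> FindDrift(
        IReadOnlyDictionary<string, string> expected, VersionSticker? actual)
    {
        if (actual?.Digests is null)
            return expected.Keys.OrderBy(k => k, StringComparer.Ordinal).ToList();

        return expected
            .Where(kv => !actual.Digests.TryGetValue(kv.Key, out var digest) || digest != kv.Value)
            .Select(kv => kv.Key)
            .OrderBy(k => k, StringComparer.Ordinal)
            .ToList();
    }
}
```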

---

## ADR-011: Security Gate Integration

**Status:** Accepted

**Context:**
Security scanning exists as a separate concern; release orchestration should leverage but not duplicate it.

**Decision:**
Security scanning remains in existing modules (Scanner, VEX). Release orchestration consumes scan results through a security gate that evaluates vulnerability thresholds.

**Consequences:**
- Clear separation of concerns
- Existing scanning investment preserved
- Gate configuration determines block thresholds
- Requires API integration with scanning modules
- Policy engine evaluates security verdicts
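
A threshold evaluation could be as simple as comparing per-severity finding counts against configured maximums. The severity levels, the `GateThresholds` shape, and the rule that an unconfigured severity is unlimited are all assumptions made for this sketch; the real gate consumes verdicts from the scanning modules.

```csharp
using System;
using System.Collections.Generic;

public enum Severity { Low, Medium, High, Critical }

// Hypothetical gate configuration: maximum allowed findings per severity.
public sealed record GateThresholds(IReadOnlyDictionary<Severity, int> MaxAllowed);

public static class SecurityGate
{
    // Returns the severities that exceed their threshold; empty means pass.
    public static List<Severity> Evaluate(
        IReadOnlyDictionary<Severity, int> findings, GateThresholds thresholds)
    {
        var violations = new List<Severity>();
        foreach (Severity severity in Enum.GetValues<Severity>())
        {
            int found = findings.TryGetValue(severity, out var count) ? count : 0;

            // Assumption: a severity without a configured threshold is unlimited.
            if (thresholds.MaxAllowed.TryGetValue(severity, out var max) && found > max)
                violations.Add(severity);
        }
        return violations;
    }
}
```

Returning the list of violated severities, and not only a pass/fail flag, gives the evidence record something concrete to cite when a promotion is blocked.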

---

## ADR-012: gRPC for Agent Communication

**Status:** Accepted

**Context:**
Agent communication requires efficient, bidirectional, and secure data transfer.

**Decision:**
Use gRPC for agent communication with:
- mTLS for transport security
- Bidirectional streaming for logs and progress
- Protocol buffers for efficient serialization

**Consequences:**
- Efficient binary protocol
- Strong typing via protobuf
- Built-in streaming support
- Requires gRPC infrastructure
- Firewall considerations for gRPC traffic

---

## References

- [Design Principles](principles.md)
- [Security Architecture](../security/overview.md)
- [Plugin System](../modules/plugin-system.md)
221
docs/modules/release-orchestrator/design/principles.md
Normal file
@@ -0,0 +1,221 @@
# Design Principles & Invariants

> These principles are **inviolable** and MUST be reflected in all code, UI, documentation, and audit artifacts.

## Core Principles

### Principle 1: Release Identity via Digest

```
INVARIANT: A release is a set of OCI image digests (component → digest mapping), never tags.
```

- Tags are convenience inputs for resolution
- Tags are resolved to digests at release creation time
- All downstream operations (promotion, deployment, rollback) use digests
- Digest mismatch at pull time = deployment failure (tamper detection)

**Implementation Requirements:**
- Release creation API accepts tags but immediately resolves to digests
- All internal references use `sha256:` prefixed digests
- Agent deployment verifies digest at pull time
- Rollback targets specific digest, not "previous tag"

### Principle 2: Determinism and Evidence

```
INVARIANT: Every deployment/promotion produces an immutable evidence record.
```

Evidence record contains:
- **Who**: User identity (from Authority)
- **What**: Release bundle (digests), target environment, target hosts
- **Why**: Policy evaluation result, approval records, decision reasons
- **How**: Generated artifacts (compose files, scripts), execution logs
- **When**: Timestamps for request, decision, execution, completion

Evidence enables:
- Audit-grade compliance reporting
- Deterministic replay (same inputs + policy → same decision)
- "Why blocked?" explainability

**Implementation Requirements:**
- Evidence is generated synchronously with decision
- Evidence is signed before storage
- Evidence table is append-only (no UPDATE/DELETE)
- Evidence includes hash of all inputs for replay verification

### Principle 3: Pluggable Everything, Stable Core

```
INVARIANT: Integrations are plugins; the core orchestration engine is stable.
```

**Plugins contribute:**
- Configuration screens (UI)
- Connector logic (runtime)
- Step node types (workflow)
- Doctor checks (diagnostics)
- Agent types (deployment)

**Core engine provides:**
- Workflow execution (DAG processing)
- State machine management
- Evidence generation
- Policy evaluation
- Credential brokering

**Implementation Requirements:**
- Core has no hard-coded integrations
- Plugin interface is versioned and stable
- Plugin failures cannot crash core
- Core provides fallback behavior when plugins are unavailable

### Principle 4: No Feature Gating

```
INVARIANT: All plans include all features. Limits are only:
- Number of environments
- Number of new digests analyzed per day
- Fair use on deployments
```

This prevents:
- "Pay for security" anti-pattern
- Per-project/per-seat billing landmines
- Feature fragmentation across tiers

**Implementation Requirements:**
- No feature flags tied to billing tier
- Quota enforcement is transparent (clear error messages)
- Usage metrics exposed for customer visibility
- Overage handling is graceful (soft limits with warnings)
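
A soft-limit check in this spirit might return a status and a human-readable message instead of throwing on overage. The 80% warning threshold, the message wording, and the `QuotaEvaluator` type are assumptions chosen for illustration.

```csharp
using System;

public enum QuotaStatus { Ok, Warning, Exceeded }

// Hypothetical result type: a status plus a message suitable for the UI or API.
public sealed record QuotaCheck(QuotaStatus Status, string Message);

public static class QuotaEvaluator
{
    // Soft limit: overage is reported to the caller, not raised as an exception.
    public static QuotaCheck Check(string metric, int used, int limit, double warnAt = 0.8)
    {
        if (limit <= 0) throw new ArgumentOutOfRangeException(nameof(limit));

        if (used > limit)
            return new(QuotaStatus.Exceeded, $"{metric}: {used} of {limit} used; limit exceeded.");
        if (used >= limit * warnAt)
            return new(QuotaStatus.Warning, $"{metric}: {used} of {limit} used; approaching the limit.");
        return new(QuotaStatus.Ok, $"{metric}: {used} of {limit} used.");
    }
}
```

Including the metric name and both numbers in every message is one way to keep enforcement transparent to the customer.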

### Principle 5: Offline-First Operation

```
INVARIANT: All core operations MUST work in air-gapped environments.
```

Implications:
- No runtime calls to external APIs for core decisions
- Vulnerability data synced via mirror bundles
- Plugins may require connectivity; core does not
- Evidence packets exportable for external audit

**Implementation Requirements:**
- Core decision logic has no external HTTP calls
- All external data is pre-synced and cached
- Plugin connectivity requirements are declared in manifest
- Offline mode is explicit configuration, not degraded fallback

### Principle 6: Immutable Generated Artifacts

```
INVARIANT: Every deployment generates and stores immutable artifacts.
```

Generated artifacts:
- `compose.stella.lock.yml`: Pinned digests, resolved env refs
- `deploy.stella.script.dll`: Compiled C# script (or hash reference)
- `release.evidence.json`: Decision record
- `stella.version.json`: Version sticker placed on target

Version sticker enables:
- Drift detection (expected vs actual)
- Audit trail on target host
- Rollback reference

**Implementation Requirements:**
- Artifacts are content-addressed (hash in filename or metadata)
- Artifacts are stored before deployment execution
- Artifact storage is immutable (no overwrites)
- Version sticker is written atomically on the target
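
A common way to get an atomic sticker write is to write a temporary file in the same directory and then rename it over the destination. The sketch below does that with standard .NET file APIs; `AtomicFile` is a hypothetical helper. The rename is atomic only when both paths are on the same volume and the file system provides atomic rename, and this version does not flush to disk before renaming.

```csharp
using System;
using System.IO;

public static class AtomicFile
{
    // Readers see either the old file or the complete new file,
    // never a partially written one.
    public static void WriteAllText(string path, string contents)
    {
        string fullPath = Path.GetFullPath(path);
        string directory = Path.GetDirectoryName(fullPath)
            ?? throw new ArgumentException("Path must include a directory.", nameof(path));

        // The temp file lives next to the target so the rename stays on one volume.
        string tempPath = Path.Combine(
            directory, $".{Path.GetFileName(fullPath)}.{Guid.NewGuid():N}.tmp");
        try
        {
            File.WriteAllText(tempPath, contents);
            File.Move(tempPath, fullPath, overwrite: true);
        }
        finally
        {
            // Clean up if the write or the rename failed.
            if (File.Exists(tempPath)) File.Delete(tempPath);
        }
    }
}
```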

---

## Architectural Invariants (Enforced by Design)

These invariants are enforced through database constraints, code architecture, and operational controls.

| Invariant | Enforcement Mechanism |
|-----------|----------------------|
| Digests are immutable | Database constraint: digest column is unique, no updates |
| Evidence packets are append-only | Evidence table has no UPDATE/DELETE permissions |
| Secrets never in database | Vault integration; only references stored |
| Plugins cannot bypass policy | Policy evaluation in core, not plugin |
| Multi-tenant isolation | `tenant_id` FK on all tables; row-level security |
| Workflow state is auditable | State transitions logged; no direct state manipulation |
| Approvals are tamper-evident | Approval records are signed and append-only |

### Database Enforcement

```sql
-- Example: Evidence table with no UPDATE/DELETE
CREATE TABLE release.evidence_packets (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    promotion_id UUID NOT NULL REFERENCES release.promotions(id),
    content_hash TEXT NOT NULL,
    content JSONB NOT NULL,
    signature TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
    -- No updated_at column; immutable by design
);

-- Revoke UPDATE/DELETE from application role
REVOKE UPDATE, DELETE ON release.evidence_packets FROM app_role;
```

### Code Architecture Enforcement

```csharp
// Policy evaluation is ALWAYS in core, never delegated to plugins
public sealed class PromotionDecisionEngine
{
    // Assumed collaborators: these type names are illustrative and are not
    // defined in this document.
    private readonly PromotionPolicy _policy;
    private readonly IDecisionAggregator _decisionAggregator;

    public PromotionDecisionEngine(PromotionPolicy policy, IDecisionAggregator decisionAggregator)
    {
        _policy = policy;
        _decisionAggregator = decisionAggregator;
    }

    // Plugins provide gate implementations, but core orchestrates evaluation
    public async Task<DecisionResult> EvaluateAsync(
        Promotion promotion,
        IReadOnlyList<IGateProvider> gates,
        CancellationToken ct)
    {
        // Core controls evaluation order and aggregation
        var results = new List<GateResult>();
        foreach (var gate in gates)
        {
            // Plugin provides evaluation logic
            var result = await gate.EvaluateAsync(promotion, ct);
            results.Add(result);

            // Core decides whether to stop early (plugins cannot override)
            if (result.IsBlocking && _policy.FailFast)
                break;
        }

        // Core makes final decision
        return _decisionAggregator.Aggregate(results);
    }
}
```

---

## Document Conventions

Throughout the Release Orchestrator documentation:

- **MUST**: Mandatory requirement; non-compliance is a bug
- **SHOULD**: Recommended but not mandatory; deviation requires justification
- **MAY**: Optional; implementation decision
- **Entity names**: `PascalCase` (e.g., `ReleaseBundle`)
- **Table names**: `snake_case` (e.g., `release_bundles`)
- **API paths**: `/api/v1/resource-name`
- **Module names**: `kebab-case` (e.g., `release-manager`)

---

## References

- [Key Architectural Decisions](decisions.md)
- [Module Architecture](../modules/overview.md)
- [Security Architecture](../security/overview.md)