# Key Architectural Decisions

This document records significant architectural decisions and their rationale.

## ADR-001: Digest-First Release Identity

**Status:** Accepted

**Context:** Container images can be referenced by tags (e.g., `v1.2.3`) or digests (e.g., `sha256:abc123...`). Tags are mutable - the same tag can point to different images over time.

**Decision:** All releases are identified by immutable OCI digests, never tags. Tags are accepted as input but immediately resolved to digests at release creation time.

**Consequences:**

- Releases are immutable and reproducible
- Digest mismatch at pull time indicates tampering (deployment fails)
- Rollback targets specific digest, not "previous tag"
- Requires registry integration for tag resolution
- Users see both tag (friendly) and digest (authoritative) in UI

---

## ADR-002: Evidence for Every Decision

**Status:** Accepted

**Context:** Compliance and audit requirements demand proof of what was deployed, when, by whom, and why.

**Decision:** Every promotion and deployment produces a cryptographically signed evidence packet that is immutable and append-only.

**Consequences:**

- Evidence table has no UPDATE/DELETE permissions
- Evidence enables audit-grade compliance reporting
- Evidence enables deterministic replay (same inputs + policy = same decision)
- Evidence packets are exportable for external audit systems
- Storage requirements increase over time

---

## ADR-003: Plugin Architecture for Integrations

**Status:** Accepted

**Context:** Organizations use diverse toolchains (registries, CI/CD, vaults, notification systems). Hard-coding integrations limits adoption.

**Decision:** All integrations are implemented as plugins via a three-surface contract (Manifest, Connector Runtime, Step Provider). Core orchestration is stable and plugin-agnostic.

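
As an illustration of the three-surface contract named in the decision above, the following Python sketch shows one way a core registry could stay plugin-agnostic. Only the three surface names come from this document; every field, method, and class name here (`plugin_id`, `interface_version`, `step_types`, `call`, `run_step`, `PluginRegistry`, the echo plugin) is a hypothetical chosen for the example, not the actual interface.

```python
from dataclasses import dataclass
from typing import Any, Dict, Protocol, Tuple


@dataclass(frozen=True)
class Manifest:
    """Surface 1 (assumed shape): static metadata the core reads before loading."""
    plugin_id: str
    interface_version: str
    step_types: Tuple[str, ...] = ()


class ConnectorRuntime(Protocol):
    """Surface 2 (assumed shape): talks to the external system."""
    def call(self, operation: str, payload: Dict[str, Any]) -> Dict[str, Any]: ...


class StepProvider(Protocol):
    """Surface 3 (assumed shape): exposes workflow steps backed by the connector."""
    def run_step(self, step_type: str, inputs: Dict[str, Any]) -> Dict[str, Any]: ...


@dataclass
class Plugin:
    manifest: Manifest
    connector: ConnectorRuntime
    steps: StepProvider


class PluginRegistry:
    """Core-side registry: it knows the contract only, never a vendor name."""

    def __init__(self) -> None:
        self._by_step: Dict[str, Plugin] = {}

    def register(self, plugin: Plugin) -> None:
        # Index the plugin by every step type its manifest declares.
        for step_type in plugin.manifest.step_types:
            if step_type in self._by_step:
                raise ValueError(f"step type already registered: {step_type}")
            self._by_step[step_type] = plugin

    def run(self, step_type: str, inputs: Dict[str, Any]) -> Dict[str, Any]:
        plugin = self._by_step.get(step_type)
        if plugin is None:
            raise KeyError(f"no plugin provides step type: {step_type}")
        return plugin.steps.run_step(step_type, inputs)


class EchoConnector:
    """Toy connector standing in for a real registry, vault, or CI client."""
    def call(self, operation: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        return {"operation": operation, **payload}


class EchoSteps:
    """Toy step provider that forwards each step to its connector."""
    def __init__(self, connector: ConnectorRuntime) -> None:
        self._connector = connector

    def run_step(self, step_type: str, inputs: Dict[str, Any]) -> Dict[str, Any]:
        return self._connector.call(step_type, inputs)


def build_demo_registry() -> PluginRegistry:
    connector = EchoConnector()
    registry = PluginRegistry()
    manifest = Manifest("echo", "1.0", ("echo.say",))
    registry.register(Plugin(manifest, connector, EchoSteps(connector)))
    return registry
```

In this sketch, adding an integration means registering another `Plugin`; `PluginRegistry` itself does not change. Sandbox isolation and interface versioning are out of scope for the example.
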
**Consequences:**

- Core has no hard-coded vendor integrations
- New integrations can be added without core changes
- Plugin failures cannot crash core (sandbox isolation)
- Plugin interface must be versioned and stable
- Additional complexity in plugin lifecycle management

---

## ADR-004: No Feature Gating

**Status:** Accepted

**Context:** Enterprise software often gates security features behind premium tiers, creating "pay for security" anti-patterns.

**Decision:** All plans include all features. Pricing is based only on:

- Number of environments
- New digests analyzed per day
- Fair use on deployments

**Consequences:**

- No feature flags tied to billing tier
- Transparent pricing without feature fragmentation
- May limit revenue optimization per customer
- Quota enforcement must be clear and user-friendly

---

## ADR-005: Offline-First Operation

**Status:** Accepted

**Context:** Many organizations operate in air-gapped or restricted network environments. Dependency on external services limits adoption.

**Decision:** All core operations must work in air-gapped environments. External data is synced via mirror bundles. Plugins may require connectivity; core does not.

**Consequences:**

- No runtime calls to external APIs for core decisions
- Advisory data synced via offline bundles
- Plugin connectivity requirements are declared in manifest
- Evidence packets exportable for external submission
- Additional complexity in data synchronization

---

## ADR-006: Agent-Based and Agentless Deployment

**Status:** Accepted

**Context:** Some organizations prefer agents for security isolation; others prefer agentless for simplicity.

**Decision:** Support both agent-based (persistent daemon on targets) and agentless (SSH/WinRM on demand) deployment models.

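
One way to picture supporting both models is a single task type dispatched to a per-mode executor. The Python sketch below is illustrative only: `Mode`, `Target`, `Task`, the two executor classes, and `dispatch` are hypothetical names, and the executors return descriptive strings instead of contacting a daemon or opening an SSH/WinRM session.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Protocol


class Mode(Enum):
    AGENT = "agent"          # persistent daemon on the target
    AGENTLESS = "agentless"  # SSH/WinRM session opened on demand


@dataclass(frozen=True)
class Target:
    host: str
    mode: Mode


@dataclass(frozen=True)
class Task:
    """Same task shape regardless of how it reaches the target."""
    action: str
    digest: str


class Executor(Protocol):
    def execute(self, target: Target, task: Task) -> str: ...


class AgentExecutor:
    def execute(self, target: Target, task: Task) -> str:
        # Placeholder: a real executor would hand the task to the daemon.
        return f"agent@{target.host}: {task.action} {task.digest}"


class AgentlessExecutor:
    def execute(self, target: Target, task: Task) -> str:
        # Placeholder: a real executor would open a remote shell on demand.
        return f"remote-shell@{target.host}: {task.action} {task.digest}"


EXECUTORS: Dict[Mode, Executor] = {
    Mode.AGENT: AgentExecutor(),
    Mode.AGENTLESS: AgentlessExecutor(),
}


def dispatch(target: Target, task: Task) -> str:
    """Pick the executor from the target's mode; callers never branch on it."""
    return EXECUTORS[target.mode].execute(target, task)
```

With this shape, workflow code calls `dispatch(target, task)` for every target, and the choice of deployment model stays a property of the target.
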
**Consequences:**

- Agent provides better performance and reliability
- Agentless reduces infrastructure footprint
- Unified task model abstracts deployment details
- Security model must handle both patterns
- Higher testing matrix

---

## ADR-007: PostgreSQL as Primary Database

**Status:** Accepted

**Context:** Database choice affects scalability, operations, and feature availability.

**Decision:** PostgreSQL (16+) as the primary database with:

- Per-module schema isolation
- Row-level security for multi-tenancy
- JSONB for flexible configuration
- Append-only triggers for evidence tables

**Consequences:**

- Proven scalability and reliability
- Rich feature set (JSONB, RLS, triggers)
- Single database technology to operate
- Requires PostgreSQL expertise
- Schema migrations must be carefully managed

---

## ADR-008: Workflow Engine with DAG Execution

**Status:** Accepted

**Context:** Deployment workflows need conditional logic, parallel execution, error handling, and rollback support.

**Decision:** Implement a DAG-based workflow engine where:

- Workflows are templates with nodes (steps) and edges (dependencies)
- Steps execute when all dependencies are satisfied
- Expressions reference previous step outputs
- Built-in support for approval, retry, timeout, and rollback

**Consequences:**

- Flexible workflow composition
- Visual representation in UI
- Complex error handling scenarios supported
- Learning curve for workflow authors
- Expression engine security considerations

---

## ADR-009: Separation of Duties Enforcement

**Status:** Accepted

**Context:** Compliance requires that the person requesting a change cannot be the same person approving it.

**Decision:** Separation of Duties (SoD) is enforced at the approval gateway level, preventing self-approval when SoD is enabled for an environment.

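
The self-approval rule in the decision above can be sketched as a single check at approval time. This is a minimal Python illustration under stated assumptions: `Environment`, `PromotionRequest`, `approve`, and `SeparationOfDutiesError` are hypothetical names, and identities are compared as opaque strings with no normalization.

```python
from dataclasses import dataclass


class SeparationOfDutiesError(Exception):
    """Raised when the requester tries to approve their own request."""


@dataclass(frozen=True)
class Environment:
    name: str
    sod_enabled: bool  # SoD is configured per environment


@dataclass(frozen=True)
class PromotionRequest:
    release_id: str
    requested_by: str
    environment: Environment


def approve(request: PromotionRequest, approver: str) -> str:
    """Record an approval, rejecting self-approval where SoD is enabled."""
    if request.environment.sod_enabled and approver == request.requested_by:
        raise SeparationOfDutiesError(
            f"{approver} cannot approve their own request "
            f"for environment {request.environment.name}"
        )
    return approver
```

For example, with SoD enabled on a `prod` environment, `approve(request, "alice")` raises when `alice` is also the requester, while the same call against an environment with `sod_enabled=False` succeeds.
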
**Consequences:**

- Prevents single-person deployment to sensitive environments
- Configurable per environment
- May slow down deployments
- Requires minimum team size for SoD-enabled environments

---

## ADR-010: Version Stickers for Drift Detection

**Status:** Accepted

**Context:** Knowing what's actually deployed on targets is essential for audit and troubleshooting.

**Decision:** Every deployment writes a `stella.version.json` sticker file on the target containing release ID, digests, deployment timestamp, and deployer identity.

**Consequences:**

- Enables drift detection (expected vs actual)
- Provides audit trail on target hosts
- Enables accurate "what's deployed where" queries
- Requires file access on targets
- Sticker corruption/deletion must be handled

---

## ADR-011: Security Gate Integration

**Status:** Accepted

**Context:** Security scanning exists as a separate concern; release orchestration should leverage but not duplicate it.

**Decision:** Security scanning remains in existing modules (Scanner, VEX). Release orchestration consumes scan results through a security gate that evaluates vulnerability thresholds.

**Consequences:**

- Clear separation of concerns
- Existing scanning investment preserved
- Gate configuration determines block thresholds
- Requires API integration with scanning modules
- Policy engine evaluates security verdicts

---

## ADR-012: gRPC for Agent Communication

**Status:** Accepted

**Context:** Agent communication requires efficient, bidirectional, and secure data transfer.

**Decision:** Use gRPC for agent communication with:

- mTLS for transport security
- Bidirectional streaming for logs and progress
- Protocol buffers for efficient serialization

**Consequences:**

- Efficient binary protocol
- Strong typing via protobuf
- Built-in streaming support
- Requires gRPC infrastructure
- Firewall considerations for gRPC traffic

---

## References

- [Design Principles](principles.md)
- [Security Architecture](../security/overview.md)
- [Plugin System](../modules/plugin-system.md)