# Key Architectural Decisions

This document records significant architectural decisions and their rationale.

## ADR-001: Digest-First Release Identity

**Status:** Accepted

**Context:**
Container images can be referenced by tags (e.g., `v1.2.3`) or digests (e.g., `sha256:abc123...`). Tags are mutable: the same tag can point to different images over time, whereas a digest always identifies exactly one image.

**Decision:**
All releases are identified by immutable OCI digests, never tags. Tags are accepted as input but immediately resolved to digests at release creation time.

**Consequences:**
- Releases are immutable and reproducible
- A digest mismatch at pull time fails the deployment, because it indicates tampering or corruption
- Rollback targets specific digest, not "previous tag"
- Requires registry integration for tag resolution
- Users see both tag (friendly) and digest (authoritative) in UI
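
The resolve-then-pin flow can be sketched as follows. This is a minimal illustration, not the actual implementation: `pin_release`, `verify_pull`, and the `resolve_tag` callback (standing in for the registry integration) are hypothetical names, and only `sha256` digests are handled.

```python
import hashlib
import re

# Only sha256 digests are recognised in this sketch.
DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def compute_digest(manifest_bytes: bytes) -> str:
    """Content digest of the raw manifest bytes."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


def pin_release(reference: str, resolve_tag) -> dict:
    """Resolve a tag or digest to a pinned identity at release creation time.

    `resolve_tag` is a hypothetical callback that asks the registry which
    digest a tag currently points to.
    """
    if DIGEST_RE.fullmatch(reference):
        return {"tag": None, "digest": reference}
    digest = resolve_tag(reference)
    if not isinstance(digest, str) or not DIGEST_RE.fullmatch(digest):
        raise ValueError(f"registry returned an invalid digest for {reference!r}")
    # The tag is kept for display only; the digest is authoritative.
    return {"tag": reference, "digest": digest}


def verify_pull(expected_digest: str, pulled_bytes: bytes) -> None:
    """Fail the deployment if the pulled content does not match the pin."""
    actual = compute_digest(pulled_bytes)
    if actual != expected_digest:
        raise RuntimeError(f"digest mismatch: expected {expected_digest}, got {actual}")
```

Because the tag is resolved once and stored next to the digest, moving the tag later has no effect on an existing release.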

---

## ADR-002: Evidence for Every Decision

**Status:** Accepted

**Context:**
Compliance and audit requirements demand proof of what was deployed, when, by whom, and why.

**Decision:**
Every promotion and deployment produces a cryptographically signed evidence packet that is immutable and append-only.

**Consequences:**
- Evidence table has no UPDATE/DELETE permissions
- Evidence enables audit-grade compliance reporting
- Evidence enables deterministic replay (same inputs + policy = same decision)
- Evidence packets are exportable for external audit systems
- Storage requirements increase over time
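
One way to picture an append-only, tamper-evident evidence log is a chain of packets in which each packet is authenticated together with its predecessor's signature. The sketch below is an assumption-laden illustration: `EvidenceLog` and the packet fields are hypothetical, and an HMAC with a shared key stands in for whatever signing scheme the real system uses.

```python
import hashlib
import hmac
import json


def canonical(obj) -> bytes:
    """Deterministic JSON encoding so the same content always signs the same."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


class EvidenceLog:
    """In-memory append-only log; an HMAC stands in for a real signature."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._packets = []

    def _sign(self, body: dict) -> str:
        return hmac.new(self._key, canonical(body), hashlib.sha256).hexdigest()

    def append(self, payload: dict) -> dict:
        # Each packet commits to the previous signature, forming a chain.
        prev = self._packets[-1]["signature"] if self._packets else ""
        body = {"payload": payload, "prev": prev}
        packet = {**body, "signature": self._sign(body)}
        self._packets.append(packet)
        return packet

    def verify(self) -> bool:
        """Recompute every signature and check the chain links."""
        prev = ""
        for packet in self._packets:
            body = {"payload": packet["payload"], "prev": packet["prev"]}
            if packet["prev"] != prev:
                return False
            if not hmac.compare_digest(self._sign(body), packet["signature"]):
                return False
            prev = packet["signature"]
        return True
```

Editing or removing any earlier packet changes a signature that later packets depend on, so `verify()` fails for the whole chain rather than for a single record.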

---

## ADR-003: Plugin Architecture for Integrations

**Status:** Accepted

**Context:**
Organizations use diverse toolchains (registries, CI/CD, vaults, notification systems). Hard-coding integrations limits adoption.

**Decision:**
All integrations are implemented as plugins via a three-surface contract (Manifest, Connector Runtime, Step Provider). Core orchestration is stable and plugin-agnostic.

**Consequences:**
- Core has no hard-coded vendor integrations
- New integrations can be added without core changes
- Plugin failures cannot crash core (sandbox isolation)
- Plugin interface must be versioned and stable
- Additional complexity in plugin lifecycle management
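
The three surfaces might look roughly like the interfaces below. All names and fields here (`api_version`, `requires_network`, `PluginHost`, and so on) are hypothetical, chosen only to illustrate the shape of the contract. The `try`/`except` in `run` shows failure containment at the call boundary; it is not a substitute for real sandbox isolation.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Manifest:
    """Surface 1: static metadata the core reads before loading a plugin."""
    plugin_id: str
    api_version: str        # hypothetical contract version field
    requires_network: bool  # hypothetical connectivity declaration


class ConnectorRuntime(Protocol):
    """Surface 2: manages the connection to the external system."""
    def connect(self, config: dict) -> None: ...


class StepProvider(Protocol):
    """Surface 3: exposes workflow steps backed by the integration."""
    def run_step(self, name: str, inputs: dict) -> dict: ...


class PluginHost:
    """Core-side registry that knows only the contract, not the vendors."""
    SUPPORTED_API = "1"

    def __init__(self) -> None:
        self._plugins = {}

    def register(self, manifest, connector, steps, config=None) -> None:
        # Reject plugins built against a contract version the core does not speak.
        if manifest.api_version != self.SUPPORTED_API:
            raise ValueError(f"unsupported plugin API version {manifest.api_version!r}")
        connector.connect(config or {})
        self._plugins[manifest.plugin_id] = (manifest, steps)

    def run(self, plugin_id: str, step: str, inputs: dict) -> dict:
        _, provider = self._plugins[plugin_id]
        try:
            return {"ok": True, "output": provider.run_step(step, inputs)}
        except Exception as exc:  # a plugin error becomes a failed step, not a core crash
            return {"ok": False, "error": str(exc)}
```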

---

## ADR-004: No Feature Gating

**Status:** Accepted

**Context:**
Enterprise software often gates security features behind premium tiers, creating "pay for security" anti-patterns.

**Decision:**
All plans include all features. Pricing is based only on:
- Number of environments
- New digests analyzed per day
- Fair use on deployments

**Consequences:**
- No feature flags tied to billing tier
- Transparent pricing without feature fragmentation
- May limit revenue optimization per customer
- Quota enforcement must be clear and user-friendly

---

## ADR-005: Offline-First Operation

**Status:** Accepted

**Context:**
Many organizations operate in air-gapped or restricted network environments. Dependency on external services limits adoption.

**Decision:**
All core operations must work in air-gapped environments. External data is synced via mirror bundles. Plugins may require connectivity; core does not.

**Consequences:**
- No runtime calls to external APIs for core decisions
- Advisory data synced via offline bundles
- Plugin connectivity requirements are declared in manifest
- Evidence packets exportable for external submission
- Additional complexity in data synchronization

---

## ADR-006: Agent-Based and Agentless Deployment

**Status:** Accepted

**Context:**
Some organizations prefer agents for security isolation; others prefer agentless for simplicity.

**Decision:**
Support both agent-based (persistent daemon on targets) and agentless (SSH/WinRM on demand) deployment models.

**Consequences:**
- Agent provides better performance and reliability
- Agentless reduces infrastructure footprint
- Unified task model abstracts deployment details
- Security model must handle both patterns
- Larger testing matrix, since both models must be covered

---

## ADR-007: PostgreSQL as Primary Database

**Status:** Accepted

**Context:**
Database choice affects scalability, operations, and feature availability.

**Decision:**
PostgreSQL (16+) as the primary database with:
- Per-module schema isolation
- Row-level security for multi-tenancy
- JSONB for flexible configuration
- Append-only triggers for evidence tables

**Consequences:**
- Proven scalability and reliability
- Rich feature set (JSONB, RLS, triggers)
- Single database technology to operate
- Requires PostgreSQL expertise
- Schema migrations must be carefully managed
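
The append-only trigger idea can be demonstrated without a database server. The sketch below deliberately swaps in SQLite (via Python's standard `sqlite3` module) as a stand-in: the table and trigger names are hypothetical, and PostgreSQL would express the same rule with a trigger function rather than `RAISE(ABORT, ...)`.

```python
import sqlite3

# SQLite stand-in for the PostgreSQL evidence schema; names are hypothetical.
SCHEMA = """
CREATE TABLE evidence (
    id      INTEGER PRIMARY KEY,
    payload TEXT NOT NULL
);
CREATE TRIGGER evidence_no_update BEFORE UPDATE ON evidence
BEGIN
    SELECT RAISE(ABORT, 'evidence is append-only');
END;
CREATE TRIGGER evidence_no_delete BEFORE DELETE ON evidence
BEGIN
    SELECT RAISE(ABORT, 'evidence is append-only');
END;
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with the append-only table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def try_statement(conn: sqlite3.Connection, sql: str) -> bool:
    """Run a statement and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.DatabaseError:
        return False
```

Inserts succeed, while any `UPDATE` or `DELETE` that touches a row is rejected by the database itself, independent of application code.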

---

## ADR-008: Workflow Engine with DAG Execution

**Status:** Accepted

**Context:**
Deployment workflows need conditional logic, parallel execution, error handling, and rollback support.

**Decision:**
Implement a DAG-based workflow engine where:
- Workflows are templates with nodes (steps) and edges (dependencies)
- Steps execute when all dependencies are satisfied
- Expressions reference previous step outputs
- Built-in support for approval, retry, timeout, and rollback

**Consequences:**
- Flexible workflow composition
- Visual representation in UI
- Complex error handling scenarios supported
- Learning curve for workflow authors
- Expression engine introduces security considerations (expressions must be evaluated safely)
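
The "execute when all dependencies are satisfied" rule can be sketched with Python's standard `graphlib` module (Python 3.9+). `run_workflow` and its argument shapes are hypothetical; the sketch runs steps sequentially and omits approval, retry, timeout, and rollback.

```python
from graphlib import TopologicalSorter


def run_workflow(steps: dict, deps: dict):
    """Run steps in dependency order.

    steps: step name -> callable taking the dict of outputs produced so far
    deps:  step name -> set of step names it depends on
    Returns (execution order, outputs by step name).
    """
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in steps})
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    outputs = {}
    order = []
    while sorter.is_active():
        # Every step returned here has all of its dependencies completed,
        # so a real engine could run this batch in parallel.
        for name in sorted(sorter.get_ready()):
            outputs[name] = steps[name](outputs)
            order.append(name)
            sorter.done(name)
    return order, outputs
```

A step reads earlier results from the `outputs` dict, which plays the role of expressions referencing previous step outputs.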

---

## ADR-009: Separation of Duties Enforcement

**Status:** Accepted

**Context:**
Compliance requires that the person requesting a change cannot be the same person approving it.

**Decision:**
Separation of Duties (SoD) is enforced at the approval gateway level, preventing self-approval when SoD is enabled for an environment.

**Consequences:**
- Prevents single-person deployment to sensitive environments
- Configurable per environment
- May slow down deployments
- Requires minimum team size for SoD-enabled environments
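
A gateway-level check of this kind could look like the sketch below. `Promotion`, `ApprovalGateway`, and the idea of passing in the set of SoD-enabled environments are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Promotion:
    requested_by: str
    environment: str
    approvals: list = field(default_factory=list)


class ApprovalGateway:
    """Rejects self-approval in environments where SoD is enabled."""

    def __init__(self, sod_environments) -> None:
        self._sod_environments = set(sod_environments)

    def approve(self, promotion: Promotion, approver: str) -> None:
        sod_enabled = promotion.environment in self._sod_environments
        if sod_enabled and approver == promotion.requested_by:
            raise PermissionError(
                f"{approver} cannot approve their own request for {promotion.environment}"
            )
        promotion.approvals.append(approver)
```

Enforcing the rule in the gateway rather than in the UI means every approval path, including API calls, goes through the same check.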

---

## ADR-010: Version Stickers for Drift Detection

**Status:** Accepted

**Context:**
Knowing what's actually deployed on targets is essential for audit and troubleshooting.

**Decision:**
Every deployment writes a `stella.version.json` sticker file on the target containing release ID, digests, deployment timestamp, and deployer identity.

**Consequences:**
- Enables drift detection (expected vs actual)
- Provides audit trail on target hosts
- Enables accurate "what's deployed where" queries
- Requires file access on targets
- Sticker corruption/deletion must be handled
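
Writing the sticker and comparing it against the expected state might be sketched as follows. The JSON field names and the four status strings are assumptions; only the file name `stella.version.json` and the listed contents come from the decision above.

```python
import json
from pathlib import Path

STICKER_NAME = "stella.version.json"


def write_sticker(target_dir: Path, release_id: str, digests: dict,
                  deployed_at: str, deployed_by: str) -> Path:
    """Write the sticker file; the field names are hypothetical."""
    sticker = {
        "release_id": release_id,
        "digests": digests,
        "deployed_at": deployed_at,
        "deployed_by": deployed_by,
    }
    path = Path(target_dir) / STICKER_NAME
    path.write_text(json.dumps(sticker, indent=2, sort_keys=True), encoding="utf-8")
    return path


def detect_drift(target_dir: Path, expected_digests: dict) -> str:
    """Return 'in_sync', 'drifted', 'missing', or 'corrupt'."""
    path = Path(target_dir) / STICKER_NAME
    try:
        actual = json.loads(path.read_text(encoding="utf-8"))["digests"]
    except FileNotFoundError:
        return "missing"
    except (ValueError, KeyError, TypeError):
        # Invalid JSON, undecodable bytes, or an unexpected document shape.
        return "corrupt"
    return "in_sync" if actual == expected_digests else "drifted"
```

Treating a missing or corrupt sticker as its own status, rather than as drift, lets the caller decide whether to redeploy or only rewrite the sticker.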

---

## ADR-011: Security Gate Integration

**Status:** Accepted

**Context:**
Security scanning exists as a separate concern; release orchestration should leverage but not duplicate it.

**Decision:**
Security scanning remains in existing modules (Scanner, VEX). Release orchestration consumes scan results through a security gate that evaluates vulnerability thresholds.

**Consequences:**
- Clear separation of concerns
- Existing scanning investment preserved
- Gate configuration determines block thresholds
- Requires API integration with scanning modules
- Policy engine evaluates security verdicts
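
A threshold-based gate over scan results could be sketched like this. The finding shape (`severity`, `vex_status`), the severity names, and the rule that findings marked `not_affected` are ignored are all assumptions, not the documented gate semantics.

```python
SEVERITIES = ("low", "medium", "high", "critical")


def evaluate_gate(findings: list, max_allowed: dict) -> dict:
    """Compare finding counts per severity against configured limits.

    findings:    dicts with a 'severity' key and an optional 'vex_status' key
    max_allowed: severity -> highest count that still passes; severities
                 without an entry are not limited
    """
    counts = {severity: 0 for severity in SEVERITIES}
    for finding in findings:
        # Assumption: findings assessed as not affecting the product are skipped.
        if finding.get("vex_status") == "not_affected":
            continue
        counts[finding["severity"]] += 1
    violations = {
        severity: count
        for severity, count in counts.items()
        if severity in max_allowed and count > max_allowed[severity]
    }
    return {"passed": not violations, "counts": counts, "violations": violations}
```

The gate only consumes results; producing the findings stays with the scanning modules.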

---

## ADR-012: gRPC for Agent Communication

**Status:** Accepted

**Context:**
Agent communication requires efficient, bidirectional, and secure data transfer.

**Decision:**
Use gRPC for agent communication with:
- mTLS for transport security
- Bidirectional streaming for logs and progress
- Protocol buffers for efficient serialization

**Consequences:**
- Efficient binary protocol
- Strong typing via protobuf
- Built-in streaming support
- Requires gRPC infrastructure
- Firewall considerations for gRPC traffic

---

## References

- [Design Principles](principles.md)
- [Security Architecture](../security/overview.md)
- [Plugin System](../modules/plugin-system.md)