Key Architectural Decisions

This document records significant architectural decisions and their rationale.

ADR-001: Digest-First Release Identity

Status: Accepted

Context: Container images can be referenced by tags (e.g., v1.2.3) or digests (e.g., sha256:abc123...). Tags are mutable: the same tag can point to different images over time.

Decision: All releases are identified by immutable OCI digests, never tags. Tags are accepted as input but immediately resolved to digests at release creation time.

Consequences:

  • Releases are immutable and reproducible
  • Digest mismatch at pull time indicates tampering (deployment fails)
  • Rollback targets specific digest, not "previous tag"
  • Requires registry integration for tag resolution
  • Users see both tag (friendly) and digest (authoritative) in UI
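
As a minimal sketch of the "resolve at creation time" rule, the Python below pins a reference to a digest and keeps the tag only as a display label. The names ReleaseComponent, resolve_reference, and lookup_tag are hypothetical, and lookup_tag merely stands in for the registry integration; none of this is taken from the actual implementation.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Illustrative only: the class, function, and field names are hypothetical.
DIGEST_RE = re.compile(r"^sha256:[0-9a-f]{64}$")


@dataclass(frozen=True)
class ReleaseComponent:
    repository: str
    digest: str            # authoritative identity of the image
    tag: Optional[str]     # friendly label kept for display only


def resolve_reference(
    ref: str, lookup_tag: Callable[[str, str], str]
) -> ReleaseComponent:
    """Pin 'repo@sha256:...' or 'repo:tag' to a digest at release creation.

    lookup_tag stands in for the registry integration: it takes
    (repository, tag) and returns the digest the tag points to right now.
    """
    if "@" in ref:
        repository, digest = ref.split("@", 1)
        tag = None
    else:
        repository, sep, tag = ref.rpartition(":")
        # Reject untagged references and 'host:port/name' without a tag.
        if not sep or not repository or "/" in tag:
            raise ValueError(f"reference needs an explicit tag or digest: {ref}")
        digest = lookup_tag(repository, tag)
    if not DIGEST_RE.match(digest):
        raise ValueError(f"not a sha256 digest: {digest}")
    return ReleaseComponent(repository, digest, tag)
```

Because the dataclass is frozen, a later change to what the tag points to cannot alter an existing release; only the stored digest is used for deployment and rollback.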

ADR-002: Evidence for Every Decision

Status: Accepted

Context: Compliance and audit requirements demand proof of what was deployed, when, by whom, and why.

Decision: Every promotion and deployment produces a cryptographically signed evidence packet that is immutable and append-only.

Consequences:

  • Evidence table has no UPDATE/DELETE permissions
  • Evidence enables audit-grade compliance reporting
  • Evidence enables deterministic replay (same inputs + policy = same decision)
  • Evidence packets are exportable for external audit systems
  • Storage requirements increase over time
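
One way to picture a signed, append-only evidence stream is a chain in which each packet's signature covers the previous packet's signature. The sketch below is an assumption-laden illustration: it uses HMAC-SHA256 from the Python standard library as a stand-in for whatever signing scheme is really used, the chaining itself is an assumption rather than something the ADR states, and EvidenceLog and its fields are hypothetical names.

```python
import hashlib
import hmac
import json


class EvidenceLog:
    """In-memory, append-only list of signed packets (illustrative only)."""

    def __init__(self, key: bytes):
        self._key = key
        self._packets = []

    def _sign(self, prev: str, payload: dict) -> str:
        # Canonical JSON so the same content always yields the same bytes.
        body = json.dumps({"prev": prev, "payload": payload}, sort_keys=True)
        return hmac.new(self._key, body.encode("utf-8"), hashlib.sha256).hexdigest()

    def append(self, payload: dict) -> dict:
        # Each packet signs its payload together with the previous signature.
        prev = self._packets[-1]["signature"] if self._packets else ""
        packet = {
            "prev": prev,
            "payload": payload,
            "signature": self._sign(prev, payload),
        }
        self._packets.append(packet)
        return packet

    def verify(self) -> bool:
        # Recompute every signature in order; any edit breaks the chain.
        prev = ""
        for packet in self._packets:
            expected = self._sign(prev, packet["payload"])
            if packet["prev"] != prev:
                return False
            if not hmac.compare_digest(expected, packet["signature"]):
                return False
            prev = packet["signature"]
        return True
```

The class exposes no update or delete method, which mirrors the "no UPDATE/DELETE permissions" consequence at the application level; the database-level enforcement described in ADR-007 is a separate mechanism.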

ADR-003: Plugin Architecture for Integrations

Status: Accepted

Context: Organizations use diverse toolchains (registries, CI/CD, vaults, notification systems). Hard-coding integrations limits adoption.

Decision: All integrations are implemented as plugins via a three-surface contract (Manifest, Connector Runtime, Step Provider). Core orchestration is stable and plugin-agnostic.

Consequences:

  • Core has no hard-coded vendor integrations
  • New integrations can be added without core changes
  • Plugin failures cannot crash core (sandbox isolation)
  • Plugin interface must be versioned and stable
  • Additional complexity in plugin lifecycle management
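
The three surfaces can be pictured as three small interfaces bundled into one plugin object. The following Python is a hypothetical sketch: the surface names come from the decision above, but every field, method, and the SUPPORTED_MAJOR constant are assumptions. The try/except wrapper only illustrates the idea that a plugin failure becomes a result rather than a core crash; it is not sandbox isolation.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Manifest:
    name: str
    interface_version: str   # hypothetical "major.minor" contract version
    requires_network: bool   # connectivity needs declared up front


class ConnectorRuntime(Protocol):
    def connect(self, config: dict) -> None: ...


class StepProvider(Protocol):
    def run_step(self, step: str, inputs: dict) -> dict: ...


@dataclass
class Plugin:
    manifest: Manifest
    connector: ConnectorRuntime
    steps: StepProvider


SUPPORTED_MAJOR = "1"  # assumed major version understood by core


def run_plugin_step(plugin: Plugin, step: str, inputs: dict) -> dict:
    """Run one plugin step and report failure as data instead of raising."""
    major = plugin.manifest.interface_version.split(".")[0]
    if major != SUPPORTED_MAJOR:
        return {"ok": False, "error": "unsupported interface version"}
    try:
        return {"ok": True, "output": plugin.steps.run_step(step, inputs)}
    except Exception as exc:  # a plugin error must not propagate into core
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
```

Checking the interface version before dispatch is one simple way to honor the "versioned and stable" consequence: core can refuse a plugin built against an incompatible contract instead of failing mid-step.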

ADR-004: No Feature Gating

Status: Accepted

Context: Enterprise software often gates security features behind premium tiers, creating "pay for security" anti-patterns.

Decision: All plans include all features. Pricing is based only on:

  • Number of environments
  • New digests analyzed per day
  • Fair use on deployments

Consequences:

  • No feature flags tied to billing tier
  • Transparent pricing without feature fragmentation
  • May limit revenue optimization per customer
  • Quota enforcement must be clear and user-friendly

ADR-005: Offline-First Operation

Status: Accepted

Context: Many organizations operate in air-gapped or restricted network environments. Dependency on external services limits adoption.

Decision: All core operations must work in air-gapped environments. External data is synced via mirror bundles. Plugins may require connectivity; core does not.

Consequences:

  • No runtime calls to external APIs for core decisions
  • Advisory data synced via offline bundles
  • Plugin connectivity requirements are declared in manifest
  • Evidence packets exportable for external submission
  • Additional complexity in data synchronization

ADR-006: Agent-Based and Agentless Deployment

Status: Accepted

Context: Some organizations prefer agents for security isolation; others prefer agentless for simplicity.

Decision: Support both agent-based (persistent daemon on targets) and agentless (SSH/WinRM on demand) deployment models.

Consequences:

  • Agent provides better performance and reliability
  • Agentless reduces infrastructure footprint
  • Unified task model abstracts deployment details
  • Security model must handle both patterns
  • Larger testing matrix

ADR-007: PostgreSQL as Primary Database

Status: Accepted

Context: Database choice affects scalability, operations, and feature availability.

Decision: PostgreSQL (16+) as the primary database with:

  • Per-module schema isolation
  • Row-level security for multi-tenancy
  • JSONB for flexible configuration
  • Append-only triggers for evidence tables

Consequences:

  • Proven scalability and reliability
  • Rich feature set (JSONB, RLS, triggers)
  • Single database technology to operate
  • Requires PostgreSQL expertise
  • Schema migrations must be carefully managed

ADR-008: Workflow Engine with DAG Execution

Status: Accepted

Context: Deployment workflows need conditional logic, parallel execution, error handling, and rollback support.

Decision: Implement a DAG-based workflow engine where:

  • Workflows are templates with nodes (steps) and edges (dependencies)
  • Steps execute when all dependencies are satisfied
  • Expressions reference previous step outputs
  • Built-in support for approval, retry, timeout, and rollback

Consequences:

  • Flexible workflow composition
  • Visual representation in UI
  • Complex error handling scenarios supported
  • Learning curve for workflow authors
  • Expression engine security considerations
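
The rule "steps execute when all dependencies are satisfied" is a topological traversal of the workflow graph. The sketch below uses Python's standard graphlib.TopologicalSorter to show it; run_workflow and the step names are hypothetical, steps run sequentially here rather than in parallel, and approval, retry, timeout, and rollback are left out.

```python
from graphlib import TopologicalSorter


def run_workflow(steps, deps):
    """Run steps in dependency order.

    steps: step name -> callable taking the dict of earlier outputs
    deps:  step name -> set of step names that must finish first
    Returns (execution order, outputs by step name).
    """
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in steps})
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    outputs = {}
    order = []
    while sorter.is_active():
        # Every step returned here has all of its dependencies completed,
        # so a real engine could dispatch this batch in parallel.
        for name in sorter.get_ready():
            outputs[name] = steps[name](outputs)
            order.append(name)
            sorter.done(name)
    return order, outputs


# Hypothetical workflow: scan and approve both depend on build.
example_steps = {
    "build": lambda out: "img",
    "scan": lambda out: out["build"] + "-scanned",
    "approve": lambda out: True,
    "deploy": lambda out: (out["scan"], out["approve"]),
}
example_deps = {
    "scan": {"build"},
    "approve": {"build"},
    "deploy": {"scan", "approve"},
}
```

Passing the accumulated outputs into each step is a stand-in for expressions that reference previous step outputs; a production expression engine would need to restrict what those expressions can evaluate.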

ADR-009: Separation of Duties Enforcement

Status: Accepted

Context: Compliance requires that the person requesting a change cannot be the same person approving it.

Decision: Separation of Duties (SoD) is enforced at the approval gateway level, preventing self-approval when SoD is enabled for an environment.

Consequences:

  • Prevents single-person deployment to sensitive environments
  • Configurable per environment
  • May slow down deployments
  • Requires minimum team size for SoD-enabled environments
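
The check at the approval gateway reduces to a comparison between requester and approver, applied only where SoD is switched on. A minimal sketch, with Environment and can_approve as hypothetical names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    name: str
    sod_enabled: bool  # SoD is configurable per environment


def can_approve(env: Environment, requester: str, approver: str) -> bool:
    """Reject self-approval when SoD is enabled for the environment."""
    if env.sod_enabled and requester == approver:
        return False
    return True
```

This assumes user identities are already normalized; if the same person can appear under two different identifiers, the comparison would need to happen on a canonical identity instead.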

ADR-010: Version Stickers for Drift Detection

Status: Accepted

Context: Knowing what's actually deployed on targets is essential for audit and troubleshooting.

Decision: Every deployment writes a stella.version.json sticker file on the target containing release ID, digests, deployment timestamp, and deployer identity.

Consequences:

  • Enables drift detection (expected vs actual)
  • Provides audit trail on target hosts
  • Enables accurate "what's deployed where" queries
  • Requires file access on targets
  • Sticker corruption/deletion must be handled
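
To make the drift check concrete, the sketch below builds a sticker as JSON and compares the digests it records with the digests the release expects. The JSON field names (release_id, digests, deployed_at, deployed_by) and both function names are assumptions for illustration; the real layout of stella.version.json is not specified here.

```python
import json


def build_sticker(release_id, digests, deployed_at, deployed_by):
    """Serialize the sticker content (field names are hypothetical)."""
    return json.dumps(
        {
            "release_id": release_id,
            "digests": digests,          # component name -> image digest
            "deployed_at": deployed_at,  # e.g. an ISO 8601 timestamp string
            "deployed_by": deployed_by,
        },
        sort_keys=True,
        indent=2,
    )


def detect_drift(expected_digests, sticker_text):
    """Compare expected digests with what the target's sticker reports.

    sticker_text is the file content read from the target, or None if the
    file is absent. Returns a dict with a status and per-component drift.
    """
    if sticker_text is None:
        return {"status": "missing", "drift": {}}
    try:
        actual = json.loads(sticker_text)["digests"]
    except (ValueError, KeyError, TypeError):
        return {"status": "corrupt", "drift": {}}
    if not isinstance(actual, dict):
        return {"status": "corrupt", "drift": {}}
    drift = {
        name: (expected, actual.get(name))
        for name, expected in expected_digests.items()
        if actual.get(name) != expected
    }
    return {"status": "drifted" if drift else "in_sync", "drift": drift}
```

Reporting "missing" and "corrupt" as distinct statuses, rather than raising, is one way to handle the sticker corruption and deletion cases listed above without confusing them with genuine drift.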

ADR-011: Security Gate Integration

Status: Accepted

Context: Security scanning exists as a separate concern; release orchestration should leverage but not duplicate it.

Decision: Security scanning remains in existing modules (Scanner, VEX). Release orchestration consumes scan results through a security gate that evaluates vulnerability thresholds.

Consequences:

  • Clear separation of concerns
  • Existing scanning investment preserved
  • Gate configuration determines block thresholds
  • Requires API integration with scanning modules
  • Policy engine evaluates security verdicts
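
A threshold gate of this kind can be sketched as a pure function over scan result counts. The function name, the severity labels, and the shape of the result are all hypothetical; the actual Scanner and VEX result formats are not described in this document.

```python
def evaluate_gate(findings, max_allowed):
    """Decide pass or block from vulnerability counts.

    findings:    severity -> number of findings reported by the scanner
    max_allowed: severity -> highest count that still passes; severities
                 without an entry are not limited by this gate
    """
    violations = {
        severity: {"found": count, "allowed": max_allowed[severity]}
        for severity, count in findings.items()
        if severity in max_allowed and count > max_allowed[severity]
    }
    return {
        "verdict": "block" if violations else "pass",
        "violations": violations,
    }
```

Keeping the gate free of I/O means the same scan results and thresholds always produce the same verdict, which fits the deterministic replay goal stated in ADR-002.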

ADR-012: gRPC for Agent Communication

Status: Accepted

Context: Agent communication requires efficient, bidirectional, and secure data transfer.

Decision: Use gRPC for agent communication with:

  • mTLS for transport security
  • Bidirectional streaming for logs and progress
  • Protocol buffers for efficient serialization

Consequences:

  • Efficient binary protocol
  • Strong typing via protobuf
  • Built-in streaming support
  • Requires gRPC infrastructure
  • Firewall considerations for gRPC traffic

References