ADR-0001: PostgreSQL for Control-Plane Storage
Status
Accepted
Date
2025-12-04
Authors
- Platform Team
Deciders
- Architecture Guild
- Platform Team
Context
StellaOps control-plane services (Authority, Scheduler, Notify, Concelier/Excititor, Policy) require persistent storage for:
- Identity and authorization data (users, roles, tokens, sessions)
- Job scheduling and execution state
- Notification rules, templates, and delivery tracking
- Vulnerability advisories and VEX statements
- Policy packs, rules, and evaluation history
Triggers for this decision:
- Licensing trust & ecosystem stability — PostgreSQL is licensed under the permissive PostgreSQL License (similar to MIT/BSD), OSI-approved, with no vendor lock-in concerns. MongoDB's SSPL license (2018) is not OSI-approved and creates uncertainty for self-hosted/sovereign deployments. For a platform emphasizing sovereignty and auditability, database licensing must be beyond reproach.
- Schema complexity — Control-plane domains have well-defined, relational schemas with referential integrity requirements (foreign keys, cascading deletes, constraints).
- Query patterns — Complex joins, aggregations, and window functions are common (e.g., finding all images affected by a newly published CVE).
- ACID requirements — Job scheduling, token issuance, and notification delivery require strong transactional guarantees.
- Multi-tenancy — Row-level security (RLS) needed for tenant isolation without schema-per-tenant overhead.
- Migration tooling — Need deterministic, forward-only migrations with advisory lock coordination for multi-instance deployments.
- Air-gap operation — All schema and data must be embeddable in assemblies without external network dependencies.
- Auditability — PostgreSQL's mature ecosystem includes proven audit logging, compliance tooling, and forensic capabilities trusted by regulated industries.
Decision
Adopt PostgreSQL (≥15) as the primary database for all StellaOps control-plane domains.
Key architectural choices:
1. Per-Module Schema Isolation
Each module owns exactly one PostgreSQL schema:
| Schema | Owner | Description |
|---|---|---|
| auth | Authority | Identity, authentication, authorization, licensing |
| vuln | Concelier | Vulnerability advisories, sources, affected packages |
| vex | Excititor | VEX statements, graphs, observations, consensus |
| scheduler | Scheduler | Jobs, triggers, workers, execution history |
| notify | Notify | Channels, templates, rules, deliveries |
| policy | Policy | Policy packs, rules, risk profiles |
| audit | Shared | Cross-cutting audit log (optional) |
Rationale:
- Clear ownership boundaries
- Independent migration lifecycles
- Schema-level access control
- Simplified testing and development
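The following is a minimal sketch of schema-level access control. It assumes each module connects under its own PostgreSQL role. The role name `stellaops_auth` and the helper `schema_bootstrap_sql` are hypothetical. The sketch generates the bootstrap DDL for one module's schema:

```python
import re

# Schemas from the ownership table above; each module owns exactly one.
MODULE_SCHEMAS = {
    "auth": "Authority",
    "vuln": "Concelier",
    "vex": "Excititor",
    "scheduler": "Scheduler",
    "notify": "Notify",
    "policy": "Policy",
    "audit": "Shared",
}

# Conservative identifier check so the role name can be interpolated safely.
_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")


def schema_bootstrap_sql(schema: str, owner_role: str) -> list[str]:
    """Return DDL that creates a module schema owned by that module's role.

    By default PostgreSQL grants other (non-superuser) roles no privileges on
    a new schema, so other modules' roles cannot use it unless USAGE is
    granted explicitly.
    """
    if schema not in MODULE_SCHEMAS:
        raise ValueError(f"unknown module schema: {schema!r}")
    if not _IDENTIFIER.fullmatch(owner_role):
        raise ValueError(f"invalid role name: {owner_role!r}")
    return [
        f"CREATE SCHEMA IF NOT EXISTS {schema} AUTHORIZATION {owner_role};",
        # New sessions of this role resolve unqualified names in its own schema.
        f"ALTER ROLE {owner_role} SET search_path = {schema};",
    ]
```

Running these statements requires elevated privileges: CREATE on the database plus the right to act as the owner role, or a superuser. That makes this sketch a better fit for provisioning than for module startup.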
2. Multi-Tenancy via tenant_id Column
Single database, single schema set, tenant_id column on all tenant-scoped tables.
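-- Session-level tenant context
SET app.tenant_id = '<tenant-uuid>';
-- Row-level security (defense in depth). Policies take effect only once RLS is enabled
-- on the table; table owners still bypass them unless FORCE ROW LEVEL SECURITY is also set.
ALTER TABLE <table> ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON <table>
USING (tenant_id = current_setting('app.tenant_id')::uuid);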
Rationale:
- Simplest operational model
- Shared connection pooling
- Easy cross-tenant queries for admin operations (for example, from a role granted BYPASSRLS)
- Composite indexes on (tenant_id, ...) for query performance
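The SET statement above cannot take bind parameters. Application code that supplies tenant IDs at runtime may therefore prefer `set_config()`. This is a minimal sketch assuming a DB-API driver with `%s` placeholders, such as psycopg; the helper name `tenant_context` is hypothetical:

```python
import uuid

# Parameterised alternative to `SET app.tenant_id = '<tenant-uuid>'`.
# The third argument (is_local = true) limits the setting to the current
# transaction, so it cannot carry over to the next user of a pooled connection.
TENANT_CONTEXT_SQL = "SELECT set_config('app.tenant_id', %s, true)"


def tenant_context(tenant_id: str) -> tuple[str, tuple[str]]:
    """Validate a tenant ID and return (sql, params) for cursor.execute()."""
    canonical = str(uuid.UUID(tenant_id))  # raises ValueError for non-UUIDs
    return TENANT_CONTEXT_SQL, (canonical,)
```

Call `cursor.execute(*tenant_context(tenant_id))` at the start of each transaction, before any tenant-scoped query. The transaction-local scope also matters behind PgBouncer in transaction-pooling mode, where session-level SET values can leak between clients.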
3. Forward-Only Migrations with Advisory Locks
Migrations are embedded in assemblies and executed at startup with PostgreSQL advisory locks:
SELECT pg_try_advisory_lock(hashtext('auth')); -- Per-schema lock
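`pg_try_advisory_lock` returns `false` immediately when another instance holds the lock. Session-level advisory locks also stay held until they are released or the session ends. A startup runner therefore needs a retry loop and an explicit unlock. This is a minimal sketch assuming a DB-API cursor on one dedicated connection; `schema_migration_lock` and its timeout defaults are hypothetical:

```python
import time
from contextlib import contextmanager

LOCK_SQL = "SELECT pg_try_advisory_lock(hashtext(%s))"
UNLOCK_SQL = "SELECT pg_advisory_unlock(hashtext(%s))"


@contextmanager
def schema_migration_lock(cursor, schema, timeout_s=60.0, poll_s=0.5,
                          sleep=time.sleep, clock=time.monotonic):
    """Hold the per-schema advisory lock for the duration of the with-block.

    The lock is session-level, so locking, running migrations, and unlocking
    must all happen on the same connection (this cursor).
    """
    deadline = clock() + timeout_s
    while True:
        cursor.execute(LOCK_SQL, (schema,))
        if cursor.fetchone()[0]:
            break  # lock acquired
        if clock() >= deadline:
            raise TimeoutError(f"migration lock for {schema!r} not acquired")
        sleep(poll_s)  # another instance is migrating; wait and retry
    try:
        yield
    finally:
        cursor.execute(UNLOCK_SQL, (schema,))
```

Polling keeps the total wait bounded. That matters given the 60-second startup budget named under Rollback Criteria.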
Migration categories:
- Startup (001-099): Automatic, non-breaking DDL
- Release (100-199): Manual CLI, breaking changes
- Seed (S001-S999): Reference data
- Data (DM001-DM999): Batched background jobs
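The four ranges above can be checked mechanically. This is a minimal sketch assuming migration files are named with the category prefix and number followed by an underscore, for example `001_create_test_table.sql` or `S001_seed_data.sql`. That naming convention and the `categorize` helper are assumptions, not the actual StellaOps.Infrastructure.Postgres API:

```python
import re
from enum import Enum
from typing import Optional


class MigrationCategory(Enum):
    STARTUP = "startup"  # 001-099: applied automatically, non-breaking DDL
    RELEASE = "release"  # 100-199: applied manually via CLI, breaking changes
    SEED = "seed"        # S001-S999: reference data
    DATA = "data"        # DM001-DM999: batched background jobs


# Optional DM/S prefix, exactly three digits, then "_", "." or end of name.
_NAME = re.compile(r"(DM|S)?(\d{3})(?=[_.]|$)")


def categorize(name: Optional[str]) -> MigrationCategory:
    """Map a migration file name to its category, rejecting unknown ranges."""
    if name is None or not name.strip():
        raise ValueError("migration name must not be null, empty, or whitespace")
    match = _NAME.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognised migration name: {name!r}")
    prefix, number = match.group(1), int(match.group(2))
    if prefix == "DM" and number >= 1:
        return MigrationCategory.DATA
    if prefix == "S" and number >= 1:
        return MigrationCategory.SEED
    if prefix is None and 1 <= number <= 99:
        return MigrationCategory.STARTUP
    if prefix is None and 100 <= number <= 199:
        return MigrationCategory.RELEASE
    raise ValueError(f"{name!r} is outside the documented ranges")
```

Under this sketch, a startup host would apply only STARTUP migrations automatically and leave RELEASE migrations to the CLI, matching the split above.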
Rationale:
- No down migrations needed (forward-only with fix-forward)
- Advisory locks prevent concurrent migrations across instances
- Checksum validation catches unauthorized modifications
- Air-gap compatible (no external migration service needed)
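Checksum validation can be sketched as comparing a hash of each embedded migration with the value recorded when it was applied. This sketch assumes SHA-256 over the SQL text with line endings normalised, so that a CRLF checkout does not look like tampering. The `pending_migrations` helper is hypothetical, and the real hash algorithm and history table are not specified here:

```python
import hashlib


def checksum(sql_text: str) -> str:
    """SHA-256 of the migration text with CRLF/CR normalised to LF."""
    normalised = sql_text.replace("\r\n", "\n").replace("\r", "\n")
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def pending_migrations(embedded: dict[str, str], applied: dict[str, str]) -> list[str]:
    """Return embedded migrations not yet applied, in name order.

    embedded: migration name -> SQL text shipped in the build.
    applied:  migration name -> checksum recorded in the database.
    Raises if an applied migration was edited or removed, because
    forward-only migrations are fixed by adding a new one, never by
    changing history.
    """
    for name, recorded in applied.items():
        if name not in embedded:
            raise RuntimeError(f"applied migration {name!r} is missing from the build")
        if checksum(embedded[name]) != recorded:
            raise RuntimeError(f"checksum mismatch for applied migration {name!r}")
    return sorted(name for name in embedded if name not in applied)
```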
4. RustFS for Binary Artifacts
PostgreSQL stores metadata and indexes; RustFS stores binary artifacts (SBOMs, attestations, reports):
PostgreSQL: Schema definitions, relationships, indexes, audit trails
RustFS: sbom.cdx.json.zst, inventory.cdx.pb, bom-index.bin, *.dsse.json
Rationale:
- Right tool for each job
- PostgreSQL excellent for structured queries
- Object storage better for large binary blobs
- Clear separation of concerns
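The split can be illustrated with a content-addressed layout. The artifact bytes go to RustFS under a key derived from their SHA-256 digest, and PostgreSQL stores only a small metadata row pointing at that key. This is a minimal sketch: the key layout and the `artifact_record` helper are hypothetical, and the actual upload call depends on the RustFS client in use:

```python
import hashlib


def artifact_record(tenant_id: str, kind: str, filename: str, payload: bytes) -> dict:
    """Build the PostgreSQL metadata row for an artifact stored in RustFS.

    Because the key embeds the content digest, changed bytes get a new key
    instead of overwriting the old object, and the stored object can be
    re-hashed and compared with the row.
    """
    if not filename or "/" in filename:
        raise ValueError(f"filename must be a single path segment: {filename!r}")
    digest = hashlib.sha256(payload).hexdigest()
    return {
        "tenant_id": tenant_id,
        "kind": kind,  # e.g. "sbom", "attestation", "report"
        "object_key": f"{tenant_id}/{kind}/sha256/{digest}/{filename}",
        "sha256": digest,
        "size_bytes": len(payload),
    }
```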
Consequences
Positive
- Licensing trust — PostgreSQL License is permissive, OSI-approved, and universally accepted. No vendor lock-in, no license ambiguity for sovereign deployments. Trusted by governments, regulated industries, and security-conscious organizations.
- Ecosystem stability — 30+ years of development, included in all major distributions, no license rug-pulls. Community governance ensures long-term trust.
- Relational integrity — Foreign keys, constraints, and transactions ensure data consistency.
- Query flexibility — Complex joins, CTEs, window functions, and full-text search available natively.
- Operational maturity — Well-understood backup, replication, and monitoring ecosystem.
- Row-level security — Built-in multi-tenancy support without application-layer hacks.
- Schema evolution — Mature migration tooling with online DDL capabilities.
- Performance — Excellent query planning, connection pooling (PgBouncer), and indexing options.
- Auditability — Proven audit logging extensions (pgAudit), compliance certifications, forensic tooling.
Negative
- Schema rigidity — Changes require migrations; less flexible than document stores for rapidly evolving schemas.
- Operational overhead — Requires PostgreSQL expertise for tuning, vacuuming, and monitoring.
- Connection limits — Need PgBouncer for high-concurrency workloads.
Follow-up Actions
- Create docs/db/ documentation directory with specification, rules, and conversion plan
- Define migration infrastructure in StellaOps.Infrastructure.Postgres
- Complete phased conversion from MongoDB per docs/db/tasks/PHASE_*.md
- Update deployment guides for PostgreSQL requirements
- Add PostgreSQL health checks to all control-plane services
Rollback Criteria
Revert to MongoDB (or hybrid) if:
- Migration performance unacceptable (> 60s startup time)
- Query complexity exceeds PostgreSQL capabilities
- Operational burden exceeds team capacity
Alternatives Considered
Option A: Continue with MongoDB
Pros:
- Already in use for some components
- Flexible schema
- Good for document-centric workloads
Cons:
- Licensing uncertainty — MongoDB's SSPL (Server Side Public License, 2018) is not OSI-approved. Creates legal ambiguity for sovereign/self-hosted deployments, especially in regulated industries and government contexts where license provenance matters.
- Ecosystem trust erosion — SSPL switch caused major distributions (Debian, Fedora, RHEL) to drop MongoDB packages. Sovereign customers may have policies against non-OSI licenses.
- No referential integrity (app-enforced)
- Limited join capabilities
- Multi-tenancy requires additional logic
- No row-level security
- Less mature migration tooling
Rejected because: Licensing uncertainty is incompatible with StellaOps' sovereign-first positioning. Control-plane domains are also fundamentally relational with strong consistency requirements.
Option B: Hybrid (PostgreSQL + MongoDB)
Pros:
- Use each database for appropriate workloads
- Gradual migration possible
Cons:
- Two databases to operate and monitor
- Complex deployment
- Cross-database consistency challenges
- Higher operational burden
Rejected because: Unified PostgreSQL approach is simpler and sufficient for all control-plane needs.
Option C: CockroachDB / YugabyteDB
Pros:
- PostgreSQL-compatible
- Built-in horizontal scaling
- Multi-region capabilities
Cons:
- Additional operational complexity
- Less mature than PostgreSQL
- Overkill for current scale
- Air-gap deployment challenges
Rejected because: PostgreSQL provides sufficient scale and simpler operations for current requirements. Can revisit if horizontal scaling becomes necessary.
References
- docs/db/README.md: Database documentation index
- docs/db/SPECIFICATION.md: Schema design specification
- docs/db/MIGRATION_STRATEGY.md: Migration execution strategy
- docs/db/RULES.md: Database coding rules
- docs/07_HIGH_LEVEL_ARCHITECTURE.md: High-level architecture overview