# MongoDB to PostgreSQL Conversion Plan

**Version:** 2.0.0
**Status:** APPROVED
**Created:** 2025-11-28
**Last Updated:** 2025-11-28

---

## Executive Summary

This document outlines the strategic plan to **convert** (not migrate) StellaOps from MongoDB to PostgreSQL for control-plane domains. The conversion follows a "strangler fig" pattern: PostgreSQL repositories are introduced alongside the existing MongoDB implementations, and each bounded context is switched over gradually.

**Key Finding:** StellaOps already has production-ready PostgreSQL patterns in the Orchestrator and Findings modules that serve as templates for all other modules.

### Related Documents

| Document | Purpose |
|----------|---------|
| [SPECIFICATION.md](./SPECIFICATION.md) | Schema designs, naming conventions, data types |
| [RULES.md](./RULES.md) | Database coding rules and patterns |
| [VERIFICATION.md](./VERIFICATION.md) | Testing and verification requirements |
| [tasks/](./tasks/) | Detailed task definitions per phase |

---

## 1. Principles & Scope

### 1.1 Goals

Convert **control-plane** domains from MongoDB to PostgreSQL:

| Domain | Current DB | Target | Priority |
|--------|-----------|--------|----------|
| Authority | `stellaops_authority` | PostgreSQL | P0 |
| Scheduler | `stellaops_scheduler` | PostgreSQL | P0 |
| Notify | `stellaops_notify` | PostgreSQL | P1 |
| Policy | `stellaops_policy` | PostgreSQL | P1 |
| Vulnerabilities (Concelier) | `concelier` | PostgreSQL | P2 |
| VEX & Graph (Excititor) | `excititor` | PostgreSQL | P2 |
| PacksRegistry | `stellaops_packs` | PostgreSQL | P3 |
| IssuerDirectory | `stellaops_issuer` | PostgreSQL | P3 |

### 1.2 Non-Goals

- Scanner result storage (remains object storage + Mongo for now)
- Real-time event streams (separate infrastructure)
- Legacy data archive (can remain in MongoDB read-only)

### 1.3 Constraints

**MUST Preserve:**

- Deterministic, replayable scans
- "Preserve/prune source" rule for Concelier/Excititor
- Lattice logic in `Scanner.WebService` (not in DB)
- Air-gap friendliness and offline-kit packaging
- Multi-tenant isolation patterns
- Zero downtime during conversion

### 1.4 Conversion vs Migration

This is a **conversion**, not a 1:1 document→row mapping:

| Approach | When to Use |
|----------|-------------|
| **Normalize** | Identities, jobs, schedules, relationships |
| **Keep JSONB** | Advisory payloads, provenance trails, evidence manifests |
| **Drop/Archive** | Ephemeral data (caches, locks), historical logs |

---

## 2. Architecture

### 2.1 Strangler Fig Pattern

```
┌─────────────────────────────────────────────────────────────┐
│                        Service Layer                        │
├─────────────────────────────────────────────────────────────┤
│                    Repository Interface                     │
│                 (e.g., IScheduleRepository)                 │
├──────────────────────┬──────────────────────────────────────┤
│   MongoRepository    │      PostgresRepository              │
│   (existing)         │      (new)                           │
├──────────────────────┴──────────────────────────────────────┤
│              DI Container (configured switch)               │
└─────────────────────────────────────────────────────────────┘
```

### 2.2 Configuration-Driven Backend Selection

```json
{
  "Persistence": {
    "Authority": "Postgres",
    "Scheduler": "Postgres",
    "Concelier": "Mongo",
    "Excititor": "Mongo",
    "Notify": "Postgres",
    "Policy": "Mongo"
  }
}
```

### 2.3 Existing PostgreSQL Patterns

The codebase already contains production-ready patterns:

| Module | Location | Reusable Components |
|--------|----------|---------------------|
| Orchestrator | `src/Orchestrator/.../Infrastructure/Postgres/` | DataSource, tenant context, repository pattern |
| Findings | `src/Findings/StellaOps.Findings.Ledger/Infrastructure/Postgres/` | Ledger events, Merkle anchors, projections |

**Reference Implementation:** `OrchestratorDataSource.cs`

---

## 3. Data Tiering

### 3.1 Tier Definitions

| Tier | Description | Strategy |
|------|-------------|----------|
| **A** | Critical business data | Full conversion with verification |
| **B** | Important but recoverable | Convert active records only |
| **C** | Ephemeral/cache data | Fresh start, no migration |

### 3.2 Module Tiering

#### Authority

| Collection | Tier | Strategy |
|------------|------|----------|
| `authority_users` | A | Full conversion |
| `authority_clients` | A | Full conversion |
| `authority_scopes` | A | Full conversion |
| `authority_tokens` | B | Active tokens only |
| `authority_service_accounts` | A | Full conversion |
| `authority_login_attempts` | B | Recent 90 days |
| `authority_revocations` | A | Full conversion |

#### Scheduler

| Collection | Tier | Strategy |
|------------|------|----------|
| `schedules` | A | Full conversion |
| `runs` | B | Recent 180 days |
| `graph_jobs` | B | Active/recent only |
| `policy_jobs` | B | Active/recent only |
| `impact_snapshots` | B | Recent 90 days |
| `locks` | C | Fresh start |

#### Concelier (Vulnerabilities)

| Collection | Tier | Strategy |
|------------|------|----------|
| `advisory` | A | Full conversion |
| `advisory_raw` | B | GridFS refs only |
| `alias` | A | Full conversion |
| `affected` | A | Full conversion |
| `source` | A | Full conversion |
| `source_state` | A | Full conversion |
| `jobs`, `locks` | C | Fresh start |

#### Excititor (VEX)

| Collection | Tier | Strategy |
|------------|------|----------|
| `vex.statements` | A | Full conversion |
| `vex.observations` | A | Full conversion |
| `vex.linksets` | A | Full conversion |
| `vex.consensus` | A | Full conversion |
| `vex.raw` | B | Active/recent only |
| `vex.cache` | C | Fresh start |

---

## 4. Execution Phases

### Phase Overview

```
Phase 0: Foundations                          [1 sprint]
    │
    ├─→ Phase 1: Authority                    [1 sprint]
    │
    ├─→ Phase 2: Scheduler                    [1 sprint]
    │
    ├─→ Phase 3: Notify                       [1 sprint]
    │
    ├─→ Phase 4: Policy                       [1 sprint]
    │
    └─→ Phase 5: Concelier                    [2 sprints]
            │
            └─→ Phase 6: Excititor            [2-3 sprints]
                    │
                    └─→ Phase 7: Cleanup      [1 sprint]
```

### Phase Summary

| Phase | Scope | Duration | Dependencies | Deliverable |
|-------|-------|----------|--------------|-------------|
| 0 | Foundations | 1 sprint | None | PostgreSQL infrastructure, shared library |
| 1 | Authority | 1 sprint | Phase 0 | Identity management on PostgreSQL |
| 2 | Scheduler | 1 sprint | Phase 0 | Job scheduling on PostgreSQL |
| 3 | Notify | 1 sprint | Phase 0 | Notifications on PostgreSQL |
| 4 | Policy | 1 sprint | Phase 0 | Policy engine on PostgreSQL |
| 5 | Concelier | 2 sprints | Phase 0 | Vulnerability index on PostgreSQL |
| 6 | Excititor | 2-3 sprints | Phase 5 | VEX & graphs on PostgreSQL |
| 7 | Cleanup | 1 sprint | All | MongoDB retired, docs updated |

**Total: 10-11 sprints** (sum of the phase durations above, run sequentially as in the Section 9.1 schedule)

### Detailed Task Definitions

See:

- [tasks/PHASE_0_FOUNDATIONS.md](./tasks/PHASE_0_FOUNDATIONS.md)
- [tasks/PHASE_1_AUTHORITY.md](./tasks/PHASE_1_AUTHORITY.md)
- [tasks/PHASE_2_SCHEDULER.md](./tasks/PHASE_2_SCHEDULER.md)
- [tasks/PHASE_3_NOTIFY.md](./tasks/PHASE_3_NOTIFY.md)
- [tasks/PHASE_4_POLICY.md](./tasks/PHASE_4_POLICY.md)
- [tasks/PHASE_5_VULNERABILITIES.md](./tasks/PHASE_5_VULNERABILITIES.md)
- [tasks/PHASE_6_VEX_GRAPH.md](./tasks/PHASE_6_VEX_GRAPH.md)
- [tasks/PHASE_7_CLEANUP.md](./tasks/PHASE_7_CLEANUP.md)

---

## 5. Conversion Strategy

### 5.1 Per-Module Approach

```
1. Create PostgreSQL storage project
2. Implement schema migrations
3. Implement repository interfaces
4. Add configuration switch
5. Enable dual-write (if Tier A)
6. Run verification tests
7. Switch to PostgreSQL-only
8. Archive MongoDB data
```

### 5.2 Dual-Write Pattern

For Tier A data requiring historical continuity:

```
┌──────────────────────────────────────────────────────────────┐
│                     DualWriteRepository                      │
├──────────────────────────────────────────────────────────────┤
│  Write: PostgreSQL (primary) + MongoDB (secondary)           │
│  Read: PostgreSQL (primary) → MongoDB (fallback)             │
│  Config: WriteToBoth, FallbackToMongo, ConvertOnRead         │
└──────────────────────────────────────────────────────────────┘
```

### 5.3 Fresh Start Pattern

For Tier C ephemeral data:

```
┌──────────────────────────────────────────────────────────────┐
│  1. Deploy PostgreSQL schema                                 │
│  2. Switch configuration to PostgreSQL                       │
│  3. New data goes to PostgreSQL only                         │
│  4. Old MongoDB data ages out naturally                      │
└──────────────────────────────────────────────────────────────┘
```

---

## 6. Risk Assessment

### 6.1 Technical Risks

| Risk | Impact | Likelihood | Mitigation |
|------|--------|------------|------------|
| Data loss during conversion | High | Low | Dual-write mode, extensive verification |
| Performance regression | Medium | Medium | Load testing before switch, index optimization |
| Determinism violation | High | Medium | Automated verification tests, parallel pipeline |
| Schema evolution conflicts | Medium | Low | Migration framework, schema versioning |
| Transaction semantics differences | Medium | Low | Code review, integration tests |

### 6.2 Operational Risks

| Risk | Impact | Likelihood | Mitigation |
|------|--------|------------|------------|
| Extended conversion timeline | Medium | Medium | Phase-based approach, clear milestones |
| Team learning curve | Low | Medium | Reference implementations, documentation |
| Rollback complexity | Medium | Low | Keep Mongo data until verified, feature flags |

### 6.3 Rollback Strategy

Each phase has independent rollback capability:

| Level | Action | Recovery Time |
|-------|--------|---------------|
| Configuration | Change the module's `Persistence:` entry to `Mongo` | Minutes |
| Data | MongoDB data retained during dual-write | None needed |
| Code | Git revert (PostgreSQL code isolated) | Hours |

---

## 7. Success Criteria

### 7.1 Per-Module Criteria

- [ ] All existing integration tests pass with PostgreSQL backend
- [ ] No performance regression >10% on critical paths
- [ ] Deterministic outputs verified against MongoDB baseline
- [ ] Zero data loss during conversion
- [ ] Tenant isolation verified

### 7.2 Overall Criteria

- [ ] All control-plane modules running on PostgreSQL
- [ ] MongoDB retired from production for converted modules
- [ ] Air-gap kit updated with PostgreSQL support
- [ ] Documentation updated for PostgreSQL operations
- [ ] Runbooks updated for PostgreSQL troubleshooting

---

## 8. Project Structure

### 8.1 New Projects

```
src/
├── Shared/
│   └── StellaOps.Infrastructure.Postgres/
│       ├── DataSourceBase.cs
│       ├── Migrations/
│       │   ├── IPostgresMigration.cs
│       │   └── PostgresMigrationRunner.cs
│       ├── Extensions/
│       │   └── NpgsqlExtensions.cs
│       └── ServiceCollectionExtensions.cs
│
├── Authority/
│   └── __Libraries/
│       └── StellaOps.Authority.Storage.Postgres/
│           ├── AuthorityDataSource.cs
│           ├── Repositories/
│           ├── Migrations/
│           └── ServiceCollectionExtensions.cs
│
├── Scheduler/
│   └── __Libraries/
│       └── StellaOps.Scheduler.Storage.Postgres/
│
├── Notify/
│   └── __Libraries/
│       └── StellaOps.Notify.Storage.Postgres/
│
├── Policy/
│   └── __Libraries/
│       └── StellaOps.Policy.Storage.Postgres/
│
├── Concelier/
│   └── __Libraries/
│       └── StellaOps.Concelier.Storage.Postgres/
│
└── Excititor/
    └── __Libraries/
        └── StellaOps.Excititor.Storage.Postgres/
```

### 8.2 Schema Files

```
docs/db/
├── schemas/
│   ├── authority.sql
│   ├── vuln.sql
│   ├── vex.sql
│   ├── scheduler.sql
│   ├── notify.sql
│   └── policy.sql
```

---

## 9. Timeline

### 9.1 Sprint Schedule

| Sprint | Phase | Focus |
|--------|-------|-------|
| 1 | 0 | PostgreSQL infrastructure, shared library |
| 2 | 1 | Authority module conversion |
| 3 | 2 | Scheduler module conversion |
| 4 | 3 | Notify module conversion |
| 5 | 4 | Policy module conversion |
| 6-7 | 5 | Concelier/Vulnerability conversion |
| 8-10 | 6 | Excititor/VEX conversion |
| 11 | 7 | Cleanup, optimization, documentation |

### 9.2 Milestones

| Milestone | Sprint | Criteria |
|-----------|--------|----------|
| M1: Infrastructure Ready | 1 | PostgreSQL cluster operational, CI tests passing |
| M2: Identity Converted | 2 | Authority on PostgreSQL, auth flows working |
| M3: Scheduling Converted | 3 | Scheduler on PostgreSQL, jobs executing |
| M4: Core Services Converted | 5 | Notify + Policy on PostgreSQL |
| M5: Vulnerability Index Converted | 7 | Concelier on PostgreSQL, scans deterministic |
| M6: VEX Converted | 10 | Excititor on PostgreSQL, graphs stable |
| M7: MongoDB Retired | 11 | All modules converted, Mongo archived |

---

## 10. Governance

### 10.1 Decision Log

| Date | Decision | Rationale | Approver |
|------|----------|-----------|----------|
| 2025-11-28 | Strangler fig pattern | Allows gradual rollout with rollback | Architecture Team |
| 2025-11-28 | JSONB for semi-structured data | Preserves flexibility, simplifies conversion | Architecture Team |
| 2025-11-28 | Phase 0 first | Infrastructure must be stable before modules | Architecture Team |

### 10.2 Change Control

Changes to this plan require:

1. Impact assessment documented
2. Risk analysis updated
3. Approval from Architecture Team
4. Updated task definitions in `docs/db/tasks/`

### 10.3 Status Reporting

Weekly status updates in sprint files tracking:

- Tasks completed
- Blockers encountered
- Verification results
- Next sprint objectives

---

## Appendix A: Reference Implementation

### DataSource Pattern

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Npgsql;

public sealed class ModuleDataSource : IAsyncDisposable
{
    private readonly NpgsqlDataSource _dataSource;

    public ModuleDataSource(NpgsqlDataSource dataSource)
        => _dataSource = dataSource ?? throw new ArgumentNullException(nameof(dataSource));

    // Opens a connection and scopes its session to the given tenant.
    public async Task<NpgsqlConnection> OpenConnectionAsync(
        string tenantId,
        CancellationToken cancellationToken = default)
    {
        var connection = await _dataSource.OpenConnectionAsync(cancellationToken);
        try
        {
            await ConfigureSessionAsync(connection, tenantId, cancellationToken);
            return connection;
        }
        catch
        {
            // Do not leak the connection if session setup fails.
            await connection.DisposeAsync();
            throw;
        }
    }

    private static async Task ConfigureSessionAsync(
        NpgsqlConnection connection,
        string tenantId,
        CancellationToken cancellationToken)
    {
        await using var cmd = connection.CreateCommand();
        // SET cannot take bind parameters, so the tenant id goes through
        // set_config() as a parameter instead of being interpolated into the SQL.
        cmd.CommandText = """
            SELECT set_config('app.tenant_id', @tenant_id, false);
            SET timezone = 'UTC';
            SET statement_timeout = '30s';
            """;
        cmd.Parameters.AddWithValue("tenant_id", tenantId);
        await cmd.ExecuteNonQueryAsync(cancellationToken);
    }

    // Assumes this class owns the underlying NpgsqlDataSource.
    public ValueTask DisposeAsync() => _dataSource.DisposeAsync();
}
```

### Repository Pattern

See [RULES.md](./RULES.md) Section 1 for complete repository implementation guidelines.

---

## Appendix B: Glossary

| Term | Definition |
|------|------------|
| **Strangler Fig** | Pattern where new system grows alongside old, gradually replacing it |
| **Dual-Write** | Writing to both MongoDB and PostgreSQL during transition |
| **Tier A/B/C** | Data classification by criticality for migration strategy |
| **DataSource** | Npgsql connection factory with tenant context configuration |
| **Determinism** | Property that same inputs always produce same outputs |

---

*Document Version: 2.0.0*
*Last Updated: 2025-11-28*
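
To make the Dual-Write glossary entry and the routing in Section 5.2 more concrete, the following is a minimal sketch of how the write and read paths might be wired. It is an illustration under stated assumptions, not the StellaOps implementation: the `Schedule` record, the method names on `IScheduleRepository`, `DualWriteScheduleRepository`, and `InMemoryScheduleRepository` are all hypothetical. Only the interface name and the three option names (`WriteToBoth`, `FallbackToMongo`, `ConvertOnRead`) are taken from this plan. The in-memory repositories stand in for the real MongoDB and PostgreSQL backends so the sketch runs without a database.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Usage: a record that exists only in the (stand-in) Mongo store is served via
// the fallback path and, with ConvertOnRead enabled, copied into PostgreSQL.
var postgres = new InMemoryScheduleRepository();
var mongo = new InMemoryScheduleRepository();
await mongo.UpsertAsync(new Schedule("nightly", "tenant-a", "0 2 * * *"));

var repo = new DualWriteScheduleRepository(
    postgres, mongo,
    new DualWriteOptions(WriteToBoth: true, FallbackToMongo: true, ConvertOnRead: true));

var viaFallback = await repo.GetAsync("tenant-a", "nightly");
var copiedForward = await postgres.GetAsync("tenant-a", "nightly");
Console.WriteLine($"fallback hit: {(viaFallback is not null)}, copied forward: {(copiedForward is not null)}");

// Hypothetical domain type, for illustration only.
public sealed record Schedule(string Id, string TenantId, string Cron);

// Interface name comes from Section 2.1; the members are assumptions.
public interface IScheduleRepository
{
    Task UpsertAsync(Schedule schedule, CancellationToken ct = default);
    Task<Schedule?> GetAsync(string tenantId, string id, CancellationToken ct = default);
}

// Option names mirror the "Config" row of the dual-write diagram.
public sealed record DualWriteOptions(bool WriteToBoth, bool FallbackToMongo, bool ConvertOnRead);

public sealed class DualWriteScheduleRepository : IScheduleRepository
{
    private readonly IScheduleRepository _postgres;
    private readonly IScheduleRepository _mongo;
    private readonly DualWriteOptions _options;

    public DualWriteScheduleRepository(
        IScheduleRepository postgres, IScheduleRepository mongo, DualWriteOptions options)
    {
        _postgres = postgres;
        _mongo = mongo;
        _options = options;
    }

    public async Task UpsertAsync(Schedule schedule, CancellationToken ct = default)
    {
        // PostgreSQL is primary: if this throws, the secondary write is skipped.
        await _postgres.UpsertAsync(schedule, ct);
        if (_options.WriteToBoth)
        {
            // The two writes are not atomic; a real implementation needs a
            // reconciliation or retry policy for secondary failures.
            await _mongo.UpsertAsync(schedule, ct);
        }
    }

    public async Task<Schedule?> GetAsync(string tenantId, string id, CancellationToken ct = default)
    {
        var found = await _postgres.GetAsync(tenantId, id, ct);
        if (found is not null || !_options.FallbackToMongo)
        {
            return found;
        }

        var legacy = await _mongo.GetAsync(tenantId, id, ct);
        if (legacy is not null && _options.ConvertOnRead)
        {
            // Copy the legacy record forward so later reads hit PostgreSQL.
            await _postgres.UpsertAsync(legacy, ct);
        }
        return legacy;
    }
}

// In-memory stand-in for either backend, keyed by (tenant, id).
public sealed class InMemoryScheduleRepository : IScheduleRepository
{
    private readonly Dictionary<(string TenantId, string Id), Schedule> _rows = new();

    public Task UpsertAsync(Schedule schedule, CancellationToken ct = default)
    {
        _rows[(schedule.TenantId, schedule.Id)] = schedule;
        return Task.CompletedTask;
    }

    public Task<Schedule?> GetAsync(string tenantId, string id, CancellationToken ct = default)
        => Task.FromResult<Schedule?>(_rows.TryGetValue((tenantId, id), out var row) ? row : null);
}
```

Because the decorator implements the same repository interface as both backends, the DI switch from Section 2.1 could register it in place of either one during the dual-write window and drop it again once a module moves to PostgreSQL-only.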