17 KiB
MongoDB to PostgreSQL Conversion Plan
Version: 2.0.0 Status: ✅ CONVERSION COMPLETED — Sprint 4400 (November 2024) Created: 2025-11-28 Last Updated: 2025-12-25
COMPLETION NOTICE: MongoDB to PostgreSQL conversion completed in Sprint 4400 (November 2024). All control-plane services now use PostgreSQL 16+. MongoDB dependencies removed from all modules. This document is preserved for historical reference and lessons learned.
Executive Summary
This document outlines the strategic plan to convert (not migrate) StellaOps from MongoDB to PostgreSQL for control-plane domains. The conversion follows a "strangler fig" pattern, introducing PostgreSQL repositories alongside existing MongoDB implementations and gradually switching each bounded context.
Key Finding: StellaOps already has production-ready PostgreSQL patterns in the Orchestrator and Findings modules that serve as templates for all other modules.
Related Documents
| Document | Purpose |
|---|---|
| SPECIFICATION.md | Schema designs, naming conventions, data types |
| RULES.md | Database coding rules and patterns |
| VERIFICATION.md | Testing and verification requirements |
| tasks/ | Detailed task definitions per phase |
1. Principles & Scope
1.1 Goals
Convert control-plane domains from MongoDB to PostgreSQL:
| Domain | Current DB | Target | Priority |
|---|---|---|---|
| Authority | stellaops_authority |
PostgreSQL | P0 |
| Scheduler | stellaops_scheduler |
PostgreSQL | P0 |
| Notify | stellaops_notify |
PostgreSQL | P1 |
| Policy | stellaops_policy |
PostgreSQL | P1 |
| Vulnerabilities (Concelier) | concelier |
PostgreSQL | P2 |
| VEX & Graph (Excititor) | excititor |
PostgreSQL | P2 |
| PacksRegistry | stellaops_packs |
PostgreSQL | P3 |
| IssuerDirectory | stellaops_issuer |
PostgreSQL | P3 |
1.2 Non-Goals
- Scanner result storage (remains object storage + Mongo for now)
- Real-time event streams (separate infrastructure)
- Legacy data archive (can remain in MongoDB read-only)
1.3 Constraints
MUST Preserve:
- Deterministic, replayable scans
- "Preserve/prune source" rule for Concelier/Excititor
- Lattice logic in
Scanner.WebService(not in DB) - Air-gap friendliness and offline-kit packaging
- Multi-tenant isolation patterns
- Zero downtime during conversion
1.4 Conversion vs Migration
This is a conversion, not a 1:1 document→row mapping:
| Approach | When to Use |
|---|---|
| Normalize | Identities, jobs, schedules, relationships |
| Keep JSONB | Advisory payloads, provenance trails, evidence manifests |
| Drop/Archive | Ephemeral data (caches, locks), historical logs |
2. Architecture
2.1 Strangler Fig Pattern
┌─────────────────────────────────────────────────────────────┐
│ Service Layer │
├─────────────────────────────────────────────────────────────┤
│ Repository Interface │
│ (e.g., IScheduleRepository) │
├──────────────────────┬──────────────────────────────────────┤
│ MongoRepository │ PostgresRepository │
│ (existing) │ (new) │
├──────────────────────┴──────────────────────────────────────┤
│ DI Container (configured switch) │
└─────────────────────────────────────────────────────────────┘
2.2 Configuration-Driven Backend Selection
{
"Persistence": {
"Authority": "Postgres",
"Scheduler": "Postgres",
"Concelier": "Mongo",
"Excititor": "Mongo",
"Notify": "Postgres",
"Policy": "Mongo"
}
}
2.3 Existing PostgreSQL Patterns
The codebase already contains production-ready patterns:
| Module | Location | Reusable Components |
|---|---|---|
| Orchestrator | src/Orchestrator/.../Infrastructure/Postgres/ |
DataSource, tenant context, repository pattern |
| Findings | src/Findings/StellaOps.Findings.Ledger/Infrastructure/Postgres/ |
Ledger events, Merkle anchors, projections |
Reference Implementation: OrchestratorDataSource.cs
3. Data Tiering
3.1 Tier Definitions
| Tier | Description | Strategy |
|---|---|---|
| A | Critical business data | Full conversion with verification |
| B | Important but recoverable | Convert active records only |
| C | Ephemeral/cache data | Fresh start, no migration |
3.2 Module Tiering
Authority
| Collection | Tier | Strategy |
|---|---|---|
authority_users |
A | Full conversion |
authority_clients |
A | Full conversion |
authority_scopes |
A | Full conversion |
authority_tokens |
B | Active tokens only |
authority_service_accounts |
A | Full conversion |
authority_login_attempts |
B | Recent 90 days |
authority_revocations |
A | Full conversion |
Scheduler
| Collection | Tier | Strategy |
|---|---|---|
schedules |
A | Full conversion |
runs |
B | Recent 180 days |
graph_jobs |
B | Active/recent only |
policy_jobs |
B | Active/recent only |
impact_snapshots |
B | Recent 90 days |
locks |
C | Fresh start |
Concelier (Vulnerabilities)
| Collection | Tier | Strategy |
|---|---|---|
advisory |
A | Full conversion |
advisory_raw |
B | GridFS refs only |
alias |
A | Full conversion |
affected |
A | Full conversion |
source |
A | Full conversion |
source_state |
A | Full conversion |
jobs, locks |
C | Fresh start |
Excititor (VEX)
| Collection | Tier | Strategy |
|---|---|---|
vex.statements |
A | Full conversion |
vex.observations |
A | Full conversion |
vex.linksets |
A | Full conversion |
vex.consensus |
A | Full conversion |
vex.raw |
B | Active/recent only |
vex.cache |
C | Fresh start |
4. Execution Phases
Phase Overview
Phase 0: Foundations [1 sprint]
│
├─→ Phase 1: Authority [1 sprint]
│
├─→ Phase 2: Scheduler [1 sprint]
│
├─→ Phase 3: Notify [1 sprint]
│
├─→ Phase 4: Policy [1 sprint]
│
└─→ Phase 5: Concelier [2 sprints]
│
└─→ Phase 6: Excititor [2-3 sprints]
│
└─→ Phase 7: Cleanup [1 sprint]
Phase Summary
| Phase | Scope | Duration | Dependencies | Deliverable |
|---|---|---|---|---|
| 0 | Foundations | 1 sprint | None | PostgreSQL infrastructure, shared library |
| 1 | Authority | 1 sprint | Phase 0 | Identity management on PostgreSQL |
| 2 | Scheduler | 1 sprint | Phase 0 | Job scheduling on PostgreSQL |
| 3 | Notify | 1 sprint | Phase 0 | Notifications on PostgreSQL |
| 4 | Policy | 1 sprint | Phase 0 | Policy engine on PostgreSQL |
| 5 | Concelier | 2 sprints | Phase 0 | Vulnerability index on PostgreSQL |
| 6 | Excititor | 2-3 sprints | Phase 5 | VEX & graphs on PostgreSQL |
| 7 | Cleanup | 1 sprint | All | MongoDB retired, docs updated |
Total: 10-12 sprints
Detailed Task Definitions
See:
- tasks/PHASE_0_FOUNDATIONS.md
- tasks/PHASE_1_AUTHORITY.md
- tasks/PHASE_2_SCHEDULER.md
- tasks/PHASE_3_NOTIFY.md
- tasks/PHASE_4_POLICY.md
- tasks/PHASE_5_VULNERABILITIES.md
- tasks/PHASE_6_VEX_GRAPH.md
- tasks/PHASE_7_CLEANUP.md
5. Conversion Strategy
5.1 Per-Module Approach
1. Create PostgreSQL storage project
2. Implement schema migrations
3. Implement repository interfaces
4. Add configuration switch
5. (Retired) Dual-write was used during migration for Tier A; all modules are now Postgres-only.
6. Run verification tests
7. Switch to PostgreSQL-only
8. Archive MongoDB data
5.2 Dual-Write Pattern
For Tier A data requiring historical continuity:
┌──────────────────────────────────────────────────────────────┐
│ DualWriteRepository │
├──────────────────────────────────────────────────────────────┤
│ Write: PostgreSQL (primary) + MongoDB (secondary) │
│ Read: PostgreSQL (primary) → MongoDB (fallback) │
│ Config: WriteToBoth, FallbackToMongo, ConvertOnRead │
└──────────────────────────────────────────────────────────────┘
5.3 Fresh Start Pattern
For Tier C ephemeral data:
┌──────────────────────────────────────────────────────────────┐
│ 1. Deploy PostgreSQL schema │
│ 2. Switch configuration to PostgreSQL │
│ 3. New data goes to PostgreSQL only │
│ 4. Old MongoDB data ages out naturally │
└──────────────────────────────────────────────────────────────┘
6. Risk Assessment
6.1 Technical Risks
| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Data loss during conversion | High | Low | Dual-write mode, extensive verification |
| Performance regression | Medium | Medium | Load testing before switch, index optimization |
| Determinism violation | High | Medium | Automated verification tests, parallel pipeline |
| Schema evolution conflicts | Medium | Low | Migration framework, schema versioning |
| Transaction semantics differences | Medium | Low | Code review, integration tests |
6.2 Operational Risks
| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Extended conversion timeline | Medium | Medium | Phase-based approach, clear milestones |
| Team learning curve | Low | Medium | Reference implementations, documentation |
| Rollback complexity | Medium | Low | Keep Mongo data until verified, feature flags |
6.3 Rollback Strategy
Each phase has independent rollback capability:
| Level | Action | Recovery Time |
|---|---|---|
| Configuration | Change Persistence:<Module> to Mongo |
Minutes |
| Data | MongoDB data retained during dual-write | None needed (historical note; dual-write ended after cutover) |
| Code | Git revert (PostgreSQL code isolated) | Hours |
7. Success Criteria
7.1 Per-Module Criteria
- All existing integration tests pass with PostgreSQL backend
- No performance regression >10% on critical paths
- Deterministic outputs verified against MongoDB baseline
- Zero data loss during conversion
- Tenant isolation verified
7.2 Overall Criteria
- All control-plane modules running on PostgreSQL
- MongoDB retired from production for converted modules
- Air-gap kit updated with PostgreSQL support
- Documentation updated for PostgreSQL operations
- Runbooks updated for PostgreSQL troubleshooting
8. Project Structure
8.1 New Projects
src/
├── Shared/
│ └── StellaOps.Infrastructure.Postgres/
│ ├── DataSourceBase.cs
│ ├── Migrations/
│ │ ├── IPostgresMigration.cs
│ │ └── PostgresMigrationRunner.cs
│ ├── Extensions/
│ │ └── NpgsqlExtensions.cs
│ └── ServiceCollectionExtensions.cs
│
├── Authority/
│ └── __Libraries/
│ └── StellaOps.Authority.Storage.Postgres/
│ ├── AuthorityDataSource.cs
│ ├── Repositories/
│ ├── Migrations/
│ └── ServiceCollectionExtensions.cs
│
├── Scheduler/
│ └── __Libraries/
│ └── StellaOps.Scheduler.Storage.Postgres/
│
├── Notify/
│ └── __Libraries/
│ └── StellaOps.Notify.Storage.Postgres/
│
├── Policy/
│ └── __Libraries/
│ └── StellaOps.Policy.Storage.Postgres/
│
├── Concelier/
│ └── __Libraries/
│ └── StellaOps.Concelier.Storage.Postgres/
│
└── Excititor/
└── __Libraries/
└── StellaOps.Excititor.Storage.Postgres/
8.2 Schema Files
docs/db/
├── schemas/
│ ├── authority.sql
│ ├── vuln.sql
│ ├── vex.sql
│ ├── scheduler.sql
│ ├── notify.sql
│ └── policy.sql
9. Timeline
9.1 Sprint Schedule
| Sprint | Phase | Focus |
|---|---|---|
| 1 | 0 | PostgreSQL infrastructure, shared library |
| 2 | 1 | Authority module conversion |
| 3 | 2 | Scheduler module conversion |
| 4 | 3 | Notify module conversion |
| 5 | 4 | Policy module conversion |
| 6-7 | 5 | Concelier/Vulnerability conversion |
| 8-10 | 6 | Excititor/VEX conversion |
| 11 | 7 | Cleanup, optimization, documentation |
9.2 Milestones
| Milestone | Sprint | Criteria |
|---|---|---|
| M1: Infrastructure Ready | 1 | PostgreSQL cluster operational, CI tests passing |
| M2: Identity Converted | 2 | Authority on PostgreSQL, auth flows working |
| M3: Scheduling Converted | 3 | Scheduler on PostgreSQL, jobs executing |
| M4: Core Services Converted | 5 | Notify + Policy on PostgreSQL |
| M5: Vulnerability Index Converted | 7 | Concelier on PostgreSQL, scans deterministic |
| M6: VEX Converted | 10 | Excititor on PostgreSQL, graphs stable |
| M7: MongoDB Retired | 11 | All modules converted, Mongo archived |
10. Governance
10.1 Decision Log
| Date | Decision | Rationale | Approver |
|---|---|---|---|
| 2025-11-28 | Strangler fig pattern | Allows gradual rollout with rollback | Architecture Team |
| 2025-11-28 | JSONB for semi-structured data | Preserves flexibility, simplifies conversion | Architecture Team |
| 2025-11-28 | Phase 0 first | Infrastructure must be stable before modules | Architecture Team |
10.2 Change Control
Changes to this plan require:
- Impact assessment documented
- Risk analysis updated
- Approval from Architecture Team
- Updated task definitions in
docs/db/tasks/
10.3 Status Reporting
Weekly status updates in sprint files tracking:
- Tasks completed
- Blockers encountered
- Verification results
- Next sprint objectives
Appendix A: Reference Implementation
DataSource Pattern
public sealed class ModuleDataSource : IAsyncDisposable
{
private readonly NpgsqlDataSource _dataSource;
public async Task<NpgsqlConnection> OpenConnectionAsync(
string tenantId,
CancellationToken cancellationToken = default)
{
var connection = await _dataSource.OpenConnectionAsync(cancellationToken);
await ConfigureSessionAsync(connection, tenantId, cancellationToken);
return connection;
}
private static async Task ConfigureSessionAsync(
NpgsqlConnection connection,
string tenantId,
CancellationToken cancellationToken)
{
await using var cmd = connection.CreateCommand();
cmd.CommandText = $"""
SET app.tenant_id = '{tenantId}';
SET timezone = 'UTC';
SET statement_timeout = '30s';
""";
await cmd.ExecuteNonQueryAsync(cancellationToken);
}
}
Repository Pattern
See RULES.md Section 1 for complete repository implementation guidelines.
Appendix B: Glossary
| Term | Definition |
|---|---|
| Strangler Fig | Pattern where new system grows alongside old, gradually replacing it |
| Dual-Write | Writing to both MongoDB and PostgreSQL during transition |
| Tier A/B/C | Data classification by criticality for migration strategy |
| DataSource | Npgsql connection factory with tenant context configuration |
| Determinism | Property that same inputs always produce same outputs |
Document Version: 2.0.0 Last Updated: 2025-11-28