Files
git.stella-ops.org/docs/db/CONVERSION_PLAN.md
StellaOps Bot 2548abc56f
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
AOC Guard CI / aoc-guard (push) Has been cancelled
AOC Guard CI / aoc-verify (push) Has been cancelled
Concelier Attestation Tests / attestation-tests (push) Has been cancelled
up
2025-11-29 01:35:49 +02:00

17 KiB

MongoDB to PostgreSQL Conversion Plan

Version: 2.0.0 Status: APPROVED Created: 2025-11-28 Last Updated: 2025-11-28


Executive Summary

This document outlines the strategic plan to convert (not migrate) StellaOps from MongoDB to PostgreSQL for control-plane domains. The conversion follows a "strangler fig" pattern, introducing PostgreSQL repositories alongside existing MongoDB implementations and gradually switching each bounded context.

Key Finding: StellaOps already has production-ready PostgreSQL patterns in the Orchestrator and Findings modules that serve as templates for all other modules.

Document Purpose
SPECIFICATION.md Schema designs, naming conventions, data types
RULES.md Database coding rules and patterns
VERIFICATION.md Testing and verification requirements
tasks/ Detailed task definitions per phase

1. Principles & Scope

1.1 Goals

Convert control-plane domains from MongoDB to PostgreSQL:

Domain Current DB Target Priority
Authority stellaops_authority PostgreSQL P0
Scheduler stellaops_scheduler PostgreSQL P0
Notify stellaops_notify PostgreSQL P1
Policy stellaops_policy PostgreSQL P1
Vulnerabilities (Concelier) concelier PostgreSQL P2
VEX & Graph (Excititor) excititor PostgreSQL P2
PacksRegistry stellaops_packs PostgreSQL P3
IssuerDirectory stellaops_issuer PostgreSQL P3

1.2 Non-Goals

  • Scanner result storage (remains object storage + Mongo for now)
  • Real-time event streams (separate infrastructure)
  • Legacy data archive (can remain in MongoDB read-only)

1.3 Constraints

MUST Preserve:

  • Deterministic, replayable scans
  • "Preserve/prune source" rule for Concelier/Excititor
  • Lattice logic in Scanner.WebService (not in DB)
  • Air-gap friendliness and offline-kit packaging
  • Multi-tenant isolation patterns
  • Zero downtime during conversion

1.4 Conversion vs Migration

This is a conversion, not a 1:1 document→row mapping:

Approach When to Use
Normalize Identities, jobs, schedules, relationships
Keep JSONB Advisory payloads, provenance trails, evidence manifests
Drop/Archive Ephemeral data (caches, locks), historical logs

2. Architecture

2.1 Strangler Fig Pattern

┌─────────────────────────────────────────────────────────────┐
│                      Service Layer                           │
├─────────────────────────────────────────────────────────────┤
│                    Repository Interface                      │
│              (e.g., IScheduleRepository)                     │
├──────────────────────┬──────────────────────────────────────┤
│   MongoRepository    │      PostgresRepository              │
│   (existing)         │      (new)                           │
├──────────────────────┴──────────────────────────────────────┤
│              DI Container (configured switch)                │
└─────────────────────────────────────────────────────────────┘

2.2 Configuration-Driven Backend Selection

{
  "Persistence": {
    "Authority": "Postgres",
    "Scheduler": "Postgres",
    "Concelier": "Mongo",
    "Excititor": "Mongo",
    "Notify": "Postgres",
    "Policy": "Mongo"
  }
}

2.3 Existing PostgreSQL Patterns

The codebase already contains production-ready patterns:

Module Location Reusable Components
Orchestrator src/Orchestrator/.../Infrastructure/Postgres/ DataSource, tenant context, repository pattern
Findings src/Findings/StellaOps.Findings.Ledger/Infrastructure/Postgres/ Ledger events, Merkle anchors, projections

Reference Implementation: OrchestratorDataSource.cs


3. Data Tiering

3.1 Tier Definitions

Tier Description Strategy
A Critical business data Full conversion with verification
B Important but recoverable Convert active records only
C Ephemeral/cache data Fresh start, no migration

3.2 Module Tiering

Authority

Collection Tier Strategy
authority_users A Full conversion
authority_clients A Full conversion
authority_scopes A Full conversion
authority_tokens B Active tokens only
authority_service_accounts A Full conversion
authority_login_attempts B Recent 90 days
authority_revocations A Full conversion

Scheduler

Collection Tier Strategy
schedules A Full conversion
runs B Recent 180 days
graph_jobs B Active/recent only
policy_jobs B Active/recent only
impact_snapshots B Recent 90 days
locks C Fresh start

Concelier (Vulnerabilities)

Collection Tier Strategy
advisory A Full conversion
advisory_raw B GridFS refs only
alias A Full conversion
affected A Full conversion
source A Full conversion
source_state A Full conversion
jobs, locks C Fresh start

Excititor (VEX)

Collection Tier Strategy
vex.statements A Full conversion
vex.observations A Full conversion
vex.linksets A Full conversion
vex.consensus A Full conversion
vex.raw B Active/recent only
vex.cache C Fresh start

4. Execution Phases

Phase Overview

Phase 0: Foundations           [1 sprint]
    │
    ├─→ Phase 1: Authority     [1 sprint]
    │
    ├─→ Phase 2: Scheduler     [1 sprint]
    │
    ├─→ Phase 3: Notify        [1 sprint]
    │
    ├─→ Phase 4: Policy        [1 sprint]
    │
    └─→ Phase 5: Concelier     [2 sprints]
            │
            └─→ Phase 6: Excititor [2-3 sprints]
                    │
                    └─→ Phase 7: Cleanup [1 sprint]

Phase Summary

Phase Scope Duration Dependencies Deliverable
0 Foundations 1 sprint None PostgreSQL infrastructure, shared library
1 Authority 1 sprint Phase 0 Identity management on PostgreSQL
2 Scheduler 1 sprint Phase 0 Job scheduling on PostgreSQL
3 Notify 1 sprint Phase 0 Notifications on PostgreSQL
4 Policy 1 sprint Phase 0 Policy engine on PostgreSQL
5 Concelier 2 sprints Phase 0 Vulnerability index on PostgreSQL
6 Excititor 2-3 sprints Phase 5 VEX & graphs on PostgreSQL
7 Cleanup 1 sprint All MongoDB retired, docs updated

Total: 10-12 sprints

Detailed Task Definitions

See:


5. Conversion Strategy

5.1 Per-Module Approach

1. Create PostgreSQL storage project
2. Implement schema migrations
3. Implement repository interfaces
4. Add configuration switch
5. Enable dual-write (if Tier A)
6. Run verification tests
7. Switch to PostgreSQL-only
8. Archive MongoDB data

5.2 Dual-Write Pattern

For Tier A data requiring historical continuity:

┌──────────────────────────────────────────────────────────────┐
│                   DualWriteRepository                         │
├──────────────────────────────────────────────────────────────┤
│  Write:  PostgreSQL (primary) + MongoDB (secondary)          │
│  Read:   PostgreSQL (primary) → MongoDB (fallback)           │
│  Config: WriteToBoth, FallbackToMongo, ConvertOnRead         │
└──────────────────────────────────────────────────────────────┘

5.3 Fresh Start Pattern

For Tier C ephemeral data:

┌──────────────────────────────────────────────────────────────┐
│  1. Deploy PostgreSQL schema                                  │
│  2. Switch configuration to PostgreSQL                        │
│  3. New data goes to PostgreSQL only                         │
│  4. Old MongoDB data ages out naturally                      │
└──────────────────────────────────────────────────────────────┘

6. Risk Assessment

6.1 Technical Risks

Risk Impact Likelihood Mitigation
Data loss during conversion High Low Dual-write mode, extensive verification
Performance regression Medium Medium Load testing before switch, index optimization
Determinism violation High Medium Automated verification tests, parallel pipeline
Schema evolution conflicts Medium Low Migration framework, schema versioning
Transaction semantics differences Medium Low Code review, integration tests

6.2 Operational Risks

Risk Impact Likelihood Mitigation
Extended conversion timeline Medium Medium Phase-based approach, clear milestones
Team learning curve Low Medium Reference implementations, documentation
Rollback complexity Medium Low Keep Mongo data until verified, feature flags

6.3 Rollback Strategy

Each phase has independent rollback capability:

Level Action Recovery Time
Configuration Change Persistence:<Module> to Mongo Minutes
Data MongoDB data retained during dual-write None needed
Code Git revert (PostgreSQL code isolated) Hours

7. Success Criteria

7.1 Per-Module Criteria

  • All existing integration tests pass with PostgreSQL backend
  • No performance regression >10% on critical paths
  • Deterministic outputs verified against MongoDB baseline
  • Zero data loss during conversion
  • Tenant isolation verified

7.2 Overall Criteria

  • All control-plane modules running on PostgreSQL
  • MongoDB retired from production for converted modules
  • Air-gap kit updated with PostgreSQL support
  • Documentation updated for PostgreSQL operations
  • Runbooks updated for PostgreSQL troubleshooting

8. Project Structure

8.1 New Projects

src/
├── Shared/
│   └── StellaOps.Infrastructure.Postgres/
│       ├── DataSourceBase.cs
│       ├── Migrations/
│       │   ├── IPostgresMigration.cs
│       │   └── PostgresMigrationRunner.cs
│       ├── Extensions/
│       │   └── NpgsqlExtensions.cs
│       └── ServiceCollectionExtensions.cs
│
├── Authority/
│   └── __Libraries/
│       └── StellaOps.Authority.Storage.Postgres/
│           ├── AuthorityDataSource.cs
│           ├── Repositories/
│           ├── Migrations/
│           └── ServiceCollectionExtensions.cs
│
├── Scheduler/
│   └── __Libraries/
│       └── StellaOps.Scheduler.Storage.Postgres/
│
├── Notify/
│   └── __Libraries/
│       └── StellaOps.Notify.Storage.Postgres/
│
├── Policy/
│   └── __Libraries/
│       └── StellaOps.Policy.Storage.Postgres/
│
├── Concelier/
│   └── __Libraries/
│       └── StellaOps.Concelier.Storage.Postgres/
│
└── Excititor/
    └── __Libraries/
        └── StellaOps.Excititor.Storage.Postgres/

8.2 Schema Files

docs/db/
├── schemas/
│   ├── authority.sql
│   ├── vuln.sql
│   ├── vex.sql
│   ├── scheduler.sql
│   ├── notify.sql
│   └── policy.sql

9. Timeline

9.1 Sprint Schedule

Sprint Phase Focus
1 0 PostgreSQL infrastructure, shared library
2 1 Authority module conversion
3 2 Scheduler module conversion
4 3 Notify module conversion
5 4 Policy module conversion
6-7 5 Concelier/Vulnerability conversion
8-10 6 Excititor/VEX conversion
11 7 Cleanup, optimization, documentation

9.2 Milestones

Milestone Sprint Criteria
M1: Infrastructure Ready 1 PostgreSQL cluster operational, CI tests passing
M2: Identity Converted 2 Authority on PostgreSQL, auth flows working
M3: Scheduling Converted 3 Scheduler on PostgreSQL, jobs executing
M4: Core Services Converted 5 Notify + Policy on PostgreSQL
M5: Vulnerability Index Converted 7 Concelier on PostgreSQL, scans deterministic
M6: VEX Converted 10 Excititor on PostgreSQL, graphs stable
M7: MongoDB Retired 11 All modules converted, Mongo archived

10. Governance

10.1 Decision Log

Date Decision Rationale Approver
2025-11-28 Strangler fig pattern Allows gradual rollout with rollback Architecture Team
2025-11-28 JSONB for semi-structured data Preserves flexibility, simplifies conversion Architecture Team
2025-11-28 Phase 0 first Infrastructure must be stable before modules Architecture Team

10.2 Change Control

Changes to this plan require:

  1. Impact assessment documented
  2. Risk analysis updated
  3. Approval from Architecture Team
  4. Updated task definitions in docs/db/tasks/

10.3 Status Reporting

Weekly status updates in sprint files tracking:

  • Tasks completed
  • Blockers encountered
  • Verification results
  • Next sprint objectives

Appendix A: Reference Implementation

DataSource Pattern

public sealed class ModuleDataSource : IAsyncDisposable
{
    private readonly NpgsqlDataSource _dataSource;

    public async Task<NpgsqlConnection> OpenConnectionAsync(
        string tenantId,
        CancellationToken cancellationToken = default)
    {
        var connection = await _dataSource.OpenConnectionAsync(cancellationToken);
        await ConfigureSessionAsync(connection, tenantId, cancellationToken);
        return connection;
    }

    private static async Task ConfigureSessionAsync(
        NpgsqlConnection connection,
        string tenantId,
        CancellationToken cancellationToken)
    {
        await using var cmd = connection.CreateCommand();
        cmd.CommandText = $"""
            SET app.tenant_id = '{tenantId}';
            SET timezone = 'UTC';
            SET statement_timeout = '30s';
            """;
        await cmd.ExecuteNonQueryAsync(cancellationToken);
    }
}

Repository Pattern

See RULES.md Section 1 for complete repository implementation guidelines.


Appendix B: Glossary

Term Definition
Strangler Fig Pattern where new system grows alongside old, gradually replacing it
Dual-Write Writing to both MongoDB and PostgreSQL during transition
Tier A/B/C Data classification by criticality for migration strategy
DataSource Npgsql connection factory with tenant context configuration
Determinism Property that same inputs always produce same outputs

Document Version: 2.0.0 Last Updated: 2025-11-28