- Introduced `BinaryReachabilityLifterTests` to validate binary lifting functionality. - Created `PackRunWorkerOptions` for configuring worker paths and execution persistence. - Added `TimelineIngestionOptions` for configuring NATS and Redis ingestion transports. - Implemented `NatsTimelineEventSubscriber` for subscribing to NATS events. - Developed `RedisTimelineEventSubscriber` for reading from Redis Streams. - Added `TimelineEnvelopeParser` to normalize incoming event envelopes. - Created unit tests for `TimelineEnvelopeParser` to ensure correct field mapping. - Implemented `TimelineAuthorizationAuditSink` for logging authorization outcomes.
17 KiB
MongoDB to PostgreSQL Conversion Plan
Version: 2.0.0 Status: APPROVED Created: 2025-11-28 Last Updated: 2025-11-28
Executive Summary
This document outlines the strategic plan to convert (not migrate) StellaOps from MongoDB to PostgreSQL for control-plane domains. The conversion follows a "strangler fig" pattern, introducing PostgreSQL repositories alongside existing MongoDB implementations and gradually switching each bounded context.
Key Finding: StellaOps already has production-ready PostgreSQL patterns in the Orchestrator and Findings modules that serve as templates for all other modules.
Related Documents
| Document | Purpose |
|---|---|
| SPECIFICATION.md | Schema designs, naming conventions, data types |
| RULES.md | Database coding rules and patterns |
| VERIFICATION.md | Testing and verification requirements |
| tasks/ | Detailed task definitions per phase |
1. Principles & Scope
1.1 Goals
Convert control-plane domains from MongoDB to PostgreSQL:
| Domain | Current DB | Target | Priority |
|---|---|---|---|
| Authority | stellaops_authority |
PostgreSQL | P0 |
| Scheduler | stellaops_scheduler |
PostgreSQL | P0 |
| Notify | stellaops_notify |
PostgreSQL | P1 |
| Policy | stellaops_policy |
PostgreSQL | P1 |
| Vulnerabilities (Concelier) | concelier |
PostgreSQL | P2 |
| VEX & Graph (Excititor) | excititor |
PostgreSQL | P2 |
| PacksRegistry | stellaops_packs |
PostgreSQL | P3 |
| IssuerDirectory | stellaops_issuer |
PostgreSQL | P3 |
1.2 Non-Goals
- Scanner result storage (remains object storage + Mongo for now)
- Real-time event streams (separate infrastructure)
- Legacy data archive (can remain in MongoDB read-only)
1.3 Constraints
MUST Preserve:
- Deterministic, replayable scans
- "Preserve/prune source" rule for Concelier/Excititor
- Lattice logic in
Scanner.WebService(not in DB) - Air-gap friendliness and offline-kit packaging
- Multi-tenant isolation patterns
- Zero downtime during conversion
1.4 Conversion vs Migration
This is a conversion, not a 1:1 document→row mapping:
| Approach | When to Use |
|---|---|
| Normalize | Identities, jobs, schedules, relationships |
| Keep JSONB | Advisory payloads, provenance trails, evidence manifests |
| Drop/Archive | Ephemeral data (caches, locks), historical logs |
2. Architecture
2.1 Strangler Fig Pattern
┌─────────────────────────────────────────────────────────────┐
│ Service Layer │
├─────────────────────────────────────────────────────────────┤
│ Repository Interface │
│ (e.g., IScheduleRepository) │
├──────────────────────┬──────────────────────────────────────┤
│ MongoRepository │ PostgresRepository │
│ (existing) │ (new) │
├──────────────────────┴──────────────────────────────────────┤
│ DI Container (configured switch) │
└─────────────────────────────────────────────────────────────┘
2.2 Configuration-Driven Backend Selection
{
"Persistence": {
"Authority": "Postgres",
"Scheduler": "Postgres",
"Concelier": "Mongo",
"Excititor": "Mongo",
"Notify": "Postgres",
"Policy": "Mongo"
}
}
2.3 Existing PostgreSQL Patterns
The codebase already contains production-ready patterns:
| Module | Location | Reusable Components |
|---|---|---|
| Orchestrator | src/Orchestrator/.../Infrastructure/Postgres/ |
DataSource, tenant context, repository pattern |
| Findings | src/Findings/StellaOps.Findings.Ledger/Infrastructure/Postgres/ |
Ledger events, Merkle anchors, projections |
Reference Implementation: OrchestratorDataSource.cs
3. Data Tiering
3.1 Tier Definitions
| Tier | Description | Strategy |
|---|---|---|
| A | Critical business data | Full conversion with verification |
| B | Important but recoverable | Convert active records only |
| C | Ephemeral/cache data | Fresh start, no migration |
3.2 Module Tiering
Authority
| Collection | Tier | Strategy |
|---|---|---|
authority_users |
A | Full conversion |
authority_clients |
A | Full conversion |
authority_scopes |
A | Full conversion |
authority_tokens |
B | Active tokens only |
authority_service_accounts |
A | Full conversion |
authority_login_attempts |
B | Recent 90 days |
authority_revocations |
A | Full conversion |
Scheduler
| Collection | Tier | Strategy |
|---|---|---|
schedules |
A | Full conversion |
runs |
B | Recent 180 days |
graph_jobs |
B | Active/recent only |
policy_jobs |
B | Active/recent only |
impact_snapshots |
B | Recent 90 days |
locks |
C | Fresh start |
Concelier (Vulnerabilities)
| Collection | Tier | Strategy |
|---|---|---|
advisory |
A | Full conversion |
advisory_raw |
B | GridFS refs only |
alias |
A | Full conversion |
affected |
A | Full conversion |
source |
A | Full conversion |
source_state |
A | Full conversion |
jobs, locks |
C | Fresh start |
Excititor (VEX)
| Collection | Tier | Strategy |
|---|---|---|
vex.statements |
A | Full conversion |
vex.observations |
A | Full conversion |
vex.linksets |
A | Full conversion |
vex.consensus |
A | Full conversion |
vex.raw |
B | Active/recent only |
vex.cache |
C | Fresh start |
4. Execution Phases
Phase Overview
Phase 0: Foundations [1 sprint]
│
├─→ Phase 1: Authority [1 sprint]
│
├─→ Phase 2: Scheduler [1 sprint]
│
├─→ Phase 3: Notify [1 sprint]
│
├─→ Phase 4: Policy [1 sprint]
│
└─→ Phase 5: Concelier [2 sprints]
│
└─→ Phase 6: Excititor [2-3 sprints]
│
└─→ Phase 7: Cleanup [1 sprint]
Phase Summary
| Phase | Scope | Duration | Dependencies | Deliverable |
|---|---|---|---|---|
| 0 | Foundations | 1 sprint | None | PostgreSQL infrastructure, shared library |
| 1 | Authority | 1 sprint | Phase 0 | Identity management on PostgreSQL |
| 2 | Scheduler | 1 sprint | Phase 0 | Job scheduling on PostgreSQL |
| 3 | Notify | 1 sprint | Phase 0 | Notifications on PostgreSQL |
| 4 | Policy | 1 sprint | Phase 0 | Policy engine on PostgreSQL |
| 5 | Concelier | 2 sprints | Phase 0 | Vulnerability index on PostgreSQL |
| 6 | Excititor | 2-3 sprints | Phase 5 | VEX & graphs on PostgreSQL |
| 7 | Cleanup | 1 sprint | All | MongoDB retired, docs updated |
Total: 10-12 sprints
Detailed Task Definitions
See:
- tasks/PHASE_0_FOUNDATIONS.md
- tasks/PHASE_1_AUTHORITY.md
- tasks/PHASE_2_SCHEDULER.md
- tasks/PHASE_3_NOTIFY.md
- tasks/PHASE_4_POLICY.md
- tasks/PHASE_5_VULNERABILITIES.md
- tasks/PHASE_6_VEX_GRAPH.md
- tasks/PHASE_7_CLEANUP.md
5. Conversion Strategy
5.1 Per-Module Approach
1. Create PostgreSQL storage project
2. Implement schema migrations
3. Implement repository interfaces
4. Add configuration switch
5. (Retired) Dual-write was used during migration for Tier A; all modules are now Postgres-only.
6. Run verification tests
7. Switch to PostgreSQL-only
8. Archive MongoDB data
5.2 Dual-Write Pattern
For Tier A data requiring historical continuity:
┌──────────────────────────────────────────────────────────────┐
│ DualWriteRepository │
├──────────────────────────────────────────────────────────────┤
│ Write: PostgreSQL (primary) + MongoDB (secondary) │
│ Read: PostgreSQL (primary) → MongoDB (fallback) │
│ Config: WriteToBoth, FallbackToMongo, ConvertOnRead │
└──────────────────────────────────────────────────────────────┘
5.3 Fresh Start Pattern
For Tier C ephemeral data:
┌──────────────────────────────────────────────────────────────┐
│ 1. Deploy PostgreSQL schema │
│ 2. Switch configuration to PostgreSQL │
│ 3. New data goes to PostgreSQL only │
│ 4. Old MongoDB data ages out naturally │
└──────────────────────────────────────────────────────────────┘
6. Risk Assessment
6.1 Technical Risks
| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Data loss during conversion | High | Low | Dual-write mode, extensive verification |
| Performance regression | Medium | Medium | Load testing before switch, index optimization |
| Determinism violation | High | Medium | Automated verification tests, parallel pipeline |
| Schema evolution conflicts | Medium | Low | Migration framework, schema versioning |
| Transaction semantics differences | Medium | Low | Code review, integration tests |
6.2 Operational Risks
| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Extended conversion timeline | Medium | Medium | Phase-based approach, clear milestones |
| Team learning curve | Low | Medium | Reference implementations, documentation |
| Rollback complexity | Medium | Low | Keep Mongo data until verified, feature flags |
6.3 Rollback Strategy
Each phase has independent rollback capability:
| Level | Action | Recovery Time |
|---|---|---|
| Configuration | Change Persistence:<Module> to Mongo |
Minutes |
| Data | MongoDB data retained during dual-write | None needed (historical note; dual-write ended after cutover) |
| Code | Git revert (PostgreSQL code isolated) | Hours |
7. Success Criteria
7.1 Per-Module Criteria
- All existing integration tests pass with PostgreSQL backend
- No performance regression >10% on critical paths
- Deterministic outputs verified against MongoDB baseline
- Zero data loss during conversion
- Tenant isolation verified
7.2 Overall Criteria
- All control-plane modules running on PostgreSQL
- MongoDB retired from production for converted modules
- Air-gap kit updated with PostgreSQL support
- Documentation updated for PostgreSQL operations
- Runbooks updated for PostgreSQL troubleshooting
8. Project Structure
8.1 New Projects
src/
├── Shared/
│ └── StellaOps.Infrastructure.Postgres/
│ ├── DataSourceBase.cs
│ ├── Migrations/
│ │ ├── IPostgresMigration.cs
│ │ └── PostgresMigrationRunner.cs
│ ├── Extensions/
│ │ └── NpgsqlExtensions.cs
│ └── ServiceCollectionExtensions.cs
│
├── Authority/
│ └── __Libraries/
│ └── StellaOps.Authority.Storage.Postgres/
│ ├── AuthorityDataSource.cs
│ ├── Repositories/
│ ├── Migrations/
│ └── ServiceCollectionExtensions.cs
│
├── Scheduler/
│ └── __Libraries/
│ └── StellaOps.Scheduler.Storage.Postgres/
│
├── Notify/
│ └── __Libraries/
│ └── StellaOps.Notify.Storage.Postgres/
│
├── Policy/
│ └── __Libraries/
│ └── StellaOps.Policy.Storage.Postgres/
│
├── Concelier/
│ └── __Libraries/
│ └── StellaOps.Concelier.Storage.Postgres/
│
└── Excititor/
└── __Libraries/
└── StellaOps.Excititor.Storage.Postgres/
8.2 Schema Files
docs/db/
├── schemas/
│ ├── authority.sql
│ ├── vuln.sql
│ ├── vex.sql
│ ├── scheduler.sql
│ ├── notify.sql
│ └── policy.sql
9. Timeline
9.1 Sprint Schedule
| Sprint | Phase | Focus |
|---|---|---|
| 1 | 0 | PostgreSQL infrastructure, shared library |
| 2 | 1 | Authority module conversion |
| 3 | 2 | Scheduler module conversion |
| 4 | 3 | Notify module conversion |
| 5 | 4 | Policy module conversion |
| 6-7 | 5 | Concelier/Vulnerability conversion |
| 8-10 | 6 | Excititor/VEX conversion |
| 11 | 7 | Cleanup, optimization, documentation |
9.2 Milestones
| Milestone | Sprint | Criteria |
|---|---|---|
| M1: Infrastructure Ready | 1 | PostgreSQL cluster operational, CI tests passing |
| M2: Identity Converted | 2 | Authority on PostgreSQL, auth flows working |
| M3: Scheduling Converted | 3 | Scheduler on PostgreSQL, jobs executing |
| M4: Core Services Converted | 5 | Notify + Policy on PostgreSQL |
| M5: Vulnerability Index Converted | 7 | Concelier on PostgreSQL, scans deterministic |
| M6: VEX Converted | 10 | Excititor on PostgreSQL, graphs stable |
| M7: MongoDB Retired | 11 | All modules converted, Mongo archived |
10. Governance
10.1 Decision Log
| Date | Decision | Rationale | Approver |
|---|---|---|---|
| 2025-11-28 | Strangler fig pattern | Allows gradual rollout with rollback | Architecture Team |
| 2025-11-28 | JSONB for semi-structured data | Preserves flexibility, simplifies conversion | Architecture Team |
| 2025-11-28 | Phase 0 first | Infrastructure must be stable before modules | Architecture Team |
10.2 Change Control
Changes to this plan require:
- Impact assessment documented
- Risk analysis updated
- Approval from Architecture Team
- Updated task definitions in
docs/db/tasks/
10.3 Status Reporting
Weekly status updates in sprint files tracking:
- Tasks completed
- Blockers encountered
- Verification results
- Next sprint objectives
Appendix A: Reference Implementation
DataSource Pattern
public sealed class ModuleDataSource : IAsyncDisposable
{
private readonly NpgsqlDataSource _dataSource;
public async Task<NpgsqlConnection> OpenConnectionAsync(
string tenantId,
CancellationToken cancellationToken = default)
{
var connection = await _dataSource.OpenConnectionAsync(cancellationToken);
await ConfigureSessionAsync(connection, tenantId, cancellationToken);
return connection;
}
private static async Task ConfigureSessionAsync(
NpgsqlConnection connection,
string tenantId,
CancellationToken cancellationToken)
{
await using var cmd = connection.CreateCommand();
cmd.CommandText = $"""
SET app.tenant_id = '{tenantId}';
SET timezone = 'UTC';
SET statement_timeout = '30s';
""";
await cmd.ExecuteNonQueryAsync(cancellationToken);
}
}
Repository Pattern
See RULES.md Section 1 for complete repository implementation guidelines.
Appendix B: Glossary
| Term | Definition |
|---|---|
| Strangler Fig | Pattern where new system grows alongside old, gradually replacing it |
| Dual-Write | Writing to both MongoDB and PostgreSQL during transition |
| Tier A/B/C | Data classification by criticality for migration strategy |
| DataSource | Npgsql connection factory with tenant context configuration |
| Determinism | Property that same inputs always produce same outputs |
Document Version: 2.0.0 Last Updated: 2025-11-28