up
This commit is contained in:
491
docs/db/CONVERSION_PLAN.md
Normal file
491
docs/db/CONVERSION_PLAN.md
Normal file
@@ -0,0 +1,491 @@
|
||||
# MongoDB to PostgreSQL Conversion Plan
|
||||
|
||||
**Version:** 2.0.0
|
||||
**Status:** APPROVED
|
||||
**Created:** 2025-11-28
|
||||
**Last Updated:** 2025-11-28
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
This document outlines the strategic plan to **convert** (not migrate) StellaOps from MongoDB to PostgreSQL for control-plane domains. The conversion follows a "strangler fig" pattern, introducing PostgreSQL repositories alongside existing MongoDB implementations and gradually switching each bounded context.
|
||||
|
||||
**Key Finding:** StellaOps already has production-ready PostgreSQL patterns in the Orchestrator and Findings modules that serve as templates for all other modules.
|
||||
|
||||
### Related Documents
|
||||
|
||||
| Document | Purpose |
|
||||
|----------|---------|
|
||||
| [SPECIFICATION.md](./SPECIFICATION.md) | Schema designs, naming conventions, data types |
|
||||
| [RULES.md](./RULES.md) | Database coding rules and patterns |
|
||||
| [VERIFICATION.md](./VERIFICATION.md) | Testing and verification requirements |
|
||||
| [tasks/](./tasks/) | Detailed task definitions per phase |
|
||||
|
||||
---
|
||||
|
||||
## 1. Principles & Scope
|
||||
|
||||
### 1.1 Goals
|
||||
|
||||
Convert **control-plane** domains from MongoDB to PostgreSQL:
|
||||
|
||||
| Domain | Current DB | Target | Priority |
|
||||
|--------|-----------|--------|----------|
|
||||
| Authority | `stellaops_authority` | PostgreSQL | P0 |
|
||||
| Scheduler | `stellaops_scheduler` | PostgreSQL | P0 |
|
||||
| Notify | `stellaops_notify` | PostgreSQL | P1 |
|
||||
| Policy | `stellaops_policy` | PostgreSQL | P1 |
|
||||
| Vulnerabilities (Concelier) | `concelier` | PostgreSQL | P2 |
|
||||
| VEX & Graph (Excititor) | `excititor` | PostgreSQL | P2 |
|
||||
| PacksRegistry | `stellaops_packs` | PostgreSQL | P3 |
|
||||
| IssuerDirectory | `stellaops_issuer` | PostgreSQL | P3 |
|
||||
|
||||
### 1.2 Non-Goals
|
||||
|
||||
- Scanner result storage (remains object storage + Mongo for now)
|
||||
- Real-time event streams (separate infrastructure)
|
||||
- Legacy data archive (can remain in MongoDB read-only)
|
||||
|
||||
### 1.3 Constraints
|
||||
|
||||
**MUST Preserve:**
|
||||
- Deterministic, replayable scans
|
||||
- "Preserve/prune source" rule for Concelier/Excititor
|
||||
- Lattice logic in `Scanner.WebService` (not in DB)
|
||||
- Air-gap friendliness and offline-kit packaging
|
||||
- Multi-tenant isolation patterns
|
||||
- Zero downtime during conversion
|
||||
|
||||
### 1.4 Conversion vs Migration
|
||||
|
||||
This is a **conversion**, not a 1:1 document→row mapping:
|
||||
|
||||
| Approach | When to Use |
|
||||
|----------|-------------|
|
||||
| **Normalize** | Identities, jobs, schedules, relationships |
|
||||
| **Keep JSONB** | Advisory payloads, provenance trails, evidence manifests |
|
||||
| **Drop/Archive** | Ephemeral data (caches, locks), historical logs |
|
||||
|
||||
---
|
||||
|
||||
## 2. Architecture
|
||||
|
||||
### 2.1 Strangler Fig Pattern
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Service Layer │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ Repository Interface │
|
||||
│ (e.g., IScheduleRepository) │
|
||||
├──────────────────────┬──────────────────────────────────────┤
|
||||
│ MongoRepository │ PostgresRepository │
|
||||
│ (existing) │ (new) │
|
||||
├──────────────────────┴──────────────────────────────────────┤
|
||||
│ DI Container (configured switch) │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### 2.2 Configuration-Driven Backend Selection
|
||||
|
||||
```json
|
||||
{
|
||||
"Persistence": {
|
||||
"Authority": "Postgres",
|
||||
"Scheduler": "Postgres",
|
||||
"Concelier": "Mongo",
|
||||
"Excititor": "Mongo",
|
||||
"Notify": "Postgres",
|
||||
"Policy": "Mongo"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 2.3 Existing PostgreSQL Patterns
|
||||
|
||||
The codebase already contains production-ready patterns:
|
||||
|
||||
| Module | Location | Reusable Components |
|
||||
|--------|----------|---------------------|
|
||||
| Orchestrator | `src/Orchestrator/.../Infrastructure/Postgres/` | DataSource, tenant context, repository pattern |
|
||||
| Findings | `src/Findings/StellaOps.Findings.Ledger/Infrastructure/Postgres/` | Ledger events, Merkle anchors, projections |
|
||||
|
||||
**Reference Implementation:** `OrchestratorDataSource.cs`
|
||||
|
||||
---
|
||||
|
||||
## 3. Data Tiering
|
||||
|
||||
### 3.1 Tier Definitions
|
||||
|
||||
| Tier | Description | Strategy |
|
||||
|------|-------------|----------|
|
||||
| **A** | Critical business data | Full conversion with verification |
|
||||
| **B** | Important but recoverable | Convert active records only |
|
||||
| **C** | Ephemeral/cache data | Fresh start, no migration |
|
||||
|
||||
### 3.2 Module Tiering
|
||||
|
||||
#### Authority
|
||||
| Collection | Tier | Strategy |
|
||||
|------------|------|----------|
|
||||
| `authority_users` | A | Full conversion |
|
||||
| `authority_clients` | A | Full conversion |
|
||||
| `authority_scopes` | A | Full conversion |
|
||||
| `authority_tokens` | B | Active tokens only |
|
||||
| `authority_service_accounts` | A | Full conversion |
|
||||
| `authority_login_attempts` | B | Recent 90 days |
|
||||
| `authority_revocations` | A | Full conversion |
|
||||
|
||||
#### Scheduler
|
||||
| Collection | Tier | Strategy |
|
||||
|------------|------|----------|
|
||||
| `schedules` | A | Full conversion |
|
||||
| `runs` | B | Recent 180 days |
|
||||
| `graph_jobs` | B | Active/recent only |
|
||||
| `policy_jobs` | B | Active/recent only |
|
||||
| `impact_snapshots` | B | Recent 90 days |
|
||||
| `locks` | C | Fresh start |
|
||||
|
||||
#### Concelier (Vulnerabilities)
|
||||
| Collection | Tier | Strategy |
|
||||
|------------|------|----------|
|
||||
| `advisory` | A | Full conversion |
|
||||
| `advisory_raw` | B | GridFS refs only |
|
||||
| `alias` | A | Full conversion |
|
||||
| `affected` | A | Full conversion |
|
||||
| `source` | A | Full conversion |
|
||||
| `source_state` | A | Full conversion |
|
||||
| `jobs`, `locks` | C | Fresh start |
|
||||
|
||||
#### Excititor (VEX)
|
||||
| Collection | Tier | Strategy |
|
||||
|------------|------|----------|
|
||||
| `vex.statements` | A | Full conversion |
|
||||
| `vex.observations` | A | Full conversion |
|
||||
| `vex.linksets` | A | Full conversion |
|
||||
| `vex.consensus` | A | Full conversion |
|
||||
| `vex.raw` | B | Active/recent only |
|
||||
| `vex.cache` | C | Fresh start |
|
||||
|
||||
---
|
||||
|
||||
## 4. Execution Phases
|
||||
|
||||
### Phase Overview
|
||||
|
||||
```
|
||||
Phase 0: Foundations [1 sprint]
|
||||
│
|
||||
├─→ Phase 1: Authority [1 sprint]
|
||||
│
|
||||
├─→ Phase 2: Scheduler [1 sprint]
|
||||
│
|
||||
├─→ Phase 3: Notify [1 sprint]
|
||||
│
|
||||
├─→ Phase 4: Policy [1 sprint]
|
||||
│
|
||||
└─→ Phase 5: Concelier [2 sprints]
|
||||
│
|
||||
└─→ Phase 6: Excititor [2-3 sprints]
|
||||
│
|
||||
└─→ Phase 7: Cleanup [1 sprint]
|
||||
```
|
||||
|
||||
### Phase Summary
|
||||
|
||||
| Phase | Scope | Duration | Dependencies | Deliverable |
|
||||
|-------|-------|----------|--------------|-------------|
|
||||
| 0 | Foundations | 1 sprint | None | PostgreSQL infrastructure, shared library |
|
||||
| 1 | Authority | 1 sprint | Phase 0 | Identity management on PostgreSQL |
|
||||
| 2 | Scheduler | 1 sprint | Phase 0 | Job scheduling on PostgreSQL |
|
||||
| 3 | Notify | 1 sprint | Phase 0 | Notifications on PostgreSQL |
|
||||
| 4 | Policy | 1 sprint | Phase 0 | Policy engine on PostgreSQL |
|
||||
| 5 | Concelier | 2 sprints | Phase 0 | Vulnerability index on PostgreSQL |
|
||||
| 6 | Excititor | 2-3 sprints | Phase 5 | VEX & graphs on PostgreSQL |
|
||||
| 7 | Cleanup | 1 sprint | All | MongoDB retired, docs updated |
|
||||
|
||||
**Total: 10-12 sprints**
|
||||
|
||||
### Detailed Task Definitions
|
||||
|
||||
See:
|
||||
- [tasks/PHASE_0_FOUNDATIONS.md](./tasks/PHASE_0_FOUNDATIONS.md)
|
||||
- [tasks/PHASE_1_AUTHORITY.md](./tasks/PHASE_1_AUTHORITY.md)
|
||||
- [tasks/PHASE_2_SCHEDULER.md](./tasks/PHASE_2_SCHEDULER.md)
|
||||
- [tasks/PHASE_3_NOTIFY.md](./tasks/PHASE_3_NOTIFY.md)
|
||||
- [tasks/PHASE_4_POLICY.md](./tasks/PHASE_4_POLICY.md)
|
||||
- [tasks/PHASE_5_VULNERABILITIES.md](./tasks/PHASE_5_VULNERABILITIES.md)
|
||||
- [tasks/PHASE_6_VEX_GRAPH.md](./tasks/PHASE_6_VEX_GRAPH.md)
|
||||
- [tasks/PHASE_7_CLEANUP.md](./tasks/PHASE_7_CLEANUP.md)
|
||||
|
||||
---
|
||||
|
||||
## 5. Conversion Strategy
|
||||
|
||||
### 5.1 Per-Module Approach
|
||||
|
||||
```
|
||||
1. Create PostgreSQL storage project
|
||||
2. Implement schema migrations
|
||||
3. Implement repository interfaces
|
||||
4. Add configuration switch
|
||||
5. Enable dual-write (if Tier A)
|
||||
6. Run verification tests
|
||||
7. Switch to PostgreSQL-only
|
||||
8. Archive MongoDB data
|
||||
```
|
||||
|
||||
### 5.2 Dual-Write Pattern
|
||||
|
||||
For Tier A data requiring historical continuity:
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────────────────────┐
|
||||
│ DualWriteRepository │
|
||||
├──────────────────────────────────────────────────────────────┤
|
||||
│ Write: PostgreSQL (primary) + MongoDB (secondary) │
|
||||
│ Read: PostgreSQL (primary) → MongoDB (fallback) │
|
||||
│ Config: WriteToBoth, FallbackToMongo, ConvertOnRead │
|
||||
└──────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### 5.3 Fresh Start Pattern
|
||||
|
||||
For Tier C ephemeral data:
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────────────────────┐
|
||||
│ 1. Deploy PostgreSQL schema │
|
||||
│ 2. Switch configuration to PostgreSQL │
|
||||
│ 3. New data goes to PostgreSQL only │
|
||||
│ 4. Old MongoDB data ages out naturally │
|
||||
└──────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 6. Risk Assessment
|
||||
|
||||
### 6.1 Technical Risks
|
||||
|
||||
| Risk | Impact | Likelihood | Mitigation |
|
||||
|------|--------|------------|------------|
|
||||
| Data loss during conversion | High | Low | Dual-write mode, extensive verification |
|
||||
| Performance regression | Medium | Medium | Load testing before switch, index optimization |
|
||||
| Determinism violation | High | Medium | Automated verification tests, parallel pipeline |
|
||||
| Schema evolution conflicts | Medium | Low | Migration framework, schema versioning |
|
||||
| Transaction semantics differences | Medium | Low | Code review, integration tests |
|
||||
|
||||
### 6.2 Operational Risks
|
||||
|
||||
| Risk | Impact | Likelihood | Mitigation |
|
||||
|------|--------|------------|------------|
|
||||
| Extended conversion timeline | Medium | Medium | Phase-based approach, clear milestones |
|
||||
| Team learning curve | Low | Medium | Reference implementations, documentation |
|
||||
| Rollback complexity | Medium | Low | Keep Mongo data until verified, feature flags |
|
||||
|
||||
### 6.3 Rollback Strategy
|
||||
|
||||
Each phase has independent rollback capability:
|
||||
|
||||
| Level | Action | Recovery Time |
|
||||
|-------|--------|---------------|
|
||||
| Configuration | Change `Persistence:<Module>` to `Mongo` | Minutes |
|
||||
| Data | MongoDB data retained during dual-write | None needed |
|
||||
| Code | Git revert (PostgreSQL code isolated) | Hours |
|
||||
|
||||
---
|
||||
|
||||
## 7. Success Criteria
|
||||
|
||||
### 7.1 Per-Module Criteria
|
||||
|
||||
- [ ] All existing integration tests pass with PostgreSQL backend
|
||||
- [ ] No performance regression >10% on critical paths
|
||||
- [ ] Deterministic outputs verified against MongoDB baseline
|
||||
- [ ] Zero data loss during conversion
|
||||
- [ ] Tenant isolation verified
|
||||
|
||||
### 7.2 Overall Criteria
|
||||
|
||||
- [ ] All control-plane modules running on PostgreSQL
|
||||
- [ ] MongoDB retired from production for converted modules
|
||||
- [ ] Air-gap kit updated with PostgreSQL support
|
||||
- [ ] Documentation updated for PostgreSQL operations
|
||||
- [ ] Runbooks updated for PostgreSQL troubleshooting
|
||||
|
||||
---
|
||||
|
||||
## 8. Project Structure
|
||||
|
||||
### 8.1 New Projects
|
||||
|
||||
```
|
||||
src/
|
||||
├── Shared/
|
||||
│ └── StellaOps.Infrastructure.Postgres/
|
||||
│ ├── DataSourceBase.cs
|
||||
│ ├── Migrations/
|
||||
│ │ ├── IPostgresMigration.cs
|
||||
│ │ └── PostgresMigrationRunner.cs
|
||||
│ ├── Extensions/
|
||||
│ │ └── NpgsqlExtensions.cs
|
||||
│ └── ServiceCollectionExtensions.cs
|
||||
│
|
||||
├── Authority/
|
||||
│ └── __Libraries/
|
||||
│ └── StellaOps.Authority.Storage.Postgres/
|
||||
│ ├── AuthorityDataSource.cs
|
||||
│ ├── Repositories/
|
||||
│ ├── Migrations/
|
||||
│ └── ServiceCollectionExtensions.cs
|
||||
│
|
||||
├── Scheduler/
|
||||
│ └── __Libraries/
|
||||
│ └── StellaOps.Scheduler.Storage.Postgres/
|
||||
│
|
||||
├── Notify/
|
||||
│ └── __Libraries/
|
||||
│ └── StellaOps.Notify.Storage.Postgres/
|
||||
│
|
||||
├── Policy/
|
||||
│ └── __Libraries/
|
||||
│ └── StellaOps.Policy.Storage.Postgres/
|
||||
│
|
||||
├── Concelier/
|
||||
│ └── __Libraries/
|
||||
│ └── StellaOps.Concelier.Storage.Postgres/
|
||||
│
|
||||
└── Excititor/
|
||||
└── __Libraries/
|
||||
└── StellaOps.Excititor.Storage.Postgres/
|
||||
```
|
||||
|
||||
### 8.2 Schema Files
|
||||
|
||||
```
|
||||
docs/db/
|
||||
├── schemas/
|
||||
│ ├── authority.sql
|
||||
│ ├── vuln.sql
|
||||
│ ├── vex.sql
|
||||
│ ├── scheduler.sql
|
||||
│ ├── notify.sql
|
||||
│ └── policy.sql
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 9. Timeline
|
||||
|
||||
### 9.1 Sprint Schedule
|
||||
|
||||
| Sprint | Phase | Focus |
|
||||
|--------|-------|-------|
|
||||
| 1 | 0 | PostgreSQL infrastructure, shared library |
|
||||
| 2 | 1 | Authority module conversion |
|
||||
| 3 | 2 | Scheduler module conversion |
|
||||
| 4 | 3 | Notify module conversion |
|
||||
| 5 | 4 | Policy module conversion |
|
||||
| 6-7 | 5 | Concelier/Vulnerability conversion |
|
||||
| 8-10 | 6 | Excititor/VEX conversion |
|
||||
| 11 | 7 | Cleanup, optimization, documentation |
|
||||
|
||||
### 9.2 Milestones
|
||||
|
||||
| Milestone | Sprint | Criteria |
|
||||
|-----------|--------|----------|
|
||||
| M1: Infrastructure Ready | 1 | PostgreSQL cluster operational, CI tests passing |
|
||||
| M2: Identity Converted | 2 | Authority on PostgreSQL, auth flows working |
|
||||
| M3: Scheduling Converted | 3 | Scheduler on PostgreSQL, jobs executing |
|
||||
| M4: Core Services Converted | 5 | Notify + Policy on PostgreSQL |
|
||||
| M5: Vulnerability Index Converted | 7 | Concelier on PostgreSQL, scans deterministic |
|
||||
| M6: VEX Converted | 10 | Excititor on PostgreSQL, graphs stable |
|
||||
| M7: MongoDB Retired | 11 | All modules converted, Mongo archived |
|
||||
|
||||
---
|
||||
|
||||
## 10. Governance
|
||||
|
||||
### 10.1 Decision Log
|
||||
|
||||
| Date | Decision | Rationale | Approver |
|
||||
|------|----------|-----------|----------|
|
||||
| 2025-11-28 | Strangler fig pattern | Allows gradual rollout with rollback | Architecture Team |
|
||||
| 2025-11-28 | JSONB for semi-structured data | Preserves flexibility, simplifies conversion | Architecture Team |
|
||||
| 2025-11-28 | Phase 0 first | Infrastructure must be stable before modules | Architecture Team |
|
||||
|
||||
### 10.2 Change Control
|
||||
|
||||
Changes to this plan require:
|
||||
1. Impact assessment documented
|
||||
2. Risk analysis updated
|
||||
3. Approval from Architecture Team
|
||||
4. Updated task definitions in `docs/db/tasks/`
|
||||
|
||||
### 10.3 Status Reporting
|
||||
|
||||
Weekly status updates in sprint files tracking:
|
||||
- Tasks completed
|
||||
- Blockers encountered
|
||||
- Verification results
|
||||
- Next sprint objectives
|
||||
|
||||
---
|
||||
|
||||
## Appendix A: Reference Implementation
|
||||
|
||||
### DataSource Pattern
|
||||
|
||||
```csharp
|
||||
public sealed class ModuleDataSource : IAsyncDisposable
|
||||
{
|
||||
private readonly NpgsqlDataSource _dataSource;
|
||||
|
||||
public async Task<NpgsqlConnection> OpenConnectionAsync(
|
||||
string tenantId,
|
||||
CancellationToken cancellationToken = default)
|
||||
{
|
||||
var connection = await _dataSource.OpenConnectionAsync(cancellationToken);
|
||||
await ConfigureSessionAsync(connection, tenantId, cancellationToken);
|
||||
return connection;
|
||||
}
|
||||
|
||||
private static async Task ConfigureSessionAsync(
|
||||
NpgsqlConnection connection,
|
||||
string tenantId,
|
||||
CancellationToken cancellationToken)
|
||||
{
|
||||
await using var cmd = connection.CreateCommand();
|
||||
cmd.CommandText = $"""
|
||||
SET app.tenant_id = '{tenantId}';
|
||||
SET timezone = 'UTC';
|
||||
SET statement_timeout = '30s';
|
||||
""";
|
||||
await cmd.ExecuteNonQueryAsync(cancellationToken);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Repository Pattern
|
||||
|
||||
See [RULES.md](./RULES.md) Section 1 for complete repository implementation guidelines.
|
||||
|
||||
---
|
||||
|
||||
## Appendix B: Glossary
|
||||
|
||||
| Term | Definition |
|
||||
|------|------------|
|
||||
| **Strangler Fig** | Pattern where new system grows alongside old, gradually replacing it |
|
||||
| **Dual-Write** | Writing to both MongoDB and PostgreSQL during transition |
|
||||
| **Tier A/B/C** | Data classification by criticality for migration strategy |
|
||||
| **DataSource** | Npgsql connection factory with tenant context configuration |
|
||||
| **Determinism** | Property that same inputs always produce same outputs |
|
||||
|
||||
---
|
||||
|
||||
*Document Version: 2.0.0*
|
||||
*Last Updated: 2025-11-28*
|
||||
Reference in New Issue
Block a user