# MongoDB to PostgreSQL Conversion Plan

**Version:** 2.0.0
**Status:** APPROVED
**Created:** 2025-11-28
**Last Updated:** 2025-11-28

---

## Executive Summary

This document outlines the strategic plan to **convert** (not migrate) StellaOps from MongoDB to PostgreSQL for control-plane domains. The conversion follows a "strangler fig" pattern, introducing PostgreSQL repositories alongside existing MongoDB implementations and gradually switching each bounded context.

**Key Finding:** StellaOps already has production-ready PostgreSQL patterns in the Orchestrator and Findings modules that serve as templates for all other modules.

### Related Documents

| Document | Purpose |
|----------|---------|
| [SPECIFICATION.md](./SPECIFICATION.md) | Schema designs, naming conventions, data types |
| [RULES.md](./RULES.md) | Database coding rules and patterns |
| [VERIFICATION.md](./VERIFICATION.md) | Testing and verification requirements |
| [tasks/](./tasks/) | Detailed task definitions per phase |

---

## 1. Principles & Scope

### 1.1 Goals

Convert **control-plane** domains from MongoDB to PostgreSQL:

| Domain | Current DB | Target | Priority |
|--------|-----------|--------|----------|
| Authority | `stellaops_authority` | PostgreSQL | P0 |
| Scheduler | `stellaops_scheduler` | PostgreSQL | P0 |
| Notify | `stellaops_notify` | PostgreSQL | P1 |
| Policy | `stellaops_policy` | PostgreSQL | P1 |
| Vulnerabilities (Concelier) | `concelier` | PostgreSQL | P2 |
| VEX & Graph (Excititor) | `excititor` | PostgreSQL | P2 |
| PacksRegistry | `stellaops_packs` | PostgreSQL | P3 |
| IssuerDirectory | `stellaops_issuer` | PostgreSQL | P3 |

### 1.2 Non-Goals

- Scanner result storage (remains object storage + Mongo for now)
- Real-time event streams (separate infrastructure)
- Legacy data archive (can remain in MongoDB read-only)

### 1.3 Constraints

**MUST Preserve:**
- Deterministic, replayable scans
- "Preserve/prune source" rule for Concelier/Excititor
- Lattice logic in `Scanner.WebService` (not in DB)
- Air-gap friendliness and offline-kit packaging
- Multi-tenant isolation patterns
- Zero downtime during conversion

### 1.4 Conversion vs Migration

This is a **conversion**, not a 1:1 document→row mapping:

| Approach | When to Use |
|----------|-------------|
| **Normalize** | Identities, jobs, schedules, relationships |
| **Keep JSONB** | Advisory payloads, provenance trails, evidence manifests |
| **Drop/Archive** | Ephemeral data (caches, locks), historical logs |

---

## 2. Architecture

### 2.1 Strangler Fig Pattern

```
┌─────────────────────────────────────────────────────────────┐
│                        Service Layer                        │
├─────────────────────────────────────────────────────────────┤
│                    Repository Interface                     │
│                 (e.g., IScheduleRepository)                 │
├──────────────────────┬──────────────────────────────────────┤
│   MongoRepository    │    PostgresRepository                │
│   (existing)         │    (new)                             │
├──────────────────────┴──────────────────────────────────────┤
│              DI Container (configured switch)               │
└─────────────────────────────────────────────────────────────┘
```

### 2.2 Configuration-Driven Backend Selection

```json
{
  "Persistence": {
    "Authority": "Postgres",
    "Scheduler": "Postgres",
    "Concelier": "Mongo",
    "Excititor": "Mongo",
    "Notify": "Postgres",
    "Policy": "Mongo"
  }
}
```

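
The switch can be pictured as a small factory that reads the `Persistence` value for a module and returns the matching repository. The following is a minimal sketch, not the StellaOps implementation: `PersistenceSwitch`, `MongoScheduleRepository`, `PostgresScheduleRepository`, the `Backend` property, and the fallback to `Mongo` when a key is missing are all assumptions made for illustration. Only `IScheduleRepository` and the `Persistence` keys come from this plan.

```csharp
using System;
using System.Collections.Generic;

// IScheduleRepository is named in the 2.1 diagram; its members here are hypothetical.
public interface IScheduleRepository
{
    string Backend { get; }
}

// Hypothetical stand-ins for the existing and the new implementation.
public sealed class MongoScheduleRepository : IScheduleRepository
{
    public string Backend => "Mongo";
}

public sealed class PostgresScheduleRepository : IScheduleRepository
{
    public string Backend => "Postgres";
}

public static class PersistenceSwitch
{
    // settings mirrors the "Persistence" section, e.g. { "Scheduler": "Postgres" }.
    public static IScheduleRepository ResolveScheduler(
        IReadOnlyDictionary<string, string> settings)
    {
        // Assumption: a missing key falls back to the existing Mongo implementation.
        var backend = settings.TryGetValue("Scheduler", out var value) ? value : "Mongo";

        return backend switch
        {
            "Postgres" => new PostgresScheduleRepository(),
            "Mongo" => new MongoScheduleRepository(),
            _ => throw new InvalidOperationException(
                $"Unknown persistence backend '{backend}' for Scheduler."),
        };
    }
}
```

In a real service the same decision would normally live in the DI registration, so that changing `Persistence:Scheduler` is the only step needed to roll a module forward or back.
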
### 2.3 Existing PostgreSQL Patterns

The codebase already contains production-ready patterns:

| Module | Location | Reusable Components |
|--------|----------|---------------------|
| Orchestrator | `src/Orchestrator/.../Infrastructure/Postgres/` | DataSource, tenant context, repository pattern |
| Findings | `src/Findings/StellaOps.Findings.Ledger/Infrastructure/Postgres/` | Ledger events, Merkle anchors, projections |

**Reference Implementation:** `OrchestratorDataSource.cs`

---

## 3. Data Tiering

### 3.1 Tier Definitions

| Tier | Description | Strategy |
|------|-------------|----------|
| **A** | Critical business data | Full conversion with verification |
| **B** | Important but recoverable | Convert active records only |
| **C** | Ephemeral/cache data | Fresh start, no migration |

### 3.2 Module Tiering

#### Authority
| Collection | Tier | Strategy |
|------------|------|----------|
| `authority_users` | A | Full conversion |
| `authority_clients` | A | Full conversion |
| `authority_scopes` | A | Full conversion |
| `authority_tokens` | B | Active tokens only |
| `authority_service_accounts` | A | Full conversion |
| `authority_login_attempts` | B | Recent 90 days |
| `authority_revocations` | A | Full conversion |

#### Scheduler
| Collection | Tier | Strategy |
|------------|------|----------|
| `schedules` | A | Full conversion |
| `runs` | B | Recent 180 days |
| `graph_jobs` | B | Active/recent only |
| `policy_jobs` | B | Active/recent only |
| `impact_snapshots` | B | Recent 90 days |
| `locks` | C | Fresh start |

#### Concelier (Vulnerabilities)
| Collection | Tier | Strategy |
|------------|------|----------|
| `advisory` | A | Full conversion |
| `advisory_raw` | B | GridFS refs only |
| `alias` | A | Full conversion |
| `affected` | A | Full conversion |
| `source` | A | Full conversion |
| `source_state` | A | Full conversion |
| `jobs`, `locks` | C | Fresh start |

#### Excititor (VEX)
| Collection | Tier | Strategy |
|------------|------|----------|
| `vex.statements` | A | Full conversion |
| `vex.observations` | A | Full conversion |
| `vex.linksets` | A | Full conversion |
| `vex.consensus` | A | Full conversion |
| `vex.raw` | B | Active/recent only |
| `vex.cache` | C | Fresh start |

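
The tier strategies with a retention window ("Recent 90 days", "Recent 180 days") can be expressed as a simple predicate. This is a sketch under stated assumptions: `DataTier`, `TierPolicy`, and `ShouldConvert` are hypothetical names, and the window is measured from a record's last-activity timestamp, which this plan does not specify. Tier B entries described as "active only" would need a status check rather than a time window.

```csharp
using System;

// Hypothetical enum mirroring the Tier A/B/C definitions in section 3.1.
public enum DataTier { A, B, C }

public static class TierPolicy
{
    // Returns true when a record should be copied to PostgreSQL.
    // The window is only consulted for Tier B; Tier A is always converted
    // and Tier C always starts fresh.
    public static bool ShouldConvert(
        DataTier tier,
        DateTimeOffset lastActivity,
        DateTimeOffset now,
        TimeSpan window) =>
        tier switch
        {
            DataTier.A => true,
            DataTier.B => now - lastActivity <= window,
            DataTier.C => false,
            _ => throw new ArgumentOutOfRangeException(nameof(tier)),
        };
}
```
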
---

## 4. Execution Phases

### Phase Overview

```
Phase 0: Foundations                      [1 sprint]
    │
    ├─→ Phase 1: Authority                [1 sprint]
    │
    ├─→ Phase 2: Scheduler                [1 sprint]
    │
    ├─→ Phase 3: Notify                   [1 sprint]
    │
    ├─→ Phase 4: Policy                   [1 sprint]
    │
    └─→ Phase 5: Concelier                [2 sprints]
            │
            └─→ Phase 6: Excititor        [2-3 sprints]
                    │
                    └─→ Phase 7: Cleanup  [1 sprint]
```

### Phase Summary

| Phase | Scope | Duration | Dependencies | Deliverable |
|-------|-------|----------|--------------|-------------|
| 0 | Foundations | 1 sprint | None | PostgreSQL infrastructure, shared library |
| 1 | Authority | 1 sprint | Phase 0 | Identity management on PostgreSQL |
| 2 | Scheduler | 1 sprint | Phase 0 | Job scheduling on PostgreSQL |
| 3 | Notify | 1 sprint | Phase 0 | Notifications on PostgreSQL |
| 4 | Policy | 1 sprint | Phase 0 | Policy engine on PostgreSQL |
| 5 | Concelier | 2 sprints | Phase 0 | Vulnerability index on PostgreSQL |
| 6 | Excititor | 2-3 sprints | Phase 5 | VEX & graphs on PostgreSQL |
| 7 | Cleanup | 1 sprint | All | MongoDB retired, docs updated |

**Total: 10-11 sprints**

### Detailed Task Definitions

See:
- [tasks/PHASE_0_FOUNDATIONS.md](./tasks/PHASE_0_FOUNDATIONS.md)
- [tasks/PHASE_1_AUTHORITY.md](./tasks/PHASE_1_AUTHORITY.md)
- [tasks/PHASE_2_SCHEDULER.md](./tasks/PHASE_2_SCHEDULER.md)
- [tasks/PHASE_3_NOTIFY.md](./tasks/PHASE_3_NOTIFY.md)
- [tasks/PHASE_4_POLICY.md](./tasks/PHASE_4_POLICY.md)
- [tasks/PHASE_5_VULNERABILITIES.md](./tasks/PHASE_5_VULNERABILITIES.md)
- [tasks/PHASE_6_VEX_GRAPH.md](./tasks/PHASE_6_VEX_GRAPH.md)
- [tasks/PHASE_7_CLEANUP.md](./tasks/PHASE_7_CLEANUP.md)

---

## 5. Conversion Strategy

### 5.1 Per-Module Approach

```
1. Create PostgreSQL storage project
2. Implement schema migrations
3. Implement repository interfaces
4. Add configuration switch
5. (Retired) Dual-write was used during migration for Tier A; all modules are now Postgres-only.
6. Run verification tests
7. Switch to PostgreSQL-only
8. Archive MongoDB data
```

### 5.2 Dual-Write Pattern

For Tier A data requiring historical continuity:

```
┌──────────────────────────────────────────────────────────────┐
│                     DualWriteRepository                      │
├──────────────────────────────────────────────────────────────┤
│  Write: PostgreSQL (primary) + MongoDB (secondary)           │
│  Read:  PostgreSQL (primary) → MongoDB (fallback)            │
│  Config: WriteToBoth, FallbackToMongo, ConvertOnRead         │
└──────────────────────────────────────────────────────────────┘
```

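
A minimal sketch of how such a repository could behave, assuming one plausible reading of the three flags: `WriteToBoth` mirrors writes to MongoDB, `FallbackToMongo` consults MongoDB on a PostgreSQL miss, and `ConvertOnRead` backfills PostgreSQL with a record found only in MongoDB. This plan does not define those semantics, and `IDocumentStore<T>`, `DualWriteOptions`, and `InMemoryStore<T>` are hypothetical names used only to keep the example self-contained.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical storage abstraction; real repositories would be module-specific.
public interface IDocumentStore<T> where T : class
{
    Task SaveAsync(string id, T item);
    Task<T?> FindAsync(string id);
}

// Flag names come from the diagram; their meaning here is an assumption.
public sealed record DualWriteOptions(bool WriteToBoth, bool FallbackToMongo, bool ConvertOnRead);

public sealed class DualWriteRepository<T> : IDocumentStore<T> where T : class
{
    private readonly IDocumentStore<T> _postgres;
    private readonly IDocumentStore<T> _mongo;
    private readonly DualWriteOptions _options;

    public DualWriteRepository(
        IDocumentStore<T> postgres, IDocumentStore<T> mongo, DualWriteOptions options)
    {
        _postgres = postgres;
        _mongo = mongo;
        _options = options;
    }

    public async Task SaveAsync(string id, T item)
    {
        await _postgres.SaveAsync(id, item);      // primary write
        if (_options.WriteToBoth)
        {
            await _mongo.SaveAsync(id, item);     // secondary write
        }
    }

    public async Task<T?> FindAsync(string id)
    {
        var found = await _postgres.FindAsync(id);
        if (found is not null || !_options.FallbackToMongo)
        {
            return found;
        }

        var legacy = await _mongo.FindAsync(id);  // fallback read
        if (legacy is not null && _options.ConvertOnRead)
        {
            await _postgres.SaveAsync(id, legacy); // backfill the primary store
        }
        return legacy;
    }
}

// In-memory store so the sketch runs without any database.
public sealed class InMemoryStore<T> : IDocumentStore<T> where T : class
{
    private readonly Dictionary<string, T> _items = new();

    public Task SaveAsync(string id, T item)
    {
        _items[id] = item;
        return Task.CompletedTask;
    }

    public Task<T?> FindAsync(string id) =>
        Task.FromResult<T?>(_items.TryGetValue(id, out var item) ? item : null);
}
```

Because the wrapper implements the same interface as the stores it wraps, callers would not need to know whether dual-write is active; that matches the configured-switch idea in section 2.
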
### 5.3 Fresh Start Pattern

For Tier C ephemeral data:

```
┌──────────────────────────────────────────────────────────────┐
│  1. Deploy PostgreSQL schema                                 │
│  2. Switch configuration to PostgreSQL                       │
│  3. New data goes to PostgreSQL only                         │
│  4. Old MongoDB data ages out naturally                      │
└──────────────────────────────────────────────────────────────┘
```

---

## 6. Risk Assessment

### 6.1 Technical Risks

| Risk | Impact | Likelihood | Mitigation |
|------|--------|------------|------------|
| Data loss during conversion | High | Low | Dual-write mode, extensive verification |
| Performance regression | Medium | Medium | Load testing before switch, index optimization |
| Determinism violation | High | Medium | Automated verification tests, parallel pipeline |
| Schema evolution conflicts | Medium | Low | Migration framework, schema versioning |
| Transaction semantics differences | Medium | Low | Code review, integration tests |

### 6.2 Operational Risks

| Risk | Impact | Likelihood | Mitigation |
|------|--------|------------|------------|
| Extended conversion timeline | Medium | Medium | Phase-based approach, clear milestones |
| Team learning curve | Low | Medium | Reference implementations, documentation |
| Rollback complexity | Medium | Low | Keep Mongo data until verified, feature flags |

### 6.3 Rollback Strategy

Each phase has independent rollback capability:

| Level | Action | Recovery Time |
|-------|--------|---------------|
| Configuration | Change `Persistence:<Module>` to `Mongo` | Minutes |
| Data | MongoDB data retained during dual-write | None needed (historical note; dual-write ended after cutover) |
| Code | Git revert (PostgreSQL code isolated) | Hours |

---

## 7. Success Criteria

### 7.1 Per-Module Criteria

- [ ] All existing integration tests pass with PostgreSQL backend
- [ ] No performance regression >10% on critical paths
- [ ] Deterministic outputs verified against MongoDB baseline
- [ ] Zero data loss during conversion
- [ ] Tenant isolation verified

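
The 10% criterion is straightforward to turn into an automated gate. The sketch below assumes the compared metric is a latency in milliseconds where lower is better; this plan does not say which metric defines a critical path, and `RegressionGate` is a hypothetical name.

```csharp
using System;

public static class RegressionGate
{
    // Threshold taken from the per-module criteria: no regression above 10%.
    public const double MaxRegression = 0.10;

    // Relative slowdown of the PostgreSQL measurement against the MongoDB baseline.
    // Positive values mean PostgreSQL is slower; negative values mean it is faster.
    public static double RelativeChange(double baselineMs, double candidateMs)
    {
        if (baselineMs <= 0)
        {
            throw new ArgumentOutOfRangeException(
                nameof(baselineMs), "Baseline must be positive.");
        }
        return (candidateMs - baselineMs) / baselineMs;
    }

    public static bool Passes(double baselineMs, double candidateMs) =>
        RelativeChange(baselineMs, candidateMs) <= MaxRegression;
}
```

For example, a 40 ms MongoDB baseline and a 43 ms PostgreSQL measurement give a 7.5% slowdown, which passes; 45 ms gives 12.5%, which fails.
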
### 7.2 Overall Criteria

- [ ] All control-plane modules running on PostgreSQL
- [ ] MongoDB retired from production for converted modules
- [ ] Air-gap kit updated with PostgreSQL support
- [ ] Documentation updated for PostgreSQL operations
- [ ] Runbooks updated for PostgreSQL troubleshooting

---

## 8. Project Structure

### 8.1 New Projects

```
src/
├── Shared/
│   └── StellaOps.Infrastructure.Postgres/
│       ├── DataSourceBase.cs
│       ├── Migrations/
│       │   ├── IPostgresMigration.cs
│       │   └── PostgresMigrationRunner.cs
│       ├── Extensions/
│       │   └── NpgsqlExtensions.cs
│       └── ServiceCollectionExtensions.cs
│
├── Authority/
│   └── __Libraries/
│       └── StellaOps.Authority.Storage.Postgres/
│           ├── AuthorityDataSource.cs
│           ├── Repositories/
│           ├── Migrations/
│           └── ServiceCollectionExtensions.cs
│
├── Scheduler/
│   └── __Libraries/
│       └── StellaOps.Scheduler.Storage.Postgres/
│
├── Notify/
│   └── __Libraries/
│       └── StellaOps.Notify.Storage.Postgres/
│
├── Policy/
│   └── __Libraries/
│       └── StellaOps.Policy.Storage.Postgres/
│
├── Concelier/
│   └── __Libraries/
│       └── StellaOps.Concelier.Storage.Postgres/
│
└── Excititor/
    └── __Libraries/
        └── StellaOps.Excititor.Storage.Postgres/
```

### 8.2 Schema Files

```
docs/db/
├── schemas/
│   ├── authority.sql
│   ├── vuln.sql
│   ├── vex.sql
│   ├── scheduler.sql
│   ├── notify.sql
│   └── policy.sql
```

---

## 9. Timeline

### 9.1 Sprint Schedule

| Sprint | Phase | Focus |
|--------|-------|-------|
| 1 | 0 | PostgreSQL infrastructure, shared library |
| 2 | 1 | Authority module conversion |
| 3 | 2 | Scheduler module conversion |
| 4 | 3 | Notify module conversion |
| 5 | 4 | Policy module conversion |
| 6-7 | 5 | Concelier/Vulnerability conversion |
| 8-10 | 6 | Excititor/VEX conversion |
| 11 | 7 | Cleanup, optimization, documentation |

### 9.2 Milestones

| Milestone | Sprint | Criteria |
|-----------|--------|----------|
| M1: Infrastructure Ready | 1 | PostgreSQL cluster operational, CI tests passing |
| M2: Identity Converted | 2 | Authority on PostgreSQL, auth flows working |
| M3: Scheduling Converted | 3 | Scheduler on PostgreSQL, jobs executing |
| M4: Core Services Converted | 5 | Notify + Policy on PostgreSQL |
| M5: Vulnerability Index Converted | 7 | Concelier on PostgreSQL, scans deterministic |
| M6: VEX Converted | 10 | Excititor on PostgreSQL, graphs stable |
| M7: MongoDB Retired | 11 | All modules converted, Mongo archived |

---

## 10. Governance

### 10.1 Decision Log

| Date | Decision | Rationale | Approver |
|------|----------|-----------|----------|
| 2025-11-28 | Strangler fig pattern | Allows gradual rollout with rollback | Architecture Team |
| 2025-11-28 | JSONB for semi-structured data | Preserves flexibility, simplifies conversion | Architecture Team |
| 2025-11-28 | Phase 0 first | Infrastructure must be stable before modules | Architecture Team |

### 10.2 Change Control

Changes to this plan require:
1. Impact assessment documented
2. Risk analysis updated
3. Approval from Architecture Team
4. Updated task definitions in `docs/db/tasks/`

### 10.3 Status Reporting

Weekly status updates in sprint files tracking:
- Tasks completed
- Blockers encountered
- Verification results
- Next sprint objectives

---

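## Appendix A: Reference Implementation

### DataSource Pattern

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Npgsql;

public sealed class ModuleDataSource : IAsyncDisposable
{
    private readonly NpgsqlDataSource _dataSource;

    // Constructor added so the sketch compiles; wiring is assumed to come from DI.
    public ModuleDataSource(NpgsqlDataSource dataSource) => _dataSource = dataSource;

    public async Task<NpgsqlConnection> OpenConnectionAsync(
        string tenantId,
        CancellationToken cancellationToken = default)
    {
        var connection = await _dataSource.OpenConnectionAsync(cancellationToken);
        try
        {
            await ConfigureSessionAsync(connection, tenantId, cancellationToken);
            return connection;
        }
        catch
        {
            // Never hand out (or leak) a connection whose session setup failed.
            await connection.DisposeAsync();
            throw;
        }
    }

    private static async Task ConfigureSessionAsync(
        NpgsqlConnection connection,
        string tenantId,
        CancellationToken cancellationToken)
    {
        await using var cmd = connection.CreateCommand();
        // SET cannot take bind parameters, so use set_config() and pass the tenant ID
        // as a parameter instead of interpolating it into the SQL text.
        cmd.CommandText = """
            SELECT set_config('app.tenant_id', @tenant_id, false),
                   set_config('timezone', 'UTC', false),
                   set_config('statement_timeout', '30s', false);
            """;
        cmd.Parameters.AddWithValue("tenant_id", tenantId);
        await cmd.ExecuteNonQueryAsync(cancellationToken);
    }

    // Required by IAsyncDisposable; releases the underlying connection pool.
    public ValueTask DisposeAsync() => _dataSource.DisposeAsync();
}
```
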
### Repository Pattern

See [RULES.md](./RULES.md) Section 1 for complete repository implementation guidelines.

---

## Appendix B: Glossary

| Term | Definition |
|------|------------|
| **Strangler Fig** | Pattern where new system grows alongside old, gradually replacing it |
| **Dual-Write** | Writing to both MongoDB and PostgreSQL during transition |
| **Tier A/B/C** | Data classification by criticality for migration strategy |
| **DataSource** | Npgsql connection factory with tenant context configuration |
| **Determinism** | Property that same inputs always produce same outputs |

---

*Document Version: 2.0.0*
*Last Updated: 2025-11-28*