# Phase 6: VEX & Graph Conversion (Excititor)

**Sprint:** 8-10
**Duration:** 2-3 sprints
**Status:** TODO
**Dependencies:** Phase 5 (Vulnerabilities)

---

## Objectives

1. Create `StellaOps.Excititor.Storage.Postgres` project
2. Implement VEX schema in PostgreSQL
3. Handle graph nodes/edges efficiently
4. Preserve `graph_revision_id` stability (determinism critical)
5. Maintain VEX statement lattice logic

---

## Deliverables

| Deliverable | Acceptance Criteria |
|-------------|---------------------|
| VEX schema | All tables created with indexes |
| Graph storage | Nodes/edges efficiently stored |
| Statement storage | VEX statements with full provenance |
| Revision stability | Same inputs produce same revision_id |
| Integration tests | 100% coverage |

---

## Schema Reference

See [SPECIFICATION.md](../SPECIFICATION.md) Section 5.3 for complete VEX schema.

**Tables:**
- `vex.projects`
- `vex.graph_revisions`
- `vex.graph_nodes`
- `vex.graph_edges`
- `vex.statements`
- `vex.observations`
- `vex.linksets`
- `vex.linkset_events`
- `vex.consensus`
- `vex.consensus_holds`
- `vex.unknowns_snapshots`
- `vex.unknown_items`
- `vex.evidence_manifests`
- `vex.cvss_receipts`
- `vex.attestations`
- `vex.timeline_events`

---

## Sprint 6a: Core Schema & Repositories

### T6a.1: Create Excititor.Storage.Postgres Project

**Status:** TODO
**Estimate:** 0.5 days

**Subtasks:**
- [ ] Create project structure
- [ ] Add NuGet references
- [ ] Create `ExcititorDataSource` class
- [ ] Create `ServiceCollectionExtensions.cs`

---

### T6a.2: Implement Schema Migrations

**Status:** TODO
**Estimate:** 1.5 days

**Subtasks:**
- [ ] Create schema migration
- [ ] Include all tables
- [ ] Add indexes for graph traversal
- [ ] Add indexes for VEX lookups
- [ ] Test migration idempotency

---

### T6a.3: Implement Project Repository

**Status:** TODO
**Estimate:** 0.5 days

**Subtasks:**
- [ ] Implement CRUD operations
- [ ] Handle tenant scoping
- [ ] Write integration tests

---

### T6a.4: Implement VEX Statement Repository

**Status:** TODO
**Estimate:** 1.5 days

**Interface:**
```csharp
// Generic type arguments below are assumed (nullable single results,
// IReadOnlyList<T> for collections); adjust to the actual domain model.
public interface IVexStatementRepository
{
    Task<VexStatement?> GetAsync(string tenantId, Guid statementId, CancellationToken ct);
    Task<IReadOnlyList<VexStatement>> GetByVulnerabilityAsync(
        string tenantId, string vulnerabilityId, CancellationToken ct);
    Task<IReadOnlyList<VexStatement>> GetByProjectAsync(
        string tenantId, Guid projectId, CancellationToken ct);
    Task UpsertAsync(VexStatement statement, CancellationToken ct);
    Task<IReadOnlyList<VexStatement>> GetByGraphRevisionAsync(
        Guid graphRevisionId, CancellationToken ct);
}
```

**Subtasks:**
- [ ] Implement all interface methods
- [ ] Handle status and justification enums
- [ ] Preserve evidence JSONB
- [ ] Preserve provenance JSONB
- [ ] Write integration tests

---

### T6a.5: Implement VEX Observation Repository

**Status:** TODO
**Estimate:** 1 day

**Subtasks:**
- [ ] Implement CRUD operations
- [ ] Handle unique constraint on composite key
- [ ] Implement `FindByVulnerabilityAndProductAsync`
- [ ] Write integration tests

---

### T6a.6: Implement Linkset Repository

**Status:** TODO
**Estimate:** 0.5 days

**Subtasks:**
- [ ] Implement CRUD operations
- [ ] Implement event logging
- [ ] Write integration tests

---

### T6a.7: Implement Consensus Repository

**Status:** TODO
**Estimate:** 0.5 days

**Subtasks:**
- [ ] Implement CRUD operations
- [ ] Implement hold management
- [ ] Write integration tests

---

## Sprint 6b: Graph Storage

### T6b.1: Implement Graph Revision Repository

**Status:** TODO
**Estimate:** 1 day

**Interface:**
```csharp
// Generic type arguments below are assumed; adjust to the actual domain model.
public interface IGraphRevisionRepository
{
    Task<GraphRevision?> GetByIdAsync(Guid id, CancellationToken ct);
    Task<GraphRevision?> GetByRevisionIdAsync(string revisionId, CancellationToken ct);
    Task<GraphRevision?> GetLatestByProjectAsync(Guid projectId, CancellationToken ct);
    Task CreateAsync(GraphRevision revision, CancellationToken ct);
    Task<IReadOnlyList<GraphRevision>> GetHistoryAsync(
        Guid projectId, int limit, CancellationToken ct);
}
```

**Subtasks:**
- [ ] Implement all interface methods
- [ ] Handle revision_id uniqueness
- [ ] Handle parent_revision_id linking
- [ ] Write integration tests

---

### T6b.2: Implement Graph Node Repository

**Status:** TODO
**Estimate:** 1.5 days

**Interface:**
```csharp
// Generic type arguments and the GraphNode type name are assumed;
// adjust to the actual domain model.
public interface IGraphNodeRepository
{
    Task<GraphNode?> GetByIdAsync(long nodeId, CancellationToken ct);
    Task<GraphNode?> GetByKeyAsync(Guid graphRevisionId, string nodeKey, CancellationToken ct);
    Task<IReadOnlyList<GraphNode>> GetByRevisionAsync(
        Guid graphRevisionId, CancellationToken ct);
    Task BulkInsertAsync(
        Guid graphRevisionId, IEnumerable<GraphNode> nodes, CancellationToken ct);
    Task<long> GetCountAsync(Guid graphRevisionId, CancellationToken ct);
}
```

**Subtasks:**
- [ ] Implement all interface methods
- [ ] Implement bulk insert for efficiency
- [ ] Handle node_key uniqueness per revision
- [ ] Write integration tests

**Bulk Insert Optimization:**
```csharp
// Requires: using Npgsql; using NpgsqlTypes; using System.Text.Json;
public async Task BulkInsertAsync(
    Guid graphRevisionId,
    IEnumerable<GraphNode> nodes,
    CancellationToken ct)
{
    await using var connection = await _dataSource.OpenConnectionAsync("system", ct);

    // Binary COPY streams all rows in one operation instead of one INSERT per row.
    await using var writer = await connection.BeginBinaryImportAsync(
        "COPY vex.graph_nodes (graph_revision_id, node_key, node_type, purl, name, version, attributes) " +
        "FROM STDIN (FORMAT BINARY)", ct);

    foreach (var node in nodes)
    {
        // Values must be written in the same order as the COPY column list.
        await writer.StartRowAsync(ct);
        await writer.WriteAsync(graphRevisionId, ct);
        await writer.WriteAsync(node.NodeKey, ct);
        await writer.WriteAsync(node.NodeType, ct);
        await writer.WriteAsync(node.Purl, NpgsqlDbType.Text, ct);
        await writer.WriteAsync(node.Name, NpgsqlDbType.Text, ct);
        await writer.WriteAsync(node.Version, NpgsqlDbType.Text, ct);
        await writer.WriteAsync(JsonSerializer.Serialize(node.Attributes), NpgsqlDbType.Jsonb, ct);
    }

    // The import is committed only here; disposing without completing cancels it.
    await writer.CompleteAsync(ct);
}
```

---

### T6b.3: Implement Graph Edge Repository

**Status:** TODO
**Estimate:** 1.5 days

**Interface:**
```csharp
// Generic type arguments and the GraphEdge type name are assumed;
// adjust to the actual domain model.
public interface IGraphEdgeRepository
{
    Task<IReadOnlyList<GraphEdge>> GetByRevisionAsync(
        Guid graphRevisionId, CancellationToken ct);
    Task<IReadOnlyList<GraphEdge>> GetOutgoingAsync(
        long fromNodeId, CancellationToken ct);
    Task<IReadOnlyList<GraphEdge>> GetIncomingAsync(
        long toNodeId, CancellationToken ct);
    Task BulkInsertAsync(
        Guid graphRevisionId, IEnumerable<GraphEdge> edges, CancellationToken ct);
    Task<long> GetCountAsync(Guid graphRevisionId, CancellationToken ct);
}
```

**Subtasks:**
- [ ] Implement all interface methods
- [ ] Implement bulk insert for efficiency
- [ ] Optimize for traversal queries
- [ ] Write integration tests

---

### T6b.4: Verify Graph Revision ID Stability

**Status:** TODO
**Estimate:** 1 day

**Description:**
Critical: Same SBOM + feeds + policy must produce identical revision_id.

**Subtasks:**
- [ ] Document revision_id computation algorithm
- [ ] Verify nodes are inserted in deterministic order
- [ ] Verify edges are inserted in deterministic order
- [ ] Write stability tests

**Stability Test:**
```csharp
// Requires: using FluentAssertions; using Xunit;
[Fact]
public async Task Same_Inputs_Should_Produce_Same_RevisionId()
{
    var sbom = await LoadSbom("testdata/stable-sbom.json");
    var feedSnapshot = "feed-v1.2.3";
    var policyVersion = "policy-v1.0";

    // Compute multiple times
    var revisions = new List<string>();
    for (int i = 0; i < 5; i++)
    {
        var graph = await _graphService.ComputeGraphAsync(
            sbom, feedSnapshot, policyVersion);
        revisions.Add(graph.RevisionId);
    }

    // All must be identical
    revisions.Distinct().Should().HaveCount(1);
}
```

---

## Sprint 6c: Migration & Verification

### T6c.1: Build Graph Conversion Service

**Status:** TODO
**Estimate:** 1.5 days

**Description:**
Convert existing MongoDB graphs to PostgreSQL.

**Subtasks:**
- [ ] Parse MongoDB graph documents
- [ ] Map to graph_revisions table
- [ ] Extract and insert nodes
- [ ] Extract and insert edges
- [ ] Verify node/edge counts match

---

### T6c.2: Build VEX Conversion Service

**Status:** TODO
**Estimate:** 1 day

**Subtasks:**
- [ ] Parse MongoDB VEX statements
- [ ] Map to vex.statements table
- [ ] Preserve provenance
- [ ] Preserve evidence

---

### T6c.3: Run Dual Pipeline Comparison

**Status:** TODO
**Estimate:** 2 days

**Description:**
Run graph computation on both backends and compare.

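
The comparison in this task could be sketched as a pure function over per-backend graph summaries. This is a minimal illustration and not the project's implementation: `GraphSummary`, `GraphComparer`, and their members are hypothetical names, and the sketch assumes each backend can report a revision ID plus node and edge counts for a project.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical summary of one computed graph; not part of the Excititor codebase.
public sealed record GraphSummary(string RevisionId, long NodeCount, long EdgeCount);

public static class GraphComparer
{
    // Returns one human-readable line per mismatch; an empty list means the backends agree.
    public static IReadOnlyList<string> Compare(GraphSummary mongo, GraphSummary postgres)
    {
        var differences = new List<string>();

        // Revision IDs are compared ordinally because they must match exactly.
        if (!string.Equals(mongo.RevisionId, postgres.RevisionId, StringComparison.Ordinal))
            differences.Add($"revision_id: mongo={mongo.RevisionId} postgres={postgres.RevisionId}");

        if (mongo.NodeCount != postgres.NodeCount)
            differences.Add($"node count: mongo={mongo.NodeCount} postgres={postgres.NodeCount}");

        if (mongo.EdgeCount != postgres.EdgeCount)
            differences.Add($"edge count: mongo={mongo.EdgeCount} postgres={postgres.EdgeCount}");

        return differences;
    }
}
```

An empty result for a sampled project means the two pipelines agree on these three values; any returned line is a candidate entry for the documented differences.
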
**Subtasks:**
- [ ] Select sample projects
- [ ] Compute graphs with MongoDB
- [ ] Compute graphs with PostgreSQL
- [ ] Compare revision_ids (must match)
- [ ] Compare node counts
- [ ] Compare edge counts
- [ ] Compare VEX statements
- [ ] Document any differences

---

### T6c.4: Migrate Projects

**Status:** TODO
**Estimate:** 1 day

**Subtasks:**
- [ ] Identify projects to migrate (active VEX)
- [ ] Run conversion for each project
- [ ] Verify latest graph revision
- [ ] Verify VEX statements

---

### T6c.5: Switch to PostgreSQL-Only

**Status:** TODO
**Estimate:** 0.5 days

**Subtasks:**
- [ ] Update configuration
- [ ] Deploy to staging
- [ ] Run full test suite
- [ ] Deploy to production
- [ ] Monitor metrics

---

## Exit Criteria

- [ ] All repository interfaces implemented
- [ ] Graph storage working efficiently
- [ ] Graph revision IDs stable (deterministic)
- [ ] VEX statements preserved correctly
- [ ] All comparison tests pass
- [ ] Excititor running on PostgreSQL in production

---

## Risks & Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Revision ID instability | Medium | Critical | Deterministic ordering tests |
| Graph storage performance | Medium | High | Bulk insert, index optimization |
| VEX lattice logic errors | Low | High | Extensive comparison testing |

---

## Performance Considerations

### Graph Storage

- Use `BIGSERIAL` for node/edge IDs (high volume)
- Use `COPY` for bulk inserts (estimated 10-100x faster than row-by-row `INSERT`)
- Index `(graph_revision_id, node_key)` for lookups
- Index `(from_node_id)` and `(to_node_id)` for traversal

### Estimated Volumes

| Table | Estimated Rows per Project | Total Estimated Rows |
|-------|----------------------------|----------------------|
| `vex.graph_nodes` | 1,000 - 50,000 | 10M+ |
| `vex.graph_edges` | 2,000 - 100,000 | 20M+ |
| `vex.statements` | 100 - 5,000 | 1M+ |

---

*Phase Version: 1.0.0*
*Last Updated: 2025-11-28*