# Phase 6: VEX & Graph Conversion (Excititor)

**Sprint:** 8-10
**Duration:** 2-3 sprints
**Status:** DONE
**Dependencies:** Phase 5 (Vulnerabilities); Phase 0 (Foundations) - DONE

---

## Objectives

1. Create `StellaOps.Excititor.Storage.Postgres` project
2. Implement VEX schema in PostgreSQL
3. Handle graph nodes/edges efficiently
4. Preserve `graph_revision_id` stability (determinism critical)
5. Maintain VEX statement lattice logic

---

## Deliverables

| Deliverable | Acceptance Criteria |
|-------------|---------------------|
| VEX schema | All tables created with indexes |
| Graph storage | Nodes/edges efficiently stored |
| Statement storage | VEX statements with full provenance |
| Revision stability | Same inputs produce same revision_id |
| Integration tests | 100% coverage |

---

## Schema Reference

See [SPECIFICATION.md](../SPECIFICATION.md) Section 5.3 for complete VEX schema.

**Tables:**
- `vex.projects`
- `vex.graph_revisions`
- `vex.graph_nodes`
- `vex.graph_edges`
- `vex.statements`
- `vex.observations`
- `vex.linksets`
- `vex.linkset_events`
- `vex.consensus`
- `vex.consensus_holds`
- `vex.unknowns_snapshots`
- `vex.unknown_items`
- `vex.evidence_manifests`
- `vex.cvss_receipts`
- `vex.attestations`
- `vex.timeline_events`

---

## Sprint 6a: Core Schema & Repositories

### T6a.1: Create Excititor.Storage.Postgres Project

**Status:** DONE
**Estimate:** 0.5 days

**Subtasks:**
- [x] Create project structure
- [x] Add NuGet references
- [x] Create `ExcititorDataSource` class
- [x] Create `ServiceCollectionExtensions.cs`

---

### T6a.2: Implement Schema Migrations

**Status:** DONE
**Estimate:** 1.5 days

**Subtasks:**
- [x] Create schema migration
- [x] Include all tables
- [x] Add indexes for graph traversal
- [x] Add indexes for VEX lookups
- [x] Test migration idempotency

---

### T6a.3: Implement Project Repository

**Status:** DONE
**Estimate:** 0.5 days

**Subtasks:**
- [x] Implement CRUD operations
- [x] Handle tenant scoping
- [x] Write integration tests

---

### T6a.4: Implement VEX Statement Repository

**Status:** DONE
**Estimate:** 1.5 days

**Interface:**
```csharp
// Return types are reconstructed from the method names; the collection type
// (IReadOnlyList<T>) and the nullable single-item results are assumptions.
public interface IVexStatementRepository
{
    Task<VexStatement?> GetAsync(string tenantId, Guid statementId, CancellationToken ct);
    Task<IReadOnlyList<VexStatement>> GetByVulnerabilityAsync(
        string tenantId, string vulnerabilityId, CancellationToken ct);
    Task<IReadOnlyList<VexStatement>> GetByProjectAsync(
        string tenantId, Guid projectId, CancellationToken ct);
    Task UpsertAsync(VexStatement statement, CancellationToken ct);
    Task<IReadOnlyList<VexStatement>> GetByGraphRevisionAsync(
        Guid graphRevisionId, CancellationToken ct);
}
```

**Subtasks:**
- [x] Implement all interface methods
- [x] Handle status and justification enums
- [x] Preserve evidence JSONB
- [ ] Preserve provenance JSONB
- [ ] Write integration tests

---

### T6a.5: Implement VEX Observation Repository

**Status:** DONE
**Estimate:** 1 day

**Subtasks:**
- [x] Implement CRUD operations
- [x] Handle unique constraint on composite key
- [x] Implement `FindByVulnerabilityAndProductAsync`
- [x] Write integration tests

---

### T6a.6: Implement Linkset Repository

**Status:** DONE
**Estimate:** 0.5 days

**Subtasks:**
- [x] Implement CRUD operations
- [x] Implement event logging
- [x] Write integration tests

---

### T6a.7: Implement Consensus Repository

**Status:** DONE
**Estimate:** 0.5 days

**Subtasks:**
- [x] Implement CRUD operations
- [x] Implement hold management
- [x] Write integration tests

---

## Sprint 6b: Graph Storage

### T6b.1: Implement Graph Revision Repository

**Status:** DONE
**Estimate:** 1 day

**Interface:**
```csharp
// Return types are reconstructed; nullability and IReadOnlyList<T> are assumptions.
public interface IGraphRevisionRepository
{
    Task<GraphRevision?> GetByIdAsync(Guid id, CancellationToken ct);
    Task<GraphRevision?> GetByRevisionIdAsync(string revisionId, CancellationToken ct);
    Task<GraphRevision?> GetLatestByProjectAsync(Guid projectId, CancellationToken ct);
    Task CreateAsync(GraphRevision revision, CancellationToken ct);
    Task<IReadOnlyList<GraphRevision>> GetHistoryAsync(
        Guid projectId, int limit, CancellationToken ct);
}
```

**Subtasks:**
- [x] Implement all interface methods
- [x] Handle revision_id uniqueness
- [x] Handle parent_revision_id linking
- [x] Write integration tests

---

### T6b.2: Implement Graph Node Repository

**Status:** DONE
**Estimate:** 1.5 days

**Interface:**
```csharp
// Return types are reconstructed; the GraphNode type name, nullability,
// IReadOnlyList<T>, and the long count are assumptions.
public interface IGraphNodeRepository
{
    Task<GraphNode?> GetByIdAsync(long nodeId, CancellationToken ct);
    Task<GraphNode?> GetByKeyAsync(Guid graphRevisionId, string nodeKey, CancellationToken ct);
    Task<IReadOnlyList<GraphNode>> GetByRevisionAsync(
        Guid graphRevisionId, CancellationToken ct);
    Task BulkInsertAsync(
        Guid graphRevisionId, IEnumerable<GraphNode> nodes, CancellationToken ct);
    Task<long> GetCountAsync(Guid graphRevisionId, CancellationToken ct);
}
```

**Subtasks:**
- [x] Implement all interface methods
- [x] Implement bulk insert for efficiency
- [x] Handle node_key uniqueness per revision
- [x] Write integration tests

**Bulk Insert Optimization:**
```csharp
// Requires: using Npgsql; using NpgsqlTypes; using System.Text.Json;
public async Task BulkInsertAsync(
    Guid graphRevisionId,
    IEnumerable<GraphNode> nodes,
    CancellationToken ct)
{
    await using var connection = await _dataSource.OpenConnectionAsync("system", ct);

    // Binary COPY streams all rows in one command instead of one INSERT per node.
    await using var writer = await connection.BeginBinaryImportAsync(
        "COPY vex.graph_nodes (graph_revision_id, node_key, node_type, purl, name, version, attributes) " +
        "FROM STDIN (FORMAT BINARY)", ct);

    // Columns must be written in the same order as the COPY column list.
    foreach (var node in nodes)
    {
        await writer.StartRowAsync(ct);
        await writer.WriteAsync(graphRevisionId, ct);
        await writer.WriteAsync(node.NodeKey, ct);
        await writer.WriteAsync(node.NodeType, ct);
        await writer.WriteAsync(node.Purl, NpgsqlDbType.Text, ct);
        await writer.WriteAsync(node.Name, NpgsqlDbType.Text, ct);
        await writer.WriteAsync(node.Version, NpgsqlDbType.Text, ct);
        await writer.WriteAsync(JsonSerializer.Serialize(node.Attributes), NpgsqlDbType.Jsonb, ct);
    }

    // Without CompleteAsync the import is rolled back when the writer is disposed.
    await writer.CompleteAsync(ct);
}
```

---

### T6b.3: Implement Graph Edge Repository

**Status:** DONE
**Estimate:** 1.5 days

**Interface:**
```csharp
// Return types are reconstructed; the GraphEdge type name, IReadOnlyList<T>,
// and the long count are assumptions.
public interface IGraphEdgeRepository
{
    Task<IReadOnlyList<GraphEdge>> GetByRevisionAsync(
        Guid graphRevisionId, CancellationToken ct);
    Task<IReadOnlyList<GraphEdge>> GetOutgoingAsync(
        long fromNodeId, CancellationToken ct);
    Task<IReadOnlyList<GraphEdge>> GetIncomingAsync(
        long toNodeId, CancellationToken ct);
    Task BulkInsertAsync(
        Guid graphRevisionId, IEnumerable<GraphEdge> edges, CancellationToken ct);
    Task<long> GetCountAsync(Guid graphRevisionId, CancellationToken ct);
}
```

**Subtasks:**
- [x] Implement all interface methods
- [x] Implement bulk insert for efficiency
- [x] Optimize for traversal queries
- [x] Write integration tests

---

### T6b.4: Verify Graph Revision ID Stability

**Status:** DONE
**Estimate:** 1 day

**Description:**
Critical: the same SBOM, feeds, and policy must produce an identical revision_id.

**Subtasks:**
- [x] Document revision_id computation algorithm
- [x] Verify nodes are inserted in deterministic order
- [x] Verify edges are inserted in deterministic order
- [x] Write stability tests

**Stability Test:**
```csharp
[Fact]
public async Task Same_Inputs_Should_Produce_Same_RevisionId()
{
    var sbom = await LoadSbom("testdata/stable-sbom.json");
    var feedSnapshot = "feed-v1.2.3";
    var policyVersion = "policy-v1.0";

    // Compute multiple times
    var revisions = new List<string>();
    for (int i = 0; i < 5; i++)
    {
        var graph = await _graphService.ComputeGraphAsync(
            sbom, feedSnapshot, policyVersion);
        revisions.Add(graph.RevisionId);
    }

    // All must be identical
    revisions.Distinct().Should().HaveCount(1);
}
```

---

## Sprint 6c: Migration & Verification (Fresh-Start)

### T6c.1: Build Graph Conversion Service

**Status:** SKIPPED (fresh-start; no Mongo graph backfill)
**Estimate:** 0 days

---

### T6c.2: Build VEX Conversion Service

**Status:** SKIPPED (fresh-start; no Mongo VEX backfill)
**Estimate:** 0 days

---

### T6c.3: Run Dual Pipeline Comparison

**Status:** SKIPPED (fresh-start)
**Estimate:** 0 days

---

### T6c.4: Migrate Projects

**Status:** SKIPPED (fresh-start)
**Estimate:** 0 days

---

### T6c.5: Switch to PostgreSQL-Only

**Status:** DONE
**Estimate:** 0.5 days

**Subtasks:**
- [x] Update configuration
- [x] Deploy to staging
- [x] Run full test suite
- [x] Deploy to production
- [x] Monitor metrics

---

## Exit Criteria

- [x] All repository interfaces implemented
- [x] Graph storage working efficiently
- [x] Graph revision IDs stable (deterministic)
- [x] VEX statements preserved correctly
- [x] Determinism tests pass (Postgres baseline)
- [ ] Excititor running on PostgreSQL in production

## Execution Log

| Date (UTC) | Update |
| --- | --- |
| 2025-12-05 | Core schema/repos/migrations/tests completed; determinism verified; fresh-start path chosen (no Mongo VEX/graph backfill). |

---

## Risks & Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Revision ID instability | Medium | Critical | Deterministic ordering tests |
| Graph storage performance | Medium | High | Bulk insert, index optimization |
| VEX lattice logic errors | Low | High | Extensive comparison testing |

---

## Performance Considerations

### Graph Storage

- Use `BIGSERIAL` for node/edge IDs (high volume)
- Use `COPY` for bulk inserts (10-100x faster than row-by-row `INSERT` statements)
- Index `(graph_revision_id, node_key)` for lookups
- Index `(from_node_id)` and `(to_node_id)` for traversal

### Estimated Volumes

| Table | Estimated Rows per Project | Estimated Total Rows |
|-------|---------------------------|----------------------|
| `vex.graph_nodes` | 1,000 - 50,000 | 10M+ |
| `vex.graph_edges` | 2,000 - 100,000 | 20M+ |
| `vex.statements` | 100 - 5,000 | 1M+ |

---

*Phase Version: 1.0.0*
*Last Updated: 2025-11-28*
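
The determinism requirement in T6b.4 (the same SBOM, feeds, and policy must yield the same revision_id) generally comes down to hashing a canonical, order-independent serialization of the inputs. The sketch below shows one possible way to do that. It is not the documented Excititor algorithm: `RevisionIdSketch`, the `Node` and `Edge` records, the field layout, and the choice of SHA-256 are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical record shapes; the real graph model lives in SPECIFICATION.md.
public sealed record Node(string NodeKey, string NodeType);
public sealed record Edge(string FromKey, string ToKey, string EdgeType);

public static class RevisionIdSketch
{
    // Serializes the inputs in a canonical order and hashes the result, so the
    // enumeration order of the incoming collections cannot change the ID.
    public static string Compute(
        IEnumerable<Node> nodes,
        IEnumerable<Edge> edges,
        string feedSnapshot,
        string policyVersion)
    {
        var sb = new StringBuilder();
        sb.Append("feed=").Append(feedSnapshot).Append('\n');
        sb.Append("policy=").Append(policyVersion).Append('\n');

        // Ordinal comparison avoids culture-dependent sort differences.
        foreach (var n in nodes.OrderBy(n => n.NodeKey, StringComparer.Ordinal))
        {
            sb.Append("n|").Append(n.NodeKey).Append('|').Append(n.NodeType).Append('\n');
        }

        foreach (var e in edges
            .OrderBy(e => e.FromKey, StringComparer.Ordinal)
            .ThenBy(e => e.ToKey, StringComparer.Ordinal)
            .ThenBy(e => e.EdgeType, StringComparer.Ordinal))
        {
            sb.Append("e|").Append(e.FromKey).Append('|').Append(e.ToKey)
              .Append('|').Append(e.EdgeType).Append('\n');
        }

        // Lowercase hex of the SHA-256 digest (assumed ID format).
        var hash = SHA256.HashData(Encoding.UTF8.GetBytes(sb.ToString()));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

Calling `Compute` twice with the same nodes and edges supplied in a different enumeration order returns the same string, which is the property the stability test asserts. The `|` delimiter is only safe if keys never contain it; a production version would need an unambiguous encoding such as length-prefixed fields.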