Phase 6: VEX & Graph Conversion (Excititor)
Sprint: 8-10 | Duration: 2-3 sprints | Status: TODO | Dependencies: Phase 5 (Vulnerabilities)
Objectives
- Create `StellaOps.Excititor.Storage.Postgres` project
- Implement VEX schema in PostgreSQL
- Handle graph nodes/edges efficiently
- Preserve graph_revision_id stability (determinism critical)
- Maintain VEX statement lattice logic
Deliverables
| Deliverable | Acceptance Criteria |
|---|---|
| VEX schema | All tables created with indexes |
| Graph storage | Nodes/edges efficiently stored |
| Statement storage | VEX statements with full provenance |
| Revision stability | Same inputs produce same revision_id |
| Integration tests | 100% coverage |
Schema Reference
See SPECIFICATION.md Section 5.3 for complete VEX schema.
Tables:
- `vex.projects`
- `vex.graph_revisions`
- `vex.graph_nodes`
- `vex.graph_edges`
- `vex.statements`
- `vex.observations`
- `vex.linksets`
- `vex.linkset_events`
- `vex.consensus`
- `vex.consensus_holds`
- `vex.unknowns_snapshots`
- `vex.unknown_items`
- `vex.evidence_manifests`
- `vex.cvss_receipts`
- `vex.attestations`
- `vex.timeline_events`
Sprint 6a: Core Schema & Repositories
T6a.1: Create Excititor.Storage.Postgres Project
Status: TODO Estimate: 0.5 days
Subtasks:
- Create project structure
- Add NuGet references
- Create `ExcititorDataSource` class
- Create `ServiceCollectionExtensions.cs`
T6a.2: Implement Schema Migrations
Status: TODO Estimate: 1.5 days
Subtasks:
- Create schema migration
- Include all tables
- Add indexes for graph traversal
- Add indexes for VEX lookups
- Test migration idempotency
T6a.3: Implement Project Repository
Status: TODO Estimate: 0.5 days
Subtasks:
- Implement CRUD operations
- Handle tenant scoping
- Write integration tests
T6a.4: Implement VEX Statement Repository
Status: TODO Estimate: 1.5 days
Interface:
```csharp
public interface IVexStatementRepository
{
    Task<VexStatement?> GetAsync(string tenantId, Guid statementId, CancellationToken ct);
    Task<IReadOnlyList<VexStatement>> GetByVulnerabilityAsync(
        string tenantId, string vulnerabilityId, CancellationToken ct);
    Task<IReadOnlyList<VexStatement>> GetByProjectAsync(
        string tenantId, Guid projectId, CancellationToken ct);
    Task<VexStatement> UpsertAsync(VexStatement statement, CancellationToken ct);
    Task<IReadOnlyList<VexStatement>> GetByGraphRevisionAsync(
        Guid graphRevisionId, CancellationToken ct);
}
```
Subtasks:
- Implement all interface methods
- Handle status and justification enums
- Preserve evidence JSONB
- Preserve provenance JSONB
- Write integration tests
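For the "Handle status and justification enums" subtask, one option is to store both enums as text and map them through explicit tables instead of relying on C# member names or integer values, so renaming or reordering enum members never changes stored data. The sketch below is a minimal illustration that assumes the columns use the OpenVEX/CISA status and justification vocabulary in snake_case; `VexStatus`, `VexJustification`, and `VexEnumMapping` are hypothetical names, and SPECIFICATION.md Section 5.3 defines the actual values.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical enums; the authoritative value set lives in SPECIFICATION.md Section 5.3.
public enum VexStatus { NotAffected, Affected, Fixed, UnderInvestigation }

public enum VexJustification
{
    ComponentNotPresent,
    VulnerableCodeNotPresent,
    VulnerableCodeNotInExecutePath,
    VulnerableCodeCannotBeControlledByAdversary,
    InlineMitigationsAlreadyExist,
}

public static class VexEnumMapping
{
    // Explicit tables keep the stored text stable even if C# members are renamed or reordered.
    private static readonly Dictionary<VexStatus, string> StatusToDb = new()
    {
        [VexStatus.NotAffected] = "not_affected",
        [VexStatus.Affected] = "affected",
        [VexStatus.Fixed] = "fixed",
        [VexStatus.UnderInvestigation] = "under_investigation",
    };

    private static readonly Dictionary<VexJustification, string> JustificationToDb = new()
    {
        [VexJustification.ComponentNotPresent] = "component_not_present",
        [VexJustification.VulnerableCodeNotPresent] = "vulnerable_code_not_present",
        [VexJustification.VulnerableCodeNotInExecutePath] = "vulnerable_code_not_in_execute_path",
        [VexJustification.VulnerableCodeCannotBeControlledByAdversary] = "vulnerable_code_cannot_be_controlled_by_adversary",
        [VexJustification.InlineMitigationsAlreadyExist] = "inline_mitigations_already_exist",
    };

    // Reverse lookups are built after the forward tables (static initializers run in textual order).
    private static readonly Dictionary<string, VexStatus> DbToStatus =
        StatusToDb.ToDictionary(kv => kv.Value, kv => kv.Key, StringComparer.Ordinal);

    private static readonly Dictionary<string, VexJustification> DbToJustification =
        JustificationToDb.ToDictionary(kv => kv.Value, kv => kv.Key, StringComparer.Ordinal);

    public static string ToDb(VexStatus status) => StatusToDb[status];

    public static string ToDb(VexJustification justification) => JustificationToDb[justification];

    // Unknown text fails loudly instead of silently mapping to a default member.
    public static VexStatus ParseStatus(string value) =>
        DbToStatus.TryGetValue(value, out var status)
            ? status
            : throw new FormatException($"Unknown VEX status '{value}'.");

    // A NULL column maps to null; VEX justifications normally accompany only not_affected statements.
    public static VexJustification? ParseJustification(string? value)
    {
        if (value is null) return null;
        return DbToJustification.TryGetValue(value, out var justification)
            ? justification
            : throw new FormatException($"Unknown VEX justification '{value}'.");
    }
}
```

The repository would call `ToDb` when binding parameters and `ParseStatus`/`ParseJustification` when reading rows; matching is ordinal and case-sensitive, so `NOT_AFFECTED` is rejected rather than coerced.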
T6a.5: Implement VEX Observation Repository
Status: TODO Estimate: 1 day
Subtasks:
- Implement CRUD operations
- Handle unique constraint on composite key
- Implement FindByVulnerabilityAndProductAsync
- Write integration tests
T6a.6: Implement Linkset Repository
Status: TODO Estimate: 0.5 days
Subtasks:
- Implement CRUD operations
- Implement event logging
- Write integration tests
T6a.7: Implement Consensus Repository
Status: TODO Estimate: 0.5 days
Subtasks:
- Implement CRUD operations
- Implement hold management
- Write integration tests
Sprint 6b: Graph Storage
T6b.1: Implement Graph Revision Repository
Status: TODO Estimate: 1 day
Interface:
```csharp
public interface IGraphRevisionRepository
{
    Task<GraphRevision?> GetByIdAsync(Guid id, CancellationToken ct);
    Task<GraphRevision?> GetByRevisionIdAsync(string revisionId, CancellationToken ct);
    Task<GraphRevision?> GetLatestByProjectAsync(Guid projectId, CancellationToken ct);
    Task<GraphRevision> CreateAsync(GraphRevision revision, CancellationToken ct);
    Task<IReadOnlyList<GraphRevision>> GetHistoryAsync(
        Guid projectId, int limit, CancellationToken ct);
}
```
Subtasks:
- Implement all interface methods
- Handle revision_id uniqueness
- Handle parent_revision_id linking
- Write integration tests
T6b.2: Implement Graph Node Repository
Status: TODO Estimate: 1.5 days
Interface:
```csharp
public interface IGraphNodeRepository
{
    Task<GraphNode?> GetByIdAsync(long nodeId, CancellationToken ct);
    Task<GraphNode?> GetByKeyAsync(Guid graphRevisionId, string nodeKey, CancellationToken ct);
    Task<IReadOnlyList<GraphNode>> GetByRevisionAsync(
        Guid graphRevisionId, CancellationToken ct);
    Task BulkInsertAsync(
        Guid graphRevisionId, IEnumerable<GraphNode> nodes, CancellationToken ct);
    Task<int> GetCountAsync(Guid graphRevisionId, CancellationToken ct);
}
```
Subtasks:
- Implement all interface methods
- Implement bulk insert for efficiency
- Handle node_key uniqueness per revision
- Write integration tests
Bulk Insert Optimization:
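```csharp
using System.Linq;
using System.Text.Json;
using Npgsql;
using NpgsqlTypes;

// Member of the graph node repository; _dataSource is the ExcititorDataSource.
public async Task BulkInsertAsync(
    Guid graphRevisionId,
    IEnumerable<GraphNode> nodes,
    CancellationToken ct)
{
    await using var connection = await _dataSource.OpenConnectionAsync("system", ct);

    // Binary COPY streams every row through one command instead of one INSERT per node.
    await using var writer = await connection.BeginBinaryImportAsync(
        "COPY vex.graph_nodes (graph_revision_id, node_key, node_type, purl, name, version, attributes) " +
        "FROM STDIN (FORMAT BINARY)", ct);

    // Write rows in ordinal node_key order so inserts are deterministic (see T6b.4).
    foreach (var node in nodes.OrderBy(n => n.NodeKey, StringComparer.Ordinal))
    {
        await writer.StartRowAsync(ct);
        await writer.WriteAsync(graphRevisionId, ct);
        await writer.WriteAsync(node.NodeKey, ct);
        await writer.WriteAsync(node.NodeType, ct);
        await writer.WriteAsync(node.Purl, NpgsqlDbType.Text, ct);
        await writer.WriteAsync(node.Name, NpgsqlDbType.Text, ct);
        await writer.WriteAsync(node.Version, NpgsqlDbType.Text, ct);
        await writer.WriteAsync(JsonSerializer.Serialize(node.Attributes), NpgsqlDbType.Jsonb, ct);
    }

    // CompleteAsync commits the COPY; disposing the writer without calling it cancels the import.
    await writer.CompleteAsync(ct);
}
```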
T6b.3: Implement Graph Edge Repository
Status: TODO Estimate: 1.5 days
Interface:
```csharp
public interface IGraphEdgeRepository
{
    Task<IReadOnlyList<GraphEdge>> GetByRevisionAsync(
        Guid graphRevisionId, CancellationToken ct);
    Task<IReadOnlyList<GraphEdge>> GetOutgoingAsync(
        long fromNodeId, CancellationToken ct);
    Task<IReadOnlyList<GraphEdge>> GetIncomingAsync(
        long toNodeId, CancellationToken ct);
    Task BulkInsertAsync(
        Guid graphRevisionId, IEnumerable<GraphEdge> edges, CancellationToken ct);
    Task<int> GetCountAsync(Guid graphRevisionId, CancellationToken ct);
}
```
Subtasks:
- Implement all interface methods
- Implement bulk insert for efficiency
- Optimize for traversal queries
- Write integration tests
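For the "Optimize for traversal queries" subtask, reachability can be computed inside PostgreSQL with a recursive CTE that follows `vex.graph_edges` through the `(from_node_id)` index at each step. This is a minimal sketch, assuming edges only connect nodes belonging to the same graph revision (so the recursion needs no `graph_revision_id` filter) and that the query is run through Npgsql with a `@start_node_id` parameter; `GraphTraversalQueries` is a hypothetical name.

```csharp
// Hypothetical query constant for the edge repository: every node reachable
// downstream of a start node. Each recursive step probes the (from_node_id) index.
public static class GraphTraversalQueries
{
    public const string DownstreamNodeIds = """
        WITH RECURSIVE reachable (node_id) AS (
            SELECT e.to_node_id
            FROM vex.graph_edges e
            WHERE e.from_node_id = @start_node_id
          UNION
            SELECT e.to_node_id
            FROM vex.graph_edges e
            JOIN reachable r ON e.from_node_id = r.node_id
        )
        SELECT node_id FROM reachable;
        """;
}
```

`UNION` (rather than `UNION ALL`) discards rows that were already produced, so the recursion terminates even if the dependency graph contains a cycle. An upstream (incoming) variant swaps the roles of `from_node_id` and `to_node_id` and relies on the `(to_node_id)` index instead.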
T6b.4: Verify Graph Revision ID Stability
Status: TODO Estimate: 1 day
Description: Critical: the same SBOM, feed snapshot, and policy version must always produce an identical revision_id.
Subtasks:
- Document revision_id computation algorithm
- Verify nodes are inserted in deterministic order
- Verify edges are inserted in deterministic order
- Write stability tests
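One way to approach "Document revision_id computation algorithm" is to derive revision_id as a content hash over a canonical, order-independent serialization of the inputs. The sketch below is a minimal illustration only: it assumes revision_id is a lowercase SHA-256 hex digest over the feed snapshot, the policy version, and the node and edge sets sorted by ordinal comparison, and it hashes only a few illustrative fields. `RevisionIdCalculator`, `NodeFact`, and `EdgeFact` are hypothetical names; the real algorithm is whatever this task documents.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical minimal inputs: nodes are identified by node_key, edges by their endpoint keys.
public sealed record NodeFact(string NodeKey, string NodeType, string? Purl);
public sealed record EdgeFact(string FromNodeKey, string ToNodeKey, string EdgeType);

public static class RevisionIdCalculator
{
    public static string Compute(
        string feedSnapshot,
        string policyVersion,
        IEnumerable<NodeFact> nodes,
        IEnumerable<EdgeFact> edges)
    {
        var canonical = new StringBuilder();

        // Length-prefix each field so ("ab", "c") and ("a", "bc") serialize differently;
        // null is encoded as -1 so it never collides with an empty string.
        void Field(string? value) =>
            canonical.Append(value is null ? "-1:" : $"{value.Length}:{value}");

        Field(feedSnapshot);
        Field(policyVersion);

        // Ordinal comparison is culture-independent, so the order is the same on every host
        // and does not depend on the order in which the caller supplies nodes and edges.
        foreach (var node in nodes
            .OrderBy(x => x.NodeKey, StringComparer.Ordinal)
            .ThenBy(x => x.NodeType, StringComparer.Ordinal)
            .ThenBy(x => x.Purl, StringComparer.Ordinal))
        {
            canonical.Append('N');
            Field(node.NodeKey);
            Field(node.NodeType);
            Field(node.Purl);
        }

        foreach (var edge in edges
            .OrderBy(x => x.FromNodeKey, StringComparer.Ordinal)
            .ThenBy(x => x.ToNodeKey, StringComparer.Ordinal)
            .ThenBy(x => x.EdgeType, StringComparer.Ordinal))
        {
            canonical.Append('E');
            Field(edge.FromNodeKey);
            Field(edge.ToNodeKey);
            Field(edge.EdgeType);
        }

        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(canonical.ToString()));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

Any further field that defines a revision (for example name, version, or attributes) would need the same treatment: it must be serialized canonically, with JSON keys sorted, before hashing.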
Stability Test:
```csharp
using FluentAssertions;
using Xunit;

[Fact]
public async Task Same_Inputs_Should_Produce_Same_RevisionId()
{
    var sbom = await LoadSbom("testdata/stable-sbom.json");
    var feedSnapshot = "feed-v1.2.3";
    var policyVersion = "policy-v1.0";

    // Compute the graph several times from identical inputs
    var revisions = new List<string>();
    for (int i = 0; i < 5; i++)
    {
        var graph = await _graphService.ComputeGraphAsync(
            sbom, feedSnapshot, policyVersion);
        revisions.Add(graph.RevisionId);
    }

    // All revision IDs must be identical
    revisions.Distinct().Should().HaveCount(1);
}
```
Sprint 6c: Migration & Verification
T6c.1: Build Graph Conversion Service
Status: TODO Estimate: 1.5 days
Description: Convert existing MongoDB graphs to PostgreSQL.
Subtasks:
- Parse MongoDB graph documents
- Map to graph_revisions table
- Extract and insert nodes
- Extract and insert edges
- Verify node/edge counts match
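Because node IDs are `BIGSERIAL` values assigned by PostgreSQL, the converter cannot write edges until the nodes of the same revision have been inserted and their IDs read back. The sketch below shows that resolution step under stated assumptions: MongoDB edge documents are assumed to reference nodes by node_key, and the node_key to ID map is assumed to be loaded after the node COPY (for example with a `SELECT node_key, id` query for the revision). `EdgeResolver`, `MongoEdge`, and `EdgeRow` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shapes: an edge as read from MongoDB, and a row ready for vex.graph_edges.
public sealed record MongoEdge(string FromNodeKey, string ToNodeKey);
public sealed record EdgeRow(Guid GraphRevisionId, long FromNodeId, long ToNodeId);

public static class EdgeResolver
{
    // Splits source edges into insertable rows and edges whose endpoints were not found.
    public static (List<EdgeRow> Rows, List<MongoEdge> Dangling) Resolve(
        Guid graphRevisionId,
        IEnumerable<MongoEdge> edges,
        IReadOnlyDictionary<string, long> nodeIdsByKey)
    {
        var rows = new List<EdgeRow>();
        var dangling = new List<MongoEdge>();

        foreach (var edge in edges)
        {
            // Both endpoints must already exist in vex.graph_nodes for this revision.
            if (nodeIdsByKey.TryGetValue(edge.FromNodeKey, out var fromId) &&
                nodeIdsByKey.TryGetValue(edge.ToNodeKey, out var toId))
            {
                rows.Add(new EdgeRow(graphRevisionId, fromId, toId));
            }
            else
            {
                dangling.Add(edge);
            }
        }

        return (rows, dangling);
    }
}
```

Collecting dangling edges instead of dropping them keeps the count check honest: the resolved row count should equal the MongoDB edge count, and any dangling entry is a conversion failure to report.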
T6c.2: Build VEX Conversion Service
Status: TODO Estimate: 1 day
Subtasks:
- Parse MongoDB VEX statements
- Map to vex.statements table
- Preserve provenance
- Preserve evidence
T6c.3: Run Dual Pipeline Comparison
Status: TODO Estimate: 2 days
Description: Run graph computation on both backends and compare.
Subtasks:
- Select sample projects
- Compute graphs with MongoDB
- Compute graphs with PostgreSQL
- Compare revision_ids (must match)
- Compare node counts
- Compare edge counts
- Compare VEX statements
- Document any differences
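The comparison steps above can be folded into one diff routine that records every difference rather than stopping at the first mismatch, which also produces the material for "Document any differences". This is a minimal sketch, assuming each backend's result for a project can be reduced to a revision_id, node and edge counts, and a set of VEX statement fingerprints (for example vulnerability ID, product, and status joined into one string); `GraphSnapshot` and `BackendComparer` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-project summary produced by each backend.
public sealed record GraphSnapshot(
    string ProjectId,
    string RevisionId,
    int NodeCount,
    int EdgeCount,
    IReadOnlySet<string> StatementFingerprints);

public static class BackendComparer
{
    // Returns one readable line per difference; an empty list means the backends agree.
    public static List<string> Compare(GraphSnapshot mongo, GraphSnapshot postgres)
    {
        var diffs = new List<string>();
        var p = mongo.ProjectId;

        if (!string.Equals(mongo.RevisionId, postgres.RevisionId, StringComparison.Ordinal))
            diffs.Add($"{p}: revision_id {mongo.RevisionId} != {postgres.RevisionId}");
        if (mongo.NodeCount != postgres.NodeCount)
            diffs.Add($"{p}: node count {mongo.NodeCount} != {postgres.NodeCount}");
        if (mongo.EdgeCount != postgres.EdgeCount)
            diffs.Add($"{p}: edge count {mongo.EdgeCount} != {postgres.EdgeCount}");

        // Sorting keeps the report itself deterministic between runs.
        foreach (var s in mongo.StatementFingerprints
                     .Except(postgres.StatementFingerprints)
                     .OrderBy(x => x, StringComparer.Ordinal))
            diffs.Add($"{p}: statement only in MongoDB: {s}");
        foreach (var s in postgres.StatementFingerprints
                     .Except(mongo.StatementFingerprints)
                     .OrderBy(x => x, StringComparer.Ordinal))
            diffs.Add($"{p}: statement only in PostgreSQL: {s}");

        return diffs;
    }
}
```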
T6c.4: Migrate Projects
Status: TODO Estimate: 1 day
Subtasks:
- Identify projects to migrate (those with active VEX statements)
- Run conversion for each project
- Verify latest graph revision
- Verify VEX statements
T6c.5: Switch to PostgreSQL-Only
Status: TODO Estimate: 0.5 days
Subtasks:
- Update configuration
- Deploy to staging
- Run full test suite
- Deploy to production
- Monitor metrics
Exit Criteria
- All repository interfaces implemented
- Graph storage working efficiently
- Graph revision IDs stable (deterministic)
- VEX statements preserved correctly
- All comparison tests pass
- Excititor running on PostgreSQL in production
Risks & Mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Revision ID instability | Medium | Critical | Deterministic ordering tests |
| Graph storage performance | Medium | High | Bulk insert, index optimization |
| VEX lattice logic errors | Low | High | Extensive comparison testing |
Performance Considerations
Graph Storage
- Use `BIGSERIAL` for node/edge IDs (high volume)
- Use `COPY` for bulk inserts (10-100x faster than row-by-row `INSERT`)
- Index `(graph_revision_id, node_key)` for lookups
- Index `(from_node_id)` and `(to_node_id)` for traversal
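The storage guidance above maps onto DDL along these lines, written with `IF NOT EXISTS` so that re-running the migration is a no-op (the idempotency check in T6a.2). This is a minimal sketch only: it assumes the node columns from the COPY statement in T6b.2, an edge table with just `graph_revision_id`, `from_node_id`, and `to_node_id`, and a `vex.graph_revisions` table keyed by a UUID `id`. Column types, constraint and index names, the foreign keys, the extra edge index on `graph_revision_id`, and the class name `GraphSchemaMigration` are all assumptions; SPECIFICATION.md Section 5.3 remains authoritative.

```csharp
// Hypothetical migration; SPECIFICATION.md Section 5.3 is the source of truth for the schema.
public static class GraphSchemaMigration
{
    public const string Sql = """
        CREATE TABLE IF NOT EXISTS vex.graph_nodes (
            id                BIGSERIAL PRIMARY KEY,
            graph_revision_id UUID NOT NULL REFERENCES vex.graph_revisions (id),
            node_key          TEXT NOT NULL,
            node_type         TEXT NOT NULL,
            purl              TEXT,
            name              TEXT,
            version           TEXT,
            attributes        JSONB,
            -- node_key is unique per revision; the constraint's index also serves key lookups.
            CONSTRAINT uq_graph_nodes_revision_key UNIQUE (graph_revision_id, node_key)
        );

        CREATE TABLE IF NOT EXISTS vex.graph_edges (
            id                BIGSERIAL PRIMARY KEY,
            graph_revision_id UUID NOT NULL REFERENCES vex.graph_revisions (id),
            from_node_id      BIGINT NOT NULL REFERENCES vex.graph_nodes (id),
            to_node_id        BIGINT NOT NULL REFERENCES vex.graph_nodes (id)
        );

        -- Traversal: outgoing edges by source node, incoming edges by target node.
        CREATE INDEX IF NOT EXISTS ix_graph_edges_from ON vex.graph_edges (from_node_id);
        CREATE INDEX IF NOT EXISTS ix_graph_edges_to ON vex.graph_edges (to_node_id);
        -- Assumed extra index for GetByRevisionAsync / GetCountAsync on edges.
        CREATE INDEX IF NOT EXISTS ix_graph_edges_revision ON vex.graph_edges (graph_revision_id);
        """;
}
```

`IF NOT EXISTS` only makes re-runs safe; it does not reconcile an existing table or index whose definition differs, so schema changes still need their own versioned migrations.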
Estimated Volumes
| Table | Estimated Rows per Project | Total Estimated |
|---|---|---|
| graph_nodes | 1,000 - 50,000 | 10M+ |
| graph_edges | 2,000 - 100,000 | 20M+ |
| statements | 100 - 5,000 | 1M+ |
Phase Version: 1.0.0 | Last Updated: 2025-11-28