docs/db/tasks/PHASE_6_VEX_GRAPH.md
# Phase 6: VEX & Graph Conversion (Excititor)

**Sprint:** 8-10
**Duration:** 2-3 sprints
**Status:** TODO
**Dependencies:** Phase 5 (Vulnerabilities)

---

## Objectives

1. Create `StellaOps.Excititor.Storage.Postgres` project
2. Implement VEX schema in PostgreSQL
3. Handle graph nodes/edges efficiently
4. Preserve graph_revision_id stability (determinism critical)
5. Maintain VEX statement lattice logic

---

## Deliverables

| Deliverable | Acceptance Criteria |
|-------------|---------------------|
| VEX schema | All tables created with indexes |
| Graph storage | Nodes/edges efficiently stored |
| Statement storage | VEX statements with full provenance |
| Revision stability | Same inputs produce same revision_id |
| Integration tests | 100% coverage |

---

## Schema Reference

See [SPECIFICATION.md](../SPECIFICATION.md) Section 5.3 for complete VEX schema.

**Tables:**
- `vex.projects`
- `vex.graph_revisions`
- `vex.graph_nodes`
- `vex.graph_edges`
- `vex.statements`
- `vex.observations`
- `vex.linksets`
- `vex.linkset_events`
- `vex.consensus`
- `vex.consensus_holds`
- `vex.unknowns_snapshots`
- `vex.unknown_items`
- `vex.evidence_manifests`
- `vex.cvss_receipts`
- `vex.attestations`
- `vex.timeline_events`

---

## Sprint 6a: Core Schema & Repositories

### T6a.1: Create Excititor.Storage.Postgres Project

**Status:** TODO
**Estimate:** 0.5 days

**Subtasks:**
- [ ] Create project structure
- [ ] Add NuGet references
- [ ] Create `ExcititorDataSource` class
- [ ] Create `ServiceCollectionExtensions.cs`

---

### T6a.2: Implement Schema Migrations

**Status:** TODO
**Estimate:** 1.5 days

**Subtasks:**
- [ ] Create schema migration
- [ ] Include all tables
- [ ] Add indexes for graph traversal
- [ ] Add indexes for VEX lookups
- [ ] Test migration idempotency

---

### T6a.3: Implement Project Repository

**Status:** TODO
**Estimate:** 0.5 days

**Subtasks:**
- [ ] Implement CRUD operations
- [ ] Handle tenant scoping
- [ ] Write integration tests

---

### T6a.4: Implement VEX Statement Repository

**Status:** TODO
**Estimate:** 1.5 days

**Interface:**
```csharp
public interface IVexStatementRepository
{
    Task<VexStatement?> GetAsync(string tenantId, Guid statementId, CancellationToken ct);
    Task<IReadOnlyList<VexStatement>> GetByVulnerabilityAsync(
        string tenantId, string vulnerabilityId, CancellationToken ct);
    Task<IReadOnlyList<VexStatement>> GetByProjectAsync(
        string tenantId, Guid projectId, CancellationToken ct);
    Task<VexStatement> UpsertAsync(VexStatement statement, CancellationToken ct);
    Task<IReadOnlyList<VexStatement>> GetByGraphRevisionAsync(
        Guid graphRevisionId, CancellationToken ct);
}
```

**Subtasks:**
- [ ] Implement all interface methods
- [ ] Handle status and justification enums
- [ ] Preserve evidence JSONB
- [ ] Preserve provenance JSONB
- [ ] Write integration tests

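
The "status and justification enums" subtask could be handled with an explicit two-way mapping between the C# enum and the stored text value. The sketch below is illustrative only: `VexStatus` and `VexStatusMapper` are hypothetical names, and it assumes the status is stored as a lowercase text label using the four status names from the OpenVEX and CSAF VEX conventions. The real column type is defined in SPECIFICATION.md and may differ (for example, a PostgreSQL enum type). Justifications could follow the same pattern. An explicit `switch` is used so that renaming an enum member cannot silently change what is written to the database.

```csharp
using System;

// Hypothetical enum for illustration; the real Excititor type may differ.
public enum VexStatus
{
    NotAffected,
    Affected,
    Fixed,
    UnderInvestigation
}

public static class VexStatusMapper
{
    // Enum -> stored text label (assumed storage format).
    public static string ToDb(VexStatus status) => status switch
    {
        VexStatus.NotAffected => "not_affected",
        VexStatus.Affected => "affected",
        VexStatus.Fixed => "fixed",
        VexStatus.UnderInvestigation => "under_investigation",
        _ => throw new ArgumentOutOfRangeException(nameof(status), status, "Unknown VEX status")
    };

    // Stored text label -> enum; unknown labels fail loudly instead of defaulting.
    public static VexStatus FromDb(string value) => value switch
    {
        "not_affected" => VexStatus.NotAffected,
        "affected" => VexStatus.Affected,
        "fixed" => VexStatus.Fixed,
        "under_investigation" => VexStatus.UnderInvestigation,
        _ => throw new FormatException($"Unrecognized VEX status '{value}'")
    };
}
```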
---

### T6a.5: Implement VEX Observation Repository

**Status:** TODO
**Estimate:** 1 day

**Subtasks:**
- [ ] Implement CRUD operations
- [ ] Handle unique constraint on composite key
- [ ] Implement `FindByVulnerabilityAndProductAsync`
- [ ] Write integration tests

---

### T6a.6: Implement Linkset Repository

**Status:** TODO
**Estimate:** 0.5 days

**Subtasks:**
- [ ] Implement CRUD operations
- [ ] Implement event logging
- [ ] Write integration tests

---

### T6a.7: Implement Consensus Repository

**Status:** TODO
**Estimate:** 0.5 days

**Subtasks:**
- [ ] Implement CRUD operations
- [ ] Implement hold management
- [ ] Write integration tests

---

## Sprint 6b: Graph Storage

### T6b.1: Implement Graph Revision Repository

**Status:** TODO
**Estimate:** 1 day

**Interface:**
```csharp
public interface IGraphRevisionRepository
{
    Task<GraphRevision?> GetByIdAsync(Guid id, CancellationToken ct);
    Task<GraphRevision?> GetByRevisionIdAsync(string revisionId, CancellationToken ct);
    Task<GraphRevision?> GetLatestByProjectAsync(Guid projectId, CancellationToken ct);
    Task<GraphRevision> CreateAsync(GraphRevision revision, CancellationToken ct);
    Task<IReadOnlyList<GraphRevision>> GetHistoryAsync(
        Guid projectId, int limit, CancellationToken ct);
}
```

**Subtasks:**
- [ ] Implement all interface methods
- [ ] Handle revision_id uniqueness
- [ ] Handle parent_revision_id linking
- [ ] Write integration tests

---

### T6b.2: Implement Graph Node Repository

**Status:** TODO
**Estimate:** 1.5 days

**Interface:**
```csharp
public interface IGraphNodeRepository
{
    Task<GraphNode?> GetByIdAsync(long nodeId, CancellationToken ct);
    Task<GraphNode?> GetByKeyAsync(Guid graphRevisionId, string nodeKey, CancellationToken ct);
    Task<IReadOnlyList<GraphNode>> GetByRevisionAsync(
        Guid graphRevisionId, CancellationToken ct);
    Task BulkInsertAsync(
        Guid graphRevisionId, IEnumerable<GraphNode> nodes, CancellationToken ct);
    Task<int> GetCountAsync(Guid graphRevisionId, CancellationToken ct);
}
```

**Subtasks:**
- [ ] Implement all interface methods
- [ ] Implement bulk insert for efficiency
- [ ] Handle node_key uniqueness per revision
- [ ] Write integration tests

**Bulk Insert Optimization:**
```csharp
// Requires: using System.Text.Json; using Npgsql; using NpgsqlTypes;
public async Task BulkInsertAsync(
    Guid graphRevisionId,
    IEnumerable<GraphNode> nodes,
    CancellationToken ct)
{
    await using var connection = await _dataSource.OpenConnectionAsync("system", ct);

    // Binary COPY streams all rows in one statement instead of one INSERT per node.
    await using var writer = await connection.BeginBinaryImportAsync(
        "COPY vex.graph_nodes (graph_revision_id, node_key, node_type, purl, name, version, attributes) " +
        "FROM STDIN (FORMAT BINARY)", ct);

    foreach (var node in nodes)
    {
        // Values must be written in the same order as the COPY column list.
        await writer.StartRowAsync(ct);
        await writer.WriteAsync(graphRevisionId, ct);
        await writer.WriteAsync(node.NodeKey, ct);
        await writer.WriteAsync(node.NodeType, ct);
        await writer.WriteAsync(node.Purl, NpgsqlDbType.Text, ct);
        await writer.WriteAsync(node.Name, NpgsqlDbType.Text, ct);
        await writer.WriteAsync(node.Version, NpgsqlDbType.Text, ct);
        await writer.WriteAsync(JsonSerializer.Serialize(node.Attributes), NpgsqlDbType.Jsonb, ct);
    }

    // Without CompleteAsync the import is cancelled and no rows are saved.
    await writer.CompleteAsync(ct);
}
```

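
Because `COPY` runs as a single statement, one duplicate `node_key` would abort the entire import if the table enforces uniqueness on `(graph_revision_id, node_key)`. One way to approach the "node_key uniqueness per revision" subtask is to validate and order the batch in memory before opening the importer. The helper below is a sketch under that assumption; `GraphNodeBatch` and `Prepare` are hypothetical names that do not appear in the specification. It also sorts by key with ordinal comparison, which gives a culture-independent, repeatable insert order.

```csharp
using System;
using System.Collections.Generic;

public static class GraphNodeBatch
{
    // Returns the items sorted by key (ordinal); throws if any key occurs twice.
    public static IReadOnlyList<T> Prepare<T>(IEnumerable<T> nodes, Func<T, string> keySelector)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var list = new List<T>();

        foreach (var node in nodes)
        {
            var key = keySelector(node);
            if (!seen.Add(key))
            {
                // Fail before COPY starts rather than midway through the stream.
                throw new InvalidOperationException($"Duplicate node_key '{key}' in batch");
            }
            list.Add(node);
        }

        // Keys are unique at this point, so the sort result is fully determined.
        list.Sort((a, b) => StringComparer.Ordinal.Compare(keySelector(a), keySelector(b)));
        return list;
    }
}

// Hypothetical usage inside BulkInsertAsync:
//   var ordered = GraphNodeBatch.Prepare(nodes, n => n.NodeKey);
```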
---

### T6b.3: Implement Graph Edge Repository

**Status:** TODO
**Estimate:** 1.5 days

**Interface:**
```csharp
public interface IGraphEdgeRepository
{
    Task<IReadOnlyList<GraphEdge>> GetByRevisionAsync(
        Guid graphRevisionId, CancellationToken ct);
    Task<IReadOnlyList<GraphEdge>> GetOutgoingAsync(
        long fromNodeId, CancellationToken ct);
    Task<IReadOnlyList<GraphEdge>> GetIncomingAsync(
        long toNodeId, CancellationToken ct);
    Task BulkInsertAsync(
        Guid graphRevisionId, IEnumerable<GraphEdge> edges, CancellationToken ct);
    Task<int> GetCountAsync(Guid graphRevisionId, CancellationToken ct);
}
```

**Subtasks:**
- [ ] Implement all interface methods
- [ ] Implement bulk insert for efficiency
- [ ] Optimize for traversal queries
- [ ] Write integration tests

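
One possible reading of "optimize for traversal queries": calling `GetOutgoingAsync` once per visited node costs a database round trip per hop, so a walk over a whole revision may be cheaper if the edges are loaded once with `GetByRevisionAsync` and traversed in memory. The sketch below illustrates that idea with a breadth-first search. `GraphTraversal` and `Reachable` are hypothetical names, and edges are reduced to `(source, target)` node-id pairs for the example rather than the real `GraphEdge` type.

```csharp
using System.Collections.Generic;

public static class GraphTraversal
{
    // Breadth-first search: returns every node id reachable from start (start included).
    public static HashSet<long> Reachable(IEnumerable<(long Source, long Target)> edges, long start)
    {
        // Build an adjacency list once so each hop is a dictionary lookup.
        var adjacency = new Dictionary<long, List<long>>();
        foreach (var (source, target) in edges)
        {
            if (!adjacency.TryGetValue(source, out var targets))
            {
                targets = new List<long>();
                adjacency[source] = targets;
            }
            targets.Add(target);
        }

        var visited = new HashSet<long> { start };
        var queue = new Queue<long>();
        queue.Enqueue(start);

        while (queue.Count > 0)
        {
            var current = queue.Dequeue();
            if (!adjacency.TryGetValue(current, out var next))
            {
                continue; // Leaf node: no outgoing edges.
            }

            foreach (var node in next)
            {
                // Add returns false for already-visited nodes, which also breaks cycles.
                if (visited.Add(node))
                {
                    queue.Enqueue(node);
                }
            }
        }

        return visited;
    }
}
```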
---

### T6b.4: Verify Graph Revision ID Stability

**Status:** TODO
**Estimate:** 1 day

**Description:**
Critical: the same SBOM, feed snapshot, and policy version must always produce an identical revision_id.

**Subtasks:**
- [ ] Document revision_id computation algorithm
- [ ] Verify nodes are inserted in deterministic order
- [ ] Verify edges are inserted in deterministic order
- [ ] Write stability tests

**Stability Test:**
```csharp
[Fact]
public async Task Same_Inputs_Should_Produce_Same_RevisionId()
{
    var sbom = await LoadSbom("testdata/stable-sbom.json");
    var feedSnapshot = "feed-v1.2.3";
    var policyVersion = "policy-v1.0";

    // Compute multiple times
    var revisions = new List<string>();
    for (int i = 0; i < 5; i++)
    {
        var graph = await _graphService.ComputeGraphAsync(
            sbom, feedSnapshot, policyVersion);
        revisions.Add(graph.RevisionId);
    }

    // All must be identical
    revisions.Distinct().Should().HaveCount(1);
}
```

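
The revision_id algorithm itself is still to be documented, so the following is not the Excititor algorithm. It is a minimal sketch of one common way to make such an identifier independent of input order: sort nodes and edges with ordinal comparison, serialize them into a canonical string, and hash the result. `RevisionIdSketch`, the `feed=`/`policy=`/`node=`/`edge=` line format, and the choice of SHA-256 are all assumptions made for illustration. The sketch also assumes keys never contain a newline, since newline is used as the record separator. `SHA256.HashData` and `Convert.ToHexString` require .NET 5 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class RevisionIdSketch
{
    // Illustrative only: hashes a canonical, sorted serialization of the inputs.
    public static string Compute(
        string feedSnapshot,
        string policyVersion,
        IEnumerable<string> nodeKeys,
        IEnumerable<(string Source, string Target)> edges)
    {
        var sb = new StringBuilder();
        sb.Append("feed=").Append(feedSnapshot).Append('\n');
        sb.Append("policy=").Append(policyVersion).Append('\n');

        // Ordinal ordering avoids culture-dependent sort differences between hosts.
        foreach (var key in nodeKeys.OrderBy(k => k, StringComparer.Ordinal))
        {
            sb.Append("node=").Append(key).Append('\n');
        }

        foreach (var edge in edges
            .OrderBy(e => e.Source, StringComparer.Ordinal)
            .ThenBy(e => e.Target, StringComparer.Ordinal))
        {
            sb.Append("edge=").Append(edge.Source).Append("->").Append(edge.Target).Append('\n');
        }

        var hash = SHA256.HashData(Encoding.UTF8.GetBytes(sb.ToString()));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```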
---

## Sprint 6c: Migration & Verification

### T6c.1: Build Graph Conversion Service

**Status:** TODO
**Estimate:** 1.5 days

**Description:**
Convert existing MongoDB graphs to PostgreSQL.

**Subtasks:**
- [ ] Parse MongoDB graph documents
- [ ] Map to graph_revisions table
- [ ] Extract and insert nodes
- [ ] Extract and insert edges
- [ ] Verify node/edge counts match

---

### T6c.2: Build VEX Conversion Service

**Status:** TODO
**Estimate:** 1 day

**Subtasks:**
- [ ] Parse MongoDB VEX statements
- [ ] Map to vex.statements table
- [ ] Preserve provenance
- [ ] Preserve evidence

---

### T6c.3: Run Dual Pipeline Comparison

**Status:** TODO
**Estimate:** 2 days

**Description:**
Run graph computation against both the MongoDB and PostgreSQL backends and compare the results.

**Subtasks:**
- [ ] Select sample projects
- [ ] Compute graphs with MongoDB
- [ ] Compute graphs with PostgreSQL
- [ ] Compare revision_ids (must match)
- [ ] Compare node counts
- [ ] Compare edge counts
- [ ] Compare VEX statements
- [ ] Document any differences

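
The comparison steps above could be captured in a small helper that reports every mismatch for a project instead of stopping at the first one, which makes the "document any differences" step easier. This is a sketch only: `GraphSummary` and `PipelineComparer` are hypothetical types, and VEX statements are compared by count here, whereas a real comparison would likely need to compare statement content as well.

```csharp
using System.Collections.Generic;

// Hypothetical per-backend summary of one project's computed graph.
public sealed record GraphSummary(string RevisionId, int NodeCount, int EdgeCount, int StatementCount);

public static class PipelineComparer
{
    // Returns one human-readable line per mismatch; an empty list means the backends agree.
    public static IReadOnlyList<string> Diff(GraphSummary mongo, GraphSummary postgres)
    {
        var diffs = new List<string>();

        if (mongo.RevisionId != postgres.RevisionId)
        {
            diffs.Add($"revision_id: mongo={mongo.RevisionId} postgres={postgres.RevisionId}");
        }
        if (mongo.NodeCount != postgres.NodeCount)
        {
            diffs.Add($"node count: mongo={mongo.NodeCount} postgres={postgres.NodeCount}");
        }
        if (mongo.EdgeCount != postgres.EdgeCount)
        {
            diffs.Add($"edge count: mongo={mongo.EdgeCount} postgres={postgres.EdgeCount}");
        }
        if (mongo.StatementCount != postgres.StatementCount)
        {
            diffs.Add($"statement count: mongo={mongo.StatementCount} postgres={postgres.StatementCount}");
        }

        return diffs;
    }
}
```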
---

### T6c.4: Migrate Projects

**Status:** TODO
**Estimate:** 1 day

**Subtasks:**
- [ ] Identify projects to migrate (active VEX)
- [ ] Run conversion for each project
- [ ] Verify latest graph revision
- [ ] Verify VEX statements

---

### T6c.5: Switch to PostgreSQL-Only

**Status:** TODO
**Estimate:** 0.5 days

**Subtasks:**
- [ ] Update configuration
- [ ] Deploy to staging
- [ ] Run full test suite
- [ ] Deploy to production
- [ ] Monitor metrics

---

## Exit Criteria

- [ ] All repository interfaces implemented
- [ ] Graph storage working efficiently
- [ ] Graph revision IDs stable (deterministic)
- [ ] VEX statements preserved correctly
- [ ] All comparison tests pass
- [ ] Excititor running on PostgreSQL in production

---

## Risks & Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Revision ID instability | Medium | Critical | Deterministic ordering tests |
| Graph storage performance | Medium | High | Bulk insert, index optimization |
| VEX lattice logic errors | Low | High | Extensive comparison testing |

---

## Performance Considerations

### Graph Storage

- Use `BIGSERIAL` for node/edge IDs (high volume)
- Use `COPY` for bulk inserts (estimated 10-100x faster than row-by-row `INSERT`)
- Index `(graph_revision_id, node_key)` for lookups
- Index `(from_node_id)` and `(to_node_id)` for traversal

### Estimated Volumes

| Table | Estimated Rows per Project | Total Estimated Rows |
|-------|---------------------------|----------------------|
| `vex.graph_nodes` | 1,000 - 50,000 | 10M+ |
| `vex.graph_edges` | 2,000 - 100,000 | 20M+ |
| `vex.statements` | 100 - 5,000 | 1M+ |

---

*Phase Version: 1.0.0*
*Last Updated: 2025-11-28*