up

StellaOps Bot
2025-11-28 20:55:22 +02:00
parent d040c001ac
commit 2548abc56f
231 changed files with 47468 additions and 68 deletions

@@ -0,0 +1,434 @@
# Phase 6: VEX & Graph Conversion (Excititor)
**Sprint:** 8-10
**Duration:** 2-3 sprints
**Status:** TODO
**Dependencies:** Phase 5 (Vulnerabilities)
---
## Objectives
1. Create `StellaOps.Excititor.Storage.Postgres` project
2. Implement VEX schema in PostgreSQL
3. Handle graph nodes/edges efficiently
4. Preserve graph_revision_id stability (determinism critical)
5. Maintain VEX statement lattice logic
---
## Deliverables
| Deliverable | Acceptance Criteria |
|-------------|---------------------|
| VEX schema | All tables created with indexes |
| Graph storage | Nodes/edges efficiently stored |
| Statement storage | VEX statements with full provenance |
| Revision stability | Same inputs produce same revision_id |
| Integration tests | 100% coverage |
---
## Schema Reference
See [SPECIFICATION.md](../SPECIFICATION.md) Section 5.3 for complete VEX schema.
**Tables:**
- `vex.projects`
- `vex.graph_revisions`
- `vex.graph_nodes`
- `vex.graph_edges`
- `vex.statements`
- `vex.observations`
- `vex.linksets`
- `vex.linkset_events`
- `vex.consensus`
- `vex.consensus_holds`
- `vex.unknowns_snapshots`
- `vex.unknown_items`
- `vex.evidence_manifests`
- `vex.cvss_receipts`
- `vex.attestations`
- `vex.timeline_events`
---
## Sprint 6a: Core Schema & Repositories
### T6a.1: Create Excititor.Storage.Postgres Project
**Status:** TODO
**Estimate:** 0.5 days
**Subtasks:**
- [ ] Create project structure
- [ ] Add NuGet references
- [ ] Create `ExcititorDataSource` class
- [ ] Create `ServiceCollectionExtensions.cs`
---
### T6a.2: Implement Schema Migrations
**Status:** TODO
**Estimate:** 1.5 days
**Subtasks:**
- [ ] Create schema migration
- [ ] Include all tables
- [ ] Add indexes for graph traversal
- [ ] Add indexes for VEX lookups
- [ ] Test migration idempotency
---
### T6a.3: Implement Project Repository
**Status:** TODO
**Estimate:** 0.5 days
**Subtasks:**
- [ ] Implement CRUD operations
- [ ] Handle tenant scoping
- [ ] Write integration tests
---
### T6a.4: Implement VEX Statement Repository
**Status:** TODO
**Estimate:** 1.5 days
**Interface:**
```csharp
public interface IVexStatementRepository
{
    Task<VexStatement?> GetAsync(string tenantId, Guid statementId, CancellationToken ct);
    Task<IReadOnlyList<VexStatement>> GetByVulnerabilityAsync(
        string tenantId, string vulnerabilityId, CancellationToken ct);
    Task<IReadOnlyList<VexStatement>> GetByProjectAsync(
        string tenantId, Guid projectId, CancellationToken ct);
    Task<VexStatement> UpsertAsync(VexStatement statement, CancellationToken ct);
    Task<IReadOnlyList<VexStatement>> GetByGraphRevisionAsync(
        Guid graphRevisionId, CancellationToken ct);
}
```
**Subtasks:**
- [ ] Implement all interface methods
- [ ] Handle status and justification enums
- [ ] Preserve evidence JSONB
- [ ] Preserve provenance JSONB
- [ ] Write integration tests
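
The "status and justification enums" subtask could be handled with an explicit, two-way mapping between the C# enum and the stored text value. The sketch below is an assumption, not the project's actual model: the `VexStatus` enum, the `VexStatusMapper` class, and storage as snake_case text are all hypothetical, and the four values follow the common VEX status vocabulary rather than anything stated in this plan. An explicit table avoids relying on `Enum.ToString()`, so renaming an enum member cannot silently change persisted data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum; the real Excititor model may differ.
public enum VexStatus
{
    NotAffected,
    Affected,
    Fixed,
    UnderInvestigation
}

public static class VexStatusMapper
{
    // Single source of truth for the enum <-> stored text mapping.
    private static readonly Dictionary<VexStatus, string> ToDb = new()
    {
        [VexStatus.NotAffected] = "not_affected",
        [VexStatus.Affected] = "affected",
        [VexStatus.Fixed] = "fixed",
        [VexStatus.UnderInvestigation] = "under_investigation",
    };

    // Converts an enum member to the text written to the database.
    public static string ToDbValue(VexStatus status) =>
        ToDb.TryGetValue(status, out var value)
            ? value
            : throw new ArgumentOutOfRangeException(nameof(status), status, "Unknown VEX status");

    // Converts stored text back to the enum; fails loudly on unknown values.
    public static VexStatus FromDbValue(string value)
    {
        foreach (var pair in ToDb)
        {
            if (string.Equals(pair.Value, value, StringComparison.Ordinal))
                return pair.Key;
        }
        throw new FormatException($"Unknown VEX status value '{value}'");
    }
}
```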
---
### T6a.5: Implement VEX Observation Repository
**Status:** TODO
**Estimate:** 1 day
**Subtasks:**
- [ ] Implement CRUD operations
- [ ] Handle unique constraint on composite key
- [ ] Implement FindByVulnerabilityAndProductAsync
- [ ] Write integration tests
---
### T6a.6: Implement Linkset Repository
**Status:** TODO
**Estimate:** 0.5 days
**Subtasks:**
- [ ] Implement CRUD operations
- [ ] Implement event logging
- [ ] Write integration tests
---
### T6a.7: Implement Consensus Repository
**Status:** TODO
**Estimate:** 0.5 days
**Subtasks:**
- [ ] Implement CRUD operations
- [ ] Implement hold management
- [ ] Write integration tests
---
## Sprint 6b: Graph Storage
### T6b.1: Implement Graph Revision Repository
**Status:** TODO
**Estimate:** 1 day
**Interface:**
```csharp
public interface IGraphRevisionRepository
{
    Task<GraphRevision?> GetByIdAsync(Guid id, CancellationToken ct);
    Task<GraphRevision?> GetByRevisionIdAsync(string revisionId, CancellationToken ct);
    Task<GraphRevision?> GetLatestByProjectAsync(Guid projectId, CancellationToken ct);
    Task<GraphRevision> CreateAsync(GraphRevision revision, CancellationToken ct);
    Task<IReadOnlyList<GraphRevision>> GetHistoryAsync(
        Guid projectId, int limit, CancellationToken ct);
}
```
**Subtasks:**
- [ ] Implement all interface methods
- [ ] Handle revision_id uniqueness
- [ ] Handle parent_revision_id linking
- [ ] Write integration tests
---
### T6b.2: Implement Graph Node Repository
**Status:** TODO
**Estimate:** 1.5 days
**Interface:**
```csharp
public interface IGraphNodeRepository
{
    Task<GraphNode?> GetByIdAsync(long nodeId, CancellationToken ct);
    Task<GraphNode?> GetByKeyAsync(Guid graphRevisionId, string nodeKey, CancellationToken ct);
    Task<IReadOnlyList<GraphNode>> GetByRevisionAsync(
        Guid graphRevisionId, CancellationToken ct);
    Task BulkInsertAsync(
        Guid graphRevisionId, IEnumerable<GraphNode> nodes, CancellationToken ct);
    Task<int> GetCountAsync(Guid graphRevisionId, CancellationToken ct);
}
```
**Subtasks:**
- [ ] Implement all interface methods
- [ ] Implement bulk insert for efficiency
- [ ] Handle node_key uniqueness per revision
- [ ] Write integration tests
**Bulk Insert Optimization:**
```csharp
public async Task BulkInsertAsync(
    Guid graphRevisionId,
    IEnumerable<GraphNode> nodes,
    CancellationToken ct)
{
    await using var connection = await _dataSource.OpenConnectionAsync("system", ct);

    // Binary COPY streams all rows in one command instead of one INSERT per node.
    await using var writer = await connection.BeginBinaryImportAsync(
        "COPY vex.graph_nodes (graph_revision_id, node_key, node_type, purl, name, version, attributes) " +
        "FROM STDIN (FORMAT BINARY)", ct);

    foreach (var node in nodes)
    {
        // Values must be written in the same order as the COPY column list.
        await writer.StartRowAsync(ct);
        await writer.WriteAsync(graphRevisionId, ct);
        await writer.WriteAsync(node.NodeKey, ct);
        await writer.WriteAsync(node.NodeType, ct);
        await writer.WriteAsync(node.Purl, NpgsqlDbType.Text, ct);
        await writer.WriteAsync(node.Name, NpgsqlDbType.Text, ct);
        await writer.WriteAsync(node.Version, NpgsqlDbType.Text, ct);
        await writer.WriteAsync(JsonSerializer.Serialize(node.Attributes), NpgsqlDbType.Jsonb, ct);
    }

    // Rows are committed only by CompleteAsync; disposing without it cancels the import.
    await writer.CompleteAsync(ct);
}
```
---
### T6b.3: Implement Graph Edge Repository
**Status:** TODO
**Estimate:** 1.5 days
**Interface:**
```csharp
public interface IGraphEdgeRepository
{
    Task<IReadOnlyList<GraphEdge>> GetByRevisionAsync(
        Guid graphRevisionId, CancellationToken ct);
    Task<IReadOnlyList<GraphEdge>> GetOutgoingAsync(
        long fromNodeId, CancellationToken ct);
    Task<IReadOnlyList<GraphEdge>> GetIncomingAsync(
        long toNodeId, CancellationToken ct);
    Task BulkInsertAsync(
        Guid graphRevisionId, IEnumerable<GraphEdge> edges, CancellationToken ct);
    Task<int> GetCountAsync(Guid graphRevisionId, CancellationToken ct);
}
```
**Subtasks:**
- [ ] Implement all interface methods
- [ ] Implement bulk insert for efficiency
- [ ] Optimize for traversal queries
- [ ] Write integration tests
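
To make "optimize for traversal queries" concrete, the sketch below shows the access pattern a traversal creates: one outgoing-edge lookup per visited node. It is an illustrative assumption, not the planned implementation. `GraphTraversal` and the `outgoing` delegate are hypothetical; the delegate stands in for a call such as `GetOutgoingAsync` and is kept synchronous and in-memory so the example is self-contained. Because every step is a lookup keyed by the source node, the cost of that single query dominates traversal time.

```csharp
using System;
using System.Collections.Generic;

public static class GraphTraversal
{
    // Breadth-first walk from startNodeId. `outgoing` returns the target node IDs
    // of all edges leaving the given node (hypothetical stand-in for the repository).
    public static IReadOnlyList<long> ReachableFrom(
        long startNodeId,
        Func<long, IEnumerable<long>> outgoing)
    {
        var visited = new HashSet<long> { startNodeId };
        var order = new List<long>();
        var queue = new Queue<long>();
        queue.Enqueue(startNodeId);

        while (queue.Count > 0)
        {
            var current = queue.Dequeue();
            order.Add(current);

            // One lookup per visited node, keyed by the edge's source node.
            foreach (var next in outgoing(current))
            {
                // HashSet.Add returns false for already-seen nodes, so cycles terminate.
                if (visited.Add(next))
                    queue.Enqueue(next);
            }
        }

        return order;
    }
}
```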
---
### T6b.4: Verify Graph Revision ID Stability
**Status:** TODO
**Estimate:** 1 day
**Description:**
Critical: the same SBOM, feed snapshot, and policy version must always produce an identical `revision_id`.
**Subtasks:**
- [ ] Document revision_id computation algorithm
- [ ] Verify nodes are inserted in deterministic order
- [ ] Verify edges are inserted in deterministic order
- [ ] Write stability tests
**Stability Test:**
```csharp
[Fact]
public async Task Same_Inputs_Should_Produce_Same_RevisionId()
{
    var sbom = await LoadSbom("testdata/stable-sbom.json");
    var feedSnapshot = "feed-v1.2.3";
    var policyVersion = "policy-v1.0";

    // Compute multiple times
    var revisions = new List<string>();
    for (int i = 0; i < 5; i++)
    {
        var graph = await _graphService.ComputeGraphAsync(
            sbom, feedSnapshot, policyVersion);
        revisions.Add(graph.RevisionId);
    }

    // All must be identical
    revisions.Distinct().Should().HaveCount(1);
}
```
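
The revision_id computation algorithm is still to be documented, so the following is only a sketch of one way to make an ID independent of input order, not the algorithm Excititor uses. It assumes the ID is a SHA-256 hash over a canonical text form in which node keys and edges are sorted with ordinal comparison; `RevisionIdSketch`, its parameters, and the line format are all hypothetical. Ordinal sorting matters because culture-sensitive comparison can order strings differently across machines.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class RevisionIdSketch
{
    // Hypothetical canonicalization: sorted inputs -> text -> SHA-256 hex digest.
    public static string Compute(
        IEnumerable<string> nodeKeys,
        IEnumerable<(string From, string To)> edges,
        string feedSnapshot,
        string policyVersion)
    {
        var sb = new StringBuilder();
        sb.Append("feed=").Append(feedSnapshot).Append('\n');
        sb.Append("policy=").Append(policyVersion).Append('\n');

        // Ordinal ordering is culture-independent, so every host sorts identically.
        foreach (var key in nodeKeys.OrderBy(k => k, StringComparer.Ordinal))
            sb.Append("n:").Append(key).Append('\n');

        var sortedEdges = edges
            .OrderBy(x => x.From, StringComparer.Ordinal)
            .ThenBy(x => x.To, StringComparer.Ordinal);
        foreach (var edge in sortedEdges)
            sb.Append("e:").Append(edge.From).Append("->").Append(edge.To).Append('\n');

        // Note: keys containing '\n' or "->" would need escaping in a real format.
        var hash = SHA256.HashData(Encoding.UTF8.GetBytes(sb.ToString()));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```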
---
## Sprint 6c: Migration & Verification
### T6c.1: Build Graph Conversion Service
**Status:** TODO
**Estimate:** 1.5 days
**Description:**
Convert existing MongoDB graphs to PostgreSQL.
**Subtasks:**
- [ ] Parse MongoDB graph documents
- [ ] Map to graph_revisions table
- [ ] Extract and insert nodes
- [ ] Extract and insert edges
- [ ] Verify node/edge counts match
---
### T6c.2: Build VEX Conversion Service
**Status:** TODO
**Estimate:** 1 day
**Subtasks:**
- [ ] Parse MongoDB VEX statements
- [ ] Map to vex.statements table
- [ ] Preserve provenance
- [ ] Preserve evidence
---
### T6c.3: Run Dual Pipeline Comparison
**Status:** TODO
**Estimate:** 2 days
**Description:**
Run graph computation on both backends and compare.
**Subtasks:**
- [ ] Select sample projects
- [ ] Compute graphs with MongoDB
- [ ] Compute graphs with PostgreSQL
- [ ] Compare revision_ids (must match)
- [ ] Compare node counts
- [ ] Compare edge counts
- [ ] Compare VEX statements
- [ ] Document any differences
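
The comparison steps above could be collected into a small report per project. The sketch below is an assumption about how that might look: `GraphSummary` and `PipelineComparison` are hypothetical names, and only the revision ID, node count, and edge count checks from the subtask list are covered. Returning a list of differences rather than a boolean supports the "document any differences" step directly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-backend result for one project's graph computation.
public sealed record GraphSummary(string RevisionId, int NodeCount, int EdgeCount);

public static class PipelineComparison
{
    // Returns one human-readable line per mismatch; an empty list means the backends agree.
    public static IReadOnlyList<string> Diff(GraphSummary mongo, GraphSummary postgres)
    {
        var diffs = new List<string>();

        if (!string.Equals(mongo.RevisionId, postgres.RevisionId, StringComparison.Ordinal))
            diffs.Add($"revision_id: mongo={mongo.RevisionId} postgres={postgres.RevisionId}");

        if (mongo.NodeCount != postgres.NodeCount)
            diffs.Add($"node count: mongo={mongo.NodeCount} postgres={postgres.NodeCount}");

        if (mongo.EdgeCount != postgres.EdgeCount)
            diffs.Add($"edge count: mongo={mongo.EdgeCount} postgres={postgres.EdgeCount}");

        return diffs;
    }
}
```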
---
### T6c.4: Migrate Projects
**Status:** TODO
**Estimate:** 1 day
**Subtasks:**
- [ ] Identify projects to migrate (active VEX)
- [ ] Run conversion for each project
- [ ] Verify latest graph revision
- [ ] Verify VEX statements
---
### T6c.5: Switch to PostgreSQL-Only
**Status:** TODO
**Estimate:** 0.5 days
**Subtasks:**
- [ ] Update configuration
- [ ] Deploy to staging
- [ ] Run full test suite
- [ ] Deploy to production
- [ ] Monitor metrics
---
## Exit Criteria
- [ ] All repository interfaces implemented
- [ ] Graph storage working efficiently
- [ ] Graph revision IDs stable (deterministic)
- [ ] VEX statements preserved correctly
- [ ] All comparison tests pass
- [ ] Excititor running on PostgreSQL in production
---
## Risks & Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Revision ID instability | Medium | Critical | Deterministic ordering tests |
| Graph storage performance | Medium | High | Bulk insert, index optimization |
| VEX lattice logic errors | Low | High | Extensive comparison testing |
---
## Performance Considerations
### Graph Storage
- Use `BIGSERIAL` for node/edge IDs (high volume)
- Use `COPY` for bulk inserts (typically 10-100x faster than row-by-row `INSERT`)
- Index `(graph_revision_id, node_key)` for lookups
- Index `(from_node_id)` and `(to_node_id)` for traversal
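
The index recommendations above could be expressed as idempotent DDL, which also fits the migration idempotency subtask in T6a.2. This is a sketch under assumptions: the index names and the `GraphIndexDdl` class are hypothetical, the authoritative schema lives in SPECIFICATION.md, and making the node-key index `UNIQUE` is inferred from the "node_key uniqueness per revision" subtask rather than stated outright. `IF NOT EXISTS` lets the migration run repeatedly without failing on existing indexes.

```csharp
using System.Collections.Generic;

public static class GraphIndexDdl
{
    // Lookup of a node by key within one revision; UNIQUE is an assumption.
    public const string NodeKeyLookup =
        "CREATE UNIQUE INDEX IF NOT EXISTS ix_graph_nodes_revision_key " +
        "ON vex.graph_nodes (graph_revision_id, node_key)";

    // Outgoing-edge traversal.
    public const string EdgesFrom =
        "CREATE INDEX IF NOT EXISTS ix_graph_edges_from " +
        "ON vex.graph_edges (from_node_id)";

    // Incoming-edge traversal.
    public const string EdgesTo =
        "CREATE INDEX IF NOT EXISTS ix_graph_edges_to " +
        "ON vex.graph_edges (to_node_id)";

    // Statements in the order a migration would execute them.
    public static readonly IReadOnlyList<string> All = new[]
    {
        NodeKeyLookup,
        EdgesFrom,
        EdgesTo,
    };
}
```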
### Estimated Volumes
| Table | Estimated Rows per Project | Estimated Total Rows |
|-------|----------------------------|----------------------|
| `vex.graph_nodes` | 1,000 - 50,000 | 10M+ |
| `vex.graph_edges` | 2,000 - 100,000 | 20M+ |
| `vex.statements` | 100 - 5,000 | 1M+ |
---
*Phase Version: 1.0.0*
*Last Updated: 2025-11-28*