Phase 6: VEX & Graph Conversion (Excititor)

Sprint: 8-10
Duration: 2-3 sprints
Status: TODO
Dependencies: Phase 5 (Vulnerabilities)


Objectives

  1. Create StellaOps.Excititor.Storage.Postgres project
  2. Implement VEX schema in PostgreSQL
  3. Handle graph nodes/edges efficiently
  4. Preserve graph_revision_id stability (determinism critical)
  5. Maintain VEX statement lattice logic

Deliverables

| Deliverable | Acceptance Criteria |
|---|---|
| VEX schema | All tables created with indexes |
| Graph storage | Nodes/edges efficiently stored |
| Statement storage | VEX statements with full provenance |
| Revision stability | Same inputs produce same revision_id |
| Integration tests | 100% coverage |

Schema Reference

See SPECIFICATION.md, Section 5.3, for the complete VEX schema.

Tables:

  • vex.projects
  • vex.graph_revisions
  • vex.graph_nodes
  • vex.graph_edges
  • vex.statements
  • vex.observations
  • vex.linksets
  • vex.linkset_events
  • vex.consensus
  • vex.consensus_holds
  • vex.unknowns_snapshots
  • vex.unknown_items
  • vex.evidence_manifests
  • vex.cvss_receipts
  • vex.attestations
  • vex.timeline_events

Sprint 6a: Core Schema & Repositories

T6a.1: Create Excititor.Storage.Postgres Project

Status: TODO Estimate: 0.5 days

Subtasks:

  • Create project structure
  • Add NuGet references
  • Create ExcititorDataSource class
  • Create ServiceCollectionExtensions.cs

T6a.2: Implement Schema Migrations

Status: TODO Estimate: 1.5 days

Subtasks:

  • Create schema migration
  • Include all tables
  • Add indexes for graph traversal
  • Add indexes for VEX lookups
  • Test migration idempotency

T6a.3: Implement Project Repository

Status: TODO Estimate: 0.5 days

Subtasks:

  • Implement CRUD operations
  • Handle tenant scoping
  • Write integration tests

T6a.4: Implement VEX Statement Repository

Status: TODO Estimate: 1.5 days

Interface:

public interface IVexStatementRepository
{
    Task<VexStatement?> GetAsync(string tenantId, Guid statementId, CancellationToken ct);
    Task<IReadOnlyList<VexStatement>> GetByVulnerabilityAsync(
        string tenantId, string vulnerabilityId, CancellationToken ct);
    Task<IReadOnlyList<VexStatement>> GetByProjectAsync(
        string tenantId, Guid projectId, CancellationToken ct);
    Task<VexStatement> UpsertAsync(VexStatement statement, CancellationToken ct);
    Task<IReadOnlyList<VexStatement>> GetByGraphRevisionAsync(
        Guid graphRevisionId, CancellationToken ct);
}

Subtasks:

  • Implement all interface methods
  • Handle status and justification enums
  • Preserve evidence JSONB
  • Preserve provenance JSONB
  • Write integration tests
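
The "status and justification enums" subtask depends on how vex.statements stores those values. A minimal sketch, assuming the status column is plain text holding the four status labels defined by OpenVEX; the `VexStatus` enum and `VexStatusMapper` class are hypothetical names, not taken from SPECIFICATION.md. A justification mapper could follow the same pattern.

```csharp
using System;

// Hypothetical enum; the real Excititor model may differ.
public enum VexStatus
{
    NotAffected,
    Affected,
    Fixed,
    UnderInvestigation
}

public static class VexStatusMapper
{
    // Converts the enum to the text token assumed to be stored in PostgreSQL.
    public static string ToDb(VexStatus status) => status switch
    {
        VexStatus.NotAffected => "not_affected",
        VexStatus.Affected => "affected",
        VexStatus.Fixed => "fixed",
        VexStatus.UnderInvestigation => "under_investigation",
        _ => throw new ArgumentOutOfRangeException(nameof(status), status, "Unknown VEX status")
    };

    // Parses a stored token. Unknown values fail loudly instead of defaulting,
    // so a bad row cannot silently change a statement's meaning.
    public static VexStatus FromDb(string value) => value switch
    {
        "not_affected" => VexStatus.NotAffected,
        "affected" => VexStatus.Affected,
        "fixed" => VexStatus.Fixed,
        "under_investigation" => VexStatus.UnderInvestigation,
        _ => throw new FormatException($"Unknown VEX status token '{value}'")
    };
}
```

Keeping the mapping explicit, rather than relying on `Enum.ToString()`, means renaming an enum member in C# cannot change what is written to the database.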

T6a.5: Implement VEX Observation Repository

Status: TODO Estimate: 1 day

Subtasks:

  • Implement CRUD operations
  • Handle unique constraint on composite key
  • Implement FindByVulnerabilityAndProductAsync
  • Write integration tests

T6a.6: Implement Linkset Repository

Status: TODO Estimate: 0.5 days

Subtasks:

  • Implement CRUD operations
  • Implement event logging
  • Write integration tests

T6a.7: Implement Consensus Repository

Status: TODO Estimate: 0.5 days

Subtasks:

  • Implement CRUD operations
  • Implement hold management
  • Write integration tests

Sprint 6b: Graph Storage

T6b.1: Implement Graph Revision Repository

Status: TODO Estimate: 1 day

Interface:

public interface IGraphRevisionRepository
{
    Task<GraphRevision?> GetByIdAsync(Guid id, CancellationToken ct);
    Task<GraphRevision?> GetByRevisionIdAsync(string revisionId, CancellationToken ct);
    Task<GraphRevision?> GetLatestByProjectAsync(Guid projectId, CancellationToken ct);
    Task<GraphRevision> CreateAsync(GraphRevision revision, CancellationToken ct);
    Task<IReadOnlyList<GraphRevision>> GetHistoryAsync(
        Guid projectId, int limit, CancellationToken ct);
}

Subtasks:

  • Implement all interface methods
  • Handle revision_id uniqueness
  • Handle parent_revision_id linking
  • Write integration tests

T6b.2: Implement Graph Node Repository

Status: TODO Estimate: 1.5 days

Interface:

public interface IGraphNodeRepository
{
    Task<GraphNode?> GetByIdAsync(long nodeId, CancellationToken ct);
    Task<GraphNode?> GetByKeyAsync(Guid graphRevisionId, string nodeKey, CancellationToken ct);
    Task<IReadOnlyList<GraphNode>> GetByRevisionAsync(
        Guid graphRevisionId, CancellationToken ct);
    Task BulkInsertAsync(
        Guid graphRevisionId, IEnumerable<GraphNode> nodes, CancellationToken ct);
    Task<int> GetCountAsync(Guid graphRevisionId, CancellationToken ct);
}

Subtasks:

  • Implement all interface methods
  • Implement bulk insert for efficiency
  • Handle node_key uniqueness per revision
  • Write integration tests

Bulk Insert Optimization:

// Requires: using System.Text.Json; using Npgsql; using NpgsqlTypes;
public async Task BulkInsertAsync(
    Guid graphRevisionId,
    IEnumerable<GraphNode> nodes,
    CancellationToken ct)
{
    await using var connection = await _dataSource.OpenConnectionAsync("system", ct);

    // Stream rows through PostgreSQL's binary COPY protocol instead of issuing one INSERT per node.
    await using var writer = await connection.BeginBinaryImportAsync(
        "COPY vex.graph_nodes (graph_revision_id, node_key, node_type, purl, name, version, attributes) " +
        "FROM STDIN (FORMAT BINARY)", ct);

    foreach (var node in nodes)
    {
        // Values must be written in the same order as the COPY column list above.
        await writer.StartRowAsync(ct);
        await writer.WriteAsync(graphRevisionId, ct);
        await writer.WriteAsync(node.NodeKey, ct);
        await writer.WriteAsync(node.NodeType, ct);
        await writer.WriteAsync(node.Purl, NpgsqlDbType.Text, ct);
        await writer.WriteAsync(node.Name, NpgsqlDbType.Text, ct);
        await writer.WriteAsync(node.Version, NpgsqlDbType.Text, ct);
        await writer.WriteAsync(JsonSerializer.Serialize(node.Attributes), NpgsqlDbType.Jsonb, ct);
    }

    // Without CompleteAsync the import is cancelled when the writer is disposed and no rows are saved.
    await writer.CompleteAsync(ct);
}

T6b.3: Implement Graph Edge Repository

Status: TODO Estimate: 1.5 days

Interface:

public interface IGraphEdgeRepository
{
    Task<IReadOnlyList<GraphEdge>> GetByRevisionAsync(
        Guid graphRevisionId, CancellationToken ct);
    Task<IReadOnlyList<GraphEdge>> GetOutgoingAsync(
        long fromNodeId, CancellationToken ct);
    Task<IReadOnlyList<GraphEdge>> GetIncomingAsync(
        long toNodeId, CancellationToken ct);
    Task BulkInsertAsync(
        Guid graphRevisionId, IEnumerable<GraphEdge> edges, CancellationToken ct);
    Task<int> GetCountAsync(Guid graphRevisionId, CancellationToken ct);
}

Subtasks:

  • Implement all interface methods
  • Implement bulk insert for efficiency
  • Optimize for traversal queries
  • Write integration tests
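
"Optimize for traversal queries" is easier to evaluate against a concrete access pattern. The sketch below walks the graph breadth-first through a delegate shaped like `GetOutgoingAsync`; the `GraphTraversal` class and the reduced `EdgeRef` record are assumptions for illustration only. Each visited node costs one lookup here, so for deep graphs a single recursive SQL query may turn out to be the better fit.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical minimal edge shape; the real GraphEdge carries more fields.
public sealed record EdgeRef(long FromNodeId, long ToNodeId);

public static class GraphTraversal
{
    // Returns every node reachable from startNodeId (including itself) in BFS order.
    // getOutgoing stands in for IGraphEdgeRepository.GetOutgoingAsync.
    public static async Task<IReadOnlyList<long>> ReachableAsync(
        long startNodeId,
        Func<long, CancellationToken, Task<IReadOnlyList<EdgeRef>>> getOutgoing,
        CancellationToken ct)
    {
        var visited = new HashSet<long> { startNodeId };
        var order = new List<long> { startNodeId };
        var queue = new Queue<long>();
        queue.Enqueue(startNodeId);

        while (queue.Count > 0)
        {
            ct.ThrowIfCancellationRequested();
            var current = queue.Dequeue();

            foreach (var edge in await getOutgoing(current, ct))
            {
                // HashSet.Add returns false for nodes already seen, which guards against cycles.
                if (visited.Add(edge.ToNodeId))
                {
                    order.Add(edge.ToNodeId);
                    queue.Enqueue(edge.ToNodeId);
                }
            }
        }

        return order;
    }
}
```

Because the traversal issues one lookup per node, it exercises exactly the `(from_node_id)` index access that the edge repository needs to serve quickly.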

T6b.4: Verify Graph Revision ID Stability

Status: TODO Estimate: 1 day

Description: Critical. The same SBOM, feed snapshot, and policy version must always produce an identical revision_id.

Subtasks:

  • Document revision_id computation algorithm
  • Verify nodes are inserted in deterministic order
  • Verify edges are inserted in deterministic order
  • Write stability tests

Stability Test:

[Fact]
public async Task Same_Inputs_Should_Produce_Same_RevisionId()
{
    var sbom = await LoadSbom("testdata/stable-sbom.json");
    var feedSnapshot = "feed-v1.2.3";
    var policyVersion = "policy-v1.0";

    // Compute multiple times
    var revisions = new List<string>();
    for (int i = 0; i < 5; i++)
    {
        var graph = await _graphService.ComputeGraphAsync(
            sbom, feedSnapshot, policyVersion);
        revisions.Add(graph.RevisionId);
    }

    // All must be identical
    revisions.Distinct().Should().HaveCount(1);
}
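
The revision_id computation algorithm is still to be documented, so the following is only one plausible shape for it, not the Excititor algorithm: sort node keys and edges with ordinal comparison, feed them into SHA-256 together with the feed snapshot and policy version, and hex-encode the digest. `RevisionIdSketch`, its `Compute` signature, and the line-based encoding are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class RevisionIdSketch
{
    // Derives an ID from content only: no timestamps, GUIDs, or insertion order.
    // Assumes keys never contain '\n' or "->"; a real encoding would need escaping.
    public static string Compute(
        IEnumerable<string> nodeKeys,
        IEnumerable<(string From, string To)> edges,
        string feedSnapshot,
        string policyVersion)
    {
        var sb = new StringBuilder();
        sb.Append("feed=").Append(feedSnapshot).Append('\n');
        sb.Append("policy=").Append(policyVersion).Append('\n');

        // Ordinal ordering is culture-independent, so every host sorts identically.
        foreach (var key in nodeKeys.OrderBy(k => k, StringComparer.Ordinal))
        {
            sb.Append("node=").Append(key).Append('\n');
        }

        var orderedEdges = edges
            .OrderBy(e => e.From, StringComparer.Ordinal)
            .ThenBy(e => e.To, StringComparer.Ordinal);

        foreach (var edge in orderedEdges)
        {
            sb.Append("edge=").Append(edge.From).Append("->").Append(edge.To).Append('\n');
        }

        var digest = SHA256.HashData(Encoding.UTF8.GetBytes(sb.ToString()));
        return Convert.ToHexString(digest).ToLowerInvariant();
    }
}
```

Sorting inside the hash function means the ID does not depend on the order in which nodes and edges were read from the SBOM or inserted into the database, which is the property the stability test checks.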

Sprint 6c: Migration & Verification

T6c.1: Build Graph Conversion Service

Status: TODO Estimate: 1.5 days

Description: Convert existing MongoDB graphs to PostgreSQL.

Subtasks:

  • Parse MongoDB graph documents
  • Map to graph_revisions table
  • Extract and insert nodes
  • Extract and insert edges
  • Verify node/edge counts match

T6c.2: Build VEX Conversion Service

Status: TODO Estimate: 1 day

Subtasks:

  • Parse MongoDB VEX statements
  • Map to vex.statements table
  • Preserve provenance
  • Preserve evidence

T6c.3: Run Dual Pipeline Comparison

Status: TODO Estimate: 2 days

Description: Run graph computation on both backends (MongoDB and PostgreSQL) and compare the results.

Subtasks:

  • Select sample projects
  • Compute graphs with MongoDB
  • Compute graphs with PostgreSQL
  • Compare revision_ids (must match)
  • Compare node counts
  • Compare edge counts
  • Compare VEX statements
  • Document any differences
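
The comparison steps above could be automated with a small report helper. A sketch, assuming each backend run is summarized as a revision ID plus node, edge, and statement counts; `GraphSummary` and `PipelineComparer` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical summary of one backend's output for a single project.
public sealed record GraphSummary(string RevisionId, int NodeCount, int EdgeCount, int StatementCount);

public static class PipelineComparer
{
    // Returns one line per mismatch; an empty list means the two backends agree.
    public static IReadOnlyList<string> Diff(GraphSummary mongo, GraphSummary postgres)
    {
        var differences = new List<string>();

        if (!string.Equals(mongo.RevisionId, postgres.RevisionId, StringComparison.Ordinal))
        {
            differences.Add($"revision_id: mongo={mongo.RevisionId} postgres={postgres.RevisionId}");
        }

        AddIfDifferent(differences, "node count", mongo.NodeCount, postgres.NodeCount);
        AddIfDifferent(differences, "edge count", mongo.EdgeCount, postgres.EdgeCount);
        AddIfDifferent(differences, "statement count", mongo.StatementCount, postgres.StatementCount);

        return differences;
    }

    private static void AddIfDifferent(List<string> differences, string label, int mongo, int postgres)
    {
        if (mongo != postgres)
        {
            differences.Add($"{label}: mongo={mongo} postgres={postgres}");
        }
    }
}
```

Counts only catch gross divergence; comparing VEX statements properly would still need a per-statement check of status, evidence, and provenance.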

T6c.4: Migrate Projects

Status: TODO Estimate: 1 day

Subtasks:

  • Identify projects to migrate (active VEX)
  • Run conversion for each project
  • Verify latest graph revision
  • Verify VEX statements

T6c.5: Switch to PostgreSQL-Only

Status: TODO Estimate: 0.5 days

Subtasks:

  • Update configuration
  • Deploy to staging
  • Run full test suite
  • Deploy to production
  • Monitor metrics

Exit Criteria

  • All repository interfaces implemented
  • Graph storage working efficiently
  • Graph revision IDs stable (deterministic)
  • VEX statements preserved correctly
  • All comparison tests pass
  • Excititor running on PostgreSQL in production

Risks & Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Revision ID instability | Medium | Critical | Deterministic ordering tests |
| Graph storage performance | Medium | High | Bulk insert, index optimization |
| VEX lattice logic errors | Low | High | Extensive comparison testing |

Performance Considerations

Graph Storage

  • Use BIGSERIAL for node/edge IDs (high volume)
  • Use COPY for bulk inserts (typically 10-100x faster than row-by-row INSERT statements)
  • Index (graph_revision_id, node_key) for lookups
  • Index (from_node_id) and (to_node_id) for traversal

Estimated Volumes

| Table | Estimated Rows per Project | Total Estimated |
|---|---|---|
| vex.graph_nodes | 1,000 - 50,000 | 10M+ |
| vex.graph_edges | 2,000 - 100,000 | 20M+ |
| vex.statements | 100 - 5,000 | 1M+ |

Phase Version: 1.0.0
Last Updated: 2025-11-28