Phase 6: VEX & Graph Conversion (Excititor)

Sprint: 8-10
Duration: 2-3 sprints
Status: DONE
Dependencies: Phase 5 (Vulnerabilities); Phase 0 (Foundations) - DONE


Objectives

  1. Create StellaOps.Excititor.Storage.Postgres project
  2. Implement VEX schema in PostgreSQL
  3. Handle graph nodes/edges efficiently
  4. Preserve graph_revision_id stability (determinism critical)
  5. Maintain VEX statement lattice logic

Deliverables

| Deliverable | Acceptance Criteria |
|-------------|---------------------|
| VEX schema | All tables created with indexes |
| Graph storage | Nodes/edges efficiently stored |
| Statement storage | VEX statements with full provenance |
| Revision stability | Same inputs produce same revision_id |
| Integration tests | 100% coverage |

Schema Reference

See SPECIFICATION.md, Section 5.3, for the complete VEX schema.

Tables:

  • vex.projects
  • vex.graph_revisions
  • vex.graph_nodes
  • vex.graph_edges
  • vex.statements
  • vex.observations
  • vex.linksets
  • vex.linkset_events
  • vex.consensus
  • vex.consensus_holds
  • vex.unknowns_snapshots
  • vex.unknown_items
  • vex.evidence_manifests
  • vex.cvss_receipts
  • vex.attestations
  • vex.timeline_events

Sprint 6a: Core Schema & Repositories

T6a.1: Create Excititor.Storage.Postgres Project

Status: DONE Estimate: 0.5 days

Subtasks:

  • Create project structure
  • Add NuGet references
  • Create ExcititorDataSource class
  • Create ServiceCollectionExtensions.cs

T6a.2: Implement Schema Migrations

Status: DONE Estimate: 1.5 days

Subtasks:

  • Create schema migration
  • Include all tables
  • Add indexes for graph traversal
  • Add indexes for VEX lookups
  • Test migration idempotency

T6a.3: Implement Project Repository

Status: DONE Estimate: 0.5 days

Subtasks:

  • Implement CRUD operations
  • Handle tenant scoping
  • Write integration tests

T6a.4: Implement VEX Statement Repository

Status: DONE Estimate: 1.5 days

Interface:

public interface IVexStatementRepository
{
    Task<VexStatement?> GetAsync(string tenantId, Guid statementId, CancellationToken ct);
    Task<IReadOnlyList<VexStatement>> GetByVulnerabilityAsync(
        string tenantId, string vulnerabilityId, CancellationToken ct);
    Task<IReadOnlyList<VexStatement>> GetByProjectAsync(
        string tenantId, Guid projectId, CancellationToken ct);
    Task<VexStatement> UpsertAsync(VexStatement statement, CancellationToken ct);
    Task<IReadOnlyList<VexStatement>> GetByGraphRevisionAsync(
        Guid graphRevisionId, CancellationToken ct);
}

Subtasks:

  • Implement all interface methods
  • Handle status and justification enums
  • Preserve evidence JSONB
  • Preserve provenance JSONB
  • Write integration tests
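
The "status and justification enums" subtask is not spelled out here. As a minimal sketch, assuming the status column stores the lowercase snake_case labels used by OpenVEX, a round-trip mapping could look like the following. The names `VexStatus` and `VexStatusMapper` are hypothetical and not taken from the Excititor codebase; a justification enum could follow the same pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical enum; the member set follows the four status labels defined by OpenVEX.
public enum VexStatus
{
    NotAffected,
    Affected,
    Fixed,
    UnderInvestigation
}

public static class VexStatusMapper
{
    // Assumed storage values (lowercase snake_case); the real schema may differ.
    private static readonly IReadOnlyDictionary<VexStatus, string> ToDb =
        new Dictionary<VexStatus, string>
        {
            [VexStatus.NotAffected] = "not_affected",
            [VexStatus.Affected] = "affected",
            [VexStatus.Fixed] = "fixed",
            [VexStatus.UnderInvestigation] = "under_investigation",
        };

    // Reverse lookup built from the forward map so the two can never drift apart.
    private static readonly IReadOnlyDictionary<string, VexStatus> FromDb =
        ToDb.ToDictionary(kv => kv.Value, kv => kv.Key, StringComparer.Ordinal);

    public static string ToDbValue(VexStatus status) => ToDb[status];

    // Fails loudly on unknown values instead of silently defaulting to a status.
    public static VexStatus Parse(string value) =>
        FromDb.TryGetValue(value, out var status)
            ? status
            : throw new ArgumentOutOfRangeException(nameof(value), value, "Unknown VEX status.");
}
```

Rejecting unknown values on read keeps a bad row from being reinterpreted as a valid status, which matters when the status feeds consensus logic.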

T6a.5: Implement VEX Observation Repository

Status: DONE Estimate: 1 day

Subtasks:

  • Implement CRUD operations
  • Handle unique constraint on composite key
  • Implement FindByVulnerabilityAndProductAsync
  • Write integration tests

T6a.6: Implement Linkset Repository

Status: DONE Estimate: 0.5 days

Subtasks:

  • Implement CRUD operations
  • Implement event logging
  • Write integration tests

T6a.7: Implement Consensus Repository

Status: DONE Estimate: 0.5 days

Subtasks:

  • Implement CRUD operations
  • Implement hold management
  • Write integration tests

Sprint 6b: Graph Storage

T6b.1: Implement Graph Revision Repository

Status: DONE Estimate: 1 day

Interface:

public interface IGraphRevisionRepository
{
    Task<GraphRevision?> GetByIdAsync(Guid id, CancellationToken ct);
    Task<GraphRevision?> GetByRevisionIdAsync(string revisionId, CancellationToken ct);
    Task<GraphRevision?> GetLatestByProjectAsync(Guid projectId, CancellationToken ct);
    Task<GraphRevision> CreateAsync(GraphRevision revision, CancellationToken ct);
    Task<IReadOnlyList<GraphRevision>> GetHistoryAsync(
        Guid projectId, int limit, CancellationToken ct);
}

Subtasks:

  • Implement all interface methods
  • Handle revision_id uniqueness
  • Handle parent_revision_id linking
  • Write integration tests
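
To illustrate what parent_revision_id linking enables, here is a small in-memory sketch that walks parent links from one revision back to the root. It assumes the parent link refers to the parent's string revision_id; the document does not say whether it refers to that or to the GUID primary key. `RevisionSketch` and `RevisionLineageSketch` are hypothetical names.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical shape: only the two fields named in the subtasks above.
public sealed record RevisionSketch(string RevisionId, string? ParentRevisionId);

public static class RevisionLineageSketch
{
    // Walks parent links from the given revision back to the root, newest first.
    public static IReadOnlyList<string> Lineage(
        string revisionId, IReadOnlyDictionary<string, RevisionSketch> byRevisionId)
    {
        var lineage = new List<string>();
        var seen = new HashSet<string>(StringComparer.Ordinal);
        string? current = revisionId;

        while (current is not null && byRevisionId.TryGetValue(current, out var revision))
        {
            // Guard against corrupted data that would otherwise loop forever.
            if (!seen.Add(current))
                throw new InvalidOperationException($"Cycle detected at revision '{current}'.");

            lineage.Add(current);
            current = revision.ParentRevisionId;
        }

        return lineage;
    }
}
```

The walk stops at the first revision with no parent, or at a parent that is missing from the lookup.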

T6b.2: Implement Graph Node Repository

Status: DONE Estimate: 1.5 days

Interface:

public interface IGraphNodeRepository
{
    Task<GraphNode?> GetByIdAsync(long nodeId, CancellationToken ct);
    Task<GraphNode?> GetByKeyAsync(Guid graphRevisionId, string nodeKey, CancellationToken ct);
    Task<IReadOnlyList<GraphNode>> GetByRevisionAsync(
        Guid graphRevisionId, CancellationToken ct);
    Task BulkInsertAsync(
        Guid graphRevisionId, IEnumerable<GraphNode> nodes, CancellationToken ct);
    Task<int> GetCountAsync(Guid graphRevisionId, CancellationToken ct);
}

Subtasks:

  • Implement all interface methods
  • Implement bulk insert for efficiency
  • Handle node_key uniqueness per revision
  • Write integration tests

Bulk Insert Optimization:

// Requires: using System.Text.Json; using Npgsql; using NpgsqlTypes;
public async Task BulkInsertAsync(
    Guid graphRevisionId,
    IEnumerable<GraphNode> nodes,
    CancellationToken ct)
{
    await using var connection = await _dataSource.OpenConnectionAsync("system", ct);

    // Binary COPY streams all rows in a single command instead of one INSERT per node.
    await using var writer = await connection.BeginBinaryImportAsync(
        "COPY vex.graph_nodes (graph_revision_id, node_key, node_type, purl, name, version, attributes) " +
        "FROM STDIN (FORMAT BINARY)", ct);

    // Values must be written in the same order as the COPY column list above.
    foreach (var node in nodes)
    {
        await writer.StartRowAsync(ct);
        await writer.WriteAsync(graphRevisionId, ct);
        await writer.WriteAsync(node.NodeKey, ct);
        await writer.WriteAsync(node.NodeType, ct);
        // An explicit NpgsqlDbType is given for these columns rather than relying on inference.
        await writer.WriteAsync(node.Purl, NpgsqlDbType.Text, ct);
        await writer.WriteAsync(node.Name, NpgsqlDbType.Text, ct);
        await writer.WriteAsync(node.Version, NpgsqlDbType.Text, ct);
        await writer.WriteAsync(JsonSerializer.Serialize(node.Attributes), NpgsqlDbType.Jsonb, ct);
    }

    // Commits the import; disposing the writer without calling CompleteAsync cancels it.
    await writer.CompleteAsync(ct);
}

T6b.3: Implement Graph Edge Repository

Status: DONE Estimate: 1.5 days

Interface:

public interface IGraphEdgeRepository
{
    Task<IReadOnlyList<GraphEdge>> GetByRevisionAsync(
        Guid graphRevisionId, CancellationToken ct);
    Task<IReadOnlyList<GraphEdge>> GetOutgoingAsync(
        long fromNodeId, CancellationToken ct);
    Task<IReadOnlyList<GraphEdge>> GetIncomingAsync(
        long toNodeId, CancellationToken ct);
    Task BulkInsertAsync(
        Guid graphRevisionId, IEnumerable<GraphEdge> edges, CancellationToken ct);
    Task<int> GetCountAsync(Guid graphRevisionId, CancellationToken ct);
}

Subtasks:

  • Implement all interface methods
  • Implement bulk insert for efficiency
  • Optimize for traversal queries
  • Write integration tests
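
One way to keep traversal cheap is to load a revision's edges once (for example via GetByRevisionAsync) and walk them in memory rather than issuing one query per node. The sketch below shows a breadth-first walk under that assumption. `EdgeSketch` and `GraphTraversalSketch` are hypothetical stand-ins, since the real `GraphEdge` type is not shown in this document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal edge shape with only the two ID columns used for traversal.
public sealed record EdgeSketch(long FromNodeId, long ToNodeId);

public static class GraphTraversalSketch
{
    // Returns every node reachable from startNodeId, in breadth-first order.
    public static IReadOnlyList<long> Reachable(long startNodeId, IEnumerable<EdgeSketch> edges)
    {
        // Adjacency lists are sorted by target ID so the visit order is deterministic.
        var outgoing = edges
            .GroupBy(e => e.FromNodeId)
            .ToDictionary(
                g => g.Key,
                g => g.Select(e => e.ToNodeId).OrderBy(id => id).ToList());

        var visited = new HashSet<long> { startNodeId };
        var order = new List<long>();
        var queue = new Queue<long>();
        queue.Enqueue(startNodeId);

        while (queue.Count > 0)
        {
            long current = queue.Dequeue();
            if (!outgoing.TryGetValue(current, out var targets))
                continue; // Leaf node: no outgoing edges.

            foreach (long target in targets)
            {
                // HashSet.Add returns false for already-visited nodes, which also breaks cycles.
                if (visited.Add(target))
                {
                    order.Add(target);
                    queue.Enqueue(target);
                }
            }
        }

        return order;
    }
}
```

For single-hop lookups, the per-node GetOutgoingAsync and GetIncomingAsync methods remain the natural fit; the in-memory walk only pays off for multi-hop traversal.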

T6b.4: Verify Graph Revision ID Stability

Status: DONE Estimate: 1 day

Description: Critical. The same SBOM, feed snapshot, and policy version must always produce an identical revision_id.

Subtasks:

  • Document revision_id computation algorithm
  • Verify nodes are inserted in deterministic order
  • Verify edges are inserted in deterministic order
  • Write stability tests
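
The actual revision_id algorithm is not reproduced in this document. As a hedged illustration of why deterministic ordering matters, the sketch below hashes a canonical rendering of the inputs after sorting nodes and edges ordinally, so the result does not depend on enumeration order. The class name, the input shape, and the choice of SHA-256 are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class RevisionIdSketch
{
    // Hypothetical: hashes a canonical, ordinally sorted rendering of the inputs.
    // A production format would also need to escape or length-prefix each field.
    public static string Compute(
        string feedSnapshot,
        string policyVersion,
        IEnumerable<string> nodeKeys,
        IEnumerable<(string From, string To)> edges)
    {
        var sb = new StringBuilder();
        sb.Append("feed=").Append(feedSnapshot).Append('\n');
        sb.Append("policy=").Append(policyVersion).Append('\n');

        // Ordinal comparison avoids culture-dependent sort differences between hosts.
        foreach (var key in nodeKeys.OrderBy(k => k, StringComparer.Ordinal))
            sb.Append("node=").Append(key).Append('\n');

        foreach (var (src, dst) in edges
                     .OrderBy(e => e.From, StringComparer.Ordinal)
                     .ThenBy(e => e.To, StringComparer.Ordinal))
            sb.Append("edge=").Append(src).Append("->").Append(dst).Append('\n');

        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(sb.ToString()));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

Sorting inside the hash function means callers cannot break stability by passing nodes or edges in a different order.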

Stability Test:

[Fact]
public async Task Same_Inputs_Should_Produce_Same_RevisionId()
{
    var sbom = await LoadSbom("testdata/stable-sbom.json");
    var feedSnapshot = "feed-v1.2.3";
    var policyVersion = "policy-v1.0";

    // Compute multiple times
    var revisions = new List<string>();
    for (int i = 0; i < 5; i++)
    {
        var graph = await _graphService.ComputeGraphAsync(
            sbom, feedSnapshot, policyVersion);
        revisions.Add(graph.RevisionId);
    }

    // All must be identical
    revisions.Distinct().Should().HaveCount(1);
}

Sprint 6c: Migration & Verification (Fresh-Start)

T6c.1: Build Graph Conversion Service

Status: SKIPPED (fresh-start; no Mongo graph backfill) Estimate: 0 days


T6c.2: Build VEX Conversion Service

Status: SKIPPED (fresh-start; no Mongo VEX backfill) Estimate: 0 days


T6c.3: Run Dual Pipeline Comparison

Status: SKIPPED (fresh-start) Estimate: 0 days


T6c.4: Migrate Projects

Status: SKIPPED (fresh-start) Estimate: 0 days


T6c.5: Switch to PostgreSQL-Only

Status: DONE Estimate: 0.5 days

Subtasks:

  • Update configuration
  • Deploy to staging
  • Run full test suite
  • Deploy to production
  • Monitor metrics

Exit Criteria

  • All repository interfaces implemented
  • Graph storage working efficiently
  • Graph revision IDs stable (deterministic)
  • VEX statements preserved correctly
  • Determinism tests pass (Postgres baseline)
  • Excititor running on PostgreSQL in production

Execution Log

| Date (UTC) | Update |
|------------|--------|
| 2025-12-05 | Core schema/repos/migrations/tests completed; determinism verified; fresh-start path chosen (no Mongo VEX/graph backfill). |

Risks & Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Revision ID instability | Medium | Critical | Deterministic ordering tests |
| Graph storage performance | Medium | High | Bulk insert, index optimization |
| VEX lattice logic errors | Low | High | Extensive comparison testing |

Performance Considerations

Graph Storage

  • Use BIGSERIAL for node/edge IDs (high volume)
  • Use COPY for bulk inserts (estimated 10-100x faster than row-by-row INSERT statements)
  • Index (graph_revision_id, node_key) for lookups
  • Index (from_node_id) and (to_node_id) for traversal
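
The index recommendations above could be expressed as idempotent migration statements along these lines. This is a sketch only: the index names are hypothetical, the table and column names are taken from this document, and making the node lookup index UNIQUE is an assumption based on the "node_key uniqueness per revision" subtask.

```csharp
using System;
using System.Collections.Generic;

public static class GraphIndexMigrationSketch
{
    // IF NOT EXISTS lets the migration run repeatedly without failing.
    public static readonly IReadOnlyList<string> Statements = new[]
    {
        // Lookup by (revision, key); UNIQUE is an assumption, see the note above.
        "CREATE UNIQUE INDEX IF NOT EXISTS ux_graph_nodes_revision_key " +
        "ON vex.graph_nodes (graph_revision_id, node_key);",

        // Outgoing-edge traversal.
        "CREATE INDEX IF NOT EXISTS ix_graph_edges_from " +
        "ON vex.graph_edges (from_node_id);",

        // Incoming-edge traversal.
        "CREATE INDEX IF NOT EXISTS ix_graph_edges_to " +
        "ON vex.graph_edges (to_node_id);",
    };
}
```

Each statement can be executed in order by whatever migration runner the project already uses.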

Estimated Volumes

| Table | Estimated Rows per Project | Total Estimated |
|-------|----------------------------|-----------------|
| graph_nodes | 1,000 - 50,000 | 10M+ |
| graph_edges | 2,000 - 100,000 | 20M+ |
| vex_statements | 100 - 5,000 | 1M+ |

Phase Version: 1.0.0
Last Updated: 2025-11-28