Phase 5: Vulnerability Index Conversion (Concelier)

Sprint: 6-7
Duration: 2 sprints
Status: DONE (fresh-start; feed-driven)
Dependencies: Phase 0 (Foundations), DONE


Objectives

  1. Create StellaOps.Concelier.Storage.Postgres project
  2. Implement full vulnerability schema in PostgreSQL
  3. Build advisory conversion pipeline
  4. Maintain deterministic vulnerability matching

Deliverables

| Deliverable | Acceptance Criteria |
|-------------|---------------------|
| Vuln schema | All tables created with indexes |
| Conversion pipeline | MongoDB advisories converted to PostgreSQL |
| Matching verification | Same CVEs found for identical SBOMs |
| Integration tests | 100% coverage of query operations |

Schema Reference

See SPECIFICATION.md Section 5.2 for complete vulnerability schema.

Tables:

  • vuln.sources
  • vuln.feed_snapshots
  • vuln.advisory_snapshots
  • vuln.advisories
  • vuln.advisory_aliases
  • vuln.advisory_cvss
  • vuln.advisory_affected
  • vuln.advisory_references
  • vuln.advisory_credits
  • vuln.advisory_weaknesses
  • vuln.kev_flags
  • vuln.source_states
  • vuln.merge_events

Sprint 5a: Schema & Repositories

T5a.1: Create Concelier.Storage.Postgres Project

Status: TODO Estimate: 0.5 days

Subtasks:

  • Create project structure
  • Add NuGet references
  • Create ConcelierDataSource class
  • Create ServiceCollectionExtensions.cs

T5a.2: Implement Schema Migrations

Status: DONE Estimate: 1.5 days

Subtasks:

  • Create schema migration
  • Include all tables
  • Add full-text search index
  • Add PURL lookup index
  • Test migration idempotency
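
One way to approach the idempotency subtask is to write every DDL statement so that re-running it is a no-op. The sketch below is an illustration under stated assumptions, not the project's actual migration: the column names (`title`, `summary`, `purl`), the index names, and the `MigrationSketch` class are hypothetical, and the real schema is the one defined in SPECIFICATION.md Section 5.2. The statement executor is passed in as a delegate so the example does not depend on a particular database driver.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class MigrationSketch
{
    // Hypothetical DDL. Each statement uses IF NOT EXISTS, so a second run changes nothing.
    // The index statements assume the tables were already created by earlier statements.
    public static readonly IReadOnlyList<string> Statements = new[]
    {
        "CREATE SCHEMA IF NOT EXISTS vuln",
        // Full-text search over title and summary (column names are assumptions).
        "CREATE INDEX IF NOT EXISTS ix_advisories_fts ON vuln.advisories " +
        "USING GIN (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(summary, '')))",
        // B-tree index for exact PURL lookups (column name is an assumption).
        "CREATE INDEX IF NOT EXISTS ix_advisory_affected_purl ON vuln.advisory_affected (purl)",
    };

    // Applies the statements in order through a caller-supplied executor.
    public static async Task ApplyAsync(
        Func<string, CancellationToken, Task> execute,
        CancellationToken ct)
    {
        foreach (var sql in Statements)
        {
            ct.ThrowIfCancellationRequested();
            await execute(sql, ct);
        }
    }
}
```

An idempotency test could call `ApplyAsync` twice against the same database and assert that the second call neither fails nor changes the catalog.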

T5a.3: Implement Source Repository

Status: DONE Estimate: 0.5 days

Subtasks:

  • Implement CRUD operations
  • Implement GetByKeyAsync
  • Write integration tests

T5a.4: Implement Advisory Repository

Status: DONE Estimate: 2 days

Interface:

public interface IAdvisoryRepository
{
    Task<Advisory?> GetByKeyAsync(string advisoryKey, CancellationToken ct);
    Task<Advisory?> GetByAliasAsync(string aliasType, string aliasValue, CancellationToken ct);
    Task<IReadOnlyList<Advisory>> SearchAsync(AdvisorySearchQuery query, CancellationToken ct);
    Task<Advisory> UpsertAsync(Advisory advisory, CancellationToken ct);
    Task<IReadOnlyList<Advisory>> GetAffectingPackageAsync(string purl, CancellationToken ct);
    Task<IReadOnlyList<Advisory>> GetAffectingPackageNameAsync(string ecosystem, string name, CancellationToken ct);
}

Subtasks:

  • Implement GetByKeyAsync
  • Implement GetByAliasAsync (CVE lookup)
  • Implement SearchAsync with full-text search
  • Implement UpsertAsync with all child tables
  • Implement GetAffectingPackageAsync (PURL match)
  • Implement GetAffectingPackageNameAsync
  • Write integration tests

T5a.5: Implement Child Table Repositories

Status: DONE Estimate: 2 days

Subtasks:

  • Implement Alias repository
  • Implement CVSS repository
  • Implement Affected repository
  • Implement Reference repository
  • Implement Credit repository
  • Implement Weakness repository
  • Implement KEV repository
  • Write integration tests

T5a.6: Implement Source State Repository

Status: DONE Estimate: 0.5 days

Subtasks:

  • Implement CRUD operations
  • Implement cursor management
  • Write integration tests
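
Cursor management usually means remembering how far a source's import got, so the next run can resume from that point. The following is a minimal in-memory sketch, assuming the cursor is a "last modified" timestamp; `SourceStateSketch` and `InMemorySourceStateStore` are hypothetical names, and the real repository would persist this state in `vuln.source_states`.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical shape of a per-source state row; the real columns live in vuln.source_states.
public sealed record SourceStateSketch(string SourceKey, DateTimeOffset Cursor);

public sealed class InMemorySourceStateStore
{
    private readonly ConcurrentDictionary<string, SourceStateSketch> _states = new();

    // Returns the stored cursor, or DateTimeOffset.MinValue for a source never imported.
    public DateTimeOffset GetCursor(string sourceKey) =>
        _states.TryGetValue(sourceKey, out var state) ? state.Cursor : DateTimeOffset.MinValue;

    // Moves the cursor forward only; a stale or replayed batch cannot rewind it.
    public DateTimeOffset Advance(string sourceKey, DateTimeOffset candidate)
    {
        var updated = _states.AddOrUpdate(
            sourceKey,
            key => new SourceStateSketch(key, candidate),
            (_, current) => candidate > current.Cursor
                ? current with { Cursor = candidate }
                : current);
        return updated.Cursor;
    }
}
```

The forward-only rule is a design choice in this sketch: it keeps a retried or out-of-order batch from causing already-imported records to be fetched again.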

Sprint 5b: Conversion & Verification

T5b.1: Build Advisory Conversion Service

Status: SKIPPED (fresh-start; no Mongo backfill) Estimate: 0 days

Description: Create service to convert MongoDB advisory documents to PostgreSQL relational structure.

Subtasks:

  • Parse MongoDB AdvisoryDocument structure
  • Map to vuln.advisories table
  • Extract and normalize aliases
  • Extract and normalize CVSS metrics
  • Extract and normalize affected packages
  • Preserve provenance JSONB
  • Handle version ranges (keep as JSONB)
  • Handle normalized versions (keep as JSONB)

Conversion Logic:

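using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Bson.IO;
using MongoDB.Driver;

public sealed class AdvisoryConverter
{
    // Relaxed Extended JSON is valid JSON that System.Text.Json can parse.
    private static readonly JsonWriterSettings JsonSettings =
        new() { OutputMode = JsonOutputMode.RelaxedExtendedJson };

    public async Task ConvertAsync(
        IMongoCollection<AdvisoryDocument> source,
        IAdvisoryRepository target,
        CancellationToken ct)
    {
        // Stream the collection batch by batch instead of loading it into memory.
        using var cursor = await source.FindAsync(
            FilterDefinition<AdvisoryDocument>.Empty, cancellationToken: ct);

        while (await cursor.MoveNextAsync(ct))
        {
            foreach (var doc in cursor.Current)
            {
                await target.UpsertAsync(MapToAdvisory(doc), ct);
            }
        }
    }

    private static Advisory MapToAdvisory(AdvisoryDocument doc)
    {
        // Extract from BsonDocument payload
        var payload = doc.Payload;
        return new Advisory
        {
            AdvisoryKey = doc.Id,
            PrimaryVulnId = payload["primaryVulnId"].AsString,
            Title = OptionalString(payload, "title"),
            Summary = OptionalString(payload, "summary"),
            // ... etc
            // Round-trip through JSON text so provenance can be stored as JSONB.
            Provenance = JsonSerializer.Deserialize<JsonElement>(
                payload["provenance"].ToJson(JsonSettings)),
        };
    }

    // Returns null when the field is absent or not a string, instead of throwing.
    private static string? OptionalString(BsonDocument payload, string name) =>
        payload.TryGetValue(name, out var value) && value.IsString ? value.AsString : null;
}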

T5b.2: Build Feed Import Pipeline

Status: DONE Estimate: 1 day

Description: Modify feed import to write directly to PostgreSQL.

Subtasks:

  • Update NVD importer to use PostgreSQL
  • Update OSV importer to use PostgreSQL
  • Update GHSA importer to use PostgreSQL
  • Update vendor feed importers
  • Test incremental imports
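
An incremental import can be tested by checking two properties: only records newer than the stored cursor are written, and re-running the same feed leaves the store unchanged. The sketch below illustrates that idea with in-memory stand-ins. `FeedRecord` and `IncrementalImporterSketch` are hypothetical names, and the dictionary takes the place of the PostgreSQL-backed advisory repository.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical feed record; real importers map NVD/OSV/GHSA payloads to advisories.
public sealed record FeedRecord(string AdvisoryKey, string Title, DateTimeOffset Modified);

public sealed class IncrementalImporterSketch
{
    // Stand-in for the PostgreSQL advisory table, keyed by advisory key.
    public Dictionary<string, FeedRecord> Store { get; } = new(StringComparer.Ordinal);

    // Stand-in for the cursor kept in the source state.
    public DateTimeOffset Cursor { get; private set; } = DateTimeOffset.MinValue;

    // Imports only records newer than the cursor and returns how many were written.
    public int Import(IEnumerable<FeedRecord> feed)
    {
        // Deterministic processing order: by modification time, then by key.
        var pending = feed
            .Where(r => r.Modified > Cursor)
            .OrderBy(r => r.Modified)
            .ThenBy(r => r.AdvisoryKey, StringComparer.Ordinal)
            .ToList();

        foreach (var record in pending)
        {
            Store[record.AdvisoryKey] = record; // upsert by key
        }

        if (pending.Count > 0)
        {
            Cursor = pending[^1].Modified; // advance only after the batch is applied
        }

        return pending.Count;
    }
}
```

The strict `>` comparison can miss a record that shares the cursor's timestamp but is published later; a real importer may prefer `>=` combined with idempotent upserts.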

T5b.3: Run Parallel Import

Status: SKIPPED (fresh-start) Estimate: 0 days

Description: Run imports to both MongoDB and PostgreSQL simultaneously.

Subtasks:

  • Configure dual-import mode
  • Run import cycle
  • Compare record counts
  • Sample comparison checks

T5b.4: Verify Vulnerability Matching

Status: DONE (Postgres-only baseline; regression tests) Estimate: 2 days

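Description: Verify that vulnerability matching produces identical findings for the same SBOM. The original plan compared the MongoDB and PostgreSQL backends; after the fresh-start decision, results are checked against a Postgres-only baseline with regression tests.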

Subtasks:

  • Select sample SBOMs (various ecosystems)
  • Run matching with MongoDB backend
  • Run matching with PostgreSQL backend
  • Compare findings (must be identical)
  • Document any differences
  • Fix any issues found

Verification Tests:

[Theory]
[MemberData(nameof(GetSampleSboms))]
public async Task Scanner_Should_Find_Same_Vulns(string sbomPath)
{
    var sbom = await LoadSbom(sbomPath);

    _config["Persistence:Concelier"] = "Mongo";
    var mongoFindings = await _scanner.ScanAsync(sbom);

    _config["Persistence:Concelier"] = "Postgres";
    var postgresFindings = await _scanner.ScanAsync(sbom);

    // Strict ordering for determinism
    postgresFindings.Should().BeEquivalentTo(mongoFindings,
        options => options.WithStrictOrdering());
}
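
With a Postgres-only baseline, a regression test can compare each scan against a recorded fingerprint rather than against a second backend. The sketch below shows one possible way to compute such a fingerprint. The `Finding` record and `FindingsFingerprint` class are hypothetical and not part of the scanner's API; the digest is a SHA-256 hash over the findings sorted with ordinal comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical finding shape: the affected package PURL and the matched advisory key.
public sealed record Finding(string Purl, string AdvisoryKey);

public static class FindingsFingerprint
{
    // Produces the same hex digest for the same findings, regardless of input order.
    public static string Compute(IEnumerable<Finding> findings)
    {
        var lines = findings
            .Select(f => f.Purl + "\t" + f.AdvisoryKey)
            .OrderBy(line => line, StringComparer.Ordinal); // culture-independent ordering

        var canonical = string.Join("\n", lines);
        var hash = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return Convert.ToHexString(hash);
    }
}
```

Because the lines are sorted before hashing, this fingerprint checks the set of findings only; the order in which the scanner emits them would still need a strict-ordering assertion like the one in the test above.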

T5b.5: Performance Optimization

Status: DONE Estimate: 1 day

Subtasks:

  • Analyze slow queries with EXPLAIN ANALYZE
  • Optimize indexes for common queries
  • Consider partial indexes for active advisories
  • Benchmark PostgreSQL vs MongoDB performance

T5b.6: Switch Scanner to PostgreSQL

Status: DONE Estimate: 0.5 days

Subtasks:

  • Update configuration
  • Deploy to staging
  • Run full scan suite
  • Deploy to production
  • Monitor scan determinism

Exit Criteria

  • All repository interfaces implemented
  • Advisory conversion pipeline working (fresh-start; feed-only ingestion in place)
  • Vulnerability matching validated on Postgres baseline
  • Feed imports working on PostgreSQL
  • Concelier running on PostgreSQL in production

Risks & Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Matching discrepancies | Medium | High | Regression suite on Postgres baseline; keep fixtures deterministic |
| Performance regression on queries | Medium | Medium | Index optimization, query tuning |
| Data loss during conversion | Low | High | Fresh-start chosen; rely on feed reimport + deterministic ingest |

Data Volume Estimates (post fresh-start)

| Table | Estimated Rows | Growth Rate |
|-------|----------------|-------------|
| advisories | feed-derived | ~100/day |
| advisory_aliases | feed-derived | ~200/day |
| advisory_affected | feed-derived | ~1000/day |
| advisory_cvss | feed-derived | ~150/day |

Phase Version: 1.0.0 Last Updated: 2025-11-28