ADR-001: Linkset Correlation Algorithm V2

Status: Accepted
Date: 2026-01-25
Sprint: SPRINT_20260125_001_Concelier_linkset_correlation_v2

Context

The Concelier module's linkset correlation algorithm determines whether multiple vulnerability observations (from different sources such as NVD, GitHub Advisories, and vendor feeds) refer to the same underlying vulnerability. The V1 algorithm had several critical failure modes:

Alias intersection transitivity failure: Sources A (CVE-X), B (CVE-X + GHSA-Y), and C (GHSA-Y) produced an empty intersection even though they are transitively linked through shared aliases.
Thin source penalty: A single source with zero packages collapsed the entire group's package score to 0, even when the other sources shared packages.
False reference conflicts: Zero reference overlap was treated as a conflict rather than neutral evidence.
Uniform conflict penalties: All conflicts applied the same -0.1 penalty regardless of severity.

These issues caused both false negatives (failing to link related advisories) and false positives (emitting spurious conflicts).

Decision

We will replace the V1 intersection-based correlation algorithm with a V2 graph-based approach that:

1. Graph-Based Alias Connectivity

Replace intersection-across-all with union-find graph connectivity. Build a bipartite graph (observation ↔ alias nodes) and compute the Largest Connected Component (LCC) ratio.

Rationale: Transitive relationships are naturally captured by graph connectivity. Three sources with partial alias overlap can still achieve high correlation if they form a connected component.
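The following Python sketch contrasts V1's intersection with the V2 connectivity idea on the three-source example from the Context section. It is not the C# code in LinksetCorrelationV2.cs: the function names are hypothetical, and it assumes the LCC ratio is the share of observations that land in the largest connected component of the observation ↔ alias graph.

```python
from collections import Counter


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}
        self.size: dict[str, int] = {}

    def find(self, node: str) -> str:
        # Register unseen nodes lazily as singleton sets.
        if node not in self.parent:
            self.parent[node] = node
            self.size[node] = 1
        root = node
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: re-point every node on the path at the root.
        while self.parent[node] != root:
            nxt = self.parent[node]
            self.parent[node] = root
            node = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def v1_alias_intersection(observations: dict[str, set[str]]) -> set[str]:
    """V1 behaviour: aliases common to every observation."""
    if not observations:
        return set()
    return set.intersection(*observations.values())


def alias_lcc_ratio(observations: dict[str, set[str]]) -> float:
    """Share of observations in the largest connected component of the
    bipartite observation <-> alias graph (assumed definition)."""
    if not observations:
        return 0.0
    uf = UnionFind()
    for obs_id, aliases in observations.items():
        obs_node = f"obs:{obs_id}"
        uf.find(obs_node)  # ensures alias-less observations still count
        for alias in aliases:
            uf.union(obs_node, f"alias:{alias}")
    # Count observation nodes per component; alias nodes only act as bridges.
    sizes = Counter(uf.find(f"obs:{obs_id}") for obs_id in observations)
    return max(sizes.values()) / len(observations)


observations = {"A": {"CVE-X"}, "B": {"CVE-X", "GHSA-Y"}, "C": {"GHSA-Y"}}
print(v1_alias_intersection(observations))  # set()
print(alias_lcc_ratio(observations))        # 1.0
```

Adding an unrelated observation D with only CVE-Z lowers the ratio to 0.75 (3 of 4 observations connected) instead of zeroing it, which is the graceful degradation the intersection approach lacked.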

2. Pairwise Package Coverage

Replace intersection-across-all with pairwise coverage scoring. The score is positive when any pair of observations shares a package key, even if some sources have no packages.

Rationale: "Thin" sources (e.g., vendor advisories with only CVE IDs) should not penalize correlation when other sources provide package evidence.
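A minimal sketch of pairwise coverage, assuming the score is the fraction of observation pairs that share at least one package key, with package-less sources left out of the pair count. The exact V2 formula is not given in this ADR, so both functions below are illustrative, and the names are hypothetical.

```python
from itertools import combinations


def v1_package_intersection(packages: dict[str, set[str]]) -> set[str]:
    """V1 behaviour: package keys common to every observation."""
    if not packages:
        return set()
    return set.intersection(*packages.values())


def pairwise_package_coverage(packages: dict[str, set[str]]) -> float:
    """Fraction of observation pairs sharing at least one package key.

    Observations without packages ("thin" sources) are left out of the pair
    count, so they neither help nor hurt (assumed formula).
    """
    # Sorting keeps pair enumeration order deterministic.
    with_packages = sorted(obs for obs, keys in packages.items() if keys)
    pairs = list(combinations(with_packages, 2))
    if not pairs:
        return 0.0
    sharing = sum(1 for a, b in pairs if packages[a] & packages[b])
    return sharing / len(pairs)


packages = {
    "nvd": {"pkg:npm/lodash"},
    "ghsa": {"pkg:npm/lodash"},
    "vendor": set(),  # thin source: CVE ID only
}
print(v1_package_intersection(packages))    # set()
print(pairwise_package_coverage(packages))  # 1.0
```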

3. Neutral Reference Scoring

Zero reference overlap returns a neutral score of 0.5 instead of emitting a conflict. Conflicts are reserved for true contradictions.

Rationale: Disjoint reference sets indicate lack of supporting evidence, not contradiction.
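A sketch of neutral reference scoring for one pair of observations. Returning 0.5 on zero overlap follows the decision above; mapping positive overlap onto (0.5, 1.0] through Jaccard similarity is an assumption made only so that shared references can lift the score above neutral. The function name is hypothetical.

```python
def reference_score(refs_a: set[str], refs_b: set[str]) -> float:
    """Pairwise reference score that never signals a conflict.

    Zero overlap (including empty reference sets) is neutral: 0.5.
    Positive overlap is scaled by Jaccard similarity into (0.5, 1.0]
    (illustrative assumption).
    """
    overlap = refs_a & refs_b
    if not overlap:
        return 0.5
    jaccard = len(overlap) / len(refs_a | refs_b)
    return 0.5 + 0.5 * jaccard
```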

4. Typed Conflict Severities

Replace the uniform -0.1 penalty with severity-based penalties:

Conflict Type	Severity	Penalty
Distinct CVEs in cluster	Hard	-0.4
Disjoint version ranges	Hard	-0.3
Overlapping divergent ranges	Soft	-0.05
CVSS/severity mismatch	Soft	-0.05
Alias inconsistency	Soft	-0.1
Zero reference overlap	None	0

Rationale: Hard conflicts (distinct identities) should heavily penalize confidence. Soft conflicts (metadata differences) may indicate data quality issues but not identity mismatch.
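The penalty table can be encoded as data so that severities and penalties live in one place. This sketch mirrors the ConflictSeverity enum named under Implementation, but its members and the string keys for conflict types are hypothetical; summing penalties and clamping the result to [0, 1] are also assumptions, since the ADR does not say how multiple conflicts combine.

```python
from enum import Enum


class ConflictSeverity(Enum):
    # Members assumed from the Severity column of the table.
    NONE = "none"
    SOFT = "soft"
    HARD = "hard"


# (severity, penalty) per conflict type, copied from the table above.
# The string keys are hypothetical identifiers.
CONFLICT_PENALTIES: dict[str, tuple[ConflictSeverity, float]] = {
    "distinct-cves": (ConflictSeverity.HARD, -0.4),
    "disjoint-version-ranges": (ConflictSeverity.HARD, -0.3),
    "overlapping-divergent-ranges": (ConflictSeverity.SOFT, -0.05),
    "severity-mismatch": (ConflictSeverity.SOFT, -0.05),
    "alias-inconsistency": (ConflictSeverity.SOFT, -0.1),
    "zero-reference-overlap": (ConflictSeverity.NONE, 0.0),
}


def apply_conflict_penalties(base_confidence: float, conflicts: list[str]) -> float:
    """Add each conflict's (negative) penalty to the base confidence.

    Summing penalties and clamping to [0, 1] are illustrative assumptions.
    """
    adjusted = base_confidence + sum(CONFLICT_PENALTIES[c][1] for c in conflicts)
    return min(1.0, max(0.0, adjusted))


def has_hard_conflict(conflicts: list[str]) -> bool:
    """True if any conflict indicates an identity mismatch."""
    return any(CONFLICT_PENALTIES[c][0] is ConflictSeverity.HARD for c in conflicts)
```

Keeping severity alongside the penalty lets downstream policy branch on has_hard_conflict without re-deriving severity from the numeric penalty.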

5. Additional Correlation Signals

Add highly discriminative signals:

Patch lineage (0.10 weight): A shared commit SHA indicates the same fix
Version compatibility (0.10 weight): Classifies how the sources' version ranges relate to each other
IDF weighting: Matches on rare packages are weighted more heavily than matches on common packages
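A sketch of two of these signals. Only the 0.10 patch-lineage weight comes from this ADR. The IDF form used here, ln((1 + N) / (1 + df)) + 1, is a common smoothed variant and an assumption, as are treating unseen packages as maximally rare and comparing commit SHAs as exact lowercase strings. All names are hypothetical.

```python
import math

PATCH_LINEAGE_WEIGHT = 0.10  # weight stated in the ADR


def package_idf(doc_freq: dict[str, int], total_linksets: int) -> dict[str, float]:
    """Smoothed IDF per package key: rarer packages get larger weights."""
    return {
        pkg: math.log((1 + total_linksets) / (1 + df)) + 1.0
        for pkg, df in doc_freq.items()
    }


def idf_weighted_overlap(a: set[str], b: set[str], idf: dict[str, float]) -> float:
    """Weighted Jaccard: IDF mass of shared keys over IDF mass of all keys."""
    union = a | b
    if not union:
        return 0.0
    unseen_weight = max(idf.values(), default=1.0)  # assumption: unseen = rarest

    def weight(key: str) -> float:
        return idf.get(key, unseen_weight)

    return sum(weight(k) for k in a & b) / sum(weight(k) for k in union)


def patch_lineage_contribution(shas_a: set[str], shas_b: set[str]) -> float:
    """Full weight if both observations cite a common fix commit, else 0."""
    shared = {s.lower() for s in shas_a} & {s.lower() for s in shas_b}
    return PATCH_LINEAGE_WEIGHT if shared else 0.0


idf = package_idf({"pkg:npm/lodash": 900, "pkg:npm/tiny-lib": 2}, total_linksets=1000)
both = {"pkg:npm/lodash", "pkg:npm/tiny-lib"}
common_match = idf_weighted_overlap(both, {"pkg:npm/lodash"}, idf)
rare_match = idf_weighted_overlap(both, {"pkg:npm/tiny-lib"}, idf)
```

With these illustrative counts, a match on the rare package yields a much larger overlap than a match on the ubiquitous one, even though each pair shares exactly one key.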

6. V1/V2 Switchable Interface

Provide ILinksetCorrelationService with configurable version selection to enable gradual rollout and A/B testing.
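A Python analogue of the switchable interface, as a sketch only: the Protocol stands in for ILinksetCorrelationService, and the correlate method, the version setting, and both correlators are hypothetical. The V2 stand-in is a simplified reachability check used to exercise the switch, not the production LCC scorer.

```python
from dataclasses import dataclass
from typing import Mapping, Protocol


class LinksetCorrelationService(Protocol):
    """Hypothetical Python analogue of ILinksetCorrelationService."""

    def correlate(self, aliases: Mapping[str, set[str]]) -> float: ...


@dataclass(frozen=True)
class CorrelationOptions:
    version: str = "v1"  # hypothetical setting; V1 stays the default until rollout


class IntersectionCorrelator:
    """V1-style stand-in: 1.0 only if every observation shares an alias."""

    def correlate(self, aliases: Mapping[str, set[str]]) -> float:
        if not aliases:
            return 0.0
        return 1.0 if set.intersection(*aliases.values()) else 0.0


class ConnectivityCorrelator:
    """Simplified V2-style stand-in: share of observations reachable from the
    first one through shared aliases (not the production LCC scorer)."""

    def correlate(self, aliases: Mapping[str, set[str]]) -> float:
        ids = sorted(aliases)
        if not ids:
            return 0.0
        seen, stack = {ids[0]}, [ids[0]]
        while stack:
            current = stack.pop()
            for other in ids:
                if other not in seen and aliases[current] & aliases[other]:
                    seen.add(other)
                    stack.append(other)
        return len(seen) / len(ids)


def create_correlation_service(
    options: CorrelationOptions,
    registry: Mapping[str, LinksetCorrelationService],
) -> LinksetCorrelationService:
    """Resolve the configured version, failing loudly on unknown values."""
    try:
        return registry[options.version.lower()]
    except KeyError:
        raise ValueError(f"unsupported correlation version: {options.version!r}") from None


REGISTRY = {"v1": IntersectionCorrelator(), "v2": ConnectivityCorrelator()}
```

Failing on an unknown version, rather than silently falling back, keeps a misconfigured rollout from quietly scoring with the wrong algorithm.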

Consequences

Positive

Eliminates false negatives from transitive alias chains
Eliminates false negatives from thin sources
Reduces false positive conflicts from disjoint references
Enables fine-grained conflict severity handling by downstream policy
Adds discriminative signals (patch lineage) that differentiate Concelier from commodity linkers

Negative

Changes correlation weights, affecting existing linkset confidence scores
Requires recomputation of existing linksets during migration
Adds Valkey dependency for IDF caching (mitigated by graceful fallback)

Neutral

Algorithm complexity increases but remains O(n²) in the number of observations
Determinism preserved through fixed scorer order and tie-breakers
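A sketch of what fixed scorer order and tie-breakers can look like in practice. The scorer names, their order, and the Candidate type are hypothetical; the point is that summation always follows one fixed sequence and that equal confidences are ordered by a stable key.

```python
from dataclasses import dataclass

# Hypothetical fixed evaluation order. Iterating a fixed tuple, rather than the
# order in which scores happened to be produced, keeps floating-point
# summation order identical across runs.
SCORER_ORDER = ("alias", "package", "version", "reference", "patch_lineage")


@dataclass(frozen=True)
class Candidate:
    linkset_id: str
    confidence: float


def combine_scores(weighted_scores: dict[str, float]) -> float:
    """Sum weighted scorer outputs in SCORER_ORDER, skipping absent scorers."""
    return sum(weighted_scores.get(name, 0.0) for name in SCORER_ORDER)


def rank_candidates(candidates: list[Candidate]) -> list[Candidate]:
    """Descending confidence; ties broken by linkset id so the output order
    does not depend on input order."""
    return sorted(candidates, key=lambda c: (-c.confidence, c.linkset_id))
```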

Implementation

Core algorithm: LinksetCorrelationV2.cs
Service interface: ILinksetCorrelationService.cs
Service implementation: LinksetCorrelationService.cs
Model extension: ConflictSeverity enum in AdvisoryLinkset.cs
IDF caching: ValkeyPackageIdfService.cs
Tests: 27 V2 tests + 18 IDF tests

References

Sprint: docs/implplan/SPRINT_20260125_001_Concelier_linkset_correlation_v2.md
Algorithm documentation: docs/modules/concelier/linkset-correlation-v2.md
Architecture section: docs/modules/concelier/architecture.md § 5.2
Conflict resolution runbook: docs/modules/concelier/operations/conflict-resolution.md § 5.1
