# CONCELIER-LNM-26-001 · Linkset Correlation Rules (v2)

> Supersedes `linkset-correlation-21-002.md` for new linkset builds.
> V1 linksets remain valid; the migration job will recompute confidence using the v2 algorithm.

Purpose: Address critical failure modes in v1 correlation (intersection transitivity, false conflict emission) and introduce more discriminative signals (patch lineage, version compatibility, IDF-weighted package matching).

---

## Scope

- Applies to linksets produced from `advisory_observations` (LNM v2).
- Correlation is aggregation-only: no value synthesis or merge; emit conflicts instead of collapsing fields.
- Output persists in `advisory_linksets` and drives `advisory.linkset.updated@1` events.
- Maintains determinism, offline posture, and LNM/AOC contracts.

---

## Key Changes from v1

| Aspect | v1 Behavior | v2 Behavior |
|--------|-------------|-------------|
| Alias matching | Intersection across all inputs | Graph connectivity (LCC ratio) |
| PURL matching | Intersection across all inputs | Pairwise coverage + IDF weighting |
| Reference clash | Emitted on zero overlap | Only on true URL contradictions |
| Conflict penalty | Single -0.1 for any conflict | Typed severities with per-reason penalties |
| Patch lineage | Not used | New signal (weight 0.10; 1.0 on any shared commit SHA) |
| Version ranges | Divergence noted only | Classified (Equivalent/Overlapping/Disjoint) |

---

## Deterministic Confidence Calculation (0-1)

### Signal Weights

```
confidence = clamp(
    0.30 * alias_connectivity +
    0.10 * alias_authority +
    0.20 * package_coverage +
    0.10 * version_compatibility +
    0.10 * cpe_match +
    0.10 * patch_lineage +
    0.05 * reference_overlap +
    0.05 * freshness_score
) - typed_penalty
```

The weights sum to 1.00. After `typed_penalty` is subtracted, the result is floored at 0.1 (see Penalty Calculation).

### Signal Definitions

#### `alias_connectivity` (weight: 0.30)

**Graph-based scoring** replacing intersection-across-all.

1. Build bipartite graph: observation nodes ↔ alias nodes
2. Connect observations that share any alias (transitive bridging)
3. Compute LCC (largest connected component) ratio: `|LCC| / N`, where `|LCC|` counts the observations in the largest component and `N` is the total number of observations

| Scenario | Score |
|----------|-------|
| All observations in single connected component | 1.0 |
| 80% of observations connected | 0.8 |
| No alias overlap at all | 0.0 |

**Why this matters**: Sources A (CVE-X), B (CVE-X + GHSA-Y), C (GHSA-Y) now correctly correlate via transitive bridging, whereas v1 scored them 0.

#### `alias_authority` (weight: 0.10)

Scope-based weighting using existing canonical key prefixes:

| Alias Type | Authority Score |
|------------|-----------------|
| CVE-* (global) | 1.0 |
| GHSA-* (ecosystem) | 0.8 |
| Vendor IDs (RHSA, MSRC, CISCO, VMSA) | 0.6 |
| Distribution IDs (DSA, USN, SUSE) | 0.4 |
| Unknown scheme | 0.2 |

#### `package_coverage` (weight: 0.20)

**Pairwise + IDF weighting** replacing intersection-across-all.

1. Extract package keys (PURL without version) from each observation
2. For each package key, compute IDF weight: `log(N / (1 + df))`, where `N` is the corpus size and `df` is the number of observations containing the package
3. Score = weighted overlap ratio across pairs

| Scenario | Score |
|----------|-------|
| All sources share same rare package | ~1.0 |
| All sources share common package (lodash) | ~0.6 |
| One "thin" source with no packages | Other sources still score > 0 |
| No package overlap | 0.0 |

**IDF fallback**: When the IDF cache is unavailable, uniform weights (1.0) are used.

#### `version_compatibility` (weight: 0.10)

Classifies version relationships per shared package:

| Relation | Score | Conflict |
|----------|-------|----------|
| **Equivalent**: ranges normalize identically | 1.0 | None |
| **Overlapping**: non-empty intersection | 0.6 | Soft (`affected-range-divergence`) |
| **Disjoint**: no intersection | 0.0 | Hard (`disjoint-version-ranges`) |
| **Unknown**: parse failure | 0.5 | None |

Uses `SemanticVersionRangeResolver` for semver; delegates to ecosystem-specific comparers for rpm EVR, dpkg, apk.

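
The classification table above can be sketched in a few lines of Python. This is a minimal illustration, not the behavior of `SemanticVersionRangeResolver`: it assumes ranges are written as comma-separated `>=X` and `<Y` clauses (as in `>=4.0.0,<4.17.21`), it handles plain dotted numeric versions only, and the names `parse_range`, `classify`, and `RELATION_SCORES` are hypothetical.

```python
from enum import Enum
from typing import Optional, Tuple

Version = Tuple[int, ...]
# Half-open interval [low, high); high=None means "no upper bound".
Range = Tuple[Version, Optional[Version]]


class Relation(Enum):
    EQUIVALENT = "equivalent"
    OVERLAPPING = "overlapping"
    DISJOINT = "disjoint"
    UNKNOWN = "unknown"


# Scores taken from the relation table above.
RELATION_SCORES = {
    Relation.EQUIVALENT: 1.0,
    Relation.OVERLAPPING: 0.6,
    Relation.DISJOINT: 0.0,
    Relation.UNKNOWN: 0.5,
}


def parse_version(text: str) -> Optional[Version]:
    """Parse '4.17.21' into (4, 17, 21); pad short versions to three parts."""
    try:
        parts = [int(part) for part in text.strip().split(".")]
    except ValueError:
        return None
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def parse_range(text: str) -> Optional[Range]:
    """Parse '>=X,<Y' clauses; any other syntax is treated as a parse failure."""
    low: Version = (0, 0, 0)
    high: Optional[Version] = None
    for clause in text.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            bound = parse_version(clause[2:])
            if bound is None:
                return None
            low = bound
        elif clause.startswith("<") and not clause.startswith("<="):
            bound = parse_version(clause[1:])
            if bound is None:
                return None
            high = bound
        else:
            return None
    return (low, high)


def classify(range_a: str, range_b: str) -> Relation:
    """Classify two range strings for the same package key."""
    a, b = parse_range(range_a), parse_range(range_b)
    if a is None or b is None:
        return Relation.UNKNOWN
    if a == b:
        return Relation.EQUIVALENT
    low = max(a[0], b[0])
    upper_bounds = [bound for bound in (a[1], b[1]) if bound is not None]
    high = min(upper_bounds) if upper_bounds else None
    # The intersection [low, high) is non-empty when it is unbounded or low < high.
    if high is None or low < high:
        return Relation.OVERLAPPING
    return Relation.DISJOINT
```

With the two lodash ranges used in the output example later in this document, `classify(">=4.0.0,<4.17.21", ">=4.0.0,<4.18.0")` yields `Relation.OVERLAPPING`, which maps to a score of 0.6 and a soft `affected-range-divergence` conflict.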
#### `cpe_match` (weight: 0.10)

Unchanged from v1:

- Exact CPE overlap: 1.0
- Same vendor/product: 0.5
- No match: 0.0

#### `patch_lineage` (weight: 0.10)

**New signal**: correlation via shared fix commits.

1. Extract patch references from observation references (type: `patch`, `fix`, `commit`)
2. Normalize to commit SHAs using `PatchLineageNormalizer`
3. Any pairwise SHA match: 1.0; otherwise 0.0

**Why this matters**: "These advisories fix the same code" is high-confidence evidence most platforms lack.

#### `reference_overlap` (weight: 0.05)

**Positive-only** (no conflict on zero overlap):

1. Normalize URLs (lowercase, strip tracking parameters, normalize the scheme to `https://`)
2. Compute max pairwise overlap ratio
3. Map to score: `0.5 + (overlap * 0.5)`

| Scenario | Score |
|----------|-------|
| 100% URL overlap | 1.0 |
| 50% URL overlap | 0.75 |
| Zero URL overlap | 0.5 (neutral) |

**No `reference-clash` emission** for simple disjoint sets.

#### `freshness_score` (weight: 0.05)

Unchanged from v1:

- Spread ≤ 48h: 1.0
- Spread ≥ 14d: 0.0
- Linear decay in between

---

## Conflict Emission (Typed Severities)

### Severity Levels

| Severity | Penalty Range | Meaning |
|----------|---------------|---------|
| **Hard** | 0.30 - 0.40 | Significant disagreement; likely prevents high-confidence linking |
| **Soft** | 0.05 - 0.10 | Minor disagreement; link with reduced confidence |
| **Info** | 0.00 | Informational; no penalty |

### Conflict Types and Penalties

| Conflict Reason | Severity | Penalty | Trigger Condition |
|-----------------|----------|---------|-------------------|
| `distinct-cves` | Hard | -0.40 | Two different CVE-* identifiers in cluster |
| `disjoint-version-ranges` | Hard | -0.30 | Same package key, ranges have no intersection |
| `alias-inconsistency` | Soft | -0.10 | Disconnected alias graph (but no CVE conflict) |
| `affected-range-divergence` | Soft | -0.05 | Ranges overlap but differ |
| `severity-mismatch` | Soft | -0.05 | CVSS base score delta > 1.0 |
| `reference-clash` | Info | 0.00 | Reserved for true contradictions only |
| `metadata-gap` | Info | 0.00 | Required provenance missing |

### Penalty Calculation

```
typed_penalty = min(0.6, sum(penalty_per_conflict))
```

Here `penalty_per_conflict` is the magnitude of the penalty listed in the table above. The total saturates at 0.6 to prevent complete collapse, and the final confidence is floored at 0.1 when any evidence exists.

### Conflict Record Shape

```json
{
  "field": "aliases",
  "reason": "distinct-cves",
  "severity": "Hard",
  "values": ["nvd:CVE-2025-1234", "ghsa:CVE-2025-5678"],
  "sourceIds": ["nvd", "ghsa"]
}
```

---

## Linkset Output Shape

Additions relative to v1:

```json
{
  "key": {
    "vulnerabilityId": "CVE-2025-1234",
    "productKey": "pkg:npm/lodash",
    "confidence": 0.85
  },
  "conflicts": [
    {
      "field": "affected.versions[pkg:npm/lodash]",
      "reason": "affected-range-divergence",
      "severity": "Soft",
      "values": ["nvd:>=4.0.0,<4.17.21", "ghsa:>=4.0.0,<4.18.0"],
      "sourceIds": ["nvd", "ghsa"]
    }
  ],
  "signalScores": {
    "aliasConnectivity": 1.0,
    "aliasAuthority": 1.0,
    "packageCoverage": 0.85,
    "versionCompatibility": 0.6,
    "cpeMatch": 0.5,
    "patchLineage": 1.0,
    "referenceOverlap": 0.75,
    "freshness": 1.0
  },
  "provenance": {
    "observationHashes": ["sha256:abc...", "sha256:def..."],
    "toolVersion": "concelier/2.0.0",
    "correlationVersion": "v2"
  }
}
```

---

## Algorithm Pseudocode

```
function Compute(observations):
    if observations.empty:
        return (confidence=1.0, conflicts=[])

    conflicts = []

    # 1. Alias connectivity (graph-based)
    aliasGraph = buildBipartiteGraph(observations)
    # LCC(...) returns the number of observations in the largest connected component
    aliasConnectivity = LCC(aliasGraph) / observations.count

    if hasDistinctCVEs(aliasGraph):
        conflicts.add(HardConflict("distinct-cves"))
    elif aliasConnectivity < 1.0:
        conflicts.add(SoftConflict("alias-inconsistency"))

    # 2. Alias authority
    aliasAuthority = maxAuthorityScore(observations)

    # 3. Package coverage (pairwise + IDF)
    packageCoverage = computeIDFWeightedCoverage(observations)

    # 4. Version compatibility
    for sharedPackage in findSharedPackages(observations):
        relation = classifyVersionRelation(observations, sharedPackage)
        if relation == Disjoint:
            conflicts.add(HardConflict("disjoint-version-ranges"))
        elif relation == Overlapping:
            conflicts.add(SoftConflict("affected-range-divergence"))
    versionScore = averageRelationScore(observations)

    # 5. CPE match
    cpeScore = computeCpeOverlap(observations)

    # 6. Patch lineage
    patchScore = 1.0 if anyPairSharesCommitSHA(observations) else 0.0

    # 7. Reference overlap (positive-only)
    referenceScore = 0.5 + (maxPairwiseURLOverlap(observations) * 0.5)

    # 8. Freshness
    freshnessScore = computeFreshness(observations)

    # Calculate weighted sum
    baseConfidence = (
        0.30 * aliasConnectivity +
        0.10 * aliasAuthority +
        0.20 * packageCoverage +
        0.10 * versionScore +
        0.10 * cpeScore +
        0.10 * patchScore +
        0.05 * referenceScore +
        0.05 * freshnessScore
    )

    # Apply typed penalties
    penalty = min(0.6, sum(conflict.penalty for conflict in conflicts))
    finalConfidence = max(0.1, baseConfidence - penalty)

    return (confidence=finalConfidence, conflicts=dedupe(conflicts))
```

---

## Implementation

### Code Locations

| Component | Path |
|-----------|------|
| V2 Algorithm | `src/Concelier/__Libraries/StellaOps.Concelier.Core/Linksets/LinksetCorrelationV2.cs` |
| Conflict Model | `src/Concelier/__Libraries/StellaOps.Concelier.Core/Linksets/AdvisoryLinkset.cs` |
| Patch Normalizer | `src/Concelier/__Libraries/StellaOps.Concelier.Merge/Identity/Normalizers/PatchLineageNormalizer.cs` |
| Version Resolver | `src/Concelier/__Libraries/StellaOps.Concelier.Merge/Comparers/SemanticVersionRangeResolver.cs` |

### Configuration

```yaml
concelier:
  correlation:
    version: v2  # v1 | v2
    weights:
      aliasConnectivity: 0.30
      aliasAuthority: 0.10
      packageCoverage: 0.20
      versionCompatibility: 0.10
      cpeMatch: 0.10
      patchLineage: 0.10
      referenceOverlap: 0.05
      freshness: 0.05
    idf:
      enabled: true
      cacheKey: "concelier:package:idf"
      refreshIntervalMinutes: 60
    textSimilarity:
      enabled: false  # Phase 3
```

---

## Telemetry

| Instrument | Type | Tags | Purpose |
|------------|------|------|---------|
| `concelier.linkset.confidence` | Histogram | `version` | Confidence score distribution |
| `concelier.linkset.conflicts_total` | Counter | `reason`, `severity` | Conflict counts by type |
| `concelier.linkset.signal_score` | Histogram | `signal` | Per-signal score distribution |
| `concelier.linkset.patch_lineage_hits` | Counter | - | Patch SHA matches found |
| `concelier.linkset.idf_cache_hit` | Counter | `hit` | IDF cache effectiveness |

---

## Migration

### Recompute Job

```bash
stella db linksets recompute --correlation-version v2 --batch-size 1000
```

Recomputes confidence for existing linksets using the v2 algorithm. Does not modify observation data.

### Rollback

Set `concelier:correlation:version` to `v1` to revert to intersection-based scoring.

---

## Fixtures

- `docs/modules/concelier/samples/linkset-v2-transitive-bridge.json`: Three-source transitive bridging (A↔B↔C) demonstrating graph connectivity.
- `docs/modules/concelier/samples/linkset-v2-patch-match.json`: Two-source correlation via shared commit SHA.
- `docs/modules/concelier/samples/linkset-v2-hard-conflict.json`: Distinct CVEs in cluster triggering hard penalty.

All fixtures use ASCII ordering and ISO-8601 UTC timestamps.

---

## Change Control

- V2 is add-only relative to the v1 output schema.
- Signal weight adjustments require a sprint note but not a schema version bump.
- New conflict reasons require the `advisory.linkset.updated@2` event schema and a doc update.
- Removal of a signal requires a deprecation period and migration guidance.
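
---

## Worked Example (Illustrative)

The weighted sum, penalty cap, and 0.1 floor described in this document can be sketched in Python as below, together with the LCC ratio for the three-source bridging case (A, B, C). This is a rough sketch under stated assumptions, not the code in `LinksetCorrelationV2.cs`: the function names, the dictionary layout, and every input score other than the computed alias connectivity are made up for illustration. Penalties are stored as positive magnitudes, and the special case returning 0.0 when no observations share an alias is an assumption made to match the `alias_connectivity` scenario table.

```python
from collections import Counter
from typing import Dict, Iterable, Set

# Weights and penalty magnitudes as listed in this document.
WEIGHTS = {
    "alias_connectivity": 0.30, "alias_authority": 0.10,
    "package_coverage": 0.20, "version_compatibility": 0.10,
    "cpe_match": 0.10, "patch_lineage": 0.10,
    "reference_overlap": 0.05, "freshness_score": 0.05,
}
PENALTIES = {
    "distinct-cves": 0.40, "disjoint-version-ranges": 0.30,
    "alias-inconsistency": 0.10, "affected-range-divergence": 0.05,
    "severity-mismatch": 0.05, "reference-clash": 0.00, "metadata-gap": 0.00,
}


def alias_connectivity(aliases_by_source: Dict[str, Set[str]]) -> float:
    """Share of observations in the largest alias-connected component."""
    sources = list(aliases_by_source)
    if not sources:
        return 0.0
    parent = {source: source for source in sources}

    def find(node: str) -> str:
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    first_holder: Dict[str, str] = {}
    for source in sources:
        for alias in aliases_by_source[source]:
            if alias in first_holder:
                # Same alias seen before: merge the two observations' components.
                parent[find(source)] = find(first_holder[alias])
            else:
                first_holder[alias] = source

    largest = max(Counter(find(source) for source in sources).values())
    if largest == 1 and len(sources) > 1:
        return 0.0  # assumption: "no alias overlap at all" scores 0.0
    return largest / len(sources)


def confidence(scores: Dict[str, float], conflict_reasons: Iterable[str]) -> float:
    """Weighted sum clamped to [0, 1], minus capped penalties, floored at 0.1."""
    base = sum(weight * scores[name] for name, weight in WEIGHTS.items())
    base = min(1.0, max(0.0, base))
    penalty = min(0.6, sum(PENALTIES[reason] for reason in conflict_reasons))
    return max(0.1, base - penalty)


# Hypothetical inputs: A and C never share an alias directly, B bridges them.
bridge = {"A": {"CVE-X"}, "B": {"CVE-X", "GHSA-Y"}, "C": {"GHSA-Y"}}
scores = {
    "alias_connectivity": alias_connectivity(bridge),
    "alias_authority": 1.0, "package_coverage": 0.6,
    "version_compatibility": 0.6, "cpe_match": 0.0, "patch_lineage": 0.0,
    "reference_overlap": 0.5, "freshness_score": 1.0,
}
result = confidence(scores, ["affected-range-divergence"])
```

With these made-up inputs the bridged alias graph scores 1.0, the weighted sum is about 0.655, and the single soft conflict brings the result to about 0.605. Replacing the conflict list with `["distinct-cves", "disjoint-version-ranges"]` sums to 0.70, which is capped at 0.6, so the result lands on the 0.1 floor.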