CONCELIER-LNM-26-001 · Linkset Correlation Rules (v2)
Supersedes: `linkset-correlation-21-002.md` for new linkset builds. V1 linksets remain valid; a migration job will recompute their confidence using the v2 algorithm.
Purpose: Address critical failure modes in v1 correlation (intersection transitivity, false conflict emission) and introduce more discriminative signals (patch lineage, version compatibility, IDF-weighted package matching).
Scope
- Applies to linksets produced from `advisory_observations` (LNM v2).
- Correlation is aggregation-only: no value synthesis or merge; emit conflicts instead of collapsing fields.
- Output persists in `advisory_linksets` and drives `advisory.linkset.updated@1` events.
- Maintains determinism, offline posture, and LNM/AOC contracts.
Key Changes from v1
| Aspect | v1 Behavior | v2 Behavior |
|---|---|---|
| Alias matching | Intersection across all inputs | Graph connectivity (LCC ratio) |
| PURL matching | Intersection across all inputs | Pairwise coverage + IDF weighting |
| Reference clash | Emitted on zero overlap | Only on true URL contradictions |
| Conflict penalty | Single -0.1 for any conflict | Typed severities with per-reason penalties |
| Patch lineage | Not used | Top-tier signal (+0.35 for exact SHA) |
| Version ranges | Divergence noted only | Classified (Equivalent/Overlapping/Disjoint) |
Deterministic Confidence Calculation (0-1)
Signal Weights
```
confidence = clamp(
    0.30 * alias_connectivity +
    0.10 * alias_authority +
    0.20 * package_coverage +
    0.10 * version_compatibility +
    0.10 * cpe_match +
    0.10 * patch_lineage +
    0.05 * reference_overlap +
    0.05 * freshness_score
) - typed_penalty
```
Signal Definitions
alias_connectivity (weight: 0.30)
Graph-based scoring replacing intersection-across-all.
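- Build a bipartite graph: observation nodes ↔ alias nodes
- Connect observations that share any alias (transitive bridging)
- Compute the LCC (largest connected component) ratio: `|LCC| / N`, where `|LCC|` is the number of observations in the largest connected component and `N` is the total number of observations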
| Scenario | Score |
|---|---|
| All observations in single connected component | 1.0 |
| 80% of observations connected | 0.8 |
| No alias overlap at all | 0.0 |
Why this matters: Sources A (CVE-X), B (CVE-X + GHSA-Y), and C (GHSA-Y) now correctly correlate via transitive bridging, whereas v1 scored 0 because the intersection of all three alias sets is empty.
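A minimal sketch of the LCC ratio, assuming each observation is reduced to a set of already-canonical alias strings. It uses union-find instead of materializing the bipartite graph, which yields the same components. The function names `alias_connectivity` and `has_distinct_cves` are hypothetical. Because `|LCC| / N` alone would give `1/N` rather than the table's 0.0 when no aliases are shared, the sketch assumes that a largest component of size 1 (with more than one observation) scores 0.0.

```python
from collections import Counter


def alias_connectivity(observations: dict[str, set[str]]) -> float:
    """|LCC| / N over the observation/alias bipartite graph (union-find sketch)."""
    ids = sorted(observations)  # sorted for deterministic processing
    if not ids:
        return 0.0
    parent = {obs: obs for obs in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # smaller id becomes the root

    # An alias node links every observation carrying it, so joining each holder
    # to the alias's first holder reproduces the bipartite graph's components.
    first_holder: dict[str, str] = {}
    for obs in ids:
        for alias in sorted(observations[obs]):
            if alias in first_holder:
                union(first_holder[alias], obs)
            else:
                first_holder[alias] = obs

    largest = max(Counter(find(obs) for obs in ids).values())
    if largest == 1 and len(ids) > 1:
        return 0.0  # assumption: no shared alias at all scores 0.0, per the table
    return largest / len(ids)


def has_distinct_cves(observations: dict[str, set[str]]) -> bool:
    """True when the cluster carries two or more different CVE identifiers."""
    cves = {alias.upper() for aliases in observations.values()
            for alias in aliases if alias.upper().startswith("CVE-")}
    return len(cves) > 1


bridge = {"A": {"CVE-2025-0001"},
          "B": {"CVE-2025-0001", "GHSA-aaaa-bbbb-cccc"},
          "C": {"GHSA-aaaa-bbbb-cccc"}}
print(alias_connectivity(bridge))  # 1.0
```

The A/B/C bridge from the paragraph above scores 1.0 even though no single alias appears in all three observations.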
alias_authority (weight: 0.10)
Scope-based weighting using existing canonical key prefixes:
| Alias Type | Authority Score |
|---|---|
| CVE-* (global) | 1.0 |
| GHSA-* (ecosystem) | 0.8 |
| Vendor IDs (RHSA, MSRC, CISCO, VMSA) | 0.6 |
| Distribution IDs (DSA, USN, SUSE) | 0.4 |
| Unknown scheme | 0.2 |
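The pseudocode takes the maximum authority across the cluster (`maxAuthorityScore`). A minimal sketch follows, assuming aliases carry the prefixes from the table literally (case-insensitive). The prefix list and the 0.0 result for a cluster with no aliases are assumptions, not part of the spec.

```python
# Hypothetical prefix table mirroring the authority scores above.
AUTHORITY_PREFIXES: list[tuple[tuple[str, ...], float]] = [
    (("CVE-",), 1.0),                          # global
    (("GHSA-",), 0.8),                         # ecosystem
    (("RHSA", "MSRC", "CISCO", "VMSA"), 0.6),  # vendor IDs
    (("DSA", "USN", "SUSE"), 0.4),             # distribution IDs
]
UNKNOWN_SCHEME_SCORE = 0.2


def authority_score(alias: str) -> float:
    key = alias.strip().upper()
    for prefixes, score in AUTHORITY_PREFIXES:
        if key.startswith(prefixes):  # str.startswith accepts a tuple of prefixes
            return score
    return UNKNOWN_SCHEME_SCORE


def max_authority_score(observations: dict[str, set[str]]) -> float:
    scores = [authority_score(a) for aliases in observations.values() for a in aliases]
    return max(scores, default=0.0)  # assumption: no aliases at all scores 0.0
```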
package_coverage (weight: 0.20)
Pairwise + IDF weighting replacing intersection-across-all.
- Extract package keys (PURL without version) from each observation
- For each package key, compute an IDF weight: `log(N / (1 + df))`, where N = corpus size and df = number of observations containing the package
- Score = weighted overlap ratio across pairs
| Scenario | Score |
|---|---|
| All sources share same rare package | ~1.0 |
| All sources share common package (lodash) | ~0.6 |
| One "thin" source with no packages | Other sources still score > 0 |
| No package overlap | 0.0 |
IDF fallback: when the IDF cache is unavailable, uniform weights (1.0) are used.
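The spec says "weighted overlap ratio across pairs" without fixing the exact normalization, so the sketch below is one plausible reading: an IDF-weighted Jaccard ratio per pair, averaged over pairs in which both observations list packages (so a thin source does not zero the score). It does not try to reproduce the approximate ~0.6 for common packages in the table. Clamping negative IDF weights at 0 and treating a missing `df` as 0 are assumptions; the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Optional


def idf_weight(package_key: str, corpus_size: Optional[int],
               doc_freq: Optional[dict[str, int]]) -> float:
    """log(N / (1 + df)); uniform 1.0 when the IDF cache is unavailable."""
    if corpus_size is None or doc_freq is None:
        return 1.0
    df = doc_freq.get(package_key, 0)
    # Assumption: clamp at 0, since log(N / (1 + df)) goes negative once df >= N.
    return max(0.0, math.log(corpus_size / (1 + df)))


def package_coverage(observations: dict[str, set[str]],
                     corpus_size: Optional[int] = None,
                     doc_freq: Optional[dict[str, int]] = None) -> float:
    """Mean IDF-weighted Jaccard over pairs where both sides list packages."""
    def w(key: str) -> float:
        return idf_weight(key, corpus_size, doc_freq)

    ratios = []
    for a, b in combinations(sorted(observations), 2):
        pa, pb = observations[a], observations[b]
        if not pa or not pb:
            continue  # a "thin" source must not zero out everyone else
        # Sorted iteration fixes the float summation order, keeping results deterministic.
        union_w = sum(w(k) for k in sorted(pa | pb))
        if union_w > 0:
            ratios.append(sum(w(k) for k in sorted(pa & pb)) / union_w)
    return sum(ratios) / len(ratios) if ratios else 0.0
```

Sharing only a very common package (high `df`) contributes little weight, so two sources that agree on `lodash` but disagree on a rare package score low.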
version_compatibility (weight: 0.10)
Classifies version relationships per shared package:
| Relation | Score | Conflict |
|---|---|---|
| Equivalent: ranges normalize identically | 1.0 | None |
| Overlapping: non-empty intersection | 0.6 | Soft (affected-range-divergence) |
| Disjoint: no intersection | 0.0 | Hard (disjoint-version-ranges) |
| Unknown: parse failure | 0.5 | None |
Uses `SemanticVersionRangeResolver` for semver and delegates to ecosystem-specific comparers for rpm EVR, dpkg, and apk.
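To make the classification concrete, here is a deliberately small sketch. It is not `SemanticVersionRangeResolver`: it only understands purely numeric dotted versions combined as `>=A,<B` (half-open intervals), and reports anything else as Unknown. Two ranges are Disjoint when the larger lower bound is not below the smaller upper bound. All names here are hypothetical.

```python
import re

RELATION_SCORES = {"Equivalent": 1.0, "Overlapping": 0.6, "Disjoint": 0.0, "Unknown": 0.5}
_NUMERIC = re.compile(r"^\d+(?:\.\d+)*$")


def _version(text: str) -> tuple[int, ...]:
    parts = [int(p) for p in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()  # 4.0.0 and 4 normalize to the same key
    return tuple(parts)


def parse_range(text: str):
    """Parse '>=A,<B' into a half-open (lower, upper) pair; None if unsupported."""
    lower = upper = None
    parts = [p.strip() for p in text.split(",") if p.strip()]
    if not parts:
        return None
    for part in parts:
        if part.startswith(">=") and _NUMERIC.match(part[2:]):
            lower = _version(part[2:])
        elif part.startswith("<") and not part.startswith("<=") and _NUMERIC.match(part[1:]):
            upper = _version(part[1:])
        else:
            return None  # other operators are outside this sketch
    return lower, upper


def classify(range_a: str, range_b: str) -> str:
    a, b = parse_range(range_a), parse_range(range_b)
    if a is None or b is None:
        return "Unknown"
    if a == b:
        return "Equivalent"
    lows = [x for x in (a[0], b[0]) if x is not None]
    highs = [x for x in (a[1], b[1]) if x is not None]
    if lows and highs and max(lows) >= min(highs):
        return "Disjoint"  # [max lower, min upper) is empty
    return "Overlapping"
```

With the ranges from the output example, `classify(">=4.0.0,<4.17.21", ">=4.0.0,<4.18.0")` returns `"Overlapping"`, which maps to a score of 0.6 and a soft `affected-range-divergence` conflict.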
cpe_match (weight: 0.10)
Unchanged from v1:
- Exact CPE overlap: 1.0
- Same vendor/product: 0.5
- No match: 0.0
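A minimal sketch of the three CPE tiers, assuming CPE 2.3 formatted strings (`cpe:2.3:part:vendor:product:version:...`) and a best-pair reading across observations. Lowercasing before comparison and ignoring escaped colons (`\:`) are simplifications; the function names are hypothetical.

```python
from itertools import combinations
from typing import Optional


def vendor_product(cpe: str) -> Optional[tuple[str, str]]:
    """(vendor, product) from a CPE 2.3 formatted string, else None."""
    # cpe:2.3:part:vendor:product:version:...; escaped colons are not handled here.
    fields = cpe.lower().split(":")
    if len(fields) < 5 or fields[:2] != ["cpe", "2.3"]:
        return None
    return fields[3], fields[4]


def cpe_match(observations: dict[str, set[str]]) -> float:
    """Best pairwise CPE score: 1.0 exact, 0.5 same vendor/product, else 0.0."""
    best = 0.0
    for a, b in combinations(sorted(observations), 2):
        ca = {c.lower() for c in observations[a]}
        cb = {c.lower() for c in observations[b]}
        if ca & cb:
            return 1.0  # exact overlap is the top tier, so stop early
        vp_a = {vp for vp in map(vendor_product, ca) if vp}
        vp_b = {vp for vp in map(vendor_product, cb) if vp}
        if vp_a & vp_b:
            best = 0.5
    return best
```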
patch_lineage (weight: 0.10)
New signal: correlation via shared fix commits.
- Extract patch references from observation references (type: `patch`, `fix`, `commit`)
- Normalize to commit SHAs using `PatchLineageNormalizer`
- Any pairwise SHA match: 1.0; otherwise 0.0
Why this matters: "These advisories fix the same code" is high-confidence evidence most platforms lack.
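The sketch below stands in for `PatchLineageNormalizer`, whose exact rules are not described here. It assumes references arrive as `(type, url)` pairs and that a commit is identified by a full 40-hex SHA-1 somewhere in the URL. Abbreviated SHAs and non-URL patch references are out of scope, and the helper names are hypothetical.

```python
import re
from itertools import combinations

PATCH_TYPES = {"patch", "fix", "commit"}
# Hypothetical stand-in for PatchLineageNormalizer: full 40-hex SHA-1 ids only.
_SHA1 = re.compile(r"(?<![0-9a-f])[0-9a-f]{40}(?![0-9a-f])")


def commit_shas(references: list[tuple[str, str]]) -> set[str]:
    """Collect commit SHAs from (type, url) references of a patch-like type."""
    shas: set[str] = set()
    for ref_type, url in references:
        if ref_type.lower() in PATCH_TYPES:
            shas.update(_SHA1.findall(url.lower()))
    return shas


def patch_lineage(observations: dict[str, list[tuple[str, str]]]) -> float:
    """1.0 if any two observations share a commit SHA, else 0.0."""
    sha_sets = [commit_shas(refs) for _, refs in sorted(observations.items())]
    return 1.0 if any(a & b for a, b in combinations(sha_sets, 2)) else 0.0
```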
reference_overlap (weight: 0.05)
Positive-only (no conflict on zero overlap):
- Normalize URLs (lowercase, strip tracking params, normalize the scheme to `https://`)
- Compute the max pairwise overlap ratio
- Map to score: `0.5 + (overlap * 0.5)`
| Scenario | Score |
|---|---|
| 100% URL overlap | 1.0 |
| 50% URL overlap | 0.75 |
| Zero URL overlap | 0.5 (neutral) |
Disjoint reference sets on their own never emit a `reference-clash` conflict.
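A minimal sketch of the positive-only mapping. The overlap ratio is taken as the Jaccard index of normalized URL sets, which is an assumption since the spec only says "overlap ratio". The tracking-parameter list and the sorting of remaining query parameters are also assumptions; the parsing uses Python's standard `urllib.parse`.

```python
from itertools import combinations
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: which query keys count as "tracking params" is not specified.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"fbclid", "gclid"}


def normalize_url(url: str) -> str:
    parts = urlsplit(url.strip().lower())
    query = sorted((k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
                   if not k.startswith(TRACKING_PREFIXES) and k not in TRACKING_KEYS)
    # Force https so http/https variants of the same page compare equal.
    return urlunsplit(("https", parts.netloc, parts.path, urlencode(query), parts.fragment))


def reference_overlap(observations: dict[str, list[str]]) -> float:
    """0.5 + 0.5 * max pairwise Jaccard overlap; never below the neutral 0.5."""
    url_sets = [{normalize_url(u) for u in urls} for _, urls in sorted(observations.items())]
    best = 0.0
    for a, b in combinations(url_sets, 2):
        if a | b:
            best = max(best, len(a & b) / len(a | b))
    return 0.5 + best * 0.5
```

Because the floor is 0.5, a pair with no shared references never lowers confidence below neutral, which matches the "no conflict on zero overlap" rule.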
freshness_score (weight: 0.05)
Unchanged from v1:
- Spread ≤ 48h: 1.0
- Spread ≥ 14d: 0.0
- Linear decay from 1.0 to 0.0 in between
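The piecewise-linear rule can be sketched as below. It assumes "spread" means the time between the earliest and latest observation timestamps, which the v1 rule implies but does not spell out here. `freshness_score` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

FULL_CREDIT = timedelta(hours=48)  # spread at or below this scores 1.0
ZERO_CREDIT = timedelta(days=14)   # spread at or above this scores 0.0


def freshness_score(timestamps: list[datetime]) -> float:
    # Compute() returns early for empty input, so timestamps is assumed non-empty.
    spread = max(timestamps) - min(timestamps)
    if spread <= FULL_CREDIT:
        return 1.0
    if spread >= ZERO_CREDIT:
        return 0.0
    # timedelta / timedelta yields a float fraction of the decay window.
    return 1.0 - (spread - FULL_CREDIT) / (ZERO_CREDIT - FULL_CREDIT)


t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(freshness_score([t0, t0 + timedelta(hours=192)]))  # 0.5, the midpoint of the window
```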
Conflict Emission (Typed Severities)
Severity Levels
| Severity | Penalty Range | Meaning |
|---|---|---|
| Hard | 0.30 - 0.40 | Significant disagreement; likely prevents high-confidence linking |
| Soft | 0.05 - 0.10 | Minor disagreement; link with reduced confidence |
| Info | 0.00 | Informational; no penalty |
Conflict Types and Penalties
| Conflict Reason | Severity | Penalty | Trigger Condition |
|---|---|---|---|
| `distinct-cves` | Hard | -0.40 | Two different CVE-* identifiers in cluster |
| `disjoint-version-ranges` | Hard | -0.30 | Same package key, ranges have no intersection |
| `alias-inconsistency` | Soft | -0.10 | Disconnected alias graph (but no CVE conflict) |
| `affected-range-divergence` | Soft | -0.05 | Ranges overlap but differ |
| `severity-mismatch` | Soft | -0.05 | CVSS base score delta > 1.0 |
| `reference-clash` | Info | 0.00 | Reserved for true contradictions only |
| `metadata-gap` | Info | 0.00 | Required provenance missing |
Penalty Calculation
```
typed_penalty = min(0.6, sum(penalty_per_conflict))
```
Saturates at 0.6 to prevent complete collapse; minimum confidence = 0.1 when any evidence exists.
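Putting the weights and the penalty table together, the final score can be sketched as follows. Signal names follow the `signalScores` keys in the output shape. The constants mirror the tables in this document, and clamping the weighted sum to [0, 1] before subtracting the penalty follows the Signal Weights formula. The function names are hypothetical.

```python
WEIGHTS = {
    "aliasConnectivity": 0.30, "aliasAuthority": 0.10, "packageCoverage": 0.20,
    "versionCompatibility": 0.10, "cpeMatch": 0.10, "patchLineage": 0.10,
    "referenceOverlap": 0.05, "freshness": 0.05,
}
PENALTIES = {
    "distinct-cves": 0.40, "disjoint-version-ranges": 0.30,
    "alias-inconsistency": 0.10, "affected-range-divergence": 0.05,
    "severity-mismatch": 0.05, "reference-clash": 0.00, "metadata-gap": 0.00,
}
PENALTY_CAP = 0.6
CONFIDENCE_FLOOR = 0.1


def typed_penalty(conflict_reasons: list[str]) -> float:
    """Sum of per-reason penalties, saturating at 0.6."""
    return min(PENALTY_CAP, sum(PENALTIES[r] for r in sorted(conflict_reasons)))


def confidence(signals: dict[str, float], conflict_reasons: list[str]) -> float:
    # Sorted keys give a fixed summation order, so the float result is deterministic.
    base = sum(WEIGHTS[name] * signals[name] for name in sorted(WEIGHTS))
    base = min(1.0, max(0.0, base))  # clamp(...) from the Signal Weights formula
    return max(CONFIDENCE_FLOOR, base - typed_penalty(conflict_reasons))
```

For example, a cluster with perfect signals but both hard conflicts (`distinct-cves` and `disjoint-version-ranges`, 0.70 combined) is capped at a 0.6 penalty and lands near 0.4 rather than collapsing.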
Conflict Record Shape
```json
{
  "field": "aliases",
  "reason": "distinct-cves",
  "severity": "Hard",
  "values": ["nvd:CVE-2025-1234", "ghsa:CVE-2025-5678"],
  "sourceIds": ["nvd", "ghsa"]
}
```
Linkset Output Shape
Additions from v1:
```json
{
  "key": {
    "vulnerabilityId": "CVE-2025-1234",
    "productKey": "pkg:npm/lodash",
    "confidence": 0.85
  },
  "conflicts": [
    {
      "field": "affected.versions[pkg:npm/lodash]",
      "reason": "affected-range-divergence",
      "severity": "Soft",
      "values": ["nvd:>=4.0.0,<4.17.21", "ghsa:>=4.0.0,<4.18.0"],
      "sourceIds": ["nvd", "ghsa"]
    }
  ],
  "signalScores": {
    "aliasConnectivity": 1.0,
    "aliasAuthority": 1.0,
    "packageCoverage": 0.85,
    "versionCompatibility": 0.6,
    "cpeMatch": 0.5,
    "patchLineage": 1.0,
    "referenceOverlap": 0.75,
    "freshness": 1.0
  },
  "provenance": {
    "observationHashes": ["sha256:abc...", "sha256:def..."],
    "toolVersion": "concelier/2.0.0",
    "correlationVersion": "v2"
  }
}
```
Algorithm Pseudocode
```
function Compute(observations):
    if observations.empty:
        return (confidence=1.0, conflicts=[])

    conflicts = []

    # 1. Alias connectivity (graph-based)
    aliasGraph = buildBipartiteGraph(observations)
    aliasConnectivity = LCC(aliasGraph) / observations.count
    if hasDistinctCVEs(aliasGraph):
        conflicts.add(HardConflict("distinct-cves"))
    elif aliasConnectivity < 1.0:
        conflicts.add(SoftConflict("alias-inconsistency"))

    # 2. Alias authority
    aliasAuthority = maxAuthorityScore(observations)

    # 3. Package coverage (pairwise + IDF)
    packageCoverage = computeIDFWeightedCoverage(observations)

    # 4. Version compatibility
    for sharedPackage in findSharedPackages(observations):
        relation = classifyVersionRelation(observations, sharedPackage)
        if relation == Disjoint:
            conflicts.add(HardConflict("disjoint-version-ranges"))
        elif relation == Overlapping:
            conflicts.add(SoftConflict("affected-range-divergence"))
    versionScore = averageRelationScore(observations)

    # 5. CPE match
    cpeScore = computeCpeOverlap(observations)

    # 6. Patch lineage
    patchScore = 1.0 if anyPairSharesCommitSHA(observations) else 0.0

    # 7. Reference overlap (positive-only)
    referenceScore = 0.5 + (maxPairwiseURLOverlap(observations) * 0.5)

    # 8. Freshness
    freshnessScore = computeFreshness(observations)

    # Calculate weighted sum
    baseConfidence = (
        0.30 * aliasConnectivity +
        0.10 * aliasAuthority +
        0.20 * packageCoverage +
        0.10 * versionScore +
        0.10 * cpeScore +
        0.10 * patchScore +
        0.05 * referenceScore +
        0.05 * freshnessScore
    )

    # Apply typed penalties; dedupe first so a repeated conflict is not
    # penalized more than once and the emitted conflicts explain the score
    conflicts = dedupe(conflicts)
    penalty = min(0.6, sum(conflict.penalty for conflict in conflicts))
    finalConfidence = max(0.1, baseConfidence - penalty)

    return (confidence=finalConfidence, conflicts=conflicts)
```
Implementation
Code Locations
| Component | Path |
|---|---|
| V2 Algorithm | src/Concelier/__Libraries/StellaOps.Concelier.Core/Linksets/LinksetCorrelationV2.cs |
| Conflict Model | src/Concelier/__Libraries/StellaOps.Concelier.Core/Linksets/AdvisoryLinkset.cs |
| Patch Normalizer | src/Concelier/__Libraries/StellaOps.Concelier.Merge/Identity/Normalizers/PatchLineageNormalizer.cs |
| Version Resolver | src/Concelier/__Libraries/StellaOps.Concelier.Merge/Comparers/SemanticVersionRangeResolver.cs |
Configuration
```yaml
concelier:
  correlation:
    version: v2  # v1 | v2
    weights:
      aliasConnectivity: 0.30
      aliasAuthority: 0.10
      packageCoverage: 0.20
      versionCompatibility: 0.10
      cpeMatch: 0.10
      patchLineage: 0.10
      referenceOverlap: 0.05
      freshness: 0.05
    idf:
      enabled: true
      cacheKey: "concelier:package:idf"
      refreshIntervalMinutes: 60
    textSimilarity:
      enabled: false  # Phase 3
```
Telemetry
| Instrument | Type | Tags | Purpose |
|---|---|---|---|
| `concelier.linkset.confidence` | Histogram | `version` | Confidence score distribution |
| `concelier.linkset.conflicts_total` | Counter | `reason`, `severity` | Conflict counts by type |
| `concelier.linkset.signal_score` | Histogram | `signal` | Per-signal score distribution |
| `concelier.linkset.patch_lineage_hits` | Counter | - | Patch SHA matches found |
| `concelier.linkset.idf_cache_hit` | Counter | `hit` | IDF cache effectiveness |
Migration
Recompute Job
```bash
stella db linksets recompute --correlation-version v2 --batch-size 1000
```
Recomputes confidence for existing linksets using v2 algorithm. Does not modify observation data.
Rollback
Set `concelier:correlation:version` to `v1` to revert to intersection-based scoring.
Fixtures
- `docs/modules/concelier/samples/linkset-v2-transitive-bridge.json`: Three-source transitive bridging (A↔B↔C) demonstrating graph connectivity.
- `docs/modules/concelier/samples/linkset-v2-patch-match.json`: Two-source correlation via a shared commit SHA.
- `docs/modules/concelier/samples/linkset-v2-hard-conflict.json`: Distinct CVEs in a cluster triggering a hard penalty.
All fixtures use ASCII ordering and ISO-8601 UTC timestamps.
Change Control
- V2 is add-only relative to v1 output schema.
- Signal weight adjustments require sprint note but not schema version bump.
- New conflict reasons require an `advisory.linkset.updated@2` event schema and a doc update.
- Removal of a signal requires a deprecation period and migration guidance.