master a743bb9a1d devops folders consolidate

2026-01-25 23:39:14 +02:00

CONCELIER-LNM-26-001 · Linkset Correlation Rules (v2)

Supersedes linkset-correlation-21-002.md for new linkset builds. V1 linksets remain valid; a migration job recomputes their confidence using the v2 algorithm.

Purpose: Address critical failure modes in v1 correlation (intersection transitivity, false conflict emission) and introduce more discriminative signals (patch lineage, version compatibility, IDF-weighted package matching).

Scope

Applies to linksets produced from advisory_observations (LNM v2).
Correlation is aggregation-only: no value synthesis or merge; emit conflicts instead of collapsing fields.
Output persists in advisory_linksets and drives advisory.linkset.updated@1 events.
Maintains determinism, offline posture, and LNM/AOC contracts.

Key Changes from v1

Aspect	v1 Behavior	v2 Behavior
Alias matching	Intersection across all inputs	Graph connectivity (LCC ratio)
PURL matching	Intersection across all inputs	Pairwise coverage + IDF weighting
Reference clash	Emitted on zero overlap	Only on true URL contradictions
Conflict penalty	Single -0.1 for any conflict	Typed severities with per-reason penalties
Patch lineage	Not used	New signal (weight 0.10; scores 1.0 on any exact commit SHA match)
Version ranges	Divergence noted only	Classified (Equivalent/Overlapping/Disjoint)

Deterministic Confidence Calculation (0-1)

Signal Weights

confidence = clamp(
  0.30 * alias_connectivity +
  0.10 * alias_authority +
  0.20 * package_coverage +
  0.10 * version_compatibility +
  0.10 * cpe_match +
  0.10 * patch_lineage +
  0.05 * reference_overlap +
  0.05 * freshness_score
  - typed_penalty,
  min = 0.1, max = 1.0
)
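
The weighted sum can be sketched in a few lines. This is a minimal illustration, not the shipped implementation: the names `WEIGHTS` and `combine` are hypothetical, and the 0.1 floor is taken from the Penalty Calculation section and the pseudocode in this document.

```python
# Signal weights as listed in this document; they sum to 1.00.
WEIGHTS = {
    "alias_connectivity": 0.30,
    "alias_authority": 0.10,
    "package_coverage": 0.20,
    "version_compatibility": 0.10,
    "cpe_match": 0.10,
    "patch_lineage": 0.10,
    "reference_overlap": 0.05,
    "freshness_score": 0.05,
}


def combine(signals, typed_penalty=0.0):
    """Weighted sum of per-signal scores (each in 0..1) minus the typed penalty.

    Missing signals count as 0.0 (an assumption of this sketch).
    The result is floored at 0.1 and capped at 1.0.
    """
    base = sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())
    return max(0.1, min(1.0, base - typed_penalty))


# Illustrative input: every signal perfect except package coverage,
# with one soft conflict worth 0.05.
example = {name: 1.0 for name in WEIGHTS}
example["package_coverage"] = 0.5
print(round(combine(example, typed_penalty=0.05), 4))
```

Because the weights sum to 1.00 and every signal lies in 0..1, the upper cap only guards against floating-point overshoot.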

Signal Definitions

`alias_connectivity` (weight: 0.30)

Graph-based scoring replacing intersection-across-all.

Build bipartite graph: observation nodes ↔ alias nodes
Connect observations that share any alias (transitive bridging)
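Compute LCC (largest connected component) ratio: |LCC| / N, where |LCC| counts the observation nodes in the largest component and N is the total number of observations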

Scenario	Score
All observations in single connected component	1.0
80% of observations connected	0.8
No alias overlap at all	0.0

Why this matters: Sources A (CVE-X), B (CVE-X + GHSA-Y), C (GHSA-Y) now correlate via transitive bridging through B, whereas v1 scored 0 because no alias is common to all three.
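
The connectivity ratio can be sketched with a union-find over observations. The function name `lcc_ratio` is hypothetical, and two behaviours are assumptions of this sketch rather than documented rules: a largest component of a single observation is scored 0.0 (to agree with the "No alias overlap at all" row), and a lone observation is scored 1.0.

```python
def lcc_ratio(observation_aliases):
    """observation_aliases: one set of alias strings per observation."""
    n = len(observation_aliases)
    if n == 0:
        return 0.0
    if n == 1:
        return 1.0  # assumption: a lone observation is trivially connected

    parent = list(range(n))

    def find(i):
        # Path-halving find.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Sharing an alias is equivalent to being joined through that alias node
    # in the bipartite graph, so union each observation with the first
    # observation that carried the same alias.
    first_seen = {}
    for idx, aliases in enumerate(observation_aliases):
        for alias in aliases:
            if alias in first_seen:
                a, b = find(idx), find(first_seen[alias])
                if a != b:
                    parent[a] = b
            else:
                first_seen[alias] = idx

    sizes = {}
    for i in range(n):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    largest = max(sizes.values())
    return 0.0 if largest == 1 else largest / n


# The A/B/C bridging case from the text: B links A and C.
print(lcc_ratio([{"CVE-X"}, {"CVE-X", "GHSA-Y"}, {"GHSA-Y"}]))
```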

`alias_authority` (weight: 0.10)

Scope-based weighting using existing canonical key prefixes:

Alias Type	Authority Score
CVE-* (global)	1.0
GHSA-* (ecosystem)	0.8
Vendor IDs (RHSA, MSRC, CISCO, VMSA)	0.6
Distribution IDs (DSA, USN, SUSE)	0.4
Unknown scheme	0.2
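
A prefix lookup is enough to illustrate the table. The exact prefix spellings below (for example `CISCO-` or `SUSE-`) are assumptions; the real canonical key prefixes may differ. Taking the maximum across aliases follows the `maxAuthorityScore` step in the pseudocode, and returning 0.0 for an empty alias list is a choice made for this sketch.

```python
# (prefixes, score) pairs in descending authority, mirroring the table above.
AUTHORITY_BY_PREFIX = [
    (("CVE-",), 1.0),
    (("GHSA-",), 0.8),
    (("RHSA-", "MSRC-", "CISCO-", "VMSA-"), 0.6),
    (("DSA-", "USN-", "SUSE-"), 0.4),
]
UNKNOWN_SCHEME_SCORE = 0.2


def alias_authority(alias):
    """Score a single alias by its scheme prefix (case-insensitive)."""
    upper = alias.strip().upper()
    for prefixes, score in AUTHORITY_BY_PREFIX:
        if upper.startswith(prefixes):
            return score
    return UNKNOWN_SCHEME_SCORE


def max_authority(aliases):
    """Highest authority among the aliases; 0.0 when there are none."""
    return max((alias_authority(a) for a in aliases), default=0.0)


print(max_authority(["USN-1234-1", "GHSA-abcd-efgh-ijkl"]))
```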

`package_coverage` (weight: 0.20)

Pairwise + IDF weighting replacing intersection-across-all.

Extract package keys (PURL without version) from each observation
For each package key, compute IDF weight: log(N / (1 + df)) where N = corpus size, df = observations containing package
Score = weighted overlap ratio across pairs

Scenario	Score
All sources share same rare package	~1.0
All sources share common package (lodash)	~0.6
One "thin" source with no packages	Other sources still score > 0
No package overlap	0.0

IDF fallback: When the IDF cache is unavailable, uniform weights (1.0) are used.
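
The building blocks can be sketched as follows. The helper names are hypothetical, and the pairwise ratio is one possible reading of "weighted overlap ratio" (shared IDF weight over union IDF weight); it does not try to reproduce the approximate scores in the table, which depend on details this document does not spell out. Dropping qualifiers and subpath from the key, flooring negative weights at 0, and defaulting unseen keys to 1.0 are likewise assumptions.

```python
import math


def package_key(purl):
    """PURL without version (qualifiers and subpath are also dropped here)."""
    base = purl.split("#", 1)[0].split("?", 1)[0]
    slash = base.rfind("/")
    at = base.rfind("@")
    # Only treat '@' as the version separator if it follows the last '/'.
    return base[:at] if at > slash else base


def idf_weight(corpus_size, doc_freq):
    """IDF weight as stated above: log(N / (1 + df))."""
    return math.log(corpus_size / (1 + doc_freq))


def weighted_overlap(keys_a, keys_b, idf=None):
    """Shared weight divided by union weight for one pair of observations.

    idf maps package key -> weight; None models the cache-unavailable
    fallback, where every key weighs 1.0.
    """
    def weight(key):
        if idf is None:
            return 1.0
        return max(idf.get(key, 1.0), 0.0)

    total = sum(weight(k) for k in keys_a | keys_b)
    if total == 0:
        return 0.0
    return sum(weight(k) for k in keys_a & keys_b) / total


a = {package_key("pkg:npm/lodash@4.17.20"), package_key("pkg:npm/left-pad@1.3.0")}
b = {package_key("pkg:npm/lodash@4.17.21")}
print(weighted_overlap(a, b))
```

Note that `log(N / (1 + df))` is zero or negative for a package present in every observation of the corpus, which is why the sketch floors weights at 0.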

`version_compatibility` (weight: 0.10)

Classifies version relationships per shared package:

Relation	Score	Conflict
Equivalent: ranges normalize identically	1.0	None
Overlapping: non-empty intersection	0.6	Soft (`affected-range-divergence`)
Disjoint: no intersection	0.0	Hard (`disjoint-version-ranges`)
Unknown: parse failure	0.5	None

Uses SemanticVersionRangeResolver for semver; delegates to ecosystem-specific comparers for rpm EVR, dpkg, apk.
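
The four-way classification can be illustrated on simple half-open ranges. This sketch does not call `SemanticVersionRangeResolver`; it assumes ranges written as `>=X,<Y` with purely numeric dotted versions, and treats anything else as a parse failure. All names are hypothetical.

```python
from enum import Enum


class Relation(Enum):
    # Values are the scores from the table above.
    EQUIVALENT = 1.0
    OVERLAPPING = 0.6
    DISJOINT = 0.0
    UNKNOWN = 0.5


def parse_version(text):
    """'4.17' -> (4, 17, 0); raises ValueError on non-numeric parts."""
    parts = [int(p) for p in text.strip().split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def parse_range(text):
    """Parse '>=lo,<hi' into (lo, hi); raises ValueError otherwise."""
    lower = upper = None
    for clause in text.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            lower = parse_version(clause[2:])
        elif clause.startswith("<") and not clause.startswith("<="):
            upper = parse_version(clause[1:])
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    if lower is None or upper is None:
        raise ValueError("range needs both bounds in this sketch")
    return lower, upper


def classify(range_a, range_b):
    try:
        a_lo, a_hi = parse_range(range_a)
        b_lo, b_hi = parse_range(range_b)
    except ValueError:
        return Relation.UNKNOWN
    if (a_lo, a_hi) == (b_lo, b_hi):
        return Relation.EQUIVALENT
    # Half-open intervals intersect when the larger lower bound
    # is strictly below the smaller upper bound.
    if max(a_lo, b_lo) < min(a_hi, b_hi):
        return Relation.OVERLAPPING
    return Relation.DISJOINT


print(classify(">=4.0.0,<4.17.21", ">=4.0.0,<4.18.0"))
```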

`cpe_match` (weight: 0.10)

Unchanged from v1:

Exact CPE overlap: 1.0
Same vendor/product: 0.5
No match: 0.0
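
A pairwise version of the three tiers might look like this. It assumes CPE 2.3 formatted strings (`cpe:2.3:part:vendor:product:version:...`) without escaped colons, and the function name is hypothetical. How the result is aggregated across more than two observations is not specified here.

```python
def cpe_match(cpes_a, cpes_b):
    """Score two observations' CPE lists: 1.0 exact, 0.5 vendor/product, else 0.0."""
    set_a, set_b = set(cpes_a), set(cpes_b)
    if set_a & set_b:
        return 1.0

    def vendor_product(cpe):
        fields = cpe.split(":")
        # fields: ['cpe', '2.3', part, vendor, product, version, ...]
        return (fields[3], fields[4]) if len(fields) > 4 else None

    vp_a = {vendor_product(c) for c in set_a} - {None}
    vp_b = {vendor_product(c) for c in set_b} - {None}
    return 0.5 if vp_a & vp_b else 0.0


older = "cpe:2.3:a:lodash:lodash:4.17.20:*:*:*:*:node.js:*:*"
newer = "cpe:2.3:a:lodash:lodash:4.17.21:*:*:*:*:node.js:*:*"
print(cpe_match([older], [newer]))
```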

`patch_lineage` (weight: 0.10)

New signal: correlation via shared fix commits.

Extract patch references from observation references (type: patch, fix, commit)
Normalize to commit SHAs using PatchLineageNormalizer
Any pairwise SHA match: 1.0; otherwise 0.0

Why this matters: "These advisories fix the same code" is high-confidence evidence most platforms lack.
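
The pairwise check can be sketched as below. This does not reproduce `PatchLineageNormalizer`; it assumes patch references are URLs containing `/commit/<40-hex-SHA>` and ignores abbreviated SHAs. The function names and the repository URL in the usage lines are hypothetical.

```python
import re
from itertools import combinations

# Full 40-character SHAs only; shorter or longer hex runs are ignored.
_COMMIT_SHA = re.compile(r"/commit/([0-9a-f]{40})\b", re.IGNORECASE)


def commit_shas(urls):
    """Lower-cased commit SHAs found in a list of reference URLs."""
    return {m.group(1).lower() for url in urls for m in _COMMIT_SHA.finditer(url)}


def patch_lineage(observation_urls):
    """1.0 if any two observations share a commit SHA, otherwise 0.0."""
    sha_sets = [commit_shas(urls) for urls in observation_urls]
    shared = any(a & b for a, b in combinations(sha_sets, 2))
    return 1.0 if shared else 0.0


sha = "a" * 40
obs_1 = [f"https://github.com/example/repo/commit/{sha}"]
obs_2 = [f"https://github.com/example/repo/commit/{sha.upper()}.patch"]
print(patch_lineage([obs_1, obs_2]))
```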

`reference_overlap` (weight: 0.05)

Positive-only (no conflict on zero overlap):

Normalize URLs (lowercase, strip tracking parameters, normalize the scheme to https://)
Compute max pairwise overlap ratio
Map to score: 0.5 + (overlap * 0.5)

Scenario	Score
100% URL overlap	1.0
50% URL overlap	0.75
Zero URL overlap	0.5 (neutral)

No reference-clash emission for simple disjoint sets.
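
The three steps above can be sketched with the standard library. Two points are assumptions of this sketch: "tracking parameters" is taken to mean query keys starting with `utm_`, and the overlap ratio is computed as Jaccard similarity (shared URLs over the union), since the exact ratio is not defined here.

```python
from itertools import combinations
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PREFIXES = ("utm_",)  # assumption: what counts as a tracking parameter


def normalize_url(url):
    """Lowercase, drop tracking query parameters, and force the https scheme."""
    parts = urlsplit(url.strip().lower())
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(("https", parts.netloc, parts.path, urlencode(query), parts.fragment))


def reference_score(observation_urls):
    """0.5 + 0.5 * (best pairwise overlap); never below the neutral 0.5."""
    url_sets = [{normalize_url(u) for u in urls} for urls in observation_urls]
    best = 0.0
    for a, b in combinations(url_sets, 2):
        union = a | b
        if union:
            best = max(best, len(a & b) / len(union))
    return 0.5 + 0.5 * best


print(normalize_url("HTTP://Example.com/Advisory?utm_source=feed&id=1"))
```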

`freshness_score` (weight: 0.05)

Unchanged from v1:

Spread ≤ 48h: 1.0
Spread ≥ 14d: 0.0
Linear decay between
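
The decay can be written directly with `timedelta` arithmetic. This sketch assumes "spread" means the gap between the earliest and latest observation timestamps, and it scores fewer than two timestamps as 1.0; both are assumptions, as is the function name.

```python
from datetime import datetime, timedelta, timezone

FULL_SCORE_SPREAD = timedelta(hours=48)
ZERO_SCORE_SPREAD = timedelta(days=14)


def freshness_score(timestamps):
    """1.0 up to a 48h spread, 0.0 from 14d, linear in between."""
    if len(timestamps) < 2:
        return 1.0
    spread = max(timestamps) - min(timestamps)
    if spread <= FULL_SCORE_SPREAD:
        return 1.0
    if spread >= ZERO_SCORE_SPREAD:
        return 0.0
    # Dividing two timedeltas yields a float.
    return (ZERO_SCORE_SPREAD - spread) / (ZERO_SCORE_SPREAD - FULL_SCORE_SPREAD)


start = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(freshness_score([start, start + timedelta(days=8)]))
```

A spread of 8 days sits halfway between the 2-day and 14-day thresholds, so it lands on the midpoint of the decay.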

Conflict Emission (Typed Severities)

Severity Levels

Severity	Penalty Range	Meaning
Hard	0.30 - 0.40	Significant disagreement; likely prevents high-confidence linking
Soft	0.05 - 0.10	Minor disagreement; link with reduced confidence
Info	0.00	Informational; no penalty

Conflict Types and Penalties

Conflict Reason	Severity	Penalty	Trigger Condition
`distinct-cves`	Hard	-0.40	Two different CVE-* identifiers in cluster
`disjoint-version-ranges`	Hard	-0.30	Same package key, ranges have no intersection
`alias-inconsistency`	Soft	-0.10	Disconnected alias graph (but no CVE conflict)
`affected-range-divergence`	Soft	-0.05	Ranges overlap but differ
`severity-mismatch`	Soft	-0.05	CVSS base score delta > 1.0
`reference-clash`	Info	0.00	Reserved for true contradictions only
`metadata-gap`	Info	0.00	Required provenance missing

Penalty Calculation

typed_penalty = min(0.6, sum(penalty_per_conflict))

The penalty saturates at 0.6 to prevent complete collapse, and the final confidence is floored at 0.1 whenever any evidence exists.
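
A small lookup reproduces the saturation and the floor. The penalty values come from the Conflict Types and Penalties table; the names `PENALTIES`, `typed_penalty`, and `apply_penalty` are hypothetical. Following the pseudocode in this document, every emitted conflict contributes, so repeated reasons (for example two packages with disjoint ranges) add up before the cap applies.

```python
# Penalty magnitude per conflict reason, from the table above.
PENALTIES = {
    "distinct-cves": 0.40,
    "disjoint-version-ranges": 0.30,
    "alias-inconsistency": 0.10,
    "affected-range-divergence": 0.05,
    "severity-mismatch": 0.05,
    "reference-clash": 0.00,
    "metadata-gap": 0.00,
}
PENALTY_CAP = 0.6
CONFIDENCE_FLOOR = 0.1


def typed_penalty(reasons):
    """Sum of per-conflict penalties, saturating at 0.6."""
    return min(PENALTY_CAP, sum(PENALTIES[reason] for reason in reasons))


def apply_penalty(base_confidence, reasons):
    """Subtract the typed penalty and floor the result at 0.1."""
    return max(CONFIDENCE_FLOOR, base_confidence - typed_penalty(reasons))


# Two hard conflicts sum to 0.70, which the cap reduces to 0.6.
print(typed_penalty(["distinct-cves", "disjoint-version-ranges"]))
```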

Conflict Record Shape

{
  "field": "aliases",
  "reason": "distinct-cves",
  "severity": "Hard",
  "values": ["nvd:CVE-2025-1234", "ghsa:CVE-2025-5678"],
  "sourceIds": ["nvd", "ghsa"]
}

Linkset Output Shape

Additions relative to v1:

{
  "key": {
    "vulnerabilityId": "CVE-2025-1234",
    "productKey": "pkg:npm/lodash",
    "confidence": 0.8175
  },
  "conflicts": [
    {
      "field": "affected.versions[pkg:npm/lodash]",
      "reason": "affected-range-divergence",
      "severity": "Soft",
      "values": ["nvd:>=4.0.0,<4.17.21", "ghsa:>=4.0.0,<4.18.0"],
      "sourceIds": ["nvd", "ghsa"]
    }
  ],
  "signalScores": {
    "aliasConnectivity": 1.0,
    "aliasAuthority": 1.0,
    "packageCoverage": 0.85,
    "versionCompatibility": 0.6,
    "cpeMatch": 0.5,
    "patchLineage": 1.0,
    "referenceOverlap": 0.75,
    "freshness": 1.0
  },
  "provenance": {
    "observationHashes": ["sha256:abc...", "sha256:def..."],
    "toolVersion": "concelier/2.0.0",
    "correlationVersion": "v2"
  }
}

Algorithm Pseudocode

function Compute(observations):
    if observations.empty:
        return (confidence=1.0, conflicts=[])

    conflicts = []

    # 1. Alias connectivity (graph-based)
    aliasGraph = buildBipartiteGraph(observations)
    aliasConnectivity = LCC(aliasGraph) / observations.count
    if hasDistinctCVEs(aliasGraph):
        conflicts.add(HardConflict("distinct-cves"))
    elif aliasConnectivity < 1.0:
        conflicts.add(SoftConflict("alias-inconsistency"))

    # 2. Alias authority
    aliasAuthority = maxAuthorityScore(observations)

    # 3. Package coverage (pairwise + IDF)
    packageCoverage = computeIDFWeightedCoverage(observations)

    # 4. Version compatibility
    for sharedPackage in findSharedPackages(observations):
        relation = classifyVersionRelation(observations, sharedPackage)
        if relation == Disjoint:
            conflicts.add(HardConflict("disjoint-version-ranges"))
        elif relation == Overlapping:
            conflicts.add(SoftConflict("affected-range-divergence"))
    versionScore = averageRelationScore(observations)

    # 5. CPE match
    cpeScore = computeCpeOverlap(observations)

    # 6. Patch lineage
    patchScore = 1.0 if anyPairSharesCommitSHA(observations) else 0.0

    # 7. Reference overlap (positive-only)
    referenceScore = 0.5 + (maxPairwiseURLOverlap(observations) * 0.5)

    # 8. Freshness
    freshnessScore = computeFreshness(observations)

    # Calculate weighted sum
    baseConfidence = (
        0.30 * aliasConnectivity +
        0.10 * aliasAuthority +
        0.20 * packageCoverage +
        0.10 * versionScore +
        0.10 * cpeScore +
        0.10 * patchScore +
        0.05 * referenceScore +
        0.05 * freshnessScore
    )

    # Apply typed penalties
    penalty = min(0.6, sum(conflict.penalty for conflict in conflicts))
    finalConfidence = max(0.1, baseConfidence - penalty)

    return (confidence=finalConfidence, conflicts=dedupe(conflicts))

Implementation

Code Locations

Component	Path
V2 Algorithm	`src/Concelier/__Libraries/StellaOps.Concelier.Core/Linksets/LinksetCorrelationV2.cs`
Conflict Model	`src/Concelier/__Libraries/StellaOps.Concelier.Core/Linksets/AdvisoryLinkset.cs`
Patch Normalizer	`src/Concelier/__Libraries/StellaOps.Concelier.Merge/Identity/Normalizers/PatchLineageNormalizer.cs`
Version Resolver	`src/Concelier/__Libraries/StellaOps.Concelier.Merge/Comparers/SemanticVersionRangeResolver.cs`

Configuration

concelier:
  correlation:
    version: v2  # v1 | v2
    weights:
      aliasConnectivity: 0.30
      aliasAuthority: 0.10
      packageCoverage: 0.20
      versionCompatibility: 0.10
      cpeMatch: 0.10
      patchLineage: 0.10
      referenceOverlap: 0.05
      freshness: 0.05
    idf:
      enabled: true
      cacheKey: "concelier:package:idf"
      refreshIntervalMinutes: 60
    textSimilarity:
      enabled: false  # Phase 3

Telemetry

Instrument	Type	Tags	Purpose
`concelier.linkset.confidence`	Histogram	`version`	Confidence score distribution
`concelier.linkset.conflicts_total`	Counter	`reason`, `severity`	Conflict counts by type
`concelier.linkset.signal_score`	Histogram	`signal`	Per-signal score distribution
`concelier.linkset.patch_lineage_hits`	Counter	-	Patch SHA matches found
`concelier.linkset.idf_cache_hit`	Counter	`hit`	IDF cache effectiveness

Migration

Recompute Job

stella db linksets recompute --correlation-version v2 --batch-size 1000

Recomputes confidence for existing linksets using the v2 algorithm. Does not modify observation data.

Rollback

Set concelier:correlation:version: v1 to revert to intersection-based scoring.

Fixtures

docs/modules/concelier/samples/linkset-v2-transitive-bridge.json: Three-source transitive bridging (A↔B↔C) demonstrating graph connectivity.
docs/modules/concelier/samples/linkset-v2-patch-match.json: Two-source correlation via shared commit SHA.
docs/modules/concelier/samples/linkset-v2-hard-conflict.json: Distinct CVEs in cluster triggering hard penalty.

All fixtures use ASCII ordering and ISO-8601 UTC timestamps.

Change Control

V2 is add-only relative to v1 output schema.
Signal weight adjustments require a sprint note but not a schema version bump.
New conflict reasons require the advisory.linkset.updated@2 event schema and a doc update.
Removal of a signal requires deprecation period and migration guidance.
