devops folders consolidate

2026-01-25 23:27:41 +02:00
parent 6e687b523a
commit a743bb9a1d
613 changed files with 8611 additions and 41846 deletions
--- a/docs/modules/concelier/architecture.md
+++ b/docs/modules/concelier/architecture.md
@@ -305,11 +305,33 @@ public interface IFeedConnector {
 ### 5.2 Linkset correlation

 1. **Queue** — observation deltas enqueue correlation jobs keyed by `(tenant, vulnerabilityId, productKey)` candidates derived from identifiers + alias graph.
-2. **Canonical grouping** — builder resolves aliases using Concelier’s alias store and deterministic heuristics (vendor > distro > cert), deriving normalized product keys (purl preferred) and confidence scores.
+2. **Canonical grouping** — builder resolves aliases using Concelier's alias store and deterministic heuristics (vendor > distro > cert), deriving normalized product keys (purl preferred) and confidence scores.
 3. **Linkset materialization** — `advisory_linksets` documents store sorted observation references, alias sets, product keys, range metadata, and conflict payloads. Writes are idempotent; unchanged hashes skip updates.
-4. **Conflict detection** — builder emits structured conflicts (`severity-mismatch`, `affected-range-divergence`, `reference-clash`, `alias-inconsistency`, `metadata-gap`). Conflicts carry per-observation values for explainability.
+4. **Conflict detection** — builder emits structured conflicts with typed severities (Hard/Soft/Info). Conflicts carry per-observation values for explainability.
 5. **Event emission** — `advisory.linkset.updated@1` summarizes deltas (`added`, `removed`, `changed` observation IDs, conflict updates, confidence changes) and includes a canonical hash for replay validation.

+#### Correlation Algorithm (v2)
+
+The v2 correlation algorithm (see `linkset-correlation-v2.md`) replaces intersection-based scoring with graph-based connectivity and adds new signals:
+
+| Signal | Weight | Description |
+|--------|--------|-------------|
+| Alias connectivity | 0.30 | Largest-connected-component (LCC) ratio from the observation/alias bipartite graph (transitive bridging) |
+| Alias authority | 0.10 | Scope hierarchy (CVE > GHSA > VND > DST) |
+| Package coverage | 0.20 | Pairwise + IDF-weighted overlap |
+| Version compatibility | 0.10 | Equivalent/Overlapping/Disjoint classification |
+| CPE match | 0.10 | Exact or vendor/product overlap |
+| Patch lineage | 0.10 | Shared commit SHA from fix references |
+| Reference overlap | 0.05 | Positive-only URL matching |
+| Freshness | 0.05 | Fetch timestamp spread |
+
+Conflict penalties are typed:
+- **Hard** (`distinct-cves`, `disjoint-version-ranges`): -0.30 to -0.40
+- **Soft** (`alias-inconsistency`, `affected-range-divergence`, `severity-mismatch`): -0.05 to -0.10
+- **Info** (`reference-clash` on true URL contradictions only, `metadata-gap`): no penalty
+
+Configure via `concelier:correlation:version` (v1 or v2) and optional weight overrides.
+
 ### 5.3 Event contract

 | Event | Schema | Notes |
@@ -317,7 +339,7 @@ public interface IFeedConnector {
 | `advisory.observation.updated@1` | `events/advisory.observation.updated@1.json` | Fired on new or superseded observations. Includes `observationId`, source metadata, `linksetSummary` (aliases/purls), supersedes pointer (if any), SHA-256 hash, and `traceId`. |
 | `advisory.linkset.updated@1` | `events/advisory.linkset.updated@1.json` | Fired when correlation changes. Includes `linksetId`, `key{vulnerabilityId, productKey, confidence}`, observation deltas, conflicts, `updatedAt`, and canonical hash. |

-Events are emitted via NATS (primary) and Valkey Stream (fallback). Consumers acknowledge idempotently using the hash; duplicates are safe. Offline Kit captures both topics during bundle creation for air-gapped replay.
+Events are emitted via Valkey Streams. Consumers acknowledge idempotently using the hash; duplicates are safe. Offline Kit captures event streams during bundle creation for air-gapped replay.

 ---

--- a/docs/modules/concelier/linkset-correlation-v2.md
+++ b/docs/modules/concelier/linkset-correlation-v2.md
@@ -0,0 +1,379 @@
+# CONCELIER-LNM-26-001 · Linkset Correlation Rules (v2)
+
+> Supersedes `linkset-correlation-21-002.md` for new linkset builds.
+> V1 linksets remain valid; migration job will recompute confidence using v2 algorithm.
+
+Purpose: Address critical failure modes in v1 correlation (intersection transitivity, false conflict emission) and introduce higher-discriminative signals (patch lineage, version compatibility, IDF-weighted package matching).
+
+---
+
+## Scope
+
+- Applies to linksets produced from `advisory_observations` (LNM v2).
+- Correlation is aggregation-only: no value synthesis or merge; emit conflicts instead of collapsing fields.
+- Output persists in `advisory_linksets` and drives `advisory.linkset.updated@1` events.
+- Maintains determinism, offline posture, and LNM/AOC contracts.
+
+---
+
+## Key Changes from v1
+
+| Aspect | v1 Behavior | v2 Behavior |
+|--------|-------------|-------------|
+| Alias matching | Intersection across all inputs | Graph connectivity (LCC ratio) |
+| PURL matching | Intersection across all inputs | Pairwise coverage + IDF weighting |
+| Reference clash | Emitted on zero overlap | Only on true URL contradictions |
+| Conflict penalty | Single -0.1 for any conflict | Typed severities with per-reason penalties |
+| Patch lineage | Not used | New signal (weight 0.10; any shared fix-commit SHA scores 1.0) |
+| Version ranges | Divergence noted only | Classified (Equivalent/Overlapping/Disjoint) |
+
+---
+
+## Deterministic Confidence Calculation (0-1)
+
+### Signal Weights
+
+```
+base = 0.30 * alias_connectivity +
+       0.10 * alias_authority +
+       0.20 * package_coverage +
+       0.10 * version_compatibility +
+       0.10 * cpe_match +
+       0.10 * patch_lineage +
+       0.05 * reference_overlap +
+       0.05 * freshness_score        # weights sum to 1.0, so base is in [0, 1]
+
+confidence = clamp(base - typed_penalty, 0.1, 1.0)   # 0.1 floor matches the pseudocode below
+```
+
+### Signal Definitions
+
+#### `alias_connectivity` (weight: 0.30)
+
+**Graph-based scoring** replacing intersection-across-all.
+
+1. Build bipartite graph: observation nodes ↔ alias nodes
+2. Connect observations that share any alias (transitive bridging)
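+3. Compute LCC (largest connected component) ratio: `|LCC| / N`, where `|LCC|` is the number of observations in the largest component and `N` is the total number of observations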
+
+| Scenario | Score |
+|----------|-------|
+| All observations in single connected component | 1.0 |
+| 80% of observations connected | 0.8 |
+| No alias overlap at all | 0.0 |
+
+**Why this matters**: Sources A (CVE-X), B (CVE-X + GHSA-Y), C (GHSA-Y) now correctly correlate via transitive bridging, whereas v1 produced score = 0.
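A minimal sketch of the LCC ratio, assuming each observation is reduced to an ID mapped to its set of alias strings (a hypothetical input shape, not the `LinksetCorrelationV2` API). Merging observations that share an alias with union-find produces the same components as walking the bipartite graph. Because a bare `|LCC| / N` would give `1/N` rather than 0.0 when nothing overlaps, the sketch returns 0.0 in that case to match the table; that special case is an assumption.

```python
from collections import Counter
from typing import Dict, Set


def alias_connectivity(observations: Dict[str, Set[str]]) -> float:
    """Return |LCC| / N for the observation/alias bipartite graph."""
    n = len(observations)
    if n == 0:
        return 0.0
    parent = {obs: obs for obs in observations}

    def find(x: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    owner: Dict[str, str] = {}  # normalized alias -> first observation carrying it
    for obs in sorted(observations):  # sorted for deterministic iteration
        for alias in sorted(observations[obs]):
            key = alias.strip().upper()
            if key in owner:
                parent[find(obs)] = find(owner[key])  # shared alias joins the components
            else:
                owner[key] = obs

    largest = max(Counter(find(obs) for obs in observations).values())
    if n > 1 and largest == 1:
        return 0.0  # assumption: no shared alias at all scores 0.0, as in the table
    return largest / n


# Sources A (CVE-X), B (CVE-X + GHSA-Y), C (GHSA-Y) bridge transitively.
print(alias_connectivity({
    "A": {"CVE-2025-0001"},
    "B": {"CVE-2025-0001", "GHSA-aaaa-bbbb-cccc"},
    "C": {"GHSA-aaaa-bbbb-cccc"},
}))  # 1.0
```

Union-find keeps the pass near-linear in the number of (observation, alias) pairs, and sorting the inputs keeps the merge order deterministic.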
+
+#### `alias_authority` (weight: 0.10)
+
+Scope-based weighting using existing canonical key prefixes; the signal takes the highest score among the linkset's aliases:
+
+| Alias Type | Authority Score |
+|------------|-----------------|
+| CVE-* (global) | 1.0 |
+| GHSA-* (ecosystem) | 0.8 |
+| Vendor IDs (RHSA, MSRC, CISCO, VMSA) | 0.6 |
+| Distribution IDs (DSA, USN, SUSE) | 0.4 |
+| Unknown scheme | 0.2 |
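The authority table can be read as a prefix lookup followed by a maximum over the cluster's aliases. This is a sketch only: the regular expressions below are hypothetical stand-ins for the real canonical key prefixes, and returning 0.0 for an empty alias set is an assumption.

```python
import re
from typing import Iterable

# Hypothetical prefix rules mirroring the authority table; real canonical
# key prefixes in Concelier may differ.
_AUTHORITY_RULES = [
    (re.compile(r"^CVE-\d{4}-\d+$"), 1.0),            # global
    (re.compile(r"^GHSA-"), 0.8),                      # ecosystem
    (re.compile(r"^(RHSA|MSRC|CISCO|VMSA)-"), 0.6),    # vendor
    (re.compile(r"^(DSA|USN|SUSE)-"), 0.4),            # distribution
]


def authority(alias: str) -> float:
    key = alias.strip().upper()
    for pattern, score in _AUTHORITY_RULES:
        if pattern.match(key):
            return score
    return 0.2  # unknown scheme


def alias_authority(aliases: Iterable[str]) -> float:
    """Highest authority score among the aliases (0.0 if there are none)."""
    return max((authority(a) for a in aliases), default=0.0)


print(alias_authority(["GHSA-abcd-efgh-ijkl", "DSA-5000-1"]))  # 0.8
```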
+
+#### `package_coverage` (weight: 0.20)
+
+**Pairwise + IDF weighting** replacing intersection-across-all.
+
+1. Extract package keys (PURL without version) from each observation
+2. For each package key, compute IDF weight: `log(N / (1 + df))`, where N = number of observations in the corpus and df = number of observations containing the package key
+3. Score = weighted overlap ratio across pairs
+
+| Scenario | Score |
+|----------|-------|
+| All sources share same rare package | ~1.0 |
+| All sources share common package (lodash) | ~0.6 |
+| One "thin" source with no packages | Other sources still score > 0 |
+| No package overlap | 0.0 |
+
+**IDF fallback**: When the IDF cache is unavailable, uniform weights (1.0) are used.
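The spec leaves "weighted overlap ratio across pairs" open, so the following is one possible reading, not the Concelier implementation: an IDF-weighted Jaccard ratio per pair, averaged over pairs where both observations list packages. Two further assumptions are labeled in the code: negative IDF values (a key present in nearly every observation makes `N / (1 + df)` drop below 1) are floored at 0, and keys missing from the cache get weight 1.0. This sketch does not attempt to reproduce the exact values in the table above.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Set


def idf_weights(doc_freq: Dict[str, int], corpus_size: int) -> Dict[str, float]:
    """IDF per package key: log(N / (1 + df)), floored at 0 (the floor is an assumption)."""
    return {key: max(0.0, math.log(corpus_size / (1 + df))) for key, df in doc_freq.items()}


def package_coverage(obs_packages: List[Set[str]],
                     idf: Optional[Dict[str, float]] = None) -> float:
    """Mean IDF-weighted Jaccard overlap across pairs of observations that list packages."""
    def weight(key: str) -> float:
        if idf is None:
            return 1.0  # uniform fallback when the IDF cache is unavailable
        return idf.get(key, 1.0)  # assumption: uncached keys get weight 1.0

    scores = []
    for a, b in combinations(obs_packages, 2):
        if not a or not b:
            continue  # a "thin" source without packages is skipped, not scored as 0
        total = sum(weight(k) for k in a | b)
        shared = sum(weight(k) for k in a & b)
        scores.append(shared / total if total > 0 else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


idf = {"pkg:npm/rare": 2.0, "pkg:npm/lodash": 0.5}
print(package_coverage([{"pkg:npm/rare", "pkg:npm/lodash"}, {"pkg:npm/rare"}], idf))  # 0.8
```

Skipping empty observations is what keeps the "thin source" row above zero: its pairs simply do not contribute.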
+
+#### `version_compatibility` (weight: 0.10)
+
+Classifies version relationships per shared package:
+
+| Relation | Score | Conflict |
+|----------|-------|----------|
+| **Equivalent**: ranges normalize identically | 1.0 | None |
+| **Overlapping**: non-empty intersection | 0.6 | Soft (`affected-range-divergence`) |
+| **Disjoint**: no intersection | 0.0 | Hard (`disjoint-version-ranges`) |
+| **Unknown**: parse failure | 0.5 | None |
+
+Uses `SemanticVersionRangeResolver` for semver; delegates to ecosystem-specific comparers for rpm EVR, dpkg, apk.
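To illustrate the three-way classification, here is a sketch restricted to plain dotted numeric versions and a single half-open interval `[lower, upper)` per source. It is not `SemanticVersionRangeResolver`: pre-release tags, rpm EVR, dpkg and apk ordering, and multi-interval ranges are out of scope, and `None` stands in for a range that failed to parse.

```python
from typing import Optional, Tuple

Version = Tuple[int, ...]
# [lower inclusive, upper exclusive); None on either side means unbounded.
Range = Tuple[Optional[Version], Optional[Version]]

RELATION_SCORES = {"Equivalent": 1.0, "Overlapping": 0.6, "Disjoint": 0.0, "Unknown": 0.5}


def parse_version(text: str) -> Version:
    parts = [int(p) for p in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()  # "4.0.0" and "4.0" normalize to the same key
    return tuple(parts)


def classify(a: Optional[Range], b: Optional[Range]) -> str:
    if a is None or b is None:
        return "Unknown"  # at least one range failed to parse
    if a == b:
        return "Equivalent"
    lo = max((v for v in (a[0], b[0]) if v is not None), default=None)
    hi = min((v for v in (a[1], b[1]) if v is not None), default=None)
    if lo is not None and hi is not None and lo >= hi:
        return "Disjoint"  # empty intersection of the two intervals
    return "Overlapping"


nvd = (parse_version("4.0.0"), parse_version("4.17.21"))   # >=4.0.0,<4.17.21
ghsa = (parse_version("4.0.0"), parse_version("4.18.0"))   # >=4.0.0,<4.18.0
print(classify(nvd, ghsa), RELATION_SCORES[classify(nvd, ghsa)])  # Overlapping 0.6
```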
+
+#### `cpe_match` (weight: 0.10)
+
+Unchanged from v1:
+- Exact CPE overlap: 1.0
+- Same vendor/product: 0.5
+- No match: 0.0
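A sketch of the CPE rule, assuming CPE 2.3 formatted strings (`cpe:2.3:part:vendor:product:version:...`) and assuming "any pair of observations" as the aggregation, which the spec does not state. The colon split is naive and ignores escaped colons.

```python
from itertools import combinations
from typing import List, Optional, Set, Tuple


def vendor_product(cpe: str) -> Optional[Tuple[str, str]]:
    # CPE 2.3 formatted string: cpe:2.3:part:vendor:product:version:...
    # Naive split on ":"; escaped colons ("\:") are not handled in this sketch.
    parts = cpe.strip().lower().split(":")
    if len(parts) < 5 or parts[0] != "cpe" or parts[1] != "2.3":
        return None
    return parts[3], parts[4]


def cpe_match(obs_cpes: List[Set[str]]) -> float:
    best = 0.0
    for a, b in combinations(obs_cpes, 2):
        if {c.strip().lower() for c in a} & {c.strip().lower() for c in b}:
            return 1.0  # some pair shares an exact CPE
        vp_a = {vendor_product(c) for c in a} - {None}
        vp_b = {vendor_product(c) for c in b} - {None}
        if vp_a & vp_b:
            best = 0.5  # same vendor/product, other fields differ
    return best


print(cpe_match([
    {"cpe:2.3:a:lodash:lodash:4.17.20:*:*:*:*:node.js:*:*"},
    {"cpe:2.3:a:lodash:lodash:4.17.21:*:*:*:*:node.js:*:*"},
]))  # 0.5
```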
+
+#### `patch_lineage` (weight: 0.10)
+
+**New signal**: correlation via shared fix commits.
+
+1. Extract patch references from observation references (type: `patch`, `fix`, `commit`)
+2. Normalize to commit SHAs using `PatchLineageNormalizer`
+3. Any pairwise SHA match: 1.0; otherwise 0.0
+
+**Why this matters**: "These advisories fix the same code" is high-confidence evidence most platforms lack.
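The patch-lineage rule reduces to "does any pair of observations share a normalized commit SHA". The sketch below assumes that fix references are commit URLs ending in a full 40-hex SHA (for example GitHub `/commit/<sha>` or GitLab `/-/commit/<sha>`). The regular expression is a hypothetical stand-in and does not describe what `PatchLineageNormalizer` actually recognizes.

```python
import re
from itertools import combinations
from typing import Iterable, List, Set

# Hypothetical pattern: full 40-hex SHA after /commit/ or /commits/.
_COMMIT_SHA = re.compile(r"/commits?/([0-9a-f]{40})\b", re.IGNORECASE)


def commit_shas(reference_urls: Iterable[str]) -> Set[str]:
    return {m.group(1).lower() for url in reference_urls for m in _COMMIT_SHA.finditer(url)}


def patch_lineage(obs_refs: List[List[str]]) -> float:
    shas = [commit_shas(refs) for refs in obs_refs]
    return 1.0 if any(a & b for a, b in combinations(shas, 2)) else 0.0


sha = "3f786850e387550fdab836ed7e6dc881de23001b"
print(patch_lineage([
    [f"https://github.com/org/repo/commit/{sha}"],
    [f"https://gitlab.com/org/repo/-/commit/{sha.upper()}"],
]))  # 1.0
```

Lowercasing the SHA makes matches independent of how each source capitalizes hex digits.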
+
+#### `reference_overlap` (weight: 0.05)
+
+**Positive-only** (no conflict on zero overlap):
+
+1. Normalize URLs (lowercase, strip tracking parameters, force the `https://` scheme)
+2. Compute max pairwise overlap ratio
+3. Map to score: `0.5 + (overlap * 0.5)`
+
+| Scenario | Score |
+|----------|-------|
+| 100% URL overlap | 1.0 |
+| 50% URL overlap | 0.75 |
+| Zero URL overlap | 0.5 (neutral) |
+
+**No `reference-clash` emission** for simple disjoint sets.
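The three reference steps can be sketched with the standard `urllib.parse` helpers. Several details are assumptions rather than spec: the "overlap ratio" is taken to be Jaccard similarity (which reproduces the 50% to 0.75 row), the tracking-parameter list is hypothetical, and fragments are dropped.

```python
from itertools import combinations
from typing import List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

_TRACKING_PREFIXES = ("utm_",)          # hypothetical tracking-parameter rules
_TRACKING_KEYS = {"fbclid", "gclid"}


def normalize_url(url: str) -> str:
    parts = urlsplit(url.strip().lower())
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if not k.startswith(_TRACKING_PREFIXES) and k not in _TRACKING_KEYS]
    scheme = "https" if parts.scheme in ("http", "https") else parts.scheme
    return urlunsplit((scheme, parts.netloc, parts.path, urlencode(sorted(query)), ""))


def reference_overlap(obs_refs: List[List[str]]) -> float:
    sets = [{normalize_url(u) for u in refs} for refs in obs_refs]
    best = 0.0
    for a, b in combinations(sets, 2):
        if a and b:
            best = max(best, len(a & b) / len(a | b))  # Jaccard overlap (assumption)
    return 0.5 + best * 0.5  # zero overlap stays neutral at 0.5; no conflict emitted


print(normalize_url("http://NVD.nist.gov/vuln/detail/CVE-2025-1234?utm_source=feed"))
# https://nvd.nist.gov/vuln/detail/cve-2025-1234
```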
+
+#### `freshness_score` (weight: 0.05)
+
+Unchanged from v1:
+- Spread ≤ 48h: 1.0
+- Spread ≥ 14d: 0.0
+- Linear decay between
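The freshness rule is a clamped linear ramp over the spread of fetch timestamps. In this sketch, treating a single observation as fully fresh is an assumption.

```python
from datetime import datetime, timedelta
from typing import List

FULLY_FRESH = timedelta(hours=48)  # spread <= 48h scores 1.0
STALE = timedelta(days=14)         # spread >= 14d scores 0.0


def freshness_score(fetched_at: List[datetime]) -> float:
    if len(fetched_at) < 2:
        return 1.0  # assumption: a single observation has no spread
    spread = max(fetched_at) - min(fetched_at)
    if spread <= FULLY_FRESH:
        return 1.0
    if spread >= STALE:
        return 0.0
    return (STALE - spread) / (STALE - FULLY_FRESH)  # linear decay in between
```

For example, an 8-day spread sits halfway between 2 and 14 days, so it scores 0.5.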
+
+---
+
+## Conflict Emission (Typed Severities)
+
+### Severity Levels
+
+| Severity | Penalty Range | Meaning |
+|----------|---------------|---------|
+| **Hard** | 0.30 - 0.40 | Significant disagreement; likely prevents high-confidence linking |
+| **Soft** | 0.05 - 0.10 | Minor disagreement; link with reduced confidence |
+| **Info** | 0.00 | Informational; no penalty |
+
+### Conflict Types and Penalties
+
+| Conflict Reason | Severity | Penalty | Trigger Condition |
+|-----------------|----------|---------|-------------------|
+| `distinct-cves` | Hard | -0.40 | Two different CVE-* identifiers in cluster |
+| `disjoint-version-ranges` | Hard | -0.30 | Same package key, ranges have no intersection |
+| `alias-inconsistency` | Soft | -0.10 | Disconnected alias graph (but no CVE conflict) |
+| `affected-range-divergence` | Soft | -0.05 | Ranges overlap but differ |
+| `severity-mismatch` | Soft | -0.05 | CVSS base score delta > 1.0 |
+| `reference-clash` | Info | 0.00 | Reserved for true contradictions only |
+| `metadata-gap` | Info | 0.00 | Required provenance missing |
+
+### Penalty Calculation
+
+```
+typed_penalty = min(0.6, sum(penalty_per_conflict))
+```
+
+Saturates at 0.6 to prevent complete collapse; minimum confidence = 0.1 when any evidence exists.
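The penalty table and saturation rule translate directly into code. The reason-to-penalty map below copies the table. Like the pseudocode, this sketch sums over whatever conflict list it receives. Whether duplicates are removed before summing is left to the caller.

```python
from typing import Iterable

PENALTIES = {
    "distinct-cves": 0.40,              # Hard
    "disjoint-version-ranges": 0.30,    # Hard
    "alias-inconsistency": 0.10,        # Soft
    "affected-range-divergence": 0.05,  # Soft
    "severity-mismatch": 0.05,          # Soft
    "reference-clash": 0.00,            # Info
    "metadata-gap": 0.00,               # Info
}


def typed_penalty(reasons: Iterable[str]) -> float:
    return min(0.6, sum(PENALTIES[r] for r in reasons))  # saturates at 0.6


def final_confidence(base: float, reasons: Iterable[str]) -> float:
    return max(0.1, base - typed_penalty(reasons))  # floor of 0.1


print(final_confidence(0.5, ["distinct-cves", "disjoint-version-ranges"]))  # 0.1
```

Two Hard conflicts sum to 0.70 but the penalty saturates at 0.60, and a base of 0.5 then hits the 0.1 floor.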
+
+### Conflict Record Shape
+
+```json
+{
+  "field": "aliases",
+  "reason": "distinct-cves",
+  "severity": "Hard",
+  "values": ["nvd:CVE-2025-1234", "ghsa:CVE-2025-5678"],
+  "sourceIds": ["nvd", "ghsa"]
+}
+```
+
+---
+
+## Linkset Output Shape
+
+Additions relative to v1:
+
+```json
+{
+  "key": {
+    "vulnerabilityId": "CVE-2025-1234",
+    "productKey": "pkg:npm/lodash",
+    "confidence": 0.8175
+  },
+  "conflicts": [
+    {
+      "field": "affected.versions[pkg:npm/lodash]",
+      "reason": "affected-range-divergence",
+      "severity": "Soft",
+      "values": ["nvd:>=4.0.0,<4.17.21", "ghsa:>=4.0.0,<4.18.0"],
+      "sourceIds": ["nvd", "ghsa"]
+    }
+  ],
+  "signalScores": {
+    "aliasConnectivity": 1.0,
+    "aliasAuthority": 1.0,
+    "packageCoverage": 0.85,
+    "versionCompatibility": 0.6,
+    "cpeMatch": 0.5,
+    "patchLineage": 1.0,
+    "referenceOverlap": 0.75,
+    "freshness": 1.0
+  },
+  "provenance": {
+    "observationHashes": ["sha256:abc...", "sha256:def..."],
+    "toolVersion": "concelier/2.0.0",
+    "correlationVersion": "v2"
+  }
+}
+```
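To check the arithmetic behind a linkset, the weighted sum can be recomputed from `signalScores`. This sketch uses the default weights from the configuration section. The sum-to-1.0 check is an assumption added so overrides under `concelier:correlation:weights` keep the base score in [0, 1]. It applies the single Soft `affected-range-divergence` penalty (-0.05) from the example.

```python
from typing import Mapping

DEFAULT_WEIGHTS: Mapping[str, float] = {
    "aliasConnectivity": 0.30, "aliasAuthority": 0.10, "packageCoverage": 0.20,
    "versionCompatibility": 0.10, "cpeMatch": 0.10, "patchLineage": 0.10,
    "referenceOverlap": 0.05, "freshness": 0.05,
}


def base_confidence(signal_scores: Mapping[str, float],
                    weights: Mapping[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of per-signal scores, each expected in [0, 1]."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1.0")  # assumption, keeps base in [0, 1]
    return sum(weights[name] * signal_scores[name] for name in weights)


signal_scores = {
    "aliasConnectivity": 1.0, "aliasAuthority": 1.0, "packageCoverage": 0.85,
    "versionCompatibility": 0.6, "cpeMatch": 0.5, "patchLineage": 1.0,
    "referenceOverlap": 0.75, "freshness": 1.0,
}
base = base_confidence(signal_scores)
confidence = max(0.1, base - 0.05)  # one Soft conflict
print(round(base, 4), round(confidence, 4))  # 0.8675 0.8175
```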
+
+---
+
+## Algorithm Pseudocode
+
+```
+function Compute(observations):
+    if observations.empty:
+        return (confidence=1.0, conflicts=[])
+
+    conflicts = []
+
+    # 1. Alias connectivity (graph-based)
+    aliasGraph = buildBipartiteGraph(observations)
+    aliasConnectivity = LCC(aliasGraph) / observations.count
+    if hasDistinctCVEs(aliasGraph):
+        conflicts.add(HardConflict("distinct-cves"))
+    elif aliasConnectivity < 1.0:
+        conflicts.add(SoftConflict("alias-inconsistency"))
+
+    # 2. Alias authority
+    aliasAuthority = maxAuthorityScore(observations)
+
+    # 3. Package coverage (pairwise + IDF)
+    packageCoverage = computeIDFWeightedCoverage(observations)
+
+    # 4. Version compatibility
+    for sharedPackage in findSharedPackages(observations):
+        relation = classifyVersionRelation(observations, sharedPackage)
+        if relation == Disjoint:
+            conflicts.add(HardConflict("disjoint-version-ranges"))
+        elif relation == Overlapping:
+            conflicts.add(SoftConflict("affected-range-divergence"))
+    versionScore = averageRelationScore(observations)
+
+    # 5. CPE match
+    cpeScore = computeCpeOverlap(observations)
+
+    # 6. Patch lineage
+    patchScore = 1.0 if anyPairSharesCommitSHA(observations) else 0.0
+
+    # 7. Reference overlap (positive-only)
+    referenceScore = 0.5 + (maxPairwiseURLOverlap(observations) * 0.5)
+
+    # 8. Freshness
+    freshnessScore = computeFreshness(observations)
+
+    # Calculate weighted sum
+    baseConfidence = (
+        0.30 * aliasConnectivity +
+        0.10 * aliasAuthority +
+        0.20 * packageCoverage +
+        0.10 * versionScore +
+        0.10 * cpeScore +
+        0.10 * patchScore +
+        0.05 * referenceScore +
+        0.05 * freshnessScore
+    )
+
+    # Apply typed penalties
+    penalty = min(0.6, sum(conflict.penalty for conflict in conflicts))
+    finalConfidence = max(0.1, baseConfidence - penalty)
+
+    return (confidence=finalConfidence, conflicts=dedupe(conflicts))
+```
+
+---
+
+## Implementation
+
+### Code Locations
+
+| Component | Path |
+|-----------|------|
+| V2 Algorithm | `src/Concelier/__Libraries/StellaOps.Concelier.Core/Linksets/LinksetCorrelationV2.cs` |
+| Conflict Model | `src/Concelier/__Libraries/StellaOps.Concelier.Core/Linksets/AdvisoryLinkset.cs` |
+| Patch Normalizer | `src/Concelier/__Libraries/StellaOps.Concelier.Merge/Identity/Normalizers/PatchLineageNormalizer.cs` |
+| Version Resolver | `src/Concelier/__Libraries/StellaOps.Concelier.Merge/Comparers/SemanticVersionRangeResolver.cs` |
+
+### Configuration
+
+```yaml
+concelier:
+  correlation:
+    version: v2  # v1 | v2
+    weights:
+      aliasConnectivity: 0.30
+      aliasAuthority: 0.10
+      packageCoverage: 0.20
+      versionCompatibility: 0.10
+      cpeMatch: 0.10
+      patchLineage: 0.10
+      referenceOverlap: 0.05
+      freshness: 0.05
+    idf:
+      enabled: true
+      cacheKey: "concelier:package:idf"
+      refreshIntervalMinutes: 60
+    textSimilarity:
+      enabled: false  # Phase 3
+```
+
+---
+
+## Telemetry
+
+| Instrument | Type | Tags | Purpose |
+|------------|------|------|---------|
+| `concelier.linkset.confidence` | Histogram | `version` | Confidence score distribution |
+| `concelier.linkset.conflicts_total` | Counter | `reason`, `severity` | Conflict counts by type |
+| `concelier.linkset.signal_score` | Histogram | `signal` | Per-signal score distribution |
+| `concelier.linkset.patch_lineage_hits` | Counter | - | Patch SHA matches found |
+| `concelier.linkset.idf_cache_hit` | Counter | `hit` | IDF cache effectiveness |
+
+---
+
+## Migration
+
+### Recompute Job
+
+```bash
+stella db linksets recompute --correlation-version v2 --batch-size 1000
+```
+
+Recomputes confidence for existing linksets using v2 algorithm. Does not modify observation data.
+
+### Rollback
+
+Set `concelier:correlation:version` to `v1` to revert to intersection-based scoring.
+
+---
+
+## Fixtures
+
+- `docs/modules/concelier/samples/linkset-v2-transitive-bridge.json`: Three-source transitive bridging (A↔B↔C) demonstrating graph connectivity.
+- `docs/modules/concelier/samples/linkset-v2-patch-match.json`: Two-source correlation via shared commit SHA.
+- `docs/modules/concelier/samples/linkset-v2-hard-conflict.json`: Distinct CVEs in cluster triggering hard penalty.
+
+All fixtures use ASCII ordering and ISO-8601 UTC timestamps.
+
+---
+
+## Change Control
+
+- V2 is add-only relative to v1 output schema.
+- Signal weight adjustments require sprint note but not schema version bump.
+- New conflict reasons require `advisory.linkset.updated@2` event schema and doc update.
+- Removal of a signal requires deprecation period and migration guidance.
--- a/docs/modules/concelier/operations/conflict-resolution.md
+++ b/docs/modules/concelier/operations/conflict-resolution.md
@@ -81,6 +81,26 @@ Expect all logs at `Information`. Ensure OTEL exporters include the scope `Stell

 ## 5. Conflict Classification Matrix

+### 5.1 Linkset Conflicts (v2 Correlation)
+
+Linkset conflicts now carry typed severities that affect confidence scoring:
+
+| Severity | Penalty | Conflicts | Triage Priority |
+|----------|---------|-----------|-----------------|
+| **Hard** | -0.30 to -0.40 | `distinct-cves`, `disjoint-version-ranges` | High - investigate immediately |
+| **Soft** | -0.05 to -0.10 | `affected-range-divergence`, `severity-mismatch`, `alias-inconsistency` | Medium - review in batch |
+| **Info** | 0.00 | `metadata-gap`, `reference-clash` (true URL contradictions only) | Low - informational |
+
+| Conflict Reason | Severity | Likely Cause | Immediate Action |
+|-----------------|----------|--------------|------------------|
+| `distinct-cves` | Hard | Two different CVE-* IDs in same linkset cluster | Investigate alias mappings; likely compound advisory or incorrect aliasing |
+| `disjoint-version-ranges` | Hard | Same package, no version overlap between sources | Check if distro backport; verify connector range parsing |
+| `affected-range-divergence` | Soft | Ranges overlap but differ | Often benign (distro vs upstream versioning); monitor trends |
+| `severity-mismatch` | Soft | CVSS scores differ by > 1.0 | Normal for cross-source; freshest source typically wins |
+| `alias-inconsistency` | Soft | Disconnected alias graph (no conflicting CVE IDs) | Review alias extraction; may indicate unrelated advisories grouped |
+
+### 5.2 Merge Conflicts (Legacy)
+
 | Signal | Likely Cause | Immediate Action |
 |--------|--------------|------------------|
 | `reason="mismatch"` with `type="severity"` | Upstream feeds disagree on CVSS vector/severity. | Verify which feed is freshest; if correctness is known, adjust connector mapping or precedence override. |
--- a/docs/modules/concelier/operations/mirror.md
+++ b/docs/modules/concelier/operations/mirror.md
@@ -16,7 +16,7 @@ authn, CDN fronting, and the recurring sync pipeline that keeps mirror bundles c
  that hold `concelier` JSON bundles and `excititor` VEX exports.
 - **Persistent volumes** – storage for Concelier job metadata and mirror export trees.
  For Helm, provision PVCs (`concelier-mirror-jobs`, `concelier-mirror-exports`,
-  `excititor-mirror-exports`, `mirror-mongo-data`, `mirror-minio-data`) before rollout.
+  `excititor-mirror-exports`) before rollout.

 ### 1.1 Service configuration quick reference