devops folders consolidate

2026-01-25 23:27:41 +02:00
parent 6e687b523a
commit a743bb9a1d
613 changed files with 8611 additions and 41846 deletions
--- a/docs-archived/implplan/SPRINT_20260125_001_Concelier_linkset_correlation_v2.md
+++ b/docs-archived/implplan/SPRINT_20260125_001_Concelier_linkset_correlation_v2.md
@@ -0,0 +1,261 @@
+# Sprint 20260125-001 · Concelier Linkset Correlation v2
+
+## Topic & Scope
+- Fix critical failure modes in the current `LinksetCorrelation` algorithm (transitivity, reference clash, blunt penalties).
+- Introduce graph-based alias connectivity, version compatibility scoring, and patch lineage as correlation signals.
+- Replace static weights with IDF-weighted signals and typed conflict severities.
+- Preserve LNM/AOC contracts, determinism, and offline posture throughout.
+- **Working directory:** `src/Concelier/__Libraries/StellaOps.Concelier.Core/Linksets/` and related test projects.
+- **Expected evidence:** Unit tests with golden fixtures, telemetry counters, updated architecture docs.
+
+## Dependencies & Concurrency
+- Upstream: `CANONICAL_RECORDS.md` merge hash contract, `PatchLineageNormalizer`, `SemanticVersionRangeResolver`.
+- No cross-module changes expected; work stays within Concelier Core and Models.
+- Safe to run in parallel with connector work; linkset schema changes require an event version bump.
+
+## Documentation Prerequisites
+- `docs/modules/concelier/architecture.md`
+- `docs/modules/concelier/linkset-correlation-21-002.md`
+- `docs/modules/concelier/guides/aggregation.md`
+- `docs/modules/concelier/operations/conflict-resolution.md`
+- `src/Concelier/__Libraries/StellaOps.Concelier.Models/CANONICAL_RECORDS.md`
+- `src/Concelier/AGENTS.md`
+
+---
+
+## Delivery Tracker
+
+### CORR-V2-001 - Fix alias intersection transitivity
+Status: DONE
+Dependency: none
+Owners: Concelier · Backend
+
+Task description:
+Replace the intersection-across-all logic in `CalculateAliasScore` with graph-based connectivity scoring. Build a bipartite graph (observation ↔ alias nodes), compute the largest connected component (LCC) ratio, and return it as the coverage score. Raise an identity conflict (`distinct-cves`) only when **distinct CVEs** appear in the same cluster; a disconnected alias graph is reported as `alias-inconsistency` instead.
+
+Current failure: sources A (CVE-X), B (CVE-X + GHSA-Y), and C (GHSA-Y) produce an empty intersection, even though B links all three transitively.
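+
+The transitive case above can be checked with a small union-find. This is a minimal Python sketch, not the C# `CalculateAliasConnectivity` implementation. The function name `alias_connectivity`, the `obs:`/`alias:` node prefixes, and the upper-casing of aliases before linking are all illustrative assumptions. Observations and aliases become nodes, and each observation is joined to every alias it carries. The score is the share of observations that land in the largest component.
+
+```python
+from collections import Counter
+
+
+def alias_connectivity(observations: dict[str, set[str]]) -> float:
+    """Return the fraction of observations in the largest connected component."""
+    parent: dict[str, str] = {}
+
+    def find(node: str) -> str:
+        parent.setdefault(node, node)
+        while parent[node] != node:
+            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
+            node = parent[node]
+        return node
+
+    def union(a: str, b: str) -> None:
+        root_a, root_b = find(a), find(b)
+        if root_a != root_b:
+            # Attach the lexically larger root to the smaller one so root
+            # identity does not depend on input order.
+            parent[max(root_a, root_b)] = min(root_a, root_b)
+
+    for obs_id, aliases in observations.items():
+        obs_node = "obs:" + obs_id
+        find(obs_node)  # register observations that carry no aliases
+        for alias in aliases:
+            union(obs_node, "alias:" + alias.upper())
+
+    if not observations:
+        return 0.0
+    component_sizes = Counter(find("obs:" + obs_id) for obs_id in observations)
+    return max(component_sizes.values()) / len(observations)
+
+
+sources = {"A": {"CVE-X"}, "B": {"CVE-X", "GHSA-Y"}, "C": {"GHSA-Y"}}
+print(alias_connectivity(sources))  # 1.0, although no alias is common to all three
+```
+
+Adding a fourth source D that carries only CVE-Z would leave D in its own component and drop the score to 0.75.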
+
+Completion criteria:
+- [x] `CalculateAliasConnectivity` method computes LCC coverage (0.0–1.0) via union-find
+- [x] `alias-inconsistency` only emitted when disconnected; `distinct-cves` for true CVE conflicts
+- [x] Unit tests cover transitive bridging cases (3+ sources with partial overlap)
+- [x] 27 new V2 tests added in `LinksetCorrelationV2Tests.cs`
+
+### CORR-V2-002 - Fix PURL intersection transitivity
+Status: DONE
+Dependency: none
+Owners: Concelier · Backend
+
+Task description:
+Replace the intersection-across-all logic in `CalculatePurlScore` with pairwise and coverage scoring. A "thin" source with zero packages should not collapse the entire group score to 0. Compute:
+- Pairwise overlap: does any pair share a package key?
+- Coverage: fraction of observations with at least one shared package key.
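+
+Both measures can be read off a package-key index. The Python sketch below is illustrative only. `package_coverage` is a hypothetical name, and package keys are assumed to be already-normalized PURL strings. How the pairwise flag and the coverage fraction are blended into the final score is not specified here.
+
+```python
+def package_coverage(observations: dict[str, set[str]]) -> tuple[bool, float]:
+    """Return (any pair shares a package key, fraction of observations sharing one)."""
+    owners_by_key: dict[str, set[str]] = {}
+    for obs_id, keys in observations.items():
+        for key in keys:
+            owners_by_key.setdefault(key, set()).add(obs_id)
+
+    # An observation is covered if at least one of its keys also appears in another observation.
+    covered: set[str] = set()
+    for owners in owners_by_key.values():
+        if len(owners) > 1:
+            covered |= owners
+
+    if not observations:
+        return False, 0.0
+    return bool(covered), len(covered) / len(observations)
+
+
+group = {
+    "A": {"pkg:npm/foo"},
+    "B": {"pkg:npm/foo", "pkg:npm/bar"},
+    "C": set(),  # thin source: no packages at all
+}
+print(package_coverage(group))
+```
+
+An intersection across all three sources is empty because of `C`. Here the A/B pair still yields a positive result, with two-thirds coverage.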
+
+Completion criteria:
+- [x] `CalculatePackageCoverage` method computes pairwise + coverage
+- [x] Score > 0 when any pair shares package key (even if one source has none)
+- [x] Unit tests cover thin-source scenarios
+- [x] IDF weighting support via `packageIdfProvider` parameter
+
+### CORR-V2-003 - Fix reference conflict logic
+Status: DONE
+Dependency: none
+Owners: Concelier · Backend
+
+Task description:
+Stop emitting `reference-clash` when the overlap is simply zero. Zero overlap means "no supporting evidence" (neutral), not a conflict. Reserve `reference-clash` for true contradictions:
+- Same canonical URL used to support different global IDs
+- Same reference with contradictory classifiers (e.g., `patch` vs `exploit`)
+
+Completion criteria:
+- [x] `CalculateReferenceScore` returns 0.5 (neutral) on zero overlap
+- [x] No `reference-clash` emission for simple disjoint sets
+- [x] `NormalizeReferenceUrl` added (strip tracking params, normalize case/protocol)
+- [x] Unit tests verify no false-positive conflicts on disjoint reference sets
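+
+The following is a minimal Python sketch of URL normalization and the neutral zero-overlap rule. The source fixes only the neutral 0.5 on zero overlap. Everything else here is an illustrative assumption:
+- the specific tracking parameters (`utm_*`, `fbclid`, `gclid`);
+- the http-to-https upgrade;
+- dropping fragments and trailing slashes;
+- the `0.5 + 0.5 * shared / union` formula for non-zero overlap;
+- the function names.
+
+```python
+from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
+
+TRACKING_PREFIXES = ("utm_",)  # assumed tracking parameters
+TRACKING_KEYS = {"fbclid", "gclid"}
+
+
+def normalize_reference_url(url: str) -> str:
+    parts = urlsplit(url.strip())
+    scheme = parts.scheme.lower()
+    if scheme == "http":
+        scheme = "https"  # treat http and https as the same reference
+    query = [
+        (key, value)
+        for key, value in parse_qsl(parts.query, keep_blank_values=True)
+        if not key.lower().startswith(TRACKING_PREFIXES) and key.lower() not in TRACKING_KEYS
+    ]
+    path = parts.path.rstrip("/") or "/"
+    # Host is case-insensitive; path case is preserved; the fragment is dropped.
+    return urlunsplit((scheme, parts.netloc.lower(), path, urlencode(sorted(query)), ""))
+
+
+def reference_score(reference_sets: list[set[str]]) -> float:
+    normalized = [{normalize_reference_url(url) for url in refs} for refs in reference_sets]
+    non_empty = [refs for refs in normalized if refs]
+    if len(non_empty) < 2:
+        return 0.5  # nothing to compare: neutral
+    union = set().union(*non_empty)
+    shared = {url for url in union if sum(url in refs for refs in non_empty) > 1}
+    if not shared:
+        return 0.5  # disjoint sets are neutral, never a reference-clash
+    return 0.5 + 0.5 * len(shared) / len(union)
+```
+
+Keeping any overlap at or above 0.5 means that partial agreement never scores below the neutral value of no evidence.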
+
+### CORR-V2-004 - Typed conflict severities
+Status: DONE
+Dependency: none
+Owners: Concelier · Backend
+
+Task description:
+Replace the single `-0.1` conflict penalty with typed severity penalties:
+
+| Conflict Reason | Severity | Penalty |
+|-----------------|----------|---------|
+| Two different CVEs in cluster | Hard | -0.4 |
+| Disjoint version ranges (same pkg) | Hard | -0.3 |
+| Overlapping but divergent ranges | Soft | -0.05 |
+| CVSS/severity mismatch | Soft | -0.05 |
+| Zero reference overlap | n/a (no conflict emitted) | 0 |
+| Alias inconsistency (non-CVE) | Soft | -0.1 |
+
+Extend `AdvisoryLinksetConflict` with a `Severity` property typed by a new `ConflictSeverity` enum (`Hard`, `Soft`, `Info`).
+
+Completion criteria:
+- [x] `ConflictSeverity` enum added to `AdvisoryLinkset.cs`
+- [x] `AdvisoryLinksetConflict` extended with `Severity` property
+- [x] `CalculateTypedPenalty` uses per-conflict weights with saturation at 0.6
+- [x] Minimum confidence 0.1 when conflicts exist but evidence present
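+
+The penalty rules can be sketched in Python as follows. The reason codes `distinct-cves`, `disjoint-version-ranges`, `affected-range-divergence`, and `alias-inconsistency` appear in this sprint. `severity-mismatch` is a hypothetical label for the CVSS row. The following are interpretations of the criteria above, not confirmed behavior:
+- treating the 0.6 saturation as a cap on the summed penalty;
+- subtracting that penalty from the base score.
+
+```python
+# Penalty per conflict reason, taken from the severity table above.
+PENALTIES = {
+    "distinct-cves": 0.4,               # Hard
+    "disjoint-version-ranges": 0.3,     # Hard
+    "affected-range-divergence": 0.05,  # Soft
+    "severity-mismatch": 0.05,          # Soft (hypothetical reason code)
+    "alias-inconsistency": 0.1,         # Soft
+}
+PENALTY_CAP = 0.6     # saturation: summed penalties never exceed this
+MIN_CONFIDENCE = 0.1  # floor when conflicts exist but evidence is present
+
+
+def apply_typed_penalty(base_score: float, reasons: list[str], has_evidence: bool = True) -> float:
+    penalty = min(sum(PENALTIES.get(reason, 0.0) for reason in reasons), PENALTY_CAP)
+    score = max(base_score - penalty, 0.0)
+    if reasons and has_evidence:
+        score = max(score, MIN_CONFIDENCE)
+    return round(score, 4)  # trim floating-point noise such as 0.30000000000000004
+```
+
+With a base of 0.9, a distinct-CVE plus disjoint-range pair sums to 0.7, saturates at 0.6, and leaves 0.3. With a base of 0.3, the same pair is held at the 0.1 floor.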
+
+### CORR-V2-005 - Add patch lineage correlation signal
+Status: DONE
+Dependency: CORR-V2-001
+Owners: Concelier · Backend
+
+Task description:
+Extract patch references from observation references using the existing `PatchLineageNormalizer`, and add them as a top-tier correlation signal:
+- Exact commit SHA match: +1.0 (full weight)
+- No patch data: 0
+
+This is a differentiating signal that many vulnerability platforms lack: "these advisories fix the same code."
+
+Completion criteria:
+- [x] `CalculatePatchLineageScore` extracts and compares commit SHAs
+- [x] Weight 0.10 in unified scoring (configurable)
+- [x] `NormalizePatchReference` extracts SHAs from GitHub/GitLab URLs
+- [x] Unit tests with commit URL fixtures
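+
+A Python sketch of SHA extraction and the all-or-nothing score follows. It recognizes only `github.com/<owner>/<repo>/commit/<sha>` and `gitlab.com/<path>/-/commit/<sha>` with full 40-character SHAs. Self-hosted instances, abbreviated SHAs, and pull-request commit links are out of scope. `extract_commit_shas` and `patch_lineage_score` are hypothetical names, not the API of `PatchLineageNormalizer`.
+
+```python
+import re
+
+# GitHub: /<owner>/<repo>/commit/<sha>; GitLab: /<group>/<project>/-/commit/<sha>
+COMMIT_URL = re.compile(
+    r"^https?://(?:www\.)?(?:github\.com|gitlab\.com)/.+?/(?:-/)?commit/([0-9a-f]{40})(?:[/?#.]|$)",
+    re.IGNORECASE,
+)
+
+
+def extract_commit_shas(urls: set[str]) -> set[str]:
+    shas = set()
+    for url in urls:
+        match = COMMIT_URL.match(url.strip())
+        if match:
+            shas.add(match.group(1).lower())  # SHAs compare case-insensitively
+    return shas
+
+
+def patch_lineage_score(reference_sets: list[set[str]]) -> float:
+    """1.0 if any two observations cite the same commit, else 0.0."""
+    owners: dict[str, set[int]] = {}
+    for index, urls in enumerate(reference_sets):
+        for sha in extract_commit_shas(urls):
+            owners.setdefault(sha, set()).add(index)
+    return 1.0 if any(len(obs) > 1 for obs in owners.values()) else 0.0
+```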
+
+### CORR-V2-006 - Add version compatibility scoring
+Status: DONE
+Dependency: CORR-V2-002
+Owners: Concelier · Backend
+
+Task description:
+Classify version relationships per shared package key:
+- **Equivalent**: ranges identical → strong positive (1.0)
+- **Overlapping**: intersection non-empty but not equal → positive (0.6) + soft conflict
+- **Disjoint**: intersection empty → 0 + hard conflict
+
+Completion criteria:
+- [x] `CalculateVersionCompatibility` classifies range relationships
+- [x] `VersionRelation` enum { Equivalent, Overlapping, Disjoint, Unknown }
+- [x] `ClassifyVersionRelation` helper for set comparison
+- [x] `affected-range-divergence` (Soft) and `disjoint-version-ranges` (Hard) conflicts
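+
+A Python sketch of the classification follows. It assumes that each source's affected range for a package has already been resolved into a set of concrete versions; this sprint does not state how the real helper represents ranges. Two further choices are assumptions:
+- `Unknown` packages are skipped;
+- per-package scores are averaged.
+
+```python
+from enum import Enum
+
+
+class VersionRelation(Enum):
+    EQUIVALENT = "Equivalent"
+    OVERLAPPING = "Overlapping"
+    DISJOINT = "Disjoint"
+    UNKNOWN = "Unknown"
+
+
+SCORES = {VersionRelation.EQUIVALENT: 1.0, VersionRelation.OVERLAPPING: 0.6, VersionRelation.DISJOINT: 0.0}
+CONFLICTS = {
+    VersionRelation.OVERLAPPING: ("affected-range-divergence", "Soft"),
+    VersionRelation.DISJOINT: ("disjoint-version-ranges", "Hard"),
+}
+
+
+def classify_version_relation(a: set[str], b: set[str]) -> VersionRelation:
+    if not a or not b:
+        return VersionRelation.UNKNOWN  # one side has no resolvable range
+    if a == b:
+        return VersionRelation.EQUIVALENT
+    if a & b:
+        return VersionRelation.OVERLAPPING
+    return VersionRelation.DISJOINT
+
+
+def version_compatibility(shared: dict[str, tuple[set[str], set[str]]]):
+    """Average score over classifiable packages, plus the conflicts they raise."""
+    scores, conflicts = [], []
+    for package_key in sorted(shared):  # sorted for a deterministic conflict order
+        relation = classify_version_relation(*shared[package_key])
+        if relation is VersionRelation.UNKNOWN:
+            continue
+        scores.append(SCORES[relation])
+        if relation in CONFLICTS:
+            conflicts.append((package_key, *CONFLICTS[relation]))
+    score = sum(scores) / len(scores) if scores else 0.0
+    return score, conflicts
+```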
+
+### CORR-V2-007 - Add IDF weighting for package keys
+Status: DONE
+Dependency: CORR-V2-002
+Owners: Concelier · Backend
+
+Task description:
+Compute IDF-like weights for package keys based on corpus frequency:
+- Rare package match (e.g., `pkg:cargo/obscure-lib`) → higher discriminative weight
+- Common package match (e.g., `pkg:npm/lodash`) → lower weight
+
+Formula: `idf(pkg) = log(N / (1 + df(pkg)))`, where `N` is the total number of observations and `df(pkg)` is the number of observations containing `pkg`.
+
+Store the IDF cache in Valkey with an hourly refresh; fall back to uniform weights if the cache is unavailable.
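+
+The formula and the fallback can be sketched in Python, with Valkey caching omitted. As written, `log(N / (1 + df))` becomes slightly negative when a key appears in every observation. The following are assumptions:
+- clamping the result at 0;
+- the hypothetical names `compute_package_idf` and `package_weight`;
+- the neutral weight 1.0 used for the uniform fallback.
+
+```python
+import math
+from collections import Counter
+from typing import Callable, Optional
+
+
+def compute_package_idf(observations: list[set[str]]) -> dict[str, float]:
+    """idf(pkg) = log(N / (1 + df(pkg))), clamped at 0 (assumption)."""
+    n = len(observations)
+    df = Counter(key for keys in observations for key in keys)
+    return {key: max(0.0, math.log(n / (1 + count))) for key, count in sorted(df.items())}
+
+
+def package_weight(key: str, idf_provider: Optional[Callable[[str], Optional[float]]]) -> float:
+    """Uniform weight 1.0 when no provider is configured or the key is missing."""
+    if idf_provider is None:
+        return 1.0
+    weight = idf_provider(key)
+    return 1.0 if weight is None else weight
+
+
+corpus = [{"pkg:npm/lodash"}] * 9 + [{"pkg:npm/lodash", "pkg:cargo/obscure-lib"}]
+idf = compute_package_idf(corpus)
+print(idf)  # the rare crate gets a higher weight than lodash
+```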
+
+Completion criteria:
+- [x] `packageIdfProvider` parameter added to V2 algorithm (infrastructure ready)
+- [x] `PackageIdfService` computes and caches IDF scores in Valkey
+- [x] Graceful degradation to uniform weights on cache miss (null provider = uniform)
+- [x] Telemetry histogram `concelier.linkset.package_idf_weight`
+- [x] Unit tests with mocked corpus frequencies
+
+Implementation notes:
+- Created `IPackageIdfService.cs` interface with batch operations
+- Created `ValkeyPackageIdfService.cs` with Valkey caching, TTL, graceful degradation
+- Created `PackageIdfMetrics.cs` with OpenTelemetry instrumentation
+- Created `IdfRefreshHostedService.cs` for hourly background refresh
+- Extended `AdvisoryCacheKeys.cs` with IDF key schema
+- Updated `ServiceCollectionExtensions.cs` for DI registration
+- 18 unit tests covering keys, options, IDF formulas, and metrics
+
+### CORR-V2-008 - Integrate signals into unified scoring
+Status: DONE
+Dependency: CORR-V2-001, CORR-V2-002, CORR-V2-003, CORR-V2-004, CORR-V2-005, CORR-V2-006
+Owners: Concelier · Backend
+
+Task description:
+Refactor `LinksetCorrelation.Compute()` to use the new scorers:
+
+| Signal | Default Weight | Source |
+|--------|----------------|--------|
+| Alias connectivity | 0.30 | CalculateAliasConnectivity |
+| Alias authority | 0.10 | CalculateAliasAuthority |
+| Package coverage | 0.20 | CalculatePackageCoverage |
+| Version compatibility | 0.10 | CalculateVersionCompatibility |
+| CPE match | 0.10 | CalculateCpeScore |
+| Patch lineage | 0.10 | CalculatePatchLineageScore |
+| Reference overlap | 0.05 | CalculateReferenceScore |
+| Freshness | 0.05 | CalculateFreshnessScore |
+
+Apply typed conflict penalties after computing the base score. Ensure deterministic output by fixing the scorer order and tie-breakers.
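+
+A Python sketch of the weighted base score follows. The signal keys are hypothetical snake_case names for the scorers in the table. Two choices are assumptions: a missing signal counts as 0.0, and every input is clamped to [0, 1]. The sketch iterates a fixed tuple rather than a dict built from scorer output, so the floating-point summation order is identical on every run. Typed penalties would then be subtracted from the returned value.
+
+```python
+# Fixed order and default weights from the table above (they sum to 1.0).
+WEIGHTS = (
+    ("alias_connectivity", 0.30),
+    ("alias_authority", 0.10),
+    ("package_coverage", 0.20),
+    ("version_compatibility", 0.10),
+    ("cpe_match", 0.10),
+    ("patch_lineage", 0.10),
+    ("reference_overlap", 0.05),
+    ("freshness", 0.05),
+)
+
+
+def base_score(signals: dict[str, float]) -> float:
+    total = 0.0
+    for name, weight in WEIGHTS:  # fixed order gives identical float summation on every run
+        value = min(max(signals.get(name, 0.0), 0.0), 1.0)
+        total += weight * value
+    return round(total, 6)
+```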
+
+Completion criteria:
+- [x] `LinksetCorrelationV2.Compute()` implements unified scoring
+- [x] `LinksetCorrelationService` provides V1/V2 switchable interface
+- [x] `CorrelationServiceOptions` for configuration
+- [x] Confidence score stable across runs (deterministic)
+- [x] All 27 V2 tests pass; all 59 linkset tests pass
+
+### CORR-V2-009 - Update documentation
+Status: DONE
+Dependency: CORR-V2-008
+Owners: Documentation
+
+Task description:
+Update architecture and operational docs to reflect v2 correlation:
+- `docs/modules/concelier/linkset-correlation-21-002.md` → new version `linkset-correlation-v2.md`
+- `docs/modules/concelier/architecture.md` § 5.2 Linkset correlation
+- `docs/modules/concelier/operations/conflict-resolution.md` conflict severities
+
+Completion criteria:
+- [x] New `linkset-correlation-v2.md` with signal weights, conflict severities, algorithm overview
+- [x] Architecture doc section updated with V2 correlation table
+- [x] Conflict resolution runbook updated with new severity tiers (§ 5.1)
+- [x] ADR recorded in `docs/architecture/decisions/ADR-001-linkset-correlation-v2.md`
+
+### CORR-V2-010 - Add TF-IDF text similarity (Phase 3 prep)
+Status: DONE
+Dependency: CORR-V2-008
+Owners: Concelier · Backend
+
+Task description:
+Add deterministic TF-IDF text similarity as an optional correlation signal:
+- Tokenize normalized descriptions (existing `DescriptionNormalizer`)
+- Compute TF-IDF vectors per observation
+- Cosine similarity as feature (weight 0.05 by default)
+
+This is prep for Phase 3; disabled by default via feature flag `concelier:correlation:textSimilarity:enabled`.
+
+Completion criteria:
+- [x] `TextSimilarityScorer` computes TF-IDF cosine similarity
+- [x] Feature flag controls enablement (default: false)
+- [x] Deterministic tokenization (lowercase, stop-word removal, stemming optional)
+- [x] Unit tests with description fixtures
+- [x] Performance benchmark (target: ≤ 5ms per pair)
+
+Implementation notes:
+- Created `TextSimilarityScorer.cs` with pure C# TF-IDF implementation
+- Uses the smoothed IDF formula `log((N + 1) / (df + 1)) + 1` to avoid zero weights
+- Stop-word list includes common English words plus security-specific terms
+- 30 unit tests including determinism checks and real-world CVE fixtures
+- Performance benchmarks verify < 5ms per pair (typically < 0.5ms)
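+
+The same smoothed formula can be sketched in Python to show the shape of the computation. This is not `TextSimilarityScorer.cs`. The regex tokenizer, the tiny stop-word list, and the use of raw term counts as TF are assumptions, and no stemming is applied.
+
+```python
+import math
+import re
+from collections import Counter
+
+STOP_WORDS = frozenset({"a", "an", "and", "allows", "in", "is", "of", "or", "the", "to", "via"})
+TOKEN = re.compile(r"[a-z0-9]+")
+
+
+def tokenize(text: str) -> list[str]:
+    return [tok for tok in TOKEN.findall(text.lower()) if tok not in STOP_WORDS]
+
+
+def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
+    tokenized = [tokenize(doc) for doc in docs]
+    n = len(tokenized)
+    df = Counter(tok for toks in tokenized for tok in set(toks))
+    # Smoothed IDF: log((N + 1) / (df + 1)) + 1 is always >= 1, never zero.
+    idf = {tok: math.log((n + 1) / (count + 1)) + 1 for tok, count in df.items()}
+    return [{tok: tf * idf[tok] for tok, tf in Counter(toks).items()} for toks in tokenized]
+
+
+def cosine(u: dict[str, float], v: dict[str, float]) -> float:
+    # Sorted iteration fixes the float summation order regardless of dict insertion order.
+    dot = sum(weight * v.get(tok, 0.0) for tok, weight in sorted(u.items()))
+    norm_u = math.sqrt(sum(w * w for _, w in sorted(u.items())))
+    norm_v = math.sqrt(sum(w * w for _, w in sorted(v.items())))
+    if norm_u == 0.0 or norm_v == 0.0:
+        return 0.0
+    return dot / (norm_u * norm_v)
+```
+
+Descriptions that share no tokens score exactly 0.0, and an empty description also yields 0.0 instead of a division error.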
+
+---
+
+## Execution Log
+| Date (UTC) | Update | Owner |
+| --- | --- | --- |
+| 2026-01-25 | Sprint created from product advisory review; 10 tasks scoped across Phase 1-2 improvements. | Planning |
+| 2026-01-25 | Phase 1 implementation complete: CORR-V2-001 through CORR-V2-006 and CORR-V2-008/009 DONE. Created `LinksetCorrelationV2.cs`, `LinksetCorrelationService.cs`, `ILinksetCorrelationService.cs`. Extended `AdvisoryLinksetConflict` with `ConflictSeverity`. 27 new tests passing. Documentation updated. | Backend |
+| 2026-01-25 | CORR-V2-007 complete: Created `IPackageIdfService`, `ValkeyPackageIdfService`, `PackageIdfMetrics`, `IdfRefreshHostedService`. Extended `AdvisoryCacheKeys` with IDF key schema. 18 unit tests passing. | Backend |
+| 2026-01-25 | CORR-V2-009 ADR complete: Created `ADR-001-linkset-correlation-v2.md` documenting V2 algorithm decisions. | Documentation |
+| 2026-01-25 | CORR-V2-010 complete: Created `TextSimilarityScorer.cs` with pure C# TF-IDF implementation. 30 unit tests + benchmarks passing. All 10 sprint tasks DONE. Total: 89 linkset tests passing. | Backend |
+
+## Decisions & Risks
+- **Decision made**: Hard conflicts (distinct CVEs) still emit a linkset, with a minimum confidence of 0.1; downstream policy handles blocking.
+- **Risk**: IDF caching adds Valkey dependency; mitigated with graceful fallback to uniform weights (CORR-V2-007 complete).
+- **Risk**: Changing correlation weights affects existing linkset confidence scores; requires migration/recompute job.
+- **Risk**: Text similarity may add latency. It is feature-flagged (off by default) and benchmarked (CORR-V2-010 complete), and enablement is deferred to the GA readiness review.
+
+## Next Checkpoints
+- 2026-01-27: Review V2 implementation; validate against production dataset sample.
+- 2026-02-05: Cut pre-release with V2 enabled via feature flag for testing.
+- 2026-02-10: GA readiness review; evaluate text similarity impact on correlation quality.
+
+## Sprint Completion
+All 10 tasks DONE. Sprint ready for archive after the validation checkpoint.