devops folders consolidate

This commit is contained in:
master
2026-01-25 23:27:41 +02:00
parent 6e687b523a
commit a743bb9a1d
613 changed files with 8611 additions and 41846 deletions


@@ -0,0 +1,261 @@
# Sprint 20260125-001 · Concelier Linkset Correlation v2
## Topic & Scope
- Fix critical failure modes in current `LinksetCorrelation` algorithm (transitivity, reference clash, blunt penalties).
- Introduce graph-based alias connectivity, version compatibility scoring, and patch lineage as correlation signals.
- Replace static weights with IDF-weighted signals and typed conflict severities.
- Preserve LNM/AOC contracts, determinism, and offline posture throughout.
- **Working directory:** `src/Concelier/__Libraries/StellaOps.Concelier.Core/Linksets/` and related test projects.
- **Expected evidence:** Unit tests with golden fixtures, telemetry counters, updated architecture docs.
## Dependencies & Concurrency
- Upstream: `CANONICAL_RECORDS.md` merge hash contract, `PatchLineageNormalizer`, `SemanticVersionRangeResolver`.
- No cross-module changes expected; work stays within Concelier Core and Models.
- Safe to run in parallel with connector work; linkset schema changes require event version bump.
## Documentation Prerequisites
- `docs/modules/concelier/architecture.md`
- `docs/modules/concelier/linkset-correlation-21-002.md`
- `docs/modules/concelier/guides/aggregation.md`
- `docs/modules/concelier/operations/conflict-resolution.md`
- `src/Concelier/__Libraries/StellaOps.Concelier.Models/CANONICAL_RECORDS.md`
- `src/Concelier/AGENTS.md`
---
## Delivery Tracker
### CORR-V2-001 - Fix alias intersection transitivity
Status: DONE
Dependency: none
Owners: Concelier · Backend
Task description:
Replace the current `CalculateAliasScore` intersection-across-all logic with graph-based connectivity scoring. Build a bipartite graph (observation ↔ alias nodes), compute largest connected component (LCC) ratio, and return coverage score. Only emit `alias-inconsistency` when **distinct CVEs** appear in the same cluster (true identity conflict).
Current failure: Sources A (CVE-X), B (CVE-X + GHSA-Y), C (GHSA-Y) produce empty intersection despite transitive identity.
Completion criteria:
- [x] `CalculateAliasConnectivity` method computes LCC coverage (0.0–1.0) via union-find
- [x] `alias-inconsistency` emitted only when the alias graph is disconnected; `distinct-cves` emitted for true CVE conflicts
- [x] Unit tests cover transitive bridging cases (3+ sources with partial overlap)
- [x] 27 new V2 tests added in `LinksetCorrelationV2Tests.cs`
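
The connectivity idea above can be sketched in a few lines. This is a minimal Python illustration, not the C# `CalculateAliasConnectivity` implementation; the function name `alias_connectivity`, the input shape (observation id mapped to a set of alias strings), and the upper-casing of aliases are assumptions made for the example.

```python
from collections import defaultdict


def alias_connectivity(observations):
    """Return the share of observations inside the largest connected component.

    `observations` maps an observation id (str) to a set of alias strings.
    Observations are connected when they share an alias, directly or through
    a chain of other observations (the bipartite observation/alias graph).
    """
    if not observations:
        return 0.0

    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Deterministic merge: the smaller root always wins.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    for obs_id, aliases in observations.items():
        obs_node = ("obs", obs_id)
        parent.setdefault(obs_node, obs_node)
        for alias in aliases:
            alias_node = ("alias", alias.upper())  # assumed normalization
            parent.setdefault(alias_node, alias_node)
            union(obs_node, alias_node)

    # Count observation nodes (not alias nodes) per component.
    sizes = defaultdict(int)
    for obs_id in observations:
        sizes[find(("obs", obs_id))] += 1
    return max(sizes.values()) / len(observations)
```

With the failure case from the task description, `{"A": {"CVE-X"}, "B": {"CVE-X", "GHSA-Y"}, "C": {"GHSA-Y"}}`, all three observations land in one component because B bridges A and C, whereas a plain intersection across all three alias sets is empty.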
### CORR-V2-002 - Fix PURL intersection transitivity
Status: DONE
Dependency: none
Owners: Concelier · Backend
Task description:
Replace `CalculatePurlScore` intersection-across-all with pairwise + coverage scoring. A "thin" source with zero packages should not collapse the entire group score to 0. Compute:
- Pairwise overlap: does any pair share a package key?
- Coverage: fraction of observations with at least one shared package key.
Completion criteria:
- [x] `CalculatePackageCoverage` method computes pairwise + coverage
- [x] Score > 0 when any pair shares package key (even if one source has none)
- [x] Unit tests cover thin-source scenarios
- [x] IDF weighting support via `packageIdfProvider` parameter
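
A rough Python sketch of the two quantities described above, assuming each observation is reduced to a set of package keys. The name `package_coverage` and the decision to return the two values separately are assumptions; how `CalculatePackageCoverage` combines them into one score (and where IDF weights enter) is not stated here, so the sketch does not guess.

```python
from collections import Counter


def package_coverage(package_sets):
    """Return (any_pair_overlaps, coverage) for a list of package-key sets.

    A key is "shared" when at least two observations carry it. Coverage is
    the fraction of observations holding at least one shared key, so an
    observation with no packages lowers coverage but cannot zero the result.
    """
    if not package_sets:
        return False, 0.0

    # Number of observations in which each package key appears.
    counts = Counter(key for keys in package_sets for key in set(keys))
    shared = {key for key, n in counts.items() if n >= 2}

    covered = sum(1 for keys in package_sets if shared & set(keys))
    return bool(shared), covered / len(package_sets)
```

For two sources that both list `pkg:npm/a` plus one thin source with no packages, this returns `(True, 2/3)`, while intersection-across-all would be empty.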
### CORR-V2-003 - Fix reference conflict logic
Status: DONE
Dependency: none
Owners: Concelier · Backend
Task description:
Remove `reference-clash` emission when overlap is simply zero. Zero overlap = "no supporting evidence" (neutral), not a conflict. Reserve `reference-clash` for true contradictions:
- Same canonical URL used to support different global IDs
- Same reference with contradictory classifiers (e.g., `patch` vs `exploit`)
Completion criteria:
- [x] `CalculateReferenceScore` returns 0.5 (neutral) on zero overlap
- [x] No `reference-clash` emission for simple disjoint sets
- [x] `NormalizeReferenceUrl` added (strip tracking params, normalize case/protocol)
- [x] Unit tests verify no false-positive conflicts on disjoint reference sets
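
To make the neutral-on-zero-overlap rule concrete, here is a hedged Python sketch. It is not the `NormalizeReferenceUrl` or `CalculateReferenceScore` code: the tracking-parameter list, the http-to-https mapping, and the formula used when overlap is non-zero (0.5 plus half the best pairwise Jaccard overlap) are all assumptions for illustration. Only the 0.5 neutral value comes from the criteria above.

```python
from itertools import combinations
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; the real list is not given in this document.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"fbclid", "gclid"}


def normalize_reference_url(url):
    """Lower-case the host, unify http/https, drop tracking params and fragment."""
    parts = urlsplit(url.strip())
    scheme = "https" if parts.scheme in ("http", "https") else parts.scheme
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_KEYS
    )
    path = parts.path.rstrip("/")
    return urlunsplit((scheme, parts.netloc.lower(), path, urlencode(query), ""))


def reference_score(reference_sets):
    """Return 0.5 (neutral) on zero overlap; higher when any pair shares URLs."""
    normalized = [{normalize_reference_url(u) for u in refs} for refs in reference_sets]
    best = 0.0
    for a, b in combinations(normalized, 2):
        if a and b:
            best = max(best, len(a & b) / len(a | b))
    # Disjoint sets are "no supporting evidence", not a conflict.
    return 0.5 if best == 0.0 else 0.5 + 0.5 * best
```

Note that the sketch never emits a conflict at all; detecting a true `reference-clash` (same URL backing different global IDs, or contradictory classifiers) needs the IDs and classifiers, which this example leaves out.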
### CORR-V2-004 - Typed conflict severities
Status: DONE
Dependency: none
Owners: Concelier · Backend
Task description:
Replace the single `-0.1` conflict penalty with typed severity penalties:
| Conflict Reason | Severity | Penalty |
|-----------------|----------|---------|
| Two different CVEs in cluster | Hard | -0.4 |
| Disjoint version ranges (same pkg) | Hard | -0.3 |
| Overlapping but divergent ranges | Soft | -0.05 |
| CVSS/severity mismatch | Soft | -0.05 |
| Zero reference overlap | None | 0 |
| Alias inconsistency (non-CVE) | Soft | -0.1 |
Extend `AdvisoryLinksetConflict` with a `Severity` property backed by a `ConflictSeverity` enum (`Hard`, `Soft`, `Info`).
Completion criteria:
- [x] `ConflictSeverity` enum added to `AdvisoryLinkset.cs`
- [x] `AdvisoryLinksetConflict` extended with `Severity` property
- [x] `CalculateTypedPenalty` uses per-conflict weights with saturation at 0.6
- [x] Minimum confidence 0.1 when conflicts exist but evidence present
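
The penalty arithmetic can be illustrated as follows. This Python sketch takes the penalty values, the 0.6 saturation, and the 0.1 floor from this task; the function name `apply_typed_penalty`, the `severity-mismatch` reason code, and the reading of "evidence present" as "base score above zero" are assumptions, not the behavior of `CalculateTypedPenalty`.

```python
# Penalty magnitudes from the table above, keyed by conflict reason.
PENALTIES = {
    "distinct-cves": 0.40,
    "disjoint-version-ranges": 0.30,
    "affected-range-divergence": 0.05,
    "severity-mismatch": 0.05,  # hypothetical reason code for CVSS/severity mismatch
    "alias-inconsistency": 0.10,
}
MAX_PENALTY = 0.6     # saturation: stacked conflicts never subtract more than this
MIN_CONFIDENCE = 0.1  # floor when conflicts exist but evidence is present


def apply_typed_penalty(base_score, conflict_reasons):
    """Subtract the saturated sum of per-conflict penalties from the base score."""
    penalty = min(MAX_PENALTY, sum(PENALTIES.get(r, 0.0) for r in conflict_reasons))
    score = base_score - penalty
    if conflict_reasons and base_score > 0.0:
        # Assumed reading of "evidence present": a positive base score.
        score = max(MIN_CONFIDENCE, score)
    return round(max(0.0, min(1.0, score)), 4)
```

For example, a base score of 0.9 with both hard conflicts sums to a raw penalty of 0.7, which saturates at 0.6 and leaves 0.3; a base score of 0.5 with the same conflicts would go negative and is floored at 0.1.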
### CORR-V2-005 - Add patch lineage correlation signal
Status: DONE
Dependency: CORR-V2-001
Owners: Concelier · Backend
Task description:
Extract patch references from observation references using existing `PatchLineageNormalizer`. Add as a top-tier correlation signal:
- Exact commit SHA match: +1.0 (full weight)
- No patch data: 0
This is the differentiating signal most vulnerability platforms lack: "these advisories fix the same code."
Completion criteria:
- [x] `CalculatePatchLineageScore` extracts and compares commit SHAs
- [x] Weight 0.10 in unified scoring (configurable)
- [x] `NormalizePatchReference` extracts SHAs from GitHub/GitLab URLs
- [x] Unit tests with commit URL fixtures
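
A small Python sketch of SHA extraction and exact-match scoring, under stated assumptions: the regular expression covers the common GitHub (`/commit/<sha>`, `/pull/N/commits/<sha>`) and GitLab (`/-/commit/<sha>`) URL shapes only, and the names `extract_commit_sha` and `patch_lineage_score` are hypothetical. It does not reproduce `PatchLineageNormalizer` or `NormalizePatchReference`.

```python
import re
from itertools import combinations

# 7 to 40 hex characters after /commit/ or /commits/, not followed by more hex.
_COMMIT_RE = re.compile(r"/commits?/([0-9a-f]{7,40})(?![0-9a-f])", re.IGNORECASE)


def extract_commit_sha(url):
    """Return the lower-cased commit SHA found in a commit URL, or None."""
    match = _COMMIT_RE.search(url)
    return match.group(1).lower() if match else None


def patch_lineage_score(reference_sets):
    """Return 1.0 when any two observations cite the same commit SHA, else 0.0."""
    sha_sets = []
    for refs in reference_sets:
        shas = set()
        for url in refs:
            sha = extract_commit_sha(url)
            if sha:
                shas.add(sha)
        if shas:
            sha_sets.append(shas)

    # Fewer than two observations with patch data means no signal.
    for a, b in combinations(sha_sets, 2):
        if a & b:
            return 1.0
    return 0.0
```

Matching is by exact string equality after lower-casing, so an abbreviated SHA in one source does not match the full SHA in another; whether the real normalizer resolves abbreviations is not stated here.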
### CORR-V2-006 - Add version compatibility scoring
Status: DONE
Dependency: CORR-V2-002
Owners: Concelier · Backend
Task description:
Classify version relationships per shared package key:
- **Equivalent**: ranges identical → strong positive (1.0)
- **Overlapping**: intersection non-empty but not equal → positive (0.6) + soft conflict
- **Disjoint**: intersection empty → 0 + hard conflict
Completion criteria:
- [x] `CalculateVersionCompatibility` classifies range relationships
- [x] `VersionRelation` enum { Equivalent, Overlapping, Disjoint, Unknown }
- [x] `ClassifyVersionRelation` helper for set comparison
- [x] `affected-range-divergence` (Soft) and `disjoint-version-ranges` (Hard) conflicts
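
The classification can be sketched as a set comparison. This Python example assumes each source's affected range has already been expanded or normalized into a set of comparable tokens, which is a simplification; the real `ClassifyVersionRelation` may operate on range objects. The scores and conflict codes come from this task, while the treatment of `Unknown` as unscored is an assumption.

```python
from enum import Enum


class VersionRelation(Enum):
    EQUIVALENT = "Equivalent"
    OVERLAPPING = "Overlapping"
    DISJOINT = "Disjoint"
    UNKNOWN = "Unknown"


def classify_version_relation(range_a, range_b):
    """Classify two sets of affected-version tokens for one shared package key."""
    if not range_a or not range_b:
        return VersionRelation.UNKNOWN  # one side carries no version data
    if range_a == range_b:
        return VersionRelation.EQUIVALENT
    if range_a & range_b:
        return VersionRelation.OVERLAPPING
    return VersionRelation.DISJOINT


def score_version_relation(relation):
    """Map a relation to (score, conflict_reason); Unknown is left unscored."""
    table = {
        VersionRelation.EQUIVALENT: (1.0, None),
        VersionRelation.OVERLAPPING: (0.6, "affected-range-divergence"),
        VersionRelation.DISJOINT: (0.0, "disjoint-version-ranges"),
    }
    return table.get(relation, (None, None))
```

For instance, `{"1.0", "1.1"}` against `{"1.1", "1.2"}` is Overlapping and yields 0.6 plus a soft `affected-range-divergence` conflict.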
### CORR-V2-007 - Add IDF weighting for package keys
Status: DONE
Dependency: CORR-V2-002
Owners: Concelier · Backend
Task description:
Compute IDF-like weights for package keys based on corpus frequency:
- Rare package match (e.g., `pkg:cargo/obscure-lib`) → higher discriminative weight
- Common package match (e.g., `pkg:npm/lodash`) → lower weight
Formula: `idf(pkg) = log(N / (1 + df(pkg)))` where N = total observations, df = observations containing pkg.
Store IDF cache in Valkey with hourly refresh; fallback to uniform weights if cache unavailable.
Completion criteria:
- [x] `packageIdfProvider` parameter added to V2 algorithm (infrastructure ready)
- [x] `PackageIdfService` computes and caches IDF scores in Valkey
- [x] Graceful degradation to uniform weights on cache miss (null provider = uniform)
- [x] Telemetry histogram `concelier.linkset.package_idf_weight`
- [x] Unit tests with mocked corpus frequencies
Implementation notes:
- Created `IPackageIdfService.cs` interface with batch operations
- Created `ValkeyPackageIdfService.cs` with Valkey caching, TTL, graceful degradation
- Created `PackageIdfMetrics.cs` with OpenTelemetry instrumentation
- Created `IdfRefreshHostedService.cs` for hourly background refresh
- Extended `AdvisoryCacheKeys.cs` with IDF key schema
- Updated `ServiceCollectionExtensions.cs` for DI registration
- 18 unit tests covering keys, options, IDF formulas, and metrics
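
A worked Python example of the formula `idf(pkg) = log(N / (1 + df(pkg)))` and of the uniform fallback. The names `package_idf` and `package_weight`, the callable provider shape, and the clamp of negative values to zero are assumptions; they do not describe `ValkeyPackageIdfService`.

```python
import math


def package_idf(total_observations, document_frequency):
    """idf(pkg) = log(N / (1 + df(pkg))), using the natural logarithm."""
    return math.log(total_observations / (1 + document_frequency))


def package_weight(package_key, idf_provider=None):
    """Return the IDF weight for a key, or a uniform 1.0 when unavailable.

    `idf_provider` is a hypothetical callable returning a float or None.
    """
    if idf_provider is None:
        return 1.0  # null provider = uniform weights
    weight = idf_provider(package_key)
    if weight is None:
        return 1.0  # cache miss degrades to uniform
    return max(0.0, weight)  # assumed clamp; the raw formula can go negative
```

With N = 10,000 observations, a package seen in 4,999 of them scores log(10000 / 5000) ≈ 0.693, while a package seen in none scores log(10000) ≈ 9.21. The raw formula turns negative once `1 + df` exceeds N (a key present in every observation), which is why the sketch clamps at zero.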
### CORR-V2-008 - Integrate signals into unified scoring
Status: DONE
Dependency: CORR-V2-001, CORR-V2-002, CORR-V2-003, CORR-V2-004, CORR-V2-005, CORR-V2-006
Owners: Concelier · Backend
Task description:
Refactor `LinksetCorrelation.Compute()` to use the new scorers:
| Signal | Default Weight | Source |
|--------|----------------|--------|
| Alias connectivity | 0.30 | CalculateAliasConnectivity |
| Alias authority | 0.10 | CalculateAliasAuthority |
| Package coverage | 0.20 | CalculatePackageCoverage |
| Version compatibility | 0.10 | CalculateVersionCompatibility |
| CPE match | 0.10 | CalculateCpeScore |
| Patch lineage | 0.10 | CalculatePatchLineageScore |
| Reference overlap | 0.05 | CalculateReferenceScore |
| Freshness | 0.05 | CalculateFreshnessScore |
Apply typed conflict penalties after base score. Ensure deterministic output by fixing scorer order and tie-breakers.
Completion criteria:
- [x] `LinksetCorrelationV2.Compute()` implements unified scoring
- [x] `LinksetCorrelationService` provides V1/V2 switchable interface
- [x] `CorrelationServiceOptions` for configuration
- [x] Confidence score stable across runs (deterministic)
- [x] All 27 V2 tests pass; all 59 linkset tests pass
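
The weighted combination can be sketched as below. The weights are the defaults from the table in this task; the signal key names, the `base_confidence` function, the treatment of a missing signal as 0.0, and the rounding to four decimals are assumptions for the example rather than the behavior of `LinksetCorrelationV2.Compute()`.

```python
# Default weights from the table above, kept in a fixed order so that
# floating-point accumulation is identical on every run.
DEFAULT_WEIGHTS = (
    ("alias_connectivity", 0.30),
    ("alias_authority", 0.10),
    ("package_coverage", 0.20),
    ("version_compatibility", 0.10),
    ("cpe_match", 0.10),
    ("patch_lineage", 0.10),
    ("reference_overlap", 0.05),
    ("freshness", 0.05),
)


def base_confidence(signal_scores, weights=DEFAULT_WEIGHTS):
    """Return the weighted sum of signal scores, clamped to [0, 1].

    `signal_scores` maps a signal name to a value in [0, 1]; a missing
    signal contributes 0.0 (an assumption of this sketch).
    """
    total = 0.0
    for name, weight in weights:
        total += weight * signal_scores.get(name, 0.0)
    return round(min(1.0, max(0.0, total)), 4)
```

The eight default weights sum to 1.0, so a linkset with every signal at 1.0 reaches a base confidence of 1.0 before typed conflict penalties are subtracted.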
### CORR-V2-009 - Update documentation
Status: DONE
Dependency: CORR-V2-008
Owners: Documentation
Task description:
Update architecture and operational docs to reflect v2 correlation:
- `docs/modules/concelier/linkset-correlation-21-002.md` → new version `linkset-correlation-v2.md`
- `docs/modules/concelier/architecture.md` § 5.2 Linkset correlation
- `docs/modules/concelier/operations/conflict-resolution.md` conflict severities
Completion criteria:
- [x] New `linkset-correlation-v2.md` with signal weights, conflict severities, algorithm overview
- [x] Architecture doc section updated with V2 correlation table
- [x] Conflict resolution runbook updated with new severity tiers (§ 5.1)
- [x] ADR recorded in `docs/architecture/decisions/ADR-001-linkset-correlation-v2.md`
### CORR-V2-010 - Add TF-IDF text similarity (Phase 3 prep)
Status: DONE
Dependency: CORR-V2-008
Owners: Concelier · Backend
Task description:
Add deterministic TF-IDF text similarity as an optional correlation signal:
- Tokenize normalized descriptions (existing `DescriptionNormalizer`)
- Compute TF-IDF vectors per observation
- Cosine similarity as feature (weight 0.05 by default)
This is prep for Phase 3; disabled by default via feature flag `concelier:correlation:textSimilarity:enabled`.
Completion criteria:
- [x] `TextSimilarityScorer` computes TF-IDF cosine similarity
- [x] Feature flag controls enablement (default: false)
- [x] Deterministic tokenization (lowercase, stop-word removal, stemming optional)
- [x] Unit tests with description fixtures
- [x] Performance benchmark (target: ≤ 5ms per pair)
Implementation notes:
- Created `TextSimilarityScorer.cs` with pure C# TF-IDF implementation
- Uses smoothed IDF formula: log((N+1)/(df+1)) + 1 to avoid zero weights
- Stop word list includes common English words + security-specific terms
- 30 unit tests including determinism checks and real-world CVE fixtures
- Performance benchmarks verify < 5ms per pair (typically < 0.5ms)
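
For reference, a compact Python sketch of TF-IDF cosine similarity using the smoothed IDF noted above, `log((N+1)/(df+1)) + 1`. It is not `TextSimilarityScorer.cs`: the tokenizer, the tiny stop-word list, the raw-count term frequency, and the way the corpus is passed in are all assumptions for illustration.

```python
import math
import re
from collections import Counter

# Illustrative stop words only; the real list is larger and security-aware.
STOP_WORDS = frozenset({"a", "an", "the", "in", "of", "to", "and", "is", "via"})


def tokenize(text):
    """Lower-case, split on non-alphanumerics, drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def tfidf_cosine(text_a, text_b, corpus):
    """Cosine similarity of two descriptions under smoothed TF-IDF weights."""
    docs = [tokenize(doc) for doc in corpus]
    n = len(docs)
    # Document frequency: number of corpus documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))

    def vector(text):
        tf = Counter(tokenize(text))
        return {t: c * (math.log((n + 1) / (df[t] + 1)) + 1.0) for t, c in tf.items()}

    vec_a, vec_b = vector(text_a), vector(text_b)
    # Sorted iteration keeps floating-point summation order stable across runs.
    dot = sum(vec_a[t] * vec_b[t] for t in sorted(vec_a.keys() & vec_b.keys()))
    norm_a = math.sqrt(sum(v * v for v in sorted(vec_a.values())))
    norm_b = math.sqrt(sum(v * v for v in sorted(vec_b.values())))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Because of the `+ 1` in the smoothed formula, a term that appears in every corpus document still carries weight 1.0 instead of 0, so two descriptions made only of common terms can still be compared.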
---
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-01-25 | Sprint created from product advisory review; 10 tasks scoped across Phase 1-2 improvements. | Planning |
| 2026-01-25 | Phase 1 implementation complete: CORR-V2-001 through CORR-V2-006 and CORR-V2-008/009 DONE. Created `LinksetCorrelationV2.cs`, `LinksetCorrelationService.cs`, `ILinksetCorrelationService.cs`. Extended `AdvisoryLinksetConflict` with `ConflictSeverity`. 27 new tests passing. Documentation updated. | Backend |
| 2026-01-25 | CORR-V2-007 complete: Created `IPackageIdfService`, `ValkeyPackageIdfService`, `PackageIdfMetrics`, `IdfRefreshHostedService`. Extended `AdvisoryCacheKeys` with IDF key schema. 18 unit tests passing. | Backend |
| 2026-01-25 | CORR-V2-009 ADR complete: Created `ADR-001-linkset-correlation-v2.md` documenting V2 algorithm decisions. | Documentation |
| 2026-01-25 | CORR-V2-010 complete: Created `TextSimilarityScorer.cs` with pure C# TF-IDF implementation. 30 unit tests + benchmarks passing. All 10 sprint tasks DONE. Total: 89 linkset tests passing. | Backend |
## Decisions & Risks
- **Decision made**: Hard conflicts (distinct CVEs) still emit a linkset, with confidence floored at 0.1; downstream policy handles blocking.
- **Risk**: IDF caching adds Valkey dependency; mitigated with graceful fallback to uniform weights (CORR-V2-007 complete).
- **Risk**: Changing correlation weights affects existing linkset confidence scores; requires migration/recompute job.
- **Risk**: Text similarity may add latency; the signal is feature-flagged (disabled by default) and benchmarked before GA (CORR-V2-010 complete).
## Next Checkpoints
- 2026-01-27: Review V2 implementation; validate against production dataset sample.
- 2026-02-05: Cut pre-release with V2 enabled via feature flag for testing.
- 2026-02-10: GA readiness review; evaluate text similarity impact on correlation quality.
## Sprint Completion
All 10 tasks DONE. Sprint ready for archive after validation checkpoint.