devops folders consolidate
@@ -0,0 +1,261 @@
# Sprint 20260125-001 · Concelier Linkset Correlation v2

## Topic & Scope
- Fix critical failure modes in the current `LinksetCorrelation` algorithm (transitivity, reference clash, blunt penalties).
- Introduce graph-based alias connectivity, version compatibility scoring, and patch lineage as correlation signals.
- Replace static weights with IDF-weighted signals and typed conflict severities.
- Preserve LNM/AOC contracts, determinism, and offline posture throughout.
- **Working directory:** `src/Concelier/__Libraries/StellaOps.Concelier.Core/Linksets/` and related test projects.
- **Expected evidence:** Unit tests with golden fixtures, telemetry counters, updated architecture docs.

## Dependencies & Concurrency
- Upstream: `CANONICAL_RECORDS.md` merge hash contract, `PatchLineageNormalizer`, `SemanticVersionRangeResolver`.
- No cross-module changes expected; work stays within Concelier Core and Models.
- Safe to run in parallel with connector work; linkset schema changes require an event version bump.

## Documentation Prerequisites
- `docs/modules/concelier/architecture.md`
- `docs/modules/concelier/linkset-correlation-21-002.md`
- `docs/modules/concelier/guides/aggregation.md`
- `docs/modules/concelier/operations/conflict-resolution.md`
- `src/Concelier/__Libraries/StellaOps.Concelier.Models/CANONICAL_RECORDS.md`
- `src/Concelier/AGENTS.md`

---

## Delivery Tracker

### CORR-V2-001 - Fix alias intersection transitivity
Status: DONE
Dependency: none
Owners: Concelier · Backend

Task description:
Replace the current `CalculateAliasScore` intersection-across-all logic with graph-based connectivity scoring. Build a bipartite graph (observation ↔ alias nodes), compute the largest connected component (LCC) ratio, and return it as the coverage score. Only emit `alias-inconsistency` when **distinct CVEs** appear in the same cluster (true identity conflict).

Current failure: Sources A (CVE-X), B (CVE-X + GHSA-Y), and C (GHSA-Y) produce an empty intersection despite transitive identity.

Completion criteria:
- [x] `CalculateAliasConnectivity` method computes LCC coverage (0.0–1.0) via union-find
- [x] `alias-inconsistency` only emitted when disconnected; `distinct-cves` for true CVE conflicts
- [x] Unit tests cover transitive bridging cases (3+ sources with partial overlap)
- [x] 27 new V2 tests added in `LinksetCorrelationV2Tests.cs`
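
The connectivity scoring described above can be sketched in Python as follows. This is a minimal illustration, not the C# `CalculateAliasConnectivity` implementation. Several choices are assumptions: the helper name `alias_connectivity`, case-insensitive alias matching, and checking every component (not only the largest) for distinct CVEs. Alias nodes are folded in through a first-seen map, which produces the same observation components as an explicit bipartite graph.

```python
class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def alias_connectivity(observations: list[set[str]]) -> tuple[float, bool]:
    """Return (LCC coverage in [0, 1], whether any component holds 2+ distinct CVEs).

    Hypothetical helper; alias comparison is case-insensitive by assumption.
    """
    n = len(observations)
    if n == 0:
        return 0.0, False
    uf = UnionFind(n)
    # Sorted, normalized aliases keep the union order deterministic.
    normalized = [sorted({a.strip().upper() for a in obs}) for obs in observations]
    first_seen: dict[str, int] = {}  # alias node -> first observation carrying it
    for i, aliases in enumerate(normalized):
        for alias in aliases:
            if alias in first_seen:
                uf.union(i, first_seen[alias])  # shared alias bridges both observations
            else:
                first_seen[alias] = i
    components: dict[int, list[int]] = {}
    for i in range(n):
        components.setdefault(uf.find(i), []).append(i)
    coverage = max(len(members) for members in components.values()) / n
    distinct_cves = any(
        len({a for i in members for a in normalized[i] if a.startswith("CVE-")}) > 1
        for members in components.values()
    )
    return coverage, distinct_cves
```

In the transitive example from the task (A: CVE-X, B: CVE-X + GHSA-Y, C: GHSA-Y), all three observations fall into one component. Coverage is therefore 1.0 and no conflict is raised, whereas the old intersection across all three sources was empty.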

### CORR-V2-002 - Fix PURL intersection transitivity
Status: DONE
Dependency: none
Owners: Concelier · Backend

Task description:
Replace the `CalculatePurlScore` intersection-across-all logic with pairwise + coverage scoring. A "thin" source with zero packages should not collapse the entire group score to 0. Compute:
- Pairwise overlap: does any pair share a package key?
- Coverage: fraction of observations with at least one shared package key.

Completion criteria:
- [x] `CalculatePackageCoverage` method computes pairwise + coverage
- [x] Score > 0 when any pair shares a package key (even if one source has none)
- [x] Unit tests cover thin-source scenarios
- [x] IDF weighting support via `packageIdfProvider` parameter
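
Below is a minimal Python sketch of the pairwise + coverage computation. It assumes each observation has been reduced to a set of package keys (for example, version-less PURLs). The name `package_coverage` is hypothetical, and IDF weighting is left out for brevity. "Any pair shares a key" is equivalent to "some key appears in at least two observations", so no explicit pair loop is needed.

```python
from collections import Counter


def package_coverage(observations: list[set[str]]) -> tuple[bool, float]:
    """Return (any pair shares a package key, fraction of observations sharing a key)."""
    n = len(observations)
    if n < 2:
        return False, 0.0
    # Each observation is a set, so a key is counted at most once per observation.
    key_counts = Counter(key for obs in observations for key in obs)
    shared_keys = {key for key, count in key_counts.items() if count >= 2}
    covered = sum(1 for obs in observations if obs & shared_keys)
    return bool(shared_keys), covered / n
```

Take two sources that both list `pkg:npm/left-pad` plus a thin third source with no packages. The sketch returns an overlap of `True` with coverage 2/3, where an intersection across all three would have been empty.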

### CORR-V2-003 - Fix reference conflict logic
Status: DONE
Dependency: none
Owners: Concelier · Backend

Task description:
Remove `reference-clash` emission when overlap is simply zero. Zero overlap = "no supporting evidence" (neutral), not a conflict. Reserve `reference-clash` for true contradictions:
- Same canonical URL used to support different global IDs
- Same reference with contradictory classifiers (e.g., `patch` vs `exploit`)

Completion criteria:
- [x] `CalculateReferenceScore` returns 0.5 (neutral) on zero overlap
- [x] No `reference-clash` emission for simple disjoint sets
- [x] `NormalizeReferenceUrl` added (strip tracking params, normalize case/protocol)
- [x] Unit tests verify no false-positive conflicts on disjoint reference sets
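
The normalization and neutral-score rules can be sketched in Python with `urllib.parse`. Only the 0.5-on-zero-overlap behavior comes from the criteria above. Everything else is an assumption for illustration: the tracking-parameter list, dropping `www.` and URL fragments, upgrading `http` to `https`, and the `0.5 + 0.5 * Jaccard` mapping for non-zero overlap. Scoring is pairwise here for brevity.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracking-parameter rules.
TRACKING_KEYS = {"fbclid", "gclid"}
TRACKING_PREFIXES = ("utm_",)


def normalize_reference_url(url: str) -> str:
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    if scheme == "http":
        scheme = "https"  # treat http/https variants as the same reference
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in TRACKING_KEYS and not k.lower().startswith(TRACKING_PREFIXES)
    )
    path = parts.path.rstrip("/") or "/"
    # The fragment is dropped; the path keeps its case because paths are case-sensitive.
    return urlunsplit((scheme, host, path, urlencode(query), ""))


def reference_score(refs_a: list[str], refs_b: list[str]) -> float:
    a = {normalize_reference_url(u) for u in refs_a}
    b = {normalize_reference_url(u) for u in refs_b}
    shared = a & b
    if not shared:
        return 0.5  # no supporting evidence: neutral, never a reference-clash
    return 0.5 + 0.5 * len(shared) / len(a | b)
```

With these rules, `HTTP://WWW.Example.com/advisory/1/?utm_source=feed&id=7#top` normalizes to `https://example.com/advisory/1?id=7`. Two disjoint reference sets score exactly 0.5 instead of triggering a conflict.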

### CORR-V2-004 - Typed conflict severities
Status: DONE
Dependency: none
Owners: Concelier · Backend

Task description:
Replace the single `-0.1` conflict penalty with typed severity penalties:

| Conflict Reason | Severity | Penalty |
|-----------------|----------|---------|
| Two different CVEs in cluster | Hard | -0.4 |
| Disjoint version ranges (same pkg) | Hard | -0.3 |
| Overlapping but divergent ranges | Soft | -0.05 |
| CVSS/severity mismatch | Soft | -0.05 |
| Zero reference overlap | None (no conflict emitted) | 0 |
| Alias inconsistency (non-CVE) | Soft | -0.1 |

Extend `AdvisoryLinksetConflict` with a `Severity` property of type `ConflictSeverity` (`Hard`, `Soft`, `Info`).

Completion criteria:
- [x] `ConflictSeverity` enum added to `AdvisoryLinkset.cs`
- [x] `AdvisoryLinksetConflict` extended with `Severity` property
- [x] `CalculateTypedPenalty` uses per-conflict weights, with the total penalty saturating at 0.6
- [x] Minimum confidence of 0.1 when conflicts exist but evidence is present
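
The penalty table and the two numeric rules in the criteria (penalty saturation at 0.6 and a confidence floor of 0.1) can be sketched in Python. The reason code `severity-mismatch` is a hypothetical stand-in for the CVSS/severity row, because this document does not name that code. Treating unknown reasons as zero-penalty `Info` conflicts is also an assumption.

```python
from enum import Enum


class ConflictSeverity(Enum):
    HARD = "hard"
    SOFT = "soft"
    INFO = "info"


# reason code -> (severity, penalty), mirroring the table above.
PENALTIES = {
    "distinct-cves": (ConflictSeverity.HARD, 0.4),
    "disjoint-version-ranges": (ConflictSeverity.HARD, 0.3),
    "affected-range-divergence": (ConflictSeverity.SOFT, 0.05),
    "severity-mismatch": (ConflictSeverity.SOFT, 0.05),  # hypothetical reason code
    "alias-inconsistency": (ConflictSeverity.SOFT, 0.1),
}
MAX_PENALTY = 0.6     # total penalty saturates here
MIN_CONFIDENCE = 0.1  # floor when conflicts exist but evidence is present


def severity_of(reason: str) -> ConflictSeverity:
    return PENALTIES.get(reason, (ConflictSeverity.INFO, 0.0))[0]


def typed_penalty(reasons: list[str]) -> float:
    total = sum(PENALTIES.get(r, (ConflictSeverity.INFO, 0.0))[1] for r in reasons)
    return min(total, MAX_PENALTY)


def apply_penalties(base_score: float, reasons: list[str], has_evidence: bool) -> float:
    score = base_score - typed_penalty(reasons)
    if reasons and has_evidence:
        score = max(score, MIN_CONFIDENCE)
    return max(0.0, min(1.0, score))
```

Two hard conflicts (0.4 + 0.3) saturate at 0.6. A weak base score of 0.3 with one `distinct-cves` conflict is floored at 0.1 rather than dropping to zero, so the linkset is still emitted and downstream policy decides whether to block it.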

### CORR-V2-005 - Add patch lineage correlation signal
Status: DONE
Dependency: CORR-V2-001
Owners: Concelier · Backend

Task description:
Extract patch references from observation references using the existing `PatchLineageNormalizer`. Add them as a top-tier correlation signal:
- Exact commit SHA match: +1.0 (full weight)
- No patch data: 0

This is the differentiating signal most vulnerability platforms lack: "these advisories fix the same code."

Completion criteria:
- [x] `CalculatePatchLineageScore` extracts and compares commit SHAs
- [x] Weight 0.10 in unified scoring (configurable)
- [x] `NormalizePatchReference` extracts SHAs from GitHub/GitLab URLs
- [x] Unit tests with commit URL fixtures
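
Here is a Python sketch that extracts commit SHAs from GitHub/GitLab commit URLs and turns an exact match into the full-weight signal. The regular expression, the 7 to 40 hex-character SHA length, and returning 0 when patch data exists but no SHA matches are all assumptions. The real `NormalizePatchReference` and `PatchLineageNormalizer` may accept more URL shapes. Abbreviated and full SHAs for the same commit are not matched to each other in this sketch.

```python
import re

# Matches e.g. https://github.com/<owner>/<repo>/commit/<sha>,
# https://github.com/<owner>/<repo>/pull/<n>/commits/<sha>, and
# https://gitlab.com/<group>/<project>/-/commit/<sha>.
COMMIT_URL = re.compile(
    r"^https?://(?:www\.)?(?:github\.com|gitlab\.com)/"
    r"[^?#]+?/(?:-/)?commits?/([0-9a-fA-F]{7,40})(?=$|[/?#.])"
)


def extract_commit_shas(references: list[str]) -> set[str]:
    shas = set()
    for url in references:
        match = COMMIT_URL.match(url.strip())
        if match:
            shas.add(match.group(1).lower())  # SHAs compare case-insensitively
    return shas


def patch_lineage_score(observation_refs: list[list[str]]) -> float:
    """1.0 if two observations cite the same commit SHA, else 0.0."""
    seen: set[str] = set()
    for refs in observation_refs:
        shas = extract_commit_shas(refs)
        if shas & seen:
            return 1.0  # exact commit match across observations: full weight
        seen |= shas
    return 0.0
```

Here a GitHub commit link in one advisory and a GitLab mirror link to the same SHA in another yield 1.0. Advisories that cite only NVD or vendor pages contribute nothing to this signal.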

### CORR-V2-006 - Add version compatibility scoring
Status: DONE
Dependency: CORR-V2-002
Owners: Concelier · Backend

Task description:
Classify version relationships per shared package key:
- **Equivalent**: ranges identical → strong positive (1.0)
- **Overlapping**: intersection non-empty but not equal → positive (0.6) + soft conflict
- **Disjoint**: intersection empty → 0 + hard conflict

Completion criteria:
- [x] `CalculateVersionCompatibility` classifies range relationships
- [x] `VersionRelation` enum { Equivalent, Overlapping, Disjoint, Unknown }
- [x] `ClassifyVersionRelation` helper for set comparison
- [x] `affected-range-divergence` (Soft) and `disjoint-version-ranges` (Hard) conflicts
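
The following Python sketch shows `ClassifyVersionRelation`-style set comparison. It rests on three assumptions. First, each source's affected range for a package key has already been expanded into a set of concrete version strings; the real implementation presumably works from `SemanticVersionRangeResolver` output instead. Second, keys with missing range data are classified `Unknown` and skipped. Third, per-key scores are averaged.

```python
from enum import Enum


class VersionRelation(Enum):
    EQUIVALENT = "equivalent"
    OVERLAPPING = "overlapping"
    DISJOINT = "disjoint"
    UNKNOWN = "unknown"


RELATION_SCORE = {
    VersionRelation.EQUIVALENT: 1.0,
    VersionRelation.OVERLAPPING: 0.6,
    VersionRelation.DISJOINT: 0.0,
}
RELATION_CONFLICT = {
    VersionRelation.OVERLAPPING: "affected-range-divergence",  # Soft
    VersionRelation.DISJOINT: "disjoint-version-ranges",       # Hard
}


def classify_version_relation(a: frozenset[str], b: frozenset[str]) -> VersionRelation:
    if not a or not b:
        return VersionRelation.UNKNOWN  # no range data on one side
    if a == b:
        return VersionRelation.EQUIVALENT
    if a & b:
        return VersionRelation.OVERLAPPING
    return VersionRelation.DISJOINT


def version_compatibility(
    ranges_a: dict[str, frozenset[str]], ranges_b: dict[str, frozenset[str]]
) -> tuple[float, list[tuple[str, str]]]:
    """Mean score over shared package keys plus (key, conflict reason) pairs."""
    scores: list[float] = []
    conflicts: list[tuple[str, str]] = []
    for key in sorted(ranges_a.keys() & ranges_b.keys()):  # sorted for determinism
        relation = classify_version_relation(ranges_a[key], ranges_b[key])
        if relation is VersionRelation.UNKNOWN:
            continue
        scores.append(RELATION_SCORE[relation])
        if relation in RELATION_CONFLICT:
            conflicts.append((key, RELATION_CONFLICT[relation]))
    return (sum(scores) / len(scores) if scores else 0.0), conflicts
```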

### CORR-V2-007 - Add IDF weighting for package keys
Status: DONE
Dependency: CORR-V2-002
Owners: Concelier · Backend

Task description:
Compute IDF-like weights for package keys based on corpus frequency:
- Rare package match (e.g., `pkg:cargo/obscure-lib`) → higher discriminative weight
- Common package match (e.g., `pkg:npm/lodash`) → lower weight

Formula: `idf(pkg) = log(N / (1 + df(pkg)))`, where N = total observations and df(pkg) = number of observations containing pkg.

Store the IDF cache in Valkey with hourly refresh; fall back to uniform weights if the cache is unavailable.
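
The formula can be sketched in Python as shown here, with a plain dictionary standing in for the Valkey cache. Two details are assumptions rather than part of the spec. The result is clamped at 0, because `log(N / (1 + df))` goes negative when a key appears in every observation (df = N). A key missing from the cache gets the uniform weight 1.0, the same as when no cache is available.

```python
import math
from collections import Counter
from typing import Callable, Optional


def compute_package_idf(corpus: list[set[str]]) -> dict[str, float]:
    """idf(pkg) = log(N / (1 + df(pkg))), clamped at 0 (the clamp is an assumption)."""
    n = len(corpus)
    df = Counter(key for observation in corpus for key in observation)
    return {key: max(0.0, math.log(n / (1 + count))) for key, count in df.items()}


def make_idf_provider(cache: Optional[dict[str, float]]) -> Callable[[str], float]:
    """Return a weight lookup; no cache (or an uncached key) falls back to uniform 1.0."""
    if cache is None:
        return lambda key: 1.0
    return lambda key: cache.get(key, 1.0)
```

Consider a 10-observation corpus where `pkg:npm/lodash` appears in 9 observations and `pkg:cargo/obscure-lib` in 1. The lodash weight is log(10/10) = 0, while the obscure crate gets log(10/2) = log 5 ≈ 1.61.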

Completion criteria:
- [x] `packageIdfProvider` parameter added to V2 algorithm (infrastructure ready)
- [x] `PackageIdfService` computes and caches IDF scores in Valkey
- [x] Graceful degradation to uniform weights on cache miss (null provider = uniform)
- [x] Telemetry histogram `concelier.linkset.package_idf_weight`
- [x] Unit tests with mocked corpus frequencies

Implementation notes:
- Created `IPackageIdfService.cs` interface with batch operations
- Created `ValkeyPackageIdfService.cs` with Valkey caching, TTL, graceful degradation
- Created `PackageIdfMetrics.cs` with OpenTelemetry instrumentation
- Created `IdfRefreshHostedService.cs` for hourly background refresh
- Extended `AdvisoryCacheKeys.cs` with IDF key schema
- Updated `ServiceCollectionExtensions.cs` for DI registration
- 18 unit tests covering keys, options, IDF formulas, and metrics

### CORR-V2-008 - Integrate signals into unified scoring
Status: DONE
Dependency: CORR-V2-001, CORR-V2-002, CORR-V2-003, CORR-V2-004, CORR-V2-005, CORR-V2-006
Owners: Concelier · Backend

Task description:
Refactor `LinksetCorrelation.Compute()` to use the new scorers:

| Signal | Default Weight | Source |
|--------|----------------|--------|
| Alias connectivity | 0.30 | `CalculateAliasConnectivity` |
| Alias authority | 0.10 | `CalculateAliasAuthority` |
| Package coverage | 0.20 | `CalculatePackageCoverage` |
| Version compatibility | 0.10 | `CalculateVersionCompatibility` |
| CPE match | 0.10 | `CalculateCpeScore` |
| Patch lineage | 0.10 | `CalculatePatchLineageScore` |
| Reference overlap | 0.05 | `CalculateReferenceScore` |
| Freshness | 0.05 | `CalculateFreshnessScore` |

The default weights sum to 1.00. Apply typed conflict penalties after computing the base score. Ensure deterministic output by fixing scorer order and tie-breakers.
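
The following Python sketch computes the unified score, assuming each scorer has already produced a value in [0, 1]. The snake_case signal keys are hypothetical stand-ins for the C# scorer methods in the table, and the function takes the already-saturated typed penalty as an input. The weights are iterated from a fixed tuple rather than an unordered map, which keeps the floating-point summation order, and therefore the result, identical across runs.

```python
# Fixed order mirrors the weight table; the keys are hypothetical names.
WEIGHTS: tuple[tuple[str, float], ...] = (
    ("alias_connectivity", 0.30),
    ("alias_authority", 0.10),
    ("package_coverage", 0.20),
    ("version_compatibility", 0.10),
    ("cpe_match", 0.10),
    ("patch_lineage", 0.10),
    ("reference_overlap", 0.05),
    ("freshness", 0.05),
)


def base_score(signals: dict[str, float]) -> float:
    total = 0.0
    for name, weight in WEIGHTS:  # fixed order => deterministic float summation
        value = min(1.0, max(0.0, signals.get(name, 0.0)))  # missing signal counts as 0
        total += weight * value
    return total


def confidence(signals: dict[str, float], penalty: float = 0.0, has_conflicts: bool = False) -> float:
    base = base_score(signals)
    score = base - min(penalty, 0.6)  # typed penalties applied after the base score
    if has_conflicts and base > 0.0:
        score = max(score, 0.1)  # floor when conflicts exist but evidence is present
    return max(0.0, min(1.0, score))
```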

Completion criteria:
- [x] `LinksetCorrelationV2.Compute()` implements unified scoring
- [x] `LinksetCorrelationService` provides V1/V2 switchable interface
- [x] `CorrelationServiceOptions` for configuration
- [x] Confidence score stable across runs (deterministic)
- [x] All 27 V2 tests pass; all 59 linkset tests pass

### CORR-V2-009 - Update documentation
Status: DONE
Dependency: CORR-V2-008
Owners: Documentation

Task description:
Update architecture and operational docs to reflect v2 correlation:
- `docs/modules/concelier/linkset-correlation-21-002.md` → new version `linkset-correlation-v2.md`
- `docs/modules/concelier/architecture.md` § 5.2 Linkset correlation
- `docs/modules/concelier/operations/conflict-resolution.md` conflict severities

Completion criteria:
- [x] New `linkset-correlation-v2.md` with signal weights, conflict severities, algorithm overview
- [x] Architecture doc section updated with V2 correlation table
- [x] Conflict resolution runbook updated with new severity tiers (§ 5.1)
- [x] ADR recorded in `docs/architecture/decisions/ADR-001-linkset-correlation-v2.md`

### CORR-V2-010 - Add TF-IDF text similarity (Phase 3 prep)
Status: DONE
Dependency: CORR-V2-008
Owners: Concelier · Backend

Task description:
Add deterministic TF-IDF text similarity as an optional correlation signal:
- Tokenize normalized descriptions (existing `DescriptionNormalizer`)
- Compute TF-IDF vectors per observation
- Use cosine similarity as a feature (weight 0.05 by default)

This is prep for Phase 3; the signal is disabled by default via the feature flag `concelier:correlation:textSimilarity:enabled`.

Completion criteria:
- [x] `TextSimilarityScorer` computes TF-IDF cosine similarity
- [x] Feature flag controls enablement (default: false)
- [x] Deterministic tokenization (lowercase, stop-word removal, stemming optional)
- [x] Unit tests with description fixtures
- [x] Performance benchmark (target: ≤ 5ms per pair)

Implementation notes:
- Created `TextSimilarityScorer.cs` with pure C# TF-IDF implementation
- Uses the smoothed IDF formula `log((N+1)/(df+1)) + 1` to avoid zero weights
- Stop word list includes common English words + security-specific terms
- 30 unit tests including determinism checks and real-world CVE fixtures
- Performance benchmarks verify < 5ms per pair (typically < 0.5ms)
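
Here is a self-contained Python sketch of TF-IDF cosine similarity using the smoothed IDF listed in the implementation notes above. The regex tokenizer and the tiny stop-word list (including the security-flavored words `vulnerability` and `allows`) are illustrative assumptions. The real scorer tokenizes `DescriptionNormalizer` output and uses its own stop-word list. IDF is computed over the documents passed in, and the dot product iterates over sorted terms so the floating-point result does not depend on insertion order.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9]+")
# Illustrative stop words; the real list is larger.
STOP_WORDS = frozenset({"a", "an", "and", "in", "of", "the", "to", "via", "allows", "vulnerability"})


def tokenize(text: str) -> list[str]:
    return [t for t in TOKEN.findall(text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    token_lists = [tokenize(d) for d in docs]
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    # Smoothed IDF: log((N + 1) / (df + 1)) + 1 is always >= 1, so no term weighs zero.
    idf = {t: math.log((n + 1) / (c + 1)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in token_lists]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(u[t] * v[t] for t in sorted(u.keys() & v.keys()))
    norm_u = math.sqrt(sum(w * w for _, w in sorted(u.items())))
    norm_v = math.sqrt(sum(w * w for _, w in sorted(v.items())))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Identical descriptions score approximately 1.0. Descriptions that share only stop words, such as "Buffer overflow in libfoo parser" and "SQL injection in admin panel", score 0.0.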

---

## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-01-25 | Sprint created from product advisory review; 10 tasks scoped across Phase 1-2 improvements. | Planning |
| 2026-01-25 | Phase 1 implementation complete: CORR-V2-001 through CORR-V2-006 and CORR-V2-008/009 DONE. Created `LinksetCorrelationV2.cs`, `LinksetCorrelationService.cs`, `ILinksetCorrelationService.cs`. Extended `AdvisoryLinksetConflict` with `ConflictSeverity`. 27 new tests passing. Documentation updated. | Backend |
| 2026-01-25 | CORR-V2-007 complete: Created `IPackageIdfService`, `ValkeyPackageIdfService`, `PackageIdfMetrics`, `IdfRefreshHostedService`. Extended `AdvisoryCacheKeys` with IDF key schema. 18 unit tests passing. | Backend |
| 2026-01-25 | CORR-V2-009 ADR complete: Created `ADR-001-linkset-correlation-v2.md` documenting V2 algorithm decisions. | Documentation |
| 2026-01-25 | CORR-V2-010 complete: Created `TextSimilarityScorer.cs` with pure C# TF-IDF implementation. 30 unit tests + benchmarks passing. All 10 sprint tasks DONE. Total: 89 linkset tests passing. | Backend |

## Decisions & Risks
- **Decision made**: Hard conflicts (distinct CVEs) still emit a linkset, with a minimum confidence of 0.1; downstream policy handles blocking.
- **Risk**: IDF caching adds a Valkey dependency; mitigated with graceful fallback to uniform weights (CORR-V2-007 complete).
- **Risk**: Changing correlation weights affects existing linkset confidence scores; requires a migration/recompute job.
- **Risk**: Text similarity may add latency; mitigated by the feature flag (default: off) and the ≤ 5ms-per-pair benchmark, with GA impact still to be evaluated (CORR-V2-010 complete).

## Next Checkpoints
- 2026-01-27: Review V2 implementation; validate against production dataset sample.
- 2026-02-05: Cut pre-release with V2 enabled via feature flag for testing.
- 2026-02-10: GA readiness review; evaluate text similarity impact on correlation quality.

## Sprint Completion
All 10 tasks DONE. Sprint ready for archive after validation checkpoint.
@@ -0,0 +1,52 @@
# 25-Jan-2026 - Linkset Correlation Algorithm Improvements

> **Status**: Archived - translated to sprint tasks and documentation
> **Sprint**: `SPRINT_20260125_001_Concelier_linkset_correlation_v2.md`
> **Documentation**: `docs/modules/concelier/linkset-correlation-v2.md`

---

## Summary

Product advisory proposing improvements to Stella Ops' CVE linking/correlation algorithm. The advisory identified critical failure modes in the current `LinksetCorrelation` implementation and proposed a concrete upgrade path.

## Key Recommendations Applied

### Phase 1 (High Impact, Low Effort) - Implemented
1. Replace alias intersection with graph connectivity scoring
2. Replace PURL intersection with pairwise + coverage scoring
3. Fix reference conflict logic (zero overlap = neutral, not conflict)
4. Typed conflict severities with per-reason penalties

### Phase 2 (High Impact, Medium Effort) - Sprint Tasks Created
5. Patch lineage as top-tier correlation signal
6. Version compatibility scoring (Equivalent/Overlapping/Disjoint)
7. IDF weighting for package keys

### Phase 3 (Differentiating) - Documented for Future
8. Fellegi-Sunter probabilistic linkage model
9. TF-IDF text similarity with MinHash/LSH
10. Correlation clustering for cluster formation

## Artifacts Produced

- Sprint file: `docs/implplan/SPRINT_20260125_001_Concelier_linkset_correlation_v2.md`
- V2 Algorithm: `src/Concelier/__Libraries/StellaOps.Concelier.Core/Linksets/LinksetCorrelationV2.cs`
- Model update: `AdvisoryLinksetConflict` extended with `Severity` property
- Documentation: `docs/modules/concelier/linkset-correlation-v2.md`
- Architecture update: `docs/modules/concelier/architecture.md` § 5.2
- Runbook update: `docs/modules/concelier/operations/conflict-resolution.md` § 5.1

## Original Advisory Content

You already have the right *architectural* posture (LNM, immutable observations, conflict-first traceability). "Best-in-class" for the linker now comes down to (1) eliminating a few structural failure modes in the current scoring logic, (2) moving from a **hand-weighted sum** to a **calibrated linkage model**, and (3) adding **high-discriminative signals** that most vulnerability linkers still underuse (patch lineage, semantic text similarity with deterministic fallbacks, and cluster-level graph optimization).

[Full advisory content preserved in conversation history]

---

## Archived

- **Date**: 2026-01-25
- **Archived by**: Product Manager role
- **Reason**: Translated to documentation + sprint tasks