# Deterministic Semantic Merge Hash for Advisory Deduplication

## Module
Concelier

## Status
IMPLEMENTED

## Description
Computes identity-based semantic hash from (CVE + PURL/CPE + version-range + CWE + patch_lineage) for cross-distro advisory deduplication. Includes normalizers (PURL, CPE, version range, CWE, patch lineage), golden corpus validation (Debian/RHEL/SUSE/Alpine), fuzzing tests (1000 random inputs), shadow-write migration mode, and backfill service. Distinct from "Advisory Ingestion with Canonical Deduplication" which is the overall dedup concept; this is the specific merge_hash identity algorithm.

## Implementation Details
- **Modules**: `src/Concelier/__Libraries/StellaOps.Concelier.Merge/Identity/`, `src/Concelier/__Libraries/StellaOps.Concelier.Merge/Services/`, `src/Concelier/__Libraries/StellaOps.Concelier.Merge/Jobs/`
- **Key Classes**:
  - `MergeHashCalculator` (`src/Concelier/__Libraries/StellaOps.Concelier.Merge/Identity/MergeHashCalculator.cs`) - computes deterministic semantic hash from (CVE + PURL/CPE + version-range + CWE + patch_lineage) with input normalizers
  - `MergeHashShadowWriteService` (`src/Concelier/__Libraries/StellaOps.Concelier.Merge/Identity/MergeHashShadowWriteService.cs`) - shadow-write mode for migration validation
  - `MergeHashBackfillService` (`src/Concelier/__Libraries/StellaOps.Concelier.Merge/Services/MergeHashBackfillService.cs`) - retroactive backfill of merge hashes for existing advisories
  - `MergeHashBackfillJob` (`src/Concelier/__Libraries/StellaOps.Concelier.Merge/Jobs/MergeHashBackfillJob.cs`) - scheduled `IJob` for backfill execution
- **Interfaces**: `IMergeHashCalculator`
- **Source**: SPRINT_8200_0012_0001_CONCEL_merge_hash_library.md

## E2E Test Plan
- [ ] Compute merge hash for two semantically identical advisories from different sources (e.g., Debian and RHEL for same CVE) and verify identical hash output
- [ ] Verify PURL normalization: different PURL formats for the same package produce the same merge hash
- [ ] Verify CPE normalization: equivalent CPE strings produce identical hashes
- [ ] Verify determinism: same input produces the same hash across 1000 repeated computations
- [ ] Verify golden corpus: validate merge hash against the golden corpus of known Debian/RHEL/SUSE/Alpine advisories
- [ ] Verify shadow-write mode: enable shadow writes and confirm both old and new hashes are persisted for comparison
- [ ] Verify backfill: run `MergeHashBackfillJob` and confirm pre-existing advisories receive computed merge hashes
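
The identity tuple and the determinism check in the test plan can be illustrated with a small sketch. This is a language-neutral illustration written in Python, not the `MergeHashCalculator` implementation: the function names, the normalization rules, the `|` field separator, the SHA-256 digest, and the `sha256:` prefix are all assumptions made for the example, and the real normalizers are likely to be considerably stricter.

```python
import hashlib
import re


def normalize_package(identifier: str) -> str:
    """Assumed rule: trim, drop PURL qualifiers/subpath, lowercase."""
    base = identifier.strip().split("#", 1)[0].split("?", 1)[0]
    return base.lower()


def normalize_range(version_range: str) -> str:
    """Assumed rule: remove all whitespace from the range expression."""
    return re.sub(r"\s+", "", version_range)


def normalize_cwes(cwes) -> str:
    """Assumed rule: parse IDs, de-duplicate, sort numerically."""
    ids = set()
    for cwe in cwes:
        match = re.fullmatch(r"\s*cwe-?(\d+)\s*", cwe, re.IGNORECASE)
        if match:
            ids.add(int(match.group(1)))
    return ",".join(f"CWE-{n}" for n in sorted(ids))


def normalize_lineage(patch_lineage) -> str:
    """Assumed rule: trim and lowercase; a missing lineage becomes empty."""
    return (patch_lineage or "").strip().lower()


def merge_hash(cve, package, version_range, cwes=(), patch_lineage=None) -> str:
    """Hash the normalized identity fields in a fixed order (hypothetical layout)."""
    fields = [
        cve.strip().upper(),
        normalize_package(package),
        normalize_range(version_range),
        normalize_cwes(cwes),
        normalize_lineage(patch_lineage),
    ]
    canonical = "|".join(fields)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two differently formatted descriptions of the same identity.
first = merge_hash(
    "cve-2024-1234", "pkg:deb/debian/OpenSSL?arch=amd64",
    ">= 1.0, < 1.1.1n", ["cwe-79", "CWE-20"], " ABC123 ",
)
second = merge_hash(
    "CVE-2024-1234", "pkg:deb/debian/openssl",
    ">=1.0,<1.1.1n", ["CWE-20", "CWE-79", "CWE-079"], "abc123",
)
assert first == second
```

Because every field is normalized before it is joined and hashed, input formatting differences (case, whitespace, CWE ordering, duplicate CWEs) do not change the result. The sketch does not attempt the cross-distro equivalence that the golden corpus validates; that depends on normalization rules not described in this document.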