# Backport-Aware Deduplication

> Sprint: SPRINT_8200_0015_0001_CONCEL_backport_integration
> Task: BACKPORT-8200-027

## Overview

Linux distributions frequently backport security fixes from upstream projects to their stable package versions without updating the full version number. This creates a challenge for vulnerability scanning: a Debian package at version `1.0-1+deb12u1` may contain the fix for CVE-2024-1234 even though the upstream fixed version is `1.5.0`.

Concelier's backport-aware deduplication addresses this by:

1. **Detecting backports** through the `BackportProofService`, which analyzes distro advisories, changelogs, patch headers, and binary fingerprints
2. **Tracking provenance** per distro in the `provenance_scope` table
3. **Including patch lineage** in merge hash computation for deterministic deduplication
4. **Recording evidence** in the merge audit log for traceability

## Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                         Ingestion Pipeline                          │
├─────────────────────────────────────────────────────────────────────┤
│  Distro Advisory  →  BackportEvidenceResolver  →  MergeHash         │
│  (DSA, RHSA, USN)  (calls BackportProofService)   Calculator        │
│                                 │                     │             │
│                                 ▼                     │             │
│                      ProvenanceScopeService           │             │
│                      (creates/updates                 │             │
│                       provenance_scope)               │             │
│                                 │                     │             │
│                                 ▼                     ▼             │
│                    ┌─────────────────────────────────────────┐      │
│                    │  PostgreSQL                             │      │
│                    │  ┌───────────────────────────────────┐  │      │
│                    │  │ vuln.provenance_scope             │  │      │
│                    │  │ - canonical_id (FK)               │  │      │
│                    │  │ - distro_release                  │  │      │
│                    │  │ - backport_semver                 │  │      │
│                    │  │ - patch_id                        │  │      │
│                    │  │ - patch_origin                    │  │      │
│                    │  │ - evidence_ref (proofchain FK)    │  │      │
│                    │  │ - confidence                      │  │      │
│                    │  └───────────────────────────────────┘  │      │
│                    └─────────────────────────────────────────┘      │
└─────────────────────────────────────────────────────────────────────┘
```

## Evidence Tiers

The `BackportProofService` produces evidence at four quality tiers:

| Tier | Name | Description | Typical Confidence |
|------|------|-------------|-------------------|
| 1 | DistroAdvisory | Direct distro advisory (DSA, RHSA, USN) confirms fix | 0.90 - 1.00 |
| 2 | ChangelogMention | Package changelog mentions CVE or patch commit | 0.75 - 0.90 |
| 3 | PatchHeader | Patch file header matches upstream fix commit | 0.60 - 0.85 |
| 4 | BinaryFingerprint | Binary analysis matches known-fixed function signatures | 0.40 - 0.70 |

Higher-tier evidence takes precedence when updating `provenance_scope` records.

## Patch Origin

The `patch_origin` field tracks where the fix came from:

- **upstream**: Patch applied directly from upstream project commit
- **distro**: Distro-specific patch developed by maintainers
- **vendor**: Commercial vendor-specific patch

## Merge Hash Computation

The merge hash includes patch lineage to differentiate backport scenarios:

```csharp
// MergeHashCalculator computes deterministic hash
var input = new MergeHashInput
{
    CveId = "CVE-2024-1234",
    AffectsKey = "pkg:deb/debian/openssl@1.1.1n-0+deb11u5",
    Weaknesses = ["CWE-79"],
    PatchLineage = "abc123def456" // upstream commit SHA
};

string mergeHash = calculator.ComputeMergeHash(input);
// Result: sha256:7f8a9b...
```

Two advisories with different patch lineage (e.g., Debian backport vs Ubuntu backport) produce different merge hashes, preventing incorrect deduplication.
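The exact canonical form that `MergeHashCalculator` hashes is not documented here, so the following is a hypothetical sketch rather than the real implementation. It assumes the four `MergeHashInput` fields are normalized (trimmed, CVE and CWE IDs upper-cased, CWEs deduplicated and sorted, lineage lower-cased), joined with a `|` separator, and hashed with SHA-256. `ComputeMergeHashSketch` and the separator are illustrative assumptions, not Concelier APIs. The point it shows is why a different patch lineage changes the hash while cosmetic differences in the other fields do not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch, NOT Concelier's MergeHashCalculator: the field order,
// normalization rules, and '|' separator are assumptions for illustration.
static string ComputeMergeHashSketch(
    string cveId, string affectsKey, IEnumerable<string> weaknesses, string? patchLineage)
{
    // Normalize fields so cosmetic differences (case, whitespace) do not change the hash.
    string cve = cveId.Trim().ToUpperInvariant();
    string affects = affectsKey.Trim();

    // Deduplicate and sort CWEs ordinally so input order never affects the result.
    string cwes = string.Join(",", weaknesses
        .Select(w => w.Trim().ToUpperInvariant())
        .Distinct()
        .OrderBy(w => w, StringComparer.Ordinal));

    // A missing lineage is hashed as an empty field rather than dropped,
    // so field positions in the canonical string stay fixed.
    string lineage = (patchLineage ?? string.Empty).Trim().ToLowerInvariant();

    string canonical = string.Join("|", cve, affects, cwes, lineage);
    byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
    return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
}

// Everything except the patch lineage is held fixed to isolate its effect.
string debianBackport = ComputeMergeHashSketch(
    "CVE-2024-1234", "pkg:deb/debian/openssl@1.1.1n-0+deb11u5",
    new[] { "CWE-79" }, "abc123def456");
string ubuntuBackport = ComputeMergeHashSketch(
    "CVE-2024-1234", "pkg:deb/debian/openssl@1.1.1n-0+deb11u5",
    new[] { "CWE-79" }, "ubuntu-specific-patch-001");

Console.WriteLine(debianBackport == ubuntuBackport); // False
```

Normalizing and sorting before hashing is what keeps the result independent of ingestion order; the document lists `PatchLineageNormalizer` as the component that normalizes patch identifiers, but the specific rules used in this sketch (such as lower-casing) are assumptions.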
## API Endpoints

### Get Provenance for Canonical Advisory

```http
GET /api/v1/canonical/{id}/provenance
```

Returns all distro-specific provenance scopes:

```json
{
  "canonicalId": "11111111-1111-1111-1111-111111111111",
  "scopes": [
    {
      "id": "22222222-2222-2222-2222-222222222222",
      "distroRelease": "debian:bookworm",
      "backportSemver": "1.1.1n-0+deb12u1",
      "patchId": "abc123def456abc123def456abc123def456abc123",
      "patchOrigin": "upstream",
      "evidenceRef": "33333333-3333-3333-3333-333333333333",
      "confidence": 0.95,
      "updatedAt": "2025-01-15T10:30:00Z"
    },
    {
      "id": "44444444-4444-4444-4444-444444444444",
      "distroRelease": "ubuntu:22.04",
      "backportSemver": "1.1.1n-0ubuntu1.22.04.1",
      "patchId": "ubuntu-specific-patch-001",
      "patchOrigin": "distro",
      "confidence": 0.85,
      "updatedAt": "2025-01-15T11:00:00Z"
    }
  ],
  "totalCount": 2
}
```

## Database Schema

```sql
CREATE TABLE vuln.provenance_scope (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    canonical_id    UUID NOT NULL REFERENCES vuln.advisory_canonical(id) ON DELETE CASCADE,
    distro_release  TEXT NOT NULL,            -- e.g., 'debian:bookworm', 'rhel:9.2'
    backport_semver TEXT,                     -- distro's backported version
    patch_id        TEXT,                     -- upstream commit SHA or patch identifier
    patch_origin    TEXT,                     -- 'upstream', 'distro', 'vendor'
    evidence_ref    UUID,                     -- FK to proofchain.proof_entries
    confidence      NUMERIC(3,2) DEFAULT 0.5, -- 0.00-1.00
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (canonical_id, distro_release)
);

CREATE INDEX idx_provenance_scope_canonical ON vuln.provenance_scope(canonical_id);
CREATE INDEX idx_provenance_scope_distro ON vuln.provenance_scope(distro_release);
```

## Merge Audit Log

When a merge event includes backport evidence, it's recorded in the audit log:

```csharp
var record = new MergeEventRecord(
    id: Guid.NewGuid(),
    advisoryKey: "CVE-2024-1234",
    beforeHash: previousHash,
    afterHash: newHash,
    mergedAt: DateTimeOffset.UtcNow,
    inputDocumentIds: [...],
    fieldDecisions: [...],
    backportEvidence: [
        new BackportEvidenceDecision(
            cveId: "CVE-2024-1234",
            distroRelease: "debian:bookworm",
            evidenceTier: "DistroAdvisory",
            confidence: 0.95,
            patchId: "abc123...",
            patchOrigin: "Upstream",
            proofId: "proof:33333333-...",
            evidenceDate: DateTimeOffset.UtcNow
        )
    ]
);
```

## Configuration

Backport detection is enabled by default. Configure via `concelier.yaml`:

```yaml
concelier:
  backport:
    enabled: true
    # Minimum confidence threshold for creating provenance scope
    minConfidence: 0.3
    # Evidence tiers to consider (1=DistroAdvisory, 2=Changelog, 3=PatchHeader, 4=Binary)
    enabledTiers: [1, 2, 3, 4]
    # Sources with precedence for patch origin
    precedence:
      - upstream
      - distro
      - vendor
```

## Testing

The `BackportProvenanceE2ETests` class provides comprehensive E2E tests:

- `E2E_IngestDebianAdvisoryWithBackport_CreatesProvenanceScope`
- `E2E_IngestRhelAdvisoryWithBackport_CreatesProvenanceScopeWithDistroOrigin`
- `E2E_SameCveMultipleDistros_CreatesSeparateProvenanceScopes`
- `E2E_MergeWithBackportEvidence_RecordsInAuditLog`
- `E2E_EvidenceUpgrade_UpdatesProvenanceScope`
- `E2E_RetrieveProvenanceForCanonical_ReturnsAllDistroScopes`

## Related Components

- **BackportProofService**: Generates proof blobs for backport detection (in `StellaOps.Concelier.ProofService`)
- **MergeHashCalculator**: Computes deterministic merge hashes (in `StellaOps.Concelier.Merge`)
- **PatchLineageNormalizer**: Normalizes patch identifiers for hashing (in `StellaOps.Concelier.Merge`)
- **ProvenanceScopeRepository**: PostgreSQL persistence (in `StellaOps.Concelier.Storage.Postgres`)
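The document does not spell out how the `minConfidence` and `enabledTiers` settings from Configuration interact with the tier-precedence rule from Evidence Tiers when a `provenance_scope` row already exists. The sketch below is one hypothetical reading, not the actual `ProvenanceScopeService` logic. `ShouldReplace` is an illustrative name, and it assumes that "higher-tier" means stronger evidence (Tier 1, DistroAdvisory, outranks Tier 4, BinaryFingerprint) and that a tie within the same tier is broken by confidence. `decimal` is used to match the `NUMERIC(3,2)` confidence column.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch, NOT ProvenanceScopeService. Assumptions:
// tier 1 (DistroAdvisory) is the strongest evidence, and evidence of
// equal tier is ranked by confidence.
static bool ShouldReplace(
    (int Tier, decimal Confidence)? existing,
    (int Tier, decimal Confidence) candidate,
    decimal minConfidence,
    IReadOnlyCollection<int> enabledTiers)
{
    // Configuration gates: skip disabled tiers and low-confidence evidence.
    if (!enabledTiers.Contains(candidate.Tier) || candidate.Confidence < minConfidence)
        return false;

    // No scope yet for this (canonical_id, distro_release) pair: create one.
    if (existing is null)
        return true;

    var current = existing.Value;

    // A stronger tier (lower number) wins regardless of confidence.
    if (candidate.Tier != current.Tier)
        return candidate.Tier < current.Tier;

    // Same tier (assumed tie-break): replace only with strictly higher confidence.
    return candidate.Confidence > current.Confidence;
}

int[] tiers = { 1, 2, 3, 4 };   // mirrors enabledTiers: [1, 2, 3, 4]
decimal threshold = 0.3m;       // mirrors minConfidence: 0.3

// Tier 2 changelog evidence arrives first: a new scope is created.
Console.WriteLine(ShouldReplace(null, (2, 0.80m), threshold, tiers));       // True
// A Tier 1 distro advisory later upgrades that scope.
Console.WriteLine(ShouldReplace((2, 0.80m), (1, 0.95m), threshold, tiers)); // True
// A Tier 4 binary fingerprint does not downgrade a Tier 1 scope.
Console.WriteLine(ShouldReplace((1, 0.95m), (4, 0.70m), threshold, tiers)); // False
```

Ranking by tier before confidence means a weak-but-direct advisory is never displaced by a strong heuristic match; if Concelier instead compares confidence first, the two `if` checks would swap order.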