git.stella-ops.org/docs/modules/concelier/backport-deduplication.md
2025-12-26 00:32:58 +02:00

Backport-Aware Deduplication

Sprint: SPRINT_8200_0015_0001_CONCEL_backport_integration
Task: BACKPORT-8200-027

Overview

Linux distributions frequently backport security fixes from upstream projects to their stable package versions without updating the full version number. This creates a challenge for vulnerability scanning: a Debian package at version 1.0-1+deb12u1 may contain the fix for CVE-2024-1234 even though the upstream fixed version is 1.5.0.

Concelier's backport-aware deduplication addresses this by:

  1. Detecting backports through the BackportProofService, which analyzes distro advisories, changelogs, patch headers, and binary fingerprints
  2. Tracking provenance per-distro in the provenance_scope table
  3. Including patch lineage in merge hash computation for deterministic deduplication
  4. Recording evidence in the merge audit log for traceability

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                        Ingestion Pipeline                           │
├─────────────────────────────────────────────────────────────────────┤
│  Distro Advisory   →   BackportEvidenceResolver   →   MergeHash    │
│  (DSA, RHSA, USN)      (calls BackportProofService)     Calculator │
│                              │                              │       │
│                              ▼                              │       │
│                    ProvenanceScopeService                   │       │
│                        (creates/updates                     │       │
│                         provenance_scope)                   │       │
│                              │                              │       │
│                              ▼                              ▼       │
│                    ┌─────────────────────────────────────────┐     │
│                    │           PostgreSQL                    │     │
│                    │  ┌───────────────────────────────────┐  │     │
│                    │  │      vuln.provenance_scope        │  │     │
│                    │  │  - canonical_id (FK)              │  │     │
│                    │  │  - distro_release                 │  │     │
│                    │  │  - backport_semver                │  │     │
│                    │  │  - patch_id                       │  │     │
│                    │  │  - patch_origin                   │  │     │
│                    │  │  - evidence_ref (proofchain FK)   │  │     │
│                    │  │  - confidence                     │  │     │
│                    │  └───────────────────────────────────┘  │     │
│                    └─────────────────────────────────────────┘     │
└─────────────────────────────────────────────────────────────────────┘

Evidence Tiers

The BackportProofService produces evidence at four quality tiers:

| Tier | Name              | Description                                              | Typical Confidence |
|------|-------------------|----------------------------------------------------------|--------------------|
| 1    | DistroAdvisory    | Direct distro advisory (DSA, RHSA, USN) confirms fix     | 0.90 - 1.00        |
| 2    | ChangelogMention  | Package changelog mentions CVE or patch commit           | 0.75 - 0.90        |
| 3    | PatchHeader       | Patch file header matches upstream fix commit            | 0.60 - 0.85        |
| 4    | BinaryFingerprint | Binary analysis matches known-fixed function signatures  | 0.40 - 0.70        |

Higher-quality evidence (a lower tier number, with Tier 1 the strongest) takes precedence when updating provenance_scope records.
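
The precedence rule can be sketched as a small comparison. This is only an illustration: the type names `EvidenceTier`, `ScopeEvidence`, and `EvidencePrecedence` are hypothetical, and the tie-break on confidence within the same tier is an assumption that this document does not confirm.

```csharp
#nullable enable
using System;

// Tier numbers mirror the evidence tiers listed above: 1 is the strongest, 4 the weakest.
public enum EvidenceTier
{
    DistroAdvisory = 1,
    ChangelogMention = 2,
    PatchHeader = 3,
    BinaryFingerprint = 4
}

// Hypothetical value type holding only what the comparison needs.
public sealed record ScopeEvidence(EvidenceTier Tier, decimal Confidence);

public static class EvidencePrecedence
{
    // Assumed rule: a stronger tier (lower number) always wins;
    // within the same tier, the higher confidence wins.
    public static bool ShouldReplace(ScopeEvidence? existing, ScopeEvidence incoming)
    {
        if (existing is null) return true;                 // nothing stored yet
        if (incoming.Tier != existing.Tier)
            return incoming.Tier < existing.Tier;          // lower number = stronger evidence
        return incoming.Confidence > existing.Confidence;  // assumed tie-break
    }
}
```

Under these assumptions, a Tier 1 advisory at 0.70 replaces a stored Tier 2 changelog mention at 0.80, while a Tier 3 patch header at 0.85 does not.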

Patch Origin

The patch_origin field tracks where the fix came from:

  • upstream: Patch applied directly from upstream project commit
  • distro: Distro-specific patch developed by maintainers
  • vendor: Commercial vendor-specific patch

Merge Hash Computation

The merge hash includes patch lineage to differentiate backport scenarios:

// MergeHashCalculator computes deterministic hash
var input = new MergeHashInput
{
    CveId = "CVE-2024-1234",
    AffectsKey = "pkg:deb/debian/openssl@1.1.1n-0+deb11u5",
    Weaknesses = ["CWE-79"],
    PatchLineage = "abc123def456" // upstream commit SHA
};

string mergeHash = calculator.ComputeMergeHash(input);
// Result: sha256:7f8a9b...

Two advisories with different patch lineage (e.g., Debian backport vs Ubuntu backport) produce different merge hashes, preventing incorrect deduplication.
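
To make the effect of patch lineage concrete, here is a minimal sketch of a deterministic hash over the same four fields. It is not the actual MergeHashCalculator: the names `MergeHashSketchInput` and `MergeHashSketch`, the field order, the newline separator, and the case normalization are all assumptions chosen for illustration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical stand-in for MergeHashInput, limited to the fields shown above.
public sealed record MergeHashSketchInput(
    string CveId,
    string AffectsKey,
    IReadOnlyList<string> Weaknesses,
    string? PatchLineage);

public static class MergeHashSketch
{
    public static string Compute(MergeHashSketchInput input)
    {
        // Sort weaknesses so that input order does not change the hash.
        var weaknesses = string.Join(",", input.Weaknesses.OrderBy(w => w, StringComparer.Ordinal));

        // Assumed canonical form: one normalized field per line, in a fixed order.
        var canonical = string.Join("\n",
            input.CveId.ToUpperInvariant(),
            input.AffectsKey,
            weaknesses,
            (input.PatchLineage ?? string.Empty).ToLowerInvariant());

        var digest = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
    }
}
```

Because the lineage is part of the hashed string, two inputs that differ only in PatchLineage yield different digests, while reordering the weaknesses leaves the digest unchanged.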

API Endpoints

Get Provenance for Canonical Advisory

GET /api/v1/canonical/{id}/provenance

Returns all distro-specific provenance scopes:

{
  "canonicalId": "11111111-1111-1111-1111-111111111111",
  "scopes": [
    {
      "id": "22222222-2222-2222-2222-222222222222",
      "distroRelease": "debian:bookworm",
      "backportSemver": "1.1.1n-0+deb12u1",
      "patchId": "abc123def456abc123def456abc123def456abc123",
      "patchOrigin": "upstream",
      "evidenceRef": "33333333-3333-3333-3333-333333333333",
      "confidence": 0.95,
      "updatedAt": "2025-01-15T10:30:00Z"
    },
    {
      "id": "44444444-4444-4444-4444-444444444444",
      "distroRelease": "ubuntu:22.04",
      "backportSemver": "1.1.1n-0ubuntu1.22.04.1",
      "patchId": "ubuntu-specific-patch-001",
      "patchOrigin": "distro",
      "confidence": 0.85,
      "updatedAt": "2025-01-15T11:00:00Z"
    }
  ],
  "totalCount": 2
}
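
A client could map this payload onto typed records with System.Text.Json. The sketch below assumes the response shape shown above; the DTO and parser names are hypothetical, and the nullable fields reflect only that the second scope in the example carries no evidenceRef.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical DTOs mirroring the JSON example above.
public sealed record ProvenanceScopeDto(
    Guid Id,
    string DistroRelease,
    string? BackportSemver,
    string? PatchId,
    string? PatchOrigin,
    Guid? EvidenceRef,          // absent when no proof entry is linked
    decimal Confidence,
    DateTimeOffset UpdatedAt);

public sealed record ProvenanceResponseDto(
    Guid CanonicalId,
    IReadOnlyList<ProvenanceScopeDto> Scopes,
    int TotalCount);

public static class ProvenanceResponseParser
{
    // Web defaults give camelCase, case-insensitive property matching.
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web);

    public static ProvenanceResponseDto Parse(string json) =>
        JsonSerializer.Deserialize<ProvenanceResponseDto>(json, Options)
        ?? throw new JsonException("Empty provenance response.");
}
```

The body would typically come from an HTTP GET against the endpoint above; passing it to `ProvenanceResponseParser.Parse` yields one `ProvenanceScopeDto` per distro release.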

Database Schema

CREATE TABLE vuln.provenance_scope (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    canonical_id UUID NOT NULL REFERENCES vuln.advisory_canonical(id) ON DELETE CASCADE,
    distro_release TEXT NOT NULL,         -- e.g., 'debian:bookworm', 'rhel:9.2'
    backport_semver TEXT,                 -- distro's backported version
    patch_id TEXT,                        -- upstream commit SHA or patch identifier
    patch_origin TEXT,                    -- 'upstream', 'distro', 'vendor'
    evidence_ref UUID,                    -- FK to proofchain.proof_entries
    confidence NUMERIC(3,2) DEFAULT 0.5,  -- 0.00-1.00
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (canonical_id, distro_release)
);

CREATE INDEX idx_provenance_scope_canonical ON vuln.provenance_scope(canonical_id);
CREATE INDEX idx_provenance_scope_distro ON vuln.provenance_scope(distro_release);

Merge Audit Log

When a merge event includes backport evidence, it's recorded in the audit log:

var record = new MergeEventRecord(
    id: Guid.NewGuid(),
    advisoryKey: "CVE-2024-1234",
    beforeHash: previousHash,
    afterHash: newHash,
    mergedAt: DateTimeOffset.UtcNow,
    inputDocumentIds: [...],
    fieldDecisions: [...],
    backportEvidence: [
        new BackportEvidenceDecision(
            cveId: "CVE-2024-1234",
            distroRelease: "debian:bookworm",
            evidenceTier: "DistroAdvisory",
            confidence: 0.95,
            patchId: "abc123...",
            patchOrigin: "Upstream",
            proofId: "proof:33333333-...",
            evidenceDate: DateTimeOffset.UtcNow
        )
    ]
);

Configuration

Backport detection is enabled by default. Configure via concelier.yaml:

concelier:
  backport:
    enabled: true
    # Minimum confidence threshold for creating provenance scope
    minConfidence: 0.3
    # Evidence tiers to consider (1=DistroAdvisory, 2=Changelog, 3=PatchHeader, 4=Binary)
    enabledTiers: [1, 2, 3, 4]
    # Sources with precedence for patch origin
    precedence:
      - upstream
      - distro
      - vendor
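
As an illustration of how these settings might gate evidence, the sketch below applies enabled, enabledTiers, and minConfidence to a single piece of evidence. The class names are hypothetical, and treating minConfidence as an inclusive lower bound is an assumption rather than documented behaviour.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical options type; defaults copy the YAML example above.
public sealed class BackportOptionsSketch
{
    public bool Enabled { get; init; } = true;
    public decimal MinConfidence { get; init; } = 0.3m;
    public IReadOnlySet<int> EnabledTiers { get; init; } = new HashSet<int> { 1, 2, 3, 4 };
}

public static class BackportEvidenceFilter
{
    // Returns true when the evidence may create or update a provenance scope.
    // Assumption: the threshold is inclusive (confidence >= minConfidence).
    public static bool Accepts(BackportOptionsSketch options, int tier, decimal confidence) =>
        options.Enabled
        && options.EnabledTiers.Contains(tier)
        && confidence >= options.MinConfidence;
}
```

With the defaults, Tier 4 evidence at 0.40 passes, whereas restricting enabledTiers to [1, 2] would reject it regardless of confidence.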

Testing

The BackportProvenanceE2ETests class covers the following end-to-end scenarios:

  • E2E_IngestDebianAdvisoryWithBackport_CreatesProvenanceScope
  • E2E_IngestRhelAdvisoryWithBackport_CreatesProvenanceScopeWithDistroOrigin
  • E2E_SameCveMultipleDistros_CreatesSeparateProvenanceScopes
  • E2E_MergeWithBackportEvidence_RecordsInAuditLog
  • E2E_EvidenceUpgrade_UpdatesProvenanceScope
  • E2E_RetrieveProvenanceForCanonical_ReturnsAllDistroScopes

Related Components

  • BackportProofService: Generates proof blobs for backport detection (in StellaOps.Concelier.ProofService)
  • MergeHashCalculator: Computes deterministic merge hashes (in StellaOps.Concelier.Merge)
  • PatchLineageNormalizer: Normalizes patch identifiers for hashing (in StellaOps.Concelier.Merge)
  • ProvenanceScopeRepository: PostgreSQL persistence (in StellaOps.Concelier.Storage.Postgres)