Backport-Aware Deduplication
Sprint: SPRINT_8200_0015_0001_CONCEL_backport_integration Task: BACKPORT-8200-027
Overview
Linux distributions frequently backport security fixes from upstream projects into their stable package versions without moving to the upstream release that contains the fix. This creates a challenge for vulnerability scanning: a Debian package at version 1.0-1+deb12u1 may contain the fix for CVE-2024-1234 even though the upstream fixed version is 1.5.0, so a scanner that compares version numbers alone would report it as vulnerable (a false positive).
Concelier's backport-aware deduplication addresses this by:
- Detecting backports through the `BackportProofService`, which analyzes distro advisories, changelogs, patch headers, and binary fingerprints
- Tracking provenance per distro in the `provenance_scope` table
- Including patch lineage in merge hash computation for deterministic deduplication
- Recording evidence in the merge audit log for traceability
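The false positive described above can be illustrated with a minimal C# sketch. It contrasts a naive check, which looks only at the upstream part of the distro version, with a backport-aware check that first consults provenance scopes. The in-memory scope list and the helpers `NaiveIsVulnerable` and `BackportAwareIsVulnerable` are hypothetical, not Concelier APIs. Matching the exact package version is a simplifying assumption, and a real implementation would need full Debian version comparison (epochs, `~`, and so on).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical provenance scopes for CVE-2024-1234: one entry per distro
// release, as in the provenance_scope table.
var scopes = new List<(string DistroRelease, string BackportVersion, decimal Confidence)>
{
    ("debian:bookworm", "1.0-1+deb12u1", 0.95m),
};

// Naive check: strip the Debian revision ("-1+deb12u1") and compare the
// remaining upstream version with the upstream fixed version.
// Assumes no epoch ("1:") and purely numeric upstream versions.
bool NaiveIsVulnerable(string distroVersion, string upstreamFixed)
{
    int dash = distroVersion.LastIndexOf('-');
    string upstreamPart = dash >= 0 ? distroVersion[..dash] : distroVersion;
    return Version.Parse(upstreamPart) < Version.Parse(upstreamFixed);
}

// Backport-aware check: a sufficiently confident scope recorded for this
// exact distro release and package version overrides the naive verdict.
bool BackportAwareIsVulnerable(string distroRelease, string distroVersion,
                               string upstreamFixed, decimal minConfidence)
{
    bool backported = scopes.Any(s =>
        s.DistroRelease == distroRelease &&
        s.BackportVersion == distroVersion &&
        s.Confidence >= minConfidence);
    return !backported && NaiveIsVulnerable(distroVersion, upstreamFixed);
}

Console.WriteLine(NaiveIsVulnerable("1.0-1+deb12u1", "1.5.0"));                                 // True (false positive)
Console.WriteLine(BackportAwareIsVulnerable("debian:bookworm", "1.0-1+deb12u1", "1.5.0", 0.3m)); // False
```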
Architecture
```text
┌─────────────────────────────────────────────────────────────────────┐
│                         Ingestion Pipeline                          │
├─────────────────────────────────────────────────────────────────────┤
│  Distro Advisory  →  BackportEvidenceResolver     →  MergeHash      │
│  (DSA, RHSA, USN)    (calls BackportProofService)    Calculator     │
│                                 │                        │          │
│                                 ▼                        │          │
│                      ProvenanceScopeService              │          │
│                      (creates/updates                    │          │
│                       provenance_scope)                  │          │
│                                 │                        │          │
│                                 ▼                        ▼          │
│                    ┌───────────────────────────────────────────┐    │
│                    │  PostgreSQL                               │    │
│                    │  ┌─────────────────────────────────────┐  │    │
│                    │  │ vuln.provenance_scope               │  │    │
│                    │  │ - canonical_id (FK)                 │  │    │
│                    │  │ - distro_release                    │  │    │
│                    │  │ - backport_semver                   │  │    │
│                    │  │ - patch_id                          │  │    │
│                    │  │ - patch_origin                      │  │    │
│                    │  │ - evidence_ref (proofchain FK)      │  │    │
│                    │  │ - confidence                        │  │    │
│                    │  └─────────────────────────────────────┘  │    │
│                    └───────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────┘
```
Evidence Tiers
The BackportProofService produces evidence at four quality tiers:
| Tier | Name | Description | Typical Confidence |
|---|---|---|---|
| 1 | DistroAdvisory | Direct distro advisory (DSA, RHSA, USN) confirms fix | 0.90 - 1.00 |
| 2 | ChangelogMention | Package changelog mentions CVE or patch commit | 0.75 - 0.90 |
| 3 | PatchHeader | Patch file header matches upstream fix commit | 0.60 - 0.85 |
| 4 | BinaryFingerprint | Binary analysis matches known-fixed function signatures | 0.40 - 0.70 |
Evidence from a stronger tier (a lower tier number, so tier 1 outranks tier 3) takes precedence when updating provenance_scope records.
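One way to read the precedence rule is as a guarded upsert keyed the same way as the table's `UNIQUE (canonical_id, distro_release)` constraint. The sketch below is hypothetical. The in-memory dictionary stands in for `vuln.provenance_scope`, `Upsert` is an illustrative name, and the tie-break (higher confidence wins within the same tier) is an assumption, not documented behavior.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for vuln.provenance_scope, keyed like the
// UNIQUE (canonical_id, distro_release) constraint.
var scopes = new Dictionary<(Guid CanonicalId, string DistroRelease), (int Tier, decimal Confidence, string PatchId)>();

// Stores the evidence and returns true, or returns false and keeps the
// existing row. Tier 1 (DistroAdvisory) is strongest, tier 4 (BinaryFingerprint)
// weakest. Assumption: within the same tier, higher confidence wins.
bool Upsert(Guid canonicalId, string distroRelease, int tier, decimal confidence, string patchId)
{
    var key = (canonicalId, distroRelease);
    if (scopes.TryGetValue(key, out var existing))
    {
        bool strongerTier = tier < existing.Tier;
        bool sameTierMoreConfident = tier == existing.Tier && confidence > existing.Confidence;
        if (!strongerTier && !sameTierMoreConfident)
            return false;
    }
    scopes[key] = (tier, confidence, patchId);
    return true;
}

var canonical = Guid.Parse("11111111-1111-1111-1111-111111111111");
Console.WriteLine(Upsert(canonical, "debian:bookworm", 4, 0.55m, "fingerprint-match")); // True: first evidence
Console.WriteLine(Upsert(canonical, "debian:bookworm", 1, 0.95m, "abc123def456"));      // True: tier 1 replaces tier 4
Console.WriteLine(Upsert(canonical, "debian:bookworm", 2, 0.90m, "abc123def456"));      // False: tier 1 is kept
```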
Patch Origin
The patch_origin field tracks where the fix came from:
- upstream: Patch applied directly from upstream project commit
- distro: Distro-specific patch developed by maintainers
- vendor: Commercial vendor-specific patch
Merge Hash Computation
The merge hash includes patch lineage to differentiate backport scenarios:
```csharp
// MergeHashCalculator computes deterministic hash
var input = new MergeHashInput
{
    CveId = "CVE-2024-1234",
    AffectsKey = "pkg:deb/debian/openssl@1.1.1n-0+deb11u5",
    Weaknesses = ["CWE-79"],
    PatchLineage = "abc123def456" // upstream commit SHA
};
string mergeHash = calculator.ComputeMergeHash(input);
// Result: sha256:7f8a9b...
```
Two advisories with different patch lineage (e.g., Debian backport vs Ubuntu backport) produce different merge hashes, preventing incorrect deduplication.
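The exact serialization `MergeHashCalculator` uses is not described here, so the following is only a sketch of how patch lineage could feed a deterministic SHA-256 merge hash. The canonical string format, the `NormalizeLineage` rules (standing in for `PatchLineageNormalizer`), and the helper names are assumptions.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical stand-in for PatchLineageNormalizer: trim and lowercase so
// " ABC123 " and "abc123" contribute the same lineage value.
string NormalizeLineage(string lineage) =>
    string.IsNullOrWhiteSpace(lineage) ? "" : lineage.Trim().ToLowerInvariant();

// Assumed canonical form: fields joined with '|', weaknesses uppercased and
// sorted so that input order does not change the hash.
string ComputeMergeHash(string cveId, string affectsKey, string[] weaknesses, string patchLineage)
{
    string sortedWeaknesses = string.Join(",",
        weaknesses.Select(w => w.Trim().ToUpperInvariant()).OrderBy(w => w, StringComparer.Ordinal));
    string canonical = string.Join("|",
        cveId.Trim().ToUpperInvariant(),
        affectsKey.Trim(),
        sortedWeaknesses,
        NormalizeLineage(patchLineage));
    byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
    return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
}

const string AffectsKey = "pkg:deb/debian/openssl@1.1.1n-0+deb11u5";
string upstreamLineage = ComputeMergeHash("CVE-2024-1234", AffectsKey, new[] { "CWE-79" }, "abc123def456");
string distroLineage = ComputeMergeHash("CVE-2024-1234", AffectsKey, new[] { "CWE-79" }, "ubuntu-specific-patch-001");
Console.WriteLine(upstreamLineage == distroLineage); // False: different lineage, different hash
```

Normalizing and sorting before hashing is what keeps this sketch deterministic: the same advisory content yields the same hash regardless of input order or casing, while a different lineage changes it.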
API Endpoints
Get Provenance for Canonical Advisory
```http
GET /api/v1/canonical/{id}/provenance
```
Returns all distro-specific provenance scopes:
```json
{
  "canonicalId": "11111111-1111-1111-1111-111111111111",
  "scopes": [
    {
      "id": "22222222-2222-2222-2222-222222222222",
      "distroRelease": "debian:bookworm",
      "backportSemver": "1.1.1n-0+deb12u1",
      "patchId": "abc123def456abc123def456abc123def456abc123",
      "patchOrigin": "upstream",
      "evidenceRef": "33333333-3333-3333-3333-333333333333",
      "confidence": 0.95,
      "updatedAt": "2025-01-15T10:30:00Z"
    },
    {
      "id": "44444444-4444-4444-4444-444444444444",
      "distroRelease": "ubuntu:22.04",
      "backportSemver": "1.1.1n-0ubuntu1.22.04.1",
      "patchId": "ubuntu-specific-patch-001",
      "patchOrigin": "distro",
      "confidence": 0.85,
      "updatedAt": "2025-01-15T11:00:00Z"
    }
  ],
  "totalCount": 2
}
```
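A consumer of this endpoint might pick out the scope for one distro release as follows. This C# sketch parses a trimmed copy of the example response with `System.Text.Json` instead of calling the live API. `FindScope` is a hypothetical helper, and treating `evidenceRef` as optional follows only from the example, where the Ubuntu scope omits it.

```csharp
using System;
using System.Text.Json;

// Trimmed copy of the example response; the Ubuntu scope has no evidenceRef.
string json = """
{
  "canonicalId": "11111111-1111-1111-1111-111111111111",
  "scopes": [
    { "distroRelease": "debian:bookworm", "backportSemver": "1.1.1n-0+deb12u1",
      "patchOrigin": "upstream", "evidenceRef": "33333333-3333-3333-3333-333333333333",
      "confidence": 0.95 },
    { "distroRelease": "ubuntu:22.04", "backportSemver": "1.1.1n-0ubuntu1.22.04.1",
      "patchOrigin": "distro", "confidence": 0.85 }
  ],
  "totalCount": 2
}
""";

// Hypothetical helper: returns the scope recorded for one distro release,
// or null when the canonical advisory has none for it.
(string BackportSemver, decimal Confidence, bool HasEvidence)? FindScope(string responseJson, string distroRelease)
{
    using JsonDocument doc = JsonDocument.Parse(responseJson);
    foreach (JsonElement scope in doc.RootElement.GetProperty("scopes").EnumerateArray())
    {
        if (scope.GetProperty("distroRelease").GetString() != distroRelease)
            continue;
        // evidenceRef may be missing (see the Ubuntu scope), so probe for it.
        bool hasEvidence = scope.TryGetProperty("evidenceRef", out _);
        return (scope.GetProperty("backportSemver").GetString() ?? "",
                scope.GetProperty("confidence").GetDecimal(),
                hasEvidence);
    }
    return null;
}

var ubuntu = FindScope(json, "ubuntu:22.04");
Console.WriteLine(ubuntu.HasValue
    ? $"{ubuntu.Value.BackportSemver}, evidence linked: {ubuntu.Value.HasEvidence}"
    : "no scope for this release");
// 1.1.1n-0ubuntu1.22.04.1, evidence linked: False
```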
Database Schema
```sql
CREATE TABLE vuln.provenance_scope (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    canonical_id UUID NOT NULL REFERENCES vuln.advisory_canonical(id) ON DELETE CASCADE,
    distro_release TEXT NOT NULL,          -- e.g., 'debian:bookworm', 'rhel:9.2'
    backport_semver TEXT,                  -- distro's backported version
    patch_id TEXT,                         -- upstream commit SHA or patch identifier
    patch_origin TEXT,                     -- 'upstream', 'distro', 'vendor'
    evidence_ref UUID,                     -- FK to proofchain.proof_entries
    confidence NUMERIC(3,2) DEFAULT 0.5,   -- 0.00-1.00
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (canonical_id, distro_release)
);

CREATE INDEX idx_provenance_scope_canonical ON vuln.provenance_scope(canonical_id);
CREATE INDEX idx_provenance_scope_distro ON vuln.provenance_scope(distro_release);
```
Merge Audit Log
When a merge event includes backport evidence, that evidence is recorded in the merge audit log:
```csharp
var record = new MergeEventRecord(
    id: Guid.NewGuid(),
    advisoryKey: "CVE-2024-1234",
    beforeHash: previousHash,
    afterHash: newHash,
    mergedAt: DateTimeOffset.UtcNow,
    inputDocumentIds: [...],
    fieldDecisions: [...],
    backportEvidence: [
        new BackportEvidenceDecision(
            cveId: "CVE-2024-1234",
            distroRelease: "debian:bookworm",
            evidenceTier: "DistroAdvisory",
            confidence: 0.95,
            patchId: "abc123...",
            patchOrigin: "Upstream",
            proofId: "proof:33333333-...",
            evidenceDate: DateTimeOffset.UtcNow
        )
    ]
);
```
Configuration
Backport detection is enabled by default. Configure via concelier.yaml:
```yaml
concelier:
  backport:
    enabled: true
    # Minimum confidence threshold for creating provenance scope
    minConfidence: 0.3
    # Evidence tiers to consider (1=DistroAdvisory, 2=Changelog, 3=PatchHeader, 4=Binary)
    enabledTiers: [1, 2, 3, 4]
    # Sources with precedence for patch origin
    precedence:
      - upstream
      - distro
      - vendor
```
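How these settings interact is not spelled out above. The sketch below assumes one plausible reading: evidence outside `enabledTiers` or below `minConfidence` is discarded, and the remaining candidates are ranked by tier, then confidence, then the `precedence` list. The settings are hard-coded rather than bound from `concelier.yaml`, the candidate data is invented, and `OriginRank` is a hypothetical helper. Origins are compared case-insensitively because this document spells them both `upstream` and `Upstream`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Values mirroring the concelier.yaml example (hard-coded for this sketch).
decimal minConfidence = 0.3m;
var enabledTiers = new HashSet<int> { 1, 2, 3, 4 };
string[] precedence = { "upstream", "distro", "vendor" };

// Hypothetical evidence candidates for one (canonical advisory, distro release) pair.
var candidates = new List<(int Tier, decimal Confidence, string PatchOrigin)>
{
    (4, 0.25m, "vendor"),   // below minConfidence: filtered out
    (2, 0.80m, "Distro"),
    (2, 0.80m, "Upstream"),
};

// Position in the precedence list; unknown origins sort last.
int OriginRank(string origin)
{
    int i = Array.FindIndex(precedence, p => p.Equals(origin, StringComparison.OrdinalIgnoreCase));
    return i < 0 ? int.MaxValue : i;
}

// Assumed ranking: strongest tier first, then highest confidence,
// then the configured patch-origin precedence.
var selected = candidates
    .Where(c => enabledTiers.Contains(c.Tier) && c.Confidence >= minConfidence)
    .OrderBy(c => c.Tier)
    .ThenByDescending(c => c.Confidence)
    .ThenBy(c => OriginRank(c.PatchOrigin))
    .FirstOrDefault();

Console.WriteLine(selected.PatchOrigin ?? "no qualifying evidence"); // Upstream
```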
Testing
The `BackportProvenanceE2ETests` class provides end-to-end tests for the following scenarios:
- `E2E_IngestDebianAdvisoryWithBackport_CreatesProvenanceScope`
- `E2E_IngestRhelAdvisoryWithBackport_CreatesProvenanceScopeWithDistroOrigin`
- `E2E_SameCveMultipleDistros_CreatesSeparateProvenanceScopes`
- `E2E_MergeWithBackportEvidence_RecordsInAuditLog`
- `E2E_EvidenceUpgrade_UpdatesProvenanceScope`
- `E2E_RetrieveProvenanceForCanonical_ReturnsAllDistroScopes`
Related Components
- `BackportProofService`: Generates proof blobs for backport detection (in `StellaOps.Concelier.ProofService`)
- `MergeHashCalculator`: Computes deterministic merge hashes (in `StellaOps.Concelier.Merge`)
- `PatchLineageNormalizer`: Normalizes patch identifiers for hashing (in `StellaOps.Concelier.Merge`)
- `ProvenanceScopeRepository`: PostgreSQL persistence (in `StellaOps.Concelier.Storage.Postgres`)