save dev progress
211
docs/modules/concelier/backport-deduplication.md
Normal file
@@ -0,0 +1,211 @@
# Backport-Aware Deduplication

> Sprint: SPRINT_8200_0015_0001_CONCEL_backport_integration
> Task: BACKPORT-8200-027

## Overview

Linux distributions frequently backport security fixes from upstream projects to their stable package versions without updating the full version number. This creates a challenge for vulnerability scanning: a Debian package at version `1.0-1+deb12u1` may contain the fix for CVE-2024-1234 even though the upstream fixed version is `1.5.0`.

Concelier's backport-aware deduplication addresses this by:

1. **Detecting backports** through the `BackportProofService` which analyzes distro advisories, changelogs, patch headers, and binary fingerprints
2. **Tracking provenance** per-distro in the `provenance_scope` table
3. **Including patch lineage** in merge hash computation for deterministic deduplication
4. **Recording evidence** in the merge audit log for traceability

## Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                         Ingestion Pipeline                          │
├─────────────────────────────────────────────────────────────────────┤
│  Distro Advisory  →  BackportEvidenceResolver     →  MergeHash      │
│  (DSA, RHSA, USN)    (calls BackportProofService)    Calculator     │
│                                 │                        │          │
│                                 ▼                        │          │
│                      ProvenanceScopeService              │          │
│                      (creates/updates                    │          │
│                       provenance_scope)                  │          │
│                                 │                        │          │
│                                 ▼                        ▼          │
│                    ┌─────────────────────────────────────────┐      │
│                    │               PostgreSQL                │      │
│                    │  ┌───────────────────────────────────┐  │      │
│                    │  │ vuln.provenance_scope             │  │      │
│                    │  │   - canonical_id (FK)             │  │      │
│                    │  │   - distro_release                │  │      │
│                    │  │   - backport_semver               │  │      │
│                    │  │   - patch_id                      │  │      │
│                    │  │   - patch_origin                  │  │      │
│                    │  │   - evidence_ref (proofchain FK)  │  │      │
│                    │  │   - confidence                    │  │      │
│                    │  └───────────────────────────────────┘  │      │
│                    └─────────────────────────────────────────┘      │
└─────────────────────────────────────────────────────────────────────┘
```

## Evidence Tiers

The `BackportProofService` produces evidence at four quality tiers:

| Tier | Name | Description | Typical Confidence |
|------|------|-------------|--------------------|
| 1 | DistroAdvisory | Direct distro advisory (DSA, RHSA, USN) confirms fix | 0.90 - 1.00 |
| 2 | ChangelogMention | Package changelog mentions CVE or patch commit | 0.75 - 0.90 |
| 3 | PatchHeader | Patch file header matches upstream fix commit | 0.60 - 0.85 |
| 4 | BinaryFingerprint | Binary analysis matches known-fixed function signatures | 0.40 - 0.70 |

Stronger evidence (a lower tier number; Tier 1 is the strongest) takes precedence when updating `provenance_scope` records.

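The precedence rule can be sketched as a small comparison. This is a minimal illustration, not the actual Concelier code: `ScopeEvidence` and `EvidencePrecedence` are hypothetical names, and the tie-break on confidence within the same tier is an assumption the document does not state.

```csharp
using System;

// Existing record was built from a patch header (tier 3);
// a distro advisory (tier 1) arrives later for the same scope.
var current = new ScopeEvidence(Tier: 3, Confidence: 0.70);
var incoming = new ScopeEvidence(Tier: 1, Confidence: 0.95);

var chosen = EvidencePrecedence.Pick(current, incoming);
Console.WriteLine($"kept tier {chosen.Tier}"); // kept tier 1

// Hypothetical type: tier numbers follow the table above (1 is the strongest).
public sealed record ScopeEvidence(int Tier, double Confidence);

public static class EvidencePrecedence
{
    // Assumed rule: the lower tier number wins; within the same tier,
    // the higher confidence wins; otherwise the existing evidence is kept.
    public static ScopeEvidence Pick(ScopeEvidence current, ScopeEvidence incoming)
    {
        if (incoming.Tier != current.Tier)
            return incoming.Tier < current.Tier ? incoming : current;
        return incoming.Confidence > current.Confidence ? incoming : current;
    }
}
```

Keeping the existing evidence on an exact tie avoids rewriting `provenance_scope` rows when nothing has improved.
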
## Patch Origin

The `patch_origin` field tracks where the fix came from:

- **upstream**: Patch applied directly from upstream project commit
- **distro**: Distro-specific patch developed by maintainers
- **vendor**: Commercial vendor-specific patch

## Merge Hash Computation

The merge hash includes patch lineage to differentiate backport scenarios:

```csharp
// MergeHashCalculator computes deterministic hash
var input = new MergeHashInput
{
    CveId = "CVE-2024-1234",
    AffectsKey = "pkg:deb/debian/openssl@1.1.1n-0+deb11u5",
    Weaknesses = ["CWE-79"],
    PatchLineage = "abc123def456" // upstream commit SHA
};

string mergeHash = calculator.ComputeMergeHash(input);
// Result: sha256:7f8a9b...
```

Two advisories with different patch lineage (e.g., Debian backport vs Ubuntu backport) produce different merge hashes, preventing incorrect deduplication.

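To make the effect of patch lineage concrete, the following is a minimal sketch of a deterministic hash over the same four inputs. It is not the real `MergeHashCalculator`: the `MergeHashSketch` class, the `|` and `,` separators, and the field order are all assumptions chosen for illustration, so the digests it produces will not match Concelier's.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Same CVE, package, and weaknesses; only the patch lineage differs.
string upstreamFix = MergeHashSketch.Compute(
    "CVE-2024-1234", "pkg:deb/debian/openssl", new[] { "CWE-79", "CWE-20" }, "abc123def456");
string distroFix = MergeHashSketch.Compute(
    "CVE-2024-1234", "pkg:deb/debian/openssl", new[] { "CWE-79", "CWE-20" }, "ubuntu-specific-patch-001");
// Same inputs as the first call, with the weaknesses listed in a different order.
string reordered = MergeHashSketch.Compute(
    "CVE-2024-1234", "pkg:deb/debian/openssl", new[] { "CWE-20", "CWE-79" }, "abc123def456");

Console.WriteLine(upstreamFix == reordered); // True: input order does not change the hash
Console.WriteLine(upstreamFix == distroFix); // False: different patch lineage

// Hypothetical stand-in for MergeHashCalculator (illustration only).
public static class MergeHashSketch
{
    public static string Compute(
        string cveId, string affectsKey, IEnumerable<string> weaknesses, string patchLineage)
    {
        // Sort weaknesses ordinally so the caller's ordering cannot affect the result.
        string weaknessPart = string.Join(",", weaknesses.OrderBy(w => w, StringComparer.Ordinal));

        // Assumed canonical form: fields joined with '|' in a fixed order.
        string canonical = string.Join("|", cveId, affectsKey, weaknessPart, patchLineage);

        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
    }
}
```

Determinism here comes from two things: a fixed field order and a sorted weakness list. Any normalization of the lineage value itself (the role the document assigns to `PatchLineageNormalizer`) is left out of this sketch.
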
## API Endpoints

### Get Provenance for Canonical Advisory

```http
GET /api/v1/canonical/{id}/provenance
```

Returns all distro-specific provenance scopes:

```json
{
  "canonicalId": "11111111-1111-1111-1111-111111111111",
  "scopes": [
    {
      "id": "22222222-2222-2222-2222-222222222222",
      "distroRelease": "debian:bookworm",
      "backportSemver": "1.1.1n-0+deb12u1",
      "patchId": "abc123def456abc123def456abc123def456abc123",
      "patchOrigin": "upstream",
      "evidenceRef": "33333333-3333-3333-3333-333333333333",
      "confidence": 0.95,
      "updatedAt": "2025-01-15T10:30:00Z"
    },
    {
      "id": "44444444-4444-4444-4444-444444444444",
      "distroRelease": "ubuntu:22.04",
      "backportSemver": "1.1.1n-0ubuntu1.22.04.1",
      "patchId": "ubuntu-specific-patch-001",
      "patchOrigin": "distro",
      "confidence": 0.85,
      "updatedAt": "2025-01-15T11:00:00Z"
    }
  ],
  "totalCount": 2
}
```

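A client consuming this response might deserialize it as sketched below. The `ProvenanceResponse` and `ProvenanceScope` records are hypothetical client-side types, not Concelier's own contracts, and the JSON is a trimmed copy of the example above. Note that `evidenceRef` is absent from the second scope, so it is modeled as nullable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Trimmed copy of the example response; fields not modeled below are omitted.
const string json = @"{
  ""canonicalId"": ""11111111-1111-1111-1111-111111111111"",
  ""scopes"": [
    { ""distroRelease"": ""debian:bookworm"", ""patchOrigin"": ""upstream"",
      ""evidenceRef"": ""33333333-3333-3333-3333-333333333333"", ""confidence"": 0.95 },
    { ""distroRelease"": ""ubuntu:22.04"", ""patchOrigin"": ""distro"", ""confidence"": 0.85 }
  ],
  ""totalCount"": 2
}";

// camelCase JSON names are matched to PascalCase members case-insensitively.
var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
var response = JsonSerializer.Deserialize<ProvenanceResponse>(json, options)
    ?? throw new InvalidOperationException("Empty response body");

foreach (var scope in response.Scopes.OrderByDescending(s => s.Confidence))
{
    string proof = scope.EvidenceRef?.ToString() ?? "(none)";
    Console.WriteLine($"{scope.DistroRelease}: origin={scope.PatchOrigin}, proof={proof}");
}
// debian:bookworm: origin=upstream, proof=33333333-3333-3333-3333-333333333333
// ubuntu:22.04: origin=distro, proof=(none)

// Hypothetical client-side models (illustration only).
public sealed record ProvenanceScope(
    string DistroRelease, string PatchOrigin, Guid? EvidenceRef, double Confidence);

public sealed record ProvenanceResponse(
    Guid CanonicalId, List<ProvenanceScope> Scopes, int TotalCount);
```
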
## Database Schema

```sql
CREATE TABLE vuln.provenance_scope (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    canonical_id    UUID NOT NULL REFERENCES vuln.advisory_canonical(id) ON DELETE CASCADE,
    distro_release  TEXT NOT NULL,              -- e.g., 'debian:bookworm', 'rhel:9.2'
    backport_semver TEXT,                       -- distro's backported version
    patch_id        TEXT,                       -- upstream commit SHA or patch identifier
    patch_origin    TEXT,                       -- 'upstream', 'distro', 'vendor'
    evidence_ref    UUID,                       -- FK to proofchain.proof_entries
    confidence      NUMERIC(3,2) DEFAULT 0.5,   -- 0.00-1.00
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (canonical_id, distro_release)
);

CREATE INDEX idx_provenance_scope_canonical ON vuln.provenance_scope(canonical_id);
CREATE INDEX idx_provenance_scope_distro ON vuln.provenance_scope(distro_release);
```

## Merge Audit Log

When a merge event includes backport evidence, it's recorded in the audit log:

```csharp
var record = new MergeEventRecord(
    id: Guid.NewGuid(),
    advisoryKey: "CVE-2024-1234",
    beforeHash: previousHash,
    afterHash: newHash,
    mergedAt: DateTimeOffset.UtcNow,
    inputDocumentIds: [...],
    fieldDecisions: [...],
    backportEvidence: [
        new BackportEvidenceDecision(
            cveId: "CVE-2024-1234",
            distroRelease: "debian:bookworm",
            evidenceTier: "DistroAdvisory",
            confidence: 0.95,
            patchId: "abc123...",
            patchOrigin: "Upstream",
            proofId: "proof:33333333-...",
            evidenceDate: DateTimeOffset.UtcNow
        )
    ]
);
```

## Configuration

Backport detection is enabled by default. Configure via `concelier.yaml`:

```yaml
concelier:
  backport:
    enabled: true
    # Minimum confidence threshold for creating provenance scope
    minConfidence: 0.3
    # Evidence tiers to consider (1=DistroAdvisory, 2=Changelog, 3=PatchHeader, 4=Binary)
    enabledTiers: [1, 2, 3, 4]
    # Sources with precedence for patch origin
    precedence:
      - upstream
      - distro
      - vendor
```

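The way `enabled`, `minConfidence`, and `enabledTiers` might gate incoming evidence can be sketched as follows. `BackportOptions` and `Candidate` are hypothetical types that mirror the YAML keys, and treating `minConfidence` as an inclusive lower bound is an assumption. The example disables tier 4 to show both rejection paths.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Values mirror the YAML keys above, except that tier 4 is disabled here.
var options = new BackportOptions(Enabled: true, MinConfidence: 0.3, EnabledTiers: new[] { 1, 2, 3 });

var candidates = new[]
{
    new Candidate(Tier: 1, Confidence: 0.95), // passes both checks
    new Candidate(Tier: 2, Confidence: 0.25), // below minConfidence
    new Candidate(Tier: 4, Confidence: 0.60), // tier 4 is not enabled in this example
};

foreach (var c in candidates)
    Console.WriteLine($"tier {c.Tier}: {(options.Accepts(c) ? "accepted" : "rejected")}");
// tier 1: accepted
// tier 2: rejected
// tier 4: rejected

// Hypothetical types (illustration only).
public sealed record Candidate(int Tier, double Confidence);

public sealed record BackportOptions(
    bool Enabled, double MinConfidence, IReadOnlyCollection<int> EnabledTiers)
{
    // Evidence is considered only when detection is on, its tier is enabled,
    // and its confidence is at or above the threshold (assumed inclusive).
    public bool Accepts(Candidate c) =>
        Enabled && EnabledTiers.Contains(c.Tier) && c.Confidence >= MinConfidence;
}
```
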
## Testing

The `BackportProvenanceE2ETests` class provides comprehensive E2E tests:

- `E2E_IngestDebianAdvisoryWithBackport_CreatesProvenanceScope`
- `E2E_IngestRhelAdvisoryWithBackport_CreatesProvenanceScopeWithDistroOrigin`
- `E2E_SameCveMultipleDistros_CreatesSeparateProvenanceScopes`
- `E2E_MergeWithBackportEvidence_RecordsInAuditLog`
- `E2E_EvidenceUpgrade_UpdatesProvenanceScope`
- `E2E_RetrieveProvenanceForCanonical_ReturnsAllDistroScopes`

## Related Components

- **BackportProofService**: Generates proof blobs for backport detection (in `StellaOps.Concelier.ProofService`)
- **MergeHashCalculator**: Computes deterministic merge hashes (in `StellaOps.Concelier.Merge`)
- **PatchLineageNormalizer**: Normalizes patch identifiers for hashing (in `StellaOps.Concelier.Merge`)
- **ProvenanceScopeRepository**: PostgreSQL persistence (in `StellaOps.Concelier.Storage.Postgres`)