save dev progress

StellaOps Bot
2025-12-26 00:32:35 +02:00
parent aa70af062e
commit ed3079543c
142 changed files with 23771 additions and 232 deletions


@@ -0,0 +1,211 @@
# Backport-Aware Deduplication
> Sprint: SPRINT_8200_0015_0001_CONCEL_backport_integration
> Task: BACKPORT-8200-027
## Overview
Linux distributions frequently backport security fixes from upstream projects into their stable package versions without bumping the upstream version number. This complicates vulnerability scanning: a Debian package at version `1.0-1+deb12u1` may already contain the fix for CVE-2024-1234 even though upstream first fixed it in `1.5.0`, so a naive comparison against the upstream fixed version reports a false positive.
Concelier's backport-aware deduplication addresses this by:
1. **Detecting backports** through the `BackportProofService`, which analyzes distro advisories, changelogs, patch headers, and binary fingerprints
2. **Tracking provenance** per-distro in the `provenance_scope` table
3. **Including patch lineage** in merge hash computation for deterministic deduplication
4. **Recording evidence** in the merge audit log for traceability
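To make the problem concrete, the sketch below contrasts a naive upstream-version check with a lookup keyed by CVE and distro release. It is illustrative only and not Concelier's implementation: `BackportLookup`, `IsFixedNaive`, and `IsFixedBackportAware` are hypothetical names, the in-memory dictionary stands in for the `provenance_scope` table, and exact string equality replaces real Debian version ordering.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical data: one recorded backport for CVE-2024-1234 on Debian bookworm.
var backports = new Dictionary<(string Cve, string DistroRelease), string>
{
    [("CVE-2024-1234", "debian:bookworm")] = "1.0-1+deb12u1",
};

// Upstream part of the installed version is 1.0; upstream fixed the issue in 1.5.0.
Console.WriteLine(BackportLookup.IsFixedNaive("1.0", "1.5.0")); // False
Console.WriteLine(BackportLookup.IsFixedBackportAware(
    backports, "CVE-2024-1234", "debian:bookworm", "1.0-1+deb12u1")); // True

public static class BackportLookup
{
    // Naive check: compares only upstream version numbers.
    public static bool IsFixedNaive(string installedUpstream, string upstreamFixed) =>
        Version.Parse(installedUpstream) >= Version.Parse(upstreamFixed);

    // Backport-aware check (simplified): the installed package version must equal
    // the backported version recorded for this CVE and distro release.
    public static bool IsFixedBackportAware(
        IReadOnlyDictionary<(string Cve, string DistroRelease), string> backports,
        string cve, string distroRelease, string installedPackageVersion) =>
        backports.TryGetValue((cve, distroRelease), out var fixedVersion)
            && string.Equals(fixedVersion, installedPackageVersion, StringComparison.Ordinal);
}
```

A production check would compare package versions with the distro's own ordering rules (for example dpkg or RPM semantics) rather than string equality.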
## Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│                         Ingestion Pipeline                          │
├─────────────────────────────────────────────────────────────────────┤
│  Distro Advisory   →  BackportEvidenceResolver   →  MergeHash       │
│  (DSA, RHSA, USN)   (calls BackportProofService)    Calculator      │
│                                   │                     │           │
│                                   ▼                     │           │
│                        ProvenanceScopeService           │           │
│                           (creates/updates              │           │
│                            provenance_scope)            │           │
│                                   │                     │           │
│                                   ▼                     ▼           │
│                    ┌─────────────────────────────────────────┐      │
│                    │               PostgreSQL                │      │
│                    │  ┌───────────────────────────────────┐  │      │
│                    │  │ vuln.provenance_scope             │  │      │
│                    │  │   - canonical_id (FK)             │  │      │
│                    │  │   - distro_release                │  │      │
│                    │  │   - backport_semver               │  │      │
│                    │  │   - patch_id                      │  │      │
│                    │  │   - patch_origin                  │  │      │
│                    │  │   - evidence_ref (proofchain FK)  │  │      │
│                    │  │   - confidence                    │  │      │
│                    │  └───────────────────────────────────┘  │      │
│                    └─────────────────────────────────────────┘      │
└─────────────────────────────────────────────────────────────────────┘
```
## Evidence Tiers
The `BackportProofService` produces evidence at four quality tiers:
| Tier | Name | Description | Typical Confidence |
|------|------|-------------|-------------------|
| 1 | DistroAdvisory | Direct distro advisory (DSA, RHSA, USN) confirms fix | 0.90 - 1.00 |
| 2 | ChangelogMention | Package changelog mentions CVE or patch commit | 0.75 - 0.90 |
| 3 | PatchHeader | Patch file header matches upstream fix commit | 0.60 - 0.85 |
| 4 | BinaryFingerprint | Binary analysis matches known-fixed function signatures | 0.40 - 0.70 |
Evidence from a stronger tier (a lower tier number, with Tier 1 the strongest) takes precedence when updating `provenance_scope` records.
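The precedence rule can be sketched as a small comparison function. This is a hedged illustration, not the service's actual logic: `Evidence` and `EvidencePrecedence` are hypothetical types, and the tie-break within a single tier (higher confidence wins) is an assumption that the source does not state.

```csharp
#nullable enable
using System;

var current = new Evidence(Tier: 2, Confidence: 0.80);   // ChangelogMention
var advisory = new Evidence(Tier: 1, Confidence: 0.95);  // DistroAdvisory
var binary = new Evidence(Tier: 4, Confidence: 0.60);    // BinaryFingerprint

Console.WriteLine(EvidencePrecedence.ShouldReplace(current, advisory)); // True
Console.WriteLine(EvidencePrecedence.ShouldReplace(current, binary));   // False

// Tier 1 is the strongest tier; Confidence lies in the range 0.00-1.00.
public sealed record Evidence(int Tier, double Confidence);

public static class EvidencePrecedence
{
    // Assumed rule: a stronger tier (lower number) always wins;
    // within the same tier, the higher confidence wins.
    public static bool ShouldReplace(Evidence? existing, Evidence candidate)
    {
        if (existing is null) return true;
        if (candidate.Tier != existing.Tier) return candidate.Tier < existing.Tier;
        return candidate.Confidence > existing.Confidence;
    }
}
```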
## Patch Origin
The `patch_origin` field tracks where the fix came from:
- **upstream**: Patch applied directly from upstream project commit
- **distro**: Distro-specific patch developed by maintainers
- **vendor**: Commercial vendor-specific patch
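Because the value appears both in lowercase (API and schema) and capitalized (audit log example) in this document, a consumer may want to parse it case-insensitively. The following is a minimal sketch under that assumption; the `PatchOrigin` enum and `PatchOriginParser` helper are hypothetical and not confirmed to match Concelier's types.

```csharp
using System;

Console.WriteLine(PatchOriginParser.Parse("upstream")); // Upstream
Console.WriteLine(PatchOriginParser.Parse("Distro"));   // Distro

// Hypothetical enum mirroring the three documented origins.
public enum PatchOrigin { Upstream, Distro, Vendor }

public static class PatchOriginParser
{
    // Accepts any casing of the three documented values and rejects everything else,
    // including numeric strings that Enum.TryParse would otherwise accept.
    public static PatchOrigin Parse(string value) =>
        Enum.TryParse<PatchOrigin>(value, ignoreCase: true, out var origin)
            && Enum.IsDefined(origin)
            ? origin
            : throw new ArgumentException($"Unknown patch origin: '{value}'", nameof(value));
}
```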
## Merge Hash Computation
The merge hash includes patch lineage to differentiate backport scenarios:
```csharp
// MergeHashCalculator computes deterministic hash
var input = new MergeHashInput
{
    CveId = "CVE-2024-1234",
    AffectsKey = "pkg:deb/debian/openssl@1.1.1n-0+deb11u5",
    Weaknesses = ["CWE-79"],
    PatchLineage = "abc123def456"  // upstream commit SHA
};
string mergeHash = calculator.ComputeMergeHash(input);
// Result: sha256:7f8a9b...
```
Two advisories with different patch lineage (e.g., Debian backport vs Ubuntu backport) produce different merge hashes, preventing incorrect deduplication.
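The effect of patch lineage on the hash can be illustrated with a standalone sketch. The canonicalization used here (uppercased CVE ID, ordinally sorted weaknesses, newline-joined fields, SHA-256) is an assumption chosen for the example; the real `MergeHashCalculator` and `PatchLineageNormalizer` may canonicalize differently, and `MergeHashSketch` is a hypothetical name.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

var withUpstreamPatch = MergeHashSketch.Compute("CVE-2024-1234",
    "pkg:deb/debian/openssl@1.1.1n-0+deb11u5", new[] { "CWE-79" }, "abc123def456");
var withDistroPatch = MergeHashSketch.Compute("CVE-2024-1234",
    "pkg:deb/debian/openssl@1.1.1n-0+deb11u5", new[] { "CWE-79" }, "ubuntu-specific-patch-001");

// Same CVE and package, different lineage: the hashes differ.
Console.WriteLine(withUpstreamPatch == withDistroPatch); // False

public static class MergeHashSketch
{
    public static string Compute(string cveId, string affectsKey, string[] weaknesses, string? patchLineage)
    {
        // Sort weaknesses so that input order does not change the hash.
        var sortedWeaknesses = weaknesses.OrderBy(w => w, StringComparer.Ordinal);

        // Join the fields in a fixed order with a separator that cannot occur in them.
        var canonical = string.Join("\n",
            cveId.ToUpperInvariant(),
            affectsKey,
            string.Join(",", sortedWeaknesses),
            patchLineage ?? string.Empty);

        var digest = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
    }
}
```

Determinism comes from the fixed field order and the sorting step: the same logical input always serializes to the same bytes before hashing.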
## API Endpoints
### Get Provenance for Canonical Advisory
```http
GET /api/v1/canonical/{id}/provenance
```
Returns all distro-specific provenance scopes:
```json
{
  "canonicalId": "11111111-1111-1111-1111-111111111111",
  "scopes": [
    {
      "id": "22222222-2222-2222-2222-222222222222",
      "distroRelease": "debian:bookworm",
      "backportSemver": "1.1.1n-0+deb12u1",
      "patchId": "abc123def456abc123def456abc123def456abc123",
      "patchOrigin": "upstream",
      "evidenceRef": "33333333-3333-3333-3333-333333333333",
      "confidence": 0.95,
      "updatedAt": "2025-01-15T10:30:00Z"
    },
    {
      "id": "44444444-4444-4444-4444-444444444444",
      "distroRelease": "ubuntu:22.04",
      "backportSemver": "1.1.1n-0ubuntu1.22.04.1",
      "patchId": "ubuntu-specific-patch-001",
      "patchOrigin": "distro",
      "confidence": 0.85,
      "updatedAt": "2025-01-15T11:00:00Z"
    }
  ],
  "totalCount": 2
}
```
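A client might consume this payload as sketched below. The example parses a trimmed copy of the response above with `System.Text.Json` and does not call the endpoint, since the host and authentication are not specified here. The `ProvenanceResponse` and `ProvenanceScopeDto` records are hypothetical shapes inferred from the JSON, not the service's published contracts.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Text.Json;

// Trimmed copy of the example response.
const string json = """
{
  "canonicalId": "11111111-1111-1111-1111-111111111111",
  "scopes": [
    { "distroRelease": "debian:bookworm", "patchOrigin": "upstream", "confidence": 0.95 },
    { "distroRelease": "ubuntu:22.04", "patchOrigin": "distro", "confidence": 0.85 }
  ],
  "totalCount": 2
}
""";

// Web defaults map camelCase JSON names onto PascalCase members.
var options = new JsonSerializerOptions(JsonSerializerDefaults.Web);
var response = JsonSerializer.Deserialize<ProvenanceResponse>(json, options)
    ?? throw new InvalidOperationException("Empty response body.");

// Pick the scope backed by the strongest evidence.
var best = response.Scopes.MaxBy(s => s.Confidence);
Console.WriteLine($"{best?.DistroRelease} ({best?.PatchOrigin})"); // debian:bookworm (upstream)

// Optional fields are nullable because the second scope omits evidenceRef.
public sealed record ProvenanceScopeDto(
    string DistroRelease, string? BackportSemver, string? PatchId,
    string? PatchOrigin, Guid? EvidenceRef, double Confidence);

public sealed record ProvenanceResponse(Guid CanonicalId, ProvenanceScopeDto[] Scopes, int TotalCount);
```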
## Database Schema
```sql
CREATE TABLE vuln.provenance_scope (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    canonical_id    UUID NOT NULL REFERENCES vuln.advisory_canonical(id) ON DELETE CASCADE,
    distro_release  TEXT NOT NULL,            -- e.g., 'debian:bookworm', 'rhel:9.2'
    backport_semver TEXT,                     -- distro's backported version
    patch_id        TEXT,                     -- upstream commit SHA or patch identifier
    patch_origin    TEXT,                     -- 'upstream', 'distro', 'vendor'
    evidence_ref    UUID,                     -- FK to proofchain.proof_entries
    confidence      NUMERIC(3,2) DEFAULT 0.5, -- 0.00-1.00
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (canonical_id, distro_release)
);
CREATE INDEX idx_provenance_scope_canonical ON vuln.provenance_scope(canonical_id);
CREATE INDEX idx_provenance_scope_distro ON vuln.provenance_scope(distro_release);
```
## Merge Audit Log
When a merge event includes backport evidence, that evidence is recorded in the merge audit log alongside the event:
```csharp
var record = new MergeEventRecord(
    id: Guid.NewGuid(),
    advisoryKey: "CVE-2024-1234",
    beforeHash: previousHash,
    afterHash: newHash,
    mergedAt: DateTimeOffset.UtcNow,
    inputDocumentIds: [...],
    fieldDecisions: [...],
    backportEvidence: [
        new BackportEvidenceDecision(
            cveId: "CVE-2024-1234",
            distroRelease: "debian:bookworm",
            evidenceTier: "DistroAdvisory",
            confidence: 0.95,
            patchId: "abc123...",
            patchOrigin: "Upstream",
            proofId: "proof:33333333-...",
            evidenceDate: DateTimeOffset.UtcNow
        )
    ]
);
```
## Configuration
Backport detection is enabled by default. Configure via `concelier.yaml`:
```yaml
concelier:
  backport:
    enabled: true
    # Minimum confidence threshold for creating provenance scope
    minConfidence: 0.3
    # Evidence tiers to consider (1=DistroAdvisory, 2=Changelog, 3=PatchHeader, 4=Binary)
    enabledTiers: [1, 2, 3, 4]
    # Sources with precedence for patch origin
    precedence:
      - upstream
      - distro
      - vendor
```
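How these settings could gate evidence is sketched below. This is an assumption-laden illustration: `BackportOptions`, `Candidate`, and `BackportFilter` are hypothetical types, the threshold is treated as inclusive, and the `precedence` list is left out because its exact semantics are not described here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Mirrors the YAML above: enabled, minConfidence 0.3, all four tiers.
var options = new BackportOptions(Enabled: true, MinConfidence: 0.3, EnabledTiers: new[] { 1, 2, 3, 4 });

var candidates = new[]
{
    new Candidate("debian:bookworm", Tier: 1, Confidence: 0.95),
    new Candidate("debian:bookworm", Tier: 4, Confidence: 0.25), // below minConfidence
};

foreach (var c in BackportFilter.Apply(options, candidates))
    Console.WriteLine($"{c.DistroRelease} tier={c.Tier}"); // debian:bookworm tier=1

public sealed record BackportOptions(bool Enabled, double MinConfidence, IReadOnlyCollection<int> EnabledTiers);
public sealed record Candidate(string DistroRelease, int Tier, double Confidence);

public static class BackportFilter
{
    // Keeps only evidence from an enabled tier that meets the (assumed inclusive) threshold.
    public static IEnumerable<Candidate> Apply(BackportOptions options, IEnumerable<Candidate> candidates) =>
        !options.Enabled
            ? Enumerable.Empty<Candidate>()
            : candidates.Where(c => options.EnabledTiers.Contains(c.Tier)
                                 && c.Confidence >= options.MinConfidence);
}
```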
## Testing
The `BackportProvenanceE2ETests` class covers the feature end to end with the following tests:
- `E2E_IngestDebianAdvisoryWithBackport_CreatesProvenanceScope`
- `E2E_IngestRhelAdvisoryWithBackport_CreatesProvenanceScopeWithDistroOrigin`
- `E2E_SameCveMultipleDistros_CreatesSeparateProvenanceScopes`
- `E2E_MergeWithBackportEvidence_RecordsInAuditLog`
- `E2E_EvidenceUpgrade_UpdatesProvenanceScope`
- `E2E_RetrieveProvenanceForCanonical_ReturnsAllDistroScopes`
## Related Components
- **BackportProofService**: Generates proof blobs for backport detection (in `StellaOps.Concelier.ProofService`)
- **MergeHashCalculator**: Computes deterministic merge hashes (in `StellaOps.Concelier.Merge`)
- **PatchLineageNormalizer**: Normalizes patch identifiers for hashing (in `StellaOps.Concelier.Merge`)
- **ProvenanceScopeRepository**: PostgreSQL persistence (in `StellaOps.Concelier.Storage.Postgres`)