save dev progress

StellaOps Bot
2025-12-26 00:32:35 +02:00
parent aa70af062e
commit ed3079543c
142 changed files with 23771 additions and 232 deletions


@@ -0,0 +1,211 @@
# Backport-Aware Deduplication
> Sprint: SPRINT_8200_0015_0001_CONCEL_backport_integration
> Task: BACKPORT-8200-027
## Overview
Linux distributions frequently backport security fixes from upstream projects to their stable package versions without updating the full version number. This creates a challenge for vulnerability scanning: a Debian package at version `1.0-1+deb12u1` may contain the fix for CVE-2024-1234 even though the upstream fixed version is `1.5.0`.
Concelier's backport-aware deduplication addresses this by:
1. **Detecting backports** through the `BackportProofService` which analyzes distro advisories, changelogs, patch headers, and binary fingerprints
2. **Tracking provenance** per-distro in the `provenance_scope` table
3. **Including patch lineage** in merge hash computation for deterministic deduplication
4. **Recording evidence** in the merge audit log for traceability
## Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│ Ingestion Pipeline │
├─────────────────────────────────────────────────────────────────────┤
│ Distro Advisory → BackportEvidenceResolver → MergeHash │
│ (DSA, RHSA, USN) (calls BackportProofService) Calculator │
│ │ │ │
│ ▼ │ │
│ ProvenanceScopeService │ │
│ (creates/updates │ │
│ provenance_scope) │ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────────────────────────────┐ │
│ │ PostgreSQL │ │
│ │ ┌───────────────────────────────────┐ │ │
│ │ │ vuln.provenance_scope │ │ │
│ │ │ - canonical_id (FK) │ │ │
│ │ │ - distro_release │ │ │
│ │ │ - backport_semver │ │ │
│ │ │ - patch_id │ │ │
│ │ │ - patch_origin │ │ │
│ │ │ - evidence_ref (proofchain FK) │ │ │
│ │ │ - confidence │ │ │
│ │ └───────────────────────────────────┘ │ │
│ └─────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
```
## Evidence Tiers
The `BackportProofService` produces evidence at four quality tiers:
| Tier | Name | Description | Typical Confidence |
|------|------|-------------|-------------------|
| 1 | DistroAdvisory | Direct distro advisory (DSA, RHSA, USN) confirms fix | 0.90 - 1.00 |
| 2 | ChangelogMention | Package changelog mentions CVE or patch commit | 0.75 - 0.90 |
| 3 | PatchHeader | Patch file header matches upstream fix commit | 0.60 - 0.85 |
| 4 | BinaryFingerprint | Binary analysis matches known-fixed function signatures | 0.40 - 0.70 |
Evidence from a lower-numbered tier takes precedence when updating `provenance_scope` records (Tier 1 is the strongest, Tier 4 the weakest).
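
The precedence rule can be sketched as a small comparison function. This is an illustrative Python sketch, not Concelier's implementation: the `Evidence` type and `should_replace` helper are hypothetical names, and the tie-break on confidence within the same tier is an assumption that the source does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    """Hypothetical stand-in for one piece of backport evidence."""
    tier: int          # 1 = DistroAdvisory ... 4 = BinaryFingerprint
    confidence: float  # 0.00 - 1.00


def should_replace(existing: Optional[Evidence], incoming: Evidence) -> bool:
    """Return True when incoming evidence should overwrite the stored record."""
    if existing is None:
        return True
    if incoming.tier != existing.tier:
        # A lower tier number means stronger evidence.
        return incoming.tier < existing.tier
    # Same tier: assumed tie-break on confidence (not specified in this document).
    return incoming.confidence > existing.confidence
```

With this rule, a Tier 1 advisory replaces an earlier Tier 3 patch-header match, while a later Tier 4 fingerprint never downgrades a Tier 1 record.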
## Patch Origin
The `patch_origin` field tracks where the fix came from:
- **upstream**: Patch applied directly from upstream project commit
- **distro**: Distro-specific patch developed by maintainers
- **vendor**: Commercial vendor-specific patch
## Merge Hash Computation
The merge hash includes patch lineage to differentiate backport scenarios:
```csharp
// MergeHashCalculator computes deterministic hash
var input = new MergeHashInput
{
CveId = "CVE-2024-1234",
AffectsKey = "pkg:deb/debian/openssl@1.1.1n-0+deb11u5",
Weaknesses = ["CWE-79"],
PatchLineage = "abc123def456" // upstream commit SHA
};
string mergeHash = calculator.ComputeMergeHash(input);
// Result: sha256:7f8a9b...
```
Two advisories with different patch lineage (e.g., Debian backport vs Ubuntu backport) produce different merge hashes, preventing incorrect deduplication.
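
To illustrate why patch lineage changes the result, here is a minimal Python sketch of a deterministic hash over the same four fields. The field order, the `|` separator, and the normalization steps are assumptions made for illustration; the real `MergeHashCalculator` and `PatchLineageNormalizer` may canonicalize their inputs differently.

```python
import hashlib
from typing import Iterable, Optional


def compute_merge_hash(
    cve_id: str,
    affects_key: str,
    weaknesses: Iterable[str],
    patch_lineage: Optional[str] = None,
) -> str:
    """Hash a canonical string built from the merge inputs (illustrative only)."""
    parts = [
        cve_id.strip().upper(),
        affects_key.strip(),
        # Sort weaknesses so input order does not affect the hash.
        ",".join(sorted(w.strip().upper() for w in weaknesses)),
        # Assumed normalization: lowercase the lineage identifier.
        (patch_lineage or "").strip().lower(),
    ]
    canonical = "|".join(parts)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Calling the function twice with the same inputs yields the same hash, and changing only `patch_lineage` yields a different one, which is the property that keeps distro-specific backports from collapsing into a single record.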
## API Endpoints
### Get Provenance for Canonical Advisory
```http
GET /api/v1/canonical/{id}/provenance
```
Returns all distro-specific provenance scopes:
```json
{
"canonicalId": "11111111-1111-1111-1111-111111111111",
"scopes": [
{
"id": "22222222-2222-2222-2222-222222222222",
"distroRelease": "debian:bookworm",
"backportSemver": "1.1.1n-0+deb12u1",
"patchId": "abc123def456abc123def456abc123def456abc1",
"patchOrigin": "upstream",
"evidenceRef": "33333333-3333-3333-3333-333333333333",
"confidence": 0.95,
"updatedAt": "2025-01-15T10:30:00Z"
},
{
"id": "44444444-4444-4444-4444-444444444444",
"distroRelease": "ubuntu:22.04",
"backportSemver": "1.1.1n-0ubuntu1.22.04.1",
"patchId": "ubuntu-specific-patch-001",
"patchOrigin": "distro",
"confidence": 0.85,
"updatedAt": "2025-01-15T11:00:00Z"
}
],
"totalCount": 2
}
```
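
A client consuming this response might look up the scope for one distro release, as in the Python sketch below. The helper names `find_scope` and `has_proof` are hypothetical, and `SAMPLE` is a trimmed copy of the response shown above. Note that `evidenceRef` is absent from the Ubuntu scope in the example, so the sketch treats it as optional.

```python
import json
from typing import Optional

# Trimmed copy of the example response, kept inline so the sketch runs offline.
SAMPLE = """
{
  "canonicalId": "11111111-1111-1111-1111-111111111111",
  "scopes": [
    {"distroRelease": "debian:bookworm", "patchOrigin": "upstream",
     "evidenceRef": "33333333-3333-3333-3333-333333333333", "confidence": 0.95},
    {"distroRelease": "ubuntu:22.04", "patchOrigin": "distro", "confidence": 0.85}
  ],
  "totalCount": 2
}
"""


def find_scope(body: str, distro_release: str) -> Optional[dict]:
    """Return the provenance scope for one distro release, or None if absent."""
    scopes = json.loads(body).get("scopes", [])
    return next((s for s in scopes if s.get("distroRelease") == distro_release), None)


def has_proof(scope: dict) -> bool:
    """True when the scope links to a proofchain entry via evidenceRef."""
    return bool(scope.get("evidenceRef"))
```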
## Database Schema
```sql
CREATE TABLE vuln.provenance_scope (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
canonical_id UUID NOT NULL REFERENCES vuln.advisory_canonical(id) ON DELETE CASCADE,
distro_release TEXT NOT NULL, -- e.g., 'debian:bookworm', 'rhel:9.2'
backport_semver TEXT, -- distro's backported version
patch_id TEXT, -- upstream commit SHA or patch identifier
patch_origin TEXT, -- 'upstream', 'distro', 'vendor'
evidence_ref UUID, -- FK to proofchain.proof_entries
confidence NUMERIC(3,2) DEFAULT 0.5, -- 0.00-1.00
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (canonical_id, distro_release)
);
CREATE INDEX idx_provenance_scope_canonical ON vuln.provenance_scope(canonical_id);
CREATE INDEX idx_provenance_scope_distro ON vuln.provenance_scope(distro_release);
```
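
The `UNIQUE (canonical_id, distro_release)` constraint means each canonical advisory holds at most one scope per distro release, so writes are naturally expressed as an upsert. The sketch below demonstrates the pattern with Python's built-in `sqlite3` module and a simplified table, used here only so the example runs without a PostgreSQL server. The column subset, the `upsert_scope` helper, and the choice to overwrite every field on conflict are assumptions; the actual `ProvenanceScopeRepository` may behave differently.

```python
import sqlite3

# Simplified stand-in for vuln.provenance_scope (SQLite, not PostgreSQL).
SCHEMA = """
CREATE TABLE provenance_scope (
    canonical_id    TEXT NOT NULL,
    distro_release  TEXT NOT NULL,
    backport_semver TEXT,
    patch_id        TEXT,
    patch_origin    TEXT,
    confidence      REAL DEFAULT 0.5,
    UNIQUE (canonical_id, distro_release)
)
"""

UPSERT = """
INSERT INTO provenance_scope
    (canonical_id, distro_release, backport_semver, patch_id, patch_origin, confidence)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT (canonical_id, distro_release) DO UPDATE SET
    backport_semver = excluded.backport_semver,
    patch_id        = excluded.patch_id,
    patch_origin    = excluded.patch_origin,
    confidence      = excluded.confidence
"""


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding the simplified table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def upsert_scope(conn, canonical_id, distro_release, backport_semver,
                 patch_id, patch_origin, confidence) -> None:
    """Insert a scope, or update the existing row for the same advisory and distro."""
    conn.execute(UPSERT, (canonical_id, distro_release, backport_semver,
                          patch_id, patch_origin, confidence))
    conn.commit()
```

PostgreSQL accepts the same `ON CONFLICT (...) DO UPDATE` form, with `EXCLUDED` referring to the row proposed for insertion.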
## Merge Audit Log
When a merge event includes backport evidence, it's recorded in the audit log:
```csharp
var record = new MergeEventRecord(
id: Guid.NewGuid(),
advisoryKey: "CVE-2024-1234",
beforeHash: previousHash,
afterHash: newHash,
mergedAt: DateTimeOffset.UtcNow,
inputDocumentIds: [...],
fieldDecisions: [...],
backportEvidence: [
new BackportEvidenceDecision(
cveId: "CVE-2024-1234",
distroRelease: "debian:bookworm",
evidenceTier: "DistroAdvisory",
confidence: 0.95,
patchId: "abc123...",
patchOrigin: "Upstream",
proofId: "proof:33333333-...",
evidenceDate: DateTimeOffset.UtcNow
)
]
);
```
## Configuration
Backport detection is enabled by default. Configure via `concelier.yaml`:
```yaml
concelier:
backport:
enabled: true
# Minimum confidence threshold for creating provenance scope
minConfidence: 0.3
# Evidence tiers to consider (1=DistroAdvisory, 2=Changelog, 3=PatchHeader, 4=Binary)
enabledTiers: [1, 2, 3, 4]
# Sources with precedence for patch origin
precedence:
- upstream
- distro
- vendor
```
## Testing
The `BackportProvenanceE2ETests` class covers the following end-to-end scenarios:
- `E2E_IngestDebianAdvisoryWithBackport_CreatesProvenanceScope`
- `E2E_IngestRhelAdvisoryWithBackport_CreatesProvenanceScopeWithDistroOrigin`
- `E2E_SameCveMultipleDistros_CreatesSeparateProvenanceScopes`
- `E2E_MergeWithBackportEvidence_RecordsInAuditLog`
- `E2E_EvidenceUpgrade_UpdatesProvenanceScope`
- `E2E_RetrieveProvenanceForCanonical_ReturnsAllDistroScopes`
## Related Components
- **BackportProofService**: Generates proof blobs for backport detection (in `StellaOps.Concelier.ProofService`)
- **MergeHashCalculator**: Computes deterministic merge hashes (in `StellaOps.Concelier.Merge`)
- **PatchLineageNormalizer**: Normalizes patch identifiers for hashing (in `StellaOps.Concelier.Merge`)
- **ProvenanceScopeRepository**: PostgreSQL persistence (in `StellaOps.Concelier.Storage.Postgres`)


@@ -5,6 +5,25 @@ info:
description: >-
Canonical, aggregation-only surface for append-only findings events, projections, and
Merkle anchoring metadata. Aligns with schema in docs/modules/findings-ledger/schema.md.
contact:
name: StellaOps API Team
url: https://stellaops.io/docs/api
email: api@stellaops.io
tags:
- name: ledger
description: Ledger event operations
- name: projections
description: Finding projections
- name: export
description: Data export endpoints
- name: attestation
description: Attestation verification
- name: metadata
description: API metadata endpoints
- name: scoring
description: Evidence-Weighted Score (EWS) operations
- name: webhooks
description: Webhook management for score notifications
servers:
- url: https://{env}.ledger.api.stellaops.local
description: Default environment-scoped host
@@ -357,15 +376,15 @@ paths:
operationId: calculateFindingScore
tags: [scoring]
security:
- bearerAuth: [write:scores]
- bearerAuth: []
parameters:
- name: findingId
in: path
required: true
description: Finding identifier in format CVE-ID@pkg:PURL
description: Finding identifier in format CVE-ID@pkg:PURL. Requires scope write:scores.
schema:
type: string
pattern: "^[A-Z]+-\\d+@pkg:.+$"
pattern: "^[A-Z]+-\\d+-\\d+@pkg:.+$"
example: "CVE-2024-1234@pkg:deb/debian/curl@7.64.0-4"
requestBody:
required: false
@@ -406,7 +425,7 @@ paths:
explanations:
- "Static reachability: path to vulnerable sink (confidence: 85%)"
- "Runtime: 3 observations in last 24 hours"
policyDigest: "sha256:abc123..."
policyDigest: "sha256:a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2"
calculatedAt: "2026-01-15T14:30:00Z"
cachedUntil: "2026-01-15T15:30:00Z"
'400':
@@ -425,11 +444,11 @@ paths:
description: Rate limit exceeded (100/min)
get:
summary: Get cached evidence-weighted score for a finding
description: Returns the most recently calculated score from cache. Returns 404 if no score has been calculated.
description: Returns the most recently calculated score from cache. Returns 404 if no score has been calculated. Requires scope read:scores.
operationId: getFindingScore
tags: [scoring]
security:
- bearerAuth: [read:scores]
- bearerAuth: []
parameters:
- name: findingId
in: path
@@ -443,17 +462,25 @@ paths:
application/json:
schema:
$ref: '#/components/schemas/EvidenceWeightedScoreResponse'
example:
findingId: "CVE-2024-1234@pkg:deb/debian/curl@7.64.0-4"
score: 78
bucket: "ScheduleNext"
policyDigest: "sha256:a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2"
calculatedAt: "2026-01-15T14:30:00Z"
cachedUntil: "2026-01-15T15:30:00Z"
fromCache: true
'404':
description: No cached score found
/api/v1/findings/scores:
post:
summary: Calculate evidence-weighted scores for multiple findings
description: Batch calculation of scores for up to 100 findings. Returns summary statistics and individual results.
description: Batch calculation of scores for up to 100 findings. Returns summary statistics and individual results. Requires scope write:scores.
operationId: calculateFindingScoresBatch
tags: [scoring]
security:
- bearerAuth: [write:scores]
- bearerAuth: []
requestBody:
required: true
content:
@@ -473,6 +500,23 @@ paths:
application/json:
schema:
$ref: '#/components/schemas/CalculateScoresBatchResponse'
example:
results:
- findingId: "CVE-2024-1234@pkg:npm/lodash@4.17.20"
score: 78
bucket: "ScheduleNext"
policyDigest: "sha256:a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2"
calculatedAt: "2026-01-15T14:30:00Z"
summary:
total: 2
succeeded: 2
failed: 0
byBucket: { actNow: 0, scheduleNext: 1, investigate: 1, watchlist: 0 }
averageScore: 65
calculationTimeMs: 45
errors: []
policyDigest: "sha256:a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2"
calculatedAt: "2026-01-15T14:30:00Z"
'400':
description: Invalid request or batch too large (max 100)
content:
@@ -485,11 +529,11 @@ paths:
/api/v1/findings/{findingId}/score-history:
get:
summary: Get score history for a finding
description: Returns historical score calculations with pagination. Tracks score changes, triggers, and which factors changed.
description: Returns historical score calculations with pagination. Tracks score changes, triggers, and which factors changed. Requires scope read:scores.
operationId: getFindingScoreHistory
tags: [scoring]
security:
- bearerAuth: [read:scores]
- bearerAuth: []
parameters:
- name: findingId
in: path
@@ -528,17 +572,34 @@ paths:
application/json:
schema:
$ref: '#/components/schemas/ScoreHistoryResponse'
example:
findingId: "CVE-2024-1234@pkg:deb/debian/curl@7.64.0-4"
history:
- score: 78
bucket: "ScheduleNext"
policyDigest: "sha256:a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2"
calculatedAt: "2026-01-15T14:30:00Z"
trigger: "evidence_update"
changedFactors: ["rts", "xpl"]
- score: 65
bucket: "Investigate"
policyDigest: "sha256:a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2"
calculatedAt: "2026-01-10T09:15:00Z"
trigger: "scheduled"
changedFactors: []
pagination:
hasMore: false
'404':
description: Finding not found
/api/v1/scoring/policy:
get:
summary: Get active scoring policy configuration
description: Returns the currently active evidence weight policy including weights, guardrails, and bucket thresholds.
description: Returns the currently active evidence weight policy including weights, guardrails, and bucket thresholds. Requires scope read:scores.
operationId: getActiveScoringPolicy
tags: [scoring]
security:
- bearerAuth: [read:scores]
- bearerAuth: []
responses:
'200':
description: Active policy retrieved
@@ -548,7 +609,7 @@ paths:
$ref: '#/components/schemas/ScoringPolicyResponse'
example:
version: "ews.v1.2"
digest: "sha256:abc123..."
digest: "sha256:a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2"
activeSince: "2026-01-01T00:00:00Z"
environment: "production"
weights:
@@ -570,11 +631,11 @@ paths:
/api/v1/scoring/policy/{version}:
get:
summary: Get specific scoring policy version
description: Returns a specific version of the scoring policy for historical comparison or audit.
description: Returns a specific version of the scoring policy for historical comparison or audit. Requires scope read:scores.
operationId: getScoringPolicyVersion
tags: [scoring]
security:
- bearerAuth: [read:scores]
- bearerAuth: []
parameters:
- name: version
in: path
@@ -589,6 +650,26 @@ paths:
application/json:
schema:
$ref: '#/components/schemas/ScoringPolicyResponse'
example:
version: "ews.v1.2"
digest: "sha256:a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2"
activeSince: "2026-01-01T00:00:00Z"
environment: "production"
weights:
rch: 0.30
rts: 0.25
bkp: 0.15
xpl: 0.15
src: 0.10
mit: 0.10
guardrails:
notAffectedCap: { enabled: true, maxScore: 15 }
runtimeFloor: { enabled: true, minScore: 60 }
speculativeCap: { enabled: true, maxScore: 45 }
buckets:
actNowMin: 90
scheduleNextMin: 70
investigateMin: 40
'404':
description: Policy version not found
@@ -603,7 +684,7 @@ paths:
operationId: registerScoringWebhook
tags: [scoring, webhooks]
security:
- bearerAuth: [admin:scoring]
- bearerAuth: []
requestBody:
required: true
content:
@@ -623,16 +704,25 @@ paths:
application/json:
schema:
$ref: '#/components/schemas/WebhookResponse'
example:
id: "550e8400-e29b-41d4-a716-446655440000"
url: "https://example.com/webhook/scores"
hasSecret: true
findingPatterns: ["CVE-*"]
minScoreChange: 10
triggerOnBucketChange: true
createdAt: "2026-01-15T14:30:00Z"
'400':
description: Invalid webhook URL or configuration
'429':
description: Rate limit exceeded (10/min)
get:
summary: List all registered webhooks
description: List all registered scoring webhooks. Requires scope admin:scoring.
operationId: listScoringWebhooks
tags: [scoring, webhooks]
security:
- bearerAuth: [admin:scoring]
- bearerAuth: []
responses:
'200':
description: List of webhooks
@@ -640,14 +730,25 @@ paths:
application/json:
schema:
$ref: '#/components/schemas/WebhookListResponse'
example:
webhooks:
- id: "550e8400-e29b-41d4-a716-446655440000"
url: "https://example.com/webhook/scores"
hasSecret: true
findingPatterns: ["CVE-*"]
minScoreChange: 10
triggerOnBucketChange: true
createdAt: "2026-01-15T14:30:00Z"
totalCount: 1
/api/v1/scoring/webhooks/{id}:
get:
summary: Get a specific webhook by ID
description: Get details of a specific webhook. Requires scope admin:scoring.
operationId: getScoringWebhook
tags: [scoring, webhooks]
security:
- bearerAuth: [admin:scoring]
- bearerAuth: []
parameters:
- name: id
in: path
@@ -662,14 +763,23 @@ paths:
application/json:
schema:
$ref: '#/components/schemas/WebhookResponse'
example:
id: "550e8400-e29b-41d4-a716-446655440000"
url: "https://example.com/webhook/scores"
hasSecret: true
findingPatterns: ["CVE-*"]
minScoreChange: 10
triggerOnBucketChange: true
createdAt: "2026-01-15T14:30:00Z"
'404':
description: Webhook not found
put:
summary: Update a webhook configuration
description: Update a webhook configuration. Requires scope admin:scoring.
operationId: updateScoringWebhook
tags: [scoring, webhooks]
security:
- bearerAuth: [admin:scoring]
- bearerAuth: []
parameters:
- name: id
in: path
@@ -690,16 +800,25 @@ paths:
application/json:
schema:
$ref: '#/components/schemas/WebhookResponse'
example:
id: "550e8400-e29b-41d4-a716-446655440000"
url: "https://example.com/webhook/updated"
hasSecret: true
findingPatterns: ["CVE-*", "GHSA-*"]
minScoreChange: 5
triggerOnBucketChange: true
createdAt: "2026-01-15T14:30:00Z"
'404':
description: Webhook not found
'400':
description: Invalid configuration
delete:
summary: Delete a webhook
description: Delete a webhook registration. Requires scope admin:scoring.
operationId: deleteScoringWebhook
tags: [scoring, webhooks]
security:
- bearerAuth: [admin:scoring]
- bearerAuth: []
parameters:
- name: id
in: path