save dev progress

2025-12-26 00:32:35 +02:00
parent aa70af062e
commit ed3079543c
142 changed files with 23771 additions and 232 deletions
--- a/docs/modules/concelier/backport-deduplication.md
+++ b/docs/modules/concelier/backport-deduplication.md
@@ -0,0 +1,211 @@
+# Backport-Aware Deduplication
+
+> Sprint: SPRINT_8200_0015_0001_CONCEL_backport_integration
+> Task: BACKPORT-8200-027
+
+## Overview
+
+Linux distributions frequently backport security fixes from upstream projects to their stable package versions without updating the full version number. This creates a challenge for vulnerability scanning: a Debian package at version `1.0-1+deb12u1` may contain the fix for CVE-2024-1234 even though the upstream fixed version is `1.5.0`.
+
+Concelier's backport-aware deduplication addresses this by:
+
+1. **Detecting backports** through the `BackportProofService`, which analyzes distro advisories, changelogs, patch headers, and binary fingerprints
+2. **Tracking provenance** per-distro in the `provenance_scope` table
+3. **Including patch lineage** in merge hash computation for deterministic deduplication
+4. **Recording evidence** in the merge audit log for traceability
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│                        Ingestion Pipeline                           │
+├─────────────────────────────────────────────────────────────────────┤
+│  Distro Advisory   →   BackportEvidenceResolver   →   MergeHash    │
+│  (DSA, RHSA, USN)      (calls BackportProofService)     Calculator │
+│                              │                              │       │
+│                              ▼                              │       │
+│                    ProvenanceScopeService                   │       │
+│                        (creates/updates                     │       │
+│                         provenance_scope)                   │       │
+│                              │                              │       │
+│                              ▼                              ▼       │
+│                    ┌─────────────────────────────────────────┐     │
+│                    │           PostgreSQL                    │     │
+│                    │  ┌───────────────────────────────────┐  │     │
+│                    │  │      vuln.provenance_scope        │  │     │
+│                    │  │  - canonical_id (FK)              │  │     │
+│                    │  │  - distro_release                 │  │     │
+│                    │  │  - backport_semver                │  │     │
+│                    │  │  - patch_id                       │  │     │
+│                    │  │  - patch_origin                   │  │     │
+│                    │  │  - evidence_ref (proofchain FK)   │  │     │
+│                    │  │  - confidence                     │  │     │
+│                    │  └───────────────────────────────────┘  │     │
+│                    └─────────────────────────────────────────┘     │
+└─────────────────────────────────────────────────────────────────────┘
+```
+
+## Evidence Tiers
+
+The `BackportProofService` produces evidence at four quality tiers:
+
+| Tier | Name | Description | Typical Confidence |
+|------|------|-------------|-------------------|
+| 1 | DistroAdvisory | Direct distro advisory (DSA, RHSA, USN) confirms fix | 0.90 - 1.00 |
+| 2 | ChangelogMention | Package changelog mentions CVE or patch commit | 0.75 - 0.90 |
+| 3 | PatchHeader | Patch file header matches upstream fix commit | 0.60 - 0.85 |
+| 4 | BinaryFingerprint | Binary analysis matches known-fixed function signatures | 0.40 - 0.70 |
+
+Higher-quality evidence (a lower tier number, with Tier 1 the strongest) takes precedence when updating `provenance_scope` records.
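+
+The precedence rule can be sketched as a small comparison. This is a minimal illustration, not the actual `ProvenanceScopeService` logic: `EvidenceTier`, `ScopeEvidence`, and `EvidencePrecedence` are hypothetical names, and the tie-break on confidence within a single tier is an assumption this document does not confirm.
+
+```csharp
+#nullable enable
+using System;
+
+// Hypothetical types for illustration only; the real Concelier models may differ.
+public enum EvidenceTier
+{
+    DistroAdvisory = 1,
+    ChangelogMention = 2,
+    PatchHeader = 3,
+    BinaryFingerprint = 4
+}
+
+public sealed record ScopeEvidence(EvidenceTier Tier, decimal Confidence);
+
+public static class EvidencePrecedence
+{
+    // Assumption: a lower tier number always wins; confidence only breaks ties within a tier.
+    public static bool ShouldReplace(ScopeEvidence? existing, ScopeEvidence candidate)
+    {
+        if (existing is null) return true;
+        if (candidate.Tier != existing.Tier) return candidate.Tier < existing.Tier;
+        return candidate.Confidence > existing.Confidence;
+    }
+}
+
+public static class Program
+{
+    public static void Main()
+    {
+        var stored = new ScopeEvidence(EvidenceTier.PatchHeader, 0.80m);
+        var advisory = new ScopeEvidence(EvidenceTier.DistroAdvisory, 0.95m);
+        var fingerprint = new ScopeEvidence(EvidenceTier.BinaryFingerprint, 0.70m);
+
+        Console.WriteLine(EvidencePrecedence.ShouldReplace(stored, advisory));    // True
+        Console.WriteLine(EvidencePrecedence.ShouldReplace(stored, fingerprint)); // False
+    }
+}
+```
+
+Under these assumptions a stored Tier 3 record is upgraded by a Tier 1 advisory but is left alone when only Tier 4 fingerprint evidence arrives.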
+
+## Patch Origin
+
+The `patch_origin` field tracks where the fix came from:
+
+- **upstream**: Patch applied directly from upstream project commit
+- **distro**: Distro-specific patch developed by maintainers
+- **vendor**: Commercial vendor-specific patch
+
+## Merge Hash Computation
+
+The merge hash includes patch lineage to differentiate backport scenarios:
+
+```csharp
+// MergeHashCalculator computes deterministic hash
+var input = new MergeHashInput
+{
+    CveId = "CVE-2024-1234",
+    AffectsKey = "pkg:deb/debian/openssl@1.1.1n-0+deb11u5",
+    Weaknesses = ["CWE-79"],
+    PatchLineage = "abc123def456" // upstream commit SHA
+};
+
+string mergeHash = calculator.ComputeMergeHash(input);
+// Result: sha256:7f8a9b...
+```
+
+Two advisories with different patch lineage (e.g., Debian backport vs Ubuntu backport) produce different merge hashes, preventing incorrect deduplication.
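+
+To see why patch lineage changes the result, the sketch below hashes a canonical string that includes it. The canonical form here (fields joined with `|`, weaknesses sorted ordinally) is an assumption made for illustration; the real `MergeHashCalculator` and `PatchLineageNormalizer` may canonicalize differently, and `MergeHashSketch` is a hypothetical name.
+
+```csharp
+#nullable enable
+using System;
+using System.Security.Cryptography;
+using System.Text;
+
+public static class MergeHashSketch
+{
+    // Hypothetical canonical form: identity fields joined with '|', weaknesses sorted.
+    public static string Compute(string cveId, string affectsKey, string[] weaknesses, string? patchLineage)
+    {
+        var sorted = (string[])weaknesses.Clone();
+        Array.Sort(sorted, StringComparer.Ordinal);
+
+        var canonical = string.Join("|", cveId, affectsKey, string.Join(",", sorted), patchLineage ?? string.Empty);
+        var digest = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
+        return "sha256:" + Convert.ToHexString(digest).ToLowerInvariant();
+    }
+}
+
+public static class Program
+{
+    public static void Main()
+    {
+        const string cve = "CVE-2024-1234";
+        const string key = "pkg:deb/debian/openssl@1.1.1n-0+deb11u5";
+        string[] cwes = { "CWE-79" };
+
+        var upstream = MergeHashSketch.Compute(cve, key, cwes, "abc123def456");
+        var distro = MergeHashSketch.Compute(cve, key, cwes, "ubuntu-specific-patch-001");
+        var repeat = MergeHashSketch.Compute(cve, key, cwes, "abc123def456");
+
+        Console.WriteLine(upstream == repeat); // True: same input, same hash
+        Console.WriteLine(upstream == distro); // False: lineage differs
+    }
+}
+```
+
+Sorting the weaknesses before hashing keeps the result independent of input order, which is one common way to make such a hash deterministic.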
+
+## API Endpoints
+
+### Get Provenance for Canonical Advisory
+
+```http
+GET /api/v1/canonical/{id}/provenance
+```
+
+Returns all distro-specific provenance scopes:
+
+```json
+{
+  "canonicalId": "11111111-1111-1111-1111-111111111111",
+  "scopes": [
+    {
+      "id": "22222222-2222-2222-2222-222222222222",
+      "distroRelease": "debian:bookworm",
+      "backportSemver": "1.1.1n-0+deb12u1",
+      "patchId": "abc123def456abc123def456abc123def456abc123",
+      "patchOrigin": "upstream",
+      "evidenceRef": "33333333-3333-3333-3333-333333333333",
+      "confidence": 0.95,
+      "updatedAt": "2025-01-15T10:30:00Z"
+    },
+    {
+      "id": "44444444-4444-4444-4444-444444444444",
+      "distroRelease": "ubuntu:22.04",
+      "backportSemver": "1.1.1n-0ubuntu1.22.04.1",
+      "patchId": "ubuntu-specific-patch-001",
+      "patchOrigin": "distro",
+      "confidence": 0.85,
+      "updatedAt": "2025-01-15T11:00:00Z"
+    }
+  ],
+  "totalCount": 2
+}
+```
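+
+A client could read this payload with `System.Text.Json` as sketched below. The DTO names (`ProvenanceScopeDto`, `ProvenanceResponse`) are hypothetical and only mirror the fields shown above. Treating `evidenceRef` as optional is inferred from the second scope in the sample, which omits it.
+
+```csharp
+#nullable enable
+using System;
+using System.Collections.Generic;
+using System.Text.Json;
+
+// Hypothetical DTOs mirroring the sample response; not the service's own contracts.
+public sealed record ProvenanceScopeDto(
+    Guid Id,
+    string DistroRelease,
+    string? BackportSemver,
+    string? PatchId,
+    string? PatchOrigin,
+    Guid? EvidenceRef,
+    decimal Confidence,
+    DateTimeOffset UpdatedAt);
+
+public sealed record ProvenanceResponse(Guid CanonicalId, List<ProvenanceScopeDto> Scopes, int TotalCount);
+
+public static class Program
+{
+    public static void Main()
+    {
+        var json = """
+            {
+              "canonicalId": "11111111-1111-1111-1111-111111111111",
+              "scopes": [
+                {
+                  "id": "44444444-4444-4444-4444-444444444444",
+                  "distroRelease": "ubuntu:22.04",
+                  "backportSemver": "1.1.1n-0ubuntu1.22.04.1",
+                  "patchId": "ubuntu-specific-patch-001",
+                  "patchOrigin": "distro",
+                  "confidence": 0.85,
+                  "updatedAt": "2025-01-15T11:00:00Z"
+                }
+              ],
+              "totalCount": 1
+            }
+            """;
+
+        // Web defaults give camelCase, case-insensitive property matching.
+        var options = new JsonSerializerOptions(JsonSerializerDefaults.Web);
+        var response = JsonSerializer.Deserialize<ProvenanceResponse>(json, options)
+            ?? throw new InvalidOperationException("Empty provenance response.");
+
+        foreach (var scope in response.Scopes)
+        {
+            var evidence = scope.EvidenceRef?.ToString() ?? "none";
+            Console.WriteLine($"{scope.DistroRelease}: origin={scope.PatchOrigin}, evidence={evidence}");
+        }
+    }
+}
+```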
+
+## Database Schema
+
+```sql
+CREATE TABLE vuln.provenance_scope (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    canonical_id UUID NOT NULL REFERENCES vuln.advisory_canonical(id) ON DELETE CASCADE,
+    distro_release TEXT NOT NULL,         -- e.g., 'debian:bookworm', 'rhel:9.2'
+    backport_semver TEXT,                 -- distro's backported version
+    patch_id TEXT,                        -- upstream commit SHA or patch identifier
+    patch_origin TEXT,                    -- 'upstream', 'distro', 'vendor'
+    evidence_ref UUID,                    -- FK to proofchain.proof_entries
+    confidence NUMERIC(3,2) DEFAULT 0.5,  -- 0.00-1.00
+    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+    UNIQUE (canonical_id, distro_release)
+);
+
+CREATE INDEX idx_provenance_scope_canonical ON vuln.provenance_scope(canonical_id);
+CREATE INDEX idx_provenance_scope_distro ON vuln.provenance_scope(distro_release);
+```
+
+## Merge Audit Log
+
+When a merge event includes backport evidence, it's recorded in the audit log:
+
+```csharp
+var record = new MergeEventRecord(
+    id: Guid.NewGuid(),
+    advisoryKey: "CVE-2024-1234",
+    beforeHash: previousHash,
+    afterHash: newHash,
+    mergedAt: DateTimeOffset.UtcNow,
+    inputDocumentIds: [...],
+    fieldDecisions: [...],
+    backportEvidence: [
+        new BackportEvidenceDecision(
+            cveId: "CVE-2024-1234",
+            distroRelease: "debian:bookworm",
+            evidenceTier: "DistroAdvisory",
+            confidence: 0.95,
+            patchId: "abc123...",
+            patchOrigin: "Upstream",
+            proofId: "proof:33333333-...",
+            evidenceDate: DateTimeOffset.UtcNow
+        )
+    ]
+);
+```
+
+## Configuration
+
+Backport detection is enabled by default. Configure via `concelier.yaml`:
+
+```yaml
+concelier:
+  backport:
+    enabled: true
+    # Minimum confidence threshold for creating provenance scope
+    minConfidence: 0.3
+    # Evidence tiers to consider (1=DistroAdvisory, 2=ChangelogMention, 3=PatchHeader, 4=BinaryFingerprint)
+    enabledTiers: [1, 2, 3, 4]
+    # Sources with precedence for patch origin
+    precedence:
+      - upstream
+      - distro
+      - vendor
+```
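+
+The sketch below shows how `enabled`, `minConfidence`, and `enabledTiers` might gate evidence before a provenance scope is created. `BackportOptions` is a hypothetical class, its initial values simply mirror the sample file above, and treating the threshold as inclusive is an assumption.
+
+```csharp
+using System;
+using System.Collections.Generic;
+
+// Hypothetical options class; property names follow the YAML keys shown above.
+public sealed class BackportOptions
+{
+    public bool Enabled { get; init; } = true;
+    public decimal MinConfidence { get; init; } = 0.3m;
+    public IReadOnlySet<int> EnabledTiers { get; init; } = new HashSet<int> { 1, 2, 3, 4 };
+
+    // Assumption: evidence is kept when its tier is enabled and confidence >= MinConfidence.
+    public bool Accepts(int tier, decimal confidence) =>
+        Enabled && EnabledTiers.Contains(tier) && confidence >= MinConfidence;
+}
+
+public static class Program
+{
+    public static void Main()
+    {
+        var options = new BackportOptions { EnabledTiers = new HashSet<int> { 1, 2 } };
+
+        Console.WriteLine(options.Accepts(1, 0.95m)); // True
+        Console.WriteLine(options.Accepts(4, 0.60m)); // False: tier 4 is disabled
+        Console.WriteLine(options.Accepts(2, 0.20m)); // False: below minConfidence
+    }
+}
+```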
+
+## Testing
+
+The `BackportProvenanceE2ETests` class covers the following end-to-end scenarios:
+
+- `E2E_IngestDebianAdvisoryWithBackport_CreatesProvenanceScope`
+- `E2E_IngestRhelAdvisoryWithBackport_CreatesProvenanceScopeWithDistroOrigin`
+- `E2E_SameCveMultipleDistros_CreatesSeparateProvenanceScopes`
+- `E2E_MergeWithBackportEvidence_RecordsInAuditLog`
+- `E2E_EvidenceUpgrade_UpdatesProvenanceScope`
+- `E2E_RetrieveProvenanceForCanonical_ReturnsAllDistroScopes`
+
+## Related Components
+
+- **BackportProofService**: Generates proof blobs for backport detection (in `StellaOps.Concelier.ProofService`)
+- **MergeHashCalculator**: Computes deterministic merge hashes (in `StellaOps.Concelier.Merge`)
+- **PatchLineageNormalizer**: Normalizes patch identifiers for hashing (in `StellaOps.Concelier.Merge`)
+- **ProvenanceScopeRepository**: PostgreSQL persistence (in `StellaOps.Concelier.Storage.Postgres`)
--- a/docs/modules/findings-ledger/openapi/findings-ledger.v1.yaml
+++ b/docs/modules/findings-ledger/openapi/findings-ledger.v1.yaml
@@ -5,6 +5,25 @@ info:
  description: >-
    Canonical, aggregation-only surface for append-only findings events, projections, and
    Merkle anchoring metadata. Aligns with schema in docs/modules/findings-ledger/schema.md.
+  contact:
+    name: StellaOps API Team
+    url: https://stellaops.io/docs/api
+    email: api@stellaops.io
+tags:
+  - name: ledger
+    description: Ledger event operations
+  - name: projections
+    description: Finding projections
+  - name: export
+    description: Data export endpoints
+  - name: attestation
+    description: Attestation verification
+  - name: metadata
+    description: API metadata endpoints
+  - name: scoring
+    description: Evidence-Weighted Score (EWS) operations
+  - name: webhooks
+    description: Webhook management for score notifications
 servers:
  - url: https://{env}.ledger.api.stellaops.local
    description: Default environment-scoped host
@@ -357,15 +376,15 @@ paths:
      operationId: calculateFindingScore
      tags: [scoring]
      security:
-        - bearerAuth: [write:scores]
+        - bearerAuth: []
      parameters:
        - name: findingId
          in: path
          required: true
-          description: Finding identifier in format CVE-ID@pkg:PURL
+          description: Finding identifier in format CVE-ID@pkg:PURL. Requires scope write:scores.
          schema:
            type: string
-            pattern: "^[A-Z]+-\\d+@pkg:.+$"
+            pattern: "^[A-Z]+-\\d+-\\d+@pkg:.+$"
          example: "CVE-2024-1234@pkg:deb/debian/curl@7.64.0-4"
      requestBody:
        required: false
@@ -406,7 +425,7 @@ paths:
                explanations:
                  - "Static reachability: path to vulnerable sink (confidence: 85%)"
                  - "Runtime: 3 observations in last 24 hours"
-                policyDigest: "sha256:abc123..."
+                policyDigest: "sha256:a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2"
                calculatedAt: "2026-01-15T14:30:00Z"
                cachedUntil: "2026-01-15T15:30:00Z"
        '400':
@@ -425,11 +444,11 @@ paths:
          description: Rate limit exceeded (100/min)
    get:
      summary: Get cached evidence-weighted score for a finding
-      description: Returns the most recently calculated score from cache. Returns 404 if no score has been calculated.
+      description: Returns the most recently calculated score from cache. Returns 404 if no score has been calculated. Requires scope read:scores.
      operationId: getFindingScore
      tags: [scoring]
      security:
-        - bearerAuth: [read:scores]
+        - bearerAuth: []
      parameters:
        - name: findingId
          in: path
@@ -443,17 +462,25 @@ paths:
            application/json:
              schema:
                $ref: '#/components/schemas/EvidenceWeightedScoreResponse'
+              example:
+                findingId: "CVE-2024-1234@pkg:deb/debian/curl@7.64.0-4"
+                score: 78
+                bucket: "ScheduleNext"
+                policyDigest: "sha256:a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2"
+                calculatedAt: "2026-01-15T14:30:00Z"
+                cachedUntil: "2026-01-15T15:30:00Z"
+                fromCache: true
        '404':
          description: No cached score found

  /api/v1/findings/scores:
    post:
      summary: Calculate evidence-weighted scores for multiple findings
-      description: Batch calculation of scores for up to 100 findings. Returns summary statistics and individual results.
+      description: Batch calculation of scores for up to 100 findings. Returns summary statistics and individual results. Requires scope write:scores.
      operationId: calculateFindingScoresBatch
      tags: [scoring]
      security:
-        - bearerAuth: [write:scores]
+        - bearerAuth: []
      requestBody:
        required: true
        content:
@@ -473,6 +500,23 @@ paths:
            application/json:
              schema:
                $ref: '#/components/schemas/CalculateScoresBatchResponse'
+              example:
+                results:
+                  - findingId: "CVE-2024-1234@pkg:npm/lodash@4.17.20"
+                    score: 78
+                    bucket: "ScheduleNext"
+                    policyDigest: "sha256:a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2"
+                    calculatedAt: "2026-01-15T14:30:00Z"
+                summary:
+                  total: 1
+                  succeeded: 1
+                  failed: 0
+                  byBucket: { actNow: 0, scheduleNext: 1, investigate: 0, watchlist: 0 }
+                  averageScore: 78
+                  calculationTimeMs: 45
+                errors: []
+                policyDigest: "sha256:a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2"
+                calculatedAt: "2026-01-15T14:30:00Z"
        '400':
          description: Invalid request or batch too large (max 100)
          content:
@@ -485,11 +529,11 @@ paths:
  /api/v1/findings/{findingId}/score-history:
    get:
      summary: Get score history for a finding
-      description: Returns historical score calculations with pagination. Tracks score changes, triggers, and which factors changed.
+      description: Returns historical score calculations with pagination. Tracks score changes, triggers, and which factors changed. Requires scope read:scores.
      operationId: getFindingScoreHistory
      tags: [scoring]
      security:
-        - bearerAuth: [read:scores]
+        - bearerAuth: []
      parameters:
        - name: findingId
          in: path
@@ -528,17 +572,34 @@ paths:
            application/json:
              schema:
                $ref: '#/components/schemas/ScoreHistoryResponse'
+              example:
+                findingId: "CVE-2024-1234@pkg:deb/debian/curl@7.64.0-4"
+                history:
+                  - score: 78
+                    bucket: "ScheduleNext"
+                    policyDigest: "sha256:a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2"
+                    calculatedAt: "2026-01-15T14:30:00Z"
+                    trigger: "evidence_update"
+                    changedFactors: ["rts", "xpl"]
+                  - score: 65
+                    bucket: "Investigate"
+                    policyDigest: "sha256:a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2"
+                    calculatedAt: "2026-01-10T09:15:00Z"
+                    trigger: "scheduled"
+                    changedFactors: []
+                pagination:
+                  hasMore: false
        '404':
          description: Finding not found

  /api/v1/scoring/policy:
    get:
      summary: Get active scoring policy configuration
-      description: Returns the currently active evidence weight policy including weights, guardrails, and bucket thresholds.
+      description: Returns the currently active evidence weight policy including weights, guardrails, and bucket thresholds. Requires scope read:scores.
      operationId: getActiveScoringPolicy
      tags: [scoring]
      security:
-        - bearerAuth: [read:scores]
+        - bearerAuth: []
      responses:
        '200':
          description: Active policy retrieved
@@ -548,7 +609,7 @@ paths:
                $ref: '#/components/schemas/ScoringPolicyResponse'
              example:
                version: "ews.v1.2"
-                digest: "sha256:abc123..."
+                digest: "sha256:a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2"
                activeSince: "2026-01-01T00:00:00Z"
                environment: "production"
                weights:
@@ -570,11 +631,11 @@ paths:
  /api/v1/scoring/policy/{version}:
    get:
      summary: Get specific scoring policy version
-      description: Returns a specific version of the scoring policy for historical comparison or audit.
+      description: Returns a specific version of the scoring policy for historical comparison or audit. Requires scope read:scores.
      operationId: getScoringPolicyVersion
      tags: [scoring]
      security:
-        - bearerAuth: [read:scores]
+        - bearerAuth: []
      parameters:
        - name: version
          in: path
@@ -589,6 +650,26 @@ paths:
            application/json:
              schema:
                $ref: '#/components/schemas/ScoringPolicyResponse'
+              example:
+                version: "ews.v1.2"
+                digest: "sha256:a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2"
+                activeSince: "2026-01-01T00:00:00Z"
+                environment: "production"
+                weights:
+                  rch: 0.30
+                  rts: 0.25
+                  bkp: 0.15
+                  xpl: 0.15
+                  src: 0.10
+                  mit: 0.10
+                guardrails:
+                  notAffectedCap: { enabled: true, maxScore: 15 }
+                  runtimeFloor: { enabled: true, minScore: 60 }
+                  speculativeCap: { enabled: true, maxScore: 45 }
+                buckets:
+                  actNowMin: 90
+                  scheduleNextMin: 70
+                  investigateMin: 40
        '404':
          description: Policy version not found

@@ -603,7 +684,7 @@ paths:
      operationId: registerScoringWebhook
      tags: [scoring, webhooks]
      security:
-        - bearerAuth: [admin:scoring]
+        - bearerAuth: []
      requestBody:
        required: true
        content:
@@ -623,16 +704,25 @@ paths:
            application/json:
              schema:
                $ref: '#/components/schemas/WebhookResponse'
+              example:
+                id: "550e8400-e29b-41d4-a716-446655440000"
+                url: "https://example.com/webhook/scores"
+                hasSecret: true
+                findingPatterns: ["CVE-*"]
+                minScoreChange: 10
+                triggerOnBucketChange: true
+                createdAt: "2026-01-15T14:30:00Z"
        '400':
          description: Invalid webhook URL or configuration
        '429':
          description: Rate limit exceeded (10/min)
    get:
      summary: List all registered webhooks
+      description: List all registered scoring webhooks. Requires scope admin:scoring.
      operationId: listScoringWebhooks
      tags: [scoring, webhooks]
      security:
-        - bearerAuth: [admin:scoring]
+        - bearerAuth: []
      responses:
        '200':
          description: List of webhooks
@@ -640,14 +730,25 @@ paths:
            application/json:
              schema:
                $ref: '#/components/schemas/WebhookListResponse'
+              example:
+                webhooks:
+                  - id: "550e8400-e29b-41d4-a716-446655440000"
+                    url: "https://example.com/webhook/scores"
+                    hasSecret: true
+                    findingPatterns: ["CVE-*"]
+                    minScoreChange: 10
+                    triggerOnBucketChange: true
+                    createdAt: "2026-01-15T14:30:00Z"
+                totalCount: 1

  /api/v1/scoring/webhooks/{id}:
    get:
      summary: Get a specific webhook by ID
+      description: Get details of a specific webhook. Requires scope admin:scoring.
      operationId: getScoringWebhook
      tags: [scoring, webhooks]
      security:
-        - bearerAuth: [admin:scoring]
+        - bearerAuth: []
      parameters:
        - name: id
          in: path
@@ -662,14 +763,23 @@ paths:
            application/json:
              schema:
                $ref: '#/components/schemas/WebhookResponse'
+              example:
+                id: "550e8400-e29b-41d4-a716-446655440000"
+                url: "https://example.com/webhook/scores"
+                hasSecret: true
+                findingPatterns: ["CVE-*"]
+                minScoreChange: 10
+                triggerOnBucketChange: true
+                createdAt: "2026-01-15T14:30:00Z"
        '404':
          description: Webhook not found
    put:
      summary: Update a webhook configuration
+      description: Update a webhook configuration. Requires scope admin:scoring.
      operationId: updateScoringWebhook
      tags: [scoring, webhooks]
      security:
-        - bearerAuth: [admin:scoring]
+        - bearerAuth: []
      parameters:
        - name: id
          in: path
@@ -690,16 +800,25 @@ paths:
            application/json:
              schema:
                $ref: '#/components/schemas/WebhookResponse'
+              example:
+                id: "550e8400-e29b-41d4-a716-446655440000"
+                url: "https://example.com/webhook/updated"
+                hasSecret: true
+                findingPatterns: ["CVE-*", "GHSA-*"]
+                minScoreChange: 5
+                triggerOnBucketChange: true
+                createdAt: "2026-01-15T14:30:00Z"
        '404':
          description: Webhook not found
        '400':
          description: Invalid configuration
    delete:
      summary: Delete a webhook
+      description: Delete a webhook registration. Requires scope admin:scoring.
      operationId: deleteScoringWebhook
      tags: [scoring, webhooks]
      security:
-        - bearerAuth: [admin:scoring]
+        - bearerAuth: []
      parameters:
        - name: id
          in: path