# SBOM Learning API
Per SPRINT_8200_0013_0003.
## Overview
The SBOM Learning API enables Concelier to learn which advisories are relevant to your organization by registering SBOMs from scanned images. When an SBOM is registered, Concelier matches its components against the canonical advisory database and updates interest scores accordingly.
## SBOM Extraction
Concelier normalizes incoming CycloneDX 1.7 and SPDX 3.0.1 documents into the internal `ParsedSbom` model for matching and downstream analysis.
Current extraction coverage (SPRINT_20260119_015):
- Document metadata: format, specVersion, serialNumber, created, name, profiles, sbomType, namespace/imports
- Components: bomRef, type, name, version, purl, cpe, hashes (including SPDX verifiedUsing), license IDs/expressions, license text (base64 decode), external references, properties, scope/modified, supplier/manufacturer, evidence, pedigree, cryptoProperties, modelCard (CycloneDX), swid (CycloneDX), SPDX AI model parameters, SPDX dataset metadata, SPDX file/snippet properties
- Licensing: SPDX Licensing profile elements (listed/custom licenses, license additions, AND/OR/WITH/or-later operators), with OSI/FSF flags and deprecated IDs captured
- Dependencies: component dependency edges (CycloneDX dependencies, SPDX relationships; DependencyOf is inverted to DependsOn)
- Vulnerabilities: CycloneDX embedded vulnerabilities (ratings, affects, VEX analysis), SPDX Security profile vulnerabilities + VEX assessments
- Services: endpoints, authentication, crossesTrustBoundary, data flows, licenses, external references (CycloneDX)
- Formulation: components, workflows, tasks, properties (CycloneDX)
- Declarations/definitions: attestations, affirmations, standards, signatures (CycloneDX)
- Compositions/annotations (CycloneDX)
- Build metadata: buildId, buildType, timestamps, config source, environment, parameters (SPDX)
- Document properties
Notes:
- License expressions can be validated against embedded SPDX license/exception lists via `ILicenseExpressionValidator`.
- Matching currently uses PURL and CPE; additional fields are stored for downstream consumers.
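The edge inversion mentioned under Dependencies can be illustrated with a minimal sketch. The `(source, relationship, target)` tuple shape and the `normalize_edges` helper are hypothetical and are not part of the `ParsedSbom` model; only the `DependencyOf` to `DependsOn` inversion comes from the text above.

```python
from typing import Iterable


def normalize_edges(relationships: Iterable[tuple[str, str, str]]) -> set[tuple[str, str]]:
    """Return (dependent, dependency) pairs, i.e. 'A DependsOn B' edges.

    The tuple layout is an assumption made for this sketch.
    """
    edges: set[tuple[str, str]] = set()
    for source, rel_type, target in relationships:
        if rel_type == "DependsOn":
            edges.add((source, target))
        elif rel_type == "DependencyOf":
            # "B DependencyOf A" carries the same information as "A DependsOn B".
            edges.add((target, source))
        # Other relationship types are not dependency edges and are skipped here.
    return edges


relationships = [
    ("app", "DependsOn", "express"),
    ("lodash", "DependencyOf", "app"),
    ("app", "Contains", "readme"),
]
edges = normalize_edges(relationships)
```

Storing every edge in one direction means downstream consumers only have to traverse a single relationship type.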
## VEX consumption
When SBOM vulnerabilities include embedded VEX analysis, Concelier consumes the statements
to filter or annotate advisory matches. NotAffected statements can be filtered when policy
allows, and trust evaluation checks timestamps, signatures (when provided), and justification
requirements for not-affected claims.
Configuration (YAML or JSON), loaded from `Concelier:VexConsumption:PolicyPath`:
```yaml
vexConsumptionPolicy:
  trustEmbeddedVex: true
  minimumTrustLevel: Unverified
  filterNotAffected: true

  signatureRequirements:
    requireSignedVex: false
    trustedSigners:
      - "https://example.com/keys/vex-signer"

  timestampRequirements:
    maxAgeHours: 720
    requireTimestamp: true

  conflictResolution:
    strategy: mostRecent
    logConflicts: true

  mergePolicy:
    mode: union
    externalSources:
      - type: repository
        url: "https://vex.example.com/api"

  justificationRequirements:
    requireJustificationForNotAffected: true
    acceptedJustifications:
      - component_not_present
      - vulnerable_code_not_present
      - vulnerable_code_not_in_execute_path
      - inline_mitigations_already_exist
```
Reports are emitted via `VexConsumptionReporter` in JSON, SARIF, and text formats.
Runtime overrides can be supplied via `Concelier:VexConsumption` (Enabled, IgnoreVex,
PolicyPath, TrustEmbeddedVex, MinimumTrustLevel, FilterNotAffected, ExternalVexSources).
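As a rough illustration of how the timestamp and justification requirements might combine when deciding whether a not-affected statement filters a match, consider the sketch below. It is not Concelier's implementation: the `VexStatement` type, the `should_filter` function, and the `not_affected` status string are assumptions made for this example, and signature and trust-level checks are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Mirrors acceptedJustifications from the policy example.
ACCEPTED_JUSTIFICATIONS = {
    "component_not_present",
    "vulnerable_code_not_present",
    "vulnerable_code_not_in_execute_path",
    "inline_mitigations_already_exist",
}


@dataclass
class VexStatement:  # hypothetical shape, for illustration only
    status: str
    justification: Optional[str]
    timestamp: Optional[datetime]


def should_filter(stmt: VexStatement, now: datetime, *,
                  filter_not_affected: bool = True,
                  require_timestamp: bool = True,
                  max_age_hours: int = 720,
                  require_justification: bool = True) -> bool:
    """Return True when the statement is allowed to suppress an advisory match."""
    if not filter_not_affected or stmt.status != "not_affected":
        return False
    if stmt.timestamp is None:
        if require_timestamp:
            return False
    elif now - stmt.timestamp > timedelta(hours=max_age_hours):
        return False  # statement is older than maxAgeHours
    if require_justification and stmt.justification not in ACCEPTED_JUSTIFICATIONS:
        return False
    return True


now = datetime(2026, 1, 22, tzinfo=timezone.utc)
fresh = VexStatement("not_affected", "component_not_present", now - timedelta(hours=1))
stale = VexStatement("not_affected", "component_not_present", now - timedelta(hours=721))
unjustified = VexStatement("not_affected", None, now - timedelta(hours=1))
```

With the defaults shown, only `fresh` would be filtered; `stale` and `unjustified` would stay in the match list.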
## Flow
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                             SBOM Learning Flow                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────┐    scan     ┌─────────┐    SBOM    ┌───────────┐               │
│  │  Image  │ ──────────► │ Scanner │ ─────────► │ Concelier │               │
│  │         │             │         │            │           │               │
│  └─────────┘             └─────────┘            └─────┬─────┘               │
│                                                       │                     │
│                                                       ▼                     │
│                                            ┌─────────────────────┐          │
│                                            │  SBOM Registration  │          │
│                                            │  ┌───────────────┐  │          │
│                                            │  │ Extract PURLs │  │          │
│                                            │  └───────┬───────┘  │          │
│                                            │          │          │          │
│                                            │          ▼          │          │
│                                            │  ┌───────────────┐  │          │
│                                            │  │  Match Advs   │  │          │
│                                            │  └───────┬───────┘  │          │
│                                            │          │          │          │
│                                            │          ▼          │          │
│                                            │  ┌───────────────┐  │          │
│                                            │  │ Update Scores │  │          │
│                                            │  └───────────────┘  │          │
│                                            └─────────────────────┘          │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
## API Endpoints
### Register SBOM
```
POST /api/v1/learn/sbom
Content-Type: application/vnd.cyclonedx+json
```
or
```
POST /api/v1/learn/sbom
Content-Type: application/spdx+json
```
**Request Body:** CycloneDX or SPDX SBOM document
**Query Parameters:**
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `artifact_id` | string | required | Image digest or artifact identifier |
| `update_scores` | bool | true | Trigger immediate score recalculation |
| `include_reachability` | bool | true | Include reachability data in matching |
**Response:**
```json
{
  "sbom_id": "uuid",
  "sbom_digest": "sha256:abc123...",
  "artifact_id": "sha256:image...",
  "component_count": 234,
  "matched_advisories": 15,
  "scores_updated": true,
  "registered_at": "2025-01-15T10:30:00Z"
}
```
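A registration request could be assembled along the following lines. This is a sketch using only the Python standard library; the base URL is a placeholder, authentication headers are omitted because this section does not describe them, and the lowercase `true`/`false` encoding of boolean query parameters is an assumption. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Media types taken from the endpoint description above.
CONTENT_TYPES = {
    "cyclonedx": "application/vnd.cyclonedx+json",
    "spdx": "application/spdx+json",
}


def build_register_request(base_url: str, sbom_bytes: bytes, artifact_id: str,
                           sbom_format: str = "cyclonedx",
                           update_scores: bool = True,
                           include_reachability: bool = True) -> Request:
    """Build (but do not send) a POST /api/v1/learn/sbom request."""
    query = urlencode({
        "artifact_id": artifact_id,
        # Assumption: booleans are passed as lowercase strings.
        "update_scores": str(update_scores).lower(),
        "include_reachability": str(include_reachability).lower(),
    })
    return Request(
        url=f"{base_url}/api/v1/learn/sbom?{query}",
        data=sbom_bytes,
        method="POST",
        headers={"Content-Type": CONTENT_TYPES[sbom_format]},
    )


req = build_register_request(
    "https://concelier.example.internal",  # placeholder host
    b'{"bomFormat": "CycloneDX"}',
    artifact_id="sha256:image",
)
```

Passing `req` to `urllib.request.urlopen` would send it; the JSON body of a successful response should follow the shape shown above.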
### Get Affected Advisories
```
GET /api/v1/sboms/{digest}/affected
```
**Response:**
```json
{
  "sbom_digest": "sha256:abc123...",
  "artifact_id": "sha256:image...",
  "matched_advisories": [
    {
      "canonical_id": "uuid",
      "cve": "CVE-2024-1234",
      "severity": "high",
      "interest_score": 0.85,
      "matched_component": "pkg:npm/express@4.17.1",
      "is_reachable": true
    },
    {
      "canonical_id": "uuid",
      "cve": "CVE-2024-5678",
      "severity": "medium",
      "interest_score": 0.65,
      "matched_component": "pkg:npm/lodash@4.17.20",
      "is_reachable": false
    }
  ],
  "total_count": 15,
  "last_matched_at": "2025-01-15T10:30:00Z"
}
```
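A client will typically narrow this response to the advisories it cares about. The sketch below filters by severity and reachability on a trimmed copy of the payload; the `select_advisories` helper and the severity ordering (including the `low` level, which does not appear in this document) are assumptions for illustration.

```python
import json

# Trimmed copy of the response shown above.
response = json.loads("""
{
  "matched_advisories": [
    {"cve": "CVE-2024-1234", "severity": "high", "is_reachable": true},
    {"cve": "CVE-2024-5678", "severity": "medium", "is_reachable": false}
  ]
}
""")

# Assumed ordering of severity labels, lowest to highest.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def select_advisories(payload: dict, min_severity: str = "high",
                      reachable_only: bool = True) -> list[str]:
    """Return CVE IDs at or above min_severity, optionally only reachable ones."""
    threshold = SEVERITY_RANK[min_severity]
    selected = []
    for adv in payload["matched_advisories"]:
        rank = SEVERITY_RANK.get(adv["severity"], -1)  # unknown labels never match
        if rank >= threshold and (adv["is_reachable"] or not reachable_only):
            selected.append(adv["cve"])
    return selected


urgent = select_advisories(response)
everything = select_advisories(response, min_severity="low", reachable_only=False)
```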
### List Registered SBOMs
```
GET /api/v1/sboms
```
**Query Parameters:**
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `artifact_id` | string | null | Filter by artifact |
| `since` | datetime | null | Only SBOMs registered after this time |
| `limit` | int | 100 | Max results |
| `cursor` | string | null | Pagination cursor |
**Response:**
```json
{
  "sboms": [
    {
      "id": "uuid",
      "artifact_id": "sha256:image...",
      "sbom_digest": "sha256:abc123...",
      "sbom_format": "cyclonedx",
      "component_count": 234,
      "matched_advisory_count": 15,
      "registered_at": "2025-01-15T10:30:00Z"
    }
  ],
  "total_count": 42,
  "next_cursor": "cursor..."
}
```
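Cursor pagination over this endpoint can be sketched as a generator that keeps requesting pages until no cursor is returned. The `fetch_page` callable stands in for the actual HTTP call and is hypothetical; treating a missing or null `next_cursor` as the end of the listing is also an assumption, since this section does not state how the last page is signalled.

```python
from typing import Callable, Iterator, Optional


def iter_sboms(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every SBOM entry, following next_cursor from page to page."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("sboms", [])
        cursor = page.get("next_cursor")
        if not cursor:  # assumption: absent or null cursor means last page
            break


# In-memory stand-in for GET /api/v1/sboms?cursor=...
_PAGES = {
    None: {"sboms": [{"id": "a"}, {"id": "b"}], "next_cursor": "page2"},
    "page2": {"sboms": [{"id": "c"}], "next_cursor": None},
}


def fake_fetch(cursor: Optional[str]) -> dict:
    return _PAGES[cursor]


all_ids = [sbom["id"] for sbom in iter_sboms(fake_fetch)]
```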
### Unregister SBOM
```
DELETE /api/v1/sboms/{digest}
```
**Query Parameters:**
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `update_scores` | bool | true | Recalculate scores after removal |
## Matching Algorithm
### PURL Matching
1. **Exact Match:** `pkg:npm/express@4.17.1` matches advisories that affect exactly that version.
2. **Range Match:** The component version is checked against the semantic version ranges in the advisory's `affects_key`.
3. **Namespace Normalization:** Scoped names such as `@scope/pkg` are normalized before comparison.
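The three steps can be sketched as follows. This is a simplified illustration, not Concelier's matcher: the helper names are hypothetical, versions are assumed to be plain dotted integers (no pre-release tags), the range is modelled as `introduced <= version < fixed`, and lowercasing the whole package key is only appropriate for ecosystems such as npm whose names are case-insensitive.

```python
from urllib.parse import unquote


def split_purl(purl: str) -> tuple[str, str]:
    """Split a PURL into (normalized package key, version)."""
    core = purl.split("#", 1)[0].split("?", 1)[0]  # drop subpath and qualifiers
    core = unquote(core)                           # "%40scope" becomes "@scope"
    head, sep, version = core.rpartition("@")
    if not sep or "/" in version:
        # No version part: the only "@" (if any) belongs to an npm scope.
        head, version = core, ""
    return head.lower(), version


def parse_version(version: str) -> tuple[int, ...]:
    """Parse '4.17.1' into (4, 17, 1); assumes purely numeric segments."""
    return tuple(int(part) for part in version.split("."))


def matches(purl: str, advisory_package: str, introduced: str, fixed: str) -> bool:
    """True when the component is the advisory's package and lies in [introduced, fixed)."""
    key, version = split_purl(purl)
    if not version or key != split_purl(advisory_package)[0]:
        return False
    return parse_version(introduced) <= parse_version(version) < parse_version(fixed)


hit = matches("pkg:npm/express@4.17.1", "pkg:npm/express", "4.0.0", "4.17.3")
scoped = matches("pkg:npm/%40Scope/pkg@1.2.3", "pkg:npm/@scope/pkg", "1.0.0", "1.3.0")
patched = matches("pkg:npm/%40Scope/pkg@1.2.3", "pkg:npm/@scope/pkg", "1.0.0", "1.2.3")
```

An exact match is the degenerate case where the range contains a single version.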
### CPE Matching
For OS packages (rpm, deb):
1. Extract CPE from SBOM
2. Match against advisory CPE patterns
3. Apply distro-specific version logic (NEVRA/EVR)
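Distro version logic differs from semantic versioning mainly in that the epoch is compared first, then the version, then the release. The sketch below shows a simplified RPM-style EVR comparison under stated assumptions: it ignores the tilde and caret rules of the full RPM algorithm, and Debian ordering follows different rules that are not modelled here. The function names are hypothetical.

```python
import re


def parse_evr(evr: str) -> tuple[int, str, str]:
    """Split '[epoch:]version[-release]' into its parts; a missing epoch is 0."""
    epoch, sep, rest = evr.partition(":")
    if not sep:
        epoch, rest = "0", evr
    version, _, release = rest.partition("-")
    return int(epoch), version, release


def _compare_part(a: str, b: str) -> int:
    """Compare two version or release strings segment by segment."""
    seg_a = re.findall(r"\d+|[a-zA-Z]+", a)
    seg_b = re.findall(r"\d+|[a-zA-Z]+", b)
    for x, y in zip(seg_a, seg_b):
        if x.isdigit() and y.isdigit():
            if int(x) != int(y):
                return -1 if int(x) < int(y) else 1
        elif x.isdigit() != y.isdigit():
            return 1 if x.isdigit() else -1  # numeric segments sort as newer
        elif x != y:
            return -1 if x < y else 1
    # All shared segments are equal: the string with more segments is newer.
    return (len(seg_a) > len(seg_b)) - (len(seg_a) < len(seg_b))


def compare_evr(a: str, b: str) -> int:
    """Return -1, 0 or 1 when a is older than, equal to, or newer than b."""
    epoch_a, ver_a, rel_a = parse_evr(a)
    epoch_b, ver_b, rel_b = parse_evr(b)
    if epoch_a != epoch_b:
        return -1 if epoch_a < epoch_b else 1
    return _compare_part(ver_a, ver_b) or _compare_part(rel_a, rel_b)


epoch_wins = compare_evr("1:1.0-1", "2.0-1")
numeric_order = compare_evr("2.4.6-3.el8", "2.4.10-1.el8")
```

The first comparison shows why plain string or semver comparison is not enough for OS packages: a higher epoch outranks any version number.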
### Reachability Integration
When `include_reachability=true`:
1. Query Scanner call graph data for matched components
2. Mark `is_reachable` based on path from entry point
3. Factor into interest score calculation
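Step 2 amounts to a graph reachability check. A minimal sketch, assuming the call graph is available as an adjacency mapping from caller to callees (the actual format of Scanner's call graph data is not described here, and the node names are invented for the example):

```python
from collections import deque
from typing import Iterable


def reachable_from(call_graph: dict[str, list[str]], entry_points: Iterable[str]) -> set[str]:
    """Breadth-first search: every node reachable from any entry point."""
    seen = set(entry_points)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for callee in call_graph.get(node, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Hypothetical call graph: main calls express, which calls body-parser.
call_graph = {
    "main": ["express"],
    "express": ["body-parser"],
    "unused-tool": ["lodash"],
}
reachable = reachable_from(call_graph, ["main"])
is_reachable = {name: name in reachable for name in ("express", "lodash")}
```

Here `lodash` is present in the graph but is only called from code that no entry point reaches, so it would be marked `is_reachable: false`.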
## Events
### SbomLearned
Published when an SBOM is registered:
```json
{
  "event_type": "sbom_learned",
  "sbom_id": "uuid",
  "sbom_digest": "sha256:...",
  "artifact_id": "sha256:...",
  "component_count": 234,
  "matched_advisory_count": 15,
  "timestamp": "2025-01-15T10:30:00Z"
}
```
### ScoresUpdated
Published after a batch score update:
```json
{
  "event_type": "scores_updated",
  "trigger": "sbom_registration",
  "sbom_digest": "sha256:...",
  "advisories_updated": 15,
  "timestamp": "2025-01-15T10:30:05Z"
}
```
## Auto-Learning
Subscribe to Scanner events for automatic SBOM registration:
### Configuration
```yaml
SbomIntegration:
  AutoLearn:
    Enabled: true
    SubscribeToScanEvents: true
    EventSource: "scanner:scan_completed"
  Matching:
    EnablePurl: true
    EnableCpe: true
    IncludeReachability: true
  ScoreUpdate:
    BatchSize: 1000
    DelaySeconds: 5  # Debounce rapid updates
```
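The effect of the `ScoreUpdate` settings can be pictured with a small simulation. It assumes trailing-edge debouncing (a burst of registrations triggers one score update `DelaySeconds` after the last event in the burst) and fixed-size batches of `BatchSize`; the actual scheduling behavior is not specified in this document, and both helper functions are hypothetical.

```python
from typing import Iterator, Sequence


def debounce(event_times: Sequence[float], delay_seconds: float = 5.0) -> list[float]:
    """Return the times at which an update fires, one per burst of events."""
    fire_times: list[float] = []
    last = None
    for t in sorted(event_times):
        if last is not None and t - last >= delay_seconds:
            # The previous burst went quiet long enough to fire.
            fire_times.append(last + delay_seconds)
        last = t
    if last is not None:
        fire_times.append(last + delay_seconds)
    return fire_times


def batches(advisory_ids: Sequence[str], batch_size: int = 1000) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size advisory IDs."""
    for start in range(0, len(advisory_ids), batch_size):
        yield advisory_ids[start:start + batch_size]


# Three registrations in quick succession, then one much later.
fires = debounce([0, 1, 2, 20])
sizes = [len(b) for b in batches([f"adv-{i}" for i in range(2500)])]
```

Under these assumptions the four registrations cause two score updates rather than four, and 2500 affected advisories are processed in three batches.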
### Event Handler
```csharp
// Automatic registration on scan completion
public class ScanCompletedHandler : IEventHandler<ScanCompletedEvent>
{
    public async Task HandleAsync(ScanCompletedEvent evt, CancellationToken ct)
    {
        await _sbomService.LearnFromScanAsync(
            artifactId: evt.ImageDigest,
            sbomDigest: evt.SbomDigest,
            sbomContent: evt.SbomContent,
            cancellationToken: ct);
    }
}
```
## CLI Commands
```bash
# Register SBOM from file
stella learn sbom --file ./sbom.json --artifact sha256:image...
# Register from stdin
cat sbom.json | stella learn sbom --artifact sha256:image...
# List affected advisories
stella sbom affected sha256:sbomdigest...
# List registered SBOMs
stella sbom list --limit 20
# Unregister SBOM
stella sbom unregister sha256:sbomdigest...
```
## Integration Examples
### CI/CD Pipeline
```yaml
# Example GitHub Actions workflow
- name: Scan image
  run: stella scan image myapp:latest -o sbom.json

- name: Register SBOM
  run: stella learn sbom --file sbom.json --artifact ${{ steps.build.outputs.digest }}

- name: Check for critical advisories
  run: |
    AFFECTED=$(stella sbom affected ${{ steps.sbom.outputs.digest }} --severity critical --count)
    if [ "$AFFECTED" -gt 0 ]; then
      echo "::error::Found $AFFECTED critical advisories"
      exit 1
    fi
```
### Programmatic Registration
```csharp
// Register SBOM from code
var result = await sbomService.RegisterSbomAsync(
    artifactId: imageDigest,
    sbomContent: sbomJson,
    format: SbomFormat.CycloneDX,
    options: new RegistrationOptions
    {
        UpdateScores = true,
        IncludeReachability = true
    },
    cancellationToken);

// Get affected advisories
var affected = await sbomService.GetAffectedAdvisoriesAsync(
    sbomDigest: result.SbomDigest,
    cancellationToken);
```
## Database Schema
```sql
CREATE TABLE vuln.sbom_registry (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    digest TEXT NOT NULL,
    format TEXT NOT NULL CHECK (format IN ('cyclonedx', 'spdx')),
    spec_version TEXT NOT NULL,
    primary_name TEXT,
    primary_version TEXT,
    component_count INT NOT NULL DEFAULT 0,
    affected_count INT NOT NULL DEFAULT 0,
    source TEXT NOT NULL,
    tenant_id TEXT,
    registered_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    last_matched_at TIMESTAMPTZ,
    CONSTRAINT uq_sbom_registry_digest UNIQUE (digest)
);

CREATE TABLE vuln.sbom_canonical_match (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    sbom_id UUID NOT NULL REFERENCES vuln.sbom_registry(id),
    canonical_id UUID NOT NULL REFERENCES vuln.advisory_canonical(id),
    purl TEXT NOT NULL,
    match_method TEXT NOT NULL,
    confidence NUMERIC(3,2) NOT NULL DEFAULT 1.0,
    is_reachable BOOLEAN NOT NULL DEFAULT false,
    is_deployed BOOLEAN NOT NULL DEFAULT false,
    matched_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    CONSTRAINT uq_sbom_canonical_match UNIQUE (sbom_id, canonical_id, purl)
);

CREATE TABLE concelier.sbom_documents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    serial_number TEXT NOT NULL,
    artifact_digest TEXT,
    format TEXT NOT NULL CHECK (format IN ('cyclonedx', 'spdx')),
    spec_version TEXT NOT NULL,
    component_count INT NOT NULL DEFAULT 0,
    service_count INT NOT NULL DEFAULT 0,
    vulnerability_count INT NOT NULL DEFAULT 0,
    has_crypto BOOLEAN NOT NULL DEFAULT false,
    has_services BOOLEAN NOT NULL DEFAULT false,
    has_vulnerabilities BOOLEAN NOT NULL DEFAULT false,
    license_ids TEXT[] NOT NULL DEFAULT '{}',
    license_expressions TEXT[] NOT NULL DEFAULT '{}',
    sbom_json JSONB NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    CONSTRAINT uq_concelier_sbom_serial UNIQUE (serial_number),
    CONSTRAINT uq_concelier_sbom_artifact UNIQUE (artifact_digest)
);
```