SBOM Service architecture (2025Q4)
Scope: canonical SBOM projections, lookup and timeline APIs, asset metadata overlays, and events feeding Advisory AI, Console, Graph, Policy, and Vuln Explorer.
1) Mission & boundaries
- Mission: serve deterministic, tenant-scoped SBOM projections (Link-Not-Merge v1) and related metadata for downstream reasoning and overlays.
- Boundaries:
- Does not perform scanning; consumes Scanner outputs or supplied SPDX/CycloneDX blobs.
- Does not author verdicts/policy; supplies evidence and projections to Policy/Concelier/Graph.
- Append-only SBOM versions; mutations happen via new versions, never in-place edits.
- Owns the SBOM lineage ledger for versioned uploads, diffs, and retention pruning.
2) Project layout
- src/SbomService/StellaOps.SbomService: REST API + event emitters + orchestrator integration.
- Storage: PostgreSQL tables (proposed)
  - sbom_snapshots (immutable versions; tenant + artifact + digest + createdAt)
  - sbom_projections (materialised views keyed by snapshotId, entrypoint/service node flags)
  - sbom_assets (asset metadata, criticality/owner/env/exposure; append-only history)
  - sbom_catalog (console catalog surface; indexed by artifact, scope, license, assetTags.*, createdAt for deterministic pagination)
  - sbom_component_neighbors (component lookup graph edges; indexed by purl+artifact for cursor pagination)
  - sbom_paths (resolved dependency paths with runtime flags, blast-radius hints)
  - sbom_events (outbox for event delivery + watermark/backfill tracking)
2.1) SBOM + provenance spine (Nov 2026)
The service now owns an idempotent spine that converts OCI images into SBOMs and provenance bundles with DSSE and in-toto. The flow is intentionally air-gap ready:
- Extract OCI manifest/layers (hash becomes contentAddress).
- Build SBOM in CycloneDX 1.7 and/or SPDX 3.0.1; canonicalize JSON before hashing (sbomHash).
- Sign outputs as DSSE envelopes; predicate uses in-toto Statement with SLSA Provenance v1.
- Publish attestations optionally to a transparency backend: rekor, local-merkle, or null (no-op). Local Merkle log keeps proofs for later sync when online.
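The canonicalize-then-hash step can be sketched as follows. This is a minimal illustration only: the canonical form (sorted keys, compact separators, UTF-8) and the "sha256:" prefix are assumptions, since the document does not name the canonicalization scheme the service uses.

```python
import hashlib
import json


def canonical_json(doc: dict) -> bytes:
    # Assumed canonical form: keys sorted, no insignificant whitespace, UTF-8.
    # The real service may use a stricter scheme; this is only a stand-in.
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def sbom_hash(doc: dict) -> str:
    # Hash the canonical bytes so key order in the upload does not matter.
    return "sha256:" + hashlib.sha256(canonical_json(doc)).hexdigest()


# Two uploads that differ only in key order produce the same sbomHash.
first = {"bomFormat": "CycloneDX", "specVersion": "1.7", "components": []}
second = {"components": [], "specVersion": "1.7", "bomFormat": "CycloneDX"}
assert sbom_hash(first) == sbom_hash(second)
```

Hashing the canonical bytes rather than the raw upload is what lets a re-serialized copy of the same SBOM be recognised as a duplicate.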
Minimal APIs exposed by SbomService (idempotent by hash):
- POST /sbom/ingest { imageDigest, sbom, format, dsseSignature? } → { sbomId, status: stored|already_present, sbomHash }, keyed by contentAddress + sbomHash.
- POST /attest/verify { dsseEnvelope, expectedSubjects[] } → { verified, predicateType, logIndex?, inclusionProof? }; records the attestation when verified.
Operational rules:
- Default media types: application/vnd.cyclonedx+json, application/spdx+json, application/dsse+json, application/vnd.in-toto+json.
- If the same SBOM/attestation arrives again, return HTTP 200 with "status":"already_present" and do not create a new version.
- Offline posture: no external calls required; Rekor publish remains optional and retryable when connectivity is restored.
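The idempotency rule for ingest can be illustrated with a small in-memory sketch. The IngestStore class and the sbomId format are hypothetical; only the key (contentAddress + sbomHash) and the two status values come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class IngestStore:
    # Maps (contentAddress, sbomHash) -> sbomId. Hypothetical in-memory stand-in
    # for whatever storage the service actually uses.
    _by_key: dict = field(default_factory=dict)

    def ingest(self, content_address: str, sbom_hash: str) -> dict:
        key = (content_address, sbom_hash)
        if key in self._by_key:
            # Duplicate: report the existing record, create no new version.
            return {"sbomId": self._by_key[key], "status": "already_present", "sbomHash": sbom_hash}
        sbom_id = f"sbom-{len(self._by_key) + 1}"  # illustrative ID scheme
        self._by_key[key] = sbom_id
        return {"sbomId": sbom_id, "status": "stored", "sbomHash": sbom_hash}


store = IngestStore()
first = store.ingest("sha256:image", "sha256:sbom")
second = store.ingest("sha256:image", "sha256:sbom")
assert first["status"] == "stored"
assert second["status"] == "already_present"
assert first["sbomId"] == second["sbomId"]
```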
3) APIs (first wave)
- GET /sbom/paths?purl=...&artifact=...&scope=...&env=...: returns ordered paths with runtime_flag/blast_radius and nearest-safe-version hint; supports cursor pagination.
- GET /sbom/versions?artifact=...: time-ordered SBOM version timeline for Advisory AI; includes provenance and source bundle hash.
- POST /sbom/upload: BYOS upload endpoint; validates/normalizes SPDX 2.3/3.0.1 or CycloneDX 1.4–1.7 and registers a ledger version.
- GET /sbom/ledger/history: list version history for an artifact (cursor pagination).
- GET /sbom/ledger/point: resolve the SBOM version at a specific timestamp.
- GET /sbom/ledger/range: query versions within a time range.
- GET /sbom/ledger/diff: component/version/license diff between two versions.
- GET /sbom/ledger/lineage: parent/child lineage edges for an artifact chain.
- GET /console/sboms: Console catalog with filters (artifact, license, scope, asset tags), cursor pagination, evaluation metadata, immutable JSON projection for drawer views.
- GET /components/lookup?purl=...: component neighborhood for global search/Graph overlays; returns cache hints + tenant enforcement.
- POST /entrypoints, GET /entrypoints: manage entrypoint/service node overrides feeding Cartographer relevance; deterministic defaults when unset.
- GET /sboms/{snapshotId}/projection: Link-Not-Merge v1 projection returning hashes plus asset metadata (criticality, owner, environment, exposure flags, tags) alongside package/component graph.
- GET /internal/sbom/events: internal diagnostics endpoint returning the in-memory event outbox for validation.
- POST /internal/sbom/events/backfill: replays existing projections into the event stream; deterministic ordering, clock abstraction for tests.
- GET /internal/sbom/asset-events: diagnostics endpoint returning emitted sbom.asset.updated envelopes for validation and air-gap parity checks.
- GET/POST /internal/orchestrator/sources: list/register orchestrator ingest/index sources (deterministic seeds; idempotent on artifactDigest+sourceType).
- GET/POST /internal/orchestrator/control: manage pause/throttle/backpressure signals per tenant; metrics emitted for control updates.
- GET/POST /internal/orchestrator/watermarks: fetch/set backfill watermarks for reconciliation and deterministic replays.
- GET /internal/sbom/resolver-feed: list resolver candidates (artifact, purl, version, paths, scope, runtime_flag, nearest_safe_version).
- POST /internal/sbom/resolver-feed/backfill: clear and repopulate resolver feed from current projections.
- GET /internal/sbom/resolver-feed/export: NDJSON export of resolver candidates for air-gap delivery.
- GET /internal/sbom/ledger/audit: audit trail for ledger changes (created/pruned).
- GET /internal/sbom/analysis/jobs: list analysis jobs triggered by BYOS uploads.
- POST /internal/sbom/retention/prune: apply retention policy and emit audit entries.
3.1) Ledger + BYOS workflow (Sprint 4600)
- Uploads are validated, normalized, and stored as ledger versions chained per artifact identity.
- Diffs compare normalized component keys and surface version/license deltas with deterministic ordering.
- Lineage is derived from parent version references and emitted for Graph lineage edges.
- Lineage relationships include parent links plus build links (shared CI build IDs when provided).
- Retention policy prunes old versions while preserving audit entries and minimum keep counts.
- See docs/modules/sbomservice/ledger-lineage.md for request/response examples.
- See docs/modules/sbomservice/byos-ingestion.md for supported formats and troubleshooting.
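The diff behaviour described in this section (compare normalized component keys, surface version/license deltas, order deterministically) might look like the sketch below. The input shape (key mapped to a version/license pair) and the output field names are hypothetical; the referenced ledger docs define the actual response format.

```python
def diff_components(old: dict, new: dict) -> dict:
    """Diff two ledger versions.

    old/new map a normalized component key (for example a purl without
    version) to a (version, license) tuple. Hypothetical shape.
    """
    added = sorted(key for key in new if key not in old)
    removed = sorted(key for key in old if key not in new)
    changed = []
    # Iterate shared keys in sorted order so output is stable across runs.
    for key in sorted(old.keys() & new.keys()):
        (old_version, old_license), (new_version, new_license) = old[key], new[key]
        if old_version != new_version:
            changed.append({"key": key, "field": "version", "from": old_version, "to": new_version})
        if old_license != new_license:
            changed.append({"key": key, "field": "license", "from": old_license, "to": new_license})
    return {"added": added, "removed": removed, "changed": changed}


v1 = {"pkg:npm/lodash": ("4.17.20", "MIT"), "pkg:npm/left-pad": ("1.3.0", "WTFPL")}
v2 = {"pkg:npm/lodash": ("4.17.21", "MIT"), "pkg:npm/express": ("4.19.2", "MIT")}
result = diff_components(v1, v2)
```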
4) Ingestion & orchestrator integration
- Ingest sources: Scanner pipeline (preferred) or uploaded SPDX 2.3/3.0.1 and CycloneDX 1.4–1.7 bundles.
- Orchestrator: register SBOM ingest/index jobs; worker SDK emits artifact hash + job metadata; honor pause/throttle; report backpressure metrics; support watermark-based backfill for idempotent replays.
- Idempotency: combine (tenant, artifactDigest, sbomVersion) as primary key; duplicate ingests short-circuit.
5) Events & streaming
- sbom.version.created: emitted per new SBOM snapshot; payload: tenant, artifact digest, sbomVersion, projection hash, source bundle hash, import provenance; replay/backfill via outbox with watermark.
- sbom.asset.updated: emitted when asset metadata changes; idempotent payload keyed by (tenant, assetId, version).
- Inventory/resolver feeds: queue/topic delivering (artifact, purl, version, paths, runtime_flag, scope, nearest_safe_version) for Vuln Explorer/Findings Ledger.
- Current implementation uses an in-memory event store/publisher (with clock abstraction) plus /internal/sbom/events + /internal/sbom/events/backfill to validate envelopes until the PostgreSQL-backed outbox is wired.
- Entrypoint/service node overrides are exposed via /entrypoints (tenant-scoped) and should be mirrored into Cartographer relevance jobs when the outbox lands.
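An in-memory outbox with an injected clock and watermark-based replay can be sketched as below. The class name, envelope fields, and the use of a monotonically increasing sequence number as the watermark are illustrative assumptions; the document does not specify how the service represents watermarks.

```python
from datetime import datetime, timezone
from typing import Callable


class InMemoryOutbox:
    def __init__(self, clock: Callable[[], datetime]):
        # The clock is injected so tests can pin timestamps.
        self._clock = clock
        self._events: list[dict] = []

    def publish(self, event_type: str, payload: dict) -> int:
        sequence = len(self._events) + 1
        self._events.append({
            "sequence": sequence,
            "type": event_type,
            "emittedAt": self._clock().isoformat(),
            "payload": payload,
        })
        return sequence

    def replay_after(self, watermark: int) -> list[dict]:
        # Events are stored in sequence order, so replay order is deterministic.
        return [event for event in self._events if event["sequence"] > watermark]


fixed = datetime(2025, 1, 1, tzinfo=timezone.utc)
outbox = InMemoryOutbox(clock=lambda: fixed)
outbox.publish("sbom.version.created", {"tenant": "t1", "sbomVersion": 1})
outbox.publish("sbom.asset.updated", {"tenant": "t1", "assetId": "a1", "version": 1})
pending = outbox.replay_after(watermark=1)
```

A consumer that persists the last sequence it processed can resume from that watermark and receive each later event exactly once from this store.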
6) Determinism & offline posture
- Stable ordering for projections and paths; timestamps in UTC ISO-8601; hash inputs canonicalised.
- Add-only evolution for schemas; LNM v1 fixtures published alongside API docs and replayable tests.
- Offline-friendly: uses mirrored packages, avoids external calls during projection; exports NDJSON bundles for air-gapped replay.
7) Tenancy & security
- All APIs require tenant context (token claims or mTLS binding); collection filters must include tenant keys.
- Enforce least-privilege queries; avoid cross-tenant caches; log tenant IDs in structured logs.
- Input validation: schema-validate incoming SBOMs; reject oversized/unsupported media types early.
8) Observability
- Metrics: sbom_projection_seconds, sbom_projection_size_bytes, sbom_projection_queries_total, sbom_paths_latency_seconds, sbom_paths_cache_hit_ratio, sbom_events_backlog, sbom_ledger_uploads_total, sbom_ledger_diffs_total, sbom_ledger_retention_pruned_total.
- Tracing: ActivitySource StellaOps.SbomService (entrypoints, component lookup, console catalog, projections, events).
- Traces: wrap ingest, projection build, and API handlers; propagate orchestrator job IDs.
- Logs: structured, include tenant + artifact digest + sbomVersion; classify ingest failures (schema, storage, orchestrator, validation).
- Alerts: backlog thresholds for outbox/event delivery; high latency on path/timeline endpoints.
8.1) Registry Source Management (Sprint 012)
The service manages container registry sources for automated image discovery and scanning:
Models
- RegistrySource: registry connection with URL, filters, schedule, credentials (via AuthRef).
- RegistrySourceRun: run history with status, discovered images, triggered scans, error details.
- RegistrySourceStatus: Draft, Active, Paused, Error, Deleted.
- RegistrySourceProvider: Generic, Harbor, DockerHub, ACR, ECR, GCR, GHCR.
APIs
- GET/POST/PUT/DELETE /api/v1/registry-sources: CRUD operations.
- POST /api/v1/registry-sources/{id}/test: test registry connection and credentials.
- POST /api/v1/registry-sources/{id}/trigger: manually trigger discovery and scanning.
- POST /api/v1/registry-sources/{id}/pause and /resume: pause/resume scheduled scans.
- GET /api/v1/registry-sources/{id}/runs: run history with health metrics.
- GET /api/v1/registry-sources/{id}/discover/repositories: discover repositories matching filters.
- GET /api/v1/registry-sources/{id}/discover/tags/{repository}: discover tags for a repository.
- GET /api/v1/registry-sources/{id}/discover/images: full image discovery.
- POST /api/v1/registry-sources/{id}/discover-and-scan: discover and submit scan jobs.
Webhook Ingestion
- POST /api/v1/webhooks/registry/{sourceId}: receive push notifications from registries.
- Supported providers: Harbor, DockerHub, ACR, ECR, GCR, GHCR.
- HMAC-SHA256 signature validation using webhook secret from AuthRef.
- Auto-detection of provider from request headers.
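HMAC-SHA256 validation of a webhook body can be sketched with the Python standard library. The "sha256=" header prefix and hex encoding are assumptions (providers differ in how they transmit the signature); the secret would come from AuthRef in the real service.

```python
import hashlib
import hmac


def verify_webhook(secret: bytes, body: bytes, signature_header: str) -> bool:
    # Recompute the MAC over the raw request body, exactly as received.
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Assumed header shape: "sha256=<hex digest>"; the prefix is optional here.
    provided = signature_header.removeprefix("sha256=")
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), provided.encode("utf-8"))


secret = b"webhook-secret"
body = b'{"event":"push","repository":"team/app"}'
header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
assert verify_webhook(secret, body, header)
assert not verify_webhook(secret, body + b" ", header)
```

The signature must be checked against the raw bytes before any JSON parsing, since re-serializing the payload would change the digest.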
Discovery Service
- OCI Distribution Spec compliant repository/tag enumeration.
- Pagination via RFC 5988 Link headers.
- Allowlist/denylist filtering for repositories and tags (glob patterns).
- Manifest digest retrieval via HEAD requests.
Scan Job Emission
- Batch submission to Scanner service with rate limiting.
- Deduplication (skips if job already exists).
- Metadata includes source ID, trigger type, client request ID.
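Batch submission with deduplication and a fixed inter-batch delay might be structured as below. The submit callback, the already_queued set, and the function name are hypothetical stand-ins for the Scanner client and job store; the defaults mirror the BatchScanSize and BatchScanDelayMs values listed under Configuration.

```python
import time
from typing import Callable, Iterable


def submit_in_batches(
    digests: Iterable[str],
    submit: Callable[[list[str]], None],
    already_queued: set[str],
    batch_size: int = 10,
    delay_ms: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    # Drop digests that already have a job, and duplicates within this run.
    seen = set(already_queued)
    pending = []
    for digest in digests:
        if digest in seen:
            continue
        seen.add(digest)
        pending.append(digest)

    batches = 0
    for start in range(0, len(pending), batch_size):
        if batches:
            sleep(delay_ms / 1000)  # simple rate limit between batches
        submit(pending[start:start + batch_size])
        batches += 1
    return batches


sent: list[list[str]] = []
pauses: list[float] = []
count = submit_in_batches(
    ["a", "b", "a", "c", "d"], sent.append, already_queued={"d"},
    batch_size=2, delay_ms=100, sleep=pauses.append,
)
```

Passing sleep as a parameter keeps the rate limit testable without real waiting.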
Configuration
- SbomService:ScannerUrl: Scanner service endpoint (default: http://localhost:5100).
- SbomService:BatchScanSize: max images per batch (default: 10).
- SbomService:BatchScanDelayMs: delay between batch submissions (default: 100ms).
Credentials
- All credentials via AuthRef URIs: authref://{vault}/{path}#{key}.
- Supports basic auth (basic:user:pass) and bearer tokens (bearer:token) for development.
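Splitting an AuthRef URI into its three parts can be done with the standard URL parser, as in this sketch. The parse_authref helper and the sample vault/path/key values are hypothetical; only the authref://{vault}/{path}#{key} shape comes from the text.

```python
from urllib.parse import urlsplit


def parse_authref(uri: str) -> dict:
    parts = urlsplit(uri)
    # Require the scheme, a vault name (authority), and a key (fragment).
    if parts.scheme != "authref" or not parts.netloc or not parts.fragment:
        raise ValueError(f"not a valid AuthRef URI: {uri!r}")
    return {
        "vault": parts.netloc,
        "path": parts.path.lstrip("/"),
        "key": parts.fragment,
    }


ref = parse_authref("authref://prod-vault/registries/harbor#password")
```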
9) Configuration (PostgreSQL-backed catalog & lookup)
- Enable PostgreSQL storage for /console/sboms and /components/lookup by setting SbomService:PostgreSQL:ConnectionString (env: SBOM_SbomService__PostgreSQL__ConnectionString).
- Optional overrides: SbomService:PostgreSQL:Schema, SbomService:PostgreSQL:CatalogTable, SbomService:PostgreSQL:ComponentLookupTable; defaults are sbom_service, sbom_catalog, sbom_component_neighbors.
- When the connection string is absent the service falls back to fixture JSON or deterministic in-memory seeds to keep air-gapped workflows alive.
- Ledger retention settings (env prefix SBOM_SbomService__Ledger__): MaxVersionsPerArtifact, MaxAgeDays, MinVersionsToKeep.
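How the three retention settings could interact is sketched below. The semantics are an assumption: the newest MinVersionsToKeep versions are always retained, and any other version is prunable if it is older than MaxAgeDays or falls outside the newest MaxVersionsPerArtifact. The retention-policy document referenced in the next section is the authority on actual behaviour.

```python
from datetime import datetime, timedelta, timezone


def select_prunable(
    versions: list[tuple[int, datetime]],
    now: datetime,
    max_versions: int,
    max_age_days: int,
    min_keep: int,
) -> list[int]:
    """Return version numbers eligible for pruning, in ascending order.

    versions holds (version_number, created_at) pairs for one artifact.
    """
    newest_first = sorted(versions, key=lambda v: v[0], reverse=True)
    cutoff = now - timedelta(days=max_age_days)
    prunable = []
    for index, (number, created_at) in enumerate(newest_first):
        if index < min_keep:
            continue  # always keep the newest min_keep versions
        if index >= max_versions or created_at < cutoff:
            prunable.append(number)
    return sorted(prunable)


utc = timezone.utc
history = [
    (1, datetime(2025, 1, 1, tzinfo=utc)),
    (2, datetime(2025, 3, 1, tzinfo=utc)),
    (3, datetime(2025, 5, 1, tzinfo=utc)),
    (4, datetime(2025, 5, 20, tzinfo=utc)),
]
to_prune = select_prunable(
    history, now=datetime(2025, 6, 1, tzinfo=utc),
    max_versions=3, max_age_days=60, min_keep=2,
)
```

In this example version 2 is pruned for age and version 1 for exceeding the version cap, while versions 3 and 4 are protected by the minimum keep count.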
10) Open questions / dependencies
- Confirm orchestrator pause/backfill contract (shared with Runtime & Signals 140-series).
- Finalise storage table names and indexes (compound on tenant+artifactDigest+version, TTL for transient staging).
- Publish canonical LNM v1 fixtures and JSON schemas for projections and asset metadata.
- See docs/modules/sbomservice/api/projection-read.md for /sboms/{snapshotId}/projection (LNM v1, tenant-scoped, hash-returning).
- See docs/modules/sbomservice/lineage-ledger.md for ledger endpoints and lineage relationships.
- See docs/modules/sbomservice/retention-policy.md for retention configuration and audit expectations.