# SBOM Service architecture (2025Q4)

> Scope: canonical SBOM projections, lookup and timeline APIs, asset metadata overlays, and events feeding Advisory AI, Console, Graph, Policy, and Vuln Explorer.

## 1) Mission & boundaries
- Mission: serve deterministic, tenant-scoped SBOM projections (Link-Not-Merge v1) and related metadata for downstream reasoning and overlays.
- Boundaries:
  - Does not perform scanning; consumes Scanner outputs or supplied SPDX/CycloneDX blobs.
  - Does not author verdicts/policy; supplies evidence and projections to Policy/Concelier/Graph.
  - Append-only SBOM versions; mutations happen via new versions, never in-place edits.
- Owns the SBOM lineage ledger for versioned uploads, diffs, and retention pruning.

## 2) Project layout
- `src/SbomService/StellaOps.SbomService` – REST API + event emitters + orchestrator integration.
- Storage: PostgreSQL tables (proposed)
  - `sbom_snapshots` (immutable versions; tenant + artifact + digest + createdAt)
  - `sbom_projections` (materialised views keyed by snapshotId, entrypoint/service node flags)
  - `sbom_assets` (asset metadata, criticality/owner/env/exposure; append-only history)
  - `sbom_catalog` (console catalog surface; indexed by artifact, scope, license, assetTags.*, createdAt for deterministic pagination)
  - `sbom_component_neighbors` (component lookup graph edges; indexed by purl+artifact for cursor pagination)
  - `sbom_paths` (resolved dependency paths with runtime flags, blast-radius hints)
  - `sbom_events` (outbox for event delivery + watermark/backfill tracking)

### 2.1) SBOM + provenance spine (Nov 2025)

The service now owns an idempotent spine that converts OCI images into SBOMs and provenance bundles with DSSE and in-toto. The flow is intentionally air-gap ready:

- **Extract** OCI manifest/layers (hash becomes `contentAddress`).
- **Build SBOM** in CycloneDX 1.7 and/or SPDX 3.0.1; canonicalize JSON before hashing (`sbomHash`).
- **Sign** outputs as DSSE envelopes; the payload is an in-toto Statement carrying a SLSA Provenance v1 predicate.
- **Publish** attestations optionally to a transparency backend: `rekor`, `local-merkle`, or `null` (no-op). The local Merkle log keeps proofs for later sync when online.
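
The "canonicalize JSON before hashing" step can be sketched as follows. The source does not say which canonicalization scheme the service uses, so this is a minimal illustration that assumes sorted keys, compact separators, and UTF-8 encoding; it is not a full RFC 8785 (JCS) implementation. The function names and the `sha256:` prefix are hypothetical.

```python
import hashlib
import json


def canonicalize(document: dict) -> bytes:
    """Serialize JSON with sorted keys and no insignificant whitespace.

    Assumption: this simple form stands in for the service's real
    canonicalization rules, which the architecture notes do not spell out.
    """
    return json.dumps(
        document, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def sbom_hash(document: dict) -> str:
    """Hash the canonical bytes so key order in the input does not matter."""
    return "sha256:" + hashlib.sha256(canonicalize(document)).hexdigest()


# Two logically identical SBOM fragments with different key order.
a = {"bomFormat": "CycloneDX", "specVersion": "1.7", "components": []}
b = {"components": [], "specVersion": "1.7", "bomFormat": "CycloneDX"}
print(sbom_hash(a) == sbom_hash(b))
```

Hashing the canonical form rather than the uploaded bytes is what lets two uploads that differ only in formatting resolve to the same `sbomHash`.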

Minimal APIs exposed by SbomService (idempotent by hash):

- `POST /sbom/ingest` `{ imageDigest, sbom, format, dsseSignature? }` → `{ sbomId, status: stored|already_present, sbomHash }` keyed by `contentAddress + sbomHash`.
- `POST /attest/verify` `{ dsseEnvelope, expectedSubjects[] }` → `{ verified, predicateType, logIndex?, inclusionProof? }`; records the attestation when verified.
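
The idempotent-by-hash behaviour of `POST /sbom/ingest` can be illustrated with an in-memory stand-in. This is a sketch under assumptions: the `IngestStore` class and the `sbom-N` id scheme are hypothetical, and the real service persists to PostgreSQL rather than a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class IngestStore:
    """In-memory stand-in keyed by (contentAddress, sbomHash)."""

    records: dict = field(default_factory=dict)

    def ingest(self, content_address: str, sbom_hash: str, sbom: dict) -> dict:
        key = (content_address, sbom_hash)
        existing = self.records.get(key)
        if existing is not None:
            # Same content seen before: report it, do not create a new version.
            return {
                "sbomId": existing["sbomId"],
                "status": "already_present",
                "sbomHash": sbom_hash,
            }
        sbom_id = f"sbom-{len(self.records) + 1}"  # hypothetical id scheme
        self.records[key] = {"sbomId": sbom_id, "sbom": sbom}
        return {"sbomId": sbom_id, "status": "stored", "sbomHash": sbom_hash}


store = IngestStore()
first = store.ingest("sha256:image", "sha256:sbom", {"bomFormat": "CycloneDX"})
second = store.ingest("sha256:image", "sha256:sbom", {"bomFormat": "CycloneDX"})
print(first["status"], second["status"])
```

Because the key is derived from content rather than from request identity, a retried or replayed request is safe to send again.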

Operational rules:

- Default media types: `application/vnd.cyclonedx+json`, `application/spdx+json`, `application/dsse+json`, `application/vnd.in-toto+json`.
- If the same SBOM/attestation arrives again, return HTTP 200 with `"status":"already_present"` and do not create a new version.
- Offline posture: no external calls required; Rekor publish remains optional and retryable when connectivity is restored.
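
To make the envelope shape concrete, the sketch below wraps an in-toto Statement (predicate type SLSA Provenance v1) in a DSSE envelope and verifies it offline. The pre-authentication encoding follows the DSSE specification. The signing primitive is an assumption: HMAC-SHA256 is used only so the example runs with the standard library, whereas a real deployment would use an asymmetric signer. Function names and the key id are hypothetical.

```python
import base64
import hashlib
import hmac
import json

PAYLOAD_TYPE = "application/vnd.in-toto+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the exact bytes that get signed."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(type_bytes), type_bytes, len(payload), payload)


def make_statement(subject_name: str, sha256_hex: str, predicate: dict) -> dict:
    """Build an in-toto Statement v1 with a SLSA Provenance v1 predicate type."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": subject_name, "digest": {"sha256": sha256_hex}}],
        "predicateType": "https://slsa.dev/provenance/v1",
        "predicate": predicate,
    }


def sign_envelope(statement: dict, key: bytes, keyid: str) -> dict:
    payload = json.dumps(statement, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # Placeholder signer (assumption): HMAC keeps the sketch dependency-free.
    sig = hmac.new(key, pae(PAYLOAD_TYPE, payload), hashlib.sha256).digest()
    return {
        "payload": base64.b64encode(payload).decode("ascii"),
        "payloadType": PAYLOAD_TYPE,
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }


def verify_envelope(envelope: dict, key: bytes) -> bool:
    """Recompute the signature over PAE(payloadType, payload) and compare."""
    payload = base64.b64decode(envelope["payload"])
    expected = hmac.new(key, pae(envelope["payloadType"], payload), hashlib.sha256).digest()
    return any(
        hmac.compare_digest(expected, base64.b64decode(s["sig"]))
        for s in envelope["signatures"]
    )


statement = make_statement("registry.example/app", "ab" * 32, {"buildDefinition": {}})
envelope = sign_envelope(statement, key=b"demo-key", keyid="demo")
print(verify_envelope(envelope, b"demo-key"))
```

Nothing here needs network access, which is consistent with the offline posture described above; publishing to a transparency log would be a separate, optional step.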

## 3) APIs (first wave)
- `GET /sbom/paths?purl=...&artifact=...&scope=...&env=...` – returns ordered paths with runtime_flag/blast_radius and nearest-safe-version hint; supports `cursor` pagination.
- `GET /sbom/versions?artifact=...` – time-ordered SBOM version timeline for Advisory AI; includes provenance and source bundle hash.
- `POST /sbom/upload` – BYOS upload endpoint; validates/normalizes SPDX 2.3/3.0 or CycloneDX 1.4–1.6 and registers a ledger version.
- `GET /sbom/ledger/history` – list version history for an artifact (cursor pagination).
- `GET /sbom/ledger/point` – resolve the SBOM version at a specific timestamp.
- `GET /sbom/ledger/range` – query versions within a time range.
- `GET /sbom/ledger/diff` – component/version/license diff between two versions.
- `GET /sbom/ledger/lineage` – parent/child lineage edges for an artifact chain.
- `GET /console/sboms` – Console catalog with filters (artifact, license, scope, asset tags), cursor pagination, evaluation metadata, immutable JSON projection for drawer views.
- `GET /components/lookup?purl=...` – component neighborhood for global search/Graph overlays; returns cache hints and enforces tenant scoping.
- `POST /entrypoints` / `GET /entrypoints` – manage entrypoint/service node overrides feeding Cartographer relevance; deterministic defaults when unset.
- `GET /sboms/{snapshotId}/projection` – Link-Not-Merge v1 projection returning hashes plus asset metadata (criticality, owner, environment, exposure flags, tags) alongside package/component graph.
- `GET /internal/sbom/events` – internal diagnostics endpoint returning the in-memory event outbox for validation.
- `POST /internal/sbom/events/backfill` – replays existing projections into the event stream; deterministic ordering, clock abstraction for tests.
- `GET /internal/sbom/asset-events` – diagnostics endpoint returning emitted `sbom.asset.updated` envelopes for validation and air-gap parity checks.
- `GET/POST /internal/orchestrator/sources` – list/register orchestrator ingest/index sources (deterministic seeds; idempotent on artifactDigest+sourceType).
- `GET/POST /internal/orchestrator/control` – manage pause/throttle/backpressure signals per tenant; metrics emitted for control updates.
- `GET/POST /internal/orchestrator/watermarks` – fetch/set backfill watermarks for reconciliation and deterministic replays.
- `GET /internal/sbom/resolver-feed` – list resolver candidates (artifact, purl, version, paths, scope, runtime_flag, nearest_safe_version).
- `POST /internal/sbom/resolver-feed/backfill` – clear and repopulate resolver feed from current projections.
- `GET /internal/sbom/resolver-feed/export` – NDJSON export of resolver candidates for air-gap delivery.
- `GET /internal/sbom/ledger/audit` – audit trail for ledger changes (created/pruned).
- `GET /internal/sbom/analysis/jobs` – list analysis jobs triggered by BYOS uploads.
- `POST /internal/sbom/retention/prune` – apply retention policy and emit audit entries.
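
Several of the endpoints above use cursor pagination over a deterministic order. One way that could work is keyset pagination with an opaque cursor, sketched below. The sort key `(createdAt, id)`, the cursor encoding, and the `nextCursor` field name are assumptions for illustration and are not taken from the service contract.

```python
import base64
import json
from typing import Optional


def encode_cursor(created_at: str, item_id: str) -> str:
    """Pack the last-seen sort key into an opaque, URL-safe token."""
    raw = json.dumps([created_at, item_id]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> tuple:
    created_at, item_id = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    return created_at, item_id


def page(items: list, limit: int, cursor: Optional[str] = None) -> dict:
    # Total order on (createdAt, id): ties on the timestamp are broken by id,
    # so repeated calls always walk the collection in the same sequence.
    ordered = sorted(items, key=lambda i: (i["createdAt"], i["id"]))
    if cursor is not None:
        after = decode_cursor(cursor)
        ordered = [i for i in ordered if (i["createdAt"], i["id"]) > after]
    window = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = None
    if has_more and window:
        next_cursor = encode_cursor(window[-1]["createdAt"], window[-1]["id"])
    return {"items": window, "nextCursor": next_cursor}


catalog = [
    {"id": "b", "createdAt": "2025-01-01T00:00:00Z"},
    {"id": "a", "createdAt": "2025-01-01T00:00:00Z"},
    {"id": "c", "createdAt": "2024-12-31T00:00:00Z"},
]
first = page(catalog, limit=2)
rest = page(catalog, limit=2, cursor=first["nextCursor"])
print([i["id"] for i in first["items"] + rest["items"]])
```

The sketch relies on timestamps sharing one fixed-width UTC ISO-8601 format, so that string comparison matches chronological order.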

### 3.1) Ledger + BYOS workflow (Sprint 4600)
- Uploads are validated, normalized, and stored as ledger versions chained per artifact identity.
- Diffs compare normalized component keys and surface version/license deltas with deterministic ordering.
- Lineage is derived from parent version references and emitted for Graph lineage edges.
- Lineage relationships include parent links plus build links (shared CI build IDs when provided).
- Retention policy prunes old versions while preserving audit entries and minimum keep counts.
- See `docs/modules/sbomservice/ledger-lineage.md` for request/response examples.
- See `docs/modules/sbomservice/byos-ingestion.md` for supported formats and troubleshooting.
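
The diff rule in this workflow (compare normalized component keys, surface version/license deltas, keep the ordering deterministic) might look like the sketch below. The normalization is an assumption: here the key is the purl with its version, qualifiers, and subpath removed. The response shape (`added`, `removed`, `changed`) is also hypothetical; the linked ledger document is the place to check the real contract.

```python
def component_key(purl: str) -> str:
    """Hypothetical normalization: drop qualifiers, subpath, and version."""
    base = purl.split("?", 1)[0].split("#", 1)[0]
    name, sep, _version = base.rpartition("@")
    return name if sep else base


def diff_versions(old: list, new: list) -> dict:
    old_by_key = {component_key(c["purl"]): c for c in old}
    new_by_key = {component_key(c["purl"]): c for c in new}
    # Sorting every output list keeps the diff byte-stable across runs.
    added = sorted(new_by_key.keys() - old_by_key.keys())
    removed = sorted(old_by_key.keys() - new_by_key.keys())
    changed = []
    for key in sorted(old_by_key.keys() & new_by_key.keys()):
        before, after = old_by_key[key], new_by_key[key]
        delta = {
            name: {"from": before.get(name), "to": after.get(name)}
            for name in ("version", "license")
            if before.get(name) != after.get(name)
        }
        if delta:
            changed.append({"key": key, **delta})
    return {"added": added, "removed": removed, "changed": changed}


old = [
    {"purl": "pkg:npm/lodash@4.17.20", "version": "4.17.20", "license": "MIT"},
    {"purl": "pkg:npm/left-pad@1.3.0", "version": "1.3.0", "license": "WTFPL"},
]
new = [
    {"purl": "pkg:npm/lodash@4.17.21", "version": "4.17.21", "license": "MIT"},
    {"purl": "pkg:npm/express@4.19.2", "version": "4.19.2", "license": "MIT"},
]
print(diff_versions(old, new))
```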

## 4) Ingestion & orchestrator integration
- Ingest sources: Scanner pipeline (preferred) or uploaded SPDX 2.3/3.0 and CycloneDX 1.4–1.6 bundles.
- Orchestrator: register SBOM ingest/index jobs; worker SDK emits artifact hash + job metadata; honor pause/throttle; report backpressure metrics; support watermark-based backfill for idempotent replays.
- Idempotency: combine `(tenant, artifactDigest, sbomVersion)` as primary key; duplicate ingests short-circuit.

## 5) Events & streaming
- `sbom.version.created` – emitted per new SBOM snapshot; payload: tenant, artifact digest, sbomVersion, projection hash, source bundle hash, import provenance; replay/backfill via outbox with watermark.
- `sbom.asset.updated` – emitted when asset metadata changes; idempotent payload keyed by `(tenant, assetId, version)`.
- Inventory/resolver feeds – queue/topic delivering `(artifact, purl, version, paths, runtime_flag, scope, nearest_safe_version)` for Vuln Explorer/Findings Ledger.
- Current implementation uses an in-memory event store/publisher (with clock abstraction) plus `/internal/sbom/events` + `/internal/sbom/events/backfill` to validate envelopes until the PostgreSQL-backed outbox is wired.
- Entrypoint/service node overrides are exposed via `/entrypoints` (tenant-scoped) and should be mirrored into Cartographer relevance jobs when the outbox lands.
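
The in-memory outbox with a clock abstraction and watermark-based replay could be modelled roughly as below. This is a sketch, not the service's implementation: the class name, the envelope fields (`sequence`, `emittedAt`), and the use of a sequence number as the watermark are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class InMemoryOutbox:
    clock: Callable[[], datetime]  # injected so tests can pin the time
    events: list = field(default_factory=list)
    watermark: int = 0  # sequence number of the last delivered event

    def append(self, event_type: str, payload: dict) -> dict:
        envelope = {
            "sequence": len(self.events) + 1,
            "type": event_type,
            "emittedAt": self.clock().isoformat(),
            "payload": payload,
        }
        self.events.append(envelope)
        return envelope

    def deliver(self, publish: Callable[[dict], None]) -> int:
        """Publish everything past the watermark, in sequence order."""
        delivered = 0
        for envelope in [e for e in self.events if e["sequence"] > self.watermark]:
            publish(envelope)
            self.watermark = envelope["sequence"]
            delivered += 1
        return delivered

    def rewind(self, watermark: int) -> None:
        """Move the watermark back so a backfill replays the same envelopes."""
        self.watermark = watermark


fixed_clock = lambda: datetime(2025, 1, 1, tzinfo=timezone.utc)
outbox = InMemoryOutbox(clock=fixed_clock)
outbox.append("sbom.version.created", {"tenant": "t1", "sbomVersion": 1})
outbox.append("sbom.asset.updated", {"tenant": "t1", "assetId": "a1", "version": 1})

sent = []
outbox.deliver(sent.append)
outbox.rewind(0)
outbox.deliver(sent.append)
print(len(sent))
```

Because envelopes are stored once and replayed verbatim, consumers that deduplicate on the payload keys listed above see a replay as a no-op.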

## 6) Determinism & offline posture
- Stable ordering for projections and paths; timestamps in UTC ISO-8601; hash inputs canonicalised.
- Add-only evolution for schemas; LNM v1 fixtures published alongside API docs and replayable tests.
- Offline-friendly: uses mirrored packages, avoids external calls during projection; exports NDJSON bundles for air-gapped replay.
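
A deterministic NDJSON export combines the three rules above: stable ordering, UTC ISO-8601 timestamps, and canonical serialization. The sketch below shows one possible shape. The sort key `(artifact, purl, version)` and the helper names are assumptions chosen for the example.

```python
import json
from datetime import datetime, timedelta, timezone


def to_utc_iso(ts: datetime) -> str:
    """Render a timezone-aware datetime as UTC ISO-8601 with a Z suffix."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def export_ndjson(records: list) -> str:
    """One canonical JSON object per line, in a stable order."""
    ordered = sorted(records, key=lambda r: (r["artifact"], r["purl"], r["version"]))
    return "".join(
        json.dumps(r, sort_keys=True, separators=(",", ":")) + "\n" for r in ordered
    )


stamp = to_utc_iso(datetime(2025, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=2))))
records = [
    {"artifact": "app", "purl": "pkg:npm/lodash", "version": "4.17.21", "exportedAt": stamp},
    {"artifact": "app", "purl": "pkg:npm/express", "version": "4.19.2", "exportedAt": stamp},
]
# Input order does not affect the exported bytes.
print(export_ndjson(records) == export_ndjson(list(reversed(records))))
```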

## 7) Tenancy & security
- All APIs require tenant context (token claims or mTLS binding); collection filters must include tenant keys.
- Enforce least-privilege queries; avoid cross-tenant caches; log tenant IDs in structured logs.
- Input validation: schema-validate incoming SBOMs; reject oversized/unsupported media types early.
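
The tenancy and early-rejection rules can be expressed as two small guards. This is an illustrative sketch: the `tenant` claim name, the 50 MiB limit, and the function names are assumptions, and the allowed media types are the two SBOM types listed under the operational rules.

```python
ALLOWED_MEDIA_TYPES = {"application/vnd.cyclonedx+json", "application/spdx+json"}
MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # hypothetical limit


class TenantRequired(Exception):
    """Raised when a request carries no tenant context."""


def tenant_from_claims(claims: dict) -> str:
    tenant = claims.get("tenant")  # hypothetical claim name
    if not tenant:
        raise TenantRequired("request carries no tenant context")
    return tenant


def scoped_query(rows: list, claims: dict, **filters) -> list:
    """Every query filters on tenant first; callers cannot opt out."""
    tenant = tenant_from_claims(claims)
    return [
        r for r in rows
        if r["tenant"] == tenant and all(r.get(k) == v for k, v in filters.items())
    ]


def validate_upload(media_type: str, size_bytes: int) -> None:
    """Reject unsupported or oversized uploads before any parsing happens."""
    if media_type not in ALLOWED_MEDIA_TYPES:
        raise ValueError(f"unsupported media type: {media_type}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("upload exceeds size limit")


rows = [
    {"tenant": "t1", "artifact": "app"},
    {"tenant": "t2", "artifact": "app"},
]
print(scoped_query(rows, {"tenant": "t1"}, artifact="app"))
```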

## 8) Observability
- Metrics: `sbom_projection_seconds`, `sbom_projection_size_bytes`, `sbom_projection_queries_total`, `sbom_paths_latency_seconds`, `sbom_paths_cache_hit_ratio`, `sbom_events_backlog`, `sbom_ledger_uploads_total`, `sbom_ledger_diffs_total`, `sbom_ledger_retention_pruned_total`.
- Tracing: ActivitySource `StellaOps.SbomService` (entrypoints, component lookup, console catalog, projections, events).
- Traces: wrap ingest, projection build, and API handlers; propagate orchestrator job IDs.
- Logs: structured, include tenant + artifact digest + sbomVersion; classify ingest failures (schema, storage, orchestrator, validation).
- Alerts: backlog thresholds for outbox/event delivery; high latency on path/timeline endpoints.

## 9) Configuration (PostgreSQL-backed catalog & lookup)
- Enable PostgreSQL storage for `/console/sboms` and `/components/lookup` by setting `SbomService:PostgreSQL:ConnectionString` (env: `SBOM_SbomService__PostgreSQL__ConnectionString`).
- Optional overrides: `SbomService:PostgreSQL:Schema`, `SbomService:PostgreSQL:CatalogTable`, `SbomService:PostgreSQL:ComponentLookupTable`; defaults are `sbom_service`, `sbom_catalog`, `sbom_component_neighbors`.
- When the connection string is absent the service falls back to fixture JSON or deterministic in-memory seeds to keep air-gapped workflows alive.
- Ledger retention settings (env prefix `SBOM_SbomService__Ledger__`): `MaxVersionsPerArtifact`, `MaxAgeDays`, `MinVersionsToKeep`.
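
How the three ledger retention settings might interact is sketched below. The setting names and env prefix come from the list above; everything else is an assumption, including the default values, the precedence (the minimum keep count wins over both limits), and the shape of a version record. See the retention policy document for the actual behaviour.

```python
from datetime import datetime, timedelta, timezone

ENV_PREFIX = "SBOM_SbomService__Ledger__"


def load_retention(env: dict) -> dict:
    """Read retention settings from an environment mapping."""
    def read(name: str, default: int) -> int:
        return int(env.get(ENV_PREFIX + name, default))

    return {
        "max_versions": read("MaxVersionsPerArtifact", 50),  # hypothetical default
        "max_age_days": read("MaxAgeDays", 365),              # hypothetical default
        "min_keep": read("MinVersionsToKeep", 1),             # hypothetical default
    }


def prune(versions: list, settings: dict, now: datetime) -> tuple:
    """Split one artifact's versions into (kept, pruned), newest first.

    Each version is a dict with 'sequence' (int) and 'createdAt' (aware datetime).
    """
    newest_first = sorted(versions, key=lambda v: v["sequence"], reverse=True)
    cutoff = now - timedelta(days=settings["max_age_days"])
    kept, pruned = [], []
    for index, version in enumerate(newest_first):
        protected = index < settings["min_keep"]
        too_many = index >= settings["max_versions"]
        too_old = version["createdAt"] < cutoff
        if not protected and (too_many or too_old):
            pruned.append(version)
        else:
            kept.append(version)
    return kept, pruned


now = datetime(2025, 12, 1, tzinfo=timezone.utc)
versions = [
    {"sequence": n, "createdAt": now - timedelta(days=age)}
    for n, age in [(5, 0), (4, 10), (3, 20), (2, 400), (1, 500)]
]
settings = load_retention({ENV_PREFIX + "MaxVersionsPerArtifact": "3"})
kept, pruned = prune(versions, settings, now)
print([v["sequence"] for v in kept], [v["sequence"] for v in pruned])
```

In the service, each pruned version would also produce an audit entry; the sketch leaves that out.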

## 10) Open questions / dependencies
- Confirm orchestrator pause/backfill contract (shared with Runtime & Signals 140-series).
- Finalise storage table names and indexes (compound on tenant+artifactDigest+version, TTL for transient staging).
- Publish canonical LNM v1 fixtures and JSON schemas for projections and asset metadata.

- See `docs/modules/sbomservice/api/projection-read.md` for `/sboms/{snapshotId}/projection` (LNM v1, tenant-scoped, hash-returning).
- See `docs/modules/sbomservice/lineage-ledger.md` for ledger endpoints and lineage relationships.
- See `docs/modules/sbomservice/retention-policy.md` for retention configuration and audit expectations.