git.stella-ops.org/docs/modules/sbomservice/architecture.md
2025-12-04 08:54:32 +02:00

SBOM Service architecture (2025Q4)

Scope: canonical SBOM projections, lookup and timeline APIs, asset metadata overlays, and events feeding Advisory AI, Console, Graph, Policy, and Vuln Explorer.

1) Mission & boundaries

  • Mission: serve deterministic, tenant-scoped SBOM projections (Link-Not-Merge v1) and related metadata for downstream reasoning and overlays.
  • Boundaries:
    • Does not perform scanning; consumes Scanner outputs or supplied SPDX/CycloneDX blobs.
    • Does not author verdicts/policy; supplies evidence and projections to Policy/Concelier/Graph.
    • Append-only SBOM versions; mutations happen via new versions, never in-place edits.

2) Project layout

  • src/SbomService/StellaOps.SbomService — REST API + event emitters + orchestrator integration.
  • Storage: MongoDB collections (proposed)
    • sbom_snapshots (immutable versions; tenant + artifact + digest + createdAt)
    • sbom_projections (materialised views keyed by snapshotId, entrypoint/service node flags)
    • sbom_assets (asset metadata, criticality/owner/env/exposure; append-only history)
    • sbom_catalog (console catalog surface; indexed by artifact, scope, license, assetTags.*, createdAt for deterministic pagination)
    • sbom_component_neighbors (component lookup graph edges; indexed by purl+artifact for cursor pagination)
    • sbom_paths (resolved dependency paths with runtime flags, blast-radius hints)
    • sbom_events (outbox for event delivery + watermark/backfill tracking)

2.1) SBOM + provenance spine (Nov 2025)

The service now owns an idempotent spine that converts OCI images into SBOMs and provenance bundles with DSSE and in-toto. The flow is intentionally air-gap ready:

  • Extract OCI manifest/layers (hash becomes contentAddress).
  • Build SBOM in CycloneDX 1.6 and/or SPDX 3.0.1; canonicalize JSON before hashing (sbomHash).
  • Sign outputs as DSSE envelopes; predicate uses in-toto Statement with SLSA Provenance v1.
  • Publish attestations optionally to a transparency backend: rekor, local-merkle, or null (no-op). Local Merkle log keeps proofs for later sync when online.
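
The "canonicalize JSON before hashing" step can be sketched as below. This is a minimal illustration, not the service's implementation: it assumes canonicalization means sorted keys plus compact separators and that sbomHash is a SHA-256 digest. The source names neither choice, and a production scheme may follow a stricter standard such as RFC 8785. The names canonical_json and sbom_hash are hypothetical.

```python
import hashlib
import json


def canonical_json(doc: dict) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace (assumed scheme)."""
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def sbom_hash(doc: dict) -> str:
    """Digest the canonical bytes; SHA-256 is an assumption, not stated in the doc."""
    return "sha256:" + hashlib.sha256(canonical_json(doc)).hexdigest()


# Two documents that differ only in key order hash identically.
a = {"bomFormat": "CycloneDX", "specVersion": "1.6", "components": []}
b = {"components": [], "specVersion": "1.6", "bomFormat": "CycloneDX"}
assert sbom_hash(a) == sbom_hash(b)
```

Hashing the canonical form rather than the raw upload is what lets a re-serialized but semantically identical SBOM resolve to the same sbomHash.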

Minimal APIs exposed by SbomService (idempotent by hash):

  • POST /sbom/ingest { imageDigest, sbom, format, dsseSignature? } → { sbomId, status: stored|already_present, sbomHash } keyed by contentAddress + sbomHash.
  • POST /attest/verify { dsseEnvelope, expectedSubjects[] } → { verified, predicateType, logIndex?, inclusionProof? } and records attestation when verified.

Operational rules:

  • Default media types: application/vnd.cyclonedx+json, application/spdx+json, application/dsse+json, application/vnd.in-toto+json.
  • If the same SBOM/attestation arrives again, return HTTP 200 with "status":"already_present" and do not create a new version.
  • Offline posture: no external calls required; Rekor publish remains optional and retryable when connectivity is restored.
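
The idempotent-by-hash rule for POST /sbom/ingest could look roughly like the following sketch. IngestStore is a hypothetical in-memory stand-in for the real storage layer, and the sbomId format is invented for illustration; only the key (contentAddress + sbomHash) and the stored/already_present statuses come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class IngestStore:
    """Hypothetical in-memory store keyed by (contentAddress, sbomHash)."""
    _by_key: dict = field(default_factory=dict)

    def ingest(self, content_address: str, sbom_hash: str) -> dict:
        key = (content_address, sbom_hash)
        if key in self._by_key:
            # Duplicate arrival: report the existing record, create no new version.
            return {"sbomId": self._by_key[key], "status": "already_present", "sbomHash": sbom_hash}
        sbom_id = f"sbom-{len(self._by_key) + 1}"  # illustrative id scheme
        self._by_key[key] = sbom_id
        return {"sbomId": sbom_id, "status": "stored", "sbomHash": sbom_hash}


store = IngestStore()
first = store.ingest("sha256:aaa", "sha256:bbb")
second = store.ingest("sha256:aaa", "sha256:bbb")
assert first["status"] == "stored"
assert second["status"] == "already_present"
```

Both calls return the same sbomId, so a retrying client can treat either response as success.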

3) APIs (first wave)

  • GET /sbom/paths?purl=...&artifact=...&scope=...&env=... — returns ordered paths with runtime_flag/blast_radius and nearest-safe-version hint; supports cursor pagination.
  • GET /sbom/versions?artifact=... — time-ordered SBOM version timeline for Advisory AI; include provenance and source bundle hash.
  • GET /console/sboms — Console catalog with filters (artifact, license, scope, asset tags), cursor pagination, evaluation metadata, immutable JSON projection for drawer views.
  • GET /components/lookup?purl=... — component neighborhood for global search/Graph overlays; returns cache hints + tenant enforcement.
  • POST /entrypoints / GET /entrypoints — manage entrypoint/service node overrides feeding Cartographer relevance; deterministic defaults when unset.
  • GET /sboms/{snapshotId}/projection — Link-Not-Merge v1 projection returning hashes plus asset metadata (criticality, owner, environment, exposure flags, tags) alongside package/component graph.
  • GET /internal/sbom/events — internal diagnostics endpoint returning the in-memory event outbox for validation.
  • POST /internal/sbom/events/backfill — replays existing projections into the event stream; deterministic ordering, clock abstraction for tests.
  • GET /internal/sbom/asset-events — diagnostics endpoint returning emitted sbom.asset.updated envelopes for validation and air-gap parity checks.
  • GET/POST /internal/orchestrator/sources — list/register orchestrator ingest/index sources (deterministic seeds; idempotent on artifactDigest+sourceType).
  • GET/POST /internal/orchestrator/control — manage pause/throttle/backpressure signals per tenant; metrics emitted for control updates.
  • GET/POST /internal/orchestrator/watermarks — fetch/set backfill watermarks for reconciliation and deterministic replays.
  • GET /internal/sbom/resolver-feed — list resolver candidates (artifact, purl, version, paths, scope, runtime_flag, nearest_safe_version).
  • POST /internal/sbom/resolver-feed/backfill — clear and repopulate resolver feed from current projections.
  • GET /internal/sbom/resolver-feed/export — NDJSON export of resolver candidates for air-gap delivery.
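
Several endpoints above mention cursor pagination with deterministic ordering. One common way to get that is a keyset cursor over a total order, sketched here under assumptions: items are ordered by (createdAt, id), and the cursor is an opaque base64 encoding of the last key returned. The field names, cursor format, and function names are illustrative and not taken from the service.

```python
import base64
import json
from typing import Optional


def encode_cursor(created_at: str, item_id: str) -> str:
    raw = json.dumps([created_at, item_id]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> tuple:
    created_at, item_id = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    return (created_at, item_id)


def page(items: list, limit: int, cursor: Optional[str] = None) -> tuple:
    """Return (batch, next_cursor); next_cursor is None on the last page."""
    # Same-format UTC ISO-8601 strings sort chronologically as plain strings.
    ordered = sorted(items, key=lambda i: (i["createdAt"], i["id"]))
    if cursor is not None:
        last = decode_cursor(cursor)
        ordered = [i for i in ordered if (i["createdAt"], i["id"]) > last]
    batch = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = encode_cursor(batch[-1]["createdAt"], batch[-1]["id"]) if has_more else None
    return batch, next_cursor


catalog = [
    {"id": "c", "createdAt": "2025-11-02T00:00:00Z"},
    {"id": "a", "createdAt": "2025-11-01T00:00:00Z"},
    {"id": "b", "createdAt": "2025-11-01T00:00:00Z"},
]
first_page, cursor = page(catalog, limit=2)
second_page, end = page(catalog, limit=2, cursor=cursor)
```

Including id as a tiebreaker keeps the order total when two entries share a createdAt value, which is what makes repeated traversals return the same pages.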

4) Ingestion & orchestrator integration

  • Ingest sources: Scanner pipeline (preferred) or uploaded SPDX 3.0.1/CycloneDX 1.6 bundles.
  • Orchestrator: register SBOM ingest/index jobs; worker SDK emits artifact hash + job metadata; honor pause/throttle; report backpressure metrics; support watermark-based backfill for idempotent replays.
  • Idempotency: combine (tenant, artifactDigest, sbomVersion) as primary key; duplicate ingests short-circuit.

5) Events & streaming

  • sbom.version.created — emitted per new SBOM snapshot; payload: tenant, artifact digest, sbomVersion, projection hash, source bundle hash, import provenance; replay/backfill via outbox with watermark.
  • sbom.asset.updated — emitted when asset metadata changes; idempotent payload keyed by (tenant, assetId, version).
  • Inventory/resolver feeds — queue/topic delivering (artifact, purl, version, paths, runtime_flag, scope, nearest_safe_version) for Vuln Explorer/Findings Ledger.
    • Current implementation uses an in-memory event store/publisher (with clock abstraction) plus /internal/sbom/events + /internal/sbom/events/backfill to validate envelopes until the Mongo-backed outbox is wired.
    • Entrypoint/service node overrides are exposed via /entrypoints (tenant-scoped) and should be mirrored into Cartographer relevance jobs when the outbox lands.
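
An in-memory outbox with a watermark and an injectable clock, as described above, might be shaped like this sketch. The class name, the integer sequence numbers, and the envelope fields are assumptions made for illustration; the document does not specify the watermark representation.

```python
from datetime import datetime, timezone
from typing import Callable


class Outbox:
    """Hypothetical in-memory outbox; the watermark is the last replayed sequence number."""

    def __init__(self, clock: Callable[[], datetime]):
        self._clock = clock  # injected so tests can pin time
        self._events = []
        self.watermark = 0

    def append(self, event_type: str, payload: dict) -> int:
        seq = len(self._events) + 1
        self._events.append({
            "seq": seq,
            "type": event_type,
            "payload": payload,
            "emittedAt": self._clock().isoformat(),
        })
        return seq

    def replay(self) -> list:
        """Return events past the watermark in sequence order, then advance it."""
        pending = [e for e in self._events if e["seq"] > self.watermark]
        if pending:
            self.watermark = pending[-1]["seq"]
        return pending


fixed_clock = lambda: datetime(2025, 11, 1, tzinfo=timezone.utc)
outbox = Outbox(fixed_clock)
outbox.append("sbom.version.created", {"tenant": "t1", "sbomVersion": 1})
outbox.append("sbom.asset.updated", {"tenant": "t1", "assetId": "a1", "version": 1})
delivered = outbox.replay()
nothing_new = outbox.replay()
```

With the clock pinned, two runs over the same inputs produce byte-identical envelopes, which is the property the backfill endpoint relies on for validation.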

6) Determinism & offline posture

  • Stable ordering for projections and paths; timestamps in UTC ISO-8601; hash inputs canonicalised.
  • Add-only evolution for schemas; LNM v1 fixtures published alongside API docs and replayable tests.
  • Offline-friendly: uses mirrored packages, avoids external calls during projection; exports NDJSON bundles for air-gapped replay.
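
To illustrate stable ordering combined with NDJSON export, the sketch below sorts resolver candidates on a fixed key and writes one compact JSON object per line. The sort key (artifact, purl, version) and the function name are assumptions; the field names follow the resolver feed described earlier.

```python
import json


def export_ndjson(candidates: list) -> str:
    """Emit one JSON object per line in a stable order with stable key order."""
    ordered = sorted(candidates, key=lambda c: (c["artifact"], c["purl"], c["version"]))
    return "".join(
        json.dumps(c, sort_keys=True, separators=(",", ":")) + "\n" for c in ordered
    )


rows = [
    {"artifact": "img-b", "purl": "pkg:npm/left-pad", "version": "1.3.0", "scope": "runtime"},
    {"artifact": "img-a", "purl": "pkg:npm/lodash", "version": "4.17.21", "scope": "runtime"},
]
# Input order does not affect the exported bytes.
assert export_ndjson(rows) == export_ndjson(list(reversed(rows)))
```

Because the output bytes depend only on the content, an air-gapped site can compare a bundle's hash against the source without replaying the export.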

7) Tenancy & security

  • All APIs require tenant context (token claims or mTLS binding); collection filters must include tenant keys.
  • Enforce least-privilege queries; avoid cross-tenant caches; log tenant IDs in structured logs.
  • Input validation: schema-validate incoming SBOMs; reject oversized/unsupported media types early.
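
The rule that collection filters must include tenant keys can be enforced at one choke point, as in this sketch. The helper name tenant_filter and the tenant field name are hypothetical; the real schema may use a different key.

```python
from typing import Optional


def tenant_filter(tenant: str, extra: Optional[dict] = None) -> dict:
    """Build a query filter that always carries the caller's tenant key."""
    if not tenant:
        raise ValueError("tenant context is required")
    query = dict(extra or {})
    if "tenant" in query and query["tenant"] != tenant:
        # Refuse filters that try to reach into another tenant's data.
        raise ValueError("filter may not target another tenant")
    query["tenant"] = tenant
    return query


scoped = tenant_filter("tenant-a", {"artifact": "img-a"})
assert scoped == {"artifact": "img-a", "tenant": "tenant-a"}
```

Routing every storage query through a helper like this makes a missing tenant key a hard failure rather than a silent cross-tenant read.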

8) Observability

  • Metrics: sbom_projection_seconds, sbom_projection_size_bytes, sbom_projection_queries_total, sbom_paths_latency_seconds, sbom_paths_cache_hit_ratio, sbom_events_backlog.
  • Tracing: ActivitySource StellaOps.SbomService (entrypoints, component lookup, console catalog, projections, events).
  • Traces: wrap ingest, projection build, and API handlers; propagate orchestrator job IDs.
  • Logs: structured, include tenant + artifact digest + sbomVersion; classify ingest failures (schema, storage, orchestrator, validation).
  • Alerts: backlog thresholds for outbox/event delivery; high latency on path/timeline endpoints.

9) Configuration (Mongo-backed catalog & lookup)

  • Enable Mongo storage for /console/sboms and /components/lookup by setting SbomService:Mongo:ConnectionString (env: SBOM_SbomService__Mongo__ConnectionString).
  • Optional overrides: SbomService:Mongo:Database, SbomService:Mongo:CatalogCollection, SbomService:Mongo:ComponentLookupCollection; defaults are sbom_service, sbom_catalog, sbom_component_neighbors.
  • When the connection string is absent the service falls back to fixture JSON or deterministic in-memory seeds to keep air-gapped workflows alive.
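
The fallback behaviour above can be pictured with the following sketch. It is not the service's configuration code: it assumes the optional overrides use the same SBOM_SbomService__Mongo__ environment prefix as the connection string (the document only confirms the connection-string variable), and the Backend field is a hypothetical label added for illustration.

```python
import os
from typing import Mapping

ENV_PREFIX = "SBOM_SbomService__Mongo__"
# Defaults as listed in the document.
DEFAULTS = {
    "Database": "sbom_service",
    "CatalogCollection": "sbom_catalog",
    "ComponentLookupCollection": "sbom_component_neighbors",
}


def resolve_mongo_options(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve Mongo settings; fall back to in-memory seeds without a connection string."""
    conn = env.get(ENV_PREFIX + "ConnectionString")
    options = {name: env.get(ENV_PREFIX + name, default) for name, default in DEFAULTS.items()}
    options["ConnectionString"] = conn
    options["Backend"] = "mongo" if conn else "in-memory"  # hypothetical label
    return options


offline = resolve_mongo_options({})
online = resolve_mongo_options({ENV_PREFIX + "ConnectionString": "mongodb://localhost:27017"})
assert offline["Backend"] == "in-memory"
assert online["Backend"] == "mongo"
```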

10) Open questions / dependencies

  • Confirm orchestrator pause/backfill contract (shared with Runtime & Signals 140-series).
  • Finalise storage collection names and indexes (compound on tenant+artifactDigest+version, TTL for transient staging).
  • Publish canonical LNM v1 fixtures and JSON schemas for projections and asset metadata.
  • See docs/modules/sbomservice/api/projection-read.md for /sboms/{snapshotId}/projection (LNM v1, tenant-scoped, hash-returning).