git.stella-ops.org/docs/modules/scanner/sbom-attestation-hot-lookup-profile.md
2026-02-11 01:32:14 +02:00


Scanner SBOM and Attestation Hot Lookup Profile

Version: 0.1.0
Status: Draft (Advisory translation)
Last Updated: 2026-02-10

Purpose

Define a Stella-compatible persistence profile for fast SBOM/attestation lookup queries in PostgreSQL without moving full replay/audit payloads out of CAS/object storage.

Scope

This profile covers:

  • Hot OLTP lookup rows for digest, component, and pending-triage queries.
  • Partitioning, indexing, and retention for lookup tables.
  • Ingestion projection contracts from Scanner/Attestor outputs.

This profile does not replace:

  • CAS/object storage as source of truth for large immutable payloads.
  • Analytics star schema in analytics.*.
  • Existing proof bundle and witness contracts.

Current Baseline (confirmed)

  • Scanner stores full SBOM artifacts in object storage via ArtifactStorageService and ArtifactObjectKeyBuilder.
  • Scanner catalog metadata is stored in PostgreSQL (scanner.artifacts, scanner.links, related tables).
  • DSSE and proof metadata already use JSONB where needed (proof_bundle.dsse_envelope, scanner.witnesses.dsse_envelope).
  • High-volume partitioning currently exists for time-series style tables (for example EPSS and runtime samples), not for SBOM component lookup projections.
  • Component-level hot lookup acceleration is currently driven by the BOM-index sidecar contract.

Advisory Fit Assessment

Aligned with current direction:

  • Keep exact-match routing keys narrow and indexed.
  • Use JSONB GIN indexes only on query paths that are actually hot.
  • Partition by time for deterministic retention.
  • Keep analytics and rollups away from Scanner OLTP hot paths.

Gap requiring implementation:

  • No explicit Scanner Postgres contract exists for a partitioned SBOM/attestation lookup projection that supports direct SQL lookups by payload digest, component PURL/name/version, and merged VEX pending state.

Target Contract

1) Authoritative storage split

  • Authoritative blobs:
    • raw_bom, canonical SBOM documents, DSSE envelopes, and merged VEX payloads remain in CAS/object storage.
    • PostgreSQL rows reference these via artifact IDs/URIs and store bounded JSONB projections for search.
  • Authoritative decisions:
    • Policy decisions remain in their existing modules.

2) Hot lookup table

Create a new append-only projection table:

CREATE TABLE scanner.artifact_boms (
  build_id TEXT NOT NULL,
  canonical_bom_sha256 TEXT NOT NULL,
  payload_digest TEXT NOT NULL,
  inserted_at TIMESTAMPTZ NOT NULL DEFAULT now(),

  raw_bom_ref TEXT,
  canonical_bom_ref TEXT,
  dsse_envelope_ref TEXT,
  merged_vex_ref TEXT,

  canonical_bom JSONB,
  merged_vex JSONB,
  attestations JSONB,

  evidence_score INTEGER NOT NULL DEFAULT 0,
  rekor_tile_id TEXT,

  PRIMARY KEY (build_id, inserted_at)
) PARTITION BY RANGE (inserted_at);

Partition policy:

  • Monthly range partitions.
  • DEFAULT partition optional for safety.
  • Retention by DROP TABLE scanner.artifact_boms_YYYY_MM.
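
The monthly partition policy above can be sketched as a small DDL generator. This is an illustration only: the helper names `month_bounds` and `monthly_partition_ddl` are hypothetical, and the explicit UTC bounds are an assumption rather than something this profile specifies. The partition name follows the scanner.artifact_boms_YYYY_MM pattern used for retention.

```python
from datetime import date


def month_bounds(year: int, month: int) -> tuple[date, date]:
    """Return (first day of the month, first day of the next month)."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


def monthly_partition_ddl(year: int, month: int) -> str:
    """Build idempotent DDL for one monthly partition of scanner.artifact_boms."""
    start, end = month_bounds(year, month)
    name = f"scanner.artifact_boms_{year:04d}_{month:02d}"
    # Assumption: bounds carry an explicit UTC offset so the range does not
    # depend on the session time zone. The upper bound is exclusive.
    return (
        f"CREATE TABLE IF NOT EXISTS {name}\n"
        f"  PARTITION OF scanner.artifact_boms\n"
        f"  FOR VALUES FROM ('{start.isoformat()} 00:00:00+00')"
        f" TO ('{end.isoformat()} 00:00:00+00');"
    )


print(monthly_partition_ddl(2026, 12))
```

Because only integers and dates are interpolated, the generated statement cannot carry caller-supplied SQL text.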

3) Index profile

Required:

CREATE INDEX IF NOT EXISTS ix_artifact_boms_payload_digest
  ON scanner.artifact_boms (payload_digest, inserted_at DESC);

CREATE INDEX IF NOT EXISTS ix_artifact_boms_canonical_sha
  ON scanner.artifact_boms (canonical_bom_sha256);

CREATE INDEX IF NOT EXISTS ix_artifact_boms_inserted_at
  ON scanner.artifact_boms (inserted_at DESC);

CREATE INDEX IF NOT EXISTS ix_artifact_boms_canonical_gin
  ON scanner.artifact_boms USING GIN (canonical_bom jsonb_path_ops);

CREATE INDEX IF NOT EXISTS ix_artifact_boms_merged_vex_gin
  ON scanner.artifact_boms USING GIN (merged_vex jsonb_path_ops);

Optional partial index for pending triage:

CREATE INDEX IF NOT EXISTS ix_artifact_boms_pending_vex
  ON scanner.artifact_boms USING GIN (merged_vex jsonb_path_ops)
  WHERE jsonb_path_exists(merged_vex, '$[*] ? (@.state == "unknown" || @.state == "triage_pending")');

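Uniqueness guard (optional, per partition):

PostgreSQL requires a unique index on a partitioned table to include the partition key column itself, and date_trunc over TIMESTAMPTZ is not IMMUTABLE, so the guard cannot be declared on the parent with a month expression. Each monthly partition already bounds the window, so create the index on the partition:

CREATE UNIQUE INDEX IF NOT EXISTS uq_artifact_boms_YYYY_MM_dedupe
  ON scanner.artifact_boms_YYYY_MM (canonical_bom_sha256, payload_digest);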

Query Contracts

Latest by payload digest

SELECT build_id, inserted_at, evidence_score
FROM scanner.artifact_boms
WHERE payload_digest = $1
ORDER BY inserted_at DESC, build_id ASC
LIMIT 1;

Component presence by PURL

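-- $1 is a jsonb containment document, for example {"components":[{"purl":"pkg:npm/lodash@4.17.21"}]}.
-- The @? operator cannot bind jsonpath variables, so the PURL is matched with @>,
-- which the jsonb_path_ops GIN index also supports.
SELECT build_id, inserted_at
FROM scanner.artifact_boms
WHERE canonical_bom @> $1::jsonb
ORDER BY inserted_at DESC, build_id ASC
LIMIT 50;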
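
The PURL has to reach the query as a bound value rather than as text spliced into a jsonpath literal. The sketch below shows one way to do that, assuming the lookup is expressed as a jsonb containment test (@>) and that the canonical SBOM keeps components under a top-level `components` array with a `purl` field, as the query above implies. The function name, the SQL constant, and the `%s` placeholder style are hypothetical and depend on the driver in use.

```python
import json

# Assumed query shape; placeholder syntax varies by database driver.
COMPONENT_BY_PURL_SQL = """
SELECT build_id, inserted_at
FROM scanner.artifact_boms
WHERE canonical_bom @> %s::jsonb
ORDER BY inserted_at DESC, build_id ASC
LIMIT %s OFFSET %s
"""


def purl_containment_param(purl: str) -> str:
    """Serialize the containment document that matches one component PURL."""
    # Package URLs always use the "pkg" scheme; reject anything else early.
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a package URL: {purl!r}")
    # json.dumps handles quoting, so special characters in the PURL cannot
    # change the structure of the document sent to PostgreSQL.
    return json.dumps({"components": [{"purl": purl}]}, ensure_ascii=False)


print(purl_containment_param("pkg:npm/lodash@4.17.21"))
```

Containment matches any row whose `components` array holds an object with that exact `purl` value, which is the same condition the jsonpath filter expresses.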

Pending triage extraction

SELECT build_id, inserted_at,
       jsonb_path_query_array(merged_vex, '$[*] ? (@.state == "unknown" || @.state == "triage_pending")') AS pending
FROM scanner.artifact_boms
WHERE jsonb_path_exists(merged_vex, '$[*] ? (@.state == "unknown" || @.state == "triage_pending")')
ORDER BY inserted_at DESC, build_id ASC
LIMIT 100;
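
For readers less familiar with jsonpath filters, the extraction above can be mirrored in application code. This is a rough equivalent, assuming merged_vex is a top-level array of objects that each carry a `state` string. It does not reproduce jsonpath lax-mode wrapping of non-array values, and the `id` field in the sample data is purely illustrative.

```python
PENDING_STATES = frozenset({"unknown", "triage_pending"})


def pending_entries(merged_vex):
    """Return the merged VEX entries whose state still needs triage."""
    # Simplification: anything other than a top-level array yields no entries.
    if not isinstance(merged_vex, list):
        return []
    pending = []
    for entry in merged_vex:
        if not isinstance(entry, dict):
            continue
        state = entry.get("state")
        # Only string states can match; other types are ignored.
        if isinstance(state, str) and state in PENDING_STATES:
            pending.append(entry)
    return pending


sample = [
    {"id": "CVE-1", "state": "unknown"},
    {"id": "CVE-2", "state": "not_affected"},
    {"id": "CVE-3", "state": "triage_pending"},
]
print([entry["id"] for entry in pending_entries(sample)])
```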

Implemented API Surface (Scanner WebService)

Base path: /api/v1/sbom/hot-lookup

  • GET /payload/{payloadDigest}/latest
    • Returns latest projection row for digest.
  • GET /components?purl=<purl>&limit=<n>&offset=<n>
    • Component presence search by PURL.
  • GET /components?name=<name>&minVersion=<version>&limit=<n>&offset=<n>
    • Component presence search by normalized name and optional minimum version.
  • GET /pending-triage?limit=<n>&offset=<n>
    • Returns rows where merged VEX contains unknown or triage_pending states and includes extracted pending entries.

Pagination constraints:

  • limit: 1..200 (defaults: 50 for component searches, 100 for pending triage).
  • offset: >= 0.
  • Ordering is deterministic: inserted_at DESC, build_id ASC.
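
The pagination constraints can be sketched as a small validator plus an ordering helper. Rejecting out-of-range values (rather than clamping them) is an assumption, since this profile only states the allowed ranges. `normalize_page` and `order_rows` are hypothetical names.

```python
from datetime import datetime, timezone

MAX_LIMIT = 200


def normalize_page(limit, offset, default_limit):
    """Apply defaults and enforce limit in 1..200 and offset >= 0."""
    effective_limit = default_limit if limit is None else limit
    effective_offset = 0 if offset is None else offset
    if not 1 <= effective_limit <= MAX_LIMIT:
        raise ValueError(f"limit must be in 1..{MAX_LIMIT}")
    if effective_offset < 0:
        raise ValueError("offset must be >= 0")
    return effective_limit, effective_offset


def order_rows(rows):
    """Order rows by inserted_at DESC, then build_id ASC."""
    # Python sorts are stable, so sort by the tie-breaker first and then by
    # the primary key; rows with equal inserted_at keep build_id order.
    by_build = sorted(rows, key=lambda r: r["build_id"])
    return sorted(by_build, key=lambda r: r["inserted_at"], reverse=True)


t0 = datetime(2026, 2, 10, tzinfo=timezone.utc)
t1 = datetime(2026, 2, 11, tzinfo=timezone.utc)
rows = [
    {"build_id": "b", "inserted_at": t0},
    {"build_id": "a", "inserted_at": t0},
    {"build_id": "c", "inserted_at": t1},
]
print([r["build_id"] for r in order_rows(rows)])
```

The build_id tie-breaker matters because offset pagination can skip or repeat rows when two rows share the same inserted_at and the order between them is not fixed.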

Ingestion Contract

  • Projectors should upsert on (canonical_bom_sha256, payload_digest) within the monthly partition window, so that replaying the same input does not create duplicate rows.
  • canonical_bom_sha256 must be computed from canonical JSON (stable ordering, UTF-8, deterministic normalization).
  • Projection rows must preserve deterministic timestamps (UTC) and stable JSON serialization.
  • If inline JSONB exceeds configured size thresholds, keep JSONB minimal and store full content only by CAS reference fields.
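
A minimal sketch of the hashing and size-threshold rules follows. The canonicalization shown here (sorted keys, compact separators, UTF-8) is an assumption standing in for whatever normalization the projector actually uses; it is not a full RFC 8785 implementation, and the function names are hypothetical.

```python
import hashlib
import json


def canonical_json_bytes(doc) -> bytes:
    """Serialize a JSON document with stable key order as UTF-8 bytes."""
    text = json.dumps(
        doc,
        sort_keys=True,          # stable object key ordering
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,      # keep non-ASCII text as UTF-8
        allow_nan=False,         # NaN/Infinity are not valid JSON
    )
    return text.encode("utf-8")


def canonical_bom_sha256(doc) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_json_bytes(doc)).hexdigest()


def fits_inline(doc, max_inline_bytes: int) -> bool:
    """True when the canonical form is within the configured inline budget."""
    return len(canonical_json_bytes(doc)) <= max_inline_bytes


bom_a = {"b": 1, "a": [{"y": 2, "x": 1}]}
bom_b = {"a": [{"x": 1, "y": 2}], "b": 1}
print(canonical_bom_sha256(bom_a) == canonical_bom_sha256(bom_b))
```

Two documents that differ only in key order hash to the same value. Array order is preserved, so any component sorting has to happen before hashing.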

Operations

  • Autovacuum enabled per partition; tune by active partition write rate.
  • Reindex/repack operations should run per partition, never globally.
  • Partition creation job should pre-create at least one future month.
  • Retention job should drop old partitions according to policy (for example 90/180/365-day classes by environment).
  • Keep analytics workloads on analytics.*; export to external columnar systems when query volume exceeds OLTP SLO budgets.
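
The retention rule can be illustrated with a helper that selects partitions whose whole month lies outside the retention window. This is a sketch under stated assumptions: partitions are named artifact_boms_YYYY_MM, a partition is dropped only once its exclusive upper bound is at or before the cutoff, and the function names are hypothetical. The referenced shell jobs may use different rules.

```python
import re
from datetime import date, timedelta
from typing import Optional

_PARTITION_RE = re.compile(r"^artifact_boms_(\d{4})_(\d{2})$")


def partition_upper_bound(name: str) -> Optional[date]:
    """Exclusive upper bound of a monthly partition, or None if not monthly."""
    match = _PARTITION_RE.match(name)
    if match is None:
        return None  # for example the DEFAULT partition
    year, month = int(match.group(1)), int(match.group(2))
    try:
        return date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    except ValueError:
        return None  # month or year out of range


def partitions_to_drop(names, today: date, retention_days: int):
    """Return partitions that ended on or before today - retention_days."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in sorted(names):
        upper = partition_upper_bound(name)
        if upper is not None and upper <= cutoff:
            expired.append(name)
    return expired


names = ["artifact_boms_2025_11", "artifact_boms_2025_10", "artifact_boms_default"]
print(partitions_to_drop(names, date(2026, 2, 10), 90))
```

Names that do not match the monthly pattern are never selected, so a DEFAULT partition is left alone by construction.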

Operational runbook and jobs:

  • Runbook: docs/modules/scanner/operations/sbom-hot-lookup-operations.md
  • SQL snippets: devops/database/postgres-partitioning/003_scanner_artifact_boms_hot_lookup_jobs.sql
  • Shell jobs:
    • devops/scripts/scanner-artifact-boms-ensure-partitions.sh
    • devops/scripts/scanner-artifact-boms-retention.sh
  • Systemd timers:
    • devops/scripts/systemd/scanner-artifact-boms-ensure.timer
    • devops/scripts/systemd/scanner-artifact-boms-retention.timer

Security and Determinism Notes

  • Do not store secrets in JSONB payloads.
  • DSSE and Rekor references must remain verifiable from immutable CAS/object artifacts.
  • Query responses used in policy decisions must preserve stable ordering and deterministic filtering rules.
  • Implementation sprint: docs/implplan/SPRINT_20260210_001_DOCS_sbom_attestation_hot_lookup_contract.md