# Scanner SBOM and Attestation Hot Lookup Profile

Version: 0.1.0
Status: Draft (Advisory translation)
Last Updated: 2026-02-10

## Purpose

Define a Stella-compatible persistence profile for fast SBOM/attestation lookup queries in PostgreSQL without moving full replay/audit payloads out of CAS/object storage.

## Scope

This profile covers:
- Hot OLTP lookup rows for digest, component, and pending-triage queries.
- Partitioning, indexing, and retention for lookup tables.
- Ingestion projection contracts from Scanner/Attestor outputs.

This profile does not replace:
- CAS/object storage as source of truth for large immutable payloads.
- Analytics star schema in `analytics.*`.
- Existing proof bundle and witness contracts.

## Current Baseline (confirmed)

- Scanner stores full SBOM artifacts in object storage via `ArtifactStorageService` and `ArtifactObjectKeyBuilder`.
- Scanner catalog metadata is stored in PostgreSQL (`scanner.artifacts`, `scanner.links`, related tables).
- DSSE and proof metadata already use JSONB where needed (`proof_bundle.dsse_envelope`, `scanner.witnesses.dsse_envelope`).
- High-volume partitioning currently exists for time-series style tables (for example EPSS and runtime samples), not for SBOM component lookup projections.
- Component-level hot lookup acceleration is currently driven by the BOM-index sidecar contract.

## Advisory Fit Assessment

Aligned with current direction:
- Keep exact-match routing keys narrow and indexed.
- Use JSONB GIN indexes only on query paths that are actually hot.
- Partition by time for deterministic retention.
- Keep analytics and rollups away from Scanner OLTP hot paths.

Gap requiring implementation:
- No explicit Scanner Postgres contract exists for a partitioned SBOM/attestation lookup projection that supports direct SQL lookups by payload digest, component PURL/name/version, and merged VEX pending state.
## Target Contract

### 1) Authoritative storage split

- Authoritative blobs:
  - `raw_bom`, canonical SBOM documents, DSSE envelopes, and merged VEX payloads remain in CAS/object storage.
  - PostgreSQL rows reference these via artifact IDs/URIs and store bounded JSONB projections for search.
- Authoritative decisions:
  - Policy decisions remain in their existing modules.

### 2) Hot lookup table

Create a new append-only projection table:

```sql
CREATE TABLE scanner.artifact_boms (
    build_id             TEXT NOT NULL,
    canonical_bom_sha256 TEXT NOT NULL,
    payload_digest       TEXT NOT NULL,
    inserted_at          TIMESTAMPTZ NOT NULL DEFAULT now(),
    raw_bom_ref          TEXT,
    canonical_bom_ref    TEXT,
    dsse_envelope_ref    TEXT,
    merged_vex_ref       TEXT,
    canonical_bom        JSONB,
    merged_vex           JSONB,
    attestations         JSONB,
    evidence_score       INTEGER NOT NULL DEFAULT 0,
    rekor_tile_id        TEXT,
    -- The primary key must include the partition key (inserted_at).
    PRIMARY KEY (build_id, inserted_at)
) PARTITION BY RANGE (inserted_at);
```

Partition policy:
- Monthly range partitions.
- `DEFAULT` partition optional for safety.
- Retention by `DROP TABLE scanner.artifact_boms_YYYY_MM`.

### 3) Index profile

Required:

```sql
-- Exact-match lookup of the latest row for a payload digest.
CREATE INDEX IF NOT EXISTS ix_artifact_boms_payload_digest
    ON scanner.artifact_boms (payload_digest, inserted_at DESC);

CREATE INDEX IF NOT EXISTS ix_artifact_boms_canonical_sha
    ON scanner.artifact_boms (canonical_bom_sha256);

CREATE INDEX IF NOT EXISTS ix_artifact_boms_inserted_at
    ON scanner.artifact_boms (inserted_at DESC);

-- jsonb_path_ops supports the @>, @? and @@ operators.
CREATE INDEX IF NOT EXISTS ix_artifact_boms_canonical_gin
    ON scanner.artifact_boms USING GIN (canonical_bom jsonb_path_ops);

CREATE INDEX IF NOT EXISTS ix_artifact_boms_merged_vex_gin
    ON scanner.artifact_boms USING GIN (merged_vex jsonb_path_ops);
```

Optional partial index for pending triage:

```sql
CREATE INDEX IF NOT EXISTS ix_artifact_boms_pending_vex
    ON scanner.artifact_boms USING GIN (merged_vex jsonb_path_ops)
    WHERE jsonb_path_exists(
        merged_vex,
        '$[*] ? (@.state == "unknown" || @.state == "triage_pending")');
```

Uniqueness guard (optional), created per monthly partition. A unique index on the partitioned parent must include the partition key `inserted_at` as a plain column, and `date_trunc` on a `TIMESTAMPTZ` value is not immutable, so it cannot appear in an index expression. Month-level dedupe is therefore enforced inside each partition:

```sql
-- Repeat for every monthly partition; 2026_02 is an example month.
CREATE UNIQUE INDEX IF NOT EXISTS uq_artifact_boms_2026_02_dedupe
    ON scanner.artifact_boms_2026_02 (canonical_bom_sha256, payload_digest);
```

## Query Contracts

### Latest by payload digest

```sql
SELECT build_id, inserted_at, evidence_score
FROM scanner.artifact_boms
WHERE payload_digest = $1
ORDER BY inserted_at DESC
LIMIT 1;
```

### Component presence by PURL

```sql
-- $1 is the PURL. The @? operator takes a jsonpath literal and cannot
-- reference bind parameters, so the lookup uses containment (@>), which
-- accepts a bound value and can use the jsonb_path_ops GIN index.
SELECT build_id, inserted_at
FROM scanner.artifact_boms
WHERE canonical_bom @> jsonb_build_object(
          'components',
          jsonb_build_array(jsonb_build_object('purl', $1::text)))
ORDER BY inserted_at DESC, build_id ASC
LIMIT 50;
```

### Pending triage extraction

```sql
SELECT build_id,
       inserted_at,
       jsonb_path_query_array(
           merged_vex,
           '$[*] ? (@.state == "unknown" || @.state == "triage_pending")') AS pending
FROM scanner.artifact_boms
WHERE jsonb_path_exists(
          merged_vex,
          '$[*] ? (@.state == "unknown" || @.state == "triage_pending")')
ORDER BY inserted_at DESC, build_id ASC
LIMIT 100;
```

## Implemented API Surface (Scanner WebService)

Base path: `/api/v1/sbom/hot-lookup`

- `GET /payload/{payloadDigest}/latest`
  - Returns latest projection row for digest.
- `GET /components?purl=&limit=&offset=`
  - Component presence search by PURL.
- `GET /components?name=&minVersion=&limit=&offset=`
  - Component presence search by normalized name and optional minimum version.
- `GET /pending-triage?limit=&offset=`
  - Returns rows where merged VEX contains `unknown` or `triage_pending` states and includes extracted pending entries.

Pagination constraints:
- `limit`: `1..200` (defaults: 50 for component searches, 100 for pending triage).
- `offset`: `>= 0`.
- Ordering is deterministic: `inserted_at DESC, build_id ASC`.

## Ingestion Contract

- Projectors should upsert on `(canonical_bom_sha256, payload_digest)` plus partition window.
- `canonical_bom_sha256` must be computed from canonical JSON (stable ordering, UTF-8, deterministic normalization).
- Projection rows must preserve deterministic timestamps (UTC) and stable JSON serialization.
- If inline JSONB exceeds configured size thresholds, keep JSONB minimal and store full content only by CAS reference fields.

## Operations

- Autovacuum enabled per partition; tune by active partition write rate.
- Reindex/repack operations should run per partition, never globally.
- Partition creation job should pre-create at least one future month.
- Retention job should drop old partitions according to policy (for example 90/180/365-day classes by environment).
- Keep analytics workloads on `analytics.*`; export to external columnar systems when query volume exceeds OLTP SLO budgets.

Operational runbook and jobs:
- Runbook: `docs/modules/scanner/operations/sbom-hot-lookup-operations.md`
- SQL snippets: `devops/database/postgres-partitioning/003_scanner_artifact_boms_hot_lookup_jobs.sql`
- Shell jobs:
  - `devops/scripts/scanner-artifact-boms-ensure-partitions.sh`
  - `devops/scripts/scanner-artifact-boms-retention.sh`
- Systemd timers:
  - `devops/scripts/systemd/scanner-artifact-boms-ensure.timer`
  - `devops/scripts/systemd/scanner-artifact-boms-retention.timer`

## Security and Determinism Notes

- Do not store secrets in JSONB payloads.
- DSSE and Rekor references must remain verifiable from immutable CAS/object artifacts.
- Query responses used in policy decisions must preserve stable ordering and deterministic filtering rules.

## Delivery Link

- Implementation sprint: `docs/implplan/SPRINT_20260210_001_DOCS_sbom_attestation_hot_lookup_contract.md`
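
## Illustrative Partition Maintenance Sketch

The Operations section asks for a job that pre-creates at least one future month and a retention job that drops aged-out partitions. The block below is a minimal sketch of both steps, not the content of the referenced job scripts. It assumes the `scanner.artifact_boms_YYYY_MM` naming implied by the retention policy, UTC month boundaries, and a hypothetical 12-month retention window; substitute the retention class that applies to the environment.

```sql
DO $$
DECLARE
    -- First day of the current month, evaluated in UTC.
    this_month date := date_trunc('month', now() AT TIME ZONE 'UTC')::date;
    next_month date;
    after_next date;
    expired    date;
BEGIN
    next_month := (this_month + interval '1 month')::date;
    after_next := (this_month + interval '2 months')::date;
    -- Assumption: 12-month retention; not a value taken from the profile.
    expired    := (this_month - interval '12 months')::date;

    -- Pre-create next month's partition with explicit UTC bounds.
    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS scanner.%I
             PARTITION OF scanner.artifact_boms
             FOR VALUES FROM (%L) TO (%L)',
        'artifact_boms_' || to_char(next_month, 'YYYY_MM'),
        to_char(next_month, 'YYYY-MM-DD') || ' 00:00:00+00',
        to_char(after_next, 'YYYY-MM-DD') || ' 00:00:00+00');

    -- Drop the partition that has just aged out of the retention window.
    EXECUTE format(
        'DROP TABLE IF EXISTS scanner.%I',
        'artifact_boms_' || to_char(expired, 'YYYY_MM'));
END
$$;
```

This sketch drops only the single partition exactly 12 months back, so it relies on running at least once per month; a production job would more likely enumerate every partition older than the cutoff. Note also that if a `DEFAULT` partition already holds rows belonging to the new month, PostgreSQL rejects the `CREATE TABLE ... PARTITION OF` statement until those rows are moved.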