Scanner SBOM and Attestation Hot Lookup Profile
Version: 0.1.0
Status: Draft (Advisory translation)
Last Updated: 2026-02-10
Purpose
Define a Stella-compatible persistence profile for fast SBOM/attestation lookup queries in PostgreSQL without moving full replay/audit payloads out of CAS/object storage.
Scope
This profile covers:
- Hot OLTP lookup rows for digest, component, and pending-triage queries.
- Partitioning, indexing, and retention for lookup tables.
- Ingestion projection contracts from Scanner/Attestor outputs.
This profile does not replace:
- CAS/object storage as source of truth for large immutable payloads.
- Analytics star schema in analytics.*.
- Existing proof bundle and witness contracts.
Current Baseline (confirmed)
- Scanner stores full SBOM artifacts in object storage via ArtifactStorageService and ArtifactObjectKeyBuilder.
- Scanner catalog metadata is stored in PostgreSQL (scanner.artifacts, scanner.links, related tables).
- DSSE and proof metadata already use JSONB where needed (proof_bundle.dsse_envelope, scanner.witnesses.dsse_envelope).
- High-volume partitioning currently exists for time-series style tables (for example EPSS and runtime samples), not for SBOM component lookup projections.
- Component-level hot lookup acceleration is currently driven by the BOM-index sidecar contract.
Advisory Fit Assessment
Aligned with current direction:
- Keep exact-match routing keys narrow and indexed.
- Use JSONB GIN indexes only on query paths that are actually hot.
- Partition by time for deterministic retention.
- Keep analytics and rollups away from Scanner OLTP hot paths.
Gap requiring implementation:
- No explicit Scanner Postgres contract exists for a partitioned SBOM/attestation lookup projection that supports direct SQL lookups by payload digest, component PURL/name/version, and merged VEX pending state.
Target Contract
1) Authoritative storage split
- Authoritative blobs:
  - raw_bom, canonical SBOM documents, DSSE envelopes, and merged VEX payloads remain in CAS/object storage.
  - PostgreSQL rows reference these via artifact IDs/URIs and store bounded JSONB projections for search.
- Authoritative decisions:
  - Policy decisions remain in their existing modules.
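One way to keep the inline JSONB projection bounded is sketched below. The threshold value, the function name, and the choice of retained fields (purl, name, version) are assumptions for illustration; the profile itself only requires that the projection stay bounded and that the full document remain in CAS/object storage.

```python
import json

# Hypothetical threshold; the profile leaves the actual value to configuration.
MAX_INLINE_BYTES = 256 * 1024


def bounded_projection(bom: dict, max_bytes: int = MAX_INLINE_BYTES) -> dict:
    """Return the BOM unchanged if it is small enough, else a minimal search projection."""
    encoded = json.dumps(bom, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    if len(encoded) <= max_bytes:
        return bom
    # Assumption: keep only the component fields the lookup queries filter on.
    return {
        "components": [
            {key: component[key] for key in ("purl", "name", "version") if key in component}
            for component in bom.get("components", [])
        ]
    }
```

The full document stays reachable through canonical_bom_ref either way, so trimming the inline copy loses search detail but not evidence.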
2) Hot lookup table
Create a new append-only projection table:
CREATE TABLE scanner.artifact_boms (
build_id TEXT NOT NULL,
canonical_bom_sha256 TEXT NOT NULL,
payload_digest TEXT NOT NULL,
inserted_at TIMESTAMPTZ NOT NULL DEFAULT now(),
raw_bom_ref TEXT,
canonical_bom_ref TEXT,
dsse_envelope_ref TEXT,
merged_vex_ref TEXT,
canonical_bom JSONB,
merged_vex JSONB,
attestations JSONB,
evidence_score INTEGER NOT NULL DEFAULT 0,
rekor_tile_id TEXT,
PRIMARY KEY (build_id, inserted_at)
) PARTITION BY RANGE (inserted_at);
Partition policy:
- Monthly range partitions.
- DEFAULT partition optional for safety.
- Retention by DROP TABLE scanner.artifact_boms_YYYY_MM.
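The monthly policy above could be driven by a small generator such as the following sketch. The function names are hypothetical, and expressing the bounds as explicit UTC timestamps is an assumption made so the boundaries do not depend on the session time zone; the repository's own partition job may differ.

```python
from datetime import date


def month_bounds(year: int, month: int) -> tuple:
    """Return (start, end) for a calendar month; end is the first day of the next month."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


def partition_ddl(year: int, month: int) -> str:
    """Build DDL for one monthly partition named artifact_boms_YYYY_MM."""
    start, end = month_bounds(year, month)
    name = f"scanner.artifact_boms_{year:04d}_{month:02d}"
    # Range bounds are inclusive at FROM and exclusive at TO.
    return (
        f"CREATE TABLE IF NOT EXISTS {name} "
        f"PARTITION OF scanner.artifact_boms "
        f"FOR VALUES FROM ('{start.isoformat()} 00:00:00+00') "
        f"TO ('{end.isoformat()} 00:00:00+00');"
    )


print(partition_ddl(2026, 12))
```

Because the upper bound is exclusive, adjacent months share a boundary value without overlapping.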
3) Index profile
Required:
CREATE INDEX IF NOT EXISTS ix_artifact_boms_payload_digest
ON scanner.artifact_boms (payload_digest, inserted_at DESC);
CREATE INDEX IF NOT EXISTS ix_artifact_boms_canonical_sha
ON scanner.artifact_boms (canonical_bom_sha256);
CREATE INDEX IF NOT EXISTS ix_artifact_boms_inserted_at
ON scanner.artifact_boms (inserted_at DESC);
CREATE INDEX IF NOT EXISTS ix_artifact_boms_canonical_gin
ON scanner.artifact_boms USING GIN (canonical_bom jsonb_path_ops);
CREATE INDEX IF NOT EXISTS ix_artifact_boms_merged_vex_gin
ON scanner.artifact_boms USING GIN (merged_vex jsonb_path_ops);
Optional partial index for pending triage:
CREATE INDEX IF NOT EXISTS ix_artifact_boms_pending_vex
ON scanner.artifact_boms USING GIN (merged_vex jsonb_path_ops)
WHERE jsonb_path_exists(merged_vex, '$[*] ? (@.state == "unknown" || @.state == "triage_pending")');
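Uniqueness guard (optional, created per monthly partition):
-- date_trunc on TIMESTAMPTZ is not IMMUTABLE, and a unique index on the partitioned parent must include the bare partition key,
-- so this guard is assumed to be created on each partition instead (YYYY_MM is a placeholder).
CREATE UNIQUE INDEX IF NOT EXISTS uq_artifact_boms_YYYY_MM_dedupe
ON scanner.artifact_boms_YYYY_MM (canonical_bom_sha256, payload_digest);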
Query Contracts
Latest by payload digest
SELECT build_id, inserted_at, evidence_score
FROM scanner.artifact_boms
WHERE payload_digest = $1
ORDER BY inserted_at DESC
LIMIT 1;
Component presence by PURL
SELECT build_id, inserted_at
FROM scanner.artifact_boms
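-- Containment form: the @? operator cannot bind jsonpath variables such as $purl, and @> is also served by the jsonb_path_ops GIN index.
WHERE canonical_bom @> jsonb_build_object('components', jsonb_build_array(jsonb_build_object('purl', $1::text)))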
ORDER BY inserted_at DESC
LIMIT 50;
Pending triage extraction
SELECT build_id, inserted_at,
jsonb_path_query_array(merged_vex, '$[*] ? (@.state == "unknown" || @.state == "triage_pending")') AS pending
FROM scanner.artifact_boms
WHERE jsonb_path_exists(merged_vex, '$[*] ? (@.state == "unknown" || @.state == "triage_pending")')
ORDER BY inserted_at DESC
LIMIT 100;
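For unit tests that run without a database, the jsonpath filter in the query above can be mirrored in application code. This is only an approximation of PostgreSQL's lax-mode semantics (a non-array top-level value is treated as a one-element array), and the function name is hypothetical.

```python
PENDING_STATES = {"unknown", "triage_pending"}


def pending_entries(merged_vex):
    """Return top-level entries whose 'state' is unknown or triage_pending."""
    if merged_vex is None:
        return []  # SQL NULL: the row would not match the WHERE clause
    # Lax jsonpath wraps a non-array value before applying [*].
    items = merged_vex if isinstance(merged_vex, list) else [merged_vex]
    return [
        entry for entry in items
        if isinstance(entry, dict) and entry.get("state") in PENDING_STATES
    ]
```

A row belongs in the pending-triage result exactly when this function returns a non-empty list for its merged_vex value.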
Implemented API Surface (Scanner WebService)
Base path: /api/v1/sbom/hot-lookup
- GET /payload/{payloadDigest}/latest
  - Returns latest projection row for digest.
- GET /components?purl=<purl>&limit=<n>&offset=<n>
  - Component presence search by PURL.
- GET /components?name=<name>&minVersion=<version>&limit=<n>&offset=<n>
  - Component presence search by normalized name and optional minimum version.
- GET /pending-triage?limit=<n>&offset=<n>
  - Returns rows where merged VEX contains unknown or triage_pending states and includes extracted pending entries.
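A client might build request paths for these endpoints as in the sketch below. The helper names are hypothetical, and percent-encoding the digest and PURL is an assumption about what the service accepts; only the base path and parameter names come from the list above.

```python
from urllib.parse import quote, urlencode

BASE_PATH = "/api/v1/sbom/hot-lookup"


def latest_by_digest_path(payload_digest: str) -> str:
    """Path for the latest projection row of a payload digest."""
    # safe="" also encodes ':' and '/' so the digest stays a single path segment.
    return f"{BASE_PATH}/payload/{quote(payload_digest, safe='')}/latest"


def components_by_purl_path(purl: str, limit: int = 50, offset: int = 0) -> str:
    """Path and query string for a component presence search by PURL."""
    query = urlencode({"purl": purl, "limit": limit, "offset": offset})
    return f"{BASE_PATH}/components?{query}"


print(latest_by_digest_path("sha256:abc"))
print(components_by_purl_path("pkg:npm/lodash@4.17.21"))
```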
Pagination constraints:
- limit: 1..200 (defaults: 50 for component searches, 100 for pending triage).
- offset: >= 0.
- Ordering is deterministic: inserted_at DESC, build_id ASC.
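The constraints above can be sketched as a small validator plus a deterministic sort. Rejecting out-of-range values is an assumption here (the service may clamp instead), and the names Page, parse_page, and order_rows are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

MAX_LIMIT = 200


@dataclass(frozen=True)
class Page:
    limit: int
    offset: int


def parse_page(limit: Optional[int], offset: Optional[int], default_limit: int) -> Page:
    """Apply defaults, then reject limits outside 1..200 and negative offsets."""
    resolved_limit = default_limit if limit is None else limit
    resolved_offset = 0 if offset is None else offset
    if not 1 <= resolved_limit <= MAX_LIMIT:
        raise ValueError(f"limit must be in 1..{MAX_LIMIT}")
    if resolved_offset < 0:
        raise ValueError("offset must be >= 0")
    return Page(resolved_limit, resolved_offset)


def order_rows(rows: list) -> list:
    """Order by inserted_at DESC, then build_id ASC, using two stable sorts."""
    by_build = sorted(rows, key=lambda row: row["build_id"])
    return sorted(by_build, key=lambda row: row["inserted_at"], reverse=True)
```

The build_id tie-breaker matters because offset pagination only yields stable pages when rows sharing an inserted_at value always come back in the same order.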
Ingestion Contract
- Projectors should upsert on (canonical_bom_sha256, payload_digest) plus partition window.
- canonical_bom_sha256 must be computed from canonical JSON (stable ordering, UTF-8, deterministic normalization).
- Projection rows must preserve deterministic timestamps (UTC) and stable JSON serialization.
- If inline JSONB exceeds configured size thresholds, keep JSONB minimal and store full content only by CAS reference fields.
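A minimal sketch of the digest computation follows, assuming that "canonical JSON" means sorted object keys, no insignificant whitespace, and UTF-8 output. The project's actual canonicalization rules may be stricter (number formatting and Unicode normalization are not addressed here), so treat the helper names and the exact scheme as assumptions.

```python
import hashlib
import json


def canonical_json_bytes(document: object) -> bytes:
    """Serialize with sorted keys, compact separators, and UTF-8 encoding."""
    text = json.dumps(document, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_bom_sha256(document: object) -> str:
    """Hex SHA-256 over the canonical byte form of the document."""
    return hashlib.sha256(canonical_json_bytes(document)).hexdigest()
```

Two producers that emit the same BOM with different key order then arrive at the same canonical_bom_sha256, which is what makes the upsert key usable for dedupe.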
Operations
- Autovacuum enabled per partition; tune by active partition write rate.
- Reindex/repack operations should run per partition, never globally.
- Partition creation job should pre-create at least one future month.
- Retention job should drop old partitions according to policy (for example 90/180/365-day classes by environment).
- Keep analytics workloads on analytics.*; export to external columnar systems when query volume exceeds OLTP SLO budgets.
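The retention rule can be illustrated with a sketch that selects which monthly partitions are old enough to drop. It assumes the artifact_boms_YYYY_MM naming shown earlier and drops a partition only when its whole month lies before the cutoff; the function name is hypothetical and the repository's retention job may apply a different rule.

```python
import re
from datetime import date, timedelta

PARTITION_RE = re.compile(r"^artifact_boms_(\d{4})_(0[1-9]|1[0-2])$")


def partitions_to_drop(names, today: date, retention_days: int) -> list:
    """Return partitions whose entire month ends on or before the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in sorted(names):
        match = PARTITION_RE.match(name)
        if not match:
            continue  # for example a DEFAULT partition, which is never dropped here
        year, month = int(match.group(1)), int(match.group(2))
        next_month = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        if next_month <= cutoff:
            expired.append(name)
    return expired
```

Each returned name would then feed a DROP TABLE statement, which removes a whole month of rows without row-level deletes or vacuum pressure.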
Operational runbook and jobs:
- Runbook: docs/modules/scanner/operations/sbom-hot-lookup-operations.md
- SQL snippets: devops/database/postgres-partitioning/003_scanner_artifact_boms_hot_lookup_jobs.sql
- Shell jobs:
  - devops/scripts/scanner-artifact-boms-ensure-partitions.sh
  - devops/scripts/scanner-artifact-boms-retention.sh
- Systemd timers:
  - devops/scripts/systemd/scanner-artifact-boms-ensure.timer
  - devops/scripts/systemd/scanner-artifact-boms-retention.timer
Security and Determinism Notes
- Do not store secrets in JSONB payloads.
- DSSE and Rekor references must remain verifiable from immutable CAS/object artifacts.
- Query responses used in policy decisions must preserve stable ordering and deterministic filtering rules.
Delivery Link
- Implementation sprint: docs/implplan/SPRINT_20260210_001_DOCS_sbom_attestation_hot_lookup_contract.md