# Scanner SBOM and Attestation Hot Lookup Profile

Version: 0.1.0
Status: Draft (Advisory translation)
Last Updated: 2026-02-10

## Purpose

Define a Stella-compatible persistence profile for fast SBOM/attestation lookup queries in PostgreSQL without moving full replay/audit payloads out of CAS/object storage.

## Scope

This profile covers:
- Hot OLTP lookup rows for digest, component, and pending-triage queries.
- Partitioning, indexing, and retention for lookup tables.
- Ingestion projection contracts from Scanner/Attestor outputs.

This profile does not replace:
- CAS/object storage as source of truth for large immutable payloads.
- Analytics star schema in `analytics.*`.
- Existing proof bundle and witness contracts.

## Current Baseline (confirmed)

- Scanner stores full SBOM artifacts in object storage via `ArtifactStorageService` and `ArtifactObjectKeyBuilder`.
- Scanner catalog metadata is stored in PostgreSQL (`scanner.artifacts`, `scanner.links`, related tables).
- DSSE and proof metadata already use JSONB where needed (`proof_bundle.dsse_envelope`, `scanner.witnesses.dsse_envelope`).
- High-volume partitioning currently exists for time-series style tables (for example EPSS and runtime samples), not for SBOM component lookup projections.
- Component-level hot lookup acceleration is currently driven by the BOM-index sidecar contract.

## Advisory Fit Assessment

Aligned with current direction:
- Keep exact-match routing keys narrow and indexed.
- Use JSONB GIN indexes only on query paths that are actually hot.
- Partition by time for deterministic retention.
- Keep analytics and rollups away from Scanner OLTP hot paths.

Gap requiring implementation:
- No explicit Scanner Postgres contract exists for a partitioned SBOM/attestation lookup projection that supports direct SQL lookups by payload digest, component PURL/name/version, and merged VEX pending state.

## Target Contract

### 1) Authoritative storage split

- Authoritative blobs:
  - `raw_bom`, canonical SBOM documents, DSSE envelopes, and merged VEX payloads remain in CAS/object storage.
  - PostgreSQL rows reference these via artifact IDs/URIs and store bounded JSONB projections for search.
- Authoritative decisions:
  - Policy decisions remain in their existing modules.

### 2) Hot lookup table

Create a new append-only projection table:

```sql
CREATE TABLE scanner.artifact_boms (
    -- Routing keys
    build_id TEXT NOT NULL,
    canonical_bom_sha256 TEXT NOT NULL,
    payload_digest TEXT NOT NULL,
    inserted_at TIMESTAMPTZ NOT NULL DEFAULT now(),

    -- References to authoritative payloads in CAS/object storage
    raw_bom_ref TEXT,
    canonical_bom_ref TEXT,
    dsse_envelope_ref TEXT,
    merged_vex_ref TEXT,

    -- Bounded JSONB projections for search
    canonical_bom JSONB,
    merged_vex JSONB,
    attestations JSONB,

    evidence_score INTEGER NOT NULL DEFAULT 0,
    rekor_tile_id TEXT,

    -- The partition key (inserted_at) must be part of the primary key
    PRIMARY KEY (build_id, inserted_at)
) PARTITION BY RANGE (inserted_at);
```

Partition policy:
- Monthly range partitions on `inserted_at`.
- A `DEFAULT` partition is optional, as a safety net for rows that fall outside every defined range.
- Retention is enforced by dropping whole partitions: `DROP TABLE scanner.artifact_boms_YYYY_MM`.

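The monthly naming and range rules above can be sketched as a small generator for partition DDL. This is an illustrative sketch only, not the project's partition job: the helper names `month_bounds` and `partition_ddl` are hypothetical, Python is used purely for illustration, and the date literals assume the session time zone is UTC when PostgreSQL casts them to `TIMESTAMPTZ` bounds.

```python
from datetime import date


def month_bounds(year: int, month: int) -> tuple[date, date]:
    """Return the [start, end) dates covering one calendar month."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


def partition_ddl(year: int, month: int) -> str:
    """Build the CREATE TABLE statement for one monthly partition."""
    start, end = month_bounds(year, month)
    # Name follows the scanner.artifact_boms_YYYY_MM pattern used for retention.
    name = f"scanner.artifact_boms_{year:04d}_{month:02d}"
    return (
        f"CREATE TABLE IF NOT EXISTS {name} "
        f"PARTITION OF scanner.artifact_boms "
        f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
    )


print(partition_ddl(2026, 2))
```

The upper bound is exclusive, so adjacent months share a boundary date without overlapping.
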
### 3) Index profile

Required:

```sql
CREATE INDEX IF NOT EXISTS ix_artifact_boms_payload_digest
    ON scanner.artifact_boms (payload_digest, inserted_at DESC);

CREATE INDEX IF NOT EXISTS ix_artifact_boms_canonical_sha
    ON scanner.artifact_boms (canonical_bom_sha256);

CREATE INDEX IF NOT EXISTS ix_artifact_boms_inserted_at
    ON scanner.artifact_boms (inserted_at DESC);

CREATE INDEX IF NOT EXISTS ix_artifact_boms_canonical_gin
    ON scanner.artifact_boms USING GIN (canonical_bom jsonb_path_ops);

CREATE INDEX IF NOT EXISTS ix_artifact_boms_merged_vex_gin
    ON scanner.artifact_boms USING GIN (merged_vex jsonb_path_ops);
```

Optional partial index for pending triage:

```sql
CREATE INDEX IF NOT EXISTS ix_artifact_boms_pending_vex
    ON scanner.artifact_boms USING GIN (merged_vex jsonb_path_ops)
    WHERE jsonb_path_exists(merged_vex, '$[*] ? (@.state == "unknown" || @.state == "triage_pending")');
```

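Uniqueness guard (optional):

PostgreSQL requires a unique index on a partitioned table to include the partition key as a plain column, and `date_trunc('month', inserted_at)` on a `TIMESTAMPTZ` is not immutable, so it cannot appear in an index expression. With monthly partitions, a unique index created on each partition gives the same monthly dedupe window. The partition name below is an example; repeat the statement for each partition.

```sql
CREATE UNIQUE INDEX IF NOT EXISTS uq_artifact_boms_2026_02_dedupe
    ON scanner.artifact_boms_2026_02 (canonical_bom_sha256, payload_digest);
```
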
## Query Contracts

### Latest by payload digest

```sql
SELECT build_id, inserted_at, evidence_score
FROM scanner.artifact_boms
WHERE payload_digest = $1
ORDER BY inserted_at DESC
LIMIT 1;
```

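### Component presence by PURL

The `@?` operator takes a jsonpath with no variable bindings, so a `$purl` variable inside the path cannot be resolved. The parameterized form below uses JSONB containment instead, which the `jsonb_path_ops` GIN index also supports.

```sql
SELECT build_id, inserted_at
FROM scanner.artifact_boms
WHERE canonical_bom @> jsonb_build_object(
        'components', jsonb_build_array(jsonb_build_object('purl', $1::text)))
ORDER BY inserted_at DESC, build_id ASC
LIMIT 50;
```
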
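A PURL lookup can also be expressed as JSONB containment with the whole match document bound as a single parameter, for example `canonical_bom @> $1::jsonb`. The sketch below builds that parameter on the client side. It assumes the projection keeps a `components` array whose elements carry a `purl` field, as the query above implies; the helper name `purl_containment_param` is hypothetical.

```python
import json


def purl_containment_param(purl: str) -> str:
    """JSON text to bind as $1 in `canonical_bom @> $1::jsonb`."""
    # json.dumps escapes quotes and backslashes, so the PURL is never
    # spliced into SQL or jsonpath text by hand.
    return json.dumps({"components": [{"purl": purl}]}, separators=(",", ":"))


print(purl_containment_param("pkg:npm/lodash@4.17.21"))
```

Binding the document as a parameter keeps the statement text constant across lookups.
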
### Pending triage extraction

```sql
SELECT build_id, inserted_at,
       jsonb_path_query_array(merged_vex, '$[*] ? (@.state == "unknown" || @.state == "triage_pending")') AS pending
FROM scanner.artifact_boms
WHERE jsonb_path_exists(merged_vex, '$[*] ? (@.state == "unknown" || @.state == "triage_pending")')
ORDER BY inserted_at DESC, build_id ASC
LIMIT 100;
```

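To make the jsonpath filter concrete, the following sketch reproduces its selection logic on an in-memory value. It assumes `merged_vex` is a top-level JSON array of objects with a `state` field, which is what the path `$[*] ? (@.state == ...)` suggests; the `id` field and the sample entries are placeholders, not a documented schema.

```python
PENDING_STATES = {"unknown", "triage_pending"}


def pending_entries(merged_vex: list) -> list:
    """Return the array elements whose `state` is pending, in original order."""
    return [
        entry
        for entry in merged_vex
        if isinstance(entry, dict) and entry.get("state") in PENDING_STATES
    ]


# Placeholder data for illustration only.
sample = [
    {"id": "example-1", "state": "not_affected"},
    {"id": "example-2", "state": "unknown"},
    {"id": "example-3", "state": "triage_pending"},
]
print([entry["id"] for entry in pending_entries(sample)])
```

A row qualifies for the query when this list is non-empty, and the `pending` column corresponds to the list itself.
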
## Implemented API Surface (Scanner WebService)

Base path: `/api/v1/sbom/hot-lookup`

- `GET /payload/{payloadDigest}/latest`
  - Returns latest projection row for digest.
- `GET /components?purl=<purl>&limit=<n>&offset=<n>`
  - Component presence search by PURL.
- `GET /components?name=<name>&minVersion=<version>&limit=<n>&offset=<n>`
  - Component presence search by normalized name and optional minimum version.
- `GET /pending-triage?limit=<n>&offset=<n>`
  - Returns rows where merged VEX contains `unknown` or `triage_pending` states and includes extracted pending entries.

Pagination constraints:
- `limit`: `1..200` (defaults: 50 for component searches, 100 for pending triage).
- `offset`: `>= 0`.
- Ordering is deterministic: `inserted_at DESC, build_id ASC`.

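The pagination rules can be sketched from a client's point of view as follows. This is a hedged illustration, not the service's implementation: the helper names are hypothetical, a missing `offset` is assumed to default to 0, and out-of-range values are assumed to be rejected rather than clamped, which the profile does not state.

```python
from urllib.parse import urlencode

BASE_PATH = "/api/v1/sbom/hot-lookup"
MAX_LIMIT = 200


def validate_page(limit, offset, default_limit):
    """Apply the documented bounds: limit in 1..200, offset >= 0."""
    limit = default_limit if limit is None else limit
    offset = 0 if offset is None else offset  # assumed default
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be in 1..{MAX_LIMIT}")
    if offset < 0:
        raise ValueError("offset must be >= 0")
    return limit, offset


def components_by_purl_path(purl, limit=None, offset=None):
    """Build the request path for a PURL search, percent-encoding the PURL."""
    limit, offset = validate_page(limit, offset, default_limit=50)
    query = urlencode({"purl": purl, "limit": limit, "offset": offset})
    return f"{BASE_PATH}/components?{query}"


def sort_rows(rows):
    """Order rows by inserted_at DESC, then build_id ASC."""
    # Two stable sorts: the secondary key first, then the primary key.
    by_build = sorted(rows, key=lambda r: r["build_id"])
    return sorted(by_build, key=lambda r: r["inserted_at"], reverse=True)


print(components_by_purl_path("pkg:npm/lodash@4.17.21"))
```

The `build_id` tie-breaker matters for offset pagination: without it, rows sharing an `inserted_at` value could move between pages from one request to the next.
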
## Ingestion Contract

- Projectors should upsert on `(canonical_bom_sha256, payload_digest)` plus partition window.
- `canonical_bom_sha256` must be computed from canonical JSON (stable ordering, UTF-8, deterministic normalization).
- Projection rows must preserve deterministic timestamps (UTC) and stable JSON serialization.
- If inline JSONB exceeds configured size thresholds, keep JSONB minimal and store full content only by CAS reference fields.

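As a rough illustration of the hashing and size-threshold rules, the sketch below canonicalizes a document with sorted keys and compact separators before hashing. The profile does not name a canonicalization scheme, so this particular serialization is an assumption and is not a full RFC 8785 implementation; `MAX_INLINE_BYTES` and the helper names are hypothetical, and "keep JSONB minimal" is simplified here to storing no inline value at all.

```python
import hashlib
import json

MAX_INLINE_BYTES = 256 * 1024  # hypothetical threshold; the profile leaves it configurable


def canonical_bytes(document) -> bytes:
    """Serialize with sorted keys, no insignificant whitespace, UTF-8."""
    text = json.dumps(document, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_bom_sha256(document) -> str:
    """Hex SHA-256 of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(document)).hexdigest()


def inline_projection(document, ref: str):
    """Return (jsonb_value, ref); drop the inline value when it is too large."""
    if len(canonical_bytes(document)) > MAX_INLINE_BYTES:
        return None, ref  # full content stays reachable through the CAS reference
    return document, ref


# Key order in the input does not change the digest.
first = {"b": 1, "a": [{"y": 2, "x": "é"}]}
second = {"a": [{"x": "é", "y": 2}], "b": 1}
print(canonical_bom_sha256(first) == canonical_bom_sha256(second))
```

Whatever scheme is chosen, every projector has to use the same one, since the digest takes part in the upsert key.
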
## Operations

- Autovacuum enabled per partition; tune by active partition write rate.
- Reindex/repack operations should run per partition, never globally.
- Partition creation job should pre-create at least one future month.
- Retention job should drop old partitions according to policy (for example 90/180/365-day classes by environment).
- Keep analytics workloads on `analytics.*`; export to external columnar systems when query volume exceeds OLTP SLO budgets.

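The retention rule can be sketched as a selection over partition names: a partition is droppable only when its entire month lies before the retention cutoff. This is an assumption-laden sketch rather than the shipped job; the helper names are hypothetical, and it relies on partitions being named `scanner.artifact_boms_YYYY_MM`.

```python
from datetime import date, timedelta


def partition_month_end(name: str) -> date:
    """Exclusive upper bound of a partition named ..._YYYY_MM."""
    year, month = (int(part) for part in name.rsplit("_", 2)[1:])
    return date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)


def droppable(partitions, today: date, retention_days: int):
    """Partitions whose whole range is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(p for p in partitions if partition_month_end(p) <= cutoff)


def drop_statements(partitions, today: date, retention_days: int):
    """DROP TABLE statements for every droppable partition."""
    return [f"DROP TABLE {p};" for p in droppable(partitions, today, retention_days)]


existing = [
    "scanner.artifact_boms_2025_10",
    "scanner.artifact_boms_2025_11",
    "scanner.artifact_boms_2025_12",
    "scanner.artifact_boms_2026_01",
]
print(drop_statements(existing, date(2026, 2, 10), 90))
```

Comparing against the partition's upper bound, not its lower bound, avoids dropping a month that still holds rows inside the retention window.
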
Operational runbook and jobs:
- Runbook: `docs/modules/scanner/operations/sbom-hot-lookup-operations.md`
- SQL snippets: `devops/database/postgres-partitioning/003_scanner_artifact_boms_hot_lookup_jobs.sql`
- Shell jobs:
  - `devops/scripts/scanner-artifact-boms-ensure-partitions.sh`
  - `devops/scripts/scanner-artifact-boms-retention.sh`
- Systemd timers:
  - `devops/scripts/systemd/scanner-artifact-boms-ensure.timer`
  - `devops/scripts/systemd/scanner-artifact-boms-retention.timer`

## Security and Determinism Notes

- Do not store secrets in JSONB payloads.
- DSSE and Rekor references must remain verifiable from immutable CAS/object artifacts.
- Query responses used in policy decisions must preserve stable ordering and deterministic filtering rules.

## Delivery Link

- Implementation sprint: `docs/implplan/SPRINT_20260210_001_DOCS_sbom_attestation_hot_lookup_contract.md`