2026-02-11 01:32:14 +02:00


# Scanner SBOM and Attestation Hot Lookup Profile
Version: 0.1.0

Status: Draft (Advisory translation)

Last Updated: 2026-02-10
## Purpose
Define a Stella-compatible persistence profile for fast SBOM/attestation lookup queries in PostgreSQL without moving full replay/audit payloads out of CAS/object storage.
## Scope
This profile covers:
- Hot OLTP lookup rows for digest, component, and pending-triage queries.
- Partitioning, indexing, and retention for lookup tables.
- Ingestion projection contracts from Scanner/Attestor outputs.
This profile does not replace:
- CAS/object storage as source of truth for large immutable payloads.
- Analytics star schema in `analytics.*`.
- Existing proof bundle and witness contracts.
## Current Baseline (confirmed)
- Scanner stores full SBOM artifacts in object storage via `ArtifactStorageService` and `ArtifactObjectKeyBuilder`.
- Scanner catalog metadata is stored in PostgreSQL (`scanner.artifacts`, `scanner.links`, related tables).
- DSSE and proof metadata already use JSONB where needed (`proof_bundle.dsse_envelope`, `scanner.witnesses.dsse_envelope`).
- High-volume partitioning currently exists for time-series style tables (for example EPSS and runtime samples), not for SBOM component lookup projections.
- Component-level hot lookup acceleration is currently driven by the BOM-index sidecar contract.
## Advisory Fit Assessment
Aligned with current direction:
- Keep exact-match routing keys narrow and indexed.
- Use JSONB GIN indexes only on query paths that are actually hot.
- Partition by time for deterministic retention.
- Keep analytics and rollups away from Scanner OLTP hot paths.
Gap requiring implementation:
- No explicit Scanner Postgres contract exists for a partitioned SBOM/attestation lookup projection that supports direct SQL lookups by payload digest, component PURL/name/version, and merged VEX pending state.
## Target Contract
### 1) Authoritative storage split
- Authoritative blobs:
  - `raw_bom`, canonical SBOM documents, DSSE envelopes, and merged VEX payloads remain in CAS/object storage.
  - PostgreSQL rows reference these via artifact IDs/URIs and store bounded JSONB projections for search.
- Authoritative decisions:
  - Policy decisions remain in their existing modules.
### 2) Hot lookup table
Create a new append-only projection table:
```sql
CREATE TABLE scanner.artifact_boms (
    build_id             TEXT        NOT NULL,
    canonical_bom_sha256 TEXT        NOT NULL,
    payload_digest       TEXT        NOT NULL,
    inserted_at          TIMESTAMPTZ NOT NULL DEFAULT now(),
    raw_bom_ref          TEXT,
    canonical_bom_ref    TEXT,
    dsse_envelope_ref    TEXT,
    merged_vex_ref       TEXT,
    canonical_bom        JSONB,
    merged_vex           JSONB,
    attestations         JSONB,
    evidence_score       INTEGER     NOT NULL DEFAULT 0,
    rekor_tile_id        TEXT,
    PRIMARY KEY (build_id, inserted_at)
) PARTITION BY RANGE (inserted_at);
```
Partition policy:
- Monthly range partitions.
- An optional `DEFAULT` partition catches rows whose `inserted_at` falls outside every pre-created range.
- Retention by `DROP TABLE scanner.artifact_boms_YYYY_MM`.
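The monthly policy and the operations rule to pre-create at least one future month can be sketched as a small Python generator of partition DDL. This is a minimal sketch under stated assumptions, not the repository's ensure-partitions job. The helper names are hypothetical. Pinning the range bounds to an explicit `+00` (UTC) offset is also an assumption, made so partition boundaries do not shift with the session `TimeZone`.

```python
from datetime import date

PARENT = "scanner.artifact_boms"


def next_month(year: int, month: int) -> tuple[int, int]:
    """Return (year, month) of the following calendar month."""
    return (year + 1, 1) if month == 12 else (year, month + 1)


def partition_name(year: int, month: int) -> str:
    # Hypothetical helper following the scanner.artifact_boms_YYYY_MM naming.
    return f"{PARENT}_{year:04d}_{month:02d}"


def partition_ddl(year: int, month: int) -> str:
    """DDL for one monthly range partition: [first of month, first of next month)."""
    ny, nm = next_month(year, month)
    start = date(year, month, 1).isoformat()
    end = date(ny, nm, 1).isoformat()
    # Explicit +00 offsets (an assumption) keep bounds independent of the session TimeZone.
    return (
        f"CREATE TABLE IF NOT EXISTS {partition_name(year, month)} "
        f"PARTITION OF {PARENT} "
        f"FOR VALUES FROM ('{start} 00:00:00+00') TO ('{end} 00:00:00+00');"
    )


def ensure_partitions_ddl(today: date, months_ahead: int = 1) -> list[str]:
    """Hypothetical job body: DDL for the current month plus `months_ahead` future months."""
    statements = []
    year, month = today.year, today.month
    for _ in range(months_ahead + 1):
        statements.append(partition_ddl(year, month))
        year, month = next_month(year, month)
    return statements
```

Every statement uses `IF NOT EXISTS`, so re-running the generated DDL is idempotent, which suits a daily timer.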
### 3) Index profile
Required:
```sql
CREATE INDEX IF NOT EXISTS ix_artifact_boms_payload_digest
    ON scanner.artifact_boms (payload_digest, inserted_at DESC);
CREATE INDEX IF NOT EXISTS ix_artifact_boms_canonical_sha
    ON scanner.artifact_boms (canonical_bom_sha256);
CREATE INDEX IF NOT EXISTS ix_artifact_boms_inserted_at
    ON scanner.artifact_boms (inserted_at DESC);
CREATE INDEX IF NOT EXISTS ix_artifact_boms_canonical_gin
    ON scanner.artifact_boms USING GIN (canonical_bom jsonb_path_ops);
CREATE INDEX IF NOT EXISTS ix_artifact_boms_merged_vex_gin
    ON scanner.artifact_boms USING GIN (merged_vex jsonb_path_ops);
```
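Optional partial index for pending triage. A `jsonb_path_ops` GIN index serves the `@>` and `@?` operators. It cannot serve a `jsonb_path_exists()` call or an `ORDER BY`. A partial B-tree on the pending-triage ordering columns therefore matches that query more directly:
```sql
CREATE INDEX IF NOT EXISTS ix_artifact_boms_pending_vex
    ON scanner.artifact_boms (inserted_at DESC, build_id)
    WHERE jsonb_path_exists(merged_vex, '$[*] ? (@.state == "unknown" || @.state == "triage_pending")');
```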
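Uniqueness guard (optional). A unique index on the partitioned parent must include `inserted_at` as a plain column, and `date_trunc()` on `timestamptz` is not immutable. The monthly dedupe is therefore expressed as a unique index on each monthly partition. Because partitions are monthly, uniqueness within a partition is uniqueness within the month:
```sql
-- Create on every monthly partition (example: February 2026).
CREATE UNIQUE INDEX IF NOT EXISTS uq_artifact_boms_2026_02_dedupe
    ON scanner.artifact_boms_2026_02 (canonical_bom_sha256, payload_digest);
```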
## Query Contracts
### Latest by payload digest
```sql
SELECT build_id, inserted_at, evidence_score
FROM scanner.artifact_boms
WHERE payload_digest = $1
ORDER BY inserted_at DESC, build_id ASC
LIMIT 1;
```
### Component presence by PURL
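A jsonpath variable such as `$purl` cannot be bound through the `@?` operator. The PURL is therefore passed as a bound parameter inside a containment pattern, which the `jsonb_path_ops` GIN index also serves:
```sql
SELECT build_id, inserted_at
FROM scanner.artifact_boms
WHERE canonical_bom @> jsonb_build_object(
          'components', jsonb_build_array(jsonb_build_object('purl', $1::text)))
ORDER BY inserted_at DESC, build_id ASC
LIMIT 50;
```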
### Pending triage extraction
```sql
SELECT build_id, inserted_at,
       jsonb_path_query_array(merged_vex, '$[*] ? (@.state == "unknown" || @.state == "triage_pending")') AS pending
FROM scanner.artifact_boms
WHERE jsonb_path_exists(merged_vex, '$[*] ? (@.state == "unknown" || @.state == "triage_pending")')
ORDER BY inserted_at DESC, build_id ASC
LIMIT 100;
```
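Callers that re-check or post-filter rows, such as projector unit tests, need the same definition of "pending" as the SQL above. Below is a minimal Python approximation of that jsonpath filter. It assumes `merged_vex` decodes to a list of VEX statement objects with a string `state` field. The VEX payload shape is not specified in this profile, and `pending_entries` is a hypothetical helper.

```python
from typing import Any

# States treated as unresolved by the pending-triage query.
PENDING_STATES = frozenset({"unknown", "triage_pending"})


def pending_entries(merged_vex: Any) -> list:
    """Approximate '$[*] ? (@.state == "unknown" || @.state == "triage_pending")'.

    Hypothetical helper. Assumes merged_vex is the decoded JSONB value, normally
    a list of VEX statement objects with a string "state". A lone object is
    treated as a one-element array, as lax-mode jsonpath does for `[*]`;
    array-valued states are not unwrapped.
    """
    if merged_vex is None:  # SQL NULL: no pending entries
        return []
    items = merged_vex if isinstance(merged_vex, list) else [merged_vex]
    return [
        item
        for item in items
        if isinstance(item, dict)
        and isinstance(item.get("state"), str)
        and item["state"] in PENDING_STATES
    ]
```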
## Implemented API Surface (Scanner WebService)
Base path: `/api/v1/sbom/hot-lookup`
- `GET /payload/{payloadDigest}/latest`
  - Returns latest projection row for digest.
- `GET /components?purl=<purl>&limit=<n>&offset=<n>`
  - Component presence search by PURL.
- `GET /components?name=<name>&minVersion=<version>&limit=<n>&offset=<n>`
  - Component presence search by normalized name and optional minimum version.
- `GET /pending-triage?limit=<n>&offset=<n>`
  - Returns rows where merged VEX contains `unknown` or `triage_pending` states and includes extracted pending entries.
Pagination constraints:
- `limit`: `1..200` (defaults: 50 for component searches, 100 for pending triage).
- `offset`: `>= 0`.
- Ordering is deterministic: `inserted_at DESC, build_id ASC`.
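The pagination rules can be sketched as a validator plus an in-memory ordering that reproduces `inserted_at DESC, build_id ASC`. This is a hedged Python sketch, and the endpoint keys and helper names are hypothetical. It rejects out-of-range values, but this profile does not say whether the Scanner WebService rejects or clamps them.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

MAX_LIMIT = 200
# Defaults from the pagination constraints; the endpoint keys are hypothetical labels.
DEFAULT_LIMITS = {"components": 50, "pending-triage": 100}


@dataclass(frozen=True)
class Page:
    limit: int
    offset: int


@dataclass(frozen=True)
class Row:
    build_id: str
    inserted_at: datetime


def parse_page(endpoint: str, limit: Optional[str], offset: Optional[str]) -> Page:
    """Validate query-string paging; this sketch rejects (rather than clamps) bad values."""
    lim = DEFAULT_LIMITS[endpoint] if limit is None else int(limit)
    off = 0 if offset is None else int(offset)
    if not 1 <= lim <= MAX_LIMIT:
        raise ValueError(f"limit must be in 1..{MAX_LIMIT}, got {lim}")
    if off < 0:
        raise ValueError(f"offset must be >= 0, got {off}")
    return Page(lim, off)


def order_and_page(rows: list[Row], page: Page) -> list[Row]:
    """Apply inserted_at DESC, build_id ASC, then offset/limit."""
    # Python's sort is stable (also with reverse=True), so sorting by the
    # secondary key first and the primary key second yields the combined order.
    by_build = sorted(rows, key=lambda r: r.build_id)
    ordered = sorted(by_build, key=lambda r: r.inserted_at, reverse=True)
    return ordered[page.offset : page.offset + page.limit]
```

The `build_id` tiebreaker matters because several rows can share an `inserted_at` value. Without it, consecutive pages could repeat or skip rows.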
## Ingestion Contract
- Projectors should upsert on `(canonical_bom_sha256, payload_digest)` plus partition window.
- `canonical_bom_sha256` must be computed from canonical JSON (stable ordering, UTF-8, deterministic normalization).
- Projection rows must preserve deterministic timestamps (UTC) and stable JSON serialization.
- If an inline JSONB projection exceeds the configured size threshold, keep the JSONB minimal and rely on the CAS reference fields (`*_ref` columns) for the full content.
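Below is a minimal Python sketch of the hashing and size-threshold rules. It assumes that sorted keys, compact separators and UTF-8 are an acceptable stand-in for Stella's canonical JSON. If the real canonicalizer follows RFC 8785 (JCS), key ordering for non-BMP characters and number formatting can differ, so treat this as illustrative. The threshold value, the CycloneDX-style `components` shape, and all helper names are assumptions.

```python
import hashlib
import json
from typing import Any

# Hypothetical value; the real threshold is deployment configuration.
INLINE_JSONB_MAX_BYTES = 256 * 1024


def canonical_json_bytes(doc: Any) -> bytes:
    """Sorted keys, no insignificant whitespace, UTF-8; NaN/Infinity rejected."""
    return json.dumps(
        doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False, allow_nan=False
    ).encode("utf-8")


def canonical_bom_sha256(doc: Any) -> str:
    """Hex SHA-256 over the canonical bytes (no 'sha256:' prefix convention assumed)."""
    return hashlib.sha256(canonical_json_bytes(doc)).hexdigest()


def minimal_projection(doc: dict) -> dict:
    """Hypothetical reduction keeping only fields the component lookups need."""
    keep = ("purl", "name", "version")
    return {
        "components": [
            {k: c[k] for k in keep if k in c}
            for c in doc.get("components", [])
            if isinstance(c, dict)
        ]
    }


def inline_bom(doc: dict, max_bytes: int = INLINE_JSONB_MAX_BYTES) -> dict:
    """Value for the canonical_bom JSONB column; full content stays behind canonical_bom_ref."""
    if len(canonical_json_bytes(doc)) <= max_bytes:
        return doc
    return minimal_projection(doc)
```

For oversized documents, the sketch reduces the inline BOM instead of dropping it. That keeps the component-presence lookups working while the full document stays reachable through `canonical_bom_ref`.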
## Operations
- Keep autovacuum enabled on every partition and tune its thresholds to the write rate of the active (current-month) partition.
- Reindex/repack operations should run per partition, never globally.
- Partition creation job should pre-create at least one future month.
- Retention job should drop old partitions according to policy (for example 90/180/365-day classes by environment).
- Keep analytics workloads on `analytics.*`; export to external columnar systems when query volume exceeds OLTP SLO budgets.
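The retention rule can be expressed as a pure function over partition names. A monthly partition is safe to drop only once its exclusive upper bound is on or before the retention cutoff, so no row younger than the policy is removed. This is a hedged Python sketch: the helper names are hypothetical, and it is not the logic of `scanner-artifact-boms-retention.sh`.

```python
import re
from datetime import date, timedelta
from typing import Iterable, Optional

# Matches monthly partitions named artifact_boms_YYYY_MM (relname, without schema).
# A DEFAULT partition does not match and is therefore never selected here.
PARTITION_RE = re.compile(r"^artifact_boms_(\d{4})_(\d{2})$")


def partition_upper_bound(name: str) -> Optional[date]:
    """Exclusive upper bound of a monthly partition, or None for other names."""
    m = PARTITION_RE.match(name)
    if m is None:
        return None
    year, month = int(m.group(1)), int(m.group(2))
    return date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)


def partitions_to_drop(names: Iterable[str], today: date, retention_days: int) -> list[str]:
    """Hypothetical helper: partitions whose whole range lies before today - retention_days.

    `today` should be the current UTC date so the cutoff is deterministic.
    """
    cutoff = today - timedelta(days=retention_days)
    droppable = []
    for name in names:
        upper = partition_upper_bound(name)
        if upper is not None and upper <= cutoff:
            droppable.append(name)
    return sorted(droppable)
```

Each returned name would then be dropped with `DROP TABLE scanner.<name>`, as in the partition policy.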
Operational runbook and jobs:
- Runbook: `docs/modules/scanner/operations/sbom-hot-lookup-operations.md`
- SQL snippets: `devops/database/postgres-partitioning/003_scanner_artifact_boms_hot_lookup_jobs.sql`
- Shell jobs:
  - `devops/scripts/scanner-artifact-boms-ensure-partitions.sh`
  - `devops/scripts/scanner-artifact-boms-retention.sh`
- Systemd timers:
  - `devops/scripts/systemd/scanner-artifact-boms-ensure.timer`
  - `devops/scripts/systemd/scanner-artifact-boms-retention.timer`
## Security and Determinism Notes
- Do not store secrets in JSONB payloads.
- DSSE and Rekor references must remain verifiable from immutable CAS/object artifacts.
- Query responses used in policy decisions must preserve stable ordering and deterministic filtering rules.
## Delivery Link
- Implementation sprint: `docs/implplan/SPRINT_20260210_001_DOCS_sbom_attestation_hot_lookup_contract.md`