save checkpoint

This commit is contained in:
master
2026-02-11 01:32:14 +02:00
parent 5593212b41
commit cf5b72974f
2316 changed files with 68799 additions and 3808 deletions


@@ -41,6 +41,8 @@ Scanner analyses container images layer-by-layer, producing deterministic SBOM f
- ./analyzers-go.md
- ./operations/secret-leak-detection.md
- ./operations/dsse-rekor-operator-guide.md
- ./operations/sbom-hot-lookup-operations.md
- ./sbom-attestation-hot-lookup-profile.md
- ./os-analyzers-evidence.md
- ./design/macos-analyzer.md
- ./design/windows-analyzer.md


@@ -3,7 +3,8 @@
> Aligned with Epic 6 – Vulnerability Explorer and Epic 10 – Export Center.
> **Scope.** Implementation-ready architecture for the **Scanner** subsystem: WebService, Workers, analyzers, SBOM assembly (inventory & usage), per-layer caching, three-way diffs, artifact catalog (RustFS default + PostgreSQL, S3-compatible fallback), attestation hand-off, and scale/security posture. This document is the contract between the scanning plane and everything else (Policy, Excititor, Concelier, UI, CLI).
> **Related:** `docs/modules/scanner/operations/ai-code-guard.md`
> **Storage profile:** `docs/modules/scanner/sbom-attestation-hot-lookup-profile.md`
---


@@ -1,60 +1,76 @@
# SLSA Source Track Capture (SC3)
Status: Draft · Date: 2025-12-03
Scope: Define deterministic capture of SLSA Source Track data for replay bundles and CycloneDX 1.7 + CBOM exports. Aligns Scanner record/replay with provenance signals (build-id, repo/ref, provenance DSSE).
Status: Active (partial implementation) | Last Updated: 2026-02-10
Scope: Define deterministic capture of SLSA Source Track data for replay bundles and CycloneDX 1.7 + CBOM exports. Align scanner record/replay with source and build provenance signals.
## Objectives
- Persist source provenance required by SLSA 1.2 Source Track: repo URI, resolved ref, digest of checked-out tree, invocation hash, builder ID, and reproducible build inputs.
- Make data replayable offline: no network fetch; hashes + DSSE envelope paths must resolve locally.
- Keep ordering/hashes deterministic: canonical JSON (sorted keys), BLAKE3-256 primary hash, SHA-256 secondary.
- Persist source provenance required by SLSA 1.2 Source Track: repo URI, resolved ref, commit, source review controls, and policy snapshot signals.
- Make data replayable offline with no network dependency.
- Keep ordering and hashes deterministic with canonical JSON and explicit hash algorithm prefixes.
## Minimal fields (per build)
- `source.repo`: canonical URI (https, ssh); normalized to lower-case host; trailing slash stripped.
- `source.ref`: fully qualified ref (`refs/heads/main`, tag, or immutable commit).
- `source.commit`: 40-hex commit digest.
- `source.treeHash`: BLAKE3-256 of source tree snapshot (stable archive); optional SHA-256 mirror.
- `build.invocation.hash`: BLAKE3-256 of normalized invocation (args/env/tool versions); also store `build.invocation.dsse` hash when signed.
- `builder.id`: URI for builder identity (SLSA-style).
- `provenance.dsse`: SHA-256 of DSSE envelope for provenance statement (e.g., in-toto SLSA provenance v1.0). Optionally include BLAKE3 and CAS URI.
## Shipped Defaults (2026-02-10)
- Build provenance policy supports Source Track controls:
- `minimumReviewApprovals`
- `requireNoSelfMerge`
- `requireProtectedBranch`
- `requireStatusChecksPassed`
- `requirePolicyHash`
- Source metadata is captured from build parameters using keys such as:
- `sourceRef`
- `sourceReviewCount` or `sourceApproverIds`
- `sourceAuthorId` and `sourceMergedById`
- `sourceBranchProtected`
- `sourceStatusChecksPassed`
- `sourcePolicyHash`
- Source policy violations emit deterministic `SourcePolicyFailed` findings.
- In-toto predicate output now includes source review and policy evidence fields.
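The policy controls and build-parameter keys listed above could be evaluated roughly as follows. This is a minimal sketch, not the shipped implementation: the function name `evaluate_source_policy`, the dict-based inputs, the `"true"`/`"false"` string encoding of boolean parameters, the comma-separated form of `sourceApproverIds`, and the shape of each finding are all assumptions. Only the control names, the parameter keys, and the `SourcePolicyFailed` finding name come from the list above.

```python
from typing import Any


def _as_bool(value: Any) -> bool:
    # Assumption: build parameters arrive as strings such as "true"/"false".
    return str(value).strip().lower() == "true"


def evaluate_source_policy(policy: dict[str, Any], params: dict[str, str]) -> list[dict[str, str]]:
    """Return SourcePolicyFailed findings, sorted so output is deterministic."""
    failed: list[str] = []

    # Prefer an explicit review count; otherwise count distinct approver IDs.
    if "sourceReviewCount" in params:
        approvals = int(params["sourceReviewCount"])
    else:
        ids = [a.strip() for a in params.get("sourceApproverIds", "").split(",")]
        approvals = len({a for a in ids if a})
    if approvals < int(policy.get("minimumReviewApprovals", 0)):
        failed.append("minimumReviewApprovals")

    if policy.get("requireNoSelfMerge"):
        author = params.get("sourceAuthorId")
        merger = params.get("sourceMergedById")
        # Fail closed: a missing identity is treated as a violation.
        if not author or not merger or author == merger:
            failed.append("requireNoSelfMerge")

    if policy.get("requireProtectedBranch") and not _as_bool(params.get("sourceBranchProtected", "")):
        failed.append("requireProtectedBranch")
    if policy.get("requireStatusChecksPassed") and not _as_bool(params.get("sourceStatusChecksPassed", "")):
        failed.append("requireStatusChecksPassed")
    if policy.get("requirePolicyHash") and not params.get("sourcePolicyHash"):
        failed.append("requirePolicyHash")

    return [{"type": "SourcePolicyFailed", "control": c} for c in sorted(failed)]
```

Sorting the failed control names before emitting findings keeps the output stable regardless of the order in which checks run.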
## JSON shape (suggested)
## Minimal Fields (Per Build)
- `source.repo`: canonical repository URI.
- `source.ref`: fully-qualified source ref (`refs/heads/main`, tag, or immutable commit).
- `source.commit`: immutable source commit.
- `source.review.count`: numeric review approval count.
- `source.review.approvers`: sorted approver identity list.
- `source.review.authorId`: source author identity.
- `source.review.mergedById`: merge actor identity.
- `source.branchProtected`: boolean signal from SCM policy enforcement.
- `source.statusChecksPassed`: boolean signal for required CI checks.
- `source.policyHash`: deterministic digest for branch/review policy snapshot.
## JSON Shape (Current Direction)
```json
{
"source": {
"repo": "https://example.invalid/demo",
"ref": "refs/tags/v1.0.0",
"ref": "refs/heads/main",
"commit": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
"treeHash": "b3:1111...",
"builderId": "https://builder.stellaops.local/scanner",
"invocationHash": "b3:2222...",
"invocationDsse": "sha256:3333...",
"provenance": {
"dsse": "sha256:4444...",
"cas": "cas://provenance/demo/v1.0.0.dsse"
"policyHash": "sha256:policy123",
"review": {
"count": 2,
"approvers": ["approver-a", "approver-b"],
"authorId": "author-a",
"mergedById": "approver-a",
"branchProtected": true,
"statusChecksPassed": true
}
}
}
```
## Where to store
- CycloneDX 1.7 + CBOM: encode under `metadata.properties` using namespaced keys:
- `source.repo`, `source.ref`, `source.commit`, `source.tree.hash`, `builder.id`, `build.invocation.hash`, `build.invocation.dsse`, `provenance.dsse`, `provenance.cas`.
- Replay manifest: add `source` block mirroring the JSON shape above; include hashes in manifest subject list.
- CAS: store provenance DSSE envelope under `cas://provenance/{component}/{version}.dsse`; store tree snapshot tarball under `cas://source/{commit}.tar.gz`.
## Determinism Rules
- Canonical JSON (lexicographic keys, UTF-8, no pretty-print) before hashing/signing.
- UTC timestamps with `Z` suffix in exported provenance when timestamps are included.
- Hash values must include algorithm prefix (`sha256:`, `b3:`).
## Determinism rules
- Canonical JSON (lexicographic keys, UTF-8, no pretty-print) before hashing.
- Timestamps in provenance statements must be UTC `Z`; strip milliseconds unless non-zero.
- All hashes recorded with algorithm prefix (`b3:` for BLAKE3-256, `sha256:` for SHA-256).
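The canonicalization and prefixing rules above can be illustrated with a short sketch. The helper names `canonical_json` and `prefixed_sha256` are hypothetical, and this is only an approximation of a canonical form: Python's `sort_keys` orders keys by Unicode code point, which may differ from a stricter scheme such as RFC 8785 for some non-ASCII keys. BLAKE3 (`b3:`) is omitted because it is not in the Python standard library.

```python
import hashlib
import json
from typing import Any


def canonical_json(value: Any) -> bytes:
    # Sorted keys, compact separators, UTF-8 output, no pretty-printing.
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def prefixed_sha256(value: Any) -> str:
    # Hash the canonical bytes and record the algorithm prefix with the digest.
    return "sha256:" + hashlib.sha256(canonical_json(value)).hexdigest()
```

Because the bytes are produced from sorted keys, two documents that differ only in key order yield the same digest.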
## Verification
- Verifier MUST: (1) schema-check fields are present; (2) recompute `treeHash` from tree tarball; (3) recompute `build.invocation.hash` from normalized invocation file; (4) verify DSSE envelope hash matches `provenance.dsse` and signature keys; (5) ensure repo/ref/commit are consistent (ref→commit mapping known or provided in bundle).
- Fail closed on any mismatch; never fetch network.
## Verification Expectations
- Verifier fails closed when required Source Track controls are absent or violated.
- Verifier links source control evidence (review, policy hash, branch/status signals) with build provenance identity.
- No external fetch is allowed during verification.
## Fixtures
- `docs/modules/scanner/fixtures/cdx17-cbom/source-track.sample.json` — deterministic example with placeholder hashes.
- Future: add CAS tarball + invocation file under `tests/reachability/fixtures/source-track/` with recomputation script.
- `docs/modules/scanner/fixtures/cdx17-cbom/source-track.sample.json`
## TODO (outside this doc)
- Implement `scripts/scanner/verify_source_track.py` to validate source-track blocks and CAS payloads offline.
- Extend replay manifest schema to include `source` block; add determinism tests in Scanner replay suite once manifest contract lands.
## Remaining Work
- Extend replay manifest schema to include source hash material (`treeHash`, invocation hash, DSSE hash) and offline recomputation assets.
- Add a dedicated offline source-track verifier script for CAS-bound evidence inputs.
- Add first-class SCM/CI attestation ingestion for source controls beyond parameter maps.


@@ -0,0 +1,114 @@
# Scanner SBOM Hot Lookup Operations
Status: Active
Last Updated: 2026-02-10
Sprint: `SPRINT_20260210_001_DOCS_sbom_attestation_hot_lookup_contract` (`HOT-005`)
## Purpose
Operate the `scanner.artifact_boms` monthly partition set used by Scanner SBOM hot lookups:
- pre-create upcoming partitions to avoid month-boundary ingest failures
- enforce retention windows by dropping old partitions
- keep maintenance scoped to partition units (not whole-table rewrites)
## Required Inputs
- PostgreSQL DSN in `PG_DSN`
- migration `025_artifact_boms_hot_lookup.sql` applied
- permissions to execute:
- `scanner.ensure_artifact_boms_future_partitions(int)`
- `scanner.drop_artifact_boms_partitions_older_than(int, bool)`
## Manual Operations
Pre-create the current and next month's partitions:
```bash
PG_DSN="Host=...;Database=...;Username=...;Password=..." \
./devops/scripts/scanner-artifact-boms-ensure-partitions.sh 1
```
Retention dry-run (default keep 12 months):
```bash
PG_DSN="Host=...;Database=...;Username=...;Password=..." \
./devops/scripts/scanner-artifact-boms-retention.sh 12 true
```
Retention execution:
```bash
PG_DSN="Host=...;Database=...;Username=...;Password=..." \
./devops/scripts/scanner-artifact-boms-retention.sh 12 false
```
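To make the retention window concrete, the sketch below selects which monthly partitions a dry run might report. It is illustrative only: the authoritative cutoff logic lives in `scanner.drop_artifact_boms_partitions_older_than`, which is not shown here. The sketch assumes a partition is eligible when its month is more than `keep_months` months before the current month, and that partitions follow the `artifact_boms_YYYY_MM` naming used in this runbook. The function names are hypothetical.

```python
import re
from datetime import date

_PARTITION = re.compile(r"^artifact_boms_(\d{4})_(\d{2})$")


def month_index(year: int, month: int) -> int:
    # Map a calendar month to a monotonically increasing integer.
    return year * 12 + (month - 1)


def partitions_to_drop(names: list[str], keep_months: int, today: date) -> list[str]:
    """Return partitions older than the retention window, in sorted order."""
    cutoff = month_index(today.year, today.month) - keep_months
    expired = []
    for name in names:
        match = _PARTITION.match(name)
        if match is None:
            continue  # not a monthly partition (for example a DEFAULT partition)
        if month_index(int(match.group(1)), int(match.group(2))) < cutoff:
            expired.append(name)
    return sorted(expired)
```

Reviewing a list like this in dry-run mode before passing `false` helps catch an unexpectedly short retention argument.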
## Scheduled Jobs
### Cron example
```cron
# first day each month: ensure next partition exists
10 0 1 * * PG_DSN="..." /opt/stellaops/devops/scripts/scanner-artifact-boms-ensure-partitions.sh 1
# daily retention check
15 0 * * * PG_DSN="..." /opt/stellaops/devops/scripts/scanner-artifact-boms-retention.sh 12 false
```
### Systemd units
Install:
```bash
sudo cp devops/scripts/systemd/scanner-artifact-boms-*.service /etc/systemd/system/
sudo cp devops/scripts/systemd/scanner-artifact-boms-*.timer /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now scanner-artifact-boms-ensure.timer
sudo systemctl enable --now scanner-artifact-boms-retention.timer
```
`/etc/stellaops/scanner-hotlookup.env` must define `PG_DSN`.
## Failure Modes and Rollback
### Missing upcoming partition
Symptom:
- ingest errors near month boundary with partition routing failure.
Mitigation:
1. Run `scanner-artifact-boms-ensure-partitions.sh 2`.
2. Re-run failed ingest operations.
### Retention job dropped incorrect partition
Symptom:
- historical hot-lookup rows unexpectedly missing.
Rollback:
1. Restore dropped partition table from latest PostgreSQL backup.
2. Attach restored table back to parent:
```sql
ALTER TABLE scanner.artifact_boms
ATTACH PARTITION scanner.artifact_boms_YYYY_MM
FOR VALUES FROM ('YYYY-MM-01') TO ('YYYY-MM-01'::date + INTERVAL '1 month');
```
3. Rebuild per-partition indexes if restore omitted them.
### Hot partition bloat
Symptom:
- query latency regression on current month.
Mitigation:
1. Run `VACUUM (ANALYZE) scanner.artifact_boms_YYYY_MM;`
2. If needed, run `REINDEX TABLE scanner.artifact_boms_YYYY_MM;`
3. For online reclaim workflows, use `pg_repack` partition-by-partition.
## References
- Schema + functions: `src/Scanner/__Libraries/StellaOps.Scanner.Storage/Postgres/Migrations/025_artifact_boms_hot_lookup.sql`
- SQL job snippets: `devops/database/postgres-partitioning/003_scanner_artifact_boms_hot_lookup_jobs.sql`
- Shell jobs:
- `devops/scripts/scanner-artifact-boms-ensure-partitions.sh`
- `devops/scripts/scanner-artifact-boms-retention.sh`


@@ -0,0 +1,204 @@
# Scanner SBOM and Attestation Hot Lookup Profile
Version: 0.1.0
Status: Draft (Advisory translation)
Last Updated: 2026-02-10
## Purpose
Define a Stella-compatible persistence profile for fast SBOM/attestation lookup queries in PostgreSQL without moving full replay/audit payloads out of CAS/object storage.
## Scope
This profile covers:
- Hot OLTP lookup rows for digest, component, and pending-triage queries.
- Partitioning, indexing, and retention for lookup tables.
- Ingestion projection contracts from Scanner/Attestor outputs.
This profile does not replace:
- CAS/object storage as source of truth for large immutable payloads.
- Analytics star schema in `analytics.*`.
- Existing proof bundle and witness contracts.
## Current Baseline (confirmed)
- Scanner stores full SBOM artifacts in object storage via `ArtifactStorageService` and `ArtifactObjectKeyBuilder`.
- Scanner catalog metadata is stored in PostgreSQL (`scanner.artifacts`, `scanner.links`, related tables).
- DSSE and proof metadata already use JSONB where needed (`proof_bundle.dsse_envelope`, `scanner.witnesses.dsse_envelope`).
- High-volume partitioning currently exists for time-series style tables (for example EPSS and runtime samples), not for SBOM component lookup projections.
- Component-level hot lookup acceleration is currently driven by the BOM-index sidecar contract.
## Advisory Fit Assessment
Aligned with current direction:
- Keep exact-match routing keys narrow and indexed.
- Use JSONB GIN indexes only on query paths that are actually hot.
- Partition by time for deterministic retention.
- Keep analytics and rollups away from Scanner OLTP hot paths.
Gap requiring implementation:
- No explicit Scanner Postgres contract exists for a partitioned SBOM/attestation lookup projection that supports direct SQL lookups by payload digest, component PURL/name/version, and merged VEX pending state.
## Target Contract
### 1) Authoritative storage split
- Authoritative blobs:
- `raw_bom`, canonical SBOM documents, DSSE envelopes, and merged VEX payloads remain in CAS/object storage.
- PostgreSQL rows reference these via artifact IDs/URIs and store bounded JSONB projections for search.
- Authoritative decisions:
- Policy decisions remain in their existing modules.
### 2) Hot lookup table
Create a new append-only projection table:
```sql
CREATE TABLE scanner.artifact_boms (
build_id TEXT NOT NULL,
canonical_bom_sha256 TEXT NOT NULL,
payload_digest TEXT NOT NULL,
inserted_at TIMESTAMPTZ NOT NULL DEFAULT now(),
raw_bom_ref TEXT,
canonical_bom_ref TEXT,
dsse_envelope_ref TEXT,
merged_vex_ref TEXT,
canonical_bom JSONB,
merged_vex JSONB,
attestations JSONB,
evidence_score INTEGER NOT NULL DEFAULT 0,
rekor_tile_id TEXT,
PRIMARY KEY (build_id, inserted_at)
) PARTITION BY RANGE (inserted_at);
```
Partition policy:
- Monthly range partitions.
- `DEFAULT` partition optional for safety.
- Retention by `DROP TABLE scanner.artifact_boms_YYYY_MM`.
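As an illustration of the monthly range policy, the following sketch builds the DDL for one partition. It is an assumption-laden example rather than the migration's actual logic, which lives in `scanner.ensure_artifact_boms_future_partitions`. The helper names are hypothetical, and the date-only bounds are interpreted by PostgreSQL in the session time zone, so a real job would likely pin the session to UTC.

```python
from datetime import date


def next_month(d: date) -> date:
    # First day of the month after d; rolls over the year in December.
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)


def monthly_partition_ddl(month_start: date) -> str:
    """Build CREATE TABLE ... PARTITION OF for the month beginning at month_start."""
    if month_start.day != 1:
        raise ValueError("month_start must be the first day of a month")
    end = next_month(month_start)
    name = f"artifact_boms_{month_start:%Y_%m}"
    return (
        f"CREATE TABLE IF NOT EXISTS scanner.{name} "
        f"PARTITION OF scanner.artifact_boms "
        f"FOR VALUES FROM ('{month_start.isoformat()}') TO ('{end.isoformat()}');"
    )
```

The upper bound is exclusive in PostgreSQL range partitioning, so adjacent months share a boundary value without overlapping.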
### 3) Index profile
Required:
```sql
CREATE INDEX IF NOT EXISTS ix_artifact_boms_payload_digest
ON scanner.artifact_boms (payload_digest, inserted_at DESC);
CREATE INDEX IF NOT EXISTS ix_artifact_boms_canonical_sha
ON scanner.artifact_boms (canonical_bom_sha256);
CREATE INDEX IF NOT EXISTS ix_artifact_boms_inserted_at
ON scanner.artifact_boms (inserted_at DESC);
CREATE INDEX IF NOT EXISTS ix_artifact_boms_canonical_gin
ON scanner.artifact_boms USING GIN (canonical_bom jsonb_path_ops);
CREATE INDEX IF NOT EXISTS ix_artifact_boms_merged_vex_gin
ON scanner.artifact_boms USING GIN (merged_vex jsonb_path_ops);
```
Optional partial index for pending triage:
```sql
CREATE INDEX IF NOT EXISTS ix_artifact_boms_pending_vex
ON scanner.artifact_boms USING GIN (merged_vex jsonb_path_ops)
WHERE jsonb_path_exists(merged_vex, '$[*] ? (@.state == "unknown" || @.state == "triage_pending")');
```
Uniqueness guard (optional):
```sql
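-- Create per monthly partition: a unique index on the partitioned parent must include inserted_at,
-- and date_trunc() on timestamptz is not IMMUTABLE, so it cannot be used in an index expression.
CREATE UNIQUE INDEX IF NOT EXISTS uq_artifact_boms_YYYY_MM_dedupe
ON scanner.artifact_boms_YYYY_MM (canonical_bom_sha256, payload_digest);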
```
## Query Contracts
### Latest by payload digest
```sql
SELECT build_id, inserted_at, evidence_score
FROM scanner.artifact_boms
WHERE payload_digest = $1
ORDER BY inserted_at DESC
LIMIT 1;
```
### Component presence by PURL
```sql
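-- $1 = PURL. The @? operator cannot bind jsonpath variables such as $purl;
-- containment (@>) accepts a bind parameter and can use the jsonb_path_ops GIN index.
SELECT build_id, inserted_at
FROM scanner.artifact_boms
WHERE canonical_bom @> jsonb_build_object('components', jsonb_build_array(jsonb_build_object('purl', $1::text)))
ORDER BY inserted_at DESC
LIMIT 50;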
```
### Pending triage extraction
```sql
SELECT build_id, inserted_at,
jsonb_path_query_array(merged_vex, '$[*] ? (@.state == "unknown" || @.state == "triage_pending")') AS pending
FROM scanner.artifact_boms
WHERE jsonb_path_exists(merged_vex, '$[*] ? (@.state == "unknown" || @.state == "triage_pending")')
ORDER BY inserted_at DESC
LIMIT 100;
```
## Implemented API Surface (Scanner WebService)
Base path: `/api/v1/sbom/hot-lookup`
- `GET /payload/{payloadDigest}/latest`
- Returns latest projection row for digest.
- `GET /components?purl=<purl>&limit=<n>&offset=<n>`
- Component presence search by PURL.
- `GET /components?name=<name>&minVersion=<version>&limit=<n>&offset=<n>`
- Component presence search by normalized name and optional minimum version.
- `GET /pending-triage?limit=<n>&offset=<n>`
- Returns rows where merged VEX contains `unknown` or `triage_pending` states and includes extracted pending entries.
Pagination constraints:
- `limit`: `1..200` (defaults: 50 for component searches, 100 for pending triage).
- `offset`: `>= 0`.
- Ordering is deterministic: `inserted_at DESC, build_id ASC`.
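The pagination constraints and ordering rule above might be enforced along these lines. This is a hedged sketch: `resolve_page` and `sort_rows` are hypothetical names, and rejecting out-of-range values (rather than clamping them) is an assumption about the WebService behavior.

```python
from typing import Optional

MAX_LIMIT = 200


def resolve_page(limit: Optional[int], offset: Optional[int], default_limit: int) -> tuple[int, int]:
    """Apply defaults, then validate limit (1..200) and offset (>= 0)."""
    resolved_limit = default_limit if limit is None else limit
    resolved_offset = 0 if offset is None else offset
    if not 1 <= resolved_limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if resolved_offset < 0:
        raise ValueError("offset must be >= 0")
    return resolved_limit, resolved_offset


def sort_rows(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    # inserted_at DESC, build_id ASC: sort by the secondary key first,
    # then rely on the stable sort for the primary key.
    by_build = sorted(rows, key=lambda r: r["build_id"])
    return sorted(by_build, key=lambda r: r["inserted_at"], reverse=True)
```

The `build_id ASC` tie-breaker matters for offset pagination: without it, rows sharing an `inserted_at` value could move between pages across requests.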
## Ingestion Contract
- Projectors should upsert on `(canonical_bom_sha256, payload_digest)` plus partition window.
- `canonical_bom_sha256` must be computed from canonical JSON (stable ordering, UTF-8, deterministic normalization).
- Projection rows must preserve deterministic timestamps (UTC) and stable JSON serialization.
- If inline JSONB exceeds configured size thresholds, keep JSONB minimal and store full content only by CAS reference fields.
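A projector following these rules could look roughly like the sketch below. Everything beyond the column names is an assumption: the function name `build_projection`, the `max_inline_bytes` parameter, the unprefixed hex form of `canonical_bom_sha256`, and especially the choice to reduce an oversized document to each component's `purl`, `name`, and `version`. The contract only says to keep the inline JSONB minimal.

```python
import hashlib
import json
from typing import Any


def build_projection(
    canonical_bom: dict[str, Any],
    payload_digest: str,
    canonical_bom_ref: str,
    max_inline_bytes: int,
) -> dict[str, Any]:
    """Build one hot-lookup row; shrink the inline JSONB when it is too large."""
    # Canonical form: sorted keys, compact separators, UTF-8.
    encoded = json.dumps(
        canonical_bom, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    inline: dict[str, Any] = canonical_bom
    if len(encoded) > max_inline_bytes:
        # Assumption: keep only the fields the lookup queries filter on.
        inline = {
            "components": [
                {k: c[k] for k in ("purl", "name", "version") if k in c}
                for c in canonical_bom.get("components", [])
            ]
        }
    return {
        # The digest always covers the full canonical document, not the reduced one.
        "canonical_bom_sha256": hashlib.sha256(encoded).hexdigest(),
        "payload_digest": payload_digest,
        "canonical_bom_ref": canonical_bom_ref,
        "canonical_bom": inline,
    }
```

Computing the digest before any reduction keeps the upsert key stable whether or not the inline copy was trimmed.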
## Operations
- Autovacuum enabled per partition; tune by active partition write rate.
- Reindex/repack operations should run per partition, never globally.
- Partition creation job should pre-create at least one future month.
- Retention job should drop old partitions according to policy (for example 90/180/365-day classes by environment).
- Keep analytics workloads on `analytics.*`; export to external columnar systems when query volume exceeds OLTP SLO budgets.
Operational runbook and jobs:
- Runbook: `docs/modules/scanner/operations/sbom-hot-lookup-operations.md`
- SQL snippets: `devops/database/postgres-partitioning/003_scanner_artifact_boms_hot_lookup_jobs.sql`
- Shell jobs:
- `devops/scripts/scanner-artifact-boms-ensure-partitions.sh`
- `devops/scripts/scanner-artifact-boms-retention.sh`
- Systemd timers:
- `devops/scripts/systemd/scanner-artifact-boms-ensure.timer`
- `devops/scripts/systemd/scanner-artifact-boms-retention.timer`
## Security and Determinism Notes
- Do not store secrets in JSONB payloads.
- DSSE and Rekor references must remain verifiable from immutable CAS/object artifacts.
- Query responses used in policy decisions must preserve stable ordering and deterministic filtering rules.
## Delivery Link
- Implementation sprint: `docs/implplan/SPRINT_20260210_001_DOCS_sbom_attestation_hot_lookup_contract.md`