# Competitor Ingest Normalization (CM1)

## Purpose

Define how external SBOM/scan outputs (Syft, Trivy, Clair) are normalized into StellaOps schemas with deterministic ordering, provenance checks, and offline-ready adapters. Covers CM1–CM10 in the 31-Nov-2025 findings advisory.

## Scope

- Import pipeline for external SBOM + vulnerability scan payloads.
- Adapter mappings, validation, provenance/signature verification, and fallback rules.
- Offline ingest kits (adapters + fixtures) and regression tests.

## Deliverables (CM tasks)

- CM1: Mapping tables per tool → StellaOps SBOM/scan schema; required/optional fields; deterministic sort rules.
- CM2: Signature/provenance verification policy (acceptable algorithms, trust roots, failure modes).
- CM3: Snapshot governance: versioning, freshness SLA, rollback plan for imported feeds.
- CM4: Anomaly regression suite (schema drift, nullables, encoding, ordering). Golden fixtures + hashes.
- CM5: Offline ingest kit: DSSE-signed adapters/mappings/fixtures with tool versions and hashes.
- CM6: Fallback hierarchy when data incomplete (signed SBOM → unsigned SBOM → scan → defaults) with explicit decision trace.
- CM7: Source transparency fields (tool name/version/hash, build metadata) persisted and surfaced.
- CM8: Benchmark parity plan with upstream tools (pinned versions, hash-logged runs).
- CM9: Coverage matrix by ecosystem; gap tracker.
- CM10: Retry/backoff/error taxonomy and deterministic diagnostics.

## Determinism & Validation

- Adapters must sort components and vulnerabilities deterministically (locale-invariant, stable keys).
- All mapping rules and fixtures carry BLAKE3/SHA256 hashes; adapters are pure functions (no network).
- Signature verification rejects unverifiable payloads, logs reason codes, and can run offline using bundled trust roots.

## Adapter mapping skeleton (CM1)

- Tool coverage v0.1: Syft 1.0.x, Trivy 0.50.x, Clair 6.x (pin exact versions in fixtures).
- Mapping tables (CSV, checked in under `docs/modules/scanner/fixtures/competitor-adapters/`):
  - component: external fields → `name`, `version`, `purl`, `type`, `hashes`, `licenses`, `evidenceRef`.
  - vulnerability: `id`, `source`, `severity` (normalized), `cvss` (score/vector), `fixVersions`, `evidenceRef`.
  - metadata: tool name/version/hash, scan timestamp (UTC), data source.
- Sorting: components by `purl` → `name` → `version`; vulns by `id` → `source` → `severityScore` desc → `cvss.vector`.

## Verification policy (CM2)

- Acceptable signatures: DSSE/COSE/JWS envelopes with Ed25519 or ECDSA signatures and SHA256 digests; trust roots bundled in the offline kit.
- Provenance check: require signer identity + hash match; if either is missing, mark provenance = `unknown` and apply the fallback hierarchy (CM6).

## Snapshot governance (CM3)

- Freshness budget: max age 7 days from `scanTimestamp`; reject older snapshots unless the override flag is set (logged).
- Versioning: stored as `snapshot_version` (semver) and `source_tool_hash`; the rollback plan requires the prior snapshot hash.

## Regression + fixtures (CM4/CM5)

- Fixtures under `docs/modules/scanner/fixtures/competitor-adapters/fixtures/` with golden hashes (BLAKE3/SHA256) and expected normalized output.
- CI step runs adapter → normalized output → hash compare; offline, no network.

## Fallback hierarchy (CM6)

1) Signed SBOM w/ valid provenance → accepted.
2) Unsigned SBOM → accepted with `provenance=unknown`, warnings emitted.
3) Scan-only results → accepted with degraded confidence; policy lattice may penalize.
4) If all absent: reject with reason code `no_evidence`.

## Transparency & coverage (CM7–CM9)

- Persist: `source.tool`, `source.version`, `source.hash`, `adapter.version`, `normalized_hash`.
- Coverage matrix maintained in `docs/modules/scanner/fixtures/competitor-adapters/coverage.csv` (ecosystem yes/no, notes).
- Bench parity (CM8): pin upstream versions; store run hashes/logs in the fixtures folder.
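
The CM1 sort rules and the CM7 `normalized_hash` field work together: if the hash is computed over the deterministically sorted, canonically serialized output, two runs over the same data in a different input order yield the same hash. The following is a minimal sketch, assuming dict-shaped records that use the field names from the mapping tables. The function names, the `sha256:` prefix, and the JSON canonicalization (sorted keys, compact separators, UTF-8) are illustrative assumptions. SHA-256 is used only because BLAKE3 is not in Python's standard library; which hash is canonical is still an open item.

```python
import hashlib
import json
from typing import Any

Record = dict[str, Any]


def canonical_bytes(doc: Any) -> bytes:
    # Assumed canonical form: sorted object keys, no insignificant whitespace, UTF-8.
    return json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def sort_components(components: list[Record]) -> list[Record]:
    # CM1: purl -> name -> version. Python compares str by code point, so the
    # order does not depend on the process locale. Missing keys sort as "".
    # Final tie-break on the full canonical record (an assumption beyond CM1)
    # keeps exact-key duplicates in a deterministic order too.
    return sorted(
        components,
        key=lambda c: (
            c.get("purl") or "",
            c.get("name") or "",
            c.get("version") or "",
            canonical_bytes(c),
        ),
    )


def sort_vulnerabilities(vulns: list[Record]) -> list[Record]:
    # CM1: id -> source -> severityScore (descending) -> cvss.vector.
    return sorted(
        vulns,
        key=lambda v: (
            v.get("id") or "",
            v.get("source") or "",
            -float(v.get("severityScore") or 0.0),
            (v.get("cvss") or {}).get("vector") or "",
            canonical_bytes(v),
        ),
    )


def normalize(components: list[Record], vulns: list[Record],
              source: Record, adapter_version: str) -> Record:
    body = {
        "components": sort_components(components),
        "vulnerabilities": sort_vulnerabilities(vulns),
    }
    return {
        # CM7 transparency fields persisted alongside the normalized payload.
        "source": source,  # hypothetical shape: {"tool": ..., "version": ..., "hash": ...}
        "adapter": {"version": adapter_version},
        # Hash covers only the sorted body, so input order cannot change it.
        "normalized_hash": "sha256:" + hashlib.sha256(canonical_bytes(body)).hexdigest(),
        **body,
    }
```

Missing sort keys fall back to an empty string (or a score of 0.0) so partial records from a tool still sort without errors. A production adapter may prefer a stricter canonical JSON scheme, such as RFC 8785 (JCS), over this ad hoc `json.dumps` form.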
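
The CM6 fallback hierarchy is an ordered decision that should leave an explicit trace. The sketch below is hypothetical. The `Evidence` and `Decision` types, the level names, the signature-state strings, and the warning strings are assumptions, not a defined StellaOps API. Treating a present-but-failing signature as a `signature_invalid` rejection, rather than as a fall-through, is one reading of CM2's "rejects unverifiable payloads". A signed SBOM that lacks signer identity or a hash match drops to level-2 handling with `provenance=unknown`, as CM2 describes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Evidence:
    # Hypothetical input: what the ingest step found for one artifact.
    sbom: Optional[dict] = None
    signature: str = "absent"          # assumed states: "valid", "invalid", "absent"
    provenance_verified: bool = False  # signer identity + hash match (CM2)
    scan: Optional[dict] = None


@dataclass
class Decision:
    # Hypothetical output: outcome plus an ordered decision trace.
    accepted: bool
    level: str
    provenance: str
    reason_code: Optional[str] = None
    warnings: list[str] = field(default_factory=list)
    trace: list[str] = field(default_factory=list)


def choose_evidence(ev: Evidence) -> Decision:
    trace: list[str] = []
    if ev.sbom is not None:
        if ev.signature == "invalid":
            # Assumed: a signature that fails verification is a hard, non-retryable rejection.
            trace.append("sbom: signature failed verification")
            return Decision(False, "rejected", "unknown", "signature_invalid", trace=trace)
        if ev.signature == "valid" and ev.provenance_verified:
            trace.append("level 1: signed SBOM with valid provenance")
            return Decision(True, "signed_sbom", "verified", trace=trace)
        # No signature, or a signature without signer identity/hash match.
        trace.append("level 1 skipped: no verified signature and provenance")
        trace.append("level 2: SBOM accepted with provenance=unknown")
        return Decision(True, "unsigned_sbom", "unknown",
                        warnings=["provenance_unknown"], trace=trace)
    trace.append("levels 1-2 skipped: no SBOM")
    if ev.scan is not None:
        trace.append("level 3: scan-only results, degraded confidence")
        return Decision(True, "scan_only", "unknown",
                        warnings=["degraded_confidence"], trace=trace)
    trace.append("level 4: no SBOM or scan results")
    return Decision(False, "rejected", "unknown", "no_evidence", trace=trace)
```

Returning at the first satisfied level keeps evaluation in the same order as the numbered list, and the trace records every skipped level so each decision can be audited after the fact.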

## Error taxonomy (CM10)

- Retryable: network/unavailable (should not occur in offline mode), rate-limit, transient IO.
- Non-retryable: `signature_invalid`, `schema_invalid`, `unsupported_version`, `no_evidence`.
- All errors must carry deterministic reason codes and be logged in normalized output metadata.

## Open Items

- Decide minimal evidence set for accepting unsigned SBOMs (intermediate level before scan-only fallback).
- Confirm which hash (BLAKE3/SHA256) is canonical for adapter outputs.

## Links

- Sprint: `docs/implplan/SPRINT_0186_0001_0001_record_deterministic_execution.md` (CM1–CM10)
- Advisory: `docs/product-advisories/31-Nov-2025 FINDINGS.md`