# Competitor Ingest Normalization (CM1)

## Purpose

Define how external SBOM/scan outputs (Syft, Trivy, Clair) are normalized into StellaOps schemas with deterministic ordering, provenance checks, and offline-ready adapters. Covers CM1–CM10 in the 31-Nov-2025 findings advisory.

## Scope

- Import pipeline for external SBOM and vulnerability scan payloads.
- Adapter mappings, validation, provenance/signature verification, and fallback rules.
- Offline ingest kits (adapters + fixtures) and regression tests.

## Deliverables (CM tasks)

- CM1: Mapping tables per tool → StellaOps SBOM/scan schema; required/optional fields; deterministic sort rules.
- CM2: Signature/provenance verification policy (acceptable algorithms, trust roots, failure modes).
- CM3: Snapshot governance: versioning, freshness SLA, rollback plan for imported feeds.
- CM4: Anomaly regression suite (schema drift, nullables, encoding, ordering); golden fixtures with hashes.
- CM5: Offline ingest kit: DSSE-signed adapters/mappings/fixtures with tool versions and hashes.
- CM6: Fallback hierarchy when data is incomplete (signed SBOM → unsigned SBOM → scan → defaults) with an explicit decision trace.
- CM7: Source transparency fields (tool name/version/hash, build metadata) persisted and surfaced.
- CM8: Benchmark parity plan with upstream tools (pinned versions, hash-logged runs).
- CM9: Coverage matrix by ecosystem; gap tracker.
- CM10: Retry/backoff/error taxonomy and deterministic diagnostics.

## Determinism & Validation

- Adapters must sort components and vulnerabilities deterministically (locale-invariant, stable keys).
- All mapping rules and fixtures carry BLAKE3/SHA256 hashes; adapters are pure functions (no network access).
- Signature verification rejects unverifiable payloads, logs reason codes, and can run offline using bundled trust roots.

## Adapter mapping skeleton (CM1)

- Tool coverage v0.1: Syft 1.0.x, Trivy 0.50.x, Clair 6.x (pin exact versions in fixtures).
- Mapping tables (CSV, checked in under `docs/modules/scanner/fixtures/competitor-adapters/`):
  - component: external fields → `name`, `version`, `purl`, `type`, `hashes`, `licenses`, `evidenceRef`.
  - vulnerability: `id`, `source`, `severity` (normalized), `cvss` (score/vector), `fixVersions`, `evidenceRef`.
  - metadata: tool name/version/hash, scan timestamp (UTC), data source.
- Sorting: components by `purl` → `name` → `version`; vulnerabilities by `id` → `source` → `severityScore` (descending) → `cvss.vector`.

## Verification policy (CM2)

- Acceptable signatures: DSSE/COSE/JWS with SHA256/Ed25519/ECDSA; trust roots are bundled in the offline kit.
- Provenance check: require signer identity and a hash match; if these are missing, mark `provenance=unknown` and apply the fallback hierarchy (CM6).

## Snapshot governance (CM3)

- Freshness budget: maximum age of 7 days from `scanTimestamp`; older snapshots are rejected unless an override flag is set (overrides are logged).
- Versioning: snapshots are stored with `snapshot_version` (semver) and `source_tool_hash`; rollback requires the prior snapshot's hash.

## Regression + fixtures (CM4/CM5)

- Fixtures live under `docs/modules/scanner/fixtures/competitor-adapters/fixtures/` with golden hashes (BLAKE3/SHA256) and the expected normalized output:
  - `normalized-syft.json` BLAKE3=aa42c167d19535709a10df73dc39e6a50b8efbbb0ae596d17183ce62676fa85a SHA256=3f8684ff341808dcb92e97dd2c10acca727baaff05182e81a4364bb3dad0eaa7
  - `normalized-trivy.json` BLAKE3=0da216b49ebcf823d8d4aa3c9c1d2a1dcc579d836ba66bb2ae94dd781e214130 SHA256=c29aa6251d378c2aca1c3c6165e61bd2e16b6fa1227c976417b8a525ad7c1fc1
  - `normalized-clair.json` BLAKE3=92985f4cbdeecc8a0e585a70e07f17b07abdd866eecacaca9ba1b331f4b3af68 SHA256=bc232cc19885c53e4d801f5c830e3683a4031e42f6421739c4cc221f33f15e01
- The CI step runs adapter → normalized output → hash comparison, fully offline (no network access). The hashes act as guardrails for deterministic ordering and mapping stability.

## Fallback hierarchy (CM6)

1) Signed SBOM with valid provenance → accepted.
2) Unsigned SBOM → accepted with `provenance=unknown`; warnings emitted.
3) Scan-only results → accepted with degraded confidence; the policy lattice may penalize them.
4) No evidence at all → rejected with reason code `no_evidence`.

## Transparency & coverage (CM7–CM9)

- Persist: `source.tool`, `source.version`, `source.hash`, `adapter.version`, `normalized_hash`.
- The coverage matrix is maintained in `docs/modules/scanner/fixtures/competitor-adapters/coverage.csv` (ecosystem yes/no, notes). Current snapshot (2025-12-03): container/java/python/go/os rows populated; dotnet pending Syft/Clair support.
- Benchmark parity (CM8): pin upstream versions; store run hashes/logs in the fixtures folder.

## Error taxonomy (CM10)

- Retryable: network/unavailable (should not occur in offline mode), rate-limit, transient IO.
- Non-retryable: `signature_invalid`, `schema_invalid`, `unsupported_version`, `no_evidence`.
- All errors must carry deterministic reason codes and be logged in the normalized output metadata.

## Offline kit (CM5)

- Kit contents: adapter CSVs (one per tool), fixtures and the hashes above, coverage matrix, trust roots, signature policy, retry taxonomy, and a DSSE envelope referencing every file hash.
- Bundle path: `out/offline/competitor-ingest-kit-v1/`.

## Decisions (2025-12-03)

- Minimal evidence for unsigned SBOM acceptance: the SBOM must include tool metadata (name/version/hash), a component list with purl + SHA256, and a scan timestamp; otherwise fall back to the scan-only path (CM6 step 3).
- Canonical adapter output hash: BLAKE3 primary, SHA256 secondary; both are recorded in the fixture hash lists and surfaced in normalized metadata (`normalized_hash`).
- Signature verification is strictly fail-closed unless the `--allow-unsigned` flag is explicitly set; the fallback hierarchy still applies, and its use is logged.

## Links

- Sprint: `docs/implplan/SPRINT_0186_0001_0001_record_deterministic_execution.md` (CM1–CM10)
- Advisory: `docs/product-advisories/31-Nov-2025 FINDINGS.md`
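The CM1 sorting rule (components by `purl` → `name` → `version`; vulnerabilities by `id` → `source` → `severityScore` descending → `cvss.vector`) can be expressed as composite sort keys. The following is a minimal Python sketch, not the StellaOps adapter code: the `Component`/`Vulnerability` classes, the snake_case field names, and the handling of missing values (an absent `purl` sorts first, an absent score counts as 0.0) are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Component:
    # Only the CM1 sort keys are modeled; a real record carries more fields.
    name: str
    version: str
    purl: Optional[str] = None


@dataclass(frozen=True)
class Vulnerability:
    id: str
    source: str
    severity_score: Optional[float] = None  # stands in for `severityScore`
    cvss_vector: Optional[str] = None       # stands in for `cvss.vector`


def component_key(c: Component) -> tuple:
    # purl -> name -> version. Python compares str by Unicode code point,
    # so the order never depends on the host locale.
    # Assumption: a missing purl sorts as "", i.e. before any real purl.
    return (c.purl or "", c.name, c.version)


def vulnerability_key(v: Vulnerability) -> tuple:
    # id -> source -> severityScore (descending) -> cvss.vector.
    # Negating the score yields descending order; a missing score is
    # treated as 0.0 (assumption), so it sorts after any positive score.
    score = v.severity_score if v.severity_score is not None else 0.0
    return (v.id, v.source, -score, v.cvss_vector or "")


def sort_components(items: Iterable[Component]) -> List[Component]:
    return sorted(items, key=component_key)


def sort_vulnerabilities(items: Iterable[Vulnerability]) -> List[Vulnerability]:
    return sorted(items, key=vulnerability_key)
```

For the fields modeled here, the same set of records produces the same output list regardless of input order, which is the property the golden-hash fixtures rely on.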
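The CM3 freshness budget (maximum age of 7 days from `scanTimestamp`, with a logged override) reduces to a small pure check. In this Python sketch the names `check_freshness`, `allow_stale`, `StaleSnapshotError`, and the logger name are hypothetical; the spec does not name the override flag. The current time is passed in rather than read from the clock so the outcome is reproducible, and exactly 7 days old is assumed to still count as fresh.

```python
import logging
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=7)  # CM3 freshness budget
log = logging.getLogger("competitor_ingest")  # hypothetical logger name


class StaleSnapshotError(Exception):
    """Snapshot is older than the freshness budget and no override was given."""


def check_freshness(scan_timestamp: datetime, now: datetime, allow_stale: bool = False) -> str:
    # `now` is injected instead of read from the system clock so results are reproducible.
    if scan_timestamp.tzinfo is None or now.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware (UTC)")
    age = now - scan_timestamp
    if age <= MAX_AGE:
        return "fresh"
    if allow_stale:
        # Overrides are permitted but must leave an audit trail.
        log.warning("stale snapshot accepted by override: age=%s budget=%s", age, MAX_AGE)
        return "stale_override"
    raise StaleSnapshotError(f"snapshot age {age} exceeds budget {MAX_AGE}")


# Example: a scan taken 8 days before the check is only accepted with the override.
now = datetime(2025, 12, 3, tzinfo=timezone.utc)
print(check_freshness(now - timedelta(days=8), now, allow_stale=True))
```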
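The CM4 CI step (adapter → normalized output → hash comparison) can be sketched with the standard library. This is an assumption-laden Python illustration: the spec does not define the canonical JSON form, so sorted keys, compact separators, and UTF-8 are chosen here for illustration, and `GoldenHashMismatch`/`check_golden` are hypothetical names. Only SHA256 is shown, because BLAKE3 (the primary hash per the Decisions section) is not in Python's `hashlib` and would need a third-party package.

```python
import hashlib
import json


class GoldenHashMismatch(Exception):
    """Normalized output no longer matches its recorded golden hash."""


def canonical_bytes(doc) -> bytes:
    # Assumed canonical form: object keys sorted, no insignificant whitespace,
    # UTF-8 output. List order is NOT changed here; the adapter's CM1 sort
    # rules must already have ordered components and vulnerabilities.
    return json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def sha256_hex(doc) -> str:
    return hashlib.sha256(canonical_bytes(doc)).hexdigest()


def check_golden(doc, expected_sha256: str) -> None:
    # Any drift in mapping or ordering changes the bytes and therefore the hash.
    actual = sha256_hex(doc)
    if actual != expected_sha256:
        raise GoldenHashMismatch(f"expected {expected_sha256}, got {actual}")
```

Because `sort_keys` only normalizes object keys, a reordered component list still fails the check; that is intended, since ordering stability is one of the properties the golden hashes guard.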
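The CM6 fallback hierarchy, combined with the minimal-evidence rule and the fail-closed default from the Decisions section, can be written as a single decision function that records its trace. This Python sketch rests on one reading of the spec that should be confirmed: unverified inputs (steps 2 and 3) are assumed reachable only when the `--allow-unsigned` override is set, and otherwise are rejected with `signature_invalid`. The `Evidence`/`Decision` types, their fields, and the trace strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Evidence:
    # Hypothetical summary of an import payload; not a StellaOps type.
    has_sbom: bool = False
    sbom_signature_valid: bool = False
    has_tool_metadata: bool = False          # tool name/version/hash
    components_have_purl_sha256: bool = False
    has_scan_timestamp: bool = False
    has_scan_results: bool = False


@dataclass
class Decision:
    accepted: bool
    provenance: str = "unknown"              # "verified" or "unknown"
    step: Optional[int] = None               # CM6 step that decided the outcome
    reason_code: Optional[str] = None
    trace: List[str] = field(default_factory=list)


def decide(ev: Evidence, allow_unsigned: bool = False) -> Decision:
    trace: List[str] = []
    if not ev.has_sbom and not ev.has_scan_results:
        trace.append("step 4: no SBOM and no scan results")
        return Decision(False, step=4, reason_code="no_evidence", trace=trace)
    if ev.has_sbom and ev.sbom_signature_valid:
        trace.append("step 1: signed SBOM with valid provenance")
        return Decision(True, "verified", 1, trace=trace)
    if not allow_unsigned:
        # Fail-closed default: nothing unverified is accepted without the override.
        trace.append("no valid signature and override not set")
        return Decision(False, reason_code="signature_invalid", trace=trace)
    trace.append("override set: applying fallback hierarchy")
    minimal = ev.has_tool_metadata and ev.components_have_purl_sha256 and ev.has_scan_timestamp
    if ev.has_sbom and minimal:
        trace.append("step 2: unsigned SBOM meets minimal evidence")
        return Decision(True, "unknown", 2, trace=trace)
    if ev.has_sbom:
        trace.append("unsigned SBOM lacks minimal evidence; trying scan-only path")
    if ev.has_scan_results:
        trace.append("step 3: scan-only results, degraded confidence")
        return Decision(True, "unknown", 3, trace=trace)
    trace.append("step 4: no usable evidence")
    return Decision(False, step=4, reason_code="no_evidence", trace=trace)
```

Returning the trace alongside the outcome keeps the "explicit decision trace" from the CM6 deliverable as data that can be persisted in the normalized output metadata, rather than only as log lines.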
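The CM10 split between retryable and non-retryable errors, together with "retry/backoff" and "deterministic diagnostics", suggests a fixed backoff schedule derived only from the reason code. In this Python sketch the non-retryable codes come from the Error taxonomy section, while the snake_case names for the retryable classes (`network_unavailable`, `rate_limited`, `transient_io`) and the default base/cap/attempt values are hypothetical.

```python
from typing import List

# Retryable names are hypothetical spellings of the taxonomy's retryable classes.
RETRYABLE = frozenset({"network_unavailable", "rate_limited", "transient_io"})
# Non-retryable codes as listed in the CM10 taxonomy.
NON_RETRYABLE = frozenset({"signature_invalid", "schema_invalid", "unsupported_version", "no_evidence"})


def is_retryable(reason_code: str) -> bool:
    if reason_code in RETRYABLE:
        return True
    if reason_code in NON_RETRYABLE:
        return False
    # Unknown codes are a bug in the adapter, not a runtime condition to retry.
    raise ValueError(f"unknown reason code: {reason_code}")


def backoff_schedule(reason_code: str, max_attempts: int = 5, base_ms: int = 200, cap_ms: int = 5000) -> List[int]:
    # max_attempts counts the first try, so there are max_attempts - 1 waits.
    # Exponential growth capped at cap_ms, with no random jitter, so the same
    # inputs always produce the same schedule.
    if not is_retryable(reason_code):
        return []
    return [min(cap_ms, base_ms * 2 ** i) for i in range(max_attempts - 1)]


print(backoff_schedule("rate_limited"))  # [200, 400, 800, 1600]
```

Omitting jitter trades away protection against synchronized retries in exchange for reproducible diagnostics; that trade seems reasonable for an offline, single-consumer ingest path, but would be worth revisiting if the adapters ever call shared online services.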