Competitor Ingest Normalization (CM1)

Purpose

Define how external SBOM/scan outputs (Syft, Trivy, Clair) are normalized into StellaOps schemas with deterministic ordering, provenance checks, and offline-ready adapters. Covers CM1-CM10 in the 31-Nov-2025 findings advisory.

Scope

  • Import pipeline for external SBOM + vulnerability scan payloads.
  • Adapter mappings, validation, provenance/signature verification, and fallback rules.
  • Offline ingest kits (adapters + fixtures) and regression tests.

Deliverables (CM tasks)

  • CM1: Mapping tables per tool → StellaOps SBOM/scan schema; required/optional fields; deterministic sort rules.
  • CM2: Signature/provenance verification policy (acceptable algorithms, trust roots, failure modes).
  • CM3: Snapshot governance: versioning, freshness SLA, rollback plan for imported feeds.
  • CM4: Anomaly regression suite (schema drift, nullables, encoding, ordering). Golden fixtures + hashes.
  • CM5: Offline ingest kit: DSSE-signed adapters/mappings/fixtures with tool versions and hashes.
  • CM6: Fallback hierarchy when data is incomplete (signed SBOM → unsigned SBOM → scan → defaults) with explicit decision trace.
  • CM7: Source transparency fields (tool name/version/hash, build metadata) persisted and surfaced.
  • CM8: Benchmark parity plan with upstream tools (pinned versions, hash-logged runs).
  • CM9: Coverage matrix by ecosystem; gap tracker.
  • CM10: Retry/backoff/error taxonomy and deterministic diagnostics.

Determinism & Validation

  • Adapters must sort components and vulnerabilities deterministically (locale-invariant, stable keys).
  • All mapping rules and fixtures carry BLAKE3/SHA256 hashes; adapters are pure functions (no network).
  • Signature verification rejects unverifiable payloads; logs reason codes; can run offline using bundled trust roots.

Adapter mapping skeleton (CM1)

  • Tool coverage v0.1: Syft 1.0.x, Trivy 0.50.x, Clair 6.x (pin exact versions in fixtures).
  • Mapping tables (CSV, checked in under docs/modules/scanner/fixtures/competitor-adapters/):
    • component: external fields → name, version, purl, type, hashes, licenses, evidenceRef.
    • vulnerability: id, source, severity (normalized), cvss (score/vector), fixVersions, evidenceRef.
    • metadata: tool name/version/hash, scan timestamp (UTC), data source.
  • Sorting: components by purl → name → version; vulns by id → source → severityScore desc → cvss.vector.
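
The sort rules above could be sketched as follows. This is a minimal illustration, not the adapter implementation: the record shape (plain dicts), the helper names, and the reading of the vulnerability key as id, source, descending severityScore, then cvss.vector are all assumptions. Python compares strings by code point, so the result does not depend on the process locale.

```python
from typing import Any

Record = dict[str, Any]


def sort_components(components: list[Record]) -> list[Record]:
    """Order components by purl, then name, then version (hypothetical field names)."""
    return sorted(
        components,
        key=lambda c: (c.get("purl") or "", c.get("name") or "", c.get("version") or ""),
    )


def sort_vulnerabilities(vulns: list[Record]) -> list[Record]:
    """Order vulns by id, source, severityScore (descending), then cvss.vector."""
    return sorted(
        vulns,
        key=lambda v: (
            v.get("id") or "",
            v.get("source") or "",
            # Negate the score so higher scores sort first within the same id/source.
            -float(v.get("severityScore") or 0.0),
            (v.get("cvss") or {}).get("vector") or "",
        ),
    )
```

Missing fields are treated as empty strings (or a score of 0.0) so that incomplete records still sort to a fixed position; that choice is also an assumption.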

Verification policy (CM2)

  • Acceptable signature envelopes: DSSE/COSE/JWS, using SHA256 digests with Ed25519 or ECDSA signatures; trust roots bundled in offline kit.
  • Provenance check: require signer identity + hash match; if missing, mark provenance = unknown and apply fallback (CM6).
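
A sketch of the provenance check, assuming the cryptographic signature has already been verified elsewhere and only the signer identity and hash match remain. The `Provenance` type, the `"verified"` status, and the mapping of a hash or signer mismatch to `signature_invalid` are hypothetical; only `unknown` and `signature_invalid` appear in this document.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    signer: Optional[str]          # signer identity claimed by the envelope
    payload_sha256: Optional[str]  # hex digest the envelope claims for the payload


def check_provenance(
    payload: bytes,
    prov: Optional[Provenance],
    trusted_signers: frozenset[str],
) -> str:
    """Return 'verified', 'unknown', or 'signature_invalid' for a payload."""
    # Missing provenance data: mark as unknown so the fallback rules (CM6) apply.
    if prov is None or not prov.signer or not prov.payload_sha256:
        return "unknown"
    # The digest recorded in the envelope must match the payload bytes.
    if hashlib.sha256(payload).hexdigest() != prov.payload_sha256.lower():
        return "signature_invalid"
    # The signer must be one of the bundled trust roots (offline, no network).
    if prov.signer not in trusted_signers:
        return "signature_invalid"
    return "verified"
```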

Snapshot governance (CM3)

  • Freshness budget: max age 7 days from scanTimestamp; reject older snapshots unless an override flag is set (logged).
  • Versioning: stored as snapshot_version (semver) and source_tool_hash; rollback plan requires prior snapshot hash.
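
The freshness budget could be checked along these lines. The function name, the `allow_stale` parameter, and the returned reason strings are hypothetical, and treating a snapshot of exactly 7 days as still fresh is an assumption. The current time is passed in explicitly so the check stays a pure function.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=7)  # freshness budget from the snapshot governance rule


def is_fresh(
    scan_timestamp: datetime,
    now: datetime,
    allow_stale: bool = False,
) -> tuple[bool, str]:
    """Return (accepted, reason) for a snapshot with the given scan timestamp."""
    if scan_timestamp.tzinfo is None or now.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware (UTC)")
    if now - scan_timestamp <= MAX_AGE:
        return True, "fresh"
    if allow_stale:
        # The override path is accepted but should be logged by the caller.
        return True, "stale_override"
    return False, "stale_rejected"
```

For example, `is_fresh(ts, datetime.now(timezone.utc))` would be the call at ingest time, while tests can pin `now` to a fixed value.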

Regression + fixtures (CM4/CM5)

  • Fixtures under docs/modules/scanner/fixtures/competitor-adapters/fixtures/ with golden hashes (BLAKE3/SHA256) and expected normalized output.
    • normalized-syft.json BLAKE3=aa42c167d19535709a10df73dc39e6a50b8efbbb0ae596d17183ce62676fa85a SHA256=3f8684ff341808dcb92e97dd2c10acca727baaff05182e81a4364bb3dad0eaa7
    • normalized-trivy.json BLAKE3=0da216b49ebcf823d8d4aa3c9c1d2a1dcc579d836ba66bb2ae94dd781e214130 SHA256=c29aa6251d378c2aca1c3c6165e61bd2e16b6fa1227c976417b8a525ad7c1fc1
    • normalized-clair.json BLAKE3=92985f4cbdeecc8a0e585a70e07f17b07abdd866eecacaca9ba1b331f4b3af68 SHA256=bc232cc19885c53e4d801f5c830e3683a4031e42f6421739c4cc221f33f15e01
  • CI step runs adapter → normalized → hash compare; offline, no network. Hashes act as guardrails for deterministic ordering and mapping stability.
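
A minimal sketch of the hash-compare step, assuming normalized output is serialized as JSON with sorted keys and compact separators. The canonical form actually used for the fixtures is not specified here, so this sketch is not expected to reproduce the hashes listed above. Only SHA256 is shown because BLAKE3 is not in the Python standard library and would need a third-party package.

```python
import hashlib
import json
from typing import Any


def canonical_bytes(doc: Any) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace (assumed canonical form)."""
    return json.dumps(
        doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def sha256_hex(doc: Any) -> str:
    """SHA256 of the canonical serialization, as lowercase hex."""
    return hashlib.sha256(canonical_bytes(doc)).hexdigest()


def matches_golden(doc: Any, golden_sha256: str) -> bool:
    """Compare normalized output against a golden fixture hash."""
    return sha256_hex(doc) == golden_sha256.lower()
```

Key order in the input does not affect the hash, but list order does, which is why the adapters must sort components and vulnerabilities before hashing.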

Fallback hierarchy (CM6)

  1. Signed SBOM w/ valid provenance → accepted.
  2. Unsigned SBOM → accepted with provenance=unknown, warnings emitted.
  3. Scan-only results → accepted with degraded confidence; policy lattice may penalize.
  4. If all are absent: reject with reason code no_evidence.
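
The hierarchy above, with its decision trace, might look like this in code. The `Decision` type, the path labels, and the trace wording are hypothetical; `no_evidence` and `provenance=unknown` come from this document.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Decision:
    accepted: bool
    path: str                     # which step of the hierarchy was taken
    provenance: str               # "verified" or "unknown"
    reason: Optional[str] = None  # reason code when rejected
    trace: list[str] = field(default_factory=list)


def choose_evidence(signed_sbom_valid: bool, has_unsigned_sbom: bool, has_scan: bool) -> Decision:
    """Walk the fallback hierarchy in order and record each step in the trace."""
    trace: list[str] = []
    if signed_sbom_valid:
        trace.append("signed_sbom: valid provenance, accepted")
        return Decision(True, "signed_sbom", "verified", trace=trace)
    trace.append("signed_sbom: absent or unverifiable")
    if has_unsigned_sbom:
        trace.append("unsigned_sbom: accepted, warning emitted")
        return Decision(True, "unsigned_sbom", "unknown", trace=trace)
    trace.append("unsigned_sbom: absent")
    if has_scan:
        trace.append("scan_only: accepted with degraded confidence")
        return Decision(True, "scan_only", "unknown", trace=trace)
    trace.append("scan_only: absent")
    return Decision(False, "none", "unknown", reason="no_evidence", trace=trace)
```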

Transparency & coverage (CM7-CM9)

  • Persist: source.tool, source.version, source.hash, adapter.version, normalized_hash.
  • Coverage matrix maintained in docs/modules/scanner/fixtures/competitor-adapters/coverage.csv (ecosystem yes/no, notes). Current snapshot (2025-12-03): container/java/python/go/os rows populated; dotnet pending Syft/Clair support.
  • Bench parity (CM8): pin upstream versions; store run hashes/logs in fixtures folder.

Error taxonomy (CM10)

  • Retryable: network/unavailable (should not occur in offline mode), rate-limit, transient IO.
  • Non-retryable: signature_invalid, schema_invalid, unsupported_version, no_evidence.
  • All errors must carry deterministic reason codes and be logged in normalized output metadata.
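
One way to encode the taxonomy is a pair of fixed code sets plus a jitter-free backoff schedule, so that retries are reproducible. The non-retryable codes are taken from the list above; the spellings of the retryable codes and the backoff parameters (base, cap) are hypothetical.

```python
# Spellings of the retryable codes are assumed; this document only names the categories.
RETRYABLE = frozenset({"network_unavailable", "rate_limit", "transient_io"})
NON_RETRYABLE = frozenset(
    {"signature_invalid", "schema_invalid", "unsupported_version", "no_evidence"}
)


def is_retryable(code: str) -> bool:
    """Classify a reason code; unknown codes are an error, never a silent default."""
    if code in RETRYABLE:
        return True
    if code in NON_RETRYABLE:
        return False
    raise ValueError(f"unknown reason code: {code}")


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff in seconds without jitter, capped at `cap`."""
    return [min(cap, base * 2**i) for i in range(attempts)]
```

Omitting jitter trades some thundering-herd protection for identical retry timing across runs, which seems closer to the deterministic-diagnostics goal.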

Offline kit (CM5)

  • Kit contents: adapter CSVs (one per tool), fixtures + hashes above, coverage matrix, trust roots, signature policy, retry taxonomy, and DSSE envelope referencing every file hash. Bundle path: out/offline/competitor-ingest-kit-v1/.
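
The list of file hashes that the DSSE envelope references could be built as below. This sketch covers only the manifest; DSSE signing is out of scope, and the manifest shape (`path`, `sha256`) is an assumption rather than the kit's actual format.

```python
import hashlib
from collections.abc import Mapping


def build_manifest(files: Mapping[str, bytes]) -> list[dict[str, str]]:
    """Return one entry per file, sorted by relative path for stable output."""
    return [
        {"path": path, "sha256": hashlib.sha256(files[path]).hexdigest()}
        for path in sorted(files)
    ]
```

Taking file contents as a mapping keeps the function free of filesystem access, so the same inputs always yield the same manifest.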

Decisions (2025-12-03)

  • Minimal evidence for unsigned SBOM acceptance: must include tool metadata (name/version/hash), component list with purl + SHA256, and scan timestamp; otherwise fall back to the scan-only path (CM6 step 3).
  • Canonical adapter output hash: BLAKE3 primary, SHA256 secondary; both recorded in fixture hash lists and surfaced in normalized metadata (normalized_hash).
  • Signature verification policy is strict fail-closed unless the --allow-unsigned flag is explicitly set; the fallback hierarchy is still applied but logged.
  • Sprint: docs/implplan/SPRINT_0186_0001_0001_record_deterministic_execution.md (CM1-CM10)
  • Advisory: docs/product-advisories/31-Nov-2025 FINDINGS.md
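
The minimal-evidence rule for unsigned SBOMs in the decisions above could be checked roughly as follows. The field names (`tool`, `scanTimestamp`, `components`, `hashes.sha256`) are hypothetical, and requiring a non-empty component list is an assumption.

```python
from typing import Any


def has_minimal_evidence(sbom: dict[str, Any]) -> bool:
    """True if an unsigned SBOM carries tool metadata, a timestamp, and hashed components."""
    tool = sbom.get("tool") or {}
    if not all(tool.get(k) for k in ("name", "version", "hash")):
        return False
    if not sbom.get("scanTimestamp"):
        return False
    components = sbom.get("components") or []
    if not components:  # assumption: an empty component list is not enough evidence
        return False
    # Every component needs both a purl and a SHA256 digest.
    return all(
        bool(c.get("purl")) and bool((c.get("hashes") or {}).get("sha256"))
        for c in components
    )
```

When this returns False, the caller would take the scan-only path (CM6 step 3) instead of accepting the SBOM.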