# Data Flows

This document describes the data flows in StellaOps for SBOM generation, advisory ingestion, VEX processing, and policy evaluation, along with the event-driven and offline (air-gap) flows that connect them. All flows are designed for deterministic, offline-first operation.

## Table of Contents

- [1. SBOM Data Lifecycle](#1-sbom-data-lifecycle)
- [2. Advisory Data Flow](#2-advisory-data-flow)
- [3. VEX Data Flow](#3-vex-data-flow)
- [4. Policy Evaluation Data Flow](#4-policy-evaluation-data-flow)
- [5. Event-Driven Flows](#5-event-driven-flows)
- [6. Offline/Air-Gap Data Flow](#6-offlineair-gap-data-flow)

---

## 1. SBOM Data Lifecycle

### 1.1 Generation Phase (Scanner)

```
Scanner.Worker

Image Input (OCI reference)
           |
           v
+----------------------+     +----------------------+     +----------------------+
| Layer Extraction     |---->| Delta Cache Check    |---->| Analyzer Execution   |
| (OCI manifest)       |     | (Valkey layers:*)    |     | (11 languages + OS)  |
+----------------------+     +----------------------+     +----------------------+
                                        |                            |
                                        | Cache Hit                  | Cache Miss
                                        v                            v
                               +------------------+         +------------------+
                               | Stitch Existing  |         | Full Analysis    |
                               | SBOM Fragments   |         |                  |
                               | (20ms fast path) |         |                  |
                               +------------------+         +------------------+
                                        |                            |
                                        +-------------+--------------+
                                                      |
                                                      v
+------------------------------------------------------------------------------------+
| Component Discovery                                                                |
|                                                                                    |
| +-----------------+  +-----------------+  +-----------------+  +-----------------+ |
| | OS Packages     |  | Language Deps   |  | Native Bins     |  | Call Graphs     | |
| | (Apk/Dpkg/Rpm)  |  | (11 ecosystems) |  | (ELF/PE/Mach-O) |  | (Reachability)  | |
| +-----------------+  +-----------------+  +-----------------+  +-----------------+ |
+------------------------------------------------------------------------------------+
                                           |
                                           v
+------------------------------------------------------------------------------------+
| SBOM Generation (Two Views)                                                        |
|                                                                                    |
| Inventory View:                           Usage View:                              |
| - All components in filesystem            - Entrypoint closure only                |
| - Declared + transitive + vendored        - Actually linked libraries              |
| - Path: images/{digest}/inventory.*       - Path: images/{digest}/usage.*          |
+------------------------------------------------------------------------------------+
                                           |
                                           v
+------------------------------------------------------------------------------------+
| Format Output                                                                      |
|                                                                                    |
| +--------------------+  +--------------------+  +--------------------+             |
| | CycloneDX 1.6 JSON |  | CycloneDX Protobuf |  | SPDX 3.0.1 JSON    |             |
| | (.cdx.json)        |  | (.cdx.pb, compact) |  | (.spdx.json)       |             |
| +--------------------+  +--------------------+  +--------------------+             |
+------------------------------------------------------------------------------------+
```

### 1.2 Storage Phase

```
                               Dual-Write Coordination
                                          |
                    +---------------------+---------------------+
                    v                                           v
+----------------------------------------+  +----------------------------------------+
| PostgreSQL (scanner schema)            |  | RustFS (S3 API)                        |
|                                        |  |                                        |
| artifacts table:                       |  | Blob Layout:                           |
|   - artifact_id (sha256)               |  |   blobs/{sha256_prefix}/               |
|   - image_digest                       |  |     sbom.json         (payload)        |
|   - format (cdx-json, spdx-json, etc.) |  |     sbom.meta.json    (wrapper)        |
|   - created_at                         |  |     sbom.cdx.pb       (binary)         |
|   - rekor_proof (optional)             |  |                                        |
|                                        |  | Wrapper Envelope:                      |
| images table:                          |  |   {                                    |
|   - image_digest                       |  |     "id": "sha256:417f...",            |
|   - repository                         |  |     "imageDigest": "sha256:e2b9...",   |
|   - tag                                |  |     "format": "cdx-json",              |
|   - architecture                       |  |     "layers": ["sha256:..."],          |
|                                        |  |     "partial": false,                  |
| layers table:                          |  |     "provenanceId": "prov_0291"        |
|   - layer_digest                       |  |   }                                    |
|   - media_type                         |  |                                        |
|   - size                               |  |                                        |
+----------------------------------------+  +----------------------------------------+
```

### 1.3 Index & Cache Phase (Valkey)

```
Valkey Keyspace for SBOM

+-----------------------+--------------------------------+
| Key Pattern           | Purpose                        |
+-----------------------+--------------------------------+
| scan:{digest}         | Last scan JSON result          |
| layers:{digest}       | Set of layer digests (90d TTL) |
| locator:{imageDigest} | sbomBlobId mapping (30d TTL)   |
+-----------------------+--------------------------------+

Delta SBOM Flow:
  1. Check layers:{digest} for cached layers
  2. Scan only missing layers (partial=true)
  3. Stitch new data onto the cached full SBOM
  4. Update the locator mapping
  5. Fast path: 20ms for unchanged layers
```

### 1.4 Consumption Phase

```
SBOM Consumers

+---------------+     +---------------+     +---------------+
| Policy Engine |     | Export Center |     | Replay Engine |
+---------------+     +---------------+     +---------------+
        |                     |                     |
        v                     v                     v
+---------------+     +---------------+     +---------------+
| Read SBOM     |     | Retrieve      |     | Link to       |
| from RustFS   |     | by digest     |     | scan ID       |
+---------------+     +---------------+     +---------------+
        |                     |                     |
        v                     v                     v
+---------------+     +---------------+     +---------------+
| Join with     |     | Convert       |     | Replay with   |
| advisories    |     | format        |     | same inputs   |
+---------------+     +---------------+     +---------------+
        |                     |                     |
        v                     v                     v
+---------------+     +---------------+     +---------------+
| Apply         |     | Generate      |     | Verify        |
| reachability  |     | SARIF         |     | determinism   |
+---------------+     +---------------+     +---------------+
```

---

## 2. Advisory Data Flow

### 2.1 Ingestion (Concelier)

```
Advisory Ingestion Pipeline

External Sources              Concelier.Worker
+---------------+             +------------------------------------------+
| NVD           |------------>| 1. Fetch advisories (HTTP/mirror)        |
| Red Hat       |             | 2. For air-gap: use mirror bundles first |
| OSV           |             | 3. Validate schema conformance           |
| GHSA          |             | 4. Normalize to canonical observations   |
| CSAF sources  |             | 5. Apply AOC (Aggregation-Only Contract) |
+---------------+             | 6. Persist raw documents (append-only)   |
                              | 7. Build linksets (advisory -> PURL)     |
                              | 8. Publish delta event                   |
                              +------------------------------------------+
                                                    |
                                                    v
+-------------------------------------------------------------------------------+
| PostgreSQL (vuln schema)                                                      |
|                                                                               |
| advisory_raw (append-only):           linksets:                               |
|   - raw_document (JSON as-received)     - advisory_id -> purl[]               |
|   - source (NVD, RED_HAT, OSV, etc.)    - Used for SBOM join in Policy Engine |
|   - advisory_id (CVE-2024-xxxx)                                               |
|   - affected_purls                    observations:                           |
|   - published_at (UTC)                  - Normalized advisory metadata        |
|   - revision                            - Severity, description, references   |
+-------------------------------------------------------------------------------+
                          |
                          v
+---------------------------------------------------+
| Event: concelier:drift (Valkey Stream)            |
|                                                   |
| Triggers:                                         |
|   - Scheduler: identifies affected scans          |
|   - Policy Engine: re-evaluates impacted findings |
|   - Notify: critical vuln alerts                  |
+---------------------------------------------------+
```

### 2.2 Advisory Data Model

```
Raw Layer (immutable, AOC-enforced)

{
  "advisoryId": "CVE-2024-1234",
  "source": "NVD",
  "rawDocument": { /* original JSON as received */ },
  "publishedAt": "2024-01-15T10:00:00Z",
  "revision": 2,
  "affectedPurls": [
    "pkg:npm/lodash@4.17.20",
    "pkg:maven/org.apache.struts/struts2-core@2.5.30"
  ],
  "severity": {
    "cvssV3": { "baseScore": 9.8, "vector": "..." },
    "cvssV4": { "baseScore": 9.2, "vector": "..." }
  }
}

Key Constraints:
  - Raw documents are NEVER modified after ingestion
  - Conflicts are preserved, not collapsed
  - Multiple sources for the same CVE are stored separately
  - Provenance is tracked per observation
```

---

## 3. VEX Data Flow

### 3.1 VEX Ingestion (Excititor)

```
VEX Ingestion Pipeline

External Sources              Excititor.Worker
+----------------+            +------------------------------------------+
| OpenVEX        |----------->| 1. Fetch VEX statements                  |
| CSAF VEX       |            | 2. For air-gap: use offline bundles      |
| SBOM referrers |            | 3. Verify signatures (if signed)         |
| Vendor feeds   |            | 4. Normalize to canonical shape          |
+----------------+            | 5. Persist immutable raw statements      |
                              | 6. Publish to VexLens for consensus      |
                              +------------------------------------------+
                                                    |
                                                    v
+----------------------------------------------------------+
| PostgreSQL (vex schema)                                  |
|                                                          |
| vex_raw (append-only):                                   |
|   - raw_statement (OpenVEX JSON as-received)             |
|   - issuer_id (vendor or trust issuer)                   |
|   - component_purl                                       |
|   - vulnerability_id (CVE or GHSA)                       |
|   - status (not_affected, affected, under_investigation) |
|   - justification (component_not_present,                |
|       vulnerable_code_not_present, etc.)                 |
|   - published_at                                         |
|   - signature (DSSE envelope if signed)                  |
+----------------------------------------------------------+
```

### 3.2 VEX Consensus (VexLens)

```
VEX Consensus Pipeline

+------------------+
| Multiple VEX     |
| statements for   |
| same (CVE, PURL) |
+------------------+
          |
          v
+------------------------------------------------------------------+
| VexLens Consensus Engine                                         |
|                                                                  |
| 1. Merge observations by component identity (PURL)               |
| 2. Apply issuer priority rules:                                  |
|    - Vendor > Distro > Researcher > Community                    |
| 3. Apply trust scores (from IssuerDirectory)                     |
| 4. Detect conflicts (multiple issuers disagree)                  |
| 5. Preserve conflict state (K4 lattice: True + False = Conflict) |
| 6. Export consensus outcomes                                     |
+------------------------------------------------------------------+
          |
          v
+------------------------------------------------------------------+
| Policy Engine Integration                                        |
|                                                                  |
| VEX gates override severity thresholds:                          |
|   - not_affected        -> PASS (with evidence)                  |
|   - affected            -> normal evaluation continues           |
|   - under_investigation -> WARN (pending)                        |
+------------------------------------------------------------------+
```
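The consensus steps above can be sketched in Python. The sketch assumes that "K4" refers to a Belnap-style four-valued lattice (unknown, true, false, conflict) in which joining `True` and `False` yields `Conflict`. It is a minimal illustration, not the VexLens implementation. The following pieces are hypothetical:

- the `VexStatement` shape;
- the mapping of `under_investigation` to "no information";
- the `min_trust` cutoff, which stands in for IssuerDirectory trust scores;
- the rule that the highest-priority issuer tier with a definite status decides the outcome.

Because the join is commutative and associative, the result does not depend on the order in which statements arrive.

```python
from dataclasses import dataclass
from enum import Enum


class K4(Enum):
    """Four-valued lattice ordered by information: UNKNOWN < TRUE, FALSE < CONFLICT."""
    UNKNOWN = "unknown"
    TRUE = "affected"
    FALSE = "not_affected"
    CONFLICT = "conflict"


def join(a: K4, b: K4) -> K4:
    """Least upper bound: UNKNOWN is the identity and TRUE + FALSE = CONFLICT."""
    if a is b or b is K4.UNKNOWN:
        return a
    if a is K4.UNKNOWN:
        return b
    return K4.CONFLICT


# Hypothetical mapping from VEX status to lattice value.
STATUS_TO_K4 = {
    "affected": K4.TRUE,
    "not_affected": K4.FALSE,
    "under_investigation": K4.UNKNOWN,
}

# Lower rank = higher priority: Vendor > Distro > Researcher > Community.
ISSUER_RANK = {"vendor": 0, "distro": 1, "researcher": 2, "community": 3}


@dataclass(frozen=True)
class VexStatement:
    """Hypothetical, trimmed-down view of one vex_raw row for a single (CVE, PURL)."""
    issuer_kind: str  # "vendor", "distro", "researcher" or "community"
    trust: float      # 0.0-1.0, e.g. an IssuerDirectory trust score
    status: str       # "affected", "not_affected" or "under_investigation"


def consensus(statements: list[VexStatement], min_trust: float = 0.5) -> K4:
    """Consensus for one (CVE, PURL): the highest-priority informative tier wins."""
    trusted = [s for s in statements if s.trust >= min_trust]
    # Walk issuer tiers from highest to lowest priority.
    for rank in sorted({ISSUER_RANK[s.issuer_kind] for s in trusted}):
        value = K4.UNKNOWN
        for s in trusted:
            if ISSUER_RANK[s.issuer_kind] == rank:
                value = join(value, STATUS_TO_K4[s.status])
        if value is not K4.UNKNOWN:
            return value  # CONFLICT is returned as-is, never collapsed
    return K4.UNKNOWN


vendor_not_affected = VexStatement("vendor", 0.9, "not_affected")
community_affected = VexStatement("community", 0.6, "affected")
other_vendor_affected = VexStatement("vendor", 0.8, "affected")

print(consensus([vendor_not_affected, community_affected]))     # K4.FALSE
print(consensus([vendor_not_affected, other_vendor_affected]))  # K4.CONFLICT
```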
---

## 4. Policy Evaluation Data Flow

### 4.1 Input Assembly

```
Policy Engine Input Sources (all immutable, AOC-enforced)

+----------------+  +----------------+  +----------------+  +----------------+
| SBOM           |  | Advisory       |  | VEX            |  | Reachability   |
| (from RustFS)  |  | (from vuln.*)  |  | (from vex.*)   |  | (from Scanner) |
+----------------+  +----------------+  +----------------+  +----------------+
         |                   |                   |                   |
         +-------------------+---------+---------+-------------------+
                                       |
                                       v
+---------------------------------------------------------------------+
| Selection Layer                                                     |
|                                                                     |
| Deterministic Joining:                                              |
|   - SBOM <-> Advisory (by PURL matching)                            |
|   - Advisory <-> VEX (by CVE + PURL)                                |
|   - Component <-> Reachability (by identity)                        |
|                                                                     |
| Batch Ordering:                                                     |
|   - Sort by (tenant, policyId, vulnerabilityId, productKey, source) |
|   - Enables incremental cursor-based processing                     |
+---------------------------------------------------------------------+
```

### 4.2 Evaluation Pipeline

```
Policy Evaluation Pipeline

+-----------------------------------------------------+
| 1. Load Policy IR (cached by policyId+version hash) |
+-----------------------------------------------------+
             |
             v
2. For each batch (component, vulnerability):
             |
             v
   +-------------------+
   | Evidence-Weighted |     severityWeight     (from advisory CVSS)
   | Score Computation |---> trustWeight        (from VEX issuer)
   +-------------------+     reachabilityWeight (from Scanner entrypoint closure)
             |               runtimeWeight      (from Zastava signals)
             v
   +-------------------+
   | Policy Rules      |     First-match semantics
   | Execution         |---> Actions: assign, annotate, escalate, warn
   +-------------------+
             |
             v
   +-------------------+
   | Exception Apply   |     Specificity-ranked
   |                   |---> Effects: suppress, defer, downgrade, require-control
   +-------------------+
             |
             v
   +-------------------+
   | Unknown Budget    |     Per-environment limits
   | Check             |---> Block if exceeded, warn if approaching
   +-------------------+
             |
             v
   +-------------------+
   | Confidence Calc   |     5 factors: reachability, runtime, VEX, provenance, policy
   |                   |---> Final = Clamp01(Sum(Weight x RawValue))
   +-------------------+     Tiers: VeryHigh (>= 0.9), High (>= 0.7), Medium (>= 0.5), etc.
             |
             v
   +-------------------+
   | VEX Decision      |     Emit OpenVEX statements for verdict changes
   | Emission          |---> DSSE-signed, logged to Rekor v2
   +-------------------+
```

### 4.3 Output Materialization

```
Finding Materialization

+--------------------------------------+  +---------------------------------------------+
| Current Snapshot                     |  | History (Audit Trail)                       |
| policy.effective_finding_{policyId}  |  | policy.effective_finding_{policyId}_history |
|                                      |  |                                             |
| - finding_key (deterministic digest) |  | - All snapshots with timestamps             |
| - severity (CRITICAL, HIGH, etc.)    |  | - Previous verdicts                         |
| - source (NVD, vendor, research)     |  | - Provenance chain                          |
| - advisory_raw_ids (back-link)       |  |                                             |
| - vex_raw_ids (if overridden)        |  |                                             |
| - sbom_component_id (inventory link) |  |                                             |
| - verdict (PASS, BLOCK, WARN, FAIL)  |  |                                             |
| - confidence_score (0-100)           |  |                                             |
| - explained_trace (policy rule hits) |  |                                             |
+--------------------------------------+  +---------------------------------------------+

Determinism hash:
  SHA256(policyVersion + batchCursor + inputsHash)
  -> Same inputs always produce the same outputs
```
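The determinism hash above can be illustrated with Python's standard `hashlib` and `json` modules. This is a sketch under stated assumptions:

- `inputs_hash` and `determinism_hash` are hypothetical helpers.
- Inputs are canonicalized as sorted, compact JSON records.
- The three fields are joined with an ASCII unit separator rather than the plain concatenation (`+`) shown above. Plain concatenation lets `("ab", "c")` and `("a", "bc")` produce the same preimage, so the separator keeps field boundaries unambiguous.

```python
import hashlib
import json

FIELD_SEPARATOR = "\x1f"  # ASCII unit separator; assumed never to occur inside a field


def inputs_hash(inputs: list[dict]) -> str:
    """SHA-256 over canonical JSON records; independent of the order of `inputs`."""
    records = sorted(
        json.dumps(item, sort_keys=True, separators=(",", ":")) for item in inputs
    )
    return hashlib.sha256("\n".join(records).encode("utf-8")).hexdigest()


def determinism_hash(policy_version: str, batch_cursor: str, inputs_digest: str) -> str:
    """SHA256(policyVersion + batchCursor + inputsHash) with explicit field boundaries."""
    material = FIELD_SEPARATOR.join([policy_version, batch_cursor, inputs_digest])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


# Hypothetical batch, policy version, and cursor values for illustration.
batch = [
    {"purl": "pkg:npm/lodash@4.17.20", "advisoryId": "CVE-2024-1234"},
    {"purl": "pkg:maven/org.apache.struts/struts2-core@2.5.30", "advisoryId": "CVE-2024-1234"},
]
digest = inputs_hash(batch)
assert digest == inputs_hash(list(reversed(batch)))  # input order does not matter

print(determinism_hash("policy-v42", "cursor-0001", digest))
```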
---

## 5. Event-Driven Flows

### 5.1 Event Bus Architecture

```
Valkey Streams / NATS JetStream Event Bus

+------------------------------------------------------------------------+
| Stream: scanner:events                                                 |
| Events: scan.submitted, scan.started, scan.completed, scan.failed      |
| Consumers: Policy, Notify, TimelineIndexer, ExportCenter               |
+------------------------------------------------------------------------+

+------------------------------------------------------------------------+
| Stream: concelier:drift                                                |
| Events: advisory.new, advisory.updated, advisory.withdrawn             |
| Consumers: Scheduler, Policy, Notify                                   |
+------------------------------------------------------------------------+

+------------------------------------------------------------------------+
| Stream: policy:evaluated                                               |
| Events: evaluation.completed, verdict.changed, exception.applied       |
| Consumers: Notify, Findings, ExportCenter                              |
+------------------------------------------------------------------------+

+------------------------------------------------------------------------+
| Stream: scheduler:jobs                                                 |
| Events: run.started, run.completed, run.failed, rescan.triggered       |
| Consumers: Scanner, Notify, TimelineIndexer                            |
+------------------------------------------------------------------------+

+------------------------------------------------------------------------+
| Stream: notify:delivery                                                |
| Events: notification.sent, notification.failed, notification.throttled |
| Consumers: Audit, TimelineIndexer                                      |
+------------------------------------------------------------------------+
```
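Valkey Streams (consumer groups) and NATS JetStream (durable consumers) both let each subscribing service track its own read position and acknowledge entries independently. That is what allows several consumers to share one stream. The toy in-memory model below illustrates only that fan-out behavior. It does not use either client API, and `InMemoryStream`, `read`, and `ack` are hypothetical names.

```python
from collections import defaultdict


class InMemoryStream:
    """Toy append-only stream where each consumer group keeps its own cursor."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entries: list[tuple[int, dict]] = []
        self.cursors: dict[str, int] = defaultdict(int)       # next index per group
        self.pending: dict[str, set[int]] = defaultdict(set)  # delivered, not yet acked

    def publish(self, event: dict) -> int:
        entry_id = len(self.entries) + 1
        self.entries.append((entry_id, event))
        return entry_id

    def read(self, group: str, count: int = 10) -> list[tuple[int, dict]]:
        """Deliver up to `count` new entries to one group and mark them pending."""
        start = self.cursors[group]
        batch = self.entries[start:start + count]
        self.cursors[group] = start + len(batch)
        self.pending[group].update(entry_id for entry_id, _ in batch)
        return batch

    def ack(self, group: str, entry_id: int) -> None:
        self.pending[group].discard(entry_id)


stream = InMemoryStream("scanner:events")
stream.publish({"type": "scan.started", "imageDigest": "sha256:e2b9..."})
stream.publish({"type": "scan.completed", "imageDigest": "sha256:e2b9..."})

# Every consumer listed for scanner:events receives both events independently.
for group in ("Policy", "Notify", "TimelineIndexer", "ExportCenter"):
    for entry_id, event in stream.read(group):
        print(group, event["type"])
        stream.ack(group, entry_id)
```

Each group advances its own cursor. Adding or restarting a consumer such as ExportCenter therefore does not change what Policy or Notify receive. Entries that were delivered but never acknowledged stay in that group's `pending` set.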
### 5.2 Scan Completion Event Flow

```
                              scan.completed event
                                        |
         +--------------------+---------+----------+--------------------+
         |                    |                    |                    |
         v                    v                    v                    v
+-----------------+  +-----------------+  +-----------------+  +-----------------+
| Policy.Engine   |  | Notify.Worker   |  | TimelineIndexer |  | ExportCenter    |
+-----------------+  +-----------------+  +-----------------+  +-----------------+
| Evaluate with   |  | Check rules     |  | Index event     |  | Generate SARIF  |
| new SBOM data   |  | for matches     |  | for audit       |  | if configured   |
+-----------------+  +-----------------+  +-----------------+  +-----------------+
         |                    |                    |
         v                    v                    v
 policy:evaluated     notification.sent      event indexed
         |                    |
         +---------+----------+
                   |
                   v
         Downstream consumers
```

---

## 6. Offline/Air-Gap Data Flow

### 6.1 Offline Kit Contents

```
Offline Update Kit Structure

offline-kit-2025-01-02/
+-- feeds/
|   +-- nvd/            # NVD advisory snapshots
|   +-- osv/            # OSV advisory snapshots
|   +-- ghsa/           # GHSA advisory snapshots
|   +-- vex/            # VEX statement bundles
+-- images/             # Container images for platform services
+-- sboms/              # Pre-generated SBOMs for bundled images
+-- signatures/         # DSSE-signed bundles
|   +-- feeds.dsse      # Signed feed manifest
|   +-- images.dsse     # Signed image manifest
+-- trust-roots/        # CA certificates, JWKS
+-- policies/           # Default policy definitions
+-- manifest.json       # Kit contents and checksums
+-- manifest.dsse       # Signed manifest
```

### 6.2 Offline Ingestion Flow

```
Air-Gap Ingestion Pipeline

USB/Portable Media            AirGap.Importer
+---------------+             +------------------------------------------+
| Offline Kit   |------------>| 1. Verify manifest signature (DSSE)      |
| (signed)      |             | 2. Validate checksums                    |
+---------------+             | 3. Import feeds to Concelier             |
                              | 4. Import VEX to Excititor               |
                              | 5. Update trust roots                    |
                              | 6. Trigger re-evaluation                 |
                              +------------------------------------------+

Key Guarantees:
  - No network calls during import
  - All verification is local
  - Deterministic outputs match online mode
  - Full audit trail preserved
```

---

## Related Documentation

- [User Flows](user-flows.md)
- [Module Matrix](module-matrix.md)
- [Schema Mapping](schema-mapping.md)
- [Data Schemas](../../11_DATA_SCHEMAS.md)
- [Offline Kit](../../OFFLINE_KIT.md)