Data Flows

This document details the data flows for SBOM generation, advisory ingestion, policy evaluation, and VEX processing in StellaOps. All flows are designed for deterministic, offline-first operation.

1. SBOM Data Lifecycle

1.1 Generation Phase (Scanner)

Image Input (OCI reference)
         |
         v
+--------------------------------------------------------------------------------------------+
|  Scanner.Worker                                                                            |
|                                                                                            |
|  +----------------------+     +----------------------+     +----------------------+        |
|  |  Layer Extraction    |---->|  Delta Cache Check   |---->|  Analyzer Execution  |        |
|  |  (OCI manifest)      |     |  (Valkey layers:*)   |     |  (11 language + OS)  |        |
|  +----------------------+     +----------------------+     +----------------------+        |
|                                       |                             |                      |
|                                       | Cache Hit                   | Cache Miss           |
|                                       v                             v                      |
|                            +------------------+           +------------------+             |
|                            | Stitch Fragments |           | Full Analysis    |             |
|                            | (20ms fast path) |           | (missing layers) |             |
|                            +------------------+           +------------------+             |
|                                       |                             |                      |
|                                       +-------------+---------------+                      |
|                                                     |                                      |
|                                                     v                                      |
|  +-----------------------------------------------------------------------------------------+
|  |  Component Discovery                                                                    |
|  |                                                                                         |
|  |  +---------------+  +---------------+  +---------------+  +---------------+             |
|  |  | OS Packages   |  | Language Deps |  | Native Bins   |  | Call Graphs   |             |
|  |  | (Apk/Dpkg/Rpm)|  |(11 ecosystems)|  | (ELF/PE/Mach) |  | (Reachability)|             |
|  |  +---------------+  +---------------+  +---------------+  +---------------+             |
|  +-----------------------------------------------------------------------------------------+
|                                                     |                                      |
|                                                     v                                      |
|  +-----------------------------------------------------------------------------------------+
|  |  SBOM Generation (Two Views)                                                            |
|  |                                                                                         |
|  |  Inventory View:                      Usage View:                                       |
|  |  - All components in filesystem       - Entrypoint closure only                         |
|  |  - Declared + transitive + vendored   - Actually linked libraries                       |
|  |  - Path: images/{digest}/inventory.*  - Path: images/{digest}/usage.*                   |
|  +-----------------------------------------------------------------------------------------+
|                                                     |                                      |
|                                                     v                                      |
|  +-----------------------------------------------------------------------------------------+
|  |  Format Output                                                                          |
|  |                                                                                         |
|  |  +-------------------+  +-------------------+  +-------------------+                    |
|  |  | CycloneDX 1.6 JSON|  | CycloneDX Protobuf|  | SPDX 3.0.1 JSON   |                    |
|  |  | (.cdx.json)       |  | (.cdx.pb, compact)|  | (.spdx.json)      |                    |
|  |  +-------------------+  +-------------------+  +-------------------+                    |
|  +-----------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------+
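The difference between the two views can be sketched as a graph closure. This is a minimal illustration, not the Scanner's implementation: the dependency map, the component names, and the `usage_view` helper are all hypothetical. The inventory view is every component found; the usage view keeps only what is reachable from the entrypoint.

```python
from collections import deque

def usage_view(entrypoints, depends_on):
    """Return the components reachable from the entrypoints (the closure)."""
    seen = set()
    queue = deque(entrypoints)
    while queue:
        component = queue.popleft()
        if component in seen:
            continue
        seen.add(component)
        queue.extend(depends_on.get(component, ()))
    return sorted(seen)  # sorted so the output order is deterministic

# Hypothetical filesystem contents: component -> components it links against.
depends_on = {
    "app": ["libssl", "libz"],
    "libssl": ["libcrypto"],
    "libcrypto": [],
    "libz": [],
    "unused-tool": ["libz"],
}

inventory = sorted(depends_on)          # everything present in the image
usage = usage_view(["app"], depends_on)  # entrypoint closure only
```

Here `unused-tool` appears in the inventory view but not in the usage view, because nothing on the entrypoint's path links against it.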

1.2 Storage Phase

+--------------------------------------------------------------------------------------------+
|  Dual-Write Coordination                                                                   |
|                                                                                            |
|  +------------------------------------------+  +------------------------------------------+|
|  |  PostgreSQL (scanner schema)             |  |  RustFS (S3 API)                         ||
|  |                                          |  |                                          ||
|  |  artifacts table:                        |  |  Blob Layout:                            ||
|  |  - artifact_id (sha256)                  |  |  blobs/{sha256_prefix}/                  ||
|  |  - image_digest                          |  |    sbom.json (payload)                   ||
|  |  - format (cdx-json, spdx-json, etc.)    |  |    sbom.meta.json (wrapper)              ||
|  |  - created_at                            |  |    sbom.cdx.pb (binary)                  ||
|  |  - rekor_proof (optional)                |  |                                          ||
|  |                                          |  |  Wrapper Envelope:                       ||
|  |  images table:                           |  |  {                                       ||
|  |  - image_digest                          |  |    "id": "sha256:417f...",               ||
|  |  - repository                            |  |    "imageDigest": "sha256:e2b9...",      ||
|  |  - tag                                   |  |    "format": "cdx-json",                 ||
|  |  - architecture                          |  |    "layers": ["sha256:..."],             ||
|  |                                          |  |    "partial": false,                     ||
|  |  layers table:                           |  |    "provenanceId": "prov_0291"           ||
|  |  - layer_digest                          |  |  }                                       ||
|  |  - media_type                            |  |                                          ||
|  |  - size                                  |  |                                          ||
|  +------------------------------------------+  +------------------------------------------+|
+--------------------------------------------------------------------------------------------+
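A minimal sketch of how the wrapper envelope and blob keys might be assembled. Field names and file names come from the layout above; deriving `id` from a SHA-256 of the payload bytes is an assumption, and the length of `{sha256_prefix}` is not stated in this document, so the helper takes the prefix as an argument rather than guessing it.

```python
import hashlib
from typing import Optional

def build_envelope(payload: bytes, image_digest: str, layers: list,
                   fmt: str = "cdx-json", partial: bool = False,
                   provenance_id: Optional[str] = None) -> dict:
    """Build the sbom.meta.json wrapper for one SBOM payload."""
    return {
        # Assumption: the artifact id is the SHA-256 of the payload bytes.
        "id": "sha256:" + hashlib.sha256(payload).hexdigest(),
        "imageDigest": image_digest,
        "format": fmt,
        "layers": list(layers),
        "partial": partial,
        "provenanceId": provenance_id,
    }

def blob_keys(sha256_prefix: str) -> dict:
    """Object keys under blobs/{sha256_prefix}/ as listed in the blob layout."""
    base = f"blobs/{sha256_prefix}"
    return {
        "payload": f"{base}/sbom.json",
        "wrapper": f"{base}/sbom.meta.json",
        "binary": f"{base}/sbom.cdx.pb",
    }

payload = b'{"bomFormat": "CycloneDX"}'
envelope = build_envelope(payload, "sha256:<image-digest>", ["sha256:<layer-digest>"])
# Passing the whole hex digest as the prefix is purely illustrative.
keys = blob_keys(envelope["id"].split(":", 1)[1])
```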

1.3 Index & Cache Phase (Valkey)

+--------------------------------------------------------------------------------------------+
|  Valkey Keyspace for SBOM                                                                  |
|                                                                                            |
|  +------------------------------------------+  +------------------------------------------+|
|  |  Key Pattern                             |  |  Purpose                                 ||
|  +------------------------------------------+  +------------------------------------------+|
|  |  scan:{digest}                           |  |  Last scan JSON result                   ||
|  |  layers:{digest}                         |  |  Set of layer digests (90d TTL)         ||
|  |  locator:{imageDigest}                   |  |  sbomBlobId mapping (30d TTL)           ||
|  +------------------------------------------+  +------------------------------------------+|
|                                                                                            |
|  Delta SBOM Flow:                                                                          |
|  1. Check layers:{digest} for cached layers                                                |
|  2. Scan only missing layers (partial=true)                                                |
|  3. Stitch new data onto cached full SBOM                                                  |
|  4. Update locator mapping                                                                 |
|  5. Fast path: 20ms for unchanged layers                                                   |
+--------------------------------------------------------------------------------------------+
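Steps 1 to 4 of the delta flow can be sketched with plain dictionaries standing in for Valkey. The following is an illustration under assumptions: `kv`, `fragments`, `scan_layer`, and `sbom_blob_id` are hypothetical names, TTLs are not modeled, and the document does not specify which digest keys `layers:{digest}`, so the sketch takes an opaque `cache_key`.

```python
def delta_scan(cache_key, image_layers, kv, fragments, scan_layer, sbom_blob_id):
    """Scan only layers missing from the cache, then stitch the full SBOM."""
    # Step 1: look up the set of layers that were already analyzed.
    cached = kv.get(f"layers:{cache_key}", set())
    missing = [layer for layer in image_layers if layer not in cached]

    # Step 2: analyze only the missing layers.
    for layer in missing:
        fragments[layer] = scan_layer(layer)

    # Step 3: stitch fragments in layer order so the result is deterministic.
    components = []
    for layer in image_layers:
        components.extend(fragments[layer])

    # Step 4: refresh the layer set and the locator mapping.
    kv[f"layers:{cache_key}"] = cached | set(image_layers)
    kv[f"locator:{cache_key}"] = sbom_blob_id
    return {"components": components, "scanned": missing}

kv, fragments, scanned_log = {}, {}, []

def fake_scan(layer):
    """Stand-in analyzer that records which layers were actually scanned."""
    scanned_log.append(layer)
    return [f"pkg-from-{layer}"]

first = delta_scan("img", ["sha256:aaa", "sha256:bbb"],
                   kv, fragments, fake_scan, "blob-1")
second = delta_scan("img", ["sha256:aaa", "sha256:bbb", "sha256:ccc"],
                    kv, fragments, fake_scan, "blob-2")
```

On the second call only `sha256:ccc` reaches the analyzer; the other two fragments are reused from the cache.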

1.4 Consumption Phase

+--------------------------------------------------------------------------------------------+
|  SBOM Consumers                                                                            |
|                                                                                            |
|  +-----------------+     +-----------------+     +-----------------+                       |
|  | Policy Engine   |     | Export Center   |     | Replay Engine   |                       |
|  +-----------------+     +-----------------+     +-----------------+                       |
|         |                       |                       |                                  |
|         v                       v                       v                                  |
|  +-------------+         +-------------+         +-------------+                           |
|  | Read SBOM   |         | Retrieve    |         | Link to     |                           |
|  | from RustFS |         | by digest   |         | scan ID     |                           |
|  +-------------+         +-------------+         +-------------+                           |
|         |                       |                       |                                  |
|         v                       v                       v                                  |
|  +-------------+         +-------------+         +-------------+                           |
|  | Join with   |         | Convert     |         | Replay with |                           |
|  | advisories  |         | format      |         | same inputs |                           |
|  +-------------+         +-------------+         +-------------+                           |
|         |                       |                       |                                  |
|         v                       v                       v                                  |
|  +-------------+         +-------------+         +-------------+                           |
|  | Apply       |         | Generate    |         | Verify      |                           |
|  | reachability|         | SARIF       |         | determinism |                           |
|  +-------------+         +-------------+         +-------------+                           |
+--------------------------------------------------------------------------------------------+

2. Advisory Data Flow

2.1 Ingestion (Concelier)

+--------------------------------------------------------------------------------------------+
|  Advisory Ingestion Pipeline                                                               |
|                                                                                            |
|  External Sources                         Concelier.Worker                                 |
|  +---------------+                        +-------------------------------------------+    |
|  | NVD           |----------------------->|                                           |    |
|  | Red Hat       |                        |  1. Fetch advisories (HTTP/mirror)        |    |
|  | OSV           |                        |  2. For air-gap: use mirror bundles first |    |
|  | GHSA          |                        |  3. Validate schema conformance           |    |
|  | CSAF sources  |                        |  4. Normalize to canonical observations   |    |
|  +---------------+                        |  5. Apply AOC (Aggregation-Only Contract) |    |
|                                           |  6. Persist raw documents (append-only)   |    |
|                                           |  7. Build linksets (advisory -> PURL)     |    |
|                                           |  8. Publish delta event                   |    |
|                                           +-------------------------------------------+    |
|                                                     |                                      |
|                                                     v                                      |
|  +-----------------------------------------------------------------------------------------+
|  |  PostgreSQL (vuln schema)                                                               |
|  |                                                                                         |
|  |  advisory_raw (append-only):            linksets:                                       |
|  |  - raw_document (JSON as-received)      - advisory_id -> purl[]                         |
|  |  - source (NVD, RED_HAT, OSV, etc.)     - Used for SBOM join in Policy Engine           |
|  |  - advisory_id (CVE-2024-xxxx)                                                          |
|  |  - affected_purls                       observations:                                   |
|  |  - published_at (UTC)                   - Normalized advisory metadata                  |
|  |  - revision                             - Severity, description, references             |
|  +-----------------------------------------------------------------------------------------+
|                                                     |                                      |
|                                                     v                                      |
|  +-----------------------------------------------------------------------------------------+
|  |  Event: concelier:drift (Valkey Stream)                                                 |
|  |                                                                                         |
|  |  Triggers:                                                                              |
|  |  - Scheduler: identifies affected scans                                                 |
|  |  - Policy Engine: re-evaluation of impacted findings                                    |
|  |  - Notify: critical vuln alerts                                                         |
|  +-----------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------+
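Steps 6 to 8 of the pipeline (append-only persistence, linkset building, delta event) can be sketched in memory as follows. The `AdvisoryStore` class is hypothetical and only mirrors the column names shown above; the event names are those listed for the `concelier:drift` stream, and the rule for choosing between them is an assumption.

```python
from collections import defaultdict

class AdvisoryStore:
    """In-memory stand-in for the advisory_raw and linksets tables."""

    def __init__(self):
        self.advisory_raw = []             # append-only: rows are never updated
        self.linksets = defaultdict(set)   # advisory_id -> set of PURLs

    def ingest(self, source, advisory_id, raw_document, affected_purls):
        prior = [row for row in self.advisory_raw
                 if row["advisory_id"] == advisory_id and row["source"] == source]
        # Assumption: an advisory is "new" the first time any source reports it.
        is_new = advisory_id not in self.linksets
        self.advisory_raw.append({
            "source": source,
            "advisory_id": advisory_id,
            "raw_document": raw_document,          # stored as received
            "affected_purls": tuple(affected_purls),
            "revision": len(prior) + 1,
        })
        self.linksets[advisory_id].update(affected_purls)
        return {
            "stream": "concelier:drift",
            "event": "advisory.new" if is_new else "advisory.updated",
            "advisory_id": advisory_id,
        }

store = AdvisoryStore()
e1 = store.ingest("NVD", "CVE-2024-1234", {"v": 1}, ["pkg:npm/lodash@4.17.20"])
e2 = store.ingest("NVD", "CVE-2024-1234", {"v": 2},
                  ["pkg:npm/lodash@4.17.20",
                   "pkg:maven/org.apache.struts/struts2-core@2.5.30"])
```

A revised document is appended as a second row rather than overwriting the first, so the earlier revision stays available for audit.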

2.2 Advisory Data Model

+--------------------------------------------------------------------------------------------+
|  Raw Layer (Immutable - AOC Enforced)                                                      |
|                                                                                            |
|  {                                                                                         |
|    "advisoryId": "CVE-2024-1234",                                                          |
|    "source": "NVD",                                                                        |
|    "rawDocument": { /* original JSON as received */ },                                     |
|    "publishedAt": "2024-01-15T10:00:00Z",                                                  |
|    "revision": 2,                                                                          |
|    "affectedPurls": [                                                                      |
|      "pkg:npm/lodash@4.17.20",                                                             |
|      "pkg:maven/org.apache.struts/struts2-core@2.5.30"                                     |
|    ],                                                                                      |
|    "severity": {                                                                           |
|      "cvssV3": { "baseScore": 9.8, "vector": "..." },                                      |
|      "cvssV4": { "baseScore": 9.2, "vector": "..." }                                       |
|    }                                                                                       |
|  }                                                                                         |
|                                                                                            |
|  Key Constraints:                                                                          |
|  - Raw documents are NEVER modified after ingestion                                        |
|  - Conflicts are preserved, not collapsed                                                  |
|  - Multiple sources for same CVE stored separately                                         |
|  - Provenance tracked per observation                                                      |
+--------------------------------------------------------------------------------------------+
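The key constraints can be illustrated with an immutable record type. This is a sketch, not the Concelier schema: `RawAdvisory` and `scores_by_source` are hypothetical, and the OSV score is invented to show a disagreement between sources.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class RawAdvisory:
    """One raw observation; frozen so it cannot be modified after ingestion."""
    advisory_id: str
    source: str
    revision: int
    cvss_v3_base: float
    affected_purls: tuple

records = [
    RawAdvisory("CVE-2024-1234", "NVD", 2, 9.8, ("pkg:npm/lodash@4.17.20",)),
    # Hypothetical second source that disagrees on severity.
    RawAdvisory("CVE-2024-1234", "OSV", 1, 7.5, ("pkg:npm/lodash@4.17.20",)),
]

def scores_by_source(rows, advisory_id):
    """Keep every source's score side by side instead of collapsing them."""
    return {row.source: row.cvss_v3_base
            for row in rows if row.advisory_id == advisory_id}

conflict = scores_by_source(records, "CVE-2024-1234")

try:
    records[0].revision = 3          # any write to a frozen record is rejected
    mutation_rejected = False
except FrozenInstanceError:
    mutation_rejected = True
```

Both scores survive in `conflict`; deciding which one applies is left to downstream policy rather than to ingestion.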

3. VEX Data Flow

3.1 VEX Ingestion (Excititor)

+--------------------------------------------------------------------------------------------+
|  VEX Ingestion Pipeline                                                                    |
|                                                                                            |
|  External Sources                         Excititor.Worker                                 |
|  +---------------+                        +-------------------------------------------+    |
|  | OpenVEX       |----------------------->|                                           |    |
|  | CSAF VEX      |                        |  1. Fetch VEX statements                  |    |
|  | SBOM referrers|                        |  2. For air-gap: use offline bundles      |    |
|  | Vendor feeds  |                        |  3. Verify signatures (if signed)         |    |
|  +---------------+                        |  4. Normalize to canonical shape          |    |
|                                           |  5. Persist immutable raw statements      |    |
|                                           |  6. Publish to VexLens for consensus      |    |
|                                           +-------------------------------------------+    |
|                                                     |                                      |
|                                                     v                                      |
|  +-----------------------------------------------------------------------------------------+
|  |  PostgreSQL (vex schema)                                                                |
|  |                                                                                         |
|  |  vex_raw (append-only):                                                                 |
|  |  - raw_statement (OpenVEX JSON as-received)                                             |
|  |  - issuer_id (vendor or trust issuer)                                                   |
|  |  - component_purl                                                                       |
|  |  - vulnerability_id (CVE or GHSA)                                                       |
|  |  - status (not_affected, affected, under_investigation)                                 |
|  |  - justification (component_not_present, vulnerable_code_not_present, etc.)            |
|  |  - published_at                                                                         |
|  |  - signature (DSSE envelope if signed)                                                  |
|  +-----------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------+
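Step 4 (normalization) might look like the sketch below, which flattens an OpenVEX document into one row per (vulnerability, product) pair using the `vex_raw` column names above. It assumes the OpenVEX layout in which each statement carries `vulnerability.name` and a `products` list of objects with `@id`; `normalize_openvex` is a hypothetical helper, and signature verification is omitted.

```python
def normalize_openvex(document, issuer_id):
    """Flatten an OpenVEX document into rows shaped like vex_raw."""
    rows = []
    for statement in document.get("statements", []):
        vulnerability_id = statement["vulnerability"]["name"]
        for product in statement.get("products", []):
            rows.append({
                "issuer_id": issuer_id,
                "component_purl": product["@id"],
                "vulnerability_id": vulnerability_id,
                "status": statement["status"],
                "justification": statement.get("justification"),
                # Fall back to the document timestamp when the statement has none.
                "published_at": statement.get("timestamp",
                                              document.get("timestamp")),
                "raw_statement": statement,   # kept as received
            })
    return rows

document = {
    "timestamp": "2024-01-15T10:00:00Z",
    "statements": [{
        "vulnerability": {"name": "CVE-2024-1234"},
        "products": [
            {"@id": "pkg:npm/lodash@4.17.20"},
            {"@id": "pkg:maven/org.apache.struts/struts2-core@2.5.30"},
        ],
        "status": "not_affected",
        "justification": "vulnerable_code_not_present",
    }],
}
rows = normalize_openvex(document, issuer_id="vendor-a")
```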

3.2 VEX Consensus (VexLens)

+--------------------------------------------------------------------------------------------+
|  VEX Consensus Pipeline                                                                    |
|                                                                                            |
|  +--------------------+                                                                    |
|  | Multiple VEX       |                                                                    |
|  | statements for     |                                                                    |
|  | same (CVE, PURL)   |                                                                    |
|  +--------------------+                                                                    |
|           |                                                                                |
|           v                                                                                |
|  +-----------------------------------------------------------------------------------------+
|  |  VexLens Consensus Engine                                                               |
|  |                                                                                         |
|  |  1. Merge observations by component identity (PURL)                                     |
|  |  2. Apply issuer priority rules:                                                        |
|  |     - Vendor > Distro > Researcher > Community                                          |
|  |  3. Apply trust scores (from IssuerDirectory)                                           |
|  |  4. Detect conflicts (multiple issuers disagree)                                        |
|  |  5. Preserve conflict state (K4 lattice: True + False = Conflict)                       |
|  |  6. Export consensus outcomes                                                           |
|  +-----------------------------------------------------------------------------------------+
|           |                                                                                |
|           v                                                                                |
|  +-----------------------------------------------------------------------------------------+
|  |  Policy Engine Integration                                                              |
|  |                                                                                         |
|  |  VEX gates override severity thresholds:                                                |
|  |  - not_affected -> PASS (with evidence)                                                 |
|  |  - affected -> normal evaluation continues                                              |
|  |  - under_investigation -> WARN (pending)                                                |
|  +-----------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------+
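The conflict-preserving step (True + False = Conflict) can be sketched as a join on a four-valued lattice. Each value is modeled as a pair (evidence for "affected", evidence against), and joining two values ORs the pairs. The mapping from VEX status to lattice value is an assumption, and issuer priority and trust scores are left out of this sketch.

```python
from enum import Enum
from functools import reduce

class K4(Enum):
    UNKNOWN = (False, False)   # no evidence either way
    TRUE = (True, False)       # evidence the component is affected
    FALSE = (False, True)      # evidence the component is not affected
    CONFLICT = (True, True)    # issuers disagree

def join(a, b):
    """Combine two values by accumulating evidence from both."""
    return K4((a.value[0] or b.value[0], a.value[1] or b.value[1]))

# Assumed mapping from VEX status to lattice value.
STATUS_TO_K4 = {
    "affected": K4.TRUE,
    "not_affected": K4.FALSE,
    "under_investigation": K4.UNKNOWN,
}

def consensus(statuses):
    """Fold all statements for one (CVE, PURL) pair into a single value."""
    return reduce(join, (STATUS_TO_K4[s] for s in statuses), K4.UNKNOWN)

agree = consensus(["not_affected", "not_affected"])
disagree = consensus(["not_affected", "affected", "under_investigation"])
```

Because the join only accumulates evidence, a disagreement cannot be silently overwritten by a later statement: once both sides have been seen, the result stays `CONFLICT`.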

4. Policy Evaluation Data Flow

4.1 Input Assembly

+--------------------------------------------------------------------------------------------+
|  Policy Engine Input Sources (All Immutable - AOC Enforced)                                |
|                                                                                            |
|  +------------------+  +------------------+  +------------------+  +------------------+    |
|  |     SBOM         |  |   Advisory       |  |      VEX         |  |  Reachability    |    |
|  | (from RustFS)    |  | (from vuln.*)    |  | (from vex.*)     |  | (from Scanner)   |    |
|  +------------------+  +------------------+  +------------------+  +------------------+    |
|           |                    |                    |                    |                 |
|           +--------------------+--------------------+--------------------+                 |
|                                |                                                           |
|                                v                                                           |
|  +-----------------------------------------------------------------------------------------+
|  |  Selection Layer                                                                        |
|  |                                                                                         |
|  |  Deterministic Joining:                                                                 |
|  |  - SBOM <-> Advisory (by PURL matching)                                                 |
|  |  - Advisory <-> VEX (by CVE + PURL)                                                     |
|  |  - Component <-> Reachability (by identity)                                             |
|  |                                                                                         |
|  |  Batch Ordering:                                                                        |
|  |  - Sort by (tenant, policyId, vulnerabilityId, productKey, source)                      |
|  |  - Enables incremental cursor-based processing                                          |
|  +-----------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------+
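The batch ordering above is what makes cursor-based processing possible: if every run sorts by the same tuple, a cursor is simply the key of the last item processed. A minimal sketch, with hypothetical `sort_key` and `next_batch` helpers and invented sample rows:

```python
def sort_key(finding):
    """The ordering tuple named in the selection layer."""
    return (finding["tenant"], finding["policyId"], finding["vulnerabilityId"],
            finding["productKey"], finding["source"])

def next_batch(findings, cursor, size):
    """Return the next `size` findings strictly after `cursor`, plus the new cursor."""
    ordered = sorted(findings, key=sort_key)
    remaining = [f for f in ordered if cursor is None or sort_key(f) > cursor]
    batch = remaining[:size]
    new_cursor = sort_key(batch[-1]) if batch else cursor
    return batch, new_cursor

def row(vuln, product, source):
    return {"tenant": "t1", "policyId": "p1", "vulnerabilityId": vuln,
            "productKey": product, "source": source}

# Deliberately unsorted input: arrival order must not affect batching.
findings = [
    row("CVE-2024-9999", "pkg:npm/lodash", "NVD"),
    row("CVE-2024-1234", "pkg:npm/lodash", "OSV"),
    row("CVE-2024-1234", "pkg:npm/lodash", "NVD"),
]

batch1, cursor = next_batch(findings, None, 2)
batch2, cursor = next_batch(findings, cursor, 2)
batch3, cursor = next_batch(findings, cursor, 2)
```

Resuming from a stored cursor yields the same remaining items regardless of the order in which the inputs arrived.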

4.2 Evaluation Pipeline

+--------------------------------------------------------------------------------------------+
|  Policy Evaluation Pipeline                                                                |
|                                                                                            |
|  +-----------------------------------------------------------------------------------------+
|  |  1. Load Policy IR (cached by policyId+version hash)                                    |
|  +-----------------------------------------------------------------------------------------+
|                                     |                                                      |
|                                     v                                                      |
|  +-----------------------------------------------------------------------------------------+
|  |  2. For Each Batch (component, vulnerability):                                          |
|  |                                                                                         |
|  |  +------------------+                                                                   |
|  |  | Evidence-Weighted|     severityWeight (from advisory CVSS)                           |
|  |  | Score Compute    |---> trustWeight (from VEX issuer)                                 |
|  |  +------------------+     reachabilityWeight (from Scanner entrypoint closure)          |
|  |           |               runtimeWeight (from Zastava signals)                          |
|  |           v                                                                             |
|  |  +------------------+                                                                   |
|  |  | Policy Rules     |     First-match semantics                                         |
|  |  | Execution        |---> Actions: assign, annotate, escalate, warn                     |
|  |  +------------------+                                                                   |
|  |           |                                                                             |
|  |           v                                                                             |
|  |  +------------------+                                                                   |
|  |  | Exception Apply  |     Specificity-ranked                                            |
|  |  |                  |---> Effects: suppress, defer, downgrade, require-control          |
|  |  +------------------+                                                                   |
|  |           |                                                                             |
|  |           v                                                                             |
|  |  +------------------+                                                                   |
|  |  | Unknown Budget   |     Per-environment limits                                        |
|  |  | Check            |---> Block if exceeded, Warn if approaching                        |
|  |  +------------------+                                                                   |
|  |           |                                                                             |
|  |           v                                                                             |
|  |  +------------------+                                                                   |
|  |  | Confidence Calc  |     5 factors: reachability, runtime, VEX, provenance, policy     |
|  |  |                  |---> Final = Clamp01(Sum(Weight x RawValue))                       |
|  |  +------------------+     Tiers: VeryHigh(>=0.9), High(>=0.7), Medium(>=0.5), etc.       |
|  |           |                                                                             |
|  |           v                                                                             |
|  |  +------------------+                                                                   |
|  |  | VEX Decision     |     Emit OpenVEX statements for verdict changes                   |
|  |  | Emission         |---> DSSE-signed, logged to Rekor v2                               |
|  |  +------------------+                                                                   |
|  +-----------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------+
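The confidence step, Final = Clamp01(Sum(Weight x RawValue)), can be sketched as below. The five factor names and the three tier thresholds come from the diagram; the weights are hypothetical, and because the lower tiers are elided in the source ("etc."), everything under 0.5 is grouped into a placeholder bucket.

```python
def clamp01(value):
    """Clamp a number into the closed interval [0, 1]."""
    return max(0.0, min(1.0, value))

# Hypothetical weights; the real values are not given in this document.
WEIGHTS = {
    "reachability": 0.30,
    "runtime": 0.20,
    "vex": 0.20,
    "provenance": 0.15,
    "policy": 0.15,
}

def confidence(raw_values):
    """Weighted sum of the raw factor values, clamped to [0, 1]."""
    total = sum(weight * raw_values.get(factor, 0.0)
                for factor, weight in WEIGHTS.items())
    return clamp01(total)

def tier(score):
    if score >= 0.9:
        return "VeryHigh"
    if score >= 0.7:
        return "High"
    if score >= 0.5:
        return "Medium"
    return "BelowMedium"  # placeholder: lower tiers are not listed in the source

strong = confidence({"reachability": 1.0, "runtime": 1.0, "vex": 1.0,
                     "provenance": 1.0, "policy": 0.0})
weak = confidence({"reachability": 1.0})
overflow = confidence({factor: 2.0 for factor in WEIGHTS})
```

Missing factors default to zero in this sketch, and the clamp keeps out-of-range raw values from pushing the score past 1.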

4.3 Output Materialization

+--------------------------------------------------------------------------------------------+
|  Finding Materialization                                                                   |
|                                                                                            |
|  +------------------------------------------+  +------------------------------------------+|
|  |  Current Snapshot                        |  |  History (Audit Trail)                   ||
|  |  policy.effective_finding_{policyId}     |  |  policy.effective_finding_{policyId}_history|
|  +------------------------------------------+  +------------------------------------------+|
|  |  - finding_key (deterministic digest)    |  |  - All snapshots with timestamps         ||
|  |  - severity (CRITICAL, HIGH, etc.)       |  |  - Previous verdicts                     ||
|  |  - source (NVD, vendor, research)        |  |  - Provenance chain                      ||
|  |  - advisory_raw_ids (back-link)          |  |                                          ||
|  |  - vex_raw_ids (if overridden)           |  |                                          ||
|  |  - sbom_component_id (inventory link)    |  |                                          ||
|  |  - verdict (PASS, BLOCK, WARN, FAIL)     |  |                                          ||
|  |  - confidence_score (0-100)              |  |                                          ||
|  |  - explained_trace (policy rule hits)    |  |                                          ||
|  +------------------------------------------+  +------------------------------------------+|
|                                                                                            |
|  Determinism Hash:                                                                         |
|  SHA256(policyVersion + batchCursor + inputsHash)                                          |
|  -> Same inputs always produce same outputs                                                |
+--------------------------------------------------------------------------------------------+
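
The determinism hash above can be sketched as follows. This is a minimal illustration, not the platform's implementation: the function name `determinism_hash` is hypothetical, and the UTF-8 encoding and length-prefixing of each field are assumptions (the diagram only shows `+`), added so that adjacent fields cannot run into each other.

```python
import hashlib


def determinism_hash(policy_version: str, batch_cursor: str, inputs_hash: str) -> str:
    """Return a SHA-256 hex digest over the three inputs named in the diagram.

    Assumption: each field is UTF-8 encoded and prefixed with its byte length,
    so ("ab", "c") and ("a", "bc") hash differently.
    """
    digest = hashlib.sha256()
    for field in (policy_version, batch_cursor, inputs_hash):
        data = field.encode("utf-8")
        digest.update(len(data).to_bytes(4, "big"))  # 4-byte length prefix
        digest.update(data)
    return digest.hexdigest()


if __name__ == "__main__":
    first = determinism_hash("v12", "cursor-0001", "abc123")
    second = determinism_hash("v12", "cursor-0001", "abc123")
    print(first == second)  # identical inputs give an identical digest
```

Because the digest depends only on its three arguments, re-running an evaluation with the same policy version, cursor, and inputs hash yields the same value, which is the property the diagram states.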

5. Event-Driven Flows

5.1 Event Bus Architecture

+--------------------------------------------------------------------------------------------+
|  Valkey Streams / NATS JetStream Event Bus                                                 |
|                                                                                            |
|  +-----------------------------------------------------------------------------------------+
|  |  Stream: scanner:events                                                                 |
|  |  Events: scan.submitted, scan.started, scan.completed, scan.failed                      |
|  |  Consumers: Policy, Notify, TimelineIndexer, ExportCenter                               |
|  +-----------------------------------------------------------------------------------------+
|                                                                                            |
|  +-----------------------------------------------------------------------------------------+
|  |  Stream: concelier:drift                                                                |
|  |  Events: advisory.new, advisory.updated, advisory.withdrawn                             |
|  |  Consumers: Scheduler, Policy, Notify                                                   |
|  +-----------------------------------------------------------------------------------------+
|                                                                                            |
|  +-----------------------------------------------------------------------------------------+
|  |  Stream: policy:evaluated                                                               |
|  |  Events: evaluation.completed, verdict.changed, exception.applied                       |
|  |  Consumers: Notify, Findings, ExportCenter                                              |
|  +-----------------------------------------------------------------------------------------+
|                                                                                            |
|  +-----------------------------------------------------------------------------------------+
|  |  Stream: scheduler:jobs                                                                 |
|  |  Events: run.started, run.completed, run.failed, rescan.triggered                       |
|  |  Consumers: Scanner, Notify, TimelineIndexer                                            |
|  +-----------------------------------------------------------------------------------------+
|                                                                                            |
|  +-----------------------------------------------------------------------------------------+
|  |  Stream: notify:delivery                                                                |
|  |  Events: notification.sent, notification.failed, notification.throttled                 |
|  |  Consumers: Audit, TimelineIndexer                                                      |
|  +-----------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------+
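
The stream topology above can be expressed as a lookup table, which is handy for checking who receives a given event. The stream, event, and consumer names come from the diagram; the `STREAMS` structure and both helper functions are hypothetical and do not reflect any actual StellaOps API.

```python
# Stream -> events and consumers, transcribed from the diagram above.
STREAMS = {
    "scanner:events": {
        "events": ["scan.submitted", "scan.started", "scan.completed", "scan.failed"],
        "consumers": ["Policy", "Notify", "TimelineIndexer", "ExportCenter"],
    },
    "concelier:drift": {
        "events": ["advisory.new", "advisory.updated", "advisory.withdrawn"],
        "consumers": ["Scheduler", "Policy", "Notify"],
    },
    "policy:evaluated": {
        "events": ["evaluation.completed", "verdict.changed", "exception.applied"],
        "consumers": ["Notify", "Findings", "ExportCenter"],
    },
    "scheduler:jobs": {
        "events": ["run.started", "run.completed", "run.failed", "rescan.triggered"],
        "consumers": ["Scanner", "Notify", "TimelineIndexer"],
    },
    "notify:delivery": {
        "events": ["notification.sent", "notification.failed", "notification.throttled"],
        "consumers": ["Audit", "TimelineIndexer"],
    },
}


def consumers_for(event):
    """Return the consumers of the stream that carries `event` (empty if unknown)."""
    for stream in STREAMS.values():
        if event in stream["events"]:
            return list(stream["consumers"])
    return []


def streams_consumed_by(consumer):
    """Return the sorted names of all streams that `consumer` subscribes to."""
    return sorted(name for name, s in STREAMS.items() if consumer in s["consumers"])


if __name__ == "__main__":
    print(consumers_for("advisory.new"))
    print(streams_consumed_by("Notify"))
```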

5.2 Scan Completion Event Flow

scan.completed event
         |
         +-------------------+-------------------+-------------------+
         |                   |                   |                   |
         v                   v                   v                   v
+----------------+  +----------------+  +----------------+  +----------------+
| Policy.Engine  |  | Notify.Worker  |  |TimelineIndexer |  | ExportCenter   |
+----------------+  +----------------+  +----------------+  +----------------+
|                |  |                |  |                |  |                |
| Evaluate with  |  | Check rules    |  | Index event    |  | Generate SARIF |
| new SBOM data  |  | for matches    |  | for audit      |  | if configured  |
|                |  |                |  |                |  |                |
+-------+--------+  +-------+--------+  +-------+--------+  +----------------+
        |                   |                   |
        v                   v                   v
policy:evaluated    notification.sent    event indexed
        |                   |
        +-------------------+
                |
                v
     Downstream consumers
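
The fan-out above can be sketched as one event delivered independently to four handlers, each returning the follow-up it emits, or `None` when it emits nothing. The handler functions and the `findings` field are hypothetical. In particular, the notification rule used here (notify only when findings exist) is an assumption that stands in for whatever rules Notify.Worker actually checks.

```python
from typing import Any, Dict, Optional

Event = Dict[str, Any]


def policy_engine(event: Event) -> Optional[str]:
    # Evaluates policy with the new SBOM data, then publishes to policy:evaluated.
    return "policy:evaluated"


def notify_worker(event: Event) -> Optional[str]:
    # Hypothetical rule: notify only when the scan reported at least one finding.
    return "notification.sent" if event.get("findings", 0) > 0 else None


def timeline_indexer(event: Event) -> Optional[str]:
    # Indexes the event for audit.
    return "event indexed"


def export_center(event: Event) -> Optional[str]:
    # The diagram shows no follow-up event; SARIF generation is a side effect.
    return None


CONSUMERS = {
    "Policy.Engine": policy_engine,
    "Notify.Worker": notify_worker,
    "TimelineIndexer": timeline_indexer,
    "ExportCenter": export_center,
}


def fan_out(event: Event) -> Dict[str, Optional[str]]:
    """Deliver one event to every consumer and collect each follow-up."""
    return {name: handler(event) for name, handler in CONSUMERS.items()}


if __name__ == "__main__":
    print(fan_out({"type": "scan.completed", "findings": 2}))
```

Each handler sees the same event and none depends on another's result, which mirrors the parallel branches in the diagram.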

6. Offline/Air-Gap Data Flow

6.1 Offline Kit Contents

+--------------------------------------------------------------------------------------------+
|  Offline Update Kit Structure                                                              |
|                                                                                            |
|  offline-kit-2025-01-02/                                                                   |
|  +-- feeds/                                                                                |
|  |   +-- nvd/                 # NVD advisory snapshots                                     |
|  |   +-- osv/                 # OSV advisory snapshots                                     |
|  |   +-- ghsa/                # GHSA advisory snapshots                                    |
|  |   +-- vex/                 # VEX statement bundles                                      |
|  +-- images/                  # Container images for platform services                     |
|  +-- sboms/                   # Pre-generated SBOMs for bundled images                     |
|  +-- signatures/              # DSSE-signed bundles                                        |
|  |   +-- feeds.dsse           # Signed feed manifest                                       |
|  |   +-- images.dsse          # Signed image manifest                                      |
|  +-- trust-roots/             # CA certificates, JWKS                                      |
|  +-- policies/                # Default policy definitions                                 |
|  +-- manifest.json            # Kit contents and checksums                                 |
|  +-- manifest.dsse            # Signed manifest                                            |
+--------------------------------------------------------------------------------------------+
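
Since `manifest.json` records the kit contents and checksums, a local integrity check might look like the sketch below. The manifest schema is not specified in this document: the `{"files": {"<relative path>": "<sha256 hex>"}}` layout and the function names are assumptions for illustration only. Signature verification of `manifest.dsse` is a separate step that is not shown here.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large feed snapshots need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_kit_checksums(kit_dir: Path) -> list:
    """Return (relative_path, problem) tuples; an empty list means the kit is intact.

    Assumed manifest layout: {"files": {"feeds/nvd/a.json": "<sha256 hex>", ...}}
    """
    manifest = json.loads((kit_dir / "manifest.json").read_text(encoding="utf-8"))
    problems = []
    for rel_path, expected in sorted(manifest["files"].items()):
        target = kit_dir / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
        elif sha256_file(target) != expected:
            problems.append((rel_path, "checksum mismatch"))
    return problems
```

The check reads only files inside the kit directory, so it runs without any network access.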

6.2 Offline Ingestion Flow

+--------------------------------------------------------------------------------------------+
|  Air-Gap Ingestion Pipeline                                                                |
|                                                                                            |
|  USB/Portable Media                       AirGap.Importer                                  |
|  +---------------+                        +-------------------------------------------+    |
|  | Offline Kit   |----------------------->|                                           |    |
|  | (signed)      |                        |  1. Verify manifest signature (DSSE)      |    |
|  +---------------+                        |  2. Validate checksums                    |    |
|                                           |  3. Import feeds to Concelier             |    |
|                                           |  4. Import VEX to Excititor               |    |
|                                           |  5. Update trust roots                    |    |
|                                           |  6. Trigger re-evaluation                 |    |
|                                           +-------------------------------------------+    |
|                                                                                            |
|  Key Guarantees:                                                                           |
|  - No network calls during import                                                          |
|  - All verification is local                                                               |
|  - Deterministic outputs match online mode                                                 |
|  - Full audit trail preserved                                                              |
+--------------------------------------------------------------------------------------------+
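
The six importer steps above run in a fixed order, with both verification steps ahead of any import. A minimal sketch of that ordering follows, assuming the pipeline stops at the first failing step (the diagram does not state the failure behaviour). `run_import`, `ImportAborted`, and the step callables are hypothetical names, not the AirGap.Importer API.

```python
from typing import Callable, Dict, List

# Step names in the order given by the diagram.
STEP_NAMES = [
    "verify manifest signature (DSSE)",
    "validate checksums",
    "import feeds to Concelier",
    "import VEX to Excititor",
    "update trust roots",
    "trigger re-evaluation",
]


class ImportAborted(Exception):
    """Raised when a step reports failure; later steps are not run."""


def run_import(steps: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every step in diagram order and return the names that completed.

    Assumption: each step is a local callable returning True on success.
    """
    completed = []
    for name in STEP_NAMES:
        if not steps[name]():
            raise ImportAborted("step failed: " + name)
        completed.append(name)
    return completed


if __name__ == "__main__":
    always_ok = {name: (lambda: True) for name in STEP_NAMES}
    print(run_import(always_ok))
```

Placing verification first means a kit with a bad signature or checksum never reaches Concelier or Excititor.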