save progress

StellaOps Bot
2026-01-03 00:47:24 +02:00
parent 3f197814c5
commit ca578801fd
319 changed files with 32478 additions and 2202 deletions

@@ -0,0 +1,550 @@
# Data Flows
This document details the data flows for SBOM generation, advisory ingestion, policy evaluation, and VEX processing in StellaOps. All flows are designed for deterministic, offline-first operation.
## Table of Contents
- [1. SBOM Data Lifecycle](#1-sbom-data-lifecycle)
- [2. Advisory Data Flow](#2-advisory-data-flow)
- [3. VEX Data Flow](#3-vex-data-flow)
- [4. Policy Evaluation Data Flow](#4-policy-evaluation-data-flow)
- [5. Event-Driven Flows](#5-event-driven-flows)
- [6. Offline/Air-Gap Data Flow](#6-offlineair-gap-data-flow)
---
## 1. SBOM Data Lifecycle
### 1.1 Generation Phase (Scanner)
```
Image Input (OCI reference)
|
v
+--------------------------------------------------------------------------------------------+
| Scanner.Worker |
| |
| +----------------------+ +----------------------+ +----------------------+ |
| | Layer Extraction |---->| Delta Cache Check |---->| Analyzer Execution | |
| | (OCI manifest) | | (Valkey layers:*) | | (11 language + OS) | |
| +----------------------+ +----------------------+ +----------------------+ |
| | | |
| | Cache Hit | Cache Miss |
| v v |
| +------------------+ +------------------+ |
| | Stitch Existing | | Full Analysis | |
| | SBOM Fragments | | (20ms fast path) | |
| +------------------+ +------------------+ |
| | | |
| +-------------+---------------+ |
| | |
| v |
| +-----------------------------------------------------------------------------------------+
| | Component Discovery |
| | |
| | +---------------+ +---------------+ +---------------+ +---------------+ |
| | | OS Packages | | Language Deps | | Native Bins | | Call Graphs | |
| | | (Apk/Dpkg/Rpm)| | (11 ecosystems)| | (ELF/PE/Mach) | | (Reachability)| |
| | +---------------+ +---------------+ +---------------+ +---------------+ |
| +-----------------------------------------------------------------------------------------+
| | |
| v |
| +-----------------------------------------------------------------------------------------+
| | SBOM Generation (Two Views) |
| | |
| | Inventory View: Usage View: |
| | - All components in filesystem - Entrypoint closure only |
| | - Declared + transitive + vendored - Actually linked libraries |
| | - Path: images/{digest}/inventory.* - Path: images/{digest}/usage.* |
| +-----------------------------------------------------------------------------------------+
| | |
| v |
| +-----------------------------------------------------------------------------------------+
| | Format Output |
| | |
| | +-------------------+ +-------------------+ +-------------------+ |
| | | CycloneDX 1.6 JSON| | CycloneDX Protobuf| | SPDX 3.0.1 JSON | |
| | | (.cdx.json) | | (.cdx.pb, compact)| | (.spdx.json) | |
| | +-------------------+ +-------------------+ +-------------------+ |
| +-----------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------+
```
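The difference between the two views can be illustrated with a small reachability walk. This is a minimal Python sketch, not the Scanner's implementation: `usage_view`, the component names, and the `depends_on` edge map are hypothetical, and it assumes the usage view is the transitive closure of link edges starting from the image entrypoints.

```python
from collections import deque


def usage_view(entrypoints, depends_on):
    """Return the sorted components reachable from the entrypoints."""
    seen = set()
    queue = deque(entrypoints)
    while queue:
        component = queue.popleft()
        if component in seen:
            continue
        seen.add(component)
        # Follow link edges; components without edges are leaves.
        queue.extend(depends_on.get(component, ()))
    return sorted(seen)


# Hypothetical inventory view: everything found on the filesystem.
inventory = ["app", "libssl", "libz", "vendored-unused"]
# Hypothetical link edges: component -> components it links against.
depends_on = {"app": ["libssl"], "libssl": ["libz"]}

usage = usage_view(["app"], depends_on)
```

In this example `vendored-unused` stays in the inventory view but drops out of the usage view, because nothing reachable from the entrypoint links against it.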
### 1.2 Storage Phase
```
+--------------------------------------------------------------------------------------------+
| Dual-Write Coordination |
| |
| +------------------------------------------+ +------------------------------------------+|
| | PostgreSQL (scanner schema) | | RustFS (S3 API) ||
| | | | ||
| | artifacts table: | | Blob Layout: ||
| | - artifact_id (sha256) | | blobs/{sha256_prefix}/ ||
| | - image_digest | | sbom.json (payload) ||
| | - format (cdx-json, spdx-json, etc.) | | sbom.meta.json (wrapper) ||
| | - created_at | | sbom.cdx.pb (binary) ||
| | - rekor_proof (optional) | | ||
| | | | Wrapper Envelope: ||
| | images table: | | { ||
| | - image_digest | | "id": "sha256:417f...", ||
| | - repository | | "imageDigest": "sha256:e2b9...", ||
| | - tag | | "format": "cdx-json", ||
| | - architecture | | "layers": ["sha256:..."], ||
| | | | "partial": false, ||
| | layers table: | | "provenanceId": "prov_0291" ||
| | - layer_digest | | } ||
| | - media_type | | ||
| | - size | | ||
| +------------------------------------------+ +------------------------------------------+|
+--------------------------------------------------------------------------------------------+
```
### 1.3 Index & Cache Phase (Valkey)
```
+--------------------------------------------------------------------------------------------+
|                                  Valkey Keyspace for SBOM                                  |
|                                                                                            |
| +------------------------------------------+  +------------------------------------------+ |
| | Key Pattern                              |  | Purpose                                  | |
| +------------------------------------------+  +------------------------------------------+ |
| | scan:{digest}                            |  | Last scan JSON result                    | |
| | layers:{digest}                          |  | Set of layer digests (90d TTL)           | |
| | locator:{imageDigest}                    |  | sbomBlobId mapping (30d TTL)             | |
| +------------------------------------------+  +------------------------------------------+ |
|                                                                                            |
| Delta SBOM Flow:                                                                           |
| 1. Check layers:{digest} for cached layers                                                 |
| 2. Scan only missing layers (partial=true)                                                 |
| 3. Stitch new data onto cached full SBOM                                                   |
| 4. Update locator mapping                                                                  |
| 5. Fast path: 20ms for unchanged layers                                                    |
+--------------------------------------------------------------------------------------------+
```
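Steps 1 and 2 of the delta flow amount to a set difference over layer digests. The sketch below is illustrative only: `plan_delta_scan` is a hypothetical helper, a plain Python set stands in for the Valkey `layers:{digest}` entry, and it assumes `partial` is set when some, but not all, layers are served from cache.

```python
def plan_delta_scan(image_layers, cached_layers):
    """Split an image's layers into reusable and to-be-scanned, keeping layer order."""
    reuse = [digest for digest in image_layers if digest in cached_layers]
    scan = [digest for digest in image_layers if digest not in cached_layers]
    return {
        "reuse": reuse,   # fragments to stitch from the cached SBOM
        "scan": scan,     # layers that still need analysis
        # Assumption: a scan is "partial" only when it mixes cached and new layers.
        "partial": bool(reuse) and bool(scan),
    }


# Hypothetical digests; real ones would be full sha256 values.
plan = plan_delta_scan(
    ["sha256:a", "sha256:b", "sha256:c"],
    {"sha256:a", "sha256:b"},
)
```

When `plan["scan"]` is empty, every layer was cached and the flow reduces to the fast path in step 5.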
### 1.4 Consumption Phase
```
+--------------------------------------------------------------------------------------------+
| SBOM Consumers |
| |
| +-----------------+ +-----------------+ +-----------------+ |
| | Policy Engine | | Export Center | | Replay Engine | |
| +-----------------+ +-----------------+ +-----------------+ |
| | | | |
| v v v |
| +-------------+ +-------------+ +-------------+ |
| | Read SBOM | | Retrieve | | Link to | |
| | from RustFS | | by digest | | scan ID | |
| +-------------+ +-------------+ +-------------+ |
| | | | |
| v v v |
| +-------------+ +-------------+ +-------------+ |
| | Join with | | Convert | | Replay with | |
| | advisories | | format | | same inputs | |
| +-------------+ +-------------+ +-------------+ |
| | | | |
| v v v |
| +-------------+ +-------------+ +-------------+ |
| | Apply | | Generate | | Verify | |
| | reachability| | SARIF | | determinism | |
| +-------------+ +-------------+ +-------------+ |
+--------------------------------------------------------------------------------------------+
```
---
## 2. Advisory Data Flow
### 2.1 Ingestion (Concelier)
```
+--------------------------------------------------------------------------------------------+
| Advisory Ingestion Pipeline |
| |
| External Sources Concelier.Worker |
| +---------------+ +-------------------------------------------+ |
| | NVD |----------------------->| | |
| | Red Hat | | 1. Fetch advisories (HTTP/mirror) | |
| | OSV | | 2. For air-gap: use mirror bundles first | |
| | GHSA | | 3. Validate schema conformance | |
| | CSAF sources | | 4. Normalize to canonical observations | |
| +---------------+ | 5. Apply AOC (Aggregation-Only Contract) | |
| | 6. Persist raw documents (append-only) | |
| | 7. Build linksets (advisory -> PURL) | |
| | 8. Publish delta event | |
| +-------------------------------------------+ |
| | |
| v |
| +-----------------------------------------------------------------------------------------+
| | PostgreSQL (vuln schema) |
| | |
| | advisory_raw (append-only): linksets: |
| | - raw_document (JSON as-received) - advisory_id -> purl[] |
| | - source (NVD, RED_HAT, OSV, etc.) - Used for SBOM join in Policy Engine |
| | - advisory_id (CVE-2024-xxxx) |
| | - affected_purls observations: |
| | - published_at (UTC) - Normalized advisory metadata |
| | - revision - Severity, description, references |
| +-----------------------------------------------------------------------------------------+
| | |
| v |
| +-----------------------------------------------------------------------------------------+
| | Event: concelier:drift (Valkey Stream) |
| | |
| | Triggers: |
| | - Scheduler: identifies affected scans |
| | - Policy Engine: re-evaluation of impacted findings |
| | - Notify: critical vuln alerts |
| +-----------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------+
```
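Step 7 (building linksets) and the later SBOM join can be sketched as follows. This is a simplified illustration rather than Concelier's code: the function names are hypothetical, and it assumes exact string equality between PURLs, whereas real matching would normally account for version ranges.

```python
from collections import defaultdict


def build_linksets(observations):
    """Map each advisory ID to the sorted, de-duplicated PURLs it affects."""
    linksets = defaultdict(set)
    for obs in observations:
        linksets[obs["advisoryId"]].update(obs["affectedPurls"])
    return {adv: sorted(purls) for adv, purls in sorted(linksets.items())}


def advisories_for_sbom(sbom_purls, linksets):
    """Return sorted (advisory_id, purl) pairs for PURLs present in the SBOM."""
    present = set(sbom_purls)
    return sorted(
        (adv, purl)
        for adv, purls in linksets.items()
        for purl in purls
        if purl in present
    )


# Two sources reporting the same CVE contribute to one linkset.
observations = [
    {"advisoryId": "CVE-2024-1234", "affectedPurls": ["pkg:npm/lodash@4.17.20"]},
    {"advisoryId": "CVE-2024-1234",
     "affectedPurls": ["pkg:maven/org.apache.struts/struts2-core@2.5.30"]},
]
linksets = build_linksets(observations)
matches = advisories_for_sbom(["pkg:npm/lodash@4.17.20"], linksets)
```

Sorting both the linksets and the match list keeps the output independent of ingestion order, which is in keeping with the deterministic design described above.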
### 2.2 Advisory Data Model
```
+--------------------------------------------------------------------------------------------+
|                            Raw Layer (Immutable - AOC Enforced)                            |
|                                                                                            |
| {                                                                                          |
|   "advisoryId": "CVE-2024-1234",                                                           |
|   "source": "NVD",                                                                         |
|   "rawDocument": { /* original JSON as received */ },                                      |
|   "publishedAt": "2024-01-15T10:00:00Z",                                                   |
|   "revision": 2,                                                                           |
|   "affectedPurls": [                                                                       |
|     "pkg:npm/lodash@4.17.20",                                                              |
|     "pkg:maven/org.apache.struts/struts2-core@2.5.30"                                      |
|   ],                                                                                       |
|   "severity": {                                                                            |
|     "cvssV3": { "baseScore": 9.8, "vector": "..." },                                       |
|     "cvssV4": { "baseScore": 9.2, "vector": "..." }                                        |
|   }                                                                                        |
| }                                                                                          |
|                                                                                            |
| Key Constraints:                                                                           |
| - Raw documents are NEVER modified after ingestion                                         |
| - Conflicts are preserved, not collapsed                                                   |
| - Multiple sources for same CVE stored separately                                          |
| - Provenance tracked per observation                                                       |
+--------------------------------------------------------------------------------------------+
```
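The key constraints can be pictured as an append-only store that keeps one row per source and refuses overwrites. The sketch below is an in-memory stand-in, not the `advisory_raw` table: the class name is hypothetical, the `(advisory_id, source, revision)` uniqueness key is an assumption, and the second score is made-up sample data.

```python
import copy


class RawAdvisoryStore:
    """In-memory stand-in for an append-only raw advisory table."""

    def __init__(self):
        self._rows = []

    def append(self, advisory_id, source, revision, raw_document):
        key = (advisory_id, source, revision)  # assumed uniqueness key
        if any(row["key"] == key for row in self._rows):
            raise ValueError(f"{key} already ingested; raw rows are never overwritten")
        # Deep-copy so later changes by the caller cannot alter the stored document.
        self._rows.append(
            {"key": key, "source": source, "raw_document": copy.deepcopy(raw_document)}
        )

    def observations(self, advisory_id):
        """Return copies of every row for the advisory, one per source and revision."""
        return [copy.deepcopy(r) for r in self._rows if r["key"][0] == advisory_id]


store = RawAdvisoryStore()
store.append("CVE-2024-1234", "NVD", 1, {"baseScore": 9.8})
store.append("CVE-2024-1234", "OSV", 1, {"baseScore": 7.5})  # hypothetical disagreement
```

Both rows survive side by side, so a downstream consumer sees the disagreement between sources instead of a merged value.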
---
## 3. VEX Data Flow
### 3.1 VEX Ingestion (Excititor)
```
+--------------------------------------------------------------------------------------------+
| VEX Ingestion Pipeline |
| |
| External Sources Excititor.Worker |
| +---------------+ +-------------------------------------------+ |
| | OpenVEX |----------------------->| | |
| | CSAF VEX | | 1. Fetch VEX statements | |
| | SBOM referrers| | 2. For air-gap: use offline bundles | |
| | Vendor feeds | | 3. Verify signatures (if signed) | |
| +---------------+ | 4. Normalize to canonical shape | |
| | 5. Persist immutable raw statements | |
| | 6. Publish to VexLens for consensus | |
| +-------------------------------------------+ |
| | |
| v |
| +-----------------------------------------------------------------------------------------+
| | PostgreSQL (vex schema) |
| | |
| | vex_raw (append-only): |
| | - raw_statement (OpenVEX JSON as-received) |
| | - issuer_id (vendor or trust issuer) |
| | - component_purl |
| | - vulnerability_id (CVE or GHSA) |
| | - status (not_affected, affected, under_investigation) |
| | - justification (component_not_present, vulnerable_code_not_present, etc.) |
| | - published_at |
| | - signature (DSSE envelope if signed) |
| +-----------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------+
```
### 3.2 VEX Consensus (VexLens)
```
+--------------------------------------------------------------------------------------------+
| VEX Consensus Pipeline |
| |
| +--------------------+ |
| | Multiple VEX | |
| | statements for | |
| | same (CVE, PURL) | |
| +--------------------+ |
| | |
| v |
| +--------------------------------------------------------------------------------------------+
| | VexLens Consensus Engine |
| | |
| | 1. Merge observations by component identity (PURL) |
| | 2. Apply issuer priority rules: |
| | - Vendor > Distro > Researcher > Community |
| | 3. Apply trust scores (from IssuerDirectory) |
| | 4. Detect conflicts (multiple issuers disagree) |
| | 5. Preserve conflict state (K4 lattice: True + False = Conflict) |
| | 6. Export consensus outcomes |
| +--------------------------------------------------------------------------------------------+
| | |
| v |
| +--------------------------------------------------------------------------------------------+
| | Policy Engine Integration |
| | |
| | VEX gates override severity thresholds: |
| | - not_affected -> PASS (with evidence) |
| | - affected -> normal evaluation continues |
| | - under_investigation -> WARN (pending) |
| +--------------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------+
```
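The conflict-preserving behaviour in steps 2, 4, and 5 can be sketched with a four-valued join. This is an illustration under stated assumptions, not VexLens itself: the mapping from VEX status to lattice value, the `issuer_type` field, and the `consensus` function are hypothetical, and trust scores (step 3) are left out.

```python
from enum import Enum
from functools import reduce


class K4(Enum):
    UNKNOWN = "unknown"    # no claim either way
    TRUE = "true"          # claimed affected
    FALSE = "false"        # claimed not affected
    CONFLICT = "conflict"  # both claims present


def k4_join(a, b):
    """Least upper bound in the four-valued lattice: True + False = Conflict."""
    if a is K4.UNKNOWN:
        return b
    if b is K4.UNKNOWN or a is b:
        return a
    return K4.CONFLICT


# Assumed mapping from VEX status to a claim about "is affected".
STATUS_TO_K4 = {
    "affected": K4.TRUE,
    "not_affected": K4.FALSE,
    "under_investigation": K4.UNKNOWN,
}
ISSUER_PRIORITY = ["vendor", "distro", "researcher", "community"]


def consensus(statements):
    """Join all claims and report the status from the highest-priority issuer."""
    state = reduce(k4_join, (STATUS_TO_K4[s["status"]] for s in statements), K4.UNKNOWN)
    ranked = sorted(statements, key=lambda s: ISSUER_PRIORITY.index(s["issuer_type"]))
    preferred = ranked[0]["status"] if ranked else None
    return {"state": state, "preferred_status": preferred}
```

Keeping `state` separate from `preferred_status` means a vendor statement can take precedence while the disagreement itself remains visible to the Policy Engine.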
---
## 4. Policy Evaluation Data Flow
### 4.1 Input Assembly
```
+--------------------------------------------------------------------------------------------+
| Policy Engine Input Sources (All Immutable - AOC Enforced) |
| |
| +------------------+ +------------------+ +------------------+ +------------------+ |
| | SBOM | | Advisory | | VEX | | Reachability | |
| | (from RustFS) | | (from vuln.*) | | (from vex.*) | | (from Scanner) | |
| +------------------+ +------------------+ +------------------+ +------------------+ |
| | | | | |
| +--------------------+--------------------+--------------------+ |
| | |
| v |
| +-----------------------------------------------------------------------------------------+
| | Selection Layer |
| | |
| | Deterministic Joining: |
| | - SBOM <-> Advisory (by PURL matching) |
| | - Advisory <-> VEX (by CVE + PURL) |
| | - Component <-> Reachability (by identity) |
| | |
| | Batch Ordering: |
| | - Sort by (tenant, policyId, vulnerabilityId, productKey, source) |
| | - Enables incremental cursor-based processing |
| +-----------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------+
```
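The batch ordering is what makes cursor-based processing repeatable: once rows are sorted by a fixed tuple, the last key of a batch is a stable resume point. The following is a minimal sketch that assumes each row is a dict carrying the five sort fields; `next_batch` and the in-memory list are hypothetical stand-ins for the real selection layer.

```python
from operator import itemgetter

ORDER = ("tenant", "policyId", "vulnerabilityId", "productKey", "source")
sort_key = itemgetter(*ORDER)  # row -> 5-tuple used for ordering and as cursor


def next_batch(rows, cursor, size):
    """Return up to `size` rows strictly after `cursor`, plus the new cursor."""
    ordered = sorted(rows, key=sort_key)
    remaining = [r for r in ordered if cursor is None or sort_key(r) > cursor]
    batch = remaining[:size]
    new_cursor = sort_key(batch[-1]) if batch else cursor
    return batch, new_cursor
```

Because the cursor is derived from the sort key and not from a row offset, the same sequence of batches comes back regardless of the order in which rows were inserted.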
### 4.2 Evaluation Pipeline
```
+--------------------------------------------------------------------------------------------+
| Policy Evaluation Pipeline |
| |
| +-----------------------------------------------------------------------------------------+
| | 1. Load Policy IR (cached by policyId+version hash) |
| +-----------------------------------------------------------------------------------------+
| | |
| v |
| +-----------------------------------------------------------------------------------------+
| | 2. For Each Batch (component, vulnerability): |
| | |
| | +------------------+ |
| | | Evidence-Weighted| severityWeight (from advisory CVSS) |
| | | Score Compute |---> trustWeight (from VEX issuer) |
| | +------------------+ reachabilityWeight (from Scanner entrypoint closure) |
| | | runtimeWeight (from Zastava signals) |
| | v |
| | +------------------+ |
| | | Policy Rules | First-match semantics |
| | | Execution |---> Actions: assign, annotate, escalate, warn |
| | +------------------+ |
| | | |
| | v |
| | +------------------+ |
| | | Exception Apply | Specificity-ranked |
| | | |---> Effects: suppress, defer, downgrade, require-control |
| | +------------------+ |
| | | |
| | v |
| | +------------------+ |
| | | Unknown Budget | Per-environment limits |
| | | Check |---> Block if exceeded, Warn if approaching |
| | +------------------+ |
| | | |
| | v |
| | +------------------+ |
| | | Confidence Calc | 5 factors: reachability, runtime, VEX, provenance, policy |
| | | |---> Final = Clamp01(Sum(Weight x RawValue)) |
| | +------------------+ Tiers: VeryHigh(>=0.9), High(>=0.7), Medium(>=0.5), etc. |
| | | |
| | v |
| | +------------------+ |
| | | VEX Decision | Emit OpenVEX statements for verdict changes |
| | | Emission |---> DSSE-signed, logged to Rekor v2 |
| | +------------------+ |
| +-----------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------+
```
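The confidence step reduces to a clamped weighted sum followed by a threshold lookup. The sketch below uses the five factor names and the three tier thresholds shown in the diagram; the weights are hypothetical, and because the diagram abbreviates the lower tiers as "etc.", the function returns `None` below 0.5 instead of guessing their names.

```python
FACTORS = ("reachability", "runtime", "vex", "provenance", "policy")


def clamp01(value):
    """Clamp a number into the closed interval [0, 1]."""
    return max(0.0, min(1.0, value))


def confidence(weights, raw_values):
    """Final = Clamp01(Sum(Weight x RawValue)) over the five factors."""
    return clamp01(sum(weights[f] * raw_values[f] for f in FACTORS))


def tier(score):
    """Map a score to the tiers named in the diagram."""
    if score >= 0.9:
        return "VeryHigh"
    if score >= 0.7:
        return "High"
    if score >= 0.5:
        return "Medium"
    return None  # lower tiers are not enumerated in this document


# Hypothetical equal weighting; real weights would come from policy configuration.
weights = dict.fromkeys(FACTORS, 0.2)
```

Clamping after the sum means weights do not have to add up to exactly 1 for the result to stay in range, although they would normally be chosen that way.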
### 4.3 Output Materialization
```
+--------------------------------------------------------------------------------------------+
| Finding Materialization |
| |
| +------------------------------------------+ +------------------------------------------+|
| | Current Snapshot | | History (Audit Trail) ||
| | policy.effective_finding_{policyId} | | policy.effective_finding_{policyId}_history|
| +------------------------------------------+ +------------------------------------------+|
| | - finding_key (deterministic digest) | | - All snapshots with timestamps ||
| | - severity (CRITICAL, HIGH, etc.) | | - Previous verdicts ||
| | - source (NVD, vendor, research) | | - Provenance chain ||
| | - advisory_raw_ids (back-link) | | ||
| | - vex_raw_ids (if overridden) | | ||
| | - sbom_component_id (inventory link) | | ||
| | - verdict (PASS, BLOCK, WARN, FAIL) | | ||
| | - confidence_score (0-100) | | ||
| | - explained_trace (policy rule hits) | | ||
| +------------------------------------------+ +------------------------------------------+|
| |
| Determinism Hash: |
| SHA256(policyVersion + batchCursor + inputsHash) |
| -> Same inputs always produce same outputs |
+--------------------------------------------------------------------------------------------+
```
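The determinism hash can be computed with the standard library. The diagram writes the input as `policyVersion + batchCursor + inputsHash` without specifying an encoding, so this sketch makes two assumptions: the parts are UTF-8 strings, and each part is length-prefixed so that different splits of the same characters cannot collide. `determinism_hash` is a hypothetical name.

```python
import hashlib


def determinism_hash(policy_version, batch_cursor, inputs_hash):
    """SHA-256 over the three inputs, each prefixed with its byte length."""
    digest = hashlib.sha256()
    for part in (policy_version, batch_cursor, inputs_hash):
        data = part.encode("utf-8")
        # Assumed framing: an 8-byte big-endian length keeps part boundaries unambiguous.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

With plain concatenation, `("ab", "c")` and `("a", "bc")` would hash identically; the length prefix is one simple way to rule that out.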
---
## 5. Event-Driven Flows
### 5.1 Event Bus Architecture
```
+--------------------------------------------------------------------------------------------+
| Valkey Streams / NATS JetStream Event Bus |
| |
| +-----------------------------------------------------------------------------------------+
| | Stream: scanner:events |
| | Events: scan.submitted, scan.started, scan.completed, scan.failed |
| | Consumers: Policy, Notify, TimelineIndexer, ExportCenter |
| +-----------------------------------------------------------------------------------------+
| |
| +-----------------------------------------------------------------------------------------+
| | Stream: concelier:drift |
| | Events: advisory.new, advisory.updated, advisory.withdrawn |
| | Consumers: Scheduler, Policy, Notify |
| +-----------------------------------------------------------------------------------------+
| |
| +-----------------------------------------------------------------------------------------+
| | Stream: policy:evaluated |
| | Events: evaluation.completed, verdict.changed, exception.applied |
| | Consumers: Notify, Findings, ExportCenter |
| +-----------------------------------------------------------------------------------------+
| |
| +-----------------------------------------------------------------------------------------+
| | Stream: scheduler:jobs |
| | Events: run.started, run.completed, run.failed, rescan.triggered |
| | Consumers: Scanner, Notify, TimelineIndexer |
| +-----------------------------------------------------------------------------------------+
| |
| +-----------------------------------------------------------------------------------------+
| | Stream: notify:delivery |
| | Events: notification.sent, notification.failed, notification.throttled |
| | Consumers: Audit, TimelineIndexer |
| +-----------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------+
```
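The stream-to-consumer wiring above can be written down as a routing table. The dictionary below copies the streams and consumers from the diagram; `fan_out` and the envelope fields are hypothetical and only model the delivery pattern, not the Valkey or NATS client APIs.

```python
STREAM_CONSUMERS = {
    "scanner:events": ["Policy", "Notify", "TimelineIndexer", "ExportCenter"],
    "concelier:drift": ["Scheduler", "Policy", "Notify"],
    "policy:evaluated": ["Notify", "Findings", "ExportCenter"],
    "scheduler:jobs": ["Scanner", "Notify", "TimelineIndexer"],
    "notify:delivery": ["Audit", "TimelineIndexer"],
}


def fan_out(stream, event_type, payload):
    """Build one delivery envelope per consumer subscribed to the stream."""
    return [
        {"consumer": consumer, "stream": stream, "type": event_type, "payload": payload}
        for consumer in STREAM_CONSUMERS.get(stream, [])
    ]


# Hypothetical payload; the event schema is not defined in this document.
deliveries = fan_out("scanner:events", "scan.completed", {"scanId": "scan-1"})
```

Each consumer receives its own copy of the event, which matches the scan completion flow where Policy, Notify, TimelineIndexer, and ExportCenter all react to the same `scan.completed` message.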
### 5.2 Scan Completion Event Flow
```
scan.completed event
|
+-------------------+-------------------+-------------------+
| | | |
v v v v
+----------------+ +----------------+ +----------------+ +----------------+
| Policy.Engine | | Notify.Worker | |TimelineIndexer | | ExportCenter |
+----------------+ +----------------+ +----------------+ +----------------+
| | | | | | | |
| Evaluate with | | Check rules | | Index event | | Generate SARIF |
| new SBOM data | | for matches | | for audit | | if configured |
| | | | | | | |
+-------+--------+ +-------+--------+ +-------+--------+ +----------------+
| | |
v v v
policy:evaluated notification.sent event indexed
| |
+-------------------+
|
v
Downstream consumers
```
---
## 6. Offline/Air-Gap Data Flow
### 6.1 Offline Kit Contents
```
+--------------------------------------------------------------------------------------------+
|                                Offline Update Kit Structure                                |
|                                                                                            |
| offline-kit-2025-01-02/                                                                    |
| +-- feeds/                                                                                 |
| |   +-- nvd/            # NVD advisory snapshots                                           |
| |   +-- osv/            # OSV advisory snapshots                                           |
| |   +-- ghsa/           # GHSA advisory snapshots                                          |
| |   +-- vex/            # VEX statement bundles                                            |
| +-- images/             # Container images for platform services                           |
| +-- sboms/              # Pre-generated SBOMs for bundled images                           |
| +-- signatures/         # DSSE-signed bundles                                              |
| |   +-- feeds.dsse      # Signed feed manifest                                             |
| |   +-- images.dsse     # Signed image manifest                                            |
| +-- trust-roots/        # CA certificates, JWKS                                            |
| +-- policies/           # Default policy definitions                                       |
| +-- manifest.json       # Kit contents and checksums                                       |
| +-- manifest.dsse       # Signed manifest                                                  |
+--------------------------------------------------------------------------------------------+
```
### 6.2 Offline Ingestion Flow
```
+--------------------------------------------------------------------------------------------+
| Air-Gap Ingestion Pipeline |
| |
| USB/Portable Media AirGap.Importer |
| +---------------+ +-------------------------------------------+ |
| | Offline Kit |----------------------->| | |
| | (signed) | | 1. Verify manifest signature (DSSE) | |
| +---------------+ | 2. Validate checksums | |
| | 3. Import feeds to Concelier | |
| | 4. Import VEX to Excititor | |
| | 5. Update trust roots | |
| | 6. Trigger re-evaluation | |
| +-------------------------------------------+ |
| |
| Key Guarantees: |
| - No network calls during import |
| - All verification is local |
| - Deterministic outputs match online mode |
| - Full audit trail preserved |
+--------------------------------------------------------------------------------------------+
```
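Step 2 (checksum validation) needs nothing beyond local hashing, which is consistent with the no-network guarantee. The sketch below assumes a manifest that maps relative paths to hex SHA-256 digests; that shape, the function name, and the in-memory `files` mapping are hypothetical, and DSSE signature verification from step 1 is out of scope here.

```python
import hashlib


def verify_checksums(expected, files):
    """Compare SHA-256 digests of kit files against manifest entries.

    expected: relative path -> hex SHA-256 digest (assumed manifest shape)
    files:    relative path -> file bytes as read from the kit
    Returns a sorted list of (path, problem); an empty list means the kit is intact.
    """
    problems = []
    for path in sorted(set(expected) | set(files)):
        if path not in files:
            problems.append((path, "missing"))
        elif path not in expected:
            problems.append((path, "unlisted"))
        elif hashlib.sha256(files[path]).hexdigest() != expected[path]:
            problems.append((path, "mismatch"))
    return problems
```

Reporting files that are present but not listed, in addition to missing and altered ones, is a design choice of this sketch: it keeps unexpected content from being imported alongside verified feeds.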
---
## Related Documentation
- [User Flows](user-flows.md)
- [Module Matrix](module-matrix.md)
- [Schema Mapping](schema-mapping.md)
- [Data Schemas](../../11_DATA_SCHEMAS.md)
- [Offline Kit](../../24_OFFLINE_KIT.md)