Here’s a crisp idea you can drop straight into Stella Ops: treat “unknowns” as first-class data, not noise.

---

# Unknowns Registry: turning uncertainty into signals

**Why:** Scanners and VEX feeds miss things (ambiguous package IDs, unverifiable hashes, orphaned layers, missing SBOM edges, runtime-only artifacts). Today these get logged and forgotten. If we **structure** them, downstream agents can reason about risk and shrink blast radius proactively.

**What it is:** A small service + schema that records every uncertainty with enough context for later inference.

## Core model (v0)

```json
{
  "unknown_id": "unk:sha256:…",
  "observed_at": "2025-11-18T12:00:00Z",
  "provenance": {
    "source": "Scanner.Analyzer.DotNet|Sbomer|Signals|Vexer",
    "host": "runner-42",
    "scan_id": "scan:…"
  },
  "scope": {
    "artifact": { "type": "oci.image", "ref": "registry/app@sha256:…" },
    "subpath": "/app/bin/Contoso.dll",
    "phase": "build|scan|runtime"
  },
  "unknown_type": "identity_gap|version_conflict|hash_mismatch|missing_edge|runtime_shadow|policy_undecidable",
  "evidence": {
    "raw": "nuget id 'Serilog' but assembly name 'Serilog.Core'",
    "signals": ["sym:Serilog.Core.Logger", "procopen:/app/agent"]
  },
  "transitive": { "depth": 2, "parents": ["pkg:nuget/Serilog@?"], "children": [] },
  "confidence": { "p": 0.42, "method": "bayes-merge|rule" },
  "exposure_hints": { "surface": ["logging pipeline", "startup path"], "runtime_hits": 3 },
  "status": "open|triaged|suppressed|resolved",
  "labels": ["reachability:possible", "sbom:incomplete"]
}
```

## Categorize by three axes

* **Provenance** (where it came from): Scanner vs Sbomer vs Vexer vs Signals.
* **Scope** (what it touches): image/layer/file/symbol/runtime-proc/policy.
* **Transitive depth** (how far from an entry point): 0 = direct, 1..N via deps.

## How agents use it

* **Cartographer**: includes unknown edges in the graph with special weight; lets Policy/Lattice down-rank vulnerable nodes near high-impact unknowns.
* **Remedy Assistant (Zastava)**: proposes micro-probes (“add EventPipe/JFR tap for X symbol”) or build-time assertions (“pin Serilog>=3.1, regenerate SBOM”).
* **Scheduler**: prioritizes scans where unknown density × asset criticality is highest.

## Minimal API (idempotent, additive)

* `POST /unknowns/ingest`: upsert by `unknown_id` (hash of type+scope+evidence).
* `GET /unknowns?artifact=…&status=open`: list for a target.
* `POST /unknowns/:id/triage`: set status/labels, attach rationale.
* `GET /metrics`: density by artifact/namespace/unknown_type.

*All additive; no versioning required. Repeat calls with the same payload are no-ops.*

## Scoring hook (into your lattice)

* Add an **“Unknowns Pressure”** term:
  `risk = base ⊕ (α * density_depth≤1) ⊕ (β * runtime_shadow) ⊕ (γ * policy_undecidable)`
* Gate “green” only if `density_depth≤1 == 0` **or** compensating controls are active.

## Storage & plumbing

* **Store:** append-only KV (Badger/Rocks) + Graph overlay (SQLite/Neo4j, your call).
* **Emit:** DSSE-signed “Unknowns Attestation” per scan for replayable audits.
* **UI:** heatmap per artifact (unknowns by type × depth), drill-down to evidence.

## First 2-day slice

1. Define `unknown_type` enum + hashable `unknown_id`.
2. Wire Scanner/Sbomer/Vexer to emit unknowns (start with: identity_gap, missing_edge).
3. Persist + expose `/metrics` (density, by depth and type).
4. In Policy Studio, add the Unknowns Pressure term with default α/β/γ.

If you want, I’ll draft the exact protobuf/JSON schema and add .NET 10 record types + an EF model, plus a tiny CLI to query and a Grafana panel JSON.

I will treat “it” as the whole vision behind **Pushing Binary Reachability Toward True Determinism** inside Stella Ops: function-/symbol-level reachability for binaries and higher-level languages, wired into Scanner, Cartographer, Signals, and VEX.

Below is an implementation-oriented architecture plan you can hand directly to agents.

---

## 1. Scope, goals, and non-negotiable invariants

### 1.1. Scope

Deliver a deterministic reachability pipeline for containers that:

1. Builds **call graphs** and **symbol usage maps** for:
   * Native binaries (ELF, PE, Mach-O), which are the primary target for this branch.
   * Scripted/VM languages later: JS, Python, PHP (as part of the same architecture).
2. Maps symbols and functions to:
   * Packages (purls).
   * Vulnerabilities (CVE → symbol/function list via Concelier/VEX data).
3. Computes **deterministic reachability states** for each `(vulnerability, artifact)` pair.
4. Emits:
   * Machine-readable JSON (with `purl`s).
   * Graph overlays for Cartographer.
   * Inputs for the lattice/trust engine and VEXer/Excitor.

### 1.2. Invariants

* **Deterministic replay**: Given the same:
  * Image digest(s),
  * Analyzer versions,
  * Config + policy,
  * Runtime trace inputs (if any),

  the same reachability outputs must be produced, bit-for-bit.
* **Idempotent, additive APIs**:
  * No versioning of endpoints, only additive/optional fields.
  * Same request = same response, no side effects besides storing/caching.
* **Lattice logic runs in `Scanner.WebService`**:
  * All “reachable/unreachable/unknown” and confidence merging lives in Scanner, not Concelier/Excitors.
* **Preserve prune source**:
  * Concelier and Excitors preserve provenance and do not “massage” reachability; they only consume it.
* **Offline, air-gap friendly**:
  * No mandatory external calls; the pipeline depends only on local analyzers and a local advisory/VEX cache.

---

## 2. High-level pipeline

From container image to reachability output:

1. **Image enumeration**
   `Scanner.WebService` receives an image ref or tarball and spawns an analysis run.
2. **Binary discovery & classification**
   Binary analyzers detect ELF/PE/Mach-O + main interpreters (python, node, php) and scripts.
3. **Symbolization & call graph building**
   * For each binary/module, we produce:
     * Symbol table (exported + imported).
     * Call graph edges (function-level where possible).
   * For dynamic languages, we later plug in appropriate analyzers.
4. **Symbol→package mapping**
   * Match symbols to packages and `purl`s using:
     * Known vendor symbol maps (from Concelier / Feedser).
     * Heuristics, path patterns, build IDs.
5. **Vulnerability→symbol mapping**
   * From Concelier/VEX/CSAF: map each CVE to the set of symbols/functions it affects.
6. **Reachability solving**
   * For each `(CVE, artifact)`:
     * Determine presence and reachability of affected symbols from known entrypoints.
     * Merge static call graph and runtime signals (if available) via deterministic lattice.
7. **Output & storage**
   * Reachability JSON with purls and confidence.
   * Graph overlay into Cartographer.
   * Signals/events for downstream scoring.
   * DSSE-signed reachability attestation for replay/audit.

---

## 3. Component architecture

### 3.1. New and extended services

1. **`StellaOps.Scanner.WebService` (extended)**
   * Orchestration of reachability analyses.
   * Lattice/merging engine.
   * Idempotent reachability APIs.
2. **`StellaOps.Scanner.Analyzers.Binary.*` (new)**
   * `…Binary.Discovery`: file type detection, ELF/PE/Mach-O parsing.
   * `…Binary.Symbolizer`: resolves symbols, imports/exports, relocations.
   * `…Binary.CallGraph.Native`: builds call graphs where possible (via disassembly/CFG).
   * `…Binary.CallGraph.DynamicStubs`: heuristics for indirect calls, PLT/GOT, vtables.
3. **`StellaOps.Scanner.Analyzers.Script.*` (future extension)**
   * `…Lang.JavaScript.CallGraph`
   * `…Lang.Python.CallGraph`
   * `…Lang.Php.CallGraph`
   * These emit the same generic call-graph IR.
4. **`StellaOps.Reachability.Engine` (within Scanner.WebService)**
   * Normalizes all call graphs into a common IR.
   * Merges static and dynamic evidence.
   * Computes reachability states and scores.
5. **`StellaOps.Cartographer.ReachabilityOverlay` (new overlay module)**
   * Stores per-artifact call graphs and reachability tags.
   * Provides graph queries for UI and policy tools.
6. **`StellaOps.Signals` (extended)**
   * Ingests runtime call traces (e.g., from EventPipe/JFR/eBPF in other branches).
   * Feeds function-hit events into the Reachability Engine.
7. **Unknowns Registry integration (optional but recommended)**
   * Stores unresolved symbol/package mappings and incomplete edges as `unknowns`.
   * Used to adjust risk scores (“Unknowns Pressure”) when binary analysis is incomplete.

---

## 4. Detailed design by layer

### 4.1. Static analysis layer (binaries)

#### 4.1.1. Binary discovery

Module: `StellaOps.Scanner.Analyzers.Binary.Discovery`

* Inputs:
  * Per-image file list (from existing Scanner).
  * Byte slices of candidate binaries.
* Logic:
  * Detect ELF/PE/Mach-O via magic bytes, not extensions.
  * Classify as:
    * Main executable
    * Shared library
    * Plugin/module
* Output:
  * `binary_manifest.json` per image:

    ```json
    {
      "image_ref": "registry/app@sha256:…",
      "binaries": [
        {
          "id": "bin:elf:/usr/local/bin/app",
          "path": "/usr/local/bin/app",
          "format": "elf",
          "arch": "x86_64",
          "role": "executable"
        }
      ]
    }
    ```

#### 4.1.2. Symbolization

Module: `StellaOps.Scanner.Analyzers.Binary.Symbolizer`

* Uses:
  * ELF/PE/Mach-O parsers (internal or third-party), no external calls.
* Output per binary:

  ```json
  {
    "binary_id": "bin:elf:/usr/local/bin/app",
    "build_id": "buildid:abcd…",
    "exports": ["pkg1::ClassA::method1", "..."],
    "imports": ["openssl::EVP_EncryptInit_ex", "..."],
    "sections": { "text": { "va": "0x...", "size": 12345 } }
  }
  ```

* Writes unresolved symbol sets to Unknowns Registry when:
  * Imports cannot be tied to known packages or symbols.

#### 4.1.3. Call graph construction

Module: `StellaOps.Scanner.Analyzers.Binary.CallGraph.Native`

* Core tasks:
  * Build control-flow graphs (CFG) for each function via:
    * Disassembly.
    * Basic block detection.
  * Identify direct calls (`call func`) and indirect calls (function pointers, vtables).
* IR model:

  ```json
  {
    "binary_id": "bin:elf:/usr/local/bin/app",
    "functions": [
      { "fid": "func:app::main", "va": "0x401000", "size": 128 },
      { "fid": "func:libssl::EVP_EncryptInit_ex", "external": true }
    ],
    "edges": [
      { "caller": "func:app::main", "callee": "func:app::init_config", "type": "direct" },
      { "caller": "func:app::main", "callee": "func:libssl::EVP_EncryptInit_ex", "type": "import" }
    ]
  }
  ```

* Edge confidence:
  * `type: direct|import|indirect|heuristic`
  * Used later by the lattice.

#### 4.1.4. Entry point inference

* Sources:
  * Binary headers: ELF `e_entry` (with `PT_INTERP` identifying the dynamic loader), PE `AddressOfEntryPoint`.
  * Application-level hints (known frameworks, service main methods).
  * Container metadata (CMD, ENTRYPOINT).
* Output:

  ```json
  {
    "binary_id": "bin:elf:/usr/local/bin/app",
    "entrypoints": ["func:app::main"]
  }
  ```

> Note: For JS/Python/PHP, equivalent analyzers will later define module entrypoints (`index.js`, `wsgi_app`, `public/index.php`).

---

### 4.2. Symbol-to-package and CVE-to-symbol mapping

#### 4.2.1. Symbol→package mapping

Module: `StellaOps.Reachability.Mapping.SymbolToPurl`

* Inputs:
  * Binary symbolization outputs.
  * Local mapping DB in Concelier (vendor symbol maps, debug info, name patterns).
  * File path + container context (`/usr/lib/...`, `/site-packages/...`).
* Output:

  ```json
  {
    "symbol": "libssl::EVP_EncryptInit_ex",
    "purl": "pkg:apk/alpine/openssl@3.1.5-r2",
    "confidence": 0.93,
    "method": "vendor_map+path_heuristic"
  }
  ```

* Unresolved / ambiguous symbols:
  * Stored as `unknowns` of type `identity_gap`.

#### 4.2.2. CVE→symbol mapping

Responsibility: Concelier + its advisory ingestion.

* For each vulnerability:

  ```json
  {
    "cve_id": "CVE-2025-12345",
    "purl": "pkg:apk/alpine/openssl@3.1.5-r2",
    "affected_symbols": [
      "libssl::EVP_EncryptInit_ex",
      "libssl::EVP_EncryptUpdate"
    ],
    "source": "vendor_vex",
    "confidence": 1.0
  }
  ```

* Reachability Engine consumes this mapping read-only.

---

### 4.3. Reachability Engine

Module: `StellaOps.Reachability.Engine` (in Scanner.WebService)

#### 4.3.1. Core data model

Per `(artifact, cve, purl)`:

```json
{
  "artifact": { "type": "oci.image", "ref": "registry/app@sha256:…" },
  "cve_id": "CVE-2025-12345",
  "purl": "pkg:apk/alpine/openssl@3.1.5-r2",
  "symbols": [
    {
      "symbol": "libssl::EVP_EncryptInit_ex",
      "static_presence": "present|absent|unknown",
      "static_reachability": "reachable|unreachable|unknown",
      "runtime_hits": 3,
      "runtime_reachability": "observed|not_observed|unknown"
    }
  ],
  "reachability_state": "confirmed_reachable|statically_reachable|present_not_reachable|not_present|unknown",
  "confidence": {
    "p": 0.87,
    "evidence": ["static_callgraph", "runtime_trace", "symbol_map"],
    "unknowns_pressure": 0.12
  }
}
```

#### 4.3.2. Lattice / state machine

Define a deterministic lattice over states, ordered from weakest to strongest evidence of reachability:

* `NOT_PRESENT`
* `PRESENT_NOT_REACHABLE`
* `STATICALLY_REACHABLE`
* `RUNTIME_OBSERVED` (serialized as `confirmed_reachable` in the data model above)

An “unknown” flag is overlaid on a state when evidence is missing.

Merging rules (simplified):

* If no affected symbol is found in any binary and there is no conflicting evidence → `NOT_PRESENT`.
* If at least one affected symbol is on a static path from any entrypoint → `STATICALLY_REACHABLE`.
* If a symbol is observed at runtime → `RUNTIME_OBSERVED` (top state).
* If a symbol is present in a binary but not on any static path → `PRESENT_NOT_REACHABLE`, unless unknown edges exist near it (in which case the result keeps the state but carries lower confidence).
* Unknowns Registry entries near affected symbols increase `unknowns_pressure` and may turn a `NOT_PRESENT` result into `unknown`.

Implementation: pure functional merge functions inside Scanner.WebService:

```csharp
ReachabilityState Merge(ReachabilityState a, ReachabilityState b);
ReachabilityState FromEvidence(StaticEvidence s, RuntimeEvidence r, UnknownsPressure u);
```

#### 4.3.3. Deterministic inputs

To guarantee replay:

* Build a **Reachability Plan Manifest** per run:

  ```json
  {
    "plan_id": "reach:sha256:…",
    "scanner_version": "1.4.0",
    "analyzers": {
      "binary_discovery": "1.0.0",
      "binary_symbolizer": "1.1.0",
      "binary_callgraph": "1.2.0"
    },
    "inputs": {
      "image_digest": "sha256:…",
      "runtime_trace_files": ["signals:run:2025-11-18T12:00:00Z"],
      "config": {
        "assume_indirect_calls": "conservative",
        "max_call_depth": 10
      }
    }
  }
  ```

* DSSE-sign the plan + result.

---

### 4.4. Storage and graph overlay

#### 4.4.1. Reachability store

Backend: re-use existing Scanner/Cartographer storage stack (e.g., Postgres or SQLite + blob store).

Tables/collections:

* `reachability_runs`
  * `plan_id`, `image_ref`, `created_at`, `scanner_version`.
* `reachability_results`
  * `plan_id`, `cve_id`, `purl`, `state`, `confidence_p`, `unknowns_pressure`, `payload_json`.
* Indexes on `(image_ref, cve_id)`, `(image_ref, purl)`.

#### 4.4.2. Cartographer overlay

Edges:

* `IMAGE` → `BINARY` → `FUNCTION` → `PACKAGE` → `CVE`
* Extra property on `IMAGE -[AFFECTED_BY]-> CVE`:
  * `reachability_state`
  * `reachability_plan_id`

Enables queries:

* “Show me all CVEs with `STATICALLY_REACHABLE` in this namespace.”
* “Show me binaries with high density of reachable crypto CVEs.”

---

### 4.5. APIs (idempotent, additive)

#### 4.5.1. Trigger reachability

`POST /reachability/runs`

Request:

```json
{
  "artifact": { "type": "oci.image", "ref": "registry/app@sha256:…" },
  "config": {
    "include_languages": ["binary"],
    "max_call_depth": 10,
    "assume_indirect_calls": "conservative"
  }
}
```

Response:

```json
{ "plan_id": "reach:sha256:…" }
```

* Idempotent key: `(image_ref, config_hash)`. Subsequent calls with the same key return the same `plan_id`.

#### 4.5.2. Fetch results

`GET /reachability/runs/:plan_id`

```json
{
  "plan": { /* reachability plan manifest */ },
  "results": [
    {
      "cve_id": "CVE-2025-12345",
      "purl": "pkg:apk/alpine/openssl@3.1.5-r2",
      "reachability_state": "statically_reachable",
      "confidence": { "p": 0.84, "unknowns_pressure": 0.1 }
    }
  ]
}
```

#### 4.5.3. Per-CVE view for VEXer/Excitor

`GET /reachability/by-cve?artifact=…&cve_id=…`

* Returns filtered result for downstream VEX creation.

All APIs are **read-only** except for the side effect of storing/caching runs.

---

## 5. Interaction with other Stella Ops modules

### 5.1. Concelier

* Provides:
  * CVE→purl→symbol mapping.
  * Vendor VEX statements indicating affected functions.
* Consumes:
  * Nothing from reachability directly; `Scanner.WebService` passes the reachability summary to VEXer/Excitor, which merges it with vendor statements.

### 5.2. VEXer / Excitor

* Input:
  * For each `(artifact, cve)`:
    * Reachability state.
    * Confidence.
* Logic:
  * Translate states to VEX statements:
    * `NOT_PRESENT` → `not_affected`
    * `PRESENT_NOT_REACHABLE` → `not_affected` (with justification “code not reachable according to analysis”)
    * `STATICALLY_REACHABLE` → `affected`
    * `RUNTIME_OBSERVED` → `affected` (higher severity)
  * Attach determinism proof:
    * Plan ID + DSSE of reachability run.

### 5.3. Signals

* Provides:
  * Function hit events: `(binary_id, function_id, timestamp)` aggregated per image.
* Reachability Engine:
  * Marks `runtime_hits` and state `RUNTIME_OBSERVED` for symbols with hits.
* Unknowns:
  * If runtime sees hits in functions with no static edges to entrypoints (or unmapped symbols), these produce Unknowns and increase `unknowns_pressure`.

### 5.4. Unknowns Registry

* From reachability pipeline, create Unknowns when:
  * Symbol→package mapping is ambiguous.
  * CVE→symbol mapping exists, but symbol cannot be found in binaries.
  * Call graph has indirect calls that cannot be resolved.
* The “Unknowns Pressure” term is fed into:
  * Reachability confidence.
  * Global risk scoring (Trust Algebra Studio).

---

## 6. Implementation phases and engineering plan

### Phase 0 – Scaffolding & manifests (1 sprint)

* Create:
  * `StellaOps.Reachability.Engine` skeleton.
  * Reachability Plan Manifest schema.
  * Reachability Run + Result persistence.
* Add `/reachability/runs` and `/reachability/runs/:plan_id` endpoints, returning mock data.
* Wire DSSE attestation generation for reachability results (even if payload is empty).

### Phase 1 – Binary discovery + symbolization (1–2 sprints)

* Implement `Binary.Discovery` and `Binary.Symbolizer`.
* Feed symbol tables into Reachability Engine as “presence-only evidence”:
  * States: `NOT_PRESENT` vs `PRESENT_NOT_REACHABLE` vs `UNKNOWN`.
* Integrate with Concelier’s CVE→purl mapping (no symbol-level yet):
  * For CVEs affecting a package present in the image, mark as `PRESENT_NOT_REACHABLE`.
* Emit Unknowns for unresolved binary roles and ambiguous package mapping.

Deliverable: package-level reachability with deterministic manifests.

### Phase 2 – Binary call graphs & entrypoints (2–3 sprints)

* Implement `Binary.CallGraph.Native`:
  * CFG + direct call edges.
* Implement entrypoint inference from binary + container ENTRYPOINT/CMD.
* Add static reachability algorithm:
  * DFS/BFS from entrypoints through call graph.
  * Mark affected symbols as reachable if found on paths.
* Extend Concelier to ingest symbol-aware vulnerability metadata (for pilots; can be partial).

Deliverable: function-level static reachability for native binaries where symbol maps exist.

### Phase 3 – Runtime integration (2 sprints, may be in parallel workstream)

* Integrate Signals runtime evidence:
  * Define schema for function hit events.
  * Add ingestion path into Reachability Engine.
* Update lattice:
  * Promote symbols to `RUNTIME_OBSERVED` when hits exist.
* Extend DSSE attestation to reference runtime evidence URIs (hashes of trace inputs).

Deliverable: static + runtime-confirmed reachability.
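
The Phase 2 static reachability step (BFS from entrypoints through the call graph) could look roughly like the sketch below. It is only an illustration under assumptions: the `Edge` record and the `StaticReachability` class are hypothetical names, with fields mirroring the `caller`/`callee`/`type` keys of the call-graph IR in 4.1.3. Sorting with an ordinal comparer is one way to keep traversal order stable across runs, in line with the replay invariant.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical IR type: mirrors the "edges" entries of the call-graph JSON (4.1.3).
public sealed record Edge(string Caller, string Callee, string Type);

public static class StaticReachability
{
    // Breadth-first search from the entrypoints over caller -> callee edges.
    // Adjacency lists and the initial queue are sorted ordinally, so the
    // traversal order does not depend on input ordering or culture settings.
    public static SortedSet<string> ReachableFrom(
        IEnumerable<string> entrypoints, IEnumerable<Edge> edges)
    {
        var adjacency = edges
            .GroupBy(e => e.Caller, StringComparer.Ordinal)
            .ToDictionary(
                g => g.Key,
                g => g.Select(e => e.Callee)
                      .Distinct()
                      .OrderBy(c => c, StringComparer.Ordinal)
                      .ToList(),
                StringComparer.Ordinal);

        var visited = new SortedSet<string>(StringComparer.Ordinal);
        var queue = new Queue<string>(
            entrypoints.OrderBy(e => e, StringComparer.Ordinal));

        while (queue.Count > 0)
        {
            var function = queue.Dequeue();
            if (!visited.Add(function)) continue;          // already expanded
            if (!adjacency.TryGetValue(function, out var callees)) continue;
            foreach (var callee in callees) queue.Enqueue(callee);
        }

        return visited;
    }
}
```

An affected symbol would then count as statically reachable when its function ID appears in the returned set. This sketch ignores `max_call_depth` and treats every edge type alike; weighting `indirect` and `heuristic` edges differently would be a natural extension once edge confidence feeds the lattice.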
### Phase 4 – Unknowns & pressure (1 sprint)

* Wire Unknowns Registry:
  * Emit unknowns from Symbolizer and CallGraph (identity gaps, missing edges).
* Compute `unknowns_pressure` per `(artifact, cve)` as density of unknowns near affected symbols.
* Adjust confidence calculation in Reachability Engine.
* Expose unknowns metrics in API and Cartographer.

Deliverable: explicit modelling of uncertainty, feeding into trust/lattice.

### Phase 5 – Language extensions (JS/Python/PHP) (ongoing)

* Implement per-language call-graph analyzers creating the same IR as binary.
* Extend symbol→purl mapping for these ecosystems (npm, PyPI, Packagist).
* Update reachability solver to include multi-language edges (e.g., Python calling into native modules).

---

## 7. Minimal contracts for agents

To hand off to agents, you can codify:

1. **IR schemas**
   * Call graph IR.
   * Reachability Result JSON.
   * Reachability Plan Manifest.
2. **API contracts**
   * `POST /reachability/runs`
   * `GET /reachability/runs/:plan_id`
   * `GET /reachability/by-cve`
3. **Module boundaries**
   * `Scanner.Analyzers.Binary.*` produce IR only; NO network calls.
   * `Reachability.Engine` is the only place where lattice logic lives.
   * `Concelier` is read-only for reachability; no custom logic there.
4. **Determinism practices**
   * All algorithmic randomness is banned; where unavoidable, seed with values derived from plan_id.
   * All external inputs must be listed in the Plan Manifest.

If you like, as a next step I can draft:

* Concrete C# record types for the IRs.
* A small pseudo-code implementation of the lattice functions and static reachability DFS.
* A proposed directory layout under `src/StellaOps.Scanner` and `src/StellaOps.Cartographer`.
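
As a starting point for the lattice functions, here is a minimal sketch of `Merge` and `FromEvidence` that keeps the two signatures from section 4.3.2. Everything else is an assumption made for illustration: the `ReachabilityLevel` enum, the fields of `StaticEvidence`, `RuntimeEvidence` and `UnknownsPressure`, the 0.5 threshold, and the choice to model the “unknown” overlay as a sticky boolean flag.

```csharp
using System;

// Ordered so that a larger value means stronger evidence of reachability.
public enum ReachabilityLevel
{
    NotPresent = 0,
    PresentNotReachable = 1,
    StaticallyReachable = 2,
    RuntimeObserved = 3
}

// Hypothetical shapes: the plan names these types but does not define them.
public readonly record struct ReachabilityState(ReachabilityLevel Level, bool Unknown);
public sealed record StaticEvidence(bool SymbolPresent, bool OnPathFromEntrypoint);
public sealed record RuntimeEvidence(int Hits);
public readonly record struct UnknownsPressure(double Value);

public static class Lattice
{
    // Assumed cut-off above which a negative conclusion is flagged as unknown.
    public const double UnknownThreshold = 0.5;

    // Join of two states: highest level wins, the unknown flag is sticky.
    // Max and OR are both commutative, associative and idempotent, so the
    // result does not depend on the order in which evidence is merged.
    public static ReachabilityState Merge(ReachabilityState a, ReachabilityState b) =>
        new ReachabilityState(
            (ReachabilityLevel)Math.Max((int)a.Level, (int)b.Level),
            a.Unknown || b.Unknown);

    public static ReachabilityState FromEvidence(
        StaticEvidence s, RuntimeEvidence r, UnknownsPressure u)
    {
        var level = ReachabilityLevel.NotPresent;
        if (s.SymbolPresent) level = ReachabilityLevel.PresentNotReachable;
        if (s.OnPathFromEntrypoint) level = ReachabilityLevel.StaticallyReachable;
        if (r.Hits > 0) level = ReachabilityLevel.RuntimeObserved;

        // "Not present" and "not reachable" are only as trustworthy as the
        // analysis coverage, so flag them when unknowns pressure is high.
        var unknown = level <= ReachabilityLevel.PresentNotReachable
                      && u.Value >= UnknownThreshold;
        return new ReachabilityState(level, unknown);
    }
}
```

Because `Merge` is a pure join, per-symbol states can be folded into a per-`(artifact, cve)` state in any order and still produce the same result, which is the property the replay invariant needs. Whether the flag should stay set once a symbol is runtime-observed is a policy choice this sketch leaves open.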