Here’s a practical, from‑scratch blueprint for a **two‑stage reachability map** that turns low‑level runtime facts into auditable, reproducible evidence for triage and VEX decisions.

---

# What this is (plain English)

* **Goal:** prove (or rule out) whether a vulnerable function/package could actually run in *your* build and deployment.
* **How:**

  1. extract **binary‑level call targets** (what functions your program *could* call),
  2. map those targets onto **symbol graphs** (named functions/classes/modules),
  3. correlate those symbols with **SBOM components** (which package/image layer they live in),
  4. store each “slice” of reachability as a **signed attestation** so anyone can replay and verify it.

---

# Stage A: Binary → Symbol graph

* **Inputs:** built artifacts (ELF/PE/Mach‑O), debug symbols (when available), stripped bins, and language runtimes.
* **Process (per artifact):**

  * Parse binaries (headers, sections, symbol tables, relocations).
  * Recover call edges:

    * Direct calls: disassemble; record `caller -> callee`.
    * Indirect calls: resolve via PLT/IAT/vtables; fall back to conservative points‑to sets.
    * Dynamic loading: log `dlopen/LoadLibrary` + exported symbol usage heuristics.
  * Normalize to **Symbol Graph**: nodes = `{binary, symbol, addr, hash}`, edges = `CALLS`.
* **Outputs:** `symbol-graph.jsonl` (+ compact binary form), content‑addressed by hash.

# Stage B: Symbol graph ↔ SBOM components

* **Inputs:** CycloneDX/SPDX SBOM for the image/build; file→component mapping (path→pkg).
* **Process:**

  * For each symbol: derive file path (or Build‑ID) → map to SBOM component/version/layer.
  * Build **Component Reachability Graph**:

    * nodes = `{component@version}`, edges = “component provides symbol X used by Y”.
    * annotate with file hashes, Build‑IDs, container layer digests.
* **Outputs:** `reachability-slices/COMPONENT@VERSION.slice.json` (per impacted component).
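
To make Stage B concrete, here is a minimal Python sketch of lifting symbol‑level `CALLS` edges to component‑level edges. It assumes the Symbol Graph is already in memory; the sample symbols, hashes, purls, and the `component_edges` helper are all hypothetical illustrations, not part of any existing tool.

```python
from collections import defaultdict

# Hypothetical sample data. Node fields follow the {binary, symbol, addr, hash} shape above.
symbols = {
    "s1": {"binary": "/usr/bin/svc", "symbol": "main", "addr": 0x1000, "hash": "aaa"},
    "s2": {"binary": "/usr/bin/svc", "symbol": "handle_request", "addr": 0x1200, "hash": "aaa"},
    "s3": {"binary": "/usr/lib/libfoo.so", "symbol": "foo_decrypt", "addr": 0x4000, "hash": "bbb"},
}

# CALLS edges as (caller_id, callee_id) pairs.
edges = [("s1", "s2"), ("s2", "s3")]

# File hash -> SBOM component (purl). In practice this would come from the SBOM import.
file_to_component = {
    "aaa": "pkg:generic/svc@1.0.0",
    "bbb": "pkg:generic/libfoo@2.3.1",
}


def component_edges(symbols, edges, file_to_component):
    """Lift symbol-level CALLS edges to (consumer, provider) -> [symbols used]."""
    lifted = defaultdict(set)
    for src_id, dst_id in edges:
        # Files missing from the SBOM mapping are kept visible as "unknown".
        consumer = file_to_component.get(symbols[src_id]["hash"], "unknown")
        provider = file_to_component.get(symbols[dst_id]["hash"], "unknown")
        if consumer != provider:  # skip calls that stay inside one component
            lifted[(consumer, provider)].add(symbols[dst_id]["symbol"])
    # Sort keys and values so repeated runs produce identical output.
    return {key: sorted(names) for key, names in sorted(lifted.items())}


if __name__ == "__main__":
    for (consumer, provider), names in component_edges(symbols, edges, file_to_component).items():
        print(f"{consumer} uses {names} provided by {provider}")
```

Mapping unmapped files to an explicit `"unknown"` component, rather than dropping them, keeps gaps in the SBOM visible to later policy checks.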
# Attestable “slice” (the evidence object)

Each slice is a minimal proof unit answering: *“This vulnerable symbol is (or isn’t) on a feasible path at runtime in build X.”*

* **Contents:**

  * Scan manifest (tool versions, ruleset hashes, feed versions).
  * Input digests (binaries, SBOM, container layers).
  * The subgraph (only nodes/edges needed).
  * Query + result (e.g., “is `openssl:EVP_PKEY_decrypt` reachable from any exported entrypoint?”).
* **Format:** DSSE + in‑toto statement, stored as OCI artifact or file; **deterministic** (same inputs → same bytes).

# Triage flow (how it helps today)

* Given CVE → map to symbols/functions → check reachability slice:

  * **Reachable path found:** mark “affected (reachable)”, include call chain and components; raise priority.
  * **No path / gated by feature flag:** mark “not affected (unreachable/mitigated)”, with proof chain.
  * **Unknowns present:** fail‑safe policy (e.g., “unknowns > N → block prod”) with explicit unknown edges listed.

# Minimal data model (JSON hints)

* `Symbol`: `{ id, name, demangled, addr, file_sha256, build_id }`
* `Edge`: `{ src_symbol_id, dst_symbol_id, kind: "direct"|"plt"|"indirect" }`
* `Mapping`: `{ file_sha256|build_id -> component_purl, layer_digest, path }`
* `Slice`: `{ inputs:{…}, query:{…}, subgraph:{symbols:[…],edges:[…]}, verdict:"reachable"|"unreachable"|"unknown" }`

# Determinism & replay

* Pin **everything**: disassembler version, rules, demangler options, container digests, SBOM doc hash, symbolization flags.
* Emit a **Scan Manifest** with content hashes; store alongside slices.
* Provide a `replay` command that re‑hydrates inputs and re‑computes the slice; byte‑for‑byte match required.

# Where this plugs into Stella Ops (suggested modules)

* **Sbomer**: component/file mapping & SBOM import.
* **Scanner.webservice**: binary parse & call‑graph extraction (keep lattice/policy elsewhere per your rule).
* **Vexer/Policy Engine**: consume slices as evidence for “affected/not‑affected” claims.
* **Attestor/Authority**: sign DSSE/in‑toto statements; push to OCI.
* **Timeline/Notify**: surface verdict deltas over time, link to slices.

# Guardrails & fallbacks

* For stripped binaries: prefer Build‑ID + external symbol servers; otherwise fall back to a conservative over‑approximation (mark unknown).
* For JIT/dynamic plugins: capture runtime traces (eBPF/ETW) and merge as **observed edges** with timestamps.
* Mixed‑lang stacks: unify by file hash + symbol name mangling rules per toolchain.

# Quick implementation plan (6 sprints)

1. **Binary ingest**: ELF/PE/Mach‑O parsing, Build‑ID hashing, symbol tables, PLT/IAT resolution.
2. **Call‑edge recovery**: direct calls, basic indirect resolution, slice extractor by entrypoint.
3. **SBOM mapping**: file→component map, layer digests, purl normalization.
4. **Evidence format**: DSSE/in‑toto schema, deterministic manifests, OCI storage.
5. **Queries & policies**: “is‑reachable?” API, unknowns budget, feature‑flag conditions, VEX plumbing.
6. **Runtime merge**: optional eBPF/ETW traces → annotate edges, produce “observed‑path” slices.

# Lightweight APIs (sketch)

* `POST /reachability/query { cve, symbols[], entrypoints[], policy } -> slice+verdict`
* `GET /slice/{digest}` -> attested slice
* `POST /replay { slice_digest }` -> match | mismatch (with diff)

# Small example (CVE → symbol mapping)

* `CVE‑XXXX‑YYYY` → advisory lists function `foo_decrypt` in `libfoo.so`.
* We resolve the `libfoo.so` Build‑ID in the image, find symbols that match the demangled name, and build call paths from service entrypoints. If a path exists, the slice is “reachable” with a 3–7 hop chain; otherwise it is “unreachable” with reasons (no import, stripped at link‑time, dead code eliminated, or gated by `FEATURE_X=false`).
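
The example above can be sketched in Python as a breadth‑first search from entrypoints to the target symbol, with the result packed into a slice‑shaped dict and hashed over a canonical JSON encoding. This is only a rough illustration under stated assumptions: `find_path`, `build_slice`, and `slice_digest` are hypothetical names, symbols are plain strings instead of full `Symbol` records, the `"unknown"` verdict is not modeled, and sorted‑key JSON stands in for a proper canonicalization scheme and for DSSE/in‑toto signing.

```python
import hashlib
import json
from collections import deque


def find_path(edges, entrypoints, target):
    """Breadth-first search over CALLS edges; returns the shortest call chain or None."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    for successors in adjacency.values():
        successors.sort()  # fixed traversal order, independent of input edge order
    queue = deque((entry, [entry]) for entry in sorted(entrypoints))
    seen = set(entrypoints)
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None


def build_slice(edges, entrypoints, target, inputs):
    """Pack the query, the minimal subgraph, and the verdict into a slice-shaped dict."""
    path = find_path(edges, entrypoints, target)
    chain = [[a, b] for a, b in zip(path, path[1:])] if path else []
    return {
        "inputs": inputs,
        "query": {"target": target, "entrypoints": sorted(entrypoints)},
        "subgraph": {"symbols": path or [], "edges": chain},
        "verdict": "reachable" if path else "unreachable",
    }


def slice_digest(slice_obj):
    """Hash a sorted-key, whitespace-free JSON encoding so equal slices give equal bytes."""
    canonical = json.dumps(slice_obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()


if __name__ == "__main__":
    # Hypothetical call edges for a service that reaches foo_decrypt in two hops.
    calls = [("main", "handle_request"), ("handle_request", "foo_decrypt"), ("main", "log_init")]
    evidence = build_slice(calls, {"main"}, "foo_decrypt", {"sbom_sha256": "placeholder"})
    print(evidence["verdict"], evidence["subgraph"]["symbols"])
    print(slice_digest(evidence))
```

A replay check in this sketch amounts to rebuilding the slice from the same pinned inputs and comparing digests; any difference in edges, query, or inputs changes the hash.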
# Costs (rough, for planning inside Stella Ops)

* **Core parsing & graph**: 3–4 engineer‑weeks
* **Indirect calls & heuristics**: +3–5 weeks
* **SBOM mapping & layers**: 2 weeks
* **Attestations & OCI storage**: 1–2 weeks
* **Policy/VEX integration & UI surfacing**: 2–3 weeks
* **Runtime trace merge (optional)**: 2–4 weeks

*(Parallelizable; add 25–40% for hardening/tests.)*

If you want, I can turn this into:

* a concrete **.NET 10 service skeleton** (endpoints + data contracts),
* a **DSSE/in‑toto schema** for the slice, and
* a **dev checklist** for deterministic builds and replay harness.