Here’s a compact, plain‑English plan to make your scanner **faster, quieter, and auditor‑friendly** through (1) diff‑aware rescans and (2) unified binary+source reachability. Both are drop‑in for Stella Ops.

# Deterministic, diff‑aware rescans (clean SBOM/VEX diffs)

**Goal:** Only recompute what changed; emit stable, minimal diffs reviewers can trust.

**Core ideas**

- **Per‑layer SBOM artifacts (cacheable):** For each image layer `L#`, persist:
  - `sbom-L#.cdx.json` (CycloneDX), `hash(L#)`, `toolchain-hash`, `feeds-hash`.
  - **Symbol‑fingerprints** for each discovered file: `algo|path|size|mtime|xxh3|funcIDs[]`.
- **Slice recomputation:** On new image `I'`, match layers via hashes; for changed layers or files, recompute *only* their call‑graph slices and vuln joins.
- **Deterministic manifests:** Every scan writes a `scan.lock.json` (inputs, feed versions, rules, lattice policy hash, tool versions, clocks) so results are **replayable**.

**Minimal data model (Mongo)**

- `scan_runs(_id, imageDigest, inputsHash, policyHash, feedsHash, startedAt, finishedAt, parentRunId?)`
- `layer_sboms(scanRunId, layerDigest, sbomCid, symbolIndexCid, layerHash)`
- `file_symbols(scanRunId, path, fileHash, funcIDs[], lang, size, mtime)`
- `diffs(fromRunId, toRunId, kind: 'sbom'|'vex'|'reachability', stats, patch)` (store JSON Patch)

**Algorithm sketch**

1. Resolve base image ancestry → map `old layer digest ↔ new layer digest`.
2. For unchanged layers: reuse `layer_sboms` + `file_symbols`.
3. For changed/added files: re‑symbolize + re‑analyze; restrict call‑graph build to **impacted SCCs**.
4. Re‑join OSV/GHSA/vendor vulns → compute reachability deltas → emit **stable JSON Patch**.

**CLI impact**

- `stella scan --deterministic --cache-dir ~/.stella/cache --emit-diff previousRunId`
- `stella diff --from <fromRunId> --to <toRunId> --format jsonpatch|md`

---

# Unified binary + source reachability (function‑level)

**Goal:** Decide “is the vulnerable function reachable/used here?” across native and managed code.
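
Stripped to its core, the question above is a graph search: start from the entrypoints and check whether any affected symbol can be reached. Here is a minimal sketch of that idea, assuming the call graph is already available as an adjacency map keyed by symbol ID. `ReachabilitySketch`, `ShortestPath`, and the parameter names are hypothetical and not part of Stella Ops.

```csharp
using System.Collections.Generic;

// Hypothetical helper: breadth-first search over a call graph keyed by symbol ID.
public static class ReachabilitySketch
{
    // Returns the shortest call path from any entrypoint to any affected symbol,
    // or null when no affected symbol is reachable in the given graph.
    public static List<string>? ShortestPath(
        IReadOnlyDictionary<string, string[]> callGraph,
        IEnumerable<string> entrypoints,
        ISet<string> affected)
    {
        // parent[x] is the caller through which x was first discovered
        // (null for entrypoints); it doubles as the visited set.
        var parent = new Dictionary<string, string?>();
        var queue = new Queue<string>();
        foreach (var entry in entrypoints)
        {
            if (parent.TryAdd(entry, null)) queue.Enqueue(entry);
        }

        while (queue.Count > 0)
        {
            var current = queue.Dequeue();
            if (affected.Contains(current))
            {
                // Walk parent links back to the entrypoint, then reverse.
                var path = new List<string>();
                for (string? s = current; s != null; s = parent[s]) path.Add(s);
                path.Reverse();
                return path;
            }

            // Symbols with no recorded callees are leaves.
            if (!callGraph.TryGetValue(current, out var callees)) continue;
            foreach (var callee in callees)
            {
                if (parent.TryAdd(callee, current)) queue.Enqueue(callee);
            }
        }
        return null;
    }
}
```

A `null` result only says that no path exists in the graph as built. When the graph may be incomplete (reflection, dynamic dispatch, missing symbols), that is closer to `unknown` than to `false`.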

**Extraction**

- **Binary symbolizers:**
  - ELF: parse `.symtab`/`.dynsym`, DWARF (if present).
  - Mach‑O/PE: export tables + DWARF/PDB (if present).
  - Build **Canonical Symbol ID (CSID)**: `lang:pkg@ver!binary#file:function(signature)`; normalize C++/Rust mangling.
- **Source symbolizers:**
  - .NET (Roslyn+IL), JVM (bytecode), Go (SSA), Node/TS (TS AST), Python (AST), Rust (HIR/MIR if available).
- **Bindings join:** Map FFI edges (P/Invoke, cgo, JNI/JNA, N-API) → **cross‑ecosystem call edges**:
  - `.NET P/Invoke` → DLL export CSID.
  - Java JNI → `Java_com_pkg_Class_Method` ↔ native export.
  - Node N-API → addon exports ↔ JS require() site.

**Reachability pipeline**

1. Build per‑language call graphs (CG) with framework models (ASP.NET, Spring, Express, etc.).
2. Add FFI edges; merge into a **polyglot call graph**.
3. Mark **entrypoints** (container `CMD/ENTRYPOINT`, web handlers, cron, CLI verbs).
4. For each CVE → {pkg, version, affected symbols[]} map → **is any affected CSID on a path from an entrypoint?**
5. Output evidence:
   - `reachable: true|false|unknown`
   - shortest path (symbols list)
   - probes (optional): runtime samples (EventPipe/JFR/uprobes) hitting CSIDs

**Artifacts emitted**

- `symbols.csi.jsonl` (all CSIDs)
- `polyglot.cg.slices.json` (only impacted SCCs for diffs)
- `reach.vex.json` (OpenVEX/CSAF with function‑level notes + confidence)

---

# What to build next (low‑risk, high‑impact)

- **[Week 1–2]** Per‑layer caches + `scan.lock.json`; file symbol‑fingerprints (xxh3 + top‑K funcIDs).
- **[Week 3–4]** ELF/PE/Mach‑O symbolizer lib with CSIDs; .NET IL + P/Invoke mapper.
- **[Week 5–6]** Polyglot CG merge + entrypoint discovery from Docker metadata; JSON Patch diffs.
- **[Week 7+]** Runtime probes (opt‑in) to boost confidence and suppress false positives.
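
The Week 1–2 item (per‑layer caches) mostly comes down to one decision per layer: reuse the cached SBOM or rescan. The sketch below shows one way to make that split. It assumes a cache entry is valid only when both the layer digest and the toolchain hash match; `LayerCacheKey`, `LayerReuseSketch`, and `Partition` are hypothetical names, and the in‑memory set stands in for the `layer_sboms` lookup.

```csharp
using System.Collections.Generic;

// Hypothetical cache key: a layer SBOM is treated as reusable only if both the
// layer content and the toolchain that produced the SBOM are unchanged.
public record LayerCacheKey(string LayerDigest, string ToolchainHash);

public static class LayerReuseSketch
{
    // Splits the new image's layers into cache hits (reuse stored SBOM and
    // symbols) and misses (rescan), preserving the original layer order.
    public static (List<string> Reused, List<string> Rescan) Partition(
        IEnumerable<string> newLayerDigests,
        string toolchainHash,
        ISet<LayerCacheKey> cache)
    {
        var reused = new List<string>();
        var rescan = new List<string>();
        foreach (var digest in newLayerDigests)
        {
            // Records compare by value, so a fresh key matches a stored one.
            var key = new LayerCacheKey(digest, toolchainHash);
            if (cache.Contains(key)) reused.Add(digest);
            else rescan.Add(digest);
        }
        return (reused, rescan);
    }
}
```

Keeping the toolchain hash in the key means a scanner upgrade invalidates cached layers automatically. The feeds hash is deliberately left out of this sketch, on the assumption that feed changes affect the vulnerability join rather than the SBOM itself.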

---

# Tiny code seeds (C# hints)

**Symbol fingerprint (per file)**

```csharp
// One fingerprint per discovered file; FuncIds lists the function IDs found in it.
record SymbolFingerprint(
    string Algo,
    string Path,
    long Size,
    long MTimeUnix,
    string ContentHash,
    string[] FuncIds);
```

**Deterministic scan lock**

```csharp
// Pinned inputs, policy, toolchain, and clock needed to replay a scan.
record ScanLock(
    string FeedsHash,
    string RulesHash,
    string PolicyHash,
    string Toolchain,
    string ImageDigest,
    string[] LayerDigests,
    DateTimeOffset Clock,
    IDictionary<string, string> EnvPins);
```

**JSON Patch diff emit**

```csharp
// Sort keys in both documents beforehand so the diff is stable.
// JsonDiffPatch.Diff stands in for whichever JSON diff library is chosen.
var patch = JsonDiffPatch.Diff(oldVexJson, newVexJson);
File.WriteAllText("vex.diff.json", patch);
```

---

If you want, I can turn this into:

- a **.proto** for the cache/index objects,
- a **Mongo schema + indexes** (including compound keys for fast layer reuse),
- and a **.NET 10** service skeleton (`StellaOps.Scanner.WebService`) with endpoints: `/scan`, `/diff/{from}/{to}`, `/reach/{runId}`.