- Introduced a detailed specification for encoding binary reachability that integrates call graphs with SBOMs.
- Defined a minimal data model including nodes, edges, and SBOM components.
- Outlined a step-by-step guide for building the reachability graph in a C#-centric manner.
- Established core domain models, including enumerations for binary formats and symbol kinds.
- Created a public API for the binary reachability service, including methods for graph building and serialization.
- Specified SBOM component resolution and binary parsing abstractions for PE, ELF, and Mach-O formats.
- Enhanced symbol normalization and digesting processes to ensure deterministic signatures.
- Included error handling, logging, and a high-level test plan to ensure robustness and correctness.
- Added non-functional requirements to guide performance, memory usage, and thread safety.
Vlad, here’s a concrete, pure‑C# blueprint to build a multi‑format binary analyzer (Mach‑O, ELF, PE) that produces call graphs + reachability, with no external tools. Where needed, I point to permissively‑licensed code you can port (copy) from other ecosystems.
0) Targets & non‑negotiables
- Formats: Mach‑O (inc. LC_DYLD_INFO / LC_DYLD_CHAINED_FIXUPS), ELF (SysV gABI), PE/COFF
- Architectures: x86‑64 (and x86), AArch64 (ARM64)
- Outputs: JSON with purls per module + function‑level call graph & reachability
- No tool reuse: Only pure C# libraries or code ported from permissive sources
1) Parsing the containers (pure C#)
Pick one C# reader per format, keeping licenses permissive:
- ELF & Mach‑O: ELFSharp (pure managed C#; ELF + Mach‑O reading). MIT/X11 license. (GitHub)
- ELF & PE (+ DWARF v4): LibObjectFile (C#, BSD‑2). Good ELF relocations (i386, x86_64, ARM, AArch64), PE directories, DWARF sections. Use it as your common object model for ELF+PE, then add a Mach‑O adapter. (GitHub)
- PE (optional alternative): PeNet (pure C#, broad PE directories, imp/exp, TLS, certs). MIT. Useful if you want a second implementation for cross‑checks. (GitHub)
Why two libs?
LibObjectFile gives you DWARF and clean models for ELF/PE; ELFSharp covers Mach‑O today (and ELF as a fallback). You control the code paths.
Spec references you’ll implement against (for correctness of your readers & link‑time semantics):
- ELF (gABI, AMD64 supplement): dynamic section, PLT/GOT, R_X86_64_JUMP_SLOT semantics (eager vs lazy). (refspecs.linuxbase.org)
- PE/COFF: imports/exports/IAT, delay‑load, TLS. (Microsoft Learn)
- Mach‑O: file layout, load commands (LC_SYMTAB, LC_DYSYMTAB, LC_FUNCTION_STARTS, LC_DYLD_INFO(_ONLY)), and the modern LC_DYLD_CHAINED_FIXUPS. (leopard-adc.pepas.com)
2) Mach‑O: what you must port (byte‑for‑byte compatible)
Apple moved from traditional dyld bind opcodes to chained fixups on macOS 12/iOS 15+; you need both:
- Dyld bind opcodes (LC_DYLD_INFO(_ONLY)): parse the BIND/LAZY_BIND streams (tuples of <seg,off,type,ordinal,symbol,addend>). Port minimal logic from LLVM or LIEF (both Apache‑2.0‑compatible) into C#. (LIEF)
- Chained fixups (LC_DYLD_CHAINED_FIXUPS): port dyld_chained_fixups_header structs & chain walking from LLVM’s MachO.h or Apple’s dyld headers. This restores imports/rebases without running dyld. (LLVM)
- Function discovery hint: read LC_FUNCTION_STARTS (ULEB128 deltas) to seed function boundaries; very helpful on stripped binaries. (Stack Overflow)
- Stubs mapping: resolve __TEXT,__stubs ↔ __DATA,__la_symbol_ptr via the indirect symbol table; conceptually identical to ELF’s PLT/GOT. (MaskRay)
If you prefer an in‑C# base for Mach‑O manipulation, Melanzana.MachO exists (MIT) and has been used by .NET folks for Mach‑O/Code Signing/obj writing; you can mine its approach for load‑command modeling. (GitHub)
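As a rough illustration of the LC_FUNCTION_STARTS hint above, here is a minimal sketch of decoding the ULEB128 delta stream. The class and method names are hypothetical, and it assumes the first delta is relative to the `__TEXT` segment's vmaddr (passed in as `textBase`) and that a zero delta ends the list.

```csharp
using System;
using System.Collections.Generic;

public static class FunctionStarts
{
    // Decodes an LC_FUNCTION_STARTS payload: a stream of ULEB128 deltas.
    // Assumption: the first delta is relative to textBase (__TEXT vmaddr).
    public static List<ulong> Decode(ReadOnlySpan<byte> data, ulong textBase)
    {
        var starts = new List<ulong>();
        ulong address = textBase;
        int pos = 0;
        while (pos < data.Length)
        {
            ulong delta = 0;
            int shift = 0;
            byte b;
            do
            {
                if (pos >= data.Length || shift > 63)
                    throw new FormatException("Truncated or oversized ULEB128 value.");
                b = data[pos++];
                delta |= (ulong)(b & 0x7F) << shift; // low 7 bits carry payload
                shift += 7;
            } while ((b & 0x80) != 0);               // high bit set: more bytes follow

            if (delta == 0) break;                   // zero delta terminates the list
            address += delta;
            starts.Add(address);
        }
        return starts;
    }
}
```

Each returned address can then be used as a function seed alongside symbols and entrypoints.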
3) Disassembly (pure C#, multi‑arch)
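- x86/x64: iced (C# decoder/disassembler/encoder; MIT; fast & complete). (GitHub)
- AArch64/ARM64: two options that keep you pure‑C#: Disarm (ARM64 disassembler in C#, MIT), or decoder pieces ported from Ryujinx’s ARMeilleure (ARMv8 decode/JIT in C#, MIT).
- x86 fallback: SharpDisasm (udis86 port in C#; BSD‑2). Older than iced; keep as a reference. (GitHub)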
4) Call graph recovery (static)
4.1 Function seeds
- From symbols (.dynsym / LC_SYMTAB / PE exports)
- From LC_FUNCTION_STARTS (Mach‑O) for stripped code (Stack Overflow)
- From entrypoints (_start / main or PE AddressOfEntryPoint)
- From exception/unwind tables & DWARF (when present); LibObjectFile already models DWARF v4. (GitHub)
4.2 CFG & interprocedural calls
- Decode with iced/Disarm from each seed; form basic blocks by following control‑flow until terminators (ret/jmp/call).
- Direct calls: immediate targets become edges (PC‑relative fixups where needed).
- Imported calls:
  - ELF: calls to PLT stubs → resolve via .rela.plt & R_*_JUMP_SLOT to symbol names (link‑time target). (cs61.seas.harvard.edu)
  - PE: calls through the IAT → resolve via IMAGE_IMPORT_DESCRIPTOR / thunk tables. (Microsoft Learn)
  - Mach‑O: calls to __stubs use indirect symbol table + __la_symbol_ptr (or chained fixups) → map to dylib/symbol. (reinterpretcast.com)
- Indirect calls within the binary: heuristics only (function pointer tables, vtables, small constant pools). Keep them labeled “indirect‑unresolved” unless a heuristic yields a concrete target.
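To make the "direct calls" bullet concrete, here is a hand-rolled sketch of how the target of an x86/x86-64 near relative call (`E8 rel32`) is computed. In practice iced would do the decoding (including prefixes and other call encodings); the helper name below is hypothetical and only illustrates the arithmetic: target = address of the next instruction + sign-extended displacement.

```csharp
using System;
using System.Buffers.Binary;

public static class DirectCall
{
    // Returns the target of a near relative call (opcode 0xE8 followed by rel32),
    // or null when the bytes at 'offset' are not that encoding.
    public static ulong? TryGetCallTarget(ReadOnlySpan<byte> code, int offset, ulong instructionAddress)
    {
        const int Length = 5; // 1 opcode byte + 4 displacement bytes
        if (offset < 0 || offset + Length > code.Length || code[offset] != 0xE8)
            return null;

        int rel32 = BinaryPrimitives.ReadInt32LittleEndian(code.Slice(offset + 1, 4));

        // The displacement is relative to the address of the *next* instruction.
        return unchecked(instructionAddress + Length + (ulong)(long)rel32);
    }
}
```

If the resulting address falls inside a known function, the call becomes an intra-module edge; if it falls inside a PLT or stub section, it is handed to the import resolution described above.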
4.3 Cross‑binary graph
- Build module‑level edges by simulating the platform’s loader:
  - ELF: honor DT_NEEDED, DT_RPATH/RUNPATH, versioning (.gnu.version*) to pick the definer of an imported symbol. gABI rules apply. (refspecs.linuxbase.org)
  - PE: pick DLL from the import descriptors. (Microsoft Learn)
  - Mach‑O: LC_LOAD_DYLIB + dyld binding / chained fixups determine the provider image. (LIEF)
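For the ELF case, a deliberately simplified sketch of picking the definer of an imported symbol might look like the following. It only walks the direct DT_NEEDED list in order and ignores symbol versioning, transitive dependencies, and preloads, so treat it as an approximation; the type and parameter names are hypothetical.

```csharp
using System.Collections.Generic;

public static class ElfProviderLookup
{
    // Returns the soname of the first DT_NEEDED entry whose export set defines
    // 'symbol', or null when no loaded module provides it.
    public static string? FindProvider(
        string symbol,
        IReadOnlyList<string> neededInOrder,
        IReadOnlyDictionary<string, HashSet<string>> exportsBySoname)
    {
        foreach (string soname in neededInOrder)
        {
            if (exportsBySoname.TryGetValue(soname, out var exports) && exports.Contains(symbol))
                return soname;
        }
        return null; // caller can fall back to a placeholder module
    }
}
```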
5) Reachability analysis
Represent the call graph using a .NET graph lib (or a simple adjacency set). I suggest:
- QuikGraph (successor of QuickGraph; MIT) for algorithms (DFS/BFS, SCCs). Use it to compute reachability from chosen roots (entrypoint(s), exported APIs, or “sinks”). (GitHub)
You can visualize with MSAGL (MIT) when you need layouts, but your core output is JSON. (GitHub)
6) Symbol demangling (nice‑to‑have, pure C#)
- Itanium (ELF/Mach‑O): Either port LLVM’s Itanium demangler or use a C# lib like CxxDemangler (a C# rewrite of cpp_demangle). (LLVM)
- MSVC (PE): Port LLVM’s MicrosoftDemangle.cpp (Apache‑2.0 with LLVM exception) to C#. (LLVM)
7) JSON output (with purls)
Use a stable schema (example) to feed SBOM/vuln matching downstream:
{
  "modules": [
    {
      "purl": "pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1?arch=amd64",
      "format": "ELF",
      "arch": "x86_64",
      "path": "/usr/lib/x86_64-linux-gnu/libssl.so.1.1",
      "exports": ["SSL_read", "SSL_write"],
      "imports": ["BIO_new", "EVP_CipherInit_ex"],
      "functions": [{"name":"SSL_do_handshake","va":"0x401020","size":512,"demangled": "..."}]
    }
  ],
  "graph": {
    "nodes": [
      {"id":"bin:main@0x401000","module": "pkg:generic/myapp@1.0.0"},
      {"id":"lib:SSL_read","module":"pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1?arch=amd64"}
    ],
    "edges": [
      {"src":"bin:main@0x401000","dst":"lib:SSL_read","kind":"import_call","evidence":"ELF.R_X86_64_JUMP_SLOT"}
    ]
  },
  "reachability": {
    "roots": ["bin:_start","bin:main@0x401000"],
    "reachable": ["lib:SSL_read", "lib:SSL_write"],
    "unresolved_indirect_calls": [
      {"site":"0x402ABC","reason":"register-indirect"}
    ]
  }
}
8) Minimal C# module layout (sketch)
Stella.Analysis.Core/
  BinaryModule.cs            // common model (sections, symbols, relocs, imports/exports)
  Loader/
    PeLoader.cs              // wrap LibObjectFile (or PeNet) to BinaryModule
    ElfLoader.cs             // wrap LibObjectFile to BinaryModule
    MachOLoader.cs           // wrap ELFSharp + your ported Dyld/ChainedFixups
  Disasm/
    X86Disassembler.cs       // iced bridge: bytes -> instructions
    Arm64Disassembler.cs     // Disarm (or ARMeilleure port) bridge
  Graph/
    CallGraphBuilder.cs      // builds CFG per function + inter-procedural edges
    Reachability.cs          // BFS/DFS over QuikGraph
  Demangle/
    ItaniumDemangler.cs      // port or wrap CxxDemangler
    MicrosoftDemangler.cs    // port from LLVM
  Export/
    JsonWriter.cs            // writes schema above
9) Implementation notes (where issues usually bite)
- Mach‑O moderns: Implement both dyld opcode and chained fixups; many macOS 12+/iOS15+ binaries only have chained fixups. (emergetools.com)
- Stubs vs real targets (Mach‑O): map __stubs → __la_symbol_ptr via indirect symbols to the true imported symbol (or its post‑fixup target). (reinterpretcast.com)
- ELF PLT/GOT: treat .plt entries as call trampolines; ultimate edge should point to the symbol (library) that satisfies DT_NEEDED + version. (refspecs.linuxbase.org)
- PE delay‑load: don’t forget IMAGE_DELAYLOAD_DESCRIPTOR for delayed IATs. (Microsoft Learn)
- Function discovery: use LC_FUNCTION_STARTS when symbols are stripped; it’s a cheap way to seed analysis. (Stack Overflow)
- Name clarity: demangle Itanium/MSVC so downstream vuln rules can match consistently. (LLVM)
10) What to copy/port verbatim (safe licenses)
- Dyld bind & exports trie logic: from LLVM or LIEF Mach‑O (Apache‑2.0). Great for getting the exact opcode semantics right. (LIEF)
- Chained fixups structs/walkers: from LLVM MachO.h or Apple dyld headers (permissive headers). (LLVM)
- Itanium/MS demanglers: LLVM demangler sources are standalone; easy to translate to C#. (LLVM)
- ARM64 decoder: if Disarm gaps hurt, lift just the decoder pieces from Ryujinx ARMeilleure (MIT). (Gitee)
(Avoid GPL’d parsers like binutils/BFD; they will contaminate your codebase’s licensing.)
11) End‑to‑end pipeline (per container image)
- Enumerate binaries in the container FS.
- Parse each with the appropriate loader → BinaryModule (+ imports/exports/symbols/relocs).
- Simulate linking per platform to resolve imported functions to provider libraries. (refspecs.linuxbase.org)
- Disassemble functions (iced/Disarm) → CFGs → call edges (direct, PLT/IAT/stub, indirect).
- Assemble call graph across modules; normalize names via demangling.
- Reachability: given roots (entry or user‑specified) compute reachable set; emit JSON with purls (from your SBOM/package resolver).
- (Optional) dump GraphViz / MSAGL views for debugging. (GitHub)
12) Quick heuristics for vulnerability triage
- Sink maps: flag edges to high‑risk APIs (strcpy, gets, legacy SSL ciphers) even without CVE versioning.
- DWARF line info (when present): attach file:line to nodes for developer action. LibObjectFile gives you DWARF v4 reads. (GitHub)
13) Test corpora
- ELF: glibc/openssl/libpng from distro repos; validate R_*_JUMP_SLOT handling and PLT edges. (cs61.seas.harvard.edu)
- PE: system DLLs (Kernel32, Advapi32) and a small MSVC console app; validate IAT & delay‑load. (Microsoft Learn)
- Mach‑O: Xcode‑built binaries across macOS 11 & 12+ to cover both dyld opcode and chained fixups paths; verify LC_FUNCTION_STARTS improves discovery. (Stack Overflow)
14) Deliverables you can start coding now
- MachOLoader.cs
  - Parse headers + load commands (ELFSharp).
  - Implement DyldInfoParser (port from LLVM/LIEF) and ChainedFixupsParser (port structs & walkers). (LIEF)
- X86Disassembler.cs / Arm64Disassembler.cs (iced / Disarm bridges). (GitHub)
- CallGraphBuilder.cs (recursive descent + linear sweep fallback; PLT/IAT/stub resolution).
- Reachability.cs (QuikGraph BFS/DFS). (GitHub)
- JsonWriter.cs (schema above with purls).
References (core, load‑bearing)
- ELFSharp (ELF + Mach‑O pure C#). (GitHub)
- LibObjectFile (ELF/PE/DWARF C#, BSD‑2). (GitHub)
- iced (x86/x64 disasm, C#, MIT). (GitHub)
- Disarm (ARM64 disasm, C#, MIT). (GitHub)
- Ryujinx (ARMeilleure) (ARMv8 decode/JIT in C#, MIT). (Gitee)
- ELF gABI & AMD64 supplement (PLT/GOT, relocations). (refspecs.linuxbase.org)
- PE/COFF (imports/exports/IAT). (Microsoft Learn)
- Mach‑O docs (load commands; LC_FUNCTION_STARTS; dyld bindings; chained fixups). (Apple Developer)
If you want, I can draft MachOLoader + DyldInfoParser in C# next, including chained‑fixups structs (ported from LLVM’s headers) and an iced‑based call‑edge walker for x86‑64.
Good, this is exactly the point where you want to nail the internal model, otherwise everything on top (VEX, policy, reporting) will be shaky.
Below is a detailed, implementation-ready plan for a reachability graph with purl-aware edges, covering ELF, PE, and Mach-O, in C#.
I’ll structure it as:
- Overall graph design (3 layers: function, module, purl)
- Core C# data model
- Pipeline steps (end-to-end)
- Format-specific edge construction (ELF / PE / Mach-O)
- Reachability queries (from entrypoints to vulnerable purls / functions)
- JSON output layout and integration with SBOM
1. Overall graph design
You want three tightly linked graph layers:
- Function-level call graph (FLG)
  - Nodes: individual functions inside binaries
  - Edges: calls from function A → function B (intra- or inter-module)
- Module-level graph (MLG)
  - Nodes: binaries (ELF/PE/Mach-O files)
  - Edges: “module A calls module B at least once” (aggregated from FLG)
- Purl-level graph (PLG)
  - Nodes: purls (packages or generic artifacts)
  - Edges: “purl P1 depends-at-runtime on purl P2” (aggregated from module edges)
The reachability algorithm runs primarily on the function graph, but:
- You can project reachability results to module and purl nodes.
- You can also run coarse-grained analysis directly on purl graph when needed (“Is any code in purl X reachable from the container entrypoint?”).
2. Core C# data model
2.1 Identifiers and enums
public enum BinaryFormat { Elf, Pe, MachO }

public readonly record struct ModuleId(string Path, BinaryFormat Format);
public readonly record struct Purl(string Value);

public enum EdgeKind
{
    IntraModuleDirect,   // call foo -> bar in same module
    ImportCall,          // call via plt/iat/stub to imported function
    SyntheticRoot,       // root (entrypoint) edge
    IndirectUnresolved   // optional: we saw an indirect call we couldn't resolve
}
2.2 Function node
public sealed class FunctionNode
{
    public int Id { get; init; }                  // internal numeric id
    public ModuleId Module { get; init; }
    public Purl Purl { get; init; }               // resolved from Module -> Purl
    public ulong Address { get; init; }           // VA or RVA
    public string Name { get; init; }             // mangled
    public string? DemangledName { get; init; }   // optional
    public bool IsExported { get; init; }
    public bool IsImportedStub { get; init; }     // e.g. PLT stub, Mach-O stub, PE thunks
    public bool IsRoot { get; set; }              // _start/main/entrypoint etc.
}
2.3 Edges
public sealed class CallEdge
{
    public int FromId { get; init; }       // FunctionNode.Id
    public int ToId { get; init; }         // FunctionNode.Id
    public EdgeKind Kind { get; init; }
    public string Evidence { get; init; }  // e.g. "ELF.R_X86_64_JUMP_SLOT", "PE.IAT", "MachO.indirectSym"
}
2.4 Graph container
public sealed class CallGraph
{
    public IReadOnlyDictionary<int, FunctionNode> Nodes { get; init; }
    public IReadOnlyDictionary<int, List<CallEdge>> OutEdges { get; init; }
    public IReadOnlyDictionary<int, List<CallEdge>> InEdges { get; init; }

    // Convenience: mappings
    public IReadOnlyDictionary<ModuleId, List<int>> FunctionsByModule { get; init; }
    public IReadOnlyDictionary<Purl, List<int>> FunctionsByPurl { get; init; }
}
2.5 Purl-level graph view
You don’t store a separate physical graph; you derive it on demand:
public sealed class PurlEdge
{
    public Purl From { get; init; }
    public Purl To { get; init; }
    public List<(int FromFnId, int ToFnId)> SupportingCalls { get; init; }
}

public sealed class PurlGraphView
{
    public IReadOnlyDictionary<Purl, HashSet<Purl>> Adjacent { get; init; }
    public IReadOnlyList<PurlEdge> Edges { get; init; }
}
3. Pipeline steps (end-to-end)
Step 0 – Inputs
- Set of binaries (files) extracted from container image.
- SBOM or other metadata that can map a file path (or hash) → purl.
Step 1 – Parse binaries → BinaryModule objects
You define a common in-memory model:
public sealed class BinaryModule
{
    public ModuleId Id { get; init; }
    public Purl Purl { get; init; }
    public BinaryFormat Format { get; init; }

    // Raw sections / segments
    public IReadOnlyList<SectionInfo> Sections { get; init; }

    // Symbols
    public IReadOnlyList<SymbolInfo> Symbols { get; init; }   // imports + exports + locals

    // Relocations / fixups
    public IReadOnlyList<RelocationInfo> Relocations { get; init; }

    // Import/export tables (PE)/dylib commands (Mach-O)/DT_NEEDED (ELF)
    public ImportInfo[] Imports { get; init; }
    public ExportInfo[] Exports { get; init; }
}
Implement format-specific loaders:
- ElfLoader : IBinaryLoader
- PeLoader : IBinaryLoader
- MachOLoader : IBinaryLoader
Each loader uses your chosen C# parsers or ported code and fills BinaryModule.
Step 2 – Disassembly → basic blocks & candidate functions
For each BinaryModule:
- Use appropriate decoder (iced for x86/x64; Disarm/ported ARMeilleure for AArch64).
- Seed function starts:
  - Exported functions
  - Entry points (_start, main, AddressOfEntryPoint)
  - Mach-O LC_FUNCTION_STARTS if available
- Walk instructions to build basic blocks:
  - Stop blocks at conditional/unconditional branches, calls, rets.
  - Record for each call site:
    - Address of caller function
    - Operand type (immediate, memory with import table address, etc.)
- Disassembler outputs a list of FunctionNode skeletons (no cross-module link yet) and a list of raw call sites:
public sealed class RawCallSite
{
    public int CallerFunctionId { get; init; }
    public ulong InstructionAddress { get; init; }
    public ulong? DirectTargetAddress { get; init; }   // e.g. CALL 0x401000
    public ulong? MemoryTargetAddress { get; init; }   // e.g. CALL [0x404000]
    public bool IsIndirect { get; init; }              // register-based etc.
}
Step 3 – Build function nodes
Using disassembly + symbol tables:
- For each discovered function:
  - Determine: address, name (if sym available), export/import flags.
  - Map ModuleId → Purl using IPurlResolver.
- Populate FunctionNode instances and index them by Id.
Step 4 – Construct intra-module edges
For each RawCallSite:
- If DirectTargetAddress falls inside a known function’s address range in the same module, add IntraModuleDirect edge.
This gives you “normal” calls like foo() calling bar() in the same .so/.dll/.dylib.
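The "falls inside a known function's address range" check can be done with a binary search over sorted function starts. The sketch below is one way to do it, under the assumption that each function has a known start and size and that ranges do not overlap; `FunctionRangeIndex` is a hypothetical helper, not part of the model above.

```csharp
using System;
using System.Collections.Generic;

public sealed class FunctionRangeIndex
{
    private readonly ulong[] _starts; // sorted function start addresses
    private readonly ulong[] _ends;   // exclusive end address per function
    private readonly int[] _ids;      // FunctionNode.Id per function

    public FunctionRangeIndex(IEnumerable<(ulong Start, ulong Size, int Id)> functions)
    {
        var sorted = new List<(ulong Start, ulong Size, int Id)>(functions);
        sorted.Sort((a, b) => a.Start.CompareTo(b.Start));
        _starts = new ulong[sorted.Count];
        _ends = new ulong[sorted.Count];
        _ids = new int[sorted.Count];
        for (int i = 0; i < sorted.Count; i++)
        {
            _starts[i] = sorted[i].Start;
            _ends[i] = sorted[i].Start + sorted[i].Size;
            _ids[i] = sorted[i].Id;
        }
    }

    // Returns the id of the function whose [start, end) range contains 'address'.
    public int? FindContaining(ulong address)
    {
        int i = Array.BinarySearch(_starts, address);
        if (i < 0) i = ~i - 1;                 // last start <= address
        if (i < 0 || address >= _ends[i]) return null;
        return _ids[i];
    }
}
```

One index per module keeps the lookup at O(log n) per call site.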
Step 5 – Construct inter-module edges (import calls)
This is where ELF/PE/Mach-O differ; details in section 4 below.
But the abstract logic is:
- For each call site with MemoryTargetAddress (IAT slot / GOT entry / la_symbol_ptr / PLT):
  - From the module’s import, relocation or fixup tables, determine:
    - Which imported symbol it corresponds to (name, ordinal, etc.).
    - Which imported module / dylib / DLL provides that symbol.
  - Find (or create) a FunctionNode representing that imported symbol in the provider module.
  - Add an ImportCall edge from caller function to the provider FunctionNode.
This is the key to turning low-level dynamic linking into purl-aware cross-module edges, because each FunctionNode is already stamped with a Purl.
Step 6 – Build adjacency structures
Once you have all FunctionNodes and CallEdges:
- Build OutEdges and InEdges dictionaries keyed by FunctionNode.Id.
- Build FunctionsByModule / FunctionsByPurl.
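A minimal sketch of the adjacency build, using a simplified `Edge` record in place of `CallEdge` (the record and the `Adjacency` helper are illustrative names only):

```csharp
using System.Collections.Generic;

public readonly record struct Edge(int FromId, int ToId);

public static class Adjacency
{
    // Groups edges by source (Out) and by destination (In) in a single pass.
    public static (Dictionary<int, List<Edge>> Out, Dictionary<int, List<Edge>> In) Build(
        IEnumerable<Edge> edges)
    {
        var outEdges = new Dictionary<int, List<Edge>>();
        var inEdges = new Dictionary<int, List<Edge>>();
        foreach (var edge in edges)
        {
            if (!outEdges.TryGetValue(edge.FromId, out var o))
                outEdges[edge.FromId] = o = new List<Edge>();
            o.Add(edge);

            if (!inEdges.TryGetValue(edge.ToId, out var i))
                inEdges[edge.ToId] = i = new List<Edge>();
            i.Add(edge);
        }
        return (outEdges, inEdges);
    }
}
```

Nodes with no outgoing edges simply have no entry in `Out`, so traversals should use `TryGetValue` when looking up successors.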
4. Format-specific edge construction
This is the “how” for step 5, per binary format.
4.1 ELF
Goal: map call sites that go via PLT/GOT to an imported function in a DT_NEEDED library.
Algorithm:
- Parse:
  - .dynsym, .dynstr – dynamic symbol table
  - .rela.plt / .rel.plt – relocation entries for PLT
  - .got.plt / .got – PLT’s GOT
  - DT_NEEDED entries – list of linked shared objects and their sonames
- For each relocation of type R_*_JUMP_SLOT:
  - It applies to an entry in the PLT GOT; that GOT entry is what the PLT stub (or a direct CALL through the GOT) reads from.
  - Relocation gives you:
    - Offset in GOT (r_offset)
    - Symbol index (r_info → symbol) → dynamic symbol (ElfSymbol)
    - Symbol name, type (FUNC), binding, etc.
- Link GOT entries to call sites:
  - For each RawCallSite with MemoryTargetAddress, check if that address falls inside .got.plt (or .got). For a call whose DirectTargetAddress is a PLT stub, use the GOT entry that stub jumps through. If it does:
    - Find relocation whose r_offset equals that GOT entry offset.
    - That tells you which symbol is being called.
- Determine provider module:
  - From the symbol’s st_name and DT_NEEDED list, decide which shared object is expected to define it (an approximation is: first DT_NEEDED that provides that name).
  - Map DT_NEEDED → ModuleId (you’ll have loaded these modules separately, or you can create “placeholder modules” if they’re not in the container image).
- Create edges:
  - Create/find FunctionNode for the imported symbol in provider module.
  - Add CallEdge from caller function to imported function, EdgeKind = ImportCall, Evidence = "ELF.R_X86_64_JUMP_SLOT" (or arch-specific).
This yields edges like:
- myapp:main → libssl.so.1.1:SSL_read
- libfoo.so:foo → libc.so.6:malloc
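The relocation step above can be sketched for x86-64 as follows. It assumes a little-endian ELF64 image whose `.rela.plt` bytes have already been sliced out by the loader; the class name is hypothetical. The `r_info` split (symbol index in the high 32 bits, type in the low 32 bits) and the value 7 for `R_X86_64_JUMP_SLOT` follow the AMD64 psABI.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

public static class ElfPltRelocs
{
    private const uint R_X86_64_JUMP_SLOT = 7;
    private const int RelaEntrySize = 24; // Elf64_Rela: r_offset, r_info, r_addend (8 bytes each)

    // Maps GOT slot address (r_offset) -> .dynsym index for every JUMP_SLOT entry.
    public static Dictionary<ulong, uint> MapGotSlotsToSymbols(ReadOnlySpan<byte> relaPlt)
    {
        var map = new Dictionary<ulong, uint>();
        for (int pos = 0; pos + RelaEntrySize <= relaPlt.Length; pos += RelaEntrySize)
        {
            ulong offset = BinaryPrimitives.ReadUInt64LittleEndian(relaPlt.Slice(pos, 8));
            ulong info = BinaryPrimitives.ReadUInt64LittleEndian(relaPlt.Slice(pos + 8, 8));
            uint type = (uint)(info & 0xFFFFFFFF); // ELF64_R_TYPE
            uint symIndex = (uint)(info >> 32);    // ELF64_R_SYM
            if (type == R_X86_64_JUMP_SLOT)
                map[offset] = symIndex;
        }
        return map;
    }
}
```

A call site whose GOT slot address is a key in this map resolves to the dynamic symbol at the mapped index; the symbol name then comes from `.dynstr`.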
4.2 PE
Goal: map call sites that go via the Import Address Table (IAT) to imported functions in DLLs.
Algorithm:
- Parse:
  - IMAGE_IMPORT_DESCRIPTOR[] – each for a DLL name.
  - Original thunk table (INT) – names/ordinals of imported symbols.
  - IAT – where the loader writes function addresses at runtime.
- For each import entry:
  - Determine:
    - DLL name (Name)
    - Function name or ordinal (from INT)
    - IAT slot address (RVA)
- Link IAT slots to call sites:
  - For each RawCallSite with MemoryTargetAddress:
    - Check if this address equals the VA of an IAT slot.
    - If yes, the call site is effectively calling that imported function.
- Determine provider module:
  - The DLL name gives you a target module (e.g. KERNEL32.dll → ModuleId).
  - Ensure that DLL is represented as a BinaryModule or a “placeholder” if not present in image.
- Create edges:
  - Create/find FunctionNode for imported function in provider module.
  - Add CallEdge with EdgeKind = ImportCall and Evidence = "PE.IAT" (or "PE.DelayLoad" if using delay load descriptors).
Example:
- myservice.exe:Start → SSPICLI.dll:AcquireCredentialsHandleW
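A sketch of the IAT slot mapping for one import descriptor of a PE32+ (64-bit) image is shown below. It assumes the loader has already read the import lookup table entries and can read a name at an RVA; `ImportedSymbol`, `PeIat`, and the `readNameAtRva` callback are hypothetical names. The ordinal flag (bit 63) and the 2-byte hint preceding each name come from the PE/COFF format.

```csharp
using System;
using System.Collections.Generic;

public readonly record struct ImportedSymbol(string Dll, string? Name, ushort? Ordinal);

public static class PeIat
{
    private const ulong OrdinalFlag64 = 0x8000000000000000; // IMAGE_ORDINAL_FLAG64

    // Builds IAT slot VA -> imported symbol for a single DLL's import descriptor.
    public static Dictionary<ulong, ImportedSymbol> MapSlots(
        ulong imageBase, uint firstThunkRva, string dll,
        IReadOnlyList<ulong> lookupEntries, Func<uint, string> readNameAtRva)
    {
        var map = new Dictionary<ulong, ImportedSymbol>();
        for (int i = 0; i < lookupEntries.Count; i++)
        {
            ulong entry = lookupEntries[i];
            if (entry == 0) break;                                // null entry ends the table
            ulong slotVa = imageBase + firstThunkRva + (ulong)i * 8; // 8-byte slots in PE32+
            map[slotVa] = (entry & OrdinalFlag64) != 0
                ? new ImportedSymbol(dll, null, (ushort)(entry & 0xFFFF))
                // Low 31 bits: RVA of hint/name entry; skip the 2-byte hint.
                : new ImportedSymbol(dll, readNameAtRva((uint)(entry & 0x7FFFFFFF) + 2), null);
        }
        return map;
    }
}
```

A `RawCallSite.MemoryTargetAddress` that is a key in this map identifies the imported function directly. For 32-bit PE images the slot size is 4 bytes and the ordinal flag is bit 31, which this sketch does not handle.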
4.3 Mach-O
Goal: map stub calls via __TEXT,__stubs / __DATA,__la_symbol_ptr (and / or chained fixups) to symbols in dependent dylibs.
Algorithm (for classic dyld opcodes, not chained fixups, then extend):
- Parse:
  - Load commands:
    - LC_SYMTAB, LC_DYSYMTAB
    - LC_LOAD_DYLIB (to know dependent dylibs)
    - LC_FUNCTION_STARTS (for seeding functions)
    - LC_DYLD_INFO (rebase/bind/lazy bind)
  - __TEXT,__stubs – stub code
  - __DATA,__la_symbol_ptr (or __DATA_CONST,__la_symbol_ptr) – lazy pointer table
  - Indirect symbol table – maps slot indices to symbol table indices
- Stub → la_symbol_ptr mapping:
  - Stubs are small functions (usually a few instructions) that indirect through the corresponding la_symbol_ptr entry.
  - For each stub function:
    - Determine which la_symbol_ptr entry it uses (based on stub index and linking metadata).
    - From the indirect symbol table, find which dynamic symbol that la_symbol_ptr entry corresponds to.
      - This gives you symbol name and the index in LC_LOAD_DYLIB (dylib ordinal).
- Link stub call sites:
  - In disassembly, treat calls to these stub functions as import calls.
  - For each call instruction CALL stub_function:
    - RawCallSite.DirectTargetAddress lies inside __TEXT,__stubs.
    - Resolve stub → la_symbol_ptr → symbol → dylib.
- Determine provider module:
  - From dylib ordinal and load commands, get the path / install name of dylib (libssl.1.1.dylib, etc.).
  - Map that to a ModuleId in your module set.
- Create edges:
  - Create/find imported FunctionNode in provider module.
  - Add CallEdge from caller to that function with EdgeKind = ImportCall, Evidence = "MachO.IndirectSymbol".
For chained fixups (LC_DYLD_CHAINED_FIXUPS), you’ll compute a similar mapping but walking chain entries instead of traditional lazy/weak binds. The key is still:
- Map a stub or function to a fixup entry.
- From fixup, determine the symbol and dylib.
- Then connect call-site → imported function.
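For the classic (non-chained) path, the stub → symbol step can be sketched as below. It relies on the `__stubs` section header fields: `reserved1` is the section's first index into the indirect symbol table and `reserved2` is the size of one stub. The class and parameter names are hypothetical; the special indirect-symbol values and the library-ordinal extraction follow `<mach-o/loader.h>` and `<mach-o/nlist.h>`.

```csharp
using System;
using System.Collections.Generic;

public static class MachOStubs
{
    private const uint IndirectSymbolLocal = 0x80000000; // INDIRECT_SYMBOL_LOCAL
    private const uint IndirectSymbolAbs = 0x40000000;   // INDIRECT_SYMBOL_ABS

    // Returns the symbol-table index behind the stub at 'callTarget',
    // or null if the address is not a stub or the slot is local/absolute.
    public static uint? ResolveStubSymbolIndex(
        ulong callTarget, ulong stubsAddr, ulong stubsSize,
        uint reserved1, uint reserved2, IReadOnlyList<uint> indirectSymbols)
    {
        if (reserved2 == 0 || callTarget < stubsAddr || callTarget >= stubsAddr + stubsSize)
            return null;

        ulong slot = reserved1 + (callTarget - stubsAddr) / reserved2;
        if (slot >= (ulong)indirectSymbols.Count) return null;

        uint value = indirectSymbols[(int)slot];
        if ((value & (IndirectSymbolLocal | IndirectSymbolAbs)) != 0) return null;
        return value;
    }

    // GET_LIBRARY_ORDINAL: the high byte of n_desc (two-level namespace).
    public static int LibraryOrdinal(ushort nDesc) => (nDesc >> 8) & 0xFF;
}
```

The returned index selects an `nlist` entry in `LC_SYMTAB`; its name is the imported symbol and its library ordinal (1-based) picks the `LC_LOAD_DYLIB` command that provides it.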
5. Reachability queries
Once the graph is built, reachability is “just graph algorithms” + mapping back to purls.
5.1 Roots
Decide what are your root functions:
- Binary entrypoints:
  - ELF: _start, main, constructors (.init_array)
  - PE: AddressOfEntryPoint, registered service entrypoints
  - Mach-O: _main, constructors
- Optionally, any exported API function that a container orchestrator or plugin system will call.
Mark them as FunctionNode.IsRoot = true and create synthetic edges from a special root node if you want:
var syntheticRoot = new FunctionNode
{
    Id = 0,
    Name = "<root>",
    IsRoot = true,
    // Module, Purl can be special markers
};

foreach (var fn in allFunctions.Where(f => f.IsRoot))
{
    edges.Add(new CallEdge
    {
        FromId = syntheticRoot.Id,
        ToId = fn.Id,
        Kind = EdgeKind.SyntheticRoot,
        Evidence = "Root"
    });
}
5.2 Reachability algorithm (function-level)
Use BFS/DFS from the root node(s):
public sealed class ReachabilityResult
{
    // init accessor (not get-only) so the object initializer below compiles
    public HashSet<int> ReachableFunctions { get; init; } = new();
}

public ReachabilityResult ComputeReachableFunctions(CallGraph graph, IEnumerable<int> rootIds)
{
    var visited = new HashSet<int>();
    var stack = new Stack<int>();

    // Seed the traversal with every root exactly once.
    foreach (var root in rootIds)
    {
        if (visited.Add(root))
            stack.Push(root);
    }

    // Iterative DFS over outgoing call edges.
    while (stack.Count > 0)
    {
        var current = stack.Pop();
        if (!graph.OutEdges.TryGetValue(current, out var edges))
            continue;

        foreach (var edge in edges)
        {
            if (visited.Add(edge.ToId))
                stack.Push(edge.ToId);
        }
    }

    return new ReachabilityResult { ReachableFunctions = visited };
}
5.3 Project reachability to modules and purls
Given ReachableFunctions:
public sealed class ReachabilityProjection
{
    public HashSet<ModuleId> ReachableModules { get; } = new();
    public HashSet<Purl> ReachablePurls { get; } = new();
}

public ReachabilityProjection ProjectToModulesAndPurls(CallGraph graph, ReachabilityResult result)
{
    var projection = new ReachabilityProjection();
    foreach (var fnId in result.ReachableFunctions)
    {
        if (!graph.Nodes.TryGetValue(fnId, out var fn))
            continue;
        projection.ReachableModules.Add(fn.Module);
        projection.ReachablePurls.Add(fn.Purl);
    }
    return projection;
}
Now you can answer questions like:
- “Is any code from purl pkg:deb/openssl@1.1.1w-1 reachable from the container entrypoint?”
- “Which purls are reachable at all?”
5.4 Vulnerability reachability
Assume you’ve mapped each vulnerability to:
- Purl (where it lives)
- AffectedFunctionNames (symbols; optionally demangled)
You can implement:
public sealed class VulnerabilitySink
{
    public string VulnerabilityId { get; init; }   // CVE-...
    public Purl Purl { get; init; }
    public string FunctionName { get; init; }      // symbol name or demangled
}
Resolution algorithm:
- For each VulnerabilitySink, find all FunctionNode with:
  - node.Purl == sink.Purl and
  - node.Name or node.DemangledName matches sink.FunctionName.
- For each such node, check ReachableFunctions.Contains(node.Id).
- Build a VulnerabilityFinding object:
public sealed class VulnerabilityFinding
{
    public string VulnerabilityId { get; init; }
    public Purl Purl { get; init; }
    public bool IsReachable { get; init; }
    public List<int> SinkFunctionIds { get; init; } = new();
}
Plus, if you want path evidence, you run a shortest-path search (BFS predecessor map) from root to sink and store the sequence of FunctionNode.Ids.
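A minimal sketch of that path search, assuming a plain successor map keyed by function id (a simplification of `CallGraph.OutEdges`; `PathFinder` is a hypothetical helper):

```csharp
using System.Collections.Generic;

public static class PathFinder
{
    // BFS with a predecessor map; returns root..sink ids, or null if unreachable.
    public static List<int>? ShortestPath(
        IReadOnlyDictionary<int, List<int>> successors, int rootId, int sinkId)
    {
        var predecessor = new Dictionary<int, int> { [rootId] = rootId };
        var queue = new Queue<int>();
        queue.Enqueue(rootId);

        while (queue.Count > 0)
        {
            int current = queue.Dequeue();
            if (current == sinkId) break;
            if (!successors.TryGetValue(current, out var next)) continue;
            foreach (int to in next)
            {
                if (predecessor.TryAdd(to, current)) // first visit wins (shortest in BFS)
                    queue.Enqueue(to);
            }
        }

        if (!predecessor.ContainsKey(sinkId)) return null;

        // Walk predecessors back from the sink, then reverse.
        var path = new List<int>();
        for (int node = sinkId; ; node = predecessor[node])
        {
            path.Add(node);
            if (node == rootId) break;
        }
        path.Reverse();
        return path;
    }
}
```

Because BFS visits nodes in order of distance, the first recorded predecessor of each node lies on a shortest path from the root, which keeps the evidence chain as short as possible.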
6. Purl edges (derived graph)
For reporting and analytics, it’s useful to produce a purl-level dependency graph.
Given CallGraph:
// Requires: using System.Collections.Generic; using System.Linq; (ToList)
public PurlGraphView BuildPurlGraph(CallGraph graph)
{
    // One PurlEdge per ordered (From, To) pair, accumulating supporting calls.
    var edgesByPair = new Dictionary<(Purl From, Purl To), PurlEdge>();

    foreach (var kv in graph.OutEdges)
    {
        var fromFn = graph.Nodes[kv.Key];
        foreach (var edge in kv.Value)
        {
            var toFn = graph.Nodes[edge.ToId];
            if (fromFn.Purl.Equals(toFn.Purl))
                continue; // intra-purl, skip if you only care about inter-purl

            var key = (fromFn.Purl, toFn.Purl);
            if (!edgesByPair.TryGetValue(key, out var pe))
            {
                pe = new PurlEdge
                {
                    From = fromFn.Purl,
                    To = toFn.Purl,
                    SupportingCalls = new List<(int, int)>()
                };
                edgesByPair[key] = pe;
            }
            pe.SupportingCalls.Add((fromFn.Id, toFn.Id));
        }
    }

    // Derive the purl adjacency sets from the aggregated pairs.
    var adj = new Dictionary<Purl, HashSet<Purl>>();
    foreach (var kv in edgesByPair)
    {
        var (from, to) = kv.Key;
        if (!adj.TryGetValue(from, out var targets))
        {
            targets = new HashSet<Purl>();
            adj[from] = targets;
        }
        targets.Add(to);
    }

    return new PurlGraphView
    {
        Adjacent = adj,
        Edges = edgesByPair.Values.ToList()
    };
}
This gives you:
- A coarse view of runtime dependencies between purls (“Purl A calls into Purl B”).
- Enough context to emit purl-level VEX or to reason about trust at package granularity.
7. JSON output and SBOM integration
7.1 JSON shape (high level)
You can emit a composite document:
{
  "image": "registry.example.com/app@sha256:...",
  "modules": [
    {
      "moduleId": { "path": "/usr/lib/libssl.so.1.1", "format": "Elf" },
      "purl": "pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1",
      "arch": "x86_64"
    }
  ],
  "functions": [
    {
      "id": 42,
      "name": "SSL_do_handshake",
      "demangledName": null,
      "module": { "path": "/usr/lib/libssl.so.1.1", "format": "Elf" },
      "purl": "pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1",
      "address": "0x401020",
      "exported": true
    }
  ],
  "edges": [
    {
      "from": 10,
      "to": 42,
      "kind": "ImportCall",
      "evidence": "ELF.R_X86_64_JUMP_SLOT"
    }
  ],
  "reachability": {
    "roots": [1],
    "reachableFunctions": [1,10,42]
  },
  "purlGraph": {
    "edges": [
      {
        "from": "pkg:generic/myapp@1.0.0",
        "to": "pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1",
        "supportingCalls": [[10,42]]
      }
    ]
  },
  "vulnerabilities": [
    {
      "id": "CVE-2024-XXXX",
      "purl": "pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1",
      "sinkFunctions": [42],
      "reachable": true,
      "paths": [
        [1, 10, 42]
      ]
    }
  ]
}
7.2 Purl resolution
Implement an IPurlResolver interface:
public interface IPurlResolver
{
    Purl ResolveForModule(string filePath, byte[] contentHash);
}
Possible implementations:
- SbomPurlResolver – given a CycloneDX/SPDX SBOM for the image, match by path or checksum.
- LinuxPackagePurlResolver – read /var/lib/dpkg/status / rpm DB in the filesystem.
- GenericPurlResolver – fallback: pkg:generic/<hash>.
You call the resolver in your loaders so that every BinaryModule has a purl and thus every FunctionNode has a purl.
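As one possible shape for the SBOM-backed resolver, here is a sketch that matches by content hash first, then by path, and falls back to a generic purl. It assumes the SBOM has already been flattened into two lookup tables (lower-case hex hash → purl, file path → purl); SBOM parsing itself is out of scope and the constructor parameters are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public readonly record struct Purl(string Value);

public interface IPurlResolver
{
    Purl ResolveForModule(string filePath, byte[] contentHash);
}

public sealed class SbomPurlResolver : IPurlResolver
{
    private readonly IReadOnlyDictionary<string, string> _byHash; // lower-case hex -> purl
    private readonly IReadOnlyDictionary<string, string> _byPath; // file path -> purl

    public SbomPurlResolver(
        IReadOnlyDictionary<string, string> byHash,
        IReadOnlyDictionary<string, string> byPath)
    {
        _byHash = byHash;
        _byPath = byPath;
    }

    public Purl ResolveForModule(string filePath, byte[] contentHash)
    {
        string hex = Convert.ToHexString(contentHash).ToLowerInvariant();

        // Prefer the checksum match: paths can be symlinked or relocated.
        if (_byHash.TryGetValue(hex, out var purl) || _byPath.TryGetValue(filePath, out purl))
            return new Purl(purl);

        return new Purl($"pkg:generic/{hex}"); // fallback, as in GenericPurlResolver
    }
}
```

Matching on the hash before the path is a design choice for this sketch; a resolver that trusts the SBOM's path entries more could reverse the order.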
8. Concrete implementation tasks for your team
- Data model & interfaces
  - Implement ModuleId, FunctionNode, CallEdge, CallGraph.
  - Define RawCallSite, BinaryModule, and IPurlResolver.
- Loaders
  - ElfLoader: fill symbols, dynamic relocations (PLT), DT_NEEDED, etc.
  - PeLoader: import descriptors, IAT, delay-load.
  - MachOLoader: load commands, stubs, la_symbol_ptr, indirect symbols / chained fixups.
- Disassembly
  - X86Disassembler (iced) and Arm64Disassembler (Disarm or port).
  - Function detection and RawCallSite extraction.
- CallGraphBuilder
  - Build intra-module edges from direct calls.
  - Build inter-module edges using the format-specific rules above.
  - Construct final CallGraph with adjacency maps and purl mappings.
- Reachability
  - Implement BFS/DFS from root functions.
  - Projection to modules + purls.
  - Vulnerability sink resolution & path reconstruction.
- Export
  - JSON serializer for the schema above.
  - Optional: purl-level summary / VEX generator.
If you want, next step I can do a more concrete design for CallGraphBuilder (including per-format helper classes with method signatures) or a C# skeleton for the ElfImportResolver, PeImportResolver, and MachOStubResolver that plug directly into this plan.