Vlad, here’s a concrete, **pure‑C#** blueprint to build a multi‑format binary analyzer (Mach‑O, ELF, PE) that produces **call graphs + reachability**, with **no external tools**. Where needed, I point to permissively‑licensed code you can **port** (copy) from other ecosystems.

---

## 0) Targets & non‑negotiables

* **Formats:** Mach‑O (inc. LC_DYLD_INFO / LC_DYLD_CHAINED_FIXUPS), ELF (SysV gABI), PE/COFF
* **Architectures:** x86‑64 (and x86), AArch64 (ARM64)
* **Outputs:** JSON with **purls** per module + function‑level call graph & reachability
* **No tool reuse:** Only pure C# libraries or code **ported** from permissive sources

---

## 1) Parsing the containers (pure C#)

**Pick one C# reader per format, keeping licenses permissive:**

* **ELF & Mach‑O:** `ELFSharp` (pure managed C#; ELF + Mach‑O reading). MIT/X11 license. ([GitHub][1])
* **ELF & PE (+ DWARF v4):** `LibObjectFile` (C#, BSD‑2). Good ELF relocations (i386, x86_64, ARM, AArch64), PE directories, DWARF sections. Use it as your **common object model** for ELF+PE, then add a Mach‑O adapter. ([GitHub][2])
* **PE (optional alternative):** `PeNet` (pure C#, broad PE directories, imp/exp, TLS, certs). MIT. Useful if you want a second implementation for cross‑checks. ([GitHub][3])

> Why two libs? `LibObjectFile` gives you DWARF and clean models for ELF/PE; `ELFSharp` covers Mach‑O today (and ELF as a fallback). You control the code paths.

**Spec references you’ll implement against** (for correctness of your readers & link‑time semantics):

* **ELF (gABI, AMD64 supplement):** dynamic section, PLT/GOT, `R_X86_64_JUMP_SLOT` semantics (eager vs lazy). ([refspecs.linuxbase.org][4])
* **PE/COFF:** imports/exports/IAT, delay‑load, TLS. ([Microsoft Learn][5])
* **Mach‑O:** file layout, load commands (`LC_SYMTAB`, `LC_DYSYMTAB`, `LC_FUNCTION_STARTS`, `LC_DYLD_INFO(_ONLY)`), and the modern `LC_DYLD_CHAINED_FIXUPS`. ([leopard-adc.pepas.com][6])

---

## 2) Mach‑O: what you must **port** (byte‑for‑byte compatible)

Apple moved from traditional dyld bind opcodes to **chained fixups** for binaries targeting macOS 12 / iOS 15 and later; older deployment targets still use the opcodes, so you need both:

* **Dyld bind opcodes** (`LC_DYLD_INFO(_ONLY)`): parse the BIND, WEAK_BIND and LAZY_BIND streams (tuples of `<seg,off,type,ordinal,symbol,addend>`). Port minimal logic from **LLVM** (Apache‑2.0 with LLVM exceptions) or **LIEF** (Apache‑2.0) into C#. ([LIEF][7])
* **Chained fixups** (`LC_DYLD_CHAINED_FIXUPS`): port `dyld_chained_fixups_header` structs & chain walking from LLVM’s `MachO.h` or Apple’s dyld headers. This restores imports/rebases without running dyld. ([LLVM][8])
* **Function discovery hint:** read `LC_FUNCTION_STARTS` (ULEB128 deltas) to seed function boundaries—very helpful on stripped binaries. ([Stack Overflow][9])
* **Stubs mapping:** resolve `__TEXT,__stubs` ↔ `__DATA,__la_symbol_ptr` via the **indirect symbol table**; conceptually this is the Mach‑O counterpart of ELF's PLT/GOT. ([MaskRay][10])
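
The `LC_FUNCTION_STARTS` payload is simple to decode once you have its bytes. Below is a minimal sketch, assuming you already located the `linkedit_data_command` and sliced `datasize` bytes at `dataoff`; `FunctionStartsDecoder` is a hypothetical name, not an API of ELFSharp or any other library.

```csharp
using System;
using System.Collections.Generic;

public static class FunctionStartsDecoder
{
    // Decodes an LC_FUNCTION_STARTS payload (datasize bytes at linkedit_data_command.dataoff).
    // Each entry is a ULEB128 delta: the first is relative to the __TEXT segment's vmaddr,
    // every later one is relative to the previous function start. A zero delta ends the
    // list; anything after it is alignment padding.
    public static List<ulong> Decode(ReadOnlySpan<byte> data, ulong textVmAddr)
    {
        var starts = new List<ulong>();
        ulong address = textVmAddr;
        int pos = 0;
        while (pos < data.Length)
        {
            ulong delta = ReadUleb128(data, ref pos);
            if (delta == 0)
                break;
            address += delta;
            starts.Add(address);
        }
        return starts;
    }

    // Standard ULEB128: 7 payload bits per byte; a set high bit means more bytes follow.
    public static ulong ReadUleb128(ReadOnlySpan<byte> data, ref int pos)
    {
        ulong result = 0;
        int shift = 0;
        while (true)
        {
            if (pos >= data.Length)
                throw new FormatException("Truncated ULEB128 value.");
            byte b = data[pos++];
            if (shift < 64)
                result |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                return result;
            shift += 7;
        }
    }
}
```

The same `ReadUleb128` helper is reusable for the dyld bind-opcode and export-trie streams, which also encode offsets and ordinals as ULEB128.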

> If you prefer an in‑C# base for Mach‑O manipulation, **Melanzana.MachO** exists (MIT) and has been used by .NET folks for Mach‑O/Code Signing/obj writing; you can mine its approach for load‑command modeling. ([GitHub][11])

---

## 3) Disassembly (pure C#, multi‑arch)

* **x86/x64:** `iced` (C# decoder/disassembler/encoder; MIT; fast & complete). ([GitHub][12])
* **AArch64/ARM64:** two options that keep you pure‑C#:

  * **Disarm** (pure C# ARM64 disassembler; MIT). Good starting point to decode & get branch/call kinds. ([GitHub][13])
  * **Port from Ryujinx ARMeilleure** (ARMv8 decoder/JIT in C#, MIT). You can lift only the **decoder** pieces you need. ([Gitee][14])
* **x86 fallback:** `SharpDisasm` (udis86 port in C#; BSD‑2). Older than iced; keep as a reference. ([GitHub][15])

---

## 4) Call graph recovery (static)

**4.1 Function seeds**

* From symbols (`.dynsym`/`LC_SYMTAB`/PE exports)
* From **LC_FUNCTION_STARTS** (Mach‑O) for stripped code ([Stack Overflow][9])
* From entrypoints (`_start`/`main` or PE AddressOfEntryPoint)
* From exception/unwind tables (PE x64 `.pdata`, ELF `.eh_frame`, Mach‑O `__unwind_info`) and DWARF when present; `LibObjectFile` already models DWARF v4. ([GitHub][2])

**4.2 CFG & interprocedural calls**

* **Decode** with iced/Disarm from each seed; form **basic blocks** by following control‑flow until a terminator (ret, unconditional or conditional jump; optionally call, if you split blocks at calls).
* **Direct calls:** immediate targets become edges (PC‑relative fixups where needed).
* **Imported calls:**

  * **ELF:** calls to PLT stubs → resolve via `.rela.plt` & `R_*_JUMP_SLOT` to symbol names (link‑time target). ([cs61.seas.harvard.edu][16])
  * **PE:** calls through the **IAT** → resolve via `IMAGE_IMPORT_DESCRIPTOR` / thunk tables. ([Microsoft Learn][5])
  * **Mach‑O:** calls to `__stubs` use **indirect symbol table** + `__la_symbol_ptr` (or chained fixups) → map to dylib/symbol. ([reinterpretcast.com][17])
* **Indirect calls within the binary:** heuristics only (function pointer tables, vtables, small constant pools). Keep them labeled **“indirect‑unresolved”** unless a heuristic yields a concrete target.
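
The decoders compute direct-call targets for you, but the arithmetic is worth having as a unit-test oracle for your call-edge extraction. A minimal sketch covering only x86/x64 `CALL rel32` (opcode `E8`) and AArch64 `BL`; `DirectCallTargets` is a hypothetical helper, not part of iced or Disarm.

```csharp
using System;
using System.Buffers.Binary;

public static class DirectCallTargets
{
    // x86/x64 "CALL rel32" (E8 + 4-byte displacement): the displacement is relative
    // to the address of the next instruction, i.e. 5 bytes after the opcode.
    public static bool TryX86CallRel32(ReadOnlySpan<byte> code, ulong instrAddress, out ulong target)
    {
        target = 0;
        if (code.Length < 5 || code[0] != 0xE8)
            return false;
        int rel = BinaryPrimitives.ReadInt32LittleEndian(code.Slice(1, 4));
        target = instrAddress + 5 + (ulong)(long)rel;   // wraps correctly for negative rel
        return true;
    }

    // AArch64 "BL imm26": bits [31:26] = 100101; target = PC + SignExtend(imm26 << 2).
    public static bool TryArm64Bl(uint instruction, ulong instrAddress, out ulong target)
    {
        target = 0;
        if ((instruction & 0xFC000000u) != 0x94000000u)
            return false;
        long offset = (long)(instruction & 0x03FFFFFFu) << 2;  // 28-bit byte offset
        if ((offset & (1L << 27)) != 0)
            offset -= 1L << 28;                                 // sign-extend
        target = instrAddress + (ulong)offset;
        return true;
    }
}
```

For 32-bit x86, mask the result to 32 bits. Plain `B` (`0x14000000` pattern) is a tail jump, not a call, so it is deliberately rejected here.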

**4.3 Cross‑binary graph**

* Build module‑level edges by simulating the platform’s loader:

  * **ELF:** honor `DT_NEEDED`, `DT_RPATH/RUNPATH`, versioning (`.gnu.version*`) to pick the definer of an imported symbol. gABI rules apply. ([refspecs.linuxbase.org][4])
  * **PE:** pick DLL from the import descriptors. ([Microsoft Learn][5])
  * **Mach‑O:** `LC_LOAD_DYLIB` + dyld binding / chained fixups determine the provider image. ([LIEF][7])
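
A sketch of ELF provider selection under a simplified model: the global lookup scope is the executable followed by `DT_NEEDED` objects in breadth-first load order, and the first definition wins. `ElfObject` and `ElfLookupScope` are hypothetical types; symbol versioning, `DT_SYMBOLIC`, `LD_PRELOAD`, and `dlopen` are deliberately ignored.

```csharp
using System;
using System.Collections.Generic;

// Name = DT_SONAME for libraries (file name for the executable); Needed = DT_NEEDED in
// file order; Defines = defined, exported dynamic symbols.
public sealed record ElfObject(string Name, IReadOnlyList<string> Needed, HashSet<string> Defines);

public static class ElfLookupScope
{
    // Load order: the executable, then its DT_NEEDED entries, then theirs, breadth-first,
    // each object once. 'bySoname' indexes the libraries found in the image (after you
    // applied RPATH/RUNPATH and default search paths); missing ones are skipped.
    public static List<ElfObject> Build(ElfObject executable, IReadOnlyDictionary<string, ElfObject> bySoname)
    {
        var order = new List<ElfObject>();
        var seen = new HashSet<string>(StringComparer.Ordinal) { executable.Name };
        var queue = new Queue<ElfObject>();
        queue.Enqueue(executable);
        while (queue.Count > 0)
        {
            var obj = queue.Dequeue();
            order.Add(obj);
            foreach (var needed in obj.Needed)
            {
                if (seen.Add(needed) && bySoname.TryGetValue(needed, out var dep))
                    queue.Enqueue(dep);
            }
        }
        return order;
    }

    // The first object in scope order that defines the symbol provides it.
    public static ElfObject? FindDefiner(IReadOnlyList<ElfObject> scope, string symbol)
    {
        foreach (var obj in scope)
        {
            if (obj.Defines.Contains(symbol))
                return obj;
        }
        return null;
    }
}
```

Build the scope once per executable; the shared objects it loads resolve their own imports against that same scope.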

---

## 5) Reachability analysis

Represent the **call graph** using a .NET graph lib (or a simple adjacency set). I suggest:

* **QuikGraph** (successor of QuickGraph; MIT) for algorithms (DFS/BFS, SCCs). Use it to compute reachability from chosen roots (entrypoint(s), exported APIs, or “sinks”). ([GitHub][18])

You can visualize with **MSAGL** (MIT) when you need layouts, but your core output is JSON. ([GitHub][19])

---

## 6) Symbol demangling (nice‑to‑have, pure C#)

* **Itanium (ELF/Mach‑O):** Either port LLVM’s Itanium demangler or use a C# lib like **CxxDemangler** (a C# rewrite of `cpp_demangle`). ([LLVM][20])
* **MSVC (PE):** Port LLVM’s `MicrosoftDemangle.cpp` (Apache‑2.0 with LLVM exception) to C#. ([LLVM][21])

---

## 7) JSON output (with purls)

Use a stable schema (example) to feed SBOM/vuln matching downstream:

```json
{
  "modules": [
    {
      "purl": "pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1?arch=amd64",
      "format": "ELF",
      "arch": "x86_64",
      "path": "/usr/lib/x86_64-linux-gnu/libssl.so.1.1",
      "exports": ["SSL_read", "SSL_write"],
      "imports": ["BIO_new", "EVP_CipherInit_ex"],
      "functions": [{"name":"SSL_do_handshake","va":"0x401020","size":512,"demangled": "..."}]
    }
  ],
  "graph": {
    "nodes": [
      {"id":"bin:main@0x401000","module": "pkg:generic/myapp@1.0.0"},
      {"id":"lib:SSL_read","module":"pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1?arch=amd64"}
    ],
    "edges": [
      {"src":"bin:main@0x401000","dst":"lib:SSL_read","kind":"import_call","evidence":"ELF.R_X86_64_JUMP_SLOT"}
    ]
  },
  "reachability": {
    "roots": ["bin:_start","bin:main@0x401000"],
    "reachable": ["lib:SSL_read", "lib:SSL_write"],
    "unresolved_indirect_calls": [
      {"site":"0x402ABC","reason":"register-indirect"}
    ]
  }
}
```

---

## 8) Minimal C# module layout (sketch)

```
Stella.Analysis.Core/
  BinaryModule.cs            // common model (sections, symbols, relocs, imports/exports)
  Loader/
    PeLoader.cs              // wrap LibObjectFile (or PeNet) to BinaryModule
    ElfLoader.cs             // wrap LibObjectFile to BinaryModule
    MachOLoader.cs           // wrap ELFSharp + your ported Dyld/ChainedFixups
  Disasm/
    X86Disassembler.cs       // iced bridge: bytes -> instructions
    Arm64Disassembler.cs     // Disarm (or ARMeilleure port) bridge
  Graph/
    CallGraphBuilder.cs      // builds CFG per function + inter-procedural edges
    Reachability.cs          // BFS/DFS over QuikGraph
  Demangle/
    ItaniumDemangler.cs      // port or wrap CxxDemangler
    MicrosoftDemangler.cs    // port from LLVM
  Export/
    JsonWriter.cs            // writes schema above
```

---

## 9) Implementation notes (where issues usually bite)

* **Modern Mach‑O:** implement both dyld bind opcodes **and** chained fixups; binaries built for macOS 12+/iOS 15+ deployment targets typically carry only chained fixups. ([emergetools.com][22])
* **Stubs vs real targets (Mach‑O):** map `__stubs` → `__la_symbol_ptr` via **indirect symbols** to the true imported symbol (or its post‑fixup target). ([reinterpretcast.com][17])
* **ELF PLT/GOT:** treat `.plt` entries as **call trampolines**; ultimate edge should point to the symbol (library) that satisfies `DT_NEEDED` + version. ([refspecs.linuxbase.org][4])
* **PE delay‑load:** don’t forget `IMAGE_DELAYLOAD_DESCRIPTOR` for delayed IATs. ([Microsoft Learn][5])
* **Function discovery:** use `LC_FUNCTION_STARTS` when symbols are stripped; it’s a cheap way to seed analysis. ([Stack Overflow][9])
* **Name clarity:** demangle Itanium/MSVC so downstream vuln rules can match consistently. ([LLVM][20])

---

## 10) What to **copy/port** verbatim (safe licenses)

* **Dyld bind & exports trie logic:** from **LLVM** or **LIEF** Mach‑O (Apache‑2.0). Great for getting the exact opcode semantics right. ([LIEF][7])
* **Chained fixups structs/walkers:** from **LLVM MachO.h** (Apache‑2.0 with LLVM exceptions). Apple's own dyld headers are published under the APSL 2.0, so check those terms before copying from them. ([LLVM][8])
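* **Itanium/MS demanglers:** LLVM's demangler sources are self‑contained (no dependency on the rest of LLVM), which makes a C# translation tractable, although the Itanium demangler alone runs to several thousand lines. ([LLVM][23])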
* **ARM64 decoder:** if Disarm gaps hurt, lift just the **decoder** pieces from **Ryujinx ARMeilleure** (MIT). ([Gitee][14])

*(Avoid GPL’d parsers like binutils/BFD; they will contaminate your codebase’s licensing.)*

---

## 11) End‑to‑end pipeline (per container image)

1. **Enumerate binaries** in the container FS.
2. **Parse** each with the appropriate loader → `BinaryModule` (+ imports/exports/symbols/relocs).
3. **Simulate linking** per platform to resolve imported functions to provider libraries. ([refspecs.linuxbase.org][4])
4. **Disassemble** functions (iced/Disarm) → CFGs → **call edges** (direct, PLT/IAT/stub, indirect).
5. **Assemble call graph** across modules; normalize names via demangling.
6. **Reachability**: given roots (entry or user‑specified) compute reachable set; emit JSON with **purls** (from your SBOM/package resolver).
7. **(Optional)** dump GraphViz / MSAGL views for debugging. ([GitHub][19])

---

## 12) Quick heuristics for vulnerability triage

* **Sink maps**: flag edges to high‑risk APIs (`strcpy`, `gets`, legacy SSL ciphers) even without CVE versioning.
* **DWARF line info** (when present): attach file:line to nodes for developer action. `LibObjectFile` gives you DWARF v4 reads. ([GitHub][2])

---

## 13) Test corpora

* **ELF:** glibc/openssl/libpng from distro repos; validate `R_*_JUMP_SLOT` handling and PLT edges. ([cs61.seas.harvard.edu][16])
* **PE:** system DLLs (Kernel32, Advapi32) and a small MSVC console app; validate IAT & delay‑load. ([Microsoft Learn][5])
* **Mach‑O:** Xcode‑built binaries with macOS 11 and macOS 12+ deployment targets, to cover both the dyld opcode and chained‑fixups paths; verify `LC_FUNCTION_STARTS` improves discovery. ([Stack Overflow][9])

---

## 14) Deliverables you can start coding now

* **MachOLoader.cs**

  * Parse headers + load commands (ELFSharp).
  * Implement `DyldInfoParser` (port from LLVM/LIEF) and `ChainedFixupsParser` (port structs & walkers). ([LIEF][7])
* **X86Disassembler.cs / Arm64Disassembler.cs** (iced / Disarm bridges). ([GitHub][12])
* **CallGraphBuilder.cs** (recursive descent + linear sweep fallback; PLT/IAT/stub resolution).
* **Reachability.cs** (QuikGraph BFS/DFS). ([GitHub][18])
* **JsonWriter.cs** (schema above with purls).

---

### References (core, load‑bearing)

* **ELFSharp** (ELF + Mach‑O pure C#). ([GitHub][1])
* **LibObjectFile** (ELF/PE/DWARF C#, BSD‑2). ([GitHub][2])
* **iced** (x86/x64 disasm, C#, MIT). ([GitHub][12])
* **Disarm** (ARM64 disasm, C#, MIT). ([GitHub][13])
* **Ryujinx (ARMeilleure)** (ARMv8 decode/JIT in C#, MIT). ([Gitee][14])
* **ELF gABI & AMD64 supplement** (PLT/GOT, relocations). ([refspecs.linuxbase.org][4])
* **PE/COFF** (imports/exports/IAT). ([Microsoft Learn][5])
* **Mach‑O docs** (load commands; LC_FUNCTION_STARTS; dyld bindings; chained fixups). ([Apple Developer][24])

---

If you want, I can draft **`MachOLoader` + `DyldInfoParser`** in C# next, including chained‑fixups structs (ported from LLVM’s headers) and an **iced**‑based call‑edge walker for x86‑64.

[1]: https://github.com/konrad-kruczynski/elfsharp "GitHub - konrad-kruczynski/elfsharp: Pure managed C# library for reading ELF, UImage, Mach-O binaries."
[2]: https://github.com/xoofx/LibObjectFile "GitHub - xoofx/LibObjectFile: LibObjectFile is a .NET library to read, manipulate and write linker and executable object files (e.g ELF, PE, DWARF, ar...)"
[3]: https://github.com/secana/PeNet "secana/PeNet: Portable Executable (PE) library written in . ..."
[4]: https://refspecs.linuxbase.org/elf/gabi4%2B/contents.html "System V Application Binary Interface - DRAFT - 24 April 2001"
[5]: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format "PE Format - Win32 apps"
[6]: https://leopard-adc.pepas.com/documentation/DeveloperTools/Conceptual/MachOTopics/0-Introduction/introduction.html "Mach-O Programming Topics: Introduction"
[7]: https://lief.re/doc/stable/doxygen/classLIEF_1_1MachO_1_1DyldInfo.html "MachO::DyldInfo Class Reference - LIEF"
[8]: https://llvm.org/doxygen/structllvm_1_1MachO_1_1dyld__chained__fixups__header.html "MachO::dyld_chained_fixups_header Struct Reference"
[9]: https://stackoverflow.com/questions/9602438/mach-o-file-lc-function-starts-load-command "Mach-O file LC_FUNCTION_STARTS load command"
[10]: https://maskray.me/blog/2021-09-19-all-about-procedure-linkage-table "All about Procedure Linkage Table"
[11]: https://github.com/dotnet/runtime/issues/77178 "Discussion: ObjWriter in C# · Issue #77178 · dotnet/runtime · GitHub"
[12]: https://github.com/icedland/iced "icedland/iced: Blazing fast and correct x86/x64 ..."
[13]: https://github.com/SamboyCoding/Disarm "SamboyCoding/Disarm: Fast, pure-C# ARM64 Disassembler"
[14]: https://gitee.com/ryujinx/Ryujinx/blob/master/LICENSE.txt "Ryujinx/Ryujinx"
[15]: https://github.com/justinstenning/SharpDisasm "justinstenning/SharpDisasm"
[16]: https://cs61.seas.harvard.edu/site/2022/pdf/x86-64-abi-20210928.pdf "System V Application Binary Interface"
[17]: https://www.reinterpretcast.com/hello-world-mach-o "The Nitty Gritty of “Hello World” on macOS | reinterpretcast.com"
[18]: https://github.com/KeRNeLith/QuikGraph "KeRNeLith/QuikGraph: Generic Graph Data Structures and ..."
[19]: https://github.com/microsoft/automatic-graph-layout "microsoft/automatic-graph-layout: A set of tools for ..."
[20]: https://llvm.org/doxygen/structllvm_1_1ItaniumPartialDemangler.html "ItaniumPartialDemangler Struct Reference"
[21]: https://llvm.org/doxygen/MicrosoftDemangle_8cpp_source.html "lib/Demangle/MicrosoftDemangle.cpp Source File"
[22]: https://www.emergetools.com/blog/posts/iOS15LaunchTime "How iOS 15 makes your app launch faster"
[23]: https://llvm.org/doxygen/ItaniumDemangle_8cpp.html "lib/Demangle/ItaniumDemangle.cpp File Reference"
[24]: https://developer.apple.com/library/archive/documentation/Performance/Conceptual/CodeFootprint/Articles/MachOOverview.html "Overview of the Mach-O Executable Format"

---

Good, this is exactly the point where you want to nail the internal model; otherwise everything on top (VEX, policy, reporting) will be shaky.

Below is a detailed, implementation-ready plan for a **reachability graph with purl-aware edges**, covering ELF, PE, and Mach-O, in C#.

I’ll structure it as:

1. Overall graph design (3 layers: function, module, purl)
2. Core C# data model
3. Pipeline steps (end-to-end)
4. Format-specific edge construction (ELF / PE / Mach-O)
5. Reachability queries (from entrypoints to vulnerable purls / functions)
6. JSON output layout and integration with SBOM

---

## 1. Overall graph design

You want three tightly linked graph layers:

1. **Function-level call graph (FLG)**

   * Nodes: individual **functions** inside binaries
   * Edges: calls from function A → function B (intra- or inter-module)

2. **Module-level graph (MLG)**

   * Nodes: **binaries** (ELF/PE/Mach-O files)
   * Edges: “module A calls module B at least once” (aggregated from FLG)

3. **Purl-level graph (PLG)**

   * Nodes: **purls** (packages or generic artifacts)
   * Edges: “purl P1 depends-at-runtime on purl P2” (aggregated from module edges)

The **reachability algorithm** runs primarily on the **function graph**, but:

* You can project reachability results to **module** and **purl** nodes.
* You can also run coarse-grained analysis directly on **purl graph** when needed (“Is any code in purl X reachable from the container entrypoint?”).

---

## 2. Core C# data model

### 2.1 Identifiers and enums

```csharp
public enum BinaryFormat { Elf, Pe, MachO }

public readonly record struct ModuleId(string Path, BinaryFormat Format);

public readonly record struct Purl(string Value);

public enum EdgeKind
{
    IntraModuleDirect,       // call foo -> bar in same module
    ImportCall,              // call via plt/iat/stub to imported function
    SyntheticRoot,           // root (entrypoint) edge
    IndirectUnresolved       // optional: we saw an indirect call we couldn't resolve
}
```

### 2.2 Function node

```csharp
public sealed class FunctionNode
{
    public int Id { get; init; }                // internal numeric id
    public ModuleId Module { get; init; }
    public Purl Purl { get; init; }             // resolved from Module -> Purl
    public ulong Address { get; init; }         // VA or RVA
    public ulong Size { get; init; }            // byte length; 0 if unknown (e.g. import placeholders)
    public string Name { get; init; } = "";     // mangled
    public string? DemangledName { get; init; } // optional
    public bool IsExported { get; init; }
    public bool IsImportedStub { get; init; }   // e.g. PLT stub, Mach-O stub, PE thunks
    public bool IsRoot { get; set; }            // _start/main/entrypoint etc.
}
```

### 2.3 Edges

```csharp
public sealed class CallEdge
{
    public int FromId { get; init; }        // FunctionNode.Id
    public int ToId { get; init; }          // FunctionNode.Id
    public EdgeKind Kind { get; init; }
    public string Evidence { get; init; } = "";  // e.g. "ELF.R_X86_64_JUMP_SLOT", "PE.IAT", "MachO.IndirectSymbol"
}
```

### 2.4 Graph container

```csharp
public sealed class CallGraph
{
    public IReadOnlyDictionary<int, FunctionNode> Nodes { get; init; }
    public IReadOnlyDictionary<int, List<CallEdge>> OutEdges { get; init; }
    public IReadOnlyDictionary<int, List<CallEdge>> InEdges { get; init; }

    // Convenience: mappings
    public IReadOnlyDictionary<ModuleId, List<int>> FunctionsByModule { get; init; }
    public IReadOnlyDictionary<Purl, List<int>> FunctionsByPurl { get; init; }
}
```

### 2.5 Purl-level graph view

You don’t store a separate physical graph; you **derive** it on demand:

```csharp
public sealed class PurlEdge
{
    public Purl From { get; init; }
    public Purl To { get; init; }
    public List<(int FromFnId, int ToFnId)> SupportingCalls { get; init; }
}

public sealed class PurlGraphView
{
    public IReadOnlyDictionary<Purl, HashSet<Purl>> Adjacent { get; init; }
    public IReadOnlyList<PurlEdge> Edges { get; init; }
}
```

---

## 3. Pipeline steps (end-to-end)

### Step 0 – Inputs

* Set of binaries (files) extracted from container image.
* SBOM or other metadata that can map a file path (or hash) → **purl**.

### Step 1 – Parse binaries → `BinaryModule` objects

You define a common in-memory model:

```csharp
public sealed class BinaryModule
{
    public ModuleId Id { get; init; }
    public Purl Purl { get; init; }
    public BinaryFormat Format { get; init; }

    // Raw sections / segments
    public IReadOnlyList<SectionInfo> Sections { get; init; }

    // Symbols
    public IReadOnlyList<SymbolInfo> Symbols { get; init; }     // imports + exports + locals

    // Relocations / fixups
    public IReadOnlyList<RelocationInfo> Relocations { get; init; }

    // Import/export tables (PE)/dylib commands (Mach-O)/DT_NEEDED (ELF)
    public ImportInfo[] Imports { get; init; }
    public ExportInfo[] Exports { get; init; }
}
```

Implement format-specific loaders:

* `ElfLoader : IBinaryLoader`
* `PeLoader : IBinaryLoader`
* `MachOLoader : IBinaryLoader`

Each loader uses your chosen C# parsers or ported code and fills `BinaryModule`.
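
Loader dispatch usually starts by sniffing magic bytes. A minimal sketch, assuming you pass enough of the file to cover the DOS header and the PE signature it points to (passing the whole file is simplest). `FormatSniffer` and `SniffedFormat` are hypothetical names, kept separate from the `BinaryFormat` enum above so a universal Mach-O can be flagged before it is split into per-architecture slices.

```csharp
using System;
using System.Buffers.Binary;

public enum SniffedFormat { Unknown, Elf, Pe, MachO, MachOFat }

public static class FormatSniffer
{
    public static SniffedFormat Sniff(ReadOnlySpan<byte> data)
    {
        if (data.Length < 8)
            return SniffedFormat.Unknown;

        // ELF: 0x7F 'E' 'L' 'F'
        if (data[0] == 0x7F && data[1] == (byte)'E' && data[2] == (byte)'L' && data[3] == (byte)'F')
            return SniffedFormat.Elf;

        // PE: "MZ" DOS header whose e_lfanew (offset 0x3C) points at "PE\0\0".
        if (data[0] == (byte)'M' && data[1] == (byte)'Z')
        {
            if (data.Length < 0x40)
                return SniffedFormat.Unknown;
            int pe = BinaryPrimitives.ReadInt32LittleEndian(data.Slice(0x3C, 4));
            bool hasSignature = pe > 0 && pe <= data.Length - 4 &&
                data[pe] == (byte)'P' && data[pe + 1] == (byte)'E' && data[pe + 2] == 0 && data[pe + 3] == 0;
            return hasSignature ? SniffedFormat.Pe : SniffedFormat.Unknown;
        }

        // Thin Mach-O in either byte order: MH_MAGIC, MH_MAGIC_64 and their byte-swapped forms.
        uint le = BinaryPrimitives.ReadUInt32LittleEndian(data);
        if (le is 0xFEEDFACE or 0xFEEDFACF or 0xCEFAEDFE or 0xCFFAEDFE)
            return SniffedFormat.MachO;

        // Universal binaries start with big-endian FAT_MAGIC (0xCAFEBABE) or FAT_MAGIC_64.
        // Java class files share 0xCAFEBABE, but their next four bytes (minor/major version)
        // read as at least 45, while nfat_arch is a small count: a heuristic threshold.
        uint be = BinaryPrimitives.ReadUInt32BigEndian(data);
        if (be is 0xCAFEBABE or 0xCAFEBABF)
        {
            uint nfatArch = BinaryPrimitives.ReadUInt32BigEndian(data.Slice(4));
            return nfatArch is > 0 and < 20 ? SniffedFormat.MachOFat : SniffedFormat.Unknown;
        }

        return SniffedFormat.Unknown;
    }
}
```

A `MachOFat` result means splitting the file into its slices (each `fat_arch` entry gives an offset and size) and sniffing each slice again; each slice then becomes its own `BinaryModule`.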

### Step 2 – Disassembly → basic blocks & candidate functions

For each `BinaryModule`:

1. Use appropriate decoder (iced for x86/x64; Disarm/ported ARMeilleure for AArch64).
2. Seed function starts:

   * Exported functions
   * Entry points (`_start`, `main`, AddressOfEntryPoint)
   * Mach-O `LC_FUNCTION_STARTS` if available
3. Walk instructions to build basic blocks:

   * Stop blocks at conditional/unconditional branches, calls, rets.
   * Record for each call site:

     * Address of caller function
     * Operand type (immediate, memory with import table address, etc.)

Disassembler outputs a list of `FunctionNode` skeletons (no cross-module link yet) and a list of **raw call sites**:

```csharp
public sealed class RawCallSite
{
    public int CallerFunctionId { get; init; }
    public ulong InstructionAddress { get; init; }
    public ulong? DirectTargetAddress { get; init; }     // e.g. CALL 0x401000
    public ulong? MemoryTargetAddress { get; init; }     // e.g. CALL [0x404000]
    public bool IsIndirect { get; init; }                // register-based etc.
}
```

### Step 3 – Build function nodes

Using disassembly + symbol tables:

* For each discovered function:

  * Determine: address, name (if sym available), export/import flags.
  * Map `ModuleId` → `Purl` using `IPurlResolver`.
* Populate `FunctionNode` instances and index them by `Id`.
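
`IPurlResolver` is referenced above but not defined. One possible shape, sketched under the assumption that your SBOM yields a path → purl map and a SHA-256 → purl map, with a `pkg:generic` fallback (using the purl `checksum` qualifier) so every module still gets a stable identifier. It returns plain strings to stay self-contained; in the model above you would wrap the result in `Purl`. `SbomPurlResolver` is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical resolver shape: SBOM lookup by path, then by content hash,
// then a pkg:generic fallback.
public interface IPurlResolver
{
    string Resolve(string path, ReadOnlySpan<byte> content);
}

public sealed class SbomPurlResolver : IPurlResolver
{
    private readonly IReadOnlyDictionary<string, string> _purlByPath;    // SBOM file entries
    private readonly IReadOnlyDictionary<string, string> _purlBySha256;  // lowercase hex digest -> purl

    public SbomPurlResolver(IReadOnlyDictionary<string, string> purlByPath,
                            IReadOnlyDictionary<string, string> purlBySha256)
    {
        _purlByPath = purlByPath;
        _purlBySha256 = purlBySha256;
    }

    public string Resolve(string path, ReadOnlySpan<byte> content)
    {
        if (_purlByPath.TryGetValue(path, out var byPath))
            return byPath;

        string sha256 = Convert.ToHexString(SHA256.HashData(content)).ToLowerInvariant();
        if (_purlBySha256.TryGetValue(sha256, out var byHash))
            return byHash;

        // Not in the SBOM: pkg:generic/<file name>?checksum=sha256:<digest>
        string name = Uri.EscapeDataString(Path.GetFileName(path));
        return $"pkg:generic/{name}?checksum=sha256:{sha256}";
    }
}
```

The hash fallback keeps unknown binaries distinguishable in the purl graph instead of collapsing them into one anonymous node.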

### Step 4 – Construct intra-module edges

For each `RawCallSite`:

* If `DirectTargetAddress` falls inside a known function’s address range in the **same module**, add **IntraModuleDirect** edge.

This gives you “normal” calls like `foo()` calling `bar()` in the same `.so`/`.dll`/`.dylib`.
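
The “falls inside a known function's address range” check is a binary search over sorted function starts. A sketch, assuming each discovered function has a start address and a size; `FunctionRangeIndex` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

// Sorted start addresses for one module's functions; answers "which function's
// [Start, Start + Size) range contains this address?" in O(log n).
public sealed class FunctionRangeIndex
{
    private readonly ulong[] _starts;
    private readonly ulong[] _sizes;
    private readonly int[] _ids;

    public FunctionRangeIndex(IEnumerable<(int Id, ulong Start, ulong Size)> functions)
    {
        var list = new List<(int Id, ulong Start, ulong Size)>(functions);
        list.Sort((a, b) => a.Start.CompareTo(b.Start));
        _starts = list.ConvertAll(f => f.Start).ToArray();
        _sizes = list.ConvertAll(f => f.Size).ToArray();
        _ids = list.ConvertAll(f => f.Id).ToArray();
    }

    public bool TryFind(ulong address, out int functionId)
    {
        functionId = -1;
        int i = Array.BinarySearch(_starts, address);
        if (i < 0)
            i = ~i - 1;                         // last function starting below the address
        if (i < 0 || address - _starts[i] >= _sizes[i])
            return false;                       // before the first function, or in a gap
        functionId = _ids[i];
        return true;
    }
}
```

A direct call that lands inside a function rather than at its start often signals a missed boundary; seeding a new function at that address is usually better than adding an edge to the enclosing one.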

### Step 5 – Construct inter-module edges (import calls)

This is where ELF/PE/Mach-O differ; details in section 4 below.

But the abstract logic is:

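1. For each call site that reaches an import, either through a memory slot (`MemoryTargetAddress` pointing at an IAT slot or GOT entry) or through a trampoline (`DirectTargetAddress` pointing into an ELF `.plt` stub or a Mach-O `__stubs` entry):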
2. From the module’s import, relocation or fixup tables, determine:

   * Which **imported symbol** it corresponds to (name, ordinal, etc.).
   * Which **imported module / dylib / DLL** provides that symbol.
3. Find (or create) a `FunctionNode` representing that imported symbol in the **provider module**.
4. Add an **ImportCall** edge from caller function to the provider `FunctionNode`.

This is the key to turning low-level dynamic linking into **purl-aware cross-module edges**, because each `FunctionNode` is already stamped with a `Purl`.

### Step 6 – Build adjacency structures

Once you have all `FunctionNode`s and `CallEdge`s:

* Build `OutEdges` and `InEdges` dictionaries keyed by `FunctionNode.Id`.
* Build `FunctionsByModule` / `FunctionsByPurl`.

---

## 4. Format-specific edge construction

This is the “how” for step 5, per binary format.

### 4.1 ELF

Goal: map call sites that go via PLT/GOT to an imported function in a `DT_NEEDED` library.

Algorithm:

1. Parse:

   * `.dynsym`, `.dynstr` – dynamic symbol table
   * `.rela.plt` / `.rel.plt` – relocation entries for PLT
   * `.got.plt` / `.got` – PLT’s GOT
   * `DT_NEEDED` entries – list of linked shared objects and their sonames

2. For each relocation of type `R_*_JUMP_SLOT`:

   * It applies to a slot in the GOT (`.got.plt`); that slot is what the PLT stub's indirect `jmp` reads at run time.
   * Relocation gives you:

     * Address of the GOT slot (`r_offset`, a virtual address in executables and shared objects)
     * Symbol index (`r_info` → symbol) → dynamic symbol (`ElfSymbol`)
     * Symbol name, type (FUNC), binding, etc.

3. Link GOT entries to call sites:

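   * Typical x86-64 code calls the PLT stub directly (`call foo@plt`): for each `RawCallSite` whose `DirectTargetAddress` lies in `.plt` (or `.plt.sec`), decode the stub's indirect `jmp` to learn which GOT slot it reads. Code built with `-fno-plt` calls through the GOT itself, so a `MemoryTargetAddress` inside `.got` is already the slot (those slots are usually bound by `R_X86_64_GLOB_DAT` in `.rela.dyn` rather than by `JUMP_SLOT`). Either way:

     * Find the relocation whose `r_offset` equals that GOT slot address.
     * That tells you which **symbol** is being called.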

4. Determine provider module:

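   * From the symbol's name (`st_name`), its version requirement (`.gnu.version_r`), and the `DT_NEEDED` lists, decide which shared object defines it. The dynamic linker searches the executable first, then the `DT_NEEDED` objects in breadth-first load order, and the first definition wins; replaying that order over the modules found in the image is a good approximation.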
   * Map DT_NEEDED → `ModuleId` (you’ll have loaded these modules separately, or you can create “placeholder modules” if they’re not in the container image).

5. Create edges:

   * Create/find `FunctionNode` for the **imported symbol** in provider module.
   * Add `CallEdge` from caller function to imported function, `EdgeKind = ImportCall`, `Evidence = "ELF.R_X86_64_JUMP_SLOT"` (or arch-specific).

This yields edges like:

* `myapp:main` → `libssl.so.1.1:SSL_read`
* `libfoo.so:foo` → `libc.so.6:malloc`
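
Steps 2 and 3 in code form, as a sketch for x86-64 ELF64 only: decode `r_info`, keep `JUMP_SLOT` relocations, and recover the GOT slot that a PLT stub jumps through. `ElfJumpSlots` and `Elf64Rela` are hypothetical; AArch64 PLT stubs use an `adrp`/`ldr`/`add`/`br` sequence that you would decode with Disarm instead of matching the byte pattern below.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

// One Elf64_Rela entry from .rela.plt.
public readonly record struct Elf64Rela(ulong Offset, ulong Info, long Addend)
{
    public uint SymbolIndex => (uint)(Info >> 32);        // ELF64_R_SYM
    public uint Type => (uint)(Info & 0xFFFFFFFF);        // ELF64_R_TYPE
}

public static class ElfJumpSlots
{
    public const uint R_X86_64_JUMP_SLOT = 7;
    public const uint R_AARCH64_JUMP_SLOT = 1026;

    // GOT slot address (r_offset) -> imported symbol name. dynSymNames[i] is the
    // name of .dynsym entry i, already read from .dynstr.
    public static Dictionary<ulong, string> MapGotSlots(
        IEnumerable<Elf64Rela> relaPlt, IReadOnlyList<string> dynSymNames, uint jumpSlotType)
    {
        var bySlot = new Dictionary<ulong, string>();
        foreach (var rela in relaPlt)
        {
            if (rela.Type == jumpSlotType && rela.SymbolIndex < dynSymNames.Count)
                bySlot[rela.Offset] = dynSymNames[(int)rela.SymbolIndex];
        }
        return bySlot;
    }

    // x86-64 PLT entries jump through their GOT slot with "jmp qword ptr [rip+disp32]"
    // (FF 25 disp32), optionally preceded by endbr64 and/or a BND (F2) prefix as in
    // IBT-enabled .plt.sec entries. Returns the GOT slot address the stub reads.
    public static bool TryX64PltSlot(ReadOnlySpan<byte> stub, ulong stubAddress, out ulong slot)
    {
        slot = 0;
        int i = 0;
        if (stub.Length >= 4 && stub[0] == 0xF3 && stub[1] == 0x0F && stub[2] == 0x1E && stub[3] == 0xFA)
            i = 4;                                      // endbr64
        if (i < stub.Length && stub[i] == 0xF2)
            i++;                                        // BND prefix
        if (stub.Length < i + 6 || stub[i] != 0xFF || stub[i + 1] != 0x25)
            return false;
        int disp = BinaryPrimitives.ReadInt32LittleEndian(stub.Slice(i + 2, 4));
        slot = stubAddress + (ulong)(i + 6) + (ulong)(long)disp;  // RIP-relative: from end of the jmp
        return true;
    }
}
```

Chaining the two gives `PLT stub address → GOT slot → symbol name`, which is exactly the key the `ImportCall` edge needs.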

### 4.2 PE

Goal: map call sites that go via the Import Address Table (IAT) to imported functions in DLLs.

Algorithm:

1. Parse:

   * `IMAGE_IMPORT_DESCRIPTOR[]` – each for a DLL name.
   * Original thunk table (INT) – names/ordinals of imported symbols.
   * IAT – where the loader writes function addresses at runtime.

2. For each import entry:

   * Determine:

     * DLL name (`Name`)
     * Function name or ordinal (from INT)
     * IAT slot address (RVA)

3. Link IAT slots to call sites:

   * For each `RawCallSite` with `MemoryTargetAddress`:

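     * Check if this address equals the VA of an IAT slot (`ImageBase` + slot RVA).
     * If yes, the call site is effectively calling that imported function.
   * Code compiled without `__declspec(dllimport)` calls a small `jmp [IAT slot]` thunk instead; treat such thunks like PLT stubs by resolving the thunk's slot and redirecting calls that target the thunk.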

4. Determine provider module:

   * The DLL name gives you a target module (e.g. `KERNEL32.dll` → `ModuleId`).
   * Ensure that DLL is represented as a `BinaryModule` or a “placeholder” if not present in image.

5. Create edges:

   * Create/find `FunctionNode` for imported function in provider module.
   * Add `CallEdge` with `EdgeKind = ImportCall` and `Evidence = "PE.IAT"` (or `"PE.DelayLoad"` if using delay load descriptors).

Example:

* `myservice.exe:Start` → `SSPICLI.dll:AcquireCredentialsHandleW`

### 4.3 Mach-O

Goal: map stub calls via `__TEXT,__stubs` / `__DATA,__la_symbol_ptr` (and / or chained fixups) to symbols in dependent dylibs.

Algorithm (for classic dyld opcodes, not chained fixups, then extend):

1. Parse:

   * Load commands:

     * `LC_SYMTAB`, `LC_DYSYMTAB`
     * `LC_LOAD_DYLIB` (to know dependent dylibs)
     * `LC_FUNCTION_STARTS` (for seeding functions)
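     * `LC_DYLD_INFO` / `LC_DYLD_INFO_ONLY` (rebase/bind/weak bind/lazy bind)
   * `__TEXT,__stubs` – stub code
   * `__DATA,__la_symbol_ptr` – lazy pointer table (binaries that use chained fixups have no lazy binding; their stubs load from `__DATA_CONST,__got`, or `__auth_got` on arm64e)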
   * **Indirect symbol table** – maps slot indices to symbol table indices

2. Stub → la_symbol_ptr mapping:

   * Stubs are small functions (usually a few instructions) that indirect through the corresponding `la_symbol_ptr` entry.
   * For each stub function:

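     * Determine its index *i* within `__stubs`. The section header's `reserved1` is the first index into the indirect symbol table and `reserved2` is the size of one stub, so stub *i* uses indirect-symbol entry `reserved1 + i`. (`__la_symbol_ptr` has its own `reserved1`, so lazy-pointer slot *i* maps the same way.)
     * That indirect-symbol entry is an index into the `LC_SYMTAB` symbol table.

       * This gives you the symbol name and, via `GET_LIBRARY_ORDINAL(n_desc)`, the dylib ordinal: a 1-based index into the dylib load commands (`LC_LOAD_DYLIB`, `LC_LOAD_WEAK_DYLIB`, ...).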

3. Link stub call sites:

   * In disassembly, treat calls to these stub functions as **import calls**.
   * For each call instruction `CALL stub_function`:

     * `RawCallSite.DirectTargetAddress` lies inside `__TEXT,__stubs`.
     * Resolve stub → la_symbol_ptr → symbol → dylib.

4. Determine provider module:

   * From dylib ordinal and load commands, get the path / install name of dylib (`libssl.1.1.dylib`, etc.).
   * Map that to a `ModuleId` in your module set.

5. Create edges:

   * Create/find imported `FunctionNode` in provider module.
   * Add `CallEdge` from caller to that function with `EdgeKind = ImportCall`, `Evidence = "MachO.IndirectSymbol"`.
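
Step 2 as a sketch: `__stubs` is an `S_SYMBOL_STUBS` section, so its `reserved1`/`reserved2` header fields are enough to map each stub to a symbol without decoding the stub code. The record types here are hypothetical stand-ins for whatever your Mach-O parser produces, and the symbol-to-dylib step assumes a two-level-namespace image.

```csharp
using System;
using System.Collections.Generic;

// Minimal parsed inputs (hypothetical shapes, not a library API).
public readonly record struct StubSection(ulong Address, ulong Size, uint Reserved1, uint Reserved2);
public readonly record struct NList(string Name, ushort Desc);     // nlist_64 with n_strx resolved
public readonly record struct StubTarget(ulong StubAddress, string Symbol, string? Dylib);

public static class MachOStubs
{
    private const uint INDIRECT_SYMBOL_LOCAL = 0x80000000;
    private const uint INDIRECT_SYMBOL_ABS = 0x40000000;

    // Stub i uses indirect-symbol entry reserved1 + i; reserved2 is the size of one stub.
    // loadDylibs lists the dylib-loading commands (LC_LOAD_DYLIB, LC_LOAD_WEAK_DYLIB, ...)
    // in file order, so ordinal n is loadDylibs[n - 1].
    public static List<StubTarget> Resolve(StubSection stubs, IReadOnlyList<uint> indirectSymbols,
                                           IReadOnlyList<NList> symtab, IReadOnlyList<string> loadDylibs)
    {
        var result = new List<StubTarget>();
        if (stubs.Reserved2 == 0)
            return result;
        ulong count = stubs.Size / stubs.Reserved2;
        for (ulong i = 0; i < count; i++)
        {
            uint symIndex = indirectSymbols[(int)(stubs.Reserved1 + i)];
            if ((symIndex & (INDIRECT_SYMBOL_LOCAL | INDIRECT_SYMBOL_ABS)) != 0)
                continue;                                   // local/absolute: no import behind it
            NList sym = symtab[(int)symIndex];
            int ordinal = (sym.Desc >> 8) & 0xFF;            // GET_LIBRARY_ORDINAL(n_desc)
            // 0 = self, 0xFE = dynamic lookup, 0xFF = main executable: no dylib path.
            string? dylib = ordinal >= 1 && ordinal <= loadDylibs.Count ? loadDylibs[ordinal - 1] : null;
            result.Add(new StubTarget(stubs.Address + i * stubs.Reserved2, sym.Name, dylib));
        }
        return result;
    }
}
```

Install names such as `@rpath/...` still need `LC_RPATH` expansion before they can be matched to a `ModuleId`.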

For **chained fixups** (`LC_DYLD_CHAINED_FIXUPS`), you’ll compute a similar mapping but walking chain entries instead of traditional lazy/weak binds. The key is still:

* Map a stub or function to a **fixup** entry.
* From fixup, determine the symbol and dylib.
* Then connect call-site → imported function.
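
A sketch of walking one page's chain for the two plain 64-bit pointer formats (`DYLD_CHAINED_PTR_64` and `DYLD_CHAINED_PTR_64_OFFSET`, `pointer_format` 2 and 6), assuming you already parsed `dyld_chained_starts_in_segment` to get the first fixup's offset within the page (`0xFFFF` means the page has none). arm64e binaries use the authenticated `DYLD_CHAINED_PTR_ARM64E*` layouts, which this does not handle; `ChainedFixups64` is a hypothetical name.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

public readonly record struct ChainedFixup(int PageOffset, bool IsBind, uint ImportOrdinal, ulong RebaseTarget, byte Addend);

public static class ChainedFixups64
{
    // 'page' holds one page of the segment; 'start' is page_start[page] from
    // dyld_chained_starts_in_segment. 'next' counts 4-byte strides; 0 ends the chain.
    public static List<ChainedFixup> WalkPage(ReadOnlySpan<byte> page, int start)
    {
        var fixups = new List<ChainedFixup>();
        int offset = start;
        while (true)
        {
            ulong raw = BinaryPrimitives.ReadUInt64LittleEndian(page.Slice(offset, 8));
            bool isBind = (raw >> 63) != 0;
            uint next = (uint)((raw >> 51) & 0xFFF);
            if (isBind)
            {
                // dyld_chained_ptr_64_bind: ordinal:24, addend:8, reserved:19, next:12, bind:1
                fixups.Add(new ChainedFixup(offset, true, (uint)(raw & 0xFFFFFF), 0, (byte)((raw >> 24) & 0xFF)));
            }
            else
            {
                // dyld_chained_ptr_64_rebase: target:36, high8:8, reserved:7, next:12, bind:0.
                // target is a vmaddr for format 2 and an image offset for format 6.
                ulong target = (raw & 0xFFFFFFFFFUL) | (((raw >> 36) & 0xFF) << 56);
                fixups.Add(new ChainedFixup(offset, false, 0, target, 0));
            }
            if (next == 0)
                break;
            offset += (int)next * 4;
        }
        return fixups;
    }
}
```

A bind's `ImportOrdinal` indexes the `dyld_chained_import` table; each entry's `lib_ordinal` selects the dylib and `name_offset` points into the symbol string pool, which completes the path from fixup to `(dylib, symbol)`. The slot address (page start + `PageOffset`) is what a stub's `__got` load will reference.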

---

## 5. Reachability queries

Once the graph is built, reachability is “just graph algorithms” + mapping back to purls.

### 5.1 Roots

Decide what are your **root functions**:

* Binary entrypoints:

  * ELF: `_start` (`e_entry`), `main`, constructors (`.init_array`)
  * PE: AddressOfEntryPoint, TLS callbacks, registered service entrypoints
  * Mach-O: the `LC_MAIN` entry point (usually `_main`), constructors (`__mod_init_func`)
* Optionally, any exported API function that a container orchestrator or plugin system will call.

Mark them as `FunctionNode.IsRoot = true` and create synthetic edges from a special root node if you want:

```csharp
var syntheticRoot = new FunctionNode
{
    Id = 0,                 // reserve id 0 for <root>; number real functions from 1
    Name = "<root>",
    IsRoot = true,
    // Module, Purl can be special markers
};

foreach (var fn in allFunctions.Where(f => f.IsRoot))
{
    edges.Add(new CallEdge
    {
        FromId = syntheticRoot.Id,
        ToId = fn.Id,
        Kind = EdgeKind.SyntheticRoot,
        Evidence = "Root"
    });
}
```

### 5.2 Reachability algorithm (function-level)

Use BFS/DFS from the root node(s):

```csharp
public sealed class ReachabilityResult
{
    public HashSet<int> ReachableFunctions { get; init; } = new();  // init: assigned via object initializer below
}

public ReachabilityResult ComputeReachableFunctions(CallGraph graph, IEnumerable<int> rootIds)
{
    var visited = new HashSet<int>();
    var stack = new Stack<int>();

    foreach (var root in rootIds)
    {
        if (visited.Add(root))
            stack.Push(root);
    }

    while (stack.Count > 0)
    {
        var current = stack.Pop();

        if (!graph.OutEdges.TryGetValue(current, out var edges))
            continue;

        foreach (var edge in edges)
        {
            if (visited.Add(edge.ToId))
                stack.Push(edge.ToId);
        }
    }

    return new ReachabilityResult { ReachableFunctions = visited };
}
```

### 5.3 Project reachability to modules and purls

Given `ReachableFunctions`:

```csharp
public sealed class ReachabilityProjection
{
    public HashSet<ModuleId> ReachableModules { get; } = new();
    public HashSet<Purl> ReachablePurls { get; } = new();
}

public ReachabilityProjection ProjectToModulesAndPurls(CallGraph graph, ReachabilityResult result)
{
    var projection = new ReachabilityProjection();

    foreach (var fnId in result.ReachableFunctions)
    {
        if (!graph.Nodes.TryGetValue(fnId, out var fn))
            continue;

        projection.ReachableModules.Add(fn.Module);
        projection.ReachablePurls.Add(fn.Purl);
    }

    return projection;
}
```

Now you can answer questions like:

* “Is any code from purl `pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1?arch=amd64` reachable from the container entrypoint?”
* “Which purls are reachable at all?”

### 5.4 Vulnerability reachability

Assume you’ve mapped each vulnerability to:

* `Purl` (where it lives)
* `AffectedFunctionNames` (symbols; optionally demangled)

You can implement:

```csharp
public sealed class VulnerabilitySink
{
    public string VulnerabilityId { get; init; } // CVE-...
    public Purl Purl { get; init; }
    public string FunctionName { get; init; }    // symbol name or demangled
}
```

Resolution algorithm:

1. For each `VulnerabilitySink`, find all `FunctionNode` with:

   * `node.Purl == sink.Purl` and
   * `node.Name` or `node.DemangledName` matches `sink.FunctionName`.

2. For each such node, check `ReachableFunctions.Contains(node.Id)`.

3. Build a `Finding` object:

```csharp
public sealed class VulnerabilityFinding
{
    public string VulnerabilityId { get; init; }
    public Purl Purl { get; init; }
    public bool IsReachable { get; init; }
    public List<int> SinkFunctionIds { get; init; } = new();
    public List<List<int>> Paths { get; init; } = new(); // optional path evidence: root-to-sink function ids
}
```
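The three steps above could look like this. It is a sketch under simplifying assumptions: purls are plain strings, names are matched exactly (no fuzzy matching and no handling of ELF symbol-version suffixes), and `Fn`, `Sink`, `Finding`, and `SinkResolver` are hypothetical stand-ins for `FunctionNode`, `VulnerabilitySink`, and `VulnerabilityFinding`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal mirrors of the document's types (Purl is a plain string here).
public sealed record Fn(int Id, string Purl, string Name, string? DemangledName);
public sealed record Sink(string VulnerabilityId, string Purl, string FunctionName);
public sealed record Finding(string VulnerabilityId, string Purl, bool IsReachable, List<int> SinkFunctionIds);

public static class SinkResolver
{
    public static List<Finding> Resolve(IEnumerable<Fn> nodes, IEnumerable<Sink> sinks, ISet<int> reachable)
    {
        // Step 1 index: (purl, name) -> function ids, so each sink lookup is a dictionary hit.
        // A node is indexed under its raw symbol name and, if different, its demangled name.
        var index = new Dictionary<(string Purl, string Name), List<int>>();
        foreach (var fn in nodes)
        {
            AddId(index, (fn.Purl, fn.Name), fn.Id);
            if (fn.DemangledName is { } demangled && demangled != fn.Name)
                AddId(index, (fn.Purl, demangled), fn.Id);
        }

        // Steps 2-3: one finding per (vulnerability, purl); a CVE may list several functions.
        return sinks
            .GroupBy(s => (s.VulnerabilityId, s.Purl))
            .Select(g =>
            {
                var ids = g
                    .SelectMany(s => index.TryGetValue((s.Purl, s.FunctionName), out var hit) ? hit : new List<int>())
                    .Distinct()
                    .OrderBy(id => id)
                    .ToList();
                return new Finding(g.Key.VulnerabilityId, g.Key.Purl, ids.Any(id => reachable.Contains(id)), ids);
            })
            .ToList();
    }

    private static void AddId(Dictionary<(string Purl, string Name), List<int>> index, (string, string) key, int id)
    {
        if (!index.TryGetValue(key, out var ids))
            index[key] = ids = new List<int>();
        ids.Add(id);
    }
}
```

Grouping by `(VulnerabilityId, Purl)` keeps one finding per advisory and package, which matches the `vulnerabilities` array in the JSON shape later on.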

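If you also want **path evidence**, run a breadth-first search from the roots that records each node’s predecessor (a BFS predecessor map). Walking that map back from a sink yields a shortest root-to-sink path; store it as the sequence of `FunctionNode.Id`s. The stack-based traversal above is enough for the yes/no answer but does not produce shortest paths.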
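A minimal sketch of that predecessor-map search, assuming int function ids and a simplified adjacency dictionary holding target ids only (`PathEvidence` and `ShortestPath` are hypothetical names). BFS discovers each node first along a shortest path, so following predecessors back from the sink gives one shortest path from the nearest root.

```csharp
using System;
using System.Collections.Generic;

public static class PathEvidence
{
    // Multi-source BFS from all roots. Returns one shortest root-to-sink path as
    // function ids, or null when the sink is unreachable.
    public static List<int>? ShortestPath(
        IReadOnlyDictionary<int, List<int>> outEdges, IEnumerable<int> roots, int sink)
    {
        var pred = new Dictionary<int, int>(); // node -> predecessor on a shortest path
        var visited = new HashSet<int>();
        var queue = new Queue<int>();
        foreach (var root in roots)
            if (visited.Add(root))
                queue.Enqueue(root);

        // Stop as soon as the sink has been discovered; its predecessor is already recorded.
        while (queue.Count > 0 && !visited.Contains(sink))
        {
            var current = queue.Dequeue();
            if (!outEdges.TryGetValue(current, out var targets))
                continue;
            foreach (var to in targets)
            {
                if (visited.Add(to))
                {
                    pred[to] = current;
                    queue.Enqueue(to);
                }
            }
        }

        if (!visited.Contains(sink))
            return null;

        // Walk predecessors back from the sink; roots have no predecessor entry.
        var path = new List<int> { sink };
        var node = sink;
        while (pred.TryGetValue(node, out var prev))
        {
            path.Add(prev);
            node = prev;
        }
        path.Reverse();
        return path;
    }
}
```

Run it once per reachable sink; for many sinks, a single BFS that keeps the full `pred` map and reconstructs every path from it avoids repeated traversals.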

---

## 6. Purl edges (derived graph)

For reporting and analytics, it’s useful to produce a **purl-level dependency graph**.

Given `CallGraph`:

```csharp
public PurlGraphView BuildPurlGraph(CallGraph graph)
{
    var edgesByPair = new Dictionary<(Purl From, Purl To), PurlEdge>();

    foreach (var kv in graph.OutEdges)
    {
        var fromFn = graph.Nodes[kv.Key];

        foreach (var edge in kv.Value)
        {
            var toFn = graph.Nodes[edge.ToId];

            if (fromFn.Purl.Equals(toFn.Purl))
                continue; // intra-purl call: this view only keeps inter-purl edges

            var key = (fromFn.Purl, toFn.Purl);
            if (!edgesByPair.TryGetValue(key, out var pe))
            {
                pe = new PurlEdge
                {
                    From = fromFn.Purl,
                    To = toFn.Purl,
                    SupportingCalls = new List<(int, int)>()
                };
                edgesByPair[key] = pe;
            }

            pe.SupportingCalls.Add((fromFn.Id, toFn.Id));
        }
    }

    // Collapse the aggregated edges into a purl adjacency map.
    var adj = new Dictionary<Purl, HashSet<Purl>>();

    foreach (var kv in edgesByPair)
    {
        var (from, to) = kv.Key;
        if (!adj.TryGetValue(from, out var targets))
        {
            targets = new HashSet<Purl>();
            adj[from] = targets;
        }
        targets.Add(to);
    }

    return new PurlGraphView
    {
        Adjacent = adj,
        Edges = edgesByPair.Values.ToList()
    };
}
```

This gives you:

* A coarse view of runtime dependencies between purls (“Purl A calls into Purl B”).
* Enough context to emit purl-level VEX or to reason about trust at package granularity.

---

## 7. JSON output and SBOM integration

### 7.1 JSON shape (high level)

You can emit a composite document:

```json
{
  "image": "registry.example.com/app@sha256:...",
  "modules": [
    {
      "moduleId": { "path": "/usr/lib/libssl.so.1.1", "format": "Elf" },
      "purl": "pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1",
      "arch": "x86_64"
    }
  ],
  "functions": [
    {
      "id": 42,
      "name": "SSL_do_handshake",
      "demangledName": null,
      "module": { "path": "/usr/lib/libssl.so.1.1", "format": "Elf" },
      "purl": "pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1",
      "address": "0x401020",
      "exported": true
    }
  ],
  "edges": [
    {
      "from": 10,
      "to": 42,
      "kind": "ImportCall",
      "evidence": "ELF.R_X86_64_JUMP_SLOT"
    }
  ],
  "reachability": {
    "roots": [1],
    "reachableFunctions": [1,10,42]
  },
  "purlGraph": {
    "edges": [
      {
        "from": "pkg:generic/myapp@1.0.0",
        "to": "pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1",
        "supportingCalls": [[10,42]]
      }
    ]
  },
  "vulnerabilities": [
    {
      "id": "CVE-2024-XXXX",
      "purl": "pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1",
      "sinkFunctions": [42],
      "reachable": true,
      "paths": [
        [1, 10, 42]
      ]
    }
  ]
}
```
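One way to emit this shape is `System.Text.Json` from the .NET base library. The sketch below covers only the `functions` and `edges` arrays; the DTO record names (`ModuleRef`, `FunctionDto`, `EdgeDto`, `ReportDto`) and the `ReportWriter` helper are hypothetical. `JsonNamingPolicy.CamelCase` turns `DemangledName` into `demangledName`, and because nulls are not ignored by default, `"demangledName": null` is written as in the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical DTOs for part of the schema above; names are camel-cased on output.
public sealed record ModuleRef(string Path, string Format);
public sealed record FunctionDto(int Id, string Name, string? DemangledName, ModuleRef Module,
                                 string Purl, string Address, bool Exported);
public sealed record EdgeDto(int From, int To, string Kind, string Evidence);
public sealed record ReportDto(string Image, List<FunctionDto> Functions, List<EdgeDto> Edges);

public static class ReportWriter
{
    private static readonly JsonSerializerOptions Options = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        WriteIndented = true
    };

    // Addresses are emitted as lowercase hex strings, matching "0x401020" in the sample.
    public static string Hex(ulong address) => "0x" + address.ToString("x");

    public static string Serialize(ReportDto report) => JsonSerializer.Serialize(report, Options);
}
```

Keeping addresses as strings avoids precision loss in JSON consumers that parse numbers as doubles, which cannot represent every 64-bit address exactly.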

### 7.2 Purl resolution

Implement an `IPurlResolver` interface:

```csharp
public interface IPurlResolver
{
    Purl ResolveForModule(string filePath, byte[] contentHash);
}
```

Possible implementations:

* `SbomPurlResolver` – given a CycloneDX/SPDX SBOM for the image, match by path or checksum.
* `LinuxPackagePurlResolver` – read `/var/lib/dpkg/status` or the RPM database from the image filesystem.
* `GenericPurlResolver` – fallback: `pkg:generic/<hash>`.

You call the resolver in your loaders so that **every `BinaryModule` has a purl** and thus every `FunctionNode` has a purl.
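As a starting point for the Debian side of `LinuxPackagePurlResolver` and the generic fallback, here is a sketch that parses `/var/lib/dpkg/status` text into purls for installed packages. Assumptions, all labeled: purls are plain strings, the vendor namespace (`debian`, `ubuntu`) is passed in (for example from `ID` in `/etc/os-release`), versions are not percent-encoded (a real resolver should follow the purl spec for characters such as `+` or epoch colons), and mapping a module path to its package, via the file lists in `/var/lib/dpkg/info/<package>.list` (or `<package>:<arch>.list` for multi-arch packages), is not shown. `DpkgPurls` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class DpkgPurls
{
    // Parses dpkg status text into package name -> purl, keeping only installed packages.
    // Simplification: multi-arch copies of the same package overwrite each other.
    public static Dictionary<string, string> ParseStatus(string statusText, string vendor)
    {
        var result = new Dictionary<string, string>(StringComparer.Ordinal);
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        void Flush()
        {
            if (fields.TryGetValue("Package", out var name) &&
                fields.TryGetValue("Version", out var version) &&
                fields.TryGetValue("Status", out var status) &&
                status.EndsWith(" installed", StringComparison.Ordinal))
            {
                var purl = $"pkg:deb/{vendor}/{name}@{version}";
                if (fields.TryGetValue("Architecture", out var arch))
                    purl += $"?arch={arch}";
                result[name] = purl;
            }
            fields.Clear();
        }

        foreach (var rawLine in statusText.Split('\n'))
        {
            var line = rawLine.TrimEnd('\r');
            if (line.Length == 0) { Flush(); continue; } // blank line ends a stanza
            if (char.IsWhiteSpace(line[0])) continue;    // continuation of a multi-line field
            var colon = line.IndexOf(':');
            if (colon > 0)
                fields[line[..colon]] = line[(colon + 1)..].Trim();
        }
        Flush(); // the last stanza may not end with a blank line
        return result;
    }

    // Fallback when nothing else matches: pkg:generic/<lowercase hex content hash>.
    public static string Generic(byte[] contentHash) =>
        "pkg:generic/" + Convert.ToHexString(contentHash).ToLowerInvariant();
}
```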

---

## 8. Concrete implementation tasks for your team

1. **Data model & interfaces**

   * Implement `ModuleId`, `FunctionNode`, `CallEdge`, `CallGraph`.
   * Define `RawCallSite`, `BinaryModule`, and `IPurlResolver`.

2. **Loaders**

   * `ElfLoader`: fill symbols, dynamic relocations (PLT), DT_NEEDED, etc.
   * `PeLoader`: import descriptors, IAT, delay-load.
   * `MachOLoader`: load commands, stubs (`__stubs`), lazy pointers (`__la_symbol_ptr`), the indirect symbol table / chained fixups.

3. **Disassembly**

   * `X86Disassembler` (iced) and `Arm64Disassembler` (Disarm or port).
   * Function detection and `RawCallSite` extraction.

4. **CallGraphBuilder**

   * Build intra-module edges from direct calls.
   * Build inter-module edges using the format-specific rules above.
   * Construct final `CallGraph` with adjacency maps and purl mappings.

5. **Reachability**

   * Implement BFS/DFS from root functions.
   * Projection to modules + purls.
   * Vulnerability sink resolution & path reconstruction.

6. **Export**

   * JSON serializer for the schema above.
   * Optional: purl-level summary / VEX generator.

---

If you want, as a next step I can produce a **more concrete design for `CallGraphBuilder`** (including per-format helper classes with method signatures) or a **C# skeleton** for the `ElfImportResolver`, `PeImportResolver`, and `MachOStubResolver` that plug directly into this plan.