feat: Add new provenance and crypto registry documentation

- Introduced attestation inventory and subject-rekor mapping files for tracking Docker packages.
- Added a comprehensive crypto registry decision document outlining defaults and required follow-ups.
- Created an offline feeds manifest for bundling air-gap resources.
- Implemented a script to generate and update binary manifests for curated binaries.
- Added a verification script to ensure binary artefacts are located in approved directories.
- Defined new schemas for AdvisoryEvidenceBundle, OrchestratorEnvelope, ScannerReportReadyPayload, and ScannerScanCompletedPayload.
- Established project files for StellaOps.Orchestrator.Schemas and StellaOps.PolicyAuthoritySignals.Contracts.
- Updated vendor manifest to track pinned binaries for integrity.
2025-11-18 23:47:13 +02:00
parent d3ecd7f8e6
commit e91da22836
44 changed files with 6793 additions and 99 deletions
--- a/docs/product-advisories/18-Nov-2026
+++ b/docs/product-advisories/18-Nov-2026
@@ -0,0 +1,989 @@
+Vlad, here’s a concrete, **pure‑C#** blueprint to build a multi‑format binary analyzer (Mach‑O, ELF, PE) that produces **call graphs + reachability**, with **no external tools**. Where needed, I point to permissively‑licensed code you can **port** (copy) from other ecosystems.
+
+---
+
+## 0) Targets & non‑negotiables
+
+* **Formats:** Mach‑O (incl. LC_DYLD_INFO / LC_DYLD_CHAINED_FIXUPS), ELF (SysV gABI), PE/COFF
+* **Architectures:** x86‑64 (and x86), AArch64 (ARM64)
+* **Outputs:** JSON with **purls** per module + function‑level call graph & reachability
+* **No tool reuse:** Only pure C# libraries or code **ported** from permissive sources
+
+---
+
+## 1) Parsing the containers (pure C#)
+
+**Pick one C# reader per format, keeping licenses permissive:**
+
+* **ELF & Mach‑O:** `ELFSharp` (pure managed C#; ELF + Mach‑O reading). MIT/X11 license. ([GitHub][1])
+* **ELF & PE (+ DWARF v4):** `LibObjectFile` (C#, BSD‑2). Good ELF relocations (i386, x86_64, ARM, AArch64), PE directories, DWARF sections. Use it as your **common object model** for ELF+PE, then add a Mach‑O adapter. ([GitHub][2])
+* **PE (optional alternative):** `PeNet` (pure C#, broad PE directories, imports/exports, TLS, certs). Apache‑2.0. Useful if you want a second implementation for cross‑checks. ([GitHub][3])
+
+> Why two libs? `LibObjectFile` gives you DWARF and clean models for ELF/PE; `ELFSharp` covers Mach‑O today (and ELF as a fallback). You control the code paths.
+
+**Spec references you’ll implement against** (for correctness of your readers & link‑time semantics):
+
+* **ELF (gABI, AMD64 supplement):** dynamic section, PLT/GOT, `R_X86_64_JUMP_SLOT` semantics (eager vs lazy). ([refspecs.linuxbase.org][4])
+* **PE/COFF:** imports/exports/IAT, delay‑load, TLS. ([Microsoft Learn][5])
+* **Mach‑O:** file layout, load commands (`LC_SYMTAB`, `LC_DYSYMTAB`, `LC_FUNCTION_STARTS`, `LC_DYLD_INFO(_ONLY)`), and the modern `LC_DYLD_CHAINED_FIXUPS`. ([leopard-adc.pepas.com][6])
+
+---
+
+## 2) Mach‑O: what you must **port** (byte‑for‑byte compatible)
+
+Apple moved from traditional dyld bind opcodes to **chained fixups** on macOS 12/iOS 15+; you need both:
+
+* **Dyld bind opcodes** (`LC_DYLD_INFO(_ONLY)`): parse the BIND/LAZY_BIND streams (tuples of `<seg,off,type,ordinal,symbol,addend>`). Port minimal logic from **LLVM** or **LIEF** (both Apache‑2.0‑compatible) into C#. ([LIEF][7])
+* **Chained fixups** (`LC_DYLD_CHAINED_FIXUPS`): port `dyld_chained_fixups_header` structs & chain walking from LLVM’s `MachO.h` or Apple’s dyld headers. This restores imports/rebases without running dyld. ([LLVM][8])
+* **Function discovery hint:** read `LC_FUNCTION_STARTS` (ULEB128 deltas) to seed function boundaries—very helpful on stripped binaries. ([Stack Overflow][9])
+* **Stubs mapping:** resolve `__TEXT,__stubs` ↔ `__DATA,__la_symbol_ptr` via the **indirect symbol table**; conceptually identical to ELF’s PLT/GOT. ([MaskRay][10])
+
+> If you prefer an in‑C# base for Mach‑O manipulation, **Melanzana.MachO** exists (MIT) and has been used by .NET folks for Mach‑O/Code Signing/obj writing; you can mine its approach for load‑command modeling. ([GitHub][11])
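+
+The `LC_FUNCTION_STARTS` payload mentioned above is small enough to decode by hand. A minimal sketch, assuming the loader has already sliced the payload bytes out of `__LINKEDIT` and knows the `__TEXT` segment’s `vmaddr`; the `FunctionStarts` class and its method names are hypothetical, not part of any library named here:
+
+```csharp
+using System;
+using System.Collections.Generic;
+
+public static class FunctionStarts
+{
+    // Reads one unsigned LEB128 value and advances pos past it.
+    public static ulong ReadUleb128(ReadOnlySpan<byte> data, ref int pos)
+    {
+        ulong result = 0;
+        int shift = 0;
+        while (true)
+        {
+            if (pos >= data.Length)
+                throw new InvalidOperationException("Truncated ULEB128 value.");
+            byte b = data[pos++];
+            if (shift < 64)
+                result |= (ulong)(b & 0x7F) << shift;
+            if ((b & 0x80) == 0)
+                return result;
+            shift += 7;
+        }
+    }
+
+    // payload: raw LC_FUNCTION_STARTS data; textVmAddr: vmaddr of __TEXT (assumed input).
+    public static List<ulong> Decode(ReadOnlySpan<byte> payload, ulong textVmAddr)
+    {
+        var starts = new List<ulong>();
+        ulong address = textVmAddr;
+        int pos = 0;
+        while (pos < payload.Length)
+        {
+            ulong delta = ReadUleb128(payload, ref pos);
+            if (delta == 0)
+                break;              // a zero delta ends the list (also skips padding)
+            address += delta;       // each value is a delta from the previous start
+            starts.Add(address);
+        }
+        return starts;
+    }
+}
+```
+
+Each returned address is a candidate function start; treat it as a seed for disassembly, not as proof of a function boundary.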
+
+---
+
+## 3) Disassembly (pure C#, multi‑arch)
+
+* **x86/x64:** `iced` (C# decoder/disassembler/encoder; MIT; fast & complete). ([GitHub][12])
+* **AArch64/ARM64:** two options that keep you pure‑C#:
+
+  * **Disarm** (pure C# ARM64 disassembler; MIT). Good starting point to decode & get branch/call kinds. ([GitHub][13])
+  * **Port from Ryujinx ARMeilleure** (ARMv8 decoder/JIT in C#, MIT). You can lift only the **decoder** pieces you need. ([Gitee][14])
+* **x86 fallback:** `SharpDisasm` (udis86 port in C#; BSD‑2). Older than iced; keep as a reference. ([GitHub][15])
+
+---
+
+## 4) Call graph recovery (static)
+
+**4.1 Function seeds**
+
+* From symbols (`.dynsym`/`LC_SYMTAB`/PE exports)
+* From **LC_FUNCTION_STARTS** (Mach‑O) for stripped code ([Stack Overflow][9])
+* From entrypoints (`_start`/`main` or PE AddressOfEntryPoint)
+* From exception/unwind tables & DWARF (when present)—`LibObjectFile` already models DWARF v4. ([GitHub][2])
+
+**4.2 CFG & interprocedural calls**
+
+* **Decode** with iced/Disarm from each seed; form **basic blocks** by following control‑flow until terminators (ret/jmp/call).
+* **Direct calls:** immediate targets become edges (PC‑relative fixups where needed).
+* **Imported calls:**
+
+  * **ELF:** calls to PLT stubs → resolve via `.rela.plt` & `R_*_JUMP_SLOT` to symbol names (link‑time target). ([cs61.seas.harvard.edu][16])
+  * **PE:** calls through the **IAT** → resolve via `IMAGE_IMPORT_DESCRIPTOR` / thunk tables. ([Microsoft Learn][5])
+  * **Mach‑O:** calls to `__stubs` use **indirect symbol table** + `__la_symbol_ptr` (or chained fixups) → map to dylib/symbol. ([reinterpretcast.com][17])
+* **Indirect calls within the binary:** heuristics only (function pointer tables, vtables, small constant pools). Keep them labeled **“indirect‑unresolved”** unless a heuristic yields a concrete target.
+
+**4.3 Cross‑binary graph**
+
+* Build module‑level edges by simulating the platform’s loader:
+
+  * **ELF:** honor `DT_NEEDED`, `DT_RPATH/RUNPATH`, versioning (`.gnu.version*`) to pick the definer of an imported symbol. gABI rules apply. ([refspecs.linuxbase.org][4])
+  * **PE:** pick DLL from the import descriptors. ([Microsoft Learn][5])
+  * **Mach‑O:** `LC_LOAD_DYLIB` + dyld binding / chained fixups determine the provider image. ([LIEF][7])
+
+---
+
+## 5) Reachability analysis
+
+Represent the **call graph** using a .NET graph lib (or a simple adjacency set). I suggest:
+
+* **QuikGraph** (successor of QuickGraph; MS‑PL) for algorithms (DFS/BFS, SCCs). Use it to compute reachability from chosen roots (entrypoint(s), exported APIs, or “sinks”). ([GitHub][18])
+
+You can visualize with **MSAGL** (MIT) when you need layouts, but your core output is JSON. ([GitHub][19])
+
+---
+
+## 6) Symbol demangling (nice‑to‑have, pure C#)
+
+* **Itanium (ELF/Mach‑O):** Either port LLVM’s Itanium demangler or use a C# lib like **CxxDemangler** (a C# rewrite of `cpp_demangle`). ([LLVM][20])
+* **MSVC (PE):** Port LLVM’s `MicrosoftDemangle.cpp` (Apache‑2.0 with LLVM exception) to C#. ([LLVM][21])
+
+---
+
+## 7) JSON output (with purls)
+
+Use a stable schema (example) to feed SBOM/vuln matching downstream:
+
+```json
+{
+  "modules": [
+    {
+      "purl": "pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1?arch=amd64",
+      "format": "ELF",
+      "arch": "x86_64",
+      "path": "/usr/lib/x86_64-linux-gnu/libssl.so.1.1",
+      "exports": ["SSL_read", "SSL_write"],
+      "imports": ["BIO_new", "EVP_CipherInit_ex"],
+      "functions": [{"name":"SSL_do_handshake","va":"0x401020","size":512,"demangled": "..."}]
+    }
+  ],
+  "graph": {
+    "nodes": [
+      {"id":"bin:main@0x401000","module": "pkg:generic/myapp@1.0.0"},
+      {"id":"lib:SSL_read","module":"pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1?arch=amd64"}
+    ],
+    "edges": [
+      {"src":"bin:main@0x401000","dst":"lib:SSL_read","kind":"import_call","evidence":"ELF.R_X86_64_JUMP_SLOT"}
+    ]
+  },
+  "reachability": {
+    "roots": ["bin:_start","bin:main@0x401000"],
+    "reachable": ["lib:SSL_read", "lib:SSL_write"],
+    "unresolved_indirect_calls": [
+      {"site":"0x402ABC","reason":"register-indirect"}
+    ]
+  }
+}
+```
+
+---
+
+## 8) Minimal C# module layout (sketch)
+
+```
+Stella.Analysis.Core/
+  BinaryModule.cs            // common model (sections, symbols, relocs, imports/exports)
+  Loader/
+    PeLoader.cs              // wrap LibObjectFile (or PeNet) to BinaryModule
+    ElfLoader.cs             // wrap LibObjectFile to BinaryModule
+    MachOLoader.cs           // wrap ELFSharp + your ported Dyld/ChainedFixups
+  Disasm/
+    X86Disassembler.cs       // iced bridge: bytes -> instructions
+    Arm64Disassembler.cs     // Disarm (or ARMeilleure port) bridge
+  Graph/
+    CallGraphBuilder.cs      // builds CFG per function + inter-procedural edges
+    Reachability.cs          // BFS/DFS over QuikGraph
+  Demangle/
+    ItaniumDemangler.cs      // port or wrap CxxDemangler
+    MicrosoftDemangler.cs    // port from LLVM
+  Export/
+    JsonWriter.cs            // writes schema above
+```
+
+---
+
+## 9) Implementation notes (where issues usually bite)
+
+* **Mach‑O moderns:** Implement both dyld opcode **and** chained fixups; many macOS 12+/iOS15+ binaries only have chained fixups. ([emergetools.com][22])
+* **Stubs vs real targets (Mach‑O):** map `__stubs` → `__la_symbol_ptr` via **indirect symbols** to the true imported symbol (or its post‑fixup target). ([reinterpretcast.com][17])
+* **ELF PLT/GOT:** treat `.plt` entries as **call trampolines**; ultimate edge should point to the symbol (library) that satisfies `DT_NEEDED` + version. ([refspecs.linuxbase.org][4])
+* **PE delay‑load:** don’t forget `IMAGE_DELAYLOAD_DESCRIPTOR` for delayed IATs. ([Microsoft Learn][5])
+* **Function discovery:** use `LC_FUNCTION_STARTS` when symbols are stripped; it’s a cheap way to seed analysis. ([Stack Overflow][9])
+* **Name clarity:** demangle Itanium/MSVC so downstream vuln rules can match consistently. ([LLVM][20])
+
+---
+
+## 10) What to **copy/port** verbatim (safe licenses)
+
+* **Dyld bind & exports trie logic:** from **LLVM** or **LIEF** Mach‑O (Apache‑2.0). Great for getting the exact opcode semantics right. ([LIEF][7])
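+* **Chained fixups structs/walkers:** from **LLVM MachO.h** (Apache‑2.0 with LLVM exception). Apple’s dyld headers carry the APSL‑2.0 license, so treat them as a reference for layout rather than a copy source unless that license suits you. ([LLVM][8])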
+* **Itanium/MS demanglers:** LLVM demangler sources are standalone; easy to translate to C#. ([LLVM][23])
+* **ARM64 decoder:** if Disarm gaps hurt, lift just the **decoder** pieces from **Ryujinx ARMeilleure** (MIT). ([Gitee][14])
+
+*(Avoid GPL’d parsers like binutils/BFD; they will contaminate your codebase’s licensing.)*
+
+---
+
+## 11) End‑to‑end pipeline (per container image)
+
+1. **Enumerate binaries** in the container FS.
+2. **Parse** each with the appropriate loader → `BinaryModule` (+ imports/exports/symbols/relocs).
+3. **Simulate linking** per platform to resolve imported functions to provider libraries. ([refspecs.linuxbase.org][4])
+4. **Disassemble** functions (iced/Disarm) → CFGs → **call edges** (direct, PLT/IAT/stub, indirect).
+5. **Assemble call graph** across modules; normalize names via demangling.
+6. **Reachability**: given roots (entry or user‑specified) compute reachable set; emit JSON with **purls** (from your SBOM/package resolver).
+7. **(Optional)** dump GraphViz / MSAGL views for debugging. ([GitHub][19])
+
+---
+
+## 12) Quick heuristics for vulnerability triage
+
+* **Sink maps**: flag edges to high‑risk APIs (`strcpy`, `gets`, legacy SSL ciphers) even without CVE versioning.
+* **DWARF line info** (when present): attach file:line to nodes for developer action. `LibObjectFile` gives you DWARF v4 reads. ([GitHub][2])
+
+---
+
+## 13) Test corpora
+
+* **ELF:** glibc/openssl/libpng from distro repos; validate `R_*_JUMP_SLOT` handling and PLT edges. ([cs61.seas.harvard.edu][16])
+* **PE:** system DLLs (Kernel32, Advapi32) and a small MSVC console app; validate IAT & delay‑load. ([Microsoft Learn][5])
+* **Mach‑O:** Xcode‑built binaries across macOS 11 & 12+ to cover both dyld opcode and chained fixups paths; verify `LC_FUNCTION_STARTS` improves discovery. ([Stack Overflow][9])
+
+---
+
+## 14) Deliverables you can start coding now
+
+* **MachOLoader.cs**
+
+  * Parse headers + load commands (ELFSharp).
+  * Implement `DyldInfoParser` (port from LLVM/LIEF) and `ChainedFixupsParser` (port structs & walkers). ([LIEF][7])
+* **X86Disassembler.cs / Arm64Disassembler.cs** (iced / Disarm bridges). ([GitHub][12])
+* **CallGraphBuilder.cs** (recursive descent + linear sweep fallback; PLT/IAT/stub resolution).
+* **Reachability.cs** (QuikGraph BFS/DFS). ([GitHub][18])
+* **JsonWriter.cs** (schema above with purls).
+
+---
+
+### References (core, load‑bearing)
+
+* **ELFSharp** (ELF + Mach‑O pure C#). ([GitHub][1])
+* **LibObjectFile** (ELF/PE/DWARF C#, BSD‑2). ([GitHub][2])
+* **iced** (x86/x64 disasm, C#, MIT). ([GitHub][12])
+* **Disarm** (ARM64 disasm, C#, MIT). ([GitHub][13])
+* **Ryujinx (ARMeilleure)** (ARMv8 decode/JIT in C#, MIT). ([Gitee][14])
+* **ELF gABI & AMD64 supplement** (PLT/GOT, relocations). ([refspecs.linuxbase.org][4])
+* **PE/COFF** (imports/exports/IAT). ([Microsoft Learn][5])
+* **Mach‑O docs** (load commands; LC_FUNCTION_STARTS; dyld bindings; chained fixups). ([Apple Developer][24])
+
+---
+
+If you want, I can draft **`MachOLoader` + `DyldInfoParser`** in C# next, including chained‑fixups structs (ported from LLVM’s headers) and an **iced**‑based call‑edge walker for x86‑64.
+
+[1]: https://github.com/konrad-kruczynski/elfsharp "GitHub - konrad-kruczynski/elfsharp: Pure managed C# library for reading ELF, UImage, Mach-O binaries."
+[2]: https://github.com/xoofx/LibObjectFile "GitHub - xoofx/LibObjectFile: LibObjectFile is a .NET library to read, manipulate and write linker and executable object files (e.g ELF, PE, DWARF, ar...)"
+[3]: https://github.com/secana/PeNet?utm_source=chatgpt.com "secana/PeNet: Portable Executable (PE) library written in . ..."
+[4]: https://refspecs.linuxbase.org/elf/gabi4%2B/contents.html?utm_source=chatgpt.com "System V Application Binary Interface - DRAFT - 24 April 2001"
+[5]: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format?utm_source=chatgpt.com "PE Format - Win32 apps"
+[6]: https://leopard-adc.pepas.com/documentation/DeveloperTools/Conceptual/MachOTopics/0-Introduction/introduction.html?utm_source=chatgpt.com "Mach-O Programming Topics: Introduction"
+[7]: https://lief.re/doc/stable/doxygen/classLIEF_1_1MachO_1_1DyldInfo.html?utm_source=chatgpt.com "MachO::DyldInfo Class Reference - LIEF"
+[8]: https://llvm.org/doxygen/structllvm_1_1MachO_1_1dyld__chained__fixups__header.html?utm_source=chatgpt.com "MachO::dyld_chained_fixups_header Struct Reference"
+[9]: https://stackoverflow.com/questions/9602438/mach-o-file-lc-function-starts-load-command?utm_source=chatgpt.com "Mach-O file LC_FUNCTION_STARTS load command"
+[10]: https://maskray.me/blog/2021-09-19-all-about-procedure-linkage-table?utm_source=chatgpt.com "All about Procedure Linkage Table"
+[11]: https://github.com/dotnet/runtime/issues/77178 "Discussion: ObjWriter in C# · Issue #77178 · dotnet/runtime · GitHub"
+[12]: https://github.com/icedland/iced?utm_source=chatgpt.com "icedland/iced: Blazing fast and correct x86/x64 ..."
+[13]: https://github.com/SamboyCoding/Disarm?utm_source=chatgpt.com "SamboyCoding/Disarm: Fast, pure-C# ARM64 Disassembler"
+[14]: https://gitee.com/ryujinx/Ryujinx/blob/master/LICENSE.txt?utm_source=chatgpt.com "Ryujinx/Ryujinx"
+[15]: https://github.com/justinstenning/SharpDisasm?utm_source=chatgpt.com "justinstenning/SharpDisasm"
+[16]: https://cs61.seas.harvard.edu/site/2022/pdf/x86-64-abi-20210928.pdf?utm_source=chatgpt.com "System V Application Binary Interface"
+[17]: https://www.reinterpretcast.com/hello-world-mach-o?utm_source=chatgpt.com "The Nitty Gritty of “Hello World” on macOS | reinterpretcast.com"
+[18]: https://github.com/KeRNeLith/QuikGraph?utm_source=chatgpt.com "KeRNeLith/QuikGraph: Generic Graph Data Structures and ..."
+[19]: https://github.com/microsoft/automatic-graph-layout?utm_source=chatgpt.com "microsoft/automatic-graph-layout: A set of tools for ..."
+[20]: https://llvm.org/doxygen/structllvm_1_1ItaniumPartialDemangler.html?utm_source=chatgpt.com "ItaniumPartialDemangler Struct Reference"
+[21]: https://llvm.org/doxygen/MicrosoftDemangle_8cpp_source.html?utm_source=chatgpt.com "lib/Demangle/MicrosoftDemangle.cpp Source File"
+[22]: https://www.emergetools.com/blog/posts/iOS15LaunchTime?utm_source=chatgpt.com "How iOS 15 makes your app launch faster"
+[23]: https://llvm.org/doxygen/ItaniumDemangle_8cpp.html?utm_source=chatgpt.com "lib/Demangle/ItaniumDemangle.cpp File Reference"
+[24]: https://developer.apple.com/library/archive/documentation/Performance/Conceptual/CodeFootprint/Articles/MachOOverview.html?utm_source=chatgpt.com "Overview of the Mach-O Executable Format"
+Good, this is exactly the point where you want to nail the internal model, otherwise everything on top (VEX, policy, reporting) will be shaky.
+
+Below is a detailed, implementation-ready plan for a **reachability graph with purl-aware edges**, covering ELF, PE, and Mach-O, in C#.
+
+I’ll structure it as:
+
+1. Overall graph design (3 layers: function, module, purl)
+2. Core C# data model
+3. Pipeline steps (end-to-end)
+4. Format-specific edge construction (ELF / PE / Mach-O)
+5. Reachability queries (from entrypoints to vulnerable purls / functions)
+6. JSON output layout and integration with SBOM
+
+---
+
+## 1. Overall graph design
+
+You want three tightly linked graph layers:
+
+1. **Function-level call graph (FLG)**
+
+   * Nodes: individual **functions** inside binaries
+   * Edges: calls from function A → function B (intra- or inter-module)
+
+2. **Module-level graph (MLG)**
+
+   * Nodes: **binaries** (ELF/PE/Mach-O files)
+   * Edges: “module A calls module B at least once” (aggregated from FLG)
+
+3. **Purl-level graph (PLG)**
+
+   * Nodes: **purls** (packages or generic artifacts)
+   * Edges: “purl P1 depends-at-runtime on purl P2” (aggregated from module edges)
+
+The **reachability algorithm** runs primarily on the **function graph**, but:
+
+* You can project reachability results to **module** and **purl** nodes.
+* You can also run coarse-grained analysis directly on **purl graph** when needed (“Is any code in purl X reachable from the container entrypoint?”).
+
+---
+
+## 2. Core C# data model
+
+### 2.1 Identifiers and enums
+
+```csharp
+public enum BinaryFormat { Elf, Pe, MachO }
+
+public readonly record struct ModuleId(string Path, BinaryFormat Format);
+
+public readonly record struct Purl(string Value);
+
+public enum EdgeKind
+{
+    IntraModuleDirect,       // call foo -> bar in same module
+    ImportCall,              // call via plt/iat/stub to imported function
+    SyntheticRoot,           // root (entrypoint) edge
+    IndirectUnresolved       // optional: we saw an indirect call we couldn't resolve
+}
+```
+
+### 2.2 Function node
+
+```csharp
+public sealed class FunctionNode
+{
+    public int Id { get; init; }                // internal numeric id
+    public ModuleId Module { get; init; }
+    public Purl Purl { get; init; }             // resolved from Module -> Purl
+    public ulong Address { get; init; }         // VA or RVA
+    public string Name { get; init; }           // mangled
+    public string? DemangledName { get; init; } // optional
+    public bool IsExported { get; init; }
+    public bool IsImportedStub { get; init; }   // e.g. PLT stub, Mach-O stub, PE thunks
+    public bool IsRoot { get; set; }            // _start/main/entrypoint etc.
+}
+```
+
+### 2.3 Edges
+
+```csharp
+public sealed class CallEdge
+{
+    public int FromId { get; init; }        // FunctionNode.Id
+    public int ToId { get; init; }          // FunctionNode.Id
+    public EdgeKind Kind { get; init; }
+    public string Evidence { get; init; }   // e.g. "ELF.R_X86_64_JUMP_SLOT", "PE.IAT", "MachO.indirectSym"
+}
+```
+
+### 2.4 Graph container
+
+```csharp
+public sealed class CallGraph
+{
+    public IReadOnlyDictionary<int, FunctionNode> Nodes { get; init; }
+    public IReadOnlyDictionary<int, List<CallEdge>> OutEdges { get; init; }
+    public IReadOnlyDictionary<int, List<CallEdge>> InEdges { get; init; }
+
+    // Convenience: mappings
+    public IReadOnlyDictionary<ModuleId, List<int>> FunctionsByModule { get; init; }
+    public IReadOnlyDictionary<Purl, List<int>> FunctionsByPurl { get; init; }
+}
+```
+
+### 2.5 Purl-level graph view
+
+You don’t store a separate physical graph; you **derive** it on demand:
+
+```csharp
+public sealed class PurlEdge
+{
+    public Purl From { get; init; }
+    public Purl To { get; init; }
+    public List<(int FromFnId, int ToFnId)> SupportingCalls { get; init; }
+}
+
+public sealed class PurlGraphView
+{
+    public IReadOnlyDictionary<Purl, HashSet<Purl>> Adjacent { get; init; }
+    public IReadOnlyList<PurlEdge> Edges { get; init; }
+}
+```
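+
+Deriving the purl view is a single pass over the function-level edges. A minimal sketch of that projection, using trimmed stand-ins (`Fn`, `Call`) for `FunctionNode` and `CallEdge` so the block compiles on its own; `PurlProjection` is a hypothetical helper name:
+
+```csharp
+using System.Collections.Generic;
+
+public readonly record struct Purl(string Value);
+public sealed record Fn(int Id, Purl Purl);        // stand-in for FunctionNode
+public sealed record Call(int FromId, int ToId);   // stand-in for CallEdge
+
+public static class PurlProjection
+{
+    // Groups cross-package call edges by (caller purl, callee purl) and keeps
+    // the supporting function-level calls as evidence for each purl edge.
+    public static Dictionary<(Purl From, Purl To), List<(int FromFnId, int ToFnId)>> Project(
+        IReadOnlyDictionary<int, Fn> nodes, IEnumerable<Call> edges)
+    {
+        var result = new Dictionary<(Purl From, Purl To), List<(int FromFnId, int ToFnId)>>();
+        foreach (var e in edges)
+        {
+            if (!nodes.TryGetValue(e.FromId, out var src) || !nodes.TryGetValue(e.ToId, out var dst))
+                continue;                       // dangling edge: skip it
+            if (src.Purl == dst.Purl)
+                continue;                       // calls inside one package add no purl edge
+            var key = (src.Purl, dst.Purl);
+            if (!result.TryGetValue(key, out var calls))
+                result[key] = calls = new List<(int FromFnId, int ToFnId)>();
+            calls.Add((e.FromId, e.ToId));
+        }
+        return result;
+    }
+}
+```
+
+Each dictionary entry maps naturally onto one `PurlEdge`, and the key set gives you the `Adjacent` map.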
+
+---
+
+## 3. Pipeline steps (end-to-end)
+
+### Step 0 – Inputs
+
+* Set of binaries (files) extracted from container image.
+* SBOM or other metadata that can map a file path (or hash) → **purl**.
+
+### Step 1 – Parse binaries → `BinaryModule` objects
+
+You define a common in-memory model:
+
+```csharp
+public sealed class BinaryModule
+{
+    public ModuleId Id { get; init; }
+    public Purl Purl { get; init; }
+    public BinaryFormat Format { get; init; }
+
+    // Raw sections / segments
+    public IReadOnlyList<SectionInfo> Sections { get; init; }
+
+    // Symbols
+    public IReadOnlyList<SymbolInfo> Symbols { get; init; }     // imports + exports + locals
+
+    // Relocations / fixups
+    public IReadOnlyList<RelocationInfo> Relocations { get; init; }
+
+    // Import/export tables (PE)/dylib commands (Mach-O)/DT_NEEDED (ELF)
+    public ImportInfo[] Imports { get; init; }
+    public ExportInfo[] Exports { get; init; }
+}
+```
+
+Implement format-specific loaders:
+
+* `ElfLoader : IBinaryLoader`
+* `PeLoader : IBinaryLoader`
+* `MachOLoader : IBinaryLoader`
+
+Each loader uses your chosen C# parsers or ported code and fills `BinaryModule`.
+
+### Step 2 – Disassembly → basic blocks & candidate functions
+
+For each `BinaryModule`:
+
+1. Use appropriate decoder (iced for x86/x64; Disarm/ported ARMeilleure for AArch64).
+2. Seed function starts:
+
+   * Exported functions
+   * Entry points (`_start`, `main`, AddressOfEntryPoint)
+   * Mach-O `LC_FUNCTION_STARTS` if available
+3. Walk instructions to build basic blocks:
+
+   * Stop blocks at conditional/unconditional branches, calls, rets.
+   * Record for each call site:
+
+     * Address of caller function
+     * Operand type (immediate, memory with import table address, etc.)
+
+Disassembler outputs a list of `FunctionNode` skeletons (no cross-module link yet) and a list of **raw call sites**:
+
+```csharp
+public sealed class RawCallSite
+{
+    public int CallerFunctionId { get; init; }
+    public ulong InstructionAddress { get; init; }
+    public ulong? DirectTargetAddress { get; init; }     // e.g. CALL 0x401000
+    public ulong? MemoryTargetAddress { get; init; }     // e.g. CALL [0x404000]
+    public bool IsIndirect { get; init; }                // register-based etc.
+}
+```
+
+### Step 3 – Build function nodes
+
+Using disassembly + symbol tables:
+
+* For each discovered function:
+
+  * Determine: address, name (if sym available), export/import flags.
+  * Map `ModuleId` → `Purl` using `IPurlResolver`.
+* Populate `FunctionNode` instances and index them by `Id`.
+
+### Step 4 – Construct intra-module edges
+
+For each `RawCallSite`:
+
+* If `DirectTargetAddress` falls inside a known function’s address range in the **same module**, add **IntraModuleDirect** edge.
+
+This gives you “normal” calls like `foo()` calling `bar()` in the same .so/.dll/.dylib.
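+
+The “falls inside a known function’s address range” check in Step 4 is a sorted lookup. A minimal sketch, assuming every function has a known start and size (stripped code may need sizes estimated from the next start); `FunctionIndex` is a hypothetical helper:
+
+```csharp
+using System;
+using System.Collections.Generic;
+
+public sealed class FunctionIndex
+{
+    private readonly ulong[] _starts;   // sorted function start addresses
+    private readonly ulong[] _ends;     // exclusive end address per function
+    private readonly int[] _ids;        // FunctionNode.Id per function
+
+    public FunctionIndex(IEnumerable<(ulong Start, ulong Size, int Id)> functions)
+    {
+        var sorted = new List<(ulong Start, ulong Size, int Id)>(functions);
+        sorted.Sort((a, b) => a.Start.CompareTo(b.Start));
+        _starts = new ulong[sorted.Count];
+        _ends = new ulong[sorted.Count];
+        _ids = new int[sorted.Count];
+        for (int i = 0; i < sorted.Count; i++)
+        {
+            _starts[i] = sorted[i].Start;
+            _ends[i] = sorted[i].Start + sorted[i].Size;
+            _ids[i] = sorted[i].Id;
+        }
+    }
+
+    // Returns the id of the function whose [start, end) range contains address, or null.
+    public int? FindContaining(ulong address)
+    {
+        int i = Array.BinarySearch(_starts, address);
+        if (i < 0)
+            i = ~i - 1;                 // index of the last start below address
+        if (i < 0 || address >= _ends[i])
+            return null;
+        return _ids[i];
+    }
+}
+```
+
+Build one index per module, then emit an `IntraModuleDirect` edge whenever `FindContaining(DirectTargetAddress)` returns an id.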
+
+### Step 5 – Construct inter-module edges (import calls)
+
+This is where ELF/PE/Mach-O differ; details in section 4 below.
+
+But the abstract logic is:
+
+1. For each call site with `MemoryTargetAddress` (IAT slot / GOT entry / la_symbol_ptr / PLT):
+2. From the module’s import, relocation or fixup tables, determine:
+
+   * Which **imported symbol** it corresponds to (name, ordinal, etc.).
+   * Which **imported module / dylib / DLL** provides that symbol.
+3. Find (or create) a `FunctionNode` representing that imported symbol in the **provider module**.
+4. Add an **ImportCall** edge from caller function to the provider `FunctionNode`.
+
+This is the key to turning low-level dynamic linking into **purl-aware cross-module edges**, because each `FunctionNode` is already stamped with a `Purl`.
+
+### Step 6 – Build adjacency structures
+
+Once you have all `FunctionNode`s and `CallEdge`s:
+
+* Build `OutEdges` and `InEdges` dictionaries keyed by `FunctionNode.Id`.
+* Build `FunctionsByModule` / `FunctionsByPurl`.
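+
+Step 6 can be sketched as one pass that fills both directions at once. This assumes a trimmed `Edge` record in place of `CallEdge`; `Adjacency` is a hypothetical helper name:
+
+```csharp
+using System.Collections.Generic;
+
+public sealed record Edge(int FromId, int ToId);   // stand-in for CallEdge
+
+public static class Adjacency
+{
+    // Builds forward (caller -> edges) and reverse (callee -> edges) maps.
+    public static (Dictionary<int, List<Edge>> OutEdges, Dictionary<int, List<Edge>> InEdges) Build(
+        IEnumerable<Edge> edges)
+    {
+        var outEdges = new Dictionary<int, List<Edge>>();
+        var inEdges = new Dictionary<int, List<Edge>>();
+        foreach (var e in edges)
+        {
+            Add(outEdges, e.FromId, e);
+            Add(inEdges, e.ToId, e);
+        }
+        return (outEdges, inEdges);
+    }
+
+    private static void Add(Dictionary<int, List<Edge>> map, int key, Edge edge)
+    {
+        if (!map.TryGetValue(key, out var list))
+            map[key] = list = new List<Edge>();
+        list.Add(edge);
+    }
+}
+```
+
+The reverse map is what lets you walk backwards from a vulnerable function to the roots that reach it.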
+
+---
+
+## 4. Format-specific edge construction
+
+This is the “how” for step 5, per binary format.
+
+### 4.1 ELF
+
+Goal: map call sites that go via PLT/GOT to an imported function in a `DT_NEEDED` library.
+
+Algorithm:
+
+1. Parse:
+
+   * `.dynsym`, `.dynstr` – dynamic symbol table
+   * `.rela.plt` / `.rel.plt` – relocation entries for PLT
+   * `.got.plt` / `.got` – PLT’s GOT
+   * `DT_NEEDED` entries – list of linked shared objects and their sonames
+
+2. For each relocation of type `R_*_JUMP_SLOT`:
+
+   * It applies to an entry in the PLT GOT; that GOT entry is what the PLT stub’s indirect `jmp` reads from.
+   * Relocation gives you:
+
+     * Offset in GOT (`r_offset`)
+     * Symbol index (`r_info` → symbol) → dynamic symbol (`ElfSymbol`)
+     * Symbol name, type (FUNC), binding, etc.
+
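+3. Link GOT entries to call sites:
+
+   * Most call sites reach an import through a direct `CALL` into `.plt` (or `.plt.sec`), so `DirectTargetAddress` lands on a PLT stub. Decode the stub’s indirect `jmp` to get the GOT slot it reads.
+   * For each `RawCallSite` with `MemoryTargetAddress` (for example, code built with `-fno-plt`), check if that address falls inside `.got.plt` (or `.got`).
+   * In both cases:
+
+     * Find the relocation whose `r_offset` equals that GOT slot’s address (`R_*_JUMP_SLOT` for `.got.plt`, `R_*_GLOB_DAT` for plain `.got` slots).
+     * That tells you which **symbol** is being called.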
+
+4. Determine provider module:
+
+   * From the symbol’s `st_name` and `DT_NEEDED` list, decide which shared object is expected to define it (an approximation is: first DT_NEEDED that provides that name).
+   * Map DT_NEEDED → `ModuleId` (you’ll have loaded these modules separately, or you can create “placeholder modules” if they’re not in the container image).
+
+5. Create edges:
+
+   * Create/find `FunctionNode` for the **imported symbol** in provider module.
+   * Add `CallEdge` from caller function to imported function, `EdgeKind = ImportCall`, `Evidence = "ELF.R_X86_64_JUMP_SLOT"` (or arch-specific).
+
+This yields edges like:
+
+* `myapp:main` → `libssl.so.1.1:SSL_read`
+* `libfoo.so:foo` → `libc.so.6:malloc`
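+
+The relocation-to-slot mapping from steps 2 and 3 can be sketched for little-endian ELF64 on x86‑64. It assumes you already hold the raw `.rela.plt` bytes and a list of `.dynsym` names indexed by symbol number; `ElfJumpSlots` is a hypothetical helper:
+
+```csharp
+using System;
+using System.Buffers.Binary;
+using System.Collections.Generic;
+
+public static class ElfJumpSlots
+{
+    private const uint R_X86_64_JUMP_SLOT = 7;
+    private const int RelaEntrySize = 24;   // Elf64_Rela: r_offset, r_info, r_addend (8 bytes each)
+
+    // Returns GOT slot address -> imported symbol name for every JUMP_SLOT relocation.
+    public static Dictionary<ulong, string> MapGotSlots(
+        ReadOnlySpan<byte> relaPlt, IReadOnlyList<string> dynsymNames)
+    {
+        var slots = new Dictionary<ulong, string>();
+        for (int off = 0; off + RelaEntrySize <= relaPlt.Length; off += RelaEntrySize)
+        {
+            ulong rOffset = BinaryPrimitives.ReadUInt64LittleEndian(relaPlt.Slice(off, 8));
+            ulong rInfo = BinaryPrimitives.ReadUInt64LittleEndian(relaPlt.Slice(off + 8, 8));
+            uint type = (uint)(rInfo & 0xFFFFFFFF);     // ELF64_R_TYPE
+            uint symIndex = (uint)(rInfo >> 32);        // ELF64_R_SYM
+            if (type != R_X86_64_JUMP_SLOT || symIndex >= (uint)dynsymNames.Count)
+                continue;
+            slots[rOffset] = dynsymNames[(int)symIndex];
+        }
+        return slots;
+    }
+}
+```
+
+With this map in hand, a call site resolves to a symbol name by looking up the GOT slot address its PLT stub (or its own memory operand) reads.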
+
+### 4.2 PE
+
+Goal: map call sites that go via the Import Address Table (IAT) to imported functions in DLLs.
+
+Algorithm:
+
+1. Parse:
+
+   * `IMAGE_IMPORT_DESCRIPTOR[]` – each for a DLL name.
+   * Original thunk table (INT) – names/ordinals of imported symbols.
+   * IAT – where the loader writes function addresses at runtime.
+
+2. For each import entry:
+
+   * Determine:
+
+     * DLL name (`Name`)
+     * Function name or ordinal (from INT)
+     * IAT slot address (RVA)
+
+3. Link IAT slots to call sites:
+
+   * For each `RawCallSite` with `MemoryTargetAddress`:
+
+     * Check if this address equals the VA of an IAT slot.
+     * If yes, the call site is effectively calling that imported function.
+
+4. Determine provider module:
+
+   * The DLL name gives you a target module (e.g. `KERNEL32.dll` → `ModuleId`).
+   * Ensure that DLL is represented as a `BinaryModule` or a “placeholder” if not present in image.
+
+5. Create edges:
+
+   * Create/find `FunctionNode` for imported function in provider module.
+   * Add `CallEdge` with `EdgeKind = ImportCall` and `Evidence = "PE.IAT"` (or `"PE.DelayLoad"` if using delay load descriptors).
+
+Example:
+
+* `myservice.exe:Start` → `SSPICLI.dll:AcquireCredentialsHandleW`
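+
+For PE32+ images, steps 2 and 3 reduce to walking the thunk array of one import descriptor. A minimal sketch, assuming the loader has already read the 64-bit lookup-table entries; `ImportedFunction`, `PeIat`, and the `resolveName` callback (which reads the name stored after the 2-byte hint at an RVA) are hypothetical:
+
+```csharp
+using System;
+using System.Collections.Generic;
+
+public sealed record ImportedFunction(string Dll, string? Name, ushort? Ordinal);
+
+public static class PeIat
+{
+    private const ulong OrdinalFlag64 = 0x8000000000000000UL;   // IMAGE_ORDINAL_FLAG64
+
+    // thunks: raw import lookup table entries for one DLL (PE32+, 8 bytes each).
+    // Returns IAT slot VA -> imported function.
+    public static Dictionary<ulong, ImportedFunction> MapSlots(
+        ulong imageBase, uint firstThunkRva, string dll,
+        IReadOnlyList<ulong> thunks, Func<uint, string> resolveName)
+    {
+        var map = new Dictionary<ulong, ImportedFunction>();
+        for (int i = 0; i < thunks.Count; i++)
+        {
+            ulong entry = thunks[i];
+            if (entry == 0)
+                break;                                  // zero entry terminates the table
+            ulong slotVa = imageBase + firstThunkRva + (ulong)i * 8;
+            map[slotVa] = (entry & OrdinalFlag64) != 0
+                ? new ImportedFunction(dll, null, (ushort)(entry & 0xFFFF))   // import by ordinal
+                : new ImportedFunction(dll, resolveName((uint)(entry & 0x7FFFFFFF)), null);
+        }
+        return map;
+    }
+}
+```
+
+A `RawCallSite` whose `MemoryTargetAddress` is a key in this map becomes an `ImportCall` edge to that DLL’s function.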
+
+### 4.3 Mach-O
+
+Goal: map stub calls via `__TEXT,__stubs` / `__DATA,__la_symbol_ptr` (and / or chained fixups) to symbols in dependent dylibs.
+
+Algorithm (classic dyld opcodes first; the chained-fixups variant follows the steps):
+
+1. Parse:
+
+   * Load commands:
+
+     * `LC_SYMTAB`, `LC_DYSYMTAB`
+     * `LC_LOAD_DYLIB` (to know dependent dylibs)
+     * `LC_FUNCTION_STARTS` (for seeding functions)
+     * `LC_DYLD_INFO` (rebase/bind/lazy bind)
+   * `__TEXT,__stubs` – stub code
+   * `__DATA,__la_symbol_ptr` (or `__DATA_CONST,__la_symbol_ptr`) – lazy pointer table
+   * **Indirect symbol table** – maps slot indices to symbol table indices
+
+2. Stub → la_symbol_ptr mapping:
+
+   * Stubs are small functions (usually a few instructions) that indirect through the corresponding `la_symbol_ptr` entry.
+   * For each stub function:
+
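+     * Determine which la_symbol_ptr entry it uses: the stub’s index is its offset within `__stubs` divided by the stub size (the section’s `reserved2`), and the section’s `reserved1` is the index of its first entry in the indirect symbol table.
+     * From the indirect symbol table, find which dynamic symbol that la_symbol_ptr entry corresponds to.
+
+       * This gives you the symbol name and, from the high byte of `n_desc` in two-level-namespace images, the dylib ordinal (a 1-based index into the dylib load commands such as `LC_LOAD_DYLIB`).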
+
+3. Link stub call sites:
+
+   * In disassembly, treat calls to these stub functions as **import calls**.
+   * For each call instruction `CALL stub_function`:
+
+     * `RawCallSite.DirectTargetAddress` lies inside `__TEXT,__stubs`.
+     * Resolve stub → la_symbol_ptr → symbol → dylib.
+
+4. Determine provider module:
+
+   * From dylib ordinal and load commands, get the path / install name of dylib (`libssl.1.1.dylib`, etc.).
+   * Map that to a `ModuleId` in your module set.
+
+5. Create edges:
+
+   * Create/find imported `FunctionNode` in provider module.
+   * Add `CallEdge` from caller to that function with `EdgeKind = ImportCall`, `Evidence = "MachO.IndirectSymbol"`.
+
+For **chained fixups** (`LC_DYLD_CHAINED_FIXUPS`), you’ll compute a similar mapping but walking chain entries instead of traditional lazy/weak binds. The key is still:
+
+* Map a stub or function to a **fixup** entry.
+* From fixup, determine the symbol and dylib.
+* Then connect call-site → imported function.
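+
+For the classic (non-chained) path, the stub → symbol → dylib resolution can be sketched as below. It assumes the loader has already parsed the `__stubs` section header fields, the indirect symbol table, and the `nlist_64` entries; `MachSymbol` and `MachOStubs` are hypothetical names:
+
+```csharp
+using System.Collections.Generic;
+
+public sealed record MachSymbol(string Name, ushort Desc);   // Desc = n_desc from nlist_64
+
+public static class MachOStubs
+{
+    private const uint IndirectSymbolLocal = 0x80000000;     // INDIRECT_SYMBOL_LOCAL
+    private const uint IndirectSymbolAbs = 0x40000000;       // INDIRECT_SYMBOL_ABS
+
+    // reserved1: first indirect-symbol index of __stubs; reserved2: size of one stub.
+    // Returns the imported symbol and its dylib ordinal, or null if the target is not a stub.
+    public static (string Name, int DylibOrdinal)? Resolve(
+        ulong callTarget, ulong stubsAddr, ulong stubsSize, uint reserved1, uint reserved2,
+        IReadOnlyList<uint> indirectSymbols, IReadOnlyList<MachSymbol> symbols)
+    {
+        if (reserved2 == 0 || callTarget < stubsAddr || callTarget >= stubsAddr + stubsSize)
+            return null;
+        ulong stubIndex = (callTarget - stubsAddr) / reserved2;
+        ulong tableIndex = reserved1 + stubIndex;
+        if (tableIndex >= (ulong)indirectSymbols.Count)
+            return null;
+        uint symIndex = indirectSymbols[(int)tableIndex];
+        if ((symIndex & (IndirectSymbolLocal | IndirectSymbolAbs)) != 0)
+            return null;                                     // not an external symbol
+        if (symIndex >= (uint)symbols.Count)
+            return null;
+        var sym = symbols[(int)symIndex];
+        int ordinal = (sym.Desc >> 8) & 0xFF;                // GET_LIBRARY_ORDINAL
+        return (sym.Name, ordinal);
+    }
+}
+```
+
+Ordinals 0, 254, and 255 are special (self, dynamic lookup, main executable), so map only the remaining values onto your list of dylib load commands.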
+
+---
+
+## 5. Reachability queries
+
+Once the graph is built, reachability is “just graph algorithms” + mapping back to purls.
+
+### 5.1 Roots
+
+Decide what are your **root functions**:
+
+* Binary entrypoints:
+
+  * ELF: `_start`, `main`, constructors (`.init_array`)
+  * PE: AddressOfEntryPoint, registered service entrypoints
+  * Mach-O: `_main`, constructors
+* Optionally, any exported API function that a container orchestrator or plugin system will call.
+
+Mark them as `FunctionNode.IsRoot = true` and create synthetic edges from a special root node if you want:
+
+```csharp
+var syntheticRoot = new FunctionNode
+{
+    Id = 0,
+    Name = "<root>",
+    IsRoot = true,
+    // Module, Purl can be special markers
+};
+
+foreach (var fn in allFunctions.Where(f => f.IsRoot))
+{
+    edges.Add(new CallEdge
+    {
+        FromId = syntheticRoot.Id,
+        ToId = fn.Id,
+        Kind = EdgeKind.SyntheticRoot,
+        Evidence = "Root"
+    });
+}
+```
+
+### 5.2 Reachability algorithm (function-level)
+
+Use BFS/DFS from the root node(s):
+
+```csharp
+public sealed class ReachabilityResult
+{
+    public HashSet<int> ReachableFunctions { get; init; } = new();   // init accessor so the object initializer below compiles
+}
+
+public ReachabilityResult ComputeReachableFunctions(CallGraph graph, IEnumerable<int> rootIds)
+{
+    var visited = new HashSet<int>();
+    var stack = new Stack<int>();
+
+    foreach (var root in rootIds)
+    {
+        if (visited.Add(root))
+            stack.Push(root);
+    }
+
+    while (stack.Count > 0)
+    {
+        var current = stack.Pop();
+
+        if (!graph.OutEdges.TryGetValue(current, out var edges))
+            continue;
+
+        foreach (var edge in edges)
+        {
+            if (visited.Add(edge.ToId))
+                stack.Push(edge.ToId);
+        }
+    }
+
+    return new ReachabilityResult { ReachableFunctions = visited };
+}
+```
+
+### 5.3 Project reachability to modules and purls
+
+Given `ReachableFunctions`:
+
+```csharp
+public sealed class ReachabilityProjection
+{
+    public HashSet<ModuleId> ReachableModules { get; } = new();
+    public HashSet<Purl> ReachablePurls { get; } = new();
+}
+
+public ReachabilityProjection ProjectToModulesAndPurls(CallGraph graph, ReachabilityResult result)
+{
+    var projection = new ReachabilityProjection();
+
+    foreach (var fnId in result.ReachableFunctions)
+    {
+        if (!graph.Nodes.TryGetValue(fnId, out var fn))
+            continue;
+
+        projection.ReachableModules.Add(fn.Module);
+        projection.ReachablePurls.Add(fn.Purl);
+    }
+
+    return projection;
+}
+```
+
+Now you can answer questions like:
+
+* “Is any code from purl `pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1` reachable from the container entrypoint?”
+* “Which purls are reachable at all?”
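+
+Both questions can be answered from one grouping pass. The sketch below is illustrative only: `FnInfo` and `ReachabilityQueries` are hypothetical stand-ins, and `Purl` is simplified to a string so the example is self-contained.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Linq;
+
+// Stand-in for the document's FunctionNode; Purl is simplified to a string here.
+public sealed record FnInfo(int Id, string Name, string Purl);
+
+public static class ReachabilityQueries
+{
+    // Groups reachable function names by purl, so every purl-level answer
+    // carries the functions that justify it. Unknown Ids are ignored.
+    public static Dictionary<string, List<string>> ReachableByPurl(
+        IReadOnlyDictionary<int, FnInfo> nodes, IEnumerable<int> reachableIds)
+    {
+        return reachableIds
+            .Where(id => nodes.ContainsKey(id))
+            .Select(id => nodes[id])
+            .GroupBy(fn => fn.Purl)
+            .ToDictionary(
+                g => g.Key,
+                g => g.Select(fn => fn.Name).OrderBy(n => n, StringComparer.Ordinal).ToList());
+    }
+}
+```
+
+“Is purl X reachable?” is then `result.ContainsKey(x)`, and “which purls are reachable?” is `result.Keys`.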
+
+### 5.4 Vulnerability reachability
+
+Assume you’ve mapped each vulnerability to:
+
+* `Purl` (where it lives)
+* `AffectedFunctionNames` (symbols; optionally demangled)
+
+You can implement:
+
+```csharp
+public sealed class VulnerabilitySink
+{
+    public string VulnerabilityId { get; init; } // CVE-...
+    public Purl Purl { get; init; }
+    public string FunctionName { get; init; }    // symbol name or demangled
+}
+```
+
+Resolution algorithm:
+
+1. For each `VulnerabilitySink`, find all `FunctionNode` with:
+
+   * `node.Purl == sink.Purl` and
+   * `node.Name` or `node.DemangledName` matches `sink.FunctionName`.
+
+2. For each such node, check `ReachableFunctions.Contains(node.Id)`.
+
+3. Build a `VulnerabilityFinding` object, setting `IsReachable` to true if any matching node is reachable:
+
+```csharp
+public sealed class VulnerabilityFinding
+{
+    public string VulnerabilityId { get; init; }
+    public Purl Purl { get; init; }
+    public bool IsReachable { get; init; }
+    public List<int> SinkFunctionIds { get; init; } = new();
+}
+```
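+
+The three steps above can be sketched as follows. This is a simplified illustration, not the final design: `Fn`, `Sink`, `Finding`, and `SinkResolver` are hypothetical stand-ins for the document's types, `Purl` is a plain string, and name matching is an exact ordinal comparison (a real matcher may need to normalize mangled names).
+
+```csharp
+#nullable enable
+using System;
+using System.Collections.Generic;
+using System.Linq;
+
+// Simplified stand-ins for FunctionNode, VulnerabilitySink, and VulnerabilityFinding.
+public sealed record Fn(int Id, string Name, string? DemangledName, string Purl);
+public sealed record Sink(string VulnerabilityId, string Purl, string FunctionName);
+public sealed record Finding(string VulnerabilityId, string Purl, bool IsReachable, List<int> SinkFunctionIds);
+
+public static class SinkResolver
+{
+    public static List<Finding> Resolve(
+        IEnumerable<Fn> nodes, IEnumerable<Sink> sinks, ISet<int> reachable)
+    {
+        // Index nodes by purl once so each sink only scans its own package.
+        var byPurl = nodes.ToLookup(n => n.Purl);
+        var findings = new List<Finding>();
+
+        foreach (var sink in sinks)
+        {
+            // Step 1: same purl, and raw or demangled name equals the sink name.
+            var ids = byPurl[sink.Purl]
+                .Where(n => string.Equals(n.Name, sink.FunctionName, StringComparison.Ordinal)
+                         || string.Equals(n.DemangledName, sink.FunctionName, StringComparison.Ordinal))
+                .Select(n => n.Id)
+                .ToList();
+
+            // Steps 2 and 3: the finding is reachable if any matching function was visited.
+            var isReachable = ids.Any(id => reachable.Contains(id));
+            findings.Add(new Finding(sink.VulnerabilityId, sink.Purl, isReachable, ids));
+        }
+
+        return findings;
+    }
+}
+```
+
+A sink whose function is absent from the binary yields an empty `SinkFunctionIds` list and `IsReachable = false`, which keeps “not present” distinguishable from “present but unreachable”.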
+
+If you also want **path evidence**, run a BFS from the root that records each node's predecessor, then walk the predecessor map back from the sink to recover a shortest call path as a sequence of `FunctionNode.Id`s.
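+
+A minimal sketch of that search, assuming a simplified adjacency map (caller Id to callee Ids) in place of `CallGraph.OutEdges`; `PathEvidence` and `FindShortestPath` are hypothetical names:
+
+```csharp
+#nullable enable
+using System.Collections.Generic;
+
+public static class PathEvidence
+{
+    // Returns a shortest root-to-sink call path (by edge count) as function Ids,
+    // or null when the sink is unreachable from the root.
+    public static List<int>? FindShortestPath(
+        IReadOnlyDictionary<int, List<int>> outEdges, int rootId, int sinkId)
+    {
+        // predecessor[x] is the node from which BFS first discovered x.
+        var predecessor = new Dictionary<int, int> { [rootId] = rootId };
+        var queue = new Queue<int>();
+        queue.Enqueue(rootId);
+
+        while (queue.Count > 0)
+        {
+            var current = queue.Dequeue();
+            if (current == sinkId)
+                break;
+            if (!outEdges.TryGetValue(current, out var callees))
+                continue;
+
+            foreach (var next in callees)
+            {
+                if (predecessor.TryAdd(next, current)) // false if already discovered
+                    queue.Enqueue(next);
+            }
+        }
+
+        if (!predecessor.ContainsKey(sinkId))
+            return null;
+
+        // Walk back from the sink to the root, then reverse into call order.
+        var path = new List<int>();
+        for (var id = sinkId; id != rootId; id = predecessor[id])
+            path.Add(id);
+        path.Add(rootId);
+        path.Reverse();
+        return path;
+    }
+}
+```
+
+Because BFS discovers nodes in order of distance, the first path found is a shortest one, which usually makes the most readable evidence in a report.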
+
+---
+
+## 6. Purl edges (derived graph)
+
+For reporting and analytics, it’s useful to produce a **purl-level dependency graph**.
+
+Given `CallGraph`:
+
+```csharp
+public PurlGraphView BuildPurlGraph(CallGraph graph)
+{
+    var edgesByPair = new Dictionary<(Purl From, Purl To), PurlEdge>();
+
+    foreach (var kv in graph.OutEdges)
+    {
+        var fromFn = graph.Nodes[kv.Key];
+
+        foreach (var edge in kv.Value)
+        {
+            var toFn = graph.Nodes[edge.ToId];
+
+            if (fromFn.Purl.Equals(toFn.Purl))
+                continue; // intra-purl, skip if you only care about inter-purl
+
+            var key = (fromFn.Purl, toFn.Purl);
+            if (!edgesByPair.TryGetValue(key, out var pe))
+            {
+                pe = new PurlEdge
+                {
+                    From = fromFn.Purl,
+                    To = toFn.Purl,
+                    SupportingCalls = new List<(int, int)>()
+                };
+                edgesByPair[key] = pe;
+            }
+
+            pe.SupportingCalls.Add((fromFn.Id, toFn.Id));
+        }
+    }
+
+    var adj = new Dictionary<Purl, HashSet<Purl>>();
+
+    foreach (var kv in edgesByPair)
+    {
+        var (from, to) = kv.Key;
+        if (!adj.TryGetValue(from, out var list))
+        {
+            list = new HashSet<Purl>();
+            adj[from] = list;
+        }
+        list.Add(to);
+    }
+
+    return new PurlGraphView
+    {
+        Adjacent = adj,
+        Edges = edgesByPair.Values.ToList()
+    };
+}
+```
+
+This gives you:
+
+* A coarse view of runtime dependencies between purls (“Purl A calls into Purl B”).
+* Enough context to emit purl-level VEX or to reason about trust at package granularity.
+
+---
+
+## 7. JSON output and SBOM integration
+
+### 7.1 JSON shape (high level)
+
+You can emit a composite document:
+
+```json
+{
+  "image": "registry.example.com/app@sha256:...",
+  "modules": [
+    {
+      "moduleId": { "path": "/usr/lib/libssl.so.1.1", "format": "Elf" },
+      "purl": "pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1",
+      "arch": "x86_64"
+    }
+  ],
+  "functions": [
+    {
+      "id": 42,
+      "name": "SSL_do_handshake",
+      "demangledName": null,
+      "module": { "path": "/usr/lib/libssl.so.1.1", "format": "Elf" },
+      "purl": "pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1",
+      "address": "0x401020",
+      "exported": true
+    }
+  ],
+  "edges": [
+    {
+      "from": 10,
+      "to": 42,
+      "kind": "ImportCall",
+      "evidence": "ELF.R_X86_64_JUMP_SLOT"
+    }
+  ],
+  "reachability": {
+    "roots": [1],
+    "reachableFunctions": [1,10,42]
+  },
+  "purlGraph": {
+    "edges": [
+      {
+        "from": "pkg:generic/myapp@1.0.0",
+        "to": "pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1",
+        "supportingCalls": [[10,42]]
+      }
+    ]
+  },
+  "vulnerabilities": [
+    {
+      "id": "CVE-2024-XXXX",
+      "purl": "pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1",
+      "sinkFunctions": [42],
+      "reachable": true,
+      "paths": [
+        [1, 10, 42]
+      ]
+    }
+  ]
+}
+```
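+
+One way to emit this shape is `System.Text.Json` with a camel-case naming policy. The sketch below covers only the `image`, `edges`, and `reachability` parts; the DTO and class names (`EdgeDto`, `ReachabilityDto`, `ReportDto`, `ReportWriter`) are hypothetical.
+
+```csharp
+using System.Collections.Generic;
+using System.Text.Json;
+
+// Hypothetical DTOs covering a subset of the schema above.
+public sealed record EdgeDto(int From, int To, string Kind, string Evidence);
+public sealed record ReachabilityDto(List<int> Roots, List<int> ReachableFunctions);
+public sealed record ReportDto(string Image, List<EdgeDto> Edges, ReachabilityDto Reachability);
+
+public static class ReportWriter
+{
+    private static readonly JsonSerializerOptions Options = new()
+    {
+        // Maps "ReachableFunctions" to "reachableFunctions", matching the schema.
+        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
+        WriteIndented = true
+    };
+
+    public static string ToJson(ReportDto report) => JsonSerializer.Serialize(report, Options);
+}
+```
+
+If the output needs to be byte-for-byte reproducible, sort `edges` and `reachableFunctions` before serializing, since `HashSet<int>` enumeration order is not guaranteed.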
+
+### 7.2 Purl resolution
+
+Implement an `IPurlResolver` interface:
+
+```csharp
+public interface IPurlResolver
+{
+    Purl ResolveForModule(string filePath, byte[] contentHash);
+}
+```
+
+Possible implementations:
+
+* `SbomPurlResolver` – given a CycloneDX/SPDX SBOM for the image, match by path or checksum.
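+* `LinuxPackagePurlResolver` – read the dpkg database (`/var/lib/dpkg/status` for package versions, `/var/lib/dpkg/info/*.list` for file ownership) or the rpm DB from the image filesystem.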
+* `GenericPurlResolver` – fallback: `pkg:generic/<hash>`.
+
+You call the resolver in your loaders so that **every `BinaryModule` has a purl** and thus every `FunctionNode` has a purl.
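+
+To guarantee that a purl always comes back, the implementations can be chained in priority order with the generic fallback last. A minimal sketch, assuming .NET 5+ (for `Convert.ToHexString`); `ChainedPurlResolver` is a hypothetical name, the lookups are modeled as delegates that return `null` on a miss, and `Purl` is a stand-in record:
+
+```csharp
+#nullable enable
+using System;
+using System.Collections.Generic;
+
+// Stand-in for the document's Purl type.
+public sealed record Purl(string Value);
+
+public interface IPurlResolver
+{
+    Purl ResolveForModule(string filePath, byte[] contentHash);
+}
+
+// Hypothetical composite: tries each lookup in order and falls back to
+// pkg:generic/<hash>, so the result is never null.
+public sealed class ChainedPurlResolver : IPurlResolver
+{
+    private readonly IReadOnlyList<Func<string, byte[], Purl?>> _lookups;
+
+    public ChainedPurlResolver(params Func<string, byte[], Purl?>[] lookups) => _lookups = lookups;
+
+    public Purl ResolveForModule(string filePath, byte[] contentHash)
+    {
+        foreach (var lookup in _lookups)
+        {
+            var purl = lookup(filePath, contentHash);
+            if (purl is not null)
+                return purl;
+        }
+
+        // Fallback: lowercase hex of the content hash as the generic package name.
+        return new Purl("pkg:generic/" + Convert.ToHexString(contentHash).ToLowerInvariant());
+    }
+}
+```
+
+Putting the SBOM lookup first means a purl that is explicitly declared for the image wins over anything inferred from the package database.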
+
+---
+
+## 8. Concrete implementation tasks for your team
+
+1. **Data model & interfaces**
+
+   * Implement `ModuleId`, `FunctionNode`, `CallEdge`, `CallGraph`.
+   * Define `RawCallSite`, `BinaryModule`, and `IPurlResolver`.
+
+2. **Loaders**
+
+   * `ElfLoader`: fill symbols, dynamic relocations (PLT), DT_NEEDED, etc.
+   * `PeLoader`: import descriptors, IAT, delay-load.
+   * `MachOLoader`: load commands, stubs, la_symbol_ptr, indirect symbols / chained fixups.
+
+3. **Disassembly**
+
+   * `X86Disassembler` (iced) and `Arm64Disassembler` (Disarm or port).
+   * Function detection and `RawCallSite` extraction.
+
+4. **CallGraphBuilder**
+
+   * Build intra-module edges from direct calls.
+   * Build inter-module edges using the format-specific rules above.
+   * Construct final `CallGraph` with adjacency maps and purl mappings.
+
+5. **Reachability**
+
+   * Implement BFS/DFS from root functions.
+   * Projection to modules + purls.
+   * Vulnerability sink resolution & path reconstruction.
+
+6. **Export**
+
+   * JSON serializer for the schema above.
+   * Optional: purl-level summary / VEX generator.
+
+---
+
+If you want, as a next step I can produce a **more concrete design for `CallGraphBuilder`** (including per-format helper classes with method signatures) or a **C# skeleton** for the `ElfImportResolver`, `PeImportResolver`, and `MachOStubResolver` that plugs directly into this plan.