feat: Add new provenance and crypto registry documentation
- Introduced attestation inventory and subject-rekor mapping files for tracking Docker packages.
- Added a comprehensive crypto registry decision document outlining defaults and required follow-ups.
- Created an offline feeds manifest for bundling air-gap resources.
- Implemented a script to generate and update binary manifests for curated binaries.
- Added a verification script to ensure binary artefacts are located in approved directories.
- Defined new schemas for AdvisoryEvidenceBundle, OrchestratorEnvelope, ScannerReportReadyPayload, and ScannerScanCompletedPayload.
- Established project files for StellaOps.Orchestrator.Schemas and StellaOps.PolicyAuthoritySignals.Contracts.
- Updated vendor manifest to track pinned binaries for integrity.
989
docs/product-advisories/18-Nov-2026 - 1 copy 2.md
Normal file
@@ -0,0 +1,989 @@
|
||||
Vlad, here’s a concrete, **pure‑C#** blueprint to build a multi‑format binary analyzer (Mach‑O, ELF, PE) that produces **call graphs + reachability**, with **no external tools**. Where needed, I point to permissively‑licensed code you can **port** (copy) from other ecosystems.
|
||||
|
||||
---
|
||||
|
||||
## 0) Targets & non‑negotiables
|
||||
|
||||
* **Formats:** Mach‑O (inc. LC_DYLD_INFO / LC_DYLD_CHAINED_FIXUPS), ELF (SysV gABI), PE/COFF
|
||||
* **Architectures:** x86‑64 (and x86), AArch64 (ARM64)
|
||||
* **Outputs:** JSON with **purls** per module + function‑level call graph & reachability
|
||||
* **No tool reuse:** Only pure C# libraries or code **ported** from permissive sources
|
||||
|
||||
---
|
||||
|
||||
## 1) Parsing the containers (pure C#)
|
||||
|
||||
**Pick one C# reader per format, keeping licenses permissive:**
|
||||
|
||||
* **ELF & Mach‑O:** `ELFSharp` (pure managed C#; ELF + Mach‑O reading). MIT/X11 license. ([GitHub][1])
|
||||
* **ELF & PE (+ DWARF v4):** `LibObjectFile` (C#, BSD‑2). Good ELF relocations (i386, x86_64, ARM, AArch64), PE directories, DWARF sections. Use it as your **common object model** for ELF+PE, then add a Mach‑O adapter. ([GitHub][2])
|
||||
* **PE (optional alternative):** `PeNet` (pure C#, broad PE directories, imp/exp, TLS, certs). MIT. Useful if you want a second implementation for cross‑checks. ([GitHub][3])
|
||||
|
||||
> Why two libs? `LibObjectFile` gives you DWARF and clean models for ELF/PE; `ELFSharp` covers Mach‑O today (and ELF as a fallback). You control the code paths.
|
||||
|
||||
**Spec references you’ll implement against** (for correctness of your readers & link‑time semantics):
|
||||
|
||||
* **ELF (gABI, AMD64 supplement):** dynamic section, PLT/GOT, `R_X86_64_JUMP_SLOT` semantics (eager vs lazy). ([refspecs.linuxbase.org][4])
|
||||
* **PE/COFF:** imports/exports/IAT, delay‑load, TLS. ([Microsoft Learn][5])
|
||||
* **Mach‑O:** file layout, load commands (`LC_SYMTAB`, `LC_DYSYMTAB`, `LC_FUNCTION_STARTS`, `LC_DYLD_INFO(_ONLY)`), and the modern `LC_DYLD_CHAINED_FIXUPS`. ([leopard-adc.pepas.com][6])
|
||||
|
||||
---
|
||||
|
||||
## 2) Mach‑O: what you must **port** (byte‑for‑byte compatible)
|
||||
|
||||
Apple moved from traditional dyld bind opcodes to **chained fixups** on macOS 12/iOS 15+; you need both:
|
||||
|
||||
* **Dyld bind opcodes** (`LC_DYLD_INFO(_ONLY)`): parse the BIND/LAZY_BIND streams (tuples of `<seg,off,type,ordinal,symbol,addend>`). Port minimal logic from **LLVM** or **LIEF** (both Apache‑2.0‑compatible) into C#. ([LIEF][7])
|
||||
* **Chained fixups** (`LC_DYLD_CHAINED_FIXUPS`): port `dyld_chained_fixups_header` structs & chain walking from LLVM’s `MachO.h` or Apple’s dyld headers. This restores imports/rebases without running dyld. ([LLVM][8])
|
||||
* **Function discovery hint:** read `LC_FUNCTION_STARTS` (ULEB128 deltas) to seed function boundaries—very helpful on stripped binaries. ([Stack Overflow][9])
|
||||
* **Stubs mapping:** resolve `__TEXT,__stubs` ↔ `__DATA,__la_symbol_ptr` via the **indirect symbol table**; conceptually identical to ELF’s PLT/GOT. ([MaskRay][10])
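
To make the `LC_FUNCTION_STARTS` bullet concrete, here is a minimal sketch of the delta decoding. It assumes you have already sliced the load command's payload out of `__LINKEDIT` and know the `__TEXT` segment's `vmaddr`; the class and method names (`FunctionStarts`, `Decode`) are hypothetical and not taken from any library.

```csharp
using System;
using System.Collections.Generic;

public static class FunctionStarts
{
    // Decodes an LC_FUNCTION_STARTS payload: a stream of ULEB128 deltas.
    // The first delta is relative to textVmAddr (assumed: the __TEXT vmaddr);
    // each later delta is relative to the previous function start.
    public static List<ulong> Decode(ReadOnlySpan<byte> data, ulong textVmAddr)
    {
        var starts = new List<ulong>();
        ulong address = textVmAddr;
        int pos = 0;
        while (pos < data.Length)
        {
            ulong delta = ReadUleb128(data, ref pos);
            if (delta == 0) break; // a zero delta marks the end (rest is padding)
            address += delta;
            starts.Add(address);
        }
        return starts;
    }

    // Reads one unsigned LEB128 value and advances pos past it.
    private static ulong ReadUleb128(ReadOnlySpan<byte> data, ref int pos)
    {
        ulong result = 0;
        int shift = 0;
        while (pos < data.Length)
        {
            byte b = data[pos++];
            if (shift < 64) result |= (ulong)(b & 0x7F) << shift;
            shift += 7;
            if ((b & 0x80) == 0) break; // high bit clear: last byte of this value
        }
        return result;
    }
}
```

The resulting addresses are only function *starts*; sizes still have to come from the next start, symbols, or disassembly.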
|
||||
|
||||
> If you prefer an in‑C# base for Mach‑O manipulation, **Melanzana.MachO** exists (MIT) and has been used by .NET folks for Mach‑O/Code Signing/obj writing; you can mine its approach for load‑command modeling. ([GitHub][11])
|
||||
|
||||
---
|
||||
|
||||
## 3) Disassembly (pure C#, multi‑arch)
|
||||
|
||||
* **x86/x64:** `iced` (C# decoder/disassembler/encoder; MIT; fast & complete). ([GitHub][12])
|
||||
* **AArch64/ARM64:** two options that keep you pure‑C#:
|
||||
|
||||
* **Disarm** (pure C# ARM64 disassembler; MIT). Good starting point to decode & get branch/call kinds. ([GitHub][13])
|
||||
* **Port from Ryujinx ARMeilleure** (ARMv8 decoder/JIT in C#, MIT). You can lift only the **decoder** pieces you need. ([Gitee][14])
|
||||
* **x86 fallback:** `SharpDisasm` (udis86 port in C#; BSD‑2). Older than iced; keep as a reference. ([GitHub][15])
|
||||
|
||||
---
|
||||
|
||||
## 4) Call graph recovery (static)
|
||||
|
||||
**4.1 Function seeds**
|
||||
|
||||
* From symbols (`.dynsym`/`LC_SYMTAB`/PE exports)
|
||||
* From **LC_FUNCTION_STARTS** (Mach‑O) for stripped code ([Stack Overflow][9])
|
||||
* From entrypoints (`_start`/`main` or PE AddressOfEntryPoint)
|
||||
* From exception/unwind tables & DWARF (when present)—`LibObjectFile` already models DWARF v4. ([GitHub][2])
|
||||
|
||||
**4.2 CFG & interprocedural calls**
|
||||
|
||||
* **Decode** with iced/Disarm from each seed; form **basic blocks** by following control‑flow until terminators (ret/jmp/call).
|
||||
* **Direct calls:** immediate targets become edges (PC‑relative fixups where needed).
|
||||
* **Imported calls:**
|
||||
|
||||
* **ELF:** calls to PLT stubs → resolve via `.rela.plt` & `R_*_JUMP_SLOT` to symbol names (link‑time target). ([cs61.seas.harvard.edu][16])
|
||||
* **PE:** calls through the **IAT** → resolve via `IMAGE_IMPORT_DESCRIPTOR` / thunk tables. ([Microsoft Learn][5])
|
||||
* **Mach‑O:** calls to `__stubs` use **indirect symbol table** + `__la_symbol_ptr` (or chained fixups) → map to dylib/symbol. ([reinterpretcast.com][17])
|
||||
* **Indirect calls within the binary:** heuristics only (function pointer tables, vtables, small constant pools). Keep them labeled **“indirect‑unresolved”** unless a heuristic yields a concrete target.
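
As an illustration of the "direct calls" bullet, the sketch below computes the target of a single x86/x64 `CALL rel32` (opcode `0xE8`) by hand. In the real pipeline iced would decode the instruction for you; this hypothetical helper (`DirectCall.TryGetCallRel32Target`) only shows the PC-relative arithmetic and ignores prefixes and every other call encoding.

```csharp
using System;
using System.Buffers.Binary;

public static class DirectCall
{
    // For `CALL rel32` the target is the address of the next instruction
    // (instructionAddress + 5) plus a signed 32-bit displacement.
    public static bool TryGetCallRel32Target(
        ReadOnlySpan<byte> code, ulong instructionAddress, out ulong target)
    {
        target = 0;
        if (code.Length < 5 || code[0] != 0xE8) return false;

        int rel32 = BinaryPrimitives.ReadInt32LittleEndian(code.Slice(1, 4));

        // Sign-extend the displacement, then add with wrap-around semantics.
        target = unchecked(instructionAddress + 5 + (ulong)(long)rel32);
        return true;
    }
}
```

A target that lands inside a known function of the same module becomes an intra-module edge; one that lands in a PLT or stub section is handed to the import-resolution step.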
|
||||
|
||||
**4.3 Cross‑binary graph**
|
||||
|
||||
* Build module‑level edges by simulating the platform’s loader:
|
||||
|
||||
* **ELF:** honor `DT_NEEDED`, `DT_RPATH/RUNPATH`, versioning (`.gnu.version*`) to pick the definer of an imported symbol. gABI rules apply. ([refspecs.linuxbase.org][4])
|
||||
* **PE:** pick DLL from the import descriptors. ([Microsoft Learn][5])
|
||||
* **Mach‑O:** `LC_LOAD_DYLIB` + dyld binding / chained fixups determine the provider image. ([LIEF][7])
|
||||
|
||||
---
|
||||
|
||||
## 5) Reachability analysis
|
||||
|
||||
Represent the **call graph** using a .NET graph lib (or a simple adjacency set). I suggest:
|
||||
|
||||
* **QuikGraph** (successor of QuickGraph; MIT) for algorithms (DFS/BFS, SCCs). Use it to compute reachability from chosen roots (entrypoint(s), exported APIs, or “sinks”). ([GitHub][18])
|
||||
|
||||
You can visualize with **MSAGL** (MIT) when you need layouts, but your core output is JSON. ([GitHub][19])
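
If you start with the "simple adjacency set" option rather than QuikGraph, reachability is a plain breadth-first search. The sketch below assumes string node ids like the ones in the JSON output; `SimpleReachability` is a hypothetical name, not part of any library.

```csharp
using System.Collections.Generic;

public static class SimpleReachability
{
    // Returns every node reachable from the given roots (roots included),
    // following outgoing edges in the adjacency map.
    public static HashSet<string> ReachableFrom(
        IReadOnlyDictionary<string, List<string>> adjacency,
        IEnumerable<string> roots)
    {
        var visited = new HashSet<string>();
        var queue = new Queue<string>();

        foreach (var root in roots)
            if (visited.Add(root)) queue.Enqueue(root);

        while (queue.Count > 0)
        {
            var node = queue.Dequeue();
            // Nodes without outgoing edges may be absent from the map.
            if (!adjacency.TryGetValue(node, out var successors)) continue;
            foreach (var next in successors)
                if (visited.Add(next)) queue.Enqueue(next);
        }
        return visited;
    }
}
```

Swapping this for QuikGraph later only changes the traversal call; the roots and the reachable set keep the same shape.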
|
||||
|
||||
---
|
||||
|
||||
## 6) Symbol demangling (nice‑to‑have, pure C#)
|
||||
|
||||
* **Itanium (ELF/Mach‑O):** Either port LLVM’s Itanium demangler or use a C# lib like **CxxDemangler** (a C# rewrite of `cpp_demangle`). ([LLVM][20])
|
||||
* **MSVC (PE):** Port LLVM’s `MicrosoftDemangle.cpp` (Apache‑2.0 with LLVM exception) to C#. ([LLVM][21])
|
||||
|
||||
---
|
||||
|
||||
## 7) JSON output (with purls)
|
||||
|
||||
Use a stable schema (example) to feed SBOM/vuln matching downstream:
|
||||
|
||||
```json
{
  "modules": [
    {
      "purl": "pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1?arch=amd64",
      "format": "ELF",
      "arch": "x86_64",
      "path": "/usr/lib/x86_64-linux-gnu/libssl.so.1.1",
      "exports": ["SSL_read", "SSL_write"],
      "imports": ["BIO_new", "EVP_CipherInit_ex"],
      "functions": [{"name":"SSL_do_handshake","va":"0x401020","size":512,"demangled": "..."}]
    }
  ],
  "graph": {
    "nodes": [
      {"id":"bin:main@0x401000","module": "pkg:generic/myapp@1.0.0"},
      {"id":"lib:SSL_read","module":"pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1?arch=amd64"}
    ],
    "edges": [
      {"src":"bin:main@0x401000","dst":"lib:SSL_read","kind":"import_call","evidence":"ELF.R_X86_64_JUMP_SLOT"}
    ]
  },
  "reachability": {
    "roots": ["bin:_start","bin:main@0x401000"],
    "reachable": ["lib:SSL_read", "lib:SSL_write"],
    "unresolved_indirect_calls": [
      {"site":"0x402ABC","reason":"register-indirect"}
    ]
  }
}
```
|
||||
|
||||
---
|
||||
|
||||
## 8) Minimal C# module layout (sketch)
|
||||
|
||||
```
Stella.Analysis.Core/
  BinaryModule.cs           // common model (sections, symbols, relocs, imports/exports)
  Loader/
    PeLoader.cs             // wrap LibObjectFile (or PeNet) to BinaryModule
    ElfLoader.cs            // wrap LibObjectFile to BinaryModule
    MachOLoader.cs          // wrap ELFSharp + your ported Dyld/ChainedFixups
  Disasm/
    X86Disassembler.cs      // iced bridge: bytes -> instructions
    Arm64Disassembler.cs    // Disarm (or ARMeilleure port) bridge
  Graph/
    CallGraphBuilder.cs     // builds CFG per function + inter-procedural edges
    Reachability.cs         // BFS/DFS over QuikGraph
  Demangle/
    ItaniumDemangler.cs     // port or wrap CxxDemangler
    MicrosoftDemangler.cs   // port from LLVM
  Export/
    JsonWriter.cs           // writes schema above
```
|
||||
|
||||
---
|
||||
|
||||
## 9) Implementation notes (where issues usually bite)
|
||||
|
||||
* **Mach‑O moderns:** Implement both dyld opcodes **and** chained fixups; many macOS 12+/iOS 15+ binaries only have chained fixups. ([emergetools.com][22])
|
||||
* **Stubs vs real targets (Mach‑O):** map `__stubs` → `__la_symbol_ptr` via **indirect symbols** to the true imported symbol (or its post‑fixup target). ([reinterpretcast.com][17])
|
||||
* **ELF PLT/GOT:** treat `.plt` entries as **call trampolines**; ultimate edge should point to the symbol (library) that satisfies `DT_NEEDED` + version. ([refspecs.linuxbase.org][4])
|
||||
* **PE delay‑load:** don’t forget `IMAGE_DELAYLOAD_DESCRIPTOR` for delayed IATs. ([Microsoft Learn][5])
|
||||
* **Function discovery:** use `LC_FUNCTION_STARTS` when symbols are stripped; it’s a cheap way to seed analysis. ([Stack Overflow][9])
|
||||
* **Name clarity:** demangle Itanium/MSVC so downstream vuln rules can match consistently. ([LLVM][20])
|
||||
|
||||
---
|
||||
|
||||
## 10) What to **copy/port** verbatim (safe licenses)
|
||||
|
||||
* **Dyld bind & exports trie logic:** from **LLVM** or **LIEF** Mach‑O (Apache‑2.0). Great for getting the exact opcode semantics right. ([LIEF][7])
|
||||
* **Chained fixups structs/walkers:** from **LLVM MachO.h** or Apple dyld headers (permissive headers). ([LLVM][8])
|
||||
* **Itanium/MS demanglers:** LLVM demangler sources are standalone; easy to translate to C#. ([LLVM][23])
|
||||
* **ARM64 decoder:** if Disarm gaps hurt, lift just the **decoder** pieces from **Ryujinx ARMeilleure** (MIT). ([Gitee][14])
|
||||
|
||||
*(Avoid GPL’d parsers like binutils/BFD; they will contaminate your codebase’s licensing.)*
|
||||
|
||||
---
|
||||
|
||||
## 11) End‑to‑end pipeline (per container image)
|
||||
|
||||
1. **Enumerate binaries** in the container FS.
|
||||
2. **Parse** each with the appropriate loader → `BinaryModule` (+ imports/exports/symbols/relocs).
|
||||
3. **Simulate linking** per platform to resolve imported functions to provider libraries. ([refspecs.linuxbase.org][4])
|
||||
4. **Disassemble** functions (iced/Disarm) → CFGs → **call edges** (direct, PLT/IAT/stub, indirect).
|
||||
5. **Assemble call graph** across modules; normalize names via demangling.
|
||||
6. **Reachability**: given roots (entry or user‑specified) compute reachable set; emit JSON with **purls** (from your SBOM/package resolver).
|
||||
7. **(Optional)** dump GraphViz / MSAGL views for debugging. ([GitHub][19])
|
||||
|
||||
---
|
||||
|
||||
## 12) Quick heuristics for vulnerability triage
|
||||
|
||||
* **Sink maps**: flag edges to high‑risk APIs (`strcpy`, `gets`, legacy SSL ciphers) even without CVE versioning.
|
||||
* **DWARF line info** (when present): attach file:line to nodes for developer action. `LibObjectFile` gives you DWARF v4 reads. ([GitHub][2])
|
||||
|
||||
---
|
||||
|
||||
## 13) Test corpora
|
||||
|
||||
* **ELF:** glibc/openssl/libpng from distro repos; validate `R_*_JUMP_SLOT` handling and PLT edges. ([cs61.seas.harvard.edu][16])
|
||||
* **PE:** system DLLs (Kernel32, Advapi32) and a small MSVC console app; validate IAT & delay‑load. ([Microsoft Learn][5])
|
||||
* **Mach‑O:** Xcode‑built binaries across macOS 11 & 12+ to cover both dyld opcode and chained fixups paths; verify `LC_FUNCTION_STARTS` improves discovery. ([Stack Overflow][9])
|
||||
|
||||
---
|
||||
|
||||
## 14) Deliverables you can start coding now
|
||||
|
||||
* **MachOLoader.cs**
|
||||
|
||||
* Parse headers + load commands (ELFSharp).
|
||||
* Implement `DyldInfoParser` (port from LLVM/LIEF) and `ChainedFixupsParser` (port structs & walkers). ([LIEF][7])
|
||||
* **X86Disassembler.cs / Arm64Disassembler.cs** (iced / Disarm bridges). ([GitHub][12])
|
||||
* **CallGraphBuilder.cs** (recursive descent + linear sweep fallback; PLT/IAT/stub resolution).
|
||||
* **Reachability.cs** (QuikGraph BFS/DFS). ([GitHub][18])
|
||||
* **JsonWriter.cs** (schema above with purls).
|
||||
|
||||
---
|
||||
|
||||
### References (core, load‑bearing)
|
||||
|
||||
* **ELFSharp** (ELF + Mach‑O pure C#). ([GitHub][1])
|
||||
* **LibObjectFile** (ELF/PE/DWARF C#, BSD‑2). ([GitHub][2])
|
||||
* **iced** (x86/x64 disasm, C#, MIT). ([GitHub][12])
|
||||
* **Disarm** (ARM64 disasm, C#, MIT). ([GitHub][13])
|
||||
* **Ryujinx (ARMeilleure)** (ARMv8 decode/JIT in C#, MIT). ([Gitee][14])
|
||||
* **ELF gABI & AMD64 supplement** (PLT/GOT, relocations). ([refspecs.linuxbase.org][4])
|
||||
* **PE/COFF** (imports/exports/IAT). ([Microsoft Learn][5])
|
||||
* **Mach‑O docs** (load commands; LC_FUNCTION_STARTS; dyld bindings; chained fixups). ([Apple Developer][24])
|
||||
|
||||
---
|
||||
|
||||
If you want, I can draft **`MachOLoader` + `DyldInfoParser`** in C# next, including chained‑fixups structs (ported from LLVM’s headers) and an **iced**‑based call‑edge walker for x86‑64.
|
||||
|
||||
[1]: https://github.com/konrad-kruczynski/elfsharp "GitHub - konrad-kruczynski/elfsharp: Pure managed C# library for reading ELF, UImage, Mach-O binaries."
|
||||
[2]: https://github.com/xoofx/LibObjectFile "GitHub - xoofx/LibObjectFile: LibObjectFile is a .NET library to read, manipulate and write linker and executable object files (e.g ELF, PE, DWARF, ar...)"
|
||||
[3]: https://github.com/secana/PeNet?utm_source=chatgpt.com "secana/PeNet: Portable Executable (PE) library written in . ..."
|
||||
[4]: https://refspecs.linuxbase.org/elf/gabi4%2B/contents.html?utm_source=chatgpt.com "System V Application Binary Interface - DRAFT - 24 April 2001"
|
||||
[5]: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format?utm_source=chatgpt.com "PE Format - Win32 apps"
|
||||
[6]: https://leopard-adc.pepas.com/documentation/DeveloperTools/Conceptual/MachOTopics/0-Introduction/introduction.html?utm_source=chatgpt.com "Mach-O Programming Topics: Introduction"
|
||||
[7]: https://lief.re/doc/stable/doxygen/classLIEF_1_1MachO_1_1DyldInfo.html?utm_source=chatgpt.com "MachO::DyldInfo Class Reference - LIEF"
|
||||
[8]: https://llvm.org/doxygen/structllvm_1_1MachO_1_1dyld__chained__fixups__header.html?utm_source=chatgpt.com "MachO::dyld_chained_fixups_header Struct Reference"
|
||||
[9]: https://stackoverflow.com/questions/9602438/mach-o-file-lc-function-starts-load-command?utm_source=chatgpt.com "Mach-O file LC_FUNCTION_STARTS load command"
|
||||
[10]: https://maskray.me/blog/2021-09-19-all-about-procedure-linkage-table?utm_source=chatgpt.com "All about Procedure Linkage Table"
|
||||
[11]: https://github.com/dotnet/runtime/issues/77178 "Discussion: ObjWriter in C# · Issue #77178 · dotnet/runtime · GitHub"
|
||||
[12]: https://github.com/icedland/iced?utm_source=chatgpt.com "icedland/iced: Blazing fast and correct x86/x64 ..."
|
||||
[13]: https://github.com/SamboyCoding/Disarm?utm_source=chatgpt.com "SamboyCoding/Disarm: Fast, pure-C# ARM64 Disassembler"
|
||||
[14]: https://gitee.com/ryujinx/Ryujinx/blob/master/LICENSE.txt?utm_source=chatgpt.com "Ryujinx/Ryujinx"
|
||||
[15]: https://github.com/justinstenning/SharpDisasm?utm_source=chatgpt.com "justinstenning/SharpDisasm"
|
||||
[16]: https://cs61.seas.harvard.edu/site/2022/pdf/x86-64-abi-20210928.pdf?utm_source=chatgpt.com "System V Application Binary Interface"
|
||||
[17]: https://www.reinterpretcast.com/hello-world-mach-o?utm_source=chatgpt.com "The Nitty Gritty of “Hello World” on macOS | reinterpretcast.com"
|
||||
[18]: https://github.com/KeRNeLith/QuikGraph?utm_source=chatgpt.com "KeRNeLith/QuikGraph: Generic Graph Data Structures and ..."
|
||||
[19]: https://github.com/microsoft/automatic-graph-layout?utm_source=chatgpt.com "microsoft/automatic-graph-layout: A set of tools for ..."
|
||||
[20]: https://llvm.org/doxygen/structllvm_1_1ItaniumPartialDemangler.html?utm_source=chatgpt.com "ItaniumPartialDemangler Struct Reference"
|
||||
[21]: https://llvm.org/doxygen/MicrosoftDemangle_8cpp_source.html?utm_source=chatgpt.com "lib/Demangle/MicrosoftDemangle.cpp Source File"
|
||||
[22]: https://www.emergetools.com/blog/posts/iOS15LaunchTime?utm_source=chatgpt.com "How iOS 15 makes your app launch faster"
|
||||
[23]: https://llvm.org/doxygen/ItaniumDemangle_8cpp.html?utm_source=chatgpt.com "lib/Demangle/ItaniumDemangle.cpp File Reference"
|
||||
[24]: https://developer.apple.com/library/archive/documentation/Performance/Conceptual/CodeFootprint/Articles/MachOOverview.html?utm_source=chatgpt.com "Overview of the Mach-O Executable Format"
|
||||
Good, this is exactly the point where you want to nail the internal model, otherwise everything on top (VEX, policy, reporting) will be shaky.
|
||||
|
||||
Below is a detailed, implementation-ready plan for a **reachability graph with purl-aware edges**, covering ELF, PE, and Mach-O, in C#.
|
||||
|
||||
I’ll structure it as:
|
||||
|
||||
1. Overall graph design (3 layers: function, module, purl)
|
||||
2. Core C# data model
|
||||
3. Pipeline steps (end-to-end)
|
||||
4. Format-specific edge construction (ELF / PE / Mach-O)
|
||||
5. Reachability queries (from entrypoints to vulnerable purls / functions)
|
||||
6. JSON output layout and integration with SBOM
|
||||
|
||||
---
|
||||
|
||||
## 1. Overall graph design
|
||||
|
||||
You want three tightly linked graph layers:
|
||||
|
||||
1. **Function-level call graph (FLG)**
|
||||
|
||||
* Nodes: individual **functions** inside binaries
|
||||
* Edges: calls from function A → function B (intra- or inter-module)
|
||||
|
||||
2. **Module-level graph (MLG)**
|
||||
|
||||
* Nodes: **binaries** (ELF/PE/Mach-O files)
|
||||
* Edges: “module A calls module B at least once” (aggregated from FLG)
|
||||
|
||||
3. **Purl-level graph (PLG)**
|
||||
|
||||
* Nodes: **purls** (packages or generic artifacts)
|
||||
* Edges: “purl P1 depends-at-runtime on purl P2” (aggregated from module edges)
|
||||
|
||||
The **reachability algorithm** runs primarily on the **function graph**, but:
|
||||
|
||||
* You can project reachability results to **module** and **purl** nodes.
|
||||
* You can also run coarse-grained analysis directly on the **purl graph** when needed (“Is any code in purl X reachable from the container entrypoint?”).
|
||||
|
||||
---
|
||||
|
||||
## 2. Core C# data model
|
||||
|
||||
### 2.1 Identifiers and enums
|
||||
|
||||
```csharp
public enum BinaryFormat { Elf, Pe, MachO }

public readonly record struct ModuleId(string Path, BinaryFormat Format);

public readonly record struct Purl(string Value);

public enum EdgeKind
{
    IntraModuleDirect,   // call foo -> bar in same module
    ImportCall,          // call via plt/iat/stub to imported function
    SyntheticRoot,       // root (entrypoint) edge
    IndirectUnresolved   // optional: we saw an indirect call we couldn't resolve
}
```
|
||||
|
||||
### 2.2 Function node
|
||||
|
||||
```csharp
public sealed class FunctionNode
{
    public int Id { get; init; }                 // internal numeric id
    public ModuleId Module { get; init; }
    public Purl Purl { get; init; }              // resolved from Module -> Purl
    public ulong Address { get; init; }          // VA or RVA
    public string Name { get; init; }            // mangled
    public string? DemangledName { get; init; }  // optional
    public bool IsExported { get; init; }
    public bool IsImportedStub { get; init; }    // e.g. PLT stub, Mach-O stub, PE thunks
    public bool IsRoot { get; set; }             // _start/main/entrypoint etc.
}
```
|
||||
|
||||
### 2.3 Edges
|
||||
|
||||
```csharp
public sealed class CallEdge
{
    public int FromId { get; init; }      // FunctionNode.Id
    public int ToId { get; init; }        // FunctionNode.Id
    public EdgeKind Kind { get; init; }
    public string Evidence { get; init; } // e.g. "ELF.R_X86_64_JUMP_SLOT", "PE.IAT", "MachO.indirectSym"
}
```
|
||||
|
||||
### 2.4 Graph container
|
||||
|
||||
```csharp
public sealed class CallGraph
{
    public IReadOnlyDictionary<int, FunctionNode> Nodes { get; init; }
    public IReadOnlyDictionary<int, List<CallEdge>> OutEdges { get; init; }
    public IReadOnlyDictionary<int, List<CallEdge>> InEdges { get; init; }

    // Convenience: mappings
    public IReadOnlyDictionary<ModuleId, List<int>> FunctionsByModule { get; init; }
    public IReadOnlyDictionary<Purl, List<int>> FunctionsByPurl { get; init; }
}
```
|
||||
|
||||
### 2.5 Purl-level graph view
|
||||
|
||||
You don’t store a separate physical graph; you **derive** it on demand:
|
||||
|
||||
```csharp
public sealed class PurlEdge
{
    public Purl From { get; init; }
    public Purl To { get; init; }
    public List<(int FromFnId, int ToFnId)> SupportingCalls { get; init; }
}

public sealed class PurlGraphView
{
    public IReadOnlyDictionary<Purl, HashSet<Purl>> Adjacent { get; init; }
    public IReadOnlyList<PurlEdge> Edges { get; init; }
}
```
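
Deriving the purl view is a single pass over the call edges. The sketch below is one possible way to do it, using small stand-in records (`FnStub`, `CallStub`) with plain string purls so that it compiles on its own; in your code you would feed it `FunctionNode` and `CallEdge` instead. All names here are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Stand-ins for FunctionNode / CallEdge, reduced to what the projection needs.
public sealed record FnStub(int Id, string Purl);
public sealed record CallStub(int FromId, int ToId);

public static class PurlProjection
{
    // Groups function-level calls by (caller purl, callee purl).
    // Calls that stay inside one purl do not produce a purl edge.
    public static Dictionary<(string From, string To), List<(int FromFnId, int ToFnId)>> Project(
        IEnumerable<FnStub> functions, IEnumerable<CallStub> calls)
    {
        var purlOf = functions.ToDictionary(f => f.Id, f => f.Purl);
        var edges = new Dictionary<(string From, string To), List<(int FromFnId, int ToFnId)>>();

        foreach (var call in calls)
        {
            var from = purlOf[call.FromId];
            var to = purlOf[call.ToId];
            if (from == to) continue;

            if (!edges.TryGetValue((from, to), out var supporting))
                edges[(from, to)] = supporting = new List<(int FromFnId, int ToFnId)>();

            // Keep the supporting calls so every purl edge stays explainable.
            supporting.Add((call.FromId, call.ToId));
        }
        return edges;
    }
}
```

Keeping the supporting call list per edge is what lets a report answer "why does P1 depend on P2?" with concrete call sites.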
|
||||
|
||||
---
|
||||
|
||||
## 3. Pipeline steps (end-to-end)
|
||||
|
||||
### Step 0 – Inputs
|
||||
|
||||
* Set of binaries (files) extracted from container image.
|
||||
* SBOM or other metadata that can map a file path (or hash) → **purl**.
|
||||
|
||||
### Step 1 – Parse binaries → `BinaryModule` objects
|
||||
|
||||
You define a common in-memory model:
|
||||
|
||||
```csharp
public sealed class BinaryModule
{
    public ModuleId Id { get; init; }
    public Purl Purl { get; init; }
    public BinaryFormat Format { get; init; }

    // Raw sections / segments
    public IReadOnlyList<SectionInfo> Sections { get; init; }

    // Symbols
    public IReadOnlyList<SymbolInfo> Symbols { get; init; } // imports + exports + locals

    // Relocations / fixups
    public IReadOnlyList<RelocationInfo> Relocations { get; init; }

    // Import/export tables (PE)/dylib commands (Mach-O)/DT_NEEDED (ELF)
    public ImportInfo[] Imports { get; init; }
    public ExportInfo[] Exports { get; init; }
}
```
|
||||
|
||||
Implement format-specific loaders:
|
||||
|
||||
* `ElfLoader : IBinaryLoader`
|
||||
* `PeLoader : IBinaryLoader`
|
||||
* `MachOLoader : IBinaryLoader`
|
||||
|
||||
Each loader uses your chosen C# parsers or ported code and fills `BinaryModule`.
|
||||
|
||||
### Step 2 – Disassembly → basic blocks & candidate functions
|
||||
|
||||
For each `BinaryModule`:
|
||||
|
||||
1. Use appropriate decoder (iced for x86/x64; Disarm/ported ARMeilleure for AArch64).
|
||||
2. Seed function starts:
|
||||
|
||||
* Exported functions
|
||||
* Entry points (`_start`, `main`, AddressOfEntryPoint)
|
||||
* Mach-O `LC_FUNCTION_STARTS` if available
|
||||
3. Walk instructions to build basic blocks:
|
||||
|
||||
* Stop blocks at conditional/unconditional branches, calls, rets.
|
||||
* Record for each call site:
|
||||
|
||||
* Address of caller function
|
||||
* Operand type (immediate, memory with import table address, etc.)
|
||||
|
||||
The disassembler outputs a list of `FunctionNode` skeletons (no cross-module links yet) and a list of **raw call sites**:
|
||||
|
||||
```csharp
public sealed class RawCallSite
{
    public int CallerFunctionId { get; init; }
    public ulong InstructionAddress { get; init; }
    public ulong? DirectTargetAddress { get; init; }  // e.g. CALL 0x401000
    public ulong? MemoryTargetAddress { get; init; }  // e.g. CALL [0x404000]
    public bool IsIndirect { get; init; }             // register-based etc.
}
```
|
||||
|
||||
### Step 3 – Build function nodes
|
||||
|
||||
Using disassembly + symbol tables:
|
||||
|
||||
* For each discovered function:
|
||||
|
||||
* Determine: address, name (if sym available), export/import flags.
|
||||
* Map `ModuleId` → `Purl` using `IPurlResolver`.
|
||||
* Populate `FunctionNode` instances and index them by `Id`.
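
`IPurlResolver` is not defined anywhere above, so here is one possible shape for it, purely as an assumption: a lookup from file path to purl, backed by whatever your SBOM provides, with a `pkg:generic/...` fallback for files the SBOM does not know. The interface signature, the class name `SbomPurlResolver`, and the fallback format are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical contract: map a file path inside the image to a purl string.
public interface IPurlResolver
{
    string Resolve(string filePath);
}

public sealed class SbomPurlResolver : IPurlResolver
{
    private readonly Dictionary<string, string> _purlByPath;

    // fileToPurl: path -> purl pairs extracted from the SBOM (assumed input).
    public SbomPurlResolver(IDictionary<string, string> fileToPurl)
    {
        _purlByPath = new Dictionary<string, string>(fileToPurl, StringComparer.Ordinal);
    }

    public string Resolve(string filePath)
    {
        if (_purlByPath.TryGetValue(filePath, out var purl)) return purl;

        // Unknown file: fall back to a generic purl built from the file name,
        // so every FunctionNode still carries some package identity.
        return "pkg:generic/" + Uri.EscapeDataString(Path.GetFileName(filePath));
    }
}
```

A hash-based lookup would be more robust than paths when images relocate files; the path key is only the simplest starting point.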
|
||||
|
||||
### Step 4 – Construct intra-module edges
|
||||
|
||||
For each `RawCallSite`:
|
||||
|
||||
* If `DirectTargetAddress` falls inside a known function’s address range in the **same module**, add **IntraModuleDirect** edge.
|
||||
|
||||
This gives you “normal” calls like `foo()` calling `bar()` within the same module (for example the same .so or .dll).
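
The "falls inside a known function's address range" check can be done with a binary search over sorted function starts. This is a sketch under the assumption that you keep two parallel arrays per module (start addresses in ascending order, and sizes); `FunctionLookup` is a hypothetical helper name.

```csharp
using System;

public static class FunctionLookup
{
    // Returns the index of the function whose [start, start + size) range
    // contains the address, or -1 if no function covers it.
    public static int FindContaining(ulong[] sortedStarts, ulong[] sizes, ulong address)
    {
        int idx = Array.BinarySearch(sortedStarts, address);

        // Not an exact hit: ~idx is the insertion point, so the candidate
        // is the last function that starts before the address.
        if (idx < 0) idx = ~idx - 1;
        if (idx < 0) return -1; // address is below the first function

        return address - sortedStarts[idx] < sizes[idx] ? idx : -1;
    }
}
```

When sizes are unknown (stripped binaries), a common approximation is to treat each function as extending to the next start.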
|
||||
|
||||
### Step 5 – Construct inter-module edges (import calls)
|
||||
|
||||
This is where ELF/PE/Mach-O differ; details in section 4 below.
|
||||
|
||||
But the abstract logic is:
|
||||
|
||||
1. For each call site with `MemoryTargetAddress` (IAT slot / GOT entry / la_symbol_ptr / PLT):
|
||||
2. From the module’s import, relocation or fixup tables, determine:
|
||||
|
||||
* Which **imported symbol** it corresponds to (name, ordinal, etc.).
|
||||
* Which **imported module / dylib / DLL** provides that symbol.
|
||||
3. Find (or create) a `FunctionNode` representing that imported symbol in the **provider module**.
|
||||
4. Add an **ImportCall** edge from caller function to the provider `FunctionNode`.
|
||||
|
||||
This is the key to turning low-level dynamic linking into **purl-aware cross-module edges**, because each `FunctionNode` is already stamped with a `Purl`.
|
||||
|
||||
### Step 6 – Build adjacency structures
|
||||
|
||||
Once you have all `FunctionNode`s and `CallEdge`s:
|
||||
|
||||
* Build `OutEdges` and `InEdges` dictionaries keyed by `FunctionNode.Id`.
|
||||
* Build `FunctionsByModule` / `FunctionsByPurl`.
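
A minimal sketch of this step, assuming a reduced edge type (`EdgeStub`, a stand-in for `CallEdge`) so the block compiles on its own. Every node gets an entry in both maps, even if empty, which keeps later traversals free of missing-key checks.

```csharp
using System.Collections.Generic;

// Stand-in for CallEdge with only the fields this step needs.
public sealed record EdgeStub(int FromId, int ToId);

public static class AdjacencyBuilder
{
    public static (Dictionary<int, List<EdgeStub>> OutEdges, Dictionary<int, List<EdgeStub>> InEdges) Build(
        IEnumerable<int> nodeIds, IEnumerable<EdgeStub> edges)
    {
        var outEdges = new Dictionary<int, List<EdgeStub>>();
        var inEdges = new Dictionary<int, List<EdgeStub>>();

        // Pre-populate so isolated nodes still have (empty) entries.
        foreach (var id in nodeIds)
        {
            outEdges[id] = new List<EdgeStub>();
            inEdges[id] = new List<EdgeStub>();
        }

        // Each edge is indexed twice: by caller and by callee.
        foreach (var edge in edges)
        {
            outEdges[edge.FromId].Add(edge);
            inEdges[edge.ToId].Add(edge);
        }
        return (outEdges, inEdges);
    }
}
```

`InEdges` is what makes backward queries cheap, for example "who can reach this vulnerable function?".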
|
||||
|
||||
---
|
||||
|
||||
## 4. Format-specific edge construction
|
||||
|
||||
This is the “how” for step 5, per binary format.
|
||||
|
||||
### 4.1 ELF
|
||||
|
||||
Goal: map call sites that go via PLT/GOT to an imported function in a `DT_NEEDED` library.
|
||||
|
||||
Algorithm:
|
||||
|
||||
1. Parse:
|
||||
|
||||
* `.dynsym`, `.dynstr` – dynamic symbol table
|
||||
* `.rela.plt` / `.rel.plt` – relocation entries for PLT
|
||||
* `.got.plt` / `.got` – PLT’s GOT
|
||||
* `DT_NEEDED` entries – list of linked shared objects and their sonames
|
||||
|
||||
2. For each relocation of type `R_*_JUMP_SLOT`:
|
||||
|
||||
* It applies to an entry in the PLT GOT; the PLT stub jumps through that GOT entry (and in code built with `-fno-plt`, the CALL instruction reads it directly).
|
||||
* Relocation gives you:
|
||||
|
||||
* Offset in GOT (`r_offset`)
|
||||
* Symbol index (`r_info` → symbol) → dynamic symbol (`ElfSymbol`)
|
||||
* Symbol name, type (FUNC), binding, etc.
|
||||
|
||||
3. Link GOT entries to call sites:
|
||||
|
||||
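* For each `RawCallSite` with `MemoryTargetAddress`, check if that address falls inside `.got.plt` (or `.got`). For a `DirectTargetAddress` that lands on a `.plt` stub, use the GOT entry that the stub jumps through. If a GOT entry is found: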
|
||||
|
||||
* Find relocation whose `r_offset` equals that GOT entry offset.
|
||||
* That tells you which **symbol** is being called.
|
||||
|
||||
4. Determine provider module:
|
||||
|
||||
* From the symbol’s `st_name` and `DT_NEEDED` list, decide which shared object is expected to define it (an approximation is: first DT_NEEDED that provides that name).
|
||||
* Map DT_NEEDED → `ModuleId` (you’ll have loaded these modules separately, or you can create “placeholder modules” if they’re not in the container image).
|
||||
|
||||
5. Create edges:
|
||||
|
||||
* Create/find `FunctionNode` for the **imported symbol** in provider module.
|
||||
* Add `CallEdge` from caller function to imported function, `EdgeKind = ImportCall`, `Evidence = "ELF.R_X86_64_JUMP_SLOT"` (or arch-specific).
|
||||
|
||||
This yields edges like:
|
||||
|
||||
* `myapp:main` → `libssl.so.1.1:SSL_read`
|
||||
* `libfoo.so:foo` → `libc.so.6:malloc`
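
Step 2 of the ELF algorithm can be sketched as follows for the ELF64, little-endian, x86-64 case only. It walks raw `.rela.plt` bytes (24-byte `Elf64_Rela` entries) and maps each GOT slot address to a dynamic symbol index. The class name `ElfJumpSlots` is hypothetical; with LibObjectFile you would read the same fields from its relocation model instead of raw bytes.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

public static class ElfJumpSlots
{
    public const uint R_X86_64_JUMP_SLOT = 7;

    // Elf64_Rela: r_offset (8 bytes), r_info (8 bytes), r_addend (8 bytes).
    private const int RelaEntrySize = 24;

    // Returns GOT slot address (r_offset) -> .dynsym index for every
    // JUMP_SLOT relocation; other relocation types are ignored.
    public static Dictionary<ulong, uint> Map(ReadOnlySpan<byte> relaPlt)
    {
        var slots = new Dictionary<ulong, uint>();
        for (int off = 0; off + RelaEntrySize <= relaPlt.Length; off += RelaEntrySize)
        {
            ulong rOffset = BinaryPrimitives.ReadUInt64LittleEndian(relaPlt.Slice(off, 8));
            ulong rInfo = BinaryPrimitives.ReadUInt64LittleEndian(relaPlt.Slice(off + 8, 8));

            uint type = (uint)(rInfo & 0xFFFFFFFF); // ELF64_R_TYPE
            uint symIndex = (uint)(rInfo >> 32);    // ELF64_R_SYM

            if (type == R_X86_64_JUMP_SLOT) slots[rOffset] = symIndex;
        }
        return slots;
    }
}
```

The symbol index is then looked up in `.dynsym`/`.dynstr` to get the name, and the name is matched against the `DT_NEEDED` libraries as described in step 4.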
|
||||
|
||||
### 4.2 PE
|
||||
|
||||
Goal: map call sites that go via the Import Address Table (IAT) to imported functions in DLLs.
|
||||
|
||||
Algorithm:
|
||||
|
||||
1. Parse:
|
||||
|
||||
* `IMAGE_IMPORT_DESCRIPTOR[]` – each for a DLL name.
|
||||
* Original thunk table (INT) – names/ordinals of imported symbols.
|
||||
* IAT – where the loader writes function addresses at runtime.
|
||||
|
||||
2. For each import entry:
|
||||
|
||||
* Determine:
|
||||
|
||||
* DLL name (`Name`)
|
||||
* Function name or ordinal (from INT)
|
||||
* IAT slot address (RVA)
|
||||
|
||||
3. Link IAT slots to call sites:
|
||||
|
||||
* For each `RawCallSite` with `MemoryTargetAddress`:
|
||||
|
||||
* Check if this address equals the VA of an IAT slot.
|
||||
* If yes, the call site is effectively calling that imported function.
|
||||
|
||||
4. Determine provider module:
|
||||
|
||||
* The DLL name gives you a target module (e.g. `KERNEL32.dll` → `ModuleId`).
|
||||
* Ensure that DLL is represented as a `BinaryModule` or a “placeholder” if not present in image.
|
||||
|
||||
5. Create edges:
|
||||
|
||||
* Create/find `FunctionNode` for imported function in provider module.
|
||||
* Add `CallEdge` with `EdgeKind = ImportCall` and `Evidence = "PE.IAT"` (or `"PE.DelayLoad"` if using delay load descriptors).
|
||||
|
||||
Example:
|
||||
|
||||
* `myservice.exe:Start` → `SSPICLI.dll:AcquireCredentialsHandleW`
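
For PE32+ images, steps 2 and 3 come down to decoding the 64-bit thunk values and computing each IAT slot's VA. The sketch below assumes you have already read the import lookup table (INT) for one descriptor into a list of `ulong` values; `IatEntry` and `PeImports` are hypothetical names. Resolving `HintNameRva` to the actual name string (skip the 2-byte hint, then read the ASCII name) is left to the loader.

```csharp
using System.Collections.Generic;

public readonly record struct IatEntry(ulong SlotVa, bool ByOrdinal, ushort Ordinal, uint HintNameRva);

public static class PeImports
{
    // PE32+: bit 63 set means "import by ordinal".
    private const ulong OrdinalFlag64 = 0x8000000000000000UL;

    // imageBase:     preferred image base from the optional header
    // firstThunkRva: the descriptor's FirstThunk (RVA of the IAT)
    // thunks:        IMAGE_THUNK_DATA64 values from the INT, in order
    public static List<IatEntry> Decode(ulong imageBase, uint firstThunkRva, IReadOnlyList<ulong> thunks)
    {
        var entries = new List<IatEntry>();
        for (int i = 0; i < thunks.Count; i++)
        {
            ulong thunk = thunks[i];
            if (thunk == 0) break; // zero thunk terminates the table

            // IAT slot i sits 8 bytes after slot i - 1.
            ulong slotVa = imageBase + firstThunkRva + (ulong)i * 8;
            bool byOrdinal = (thunk & OrdinalFlag64) != 0;

            entries.Add(byOrdinal
                ? new IatEntry(slotVa, true, (ushort)(thunk & 0xFFFF), 0)
                : new IatEntry(slotVa, false, 0, (uint)(thunk & 0x7FFFFFFF)));
        }
        return entries;
    }
}
```

A `RawCallSite.MemoryTargetAddress` equal to one of the `SlotVa` values is then an import call to that entry's function in the descriptor's DLL.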
|
||||
|
||||
### 4.3 Mach-O
|
||||
|
||||
Goal: map stub calls via `__TEXT,__stubs` / `__DATA,__la_symbol_ptr` (and / or chained fixups) to symbols in dependent dylibs.
|
||||
|
||||
Algorithm (for classic dyld opcodes first; chained fixups are covered afterwards):
|
||||
|
||||
1. Parse:
|
||||
|
||||
* Load commands:
|
||||
|
||||
* `LC_SYMTAB`, `LC_DYSYMTAB`
|
||||
* `LC_LOAD_DYLIB` (to know dependent dylibs)
|
||||
* `LC_FUNCTION_STARTS` (for seeding functions)
|
||||
* `LC_DYLD_INFO` (rebase/bind/lazy bind)
|
||||
* `__TEXT,__stubs` – stub code
|
||||
* `__DATA,__la_symbol_ptr` (or `__DATA_CONST,__la_symbol_ptr`) – lazy pointer table
|
||||
* **Indirect symbol table** – maps slot indices to symbol table indices
|
||||
|
||||
2. Stub → la_symbol_ptr mapping:
|
||||
|
||||
* Stubs are small functions (usually a few instructions) that indirect through the corresponding `la_symbol_ptr` entry.
|
||||
* For each stub function:
|
||||
|
||||
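* Determine which la_symbol_ptr entry it uses: the stub index is the stub's offset into `__stubs` divided by the stub size (the section's `reserved2` field), and the section's `reserved1` field gives the first index into the indirect symbol table.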
|
||||
* From the indirect symbol table, find which dynamic symbol that la_symbol_ptr entry corresponds to.
|
||||
|
||||
* This gives you the symbol name and the dylib ordinal (the index into the `LC_LOAD_DYLIB` commands).
|
||||
|
||||
3. Link stub call sites:
|
||||
|
||||
* In disassembly, treat calls to these stub functions as **import calls**.
|
||||
* For each call instruction `CALL stub_function`:
|
||||
|
||||
* `RawCallSite.DirectTargetAddress` lies inside `__TEXT,__stubs`.
|
||||
* Resolve stub → la_symbol_ptr → symbol → dylib.
|
||||
|
||||
4. Determine provider module:
|
||||
|
||||
* From dylib ordinal and load commands, get the path / install name of dylib (`libssl.1.1.dylib`, etc.).
|
||||
* Map that to a `ModuleId` in your module set.
|
||||
|
||||
5. Create edges:
|
||||
|
||||
* Create/find imported `FunctionNode` in provider module.
|
||||
* Add `CallEdge` from caller to that function with `EdgeKind = ImportCall`, `Evidence = "MachO.IndirectSymbol"`.
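A rough sketch of the stub → indirect symbol → dylib lookup from steps 2 to 4 follows. `StubSection` and `MachOSymbol` are hypothetical, simplified inputs; a real loader would fill them from the `__stubs` section header, `LC_DYSYMTAB`, and `LC_SYMTAB`. The sketch assumes fixed-size stubs and two-level namespace symbols that carry a library ordinal.

```csharp
using System.Collections.Generic;

// Hypothetical, simplified inputs (names are assumptions, not an existing API).
// IndirectStartIndex = section.reserved1, StubSize = section.reserved2.
public sealed record StubSection(ulong StartVa, ulong Size, uint StubSize, uint IndirectStartIndex);
public sealed record MachOSymbol(string Name, int LibraryOrdinal);

public static class MachOStubResolver
{
    // Special indirect symbol table values: INDIRECT_SYMBOL_LOCAL / INDIRECT_SYMBOL_ABS.
    private const uint IndirectSymbolLocal = 0x80000000;
    private const uint IndirectSymbolAbs = 0x40000000;

    // Returns the imported symbol and providing dylib for a call target inside __stubs,
    // or null when the target is not a resolvable stub.
    public static (string Symbol, string Dylib)? Resolve(
        ulong callTargetVa,
        StubSection stubs,
        IReadOnlyList<uint> indirectSymbols,
        IReadOnlyList<MachOSymbol> symbols,
        IReadOnlyList<string> dylibs)
    {
        if (stubs.StubSize == 0 ||
            callTargetVa < stubs.StartVa ||
            callTargetVa >= stubs.StartVa + stubs.Size)
            return null;

        // Stub i corresponds to indirect symbol table entry reserved1 + i.
        ulong stubIndex = (callTargetVa - stubs.StartVa) / stubs.StubSize;
        ulong tableIndex = stubs.IndirectStartIndex + stubIndex;
        if (tableIndex >= (ulong)indirectSymbols.Count)
            return null;

        uint symbolIndex = indirectSymbols[(int)tableIndex];
        if ((symbolIndex & (IndirectSymbolLocal | IndirectSymbolAbs)) != 0)
            return null; // local or absolute entry: not an import
        if (symbolIndex >= symbols.Count)
            return null;

        var symbol = symbols[(int)symbolIndex];

        // Library ordinals are 1-based; 0 and special values are not handled in this sketch.
        if (symbol.LibraryOrdinal < 1 || symbol.LibraryOrdinal > dylibs.Count)
            return null;

        return (symbol.Name, dylibs[symbol.LibraryOrdinal - 1]);
    }
}
```

The returned dylib name is what you map to a `ModuleId` in step 4; a `null` result means the call should stay an ordinary (or unresolved) direct call.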
|
||||
|
||||
For **chained fixups** (`LC_DYLD_CHAINED_FIXUPS`), you’ll compute a similar mapping, but by walking the fixup chains and the imports table instead of decoding the traditional bind / lazy-bind opcode streams. The key is still:
|
||||
|
||||
* Map a stub or function to a **fixup** entry.
|
||||
* From fixup, determine the symbol and dylib.
|
||||
* Then connect call-site → imported function.
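As a starting point for the fixup → symbol/dylib step, here is a small sketch that decodes one entry of the chained-fixups imports table. It assumes the plain 32-bit `DYLD_CHAINED_IMPORT` layout (8-bit library ordinal, 1-bit weak flag, 23-bit name offset) as I understand it from Apple's `mach-o/fixup-chains.h`, and the same special-ordinal convention dyld uses (values above `0xF0` are negative special ordinals). Verify both against the header before relying on it. `ChainedImport` and `ChainedFixups` are hypothetical names.

```csharp
// Hypothetical decoded form of one imports-table entry.
public sealed record ChainedImport(int LibOrdinal, bool WeakImport, uint NameOffset);

public static class ChainedFixups
{
    // Decodes a raw little-endian 32-bit DYLD_CHAINED_IMPORT entry (assumed layout:
    // bits 0-7 lib_ordinal, bit 8 weak_import, bits 9-31 name_offset).
    public static ChainedImport DecodeImport(uint raw)
    {
        uint low = raw & 0xFF;

        // Assumption: values above 0xF0 are special negative ordinals
        // (for example -2 for flat lookup); everything else is a plain 1-based ordinal.
        int libOrdinal = low > 0xF0 ? unchecked((sbyte)low) : (int)low;

        bool weakImport = ((raw >> 8) & 1) != 0;
        uint nameOffset = raw >> 9; // offset into the fixups symbol string pool

        return new ChainedImport(libOrdinal, weakImport, nameOffset);
    }
}
```

A positive `LibOrdinal` indexes the dylib load commands exactly as in the classic case, so the rest of the edge-building logic can be shared.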
|
||||
|
||||
---
|
||||
|
||||
## 5. Reachability queries
|
||||
|
||||
Once the graph is built, reachability is “just graph algorithms” + mapping back to purls.
|
||||
|
||||
### 5.1 Roots
|
||||
|
||||
Decide which functions are your **roots**:
|
||||
|
||||
* Binary entrypoints:
|
||||
|
||||
* ELF: `_start`, `main`, constructors (`.init_array`)
|
||||
* PE: AddressOfEntryPoint, registered service entrypoints
|
||||
* Mach-O: `_main` (located via `LC_MAIN`), constructors (`__mod_init_func`)
|
||||
* Optionally, any exported API function that a container orchestrator or plugin system will call.
|
||||
|
||||
Mark them as `FunctionNode.IsRoot = true` and create synthetic edges from a special root node if you want:
|
||||
|
||||
```csharp
|
||||
var syntheticRoot = new FunctionNode
|
||||
{
|
||||
Id = 0, // reserved: must not collide with any real function id
|
||||
Name = "<root>",
|
||||
IsRoot = true,
|
||||
// Module, Purl can be special markers
|
||||
};
|
||||
|
||||
foreach (var fn in allFunctions.Where(f => f.IsRoot))
|
||||
{
|
||||
edges.Add(new CallEdge
|
||||
{
|
||||
FromId = syntheticRoot.Id,
|
||||
ToId = fn.Id,
|
||||
Kind = EdgeKind.SyntheticRoot,
|
||||
Evidence = "Root"
|
||||
});
|
||||
}
|
||||
```
|
||||
|
||||
### 5.2 Reachability algorithm (function-level)
|
||||
|
||||
Traverse from the root node(s) with BFS or DFS; the code below uses an explicit stack (iterative DFS):
|
||||
|
||||
```csharp
|
||||
public sealed class ReachabilityResult
|
||||
{
|
||||
public HashSet<int> ReachableFunctions { get; init; } = new(); // init accessor so the object initializer below compiles
|
||||
}
|
||||
|
||||
public ReachabilityResult ComputeReachableFunctions(CallGraph graph, IEnumerable<int> rootIds)
|
||||
{
|
||||
var visited = new HashSet<int>();
|
||||
var stack = new Stack<int>();
|
||||
|
||||
foreach (var root in rootIds)
|
||||
{
|
||||
if (visited.Add(root))
|
||||
stack.Push(root);
|
||||
}
|
||||
|
||||
while (stack.Count > 0)
|
||||
{
|
||||
var current = stack.Pop();
|
||||
|
||||
if (!graph.OutEdges.TryGetValue(current, out var edges))
|
||||
continue;
|
||||
|
||||
foreach (var edge in edges)
|
||||
{
|
||||
if (visited.Add(edge.ToId))
|
||||
stack.Push(edge.ToId);
|
||||
}
|
||||
}
|
||||
|
||||
return new ReachabilityResult { ReachableFunctions = visited };
|
||||
}
|
||||
```
|
||||
|
||||
### 5.3 Project reachability to modules and purls
|
||||
|
||||
Given `ReachableFunctions`:
|
||||
|
||||
```csharp
|
||||
public sealed class ReachabilityProjection
|
||||
{
|
||||
public HashSet<ModuleId> ReachableModules { get; } = new();
|
||||
public HashSet<Purl> ReachablePurls { get; } = new();
|
||||
}
|
||||
|
||||
public ReachabilityProjection ProjectToModulesAndPurls(CallGraph graph, ReachabilityResult result)
|
||||
{
|
||||
var projection = new ReachabilityProjection();
|
||||
|
||||
foreach (var fnId in result.ReachableFunctions)
|
||||
{
|
||||
if (!graph.Nodes.TryGetValue(fnId, out var fn))
|
||||
continue;
|
||||
|
||||
projection.ReachableModules.Add(fn.Module);
|
||||
projection.ReachablePurls.Add(fn.Purl);
|
||||
}
|
||||
|
||||
return projection;
|
||||
}
|
||||
```
|
||||
|
||||
Now you can answer questions like:
|
||||
|
||||
* “Is any code from purl `pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1` reachable from the container entrypoint?”
|
||||
* “Which purls are reachable at all?”
|
||||
|
||||
### 5.4 Vulnerability reachability
|
||||
|
||||
Assume you’ve mapped each vulnerability to:
|
||||
|
||||
* `Purl` (where it lives)
|
||||
* `AffectedFunctionNames` (symbols; optionally demangled)
|
||||
|
||||
You can implement:
|
||||
|
||||
```csharp
|
||||
public sealed class VulnerabilitySink
|
||||
{
|
||||
public string VulnerabilityId { get; init; } // CVE-...
|
||||
public Purl Purl { get; init; }
|
||||
public string FunctionName { get; init; } // symbol name or demangled
|
||||
}
|
||||
```
|
||||
|
||||
Resolution algorithm:
|
||||
|
||||
1. For each `VulnerabilitySink`, find all `FunctionNode` with:
|
||||
|
||||
* `node.Purl == sink.Purl` and
|
||||
* `node.Name` or `node.DemangledName` matches `sink.FunctionName`.
|
||||
|
||||
2. For each such node, check `ReachableFunctions.Contains(node.Id)`.
|
||||
|
||||
3. Build a `Finding` object:
|
||||
|
||||
```csharp
|
||||
public sealed class VulnerabilityFinding
|
||||
{
|
||||
public string VulnerabilityId { get; init; }
|
||||
public Purl Purl { get; init; }
|
||||
public bool IsReachable { get; init; }
|
||||
public List<int> SinkFunctionIds { get; init; } = new();
|
||||
}
|
||||
```
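The three resolution steps can be sketched as below. To keep the example self-contained it uses simplified stand-ins (`FnNode`, `Sink`, `Finding`, with purls as plain strings) instead of the real `FunctionNode`, `VulnerabilitySink`, and `VulnerabilityFinding`; treat those names and the exact-match rule as assumptions.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for the model types (purls as plain strings).
public sealed record FnNode(int Id, string Purl, string Name, string? DemangledName);
public sealed record Sink(string VulnerabilityId, string Purl, string FunctionName);
public sealed record Finding(string VulnerabilityId, string Purl, bool IsReachable, List<int> SinkFunctionIds);

public static class SinkResolver
{
    public static List<Finding> Resolve(
        IEnumerable<FnNode> nodes,
        IEnumerable<Sink> sinks,
        ISet<int> reachableFunctions)
    {
        var nodeList = nodes.ToList();
        var findings = new List<Finding>();

        foreach (var sink in sinks)
        {
            // Step 1: same purl, and either the raw or the demangled name matches.
            var sinkIds = nodeList
                .Where(n => n.Purl == sink.Purl &&
                            (n.Name == sink.FunctionName || n.DemangledName == sink.FunctionName))
                .Select(n => n.Id)
                .ToList();

            // Step 2: the vulnerability is reachable if any matching function is.
            bool isReachable = sinkIds.Any(reachableFunctions.Contains);

            // Step 3: one finding per sink, keeping all matched function ids.
            findings.Add(new Finding(sink.VulnerabilityId, sink.Purl, isReachable, sinkIds));
        }

        return findings;
    }
}
```

A finding with an empty `SinkFunctionIds` list means the symbol was not found at all (stripped or inlined, for example), which is worth reporting separately from "found but unreachable".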
|
||||
|
||||
If you also want **path evidence**, run a shortest-path search (BFS with a predecessor map) from the root to each reachable sink and store the resulting sequence of `FunctionNode.Id`s.
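A minimal sketch of that predecessor-map search, assuming the adjacency is available as a plain `int → List<int>` map (the real `CallGraph.OutEdges` holds `CallEdge` objects, so you would read `edge.ToId` instead). `PathFinder` is a hypothetical helper name.

```csharp
#nullable enable
using System.Collections.Generic;

public static class PathFinder
{
    // Returns the shortest call path from rootId to sinkId (inclusive), or null if none exists.
    public static List<int>? ShortestPath(
        IReadOnlyDictionary<int, List<int>> outEdges, int rootId, int sinkId)
    {
        // predecessor[n] = node from which n was first discovered; doubles as the visited set.
        var predecessor = new Dictionary<int, int> { [rootId] = rootId };
        var queue = new Queue<int>();
        queue.Enqueue(rootId);

        while (queue.Count > 0)
        {
            var current = queue.Dequeue();
            if (current == sinkId)
                break;

            if (!outEdges.TryGetValue(current, out var targets))
                continue;

            foreach (var next in targets)
            {
                if (predecessor.ContainsKey(next))
                    continue;
                predecessor[next] = current;
                queue.Enqueue(next);
            }
        }

        if (!predecessor.ContainsKey(sinkId))
            return null;

        // Walk back from the sink to the root, then reverse.
        var path = new List<int>();
        for (var node = sinkId; node != rootId; node = predecessor[node])
            path.Add(node);
        path.Add(rootId);
        path.Reverse();
        return path;
    }
}
```

Because BFS discovers nodes in order of distance, the first path found is a shortest one in number of calls, which keeps the evidence compact.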
|
||||
|
||||
---
|
||||
|
||||
## 6. Purl edges (derived graph)
|
||||
|
||||
For reporting and analytics, it’s useful to produce a **purl-level dependency graph**.
|
||||
|
||||
Given `CallGraph`:
|
||||
|
||||
```csharp
|
||||
public PurlGraphView BuildPurlGraph(CallGraph graph)
|
||||
{
|
||||
var edgesByPair = new Dictionary<(Purl From, Purl To), PurlEdge>();
|
||||
|
||||
foreach (var kv in graph.OutEdges)
|
||||
{
|
||||
var fromFn = graph.Nodes[kv.Key];
|
||||
|
||||
foreach (var edge in kv.Value)
|
||||
{
|
||||
var toFn = graph.Nodes[edge.ToId];
|
||||
|
||||
if (fromFn.Purl.Equals(toFn.Purl))
|
||||
continue; // intra-purl, skip if you only care about inter-purl
|
||||
|
||||
var key = (fromFn.Purl, toFn.Purl);
|
||||
if (!edgesByPair.TryGetValue(key, out var pe))
|
||||
{
|
||||
pe = new PurlEdge
|
||||
{
|
||||
From = fromFn.Purl,
|
||||
To = toFn.Purl,
|
||||
SupportingCalls = new List<(int, int)>()
|
||||
};
|
||||
edgesByPair[key] = pe;
|
||||
}
|
||||
|
||||
pe.SupportingCalls.Add((fromFn.Id, toFn.Id));
|
||||
}
|
||||
}
|
||||
|
||||
var adj = new Dictionary<Purl, HashSet<Purl>>();
|
||||
|
||||
foreach (var kv in edgesByPair)
|
||||
{
|
||||
var (from, to) = kv.Key;
|
||||
if (!adj.TryGetValue(from, out var list))
|
||||
{
|
||||
list = new HashSet<Purl>();
|
||||
adj[from] = list;
|
||||
}
|
||||
list.Add(to);
|
||||
}
|
||||
|
||||
return new PurlGraphView
|
||||
{
|
||||
Adjacent = adj,
|
||||
Edges = edgesByPair.Values.ToList()
|
||||
};
|
||||
}
|
||||
```
|
||||
|
||||
This gives you:
|
||||
|
||||
* A coarse view of runtime dependencies between purls (“Purl A calls into Purl B”).
|
||||
* Enough context to emit purl-level VEX or to reason about trust at package granularity.
|
||||
|
||||
---
|
||||
|
||||
## 7. JSON output and SBOM integration
|
||||
|
||||
### 7.1 JSON shape (high level)
|
||||
|
||||
You can emit a composite document:
|
||||
|
||||
```json
|
||||
{
|
||||
"image": "registry.example.com/app@sha256:...",
|
||||
"modules": [
|
||||
{
|
||||
"moduleId": { "path": "/usr/lib/libssl.so.1.1", "format": "Elf" },
|
||||
"purl": "pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1",
|
||||
"arch": "x86_64"
|
||||
}
|
||||
],
|
||||
"functions": [
|
||||
{
|
||||
"id": 42,
|
||||
"name": "SSL_do_handshake",
|
||||
"demangledName": null,
|
||||
"module": { "path": "/usr/lib/libssl.so.1.1", "format": "Elf" },
|
||||
"purl": "pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1",
|
||||
"address": "0x401020",
|
||||
"exported": true
|
||||
}
|
||||
],
|
||||
"edges": [
|
||||
{
|
||||
"from": 10,
|
||||
"to": 42,
|
||||
"kind": "ImportCall",
|
||||
"evidence": "ELF.R_X86_64_JUMP_SLOT"
|
||||
}
|
||||
],
|
||||
"reachability": {
|
||||
"roots": [1],
|
||||
"reachableFunctions": [1,10,42]
|
||||
},
|
||||
"purlGraph": {
|
||||
"edges": [
|
||||
{
|
||||
"from": "pkg:generic/myapp@1.0.0",
|
||||
"to": "pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1",
|
||||
"supportingCalls": [[10,42]]
|
||||
}
|
||||
]
|
||||
},
|
||||
"vulnerabilities": [
|
||||
{
|
||||
"id": "CVE-2024-XXXX",
|
||||
"purl": "pkg:deb/ubuntu/openssl@1.1.1w-0ubuntu1",
|
||||
"sinkFunctions": [42],
|
||||
"reachable": true,
|
||||
"paths": [
|
||||
[1, 10, 42]
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
```
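One way to emit this shape is with `System.Text.Json` and a camelCase naming policy. The sketch below covers only a slice of the document (`image`, `edges`, `reachability`); the DTO names are hypothetical and the remaining sections would follow the same pattern.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical DTOs mirroring part of the JSON shape above.
public sealed record EdgeDto(int From, int To, string Kind, string Evidence);
public sealed record ReachabilityDto(List<int> Roots, List<int> ReachableFunctions);
public sealed record ReportDto(string Image, List<EdgeDto> Edges, ReachabilityDto Reachability);

public static class ReportWriter
{
    // camelCase property names ("reachableFunctions") to match the schema; indented for readability.
    private static readonly JsonSerializerOptions Options = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        WriteIndented = true
    };

    public static string ToJson(ReportDto report) => JsonSerializer.Serialize(report, Options);
}
```

Keeping export DTOs separate from the in-memory graph types lets the JSON schema stay stable while the analyzer's internals change.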
|
||||
|
||||
### 7.2 Purl resolution
|
||||
|
||||
Implement an `IPurlResolver` interface:
|
||||
|
||||
```csharp
|
||||
public interface IPurlResolver
|
||||
{
|
||||
Purl ResolveForModule(string filePath, byte[] contentHash);
|
||||
}
|
||||
```
|
||||
|
||||
Possible implementations:
|
||||
|
||||
* `SbomPurlResolver` – given a CycloneDX/SPDX SBOM for the image, match by path or checksum.
|
||||
* `LinuxPackagePurlResolver` – read `/var/lib/dpkg/status` / rpm DB in the filesystem.
|
||||
* `GenericPurlResolver` – fallback: `pkg:generic/<hash>`.
|
||||
|
||||
You call the resolver in your loaders so that **every `BinaryModule` has a purl** and thus every `FunctionNode` has a purl.
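The implementations above compose naturally as a fallback chain: try the most precise source first and end with the generic hash-based purl. The sketch below is one possible arrangement, not the required design. `Purl` is reduced to a string wrapper, and `IPurlLookup`, `PathTableLookup`, and `FallbackPurlResolver` are hypothetical names; the nullable `TryResolve` is an assumption added so that a lookup can signal "no match".

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the real Purl type.
public sealed record Purl(string Value);

// A single lookup source; returns null when it has no answer for the module.
public interface IPurlLookup
{
    Purl? TryResolve(string filePath, byte[] contentHash);
}

// Example source: a path -> purl table, e.g. built from an SBOM or a package database.
public sealed class PathTableLookup : IPurlLookup
{
    private readonly IReadOnlyDictionary<string, Purl> _byPath;

    public PathTableLookup(IReadOnlyDictionary<string, Purl> byPath) => _byPath = byPath;

    public Purl? TryResolve(string filePath, byte[] contentHash) =>
        _byPath.TryGetValue(filePath, out var purl) ? purl : null;
}

public sealed class FallbackPurlResolver
{
    private readonly IReadOnlyList<IPurlLookup> _lookups;

    public FallbackPurlResolver(params IPurlLookup[] lookups) => _lookups = lookups;

    // Always returns a purl: the first lookup hit, or pkg:generic/<hash> as a last resort.
    public Purl ResolveForModule(string filePath, byte[] contentHash)
    {
        foreach (var lookup in _lookups)
        {
            var purl = lookup.TryResolve(filePath, contentHash);
            if (purl is not null)
                return purl;
        }

        return new Purl("pkg:generic/" + Convert.ToHexString(contentHash).ToLowerInvariant());
    }
}
```

Ordering the lookups as SBOM, then package database, then generic keeps the most authoritative mapping in front while still guaranteeing that no module ends up without a purl.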
|
||||
|
||||
---
|
||||
|
||||
## 8. Concrete implementation tasks for your team
|
||||
|
||||
1. **Data model & interfaces**
|
||||
|
||||
* Implement `ModuleId`, `FunctionNode`, `CallEdge`, `CallGraph`.
|
||||
* Define `RawCallSite`, `BinaryModule`, and `IPurlResolver`.
|
||||
|
||||
2. **Loaders**
|
||||
|
||||
* `ElfLoader`: fill symbols, dynamic relocations (PLT), DT_NEEDED, etc.
|
||||
* `PeLoader`: import descriptors, IAT, delay-load.
|
||||
* `MachOLoader`: load commands, stubs, la_symbol_ptr, indirect symbols / chained fixups.
|
||||
|
||||
3. **Disassembly**
|
||||
|
||||
* `X86Disassembler` (iced) and `Arm64Disassembler` (Disarm or port).
|
||||
* Function detection and `RawCallSite` extraction.
|
||||
|
||||
4. **CallGraphBuilder**
|
||||
|
||||
* Build intra-module edges from direct calls.
|
||||
* Build inter-module edges using the format-specific rules above.
|
||||
* Construct final `CallGraph` with adjacency maps and purl mappings.
|
||||
|
||||
5. **Reachability**
|
||||
|
||||
* Implement BFS/DFS from root functions.
|
||||
* Projection to modules + purls.
|
||||
* Vulnerability sink resolution & path reconstruction.
|
||||
|
||||
6. **Export**
|
||||
|
||||
* JSON serializer for the schema above.
|
||||
* Optional: purl-level summary / VEX generator.
|
||||
|
||||
---
|
||||
|
||||
If you want, next step I can do a **more concrete design for `CallGraphBuilder`** (including per-format helper classes with method signatures) or a **C# skeleton** for the `ElfImportResolver`, `PeImportResolver`, and `MachOStubResolver` that plug directly into this plan.
|
||||