# DECISION-NATIVE-TOOLCHAIN-401: Native Lifter and Demangler Selection

> **Status:** Published
> **Version:** 1.0.0
> **Published:** 2025-12-13
> **Owners:** Scanner Guild, Platform Guild
> **Unblocks:** SCANNER-NATIVE-401-015, SCAN-REACH-401-009

## Decision Summary

This document records the decisions for native binary analysis toolchain selection, enabling implementation of native symbol extraction, callgraph generation, and demangling for ELF/PE/Mach-O binaries.

---

## 1. Component Decisions

### 1.1 ELF Parser

**Decision:** Use custom pure-C# ELF parser

**Rationale:**
- No native dependencies, portable across platforms
- Already implemented in `StellaOps.Scanner.Analyzers.Native`
- Sufficient for symbol table, dynamic section, and relocation parsing
- Avoids licensing complexity of external libraries

**Implementation:** `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native/Internal/Elf/`

### 1.2 PE Parser

**Decision:** Use custom pure-C# PE parser

**Rationale:**
- No native dependencies
- Already implemented in `StellaOps.Scanner.Analyzers.Native`
- Handles import/export tables, Debug directory
- Compatible with air-gapped deployment

**Implementation:** `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native/Internal/Pe/`

### 1.3 Mach-O Parser

**Decision:** Use custom pure-C# Mach-O parser

**Rationale:**
- Consistent with ELF/PE approach
- No native dependencies
- Sufficient for symbol table and load commands

**Implementation:** `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native/Internal/MachO/`

### 1.4 Symbol Demangler

**Decision:** Use per-language managed demanglers with native fallback

| Language | Primary Demangler | Fallback |
|----------|-------------------|----------|
| C++ (Itanium ABI) | `Demangler.Net` (NuGet) | llvm-cxxfilt via P/Invoke |
| C++ (MSVC) | `UnDecorateSymbolName` wrapper | None (Windows-specific) |
| Rust | `rustc-demangle` port | rustfilt via P/Invoke |
| Swift | `swift-demangle` port | None |
| D | `dlang-demangler` port | None |

**Rationale:**
- Managed demanglers provide determinism and portability
- Native fallback only for edge cases
- No runtime dependency on external tools

**NuGet packages:**

```xml
```

### 1.5 Disassembler (Optional, for heuristic analysis)

**Decision:** Use Iced (x86/x64) + Capstone.NET (ARM/others)

| Architecture | Library | NuGet Package |
|--------------|---------|---------------|
| x86/x64 | Iced | `Iced` |
| ARM/ARM64 | Capstone.NET | `Capstone.NET` |
| Other | Skip disassembly | N/A |

**Rationale:**
- Iced is pure managed, no native deps for x86
- Capstone.NET wraps Capstone with native lib
- Disassembly is optional for heuristic edge detection

### 1.6 Callgraph Extraction

**Decision:** Static analysis only (no dynamic execution)

**Methods:**
1. Relocation-based: Extract call targets from relocations
2. Import/Export: Map import references to exports
3. Symbol-based: Direct and indirect call targets from symbol table
4. CFG heuristics: Basic block boundary detection (x86 only)

**No dynamic analysis:** Avoids execution risks, portable.

---

## 2. CI Toolchain Requirements

### 2.1 Build Requirements

| Component | Requirement | Notes |
|-----------|-------------|-------|
| .NET SDK | 10.0+ | Required for all builds |
| Native libs (optional) | Capstone 4.0+ | Only for ARM disassembly |
| Test binaries | Pre-built fixtures | No compiler dependency in CI |

### 2.2 Test Fixture Strategy

**Decision:** Ship pre-built binary fixtures, not source + compiler

**Rationale:**
- Deterministic: Same binary hash every run
- No compiler dependency in CI
- Smaller CI image footprint
- Cross-platform: Same fixtures on all runners

**Fixture locations:**

```
tests/Binary/fixtures/
  elf-x86_64/
    binary.elf           # Pre-built
    expected.json        # Expected graph
    expected-hashes.txt  # Determinism check
  pe-x64/
    binary.exe
    expected.json
  macho-arm64/
    binary.dylib
    expected.json
```

### 2.3 Fixture Generation (Offline)

Fixtures are generated offline by maintainers:

```bash
# Generate ELF fixture (run once, commit result)
cd tools/fixtures
./generate-elf-fixture.sh

# Verify hashes match
./verify-fixtures.sh
```

---

## 3. Demangling Contract

### 3.1 Output Format

Demangled names follow this format:

```json
{
  "symbol": {
    "mangled": "_ZN4Curl7Session4readEv",
    "demangled": "Curl::Session::read()",
    "source": "itanium-abi",
    "confidence": 1.0
  }
}
```

### 3.2 Demangling Sources

| Source | Description | Confidence |
|--------|-------------|------------|
| `itanium-abi` | Itanium C++ ABI (GCC/Clang) | 1.0 |
| `msvc` | Microsoft Visual C++ | 1.0 |
| `rust` | Rust mangling | 1.0 |
| `swift` | Swift mangling | 1.0 |
| `fallback` | Native tool fallback | 0.9 |
| `heuristic` | Pattern-based guess | 0.6 |
| `none` | No demangling available | 0.3 |

### 3.3 Failed Demangling

When demangling fails:

```json
{
  "symbol": {
    "mangled": "_Z15unknown_format",
    "demangled": null,
    "source": "none",
    "confidence": 0.3,
    "demangling_error": "Unrecognized mangling scheme"
  }
}
```

---

## 4. Callgraph Edge Types

### 4.1 Edge Type Enumeration

| Type | Description | Confidence |
|------|-------------|------------|
| `call` | Direct call instruction | 1.0 |
| `plt` | PLT/GOT indirect call | 0.95 |
| `indirect` | Indirect call (vtable, function pointer) | 0.6 |
| `init_array` | From init_array to function | 1.0 |
| `tls_callback` | TLS callback invocation | 1.0 |
| `exception` | Exception handler target | 0.8 |
| `switch` | Switch table target | 0.7 |
| `heuristic` | CFG-based heuristic | 0.4 |

### 4.2 Unknown Targets

When call target cannot be resolved:

```json
{
  "unknowns": [
    {
      "id": "unknown:call:0x12345678",
      "type": "unresolved_call_target",
      "source_id": "sym:binary:abc...",
      "call_site": "0x12345678",
      "reason": "Indirect call through register"
    }
  ]
}
```

---

## 5. Performance Constraints

### 5.1 Size Limits

| Metric | Limit | Action on Exceed |
|--------|-------|------------------|
| Binary size | 100 MB | Warn, proceed |
| Symbol count | 1M symbols | Chunk processing |
| Edge count | 10M edges | Chunk output |
| Memory usage | 4 GB | Stream processing |

### 5.2 Timeout Constraints

| Operation | Timeout | Action on Exceed |
|-----------|---------|------------------|
| ELF parse | 60s | Fail with partial |
| Demangle all | 120s | Truncate results |
| CFG analysis | 300s | Skip heuristics |
| Total analysis | 600s | Fail gracefully |

---

## 6. Integration Points

### 6.1 Scanner Plugin Interface

```csharp
public interface INativeAnalyzer : IAnalyzerPlugin
{
    Task AnalyzeAsync(
        Stream binaryStream,
        NativeAnalyzerOptions options,
        CancellationToken ct);
}
```

### 6.2 RichGraph Integration

Native analysis results feed into RichGraph:

```
NativeObservation → NativeReachabilityGraph → RichGraph nodes/edges
```

### 6.3 Signals Integration

Native symbols with runtime hits:

```
Signals runtime-facts + RichGraph → ReachabilityFact with confidence
```

---

## 7. Implementation Checklist

| Task | Status | Owner |
|------|--------|-------|
| ELF parser | Done | Scanner Guild |
| PE parser | Done | Scanner Guild |
| Mach-O parser | In Progress | Scanner Guild |
| C++ demangler | Done | Scanner Guild |
| Rust demangler | Pending | Scanner Guild |
| Callgraph builder | Done | Scanner Guild |
| Test fixtures | Partial | QA Guild |
| CI integration | Pending | DevOps Guild |

---

## 8. Related Documents

- [richgraph-v1 Contract](./richgraph-v1.md)
- [Build-ID Propagation](./buildid-propagation.md)
- [Init-Section Roots](./init-section-roots.md)
- [Binary Reachability Schema](../reachability/binary-reachability-schema.md)

---

## Changelog

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-12-13 | Platform Guild | Initial toolchain decision |
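
---

## Appendix: Fixture Verification Sketch

Sections 2.2 and 2.3 rely on pre-built fixtures producing the same binary hash on every run. The script below is a minimal sketch of how such a determinism check might look; it is not the contents of the real `verify-fixtures.sh`. The function names `verify_fixture_dir` and `verify_all` are hypothetical, GNU coreutils `sha256sum` is assumed to be available on the runner, and `expected-hashes.txt` is assumed to use the `sha256sum` line format (hex digest, two spaces, path relative to the fixture directory), which this document does not specify.

```bash
#!/usr/bin/env bash
# Hypothetical sketch only; the real tools/fixtures/verify-fixtures.sh may differ.
set -euo pipefail

# verify_fixture_dir DIR
# Assumption: DIR/expected-hashes.txt is in `sha256sum` format,
# i.e. "<hex digest>  <path relative to DIR>" on each line.
verify_fixture_dir() {
  local dir="$1"
  # Fixture directories without a hash file are skipped, not failed.
  if [[ ! -f "$dir/expected-hashes.txt" ]]; then
    echo "skip: $dir"
    return 0
  fi
  # The subshell keeps the caller's working directory unchanged.
  if (cd "$dir" && sha256sum --check --quiet expected-hashes.txt); then
    echo "ok: $dir"
  else
    echo "MISMATCH: $dir" >&2
    return 1
  fi
}

# verify_all ROOT
# Checks every immediate subdirectory of ROOT and returns non-zero
# if any of them has a hash mismatch.
verify_all() {
  local root="$1" failed=0 dir
  for dir in "$root"/*/; do
    [[ -d "$dir" ]] || continue
    verify_fixture_dir "${dir%/}" || failed=1
  done
  return "$failed"
}
```

Under these assumptions, a CI step could source the script and call `verify_all tests/Binary/fixtures`, failing the job on a non-zero exit status. Checking all directories before returning, rather than stopping at the first mismatch, lets a single run report every fixture that has drifted.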