DECISION-NATIVE-TOOLCHAIN-401: Native Lifter and Demangler Selection
Status: Published
Version: 1.0.0
Published: 2025-12-13
Owners: Scanner Guild, Platform Guild
Unblocks: SCANNER-NATIVE-401-015, SCAN-REACH-401-009
Decision Summary
This document records the decisions for native binary analysis toolchain selection, enabling implementation of native symbol extraction, callgraph generation, and demangling for ELF/PE/Mach-O binaries.
1. Component Decisions
1.1 ELF Parser
Decision: Use custom pure-C# ELF parser
Rationale:
- No native dependencies, portable across platforms
- Already implemented in StellaOps.Scanner.Analyzers.Native
- Sufficient for symbol table, dynamic section, and relocation parsing
- Avoids licensing complexity of external libraries
Implementation: src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native/Internal/Elf/
1.2 PE Parser
Decision: Use custom pure-C# PE parser
Rationale:
- No native dependencies
- Already implemented in StellaOps.Scanner.Analyzers.Native
- Handles import/export tables and the Debug directory
- Compatible with air-gapped deployment
Implementation: src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native/Internal/Pe/
1.3 Mach-O Parser
Decision: Use custom pure-C# Mach-O parser
Rationale:
- Consistent with ELF/PE approach
- No native dependencies
- Sufficient for symbol table and load commands
Implementation: src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native/Internal/MachO/
1.4 Symbol Demangler
Decision: Use per-language managed demanglers with native fallback
| Language | Primary Demangler | Fallback |
|---|---|---|
| C++ (Itanium ABI) | Demangler.Net (NuGet) | llvm-cxxfilt via P/Invoke |
| C++ (MSVC) | UnDecorateSymbolName wrapper | None (Windows-specific) |
| Rust | rustc-demangle port | rustfilt via P/Invoke |
| Swift | swift-demangle port | None |
| D | dlang-demangler port | None |
Rationale:
- Managed demanglers provide determinism and portability
- Native fallback is used only for edge cases the managed demangler cannot handle
- No runtime dependency on external tools on the primary (managed) path
NuGet packages:
<PackageReference Include="Demangler.Net" Version="1.0.0" />
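The primary-then-fallback order above can be sketched as a small chain. This is a minimal illustration, not the actual StellaOps implementation: `SymbolDemangler`, `DemangledName`, and `DemanglerChain` are hypothetical names, each demangler is modeled as a plain delegate, and the confidence values are taken from the table in section 3.2.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical types for illustration; not the actual StellaOps API.
public sealed record SymbolDemangler(string Source, Func<string, string?> TryDemangle);

public sealed record DemangledName(string Mangled, string? Demangled, string Source, double Confidence);

public sealed class DemanglerChain
{
    private readonly IReadOnlyList<SymbolDemangler> _managed;
    private readonly Func<string, string?>? _nativeFallback;

    public DemanglerChain(
        IReadOnlyList<SymbolDemangler> managed,
        Func<string, string?>? nativeFallback = null)
    {
        _managed = managed;
        _nativeFallback = nativeFallback;
    }

    public DemangledName Demangle(string mangled)
    {
        // 1. Managed demanglers first: deterministic and portable.
        foreach (var demangler in _managed)
        {
            var name = demangler.TryDemangle(mangled);
            if (name is not null)
                return new DemangledName(mangled, name, demangler.Source, 1.0);
        }

        // 2. Native fallback only when every managed demangler declined.
        var fallbackName = _nativeFallback?.Invoke(mangled);
        if (fallbackName is not null)
            return new DemangledName(mangled, fallbackName, "fallback", 0.9);

        // 3. Nothing recognized the symbol: keep the mangled name only.
        return new DemangledName(mangled, null, "none", 0.3);
    }
}
```

Modeling the fallback as an optional delegate keeps the chain usable on platforms where no native tool is present.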
1.5 Disassembler (Optional, for heuristic analysis)
Decision: Use Iced (x86/x64) + Capstone.NET (ARM/others)
| Architecture | Library | NuGet Package |
|---|---|---|
| x86/x64 | Iced | Iced |
| ARM/ARM64 | Capstone.NET | Capstone.NET |
| Other | Skip disassembly | N/A |
Rationale:
- Iced is fully managed, so x86/x64 disassembly needs no native dependencies
- Capstone.NET wraps the native Capstone library, so ARM disassembly requires a native dependency
- Disassembly is optional for heuristic edge detection
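The per-architecture choice in the table above could be expressed as a simple selector. This is a sketch under assumptions: the type names are hypothetical and the architecture labels are illustrative, since the source names only the architecture families. It follows the table, so any architecture not listed skips disassembly.

```csharp
// Hypothetical names for illustration; not the actual StellaOps API.
public enum DisassemblerBackend
{
    Iced,      // x86/x64, fully managed
    Capstone,  // ARM/ARM64 via Capstone.NET (needs the native Capstone library)
    None       // any other architecture: skip disassembly
}

public static class DisassemblerSelector
{
    // The architecture labels below are assumptions made for this sketch.
    public static DisassemblerBackend Select(string architecture) =>
        architecture.ToLowerInvariant() switch
        {
            "x86" or "x64" or "x86_64" => DisassemblerBackend.Iced,
            "arm" or "arm64" or "aarch64" => DisassemblerBackend.Capstone,
            _ => DisassemblerBackend.None
        };
}
```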
1.6 Callgraph Extraction
Decision: Static analysis only (no dynamic execution)
Methods:
- Relocation-based: Extract call targets from relocations
- Import/Export: Map import references to exports
- Symbol-based: Direct and indirect call targets from symbol table
- CFG heuristics: Basic block boundary detection (x86 only)
No dynamic analysis: this avoids the risks of executing the binary and keeps the analysis portable.
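As an illustration of the import/export method, the sketch below maps each import reference to the binary that exports the symbol and collects the rest as unresolved. All type names are hypothetical, and the export index (symbol name to defining binary) is assumed to have been built beforehand by the parsers. Imports are processed in ordinal sort order so that the output does not depend on input order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration; not the actual StellaOps API.
public sealed record ImportRef(string ImportingBinary, string Symbol);

public sealed record ResolvedEdge(string From, string To, string Symbol);

public static class ImportExportLinker
{
    // exports maps an exported symbol name to the binary that defines it.
    public static (List<ResolvedEdge> Edges, List<ImportRef> Unresolved) Link(
        IEnumerable<ImportRef> imports,
        IReadOnlyDictionary<string, string> exports)
    {
        var edges = new List<ResolvedEdge>();
        var unresolved = new List<ImportRef>();

        // Ordinal ordering keeps the result stable across runs and platforms.
        var ordered = imports
            .OrderBy(i => i.ImportingBinary, StringComparer.Ordinal)
            .ThenBy(i => i.Symbol, StringComparer.Ordinal);

        foreach (var import in ordered)
        {
            if (exports.TryGetValue(import.Symbol, out var provider))
                edges.Add(new ResolvedEdge(import.ImportingBinary, provider, import.Symbol));
            else
                unresolved.Add(import);
        }

        return (edges, unresolved);
    }
}
```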
2. CI Toolchain Requirements
2.1 Build Requirements
| Component | Requirement | Notes |
|---|---|---|
| .NET SDK | 10.0+ | Required for all builds |
| Native libs (optional) | Capstone 4.0+ | Only for ARM disassembly |
| Test binaries | Pre-built fixtures | No compiler dependency in CI |
2.2 Test Fixture Strategy
Decision: Ship pre-built binary fixtures, not source + compiler
Rationale:
- Deterministic: Same binary hash every run
- No compiler dependency in CI
- Smaller CI image footprint
- Cross-platform: Same fixtures on all runners
Fixture locations:
tests/Binary/fixtures/
  elf-x86_64/
    binary.elf            # Pre-built
    expected.json         # Expected graph
    expected-hashes.txt   # Determinism check
  pe-x64/
    binary.exe
    expected.json
  macho-arm64/
    binary.dylib
    expected.json
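A determinism check against recorded hashes might look like the sketch below. The source does not specify the hash algorithm or the layout of expected-hashes.txt, so SHA-256 with a lowercase hex digest is an assumption here, and `FixtureHash` is a hypothetical helper.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper; SHA-256 and hex encoding are assumptions of this sketch.
public static class FixtureHash
{
    // Computes the lowercase hex SHA-256 digest of a stream's remaining content.
    public static string Sha256Hex(Stream content)
    {
        using var sha = SHA256.Create();
        return Convert.ToHexString(sha.ComputeHash(content)).ToLowerInvariant();
    }

    // Compares the computed digest with a recorded one, ignoring case and
    // surrounding whitespace in the recorded value.
    public static bool Matches(Stream content, string expectedHex) =>
        string.Equals(Sha256Hex(content), expectedHex.Trim(), StringComparison.OrdinalIgnoreCase);
}
```

A test would open the fixture (or the analyzer output generated from it) and fail when `Matches` returns false.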
2.3 Fixture Generation (Offline)
Fixtures are generated offline by maintainers:
# Generate ELF fixture (run once, commit result)
cd tools/fixtures
./generate-elf-fixture.sh
# Verify hashes match
./verify-fixtures.sh
3. Demangling Contract
3.1 Output Format
Demangled names follow this format:
{
  "symbol": {
    "mangled": "_ZN4Curl7Session4readEv",
    "demangled": "Curl::Session::read()",
    "source": "itanium-abi",
    "confidence": 1.0
  }
}
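One way to produce this shape from C# is with System.Text.Json records, sketched below. The record names are hypothetical. The sketch assumes that `demangled` is always written (as null when demangling fails) and that `demangling_error` is omitted unless set, which matches the examples in this section but is not stated as a rule by the source.

```csharp
#nullable enable
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical record names; only the JSON property names come from the contract.
public sealed record SymbolName(
    [property: JsonPropertyName("mangled")] string Mangled,
    [property: JsonPropertyName("demangled")] string? Demangled,
    [property: JsonPropertyName("source")] string Source,
    [property: JsonPropertyName("confidence")] double Confidence,
    [property: JsonPropertyName("demangling_error")]
    [property: JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingNull)]
    string? DemanglingError = null);

public sealed record SymbolEnvelope(
    [property: JsonPropertyName("symbol")] SymbolName Symbol);

public static class SymbolJson
{
    // Wraps the symbol in the top-level "symbol" object and serializes it.
    public static string Serialize(SymbolName symbol) =>
        JsonSerializer.Serialize(new SymbolEnvelope(symbol));
}
```

Note that System.Text.Json chooses its own numeric formatting, so a consumer should compare `confidence` as a number rather than as text.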
3.2 Demangling Sources
| Source | Description | Confidence |
|---|---|---|
| itanium-abi | Itanium C++ ABI (GCC/Clang) | 1.0 |
| msvc | Microsoft Visual C++ | 1.0 |
| rust | Rust mangling | 1.0 |
| swift | Swift mangling | 1.0 |
| fallback | Native tool fallback | 0.9 |
| heuristic | Pattern-based guess | 0.6 |
| none | No demangling available | 0.3 |
3.3 Failed Demangling
When demangling fails:
{
  "symbol": {
    "mangled": "_Z15unknown_format",
    "demangled": null,
    "source": "none",
    "confidence": 0.3,
    "demangling_error": "Unrecognized mangling scheme"
  }
}
4. Callgraph Edge Types
4.1 Edge Type Enumeration
| Type | Description | Confidence |
|---|---|---|
| call | Direct call instruction | 1.0 |
| plt | PLT/GOT indirect call | 0.95 |
| indirect | Indirect call (vtable, function pointer) | 0.6 |
| init_array | From init_array to function | 1.0 |
| tls_callback | TLS callback invocation | 1.0 |
| exception | Exception handler target | 0.8 |
| switch | Switch table target | 0.7 |
| heuristic | CFG-based heuristic | 0.4 |
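The enumeration and its confidence values could be encoded as below. This is a sketch with hypothetical type names; the wire names and confidence values are copied from the table, and nothing here is confirmed as the project's actual representation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names for illustration; values mirror the edge type table.
public enum NativeEdgeType
{
    Call, Plt, Indirect, InitArray, TlsCallback, Exception, Switch, Heuristic
}

public static class NativeEdgeTypes
{
    private static readonly IReadOnlyDictionary<NativeEdgeType, (string WireName, double Confidence)> Table =
        new Dictionary<NativeEdgeType, (string, double)>
        {
            [NativeEdgeType.Call] = ("call", 1.0),
            [NativeEdgeType.Plt] = ("plt", 0.95),
            [NativeEdgeType.Indirect] = ("indirect", 0.6),
            [NativeEdgeType.InitArray] = ("init_array", 1.0),
            [NativeEdgeType.TlsCallback] = ("tls_callback", 1.0),
            [NativeEdgeType.Exception] = ("exception", 0.8),
            [NativeEdgeType.Switch] = ("switch", 0.7),
            [NativeEdgeType.Heuristic] = ("heuristic", 0.4),
        };

    // Name used in serialized output, e.g. "init_array".
    public static string WireName(NativeEdgeType type) => Table[type].WireName;

    // Default confidence assigned to an edge of this type.
    public static double Confidence(NativeEdgeType type) => Table[type].Confidence;
}
```

Keeping both columns in one table makes it harder for the wire name and the confidence of a type to drift apart.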
4.2 Unknown Targets
When a call target cannot be resolved, it is recorded as an entry under unknowns:
{
  "unknowns": [
    {
      "id": "unknown:call:0x12345678",
      "type": "unresolved_call_target",
      "source_id": "sym:binary:abc...",
      "call_site": "0x12345678",
      "reason": "Indirect call through register"
    }
  ]
}
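A factory for such entries might look like the following sketch. The id layout (`unknown:call:` followed by the call-site address) and the zero-padded lowercase hex format are inferred from the single example above, so treat both as assumptions; `UnknownTarget` and `UnknownTargets` are hypothetical names.

```csharp
// Hypothetical types; the id and address formats are inferred from one example.
public sealed record UnknownTarget(
    string Id,
    string Type,
    string SourceId,
    string CallSite,
    string Reason);

public static class UnknownTargets
{
    public static UnknownTarget ForCall(string sourceId, ulong callSiteAddress, string reason)
    {
        // Lowercase hex, padded to at least 8 digits (assumption).
        var site = $"0x{callSiteAddress:x8}";
        return new UnknownTarget(
            Id: $"unknown:call:{site}",
            Type: "unresolved_call_target",
            SourceId: sourceId,
            CallSite: site,
            Reason: reason);
    }
}
```

Deriving the id from the call site alone keeps it stable across runs, provided addresses are unique within one binary.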
5. Performance Constraints
5.1 Size Limits
| Metric | Limit | Action on Exceed |
|---|---|---|
| Binary size | 100 MB | Warn, proceed |
| Symbol count | 1M symbols | Chunk processing |
| Edge count | 10M edges | Chunk output |
| Memory usage | 4 GB | Stream processing |
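The first three limits can be checked up front, before analysis starts. The sketch below is illustrative only: the names are hypothetical, "100 MB" is read as 100 * 1024 * 1024 bytes (an assumption), and the memory limit is left out because it has to be observed at run time rather than computed from inputs.

```csharp
using System;

// Hypothetical names; each flag corresponds to one "Action on Exceed" entry.
[Flags]
public enum SizeLimitAction
{
    None = 0,
    WarnLargeBinary = 1, // binary size exceeded: warn, proceed
    ChunkSymbols = 2,    // symbol count exceeded: chunk processing
    ChunkEdges = 4       // edge count exceeded: chunk output
}

public static class SizeLimits
{
    // Assumption: "100 MB" means 100 * 1024 * 1024 bytes.
    public const long MaxBinaryBytes = 100L * 1024 * 1024;
    public const long MaxSymbols = 1_000_000;
    public const long MaxEdges = 10_000_000;

    // Returns the set of actions triggered by the given measurements.
    public static SizeLimitAction Evaluate(long binaryBytes, long symbolCount, long edgeCount)
    {
        var actions = SizeLimitAction.None;
        if (binaryBytes > MaxBinaryBytes) actions |= SizeLimitAction.WarnLargeBinary;
        if (symbolCount > MaxSymbols) actions |= SizeLimitAction.ChunkSymbols;
        if (edgeCount > MaxEdges) actions |= SizeLimitAction.ChunkEdges;
        return actions;
    }
}
```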
5.2 Timeout Constraints
| Operation | Timeout | Action on Exceed |
|---|---|---|
| ELF parse | 60s | Fail with partial results |
| Demangle all | 120s | Truncate results |
| CFG analysis | 300s | Skip heuristics |
| Total analysis | 600s | Fail gracefully |
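Per-stage timeouts of this kind are commonly enforced with a linked `CancellationTokenSource`. The sketch below shows one possible shape, not the project's actual mechanism: `StageTimeouts` is a hypothetical helper, and the caller is expected to apply the "Action on Exceed" column when `TryRunAsync` returns false.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; the timeout values mirror the table above.
public static class StageTimeouts
{
    public static readonly TimeSpan ElfParse = TimeSpan.FromSeconds(60);
    public static readonly TimeSpan DemangleAll = TimeSpan.FromSeconds(120);
    public static readonly TimeSpan CfgAnalysis = TimeSpan.FromSeconds(300);
    public static readonly TimeSpan TotalAnalysis = TimeSpan.FromSeconds(600);

    // Runs a stage under its own timeout, linked to the caller's token.
    // Returns true when the stage completed and false when it timed out.
    // Cancellation requested by the caller still propagates as an exception.
    public static async Task<bool> TryRunAsync(
        Func<CancellationToken, Task> stage,
        TimeSpan timeout,
        CancellationToken outer)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(outer);
        cts.CancelAfter(timeout);
        try
        {
            await stage(cts.Token).ConfigureAwait(false);
            return true;
        }
        catch (OperationCanceledException) when (!outer.IsCancellationRequested)
        {
            return false;
        }
    }
}
```

The total-analysis budget can be applied the same way by passing a token that already carries the 600s limit as `outer`; in that case an overall timeout surfaces as caller cancellation rather than a stage timeout.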
6. Integration Points
6.1 Scanner Plugin Interface
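using System.IO;
using System.Threading;
using System.Threading.Tasks;

// IAnalyzerPlugin, NativeObservationDocument, and NativeAnalyzerOptions
// are defined elsewhere in the Scanner codebase.
public interface INativeAnalyzer : IAnalyzerPlugin
{
    Task<NativeObservationDocument> AnalyzeAsync(
        Stream binaryStream,
        NativeAnalyzerOptions options,
        CancellationToken ct);
}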
6.2 RichGraph Integration
Native analysis results feed into RichGraph:
NativeObservation → NativeReachabilityGraph → RichGraph nodes/edges
6.3 Signals Integration
Native symbols with runtime hits:
Signals runtime-facts + RichGraph → ReachabilityFact with confidence
7. Implementation Checklist
| Task | Status | Owner |
|---|---|---|
| ELF parser | Done | Scanner Guild |
| PE parser | Done | Scanner Guild |
| Mach-O parser | In Progress | Scanner Guild |
| C++ demangler | Done | Scanner Guild |
| Rust demangler | Pending | Scanner Guild |
| Callgraph builder | Done | Scanner Guild |
| Test fixtures | Partial | QA Guild |
| CI integration | Pending | DevOps Guild |
8. Related Documents
Changelog
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2025-12-13 | Platform Guild | Initial toolchain decision |