# DECISION-NATIVE-TOOLCHAIN-401: Native Lifter and Demangler Selection
> **Status:** Published
> **Version:** 1.0.0
> **Published:** 2025-12-13
> **Owners:** Scanner Guild, Platform Guild
> **Unblocks:** SCANNER-NATIVE-401-015, SCAN-REACH-401-009
## Decision Summary
This document records the toolchain decisions for native binary analysis. They unblock implementation of native symbol extraction, callgraph generation, and demangling for ELF, PE, and Mach-O binaries.
---
## 1. Component Decisions
### 1.1 ELF Parser
**Decision:** Use custom pure-C# ELF parser
**Rationale:**
- No native dependencies, portable across platforms
- Already implemented in `StellaOps.Scanner.Analyzers.Native`
- Sufficient for symbol table, dynamic section, and relocation parsing
- Avoids licensing complexity of external libraries
**Implementation:** `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native/Internal/Elf/`
### 1.2 PE Parser
**Decision:** Use custom pure-C# PE parser
**Rationale:**
- No native dependencies
- Already implemented in `StellaOps.Scanner.Analyzers.Native`
- Handles import/export tables, Debug directory
- Compatible with air-gapped deployment
**Implementation:** `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native/Internal/Pe/`
### 1.3 Mach-O Parser
**Decision:** Use custom pure-C# Mach-O parser
**Rationale:**
- Consistent with ELF/PE approach
- No native dependencies
- Sufficient for symbol table and load commands
**Implementation:** `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native/Internal/MachO/`
### 1.4 Symbol Demangler
**Decision:** Use per-language managed demanglers with native fallback
| Language | Primary Demangler | Fallback |
|----------|-------------------|----------|
| C++ (Itanium ABI) | `Demangler.Net` (NuGet) | llvm-cxxfilt via P/Invoke |
| C++ (MSVC) | `UnDecorateSymbolName` wrapper | None (Windows-specific) |
| Rust | `rustc-demangle` port | rustfilt via P/Invoke |
| Swift | `swift-demangle` port | None |
| D | `dlang-demangler` port | None |
**Rationale:**
- Managed demanglers provide determinism and portability
- Native fallback only for edge cases
- No hard runtime dependency on external tools: the managed path works without them
**NuGet packages:**
```xml
<PackageReference Include="Demangler.Net" Version="1.0.0" />
```
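
The selection order in the table above can be pictured as a two-stage chain: detect the mangling scheme from the symbol prefix, try the managed demangler, then try the native fallback. The sketch below is illustrative Python, not the C# implementation. `detect_scheme`, `demangle_with_fallback`, and the two registries are hypothetical names, the `d` label is not one of the documented sources, and the prefix rules are a simplification (legacy Rust symbols reuse the Itanium `_ZN` prefix, and Mach-O's extra leading underscore is not handled).

```python
from typing import Callable, Dict, Optional, Tuple

# A demangler takes a mangled name and returns the readable name or None.
Demangler = Callable[[str], Optional[str]]

# Symbol prefix -> scheme label (simplified; first match wins).
PREFIX_TO_SCHEME = (
    ("_R", "rust"),         # Rust v0 mangling
    ("_Z", "itanium-abi"),  # Itanium C++ ABI (GCC/Clang)
    ("?", "msvc"),          # MSVC decorated names
    ("$s", "swift"),        # Swift 5 mangling
    ("_$s", "swift"),
    ("_D", "d"),            # D mangling (label is an assumption)
)


def detect_scheme(mangled: str) -> Optional[str]:
    """Guess the mangling scheme from the symbol prefix."""
    for prefix, scheme in PREFIX_TO_SCHEME:
        if mangled.startswith(prefix):
            return scheme
    return None


def demangle_with_fallback(
    mangled: str,
    primary: Dict[str, Demangler],
    fallback: Dict[str, Demangler],
) -> Tuple[Optional[str], str]:
    """Return (demangled, source); source is the scheme, 'fallback', or 'none'."""
    scheme = detect_scheme(mangled)
    if scheme is None:
        return None, "none"
    # Managed demangler first, native fallback second.
    for registry, source in ((primary, scheme), (fallback, "fallback")):
        demangler = registry.get(scheme)
        if demangler is None:
            continue
        try:
            result = demangler(mangled)
        except Exception:
            result = None  # a failing demangler must not abort the scan
        if result is not None:
            return result, source
    return None, "none"
```

Keeping the fallback in a separate registry means a deployment without native tools simply passes an empty mapping and still gets deterministic output from the managed path.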
### 1.5 Disassembler (Optional, for heuristic analysis)
**Decision:** Use Iced (x86/x64) + Capstone.NET (ARM/others)
| Architecture | Library | NuGet Package |
|--------------|---------|---------------|
| x86/x64 | Iced | `Iced` |
| ARM/ARM64 | Capstone.NET | `Capstone.NET` |
| Other | Skip disassembly | N/A |
**Rationale:**
- Iced is pure managed code, so x86/x64 disassembly needs no native dependencies
- Capstone.NET wraps the native Capstone library, so it requires that library at runtime
- Disassembly is optional and used only for heuristic edge detection
### 1.6 Callgraph Extraction
**Decision:** Static analysis only (no dynamic execution)
**Methods:**
1. Relocation-based: Extract call targets from relocations
2. Import/Export: Map import references to exports
3. Symbol-based: Direct and indirect call targets from symbol table
4. CFG heuristics: Basic block boundary detection (x86 only)
**No dynamic analysis:** Binaries are never executed, which avoids execution risks and keeps the analysis portable.
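
As one illustration of the static methods listed above, the import/export method amounts to joining each binary's imported symbol names against the exports of the other binaries in the scan. The Python sketch below assumes a hypothetical `BinaryInfo` record and a hypothetical `map_imports_to_exports` helper; the real analyzer is written in C# and its data model is not shown in this document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class BinaryInfo:
    """Hypothetical per-binary summary: name plus imported/exported symbols."""
    name: str
    imports: Set[str] = field(default_factory=set)
    exports: Set[str] = field(default_factory=set)


def map_imports_to_exports(
    binaries: List[BinaryInfo],
) -> Tuple[List[Tuple[str, str, str]], List[Tuple[str, str]]]:
    """Return (edges, unresolved).

    edges:      (importer, exporter, symbol) triples
    unresolved: (importer, symbol) pairs with no known exporter
    """
    ordered = sorted(binaries, key=lambda b: b.name)

    # Index exported symbols; sorting makes the chosen provider deterministic
    # when two binaries export the same name.
    export_index: Dict[str, str] = {}
    for binary in ordered:
        for symbol in sorted(binary.exports):
            export_index.setdefault(symbol, binary.name)

    edges: List[Tuple[str, str, str]] = []
    unresolved: List[Tuple[str, str]] = []
    for binary in ordered:
        for symbol in sorted(binary.imports):
            provider = export_index.get(symbol)
            if provider is None:
                unresolved.append((binary.name, symbol))
            else:
                edges.append((binary.name, provider, symbol))
    return edges, unresolved
```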
---
## 2. CI Toolchain Requirements
### 2.1 Build Requirements
| Component | Requirement | Notes |
|-----------|-------------|-------|
| .NET SDK | 10.0+ | Required for all builds |
| Native libs (optional) | Capstone 4.0+ | Only for ARM disassembly |
| Test binaries | Pre-built fixtures | No compiler dependency in CI |
### 2.2 Test Fixture Strategy
**Decision:** Ship pre-built binary fixtures, not source + compiler
**Rationale:**
- Deterministic: Same binary hash every run
- No compiler dependency in CI
- Smaller CI image footprint
- Cross-platform: Same fixtures on all runners
**Fixture locations:**
```
tests/Binary/fixtures/
  elf-x86_64/
    binary.elf           # Pre-built
    expected.json        # Expected graph
    expected-hashes.txt  # Determinism check
  pe-x64/
    binary.exe
    expected.json
  macho-arm64/
    binary.dylib
    expected.json
```
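
The determinism check implied by `expected-hashes.txt` could look like the following Python sketch. The file format is an assumption (one `<sha256-hex>  <relative-path>` pair per line, as `sha256sum` produces); this document does not specify it, and `verify_fixture_hashes` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixture_hashes(
    fixture_dir: Path, hashes_name: str = "expected-hashes.txt"
) -> List[str]:
    """Return the relative paths whose hash is missing or does not match."""
    mismatches: List[str] = []
    lines = (fixture_dir / hashes_name).read_text().splitlines()
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        expected, rel_path = line.split(None, 1)
        # sha256sum marks binary-mode entries with a leading '*'.
        rel_path = rel_path.strip().lstrip("*")
        target = fixture_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            mismatches.append(rel_path)
    return mismatches
```

An empty return value means every listed fixture is byte-identical to the committed one, which is the property the pre-built strategy relies on.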
### 2.3 Fixture Generation (Offline)
Fixtures are generated offline by maintainers:
```bash
# Generate ELF fixture (run once, commit result)
cd tools/fixtures
./generate-elf-fixture.sh
# Verify hashes match
./verify-fixtures.sh
```
---
## 3. Demangling Contract
### 3.1 Output Format
Demangled names follow this format:
```json
{
  "symbol": {
    "mangled": "_ZN4Curl7Session4readEv",
    "demangled": "Curl::Session::read()",
    "source": "itanium-abi",
    "confidence": 1.0
  }
}
```
### 3.2 Demangling Sources
| Source | Description | Confidence |
|--------|-------------|------------|
| `itanium-abi` | Itanium C++ ABI (GCC/Clang) | 1.0 |
| `msvc` | Microsoft Visual C++ | 1.0 |
| `rust` | Rust mangling | 1.0 |
| `swift` | Swift mangling | 1.0 |
| `fallback` | Native tool fallback | 0.9 |
| `heuristic` | Pattern-based guess | 0.6 |
| `none` | No demangling available | 0.3 |
### 3.3 Failed Demangling
When demangling fails:
```json
{
  "symbol": {
    "mangled": "_Z15unknown_format",
    "demangled": null,
    "source": "none",
    "confidence": 0.3,
    "demangling_error": "Unrecognized mangling scheme"
  }
}
```
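
Taken together, sections 3.1 to 3.3 suggest a single record builder that looks up the confidence from the source and attaches `demangling_error` only on failure. The Python sketch below works under that assumption: `SOURCE_CONFIDENCE` mirrors the table in 3.2, while `make_symbol_record` and its default error text (borrowed from the example above) are hypothetical.

```python
from typing import Optional

# Confidence per demangling source, copied from the table in section 3.2.
SOURCE_CONFIDENCE = {
    "itanium-abi": 1.0,
    "msvc": 1.0,
    "rust": 1.0,
    "swift": 1.0,
    "fallback": 0.9,
    "heuristic": 0.6,
    "none": 0.3,
}


def make_symbol_record(
    mangled: str,
    demangled: Optional[str] = None,
    source: str = "none",
    error: Optional[str] = None,
) -> dict:
    """Build a symbol record in the shape shown in sections 3.1 and 3.3."""
    if source not in SOURCE_CONFIDENCE:
        raise ValueError(f"unknown demangling source: {source!r}")
    if demangled is None:
        source = "none"  # a failed demangle is always reported as 'none'
    symbol = {
        "mangled": mangled,
        "demangled": demangled,
        "source": source,
        "confidence": SOURCE_CONFIDENCE[source],
    }
    if demangled is None:
        symbol["demangling_error"] = error or "Unrecognized mangling scheme"
    return {"symbol": symbol}
```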
---
## 4. Callgraph Edge Types
### 4.1 Edge Type Enumeration
| Type | Description | Confidence |
|------|-------------|------------|
| `call` | Direct call instruction | 1.0 |
| `plt` | PLT/GOT indirect call | 0.95 |
| `indirect` | Indirect call (vtable, function pointer) | 0.6 |
| `init_array` | From init_array to function | 1.0 |
| `tls_callback` | TLS callback invocation | 1.0 |
| `exception` | Exception handler target | 0.8 |
| `switch` | Switch table target | 0.7 |
| `heuristic` | CFG-based heuristic | 0.4 |
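
A consumer of the graph can use the confidence column above to drop low-confidence edges before further analysis. In the Python sketch below, `EDGE_CONFIDENCE` mirrors the table; the `(source, target, edge_type)` tuple shape and the `filter_edges` helper are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Confidence per edge type, copied from the table in section 4.1.
EDGE_CONFIDENCE = {
    "call": 1.0,
    "plt": 0.95,
    "indirect": 0.6,
    "init_array": 1.0,
    "tls_callback": 1.0,
    "exception": 0.8,
    "switch": 0.7,
    "heuristic": 0.4,
}

Edge = Tuple[str, str, str]  # (source, target, edge_type)


def filter_edges(edges: Iterable[Edge], min_confidence: float) -> List[Edge]:
    """Keep edges whose type confidence is at least min_confidence.

    Raises KeyError for an edge type that is not in the enumeration.
    """
    return [e for e in edges if EDGE_CONFIDENCE[e[2]] >= min_confidence]
```

For example, a threshold of 0.7 keeps `call`, `plt`, `init_array`, `tls_callback`, `exception`, and `switch` edges and discards `indirect` and `heuristic` ones.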
### 4.2 Unknown Targets
When a call target cannot be resolved:
```json
{
  "unknowns": [
    {
      "id": "unknown:call:0x12345678",
      "type": "unresolved_call_target",
      "source_id": "sym:binary:abc...",
      "call_site": "0x12345678",
      "reason": "Indirect call through register"
    }
  ]
}
```
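
The example above suggests that the `id` is derived from the call-site address. The Python sketch below assumes that pattern (`unknown:call:` followed by the address as zero-padded lowercase hex); this is an inference from the single example rather than a documented rule, and `make_unknown` is a hypothetical helper.

```python
def make_unknown(source_id: str, call_site: int, reason: str) -> dict:
    """Build one entry for the 'unknowns' array of an analysis result."""
    site = f"0x{call_site:08x}"  # assumed formatting: at least 8 hex digits
    return {
        "id": f"unknown:call:{site}",
        "type": "unresolved_call_target",
        "source_id": source_id,
        "call_site": site,
        "reason": reason,
    }
```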
---
## 5. Performance Constraints
### 5.1 Size Limits
| Metric | Limit | Action on Exceed |
|--------|-------|------------------|
| Binary size | 100 MB | Warn, proceed |
| Symbol count | 1M symbols | Chunk processing |
| Edge count | 10M edges | Chunk output |
| Memory usage | 4 GB | Stream processing |
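
The limits table maps naturally onto a lookup from metric to threshold and action. The Python sketch below is one way to express it; the metric keys, the action labels, and the reading of MB and GB as binary units (MiB, GiB) are all assumptions, since the table does not define them.

```python
from typing import Dict

MIB = 1024 * 1024
GIB = 1024 * MIB

# metric -> (limit, action on exceed), following the table in section 5.1.
SIZE_LIMITS = {
    "binary_size_bytes": (100 * MIB, "warn_and_proceed"),
    "symbol_count": (1_000_000, "chunk_processing"),
    "edge_count": (10_000_000, "chunk_output"),
    "memory_bytes": (4 * GIB, "stream_processing"),
}


def actions_for(metrics: Dict[str, int]) -> Dict[str, str]:
    """Return the action for every metric that exceeds its limit.

    Metrics missing from the input are treated as zero.
    """
    return {
        name: action
        for name, (limit, action) in SIZE_LIMITS.items()
        if metrics.get(name, 0) > limit
    }
```

None of the actions is a hard failure: each one degrades the processing mode instead of rejecting the binary, matching the "Action on Exceed" column.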
### 5.2 Timeout Constraints
| Operation | Timeout | Action on Exceed |
|-----------|---------|------------------|
| ELF parse | 60s | Fail with partial results |
| Demangle all | 120s | Truncate results |
| CFG analysis | 300s | Skip heuristics |
| Total analysis | 600s | Fail gracefully |
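
The table does not say how the per-operation timeouts interact with the 600s total. One plausible reading, sketched below in Python, is that each stage gets the smaller of its own timeout and whatever remains of the total budget. That interaction, the stage keys, and the `remaining_budget` helper are assumptions, not part of the contract.

```python
# Per-stage timeouts in seconds, following the table in section 5.2.
STAGE_TIMEOUT_S = {
    "elf_parse": 60.0,
    "demangle_all": 120.0,
    "cfg_analysis": 300.0,
}
TOTAL_TIMEOUT_S = 600.0


def remaining_budget(stage: str, elapsed_total_s: float) -> float:
    """Seconds the stage may still run, capped by the total budget.

    Returns 0.0 once the total budget is exhausted.
    """
    remaining_total = max(0.0, TOTAL_TIMEOUT_S - elapsed_total_s)
    return min(STAGE_TIMEOUT_S[stage], remaining_total)
```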
---
## 6. Integration Points
### 6.1 Scanner Plugin Interface
```csharp
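using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Contract implemented by native (ELF/PE/Mach-O) analyzer plugins.
public interface INativeAnalyzer : IAnalyzerPlugin
{
    // Analyzes one binary and returns its observation document.
    Task<NativeObservationDocument> AnalyzeAsync(
        Stream binaryStream,
        NativeAnalyzerOptions options,
        CancellationToken ct);
}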
```
### 6.2 RichGraph Integration
Native analysis results feed into RichGraph:
```
NativeObservation → NativeReachabilityGraph → RichGraph nodes/edges
```
### 6.3 Signals Integration
Native symbols with runtime hits are combined with RichGraph data to yield confidence-scored reachability facts:
```
Signals runtime-facts + RichGraph → ReachabilityFact with confidence
```
---
## 7. Implementation Checklist
| Task | Status | Owner |
|------|--------|-------|
| ELF parser | Done | Scanner Guild |
| PE parser | Done | Scanner Guild |
| Mach-O parser | In Progress | Scanner Guild |
| C++ demangler | Done | Scanner Guild |
| Rust demangler | Pending | Scanner Guild |
| Callgraph builder | Done | Scanner Guild |
| Test fixtures | Partial | QA Guild |
| CI integration | Pending | DevOps Guild |
---
## 8. Related Documents
- [richgraph-v1 Contract](./richgraph-v1.md)
- [Build-ID Propagation](./buildid-propagation.md)
- [Init-Section Roots](./init-section-roots.md)
- [Binary Reachability Schema](../reachability/binary-reachability-schema.md)
---
## Changelog
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-12-13 | Platform Guild | Initial toolchain decision |