git.stella-ops.org/docs/contracts/native-toolchain-decision.md
StellaOps Bot f1a39c4ce3
2025-12-13 18:08:55 +02:00


DECISION-NATIVE-TOOLCHAIN-401: Native Lifter and Demangler Selection

  • Status: Published
  • Version: 1.0.0
  • Published: 2025-12-13
  • Owners: Scanner Guild, Platform Guild
  • Unblocks: SCANNER-NATIVE-401-015, SCAN-REACH-401-009

Decision Summary

This document records the toolchain decisions for native binary analysis. These decisions unblock implementation of native symbol extraction, callgraph generation, and demangling for ELF, PE, and Mach-O binaries.


1. Component Decisions

1.1 ELF Parser

Decision: Use custom pure-C# ELF parser

Rationale:

  • No native dependencies, portable across platforms
  • Already implemented in StellaOps.Scanner.Analyzers.Native
  • Sufficient for symbol table, dynamic section, and relocation parsing
  • Avoids licensing complexity of external libraries

Implementation: src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native/Internal/Elf/
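
Every ELF parse starts by validating the identification bytes. The following is a minimal sketch of that first step using only the .NET base class library. TryReadElfHeader is a hypothetical name and does not reflect the API in Internal/Elf/. The offsets (magic at 0 to 3, EI_CLASS at 4, EI_DATA at 5, e_machine at 18) come from the ELF specification.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper (not the StellaOps API): reads the fields a parser needs
// before it can locate section headers. Returns null if the bytes are not ELF.
static (bool Is64Bit, bool LittleEndian, ushort Machine)? TryReadElfHeader(ReadOnlySpan<byte> data)
{
    // e_ident (16 bytes) + e_type (2) + e_machine (2) = 20 bytes minimum.
    if (data.Length < 20) return null;

    // Magic: 0x7F 'E' 'L' 'F'
    if (data[0] != 0x7F || data[1] != (byte)'E' || data[2] != (byte)'L' || data[3] != (byte)'F')
        return null;

    byte elfClass = data[4]; // EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64
    byte elfData = data[5];  // EI_DATA: 1 = little-endian, 2 = big-endian
    if (elfClass is not (1 or 2) || elfData is not (1 or 2)) return null;

    bool littleEndian = elfData == 1;
    // e_machine sits at offset 18 in both 32-bit and 64-bit headers.
    ReadOnlySpan<byte> machineBytes = data.Slice(18, 2);
    ushort machine = littleEndian
        ? BinaryPrimitives.ReadUInt16LittleEndian(machineBytes)
        : BinaryPrimitives.ReadUInt16BigEndian(machineBytes);

    return (elfClass == 2, littleEndian, machine);
}

// Synthetic 64-bit little-endian header prefix with e_machine = EM_X86_64 (62).
byte[] header = new byte[64];
new byte[] { 0x7F, (byte)'E', (byte)'L', (byte)'F', 2, 1, 1 }.CopyTo(header, 0);
header[18] = 0x3E;

var info = TryReadElfHeader(header);
Console.WriteLine(info); // (True, True, 62)
```

Byte order is decided once from EI_DATA and then applied to every multi-byte field, which is what lets one managed parser handle both little- and big-endian targets without native code.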

1.2 PE Parser

Decision: Use custom pure-C# PE parser

Rationale:

  • No native dependencies
  • Already implemented in StellaOps.Scanner.Analyzers.Native
  • Handles import/export tables and the debug directory
  • Compatible with air-gapped deployment

Implementation: src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native/Internal/Pe/
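
A PE parser has to walk from the DOS header to the COFF header before it can reach the import, export, and debug tables. A minimal sketch of that walk, assuming nothing beyond the PE/COFF layout (TryReadPeMachine is a hypothetical name, not the Internal/Pe/ API):

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper (not the StellaOps API): validates the DOS stub, follows
// e_lfanew to the "PE\0\0" signature and reads the COFF Machine field.
static ushort? TryReadPeMachine(ReadOnlySpan<byte> data)
{
    // DOS header: "MZ" at offset 0, e_lfanew (int32, little-endian) at 0x3C.
    if (data.Length < 0x40 || data[0] != (byte)'M' || data[1] != (byte)'Z') return null;
    int peOffset = BinaryPrimitives.ReadInt32LittleEndian(data.Slice(0x3C, 4));

    // Need the 4-byte signature plus the 2-byte Machine field.
    if (peOffset < 0 || peOffset > data.Length - 6) return null;
    if (!data.Slice(peOffset, 4).SequenceEqual("PE\0\0"u8)) return null;

    // The COFF file header follows the signature; Machine is its first field.
    return BinaryPrimitives.ReadUInt16LittleEndian(data.Slice(peOffset + 4, 2));
}

// Synthetic image: e_lfanew = 0x80, Machine = IMAGE_FILE_MACHINE_AMD64 (0x8664).
byte[] image = new byte[0x100];
image[0] = (byte)'M'; image[1] = (byte)'Z';
BinaryPrimitives.WriteInt32LittleEndian(image.AsSpan(0x3C), 0x80);
"PE\0\0"u8.CopyTo(image.AsSpan(0x80));
BinaryPrimitives.WriteUInt16LittleEndian(image.AsSpan(0x84), 0x8664);

Console.WriteLine(TryReadPeMachine(image)?.ToString("X4")); // 8664
```

The bounds check on e_lfanew matters because the offset is attacker-controlled. A pure managed parser can reject it cleanly instead of reading outside the buffer.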

1.3 Mach-O Parser

Decision: Use custom pure-C# Mach-O parser

Rationale:

  • Consistent with ELF/PE approach
  • No native dependencies
  • Sufficient for symbol table and load commands

Implementation: src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native/Internal/MachO/
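
Mach-O adds one wrinkle that ELF and PE do not have: universal (fat) binaries wrap several thin images. A minimal sketch of magic-number classification follows. ClassifyMachO is a hypothetical name. The magic and cputype constants are the standard Mach-O values (MH_MAGIC 0xFEEDFACE, MH_MAGIC_64 0xFEEDFACF, FAT_MAGIC 0xCAFEBABE, CPU_TYPE_ARM64 0x0100000C).

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical classifier (not the StellaOps API). The magic is read as a
// little-endian uint32, so a byte-swapped value means the file is big-endian.
static (string Kind, uint CpuType) ClassifyMachO(ReadOnlySpan<byte> data)
{
    if (data.Length < 8) return ("unknown", 0u);
    uint magicLe = BinaryPrimitives.ReadUInt32LittleEndian(data);
    // cputype follows the magic and uses the same byte order as the file.
    uint cpuLe = BinaryPrimitives.ReadUInt32LittleEndian(data.Slice(4, 4));
    uint cpuBe = BinaryPrimitives.ReadUInt32BigEndian(data.Slice(4, 4));

    return magicLe switch
    {
        0xFEEDFACE => ("macho32", cpuLe), // MH_MAGIC, little-endian file
        0xFEEDFACF => ("macho64", cpuLe), // MH_MAGIC_64, little-endian file
        0xCEFAEDFE => ("macho32", cpuBe), // MH_MAGIC stored big-endian
        0xCFFAEDFE => ("macho64", cpuBe), // MH_MAGIC_64 stored big-endian
        // Fat headers are always big-endian (bytes CA FE BA BE). Java .class files
        // share this magic, so a real parser must sanity-check nfat_arch as well.
        0xBEBAFECA => ("fat", 0u),
        _ => ("unknown", 0u),
    };
}

// A 64-bit little-endian arm64 header prefix.
byte[] arm64 = { 0xCF, 0xFA, 0xED, 0xFE, 0x0C, 0x00, 0x00, 0x01 };
var r = ClassifyMachO(arm64);
Console.WriteLine($"{r.Kind} cputype=0x{r.CpuType:X8}"); // macho64 cputype=0x0100000C
```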

1.4 Symbol Demangler

Decision: Use per-language managed demanglers with native fallback

| Language | Primary Demangler | Fallback |
|---|---|---|
| C++ (Itanium ABI) | Demangler.Net (NuGet) | llvm-cxxfilt via P/Invoke |
| C++ (MSVC) | UnDecorateSymbolName wrapper | None (Windows-specific) |
| Rust | rustc-demangle port | rustfilt via P/Invoke |
| Swift | swift-demangle port | None |
| D | dlang-demangler port | None |

Rationale:

  • Managed demanglers provide determinism and portability
  • Native fallbacks are used only for edge cases
  • No runtime dependency on external tools on the primary (managed) path

NuGet packages:

```xml
<PackageReference Include="Demangler.Net" Version="1.0.0" />
```
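
Before any of the demanglers in the table can run, the analyzer has to decide which scheme a symbol uses. A minimal prefix-based sketch is shown below. DetectManglingScheme and IsLegacyRustHash are hypothetical names, and the returned labels are the source values from section 3.2. D is omitted because section 3.2 defines no source label for it. The prefixes themselves ("_Z" Itanium, "?" MSVC, "_R" Rust v0, "$s" Swift 5, legacy Rust "17h" hash suffix) are the standard ones.

```csharp
using System;
using System.Linq;

// Hypothetical dispatcher (not the StellaOps implementation). A prefix match only
// selects a candidate demangler; a failed parse should still fall back to "none".
static string DetectManglingScheme(string symbol)
{
    // Mach-O symbol tables add an extra leading underscore ("__ZN...", "__RNv...").
    string s = symbol.StartsWith("__", StringComparison.Ordinal) ? symbol[1..] : symbol;

    // Rust v0: "_R", then an optional decimal version or an uppercase path tag.
    if (s.Length > 2 && s.StartsWith("_R", StringComparison.Ordinal)
        && (char.IsAsciiLetterUpper(s[2]) || char.IsAsciiDigit(s[2])))
        return "rust";

    // Legacy Rust symbols are Itanium-shaped, so check for their hash suffix first.
    if (s.StartsWith("_Z", StringComparison.Ordinal))
        return IsLegacyRustHash(s) ? "rust" : "itanium-abi";

    if (s.StartsWith('?')) return "msvc";

    // Swift 5 stable mangling; older "_T0" and "$S" prefixes are not handled here.
    if (s.StartsWith("$s", StringComparison.Ordinal) || s.StartsWith("_$s", StringComparison.Ordinal))
        return "swift";

    return "none";
}

static bool IsLegacyRustHash(string s)
{
    // Legacy Rust names end with the segment "17h" + 16 hex digits, then "E".
    const int Tail = 20; // "17h" (3) + 16 hex digits + "E" (1)
    if (s.Length < Tail || s[^1] != 'E') return false;
    string hash = s.Substring(s.Length - Tail, Tail - 1);
    return hash.StartsWith("17h", StringComparison.Ordinal) && hash[3..].All(Uri.IsHexDigit);
}

// Illustrative inputs, one per scheme.
foreach (var sym in new[] { "_ZN4Curl7Session4readEv", "?read@Session@Curl@@QEAAXXZ",
                            "_RNvCs1234_7mycrate4main", "$s4main3fooyyF", "memcpy" })
    Console.WriteLine($"{sym} -> {DetectManglingScheme(sym)}");
```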

1.5 Disassembler (Optional, for heuristic analysis)

Decision: Use Iced (x86/x64) + Capstone.NET (ARM/others)

| Architecture | Library | NuGet Package |
|---|---|---|
| x86/x64 | Iced | Iced |
| ARM/ARM64 | Capstone.NET | Capstone.NET |
| Other | Skip disassembly | N/A |

Rationale:

  • Iced is fully managed, so x86/x64 disassembly has no native dependencies
  • Capstone.NET wraps the native Capstone library, so ARM disassembly needs that library at runtime
  • Disassembly is optional and used only for heuristic edge detection
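
To make heuristic edge detection concrete, the sketch below decodes the one x86-64 instruction form that directly yields a call edge: E8 followed by a signed 32-bit displacement. The target is the address of the next instruction plus that displacement. TryDecodeRelCall is a hypothetical name, and the sketch assumes the caller already knows that offset is an instruction boundary. Finding boundaries is the job of a full decoder such as Iced, so this is illustrative only and not a replacement for it.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper: if the instruction at `offset` is a near relative call
// (opcode E8 + little-endian rel32), return the absolute call target.
// `codeVa` is the virtual address of code[0], e.g. the .text section address.
static ulong? TryDecodeRelCall(ReadOnlySpan<byte> code, ulong codeVa, int offset)
{
    const int InstrLength = 5; // 1 opcode byte + 4 displacement bytes
    if (offset < 0 || offset > code.Length - InstrLength || code[offset] != 0xE8) return null;

    int rel32 = BinaryPrimitives.ReadInt32LittleEndian(code.Slice(offset + 1, 4));
    ulong nextIp = codeVa + (ulong)offset + InstrLength;
    // The displacement is relative to the next instruction and may be negative;
    // unchecked wrap-around turns a negative rel32 into a subtraction.
    return unchecked(nextIp + (ulong)(long)rel32);
}

// call +0x10 at VA 0x401000: target = 0x401005 + 0x10
byte[] text = { 0xE8, 0x10, 0x00, 0x00, 0x00, 0xC3 };
Console.WriteLine(TryDecodeRelCall(text, 0x401000, 0)?.ToString("X")); // 401015
```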

1.6 Callgraph Extraction

Decision: Static analysis only (no dynamic execution)

Methods:

  1. Relocation-based: Extract call targets from relocations
  2. Import/Export: Map import references to exports
  3. Symbol-based: Direct and indirect call targets from symbol table
  4. CFG heuristics: Basic block boundary detection (x86 only)

No dynamic analysis: binaries are never executed, which avoids execution risk and keeps the analysis portable.
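
Method 2 (import/export mapping) reduces to a lookup from each import to a known export, with misses reported as unknowns. The following is a minimal sketch under stated assumptions. MapImports, the "library!symbol" key, and the node ID strings are hypothetical and not the StellaOps schema. The keyed lookup matches PE semantics, where each import names its DLL. ELF resolves undefined symbols against the DT_NEEDED libraries in search order, so an ELF variant would try each needed library in turn.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of import/export mapping (method 2).
static (List<(string From, string To)> Edges, List<string> Unresolved) MapImports(
    string binaryId,
    IEnumerable<(string Library, string Symbol)> imports,
    IReadOnlyDictionary<string, string> exportsByKey) // "libc.so.6!memcpy" -> node id
{
    var edges = new List<(string From, string To)>();
    var unresolved = new List<string>();
    foreach (var (library, symbol) in imports)
    {
        string key = $"{library}!{symbol}";
        if (exportsByKey.TryGetValue(key, out var target))
            edges.Add(($"{binaryId}:import:{symbol}", target));
        else
            unresolved.Add(key); // candidate for an "unknowns" entry (section 4.2)
    }
    return (edges, unresolved);
}

var exports = new Dictionary<string, string> { ["libc.so.6!memcpy"] = "sym:libc:memcpy" };
var (edges, unresolved) = MapImports(
    "sym:app",
    new[] { ("libc.so.6", "memcpy"), ("libssl.so.3", "SSL_read") },
    exports);
Console.WriteLine($"{edges.Count} edge(s), unresolved: {string.Join(", ", unresolved)}");
// 1 edge(s), unresolved: libssl.so.3!SSL_read
```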


2. CI Toolchain Requirements

2.1 Build Requirements

| Component | Requirement | Notes |
|---|---|---|
| .NET SDK | 10.0+ | Required for all builds |
| Native libs (optional) | Capstone 4.0+ | Only for ARM disassembly |
| Test binaries | Pre-built fixtures | No compiler dependency in CI |

2.2 Test Fixture Strategy

Decision: Ship pre-built binary fixtures, not source + compiler

Rationale:

  • Deterministic: Same binary hash every run
  • No compiler dependency in CI
  • Smaller CI image footprint
  • Cross-platform: Same fixtures on all runners

Fixture locations:

```text
tests/Binary/fixtures/
  elf-x86_64/
    binary.elf           # Pre-built
    expected.json        # Expected graph
    expected-hashes.txt  # Determinism check
  pe-x64/
    binary.exe
    expected.json
  macho-arm64/
    binary.dylib
    expected.json
```
2.3 Fixture Generation (Offline)

Fixtures are generated offline by maintainers:

```sh
# Generate ELF fixture (run once, commit result)
cd tools/fixtures
./generate-elf-fixture.sh

# Verify hashes match
./verify-fixtures.sh
```

3. Demangling Contract

3.1 Output Format

Demangled names follow this format:

```json
{
  "symbol": {
    "mangled": "_ZN4Curl7Session4readEv",
    "demangled": "Curl::Session::read()",
    "source": "itanium-abi",
    "confidence": 1.0
  }
}
```

3.2 Demangling Sources

| Source | Description | Confidence |
|---|---|---|
| itanium-abi | Itanium C++ ABI (GCC/Clang) | 1.0 |
| msvc | Microsoft Visual C++ | 1.0 |
| rust | Rust mangling | 1.0 |
| swift | Swift mangling | 1.0 |
| fallback | Native tool fallback | 0.9 |
| heuristic | Pattern-based guess | 0.6 |
| none | No demangling available | 0.3 |

3.3 Failed Demangling

When demangling fails:

```json
{
  "symbol": {
    "mangled": "_Z15unknown_format",
    "demangled": null,
    "source": "none",
    "confidence": 0.3,
    "demangling_error": "Unrecognized mangling scheme"
  }
}
```
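
The success and failure records share one shape, so a single builder can enforce the contract by taking confidence from the source table instead of letting callers pass it in. A minimal sketch with System.Text.Json.Nodes follows. BuildSymbolRecord is a hypothetical name, not the StellaOps serializer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Nodes;

// Confidence per demangling source, copied from the table in section 3.2.
var confidenceBySource = new Dictionary<string, double>
{
    ["itanium-abi"] = 1.0, ["msvc"] = 1.0, ["rust"] = 1.0, ["swift"] = 1.0,
    ["fallback"] = 0.9, ["heuristic"] = 0.6, ["none"] = 0.3,
};

// Hypothetical builder for the records in sections 3.1 and 3.3.
JsonObject BuildSymbolRecord(string mangled, string? demangled, string source, string? error = null)
{
    if (!confidenceBySource.TryGetValue(source, out double confidence))
        throw new ArgumentException($"Unknown demangling source '{source}'", nameof(source));

    var symbol = new JsonObject
    {
        ["mangled"] = mangled,
        ["demangled"] = demangled, // written as JSON null when demangling failed
        ["source"] = source,
        ["confidence"] = confidence,
    };
    if (error is not null) symbol["demangling_error"] = error; // only on failure
    return new JsonObject { ["symbol"] = symbol };
}

var ok = BuildSymbolRecord("_ZN4Curl7Session4readEv", "Curl::Session::read()", "itanium-abi");
var failed = BuildSymbolRecord("_Z15unknown_format", null, "none", "Unrecognized mangling scheme");
Console.WriteLine(failed.ToJsonString(new JsonSerializerOptions { WriteIndented = true }));
```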

4. Callgraph Edge Types

4.1 Edge Type Enumeration

| Type | Description | Confidence |
|---|---|---|
| call | Direct call instruction | 1.0 |
| plt | PLT/GOT indirect call | 0.95 |
| indirect | Indirect call (vtable, function pointer) | 0.6 |
| init_array | From init_array to function | 1.0 |
| tls_callback | TLS callback invocation | 1.0 |
| exception | Exception handler target | 0.8 |
| switch | Switch table target | 0.7 |
| heuristic | CFG-based heuristic | 0.4 |
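
Since several extraction methods in section 1.6 can discover the same caller-to-callee pair, the edge types above need a merge rule. The sketch below keeps the highest-confidence edge per pair and sorts the output for deterministic results. That policy is an assumption made for illustration; this contract does not specify how duplicates are merged, and MergeEdges is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Confidence per edge type, copied from the table in section 4.1.
var edgeConfidence = new Dictionary<string, double>
{
    ["call"] = 1.0, ["plt"] = 0.95, ["indirect"] = 0.6, ["init_array"] = 1.0,
    ["tls_callback"] = 1.0, ["exception"] = 0.8, ["switch"] = 0.7, ["heuristic"] = 0.4,
};

// ASSUMPTION: keep one edge per (From, To), preferring higher confidence, then
// ordinal type name as a tie-breaker. Unknown edge types throw KeyNotFoundException.
List<(string From, string To, string Type, double Confidence)> MergeEdges(
    IEnumerable<(string From, string To, string Type)> found) =>
    found
        .Select(e => (e.From, e.To, e.Type, Confidence: edgeConfidence[e.Type]))
        .GroupBy(e => (e.From, e.To))
        .Select(g => g.OrderByDescending(e => e.Confidence)
                      .ThenBy(e => e.Type, StringComparer.Ordinal)
                      .First())
        .OrderBy(e => e.From, StringComparer.Ordinal)
        .ThenBy(e => e.To, StringComparer.Ordinal)
        .ToList();

var merged = MergeEdges(new[]
{
    ("main", "parse", "heuristic"), // also found by CFG heuristics
    ("main", "parse", "call"),      // direct call wins
    ("main", "memcpy", "plt"),
});
foreach (var e in merged) Console.WriteLine($"{e.From} -> {e.To} [{e.Type}, {e.Confidence}]");
```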

4.2 Unknown Targets

When a call target cannot be resolved, the call site is recorded as an unknown:

```json
{
  "unknowns": [
    {
      "id": "unknown:call:0x12345678",
      "type": "unresolved_call_target",
      "source_id": "sym:binary:abc...",
      "call_site": "0x12345678",
      "reason": "Indirect call through register"
    }
  ]
}
```
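
Deriving the unknown's id from its call-site address keeps ids stable across runs. A minimal sketch of building one entry follows. BuildUnknownCallTarget is a hypothetical name, and "sym:binary:example" is a placeholder because the source_id in the example is elided.

```csharp
using System;
using System.Text.Json.Nodes;

// Hypothetical builder for one "unknowns" entry; the id mirrors the example
// format "unknown:call:0x<call site>", with the address as lowercase hex.
JsonObject BuildUnknownCallTarget(string sourceId, ulong callSite, string reason)
{
    string site = $"0x{callSite:x8}"; // at least 8 hex digits, more for 64-bit addresses
    return new JsonObject
    {
        ["id"] = $"unknown:call:{site}",
        ["type"] = "unresolved_call_target",
        ["source_id"] = sourceId,
        ["call_site"] = site,
        ["reason"] = reason,
    };
}

var unknowns = new JsonArray
{
    BuildUnknownCallTarget("sym:binary:example", 0x12345678, "Indirect call through register"),
};
Console.WriteLine(new JsonObject { ["unknowns"] = unknowns }.ToJsonString());
```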

5. Performance Constraints

5.1 Size Limits

| Metric | Limit | Action on Exceed |
|---|---|---|
| Binary size | 100 MB | Warn, proceed |
| Symbol count | 1M symbols | Chunk processing |
| Edge count | 10M edges | Chunk output |
| Memory usage | 4 GB | Stream processing |
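
The size limits translate into a small policy check plus batching. In the following sketch, PlanForLimits, SymbolBatches, the action strings, and the 100,000 batch size are all hypothetical. The sketch also assumes binary units (MiB/GiB) for the MB/GB limits, since the table does not say which it means.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Limits from the table above. ASSUMPTION: MB/GB are read as binary units.
const long MaxBinaryBytes = 100L * 1024 * 1024;
const long MaxSymbols = 1_000_000;
const long MaxEdges = 10_000_000;
const long MaxMemoryBytes = 4L * 1024 * 1024 * 1024;

// Hypothetical policy check: returns the actions to take, empty if within limits.
List<string> PlanForLimits(long binaryBytes, long symbolCount, long edgeCount, long peakMemoryBytes)
{
    var actions = new List<string>();
    if (binaryBytes > MaxBinaryBytes) actions.Add("warn");       // then proceed
    if (symbolCount > MaxSymbols) actions.Add("chunk-symbols");
    if (edgeCount > MaxEdges) actions.Add("chunk-output");
    if (peakMemoryBytes > MaxMemoryBytes) actions.Add("stream");
    return actions;
}

// Chunked processing: Enumerable.Chunk (.NET 6+) yields fixed-size batches.
IEnumerable<string[]> SymbolBatches(IEnumerable<string> symbols, int batchSize = 100_000) =>
    symbols.Chunk(batchSize);

var plan = PlanForLimits(binaryBytes: 150L * 1024 * 1024, symbolCount: 2_500_000,
                         edgeCount: 1_000, peakMemoryBytes: 512L * 1024 * 1024);
Console.WriteLine(string.Join(", ", plan)); // warn, chunk-symbols
```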

5.2 Timeout Constraints

| Operation | Timeout | Action on Exceed |
|---|---|---|
| ELF parse | 60s | Fail, return partial results |
| Demangle all | 120s | Truncate results |
| CFG analysis | 300s | Skip heuristics |
| Total analysis | 600s | Fail gracefully |
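
The "Truncate results" rule for demangling can be sketched as a time budget checked between symbols. DemangleWithinBudget is a hypothetical name, and the identity lambda in the usage stands in for a real demangler. The budget is checked between symbols, so a single slow symbol can overrun it slightly.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical sketch of "Demangle all: 120s, truncate results".
(List<string> Results, bool Truncated) DemangleWithinBudget(
    IReadOnlyList<string> mangled, Func<string, string> demangle, TimeSpan budget)
{
    var results = new List<string>(mangled.Count);
    var clock = Stopwatch.StartNew();
    foreach (string symbol in mangled)
    {
        if (clock.Elapsed >= budget)
            return (results, true); // out of time: keep what we have, flag truncation
        results.Add(demangle(symbol));
    }
    return (results, false);
}

var symbols = new[] { "_ZN4Curl7Session4readEv", "_Z3foov" };
var (done, truncated) = DemangleWithinBudget(symbols, s => s, TimeSpan.FromSeconds(120));
Console.WriteLine($"{done.Count} demangled, truncated={truncated}"); // 2 demangled, truncated=False
```

For the 600s total, a CancellationTokenSource created with TimeSpan.FromSeconds(600) could be passed as the ct argument of AnalyzeAsync, so the whole pipeline shares one deadline while each stage keeps its own budget.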

6. Integration Points

6.1 Scanner Plugin Interface

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public interface INativeAnalyzer : IAnalyzerPlugin
{
    Task<NativeObservationDocument> AnalyzeAsync(
        Stream binaryStream,
        NativeAnalyzerOptions options,
        CancellationToken ct);
}
```

6.2 RichGraph Integration

Native analysis results feed into RichGraph:

NativeObservation → NativeReachabilityGraph → RichGraph nodes/edges

6.3 Signals Integration

Native symbols with runtime hits:

Signals runtime-facts + RichGraph → ReachabilityFact with confidence

7. Implementation Checklist

| Task | Status | Owner |
|---|---|---|
| ELF parser | Done | Scanner Guild |
| PE parser | Done | Scanner Guild |
| Mach-O parser | In Progress | Scanner Guild |
| C++ demangler | Done | Scanner Guild |
| Rust demangler | Pending | Scanner Guild |
| Callgraph builder | Done | Scanner Guild |
| Test fixtures | Partial | QA Guild |
| CI integration | Pending | DevOps Guild |


Changelog

| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2025-12-13 | Platform Guild | Initial toolchain decision |