
Semantic Diffing Architecture

Status: PHASE 1 IMPLEMENTED (B2R2 IR Lifting)
Version: 1.1.0
Related Sprints:

  • SPRINT_20260105_001_001_BINDEX_semdiff_ir_semantics.md
  • SPRINT_20260105_001_002_BINDEX_semdiff_corpus.md
  • SPRINT_20260105_001_003_BINDEX_semdiff_ghidra.md
  • SPRINT_20260105_001_004_BINDEX_semdiff_decompiler_ml.md

1. Executive Summary

Semantic diffing is an advanced binary analysis capability that detects function equivalence based on behavior rather than syntax. This enables accurate vulnerability detection in scenarios where traditional byte-level or symbol-based matching fails:

  • Compiler optimizations - Same source, different instructions
  • Obfuscation - Intentionally altered code structure
  • Stripped binaries - No symbols or debug information
  • Cross-compiler - GCC and Clang produce different output
  • Backported patches - Different version, same fix

Expected Impact

| Capability | Current Accuracy | With Semantic Diffing |
|---|---|---|
| Patch detection (optimized) | ~70% | 92%+ |
| Function identification (stripped) | ~50% | 85%+ |
| Obfuscation resilience | ~40% | 75%+ |
| False positive rate | ~5% | <2% |

2. Architecture Overview

┌─────────────────────────────────────────────────────────────────────────────────┐
│                        Semantic Diffing Architecture                             │
│                                                                                  │
│  ┌─────────────────────────────────────────────────────────────────────────────┐│
│  │                         Analysis Layer                                       ││
│  │                                                                              ││
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐        ││
│  │  │   B2R2      │  │   Ghidra    │  │ Decompiler  │  │     ML      │        ││
│  │  │  (Primary)  │  │ (Fallback)  │  │  (Optional) │  │ (Optional)  │        ││
│  │  │             │  │             │  │             │  │             │        ││
│  │  │ - Disasm    │  │ - P-Code    │  │ - C output  │  │ - CodeBERT  │        ││
│  │  │ - LowUIR    │  │ - BSim      │  │ - AST parse │  │ - GraphSage │        ││
│  │  │ - CFG       │  │ - Ver.Track │  │ - Normalize │  │ - Embedding │        ││
│  │  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘        ││
│  │         │                │                │                │               ││
│  └─────────┴────────────────┴────────────────┴────────────────┴───────────────┘│
│                                      │                                          │
│                                      v                                          │
│  ┌─────────────────────────────────────────────────────────────────────────────┐│
│  │                       Fingerprint Layer                                      ││
│  │                                                                              ││
│  │  ┌───────────────────┐  ┌───────────────────┐  ┌───────────────────┐       ││
│  │  │   Instruction     │  │    Semantic       │  │   Decompiled      │       ││
│  │  │   Fingerprint     │  │    Fingerprint    │  │   Fingerprint     │       ││
│  │  │                   │  │                   │  │                   │       ││
│  │  │ - BasicBlock hash │  │ - KSG graph hash  │  │ - AST hash        │       ││
│  │  │ - CFG edge hash   │  │ - WL hash         │  │ - Normalized code │       ││
│  │  │ - String refs     │  │ - DataFlow hash   │  │ - API sequence    │       ││
│  │  │ - Rolling chunks  │  │ - API calls       │  │ - Pattern hash    │       ││
│  │  └───────────────────┘  └───────────────────┘  └───────────────────┘       ││
│  │                                                                              ││
│  │  ┌───────────────────┐  ┌───────────────────┐                               ││
│  │  │      BSim         │  │   ML Embedding    │                               ││
│  │  │   Signature       │  │     Vector        │                               ││
│  │  │                   │  │                   │                               ││
│  │  │ - Feature vector  │  │ - 768-dim float[] │                               ││
│  │  │ - Significance    │  │ - Cosine sim      │                               ││
│  │  └───────────────────┘  └───────────────────┘                               ││
│  │                                                                              ││
│  └─────────────────────────────────────────────────────────────────────────────┘│
│                                      │                                          │
│                                      v                                          │
│  ┌─────────────────────────────────────────────────────────────────────────────┐│
│  │                       Matching Layer                                         ││
│  │                                                                              ││
│  │  ┌───────────────────────────────────────────────────────────────────────┐  ││
│  │  │                    Ensemble Decision Engine                            │  ││
│  │  │                                                                        │  ││
│  │  │  Signal Weights:                                                       │  ││
│  │  │  - Instruction fingerprint:  15%                                       │  ││
│  │  │  - Semantic graph:           25%                                       │  ││
│  │  │  - Decompiled AST:           35%                                       │  ││
│  │  │  - ML embedding:             25%                                       │  ││
│  │  │                                                                        │  ││
│  │  │  Output: Confidence-weighted similarity score                          │  ││
│  │  │                                                                        │  ││
│  │  └───────────────────────────────────────────────────────────────────────┘  ││
│  │                                                                              ││
│  └─────────────────────────────────────────────────────────────────────────────┘│
│                                      │                                          │
│                                      v                                          │
│  ┌─────────────────────────────────────────────────────────────────────────────┐│
│  │                       Storage Layer                                          ││
│  │                                                                              ││
│  │  PostgreSQL                RustFS                 Valkey                    ││
│  │  - corpus.* tables         - Fingerprint blobs    - Query cache             ││
│  │  - binaries.* tables       - Model artifacts      - Embedding index         ││
│  │  - BSim database           - Training data                                  ││
│  │                                                                              ││
│  └─────────────────────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────────────────────┘

3. Implementation Phases

Phase 1: IR-Level Semantic Analysis (Foundation)

Sprints:

  • SPRINT_20260105_001_001_BINDEX_semdiff_ir_semantics.md
  • SPRINT_20260112_004_BINIDX_b2r2_lowuir_perf_cache.md (Performance & Ops)

Leverage B2R2's Intermediate Representation (IR) for semantic-level function comparison.

Key Components:

  • B2R2LowUirLiftingService - Lifts instructions to B2R2 LowUIR, maps to Stella IR model
  • B2R2LifterPool - Bounded pool with warm preload for lifter reuse
  • FunctionIrCacheService - Valkey-backed cache for semantic fingerprints
  • SemanticGraphExtractor - Builds the Key-Semantics Graph (KSG)
  • WeisfeilerLehmanHasher - Graph fingerprinting
  • SemanticMatcher - Semantic similarity scoring

B2R2LowUirLiftingService Implementation:

  • Supports Intel, ARM, MIPS, RISC-V, PowerPC, SPARC, SH4, AVR, EVM
  • Maps B2R2 LowUIR statements to IrStatement model
  • Applies SSA numbering to temporary registers
  • Deterministic block ordering (by entry address)
  • InvariantCulture formatting throughout
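
A minimal sketch of the SSA numbering step applied to an illustrative IR shape; IrOperand and IrAssignment here are stand-ins, not the actual Stella IR model, and the versioning scheme is an assumption:

// Illustrative only: assigns monotonically increasing SSA versions to temporary
// registers as they are (re)defined, so "T1" becomes "T1_0", "T1_1", ... and
// later uses refer to the latest version. The record shapes are assumptions.
using System.Linq;

public sealed record IrOperand(string Name, bool IsTemporary);
public sealed record IrAssignment(IrOperand Target, IReadOnlyList<IrOperand> Sources);

public static class SsaNumbering
{
    public static List<IrAssignment> Apply(IEnumerable<IrAssignment> statements)
    {
        var versions = new Dictionary<string, int>(StringComparer.Ordinal);
        var result = new List<IrAssignment>();

        foreach (var stmt in statements)
        {
            // Uses refer to the current (latest) version of each temporary.
            var sources = stmt.Sources
                .Select(s => s.IsTemporary ? Rename(s, versions.GetValueOrDefault(s.Name, 0)) : s)
                .ToList();

            // A definition bumps the version of the target temporary.
            var target = stmt.Target;
            if (target.IsTemporary)
            {
                var next = versions.GetValueOrDefault(target.Name, -1) + 1;
                versions[target.Name] = next;
                target = Rename(target, next);
            }

            result.Add(new IrAssignment(target, sources));
        }

        return result;
    }

    private static IrOperand Rename(IrOperand op, int version) =>
        op with { Name = FormattableString.Invariant($"{op.Name}_{version}") };
}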

B2R2LifterPool Implementation:

  • Bounded per-ISA pooling (default 4 lifters/ISA)
  • Warm preload at startup for common ISAs
  • Per-ISA stats (pooled, active, max)
  • Automatic return on dispose
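
A minimal sketch of the bounded per-ISA pooling pattern; the generic lifter type, method names, and string-keyed ISA are assumptions, not the B2R2LifterPool API:

// Illustrative bounded pool: at most `maxPerIsa` lifters per ISA are alive at
// once; RentAsync waits until a slot frees, Return parks the instance for reuse.
using System.Collections.Concurrent;

public sealed class LifterPool<TLifter>(Func<TLifter> factory, int maxPerIsa = 4)
{
    private readonly ConcurrentDictionary<string, (SemaphoreSlim Gate, ConcurrentBag<TLifter> Idle)> _pools = new();

    public async Task<TLifter> RentAsync(string isa, CancellationToken ct)
    {
        var pool = _pools.GetOrAdd(isa, _ => (new SemaphoreSlim(maxPerIsa), new ConcurrentBag<TLifter>()));
        await pool.Gate.WaitAsync(ct);                       // enforce the per-ISA bound
        return pool.Idle.TryTake(out var lifter) ? lifter : factory();
    }

    public void Return(string isa, TLifter lifter)
    {
        var pool = _pools[isa];
        pool.Idle.Add(lifter);                               // keep warm for reuse
        pool.Gate.Release();
    }

    // Warm preload: create and park instances for commonly seen ISAs at startup.
    public void Warm(string isa, int count)
    {
        var pool = _pools.GetOrAdd(isa, _ => (new SemaphoreSlim(maxPerIsa), new ConcurrentBag<TLifter>()));
        for (var i = 0; i < Math.Min(count, maxPerIsa); i++) pool.Idle.Add(factory());
    }
}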

FunctionIrCacheService Implementation:

  • Cache key: (isa, b2r2_version, normalization_recipe, canonical_ir_hash)
  • Valkey as hot cache (default 4h TTL)
  • PostgreSQL persistence for fingerprint records
  • Hit/miss/eviction statistics
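
A minimal sketch of how the cache key tuple could be composed into a single deterministic key; the key prefix and exact framing are assumptions:

// Illustrative cache key: the four components listed above, joined and hashed so
// that any change in ISA, B2R2 version, normalization recipe, or canonical IR
// invalidates the entry. The "bindex:funcir:" prefix is an assumption.
using System.Security.Cryptography;
using System.Text;

public static class FunctionIrCacheKey
{
    public static string Build(string isa, string b2r2Version, string normalizationRecipe, string canonicalIrHash)
    {
        var composite = string.Join('|', isa, b2r2Version, normalizationRecipe, canonicalIrHash);
        var digest = SHA256.HashData(Encoding.UTF8.GetBytes(composite));
        return $"bindex:funcir:{Convert.ToHexString(digest).ToLowerInvariant()}";
    }
}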

Ops Endpoints:

  • GET /api/v1/ops/binaryindex/health - Lifter warmness, cache status
  • POST /api/v1/ops/binaryindex/bench/run - Benchmark latency
  • GET /api/v1/ops/binaryindex/cache - Cache statistics
  • GET /api/v1/ops/binaryindex/config - Effective configuration

Deliverables:

  • StellaOps.BinaryIndex.Semantic library
  • StellaOps.BinaryIndex.Disassembly.B2R2 (LowUIR adapter, lifter pool)
  • StellaOps.BinaryIndex.Cache (function IR cache)
  • BinaryIndexOpsController
  • 20+ tasks, ~3 weeks

Phase 2: Function Behavior Corpus (Scale)

Sprint: SPRINT_20260105_001_002_BINDEX_semdiff_corpus.md

Build a comprehensive database of known library functions.

Key Components:

  • Library corpus connectors (glibc, OpenSSL, zlib, curl, SQLite)
  • CorpusIngestionService - Batch fingerprint generation
  • FunctionClusteringService - Group similar functions
  • CorpusQueryService - Function identification
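
A hedged usage sketch for corpus-based identification; the CorpusQueryService method, options, and result shape shown here are assumptions for illustration only:

// Hypothetical usage: look up an unknown function's semantic fingerprint in the
// corpus and keep only confident matches. Method and property names are assumed.
var candidates = await _corpusQueryService.IdentifyAsync(
    semanticFingerprint,
    new CorpusQueryOptions { MinSimilarity = 0.85m, MaxResults = 5 },
    ct);

foreach (var match in candidates)
{
    // e.g. "glibc 2.38 / memcpy, similarity 0.93"
    _logger.LogInformation(
        "Corpus match: {Library} {Version} / {Function}, similarity {Similarity:F2}",
        match.Library, match.Version, match.FunctionName, match.Similarity);
}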

Deliverables:

  • StellaOps.BinaryIndex.Corpus library
  • PostgreSQL corpus.* schema
  • ~30,000 indexed functions
  • 22 tasks, ~4 weeks

Phase 3: Ghidra Integration (Depth)

Sprint: SPRINT_20260105_001_003_BINDEX_semdiff_ghidra.md

Add Ghidra as a secondary backend for complex cases.

Key Components:

  • GhidraHeadlessManager - Process lifecycle
  • VersionTrackingService - Multi-correlator diffing
  • GhidriffBridge - Python interop
  • BSimService - Behavioral similarity

Deliverables:

  • StellaOps.BinaryIndex.Ghidra library
  • Docker image for Ghidra Headless
  • 20 tasks, ~4 weeks

Phase 4: Decompiler & ML (Excellence)

Sprint: SPRINT_20260105_001_004_BINDEX_semdiff_decompiler_ml.md

Highest-fidelity semantic analysis.

Key Components:

  • IDecompilerService - Ghidra decompilation
  • AstComparisonEngine - Structural similarity
  • OnnxInferenceEngine - ML embeddings
  • EnsembleDecisionEngine - Multi-signal fusion

Deliverables:

  • StellaOps.BinaryIndex.Decompiler library
  • StellaOps.BinaryIndex.ML library
  • Trained CodeBERT-Binary model
  • 30 tasks, ~5 weeks

4. Fingerprint Types

4.1 Instruction Fingerprint (Existing)

Algorithm: BasicBlock hash + CFG edge hash + String refs hash

Properties:

  • Fast to compute
  • Sensitive to instruction changes
  • Good for exact/near-exact matches

Weight in ensemble: 15%
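
A minimal sketch of how these components could be folded into one deterministic digest; the input shapes and ordering rules are assumptions, not the shipped fingerprint algorithm:

// Illustrative composite: sort every component so the result is independent of
// traversal order, then hash the concatenation once.
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class InstructionFingerprint
{
    public static string Compute(
        IEnumerable<string> basicBlockHashes,            // per-block hashes (hex), any order
        IEnumerable<(ulong From, ulong To)> cfgEdges,
        IEnumerable<string> stringRefs)
    {
        var builder = new StringBuilder();

        builder.AppendJoin(';', basicBlockHashes.OrderBy(h => h, StringComparer.Ordinal)).Append('\n');
        builder.AppendJoin(';', cfgEdges.OrderBy(e => e.From).ThenBy(e => e.To)
            .Select(e => FormattableString.Invariant($"{e.From:x}->{e.To:x}"))).Append('\n');
        builder.AppendJoin(';', stringRefs.OrderBy(s => s, StringComparer.Ordinal));

        var digest = SHA256.HashData(Encoding.UTF8.GetBytes(builder.ToString()));
        return Convert.ToHexString(digest).ToLowerInvariant();
    }
}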

4.2 Semantic Fingerprint (Phase 1)

Algorithm: Key-Semantics Graph + Weisfeiler-Lehman hash

Properties:

  • Captures data/control dependencies
  • Resilient to register renaming
  • Resilient to instruction reordering

Weight in ensemble: 25%
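
A minimal sketch of Weisfeiler-Lehman relabeling over a KSG-like graph, iteratively hashing each node's label together with its sorted neighbor labels; the graph representation is an assumption:

// Illustrative 1-WL refinement: after `iterations` rounds, the sorted multiset of
// node labels is hashed into a single graph fingerprint. Graph shape is assumed:
// adjacency maps node id -> neighbor ids, labels maps node id -> semantic label.
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class WeisfeilerLehmanHash
{
    public static string Compute(
        IReadOnlyDictionary<int, IReadOnlyList<int>> adjacency,
        IReadOnlyDictionary<int, string> labels,
        int iterations = 3)
    {
        var current = labels.ToDictionary(kv => kv.Key, kv => kv.Value);

        for (var i = 0; i < iterations; i++)
        {
            current = adjacency.Keys.ToDictionary(
                node => node,
                node =>
                {
                    // New label = hash(own label + sorted neighbor labels).
                    var neighbors = adjacency[node].Select(n => current[n]).OrderBy(x => x, StringComparer.Ordinal);
                    return HashLabel(current[node] + "|" + string.Join(",", neighbors));
                });
        }

        // Graph fingerprint = hash of the sorted final labels (node-order independent).
        return HashLabel(string.Join(";", current.Values.OrderBy(x => x, StringComparer.Ordinal)));
    }

    private static string HashLabel(string value) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(value))).ToLowerInvariant();
}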

4.3 Decompiled Fingerprint (Phase 4)

Algorithm: Normalized AST hash + Pattern detection

Properties:

  • Highest semantic fidelity
  • Captures algorithmic structure
  • Resilient to most optimizations

Weight in ensemble: 35%

4.4 ML Embedding (Phase 4)

Algorithm: CodeBERT-Binary transformer, 768-dim vectors

Properties:

  • Learned similarity metric
  • Captures latent patterns
  • Resilient to obfuscation

Weight in ensemble: 25%
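
A minimal sketch of the confidence-weighted fusion implied by the weights above, renormalizing when a signal is unavailable; the exact rule used by EnsembleDecisionEngine may differ:

// Illustrative fusion: each available signal contributes similarity * weight,
// and weights of missing signals (e.g. ML disabled) are renormalized away.
// Weights mirror the documented defaults: 0.15 / 0.25 / 0.35 / 0.25.
public static class EnsembleScore
{
    private static readonly (string Signal, decimal Weight)[] Weights =
    [
        ("instruction", 0.15m),
        ("semantic",    0.25m),
        ("decompiled",  0.35m),
        ("ml",          0.25m),
    ];

    public static decimal Compute(IReadOnlyDictionary<string, decimal> similarities)
    {
        decimal weighted = 0m, available = 0m;

        foreach (var (signal, weight) in Weights)
        {
            if (!similarities.TryGetValue(signal, out var similarity))
                continue;                      // signal not produced for this pair

            weighted += similarity * weight;
            available += weight;
        }

        return available == 0m ? 0m : weighted / available;
    }
}

For example, with only the Tier 1 signals present (instruction 0.90, semantic 0.80), the score is (0.90 × 0.15 + 0.80 × 0.25) / 0.40 ≈ 0.84.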


5. Matching Pipeline

sequenceDiagram
    participant Client
    participant DiffEngine as PatchDiffEngine
    participant B2R2
    participant Ghidra
    participant Corpus
    participant Ensemble

    Client->>DiffEngine: Compare(oldBinary, newBinary)

    par Parallel Analysis
        DiffEngine->>B2R2: Disassemble + IR lift
        DiffEngine->>Ghidra: Decompile (if needed)
    end

    B2R2-->>DiffEngine: SemanticFingerprints[]
    Ghidra-->>DiffEngine: DecompiledFunctions[]

    DiffEngine->>Corpus: IdentifyFunctions(fingerprints)
    Corpus-->>DiffEngine: FunctionMatches[]

    DiffEngine->>Ensemble: ComputeSimilarity(old, new)
    Ensemble-->>DiffEngine: EnsembleResult

    DiffEngine-->>Client: PatchDiffResult

6. Fallback Strategy

The system uses a tiered fallback strategy:

Tier 1: B2R2 IR + Semantic Graph (fast, ~90% coverage)
   │
   │ If confidence < threshold OR architecture unsupported
   v
Tier 2: Ghidra Version Tracking (slower, ~95% coverage)
   │
   │ If function is high-value (CVE-relevant)
   v
Tier 3: Decompiled AST + ML Embedding (slowest, ~99% coverage)

Selection Criteria:

| Condition | Backend | Reason |
|---|---|---|
| Standard x64/ARM64 binary | B2R2 only | Fast, accurate |
| Low B2R2 confidence (<0.7) | B2R2 + Ghidra | Validation |
| Exotic architecture | Ghidra only | Better coverage |
| CVE-affected function | Full pipeline | Maximum accuracy |
| Obfuscated binary | ML embedding | Obfuscation resilience |
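
A hedged sketch of the selection logic implied by the table; the 0.7 threshold comes from the table, while the parameter names and AnalysisTier values are assumptions, not the shipped implementation:

// Illustrative tier selection following the fallback strategy: start with B2R2,
// escalate to Ghidra on low confidence or unsupported ISAs, and run the full
// decompiler + ML pipeline for high-value (CVE-relevant) or obfuscated functions.
public enum AnalysisTier { B2R2Only, B2R2PlusGhidra, GhidraOnly, FullPipeline }

public static class BackendSelector
{
    public static AnalysisTier Select(
        bool isaSupportedByB2R2,
        decimal? b2r2Confidence,       // null until a Tier 1 pass has run
        bool cveAffected,
        bool likelyObfuscated)
    {
        if (cveAffected || likelyObfuscated)
            return AnalysisTier.FullPipeline;          // maximum accuracy path

        if (!isaSupportedByB2R2)
            return AnalysisTier.GhidraOnly;            // exotic architectures

        if (b2r2Confidence is < 0.7m)
            return AnalysisTier.B2R2PlusGhidra;        // validate low-confidence results

        return AnalysisTier.B2R2Only;                  // fast path for standard binaries
    }
}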

7. Corpus Coverage

Priority Libraries

| Library | Priority | Functions | CVEs |
|---|---|---|---|
| glibc | Critical | ~15,000 | 50+ |
| OpenSSL | Critical | ~8,000 | 100+ |
| zlib | High | ~200 | 5+ |
| libcurl | High | ~2,000 | 80+ |
| SQLite | High | ~1,500 | 30+ |
| libxml2 | Medium | ~1,200 | 40+ |
| libpng | Medium | ~300 | 10+ |
| expat | Medium | ~150 | 15+ |

Architecture Coverage

| Architecture | B2R2 | Ghidra | Status |
|---|---|---|---|
| x86_64 | Excellent | Excellent | Primary |
| ARM64 | Excellent | Excellent | Primary |
| ARM32 | Good | Excellent | Secondary |
| MIPS32 | Fair | Excellent | Fallback |
| MIPS64 | Fair | Excellent | Fallback |
| RISC-V | Good | Good | Emerging |
| PPC32/64 | Fair | Excellent | Fallback |

8. Performance Characteristics

Latency Budget

| Operation | Target | Notes |
|---|---|---|
| B2R2 disassembly | <100ms | Per function |
| IR lifting | <50ms | Per function |
| Semantic fingerprint | <50ms | Per function |
| Ghidra analysis | <30s | Per binary (startup) |
| Decompilation | <500ms | Per function |
| ML inference | <100ms | Per function |
| Ensemble decision | <10ms | Per comparison |
| Total (Tier 1) | <200ms | Per function |
| Total (Full) | <1s | Per function |

Memory Budget

| Component | Memory | Notes |
|---|---|---|
| B2R2 per binary | ~100MB | Scales with binary size |
| Ghidra per project | ~2GB | Persistent cache |
| ML model | ~500MB | ONNX loaded |
| Corpus query cache | ~100MB | LRU eviction |

9. Integration Points

9.1 Scanner Integration

// Scanner.Worker uses semantic diffing for binary vulnerability detection
var result = await _binaryVulnerabilityService.LookupByFingerprintAsync(
    fingerprint,
    minSimilarity: 0.85m,
    useSemanticMatching: true,  // Enable semantic diffing
    ct);

9.2 PatchDiffEngine Enhancement

// PatchDiffEngine now includes semantic comparison
var diff = await _patchDiffEngine.DiffAsync(
    vulnerableBinary,
    patchedBinary,
    new PatchDiffOptions
    {
        UseSemanticAnalysis = true,
        SemanticThreshold = 0.7m,
        IncludeDecompilation = true,
        IncludeMlEmbedding = true
    },
    ct);

9.3 DeltaSignature Enhancement

// Delta signatures now include semantic fingerprints
var signature = await _deltaSignatureGenerator.GenerateSignaturesAsync(
    binaryStream,
    new DeltaSignatureRequest
    {
        Cve = "CVE-2024-1234",
        TargetSymbols = ["vulnerable_func"],
        IncludeSemanticFingerprint = true,
        IncludeDecompiledHash = true
    },
    ct);

10. Security Considerations

10.1 Sandbox Requirements

All binary analysis runs in sandboxed environments:

  • Seccomp profile restricting syscalls
  • Read-only root filesystem
  • No network access during analysis
  • Memory/CPU limits

10.2 Model Security

ML models are:

  • Signed with DSSE attestations
  • Verified before loading
  • Not user-uploadable (pre-trained only)

10.3 Corpus Integrity

Corpus data is:

  • Ingested from trusted sources only
  • Signed at snapshot level
  • Version-controlled with audit trail

11. Configuration

# binaryindex.yaml - Semantic diffing configuration
binaryindex:
  semantic_diffing:
    enabled: true

    # Analysis backends
    backends:
      b2r2:
        enabled: true
        ir_lifting: true
        semantic_graph: true
      ghidra:
        enabled: true
        fallback_only: true
        min_b2r2_confidence: 0.7
        headless_timeout_ms: 30000
      decompiler:
        enabled: true
        high_value_only: true  # Only for CVE-affected functions
      ml:
        enabled: true
        model_path: /models/codebert_binary_v1.onnx
        embedding_dimension: 768

    # Ensemble weights
    ensemble:
      instruction_weight: 0.15
      semantic_weight: 0.25
      decompiled_weight: 0.35
      ml_weight: 0.25
      min_confidence: 0.6

    # Corpus
    corpus:
      auto_update: true
      update_interval_hours: 24
      libraries:
        - glibc
        - openssl
        - zlib
        - curl
        - sqlite

    # Performance
    performance:
      max_parallel_analyses: 4
      cache_ttl_seconds: 3600
      max_function_size_bytes: 1048576  # 1MB

Additional appsettings sections (case-insensitive):

  • BinaryIndex:B2R2Pool - lifter pool sizing and warm ISA list.
  • BinaryIndex:SemanticLifting - LowUIR enablement and deterministic controls.
  • BinaryIndex:FunctionCache - Valkey function cache configuration.
  • Postgres:BinaryIndex - persistence for canonical IR fingerprints.

12. Metrics & Observability

Ops Endpoints

BinaryIndex exposes ops endpoints for health, benchmarking, cache statistics, and effective configuration:

  • GET /api/v1/ops/binaryindex/health -> BinaryIndexOpsHealthResponse
  • POST /api/v1/ops/binaryindex/bench/run -> BinaryIndexBenchResponse
  • GET /api/v1/ops/binaryindex/cache -> BinaryIndexFunctionCacheStats
  • GET /api/v1/ops/binaryindex/config -> BinaryIndexEffectiveConfig

Metrics

| Metric | Type | Labels |
|---|---|---|
| semantic_diffing_analysis_total | Counter | backend, result |
| semantic_diffing_latency_ms | Histogram | backend, tier |
| semantic_diffing_accuracy | Gauge | comparison_type |
| corpus_functions_total | Gauge | library |
| ml_inference_latency_ms | Histogram | model |
| ensemble_signal_weight | Gauge | signal_type |
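
A minimal sketch of how the first two instruments might be registered and recorded with the standard System.Diagnostics.Metrics API; the meter name and method shape are assumptions, only the metric names and labels come from the table:

// Illustrative instrument registration for the metrics listed above.
using System.Diagnostics.Metrics;

public sealed class SemanticDiffingMetrics
{
    private readonly Counter<long> _analysisTotal;
    private readonly Histogram<double> _latencyMs;

    public SemanticDiffingMetrics(IMeterFactory meterFactory)
    {
        // Meter name is an assumption.
        var meter = meterFactory.Create("StellaOps.BinaryIndex.SemanticDiffing");
        _analysisTotal = meter.CreateCounter<long>("semantic_diffing_analysis_total");
        _latencyMs = meter.CreateHistogram<double>("semantic_diffing_latency_ms", unit: "ms");
    }

    public void RecordAnalysis(string backend, string result, string tier, double elapsedMs)
    {
        _analysisTotal.Add(1, new("backend", backend), new("result", result));
        _latencyMs.Record(elapsedMs, new("backend", backend), new("tier", tier));
    }
}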

Traces

  • semantic_diffing.analyze - Full analysis span
  • semantic_diffing.b2r2.lift - IR lifting
  • semantic_diffing.ghidra.decompile - Decompilation
  • semantic_diffing.ml.inference - ML embedding
  • semantic_diffing.ensemble.decide - Ensemble decision

13. Testing Strategy

Unit Tests

| Test Suite | Coverage |
|---|---|
| IrLiftingServiceTests | IR lifting correctness |
| SemanticGraphExtractorTests | Graph construction |
| WeisfeilerLehmanHasherTests | Hash stability |
| AstComparisonEngineTests | AST similarity |
| OnnxInferenceEngineTests | ML inference |
| EnsembleDecisionEngineTests | Weight combination |

Integration Tests

| Test Suite | Coverage |
|---|---|
| EndToEndSemanticDiffTests | Full pipeline |
| OptimizationResilienceTests | O0 vs O2 vs O3 |
| CompilerVariantTests | GCC vs Clang |
| GhidraFallbackTests | Fallback scenarios |

Golden Corpus Tests

Pre-computed test cases with known results:

  • 100 CVE patch pairs (vulnerable -> fixed)
  • 50 optimization variant sets
  • 25 compiler variant sets
  • 25 obfuscation variant sets

14. Roadmap

| Phase | Status | ETA | Impact |
|---|---|---|---|
| Phase 1: IR Semantics | Implemented | 2026-01-24 | +15% accuracy |
| Phase 2: Corpus | Planned | 2026-02-15 | +10% coverage |
| Phase 3: Ghidra | Planned | 2026-02-28 | +5% edge cases |
| Phase 4: Decompiler/ML | Planned | 2026-03-31 | +10% obfuscation |
| Total | | | +35-40% |

15. Delta-Sig Predicate Attestation

Sprint Reference: SPRINT_20260117_003_BINDEX_delta_sig_predicate

Delta-sig predicates provide a supply chain attestation format for binary patches, enabling policy-gated releases based on function-level change scope.

15.1 Predicate Structure

{
  "_type": "https://in-toto.io/Statement/v1",
  "predicateType": "https://stellaops.io/delta-sig/v1",
  "subject": [
    {
      "name": "libexample-1.1.so",
      "digest": {
        "sha256": "abc123..."
      }
    }
  ],
  "predicate": {
    "before": {
      "name": "libexample-1.0.so",
      "digest": { "sha256": "def456..." }
    },
    "after": {
      "name": "libexample-1.1.so",
      "digest": { "sha256": "abc123..." }
    },
    "diff": [
      {
        "function": "process_input",
        "changeType": "modified",
        "beforeHash": "sha256:old...",
        "afterHash": "sha256:new...",
        "bytesDelta": 48,
        "semanticSimilarity": 0.87
      },
      {
        "function": "new_handler",
        "changeType": "added",
        "afterHash": "sha256:new...",
        "bytesDelta": 256
      }
    ],
    "summary": {
      "functionsAdded": 1,
      "functionsRemoved": 0,
      "functionsModified": 1,
      "totalBytesChanged": 304
    },
    "timestamp": "2026-01-16T12:00:00Z"
  }
}

15.2 Policy Gate Integration

The DeltaScopePolicyGate enforces limits on patch scope:

policy:
  deltaSig:
    maxAddedFunctions: 10
    maxRemovedFunctions: 5
    maxModifiedFunctions: 20
    maxBytesChanged: 50000
    minSemanticSimilarity: 0.5
    requireSemanticAnalysis: false
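
A hedged sketch of how the gate could evaluate a delta-sig summary against these limits; the type and property names are assumptions, while the limit semantics follow the policy example above:

// Illustrative gate check: compare the predicate summary and per-function
// similarities against the configured limits. An empty violation list passes.
public sealed record DeltaScopeLimits(
    int MaxAddedFunctions, int MaxRemovedFunctions, int MaxModifiedFunctions,
    long MaxBytesChanged, decimal MinSemanticSimilarity, bool RequireSemanticAnalysis);

public sealed record DeltaSummary(
    int FunctionsAdded, int FunctionsRemoved, int FunctionsModified,
    long TotalBytesChanged, IReadOnlyList<decimal?> SemanticSimilarities);

public static class DeltaScopeGate
{
    public static IReadOnlyList<string> Evaluate(DeltaSummary summary, DeltaScopeLimits limits)
    {
        var violations = new List<string>();

        if (summary.FunctionsAdded > limits.MaxAddedFunctions)
            violations.Add($"added functions {summary.FunctionsAdded} > {limits.MaxAddedFunctions}");
        if (summary.FunctionsRemoved > limits.MaxRemovedFunctions)
            violations.Add($"removed functions {summary.FunctionsRemoved} > {limits.MaxRemovedFunctions}");
        if (summary.FunctionsModified > limits.MaxModifiedFunctions)
            violations.Add($"modified functions {summary.FunctionsModified} > {limits.MaxModifiedFunctions}");
        if (summary.TotalBytesChanged > limits.MaxBytesChanged)
            violations.Add($"bytes changed {summary.TotalBytesChanged} > {limits.MaxBytesChanged}");

        if (limits.RequireSemanticAnalysis)
        {
            // Modified functions without a similarity score, or below the floor, fail the gate.
            foreach (var similarity in summary.SemanticSimilarities)
                if (similarity is null || similarity < limits.MinSemanticSimilarity)
                    violations.Add($"semantic similarity {similarity?.ToString() ?? "missing"} < {limits.MinSemanticSimilarity}");
        }

        return violations;
    }
}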

15.3 Attestor Integration

Delta-sig predicates integrate with the Attestor module:

  1. Generate - Create predicate from before/after binary analysis
  2. Sign - Create DSSE envelope with cosign/fulcio signature
  3. Submit - Log to Rekor transparency log
  4. Verify - Validate signature and inclusion proof

15.4 CLI Commands

# Generate delta-sig predicate
stella binary diff --before old.so --after new.so --output delta.json

# Generate and attest in one step
stella binary attest --before old.so --after new.so --sign --rekor

# Verify attestation
stella binary verify --predicate delta.json --signature sig.dsse

# Check against policy gate
stella binary gate --predicate delta.json --policy policy.yaml

15.5 Semantic Similarity Scoring

When requireSemanticAnalysis is enabled, the gate also evaluates each modified function's reported semantic similarity against the following bands:

| Threshold | Meaning |
|---|---|
| > 0.9 | Near-identical (cosmetic changes) |
| 0.7 - 0.9 | Similar (refactoring, optimization) |
| 0.5 - 0.7 | Moderate changes (significant logic) |
| < 0.5 | Major rewrite (requires review) |

15.6 Evidence Storage

Delta-sig predicates are stored in the Evidence Locker and can be included in portable bundles for air-gapped verification.


16. References

Internal

  • docs/modules/binary-index/architecture.md
  • src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/
  • src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Fingerprints/


17. B2R2 Troubleshooting Guide

This section covers common issues and resolutions when using B2R2 for IR lifting.

17.1 Lifting Failures

Symptom: B2R2LiftingException: Failed to lift function at address 0x...

Common Causes:

  1. Unsupported instruction - B2R2 may not recognize certain instructions
  2. Invalid entry point - Function address is not a valid entry point
  3. Obfuscated code - Heavy obfuscation defeats parsing

Resolution:

// Check if architecture is supported before lifting
if (!liftingService.SupportsArchitecture(binary.Architecture))
{
    // Fall back to disassembly-only mode
    return await _disassemblyService.DisassembleAsync(binary, ct);
}

// Use try-lift with fallback
var result = await _liftingService.TryLiftWithFallbackAsync(
    binary,
    new LiftingOptions { FallbackToDisassembly = true },
    ct);

17.2 Memory Issues

Symptom: OutOfMemoryException during lifting of large binaries

Common Causes:

  1. Pool exhaustion - Too many concurrent lifter instances
  2. Large function - Single function exceeds memory budget
  3. Memory leak - Lifter instances not properly disposed

Resolution:

# Adjust pool configuration in appsettings.yaml
BinaryIndex:
  B2R2Pool:
    MaxInstancesPerIsa: 4          # Reduce if OOM
    RecycleAfterOperations: 1000   # Force recycle more often
    MaxFunctionSizeBytes: 1048576  # Skip very large functions

17.3 Performance Issues

Symptom: Lifting takes longer than expected (>30s for small binaries)

Common Causes:

  1. Cold pool - No warm lifter instances available
  2. Complex CFG - Function has extremely complex control flow
  3. Cache misses - IR cache not configured or full

Resolution:

// Ensure pool is warmed at startup
await _lifterPool.WarmAsync(new[] { ISA.AMD64, ISA.ARM64 }, ct);

// Check cache health
var stats = await _cacheService.GetStatisticsAsync(ct);
if (stats.HitRate < 0.5)
{
    _logger.LogWarning("Low cache hit rate: {HitRate:P}", stats.HitRate);
}

17.4 Determinism Issues

Symptom: Same binary produces different IR hashes on repeated lifts

Common Causes:

  1. Non-deterministic block ordering - Blocks not sorted by address
  2. Timestamp inclusion - IR includes lift timestamp
  3. B2R2 version mismatch - Different versions produce different IR

Resolution:

  • Ensure InvariantCulture is used for all string formatting
  • Sort basic blocks by entry address before hashing
  • Include B2R2 version in cache keys
  • Use DeterministicHash utility for consistent hashing
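
A minimal sketch that combines the controls above into a canonical IR hash; the block/statement shapes and exact framing are assumptions:

// Illustrative canonical IR hash: blocks sorted by entry address, addresses
// rendered with InvariantCulture, and the B2R2 version folded into the digest so
// cross-version results never collide in the cache. No timestamps enter the hash.
using System.Globalization;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class CanonicalIrHash
{
    public static string Compute(
        IEnumerable<(ulong EntryAddress, IReadOnlyList<string> Statements)> blocks,
        string b2r2Version)
    {
        var builder = new StringBuilder();
        builder.Append("b2r2=").Append(b2r2Version).Append('\n');

        foreach (var block in blocks.OrderBy(b => b.EntryAddress))           // deterministic block order
        {
            builder.Append(block.EntryAddress.ToString("x", CultureInfo.InvariantCulture)).Append(':');
            builder.AppendJoin(';', block.Statements).Append('\n');          // statements as already-canonical text
        }

        var digest = SHA256.HashData(Encoding.UTF8.GetBytes(builder.ToString()));
        return Convert.ToHexString(digest).ToLowerInvariant();
    }
}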

17.5 Architecture Detection Issues

Symptom: Wrong architecture selected for multi-arch binary (fat binary)

Common Causes:

  1. Universal binary - macOS fat binaries contain multiple architectures
  2. ELF with multiple ABIs - Rare but possible

Resolution:

// Explicitly specify target architecture
var liftOptions = new LiftingOptions
{
    TargetArchitecture = ISA.AMD64,  // Force x86-64
    IgnoreOtherArchitectures = true
};

17.6 LowUIR Mapping Issues

Symptom: Specific B2R2 LowUIR statements not mapped correctly

Reference: LowUIR Statement Type Mapping

| B2R2 LowUIR | Stella IR Model | Notes |
|---|---|---|
| LMark | IrLabel | Block label markers |
| Put | IrAssignment | Register write |
| Store | IrStore | Memory write |
| InterJmp | IrJump | Cross-function jump |
| IntraJmp | IrJump | Intra-function jump |
| InterCJmp | IrConditionalJump | Cross-function conditional |
| IntraCJmp | IrConditionalJump | Intra-function conditional |
| SideEffect | IrCall/IrReturn | Function calls, returns |
| Def/Use/Phi | IrPhi | SSA form constructs |

17.7 Diagnostic Commands

# Check B2R2 health
stella ops binaryindex health --verbose

# Run benchmark suite
stella ops binaryindex bench --iterations 100 --binary sample.so

# View cache statistics
stella ops binaryindex cache --stats

# Dump effective configuration
stella ops binaryindex config

Document Version: 1.1.0
Last Updated: 2026-01-19