Semantic Diffing Architecture
Status: PHASE 1 IMPLEMENTED (B2R2 IR Lifting)
Version: 1.1.0
Related Sprints:
- SPRINT_20260105_001_001_BINDEX_semdiff_ir_semantics.md
- SPRINT_20260105_001_002_BINDEX_semdiff_corpus.md
- SPRINT_20260105_001_003_BINDEX_semdiff_ghidra.md
- SPRINT_20260105_001_004_BINDEX_semdiff_decompiler_ml.md
1. Executive Summary
Semantic diffing is an advanced binary analysis capability that detects function equivalence based on behavior rather than syntax. This enables accurate vulnerability detection in scenarios where traditional byte-level or symbol-based matching fails:
- Compiler optimizations - Same source, different instructions
- Obfuscation - Intentionally altered code structure
- Stripped binaries - No symbols or debug information
- Cross-compiler - GCC vs Clang produce different output
- Backported patches - Different version, same fix
Expected Impact
| Capability | Current Accuracy | With Semantic Diffing |
|---|---|---|
| Patch detection (optimized) | ~70% | 92%+ |
| Function identification (stripped) | ~50% | 85%+ |
| Obfuscation resilience | ~40% | 75%+ |
| False positive rate | ~5% | <2% |
2. Architecture Overview
┌─────────────────────────────────────────────────────────────────────────────────┐
│ Semantic Diffing Architecture │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────┐│
│ │ Analysis Layer ││
│ │ ││
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ││
│ │ │ B2R2 │ │ Ghidra │ │ Decompiler │ │ ML │ ││
│ │ │ (Primary) │ │ (Fallback) │ │ (Optional) │ │ (Optional) │ ││
│ │ │ │ │ │ │ │ │ │ ││
│ │ │ - Disasm │ │ - P-Code │ │ - C output │ │ - CodeBERT │ ││
│ │ │ - LowUIR │ │ - BSim │ │ - AST parse │ │ - GraphSage │ ││
│ │ │ - CFG │ │ - Ver.Track │ │ - Normalize │ │ - Embedding │ ││
│ │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ ││
│ │ │ │ │ │ ││
│ └─────────┴────────────────┴────────────────┴────────────────┴───────────────┘│
│ │ │
│ v │
│ ┌─────────────────────────────────────────────────────────────────────────────┐│
│ │ Fingerprint Layer ││
│ │ ││
│ │ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ ││
│ │ │ Instruction │ │ Semantic │ │ Decompiled │ ││
│ │ │ Fingerprint │ │ Fingerprint │ │ Fingerprint │ ││
│ │ │ │ │ │ │ │ ││
│ │ │ - BasicBlock hash │ │ - KSG graph hash │ │ - AST hash │ ││
│ │ │ - CFG edge hash │ │ - WL hash │ │ - Normalized code │ ││
│ │ │ - String refs │ │ - DataFlow hash │ │ - API sequence │ ││
│ │ │ - Rolling chunks │ │ - API calls │ │ - Pattern hash │ ││
│ │ └───────────────────┘ └───────────────────┘ └───────────────────┘ ││
│ │ ││
│ │ ┌───────────────────┐ ┌───────────────────┐ ││
│ │ │ BSim │ │ ML Embedding │ ││
│ │ │ Signature │ │ Vector │ ││
│ │ │ │ │ │ ││
│ │ │ - Feature vector │ │ - 768-dim float[] │ ││
│ │ │ - Significance │ │ - Cosine sim │ ││
│ │ └───────────────────┘ └───────────────────┘ ││
│ │ ││
│ └─────────────────────────────────────────────────────────────────────────────┘│
│ │ │
│ v │
│ ┌─────────────────────────────────────────────────────────────────────────────┐│
│ │ Matching Layer ││
│ │ ││
│ │ ┌───────────────────────────────────────────────────────────────────────┐ ││
│ │ │ Ensemble Decision Engine │ ││
│ │ │ │ ││
│ │ │ Signal Weights: │ ││
│ │ │ - Instruction fingerprint: 15% │ ││
│ │ │ - Semantic graph: 25% │ ││
│ │ │ - Decompiled AST: 35% │ ││
│ │ │ - ML embedding: 25% │ ││
│ │ │ │ ││
│ │ │ Output: Confidence-weighted similarity score │ ││
│ │ │ │ ││
│ │ └───────────────────────────────────────────────────────────────────────┘ ││
│ │ ││
│ └─────────────────────────────────────────────────────────────────────────────┘│
│ │ │
│ v │
│ ┌─────────────────────────────────────────────────────────────────────────────┐│
│ │ Storage Layer ││
│ │ ││
│ │ PostgreSQL RustFS Valkey ││
│ │ - corpus.* tables - Fingerprint blobs - Query cache ││
│ │ - binaries.* tables - Model artifacts - Embedding index ││
│ │ - BSim database - Training data ││
│ │ ││
│ └─────────────────────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────────────────────┘
3. Implementation Phases
Phase 1: IR-Level Semantic Analysis (Foundation)
Sprints:
- SPRINT_20260105_001_001_BINDEX_semdiff_ir_semantics.md
- SPRINT_20260112_004_BINIDX_b2r2_lowuir_perf_cache.md (Performance & Ops)
Leverage B2R2's Intermediate Representation (IR) for semantic-level function comparison.
Key Components:
- `B2R2LowUirLiftingService` - Lifts instructions to B2R2 LowUIR and maps them to the Stella IR model
- `B2R2LifterPool` - Bounded pool with warm preload for lifter reuse
- `FunctionIrCacheService` - Valkey-backed cache for semantic fingerprints
- `SemanticGraphExtractor` - Builds the Key-Semantics Graph (KSG)
- `WeisfeilerLehmanHasher` - Graph fingerprinting
- `SemanticMatcher` - Semantic similarity scoring
B2R2LowUirLiftingService Implementation:
- Supports Intel, ARM, MIPS, RISC-V, PowerPC, SPARC, SH4, AVR, EVM
- Maps B2R2 LowUIR statements to the `IrStatement` model
- Applies SSA numbering to temporary registers
- Deterministic block ordering (by entry address)
- InvariantCulture formatting throughout
B2R2LifterPool Implementation:
- Bounded per-ISA pooling (default 4 lifters/ISA)
- Warm preload at startup for common ISAs
- Per-ISA stats (pooled, active, max)
- Automatic return on dispose
FunctionIrCacheService Implementation:
- Cache key: `(isa, b2r2_version, normalization_recipe, canonical_ir_hash)` (see the key-composition sketch after this list)
- Valkey as hot cache (default 4h TTL)
- PostgreSQL persistence for fingerprint records
- Hit/miss/eviction statistics
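To make the composite cache key concrete, the sketch below shows one way the four components could be combined into a single Valkey key. The helper name, prefix, and delimiter are illustrative assumptions, not the shipped implementation.

```csharp
// Minimal sketch (hypothetical helper): composes the cache key described above
// from its four components so that lifts from different B2R2 versions or
// normalization recipes never collide.
internal static class FunctionIrCacheKey
{
    public static string Build(
        string isa,                  // e.g. "amd64"
        string b2r2Version,          // lifter version, folded in to invalidate old entries
        string normalizationRecipe,  // identifies the SSA/ordering rules applied
        string canonicalIrHash)      // hash of the canonical, normalized IR
    {
        // Lower-cased, colon-delimited key keeps Valkey lookups deterministic.
        return string.Join(':',
            "binaryindex", "function-ir",
            isa.ToLowerInvariant(),
            b2r2Version,
            normalizationRecipe,
            canonicalIrHash);
    }
}
```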
Ops Endpoints:
- `GET /api/v1/ops/binaryindex/health` - Lifter warmness, cache status
- `POST /api/v1/ops/binaryindex/bench/run` - Benchmark latency
- `GET /api/v1/ops/binaryindex/cache` - Cache statistics
- `GET /api/v1/ops/binaryindex/config` - Effective configuration
Deliverables:
- `StellaOps.BinaryIndex.Semantic` library
- `StellaOps.BinaryIndex.Disassembly.B2R2` (LowUIR adapter, lifter pool)
- `StellaOps.BinaryIndex.Cache` (function IR cache)
- `BinaryIndexOpsController`
- 20+ tasks, ~3 weeks
Phase 2: Function Behavior Corpus (Scale)
Sprint: SPRINT_20260105_001_002_BINDEX_semdiff_corpus.md
Build comprehensive database of known library functions.
Key Components:
- Library corpus connectors (glibc, OpenSSL, zlib, curl, SQLite)
- `CorpusIngestionService` - Batch fingerprint generation
- `FunctionClusteringService` - Groups similar functions
- `CorpusQueryService` - Function identification
Deliverables:
- `StellaOps.BinaryIndex.Corpus` library
- PostgreSQL `corpus.*` schema
- ~30,000 indexed functions
- 22 tasks, ~4 weeks
Phase 3: Ghidra Integration (Depth)
Sprint: SPRINT_20260105_001_003_BINDEX_semdiff_ghidra.md
Add Ghidra as secondary backend for complex cases.
Key Components:
- `GhidraHeadlessManager` - Process lifecycle
- `VersionTrackingService` - Multi-correlator diffing
- `GhidriffBridge` - Python interop
- `BSimService` - Behavioral similarity
Deliverables:
- `StellaOps.BinaryIndex.Ghidra` library
- Docker image for Ghidra Headless
- 20 tasks, ~4 weeks
Phase 4: Decompiler & ML (Excellence)
Sprint: SPRINT_20260105_001_004_BINDEX_semdiff_decompiler_ml.md
Highest-fidelity semantic analysis.
Key Components:
- `IDecompilerService` - Ghidra decompilation
- `AstComparisonEngine` - Structural similarity
- `OnnxInferenceEngine` - ML embeddings
- `EnsembleDecisionEngine` - Multi-signal fusion
Deliverables:
- `StellaOps.BinaryIndex.Decompiler` library
- `StellaOps.BinaryIndex.ML` library
- Trained CodeBERT-Binary model
- 30 tasks, ~5 weeks
4. Fingerprint Types
4.1 Instruction Fingerprint (Existing)
Algorithm: BasicBlock hash + CFG edge hash + String refs hash
Properties:
- Fast to compute
- Sensitive to instruction changes
- Good for exact/near-exact matches
Weight in ensemble: 15%
4.2 Semantic Fingerprint (Phase 1)
Algorithm: Key-Semantics Graph + Weisfeiler-Lehman hash
Properties:
- Captures data/control dependencies
- Resilient to register renaming
- Resilient to instruction reordering
Weight in ensemble: 25%
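To illustrate the Weisfeiler-Lehman step, the sketch below shows one relabeling pass over a labeled graph; the types and method are illustrative stand-ins for `WeisfeilerLehmanHasher`, not its actual API. Repeating the pass for a fixed number of iterations and hashing the sorted multiset of final labels yields a graph fingerprint that is stable under register renaming and instruction reordering.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

internal static class WeisfeilerLehmanSketch
{
    // labels[node] holds the current label (e.g. an opcode-class string);
    // edges[node] lists that node's neighbors in the Key-Semantics Graph.
    public static Dictionary<int, string> Relabel(
        Dictionary<int, string> labels,
        Dictionary<int, int[]> edges)
    {
        var next = new Dictionary<int, string>();
        foreach (var (node, label) in labels)
        {
            // Sort neighbor labels so the result does not depend on edge order.
            var neighborLabels = edges.TryGetValue(node, out var neighbors)
                ? neighbors.Select(n => labels[n]).OrderBy(l => l, StringComparer.Ordinal)
                : Enumerable.Empty<string>();

            var combined = label + "|" + string.Join(",", neighborLabels);
            next[node] = Convert.ToHexString(
                SHA256.HashData(Encoding.UTF8.GetBytes(combined)));
        }
        return next;
    }
}
```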
4.3 Decompiled Fingerprint (Phase 4)
Algorithm: Normalized AST hash + Pattern detection
Properties:
- Highest semantic fidelity
- Captures algorithmic structure
- Resilient to most optimizations
Weight in ensemble: 35%
4.4 ML Embedding (Phase 4)
Algorithm: CodeBERT-Binary transformer, 768-dim vectors
Properties:
- Learned similarity metric
- Captures latent patterns
- Resilient to obfuscation
Weight in ensemble: 25%
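Cosine similarity over the 768-dimensional vectors is the comparison metric named above. A minimal, self-contained sketch (illustrative helper, not the shipped API):

```csharp
using System;

internal static class EmbeddingSimilarity
{
    // Cosine similarity between two embedding vectors of equal dimension.
    public static double Cosine(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Embedding dimensions must match.");

        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // Guard against zero vectors before dividing.
        return (normA == 0 || normB == 0)
            ? 0
            : dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```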
5. Matching Pipeline
sequenceDiagram
participant Client
participant DiffEngine as PatchDiffEngine
participant B2R2
participant Ghidra
participant Corpus
participant Ensemble
Client->>DiffEngine: Compare(oldBinary, newBinary)
par Parallel Analysis
DiffEngine->>B2R2: Disassemble + IR lift
DiffEngine->>Ghidra: Decompile (if needed)
end
B2R2-->>DiffEngine: SemanticFingerprints[]
Ghidra-->>DiffEngine: DecompiledFunctions[]
DiffEngine->>Corpus: IdentifyFunctions(fingerprints)
Corpus-->>DiffEngine: FunctionMatches[]
DiffEngine->>Ensemble: ComputeSimilarity(old, new)
Ensemble-->>DiffEngine: EnsembleResult
DiffEngine-->>Client: PatchDiffResult
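The final ensemble step combines whatever signals are available into one confidence-weighted score. Below is a minimal sketch using the default weights from Sections 2 and 11; renormalizing over the signals that are present is one plausible way to handle missing ones (e.g. decompilation skipped on Tier 1) and is an assumption, not the documented behavior of `EnsembleDecisionEngine`.

```csharp
internal static class EnsembleSketch
{
    // Each signal is optional; absent signals are dropped and the remaining
    // weights are renormalized so the result stays in [0, 1].
    public static double Combine(
        double? instruction,   // weight 0.15
        double? semantic,      // weight 0.25
        double? decompiled,    // weight 0.35
        double? mlEmbedding)   // weight 0.25
    {
        var signals = new (double? Score, double Weight)[]
        {
            (instruction, 0.15), (semantic, 0.25),
            (decompiled, 0.35), (mlEmbedding, 0.25)
        };

        double weighted = 0, totalWeight = 0;
        foreach (var (score, weight) in signals)
        {
            if (score is null) continue;
            weighted += score.Value * weight;
            totalWeight += weight;
        }
        return totalWeight == 0 ? 0 : weighted / totalWeight;
    }
}
```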
6. Fallback Strategy
The system uses a tiered fallback strategy:
Tier 1: B2R2 IR + Semantic Graph (fast, ~90% coverage)
│
│ If confidence < threshold OR architecture unsupported
v
Tier 2: Ghidra Version Tracking (slower, ~95% coverage)
│
│ If function is high-value (CVE-relevant)
v
Tier 3: Decompiled AST + ML Embedding (slowest, ~99% coverage)
Selection Criteria:
| Condition | Backend | Reason |
|---|---|---|
| Standard x64/ARM64 binary | B2R2 only | Fast, accurate |
| Low B2R2 confidence (<0.7) | B2R2 + Ghidra | Validation |
| Exotic architecture | Ghidra only | Better coverage |
| CVE-affected function | Full pipeline | Maximum accuracy |
| Obfuscated binary | ML embedding | Obfuscation resilience |
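A minimal sketch of how the selection criteria above could be encoded; the enum, parameters, and ordering of checks are illustrative, with the 0.7 confidence cut-off and the full-pipeline triggers taken from the table.

```csharp
internal enum AnalysisTier { B2R2Only, B2R2PlusGhidra, FullPipeline, GhidraOnly }

internal static class TierSelector
{
    public static AnalysisTier Select(
        bool architectureSupportedByB2R2,
        double b2r2Confidence,
        bool cveAffected,
        bool likelyObfuscated)
    {
        // Exotic architectures bypass B2R2 entirely.
        if (!architectureSupportedByB2R2) return AnalysisTier.GhidraOnly;

        // High-value or obfuscated targets get the full pipeline.
        if (cveAffected || likelyObfuscated) return AnalysisTier.FullPipeline;

        // Low B2R2 confidence escalates to Ghidra for validation.
        if (b2r2Confidence < 0.7) return AnalysisTier.B2R2PlusGhidra;

        return AnalysisTier.B2R2Only;
    }
}
```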
7. Corpus Coverage
Priority Libraries
| Library | Priority | Functions | CVEs |
|---|---|---|---|
| glibc | Critical | ~15,000 | 50+ |
| OpenSSL | Critical | ~8,000 | 100+ |
| zlib | High | ~200 | 5+ |
| libcurl | High | ~2,000 | 80+ |
| SQLite | High | ~1,500 | 30+ |
| libxml2 | Medium | ~1,200 | 40+ |
| libpng | Medium | ~300 | 10+ |
| expat | Medium | ~150 | 15+ |
Architecture Coverage
| Architecture | B2R2 | Ghidra | Status |
|---|---|---|---|
| x86_64 | Excellent | Excellent | Primary |
| ARM64 | Excellent | Excellent | Primary |
| ARM32 | Good | Excellent | Secondary |
| MIPS32 | Fair | Excellent | Fallback |
| MIPS64 | Fair | Excellent | Fallback |
| RISC-V | Good | Good | Emerging |
| PPC32/64 | Fair | Excellent | Fallback |
8. Performance Characteristics
Latency Budget
| Operation | Target | Notes |
|---|---|---|
| B2R2 disassembly | <100ms | Per function |
| IR lifting | <50ms | Per function |
| Semantic fingerprint | <50ms | Per function |
| Ghidra analysis | <30s | Per binary (startup) |
| Decompilation | <500ms | Per function |
| ML inference | <100ms | Per function |
| Ensemble decision | <10ms | Per comparison |
| Total (Tier 1) | <200ms | Per function |
| Total (Full) | <1s | Per function |
Memory Budget
| Component | Memory | Notes |
|---|---|---|
| B2R2 per binary | ~100MB | Scales with binary size |
| Ghidra per project | ~2GB | Persistent cache |
| ML model | ~500MB | ONNX loaded |
| Corpus query cache | ~100MB | LRU eviction |
9. Integration Points
9.1 Scanner Integration
// Scanner.Worker uses semantic diffing for binary vulnerability detection
var result = await _binaryVulnerabilityService.LookupByFingerprintAsync(
fingerprint,
minSimilarity: 0.85m,
useSemanticMatching: true, // Enable semantic diffing
ct);
9.2 PatchDiffEngine Enhancement
// PatchDiffEngine now includes semantic comparison
var diff = await _patchDiffEngine.DiffAsync(
vulnerableBinary,
patchedBinary,
new PatchDiffOptions
{
UseSemanticAnalysis = true,
SemanticThreshold = 0.7m,
IncludeDecompilation = true,
IncludeMlEmbedding = true
},
ct);
9.3 DeltaSignature Enhancement
// Delta signatures now include semantic fingerprints
var signature = await _deltaSignatureGenerator.GenerateSignaturesAsync(
binaryStream,
new DeltaSignatureRequest
{
Cve = "CVE-2024-1234",
TargetSymbols = ["vulnerable_func"],
IncludeSemanticFingerprint = true,
IncludeDecompiledHash = true
},
ct);
10. Security Considerations
10.1 Sandbox Requirements
All binary analysis runs in sandboxed environments:
- Seccomp profile restricting syscalls
- Read-only root filesystem
- No network access during analysis
- Memory/CPU limits
10.2 Model Security
ML models are:
- Signed with DSSE attestations
- Verified before loading
- Not user-uploadable (pre-trained only)
10.3 Corpus Integrity
Corpus data is:
- Ingested from trusted sources only
- Signed at snapshot level
- Version-controlled with audit trail
11. Configuration
# binaryindex.yaml - Semantic diffing configuration
binaryindex:
semantic_diffing:
enabled: true
# Analysis backends
backends:
b2r2:
enabled: true
ir_lifting: true
semantic_graph: true
ghidra:
enabled: true
fallback_only: true
min_b2r2_confidence: 0.7
headless_timeout_ms: 30000
decompiler:
enabled: true
high_value_only: true # Only for CVE-affected functions
ml:
enabled: true
model_path: /models/codebert_binary_v1.onnx
embedding_dimension: 768
# Ensemble weights
ensemble:
instruction_weight: 0.15
semantic_weight: 0.25
decompiled_weight: 0.35
ml_weight: 0.25
min_confidence: 0.6
# Corpus
corpus:
auto_update: true
update_interval_hours: 24
libraries:
- glibc
- openssl
- zlib
- curl
- sqlite
# Performance
performance:
max_parallel_analyses: 4
cache_ttl_seconds: 3600
max_function_size_bytes: 1048576 # 1MB
Additional appsettings sections (case-insensitive):
- `BinaryIndex:B2R2Pool` - Lifter pool sizing and warm ISA list
- `BinaryIndex:SemanticLifting` - LowUIR enablement and deterministic controls
- `BinaryIndex:FunctionCache` - Valkey function cache configuration
- `Postgres:BinaryIndex` - Persistence for canonical IR fingerprints
12. Metrics & Observability
Ops Endpoints
BinaryIndex exposes read-only ops endpoints for health, bench, cache, and effective configuration:
- `GET /api/v1/ops/binaryindex/health` -> `BinaryIndexOpsHealthResponse`
- `POST /api/v1/ops/binaryindex/bench/run` -> `BinaryIndexBenchResponse`
- `GET /api/v1/ops/binaryindex/cache` -> `BinaryIndexFunctionCacheStats`
- `GET /api/v1/ops/binaryindex/config` -> `BinaryIndexEffectiveConfig`
Metrics
| Metric | Type | Labels |
|---|---|---|
| `semantic_diffing_analysis_total` | Counter | backend, result |
| `semantic_diffing_latency_ms` | Histogram | backend, tier |
| `semantic_diffing_accuracy` | Gauge | comparison_type |
| `corpus_functions_total` | Gauge | library |
| `ml_inference_latency_ms` | Histogram | model |
| `ensemble_signal_weight` | Gauge | signal_type |
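For orientation, a minimal sketch of emitting two of these instruments with `System.Diagnostics.Metrics`; the meter name and class wiring are assumptions, only the instrument names and labels come from the table.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

internal sealed class SemanticDiffingMetrics
{
    private static readonly Meter Meter = new("StellaOps.BinaryIndex.SemanticDiffing");

    private readonly Counter<long> _analyses =
        Meter.CreateCounter<long>("semantic_diffing_analysis_total");
    private readonly Histogram<double> _latencyMs =
        Meter.CreateHistogram<double>("semantic_diffing_latency_ms");

    public void RecordAnalysis(string backend, string result, string tier, double elapsedMs)
    {
        // Labels match the table: backend/result on the counter, backend/tier on the histogram.
        _analyses.Add(1,
            new KeyValuePair<string, object?>("backend", backend),
            new KeyValuePair<string, object?>("result", result));
        _latencyMs.Record(elapsedMs,
            new KeyValuePair<string, object?>("backend", backend),
            new KeyValuePair<string, object?>("tier", tier));
    }
}
```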
Traces
- `semantic_diffing.analyze` - Full analysis span
- `semantic_diffing.b2r2.lift` - IR lifting
- `semantic_diffing.ghidra.decompile` - Decompilation
- `semantic_diffing.ml.inference` - ML embedding
- `semantic_diffing.ensemble.decide` - Ensemble decision
13. Testing Strategy
Unit Tests
| Test Suite | Coverage |
|---|---|
| `IrLiftingServiceTests` | IR lifting correctness |
| `SemanticGraphExtractorTests` | Graph construction |
| `WeisfeilerLehmanHasherTests` | Hash stability |
| `AstComparisonEngineTests` | AST similarity |
| `OnnxInferenceEngineTests` | ML inference |
| `EnsembleDecisionEngineTests` | Weight combination |
Integration Tests
| Test Suite | Coverage |
|---|---|
| `EndToEndSemanticDiffTests` | Full pipeline |
| `OptimizationResilienceTests` | O0 vs O2 vs O3 |
| `CompilerVariantTests` | GCC vs Clang |
| `GhidraFallbackTests` | Fallback scenarios |
Golden Corpus Tests
Pre-computed test cases with known results:
- 100 CVE patch pairs (vulnerable -> fixed)
- 50 optimization variant sets
- 25 compiler variant sets
- 25 obfuscation variant sets
14. Roadmap
| Phase | Status | ETA | Impact |
|---|---|---|---|
| Phase 1: IR Semantics | Implemented | 2026-01-24 | +15% accuracy |
| Phase 2: Corpus | Planned | 2026-02-15 | +10% coverage |
| Phase 3: Ghidra | Planned | 2026-02-28 | +5% edge cases |
| Phase 4: Decompiler/ML | Planned | 2026-03-31 | +10% obfuscation |
| Total | | | +35-40% |
15. Delta-Sig Predicate Attestation
Sprint Reference: SPRINT_20260117_003_BINDEX_delta_sig_predicate
Delta-sig predicates provide a supply chain attestation format for binary patches, enabling policy-gated releases based on function-level change scope.
15.1 Predicate Structure
{
"_type": "https://in-toto.io/Statement/v1",
"predicateType": "https://stellaops.io/delta-sig/v1",
"subject": [
{
"name": "libexample-1.1.so",
"digest": {
"sha256": "abc123..."
}
}
],
"predicate": {
"before": {
"name": "libexample-1.0.so",
"digest": { "sha256": "def456..." }
},
"after": {
"name": "libexample-1.1.so",
"digest": { "sha256": "abc123..." }
},
"diff": [
{
"function": "process_input",
"changeType": "modified",
"beforeHash": "sha256:old...",
"afterHash": "sha256:new...",
"bytesDelta": 48,
"semanticSimilarity": 0.87
},
{
"function": "new_handler",
"changeType": "added",
"afterHash": "sha256:new...",
"bytesDelta": 256
}
],
"summary": {
"functionsAdded": 1,
"functionsRemoved": 0,
"functionsModified": 1,
"totalBytesChanged": 304
},
"timestamp": "2026-01-16T12:00:00Z"
}
}
15.2 Policy Gate Integration
The `DeltaScopePolicyGate` enforces limits on patch scope:
policy:
deltaSig:
maxAddedFunctions: 10
maxRemovedFunctions: 5
maxModifiedFunctions: 20
maxBytesChanged: 50000
minSemanticSimilarity: 0.5
requireSemanticAnalysis: false
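A minimal sketch of the scope check such a gate performs against the predicate's `summary` block; the record shapes and method name are illustrative, with the limits mirroring the policy keys above.

```csharp
internal sealed record DeltaSummary(
    int FunctionsAdded, int FunctionsRemoved, int FunctionsModified, long TotalBytesChanged);

internal sealed record DeltaSigPolicy(
    int MaxAddedFunctions, int MaxRemovedFunctions,
    int MaxModifiedFunctions, long MaxBytesChanged);

internal static class DeltaScopeGateSketch
{
    // A patch passes the gate only if every scope dimension is within policy.
    public static bool Passes(DeltaSummary summary, DeltaSigPolicy policy) =>
        summary.FunctionsAdded    <= policy.MaxAddedFunctions &&
        summary.FunctionsRemoved  <= policy.MaxRemovedFunctions &&
        summary.FunctionsModified <= policy.MaxModifiedFunctions &&
        summary.TotalBytesChanged <= policy.MaxBytesChanged;
}
```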
15.3 Attestor Integration
Delta-sig predicates integrate with the Attestor module:
- Generate - Create predicate from before/after binary analysis
- Sign - Create DSSE envelope with cosign/fulcio signature
- Submit - Log to Rekor transparency log
- Verify - Validate signature and inclusion proof
15.4 CLI Commands
# Generate delta-sig predicate
stella binary diff --before old.so --after new.so --output delta.json
# Generate and attest in one step
stella binary attest --before old.so --after new.so --sign --rekor
# Verify attestation
stella binary verify --predicate delta.json --signature sig.dsse
# Check against policy gate
stella binary gate --predicate delta.json --policy policy.yaml
15.5 Semantic Similarity Scoring
When `requireSemanticAnalysis` is enabled, the gate also checks:
| Threshold | Meaning |
|---|---|
| > 0.9 | Near-identical (cosmetic changes) |
| 0.7 - 0.9 | Similar (refactoring, optimization) |
| 0.5 - 0.7 | Moderate changes (significant logic) |
| < 0.5 | Major rewrite (requires review) |
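A minimal sketch of bucketing a per-function `semanticSimilarity` value into the bands above (illustrative helper, not a shipped API):

```csharp
internal static class SimilarityBands
{
    public static string Classify(double similarity) => similarity switch
    {
        > 0.9 => "Near-identical (cosmetic changes)",
        >= 0.7 => "Similar (refactoring, optimization)",
        >= 0.5 => "Moderate changes (significant logic)",
        _ => "Major rewrite (requires review)"
    };
}
```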
15.6 Evidence Storage
Delta-sig predicates are stored in the Evidence Locker and can be included in portable bundles for air-gapped verification.
16. References
Internal
- `docs/modules/binary-index/architecture.md`
- `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/`
- `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Fingerprints/`
External
- B2R2 Binary Analysis Framework
- Ghidra Patch Diffing Guide
- ghidriff Tool
- SemDiff Paper (arXiv)
- SEI Semantic Equivalence Research
- in-toto Attestation Framework
- SLSA Provenance Spec
17. B2R2 Troubleshooting Guide
This section covers common issues and resolutions when using B2R2 for IR lifting.
17.1 Lifting Failures
Symptom: B2R2LiftingException: Failed to lift function at address 0x...
Common Causes:
- Unsupported instruction - B2R2 may not recognize certain instructions
- Invalid entry point - Function address is not a valid entry point
- Obfuscated code - Heavy obfuscation defeats parsing
Resolution:
// Check if architecture is supported before lifting
if (!_liftingService.SupportsArchitecture(binary.Architecture))
{
// Fall back to disassembly-only mode
return await _disassemblyService.DisassembleAsync(binary, ct);
}
// Use try-lift with fallback
var result = await _liftingService.TryLiftWithFallbackAsync(
binary,
new LiftingOptions { FallbackToDisassembly = true },
ct);
17.2 Memory Issues
Symptom: OutOfMemoryException during lifting of large binaries
Common Causes:
- Pool exhaustion - Too many concurrent lifter instances
- Large function - Single function exceeds memory budget
- Memory leak - Lifter instances not properly disposed
Resolution:
# Adjust pool configuration in appsettings.yaml
BinaryIndex:
B2R2Pool:
MaxInstancesPerIsa: 4 # Reduce if OOM
RecycleAfterOperations: 1000 # Force recycle more often
MaxFunctionSizeBytes: 1048576 # Skip very large functions
17.3 Performance Issues
Symptom: Lifting takes longer than expected (>30s for small binaries)
Common Causes:
- Cold pool - No warm lifter instances available
- Complex CFG - Function has extremely complex control flow
- Cache misses - IR cache not configured or full
Resolution:
// Ensure pool is warmed at startup
await _lifterPool.WarmAsync(new[] { ISA.AMD64, ISA.ARM64 }, ct);
// Check cache health
var stats = await _cacheService.GetStatisticsAsync(ct);
if (stats.HitRate < 0.5)
{
_logger.LogWarning("Low cache hit rate: {HitRate:P}", stats.HitRate);
}
17.4 Determinism Issues
Symptom: Same binary produces different IR hashes on repeated lifts
Common Causes:
- Non-deterministic block ordering - Blocks not sorted by address
- Timestamp inclusion - IR includes lift timestamp
- B2R2 version mismatch - Different versions produce different IR
Resolution:
- Ensure `InvariantCulture` is used for all string formatting
- Sort basic blocks by entry address before hashing
- Include the B2R2 version in cache keys
- Use the `DeterministicHash` utility for consistent hashing
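Putting those rules together, here is a minimal sketch of a deterministic fingerprint computation; the `IrBlock` record and helper are illustrative, not the shipped `DeterministicHash` utility.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

internal sealed record IrBlock(ulong EntryAddress, string CanonicalText);

internal static class DeterministicFingerprintSketch
{
    public static string Compute(IEnumerable<IrBlock> blocks, string b2r2Version)
    {
        var sb = new StringBuilder();
        // Folding the lifter version into the hashed material means a B2R2
        // upgrade naturally invalidates previously cached fingerprints.
        sb.Append("b2r2=").Append(b2r2Version).Append('\n');

        // Sort by entry address so block discovery order cannot affect the hash.
        foreach (var block in blocks.OrderBy(b => b.EntryAddress))
        {
            sb.Append(block.EntryAddress.ToString("x16", CultureInfo.InvariantCulture))
              .Append(':')
              .Append(block.CanonicalText)
              .Append('\n');
        }
        return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(sb.ToString())));
    }
}
```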
17.5 Architecture Detection Issues
Symptom: Wrong architecture selected for multi-arch binary (fat binary)
Common Causes:
- Universal binary - macOS fat binaries contain multiple architectures
- ELF with multiple ABIs - Rare but possible
Resolution:
// Explicitly specify target architecture
var liftOptions = new LiftingOptions
{
TargetArchitecture = ISA.AMD64, // Force x86-64
IgnoreOtherArchitectures = true
};
17.6 LowUIR Mapping Issues
Symptom: Specific B2R2 LowUIR statements not mapped correctly
Reference: LowUIR Statement Type Mapping
| B2R2 LowUIR | Stella IR Model | Notes |
|---|---|---|
| `LMark` | `IrLabel` | Block label markers |
| `Put` | `IrAssignment` | Register write |
| `Store` | `IrStore` | Memory write |
| `InterJmp` | `IrJump` | Cross-function jump |
| `IntraJmp` | `IrJump` | Intra-function jump |
| `InterCJmp` | `IrConditionalJump` | Cross-function conditional |
| `IntraCJmp` | `IrConditionalJump` | Intra-function conditional |
| `SideEffect` | `IrCall`/`IrReturn` | Function calls, returns |
| `Def`/`Use`/`Phi` | `IrPhi` | SSA form constructs |
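A minimal sketch of dispatching on these statement kinds; the string-keyed switch and `IrStatementKind` enum are illustrative stand-ins for the real B2R2 statement types and the Stella IR model.

```csharp
using System;

internal enum IrStatementKind { Label, Assignment, Store, Jump, ConditionalJump, Call, Return, Phi }

internal static class LowUirMappingSketch
{
    public static IrStatementKind Map(string lowUirKind) => lowUirKind switch
    {
        "LMark" => IrStatementKind.Label,
        "Put" => IrStatementKind.Assignment,
        "Store" => IrStatementKind.Store,
        "InterJmp" or "IntraJmp" => IrStatementKind.Jump,
        "InterCJmp" or "IntraCJmp" => IrStatementKind.ConditionalJump,
        // Calls vs. returns are distinguished from the side-effect payload in practice.
        "SideEffect" => IrStatementKind.Call,
        "Def" or "Use" or "Phi" => IrStatementKind.Phi,
        _ => throw new NotSupportedException($"Unmapped LowUIR statement kind: {lowUirKind}")
    };
}
```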
17.7 Diagnostic Commands
# Check B2R2 health
stella ops binaryindex health --verbose
# Run benchmark suite
stella ops binaryindex bench --iterations 100 --binary sample.so
# View cache statistics
stella ops binaryindex cache --stats
# Dump effective configuration
stella ops binaryindex config
Document Version: 1.1.0 Last Updated: 2026-01-19