# Semantic Diffing Architecture

> **Status:** PHASE 1 IMPLEMENTED (B2R2 IR Lifting)
> **Version:** 1.1.0
> **Related Sprints:**
> - `SPRINT_20260105_001_001_BINDEX_semdiff_ir_semantics.md`
> - `SPRINT_20260105_001_002_BINDEX_semdiff_corpus.md`
> - `SPRINT_20260105_001_003_BINDEX_semdiff_ghidra.md`
> - `SPRINT_20260105_001_004_BINDEX_semdiff_decompiler_ml.md`

---

## 1. Executive Summary

Semantic diffing is an advanced binary analysis capability that detects function equivalence based on **behavior** rather than **syntax**. This enables accurate vulnerability detection in scenarios where traditional byte-level or symbol-based matching fails:

- **Compiler optimizations** - Same source, different instructions
- **Obfuscation** - Intentionally altered code structure
- **Stripped binaries** - No symbols or debug information
- **Cross-compiler** - GCC vs Clang produce different output
- **Backported patches** - Different version, same fix

### Expected Impact

| Capability | Current Accuracy | With Semantic Diffing |
|------------|-----------------|----------------------|
| Patch detection (optimized) | ~70% | 92%+ |
| Function identification (stripped) | ~50% | 85%+ |
| Obfuscation resilience | ~40% | 75%+ |
| False positive rate | ~5% | <2% |

---

## 2. Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                          Semantic Diffing Architecture                          │
│                                                                                 │
│ ┌─────────────────────────────────────────────────────────────────────────────┐ │
│ │                               Analysis Layer                                │ │
│ │                                                                             │ │
│ │  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐      │ │
│ │  │    B2R2     │   │   Ghidra    │   │ Decompiler  │   │     ML      │      │ │
│ │  │  (Primary)  │   │ (Fallback)  │   │ (Optional)  │   │ (Optional)  │      │ │
│ │  │             │   │             │   │             │   │             │      │ │
│ │  │ - Disasm    │   │ - P-Code    │   │ - C output  │   │ - CodeBERT  │      │ │
│ │  │ - LowUIR    │   │ - BSim      │   │ - AST parse │   │ - GraphSage │      │ │
│ │  │ - CFG       │   │ - Ver.Track │   │ - Normalize │   │ - Embedding │      │ │
│ │  └──────┬──────┘   └──────┬──────┘   └──────┬──────┘   └──────┬──────┘      │ │
│ │         │                 │                 │                 │             │ │
│ └─────────┴─────────────────┴─────────────────┴─────────────────┴─────────────┘ │
│                                        │                                        │
│                                        v                                        │
│ ┌─────────────────────────────────────────────────────────────────────────────┐ │
│ │                              Fingerprint Layer                              │ │
│ │                                                                             │ │
│ │  ┌───────────────────┐   ┌───────────────────┐   ┌───────────────────┐      │ │
│ │  │    Instruction    │   │     Semantic      │   │    Decompiled     │      │ │
│ │  │    Fingerprint    │   │    Fingerprint    │   │    Fingerprint    │      │ │
│ │  │                   │   │                   │   │                   │      │ │
│ │  │ - BasicBlock hash │   │ - KSG graph hash  │   │ - AST hash        │      │ │
│ │  │ - CFG edge hash   │   │ - WL hash         │   │ - Normalized code │      │ │
│ │  │ - String refs     │   │ - DataFlow hash   │   │ - API sequence    │      │ │
│ │  │ - Rolling chunks  │   │ - API calls       │   │ - Pattern hash    │      │ │
│ │  └───────────────────┘   └───────────────────┘   └───────────────────┘      │ │
│ │                                                                             │ │
│ │  ┌───────────────────┐   ┌───────────────────┐                              │ │
│ │  │       BSim        │   │   ML Embedding    │                              │ │
│ │  │     Signature     │   │      Vector       │                              │ │
│ │  │                   │   │                   │                              │ │
│ │  │ - Feature vector  │   │ - 768-dim float[] │                              │ │
│ │  │ - Significance    │   │ - Cosine sim      │                              │ │
│ │  └───────────────────┘   └───────────────────┘                              │ │
│ │                                                                             │ │
│ └─────────────────────────────────────────────────────────────────────────────┘ │
│                                        │                                        │
│                                        v                                        │
│ ┌─────────────────────────────────────────────────────────────────────────────┐ │
│ │                               Matching Layer                                │ │
│ │                                                                             │ │
│ │  ┌───────────────────────────────────────────────────────────────────────┐  │ │
│ │  │                       Ensemble Decision Engine                        │  │ │
│ │  │                                                                       │  │ │
│ │  │  Signal Weights:                                                      │  │ │
│ │  │    - Instruction fingerprint: 15%                                     │  │ │
│ │  │    - Semantic graph:          25%                                     │  │ │
│ │  │    - Decompiled AST:          35%                                     │  │ │
│ │  │    - ML embedding:            25%                                     │  │ │
│ │  │                                                                       │  │ │
│ │  │  Output: Confidence-weighted similarity score                         │  │ │
│ │  │                                                                       │  │ │
│ │  └───────────────────────────────────────────────────────────────────────┘  │ │
│ │                                                                             │ │
│ └─────────────────────────────────────────────────────────────────────────────┘ │
│                                        │                                        │
│                                        v                                        │
│ ┌─────────────────────────────────────────────────────────────────────────────┐ │
│ │                                Storage Layer                                │ │
│ │                                                                             │ │
│ │  PostgreSQL              RustFS                  Valkey                     │ │
│ │  - corpus.* tables       - Fingerprint blobs     - Query cache              │ │
│ │  - binaries.* tables     - Model artifacts       - Embedding index          │ │
│ │  - BSim database         - Training data                                    │ │
│ │                                                                             │ │
│ └─────────────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────────┘
```

---

## 3. Implementation Phases

### Phase 1: IR-Level Semantic Analysis (Foundation)

**Sprints:**
- `SPRINT_20260105_001_001_BINDEX_semdiff_ir_semantics.md`
- `SPRINT_20260112_004_BINIDX_b2r2_lowuir_perf_cache.md` (Performance & Ops)

Leverage B2R2's Intermediate Representation (IR) for semantic-level function comparison.

**Key Components:**
- `B2R2LowUirLiftingService` - Lifts instructions to B2R2 LowUIR, maps to Stella IR model
- `B2R2LifterPool` - Bounded pool with warm preload for lifter reuse
- `FunctionIrCacheService` - Valkey-backed cache for semantic fingerprints
- `SemanticGraphExtractor` - Build Key-Semantics Graph (KSG)
- `WeisfeilerLehmanHasher` - Graph fingerprinting
- `SemanticMatcher` - Semantic similarity scoring

**B2R2LowUirLiftingService Implementation:**
- Supports Intel, ARM, MIPS, RISC-V, PowerPC, SPARC, SH4, AVR, EVM
- Maps B2R2 LowUIR statements to `IrStatement` model
- Applies SSA numbering to temporary registers
- Deterministic block ordering (by entry address)
- InvariantCulture formatting throughout

**B2R2LifterPool Implementation:**
- Bounded per-ISA pooling (default 4 lifters/ISA)
- Warm preload at startup for common ISAs
- Per-ISA stats (pooled, active, max)
- Automatic return on dispose
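
The pooling behavior listed above (a bounded instance count, warm preload, and automatic return on dispose) can be sketched generically. This is a minimal illustration under stated assumptions, not the actual `B2R2LifterPool`: the `BoundedPool<T>` type, its `Lease` wrapper, and the constructor parameters are hypothetical names, and the real pool keeps one such pool per ISA.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical generic pool; the real B2R2LifterPool API is not shown in this document.
public sealed class BoundedPool<T> where T : class
{
    private readonly ConcurrentBag<T> _idle = new();
    private readonly SemaphoreSlim _slots;
    private readonly Func<T> _factory;

    public BoundedPool(Func<T> factory, int maxInstances = 4, int warmCount = 0)
    {
        _factory = factory;
        _slots = new SemaphoreSlim(maxInstances, maxInstances);

        // Warm preload: create instances up front so the first callers do not pay for construction.
        for (int i = 0; i < Math.Min(warmCount, maxInstances); i++)
            _idle.Add(factory());
    }

    // Number of instances currently waiting in the pool.
    public int IdleCount => _idle.Count;

    // Waits for a free slot, then hands out a pooled instance or creates a new one.
    public async Task<Lease> RentAsync(CancellationToken ct = default)
    {
        await _slots.WaitAsync(ct);
        T item = _idle.TryTake(out var pooled) ? pooled : _factory();
        return new Lease(this, item);
    }

    // Disposing the lease returns the instance to the pool and frees the slot.
    public sealed class Lease : IDisposable
    {
        private readonly BoundedPool<T> _pool;
        private bool _returned;

        public T Value { get; }

        internal Lease(BoundedPool<T> pool, T value)
        {
            _pool = pool;
            Value = value;
        }

        public void Dispose()
        {
            if (_returned) return;
            _returned = true;
            _pool._idle.Add(Value);
            _pool._slots.Release();
        }
    }
}
```

A caller would wrap each lift in `using var lease = await pool.RentAsync(ct);` so that the instance goes back to the pool even when lifting throws.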

**FunctionIrCacheService Implementation:**
- Cache key: `(isa, b2r2_version, normalization_recipe, canonical_ir_hash)`
- Valkey as hot cache (default 4h TTL)
- PostgreSQL persistence for fingerprint records
- Hit/miss/eviction statistics
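
The cache key tuple above could be flattened into a Valkey key roughly as follows. This is a sketch under stated assumptions: the `FunctionCacheKey` record, the `binidx:fn-ir:` prefix, and the `:` delimiter are hypothetical and not taken from `FunctionIrCacheService`.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical illustration of the (isa, b2r2_version, normalization_recipe,
// canonical_ir_hash) tuple; names and key layout are assumptions.
public sealed record FunctionCacheKey(
    string Isa,
    string B2R2Version,
    string NormalizationRecipe,
    string CanonicalIrHash)
{
    // Builds a stable key string. Components are lowercased with invariant
    // rules so the key does not depend on the host locale.
    public string ToValkeyKey()
    {
        string[] parts =
        {
            Isa.ToLowerInvariant(),
            B2R2Version.ToLowerInvariant(),
            NormalizationRecipe.ToLowerInvariant(),
            CanonicalIrHash.ToLowerInvariant(),
        };
        return "binidx:fn-ir:" + string.Join(':', parts);
    }

    // Hashes canonical IR text with SHA-256 and returns lowercase hex.
    public static string HashCanonicalIr(string canonicalIr)
    {
        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(canonicalIr));
        return Convert.ToHexString(digest).ToLowerInvariant();
    }
}
```

Putting the B2R2 version and normalization recipe in the key means an upgrade of either one naturally misses the old entries instead of serving stale fingerprints.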

**Ops Endpoints:**
- `GET /api/v1/ops/binaryindex/health` - Lifter warmness, cache status
- `POST /api/v1/ops/binaryindex/bench/run` - Benchmark latency
- `GET /api/v1/ops/binaryindex/cache` - Cache statistics
- `GET /api/v1/ops/binaryindex/config` - Effective configuration

**Deliverables:**
- `StellaOps.BinaryIndex.Semantic` library
- `StellaOps.BinaryIndex.Disassembly.B2R2` (LowUIR adapter, lifter pool)
- `StellaOps.BinaryIndex.Cache` (function IR cache)
- BinaryIndexOpsController
- 20+ tasks, ~3 weeks

### Phase 2: Function Behavior Corpus (Scale)

**Sprint:** `SPRINT_20260105_001_002_BINDEX_semdiff_corpus.md`

Build a comprehensive database of known library functions.

**Key Components:**
- Library corpus connectors (glibc, OpenSSL, zlib, curl, SQLite)
- `CorpusIngestionService` - Batch fingerprint generation
- `FunctionClusteringService` - Group similar functions
- `CorpusQueryService` - Function identification

**Deliverables:**
- `StellaOps.BinaryIndex.Corpus` library
- PostgreSQL `corpus.*` schema
- ~30,000 indexed functions
- 22 tasks, ~4 weeks

### Phase 3: Ghidra Integration (Depth)

**Sprint:** `SPRINT_20260105_001_003_BINDEX_semdiff_ghidra.md`

Add Ghidra as a secondary backend for complex cases.

**Key Components:**
- `GhidraHeadlessManager` - Process lifecycle
- `VersionTrackingService` - Multi-correlator diffing
- `GhidriffBridge` - Python interop
- `BSimService` - Behavioral similarity

**Deliverables:**
- `StellaOps.BinaryIndex.Ghidra` library
- Docker image for Ghidra Headless
- 20 tasks, ~4 weeks

### Phase 4: Decompiler & ML (Excellence)

**Sprint:** `SPRINT_20260105_001_004_BINDEX_semdiff_decompiler_ml.md`

Highest-fidelity semantic analysis.

**Key Components:**
- `IDecompilerService` - Ghidra decompilation
- `AstComparisonEngine` - Structural similarity
- `OnnxInferenceEngine` - ML embeddings
- `EnsembleDecisionEngine` - Multi-signal fusion

**Deliverables:**
- `StellaOps.BinaryIndex.Decompiler` library
- `StellaOps.BinaryIndex.ML` library
- Trained CodeBERT-Binary model
- 30 tasks, ~5 weeks

---

## 4. Fingerprint Types

### 4.1 Instruction Fingerprint (Existing)

**Algorithm:** BasicBlock hash + CFG edge hash + String refs hash

**Properties:**
- Fast to compute
- Sensitive to instruction changes
- Good for exact/near-exact matches

**Weight in ensemble:** 15%

### 4.2 Semantic Fingerprint (Phase 1)

**Algorithm:** Key-Semantics Graph + Weisfeiler-Lehman hash

**Properties:**
- Captures data/control dependencies
- Resilient to register renaming
- Resilient to instruction reordering

**Weight in ensemble:** 25%
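
To make the Weisfeiler-Lehman step concrete, the sketch below refines node labels over a small directed graph and hashes the sorted result. It is an illustration only, under stated assumptions: `WlHashSketch`, the string node labels, the three-round default, and the use of SHA-256 are hypothetical choices and do not describe `WeisfeilerLehmanHasher` itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical WL-style graph hash; not the production hasher.
public static class WlHashSketch
{
    // nodeLabels[i] is the initial label of node i (for example an operation kind);
    // edges are directed (From, To) index pairs. Returns a hex digest for the graph.
    public static string Compute(
        IReadOnlyList<string> nodeLabels,
        IReadOnlyList<(int From, int To)> edges,
        int iterations = 3)
    {
        string[] labels = nodeLabels.ToArray();

        for (int round = 0; round < iterations; round++)
        {
            var next = new string[labels.Length];
            for (int node = 0; node < labels.Length; node++)
            {
                // Sort neighbor labels so the result does not depend on edge order.
                var successors = edges.Where(e => e.From == node)
                    .Select(e => labels[e.To])
                    .OrderBy(label => label, StringComparer.Ordinal);
                var predecessors = edges.Where(e => e.To == node)
                    .Select(e => labels[e.From])
                    .OrderBy(label => label, StringComparer.Ordinal);

                // New label = hash of own label plus both neighbor multisets.
                next[node] = Sha256Hex(
                    labels[node]
                    + "|out:" + string.Join(",", successors)
                    + "|in:" + string.Join(",", predecessors));
            }
            labels = next;
        }

        // Sort final labels so the digest does not depend on node numbering.
        return Sha256Hex(string.Join(";", labels.OrderBy(label => label, StringComparer.Ordinal)));
    }

    private static string Sha256Hex(string text) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(text)));
}
```

Because labels describe operations and dependencies rather than concrete registers or instruction positions, two graphs that differ only in node numbering hash identically, which is the property the resilience claims above rely on.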

### 4.3 Decompiled Fingerprint (Phase 4)

**Algorithm:** Normalized AST hash + Pattern detection

**Properties:**
- Highest semantic fidelity
- Captures algorithmic structure
- Resilient to most optimizations

**Weight in ensemble:** 35%

### 4.4 ML Embedding (Phase 4)

**Algorithm:** CodeBERT-Binary transformer, 768-dim vectors

**Properties:**
- Learned similarity metric
- Captures latent patterns
- Resilient to obfuscation

**Weight in ensemble:** 25%
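
The four ensemble weights (15%, 25%, 35%, 25%) sum to 100%, so the combined score is a weighted average of the per-signal similarities. The sketch below shows that arithmetic. The `EnsembleSketch` class and the signal names are hypothetical, and renormalizing over the signals that are actually present (for example when decompilation was skipped) is an assumption of this sketch, not documented behavior of `EnsembleDecisionEngine`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical weighted-average combiner; not the production engine.
public static class EnsembleSketch
{
    // Weights as listed for the four fingerprint types in this document.
    private static readonly IReadOnlyDictionary<string, double> Weights =
        new Dictionary<string, double>
        {
            ["instruction"] = 0.15,
            ["semantic"] = 0.25,
            ["decompiled"] = 0.35,
            ["ml"] = 0.25,
        };

    // signals maps a signal name to a similarity in [0, 1]. Signals that were
    // not computed are simply absent; the remaining weights are renormalized
    // (an assumption of this sketch).
    public static double Combine(IReadOnlyDictionary<string, double> signals)
    {
        double weightedSum = 0.0;
        double totalWeight = 0.0;

        foreach (var (name, similarity) in signals)
        {
            if (!Weights.TryGetValue(name, out double weight))
                throw new ArgumentException($"Unknown signal '{name}'.", nameof(signals));

            weightedSum += weight * Math.Clamp(similarity, 0.0, 1.0);
            totalWeight += weight;
        }

        return totalWeight == 0.0 ? 0.0 : weightedSum / totalWeight;
    }
}
```

For example, similarities of 0.9 (instruction), 0.8 (semantic), 0.7 (decompiled), and 0.6 (ML) combine to 0.135 + 0.200 + 0.245 + 0.150 = 0.73.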

---

## 5. Matching Pipeline

```mermaid
sequenceDiagram
    participant Client
    participant DiffEngine as PatchDiffEngine
    participant B2R2
    participant Ghidra
    participant Corpus
    participant Ensemble

    Client->>DiffEngine: Compare(oldBinary, newBinary)

    par Parallel Analysis
        DiffEngine->>B2R2: Disassemble + IR lift
        DiffEngine->>Ghidra: Decompile (if needed)
    end

    B2R2-->>DiffEngine: SemanticFingerprints[]
    Ghidra-->>DiffEngine: DecompiledFunctions[]

    DiffEngine->>Corpus: IdentifyFunctions(fingerprints)
    Corpus-->>DiffEngine: FunctionMatches[]

    DiffEngine->>Ensemble: ComputeSimilarity(old, new)
    Ensemble-->>DiffEngine: EnsembleResult

    DiffEngine-->>Client: PatchDiffResult
```

---

## 6. Fallback Strategy

The system uses a tiered fallback strategy:

```
Tier 1: B2R2 IR + Semantic Graph (fast, ~90% coverage)
    │
    │ If confidence < threshold OR architecture unsupported
    v
Tier 2: Ghidra Version Tracking (slower, ~95% coverage)
    │
    │ If function is high-value (CVE-relevant)
    v
Tier 3: Decompiled AST + ML Embedding (slowest, ~99% coverage)
```

**Selection Criteria:**

| Condition | Backend | Reason |
|-----------|---------|--------|
| Standard x64/ARM64 binary | B2R2 only | Fast, accurate |
| Low B2R2 confidence (<0.7) | B2R2 + Ghidra | Validation |
| Exotic architecture | Ghidra only | Better coverage |
| CVE-affected function | Full pipeline | Maximum accuracy |
| Obfuscated binary | ML embedding | Obfuscation resilience |
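
The selection criteria can be read as a small decision function. The sketch below is one possible encoding, under stated assumptions: the `AnalysisBackends` flags, the `FunctionContext` record, and the way conditions stack (for instance, adding the ML embedding on top of whichever lifter ran for obfuscated binaries) are hypothetical and not the documented selection logic.

```csharp
using System;

// Hypothetical flags and context types used only for this illustration.
[Flags]
public enum AnalysisBackends
{
    None = 0,
    B2R2 = 1,
    Ghidra = 2,
    Decompiler = 4,
    MlEmbedding = 8,
}

public sealed record FunctionContext(
    bool B2R2SupportsArchitecture,
    double? B2R2Confidence,   // null until Tier 1 has produced a score
    bool IsCveRelevant,
    bool IsObfuscated);

public static class BackendSelectorSketch
{
    public static AnalysisBackends Select(FunctionContext ctx, double minB2R2Confidence = 0.7)
    {
        AnalysisBackends backends;

        if (!ctx.B2R2SupportsArchitecture)
        {
            // Exotic architecture: Ghidra only.
            backends = AnalysisBackends.Ghidra;
        }
        else
        {
            backends = AnalysisBackends.B2R2;

            // Low Tier 1 confidence: validate with Ghidra as well.
            if (ctx.B2R2Confidence is double confidence && confidence < minB2R2Confidence)
                backends |= AnalysisBackends.Ghidra;
        }

        // CVE-affected function: run every remaining backend.
        if (ctx.IsCveRelevant)
            backends |= AnalysisBackends.Ghidra | AnalysisBackends.Decompiler | AnalysisBackends.MlEmbedding;

        // Obfuscated binary: add the ML embedding signal.
        if (ctx.IsObfuscated)
            backends |= AnalysisBackends.MlEmbedding;

        return backends;
    }
}
```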

---

## 7. Corpus Coverage

### Priority Libraries

| Library | Priority | Functions | CVEs |
|---------|----------|-----------|------|
| glibc | Critical | ~15,000 | 50+ |
| OpenSSL | Critical | ~8,000 | 100+ |
| zlib | High | ~200 | 5+ |
| libcurl | High | ~2,000 | 80+ |
| SQLite | High | ~1,500 | 30+ |
| libxml2 | Medium | ~1,200 | 40+ |
| libpng | Medium | ~300 | 10+ |
| expat | Medium | ~150 | 15+ |

### Architecture Coverage

| Architecture | B2R2 | Ghidra | Status |
|--------------|------|--------|--------|
| x86_64 | Excellent | Excellent | Primary |
| ARM64 | Excellent | Excellent | Primary |
| ARM32 | Good | Excellent | Secondary |
| MIPS32 | Fair | Excellent | Fallback |
| MIPS64 | Fair | Excellent | Fallback |
| RISC-V | Good | Good | Emerging |
| PPC32/64 | Fair | Excellent | Fallback |

---

## 8. Performance Characteristics

### Latency Budget

| Operation | Target | Notes |
|-----------|--------|-------|
| B2R2 disassembly | <100ms | Per function |
| IR lifting | <50ms | Per function |
| Semantic fingerprint | <50ms | Per function |
| Ghidra analysis | <30s | Per binary (startup) |
| Decompilation | <500ms | Per function |
| ML inference | <100ms | Per function |
| Ensemble decision | <10ms | Per comparison |
| **Total (Tier 1)** | **<200ms** | Per function |
| **Total (Full)** | **<1s** | Per function |

### Memory Budget

| Component | Memory | Notes |
|-----------|--------|-------|
| B2R2 per binary | ~100MB | Scales with binary size |
| Ghidra per project | ~2GB | Persistent cache |
| ML model | ~500MB | ONNX loaded |
| Corpus query cache | ~100MB | LRU eviction |

---

## 9. Integration Points

### 9.1 Scanner Integration

```csharp
// Scanner.Worker uses semantic diffing for binary vulnerability detection
var result = await _binaryVulnerabilityService.LookupByFingerprintAsync(
    fingerprint,
    minSimilarity: 0.85m,
    useSemanticMatching: true,  // Enable semantic diffing
    ct);
```

### 9.2 PatchDiffEngine Enhancement

```csharp
// PatchDiffEngine now includes semantic comparison
var diff = await _patchDiffEngine.DiffAsync(
    vulnerableBinary,
    patchedBinary,
    new PatchDiffOptions
    {
        UseSemanticAnalysis = true,
        SemanticThreshold = 0.7m,
        IncludeDecompilation = true,
        IncludeMlEmbedding = true
    },
    ct);
```

### 9.3 DeltaSignature Enhancement

```csharp
// Delta signatures now include semantic fingerprints
var signature = await _deltaSignatureGenerator.GenerateSignaturesAsync(
    binaryStream,
    new DeltaSignatureRequest
    {
        Cve = "CVE-2024-1234",
        TargetSymbols = ["vulnerable_func"],
        IncludeSemanticFingerprint = true,
        IncludeDecompiledHash = true
    },
    ct);
```

---

## 10. Security Considerations

### 10.1 Sandbox Requirements

All binary analysis runs in sandboxed environments:
- Seccomp profile restricting syscalls
- Read-only root filesystem
- No network access during analysis
- Memory/CPU limits

### 10.2 Model Security

ML models are:
- Signed with DSSE attestations
- Verified before loading
- Not user-uploadable (pre-trained only)

### 10.3 Corpus Integrity

Corpus data is:
- Ingested from trusted sources only
- Signed at snapshot level
- Version-controlled with audit trail

---

## 11. Configuration

```yaml
# binaryindex.yaml - Semantic diffing configuration
binaryindex:
  semantic_diffing:
    enabled: true

    # Analysis backends
    backends:
      b2r2:
        enabled: true
        ir_lifting: true
        semantic_graph: true
      ghidra:
        enabled: true
        fallback_only: true
        min_b2r2_confidence: 0.7
        headless_timeout_ms: 30000
      decompiler:
        enabled: true
        high_value_only: true  # Only for CVE-affected functions
      ml:
        enabled: true
        model_path: /models/codebert_binary_v1.onnx
        embedding_dimension: 768

    # Ensemble weights
    ensemble:
      instruction_weight: 0.15
      semantic_weight: 0.25
      decompiled_weight: 0.35
      ml_weight: 0.25
      min_confidence: 0.6

    # Corpus
    corpus:
      auto_update: true
      update_interval_hours: 24
      libraries:
        - glibc
        - openssl
        - zlib
        - curl
        - sqlite

    # Performance
    performance:
      max_parallel_analyses: 4
      cache_ttl_seconds: 3600
      max_function_size_bytes: 1048576  # 1MB
```

Additional appsettings sections (case-insensitive):
- `BinaryIndex:B2R2Pool` - lifter pool sizing and warm ISA list.
- `BinaryIndex:SemanticLifting` - LowUIR enablement and deterministic controls.
- `BinaryIndex:FunctionCache` - Valkey function cache configuration.
- `Postgres:BinaryIndex` - persistence for canonical IR fingerprints.

---

## 12. Metrics & Observability

### Ops Endpoints

BinaryIndex exposes read-only ops endpoints for health, bench, cache, and effective configuration:

- GET `/api/v1/ops/binaryindex/health` -> BinaryIndexOpsHealthResponse
- POST `/api/v1/ops/binaryindex/bench/run` -> BinaryIndexBenchResponse
- GET `/api/v1/ops/binaryindex/cache` -> BinaryIndexFunctionCacheStats
- GET `/api/v1/ops/binaryindex/config` -> BinaryIndexEffectiveConfig

### Metrics

| Metric | Type | Labels |
|--------|------|--------|
| `semantic_diffing_analysis_total` | Counter | backend, result |
| `semantic_diffing_latency_ms` | Histogram | backend, tier |
| `semantic_diffing_accuracy` | Gauge | comparison_type |
| `corpus_functions_total` | Gauge | library |
| `ml_inference_latency_ms` | Histogram | model |
| `ensemble_signal_weight` | Gauge | signal_type |

### Traces

- `semantic_diffing.analyze` - Full analysis span
- `semantic_diffing.b2r2.lift` - IR lifting
- `semantic_diffing.ghidra.decompile` - Decompilation
- `semantic_diffing.ml.inference` - ML embedding
- `semantic_diffing.ensemble.decide` - Ensemble decision

---

## 13. Testing Strategy

### Unit Tests

| Test Suite | Coverage |
|------------|----------|
| `IrLiftingServiceTests` | IR lifting correctness |
| `SemanticGraphExtractorTests` | Graph construction |
| `WeisfeilerLehmanHasherTests` | Hash stability |
| `AstComparisonEngineTests` | AST similarity |
| `OnnxInferenceEngineTests` | ML inference |
| `EnsembleDecisionEngineTests` | Weight combination |

### Integration Tests

| Test Suite | Coverage |
|------------|----------|
| `EndToEndSemanticDiffTests` | Full pipeline |
| `OptimizationResilienceTests` | O0 vs O2 vs O3 |
| `CompilerVariantTests` | GCC vs Clang |
| `GhidraFallbackTests` | Fallback scenarios |

### Golden Corpus Tests

Pre-computed test cases with known results:
- 100 CVE patch pairs (vulnerable -> fixed)
- 50 optimization variant sets
- 25 compiler variant sets
- 25 obfuscation variant sets

---

## 14. Roadmap

| Phase | Status | ETA | Impact |
|-------|--------|-----|--------|
| Phase 1: IR Semantics | Implemented | 2026-01-24 | +15% accuracy |
| Phase 2: Corpus | Planned | 2026-02-15 | +10% coverage |
| Phase 3: Ghidra | Planned | 2026-02-28 | +5% edge cases |
| Phase 4: Decompiler/ML | Planned | 2026-03-31 | +10% obfuscation |
| **Total** | | | **+35-40%** |

---

## 15. Delta-Sig Predicate Attestation

**Sprint Reference**: `SPRINT_20260117_003_BINDEX_delta_sig_predicate`

Delta-sig predicates provide a supply chain attestation format for binary patches, enabling policy-gated releases based on function-level change scope.

### 15.1 Predicate Structure

```jsonc
{
  "_type": "https://in-toto.io/Statement/v1",
  "predicateType": "https://stellaops.io/delta-sig/v1",
  "subject": [
    {
      "name": "libexample-1.1.so",
      "digest": {
        "sha256": "abc123..."
      }
    }
  ],
  "predicate": {
    "before": {
      "name": "libexample-1.0.so",
      "digest": { "sha256": "def456..." }
    },
    "after": {
      "name": "libexample-1.1.so",
      "digest": { "sha256": "abc123..." }
    },
    "diff": [
      {
        "function": "process_input",
        "changeType": "modified",
        "beforeHash": "sha256:old...",
        "afterHash": "sha256:new...",
        "bytesDelta": 48,
        "semanticSimilarity": 0.87
      },
      {
        "function": "new_handler",
        "changeType": "added",
        "afterHash": "sha256:new...",
        "bytesDelta": 256
      }
    ],
    "summary": {
      "functionsAdded": 1,
      "functionsRemoved": 0,
      "functionsModified": 1,
      "totalBytesChanged": 304
    },
    "timestamp": "2026-01-16T12:00:00Z"
  }
}
```

### 15.2 Policy Gate Integration

The `DeltaScopePolicyGate` enforces limits on patch scope:

```yaml
policy:
  deltaSig:
    maxAddedFunctions: 10
    maxRemovedFunctions: 5
    maxModifiedFunctions: 20
    maxBytesChanged: 50000
    minSemanticSimilarity: 0.5
    requireSemanticAnalysis: false
```
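
As an illustration of how such limits might be checked against a predicate's `summary` block, consider the sketch below. It is not the `DeltaScopePolicyGate` implementation: the `DeltaSummary` and `DeltaScopePolicy` records, the `LowestSemanticSimilarity` field, and the violation strings are hypothetical, and the similarity check is applied only when `requireSemanticAnalysis` is enabled, following section 15.5.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shapes mirroring the predicate summary and the policy keys above.
public sealed record DeltaSummary(
    int FunctionsAdded,
    int FunctionsRemoved,
    int FunctionsModified,
    long TotalBytesChanged,
    double? LowestSemanticSimilarity); // null when no semantic analysis ran

public sealed record DeltaScopePolicy(
    int MaxAddedFunctions = 10,
    int MaxRemovedFunctions = 5,
    int MaxModifiedFunctions = 20,
    long MaxBytesChanged = 50_000,
    double MinSemanticSimilarity = 0.5,
    bool RequireSemanticAnalysis = false);

public static class DeltaScopeGateSketch
{
    // Returns the names of violated limits; an empty list means the gate passes.
    public static IReadOnlyList<string> Evaluate(DeltaSummary summary, DeltaScopePolicy policy)
    {
        var violations = new List<string>();

        if (summary.FunctionsAdded > policy.MaxAddedFunctions)
            violations.Add("maxAddedFunctions");
        if (summary.FunctionsRemoved > policy.MaxRemovedFunctions)
            violations.Add("maxRemovedFunctions");
        if (summary.FunctionsModified > policy.MaxModifiedFunctions)
            violations.Add("maxModifiedFunctions");
        if (summary.TotalBytesChanged > policy.MaxBytesChanged)
            violations.Add("maxBytesChanged");

        if (policy.RequireSemanticAnalysis)
        {
            if (summary.LowestSemanticSimilarity is not double similarity)
                violations.Add("requireSemanticAnalysis");   // analysis missing
            else if (similarity < policy.MinSemanticSimilarity)
                violations.Add("minSemanticSimilarity");
        }

        return violations;
    }
}
```

With the example predicate from section 15.1 (1 added, 0 removed, 1 modified, 304 bytes changed, similarity 0.87) and the limits shown above, no limit is exceeded and the returned list is empty.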

### 15.3 Attestor Integration

Delta-sig predicates integrate with the Attestor module:

1. **Generate** - Create predicate from before/after binary analysis
2. **Sign** - Create DSSE envelope with cosign/fulcio signature
3. **Submit** - Log to Rekor transparency log
4. **Verify** - Validate signature and inclusion proof

### 15.4 CLI Commands

```bash
# Generate delta-sig predicate
stella binary diff --before old.so --after new.so --output delta.json

# Generate and attest in one step
stella binary attest --before old.so --after new.so --sign --rekor

# Verify attestation
stella binary verify --predicate delta.json --signature sig.dsse

# Check against policy gate
stella binary gate --predicate delta.json --policy policy.yaml
```

### 15.5 Semantic Similarity Scoring

When `requireSemanticAnalysis` is enabled, the gate also checks:

| Threshold | Meaning |
|-----------|---------|
| > 0.9 | Near-identical (cosmetic changes) |
| 0.7 - 0.9 | Similar (refactoring, optimization) |
| 0.5 - 0.7 | Moderate changes (significant logic) |
| < 0.5 | Major rewrite (requires review) |
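
The bands in this table map naturally onto a switch expression. The sketch below is illustrative only: `ChangeBand` and `SimilarityBands` are hypothetical names, and the handling of the exact boundary values (0.9 and 0.7 counted as "Similar", 0.5 as "Moderate") is an assumption, since the table leaves the boundaries open.

```csharp
// Hypothetical classification of a semanticSimilarity score into the bands above.
public enum ChangeBand
{
    NearIdentical,
    Similar,
    Moderate,
    MajorRewrite,
}

public static class SimilarityBands
{
    public static ChangeBand Classify(double similarity) => similarity switch
    {
        > 0.9 => ChangeBand.NearIdentical,  // cosmetic changes
        >= 0.7 => ChangeBand.Similar,       // refactoring, optimization
        >= 0.5 => ChangeBand.Moderate,      // significant logic changes
        _ => ChangeBand.MajorRewrite,       // requires review (also catches NaN)
    };
}
```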

### 15.6 Evidence Storage

Delta-sig predicates are stored in the Evidence Locker and can be included in portable bundles for air-gapped verification.

---

## 16. References

### Internal

- `docs/modules/binary-index/architecture.md`
- `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.DeltaSig/`
- `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Fingerprints/`

### External

- [B2R2 Binary Analysis Framework](https://b2r2.org/)
- [Ghidra Patch Diffing Guide](https://cve-north-stars.github.io/docs/Ghidra-Patch-Diffing)
- [ghidriff Tool](https://github.com/clearbluejar/ghidriff)
- [SemDiff Paper (arXiv)](https://arxiv.org/abs/2308.01463)
- [SEI Semantic Equivalence Research](https://www.sei.cmu.edu/annual-reviews/2022-research-review/semantic-equivalence-checking-of-decompiled-binaries/)
- [in-toto Attestation Framework](https://in-toto.io/)
- [SLSA Provenance Spec](https://slsa.dev/provenance/v1)

---

## 17. B2R2 Troubleshooting Guide

This section covers common issues and resolutions when using B2R2 for IR lifting.

### 17.1 Lifting Failures

**Symptom:** `B2R2LiftingException: Failed to lift function at address 0x...`

**Common Causes:**
1. **Unsupported instruction** - B2R2 may not recognize certain instructions
2. **Invalid entry point** - Function address is not a valid entry point
3. **Obfuscated code** - Heavy obfuscation defeats parsing

**Resolution:**
```csharp
// Check if architecture is supported before lifting
if (!_liftingService.SupportsArchitecture(binary.Architecture))
{
    // Fall back to disassembly-only mode
    return await _disassemblyService.DisassembleAsync(binary, ct);
}

// Use try-lift with fallback
var result = await _liftingService.TryLiftWithFallbackAsync(
    binary,
    new LiftingOptions { FallbackToDisassembly = true },
    ct);
```

### 17.2 Memory Issues

**Symptom:** `OutOfMemoryException` during lifting of large binaries

**Common Causes:**
1. **Pool exhaustion** - Too many concurrent lifter instances
2. **Large function** - Single function exceeds memory budget
3. **Memory leak** - Lifter instances not properly disposed

**Resolution:**
```yaml
# Adjust pool configuration in appsettings.yaml
BinaryIndex:
  B2R2Pool:
    MaxInstancesPerIsa: 4          # Reduce if OOM
    RecycleAfterOperations: 1000   # Force recycle more often
    MaxFunctionSizeBytes: 1048576  # Skip very large functions
```

### 17.3 Performance Issues

**Symptom:** Lifting takes longer than expected (>30s for small binaries)

**Common Causes:**
1. **Cold pool** - No warm lifter instances available
2. **Complex CFG** - Function has extremely complex control flow
3. **Cache misses** - IR cache not configured or full

**Resolution:**
```csharp
// Ensure pool is warmed at startup
await _lifterPool.WarmAsync(new[] { ISA.AMD64, ISA.ARM64 }, ct);

// Check cache health
var stats = await _cacheService.GetStatisticsAsync(ct);
if (stats.HitRate < 0.5)
{
    _logger.LogWarning("Low cache hit rate: {HitRate:P}", stats.HitRate);
}
```

### 17.4 Determinism Issues

**Symptom:** Same binary produces different IR hashes on repeated lifts

**Common Causes:**
1. **Non-deterministic block ordering** - Blocks not sorted by address
2. **Timestamp inclusion** - IR includes lift timestamp
3. **B2R2 version mismatch** - Different versions produce different IR

**Resolution:**
- Ensure `InvariantCulture` is used for all string formatting
- Sort basic blocks by entry address before hashing
- Include B2R2 version in cache keys
- Use `DeterministicHash` utility for consistent hashing
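
The first three resolution points can be combined into one canonicalization routine. The sketch below is a minimal illustration under stated assumptions: `IrBlock` and `CanonicalIrHashSketch` are hypothetical types, SHA-256 stands in for whatever the `DeterministicHash` utility uses, and the text layout of the canonical form is invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical block shape: an entry address plus already-normalized statement text.
public sealed record IrBlock(ulong EntryAddress, IReadOnlyList<string> Statements);

public static class CanonicalIrHashSketch
{
    public static string Compute(IEnumerable<IrBlock> blocks, string b2r2Version)
    {
        var canonical = new StringBuilder();

        // The lifter version is part of the hashed text, so a B2R2 upgrade
        // never collides with hashes produced by an older version.
        canonical.Append("b2r2=").Append(b2r2Version).Append('\n');

        // Sort by entry address so discovery order cannot affect the hash.
        foreach (IrBlock block in blocks.OrderBy(b => b.EntryAddress))
        {
            // Fixed-width hex with the invariant culture keeps formatting locale-independent.
            canonical.Append(block.EntryAddress.ToString("x16", CultureInfo.InvariantCulture))
                     .Append('\n');

            foreach (string statement in block.Statements)
                canonical.Append(statement).Append('\n');
        }

        // No timestamps or other run-specific data enter the hashed text.
        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(canonical.ToString()));
        return Convert.ToHexString(digest);
    }
}
```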

### 17.5 Architecture Detection Issues

**Symptom:** Wrong architecture selected for multi-arch binary (fat binary)

**Common Causes:**
1. **Universal binary** - macOS fat binaries contain multiple architectures
2. **ELF with multiple ABIs** - Rare but possible

**Resolution:**
```csharp
// Explicitly specify target architecture
var liftOptions = new LiftingOptions
{
    TargetArchitecture = ISA.AMD64,  // Force x86-64
    IgnoreOtherArchitectures = true
};
```

### 17.6 LowUIR Mapping Issues

**Symptom:** Specific B2R2 LowUIR statements not mapped correctly

**Reference: LowUIR Statement Type Mapping**

| B2R2 LowUIR | Stella IR Model | Notes |
|-------------|-----------------|-------|
| `LMark` | `IrLabel` | Block label markers |
| `Put` | `IrAssignment` | Register write |
| `Store` | `IrStore` | Memory write |
| `InterJmp` | `IrJump` | Cross-function jump |
| `IntraJmp` | `IrJump` | Intra-function jump |
| `InterCJmp` | `IrConditionalJump` | Cross-function conditional |
| `IntraCJmp` | `IrConditionalJump` | Intra-function conditional |
| `SideEffect` | `IrCall`/`IrReturn` | Function calls, returns |
| `Def`/`Use`/`Phi` | `IrPhi` | SSA form constructs |

### 17.7 Diagnostic Commands

```bash
# Check B2R2 health
stella ops binaryindex health --verbose

# Run benchmark suite
stella ops binaryindex bench --iterations 100 --binary sample.so

# View cache statistics
stella ops binaryindex cache --stats

# Dump effective configuration
stella ops binaryindex config
```

---

*Document Version: 1.1.0*
*Last Updated: 2026-01-19*