# Semantic Analysis Library (IR Lifting and Function Fingerprinting)

## Module
BinaryIndex

## Status
IMPLEMENTED

## Description
Semantic binary analysis with IR lifting, function fingerprint generation, semantic matching, graph extraction, and call n-gram generation for function-level binary comparison.
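
To make the call n-gram idea concrete, here is a minimal Python sketch rather than the library's C# implementation. It treats a function's ordered call sequence as a list of callee names, collects sliding windows of length `n`, and compares two functions with a multiset Jaccard score. The names `call_ngrams` and `jaccard`, the default `n=2`, and the sample call sequences are all assumptions made for illustration; the source does not state how `CallNgramGenerator` windows or scores its n-grams.

```python
from collections import Counter


def call_ngrams(calls, n=2):
    """Return a multiset of length-n windows over an ordered call sequence."""
    if n <= 0:
        raise ValueError("n must be positive")
    # range() is empty when the sequence is shorter than n, giving an empty Counter.
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def jaccard(a, b):
    """Multiset Jaccard similarity; defined as 1.0 when both multisets are empty."""
    union = sum((a | b).values())
    if union == 0:
        return 1.0
    return sum((a & b).values()) / union


# Hypothetical call sequences from two builds of the same function.
build_a = ["malloc", "memcpy", "strlen", "free"]
build_b = ["malloc", "memcpy", "free"]

grams_a = call_ngrams(build_a)
grams_b = call_ngrams(build_b)
similarity = jaccard(grams_a, grams_b)  # 1 shared bigram out of 4 distinct -> 0.25
```

Using a multiset instead of a plain set keeps repeated call patterns (for example, a loop that calls the same helper several times) from collapsing into a single entry.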

## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/`
- **Key Classes**:
  - `IrLiftingService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/IrLiftingService.cs`) - lifts disassembled instructions to deterministic IR/SSA models (B2R2-specific lifting types are available under `Lifting/`)
  - `SemanticFingerprintGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/SemanticFingerprintGenerator.cs`) - generates a `SemanticFingerprint` using Weisfeiler-Lehman graph hashing (the `KsgWeisfeilerLehmanV1` algorithm)
  - `SemanticGraphExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/SemanticGraphExtractor.cs`) - extracts key-semantics graphs (KSG) from lifted IR
  - `SemanticMatcher` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/SemanticMatcher.cs`) - matches semantic fingerprints for similarity scoring
  - `CallNgramGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/CallNgramGenerator.cs`) - call-sequence n-gram fingerprinting
  - `WeisfeilerLehmanHasher` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/Internal/WeisfeilerLehmanHasher.cs`) - WL graph hash implementation
  - `GraphCanonicalizer` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/Internal/GraphCanonicalizer.cs`) - graph canonicalization for deterministic hashing
- **Models**: `FingerprintModels` (`SemanticFingerprint`, `SemanticFingerprintOptions`, `SemanticFingerprintAlgorithm`), `GraphModels` (`KeySemanticsGraph`), `IrModels` (`LiftedFunction`, `IrStatement`)
- **Interfaces**: `IIrLiftingService`, `ISemanticFingerprintGenerator`, `ISemanticGraphExtractor`, `ISemanticMatcher`
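
For readers unfamiliar with Weisfeiler-Lehman graph hashing, the following Python sketch shows the general technique on a tiny directed graph. It is not the `KsgWeisfeilerLehmanV1` algorithm itself: the function name `wl_hash`, the use of SHA-256, the choice of three refinement rounds, and the operation labels are assumptions picked for illustration. Each node starts with a hash of its label, then repeatedly absorbs the sorted hashes of its successors and predecessors; the graph hash is the hash of the sorted final node colors.

```python
import hashlib


def _digest(text):
    """Hex SHA-256 of a string, used as a compact color value."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def wl_hash(labels, edges, iterations=3):
    """Hash a directed, node-labeled graph.

    labels: dict mapping node id -> label string
    edges:  iterable of (src, dst) pairs
    """
    succ = {node: [] for node in labels}
    pred = {node: [] for node in labels}
    for src, dst in edges:
        succ[src].append(dst)
        pred[dst].append(src)

    colors = {node: _digest(label) for node, label in labels.items()}
    for _ in range(iterations):
        # The comprehension reads the previous round's colors throughout,
        # because `colors` is only rebound after it has been fully built.
        colors = {
            node: _digest("|".join([
                colors[node],
                ",".join(sorted(colors[m] for m in succ[node])),
                ",".join(sorted(colors[m] for m in pred[node])),
            ]))
            for node in labels
        }
    # Sorting makes the result independent of node ids and insertion order.
    return _digest(",".join(sorted(colors.values())))


# Hypothetical graphs: a load -> add -> store chain, the same chain with
# renamed nodes, and a structurally different graph with the same labels.
chain = wl_hash({0: "load", 1: "add", 2: "store"}, [(0, 1), (1, 2)])
renamed = wl_hash({"x": "store", "y": "load", "z": "add"}, [("y", "z"), ("z", "x")])
fanout = wl_hash({0: "load", 1: "add", 2: "store"}, [(0, 1), (0, 2)])
```

In this sketch `chain` and `renamed` are equal because the graphs are isomorphic, while `fanout` differs. WL hashing is a heuristic: isomorphic graphs always hash alike, but some non-isomorphic graphs can also collide.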

## E2E Test Plan
- [ ] Lift a binary function to IR via `IrLiftingService` and verify IR structure contains valid statements
- [ ] Generate a semantic fingerprint via `SemanticFingerprintGenerator` and verify the hash is deterministic across repeated runs
- [ ] Extract a key-semantics graph via `SemanticGraphExtractor` and verify node/edge structure
- [ ] Match two fingerprints of the same function built with different compilers via `SemanticMatcher` and verify a high similarity score
- [ ] Verify the Weisfeiler-Lehman graph hash produces different hashes for structurally different functions
- [ ] Verify `GraphCanonicalizer` produces consistent canonical forms for isomorphic graphs
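
The determinism and canonicalization checks above can be prototyped with a small Python sketch before being written against the real C# services. It is an illustration only: `normal_form` and `fingerprint` are hypothetical helpers, and the sorted-triple normal form used here is much simpler than a full canonical labeling (two non-isomorphic graphs with the same multiset of labeled edges would collide). It does show the shape of the assertions: renaming nodes and shuffling edge order must not change the result, while changing an edge must.

```python
import hashlib
import random


def normal_form(labels, edges):
    """Sorted (src_label, edge_kind, dst_label) triples.

    The result does not depend on node ids or on the order of `edges`.
    """
    return sorted((labels[src], kind, labels[dst]) for src, kind, dst in edges)


def fingerprint(labels, edges):
    """SHA-256 over the repr of the normal form."""
    text = repr(normal_form(labels, edges))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical graph with labeled nodes and typed edges.
labels = {0: "load", 1: "add", 2: "store"}
edges = [(0, "data", 1), (1, "data", 2), (0, "control", 2)]

# Same graph with renamed nodes and a shuffled edge list.
renamed_labels = {"a": "load", "b": "add", "c": "store"}
renamed_edges = [("a", "data", "b"), ("b", "data", "c"), ("a", "control", "c")]
random.Random(0).shuffle(renamed_edges)

# Structurally different graph: the control edge now starts at the add node.
changed_edges = [(0, "data", 1), (1, "data", 2), (1, "control", 2)]

same = fingerprint(labels, edges) == fingerprint(renamed_labels, renamed_edges)
different = fingerprint(labels, edges) != fingerprint(labels, changed_edges)
```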