# Semantic Analysis Library (IR Lifting and Function Fingerprinting) ## Module BinaryIndex ## Status IMPLEMENTED ## Description Semantic binary analysis with IR lifting, function fingerprint generation, semantic matching, graph extraction, and call n-gram generation for function-level binary comparison. ## Implementation Details - **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/` - **Key Classes**: - `IrLiftingService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/IrLiftingService.cs`) - lifts machine code to intermediate representation using B2R2 - `SemanticFingerprintGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/SemanticFingerprintGenerator.cs`) - generates `SemanticFingerprint` using Weisfeiler-Lehman graph hashing (KsgWeisfeilerLehmanV1 algorithm) - `SemanticGraphExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/SemanticGraphExtractor.cs`) - extracts key-semantics graphs (KSG) from lifted IR - `SemanticMatcher` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/SemanticMatcher.cs`) - matches semantic fingerprints for similarity scoring - `CallNgramGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/CallNgramGenerator.cs`) - call-sequence n-gram fingerprinting - `WeisfeilerLehmanHasher` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/Internal/WeisfeilerLehmanHasher.cs`) - WL graph hash implementation - `GraphCanonicalizer` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/Internal/GraphCanonicalizer.cs`) - graph canonicalization for deterministic hashing - **Models**: `FingerprintModels` (SemanticFingerprint, SemanticFingerprintOptions, SemanticFingerprintAlgorithm), `GraphModels` (KeySemanticsGraph), `IrModels` (LiftedFunction, IrStatement) - **Interfaces**: `IIrLiftingService`, `ISemanticFingerprintGenerator`, `ISemanticGraphExtractor`, `ISemanticMatcher` ## E2E Test Plan - [ ] Lift a binary function to IR via `IrLiftingService` and verify IR structure contains valid statements - [ ] Generate a semantic fingerprint via `SemanticFingerprintGenerator` and verify hash is deterministic - [ ] Extract a key-semantics graph via `SemanticGraphExtractor` and verify node/edge structure - [ ] Match two fingerprints of the same function (different compilers) via `SemanticMatcher` and verify high similarity - [ ] Verify Weisfeiler-Lehman graph hash produces different hashes for structurally different functions - [ ] Verify `GraphCanonicalizer` produces consistent canonical forms for isomorphic graphs