Files
git.stella-ops.org/docs/features/unchecked/binaryindex/semantic-analysis-library.md

32 lines
2.5 KiB
Markdown

# Semantic Analysis Library (IR Lifting and Function Fingerprinting)
## Module
BinaryIndex
## Status
IMPLEMENTED
## Description
Semantic binary analysis with IR lifting, function fingerprint generation, semantic matching, graph extraction, and call n-gram generation for function-level binary comparison.
## Implementation Details
- **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/`
- **Key Classes**:
- `IrLiftingService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/IrLiftingService.cs`) - lifts machine code to intermediate representation using B2R2
- `SemanticFingerprintGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/SemanticFingerprintGenerator.cs`) - generates `SemanticFingerprint` using Weisfeiler-Lehman graph hashing (KsgWeisfeilerLehmanV1 algorithm)
- `SemanticGraphExtractor` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/SemanticGraphExtractor.cs`) - extracts key-semantics graphs (KSG) from lifted IR
- `SemanticMatcher` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/SemanticMatcher.cs`) - matches semantic fingerprints for similarity scoring
- `CallNgramGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/CallNgramGenerator.cs`) - call-sequence n-gram fingerprinting
- `WeisfeilerLehmanHasher` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/Internal/WeisfeilerLehmanHasher.cs`) - WL graph hash implementation
- `GraphCanonicalizer` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/Internal/GraphCanonicalizer.cs`) - graph canonicalization for deterministic hashing
- **Models**: `FingerprintModels` (SemanticFingerprint, SemanticFingerprintOptions, SemanticFingerprintAlgorithm), `GraphModels` (KeySemanticsGraph), `IrModels` (LiftedFunction, IrStatement)
- **Interfaces**: `IIrLiftingService`, `ISemanticFingerprintGenerator`, `ISemanticGraphExtractor`, `ISemanticMatcher`
## E2E Test Plan
- [ ] Lift a binary function to IR via `IrLiftingService` and verify IR structure contains valid statements
- [ ] Generate a semantic fingerprint via `SemanticFingerprintGenerator` and verify hash is deterministic
- [ ] Extract a key-semantics graph via `SemanticGraphExtractor` and verify node/edge structure
- [ ] Match two fingerprints of the same function (different compilers) via `SemanticMatcher` and verify high similarity
- [ ] Verify Weisfeiler-Lehman graph hash produces different hashes for structurally different functions
- [ ] Verify `GraphCanonicalizer` produces consistent canonical forms for isomorphic graphs