git.stella-ops.org/docs/features/unchecked/binaryindex/semantic-analysis-library.md


Semantic Analysis Library (IR Lifting and Function Fingerprinting)

Module

BinaryIndex

Status

IMPLEMENTED

Description

Semantic binary analysis with IR lifting, function fingerprint generation, semantic matching, graph extraction, and call n-gram generation for function-level binary comparison.
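Call n-gram generation can be illustrated with a small Python sketch. It assumes a function's calls are already available as an ordered list of callee names. It also assumes a default n of 3 and a weighted Jaccard index as the similarity score. `call_ngrams` and `ngram_similarity` are hypothetical names and do not reflect the CallNgramGenerator or SemanticMatcher API.

```python
from collections import Counter


def call_ngrams(calls: list[str], n: int = 3) -> Counter:
    """Count every length-n window over an ordered sequence of callee names."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A sequence shorter than n produces no n-grams (the range is empty).
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def ngram_similarity(a: Counter, b: Counter) -> float:
    """Weighted Jaccard index: sum of per-n-gram minimum counts over maximum counts."""
    keys = a.keys() | b.keys()
    if not keys:
        return 1.0  # treat two functions with no n-grams as identical
    shared = sum(min(a[k], b[k]) for k in keys)
    total = sum(max(a[k], b[k]) for k in keys)
    return shared / total


# Two builds of the same function; the second has one extra logging call.
f1 = ["malloc", "memcpy", "strlen", "free"]
f2 = ["malloc", "memcpy", "strlen", "log_debug", "free"]
print(ngram_similarity(call_ngrams(f1, 2), call_ngrams(f2, 2)))  # 0.4
```

With n = 2, `f1` yields 3 bigrams and `f2` yields 4. They share 2 bigrams out of 5 distinct ones, which gives 0.4. A smaller n tolerates inserted calls better, while a larger n keeps more ordering context.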

Implementation Details

  • Modules: src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/
  • Key Classes:
    • IrLiftingService (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/IrLiftingService.cs) - lifts machine code to an intermediate representation (IR) using B2R2
    • SemanticFingerprintGenerator (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/SemanticFingerprintGenerator.cs) - generates a SemanticFingerprint using Weisfeiler-Lehman graph hashing (the KsgWeisfeilerLehmanV1 algorithm)
    • SemanticGraphExtractor (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/SemanticGraphExtractor.cs) - extracts key-semantics graphs (KSGs) from lifted IR
    • SemanticMatcher (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/SemanticMatcher.cs) - compares semantic fingerprints and computes similarity scores
    • CallNgramGenerator (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/CallNgramGenerator.cs) - generates call-sequence n-gram fingerprints
    • WeisfeilerLehmanHasher (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/Internal/WeisfeilerLehmanHasher.cs) - Weisfeiler-Lehman (WL) graph hash implementation
    • GraphCanonicalizer (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/Internal/GraphCanonicalizer.cs) - graph canonicalization for deterministic hashing
  • Models: FingerprintModels (SemanticFingerprint, SemanticFingerprintOptions, SemanticFingerprintAlgorithm), GraphModels (KeySemanticsGraph), IrModels (LiftedFunction, IrStatement)
  • Interfaces: IIrLiftingService, ISemanticFingerprintGenerator, ISemanticGraphExtractor, ISemanticMatcher
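The Weisfeiler-Lehman hashing used by SemanticFingerprintGenerator and WeisfeilerLehmanHasher can be sketched generically in Python. This sketch is not the KsgWeisfeilerLehmanV1 algorithm. The following details are assumptions for illustration:

- the label encoding
- the three refinement rounds
- the truncated SHA-256 digests
- mixing both predecessor and successor labels

`wl_hash` is a hypothetical name.

```python
import hashlib


def _digest(text: str) -> str:
    # Stable across runs; Python's built-in hash() is salted per process.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def wl_hash(labels: dict[int, str], edges: list[tuple[int, int]], iterations: int = 3) -> str:
    """Weisfeiler-Lehman hash of a directed, node-labelled graph.

    The result depends on node labels and structure only, not on node IDs.
    """
    succ: dict[int, list[int]] = {v: [] for v in labels}
    pred: dict[int, list[int]] = {v: [] for v in labels}
    for src, dst in edges:
        succ[src].append(dst)
        pred[dst].append(src)

    current = {v: _digest(lbl) for v, lbl in labels.items()}
    history = list(current.values())
    for _ in range(iterations):
        # Refine each label from its own value plus the sorted labels of its
        # predecessors and successors; sorting removes any dependence on ID order.
        current = {
            v: _digest(
                current[v]
                + "|in:" + ",".join(sorted(current[u] for u in pred[v]))
                + "|out:" + ",".join(sorted(current[w] for w in succ[v]))
            )
            for v in labels
        }
        history.extend(current.values())
    # Fingerprint = digest of the sorted multiset of labels from every round.
    return hashlib.sha256("".join(sorted(history)).encode("utf-8")).hexdigest()


a = {0: "LOAD", 1: "ADD", 2: "STORE"}
b = {10: "STORE", 11: "LOAD", 12: "ADD"}  # same graph, different node IDs
print(wl_hash(a, [(0, 1), (1, 2)]) == wl_hash(b, [(11, 12), (12, 10)]))  # True
print(wl_hash(a, [(0, 1), (1, 2)]) == wl_hash(a, [(0, 2), (1, 2)]))  # False
```

WL refinement only looks at local neighborhoods. Equal WL hashes are therefore strong evidence of isomorphism, but not proof of it: some non-isomorphic graphs, such as certain regular graphs, receive the same hash. A separate canonicalization step can help close that gap.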

E2E Test Plan

  • Lift a binary function to IR via IrLiftingService and verify the lifted IR contains valid statements
  • Generate a semantic fingerprint via SemanticFingerprintGenerator and verify the hash is deterministic (identical across repeated runs)
  • Extract a key-semantics graph via SemanticGraphExtractor and verify its node and edge structure
  • Match fingerprints of the same function built with different compilers via SemanticMatcher and verify high similarity
  • Verify the Weisfeiler-Lehman graph hash differs for structurally different functions
  • Verify GraphCanonicalizer produces consistent canonical forms for isomorphic graphs
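A brute-force reference can check the isomorphism property in the last test item. The sketch below is not GraphCanonicalizer's algorithm, which this page does not describe. `canonical_form` is a hypothetical helper that tries every node numbering and keeps the lexicographically smallest encoding. This approach is exact, but it is only practical for graphs with a handful of nodes.

```python
from itertools import permutations


def canonical_form(labels: dict[int, str], edges: list[tuple[int, int]]) -> tuple:
    """Exhaustive canonical form of a small directed, node-labelled graph.

    Every ordering of the nodes renumbers them 0..n-1; the lexicographically
    smallest (label tuple, sorted edge tuple) encoding is kept. Two graphs get
    the same form exactly when they are label-preserving isomorphic.
    Runtime is O(n!), so this is only practical for a handful of nodes.
    """
    best = None
    for order in permutations(labels):
        index = {node: i for i, node in enumerate(order)}
        encoded = (
            tuple(labels[node] for node in order),
            tuple(sorted((index[s], index[d]) for s, d in edges)),
        )
        if best is None or encoded < best:
            best = encoded
    return best


x = {0: "LOAD", 1: "ADD", 2: "STORE"}
y = {7: "ADD", 8: "STORE", 9: "LOAD"}  # isomorphic to x under 0->9, 1->7, 2->8
print(canonical_form(x, [(0, 1), (1, 2)]) == canonical_form(y, [(9, 7), (7, 8)]))  # True
print(canonical_form(x, [(0, 1), (1, 2)]))  # (('ADD', 'LOAD', 'STORE'), ((0, 2), (1, 0)))
```

In a test suite, a slow but exact oracle like this can cross-check a faster canonicalizer on small randomly relabelled graphs.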