# Vulnerable Code Fingerprint Matching (CFG + Basic Block + String Refs Ensemble) ## Module BinaryIndex ## Status IMPLEMENTED ## Description Function-level vulnerability detection independent of package metadata using an ensemble of fingerprint algorithms: basic block hashing, control flow graph fingerprinting, and string reference fingerprinting. Combined generator provides multi-algorithm similarity matching with configurable thresholds. Includes pre-seeded fingerprints for high-impact CVEs in OpenSSL, glibc, zlib, and curl. ## Implementation Details - **Modules**: `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/`, `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Persistence/` - **Key Classes**: - `SignatureMatcher` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/SignatureMatcher.cs`) - matches vulnerability signatures using fingerprint index - `EnsembleDecisionEngine` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/EnsembleDecisionEngine.cs`) - combines CFG, basic block, string ref, and ML embedding fingerprints with configurable weights - `FunctionAnalysisBuilder` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/FunctionAnalysisBuilder.cs`) - assembles multi-algorithm fingerprint inputs - `SemanticFingerprintGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/SemanticFingerprintGenerator.cs`) - KSG-based semantic fingerprinting - `CallNgramGenerator` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/CallNgramGenerator.cs`) - call-sequence fingerprinting - `BinaryVulnerabilityService` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Persistence/Services/BinaryVulnerabilityService.cs`) - vulnerability lookup with pre-seeded fingerprints - **Models**: `SignatureIndexModels` (`src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/Models/`) - fingerprint index models - **Source**: SPRINT_20251226_013_BINIDX_fingerprint_factory.md ## E2E Test Plan - [ ] Match a known vulnerable function (e.g., OpenSSL Heartbleed) against pre-seeded fingerprints and verify detection - [ ] Verify multi-algorithm ensemble: CFG fingerprint + basic block hash + string refs all contribute to match score - [ ] Verify configurable threshold: adjust threshold to 0.8 and verify borderline matches are excluded - [ ] Verify pre-seeded fingerprints exist for high-impact CVEs (OpenSSL, glibc, zlib, curl) - [ ] Verify false positive rate: submit clean binary functions and verify no false matches - [ ] Verify `EnsembleDecisionEngine` weight tuning affects match outcomes