Vulnerable Code Fingerprint Matching (CFG + Basic Block + String Refs Ensemble)
Module
BinaryIndex
Status
IMPLEMENTED
Description
Function-level vulnerability detection that does not depend on package metadata. It uses an ensemble of fingerprint algorithms: basic block hashing, control flow graph (CFG) fingerprinting, and string reference fingerprinting. A combined generator provides multi-algorithm similarity matching with configurable thresholds. Pre-seeded fingerprints are included for high-impact CVEs in OpenSSL, glibc, zlib, and curl.
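To make the three fingerprint kinds concrete, here is a minimal Python sketch. It is not the BinaryIndex implementation: the function names, the mnemonic-list representation of a basic block, the degree-sequence CFG summary, and the toy data are all assumptions chosen for illustration.

```python
import hashlib


def basic_block_hashes(blocks):
    """Hash each basic block's normalized mnemonic sequence (hypothetical format)."""
    return frozenset(
        hashlib.sha256(" ".join(block).encode()).hexdigest()[:16] for block in blocks
    )


def cfg_fingerprint(num_blocks, edges):
    """Summarize CFG shape as the sorted (in-degree, out-degree) pair of every block.

    Sorting makes the result independent of how blocks are numbered.
    """
    indeg = [0] * num_blocks
    outdeg = [0] * num_blocks
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    shape = sorted(zip(indeg, outdeg))
    return hashlib.sha256(repr(shape).encode()).hexdigest()[:16]


def string_refs(strings):
    """Treat the set of referenced string literals as the fingerprint."""
    return frozenset(strings)


def jaccard(a, b):
    """Set similarity in [0, 1]; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy data, made up for illustration: the patched variant adds a bounds check.
vulnerable = [["push", "mov"], ["load", "call"], ["ret"]]
patched = [["push", "mov"], ["cmp", "ja"], ["load", "call"], ["ret"]]

# 3 shared blocks out of 4 distinct blocks.
block_similarity = jaccard(basic_block_hashes(vulnerable), basic_block_hashes(patched))

# Same chain-shaped CFG with blocks renumbered yields the same fingerprint.
same_shape = cfg_fingerprint(3, [(0, 1), (1, 2)]) == cfg_fingerprint(3, [(2, 0), (0, 1)])
```

A degree-sequence summary is a deliberately coarse stand-in for CFG fingerprinting; it was picked here only because it is short and insensitive to block ordering.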
Implementation Details
- Modules:
  - src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/
  - src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/
  - src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Persistence/
- Key Classes:
  - SignatureMatcher (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/SignatureMatcher.cs) - matches vulnerability signatures using the fingerprint index
  - EnsembleDecisionEngine (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/EnsembleDecisionEngine.cs) - combines CFG, basic block, string ref, and ML embedding fingerprints with configurable weights
  - FunctionAnalysisBuilder (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/FunctionAnalysisBuilder.cs) - assembles multi-algorithm fingerprint inputs
  - SemanticFingerprintGenerator (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/SemanticFingerprintGenerator.cs) - KSG-based semantic fingerprinting
  - CallNgramGenerator (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/CallNgramGenerator.cs) - call-sequence fingerprinting
  - BinaryVulnerabilityService (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Persistence/Services/BinaryVulnerabilityService.cs) - vulnerability lookup with pre-seeded fingerprints
- Models:
  - SignatureIndexModels (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/Models/) - fingerprint index models
- Source: SPRINT_20251226_013_BINIDX_fingerprint_factory.md
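The weighted combination attributed to EnsembleDecisionEngine can be pictured with the following Python sketch. It is an assumption-laden illustration, not the class's actual API: the `EnsembleWeights` type, the default weight values, the weighted-average formula, and the 0.8 default threshold are hypothetical, and the ML embedding signal is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnsembleWeights:
    """Hypothetical per-algorithm weights; the values are illustrative only."""
    cfg: float = 0.4
    basic_block: float = 0.4
    string_refs: float = 0.2


def ensemble_score(scores, weights):
    """Weighted average of per-algorithm similarities, each expected in [0, 1]."""
    total = weights.cfg + weights.basic_block + weights.string_refs
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = (
        weights.cfg * scores["cfg"]
        + weights.basic_block * scores["basic_block"]
        + weights.string_refs * scores["string_refs"]
    )
    return weighted / total


def is_match(scores, weights, threshold=0.8):
    """Report a match only when the combined score reaches the threshold."""
    return ensemble_score(scores, weights) >= threshold


# Made-up similarities for one candidate function.
scores = {"cfg": 0.9, "basic_block": 0.75, "string_refs": 0.5}

default_weights = EnsembleWeights()
cfg_heavy = EnsembleWeights(cfg=0.8, basic_block=0.1, string_refs=0.1)

strict = is_match(scores, default_weights, threshold=0.8)   # combined score is about 0.76
lenient = is_match(scores, default_weights, threshold=0.7)
retuned = is_match(scores, cfg_heavy, threshold=0.8)        # combined score is about 0.845
```

Normalizing by the weight sum keeps the combined score in [0, 1] even when the weights do not add up to 1, so a single threshold stays meaningful after retuning.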
E2E Test Plan
- Match a known vulnerable function (e.g., OpenSSL Heartbleed) against pre-seeded fingerprints and verify detection
- Verify multi-algorithm ensemble: CFG fingerprint + basic block hash + string refs all contribute to match score
- Verify configurable threshold: adjust threshold to 0.8 and verify borderline matches are excluded
- Verify pre-seeded fingerprints exist for high-impact CVEs (OpenSSL, glibc, zlib, curl)
- Verify false positive rate: submit clean binary functions and verify no false matches
- Verify EnsembleDecisionEngine weight tuning affects match outcomes
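The detection, threshold, and false-positive checks in this plan could be prototyped along the following lines. This Python sketch uses a toy in-memory index: the `SEEDED` dictionary, its token sets, and the helper names are invented stand-ins, not real fingerprints or the BinaryVulnerabilityService API. CVE-2014-0160 is the Heartbleed identifier and is used only as a label.

```python
# Toy stand-in for a pre-seeded fingerprint index; the token sets are made up.
SEEDED = {"CVE-2014-0160": frozenset({"b1", "b2", "b3", "b4", "b5"})}


def similarity(a, b):
    """Jaccard similarity between two fingerprint token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def find_matches(function_fp, index, threshold):
    """Return the sorted CVE ids whose seeded fingerprint meets the threshold."""
    return sorted(
        cve for cve, fp in index.items() if similarity(function_fp, fp) >= threshold
    )


def false_positive_rate(clean_fps, index, threshold):
    """Fraction of known-clean functions that match any seeded fingerprint."""
    if not clean_fps:
        return 0.0
    hits = sum(1 for fp in clean_fps if find_matches(fp, index, threshold))
    return hits / len(clean_fps)


# Hypothetical inputs for the checks.
known_vulnerable = frozenset({"b1", "b2", "b3", "b4", "b5"})
borderline = frozenset({"b1", "b2", "b3", "b4", "x"})   # 4 shared of 6 distinct tokens
clean = [frozenset({"c1", "c2"}), frozenset({"c3", "b1"})]

detected = find_matches(known_vulnerable, SEEDED, threshold=0.8)
borderline_loose = find_matches(borderline, SEEDED, threshold=0.6)
borderline_strict = find_matches(borderline, SEEDED, threshold=0.8)
fp_rate = false_positive_rate(clean, SEEDED, threshold=0.8)
```

In a real end-to-end run the inputs would come from analyzed binaries rather than hand-written sets; the sketch only shows the shape of the assertions.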