git.stella-ops.org/docs/features/checked/binaryindex/vulnerable-code-fingerprint-matching.md
2026-02-12 21:02:43 +02:00

Vulnerable Code Fingerprint Matching (CFG + Basic Block + String Refs Ensemble)

Module

BinaryIndex

Status

VERIFIED

Description

Function-level vulnerability detection that does not depend on package metadata. It uses an ensemble of fingerprint algorithms: basic block hashing, control flow graph (CFG) fingerprinting, and string reference fingerprinting. The combined generator provides multi-algorithm similarity matching with configurable thresholds. Pre-seeded fingerprints are included for high-impact CVEs in OpenSSL, glibc, zlib, and curl.
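
The combination step described above can be pictured as a weighted average of per-algorithm similarity scores compared against a threshold. The following is a minimal Python sketch under stated assumptions, not the StellaOps implementation: the names `EnsembleWeights`, `ensemble_score`, and `is_match`, the weight values, and the weighted-average formula are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnsembleWeights:
    # Hypothetical values; this document does not state the real defaults.
    basic_block: float = 0.4
    cfg: float = 0.4
    string_refs: float = 0.2


def ensemble_score(similarities: dict, weights: EnsembleWeights) -> float:
    """Weighted average of per-algorithm similarities, each expected in [0, 1]."""
    pairs = [
        (similarities["basic_block"], weights.basic_block),
        (similarities["cfg"], weights.cfg),
        (similarities["string_refs"], weights.string_refs),
    ]
    total_weight = sum(weight for _, weight in pairs)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    # Dividing by the total keeps the score in [0, 1] for any weight scale.
    return sum(sim * weight for sim, weight in pairs) / total_weight


def is_match(similarities: dict, weights: EnsembleWeights, threshold: float = 0.8) -> bool:
    """A candidate matches when its combined score reaches the threshold."""
    return ensemble_score(similarities, weights) >= threshold
```

Normalizing by the sum of the weights means the threshold keeps the same meaning when weights are rescaled, which is one reasonable way to make the threshold and the weights independently configurable.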

Implementation Details

  • Modules: src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/, src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/, src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Persistence/
  • Key Classes:
    • SignatureMatcher (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/SignatureMatcher.cs) - matches vulnerability signatures using fingerprint index
    • EnsembleDecisionEngine (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/EnsembleDecisionEngine.cs) - combines CFG, basic block, string ref, and ML embedding fingerprints with configurable weights
    • FunctionAnalysisBuilder (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/FunctionAnalysisBuilder.cs) - assembles multi-algorithm fingerprint inputs
    • SemanticFingerprintGenerator (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/SemanticFingerprintGenerator.cs) - KSG-based semantic fingerprinting
    • CallNgramGenerator (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/CallNgramGenerator.cs) - call-sequence fingerprinting
    • BinaryVulnerabilityService (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Persistence/Services/BinaryVulnerabilityService.cs) - vulnerability lookup with pre-seeded fingerprints
  • Models: SignatureIndexModels (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Analysis/Models/) - fingerprint index models
  • Source: SPRINT_20251226_013_BINIDX_fingerprint_factory.md
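
To make the three fingerprint kinds named in this feature more concrete, here is a simplified Python sketch. It assumes a hypothetical function representation (basic blocks as lists of normalized instruction mnemonics, CFG edges as pairs of block indices) and uses a deliberately coarse CFG shape hash. The helper names are illustrative only, and the actual algorithms in the classes listed above may differ substantially.

```python
import hashlib


def basic_block_hashes(blocks: list) -> set:
    """Hash each basic block's mnemonic sequence; block order is ignored."""
    return {hashlib.sha256(" ".join(block).encode()).hexdigest() for block in blocks}


def cfg_fingerprint(node_count: int, edges: list) -> str:
    """Hash the sorted (in-degree, out-degree) pairs of the CFG.

    This is a coarse shape summary that stays the same when blocks are
    renumbered. Distinct graphs can collide, so it is only one ensemble input.
    """
    in_degree = [0] * node_count
    out_degree = [0] * node_count
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    shape = sorted(zip(in_degree, out_degree))
    return hashlib.sha256(repr(shape).encode()).hexdigest()


def jaccard(a: set, b: set) -> float:
    """Set similarity in [0, 1]; usable for block hashes or string references."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

For example, `jaccard(basic_block_hashes(candidate), basic_block_hashes(known_vulnerable))` yields a basic block similarity, and `jaccard` applied to the sets of string literals referenced by two functions yields a string reference similarity.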

E2E Test Plan

  • Match a known vulnerable function (e.g., OpenSSL Heartbleed) against pre-seeded fingerprints and verify detection
  • Verify multi-algorithm ensemble: CFG fingerprint + basic block hash + string refs all contribute to match score
  • Verify configurable threshold: adjust threshold to 0.8 and verify borderline matches are excluded
  • Verify pre-seeded fingerprints exist for high-impact CVEs (OpenSSL, glibc, zlib, curl)
  • Verify false positive rate: submit clean binary functions and verify no false matches
  • Verify EnsembleDecisionEngine weight tuning affects match outcomes
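
The threshold and weight-tuning checks in this plan can be sketched as a small Python example. The candidate names, similarity values, and weights below are made up for illustration, and the scoring formula is an assumed weighted average rather than the behavior of `EnsembleDecisionEngine`.

```python
def score(similarities: dict, weights: dict) -> float:
    """Weighted average of per-algorithm similarities (assumed formula)."""
    total = sum(weights.values())
    return sum(similarities[name] * weight for name, weight in weights.items()) / total


def filter_matches(candidates: dict, weights: dict, threshold: float) -> list:
    """Return the names of candidates whose combined score reaches the threshold."""
    return [name for name, sims in candidates.items() if score(sims, weights) >= threshold]


# Hypothetical per-algorithm similarities for two candidate functions.
candidates = {
    "strong": {"cfg": 0.95, "basic_block": 0.90, "string_refs": 0.85},
    "borderline": {"cfg": 0.90, "basic_block": 0.60, "string_refs": 0.70},
}

equal_weights = {"cfg": 1.0, "basic_block": 1.0, "string_refs": 1.0}
cfg_heavy_weights = {"cfg": 4.0, "basic_block": 0.5, "string_refs": 0.5}

# Raising the threshold from 0.7 to 0.8 drops the borderline candidate.
loose = filter_matches(candidates, equal_weights, threshold=0.7)
strict = filter_matches(candidates, equal_weights, threshold=0.8)

# Emphasizing the CFG score lets the borderline candidate pass again at 0.8.
retuned = filter_matches(candidates, cfg_heavy_weights, threshold=0.8)
```

With these made-up numbers the borderline candidate scores about 0.73 under equal weights and 0.85 under the CFG-heavy weights, so the same threshold of 0.8 excludes it in one configuration and accepts it in the other.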

Verification

  • Verified on 2026-02-12 via run-002.
  • Tier 0 source/symbol checks: pass.
  • Tier 1 build/tests/code-review: pass (420/420 tests).
  • Tier 2 behavioral verification: pass (golden signature behavior, threshold behavior, and pre-seeded package coverage including openssl/glibc/zlib/curl).
  • Run evidence: docs/qa/feature-checks/runs/binaryindex/vulnerable-code-fingerprint-matching/run-002/.