Ensemble decision engine for multi-tier matching

Module

BinaryIndex

Status

IMPLEMENTED

Description

The ensemble decision engine combines multiple matching tiers (range match, Build-ID, fingerprint) into a single vulnerability classification, using configurable weights that can be tuned.
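
As a rough illustration of the weighted combination described above, the following Python sketch computes a weighted average of per-tier scores and applies a threshold. It is not the EnsembleDecisionEngine implementation: `TierWeights`, `ensemble_score`, `classify`, the weight values, the threshold, and the label names are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierWeights:
    # Hypothetical weights for illustration; not the actual EnsembleOptions fields.
    range_match: float = 0.2
    build_id: float = 0.5
    fingerprint: float = 0.3


def ensemble_score(signals: dict[str, float], weights: TierWeights) -> float:
    """Weighted average of per-tier scores in [0, 1].

    Tiers absent from `signals` are skipped and the remaining weights are
    renormalised, so a missing tier does not drag the score toward zero.
    """
    pairs = [(getattr(weights, tier), score) for tier, score in signals.items()]
    total_weight = sum(weight for weight, _ in pairs)
    if total_weight == 0:
        return 0.0
    return sum(weight * score for weight, score in pairs) / total_weight


def classify(score: float, threshold: float = 0.6) -> str:
    # The threshold and label names are assumptions for this sketch.
    return "vulnerable" if score >= threshold else "not_vulnerable"


signals = {"range_match": 1.0, "build_id": 1.0, "fingerprint": 0.0}
score = ensemble_score(signals, TierWeights())
print(classify(score))
```

With the same signals, shifting most of the weight to the fingerprint tier (for example `TierWeights(0.1, 0.2, 0.7)`) pushes the score below the threshold, which is the kind of outcome change that weight tuning is meant to control.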

Implementation Details

  • Modules: src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/
  • Key Classes:
    • EnsembleDecisionEngine (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/EnsembleDecisionEngine.cs) - combines multiple matching signals with configurable weights into a final vulnerability classification decision
    • FunctionAnalysisBuilder (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/FunctionAnalysisBuilder.cs) - builds function analysis inputs including optional ML embeddings
    • WeightTuningService (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/WeightTuningService.cs) - tunes ensemble weights based on golden set validation results
    • EnsembleOptions (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/Models.cs) - configurable weights and thresholds for matching tiers
    • MlEmbeddingMatcherAdapter (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.ML/Training/MlEmbeddingMatcherAdapter.cs) - adapts ML function embeddings for ensemble use
  • Interfaces: IEnsembleDecisionEngine (src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Ensemble/IEnsembleDecisionEngine.cs)
  • Registration: EnsembleServiceCollectionExtensions.AddBinarySimilarityServices() for full pipeline setup
  • Benchmarks: EnsembleAccuracyBenchmarks, EnsembleLatencyBenchmarks (src/BinaryIndex/__Tests/StellaOps.BinaryIndex.Benchmarks/)
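
One way to picture what tuning weights against a golden set could look like is a small grid search that keeps the weight combination with the highest accuracy. This Python sketch is only an assumption about the general approach; the actual WeightTuningService may use a different search strategy and metric. The function names, the three-tier layout, the step size, and the threshold are hypothetical.

```python
from itertools import product


def accuracy(weights, golden_set, threshold=0.5):
    """Fraction of golden-set samples classified correctly with these weights."""
    correct = 0
    for signals, expected in golden_set:
        score = sum(w * s for w, s in zip(weights, signals))
        correct += (score >= threshold) == expected
    return correct / len(golden_set)


def tune_weights(golden_set, steps=10):
    """Grid search over three non-negative weights that sum to 1."""
    best_weights, best_accuracy = None, -1.0
    for i, j in product(range(steps + 1), repeat=2):
        if i + j > steps:
            continue  # the third weight would be negative
        weights = (i / steps, j / steps, (steps - i - j) / steps)
        acc = accuracy(weights, golden_set)
        if acc > best_accuracy:  # strict '>' keeps the first best candidate
            best_weights, best_accuracy = weights, acc
    return best_weights, best_accuracy


# Hypothetical golden set: (range, build_id, fingerprint) scores and ground truth.
# Only the second signal is predictive in this toy data.
golden_set = [
    ((1.0, 1.0, 0.0), True),
    ((1.0, 0.0, 1.0), False),
    ((0.0, 1.0, 0.0), True),
    ((0.0, 0.0, 1.0), False),
]
best_weights, best_accuracy = tune_weights(golden_set)
print(best_weights, best_accuracy)
```

On this toy data the search settles on a combination that gives the predictive signal more than half of the total weight. An exhaustive grid is easy to reason about and deterministic, but it scales poorly as the number of tiers grows.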

E2E Test Plan

  • Submit a binary with a known vulnerability and verify that the ensemble produces the correct classification
  • Verify weight tuning: set the instruction weight to 0.6 and verify that classification outcomes change
  • Verify multi-tier integration: Build-ID match, fingerprint match, and ML embedding all contribute to the score
  • Verify that FunctionAnalysisBuilder correctly assembles all matching dimensions
  • Verify that WeightTuningService optimizes weights based on golden set validation accuracy
  • Run the accuracy benchmark and verify that the F1 score meets the minimum threshold
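
For the F1 check in the last item, the score is the harmonic mean of precision and recall. The Python sketch below shows the standard calculation over boolean predictions; it is not taken from EnsembleAccuracyBenchmarks, and the `f1_score` helper and the `MIN_F1` value are placeholders chosen for this example.

```python
def f1_score(predicted: list[bool], expected: list[bool]) -> float:
    """F1 = 2 * precision * recall / (precision + recall)."""
    tp = sum(p and e for p, e in zip(predicted, expected))
    fp = sum(p and not e for p, e in zip(predicted, expected))
    fn = sum(e and not p for p, e in zip(predicted, expected))
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Placeholder threshold; the real minimum is defined by the benchmark suite.
MIN_F1 = 0.5

predicted = [True, True, False, False]
expected = [True, False, True, False]
f1 = f1_score(predicted, expected)
assert f1 >= MIN_F1, f"F1 {f1:.2f} is below the minimum {MIN_F1:.2f}"
```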