BinaryIndex

Status: Implemented
Source: src/BinaryIndex/
Owner: Scanner Guild + Concelier Guild

Purpose

BinaryIndex detects vulnerable binaries without relying on package metadata. Package version strings can be misleading (backports, custom builds, stripped metadata), so BinaryIndex identifies vulnerabilities from the binary itself, using Build-IDs, hash catalogs, and function fingerprints.

Components

Libraries:

  • StellaOps.BinaryIndex.Core - Core binary identity extraction and matching engine
  • StellaOps.BinaryIndex.Corpus - Binary-to-advisory mapping database
  • StellaOps.BinaryIndex.Corpus.Debian - Debian-specific corpus support
  • StellaOps.BinaryIndex.Fingerprints - Function fingerprint storage and matching (CFG/basic-block hashes)
  • StellaOps.BinaryIndex.FixIndex - Patch-aware backport handling
  • StellaOps.BinaryIndex.Persistence - Storage adapters for binary catalogs
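
The idea of matching functions by basic-block hashes, as mentioned for StellaOps.BinaryIndex.Fingerprints, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `block_hash`, `function_fingerprint`, and `block_overlap`, the choice of SHA-256, and the representation of a function as a list of raw byte strings are hypothetical and do not describe the library's actual API or algorithm.

```python
import hashlib
from typing import Iterable, Sequence


def block_hash(block_bytes: bytes) -> str:
    """Hash the raw bytes of one basic block (SHA-256 is an assumed choice)."""
    return hashlib.sha256(block_bytes).hexdigest()


def function_fingerprint(blocks: Iterable[bytes]) -> str:
    """Combine per-block hashes into one fingerprint for the whole function.

    The block hashes are sorted before combining, so the result does not
    change when the same blocks are merely laid out in a different order.
    """
    digests = sorted(block_hash(b) for b in blocks)
    return hashlib.sha256("".join(digests).encode("ascii")).hexdigest()


def block_overlap(a: Sequence[bytes], b: Sequence[bytes]) -> float:
    """Jaccard similarity between the block-hash sets of two functions."""
    set_a = {block_hash(x) for x in a}
    set_b = {block_hash(x) for x in b}
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)
```

An exact fingerprint match answers "is this the same function?", while the overlap score gives a graded answer when only some blocks changed, for example after a small patch.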

Configuration

Configuration is typically embedded in Scanner and Concelier module settings.

Key features:

  • Three-tier binary identification (package/version, Build-ID/hash, function fingerprints)
  • Binary identity extraction (Build-ID, PE CodeView GUID, Mach-O UUID)
  • Integration with Scanner.Worker for binary lookup
  • Offline-first design with deterministic outputs
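
The three identification tiers listed above can be sketched as a lookup that gathers evidence from each tier separately. This is a hypothetical illustration, not the BinaryIndex API: the `BinaryIdentity` and `Catalog` types, the `identify` function, and the in-memory dictionaries are all assumptions, and the sketch deliberately reports every tier that matched instead of assuming a precedence between tiers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class BinaryIdentity:
    """Hypothetical identity record for one scanned binary."""
    package: Optional[Tuple[str, str]] = None   # (name, version) from package metadata
    build_id: Optional[str] = None              # ELF Build-ID, PE CodeView GUID, or Mach-O UUID
    fingerprints: Set[str] = field(default_factory=set)  # function fingerprint hashes


@dataclass
class Catalog:
    """Hypothetical in-memory catalog mapping each tier's key to advisory IDs."""
    by_package: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)
    by_build_id: Dict[str, Set[str]] = field(default_factory=dict)
    by_fingerprint: Dict[str, Set[str]] = field(default_factory=dict)


def identify(binary: BinaryIdentity, catalog: Catalog) -> List[Tuple[str, List[str]]]:
    """Return (tier, advisory IDs) for every tier that produced a match.

    Advisory IDs are sorted so the same input always yields the same output.
    """
    evidence: List[Tuple[str, List[str]]] = []
    if binary.package is not None and binary.package in catalog.by_package:
        evidence.append(("package", sorted(catalog.by_package[binary.package])))
    if binary.build_id is not None and binary.build_id in catalog.by_build_id:
        evidence.append(("build-id", sorted(catalog.by_build_id[binary.build_id])))
    hits: Set[str] = set()
    for fingerprint in binary.fingerprints:
        hits |= catalog.by_fingerprint.get(fingerprint, set())
    if hits:
        evidence.append(("fingerprint", sorted(hits)))
    return evidence
```

A binary with stripped package metadata would simply produce no "package" entry and still be identified through the other two tiers.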

Dependencies

  • PostgreSQL (integrated with Scanner/Concelier schemas)
  • Scanner.Analyzers.Native (for binary disassembly/analysis)
  • Concelier (for advisory-to-binary mapping)

Related Documentation

  • Architecture: ./architecture.md
  • High-Level Architecture: ../../ARCHITECTURE_OVERVIEW.md
  • Scanner Architecture: ../scanner/architecture.md
  • Concelier Architecture: ../concelier/architecture.md

Current Status

The library implementation is complete, with support for ELF (Build-ID), PE (CodeView GUID), and Mach-O (UUID) binary formats. It is integrated into Scanner's native binary analysis pipeline.
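
As a rough illustration of how the three formats can be told apart before an identity is extracted, the sketch below inspects a file's leading magic bytes. The function name `detect_format` and the `IDENTITY_KIND` table are hypothetical and not taken from the BinaryIndex source; the magic values themselves come from the ELF, PE, and Mach-O file formats. Fat (universal) Mach-O files are left out to keep the sketch short.

```python
import struct
from typing import Optional

# Thin Mach-O magic numbers: 32-bit and 64-bit, in both byte orders.
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce",
    b"\xfe\xed\xfa\xcf",
    b"\xce\xfa\xed\xfe",
    b"\xcf\xfa\xed\xfe",
}

# Which identity each format carries, as listed in this document.
IDENTITY_KIND = {"ELF": "Build-ID", "PE": "CodeView GUID", "Mach-O": "UUID"}


def detect_format(data: bytes) -> Optional[str]:
    """Return "ELF", "PE", "Mach-O", or None based on magic bytes."""
    if data[:4] == b"\x7fELF":
        return "ELF"
    if data[:4] in MACHO_MAGICS:
        return "Mach-O"
    # A PE file starts with a DOS header ("MZ"); the 4-byte little-endian
    # field at offset 0x3C points at the "PE\0\0" signature.
    if data[:2] == b"MZ" and len(data) >= 0x40:
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return "PE"
    return None
```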


Semantic Diffing Roadmap

A major enhancement to BinaryIndex is planned: semantic-level binary diffing, which detects function equivalence based on behavior rather than syntax. This addresses limitations of the current byte- and symbol-based matching when dealing with:

  • Compiler optimizations (same source, different instructions)
  • Stripped binaries (no symbols)
  • Cross-compiler builds (GCC vs Clang)
  • Obfuscated code
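
To make the limitation concrete, the sketch below shows two instruction sequences that differ only in register allocation. A hash over the literal text differs, while a hash over a normalized form matches. This is plain operand normalization, used here only to illustrate the problem; it is not the IR-level analysis planned in the roadmap. The instruction tuples, the register pattern, and the function names are all assumptions.

```python
import hashlib
import re
from typing import Dict, List, Sequence, Tuple

Instruction = Tuple[str, Tuple[str, ...]]  # (mnemonic, operands)

REGISTER = re.compile(r"^(r[a-z0-9]+|e[a-z]{2})$")       # simplified x86-64 register names
IMMEDIATE = re.compile(r"^-?(0x[0-9a-f]+|\d+)$")


def render(instructions: Sequence[Instruction]) -> List[str]:
    """Render instructions literally, as a syntax-level matcher would see them."""
    return [f"{m} {','.join(ops)}" for m, ops in instructions]


def normalize(instructions: Sequence[Instruction]) -> List[str]:
    """Rename registers by order of first use and abstract immediates."""
    names: Dict[str, str] = {}
    lines = []
    for mnemonic, operands in instructions:
        out = []
        for op in operands:
            if REGISTER.match(op):
                out.append(names.setdefault(op, f"REG{len(names)}"))
            elif IMMEDIATE.match(op):
                out.append("IMM")
            else:
                out.append(op)
        lines.append(f"{mnemonic} {','.join(out)}")
    return lines


def digest(lines: Sequence[str]) -> str:
    """SHA-256 over the newline-joined instruction text."""
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


build_a = [("mov", ("rax", "rdi")), ("add", ("rax", "1")), ("ret", ())]
build_b = [("mov", ("rcx", "rdi")), ("add", ("rcx", "1")), ("ret", ())]
```

Here `digest(render(build_a))` and `digest(render(build_b))` differ, while `digest(normalize(build_a))` equals `digest(normalize(build_b))`. Normalization of this kind does not help once the compiler picks different instructions altogether, which is the gap behavior-based comparison is meant to close.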

Planned Phases

| Phase   | Description                | Impact                              | Status  |
|---------|----------------------------|-------------------------------------|---------|
| Phase 1 | IR-Level Semantic Analysis | +15% accuracy on optimized binaries | Planned |
| Phase 2 | Function Behavior Corpus   | +10% coverage on stripped binaries  | Planned |
| Phase 3 | Ghidra Integration         | +5% edge case handling              | Planned |
| Phase 4 | Decompiler & ML Similarity | +10% obfuscation resilience         | Planned |

New Libraries (Planned)

  • StellaOps.BinaryIndex.Semantic - IR lifting and semantic graph fingerprints
  • StellaOps.BinaryIndex.Corpus - 30K+ function behavior database
  • StellaOps.BinaryIndex.Ghidra - Ghidra Headless integration
  • StellaOps.BinaryIndex.Decompiler - Decompiled code AST comparison
  • StellaOps.BinaryIndex.ML - CodeBERT-based function embeddings
  • StellaOps.BinaryIndex.Ensemble - Multi-signal decision fusion
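
One simple way to picture multi-signal decision fusion is a weighted average over whichever similarity signals are available for a function pair. The sketch below is an assumption for illustration only: the `fuse` function, the signal names, the weights, and the threshold are hypothetical, and the planned Ensemble library may combine signals quite differently.

```python
from typing import Dict, Optional, Tuple


def fuse(
    scores: Dict[str, Optional[float]],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """Combine per-signal similarity scores (each in [0, 1]) into one decision.

    Signals whose score is None (not available for this pair) are skipped,
    and the remaining weights are renormalized.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for name, score in scores.items():
        if score is None:
            continue
        weight = weights.get(name, 0.0)
        weighted_sum += weight * score
        total_weight += weight
    if total_weight == 0.0:
        return 0.0, False  # no usable signal, so no match is claimed
    fused = weighted_sum / total_weight
    return fused, fused >= threshold
```

For example, with weights `{"semantic": 2.0, "decompiler": 1.0, "ml": 1.0}` and scores `{"semantic": 0.9, "decompiler": None, "ml": 0.6}`, the decompiler signal is skipped and the fused score is (2.0 × 0.9 + 1.0 × 0.6) / 3.0 = 0.8.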

Expected Outcomes

| Metric                             | Current | Target |
|------------------------------------|---------|--------|
| Patch detection accuracy           | ~70%    | 92%+   |
| Function identification (stripped) | ~50%    | 85%+   |
| False positive rate                | ~5%     | <2%    |

Sprint Files

  • docs/implplan/SPRINT_20260105_001_001_BINDEX_semdiff_ir_semantics.md
  • docs/implplan/SPRINT_20260105_001_002_BINDEX_semdiff_corpus.md
  • docs/implplan/SPRINT_20260105_001_003_BINDEX_semdiff_ghidra.md
  • docs/implplan/SPRINT_20260105_001_004_BINDEX_semdiff_decompiler_ml.md

Architecture Documentation

See ./semantic-diffing.md for comprehensive architecture documentation.