BinaryIndex

Status: Implemented
Source: src/BinaryIndex/
Owner: Scanner Guild + Concelier Guild

Purpose

BinaryIndex detects vulnerable binaries independently of package metadata. Package version strings can be misleading (backports, custom builds, stripped metadata), so BinaryIndex identifies vulnerabilities from the binary itself, using Build-IDs, hash catalogs, and function fingerprints.

Components

Libraries:

  • StellaOps.BinaryIndex.Core - Core binary identity extraction and matching engine
  • StellaOps.BinaryIndex.Corpus - Binary-to-advisory mapping database
  • StellaOps.BinaryIndex.Corpus.Debian - Debian-specific corpus support
  • StellaOps.BinaryIndex.Fingerprints - Function fingerprint storage and matching (CFG/basic-block hashes)
  • StellaOps.BinaryIndex.FixIndex - Patch-aware backport handling
  • StellaOps.BinaryIndex.Persistence - Storage adapters for binary catalogs
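
As a rough illustration of what matching on basic-block hashes can look like, here is a minimal Python sketch. It is not the StellaOps.BinaryIndex.Fingerprints implementation: the function names, the mnemonic-only normalization, and the Jaccard similarity measure are all assumptions chosen for the example.

```python
import hashlib


def block_hash(mnemonics):
    """Hash one basic block from its instruction mnemonics.

    Assumption: only lower-cased mnemonics are hashed, so operands such as
    registers and addresses do not influence the result.
    """
    joined = "\n".join(m.strip().lower() for m in mnemonics)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def function_fingerprint(blocks):
    """Represent a function as the set of its basic-block hashes."""
    return frozenset(block_hash(block) for block in blocks)


def similarity(fp_a, fp_b):
    """Jaccard similarity between two fingerprints, in the range 0.0 to 1.0."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


# Two hypothetical functions that share one of their basic blocks.
vulnerable = function_fingerprint([["push", "mov"], ["call", "ret"]])
candidate = function_fingerprint([["PUSH", "mov "], ["jmp"]])
score = similarity(vulnerable, candidate)
```

Because a fingerprint here is a set, block order does not affect the score. A real matcher would likely also encode control-flow edges, which this sketch omits.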

Configuration

Configuration is typically embedded in Scanner and Concelier module settings.

Key features:

  • Three-tier binary identification (package/version, Build-ID/hash, function fingerprints)
  • Binary identity extraction (Build-ID, PE CodeView GUID, Mach-O UUID)
  • Integration with Scanner.Worker for binary lookup
  • Offline-first design with deterministic outputs
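
The three tiers listed above can be pictured with a small Python sketch. Everything in it is hypothetical: the `BinaryObservation` and `Catalog` shapes, the tier numbering, and the in-memory dictionaries stand in for whatever the real libraries and PostgreSQL schemas provide. Results are sorted so that the same inputs always produce the same output, in the spirit of the deterministic-output goal.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class BinaryObservation:
    """What a scanner might know about one binary (hypothetical shape)."""
    package: Optional[str] = None
    version: Optional[str] = None
    build_id: Optional[str] = None
    fingerprints: frozenset = frozenset()


@dataclass
class Catalog:
    """Toy in-memory stand-in for a binary-to-advisory mapping."""
    by_package: dict = field(default_factory=dict)      # (package, version) -> advisory ids
    by_build_id: dict = field(default_factory=dict)     # build id (lower-case hex) -> advisory ids
    by_fingerprint: dict = field(default_factory=dict)  # fingerprint -> advisory ids


def match_tiers(obs, catalog):
    """Return sorted (tier, advisory_id) pairs for every tier that matches."""
    hits = set()
    # Tier 1: package name and version.
    if obs.package and obs.version:
        for adv in catalog.by_package.get((obs.package, obs.version), ()):
            hits.add((1, adv))
    # Tier 2: Build-ID or hash lookup.
    if obs.build_id:
        for adv in catalog.by_build_id.get(obs.build_id.lower(), ()):
            hits.add((2, adv))
    # Tier 3: function fingerprints.
    for fp in obs.fingerprints:
        for adv in catalog.by_fingerprint.get(fp, ()):
            hits.add((3, adv))
    return sorted(hits)


catalog = Catalog(
    by_package={("libexample", "1.0"): {"ADV-1"}},
    by_build_id={"abcd": {"ADV-2"}},
    by_fingerprint={"fp1": {"ADV-1"}},
)
obs = BinaryObservation(
    package="libexample", version="1.0",
    build_id="ABCD", fingerprints=frozenset({"fp1"}),
)
matches = match_tiers(obs, catalog)
```

The sketch reports evidence from every tier instead of stopping at the first hit, so a caller can see when binary-level tiers disagree with the package metadata. Whether the real engine short-circuits is not stated here.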

Dependencies

  • PostgreSQL (integrated with Scanner/Concelier schemas)
  • Scanner.Analyzers.Native (for binary disassembly/analysis)
  • Concelier (for advisory-to-binary mapping)

Related Documentation

  • Architecture: ./architecture.md
  • High-Level Architecture: ../../ARCHITECTURE_OVERVIEW.md
  • Scanner Architecture: ../scanner/architecture.md
  • Concelier Architecture: ../concelier/architecture.md

Current Status

The library implementation is complete, with support for the ELF (Build-ID), PE (CodeView GUID), and Mach-O (UUID) binary formats. It is integrated into Scanner's native binary analysis pipeline.
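
To make the format-to-identifier pairing concrete, the following Python sketch picks the identity kind from a file's leading bytes. The magic numbers are the standard ones for each format. The function name and return shape are assumptions for illustration and do not reflect the BinaryIndex API. Fat (universal) Mach-O files are deliberately left out, and extracting the identifier itself is not shown.

```python
ELF_MAGIC = b"\x7fELF"

# Thin Mach-O magic numbers as they appear on disk, in both byte orders.
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",  # 32-bit
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",  # 64-bit
}


def identity_kind(header):
    """Map the first bytes of a file to (format, identity kind), or None."""
    if header[:4] == ELF_MAGIC:
        return ("ELF", "Build-ID")
    if header[:4] in MACHO_MAGICS:
        return ("Mach-O", "UUID")
    if header[:2] == b"MZ" and len(header) >= 0x40:
        # The DOS header stores the offset of the PE signature at 0x3C.
        pe_offset = int.from_bytes(header[0x3C:0x40], "little")
        if header[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return ("PE", "CodeView GUID")
    return None


# Hand-built headers, just large enough to exercise each branch.
elf_header = b"\x7fELF" + bytes(12)
macho_header = b"\xcf\xfa\xed\xfe" + bytes(12)
pe_header = b"MZ" + bytes(58) + (0x40).to_bytes(4, "little") + b"PE\x00\x00"
```

A caller would read the first few hundred bytes of a file and pass them in. If the PE signature lies beyond the bytes supplied, the function returns None, so a real implementation would seek to the stored offset instead.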


Semantic Diffing Roadmap

A major enhancement to BinaryIndex is planned: semantic-level binary diffing, which detects function equivalence based on behavior rather than syntax. It addresses limitations of the current byte- and symbol-based matching when dealing with:

  • Compiler optimizations (same source, different instructions)
  • Stripped binaries (no symbols)
  • Cross-compiler builds (GCC vs Clang)
  • Obfuscated code

Planned Phases

| Phase   | Description                | Impact                              | Status  |
|---------|----------------------------|-------------------------------------|---------|
| Phase 1 | IR-Level Semantic Analysis | +15% accuracy on optimized binaries | Planned |
| Phase 2 | Function Behavior Corpus   | +10% coverage on stripped binaries  | Planned |
| Phase 3 | Ghidra Integration         | +5% edge case handling              | Planned |
| Phase 4 | Decompiler & ML Similarity | +10% obfuscation resilience         | Planned |

New Libraries (Planned)

  • StellaOps.BinaryIndex.Semantic - IR lifting and semantic graph fingerprints
  • StellaOps.BinaryIndex.Corpus - 30K+ function behavior database (extends the existing Corpus library listed under Components)
  • StellaOps.BinaryIndex.Ghidra - Ghidra Headless integration
  • StellaOps.BinaryIndex.Decompiler - Decompiled code AST comparison
  • StellaOps.BinaryIndex.ML - CodeBERT-based function embeddings
  • StellaOps.BinaryIndex.Ensemble - Multi-signal decision fusion
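
One simple way to picture multi-signal decision fusion is a weighted average of per-signal similarity scores. The Python sketch below is only that: the signal names, weights, and threshold are invented for illustration, and the planned Ensemble library may combine signals quite differently.

```python
def fuse(scores, weights, threshold=0.5):
    """Combine per-signal scores (0.0 to 1.0) into one score and a decision.

    Signals without a score are skipped and the remaining weights are
    renormalized, so a missing signal does not drag the result toward zero.
    """
    used = {name: w for name, w in weights.items() if name in scores}
    total = sum(used.values())
    if total <= 0:
        raise ValueError("no weighted signal has a score")
    combined = sum(scores[name] * w for name, w in used.items()) / total
    return combined, combined >= threshold


# Hypothetical signals: the "ml" signal produced no score for this function.
weights = {"semantic": 2.0, "decompiler": 1.0, "ml": 1.0}
scores = {"semantic": 0.9, "decompiler": 0.6}
combined, is_match = fuse(scores, weights, threshold=0.75)
```

Renormalizing over the available signals keeps the result comparable when a signal is unavailable, for example when decompilation fails for a given function.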

Expected Outcomes

| Metric                              | Current | Target |
|-------------------------------------|---------|--------|
| Patch detection accuracy            | ~70%    | 92%+   |
| Function identification (stripped)  | ~50%    | 85%+   |
| False positive rate                 | ~5%     | <2%    |

Sprint Files

  • docs/implplan/SPRINT_20260105_001_001_BINDEX_semdiff_ir_semantics.md
  • docs/implplan/SPRINT_20260105_001_002_BINDEX_semdiff_corpus.md
  • docs/implplan/SPRINT_20260105_001_003_BINDEX_semdiff_ghidra.md
  • docs/implplan/SPRINT_20260105_001_004_BINDEX_semdiff_decompiler_ml.md

Architecture Documentation

See ./semantic-diffing.md for comprehensive architecture documentation.