BinaryIndex
Status: Implemented
Source: src/BinaryIndex/
Owner: Scanner Guild + Concelier Guild
Purpose
BinaryIndex detects vulnerable binaries independently of package metadata. Package version strings can be misleading (backports, custom builds, stripped metadata), so BinaryIndex identifies vulnerabilities binary-first, using Build-IDs, hash catalogs, and function fingerprints.
Components
Libraries:
- StellaOps.BinaryIndex.Core - Core binary identity extraction and matching engine
- StellaOps.BinaryIndex.Corpus - Binary-to-advisory mapping database
- StellaOps.BinaryIndex.Corpus.Debian - Debian-specific corpus support
- StellaOps.BinaryIndex.Fingerprints - Function fingerprint storage and matching (CFG/basic-block hashes)
- StellaOps.BinaryIndex.FixIndex - Patch-aware backport handling
- StellaOps.BinaryIndex.Persistence - Storage adapters for binary catalogs
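Fingerprints is described as matching on CFG/basic-block hashes. The sketch below illustrates that idea under stated assumptions: each function arrives as a list of basic blocks in disassembly text, immediates and addresses are masked before hashing, and two functions are compared with Jaccard similarity over their block-hash sets. The function names (`normalize`, `block_hash`, `fingerprint`, `similarity`) and the choice of Jaccard are illustrative only, not the library's actual API or scoring.

```python
import hashlib
import re
from typing import Iterable, Sequence

# Immediates and addresses (hex or decimal) are masked so relocations and
# layout changes do not alter a block's hash. Input is lower-cased first.
_IMMEDIATE = re.compile(r"\b(?:0x[0-9a-f]+|\d+)\b")

def normalize(insn: str) -> str:
    """Lower-case, collapse whitespace, and mask immediate operands."""
    text = " ".join(insn.lower().split())
    return _IMMEDIATE.sub("IMM", text)

def block_hash(block: Sequence[str]) -> str:
    """SHA-256 over the normalized instructions of one basic block (truncated)."""
    h = hashlib.sha256()
    for insn in block:
        h.update(normalize(insn).encode("utf-8"))
        h.update(b"\n")  # separator so instruction boundaries affect the hash
    return h.hexdigest()[:16]

def fingerprint(blocks: Iterable[Sequence[str]]) -> frozenset[str]:
    """Hypothetical fingerprint: the set of a function's basic-block hashes."""
    return frozenset(block_hash(b) for b in blocks)

def similarity(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two block-hash sets (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Treating the fingerprint as a set ignores block order and collapses duplicate blocks, which keeps it stable under block reordering but loses control-flow structure; a real CFG fingerprint would likely also encode edges.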
Configuration
Configuration is typically embedded in Scanner and Concelier module settings.
Key features:
- Three-tier binary identification (package/version, Build-ID/hash, function fingerprints)
- Binary identity extraction (ELF Build-ID, PE CodeView GUID, Mach-O UUID)
- Integration with Scanner.Worker for binary lookup
- Offline-first design with deterministic outputs
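To make the three identification tiers concrete, here is a minimal sketch of a tiered lookup. It assumes in-memory catalogs that map each key to a set of advisory IDs, and it assumes exact binary evidence (Build-ID, then file hash) is trusted over function fingerprints, with package/version metadata used only as a last resort, since version strings can be misleading. `BinaryIdentity`, `Catalog`, and `lookup` are hypothetical names, not the BinaryIndex API; results are sorted to keep output deterministic.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class BinaryIdentity:
    """Hypothetical identity record for one binary; any field may be missing."""
    package: Optional[str] = None
    version: Optional[str] = None
    build_id: Optional[str] = None          # Build-ID / CodeView GUID / Mach-O UUID as hex
    sha256: Optional[str] = None            # whole-file hash
    fingerprints: frozenset = frozenset()   # function fingerprint hashes

@dataclass
class Catalog:
    """Hypothetical catalogs mapping keys to sets of advisory IDs.
    An empty set means "known binary, no open advisories" (e.g. a patched backport)."""
    by_package: dict = field(default_factory=dict)      # (package, version) -> set[str]
    by_build_id: dict = field(default_factory=dict)     # build_id -> set[str]
    by_hash: dict = field(default_factory=dict)         # sha256 -> set[str]
    by_fingerprint: dict = field(default_factory=dict)  # fingerprint -> set[str]

def lookup(identity: BinaryIdentity, catalog: Catalog) -> tuple[str, list[str]]:
    """Return (tier, sorted advisory IDs), preferring binary evidence over metadata."""
    # Exact binary identity: the most specific evidence, unaffected by backports.
    if identity.build_id is not None and identity.build_id in catalog.by_build_id:
        return "build-id", sorted(catalog.by_build_id[identity.build_id])
    if identity.sha256 is not None and identity.sha256 in catalog.by_hash:
        return "hash", sorted(catalog.by_hash[identity.sha256])
    # Function fingerprints: union of advisories for every matching function.
    hits: set[str] = set()
    for fp in sorted(identity.fingerprints):
        hits |= catalog.by_fingerprint.get(fp, set())
    if hits:
        return "fingerprint", sorted(hits)
    # Package/version metadata: last resort, since version strings can lie.
    key = (identity.package, identity.version)
    if key in catalog.by_package:
        return "package", sorted(catalog.by_package[key])
    return "none", []
```

With this ordering, a backported package whose version string matches a vulnerable release is still reported clean when its Build-ID is catalogued as fixed.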
Dependencies
- PostgreSQL (integrated with Scanner/Concelier schemas)
- Scanner.Analyzers.Native (for binary disassembly/analysis)
- Concelier (for advisory-to-binary mapping)
Related Documentation
- Architecture: ./architecture.md
- High-Level Architecture: ../../ARCHITECTURE_OVERVIEW.md
- Scanner Architecture: ../scanner/architecture.md
- Concelier Architecture: ../concelier/architecture.md
Current Status
The library implementation is complete, with support for ELF (Build-ID), PE (CodeView GUID), and Mach-O (UUID) binary formats, and is integrated into Scanner's native binary analysis pipeline.
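For the ELF case, the Build-ID lives in a GNU note whose layout is fixed by the ELF note format: three 32-bit words (`namesz`, `descsz`, `type`), then the name and descriptor, each padded to 4 bytes. The sketch below walks raw note bytes and returns the Build-ID as hex. It assumes the caller has already located the note data (for example the `.note.gnu.build-id` section contents); `parse_gnu_build_id` is a hypothetical helper, and PE CodeView GUID and Mach-O UUID extraction are not shown.

```python
import struct
from typing import Optional

NT_GNU_BUILD_ID = 3  # GNU note type for build-id notes

def parse_gnu_build_id(notes: bytes, little_endian: bool = True) -> Optional[str]:
    """Return the GNU Build-ID from raw ELF note data as lowercase hex, or None.

    Each note is namesz, descsz, type (three 32-bit words), followed by the
    name and descriptor, each padded to a 4-byte boundary (the alignment
    used by GNU build-id notes).
    """
    header = struct.Struct("<III" if little_endian else ">III")
    offset = 0
    while offset + header.size <= len(notes):
        namesz, descsz, ntype = header.unpack_from(notes, offset)
        offset += header.size
        name = notes[offset:offset + namesz]
        offset += (namesz + 3) & ~3  # skip name plus padding
        desc = notes[offset:offset + descsz]
        offset += (descsz + 3) & ~3  # skip descriptor plus padding
        if len(name) != namesz or len(desc) != descsz:
            return None  # truncated note data
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc.hex()
    return None
```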
Semantic Diffing Roadmap
A major enhancement to BinaryIndex is planned to enable semantic-level binary diffing: detecting function equivalence based on behavior rather than syntax. This addresses limitations of the current byte- and symbol-based matching when dealing with:
- Compiler optimizations (same source, different instructions)
- Stripped binaries (no symbols)
- Cross-compiler builds (GCC vs Clang)
- Obfuscated code
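The difference between syntactic and behavioral matching can be shown with a toy example. Below, the same source expression (`x * 2 + 1`) is "compiled" two ways into a tiny straight-line IR; byte or instruction hashes would differ, but running both on the same inputs gives identical results. This uses seeded random input/output testing as a stand-in technique, which gives evidence of equivalence rather than proof; it is not the planned IR-lifting approach, and the IR, `run`, and `behaviorally_equivalent` are hypothetical.

```python
import random

MASK = 0xFFFFFFFF  # model 32-bit registers

def run(ir: list[tuple[str, int]], x: int) -> int:
    """Interpret a toy straight-line IR over a single 32-bit accumulator."""
    acc = x & MASK
    for op, arg in ir:
        if op == "add":
            acc = (acc + arg) & MASK
        elif op == "mul":
            acc = (acc * arg) & MASK
        elif op == "shl":
            acc = (acc << arg) & MASK
        elif op == "xor":
            acc = (acc ^ arg) & MASK
        else:
            raise ValueError(f"unknown op {op!r}")
    return acc

def behaviorally_equivalent(ir_a, ir_b, samples: int = 256, seed: int = 0) -> bool:
    """Compare outputs on edge cases plus seeded random inputs.
    Agreement is evidence, not proof; the fixed seed keeps the verdict deterministic."""
    rng = random.Random(seed)
    inputs = [0, 1, MASK] + [rng.getrandbits(32) for _ in range(samples)]
    return all(run(ir_a, x) == run(ir_b, x) for x in inputs)

# Same source (x * 2 + 1), different instruction choices.
variant_a = [("mul", 2), ("add", 1)]
variant_b = [("shl", 1), ("xor", 1)]  # low bit of x << 1 is 0, so xor 1 equals add 1
```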
Planned Phases
| Phase | Description | Impact | Status |
|---|---|---|---|
| Phase 1 | IR-Level Semantic Analysis | +15% accuracy on optimized binaries | Planned |
| Phase 2 | Function Behavior Corpus | +10% coverage on stripped binaries | Planned |
| Phase 3 | Ghidra Integration | +5% edge case handling | Planned |
| Phase 4 | Decompiler & ML Similarity | +10% obfuscation resilience | Planned |
New Libraries (Planned)
- StellaOps.BinaryIndex.Semantic - IR lifting and semantic graph fingerprints
- StellaOps.BinaryIndex.Corpus - 30K+ function behavior database
- StellaOps.BinaryIndex.Ghidra - Ghidra Headless integration
- StellaOps.BinaryIndex.Decompiler - Decompiled code AST comparison
- StellaOps.BinaryIndex.ML - CodeBERT-based function embeddings
- StellaOps.BinaryIndex.Ensemble - Multi-signal decision fusion
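Ensemble is described only as multi-signal decision fusion. One simple way to fuse per-signal similarity scores is a weighted average over whichever signals are available, followed by a threshold. The sketch below assumes that scheme; the signal names, weights, and 0.8 threshold are hypothetical placeholders that would need calibration against a labeled corpus, and the planned library may use a different fusion method entirely.

```python
from typing import Mapping

# Hypothetical signal names and weights; real values would need calibration.
DEFAULT_WEIGHTS: dict[str, float] = {
    "semantic": 0.35,     # IR / semantic-graph similarity
    "decompiler": 0.25,   # decompiled AST similarity
    "ml": 0.25,           # embedding similarity
    "fingerprint": 0.15,  # basic-block hash overlap
}

def fuse(scores: Mapping[str, float],
         weights: Mapping[str, float] = DEFAULT_WEIGHTS,
         threshold: float = 0.8) -> tuple[float, bool]:
    """Weighted average of the available per-signal scores (each in [0, 1]).
    Signals absent from `scores` are skipped and the remaining weights renormalized."""
    used = {name: w for name, w in weights.items() if name in scores}
    total = sum(used.values())
    if total == 0:
        return 0.0, False  # no known signal available: no match
    combined = sum(scores[name] * w for name, w in used.items()) / total
    return combined, combined >= threshold
```

Renormalizing over available signals lets a stripped binary with no decompiler output still be scored, at the cost of relying more heavily on the signals that remain.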
Expected Outcomes
| Metric | Current | Target |
|---|---|---|
| Patch detection accuracy | ~70% | 92%+ |
| Function identification (stripped) | ~50% | 85%+ |
| False positive rate | ~5% | <2% |
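The document does not define how these metrics are computed. Assuming the conventional definitions from a confusion matrix, accuracy = (TP + TN) / total and false positive rate = FP / (FP + TN), a check against the targets in the table could look like the sketch below. `Confusion` and `meets_targets` are hypothetical names, and the evaluation harness that would produce the counts is not shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Confusion:
    """Counts from evaluating patch detection against a labeled set (hypothetical)."""
    tp: int  # vulnerable binary correctly flagged
    fp: int  # patched or unaffected binary wrongly flagged
    tn: int  # patched or unaffected binary correctly cleared
    fn: int  # vulnerable binary missed

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total if total else 0.0

    @property
    def false_positive_rate(self) -> float:
        negatives = self.fp + self.tn
        return self.fp / negatives if negatives else 0.0

def meets_targets(c: Confusion) -> bool:
    """Targets from the table: accuracy of 92% or more, false positive rate under 2%."""
    return c.accuracy >= 0.92 and c.false_positive_rate < 0.02
```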
Sprint Files
- docs/implplan/SPRINT_20260105_001_001_BINDEX_semdiff_ir_semantics.md
- docs/implplan/SPRINT_20260105_001_002_BINDEX_semdiff_corpus.md
- docs/implplan/SPRINT_20260105_001_003_BINDEX_semdiff_ghidra.md
- docs/implplan/SPRINT_20260105_001_004_BINDEX_semdiff_decompiler_ml.md
Architecture Documentation
See ./semantic-diffing.md for comprehensive architecture documentation.