# Sprint 6000 Series Summary: BinaryIndex Module
## Overview
The 6000 series implements the **BinaryIndex** module: a database of vulnerable binaries that enables detection of vulnerable code at the binary level, independent of package metadata.
**Advisory Source:** `docs/product-advisories/21-Dec-2025 - Mapping Evidence Within Compiled Binaries.md`

---
## MVP Roadmap
### MVP 1: Known-Build Binary Catalog (Sprint 6000.0001)
**Goal:** Query "is this Build-ID vulnerable?" with distro-level precision.
| Sprint | Topic | Description |
|--------|-------|-------------|
| 6000.0001.0001 | Binaries Schema | PostgreSQL schema creation |
| 6000.0001.0002 | Binary Identity Service | Core identity extraction and storage |
| 6000.0001.0003 | Debian Corpus Connector | Debian/Ubuntu package ingestion |
| 6000.0001.0004 | Build-ID Lookup Service | Query API for Build-ID matching |

**Acceptance:** Given a Build-ID, return associated CVEs from known distro builds.

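
Answering "is this Build-ID vulnerable?" first requires reading the Build-ID out of the binary. The sketch below is a hypothetical `ElfBuildId` helper, not the Binary Identity Service's actual API. It walks the `PT_NOTE` segments of a 64-bit little-endian ELF image and returns the `NT_GNU_BUILD_ID` note as lowercase hex, which is one plausible key format for a Build-ID lookup table. It assumes ELF64 little-endian input only; 32-bit, big-endian, PE, and Mach-O identities are out of scope.

```csharp
#nullable enable
using System;
using System.Buffers.Binary;

// Hypothetical helper: reads the GNU Build-ID from an ELF64 little-endian image.
public static class ElfBuildId
{
    private const uint PtNote = 4;        // p_type of a note segment
    private const uint NtGnuBuildId = 3;  // note type written by `ld --build-id`

    public static string? TryRead(ReadOnlySpan<byte> elf)
    {
        // e_ident: "\x7FELF", EI_CLASS == 2 (ELFCLASS64), EI_DATA == 1 (little-endian)
        if (elf.Length < 64 || elf[0] != 0x7F || elf[1] != (byte)'E' || elf[2] != (byte)'L'
            || elf[3] != (byte)'F' || elf[4] != 2 || elf[5] != 1)
            return null;

        long phoff = (long)BinaryPrimitives.ReadUInt64LittleEndian(elf.Slice(0x20)); // e_phoff
        int phentsize = BinaryPrimitives.ReadUInt16LittleEndian(elf.Slice(0x36));    // e_phentsize
        int phnum = BinaryPrimitives.ReadUInt16LittleEndian(elf.Slice(0x38));        // e_phnum
        if (phentsize < 56) return null;

        for (int i = 0; i < phnum; i++)
        {
            long ph = phoff + (long)i * phentsize;
            if (ph < 0 || ph + 56 > elf.Length) return null;
            var hdr = elf.Slice((int)ph, 56);
            if (BinaryPrimitives.ReadUInt32LittleEndian(hdr) != PtNote) continue;

            long offset = (long)BinaryPrimitives.ReadUInt64LittleEndian(hdr.Slice(0x08)); // p_offset
            long size = (long)BinaryPrimitives.ReadUInt64LittleEndian(hdr.Slice(0x20));   // p_filesz
            if (offset < 0 || size < 0 || offset + size > elf.Length) continue;

            string? id = FindBuildIdNote(elf.Slice((int)offset, (int)size));
            if (id != null) return id;
        }
        return null;
    }

    // Each note: namesz, descsz, type (4 bytes each), then name and desc, each padded to 4 bytes.
    private static string? FindBuildIdNote(ReadOnlySpan<byte> notes)
    {
        long pos = 0;
        while (pos + 12 <= notes.Length)
        {
            uint namesz = BinaryPrimitives.ReadUInt32LittleEndian(notes.Slice((int)pos));
            uint descsz = BinaryPrimitives.ReadUInt32LittleEndian(notes.Slice((int)pos + 4));
            uint type = BinaryPrimitives.ReadUInt32LittleEndian(notes.Slice((int)pos + 8));
            long descStart = pos + 12 + Align4(namesz);
            if (descStart + descsz > notes.Length) return null;

            var name = notes.Slice((int)pos + 12, (int)namesz);
            bool isGnu = namesz == 4 && name[0] == (byte)'G' && name[1] == (byte)'N'
                         && name[2] == (byte)'U' && name[3] == 0;
            if (type == NtGnuBuildId && isGnu)
                return Convert.ToHexString(notes.Slice((int)descStart, (int)descsz)).ToLowerInvariant();

            pos = descStart + Align4(descsz);
        }
        return null;
    }

    private static long Align4(uint n) => (n + 3L) & ~3L;
}
```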
---
### MVP 2: Patch-Aware Backport Handling (Sprint 6000.0002)
**Goal:** Handle "version says vulnerable but distro backported the fix."
| Sprint | Topic | Description |
|--------|-------|-------------|
| 6000.0002.0001 | Fix Evidence Parser | Changelog and patch header parsing |
| 6000.0002.0002 | Fix Index Builder | Merge evidence into fix index |
| 6000.0002.0003 | Version Comparators | Distro-specific version comparison |
| 6000.0002.0004 | RPM Corpus Connector | RHEL/Fedora package ingestion |

**Acceptance:** For a CVE whose upstream version range marks a package as vulnerable, correctly report a distro build that carries a backported fix as fixed.

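
Backport handling hinges on comparing the installed distro version against the first distro version that contains the fix, using the distro's own ordering rather than upstream semantics. The sketch below implements the Debian ordering described in dpkg's deb-version manual page: epoch first, then upstream version, then Debian revision, where `~` sorts before everything (even the end of the string). `DebianVersion` and `IsFixed` are hypothetical names, and `IsFixed` assumes the fix index supplies the first fixed version. RPM's `rpmvercmp` ordering differs and is not covered here.

```csharp
using System;
using System.Globalization;

// Hypothetical helper implementing Debian version ordering ([epoch:]upstream[-revision]).
public static class DebianVersion
{
    public static int Compare(string a, string b)
    {
        var (epochA, upA, revA) = Parse(a);
        var (epochB, upB, revB) = Parse(b);
        if (epochA != epochB) return epochA.CompareTo(epochB);
        int c = CompareFragment(upA, upB);
        return c != 0 ? c : CompareFragment(revA, revB);
    }

    // Assumes the fix index records the first distro version that contains the fix.
    public static bool IsFixed(string installed, string firstFixed) => Compare(installed, firstFixed) >= 0;

    private static (long Epoch, string Upstream, string Revision) Parse(string v)
    {
        long epoch = 0;
        int colon = v.IndexOf(':');
        if (colon >= 0)
        {
            epoch = long.Parse(v.Substring(0, colon), CultureInfo.InvariantCulture);
            v = v.Substring(colon + 1);
        }
        int dash = v.LastIndexOf('-'); // the revision follows the last hyphen
        return dash >= 0 ? (epoch, v.Substring(0, dash), v.Substring(dash + 1)) : (epoch, v, "");
    }

    private static bool IsDigit(char c) => c >= '0' && c <= '9';

    // Weight of a non-digit character: '~' lowest, then end of string (0), then letters, then the rest.
    private static int Order(char c) =>
        c == '~' ? -1 :
        (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ? c :
        c + 256;

    // Alternates between non-digit runs (compared by Order) and digit runs (compared numerically).
    private static int CompareFragment(string a, string b)
    {
        int i = 0, j = 0;
        while (i < a.Length || j < b.Length)
        {
            while ((i < a.Length && !IsDigit(a[i])) || (j < b.Length && !IsDigit(b[j])))
            {
                int ac = i < a.Length && !IsDigit(a[i]) ? Order(a[i]) : 0;
                int bc = j < b.Length && !IsDigit(b[j]) ? Order(b[j]) : 0;
                if (ac != bc) return ac - bc;
                i++; j++;
            }
            while (i < a.Length && a[i] == '0') i++;
            while (j < b.Length && b[j] == '0') j++;
            int firstDiff = 0;
            while (i < a.Length && IsDigit(a[i]) && j < b.Length && IsDigit(b[j]))
            {
                if (firstDiff == 0) firstDiff = a[i] - b[j];
                i++; j++;
            }
            if (i < a.Length && IsDigit(a[i])) return 1;  // a's number has more digits
            if (j < b.Length && IsDigit(b[j])) return -1;
            if (firstDiff != 0) return firstDiff;
        }
        return 0;
    }
}
```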
---
### MVP 3: Binary Fingerprint Factory (Sprint 6000.0003)
**Goal:** Detect vulnerable code independent of package metadata.
| Sprint | Topic | Description |
|--------|-------|-------------|
| 6000.0003.0001 | Fingerprint Storage | Database and blob storage for fingerprints |
| 6000.0003.0002 | Reference Build Pipeline | Generate vulnerable/fixed reference builds |
| 6000.0003.0003 | Fingerprint Generator | Extract function fingerprints from binaries |
| 6000.0003.0004 | Fingerprint Matching Engine | Similarity search and matching |
| 6000.0003.0005 | Validation Corpus | Golden corpus for fingerprint validation |

**Acceptance:** Detect CVE in stripped binary with no package metadata, confidence > 0.95.

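
The plan does not say how the Fingerprint Matching Engine scores similarity. As one common, swapped-in technique (not necessarily what sprint 6000.0003.0004 will use), the sketch below treats a function fingerprint as the set of hashed instruction trigrams over an already-normalized instruction sequence and scores two fingerprints with Jaccard similarity. `FingerprintSimilarity` and the mnemonic-only normalization are assumptions, and a raw Jaccard score is not the same thing as the calibrated confidence in the acceptance criterion.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative fingerprint: the set of hashed instruction n-grams of one function.
public static class FingerprintSimilarity
{
    // Assumes instructions are already normalized (e.g. mnemonics only, with registers
    // and immediates stripped) before n-grams are taken.
    public static HashSet<ulong> Shingles(IReadOnlyList<string> instructions, int n = 3)
    {
        var set = new HashSet<ulong>();
        for (int i = 0; i + n <= instructions.Count; i++)
            set.Add(Fnv1a(string.Join(";", instructions.Skip(i).Take(n))));
        return set;
    }

    // Jaccard similarity |A ∩ B| / |A ∪ B|, in [0, 1]; two empty fingerprints count as identical.
    public static double Jaccard(HashSet<ulong> a, HashSet<ulong> b)
    {
        if (a.Count == 0 && b.Count == 0) return 1.0;
        int intersection = a.Count(x => b.Contains(x));
        return (double)intersection / (a.Count + b.Count - intersection);
    }

    // 64-bit FNV-1a over UTF-16 code units: stable across processes, unlike string.GetHashCode().
    private static ulong Fnv1a(string s)
    {
        ulong hash = 14695981039346656037UL;
        foreach (char c in s)
        {
            hash ^= c;
            hash *= 1099511628211UL;
        }
        return hash;
    }
}
```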
---
### MVP 4: Scanner Integration (Sprint 6000.0004)
**Goal:** Binary evidence in production scans.
| Sprint | Topic | Description |
|--------|-------|-------------|
| 6000.0004.0001 | Scanner Worker Integration | Wire BinaryIndex into scan pipeline |
| 6000.0004.0002 | Findings Ledger Integration | Record binary matches as findings |
| 6000.0004.0003 | Proof Segment Attestation | DSSE attestations for binary evidence |
| 6000.0004.0004 | CLI Binary Match Inspection | CLI commands for match inspection |

**Acceptance:** Container scan produces binary match findings with evidence chain.

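
Proof segment attestations (6000.0004.0003) use DSSE. In DSSE v1 the signature covers the pre-authentication encoding (PAE) of the payload type and payload rather than the raw payload, which binds the signature to the declared type. The sketch below computes PAE and signs it with ECDSA P-256 via `System.Security.Cryptography`; the `Dsse` class, the key choice, and the payload contents are illustrative assumptions, and envelope serialization (base64 payload, `keyid`) is omitted.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Illustrative DSSE v1 helpers (hypothetical class, not a StellaOps API).
public static class Dsse
{
    // PAE(type, body) = "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body,
    // where LEN is the byte length as ASCII decimal and SP is a single space.
    public static byte[] Pae(string payloadType, byte[] payload)
    {
        byte[] typeBytes = Encoding.UTF8.GetBytes(payloadType);
        byte[] header = Encoding.UTF8.GetBytes(
            $"DSSEv1 {typeBytes.Length} {payloadType} {payload.Length} ");
        var result = new byte[header.Length + payload.Length];
        header.CopyTo(result, 0);
        payload.CopyTo(result, header.Length);
        return result;
    }

    // Sign the PAE bytes (not the raw payload) with ECDSA / SHA-256.
    public static byte[] Sign(ECDsa key, string payloadType, byte[] payload) =>
        key.SignData(Pae(payloadType, payload), HashAlgorithmName.SHA256);

    public static bool Verify(ECDsa key, string payloadType, byte[] payload, byte[] signature) =>
        key.VerifyData(Pae(payloadType, payload), signature, HashAlgorithmName.SHA256);
}
```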
---
## Dependencies
```mermaid
graph TD
    subgraph MVP1["MVP 1: Known-Build Catalog"]
        S6001["6000.0001.0001<br/>Schema"]
        S6002["6000.0001.0002<br/>Identity Service"]
        S6003["6000.0001.0003<br/>Debian Connector"]
        S6004["6000.0001.0004<br/>Build-ID Lookup"]
        S6001 --> S6002
        S6002 --> S6003
        S6002 --> S6004
        S6003 --> S6004
    end

    subgraph MVP2["MVP 2: Patch-Aware"]
        S6011["6000.0002.0001<br/>Fix Parser"]
        S6012["6000.0002.0002<br/>Fix Index Builder"]
        S6013["6000.0002.0003<br/>Version Comparators"]
        S6014["6000.0002.0004<br/>RPM Connector"]
        S6011 --> S6012
        S6013 --> S6012
        S6012 --> S6014
    end

    subgraph MVP3["MVP 3: Fingerprints"]
        S6021["6000.0003.0001<br/>FP Storage"]
        S6022["6000.0003.0002<br/>Ref Build Pipeline"]
        S6023["6000.0003.0003<br/>FP Generator"]
        S6024["6000.0003.0004<br/>Matching Engine"]
        S6025["6000.0003.0005<br/>Validation Corpus"]
        S6021 --> S6023
        S6022 --> S6023
        S6023 --> S6024
        S6024 --> S6025
    end

    subgraph MVP4["MVP 4: Integration"]
        S6031["6000.0004.0001<br/>Scanner Integration"]
        S6032["6000.0004.0002<br/>Findings Ledger"]
        S6033["6000.0004.0003<br/>Attestations"]
        S6034["6000.0004.0004<br/>CLI"]
        S6031 --> S6032
        S6032 --> S6033
        S6031 --> S6034
    end

    MVP1 --> MVP2
    MVP1 --> MVP3
    MVP2 --> MVP4
    MVP3 --> MVP4
```
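
Because the dependency graph above is acyclic, any topological order of it is a valid sprint sequence. A minimal sketch using Kahn's algorithm, with ties broken by ordinal sprint ID so the output is deterministic; `SprintPlan.Order` is a hypothetical helper, and MVP-level edges such as `MVP1 --> MVP2` would first need expanding into sprint-level edges.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical planning helper: Kahn's algorithm over "From must finish before To" edges.
public static class SprintPlan
{
    public static List<string> Order(IEnumerable<(string From, string To)> edges)
    {
        var indegree = new Dictionary<string, int>();
        var successors = new Dictionary<string, List<string>>();
        foreach (var (from, to) in edges)
        {
            if (!indegree.ContainsKey(from)) indegree[from] = 0;
            indegree[to] = (indegree.TryGetValue(to, out var d) ? d : 0) + 1;
            if (!successors.TryGetValue(from, out var list))
                successors[from] = list = new List<string>();
            list.Add(to);
        }

        // Ready set ordered by sprint ID so ties resolve the same way on every run.
        var ready = new SortedSet<string>(
            indegree.Where(kv => kv.Value == 0).Select(kv => kv.Key), StringComparer.Ordinal);
        var order = new List<string>();
        while (ready.Count > 0)
        {
            string s = ready.Min!;
            ready.Remove(s);
            order.Add(s);
            if (!successors.TryGetValue(s, out var next)) continue;
            foreach (var t in next)
                if (--indegree[t] == 0) ready.Add(t);
        }

        // Any node never reaching indegree 0 sits on a cycle.
        if (order.Count != indegree.Count)
            throw new InvalidOperationException("Dependency cycle detected");
        return order;
    }
}
```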
---
## Module Structure
```
src/BinaryIndex/
├── StellaOps.BinaryIndex.WebService/         # API service
├── StellaOps.BinaryIndex.Worker/             # Corpus ingestion worker
├── __Libraries/
│   ├── StellaOps.BinaryIndex.Core/           # Domain models, interfaces
│   ├── StellaOps.BinaryIndex.Persistence/    # PostgreSQL + RustFS
│   ├── StellaOps.BinaryIndex.Corpus/         # Corpus connector framework
│   ├── StellaOps.BinaryIndex.Corpus.Debian/  # Debian connector
│   ├── StellaOps.BinaryIndex.Corpus.Rpm/     # RPM connector
│   ├── StellaOps.BinaryIndex.FixIndex/       # Patch-aware fix index
│   └── StellaOps.BinaryIndex.Fingerprints/   # Fingerprint generation
└── __Tests/
    ├── StellaOps.BinaryIndex.Core.Tests/
    ├── StellaOps.BinaryIndex.Persistence.Tests/
    ├── StellaOps.BinaryIndex.Corpus.Tests/
    └── StellaOps.BinaryIndex.Integration.Tests/
```
---
## Key Interfaces
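```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// NOTE: the generic type arguments in this listing were lost in extraction.
// The result/element type names below (BinaryVulnMatch, FixStatusResult, CorpusSnapshot,
// ExtractedBinary, FixRecord) are assumed placeholders, not confirmed names.

// Query interface (consumed by Scanner.Worker)
public interface IBinaryVulnerabilityService
{
    Task<IReadOnlyList<BinaryVulnMatch>> LookupByIdentityAsync(BinaryIdentity identity, CancellationToken ct);
    Task<IReadOnlyList<BinaryVulnMatch>> LookupByFingerprintAsync(CodeFingerprint fp, CancellationToken ct);
    Task<FixStatusResult?> GetFixStatusAsync(string distro, string release, string sourcePkg, string cveId, CancellationToken ct);
}

// Corpus connector interface
public interface IBinaryCorpusConnector
{
    string ConnectorId { get; }
    Task<CorpusSnapshot> FetchSnapshotAsync(CorpusQuery query, CancellationToken ct);
    IAsyncEnumerable<ExtractedBinary> ExtractBinariesAsync(PackageReference pkg, CancellationToken ct);
}

// Fix index interface
public interface IFixIndexBuilder
{
    Task BuildIndexAsync(DistroRelease distro, CancellationToken ct);
    Task<FixRecord?> GetFixRecordAsync(string distro, string release, string sourcePkg, string cveId, CancellationToken ct);
}
```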
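
To make the identity-lookup contract concrete, here is an in-memory sketch keyed by Build-ID. The `BinaryIdentity` and `BinaryVulnMatch` records are simplified hypothetical shapes (the real domain models belong in `StellaOps.BinaryIndex.Core` and are not specified here), and a production implementation would query the `binaries` schema rather than a dictionary. Build-IDs are compared case-insensitively because they are hex strings.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, simplified shapes for illustration only.
public sealed record BinaryIdentity(string BuildId, string? FileSha256 = null);
public sealed record BinaryVulnMatch(string CveId, string SourcePackage, string MatchType);

public sealed class InMemoryBuildIdLookup
{
    // Hex Build-IDs may arrive in either case, so compare ignoring case.
    private readonly Dictionary<string, List<BinaryVulnMatch>> _byBuildId =
        new(StringComparer.OrdinalIgnoreCase);

    public void Add(string buildId, BinaryVulnMatch match)
    {
        if (!_byBuildId.TryGetValue(buildId, out var list))
            _byBuildId[buildId] = list = new List<BinaryVulnMatch>();
        list.Add(match);
    }

    public Task<IReadOnlyList<BinaryVulnMatch>> LookupByIdentityAsync(BinaryIdentity identity, CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();
        // Return a copy so callers cannot mutate the index.
        IReadOnlyList<BinaryVulnMatch> result = _byBuildId.TryGetValue(identity.BuildId, out var list)
            ? list.ToArray()
            : Array.Empty<BinaryVulnMatch>();
        return Task.FromResult(result);
    }
}
```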
---
## Database Schema
- **Schema:** `binaries`
- **Owner:** BinaryIndex module

**Key Tables:**
| Table | Purpose |
|-------|---------|
| `binary_identity` | Known binary identities (Build-ID, hashes) |
| `binary_package_map` | Binary → package mapping per snapshot |
| `vulnerable_buildids` | Build-IDs known to be vulnerable |
| `cve_fix_index` | Patch-aware fix status per distro |
| `vulnerable_fingerprints` | Function fingerprints for CVEs |
| `fingerprint_matches` | Match results (findings evidence) |

See: `docs/db/schemas/binaries_schema_specification.md`

---
## Integration Points
### Scanner.Worker
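```csharp
// During binary extraction (fragment: fields and locals come from the enclosing worker)
var identity = await _featureExtractor.ExtractIdentityAsync(binaryStream, ct);
var matches = await _binaryVulnService.LookupByIdentityAsync(identity, ct);

// If the distro is known, check whether it has backported a fix for the CVE;
// distro, release, sourcePkg and cveId are supplied by the surrounding scan context
var fixStatus = await _binaryVulnService.GetFixStatusAsync(
    distro, release, sourcePkg, cveId, ct);
```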
### Findings Ledger
```csharp
public record BinaryVulnerabilityFinding : IFinding
{
    public string MatchType { get; init; }       // "fingerprint" or "buildid"
    public string VulnerablePurl { get; init; }
    public string MatchedSymbol { get; init; }
    public float Similarity { get; init; }
    public string[] LinkedCves { get; init; }
}
```
### Policy Engine
New proof segment type: `binary_fingerprint_evidence`

---
## Configuration
```yaml
binaryindex:
  enabled: true
  corpus:
    connectors:
      - type: debian
        enabled: true
        releases: [bookworm, bullseye, jammy, noble]
  fingerprinting:
    enabled: true
    target_components: [openssl, glibc, zlib, curl]
  lookup:
    cache_ttl: 3600
```
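
`lookup.cache_ttl` implies that lookup results are cached for a bounded time. A minimal sketch of such an expiring cache, assuming the value is in seconds (the configuration does not state the unit). `TtlCache` is hypothetical, is not thread-safe, and takes an injectable clock so expiry can be tested deterministically.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical expiring cache for lookup results; not thread-safe.
public sealed class TtlCache<TKey, TValue> where TKey : notnull
{
    private readonly Dictionary<TKey, (TValue Value, DateTimeOffset Expires)> _entries = new();
    private readonly TimeSpan _ttl;
    private readonly Func<DateTimeOffset> _now;

    // ttl would come from lookup.cache_ttl, e.g. TimeSpan.FromSeconds(3600) if the unit is seconds.
    public TtlCache(TimeSpan ttl, Func<DateTimeOffset>? clock = null)
    {
        _ttl = ttl;
        _now = clock ?? (() => DateTimeOffset.UtcNow);
    }

    public void Set(TKey key, TValue value) => _entries[key] = (value, _now() + _ttl);

    public bool TryGet(TKey key, out TValue value)
    {
        if (_entries.TryGetValue(key, out var entry))
        {
            if (entry.Expires > _now())
            {
                value = entry.Value;
                return true;
            }
            _entries.Remove(key); // evict lazily on read once expired
        }
        value = default!;
        return false;
    }
}
```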
---
## Success Criteria
### MVP 1
- [ ] `binaries` schema deployed and migrated
- [ ] Debian/Ubuntu corpus ingestion operational
- [ ] Build-ID lookup returns CVEs with < 100ms p95 latency
### MVP 2
- [ ] Fix index correctly handles Debian/RHEL backports
- [ ] 95%+ accuracy on backport test corpus
### MVP 3
- [ ] Fingerprints generated for OpenSSL, glibc, zlib, curl
- [ ] < 5% false positive rate on validation corpus
### MVP 4
- [ ] Scanner produces binary match findings
- [ ] DSSE attestations include binary evidence
- [ ] CLI `stella binary-matches` command operational
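
Two of the criteria above are numeric targets. A sketch of how they might be computed, assuming nearest-rank p95 for latency and FP / (FP + TN) for the false positive rate; the document defines neither, and if the 5% target instead means the share of reported matches that are wrong, the denominator would be FP + TP. `ValidationMetrics` is a hypothetical helper.

```csharp
using System;
using System.Linq;

// Hypothetical helpers for checking the success criteria.
public static class ValidationMetrics
{
    // Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
    public static double Percentile(double[] samples, double p)
    {
        if (samples.Length == 0) throw new ArgumentException("no samples", nameof(samples));
        var sorted = samples.OrderBy(x => x).ToArray();
        int rank = (int)Math.Ceiling(p * sorted.Length / 100.0);
        return sorted[Math.Clamp(rank, 1, sorted.Length) - 1];
    }

    // FP / (FP + TN): fraction of known-clean samples that were flagged as vulnerable.
    public static double FalsePositiveRate(int falsePositives, int trueNegatives) =>
        falsePositives + trueNegatives == 0
            ? 0.0
            : (double)falsePositives / (falsePositives + trueNegatives);
}
```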
---
## References
- Architecture: `docs/modules/binaryindex/architecture.md`
- Schema: `docs/db/schemas/binaries_schema_specification.md`
- Advisory: `docs/product-advisories/21-Dec-2025 - Mapping Evidence Within Compiled Binaries.md`
- Existing fingerprinting: `src/Scanner/__Libraries/StellaOps.Scanner.EntryTrace/Binary/`
- Build-ID indexing: `src/Scanner/StellaOps.Scanner.Analyzers.Native/Index/`
---
*Document Version: 1.0.0*

*Created: 2025-12-21*