# BinaryIndex Module Architecture

> **Ownership:** Scanner Guild + Concelier Guild
> **Status:** DRAFT
> **Version:** 1.0.0
> **Related:** [High-Level Architecture](../../07_HIGH_LEVEL_ARCHITECTURE.md), [Scanner Architecture](../scanner/architecture.md), [Concelier Architecture](../concelier/architecture.md)

---

## 1. Overview

The **BinaryIndex** module provides a vulnerable binaries database that enables detection of vulnerable code at the binary level, independent of package metadata. This addresses a critical gap in vulnerability scanning: package version strings can lie (backports, custom builds, stripped metadata), but **binary identity doesn't lie**.

### 1.1 Problem Statement

Traditional vulnerability scanners rely on package version matching, which fails in several scenarios:

1. **Backported patches** - Distros backport security fixes without changing upstream version
2. **Custom/vendored builds** - Binaries compiled from source without package metadata
3. **Stripped binaries** - Debug info and version strings removed
4. **Static linking** - Vulnerable library code embedded in final binary
5. **Container base images** - Distroless or scratch images with no package DB

### 1.2 Solution: Binary-First Vulnerability Detection

BinaryIndex provides three tiers of binary identification:

| Tier | Method | Precision | Coverage |
|------|--------|-----------|----------|
| A | Package/version range matching | Medium | High |
| B | Build-ID/hash catalog (exact binary identity) | High | Medium |
| C | Function fingerprints (CFG/basic-block hashes) | Very High | Targeted |

### 1.3 Module Scope

**In Scope:**
- Binary identity extraction (Build-ID, PE CodeView GUID, Mach-O UUID)
- Binary-to-advisory mapping database
- Fingerprint storage and matching engine
- Fix index for patch-aware backport handling
- Integration with Scanner.Worker for binary lookup

**Out of Scope:**
- Binary disassembly/analysis (provided by Scanner.Analyzers.Native)
- Runtime binary tracing (provided by Zastava)
- SBOM generation (provided by Scanner)

---

## 2. Architecture

### 2.1 System Context

```
┌──────────────────────────────────────────────────────────────────────────┐
│                             External Systems                             │
│    ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐     │
│    │ Distro Repos    │    │ Debug Symbol    │    │ Upstream Source │     │
│    │ (Debian, RPM,   │    │ Servers         │    │ (GitHub, etc.)  │     │
│    │ Alpine)         │    │ (debuginfod)    │    │                 │     │
│    └────────┬────────┘    └────────┬────────┘    └────────┬────────┘     │
└─────────────│──────────────────────│──────────────────────│──────────────┘
              │                      │                      │
              v                      v                      v
┌──────────────────────────────────────────────────────────────────────────┐
│                            BinaryIndex Module                            │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │                       Corpus Ingestion Layer                       │  │
│  │      ┌──────────────┐    ┌──────────────┐    ┌──────────────┐      │  │
│  │      │ DebianCorpus │    │ RpmCorpus    │    │ AlpineCorpus │      │  │
│  │      │ Connector    │    │ Connector    │    │ Connector    │      │  │
│  │      └──────────────┘    └──────────────┘    └──────────────┘      │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                    │                                     │
│                                    v                                     │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │                          Processing Layer                          │  │
│  │      ┌──────────────┐    ┌──────────────┐    ┌──────────────┐      │  │
│  │      │ BinaryFeature│    │ FixIndex     │    │ Fingerprint  │      │  │
│  │      │ Extractor    │    │ Builder      │    │ Generator    │      │  │
│  │      └──────────────┘    └──────────────┘    └──────────────┘      │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                    │                                     │
│                                    v                                     │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │                           Storage Layer                            │  │
│  │      ┌──────────────┐    ┌──────────────┐    ┌──────────────┐      │  │
│  │      │ PostgreSQL   │    │ RustFS       │    │ Valkey       │      │  │
│  │      │ (binaries    │    │ (fingerprint │    │ (lookup      │      │  │
│  │      │ schema)      │    │ blobs)       │    │ cache)       │      │  │
│  │      └──────────────┘    └──────────────┘    └──────────────┘      │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                    │                                     │
│                                    v                                     │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │                            Query Layer                             │  │
│  │  ┌──────────────────────────────────────────────────────────────┐  │  │
│  │  │ IBinaryVulnerabilityService                                  │  │  │
│  │  │ - LookupByIdentityAsync(identity)                            │  │  │
│  │  │ - LookupByFingerprintAsync(fingerprint)                      │  │  │
│  │  │ - LookupBatchAsync(identities)                               │  │  │
│  │  │ - GetFixStatusAsync(distro, release, sourcePkg, cve)         │  │  │
│  │  └──────────────────────────────────────────────────────────────┘  │  │
│  └────────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     v
┌──────────────────────────────────────────────────────────────────────────┐
│                            Consuming Modules                             │
│    ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐     │
│    │ Scanner.Worker  │    │ Policy Engine   │    │ Findings Ledger │     │
│    │ (binary lookup  │    │ (evidence in    │    │ (match records) │     │
│    │ during scan)    │    │ proof chain)    │    │                 │     │
│    └─────────────────┘    └─────────────────┘    └─────────────────┘     │
└──────────────────────────────────────────────────────────────────────────┘
```

### 2.2 Component Breakdown

#### 2.2.1 Corpus Connectors

Plugin-based connectors that ingest binaries from distribution repositories.

```csharp
public interface IBinaryCorpusConnector
{
    string ConnectorId { get; }
    string[] SupportedDistros { get; }

    Task<CorpusSnapshot> FetchSnapshotAsync(CorpusQuery query, CancellationToken ct);
    Task<IAsyncEnumerable<ExtractedBinary>> ExtractBinariesAsync(PackageReference pkg, CancellationToken ct);
}
```

**Implementations:**
- `DebianBinaryCorpusConnector` - Debian/Ubuntu packages + debuginfo
- `RpmBinaryCorpusConnector` - RHEL/Fedora/CentOS + SRPM
- `AlpineBinaryCorpusConnector` - Alpine APK + APKBUILD

#### 2.2.2 Binary Feature Extractor

Extracts identity and features from binaries. Reuses existing Scanner.Analyzers.Native capabilities.

```csharp
public interface IBinaryFeatureExtractor
{
    Task<BinaryIdentity> ExtractIdentityAsync(Stream binaryStream, CancellationToken ct);
    Task<BinaryFeatures> ExtractFeaturesAsync(Stream binaryStream, ExtractorOptions opts, CancellationToken ct);
}

public sealed record BinaryIdentity(
    string Format,              // elf, pe, macho
    string? BuildId,            // ELF GNU Build-ID
    string? PeCodeViewGuid,     // PE CodeView GUID + Age
    string? MachoUuid,          // Mach-O LC_UUID
    string FileSha256,
    string TextSectionSha256);

public sealed record BinaryFeatures(
    BinaryIdentity Identity,
    string[] DynamicDeps,       // DT_NEEDED
    string[] ExportedSymbols,
    string[] ImportedSymbols,
    BinaryHardening Hardening);
```

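
The `Format` and `FileSha256` fields of `BinaryIdentity` can be illustrated with a minimal sketch. It assumes a seekable stream and classifies the container purely from well-known magic bytes. `BinaryFormatSniffer` and its method names are hypothetical and are not part of Scanner.Analyzers.Native; extraction of the Build-ID, CodeView GUID, and `LC_UUID` is deliberately left out.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper for illustration only; not the module's actual extractor.
public static class BinaryFormatSniffer
{
    // Returns "elf", "pe", "macho", or "unknown" based on the leading magic bytes.
    public static string DetectFormat(ReadOnlySpan<byte> header)
    {
        // ELF: 0x7F 'E' 'L' 'F'
        if (header.Length >= 4 && header[0] == 0x7F && header[1] == (byte)'E'
            && header[2] == (byte)'L' && header[3] == (byte)'F')
            return "elf";

        // PE images start with the DOS stub signature "MZ"
        if (header.Length >= 2 && header[0] == (byte)'M' && header[1] == (byte)'Z')
            return "pe";

        if (header.Length >= 4)
        {
            uint magic = (uint)(header[0] << 24 | header[1] << 16 | header[2] << 8 | header[3]);
            // Thin Mach-O (32/64-bit, either byte order) and fat/universal binaries.
            // Caveat: 0xCAFEBABE is also the Java class-file magic, so a real
            // extractor needs a further check before trusting this branch.
            if (magic is 0xFEEDFACE or 0xFEEDFACF or 0xCEFAEDFE or 0xCFFAEDFE or 0xCAFEBABE)
                return "macho";
        }

        return "unknown";
    }

    // Lower-case hex SHA-256 of the whole stream, as a candidate FileSha256 value.
    public static string ComputeSha256Hex(Stream stream)
    {
        stream.Position = 0;
        using var sha = SHA256.Create();
        return Convert.ToHexString(sha.ComputeHash(stream)).ToLowerInvariant();
    }
}
```

Magic-byte sniffing only selects which parser to run; the identity fields themselves still come from the format-specific note, debug directory, or load command.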
#### 2.2.3 Fix Index Builder

Builds the patch-aware CVE fix index from distro sources.

```csharp
public interface IFixIndexBuilder
{
    Task BuildIndexAsync(DistroRelease distro, CancellationToken ct);
    Task<FixRecord?> GetFixRecordAsync(string distro, string release, string sourcePkg, string cveId, CancellationToken ct);
}

public sealed record FixRecord(
    string Distro,
    string Release,
    string SourcePkg,
    string CveId,
    FixState State,             // fixed, vulnerable, not_affected, wontfix, unknown
    string? FixedVersion,       // Distro version string
    FixMethod Method,           // security_feed, changelog, patch_header, upstream_patch_match
    decimal Confidence,         // 0.00-1.00
    FixEvidence Evidence);

public enum FixState { Fixed, Vulnerable, NotAffected, Wontfix, Unknown }
public enum FixMethod { SecurityFeed, Changelog, PatchHeader, UpstreamPatchMatch }
```

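
The `Changelog` fix method implies scanning distro changelogs for CVE identifiers and attributing them to the package version that mentions them. The following is a rough sketch under the assumption of Debian-style changelog stanzas, where each entry starts with a `package (version) ...` header line. `ChangelogCveScanner` is a hypothetical name, and a real builder would also have to handle RPM `%changelog` sections and patch headers.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class ChangelogCveScanner
{
    // Stanza header, e.g. "openssl (1.1.1n-0+deb11u3) bullseye-security; urgency=medium"
    private static readonly Regex Header =
        new(@"^(?<pkg>[a-z0-9][a-z0-9+.\-]*) \((?<ver>[^)]+)\)", RegexOptions.Compiled);

    private static readonly Regex Cve =
        new(@"CVE-\d{4}-\d{4,}", RegexOptions.Compiled);

    // Returns (version, CVE id) pairs in the order they appear in the changelog.
    public static List<(string Version, string CveId)> Scan(string changelog)
    {
        var results = new List<(string Version, string CveId)>();
        string? currentVersion = null;

        foreach (var rawLine in changelog.Split('\n'))
        {
            var line = rawLine.TrimEnd('\r');

            var header = Header.Match(line);
            if (header.Success)
            {
                // A new stanza begins; later CVE mentions belong to this version.
                currentVersion = header.Groups["ver"].Value;
                continue;
            }

            if (currentVersion is null)
                continue; // text before the first stanza header is ignored

            foreach (Match m in Cve.Matches(line))
                results.Add((currentVersion, m.Value));
        }

        return results;
    }
}
```

A CVE mention in a changelog is only evidence that the entry references the CVE, which is presumably why `FixRecord` carries a `Confidence` value alongside the method.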
#### 2.2.4 Fingerprint Generator

Generates function-level fingerprints for vulnerable code detection.

```csharp
public interface IVulnFingerprintGenerator
{
    Task<ImmutableArray<VulnFingerprint>> GenerateAsync(
        string cveId,
        BinaryPair vulnAndFixed,    // Reference builds
        FingerprintOptions opts,
        CancellationToken ct);
}

public sealed record VulnFingerprint(
    string CveId,
    string Component,           // e.g., openssl
    string Architecture,        // x86-64, aarch64
    FingerprintType Type,       // basic_block, cfg, string_references, combined
    string FingerprintId,       // e.g., "bb-abc123..."
    byte[] FingerprintHash,     // 16-32 bytes
    string? FunctionHint,       // Function name if known
    decimal Confidence,
    FingerprintEvidence Evidence);

public enum FingerprintType { BasicBlock, ControlFlowGraph, StringReferences, Combined }
```

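
To make the `FingerprintId` and `FingerprintHash` fields more concrete, here is a simplified sketch of a basic-block style fingerprint. It assumes the caller has already split a function into basic blocks and normalized their bytes (for example, masking relocated addresses); that normalization is the hard part and is not shown. The class name, the choice of SHA-256, the order-independent combination, and the 16-byte truncation are all assumptions made for illustration, not the module's actual algorithm.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch; the real fingerprint algorithm is not specified here.
public static class BasicBlockFingerprintSketch
{
    public static (string FingerprintId, byte[] Hash) Compute(IEnumerable<byte[]> normalizedBlocks)
    {
        // Hash every block, then sort the digests so that the order in which
        // blocks are laid out in the binary does not change the result.
        var blockDigests = normalizedBlocks
            .Select(block => Convert.ToHexString(SHA256.HashData(block)))
            .OrderBy(hex => hex, StringComparer.Ordinal)
            .ToArray();

        // Combine the sorted digests into one hash and keep the first 16 bytes,
        // the lower bound of the 16-32 byte range noted on FingerprintHash.
        var combined = SHA256.HashData(Encoding.ASCII.GetBytes(string.Join("\n", blockDigests)));
        var truncated = combined.AsSpan(0, 16).ToArray();

        // "bb-" mirrors the example id prefix shown on FingerprintId.
        var id = "bb-" + Convert.ToHexString(truncated).ToLowerInvariant();
        return (id, truncated);
    }
}
```

Because the function is pure, the same normalized blocks always produce the same id, which is the property a fingerprint determinism test would check.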
#### 2.2.5 Binary Vulnerability Service

Main query interface for consumers.

```csharp
public interface IBinaryVulnerabilityService
{
    /// <summary>
    /// Look up vulnerabilities by Build-ID or equivalent binary identity.
    /// </summary>
    Task<ImmutableArray<BinaryVulnMatch>> LookupByIdentityAsync(
        BinaryIdentity identity,
        LookupOptions? opts = null,
        CancellationToken ct = default);

    /// <summary>
    /// Look up vulnerabilities by function fingerprint.
    /// </summary>
    Task<ImmutableArray<BinaryVulnMatch>> LookupByFingerprintAsync(
        CodeFingerprint fingerprint,
        decimal minSimilarity = 0.95m,
        CancellationToken ct = default);

    /// <summary>
    /// Batch lookup for scan performance.
    /// </summary>
    Task<ImmutableDictionary<string, ImmutableArray<BinaryVulnMatch>>> LookupBatchAsync(
        IEnumerable<BinaryIdentity> identities,
        LookupOptions? opts = null,
        CancellationToken ct = default);

    /// <summary>
    /// Get distro-specific fix status (patch-aware).
    /// </summary>
    Task<FixRecord?> GetFixStatusAsync(
        string distro,
        string release,
        string sourcePkg,
        string cveId,
        CancellationToken ct = default);
}

public sealed record BinaryVulnMatch(
    string CveId,
    string VulnerablePurl,
    MatchMethod Method,         // buildid_catalog, fingerprint_match, range_match
    decimal Confidence,
    MatchEvidence Evidence);

public enum MatchMethod { BuildIdCatalog, FingerprintMatch, RangeMatch }
```

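
Since the same CVE can be reported by more than one `MatchMethod`, a consumer may want to collapse duplicates and keep the strongest evidence. The sketch below shows one possible policy: keep the highest-confidence match per CVE. `MatchSketch` is a simplified stand-in for `BinaryVulnMatch` (the evidence payload is omitted), and `MatchSelector` is a hypothetical helper; the document does not prescribe how consumers should deduplicate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum MatchMethod { BuildIdCatalog, FingerprintMatch, RangeMatch }

// Simplified stand-in for BinaryVulnMatch; Evidence is omitted for brevity.
public sealed record MatchSketch(
    string CveId,
    string VulnerablePurl,
    MatchMethod Method,
    decimal Confidence);

// Hypothetical consumer-side helper for illustration only.
public static class MatchSelector
{
    // Keeps, for every CVE, the single match with the highest confidence,
    // dropping anything below minConfidence first.
    public static IReadOnlyList<MatchSketch> BestPerCve(
        IEnumerable<MatchSketch> matches,
        decimal minConfidence = 0.0m) =>
        matches
            .Where(m => m.Confidence >= minConfidence)
            .GroupBy(m => m.CveId)
            .Select(g => g.OrderByDescending(m => m.Confidence).First())
            .OrderBy(m => m.CveId, StringComparer.Ordinal)
            .ToList();
}
```

Other policies are equally plausible, for instance always preferring `BuildIdCatalog` over `RangeMatch` regardless of the numeric confidence.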
---

## 3. Data Model

### 3.1 PostgreSQL Schema (`binaries`)

The `binaries` schema stores binary identity, fingerprint, and match data.

```sql
CREATE SCHEMA IF NOT EXISTS binaries;
CREATE SCHEMA IF NOT EXISTS binaries_app;

-- RLS helper
CREATE OR REPLACE FUNCTION binaries_app.require_current_tenant()
RETURNS TEXT LANGUAGE plpgsql STABLE SECURITY DEFINER AS $$
DECLARE v_tenant TEXT;
BEGIN
    v_tenant := current_setting('app.tenant_id', true);
    IF v_tenant IS NULL OR v_tenant = '' THEN
        RAISE EXCEPTION 'app.tenant_id session variable not set';
    END IF;
    RETURN v_tenant;
END;
$$;
```

#### 3.1.1 Core Tables

See `docs/db/schemas/binaries_schema_specification.md` for complete DDL.

**Key Tables:**

| Table | Purpose |
|-------|---------|
| `binaries.binary_identity` | Known binary identities (Build-ID, hashes) |
| `binaries.binary_package_map` | Binary → package mapping per snapshot |
| `binaries.vulnerable_buildids` | Build-IDs known to be vulnerable |
| `binaries.vulnerable_fingerprints` | Function fingerprints for CVEs |
| `binaries.cve_fix_index` | Patch-aware fix status per distro |
| `binaries.fingerprint_matches` | Match results (findings evidence) |
| `binaries.corpus_snapshots` | Corpus ingestion tracking |

### 3.2 RustFS Layout

```
rustfs://stellaops/binaryindex/
  fingerprints/<algorithm>/<prefix>/<fingerprint_id>.bin
  corpus/<distro>/<release>/<snapshot_id>/manifest.json
  corpus/<distro>/<release>/<snapshot_id>/packages/<pkg>.metadata.json
  evidence/<match_id>.dsse.json
```

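
The `fingerprints/<algorithm>/<prefix>/<fingerprint_id>.bin` path can be built with a small helper such as the one sketched here. The layout does not say how `<prefix>` is derived, so this sketch assumes it is the first two characters of the hash portion of the fingerprint id (the part after the type prefix such as `bb-`), a common way to shard object keys. `FingerprintBlobKey` and that derivation rule are hypothetical.

```csharp
using System;

// Hypothetical helper; the <prefix> derivation is an assumption, not a documented rule.
public static class FingerprintBlobKey
{
    public static string For(string algorithm, string fingerprintId, int prefixLength = 2)
    {
        if (string.IsNullOrWhiteSpace(algorithm))
            throw new ArgumentException("algorithm is required", nameof(algorithm));
        if (string.IsNullOrWhiteSpace(fingerprintId))
            throw new ArgumentException("fingerprint id is required", nameof(fingerprintId));

        // Skip a leading type tag such as "bb-" so that keys shard on hash characters.
        var dash = fingerprintId.IndexOf('-');
        var hashPart = dash >= 0 ? fingerprintId[(dash + 1)..] : fingerprintId;
        if (hashPart.Length < prefixLength)
            throw new ArgumentException("fingerprint id is too short", nameof(fingerprintId));

        var prefix = hashPart[..prefixLength];
        return $"fingerprints/{algorithm}/{prefix}/{fingerprintId}.bin";
    }
}
```

For example, `FingerprintBlobKey.For("basic_block", "bb-abc123")` yields `fingerprints/basic_block/ab/bb-abc123.bin`, relative to the `rustfs://stellaops/binaryindex/` root.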
---

## 4. Integration Points

### 4.1 Scanner.Worker Integration

During container scanning, Scanner.Worker queries BinaryIndex for each extracted binary:

```mermaid
sequenceDiagram
    participant SW as Scanner.Worker
    participant BI as BinaryIndex
    participant PG as PostgreSQL
    participant FL as Findings Ledger

    SW->>SW: Extract binary from layer
    SW->>SW: Compute BinaryIdentity
    SW->>BI: LookupByIdentityAsync(identity)
    BI->>PG: Query binaries.vulnerable_buildids
    PG-->>BI: Matches
    BI->>PG: Query binaries.cve_fix_index (if distro known)
    PG-->>BI: Fix status
    BI-->>SW: BinaryVulnMatch[]
    SW->>FL: RecordFinding(match, evidence)
```

### 4.2 Concelier Integration

BinaryIndex subscribes to Concelier's advisory updates:

```mermaid
sequenceDiagram
    participant CO as Concelier
    participant BI as BinaryIndex
    participant PG as PostgreSQL

    CO->>CO: Ingest new advisory
    CO->>BI: advisory.created event
    BI->>BI: Check if affected packages in corpus
    BI->>PG: Update binaries.binary_vuln_assertion
    BI->>BI: Queue fingerprint generation (if high-impact)
```

### 4.3 Policy Integration

Binary matches are recorded as proof segments:

```json
{
  "segment_type": "binary_fingerprint_evidence",
  "payload": {
    "binary_identity": {
      "format": "elf",
      "build_id": "abc123...",
      "file_sha256": "def456..."
    },
    "matches": [
      {
        "cve_id": "CVE-2024-1234",
        "method": "buildid_catalog",
        "confidence": 0.98,
        "vulnerable_purl": "pkg:deb/debian/libssl3@1.1.1n-0+deb11u3"
      }
    ]
  }
}
```

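
A consumer such as the Policy Engine would need to read this segment back. The sketch below maps the JSON shape shown above onto plain classes with `System.Text.Json`; the class names (`ProofSegment`, `ProofPayload`, and so on) and the `ProofSegmentReader` helper are hypothetical, and only the fields visible in the example are modeled.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical DTOs mirroring only the fields shown in the example segment.
public sealed class ProofBinaryIdentity
{
    [JsonPropertyName("format")] public string Format { get; init; } = "";
    [JsonPropertyName("build_id")] public string? BuildId { get; init; }
    [JsonPropertyName("file_sha256")] public string FileSha256 { get; init; } = "";
}

public sealed class ProofMatch
{
    [JsonPropertyName("cve_id")] public string CveId { get; init; } = "";
    [JsonPropertyName("method")] public string Method { get; init; } = "";
    [JsonPropertyName("confidence")] public decimal Confidence { get; init; }
    [JsonPropertyName("vulnerable_purl")] public string VulnerablePurl { get; init; } = "";
}

public sealed class ProofPayload
{
    [JsonPropertyName("binary_identity")] public ProofBinaryIdentity BinaryIdentity { get; init; } = new();
    [JsonPropertyName("matches")] public List<ProofMatch> Matches { get; init; } = new();
}

public sealed class ProofSegment
{
    [JsonPropertyName("segment_type")] public string SegmentType { get; init; } = "";
    [JsonPropertyName("payload")] public ProofPayload Payload { get; init; } = new();
}

public static class ProofSegmentReader
{
    public const string ExpectedType = "binary_fingerprint_evidence";

    // Parses a segment and rejects anything that is not binary fingerprint evidence.
    public static ProofSegment Parse(string json)
    {
        var segment = JsonSerializer.Deserialize<ProofSegment>(json)
            ?? throw new JsonException("empty proof segment");
        if (segment.SegmentType != ExpectedType)
            throw new JsonException($"unexpected segment_type '{segment.SegmentType}'");
        return segment;
    }
}
```

Mapping `confidence` to `decimal` keeps it aligned with the `decimal Confidence` field on `BinaryVulnMatch`.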
---

## 5. MVP Roadmap

### MVP 1: Known-Build Binary Catalog (Sprint 6000.0001)

**Goal:** Query "is this Build-ID vulnerable?" with distro-level precision.

**Deliverables:**
- `binaries` PostgreSQL schema
- Build-ID to package mapping tables
- Basic CVE lookup by binary identity
- Debian/Ubuntu corpus connector

### MVP 2: Patch-Aware Backport Handling (Sprint 6000.0002)

**Goal:** Handle "version says vulnerable but distro backported the fix."

**Deliverables:**
- Fix index builder (changelog + patch header parsing)
- Distro-specific version comparison
- RPM corpus connector
- Scanner.Worker integration

### MVP 3: Binary Fingerprint Factory (Sprint 6000.0003)

**Goal:** Detect vulnerable code independent of package metadata.

**Deliverables:**
- Fingerprint storage and matching
- Reference build generation pipeline
- Fingerprint validation corpus
- High-impact CVE coverage (OpenSSL, glibc, zlib, curl)

### MVP 4: Full Scanner Integration (Sprint 6000.0004)

**Goal:** Binary evidence in production scans.

**Deliverables:**
- Scanner.Worker binary lookup integration
- Findings Ledger binary match records
- Proof segment attestations
- CLI binary match inspection

---

## 6. Security Considerations

### 6.1 Trust Boundaries

1. **Corpus Ingestion** - Packages are untrusted; extraction runs in sandboxed workers
2. **Fingerprint Generation** - Reference builds compiled in isolated environments
3. **Query API** - Tenant-isolated via RLS; no cross-tenant data leakage

### 6.2 Signing & Provenance

- All corpus snapshots are signed (DSSE)
- Fingerprint sets are versioned and signed
- Every match result references evidence digests

### 6.3 Sandbox Requirements

Binary extraction and fingerprint generation MUST run with:
- Seccomp profile restricting syscalls
- Read-only root filesystem
- No network access during analysis
- Memory/CPU limits

---

## 7. Observability

### 7.1 Metrics

| Metric | Type | Labels |
|--------|------|--------|
| `binaryindex_lookup_total` | Counter | method, result |
| `binaryindex_lookup_latency_ms` | Histogram | method |
| `binaryindex_corpus_packages_total` | Gauge | distro, release |
| `binaryindex_fingerprints_indexed` | Gauge | algorithm, component |
| `binaryindex_match_confidence` | Histogram | method |

### 7.2 Traces

- `binaryindex.lookup` - Full lookup span
- `binaryindex.corpus.ingest` - Corpus ingestion
- `binaryindex.fingerprint.generate` - Fingerprint generation

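
As an illustration of how the two lookup metrics might be emitted, the sketch below uses the `System.Diagnostics.Metrics` API from the .NET base class library. The instrument names and labels come from the metrics table; the meter name `StellaOps.BinaryIndex`, the `BinaryIndexMetrics` class, and the label values are assumptions, and the gauges are not covered.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical instrumentation helper; only the instrument names are taken from the table.
public static class BinaryIndexMetrics
{
    // Assumed meter name; exporters subscribe to instruments by this name.
    public const string MeterName = "StellaOps.BinaryIndex";

    private static readonly Meter BinaryIndexMeter = new(MeterName);

    private static readonly Counter<long> LookupTotal =
        BinaryIndexMeter.CreateCounter<long>("binaryindex_lookup_total");

    private static readonly Histogram<double> LookupLatencyMs =
        BinaryIndexMeter.CreateHistogram<double>("binaryindex_lookup_latency_ms", unit: "ms");

    // Records one lookup: increments the counter (labels: method, result)
    // and adds the elapsed time to the latency histogram (label: method).
    public static void RecordLookup(string method, string result, double elapsedMs)
    {
        LookupTotal.Add(1,
            new KeyValuePair<string, object?>("method", method),
            new KeyValuePair<string, object?>("result", result));

        LookupLatencyMs.Record(elapsedMs,
            new KeyValuePair<string, object?>("method", method));
    }
}
```

A call such as `BinaryIndexMetrics.RecordLookup("buildid_catalog", "hit", 3.2)` would typically sit at the end of the `binaryindex.lookup` span.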
---

## 8. Configuration

```yaml
# binaryindex.yaml
binaryindex:
  enabled: true

  corpus:
    connectors:
      - type: debian
        enabled: true
        mirror: http://deb.debian.org/debian
        releases: [bookworm, bullseye]
        architectures: [amd64, arm64]
      - type: ubuntu
        enabled: true
        mirror: http://archive.ubuntu.com/ubuntu
        releases: [jammy, noble]

  fingerprinting:
    enabled: true
    algorithms: [basic_block, cfg]
    target_components:
      - openssl
      - glibc
      - zlib
      - curl
      - sqlite
    min_function_size: 16  # bytes
    max_functions_per_binary: 10000

  lookup:
    cache_ttl: 3600
    batch_size: 100
    timeout_ms: 5000

  storage:
    postgres_schema: binaries
    rustfs_bucket: stellaops/binaryindex
```

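
The `lookup.batch_size` setting suggests that callers split large identity sets before invoking `LookupBatchAsync`. A minimal sketch of that splitting, assuming .NET 6 or later for `Enumerable.Chunk`, is shown here; `LookupBatcher` is a hypothetical helper, and whether batching happens in the caller or inside the service is not specified.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class LookupBatcher
{
    // Splits items into consecutive arrays of at most batchSize elements.
    // The default mirrors lookup.batch_size from the configuration example.
    public static IEnumerable<T[]> Batch<T>(IEnumerable<T> items, int batchSize = 100)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize), "batch size must be positive");

        return items.Chunk(batchSize);
    }
}
```

Each resulting array could then be passed to one `LookupBatchAsync` call, with a cancellation token that enforces `lookup.timeout_ms`.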
---

## 9. Testing Strategy

### 9.1 Unit Tests

- Identity extraction (Build-ID, hashes)
- Fingerprint generation determinism
- Fix index parsing (changelog, patch headers)

### 9.2 Integration Tests

- PostgreSQL schema validation
- Full corpus ingestion flow
- Scanner.Worker lookup integration

### 9.3 Regression Tests

- Known CVE detection (golden corpus)
- Backport handling (Debian libssl example)
- False positive rate validation

---

## 10. References

- Advisory: `docs/product-advisories/21-Dec-2025 - Mapping Evidence Within Compiled Binaries.md`
- Scanner Native Analysis: `src/Scanner/StellaOps.Scanner.Analyzers.Native/`
- Existing Fingerprinting: `src/Scanner/__Libraries/StellaOps.Scanner.EntryTrace/Binary/`
- Build-ID Index: `src/Scanner/StellaOps.Scanner.Analyzers.Native/Index/`

---

*Document Version: 1.0.0*
*Last Updated: 2025-12-21*