# BinaryIndex Module Architecture

> **Ownership:** Scanner Guild + Concelier Guild
> **Status:** DRAFT
> **Version:** 1.0.0
> **Related:** [High-Level Architecture](../../07_HIGH_LEVEL_ARCHITECTURE.md), [Scanner Architecture](../scanner/architecture.md), [Concelier Architecture](../concelier/architecture.md)

---

## 1. Overview

The **BinaryIndex** module provides a vulnerable binaries database that enables detection of vulnerable code at the binary level, independent of package metadata. This addresses a critical gap in vulnerability scanning: package version strings can lie (backports, custom builds, stripped metadata), but **binary identity doesn't lie**.

### 1.1 Problem Statement

Traditional vulnerability scanners rely on package version matching, which fails in several scenarios:

1. **Backported patches** - Distros backport security fixes without changing upstream version
2. **Custom/vendored builds** - Binaries compiled from source without package metadata
3. **Stripped binaries** - Debug info and version strings removed
4. **Static linking** - Vulnerable library code embedded in final binary
5. **Container base images** - Distroless or scratch images with no package DB

### 1.2 Solution: Binary-First Vulnerability Detection

BinaryIndex provides three tiers of binary identification:

| Tier | Method | Precision | Coverage |
|------|--------|-----------|----------|
| A | Package/version range matching | Medium | High |
| B | Build-ID/hash catalog (exact binary identity) | High | Medium |
| C | Function fingerprints (CFG/basic-block hashes) | Very High | Targeted |

### 1.3 Module Scope

**In Scope:**
- Binary identity extraction (Build-ID, PE CodeView GUID, Mach-O UUID)
- Binary-to-advisory mapping database
- Fingerprint storage and matching engine
- Fix index for patch-aware backport handling
- Integration with Scanner.Worker for binary lookup

**Out of Scope:**
- Binary disassembly/analysis (provided by Scanner.Analyzers.Native)
- Runtime binary tracing (provided by Zastava)
- SBOM generation (provided by Scanner)

---

## 2. Architecture

### 2.1 System Context

```
┌──────────────────────────────────────────────────────────────────────────┐
│                             External Systems                             │
│  ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐         │
│  │ Distro Repos    │   │ Debug Symbol    │   │ Upstream Source │         │
│  │ (Debian, RPM,   │   │ Servers         │   │ (GitHub, etc.)  │         │
│  │  Alpine)        │   │ (debuginfod)    │   │                 │         │
│  └────────┬────────┘   └────────┬────────┘   └────────┬────────┘         │
└───────────│─────────────────────│─────────────────────│──────────────────┘
            │                     │                     │
            v                     v                     v
┌──────────────────────────────────────────────────────────────────────────┐
│                            BinaryIndex Module                            │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │                       Corpus Ingestion Layer                       │  │
│  │  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐            │  │
│  │  │ DebianCorpus │   │ RpmCorpus    │   │ AlpineCorpus │            │  │
│  │  │ Connector    │   │ Connector    │   │ Connector    │            │  │
│  │  └──────────────┘   └──────────────┘   └──────────────┘            │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                    │                                     │
│                                    v                                     │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │                          Processing Layer                          │  │
│  │  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐            │  │
│  │  │ BinaryFeature│   │ FixIndex     │   │ Fingerprint  │            │  │
│  │  │ Extractor    │   │ Builder      │   │ Generator    │            │  │
│  │  └──────────────┘   └──────────────┘   └──────────────┘            │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                    │                                     │
│                                    v                                     │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │                           Storage Layer                            │  │
│  │  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐            │  │
│  │  │ PostgreSQL   │   │ RustFS       │   │ Valkey       │            │  │
│  │  │ (binaries    │   │ (fingerprint │   │ (lookup      │            │  │
│  │  │  schema)     │   │  blobs)      │   │  cache)      │            │  │
│  │  └──────────────┘   └──────────────┘   └──────────────┘            │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                    │                                     │
│                                    v                                     │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │                            Query Layer                             │  │
│  │  ┌──────────────────────────────────────────────────────────────┐  │  │
│  │  │ IBinaryVulnerabilityService                                  │  │  │
│  │  │ - LookupByIdentityAsync(identity)                            │  │  │
│  │  │ - LookupByFingerprintAsync(fingerprint)                      │  │  │
│  │  │ - LookupBatchAsync(identities)                               │  │  │
│  │  │ - GetFixStatusAsync(distro, release, sourcePkg, cve)         │  │  │
│  │  └──────────────────────────────────────────────────────────────┘  │  │
│  └────────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     v
┌──────────────────────────────────────────────────────────────────────────┐
│                            Consuming Modules                             │
│  ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐         │
│  │ Scanner.Worker  │   │ Policy Engine   │   │ Findings Ledger │         │
│  │ (binary lookup  │   │ (evidence in    │   │ (match records) │         │
│  │  during scan)   │   │  proof chain)   │   │                 │         │
│  └─────────────────┘   └─────────────────┘   └─────────────────┘         │
└──────────────────────────────────────────────────────────────────────────┘
```

### 2.2 Component Breakdown

#### 2.2.1 Corpus Connectors

Plugin-based connectors that ingest binaries from distribution repositories.

```csharp
public interface IBinaryCorpusConnector
{
    string ConnectorId { get; }
    string[] SupportedDistros { get; }

    Task<CorpusSnapshot> FetchSnapshotAsync(CorpusQuery query, CancellationToken ct);
    Task<IAsyncEnumerable<ExtractedBinary>> ExtractBinariesAsync(PackageReference pkg, CancellationToken ct);
}
```
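The connector contract above returns a `Task` that wraps an `IAsyncEnumerable`, so a caller awaits once and then streams. The following is a minimal sketch of that consumption pattern, not the module's implementation: `InMemoryCorpusConnector`, the record shapes for `PackageReference` and `ExtractedBinary`, and the file paths are all hypothetical stand-ins, and the reading that the outer `Task` covers setup work is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for types the document references but does not define.
public sealed record PackageReference(string Name, string Version);
public sealed record ExtractedBinary(string Path, long SizeBytes);

public sealed class InMemoryCorpusConnector
{
    public string ConnectorId => "in-memory";

    // Same shape as the documented method. Assumption: the Task covers setup
    // (download, unpack) and the IAsyncEnumerable streams binaries one by one.
    public Task<IAsyncEnumerable<ExtractedBinary>> ExtractBinariesAsync(
        PackageReference pkg, CancellationToken ct)
        => Task.FromResult(Enumerate(pkg, ct));

    private static async IAsyncEnumerable<ExtractedBinary> Enumerate(
        PackageReference pkg, [EnumeratorCancellation] CancellationToken ct)
    {
        foreach (var name in new[] { "libssl.so.3", "libcrypto.so.3" })
        {
            ct.ThrowIfCancellationRequested();
            await Task.Yield(); // stands in for real I/O
            yield return new ExtractedBinary($"/usr/lib/{pkg.Name}/{name}", 0);
        }
    }
}

public static class ConnectorDemo
{
    // Awaits the Task once, then drains the stream with await foreach.
    public static async Task<List<string>> CollectPathsAsync(CancellationToken ct = default)
    {
        var connector = new InMemoryCorpusConnector();
        var pkg = new PackageReference("openssl", "3.0.11-1~deb12u2");
        var paths = new List<string>();

        var stream = await connector.ExtractBinariesAsync(pkg, ct);
        await foreach (var binary in stream.WithCancellation(ct))
        {
            paths.Add(binary.Path);
        }
        return paths;
    }
}
```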
**Implementations:**
- `DebianBinaryCorpusConnector` - Debian/Ubuntu packages + debuginfo
- `RpmBinaryCorpusConnector` - RHEL/Fedora/CentOS + SRPM
- `AlpineBinaryCorpusConnector` - Alpine APK + APKBUILD

#### 2.2.2 Binary Feature Extractor

Extracts identity and features from binaries. Reuses existing Scanner.Analyzers.Native capabilities.

```csharp
public interface IBinaryFeatureExtractor
{
    Task<BinaryIdentity> ExtractIdentityAsync(Stream binaryStream, CancellationToken ct);
    Task<BinaryFeatures> ExtractFeaturesAsync(Stream binaryStream, ExtractorOptions opts, CancellationToken ct);
}

public sealed record BinaryIdentity(
    string Format,              // elf, pe, macho
    string? BuildId,            // ELF GNU Build-ID
    string? PeCodeViewGuid,     // PE CodeView GUID + Age
    string? MachoUuid,          // Mach-O LC_UUID
    string FileSha256,
    string TextSectionSha256);

public sealed record BinaryFeatures(
    BinaryIdentity Identity,
    string[] DynamicDeps,       // DT_NEEDED
    string[] ExportedSymbols,
    string[] ImportedSymbols,
    BinaryHardening Hardening);
```
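Two of the `BinaryIdentity` fields can be derived without a full parser: `Format` from the file's magic bytes and `FileSha256` from a plain stream hash. The sketch below illustrates only those two steps. The class and method names are hypothetical, and Build-ID, CodeView GUID, and LC_UUID extraction (which the document delegates to Scanner.Analyzers.Native) are deliberately left out.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Security.Cryptography;

public static class BinaryIdentitySketch
{
    // Returns "elf", "pe", "macho", or null when the magic bytes are not recognised.
    public static string? DetectFormat(ReadOnlySpan<byte> header)
    {
        // ELF: 0x7F 'E' 'L' 'F'
        if (header.Length >= 4 && header[0] == 0x7F && header[1] == (byte)'E'
            && header[2] == (byte)'L' && header[3] == (byte)'F')
        {
            return "elf";
        }

        // PE files start with the DOS stub signature 'M' 'Z'.
        if (header.Length >= 2 && header[0] == (byte)'M' && header[1] == (byte)'Z')
        {
            return "pe";
        }

        if (header.Length >= 4)
        {
            uint magic = BinaryPrimitives.ReadUInt32BigEndian(header);
            // Thin Mach-O (32/64-bit, either byte order) and fat/universal binaries.
            // Note: 0xCAFEBABE is also the Java class-file magic, so a real
            // extractor would need a further check before trusting this branch.
            if (magic is 0xFEEDFACE or 0xFEEDFACF or 0xCEFAEDFE or 0xCFFAEDFE or 0xCAFEBABE)
            {
                return "macho";
            }
        }

        return null;
    }

    // Lower-case hex SHA-256 of the whole stream, read from its current position.
    public static string Sha256Hex(Stream stream)
    {
        using var sha = SHA256.Create();
        return Convert.ToHexString(sha.ComputeHash(stream)).ToLowerInvariant();
    }
}
```

A caller would typically read the first few bytes for `DetectFormat`, rewind the stream, and then call `Sha256Hex`; whether the real extractor works this way is not stated in this document.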
#### 2.2.3 Fix Index Builder

Builds the patch-aware CVE fix index from distro sources.

```csharp
public interface IFixIndexBuilder
{
    Task BuildIndexAsync(DistroRelease distro, CancellationToken ct);
    Task<FixRecord?> GetFixRecordAsync(string distro, string release, string sourcePkg, string cveId, CancellationToken ct);
}

public sealed record FixRecord(
    string Distro,
    string Release,
    string SourcePkg,
    string CveId,
    FixState State,             // See FixState
    string? FixedVersion,       // Distro version string
    FixMethod Method,           // How the fix was determined; see FixMethod
    decimal Confidence,         // 0.00-1.00
    FixEvidence Evidence);

public enum FixState { Fixed, Vulnerable, NotAffected, Wontfix, Unknown }
public enum FixMethod { SecurityFeed, Changelog, PatchHeader, UpstreamPatchMatch }
```

#### 2.2.4 Fingerprint Generator

Generates function-level fingerprints for vulnerable code detection.

```csharp
public interface IVulnFingerprintGenerator
{
    Task<ImmutableArray<VulnFingerprint>> GenerateAsync(
        string cveId,
        BinaryPair vulnAndFixed,    // Reference builds
        FingerprintOptions opts,
        CancellationToken ct);
}

public sealed record VulnFingerprint(
    string CveId,
    string Component,           // e.g., openssl
    string Architecture,        // x86-64, aarch64
    FingerprintType Type,       // See FingerprintType
    string FingerprintId,       // e.g., "bb-abc123..."
    byte[] FingerprintHash,     // 16-32 bytes
    string? FunctionHint,       // Function name if known
    decimal Confidence,
    FingerprintEvidence Evidence);

public enum FingerprintType { BasicBlock, ControlFlowGraph, StringReferences, Combined }
```
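To make the `FingerprintHash` and `FingerprintId` fields concrete, here is a deliberately simplified sketch that hashes the raw bytes of one basic block and truncates the digest to 16 bytes. This document does not specify the fingerprinting algorithm, so everything here is an assumption: the use of SHA-256, the truncation length, the `bb-` prefix convention inferred from the example ID, and the class name. A production fingerprint would normally normalise addresses and operands before hashing, which this sketch does not attempt.

```csharp
using System;
using System.Security.Cryptography;

public static class FingerprintSketch
{
    // Hashes raw basic-block bytes with SHA-256 and keeps the first 16 bytes,
    // the lower bound of the 16-32 byte range noted on FingerprintHash.
    public static byte[] HashBasicBlock(ReadOnlySpan<byte> blockBytes)
    {
        Span<byte> full = stackalloc byte[32];
        SHA256.HashData(blockBytes, full);
        return full[..16].ToArray();
    }

    // Builds an ID of the form "bb-<hex>", mirroring the "bb-abc123..." example.
    public static string ToFingerprintId(byte[] hash)
        => "bb-" + Convert.ToHexString(hash).ToLowerInvariant();
}
```

Because the input is hashed byte for byte, the same block always yields the same ID, which is the determinism property the unit tests in section 9.1 call for.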
#### 2.2.5 Binary Vulnerability Service

Main query interface for consumers.

```csharp
public interface IBinaryVulnerabilityService
{
    /// <summary>
    /// Look up vulnerabilities by Build-ID or equivalent binary identity.
    /// </summary>
    Task<ImmutableArray<BinaryVulnMatch>> LookupByIdentityAsync(
        BinaryIdentity identity,
        LookupOptions? opts = null,
        CancellationToken ct = default);

    /// <summary>
    /// Look up vulnerabilities by function fingerprint.
    /// </summary>
    Task<ImmutableArray<BinaryVulnMatch>> LookupByFingerprintAsync(
        CodeFingerprint fingerprint,
        decimal minSimilarity = 0.95m,
        CancellationToken ct = default);

    /// <summary>
    /// Batch lookup for scan performance.
    /// </summary>
    Task<ImmutableDictionary<string, ImmutableArray<BinaryVulnMatch>>> LookupBatchAsync(
        IEnumerable<BinaryIdentity> identities,
        LookupOptions? opts = null,
        CancellationToken ct = default);

    /// <summary>
    /// Get distro-specific fix status (patch-aware).
    /// </summary>
    Task<FixRecord?> GetFixStatusAsync(
        string distro,
        string release,
        string sourcePkg,
        string cveId,
        CancellationToken ct = default);
}

public sealed record BinaryVulnMatch(
    string CveId,
    string VulnerablePurl,
    MatchMethod Method,         // buildid_catalog, fingerprint_match, range_match
    decimal Confidence,
    MatchEvidence Evidence);

public enum MatchMethod { BuildIdCatalog, FingerprintMatch, RangeMatch }
```
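`LookupBatchAsync` returns a dictionary keyed by string, but this document does not say which identity field supplies the key or how callers should size batches. The sketch below shows one plausible caller-side preparation step under stated assumptions: the key is the strongest available identifier with the file hash as a fallback, duplicates are dropped, and the batch size of 100 mirrors `lookup.batch_size` in the configuration section. `BatchLookupSketch`, `KeyFor`, and `ToBatches` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Copied from the record shown in section 2.2.2 so the sketch is self-contained.
public sealed record BinaryIdentity(
    string Format,
    string? BuildId,
    string? PeCodeViewGuid,
    string? MachoUuid,
    string FileSha256,
    string TextSectionSha256);

public static class BatchLookupSketch
{
    // Assumed key choice: the format-specific identifier if present,
    // otherwise the whole-file hash.
    public static string KeyFor(BinaryIdentity id) =>
        id.BuildId ?? id.PeCodeViewGuid ?? id.MachoUuid ?? id.FileSha256;

    // Removes identities that share a key, then splits the rest into
    // arrays of at most batchSize elements.
    public static IEnumerable<BinaryIdentity[]> ToBatches(
        IEnumerable<BinaryIdentity> identities, int batchSize = 100)
        => identities.DistinctBy(KeyFor).Chunk(batchSize);
}
```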
---

## 3. Data Model

### 3.1 PostgreSQL Schema (`binaries`)

The `binaries` schema stores binary identity, fingerprint, and match data.

```sql
CREATE SCHEMA IF NOT EXISTS binaries;
CREATE SCHEMA IF NOT EXISTS binaries_app;

-- RLS helper
CREATE OR REPLACE FUNCTION binaries_app.require_current_tenant()
RETURNS TEXT LANGUAGE plpgsql STABLE SECURITY DEFINER AS $$
DECLARE v_tenant TEXT;
BEGIN
    v_tenant := current_setting('app.tenant_id', true);
    IF v_tenant IS NULL OR v_tenant = '' THEN
        RAISE EXCEPTION 'app.tenant_id session variable not set';
    END IF;
    RETURN v_tenant;
END;
$$;
```

#### 3.1.1 Core Tables

See `docs/db/schemas/binaries_schema_specification.md` for complete DDL.

**Key Tables:**

| Table | Purpose |
|-------|---------|
| `binaries.binary_identity` | Known binary identities (Build-ID, hashes) |
| `binaries.binary_package_map` | Binary → package mapping per snapshot |
| `binaries.vulnerable_buildids` | Build-IDs known to be vulnerable |
| `binaries.vulnerable_fingerprints` | Function fingerprints for CVEs |
| `binaries.cve_fix_index` | Patch-aware fix status per distro |
| `binaries.fingerprint_matches` | Match results (findings evidence) |
| `binaries.corpus_snapshots` | Corpus ingestion tracking |

### 3.2 RustFS Layout

```
rustfs://stellaops/binaryindex/
    fingerprints/<algorithm>/<prefix>/<fingerprint_id>.bin
    corpus/<distro>/<release>/<snapshot_id>/manifest.json
    corpus/<distro>/<release>/<snapshot_id>/packages/<pkg>.metadata.json
    evidence/<match_id>.dsse.json
```
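The layout above can be expressed as small key-building helpers. This is a sketch only: the layout does not define how `<prefix>` is derived, so the code assumes it is the first two characters of the fingerprint ID after any type tag such as `bb-`. The `RustFsKeys` class and its method names are hypothetical.

```csharp
using System;

public static class RustFsKeys
{
    // fingerprints/<algorithm>/<prefix>/<fingerprint_id>.bin
    // Assumption: <prefix> is the leading characters of the ID's digest part,
    // i.e. the text after the first '-' when a type tag like "bb-" is present.
    public static string FingerprintKey(string algorithm, string fingerprintId, int prefixLength = 2)
    {
        int dash = fingerprintId.IndexOf('-');
        string digest = dash >= 0 ? fingerprintId[(dash + 1)..] : fingerprintId;
        if (digest.Length < prefixLength)
        {
            throw new ArgumentException("Fingerprint ID is too short to shard.", nameof(fingerprintId));
        }
        return $"fingerprints/{algorithm}/{digest[..prefixLength]}/{fingerprintId}.bin";
    }

    // corpus/<distro>/<release>/<snapshot_id>/manifest.json
    public static string CorpusManifestKey(string distro, string release, string snapshotId)
        => $"corpus/{distro}/{release}/{snapshotId}/manifest.json";

    // evidence/<match_id>.dsse.json
    public static string EvidenceKey(string matchId)
        => $"evidence/{matchId}.dsse.json";
}
```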
---

## 4. Integration Points

### 4.1 Scanner.Worker Integration

During container scanning, Scanner.Worker queries BinaryIndex for each extracted binary:

```mermaid
sequenceDiagram
    participant SW as Scanner.Worker
    participant BI as BinaryIndex
    participant PG as PostgreSQL
    participant FL as Findings Ledger

    SW->>SW: Extract binary from layer
    SW->>SW: Compute BinaryIdentity
    SW->>BI: LookupByIdentityAsync(identity)
    BI->>PG: Query binaries.vulnerable_buildids
    PG-->>BI: Matches
    BI->>PG: Query binaries.cve_fix_index (if distro known)
    PG-->>BI: Fix status
    BI-->>SW: BinaryVulnMatch[]
    SW->>FL: RecordFinding(match, evidence)
```

### 4.2 Concelier Integration

BinaryIndex subscribes to Concelier's advisory updates:

```mermaid
sequenceDiagram
    participant CO as Concelier
    participant BI as BinaryIndex
    participant PG as PostgreSQL

    CO->>CO: Ingest new advisory
    CO->>BI: advisory.created event
    BI->>BI: Check if affected packages in corpus
    BI->>PG: Update binaries.binary_vuln_assertion
    BI->>BI: Queue fingerprint generation (if high-impact)
```

### 4.3 Policy Integration

Binary matches are recorded as proof segments:

```json
{
  "segment_type": "binary_fingerprint_evidence",
  "payload": {
    "binary_identity": {
      "format": "elf",
      "build_id": "abc123...",
      "file_sha256": "def456..."
    },
    "matches": [
      {
        "cve_id": "CVE-2024-1234",
        "method": "buildid_catalog",
        "confidence": 0.98,
        "vulnerable_purl": "pkg:deb/debian/libssl3@1.1.1n-0+deb11u3"
      }
    ]
  }
}
```
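A proof segment with this shape could be produced from typed records using `System.Text.Json`. The sketch below is illustrative: the record names (`ProofSegment`, `ProofPayload`, `MatchPayload`, `BinaryIdentityPayload`) are hypothetical, and only the JSON property names are taken from the example above.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

public sealed record BinaryIdentityPayload(
    [property: JsonPropertyName("format")] string Format,
    [property: JsonPropertyName("build_id")] string? BuildId,
    [property: JsonPropertyName("file_sha256")] string FileSha256);

public sealed record MatchPayload(
    [property: JsonPropertyName("cve_id")] string CveId,
    [property: JsonPropertyName("method")] string Method,
    [property: JsonPropertyName("confidence")] decimal Confidence,
    [property: JsonPropertyName("vulnerable_purl")] string VulnerablePurl);

public sealed record ProofPayload(
    [property: JsonPropertyName("binary_identity")] BinaryIdentityPayload BinaryIdentity,
    [property: JsonPropertyName("matches")] IReadOnlyList<MatchPayload> Matches);

public sealed record ProofSegment(
    [property: JsonPropertyName("segment_type")] string SegmentType,
    [property: JsonPropertyName("payload")] ProofPayload Payload);

public static class ProofSegmentSketch
{
    // Serializes to compact JSON; property names come from the attributes above.
    public static string Serialize(ProofSegment segment)
        => JsonSerializer.Serialize(segment);

    // Builds a segment equivalent to the documented example for one match.
    public static ProofSegment ForBuildIdMatch(
        string buildId, string fileSha256, string cveId, decimal confidence, string purl)
        => new(
            "binary_fingerprint_evidence",
            new ProofPayload(
                new BinaryIdentityPayload("elf", buildId, fileSha256),
                new[] { new MatchPayload(cveId, "buildid_catalog", confidence, purl) }));
}
```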
---

## 5. MVP Roadmap

### MVP 1: Known-Build Binary Catalog (Sprint 6000.0001)

**Goal:** Query "is this Build-ID vulnerable?" with distro-level precision.

**Deliverables:**
- `binaries` PostgreSQL schema
- Build-ID to package mapping tables
- Basic CVE lookup by binary identity
- Debian/Ubuntu corpus connector

### MVP 2: Patch-Aware Backport Handling (Sprint 6000.0002)

**Goal:** Handle "version says vulnerable but distro backported the fix."

**Deliverables:**
- Fix index builder (changelog + patch header parsing)
- Distro-specific version comparison
- RPM corpus connector
- Scanner.Worker integration

### MVP 3: Binary Fingerprint Factory (Sprint 6000.0003)

**Goal:** Detect vulnerable code independent of package metadata.

**Deliverables:**
- Fingerprint storage and matching
- Reference build generation pipeline
- Fingerprint validation corpus
- High-impact CVE coverage (OpenSSL, glibc, zlib, curl)

### MVP 4: Full Scanner Integration (Sprint 6000.0004)

**Goal:** Binary evidence in production scans.

**Deliverables:**
- Scanner.Worker binary lookup integration
- Findings Ledger binary match records
- Proof segment attestations
- CLI binary match inspection

---

## 5b. Fix Evidence Chain

The **Fix Evidence Chain** provides auditable proof of why a CVE is marked as fixed (or not) for a specific distro/package combination. This is critical for patch-aware backport handling, where package versions can be misleading.

### 5b.1 Evidence Sources

| Source | Confidence | Description |
|--------|------------|-------------|
| **Security Feed (OVAL)** | 0.95-0.99 | Authoritative feed from distro (Debian Security Tracker, Red Hat OVAL) |
| **Patch Header (DEP-3)** | 0.87-0.95 | CVE reference in Debian/Ubuntu patch metadata |
| **Changelog** | 0.75-0.85 | CVE mention in debian/changelog or RPM %changelog |
| **Upstream Patch Match** | 0.90 | Binary diff matches known upstream fix |

### 5b.2 Evidence Storage

Evidence is stored in two PostgreSQL tables. `binaries.fix_evidence` is created first because `binaries.cve_fix_index` holds a foreign key to it:

```sql
-- Evidence blobs: audit trail
CREATE TABLE binaries.fix_evidence (
    id              UUID PRIMARY KEY,
    tenant_id       TEXT NOT NULL,
    evidence_type   TEXT NOT NULL,      -- changelog, patch_header, security_feed
    source_file     TEXT,               -- Path to source file (changelog, patch)
    source_sha256   TEXT,               -- Hash of source file
    excerpt         TEXT,               -- Relevant snippet (max 1KB)
    metadata        JSONB NOT NULL,     -- Structured metadata
    snapshot_id     UUID,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Fix index: one row per (distro, release, source_pkg, cve_id)
CREATE TABLE binaries.cve_fix_index (
    id              UUID PRIMARY KEY,
    tenant_id       TEXT NOT NULL,
    distro          TEXT NOT NULL,      -- debian, ubuntu, alpine, rhel
    release         TEXT NOT NULL,      -- bookworm, jammy, v3.19
    source_pkg      TEXT NOT NULL,
    cve_id          TEXT NOT NULL,
    state           TEXT NOT NULL,      -- fixed, vulnerable, not_affected, wontfix, unknown
    fixed_version   TEXT,
    method          TEXT NOT NULL,      -- security_feed, changelog, patch_header, upstream_match
    confidence      DECIMAL(3,2) NOT NULL,
    evidence_id     UUID REFERENCES binaries.fix_evidence(id),
    snapshot_id     UUID,
    indexed_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (tenant_id, distro, release, source_pkg, cve_id)
);
```

### 5b.3 Evidence Types

**ChangelogEvidence:**
```json
{
  "evidence_type": "changelog",
  "source_file": "debian/changelog",
  "excerpt": "* Fix CVE-2024-0727: PKCS12 decoding crash",
  "metadata": {
    "version": "3.0.11-1~deb12u2",
    "line_number": 5
  }
}
```

**PatchHeaderEvidence:**
```json
{
  "evidence_type": "patch_header",
  "source_file": "debian/patches/CVE-2024-0727.patch",
  "excerpt": "CVE: CVE-2024-0727\nOrigin: upstream, https://github.com/openssl/commit/abc123",
  "metadata": {
    "patch_sha256": "abc123def456..."
  }
}
```

**SecurityFeedEvidence:**
```json
{
  "evidence_type": "security_feed",
  "metadata": {
    "feed_id": "debian-security-tracker",
    "entry_id": "DSA-5678-1",
    "published_at": "2024-01-15T10:00:00Z"
  }
}
```
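As an illustration of where a `ChangelogEvidence` record might come from, the sketch below scans changelog text for CVE identifiers and captures the line number and a trimmed excerpt. It is not the `DebianChangelogParser` named in section 5b.5: the `ChangelogHit` record and `FindCveMentions` method are hypothetical, version-stanza tracking is omitted, and a CVE mention on its own does not prove that the entry describes a fix.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public sealed record ChangelogHit(string CveId, int LineNumber, string Excerpt);

public static class ChangelogScanSketch
{
    // CVE IDs are "CVE-", a four-digit year, and a sequence of four or more digits.
    private static readonly Regex CveIdPattern =
        new(@"\bCVE-\d{4}-\d{4,}\b", RegexOptions.Compiled | RegexOptions.CultureInvariant);

    // Excerpts are capped at 1024 characters, approximating the 1KB limit
    // noted on the fix_evidence.excerpt column.
    private const int MaxExcerptLength = 1024;

    public static IReadOnlyList<ChangelogHit> FindCveMentions(string changelogText)
    {
        var hits = new List<ChangelogHit>();
        string[] lines = changelogText.Split('\n');

        for (int i = 0; i < lines.Length; i++)
        {
            string line = lines[i].Trim();
            foreach (Match match in CveIdPattern.Matches(line))
            {
                string excerpt = line.Length > MaxExcerptLength ? line[..MaxExcerptLength] : line;
                hits.Add(new ChangelogHit(match.Value, i + 1, excerpt)); // 1-based line number
            }
        }
        return hits;
    }
}
```

A real parser would also associate each hit with the version stanza it appears under, since that is what populates `metadata.version` in the example above.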
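### 5b.4 Confidence Resolution

When multiple evidence sources exist for the same (tenant, distro, release, source package, CVE) key, the system keeps the **highest confidence** entry. In PostgreSQL the incoming row is referenced as `EXCLUDED`, and the stored row through the table alias:

```sql
-- Tail of: INSERT INTO binaries.cve_fix_index AS existing (...) VALUES (...)
ON CONFLICT (tenant_id, distro, release, source_pkg, cve_id)
DO UPDATE SET
    confidence = GREATEST(existing.confidence, EXCLUDED.confidence),
    state = CASE
        WHEN existing.confidence < EXCLUDED.confidence THEN EXCLUDED.state
        ELSE existing.state
    END,
    fixed_version = CASE
        WHEN existing.confidence < EXCLUDED.confidence THEN EXCLUDED.fixed_version
        ELSE existing.fixed_version
    END,
    method = CASE
        WHEN existing.confidence < EXCLUDED.confidence THEN EXCLUDED.method
        ELSE existing.method
    END,
    evidence_id = CASE
        WHEN existing.confidence < EXCLUDED.confidence THEN EXCLUDED.evidence_id
        ELSE existing.evidence_id
    END
```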
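The same resolution rule can be sketched in memory, which is handy for unit-testing parser output before it reaches the database. This is an assumption-laden illustration rather than the module's code: `FixCandidate` is a hypothetical, trimmed-down stand-in for `FixRecord`, and the tenant dimension is omitted for brevity.

```csharp
using System.Collections.Generic;

public enum FixState { Fixed, Vulnerable, NotAffected, Wontfix, Unknown }
public enum FixMethod { SecurityFeed, Changelog, PatchHeader, UpstreamPatchMatch }

// Hypothetical reduced form of FixRecord, enough to show the rule.
public sealed record FixCandidate(
    string Distro,
    string Release,
    string SourcePkg,
    string CveId,
    FixState State,
    FixMethod Method,
    decimal Confidence);

public static class ConfidenceResolutionSketch
{
    // Keeps one candidate per (distro, release, source package, CVE) key:
    // the one with the highest confidence. The comparison is strictly
    // "less than", so on a tie the earlier candidate is kept, matching
    // the CASE expressions in the upsert.
    public static List<FixCandidate> Resolve(IEnumerable<FixCandidate> candidates)
    {
        var best = new Dictionary<(string, string, string, string), FixCandidate>();
        foreach (var candidate in candidates)
        {
            var key = (candidate.Distro, candidate.Release, candidate.SourcePkg, candidate.CveId);
            if (!best.TryGetValue(key, out var current) || current.Confidence < candidate.Confidence)
            {
                best[key] = candidate;
            }
        }
        return new List<FixCandidate>(best.Values);
    }
}
```

For example, a changelog hit at 0.80 followed by a patch-header hit at 0.87 for the same key resolves to the patch-header entry, which is the result shown in the query flow in section 5b.6.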
### 5b.5 Parsers

The following parsers extract CVE fix information:

| Parser | Distros | Input | Confidence |
|--------|---------|-------|------------|
| `DebianChangelogParser` | Debian, Ubuntu | debian/changelog | 0.80 |
| `PatchHeaderParser` | Debian, Ubuntu | debian/patches/*.patch (DEP-3) | 0.87 |
| `AlpineSecfixesParser` | Alpine | APKBUILD secfixes block | 0.95 |
| `RpmChangelogParser` | RHEL, Fedora, CentOS | RPM spec %changelog | 0.75 |

### 5b.6 Query Flow

```mermaid
sequenceDiagram
    participant SW as Scanner.Worker
    participant BVS as BinaryVulnerabilityService
    participant FIR as FixIndexRepository
    participant PG as PostgreSQL

    SW->>BVS: GetFixStatusAsync(debian, bookworm, openssl, CVE-2024-0727)
    BVS->>FIR: GetFixStatusAsync(...)
    FIR->>PG: SELECT FROM cve_fix_index WHERE ...
    PG-->>FIR: FixIndexEntry (state=fixed, confidence=0.87)
    FIR-->>BVS: FixStatusResult
    BVS-->>SW: {state: Fixed, confidence: 0.87, method: PatchHeader}
```
---

## 6. Security Considerations

### 6.1 Trust Boundaries

1. **Corpus Ingestion** - Packages are untrusted; extraction runs in sandboxed workers
2. **Fingerprint Generation** - Reference builds compiled in isolated environments
3. **Query API** - Tenant-isolated via RLS; no cross-tenant data leakage

### 6.2 Signing & Provenance

- All corpus snapshots are signed (DSSE)
- Fingerprint sets are versioned and signed
- Every match result references evidence digests

### 6.3 Sandbox Requirements

Binary extraction and fingerprint generation MUST run with:
- Seccomp profile restricting syscalls
- Read-only root filesystem
- No network access during analysis
- Memory/CPU limits

---

## 7. Observability

### 7.1 Metrics

| Metric | Type | Labels |
|--------|------|--------|
| `binaryindex_lookup_total` | Counter | method, result |
| `binaryindex_lookup_latency_ms` | Histogram | method |
| `binaryindex_corpus_packages_total` | Gauge | distro, release |
| `binaryindex_fingerprints_indexed` | Gauge | algorithm, component |
| `binaryindex_match_confidence` | Histogram | method |

### 7.2 Traces

- `binaryindex.lookup` - Full lookup span
- `binaryindex.corpus.ingest` - Corpus ingestion
- `binaryindex.fingerprint.generate` - Fingerprint generation

---

## 8. Configuration

```yaml
# binaryindex.yaml
binaryindex:
  enabled: true

  corpus:
    connectors:
      - type: debian
        enabled: true
        mirror: http://deb.debian.org/debian
        releases: [bookworm, bullseye]
        architectures: [amd64, arm64]
      - type: ubuntu
        enabled: true
        mirror: http://archive.ubuntu.com/ubuntu
        releases: [jammy, noble]

  fingerprinting:
    enabled: true
    algorithms: [basic_block, cfg]
    target_components:
      - openssl
      - glibc
      - zlib
      - curl
      - sqlite
    min_function_size: 16  # bytes
    max_functions_per_binary: 10000

  lookup:
    cache_ttl: 3600
    batch_size: 100
    timeout_ms: 5000

  storage:
    postgres_schema: binaries
    rustfs_bucket: stellaops/binaryindex
```
---

## 9. Testing Strategy

### 9.1 Unit Tests

- Identity extraction (Build-ID, hashes)
- Fingerprint generation determinism
- Fix index parsing (changelog, patch headers)

### 9.2 Integration Tests

- PostgreSQL schema validation
- Full corpus ingestion flow
- Scanner.Worker lookup integration

### 9.3 Regression Tests

- Known CVE detection (golden corpus)
- Backport handling (Debian libssl example)
- False positive rate validation

---

## 10. References

- Advisory: `docs/product-advisories/21-Dec-2025 - Mapping Evidence Within Compiled Binaries.md`
- Scanner Native Analysis: `src/Scanner/StellaOps.Scanner.Analyzers.Native/`
- Existing Fingerprinting: `src/Scanner/__Libraries/StellaOps.Scanner.EntryTrace/Binary/`
- Build-ID Index: `src/Scanner/StellaOps.Scanner.Analyzers.Native/Index/`

---

*Document Version: 1.0.0*
*Last Updated: 2025-12-21*