# BinaryIndex Module Architecture

> **Ownership:** Scanner Guild + Concelier Guild
> **Status:** DRAFT
> **Version:** 1.0.0
> **Related:** [High-Level Architecture](../../ARCHITECTURE_OVERVIEW.md), [Scanner Architecture](../scanner/architecture.md), [Concelier Architecture](../concelier/architecture.md)

---

## 1. Overview

The **BinaryIndex** module provides a vulnerable binaries database that enables detection of vulnerable code at the binary level, independent of package metadata. This addresses a critical gap in vulnerability scanning: package version strings can lie (backports, custom builds, stripped metadata), but **binary identity doesn't lie**.

### 1.1 Problem Statement

Traditional vulnerability scanners rely on package version matching, which fails in several scenarios:

1. **Backported patches** - Distros backport security fixes without changing the upstream version
2. **Custom/vendored builds** - Binaries compiled from source without package metadata
3. **Stripped binaries** - Debug info and version strings removed
4. **Static linking** - Vulnerable library code embedded in the final binary
5. **Container base images** - Distroless or scratch images with no package DB

### 1.2 Solution: Binary-First Vulnerability Detection

BinaryIndex provides three tiers of binary identification:

| Tier | Method | Precision | Coverage |
|------|--------|-----------|----------|
| A | Package/version range matching | Medium | High |
| B | Build-ID/hash catalog (exact binary identity) | High | Medium |
| C | Function fingerprints (CFG/basic-block hashes) | Very High | Targeted |

### 1.3 Module Scope

**In Scope:**
- Binary identity extraction (Build-ID, PE CodeView GUID, Mach-O UUID)
- Binary-to-advisory mapping database
- Fingerprint storage and matching engine
- Fix index for patch-aware backport handling
- Integration with Scanner.Worker for binary lookup

**Out of Scope:**
- Binary disassembly/analysis (provided by Scanner.Analyzers.Native)
- Runtime binary tracing (provided by Zastava)
- SBOM generation (provided by Scanner)

---

## 2. Architecture

### 2.1 System Context

```
┌──────────────────────────────────────────────────────────────────────────┐
│                             External Systems                             │
│  ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐         │
│  │ Distro Repos    │   │ Debug Symbol    │   │ Upstream Source │         │
│  │ (Debian, RPM,   │   │ Servers         │   │ (GitHub, etc.)  │         │
│  │  Alpine)        │   │ (debuginfod)    │   │                 │         │
│  └────────┬────────┘   └────────┬────────┘   └────────┬────────┘         │
└───────────│─────────────────────│─────────────────────│──────────────────┘
            │                     │                     │
            v                     v                     v
┌──────────────────────────────────────────────────────────────────────────┐
│                            BinaryIndex Module                            │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │                        Corpus Ingestion Layer                        │ │
│ │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐                │ │
│ │  │ DebianCorpus │  │ RpmCorpus    │  │ AlpineCorpus │                │ │
│ │  │ Connector    │  │ Connector    │  │ Connector    │                │ │
│ │  └──────────────┘  └──────────────┘  └──────────────┘                │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│                                    │                                     │
│                                    v                                     │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │                           Processing Layer                           │ │
│ │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐                │ │
│ │  │ BinaryFeature│  │ FixIndex     │  │ Fingerprint  │                │ │
│ │  │ Extractor    │  │ Builder      │  │ Generator    │                │ │
│ │  └──────────────┘  └──────────────┘  └──────────────┘                │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│                                    │                                     │
│                                    v                                     │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │                            Storage Layer                             │ │
│ │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐                │ │
│ │  │ PostgreSQL   │  │ RustFS       │  │ Valkey       │                │ │
│ │  │ (binaries    │  │ (fingerprint │  │ (lookup      │                │ │
│ │  │  schema)     │  │  blobs)      │  │  cache)      │                │ │
│ │  └──────────────┘  └──────────────┘  └──────────────┘                │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│                                    │                                     │
│                                    v                                     │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │                             Query Layer                              │ │
│ │  ┌────────────────────────────────────────────────────────────────┐  │ │
│ │  │ IBinaryVulnerabilityService                                    │  │ │
│ │  │ - LookupByIdentityAsync(identity)                              │  │ │
│ │  │ - LookupByFingerprintAsync(fingerprint)                        │  │ │
│ │  │ - LookupBatchAsync(identities)                                 │  │ │
│ │  │ - GetFixStatusAsync(distro, release, sourcePkg, cve)           │  │ │
│ │  └────────────────────────────────────────────────────────────────┘  │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     v
┌──────────────────────────────────────────────────────────────────────────┐
│                            Consuming Modules                             │
│  ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐         │
│  │ Scanner.Worker  │   │ Policy Engine   │   │ Findings Ledger │         │
│  │ (binary lookup  │   │ (evidence in    │   │ (match records) │         │
│  │  during scan)   │   │  proof chain)   │   │                 │         │
│  └─────────────────┘   └─────────────────┘   └─────────────────┘         │
└──────────────────────────────────────────────────────────────────────────┘
```

### 2.2 Component Breakdown

#### 2.2.1 Corpus Connectors

Plugin-based connectors that ingest binaries from distribution repositories.

```csharp
public interface IBinaryCorpusConnector
{
    string ConnectorId { get; }
    string[] SupportedDistros { get; }

    Task FetchSnapshotAsync(CorpusQuery query, CancellationToken ct);
    Task> ExtractBinariesAsync(PackageReference pkg, CancellationToken ct);
}
```

**Implementations:**
- `DebianBinaryCorpusConnector` - Debian/Ubuntu packages + debuginfo
- `RpmBinaryCorpusConnector` - RHEL/Fedora/CentOS + SRPM
- `AlpineBinaryCorpusConnector` - Alpine APK + APKBUILD

#### 2.2.2 Binary Feature Extractor

Extracts identity and features from binaries. Reuses existing Scanner.Analyzers.Native capabilities.

```csharp
public interface IBinaryFeatureExtractor
{
    Task ExtractIdentityAsync(Stream binaryStream, CancellationToken ct);
    Task ExtractFeaturesAsync(Stream binaryStream, ExtractorOptions opts, CancellationToken ct);
}

public sealed record BinaryIdentity(
    string Format,            // elf, pe, macho
    string? BuildId,          // ELF GNU Build-ID
    string? PeCodeViewGuid,   // PE CodeView GUID + Age
    string? MachoUuid,        // Mach-O LC_UUID
    string FileSha256,
    string TextSectionSha256);

public sealed record BinaryFeatures(
    BinaryIdentity Identity,
    string[] DynamicDeps,     // DT_NEEDED
    string[] ExportedSymbols,
    string[] ImportedSymbols,
    BinaryHardening Hardening);
```

#### 2.2.3 Fix Index Builder

Builds the patch-aware CVE fix index from distro sources.

```csharp
public interface IFixIndexBuilder
{
    Task BuildIndexAsync(DistroRelease distro, CancellationToken ct);
    Task GetFixRecordAsync(string distro, string release, string sourcePkg, string cveId, CancellationToken ct);
}

public sealed record FixRecord(
    string Distro,
    string Release,
    string SourcePkg,
    string CveId,
    FixState State,           // fixed, vulnerable, not_affected, wontfix, unknown
    string? FixedVersion,     // Distro version string
    FixMethod Method,         // security_feed, changelog, patch_header
    decimal Confidence,       // 0.00-1.00
    FixEvidence Evidence);

public enum FixState { Fixed, Vulnerable, NotAffected, Wontfix, Unknown }
public enum FixMethod { SecurityFeed, Changelog, PatchHeader, UpstreamPatchMatch }
```

#### 2.2.4 Fingerprint Generator

Generates function-level fingerprints for vulnerable code detection.

```csharp
public interface IVulnFingerprintGenerator
{
    Task> GenerateAsync(
        string cveId,
        BinaryPair vulnAndFixed,      // Reference builds
        FingerprintOptions opts,
        CancellationToken ct);
}

public sealed record VulnFingerprint(
    string CveId,
    string Component,         // e.g., openssl
    string Architecture,      // x86-64, aarch64
    FingerprintType Type,     // basic_block, cfg, combined
    string FingerprintId,     // e.g., "bb-abc123..."
    byte[] FingerprintHash,   // 16-32 bytes
    string? FunctionHint,     // Function name if known
    decimal Confidence,
    FingerprintEvidence Evidence);

public enum FingerprintType { BasicBlock, ControlFlowGraph, StringReferences, Combined }
```

#### 2.2.5 Semantic Analysis Library

> **Library:** `StellaOps.BinaryIndex.Semantic`
> **Sprint:** 20260105_001_001_BINDEX - Semantic Diffing Phase 1

The Semantic Analysis Library extends fingerprint generation with IR-level semantic matching, enabling detection of semantically equivalent code despite compiler optimizations, instruction reordering, and register allocation differences.

**Key Insight:** Traditional instruction-level fingerprinting loses roughly 15-20% accuracy on optimized binaries. Semantic analysis lifts to B2R2's Intermediate Representation (LowUIR), extracts key-semantics graphs, and uses graph hashing for similarity computation.

##### 2.2.5.1 Architecture

```
Binary Input
     │
     v
B2R2 Disassembly → Raw Instructions
     │
     v
IR Lifting Service → LowUIR Statements
     │
     v
Semantic Graph Extractor → Key-Semantics Graph (KSG)
     │
     v
Graph Fingerprinting → Semantic Fingerprint
     │
     v
Semantic Matcher → Similarity Score + Deltas
```

##### 2.2.5.2 Core Components

**IR Lifting Service** (`IIrLiftingService`)

Lifts disassembled instructions to B2R2 LowUIR:

```csharp
public interface IIrLiftingService
{
    Task LiftToIrAsync(
        IReadOnlyList instructions,
        string functionName,
        LiftOptions? options = null,
        CancellationToken ct = default);
}

public sealed record LiftedFunction(
    string Name,
    ImmutableArray Statements,
    ImmutableArray BasicBlocks);
```

**Semantic Graph Extractor** (`ISemanticGraphExtractor`)

Extracts key-semantics graphs capturing data dependencies, control flow, and memory operations:

```csharp
public interface ISemanticGraphExtractor
{
    Task ExtractGraphAsync(
        LiftedFunction function,
        GraphExtractionOptions? options = null,
        CancellationToken ct = default);
}

public sealed record KeySemanticsGraph(
    string FunctionName,
    ImmutableArray Nodes,
    ImmutableArray Edges,
    GraphProperties Properties);

public enum SemanticNodeType { Compute, Load, Store, Branch, Call, Return, Phi }
public enum SemanticEdgeType { DataDependency, ControlDependency, MemoryDependency }
```

**Semantic Fingerprint Generator** (`ISemanticFingerprintGenerator`)

Generates semantic fingerprints using Weisfeiler-Lehman graph hashing:

```csharp
public interface ISemanticFingerprintGenerator
{
    Task GenerateAsync(
        KeySemanticsGraph graph,
        SemanticFingerprintOptions? options = null,
        CancellationToken ct = default);
}

public sealed record SemanticFingerprint(
    string FunctionName,
    string GraphHashHex,          // WL graph hash (SHA-256)
    string OperationHashHex,      // Normalized operation sequence hash
    string DataFlowHashHex,       // Data dependency pattern hash
    int NodeCount,
    int EdgeCount,
    int CyclomaticComplexity,
    ImmutableArray ApiCalls,
    SemanticFingerprintAlgorithm Algorithm);
```

**Semantic Matcher** (`ISemanticMatcher`)

Computes semantic similarity with weighted components:

```csharp
public interface ISemanticMatcher
{
    Task MatchAsync(
        SemanticFingerprint a,
        SemanticFingerprint b,
        MatchOptions? options = null,
        CancellationToken ct = default);

    Task MatchWithDeltasAsync(
        SemanticFingerprint a,
        SemanticFingerprint b,
        MatchOptions? options = null,
        CancellationToken ct = default);
}

public sealed record SemanticMatchResult(
    decimal Similarity,           // 0.00-1.00
    decimal GraphSimilarity,
    decimal OperationSimilarity,
    decimal DataFlowSimilarity,
    decimal ApiCallSimilarity,
    MatchConfidence Confidence);
```

##### 2.2.5.3 Algorithm Details

**Weisfeiler-Lehman Graph Hashing:**
- 3 iterations of label propagation
- SHA-256 for final hash computation
- Deterministic node ordering via canonical sort

**Similarity Weights (Default):**

| Component | Weight |
|-----------|--------|
| Graph Hash | 0.35 |
| Operation Hash | 0.25 |
| Data Flow Hash | 0.25 |
| API Calls | 0.15 |

##### 2.2.5.4 Integration Points

The semantic library integrates with existing BinaryIndex components:

**DeltaSignatureGenerator Extension:**

```csharp
// Optional semantic services via constructor injection
services.AddDeltaSignaturesWithSemantic();

// Extended SymbolSignature with semantic properties
public sealed record SymbolSignature
{
    // ... existing properties ...
    public string? SemanticHashHex { get; init; }
    public ImmutableArray SemanticApiCalls { get; init; }
}
```

**PatchDiffEngine Extension:**

```csharp
// SemanticWeight in HashWeights
public decimal SemanticWeight { get; init; } = 0.2m;

// FunctionFingerprint extended with semantic fingerprint
public SemanticFingerprint? SemanticFingerprint { get; init; }
```

##### 2.2.5.5 Test Coverage

| Category | Tests | Coverage |
|----------|-------|----------|
| Unit Tests (IR lifting, graph extraction, hashing) | 53 | Core algorithms |
| Integration Tests (full pipeline) | 9 | End-to-end flow |
| Golden Corpus (compiler variations) | 11 | Register allocation, optimization, compiler variants |
| Benchmarks (accuracy, performance) | 7 | Baseline metrics |

##### 2.2.5.6 Current Baselines

> **Note:** Baselines reflect foundational implementation; accuracy improves as semantic features mature.
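
To make the similarity figures in this section concrete: the default weights in 2.2.5.3 sum to 1.0, which is consistent with the overall score being a weighted sum of the four component similarities. The source does not state the formula, so treat the following as a sketch under that assumption. The symbols are illustrative shorthand for the `GraphSimilarity`, `OperationSimilarity`, `DataFlowSimilarity`, and `ApiCallSimilarity` fields of `SemanticMatchResult`, and the input values are made up for the example.

```latex
% Assumed form: linear combination using the default weights from 2.2.5.3
S = 0.35\,s_{\text{graph}} + 0.25\,s_{\text{op}} + 0.25\,s_{\text{flow}} + 0.15\,s_{\text{api}}

% Hypothetical inputs: s_graph = 0.40, s_op = 0.70, s_flow = 0.60, s_api = 0.80
S = 0.35(0.40) + 0.25(0.70) + 0.25(0.60) + 0.15(0.80)
  = 0.140 + 0.175 + 0.150 + 0.120
  = 0.585
```

Under this reading, a pair with a weak graph-hash match can still reach a moderate overall score when the other three components agree.
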

| Metric | Baseline | Target |
|--------|----------|--------|
| Similarity (register allocation variants) | ≥0.55 | ≥0.85 |
| Overall accuracy | ≥40% | ≥70% |
| False positive rate | <10% | <5% |
| P95 fingerprint latency | <100ms | <50ms |

#### 2.2.6 Binary Vulnerability Service

Main query interface for consumers.

```csharp
public interface IBinaryVulnerabilityService
{
    /// <summary>
    /// Look up vulnerabilities by Build-ID or equivalent binary identity.
    /// </summary>
    Task> LookupByIdentityAsync(
        BinaryIdentity identity,
        LookupOptions? opts = null,
        CancellationToken ct = default);

    /// <summary>
    /// Look up vulnerabilities by function fingerprint.
    /// </summary>
    Task> LookupByFingerprintAsync(
        CodeFingerprint fingerprint,
        decimal minSimilarity = 0.95m,
        CancellationToken ct = default);

    /// <summary>
    /// Batch lookup for scan performance.
    /// </summary>
    Task>> LookupBatchAsync(
        IEnumerable identities,
        LookupOptions? opts = null,
        CancellationToken ct = default);

    /// <summary>
    /// Get distro-specific fix status (patch-aware).
    /// </summary>
    Task GetFixStatusAsync(
        string distro,
        string release,
        string sourcePkg,
        string cveId,
        CancellationToken ct = default);
}

public sealed record BinaryVulnMatch(
    string CveId,
    string VulnerablePurl,
    MatchMethod Method,       // buildid_catalog, fingerprint_match, range_match
    decimal Confidence,
    MatchEvidence Evidence);

public enum MatchMethod { BuildIdCatalog, FingerprintMatch, RangeMatch }
```

---

## 3. Data Model

### 3.1 PostgreSQL Schema (`binaries`)

The `binaries` schema stores binary identity, fingerprint, and match data.
```sql
CREATE SCHEMA IF NOT EXISTS binaries;
CREATE SCHEMA IF NOT EXISTS binaries_app;

-- RLS helper
CREATE OR REPLACE FUNCTION binaries_app.require_current_tenant()
RETURNS TEXT
LANGUAGE plpgsql STABLE SECURITY DEFINER
AS $$
DECLARE
    v_tenant TEXT;
BEGIN
    v_tenant := current_setting('app.tenant_id', true);
    IF v_tenant IS NULL OR v_tenant = '' THEN
        RAISE EXCEPTION 'app.tenant_id session variable not set';
    END IF;
    RETURN v_tenant;
END;
$$;
```

#### 3.1.1 Core Tables

See `docs/db/schemas/binaries_schema_specification.md` for complete DDL.

**Key Tables:**

| Table | Purpose |
|-------|---------|
| `binaries.binary_identity` | Known binary identities (Build-ID, hashes) |
| `binaries.binary_package_map` | Binary → package mapping per snapshot |
| `binaries.vulnerable_buildids` | Build-IDs known to be vulnerable |
| `binaries.vulnerable_fingerprints` | Function fingerprints for CVEs |
| `binaries.cve_fix_index` | Patch-aware fix status per distro |
| `binaries.fingerprint_matches` | Match results (findings evidence) |
| `binaries.corpus_snapshots` | Corpus ingestion tracking |

### 3.2 RustFS Layout

```
rustfs://stellaops/binaryindex/
  fingerprints///.bin
  corpus////manifest.json
  corpus////packages/.metadata.json
  evidence/.dsse.json
```

---

## 4. Integration Points

### 4.1 Scanner.Worker Integration

During container scanning, Scanner.Worker queries BinaryIndex for each extracted binary:

```mermaid
sequenceDiagram
    participant SW as Scanner.Worker
    participant BI as BinaryIndex
    participant PG as PostgreSQL
    participant FL as Findings Ledger

    SW->>SW: Extract binary from layer
    SW->>SW: Compute BinaryIdentity
    SW->>BI: LookupByIdentityAsync(identity)
    BI->>PG: Query binaries.vulnerable_buildids
    PG-->>BI: Matches
    BI->>PG: Query binaries.cve_fix_index (if distro known)
    PG-->>BI: Fix status
    BI-->>SW: BinaryVulnMatch[]
    SW->>FL: RecordFinding(match, evidence)
```

### 4.2 Concelier Integration

BinaryIndex subscribes to Concelier's advisory updates:

```mermaid
sequenceDiagram
    participant CO as Concelier
    participant BI as BinaryIndex
    participant PG as PostgreSQL

    CO->>CO: Ingest new advisory
    CO->>BI: advisory.created event
    BI->>BI: Check if affected packages in corpus
    BI->>PG: Update binaries.binary_vuln_assertion
    BI->>BI: Queue fingerprint generation (if high-impact)
```

### 4.3 Policy Integration

Binary matches are recorded as proof segments:

```json
{
  "segment_type": "binary_fingerprint_evidence",
  "payload": {
    "binary_identity": {
      "format": "elf",
      "build_id": "abc123...",
      "file_sha256": "def456..."
    },
    "matches": [
      {
        "cve_id": "CVE-2024-1234",
        "method": "buildid_catalog",
        "confidence": 0.98,
        "vulnerable_purl": "pkg:deb/debian/libssl3@1.1.1n-0+deb11u3"
      }
    ]
  }
}
```

---

## 5. MVP Roadmap

### MVP 1: Known-Build Binary Catalog (Sprint 6000.0001)

**Goal:** Query "is this Build-ID vulnerable?" with distro-level precision.

**Deliverables:**
- `binaries` PostgreSQL schema
- Build-ID to package mapping tables
- Basic CVE lookup by binary identity
- Debian/Ubuntu corpus connector

### MVP 2: Patch-Aware Backport Handling (Sprint 6000.0002)

**Goal:** Handle "version says vulnerable but distro backported the fix."

**Deliverables:**
- Fix index builder (changelog + patch header parsing)
- Distro-specific version comparison
- RPM corpus connector
- Scanner.Worker integration

### MVP 3: Binary Fingerprint Factory (Sprint 6000.0003)

**Goal:** Detect vulnerable code independent of package metadata.

**Deliverables:**
- Fingerprint storage and matching
- Reference build generation pipeline
- Fingerprint validation corpus
- High-impact CVE coverage (OpenSSL, glibc, zlib, curl)

### MVP 4: Full Scanner Integration (Sprint 6000.0004)

**Goal:** Binary evidence in production scans.

**Deliverables:**
- Scanner.Worker binary lookup integration
- Findings Ledger binary match records
- Proof segment attestations
- CLI binary match inspection

---

## 5b. Fix Evidence Chain

The **Fix Evidence Chain** provides auditable proof of why a CVE is marked as fixed (or not) for a specific distro/package combination. This is critical for patch-aware backport handling, where package versions can be misleading.

### 5b.1 Evidence Sources

| Source | Confidence | Description |
|--------|------------|-------------|
| **Security Feed (OVAL)** | 0.95-0.99 | Authoritative feed from distro (Debian Security Tracker, Red Hat OVAL) |
| **Patch Header (DEP-3)** | 0.87-0.95 | CVE reference in Debian/Ubuntu patch metadata |
| **Changelog** | 0.75-0.85 | CVE mention in debian/changelog or RPM %changelog |
| **Upstream Patch Match** | 0.90 | Binary diff matches known upstream fix |

### 5b.2 Evidence Storage

Evidence is stored in two PostgreSQL tables:

```sql
-- Evidence blobs: audit trail
-- (created first because cve_fix_index.evidence_id references it)
CREATE TABLE binaries.fix_evidence (
    id UUID PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    evidence_type TEXT NOT NULL,   -- changelog, patch_header, security_feed
    source_file TEXT,              -- Path to source file (changelog, patch)
    source_sha256 TEXT,            -- Hash of source file
    excerpt TEXT,                  -- Relevant snippet (max 1KB)
    metadata JSONB NOT NULL,       -- Structured metadata
    snapshot_id UUID,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Fix index: one row per (distro, release, source_pkg, cve_id)
CREATE TABLE binaries.cve_fix_index (
    id UUID PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    distro TEXT NOT NULL,          -- debian, ubuntu, alpine, rhel
    release TEXT NOT NULL,         -- bookworm, jammy, v3.19
    source_pkg TEXT NOT NULL,
    cve_id TEXT NOT NULL,
    state TEXT NOT NULL,           -- fixed, vulnerable, not_affected, wontfix, unknown
    fixed_version TEXT,
    method TEXT NOT NULL,          -- security_feed, changelog, patch_header, upstream_match
    confidence DECIMAL(3,2) NOT NULL,
    evidence_id UUID REFERENCES binaries.fix_evidence(id),
    snapshot_id UUID,
    indexed_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (tenant_id, distro, release, source_pkg, cve_id)
);
```

### 5b.3 Evidence Types

**ChangelogEvidence:**

```json
{
  "evidence_type": "changelog",
  "source_file": "debian/changelog",
  "excerpt": "* Fix CVE-2024-0727: PKCS12 decoding crash",
  "metadata": {
    "version": "3.0.11-1~deb12u2",
    "line_number": 5
  }
}
```

**PatchHeaderEvidence:**

```json
{
  "evidence_type": "patch_header",
  "source_file": "debian/patches/CVE-2024-0727.patch",
  "excerpt": "CVE: CVE-2024-0727\nOrigin: upstream, https://github.com/openssl/commit/abc123",
  "metadata": {
    "patch_sha256": "abc123def456..."
  }
}
```

**SecurityFeedEvidence:**

```json
{
  "evidence_type": "security_feed",
  "metadata": {
    "feed_id": "debian-security-tracker",
    "entry_id": "DSA-5678-1",
    "published_at": "2024-01-15T10:00:00Z"
  }
}
```

### 5b.4 Confidence Resolution

When multiple evidence sources exist for the same CVE, the system keeps the **highest confidence** entry:

```sql
-- Conflict clause of the upsert into binaries.cve_fix_index.
-- cve_fix_index.* is the existing row; EXCLUDED.* is the incoming row.
ON CONFLICT (tenant_id, distro, release, source_pkg, cve_id) DO UPDATE SET
    confidence  = GREATEST(cve_fix_index.confidence, EXCLUDED.confidence),
    method      = CASE WHEN cve_fix_index.confidence < EXCLUDED.confidence
                       THEN EXCLUDED.method ELSE cve_fix_index.method END,
    evidence_id = CASE WHEN cve_fix_index.confidence < EXCLUDED.confidence
                       THEN EXCLUDED.evidence_id ELSE cve_fix_index.evidence_id END
```

### 5b.5 Parsers

The following parsers extract CVE fix information:

| Parser | Distros | Input | Confidence |
|--------|---------|-------|------------|
| `DebianChangelogParser` | Debian, Ubuntu | debian/changelog | 0.80 |
| `PatchHeaderParser` | Debian, Ubuntu | debian/patches/*.patch (DEP-3) | 0.87 |
| `AlpineSecfixesParser` | Alpine | APKBUILD secfixes block | 0.95 |
| `RpmChangelogParser` | RHEL, Fedora, CentOS | RPM spec %changelog | 0.75 |

### 5b.6 Query Flow

```mermaid
sequenceDiagram
    participant SW as Scanner.Worker
    participant BVS as BinaryVulnerabilityService
    participant FIR as FixIndexRepository
    participant PG as PostgreSQL

    SW->>BVS: GetFixStatusAsync(debian, bookworm, openssl, CVE-2024-0727)
    BVS->>FIR: GetFixStatusAsync(...)
    FIR->>PG: SELECT FROM cve_fix_index WHERE ...
    PG-->>FIR: FixIndexEntry (state=fixed, confidence=0.87)
    FIR-->>BVS: FixStatusResult
    BVS-->>SW: {state: Fixed, confidence: 0.87, method: PatchHeader}
```

---

## 6. Security Considerations

### 6.1 Trust Boundaries

1. **Corpus Ingestion** - Packages are untrusted; extraction runs in sandboxed workers
2. **Fingerprint Generation** - Reference builds compiled in isolated environments
3. **Query API** - Tenant-isolated via RLS; no cross-tenant data leakage

### 6.2 Signing & Provenance

- All corpus snapshots are signed (DSSE)
- Fingerprint sets are versioned and signed
- Every match result references evidence digests

### 6.3 Sandbox Requirements

Binary extraction and fingerprint generation MUST run with:
- Seccomp profile restricting syscalls
- Read-only root filesystem
- No network access during analysis
- Memory/CPU limits

---

## 7. Observability

### 7.1 Metrics

| Metric | Type | Labels |
|--------|------|--------|
| `binaryindex_lookup_total` | Counter | method, result |
| `binaryindex_lookup_latency_ms` | Histogram | method |
| `binaryindex_corpus_packages_total` | Gauge | distro, release |
| `binaryindex_fingerprints_indexed` | Gauge | algorithm, component |
| `binaryindex_match_confidence` | Histogram | method |

### 7.2 Traces

- `binaryindex.lookup` - Full lookup span
- `binaryindex.corpus.ingest` - Corpus ingestion
- `binaryindex.fingerprint.generate` - Fingerprint generation

### 7.3 Ops Endpoints

BinaryIndex exposes read-only ops endpoints for health, bench, cache, and effective configuration:

- GET `/api/v1/ops/binaryindex/health` -> BinaryIndexOpsHealthResponse
- POST `/api/v1/ops/binaryindex/bench/run` -> BinaryIndexBenchResponse
- GET `/api/v1/ops/binaryindex/cache` -> BinaryIndexFunctionCacheStats
- GET `/api/v1/ops/binaryindex/config` -> BinaryIndexEffectiveConfig

---

## 8. Configuration

```yaml
# binaryindex.yaml
binaryindex:
  enabled: true

  corpus:
    connectors:
      - type: debian
        enabled: true
        mirror: http://deb.debian.org/debian
        releases: [bookworm, bullseye]
        architectures: [amd64, arm64]
      - type: ubuntu
        enabled: true
        mirror: http://archive.ubuntu.com/ubuntu
        releases: [jammy, noble]

  fingerprinting:
    enabled: true
    algorithms: [basic_block, cfg]
    target_components:
      - openssl
      - glibc
      - zlib
      - curl
      - sqlite
    min_function_size: 16  # bytes
    max_functions_per_binary: 10000

  lookup:
    cache_ttl: 3600
    batch_size: 100
    timeout_ms: 5000

  storage:
    postgres_schema: binaries
    rustfs_bucket: stellaops/binaryindex
```

Additional appsettings sections (case-insensitive):
- `BinaryIndex:B2R2Pool` - lifter pool sizing and warm ISA list.
- `BinaryIndex:SemanticLifting` - LowUIR enablement and deterministic controls.
- `BinaryIndex:FunctionCache` - Valkey function cache configuration.
- `Postgres:BinaryIndex` - persistence for canonical IR fingerprints.

---

## 9. Testing Strategy

### 9.1 Unit Tests

- Identity extraction (Build-ID, hashes)
- Fingerprint generation determinism
- Fix index parsing (changelog, patch headers)

### 9.2 Integration Tests

- PostgreSQL schema validation
- Full corpus ingestion flow
- Scanner.Worker lookup integration

### 9.3 Regression Tests

- Known CVE detection (golden corpus)
- Backport handling (Debian libssl example)
- False positive rate validation

---

## 10. References

- Advisory: `docs/product/advisories/21-Dec-2025 - Mapping Evidence Within Compiled Binaries.md`
- Scanner Native Analysis: `src/Scanner/StellaOps.Scanner.Analyzers.Native/`
- Existing Fingerprinting: `src/Scanner/__Libraries/StellaOps.Scanner.EntryTrace/Binary/`
- Build-ID Index: `src/Scanner/StellaOps.Scanner.Analyzers.Native/Index/`
- **Semantic Diffing Sprint:** `docs/implplan/SPRINT_20260105_001_001_BINDEX_semdiff_ir_semantics.md`
- **Semantic Library:** `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/`
- **Semantic Tests:** `src/BinaryIndex/__Tests/StellaOps.BinaryIndex.Semantic.Tests/`

---

*Document Version: 1.1.1*
*Last Updated: 2026-01-14*