# Mapping a Binary Intelligence Graph

> **Status:** SUPERSEDED
> **Date:** 2025-12-26
> **Updated:** 2025-12-26
> **Superseded By:** BinaryIndex Module Architecture
> **Related Sprints:** [`SPRINT_20251226_011_BINIDX_known_build_catalog.md`](../implplan/SPRINT_20251226_011_BINIDX_known_build_catalog.md), [`SPRINT_20251226_012_BINIDX_backport_handling.md`](../implplan/SPRINT_20251226_012_BINIDX_backport_handling.md), [`SPRINT_20251226_013_BINIDX_fingerprint_factory.md`](../implplan/SPRINT_20251226_013_BINIDX_fingerprint_factory.md), [`SPRINT_20251226_014_BINIDX_scanner_integration.md`](../implplan/SPRINT_20251226_014_BINIDX_scanner_integration.md)

---

## Supersession Notice

This advisory has been **superseded** by the comprehensive BinaryIndex module architecture. All proposals in this advisory are covered by the existing design:

| Advisory Proposal | Implementation | Location |
|-------------------|----------------|----------|
| artifacts table | `binaries.binary_identity` | `docs/modules/binaryindex/architecture.md` |
| symbols table | `BinaryFeatures` in `IBinaryFeatureExtractor` | `src/BinaryIndex/__Libraries/.../Services/` |
| vuln_segments (byte_sig/patch_sig) | `VulnFingerprint` model | `src/BinaryIndex/__Libraries/.../Fingerprints/` |
| matches table | `FingerprintMatch` model | `src/BinaryIndex/__Libraries/.../Fingerprints/` |
| reachability_hints | `ReachabilityStatus` enum | `src/BinaryIndex/__Libraries/.../Models/` |
| Build-ID/PE indexer | `ElfFeatureExtractor`, `IBinaryFeatureExtractor` | `src/BinaryIndex/__Libraries/.../Services/` |
| Patch-aware handling | `FixEvidence`, changelog/patch parsers | `src/BinaryIndex/__Libraries/.../FixIndex/` |
| Corpus connectors | `DebianCorpusConnector`, `IBinaryCorpusConnector` | `src/BinaryIndex/__Libraries/.../Corpus/` |

### Related Archived Advisories

- `18-Dec-2025 - Building Better Binary Mapping and Call‑Stack Reachability.md`
- `23-Dec-2026 - Binary Mapping as Attestable Proof.md`

### Related Active Advisories

- `25-Dec-2025 - Evolving Evidence Models for Reachability.md` - Runtime → build braid, eBPF sampling

---

## Original Advisory Content

Here's a compact blueprint for a **binary‑level knowledge base** that maps ELF Build‑IDs / PE signatures to vulnerable functions, patch lineage, and reachability hints, so your scanner can act like a provenance‑aware "binary oracle," not just a CVE lookup.

---

# Why this matters (in plain terms)

* **Same version ≠ same risk.** Distros (and vendors) frequently **backport** fixes without bumping versions. Only the **binary** tells the truth.
* **Function‑level matching** turns noisy "package has CVE" into precise "this exact function range is vulnerable in your binary."
* **Reachability hints** cut triage noise by ranking the vulnerabilities that a code path can actually hit at runtime.

---

# Minimal starter schema (MVP)

Keep it tiny so it grows with real evidence:

**artifacts**

* `id (pk)`
* `platform` (linux, windows)
* `format` (ELF, PE)
* `build_id` (ELF `.note.gnu.build-id`), `pdb_guid` / `pe_imphash` (Windows)
* `sha256` (whole‑file)
* `compiler_fingerprint` (e.g., `gcc-13.2`, `msvc-19.39`)
* `source_hint` (optional: pname/version if known)

**symbols**

* `artifact_id (fk)`
* `symbol_name`
* `addr_start`, `addr_end` (or RVA for PE)
* `section`, `file_offset` (optional)

**vuln_segments**

* `id (pk)`
* `cve_id` (CVE‑YYYY‑NNNN)
* `function_signature` (normalized name + arity)
* `byte_sig` (short stable pattern around the vulnerable hunk)
* `patch_sig` (pattern from fixed hunk)
* `evidence_ref` (link to patch diff, commit, or NVD note)
* `backport_flag` (bool)
* `introduced_in`, `fixed_in` (semver-ish text; note "backport" when used)

**matches**

* `artifact_id (fk)`, `vuln_segment_id (fk)`
* `match_type` (`byte`, `range`, `symbol`)
* `confidence` (0–1)
* `explain` (why we think this matches)

**reachability_hints**

* `artifact_id (fk)`, `symbol_name`
* `hint_type` (`imported`, `exported`, `hot`, `ebpf_seen`, `graph_core`)
* `weight` (0–100)
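
The `build_id` column in **artifacts** is read from the ELF `.note.gnu.build-id` section. As a minimal sketch of that step, the function below pulls the build ID out of raw note bytes. It assumes the caller has already located the note section, and it assumes the 4-byte note alignment that GNU build-id notes normally use; the function name `parse_gnu_build_id` is hypothetical and not part of any existing module.

```python
import struct
from typing import Optional

NT_GNU_BUILD_ID = 3  # ELF note type that carries the GNU build ID


def parse_gnu_build_id(note: bytes, little_endian: bool = True) -> Optional[str]:
    """Return the build ID as lowercase hex from raw ELF note bytes, or None.

    Each note is: namesz, descsz, type (three 32-bit words), then the name
    and the descriptor, each padded to a 4-byte boundary (assumed here).
    """
    header = "<III" if little_endian else ">III"
    offset = 0
    while offset + 12 <= len(note):
        namesz, descsz, ntype = struct.unpack_from(header, note, offset)
        offset += 12
        name = note[offset:offset + namesz]
        offset += (namesz + 3) & ~3  # skip name plus padding
        desc = note[offset:offset + descsz]
        offset += (descsz + 3) & ~3  # skip descriptor plus padding
        if len(desc) < descsz:
            return None  # truncated note, refuse to guess
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc.hex()
    return None


# Usage: a hand-built note with a 4-byte descriptor.
sample = struct.pack("<III", 4, 4, NT_GNU_BUILD_ID) + b"GNU\x00" + bytes.fromhex("deadbeef")
print(parse_gnu_build_id(sample))  # deadbeef
```

Returning `None` for a truncated or missing note keeps the fallback to `sha256` explicit instead of storing a partial identifier.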

---

# How the oracle answers "Am I affected?"

1. **Identify**: Look up by Build‑ID / PE signature; fall back to file hash.
2. **Locate**: Map symbols → address ranges; scan for `byte_sig`/`patch_sig`.
3. **Decide**:

   * if `patch_sig` present ⇒ **Not affected (backported)**.
   * if `byte_sig` present and reachable (weighted) ⇒ **Affected (prioritized)**.
   * if only `byte_sig` present, unreachable ⇒ **Affected (low priority)**.
   * if neither ⇒ **Unknown**.
4. **Explain**: Attach `evidence_ref`, the exact offsets, and the reason (match_type + reachability).

---

# Ingestion pipeline (no humans in the loop)

* **Fingerprinting**: extract Build‑ID / PE GUID; compute `sha256`.
* **Symbol map**: parse DWARF/PDB if present; else fall back to heuristics (ELF `symtab`, PE exports).
* **Patch intelligence**: auto‑diff upstream commits (plus major distros) → synthesize short **byte signatures** around changed hunks (stable across relocations).
* **Evidence links**: store URLs/commit IDs for cross‑audit.
* **Noise control**: only accept a vuln signature if it hits N≥3 independent binaries across distros (tunable).

---

# Deterministic verdicts (fit to Stella Ops)

* **Inputs**: `(artifact fingerprint, vuln_segments@version, reachability@policy)`
* **Output**: **Signed OCI attestation** "verdict.json" (same inputs → same verdict).
* **Replay**: keep rule bundle & feed hashes for audit.
* **Backport precedence**: `patch_sig` beats package version claims every time.

---

# Fast path to MVP (2 sprints)

* Add a **Build‑ID/PE indexer** to Scanner.
* Teach Feedser/Vexer to ingest `vuln_segments` (with `byte_sig`/`patch_sig`).
* Implement matching + verdict attestation; surface **"Backported & Safe"** vs **"Affected & Reachable"** badges in UI.
* Seed DB with 10 high‑impact CVEs (OpenSSL, zlib, xz, glibc, libxml2, curl, musl, busybox, OpenSSH, sudo).
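
The **Decide** rules from "How the oracle answers" can be sketched as a small pure function. This is an illustration only: the `Verdict` type, the status strings, and the `reachable_threshold` cut-off of 50 on the 0–100 `weight` scale are all assumptions, since the advisory says reachability is weighted but does not say where the line sits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Verdict:
    status: str              # "not_affected", "affected", or "unknown"
    priority: Optional[str]  # "prioritized" or "low" when affected, else None
    reason: str              # short explanation to attach with evidence_ref


def decide(patch_sig_found: bool, byte_sig_found: bool,
           reachability_weight: int, reachable_threshold: int = 50) -> Verdict:
    """Apply the advisory's precedence: patch_sig first, then byte_sig."""
    # A matched patch_sig wins even if the vulnerable pattern also appears,
    # mirroring "patch_sig beats package version claims every time".
    if patch_sig_found:
        return Verdict("not_affected", None, "patch_sig matched (backported)")
    if byte_sig_found:
        # reachable_threshold is a hypothetical policy knob, not from the source.
        if reachability_weight >= reachable_threshold:
            return Verdict("affected", "prioritized", "byte_sig matched, reachable")
        return Verdict("affected", "low", "byte_sig matched, not reachable")
    return Verdict("unknown", None, "neither signature matched")


# Usage: a binary carrying the vulnerable hunk on a hot path.
print(decide(patch_sig_found=False, byte_sig_found=True, reachability_weight=80).priority)
# prioritized
```

Keeping the function free of I/O and clocks fits the "same inputs → same verdict" requirement, because the result depends only on its arguments.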

---

# Example: SQL skeleton (Postgres)

```sql
create table artifacts(
  id bigserial primary key,
  platform text,
  format text,
  build_id text,
  pdb_guid text,
  pe_imphash text,
  sha256 bytea not null unique,
  compiler_fingerprint text,
  source_hint text
);

create table symbols(
  artifact_id bigint references artifacts(id),
  symbol_name text,
  addr_start bigint,
  addr_end bigint,
  section text,
  file_offset bigint
);

create table vuln_segments(
  id bigserial primary key,
  cve_id text,
  function_signature text,
  byte_sig bytea,
  patch_sig bytea,
  evidence_ref text,
  backport_flag boolean,
  introduced_in text,
  fixed_in text
);

create table matches(
  artifact_id bigint references artifacts(id),
  vuln_segment_id bigint references vuln_segments(id),
  match_type text,
  confidence real,
  explain text
);

create table reachability_hints(
  artifact_id bigint references artifacts(id),
  symbol_name text,
  hint_type text,
  weight int
);
```
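
Rows shaped like `symbols` and `vuln_segments` are enough to sketch the **Locate** step: search one symbol's byte range for `byte_sig` and `patch_sig`. The example below is a simplified sketch. It assumes `addr_end` is exclusive, that `file_offset` is the file position of `addr_start`, and that signatures are exact byte strings; real signatures may need masking for relocations. The `Symbol`, `Hit`, and `scan_symbol` names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Symbol:
    symbol_name: str
    addr_start: int
    addr_end: int     # assumed exclusive
    file_offset: int  # assumed file position of addr_start


@dataclass(frozen=True)
class Hit:
    symbol_name: str
    kind: str         # "patch_sig" or "byte_sig"
    file_offset: int  # absolute offset of the match in the file


def scan_symbol(image: bytes, sym: Symbol,
                byte_sig: bytes, patch_sig: bytes) -> List[Hit]:
    """Search only the symbol's own bytes for each signature."""
    start = sym.file_offset
    end = start + (sym.addr_end - sym.addr_start)
    hits: List[Hit] = []
    # patch_sig is checked first because it takes precedence in the verdict.
    for kind, sig in (("patch_sig", patch_sig), ("byte_sig", byte_sig)):
        if not sig:
            continue  # an empty pattern would match everywhere
        # bytes.find with start/end only reports matches fully inside the range.
        pos = image.find(sig, start, end)
        if pos != -1:
            hits.append(Hit(sym.symbol_name, kind, pos))
    return hits


# Usage: a 32-byte function body with the vulnerable pattern at offset 16.
image = b"\x90" * 16 + b"\xde\xad\xbe\xef" + b"\x90" * 12
func = Symbol("parse_header", 0x1000, 0x1020, 0)
print(scan_symbol(image, func, byte_sig=b"\xde\xad\xbe\xef", patch_sig=b"\xca\xfe"))
```

Restricting the search to the symbol's range is what makes a hit attributable to one function, and the returned offset is the "exact offset" the **Explain** step attaches.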