Here’s a compact blueprint for a **binary‑level knowledge base** that maps ELF Build‑IDs / PE signatures to vulnerable functions, patch lineage, and reachability hints—so your scanner can act like a provenance‑aware “binary oracle,” not just a CVE lookup. --- # Why this matters (in plain terms) * **Same version ≠ same risk.** Distros (and vendors) frequently **backport** fixes without bumping versions. Only the **binary** tells the truth. * **Function‑level matching** turns noisy “package has CVE” into precise “this exact function range is vulnerable in your binary.” * **Reachability hints** cut triage noise by ranking vulns the code path can actually hit at runtime. --- # Minimal starter schema (MVP) Keep it tiny so it grows with real evidence: **artifacts** * `id (pk)` * `platform` (linux, windows) * `format` (ELF, PE) * `build_id` (ELF `.note.gnu.build-id`), `pdb_guid` / `pe_imphash` (Windows) * `sha256` (whole‑file) * `compiler_fingerprint` (e.g., `gcc-13.2`, `msvc-19.39`) * `source_hint` (optional: pname/version if known) **symbols** * `artifact_id (fk)` * `symbol_name` * `addr_start`, `addr_end` (or RVA for PE) * `section`, `file_offset` (optional) **vuln_segments** * `id (pk)` * `cve_id` (CVE‑YYYY‑NNNN) * `function_signature` (normalized name + arity) * `byte_sig` (short stable pattern around the vulnerable hunk) * `patch_sig` (pattern from fixed hunk) * `evidence_ref` (link to patch diff, commit, or NVD note) * `backport_flag` (bool) * `introduced_in`, `fixed_in` (semver-ish text; note “backport” when used) **matches** * `artifact_id (fk)`, `vuln_segment_id (fk)` * `match_type` (`byte`, `range`, `symbol`) * `confidence` (0–1) * `explain` (why we think this matches) **reachability_hints** * `artifact_id (fk)`, `symbol_name` * `hint_type` (`imported`, `exported`, `hot`, `ebpf_seen`, `graph_core`) * `weight` (0–100) --- # How the oracle answers “Am I affected?” 1. **Identify**: Look up by Build‑ID / PE signature; fall back to file hash. 2. **Locate**: Map symbols → address ranges; scan for `byte_sig`/`patch_sig`. 3. **Decide**: * if `patch_sig` present ⇒ **Not affected (backported)**. * if `byte_sig` present and reachable (weighted) ⇒ **Affected (prioritized)**. * if only `byte_sig` present, unreachable ⇒ **Affected (low priority)**. * if neither ⇒ **Unknown**. 4. **Explain**: Attach `evidence_ref`, the exact offsets, and the reason (match_type + reachability). --- # Ingestion pipeline (no humans in the loop) * **Fingerprinting**: extract Build‑ID / PE GUID; compute `sha256`. * **Symbol map**: parse DWARF/PDB if present; else fall back to heuristics (ELF `symtab`, PE exports). * **Patch intelligence**: auto‑diff upstream commits (plus major distros) → synthesize short **byte signatures** around changed hunks (stable across relocations). * **Evidence links**: store URLs/commit IDs for cross‑audit. * **Noise control**: only accept a vuln signature if it hits N≥3 independent binaries across distros (tunable). --- # Deterministic verdicts (fit to Stella Ops) * **Inputs**: `(artifact fingerprint, vuln_segments@version, reachability@policy)` * **Output**: **Signed OCI attestation** “verdict.json” (same inputs → same verdict). * **Replay**: keep rule bundle & feed hashes for audit. * **Backport precedence**: `patch_sig` beats package version claims every time. --- # Fast path to MVP (2 sprints) * Add a **Build‑ID/PE indexer** to Scanner. * Teach Feedser/Vexer to ingest `vuln_segments` (with `byte_sig`/`patch_sig`). * Implement matching + verdict attestation; surface **“Backported & Safe”** vs **“Affected & Reachable”** badges in UI. * Seed DB with 10 high‑impact CVEs (OpenSSL, zlib, xz, glibc, libxml2, curl, musl, busybox, OpenSSH, sudo). --- # Example: SQL skeleton (Postgres) ```sql create table artifacts( id bigserial primary key, platform text, format text, build_id text, pdb_guid text, pe_imphash text, sha256 bytea not null unique, compiler_fingerprint text, source_hint text ); create table symbols( artifact_id bigint references artifacts(id), symbol_name text, addr_start bigint, addr_end bigint, section text, file_offset bigint ); create table vuln_segments( id bigserial primary key, cve_id text, function_signature text, byte_sig bytea, patch_sig bytea, evidence_ref text, backport_flag boolean, introduced_in text, fixed_in text ); create table matches( artifact_id bigint references artifacts(id), vuln_segment_id bigint references vuln_segments(id), match_type text, confidence real, explain text ); create table reachability_hints( artifact_id bigint references artifacts(id), symbol_name text, hint_type text, weight int ); ``` --- If you want, I can: * drop in a tiny **.NET 10** matcher (ELF/PE parsers + byte‑window scanner), * wire verdicts as **OCI attestations** in your current pipeline, * and prep the first **10 CVE byte/patch signatures** to seed the DB.