Here's a compact blueprint for a binary-level knowledge base that maps ELF Build-IDs and PE signatures to vulnerable functions, patch lineage, and reachability hints, so your scanner can act like a provenance-aware "binary oracle" rather than a plain CVE lookup.
Why this matters (in plain terms)
- Same version ≠ same risk. Distros (and vendors) frequently backport fixes without bumping versions. Only the binary tells the truth.
- Function‑level matching turns noisy “package has CVE” into precise “this exact function range is vulnerable in your binary.”
- Reachability hints cut triage noise by ranking vulns the code path can actually hit at runtime.
Minimal starter schema (MVP)
Keep it tiny so it grows with real evidence:
artifacts
- id (pk)
- platform (linux, windows)
- format (ELF, PE)
- build_id (ELF .note.gnu.build-id), pdb_guid / pe_imphash (Windows)
- sha256 (whole-file)
- compiler_fingerprint (e.g., gcc-13.2, msvc-19.39)
- source_hint (optional: package name/version if known)
symbols
- artifact_id (fk)
- symbol_name
- addr_start, addr_end (or RVA for PE)
- section, file_offset (optional)
vuln_segments
- id (pk)
- cve_id (CVE-YYYY-NNNN)
- function_signature (normalized name + arity)
- byte_sig (short stable pattern around the vulnerable hunk)
- patch_sig (pattern from fixed hunk)
- evidence_ref (link to patch diff, commit, or NVD note)
- backport_flag (bool)
- introduced_in, fixed_in (semver-ish text; note "backport" when used)
matches
- artifact_id (fk), vuln_segment_id (fk)
- match_type (byte, range, symbol)
- confidence (0–1)
- explain (why we think this matches)
reachability_hints
- artifact_id (fk), symbol_name
- hint_type (imported, exported, hot, ebpf_seen, graph_core)
- weight (0–100)
How the oracle answers “Am I affected?”
- Identify: Look up by Build-ID / PE signature; fall back to file hash.
- Locate: Map symbols → address ranges; scan for byte_sig / patch_sig.
- Decide:
  - if patch_sig present ⇒ Not affected (backported).
  - if byte_sig present and reachable (weighted) ⇒ Affected (prioritized).
  - if only byte_sig present, unreachable ⇒ Affected (low priority).
  - if neither ⇒ Unknown.
- Explain: Attach evidence_ref, the exact offsets, and the reason (match_type + reachability).
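The Decide step can be sketched as a small pure function. This is a minimal illustration, not a fixed design: the `Evidence` fields, the `Verdict` names, and the `reach_threshold` default of 50 are assumptions introduced here; the schema only says that reachability weights run from 0 to 100.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    NOT_AFFECTED = "not_affected_backported"
    AFFECTED = "affected_prioritized"
    AFFECTED_LOW = "affected_low_priority"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Evidence:
    """Hypothetical summary of what the Locate step found for one vuln_segment."""
    patch_sig_found: bool
    byte_sig_found: bool
    reach_weight: int  # 0-100, e.g. the highest reachability_hints weight for the symbol


def decide(ev: Evidence, reach_threshold: int = 50) -> Verdict:
    # A patch signature takes precedence over everything else (backported fix).
    if ev.patch_sig_found:
        return Verdict.NOT_AFFECTED
    if ev.byte_sig_found:
        # Vulnerable bytes are present; reachability only changes the priority.
        if ev.reach_weight >= reach_threshold:
            return Verdict.AFFECTED
        return Verdict.AFFECTED_LOW
    # Neither signature matched, so no claim is made either way.
    return Verdict.UNKNOWN
```

Keeping the rule free of I/O makes it easy to replay: the same `Evidence` always yields the same `Verdict`.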
Ingestion pipeline (no humans in the loop)
- Fingerprinting: extract Build-ID / PE GUID; compute sha256.
- Symbol map: parse DWARF/PDB if present; else fall back to heuristics (ELF symtab, PE exports).
- Patch intelligence: auto-diff upstream commits (plus major distros) → synthesize short byte signatures around changed hunks (stable across relocations).
- Evidence links: store URLs/commit IDs for cross‑audit.
- Noise control: only accept a vuln signature if it hits N≥3 independent binaries across distros (tunable).
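One plausible way to keep byte signatures stable across relocations is to wildcard the bytes that vary (addresses, displacements) and match only the fixed ones. The sketch below assumes a textual signature format with `??` as the wildcard token; that format and the helper names `parse_sig` and `find_sig` are illustrative and not part of the schema above.

```python
from typing import Iterator, List, Optional, Sequence


def parse_sig(pattern: str) -> List[Optional[int]]:
    """Turn '48 8b ?? 10' into [0x48, 0x8b, None, 0x10]; None means 'any byte'."""
    return [None if tok == "??" else int(tok, 16) for tok in pattern.split()]


def find_sig(
    data: bytes,
    sig: Sequence[Optional[int]],
    start: int = 0,
    end: Optional[int] = None,
) -> Iterator[int]:
    """Yield every offset in data[start:end] where the masked signature matches."""
    end = len(data) if end is None else min(end, len(data))
    n = len(sig)
    if n == 0:
        return
    # Naive window scan; fine for short signatures inside one function range.
    for off in range(start, end - n + 1):
        if all(s is None or data[off + i] == s for i, s in enumerate(sig)):
            yield off
```

Passing `start` and `end` from a symbol's file range limits the scan to the function named in `function_signature`, which keeps false positives down.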
Deterministic verdicts (fit to Stella Ops)
- Inputs: (artifact fingerprint, vuln_segments@version, reachability@policy)
- Output: Signed OCI attestation "verdict.json" (same inputs → same verdict).
- Replay: keep rule bundle & feed hashes for audit.
- Backport precedence: patch_sig beats package version claims every time.
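"Same inputs → same verdict" only holds if the verdict document is serialized canonically before it is hashed and signed. A minimal sketch of that idea follows; the field names are hypothetical, signing is left out, and this is not the actual Stella Ops attestation format.

```python
import hashlib
import json


def canonical_bytes(payload: dict) -> bytes:
    """Serialize with sorted keys and fixed separators so key order cannot change the bytes."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def verdict_digest(inputs: dict, verdict: str) -> str:
    """Digest over the inputs plus the verdict; equal inputs give an equal digest."""
    body = canonical_bytes({"inputs": inputs, "verdict": verdict})
    return "sha256:" + hashlib.sha256(body).hexdigest()


# Hypothetical input tuple mirroring the three inputs listed above.
example_inputs = {
    "artifact_fingerprint": "build-id:0123abcd",
    "vuln_segments_version": "2024-01-01",
    "reachability_policy": "default",
}
example_digest = verdict_digest(example_inputs, "not_affected_backported")
```

Storing the digest next to the rule-bundle and feed hashes lets an auditor recompute it later and confirm nothing drifted.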
Fast path to MVP (2 sprints)
- Add a Build-ID/PE indexer to Scanner.
- Teach Feedser/Vexer to ingest vuln_segments (with byte_sig / patch_sig).
- Implement matching + verdict attestation; surface "Backported & Safe" vs "Affected & Reachable" badges in UI.
- Seed DB with 10 high‑impact CVEs (OpenSSL, zlib, xz, glibc, libxml2, curl, musl, busybox, OpenSSH, sudo).
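For the first item, the ELF half of the indexer can be quite small. The sketch below reads the GNU Build-ID from PT_NOTE program headers using only the standard library. It assumes 4-byte note alignment (the usual case for the build-id note) and does not cover PE files; the function name `elf_build_id` is illustrative.

```python
import struct
from typing import Optional

PT_NOTE = 4          # program header type for note segments
NT_GNU_BUILD_ID = 3  # note type used by .note.gnu.build-id


def _align4(n: int) -> int:
    return (n + 3) & ~3


def elf_build_id(data: bytes) -> Optional[str]:
    """Return the GNU Build-ID as lowercase hex, or None if absent or not ELF."""
    if len(data) < 52 or data[:4] != b"\x7fELF":
        return None
    is64 = data[4] == 2                   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    en = "<" if data[5] == 1 else ">"     # EI_DATA: 1 = little-endian
    try:
        if is64:
            (phoff,) = struct.unpack_from(en + "Q", data, 0x20)
            phentsize, phnum = struct.unpack_from(en + "HH", data, 0x36)
        else:
            (phoff,) = struct.unpack_from(en + "I", data, 0x1C)
            phentsize, phnum = struct.unpack_from(en + "HH", data, 0x2A)
        for i in range(phnum):
            base = phoff + i * phentsize
            (p_type,) = struct.unpack_from(en + "I", data, base)
            if p_type != PT_NOTE:
                continue
            if is64:
                (p_offset,) = struct.unpack_from(en + "Q", data, base + 8)
                (p_filesz,) = struct.unpack_from(en + "Q", data, base + 32)
            else:
                (p_offset,) = struct.unpack_from(en + "I", data, base + 4)
                (p_filesz,) = struct.unpack_from(en + "I", data, base + 16)
            pos, stop = p_offset, min(p_offset + p_filesz, len(data))
            # Walk the notes: 12-byte header, then padded name, then padded desc.
            while pos + 12 <= stop:
                namesz, descsz, ntype = struct.unpack_from(en + "III", data, pos)
                name_off = pos + 12
                desc_off = name_off + _align4(namesz)
                if desc_off + descsz > stop:
                    break
                name = data[name_off:name_off + namesz]
                if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00":
                    return data[desc_off:desc_off + descsz].hex()
                pos = desc_off + _align4(descsz)
    except struct.error:
        return None  # truncated or malformed headers
    return None
```

When this returns None (stripped or non-GNU toolchains), the indexer would fall back to the whole-file sha256, as described in the Identify step.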
Example: SQL skeleton (Postgres)
create table artifacts(
  id bigserial primary key,
  platform text, format text,
  build_id text, pdb_guid text, pe_imphash text,
  sha256 bytea not null unique,
  compiler_fingerprint text, source_hint text
);
create table symbols(
  artifact_id bigint references artifacts(id),
  symbol_name text, addr_start bigint, addr_end bigint,
  section text, file_offset bigint
);
create table vuln_segments(
  id bigserial primary key,
  cve_id text, function_signature text,
  byte_sig bytea, patch_sig bytea,
  evidence_ref text, backport_flag boolean,
  introduced_in text, fixed_in text
);
create table matches(
  artifact_id bigint references artifacts(id),
  vuln_segment_id bigint references vuln_segments(id),
  match_type text, confidence real, explain text
);
create table reachability_hints(
  artifact_id bigint references artifacts(id),
  symbol_name text, hint_type text, weight int
);
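To show how the Identify step might query this table, here is a small lookup sketch. It uses Python's built-in sqlite3 with a cut-down `artifacts` table as a stand-in for Postgres, so the column types differ from the skeleton above; `open_demo_db` and `find_artifact` are hypothetical helper names.

```python
import sqlite3
from typing import Optional


def open_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the Postgres artifacts table (only the columns used here)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "create table artifacts("
        " id integer primary key,"
        " build_id text,"
        " sha256 blob not null unique)"
    )
    return db


def find_artifact(
    db: sqlite3.Connection, build_id: Optional[str], sha256: bytes
) -> Optional[int]:
    """Look up by Build-ID first; fall back to the whole-file hash."""
    if build_id:
        row = db.execute(
            "select id from artifacts where build_id = ? order by id limit 1",
            (build_id,),
        ).fetchone()
        if row:
            return row[0]
    row = db.execute(
        "select id from artifacts where sha256 = ?", (sha256,)
    ).fetchone()
    return row[0] if row else None
```

In Postgres the same two queries would apply, and an index on `build_id` would likely be worth adding since the skeleton only makes `sha256` unique.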
If you want, I can:
- drop in a tiny .NET 10 matcher (ELF/PE parsers + byte‑window scanner),
- wire verdicts as OCI attestations in your current pipeline,
- and prep the first 10 CVE byte/patch signatures to seed the DB.