Mapping a Binary Intelligence Graph
Status: SUPERSEDED
Date: 2025-12-26
Updated: 2025-12-26
Superseded By: BinaryIndex Module Architecture
Related Sprints:
- SPRINT_20251226_011_BINIDX_known_build_catalog.md
- SPRINT_20251226_012_BINIDX_backport_handling.md
- SPRINT_20251226_013_BINIDX_fingerprint_factory.md
- SPRINT_20251226_014_BINIDX_scanner_integration.md
Supersession Notice
This advisory has been superseded by the comprehensive BinaryIndex module architecture. All proposals in this advisory are covered by the existing design:
| Advisory Proposal | Implementation | Location |
|---|---|---|
| artifacts table | binaries.binary_identity | docs/modules/binaryindex/architecture.md |
| symbols table | BinaryFeatures in IBinaryFeatureExtractor | src/BinaryIndex/__Libraries/.../Services/ |
| vuln_segments (byte_sig/patch_sig) | VulnFingerprint model | src/BinaryIndex/__Libraries/.../Fingerprints/ |
| matches table | FingerprintMatch model | src/BinaryIndex/__Libraries/.../Fingerprints/ |
| reachability_hints | ReachabilityStatus enum | src/BinaryIndex/__Libraries/.../Models/ |
| Build-ID/PE indexer | ElfFeatureExtractor, IBinaryFeatureExtractor | src/BinaryIndex/__Libraries/.../Services/ |
| Patch-aware handling | FixEvidence, changelog/patch parsers | src/BinaryIndex/__Libraries/.../FixIndex/ |
| Corpus connectors | DebianCorpusConnector, IBinaryCorpusConnector | src/BinaryIndex/__Libraries/.../Corpus/ |
Related Archived Advisories
- 18-Dec-2025 - Building Better Binary Mapping and Call‑Stack Reachability.md
- 23-Dec-2025 - Binary Mapping as Attestable Proof.md
Related Active Advisories
- 25-Dec-2025 - Evolving Evidence Models for Reachability.md - Runtime → build braid, eBPF sampling
Original Advisory Content
Here's a compact blueprint for a binary‑level knowledge base that maps ELF Build‑IDs / PE signatures to vulnerable functions, patch lineage, and reachability hints—so your scanner can act like a provenance‑aware "binary oracle," not just a CVE lookup.
Why this matters (in plain terms)
- Same version ≠ same risk. Distros (and vendors) frequently backport fixes without bumping versions. Only the binary tells the truth.
- Function‑level matching turns noisy "package has CVE" into precise "this exact function range is vulnerable in your binary."
- Reachability hints cut triage noise by ranking vulns the code path can actually hit at runtime.
Minimal starter schema (MVP)
Keep it tiny so it grows with real evidence:
artifacts
- id (pk)
- platform (linux, windows)
- format (ELF, PE)
- build_id (ELF .note.gnu.build-id), pdb_guid / pe_imphash (Windows)
- sha256 (whole‑file)
- compiler_fingerprint (e.g., gcc-13.2, msvc-19.39)
- source_hint (optional: pname/version if known)

symbols
- artifact_id (fk)
- symbol_name
- addr_start, addr_end (or RVA for PE)
- section, file_offset (optional)

vuln_segments
- id (pk)
- cve_id (CVE‑YYYY‑NNNN)
- function_signature (normalized name + arity)
- byte_sig (short stable pattern around the vulnerable hunk)
- patch_sig (pattern from fixed hunk)
- evidence_ref (link to patch diff, commit, or NVD note)
- backport_flag (bool)
- introduced_in, fixed_in (semver-ish text; note "backport" when used)

matches
- artifact_id (fk), vuln_segment_id (fk)
- match_type (byte, range, symbol)
- confidence (0–1)
- explain (why we think this matches)

reachability_hints
- artifact_id (fk), symbol_name
- hint_type (imported, exported, hot, ebpf_seen, graph_core)
- weight (0–100)
How the oracle answers "Am I affected?"
- Identify: Look up by Build‑ID / PE signature; fall back to file hash.
- Locate: Map symbols → address ranges; scan for byte_sig / patch_sig.
- Decide:
  - if patch_sig present ⇒ Not affected (backported).
  - if byte_sig present and reachable (weighted) ⇒ Affected (prioritized).
  - if only byte_sig present, unreachable ⇒ Affected (low priority).
  - if neither ⇒ Unknown.
- Explain: Attach evidence_ref, the exact offsets, and the reason (match_type + reachability).
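The Locate and Decide steps can be sketched in a few lines of Python. This is an illustrative sketch only, not the BinaryIndex implementation: the `Verdict` type, the `decide` function, and the reachability threshold of 50 are all assumptions introduced here, and a real matcher would scan per-symbol address ranges rather than a single byte string.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Verdict:
    status: str             # "not_affected", "affected", or "unknown"
    priority: str           # "high", "low", or "none"
    reason: str             # human-readable explanation for the Explain step
    offset: Optional[int] = None  # where the deciding signature was found


def find_sig(code: bytes, sig: bytes) -> Optional[int]:
    """Return the offset of sig inside code, or None if absent or empty."""
    if not sig:
        return None
    pos = code.find(sig)
    return pos if pos >= 0 else None


def decide(code: bytes, byte_sig: bytes, patch_sig: bytes,
           reach_weight: int, reach_threshold: int = 50) -> Verdict:
    """Apply the decision order from the list above.

    reach_threshold is a hypothetical cut-off on the 0-100 hint weight.
    """
    # A patch signature wins over everything else (backport precedence).
    patch_at = find_sig(code, patch_sig)
    if patch_at is not None:
        return Verdict("not_affected", "none",
                       "patch_sig present (backported)", patch_at)

    vuln_at = find_sig(code, byte_sig)
    if vuln_at is not None:
        if reach_weight >= reach_threshold:
            return Verdict("affected", "high",
                           "byte_sig present and reachable", vuln_at)
        return Verdict("affected", "low",
                       "byte_sig present, unreachable", vuln_at)

    return Verdict("unknown", "none", "neither signature found")
```

Checking `patch_sig` first keeps the backport rule explicit: a binary that still contains bytes resembling the vulnerable hunk is reported as not affected once the fixed hunk is found.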
Ingestion pipeline (no humans in the loop)
- Fingerprinting: extract Build‑ID / PE GUID; compute sha256.
- Symbol map: parse DWARF/PDB if present; else fall back to heuristics (ELF symtab, PE exports).
- Patch intelligence: auto‑diff upstream commits (plus major distros) → synthesize short byte signatures around changed hunks (stable across relocations).
- Evidence links: store URLs/commit IDs for cross‑audit.
- Noise control: only accept a vuln signature if it hits N≥3 independent binaries across distros (tunable).
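As a rough illustration of the fingerprinting step, the sketch below walks the raw bytes of an ELF note section and pulls out the GNU Build‑ID, alongside a whole-file sha256. It assumes the caller has already located the note bytes (for example the contents of .note.gnu.build-id), that the file is little-endian, and that note fields are 4-byte aligned. The function names are hypothetical and not part of any StellaOps module.

```python
import hashlib
import struct
from typing import Optional

NT_GNU_BUILD_ID = 3  # note type the GNU toolchain uses for build IDs


def _align4(n: int) -> int:
    """Round n up to the next multiple of 4 (note field padding)."""
    return (n + 3) & ~3


def build_id_from_notes(notes: bytes) -> Optional[str]:
    """Walk ELF note records and return the GNU build ID as hex, if any."""
    off = 0
    while off + 12 <= len(notes):
        # Each note starts with three 32-bit words: namesz, descsz, type.
        namesz, descsz, ntype = struct.unpack_from("<III", notes, off)
        off += 12
        name = notes[off:off + namesz]
        off += _align4(namesz)
        desc = notes[off:off + descsz]
        off += _align4(descsz)
        # Skip truncated records and notes owned by other vendors.
        if (ntype == NT_GNU_BUILD_ID and name == b"GNU\x00"
                and len(desc) == descsz):
            return desc.hex()
    return None


def file_sha256(data: bytes) -> str:
    """Whole-file hash used as the fallback identity."""
    return hashlib.sha256(data).hexdigest()
```

Returning `None` rather than raising lets the caller fall back to the file hash when a binary was built without a Build‑ID.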
Deterministic verdicts (fit to Stella Ops)
- Inputs: (artifact fingerprint, vuln_segments@version, reachability@policy)
- Output: Signed OCI attestation "verdict.json" (same inputs → same verdict).
- Replay: keep rule bundle & feed hashes for audit.
- Backport precedence: patch_sig beats package version claims every time.
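One way to make "same inputs → same verdict" checkable is to serialize the verdict canonically and hash it. The sketch below is an assumption-laden illustration: the field names, the `verdict_document` helper, and the digest format are invented for this example, and signing plus OCI packaging are deliberately left out.

```python
import hashlib
import json


def canonical_json(obj) -> bytes:
    """Serialize with sorted keys and no extra whitespace so that equal
    objects always produce identical bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=True).encode("utf-8")


def verdict_document(artifact_fingerprint: str, segments_version: str,
                     policy_hash: str, status: str) -> dict:
    """Build a verdict body that records its inputs (hypothetical layout)."""
    inputs = {
        "artifact_fingerprint": artifact_fingerprint,
        "vuln_segments_version": segments_version,
        "reachability_policy": policy_hash,
    }
    return {
        "inputs": inputs,
        # Hash of the inputs alone, kept for replay and audit.
        "inputs_digest": hashlib.sha256(canonical_json(inputs)).hexdigest(),
        "status": status,
    }


def verdict_digest(doc: dict) -> str:
    """Digest of the whole verdict; this is what a signer would cover."""
    return "sha256:" + hashlib.sha256(canonical_json(doc)).hexdigest()
```

Note that nothing time-dependent (timestamps, hostnames) goes into the hashed body; including such fields would break replayability.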
Fast path to MVP (2 sprints)
- Add a Build‑ID/PE indexer to Scanner.
- Teach Feedser/Vexer to ingest vuln_segments (with byte_sig / patch_sig).
- Implement matching + verdict attestation; surface "Backported & Safe" vs "Affected & Reachable" badges in UI.
- Seed DB with 10 high‑impact CVEs (OpenSSL, zlib, xz, glibc, libxml2, curl, musl, busybox, OpenSSH, sudo).
Example: SQL skeleton (Postgres)
create table artifacts(
  id bigserial primary key,
  platform text, format text,
  build_id text, pdb_guid text, pe_imphash text,
  sha256 bytea not null unique,
  compiler_fingerprint text, source_hint text
);

create table symbols(
  artifact_id bigint references artifacts(id),
  symbol_name text, addr_start bigint, addr_end bigint,
  section text, file_offset bigint
);

create table vuln_segments(
  id bigserial primary key,
  cve_id text, function_signature text,
  byte_sig bytea, patch_sig bytea,
  evidence_ref text, backport_flag boolean,
  introduced_in text, fixed_in text
);

create table matches(
  artifact_id bigint references artifacts(id),
  vuln_segment_id bigint references vuln_segments(id),
  match_type text, confidence real, explain text
);

create table reachability_hints(
  artifact_id bigint references artifacts(id),
  symbol_name text, hint_type text, weight int
);
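To show how the skeleton might be queried, here is a small Python sketch that runs against an in-memory SQLite database rather than Postgres. It is an adaptation, not the schema above verbatim: only the columns needed for the lookup are kept, `bigserial` becomes `integer primary key`, `bytea` becomes `blob`, and the CVE identifiers are placeholders. The helper names are hypothetical.

```python
import sqlite3

# Trimmed SQLite adaptation of the Postgres skeleton (assumption).
SCHEMA = """
create table artifacts(
  id integer primary key, build_id text, sha256 blob not null unique);
create table vuln_segments(
  id integer primary key, cve_id text, function_signature text);
create table matches(
  artifact_id integer references artifacts(id),
  vuln_segment_id integer references vuln_segments(id),
  match_type text, confidence real, explain text);
"""


def find_artifact(db, build_id, sha256):
    """Identify step: look up by Build-ID, then fall back to file hash."""
    row = None
    if build_id:
        row = db.execute("select id from artifacts where build_id = ?",
                         (build_id,)).fetchone()
    if row is None:
        row = db.execute("select id from artifacts where sha256 = ?",
                         (sha256,)).fetchone()
    return row[0] if row else None


def matches_for(db, artifact_id):
    """List matched CVEs for an artifact, most confident first."""
    return db.execute(
        "select v.cve_id, m.match_type, m.confidence from matches m "
        "join vuln_segments v on v.id = m.vuln_segment_id "
        "where m.artifact_id = ? order by m.confidence desc, v.cve_id",
        (artifact_id,)).fetchall()


def demo_db():
    """Build a tiny database with placeholder rows for experimentation."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("insert into artifacts(id, build_id, sha256) "
               "values (1, 'abc123', ?)", (b"\x01",))
    db.executemany(
        "insert into vuln_segments(id, cve_id, function_signature) "
        "values (?, ?, ?)",
        [(1, "CVE-0000-0001", "f/1"), (2, "CVE-0000-0002", "g/2")])
    db.executemany(
        "insert into matches values (?, ?, ?, ?, ?)",
        [(1, 1, "symbol", 0.5, "name match"),
         (1, 2, "byte", 0.9, "byte_sig hit")])
    return db
```

Calling `find_artifact(demo_db(), None, b"\x01")` exercises the hash fallback path, which is the case for binaries stripped of their Build‑ID.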