

Mapping a Binary Intelligence Graph

Status: SUPERSEDED
Date: 2025-12-26
Updated: 2025-12-26
Superseded By: BinaryIndex Module Architecture
Related Sprints: SPRINT_20251226_011_BINIDX_known_build_catalog.md, SPRINT_20251226_012_BINIDX_backport_handling.md, SPRINT_20251226_013_BINIDX_fingerprint_factory.md, SPRINT_20251226_014_BINIDX_scanner_integration.md


Supersession Notice

This advisory has been superseded by the comprehensive BinaryIndex module architecture. All proposals in this advisory are covered by the existing design:

| Advisory Proposal | Implementation | Location |
|---|---|---|
| artifacts table | binaries.binary_identity | docs/modules/binaryindex/architecture.md |
| symbols table | BinaryFeatures in IBinaryFeatureExtractor | src/BinaryIndex/__Libraries/.../Services/ |
| vuln_segments (byte_sig/patch_sig) | VulnFingerprint model | src/BinaryIndex/__Libraries/.../Fingerprints/ |
| matches table | FingerprintMatch model | src/BinaryIndex/__Libraries/.../Fingerprints/ |
| reachability_hints | ReachabilityStatus enum | src/BinaryIndex/__Libraries/.../Models/ |
| Build-ID/PE indexer | ElfFeatureExtractor, IBinaryFeatureExtractor | src/BinaryIndex/__Libraries/.../Services/ |
| Patch-aware handling | FixEvidence, changelog/patch parsers | src/BinaryIndex/__Libraries/.../FixIndex/ |
| Corpus connectors | DebianCorpusConnector, IBinaryCorpusConnector | src/BinaryIndex/__Libraries/.../Corpus/ |
  • 18-Dec-2025 - Building Better Binary Mapping and CallStack Reachability.md
  • 23-Dec-2026 - Binary Mapping as Attestable Proof.md
  • 25-Dec-2025 - Evolving Evidence Models for Reachability.md - Runtime → build braid, eBPF sampling

Original Advisory Content

Here's a compact blueprint for a binary-level knowledge base that maps ELF Build-IDs / PE signatures to vulnerable functions, patch lineage, and reachability hints, so your scanner can act like a provenance-aware "binary oracle," not just a CVE lookup.


Why this matters (in plain terms)

  • Same version ≠ same risk. Distros (and vendors) frequently backport fixes without bumping versions. Only the binary tells the truth.
  • Function-level matching turns noisy "package has CVE" into precise "this exact function range is vulnerable in your binary."
  • Reachability hints cut triage noise by ranking vulns the code path can actually hit at runtime.

Minimal starter schema (MVP)

Keep it tiny so it grows with real evidence:

artifacts

  • id (pk)
  • platform (linux, windows)
  • format (ELF, PE)
  • build_id (ELF .note.gnu.build-id), pdb_guid / pe_imphash (Windows)
  • sha256 (whole-file)
  • compiler_fingerprint (e.g., gcc-13.2, msvc-19.39)
  • source_hint (optional: pname/version if known)
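The two identity fields above can be derived without any external tooling. The following is a minimal sketch, assuming the raw bytes of an ELF note section (for example `.note.gnu.build-id`) have already been sliced out of the file and that the notes are little-endian with 4-byte padding. The function names `parse_gnu_build_id` and `whole_file_sha256` are hypothetical and are not taken from the BinaryIndex sources.

```python
import hashlib
import struct
from typing import Optional

NT_GNU_BUILD_ID = 3  # note type used for build IDs in the "GNU" note namespace


def _align4(n: int) -> int:
    """Round n up to the next multiple of 4 (ELF note field padding)."""
    return (n + 3) & ~3


def parse_gnu_build_id(note_data: bytes) -> Optional[str]:
    """Return the hex build ID found in raw note-section bytes, or None.

    Assumption: little-endian notes with 4-byte alignment.
    """
    offset = 0
    while offset + 12 <= len(note_data):
        # Each note starts with three 32-bit words: namesz, descsz, type.
        namesz, descsz, note_type = struct.unpack_from("<III", note_data, offset)
        offset += 12
        name = note_data[offset:offset + namesz]
        offset += _align4(namesz)
        desc = note_data[offset:offset + descsz]
        offset += _align4(descsz)
        if (note_type == NT_GNU_BUILD_ID
                and name == b"GNU\x00"
                and len(desc) == descsz):
            return desc.hex()
    return None


def whole_file_sha256(data: bytes) -> str:
    """Fallback identity: SHA-256 over the entire file contents."""
    return hashlib.sha256(data).hexdigest()
```

A real indexer would first locate the note via the section or program headers; that step is left out here to keep the sketch short.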

symbols

  • artifact_id (fk)
  • symbol_name
  • addr_start, addr_end (or RVA for PE)
  • section, file_offset (optional)

vuln_segments

  • id (pk)
  • cve_id (CVE-YYYY-NNNN)
  • function_signature (normalized name + arity)
  • byte_sig (short stable pattern around the vulnerable hunk)
  • patch_sig (pattern from fixed hunk)
  • evidence_ref (link to patch diff, commit, or NVD note)
  • backport_flag (bool)
  • introduced_in, fixed_in (semver-ish text; note "backport" when used)

matches

  • artifact_id (fk), vuln_segment_id (fk)
  • match_type (byte, range, symbol)
  • confidence (0 to 1)
  • explain (why we think this matches)
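To make the matches record concrete, here is a rough sketch of how a byte-type match might be produced: each symbol's byte range is searched for a signature, and a hit becomes one record with an explanation. The `Symbol` and `Match` classes, the `scan_symbols` function, and the fixed confidence of 1.0 for an exact byte hit are all illustrative assumptions, not the advisory's or BinaryIndex's actual model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Symbol:
    name: str
    file_offset: int  # where the function's bytes start in the file
    size: int         # addr_end - addr_start


@dataclass
class Match:
    symbol_name: str
    offset: int       # absolute file offset of the hit
    match_type: str   # "byte" in this sketch
    confidence: float
    explain: str


def scan_symbols(data: bytes, symbols: List[Symbol], byte_sig: bytes) -> List[Match]:
    """Report the first occurrence of byte_sig inside each symbol's range."""
    matches: List[Match] = []
    if not byte_sig:
        return matches  # an empty signature would match everywhere
    for sym in symbols:
        start = sym.file_offset
        end = sym.file_offset + sym.size
        # bytes.find only reports hits that lie entirely within [start, end).
        hit = data.find(byte_sig, start, end)
        if hit != -1:
            matches.append(Match(
                symbol_name=sym.name,
                offset=hit,
                match_type="byte",
                confidence=1.0,  # assumption: exact byte hit is full confidence
                explain=f"byte_sig found at file offset {hit:#x} inside {sym.name}",
            ))
    return matches
```

Restricting the search to symbol ranges is what lets the explain field name a function rather than just an offset.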

reachability_hints

  • artifact_id (fk), symbol_name
  • hint_type (imported, exported, hot, ebpf_seen, graph_core)
  • weight (0 to 100)
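The advisory does not say how several hints for one symbol combine into a single reachability answer. One possible reading, sketched below, takes the strongest hint as the score and compares it to a cut-off. The max rule, the `REACHABLE_THRESHOLD` value of 50, and the function names are assumptions made for illustration only.

```python
from typing import Iterable, List, Tuple

KNOWN_HINT_TYPES = {"imported", "exported", "hot", "ebpf_seen", "graph_core"}
REACHABLE_THRESHOLD = 50  # hypothetical cut-off; the advisory gives no value


def reachability_score(hints: Iterable[Tuple[str, int]]) -> int:
    """Combine (hint_type, weight) pairs for one symbol into a 0..100 score.

    Assumption: the strongest single hint wins.
    """
    score = 0
    for hint_type, weight in hints:
        if hint_type not in KNOWN_HINT_TYPES:
            raise ValueError(f"unknown hint_type: {hint_type}")
        if not 0 <= weight <= 100:
            raise ValueError(f"weight out of range: {weight}")
        score = max(score, weight)
    return score


def is_reachable(hints: List[Tuple[str, int]]) -> bool:
    """Treat a symbol as reachable once its score meets the threshold."""
    return reachability_score(hints) >= REACHABLE_THRESHOLD
```

Under these assumptions, a symbol that is merely imported with weight 20 stays below the threshold, while the same symbol observed by eBPF sampling with weight 90 is treated as reachable.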

How the oracle answers "Am I affected?"

  1. Identify: Look up by Build-ID / PE signature; fall back to file hash.

  2. Locate: Map symbols → address ranges; scan for byte_sig/patch_sig.

  3. Decide:

    • if patch_sig present ⇒ Not affected (backported).
    • if byte_sig present and reachable (weighted) ⇒ Affected (prioritized).
    • if only byte_sig present, unreachable ⇒ Affected (low priority).
    • if neither ⇒ Unknown.
  4. Explain: Attach evidence_ref, the exact offsets, and the reason (match_type + reachability).
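The decision step above reduces to a small pure function. This sketch follows the listed order, with patch_sig checked first so that a backported fix always wins. The `Verdict` enum and `decide` function are hypothetical names, and the three booleans are assumed to have been computed by earlier scanning and reachability steps.

```python
from enum import Enum


class Verdict(Enum):
    NOT_AFFECTED = "not_affected"          # patch_sig present (backported)
    AFFECTED = "affected"                  # byte_sig present and reachable
    AFFECTED_LOW = "affected_low_priority" # byte_sig present, unreachable
    UNKNOWN = "unknown"                    # neither signature found


def decide(patch_sig_found: bool, byte_sig_found: bool, reachable: bool) -> Verdict:
    """Apply the decision rules in the order the advisory lists them."""
    if patch_sig_found:
        # The fixed hunk is in the binary, regardless of the package version.
        return Verdict.NOT_AFFECTED
    if byte_sig_found:
        return Verdict.AFFECTED if reachable else Verdict.AFFECTED_LOW
    return Verdict.UNKNOWN
```

Keeping the function free of I/O makes it straightforward to replay: the same three inputs always give the same verdict.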


Ingestion pipeline (no humans in the loop)

  • Fingerprinting: extract Build-ID / PE GUID; compute sha256.
  • Symbol map: parse DWARF/PDB if present; else fall back to heuristics (ELF symtab, PE exports).
  • Patch intelligence: auto-diff upstream commits (plus major distros) → synthesize short byte signatures around changed hunks (stable across relocations).
  • Evidence links: store URLs/commit IDs for cross-audit.
  • Noise control: only accept a vuln signature if it hits N≥3 independent binaries across distros (tunable).
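The noise-control rule can be sketched as a filter over observed signature hits. The advisory does not define "independent" or "across distros" precisely, so this sketch assumes that binaries are distinct when their sha256 differs and that a signature must also appear in at least two distros. The function name, the tuple layout, and the `min_distros` default are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def accepted_signatures(
    hits: Iterable[Tuple[str, str, str]],
    min_binaries: int = 3,  # the tunable N from the advisory
    min_distros: int = 2,   # assumption: "across distros" means more than one
) -> Set[str]:
    """Return signature IDs seen in enough distinct binaries and distros.

    Each hit is (signature_id, distro, binary_sha256).
    """
    binaries: Dict[str, Set[str]] = defaultdict(set)
    distros: Dict[str, Set[str]] = defaultdict(set)
    for signature_id, distro, binary_sha256 in hits:
        # Sets deduplicate repeated sightings of the same binary.
        binaries[signature_id].add(binary_sha256)
        distros[signature_id].add(distro)
    return {
        sig for sig in binaries
        if len(binaries[sig]) >= min_binaries and len(distros[sig]) >= min_distros
    }
```

Setting `min_distros=1` falls back to a pure binary count if the stricter reading turns out to be too aggressive.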

Deterministic verdicts (fit to Stella Ops)

  • Inputs: (artifact fingerprint, vuln_segments@version, reachability@policy)
  • Output: Signed OCI attestation "verdict.json" (same inputs → same verdict).
  • Replay: keep rule bundle & feed hashes for audit.
  • Backport precedence: patch_sig beats package version claims every time.
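The "same inputs → same verdict" property depends on serializing the inputs identically every time. Below is a minimal sketch of a replay key, assuming the three inputs are available as plain strings. It covers only the canonical hashing step, not the signing or OCI attestation, and the function and field names are hypothetical.

```python
import hashlib
import json


def verdict_input_digest(
    artifact_fingerprint: str,
    vuln_segments_version: str,
    reachability_policy: str,
) -> str:
    """Hash the verdict inputs in a canonical form for replay and audit."""
    payload = {
        "artifact_fingerprint": artifact_fingerprint,
        "vuln_segments_version": vuln_segments_version,
        "reachability_policy": reachability_policy,
    }
    # Sorted keys and fixed separators keep the byte stream stable across runs.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Storing this digest next to the verdict gives an auditor a quick way to check that a replay used the same fingerprint, feed version, and policy.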

Fast path to MVP (2 sprints)

  • Add a Build-ID/PE indexer to Scanner.
  • Teach Feedser/Vexer to ingest vuln_segments (with byte_sig/patch_sig).
  • Implement matching + verdict attestation; surface "Backported & Safe" vs "Affected & Reachable" badges in UI.
  • Seed DB with 10 high-impact CVEs (OpenSSL, zlib, xz, glibc, libxml2, curl, musl, busybox, OpenSSH, sudo).

Example: SQL skeleton (Postgres)

-- One row per scanned binary; sha256 is the fallback key when no Build-ID / PDB GUID is present.
create table artifacts(
  id bigserial primary key,
  platform text, format text,
  build_id text, pdb_guid text, pe_imphash text,
  sha256 bytea not null unique,
  compiler_fingerprint text, source_hint text
);

-- Function symbols and their address ranges (RVA for PE) within an artifact.
create table symbols(
  artifact_id bigint references artifacts(id),
  symbol_name text, addr_start bigint, addr_end bigint,
  section text, file_offset bigint
);

-- Per-CVE signatures: byte_sig marks the vulnerable hunk, patch_sig the fixed hunk.
create table vuln_segments(
  id bigserial primary key,
  cve_id text, function_signature text,
  byte_sig bytea, patch_sig bytea,
  evidence_ref text, backport_flag boolean,
  introduced_in text, fixed_in text
);

-- One row per (artifact, vuln_segment) hit, with how it matched and why.
create table matches(
  artifact_id bigint references artifacts(id),
  vuln_segment_id bigint references vuln_segments(id),
  match_type text, confidence real, explain text
);

-- Per-symbol reachability evidence; weight ranges from 0 to 100.
create table reachability_hints(
  artifact_id bigint references artifacts(id),
  symbol_name text, hint_type text, weight int
);