fixes save
# Mapping a Binary Intelligence Graph

> **Status:** SUPERSEDED
> **Date:** 2025-12-26
> **Updated:** 2025-12-26
> **Superseded By:** BinaryIndex Module Architecture
> **Related Sprints:** [`SPRINT_20251226_011_BINIDX_known_build_catalog.md`](../implplan/SPRINT_20251226_011_BINIDX_known_build_catalog.md), [`SPRINT_20251226_012_BINIDX_backport_handling.md`](../implplan/SPRINT_20251226_012_BINIDX_backport_handling.md), [`SPRINT_20251226_013_BINIDX_fingerprint_factory.md`](../implplan/SPRINT_20251226_013_BINIDX_fingerprint_factory.md), [`SPRINT_20251226_014_BINIDX_scanner_integration.md`](../implplan/SPRINT_20251226_014_BINIDX_scanner_integration.md)

---

## Supersession Notice

This advisory has been **superseded** by the comprehensive BinaryIndex module architecture. All proposals in this advisory are covered by the existing design:

| Advisory Proposal | Implementation | Location |
|-------------------|----------------|----------|
| artifacts table | `binaries.binary_identity` | `docs/modules/binaryindex/architecture.md` |
| symbols table | `BinaryFeatures` in `IBinaryFeatureExtractor` | `src/BinaryIndex/__Libraries/.../Services/` |
| vuln_segments (byte_sig/patch_sig) | `VulnFingerprint` model | `src/BinaryIndex/__Libraries/.../Fingerprints/` |
| matches table | `FingerprintMatch` model | `src/BinaryIndex/__Libraries/.../Fingerprints/` |
| reachability_hints | `ReachabilityStatus` enum | `src/BinaryIndex/__Libraries/.../Models/` |
| Build-ID/PE indexer | `ElfFeatureExtractor`, `IBinaryFeatureExtractor` | `src/BinaryIndex/__Libraries/.../Services/` |
| Patch-aware handling | `FixEvidence`, changelog/patch parsers | `src/BinaryIndex/__Libraries/.../FixIndex/` |
| Corpus connectors | `DebianCorpusConnector`, `IBinaryCorpusConnector` | `src/BinaryIndex/__Libraries/.../Corpus/` |

### Related Archived Advisories

- `18-Dec-2025 - Building Better Binary Mapping and Call‑Stack Reachability.md`
- `23-Dec-2026 - Binary Mapping as Attestable Proof.md`

### Related Active Advisories

- `25-Dec-2025 - Evolving Evidence Models for Reachability.md` - Runtime → build braid, eBPF sampling

---

## Original Advisory Content

Here's a compact blueprint for a **binary‑level knowledge base** that maps ELF Build‑IDs / PE signatures to vulnerable functions, patch lineage, and reachability hints, so your scanner can act like a provenance‑aware "binary oracle," not just a CVE lookup.

---

# Why this matters (in plain terms)

* **Same version ≠ same risk.** Distros (and vendors) frequently **backport** fixes without bumping versions. Only the **binary** tells the truth.
* **Function‑level matching** turns a noisy "package has CVE" into a precise "this exact function range is vulnerable in your binary."
* **Reachability hints** cut triage noise by ranking first the vulnerabilities whose code path can actually be hit at runtime.

---

# Minimal starter schema (MVP)

Keep it tiny so it grows with real evidence:

**artifacts**

* `id (pk)`
* `platform` (linux, windows)
* `format` (ELF, PE)
* `build_id` (ELF `.note.gnu.build-id`), `pdb_guid` / `pe_imphash` (Windows)
* `sha256` (whole‑file)
* `compiler_fingerprint` (e.g., `gcc-13.2`, `msvc-19.39`)
* `source_hint` (optional: pname/version if known)

**symbols**

* `artifact_id (fk)`
* `symbol_name`
* `addr_start`, `addr_end` (or RVA for PE)
* `section`, `file_offset` (optional)

**vuln_segments**

* `id (pk)`
* `cve_id` (CVE‑YYYY‑NNNN)
* `function_signature` (normalized name + arity)
* `byte_sig` (short stable pattern around the vulnerable hunk)
* `patch_sig` (pattern from fixed hunk)
* `evidence_ref` (link to patch diff, commit, or NVD note)
* `backport_flag` (bool)
* `introduced_in`, `fixed_in` (semver-ish text; note "backport" when used)

**matches**

* `artifact_id (fk)`, `vuln_segment_id (fk)`
* `match_type` (`byte`, `range`, `symbol`)
* `confidence` (0–1)
* `explain` (why we think this matches)

**reachability_hints**

* `artifact_id (fk)`, `symbol_name`
* `hint_type` (`imported`, `exported`, `hot`, `ebpf_seen`, `graph_core`)
* `weight` (0–100)

---

# How the oracle answers "Am I affected?"

1. **Identify**: Look up by Build‑ID / PE signature; fall back to file hash.
2. **Locate**: Map symbols → address ranges; scan for `byte_sig`/`patch_sig`.
3. **Decide**:

   * if `patch_sig` present ⇒ **Not affected (backported)**.
   * if `byte_sig` present and reachable (weighted) ⇒ **Affected (prioritized)**.
   * if only `byte_sig` present, unreachable ⇒ **Affected (low priority)**.
   * if neither ⇒ **Unknown**.
4. **Explain**: Attach `evidence_ref`, the exact offsets, and the reason (match_type + reachability).
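
The Locate and Decide steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BinaryIndex implementation: the `Verdict` type, the `decide` function, the plain substring scan, and the reachability cut-off of 50 (on the advisory's 0–100 `weight` scale) are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Verdict:
    status: str              # "not_affected", "affected", or "unknown"
    priority: Optional[str]  # "prioritized", "low", or None
    reason: str              # human-readable explanation with the offset


def find_sig(code: bytes, sig: bytes) -> int:
    """Return the offset of sig inside code, or -1 if absent or empty."""
    # bytes.find(b"") returns 0, so an empty signature must never match.
    return code.find(sig) if sig else -1


def decide(code: bytes, byte_sig: bytes, patch_sig: bytes,
           reach_weight: int, reach_threshold: int = 50) -> Verdict:
    """Apply the decision order from the list above.

    reach_threshold is an assumed cut-off; the advisory only says
    reachability is "weighted".
    """
    patch_at = find_sig(code, patch_sig)
    if patch_at >= 0:
        # The fixed hunk is present: patch evidence wins.
        return Verdict("not_affected", None,
                       f"patch_sig at offset {patch_at} (backported)")

    vuln_at = find_sig(code, byte_sig)
    if vuln_at >= 0:
        if reach_weight >= reach_threshold:
            return Verdict("affected", "prioritized",
                           f"byte_sig at offset {vuln_at}, reachable "
                           f"(weight {reach_weight})")
        return Verdict("affected", "low",
                       f"byte_sig at offset {vuln_at}, not reachable "
                       f"(weight {reach_weight})")

    return Verdict("unknown", None, "neither signature found")
```

In practice `code` would be the bytes of a single function's address range taken from the `symbols` table rather than the whole file, which keeps short signatures from matching unrelated code.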

---

# Ingestion pipeline (no humans in the loop)

* **Fingerprinting**: extract Build‑ID / PE GUID; compute `sha256`.
* **Symbol map**: parse DWARF/PDB if present; else fall back to heuristics (ELF `symtab`, PE exports).
* **Patch intelligence**: auto‑diff upstream commits (plus major distros) → synthesize short **byte signatures** around changed hunks (stable across relocations).
* **Evidence links**: store URLs/commit IDs for cross‑audit.
* **Noise control**: only accept a vuln signature if it hits N≥3 independent binaries across distros (tunable).
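
As a rough illustration of the noise-control rule, the sketch below accepts a signature only once it has been seen in enough distinct binaries. It rests on two assumptions that the advisory does not spell out: "independent" is read as "distinct whole-file `sha256`", and "across distros" is read as "at least two distros" through a hypothetical `min_distros` parameter. The `accepted_signatures` function and the hit-tuple layout are likewise invented for the example.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

# One observation: (signature id, artifact sha256, distro name).
Hit = Tuple[str, str, str]


def accepted_signatures(hits: Iterable[Hit],
                        min_binaries: int = 3,
                        min_distros: int = 2) -> Set[str]:
    """Return the signature ids that pass the noise-control threshold."""
    binaries = defaultdict(set)  # sig_id -> distinct artifact hashes
    distros = defaultdict(set)   # sig_id -> distinct distro names
    for sig_id, artifact_sha256, distro in hits:
        binaries[sig_id].add(artifact_sha256)
        distros[sig_id].add(distro)
    return {
        sig_id
        for sig_id in binaries
        if len(binaries[sig_id]) >= min_binaries
        and len(distros[sig_id]) >= min_distros
    }
```

Because hits are deduplicated by hash, the same binary shipped in several images counts only once toward the threshold.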

---

# Deterministic verdicts (fit to Stella Ops)

* **Inputs**: `(artifact fingerprint, vuln_segments@version, reachability@policy)`
* **Output**: **Signed OCI attestation** "verdict.json" (same inputs → same verdict).
* **Replay**: keep rule bundle & feed hashes for audit.
* **Backport precedence**: `patch_sig` beats package version claims every time.
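
One way to get the "same inputs → same verdict" property is to serialize the inputs canonically and hash them, keeping wall-clock time and other ambient state out of the document. The sketch below shows only that idea. The field names and the `verdict_document` helper are hypothetical, and signing and OCI packaging are out of scope here.

```python
import hashlib
import json
from typing import Any, Dict


def canonical_json(obj: Any) -> bytes:
    """Serialize with sorted keys and fixed separators for stable bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=True).encode("utf-8")


def verdict_document(artifact_fingerprint: Dict[str, str],
                     vuln_segments_version: str,
                     reachability_policy: str,
                     status: str) -> Dict[str, Any]:
    """Build a verdict body that depends only on its declared inputs."""
    inputs = {
        "artifact_fingerprint": artifact_fingerprint,
        "vuln_segments_version": vuln_segments_version,
        "reachability_policy": reachability_policy,
    }
    digest = hashlib.sha256(canonical_json(inputs)).hexdigest()
    # No timestamps or host details: they would break replayability.
    return {
        "inputs": inputs,
        "inputs_digest": "sha256:" + digest,
        "status": status,
    }
```

Storing `inputs_digest` alongside the rule bundle and feed hashes lets an auditor re-run the evaluation and compare the canonical bytes of the result.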

---

# Fast path to MVP (2 sprints)

* Add a **Build‑ID/PE indexer** to Scanner.
* Teach Feedser/Vexer to ingest `vuln_segments` (with `byte_sig`/`patch_sig`).
* Implement matching + verdict attestation; surface **"Backported & Safe"** vs **"Affected & Reachable"** badges in UI.
* Seed DB with 10 high‑impact CVEs (OpenSSL, zlib, xz, glibc, libxml2, curl, musl, busybox, OpenSSH, sudo).

---

# Example: SQL skeleton (Postgres)

```sql
create table artifacts(
  id bigserial primary key,
  platform text, format text,
  build_id text, pdb_guid text, pe_imphash text,
  sha256 bytea not null unique,
  compiler_fingerprint text, source_hint text
);

create table symbols(
  artifact_id bigint references artifacts(id),
  symbol_name text, addr_start bigint, addr_end bigint,
  section text, file_offset bigint
);

create table vuln_segments(
  id bigserial primary key,
  cve_id text, function_signature text,
  byte_sig bytea, patch_sig bytea,
  evidence_ref text, backport_flag boolean,
  introduced_in text, fixed_in text
);

create table matches(
  artifact_id bigint references artifacts(id),
  vuln_segment_id bigint references vuln_segments(id),
  match_type text, confidence real, explain text
);

create table reachability_hints(
  artifact_id bigint references artifacts(id),
  symbol_name text, hint_type text, weight int
);
```