Here’s a compact, practical blueprint for a binary‑fingerprint store + trust‑scoring engine that lets you quickly tell whether a system binary is patched, backported, or risky—even fully offline.
Why this matters (plain English)
Package versions can mislead: distros backport security fixes without changing the upstream version string. Instead of trusting names like libssl 1.1.1k, we trust what’s inside: build IDs, section hashes, compiler metadata, and signed provenance. With that, we can answer: is this exact binary known‑good, known‑bad, or unknown on this distro, on this date, with these patches?
Core concept
- Binary Fingerprint = tuple of:
  - Build‑ID (ELF/PE), if present.
  - Section‑level hashes (e.g., .text, .rodata, selected function ranges).
  - Compiler/Linker metadata (vendor/version, LTO flags, PIE/RELRO, sanitizer bits).
  - Symbol graph sketch (optional, min‑hash of exported symbol names + sizes).
  - Feature toggles (FIPS mode, CET/CFI present, Fortify level, RELRO type, SSP).
- Provenance Chain (who built it): Upstream → Distro vendor (with patchset) → Local rebuild.
- Trust Score: combines provenance weight + cryptographic attestations + “golden set” matches + observed patch deltas.
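As an illustration of the optional symbol graph sketch, here is a minimal bottom-k min-hash over (name, size) pairs. The function names, the choice of SHA-256, and k = 64 are assumptions made for this sketch, not part of the blueprint, and the symbol sizes in the usage lines are made up.

```python
import hashlib
from typing import Iterable, List, Tuple


def symbol_sketch(symbols: Iterable[Tuple[str, int]], k: int = 64) -> List[int]:
    """Bottom-k min-hash: keep the k smallest 64-bit hashes of 'name:size'."""
    hashes = set()
    for name, size in symbols:
        digest = hashlib.sha256(f"{name}:{size}".encode()).digest()
        hashes.add(int.from_bytes(digest[:8], "big"))
    return sorted(hashes)[:k]


def sketch_similarity(a: List[int], b: List[int], k: int = 64) -> float:
    """Estimate Jaccard similarity of the underlying symbol sets."""
    union_bottom = sorted(set(a) | set(b))[:k]
    if not union_bottom:
        return 1.0  # two empty sketches are treated as identical
    shared = set(a) & set(b)
    return sum(1 for h in union_bottom if h in shared) / len(union_bottom)


# Hypothetical exported symbols of a vendor build and a local rebuild.
vendor = symbol_sketch([("SSL_read", 412), ("SSL_write", 388), ("SSL_new", 640)])
rebuild = symbol_sketch([("SSL_read", 412), ("SSL_write", 388), ("SSL_new", 656)])
print(sketch_similarity(vendor, rebuild))  # 2 shared of 4 distinct pairs -> 0.5
```

Including the size in the hashed key means a function whose body grew (for example after a patch) counts as a different element, which is what makes the sketch useful for narrowing candidates before any function hashing is done.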
Minimal architecture (fits Stella Ops style)
- Ingesters
  - ingester.distro: walks repo mirrors or local systems, extracts ELF/PE, computes fingerprints, captures package→file mapping, vendor patch metadata (changelog, source SRPM diffs).
  - ingester.upstream: indexes upstream releases, commit tags, and official build artifacts.
  - ingester.local: indexes CI outputs (your own builds), in‑toto/DSSE attestations if available.
- Fingerprint Store (offline‑ready)
  - Primary DB: PostgreSQL (authoritative).
  - Accelerator: Valkey (ephemeral) for fast lookup by Build‑ID and section hash prefixes.
  - Bundle Export: signed, chunked SQLite/Parquet packs for air‑gapped sites.
- Trust Engine
  - Scores (0–100) per binary instance using:
    - Provenance weight (Upstream signed > Distro signed > Local unsigned).
    - Attestation presence/quality (in‑toto/DSSE, reproducible build stamp).
    - Patch alignment vs Golden Set (reference fingerprints for “fixed” and “vulnerable” builds).
    - Hardening baseline (RELRO/PIE/SSP/CET/CFI).
    - Divergence penalty (unexpected section deltas vs vendor‑declared patch).
  - Emits Verdict: Patched, Likely Patched (Backport), Unpatched, Unknown, with rationale.
- Query APIs
  - /lookup/by-buildid/{id}
  - /lookup/by-hash/{algo}/{prefix}
  - /classify (batch): accepts an SBOM file list or live filesystem scan.
  - /explain/{fingerprint}: returns diff vs Golden Set and the proof trail.
Data model (tables you can lift into Postgres)
- artifact(artifact_id PK, file_sha256, size, mime, elf_machine, pe_machine, ts, signers[])
- fingerprint(fp_id PK, artifact_id, build_id, text_hash, rodata_hash, sym_sketch, compiler_vendor, compiler_ver, lto, pie, relro, ssp, cfi, cet, flags jsonb)
- provenance(prov_id PK, fp_id, origin ENUM('upstream','distro','local'), vendor, distro, release, package, version, source_commit, patchset jsonb, attestation_hash, attestation_quality_score)
- golden_set(golden_id PK, package, cve, status ENUM('fixed','vulnerable'), fp_ref, method ENUM('vendor-advisory','diff-sig','function-patch'), notes)
- trust_score(fp_id, score int, verdict, reasons jsonb, computed_at)
Indexes: (build_id), (text_hash), (rodata_hash), (package, version), GIN on patchset, reasons.
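To make the model concrete, the following sketch creates a cut-down version of two of these tables and runs the Build‑ID → golden_set join. It uses Python's built-in sqlite3 purely as a stand-in so it runs without a server; in PostgreSQL the ENUM and jsonb columns would be native types rather than CHECK constraints. The index names, helper names, and sample row values are hypothetical.

```python
import sqlite3

# Reduced schema: only the columns needed for a Build-ID lookup.
SCHEMA = """
CREATE TABLE fingerprint (
    fp_id       INTEGER PRIMARY KEY,
    build_id    TEXT,
    text_hash   TEXT,
    rodata_hash TEXT
);
CREATE TABLE golden_set (
    golden_id INTEGER PRIMARY KEY,
    package   TEXT NOT NULL,
    cve       TEXT NOT NULL,
    status    TEXT NOT NULL CHECK (status IN ('fixed', 'vulnerable')),
    fp_ref    INTEGER NOT NULL REFERENCES fingerprint(fp_id),
    method    TEXT NOT NULL
              CHECK (method IN ('vendor-advisory', 'diff-sig', 'function-patch'))
);
CREATE INDEX idx_fp_build_id  ON fingerprint(build_id);
CREATE INDEX idx_fp_text_hash ON fingerprint(text_hash);
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with the reduced schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def golden_hits(conn: sqlite3.Connection, build_id: str) -> list:
    """Return (package, cve, status, method) rows for an exact Build-ID hit."""
    return conn.execute(
        "SELECT g.package, g.cve, g.status, g.method "
        "FROM fingerprint f JOIN golden_set g ON g.fp_ref = f.fp_id "
        "WHERE f.build_id = ? ORDER BY g.cve",
        (build_id,),
    ).fetchall()


conn = open_store()
conn.execute("INSERT INTO fingerprint VALUES (1, 'abc123', 'th', 'rh')")
conn.execute(
    "INSERT INTO golden_set VALUES "
    "(1, 'openssl', 'CVE-2023-XXXX', 'fixed', 1, 'vendor-advisory')"
)
print(golden_hits(conn, "abc123"))
```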
How detection works (fast path)
- Exact match: Build‑ID hit → join golden_set → return verdict + reason.
- Near match (backport mode): no Build‑ID match → compare .text/.rodata and function‑range hashes against the “fixed” and “vulnerable” Golden Set entries:
  - If patched function ranges match, mark Likely Patched (Backport).
  - If vulnerable function ranges match, mark Unpatched.
- Heuristic fallback: symbol sketch + compiler metadata + hardening flags narrow the candidate set; compute targeted function hashes only (don’t hash the whole file).
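The first two steps can be sketched as a small in-memory classifier. This is a minimal illustration under stated assumptions: GoldenEntry and classify are hypothetical names, function-range hashes are assumed to be precomputed and keyed by function name, and the heuristic fallback is left out (anything unmatched is reported as Unknown).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class GoldenEntry:
    cve: str
    status: str                      # "fixed" or "vulnerable"
    build_id: Optional[str] = None
    # Hashes of the function ranges touched by the fix, keyed by function name.
    function_hashes: Dict[str, str] = field(default_factory=dict)


def classify(build_id: Optional[str],
             function_hashes: Dict[str, str],
             golden: List[GoldenEntry]) -> Tuple[str, str]:
    """Return (verdict, reason) for one binary."""
    # Step 1: exact Build-ID hit.
    if build_id:
        for entry in golden:
            if entry.build_id == build_id:
                verdict = "Patched" if entry.status == "fixed" else "Unpatched"
                return verdict, f"exact Build-ID match ({entry.cve}, {entry.status})"

    # Step 2: near match on function-range hashes (backport mode).
    for status, verdict in (("fixed", "Likely Patched (Backport)"),
                            ("vulnerable", "Unpatched")):
        for entry in golden:
            if entry.status != status or not entry.function_hashes:
                continue
            if all(function_hashes.get(name) == digest
                   for name, digest in entry.function_hashes.items()):
                return verdict, f"function ranges match {status} build ({entry.cve})"

    return "Unknown", "no Golden Set match"


golden = [
    GoldenEntry("CVE-2023-XXXX", "fixed", "aa11", {"parse_hdr": "h_fixed"}),
    GoldenEntry("CVE-2023-XXXX", "vulnerable", "bb22", {"parse_hdr": "h_vuln"}),
]
print(classify("zz99", {"parse_hdr": "h_fixed", "other": "x"}, golden))
```

Requiring every listed function range to match keeps a partial overlap from being reported as a backport; how partial matches should be scored is a separate decision for the trust engine.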
Building the “Golden Set”
- Sources:
  - Vendor advisories (per‑CVE “fixed in” builds).
  - Upstream tags containing the fix commit.
  - Distro SRPM diffs for backports (extract exact hunk regions; compute function‑range hashes pre/post).
- Store both:
  - “Fixed” fingerprints (post‑patch).
  - “Vulnerable” fingerprints (pre‑patch).
- Annotate evidence method: vendor-advisory (strong), diff-sig (strong if clean hunk), function-patch (targeted).
Trust scoring (example)
- Base by provenance:
  - Upstream + signed + reproducible: +40
  - Distro signed with changelog & SRPM diff: +30
  - Local unsigned: +10
- Attestations:
  - Valid DSSE + in‑toto chain: +20
  - Reproducible build proof: +10
- Golden Set alignment:
  - Matches “fixed”: +20
  - Matches “vulnerable”: −40
  - Partial (patched functions match, rest differs): +10
- Hardening:
  - PIE/RELRO/SSP/CET/CFI each +2 (cap +10)
- Divergence penalties:
  - Unexplained text‑section drift −10
  - Suspicious toolchain fingerprint −5
Verdict bands: ≥80 Patched, 65–79 Likely Patched (Backport), 35–64 Unknown, <35 Unpatched.
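The example weights and bands translate directly into code. The sketch below is one possible reading: the Evidence fields and function names are hypothetical, provenance is reduced to a single key per tier, and clamping to 0–100 is an assumption based on the stated score range.

```python
from dataclasses import dataclass
from typing import Optional

PROVENANCE_POINTS = {"upstream": 40, "distro": 30, "local": 10}
GOLDEN_POINTS = {"fixed": 20, "vulnerable": -40, "partial": 10, None: 0}


@dataclass
class Evidence:
    provenance: str                       # "upstream", "distro" or "local"
    dsse_chain_valid: bool = False        # valid DSSE + in-toto chain
    reproducible: bool = False            # reproducible build proof
    golden_match: Optional[str] = None    # "fixed", "vulnerable", "partial"
    hardening_flags: int = 0              # count of PIE/RELRO/SSP/CET/CFI present
    unexplained_text_drift: bool = False
    suspicious_toolchain: bool = False


def trust_score(e: Evidence) -> int:
    score = PROVENANCE_POINTS[e.provenance]
    score += 20 if e.dsse_chain_valid else 0
    score += 10 if e.reproducible else 0
    score += GOLDEN_POINTS[e.golden_match]
    score += min(2 * e.hardening_flags, 10)   # +2 each, capped at +10
    score -= 10 if e.unexplained_text_drift else 0
    score -= 5 if e.suspicious_toolchain else 0
    return max(0, min(100, score))            # assumed clamp to 0-100


def verdict(score: int) -> str:
    if score >= 80:
        return "Patched"
    if score >= 65:
        return "Likely Patched (Backport)"
    if score >= 35:
        return "Unknown"
    return "Unpatched"


backport = Evidence("distro", dsse_chain_valid=True,
                    golden_match="partial", hardening_flags=5)
s = trust_score(backport)
print(s, verdict(s))  # 30 + 20 + 10 + 10 = 70 -> Likely Patched (Backport)
```

One consequence of purely additive scoring is worth checking before adopting these numbers: a signed, reproducible upstream build with full hardening that matches a “vulnerable” fingerprint scores 40 + 20 + 10 − 40 + 10 = 40, which falls in the Unknown band rather than Unpatched. An implementation may want a Golden Set “vulnerable” match to decide the verdict directly instead of only subtracting points.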
CLI outline (Stella Ops‑style)
# Index a filesystem or package repo
stella-fp index /usr/bin /lib --out fp.db --bundle out.bundle.parquet
# Score a host (offline)
stella-fp classify --fp-store fp.db --golden golden.db --out verdicts.json
# Explain a result
stella-fp explain --fp <fp_id> --golden golden.db
# Maintain Golden Set
stella-fp golden add --package openssl --cve CVE-2023-XXXX --status fixed --from-srpm path.src.rpm
stella-fp golden add --package openssl --cve CVE-2023-XXXX --status vulnerable --from-upstream v1.1.1k
Implementation notes (ELF/PE)
- ELF: read Build‑ID from .note.gnu.build-id; hash .text and selected function ranges (use DWARF/eh_frame or symbol table when present; otherwise lightweight linear‑sweep with sanity checks). Record RELRO/PIE from the program headers (PT_GNU_RELRO) and the dynamic section (BIND_NOW distinguishes full from partial RELRO).
- PE: use Debug Directory (GUID/age) and Section Table; capture CFG/ASLR/NX/GS flags.
- Function‑range hashing: normalize NOPs/padding, zero relocation slots, mask address‑relative operands (keeps hashes stable across vendor rebuilds).
- Performance: cache per‑section hash; only compute function hashes when near‑match needs confirmation.
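Two of these notes are easy to prototype: parsing the Build‑ID out of raw note-section bytes, and hashing a function range with relocation slots zeroed. The sketch below assumes the caller has already located the section bytes and the relocation offsets (for example with an ELF library), that note fields are 4-byte aligned as is usual for .note.gnu.build-id, and that relocation slots are 4 bytes wide; NOP/padding normalization is architecture-specific and is not shown. Function names are hypothetical.

```python
import hashlib
import struct
from typing import Iterable, Optional

NT_GNU_BUILD_ID = 3  # note type used by GNU toolchains for the Build-ID


def _align4(n: int) -> int:
    return (n + 3) & ~3


def build_id_from_notes(section: bytes, little_endian: bool = True) -> Optional[str]:
    """Walk ELF note records and return the GNU Build-ID as hex, if present."""
    fmt = "<III" if little_endian else ">III"
    offset = 0
    while offset + 12 <= len(section):
        namesz, descsz, ntype = struct.unpack_from(fmt, section, offset)
        offset += 12
        name = section[offset:offset + namesz]
        offset += _align4(namesz)
        desc = section[offset:offset + descsz]
        offset += _align4(descsz)
        if len(desc) != descsz:
            return None  # truncated note; refuse to guess
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc.hex()
    return None


def masked_range_hash(code: bytes, reloc_offsets: Iterable[int],
                      reloc_size: int = 4) -> str:
    """SHA-256 of a function range with relocation slots zeroed out."""
    buf = bytearray(code)
    for off in reloc_offsets:
        if 0 <= off < len(buf):
            end = min(off + reloc_size, len(buf))
            buf[off:end] = bytes(end - off)  # same-length run of zero bytes
    return hashlib.sha256(bytes(buf)).hexdigest()


# Synthetic note: namesz=4, descsz=20, type=3, name "GNU\0", 20-byte ID.
note = struct.pack("<III", 4, 20, NT_GNU_BUILD_ID) + b"GNU\x00" + bytes(range(20))
print(build_id_from_notes(note))

# Two builds of the same function that differ only in a relocated address.
a = b"\x55\x48\x89\xe5\xe8\x10\x20\x30\x40\xc3"
b = b"\x55\x48\x89\xe5\xe8\xaa\xbb\xcc\xdd\xc3"
print(masked_range_hash(a, [5]) == masked_range_hash(b, [5]))
```

Zeroing the slots rather than deleting them keeps byte offsets stable, so the same relocation list can be reused when diffing two ranges after hashing.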
How this plugs into your world
- Sbomer/Vexer: attach trust scores & verdicts to components in CycloneDX/SPDX; emit VEX statements like “Fixed by backport: evidence=diff‑sig, source=Astra/RedHat SRPM.”
- Feedser: when CVE feed says “vulnerable by version,” override with binary proof from Golden Set.
- Policy Engine: gate deployments on verdict ∈ {Patched, Likely Patched} OR score ≥ 65.
Next steps you can action today
- Create schemas above in Postgres; scaffold a small stella-fp Go/.NET tool to compute fingerprints for /bin, /lib* on one reference host (e.g., Debian + Alpine).
- Hand‑curate a pilot Golden Set for 3 noisy CVEs (OpenSSL, glibc, curl). Store both pre/post patch fingerprints and 2–3 backported vendor builds each.
- Wire a classify step into your CI/CD and surface the verdict + rationale in your VEX output.
If you want, I can drop in starter code (C#/.NET 10) for the fingerprint extractor and the Postgres schema migration, plus a tiny “function‑range hasher” that masks relocations and normalizes padding.