Here’s a compact, practical blueprint for a **binary‑fingerprint store + trust‑scoring engine** that lets you quickly tell whether a system binary is patched, backported, or risky, even fully offline.

# Why this matters (plain English)

Package versions can lie: distros routinely backport security fixes without changing the upstream version string. Instead of trusting names like `libssl 1.1.1k`, we trust **what’s inside**: build IDs, section hashes, compiler metadata, and signed provenance. With that, we can answer: *Is this exact binary known‑good, known‑bad, or unknown, on this distro, on this date, with these patches?*

---
# Core concept

* **Binary Fingerprint**: a tuple of
  * **Build‑ID**, if present (the ELF GNU Build‑ID; on PE, the debug‑directory GUID/age plays the same role).
  * **Section‑level hashes** (e.g., `.text`, `.rodata`, selected function ranges).
  * **Compiler/linker metadata** (vendor/version, LTO flags, PIE/RELRO, sanitizer bits).
  * **Symbol graph sketch** (optional: a min‑hash of exported symbol names + sizes).
  * **Feature toggles** (FIPS mode, CET/CFI present, Fortify level, RELRO type, SSP).
* **Provenance Chain** (who built it): Upstream → Distro vendor (with patchset) → Local rebuild.
* **Trust Score**: combines provenance weight, cryptographic attestations, “golden set” matches, and observed patch deltas.
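One way to make the fingerprint tuple concrete is a pair of frozen dataclasses whose canonical JSON encoding is hashed into a stable identifier, so the same binary always yields the same ID regardless of field ordering. This is a sketch under assumptions: the class names, the `fp_id()` helper, and hashing the whole tuple with SHA‑256 are illustrative choices, not part of the blueprint.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Hardening:
    pie: bool = False
    relro: str = "none"          # "none", "partial" or "full"
    ssp: bool = False
    cet: bool = False
    cfi: bool = False
    fortify_level: int = 0
    fips_mode: bool = False


@dataclass(frozen=True)
class BinaryFingerprint:
    build_id: Optional[str]                 # ELF Build-ID or PE GUID+age as hex; None if absent
    section_hashes: Dict[str, str]          # e.g. {".text": "<sha256>", ".rodata": "<sha256>"}
    compiler: str = ""                      # vendor/version string, e.g. read from .comment
    link_flags: Tuple[str, ...] = ()        # e.g. ("lto",)
    symbol_sketch: Tuple[int, ...] = ()     # optional min-hash values
    hardening: Hardening = field(default_factory=Hardening)

    def fp_id(self) -> str:
        """Deterministic ID: SHA-256 of a canonical (key-sorted) JSON encoding of every field."""
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```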
---

# Minimal architecture (fits Stella Ops style)

1. **Ingesters**
   * `ingester.distro`: walks repo mirrors or local systems, extracts ELF/PE files, computes fingerprints, and captures the package→file mapping plus vendor patch metadata (changelogs, source‑package diffs such as SRPMs).
   * `ingester.upstream`: indexes upstream releases, commit tags, and official build artifacts.
   * `ingester.local`: indexes CI outputs (your own builds) and in‑toto/DSSE attestations, if available.
2. **Fingerprint Store (offline‑ready)**
   * **Primary DB**: PostgreSQL (authoritative).
   * **Accelerator**: Valkey (ephemeral) for fast lookup by Build‑ID and section‑hash prefixes.
   * **Bundle Export**: signed, chunked SQLite/Parquet packs for air‑gapped sites.
3. **Trust Engine**
   * Scores (0–100) each binary instance using:
     * Provenance weight (Upstream signed > Distro signed > Local unsigned).
     * Attestation presence/quality (in‑toto/DSSE, reproducible‑build stamp).
     * Patch alignment vs. the **Golden Set** (reference fingerprints for “fixed” and “vulnerable” builds).
     * Hardening baseline (RELRO/PIE/SSP/CET/CFI).
     * Divergence penalty (unexpected section deltas vs. the vendor‑declared patch).
   * Emits a **Verdict** (`Patched`, `Likely Patched (Backport)`, `Unpatched`, or `Unknown`) with a rationale.
4. **Query APIs**
   * `/lookup/by-buildid/{id}`
   * `/lookup/by-hash/{algo}/{prefix}`
   * `/classify` (batch): accepts an SBOM file list or a live filesystem scan.
   * `/explain/{fingerprint}`: returns the diff vs. the Golden Set and the proof trail.
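The two lookup endpoints correspond to two access patterns: an exact key match on the Build‑ID, and a range scan over sorted digests for hash prefixes. The sketch below uses an in‑memory sorted list as a stand‑in for the Valkey accelerator (in Valkey, a sorted set queried by lexicographic range could plausibly play the same role); `FingerprintIndex` and its methods are hypothetical names.

```python
import bisect
from collections import defaultdict


class FingerprintIndex:
    """In-memory stand-in for the accelerator: exact Build-ID and hash-prefix lookups."""

    def __init__(self):
        self._by_build_id = defaultdict(set)   # build_id -> {fp_id}
        self._by_hash = defaultdict(list)      # algo -> sorted [(hex_digest, fp_id)]

    def add(self, fp_id, build_id=None, hashes=None):
        if build_id:
            self._by_build_id[build_id.lower()].add(fp_id)
        for algo, digest in (hashes or {}).items():
            bisect.insort(self._by_hash[algo], (digest.lower(), fp_id))

    def by_build_id(self, build_id):
        # Mirrors /lookup/by-buildid/{id}; .get() avoids creating empty entries.
        return sorted(self._by_build_id.get(build_id.lower(), ()))

    def by_hash_prefix(self, algo, prefix, limit=50):
        # Mirrors /lookup/by-hash/{algo}/{prefix}: binary-search to the first
        # digest >= prefix, then scan while digests still start with it.
        entries = self._by_hash.get(algo, [])
        prefix = prefix.lower()
        start = bisect.bisect_left(entries, (prefix, ""))
        results = []
        for digest, fp_id in entries[start:]:
            if not digest.startswith(prefix) or len(results) >= limit:
                break
            results.append(fp_id)
        return results
```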
---

# Data model (tables you can lift into Postgres)

* `artifact`: `(artifact_id PK, file_sha256, size, mime, elf_machine, pe_machine, ts, signers[])`
* `fingerprint`: `(fp_id PK, artifact_id, build_id, text_hash, rodata_hash, sym_sketch, compiler_vendor, compiler_ver, lto, pie, relro, ssp, cfi, cet, flags jsonb)`
* `provenance`: `(prov_id PK, fp_id, origin ENUM('upstream','distro','local'), vendor, distro, release, package, version, source_commit, patchset jsonb, attestation_hash, attestation_quality_score)`
* `golden_set`: `(golden_id PK, package, cve, status ENUM('fixed','vulnerable'), fp_ref, method ENUM('vendor-advisory','diff-sig','function-patch'), notes)`
* `trust_score`: `(fp_id, score int, verdict, reasons jsonb, computed_at)`

The inline `ENUM(...)` notation is shorthand; in Postgres each one becomes a `CREATE TYPE ... AS ENUM` type (or a `CHECK` constraint). Indexes: B‑tree on `fingerprint(build_id)`, `fingerprint(text_hash)`, `fingerprint(rodata_hash)`, and `provenance(package, version)`; GIN on `provenance.patchset` and `trust_score.reasons`.
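For the SQLite side of the bundle export, the Postgres tables above might translate roughly as follows. This is a hedged sketch: SQLite has no ENUM, `jsonb`, array, or GIN support, so the assumed mapping uses `CHECK` constraints for the enums and JSON stored as text, and only the B‑tree indexes carry over. The schema string and the `open_bundle` helper are hypothetical.

```python
import sqlite3

# Hypothetical SQLite translation of the Postgres tables, for offline bundles.
# ENUMs become CHECK constraints; jsonb/array columns are stored as JSON text.
BUNDLE_SCHEMA = """
CREATE TABLE artifact (
    artifact_id INTEGER PRIMARY KEY,
    file_sha256 TEXT NOT NULL,
    size INTEGER, mime TEXT, elf_machine TEXT, pe_machine TEXT, ts TEXT,
    signers TEXT NOT NULL DEFAULT '[]'              -- JSON array
);
CREATE TABLE fingerprint (
    fp_id INTEGER PRIMARY KEY,
    artifact_id INTEGER NOT NULL REFERENCES artifact(artifact_id),
    build_id TEXT, text_hash TEXT, rodata_hash TEXT, sym_sketch TEXT,
    compiler_vendor TEXT, compiler_ver TEXT,
    lto INTEGER, pie INTEGER, relro TEXT, ssp INTEGER, cfi INTEGER, cet INTEGER,
    flags TEXT NOT NULL DEFAULT '{}'                -- JSON object
);
CREATE TABLE provenance (
    prov_id INTEGER PRIMARY KEY,
    fp_id INTEGER NOT NULL REFERENCES fingerprint(fp_id),
    origin TEXT NOT NULL CHECK (origin IN ('upstream', 'distro', 'local')),
    vendor TEXT, distro TEXT, "release" TEXT, package TEXT, version TEXT,
    source_commit TEXT, patchset TEXT, attestation_hash TEXT,
    attestation_quality_score REAL
);
CREATE TABLE golden_set (
    golden_id INTEGER PRIMARY KEY,
    package TEXT NOT NULL, cve TEXT NOT NULL,
    status TEXT NOT NULL CHECK (status IN ('fixed', 'vulnerable')),
    fp_ref INTEGER REFERENCES fingerprint(fp_id),
    method TEXT CHECK (method IN ('vendor-advisory', 'diff-sig', 'function-patch')),
    notes TEXT
);
CREATE TABLE trust_score (
    fp_id INTEGER NOT NULL REFERENCES fingerprint(fp_id),
    score INTEGER CHECK (score BETWEEN 0 AND 100),
    verdict TEXT, reasons TEXT, computed_at TEXT
);
CREATE INDEX idx_fp_build_id ON fingerprint(build_id);
CREATE INDEX idx_fp_text_hash ON fingerprint(text_hash);
CREATE INDEX idx_fp_rodata_hash ON fingerprint(rodata_hash);
CREATE INDEX idx_prov_pkg_ver ON provenance(package, version);
"""


def open_bundle(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a bundle database with foreign keys enforced."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(BUNDLE_SCHEMA)
    return conn
```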
---

# How detection works (fast path)

1. **Exact match**: Build‑ID hit → join `golden_set` → return the verdict and reason.
2. **Near match (backport mode)**: no Build‑ID match → compare `.text`/`.rodata` and function‑range hashes against the Golden Set’s “fixed” and “vulnerable” entries:
   * If the patched function ranges match, mark **Likely Patched (Backport)**.
   * If the vulnerable function ranges match, mark **Unpatched**.
3. **Heuristic fallback**: the symbol sketch, compiler metadata, and hardening flags narrow the candidate set; then compute targeted function hashes only (don’t hash the whole file).
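The first two fast‑path steps can be sketched as one function evaluated against a single CVE’s Golden Set entries. Assumptions: `GoldenEntry` and `classify` are hypothetical names; function‑range hashes are precomputed and normalized; a near match means every hash of a Golden Set entry appears in the binary; conflicting matches fall back to `Unknown`; and the heuristic fallback (step 3) is not modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class GoldenEntry:
    cve: str
    status: str                     # "fixed" or "vulnerable"
    build_id: Optional[str]         # exact build, when known
    func_hashes: FrozenSet[str]     # normalized hashes of the functions the fix touches


def classify(build_id: Optional[str], func_hashes: FrozenSet[str],
             golden: Iterable[GoldenEntry]) -> Tuple[str, str]:
    """Return (verdict, reason) for one binary against one CVE's Golden Set entries."""
    golden = list(golden)
    # 1. Exact match: a Build-ID hit decides immediately.
    for e in golden:
        if build_id is not None and e.build_id == build_id:
            verdict = "Patched" if e.status == "fixed" else "Unpatched"
            return verdict, f"build-id {build_id} is a known {e.status} build for {e.cve}"

    # 2. Near match: every function hash of an entry is present in the binary.
    def matches(status: str):
        return [e.cve for e in golden
                if e.status == status and e.func_hashes and e.func_hashes <= func_hashes]

    fixed, vulnerable = matches("fixed"), matches("vulnerable")
    if fixed and not vulnerable:
        return "Likely Patched (Backport)", f"fixed function ranges match ({fixed[0]})"
    if vulnerable and not fixed:
        return "Unpatched", f"vulnerable function ranges match ({vulnerable[0]})"
    # 3. Heuristic fallback (candidate narrowing + targeted hashing) is not modeled here.
    return "Unknown", "no conclusive build-id or function-range match"
```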
---

# Building the “Golden Set”

* Sources:
  * Vendor advisories (per‑CVE “fixed in” builds).
  * Upstream tags containing the fix commit.
  * Distro SRPM diffs for backports (extract the exact hunk regions; compute function‑range hashes pre/post patch).
* Store **both**:
  * “Fixed” fingerprints (post‑patch).
  * “Vulnerable” fingerprints (pre‑patch).
* Annotate the evidence method:
  * `vendor-advisory` (strong), `diff-sig` (strong if the hunk is clean), `function-patch` (targeted).
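The pre/post hashing step for backports reduces to a comparison: among the functions the fix’s hunks touch, only those whose normalized hash actually changed carry a signal. A minimal sketch, assuming per‑function hashes are already computed for both builds; `golden_signatures` and `golden_rows` are hypothetical helpers, and the row shape only loosely follows the `golden_set` table.

```python
from typing import Dict, List, Set


def golden_signatures(pre: Dict[str, str], post: Dict[str, str],
                      touched: Set[str]) -> Dict[str, Dict[str, str]]:
    """Split function-range hashes into 'vulnerable' (pre-patch) and 'fixed' (post-patch) signatures.

    pre/post map function name -> normalized function-range hash for the two builds;
    touched holds the functions covered by the fix's diff hunks.
    """
    # Keep only touched functions present in both builds whose hash actually changed.
    changed = sorted(f for f in touched if f in pre and f in post and pre[f] != post[f])
    return {
        "vulnerable": {f: pre[f] for f in changed},
        "fixed": {f: post[f] for f in changed},
    }


def golden_rows(package: str, cve: str, pre, post, touched,
                method: str = "diff-sig") -> List[dict]:
    """Build one 'fixed' and one 'vulnerable' golden_set-like row, or none if nothing changed."""
    sigs = golden_signatures(pre, post, touched)
    if not sigs["fixed"]:
        return []  # the hunks changed no hashed function: no usable signature
    return [
        {"package": package, "cve": cve, "status": status, "method": method,
         "func_hashes": sigs[status]}
        for status in ("fixed", "vulnerable")
    ]
```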
---

# Trust scoring (example)

* Base by provenance (pick one):
  * Upstream + signed + reproducible: **+40**
  * Distro signed with changelog & SRPM diff: **+30**
  * Local unsigned: **+10**
* Attestations:
  * Valid DSSE + in‑toto chain: **+20**
  * Reproducible build proof: **+10**
* Golden Set alignment:
  * Matches “fixed”: **+20**
  * Matches “vulnerable”: **−40**
  * Partial (patched functions match, rest differs): **+10**
* Hardening:
  * PIE/RELRO/SSP/CET/CFI each **+2** (cap +10)
* Divergence penalties:
  * Unexplained `.text`‑section drift: **−10**
  * Suspicious toolchain fingerprint: **−5**

The maximum is 40 + 30 + 20 + 10 = 100; clamp the total to 0–100. Verdict bands: `≥80 Patched`, `65–79 Likely Patched (Backport)`, `35–64 Unknown`, `<35 Unpatched`.
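The example weights translate directly into a scoring function. This sketch assumes the provenance bases are mutually exclusive labels (the keys are hypothetical), treats each hardening flag as simply present or absent, and returns the per‑item contributions as the rationale. With these weights, a distro‑signed build that matches “fixed” with no attestations and two hardening flags scores 30 + 20 + 4 = 54, i.e., `Unknown`.

```python
# Weights copied from the example above; the dictionary keys are hypothetical labels.
PROVENANCE_BASE = {
    "upstream-signed-reproducible": 40,
    "distro-signed-with-srpm-diff": 30,
    "local-unsigned": 10,
}
GOLDEN_ALIGNMENT = {"fixed": 20, "partial": 10, "vulnerable": -40}
HARDENING_FLAGS = ("pie", "relro", "ssp", "cet", "cfi")


def verdict_for(score: int) -> str:
    """Map a 0-100 score onto the verdict bands."""
    if score >= 80:
        return "Patched"
    if score >= 65:
        return "Likely Patched (Backport)"
    if score >= 35:
        return "Unknown"
    return "Unpatched"


def trust_score(provenance, *, dsse_chain=False, reproducible=False, golden=None,
                hardening=(), text_drift=False, suspicious_toolchain=False):
    """Return (score, verdict, reasons) using the example weights."""
    reasons = []

    def add(points, why):
        reasons.append(f"{points:+d} {why}")   # e.g. "+40 provenance: ..."
        return points

    score = add(PROVENANCE_BASE[provenance], f"provenance: {provenance}")
    if dsse_chain:
        score += add(20, "valid DSSE + in-toto chain")
    if reproducible:
        score += add(10, "reproducible build proof")
    if golden is not None:
        score += add(GOLDEN_ALIGNMENT[golden], f"golden set: {golden}")
    present = [f for f in HARDENING_FLAGS if f in hardening]
    if present:
        score += add(min(10, 2 * len(present)), "hardening: " + "/".join(present))
    if text_drift:
        score += add(-10, "unexplained .text drift")
    if suspicious_toolchain:
        score += add(-5, "suspicious toolchain fingerprint")
    score = max(0, min(100, score))   # keep within the 0-100 range
    return score, verdict_for(score), reasons
```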
---

# CLI outline (Stella Ops‑style)

```bash
# Index a filesystem or package repo
stella-fp index /usr/bin /lib --out fp.db --bundle out.bundle.parquet

# Score a host (offline)
stella-fp classify --fp-store fp.db --golden golden.db --out verdicts.json

# Explain a result
stella-fp explain --fp <fp_id> --golden golden.db

# Maintain Golden Set
stella-fp golden add --package openssl --cve CVE-2023-XXXX --status fixed --from-srpm path.src.rpm
stella-fp golden add --package openssl --cve CVE-2023-XXXX --status vulnerable --from-upstream v1.1.1k
```
---

# Implementation notes (ELF/PE)

* **ELF**: read the Build‑ID from `.note.gnu.build-id`; hash `.text` and selected function ranges (take function boundaries from DWARF/`.eh_frame` or the symbol table when present; otherwise use a lightweight linear sweep with sanity checks). Record RELRO from the `PT_GNU_RELRO` program header (full RELRO additionally needs `BIND_NOW` in the dynamic section) and PIE from `e_type == ET_DYN` together with a `PT_INTERP` header or the `DF_1_PIE` flag.
* **PE**: use the Debug Directory’s CodeView record (GUID/age) as the Build‑ID equivalent, plus the Section Table; capture CFG/ASLR/NX from `DllCharacteristics` and /GS from the load‑config security cookie.
* **Function‑range hashing**: normalize NOPs/padding, zero relocation slots, and mask address‑relative operands (this keeps hashes stable across vendor rebuilds).
* **Performance**: cache per‑section hashes; compute function hashes only when a near match needs confirmation.
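The ELF Build‑ID read and the normalization step of function‑range hashing can both be sketched with the standard library. Assumptions: the note bytes come from an already‑located `.note.gnu.build-id` section with 4‑byte note alignment; the caller supplies the offsets of address‑bearing operand slots (e.g., from relocation records or a disassembler); and padding is trimmed by raw byte value (x86 `NOP` 0x90 and `INT3` 0xCC), which a production hasher would do on decoded instructions instead. Masking RIP‑relative operands is not shown, and both function names are hypothetical.

```python
import hashlib
import struct
from typing import Iterable, Optional

NT_GNU_BUILD_ID = 3
PADDING_BYTES = {0x90, 0xCC}  # x86 NOP and INT3, common inter-function padding


def parse_build_id(note_bytes: bytes, byteorder: str = "<") -> Optional[str]:
    """Return the GNU Build-ID as hex from raw .note.gnu.build-id section bytes."""
    off = 0
    while off + 12 <= len(note_bytes):
        # Note header: namesz, descsz, type (32-bit words in both ELF32 and ELF64).
        namesz, descsz, ntype = struct.unpack_from(byteorder + "III", note_bytes, off)
        off += 12
        name = note_bytes[off:off + namesz]
        off += (namesz + 3) & ~3      # name is padded to a 4-byte boundary
        desc = note_bytes[off:off + descsz]
        off += (descsz + 3) & ~3      # and so is desc
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc.hex()
    return None


def function_range_hash(code: bytes, slot_offsets: Iterable[int] = (),
                        slot_width: int = 4) -> str:
    """SHA-256 of a function's bytes after zeroing address slots and trimming trailing padding."""
    buf = bytearray(code)
    for off in slot_offsets:
        if 0 <= off < len(buf):
            n = min(slot_width, len(buf) - off)   # never grow the buffer
            buf[off:off + n] = bytes(n)           # zero the address-bearing operand
    end = len(buf)
    while end > 0 and buf[end - 1] in PADDING_BYTES:
        end -= 1
    return hashlib.sha256(bytes(buf[:end])).hexdigest()
```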
---

# How this plugs into your world

* **Sbomer/Vexer**: attach trust scores and verdicts to components in CycloneDX/SPDX; emit VEX statements such as “Fixed by backport: evidence=diff‑sig, source=Astra/Red Hat SRPM.”
* **Feedser**: when a CVE feed marks a package “vulnerable by version,” override that with binary proof from the Golden Set.
* **Policy Engine**: gate deployments on `verdict ∈ {Patched, Likely Patched (Backport)}` OR `score ≥ 65`.
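The Feedser override and the Policy Engine gate are both small predicates. A sketch, with `allow_deploy` and `treat_as_vulnerable` as hypothetical names; it models only the override direction described above (a version‑range hit is dropped when the binary verdict shows a fix). The OR in the gate matters because an exact Build‑ID match can yield `Patched` independently of the numeric score.

```python
DEPLOYABLE_VERDICTS = frozenset({"Patched", "Likely Patched (Backport)"})


def allow_deploy(verdict: str, score: int) -> bool:
    """Policy Engine gate: verdict in {Patched, Likely Patched (Backport)} OR score >= 65."""
    return verdict in DEPLOYABLE_VERDICTS or score >= 65


def treat_as_vulnerable(feed_says_vulnerable: bool, verdict: str) -> bool:
    """Feedser override: a 'vulnerable by version' hit is dropped when the binary verdict shows a fix."""
    return feed_says_vulnerable and verdict not in DEPLOYABLE_VERDICTS
```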
---

# Next steps you can action today

1. Create the schemas above in Postgres; scaffold a small `stella-fp` Go/.NET tool that computes fingerprints for `/bin` and `/lib*` on one reference host per distro (e.g., Debian and Alpine).
2. Hand‑curate a **pilot Golden Set** for three noisy CVEs (OpenSSL, glibc, curl). For each, store both pre‑ and post‑patch fingerprints plus 2–3 backported vendor builds.
3. Wire a `classify` step into your CI/CD and surface the **verdict + rationale** in your VEX output.
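For step 1, the file‑discovery half of the tool is a directory walk that keeps ELF (and, roughly, PE) files and records the `artifact` columns that need no parsing. This is a Python sketch rather than Go/.NET, for brevity; the function names are hypothetical, and the `MZ` check is only a coarse PE filter (a real ingester would also verify the PE header).

```python
import hashlib
import os
from pathlib import Path
from typing import Dict, Iterable, Iterator, Tuple

ELF_MAGIC = b"\x7fELF"
PE_MAGIC = b"MZ"  # DOS header magic; only a coarse PE filter


def iter_binaries(roots: Iterable[str]) -> Iterator[Tuple[Path, str]]:
    """Yield (path, kind) for regular files under roots whose magic bytes look like ELF or PE."""
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):  # does not follow symlinked dirs
            for name in filenames:
                path = Path(dirpath) / name
                if path.is_symlink() or not path.is_file():
                    continue  # skip symlinks so each binary is hashed once
                try:
                    with path.open("rb") as fh:
                        magic = fh.read(4)
                except OSError:
                    continue  # unreadable file: skip rather than abort the scan
                if magic == ELF_MAGIC:
                    yield path, "elf"
                elif magic[:2] == PE_MAGIC:
                    yield path, "pe"


def artifact_row(path: Path, kind: str) -> Dict[str, object]:
    """Compute the parse-free artifact columns: SHA-256 (streamed in 1 MiB chunks) and size."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return {"path": str(path), "kind": kind,
            "file_sha256": digest.hexdigest(), "size": path.stat().st_size}
```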
If you want, I can drop in starter code (C#/.NET 10) for the fingerprint extractor and the Postgres schema migration, plus a tiny “function‑range hasher” that masks relocations and normalizes padding.