save progress

2026-01-03 12:41:57 +02:00
parent 83c37243e0
commit d486d41a48
48 changed files with 7174 additions and 1086 deletions
--- a/docs/product-advisories/03-Dec-2026
+++ b/docs/product-advisories/03-Dec-2026
@@ -0,0 +1,175 @@
+Here’s a compact, practical blueprint for a **binary‑fingerprint store + trust‑scoring engine** that lets you quickly tell whether a system binary is patched, backported, or risky—even fully offline.
+
+# Why this matters (plain English)
+
+Package version strings mislead: distros backport fixes without bumping the upstream version. Instead of trusting names like `libssl 1.1.1k`, we trust **what’s inside**: build IDs, section hashes, compiler metadata, and signed provenance. With that, we can answer: *Is this exact binary known‑good, known‑bad, or unknown, on this distro, on this date, with these patches?*
+
+---
+
+# Core concept
+
+* **Binary Fingerprint** = tuple of:
+
+  * **Build‑ID** (ELF/PE), if present.
+  * **Section‑level hashes** (e.g., `.text`, `.rodata`, selected function ranges).
+  * **Compiler/Linker metadata** (vendor/version, LTO flags, PIE/RELRO, sanitizer bits).
+  * **Symbol graph sketch** (optional, min‑hash of exported symbol names + sizes).
+  * **Feature toggles** (FIPS mode, CET/CFI present, Fortify level, RELRO type, SSP).
+* **Provenance Chain** (who built it): Upstream → Distro vendor (with patchset) → Local rebuild.
+* **Trust Score**: combines provenance weight + cryptographic attestations + “golden set” matches + observed patch deltas.
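+
+Here is a minimal Go sketch of the fingerprint tuple. The names `BinaryFingerprint` and `Key` are hypothetical and are not part of any existing Stella Ops API. `Key` sorts map keys before hashing, so it derives the same identity every time the same binary is scanned. The symbol sketch is left out of the key because it is a similarity signal, not an identity.
+
+```go
+package main
+
+import (
+	"crypto/sha256"
+	"encoding/hex"
+	"sort"
+	"strings"
+)
+
+// BinaryFingerprint mirrors the tuple above (hypothetical field names).
+type BinaryFingerprint struct {
+	BuildID       string            // ELF Build-ID or PE GUID/age, "" if absent
+	SectionHashes map[string]string // e.g. ".text" -> hex SHA-256
+	Compiler      string            // vendor/version, e.g. "gcc 12.2.0"
+	Features      map[string]bool   // "pie", "relro", "ssp", "cet", "cfi", "fips"
+	SymbolSketch  []uint64          // optional min-hash; not part of Key
+}
+
+// Key returns a deterministic SHA-256 over the identity fields. Map keys are
+// sorted so Go's random map iteration order cannot change the result.
+func (f BinaryFingerprint) Key() string {
+	var b strings.Builder
+	b.WriteString("build-id=" + f.BuildID + "\n")
+	for _, name := range sortedKeys(f.SectionHashes) {
+		b.WriteString("section " + name + "=" + f.SectionHashes[name] + "\n")
+	}
+	b.WriteString("compiler=" + f.Compiler + "\n")
+	for _, name := range sortedKeys(f.Features) {
+		if f.Features[name] { // absent and false features encode the same way
+			b.WriteString("feature " + name + "\n")
+		}
+	}
+	sum := sha256.Sum256([]byte(b.String()))
+	return hex.EncodeToString(sum[:])
+}
+
+func sortedKeys[V any](m map[string]V) []string {
+	keys := make([]string, 0, len(m))
+	for k := range m {
+		keys = append(keys, k)
+	}
+	sort.Strings(keys)
+	return keys
+}
+```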
+
+---
+
+# Minimal architecture (fits Stella Ops style)
+
+1. **Ingesters**
+
+   * `ingester.distro`: walks repo mirrors or local systems, extracts ELF/PE files, computes fingerprints, and captures the package→file mapping plus vendor patch metadata (changelogs, source‑package patch diffs such as SRPM patches).
+   * `ingester.upstream`: indexes upstream releases, commit tags, and official build artifacts.
+   * `ingester.local`: indexes CI outputs (your own builds) and any in‑toto/DSSE attestations that accompany them.
+
+2. **Fingerprint Store (offline‑ready)**
+
+   * **Primary DB**: PostgreSQL (authoritative).
+   * **Accelerator**: Valkey (ephemeral) for fast lookup by Build‑ID and section hash prefixes.
+   * **Bundle Export**: signed, chunked SQLite/Parquet packs for air‑gapped sites.
+
+3. **Trust Engine**
+
+   * Scores (0–100) per binary instance using:
+
+     * Provenance weight (Upstream signed > Distro signed > Local unsigned).
+     * Attestation presence/quality (in‑toto/DSSE, reproducible build stamp).
+     * Patch alignment vs **Golden Set** (reference fingerprints for “fixed” and “vulnerable” builds).
+     * Hardening baseline (RELRO/PIE/SSP/CET/CFI).
+     * Divergence penalty (unexpected section deltas vs vendor‑declared patch).
+   * Emits **Verdict**: `Patched`, `Likely Patched (Backport)`, `Unpatched`, `Unknown`, with rationale.
+
+4. **Query APIs**
+
+   * `/lookup/by-buildid/{id}`
+   * `/lookup/by-hash/{algo}/{prefix}`
+   * `/classify` (batch): accepts an SBOM file list or live filesystem scan.
+   * `/explain/{fingerprint}`: returns diff vs Golden Set and the proof trail.
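+
+The store and the two lookup endpoints can be sketched in Go. This is an in-memory stand-in for the Valkey accelerator. The names `Store`, `Add` and `ByPrefix` are hypothetical. The routing patterns need Go 1.22+, which added method and wildcard support to `http.ServeMux`. Prefix lookup uses a binary search on a sorted slice. In Valkey, a lexicographic range over a sorted set could play the same role; that key layout is an assumption, not a given.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"net/http"
+	"sort"
+	"strings"
+)
+
+// Store indexes fp_ids by Build-ID and by "algo:hash". Populate it before
+// serving; this sketch does no locking.
+type Store struct {
+	byBuildID map[string]string // build-id -> fp_id
+	hashes    []string          // sorted "algo:hex" keys
+	byHash    map[string]string // "algo:hex" -> fp_id
+}
+
+func NewStore() *Store {
+	return &Store{byBuildID: map[string]string{}, byHash: map[string]string{}}
+}
+
+// Add registers a fingerprint, keeping s.hashes sorted for prefix search.
+func (s *Store) Add(fpID, buildID, algo, hash string) {
+	if buildID != "" {
+		s.byBuildID[buildID] = fpID
+	}
+	key := algo + ":" + hash
+	if _, ok := s.byHash[key]; !ok {
+		i := sort.SearchStrings(s.hashes, key)
+		s.hashes = append(s.hashes, "")
+		copy(s.hashes[i+1:], s.hashes[i:])
+		s.hashes[i] = key
+	}
+	s.byHash[key] = fpID
+}
+
+// ByPrefix binary-searches to the first candidate, then scans while the
+// prefix still matches.
+func (s *Store) ByPrefix(algo, prefix string) []string {
+	key := algo + ":" + prefix
+	out := []string{}
+	for i := sort.SearchStrings(s.hashes, key); i < len(s.hashes) && strings.HasPrefix(s.hashes[i], key); i++ {
+		out = append(out, s.byHash[s.hashes[i]])
+	}
+	return out
+}
+
+// Routes wires the lookup endpoints listed above.
+func (s *Store) Routes() *http.ServeMux {
+	mux := http.NewServeMux()
+	mux.HandleFunc("GET /lookup/by-buildid/{id}", func(w http.ResponseWriter, r *http.Request) {
+		fp, ok := s.byBuildID[r.PathValue("id")]
+		if !ok {
+			http.NotFound(w, r)
+			return
+		}
+		writeJSON(w, map[string]string{"fp_id": fp})
+	})
+	mux.HandleFunc("GET /lookup/by-hash/{algo}/{prefix}", func(w http.ResponseWriter, r *http.Request) {
+		writeJSON(w, map[string][]string{"fp_ids": s.ByPrefix(r.PathValue("algo"), r.PathValue("prefix"))})
+	})
+	return mux
+}
+
+func writeJSON(w http.ResponseWriter, v any) {
+	w.Header().Set("Content-Type", "application/json")
+	_ = json.NewEncoder(w).Encode(v)
+}
+```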
+
+---
+
+# Data model (tables you can lift into Postgres)
+
+* `artifact`
+  `(artifact_id PK, file_sha256, size, mime, elf_machine, pe_machine, ts, signers[])`
+* `fingerprint`
+  `(fp_id PK, artifact_id, build_id, text_hash, rodata_hash, sym_sketch, compiler_vendor, compiler_ver, lto, pie, relro, ssp, cfi, cet, flags jsonb)`
+* `provenance`
+  `(prov_id PK, fp_id, origin ENUM('upstream','distro','local'), vendor, distro, release, package, version, source_commit, patchset jsonb, attestation_hash, attestation_quality_score)`
+* `golden_set`
+  `(golden_id PK, package, cve, status ENUM('fixed','vulnerable'), fp_ref, method ENUM('vendor-advisory','diff-sig','function-patch'), notes)`
+* `trust_score`
+  `(fp_id, score int, verdict, reasons jsonb, computed_at)`
+
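+Indexes: `(build_id)`, `(text_hash)`, `(rodata_hash)`, `(package, version)`, GIN on `patchset`, `reasons`. Postgres has no inline `ENUM(...)` column type: declare each one with `CREATE TYPE ... AS ENUM (...)`, or use a `text` column with a `CHECK` constraint.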
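+
+Postgres enforces these enums once they are declared, but an ingester can reject bad values earlier. Below is a small Go sketch. The type names are hypothetical mirrors of the `provenance.origin` and `golden_set` columns, and `fp_ref` is assumed to be a bigint referencing `fingerprint.fp_id`.
+
+```go
+package main
+
+import "fmt"
+
+// Hypothetical Go mirrors of the ENUM columns above.
+type Origin string
+type GoldenStatus string
+type EvidenceMethod string
+
+const (
+	OriginUpstream Origin = "upstream"
+	OriginDistro   Origin = "distro"
+	OriginLocal    Origin = "local"
+
+	StatusFixed      GoldenStatus = "fixed"
+	StatusVulnerable GoldenStatus = "vulnerable"
+
+	MethodVendorAdvisory EvidenceMethod = "vendor-advisory"
+	MethodDiffSig        EvidenceMethod = "diff-sig"
+	MethodFunctionPatch  EvidenceMethod = "function-patch"
+)
+
+// Valid reports whether o is one of the provenance.origin values.
+func (o Origin) Valid() bool {
+	return o == OriginUpstream || o == OriginDistro || o == OriginLocal
+}
+
+// GoldenSetRow mirrors golden_set; golden_id is assigned by Postgres.
+type GoldenSetRow struct {
+	Package string
+	CVE     string
+	Status  GoldenStatus
+	FpRef   int64 // assumed bigint FK to fingerprint.fp_id
+	Method  EvidenceMethod
+	Notes   string
+}
+
+// Validate rejects values the ENUM columns would refuse.
+func (r GoldenSetRow) Validate() error {
+	switch r.Status {
+	case StatusFixed, StatusVulnerable:
+	default:
+		return fmt.Errorf("golden_set.status: invalid value %q", r.Status)
+	}
+	switch r.Method {
+	case MethodVendorAdvisory, MethodDiffSig, MethodFunctionPatch:
+	default:
+		return fmt.Errorf("golden_set.method: invalid value %q", r.Method)
+	}
+	return nil
+}
+```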
+
+---
+
+# How detection works (fast path)
+
+1. **Exact match**
+   Build‑ID hit → join `golden_set` → return verdict + reason.
+2. **Near match (backport mode)**
+   No Build‑ID match → compare `.text`/`.rodata` and function‑range hashes against the Golden Set’s “fixed” and “vulnerable” references:
+
+   * If patched function ranges match, mark **Likely Patched (Backport)**.
+   * If vulnerable function ranges match, mark **Unpatched**.
+3. **Heuristic fallback**
+   Symbol sketch + compiler metadata + hardening flags narrow the candidate set; then compute targeted function hashes only (don’t hash the whole file).
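+
+The first two steps of the fast path fit in a single Go function. This sketch assumes each golden-set row has already been resolved to a Build-ID and to a map from fix-site function names to normalized range hashes. `GoldenEntry` and `Classify` are hypothetical names, and the heuristic fallback (step 3) is not shown.
+
+```go
+package main
+
+// Verdict labels used by the trust engine.
+const (
+	Patched       = "Patched"
+	LikelyPatched = "Likely Patched (Backport)"
+	Unpatched     = "Unpatched"
+	Unknown       = "Unknown"
+)
+
+// GoldenEntry is a golden_set row resolved to the hashes it references.
+type GoldenEntry struct {
+	CVE        string
+	Status     string            // "fixed" or "vulnerable"
+	BuildID    string            // Build-ID of the reference build ("" if none)
+	FuncHashes map[string]string // fix-site function -> normalized range hash
+}
+
+// Classify runs the exact-match and near-match steps for one CVE and returns
+// a verdict plus a short reason.
+func Classify(cve, buildID string, funcHashes map[string]string, golden []GoldenEntry) (string, string) {
+	// 1. Exact match: the Build-ID equals a reference build.
+	for _, g := range golden {
+		if g.CVE == cve && buildID != "" && g.BuildID == buildID {
+			if g.Status == "fixed" {
+				return Patched, "build-id matches fixed reference"
+			}
+			return Unpatched, "build-id matches vulnerable reference"
+		}
+	}
+	// 2. Near match: every fix-site hash of one reference must match.
+	for _, g := range golden {
+		if g.CVE == cve && len(g.FuncHashes) > 0 && allMatch(funcHashes, g.FuncHashes) {
+			if g.Status == "fixed" {
+				return LikelyPatched, "patched function ranges match"
+			}
+			return Unpatched, "vulnerable function ranges match"
+		}
+	}
+	// 3. Heuristic fallback (symbol sketch, compiler metadata) is not shown.
+	return Unknown, "no golden-set match"
+}
+
+// allMatch reports whether every function in want has the same hash in have.
+func allMatch(have, want map[string]string) bool {
+	for fn, h := range want {
+		if have[fn] != h {
+			return false
+		}
+	}
+	return true
+}
+```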
+
+---
+
+# Building the “Golden Set”
+
+* Sources:
+
+  * Vendor advisories (per‑CVE “fixed in” builds).
+  * Upstream tags containing the fix commit.
+  * Distro SRPM diffs for backports (extract exact hunk regions; compute function‑range hashes pre/post).
+* Store **both**:
+
+  * “Fixed” fingerprints (post‑patch).
+  * “Vulnerable” fingerprints (pre‑patch).
+* Annotate evidence method:
+
+  * `vendor-advisory` (strong), `diff-sig` (strong if clean hunk), `function-patch` (targeted).
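+
+Deriving fix-sites from a pre-patch and post-patch pair of builds reduces to a per-function hash comparison. This Go sketch assumes both builds used the same toolchain profile and that function names are available from symbols or DWARF. `FixSite` and `FixSites` are hypothetical names. In practice you would also keep only functions that overlap the diff's hunk regions, so unrelated churn does not become a signature.
+
+```go
+package main
+
+import "sort"
+
+// FixSite is one function the patch touched, with both reference hashes.
+type FixSite struct {
+	Function   string
+	Vulnerable string // pre-patch normalized range hash
+	Fixed      string // post-patch normalized range hash
+}
+
+// FixSites compares per-function hashes of the two builds. Functions present
+// on only one side are reported separately, since they have no counterpart.
+func FixSites(pre, post map[string]string) (changed []FixSite, added, removed []string) {
+	for fn, before := range pre {
+		after, ok := post[fn]
+		switch {
+		case !ok:
+			removed = append(removed, fn)
+		case after != before:
+			changed = append(changed, FixSite{fn, before, after})
+		}
+	}
+	for fn := range post {
+		if _, ok := pre[fn]; !ok {
+			added = append(added, fn)
+		}
+	}
+	// Map iteration order is random; sort so stored golden entries are stable.
+	sort.Slice(changed, func(i, j int) bool { return changed[i].Function < changed[j].Function })
+	sort.Strings(added)
+	sort.Strings(removed)
+	return changed, added, removed
+}
+```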
+
+---
+
+# Trust scoring (example)
+
+* Base by provenance:
+
+  * Upstream + signed + reproducible: **+40**
+  * Distro signed with changelog & SRPM diff: **+30**
+  * Local unsigned: **+10**
+* Attestations:
+
+  * Valid DSSE + in‑toto chain: **+20**
+  * Reproducible build proof: **+10**
+* Golden Set alignment:
+
+  * Matches “fixed”: **+20**
+  * Matches “vulnerable”: **−40**
+  * Partial (patched functions match, rest differs): **+10**
+* Hardening:
+
+  * PIE/RELRO/SSP/CET/CFI each **+2** (cap +10)
+* Divergence penalties:
+
+  * Unexplained text‑section drift **−10**
+  * Suspicious toolchain fingerprint **−5**
+
+Verdict bands: `≥80 Patched`, `65–79 Likely Patched (Backport)`, `35–64 Unknown`, `<35 Unpatched`.
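+
+The scoring rules translate directly into Go. The input field names are hypothetical. The table does not cover every provenance combination (for example, upstream but unsigned), so this sketch gives any unlisted combination +10, the same as the lowest listed tier. It uses the `min`/`max` builtins from Go 1.21.
+
+```go
+package main
+
+// ScoreInput carries the signals listed above (hypothetical field names).
+type ScoreInput struct {
+	Origin         string // "upstream", "distro" or "local"
+	Signed         bool
+	Reproducible   bool   // reproducible-build proof available
+	ChangelogDiff  bool   // distro changelog and SRPM diff available
+	DSSEChain      bool   // valid DSSE + in-toto chain
+	GoldenMatch    string // "fixed", "vulnerable", "partial" or ""
+	HardeningFlags int    // how many of PIE/RELRO/SSP/CET/CFI are present
+	TextDrift      bool   // unexplained text-section drift
+	OddToolchain   bool   // suspicious toolchain fingerprint
+}
+
+// Score returns the 0-100 trust score and its verdict band.
+func Score(in ScoreInput) (int, string) {
+	score := 10 // local unsigned; also used for combinations the table omits
+	switch {
+	case in.Origin == "upstream" && in.Signed && in.Reproducible:
+		score = 40
+	case in.Origin == "distro" && in.Signed && in.ChangelogDiff:
+		score = 30
+	}
+	if in.DSSEChain {
+		score += 20
+	}
+	if in.Reproducible {
+		score += 10
+	}
+	switch in.GoldenMatch {
+	case "fixed":
+		score += 20
+	case "vulnerable":
+		score -= 40
+	case "partial":
+		score += 10
+	}
+	score += min(2*in.HardeningFlags, 10) // +2 each, capped at +10
+	if in.TextDrift {
+		score -= 10
+	}
+	if in.OddToolchain {
+		score -= 5
+	}
+	score = max(0, min(score, 100))
+	return score, verdict(score)
+}
+
+func verdict(score int) string {
+	switch {
+	case score >= 80:
+		return "Patched"
+	case score >= 65:
+		return "Likely Patched (Backport)"
+	case score >= 35:
+		return "Unknown"
+	default:
+		return "Unpatched"
+	}
+}
+```
+
+These weights have a consequence worth checking. A fully attested upstream binary that matches a “vulnerable” reference scores 40 + 20 + 10 - 40 + 10 = 40. That falls in `Unknown`, not `Unpatched`. If a vulnerable match should always yield `Unpatched`, treat it as an override rather than a weight.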
+
+---
+
+# CLI outline (Stella Ops‑style)
+
+```bash
+# Index a filesystem or package repo
+stella-fp index /usr/bin /lib --out fp.db --bundle out.bundle.parquet
+
+# Score a host (offline)
+stella-fp classify --fp-store fp.db --golden golden.db --out verdicts.json
+
+# Explain a result
+stella-fp explain --fp <fp_id> --golden golden.db
+
+# Maintain Golden Set
+stella-fp golden add --package openssl --cve CVE-2023-XXXX --status fixed --from-srpm path.src.rpm
+stella-fp golden add --package openssl --cve CVE-2023-XXXX --status vulnerable --from-upstream v1.1.1k
+```
+
+---
+
+# Implementation notes (ELF/PE)
+
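+* **ELF**: read Build‑ID from `.note.gnu.build-id`; hash `.text` and selected function ranges (use DWARF/eh_frame or the symbol table when present; otherwise a lightweight linear sweep with sanity checks). Record RELRO from the `PT_GNU_RELRO` program header (full RELRO also needs `BIND_NOW` in `.dynamic`), and PIE from the ELF type (`ET_DYN`) plus `DF_1_PIE` in `DT_FLAGS_1` when the linker sets it.
+* **PE**: use the Debug Directory’s CodeView (RSDS) record (GUID/age) and the Section Table; capture ASLR/NX/CFG from `DllCharacteristics` (`DYNAMIC_BASE`, `NX_COMPAT`, `GUARD_CF`) and /GS from the load‑config security cookie.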
+* **Function‑range hashing**: normalize NOPs/padding, zero relocation slots, mask address‑relative operands (keeps hashes stable across vendor rebuilds).
+* **Performance**: cache per‑section hash; only compute function hashes when near‑match needs confirmation.
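+
+Here is a Go sketch of the ELF side, using only the standard `debug/elf` package. `InspectELF` and `parseBuildIDNote` are hypothetical names. The note parser follows the ELF note layout: `namesz`, `descsz` and `type` words, then a name and a descriptor, each padded to 4 bytes. The GNU Build-ID note has type 3 (`NT_GNU_BUILD_ID`) and the name `GNU`. `PT_GNU_RELRO` is written as its numeric value. The full-RELRO and `DF_1_PIE` checks are omitted.
+
+```go
+package main
+
+import (
+	"crypto/sha256"
+	"debug/elf"
+	"encoding/binary"
+	"encoding/hex"
+	"fmt"
+)
+
+const (
+	ntGNUBuildID = 3                        // NT_GNU_BUILD_ID
+	ptGNURelro   = elf.ProgType(0x6474e552) // PT_GNU_RELRO
+)
+
+// ELFFacts is the part of the fingerprint debug/elf can supply directly.
+type ELFFacts struct {
+	BuildID    string
+	TextSHA256 string
+	RelRO      bool // PT_GNU_RELRO present (partial or full RELRO)
+	ETDyn      bool // ET_DYN: PIE executable or shared object
+}
+
+// parseBuildIDNote walks an ELF note section and returns the GNU Build-ID as hex.
+func parseBuildIDNote(data []byte, order binary.ByteOrder) (string, bool) {
+	for len(data) >= 12 {
+		namesz := int(order.Uint32(data[0:4]))
+		descsz := int(order.Uint32(data[4:8]))
+		typ := order.Uint32(data[8:12])
+		nameEnd := 12 + align4(namesz)
+		descEnd := nameEnd + align4(descsz)
+		if descEnd > len(data) {
+			return "", false // truncated or malformed note
+		}
+		if typ == ntGNUBuildID && string(data[12:12+namesz]) == "GNU\x00" {
+			return hex.EncodeToString(data[nameEnd : nameEnd+descsz]), true
+		}
+		data = data[descEnd:]
+	}
+	return "", false
+}
+
+func align4(n int) int { return (n + 3) &^ 3 }
+
+// InspectELF reads the Build-ID, hashes .text, and records RELRO and ET_DYN.
+func InspectELF(path string) (ELFFacts, error) {
+	f, err := elf.Open(path)
+	if err != nil {
+		return ELFFacts{}, err
+	}
+	defer f.Close()
+
+	var facts ELFFacts
+	if sec := f.Section(".note.gnu.build-id"); sec != nil {
+		data, err := sec.Data()
+		if err != nil {
+			return facts, fmt.Errorf("read build-id note: %w", err)
+		}
+		facts.BuildID, _ = parseBuildIDNote(data, f.ByteOrder)
+	}
+	if sec := f.Section(".text"); sec != nil {
+		data, err := sec.Data()
+		if err != nil {
+			return facts, fmt.Errorf("read .text: %w", err)
+		}
+		sum := sha256.Sum256(data)
+		facts.TextSHA256 = hex.EncodeToString(sum[:])
+	}
+	for _, p := range f.Progs {
+		if p.Type == ptGNURelro {
+			facts.RelRO = true
+		}
+	}
+	facts.ETDyn = f.Type == elf.ET_DYN
+	return facts, nil
+}
+```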
+
+---
+
+# How this plugs into your world
+
+* **Sbomer/Vexer**: attach trust scores & verdicts to components in CycloneDX/SPDX; emit VEX statements like “Fixed by backport: evidence=diff‑sig, source=Red Hat SRPM / Astra source package.”
+* **Feedser**: when CVE feed says “vulnerable by version,” override with binary proof from Golden Set.
+* **Policy Engine**: gate deployments on `verdict ∈ {Patched, Likely Patched}` OR `score ≥ 65`.
+
+---
+
+# Next steps you can action today
+
+1. Create the schemas above in Postgres; scaffold a small `stella-fp` Go/.NET tool to compute fingerprints for `/bin`, `/lib*` on reference hosts (e.g., one Debian and one Alpine).
+2. Hand‑curate a **pilot Golden Set** for 3 noisy CVEs (OpenSSL, glibc, curl). Store both pre/post patch fingerprints and 2–3 backported vendor builds each.
+3. Wire a `classify` step into your CI/CD and surface the **verdict + rationale** in your VEX output.
+
+If you want, I can drop in starter code (C#/.NET 10) for the fingerprint extractor and the Postgres schema migration, plus a tiny “function‑range hasher” that masks relocations and normalizes padding.
--- a/docs/product-advisories/03-Dec-2026
+++ b/docs/product-advisories/03-Dec-2026
@@ -0,0 +1,153 @@
+Here’s a tight, practical plan to add **deterministic binary‑patch evidence** to Stella Ops by integrating **B2R2** (an F#/.NET binary‑analysis framework that provides disassembly and IR lifting) into your scanning pipeline, then feeding stable “diff signatures” into your **VEX Resolver**.
+
+# What & why (one minute)
+
+* **Goal:** Prove (offline) that a distro backport truly patched a CVE—even if version strings look “vulnerable”—by comparing *what the CPU will execute* before/after a patch.
+* **How:** Lift binaries to a normalized IR with **B2R2**, canonicalize semantics (strip address noise, relocations, NOPs, padding), **bucket** by function and **hash** stable opcode/semantics. Patch deltas become small, reproducible evidence blobs your VEX engine can consume.
+
+# High‑level flow
+
+1. **Collect**: For each package/artifact, grab: *installed binary*, *claimed patched reference* (vendor’s patched ELF/PE or your golden set), and optional *original vulnerable build*.
+2. **Lift**: Use B2R2 to disassemble → lift to its arch‑agnostic IR (**LowUIR**) and **SSA** form.
+3. **Normalize** (deterministic):
+
+   * Strip addrs/symbols/relocations; fold NOPs; normalize register aliases; constant‑prop + dead‑code elim; canonical call/ret; normalize PLT stubs; elide alignment/padding.
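+4. **Segment**: Per‑function IR slices bounded by CFG; compute **stable function IDs** = `SHA256(package@version, build-id, arch, fn-cfg-shape)`. Because the ID includes the Build‑ID, it is unique per build. Pairing a function across the baseline and patched builds needs a build‑independent key, such as the symbol name when present.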
+5. **Hashing**:
+
+   * **Opcode hash**: SHA256 of normalized opcode stream.
+   * **Semantic hash**: SHA256 of (basic‑block graph + dataflow summaries).
+   * **Const set hash**: extracted immediate set (range‑bucketed) to detect patched lookups.
+6. **Diff**:
+
+   * Compare (patched vs baseline) per function: unchanged / changed / added / removed.
+   * For changed: emit **delta record** with before/after hashes and minimal edit script (block‑level).
+7. **Evidence object** (deterministic, replayable):
+
+   * `type: "stella.disasm.patch-evidence@1"`
+   * inputs: file digests (SHA256/SHA3‑256), Build‑ID, arch, toolchain versions, B2R2 commit, normalization profile ID
+   * outputs: per‑function records + global summary
+   * sign: DSSE (in‑toto link) with your offline key profile
+8. **Feed VEX**:
+
+   * Map CVE→fix‑site heuristics (from vendor advisories/diff hints) to function buckets.
+   * If all required buckets show “patched” (semantic hash change matches inventory rule), emit a CVE‑level CycloneDX VEX analysis with **`state=not_affected, justification=code_not_present`** and a pointer to the evidence object.
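+
+Steps 3 and 5 (normalize, then hash the opcode stream) can be sketched in Go over a simplified instruction model. The sketch does not call B2R2, which is a .NET library; the `Insn` and `Operand` types are hypothetical stand-ins for its lifted output. Registers are renamed in order of first use, which is one reading of “normalize register aliases.” NOPs are dropped and address operands become a placeholder. Immediates stay in the hash, so constant patches remain visible.
+
+```go
+package main
+
+import (
+	"crypto/sha256"
+	"encoding/hex"
+	"strconv"
+	"strings"
+)
+
+// Operand is a simplified lifted operand; Kind is "reg", "imm" or "addr".
+type Operand struct {
+	Kind  string
+	Value string
+}
+
+// Insn is one lifted instruction of a function, in CFG order.
+type Insn struct {
+	Opcode   string
+	Operands []Operand
+}
+
+// OpcodeHash returns the SHA-256 of a function's normalized opcode stream.
+func OpcodeHash(fn []Insn) string {
+	regs := map[string]string{} // original register -> canonical r0, r1, ...
+	var b strings.Builder
+	for _, in := range fn {
+		if in.Opcode == "nop" {
+			continue // alignment/padding NOPs carry no semantics
+		}
+		b.WriteString(in.Opcode)
+		for _, op := range in.Operands {
+			b.WriteByte(' ')
+			switch op.Kind {
+			case "addr":
+				b.WriteString("ADDR") // build-specific address: masked
+			case "reg":
+				canon, ok := regs[op.Value]
+				if !ok {
+					canon = "r" + strconv.Itoa(len(regs))
+					regs[op.Value] = canon
+				}
+				b.WriteString(canon)
+			default:
+				b.WriteString(op.Kind + ":" + op.Value) // immediates stay visible
+			}
+		}
+		b.WriteByte('\n')
+	}
+	sum := sha256.Sum256([]byte(b.String()))
+	return hex.EncodeToString(sum[:])
+}
+```
+
+Comparing `OpcodeHash` per function across the baseline and patched builds then gives the unchanged/changed split used in step 6.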
+
+# Module boundaries in Stella Ops
+
+* **Scanner.WebService** (per your rule): host *lattice algorithms* + this disassembly stage.
+* **Sbomer**: records exact files/Build‑IDs in CycloneDX 1.6/1.7 SBOM (you’re moving to 1.7 soon—ensure `properties` include `disasm.profile`, `b2r2.version`).
+* **Feedser/Vexer**: consume evidence blobs; Vexer attaches VEX statements referencing `evidenceRef`.
+* **Authority/Attestor**: sign DSSE attestations; Timeline/Notify surface verdict transitions.
+
+# On‑disk schemas (minimal)
+
+```json
+{
+  "type": "stella.disasm.patch-evidence@1",
+  "subject": [{"name": "libssl.so.1.1", "digest": {"sha256": "<...>"}, "buildId": "elf:..."}],
+  "tool": {"name": "stella-b2r2", "b2r2": "<commit>", "profile": "norm-v1"},
+  "arch": "x86_64",
+  "functions": [{
+    "fnId": "sha256(pkg,buildId,arch,cfgShape)",
+    "addrRange": "0x401000-0x40118f",
+    "opcodeHashBefore": "<...>",
+    "opcodeHashAfter":  "<...>",
+    "semanticHashBefore": "<...>",
+    "semanticHashAfter":  "<...>",
+    "delta": {"blocksEdited": 2, "immDiff": ["0x7f->0x00"]}
+  }],
+  "summary": {"unchanged": 812, "changed": 6, "added": 1, "removed": 0}
+}
+```
+
+# Determinism controls
+
+* Pin **B2R2 version** and **normalization profile**; serialize the profile (passes + order + flags) and include it in evidence.
+* Containerize the lifter; record image digest in evidence.
+* Pin any randomness (e.g., hash salts) to fixed values such as zeros; set `TZ=UTC`, `LC_ALL=C`, and a stable set of CPU features.
+* Replay manifests: list all inputs (file digests, B2R2 commit, profile) so anyone can re‑run and reproduce the exact hashes.
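+
+A replay manifest is only useful if its digest is stable. Below is a Go sketch with hypothetical field names. `encoding/json` writes struct fields in declaration order and sorts map keys, so equal manifests hash identically within Go. Replaying from other languages would need a canonical JSON scheme such as RFC 8785 (JCS) instead.
+
+```go
+package main
+
+import (
+	"crypto/sha256"
+	"encoding/hex"
+	"encoding/json"
+)
+
+// ReplayManifest lists what is needed to re-run the lifter and get the same hashes.
+type ReplayManifest struct {
+	Inputs      map[string]string `json:"inputs"` // file name -> sha256
+	B2R2Commit  string            `json:"b2r2Commit"`
+	Profile     string            `json:"profile"`     // e.g. "norm-v1"
+	ImageDigest string            `json:"imageDigest"` // lifter container image
+	Env         map[string]string `json:"env"`         // e.g. TZ, LC_ALL
+}
+
+// Digest hashes the JSON encoding; equal manifests give equal digests.
+func (m ReplayManifest) Digest() (string, error) {
+	raw, err := json.Marshal(m)
+	if err != nil {
+		return "", err
+	}
+	sum := sha256.Sum256(raw)
+	return hex.EncodeToString(sum[:]), nil
+}
+```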
+
+# C# integration sketch (.NET 10)
+
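+```csharp
+// StellaOps.Scanner.Disasm
+// Sketch: IBinarySource, IB2R2Host, INormalizer, IEvidenceStore, Artifact,
+// DisasmEvidence, EvidenceBuilder, Bucket and DiffBuckets are project types
+// and helpers that are not shown here.
+using System;
+using System.Threading.Tasks;
+
+public sealed class DisasmService
+{
+    private readonly IBinarySource _source; // pulls files + vendor refs
+    private readonly IB2R2Host _b2r2;       // thin wrapper over B2R2 (F# assemblies referenced directly, or its CLI)
+    private readonly INormalizer _norm;     // norm-v1 pipeline
+    private readonly IEvidenceStore _evidence;
+
+    public DisasmService(IBinarySource source, IB2R2Host b2r2, INormalizer norm, IEvidenceStore evidence)
+    {
+        _source = source;
+        _b2r2 = b2r2;
+        _norm = norm;
+        _evidence = evidence;
+    }
+
+    public async Task<DisasmEvidence> AnalyzeAsync(Artifact a, Artifact baseline)
+    {
+        // Function hashes are only comparable within one architecture.
+        if (a.Arch != baseline.Arch)
+            throw new ArgumentException($"Arch mismatch: {a.Arch} vs {baseline.Arch}", nameof(baseline));
+
+        // The two lifts are independent, so run them concurrently.
+        var afterTask = _b2r2.LiftAsync(a.Path, a.Arch);
+        var beforeTask = _b2r2.LiftAsync(baseline.Path, baseline.Arch);
+        await Task.WhenAll(afterTask, beforeTask);
+        var liftedAfter = await afterTask;
+        var liftedBefore = await beforeTask;
+
+        // Normalize both sides with the same pinned profile before bucketing.
+        var fnAfter = _norm.Normalize(liftedAfter).Functions;
+        var fnBefore = _norm.Normalize(liftedBefore).Functions;
+
+        var diff = DiffBuckets(Bucket(fnBefore), Bucket(fnAfter));
+        var evidence = EvidenceBuilder.Build(a, baseline, diff, _norm.ProfileId, _b2r2.Version);
+
+        await _evidence.PutAsync(evidence);  // write + DSSE sign via Attestor
+        return evidence;
+    }
+}
+```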
+
+# Normalization profile (norm‑v1)
+
+* **Pass order:** CFG build → SSA → const‑prop → DCE → register‑rename‑canon → call/ret stub‑canon → `.plt`/`.plt.got` unwrap → NOP/padding strip → reloc placeholder canon (`IMM_RELOC` tokens) → block re‑ordering freeze (cfg sort).
+* **Hash material:** `for block in topo(cfg): emit (opcode, operandKinds, IMM_BUCKETS)`; exclude absolute addrs/symbols.
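+
+`topo(cfg)` needs care, because a CFG with loops has no topological order. One deterministic substitute is reverse postorder from the entry block, visiting successors in terminator order (taken target, then fall-through). That order does not depend on block layout, so two builds whose blocks were reordered still hash the same. The Go sketch below uses the hypothetical names `Block` and `HashMaterial`. It emits successor edges as reverse-postorder ranks, so CFG structure is part of the hash.
+
+```go
+package main
+
+import (
+	"crypto/sha256"
+	"encoding/hex"
+	"strconv"
+	"strings"
+)
+
+// Block is a basic block: normalized tokens plus successor indices in
+// terminator order (e.g. taken target first, then fall-through).
+type Block struct {
+	Tokens []string
+	Succs  []int
+}
+
+// reversePostorder returns blocks reachable from entry block 0. Unlike a
+// topological sort it is defined on cyclic graphs, and it is deterministic
+// because successors are visited in their listed order.
+func reversePostorder(cfg []Block) []int {
+	visited := make([]bool, len(cfg))
+	var post []int
+	var dfs func(int)
+	dfs = func(b int) {
+		visited[b] = true
+		for _, s := range cfg[b].Succs {
+			if !visited[s] {
+				dfs(s)
+			}
+		}
+		post = append(post, b)
+	}
+	if len(cfg) > 0 {
+		dfs(0)
+	}
+	for i, j := 0, len(post)-1; i < j; i, j = i+1, j-1 {
+		post[i], post[j] = post[j], post[i]
+	}
+	return post
+}
+
+// HashMaterial emits each reachable block's tokens and successor ranks in
+// reverse postorder, then hashes the stream. Unreachable blocks are skipped.
+func HashMaterial(cfg []Block) string {
+	order := reversePostorder(cfg)
+	rank := make(map[int]int, len(order))
+	for r, id := range order {
+		rank[id] = r
+	}
+	var b strings.Builder
+	for _, id := range order {
+		b.WriteString("block\n")
+		for _, t := range cfg[id].Tokens {
+			b.WriteString(t + "\n")
+		}
+		for _, s := range cfg[id].Succs {
+			b.WriteString("succ " + strconv.Itoa(rank[s]) + "\n")
+		}
+	}
+	sum := sha256.Sum256([]byte(b.String()))
+	return hex.EncodeToString(sum[:])
+}
+```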
+
+# Hash‑bucketing details
+
+* **IMM_BUCKETS:** bucket immediates by role: {addr, const, mask, len}. For `addr`, replace with `IMM_RELOC(section, relType)`. For `const`, clamp to ranges (e.g., table sizes).
+* **CFG shape hash:** adjacency list over block arity; keeps compiler‑noise from breaking determinism.
+* **Semantic hash seed:** keccak of (CFG shape hash || value‑flow summaries per def‑use).
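+
+Below is a Go sketch of immediate bucketing and the CFG shape hash. Several choices are assumptions made for illustration. `const`/`len` values are bucketed into power-of-two ranges and `mask` values are kept exact. The shape is encoded as the multiset of per-block (in-degree, out-degree) pairs. `Imm`, `BucketImm` and `CFGShapeHash` are hypothetical names.
+
+```go
+package main
+
+import (
+	"crypto/sha256"
+	"encoding/hex"
+	"fmt"
+	"math/bits"
+	"sort"
+)
+
+// Imm is an immediate operand with the role assigned during lifting.
+type Imm struct {
+	Role    string // "addr", "const", "mask", "len"
+	Value   uint64
+	Section string // addr only: target section, e.g. ".rodata"
+	RelType string // addr only: relocation type, e.g. "R_X86_64_PC32"
+}
+
+// BucketImm turns an immediate into a hash token.
+func BucketImm(im Imm) string {
+	switch im.Role {
+	case "addr":
+		return fmt.Sprintf("IMM_RELOC(%s,%s)", im.Section, im.RelType)
+	case "mask":
+		return fmt.Sprintf("MASK(%#x)", im.Value) // a flipped mask bit is often the patch
+	default: // "const", "len": value < 2^k, where k is the bit length
+		return fmt.Sprintf("%s<2^%d", im.Role, bits.Len64(im.Value))
+	}
+}
+
+// CFGShapeHash hashes the sorted (in-degree/out-degree) pair of every block,
+// so it ignores block order, addresses and instruction content.
+func CFGShapeHash(succs [][]int) string {
+	in := make([]int, len(succs))
+	for _, ss := range succs {
+		for _, s := range ss {
+			in[s]++
+		}
+	}
+	shape := make([]string, len(succs))
+	for b, ss := range succs {
+		shape[b] = fmt.Sprintf("%d/%d", in[b], len(ss))
+	}
+	sort.Strings(shape)
+	sum := sha256.Sum256([]byte(fmt.Sprint(shape)))
+	return hex.EncodeToString(sum[:])
+}
+```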
+
+# VEX Resolver hookup
+
+* Extend rule language: `requires(fnId in {"EVP_DigestVerifyFinal", ...} && delta.immDiff.any == true)` → verdict `not_affected` with `justification="code_not_present"` and `impactStatement="Patched verification path altered constants"`.
+* If some required fix‑sites unchanged → `affected=true` with `actionStatement="Patched binary mismatch: function(s) unchanged"`, priority ↑.
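+
+The rule can be evaluated in a few lines of Go. This sketch assumes fix-sites are keyed by function name, as in the rule above. The text does not say what happens to a fix-site that changed without an immediate diff, so the sketch treats it as failing the rule. An empty fix-site list is deliberately not treated as passing. `FnDelta` and `Resolve` are hypothetical names.
+
+```go
+package main
+
+import (
+	"sort"
+	"strings"
+)
+
+// FnDelta is the per-function outcome recorded in the evidence object.
+type FnDelta struct {
+	Changed bool // semantic hash differs from the baseline
+	ImmDiff bool // at least one immediate changed
+}
+
+// Resolve returns a VEX status and statement for one CVE. Every required
+// fix-site must be present, changed, and carry an immediate diff.
+func Resolve(required []string, deltas map[string]FnDelta) (status, statement string) {
+	if len(required) == 0 {
+		// An empty rule would pass vacuously; stay affected instead.
+		return "affected", "No fix-sites defined for this CVE"
+	}
+	var failing []string
+	for _, fn := range required {
+		if d, ok := deltas[fn]; !ok || !d.Changed || !d.ImmDiff {
+			failing = append(failing, fn)
+		}
+	}
+	if len(failing) == 0 {
+		return "not_affected", "Patched verification path altered constants"
+	}
+	sort.Strings(failing)
+	return "affected", "Patched binary mismatch: function(s) unchanged: " + strings.Join(failing, ", ")
+}
+```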
+
+# Golden set + backports
+
+* Maintain per‑distro **golden patched refs** (Build‑ID pinned). If vendor publishes only source patch, build once with a fixed toolchain profile to derive reference hashes.
+* Backports: You’ll often see *different* opcode deltas with the *same* semantic intent—treat evidence as **policy‑mappable**: define acceptable delta patterns (e.g., bounds‑check added) and store them as **“semantic signatures”**.
+
+# CLI user journey (StellaOps standard CLI)
+
+```bash
+stella scan disasm \
+  --pkg openssl --file /usr/lib/x86_64-linux-gnu/libssl.so.1.1 \
+  --baseline @golden:debian-12/libssl.so.1.1 \
+  --out evidence.json --attest
+```
+
+* Output: DSSE‑signed evidence; `stella vex resolve` then pulls it and updates the VEX verdicts.
+
+# Minimal MVP (2 sprints)
+
+**Sprint A (MVP)**
+
+* B2R2 host + norm‑v1 for x86_64, aarch64 (ELF).
+* Function bucketing + opcode hash; per‑function delta; DSSE evidence.
+* VEX rule: “all listed fix‑sites changed → not_affected”.
+
+**Sprint B**
+
+* Semantic hash; IMM bucketing; PLT/reloc canon; UI diff viewer in Timeline.
+* Golden‑set builder & cache; distro backport adapters (Debian, RHEL, Alpine, SUSE, Astra).
+
+# Risks & guardrails
+
+* Stripped binaries: OK (IR still works). PIE/ASLR: neutralized via reloc canon. LTO/inlining: mitigate with CFG shape + semantic hash (not symbol names).
+* False positives: keep “changed‑but‑harmless” patterns whitelisted via semantic signatures (policy‑versioned).
+* Performance: cache lifted IR by `(digest, arch, profile)`; parallelize per function.
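+
+Both performance ideas are small in Go. This sketch uses the hypothetical names `CacheKey`, `LiftCache` and `HashAll`, and needs Go 1.21+ for generics and `max`. The cache is keyed by `(digest, arch, profile)`. Concurrency is bounded with a semaphore channel. Each result is written to its input index, so output order never depends on goroutine scheduling.
+
+```go
+package main
+
+import "sync"
+
+// CacheKey is the (digest, arch, profile) triple suggested above.
+type CacheKey struct {
+	Digest  string // SHA-256 of the input file
+	Arch    string
+	Profile string // normalization profile, e.g. "norm-v1"
+}
+
+// LiftCache memoizes lifted functions; []string stands in for real IR.
+type LiftCache struct {
+	mu    sync.Mutex
+	items map[CacheKey][]string
+}
+
+// GetOrCompute returns the cached value or runs lift outside the lock.
+// Two concurrent misses for the same key may both lift; the last write wins.
+func (c *LiftCache) GetOrCompute(k CacheKey, lift func() []string) []string {
+	c.mu.Lock()
+	v, ok := c.items[k]
+	c.mu.Unlock()
+	if ok {
+		return v
+	}
+	v = lift()
+	c.mu.Lock()
+	if c.items == nil {
+		c.items = map[CacheKey][]string{}
+	}
+	c.items[k] = v
+	c.mu.Unlock()
+	return v
+}
+
+// HashAll applies hash to every function with at most `workers` goroutines.
+func HashAll[T any](fns []T, workers int, hash func(T) string) []string {
+	out := make([]string, len(fns))
+	sem := make(chan struct{}, max(1, workers))
+	var wg sync.WaitGroup
+	for i, fn := range fns {
+		wg.Add(1)
+		sem <- struct{}{} // blocks while `workers` goroutines are running
+		go func(i int, fn T) {
+			defer wg.Done()
+			defer func() { <-sem }()
+			out[i] = hash(fn) // distinct indices: no data race
+		}(i, fn)
+	}
+	wg.Wait()
+	return out
+}
+```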
+
+If you want, I can draft the **norm‑v1** pass list as a concrete F# pipeline for B2R2 and a **.proto/JSON‑Schema** for `stella.disasm.patch-evidence@1`, ready to drop into `scanner.webservice`.