Here are two practical ways to make your software supply‑chain evidence both *useful* and *verifiable*—with enough background to get you shipping.

---

# 1) Binary SBOMs that still work when there’s no package manager

**Why this matters:** Container images built `FROM scratch` or “distroless” often lack package metadata, so typical SBOMs go blank. A *binary SBOM* extracts facts directly from executables—so you still know “what’s inside,” even in bare images.

**Core idea (plain English):**

* Parse binaries (ELF on Linux, PE on Windows, Mach‑O on macOS).
* Record file paths, cryptographic hashes, import tables, compiler/linker hints, and for ELF also the `.note.gnu.build-id` (a unique ID most linkers embed).
* Map these fingerprints to known packages/versions (vendor fingerprints, distro databases, your own allowlists).
* Sign the result as an attestation so others can trust it without re‑running your scanner.

**Minimal pipeline sketch** (an extraction sketch follows the list):

* **Extract:** `readelf -n` (ELF notes), `objdump`/`otool` for imports; compute SHA‑256 for every binary.
* **Normalize:** Emit CycloneDX or SPDX components for *binaries*, not just packages.
* **Map:** Use Build‑ID → package hints (e.g., glibc, OpenSSL), symbol/version patterns, and path heuristics.
* **Attest:** Wrap the SBOM in DSSE + in‑toto and push to your registry alongside the image digest.
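
A minimal C# sketch of the **Extract** step, assuming binutils’ `readelf` is on `PATH` (a production scanner would parse the ELF note section itself; `BinaryExtractor` and `ExtractIdentity` are names invented for this sketch):

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Security.Cryptography;
using System.Text.RegularExpressions;

static class BinaryExtractor
{
    // Hash one binary and shell out to `readelf -n` for the GNU build-id.
    public static (string Sha256, string? BuildId) ExtractIdentity(string path)
    {
        using var stream = File.OpenRead(path);
        var sha256 = Convert.ToHexString(SHA256.HashData(stream)).ToLowerInvariant();

        var psi = new ProcessStartInfo("readelf", $"-n \"{path}\"") { RedirectStandardOutput = true };
        using var process = Process.Start(psi)!;
        var output = process.StandardOutput.ReadToEnd();
        process.WaitForExit();

        // readelf prints a line like "Build ID: 2f3a9c..." for .note.gnu.build-id.
        var match = Regex.Match(output, @"Build ID:\s*([0-9a-f]+)");
        return (sha256, match.Success ? match.Groups[1].Value : null);
    }
}
```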
**Pragmatic spec for developers:**

* Inputs: OCI image digest.
* Outputs:

  * `binary-sbom.cdx.json` (CycloneDX) or `binary-sbom.spdx.json`.
  * `attestation.intoto.jsonl` (DSSE envelope referencing the SBOM’s SHA‑256 and the *image digest*).
* Data fields to capture per artifact:

  * `algorithm: sha256`, `digest: <hex>`, `type: elf|pe|macho`, `path`, `size`,
  * `elf.build_id` (if present), `imports[]`, `compiler[]`, `arch`, `endian`.
* Verification:

  * `cosign verify-attestation --type cyclonedx <image>@<image-digest> ...`

**Why the ELF Build‑ID is gold:** it’s a stable, linker‑emitted identifier that helps correlate stripped binaries to upstream packages—critical when filenames and symbols lie.

---

# 2) Reachability analysis so you only page people for *real* risk

**Why this matters:** Not every CVE in your deps can actually be hit by your app. If you can show “no call path reaches the vulnerable sink,” you can *de‑noise* alerts and ship faster.

**Core idea (plain English):**

* Build an *interprocedural call graph* of your app (across modules/packages).
* Mark known “sinks” from vulnerability advisories (e.g., dangerous API + version range).
* Compute graph reachability from your entrypoints (HTTP handlers, CLI `main`, background jobs).
* The intersection of {reachable nodes} × {vulnerable sinks} = “actionable” findings.
* Emit a signed *witness* (attestation) that states which sinks are reachable/unreachable and why.

**Minimal pipeline sketch:**

* **Ingest code/bytecode:** language‑specific frontends (e.g., .NET IL, JVM bytecode, Python AST, Go SSA).
* **Build graph:** nodes = functions/methods; edges = call sites (include dynamic edges conservatively).
* **Mark entrypoints:** web routes, message handlers, cron jobs, exported CLIs.
* **Mark sinks:** from your vuln DB (API signature + version).
* **Decide:** run graph search from entrypoints → is any sink reachable? (see the sketch after this list)
* **Attest:** DSSE witness with:

  * artifact digest (commit SHA / image digest),
  * tool version + rule set hash,
  * list of reachable sinks with at least one example call path,
  * list of *proven* unreachable sinks (under stated assumptions).
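
The decide step is ordinary graph search. A toy sketch, where the call graph is a plain adjacency map (`FindReachableSinks` and its shapes are invented for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Reachability
{
    // BFS from the entrypoints; actionable findings = reachable nodes ∩ vulnerable sinks.
    public static HashSet<string> FindReachableSinks(
        IReadOnlyDictionary<string, IReadOnlyList<string>> callGraph,
        IEnumerable<string> entrypoints,
        IReadOnlySet<string> vulnerableSinks)
    {
        var visited = new HashSet<string>();
        var queue = new Queue<string>(entrypoints);
        while (queue.Count > 0)
        {
            var fn = queue.Dequeue();
            if (!visited.Add(fn)) continue;                        // already explored
            if (!callGraph.TryGetValue(fn, out var callees)) continue;
            foreach (var callee in callees) queue.Enqueue(callee); // conservative: every edge counts
        }
        return visited.Intersect(vulnerableSinks).ToHashSet();
    }
}
```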
**Developer contract (portable across languages):**

* Inputs: source/bytecode zip + manifest of entrypoints.
* Outputs:

  * `reachability.witness.json` (DSSE envelope),
  * optional `paths/` folder with top‑N call paths as compact JSON (for UX rendering).
* Verification:

  * Recompute call graph deterministically given the same inputs + tool version,
  * `cosign verify-attestation --type <reachability-predicate-URI> ...`

---

# How these two pieces fit together

* **Binary SBOM** = “What exactly is in the artifact?” (even in bare images)
* **Reachability witness** = “Which vulns actually matter to *this* app build?”
* Sign both as **DSSE/in‑toto attestations** and attach to the image/release. Your CI can enforce (a gate sketch follows):

  * “Block if high‑severity + *reachable*,”
  * “Warn (don’t block) if high‑severity but *unreachable* with a fresh witness.”
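
A hedged sketch of that gate; `Finding` and its fields are invented here rather than taken from any existing tool:

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed record Finding(string Cve, string Severity, bool Reachable, bool WitnessFresh);

public static class ReleaseGate
{
    // Block when a high/critical finding is reachable, or when its
    // "unreachable" witness is stale (stale ⇒ treat as reachable).
    public static bool ShouldBlock(IEnumerable<Finding> findings) =>
        findings.Any(f =>
            (f.Severity is "high" or "critical") &&
            (f.Reachable || !f.WitnessFresh));
}
```

Treating a stale witness as reachable keeps the gate fail‑closed.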
---

# Quick starter checklist (copy/paste to a task board)

* [ ] Binary extractors: ELF/PE/Mach‑O parsers; hash & Build‑ID capture.
* [ ] Mapping rules: Build‑ID → known package DB; symbol/version heuristics.
* [ ] Emit CycloneDX/SPDX; add file‑level components for binaries.
* [ ] DSSE signing and `cosign`/`rekor` publish for SBOM attestation.
* [ ] Language frontends for reachability (pick your top 1–2 first).
* [ ] Call‑graph builder + entrypoint detector.
* [ ] Sink catalog normalizer (map CVE → API signature).
* [ ] Reachability engine + example path extractor.
* [ ] DSSE witness for reachability; attach to build.
* [ ] CI policy: block on “reachable high/critical”; surface paths in UI.

If you want, I can turn this into concrete .NET‑first tasks with sample code scaffolds and a tiny demo repo that builds an image, extracts a binary SBOM, runs reachability on a toy service, and emits both attestations.

---

Below is a concrete, “do‑this‑then‑this” implementation plan for a **layered binary→PURL mapping system** that fits StellaOps’ constraints: **offline**, **deterministic**, **SBOM‑first**, and with **unknowns recorded instead of guessing**.

I’m going to assume your target is the common pain case StellaOps itself calls out: when package metadata is missing, Scanner falls back to binary identity (`bin:{sha256}`) and you want to deterministically “lift” those binaries into stable package identities (PURLs) without turning the core SBOM into fuzzy guesswork. StellaOps’ own Scanner docs emphasize **deterministic analyzers**, **no fuzzy identity in core**, and keeping heuristics as opt‑in add‑ons. ([Stella Ops][1])

---
## 0) What “binary mapping” means in StellaOps terms

In Scanner’s architecture, the **component key** is:

* **PURL when present**
* otherwise `bin:{sha256}` ([Stella Ops][1])

So “better binary mapping” = systematically converting more of those `bin:*` components into **PURLs** (or at least producing **actionable mapping evidence + Unknowns**) while preserving:

* deterministic replay (same inputs ⇒ same output)
* offline operation (air‑gapped kits)
* policy safety (don’t hide false negatives behind fuzzy IDs)

Also, StellaOps already treats “gaps” as first‑class via the **Unknowns Registry** (identity gaps, missing build‑ids, version conflicts, missing edges, etc.). ([Gitea: Git with a cup of tea][2]) Your binary mapping work should *feed* this system.

---

## 1) Design constraints you must keep (or you’ll fight the platform)

### 1.1 Determinism rules

StellaOps’ Scanner architecture is explicit: core analyzers are deterministic; heuristic plug‑ins must not contaminate the core SBOM unless explicitly enabled. ([Stella Ops][1])

That implies:

* **No probabilistic “best guess” PURL** in the default mapping path.
* If you do fuzzy inference, it must be emitted as:

  * “hints” attached to Unknowns, or
  * a separate heuristic artifact gated by flags.

### 1.2 Offline kit + debug store is already a hook you can exploit

Offline kits already bundle:

* scanner plug‑ins (OS + language analyzers packaged under `plugins/scanner/analyzers/**`)
* a **debug store** layout: `debug/.build-id/<aa>/<rest>.debug`
* a `debug-manifest.json` that maps build‑ids → originating images (for symbol retrieval) ([Stella Ops][3])

This is perfect for building a **Build‑ID→PURL index** that stays offline and signed.

### 1.3 Scanner Worker already loads analyzers via directory catalogs

The Worker loads OS and language analyzer plug‑ins from default directories (unless overridden), using deterministic directory normalization and a “seal” concept on the last directory. ([Gitea: Git with a cup of tea][4])

So you can add a third catalog for **native/binary mapping** that behaves the same way.

---
## 2) Layering strategy: what to implement (and in what order)

You want a **resolver pipeline** with strict ordering from “hard evidence” → “soft evidence”.

### Layer 0 — In‑image authoritative mapping (highest confidence)

These sources are authoritative because they come from within the artifact:

1. **OS package DB present** (dpkg/rpm/apk):

   * Map `path → package` using file ownership lists.
   * If you can also compute file hashes/build‑ids, store them as evidence.

2. **Language ecosystem metadata present** (already handled by language analyzers):

   * For example, a Python wheel RECORD or a Go buildinfo section can directly imply module versions.

**Decision rule**: If a binary file is owned by an OS package, **prefer that** over any external mapping index.

### Layer 1 — “Build provenance” mapping via build IDs / UUIDs (strong, portable)

When the package DB is missing (distroless/scratch), use **compiler/linker stable IDs**:

* ELF: `.note.gnu.build-id`
* Mach‑O: `LC_UUID`
* PE: CodeView (PDB GUID+Age) / build signature

This should be your primary fallback because it survives stripping and renaming.

### Layer 2 — Hash mapping for curated or vendor‑pinned binaries (strong but brittle across rebuilds)

Use SHA‑256 → PURL mapping when:

* binaries are redistributed unchanged (busybox, chromium, embedded runtimes)
* you maintain a curated “known binaries” manifest

StellaOps already mentions “curated binary manifest generation” in its repo history, and a `vendor/manifest.json` concept exists (for pinned artifacts / binaries in the system). ([Gitea: Git with a cup of tea][5]) For your ops environment you’ll create a similar manifest **for your fleet**.

### Layer 3 — Dependency closure constraints (helpful as a disambiguator, not a primary mapper)

If the binary’s DT_NEEDED / imports point to libs you *can* identify, you can use that to disambiguate multiple possible candidates (“this openssl build‑id matches, but only one candidate has the required glibc baseline”).

This must remain deterministic and rules‑based.

### Layer 4 — Heuristic hints (never change the core SBOM by default)

Examples:

* symbol version patterns (`GLIBC_2.28`, etc.)
* embedded version strings
* import tables
* compiler metadata

These produce **Unknown evidence/hints**, not a resolved identity, unless a special “heuristics allowed” flag is turned on.

### Layer 5 — Unknowns Registry output (mandatory when you can’t decide)

If a mapping can’t be made decisively, emit Unknowns (`identity_gap`, `missing_build_id`, `version_conflict`, etc.). ([Gitea: Git with a cup of tea][2]) This is not optional; it’s how you prevent silent false negatives.

---
## 3) Concrete data model you should implement

### 3.1 Binary identity record

Create a single canonical identity structure that *every layer* uses:

```csharp
public enum BinaryFormat { Elf, Pe, MachO, Unknown }

public sealed record BinaryIdentity(
    BinaryFormat Format,
    string Path,            // normalized (posix style), rooted at image root
    string Sha256,          // always present
    string? BuildId,        // ELF
    string? MachOUuid,      // Mach-O
    string? PeCodeViewGuid, // PE/PDB
    string? Arch,           // amd64/arm64/...
    long SizeBytes
);
```

**Determinism tip**: normalize `Path` to a single separator and collapse `//`, `./`, etc.
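
For example, a minimal normalization helper (one possible implementation; `..` segments would need an explicit policy and are left out here):

```csharp
using System;
using System.Linq;

static class PathNormalizer
{
    // Forward slashes only, collapse "//" and "./", keep the path rooted
    // at the image root.
    public static string Normalize(string path)
    {
        var parts = path.Replace('\\', '/')
                        .Split('/', StringSplitOptions.RemoveEmptyEntries)
                        .Where(segment => segment != ".");
        return "/" + string.Join('/', parts);
    }
}
```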
### 3.2 Mapping candidate

Each resolver layer returns candidates like:

```csharp
public enum MappingVerdict { Resolved, Unresolved, Ambiguous }

public sealed record BinaryMappingCandidate(
    string Purl,
    double Confidence,                             // 0..1 but deterministic
    string ResolverId,                             // e.g. "os.fileowner", "buildid.index.v1"
    IReadOnlyList<string> Evidence,                // stable ordering
    IReadOnlyDictionary<string, string> Properties // stable ordering
);
```

### 3.3 Final mapping result

```csharp
public sealed record BinaryMappingResult(
    MappingVerdict Verdict,
    BinaryIdentity Subject,
    BinaryMappingCandidate? Winner,
    IReadOnlyList<BinaryMappingCandidate> Alternatives,
    string MappingIndexDigest // sha256 of index snapshot used (or "none")
);
```

---

## 4) Build the “Binary Map Index” that makes Layers 1 and 2 work offline

### 4.1 Where it lives in StellaOps

Put it in the Offline Kit as a signed artifact, next to other feeds and plug-ins. Offline kit packaging already includes plug-ins and a debug store with a deterministic layout. ([Stella Ops][3])

Recommended layout:

```
offline-kit/
  feeds/
    binary-map/
      v1/
        buildid.map.zst
        sha256.map.zst
        index.manifest.json
        index.manifest.json.sig   (DSSE or JWS, consistent with your kit)
```
### 4.2 Index record schema (v1)

Make each record explicit and replayable:

```json
{
  "schema": "stellaops.binary-map.v1",
  "records": [
    {
      "key": { "kind": "elf.build_id", "value": "2f3a..." },
      "purl": "pkg:deb/debian/openssl@3.0.11-1~deb12u2?arch=amd64",
      "evidence": {
        "source": "os.dpkg.fileowner",
        "source_image": "sha256:....",
        "path": "/usr/lib/x86_64-linux-gnu/libssl.so.3",
        "package": "openssl",
        "package_version": "3.0.11-1~deb12u2"
      }
    }
  ]
}
```

Key points:

* `key.kind` is one of `elf.build_id`, `macho.uuid`, `pe.codeview`, `file.sha256`
* include evidence with enough detail to justify the mapping

### 4.3 How to *generate* the index (deterministically)

You need an **offline index builder** pipeline. In StellaOps terms, this is best treated like a feed exporter step (build-time), then shipped in the Offline Kit.

**Input set options** (choose one or mix):

1. “Golden base images” list (your fleet’s base images)
2. Distro repositories mirrored into the airgap (Deb/RPM/APK archives)
3. Previously scanned images that are allowed into the kit

**Generation steps** (a sketch of steps 4 and 6 follows the list):

1. For each input image:

   * Extract rootfs in a deterministic path order.
   * Run OS analyzers (dpkg/rpm/apk) + native identity collection (ELF/PE/MachO).
2. Produce raw tuples:

   * `(build_id | uuid | codeview | sha256) → (purl, evidence)`
3. Deduplicate:

   * Canonicalize PURLs (normalize qualifier order, lowercasing rules).
   * If the same key maps to **multiple distinct PURLs**, keep them all and mark the conflict (do not pick one).
4. Sort:

   * Sort by `(key.kind, key.value, purl)` lexicographically.
5. Serialize:

   * Emit line‑delimited JSON or a simple binary format.
   * Compress (zstd).
6. Compute digests:

   * `sha256` of each artifact.
   * `sha256` over the concatenated `(artifact name + sha)` pairs for a manifest hash.
7. Sign:

   * Include the index in the kit manifest and sign it with the same process you use for other offline kit elements. Offline kit import in StellaOps validates digests and signatures. ([Stella Ops][3])
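
A hedged sketch of steps 4 and 6 (the record shape and names are invented here; ordinal comparison keeps the ordering culture-invariant):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public sealed record IndexRecord(string KeyKind, string KeyValue, string Purl);

public static class IndexBuilder
{
    // Step 4: deterministic lexicographic ordering by (key.kind, key.value, purl).
    public static List<IndexRecord> Sort(IEnumerable<IndexRecord> records) =>
        records
            .OrderBy(r => r.KeyKind, StringComparer.Ordinal)
            .ThenBy(r => r.KeyValue, StringComparer.Ordinal)
            .ThenBy(r => r.Purl, StringComparer.Ordinal)
            .ToList();

    // Step 6: sha256 over the concatenated (artifact name + sha256) pairs.
    public static string ManifestHash(IEnumerable<(string Name, string Sha256)> artifacts)
    {
        var concatenated = string.Concat(
            artifacts.OrderBy(a => a.Name, StringComparer.Ordinal)
                     .Select(a => a.Name + a.Sha256));
        return Convert.ToHexString(
            SHA256.HashData(Encoding.UTF8.GetBytes(concatenated))).ToLowerInvariant();
    }
}
```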
---

## 5) Runtime side: implement the layered resolver in Scanner Worker

### 5.1 Where to hook in

You want this to run after the OS + language analyzers have produced fragments, and after native identity collection has produced binary identities.

Scanner Worker already executes analyzers and appends fragments to `context.Analysis`. ([Gitea: Git with a cup of tea][4])

Scanner module responsibilities explicitly include OS, language, and native ecosystems as restart-only plug-ins. ([Gitea: Git with a cup of tea][6])

So implement binary mapping as either:

* part of the **native ecosystem analyzer output stage**, or
* a **post-analyzer enrichment stage** that runs before SBOM composition.

I recommend the **post-analyzer enrichment stage**, because it can consult OS + language analyzer results and unify decisions.

### 5.2 Add a new ScanAnalysis key

Store collected binary identities in analysis:

* `ScanAnalysisKeys.NativeBinaryIdentities` → `ImmutableArray<BinaryIdentity>`

And store mapping results:

* `ScanAnalysisKeys.NativeBinaryMappings` → `ImmutableArray<BinaryMappingResult>`

### 5.3 Implement the resolver pipeline (deterministic ordering)

```csharp
public interface IBinaryMappingResolver
{
    string Id { get; }   // stable ID
    int Order { get; }   // deterministic
    BinaryMappingCandidate? TryResolve(BinaryIdentity identity, MappingContext ctx);
}
```

Pipeline (the decide step is sketched after the list):

1. Sort resolvers by `(Order, Id)` (Ordinal comparison).
2. For each resolver:

   * if it returns a candidate, add it to the candidates list.
   * if the resolver is “authoritative” (Layer 0), you can short‑circuit on the first hit.
3. Decide:

   * If 0 candidates ⇒ `Unresolved`
   * If 1 candidate ⇒ `Resolved`
   * If >1:

     * If candidates have different PURLs ⇒ `Ambiguous`, unless a deterministic “dominates” rule exists
     * If candidates have the same PURL (from multiple sources) ⇒ merge evidence
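
A sketch of that decide step over the records from §3 (`MergeEvidence` is a placeholder; a real merge would union evidence lists deterministically):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static BinaryMappingResult Decide(
    BinaryIdentity subject,
    IReadOnlyList<BinaryMappingCandidate> candidates,
    string indexDigest)
{
    if (candidates.Count == 0)
        return new(MappingVerdict.Unresolved, subject, null, candidates, indexDigest);

    // Conflicting identities: never auto-pick a winner.
    var distinctPurls = candidates.Select(c => c.Purl).Distinct(StringComparer.Ordinal).Count();
    if (distinctPurls > 1)
        return new(MappingVerdict.Ambiguous, subject, null, candidates, indexDigest);

    // Same PURL from one or more sources: merge evidence into a single winner.
    return new(MappingVerdict.Resolved, subject, MergeEvidence(candidates), candidates, indexDigest);
}

static BinaryMappingCandidate MergeEvidence(IReadOnlyList<BinaryMappingCandidate> sameIdentity) =>
    sameIdentity[0]; // placeholder: union Evidence/Properties with stable ordering
```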
### 5.4 Implement each layer as a resolver

#### Resolver A: OS file owner (Layer 0)

Inputs:

* OS analyzer results in `context.Analysis` (they’re already stored under `ScanAnalysisKeys.OsPackageAnalyzers`). ([Gitea: Git with a cup of tea][4])
* You need OS analyzers to expose a file ownership mapping.

Implementation options:

* Extend OS analyzers to produce `path → packageId` maps.
* Or load that from the dpkg/rpm DB at mapping time (fast enough if you only query per binary path).

Candidate:

* `Purl = pkg:<ecosystem>/<name>@<version>?arch=...`
* Confidence = `1.0`
* Evidence includes:

  * analyzer id
  * package name/version
  * file path

#### Resolver B: Build‑ID index (Layer 1)

Inputs:

* `identity.BuildId` (or uuid/codeview)
* `BinaryMapIndex` loaded from the Offline Kit `feeds/binary-map/v1/buildid.map.zst`

Implementation:

* On worker startup, load and parse the index into an immutable structure:

  * `FrozenDictionary<string, BuildIdEntry[]>` (or sorted arrays + binary search)
* If a key maps to multiple PURLs:

  * return multiple candidates (same resolver id), forcing an `Ambiguous` verdict upstream

Candidate (lookup sketched below):

* Confidence = `0.95` (still deterministic)
* Evidence includes the index manifest digest + record evidence
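
A sketch of the build-id lookup, assuming the index was parsed into a `FrozenDictionary` at worker startup (`BuildIdEntry` and the multi-candidate `Resolve` shape are illustrative, not existing StellaOps types):

```csharp
using System;
using System.Collections.Frozen;
using System.Collections.Generic;
using System.Linq;

public sealed record BuildIdEntry(string Purl, IReadOnlyList<string> Evidence);

public sealed class BuildIdIndexResolver
{
    private readonly FrozenDictionary<string, BuildIdEntry[]> _index;

    public BuildIdIndexResolver(FrozenDictionary<string, BuildIdEntry[]> index) => _index = index;

    public string Id => "buildid.index.v1";

    // One candidate per index entry; multiple distinct PURLs force `Ambiguous` upstream.
    public IReadOnlyList<BinaryMappingCandidate> Resolve(BinaryIdentity identity) =>
        identity.BuildId is { } buildId && _index.TryGetValue(buildId, out var entries)
            ? entries.Select(e => new BinaryMappingCandidate(
                  e.Purl, 0.95, Id, e.Evidence, new Dictionary<string, string>())).ToList()
            : Array.Empty<BinaryMappingCandidate>();
}
```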
#### Resolver C: SHA‑256 index (Layer 2)

Inputs:

* `identity.Sha256`
* `feeds/binary-map/v1/sha256.map.zst` OR your ops “curated binaries” manifest

Candidate:

* Confidence:

  * `0.9` if from a signed curated manifest
  * `0.7` if from an “observed in a previous scan” cache (I’d avoid this unless you version and sign the cache)

#### Resolver D: Dependency closure constraints (Layer 3)

Only run this if you have native dependency parsing output (DT_NEEDED / imports). The resolver does **not** return a mapping on its own; instead, it can:

* bump confidence for existing candidates
* or rule out candidates deterministically (e.g., glibc baseline mismatch)

Make this a “candidate rewriter” stage:

```csharp
public interface ICandidateRefiner
{
    string Id { get; }
    int Order { get; }
    IReadOnlyList<BinaryMappingCandidate> Refine(
        BinaryIdentity id,
        IReadOnlyList<BinaryMappingCandidate> cands,
        MappingContext ctx);
}
```

#### Resolver E: Heuristic hints (Layer 4)

Never resolves to a PURL by default. It just produces an Unknown evidence payload:

* extracted strings (“OpenSSL 3.0.11”)
* imported symbol names
* SONAME
* symbol version requirements

---
## 6) SBOM composition behavior: how to “lift” bin components safely

### 6.1 Don’t break the component key rules

Scanner uses:

* key = PURL when present, else `bin:{sha256}` ([Stella Ops][1])

When you resolve a binary identity to a PURL, you have two clean options:

**Option 1 (recommended): replace the component key with the PURL**

* This makes downstream policy/advisory matching work naturally.
* It’s deterministic as long as the mapping index is versioned and shipped with the kit.

**Option 2: keep `bin:{sha256}` as the component key and attach `resolved_purl`**

* Lower disruption to diffing, but policy now has to understand the `resolved_purl` field.
* If StellaOps policy assumes `component.purl` is the canonical key, this will cause pain.

Given StellaOps emphasizes PURLs as the canonical key for identity, I’d implement **Option 1**, but record robust evidence + the index digest.

### 6.2 Preserve file-level evidence

Even after lifting to a PURL, keep evidence that ties the package identity back to file bytes:

* file path(s)
* sha256
* build-id/uuid
* mapping resolver id + index digest

This is what makes attestations verifiable and helps operators debug.

---

## 7) Unknowns integration: emit Unknowns whenever mapping isn’t decisive

The Unknowns Registry exists precisely for “unresolved symbol → package mapping”, “missing build-id”, “ambiguous purl”, etc. ([Gitea: Git with a cup of tea][2])

### 7.1 When to emit Unknowns

Emit Unknowns for:

1. `identity.BuildId == null` for ELF

   * `unknown_type = missing_build_id`
   * evidence: “ELF missing .note.gnu.build-id; using sha256 only”

2. Multiple candidates with different PURLs

   * `unknown_type = version_conflict` (or `identity_gap`)
   * evidence: list the candidates + their evidence

3. Heuristic hints found but no authoritative mapping

   * `unknown_type = identity_gap`
   * evidence: imported symbols, strings, SONAME

### 7.2 How to compute `unknown_id` deterministically

The Unknowns schema suggests:

* `unknown_id` is derived from sha256 over `(type + scope + evidence)` ([Gitea: Git with a cup of tea][2])

Do (a sketch follows):

* stable JSON canonicalization of `scope` + `unknown_type` + primary evidence fields
* sha256
* prefix with `unk:sha256:<...>`
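
A minimal sketch of that derivation, assuming sorted-key canonicalization is acceptable (a production implementation might use RFC 8785 JSON canonicalization; the field names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

static class UnknownIds
{
    // Sorted dictionaries give deterministic key order; the default serializer
    // emits no whitespace, so equal inputs yield byte-identical JSON.
    public static string Compute(
        string unknownType,
        SortedDictionary<string, string> scope,
        SortedDictionary<string, string> evidence)
    {
        var canonical = JsonSerializer.Serialize(new { type = unknownType, scope, evidence });
        var digest = Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(canonical)))
                            .ToLowerInvariant();
        return $"unk:sha256:{digest}";
    }
}
```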
This guarantees idempotent ingestion behavior (`POST /unknowns/ingest` upsert). ([Gitea: Git with a cup of tea][2])

---

## 8) Packaging as a StellaOps plug-in (so ops can upgrade it offline)

### 8.1 Plug-in manifest

Scanner plug-ins use a `manifest.json` with `schemaVersion`, `id`, `entryPoint` (dotnet assembly + typeName), etc. ([Gitea: Git with a cup of tea][7])

Create something like:

```json
{
  "schemaVersion": "1.0",
  "id": "stellaops.analyzer.native.binarymap",
  "displayName": "StellaOps Native Binary Mapper",
  "version": "0.1.0",
  "requiresRestart": true,
  "entryPoint": {
    "type": "dotnet",
    "assembly": "StellaOps.Scanner.Analyzers.Native.BinaryMap.dll",
    "typeName": "StellaOps.Scanner.Analyzers.Native.BinaryMap.BinaryMapPlugin"
  },
  "capabilities": [
    "native-analyzer",
    "binary-mapper",
    "elf",
    "pe",
    "macho"
  ],
  "metadata": {
    "org.stellaops.analyzer.kind": "native",
    "org.stellaops.restart.required": "true"
  }
}
```

### 8.2 Worker loading

Mirror the pattern in `CompositeScanAnalyzerDispatcher`:

* add a catalog `INativeAnalyzerPluginCatalog`
* default directory: `plugins/scanner/analyzers/native`
* load directories with the same “seal last directory” behavior ([Gitea: Git with a cup of tea][4])

---
## 9) Tests and performance gates (what “done” looks like)

StellaOps has determinism tests and golden fixtures for analyzers; follow that style. ([Gitea: Git with a cup of tea][6])

### 9.1 Determinism tests

Create fixtures with:

* the same binaries in different file order
* the same binaries hardlinked/symlinked
* a stripped ELF missing its build-id
* multi-arch variants

Assert (a test sketch follows):

* mapping output JSON is byte-for-byte stable
* unknown ids are stable
* candidate ordering is stable
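
An illustrative xunit-style shape for the byte-for-byte check (`RunMapperAndSerialize` is a stand-in you would wire to your pipeline; the fixture path is hypothetical):

```csharp
using System;
using Xunit;

public class BinaryMappingDeterminismTests
{
    [Fact]
    public void SameFixture_ProducesByteIdenticalOutput()
    {
        const string fixture = "fixtures/stripped-elf-no-buildid"; // hypothetical fixture
        byte[] first = RunMapperAndSerialize(fixture);
        byte[] second = RunMapperAndSerialize(fixture);
        Assert.Equal(first, second); // byte-for-byte stable
    }

    private static byte[] RunMapperAndSerialize(string fixture) =>
        throw new NotImplementedException("wire to the mapping pipeline under test");
}
```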
### 9.2 “No fuzzy identity” guardrail tests

Add tests that assert:

* the heuristic resolver never emits a `Resolved` verdict unless a feature flag is enabled
* ambiguous candidates never auto-select a winner

### 9.3 Performance budgets

For ops, you care about scan wall time. Adopt budgets like:

* identity extraction < 25 ms per binary (native parsing)
* mapping lookup O(1) per binary (frozen dict) or O(log n) with sorted arrays
* index load time bounded (lazy load at worker start)

Track metrics:

* count resolved per layer
* count ambiguous/unresolved
* unknown density (ties into Unknowns Registry scoring later) ([Gitea: Git with a cup of tea][2])

---

## 10) Practical “ops” workflow: how to keep improving mapping safely

### 10.1 Add a feedback loop from Unknowns → index builder

Unknowns are your backlog:

* “missing build-id”
* “ambiguous mapping”
* “hash seen but not in index”

For each Unknown:

1. decide if it should be mapped in core (needs an authoritative source)
2. if yes: add the reference artifact to your **index builder input set**
3. rebuild the BinaryMap index
4. ship via a signed Offline Kit update

### 10.2 Don’t let your index silently drift

Because determinism matters, treat the BinaryMap index like a feed:

* version it (`v1`, `v2`)
* sign it
* store the index digest in scan evidence

That way you can explain: “This binary was mapped using binary-map/v1 digest XYZ”.

---

## 11) Minimal implementation checklist (if you want the shortest path to value)

If you only do three things, do these:

1. **Build‑ID extraction everywhere** (ELF/Mach‑O/PE), always stored in evidence
   (also emit an Unknown when it’s missing, as StellaOps expects) ([Gitea: Git with a cup of tea][8])

2. **Offline Build‑ID → PURL index** shipped in the Offline Kit
   (fits perfectly with the existing debug-store + kit pattern) ([Stella Ops][3])

3. **Deterministic resolver pipeline + Unknowns emission**
   (so you improve mapping without introducing silent risk) ([Gitea: Git with a cup of tea][2])

---

If you tell me whether your main pain is **distroless**, **FROM scratch**, or **vendor‑bundled runtimes** (chromium/node/openssl/etc.), I can give you the best “Layer 1 index builder” recipe for that category (what to use as authoritative sources and how to avoid collisions) — but the plan above is already safe and implementable without further assumptions.

[1]: https://stella-ops.org/docs/modules/scanner/architecture/ "Stella Ops – Signed Reachability · Deterministic Replay · Sovereign Crypto"
[2]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/src/commit/d519782a8f0b30f425c9b6ae0f316b19259972a2/docs/signals/unknowns-registry.md "git.stella-ops.org/unknowns-registry.md at d519782a8f0b30f425c9b6ae0f316b19259972a2"
[3]: https://stella-ops.org/docs/24_offline_kit/index.html "Stella Ops – Signed Reachability · Deterministic Replay · Sovereign Crypto"
[4]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/src/commit/18f28168f022c73736bfd29033c71daef5e11044/src/Scanner/StellaOps.Scanner.Worker/Processing/CompositeScanAnalyzerDispatcher.cs "git.stella-ops.org/CompositeScanAnalyzerDispatcher.cs at 18f28168f022c73736bfd29033c71daef5e11044"
[5]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/src/commit/8d78dd219b5e44c835e511491a4750f4a3ee3640/vendor/manifest.json "git.stella-ops.org/manifest.json at 8d78dd219b5e44c835e511491a4750f4a3ee3640"
[6]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/src/commit/bc0762e97d251723854b9c4e482b218c8efb1e04/docs/modules/scanner "git.stella-ops.org/scanner at bc0762e97d251723854b9c4e482b218c8efb1e04"
[7]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/src/commit/c37722993137dac4b3a4104045826ca33b9dc289/plugins/scanner/analyzers/lang/StellaOps.Scanner.Analyzers.Lang.Go/manifest.json "git.stella-ops.org/manifest.json at c37722993137dac4b3a4104045826ca33b9dc289"
[8]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/src/commit/d519782a8f0b30f425c9b6ae0f316b19259972a2/docs/reachability/evidence-schema.md "git.stella-ops.org/evidence-schema.md at d519782a8f0b30f425c9b6ae0f316b19259972a2"