feat: Add new provenance and crypto registry documentation

- Introduced attestation inventory and subject-rekor mapping files for tracking Docker packages.
- Added a comprehensive crypto registry decision document outlining defaults and required follow-ups.
- Created an offline feeds manifest for bundling air-gap resources.
- Implemented a script to generate and update binary manifests for curated binaries.
- Added a verification script to ensure binary artefacts are located in approved directories.
- Defined new schemas for AdvisoryEvidenceBundle, OrchestratorEnvelope, ScannerReportReadyPayload, and ScannerScanCompletedPayload.
- Established project files for StellaOps.Orchestrator.Schemas and StellaOps.PolicyAuthoritySignals.Contracts.
- Updated vendor manifest to track pinned binaries for integrity.
2025-11-18 23:47:13 +02:00
parent d3ecd7f8e6
commit e91da22836
44 changed files with 6793 additions and 99 deletions
--- a/docs/product-advisories/17-Nov-2026
+++ b/docs/product-advisories/17-Nov-2026
@@ -0,0 +1,846 @@
+
+Here’s a compact blueprint for bringing **stripped ELF binaries** into StellaOps’s **call‑graph + reachability scoring**—from raw bytes → neutral JSON → deterministic scoring.
+
+---
+
+# Why this matters (quick)
+
+Even when symbols are missing, you can still (1) recover functions, (2) build a call graph, and (3) decide if a vulnerable function is *actually* reachable from the binary’s entrypoints. This feeds StellaOps’s deterministic scoring/lattice engine so VEX decisions are evidence‑backed, not guesswork.
+
+---
+
+# High‑level pipeline
+
+1. **Ingest**
+
+* Accept: ELF (static/dynamic), PIE, musl/glibc, multiple arches (x86_64, aarch64, armhf, riscv64).
+* Normalize: compute file hash set (SHA‑256, BLAKE3), note `PT_DYNAMIC`, `DT_NEEDED`, interpreter, RPATH/RUNPATH.
+
+2. **Symbolization (best‑effort)**
+
+* **If DWARF present**: read `.debug_*` (function names, inlines, CU boundaries, ranges).
+* **If stripped**:
+
+  * Use disassembler to **discover functions** (prolog patterns, xref‑to‑targets, thunk detection).
+  * Derive **synthetic names**: `sub_<va>`, `plt_<name>` (from dynamic symbol table if available), `extern@libc.so.6:memcpy`.
+  * Lift exported dynsyms and PLT stubs even when local symbols are removed.
+  * Recover **string‑referenced names** (e.g., Go/Python/C++ RTTI/Itanium mangling where present).
+
+3. **Disassembly & IR**
+
+* Disassemble to basic blocks; lift to a neutral IR (SSA‑like) sufficient for:
+
+  * Call edges (direct `call`/`bl`).
+  * **Indirect calls** via the GOT, vtables, and function pointers (approximate with points‑to sets).
+  * Tailcalls, thunks, PLT interposition.
+
+4. **Call‑graph build**
+
+* Start from **entrypoints**:
+
+  * ELF entry (`_start`), constructors (`.init_array`), exported API (public symbols), `main` (if recoverable).
+  * Optional: **entry‑trace** (cmd‑line + env + loader path) from container image to seed realistic roots.
+* Build **CG** with:
+
+  * Direct edges: precise.
+  * Indirect edges: conservative, with **evidence tags** (GOT target set, vtable class set, signature match).
+* Record **inter‑module edges** to shared libs (soname + version) with relocation evidence.
+
+5. **Reachability scoring (deterministic)**
+
+* Input: list of vulnerable functions/paths (from CSAF/CVE KB) normalized to **function‑level identifiers** (soname!symbol or hash‑based if unnamed).
+* Compute **reachability** from roots → target:
+
+  * `REACHABLE_CONFIRMED` (path with only precise edges),
+  * `REACHABLE_POSSIBLE` (path contains conservative edges),
+  * `NOT_REACHABLE_FOUNDATION` (no path in current graph),
+  * Attach a **confidence** value to each result, derived from edge evidence + relocation proof.
+* Emit **proof trails** (the exact path: nodes, edges, evidence).
+
+6. **Neutral JSON intermediate (NJIF)**
+
+* Stored in cache; signed for deterministic replay.
+* Consumed by StellaOps.Policy/Lattice to merge with VEX.
+
+---
+
+# Neutral JSON Intermediate Format (NJIF)
+
+```json
+{
+  "artifact": {
+    "path": "/work/bin/app",
+    "hashes": {"sha256": "…", "blake3": "…"},
+    "arch": "x86_64",
+    "elf": {
+      "type": "ET_DYN",
+      "interpreter": "/lib64/ld-linux-x86-64.so.2",
+      "needed": ["libc.so.6", "libssl.so.3"],
+      "rpath": [],
+      "runpath": []
+    }
+  },
+  "symbols": {
+    "exported": [
+      {"id": "libc.so.6!memcpy", "kind": "dynsym", "addr": "0x0", "plt": true}
+    ],
+    "functions": [
+      {"id": "sub_401000", "addr": "0x401000", "size": 112, "name_hint": null, "from": "disasm"},
+      {"id": "main", "addr": "0x4023d0", "size": 348, "from": "dwarf|heuristic"}
+    ]
+  },
+  "cfg": [
+    {"func": "main", "blocks": [
+      {"b": "0x4023d0", "succ": ["0x402415"], "calls": [{"type": "direct", "target": "sub_401000"}]},
+      {"b": "0x402415", "succ": ["0x402440"], "calls": [{"type": "plt", "target": "libc.so.6!memcpy"}]}
+    ]}
+  ],
+  "cg": {
+    "nodes": [
+      {"id": "main", "evidence": ["dwarf|heuristic"]},
+      {"id": "sub_401000"},
+      {"id": "libc.so.6!memcpy", "external": true, "lib": "libc.so.6"}
+    ],
+    "edges": [
+      {"from": "main", "to": "sub_401000", "kind": "direct"},
+      {"from": "main", "to": "libc.so.6!memcpy", "kind": "plt", "evidence": ["reloc@GOT"]}
+    ],
+    "roots": ["_start", "init_array[]", "main"]
+  },
+  "reachability": [
+    {
+      "target": "libssl.so.3!SSL_free",
+      "status": "NOT_REACHABLE_FOUNDATION",
+      "path": []
+    },
+    {
+      "target": "libc.so.6!memcpy",
+      "status": "REACHABLE_CONFIRMED",
+      "path": ["main", "libc.so.6!memcpy"],
+      "confidence": 0.98,
+      "evidence": ["plt", "dynsym", "reloc"]
+    }
+  ],
+  "provenance": {
+    "toolchain": {
+      "disasm": "ghidra_headless|radare2|llvm-mca",
+      "version": "…"
+    },
+    "scan_manifest_hash": "…",
+    "timestamp_utc": "2025-11-16T00:00:00Z"
+  }
+}
+```
+
+---
+
+# Practical extractors (headless/CLI)
+
+* **DWARF**: `llvm-dwarfdump`/`eu-readelf` for quick CU/function ranges; fall back to the disassembler.
+* **Disassembly/CFG/CG** (choose one or more; wrap with a stable adapter):
+
+  * **Ghidra Headless API**: recover functions, basic blocks, references, PLT/GOT, vtables; export via a custom headless script to NJIF.
+  * **radare2 / rizin**: run `aaa` to analyze, then `aflj` (function list), `agj` (per-function graph), and `agCj` (global call graph) to export functions/graphs as JSON.
+  * **Binary Ninja headless** (if license permits) for cleaner IL and indirect‑call modeling.
+  * **angr** for path‑sensitive refinement on tricky indirect calls (optional, gated by budget).
+
+**Adapter principle:** All tools output a **small, consistent NJIF** so the scoring engine and lattice logic never depend on any single RE tool.
+
+---
+
+# Indirect call modeling (concise rules)
+
+* **PLT/GOT**: edge from caller → `soname!symbol` with evidence: `plt`, `reloc@GOT`.
+* **Function pointers**: if a store to a pointer is found and targets a known function set `{f1…fk}`, add edges with `kind: "indirect"`, `evidence: ["xref-store", "sig-compatible"]`.
+* **Virtual calls / vtables**: class‑method set from RTTI/vtable scans; mark edges `evidence: ["vtable-match"]`.
+* **Tailcalls**: treat as edges, not fallthrough.
+
+Each conservative step lowers **confidence**, but keeps determinism: the rules and their hashes are in the scan manifest.
+
+---
+
+# Deterministic scoring (plug into Stella’s lattice)
+
+* **Inputs**: NJIF, CVE→function mapping (`soname!symbol` or function hash), policy knobs.
+* **States**: `{NOT_OBSERVED < POSSIBLE < REACHABLE_CONFIRMED}` with **monotone** merge (never oscillates).
+* **Confidence**: product of edge evidences (configurable weights): `direct=1.0, plt=0.98, vtable=0.85, funcptr=0.7`.
+* **Output**: OpenVEX/CSAF annotations + human proof path; signed with DSSE to preserve replayability.
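+
+The confidence rule above can be sketched in a few lines of Python. This is only an illustration: `path_confidence` and `EDGE_WEIGHTS` are hypothetical names, and the weights are the example defaults quoted in the list, not a fixed part of any StellaOps API.
+
+```python
+from math import prod
+
+# Edge-evidence weights copied from the list above; treat them as configurable defaults.
+EDGE_WEIGHTS = {"direct": 1.0, "plt": 0.98, "vtable": 0.85, "funcptr": 0.7}
+
+
+def path_confidence(edge_kinds, weights=EDGE_WEIGHTS):
+    """Multiply the weight of every edge on a path; an empty path yields 1.0."""
+    # An unknown edge kind raises KeyError instead of silently guessing a weight.
+    return prod(weights[kind] for kind in edge_kinds)
+
+
+# main -> sub_401000 (direct) -> libc.so.6!memcpy (plt)
+print(path_confidence(["direct", "plt"]))  # 0.98
+```
+
+Because the product only shrinks as conservative edges are added, a longer path through function pointers always scores lower than a short direct one, which matches the intent of the evidence tags.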
+
+---
+
+# Minimal Ghidra headless skeleton (exporter idea)
+
+```bash
+analyzeHeadless /work/gh_proj MyProj -import app -scriptPath scripts \
+  -postScript ExportNjif.java /out/app.njif.json
+```
+
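+```java
+// ExportNjif.java (outline)
+import ghidra.app.script.GhidraScript;
+import ghidra.program.model.listing.Function;
+
+public class ExportNjif extends GhidraScript {
+  @Override
+  public void run() throws Exception {
+    // Output path is the first script argument passed after -postScript.
+    String outPath = getScriptArgs()[0];
+    // Iterate all functions in ascending address order for stable output.
+    for (Function fn : currentProgram.getFunctionManager().getFunctions(true)) {
+      // collect functions, blocks, calls, externs/PLT
+      // map non‑named functions to sub_<addr>
+      // detect PLT thunks → dynsym names
+    }
+    // write NJIF JSON deterministically (sorted keys, stable ordering) to outPath
+  }
+}
+```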
+
+---
+
+# Integration points in StellaOps
+
+* **Scanner.Analyzers.Binary.Elf**
+
+  * `ElfNormalizer` → hashes, dynamic deps.
+  * `Symbolizer` → DWARF reader + HeuristicDisasm (via tool adapter).
+  * `CgBuilder` → NJIF builder/merger (multi‑module).
+  * `ReachabilityEngine` → path search + confidence math.
+  * `Emitter` → NJIF cache + VEX/CSAF notes.
+
+* **Scheduler**: memoize by `(hashes, toolchain_version, ruleset_hash)` to ensure replayable results.
+
+* **Authority**: sign NJIF + scoring outputs; store manifests (feeds, rule weights, tool versions).
+
+---
+
+# Test fixtures (suggested)
+
+* Tiny ELF zoo: statically linked, PIE, stripped/non‑stripped, C++ with vtables, musl vs glibc.
+* Known CVE libs (e.g., `libssl`, `zlib`) with versioned symbols to validate soname!symbol mapping.
+* Synthetic binaries with function‑pointer tables to validate conservative edges.
+
+---
+
+If you want, I can generate:
+
+* A ready‑to‑run **Ghidra headless exporter** (Java) that writes NJIF exactly like above.
+* A small **.NET parser** that ingests NJIF and emits StellaOps reachability + OpenVEX notes.
+
+---
+
+Below is a full architecture plan for implementing **stripped-ELF binary reachability** (call graph + NJIF + deterministic scoring, with a hook for patch-oracles) inside **StellaOps**.
+
+I will assume .NET 10, the existing microservice split (Scanner.WebService, Scanner.Worker, Concelier, Excitior, Authority, Scheduler, Sbomer, Signals), and your standing rule: **all lattice logic runs in Scanner.WebService**.
+
+---
+
+## 1. Scope, Objectives, Non-Goals
+
+### 1.1 Objectives
+
+1. **Recover function-level call graphs from ELF binaries**, including **stripped** ones:
+
+* Support ET_EXEC / ET_DYN / PIE, static & dynamic linking.
+* Support at least **x86_64, aarch64** in v1, later armhf, riscv64.
+
+2. **Produce a neutral, deterministic JSON representation (NJIF)**:
+
+* Tool-agnostic: can be generated from Ghidra, radare2/rizin, Binary Ninja, angr, etc.
+* Stable identifiers and schema so downstream services don’t depend on a specific RE engine.
+
+3. **Compute function-level reachability for vulnerabilities**:
+
+* Given CVE → `soname!symbol` (and later function-hash) mappings from Concelier,
+* Decide `REACHABLE_CONFIRMED` / `REACHABLE_POSSIBLE` / `NOT_REACHABLE_FOUNDATION` with evidence and confidence.
+
+4. **Integrate with StellaOps lattice and VEX outputs**:
+
+* Lattice logic runs in **Scanner.WebService**.
+* Results flow into Excitior (VEX) and Sbomer (SBOM annotations), preserving provenance.
+
+5. **Enable deterministic replay**:
+
+* Every analysis run is tied to a **Scan Manifest**: tool versions, ruleset hashes, policy hashes, container image digests.
+
+### 1.2 Non-Goals (v1)
+
+* No dynamic runtime probes (EventPipe/JFR) in this phase.
+* No full decompilation; we only need enough IR for calls/edges.
+* No aggressive path-sensitive analysis (symbolic execution) in v1; that can be a v2 enhancement.
+
+---
+
+## 2. High-Level System Architecture
+
+### 2.1 Components
+
+* **Scanner.WebService (existing)**
+
+  * REST/gRPC API for scans.
+  * Orchestrates analysis jobs via Scheduler.
+  * Hosts **Lattice & Reachability Engine** for all artifact types.
+  * Reads NJIF results, merges with Concelier function mappings and policies.
+
+* **Scanner.Worker (existing, extended)**
+
+  * Executes **Binary Analyzer Pipelines**.
+  * Invokes RE tools (Ghidra, rizin, etc.) in dedicated containers.
+  * Produces NJIF and persists it.
+
+* **Binary Tools Containers (new)**
+
+  * `stellaops-tools-ghidra:<tag>`
+  * `stellaops-tools-rizin:<tag>`
+  * Optionally `stellaops-tools-angr` for advanced passes.
+  * Pinned versions, no network access (for determinism & air-gap).
+
+* **Storage & Metadata**
+
+  * **DB (PostgreSQL)**: scan records, NJIF metadata, reachability summaries.
+  * **Object store** (MinIO/S3/Filesystem): NJIF JSON blobs, tool logs.
+  * **Authority**: DSSE signatures for Scan Manifest, NJIF, and reachability outputs.
+
+* **Concelier**
+
+  * Provides **CVE → component → function symbol/hashes** resolution.
+  * Exposes “Link-Not-Merge” graph of advisory, component, and function nodes.
+
+* **Excitior (VEX)**
+
+  * Consumes Scanner.WebService reachability states.
+  * Emits OpenVEX/CSAF with properly justified statuses.
+
+* **UnknownsRegistry (future)**
+
+  * Receives unresolvable call edges / ambiguous functions from the analyzer,
+  * Feeds them into “adaptive security” workflows.
+
+### 2.2 End-to-End Flow (Binary / Image Scan)
+
+1. Client requests scan (binary or container image) via **Scanner.WebService**.
+2. WebService:
+
+   * Extracts binaries from OCI layers (if scanning image),
+   * Registers **Scan Manifest**,
+   * Submits a job to Scheduler (queue: `binary-elfflow`).
+3. Scanner.Worker dequeues the job:
+
+   * Detects ELF binaries,
+   * Runs **Binary Analyzer Pipeline** for each unique binary hash.
+4. Worker uses tools containers:
+
+   * Ghidra/rizin → CFG, function discovery, call graph,
+   * Converts to **NJIF**.
+5. Worker persists NJIF + metadata; marks analysis complete.
+6. Scanner.WebService picks up NJIF:
+
+   * Fetches advisory function mappings from Concelier,
+   * Runs **Reachability & Lattice scoring**,
+   * Updates scan results and triggers Excitior / Sbomer.
+
+All steps are deterministic given:
+
+* Input artifact,
+* Tool container digests,
+* Ruleset/policy versions.
+
+---
+
+## 3. Binary Analyzer Subsystem (Scanner.Worker)
+
+Introduce a dedicated module:
+
+* `StellaOps.Scanner.Analyzers.Binary.Elf`
+
+### 3.1 Internal Layers
+
+1. **ElfDetector**
+
+   * Inspects files in a scan:
+
+     * Magic `0x7f 'E' 'L' 'F'`,
+     * Confirms architecture via ELF header.
+   * Produces `BinaryArtifact` records with:
+
+     * `hashes` (SHA-256, BLAKE3),
+     * `path` in container,
+     * `arch`, `endianness`.
+
+2. **ElfNormalizer**
+
+   * Uses a lightweight library (e.g., ELFSharp) to extract:
+
+     * `ElfType` (ET_EXEC, ET_DYN),
+     * interpreter (`PT_INTERP`),
+     * `DT_NEEDED` list,
+     * RPATH/RUNPATH,
+     * presence/absence of DWARF sections.
+   * Emits a normalized `ElfMetadata` DTO.
+
+3. **Symbolization Layer**
+
+   * Sub-components:
+
+     * `DwarfSymbolReader`: if DWARF present, read CU, function ranges, names, inlines.
+     * `DynsymReader`: parse `.dynsym`, `.plt`, exported symbols.
+     * `HeuristicFunctionFinder`:
+
+       * For stripped binaries:
+
+         * Use disassembler xrefs, prolog patterns, return instructions, call-targets.
+         * Recognize PLT thunks → `soname!symbol`.
+   * Consolidates into `FunctionSymbol` entities:
+
+     * `id` (e.g., `main`, `sub_401000`, `libc.so.6!memcpy`),
+     * `addr`, `size`, `is_external`, `from` (`dwarf`, `dynsym`, `heuristic`).
+
+4. **Disassembly & IR Layer**
+
+   * Abstraction: `IDisassemblyAdapter`:
+
+     * `Task<DisasmResult> AnalyzeAsync(BinaryArtifact, ElfMetadata, ScanManifest)`
+   * Implementations:
+
+     * `GhidraDisassemblyAdapter`:
+
+       * Invokes headless Ghidra in container,
+       * Receives machine-readable JSON (script-produced),
+       * Extracts functions, basic blocks, calls, GOT/PLT info, vtables.
+     * `RizinDisassemblyAdapter` (backup/fallback).
+   * Produces:
+
+     * `BasicBlock` objects,
+     * `Instruction` metadata where needed for calls,
+     * `CallSite` records (direct, PLT, indirect).
+
+5. **Call-Graph Builder**
+
+   * Consumes `FunctionSymbol` + `CallSite` sets.
+   * Identifies **roots**:
+
+     * `_start`, `.init_array` entries,
+     * `main` (if present),
+     * Exported API functions for shared libs.
+   * Creates `CallGraph`:
+
+     * Nodes: functions (`FunctionNode`),
+     * Edges: `CallEdge` with:
+
+       * `kind`: `direct`, `plt`, `indirect-funcptr`, `indirect-vtable`, `tailcall`,
+       * `evidence`: tags like `["reloc@GOT", "sig-match", "vtable-class"]`.
+
+6. **Evidence & Confidence Annotator**
+
+   * For each edge, computes a **local confidence**:
+
+     * `direct`: 1.0
+     * `plt`: 0.98
+     * `indirect-funcptr`: 0.7
+     * `indirect-vtable`: 0.85
+   * For each path later, Scanner.WebService composes these.
+
+7. **NJIF Serializer**
+
+   * Transforms domain objects into **NJIF JSON**:
+
+     * Sorted keys, stable ordering for determinism.
+   * Writes:
+
+     * `artifact`, `elf`, `symbols`, `cfg`, `cg`, and partial `reachability: []` (filled by WebService).
+   * Stores in object store, returns location + hash to DB.
+
+8. **Unknowns Reporting**
+
+   * Any unresolved:
+
+     * Indirect call with empty target set,
+     * Function region not mapped to symbol,
+   * Logged as `UnknownEvidence` records and optionally published to **UnknownsRegistry** stream.
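+
+To make the first layer concrete, here is a rough Python sketch of what `ElfDetector` checks. The function name `detect_elf` and the returned dictionary shape are assumptions for illustration; the header offsets and constants come from the ELF specification. BLAKE3 is left out because it is not in the Python standard library.
+
+```python
+import hashlib
+import struct
+
+ELF_MAGIC = b"\x7fELF"
+# e_machine values from the ELF specification (EM_X86_64, EM_AARCH64, EM_ARM, EM_RISCV).
+MACHINES = {62: "x86_64", 183: "aarch64", 40: "arm", 243: "riscv"}
+# e_type values for executables and shared objects / PIE.
+TYPES = {2: "ET_EXEC", 3: "ET_DYN"}
+
+
+def detect_elf(data):
+    """Return a BinaryArtifact-like dict, or None if data is not an ELF image."""
+    if len(data) < 20 or data[:4] != ELF_MAGIC:
+        return None
+    ei_class, ei_data = data[4], data[5]  # EI_CLASS and EI_DATA bytes of e_ident
+    if ei_data not in (1, 2):
+        return None
+    order = "<" if ei_data == 1 else ">"
+    # e_type and e_machine are two 16-bit fields right after the 16-byte e_ident.
+    e_type, e_machine = struct.unpack_from(order + "HH", data, 16)
+    return {
+        "sha256": hashlib.sha256(data).hexdigest(),
+        "bits": {1: 32, 2: 64}.get(ei_class),
+        "endianness": "little" if ei_data == 1 else "big",
+        "type": TYPES.get(e_type, f"unknown({e_type})"),
+        "arch": MACHINES.get(e_machine, f"unknown({e_machine})"),
+    }
+```
+
+A real detector would read only the first bytes of each file and hash in a streaming fashion; the sketch takes the whole buffer to stay short.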
+
+---
+
+## 4. NJIF Data Model (Neutral JSON Intermediate Format)
+
+Define a stable schema with a top-level `njif_schema_version` field.
+
+### 4.1 Top-Level Shape
+
+```json
+{
+  "njif_schema_version": "1.0.0",
+  "artifact": { ... },
+  "symbols": { ... },
+  "cfg": [ ... ],
+  "cg": { ... },
+  "reachability": [ ... ],
+  "provenance": { ... }
+}
+```
+
+### 4.2 Key Sections
+
+1. `artifact`
+
+   * `path`, `hashes`, `arch`, `elf.type`, `interpreter`, `needed`, `rpath`, `runpath`.
+
+2. `symbols`
+
+   * `exported`: external/dynamic symbols, especially PLT:
+
+     * `id`, `kind`, `plt`, `lib`.
+   * `functions`:
+
+     * `id` (synthetic or real name),
+     * `addr`, `size`, `from` (source of naming info),
+     * `name_hint` (optional).
+
+3. `cfg`
+
+   * Per-function basic block CFG plus call sites:
+
+     * Blocks with `succ`, `calls` entries.
+   * Sufficient for future static checks, not full IR.
+
+4. `cg`
+
+   * `nodes`: function nodes with evidence tags.
+   * `edges`: call edges with:
+
+     * `from`, `to`, `kind`, `evidence`.
+   * `roots`: entrypoints for reachability algorithms.
+
+5. `reachability`
+
+   * Initially empty from Worker.
+   * Populated in Scanner.WebService as:
+
+```json
+{
+  "target": "libssl.so.3!SSL_free",
+  "status": "REACHABLE_CONFIRMED",
+  "path": ["_start", "main", "libssl.so.3!SSL_free"],
+  "confidence": 0.93,
+  "evidence": ["plt", "dynsym", "reloc"]
+}
+```
+
+6. `provenance`
+
+   * `toolchain`:
+
+     * `disasm`: `"ghidra_headless:10.4"`, etc.
+   * `scan_manifest_hash`,
+   * `timestamp_utc`.
+
+### 4.3 Persisting NJIF
+
+* Object store (versioned path):
+
+  * `njif/{sha256}/njif-v1.json`
+* DB table `binary_njif`:
+
+  * `binary_hash`, `njif_hash`, `schema_version`, `toolchain_digest`, `scan_manifest_id`.
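+
+One possible way to obtain a stable `njif_hash` is to hash a canonical serialization. The Python sketch below assumes sorted keys, compact separators, and UTF-8 as the canonical form; the helper names are hypothetical, and the object key simply follows the path pattern given above.
+
+```python
+import hashlib
+import json
+
+
+def serialize_njif(doc):
+    """Canonical bytes: sorted keys, no insignificant whitespace, UTF-8."""
+    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
+    return text.encode("utf-8")
+
+
+def njif_hash(doc):
+    """SHA-256 over the canonical bytes, suitable for the njif_hash column."""
+    return hashlib.sha256(serialize_njif(doc)).hexdigest()
+
+
+def object_key(binary_sha256):
+    """Versioned object-store path for one binary's NJIF blob."""
+    return f"njif/{binary_sha256}/njif-v1.json"
+
+
+a = {"njif_schema_version": "1.0.0", "cg": {"roots": ["_start", "main"]}}
+b = {"cg": {"roots": ["_start", "main"]}, "njif_schema_version": "1.0.0"}
+print(njif_hash(a) == njif_hash(b))  # True: key order does not affect the hash
+```
+
+Note that `sort_keys` only orders object keys. Arrays such as `cg.edges` keep whatever order the producer emitted, so the serializer still has to sort them itself before dumping.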
+
+---
+
+## 5. Reachability & Lattice Integration (Scanner.WebService)
+
+### 5.1 Inputs
+
+* **NJIF** for each binary (possibly multiple binaries per container).
+* Concelier’s **CVE → (component, function)** resolution:
+
+  * `component_id` → `soname!symbol` sets, and where available, function hashes.
+* Scanner’s existing **lattice policies**:
+
+  * States: e.g. `NOT_OBSERVED < POSSIBLE < REACHABLE_CONFIRMED`.
+  * Merge rules are monotone.
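+
+For a totally ordered set of states like the example above, a monotone merge reduces to taking the maximum. The following Python sketch assumes exactly those three example states; the names `State`, `join`, and `merge` are illustrative and not taken from the Scanner code base.
+
+```python
+from enum import IntEnum
+from functools import reduce
+
+
+class State(IntEnum):
+    NOT_OBSERVED = 0
+    POSSIBLE = 1
+    REACHABLE_CONFIRMED = 2
+
+
+def join(a, b):
+    """Least upper bound; on a totally ordered chain this is simply max."""
+    return max(a, b)
+
+
+def merge(signals):
+    """Fold any number of signals; the result never drops below an input."""
+    return reduce(join, signals, State.NOT_OBSERVED)
+
+
+print(merge([State.POSSIBLE, State.NOT_OBSERVED]).name)  # POSSIBLE
+```
+
+Since `max` is commutative, associative, and idempotent, the merged state does not depend on the order in which signals arrive, which is what keeps replays deterministic.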
+
+### 5.2 Reachability Engine
+
+New service module:
+
+* `StellaOps.Scanner.Domain.Reachability`
+
+  * `INjifRepository` (reads NJIF JSON),
+  * `IFunctionMappingResolver` (Concelier adapter),
+  * `IReachabilityCalculator`.
+
+Algorithm per target function:
+
+1. Resolve vulnerable function(s):
+
+   * From Concelier: `soname!symbol` and/or `func_hash`.
+   * Map to NJIF `symbols.exported` or `symbols.functions`.
+
+2. For each binary:
+
+   * Use `cg.roots` as entry set.
+   * BFS/DFS along edges until:
+
+     * Reaching target node(s),
+     * Or graph fully explored.
+
+3. For each successful path:
+
+   * Collect edges’ `confidence` weights, compute path confidence:
+
+     * e.g., product of edge confidences or a log/additive scheme.
+
+4. Aggregate result:
+
+   * If ≥ 1 path with only `direct/plt` edges:
+
+     * `status = REACHABLE_CONFIRMED`.
+   * Else if only paths with indirect edges:
+
+     * `status = REACHABLE_POSSIBLE`.
+   * Else:
+
+     * `status = NOT_REACHABLE_FOUNDATION`.
+
+5. Emit `reachability` entry back into NJIF (or as separate DB table) and into scan result graph.
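+
+The steps above can be sketched as two breadth-first searches over the NJIF `cg` section: one restricted to precise edges, one over all edges. This is a simplified Python illustration, not the engine itself; `find_path` and `classify` are hypothetical helpers, and it returns only the first path found instead of aggregating confidence over all paths.
+
+```python
+from collections import deque
+
+PRECISE = {"direct", "plt"}
+
+
+def find_path(edges, roots, target, allowed=None):
+    """BFS from the roots to target; returns the node path or None."""
+    adj = {}
+    for e in edges:
+        if allowed is None or e["kind"] in allowed:
+            adj.setdefault(e["from"], []).append(e["to"])
+    parent = {r: None for r in roots}
+    queue = deque(roots)
+    while queue:
+        node = queue.popleft()
+        if node == target:
+            path = []
+            while node is not None:
+                path.append(node)
+                node = parent[node]
+            return path[::-1]
+        for nxt in sorted(adj.get(node, [])):  # sorted for deterministic output
+            if nxt not in parent:
+                parent[nxt] = node
+                queue.append(nxt)
+    return None
+
+
+def classify(cg, target):
+    """Return (status, path) using the three statuses defined in this plan."""
+    path = find_path(cg["edges"], cg["roots"], target, PRECISE)
+    if path is not None:
+        return "REACHABLE_CONFIRMED", path
+    path = find_path(cg["edges"], cg["roots"], target)
+    if path is not None:
+        return "REACHABLE_POSSIBLE", path
+    return "NOT_REACHABLE_FOUNDATION", []
+
+
+cg = {
+    "roots": ["_start", "init_array[]", "main"],
+    "edges": [
+        {"from": "main", "to": "sub_401000", "kind": "direct"},
+        {"from": "main", "to": "libc.so.6!memcpy", "kind": "plt"},
+    ],
+}
+print(classify(cg, "libc.so.6!memcpy"))
+```
+
+Running the precise-only search first means a target is reported as confirmed whenever any fully precise path exists, even if shorter conservative paths also reach it.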
+
+### 5.3 Lattice & VEX
+
+* Lattice computation is done per `(CVE, component, binary)` triple:
+
+  * Input: reachability status + other signals.
+* Resulting state is:
+
+  * Exposed to **Excitior** as a set of **evidence-annotated VEX facts**.
+* Excitior translates:
+
+  * `NOT_REACHABLE_FOUNDATION` → likely `not_affected` with justification “code_not_reachable”.
+  * `REACHABLE_CONFIRMED` → `affected` or “present_and_exploitable” (depending on overall policy).
+
+---
+
+## 6. Patch-Oracle Extension (Advanced, but Architected Now)
+
+While not strictly required for v1, we should reserve architecture hooks.
+
+### 6.1 Concept
+
+* Given:
+
+  * A **vulnerable** library build (or binary),
+  * A **patched** build.
+* Run analyzers on both; produce NJIF for each.
+* Compare call graphs & function bodies (e.g., hash of normalized bytes):
+
+  * Identify **changed functions** and potentially changed code regions.
+* Concelier links those function IDs to specific CVEs (via vendor patch metadata).
+* These become authoritative “patched function sets” (the **patch oracle**).
+
+### 6.2 Integration Points
+
+Add a module:
+
+* `StellaOps.Scanner.Analysis.PatchOracle`
+
+  * Input: pair of artifact hashes (old, new) + NJIF.
+  * Output: list of `FunctionPatchRecord`:
+
+    * `function_id`, `binary_hash_old`, `binary_hash_new`, `change_kind` (`added`, `modified`, `deleted`).
+
+Concelier:
+
+* Ingests `FunctionPatchRecord` via internal API and updates advisory graph:
+
+  * CVE → function set derived from real patch.
+* Reachability Engine:
+
+  * Uses patch-derived function sets instead of or in addition to symbol mapping from vendor docs.
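+
+As a rough illustration of the comparison step, the Python sketch below diffs two maps of function ID to body hash and emits records shaped like `FunctionPatchRecord`. The per-function body hash is an assumption here (the NJIF examples in this document do not define such a field), and `diff_functions` is a hypothetical helper.
+
+```python
+def diff_functions(old_fns, new_fns, binary_hash_old, binary_hash_new):
+    """Compare {function_id: body_hash} maps and return FunctionPatchRecord-like dicts."""
+    records = []
+    # Sorted union of IDs keeps the output order stable across runs.
+    for function_id in sorted(old_fns.keys() | new_fns.keys()):
+        before = old_fns.get(function_id)
+        after = new_fns.get(function_id)
+        if before == after:
+            continue  # unchanged body, nothing to record
+        if before is None:
+            change_kind = "added"
+        elif after is None:
+            change_kind = "deleted"
+        else:
+            change_kind = "modified"
+        records.append({
+            "function_id": function_id,
+            "binary_hash_old": binary_hash_old,
+            "binary_hash_new": binary_hash_new,
+            "change_kind": change_kind,
+        })
+    return records
+```
+
+Matching by ID works for named symbols such as `soname!symbol`. For stripped binaries, synthetic `sub_<addr>` names shift between builds, so a real implementation would need to match functions by content or structure before diffing.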
+
+---
+
+## 7. Persistence, Determinism, Caching
+
+### 7.1 Scan Manifest
+
+For every scan job, create:
+
+* `scan_manifest`:
+
+  * Input artifact hashes,
+  * List of binaries,
+  * Tool container digests (Ghidra, rizin, etc.),
+  * Ruleset/policy/lattice hashes,
+  * Time, user, and config flags.
+
+Authority signs this manifest with DSSE.
+
+### 7.2 Binary Analysis Cache
+
+Key: `(binary_hash, arch, toolchain_digest, njif_schema_version)`.
+
+* If present:
+
+  * Skip re-running Ghidra/rizin; reuse NJIF.
+* If absent:
+
+  * Run analysis, then cache NJIF.
+
+This supports deterministic replay and avoids repeating the same analysis across scans and, if the tenancy model allows it, across customers.
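+
+The lookup logic can be sketched with an in-memory stand-in. The class below is hypothetical and only demonstrates the four-part key and the hit/miss behavior; a real cache would sit on top of the DB and object store described earlier.
+
+```python
+class AnalysisCache:
+    """In-memory stand-in for the binary analysis cache (illustrative only)."""
+
+    def __init__(self):
+        self._store = {}
+        self.hits = 0
+        self.misses = 0
+
+    @staticmethod
+    def key(binary_hash, arch, toolchain_digest, njif_schema_version):
+        # The four-part key from the text; a tuple keeps the parts unambiguous.
+        return (binary_hash, arch, toolchain_digest, njif_schema_version)
+
+    def get_or_analyze(self, key, analyze):
+        """Return the cached NJIF for key, calling analyze() only on a miss."""
+        if key in self._store:
+            self.hits += 1
+        else:
+            self.misses += 1
+            self._store[key] = analyze()
+        return self._store[key]
+```
+
+Including `toolchain_digest` and `njif_schema_version` in the key means that upgrading a tool container or the schema naturally invalidates old entries without any explicit purge step.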
+
+---
+
+## 8. APIs & Integration Contracts
+
+### 8.1 Scanner.WebService External API (REST)
+
+1. `POST /api/scans/images`
+
+   * Existing; extended to flag: `includeBinaryReachability: true`.
+2. `POST /api/scans/binaries`
+
+   * Upload a standalone ELF; returns `scan_id`.
+3. `GET /api/scans/{scanId}/reachability`
+
+   * Returns list of `(cve_id, component, binary_path, function_id, status, confidence, path)`.
+
+No path versioning; idempotent and additive (new fields appear, old ones remain valid).
+
+### 8.2 Internal APIs
+
+* **Worker ↔ Object Store**:
+
+  * `PUT /binary-njif/{sha256}/njif-v1.json`.
+
+* **WebService ↔ Worker (via Scheduler)**:
+
+  * Job payload includes:
+
+    * `scan_manifest_id`,
+    * `binary_hashes`,
+    * `analysis_profile` (`default`, `deep`).
+
+* **WebService ↔ Concelier**:
+
+  * `POST /internal/functions/resolve`:
+
+    * Input: `(cve_id, component_ids[])`,
+    * Output: `soname!symbol[]`, optional `func_hash[]`.
+
+* **WebService ↔ Excitior**:
+
+  * Existing VEX ingestion extended with **reachability evidence** fields.
+
+---
+
+## 9. Observability, Security, Resource Model
+
+### 9.1 Observability
+
+* **Metrics**:
+
+  * Analysis duration per binary,
+  * NJIF size,
+  * Cache hit ratio,
+  * Reachability evaluation time per CVE.
+
+* **Logs**:
+
+  * Ghidra/rizin container logs stored alongside NJIF,
+  * Unknowns logs for unresolved call targets.
+
+* **Tracing**:
+
+  * Each scan/analysis annotated with `scan_manifest_id` to allow end-to-end trace.
+
+### 9.2 Security
+
+* Tools containers:
+
+  * No outbound network.
+  * Limited to read-only artifact mount + write-only result mount.
+* Binary content:
+
+  * Treated as confidential; stored encrypted at rest if your global policy requires it.
+* DSSE:
+
+  * Authority signs:
+
+    * Scan Manifest,
+    * NJIF blob hash,
+    * Reachability summary.
+  * Enables “Proof-of-Integrity Graph” linkage later.
+
+### 9.3 Resource Model
+
+* ELF analysis can be heavy; design for:
+
+  * Separate **worker queue** and autoscaling group for binary analysis.
+  * Configurable max concurrency and per-job CPU/memory limits.
+* Deep analysis (indirect calls, vtables) can be toggled via `analysis_profile`.
+
+---
+
+## 10. Implementation Roadmap
+
+A pragmatic, staged plan:
+
+### Phase 0 – Foundations (1–2 sprints)
+
+* Create `StellaOps.Scanner.Analyzers.Binary.Elf` project.
+* Implement:
+
+  * `ElfDetector`, `ElfNormalizer`.
+  * DB tables: `binary_artifacts`, `binary_njif`.
+* Integrate with Scheduler and Worker pipeline.
+
+### Phase 1 – Non-stripped ELF + NJIF v1 (2–3 sprints)
+
+* Implement **DWARF + dynsym symbolization**.
+* Implement **GhidraDisassemblyAdapter** for x86_64.
+* Build **CallGraphBuilder** (direct + PLT calls).
+* Implement NJIF serializer v1; store in object store.
+* Basic reachability engine in WebService:
+
+  * Only direct and PLT edges,
+  * Only for DWARF-named functions.
+* Integrate with Concelier function mapping via `soname!symbol`.
+
+### Phase 2 – Stripped ELF Support (2–3 sprints)
+
+* Implement `HeuristicFunctionFinder` for function discovery in stripped binaries.
+* Extend Ghidra script to mark PLT/GOT, vtables, function pointers.
+* Call graph: add:
+
+  * `indirect-funcptr`, `indirect-vtable`, `tailcall` edges.
+* Evidence tagging and local confidence scoring.
+* Extend reachability engine to:
+
+  * Distinguish `REACHABLE_CONFIRMED` vs `REACHABLE_POSSIBLE`.
+
+### Phase 3 – Multi-Arch & Performance (2–3 sprints)
+
+* Add support for **aarch64** (Ghidra language, appropriate calling conventions).
+* Optimize:
+
+  * Binary analysis cache,
+  * Tool container lifecycle,
+  * Concurrent analysis.
+* Add Unknowns reporting and hookup to UnknownsRegistry (if already implemented).
+
+### Phase 4 – Patch-Oracle Pilot (2–3 sprints)
+
+* Implement PatchOracle module:
+
+  * Compare old/new NJIFs,
+  * Detect changed functions.
+* Integrate with Concelier’s advisory graph.
+* Start validating against curated CVE/patch datasets.
+
+### Phase 5 – Hardening & Documentation
+
+* Golden fixtures:
+
+  * Small ELF zoo (stripped/non-stripped, PIE, static, C++, vtables).
+  * Known vulnerable libs (e.g., OpenSSL, zlib) to confirm correct function mapping.
+* Add CLI/demo in `StellaOps.Scanner.Cli`:
+
+  * `stellaops scan-binary --file app --show-reachability`.
+* Customer-facing and internal docs:
+
+  * NJIF schema,
+  * API usage,
+  * Limitations and interpretation guidelines.
+
+---
+
+If you want, as a next step I can take this plan and:
+
+* Break it into **epics / tickets** (SCAN-BINARY-xxx) with clear DoD per phase, or
+* Draft the **Ghidra headless Java script** and the **.NET NJIF model classes** so your agents can plug them straight into the Scanner repo.