Here’s a crisp idea that could give Stella Ops a real moat: **binary‑level reachability**—linking CVEs directly to the exact functions and offsets inside compiled artifacts (ELF/PE/Mach‑O), not just to packages.

---

### Why this matters (quick background)

* **Package‑level flags are noisy.** Most scanners say “vuln in `libX v1.2`,” but that library might be present and never executed.
* **Language‑level call graphs help** (when you have source or rich metadata), but containers often ship only **stripped binaries**.
* **Binary reachability** answers: *Is the vulnerable function actually in this image? Is its code path reachable from the entrypoints we observed or can construct?*

---

### The missing layer: Symbolization

Build a **symbolization layer** that normalizes debug and symbol info across platforms:

* **Inputs**: DWARF (ELF/Mach‑O), PDB (PE/Windows), symtabs, exported symbols, `.eh_frame`, and (when stripped) heuristic signatures (e.g., function byte‑hashes, CFG fingerprints).
* **Outputs**: a source‑agnostic map: `{binary → sections → functions → (addresses, ranges, hashes, demangled names, inlined frames)}`.
* **Normalization**: Put everything into a common schema (e.g., `Stella.Symbolix.v1`) so higher layers don’t care if it came from DWARF or PDB.

---

### End‑to‑end reachability (binary‑first, source‑agnostic)

1. **Acquire & parse**

   * Detect format (ELF/PE/Mach‑O), parse headers, sections, symbol tables.
   * If debug info present: parse DWARF/PDB; else fall back to disassembly + function boundary recovery.
2. **Function catalog**

   * Assign stable IDs per function: `(imageHash, textSectionHash, startVA, size, fnHashXX)`.
   * Record x‑refs (calls/jumps), imports/exports, PLT/IAT edges.
3. **Entrypoint discovery**

   * Docker entry, process launch args, service scripts; infer likely mains (Go `main.main`, .NET hostfxr path, JVM launcher, etc.).
4. **Call‑graph build (binary CFG)**

   * Build inter/intra‑procedural graph (direct + resolved indirect via IAT/PLT). Keep “unknown‑target” edges for conservative safety.
5. **CVE→function linking**

   * Maintain a **signature bank** per CVE advisory: vulnerable function names, file paths, and—crucially—**byte‑sequence or basic‑block fingerprints** for patched vs vulnerable versions (works even when stripped).
6. **Reachability analysis**

   * Is the vulnerable function present? Is there a path from any entrypoint to it (under conservative assumptions)? Tag as `Present+Reachable`, `Present+Unreachable`, `Present+Uncertain`, or `Absent`.
7. **Runtime confirmation (optional, when users allow)**

   * Lightweight probes (eBPF on Linux, ETW on Windows, perf/JFR/EventPipe) capture function hits; cross‑check with the static result to upgrade confidence.
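
The stable function ID from step 2 can be sketched as a hash over the tuple. This is a minimal sketch, assuming the hashes arrive as hex strings; the `FunctionId` class, the `|` separator, the `fn:` prefix, and the choice of SHA-256 are illustrative assumptions rather than a settled format.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class FunctionId
{
    // Derives a stable ID from (imageHash, textSectionHash, startVA, size, fnHash).
    // Assumptions: hashes arrive as hex strings; the "|" separator, SHA-256,
    // and the "fn:" prefix are illustrative choices, not a fixed format.
    public static string Compute(
        string imageHash, string textSectionHash, ulong startVa, uint size, string fnHash)
    {
        // Fixed-width hex keeps the encoding unambiguous across platforms.
        string canonical = string.Join("|",
            imageHash, textSectionHash, startVa.ToString("x16"), size.ToString("x8"), fnHash);
        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return "fn:" + Convert.ToHexString(digest).ToLowerInvariant();
    }
}
```

Because every input is content-derived, the same function in the same image should yield the same ID on every scan, which is what the caching and replay ideas later in this plan rely on.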

---

### Minimal component plan (drop into Stella Ops)

* **Scanner.Symbolizer**
  Parsers: ELF/DWARF (libdw or pure‑managed reader), PE/PDB (DIA SDK or LLVM PDB), Mach‑O/dSYM.
  Output: `Symbolix.v1` blobs stored in OCI layer cache.
* **Scanner.CFG**
  Lifts functions to a normalized IR (capstone/iced‑x86 for decode) → builds CFG & call graph.
* **Advisory.FingerprintBank**
  Ingests CSAF/OpenVEX plus curated fingerprints (fn names, block hashes, patch diff markers). Versioned, signed, air‑gap‑syncable.
* **Reachability.Engine**
  Joins (`Symbolix` + `CFG` + `FingerprintBank`) → emits `ReachabilityEvidence` with lattice states for VEX.
* **VEXer.Adapter**
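  Emits **OpenVEX** statements with `status: affected/not_affected` and, for `not_affected`, `justification: vulnerable_code_not_present | vulnerable_code_not_in_execute_path | inline_mitigations_already_exist` (the labels the OpenVEX specification defines), attaching Evidence URIs.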
* **Console UX**
  “Why not affected?” panel showing entrypoint→…→function path (or absence), with byte‑hash proof.

---

### Data model sketch (concise)

* `ImageFunction { id, name?, startVA, size, fnHash, sectionHash, demangled?, provenance:{DWARF|PDB|Heuristic} }`
* `Edge { srcFnId, dstFnId, kind:{direct|plt|iat|indirect?} }`
* `CveSignature { cveId, fnName?, libHints[], blockFingerprints[], versionRanges }`
* `Evidence { cveId, imageId, functionMatches[], reachable: bool?, confidence:[low|med|high], method:[static|runtime|hybrid] }`
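
To show how `CveSignature.blockFingerprints` might be matched against a function, here is a minimal sketch. It assumes a fingerprint is the SHA-256 of a basic block's bytes and that the caller has already normalized the block (for example, masked relocation-dependent operands). The `BlockFingerprints` class and the scoring rule are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

public static class BlockFingerprints
{
    // Hashes one basic block's raw bytes.
    // Assumption: the caller has already normalized the block.
    public static string Hash(ReadOnlySpan<byte> blockBytes) =>
        Convert.ToHexString(SHA256.HashData(blockBytes)).ToLowerInvariant();

    // Fraction of the signature's fingerprints found among the function's block hashes.
    public static double MatchScore(
        IEnumerable<string> functionBlockHashes,
        IReadOnlyCollection<string> signatureFingerprints)
    {
        if (signatureFingerprints.Count == 0) return 0.0;
        var present = new HashSet<string>(functionBlockHashes, StringComparer.Ordinal);
        int hits = signatureFingerprints.Count(fp => present.Contains(fp));
        return (double)hits / signatureFingerprints.Count;
    }
}
```

A score near 1.0 against the vulnerable fingerprints and near 0.0 against the patched ones would suggest the vulnerable build. Where to set the threshold is a tuning decision this sketch leaves open.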

---

### Practical phases (8–10 weeks of focused work)

1. **P0**: ELF/DWARF symbolizer + basic function catalog; link a handful of CVEs via name‑only; emit OpenVEX `not_affected` with justification `vulnerable_code_not_present`.
2. **P1**: CFG builder (direct calls) + PLT/IAT resolution; simple reachability; first fingerprints for top 50 CVEs in glibc, openssl, curl, zlib.
3. **P2**: Stripped‑binary heuristics (block hashing) + Go/Rust name demangling; Windows PDB ingestion for PE.
4. **P3**: Runtime probes (opt‑in) + confidence upgrade logic; Console path explorer; evidence signing (DSSE).

---

### KPIs to prove the moat

* **Noise cut**: % reduction in “affected” flags after reachability (target 40–70% on typical containers).
* **Precision**: Ground‑truth validation vs PoC images (TP/FP/FN on presence & reachability).
* **Coverage**: % images where we can make a determination without source (goal: >80%).
* **Latency**: Added scan time per image (<15s typical with caches).
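
The first two KPIs reduce to simple ratios. A minimal sketch follows; the `ReachabilityKpis` class is a hypothetical helper, and returning 0 for an empty denominator is an assumption.

```csharp
public static class ReachabilityKpis
{
    // Share of "affected" flags removed by reachability analysis.
    public static double NoiseCut(int affectedBefore, int affectedAfter) =>
        affectedBefore == 0 ? 0.0 : (double)(affectedBefore - affectedAfter) / affectedBefore;

    // Precision and recall against ground-truth PoC images.
    public static double Precision(int truePos, int falsePos) =>
        truePos + falsePos == 0 ? 0.0 : (double)truePos / (truePos + falsePos);

    public static double Recall(int truePos, int falseNeg) =>
        truePos + falseNeg == 0 ? 0.0 : (double)truePos / (truePos + falseNeg);
}
```

For example, with made-up numbers, going from 200 flagged CVEs to 80 after reachability is a 60% noise cut, which sits inside the 40–70% target.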

---

### Risks & how to handle them

* **Stripped binaries** → mitigate with block‑hash fingerprints & library‑version heuristics.
* **Obfuscated/packed code** → mark `Uncertain`; allow user‑supplied hints; prefer runtime confirmation.
* **Advisory inconsistency** → keep our own curated CVE→function fingerprint bank; sign & version it.
* **Platform spread** → start Linux/ELF, then Windows/PDB, then Mach‑O.

---

### Why competitors struggle

Most tools stop at packages because binary CFG + fingerprint curation is hard and expensive. Shipping a **source‑agnostic reachability engine** tied to signed evidence in VEX would set Stella Ops apart—especially in offline/air‑gapped and sovereign contexts you already target.

If you want, I can draft:

* the `Symbolix.v1` protobuf,
* a tiny PoC (ELF→functions→match CVE with a block fingerprint),
* and the OpenVEX emission snippet your VEXer can produce.

---

Below is a detailed architecture plan for implementing reachability and call-graph analysis in Stella Ops, covering JavaScript, Python, PHP, and binaries, and integrating with your existing Scanner / Concelier / VEXer stack.

I will assume:

* .NET 10 for core services.
* Scanner is the place where all “trust algebra / lattice” runs (per your standing rule).
* Concelier and VEXer remain “preserve/prune” layers and do not run lattice logic.
* Output must be JSON-centric with PURLs and OpenVEX.

---

## 1. Scope & Objectives

### 1.1 Primary goals

1. From an OCI image, build:

   * A **library-level usage graph** (which libraries are used by which entrypoints).
   * A **function-level call graph** for JS / Python / PHP / binaries.
2. Map CVEs (from Concelier) to:

   * Concrete **components** (PURLs) in the SBOM.
   * Concrete **functions / entrypoints / code regions** inside those components.
3. Perform **reachability analysis** to classify each vulnerability as:

   * `present + reachable`
   * `present + not_reachable`
   * `function_not_present` (no vulnerable symbol)
   * `unknown` (dynamic features, unresolved calls)
4. Emit:

   * **Structured JSON** with PURLs and call-graph nodes/edges (“reachability evidence”).
   * **OpenVEX** documents with appropriate `status`/`justification`.

### 1.2 Non-goals (for now)

* Full dynamic analysis of the running container (eBPF, ptrace, etc.) – leave as Phase 3+ optional add-on.
* Perfect call graph precision for dynamic languages (aim for safe, conservative approximations).
* Automatic “fix recommendations” (handled by other Stella Ops agents later).

---

## 2. High-Level Architecture

### 2.1 Major components

Within Stella Ops:

* **Scanner.WebService**

  * User-facing API.
  * Orchestrates full scan (SBOM, CVEs, reachability).
  * Hosts the **Lattice/Policy engine** that merges evidence and produces decisions.
* **Scanner.Worker**

  * Runs per-image analysis jobs.
  * Invokes analyzers (JS, Python, PHP, Binary) inside its own container context.
* **Scanner.Reachability Core Library**

  * Unified IR for call graphs and reachability evidence.
  * Interfaces for language and binary analyzers.
  * Graph algorithms (BFS/DFS, lattice evaluation, entrypoint expansion).
* **Language Analyzers**

  * `Scanner.Analyzers.JavaScript`
  * `Scanner.Analyzers.Python`
  * `Scanner.Analyzers.Php`
  * `Scanner.Analyzers.Binary`
* **Symbolization & CFG (for binaries)**

  * `Scanner.Symbolization` (ELF, PE, Mach-O parsers, DWARF/PDB)
  * `Scanner.Cfg` (CFG + call graph for binaries)
* **Vulnerability Signature Bank**

  * `Concelier.Signatures` (curated CVE→function/library fingerprints).
  * Exposed to Scanner as **offline bundle**.
* **VEXer**

  * `Vexer.Adapter.Reachability` – transforms reachability evidence into OpenVEX.

### 2.2 Data flow (logical)

```mermaid
flowchart LR
  A[OCI Image / Tar] --> B[Scanner.Worker: Extract FS]
  B --> C["SBOM Engine (CycloneDX/SPDX)"]
  C --> D[Vuln Match (Concelier feeds)]
  B --> E1[JS Analyzer]
  B --> E2[Python Analyzer]
  B --> E3[PHP Analyzer]
  B --> E4[Binary Analyzer + Symbolizer/CFG]

  D --> F[Reachability Orchestrator]
  E1 --> F
  E2 --> F
  E3 --> F
  E4 --> F
  F --> G["Lattice/Policy Engine (Scanner.WebService)"]
  G --> H[Reachability Evidence JSON]
  G --> I[VEXer: OpenVEX]
  G --> J["Graph/Cartographer (optional)"]
```

---

## 3. Data Model & JSON Contracts

### 3.1 Core IR types (Scanner.Reachability)

Define in a central assembly, e.g. `StellaOps.Scanner.Reachability`:

```csharp
public record ComponentRef(
    string Purl,
    string? BomRef,
    string? Name,
    string? Version);

public enum SymbolKind { Function, Method, Constructor, Lambda, Import, Export }

public record SymbolId(
    string Language,       // "js", "python", "php", "binary"
    string ComponentPurl,  // SBOM component PURL or "" for app code
    string LogicalName,    // e.g., "server.js:handleLogin"
    string? FilePath,
    int? Line);

public record CallGraphNode(
    string Id,                 // stable id, e.g., hash(SymbolId)
    SymbolId Symbol,
    SymbolKind Kind,
    bool IsEntrypoint);

public enum CallEdgeKind { Direct, Indirect, Dynamic, External, Ffi }

public record CallGraphEdge(
    string FromNodeId,
    string ToNodeId,
    CallEdgeKind Kind);

public record CallGraph(
    string GraphId,
    IReadOnlyList<CallGraphNode> Nodes,
    IReadOnlyList<CallGraphEdge> Edges);
```

### 3.2 Vulnerability mapping

```csharp
public record VulnerabilitySignature(
    string Source,             // "csaf", "nvd", "vendor"
    string Id,                 // "CVE-2023-12345"
    IReadOnlyList<string> Purls,
    IReadOnlyList<string> TargetSymbolPatterns, // glob-like or regex
    IReadOnlyList<string>? FilePathPatterns,
    IReadOnlyList<string>? BlockFingerprints    // for binaries, optional
);
```
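
`TargetSymbolPatterns` is described as "glob-like or regex". As a minimal sketch of the glob case, the hypothetical `SymbolPatternMatcher` below translates a glob into an anchored regular expression and tests a `SymbolId.LogicalName` against it. Treating `*` as "any run of characters" (including `/`) is an assumption.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SymbolPatternMatcher
{
    // Converts a glob ("*" = any run of characters, "?" = one character)
    // into an anchored regex. All other characters match literally.
    public static Regex FromGlob(string glob)
    {
        string pattern = "^" + Regex.Escape(glob).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";
        return new Regex(pattern, RegexOptions.CultureInvariant);
    }

    // True when the logical name matches at least one glob.
    public static bool MatchesAny(string logicalName, IEnumerable<string> globs) =>
        globs.Any(g => FromGlob(g).IsMatch(logicalName));
}
```

In a real analyzer the compiled `Regex` instances would likely be cached per signature instead of rebuilt for every node.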

### 3.3 Reachability evidence

```csharp
public enum ReachabilityStatus
{
    PresentReachable,
    PresentNotReachable,
    FunctionNotPresent,
    Unknown
}

public record ReachabilityEvidence
(
    string ImageRef,
    string VulnId,               // CVE or advisory id
    ComponentRef Component,
    ReachabilityStatus Status,
    double Confidence,           // 0..1
    string Method,               // "static-callgraph", "binary-fingerprint", etc.
    IReadOnlyList<string> EntrypointNodeIds,
    IReadOnlyList<IReadOnlyList<string>>? ExamplePaths // optional list of node-paths
);
```

### 3.4 JSON structure (external)

Minimal external JSON (what you store / expose). The `nodes` and `edges` arrays hold serialized `CallGraphNode` and `CallGraphEdge` objects and are left empty here:

```json
{
  "image": "registry.example.com/app:1.2.3",
  "components": [
    {
      "purl": "pkg:npm/express@4.18.0",
      "bomRef": "component-1"
    }
  ],
  "callGraphs": [
    {
      "graphId": "js-main",
      "language": "js",
      "nodes": [],
      "edges": []
    }
  ],
  "reachability": [
    {
      "vulnId": "CVE-2023-12345",
      "componentPurl": "pkg:npm/express@4.18.0",
      "status": "PresentReachable",
      "confidence": 0.92,
      "entrypoints": [ "node:..." ],
      "paths": [
        ["node:entry", "node:routeHandler", "node:vulnFn"]
      ]
    }
  ]
}
```
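
One way to produce the camelCase keys and string enum values shown above is `System.Text.Json` with a naming policy and `JsonStringEnumConverter`. This is a minimal sketch: `ReachabilityEntry` is a hypothetical flattened record mirroring one element of the `reachability` array, not the internal `ReachabilityEvidence` type.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

public enum ReachabilityStatus { PresentReachable, PresentNotReachable, FunctionNotPresent, Unknown }

// Hypothetical external shape for one "reachability" array element.
public record ReachabilityEntry(
    string VulnId, string ComponentPurl, ReachabilityStatus Status, double Confidence);

public static class EvidenceJson
{
    private static readonly JsonSerializerOptions Options = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,   // VulnId -> "vulnId"
        Converters = { new JsonStringEnumConverter() }       // enum -> "PresentReachable"
    };

    public static string Serialize(ReachabilityEntry entry) =>
        JsonSerializer.Serialize(entry, Options);
}
```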

---

## 4. Scanner-Side Architecture

### 4.1 Project layout (suggested)

```text
src/
  Scanner/
    StellaOps.Scanner.WebService/
    StellaOps.Scanner.Worker/
    StellaOps.Scanner.Core/        # shared scan domain
    StellaOps.Scanner.Reachability/
    StellaOps.Scanner.Symbolization/
    StellaOps.Scanner.Cfg/
    StellaOps.Scanner.Analyzers.JavaScript/
    StellaOps.Scanner.Analyzers.Python/
    StellaOps.Scanner.Analyzers.Php/
    StellaOps.Scanner.Analyzers.Binary/
```

### 4.2 API surface (Scanner.WebService)

* `POST /api/scan/image`

  * Request: `{ "imageRef": "...", "profile": { "reachability": true, ... } }`
  * Returns: scan id.
* `GET /api/scan/{id}/reachability`

  * Returns: `ReachabilityEvidence[]`, plus call graph summary (optional).
* `GET /api/scan/{id}/vex`

  * Returns: OpenVEX with statuses based on reachability lattice.

### 4.3 Worker orchestration

`StellaOps.Scanner.Worker`:

1. Receives scan job with `imageRef`.

2. Extracts filesystem (layered rootfs) under `/mnt/scans/{scanId}/rootfs`.

3. Invokes SBOM generator (CycloneDX/SPDX).

4. Invokes Concelier via offline feeds to get:

   * Component vulnerabilities (CVE list per PURL).
   * Vulnerability signatures (fingerprints).

5. Builds a `ReachabilityPlan`:

   ```csharp
   public record ReachabilityPlan(
       IReadOnlyList<ComponentRef> Components,
       IReadOnlyList<VulnerabilitySignature> Vulns,
       IReadOnlyList<AnalyzerTarget> AnalyzerTargets // files/dirs grouped by language
   );
   ```

6. For each language target, dispatch analyzer:

   * JavaScript: `IReachabilityAnalyzer` implementation for JS.
   * Python: likewise.
   * PHP: likewise.
   * Binary: symbolizer + CFG.

7. Collects call graphs from each analyzer and merges them into a single IR (or separate per-language graphs with shared IDs).

8. Sends merged graphs + vuln list to **Reachability Engine** (Scanner.Reachability).
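
`ReachabilityPlan` references `AnalyzerTarget` without defining it. As a sketch of step 5, the code below assumes a target is simply a language tag plus a sorted list of paths, and it groups script files by extension. The record shape and the `TargetPlanner` helper are hypothetical. Binaries are left out because they need magic-byte detection, not extensions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical shape: the plan above does not define AnalyzerTarget.
public record AnalyzerTarget(string Language, IReadOnlyList<string> Paths);

public static class TargetPlanner
{
    private static readonly Dictionary<string, string> LanguageByExtension =
        new(StringComparer.OrdinalIgnoreCase)
        {
            [".js"] = "js", [".mjs"] = "js", [".cjs"] = "js",
            [".py"] = "python",
            [".php"] = "php",
        };

    // Groups files by language; ordinal sorting keeps the plan deterministic.
    public static IReadOnlyList<AnalyzerTarget> Group(IEnumerable<string> relativePaths) =>
        relativePaths
            .Select(p => (File: p, Ext: Path.GetExtension(p) ?? ""))
            .Where(x => LanguageByExtension.ContainsKey(x.Ext))
            .GroupBy(x => LanguageByExtension[x.Ext], x => x.File)
            .OrderBy(g => g.Key, StringComparer.Ordinal)
            .Select(g => new AnalyzerTarget(
                g.Key, g.OrderBy(f => f, StringComparer.Ordinal).ToList()))
            .ToList();
}
```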

---

## 5. Language Analyzers (JS / Python / PHP)

All analyzers implement a common interface:

```csharp
public interface IReachabilityAnalyzer
{
    string Language { get; } // "js", "python", "php"

    Task<CallGraph> AnalyzeAsync(AnalyzerContext context, CancellationToken ct);
}

public record AnalyzerContext(
    string RootFsPath,
    IReadOnlyList<ComponentRef> Components,
    IReadOnlyList<VulnerabilitySignature> Vulnerabilities,
    IReadOnlyDictionary<string, string> Env,   // container env, entrypoint, etc.
    string? EntrypointCommand                  // container CMD/ENTRYPOINT
);
```

### 5.1 JavaScript (Node.js focus)

**Inputs:**

* `/app` tree inside container (or discovered via SBOM).
* `package.json` files.
* Container entrypoint (e.g., `["node", "server.js"]`).

**Core steps:**

1. Identify **app root**:

   * Heuristics: directory containing `package.json` that owns the entry script.
2. Parse:

   * All `.js`, `.mjs`, `.cjs` in app and `node_modules` for vulnerable PURLs.
   * Use a parsing frontend (e.g., Tree-sitter via .NET binding, or Node+AST-as-JSON).
3. Build module graph:

   * `require`, `import`, `export`.
4. Function-level graph:

   * For each function/method, create `CallGraphNode`.
   * For each `callExpression`, create `CallGraphEdge` (try to resolve callee).
5. Entrypoints:

   * Main script in CMD/ENTRYPOINT.
   * HTTP route handlers (for express/koa) detected by patterns (e.g., `app.get("/...")`).
6. Map vulnerable symbols:

   * From `VulnerabilitySignature.TargetSymbolPatterns` (e.g., `express/lib/router/layer.js:handle_request`).
   * Identify nodes whose `SymbolId` matches patterns.

**Output:**

* `CallGraph` for JS with:

  * `IsEntrypoint = true` for main and detected handlers.
  * Node attributes include file path, line, component PURL.
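
The "main script in CMD/ENTRYPOINT" rule can be sketched as follows. The hypothetical `NodeEntrypoint` helper assumes the command is already split into arguments (exec form). It deliberately ignores flags that take a separate value, such as `--require foo`, which a fuller implementation would need to handle.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class NodeEntrypoint
{
    // Returns the first non-flag argument after the "node" executable,
    // or null when the command does not launch node directly.
    public static string? FindMainScript(IReadOnlyList<string> command)
    {
        int i = 0;
        while (i < command.Count && Path.GetFileName(command[i]) != "node") i++;

        for (i++; i < command.Count; i++)
        {
            if (!command[i].StartsWith("-", StringComparison.Ordinal))
                return command[i];
        }
        return null;
    }
}
```

For `["node", "server.js"]` this returns `server.js`. A command such as `["npm", "start"]` returns `null`, signalling that the analyzer has to fall back to `package.json` scripts.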

### 5.2 Python

**Inputs:**

* Site-packages paths from SBOM.
* Entrypoint script (CMD/ENTRYPOINT).
* Framework heuristics (Django, Flask) from environment variables or common entrypoints.

**Core steps:**

1. Discover the Python interpreter chain. This is not required for purely static analysis, but it helps the heuristics.
2. Parse `.py` files of:

   * App code.
   * Vulnerable packages (per PURL).
3. Build module import graph (`import`, `from x import y`).
4. Function-level graph:

   * Nodes for functions, methods, class constructors.
   * Edges for call expressions; conservative for dynamic calls.
5. Entrypoints:

   * Main script.
   * WSGI callable (e.g., `application` in `wsgi.py`).
   * Django URLconf -> view functions.
6. Map vulnerable symbols using `TargetSymbolPatterns` like `django.middleware.security.SecurityMiddleware.__call__`.

### 5.3 PHP

**Inputs:**

* Web root (from container image or conventional paths `/var/www/html`, `/app/public`, etc.).
* Composer metadata (`composer.json`, `vendor/`).
* Web server config if present (optional).

**Core steps:**

1. Discover front controllers (e.g., `index.php`, `public/index.php`).
2. Parse PHP files (again, via Tree-sitter or any suitable parser).
3. Resolve include/require chains to build file-level inclusion graph.
4. Build function/method graph:

   * Functions, methods, class constructors.
   * Calls with best-effort resolution for namespaced functions.
5. Entrypoints:

   * Front controllers and router entrypoints (e.g., Symfony, Laravel detection).
6. Map vulnerable symbols (e.g., functions in certain vendor packages, particular methods).

---

## 6. Binary Analyzer & Symbolizer

Project: `StellaOps.Scanner.Analyzers.Binary` + `Symbolization` + `Cfg`.

### 6.1 Inputs

* All binaries and shared libraries in:

  * `/usr/lib`, `/lib`, `/app/bin`, etc.
* SBOM link: each binary mapped to its component PURL when possible.
* Vulnerability signatures for native libs: function names, symbol names, fingerprints.

### 6.2 Symbolization

Module: `StellaOps.Scanner.Symbolization`

* Detect format: ELF, PE, Mach-O.
* For ELF/Mach-O:

  * Parse symbol tables (ELF `.symtab` / `.dynsym`; Mach-O `LC_SYMTAB` / `LC_DYSYMTAB`).
  * Parse DWARF (if present) to map functions to source files/lines.
* For PE:

  * Parse PDB (if present) or export table.
* For stripped binaries:

  * Run function boundary recovery (linear sweep + heuristic).
  * Compute block/fn-level hashes for fingerprinting.

Output:

```csharp
public record ImageFunction(
    string ImageId,      // e.g., SHA256 of file
    ulong StartVa,
    uint Size,
    string? SymbolName,  // demangled if possible
    string FnHash,       // stable hash of bytes / CFG
    string? SourceFile,
    int? SourceLine);
```
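
The first symbolization step, format detection, can be sketched from the well-known magic numbers. The `BinaryFormatDetector` name is hypothetical, and the `MZ` check is only a first pass: a full PE check would follow `e_lfanew` to the `PE\0\0` signature.

```csharp
using System;
using System.Buffers.Binary;

public enum BinaryFormat { Unknown, Elf, Pe, MachO }

public static class BinaryFormatDetector
{
    // Classifies a file from its first four bytes.
    public static BinaryFormat Detect(ReadOnlySpan<byte> header)
    {
        if (header.Length < 4) return BinaryFormat.Unknown;

        if (header[0] == 0x7F && header[1] == (byte)'E' &&
            header[2] == (byte)'L' && header[3] == (byte)'F')
            return BinaryFormat.Elf;

        // DOS stub signature only; does not confirm the PE header.
        if (header[0] == (byte)'M' && header[1] == (byte)'Z')
            return BinaryFormat.Pe;

        uint magic = BinaryPrimitives.ReadUInt32BigEndian(header);
        return magic switch
        {
            0xFEEDFACE or 0xFEEDFACF        // Mach-O 32/64-bit, big-endian on disk
            or 0xCEFAEDFE or 0xCFFAEDFE     // Mach-O 32/64-bit, little-endian on disk
            or 0xCAFEBABE                   // fat/universal (shared with Java class files)
                => BinaryFormat.MachO,
            _ => BinaryFormat.Unknown
        };
    }
}
```

Because `0xCAFEBABE` is also the Java class-file magic, a production detector would need a second check before treating such a file as a universal binary.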

### 6.3 CFG + Call graph

Module: `StellaOps.Scanner.Cfg`

* Disassemble `.text` using Capstone/Iced.x86.
* Build basic blocks and CFG.
* Identify:

  * Direct calls (resolved).
  * PLT/IAT indirections to shared libraries.
* Build `CallGraph` for binary functions:

  * Entrypoints: `main`, exported functions, Go `main.main`, etc.
  * Map application functions to library functions via PLT/IAT edges.

### 6.4 Linking vulnerabilities

* For each vulnerability affecting a native library (e.g., OpenSSL):

  * Map to candidate binaries via SBOM + PURL.
  * Within library image, find `ImageFunction`s matching:

    * `SymbolName` patterns.
    * `FnHash` / `BlockFingerprints` (for precise detection).
* Determine reachability:

  * Starting from application entrypoints, traverse call graph to see if calls to vulnerable library function occur.

---

## 7. Reachability Engine & Lattice (Scanner.WebService)

Project: `StellaOps.Scanner.Reachability`

### 7.1 Inputs to engine

* Combined `CallGraph[]` (per language + binary).
* Vulnerability list (CVE, GHSA, etc.) with affected PURLs.
* Vulnerability signatures.
* Entrypoint hints:

  * Container CMD/ENTRYPOINT.
  * Detected HTTP handlers, WSGI/ASGI entrypoints, etc.

### 7.2 Algorithm steps

1. **Entrypoint expansion**

   * Identify all `CallGraphNode` with `IsEntrypoint=true`.
   * Add language-specific “framework entrypoints” (e.g., Express route dispatch, Django URL dispatch) when detected.

2. **Graph traversal**

   * For each entrypoint node:

     * BFS/DFS through edges.
     * Maintain `reachable` bit on each node.
   * For dynamic edges:

     * Conservative: if target cannot be resolved, mark affected path as partially unknown and downgrade confidence.

3. **Vuln symbol resolution**

   * For each vulnerability:

     * For each vulnerable component PURL found in SBOM:

       * Find candidate nodes whose `SymbolId` matches `TargetSymbolPatterns` / binary fingerprints.
   * If none found:

     * `FunctionNotPresent`, reported with low confidence when the component's version range says vulnerable but no matching symbol is found.
   * If found:

     * Check `reachable` bit:

       * If reachable by at least one entrypoint, `PresentReachable`.
       * Else, `PresentNotReachable`.

4. **Confidence computation**

   * Start from:

     * `1.0` for direct match with explicit function name & static call.
     * Lower for:

       * Heuristic framework entrypoints.
       * Dynamic calls.
       * Fingerprint-only matches on stripped binaries.
   * Example rule-of-thumb:

     * direct static path only: 0.95–1.0.
     * dynamic edges but symbol found: 0.7–0.9.
     * symbol not found but version says vulnerable: 0.4–0.6.

5. **Lattice merge**

   * Represent each CVE+component pair as a lattice element with states: `{affected, not_affected, unknown}`.
   * Reachability engine produces a **local state**:

     * `PresentReachable` → candidate `affected`.
     * `PresentNotReachable` or `FunctionNotPresent` → candidate `not_affected`.
     * `Unknown` → `unknown`.
   * Merge with:

     * Upstream vendor VEX (from Concelier).
     * Policy overrides (e.g., “treat certain CVEs as affected unless vendor says otherwise”).
   * Final state computed here (Scanner.WebService), not in Concelier or VEXer.

6. **Evidence output**

   * For each vulnerability:

     * Emit `ReachabilityEvidence` with:

       * Status.
       * Confidence.
       * Method.
       * Example entrypoint paths (for UX and audit).
   * Persist this evidence alongside regular scan results.
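
Steps 2 and 3 can be sketched as a breadth-first traversal followed by a lookup. This is a simplified sketch over plain node-ID strings: it ignores edge kinds, so the `Unknown` outcome and the confidence downgrades for dynamic edges are not modeled. The `ReachabilitySketch` class is hypothetical.

```csharp
using System.Collections.Generic;

public enum ReachabilityStatus { PresentReachable, PresentNotReachable, FunctionNotPresent, Unknown }

public static class ReachabilitySketch
{
    // Breadth-first traversal from all entrypoints over (src, dst) edges.
    public static HashSet<string> Reachable(
        IEnumerable<string> entrypoints, IEnumerable<(string Src, string Dst)> edges)
    {
        var adjacency = new Dictionary<string, List<string>>();
        foreach (var (src, dst) in edges)
        {
            if (!adjacency.TryGetValue(src, out var targets))
                adjacency[src] = targets = new List<string>();
            targets.Add(dst);
        }

        var seen = new HashSet<string>(entrypoints);
        var queue = new Queue<string>(seen);
        while (queue.Count > 0)
        {
            if (!adjacency.TryGetValue(queue.Dequeue(), out var next)) continue;
            foreach (var node in next)
                if (seen.Add(node)) queue.Enqueue(node);   // visit each node once
        }
        return seen;
    }

    // Maps "matching nodes found?" and "any of them reachable?" to a status.
    public static ReachabilityStatus Classify(
        IReadOnlyCollection<string> vulnerableNodeIds, ISet<string> reachable)
    {
        if (vulnerableNodeIds.Count == 0) return ReachabilityStatus.FunctionNotPresent;
        foreach (var id in vulnerableNodeIds)
            if (reachable.Contains(id)) return ReachabilityStatus.PresentReachable;
        return ReachabilityStatus.PresentNotReachable;
    }
}
```

Computing the reachable set once per image and reusing it for every vulnerability keeps the cost at one traversal plus a set lookup per candidate node.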

---

## 8. Integration with SBOM & VEX

### 8.1 SBOM annotation

* Extend SBOM documents (CycloneDX / SPDX) with extra properties:

  * CycloneDX:

    * `component.properties`:

      * `stellaops:reachability:status` = `present_reachable|present_not_reachable|function_not_present|unknown`
      * `stellaops:reachability:confidence` = `0.0-1.0`
  * SPDX:

    * `Annotation` or `ExternalRef` with similar metadata.

### 8.2 OpenVEX generation

Module: `StellaOps.Vexer.Adapter.Reachability`

* For each `(vuln, component)` pair:

  * Map to VEX statement:

    * If `PresentReachable`:

      * `status: affected`
      * No `justification` (OpenVEX defines justifications only for `not_affected`); supply an `action_statement` instead.
    * If `PresentNotReachable`:

      * `status: not_affected`
      * `justification: vulnerable_code_not_in_execute_path`
    * If `FunctionNotPresent`:

      * `status: not_affected`
      * `justification: vulnerable_code_not_present` (or `component_not_present` when the whole component is absent)
    * If `Unknown`:

      * `status: under_investigation` (configurable).

* Attach evidence via:

  * `impact_statement` / `status_notes` fields (link to internal evidence JSON or audit link).

* VEXer does not recalculate reachability; it uses the already computed decision + evidence.
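
The status mapping can be sketched as a single switch. The sketch uses the status and justification labels defined by the OpenVEX specification. The `VexMapping` helper is hypothetical, and treating `Unknown` as `under_investigation` is the configurable default described above.

```csharp
public enum ReachabilityStatus { PresentReachable, PresentNotReachable, FunctionNotPresent, Unknown }

public static class VexMapping
{
    // Returns the OpenVEX status and, for not_affected only, a justification.
    public static (string Status, string? Justification) ToOpenVex(ReachabilityStatus status) =>
        status switch
        {
            ReachabilityStatus.PresentReachable =>
                ("affected", null),
            ReachabilityStatus.PresentNotReachable =>
                ("not_affected", "vulnerable_code_not_in_execute_path"),
            ReachabilityStatus.FunctionNotPresent =>
                ("not_affected", "vulnerable_code_not_present"),
            _ =>
                ("under_investigation", null),
        };
}
```

Keeping this a pure function of the already-computed status matches the rule that VEXer transforms decisions and never re-derives them.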

---

## 9. Executable Containers & Offline Operation

### 9.1 Executable containers

* Analyzers run inside a dedicated Scanner worker container that has:

  * .NET 10 runtime.
  * Language runtimes if needed for parsing (Node, Python, PHP), or Tree-sitter-based parsing.
* Target image filesystem is mounted read-only under `/mnt/scans/{scanId}/rootfs`.
* No network access (offline/air-gap).
* This satisfies “we will use executable containers” while keeping separation between:

  * Target image (mount only).
  * Analyzer container (StellaOps code).

### 9.2 Offline signature bundles

* Concelier periodically exports:

  * Vulnerability database (CSAF/NVD).
  * Vulnerability Signature Bank.
* Bundles are:

  * DSSE-signed.
  * Versioned (e.g., `signatures-2025-11-01.tar.zst`).
* Scanner uses:

  * The bundle digest as part of the **Scan Manifest** for deterministic replay.

---

## 10. Determinism & Caching

### 10.1 Layer-level caching

* Key: `layerDigest + analyzerVersion + signatureBundleVersion`.
* Cache artifacts:

  * CallGraph(s) per layer (for JS/Python/PHP code present in that layer).
  * Symbolization results per binary file hash.
* For images sharing layers:

  * Merge cached graphs instead of re-analyzing.

### 10.2 Deterministic scan manifest

For each scan, produce a manifest like the following, where `callGraphDigest` is the hash of the call graphs' canonical JSON form:

```json
{
  "imageRef": "registry/app:1.2.3",
  "imageDigest": "sha256:...",
  "scannerVersion": "1.4.0",
  "analyzerVersions": {
    "js": "1.0.0",
    "python": "1.0.0",
    "php": "1.0.0",
    "binary": "1.0.0"
  },
  "signatureBundleDigest": "sha256:...",
  "callGraphDigest": "sha256:...",
  "reachabilityEvidenceDigest": "sha256:..."
}
```

This manifest can be signed (Authority module) and used for audits and replay.
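
A digest over canonical JSON needs a serialization that does not depend on key order or whitespace. The sketch below, assuming .NET 8 or later for `JsonNode.DeepClone`, sorts object keys ordinally and hashes the compact form. It is a simplification and not a full RFC 8785 (JCS) implementation: number formatting, for instance, is left as parsed. The `CanonicalJson` class is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json.Nodes;

public static class CanonicalJson
{
    // Rebuilds the tree with object keys in ordinal order; arrays keep their order.
    private static JsonNode? Sort(JsonNode? node) => node switch
    {
        JsonObject obj => new JsonObject(
            obj.OrderBy(kv => kv.Key, StringComparer.Ordinal)
               .Select(kv => KeyValuePair.Create(kv.Key, Sort(kv.Value)))),
        JsonArray arr => new JsonArray(arr.Select(n => Sort(n)).ToArray()),
        null => null,
        _ => node.DeepClone()
    };

    // SHA-256 over the compact, key-sorted serialization.
    public static string Digest(string json)
    {
        string canonical = Sort(JsonNode.Parse(json))?.ToJsonString() ?? "null";
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return "sha256:" + Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

With this approach, two manifests that differ only in key order or whitespace produce the same digest, while reordering an array changes it.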

---

## 11. Implementation Roadmap (Phased)

### Phase 0 – Infrastructure & Binary presence

**Duration:** 1 sprint

* Set up `Scanner.Reachability` core types and interfaces.
* Implement:

  * Basic Symbolizer for ELF + DWARF.
  * Binary function catalog without CFG.
* Link a small set of CVEs to binary function presence via `SymbolName`.
* Expose minimal evidence:

  * `FunctionNotPresent` when the vulnerable symbol is absent; otherwise a conservative `PresentReachable`, because no call graph exists yet.
* Integrate with VEXer to emit `not_affected` statements with the `vulnerable_code_not_present` justification.

**Success criteria:**

* For selected demo images with known vulnerable/patched OpenSSL, scanner can:

  * Distinguish images where vulnerable function is present vs. absent.
  * Emit OpenVEX with correct `not_affected` when patched.

---

### Phase 1 – JS/Python/PHP call graphs & basic reachability

**Duration:** 1–2 sprints

* Implement:

  * `Scanner.Analyzers.JavaScript` with module + function call graph.
  * `Scanner.Analyzers.Python` and `Scanner.Analyzers.Php` with basic graphs.
* Entrypoint detection:

  * JS: main script from CMD, basic HTTP handlers.
  * Python: main script + Django/Flask heuristics.
  * PHP: front controllers.
* Implement core reachability algorithm (BFS/DFS).
* Implement simple `VulnerabilitySignature` that uses function names and file paths.
* Hook lattice engine in Scanner.WebService and integrate with:

  * Concelier vulnerability feeds.
  * VEXer.

**Success criteria:**

* For demo apps (Node, Django, Laravel):

  * Identify vulnerable functions and mark them reachable/unreachable.
  * Demonstrate noise reduction (some CVEs flagged as `not_affected`).

---

### Phase 2 – Binary CFG & Fingerprinting, Improved Confidence

**Duration:** 1–2 sprints

* Extend Symbolizer & CFG for:

  * Stripped binaries (function hashing).
  * Shared libraries (PLT/IAT resolution).
* Implement `VulnerabilitySignature.BlockFingerprints` to distinguish patched vs vulnerable binary functions.
* Refine confidence scoring:

  * Use fingerprint match quality.
  * Consider presence/absence of debug info.
* Expand coverage:

  * glibc, curl, zlib, OpenSSL, libxml2, etc.

**Success criteria:**

* For curated images:

  * Confirm ability to differentiate patched vs vulnerable versions even when binaries are stripped.
  * Reachability reflects true call paths across app→lib boundaries.

---

### Phase 3 – Runtime hooks (optional), UX, and Hardening

**Duration:** 2+ sprints

* Add opt-in runtime confirmation:

  * eBPF probes for function hits (Linux).
  * Map runtime addresses back to `ImageFunction` via symbolization.
* Enhance console UX:

  * Path explorer UI: show entrypoint → … → vulnerable function path.
  * Evidence view with hash-based proofs.
* Hardening:

  * Performance optimization for large images (parallel analysis, caching).
  * Conservative fallbacks for dynamic language features.

**Success criteria:**

* For selected environments where runtime is allowed:

  * Static reachability is confirmed by runtime traces in the majority of cases.
  * No significant performance regression on typical images.

---

## 12. How this satisfies your initial bullets

From your initial requirements:

1. **JavaScript, Python, PHP, binary**
   → Dedicated analyzers per language + binary symbolization/CFG, unified in `Scanner.Reachability`.

2. **Executable containers**
   → Analyzers run inside Scanner’s worker container, mounting the target image rootfs; no network access.

3. **Libraries usage call graph**
   → Call graphs map from entrypoints → app code → library functions; SBOM + PURLs tie functions to libraries.

4. **Reachability analysis**
   → BFS/DFS from entrypoints over per-language and binary graphs, with lattice-based merging in `Scanner.WebService`.

5. **JSON + PURLs**
   → All evidence is JSON with PURL-tagged components; SBOM is annotated, and VEX statements reference those PURLs.

---

If you like, next step can be: I draft concrete C# interface definitions (including some initial Tree-sitter integration stubs for JS/Python/PHP) and a skeleton of the `ReachabilityPlan` and `ReachabilityEngine` classes that you can drop into the monorepo.