Here’s a crisp idea that could give Stella Ops a real moat: **binary-level reachability**, which links CVEs directly to the exact functions and offsets inside compiled artifacts (ELF/PE/Mach-O), not just to packages.

---

### Why this matters (quick background)

* **Package-level flags are noisy.** Most scanners say “vuln in `libX v1.2`,” but that library might be present and never executed.
* **Language-level call graphs help** (when you have source or rich metadata), but containers often ship only **stripped binaries**.
* **Binary reachability** answers: *Is the vulnerable function actually in this image? Is its code path reachable from the entrypoints we observed or can construct?*

---

### The missing layer: Symbolization

Build a **symbolization layer** that normalizes debug and symbol info across platforms:

* **Inputs**: DWARF (ELF/Mach-O), PDB (PE/Windows), symtabs, exported symbols, `.eh_frame`, and (when stripped) heuristic signatures (e.g., function byte-hashes, CFG fingerprints).
* **Outputs**: a source-agnostic map: `{binary → sections → functions → (addresses, ranges, hashes, demangled names, inlined frames)}`.
* **Normalization**: Put everything into a common schema (e.g., `Stella.Symbolix.v1`) so higher layers don’t care if it came from DWARF or PDB.

---

### End-to-end reachability (binary-first, source-agnostic)

1. **Acquire & parse**
   * Detect format (ELF/PE/Mach-O), parse headers, sections, symbol tables.
   * If debug info is present: parse DWARF/PDB; else fall back to disassembly + function boundary recovery.
2. **Function catalog**
   * Assign stable IDs per function: `(imageHash, textSectionHash, startVA, size, fnHashXX)`.
   * Record x-refs (calls/jumps), imports/exports, PLT/IAT edges.
3. **Entrypoint discovery**
   * Docker entry, process launch args, service scripts; infer likely mains (Go `main.main`, .NET hostfxr path, JVM launcher, etc.).
4. **Call-graph build (binary CFG)**
   * Build inter/intra-procedural graph (direct + resolved indirect via IAT/PLT).
   * Keep “unknown-target” edges for conservative safety.
5. **CVE→function linking**
   * Maintain a **signature bank** per CVE advisory: vulnerable function names, file paths, and, crucially, **byte-sequence or basic-block fingerprints** for patched vs vulnerable versions (works even when stripped).
6. **Reachability analysis**
   * Is the vulnerable function present? Is there a path from any entrypoint to it (under conservative assumptions)? Tag as `Present+Reachable`, `Present+Uncertain`, or `Absent`.
7. **Runtime confirmation (optional, when users allow)**
   * Lightweight probes (eBPF on Linux, ETW on Windows, perf/JFR/EventPipe) capture function hits; cross-check with the static result to upgrade confidence.

---

### Minimal component plan (drop into Stella Ops)

* **Scanner.Symbolizer**
  Parsers: ELF/DWARF (libdw or pure-managed reader), PE/PDB (DIA/LLVM PDB), Mach-O/dSYM.
  Output: `Symbolix.v1` blobs stored in OCI layer cache.
* **Scanner.CFG**
  Lifts functions to a normalized IR (capstone/iced-x86 for decode) → builds CFG & call graph.
* **Advisory.FingerprintBank**
  Ingests CSAF/OpenVEX plus curated fingerprints (fn names, block hashes, patch diff markers). Versioned, signed, air-gap-syncable.
* **Reachability.Engine**
  Joins (`Symbolix` + `CFG` + `FingerprintBank`) → emits `ReachabilityEvidence` with lattice states for VEX.
* **VEXer.Adapter**
  Emits **OpenVEX** statements with `status: affected/not_affected` and `justification: vulnerable_code_not_present | vulnerable_code_not_in_execute_path | inline_mitigations_already_exist` (the OpenVEX labels for "function not present", "function not reachable", and "mitigated at runtime"), attaching Evidence URIs.
* **Console UX**
  “Why not affected?” panel showing entrypoint→…→function path (or absence), with byte-hash proof.
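Steps 5 and 6 above (CVE→function linking and reachability tagging) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Stella Ops implementation: `FingerprintSketch` and `PresenceTag` are hypothetical names, and the fingerprint is a plain SHA-256 over raw function bytes, whereas a production fingerprint would likely normalize relocations and addresses before hashing.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Mirrors the three tags named in step 6 (type name is hypothetical).
public enum PresenceTag { PresentReachable, PresentUncertain, Absent }

public static class FingerprintSketch
{
    // Assumption: hash the raw bytes of one function as its fingerprint.
    public static string FnHash(ReadOnlySpan<byte> code) =>
        Convert.ToHexString(SHA256.HashData(code));

    // imageFnHashes:    fingerprints of every function found in the image
    // vulnerableHashes: fingerprints the signature bank lists for one CVE
    // reachableHashes:  fingerprints of functions reached from an entrypoint
    public static PresenceTag Classify(
        IEnumerable<string> imageFnHashes,
        ISet<string> vulnerableHashes,
        ISet<string> reachableHashes)
    {
        var present = false;
        foreach (var hash in imageFnHashes)
        {
            if (!vulnerableHashes.Contains(hash)) continue;
            present = true;
            if (reachableHashes.Contains(hash)) return PresenceTag.PresentReachable;
        }
        // Present with no known path stays "uncertain" rather than "safe",
        // matching the conservative stance on unknown-target edges.
        return present ? PresenceTag.PresentUncertain : PresenceTag.Absent;
    }
}
```

Calling `Classify` with the image's function hashes, the CVE's fingerprint set, and the reachable set from the call graph yields one tag per CVE; the reachable set itself would come from the CFG stage.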
---

### Data model sketch (concise)

* `ImageFunction { id, name?, startVA, size, fnHash, sectionHash, demangled?, provenance:{DWARF|PDB|Heuristic} }`
* `Edge { srcFnId, dstFnId, kind:{direct|plt|iat|indirect?} }`
* `CveSignature { cveId, fnName?, libHints[], blockFingerprints[], versionRanges }`
* `Evidence { cveId, imageId, functionMatches[], reachable: bool?, confidence:[low|med|high], method:[static|runtime|hybrid] }`

---

### Practical phases (8–10 weeks of focused work)

1. **P0**: ELF/DWARF symbolizer + basic function catalog; link a handful of CVEs via name-only; emit OpenVEX `not_affected` for absent functions (justification `vulnerable_code_not_present`).
2. **P1**: CFG builder (direct calls) + PLT/IAT resolution; simple reachability; first fingerprints for top 50 CVEs in glibc, openssl, curl, zlib.
3. **P2**: Stripped-binary heuristics (block hashing) + Go/Rust name demangling; Windows PDB ingestion for PE.
4. **P3**: Runtime probes (opt-in) + confidence upgrade logic; Console path explorer; evidence signing (DSSE).

---

### KPIs to prove the moat

* **Noise cut**: % reduction in “affected” flags after reachability (target 40–70% on typical containers).
* **Precision**: Ground-truth validation vs PoC images (TP/FP/FN on presence & reachability).
* **Coverage**: % images where we can make a determination without source (goal: >80%).
* **Latency**: Added scan time per image (<15s typical with caches).

---

### Risks & how to handle them

* **Stripped binaries** → mitigate with block-hash fingerprints & library-version heuristics.
* **Obfuscated/packed code** → mark `Uncertain`; allow user-supplied hints; prefer runtime confirmation.
* **Advisory inconsistency** → keep our own curated CVE→function fingerprint bank; sign & version it.
* **Platform spread** → start Linux/ELF, then Windows/PDB, then Mach-O.

---

### Why competitors struggle

Most tools stop at packages because binary CFG + fingerprint curation is hard and expensive. Shipping a **source-agnostic reachability engine** tied to signed evidence in VEX would set Stella Ops apart, especially in the offline/air-gapped and sovereign contexts you already target.

If you want, I can draft:

* the `Symbolix.v1` protobuf,
* a tiny PoC (ELF→functions→match CVE with a block fingerprint),
* and the OpenVEX emission snippet your VEXer can produce.

---

Below is a detailed architecture plan for implementing reachability and call-graph analysis in Stella Ops, covering JavaScript, Python, PHP, and binaries, and integrating with your existing Scanner / Concelier / VEXer stack.

I will assume:

* .NET 10 for core services.
* Scanner is the place where all “trust algebra / lattice” runs (per your standing rule).
* Concelier and VEXer remain “preserve/prune” layers and do not run lattice logic.
* Output must be JSON-centric with PURLs and OpenVEX.

---

## 1. Scope & Objectives

### 1.1 Primary goals

1. From an OCI image, build:
   * A **library-level usage graph** (which libraries are used by which entrypoints).
   * A **function-level call graph** for JS / Python / PHP / binaries.
2. Map CVEs (from Concelier) to:
   * Concrete **components** (PURLs) in the SBOM.
   * Concrete **functions / entrypoints / code regions** inside those components.
3. Perform **reachability analysis** to classify each vulnerability as:
   * `present + reachable`
   * `present + not_reachable`
   * `function_not_present` (no vulnerable symbol)
   * `unknown` (dynamic features, unresolved calls)
4. Emit:
   * **Structured JSON** with PURLs and call-graph nodes/edges (“reachability evidence”).
   * **OpenVEX** documents with appropriate `status`/`justification`.

### 1.2 Non-goals (for now)

* Full dynamic analysis of the running container (eBPF, ptrace, etc.) – leave as Phase 3+ optional add-on.
* Perfect call graph precision for dynamic languages (aim for safe, conservative approximations).
* Automatic “fix recommendations” (handled by other Stella Ops agents later).

---

## 2. High-Level Architecture

### 2.1 Major components

Within Stella Ops:

* **Scanner.WebService**
  * User-facing API.
  * Orchestrates full scan (SBOM, CVEs, reachability).
  * Hosts the **Lattice/Policy engine** that merges evidence and produces decisions.
* **Scanner.Worker**
  * Runs per-image analysis jobs.
  * Invokes analyzers (JS, Python, PHP, Binary) inside its own container context.
* **Scanner.Reachability Core Library**
  * Unified IR for call graphs and reachability evidence.
  * Interfaces for language and binary analyzers.
  * Graph algorithms (BFS/DFS, lattice evaluation, entrypoint expansion).
* **Language Analyzers**
  * `Scanner.Analyzers.JavaScript`
  * `Scanner.Analyzers.Python`
  * `Scanner.Analyzers.Php`
  * `Scanner.Analyzers.Binary`
* **Symbolization & CFG (for binaries)**
  * `Scanner.Symbolization` (ELF, PE, Mach-O parsers, DWARF/PDB)
  * `Scanner.Cfg` (CFG + call graph for binaries)
* **Vulnerability Signature Bank**
  * `Concelier.Signatures` (curated CVE→function/library fingerprints).
  * Exposed to Scanner as an **offline bundle**.
* **VEXer**
  * `Vexer.Adapter.Reachability` – transforms reachability evidence into OpenVEX.

### 2.2 Data flow (logical)

```mermaid
flowchart LR
    A["OCI Image / Tar"] --> B["Scanner.Worker: Extract FS"]
    B --> C["SBOM Engine (CycloneDX/SPDX)"]
    C --> D["Vuln Match (Concelier feeds)"]
    B --> E1["JS Analyzer"]
    B --> E2["Python Analyzer"]
    B --> E3["PHP Analyzer"]
    B --> E4["Binary Analyzer + Symbolizer/CFG"]
    D --> F["Reachability Orchestrator"]
    E1 --> F
    E2 --> F
    E3 --> F
    E4 --> F
    F --> G["Lattice/Policy Engine (Scanner.WebService)"]
    G --> H["Reachability Evidence JSON"]
    G --> I["VEXer: OpenVEX"]
    G --> J["Graph/Cartographer (optional)"]
```

---

## 3. Data Model & JSON Contracts

### 3.1 Core IR types (Scanner.Reachability)

Define in a central assembly, e.g. `StellaOps.Scanner.Reachability`:

```csharp
using System.Collections.Generic;

public record ComponentRef(
    string Purl,
    string? BomRef,
    string? Name,
    string? Version);

public enum SymbolKind { Function, Method, Constructor, Lambda, Import, Export }

public record SymbolId(
    string Language,       // "js", "python", "php", "binary"
    string ComponentPurl,  // SBOM component PURL or "" for app code
    string LogicalName,    // e.g., "server.js:handleLogin"
    string? FilePath,
    int? Line);

public record CallGraphNode(
    string Id,             // stable id, e.g., hash(SymbolId)
    SymbolId Symbol,
    SymbolKind Kind,
    bool IsEntrypoint);

public enum CallEdgeKind { Direct, Indirect, Dynamic, External, Ffi }

public record CallGraphEdge(
    string FromNodeId,
    string ToNodeId,
    CallEdgeKind Kind);

public record CallGraph(
    string GraphId,
    IReadOnlyList<CallGraphNode> Nodes,
    IReadOnlyList<CallGraphEdge> Edges);
```

### 3.2 Vulnerability mapping

```csharp
public record VulnerabilitySignature(
    string Source,                               // "csaf", "nvd", "vendor"
    string Id,                                   // "CVE-2023-12345"
    IReadOnlyList<string> Purls,
    IReadOnlyList<string> TargetSymbolPatterns,  // glob-like or regex
    IReadOnlyList<string>? FilePathPatterns,
    IReadOnlyList<string>? BlockFingerprints     // for binaries, optional
);
```

### 3.3 Reachability evidence

```csharp
public enum ReachabilityStatus
{
    PresentReachable,
    PresentNotReachable,
    FunctionNotPresent,
    Unknown
}

public record ReachabilityEvidence(
    string ImageRef,
    string VulnId,                                   // CVE or advisory id
    ComponentRef Component,
    ReachabilityStatus Status,
    double Confidence,                               // 0..1
    string Method,                                   // "static-callgraph", "binary-fingerprint", etc.
    IReadOnlyList<string> EntrypointNodeIds,
    IReadOnlyList<IReadOnlyList<string>>? ExamplePaths  // optional list of node-paths
);
```

### 3.4 JSON structure (external)

Minimal external JSON (what you store / expose):

```jsonc
{
  "image": "registry.example.com/app:1.2.3",
  "components": [
    {
      "purl": "pkg:npm/express@4.18.0",
      "bomRef": "component-1"
    }
  ],
  "callGraphs": [
    {
      "graphId": "js-main",
      "language": "js",
      "nodes": [ /* CallGraphNode */ ],
      "edges": [ /* CallGraphEdge */ ]
    }
  ],
  "reachability": [
    {
      "vulnId": "CVE-2023-12345",
      "componentPurl": "pkg:npm/express@4.18.0",
      "status": "PresentReachable",
      "confidence": 0.92,
      "entrypoints": [ "node:..." ],
      "paths": [
        ["node:entry", "node:routeHandler", "node:vulnFn"]
      ]
    }
  ]
}
```

---

## 4. Scanner-Side Architecture

### 4.1 Project layout (suggested)

```text
src/
  Scanner/
    StellaOps.Scanner.WebService/
    StellaOps.Scanner.Worker/
    StellaOps.Scanner.Core/          # shared scan domain
    StellaOps.Scanner.Reachability/
    StellaOps.Scanner.Symbolization/
    StellaOps.Scanner.Cfg/
    StellaOps.Scanner.Analyzers.JavaScript/
    StellaOps.Scanner.Analyzers.Python/
    StellaOps.Scanner.Analyzers.Php/
    StellaOps.Scanner.Analyzers.Binary/
```

### 4.2 API surface (Scanner.WebService)

* `POST /api/scan/image`
  * Request: `{ "imageRef": "...", "profile": { "reachability": true, ... } }`
  * Returns: scan id.
* `GET /api/scan/{id}/reachability`
  * Returns: `ReachabilityEvidence[]`, plus call graph summary (optional).
* `GET /api/scan/{id}/vex`
  * Returns: OpenVEX with statuses based on reachability lattice.

### 4.3 Worker orchestration

`StellaOps.Scanner.Worker`:

1. Receives scan job with `imageRef`.
2. Extracts filesystem (layered rootfs) under `/mnt/scans/{scanId}/rootfs`.
3. Invokes SBOM generator (CycloneDX/SPDX).
4. Invokes Concelier via offline feeds to get:
   * Component vulnerabilities (CVE list per PURL).
   * Vulnerability signatures (fingerprints).
5. Builds a `ReachabilityPlan`:

   ```csharp
   public record ReachabilityPlan(
       IReadOnlyList<ComponentRef> Components,
       IReadOnlyList<VulnerabilitySignature> Vulns,
       // AnalyzerTarget: files/dirs grouped by language
       // (type name assumed; it is not defined in this plan)
       IReadOnlyList<AnalyzerTarget> AnalyzerTargets
   );
   ```

6. For each language target, dispatches an analyzer:
   * JavaScript: `IReachabilityAnalyzer` implementation for JS.
   * Python: likewise.
   * PHP: likewise.
   * Binary: symbolizer + CFG.
7. Collects call graphs from each analyzer and merges them into a single IR (or separate per-language graphs with shared IDs).
8. Sends merged graphs + vuln list to the **Reachability Engine** (Scanner.Reachability).

---

## 5. Language Analyzers (JS / Python / PHP)

All analyzers implement a common interface:

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public interface IReachabilityAnalyzer
{
    string Language { get; } // "js", "python", "php"

    Task<CallGraph> AnalyzeAsync(AnalyzerContext context, CancellationToken ct);
}

public record AnalyzerContext(
    string RootFsPath,
    IReadOnlyList<ComponentRef> Components,
    IReadOnlyList<VulnerabilitySignature> Vulnerabilities,
    IReadOnlyDictionary<string, string> Env,  // container env, entrypoint, etc.
    string? EntrypointCommand                 // container CMD/ENTRYPOINT
);
```

### 5.1 JavaScript (Node.js focus)

**Inputs:**

* `/app` tree inside container (or discovered via SBOM).
* `package.json` files.
* Container entrypoint (e.g., `["node", "server.js"]`).

**Core steps:**

1. Identify **app root**:
   * Heuristics: directory containing `package.json` that owns the entry script.
2. Parse:
   * All `.js`, `.mjs`, `.cjs` in app and `node_modules` for vulnerable PURLs.
   * Use a parsing frontend (e.g., Tree-sitter via .NET binding, or Node+AST-as-JSON).
3. Build module graph:
   * `require`, `import`, `export`.
4. Function-level graph:
   * For each function/method, create `CallGraphNode`.
   * For each `callExpression`, create `CallGraphEdge` (try to resolve callee).
5. Entrypoints:
   * Main script in CMD/ENTRYPOINT.
   * HTTP route handlers (for express/koa) detected by patterns (e.g., `app.get("/...")`).
6. Map vulnerable symbols:
   * From `VulnerabilitySignature.TargetSymbolPatterns` (e.g., `express/lib/router/layer.js:handle_request`).
   * Identify nodes whose `SymbolId` matches patterns.

**Output:**

* `CallGraph` for JS with:
  * `IsEntrypoint = true` for main and detected handlers.
  * Node attributes include file path, line, component PURL.

### 5.2 Python

**Inputs:**

* Site-packages paths from SBOM.
* Entrypoint script (CMD/ENTRYPOINT).
* Framework heuristics (Django, Flask) from environment variables or common entrypoints.

**Core steps:**

1. Discover Python interpreter chain: not needed for pure static analysis, but useful for heuristics.
2. Parse `.py` files of:
   * App code.
   * Vulnerable packages (per PURL).
3. Build module import graph (`import`, `from x import y`).
4. Function-level graph:
   * Nodes for functions, methods, class constructors.
   * Edges for call expressions; conservative for dynamic calls.
5. Entrypoints:
   * Main script.
   * WSGI callable (e.g., `application` in `wsgi.py`).
   * Django URLconf -> view functions.
6. Map vulnerable symbols using `TargetSymbolPatterns` like `django.middleware.security.SecurityMiddleware.__call__`.

### 5.3 PHP

**Inputs:**

* Web root (from container image or conventional paths `/var/www/html`, `/app/public`, etc.).
* Composer metadata (`composer.json`, `vendor/`).
* Web server config if present (optional).

**Core steps:**

1. Discover front controllers (e.g., `index.php`, `public/index.php`).
2. Parse PHP files (again, via Tree-sitter or any suitable parser).
3. Resolve include/require chains to build file-level inclusion graph.
4. Build function/method graph:
   * Functions, methods, class constructors.
   * Calls with best-effort resolution for namespaced functions.
5. Entrypoints:
   * Front controllers and router entrypoints (e.g., Symfony, Laravel detection).
6. Map vulnerable symbols (e.g., functions in certain vendor packages, particular methods).

---

## 6. Binary Analyzer & Symbolizer

Project: `StellaOps.Scanner.Analyzers.Binary` + `Symbolization` + `Cfg`.

### 6.1 Inputs

* All binaries and shared libraries in:
  * `/usr/lib`, `/lib`, `/app/bin`, etc.
* SBOM link: each binary mapped to its component PURL when possible.
* Vulnerability signatures for native libs: function names, symbol names, fingerprints.
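Both the language analyzers in section 5 and the binary inputs above end with the same step: match `TargetSymbolPatterns` against the symbol names found in the image. A minimal sketch of that matching follows. It assumes the "glob-like" variant supports only `*` and `?`, and `SymbolPatternSketch` is a hypothetical helper name, not part of the planned assemblies.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SymbolPatternSketch
{
    // Translate a glob into an anchored regex. Regex.Escape turns '*' into
    // "\*" and '?' into "\?", which are then widened back to wildcards.
    // Note: '*' here also matches across '/' path separators.
    public static Regex GlobToRegex(string glob)
    {
        var body = Regex.Escape(glob)
            .Replace("\\*", ".*")
            .Replace("\\?", ".");
        return new Regex("^" + body + "$", RegexOptions.CultureInvariant);
    }

    // Return the logical names (e.g. "server.js:handleLogin") that match
    // at least one pattern from a vulnerability signature.
    public static IReadOnlyList<string> MatchSymbols(
        IEnumerable<string> logicalNames,
        IEnumerable<string> patterns)
    {
        var regexes = patterns.Select(GlobToRegex).ToList();
        return logicalNames
            .Where(name => regexes.Any(r => r.IsMatch(name)))
            .ToList();
    }
}
```

With this helper, a pattern such as `express/lib/router/*:handle_*` would select `express/lib/router/layer.js:handle_request` while leaving unrelated application symbols untouched; the matched names can then be looked up as `CallGraphNode` candidates.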
### 6.2 Symbolization

Module: `StellaOps.Scanner.Symbolization`

* Detect format: ELF, PE, Mach-O.
* For ELF/Mach-O:
  * Parse symbol tables (`.symtab`, `.dynsym`).
  * Parse DWARF (if present) to map functions to source files/lines.
* For PE:
  * Parse PDB (if present) or export table.
* For stripped binaries:
  * Run function boundary recovery (linear sweep + heuristic).
  * Compute block/fn-level hashes for fingerprinting.

Output:

```csharp
public record ImageFunction(
    string ImageId,      // e.g., SHA256 of file
    ulong StartVa,
    uint Size,
    string? SymbolName,  // demangled if possible
    string FnHash,       // stable hash of bytes / CFG
    string? SourceFile,
    int? SourceLine);
```

### 6.3 CFG + Call graph

Module: `StellaOps.Scanner.Cfg`

* Disassemble `.text` using Capstone/Iced.x86.
* Build basic blocks and CFG.
* Identify:
  * Direct calls (resolved).
  * PLT/IAT indirections to shared libraries.
* Build `CallGraph` for binary functions:
  * Entrypoints: `main`, exported functions, Go `main.main`, etc.
  * Map application functions to library functions via PLT/IAT edges.

### 6.4 Linking vulnerabilities

* For each vulnerability affecting a native library (e.g., OpenSSL):
  * Map to candidate binaries via SBOM + PURL.
  * Within the library image, find `ImageFunction`s matching:
    * `SymbolName` patterns.
    * `FnHash` / `BlockFingerprints` (for precise detection).
* Determine reachability:
  * Starting from application entrypoints, traverse the call graph to see if calls to the vulnerable library function occur.

---

## 7. Reachability Engine & Lattice (Scanner.WebService)

Project: `StellaOps.Scanner.Reachability`

### 7.1 Inputs to engine

* Combined `CallGraph[]` (per language + binary).
* Vulnerability list (CVE, GHSA, etc.) with affected PURLs.
* Vulnerability signatures.
* Entrypoint hints:
  * Container CMD/ENTRYPOINT.
  * Detected HTTP handlers, WSGI/ASGI entrypoints, etc.

### 7.2 Algorithm steps

1. **Entrypoint expansion**
   * Identify all `CallGraphNode` with `IsEntrypoint=true`.
   * Add language-specific “framework entrypoints” (e.g., Express route dispatch, Django URL dispatch) when detected.
2. **Graph traversal**
   * For each entrypoint node:
     * BFS/DFS through edges.
     * Maintain `reachable` bit on each node.
   * For dynamic edges:
     * Conservative: if the target cannot be resolved, mark the affected path as partially unknown and downgrade confidence.
3. **Vuln symbol resolution**
   * For each vulnerability:
     * For each vulnerable component PURL found in SBOM:
       * Find candidate nodes whose `SymbolId` matches `TargetSymbolPatterns` / binary fingerprints.
     * If none found:
       * `FunctionNotPresent` (if component version range indicates vulnerable but we cannot find symbol – low confidence).
     * If found:
       * Check `reachable` bit:
         * If reachable by at least one entrypoint, `PresentReachable`.
         * Else, `PresentNotReachable`.
4. **Confidence computation**
   * Start from:
     * `1.0` for direct match with explicit function name & static call.
   * Lower for:
     * Heuristic framework entrypoints.
     * Dynamic calls.
     * Fingerprint-only matches on stripped binaries.
   * Example rule-of-thumb:
     * direct static path only: 0.95–1.0.
     * dynamic edges but symbol found: 0.7–0.9.
     * symbol not found but version says vulnerable: 0.4–0.6.
5. **Lattice merge**
   * Represent each CVE+component pair as a lattice element with states: `{affected, not_affected, unknown}`.
   * Reachability engine produces a **local state**:
     * `PresentReachable` → candidate `affected`.
     * `PresentNotReachable` or `FunctionNotPresent` → candidate `not_affected`.
     * `Unknown` → `unknown`.
   * Merge with:
     * Upstream vendor VEX (from Concelier).
     * Policy overrides (e.g., “treat certain CVEs as affected unless vendor says otherwise”).
   * Final state computed here (Scanner.WebService), not in Concelier or VEXer.
6. **Evidence output**
   * For each vulnerability:
     * Emit `ReachabilityEvidence` with:
       * Status.
       * Confidence.
       * Method.
       * Example entrypoint paths (for UX and audit).
   * Persist this evidence alongside regular scan results.

---

## 8. Integration with SBOM & VEX

### 8.1 SBOM annotation

* Extend SBOM documents (CycloneDX / SPDX) with extra properties:
  * CycloneDX:
    * `component.properties`:
      * `stellaops:reachability:status` = `present_reachable|present_not_reachable|function_not_present|unknown`
      * `stellaops:reachability:confidence` = `0.0-1.0`
  * SPDX:
    * `Annotation` or `ExternalRef` with similar metadata.

### 8.2 OpenVEX generation

Module: `StellaOps.Vexer.Adapter.Reachability`

* For each `(vuln, component)` pair:
  * Map to VEX statement:
    * If `PresentReachable`:
      * `status: affected`
      * No `justification` (OpenVEX defines justifications only for `not_affected`); describe remediation in `action_statement`.
    * If `PresentNotReachable`:
      * `status: not_affected`
      * `justification: vulnerable_code_not_in_execute_path`
    * If `FunctionNotPresent`:
      * `status: not_affected`
      * `justification: vulnerable_code_not_present` (or `component_not_present` when the whole component is absent)
    * If `Unknown`:
      * `status: under_investigation` (configurable).
  * Attach evidence via:
    * `impact_statement` / `status_notes` fields (link to internal evidence JSON or audit link).
* VEXer does not recalculate reachability; it uses the already computed decision + evidence.

---

## 9. Executable Containers & Offline Operation

### 9.1 Executable containers

* Analyzers run inside a dedicated Scanner worker container that has:
  * .NET 10 runtime.
  * Language runtimes if needed for parsing (Node, Python, PHP), or Tree-sitter-based parsing.
* Target image filesystem is mounted read-only under `/mnt/rootfs`.
* No network access (offline/air-gap).
* This satisfies “we will use executable containers” while keeping separation between:
  * Target image (mount only).
  * Analyzer container (StellaOps code).

### 9.2 Offline signature bundles

* Concelier periodically exports:
  * Vulnerability database (CSAF/NVD).
  * Vulnerability Signature Bank.
* Bundles are:
  * DSSE-signed.
  * Versioned (e.g., `signatures-2025-11-01.tar.zst`).
* Scanner uses:
  * The bundle digest as part of the **Scan Manifest** for deterministic replay.

---

## 10. Determinism & Caching

### 10.1 Layer-level caching

* Key: `layerDigest + analyzerVersion + signatureBundleVersion`.
* Cache artifacts:
  * CallGraph(s) per layer (for JS/Python/PHP code present in that layer).
  * Symbolization results per binary file hash.
* For images sharing layers:
  * Merge cached graphs instead of re-analyzing.

### 10.2 Deterministic scan manifest

For each scan, produce:

```json
{
  "imageRef": "registry/app:1.2.3",
  "imageDigest": "sha256:...",
  "scannerVersion": "1.4.0",
  "analyzerVersions": {
    "js": "1.0.0",
    "python": "1.0.0",
    "php": "1.0.0",
    "binary": "1.0.0"
  },
  "signatureBundleDigest": "sha256:...",
  "callGraphDigest": "sha256:...",
  "reachabilityEvidenceDigest": "sha256:..."
}
```

`callGraphDigest` is the hash of the canonical JSON form of the call graph. This manifest can be signed (Authority module) and used for audits and replay.

---

## 11. Implementation Roadmap (Phased)

### Phase 0 – Infrastructure & Binary presence

**Duration:** 1 sprint

* Set up `Scanner.Reachability` core types and interfaces.
* Implement:
  * Basic Symbolizer for ELF + DWARF.
  * Binary function catalog without CFG.
* Link a small set of CVEs to binary function presence via `SymbolName`.
* Expose minimal evidence:
  * `PresentReachable`/`FunctionNotPresent` based only on presence (no call graph yet, so a present function is conservatively treated as reachable).
* Integrate with VEXer to emit `not_affected` statements for `FunctionNotPresent` results (OpenVEX justification `vulnerable_code_not_present`).

**Success criteria:**

* For selected demo images with known vulnerable/patched OpenSSL, scanner can:
  * Distinguish images where the vulnerable function is present vs. absent.
  * Emit OpenVEX with correct `not_affected` when patched.

---

### Phase 1 – JS/Python/PHP call graphs & basic reachability

**Duration:** 1–2 sprints

* Implement:
  * `Scanner.Analyzers.JavaScript` with module + function call graph.
  * `Scanner.Analyzers.Python` and `Scanner.Analyzers.Php` with basic graphs.
* Entrypoint detection:
  * JS: main script from CMD, basic HTTP handlers.
  * Python: main script + Django/Flask heuristics.
  * PHP: front controllers.
* Implement core reachability algorithm (BFS/DFS).
* Implement simple `VulnerabilitySignature` that uses function names and file paths.
* Hook lattice engine in Scanner.WebService and integrate with:
  * Concelier vulnerability feeds.
  * VEXer.

**Success criteria:**

* For demo apps (Node, Django, Laravel):
  * Identify vulnerable functions and mark them reachable/unreachable.
  * Demonstrate noise reduction (some CVEs flagged as `not_affected`).

---

### Phase 2 – Binary CFG & Fingerprinting, Improved Confidence

**Duration:** 1–2 sprints

* Extend Symbolizer & CFG for:
  * Stripped binaries (function hashing).
  * Shared libraries (PLT/IAT resolution).
* Implement `VulnerabilitySignature.BlockFingerprints` to distinguish patched vs vulnerable binary functions.
* Refine confidence scoring:
  * Use fingerprint match quality.
  * Consider presence/absence of debug info.
* Expand coverage:
  * glibc, curl, zlib, OpenSSL, libxml2, etc.

**Success criteria:**

* For curated images:
  * Confirm ability to differentiate patched vs vulnerable versions even when binaries are stripped.
  * Reachability reflects true call paths across app→lib boundaries.

---

### Phase 3 – Runtime hooks (optional), UX, and Hardening

**Duration:** 2+ sprints

* Add opt-in runtime confirmation:
  * eBPF probes for function hits (Linux).
  * Map runtime addresses back to `ImageFunction` via symbolization.
* Enhance console UX:
  * Path explorer UI: show entrypoint → … → vulnerable function path.
  * Evidence view with hash-based proofs.
* Hardening:
  * Performance optimization for large images (parallel analysis, caching).
  * Conservative fallbacks for dynamic language features.

**Success criteria:**

* For selected environments where runtime is allowed:
  * Static reachability is confirmed by runtime traces in the majority of cases.
* No significant performance regression on typical images.

---

## 12. How this satisfies your initial bullets

From your initial requirements:

1. **JavaScript, Python, PHP, binary**
   → Dedicated analyzers per language + binary symbolization/CFG, unified in `Scanner.Reachability`.
2. **Executable containers**
   → Analyzers run inside Scanner’s worker container, mounting the target image rootfs; no network access.
3. **Libraries usage call graph**
   → Call graphs map from entrypoints → app code → library functions; SBOM + PURLs tie functions to libraries.
4. **Reachability analysis**
   → BFS/DFS from entrypoints over per-language and binary graphs, with lattice-based merging in `Scanner.WebService`.
5. **JSON + PURLs**
   → All evidence is JSON with PURL-tagged components; SBOM is annotated, and VEX statements reference those PURLs.

---

If you like, the next step can be: I draft concrete C# interface definitions (including some initial Tree-sitter integration stubs for JS/Python/PHP) and a skeleton of the `ReachabilityPlan` and `ReachabilityEngine` classes that you can drop into the monorepo.
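In the meantime, the core of item 4 (the traversal and status steps from section 7.2) can be sketched as below. This is a minimal illustration under assumptions, not the eventual `ReachabilityEngine`: `ReachabilitySketch` and `SketchStatus` are hypothetical names, edges are reduced to plain node-id pairs, and confidence scoring and dynamic-edge handling are left out.

```csharp
using System.Collections.Generic;

// Subset of the statuses from section 3.3 (Unknown is omitted in this sketch).
public enum SketchStatus { PresentReachable, PresentNotReachable, FunctionNotPresent }

public static class ReachabilitySketch
{
    // Breadth-first search from all entrypoints over directed call edges.
    public static HashSet<string> Reachable(
        IEnumerable<string> entrypoints,
        IEnumerable<(string Src, string Dst)> edges)
    {
        var adjacency = new Dictionary<string, List<string>>();
        foreach (var (src, dst) in edges)
        {
            if (!adjacency.TryGetValue(src, out var targets))
                adjacency[src] = targets = new List<string>();
            targets.Add(dst);
        }

        var seen = new HashSet<string>(entrypoints);
        var queue = new Queue<string>(seen);
        while (queue.Count > 0)
        {
            var node = queue.Dequeue();
            if (!adjacency.TryGetValue(node, out var next)) continue;
            foreach (var n in next)
                if (seen.Add(n)) queue.Enqueue(n);  // enqueue each node once
        }
        return seen;
    }

    // vulnerableNodeIds: nodes that matched a signature for one CVE+component.
    public static SketchStatus Classify(
        IReadOnlyCollection<string> vulnerableNodeIds,
        ISet<string> reachable)
    {
        if (vulnerableNodeIds.Count == 0) return SketchStatus.FunctionNotPresent;
        foreach (var id in vulnerableNodeIds)
            if (reachable.Contains(id)) return SketchStatus.PresentReachable;
        return SketchStatus.PresentNotReachable;
    }
}
```

Feeding it the example path from section 3.4 (`node:entry` → `node:routeHandler` → `node:vulnFn`) with `node:entry` as the only entrypoint should classify `node:vulnFn` as `PresentReachable`; a node with no inbound path from an entrypoint would come out as `PresentNotReachable`.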