git.stella-ops.org/docs/product-advisories/17-Nov-2026 - Stripped-ELF-Reachability.md
2025-11-20 23:16:02 +02:00


Here's a compact blueprint for bringing stripped ELF binaries into StellaOps's call-graph + reachability scoring, from raw bytes → neutral JSON → deterministic scoring.


Why this matters (quick)

Even when symbols are missing, you can still (1) recover functions, (2) build a call graph, and (3) decide if a vulnerable function is actually reachable from the binary's entrypoints. This feeds StellaOps's deterministic scoring/lattice engine so VEX decisions are evidence-backed, not guesswork.


High-level pipeline

  1. Ingest
  • Accept: ELF (static/dynamic), PIE, musl/glibc, multiple arches (x86_64, aarch64, armhf, riscv64).
  • Normalize: compute file hash set (SHA256, BLAKE3), note PT_DYNAMIC, DT_NEEDED, interpreter, RPATH/RUNPATH.
  2. Symbolization (best-effort)
  • If DWARF present: read .debug_* (function names, inlines, CU boundaries, ranges).

  • If stripped:

    • Use disassembler to discover functions (prolog patterns, xref-to-targets, thunk detection).
    • Derive synthetic names: sub_<va>, plt_<name> (from dynamic symbol table if available), extern@libc.so.6:memcpy.
    • Lift exported dynsyms and PLT stubs even when local symbols are removed.
    • Recover string-referenced names (e.g., Go/Python/C++ RTTI/Itanium mangling where present).
  3. Disassembly & IR
  • Disassemble to basic blocks; lift to a neutral IR (SSA-like) sufficient for:

    • Call edges (direct call/bl).
    • Indirect calls via GOT/IAT, vtables, function pointers (approximate with points-to sets).
    • Tail-calls, thunks, PLT interposition.
  4. Call-graph build
  • Start from entrypoints:

    • ELF entry (_start), constructors (.init_array), exported API (public symbols), main (if recoverable).
    • Optional: entry-trace (cmdline + env + loader path) from container image to seed realistic roots.
  • Build CG with:

    • Direct edges: precise.
    • Indirect edges: conservative, with evidence tags (GOT target set, vtable class set, signature match).
  • Record inter-module edges to shared libs (soname + version) with relocation evidence.

  5. Reachability scoring (deterministic)
  • Input: list of vulnerable functions/paths (from CSAF/CVE KB) normalized to function-level identifiers (soname!symbol or hash-based if unnamed).

  • Compute reachability from roots → target:

    • REACHABLE_CONFIRMED (path with only precise edges),
    • REACHABLE_POSSIBLE (path contains conservative edges),
    • NOT_REACHABLE_FOUNDATION (no path in current graph),
    • Add confidence derived from edge evidence + relocation proof.
  • Emit proof trails (the exact path: nodes, edges, evidence).

  6. Neutral JSON intermediate (NJIF)
  • Stored in cache; signed for deterministic replay.
  • Consumed by StellaOps.Policy/Lattice to merge with VEX.

Neutral JSON Intermediate Format (NJIF)

{
  "artifact": {
    "path": "/work/bin/app",
    "hashes": {"sha256": "…", "blake3": "…"},
    "arch": "x86_64",
    "elf": {
      "type": "ET_DYN",
      "interpreter": "/lib64/ld-linux-x86-64.so.2",
      "needed": ["libc.so.6", "libssl.so.3"],
      "rpath": [],
      "runpath": []
    }
  },
  "symbols": {
    "exported": [
      {"id": "libc.so.6!memcpy", "kind": "dynsym", "addr": "0x0", "plt": true}
    ],
    "functions": [
      {"id": "sub_401000", "addr": "0x401000", "size": 112, "name_hint": null, "from": "disasm"},
      {"id": "main", "addr": "0x4023d0", "size": 348, "from": "dwarf|heuristic"}
    ]
  },
  "cfg": [
    {"func": "main", "blocks": [
      {"b": "0x4023d0", "succ": ["0x402415"], "calls": [{"type": "direct", "target": "sub_401000"}]},
      {"b": "0x402415", "succ": ["0x402440"], "calls": [{"type": "plt", "target": "libc.so.6!memcpy"}]}
    ]}
  ],
  "cg": {
    "nodes": [
      {"id": "main", "evidence": ["dwarf|heuristic"]},
      {"id": "sub_401000"},
      {"id": "libc.so.6!memcpy", "external": true, "lib": "libc.so.6"}
    ],
    "edges": [
      {"from": "main", "to": "sub_401000", "kind": "direct"},
      {"from": "main", "to": "libc.so.6!memcpy", "kind": "plt", "evidence": ["reloc@GOT"]}
    ],
    "roots": ["_start", "init_array[]", "main"]
  },
  "reachability": [
    {
      "target": "libssl.so.3!SSL_free",
      "status": "NOT_REACHABLE_FOUNDATION",
      "path": []
    },
    {
      "target": "libc.so.6!memcpy",
      "status": "REACHABLE_CONFIRMED",
      "path": ["main", "libc.so.6!memcpy"],
      "confidence": 0.98,
      "evidence": ["plt", "dynsym", "reloc"]
    }
  ],
  "provenance": {
    "toolchain": {
      "disasm": "ghidra_headless|radare2|llvm-objdump",
      "version": "…"
    },
    "scan_manifest_hash": "…",
    "timestamp_utc": "2025-11-16T00:00:00Z"
  }
}
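Signing the cached NJIF for deterministic replay only works if the same graph always serializes to the same bytes. The following is a minimal Python sketch of canonical serialization and hashing, not the project's actual serializer. The helper names `canonical_njif_bytes` and `njif_digest` are hypothetical, and the choice of compact separators is an assumption.

```python
import hashlib
import json


def canonical_njif_bytes(njif: dict) -> bytes:
    """Serialize an NJIF document to a canonical byte string.

    Object keys are sorted and whitespace is fixed, so two logically equal
    documents always produce identical bytes.
    """
    return json.dumps(
        njif, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def njif_digest(njif: dict) -> str:
    """Return the SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_njif_bytes(njif)).hexdigest()


# Same content, different key insertion order.
doc_a = {"artifact": {"arch": "x86_64", "path": "/work/bin/app"},
         "cg": {"roots": ["_start"]}}
doc_b = {"cg": {"roots": ["_start"]},
         "artifact": {"path": "/work/bin/app", "arch": "x86_64"}}

digest_a = njif_digest(doc_a)
digest_b = njif_digest(doc_b)
```

Note that `sort_keys` only orders object keys. Arrays such as `cg.edges` keep their given order, so the producer would still need to emit them in a stable order, and a wall-clock `timestamp_utc` would have to be excluded from (or fixed in) whatever is hashed.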

Practical extractors (headless/CLI)

  • DWARF: llvm-dwarfdump/eu-readelf for quick CU/function ranges; fall back to the disassembler.

  • Disassembly/CFG/CG (choose one or more; wrap with a stable adapter):

    • Ghidra Headless API: recover functions, basic blocks, references, PLT/GOT, vtables; export via a custom headless script to NJIF.
    • radare2 / rizin: aaa to analyze, then aflj and agj to export functions and per-function graphs as JSON, and agCd to export the global call graph as Graphviz dot.
    • Binary Ninja headless (if license permits) for cleaner IL and indirect-call modeling.
    • angr for path-sensitive refinement on tricky indirect calls (optional, gated by budget).

Adapter principle: All tools output a small, consistent NJIF so the scoring engine and lattice logic never depend on any single RE tool.
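The adapter principle can be illustrated with a small Python sketch. Everything here is hypothetical: the `DisassemblyAdapter` contract, the two fake adapters, and the canned data they return stand in for real tool wrappers, which would invoke Ghidra or radare2 and translate their output.

```python
from typing import Protocol


class DisassemblyAdapter(Protocol):
    """Hypothetical contract: every tool wrapper returns NJIF-shaped data."""

    name: str

    def analyze(self, binary_path: str) -> dict: ...


class FakeGhidraAdapter:
    """Stand-in for a real Ghidra wrapper; returns canned data."""

    name = "ghidra_headless"

    def analyze(self, binary_path: str) -> dict:
        return {
            "symbols": {"functions": [{"id": "sub_401000", "addr": "0x401000"}]},
            "cg": {"nodes": [{"id": "sub_401000"}], "edges": [], "roots": []},
        }


class FakeRadare2Adapter:
    """Stand-in for a real radare2 wrapper; same output shape."""

    name = "radare2"

    def analyze(self, binary_path: str) -> dict:
        return {
            "symbols": {"functions": [{"id": "sub_401000", "addr": "0x401000"}]},
            "cg": {"nodes": [{"id": "sub_401000"}], "edges": [], "roots": []},
        }


def run_adapter(adapter: DisassemblyAdapter, binary_path: str) -> dict:
    """Run one adapter and stamp the result with artifact and provenance."""
    njif = adapter.analyze(binary_path)
    njif["artifact"] = {"path": binary_path}
    njif["provenance"] = {"toolchain": {"disasm": adapter.name}}
    return njif


results = [run_adapter(a, "/work/bin/app")
           for a in (FakeGhidraAdapter(), FakeRadare2Adapter())]
```

Downstream code only ever sees the dictionaries in `results`, so swapping or adding a tool changes the provenance field but not the consumer.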


Indirect call modeling (concise rules)

  • PLT/GOT: edge from caller → soname!symbol with evidence: plt, reloc@GOT.
  • Function pointers: if a store to a pointer is found and targets a known function set {f1…fk}, add edges with kind: "indirect", evidence: ["xref-store", "sig-compatible"].
  • Virtual calls / vtables: class-method set from RTTI/vtable scans; mark edges evidence: ["vtable-match"].
  • Tail-calls: treat as edges, not fallthrough.

Each conservative step lowers confidence, but keeps determinism: the rules and their hashes are in the scan manifest.
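As a rough illustration of these rules, the Python sketch below turns one call site into NJIF-style edges. The call-site dictionary shape (`type`, `target`, `candidates`) and the function name are assumptions made for the example, as is the use of kind `"indirect"` for vtable edges, which the rules above do not spell out.

```python
def edges_for_call_site(caller: str, site: dict) -> list:
    """Map one (hypothetical) call-site record to a list of call edges."""
    kind = site["type"]
    if kind == "direct":
        return [{"from": caller, "to": site["target"],
                 "kind": "direct", "evidence": []}]
    if kind == "plt":
        # PLT/GOT: caller -> soname!symbol with relocation evidence.
        return [{"from": caller, "to": site["target"],
                 "kind": "plt", "evidence": ["plt", "reloc@GOT"]}]
    if kind == "funcptr":
        # One conservative edge per candidate; sorted for stable output.
        return [{"from": caller, "to": t, "kind": "indirect",
                 "evidence": ["xref-store", "sig-compatible"]}
                for t in sorted(site["candidates"])]
    if kind == "vtable":
        return [{"from": caller, "to": t, "kind": "indirect",
                 "evidence": ["vtable-match"]}
                for t in sorted(site["candidates"])]
    if kind == "tailcall":
        # Tail-calls are real call edges, not fallthrough.
        return [{"from": caller, "to": site["target"],
                 "kind": "tailcall", "evidence": []}]
    raise ValueError(f"unknown call-site type: {kind}")


funcptr_edges = edges_for_call_site(
    "main", {"type": "funcptr", "candidates": {"f2", "f1"}})
plt_edges = edges_for_call_site(
    "main", {"type": "plt", "target": "libc.so.6!memcpy"})
```

Sorting the candidate set is what keeps the output identical across runs even when the underlying tool reports targets in a different order.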


Deterministic scoring (plug into Stella's lattice)

  • Inputs: NJIF, CVE→function mapping (soname!symbol or function hash), policy knobs.
  • States: {NOT_OBSERVED < POSSIBLE < REACHABLE_CONFIRMED} with monotone merge (never oscillates).
  • Confidence: product of edge evidences (configurable weights): direct=1.0, plt=0.98, vtable=0.85, funcptr=0.7.
  • Output: OpenVEX/CSAF annotations + human proof path; signed with DSSE to preserve replayability.
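The confidence product and the monotone merge can be sketched in a few lines of Python. The weights and state names are the ones listed above; the function names and the use of `max` over a fixed order as the merge operator are assumptions for illustration.

```python
from math import prod

# Edge weights as listed above (configurable in practice).
EDGE_WEIGHTS = {"direct": 1.0, "plt": 0.98, "vtable": 0.85, "funcptr": 0.7}

# Lattice order, lowest to highest.
STATE_ORDER = ["NOT_OBSERVED", "POSSIBLE", "REACHABLE_CONFIRMED"]


def path_confidence(edge_kinds: list) -> float:
    """Multiply the per-edge weights along one path."""
    return prod(EDGE_WEIGHTS[kind] for kind in edge_kinds)


def merge_states(a: str, b: str) -> str:
    """Join of two states: the higher one wins, so merging never goes down."""
    return max(a, b, key=STATE_ORDER.index)


conf_plt_path = path_confidence(["direct", "plt"])
conf_funcptr_path = path_confidence(["plt", "funcptr"])
merged = merge_states("POSSIBLE", "NOT_OBSERVED")
```

Because the merge is a maximum over a total order, it is commutative, associative, and idempotent, which is what lets results from different scans be combined in any order without oscillating.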

Minimal Ghidra headless skeleton (exporter idea)

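analyzeHeadless /work/gh_proj MyProj -import app -scriptPath scripts \
  -postScript ExportNjif.java /out/app.njif.json
// ExportNjif.java (outline)
import ghidra.app.script.GhidraScript;
import ghidra.program.model.listing.Function;

public class ExportNjif extends GhidraScript {
  @Override
  public void run() throws Exception {
    // The output path is the first argument given after -postScript.
    String outPath = getScriptArgs()[0];
    // Iterate all functions in ascending address order.
    for (Function fn : currentProgram.getFunctionManager().getFunctions(true)) {
      // collect functions, blocks, calls, externs/PLT
      // map non-named functions to sub_<addr>
      // detect PLT thunks → dynsym names
    }
    // write NJIF JSON deterministically (sorted keys, stable ordering) to outPath
  }
}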

Integration points in StellaOps

  • Scanner.Analyzers.Binary.Elf

    • ElfNormalizer → hashes, dynamic deps.
    • Symbolizer → DWARF reader + HeuristicDisasm (via tool adapter).
    • CgBuilder → NJIF builder/merger (multi-module).
    • ReachabilityEngine → path search + confidence math.
    • Emitter → NJIF cache + VEX/CSAF notes.
  • Scheduler: memoize by (hashes, toolchain_version, ruleset_hash) to ensure replayable results.

  • Authority: sign NJIF + scoring outputs; store manifests (feeds, rule weights, tool versions).


Test fixtures (suggested)

  • Tiny ELF zoo: statically linked, PIE, stripped/non-stripped, C++ with vtables, musl vs glibc.
  • Known CVE libs (e.g., libssl, zlib) with versioned symbols to validate soname!symbol mapping.
  • Synthetic binaries with function-pointer tables to validate conservative edges.

If you want, I can generate:

  • A ready-to-run Ghidra headless exporter (Java) that writes NJIF exactly like above.
  • A small .NET parser that ingests NJIF and emits StellaOps reachability + OpenVEX notes.

Below is a full architecture plan for implementing stripped-ELF binary reachability (call graph + NJIF + deterministic scoring, with a hook for patch-oracles) inside StellaOps.

I will assume .NET 10, existing microservice split (Scanner.WebService, Scanner.Worker, Concelier, Excitior, Authority, Scheduler, Sbomer, Signals), and your standing rule: all lattice logic runs in Scanner.WebService.


1. Scope, Objectives, Non-Goals

1.1 Objectives

  1. Recover function-level call graphs from ELF binaries, including stripped ones:
  • Support ET_EXEC / ET_DYN / PIE, static & dynamic linking.
  • Support at least x86_64, aarch64 in v1, later armhf, riscv64.
  2. Produce a neutral, deterministic JSON representation (NJIF):
  • Tool-agnostic: can be generated from Ghidra, radare2/rizin, Binary Ninja, angr, etc.
  • Stable identifiers and schema so downstream services don't depend on a specific RE engine.
  3. Compute function-level reachability for vulnerabilities:
  • Given CVE → soname!symbol (and later function-hash) mappings from Concelier,
  • Decide REACHABLE_CONFIRMED / REACHABLE_POSSIBLE / NOT_REACHABLE_FOUNDATION with evidence and confidence.
  4. Integrate with StellaOps lattice and VEX outputs:
  • Lattice logic runs in Scanner.WebService.
  • Results flow into Excitior (VEX) and Sbomer (SBOM annotations), preserving provenance.
  5. Enable deterministic replay:
  • Every analysis run is tied to a Scan Manifest: tool versions, ruleset hashes, policy hashes, container image digests.

1.2 Non-Goals (v1)

  • No dynamic runtime probes (EventPipe/JFR) in this phase.
  • No full decompilation; we only need enough IR for calls/edges.
  • No aggressive path-sensitive analysis (symbolic execution) in v1; that can be a v2 enhancement.

2. High-Level System Architecture

2.1 Components

  • Scanner.WebService (existing)

    • REST/gRPC API for scans.
    • Orchestrates analysis jobs via Scheduler.
    • Hosts Lattice & Reachability Engine for all artifact types.
    • Reads NJIF results, merges with Concelier function mappings and policies.
  • Scanner.Worker (existing, extended)

    • Executes Binary Analyzer Pipelines.
    • Invokes RE tools (Ghidra, rizin, etc.) in dedicated containers.
    • Produces NJIF and persists it.
  • Binary Tools Containers (new)

    • stellaops-tools-ghidra:<tag>
    • stellaops-tools-rizin:<tag>
    • Optionally stellaops-tools-angr for advanced passes.
    • Pinned versions, no network access (for determinism & air-gap).
  • Storage & Metadata

    • DB (PostgreSQL): scan records, NJIF metadata, reachability summaries.
    • Object store (MinIO/S3/Filesystem): NJIF JSON blobs, tool logs.
    • Authority: DSSE signatures for Scan Manifest, NJIF, and reachability outputs.
  • Concelier

    • Provides CVE → component → function symbol/hashes resolution.
    • Exposes “Link-Not-Merge” graph of advisory, component, and function nodes.
  • Excitior (VEX)

    • Consumes Scanner.WebService reachability states.
    • Emits OpenVEX/CSAF with properly justified statuses.
  • UnknownsRegistry (future)

    • Receives unresolvable call edges / ambiguous functions from the analyzer,
    • Feeds them into “adaptive security” workflows.

2.2 End-to-End Flow (Binary / Image Scan)

  1. Client requests scan (binary or container image) via Scanner.WebService.

  2. WebService:

    • Extracts binaries from OCI layers (if scanning image),
    • Registers Scan Manifest,
    • Submits a job to Scheduler (queue: binary-elfflow).
  3. Scanner.Worker dequeues the job:

    • Detects ELF binaries,
    • Runs Binary Analyzer Pipeline for each unique binary hash.
  4. Worker uses tools containers:

    • Ghidra/rizin → CFG, function discovery, call graph,
    • Converts to NJIF.
  5. Worker persists NJIF + metadata; marks analysis complete.

  6. Scanner.WebService picks up NJIF:

    • Fetches advisory function mappings from Concelier,
    • Runs Reachability & Lattice scoring,
    • Updates scan results and triggers Excitior / Sbomer.

All steps are deterministic given:

  • Input artifact,
  • Tool container digests,
  • Ruleset/policy versions.

3. Binary Analyzer Subsystem (Scanner.Worker)

Introduce a dedicated module:

  • StellaOps.Scanner.Analyzers.Binary.Elf

3.1 Internal Layers

  1. ElfDetector

    • Inspects files in a scan:

      • Magic 0x7f 'E' 'L' 'F',
      • Confirms architecture via ELF header.
    • Produces BinaryArtifact records with:

      • hashes (SHA-256, BLAKE3),
      • path in container,
      • arch, endianness.
  2. ElfNormalizer

    • Uses a lightweight library (e.g., ELFSharp) to extract:

      • ElfType (ET_EXEC, ET_DYN),
      • interpreter (PT_INTERP),
      • DT_NEEDED list,
      • RPATH/RUNPATH,
      • presence/absence of DWARF sections.
    • Emits a normalized ElfMetadata DTO.

  3. Symbolization Layer

    • Sub-components:

      • DwarfSymbolReader: if DWARF present, read CU, function ranges, names, inlines.

      • DynsymReader: parse .dynsym, .plt, exported symbols.

      • HeuristicFunctionFinder:

        • For stripped binaries:

          • Use disassembler xrefs, prolog patterns, return instructions, call-targets.
          • Recognize PLT thunks → soname!symbol.
    • Consolidates into FunctionSymbol entities:

      • id (e.g., main, sub_401000, libc.so.6!memcpy),
      • addr, size, is_external, from (dwarf, dynsym, heuristic).
  4. Disassembly & IR Layer

    • Abstraction: IDisassemblyAdapter:

      • Task<DisasmResult> AnalyzeAsync(BinaryArtifact, ElfMetadata, ScanManifest)
    • Implementations:

      • GhidraDisassemblyAdapter:

        • Invokes headless Ghidra in container,
        • Receives machine-readable JSON (script-produced),
        • Extracts functions, basic blocks, calls, GOT/PLT info, vtables.
      • RizinDisassemblyAdapter (backup/fallback).

    • Produces:

      • BasicBlock objects,
      • Instruction metadata where needed for calls,
      • CallSite records (direct, PLT, indirect).
  5. Call-Graph Builder

    • Consumes FunctionSymbol + CallSite sets.

    • Identifies roots:

      • _start, .init_array entries,
      • main (if present),
      • Exported API functions for shared libs.
    • Creates CallGraph:

      • Nodes: functions (FunctionNode),

      • Edges: CallEdge with:

        • kind: direct, plt, indirect-funcptr, indirect-vtable, tailcall,
        • evidence: tags like ["reloc@GOT", "sig-match", "vtable-class"].
  6. Evidence & Confidence Annotator

    • For each edge, computes a local confidence:

      • direct: 1.0
      • plt: 0.98
      • indirect-funcptr: 0.7
      • indirect-vtable: 0.85
    • For each path later, Scanner.WebService composes these.

  7. NJIF Serializer

    • Transforms domain objects into NJIF JSON:

      • Sorted keys, stable ordering for determinism.
    • Writes:

      • artifact, elf, symbols, cfg, cg, and partial reachability: [] (filled by WebService).
    • Stores in object store, returns location + hash to DB.

  8. Unknowns Reporting

    • Any unresolved:

      • Indirect call with empty target set,
      • Function region not mapped to symbol,
    • Logged as UnknownEvidence records and optionally published to UnknownsRegistry stream.
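The ElfDetector layer at the start of this pipeline needs only the first 20 bytes of a file. The Python sketch below checks the magic and reads class, endianness, type, and machine from the ELF header. The function name and the returned dictionary shape are assumptions; the offsets and `e_machine` constants come from the ELF specification.

```python
import struct
from typing import Optional

ELF_MAGIC = b"\x7fELF"
# e_machine values: EM_386, EM_ARM, EM_X86_64, EM_AARCH64, EM_RISCV.
MACHINES = {3: "x86", 40: "arm", 62: "x86_64", 183: "aarch64", 243: "riscv"}
ELF_TYPES = {2: "ET_EXEC", 3: "ET_DYN"}


def detect_elf(header: bytes) -> Optional[dict]:
    """Return basic ELF facts from the leading bytes, or None if not ELF."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        return None
    ei_class, ei_data = header[4], header[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        return None
    endian = "<" if ei_data == 1 else ">"
    # e_type and e_machine follow the 16-byte e_ident array.
    e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    return {
        "bits": 32 if ei_class == 1 else 64,
        "endianness": "little" if ei_data == 1 else "big",
        "type": ELF_TYPES.get(e_type, f"unknown({e_type})"),
        "arch": MACHINES.get(e_machine, f"unknown({e_machine})"),
    }


# Synthetic header: 64-bit, little-endian, ET_DYN, x86_64.
sample = ELF_MAGIC + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 3, 62)
info = detect_elf(sample)
```

A PIE executable reports `ET_DYN` just like a shared library, so distinguishing the two would need further inspection (for example, the presence of `PT_INTERP`), which is left to the ElfNormalizer.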


4. NJIF Data Model (Neutral JSON Intermediate Format)

Define a stable schema with a top-level njif_schema_version field.

4.1 Top-Level Shape

{
  "njif_schema_version": "1.0.0",
  "artifact": { ... },
  "symbols": { ... },
  "cfg": [ ... ],
  "cg": { ... },
  "reachability": [ ... ],
  "provenance": { ... }
}
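A consumer can cheaply reject malformed or incompatible documents before doing any graph work. The Python sketch below checks the top-level sections and the schema version. The function name is hypothetical, and treating only a matching major version as compatible is an assumption about how `njif_schema_version` would be interpreted.

```python
REQUIRED_SECTIONS = (
    "njif_schema_version", "artifact", "symbols",
    "cfg", "cg", "reachability", "provenance",
)
SUPPORTED_MAJOR = 1  # assumption: major version gates compatibility


def validate_top_level(doc: dict) -> list:
    """Return a list of human-readable problems; empty means acceptable."""
    errors = [f"missing section: {key}"
              for key in REQUIRED_SECTIONS if key not in doc]
    version = doc.get("njif_schema_version")
    if isinstance(version, str):
        major = version.split(".")[0]
        if not major.isdigit() or int(major) != SUPPORTED_MAJOR:
            errors.append(f"unsupported schema version: {version}")
    return errors


good = {
    "njif_schema_version": "1.0.0", "artifact": {}, "symbols": {},
    "cfg": [], "cg": {}, "reachability": [], "provenance": {},
}
bad = {key: value for key, value in good.items() if key != "cg"}
bad["njif_schema_version"] = "2.0.0"

good_errors = validate_top_level(good)
bad_errors = validate_top_level(bad)
```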

4.2 Key Sections

  1. artifact

    • path, hashes, arch, elf.type, interpreter, needed, rpath, runpath.
  2. symbols

    • exported: external/dynamic symbols, especially PLT:

      • id, kind, plt, lib.
    • functions:

      • id (synthetic or real name),
      • addr, size, from (source of naming info),
      • name_hint (optional).
  3. cfg

    • Per-function basic block CFG plus call sites:

      • Blocks with succ, calls entries.
    • Sufficient for future static checks, not full IR.

  4. cg

    • nodes: function nodes with evidence tags.

    • edges: call edges with:

      • from, to, kind, evidence.
    • roots: entrypoints for reachability algorithms.

  5. reachability

    • Initially empty from Worker.
    • Populated in Scanner.WebService as:
{
  "target": "libssl.so.3!SSL_free",
  "status": "REACHABLE_CONFIRMED",
  "path": ["_start", "main", "libssl.so.3!SSL_free"],
  "confidence": 0.93,
  "evidence": ["plt", "dynsym", "reloc"]
}
  6. provenance

    • toolchain:

      • disasm: "ghidra_headless:10.4", etc.
    • scan_manifest_hash,

    • timestamp_utc.

4.3 Persisting NJIF

  • Object store (versioned path):

    • njif/{sha256}/njif-v1.json
  • DB table binary_njif:

    • binary_hash, njif_hash, schema_version, toolchain_digest, scan_manifest_id.

5. Reachability & Lattice Integration (Scanner.WebService)

5.1 Inputs

  • NJIF for each binary (possibly multiple binaries per container).

  • Concelier's CVE → (component, function) resolution:

    • component_id → soname!symbol sets, and where available, function hashes.
  • Scanner's existing lattice policies:

    • States: e.g. NOT_OBSERVED < POSSIBLE < REACHABLE_CONFIRMED.
    • Merge rules are monotone.

5.2 Reachability Engine

New service module:

  • StellaOps.Scanner.Domain.Reachability

    • INjifRepository (reads NJIF JSON),
    • IFunctionMappingResolver (Concelier adapter),
    • IReachabilityCalculator.

Algorithm per target function:

  1. Resolve vulnerable function(s):

    • From Concelier: soname!symbol and/or func_hash.
    • Map to NJIF symbols.exported or symbols.functions.
  2. For each binary:

    • Use cg.roots as entry set.

    • BFS/DFS along edges until:

      • Reaching target node(s),
      • Or graph fully explored.
  3. For each successful path:

    • Collect the edges' confidence weights, compute path confidence:

      • e.g., product of edge confidences or a log/additive scheme.
  4. Aggregate result:

    • If ≥ 1 path with only direct/plt edges:

      • status = REACHABLE_CONFIRMED.
    • Else if only paths with indirect edges:

      • status = REACHABLE_POSSIBLE.
    • Else:

      • status = NOT_REACHABLE_FOUNDATION.
  5. Emit reachability entry back into NJIF (or as separate DB table) and into scan result graph.
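The search and aggregation steps above can be sketched as two breadth-first passes: one restricted to direct/plt edges, and a fallback over all edges. This is a minimal Python illustration under the assumption that `cg` has the `roots` and `edges` shape from the NJIF examples; the function names are hypothetical.

```python
from collections import deque
from typing import Optional

# Per the aggregation rule, only these edge kinds count as precise.
PRECISE_KINDS = {"direct", "plt"}


def find_path(edges, roots, target, allowed_kinds=None) -> Optional[list]:
    """BFS from the roots; return the first path found to target, or None."""
    adjacency = {}
    for edge in edges:
        if allowed_kinds is None or edge["kind"] in allowed_kinds:
            adjacency.setdefault(edge["from"], []).append(edge["to"])
    for successors in adjacency.values():
        successors.sort()  # stable exploration order
    parent = {root: None for root in sorted(roots)}
    queue = deque(sorted(roots))
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def classify(cg: dict, target: str) -> dict:
    """Apply the CONFIRMED / POSSIBLE / NOT_REACHABLE_FOUNDATION rule."""
    precise = find_path(cg["edges"], cg["roots"], target, PRECISE_KINDS)
    if precise is not None:
        return {"target": target, "status": "REACHABLE_CONFIRMED", "path": precise}
    any_path = find_path(cg["edges"], cg["roots"], target)
    if any_path is not None:
        return {"target": target, "status": "REACHABLE_POSSIBLE", "path": any_path}
    return {"target": target, "status": "NOT_REACHABLE_FOUNDATION", "path": []}


cg = {
    "roots": ["_start"],
    "edges": [
        {"from": "_start", "to": "main", "kind": "direct"},
        {"from": "main", "to": "sub_401000", "kind": "direct"},
        {"from": "main", "to": "libc.so.6!memcpy", "kind": "plt"},
        {"from": "sub_401000", "to": "handler", "kind": "indirect-funcptr"},
    ],
}
memcpy_result = classify(cg, "libc.so.6!memcpy")
handler_result = classify(cg, "handler")
ssl_result = classify(cg, "libssl.so.3!SSL_free")
```

Running the precise pass first means a target is reported as confirmed whenever any all-precise path exists, even if a shorter path through an indirect edge is also present. Sorting roots and successors keeps the reported proof path stable across runs.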

5.3 Lattice & VEX

  • Lattice computation is done per (CVE, component, binary) triple:

    • Input: reachability status + other signals.
  • Resulting state is:

    • Exposed to Excitior as a set of evidence-annotated VEX facts.
  • Excitior translates:

    • NOT_REACHABLE_FOUNDATION → likely not_affected with justification “code_not_reachable”.
    • REACHABLE_CONFIRMED → affected or “present_and_exploitable” (depending on overall policy).
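The translation step can be pictured as a lookup table. In the Python sketch below, the two mappings for NOT_REACHABLE_FOUNDATION and REACHABLE_CONFIRMED use the strings from the text above; mapping REACHABLE_POSSIBLE to `under_investigation` is purely an assumption, since the text does not say how that state is translated. The fact shape and function name are also hypothetical.

```python
VEX_MAPPING = {
    "NOT_REACHABLE_FOUNDATION": {
        "status": "not_affected", "justification": "code_not_reachable"},
    "REACHABLE_CONFIRMED": {
        "status": "affected", "justification": None},
    # Assumption: not specified in the text above.
    "REACHABLE_POSSIBLE": {
        "status": "under_investigation", "justification": None},
}


def to_vex_fact(cve_id: str, component: str, binary: str, reach: dict) -> dict:
    """Build one evidence-annotated fact for a (CVE, component, binary) triple."""
    mapped = VEX_MAPPING[reach["status"]]
    return {
        "cve_id": cve_id,
        "component": component,
        "binary": binary,
        "status": mapped["status"],
        "justification": mapped["justification"],
        # Carry the proof path and confidence along as evidence.
        "evidence": {
            "reachability": reach["status"],
            "path": reach.get("path", []),
            "confidence": reach.get("confidence"),
        },
    }


fact = to_vex_fact(
    "CVE-0000-0000", "libssl", "/work/bin/app",
    {"target": "libssl.so.3!SSL_free",
     "status": "NOT_REACHABLE_FOUNDATION", "path": []},
)
```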

6. Patch-Oracle Extension (Advanced, but Architected Now)

While not strictly required for v1, we should reserve architecture hooks.

6.1 Concept

  • Given:

    • A vulnerable library build (or binary),
    • A patched build.
  • Run analyzers on both; produce NJIF for each.

  • Compare call graphs & function bodies (e.g., hash of normalized bytes):

    • Identify changed functions and potentially changed code regions.
  • Concelier links those function IDs to specific CVEs (via vendor patch metadata).

  • These become authoritative “patched function sets” (the patch oracle).

6.2 Integration Points

Add a module:

  • StellaOps.Scanner.Analysis.PatchOracle

    • Input: pair of artifact hashes (old, new) + NJIF.

    • Output: list of FunctionPatchRecord:

      • function_id, binary_hash_old, binary_hash_new, change_kind (added, modified, deleted).
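A minimal Python sketch of the comparison follows. It assumes each entry in `symbols.functions` carries a `body_hash` field (a hash of normalized bytes); that field is hypothetical and not part of the NJIF examples shown earlier, and so is the function name.

```python
def diff_functions(old_njif: dict, new_njif: dict,
                   binary_hash_old: str, binary_hash_new: str) -> list:
    """Compare two NJIF documents and emit FunctionPatchRecord-like dicts."""
    old = {f["id"]: f["body_hash"] for f in old_njif["symbols"]["functions"]}
    new = {f["id"]: f["body_hash"] for f in new_njif["symbols"]["functions"]}
    records = []
    for function_id in sorted(old.keys() | new.keys()):
        if function_id not in new:
            change_kind = "deleted"
        elif function_id not in old:
            change_kind = "added"
        elif old[function_id] != new[function_id]:
            change_kind = "modified"
        else:
            continue  # unchanged functions produce no record
        records.append({
            "function_id": function_id,
            "binary_hash_old": binary_hash_old,
            "binary_hash_new": binary_hash_new,
            "change_kind": change_kind,
        })
    return records


vulnerable = {"symbols": {"functions": [
    {"id": "parse_header", "body_hash": "aa"},
    {"id": "legacy_copy", "body_hash": "bb"},
    {"id": "init", "body_hash": "cc"},
]}}
patched = {"symbols": {"functions": [
    {"id": "parse_header", "body_hash": "a2"},
    {"id": "init", "body_hash": "cc"},
    {"id": "bounds_check", "body_hash": "dd"},
]}}
records = diff_functions(vulnerable, patched, "old-sha", "new-sha")
```

Matching by `id` is only reliable for named functions. Synthetic names such as sub_<addr> shift between builds, so a real comparison of stripped binaries would need to match functions by content or graph position instead.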

Concelier:

  • Ingests FunctionPatchRecord via internal API and updates advisory graph:

    • CVE → function set derived from real patch.
Reachability Engine:

  • Uses patch-derived function sets instead of or in addition to symbol mapping from vendor docs.

7. Persistence, Determinism, Caching

7.1 Scan Manifest

For every scan job, create:

  • scan_manifest:

    • Input artifact hashes,
    • List of binaries,
    • Tool container digests (Ghidra, rizin, etc.),
    • Ruleset/policy/lattice hashes,
    • Time, user, and config flags.

Authority signs this manifest with DSSE.

7.2 Binary Analysis Cache

Key: (binary_hash, arch, toolchain_digest, njif_schema_version).

  • If present:

    • Skip re-running Ghidra/rizin; reuse NJIF.
  • If absent:

    • Run analysis, then cache NJIF.

This provides deterministic replay and prevents re-analysis across scans and across customers (if allowed by tenancy model).
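The cache lookup can be sketched as follows in Python. The four key components are the ones listed above; the in-memory `NjifCache` class, the NUL-separated key encoding, and the `analyze` callback are assumptions made for the example, standing in for the object store and the real analyzer.

```python
import hashlib


def cache_key(binary_hash: str, arch: str,
              toolchain_digest: str, njif_schema_version: str) -> str:
    """Derive one stable key from the four cache-key components."""
    parts = (binary_hash, arch, toolchain_digest, njif_schema_version)
    # NUL separator avoids ambiguity between adjacent fields.
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()


class NjifCache:
    """In-memory stand-in for the NJIF object store."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_analyze(self, key: str, analyze):
        """Reuse a cached NJIF if present; otherwise run the analysis once."""
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = analyze()
        self._store[key] = result
        return result


runs = []


def fake_analysis():
    runs.append(1)  # count how often the expensive step actually runs
    return {"njif_schema_version": "1.0.0"}


cache = NjifCache()
key = cache_key("sha256:abc", "x86_64", "sha256:tools", "1.0.0")
first = cache.get_or_analyze(key, fake_analysis)
second = cache.get_or_analyze(key, fake_analysis)
other_key = cache_key("sha256:abc", "x86_64", "sha256:tools-v2", "1.0.0")
```

Including the toolchain digest and schema version in the key means a tool upgrade or schema change produces a different key and therefore a fresh analysis, rather than silently reusing stale output.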


8. APIs & Integration Contracts

8.1 Scanner.WebService External API (REST)

  1. POST /api/scans/images

    • Existing; extended to flag: includeBinaryReachability: true.
  2. POST /api/scans/binaries

    • Upload a standalone ELF; returns scan_id.
  3. GET /api/scans/{scanId}/reachability

    • Returns list of (cve_id, component, binary_path, function_id, status, confidence, path).

No path versioning; idempotent and additive (new fields appear, old ones remain valid).

8.2 Internal APIs

  • Worker ↔ Object Store:

    • PUT /binary-njif/{sha256}/njif-v1.json.
  • WebService ↔ Worker (via Scheduler):

    • Job payload includes:

      • scan_manifest_id,
      • binary_hashes,
      • analysis_profile (default, deep).
  • WebService ↔ Concelier:

    • POST /internal/functions/resolve:

      • Input: (cve_id, component_ids[]),
      • Output: soname!symbol[], optional func_hash[].
  • WebService ↔ Excitior:

    • Existing VEX ingestion extended with reachability evidence fields.

9. Observability, Security, Resource Model

9.1 Observability

  • Metrics:

    • Analysis duration per binary,
    • NJIF size,
    • Cache hit ratio,
    • Reachability evaluation time per CVE.
  • Logs:

    • Ghidra/rizin container logs stored alongside NJIF,
    • Unknowns logs for unresolved call targets.
  • Tracing:

    • Each scan/analysis annotated with scan_manifest_id to allow end-to-end trace.

9.2 Security

  • Tools containers:

    • No outbound network.
    • Limited to read-only artifact mount + write-only result mount.
  • Binary content:

    • Treated as confidential; stored encrypted at rest if your global policy requires it.
  • DSSE:

    • Authority signs:

      • Scan Manifest,
      • NJIF blob hash,
      • Reachability summary.
    • Enables “Proof-of-Integrity Graph” linkage later.

9.3 Resource Model

  • ELF analysis can be heavy; design for:

    • Separate worker queue and autoscaling group for binary analysis.
    • Configurable max concurrency and per-job CPU/memory limits.
  • Deep analysis (indirect calls, vtables) can be toggled via analysis_profile.


10. Implementation Roadmap

A pragmatic, staged plan:

Phase 0: Foundations (1-2 sprints)

  • Create StellaOps.Scanner.Analyzers.Binary.Elf project.

  • Implement:

    • ElfDetector, ElfNormalizer.
    • DB tables: binary_artifacts, binary_njif.
  • Integrate with Scheduler and Worker pipeline.

Phase 1: Non-stripped ELF + NJIF v1 (2-3 sprints)

  • Implement DWARF + dynsym symbolization.

  • Implement GhidraDisassemblyAdapter for x86_64.

  • Build CallGraphBuilder (direct + PLT calls).

  • Implement NJIF serializer v1; store in object store.

  • Basic reachability engine in WebService:

    • Only direct and PLT edges,
    • Only for DWARF-named functions.
  • Integrate with Concelier function mapping via soname!symbol.

Phase 2: Stripped ELF Support (2-3 sprints)

  • Implement HeuristicFunctionFinder for function discovery in stripped binaries.

  • Extend Ghidra script to mark PLT/GOT, vtables, function pointers.

  • Call graph: add:

    • indirect-funcptr, indirect-vtable, tailcall edges.
  • Evidence tagging and local confidence scoring.

  • Extend reachability engine to:

    • Distinguish REACHABLE_CONFIRMED vs REACHABLE_POSSIBLE.

Phase 3: Multi-Arch & Performance (2-3 sprints)

  • Add support for aarch64 (Ghidra language, appropriate calling conventions).

  • Optimize:

    • Binary analysis cache,
    • Tool container lifecycle,
    • Concurrent analysis.
  • Add Unknowns reporting and hookup to UnknownsRegistry (if already implemented).

Phase 4: Patch-Oracle Pilot (2-3 sprints)

  • Implement PatchOracle module:

    • Compare old/new NJIFs,
    • Detect changed functions.
  • Integrate with Concelier's advisory graph.

  • Start validating against curated CVE/patch datasets.

Phase 5: Hardening & Documentation

  • Golden fixtures:

    • Small ELF zoo (stripped/non-stripped, PIE, static, C++, vtables).
    • Known vulnerable libs (e.g., OpenSSL, zlib) to confirm correct function mapping.
  • Add CLI/demo in StellaOps.Scanner.Cli:

    • stellaops scan-binary --file app --show-reachability.
  • Customer-facing and internal docs:

    • NJIF schema,
    • API usage,
    • Limitations and interpretation guidelines.

If you want, as a next step I can take this plan and:

  • Break it into epics / tickets (SCAN-BINARY-xxx) with clear DoD per phase, or
  • Draft the Ghidra headless Java script and the .NET NJIF model classes so your agents can plug them straight into the Scanner repo.