feat: Add comprehensive documentation for binary reachability with PURL-resolved edges

- Introduced a detailed specification for encoding binary reachability that integrates call graphs with SBOMs.
- Defined a minimal data model including nodes, edges, and SBOM components.
- Outlined a step-by-step guide for building the reachability graph in a C#-centric manner.
- Established core domain models, including enumerations for binary formats and symbol kinds.
- Created a public API for the binary reachability service, including methods for graph building and serialization.
- Specified SBOM component resolution and binary parsing abstractions for PE, ELF, and Mach-O formats.
- Enhanced symbol normalization and digesting processes to ensure deterministic signatures.
- Included error handling, logging, and a high-level test plan to ensure robustness and correctness.
- Added non-functional requirements to guide performance, memory usage, and thread safety.

2025-11-20 23:16:02 +02:00

Here’s a compact blueprint for bringing stripped ELF binaries into StellaOps’s call‑graph + reachability scoring—from raw bytes → neutral JSON → deterministic scoring.

Why this matters (quick)

Even when symbols are missing, you can still (1) recover functions, (2) build a call graph, and (3) decide if a vulnerable function is actually reachable from the binary’s entrypoints. This feeds StellaOps’s deterministic scoring/lattice engine so VEX decisions are evidence‑backed, not guesswork.

High‑level pipeline

Ingest

Accept: ELF (static/dynamic), PIE, musl/glibc, multiple arches (x86_64, aarch64, armhf, riscv64).
Normalize: compute file hash set (SHA‑256, BLAKE3), note PT_DYNAMIC, DT_NEEDED, interpreter, RPATH/RUNPATH.
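
A minimal sketch of the ingest step, assuming only the Python standard library. It computes SHA-256 (BLAKE3 needs a third-party package, so it is left out here) and reads the ELF identification bytes plus the type and machine fields. The function name `read_elf_ident` and the returned dictionary layout are illustrative and not part of StellaOps.

```python
import hashlib
import struct

ELF_TYPES = {1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}
# e_machine values from the ELF specification.
ELF_MACHINES = {40: "arm", 62: "x86_64", 183: "aarch64", 243: "riscv"}

def read_elf_ident(data: bytes) -> dict:
    """Return the hash and basic header fields for an ELF image held in memory."""
    if len(data) < 20 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    endian = "<" if ei_data == 1 else ">"   # 1 = ELFDATA2LSB, 2 = ELFDATA2MSB
    # e_type and e_machine are two 16-bit fields right after the 16-byte e_ident.
    e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "bits": 64 if ei_class == 2 else 32,
        "endianness": "little" if ei_data == 1 else "big",
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "arch": ELF_MACHINES.get(e_machine, hex(e_machine)),
    }
```

Dynamic-section fields such as DT_NEEDED and RPATH/RUNPATH require walking the program headers and are better left to an ELF parsing library.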

Symbolization (best‑effort)

If DWARF present: read .debug_* (function names, inlines, CU boundaries, ranges).
If stripped:
- Use disassembler to discover functions (prolog patterns, xref‑to‑targets, thunk detection).
- Derive synthetic names: sub_<va>, plt_<name> (from dynamic symbol table if available), extern@libc.so.6:memcpy.
- Lift exported dynsyms and PLT stubs even when local symbols are removed.
- Recover string‑referenced names (e.g., Go/Python/C++ RTTI/Itanium mangling where present).
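
The naming rules can be sketched as one small helper. This is an illustration under assumptions: the function name `function_id` and its parameters are hypothetical, and external symbols use the `soname!symbol` form that the NJIF examples use rather than the `extern@...` spelling.

```python
from typing import Optional

def function_id(addr: int, name: Optional[str] = None,
                soname: Optional[str] = None, is_plt: bool = False) -> str:
    """Build a stable function identifier following the naming rules above."""
    if soname and name:
        return f"{soname}!{name}"   # external symbol resolved to its library
    if is_plt and name:
        return f"plt_{name}"        # PLT stub named from the dynamic symbol table
    if name:
        return name                 # real name, e.g. from DWARF
    return f"sub_{addr:x}"          # synthetic name from the virtual address
```

Because the synthetic name depends only on the virtual address, two runs over the same binary yield the same identifier, which is what deterministic replay needs.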

Disassembly & IR

Disassemble to basic blocks; lift to a neutral IR (SSA‑like) sufficient for:
- Call edges (direct call/bl).
- Indirect calls via GOT/IAT, vtables, function pointers (approximate with points‑to sets).
- Tailcalls, thunks, PLT interposition.

Call‑graph build

Start from entrypoints:
- ELF entry (_start), constructors (.init_array), exported API (public symbols), main (if recoverable).
- Optional: entry‑trace (cmd‑line + env + loader path) from container image to seed realistic roots.
Build CG with:
- Direct edges: precise.
- Indirect edges: conservative, with evidence tags (GOT target set, vtable class set, signature match).
Record inter‑module edges to shared libs (soname + version) with relocation evidence.

Reachability scoring (deterministic)

Input: list of vulnerable functions/paths (from CSAF/CVE KB) normalized to function‑level identifiers (soname!symbol or hash‑based if unnamed).
Compute reachability from roots → target and assign exactly one status:
- REACHABLE_CONFIRMED (at least one path with only precise edges),
- REACHABLE_POSSIBLE (paths exist, but each contains a conservative edge),
- NOT_REACHABLE_FOUNDATION (no path in current graph).
Attach a confidence derived from edge evidence + relocation proof.
Emit proof trails (the exact path: nodes, edges, evidence).
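
The classification above can be sketched with two breadth-first searches: one restricted to precise edges, one over all edges. This is a minimal sketch, assuming that `direct` and `plt` edges count as precise and that edges are dictionaries with `from`, `to` and `kind` keys as in the NJIF example; the helper names are hypothetical.

```python
from collections import deque

PRECISE = {"direct", "plt"}   # assumption: every other edge kind is conservative

def _bfs(edges, roots, target, allowed=None):
    """Return a shortest root-to-target path using only kinds in `allowed` (None = any)."""
    adj = {}
    for e in edges:
        if allowed is None or e["kind"] in allowed:
            adj.setdefault(e["from"], []).append(e["to"])
    for succs in adj.values():
        succs.sort()                      # stable ordering gives deterministic paths
    parent = {r: None for r in sorted(roots)}
    queue = deque(sorted(roots))
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:       # walk parents back to the root
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

def classify(edges, roots, target):
    """Return (status, proof_path) for one target."""
    path = _bfs(edges, roots, target, PRECISE)
    if path:
        return "REACHABLE_CONFIRMED", path
    path = _bfs(edges, roots, target)
    if path:
        return "REACHABLE_POSSIBLE", path
    return "NOT_REACHABLE_FOUNDATION", []
```

The returned path doubles as the proof trail; the per-edge evidence can be looked up from the edge list when the trail is emitted.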

Neutral JSON intermediate (NJIF)

Stored in cache; signed for deterministic replay.
Consumed by StellaOps.Policy/Lattice to merge with VEX.

Neutral JSON Intermediate Format (NJIF)

{
  "artifact": {
    "path": "/work/bin/app",
    "hashes": {"sha256": "…", "blake3": "…"},
    "arch": "x86_64",
    "elf": {
      "type": "ET_DYN",
      "interpreter": "/lib64/ld-linux-x86-64.so.2",
      "needed": ["libc.so.6", "libssl.so.3"],
      "rpath": [],
      "runpath": []
    }
  },
  "symbols": {
    "exported": [
      {"id": "libc.so.6!memcpy", "kind": "dynsym", "addr": "0x0", "plt": true}
    ],
    "functions": [
      {"id": "sub_401000", "addr": "0x401000", "size": 112, "name_hint": null, "from": "disasm"},
      {"id": "main", "addr": "0x4023d0", "size": 348, "from": "dwarf|heuristic"}
    ]
  },
  "cfg": [
    {"func": "main", "blocks": [
      {"b": "0x4023d0", "succ": ["0x402415"], "calls": [{"type": "direct", "target": "sub_401000"}]},
      {"b": "0x402415", "succ": ["0x402440"], "calls": [{"type": "plt", "target": "libc.so.6!memcpy"}]}
    ]}
  ],
  "cg": {
    "nodes": [
      {"id": "main", "evidence": ["dwarf|heuristic"]},
      {"id": "sub_401000"},
      {"id": "libc.so.6!memcpy", "external": true, "lib": "libc.so.6"}
    ],
    "edges": [
      {"from": "main", "to": "sub_401000", "kind": "direct"},
      {"from": "main", "to": "libc.so.6!memcpy", "kind": "plt", "evidence": ["reloc@GOT"]}
    ],
    "roots": ["_start", "init_array[]", "main"]
  },
  "reachability": [
    {
      "target": "libssl.so.3!SSL_free",
      "status": "NOT_REACHABLE_FOUNDATION",
      "path": []
    },
    {
      "target": "libc.so.6!memcpy",
      "status": "REACHABLE_CONFIRMED",
      "path": ["main", "libc.so.6!memcpy"],
      "confidence": 0.98,
      "evidence": ["plt", "dynsym", "reloc"]
    }
  ],
  "provenance": {
    "toolchain": {
      "disasm": "ghidra_headless|radare2|llvm-mca",
      "version": "…"
    },
    "scan_manifest_hash": "…",
    "timestamp_utc": "2025-11-16T00:00:00Z"
  }
}
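
A consumer may want a cheap consistency check before trusting the graph. The sketch below reports ids that appear in `cg.edges` or `cg.roots` but have no entry in `cg.nodes`. In the example above, `_start` and `init_array[]` are listed as roots without matching node entries, so a consumer has to decide whether such ids are implicit nodes. The function name `dangling_refs` is hypothetical.

```python
import json

def dangling_refs(njif: dict) -> list:
    """Return ids used by cg.edges or cg.roots that are missing from cg.nodes."""
    cg = njif["cg"]
    known = {n["id"] for n in cg["nodes"]}
    used = set(cg.get("roots", []))
    for e in cg.get("edges", []):
        used.update((e["from"], e["to"]))
    return sorted(used - known)

doc = json.loads("""
{"cg": {"nodes": [{"id": "main"}, {"id": "sub_401000"}],
        "edges": [{"from": "main", "to": "sub_401000", "kind": "direct"}],
        "roots": ["_start", "main"]}}
""")
print(dangling_refs(doc))   # ['_start']
```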

Practical extractors (headless/CLI)

DWARF: llvm-dwarfdump/eu-readelf for quick CU/function ranges; fall back to the disassembler.
Disassembly/CFG/CG (choose one or more; wrap with a stable adapter):
- Ghidra Headless API: recover functions, basic blocks, references, PLT/GOT, vtables; export via a custom headless script to NJIF.
- radare2 / rizin: aaa for analysis, then agCj, aflj, agj to export the call graph, function list, and function graphs as JSON.
- Binary Ninja headless (if license permits) for cleaner IL and indirect‑call modeling.
- angr for path‑sensitive refinement on tricky indirect calls (optional, gated by budget).

Adapter principle: All tools output a small, consistent NJIF so the scoring engine and lattice logic never depend on any single RE tool.

Indirect call modeling (concise rules)

PLT/GOT: edge from caller → soname!symbol with evidence: plt, reloc@GOT.
Function pointers: if a store to a function pointer is found and its possible targets form a known function set {f1…fk}, add one edge per target with kind: "indirect", evidence: ["xref-store", "sig-compatible"].
Virtual calls / vtables: class‑method set from RTTI/vtable scans; mark edges evidence: ["vtable-match"].
Tailcalls: treat as edges, not fallthrough.

Each conservative step lowers confidence, but keeps determinism: the rules and their hashes are in the scan manifest.
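
These rules translate fairly directly into a function that turns one call site into edges. The sketch below is illustrative: the call-site dictionary shape (`type`, `target`, and a `targets` set for indirect calls) is an assumption that extends the `calls` entries in the NJIF example.

```python
def edges_for_callsite(caller: str, site: dict) -> list:
    """Translate one call site into call-graph edges with evidence tags."""
    kind = site["type"]
    if kind == "direct":
        return [{"from": caller, "to": site["target"], "kind": "direct", "evidence": []}]
    if kind == "plt":
        return [{"from": caller, "to": site["target"], "kind": "plt",
                 "evidence": ["plt", "reloc@GOT"]}]
    if kind == "funcptr":
        # One conservative edge per candidate target, in sorted order.
        return [{"from": caller, "to": t, "kind": "indirect",
                 "evidence": ["xref-store", "sig-compatible"]}
                for t in sorted(site["targets"])]
    if kind == "vtable":
        return [{"from": caller, "to": t, "kind": "indirect",
                 "evidence": ["vtable-match"]}
                for t in sorted(site["targets"])]
    if kind == "tailcall":
        # A tail call is a real call edge, not a fallthrough.
        return [{"from": caller, "to": site["target"], "kind": "tailcall", "evidence": []}]
    raise ValueError(f"unknown call site type: {kind}")
```

Sorting the candidate targets keeps the emitted edge list identical across runs even when the points-to set is computed in a different order.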

Deterministic scoring (plug into Stella’s lattice)

Inputs: NJIF, CVE→function mapping (soname!symbol or function hash), policy knobs.
States: {NOT_OBSERVED < POSSIBLE < REACHABLE_CONFIRMED} with monotone merge (never oscillates).
Confidence: product of edge evidences (configurable weights): direct=1.0, plt=0.98, vtable=0.85, funcptr=0.7.
Output: OpenVEX/CSAF annotations + human proof path; signed with DSSE to preserve replayability.
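
The confidence product and the monotone merge are small enough to sketch directly. The weights and state names come from the list above; the function names are hypothetical, and the merge shown is simply the join on a three-element chain.

```python
from math import prod

WEIGHTS = {"direct": 1.0, "plt": 0.98, "vtable": 0.85, "funcptr": 0.7}
ORDER = ["NOT_OBSERVED", "POSSIBLE", "REACHABLE_CONFIRMED"]

def path_confidence(edge_kinds):
    """Multiply the per-edge weights along one path."""
    return prod(WEIGHTS[k] for k in edge_kinds)

def merge(a, b):
    """Join of two states: the higher one in the chain, so merging never moves down."""
    return max(a, b, key=ORDER.index)
```

Because `merge` is commutative, associative and idempotent, the order in which evidence arrives does not change the final state, which is why the result cannot oscillate.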

Minimal Ghidra headless skeleton (exporter idea)

analyzeHeadless /work/gh_proj MyProj -import app -scriptPath scripts \
  -postScript ExportNjif.java /out/app.njif.json

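// ExportNjif.java (outline)
import ghidra.app.script.GhidraScript;
import ghidra.program.model.listing.Function;

public class ExportNjif extends GhidraScript {
  @Override
  public void run() throws Exception {
    // Output path is the first script argument (/out/app.njif.json above).
    String outPath = getScriptArgs()[0];
    // Iterate functions in ascending address order for stable output.
    for (Function fn : currentProgram.getFunctionManager().getFunctions(true)) {
      // collect functions, blocks, calls, externs/PLT
      // map non‑named functions to sub_<addr>
      // detect PLT thunks → dynsym names
    }
    // write NJIF JSON to outPath deterministically (sorted keys, stable ordering)
  }
}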

Integration points in StellaOps

Scanner.Analyzers.Binary.Elf
- ElfNormalizer → hashes, dynamic deps.
- Symbolizer → DWARF reader + HeuristicDisasm (via tool adapter).
- CgBuilder → NJIF builder/merger (multi‑module).
- ReachabilityEngine → path search + confidence math.
- Emitter → NJIF cache + VEX/CSAF notes.
Scheduler: memoize by (hashes, toolchain_version, ruleset_hash) to ensure replayable results.
Authority: sign NJIF + scoring outputs; store manifests (feeds, rule weights, tool versions).

Test fixtures (suggested)

Tiny ELF zoo: statically linked, PIE, stripped/non‑stripped, C++ with vtables, musl vs glibc.
Known CVE libs (e.g., libssl, zlib) with versioned symbols to validate soname!symbol mapping.
Synthetic binaries with function‑pointer tables to validate conservative edges.

If you want, I can generate:

A ready‑to‑run Ghidra headless exporter (Java) that writes NJIF exactly like above.
A small .NET parser that ingests NJIF and emits StellaOps reachability + OpenVEX notes.

Below is a full architecture plan for implementing stripped-ELF binary reachability (call graph + NJIF + deterministic scoring, with a hook for patch-oracles) inside StellaOps.

I will assume .NET 10, existing microservice split (Scanner.WebService, Scanner.Worker, Concelier, Excitior, Authority, Scheduler, Sbomer, Signals), and your standing rule: all lattice logic runs in Scanner.WebService.

1. Scope, Objectives, Non-Goals

1.1 Objectives

Recover function-level call graphs from ELF binaries, including stripped ones:

Support ET_EXEC / ET_DYN / PIE, static & dynamic linking.
Support at least x86_64, aarch64 in v1, later armhf, riscv64.

Produce a neutral, deterministic JSON representation (NJIF):

Tool-agnostic: can be generated from Ghidra, radare2/rizin, Binary Ninja, angr, etc.
Stable identifiers and schema so downstream services don’t depend on a specific RE engine.

Compute function-level reachability for vulnerabilities:

Given CVE → soname!symbol (and later function-hash) mappings from Concelier,
Decide REACHABLE_CONFIRMED / REACHABLE_POSSIBLE / NOT_REACHABLE_FOUNDATION with evidence and confidence.

Integrate with StellaOps lattice and VEX outputs:

Lattice logic runs in Scanner.WebService.
Results flow into Excitior (VEX) and Sbomer (SBOM annotations), preserving provenance.

Enable deterministic replay:

Every analysis run is tied to a Scan Manifest: tool versions, ruleset hashes, policy hashes, container image digests.

1.2 Non-Goals (v1)

No dynamic runtime probes (EventPipe/JFR) in this phase.
No full decompilation; we only need enough IR for calls/edges.
No aggressive path-sensitive analysis (symbolic execution) in v1; that can be a v2 enhancement.

2. High-Level System Architecture

2.1 Components

Scanner.WebService (existing)
- REST/gRPC API for scans.
- Orchestrates analysis jobs via Scheduler.
- Hosts Lattice & Reachability Engine for all artifact types.
- Reads NJIF results, merges with Concelier function mappings and policies.
Scanner.Worker (existing, extended)
- Executes Binary Analyzer Pipelines.
- Invokes RE tools (Ghidra, rizin, etc.) in dedicated containers.
- Produces NJIF and persists it.
Binary Tools Containers (new)
- stellaops-tools-ghidra:<tag>
- stellaops-tools-rizin:<tag>
- Optionally stellaops-tools-angr for advanced passes.
- Pinned versions, no network access (for determinism & air-gap).
Storage & Metadata
- DB (PostgreSQL): scan records, NJIF metadata, reachability summaries.
- Object store (MinIO/S3/Filesystem): NJIF JSON blobs, tool logs.
- Authority: DSSE signatures for Scan Manifest, NJIF, and reachability outputs.
Concelier
- Provides CVE → component → function symbol/hashes resolution.
- Exposes “Link-Not-Merge” graph of advisory, component, and function nodes.
Excitior (VEX)
- Consumes Scanner.WebService reachability states.
- Emits OpenVEX/CSAF with properly justified statuses.
UnknownsRegistry (future)
- Receives unresolvable call edges / ambiguous functions from the analyzer,
- Feeds them into “adaptive security” workflows.

2.2 End-to-End Flow (Binary / Image Scan)

Client requests scan (binary or container image) via Scanner.WebService.
WebService:
- Extracts binaries from OCI layers (if scanning image),
- Registers Scan Manifest,
- Submits a job to Scheduler (queue: binary-elfflow).
Scanner.Worker dequeues the job:
- Detects ELF binaries,
- Runs Binary Analyzer Pipeline for each unique binary hash.
Worker uses tools containers:
- Ghidra/rizin → CFG, function discovery, call graph,
- Converts to NJIF.
Worker persists NJIF + metadata; marks analysis complete.
Scanner.WebService picks up NJIF:
- Fetches advisory function mappings from Concelier,
- Runs Reachability & Lattice scoring,
- Updates scan results and triggers Excitior / Sbomer.

All steps are deterministic given:

Input artifact,
Tool container digests,
Ruleset/policy versions.

3. Binary Analyzer Subsystem (Scanner.Worker)

Introduce a dedicated module:

StellaOps.Scanner.Analyzers.Binary.Elf

3.1 Internal Layers

ElfDetector
- Inspects files in a scan:
  - Magic 0x7f 'E' 'L' 'F',
  - Confirms architecture via ELF header.
- Produces BinaryArtifact records with:
  - hashes (SHA-256, BLAKE3),
  - path in container,
  - arch, endianness.
ElfNormalizer
- Uses a lightweight library (e.g., ELFSharp) to extract:
  - ElfType (ET_EXEC, ET_DYN),
  - interpreter (PT_INTERP),
  - DT_NEEDED list,
  - RPATH/RUNPATH,
  - presence/absence of DWARF sections.
- Emits a normalized ElfMetadata DTO.
Symbolization Layer
- Sub-components:
  - DwarfSymbolReader: if DWARF present, read CU, function ranges, names, inlines.
  - DynsymReader: parse .dynsym, .plt, exported symbols.
  - HeuristicFunctionFinder:
    - For stripped binaries:
      - Use disassembler xrefs, prolog patterns, return instructions, call-targets.
      - Recognize PLT thunks → soname!symbol.
- Consolidates into FunctionSymbol entities:
  - id (e.g., main, sub_401000, libc.so.6!memcpy),
  - addr, size, is_external, from (dwarf, dynsym, heuristic).
Disassembly & IR Layer
- Abstraction: IDisassemblyAdapter:
  - Task<DisasmResult> AnalyzeAsync(BinaryArtifact, ElfMetadata, ScanManifest)
- Implementations:
  - GhidraDisassemblyAdapter:
    - Invokes headless Ghidra in container,
    - Receives machine-readable JSON (script-produced),
    - Extracts functions, basic blocks, calls, GOT/PLT info, vtables.
  - RizinDisassemblyAdapter (backup/fallback).
- Produces:
  - BasicBlock objects,
  - Instruction metadata where needed for calls,
  - CallSite records (direct, PLT, indirect).
Call-Graph Builder
- Consumes FunctionSymbol + CallSite sets.
- Identifies roots:
  - _start, .init_array entries,
  - main (if present),
  - Exported API functions for shared libs.
- Creates CallGraph:
  - Nodes: functions (FunctionNode),
  - Edges: CallEdge with:
    - kind: direct, plt, indirect-funcptr, indirect-vtable, tailcall,
    - evidence: tags like ["reloc@GOT", "sig-match", "vtable-class"].
Evidence & Confidence Annotator
- For each edge, computes a local confidence:
  - direct: 1.0
  - plt: 0.98
  - indirect-funcptr: 0.7
  - indirect-vtable: 0.85
- For each path later, Scanner.WebService composes these.
NJIF Serializer
- Transforms domain objects into NJIF JSON:
  - Sorted keys, stable ordering for determinism.
- Writes:
  - artifact (including its nested elf object), symbols, cfg, cg, and an empty reachability: [] (filled later by WebService).
- Stores in object store, returns location + hash to DB.
Unknowns Reporting
- Any unresolved:
  - Indirect call with empty target set,
  - Function region not mapped to symbol,
- Logged as UnknownEvidence records and optionally published to UnknownsRegistry stream.
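
The "sorted keys, stable ordering" requirement of the NJIF Serializer can be sketched as a canonical serialization followed by a hash. This is a minimal sketch, not the StellaOps serializer: the function names are hypothetical, and only the `cg` lists are sorted here as an example of order-insensitive collections.

```python
import hashlib
import json

def canonical_njif(doc: dict) -> bytes:
    """Serialize so that equal content always produces equal bytes."""
    doc = dict(doc)
    if "cg" in doc:
        cg = dict(doc["cg"])
        # JSON arrays keep their order, so order-insensitive lists are sorted explicitly.
        cg["nodes"] = sorted(cg.get("nodes", []), key=lambda n: n["id"])
        cg["edges"] = sorted(cg.get("edges", []),
                             key=lambda e: (e["from"], e["to"], e["kind"]))
        doc["cg"] = cg
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")

def njif_hash(doc: dict) -> str:
    """SHA-256 over the canonical bytes; suitable as the stored njif_hash."""
    return hashlib.sha256(canonical_njif(doc)).hexdigest()
```

Hashing the canonical bytes rather than whatever a tool adapter happened to emit means two adapters that agree on content also agree on the hash.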

4. NJIF Data Model (Neutral JSON Intermediate Format)

Define a stable schema with a top-level njif_schema_version field.

4.1 Top-Level Shape

{
  "njif_schema_version": "1.0.0",
  "artifact": { ... },
  "symbols": { ... },
  "cfg": [ ... ],
  "cg": { ... },
  "reachability": [ ... ],
  "provenance": { ... }
}
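
A reader of NJIF would likely check this top-level shape before doing anything else. The sketch below is one possible check, assuming semantic-version strings and that a consumer accepts any document with a matching major version; the function name and the error strings are hypothetical.

```python
REQUIRED = {
    "njif_schema_version": str, "artifact": dict, "symbols": dict,
    "cfg": list, "cg": dict, "reachability": list, "provenance": dict,
}

def validate_top_level(doc: dict, supported_major: int = 1) -> list:
    """Return a list of problems; an empty list means the top-level shape is acceptable."""
    problems = []
    for key, typ in REQUIRED.items():
        if key not in doc:
            problems.append(f"missing: {key}")
        elif not isinstance(doc[key], typ):
            problems.append(f"wrong type: {key}")
    version = doc.get("njif_schema_version")
    if isinstance(version, str):
        major = version.split(".")[0]
        if not major.isdigit() or int(major) != supported_major:
            problems.append(f"unsupported schema version: {version}")
    return problems
```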

4.2 Key Sections

artifact
- path, hashes, arch, elf.type, interpreter, needed, rpath, runpath.
symbols
- exported: external/dynamic symbols, especially PLT:
  - id, kind, plt, lib.
- functions:
  - id (synthetic or real name),
  - addr, size, from (source of naming info),
  - name_hint (optional).
cfg
- Per-function basic block CFG plus call sites:
  - Blocks with succ, calls entries.
- Sufficient for future static checks, not full IR.
cg
- nodes: function nodes with evidence tags.
- edges: call edges with:
  - from, to, kind, evidence.
- roots: entrypoints for reachability algorithms.
reachability
- Initially empty from Worker.
- Populated in Scanner.WebService as:

{
  "target": "libssl.so.3!SSL_free",
  "status": "REACHABLE_CONFIRMED",
  "path": ["_start", "main", "libssl.so.3!SSL_free"],
  "confidence": 0.93,
  "evidence": ["plt", "dynsym", "reloc"]
}

provenance
- toolchain:
  - disasm: "ghidra_headless:10.4", etc.
- scan_manifest_hash,
- timestamp_utc.

4.3 Persisting NJIF

Object store (versioned path):
- njif/{sha256}/njif-v1.json
DB table binary_njif:
- binary_hash, njif_hash, schema_version, toolchain_digest, scan_manifest_id.

5. Reachability & Lattice Integration (Scanner.WebService)

5.1 Inputs

NJIF for each binary (possibly multiple binaries per container).
Concelier’s CVE → (component, function) resolution:
- component_id → soname!symbol sets, and where available, function hashes.
Scanner’s existing lattice policies:
- States: e.g. NOT_OBSERVED < POSSIBLE < REACHABLE_CONFIRMED.
- Merge rules are monotone.

5.2 Reachability Engine

New service module:

StellaOps.Scanner.Domain.Reachability
- INjifRepository (reads NJIF JSON),
- IFunctionMappingResolver (Concelier adapter),
- IReachabilityCalculator.

Algorithm per target function:

Resolve vulnerable function(s):
- From Concelier: soname!symbol and/or func_hash.
- Map to NJIF symbols.exported or symbols.functions.
For each binary:
- Use cg.roots as entry set.
- BFS/DFS along edges until:
  - Reaching target node(s),
  - Or graph fully explored.
For each successful path:
- Collect edges’ confidence weights, compute path confidence:
  - e.g., product of edge confidences or a log/additive scheme.
Aggregate result:
- If ≥ 1 path with only direct/plt edges:
  - status = REACHABLE_CONFIRMED.
- Else if paths exist but each contains at least one indirect edge:
  - status = REACHABLE_POSSIBLE.
- Else:
  - status = NOT_REACHABLE_FOUNDATION.
Emit reachability entry back into NJIF (or as separate DB table) and into scan result graph.
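
When several paths reach a target, the engine needs a rule for which confidence to report. One option, sketched here as an assumption rather than the StellaOps design, is the highest-confidence path under the product scheme. Since every weight lies in (0, 1], a Dijkstra-style search on the product finds it. The weights are the local confidences from section 3.1; `tailcall` has no stated weight, so such edges would raise a `KeyError` until one is chosen.

```python
import heapq

EDGE_CONF = {"direct": 1.0, "plt": 0.98, "indirect-vtable": 0.85, "indirect-funcptr": 0.7}

def best_path(edges, roots, target):
    """Return (confidence, path) for the highest-confidence root-to-target path, or (0.0, [])."""
    adj = {}
    for e in edges:
        adj.setdefault(e["from"], []).append((e["to"], EDGE_CONF[e["kind"]]))
    # Max-heap on confidence via negation; ties are broken by the path for determinism.
    heap = [(-1.0, [r]) for r in sorted(roots)]
    heapq.heapify(heap)
    done = set()
    while heap:
        neg_conf, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            return -neg_conf, path
        if node in done:
            continue
        done.add(node)
        for nxt, weight in adj.get(node, []):
            if nxt not in done:
                heapq.heappush(heap, (neg_conf * weight, path + [nxt]))
    return 0.0, []
```

The status decision (confirmed versus possible) stays separate from this number: a path through a vtable edge can have the best confidence and still only justify REACHABLE_POSSIBLE.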

5.3 Lattice & VEX

Lattice computation is done per (CVE, component, binary) triple:
- Input: reachability status + other signals.
Resulting state is:
- Exposed to Excitior as a set of evidence-annotated VEX facts.
Excitior translates:
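- NOT_REACHABLE_FOUNDATION → likely not_affected with justification vulnerable_code_not_in_execute_path (the OpenVEX/CSAF label; CycloneDX names the same idea code_not_reachable).
- REACHABLE_CONFIRMED → affected, or a stronger policy-defined label such as “present_and_exploitable” (depending on overall policy).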
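
The translation can be sketched as a small mapping function. This is an OpenVEX-style fragment only, under several assumptions: the justification uses the OpenVEX label `vulnerable_code_not_in_execute_path`, REACHABLE_POSSIBLE is mapped to `under_investigation` (the text does not specify this case), and the product identifier is whatever the caller supplies. A complete OpenVEX document needs further fields, such as document metadata and an action statement for `affected`.

```python
def vex_statement(cve_id: str, product_id: str, reach_status: str) -> dict:
    """Map a reachability status to a minimal OpenVEX-style statement."""
    stmt = {"vulnerability": {"name": cve_id}, "products": [{"@id": product_id}]}
    if reach_status == "NOT_REACHABLE_FOUNDATION":
        stmt["status"] = "not_affected"
        stmt["justification"] = "vulnerable_code_not_in_execute_path"
    elif reach_status == "REACHABLE_CONFIRMED":
        stmt["status"] = "affected"
    else:
        # Assumption: anything else (e.g. REACHABLE_POSSIBLE) is left for review.
        stmt["status"] = "under_investigation"
    return stmt
```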

6. Patch-Oracle Extension (Advanced, but Architected Now)

While not strictly required for v1, we should reserve architecture hooks.

6.1 Concept

Given:
- A vulnerable library build (or binary),
- A patched build.
Run analyzers on both; produce NJIF for each.
Compare call graphs & function bodies (e.g., hash of normalized bytes):
- Identify changed functions and potentially changed code regions.
Concelier links those function IDs to specific CVEs (via vendor patch metadata).
These become authoritative “patched function sets” (the patch oracle).

6.2 Integration Points

Add a module:

StellaOps.Scanner.Analysis.PatchOracle
- Input: pair of artifact hashes (old, new) + NJIF.
- Output: list of FunctionPatchRecord:
  - function_id, binary_hash_old, binary_hash_new, change_kind (added, modified, deleted).
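
A rough sketch of the comparison step, assuming each build has already been reduced to a map from function id to a hash of its normalized bytes. The NJIF schema shown earlier does not define such a hash field, so the input maps are hypothetical.

```python
def diff_functions(old_hashes: dict, new_hashes: dict,
                   binary_hash_old: str, binary_hash_new: str) -> list:
    """Compare {function_id: body_hash} maps and emit FunctionPatchRecord-like dicts."""
    records = []
    for fid in sorted(old_hashes.keys() | new_hashes.keys()):
        old, new = old_hashes.get(fid), new_hashes.get(fid)
        if old == new:
            continue                      # unchanged function
        if old is None:
            kind = "added"
        elif new is None:
            kind = "deleted"
        else:
            kind = "modified"
        records.append({"function_id": fid,
                        "binary_hash_old": binary_hash_old,
                        "binary_hash_new": binary_hash_new,
                        "change_kind": kind})
    return records
```

Matching by id works for named functions. Synthetic `sub_<addr>` names shift between builds, so stripped binaries would need a matching step (for example by body hash or call-graph neighborhood) before this diff is meaningful.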

Concelier:

Ingests FunctionPatchRecord via internal API and updates advisory graph:
- CVE → function set derived from real patch.
Reachability Engine:
- Uses patch-derived function sets instead of or in addition to symbol mapping from vendor docs.

7. Persistence, Determinism, Caching

7.1 Scan Manifest

For every scan job, create:

scan_manifest:
- Input artifact hashes,
- List of binaries,
- Tool container digests (Ghidra, rizin, etc.),
- Ruleset/policy/lattice hashes,
- Time, user, and config flags.

Authority signs this manifest with DSSE.
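
DSSE does not sign the payload bytes directly; it signs a pre-authentication encoding (PAE) of the payload type and payload. The sketch below shows that encoding applied to a canonically serialized manifest. The manifest fields and the payload type string are hypothetical, and the actual signing (key handling, envelope assembly) is out of scope here.

```python
import hashlib
import json

def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the exact bytes that get signed."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(t), t, len(payload), payload)

def manifest_signing_input(manifest: dict, payload_type: str) -> tuple:
    """Return (manifest_hash, bytes_to_sign) for a scan manifest."""
    body = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(body).hexdigest(), pae(payload_type, body)

# Hypothetical manifest fields and payload type, for illustration only.
manifest = {
    "binaries": ["sha256:aaaa"],
    "tool_digests": {"ghidra": "sha256:bbbb"},
    "ruleset_hash": "sha256:cccc",
}
digest, to_sign = manifest_signing_input(
    manifest, "application/vnd.example.scan-manifest+json")
```

The returned digest is a natural candidate for the `scan_manifest_hash` recorded in NJIF provenance, since it is computed over the same canonical bytes that are signed.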

7.2 Binary Analysis Cache

Key: (binary_hash, arch, toolchain_digest, njif_schema_version).

If present:
- Skip re-running Ghidra/rizin; reuse NJIF.
If absent:
- Run analysis, then cache NJIF.

This supports deterministic replay and avoids repeating analysis across scans and across customers (if the tenancy model allows sharing).
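
A minimal in-memory sketch of the cache lookup, assuming the four key components are plain strings. The class `NjifCache` and the length-prefixed key derivation are illustrative choices, not the StellaOps implementation.

```python
import hashlib

def cache_key(binary_hash: str, arch: str, toolchain_digest: str, schema_version: str) -> str:
    """Derive one lookup key from the four cache-key components."""
    parts = [binary_hash, arch, toolchain_digest, schema_version]
    # Length-prefix each part so different tuples never produce the same byte string.
    blob = b"".join(len(p.encode()).to_bytes(4, "big") + p.encode() for p in parts)
    return hashlib.sha256(blob).hexdigest()

class NjifCache:
    """Toy cache: run the analysis only when the key has not been seen."""

    def __init__(self):
        self._store = {}

    def get_or_analyze(self, key, analyze):
        if key not in self._store:
            self._store[key] = analyze()   # expensive Ghidra/rizin run in practice
        return self._store[key]
```

Including the toolchain digest and schema version in the key means a tool upgrade or schema change produces a cache miss instead of silently reusing stale NJIF.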

8. APIs & Integration Contracts

8.1 Scanner.WebService External API (REST)

POST /api/scans/images
- Existing; extended to flag: includeBinaryReachability: true.
POST /api/scans/binaries
- Upload a standalone ELF; returns scan_id.
GET /api/scans/{scanId}/reachability
- Returns list of (cve_id, component, binary_path, function_id, status, confidence, path).

The API paths are not versioned; requests are idempotent and the contract evolves additively (new fields may appear, existing fields remain valid).

8.2 Internal APIs

Worker ↔ Object Store:
- PUT /binary-njif/{sha256}/njif-v1.json.
WebService ↔ Worker (via Scheduler):
- Job payload includes:
  - scan_manifest_id,
  - binary_hashes,
  - analysis_profile (default, deep).
WebService ↔ Concelier:
- POST /internal/functions/resolve:
  - Input: (cve_id, component_ids[]),
  - Output: soname!symbol[], optional func_hash[].
WebService ↔ Excitior:
- Existing VEX ingestion extended with reachability evidence fields.

9. Observability, Security, Resource Model

9.1 Observability

Metrics:
- Analysis duration per binary,
- NJIF size,
- Cache hit ratio,
- Reachability evaluation time per CVE.
Logs:
- Ghidra/rizin container logs stored alongside NJIF,
- Unknowns logs for unresolved call targets.
Tracing:
- Each scan/analysis annotated with scan_manifest_id to allow end-to-end trace.

9.2 Security

Tools containers:
- No outbound network.
- Limited to read-only artifact mount + write-only result mount.
Binary content:
- Treated as confidential; stored encrypted at rest if your global policy requires it.
DSSE:
- Authority signs:
  - Scan Manifest,
  - NJIF blob hash,
  - Reachability summary.
- Enables “Proof-of-Integrity Graph” linkage later.

9.3 Resource Model

ELF analysis can be heavy; design for:
- Separate worker queue and autoscaling group for binary analysis.
- Configurable max concurrency and per-job CPU/memory limits.
Deep analysis (indirect calls, vtables) can be toggled via analysis_profile.

10. Implementation Roadmap

A pragmatic, staged plan:

Phase 0 – Foundations (1–2 sprints)

Create StellaOps.Scanner.Analyzers.Binary.Elf project.
Implement:
- ElfDetector, ElfNormalizer.
- DB tables: binary_artifacts, binary_njif.
Integrate with Scheduler and Worker pipeline.

Phase 1 – Non-stripped ELF + NJIF v1 (2–3 sprints)

Implement DWARF + dynsym symbolization.
Implement GhidraDisassemblyAdapter for x86_64.
Build CallGraphBuilder (direct + PLT calls).
Implement NJIF serializer v1; store in object store.
Basic reachability engine in WebService:
- Only direct and PLT edges,
- Only for DWARF-named functions.
Integrate with Concelier function mapping via soname!symbol.

Phase 2 – Stripped ELF Support (2–3 sprints)

Implement HeuristicFunctionFinder for function discovery in stripped binaries.
Extend Ghidra script to mark PLT/GOT, vtables, function pointers.
Call graph: add:
- indirect-funcptr, indirect-vtable, tailcall edges.
Evidence tagging and local confidence scoring.
Extend reachability engine to:
- Distinguish REACHABLE_CONFIRMED vs REACHABLE_POSSIBLE.

Phase 3 – Multi-Arch & Performance (2–3 sprints)

Add support for aarch64 (Ghidra language, appropriate calling conventions).
Optimize:
- Binary analysis cache,
- Tool container lifecycle,
- Concurrent analysis.
Add Unknowns reporting and hookup to UnknownsRegistry (if already implemented).

Phase 4 – Patch-Oracle Pilot (2–3 sprints)

Implement PatchOracle module:
- Compare old/new NJIFs,
- Detect changed functions.
Integrate with Concelier’s advisory graph.
Start validating against curated CVE/patch datasets.

Phase 5 – Hardening & Documentation

Golden fixtures:
- Small ELF zoo (stripped/non-stripped, PIE, static, C++, vtables).
- Known vulnerable libs (e.g., OpenSSL, zlib) to confirm correct function mapping.
Add CLI/demo in StellaOps.Scanner.Cli:
- stellaops scan-binary --file app --show-reachability.
Customer-facing and internal docs:
- NJIF schema,
- API usage,
- Limitations and interpretation guidelines.

If you want, as a next step I can take this plan and:

Break it into epics / tickets (SCAN-BINARY-xxx) with clear DoD per phase, or
Draft the Ghidra headless Java script and the .NET NJIF model classes so your agents can plug them straight into the Scanner repo.
