Here’s a compact blueprint for bringing stripped ELF binaries into StellaOps’s call‑graph + reachability scoring—from raw bytes → neutral JSON → deterministic scoring.
Why this matters (quick)
Even when symbols are missing, you can still (1) recover functions, (2) build a call graph, and (3) decide if a vulnerable function is actually reachable from the binary’s entrypoints. This feeds StellaOps’s deterministic scoring/lattice engine so VEX decisions are evidence‑backed, not guesswork.
High‑level pipeline
- Ingest
  - Accept: ELF (static/dynamic), PIE, musl/glibc, multiple arches (x86_64, aarch64, armhf, riscv64).
  - Normalize: compute file hash set (SHA‑256, BLAKE3), note PT_DYNAMIC, DT_NEEDED, interpreter, RPATH/RUNPATH.
- Symbolization (best‑effort)
  - If DWARF present: read .debug_* (function names, inlines, CU boundaries, ranges).
  - If stripped:
    - Use disassembler to discover functions (prologue patterns, xref‑to‑targets, thunk detection).
    - Derive synthetic names: sub_<va>, plt_<name> (from dynamic symbol table if available), extern@libc.so.6:memcpy.
    - Lift exported dynsyms and PLT stubs even when local symbols are removed.
    - Recover string‑referenced names (e.g., Go/Python/C++ RTTI/Itanium mangling where present).
- Disassembly & IR
  - Disassemble to basic blocks; lift to a neutral IR (SSA‑like) sufficient for:
    - Call edges (direct call/bl).
    - Indirect calls via GOT, vtables, function pointers (approximate with points‑to sets).
    - Tailcalls, thunks, PLT interposition.
- Call‑graph build
  - Start from entrypoints:
    - ELF entry (_start), constructors (.init_array), exported API (public symbols), main (if recoverable).
    - Optional: entry‑trace (cmd‑line + env + loader path) from container image to seed realistic roots.
  - Build CG with:
    - Direct edges: precise.
    - Indirect edges: conservative, with evidence tags (GOT target set, vtable class set, signature match).
  - Record inter‑module edges to shared libs (soname + version) with relocation evidence.
- Reachability scoring (deterministic)
  - Input: list of vulnerable functions/paths (from CSAF/CVE KB) normalized to function‑level identifiers (soname!symbol or hash‑based if unnamed).
  - Compute reachability from roots → target:
    - REACHABLE_CONFIRMED (path with only precise edges),
    - REACHABLE_POSSIBLE (path contains conservative edges),
    - NOT_REACHABLE_FOUNDATION (no path in current graph).
    - Add confidence derived from edge evidence + relocation proof.
  - Emit proof trails (the exact path: nodes, edges, evidence).
- Neutral JSON intermediate (NJIF)
  - Stored in cache; signed for deterministic replay.
  - Consumed by StellaOps.Policy/Lattice to merge with VEX.
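The Ingest step can be illustrated with a minimal sketch that checks the ELF magic and reads the class, byte order, type and machine fields from the fixed part of the ELF header. The function name `parse_elf_header` and the returned dictionary shape are assumptions made for illustration, not part of StellaOps; the numeric constants are the standard ELF values.

```python
import struct

ELF_MAGIC = b"\x7fELF"
# Standard e_machine / e_type values from the ELF specification.
MACHINES = {62: "x86_64", 183: "aarch64", 40: "arm", 243: "riscv"}
TYPES = {2: "ET_EXEC", 3: "ET_DYN"}


def parse_elf_header(data: bytes):
    """Return basic ELF facts, or None if the bytes are not an ELF file.

    Hypothetical helper: only the first 20 bytes of the header are read.
    """
    if len(data) < 20 or data[:4] != ELF_MAGIC:
        return None
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        return None
    endian = "<" if ei_data == 1 else ">"
    # e_type and e_machine are two 16-bit fields right after e_ident (16 bytes).
    e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    return {
        "bits": 32 if ei_class == 1 else 64,
        "endianness": "little" if ei_data == 1 else "big",
        "type": TYPES.get(e_type, f"unknown({e_type})"),
        "arch": MACHINES.get(e_machine, f"unknown({e_machine})"),
    }
```

A PIE executable and a shared object both report ET_DYN, so telling them apart needs the dynamic section and interpreter, which this sketch does not read.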
Neutral JSON Intermediate Format (NJIF)
{
"artifact": {
"path": "/work/bin/app",
"hashes": {"sha256": "…", "blake3": "…"},
"arch": "x86_64",
"elf": {
"type": "ET_DYN",
"interpreter": "/lib64/ld-linux-x86-64.so.2",
"needed": ["libc.so.6", "libssl.so.3"],
"rpath": [],
"runpath": []
}
},
"symbols": {
"exported": [
{"id": "libc.so.6!memcpy", "kind": "dynsym", "addr": "0x0", "plt": true}
],
"functions": [
{"id": "sub_401000", "addr": "0x401000", "size": 112, "name_hint": null, "from": "disasm"},
{"id": "main", "addr": "0x4023d0", "size": 348, "from": "dwarf|heuristic"}
]
},
"cfg": [
{"func": "main", "blocks": [
{"b": "0x4023d0", "succ": ["0x402415"], "calls": [{"type": "direct", "target": "sub_401000"}]},
{"b": "0x402415", "succ": ["0x402440"], "calls": [{"type": "plt", "target": "libc.so.6!memcpy"}]}
]}
],
"cg": {
"nodes": [
{"id": "main", "evidence": ["dwarf|heuristic"]},
{"id": "sub_401000"},
{"id": "libc.so.6!memcpy", "external": true, "lib": "libc.so.6"}
],
"edges": [
{"from": "main", "to": "sub_401000", "kind": "direct"},
{"from": "main", "to": "libc.so.6!memcpy", "kind": "plt", "evidence": ["reloc@GOT"]}
],
"roots": ["_start", "init_array[]", "main"]
},
"reachability": [
{
"target": "libssl.so.3!SSL_free",
"status": "NOT_REACHABLE_FOUNDATION",
"path": []
},
{
"target": "libc.so.6!memcpy",
"status": "REACHABLE_CONFIRMED",
"path": ["main", "libc.so.6!memcpy"],
"confidence": 0.98,
"evidence": ["plt", "dynsym", "reloc"]
}
],
"provenance": {
"toolchain": {
"disasm": "ghidra_headless|radare2|llvm-mca",
"version": "…"
},
"scan_manifest_hash": "…",
"timestamp_utc": "2025-11-16T00:00:00Z"
}
}
Practical extractors (headless/CLI)
- DWARF: llvm-dwarfdump / eu-readelf for quick CU/function ranges; fall back to the disassembler.
- Disassembly/CFG/CG (choose one or more; wrap with a stable adapter):
  - Ghidra Headless API: recover functions, basic blocks, references, PLT/GOT, vtables; export via a custom headless script to NJIF.
  - radare2 / rizin: aaa, agCd, aflj, agj to export functions/graphs as JSON.
  - Binary Ninja headless (if license permits) for cleaner IL and indirect‑call modeling.
  - angr for path‑sensitive refinement on tricky indirect calls (optional, gated by budget).
Adapter principle: All tools output a small, consistent NJIF so the scoring engine and lattice logic never depend on any single RE tool.
Indirect call modeling (concise rules)
- PLT/GOT: edge from caller → soname!symbol with evidence: plt, reloc@GOT.
- Function pointers: if a store to a pointer is found and targets a known function set {f1…fk}, add edges with kind: "indirect", evidence: ["xref-store", "sig-compatible"].
- Virtual calls / vtables: class‑method set from RTTI/vtable scans; mark edges evidence: ["vtable-match"].
- Tailcalls: treat as edges, not fallthrough.
Each conservative step lowers confidence, but keeps determinism: the rules and their hashes are in the scan manifest.
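The rules above can be sketched as a small function that turns one call site into NJIF-style edges. The input shape (a dict with `type`, `target` or `targets`) and the function name are hypothetical; only the kind and evidence tags come from the rules listed here.

```python
def edges_for_call_site(caller, site):
    """Map one call-site record to call-graph edges with evidence tags.

    `site` is a hypothetical record, e.g. {"type": "plt", "target": "libc.so.6!memcpy"}
    or {"type": "funcptr", "targets": {"f1", "f2"}}.
    """
    kind = site["type"]
    if kind == "direct":
        return [{"from": caller, "to": site["target"], "kind": "direct"}]
    if kind == "plt":
        return [{"from": caller, "to": site["target"], "kind": "plt",
                 "evidence": ["plt", "reloc@GOT"]}]
    if kind == "tailcall":
        # A tailcall is a real call edge, never a fallthrough.
        return [{"from": caller, "to": site["target"], "kind": "tailcall"}]
    if kind == "funcptr":
        evidence = ["xref-store", "sig-compatible"]
    elif kind == "vtable":
        evidence = ["vtable-match"]
    else:
        raise ValueError(f"unknown call-site type: {kind}")
    # Conservative: one edge per candidate target, sorted for stable output.
    return [{"from": caller, "to": t, "kind": "indirect", "evidence": list(evidence)}
            for t in sorted(site["targets"])]
```

Sorting the candidate targets keeps the emitted edge list identical across runs, which matters once the graph is hashed and signed.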
Deterministic scoring (plug into Stella’s lattice)
- Inputs: NJIF, CVE→function mapping (soname!symbol or function hash), policy knobs.
- States: {NOT_OBSERVED < POSSIBLE < REACHABLE_CONFIRMED} with monotone merge (never oscillates).
- Confidence: product of edge evidences (configurable weights): direct=1.0, plt=0.98, vtable=0.85, funcptr=0.7.
- Output: OpenVEX/CSAF annotations + human proof path; signed with DSSE to preserve replayability.
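As a minimal sketch of the two rules above, path confidence can be computed as a product of per-edge weights, and the monotone merge as a maximum over the state ordering. The weights and state names are the ones listed here; the function names are illustrative assumptions.

```python
from math import prod

# Default weights from the list above; intended to be configurable.
EDGE_WEIGHTS = {"direct": 1.0, "plt": 0.98, "vtable": 0.85, "funcptr": 0.7}
# Lattice order, lowest to highest.
STATE_ORDER = ["NOT_OBSERVED", "POSSIBLE", "REACHABLE_CONFIRMED"]


def path_confidence(edge_kinds, weights=EDGE_WEIGHTS):
    """Multiply the weight of every edge kind along one path."""
    return prod(weights[kind] for kind in edge_kinds)


def merge_states(a, b):
    """Monotone merge: the result is never lower than either input."""
    return max(a, b, key=STATE_ORDER.index)
```

Because the merge is a maximum over a total order, it is commutative, associative and idempotent, so the final state does not depend on the order in which evidence arrives.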
Minimal Ghidra headless skeleton (exporter idea)
analyzeHeadless /work/gh_proj MyProj -import app -scriptPath scripts \
-postScript ExportNjif.java /out/app.njif.json
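// ExportNjif.java (outline)
import ghidra.app.script.GhidraScript;
import ghidra.program.model.listing.Function;

public class ExportNjif extends GhidraScript {
    @Override
    public void run() throws Exception {
        // Output path is the first script argument given after -postScript.
        String outPath = getScriptArgs()[0];
        // Functions are obtained from the program's FunctionManager (forward order).
        for (Function fn : currentProgram.getFunctionManager().getFunctions(true)) {
            // collect functions, blocks, calls, externs/PLT
            // map non‑named functions to sub_<addr>
            // detect PLT thunks → dynsym names (fn.isThunk())
        }
        // write NJIF JSON deterministically (sorted keys, stable ordering) to outPath
    }
}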
Integration points in StellaOps
- Scanner.Analyzers.Binary.Elf
  - ElfNormalizer → hashes, dynamic deps.
  - Symbolizer → DWARF reader + HeuristicDisasm (via tool adapter).
  - CgBuilder → NJIF builder/merger (multi‑module).
  - ReachabilityEngine → path search + confidence math.
  - Emitter → NJIF cache + VEX/CSAF notes.
- Scheduler: memoize by (hashes, toolchain_version, ruleset_hash) to ensure replayable results.
- Authority: sign NJIF + scoring outputs; store manifests (feeds, rule weights, tool versions).
Test fixtures (suggested)
- Tiny ELF zoo: statically linked, PIE, stripped/non‑stripped, C++ with vtables, musl vs glibc.
- Known CVE libs (e.g., libssl, zlib) with versioned symbols to validate soname!symbol mapping.
- Synthetic binaries with function‑pointer tables to validate conservative edges.
If you want, I can generate:
- A ready‑to‑run Ghidra headless exporter (Java) that writes NJIF exactly like above.
- A small .NET parser that ingests NJIF and emits StellaOps reachability + OpenVEX notes.
Below is a full architecture plan for implementing stripped-ELF binary reachability (call graph + NJIF + deterministic scoring, with a hook for patch-oracles) inside StellaOps.
I will assume .NET 10, existing microservice split (Scanner.WebService, Scanner.Worker, Concelier, Excitior, Authority, Scheduler, Sbomer, Signals), and your standing rule: all lattice logic runs in Scanner.WebService.
1. Scope, Objectives, Non-Goals
1.1 Objectives
- Recover function-level call graphs from ELF binaries, including stripped ones:
  - Support ET_EXEC / ET_DYN / PIE, static & dynamic linking.
  - Support at least x86_64, aarch64 in v1, later armhf, riscv64.
- Produce a neutral, deterministic JSON representation (NJIF):
  - Tool-agnostic: can be generated from Ghidra, radare2/rizin, Binary Ninja, angr, etc.
  - Stable identifiers and schema so downstream services don’t depend on a specific RE engine.
- Compute function-level reachability for vulnerabilities:
  - Given CVE → soname!symbol (and later function-hash) mappings from Concelier,
  - Decide REACHABLE_CONFIRMED / REACHABLE_POSSIBLE / NOT_REACHABLE_FOUNDATION with evidence and confidence.
- Integrate with StellaOps lattice and VEX outputs:
  - Lattice logic runs in Scanner.WebService.
  - Results flow into Excitior (VEX) and Sbomer (SBOM annotations), preserving provenance.
- Enable deterministic replay:
  - Every analysis run is tied to a Scan Manifest: tool versions, ruleset hashes, policy hashes, container image digests.
1.2 Non-Goals (v1)
- No dynamic runtime probes (EventPipe/JFR) in this phase.
- No full decompilation; we only need enough IR for calls/edges.
- No aggressive path-sensitive analysis (symbolic execution) in v1; that can be a v2 enhancement.
2. High-Level System Architecture
2.1 Components
- Scanner.WebService (existing)
  - REST/gRPC API for scans.
  - Orchestrates analysis jobs via Scheduler.
  - Hosts Lattice & Reachability Engine for all artifact types.
  - Reads NJIF results, merges with Concelier function mappings and policies.
- Scanner.Worker (existing, extended)
  - Executes Binary Analyzer Pipelines.
  - Invokes RE tools (Ghidra, rizin, etc.) in dedicated containers.
  - Produces NJIF and persists it.
- Binary Tools Containers (new)
  - stellaops-tools-ghidra:<tag>
  - stellaops-tools-rizin:<tag>
  - Optionally stellaops-tools-angr for advanced passes.
  - Pinned versions, no network access (for determinism & air-gap).
- Storage & Metadata
  - DB (PostgreSQL): scan records, NJIF metadata, reachability summaries.
  - Object store (MinIO/S3/Filesystem): NJIF JSON blobs, tool logs.
  - Authority: DSSE signatures for Scan Manifest, NJIF, and reachability outputs.
- Concelier
  - Provides CVE → component → function symbol/hashes resolution.
  - Exposes “Link-Not-Merge” graph of advisory, component, and function nodes.
- Excitior (VEX)
  - Consumes Scanner.WebService reachability states.
  - Emits OpenVEX/CSAF with properly justified statuses.
- UnknownsRegistry (future)
  - Receives unresolvable call edges / ambiguous functions from the analyzer,
  - Feeds them into “adaptive security” workflows.
2.2 End-to-End Flow (Binary / Image Scan)
- Client requests scan (binary or container image) via Scanner.WebService.
- WebService:
  - Extracts binaries from OCI layers (if scanning image),
  - Registers Scan Manifest,
  - Submits a job to Scheduler (queue: binary-elf flow).
- Scanner.Worker dequeues the job:
  - Detects ELF binaries,
  - Runs Binary Analyzer Pipeline for each unique binary hash.
- Worker uses tools containers:
  - Ghidra/rizin → CFG, function discovery, call graph,
  - Converts to NJIF.
- Worker persists NJIF + metadata; marks analysis complete.
- Scanner.WebService picks up NJIF:
  - Fetches advisory function mappings from Concelier,
  - Runs Reachability & Lattice scoring,
  - Updates scan results and triggers Excitior / Sbomer.
All steps are deterministic given:
- Input artifact,
- Tool container digests,
- Ruleset/policy versions.
3. Binary Analyzer Subsystem (Scanner.Worker)
Introduce a dedicated module:
StellaOps.Scanner.Analyzers.Binary.Elf
3.1 Internal Layers
- ElfDetector
  - Inspects files in a scan:
    - Magic 0x7f 'E' 'L' 'F',
    - Confirms architecture via ELF header.
  - Produces BinaryArtifact records with: hashes (SHA-256, BLAKE3), path in container, arch, endianness.
- ElfNormalizer
  - Uses a lightweight library (e.g., ELFSharp) to extract:
    - ElfType (ET_EXEC, ET_DYN),
    - interpreter (PT_INTERP),
    - DT_NEEDED list,
    - RPATH/RUNPATH,
    - presence/absence of DWARF sections.
  - Emits a normalized ElfMetadata DTO.
- Symbolization Layer
  - Sub-components:
    - DwarfSymbolReader: if DWARF present, read CU, function ranges, names, inlines.
    - DynsymReader: parse .dynsym, .plt, exported symbols.
    - HeuristicFunctionFinder, for stripped binaries:
      - Use disassembler xrefs, prologue patterns, return instructions, call-targets.
      - Recognize PLT thunks → soname!symbol.
  - Consolidates into FunctionSymbol entities: id (e.g., main, sub_401000, libc.so.6!memcpy), addr, size, is_external, from (dwarf, dynsym, heuristic).
- Disassembly & IR Layer
  - Abstraction: IDisassemblyAdapter: Task<DisasmResult> AnalyzeAsync(BinaryArtifact, ElfMetadata, ScanManifest)
  - Implementations:
    - GhidraDisassemblyAdapter:
      - Invokes headless Ghidra in container,
      - Receives machine-readable JSON (script-produced),
      - Extracts functions, basic blocks, calls, GOT/PLT info, vtables.
    - RizinDisassemblyAdapter (backup/fallback).
  - Produces: BasicBlock objects, Instruction metadata where needed for calls, CallSite records (direct, PLT, indirect).
- Call-Graph Builder
  - Consumes FunctionSymbol + CallSite sets.
  - Identifies roots: _start, .init_array entries, main (if present), exported API functions for shared libs.
  - Creates CallGraph:
    - Nodes: functions (FunctionNode),
    - Edges: CallEdge with:
      - kind: direct, plt, indirect-funcptr, indirect-vtable, tailcall,
      - evidence: tags like ["reloc@GOT", "sig-match", "vtable-class"].
- Evidence & Confidence Annotator
  - For each edge, computes a local confidence: direct: 1.0, plt: 0.98, indirect-funcptr: 0.7, indirect-vtable: 0.85.
  - Scanner.WebService later composes these per path.
- NJIF Serializer
  - Transforms domain objects into NJIF JSON with sorted keys and stable ordering for determinism.
  - Writes: artifact, elf, symbols, cfg, cg, and partial reachability: [] (filled by WebService).
  - Stores in object store, returns location + hash to DB.
- Unknowns Reporting
  - Anything unresolved is logged as UnknownEvidence records and optionally published to the UnknownsRegistry stream:
    - Indirect call with empty target set,
    - Function region not mapped to symbol.
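The "sorted keys, stable ordering" requirement of the NJIF Serializer can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the choice to sort cg.nodes and cg.edges by their identifying fields is one possible canonical order rather than a rule from the plan, and every edge is assumed to carry from, to and kind.

```python
import hashlib
import json


def canonical_njif_bytes(doc):
    """Serialize an NJIF document so equal content yields equal bytes."""
    doc = dict(doc)  # shallow copy; the caller's document is not modified
    if "cg" in doc:
        cg = dict(doc["cg"])
        # Node and edge order carries no meaning, so fix it explicitly.
        cg["nodes"] = sorted(cg.get("nodes", []), key=lambda n: n["id"])
        cg["edges"] = sorted(
            cg.get("edges", []),
            key=lambda e: (e["from"], e["to"], e["kind"]),
        )
        doc["cg"] = cg
    return json.dumps(
        doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def njif_hash(doc):
    """SHA-256 over the canonical bytes; this is the value to store and sign."""
    return hashlib.sha256(canonical_njif_bytes(doc)).hexdigest()
```

Hashing the canonical bytes rather than whatever the tool adapter happened to emit means two adapters producing the same graph in a different order still map to the same stored hash.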
4. NJIF Data Model (Neutral JSON Intermediate Format)
Define a stable schema with a top-level njif_schema_version field.
4.1 Top-Level Shape
{
"njif_schema_version": "1.0.0",
"artifact": { ... },
"symbols": { ... },
"cfg": [ ... ],
"cg": { ... },
"reachability": [ ... ],
"provenance": { ... }
}
4.2 Key Sections
- artifact
  - path, hashes, arch, elf.type, interpreter, needed, rpath, runpath.
- symbols
  - exported: external/dynamic symbols, especially PLT: id, kind, plt, lib.
  - functions: id (synthetic or real name), addr, size, from (source of naming info), name_hint (optional).
- cfg
  - Per-function basic block CFG plus call sites: blocks with succ, calls entries.
  - Sufficient for future static checks, not full IR.
- cg
  - nodes: function nodes with evidence tags.
  - edges: call edges with: from, to, kind, evidence.
  - roots: entrypoints for reachability algorithms.
- reachability
  - Initially empty from Worker.
  - Populated in Scanner.WebService as:
{
"target": "libssl.so.3!SSL_free",
"status": "REACHABLE_CONFIRMED",
"path": ["_start", "main", "libssl.so.3!SSL_free"],
"confidence": 0.93,
"evidence": ["plt", "dynsym", "reloc"]
}
- provenance
  - toolchain: disasm: "ghidra_headless:10.4", etc.
  - scan_manifest_hash,
  - timestamp_utc.
4.3 Persisting NJIF
- Object store (versioned path): njif/{sha256}/njif-v1.json
- DB table binary_njif: binary_hash, njif_hash, schema_version, toolchain_digest, scan_manifest_id.
5. Reachability & Lattice Integration (Scanner.WebService)
5.1 Inputs
- NJIF for each binary (possibly multiple binaries per container).
- Concelier’s CVE → (component, function) resolution:
  - component_id → soname!symbol sets, and where available, function hashes.
- Scanner’s existing lattice policies:
  - States: e.g. NOT_OBSERVED < POSSIBLE < REACHABLE_CONFIRMED.
  - Merge rules are monotone.
5.2 Reachability Engine
New service module:
- StellaOps.Scanner.Domain.Reachability
  - INjifRepository (reads NJIF JSON),
  - IFunctionMappingResolver (Concelier adapter),
  - IReachabilityCalculator.
Algorithm per target function:
- Resolve vulnerable function(s):
  - From Concelier: soname!symbol and/or func_hash.
  - Map to NJIF symbols.exported or symbols.functions.
- For each binary:
  - Use cg.roots as entry set.
  - BFS/DFS along edges until:
    - Reaching target node(s),
    - Or graph fully explored.
- For each successful path:
  - Collect edges’ confidence weights, compute path confidence (e.g., product of edge confidences or a log/additive scheme).
- Aggregate result:
  - If ≥ 1 path with only direct/plt edges: status = REACHABLE_CONFIRMED.
  - Else if only paths with indirect edges: status = REACHABLE_POSSIBLE.
  - Else: status = NOT_REACHABLE_FOUNDATION.
- Emit reachability entry back into NJIF (or as separate DB table) and into scan result graph.
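The algorithm above can be sketched with two breadth-first searches: one restricted to direct/plt edges, and one over all edges. This is a minimal illustration that assumes the cg shape from the NJIF example (roots plus edges carrying from, to, kind); the function names are hypothetical, and treating only direct and plt as precise follows the aggregation rule stated above.

```python
from collections import deque

PRECISE_KINDS = {"direct", "plt"}


def _bfs_path(roots, edges, target, allowed=None):
    """Shortest path from any root to target, using only allowed edge kinds."""
    adj = {}
    for e in edges:
        if allowed is None or e["kind"] in allowed:
            adj.setdefault(e["from"], []).append(e["to"])
    for succs in adj.values():
        succs.sort()  # stable exploration order keeps the proof path deterministic
    parent = {r: None for r in sorted(roots)}
    queue = deque(sorted(roots))
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def classify(cg, target):
    """Return a reachability entry for one target function id."""
    path = _bfs_path(cg["roots"], cg["edges"], target, PRECISE_KINDS)
    if path is not None:
        return {"target": target, "status": "REACHABLE_CONFIRMED", "path": path}
    path = _bfs_path(cg["roots"], cg["edges"], target)
    if path is not None:
        return {"target": target, "status": "REACHABLE_POSSIBLE", "path": path}
    return {"target": target, "status": "NOT_REACHABLE_FOUNDATION", "path": []}
```

Running the precise-only search first guarantees that a target is never downgraded to REACHABLE_POSSIBLE just because a shorter path through an indirect edge also exists.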
5.3 Lattice & VEX
- Lattice computation is done per (CVE, component, binary) triple:
  - Input: reachability status + other signals.
- The resulting state is exposed to Excitior as a set of evidence-annotated VEX facts.
- Excitior translates:
  - NOT_REACHABLE_FOUNDATION → likely not_affected with justification “code_not_reachable” (the corresponding OpenVEX justification label is vulnerable_code_not_in_execute_path).
  - REACHABLE_CONFIRMED → affected or “present_and_exploitable” (depending on overall policy).
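A minimal sketch of that translation step, assuming OpenVEX status and justification labels. The mapping for REACHABLE_POSSIBLE to under_investigation is an assumption of this sketch (the plan does not specify it), and the function and field names are hypothetical.

```python
# Reachability status -> VEX fields. Only the first two rows come from the plan;
# the REACHABLE_POSSIBLE row is an assumed policy default.
VEX_MAPPING = {
    "NOT_REACHABLE_FOUNDATION": {
        "status": "not_affected",
        "justification": "vulnerable_code_not_in_execute_path",
    },
    "REACHABLE_CONFIRMED": {"status": "affected"},
    "REACHABLE_POSSIBLE": {"status": "under_investigation"},
}


def to_vex_fact(cve_id, component, reach):
    """Build an evidence-annotated VEX fact from one reachability entry."""
    fact = {"vulnerability": cve_id, "component": component}
    fact.update(VEX_MAPPING[reach["status"]])
    # Carry the proof path along so the VEX consumer can show why.
    fact["evidence"] = {"reachability": reach["status"], "path": list(reach["path"])}
    return fact
```

Keeping the mapping in a table rather than in branching code makes it easy to hash and record as part of the policy that the Scan Manifest references.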
6. Patch-Oracle Extension (Advanced, but Architected Now)
While not strictly required for v1, we should reserve architecture hooks.
6.1 Concept
- Given:
  - A vulnerable library build (or binary),
  - A patched build.
- Run analyzers on both; produce NJIF for each.
- Compare call graphs & function bodies (e.g., hash of normalized bytes):
  - Identify changed functions and potentially changed code regions.
- Concelier links those function IDs to specific CVEs (via vendor patch metadata).
- These become authoritative “patched function sets” (the patch oracle).
6.2 Integration Points
Add a module:
- StellaOps.Scanner.Analysis.PatchOracle
  - Input: pair of artifact hashes (old, new) + NJIF.
  - Output: list of FunctionPatchRecord: function_id, binary_hash_old, binary_hash_new, change_kind (added, modified, deleted).
- Concelier:
  - Ingests FunctionPatchRecord via internal API and updates advisory graph:
    - CVE → function set derived from real patch.
- Reachability Engine:
  - Uses patch-derived function sets instead of or in addition to symbol mapping from vendor docs.
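The comparison step can be sketched as a diff over two maps from function id to body hash. The input shape and function name are assumptions; the record fields follow FunctionPatchRecord as listed above. Matching by id only works where ids are stable across builds (named or exported symbols); synthetic sub_<addr> names shift between builds and would need hash-based matching, which this sketch does not attempt.

```python
def diff_functions(old_funcs, new_funcs, binary_hash_old, binary_hash_new):
    """Compare {function_id: body_hash} maps from two NJIF documents.

    Returns FunctionPatchRecord-like dicts, sorted by function id.
    """
    records = []
    for fid in sorted(set(old_funcs) | set(new_funcs)):
        if fid not in old_funcs:
            kind = "added"
        elif fid not in new_funcs:
            kind = "deleted"
        elif old_funcs[fid] != new_funcs[fid]:
            kind = "modified"
        else:
            continue  # unchanged functions produce no record
        records.append({
            "function_id": fid,
            "binary_hash_old": binary_hash_old,
            "binary_hash_new": binary_hash_new,
            "change_kind": kind,
        })
    return records
```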
7. Persistence, Determinism, Caching
7.1 Scan Manifest
For every scan job, create:
- scan_manifest:
  - Input artifact hashes,
  - List of binaries,
  - Tool container digests (Ghidra, rizin, etc.),
  - Ruleset/policy/lattice hashes,
  - Time, user, and config flags.
Authority signs this manifest with DSSE.
7.2 Binary Analysis Cache
Key: (binary_hash, arch, toolchain_digest, njif_schema_version).
- If present:
  - Skip re-running Ghidra/rizin; reuse NJIF.
- If absent:
  - Run analysis, then cache NJIF.
This provides deterministic replay and prevents re-analysis across scans and across customers (if allowed by tenancy model).
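A minimal sketch of the cache lookup, assuming an in-memory dictionary stands in for the object store and DB. The class and function names are hypothetical; the key fields are the four listed above.

```python
import hashlib


def cache_key(binary_hash, arch, toolchain_digest, njif_schema_version):
    """Derive one stable key from the four cache-key fields.

    Assumes none of the fields contains a newline, so the join is unambiguous.
    """
    joined = "\n".join([binary_hash, arch, toolchain_digest, njif_schema_version])
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


class NjifCache:
    """In-memory stand-in for the NJIF store (illustrative only)."""

    def __init__(self):
        self._store = {}

    def get_or_analyze(self, key, analyze):
        # Reuse the stored NJIF if present; otherwise run the analysis once.
        if key not in self._store:
            self._store[key] = analyze()
        return self._store[key]
```

Including the toolchain digest and schema version in the key means a tool upgrade or schema bump naturally misses the cache instead of returning results produced under different rules.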
8. APIs & Integration Contracts
8.1 Scanner.WebService External API (REST)
- POST /api/scans/images
  - Existing; extended to flag: includeBinaryReachability: true.
- POST /api/scans/binaries
  - Upload a standalone ELF; returns scan_id.
- GET /api/scans/{scanId}/reachability
  - Returns list of (cve_id, component, binary_path, function_id, status, confidence, path).
No path versioning; idempotent and additive (new fields appear, old ones remain valid).
8.2 Internal APIs
- Worker ↔ Object Store:
  - PUT /binary-njif/{sha256}/njif-v1.json.
- WebService ↔ Worker (via Scheduler):
  - Job payload includes: scan_manifest_id, binary_hashes, analysis_profile (default, deep).
- WebService ↔ Concelier:
  - POST /internal/functions/resolve:
    - Input: (cve_id, component_ids[]),
    - Output: soname!symbol[], optional func_hash[].
- WebService ↔ Excitior:
  - Existing VEX ingestion extended with reachability evidence fields.
9. Observability, Security, Resource Model
9.1 Observability
- Metrics:
  - Analysis duration per binary,
  - NJIF size,
  - Cache hit ratio,
  - Reachability evaluation time per CVE.
- Logs:
  - Ghidra/rizin container logs stored alongside NJIF,
  - Unknowns logs for unresolved call targets.
- Tracing:
  - Each scan/analysis annotated with scan_manifest_id to allow end-to-end trace.
9.2 Security
- Tools containers:
  - No outbound network.
  - Limited to read-only artifact mount + write-only result mount.
- Binary content:
  - Treated as confidential; stored encrypted at rest if your global policy requires it.
- DSSE:
  - Authority signs:
    - Scan Manifest,
    - NJIF blob hash,
    - Reachability summary.
  - Enables “Proof-of-Integrity Graph” linkage later.
9.3 Resource Model
- ELF analysis can be heavy; design for:
  - Separate worker queue and autoscaling group for binary analysis.
  - Configurable max concurrency and per-job CPU/memory limits.
- Deep analysis (indirect calls, vtables) can be toggled via analysis_profile.
10. Implementation Roadmap
A pragmatic, staged plan:
Phase 0 – Foundations (1–2 sprints)
- Create StellaOps.Scanner.Analyzers.Binary.Elf project.
- Implement:
  - ElfDetector, ElfNormalizer.
  - DB tables: binary_artifacts, binary_njif.
- Integrate with Scheduler and Worker pipeline.
Phase 1 – Non-stripped ELF + NJIF v1 (2–3 sprints)
- Implement DWARF + dynsym symbolization.
- Implement GhidraDisassemblyAdapter for x86_64.
- Build CallGraphBuilder (direct + PLT calls).
- Implement NJIF serializer v1; store in object store.
- Basic reachability engine in WebService:
  - Only direct and PLT edges,
  - Only for DWARF-named functions.
- Integrate with Concelier function mapping via soname!symbol.
Phase 2 – Stripped ELF Support (2–3 sprints)
- Implement HeuristicFunctionFinder for function discovery in stripped binaries.
- Extend Ghidra script to mark PLT/GOT, vtables, function pointers.
- Call graph: add indirect-funcptr, indirect-vtable, tailcall edges.
- Evidence tagging and local confidence scoring.
- Extend reachability engine to:
  - Distinguish REACHABLE_CONFIRMED vs REACHABLE_POSSIBLE.
Phase 3 – Multi-Arch & Performance (2–3 sprints)
- Add support for aarch64 (Ghidra language, appropriate calling conventions).
- Optimize:
  - Binary analysis cache,
  - Tool container lifecycle,
  - Concurrent analysis.
- Add Unknowns reporting and hookup to UnknownsRegistry (if already implemented).
Phase 4 – Patch-Oracle Pilot (2–3 sprints)
- Implement PatchOracle module:
  - Compare old/new NJIFs,
  - Detect changed functions.
- Integrate with Concelier’s advisory graph.
- Start validating against curated CVE/patch datasets.
Phase 5 – Hardening & Documentation
- Golden fixtures:
  - Small ELF zoo (stripped/non-stripped, PIE, static, C++, vtables).
  - Known vulnerable libs (e.g., OpenSSL, zlib) to confirm correct function mapping.
- Add CLI/demo in StellaOps.Scanner.Cli:
  - stellaops scan-binary --file app --show-reachability.
- Customer-facing and internal docs:
  - NJIF schema,
  - API usage,
  - Limitations and interpretation guidelines.
If you want, as a next step I can take this plan and:
- Break it into epics / tickets (SCAN-BINARY-xxx) with clear DoD per phase, or
- Draft the Ghidra headless Java script and the .NET NJIF model classes so your agents can plug them straight into the Scanner repo.