feat: Add new provenance and crypto registry documentation

- Introduced attestation inventory and subject-rekor mapping files for tracking Docker packages.
- Added a comprehensive crypto registry decision document outlining defaults and required follow-ups.
- Created an offline feeds manifest for bundling air-gap resources.
- Implemented a script to generate and update binary manifests for curated binaries.
- Added a verification script to ensure binary artefacts are located in approved directories.
- Defined new schemas for AdvisoryEvidenceBundle, OrchestratorEnvelope, ScannerReportReadyPayload, and ScannerScanCompletedPayload.
- Established project files for StellaOps.Orchestrator.Schemas and StellaOps.PolicyAuthoritySignals.Contracts.
- Updated vendor manifest to track pinned binaries for integrity.
2025-11-18 23:47:13 +02:00
parent d3ecd7f8e6
commit e91da22836
44 changed files with 6793 additions and 99 deletions


@@ -0,0 +1,846 @@
Here's a compact blueprint for bringing **stripped ELF binaries** into StellaOps's **call-graph + reachability scoring**: from raw bytes → neutral JSON → deterministic scoring.
---
# Why this matters (quick)
Even when symbols are missing, you can still (1) recover functions, (2) build a call graph, and (3) decide if a vulnerable function is *actually* reachable from the binary's entrypoints. This feeds StellaOps's deterministic scoring/lattice engine so VEX decisions are evidence-backed, not guesswork.
---
# High-level pipeline
1. **Ingest**
* Accept: ELF (static/dynamic), PIE, musl/glibc, multiple arches (x86_64, aarch64, armhf, riscv64).
* Normalize: compute file hash set (SHA256, BLAKE3), note `PT_DYNAMIC`, `DT_NEEDED`, interpreter, RPATH/RUNPATH.
2. **Symbolization (best-effort)**
* **If DWARF present**: read `.debug_*` (function names, inlines, CU boundaries, ranges).
* **If stripped**:
* Use disassembler to **discover functions** (prologue patterns, xref-to-targets, thunk detection).
* Derive **synthetic names**: `sub_<va>`, `plt_<name>` (from dynamic symbol table if available), `extern@libc.so.6:memcpy`.
* Lift exported dynsyms and PLT stubs even when local symbols are removed.
* Recover **string-referenced names** (e.g., Go/Python/C++ RTTI/Itanium mangling where present).
3. **Disassembly & IR**
* Disassemble to basic blocks; lift to a neutral IR (SSA-like) sufficient for:
* Call edges (direct `call`/`bl`).
* **Indirect calls** via the GOT, vtables, function pointers (approximate with points-to sets).
* Tail-calls, thunks, PLT interposition.
4. **Callgraph build**
* Start from **entrypoints**:
* ELF entry (`_start`), constructors (`.init_array`), exported API (public symbols), `main` (if recoverable).
* Optional: **entry-trace** (cmdline + env + loader path) from container image to seed realistic roots.
* Build **CG** with:
* Direct edges: precise.
* Indirect edges: conservative, with **evidence tags** (GOT target set, vtable class set, signature match).
* Record **inter-module edges** to shared libs (soname + version) with relocation evidence.
5. **Reachability scoring (deterministic)**
* Input: list of vulnerable functions/paths (from CSAF/CVE KB) normalized to **function-level identifiers** (soname!symbol or hash-based if unnamed).
* Compute **reachability** from roots → target:
* `REACHABLE_CONFIRMED` (path with only precise edges),
* `REACHABLE_POSSIBLE` (path contains conservative edges),
* `NOT_REACHABLE_FOUNDATION` (no path in current graph).
* Add **confidence** derived from edge evidence + relocation proof.
* Emit **proof trails** (the exact path: nodes, edges, evidence).
6. **Neutral JSON intermediate (NJIF)**
* Stored in cache; signed for deterministic replay.
* Consumed by StellaOps.Policy/Lattice to merge with VEX.
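
The ingest step at the top of this pipeline can be illustrated with a small sketch. It reads only the fixed ELF header fields (class, byte order, `e_type`, `e_machine`) and computes SHA-256 with the standard library; BLAKE3 is left out because it is not in Python's standard library. The function name `normalize_elf` and the output keys are assumptions for illustration, not StellaOps APIs.

```python
import hashlib
import struct

ELF_MAGIC = b"\x7fELF"
# e_machine values from the ELF specification (subset).
MACHINES = {0x03: "x86", 0x28: "arm", 0x3E: "x86_64", 0xB7: "aarch64", 0xF3: "riscv"}
# e_type values from the ELF specification (subset).
TYPES = {1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}


def normalize_elf(data: bytes) -> dict:
    """Return hash, class, endianness, type and machine for an ELF image."""
    if len(data) < 20 or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}[data[4]]      # EI_CLASS
    endian = {1: "<", 2: ">"}[data[5]]  # EI_DATA: 1 = little, 2 = big
    # e_type and e_machine are two 16-bit fields right after the 16-byte e_ident.
    e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    return {
        "hashes": {"sha256": hashlib.sha256(data).hexdigest()},
        "bits": bits,
        "endianness": "little" if endian == "<" else "big",
        "type": TYPES.get(e_type, hex(e_type)),
        "arch": MACHINES.get(e_machine, hex(e_machine)),
    }


# Hand-built 20-byte header prefix: 64-bit, little-endian, ET_DYN, x86_64.
sample = ELF_MAGIC + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 3, 0x3E)
info = normalize_elf(sample)
print(info["type"], info["arch"], info["bits"])
```

Note that `ET_DYN` covers both shared objects and PIE executables, so a real normalizer would also need the program headers (`PT_INTERP`, `PT_DYNAMIC`) to tell them apart.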
---
# Neutral JSON Intermediate Format (NJIF)
```json
{
"artifact": {
"path": "/work/bin/app",
"hashes": {"sha256": "…", "blake3": "…"},
"arch": "x86_64",
"elf": {
"type": "ET_DYN",
"interpreter": "/lib64/ld-linux-x86-64.so.2",
"needed": ["libc.so.6", "libssl.so.3"],
"rpath": [],
"runpath": []
}
},
"symbols": {
"exported": [
{"id": "libc.so.6!memcpy", "kind": "dynsym", "addr": "0x0", "plt": true}
],
"functions": [
{"id": "sub_401000", "addr": "0x401000", "size": 112, "name_hint": null, "from": "disasm"},
{"id": "main", "addr": "0x4023d0", "size": 348, "from": "dwarf|heuristic"}
]
},
"cfg": [
{"func": "main", "blocks": [
{"b": "0x4023d0", "succ": ["0x402415"], "calls": [{"type": "direct", "target": "sub_401000"}]},
{"b": "0x402415", "succ": ["0x402440"], "calls": [{"type": "plt", "target": "libc.so.6!memcpy"}]}
]}
],
"cg": {
"nodes": [
{"id": "main", "evidence": ["dwarf|heuristic"]},
{"id": "sub_401000"},
{"id": "libc.so.6!memcpy", "external": true, "lib": "libc.so.6"}
],
"edges": [
{"from": "main", "to": "sub_401000", "kind": "direct"},
{"from": "main", "to": "libc.so.6!memcpy", "kind": "plt", "evidence": ["reloc@GOT"]}
],
"roots": ["_start", "init_array[]", "main"]
},
"reachability": [
{
"target": "libssl.so.3!SSL_free",
"status": "NOT_REACHABLE_FOUNDATION",
"path": []
},
{
"target": "libc.so.6!memcpy",
"status": "REACHABLE_CONFIRMED",
"path": ["main", "libc.so.6!memcpy"],
"confidence": 0.98,
"evidence": ["plt", "dynsym", "reloc"]
}
],
"provenance": {
"toolchain": {
"disasm": "ghidra_headless|radare2|llvm-objdump",
"version": "…"
},
"scan_manifest_hash": "…",
"timestamp_utc": "2025-11-16T00:00:00Z"
}
}
```
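
A consumer of this format would probably want a cheap structural check before scoring. The sketch below verifies that every edge endpoint and every root in `cg` refers to a declared node. Whether roots such as `_start` and `init_array[]` must appear as nodes is not stated above, so treating a missing root as a problem is an assumption, and `check_call_graph` is a hypothetical helper name.

```python
def check_call_graph(njif: dict) -> list[str]:
    """Return human-readable problems found in the `cg` section."""
    cg = njif["cg"]
    node_ids = {n["id"] for n in cg["nodes"]}
    problems = []
    for edge in cg["edges"]:
        for end in ("from", "to"):
            if edge[end] not in node_ids:
                problems.append(f"edge endpoint {edge[end]!r} is not a node")
    for root in cg["roots"]:
        if root not in node_ids:
            problems.append(f"root {root!r} is not a node")
    return sorted(problems)  # sorted so the report is stable across runs


# The `cg` section from the example above, trimmed to the fields used here.
example = {
    "cg": {
        "nodes": [{"id": "main"}, {"id": "sub_401000"}, {"id": "libc.so.6!memcpy"}],
        "edges": [
            {"from": "main", "to": "sub_401000", "kind": "direct"},
            {"from": "main", "to": "libc.so.6!memcpy", "kind": "plt"},
        ],
        "roots": ["_start", "init_array[]", "main"],
    }
}
for problem in check_call_graph(example):
    print(problem)
```

Run against the example, the check flags the two symbolic roots, which suggests the schema should say explicitly whether roots are node ids or free-form labels.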
---
# Practical extractors (headless/CLI)
* **DWARF**: `llvm-dwarfdump`/`eu-readelf` for quick CU/function ranges; fall back to the disassembler.
* **Disassembly/CFG/CG** (choose one or more; wrap with a stable adapter):
* **Ghidra Headless API**: recover functions, basic blocks, references, PLT/GOT, vtables; export via a custom headless script to NJIF.
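* **radare2 / rizin**: `aaa` to analyze, then `aflj`, `agj`, and `agCj` to export functions/graphs as JSON (`agCd` emits Graphviz dot rather than JSON; exact command spellings can differ between radare2 and rizin versions).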
* **Binary Ninja headless** (if license permits) for cleaner IL and indirect-call modeling.
* **angr** for path-sensitive refinement on tricky indirect calls (optional, gated by budget).
**Adapter principle:** All tools output a **small, consistent NJIF** so the scoring engine and lattice logic never depend on any single RE tool.
---
# Indirect call modeling (concise rules)
* **PLT/GOT**: edge from caller → `soname!symbol` with evidence: `plt`, `reloc@GOT`.
* **Function pointers**: if a store to a pointer is found and targets a known function set `{f1…fk}`, add edges with `kind: "indirect"`, `evidence: ["xref-store", "sig-compatible"]`.
* **Virtual calls / vtables**: class-method set from RTTI/vtable scans; mark edges `evidence: ["vtable-match"]`.
* **Tail-calls**: treat as edges, not fall-through.
Each conservative step lowers **confidence**, but keeps determinism: the rules and their hashes are in the scan manifest.
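
The PLT/GOT and function-pointer rules above can be sketched as two small edge constructors. The helper names `plt_edge` and `funcptr_edges` are hypothetical; the `kind` and `evidence` values are the ones quoted in the rules. Sorting the candidate set is an added assumption so that the emitted edge order does not depend on set iteration order.

```python
def plt_edge(caller: str, soname: str, symbol: str) -> dict:
    """PLT/GOT rule: one edge to `soname!symbol`, backed by relocation evidence."""
    return {"from": caller, "to": f"{soname}!{symbol}", "kind": "plt",
            "evidence": ["plt", "reloc@GOT"]}


def funcptr_edges(caller: str, targets: set[str]) -> list[dict]:
    """Function-pointer rule: one conservative edge per candidate target."""
    return [
        {"from": caller, "to": target, "kind": "indirect",
         "evidence": ["xref-store", "sig-compatible"]}
        for target in sorted(targets)  # sorted so output order is stable
    ]


edges = [plt_edge("main", "libc.so.6", "memcpy")]
edges += funcptr_edges("dispatch", {"sub_401200", "sub_401000"})
for edge in edges:
    print(edge["from"], "->", edge["to"], edge["kind"])
```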
---
# Deterministic scoring (plug into Stella's lattice)
* **Inputs**: NJIF, CVE→function mapping (`soname!symbol` or function hash), policy knobs.
* **States**: `{NOT_OBSERVED < POSSIBLE < REACHABLE_CONFIRMED}` with **monotone** merge (never oscillates).
* **Confidence**: product of edge evidences (configurable weights): `direct=1.0, plt=0.98, vtable=0.85, funcptr=0.7`.
* **Output**: OpenVEX/CSAF annotations + human proof path; signed with DSSE to preserve replayability.
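
As a minimal sketch of the two rules above: path confidence is the product of per-edge weights, and state merging takes the higher of two states in the stated order, which makes it monotone. The weights and state names are the ones listed; the function names `path_confidence` and `merge` are assumptions.

```python
from math import prod

# Edge weights quoted in the text; they are described as configurable.
EDGE_WEIGHTS = {"direct": 1.0, "plt": 0.98, "vtable": 0.85, "funcptr": 0.7}
# Lattice order from the text, lowest to highest.
STATE_ORDER = ("NOT_OBSERVED", "POSSIBLE", "REACHABLE_CONFIRMED")


def path_confidence(edge_kinds: list[str]) -> float:
    """Multiply the weight of every edge on one path."""
    return prod(EDGE_WEIGHTS[kind] for kind in edge_kinds)


def merge(a: str, b: str) -> str:
    """Monotone join: the result is never lower than either input."""
    return max(a, b, key=STATE_ORDER.index)


print(round(path_confidence(["direct", "plt", "vtable"]), 4))
print(merge("POSSIBLE", "NOT_OBSERVED"))
```

Because `merge` is commutative, associative, and idempotent, the order in which evidence arrives cannot change the final state, which is what "never oscillates" requires.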
---
# Minimal Ghidra headless skeleton (exporter idea)
```bash
analyzeHeadless /work/gh_proj MyProj -import app -scriptPath scripts \
-postScript ExportNjif.java /out/app.njif.json
```
```java
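// ExportNjif.java (outline)
import ghidra.app.script.GhidraScript;
import ghidra.program.model.listing.Function;

public class ExportNjif extends GhidraScript {
    @Override
    public void run() throws Exception {
        // Output path is the first script argument (/out/app.njif.json above).
        String outPath = getScriptArgs()[0];
        // Iterate all functions in ascending address order.
        for (Function fn : currentProgram.getFunctionManager().getFunctions(true)) {
            // collect functions, blocks, calls, externs/PLT
            // map non-named functions to sub_<addr>
            // detect PLT thunks → dynsym names
        }
        // write NJIF JSON to outPath deterministically (sorted keys, stable ordering)
    }
}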
```
---
# Integration points in StellaOps
* **Scanner.Analyzers.Binary.Elf**
* `ElfNormalizer` → hashes, dynamic deps.
* `Symbolizer` → DWARF reader + HeuristicDisasm (via tool adapter).
* `CgBuilder` → NJIF builder/merger (multi-module).
* `ReachabilityEngine` → path search + confidence math.
* `Emitter` → NJIF cache + VEX/CSAF notes.
* **Scheduler**: memoize by `(hashes, toolchain_version, ruleset_hash)` to ensure replayable results.
* **Authority**: sign NJIF + scoring outputs; store manifests (feeds, rule weights, tool versions).
---
# Test fixtures (suggested)
* Tiny ELF zoo: statically linked, PIE, stripped/non-stripped, C++ with vtables, musl vs glibc.
* Known CVE libs (e.g., `libssl`, `zlib`) with versioned symbols to validate soname!symbol mapping.
* Synthetic binaries with function-pointer tables to validate conservative edges.
---
If you want, I can generate:
* A ready-to-run **Ghidra headless exporter** (Java) that writes NJIF exactly like above.
* A small **.NET parser** that ingests NJIF and emits StellaOps reachability + OpenVEX notes.
Below is a full architecture plan for implementing **stripped-ELF binary reachability** (call graph + NJIF + deterministic scoring, with a hook for patch-oracles) inside **StellaOps**.
I will assume .NET 10, existing microservice split (Scanner.WebService, Scanner.Worker, Concelier, Excitior, Authority, Scheduler, Sbomer, Signals), and your standing rule: **all lattice logic runs in Scanner.WebService**.
---
## 1. Scope, Objectives, Non-Goals
### 1.1 Objectives
1. **Recover function-level call graphs from ELF binaries**, including **stripped** ones:
* Support ET_EXEC / ET_DYN / PIE, static & dynamic linking.
* Support at least **x86_64, aarch64** in v1, later armhf, riscv64.
2. **Produce a neutral, deterministic JSON representation (NJIF)**:
* Tool-agnostic: can be generated from Ghidra, radare2/rizin, Binary Ninja, angr, etc.
* Stable identifiers and schema so downstream services don't depend on a specific RE engine.
3. **Compute function-level reachability for vulnerabilities**:
* Given CVE → `soname!symbol` (and later function-hash) mappings from Concelier,
* Decide `REACHABLE_CONFIRMED` / `REACHABLE_POSSIBLE` / `NOT_REACHABLE_FOUNDATION` with evidence and confidence.
4. **Integrate with StellaOps lattice and VEX outputs**:
* Lattice logic runs in **Scanner.WebService**.
* Results flow into Excitior (VEX) and Sbomer (SBOM annotations), preserving provenance.
5. **Enable deterministic replay**:
* Every analysis run is tied to a **Scan Manifest**: tool versions, ruleset hashes, policy hashes, container image digests.
### 1.2 Non-Goals (v1)
* No dynamic runtime probes (EventPipe/JFR) in this phase.
* No full decompilation; we only need enough IR for calls/edges.
* No aggressive path-sensitive analysis (symbolic execution) in v1; that can be a v2 enhancement.
---
## 2. High-Level System Architecture
### 2.1 Components
* **Scanner.WebService (existing)**
* REST/gRPC API for scans.
* Orchestrates analysis jobs via Scheduler.
* Hosts **Lattice & Reachability Engine** for all artifact types.
* Reads NJIF results, merges with Concelier function mappings and policies.
* **Scanner.Worker (existing, extended)**
* Executes **Binary Analyzer Pipelines**.
* Invokes RE tools (Ghidra, rizin, etc.) in dedicated containers.
* Produces NJIF and persists it.
* **Binary Tools Containers (new)**
* `stellaops-tools-ghidra:<tag>`
* `stellaops-tools-rizin:<tag>`
* Optionally `stellaops-tools-angr` for advanced passes.
* Pinned versions, no network access (for determinism & air-gap).
* **Storage & Metadata**
* **DB (PostgreSQL)**: scan records, NJIF metadata, reachability summaries.
* **Object store** (MinIO/S3/Filesystem): NJIF JSON blobs, tool logs.
* **Authority**: DSSE signatures for Scan Manifest, NJIF, and reachability outputs.
* **Concelier**
* Provides **CVE → component → function symbol/hashes** resolution.
* Exposes “Link-Not-Merge” graph of advisory, component, and function nodes.
* **Excitior (VEX)**
* Consumes Scanner.WebService reachability states.
* Emits OpenVEX/CSAF with properly justified statuses.
* **UnknownsRegistry (future)**
* Receives unresolvable call edges / ambiguous functions from the analyzer,
* Feeds them into “adaptive security” workflows.
### 2.2 End-to-End Flow (Binary / Image Scan)
1. Client requests scan (binary or container image) via **Scanner.WebService**.
2. WebService:
* Extracts binaries from OCI layers (if scanning image),
* Registers **Scan Manifest**,
* Submits a job to Scheduler (queue: `binary-elfflow`).
3. Scanner.Worker dequeues the job:
* Detects ELF binaries,
* Runs **Binary Analyzer Pipeline** for each unique binary hash.
4. Worker uses tools containers:
* Ghidra/rizin → CFG, function discovery, call graph,
* Converts to **NJIF**.
5. Worker persists NJIF + metadata; marks analysis complete.
6. Scanner.WebService picks up NJIF:
* Fetches advisory function mappings from Concelier,
* Runs **Reachability & Lattice scoring**,
* Updates scan results and triggers Excitior / Sbomer.
All steps are deterministic given:
* Input artifact,
* Tool container digests,
* Ruleset/policy versions.
---
## 3. Binary Analyzer Subsystem (Scanner.Worker)
Introduce a dedicated module:
* `StellaOps.Scanner.Analyzers.Binary.Elf`
### 3.1 Internal Layers
1. **ElfDetector**
* Inspects files in a scan:
* Magic `0x7f 'E' 'L' 'F'`,
* Confirms architecture via ELF header.
* Produces `BinaryArtifact` records with:
* `hashes` (SHA-256, BLAKE3),
* `path` in container,
* `arch`, `endianness`.
2. **ElfNormalizer**
* Uses a lightweight library (e.g., ELFSharp) to extract:
* `ElfType` (ET_EXEC, ET_DYN),
* interpreter (`PT_INTERP`),
* `DT_NEEDED` list,
* RPATH/RUNPATH,
* presence/absence of DWARF sections.
* Emits a normalized `ElfMetadata` DTO.
3. **Symbolization Layer**
* Sub-components:
* `DwarfSymbolReader`: if DWARF present, read CU, function ranges, names, inlines.
* `DynsymReader`: parse `.dynsym`, `.plt`, exported symbols.
* `HeuristicFunctionFinder`:
* For stripped binaries:
* Use disassembler xrefs, prolog patterns, return instructions, call-targets.
* Recognize PLT thunks → `soname!symbol`.
* Consolidates into `FunctionSymbol` entities:
* `id` (e.g., `main`, `sub_401000`, `libc.so.6!memcpy`),
* `addr`, `size`, `is_external`, `from` (`dwarf`, `dynsym`, `heuristic`).
4. **Disassembly & IR Layer**
* Abstraction: `IDisassemblyAdapter`:
* `Task<DisasmResult> AnalyzeAsync(BinaryArtifact, ElfMetadata, ScanManifest)`
* Implementations:
* `GhidraDisassemblyAdapter`:
* Invokes headless Ghidra in container,
* Receives machine-readable JSON (script-produced),
* Extracts functions, basic blocks, calls, GOT/PLT info, vtables.
* `RizinDisassemblyAdapter` (backup/fallback).
* Produces:
* `BasicBlock` objects,
* `Instruction` metadata where needed for calls,
* `CallSite` records (direct, PLT, indirect).
5. **Call-Graph Builder**
* Consumes `FunctionSymbol` + `CallSite` sets.
* Identifies **roots**:
* `_start`, `.init_array` entries,
* `main` (if present),
* Exported API functions for shared libs.
* Creates `CallGraph`:
* Nodes: functions (`FunctionNode`),
* Edges: `CallEdge` with:
* `kind`: `direct`, `plt`, `indirect-funcptr`, `indirect-vtable`, `tailcall`,
* `evidence`: tags like `["reloc@GOT", "sig-match", "vtable-class"]`.
6. **Evidence & Confidence Annotator**
* For each edge, computes a **local confidence**:
* `direct`: 1.0
* `plt`: 0.98
* `indirect-funcptr`: 0.7
* `indirect-vtable`: 0.85
* For each path later, Scanner.WebService composes these.
7. **NJIF Serializer**
* Transforms domain objects into **NJIF JSON**:
* Sorted keys, stable ordering for determinism.
* Writes:
* `artifact`, `elf`, `symbols`, `cfg`, `cg`, and partial `reachability: []` (filled by WebService).
* Stores in object store, returns location + hash to DB.
8. **Unknowns Reporting**
* Any unresolved:
* Indirect call with empty target set,
* Function region not mapped to symbol,
* Logged as `UnknownEvidence` records and optionally published to **UnknownsRegistry** stream.
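
The "sorted keys, stable ordering" requirement of the NJIF Serializer layer can be sketched as follows. The helper `canonical_njif` is hypothetical, and the choice of sort keys for nodes and edges is an assumption; the text does not say which ordering StellaOps uses, and this is not a full canonical-JSON scheme.

```python
import hashlib
import json


def canonical_njif(doc: dict) -> bytes:
    """Serialize so that logically equal documents produce identical bytes."""
    ordered = dict(doc)  # shallow copy; the caller's document is not modified
    cg = dict(ordered.get("cg", {}))
    # Lists keep insertion order in JSON, so sort them explicitly.
    cg["nodes"] = sorted(cg.get("nodes", []), key=lambda n: n["id"])
    cg["edges"] = sorted(cg.get("edges", []),
                         key=lambda e: (e["from"], e["to"], e["kind"]))
    ordered["cg"] = cg
    text = json.dumps(ordered, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


a = {"artifact": {"arch": "x86_64"},
     "cg": {"nodes": [{"id": "main"}, {"id": "a"}],
            "edges": [{"from": "main", "to": "a", "kind": "direct"}]}}
# Same content, different key and list order.
b = {"cg": {"edges": [{"kind": "direct", "to": "a", "from": "main"}],
            "nodes": [{"id": "a"}, {"id": "main"}]},
     "artifact": {"arch": "x86_64"}}
print(hashlib.sha256(canonical_njif(a)).hexdigest() ==
      hashlib.sha256(canonical_njif(b)).hexdigest())
```

The resulting hash is what would be stored as `njif_hash`; any change in tool output ordering would otherwise look like a content change.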
---
## 4. NJIF Data Model (Neutral JSON Intermediate Format)
Define a stable schema with a top-level `njif_schema_version` field.
### 4.1 Top-Level Shape
```json
{
"njif_schema_version": "1.0.0",
"artifact": { ... },
"symbols": { ... },
"cfg": [ ... ],
"cg": { ... },
"reachability": [ ... ],
"provenance": { ... }
}
```
### 4.2 Key Sections
1. `artifact`
* `path`, `hashes`, `arch`, `elf.type`, `interpreter`, `needed`, `rpath`, `runpath`.
2. `symbols`
* `exported`: external/dynamic symbols, especially PLT:
* `id`, `kind`, `plt`, `lib`.
* `functions`:
* `id` (synthetic or real name),
* `addr`, `size`, `from` (source of naming info),
* `name_hint` (optional).
3. `cfg`
* Per-function basic block CFG plus call sites:
* Blocks with `succ`, `calls` entries.
* Sufficient for future static checks, not full IR.
4. `cg`
* `nodes`: function nodes with evidence tags.
* `edges`: call edges with:
* `from`, `to`, `kind`, `evidence`.
* `roots`: entrypoints for reachability algorithms.
5. `reachability`
* Initially empty from Worker.
* Populated in Scanner.WebService as:
```json
{
"target": "libssl.so.3!SSL_free",
"status": "REACHABLE_CONFIRMED",
"path": ["_start", "main", "libssl.so.3!SSL_free"],
"confidence": 0.93,
"evidence": ["plt", "dynsym", "reloc"]
}
```
6. `provenance`
* `toolchain`:
* `disasm`: `"ghidra_headless:10.4"`, etc.
* `scan_manifest_hash`,
* `timestamp_utc`.
### 4.3 Persisting NJIF
* Object store (versioned path):
* `njif/{sha256}/njif-v1.json`
* DB table `binary_njif`:
* `binary_hash`, `njif_hash`, `schema_version`, `toolchain_digest`, `scan_manifest_id`.
---
## 5. Reachability & Lattice Integration (Scanner.WebService)
### 5.1 Inputs
* **NJIF** for each binary (possibly multiple binaries per container).
* Concelier's **CVE → (component, function)** resolution:
* `component_id` → `soname!symbol` sets, and where available, function hashes.
* Scanner's existing **lattice policies**:
* States: e.g. `NOT_OBSERVED < POSSIBLE < REACHABLE_CONFIRMED`.
* Merge rules are monotone.
### 5.2 Reachability Engine
New service module:
* `StellaOps.Scanner.Domain.Reachability`
* `INjifRepository` (reads NJIF JSON),
* `IFunctionMappingResolver` (Concelier adapter),
* `IReachabilityCalculator`.
Algorithm per target function:
1. Resolve vulnerable function(s):
* From Concelier: `soname!symbol` and/or `func_hash`.
* Map to NJIF `symbols.exported` or `symbols.functions`.
2. For each binary:
* Use `cg.roots` as entry set.
* BFS/DFS along edges until:
* Reaching target node(s),
* Or graph fully explored.
3. For each successful path:
* Collect the edges' `confidence` weights and compute the path confidence:
* e.g., product of edge confidences or a log/additive scheme.
4. Aggregate result:
* If ≥ 1 path with only `direct/plt` edges:
* `status = REACHABLE_CONFIRMED`.
* Else if only paths with indirect edges:
* `status = REACHABLE_POSSIBLE`.
* Else:
* `status = NOT_REACHABLE_FOUNDATION`.
5. Emit `reachability` entry back into NJIF (or as separate DB table) and into scan result graph.
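
The algorithm above can be sketched with two breadth-first searches: one restricted to `direct`/`plt` edges, and one over all edges. The function names are hypothetical. Following the wording of step 4 literally, `tailcall` edges are not counted as precise here; that is a reading of the text, not a confirmed policy.

```python
from collections import deque

PRECISE = {"direct", "plt"}


def find_path(edges, roots, target, allowed=None):
    """BFS from the roots; returns the first path found or None."""
    adj = {}
    for e in edges:
        if allowed is None or e["kind"] in allowed:
            adj.setdefault(e["from"], []).append(e["to"])
    for successors in adj.values():
        successors.sort()  # stable exploration order for deterministic paths
    parent = {r: None for r in sorted(roots)}
    queue = deque(sorted(roots))
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def classify(edges, roots, target):
    path = find_path(edges, roots, target, PRECISE)
    if path is not None:
        return {"target": target, "status": "REACHABLE_CONFIRMED", "path": path}
    path = find_path(edges, roots, target)
    if path is not None:
        return {"target": target, "status": "REACHABLE_POSSIBLE", "path": path}
    return {"target": target, "status": "NOT_REACHABLE_FOUNDATION", "path": []}


edges = [
    {"from": "_start", "to": "main", "kind": "direct"},
    {"from": "main", "to": "libc.so.6!memcpy", "kind": "plt"},
    {"from": "main", "to": "sub_401000", "kind": "direct"},
    {"from": "sub_401000", "to": "handler", "kind": "indirect-funcptr"},
]
for target in ("libc.so.6!memcpy", "handler", "libssl.so.3!SSL_free"):
    print(classify(edges, ["_start"], target))
```

Running the precise search first means a target reachable both ways is reported with its strongest evidence.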
### 5.3 Lattice & VEX
* Lattice computation is done per `(CVE, component, binary)` triple:
* Input: reachability status + other signals.
* Resulting state is:
* Exposed to **Excitior** as a set of **evidence-annotated VEX facts**.
* Excitior translates:
* `NOT_REACHABLE_FOUNDATION` → likely `not_affected` with justification “code_not_reachable”.
* `REACHABLE_CONFIRMED` → `affected` or “present_and_exploitable” (depending on overall policy).
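
The translation step could look roughly like the table-driven sketch below. The first two rows follow the mapping described above; the `REACHABLE_POSSIBLE` row (`under_investigation`) is an assumption added only to make the mapping total. The justification string is copied from the text, and a real emitter would have to map it onto the vocabulary of the target VEX format. `to_vex_fact` and the output field names are hypothetical.

```python
VEX_MAPPING = {
    "NOT_REACHABLE_FOUNDATION": ("not_affected", "code_not_reachable"),
    "REACHABLE_CONFIRMED": ("affected", None),
    "REACHABLE_POSSIBLE": ("under_investigation", None),  # assumption
}


def to_vex_fact(cve_id: str, component: str, reach: dict) -> dict:
    """Turn one reachability entry into an evidence-annotated VEX fact."""
    status, justification = VEX_MAPPING[reach["status"]]
    fact = {
        "vulnerability": cve_id,
        "component": component,
        "status": status,
        "evidence": {
            "path": reach.get("path", []),
            "confidence": reach.get("confidence"),
            "tags": reach.get("evidence", []),
        },
    }
    if justification is not None:
        fact["justification"] = justification
    return fact


entry = {"target": "libssl.so.3!SSL_free",
         "status": "NOT_REACHABLE_FOUNDATION", "path": []}
# "CVE-0000-0000" is a placeholder identifier, not a real advisory.
print(to_vex_fact("CVE-0000-0000", "libssl.so.3", entry))
```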
---
## 6. Patch-Oracle Extension (Advanced, but Architected Now)
While not strictly required for v1, we should reserve architecture hooks.
### 6.1 Concept
* Given:
* A **vulnerable** library build (or binary),
* A **patched** build.
* Run analyzers on both; produce NJIF for each.
* Compare call graphs & function bodies (e.g., hash of normalized bytes):
* Identify **changed functions** and potentially changed code regions.
* Concelier links those function IDs to specific CVEs (via vendor patch metadata).
* These become authoritative “patched function sets” (the **patch oracle**).
### 6.2 Integration Points
Add a module:
* `StellaOps.Scanner.Analysis.PatchOracle`
* Input: pair of artifact hashes (old, new) + NJIF.
* Output: list of `FunctionPatchRecord`:
* `function_id`, `binary_hash_old`, `binary_hash_new`, `change_kind` (`added`, `modified`, `deleted`).
Concelier:
* Ingests `FunctionPatchRecord` via internal API and updates advisory graph:
* CVE → function set derived from real patch.
Reachability Engine:
* Uses patch-derived function sets instead of, or in addition to, symbol mapping from vendor docs.
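
A minimal sketch of the comparison step, assuming each build has already been reduced to a map from function id to a hash of its normalized bytes. The record fields match `FunctionPatchRecord` as listed above; `diff_functions` is a hypothetical name. Matching by id is itself an assumption: in stripped binaries `sub_<addr>` ids shift between builds, so a real oracle would need a more robust function-matching step first.

```python
def diff_functions(old: dict[str, str], new: dict[str, str],
                   binary_hash_old: str, binary_hash_new: str) -> list[dict]:
    """Compare per-function body hashes of two builds."""
    records = []
    for function_id in sorted(old.keys() | new.keys()):
        if function_id not in old:
            change_kind = "added"
        elif function_id not in new:
            change_kind = "deleted"
        elif old[function_id] != new[function_id]:
            change_kind = "modified"
        else:
            continue  # unchanged functions produce no record
        records.append({
            "function_id": function_id,
            "binary_hash_old": binary_hash_old,
            "binary_hash_new": binary_hash_new,
            "change_kind": change_kind,
        })
    return records


# Placeholder hashes for illustration only.
vulnerable = {"parse_header": "h1", "copy_body": "h2", "legacy_path": "h3"}
patched = {"parse_header": "h1", "copy_body": "h2b", "check_len": "h4"}
for rec in diff_functions(vulnerable, patched, "old-build", "new-build"):
    print(rec["function_id"], rec["change_kind"])
```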
---
## 7. Persistence, Determinism, Caching
### 7.1 Scan Manifest
For every scan job, create:
* `scan_manifest`:
* Input artifact hashes,
* List of binaries,
* Tool container digests (Ghidra, rizin, etc.),
* Ruleset/policy/lattice hashes,
* Time, user, and config flags.
Authority signs this manifest with DSSE.
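
For orientation, DSSE does not sign the payload directly; it signs a pre-authentication encoding (PAE) of the payload type and payload bytes. The sketch below builds that signing input for a manifest. The payload type string and the manifest field names and values are placeholders, and the actual signing performed by Authority is not shown.

```python
import json


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the exact bytes that get signed."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (
        len(type_bytes), type_bytes, len(payload), payload)


def manifest_signing_input(manifest: dict, payload_type: str) -> bytes:
    # Sorted keys and compact separators so equal manifests sign identically.
    body = json.dumps(manifest, sort_keys=True,
                      separators=(",", ":")).encode("utf-8")
    return pae(payload_type, body)


# Hypothetical manifest content and payload type.
manifest = {
    "binaries": ["/work/bin/app"],
    "tool_digests": {"ghidra": "sha256:<digest>"},
    "ruleset_hash": "<hash>",
}
print(manifest_signing_input(manifest, "application/vnd.example.scan-manifest+json"))
```

Canonicalizing before encoding matters here: the signature covers bytes, so two semantically equal manifests with different key order would otherwise produce different signatures.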
### 7.2 Binary Analysis Cache
Key: `(binary_hash, arch, toolchain_digest, njif_schema_version)`.
* If present:
* Skip re-running Ghidra/rizin; reuse NJIF.
* If absent:
* Run analysis, then cache NJIF.
This provides deterministic replay and prevents re-analysis across scans and across customers (if allowed by tenancy model).
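
The cache behavior can be sketched with an in-memory store keyed by the four-part tuple above. `NjifCache` and `cache_key` are hypothetical names, and hashing the joined key parts is just one way to derive a storage key; a real implementation would sit on top of the object store and DB.

```python
import hashlib


def cache_key(binary_hash: str, arch: str, toolchain_digest: str,
              njif_schema_version: str) -> str:
    """Derive one stable key from the four cache dimensions."""
    material = "\n".join([binary_hash, arch, toolchain_digest, njif_schema_version])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


class NjifCache:
    def __init__(self):
        self._store = {}
        self.analyses = 0  # how many times the expensive analysis actually ran

    def get_or_analyze(self, key_parts: tuple, analyze):
        key = cache_key(*key_parts)
        if key not in self._store:
            self._store[key] = analyze()  # e.g. run Ghidra/rizin, build NJIF
            self.analyses += 1
        return self._store[key]


cache = NjifCache()
parts = ("sha256:abc", "x86_64", "sha256:tools-v1", "1.0.0")  # placeholder values
first = cache.get_or_analyze(parts, lambda: {"njif_schema_version": "1.0.0"})
second = cache.get_or_analyze(parts, lambda: {"njif_schema_version": "1.0.0"})
print(cache.analyses, first is second)
```

Including the toolchain digest and schema version in the key means a tool upgrade or schema change forces re-analysis instead of silently reusing stale NJIF.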
---
## 8. APIs & Integration Contracts
### 8.1 Scanner.WebService External API (REST)
1. `POST /api/scans/images`
* Existing; extended to flag: `includeBinaryReachability: true`.
2. `POST /api/scans/binaries`
* Upload a standalone ELF; returns `scan_id`.
3. `GET /api/scans/{scanId}/reachability`
* Returns list of `(cve_id, component, binary_path, function_id, status, confidence, path)`.
No path versioning; idempotent and additive (new fields appear, old ones remain valid).
### 8.2 Internal APIs
* **Worker ↔ Object Store**:
* `PUT /binary-njif/{sha256}/njif-v1.json`.
* **WebService ↔ Worker (via Scheduler)**:
* Job payload includes:
* `scan_manifest_id`,
* `binary_hashes`,
* `analysis_profile` (`default`, `deep`).
* **WebService ↔ Concelier**:
* `POST /internal/functions/resolve`:
* Input: `(cve_id, component_ids[])`,
* Output: `soname!symbol[]`, optional `func_hash[]`.
* **WebService ↔ Excitior**:
* Existing VEX ingestion extended with **reachability evidence** fields.
---
## 9. Observability, Security, Resource Model
### 9.1 Observability
* **Metrics**:
* Analysis duration per binary,
* NJIF size,
* Cache hit ratio,
* Reachability evaluation time per CVE.
* **Logs**:
* Ghidra/rizin container logs stored alongside NJIF,
* Unknowns logs for unresolved call targets.
* **Tracing**:
* Each scan/analysis annotated with `scan_manifest_id` to allow end-to-end trace.
### 9.2 Security
* Tools containers:
* No outbound network.
* Limited to read-only artifact mount + write-only result mount.
* Binary content:
* Treated as confidential; stored encrypted at rest if your global policy requires it.
* DSSE:
* Authority signs:
* Scan Manifest,
* NJIF blob hash,
* Reachability summary.
* Enables “Proof-of-Integrity Graph” linkage later.
### 9.3 Resource Model
* ELF analysis can be heavy; design for:
* Separate **worker queue** and autoscaling group for binary analysis.
* Configurable max concurrency and per-job CPU/memory limits.
* Deep analysis (indirect calls, vtables) can be toggled via `analysis_profile`.
---
## 10. Implementation Roadmap
A pragmatic, staged plan:
### Phase 0: Foundations (1-2 sprints)
* Create `StellaOps.Scanner.Analyzers.Binary.Elf` project.
* Implement:
* `ElfDetector`, `ElfNormalizer`.
* DB tables: `binary_artifacts`, `binary_njif`.
* Integrate with Scheduler and Worker pipeline.
### Phase 1: Non-stripped ELF + NJIF v1 (2-3 sprints)
* Implement **DWARF + dynsym symbolization**.
* Implement **GhidraDisassemblyAdapter** for x86_64.
* Build **CallGraphBuilder** (direct + PLT calls).
* Implement NJIF serializer v1; store in object store.
* Basic reachability engine in WebService:
* Only direct and PLT edges,
* Only for DWARF-named functions.
* Integrate with Concelier function mapping via `soname!symbol`.
### Phase 2: Stripped ELF Support (2-3 sprints)
* Implement `HeuristicFunctionFinder` for function discovery in stripped binaries.
* Extend Ghidra script to mark PLT/GOT, vtables, function pointers.
* Call graph: add:
* `indirect-funcptr`, `indirect-vtable`, `tailcall` edges.
* Evidence tagging and local confidence scoring.
* Extend reachability engine to:
* Distinguish `REACHABLE_CONFIRMED` vs `REACHABLE_POSSIBLE`.
### Phase 3: Multi-Arch & Performance (2-3 sprints)
* Add support for **aarch64** (Ghidra language, appropriate calling conventions).
* Optimize:
* Binary analysis cache,
* Tool container lifecycle,
* Concurrent analysis.
* Add Unknowns reporting and hookup to UnknownsRegistry (if already implemented).
### Phase 4: Patch-Oracle Pilot (2-3 sprints)
* Implement PatchOracle module:
* Compare old/new NJIFs,
* Detect changed functions.
* Integrate with Concelier's advisory graph.
* Start validating against curated CVE/patch datasets.
### Phase 5: Hardening & Documentation
* Golden fixtures:
* Small ELF zoo (stripped/non-stripped, PIE, static, C++, vtables).
* Known vulnerable libs (e.g., OpenSSL, zlib) to confirm correct function mapping.
* Add CLI/demo in `StellaOps.Scanner.Cli`:
* `stellaops scan-binary --file app --show-reachability`.
* Customer-facing and internal docs:
* NJIF schema,
* API usage,
* Limitations and interpretation guidelines.
---
If you want, the next step I can take with this plan is to:
* Break it into **epics / tickets** (SCAN-BINARY-xxx) with clear DoD per phase, or
* Draft the **Ghidra headless Java script** and the **.NET NJIF model classes** so your agents can plug them straight into the Scanner repo.