feat: Add new provenance and crypto registry documentation
- Introduced attestation inventory and subject-rekor mapping files for tracking Docker packages.
- Added a comprehensive crypto registry decision document outlining defaults and required follow-ups.
- Created an offline feeds manifest for bundling air-gap resources.
- Implemented a script to generate and update binary manifests for curated binaries.
- Added a verification script to ensure binary artefacts are located in approved directories.
- Defined new schemas for AdvisoryEvidenceBundle, OrchestratorEnvelope, ScannerReportReadyPayload, and ScannerScanCompletedPayload.
- Established project files for StellaOps.Orchestrator.Schemas and StellaOps.PolicyAuthoritySignals.Contracts.
- Updated vendor manifest to track pinned binaries for integrity.
docs/product-advisories/17-Nov-2026 - 1.md
@@ -0,0 +1,846 @@

Here’s a compact blueprint for bringing **stripped ELF binaries** into StellaOps’s **call‑graph + reachability scoring**, going from raw bytes → neutral JSON → deterministic scoring.

---

# Why this matters (quick)

Even when symbols are missing, you can still (1) recover functions, (2) build a call graph, and (3) decide if a vulnerable function is *actually* reachable from the binary’s entrypoints. This feeds StellaOps’s deterministic scoring/lattice engine so VEX decisions are evidence‑backed, not guesswork.

---
# High‑level pipeline

1. **Ingest**

   * Accept: ELF (static/dynamic), PIE, musl/glibc, multiple arches (x86_64, aarch64, armhf, riscv64).
   * Normalize: compute file hash set (SHA‑256, BLAKE3), note `PT_DYNAMIC`, `DT_NEEDED`, interpreter, RPATH/RUNPATH.

2. **Symbolization (best‑effort)**

   * **If DWARF present**: read `.debug_*` (function names, inlines, CU boundaries, ranges).
   * **If stripped**:

     * Use disassembler to **discover functions** (prologue patterns, xref‑to‑targets, thunk detection).
     * Derive **synthetic names**: `sub_<va>`, `plt_<name>` (from dynamic symbol table if available), `extern@libc.so.6:memcpy`.
     * Lift exported dynsyms and PLT stubs even when local symbols are removed.
     * Recover **string‑referenced names** (e.g., Go/Python/C++ RTTI/Itanium mangling where present).

3. **Disassembly & IR**

   * Disassemble to basic blocks; lift to a neutral IR (SSA‑like) sufficient for:

     * Call edges (direct `call`/`bl`).
     * **Indirect calls** via the GOT, vtables, function pointers (approximate with points‑to sets).
     * Tailcalls, thunks, PLT interposition.

4. **Call‑graph build**

   * Start from **entrypoints**:

     * ELF entry (`_start`), constructors (`.init_array`), exported API (public symbols), `main` (if recoverable).
     * Optional: **entry‑trace** (cmd‑line + env + loader path) from container image to seed realistic roots.
   * Build **CG** with:

     * Direct edges: precise.
     * Indirect edges: conservative, with **evidence tags** (GOT target set, vtable class set, signature match).
   * Record **inter‑module edges** to shared libs (soname + version) with relocation evidence.

5. **Reachability scoring (deterministic)**

   * Input: list of vulnerable functions/paths (from CSAF/CVE KB) normalized to **function‑level identifiers** (soname!symbol or hash‑based if unnamed).
   * Compute **reachability** from roots → target:

     * `REACHABLE_CONFIRMED` (path with only precise edges),
     * `REACHABLE_POSSIBLE` (path contains conservative edges),
     * `NOT_REACHABLE_FOUNDATION` (no path in current graph).
   * Add **confidence** derived from edge evidence + relocation proof.
   * Emit **proof trails** (the exact path: nodes, edges, evidence).

6. **Neutral JSON intermediate (NJIF)**

   * Stored in cache; signed for deterministic replay.
   * Consumed by StellaOps.Policy/Lattice to merge with VEX.

---
# Neutral JSON Intermediate Format (NJIF)

```json
{
  "artifact": {
    "path": "/work/bin/app",
    "hashes": {"sha256": "…", "blake3": "…"},
    "arch": "x86_64",
    "elf": {
      "type": "ET_DYN",
      "interpreter": "/lib64/ld-linux-x86-64.so.2",
      "needed": ["libc.so.6", "libssl.so.3"],
      "rpath": [],
      "runpath": []
    }
  },
  "symbols": {
    "exported": [
      {"id": "libc.so.6!memcpy", "kind": "dynsym", "addr": "0x0", "plt": true}
    ],
    "functions": [
      {"id": "sub_401000", "addr": "0x401000", "size": 112, "name_hint": null, "from": "disasm"},
      {"id": "main", "addr": "0x4023d0", "size": 348, "from": "dwarf|heuristic"}
    ]
  },
  "cfg": [
    {"func": "main", "blocks": [
      {"b": "0x4023d0", "succ": ["0x402415"], "calls": [{"type": "direct", "target": "sub_401000"}]},
      {"b": "0x402415", "succ": ["0x402440"], "calls": [{"type": "plt", "target": "libc.so.6!memcpy"}]}
    ]}
  ],
  "cg": {
    "nodes": [
      {"id": "main", "evidence": ["dwarf|heuristic"]},
      {"id": "sub_401000"},
      {"id": "libc.so.6!memcpy", "external": true, "lib": "libc.so.6"}
    ],
    "edges": [
      {"from": "main", "to": "sub_401000", "kind": "direct"},
      {"from": "main", "to": "libc.so.6!memcpy", "kind": "plt", "evidence": ["reloc@GOT"]}
    ],
    "roots": ["_start", "init_array[]", "main"]
  },
  "reachability": [
    {
      "target": "libssl.so.3!SSL_free",
      "status": "NOT_REACHABLE_FOUNDATION",
      "path": []
    },
    {
      "target": "libc.so.6!memcpy",
      "status": "REACHABLE_CONFIRMED",
      "path": ["main", "libc.so.6!memcpy"],
      "confidence": 0.98,
      "evidence": ["plt", "dynsym", "reloc"]
    }
  ],
  "provenance": {
    "toolchain": {
      "disasm": "ghidra_headless|radare2|llvm-mca",
      "version": "…"
    },
    "scan_manifest_hash": "…",
    "timestamp_utc": "2025-11-16T00:00:00Z"
  }
}
```
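
To show how a downstream consumer might read this format, here is a minimal Python sketch that parses a trimmed copy of the `cg` section and indexes it. The helper names `build_adjacency` and `external_targets` are illustrative assumptions, not part of any StellaOps API; only the JSON field names come from the example above.

```python
import json
from collections import defaultdict

# Trimmed NJIF document mirroring the `cg` section of the example above.
NJIF_TEXT = """
{
  "cg": {
    "nodes": [
      {"id": "main"},
      {"id": "sub_401000"},
      {"id": "libc.so.6!memcpy", "external": true, "lib": "libc.so.6"}
    ],
    "edges": [
      {"from": "main", "to": "sub_401000", "kind": "direct"},
      {"from": "main", "to": "libc.so.6!memcpy", "kind": "plt", "evidence": ["reloc@GOT"]}
    ],
    "roots": ["_start", "init_array[]", "main"]
  }
}
"""


def build_adjacency(njif):
    """Map each caller id to a sorted list of (callee, kind) pairs."""
    adjacency = defaultdict(list)
    for edge in njif["cg"]["edges"]:
        adjacency[edge["from"]].append((edge["to"], edge["kind"]))
    # Sorting keeps the index independent of edge order in the file.
    return {caller: sorted(callees) for caller, callees in sorted(adjacency.items())}


def external_targets(njif):
    """Return ids of nodes flagged as external (imported from shared libraries)."""
    return sorted(node["id"] for node in njif["cg"]["nodes"] if node.get("external"))


njif = json.loads(NJIF_TEXT)
adjacency = build_adjacency(njif)
externals = external_targets(njif)
print(adjacency)
print(externals)
```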

---

# Practical extractors (headless/CLI)

* **DWARF**: `llvm-dwarfdump`/`eu-readelf` for quick CU/function ranges; fall back to the disassembler.
* **Disassembly/CFG/CG** (choose one or more; wrap with a stable adapter):

  * **Ghidra Headless API**: recover functions, basic blocks, references, PLT/GOT, vtables; export via a custom headless script to NJIF.
  * **radare2 / rizin**: `aaa`, `agCd`, `aflj`, `agj` to export functions/graphs as JSON.
  * **Binary Ninja headless** (if license permits) for cleaner IL and indirect‑call modeling.
  * **angr** for path‑sensitive refinement on tricky indirect calls (optional, gated by budget).

**Adapter principle:** All tools output a **small, consistent NJIF** so the scoring engine and lattice logic never depend on any single RE tool.

---

# Indirect call modeling (concise rules)

* **PLT/GOT**: edge from caller → `soname!symbol` with evidence: `plt`, `reloc@GOT`.
* **Function pointers**: if a store to a pointer is found and targets a known function set `{f1…fk}`, add edges with `kind: "indirect"`, `evidence: ["xref-store", "sig-compatible"]`.
* **Virtual calls / vtables**: class‑method set from RTTI/vtable scans; mark edges `evidence: ["vtable-match"]`.
* **Tailcalls**: treat as edges, not fallthrough.

Each conservative step lowers **confidence**, but keeps determinism: the rules and their hashes are in the scan manifest.
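
The rules above can be sketched as a small edge factory. This is an illustration only: the call-site record shape (`kind`, `target`, `targets`) and the function name `edges_for_call_site` are assumptions made for the example, and tailcall edges carry no evidence tags here because the rules do not name any.

```python
def edges_for_call_site(caller, site):
    """Turn one (hypothetical) call-site record into call-graph edges."""
    kind = site["kind"]
    if kind == "plt":
        # PLT/GOT: one edge to soname!symbol, backed by the relocation.
        return [{"from": caller, "to": site["target"], "kind": "plt",
                 "evidence": ["plt", "reloc@GOT"]}]
    if kind == "funcptr":
        # Function pointers: one conservative edge per candidate target.
        return [{"from": caller, "to": target, "kind": "indirect",
                 "evidence": ["xref-store", "sig-compatible"]}
                for target in sorted(site["targets"])]
    if kind == "vtable":
        # Virtual calls: candidates come from the RTTI/vtable scan.
        return [{"from": caller, "to": target, "kind": "indirect",
                 "evidence": ["vtable-match"]}
                for target in sorted(site["targets"])]
    if kind == "tailcall":
        # A tail jump into another function is an edge, not fallthrough.
        return [{"from": caller, "to": site["target"], "kind": "tailcall",
                 "evidence": []}]
    return [{"from": caller, "to": site["target"], "kind": "direct", "evidence": []}]


call_sites = [
    {"kind": "plt", "target": "libc.so.6!memcpy"},
    {"kind": "funcptr", "targets": {"sub_401200", "sub_401100"}},
    {"kind": "tailcall", "target": "sub_401300"},
]
# Sorting candidate targets keeps the emitted edge order stable across runs.
edges = [edge for site in call_sites for edge in edges_for_call_site("main", site)]
for edge in edges:
    print(edge)
```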

---

# Deterministic scoring (plug into Stella’s lattice)

* **Inputs**: NJIF, CVE→function mapping (`soname!symbol` or function hash), policy knobs.
* **States**: `{NOT_OBSERVED < POSSIBLE < REACHABLE_CONFIRMED}` with **monotone** merge (never oscillates).
* **Confidence**: product of edge evidences (configurable weights): `direct=1.0, plt=0.98, vtable=0.85, funcptr=0.7`.
* **Output**: OpenVEX/CSAF annotations + human proof path; signed with DSSE to preserve replayability.
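
A minimal sketch of the two calculations named above, assuming the merge is the join (maximum) over the three ordered states and that path confidence is the plain product of the listed weights. The function names are illustrative, not an existing StellaOps interface.

```python
from functools import reduce
from math import prod

# Ordered lattice states, lowest first, as listed above.
STATES = ("NOT_OBSERVED", "POSSIBLE", "REACHABLE_CONFIRMED")
RANK = {state: index for index, state in enumerate(STATES)}

# Default per-edge weights from the text; treated as configurable.
EDGE_WEIGHTS = {"direct": 1.0, "plt": 0.98, "vtable": 0.85, "funcptr": 0.7}


def merge(a, b):
    """Join of two states: the higher one wins, so merging never moves down."""
    return a if RANK[a] >= RANK[b] else b


def merge_all(observations):
    """Fold any number of observations, starting from the bottom state."""
    return reduce(merge, observations, STATES[0])


def path_confidence(edge_kinds):
    """Product of the weights of the edges along one path."""
    return prod(EDGE_WEIGHTS[kind] for kind in edge_kinds)


observed = ["POSSIBLE", "NOT_OBSERVED", "REACHABLE_CONFIRMED", "POSSIBLE"]
final_state = merge_all(observed)
# The join is commutative and associative, so input order does not matter.
final_state_reversed = merge_all(reversed(observed))
confidence = path_confidence(["direct", "plt", "funcptr"])
print(final_state, round(confidence, 3))
```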

---

# Minimal Ghidra headless skeleton (exporter idea)

```bash
analyzeHeadless /work/gh_proj MyProj -import app -scriptPath scripts \
  -postScript ExportNjif.java /out/app.njif.json
```

```java
// ExportNjif.java (outline)
import ghidra.app.script.GhidraScript;
import ghidra.program.model.listing.Function;

public class ExportNjif extends GhidraScript {
  @Override
  public void run() throws Exception {
    String outPath = getScriptArgs()[0]; // e.g. /out/app.njif.json
    // getFunctions(true) iterates in ascending address order
    for (Function fn : currentProgram.getFunctionManager().getFunctions(true)) {
      // collect functions, blocks, calls, externs/PLT
      // map non-named functions to sub_<addr>
      // detect PLT thunks → dynsym names
    }
    // write NJIF JSON to outPath deterministically (sorted keys, stable ordering)
  }
}
```

---
# Integration points in StellaOps

* **Scanner.Analyzers.Binary.Elf**

  * `ElfNormalizer` → hashes, dynamic deps.
  * `Symbolizer` → DWARF reader + HeuristicDisasm (via tool adapter).
  * `CgBuilder` → NJIF builder/merger (multi‑module).
  * `ReachabilityEngine` → path search + confidence math.
  * `Emitter` → NJIF cache + VEX/CSAF notes.

* **Scheduler**: memoize by `(hashes, toolchain_version, ruleset_hash)` to ensure replayable results.

* **Authority**: sign NJIF + scoring outputs; store manifests (feeds, rule weights, tool versions).

---

# Test fixtures (suggested)

* Tiny ELF zoo: statically linked, PIE, stripped/non‑stripped, C++ with vtables, musl vs glibc.
* Known CVE libs (e.g., `libssl`, `zlib`) with versioned symbols to validate soname!symbol mapping.
* Synthetic binaries with function‑pointer tables to validate conservative edges.

---

If you want, I can generate:

* A ready‑to‑run **Ghidra headless exporter** (Java) that writes NJIF exactly like above.
* A small **.NET parser** that ingests NJIF and emits StellaOps reachability + OpenVEX notes.

Below is a full architecture plan for implementing **stripped-ELF binary reachability** (call graph + NJIF + deterministic scoring, with a hook for patch-oracles) inside **StellaOps**.

I will assume .NET 10, existing microservice split (Scanner.WebService, Scanner.Worker, Concelier, Excitior, Authority, Scheduler, Sbomer, Signals), and your standing rule: **all lattice logic runs in Scanner.WebService**.

---
## 1. Scope, Objectives, Non-Goals

### 1.1 Objectives

1. **Recover function-level call graphs from ELF binaries**, including **stripped** ones:

   * Support ET_EXEC / ET_DYN / PIE, static & dynamic linking.
   * Support at least **x86_64, aarch64** in v1, later armhf, riscv64.

2. **Produce a neutral, deterministic JSON representation (NJIF)**:

   * Tool-agnostic: can be generated from Ghidra, radare2/rizin, Binary Ninja, angr, etc.
   * Stable identifiers and schema so downstream services don’t depend on a specific RE engine.

3. **Compute function-level reachability for vulnerabilities**:

   * Given CVE → `soname!symbol` (and later function-hash) mappings from Concelier,
   * Decide `REACHABLE_CONFIRMED` / `REACHABLE_POSSIBLE` / `NOT_REACHABLE_FOUNDATION` with evidence and confidence.

4. **Integrate with StellaOps lattice and VEX outputs**:

   * Lattice logic runs in **Scanner.WebService**.
   * Results flow into Excitior (VEX) and Sbomer (SBOM annotations), preserving provenance.

5. **Enable deterministic replay**:

   * Every analysis run is tied to a **Scan Manifest**: tool versions, ruleset hashes, policy hashes, container image digests.

### 1.2 Non-Goals (v1)

* No dynamic runtime probes (EventPipe/JFR) in this phase.
* No full decompilation; we only need enough IR for calls/edges.
* No aggressive path-sensitive analysis (symbolic execution) in v1; that can be a v2 enhancement.

---
## 2. High-Level System Architecture

### 2.1 Components

* **Scanner.WebService (existing)**

  * REST/gRPC API for scans.
  * Orchestrates analysis jobs via Scheduler.
  * Hosts **Lattice & Reachability Engine** for all artifact types.
  * Reads NJIF results, merges with Concelier function mappings and policies.

* **Scanner.Worker (existing, extended)**

  * Executes **Binary Analyzer Pipelines**.
  * Invokes RE tools (Ghidra, rizin, etc.) in dedicated containers.
  * Produces NJIF and persists it.

* **Binary Tools Containers (new)**

  * `stellaops-tools-ghidra:<tag>`
  * `stellaops-tools-rizin:<tag>`
  * Optionally `stellaops-tools-angr` for advanced passes.
  * Pinned versions, no network access (for determinism & air-gap).

* **Storage & Metadata**

  * **DB (PostgreSQL)**: scan records, NJIF metadata, reachability summaries.
  * **Object store** (MinIO/S3/Filesystem): NJIF JSON blobs, tool logs.
  * **Authority**: DSSE signatures for Scan Manifest, NJIF, and reachability outputs.

* **Concelier**

  * Provides **CVE → component → function symbol/hashes** resolution.
  * Exposes “Link-Not-Merge” graph of advisory, component, and function nodes.

* **Excitior (VEX)**

  * Consumes Scanner.WebService reachability states.
  * Emits OpenVEX/CSAF with properly justified statuses.

* **UnknownsRegistry (future)**

  * Receives unresolvable call edges / ambiguous functions from the analyzer,
  * Feeds them into “adaptive security” workflows.

### 2.2 End-to-End Flow (Binary / Image Scan)

1. Client requests scan (binary or container image) via **Scanner.WebService**.
2. WebService:

   * Extracts binaries from OCI layers (if scanning image),
   * Registers **Scan Manifest**,
   * Submits a job to Scheduler (queue: `binary-elfflow`).
3. Scanner.Worker dequeues the job:

   * Detects ELF binaries,
   * Runs **Binary Analyzer Pipeline** for each unique binary hash.
4. Worker uses tools containers:

   * Ghidra/rizin → CFG, function discovery, call graph,
   * Converts to **NJIF**.
5. Worker persists NJIF + metadata; marks analysis complete.
6. Scanner.WebService picks up NJIF:

   * Fetches advisory function mappings from Concelier,
   * Runs **Reachability & Lattice scoring**,
   * Updates scan results and triggers Excitior / Sbomer.

All steps are deterministic given:

* Input artifact,
* Tool container digests,
* Ruleset/policy versions.

---
## 3. Binary Analyzer Subsystem (Scanner.Worker)

Introduce a dedicated module:

* `StellaOps.Scanner.Analyzers.Binary.Elf`

### 3.1 Internal Layers

1. **ElfDetector**

   * Inspects files in a scan:

     * Magic `0x7f 'E' 'L' 'F'`,
     * Confirms architecture via ELF header.
   * Produces `BinaryArtifact` records with:

     * `hashes` (SHA-256, BLAKE3),
     * `path` in container,
     * `arch`, `endianness`.

2. **ElfNormalizer**

   * Uses a lightweight library (e.g., ELFSharp) to extract:

     * `ElfType` (ET_EXEC, ET_DYN),
     * interpreter (`PT_INTERP`),
     * `DT_NEEDED` list,
     * RPATH/RUNPATH,
     * presence/absence of DWARF sections.
   * Emits a normalized `ElfMetadata` DTO.

3. **Symbolization Layer**

   * Sub-components:

     * `DwarfSymbolReader`: if DWARF present, read CU, function ranges, names, inlines.
     * `DynsymReader`: parse `.dynsym`, `.plt`, exported symbols.
     * `HeuristicFunctionFinder`:

       * For stripped binaries:

         * Use disassembler xrefs, prologue patterns, return instructions, call-targets.
         * Recognize PLT thunks → `soname!symbol`.
   * Consolidates into `FunctionSymbol` entities:

     * `id` (e.g., `main`, `sub_401000`, `libc.so.6!memcpy`),
     * `addr`, `size`, `is_external`, `from` (`dwarf`, `dynsym`, `heuristic`).

4. **Disassembly & IR Layer**

   * Abstraction: `IDisassemblyAdapter`:

     * `Task<DisasmResult> AnalyzeAsync(BinaryArtifact, ElfMetadata, ScanManifest)`
   * Implementations:

     * `GhidraDisassemblyAdapter`:

       * Invokes headless Ghidra in container,
       * Receives machine-readable JSON (script-produced),
       * Extracts functions, basic blocks, calls, GOT/PLT info, vtables.
     * `RizinDisassemblyAdapter` (backup/fallback).
   * Produces:

     * `BasicBlock` objects,
     * `Instruction` metadata where needed for calls,
     * `CallSite` records (direct, PLT, indirect).

5. **Call-Graph Builder**

   * Consumes `FunctionSymbol` + `CallSite` sets.
   * Identifies **roots**:

     * `_start`, `.init_array` entries,
     * `main` (if present),
     * Exported API functions for shared libs.
   * Creates `CallGraph`:

     * Nodes: functions (`FunctionNode`),
     * Edges: `CallEdge` with:

       * `kind`: `direct`, `plt`, `indirect-funcptr`, `indirect-vtable`, `tailcall`,
       * `evidence`: tags like `["reloc@GOT", "sig-match", "vtable-class"]`.

6. **Evidence & Confidence Annotator**

   * For each edge, computes a **local confidence**:

     * `direct`: 1.0
     * `plt`: 0.98
     * `indirect-funcptr`: 0.7
     * `indirect-vtable`: 0.85
   * For each path later, Scanner.WebService composes these.

7. **NJIF Serializer**

   * Transforms domain objects into **NJIF JSON**:

     * Sorted keys, stable ordering for determinism.
   * Writes:

     * `artifact`, `elf`, `symbols`, `cfg`, `cg`, and partial `reachability: []` (filled by WebService).
   * Stores in object store, returns location + hash to DB.

8. **Unknowns Reporting**

   * Unresolved items are recorded, including:

     * Indirect calls with an empty target set,
     * Function regions not mapped to a symbol.
   * Each is logged as an `UnknownEvidence` record and optionally published to the **UnknownsRegistry** stream.
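
As an illustration of the first layer, the sketch below performs the `ElfDetector` checks (magic, class, byte order, machine) on raw bytes. It is a Python stand-in for what would be .NET code in the Worker; the function name `detect_elf` and the returned dict shape are assumptions, and BLAKE3 is omitted because it is not in the Python standard library. The header offsets and machine constants come from the ELF specification.

```python
import hashlib
import struct

ELF_MAGIC = b"\x7fELF"
# e_machine values from the ELF specification.
MACHINES = {62: "x86_64", 183: "aarch64", 40: "arm", 243: "riscv"}
# e_type values relevant to this plan.
TYPES = {2: "ET_EXEC", 3: "ET_DYN"}


def detect_elf(data, path):
    """Return a BinaryArtifact-like dict, or None when data is not an ELF file."""
    if len(data) < 20 or data[:4] != ELF_MAGIC:
        return None
    ei_class, ei_data = data[4], data[5]  # EI_CLASS, EI_DATA
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        return None
    byte_order = "<" if ei_data == 1 else ">"
    # e_type and e_machine are 16-bit fields directly after the 16-byte e_ident.
    e_type, e_machine = struct.unpack_from(byte_order + "HH", data, 16)
    return {
        "path": path,
        "hashes": {"sha256": hashlib.sha256(data).hexdigest()},
        "arch": MACHINES.get(e_machine, f"unknown({e_machine})"),
        "bits": 32 if ei_class == 1 else 64,
        "endianness": "little" if ei_data == 1 else "big",
        "elf_type": TYPES.get(e_type, f"other({e_type})"),
    }


# Synthetic 64-bit little-endian header: magic, class, data, version,
# 9 bytes of remaining e_ident, then e_type=ET_DYN and e_machine=x86_64.
header = ELF_MAGIC + bytes([2, 1, 1]) + bytes(9) + struct.pack("<HH", 3, 62)
artifact = detect_elf(header, "/work/bin/app")
not_elf = detect_elf(b"MZ" + bytes(30), "/work/bin/tool.exe")
print(artifact["arch"], artifact["elf_type"], not_elf)
```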

---

## 4. NJIF Data Model (Neutral JSON Intermediate Format)

Define a stable schema with a top-level `njif_schema_version` field.

### 4.1 Top-Level Shape

```json
{
  "njif_schema_version": "1.0.0",
  "artifact": { ... },
  "symbols": { ... },
  "cfg": [ ... ],
  "cg": { ... },
  "reachability": [ ... ],
  "provenance": { ... }
}
```

### 4.2 Key Sections

1. `artifact`

   * `path`, `hashes`, `arch`, `elf.type`, `interpreter`, `needed`, `rpath`, `runpath`.

2. `symbols`

   * `exported`: external/dynamic symbols, especially PLT:

     * `id`, `kind`, `plt`, `lib`.
   * `functions`:

     * `id` (synthetic or real name),
     * `addr`, `size`, `from` (source of naming info),
     * `name_hint` (optional).

3. `cfg`

   * Per-function basic block CFG plus call sites:

     * Blocks with `succ`, `calls` entries.
   * Sufficient for future static checks, not full IR.

4. `cg`

   * `nodes`: function nodes with evidence tags.
   * `edges`: call edges with:

     * `from`, `to`, `kind`, `evidence`.
   * `roots`: entrypoints for reachability algorithms.

5. `reachability`

   * Initially empty from Worker.
   * Populated in Scanner.WebService as:

     ```json
     {
       "target": "libssl.so.3!SSL_free",
       "status": "REACHABLE_CONFIRMED",
       "path": ["_start", "main", "libssl.so.3!SSL_free"],
       "confidence": 0.93,
       "evidence": ["plt", "dynsym", "reloc"]
     }
     ```

6. `provenance`

   * `toolchain`:

     * `disasm`: `"ghidra_headless:10.4"`, etc.
   * `scan_manifest_hash`,
   * `timestamp_utc`.

### 4.3 Persisting NJIF

* Object store (versioned path):

  * `njif/{sha256}/njif-v1.json`
* DB table `binary_njif`:

  * `binary_hash`, `njif_hash`, `schema_version`, `toolchain_digest`, `scan_manifest_id`.
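
One way to get a stable `njif_hash` is to hash a canonical byte encoding of the document. The sketch below assumes canonicalization means sorted object keys and compact separators; it does not reorder arrays, so the producer would still need to emit lists such as `cg.edges` in a stable order. The function names are illustrative.

```python
import hashlib
import json


def canonical_njif_bytes(njif):
    """Serialize with sorted keys and compact separators so equal documents hash equally."""
    text = json.dumps(njif, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def object_key(binary_sha256):
    """Object-store path following the versioned layout above."""
    return f"njif/{binary_sha256}/njif-v1.json"


def njif_record(njif, binary_sha256, toolchain_digest, scan_manifest_id):
    """Build the row for the `binary_njif` table described above."""
    blob = canonical_njif_bytes(njif)
    return {
        "binary_hash": binary_sha256,
        "njif_hash": hashlib.sha256(blob).hexdigest(),
        "schema_version": njif["njif_schema_version"],
        "toolchain_digest": toolchain_digest,
        "scan_manifest_id": scan_manifest_id,
    }


# Same content, different key insertion order.
doc_a = {"njif_schema_version": "1.0.0", "cg": {"roots": ["main"], "edges": []}}
doc_b = {"cg": {"edges": [], "roots": ["main"]}, "njif_schema_version": "1.0.0"}
record_a = njif_record(doc_a, "ab" * 32, "sha256:toolchain", "manifest-1")
record_b = njif_record(doc_b, "ab" * 32, "sha256:toolchain", "manifest-1")
print(object_key("ab" * 32))
print(record_a["njif_hash"] == record_b["njif_hash"])
```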

---

## 5. Reachability & Lattice Integration (Scanner.WebService)

### 5.1 Inputs

* **NJIF** for each binary (possibly multiple binaries per container).
* Concelier’s **CVE → (component, function)** resolution:

  * `component_id` → `soname!symbol` sets, and where available, function hashes.
* Scanner’s existing **lattice policies**:

  * States: e.g. `NOT_OBSERVED < POSSIBLE < REACHABLE_CONFIRMED`.
  * Merge rules are monotone.

### 5.2 Reachability Engine

New service module:

* `StellaOps.Scanner.Domain.Reachability`

  * `INjifRepository` (reads NJIF JSON),
  * `IFunctionMappingResolver` (Concelier adapter),
  * `IReachabilityCalculator`.

Algorithm per target function:

1. Resolve vulnerable function(s):

   * From Concelier: `soname!symbol` and/or `func_hash`.
   * Map to NJIF `symbols.exported` or `symbols.functions`.

2. For each binary:

   * Use `cg.roots` as entry set.
   * BFS/DFS along edges until:

     * Reaching target node(s),
     * Or graph fully explored.

3. For each successful path:

   * Collect edges’ `confidence` weights, compute path confidence:

     * e.g., product of edge confidences or a log/additive scheme.

4. Aggregate result:

   * If ≥ 1 path with only `direct/plt` edges:

     * `status = REACHABLE_CONFIRMED`.
   * Else if every path found contains at least one indirect edge:

     * `status = REACHABLE_POSSIBLE`.
   * Else (no path found):

     * `status = NOT_REACHABLE_FOUNDATION`.

5. Emit `reachability` entry back into NJIF (or as separate DB table) and into scan result graph.
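
The aggregation in step 4 can be sketched with two breadth-first searches: one restricted to `direct`/`plt` edges, then one over all edges. This is a minimal illustration under that assumption, not the engine's actual design; the helper names are hypothetical and confidence scoring is left out.

```python
from collections import deque

PRECISE_KINDS = {"direct", "plt"}


def _reachable(roots, edges, allowed_kinds=None):
    """BFS from roots; returns {node: predecessor} for every node reached."""
    adjacency = {}
    for edge in edges:
        if allowed_kinds is None or edge["kind"] in allowed_kinds:
            adjacency.setdefault(edge["from"], []).append(edge["to"])
    parent = {root: None for root in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for successor in sorted(adjacency.get(node, [])):  # sorted for determinism
            if successor not in parent:
                parent[successor] = node
                queue.append(successor)
    return parent


def _path_to(parent, target):
    """Walk predecessor links back to a root and return the root-to-target path."""
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def classify(cg, target):
    """Apply the aggregation rule above to one target function."""
    precise = _reachable(cg["roots"], cg["edges"], PRECISE_KINDS)
    if target in precise:
        return {"target": target, "status": "REACHABLE_CONFIRMED",
                "path": _path_to(precise, target)}
    any_kind = _reachable(cg["roots"], cg["edges"])
    if target in any_kind:
        return {"target": target, "status": "REACHABLE_POSSIBLE",
                "path": _path_to(any_kind, target)}
    return {"target": target, "status": "NOT_REACHABLE_FOUNDATION", "path": []}


cg = {
    "roots": ["main"],
    "edges": [
        {"from": "main", "to": "sub_a", "kind": "direct"},
        {"from": "sub_a", "to": "libssl.so.3!SSL_free", "kind": "plt"},
        {"from": "main", "to": "sub_b", "kind": "indirect-funcptr"},
        {"from": "sub_b", "to": "libz.so.1!inflate", "kind": "plt"},
    ],
}
confirmed = classify(cg, "libssl.so.3!SSL_free")
possible = classify(cg, "libz.so.1!inflate")
absent = classify(cg, "libc.so.6!system")
print(confirmed["status"], possible["status"], absent["status"])
```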

### 5.3 Lattice & VEX

* Lattice computation is done per `(CVE, component, binary)` triple:

  * Input: reachability status + other signals.
* Resulting state is:

  * Exposed to **Excitior** as a set of **evidence-annotated VEX facts**.
* Excitior translates:

  * `NOT_REACHABLE_FOUNDATION` → likely `not_affected` with justification “code_not_reachable”.
  * `REACHABLE_CONFIRMED` → `affected` or “present_and_exploitable” (depending on overall policy).

---

## 6. Patch-Oracle Extension (Advanced, but Architected Now)

While not strictly required for v1, we should reserve architecture hooks.

### 6.1 Concept

* Given:

  * A **vulnerable** library build (or binary),
  * A **patched** build.
* Run analyzers on both; produce NJIF for each.
* Compare call graphs & function bodies (e.g., hash of normalized bytes):

  * Identify **changed functions** and potentially changed code regions.
* Concelier links those function IDs to specific CVEs (via vendor patch metadata).
* These become authoritative “patched function sets” (the **patch oracle**).

### 6.2 Integration Points

Add a module:

* `StellaOps.Scanner.Analysis.PatchOracle`

  * Input: pair of artifact hashes (old, new) + NJIF.
  * Output: list of `FunctionPatchRecord`:

    * `function_id`, `binary_hash_old`, `binary_hash_new`, `change_kind` (`added`, `modified`, `deleted`).

Concelier:

* Ingests `FunctionPatchRecord` via internal API and updates advisory graph:

  * CVE → function set derived from real patch.

Reachability Engine:

* Uses patch-derived function sets instead of or in addition to symbol mapping from vendor docs.
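
A rough sketch of the comparison step, producing `FunctionPatchRecord`-shaped dicts. It assumes each entry in `symbols.functions` carries a per-function `body_hash` field, which is not part of the NJIF examples in this document and is introduced here only for illustration. Matching by `id` also only works for named functions; synthetic `sub_<addr>` ids shift between builds and would need a different matching strategy.

```python
def diff_functions(old_njif, new_njif):
    """Compare two NJIF documents and list added, modified, and deleted functions."""
    old_hash = old_njif["artifact"]["hashes"]["sha256"]
    new_hash = new_njif["artifact"]["hashes"]["sha256"]
    # `body_hash` is a hypothetical field: a hash of the normalized function bytes.
    old_fns = {f["id"]: f["body_hash"] for f in old_njif["symbols"]["functions"]}
    new_fns = {f["id"]: f["body_hash"] for f in new_njif["symbols"]["functions"]}

    records = []
    for function_id in sorted(old_fns.keys() | new_fns.keys()):
        if function_id not in old_fns:
            change_kind = "added"
        elif function_id not in new_fns:
            change_kind = "deleted"
        elif old_fns[function_id] != new_fns[function_id]:
            change_kind = "modified"
        else:
            continue  # unchanged functions produce no record
        records.append({
            "function_id": function_id,
            "binary_hash_old": old_hash,
            "binary_hash_new": new_hash,
            "change_kind": change_kind,
        })
    return records


vulnerable = {
    "artifact": {"hashes": {"sha256": "old-build"}},
    "symbols": {"functions": [
        {"id": "inflate", "body_hash": "h1"},
        {"id": "inflateReset", "body_hash": "h2"},
        {"id": "legacy_helper", "body_hash": "h3"},
    ]},
}
patched = {
    "artifact": {"hashes": {"sha256": "new-build"}},
    "symbols": {"functions": [
        {"id": "inflate", "body_hash": "h1-fixed"},
        {"id": "inflateReset", "body_hash": "h2"},
        {"id": "bounds_check", "body_hash": "h4"},
    ]},
}
patch_records = diff_functions(vulnerable, patched)
for record in patch_records:
    print(record["function_id"], record["change_kind"])
```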

---

## 7. Persistence, Determinism, Caching

### 7.1 Scan Manifest

For every scan job, create:

* `scan_manifest`:

  * Input artifact hashes,
  * List of binaries,
  * Tool container digests (Ghidra, rizin, etc.),
  * Ruleset/policy/lattice hashes,
  * Time, user, and config flags.

Authority signs this manifest with DSSE.

### 7.2 Binary Analysis Cache

Key: `(binary_hash, arch, toolchain_digest, njif_schema_version)`.

* If present:

  * Skip re-running Ghidra/rizin; reuse NJIF.
* If absent:

  * Run analysis, then cache NJIF.

This provides deterministic replay and avoids repeated analysis of the same binary across scans and across customers (if the tenancy model allows it).
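
The lookup logic is small enough to sketch directly. The example below uses an in-memory dict where a real deployment would use the DB and object store; the class name `BinaryAnalysisCache` and the choice to hash a JSON encoding of the key tuple are assumptions made for illustration.

```python
import hashlib
import json


def cache_key(binary_hash, arch, toolchain_digest, njif_schema_version):
    """Derive one stable key from the four-part tuple described above."""
    material = json.dumps([binary_hash, arch, toolchain_digest, njif_schema_version])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


class BinaryAnalysisCache:
    """In-memory stand-in for the NJIF cache."""

    def __init__(self):
        self._store = {}

    def get_or_analyze(self, key_parts, analyze):
        """Return (njif, cache_hit); run `analyze` only on a miss."""
        key = cache_key(*key_parts)
        if key in self._store:
            return self._store[key], True
        njif = analyze()
        self._store[key] = njif
        return njif, False


analysis_runs = []


def run_disassembler():
    # Placeholder for the expensive Ghidra/rizin invocation.
    analysis_runs.append("run")
    return {"njif_schema_version": "1.0.0"}


cache = BinaryAnalysisCache()
parts = ("sha256:aaa", "x86_64", "sha256:ghidra-image", "1.0.0")
first, first_hit = cache.get_or_analyze(parts, run_disassembler)
second, second_hit = cache.get_or_analyze(parts, run_disassembler)
# A different tool container digest must not reuse the cached result.
upgraded = ("sha256:aaa", "x86_64", "sha256:ghidra-image-v2", "1.0.0")
third, third_hit = cache.get_or_analyze(upgraded, run_disassembler)
print(first_hit, second_hit, third_hit, len(analysis_runs))
```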

---

## 8. APIs & Integration Contracts

### 8.1 Scanner.WebService External API (REST)

1. `POST /api/scans/images`

   * Existing endpoint; extended with a flag: `includeBinaryReachability: true`.
2. `POST /api/scans/binaries`

   * Upload a standalone ELF; returns `scan_id`.
3. `GET /api/scans/{scanId}/reachability`

   * Returns list of `(cve_id, component, binary_path, function_id, status, confidence, path)`.

No path versioning; the API is idempotent and evolves additively (new fields appear, old ones remain valid).

### 8.2 Internal APIs

* **Worker ↔ Object Store**:

  * `PUT /binary-njif/{sha256}/njif-v1.json`.

* **WebService ↔ Worker (via Scheduler)**:

  * Job payload includes:

    * `scan_manifest_id`,
    * `binary_hashes`,
    * `analysis_profile` (`default`, `deep`).

* **WebService ↔ Concelier**:

  * `POST /internal/functions/resolve`:

    * Input: `(cve_id, component_ids[])`,
    * Output: `soname!symbol[]`, optional `func_hash[]`.

* **WebService ↔ Excitior**:

  * Existing VEX ingestion extended with **reachability evidence** fields.

---
## 9. Observability, Security, Resource Model

### 9.1 Observability

* **Metrics**:

  * Analysis duration per binary,
  * NJIF size,
  * Cache hit ratio,
  * Reachability evaluation time per CVE.

* **Logs**:

  * Ghidra/rizin container logs stored alongside NJIF,
  * Unknowns logs for unresolved call targets.

* **Tracing**:

  * Each scan/analysis annotated with `scan_manifest_id` to allow end-to-end trace.

### 9.2 Security

* Tools containers:

  * No outbound network.
  * Limited to read-only artifact mount + write-only result mount.
* Binary content:

  * Treated as confidential; stored encrypted at rest if your global policy requires it.
* DSSE:

  * Authority signs:

    * Scan Manifest,
    * NJIF blob hash,
    * Reachability summary.
  * Enables “Proof-of-Integrity Graph” linkage later.

### 9.3 Resource Model

* ELF analysis can be heavy; design for:

  * Separate **worker queue** and autoscaling group for binary analysis.
  * Configurable max concurrency and per-job CPU/memory limits.
* Deep analysis (indirect calls, vtables) can be toggled via `analysis_profile`.

---
## 10. Implementation Roadmap

A pragmatic, staged plan:

### Phase 0 – Foundations (1–2 sprints)

* Create `StellaOps.Scanner.Analyzers.Binary.Elf` project.
* Implement:

  * `ElfDetector`, `ElfNormalizer`.
  * DB tables: `binary_artifacts`, `binary_njif`.
* Integrate with Scheduler and Worker pipeline.

### Phase 1 – Non-stripped ELF + NJIF v1 (2–3 sprints)

* Implement **DWARF + dynsym symbolization**.
* Implement **GhidraDisassemblyAdapter** for x86_64.
* Build **CallGraphBuilder** (direct + PLT calls).
* Implement NJIF serializer v1; store in object store.
* Basic reachability engine in WebService:

  * Only direct and PLT edges,
  * Only for DWARF-named functions.
* Integrate with Concelier function mapping via `soname!symbol`.

### Phase 2 – Stripped ELF Support (2–3 sprints)

* Implement `HeuristicFunctionFinder` for function discovery in stripped binaries.
* Extend Ghidra script to mark PLT/GOT, vtables, function pointers.
* Call graph: add:

  * `indirect-funcptr`, `indirect-vtable`, `tailcall` edges.
* Evidence tagging and local confidence scoring.
* Extend reachability engine to:

  * Distinguish `REACHABLE_CONFIRMED` vs `REACHABLE_POSSIBLE`.

### Phase 3 – Multi-Arch & Performance (2–3 sprints)

* Add support for **aarch64** (Ghidra language, appropriate calling conventions).
* Optimize:

  * Binary analysis cache,
  * Tool container lifecycle,
  * Concurrent analysis.
* Add Unknowns reporting and hookup to UnknownsRegistry (if already implemented).

### Phase 4 – Patch-Oracle Pilot (2–3 sprints)

* Implement PatchOracle module:

  * Compare old/new NJIFs,
  * Detect changed functions.
* Integrate with Concelier’s advisory graph.
* Start validating against curated CVE/patch datasets.

### Phase 5 – Hardening & Documentation

* Golden fixtures:

  * Small ELF zoo (stripped/non-stripped, PIE, static, C++, vtables).
  * Known vulnerable libs (e.g., OpenSSL, zlib) to confirm correct function mapping.
* Add CLI/demo in `StellaOps.Scanner.Cli`:

  * `stellaops scan-binary --file app --show-reachability`.
* Customer-facing and internal docs:

  * NJIF schema,
  * API usage,
  * Limitations and interpretation guidelines.

---
If you want, as a next step I can take this plan and:

* Break it into **epics / tickets** (SCAN-BINARY-xxx) with clear DoD per phase, or
* Draft the **Ghidra headless Java script** and the **.NET NJIF model classes** so your agents can plug them straight into the Scanner repo.