# Native Reachability Graph Plan (Scanner · Signals Alignment)

## Goals
- Extract native reachability graphs from ELF binaries across layers (stripped and unstripped), emitting:
  - Build IDs (`.note.gnu.build-id`) and code IDs per file.
  - Symbol digests (purl+symbol) and edges (callgraph) with deterministic ordering.
  - Synthetic roots for `_init`, `.init_array`, `.preinit_array`, entry points.
  - DSSE graph bundle per layer for Signals ingestion.
- Offline-friendly, deterministic outputs (stable ordering, UTF-8, UTC).

## Inputs
- Layered filesystem with ELF binaries and shared objects.
- Layer metadata: digests from `scanner.rootfs.layers` and `scanner.layer.archives` (when provided).
- Optional runtime proc snapshot for reconciliation (if available via Signals pipeline).

## Approach
- **Discovery**: Walk layer directories; identify ELF binaries (`e_ident`, machine, class). Record the per-layer path.
- **Identifiers**: Capture the build-id (the identifier stored in the `.note.gnu.build-id` note); fall back to SHA-256 of `.text` when it is absent; store a code-id (PE/ELF-friendly string).
- **Symbols**: Parse `.symtab`/`.dynsym`; compute stable symbol digests (e.g., SHA-256 over symbol bytes + name); include size/address for ordering.
- **Edges**: Build the callgraph from relocation/import tables and (when available) `.eh_frame`/`.plt` linkage; emit Unknown edges when the target is unresolved.
- **Synthetic Roots**: Insert edges from synthetic root nodes (per binary) to `_start`, `_init`, `.init_array` entries.
- **Layer Bundles**: Emit a DSSE bundle per layer with edges, symbols, identifiers, and provenance (layer digest, path, sha256).
- **Determinism**: Sort by layer digest, path, symbol name; normalize paths to POSIX separators; timestamps fixed to generation time in UTC ISO-8601.

## Deliverables
- Library: `StellaOps.Scanner.Analyzers.Native` (new) with ELF reader and graph builder.
- Tests: fixtures under `src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Native.Tests` using stripped/unstripped ELF samples (no network).
- DSSE bundle schema: shared constants/types reused by Signals ingestion.
- Sprint doc links: referenced from `SPRINT_0146_0001_0001_scanner_analyzer_gap_close.md`.

## Task Backlog (initial)
1) Skeleton project `StellaOps.Scanner.Analyzers.Native` + plugin registration for scanner worker.
2) ELF reader: header detection, build-id extraction, code-id calculation, section loader with deterministic sorting.
3) Symbol digests: compute `sha256(name + addr + size + binding)`; emit per-symbol evidence and purl+symbol IDs.
4) Callgraph builder: edges from PLT/relocs/imports; Unknown targets captured; synthetic roots for init arrays.
5) Layer attribution: carry layer digest/source through evidence; emit DSSE bundle per layer with signatures stubbed for now.
6) Tests/fixtures: stripped+unstripped ELF, shared objects, missing build-id, init array edges; golden JSON/NDJSON bundles.
7) Signals alignment: finalize DSSE graph schema and bundle naming; hook into reachability ingestion contract.

## Open Questions
- Final DSSE payload shape (Signals team): currently assumed to be `graph.bundle` with edges, symbols, metadata.
- Whether to include debugline info for coverage (could add optional module later).
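
## Sketch: Build-ID Extraction with Fallback

The identifier step (build-id from `.note.gnu.build-id`, else SHA-256 of `.text`) could look roughly like the following. This is a minimal sketch in Python, used only for brevity. It assumes the caller has already located the raw bytes of the note section and of `.text`, and that note entries use 4-byte padding. The function names and the `gnu-build-id:` / `sha256-text:` prefixes are hypothetical and not part of any agreed schema.

```python
import hashlib
import struct

NT_GNU_BUILD_ID = 3  # note type used by GNU toolchains for build IDs


def parse_build_id(note: bytes, little_endian: bool = True):
    """Return the build ID as lowercase hex, or None if no valid note is found."""
    fmt = "<III" if little_endian else ">III"
    offset = 0
    # Each note entry: namesz, descsz, type (4 bytes each), then name and
    # descriptor, each padded to a 4-byte boundary (assumed here).
    while offset + 12 <= len(note):
        namesz, descsz, ntype = struct.unpack_from(fmt, note, offset)
        offset += 12
        name = note[offset:offset + namesz]
        offset += (namesz + 3) & ~3
        desc = note[offset:offset + descsz]
        offset += (descsz + 3) & ~3
        is_gnu = name == b"GNU\x00"
        complete = descsz > 0 and len(desc) == descsz
        if ntype == NT_GNU_BUILD_ID and is_gnu and complete:
            return desc.hex()
    return None


def file_identifier(note, text: bytes) -> str:
    """Prefer the GNU build ID; fall back to SHA-256 of .text (hypothetical prefixes)."""
    build_id = parse_build_id(note) if note else None
    if build_id is not None:
        return "gnu-build-id:" + build_id
    return "sha256-text:" + hashlib.sha256(text).hexdigest()
```

Returning `None` for truncated or missing notes keeps the fallback path explicit, which is the case the "missing build-id" fixture in the backlog is meant to exercise.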
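
## Sketch: Symbol Digests and Deterministic Ordering

Backlog item 3 gives the digest as `sha256(name + addr + size + binding)` but does not fix a byte encoding, and the determinism rule sorts by layer digest, path, and symbol name. The sketch below shows one possible reading, again in Python for brevity. The encoding (UTF-8 name, NUL terminator, 64-bit little-endian address and size, one binding byte), the `Symbol` type, the `sha256:` prefix, and the address tie-breaker are all assumptions made for illustration.

```python
import hashlib
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    layer_digest: str
    path: str
    name: str
    addr: int
    size: int
    binding: int  # ELF STB_* value: 0 = LOCAL, 1 = GLOBAL, 2 = WEAK


def symbol_digest(sym: Symbol) -> str:
    # Assumed canonical encoding: name bytes, NUL terminator, then addr and
    # size as unsigned 64-bit little-endian, then binding as a single byte.
    payload = sym.name.encode("utf-8") + b"\x00"
    payload += struct.pack("<QQB", sym.addr, sym.size, sym.binding)
    return "sha256:" + hashlib.sha256(payload).hexdigest()


def sort_key(sym: Symbol):
    # Normalize to POSIX separators before comparing; address is an assumed
    # tie-breaker for symbols that share a name.
    return (sym.layer_digest, sym.path.replace("\\", "/"), sym.name, sym.addr)


def ordered_symbols(symbols):
    """Return symbols in a stable order that does not depend on input order."""
    return sorted(symbols, key=sort_key)
```

A fixed-width binary encoding avoids ambiguity between, for example, a name ending in digits and the address that follows it; whatever encoding is finally chosen would need to be shared with Signals so both sides compute the same digest.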