Native Reachability Graph Plan (Scanner · Signals Alignment)
Goals
- Extract native reachability graphs from ELF binaries across layers (stripped and unstripped), emitting:
  - Build IDs (`.note.gnu.build-id`) and code IDs per file.
  - Symbol digests (purl+symbol) and edges (callgraph) with deterministic ordering.
  - Synthetic roots for `_init`, `.init_array`, `.preinit_array`, and entry points.
  - A DSSE graph bundle per layer for Signals ingestion.
- Offline-friendly, deterministic outputs (stable ordering, UTF-8, UTC).
Inputs
- Layered filesystem with ELF binaries and shared objects.
- Layer metadata: digests from `scanner.rootfs.layers` and `scanner.layer.archives` (when provided).
- Optional runtime proc snapshot for reconciliation (if available via the Signals pipeline).
Approach
- Discovery: Walk layer directories; identify ELF binaries (`e_ident`, machine, class). Record the per-layer path.
- Identifiers: Capture the build-id (the descriptor of the `.note.gnu.build-id` note, usually a linker-computed hash); fall back to SHA-256 of `.text` when it is absent. Store a code-id (PE/ELF-friendly string).
- Symbols: Parse `.symtab`/`.dynsym`; compute stable symbol digests (e.g., SHA-256 over symbol bytes + name); include size/address for ordering.
- Edges: Build the callgraph from relocation/import tables and (when available) `.eh_frame`/`.plt` linkage; emit Unknown edges when the target is unresolved.
- Synthetic Roots: Insert edges from synthetic root nodes (per binary) to `_start`, `_init`, and `.init_array` entries.
- Layer Bundles: Emit a DSSE bundle per layer with edges, symbols, identifiers, and provenance (layer digest, path, sha256).
- Determinism: Sort by layer digest, path, symbol name; normalize paths to POSIX separators; timestamps fixed to generation time in UTC ISO-8601.
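The Discovery and Identifiers steps fit in a few dozen lines. What follows is a minimal Python sketch, not the planned `StellaOps.Scanner.Analyzers.Native` reader. It assumes the whole file is already in memory and reads section headers to locate `.note.gnu.build-id` and `.text`. It also uses a hypothetical code-id format: the build-id hex when present, otherwise `sha256:` plus the `.text` hash. `ElfIdentity`, `read_elf_identity`, and the helper names are illustrative only.

```python
import hashlib
import struct
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

ELF_MAGIC = b"\x7fELF"
NT_GNU_BUILD_ID = 3  # note type the GNU toolchain uses in .note.gnu.build-id


@dataclass(frozen=True)
class ElfIdentity:
    elf_class: int           # 32 or 64, from e_ident[EI_CLASS]
    little_endian: bool      # from e_ident[EI_DATA]
    machine: int             # e_machine, e.g. 62 = EM_X86_64
    build_id: Optional[str]  # hex of the GNU build-id note descriptor, if present
    code_id: Optional[str]   # hypothetical format: build-id hex, else "sha256:<hash of .text>"


def is_elf(data: bytes) -> bool:
    return len(data) >= 16 and data[:4] == ELF_MAGIC


def _parse_build_id(note: bytes, bo: str) -> Optional[str]:
    # Walk note records: namesz, descsz, type, then name and desc, each padded to 4 bytes.
    pos = 0
    while pos + 12 <= len(note):
        namesz, descsz, ntype = struct.unpack_from(bo + "III", note, pos)
        pos += 12
        name = note[pos:pos + namesz]
        pos += (namesz + 3) & ~3
        desc = note[pos:pos + descsz]
        pos += (descsz + 3) & ~3
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc.hex()
    return None


def _sections(data: bytes, is64: bool, bo: str) -> Dict[str, bytes]:
    # Map section name -> raw bytes, using the section header table.
    hdr_fmt = bo + ("HHIQQQIHHHHHH" if is64 else "HHIIIIIHHHHHH")
    (_, _, _, _, _, shoff, _, _, _, _,
     shentsize, shnum, shstrndx) = struct.unpack_from(hdr_fmt, data, 16)
    if shnum == 0 or shstrndx >= shnum:
        return {}  # no usable section headers (e.g. fully stripped)
    sh_fmt = bo + ("IIQQQQIIQQ" if is64 else "IIIIIIIIII")
    headers: List[Tuple[int, int, int]] = []
    for i in range(shnum):
        name, _, _, _, offset, size, *_ = struct.unpack_from(sh_fmt, data, shoff + i * shentsize)
        headers.append((name, offset, size))
    _, str_off, str_size = headers[shstrndx]
    strtab = data[str_off:str_off + str_size]
    result: Dict[str, bytes] = {}
    for name, offset, size in headers:
        nul = strtab.find(b"\x00", name)
        result[strtab[name:nul].decode("ascii", "replace")] = data[offset:offset + size]
    return result


def read_elf_identity(data: bytes) -> ElfIdentity:
    if not is_elf(data):
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported EI_CLASS or EI_DATA")
    is64, bo = ei_class == 2, ("<" if ei_data == 1 else ">")
    (machine,) = struct.unpack_from(bo + "H", data, 18)  # e_machine follows e_type
    sections = _sections(data, is64, bo)
    note = sections.get(".note.gnu.build-id")
    build_id = _parse_build_id(note, bo) if note else None
    text = sections.get(".text")
    # Fallback from the plan: SHA-256 of .text when no build-id note exists.
    code_id = build_id or ("sha256:" + hashlib.sha256(text).hexdigest() if text else None)
    return ElfIdentity(64 if is64 else 32, ei_data == 1, machine, build_id, code_id)
```

Reading section headers rather than program headers is a simplification. The build-id note is normally also reachable through the `PT_NOTE` program header, which still works when section headers have been removed. This sketch would miss that case.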
Deliverables
- Library: `StellaOps.Scanner.Analyzers.Native` (new) with an ELF reader and graph builder.
- Tests: fixtures under `src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Native.Tests` using stripped/unstripped ELF samples (no network).
- DSSE bundle schema: shared constants/types reused by Signals ingestion.
- Sprint doc links: referenced from `SPRINT_0146_0001_0001_scanner_analyzer_gap_close.md`.
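The shared bundle types could enforce the determinism rules from the Approach section in one place, at serialization time. Those rules are: order by layer digest, path, and symbol name; use POSIX separators; use UTC ISO-8601 timestamps. The sketch below assumes a flat, hypothetical `SymbolEntry` record and compact sorted-key JSON. The real schema is still being aligned with Signals.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class SymbolEntry:  # hypothetical record shape, not the agreed schema
    layer_digest: str
    path: str
    symbol: str
    digest: str


def posix_path(path: str) -> str:
    # Assumes backslashes come only from host path joins, never from file names.
    return path.replace("\\", "/")


def utc_timestamp(moment: datetime) -> str:
    # ISO-8601 in UTC with second precision; pass timezone-aware datetimes.
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def canonical_bundle(entries: List[SymbolEntry], generated_at: datetime) -> bytes:
    normalized = [SymbolEntry(e.layer_digest, posix_path(e.path), e.symbol, e.digest)
                  for e in entries]
    # Python compares str by code point, so this ordering is culture-independent.
    ordered = sorted(normalized, key=lambda e: (e.layer_digest, e.path, e.symbol))
    doc = {"generatedAt": utc_timestamp(generated_at),
           "symbols": [asdict(e) for e in ordered]}
    # Sorted keys, no insignificant whitespace, UTF-8 output.
    return json.dumps(doc, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")
```

The same input then yields byte-identical output regardless of input order. A .NET implementation would need `StringComparer.Ordinal` for this guarantee, because default string comparison there is culture-sensitive.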
Task Backlog (initial)
- Skeleton project `StellaOps.Scanner.Analyzers.Native` + plugin registration for the scanner worker.
- ELF reader: header detection, build-id extraction, code-id calculation, section loader with deterministic sorting.
- Symbol digests: compute `sha256(name + addr + size + binding)`; emit per-symbol evidence and purl+symbol IDs.
- Callgraph builder: edges from PLT/relocs/imports; Unknown targets captured; synthetic roots for init arrays.
- Layer attribution: carry layer digest/source through evidence; emit DSSE bundle per layer with signatures stubbed for now.
- Tests/fixtures: stripped+unstripped ELF, shared objects, missing build-id, init array edges; golden JSON/NDJSON bundles.
- Signals alignment: finalize DSSE graph schema and bundle naming; hook into reachability ingestion contract.
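The symbol-digest and callgraph backlog items could look roughly like the sketch below. It rests on three assumptions:

- Call sites arrive as already-extracted `(caller, callee)` name pairs from PLT/relocation/import analysis.
- The digest joins `name`, `addr`, `size`, and `binding` with NUL separators. The backlog does not fix an encoding, and the separator keeps adjacent fields from running together.
- Node IDs such as `root:<path>` and `unknown:<name>` are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Symbol:
    name: str
    addr: int
    size: int
    binding: str  # e.g. "GLOBAL", "LOCAL", "WEAK"


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    kind: str  # "call", "unknown", or "root"


def symbol_digest(sym: Symbol) -> str:
    # Hypothetical encoding of sha256(name + addr + size + binding).
    material = "\x00".join([sym.name, f"{sym.addr:#x}", str(sym.size), sym.binding])
    return "sha256:" + hashlib.sha256(material.encode("utf-8")).hexdigest()


def build_callgraph(binary: str, symbols: List[Symbol],
                    call_sites: Iterable[Tuple[str, str]],
                    init_entries: Iterable[str]) -> List[Edge]:
    by_name = {s.name: symbol_digest(s) for s in symbols}
    edges = set()
    for caller, callee in call_sites:
        src = by_name.get(caller, f"unknown:{caller}")
        dst = by_name.get(callee)
        if dst is None:
            edges.add(Edge(src, f"unknown:{callee}", "unknown"))  # unresolved target
        else:
            edges.add(Edge(src, dst, "call"))
    # Synthetic per-binary root: entry points when defined, plus every init-array entry.
    root = f"root:{binary}"
    for entry in ("_start", "_init"):
        if entry in by_name:
            edges.add(Edge(root, by_name[entry], "root"))
    for entry in init_entries:
        edges.add(Edge(root, by_name.get(entry, f"unknown:{entry}"), "root"))
    # Deterministic output order regardless of input order.
    return sorted(edges, key=lambda e: (e.source, e.target, e.kind))
```

For example, a binary that calls `puts` only through the PLT gets an `unknown:puts` edge. Resolution against the library that defines `puts` would happen later, when the graphs are merged.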
Open Questions
- Final DSSE payload shape (Signals team): currently assumed to be `graph.bundle` with edges, symbols, and metadata.
- Whether to include debug line info (DWARF `.debug_line`) for coverage (could be added as an optional module later).
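Until the payload shape is settled, a per-layer bundle can already be wrapped in a standard DSSE envelope. Signing stays pluggable, to match the stubbed signatures in the backlog. The pre-authentication encoding (PAE) follows the DSSE v1 specification. The payload type string, `make_envelope`, and the `Signer` callback are hypothetical.

```python
import base64
import json
from typing import Callable, Optional, Tuple

# Hypothetical payload type; the real value depends on the open Signals question.
GRAPH_BUNDLE_PAYLOAD_TYPE = "application/vnd.stellaops.graph.bundle+json"

Signer = Callable[[bytes], Tuple[str, bytes]]  # PAE bytes -> (keyid, raw signature)


def pae(payload_type: str, payload: bytes) -> bytes:
    # DSSE v1 PAE: "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body.
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(payload), payload)


def make_envelope(bundle: dict, signer: Optional[Signer] = None) -> dict:
    payload = json.dumps(bundle, sort_keys=True, separators=(",", ":"),
                         ensure_ascii=False).encode("utf-8")
    signatures = []
    if signer is not None:  # None = signatures stubbed out for now
        keyid, sig = signer(pae(GRAPH_BUNDLE_PAYLOAD_TYPE, payload))
        signatures.append({"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")})
    return {
        "payloadType": GRAPH_BUNDLE_PAYLOAD_TYPE,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": signatures,
    }
```

Signers sign the PAE bytes, not the raw JSON. This binds the payload type into the signature, so swapping in a real key later should not change the envelope layout.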