git.stella-ops.org/docs/modules/scanner/design/native-reachability-plan.md
StellaOps Bot bc0762e97d up
2025-12-09 00:20:52 +02:00


Native Reachability Graph Plan (Scanner · Signals Alignment)

Goals

  • Extract native reachability graphs from ELF binaries across layers (stripped and unstripped), emitting:
    • Build IDs (.note.gnu.build-id) and code IDs per file.
    • Symbol digests (purl+symbol) and edges (callgraph) with deterministic ordering.
    • Synthetic roots for _init, .init_array, .preinit_array, entry points.
    • DSSE graph bundle per layer for Signals ingestion.
  • Offline-friendly, deterministic outputs (stable ordering, UTF-8, UTC).
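To illustrate the goals above, here is a minimal sketch of how synthetic roots and deterministic edge ordering could fit together. The `Edge` type, the `<root>` node name, and the `kind` values are hypothetical placeholders for illustration, not the final schema.

```python
from dataclasses import dataclass

SYNTHETIC_ROOT = "<root>"  # hypothetical name for the per-binary synthetic root node


@dataclass(frozen=True, order=True)
class Edge:
    # Field order doubles as the sort key: layer digest, path, then symbols.
    layer_digest: str
    path: str
    source: str
    target: str
    kind: str  # assumed values: "call", "synthetic-root", "unknown"


def synthetic_root_edges(layer_digest, path, entry_symbols):
    """Create one edge from the synthetic root to each entry symbol."""
    return [
        Edge(layer_digest, path, SYNTHETIC_ROOT, symbol, "synthetic-root")
        for symbol in entry_symbols
    ]


def reachable_symbols(edges, root=SYNTHETIC_ROOT):
    """Return the sorted list of symbols reachable from the root."""
    adjacency = {}
    for edge in edges:
        adjacency.setdefault(edge.source, []).append(edge.target)
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adjacency.get(node, []))
    seen.discard(root)
    return sorted(seen)
```

Calling `sorted(edges)` before serialization gives the same output regardless of discovery order, which is the property the determinism goal asks for.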

Inputs

  • Layered filesystem with ELF binaries and shared objects.
  • Layer metadata: digests from scanner.rootfs.layers and scanner.layer.archives (when provided).
  • Optional runtime proc snapshot for reconciliation (if available via Signals pipeline).

Approach

  • Discovery: Walk layer directories; identify ELF binaries (e_ident, machine, class). Record per-layer path.
  • Identifiers: Capture build-id (the descriptor bytes of the NT_GNU_BUILD_ID note in .note.gnu.build-id, hex-encoded); fall back to SHA-256 of .text when absent; store code-id (PE/ELF-friendly string).
  • Symbols: Parse .symtab/.dynsym; compute stable symbol digests (e.g., SHA-256 over symbol bytes + name); include size/address for ordering.
  • Edges: Build callgraph from relocation/import tables and (when available) .eh_frame/.plt linkage; emit Unknown edges when target unresolved.
  • Synthetic Roots: Insert edges from synthetic root nodes (per binary) to _start, _init, .init_array entries.
  • Layer Bundles: Emit DSSE bundle per layer with edges, symbols, identifiers, and provenance (layer digest, path, sha256).
  • Determinism: Sort by layer digest, then path, then symbol name; normalize paths to POSIX separators; record timestamps once, at generation time, in UTC ISO-8601.
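The discovery and identifier steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the planned StellaOps.Scanner.Analyzers.Native implementation: the function names and the `gnu-build-id:` / `text-sha256:` prefixes are assumptions, and the note parser assumes the 4-byte alignment commonly used for `.note.gnu.build-id`.

```python
import hashlib
import struct

ELF_MAGIC = b"\x7fELF"
NT_GNU_BUILD_ID = 3  # note type used by GNU build-id notes


def elf_identity(header: bytes):
    """Return (class_bits, byte_order, e_machine), or None if not a valid ELF header."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        return None
    class_bits = {1: 32, 2: 64}.get(header[4])       # EI_CLASS
    byte_order = {1: "<", 2: ">"}.get(header[5])     # EI_DATA (struct prefix)
    if class_bits is None or byte_order is None:
        return None
    # e_machine sits at offset 18 in both ELF32 and ELF64 headers.
    (e_machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return class_bits, byte_order, e_machine


def parse_build_id(note_data: bytes, byte_order: str):
    """Scan note records and return the build-id descriptor as hex, or None."""
    offset = 0
    while offset + 12 <= len(note_data):
        namesz, descsz, note_type = struct.unpack_from(
            byte_order + "III", note_data, offset
        )
        offset += 12
        name = note_data[offset:offset + namesz]
        offset += (namesz + 3) & ~3  # name is padded to 4 bytes
        desc = note_data[offset:offset + descsz]
        offset += (descsz + 3) & ~3  # descriptor is padded to 4 bytes
        if note_type == NT_GNU_BUILD_ID and name == b"GNU\x00" and len(desc) == descsz:
            return desc.hex()
    return None


def code_identifier(note_data: bytes, byte_order: str, text_bytes: bytes) -> str:
    """Prefer the build-id; fall back to SHA-256 of .text (prefixes are hypothetical)."""
    build_id = parse_build_id(note_data, byte_order) if note_data else None
    if build_id is not None:
        return "gnu-build-id:" + build_id
    return "text-sha256:" + hashlib.sha256(text_bytes).hexdigest()
```

Locating the `.note.gnu.build-id` and `.text` sections through the section header table is omitted here; the sketch takes their raw bytes as input.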

Deliverables

  • Library: StellaOps.Scanner.Analyzers.Native (new) with ELF reader and graph builder.
  • Tests: fixtures under src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Native.Tests using stripped/unstripped ELF samples (no network).
  • DSSE bundle schema: shared constants/types reused by Signals ingestion.
  • Sprint doc links: referenced from SPRINT_0146_0001_0001_scanner_analyzer_gap_close.md.
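For the DSSE bundle schema deliverable, a rough sketch of the envelope wrapping may help. The pre-authentication encoding (PAE) and the envelope field names follow the public DSSE specification; the payload type string and the use of sorted-key JSON as the serialization are assumptions, since the payload shape is still open with the Signals team.

```python
import base64
import json

# Hypothetical payload type; the real value is to be agreed with Signals.
PAYLOAD_TYPE = "application/vnd.stellaops.graph.bundle+json"


def canonical_json(value) -> bytes:
    """Serialize with sorted keys and no whitespace, as UTF-8.

    This is a simple stable encoding, not a formal canonicalization scheme.
    """
    return json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the bytes a signer actually signs."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (
        len(type_bytes), type_bytes, len(payload), payload
    )


def build_envelope(bundle: dict, signatures=()) -> dict:
    """Wrap a graph bundle in a DSSE envelope; signatures may be left empty."""
    payload = canonical_json(bundle)
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": list(signatures),
    }
```

Leaving `signatures` empty matches the backlog note that signatures are stubbed for now; a real signer would sign `pae(PAYLOAD_TYPE, payload)` and append `{"keyid": ..., "sig": ...}` entries.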

Task Backlog (initial)

  1. Skeleton project StellaOps.Scanner.Analyzers.Native + plugin registration for scanner worker.
  2. ELF reader: header detection, build-id extraction, code-id calculation, section loader with deterministic sorting.
  3. Symbol digests: compute sha256(name + addr + size + binding); emit per-symbol evidence and purl+symbol IDs.
  4. Callgraph builder: edges from PLT/relocs/imports; Unknown targets captured; synthetic roots for init arrays.
  5. Layer attribution: carry layer digest/source through evidence; emit DSSE bundle per layer with signatures stubbed for now.
  6. Tests/fixtures: stripped+unstripped ELF, shared objects, missing build-id, init array edges; golden JSON/NDJSON bundles.
  7. Signals alignment: finalize DSSE graph schema and bundle naming; hook into reachability ingestion contract.
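As an illustration of backlog item 3, the symbol digest could be computed along these lines. The byte layout (length-prefixed UTF-8 name followed by big-endian address, size, and binding) is an assumption made here so that field boundaries are unambiguous; the plan itself only specifies sha256(name + addr + size + binding).

```python
import hashlib
import struct


def symbol_digest(name: str, addr: int, size: int, binding: int) -> str:
    """Hash a symbol's identifying fields using a fixed, unambiguous layout."""
    name_bytes = name.encode("utf-8")
    material = (
        struct.pack(">I", len(name_bytes))         # length prefix avoids ambiguity
        + name_bytes
        + struct.pack(">QQB", addr, size, binding)  # u64 addr, u64 size, u8 binding
    )
    return hashlib.sha256(material).hexdigest()


def sort_symbols(symbols):
    """Order (name, addr, size, binding) tuples deterministically by name, addr, size."""
    return sorted(symbols, key=lambda s: (s[0], s[1], s[2]))
```

A fixed-width, length-prefixed layout means that two different symbols cannot serialize to the same byte string, which plain string concatenation would not guarantee.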

Open Questions

  • Final DSSE payload shape (Signals team) — currently assumed graph.bundle with edges, symbols, metadata.
  • Whether to include debugline info for coverage (could add optional module later).