# Native Reachability Graph Plan (Scanner · Signals Alignment)

## Goals

- Extract native reachability graphs from ELF binaries (stripped and unstripped) across image layers, emitting:
  - Build IDs (`.note.gnu.build-id`) and code IDs per file.
  - Symbol digests (purl+symbol) and call-graph edges, in deterministic order.
  - Synthetic roots for `_init`, `.init_array`, `.preinit_array`, and entry points.
  - One DSSE graph bundle per layer for Signals ingestion.
- Offline-friendly, deterministic outputs (stable ordering, UTF-8, UTC).

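The determinism goal can be illustrated with a minimal Python sketch. It assumes a hypothetical record shape (`layer_digest`, `path`, `symbol`) and a hypothetical `generatedAt` field. It also assumes that "canonical" means sorted keys, compact separators, and UTF-8 bytes; the sort order follows the one named in the Approach section.

```python
import json
from datetime import datetime, timezone


def normalize_path(path: str) -> str:
    """Normalize a layer-relative path to POSIX separators."""
    return path.replace("\\", "/")


def utc_iso8601(moment: datetime) -> str:
    """Render a timezone-aware datetime as UTC ISO-8601 with a trailing 'Z'."""
    if moment.tzinfo is None:
        raise ValueError("naive datetimes are ambiguous; pass an aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def canonical_output(records: list, generated_at: datetime) -> bytes:
    """Serialize records deterministically: sorted, POSIX paths, UTC, UTF-8."""
    normalized = [{**r, "path": normalize_path(r["path"])} for r in records]
    # Python compares str by code point, which matches UTF-8 byte order,
    # so this ordering does not depend on the host locale.
    normalized.sort(key=lambda r: (r["layer_digest"], r["path"], r["symbol"]))
    document = {"generatedAt": utc_iso8601(generated_at), "records": normalized}
    # sort_keys + fixed separators make the byte output independent of dict order.
    return json.dumps(
        document, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
```

Passing `generated_at` in explicitly, rather than reading the clock inside the function, keeps the serializer pure so golden-file tests can pin the timestamp.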
## Inputs

- Layered filesystem containing ELF binaries and shared objects.
- Layer metadata: digests from `scanner.rootfs.layers` and `scanner.layer.archives` (when provided).
- Optional runtime proc snapshot for reconciliation (if available via the Signals pipeline).

## Approach

- **Discovery**: Walk layer directories and identify ELF binaries from their header (`e_ident` magic and class, `e_machine`). Record the per-layer path of each binary.
- **Identifiers**: Capture the build-id (the hash stored in the `.note.gnu.build-id` note descriptor); when it is absent, fall back to the SHA-256 of `.text`. Store a code-id as a PE/ELF-friendly string.
- **Symbols**: Parse `.symtab`/`.dynsym` (stripped binaries usually retain only `.dynsym`); compute stable symbol digests (e.g., SHA-256 over the symbol bytes plus its name); include size and address for ordering.
- **Edges**: Build the call graph from relocation and import tables and, when available, `.eh_frame`/`.plt` linkage; emit Unknown edges when a target cannot be resolved.
- **Synthetic Roots**: Insert edges from a synthetic root node (one per binary) to `_start`, `_init`, and each `.init_array` entry.
- **Layer Bundles**: Emit one DSSE bundle per layer containing edges, symbols, identifiers, and provenance (layer digest, path, SHA-256).
- **Determinism**: Sort by layer digest, then path, then symbol name; normalize paths to POSIX separators; record timestamps as the generation time in UTC ISO-8601.

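The Discovery and Identifiers steps can be sketched in Python as follows. The sketch assumes the caller has already read the first bytes of a file and extracted the raw `.note.gnu.build-id` and `.text` section contents; walking the section header table is omitted. It also assumes 4-byte note alignment, which GNU build-id notes typically use. The `buildid:`/`text-sha256:` prefixes are hypothetical labels, not a format the plan specifies.

```python
import hashlib
import struct
from dataclasses import dataclass
from typing import Optional

ELF_MAGIC = b"\x7fELF"
NT_GNU_BUILD_ID = 3  # note type for GNU build IDs


@dataclass(frozen=True)
class ElfIdent:
    bits: int            # 32 or 64, from EI_CLASS
    little_endian: bool  # from EI_DATA
    machine: int         # e_machine, e.g. 62 == EM_X86_64


def identify_elf(header: bytes) -> Optional[ElfIdent]:
    """Return basic identity for an ELF header, or None if it is not ELF."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        return None
    ei_class, ei_data = header[4], header[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        return None
    endian = "<" if ei_data == 1 else ">"
    # e_ident is 16 bytes, e_type is 2 bytes, so e_machine sits at offset 18.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return ElfIdent(bits=32 if ei_class == 1 else 64,
                    little_endian=ei_data == 1, machine=machine)


def parse_build_id(note_section: bytes, little_endian: bool) -> Optional[str]:
    """Walk note entries and return the GNU build-id descriptor as lowercase hex."""
    endian = "<" if little_endian else ">"
    offset = 0
    while offset + 12 <= len(note_section):
        namesz, descsz, note_type = struct.unpack_from(endian + "III", note_section, offset)
        offset += 12
        name = note_section[offset:offset + namesz]
        offset += (namesz + 3) & ~3  # name is padded to a 4-byte boundary
        desc = note_section[offset:offset + descsz]
        offset += (descsz + 3) & ~3  # so is the descriptor
        if len(desc) < descsz:
            return None  # truncated note
        if note_type == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc.hex()
    return None


def code_identity(build_id: Optional[str], text_section: bytes) -> str:
    """Prefer the build-id; otherwise fall back to SHA-256 of .text, as the plan states."""
    if build_id:
        return "buildid:" + build_id
    return "text-sha256:" + hashlib.sha256(text_section).hexdigest()
```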
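The Edges and Synthetic Roots steps can be sketched in Python as below. The sketch assumes that relocation and import parsing has already produced (caller node, callee name) pairs. It also assumes `.init_array` addresses have already been mapped to symbol names. Node ids such as `root:<binary>` and `unknown:<name>`, and the edge kinds, are hypothetical, not a schema the plan defines.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Edge = Tuple[str, str, str]  # (source node, target node, edge kind)


def build_call_graph(
    binary: str,
    roots: Sequence[str],
    calls: Iterable[Tuple[str, str]],
    resolvable: Dict[str, str],
) -> List[Edge]:
    """Build a deterministic edge list for one binary.

    binary:     layer-relative path of the ELF file
    roots:      symbol names reached at load time (_start, _init, .init_array targets)
    calls:      (caller node id, callee symbol name) pairs from PLT/relocations/imports
    resolvable: symbol name -> node id for every target the reader could resolve
    """
    root = f"root:{binary}"  # hypothetical: one synthetic root node per binary

    def edge(source: str, name: str, kind: str) -> Edge:
        target = resolvable.get(name)
        if target is None:
            # Keep the edge instead of dropping it, so consumers see the gap.
            return (source, f"unknown:{name}", "unknown")
        return (source, target, kind)

    edges = {edge(root, name, "synthetic-root") for name in roots}
    edges.update(edge(caller, callee, "call") for caller, callee in calls)
    # The set removes duplicate relocations; sorting fixes the output order.
    return sorted(edges)
```

Recording unresolved targets as explicit `unknown` edges, rather than omitting them, lets downstream reachability treat them as gaps instead of as evidence that nothing is called.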
## Deliverables

- Library: `StellaOps.Scanner.Analyzers.Native` (new), with an ELF reader and a graph builder.
- Tests: fixtures under `src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Native.Tests` using stripped and unstripped ELF samples (no network access).
- DSSE bundle schema: shared constants/types reused by Signals ingestion.
- Sprint doc links: referenced from `SPRINT_0146_0001_0001_scanner_analyzer_gap_close.md`.

## Task Backlog (initial)

1) Skeleton project `StellaOps.Scanner.Analyzers.Native` plus plugin registration for the scanner worker.
2) ELF reader: header detection, build-id extraction, code-id calculation, and a section loader with deterministic sorting.
3) Symbol digests: compute `sha256(name + addr + size + binding)`; emit per-symbol evidence and purl+symbol IDs.
4) Callgraph builder: edges from PLT, relocations, and imports; capture Unknown targets; add synthetic roots for init arrays.
5) Layer attribution: carry the layer digest/source through evidence; emit a DSSE bundle per layer, with signatures stubbed for now.
6) Tests/fixtures: stripped and unstripped ELF, shared objects, missing build-id, and init-array edges; golden JSON/NDJSON bundles.
7) Signals alignment: finalize the DSSE graph schema and bundle naming; hook into the reachability ingestion contract.

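Backlog item 3 leaves the byte encoding of `sha256(name + addr + size + binding)` open. Naive string concatenation is ambiguous, because a name ending in digits can run into the address. This Python sketch therefore assumes a length-prefixed UTF-8 name followed by fixed-width big-endian integers. The encoding and the `sha256:` prefix are assumptions, not part of the plan. The `STB_*` values are the standard ELF symbol bindings.

```python
import hashlib
import struct

# ELF symbol binding values (STB_*) as defined by the ELF specification.
STB_LOCAL, STB_GLOBAL, STB_WEAK = 0, 1, 2


def symbol_digest(name: str, addr: int, size: int, binding: int) -> str:
    """sha256(name + addr + size + binding) with an explicit, unambiguous encoding.

    Assumed layout (not fixed by the plan):
      4-byte big-endian length of the UTF-8 name, the name bytes,
      addr and size as unsigned 64-bit big-endian, binding as one byte.
    """
    encoded_name = name.encode("utf-8")
    payload = (
        struct.pack(">I", len(encoded_name))
        + encoded_name
        + struct.pack(">QQB", addr, size, binding)  # '>' means no padding
    )
    return "sha256:" + hashlib.sha256(payload).hexdigest()
```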
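Backlog item 5 and the Layer Bundles step wrap each layer's graph in a DSSE envelope. Under the DSSE specification, the signature covers the pre-authentication encoding (PAE) of the payload type and body, not the raw JSON. This Python sketch assumes a caller-chosen payload type, since the final one is still an open question. It uses a hypothetical `stub_signer` that only hashes the PAE, mirroring "signatures stubbed for now"; it does not produce a real signature.

```python
import base64
import hashlib
import json
from typing import Callable, Tuple


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 pre-authentication encoding: the bytes that actually get signed."""
    type_bytes = payload_type.encode("utf-8")
    return b" ".join([
        b"DSSEv1",
        str(len(type_bytes)).encode("ascii"),
        type_bytes,
        str(len(payload)).encode("ascii"),
        payload,
    ])


def stub_signer(message: bytes) -> Tuple[str, bytes]:
    """Placeholder for the stubbed-signature phase: a digest, NOT a real signature."""
    return "stub", hashlib.sha256(message).digest()


def layer_envelope(payload_type: str, bundle: dict,
                   signer: Callable[[bytes], Tuple[str, bytes]] = stub_signer) -> dict:
    """Wrap one layer's graph bundle in a DSSE envelope."""
    body = json.dumps(bundle, sort_keys=True, separators=(",", ":")).encode("utf-8")
    keyid, sig = signer(pae(payload_type, body))
    return {
        "payloadType": payload_type,
        "payload": base64.b64encode(body).decode("ascii"),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }
```

Because the signer is a parameter, a real key-backed signer could later replace the stub without changing the envelope layout.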
## Open Questions

- Final DSSE payload shape (Signals team): currently assumed to be `graph.bundle` with edges, symbols, and metadata.
- Whether to include DWARF line information (`.debug_line`) for coverage; this could be added later as an optional module.