StellaOps Bot
2025-12-09 00:20:52 +02:00
parent 3d01bf9edc
commit bc0762e97d
261 changed files with 14033 additions and 4427 deletions

@@ -0,0 +1,42 @@
# Native Reachability Graph Plan (Scanner · Signals Alignment)
## Goals
- Extract native reachability graphs from ELF binaries across layers (stripped and unstripped), emitting:
  - Build IDs (`.note.gnu.build-id`) and code IDs per file.
  - Symbol digests (purl+symbol) and edges (callgraph) with deterministic ordering.
  - Synthetic roots for `_init`, `.init_array`, `.preinit_array`, entry points.
  - DSSE graph bundle per layer for Signals ingestion.
- Offline-friendly, deterministic outputs (stable ordering, UTF-8, UTC).
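
To make the graph shape concrete, here is a minimal sketch of synthetic roots and a deterministic reachability walk. The node naming (`<root:...>`, `<unknown>`) and both function names are illustrative assumptions, not part of any agreed schema.

```python
from collections import deque

UNKNOWN = "<unknown>"  # hypothetical sentinel for unresolved call targets


def add_synthetic_root(edges, binary, entry_symbols):
    """Return edges plus one synthetic root node pointing at each entry symbol."""
    root = f"<root:{binary}>"  # hypothetical root naming
    return list(edges) + [(root, sym) for sym in sorted(entry_symbols)]


def reachable(edges, start):
    """Breadth-first walk; neighbours are visited in sorted order for stable output."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in sorted(adjacency.get(node, [])):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen - {start})
```

Symbols with no path from a synthetic root (for example, dead code) simply do not appear in the result, while unresolved targets stay visible as a single Unknown node.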
## Inputs
- Layered filesystem with ELF binaries and shared objects.
- Layer metadata: digests from `scanner.rootfs.layers` and `scanner.layer.archives` (when provided).
- Optional runtime proc snapshot for reconciliation (if available via Signals pipeline).
## Approach
- **Discovery**: Walk layer directories; identify ELF binaries (`e_ident`, machine, class). Record per-layer path.
- **Identifiers**: Capture the build-id (the descriptor bytes of the `.note.gnu.build-id` note); fall back to SHA-256 of `.text` when it is absent; store a code-id (PE/ELF-friendly string).
- **Symbols**: Parse `.symtab`/`.dynsym`; compute stable symbol digests (e.g., SHA-256 over symbol bytes + name); include size/address for ordering.
- **Edges**: Build the callgraph from relocation/import tables and (when available) `.eh_frame`/`.plt` linkage; emit Unknown edges when the target is unresolved.
- **Synthetic Roots**: Insert edges from synthetic root nodes (per binary) to `_start`, `_init`, and `.init_array`/`.preinit_array` entries.
- **Layer Bundles**: Emit DSSE bundle per layer with edges, symbols, identifiers, and provenance (layer digest, path, sha256).
- **Determinism**: Sort by layer digest, path, symbol name; normalize paths to POSIX separators; timestamps fixed to generation time in UTC ISO-8601.
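
The Discovery and Identifiers steps above can be sketched as follows. This is a minimal illustration, not the planned `StellaOps.Scanner.Analyzers.Native` implementation: it assumes the caller has already located the raw bytes of the note and `.text` sections, that notes use the common 4-byte padding, and the identifier labels (`gnu-build-id`, `text-sha256`) are hypothetical.

```python
import hashlib
import struct

ELF_MAGIC = b"\x7fELF"
NT_GNU_BUILD_ID = 3  # note type used by .note.gnu.build-id


def read_elf_ident(data: bytes):
    """Return (elf_class_bits, struct_byte_order) or None if this is not ELF."""
    if len(data) < 16 or data[:4] != ELF_MAGIC:
        return None
    elf_class = {1: 32, 2: 64}.get(data[4])      # EI_CLASS
    byte_order = {1: "<", 2: ">"}.get(data[5])   # EI_DATA
    if elf_class is None or byte_order is None:
        return None
    return elf_class, byte_order


def parse_build_id(note: bytes, byte_order: str = "<"):
    """Scan raw note-section bytes for a GNU build-id note; return hex or None."""
    off = 0
    while off + 12 <= len(note):
        namesz, descsz, ntype = struct.unpack_from(byte_order + "III", note, off)
        off += 12
        name = note[off:off + namesz]
        off += (namesz + 3) & ~3   # name padded to 4 bytes (assumption)
        desc = note[off:off + descsz]
        off += (descsz + 3) & ~3   # descriptor padded to 4 bytes (assumption)
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc.hex()
    return None


def code_identifier(note, text: bytes, byte_order: str = "<"):
    """Prefer the build-id; fall back to SHA-256 of .text when it is missing."""
    build_id = parse_build_id(note, byte_order) if note else None
    if build_id:
        return "gnu-build-id", build_id
    return "text-sha256", hashlib.sha256(text).hexdigest()
```

Returning the identifier kind alongside the value keeps the fallback visible to downstream consumers, which may matter when reconciling stripped binaries.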
## Deliverables
- Library: `StellaOps.Scanner.Analyzers.Native` (new) with ELF reader and graph builder.
- Tests: fixtures under `src/Scanner/__Tests/StellaOps.Scanner.Analyzers.Native.Tests` using stripped/unstripped ELF samples (no network).
- DSSE bundle schema: shared constants/types reused by Signals ingestion.
- Sprint doc links: referenced from `SPRINT_0146_0001_0001_scanner_analyzer_gap_close.md`.
## Task Backlog (initial)
1) Skeleton project `StellaOps.Scanner.Analyzers.Native` + plugin registration for scanner worker.
2) ELF reader: header detection, build-id extraction, code-id calculation, section loader with deterministic sorting.
3) Symbol digests: compute `sha256(name + addr + size + binding)`; emit per-symbol evidence and purl+symbol IDs.
4) Callgraph builder: edges from PLT/relocs/imports; Unknown targets captured; synthetic roots for init arrays.
5) Layer attribution: carry layer digest/source through evidence; emit DSSE bundle per layer with signatures stubbed for now.
6) Tests/fixtures: stripped+unstripped ELF, shared objects, missing build-id, init array edges; golden JSON/NDJSON bundles.
7) Signals alignment: finalize DSSE graph schema and bundle naming; hook into reachability ingestion contract.
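
Item 3 writes the digest as `sha256(name + addr + size + binding)` without fixing a byte encoding. One possible encoding is sketched below, together with deterministic edge ordering. The length-prefixed name, the little-endian fixed-width fields, and the edge tuple layout are all assumptions made for illustration.

```python
import hashlib
import struct
from typing import NamedTuple


class Symbol(NamedTuple):
    name: str
    addr: int
    size: int
    binding: int  # ELF STB_* value: 0 = LOCAL, 1 = GLOBAL, 2 = WEAK


def symbol_digest(sym: Symbol) -> str:
    """SHA-256 over an unambiguous encoding of name, addr, size and binding."""
    name = sym.name.encode("utf-8")
    h = hashlib.sha256()
    h.update(struct.pack("<I", len(name)))  # length prefix avoids boundary ambiguity
    h.update(name)
    h.update(struct.pack("<QQB", sym.addr, sym.size, sym.binding))
    return h.hexdigest()


def sort_edges(edges):
    """Deduplicate and order (layer_digest, path, caller, callee) string tuples."""
    return sorted(set(edges))
```

Plain string concatenation would let different field combinations collide on the same input bytes, so a length prefix or separator is worth agreeing on before golden fixtures are generated.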
## Open Questions
- Final DSSE payload shape (Signals team) — currently assumed `graph.bundle` with edges, symbols, metadata.
- Whether to include debugline info for coverage (could add optional module later).
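
While the payload shape is still open, the envelope around it can already be sketched. The example below follows the DSSE pre-authentication encoding (PAE) and envelope field names; the payload type string and the bundle keys are placeholders, and the empty `signatures` list mirrors the "signatures stubbed for now" note in the backlog, so the result is not yet a verifiable envelope.

```python
import base64
import json

# Hypothetical media type; the real value is pending the Signals schema decision.
PAYLOAD_TYPE = "application/vnd.stellaops.graph.bundle+json"


def canonical_json(obj) -> bytes:
    """Serialize with sorted keys and no extra whitespace, as UTF-8."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the exact bytes a signer would sign."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


def stub_envelope(bundle: dict) -> dict:
    """Wrap a bundle in a DSSE-shaped envelope with signatures left empty."""
    payload = canonical_json(bundle)
    return {
        "payload": base64.b64encode(payload).decode("ascii"),
        "payloadType": PAYLOAD_TYPE,
        "signatures": [],  # stub: a signer would sign pae(PAYLOAD_TYPE, payload)
    }
```

Signing the PAE bytes rather than the raw JSON binds the payload type to the signature, so a bundle cannot be replayed under a different type.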