Merge branch 'main' of https://git.stella-ops.org/stella-ops.org/git.stella-ops.org
23
AGENTS.md
@@ -162,16 +162,19 @@ You will be explicitly told which role you are acting in. Your behavior must cha
|
||||
|
||||
Your goals:
|
||||
|
||||
1. Review new advisory files against:
|
||||
|
||||
* Archived advisories: `docs/product-advisories/archive/*.md`.
|
||||
* Implementation plans: `docs/implplan/SPRINT_*.md`.
|
||||
* Historical tasks: `docs/implplan/archived/all-tasks.md`.
|
||||
2. Identify new topics or features that require implementation.
|
||||
3. For genuinely new items (not already implemented or planned):
|
||||
|
||||
* Check the relevant module docs: `docs/modules/<module>/*arch*.md` for compatibility or contradictions.
|
||||
* If contradictions arise, you must surface and discuss them with the requester (in prose) and propose alignments.
|
||||
1. Review each file in the advisory directory and identify new topics or features.
|
||||
2. Then determine whether the topic is relevant by:
|
||||
2. 1. Go through the files one by one and extract the essentials first: themes, topics, architecture decisions.
|
||||
2. 2. Then read each of the archive/*.md files and check whether the topic has already been advised. If it exists or is close, drop the topic from the new advisory; otherwise keep it.
|
||||
2. 3. Check the relevant module docs: `docs/modules/<module>/*arch*.md` for compatibility or contradictions.
|
||||
2. 4. Implementation plans: `docs/implplan/SPRINT_*.md`.
|
||||
2. 5. Historical tasks: `docs/implplan/archived/all-tasks.md`.
|
||||
2. 4. For all of the new topics, check the SPRINT*.md files and src/* (in the corresponding modules) for an existing implementation of the same topic. If it is the same or close, ignore it; otherwise keep it.
|
||||
2. 5. If the topic is still genuinely new and makes sense for the product, keep it.
|
||||
3. When done for all files and all new genuine topics - present a report. Report must include:
|
||||
- all topics
|
||||
- what are the new things
|
||||
- what could contradict existing tasks or implementations but might still make sense to implement
|
||||
4. Once scope is agreed, hand over to your **project manager** role (4.2) to define implementation sprints and tasks.
|
||||
5. **Advisory and design decision sync**:
|
||||
|
||||
|
||||
@@ -0,0 +1,768 @@
|
||||
Here’s a quick, practical heads‑up about **binary initialization routines** and why they matter for reachability and vuln triage.
|
||||
|
||||
---
|
||||
|
||||
### What’s happening before `main()`
|
||||
|
||||
In ELF binaries/shared objects, the runtime linker runs **constructors** *before* `main()`:
|
||||
|
||||
* `.preinit_array` → runs first (rare, but highest priority; honored only in the main executable, ignored in shared objects)
|
||||
* `.init_array` → common place for constructors (ordered by index)
|
||||
* Legacy sections: `.init` (function) and `.ctors` (older toolchains)
|
||||
* On exit you also have `.fini_array` / `.fini`
|
||||
|
||||
These constructors can:
|
||||
|
||||
* Register signal/atexit handlers
|
||||
* Start threads, open sockets/files, install interposition hooks (for example from an `LD_PRELOAD`ed library)
|
||||
* Call library code you assumed was only used later
|
||||
|
||||
So if you’re doing **call‑graph reachability** for vulnerability impact, starting from only `main()` (or exported APIs) can **miss real edges** that execute at load time.
|
||||
|
||||
---
|
||||
|
||||
### What to model (synthetic roots)
|
||||
|
||||
Treat the following as **synthetic entry points** in your graph:
|
||||
|
||||
1. All function pointers in `.preinit_array`
|
||||
2. All function pointers in `.init_array`
|
||||
3. The symbol `_init` (if present) and legacy `.ctors` entries
|
||||
4. For completeness on teardown paths: `.fini_array`, `_fini`
|
||||
5. **Dependency constructors**: if `DT_NEEDED` libs have their own constructors, the dynamic loader runs them at load time, so they’re roots too (even if you never call them explicitly)
|
||||
|
||||
For PIE/DSO builds, remember that every loaded **dependency’s** init arrays run as part of `dlopen()`/program start—model those edges across DSOs.
|
||||
|
||||
---
|
||||
|
||||
### How to extract quickly
|
||||
|
||||
* **Static parse**: read `PT_DYNAMIC`, then `DT_PREINIT_ARRAY`, `DT_INIT_ARRAY`, their sizes; iterate pointers and add edges to your graph.
|
||||
* **Symbol fallback**: if `DT_INIT`/`_init` exists, add it as a root.
|
||||
* **Ordering**: preserve index order inside arrays (it can matter).
|
||||
* **Relocations**: resolve `R_X86_64_RELATIVE` (etc.) so pointers point to the real code addresses.
|
||||
|
||||
Mini‑C example (constructor runs pre‑main):
|
||||
|
||||
```c
static void __attribute__((constructor)) boot(void) {
    // vulnerable call here executes before main()
}

int main(void) { return 0; }
```
|
||||
|
||||
---
|
||||
|
||||
### For Stella Ops (binary reachability)
|
||||
|
||||
* **Graph seeds**: `roots = { init arrays of main ELF + all DT_NEEDED DSOs }`
|
||||
* **Policy**: mark edges from these roots as `phase=load` vs `phase=runtime`, so your explainer can say “reachable at load time.”
|
||||
* **PURLs**: attach edges to the package/node that owns the constructor symbol (DSO package purl), not just the main app.
|
||||
* **Attestation**: store the discovered root list (addresses + resolved symbols + DSO soname) in your deterministic scan manifest, so audits can replay it.
|
||||
* **Heuristics**: if `dlopen()` is detected statically (strings/symbols), add a potential root “DLOPEN_INIT[*]” bucket for libs found under common plugin dirs.
|
||||
|
||||
---
|
||||
|
||||
### Quick checklist
|
||||
|
||||
* [ ] Parse `.preinit_array`, `.init_array`, `.init` (and legacy `.ctors`)
|
||||
* [ ] Resolve relocations; preserve order
|
||||
* [ ] Seed graph with these as **synthetic roots**
|
||||
* [ ] Include constructors of every `DT_NEEDED` DSO
|
||||
* [ ] Tag edges as `phase=load` for prioritization/explainability
|
||||
* [ ] Persist root list in the scan’s evidence bundle
|
||||
|
||||
If you want, I can drop in a tiny .NET/ELF parser snippet or a Rust routine that walks `DT_INIT_ARRAY` and returns symbol‑resolved roots next.
|
||||
Here’s a concrete, C#‑oriented spec you can hand to a developer to implement ELF init/constructor discovery and plug it into a reachability engine like Stella Ops.
|
||||
|
||||
I’ll structure it like an internal design doc:
|
||||
|
||||
1. Goal / requirements (what we need to do)
2. Public API (what the rest of the system calls)
3. ELF parsing details (minimal, but correct)
4. Constructor / init routine discovery algorithm
5. Symbol resolution (best‑effort names)
6. Dynamic deps (DT_NEEDED) and load‑time roots
7. Integration with the call graph / reachability
8. Attestation / evidence output
9. Implementation details and edge cases
10. Testing strategy
11. Suggested C# project layout
|
||||
|
||||
---
|
||||
|
||||
## 1. Goal / Requirements
|
||||
|
||||
**Business goal**
|
||||
|
||||
When scanning ELF binaries and shared libraries, we must model functions that run **before `main()`** or at **library load/unload** as *synthetic entry points* in the call graph:
|
||||
|
||||
* `.preinit_array` (pre‑init constructors)
|
||||
* `.init_array` (constructors)
|
||||
* Legacy constructs:
|
||||
|
||||
* `.ctors` array
|
||||
* `_init` (via `DT_INIT`)
|
||||
* For teardown (optional but recommended):
|
||||
|
||||
* `.fini_array`
|
||||
* `_fini` (via `DT_FINI`)
|
||||
|
||||
**We must:**
|
||||
|
||||
* Discover all these routines in:
|
||||
|
||||
* The main executable
|
||||
* All its `DT_NEEDED` shared libraries (and any DSOs subsequently loaded, if we scan them)
|
||||
* Represent them as **roots** in the reachability graph:
|
||||
|
||||
* `phase = Load` for preinit/init/constructors
|
||||
* `phase = Unload` for finalizers
|
||||
* Resolve each routine to:
|
||||
|
||||
* Owning binary path and SONAME
|
||||
* Virtual address in the ELF
|
||||
* Best‑effort symbol name (`_ZN...`, `my_ctor`, etc.)
|
||||
* Order/index within its array (to preserve call order)
|
||||
* Emit a structured **evidence/attestation** record so scans are replayable.
|
||||
|
||||
---
|
||||
|
||||
## 2. Public API (C#)
|
||||
|
||||
### 2.1 Data model
|
||||
|
||||
Create a small domain model in a library, e.g. `StellaOps.ElfInit`:
|
||||
|
||||
```csharp
namespace StellaOps.ElfInit;

public enum InitRoutineKind
{
    PreInitArray,
    InitArray,
    LegacyCtorsSection,
    LegacyInitSymbol,
    FiniArray,
    LegacyFiniSymbol
}

public enum InitPhase
{
    Load,
    Unload
}

public sealed record InitRoutineRoot(
    string BinaryPath,          // Full path on disk
    string? Soname,             // From DT_SONAME if present
    InitRoutineKind Kind,
    InitPhase Phase,
    ulong VirtualAddress,       // VA within this ELF
    ulong? FileOffset,          // File offset (if resolved), null if unknown
    string? SymbolName,         // Best-effort name from symbol table
    int? ArrayIndex             // Index for array-based roots
);
```
|
||||
|
||||
### 2.2 Discovery service
|
||||
|
||||
Public entry point that other components use:
|
||||
|
||||
```csharp
public interface IInitRoutineDiscovery
{
    /// <summary>
    /// Discover load/unload routines (constructors) in a single ELF file
    /// and, optionally, in its DT_NEEDED dependencies.
    /// </summary>
    InitDiscoveryResult Discover(string elfPath, InitDiscoveryOptions options);
}

public sealed record InitDiscoveryOptions
{
    /// <summary>
    /// If true, also discover init routines in DT_NEEDED shared libraries
    /// (using IElfDependencyResolver to locate them on disk).
    /// </summary>
    public bool IncludeDependencies { get; init; } = true;

    /// <summary>
    /// If true, include fini routines (.fini_array, DT_FINI, etc.)
    /// as unload-phase roots.
    /// </summary>
    public bool IncludeUnloadPhase { get; init; } = true;
}

public sealed record InitDiscoveryResult(
    IReadOnlyList<InitRoutineRoot> Roots,
    IReadOnlyList<InitRoutineError> Errors // non-fatal problems per binary
);

public sealed record InitRoutineError(
    string BinaryPath,
    string Message,
    Exception? Exception = null
);
```
|
||||
|
||||
### 2.3 Dependency resolution
|
||||
|
||||
We don’t hard‑code how to find `DT_NEEDED` libraries on disk. Define an abstraction:
|
||||
|
||||
```csharp
public interface IElfDependencyResolver
{
    /// <summary>
    /// Resolve SONAME (e.g. "libc.so.6") to a local file path.
    /// Returns null if not found.
    /// </summary>
    string? ResolveLibrary(string soname, string referencingBinaryPath);
}
```
|
||||
|
||||
The implementation can respect `LD_LIBRARY_PATH`, typical system dirs, container images, etc., but that’s outside this spec.
|
||||
|
||||
`IInitRoutineDiscovery` will depend on:
|
||||
|
||||
* `IElfParser`
|
||||
* `IElfDependencyResolver`
|
||||
* `ISymbolResolver` (symbol tables)
|
||||
|
||||
---
|
||||
|
||||
## 3. ELF Parsing Spec (C#‑friendly)
|
||||
|
||||
You can either use a NuGet ELF library or implement a minimal in‑house parser. This spec assumes a **minimal custom parser** that supports:
|
||||
|
||||
* ELF64, little‑endian
|
||||
* ET_EXEC, ET_DYN
|
||||
* x86‑64 (`e_machine == EM_X86_64`) as v1; keep architecture pluggable for later
|
||||
|
||||
### 3.1 Core types
|
||||
|
||||
Create an internal parser namespace, e.g. `StellaOps.Elf`:
|
||||
|
||||
```csharp
internal sealed class ElfFile
{
    public string Path { get; }
    public ElfClass ElfClass { get; }
    public ElfEndianness Endianness { get; }
    public ElfHeader Header { get; }
    public IReadOnlyList<ProgramHeader> ProgramHeaders { get; }
    public IReadOnlyList<SectionHeader> SectionHeaders { get; }
    public DynamicSection? Dynamic { get; }

    public ReadOnlyMemory<byte> RawBytes { get; }

    // Helper: mapping VA -> file offset using PT_LOAD segments
    public bool TryMapVaToFileOffset(ulong virtualAddress, out ulong fileOffset);
}

internal enum ElfClass { Elf32, Elf64 }
internal enum ElfEndianness { Little, Big }

// Fill out ElfHeader / ProgramHeader / SectionHeader / DynamicEntry types
```
|
||||
|
||||
Implementation notes:
|
||||
|
||||
* Read ELF header:
|
||||
|
||||
* Validate magic: `0x7F 'E' 'L' 'F'`
|
||||
* `EI_CLASS` → 32/64‑bit
|
||||
* `EI_DATA` → endianness
|
||||
* Read **program headers** (`e_phoff`, `e_phnum`).
|
||||
|
||||
* Identify `PT_LOAD` (for VA→file mapping).
|
||||
* Identify `PT_DYNAMIC` (for `DynamicSection`).
|
||||
* Read **section headers** (`e_shoff`, `e_shnum`).
|
||||
|
||||
* Identify sections by name: `.preinit_array`, `.init_array`, `.fini_array`, `.ctors`.
|
||||
* You need the section name string table `.shstrtab` to decode names.
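The header checks in the notes above can be sketched as follows. This is a minimal illustration, not the parser itself: `ElfIdent` and `ElfIdentReader` are hypothetical names, the enums here carry the raw `EI_CLASS`/`EI_DATA` byte values (unlike the unvalued enums in the spec), and it assumes C# 10 or later for `record struct`.

```csharp
using System;
using System.Buffers.Binary;

// Sketch-local enums using the raw e_ident byte values.
public enum ElfClass { Elf32 = 1, Elf64 = 2 }
public enum ElfEndianness { Little = 1, Big = 2 }

// Hypothetical summary of the fields needed to decide whether a file is supported.
public readonly record struct ElfIdent(
    ElfClass Class, ElfEndianness Endianness, ushort Type, ushort Machine);

public static class ElfIdentReader
{
    // Layout: e_ident is 16 bytes, followed by e_type (2 bytes) and e_machine (2 bytes).
    public static bool TryRead(ReadOnlySpan<byte> data, out ElfIdent ident)
    {
        ident = default;
        if (data.Length < 20) return false;

        // Magic: 0x7F 'E' 'L' 'F'
        if (data[0] != 0x7F || data[1] != (byte)'E' ||
            data[2] != (byte)'L' || data[3] != (byte)'F')
            return false;

        byte cls = data[4]; // EI_CLASS: 1 = 32-bit, 2 = 64-bit
        byte enc = data[5]; // EI_DATA:  1 = little-endian, 2 = big-endian
        if (cls is not (1 or 2) || enc is not (1 or 2)) return false;

        bool little = enc == 1;
        ushort type = little
            ? BinaryPrimitives.ReadUInt16LittleEndian(data.Slice(16, 2))
            : BinaryPrimitives.ReadUInt16BigEndian(data.Slice(16, 2));
        ushort machine = little
            ? BinaryPrimitives.ReadUInt16LittleEndian(data.Slice(18, 2))
            : BinaryPrimitives.ReadUInt16BigEndian(data.Slice(18, 2));

        ident = new ElfIdent((ElfClass)cls, (ElfEndianness)enc, type, machine);
        return true;
    }
}
```

A caller would reject anything other than `Elf64`, `Little`, and machine `62` (`EM_X86_64`) in v1 and record an `InitRoutineError` instead of throwing.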
|
||||
|
||||
### 3.2 Dynamic section parsing
|
||||
|
||||
Define dynamic section model:
|
||||
|
||||
```csharp
internal sealed class DynamicSection
{
    public IReadOnlyList<DynamicEntry> Entries { get; }
    public ulong? InitFunction { get; }          // DT_INIT
    public ulong? FiniFunction { get; }          // DT_FINI
    public ulong? InitArrayAddress { get; }      // DT_INIT_ARRAY
    public ulong? InitArraySize { get; }         // DT_INIT_ARRAYSZ
    public ulong? FiniArrayAddress { get; }      // DT_FINI_ARRAY
    public ulong? FiniArraySize { get; }         // DT_FINI_ARRAYSZ
    public ulong? PreInitArrayAddress { get; }   // DT_PREINIT_ARRAY
    public ulong? PreInitArraySize { get; }      // DT_PREINIT_ARRAYSZ

    public string? Soname { get; }               // DT_SONAME (decoded via DT_STRTAB)
    public IReadOnlyList<string> Needed { get; } // DT_NEEDED list

    public ulong? StrTabAddress { get; }
    public ulong? SymTabAddress { get; }
    public ulong? StrTabSize { get; }
}
```
|
||||
|
||||
Implementation details:
|
||||
|
||||
* Dynamic entries are at `PT_DYNAMIC.p_offset`, each `Elf64_Dyn`:
|
||||
|
||||
* `d_tag` (signed 64‑bit)
|
||||
* `d_un` union (`d_val` or `d_ptr`, treat as `ulong`)
|
||||
|
||||
* Tags of interest (values are from ELF spec):
|
||||
|
||||
* `DT_NULL = 0`
|
||||
* `DT_NEEDED = 1`
|
||||
* `DT_STRTAB = 5`
|
||||
* `DT_SYMTAB = 6`
|
||||
* `DT_STRSZ = 10`
|
||||
* `DT_INIT = 12`
|
||||
* `DT_FINI = 13`
|
||||
* `DT_SONAME = 14`
|
||||
* `DT_INIT_ARRAY = 25`
|
||||
* `DT_FINI_ARRAY = 26`
|
||||
* `DT_INIT_ARRAYSZ = 27`
|
||||
* `DT_FINI_ARRAYSZ = 28`
|
||||
* `DT_PREINIT_ARRAY = 32`
|
||||
* `DT_PREINIT_ARRAYSZ = 33`
|
||||
|
||||
* To decode SONAME and NEEDED:
|
||||
|
||||
* Use `DT_STRTAB` as base VA of the dynamic string table.
|
||||
* Map VA to file offset with `TryMapVaToFileOffset`.
|
||||
* For each `DT_NEEDED` / `DT_SONAME`, treat `d_val` as an offset into that string table; read a null‑terminated UTF‑8 C‑string.
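A minimal sketch of the decoding steps above, assuming a little-endian ELF64 file and that the caller has already mapped `PT_DYNAMIC` and `DT_STRTAB` to file offsets. `DynamicStrings`, `ReadCString`, and `ReadNames` are hypothetical names introduced for illustration only.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.Text;

public static class DynamicStrings
{
    private const long DT_NULL = 0, DT_NEEDED = 1, DT_SONAME = 14;

    // Reads the null-terminated UTF-8 string at strTabOffset + nameOffset.
    // Returns null if the offset is out of range or the string is unterminated.
    public static string? ReadCString(
        ReadOnlySpan<byte> file, ulong strTabOffset, ulong strTabSize, ulong nameOffset)
    {
        if (nameOffset >= strTabSize) return null;
        if (strTabOffset > (ulong)file.Length ||
            strTabSize > (ulong)file.Length - strTabOffset) return null;

        var tail = file.Slice((int)strTabOffset, (int)strTabSize).Slice((int)nameOffset);
        int end = tail.IndexOf((byte)0);
        return end < 0 ? null : Encoding.UTF8.GetString(tail.Slice(0, end));
    }

    // Walks Elf64_Dyn entries (16 bytes each: d_tag, d_un) until DT_NULL.
    public static (string? Soname, List<string> Needed) ReadNames(
        ReadOnlySpan<byte> file, ulong dynOffset, ulong dynSize,
        ulong strTabOffset, ulong strTabSize)
    {
        string? soname = null;
        var needed = new List<string>();
        if (dynOffset > (ulong)file.Length ||
            dynSize > (ulong)file.Length - dynOffset) return (soname, needed);

        var dyn = file.Slice((int)dynOffset, (int)dynSize);
        for (int pos = 0; pos + 16 <= dyn.Length; pos += 16)
        {
            long tag = BinaryPrimitives.ReadInt64LittleEndian(dyn.Slice(pos, 8));
            ulong val = BinaryPrimitives.ReadUInt64LittleEndian(dyn.Slice(pos + 8, 8));
            if (tag == DT_NULL) break;
            if (tag != DT_NEEDED && tag != DT_SONAME) continue;

            string? name = ReadCString(file, strTabOffset, strTabSize, val);
            if (name is null) continue; // malformed entry: skip rather than fail the scan
            if (tag == DT_NEEDED) needed.Add(name); else soname = name;
        }
        return (soname, needed);
    }
}
```

Malformed offsets are skipped here to keep the sketch short; a real implementation would likely surface them as `InitRoutineError` entries.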
|
||||
|
||||
---
|
||||
|
||||
## 4. Constructor & Init Routine Discovery
|
||||
|
||||
We now define the algorithm implemented by `InitRoutineDiscovery` for a **single ELF file**.
|
||||
|
||||
High‑level steps:
|
||||
|
||||
1. Parse `ElfFile`.
|
||||
2. Parse `DynamicSection`.
|
||||
3. Resolve:
|
||||
|
||||
* Pre‑init array (`DT_PREINIT_ARRAY`, `.preinit_array`)
|
||||
* Init array (`DT_INIT_ARRAY`, `.init_array`)
|
||||
* Legacy `.ctors`
|
||||
* `_init`, `_fini` via `DT_INIT`/`DT_FINI`
|
||||
* Fini array (`DT_FINI_ARRAY`, `.fini_array`)
|
||||
4. For each VA, optionally resolve symbol name.
|
||||
5. Build `InitRoutineRoot` entries.
|
||||
|
||||
### 4.1 Pointer size & endianness
|
||||
|
||||
* For ELF64:
|
||||
|
||||
* Pointer size = 8 bytes.
|
||||
* For ELF32:
|
||||
|
||||
* Pointer size = 4 bytes (if/when you support it).
|
||||
* Use `BinaryPrimitives.ReadUInt64LittleEndian` or `ReadUInt64BigEndian` depending on `ElfEndianness` (and `ReadUInt32LittleEndian` / `ReadUInt32BigEndian` for ELF32).
|
||||
|
||||
### 4.2 Mapping VA → file offset
|
||||
|
||||
`ElfFile.TryMapVaToFileOffset`:
|
||||
|
||||
* Iterate `ProgramHeaders` with `p_type == PT_LOAD`.
|
||||
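* If `virtualAddress` in `[p_vaddr, p_vaddr + p_filesz)` (use `p_filesz`, not `p_memsz`: bytes between the two are zero‑filled at load time and have no backing in the file):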
|
||||
|
||||
* `fileOffset = p_offset + (virtualAddress - p_vaddr)`
|
||||
* Return false if no matching segment.
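The mapping can be sketched as below. `LoadSegment` is a hypothetical, trimmed-down view of a `PT_LOAD` program header, and the sketch assumes that only file-backed bytes (the first `p_filesz` bytes of a segment) should map to an offset.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal view of a PT_LOAD program header.
public readonly record struct LoadSegment(
    ulong Offset,          // p_offset
    ulong VirtualAddress,  // p_vaddr
    ulong FileSize,        // p_filesz
    ulong MemorySize);     // p_memsz

public static class VaMapper
{
    public static bool TryMapVaToFileOffset(
        IReadOnlyList<LoadSegment> loads, ulong va, out ulong fileOffset)
    {
        foreach (var seg in loads)
        {
            // Only the first FileSize bytes exist in the file; the remainder up to
            // MemorySize is zero-fill (.bss) and must not be mapped.
            if (va >= seg.VirtualAddress && va - seg.VirtualAddress < seg.FileSize)
            {
                fileOffset = seg.Offset + (va - seg.VirtualAddress);
                return true;
            }
        }

        fileOffset = 0;
        return false;
    }
}
```

The subtraction-based range check avoids overflow when `p_vaddr + p_filesz` would wrap in a malformed header.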
|
||||
|
||||
### 4.3 Reading init arrays
|
||||
|
||||
Generic helper:
|
||||
|
||||
```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

internal static IReadOnlyList<ulong> ReadPointerArray(
    ElfFile elf,
    ulong arrayVa,
    ulong arrayBytes)
{
    var results = new List<ulong>();
    if (!elf.TryMapVaToFileOffset(arrayVa, out var fileOffset))
        return results;

    int pointerSize = elf.ElfClass == ElfClass.Elf64 ? 8 : 4;
    var span = elf.RawBytes.Span;

    // Clamp to the bytes actually present so truncated or hostile files
    // cannot overflow the int casts below.
    if (fileOffset >= (ulong)span.Length)
        return results;
    ulong available = (ulong)span.Length - fileOffset;
    int count = (int)(Math.Min(arrayBytes, available) / (ulong)pointerSize);

    for (int i = 0; i < count; i++)
    {
        var slot = span.Slice((int)fileOffset + i * pointerSize, pointerSize);

        ulong pointerValue = elf.Endianness switch
        {
            ElfEndianness.Little when pointerSize == 8
                => BinaryPrimitives.ReadUInt64LittleEndian(slot),
            ElfEndianness.Little
                => BinaryPrimitives.ReadUInt32LittleEndian(slot),
            ElfEndianness.Big when pointerSize == 8
                => BinaryPrimitives.ReadUInt64BigEndian(slot),
            _ // Big, 32-bit
                => BinaryPrimitives.ReadUInt32BigEndian(slot),
        };

        // Zero slots are skipped, so positions in the result may differ from the
        // original array indices; track i separately if ArrayIndex must match.
        if (pointerValue != 0)
            results.Add(pointerValue);
    }

    return results;
}
```
|
||||
|
||||
Apply to:
|
||||
|
||||
* Pre‑init: if `Dynamic.PreInitArrayAddress` and `Dynamic.PreInitArraySize` present.
|
||||
* Init: if `Dynamic.InitArrayAddress` and `Dynamic.InitArraySize` present.
|
||||
* Fini: if `Dynamic.FiniArrayAddress` and `Dynamic.FiniArraySize` present.
|
||||
|
||||
### 4.4 Legacy `.ctors` section
|
||||
|
||||
Fallback for older toolchains:
|
||||
|
||||
* Find section with `Name == ".ctors"`.
|
||||
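* Its contents are just an array of pointers (same pointer size as ELF). GCC's startup code traditionally walks this array from the last entry to the first, so `ArrayIndex` here is a storage position rather than a call order.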
|
||||
* Some compilers include a sentinel `-1` or `0` at beginning or end. Treat:
|
||||
|
||||
* `0` or all‑ones (`0xFFFFFFFFFFFFFFFF` for 64‑bit, `0xFFFFFFFF` for 32‑bit) as sentinel; skip them.
|
||||
* Use similar `ReadPointerArray` logic but starting from `sh_offset` rather than a VA.
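The sentinel handling can be sketched as follows, assuming the raw slot values have already been read from the section. `LegacyCtors.FilterSentinels` is a hypothetical helper; it keeps each entry's original slot index so that `ArrayIndex` stays meaningful after filtering.

```csharp
using System;
using System.Collections.Generic;

public static class LegacyCtors
{
    // Drops the 0 and all-ones (-1) sentinels and keeps each surviving
    // entry's original position in the .ctors array.
    public static List<(int Index, ulong Address)> FilterSentinels(
        IReadOnlyList<ulong> raw, bool is64Bit)
    {
        ulong minusOne = is64Bit ? ulong.MaxValue : uint.MaxValue;
        var result = new List<(int Index, ulong Address)>();

        for (int i = 0; i < raw.Count; i++)
        {
            if (raw[i] == 0 || raw[i] == minusOne)
                continue;
            result.Add((i, raw[i]));
        }

        return result;
    }
}
```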
|
||||
|
||||
### 4.5 `_init` / `_fini` functions
|
||||
|
||||
* `Dynamic.InitFunction` (from `DT_INIT`) is a single VA.
|
||||
* `Dynamic.FiniFunction` (from `DT_FINI`) likewise.
|
||||
|
||||
Even if arrays exist, these may also be present; treat them as **independent roots**.
|
||||
|
||||
---
|
||||
|
||||
## 5. Symbol Resolution (best‑effort names)
|
||||
|
||||
Define interface:
|
||||
|
||||
```csharp
public interface ISymbolResolver
{
    /// <summary>
    /// Find the symbol whose address matches `virtualAddress` exactly,
    /// or, if not found, the closest preceding symbol (with an offset).
    /// </summary>
    SymbolInfo? ResolveSymbol(ElfFile elf, ulong virtualAddress);
}

public sealed record SymbolInfo(
    string Name,
    ulong Value,
    ulong Size
);
```
|
||||
|
||||
Implementation sketch:
|
||||
|
||||
* Use `.dynsym` (dynamic symbol table), and `.symtab` (full symbol table) if available.
|
||||
* Each symbol entry includes:
|
||||
|
||||
* Name offset in string table
|
||||
* Value (VA)
|
||||
* Size
|
||||
* Type/binding (function, object, etc.)
|
||||
* Build an in‑memory index (e.g. sorted by `Value`) per ELF file.
|
||||
* `ResolveSymbol`:
|
||||
|
||||
* Prefer exact match of `Value`.
|
||||
* If none, find symbol with largest `Value` less than `virtualAddress` and treat as “nearest symbol + offset”.
|
||||
* You can show just `Name` or `Name+0xOFFSET` in explanations; for `InitRoutineRoot` we store plain `Name`.
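One way to sketch the sorted index and the exact-or-nearest lookup described above is a binary search over symbol values. `SymbolIndex` is a hypothetical class, and the sketch deliberately ignores symbol type and `Size` bounds, which a real resolver would probably check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record SymbolInfo(string Name, ulong Value, ulong Size);

public sealed class SymbolIndex
{
    private readonly SymbolInfo[] _sorted;
    private readonly ulong[] _values;

    public SymbolIndex(IEnumerable<SymbolInfo> symbols)
    {
        _sorted = symbols.OrderBy(s => s.Value).ToArray();
        _values = _sorted.Select(s => s.Value).ToArray();
    }

    // Returns the exact match if there is one, otherwise the closest preceding
    // symbol together with the offset of va into it. Null if va precedes all symbols.
    public (SymbolInfo Symbol, ulong Offset)? Resolve(ulong va)
    {
        int i = Array.BinarySearch(_values, va);
        if (i < 0)
            i = ~i - 1; // ~i is the index of the first value greater than va
        if (i < 0)
            return null;

        return (_sorted[i], va - _sorted[i].Value);
    }
}
```

An `Offset` of zero corresponds to the exact-match case; a non-zero offset gives the `Name+0xOFFSET` form for explanations.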
|
||||
|
||||
---
|
||||
|
||||
## 6. Dynamic Dependencies & Load-Time Roots
|
||||
|
||||
When `InitDiscoveryOptions.IncludeDependencies == true`:
|
||||
|
||||
1. For root ELF:
|
||||
|
||||
* Discover its roots as above.
|
||||
2. For each `neededSoname` in `Dynamic.Needed`:
|
||||
|
||||
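* Ask `IElfDependencyResolver.ResolveLibrary(neededSoname, referencingBinaryPath)`, passing the path of the ELF that lists the dependency (the root ELF on the first pass, the dependency itself when recursing).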
|
||||
* If it returns a path not yet processed:
|
||||
|
||||
* Parse this ELF and recursively discover its roots.
|
||||
3. Return a **flat list** of all `InitRoutineRoot` objects, each carrying the `BinaryPath`/`Soname` of the ELF it was found in.
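The traversal in the steps above can be sketched with a worklist and a visited set so that shared or cyclic dependencies are processed once. `DependencyWalker` is a hypothetical helper; the two delegates stand in for the parser (`Dynamic.Needed`) and for `IElfDependencyResolver.ResolveLibrary`, so the sketch stays independent of the real types.

```csharp
using System;
using System.Collections.Generic;

public static class DependencyWalker
{
    // getNeeded: returns the DT_NEEDED sonames listed by the ELF at the given path.
    // resolve:   (soname, referencingBinaryPath) -> local path, or null if not found.
    // Returns every binary to scan, root first, each exactly once.
    public static List<string> CollectBinaries(
        string rootPath,
        Func<string, IReadOnlyList<string>> getNeeded,
        Func<string, string, string?> resolve)
    {
        var visited = new HashSet<string>(StringComparer.Ordinal);
        var order = new List<string>();
        var queue = new Queue<string>();

        visited.Add(rootPath);
        queue.Enqueue(rootPath);

        while (queue.Count > 0)
        {
            string current = queue.Dequeue();
            order.Add(current);

            foreach (string soname in getNeeded(current))
            {
                string? path = resolve(soname, current);
                // HashSet.Add returns false for paths already seen, which breaks cycles.
                if (path is not null && visited.Add(path))
                    queue.Enqueue(path);
            }
        }

        return order;
    }
}
```

Discovery would then run the single-file algorithm on each returned path and concatenate the roots. Breadth-first order is an arbitrary choice here; it does not model the loader's actual initialization order.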
|
||||
|
||||
Important: **We do not implicitly model `dlopen()`** at this stage. That’s separate:
|
||||
|
||||
* As an optional heuristic, if the binary imports `dlopen`, tag those DSOs so later we can add “potential plugin load” roots. You can park this as a TODO in the comments.
|
||||
|
||||
---
|
||||
|
||||
## 7. Call Graph / Reachability Integration
|
||||
|
||||
This depends on your existing modeling, but here’s a generic spec a C# dev can follow.
|
||||
|
||||
Assume there is an internal model:
|
||||
|
||||
```csharp
public sealed class CallGraph
{
    public Node GetOrCreateNode(string binaryPath, ulong virtualAddress, string? symbolName);
    public Node GetOrCreateSyntheticRoot(string rootId, string description);
    public void AddEdge(Node from, Node to, CallEdgeMetadata metadata);
}

public sealed record CallEdgeMetadata(
    string EdgeKind,          // e.g. "loader-init"
    InitPhase Phase,          // Load / Unload
    InitRoutineKind InitKind,
    int? ArrayIndex
);
```
|
||||
|
||||
### 7.1 Synthetic loader node
|
||||
|
||||
Create a single graph node representing the dynamic loader / program start:
|
||||
|
||||
```csharp
var loaderNode = callGraph.GetOrCreateSyntheticRoot(
    "LOADER",
    "ELF dynamic loader / process start"
);
```
|
||||
|
||||
### 7.2 Adding edges for each root
|
||||
|
||||
For each `InitRoutineRoot root`:
|
||||
|
||||
1. Get or create a node for the target function:
|
||||
|
||||
   ```csharp
   var target = callGraph.GetOrCreateNode(
       root.BinaryPath,
       root.VirtualAddress,
       root.SymbolName
   );
   ```
|
||||
|
||||
2. Add edge from loader:
|
||||
|
||||
   ```csharp
   callGraph.AddEdge(
       loaderNode,
       target,
       new CallEdgeMetadata(
           EdgeKind: "loader-init",
           Phase: root.Phase,
           InitKind: root.Kind,
           ArrayIndex: root.ArrayIndex
       )
   );
   ```
|
||||
|
||||
3. Optional: If you model **per‑library** loader nodes, you can add:
|
||||
|
||||
* `LOADER -> libLoaderNode`
|
||||
* `libLoaderNode -> each constructor`
|
||||
|
||||
but that’s a nice‑to‑have, not required.
|
||||
|
||||
### 7.3 Phases
|
||||
|
||||
* For `.preinit_array`, `.init_array`, `.ctors`, `_init`:
|
||||
|
||||
* `Phase = InitPhase.Load`
|
||||
* For `.fini_array`, `_fini`:
|
||||
|
||||
* `Phase = InitPhase.Unload`
|
||||
|
||||
This allows downstream UI to say e.g.:
|
||||
|
||||
> This vulnerable function is reachable at **load time** via constructor `foo()` in `libbar.so`.
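The kind-to-phase rule above fits in a single switch expression. The enums are repeated so the sketch stands alone; `InitPhaseMapping.PhaseOf` is a hypothetical helper name.

```csharp
using System;

public enum InitRoutineKind
{
    PreInitArray,
    InitArray,
    LegacyCtorsSection,
    LegacyInitSymbol,
    FiniArray,
    LegacyFiniSymbol
}

public enum InitPhase { Load, Unload }

public static class InitPhaseMapping
{
    // Constructors and init functions run at load; finalizers run at unload.
    public static InitPhase PhaseOf(InitRoutineKind kind) => kind switch
    {
        InitRoutineKind.PreInitArray or
        InitRoutineKind.InitArray or
        InitRoutineKind.LegacyCtorsSection or
        InitRoutineKind.LegacyInitSymbol => InitPhase.Load,

        InitRoutineKind.FiniArray or
        InitRoutineKind.LegacyFiniSymbol => InitPhase.Unload,

        _ => throw new ArgumentOutOfRangeException(nameof(kind), kind, "Unknown init routine kind")
    };
}
```

Deriving `Phase` from `Kind` in one place keeps the two fields of `InitRoutineRoot` from drifting apart.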
|
||||
|
||||
---
|
||||
|
||||
## 8. Attestation / Evidence Output
|
||||
|
||||
We want deterministic, auditable output per scan.
|
||||
|
||||
Define a JSON schema (C# record) stored alongside other scan artifacts:
|
||||
|
||||
```csharp
public sealed record InitRoutineEvidence(
    string ScannerVersion,
    DateTimeOffset ScanTimeUtc,
    IReadOnlyList<InitRoutineEvidenceEntry> Entries
);

public sealed record InitRoutineEvidenceEntry(
    string BinaryPath,
    string? Soname,
    InitRoutineKind Kind,
    InitPhase Phase,
    ulong VirtualAddress,
    ulong? FileOffset,
    string? SymbolName,
    int? ArrayIndex
);
```
|
||||
|
||||
Implementation details:
|
||||
|
||||
* After `IInitRoutineDiscovery.Discover` completes:
|
||||
|
||||
* Convert each `InitRoutineRoot` to `InitRoutineEvidenceEntry`.
|
||||
* Serialize with `System.Text.Json` (property names in camelCase or snake_case; choose a stable convention).
|
||||
* Store the evidence file e.g. `init_roots.json` inside the scan’s result directory.
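A minimal sketch of the serialization step, assuming camelCase property names and enums written as strings. `EvidenceWriter` is a hypothetical helper, and the explicit sort is an assumption added so that the same set of roots always yields the same JSON regardless of discovery order. Only the entries are serialized here; the envelope record with scanner version and scan time is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Serialization;

public enum InitRoutineKind
{
    PreInitArray, InitArray, LegacyCtorsSection,
    LegacyInitSymbol, FiniArray, LegacyFiniSymbol
}

public enum InitPhase { Load, Unload }

public sealed record InitRoutineEvidenceEntry(
    string BinaryPath, string? Soname, InitRoutineKind Kind, InitPhase Phase,
    ulong VirtualAddress, ulong? FileOffset, string? SymbolName, int? ArrayIndex);

public static class EvidenceWriter
{
    private static readonly JsonSerializerOptions Options = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        WriteIndented = true,
        Converters = { new JsonStringEnumConverter() } // enums as names, not numbers
    };

    public static string ToJson(IEnumerable<InitRoutineEvidenceEntry> entries)
    {
        // Stable ordering: binary, then kind, then array position, then address.
        var ordered = entries
            .OrderBy(e => e.BinaryPath, StringComparer.Ordinal)
            .ThenBy(e => e.Kind)
            .ThenBy(e => e.ArrayIndex ?? -1)
            .ThenBy(e => e.VirtualAddress)
            .ToList();

        return JsonSerializer.Serialize(ordered, Options);
    }
}
```

The resulting string could be written to `init_roots.json` with `File.WriteAllText`.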
|
||||
|
||||
---
|
||||
|
||||
## 9. Implementation Details & Edge Cases
|
||||
|
||||
### 9.1 Architectures
|
||||
|
||||
First version:
|
||||
|
||||
* Support:
|
||||
|
||||
* `ElfClass.Elf64`
|
||||
* `ElfEndianness.Little`
|
||||
* `EM_X86_64`
|
||||
* For anything else:
|
||||
|
||||
* Log an `InitRoutineError` and skip (but don’t hard‑fail the whole scan).
|
||||
|
||||
Design the parser so architecture is an enum:
|
||||
|
||||
```csharp
internal enum ElfMachine : ushort
{
    X86_64 = 62,
    // others later
}
```
|
||||
|
||||
### 9.2 Relocations (simplification)
|
||||
|
||||
Real loaders apply relocations to constructor arrays; some pointers may be stored as relative relocations.
|
||||
|
||||
For **v1 implementation**:
|
||||
|
||||
* Assume that:
|
||||
|
||||
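* Array entries are already link‑time VAs in the ELF’s address space. This is typical for non‑PIE executables. For PIE executables and shared objects (`ET_DYN`) the on‑disk slots are filled in by `R_X86_64_RELATIVE` relocations and, depending on the linker, may hold zeros on disk, so v1 can under‑report roots for such binaries.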
|
||||
* If you need better fidelity later:
|
||||
|
||||
* Parse `.rela.dyn` / `.rel.dyn`.
|
||||
* Apply `R_X86_64_RELATIVE` relocations whose `r_offset` falls within the array’s address range:
|
||||
|
||||
* Effective address = (base address + addend); if you treat base as 0, you get a VA that’s correct **within the file** (relative).
|
||||
|
||||
Document this as a TODO so later you can extend without breaking the API.
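As a starting point for that TODO, the relocation pass could look roughly like this. It is a sketch under stated assumptions: little-endian `Elf64_Rela` entries (24 bytes: `r_offset`, `r_info`, `r_addend`), a load base of 0, and a hypothetical `RelativeRelocations.ResolveArraySlots` helper that receives the raw bytes of the relocation table. `REL` and `RELR` encodings are not handled.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

public static class RelativeRelocations
{
    private const uint R_X86_64_RELATIVE = 8;
    private const int RelaEntrySize = 24; // Elf64_Rela: r_offset, r_info, r_addend

    // Returns slot VA -> target VA (load base assumed to be 0) for every
    // R_X86_64_RELATIVE relocation whose r_offset lies in [arrayVa, arrayVa + arrayBytes).
    public static Dictionary<ulong, ulong> ResolveArraySlots(
        ReadOnlySpan<byte> rela, ulong arrayVa, ulong arrayBytes)
    {
        var targets = new Dictionary<ulong, ulong>();

        for (int pos = 0; pos + RelaEntrySize <= rela.Length; pos += RelaEntrySize)
        {
            ulong offset = BinaryPrimitives.ReadUInt64LittleEndian(rela.Slice(pos, 8));
            ulong info = BinaryPrimitives.ReadUInt64LittleEndian(rela.Slice(pos + 8, 8));
            long addend = BinaryPrimitives.ReadInt64LittleEndian(rela.Slice(pos + 16, 8));

            uint type = (uint)(info & 0xFFFFFFFF); // ELF64_R_TYPE(r_info)
            if (type != R_X86_64_RELATIVE)
                continue;
            if (offset < arrayVa || offset - arrayVa >= arrayBytes)
                continue;

            targets[offset] = unchecked((ulong)addend); // base (0) + addend
        }

        return targets;
    }
}
```

A caller would prefer the relocated value for a slot when one exists and fall back to the on-disk pointer otherwise; slot index is `(slotVa - arrayVa) / 8`.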
|
||||
|
||||
### 9.3 Error handling
|
||||
|
||||
* All parsing errors **must be non‑fatal** to the overall scan:
|
||||
|
||||
* Record `InitRoutineError` with `BinaryPath`, message, and exception.
|
||||
* Continue with other binaries.
|
||||
* If a binary is not ELF or has invalid magic:
|
||||
|
||||
* Return no roots, but optionally log a low‑severity error.
|
||||
|
||||
---
|
||||
|
||||
## 10. Testing Strategy
|
||||
|
||||
### 10.1 Unit tests with synthetic ELF fixtures
|
||||
|
||||
Create a small test project `StellaOps.ElfInit.Tests` with known ELF files checked into test resources:
|
||||
|
||||
* Binaries compiled with small C programs like:
|
||||
|
||||
  ```c
  static void __attribute__((constructor)) c1(void) {}
  static void __attribute__((constructor)) c2(void) {}
  static void __attribute__((destructor))  d1(void) {}

  int main(void) { return 0; }
  ```
|
||||
|
||||
* Variants:
|
||||
|
||||
* Using `.ctors` (old GCC flags) for legacy coverage.
|
||||
* Shared library with `__attribute__((constructor))` and `DT_NEEDED` from a main binary.
|
||||
* Binary with no constructors (expect zero roots).
|
||||
|
||||
Assertions:
|
||||
|
||||
* The count of `InitRoutineRoot` matches expected.
|
||||
* `Kind` and `Phase` are correct.
|
||||
* `ArrayIndex` is correctly ordered: 0,1,2 …
|
||||
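* `SymbolName` contains the expected function names (plain C names such as `c1`; C++ names would be mangled). `static` functions are local symbols that appear only in `.symtab`, so expect `null` for stripped binaries.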
|
||||
* For dependencies:
|
||||
|
||||
* Discover roots in `libfoo.so` when main depends on it via `DT_NEEDED`.
|
||||
|
||||
### 10.2 Integration tests with call graph
|
||||
|
||||
* Given a small binary and a known vulnerable function reachable from a constructor:
|
||||
|
||||
* Run full pipeline.
|
||||
* Assert that the vulnerable function is marked reachable from synthetic `LOADER` node via the constructor.
|
||||
|
||||
### 10.3 Fuzz / robustness
|
||||
|
||||
* Run the discovery on:
|
||||
|
||||
* Random non‑ELF files.
|
||||
* Truncated ELF files.
|
||||
* Very large binaries.
|
||||
* Ensure no unhandled exceptions; only `InitRoutineError` entries.
|
||||
|
||||
---
|
||||
|
||||
## 11. Suggested C# Project Layout
|
||||
|
||||
```text
src/
  StellaOps.ElfInit/
    IInitRoutineDiscovery.cs
    InitRoutineModels.cs
    InitRoutineDiscovery.cs
    IElfDependencyResolver.cs
    ISymbolResolver.cs
    Evidence/
      InitRoutineEvidence.cs
    Elf/
      ElfFile.cs
      ElfParser.cs
      ElfHeader.cs
      ProgramHeader.cs
      SectionHeader.cs
      DynamicSection.cs
      VaMapper.cs
      PointerArrayReader.cs
tests/
  StellaOps.ElfInit.Tests/
    Resources/
      sample_no_ctor
      sample_init_array
      sample_preinit_init_fini
      sample_with_deps_main
      libsample_ctor.so
    InitRoutineDiscoveryTests.cs
```
|
||||
|
||||
---
|
||||
|
||||
If you’d like, I can next:
|
||||
|
||||
* Draft `InitRoutineDiscovery` in C# with full method bodies, or
|
||||
* Provide a minimal `ElfFile`/`ElfParser` implementation skeleton you can fill in.
|
||||