up
Some checks failed
AOC Guard CI / aoc-guard (push) Has been cancelled
AOC Guard CI / aoc-verify (push) Has been cancelled
Docs CI / lint-and-preview (push) Has been cancelled
Notify Smoke Test / Notify Unit Tests (push) Has been cancelled
Notify Smoke Test / Notifier Service Tests (push) Has been cancelled
Notify Smoke Test / Notification Smoke Test (push) Has been cancelled
Policy Lint & Smoke / policy-lint (push) Has been cancelled
Scanner Analyzers / Discover Analyzers (push) Has been cancelled
Scanner Analyzers / Build Analyzers (push) Has been cancelled
Scanner Analyzers / Test Language Analyzers (push) Has been cancelled
Scanner Analyzers / Validate Test Fixtures (push) Has been cancelled
Scanner Analyzers / Verify Deterministic Output (push) Has been cancelled
Signals CI & Image / signals-ci (push) Has been cancelled
Signals Reachability Scoring & Events / reachability-smoke (push) Has been cancelled
Signals Reachability Scoring & Events / sign-and-upload (push) Has been cancelled
Manifest Integrity / Validate Schema Integrity (push) Has been cancelled
Manifest Integrity / Validate Contract Documents (push) Has been cancelled
Manifest Integrity / Validate Pack Fixtures (push) Has been cancelled
Manifest Integrity / Audit SHA256SUMS Files (push) Has been cancelled
Manifest Integrity / Verify Merkle Roots (push) Has been cancelled
devportal-offline / build-offline (push) Has been cancelled
Mirror Thin Bundle Sign & Verify / mirror-sign (push) Has been cancelled
301
docs/contracts/buildid-propagation.md
Normal file
@@ -0,0 +1,301 @@
# CONTRACT-BUILDID-PROPAGATION-401: Build-ID and Code-ID Propagation

> **Status:** Published
> **Version:** 1.0.0
> **Published:** 2025-12-13
> **Owners:** Scanner Guild, Signals Guild, BE-Base Platform Guild
> **Unblocks:** SCANNER-BUILDID-401-035, SCANNER-INITROOT-401-036, and downstream tasks

## Overview

This contract defines how GNU build-id (ELF), PE GUID, and Mach-O UUID propagate through the reachability pipeline from Scanner to SBOM, Signals, and runtime facts. It ensures consistent identification of binaries across components for deterministic symbol resolution and replay.

---

## 1. Build-ID Sources and Formats

### 1.1 Per-Format Extraction

| Binary Format | Build-ID Source | Prefix | Example |
|---------------|-----------------|--------|---------|
| ELF | `.note.gnu.build-id` | `gnu-build-id:` | `gnu-build-id:5f0c7c3cab2eb9bc...` |
| PE (Windows) | Debug GUID from the PE debug directory (CodeView record) | `pe-guid:` | `pe-guid:12345678-1234-1234-1234-123456789abc` |
| Mach-O | `LC_UUID` load command | `macho-uuid:` | `macho-uuid:12345678123412341234123456789abc` |

### 1.2 Canonical Format

```
build_id = "{prefix}{hex_lowercase}"
```

- Hex encoding: lowercase, no separators (except PE GUID retains dashes)
- Minimum length: 16 bytes (32 hex chars) for ELF/Mach-O
- PE GUID: Standard GUID format with dashes

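
As a rough illustration of the rules above, the following Python sketch builds a canonical `build_id` from raw identifier bytes. The function name `canonical_build_id` and the format keys (`elf`, `pe`, `macho`) are hypothetical, and the sketch assumes the PE GUID arrives as the 16-byte little-endian structure; it is not the Scanner's actual code.

```python
import uuid

# Prefixes from the table in 1.1. The dictionary keys are hypothetical labels.
PREFIXES = {"elf": "gnu-build-id:", "pe": "pe-guid:", "macho": "macho-uuid:"}


def canonical_build_id(binary_format: str, raw: bytes) -> str:
    """Return the canonical build_id string for raw identifier bytes."""
    prefix = PREFIXES[binary_format]
    if binary_format == "pe":
        # PE keeps the dashed GUID form. Assumption: `raw` is the 16-byte
        # GUID in little-endian field layout; uuid.UUID rejects other lengths.
        return prefix + str(uuid.UUID(bytes_le=raw))
    if len(raw) < 16:
        # Minimum length rule for ELF and Mach-O identifiers.
        raise ValueError("build-id shorter than 16 bytes")
    # bytes.hex() already produces lowercase hex without separators.
    return prefix + raw.hex()
```

For instance, `canonical_build_id("macho", bytes.fromhex("12345678123412341234123456789abc"))` yields the Mach-O example from the table in 1.1.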
### 1.3 Fallback When Build-ID Absent

When build-id is not present (stripped binaries, older toolchains):

```json
{
  "build_id": null,
  "build_id_fallback": {
    "method": "file_hash",
    "value": "sha256:...",
    "confidence": 0.7
  }
}
```

**Fallback chain:**
1. `file_hash` - SHA-256 of entire binary file (confidence: 0.7)
2. `code_section_hash` - SHA-256 of .text section (confidence: 0.6)
3. `path_hash` - SHA-256 of file path (confidence: 0.3, last resort)

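
One way to read the fallback chain is "use the first method whose input is available". The Python sketch below follows that reading. The selection rule, the function name `build_id_fallback`, and the UTF-8 encoding of the path are assumptions; this contract does not specify them.

```python
import hashlib
from typing import Optional


def build_id_fallback(file_bytes: Optional[bytes],
                      text_section: Optional[bytes],
                      path: str) -> dict:
    """Pick the first applicable fallback, in the order listed above."""

    def digest(data: bytes) -> str:
        return "sha256:" + hashlib.sha256(data).hexdigest()

    if file_bytes is not None:
        return {"method": "file_hash", "value": digest(file_bytes), "confidence": 0.7}
    if text_section is not None:
        return {"method": "code_section_hash", "value": digest(text_section), "confidence": 0.6}
    # Last resort: hash the path itself (assumed to be UTF-8 encoded here).
    return {"method": "path_hash", "value": digest(path.encode("utf-8")), "confidence": 0.3}
```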

---

## 2. Code-ID for Name-less Symbols

### 2.1 Purpose

`code_id` provides stable identification for symbols in stripped binaries where the symbol name is unavailable.

### 2.2 Format

```
code_id = "code:{lang}:{base64url_sha256}"
```

**Canonical tuple for binary symbols:**
```
{format}\0{build_id_or_file_hash}\0{section}\0{addr}\0{size}\0{code_block_hash}
```

### 2.3 Code Block Hash

For stripped functions, compute the hash of the code bytes:

```
code_block_hash = "sha256:" + hex(SHA256(code_bytes[addr:addr+size]))
```

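
The two formulas above can be sketched in Python as follows. The contract does not spell out how the canonical tuple becomes a `code_id`, so this sketch assumes the identifier is the unpadded base64url encoding of the SHA-256 of the NUL-joined tuple, with `addr` rendered as hex and `size` as decimal. Those choices, and the function names, are assumptions for illustration only.

```python
import base64
import hashlib


def code_block_hash(code_bytes: bytes, addr: int, size: int) -> str:
    """SHA-256 over code_bytes[addr:addr+size], prefixed with 'sha256:'."""
    return "sha256:" + hashlib.sha256(code_bytes[addr:addr + size]).hexdigest()


def code_id(lang: str, fmt: str, build_id_or_file_hash: str, section: str,
            addr: int, size: int, block_hash: str) -> str:
    """Build 'code:{lang}:{base64url_sha256}' from the canonical tuple."""
    # Assumed field rendering: hex address, decimal size.
    fields = [fmt, build_id_or_file_hash, section, hex(addr), str(size), block_hash]
    canonical = "\0".join(fields).encode("utf-8")
    digest = hashlib.sha256(canonical).digest()
    # Unpadded base64url keeps the token within [A-Za-z0-9_-].
    token = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"code:{lang}:{token}"
```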

---

## 3. Cross-RID (Runtime Identifier) Mapping

### 3.1 Problem Statement

Different platform builds (linux-x64, win-x64, osx-arm64) of the same source code produce different binaries with different build-ids. Runtime facts from one platform must map to the correct binary variant.

### 3.2 Variant Group

Binaries from the same source are grouped by source digest:

```json
{
  "variant_group": {
    "source_digest": "sha256:...",
    "variants": [
      {
        "rid": "linux-x64",
        "build_id": "gnu-build-id:aaa...",
        "file_hash": "sha256:..."
      },
      {
        "rid": "win-x64",
        "build_id": "pe-guid:bbb...",
        "file_hash": "sha256:..."
      },
      {
        "rid": "osx-arm64",
        "build_id": "macho-uuid:ccc...",
        "file_hash": "sha256:..."
      }
    ]
  }
}
```

### 3.3 Runtime Fact Correlation

When Signals ingests runtime facts:

1. Extract `build_id` from the runtime event
2. Look up the variant group containing this `build_id`
3. Correlate with richgraph nodes that have a matching `build_id`
4. If no match, fall back to `code_id` + `code_block_hash` matching

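
The four steps can be sketched in Python over plain dictionaries. This is only an illustration: `correlate` is a hypothetical name, `variant_groups` is assumed to be a list of the inner `variant_group` objects from 3.2, and reading `code_id` and `code_block_hash` from the event's `symbol` object is an assumption, since the runtime event schema in this contract does not list those fields.

```python
def correlate(event: dict, variant_groups: list, nodes: list) -> dict:
    """Follow the four correlation steps for one runtime event."""
    # Step 1: extract build_id from the runtime event.
    build_id = event.get("binary", {}).get("build_id")

    # Step 2: find the variant group that lists this build_id, if any.
    group = None
    if build_id is not None:
        for candidate in variant_groups:
            if any(v.get("build_id") == build_id for v in candidate["variants"]):
                group = candidate
                break

    # Step 3: richgraph nodes carrying the same build_id.
    matched = [n for n in nodes
               if build_id is not None and n.get("build_id") == build_id]
    matched_by = "build_id" if matched else None

    # Step 4: otherwise fall back to code_id + code_block_hash.
    if not matched:
        symbol = event.get("symbol", {})
        cid = symbol.get("code_id")            # assumed event field
        block = symbol.get("code_block_hash")  # assumed event field
        if cid is not None and block is not None:
            matched = [n for n in nodes
                       if n.get("code_id") == cid
                       and n.get("code_block_hash") == block]
            matched_by = "code_id" if matched else None

    return {"variant_group": group, "nodes": matched, "matched_by": matched_by}
```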

---

## 4. SBOM Integration

### 4.1 CycloneDX 1.6 Properties

Build-ID propagates to SBOM via component properties:

```json
{
  "type": "library",
  "name": "libssl.so.3",
  "version": "3.0.11",
  "properties": [
    {"name": "stellaops:build-id", "value": "gnu-build-id:5f0c7c3c..."},
    {"name": "stellaops:code-id", "value": "code:binary:abc123..."},
    {"name": "stellaops:file-hash", "value": "sha256:..."}
  ]
}
```

### 4.2 SPDX 3.0 Integration

Build-ID maps to an SPDX external reference. The field names in this example (`referenceCategory`, `referenceType`, `referenceLocator`) follow the SPDX 2.x external-reference shape:

```json
{
  "spdxId": "SPDXRef-libssl",
  "externalRef": {
    "referenceCategory": "PERSISTENT-ID",
    "referenceType": "gnu-build-id",
    "referenceLocator": "gnu-build-id:5f0c7c3c..."
  }
}
```

---

## 5. Signals Runtime Facts Schema

### 5.1 Runtime Event with Build-ID

```json
{
  "event_type": "function_hit",
  "timestamp": "2025-12-13T10:00:00Z",
  "binary": {
    "path": "/usr/lib/x86_64-linux-gnu/libssl.so.3",
    "build_id": "gnu-build-id:5f0c7c3c...",
    "file_hash": "sha256:..."
  },
  "symbol": {
    "name": "SSL_read",
    "address": "0x12345678",
    "symbol_id": "sym:binary:..."
  },
  "context": {
    "pid": 12345,
    "container_id": "abc123..."
  }
}
```

### 5.2 Ingestion Endpoint

```
POST /signals/runtime-facts
Content-Type: application/x-ndjson
Content-Encoding: gzip

{"event_type":"function_hit","binary":{"build_id":"gnu-build-id:..."},...}
{"event_type":"function_hit","binary":{"build_id":"gnu-build-id:..."},...}
```

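
A client would need to serialize events as newline-delimited JSON and gzip the result before posting. The Python sketch below prepares only the headers and body and leaves the HTTP call out; `encode_runtime_facts` is a hypothetical helper, not part of any Signals client library.

```python
import gzip
import json


def encode_runtime_facts(events: list) -> tuple:
    """Return (headers, body) for a POST to /signals/runtime-facts."""
    # NDJSON: one compact JSON object per line, each terminated by a newline.
    ndjson = "".join(json.dumps(e, separators=(",", ":")) + "\n" for e in events)
    body = gzip.compress(ndjson.encode("utf-8"))
    headers = {
        "Content-Type": "application/x-ndjson",
        "Content-Encoding": "gzip",
    }
    return headers, body
```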

---

## 6. RichGraph Integration

### 6.1 Node with Build-ID

```json
{
  "id": "sym:binary:...",
  "symbol_id": "sym:binary:...",
  "lang": "binary",
  "kind": "function",
  "display": "SSL_read",
  "build_id": "gnu-build-id:5f0c7c3c...",
  "code_id": "code:binary:...",
  "code_block_hash": "sha256:...",
  "purl": "pkg:deb/debian/libssl3@3.0.11"
}
```

### 6.2 CAS Evidence Storage

```
cas://binary/
  by-build-id/{build_id}/     # Index by build-id
    graph.json                # Associated graph
    symbols.json              # Symbol table
  by-code-id/{code_id}/       # Index by code-id
    block.bin                 # Code block bytes
    disasm.json               # Disassembly
```

---

## 7. Implementation Requirements

### 7.1 Scanner Changes

| Component | Change | Priority |
|-----------|--------|----------|
| ELF parser | Extract `.note.gnu.build-id` | P0 |
| PE parser | Extract Debug GUID | P0 |
| Mach-O parser | Extract `LC_UUID` | P0 |
| RichGraphBuilder | Populate `build_id` field on nodes | P0 |
| SBOM emitters | Add `stellaops:build-id` property | P1 |

### 7.2 Signals Changes

| Component | Change | Priority |
|-----------|--------|----------|
| Runtime facts ingestion | Parse and index `build_id` | P0 |
| Scoring service | Correlate by `build_id` then `code_id` | P0 |
| Store repository | Add `build_id` index | P1 |

### 7.3 CLI/UI Changes

| Component | Change | Priority |
|-----------|--------|----------|
| `stella graph explain` | Show build_id in output | P1 |
| UI symbol drawer | Display build_id with copy button | P1 |

---

## 8. Validation Rules

1. `build_id` must match regex: `^(gnu-build-id|pe-guid|macho-uuid):[a-f0-9-]+$`
2. `code_id` must match regex: `^code:[a-z]+:[A-Za-z0-9_-]+$`
3. When `build_id` is null, `build_id_fallback` must be present
4. `code_block_hash` required when `build_id` is null and symbol is stripped
5. Variant group `source_digest` must be consistent across all variants

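
Rules 1 to 4 can be checked per record with a few lines of Python. The sketch below is illustrative: `validate_record` and the boolean `stripped` flag are hypothetical, and rule 5 is left out because it needs the whole variant group rather than a single record.

```python
import re

# Patterns copied from rules 1 and 2.
BUILD_ID_RE = re.compile(r"^(gnu-build-id|pe-guid|macho-uuid):[a-f0-9-]+$")
CODE_ID_RE = re.compile(r"^code:[a-z]+:[A-Za-z0-9_-]+$")


def validate_record(record: dict) -> list:
    """Return a list of rule violations (empty when the record passes)."""
    errors = []
    build_id = record.get("build_id")
    if build_id is None:
        if not record.get("build_id_fallback"):
            errors.append("rule 3: build_id_fallback missing while build_id is null")
        # `stripped` is a hypothetical flag marking a symbol without a name.
        if record.get("stripped") and not record.get("code_block_hash"):
            errors.append("rule 4: code_block_hash missing for stripped symbol")
    elif not BUILD_ID_RE.fullmatch(build_id):
        # fullmatch also rejects a trailing newline, which `$` alone would allow.
        errors.append("rule 1: build_id has an invalid format")
    code_id = record.get("code_id")
    if code_id is not None and not CODE_ID_RE.fullmatch(code_id):
        errors.append("rule 2: code_id has an invalid format")
    return errors
```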

---

## 9. Test Fixtures

Location: `tests/Binary/fixtures/build-id/`

| Fixture | Description |
|---------|-------------|
| `elf-with-buildid/` | ELF binary with GNU build-id |
| `elf-stripped/` | ELF stripped, fallback to code-id |
| `pe-with-guid/` | PE binary with Debug GUID |
| `macho-with-uuid/` | Mach-O binary with LC_UUID |
| `variant-group/` | Same source, multiple RIDs |

---

## 10. Related Contracts

- [richgraph-v1](./richgraph-v1.md) - Graph schema with build_id field
- [Binary Reachability](../reachability/binary-reachability-schema.md) - Binary evidence schema
- [Symbol Manifest](../specs/SYMBOL_MANIFEST_v1.md) - Symbol identification

---

## Changelog

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-12-13 | Scanner Guild | Initial contract for build-id propagation |

326
docs/contracts/init-section-roots.md
Normal file
@@ -0,0 +1,326 @@
# CONTRACT-INIT-ROOTS-401: Init-Section Synthetic Roots

> **Status:** Published
> **Version:** 1.0.0
> **Published:** 2025-12-13
> **Owners:** Scanner Guild, Policy Guild, Signals Guild
> **Unblocks:** SCANNER-INITROOT-401-036, EDGE-BUNDLE-401-054, and downstream tasks

## Overview

This contract defines how ELF/PE/Mach-O initialization sections (`.init_array`, `.ctors`, `DT_INIT`, etc.) are modeled as synthetic roots in reachability graphs. These roots represent code that executes during program load, before `main()`, and must be included in reachability analysis for complete vulnerability assessment.

---

## 1. Init-Section Categories

### 1.1 ELF Init Sections

| Section/Tag | Phase | Order | Description |
|-------------|-------|-------|-------------|
| `.preinit_array` / `DT_PREINIT_ARRAY` | `preinit` | 0-N | Executed before dynamic linker init |
| `.init` / `DT_INIT` | `init` | 0 | Single init function |
| `.init_array` / `DT_INIT_ARRAY` | `init` | 1-N | Array of init function pointers |
| `.ctors` | `init` | after init_array | Legacy C++ constructors |
| `.fini` / `DT_FINI` | `fini` | 0 | Single cleanup function |
| `.fini_array` / `DT_FINI_ARRAY` | `fini` | 1-N | Array of cleanup function pointers |
| `.dtors` | `fini` | after fini_array | Legacy C++ destructors |

### 1.2 PE Init Sections

| Mechanism | Phase | Order | Description |
|-----------|-------|-------|-------------|
| `DllMain` (DLL_PROCESS_ATTACH) | `init` | 0 | DLL initialization |
| TLS callbacks | `init` | 1-N | Thread-local storage callbacks |
| C++ global constructors | `init` | after TLS | Via CRT init table |
| `DllMain` (DLL_PROCESS_DETACH) | `fini` | 0 | DLL cleanup |

### 1.3 Mach-O Init Sections

| Section | Phase | Order | Description |
|---------|-------|-------|-------------|
| `__mod_init_func` | `init` | 0-N | Module init functions |
| `__mod_term_func` | `fini` | 0-N | Module termination functions |

---

## 2. Synthetic Root Schema

### 2.1 Root Object in richgraph-v1

```json
{
  "roots": [
    {
      "id": "root:init:0:sym:binary:abc123...",
      "phase": "init",
      "source": "init_array",
      "order": 0,
      "target_id": "sym:binary:abc123...",
      "binary_path": "/usr/lib/libfoo.so.1",
      "build_id": "gnu-build-id:5f0c7c3c..."
    }
  ]
}
```

### 2.2 Root ID Format

```
root:{phase}:{order}:{target_symbol_id}
```

**Examples:**
- `root:preinit:0:sym:binary:abc...` - First preinit function
- `root:init:0:sym:binary:def...` - DT_INIT function
- `root:init:1:sym:binary:ghi...` - First init_array entry
- `root:main:0:sym:binary:jkl...` - main() function
- `root:fini:0:sym:binary:mno...` - DT_FINI function

### 2.3 Phase Enumeration

| Phase | Numeric Order | Execution Time |
|-------|---------------|----------------|
| `load` | 0 | Dynamic linker resolution |
| `preinit` | 1 | Before dynamic init |
| `init` | 2 | During initialization |
| `main` | 3 | Program entry (main) |
| `fini` | 4 | During termination |

---

## 3. Root Discovery Algorithm

### 3.1 ELF Root Discovery

```
1. Parse .dynamic section for DT_PREINIT_ARRAY, DT_INIT, DT_INIT_ARRAY
2. For each array:
   a. Read function pointer addresses
   b. Resolve to symbol (if available) or emit unknown
   c. Create root with phase + order
3. Find _start, main, _init, _fini symbols and add as roots
4. Sort roots by (phase, order, target_id) for determinism
```

### 3.2 Handling Unresolved Targets

When an init array contains an address with no symbol:

```json
{
  "roots": [
    {
      "id": "root:init:2:unknown:0x12345678",
      "phase": "init",
      "source": "init_array",
      "order": 2,
      "target_id": "unknown:0x12345678",
      "resolved": false,
      "reason": "No symbol at address 0x12345678"
    }
  ],
  "unknowns": [
    {
      "id": "unknown:0x12345678",
      "type": "unresolved_init_target",
      "address": "0x12345678",
      "source": "init_array[2]"
    }
  ]
}
```

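
The resolve-or-emit-unknown step can be sketched in Python once the function pointers have been read from the array. Everything here is illustrative: `roots_from_init_array` is a hypothetical name, `symbols` is assumed to be an address-to-symbol-ID map, and `order_offset` is a parameter because how `order` relates to the array index is a choice this sketch does not try to settle.

```python
def roots_from_init_array(addresses: list, symbols: dict,
                          phase: str = "init", source: str = "init_array",
                          order_offset: int = 0) -> dict:
    """Turn init-array addresses into root objects plus unknown records."""
    roots, unknowns = [], []
    for index, addr in enumerate(addresses):
        order = order_offset + index
        symbol_id = symbols.get(addr)
        if symbol_id is not None:
            roots.append({
                "id": f"root:{phase}:{order}:{symbol_id}",
                "phase": phase, "source": source, "order": order,
                "target_id": symbol_id,
            })
            continue
        # No symbol at this address: emit an unresolved root and an unknown.
        address = f"0x{addr:08x}"
        target_id = f"unknown:{address}"
        roots.append({
            "id": f"root:{phase}:{order}:{target_id}",
            "phase": phase, "source": source, "order": order,
            "target_id": target_id, "resolved": False,
            "reason": f"No symbol at address {address}",
        })
        unknowns.append({
            "id": target_id, "type": "unresolved_init_target",
            "address": address, "source": f"{source}[{index}]",
        })
    return {"roots": roots, "unknowns": unknowns}
```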

---

## 4. DT_NEEDED Dependency Modeling

### 4.1 Purpose

`DT_NEEDED` entries specify shared library dependencies. Each dependency's init code executes before the init code of the binary that depends on it.

### 4.2 Schema

```json
{
  "dependencies": [
    {
      "id": "dep:libssl.so.3",
      "name": "libssl.so.3",
      "source": "DT_NEEDED",
      "order": 0,
      "resolved_path": "/usr/lib/x86_64-linux-gnu/libssl.so.3",
      "resolved_build_id": "gnu-build-id:abc..."
    }
  ]
}
```

### 4.3 Init Order with Dependencies

Dependencies initialize before the objects that need them, so for a binary that needs `libssl.so.3` (which in turn needs `libcrypto.so.3` and `libc.so.6`):

```
1. libc.so.6 preinit → init
2. libcrypto.so.3 preinit → init
3. libssl.so.3 preinit → init
4. main_binary preinit → init → main
```

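
Since a dependency's init code runs before that of the object depending on it (4.1), one way to derive an init order is a post-order walk of the `DT_NEEDED` graph. The Python sketch below is a simplification for illustration: `init_order` and the `needed` map are hypothetical, and real dynamic loaders apply additional ordering rules that are not modeled here.

```python
def init_order(binary: str, needed: dict) -> list:
    """Return object names so that dependencies precede their dependents.

    `needed` maps an object name to its DT_NEEDED entries, in file order.
    """
    order, seen = [], set()

    def visit(name: str) -> None:
        if name in seen:
            return  # already scheduled (this also breaks dependency cycles)
        seen.add(name)
        for dep in needed.get(name, []):
            visit(dep)
        order.append(name)  # emitted only after all of its dependencies

    visit(binary)
    return order


# Hypothetical dependency data for illustration.
NEEDED = {
    "main_binary": ["libssl.so.3"],
    "libssl.so.3": ["libcrypto.so.3", "libc.so.6"],
    "libcrypto.so.3": ["libc.so.6"],
}
print(init_order("main_binary", NEEDED))
```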

---

## 5. Patch Oracle Integration

### 5.1 Oracle Expected Roots

```json
{
  "expected_roots": [
    {
      "id": "root:init:*:sym:binary:*",
      "phase": "init",
      "source": "init_array",
      "required": true,
      "reason": "Init function must be detected for CVE-2023-XXXX"
    }
  ]
}
```

### 5.2 Oracle Forbidden Roots

```json
{
  "forbidden_roots": [
    {
      "id": "root:preinit:*:*",
      "phase": "preinit",
      "reason": "Preinit code should not exist after patch"
    }
  ]
}
```

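
A comparer for these oracle entries might treat `*` in the `id` as a shell-style wildcard. That interpretation is an assumption, as is everything else in this Python sketch (`check_oracle` is a hypothetical name and does not describe `PatchOracleComparer`).

```python
from fnmatch import fnmatchcase


def check_oracle(roots: list, expected_roots: list, forbidden_roots: list) -> list:
    """Return failure messages; an empty list means the oracle is satisfied."""

    def matches(root: dict, rule: dict) -> bool:
        # Assumption: `*` matches any run of characters, including ':'.
        if not fnmatchcase(root["id"], rule["id"]):
            return False
        # Optional rule fields must agree with the root when present.
        for field in ("phase", "source"):
            if field in rule and root.get(field) != rule[field]:
                return False
        return True

    failures = []
    for rule in expected_roots:
        if rule.get("required", False) and not any(matches(r, rule) for r in roots):
            failures.append(f"missing expected root: {rule['id']}")
    for rule in forbidden_roots:
        for root in roots:
            if matches(root, rule):
                failures.append(f"forbidden root present: {root['id']}")
    return failures
```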

---

## 6. Policy Integration

### 6.1 Reachability State with Init Roots

When evaluating reachability:

1. If vulnerable function is reachable from `main` → `REACHABLE`
2. If vulnerable function is reachable from `init` roots → `REACHABLE_INIT`
3. If vulnerable function is reachable only from `fini` → `REACHABLE_FINI`

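
Read as a precedence list, the three rules map the set of phases from which a function is reachable to a single state. The Python sketch below assumes exactly that precedence (`main`, then `init`, then `fini`) and returns `None` when no listed phase applies; the function name and the `None` result are assumptions, not part of this contract.

```python
from typing import Iterable, Optional


def reachability_state(reachable_phases: Iterable[str]) -> Optional[str]:
    """Map the phases that reach a vulnerable function to a state."""
    phases = set(reachable_phases)
    if "main" in phases:
        return "REACHABLE"
    if "init" in phases:
        return "REACHABLE_INIT"
    if "fini" in phases:
        return "REACHABLE_FINI"
    return None  # no listed phase reaches the function
```

Under this reading, `reachability_state(["init", "fini"])` gives `REACHABLE_INIT`, and only a function reached from `fini` alone gets `REACHABLE_FINI`.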

### 6.2 Policy DSL Extensions

```yaml
# Require init-phase reachability for not_affected
rules:
  - name: init-reachability-required
    condition: |
      vuln.phase_reachable.includes("init") and
      reachability.confidence >= 0.8
    action: require_evidence

  - name: init-only-lower-severity
    condition: |
      reachability.reachable_phases == ["init"] and
      not reachability.reachable_phases.includes("main")
    action: reduce_severity
    severity_adjustment: -1
```

---

## 7. Evidence Requirements

### 7.1 Init Root Evidence Bundle

```json
{
  "root_evidence": {
    "root_id": "root:init:0:sym:binary:...",
    "extraction_method": "dynamic_section",
    "source_offset": "0x1234",
    "target_address": "0x5678",
    "target_symbol": "frame_dummy",
    "evidence_hash": "sha256:...",
    "evidence_uri": "cas://binary/roots/sha256:..."
  }
}
```

### 7.2 CAS Storage Layout

```
cas://reachability/roots/{graph_hash}/
  init.json           # All init-phase roots
  fini.json           # All fini-phase roots
  dependencies.json   # DT_NEEDED graph
  evidence/
    root:{id}.json    # Per-root evidence
```

---

## 8. Determinism Rules

### 8.1 Root Ordering

Roots are sorted by:
1. Phase (numeric: load=0, preinit=1, init=2, main=3, fini=4)
2. Order within phase (numeric)
3. Target ID (string, ordinal)

### 8.2 Root ID Canonicalization

```
root_id = "root:" + phase + ":" + order + ":" + target_id
```

All components lowercase, no whitespace.

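
The ordering and canonicalization rules translate directly into a sort key and a small formatter. In the Python sketch below, `root_id` and `sort_roots` are hypothetical names, and rejecting (rather than rewriting) IDs that contain uppercase characters or whitespace is an assumption about how the rule might be enforced.

```python
# Numeric phase order from 8.1.
PHASE_ORDER = {"load": 0, "preinit": 1, "init": 2, "main": 3, "fini": 4}


def root_id(phase: str, order: int, target_id: str) -> str:
    """Build 'root:{phase}:{order}:{target_id}' and enforce the casing rule."""
    rid = f"root:{phase}:{order}:{target_id}"
    if rid != rid.lower() or any(ch.isspace() for ch in rid):
        raise ValueError(f"non-canonical root id: {rid!r}")
    return rid


def sort_roots(roots: list) -> list:
    """Sort by (phase, order, target_id); Python compares str by code point."""
    return sorted(
        roots,
        key=lambda r: (PHASE_ORDER[r["phase"]], r["order"], r["target_id"]),
    )
```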

---

## 9. Implementation Status

| Component | Location | Status |
|-----------|----------|--------|
| ELF init parser | `NativeCallgraphBuilder.cs` | Implemented |
| Root model | `NativeSyntheticRoot` | Implemented |
| richgraph-v1 roots | `RichGraph.cs` | Implemented |
| Patch oracle roots | `PatchOracleComparer.cs` | Implemented |
| Policy integration | - | Pending |
| DT_NEEDED graph | - | Pending |

---

## 10. Test Fixtures

Location: `tests/Binary/fixtures/init-roots/`

| Fixture | Description |
|---------|-------------|
| `elf-simple-init/` | Binary with single init function |
| `elf-init-array/` | Binary with multiple init_array entries |
| `elf-preinit/` | Binary with preinit_array |
| `elf-ctors/` | Binary with .ctors section |
| `elf-stripped-init/` | Stripped binary with init |
| `pe-dllmain/` | PE DLL with DllMain |
| `pe-tls-callbacks/` | PE with TLS callbacks |

---

## 11. Related Contracts

- [richgraph-v1](./richgraph-v1.md) - Root schema in graphs
- [Build-ID Propagation](./buildid-propagation.md) - Binary identification
- [Patch Oracles](../reachability/patch-oracles.md) - Oracle validation

---

## Changelog

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-12-13 | Scanner Guild | Initial contract for init-section roots |

317
docs/contracts/native-toolchain-decision.md
Normal file
@@ -0,0 +1,317 @@
# DECISION-NATIVE-TOOLCHAIN-401: Native Lifter and Demangler Selection

> **Status:** Published
> **Version:** 1.0.0
> **Published:** 2025-12-13
> **Owners:** Scanner Guild, Platform Guild
> **Unblocks:** SCANNER-NATIVE-401-015, SCAN-REACH-401-009

## Decision Summary

This document records the decisions for native binary analysis toolchain selection, enabling implementation of native symbol extraction, callgraph generation, and demangling for ELF/PE/Mach-O binaries.

---

## 1. Component Decisions

### 1.1 ELF Parser

**Decision:** Use custom pure-C# ELF parser

**Rationale:**
- No native dependencies, portable across platforms
- Already implemented in `StellaOps.Scanner.Analyzers.Native`
- Sufficient for symbol table, dynamic section, and relocation parsing
- Avoids licensing complexity of external libraries

**Implementation:** `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native/Internal/Elf/`

### 1.2 PE Parser

**Decision:** Use custom pure-C# PE parser

**Rationale:**
- No native dependencies
- Already implemented in `StellaOps.Scanner.Analyzers.Native`
- Handles import/export tables, Debug directory
- Compatible with air-gapped deployment

**Implementation:** `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native/Internal/Pe/`

### 1.3 Mach-O Parser

**Decision:** Use custom pure-C# Mach-O parser

**Rationale:**
- Consistent with ELF/PE approach
- No native dependencies
- Sufficient for symbol table and load commands

**Implementation:** `src/Scanner/__Libraries/StellaOps.Scanner.Analyzers.Native/Internal/MachO/`

### 1.4 Symbol Demangler

**Decision:** Use per-language managed demanglers with native fallback

| Language | Primary Demangler | Fallback |
|----------|-------------------|----------|
| C++ (Itanium ABI) | `Demangler.Net` (NuGet) | llvm-cxxfilt via P/Invoke |
| C++ (MSVC) | `UnDecorateSymbolName` wrapper | None (Windows-specific) |
| Rust | `rustc-demangle` port | rustfilt via P/Invoke |
| Swift | `swift-demangle` port | None |
| D | `dlang-demangler` port | None |

**Rationale:**
- Managed demanglers provide determinism and portability
- Native fallback only for edge cases
- No runtime dependency on external tools

**NuGet packages:**
```xml
<PackageReference Include="Demangler.Net" Version="1.0.0" />
```

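
Choosing a per-language demangler requires first guessing the mangling scheme from the symbol's prefix. The Python sketch below shows one plausible dispatch using well-known prefixes (`_Z` for the Itanium ABI, `?` for MSVC, `_R` for Rust v0, `$s` or `_T0` for Swift). It is an illustration, not the Scanner's logic: `detect_mangling` is a hypothetical name, the return values reuse the source names from the demangling contract in section 3.2, and D is left out because that table defines no source value for it.

```python
import re

# Legacy Rust symbols look like Itanium names ending in "17h<16 hex digits>E".
_RUST_LEGACY_HASH = re.compile(r"17h[0-9a-f]{16}E$")


def detect_mangling(symbol: str) -> str:
    """Guess the mangling scheme of a raw symbol name from its prefix."""
    if symbol.startswith("?"):
        return "msvc"
    if symbol.startswith("_R"):
        return "rust"  # Rust v0 mangling
    if symbol.startswith(("$s", "_$s", "$S", "_$S", "_T0")):
        return "swift"
    if symbol.startswith(("_Z", "__Z")):  # "__Z" covers the extra Mach-O underscore
        if _RUST_LEGACY_HASH.search(symbol):
            return "rust"  # legacy Rust mangling reuses the Itanium grammar
        return "itanium-abi"
    return "none"
```

With this dispatch, `_ZN4Curl7Session4readEv` is classified as `itanium-abi` and an unmangled name such as `main` as `none`.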

### 1.5 Disassembler (Optional, for heuristic analysis)

**Decision:** Use Iced (x86/x64) + Capstone.NET (ARM/others)

| Architecture | Library | NuGet Package |
|--------------|---------|---------------|
| x86/x64 | Iced | `Iced` |
| ARM/ARM64 | Capstone.NET | `Capstone.NET` |
| Other | Skip disassembly | N/A |

**Rationale:**
- Iced is pure managed, no native deps for x86
- Capstone.NET wraps Capstone with native lib
- Disassembly is optional for heuristic edge detection

### 1.6 Callgraph Extraction

**Decision:** Static analysis only (no dynamic execution)

**Methods:**
1. Relocation-based: Extract call targets from relocations
2. Import/Export: Map import references to exports
3. Symbol-based: Direct and indirect call targets from symbol table
4. CFG heuristics: Basic block boundary detection (x86 only)

**No dynamic analysis:** Avoids execution risks, portable.

---

## 2. CI Toolchain Requirements

### 2.1 Build Requirements

| Component | Requirement | Notes |
|-----------|-------------|-------|
| .NET SDK | 10.0+ | Required for all builds |
| Native libs (optional) | Capstone 4.0+ | Only for ARM disassembly |
| Test binaries | Pre-built fixtures | No compiler dependency in CI |

### 2.2 Test Fixture Strategy

**Decision:** Ship pre-built binary fixtures, not source + compiler

**Rationale:**
- Deterministic: Same binary hash every run
- No compiler dependency in CI
- Smaller CI image footprint
- Cross-platform: Same fixtures on all runners

**Fixture locations:**
```
tests/Binary/fixtures/
  elf-x86_64/
    binary.elf            # Pre-built
    expected.json         # Expected graph
    expected-hashes.txt   # Determinism check
  pe-x64/
    binary.exe
    expected.json
  macho-arm64/
    binary.dylib
    expected.json
```

### 2.3 Fixture Generation (Offline)

Fixtures are generated offline by maintainers:

```bash
# Generate ELF fixture (run once, commit result)
cd tools/fixtures
./generate-elf-fixture.sh

# Verify hashes match
./verify-fixtures.sh
```

---

## 3. Demangling Contract

### 3.1 Output Format

Demangled names follow this format:

```json
{
  "symbol": {
    "mangled": "_ZN4Curl7Session4readEv",
    "demangled": "Curl::Session::read()",
    "source": "itanium-abi",
    "confidence": 1.0
  }
}
```

### 3.2 Demangling Sources

| Source | Description | Confidence |
|--------|-------------|------------|
| `itanium-abi` | Itanium C++ ABI (GCC/Clang) | 1.0 |
| `msvc` | Microsoft Visual C++ | 1.0 |
| `rust` | Rust mangling | 1.0 |
| `swift` | Swift mangling | 1.0 |
| `fallback` | Native tool fallback | 0.9 |
| `heuristic` | Pattern-based guess | 0.6 |
| `none` | No demangling available | 0.3 |

### 3.3 Failed Demangling

When demangling fails:

```json
{
  "symbol": {
    "mangled": "_Z15unknown_format",
    "demangled": null,
    "source": "none",
    "confidence": 0.3,
    "demangling_error": "Unrecognized mangling scheme"
  }
}
```

---

## 4. Callgraph Edge Types

### 4.1 Edge Type Enumeration

| Type | Description | Confidence |
|------|-------------|------------|
| `call` | Direct call instruction | 1.0 |
| `plt` | PLT/GOT indirect call | 0.95 |
| `indirect` | Indirect call (vtable, function pointer) | 0.6 |
| `init_array` | From init_array to function | 1.0 |
| `tls_callback` | TLS callback invocation | 1.0 |
| `exception` | Exception handler target | 0.8 |
| `switch` | Switch table target | 0.7 |
| `heuristic` | CFG-based heuristic | 0.4 |

### 4.2 Unknown Targets

When a call target cannot be resolved:

```json
{
  "unknowns": [
    {
      "id": "unknown:call:0x12345678",
      "type": "unresolved_call_target",
      "source_id": "sym:binary:abc...",
      "call_site": "0x12345678",
      "reason": "Indirect call through register"
    }
  ]
}
```

---

## 5. Performance Constraints

### 5.1 Size Limits

| Metric | Limit | Action on Exceed |
|--------|-------|------------------|
| Binary size | 100 MB | Warn, proceed |
| Symbol count | 1M symbols | Chunk processing |
| Edge count | 10M edges | Chunk output |
| Memory usage | 4 GB | Stream processing |

### 5.2 Timeout Constraints

| Operation | Timeout | Action on Exceed |
|-----------|---------|------------------|
| ELF parse | 60s | Fail with partial results |
| Demangle all | 120s | Truncate results |
| CFG analysis | 300s | Skip heuristics |
| Total analysis | 600s | Fail gracefully |

---

## 6. Integration Points

### 6.1 Scanner Plugin Interface

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Contract for native (ELF/PE/Mach-O) analyzers exposed as Scanner plugins.
public interface INativeAnalyzer : IAnalyzerPlugin
{
    // Analyzes one binary stream and returns its native observation document.
    Task<NativeObservationDocument> AnalyzeAsync(
        Stream binaryStream,
        NativeAnalyzerOptions options,
        CancellationToken ct);
}
```

### 6.2 RichGraph Integration

Native analysis results feed into RichGraph:

```
NativeObservation → NativeReachabilityGraph → RichGraph nodes/edges
```

### 6.3 Signals Integration

Native symbols with runtime hits:

```
Signals runtime-facts + RichGraph → ReachabilityFact with confidence
```

---

## 7. Implementation Checklist

| Task | Status | Owner |
|------|--------|-------|
| ELF parser | Done | Scanner Guild |
| PE parser | Done | Scanner Guild |
| Mach-O parser | In Progress | Scanner Guild |
| C++ demangler | Done | Scanner Guild |
| Rust demangler | Pending | Scanner Guild |
| Callgraph builder | Done | Scanner Guild |
| Test fixtures | Partial | QA Guild |
| CI integration | Pending | DevOps Guild |

---

## 8. Related Documents

- [richgraph-v1 Contract](./richgraph-v1.md)
- [Build-ID Propagation](./buildid-propagation.md)
- [Init-Section Roots](./init-section-roots.md)
- [Binary Reachability Schema](../reachability/binary-reachability-schema.md)

---

## Changelog

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-12-13 | Platform Guild | Initial toolchain decision |