StellaOps Bot
2025-12-13 02:22:15 +02:00
parent 564df71bfb
commit 999e26a48e
395 changed files with 25045 additions and 2224 deletions

## Open Questions
- Final DSSE payload shape (Signals team): currently assumed to be `graph.bundle` with edges, symbols, and metadata.
- Whether to include DWARF debug-line (`.debug_line`) info for coverage (could be added as an optional module later).
---
## 8. Native Schema Alignment with richgraph-v1 (Sprint 0401)
Native callgraph output must conform to `richgraph-v1` (see `docs/contracts/richgraph-v1.md`). This section defines the native-specific mappings.
### 8.1 NativeFunction Node Schema
Maps ELF/PE/Mach-O symbols to richgraph-v1 nodes:
```json
{
"id": "sym:binary:...",
"symbol_id": "sym:binary:base64url(sha256(tuple))",
"lang": "binary",
"kind": "function",
"display": "ssl3_read_bytes",
"code_id": "code:binary:base64url(...)",
"code_block_hash": "sha256:deadbeef...",
"symbol": {
"mangled": "_Z15ssl3_read_bytesP6ssl_stPviijPi",
"demangled": "ssl3_read_bytes(ssl_st*, void*, int, int, int, int*)",
"source": "DWARF",
"confidence": 0.98
},
"purl": "pkg:deb/ubuntu/openssl@3.0.2?arch=amd64",
"build_id": "gnu-build-id:a1b2c3d4e5f6...",
"symbol_digest": "sha256:...",
"evidence": ["dynsym", "dwarf"],
"attributes": {
"section": ".text",
"address": "0x401000",
"size": 256,
"binding": "global",
"visibility": "default",
"elf_type": "STT_FUNC"
}
}
```
### 8.2 SymbolID Construction for Native
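Canonical tuple (NUL-separated, per `richgraph-v1` §SymbolID). Below, `\0` is a single NUL byte, so `\00x401000` is a separator followed by the address `0x401000`; an absent optional field is an empty string: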
```
binary:
{file_hash}\0{section}\0{addr}\0{name}\0{linkage}\0{code_block_hash?}
Examples:
sym:binary:base64url(sha256("sha256:abc...\0.text\00x401000\0ssl3_read_bytes\0global\0"))
sym:binary:base64url(sha256("sha256:abc...\0.text\00x401000\0\0local\0sha256:deadbeef")) # stripped
```
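
The tuple rule can be sketched in a few lines of Python. This is a minimal illustration, not the scanner's implementation: it assumes UTF-8 encoding of the tuple and unpadded base64url (RFC 4648 §5 with `=` stripped), neither of which this section pins down, and the helper names `b64url_nopad` and `native_symbol_id` are hypothetical.

```python
import base64
import hashlib
from typing import Optional

SYMBOL_PREFIX = "sym:binary:"


def b64url_nopad(digest: bytes) -> str:
    # Assumption: unpadded base64url (RFC 4648 section 5, '=' stripped).
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def native_symbol_id(file_hash: str, section: str, addr: str, name: str,
                     linkage: str, code_block_hash: Optional[str] = None) -> str:
    """SymbolID = sym:binary: + base64url(sha256(NUL-joined canonical tuple))."""
    # Absent fields (a stripped name, a missing code_block_hash) stay as empty
    # strings, so the tuple always has six fields and five NUL separators.
    fields = [file_hash, section, addr, name, linkage, code_block_hash or ""]
    tuple_bytes = "\0".join(fields).encode("utf-8")  # assumption: UTF-8
    return SYMBOL_PREFIX + b64url_nopad(hashlib.sha256(tuple_bytes).digest())


# The two examples above ("sha256:abc..." is a placeholder file hash).
named = native_symbol_id("sha256:abc...", ".text", "0x401000", "ssl3_read_bytes", "global")
stripped = native_symbol_id("sha256:abc...", ".text", "0x401000", "", "local", "sha256:deadbeef")
print(named)
print(stripped)
```

Building the tuple with `join` rather than string concatenation keeps the separator count fixed, which is what makes the stripped and named forms hash differently even at the same address.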
### 8.3 NativeCallEdge Schema
Maps PLT/GOT/relocation-based calls to richgraph-v1 edges:
```json
{
"from": "sym:binary:...",
"to": "sym:binary:...",
"kind": "call",
"purl": "pkg:deb/ubuntu/openssl@3.0.2?arch=amd64",
"symbol_digest": "sha256:...",
"confidence": 0.85,
"evidence": ["plt", "got", "reloc"],
"candidates": [],
"attributes": {
"reloc_type": "R_X86_64_PLT32",
"got_offset": "0x602020",
"plt_index": 42
}
}
```
### 8.4 Edge Kind Mapping
| Native Call Type | richgraph-v1 `kind` | Confidence | Evidence |
|------------------|---------------------|------------|----------|
| Direct call (resolved) | `call` | 1.0 | `["disasm"]` |
| PLT call (resolved) | `call` | 0.95 | `["plt", "got"]` |
| PLT call (unresolved) | `indirect` | 0.5 | `["plt"]` + `candidates[]` |
| GOT indirect | `indirect` | 0.6 | `["got", "reloc"]` |
| Function pointer | `indirect` | 0.3 | `["disasm", "heuristic"]` |
| Init array entry | `init` | 1.0 | `["init_array"]` |
| TLS constructor | `init` | 1.0 | `["tls_init"]` |
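
One way to keep emitters consistent with the table is to encode it once as data and build edges from it. The sketch below does that in Python; the dictionary keys (`plt_resolved` and so on) and the `native_edge` helper are hypothetical names, while the kind, confidence, and evidence values are copied from the table.

```python
from typing import Dict, Optional, Sequence, Tuple

# (kind, confidence, evidence) per native call type, copied from the table above.
# The dictionary keys are hypothetical labels for the table's rows.
EDGE_RULES: Dict[str, Tuple[str, float, Tuple[str, ...]]] = {
    "direct_resolved": ("call", 1.0, ("disasm",)),
    "plt_resolved": ("call", 0.95, ("plt", "got")),
    "plt_unresolved": ("indirect", 0.5, ("plt",)),
    "got_indirect": ("indirect", 0.6, ("got", "reloc")),
    "function_pointer": ("indirect", 0.3, ("disasm", "heuristic")),
    "init_array": ("init", 1.0, ("init_array",)),
    "tls_constructor": ("init", 1.0, ("tls_init",)),
}


def native_edge(call_type: str, src: str, dst: str,
                candidates: Optional[Sequence[str]] = None) -> dict:
    """Build a richgraph-v1 edge whose kind/confidence/evidence follow the table."""
    kind, confidence, evidence = EDGE_RULES[call_type]
    return {
        "from": src,
        "to": dst,
        "kind": kind,
        "confidence": confidence,
        "evidence": list(evidence),
        # The table pairs candidates[] with unresolved PLT calls;
        # other rows normally leave it empty.
        "candidates": list(candidates or []),
    }


edge = native_edge("plt_resolved", "sym:binary:caller", "sym:binary:callee")
print(edge)
```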
### 8.5 Native Root Nodes
Synthetic roots for native entry points:
```json
{
"roots": [
{
"id": "sym:binary:..._start",
"phase": "load",
"source": "e_entry"
},
{
"id": "sym:binary:...main",
"phase": "runtime",
"source": "symbol"
},
{
"id": "init:binary:0x401000",
"phase": "init",
"source": "DT_INIT_ARRAY[0]"
},
{
"id": "init:binary:0x401020",
"phase": "init",
"source": ".ctors[0]"
}
]
}
```
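
A minimal sketch of how the roots above might be assembled, assuming the entry and `main` SymbolIDs are resolved elsewhere and that `init_array` and `ctors` are function addresses already read from `DT_INIT_ARRAY` and `.ctors`. The function name and its parameters are hypothetical; the `id`, `phase`, and `source` values follow the example.

```python
from typing import Optional, Sequence


def native_roots(entry_id: str, main_id: Optional[str],
                 init_array: Sequence[int], ctors: Sequence[int]) -> dict:
    """Synthesize richgraph-v1 roots for a native binary (hypothetical inputs)."""
    roots = [{"id": entry_id, "phase": "load", "source": "e_entry"}]
    if main_id is not None:  # e.g. a stripped binary may not expose main
        roots.append({"id": main_id, "phase": "runtime", "source": "symbol"})
    for i, addr in enumerate(init_array):
        # f"{addr:#x}" renders 0x401000 as "0x401000"
        roots.append({"id": f"init:binary:{addr:#x}", "phase": "init",
                      "source": f"DT_INIT_ARRAY[{i}]"})
    for i, addr in enumerate(ctors):
        roots.append({"id": f"init:binary:{addr:#x}", "phase": "init",
                      "source": f".ctors[{i}]"})
    return {"roots": roots}


example = native_roots("sym:binary:..._start", "sym:binary:...main",
                       init_array=[0x401000], ctors=[0x401020])
print(example)
```

With these inputs the result has the same shape as the JSON example above.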
### 8.6 Build ID and Code ID Handling
| Source | build_id format | code_id fallback |
|--------|-----------------|------------------|
| ELF `.note.gnu.build-id` | `gnu-build-id:{hex}` | N/A |
| PE Debug Directory | `pdb-guid:{guid}:{age}` | N/A |
| Mach-O `LC_UUID` | `macho-uuid:{uuid}` | N/A |
| Missing build-id | `null` | derived from `sha256:{file_hash}` |

When build-id is missing:
1. Set `build_id` to null
2. Set `code_id` using the file hash: `code:binary:base64url(sha256("{file_hash}\0{section}\0{addr}\0{size}"))`
3. Add `"build_id_source": "FileHash"` to `attributes`
4. Emit a `U1` uncertainty state with entropy based on the percentage of symbols missing a build-id
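
The table and the fallback steps can be sketched as small helpers. This is illustrative only. The function names are hypothetical. Rendering the ELF note descriptor as lowercase hex, using decimal `{size}`, and encoding as unpadded base64url are assumptions. PE GUID and Mach-O UUID strings are passed through unchanged because this section does not fix their formatting. The last helper computes only the fraction that step 4 bases the `U1` entropy on, not the entropy itself.

```python
import base64
import hashlib
from typing import Iterable, Mapping


def b64url_nopad(digest: bytes) -> str:
    # Assumption: unpadded base64url, as for SymbolIDs.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def gnu_build_id(note_desc: bytes) -> str:
    # ELF .note.gnu.build-id descriptor bytes as lowercase hex (assumption).
    return "gnu-build-id:" + note_desc.hex()


def pe_build_id(guid: str, age: int) -> str:
    # GUID string formatting (case, braces) is left to the caller.
    return f"pdb-guid:{guid}:{age}"


def macho_build_id(uuid: str) -> str:
    return f"macho-uuid:{uuid}"


def fallback_code_id(file_hash: str, section: str, addr: str, size: int) -> str:
    # Step 2: code_id from the file hash when build_id is null.
    # Assumption: size is rendered as a decimal string.
    tuple_bytes = "\0".join([file_hash, section, addr, str(size)]).encode("utf-8")
    return "code:binary:" + b64url_nopad(hashlib.sha256(tuple_bytes).digest())


def missing_build_id_ratio(nodes: Iterable[Mapping]) -> float:
    # Step 4 input: fraction of symbol nodes whose build_id is null.
    node_list = list(nodes)
    if not node_list:
        return 0.0
    return sum(1 for n in node_list if n.get("build_id") is None) / len(node_list)


print(fallback_code_id("sha256:abc...", ".text", "0x401000", 256))
```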
### 8.7 Stripped Binary Handling
For stripped binaries without symbol names:
1. **Synthetic name:** `sub_{address}` (e.g., `sub_401000`)
2. **Code block hash:** SHA-256 of function bytes (`sha256:{hex}`)
3. **Confidence:** 0.4 (heuristic function boundary detection)
4. **Evidence:** `["heuristic", "cfg"]`

Example node:
```json
{
"id": "sym:binary:...",
"symbol_id": "sym:binary:...",
"lang": "binary",
"kind": "function",
"display": "sub_401000",
"code_id": "code:binary:...",
"code_block_hash": "sha256:deadbeef...",
"symbol": {
"mangled": null,
"demangled": null,
"source": "NONE",
"confidence": 0.4
},
"evidence": ["heuristic", "cfg"]
}
```
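
Putting the four rules together, a stripped-function node might be synthesized as below. This is a sketch under stated assumptions. `code_id` is taken as an input computed elsewhere. `linkage` defaults to `"local"` to match the stripped SymbolID example in 8.2. The prologue bytes are purely illustrative. `stripped_function_node` is a hypothetical name.

```python
import base64
import hashlib


def _b64url_nopad(digest: bytes) -> str:
    # Assumption: unpadded base64url, as for SymbolIDs.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def stripped_function_node(file_hash: str, section: str, address: int,
                           body: bytes, code_id: str, linkage: str = "local") -> dict:
    """Node for a function found by heuristic boundary detection (no symbol name)."""
    code_block_hash = "sha256:" + hashlib.sha256(body).hexdigest()
    # SymbolID tuple with an empty name field, as in the stripped example in 8.2.
    tuple_bytes = "\0".join([file_hash, section, f"{address:#x}", "", linkage,
                             code_block_hash]).encode("utf-8")
    symbol_id = "sym:binary:" + _b64url_nopad(hashlib.sha256(tuple_bytes).digest())
    return {
        "id": symbol_id,
        "symbol_id": symbol_id,
        "lang": "binary",
        "kind": "function",
        "display": f"sub_{address:x}",  # synthetic name, e.g. sub_401000
        "code_id": code_id,  # hypothetical input, computed elsewhere
        "code_block_hash": code_block_hash,
        "symbol": {"mangled": None, "demangled": None, "source": "NONE",
                   "confidence": 0.4},
        "evidence": ["heuristic", "cfg"],
    }


# Illustrative bytes: push rbp; mov rbp, rsp (an x86-64 prologue).
node = stripped_function_node("sha256:abc...", ".text", 0x401000,
                              b"\x55\x48\x89\xe5", code_id="code:binary:...")
print(node["display"], node["code_block_hash"])
```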
### 8.8 Unknown Edge Targets
When a call target cannot be resolved:
1. Create a synthetic target node with `"kind": "unknown"`
2. If several targets are possible, list them in the edge's `candidates[]`
3. Emit the edge with low confidence (0.3)
4. Register the target in the Unknowns registry
```json
{
"from": "sym:binary:...caller",
"to": "unknown:binary:plt_42",
"kind": "indirect",
"confidence": 0.3,
"candidates": [
"pkg:deb/ubuntu/libssl@3.0.2",
"pkg:deb/ubuntu/libcrypto@3.0.2"
],
"evidence": ["plt", "unresolved"]
}
```
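
The four steps can be sketched for the unresolved-PLT case in the example. The Unknowns registry is modeled here as a plain in-memory dict, which is a stand-in rather than the real registry API. `unresolved_plt_edge` is a hypothetical helper. The 0.3 confidence is the one given in step 3.

```python
from typing import Dict, Sequence, Tuple


def unresolved_plt_edge(caller_id: str, plt_index: int,
                        candidate_purls: Sequence[str],
                        unknowns: Dict[str, dict]) -> Tuple[dict, dict]:
    """Apply steps 1-4 to a PLT slot whose target could not be resolved."""
    target_id = f"unknown:binary:plt_{plt_index}"
    node = {"id": target_id, "kind": "unknown"}  # step 1: synthetic target
    # Step 2: candidates are listed only when there are several possibilities.
    candidates = list(candidate_purls) if len(candidate_purls) > 1 else []
    edge = {
        "from": caller_id,
        "to": target_id,
        "kind": "indirect",
        "confidence": 0.3,  # step 3
        "candidates": candidates,
        "evidence": ["plt", "unresolved"],
    }
    unknowns.setdefault(target_id, node)  # step 4, stand-in registry
    return node, edge


registry: Dict[str, dict] = {}
node, edge = unresolved_plt_edge(
    "sym:binary:...caller", 42,
    ["pkg:deb/ubuntu/libssl@3.0.2", "pkg:deb/ubuntu/libcrypto@3.0.2"],
    registry)
print(edge)
```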
### 8.9 DSSE Bundle for Native Graphs
Per-layer DSSE bundle structure:
```json
{
"payloadType": "application/vnd.stellaops.graph+json",
"payload": "<base64(canonical_graph_json)>",
"signatures": [
{
"keyid": "stellaops:scanner:native:v1",
"sig": "<base64(signature)>"
}
]
}
```
Subject path: `cas://reachability/graphs/{blake3}`
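
Under DSSE, the signature covers the Pre-Authentication Encoding (PAE) of `payloadType` and the payload bytes, not the base64 payload alone. The Python sketch below builds such an envelope. It assumes sorted-key, compact JSON stands in for the canonical graph form, which is an approximation and not full RFC 8785. The `sign` callable is a placeholder for the scanner's real signer. The BLAKE3 digest for the CAS subject path is computed separately and is not shown here.

```python
import base64
import hashlib
import json
from typing import Callable

PAYLOAD_TYPE = "application/vnd.stellaops.graph+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 Pre-Authentication Encoding: the exact bytes that are signed."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(payload), payload)


def canonical_graph_bytes(graph: dict) -> bytes:
    # Assumption: sorted keys and compact separators approximate the
    # richgraph-v1 canonical form; not a full RFC 8785 (JCS) implementation.
    return json.dumps(graph, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def dsse_envelope(graph: dict, keyid: str, sign: Callable[[bytes], bytes]) -> dict:
    payload = canonical_graph_bytes(graph)
    sig = sign(pae(PAYLOAD_TYPE, payload))  # sign the PAE bytes, not the raw payload
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }


def placeholder_sign(message: bytes) -> bytes:
    # Stand-in so the sketch runs: a digest, NOT a signature.
    return hashlib.sha256(message).digest()


envelope = dsse_envelope({"nodes": [], "edges": []}, "stellaops:scanner:native:v1",
                         placeholder_sign)
print(json.dumps(envelope, indent=2))
```

Binding the payload type into the signed bytes prevents a signature on one payload type from being replayed as a different type.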
### 8.10 Implementation Checklist
- [ ] `NativeFunctionNode` maps to `richgraph-v1` node schema
- [ ] `NativeCallEdge` maps to `richgraph-v1` edge schema
- [ ] SymbolID uses `sym:binary:` prefix with canonical tuple
- [ ] CodeID uses `code:binary:` prefix for stripped symbols
- [ ] Graph hash uses BLAKE3-256 (`blake3:{hex}`)
- [ ] Symbol digest uses SHA-256 (`sha256:{hex}`)
- [ ] Init array roots use `phase: "init"`
- [ ] Missing build-id triggers U1 uncertainty
- [ ] DSSE envelope per layer with `stellaops:scanner:native:v1` key
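
Several checklist items are mechanical prefix and format rules, so a test harness could check them directly. The Python sketch below covers the SymbolID, CodeID, symbol digest, graph hash, and init-root items. The helper names are hypothetical. Requiring exactly 64 lowercase hex digits after `sha256:` and `blake3:` is an assumption, since the checklist only says `{hex}`.

```python
import re
from typing import List

SHA256_RE = re.compile(r"^sha256:[0-9a-f]{64}$")  # assumption: lowercase hex
BLAKE3_RE = re.compile(r"^blake3:[0-9a-f]{64}$")  # BLAKE3-256 -> 64 hex digits


def check_node(node: dict) -> List[str]:
    """Return checklist violations for one native node (empty list if none)."""
    problems = []
    if not str(node.get("symbol_id") or "").startswith("sym:binary:"):
        problems.append("symbol_id must use the sym:binary: prefix")
    code_id = node.get("code_id")
    if code_id is not None and not code_id.startswith("code:binary:"):
        problems.append("code_id must use the code:binary: prefix")
    digest = node.get("symbol_digest")
    if digest is not None and not SHA256_RE.match(digest):
        problems.append("symbol_digest must be sha256:{64 hex digits}")
    return problems


def check_graph_hash(graph_hash: str) -> bool:
    return BLAKE3_RE.match(graph_hash) is not None


def check_roots(roots: List[dict]) -> List[str]:
    # Init-array style roots (init:binary: ids) must carry phase "init".
    return [f"root {r['id']} must use phase 'init'" for r in roots
            if r["id"].startswith("init:binary:") and r.get("phase") != "init"]


print(check_node({"symbol_id": "sym:java:Foo"}))
```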