up
@@ -39,4 +39,220 @@
## Open Questions
- Final DSSE payload shape (Signals team): currently assumed `graph.bundle` with edges, symbols, metadata.
- Whether to include debugline info for coverage (could add optional module later).

---

## 8. Native Schema Alignment with richgraph-v1 (Sprint 0401)

Native callgraph output must conform to `richgraph-v1` (see `docs/contracts/richgraph-v1.md`). This section defines the native-specific mappings.

### 8.1 NativeFunction Node Schema

Maps ELF/PE/Mach-O symbols to richgraph-v1 nodes:

```json
{
  "id": "sym:binary:...",
  "symbol_id": "sym:binary:base64url(sha256(tuple))",
  "lang": "binary",
  "kind": "function",
  "display": "ssl3_read_bytes",
  "code_id": "code:binary:base64url(...)",
  "code_block_hash": "sha256:deadbeef...",
  "symbol": {
    "mangled": "_Z15ssl3_read_bytesP6ssl_stPviijPi",
    "demangled": "ssl3_read_bytes(ssl_st*, void*, int, int, unsigned int, int*)",
    "source": "DWARF",
    "confidence": 0.98
  },
  "purl": "pkg:deb/ubuntu/openssl@3.0.2?arch=amd64",
  "build_id": "gnu-build-id:a1b2c3d4e5f6...",
  "symbol_digest": "sha256:...",
  "evidence": ["dynsym", "dwarf"],
  "attributes": {
    "section": ".text",
    "address": "0x401000",
    "size": 256,
    "binding": "global",
    "visibility": "default",
    "elf_type": "STT_FUNC"
  }
}
```

### 8.2 SymbolID Construction for Native

Canonical tuple (NUL-separated, per `richgraph-v1` §SymbolID):

```
binary:
  {file_hash}\0{section}\0{addr}\0{name}\0{linkage}\0{code_block_hash?}

Examples:
  sym:binary:base64url(sha256("sha256:abc...\0.text\00x401000\0ssl3_read_bytes\0global\0"))
  sym:binary:base64url(sha256("sha256:abc...\0.text\00x401000\0\0local\0sha256:deadbeef"))  # stripped
```

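The tuple construction above can be sketched in a few lines of Python. This is a minimal illustration, not the scanner's implementation: the function name `native_symbol_id` is hypothetical, the `binary:` label is assumed not to be part of the hashed bytes, and unpadded base64url is an assumption that `richgraph-v1` may override.

```python
import base64
import hashlib


def native_symbol_id(file_hash, section, addr, name, linkage, code_block_hash=""):
    """Build a sym:binary: identifier from the NUL-separated canonical tuple."""
    # Field order follows the tuple above; the optional code_block_hash
    # slot stays empty when no hash is supplied.
    fields = [file_hash, section, addr, name, linkage, code_block_hash]
    canonical = "\0".join(fields).encode("utf-8")
    digest = hashlib.sha256(canonical).digest()
    # Assumption: unpadded base64url; the contract may specify padding.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "sym:binary:" + encoded


# Named symbol: the trailing NUL comes from the empty code_block_hash slot.
named = native_symbol_id("sha256:abc", ".text", "0x401000", "ssl3_read_bytes", "global")
# Stripped symbol: empty name, identity carried by the code block hash.
stripped = native_symbol_id("sha256:abc", ".text", "0x401000", "", "local", "sha256:deadbeef")
```

Because the name slot is empty for stripped symbols, the code block hash is what keeps two stripped functions at different addresses or with different bodies from colliding.
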
### 8.3 NativeCallEdge Schema

Maps PLT/GOT/relocation-based calls to richgraph-v1 edges:

```json
{
  "from": "sym:binary:...",
  "to": "sym:binary:...",
  "kind": "call",
  "purl": "pkg:deb/ubuntu/openssl@3.0.2?arch=amd64",
  "symbol_digest": "sha256:...",
  "confidence": 0.85,
  "evidence": ["plt", "got", "reloc"],
  "candidates": [],
  "attributes": {
    "reloc_type": "R_X86_64_PLT32",
    "got_offset": "0x602020",
    "plt_index": 42
  }
}
```

### 8.4 Edge Kind Mapping

| Native Call Type | richgraph-v1 `kind` | Confidence | Evidence |
|------------------|---------------------|------------|----------|
| Direct call (resolved) | `call` | 1.0 | `["disasm"]` |
| PLT call (resolved) | `call` | 0.95 | `["plt", "got"]` |
| PLT call (unresolved) | `indirect` | 0.5 | `["plt"]` + `candidates[]` |
| GOT indirect | `indirect` | 0.6 | `["got", "reloc"]` |
| Function pointer | `indirect` | 0.3 | `["disasm", "heuristic"]` |
| Init array entry | `init` | 1.0 | `["init_array"]` |
| TLS constructor | `init` | 1.0 | `["tls_init"]` |

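One way to keep emitted edges consistent with the table is to drive them from a single lookup. The sketch below assumes hypothetical row labels (`plt_resolved`, `got_indirect`, and so on) and a hypothetical `make_edge` helper; only the `kind`, confidence, and evidence values come from the table.

```python
from typing import NamedTuple


class EdgeDefaults(NamedTuple):
    kind: str
    confidence: float
    evidence: tuple


# Keys are hypothetical labels for the rows of the table above.
EDGE_DEFAULTS = {
    "direct_resolved": EdgeDefaults("call", 1.0, ("disasm",)),
    "plt_resolved": EdgeDefaults("call", 0.95, ("plt", "got")),
    "plt_unresolved": EdgeDefaults("indirect", 0.5, ("plt",)),
    "got_indirect": EdgeDefaults("indirect", 0.6, ("got", "reloc")),
    "function_pointer": EdgeDefaults("indirect", 0.3, ("disasm", "heuristic")),
    "init_array": EdgeDefaults("init", 1.0, ("init_array",)),
    "tls_constructor": EdgeDefaults("init", 1.0, ("tls_init",)),
}


def make_edge(call_type, src, dst, candidates=()):
    """Return an edge dict populated with the table's defaults."""
    defaults = EDGE_DEFAULTS[call_type]  # raises KeyError for unmapped types
    return {
        "from": src,
        "to": dst,
        "kind": defaults.kind,
        "confidence": defaults.confidence,
        "evidence": list(defaults.evidence),
        "candidates": list(candidates),
    }


edge = make_edge("plt_resolved", "sym:binary:caller", "sym:binary:callee")
```

Failing loudly on an unmapped call type is a deliberate choice here: a silent default would let edges with an unspecified confidence reach the graph.
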
### 8.5 Native Root Nodes

Synthetic roots for native entry points:

```json
{
  "roots": [
    {
      "id": "sym:binary:..._start",
      "phase": "load",
      "source": "e_entry"
    },
    {
      "id": "sym:binary:...main",
      "phase": "runtime",
      "source": "symbol"
    },
    {
      "id": "init:binary:0x401000",
      "phase": "init",
      "source": "DT_INIT_ARRAY[0]"
    },
    {
      "id": "init:binary:0x401020",
      "phase": "init",
      "source": ".ctors[0]"
    }
  ]
}
```

### 8.6 Build ID and Code ID Handling

| Source | `build_id` format | `code_id` fallback |
|--------|-------------------|--------------------|
| ELF `.note.gnu.build-id` | `gnu-build-id:{hex}` | N/A |
| PE Debug Directory | `pdb-guid:{guid}:{age}` | N/A |
| Mach-O `LC_UUID` | `macho-uuid:{uuid}` | N/A |
| Missing build-id | None | `sha256:{file_hash}` |

When the build-id is missing:
1. Set `build_id` to null.
2. Derive `code_id` from the file hash: `code:binary:base64url(sha256("{file_hash}\0{section}\0{addr}\0{size}"))`.
3. Add `"build_id_source": "FileHash"` to `attributes`.
4. Emit a `U1` uncertainty state whose entropy is based on the percentage of symbols missing a build-id.

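The fallback steps above might look like the following sketch. The helper names `apply_build_id_fallback` and `missing_build_id_ratio` are hypothetical, the size is assumed to be serialized as a decimal string, and unpadded base64url is assumed. How the missing ratio maps onto the `U1` entropy value is not defined in this section, so the sketch only computes the ratio.

```python
import base64
import hashlib


def apply_build_id_fallback(node, file_hash):
    """Apply steps 1-3 to a node dict that has no build-id; returns a new dict."""
    attrs = dict(node.get("attributes", {}))
    # Assumption: size is serialized in decimal inside the hashed tuple.
    fields = [file_hash, attrs["section"], attrs["address"], str(attrs["size"])]
    digest = hashlib.sha256("\0".join(fields).encode("utf-8")).digest()
    # Assumption: unpadded base64url encoding.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    attrs["build_id_source"] = "FileHash"
    updated = dict(node)
    updated["build_id"] = None
    updated["code_id"] = "code:binary:" + encoded
    updated["attributes"] = attrs
    return updated


def missing_build_id_ratio(nodes):
    """Fraction of nodes without a build_id, a possible input to the U1 entropy."""
    if not nodes:
        return 0.0
    missing = sum(1 for n in nodes if n.get("build_id") is None)
    return missing / len(nodes)


original = {"attributes": {"section": ".text", "address": "0x401000", "size": 256}}
fallback = apply_build_id_fallback(original, "sha256:abc")
ratio = missing_build_id_ratio([fallback, {"build_id": "gnu-build-id:a1b2"}])
```
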
### 8.7 Stripped Binary Handling

For stripped binaries without symbol names:

1. **Synthetic name:** `sub_{address}` (e.g., `sub_401000`)
2. **Code block hash:** SHA-256 of function bytes (`sha256:{hex}`)
3. **Confidence:** 0.4 (heuristic function boundary detection)
4. **Evidence:** `["heuristic", "cfg"]`

Example node:
```json
{
  "id": "sym:binary:...",
  "symbol_id": "sym:binary:...",
  "lang": "binary",
  "kind": "function",
  "display": "sub_401000",
  "code_id": "code:binary:...",
  "code_block_hash": "sha256:deadbeef...",
  "symbol": {
    "mangled": null,
    "demangled": null,
    "source": "NONE",
    "confidence": 0.4
  },
  "evidence": ["heuristic", "cfg"]
}
```

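As a rough sketch of the four rules above, the following hypothetical helper `stripped_function_node` assembles such a node from a function's address and raw bytes. It assumes the synthetic name uses lowercase hex without a `0x` prefix (as in `sub_401000`) and takes the already-computed `symbol_id` and `code_id` as inputs.

```python
import hashlib


def stripped_function_node(address, code_bytes, symbol_id, code_id):
    """Build a node for a function recovered from a stripped binary."""
    # Assumption: lowercase hex without the 0x prefix, matching sub_401000.
    display = "sub_{:x}".format(address)
    block_hash = "sha256:" + hashlib.sha256(code_bytes).hexdigest()
    return {
        "id": symbol_id,
        "symbol_id": symbol_id,
        "lang": "binary",
        "kind": "function",
        "display": display,
        "code_id": code_id,
        "code_block_hash": block_hash,
        "symbol": {
            "mangled": None,
            "demangled": None,
            "source": "NONE",
            "confidence": 0.4,  # heuristic function boundary detection
        },
        "evidence": ["heuristic", "cfg"],
    }


# Illustrative bytes only: push rbp; mov rbp, rsp; ret.
node = stripped_function_node(0x401000, b"\x55\x48\x89\xe5\xc3", "sym:binary:...", "code:binary:...")
```
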
### 8.8 Unknown Edge Targets

When a call target cannot be resolved:

1. Create a synthetic target node with `"kind": "unknown"`.
2. If there are multiple possible targets, list them in the edge's `candidates[]`.
3. Emit the edge with low confidence (0.3).
4. Register the target in the Unknowns registry.

```json
{
  "from": "sym:binary:...caller",
  "to": "unknown:binary:plt_42",
  "kind": "indirect",
  "confidence": 0.3,
  "candidates": [
    "pkg:deb/ubuntu/libssl@3.0.2",
    "pkg:deb/ubuntu/libcrypto@3.0.2"
  ],
  "evidence": ["plt", "unresolved"]
}
```

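The four steps above can be sketched as a single helper. Everything named here is illustrative: `unresolved_edge` is a hypothetical function, the `unknown:binary:plt_{index}` target format is inferred from the example, and a plain list stands in for the Unknowns registry, whose real interface is not described in this section.

```python
def unresolved_edge(caller_id, plt_index, candidates, unknowns_registry):
    """Build the synthetic target node and low-confidence edge for an unresolved call."""
    # Assumption: target IDs follow the unknown:binary:plt_{index} pattern.
    target_id = "unknown:binary:plt_{}".format(plt_index)
    target_node = {"id": target_id, "kind": "unknown"}
    edge = {
        "from": caller_id,
        "to": target_id,
        "kind": "indirect",
        "confidence": 0.3,
        "candidates": list(candidates),
        "evidence": ["plt", "unresolved"],
    }
    # Hypothetical registry: a plain list standing in for the Unknowns registry.
    unknowns_registry.append({"target": target_id, "from": caller_id})
    return target_node, edge


registry = []
target, edge = unresolved_edge(
    "sym:binary:...caller",
    42,
    ["pkg:deb/ubuntu/libssl@3.0.2", "pkg:deb/ubuntu/libcrypto@3.0.2"],
    registry,
)
```
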
### 8.9 DSSE Bundle for Native Graphs

Per-layer DSSE bundle structure:

```json
{
  "payloadType": "application/vnd.stellaops.graph+json",
  "payload": "<base64(canonical_graph_json)>",
  "signatures": [
    {
      "keyid": "stellaops:scanner:native:v1",
      "sig": "<base64(signature)>"
    }
  ]
}
```

Subject path: `cas://reachability/graphs/{blake3}`

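For orientation, here is a minimal sketch of assembling such an envelope. The pre-authentication encoding (`pae`) follows the DSSE specification; the rest is assumption: "canonical JSON" is taken to mean sorted keys with compact separators, `build_envelope` and `demo_sign` are hypothetical names, and the HMAC signer is only a placeholder for whatever key the scanner actually uses. Computing the BLAKE3 subject path is omitted because it needs a third-party library.

```python
import base64
import hashlib
import hmac
import json

PAYLOAD_TYPE = "application/vnd.stellaops.graph+json"
KEY_ID = "stellaops:scanner:native:v1"


def pae(payload_type, payload):
    """DSSE pre-authentication encoding: the bytes that actually get signed."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


def build_envelope(graph, sign):
    """Wrap a graph dict in a DSSE envelope using the supplied sign(bytes) callable."""
    # Assumption: canonical JSON = sorted keys, compact separators, UTF-8.
    payload = json.dumps(graph, sort_keys=True, separators=(",", ":")).encode("utf-8")
    signature = sign(pae(PAYLOAD_TYPE, payload))
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [
            {"keyid": KEY_ID, "sig": base64.b64encode(signature).decode("ascii")}
        ],
    }


def demo_sign(message):
    # Placeholder only: HMAC with a fixed key stands in for the real signer.
    return hmac.new(b"demo-key", message, hashlib.sha256).digest()


envelope = build_envelope({"nodes": [], "edges": []}, demo_sign)
```

Signing the PAE output rather than the raw payload binds the signature to the payload type, so a verifier cannot be tricked into interpreting the same bytes under a different type.
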
### 8.10 Implementation Checklist

- [ ] `NativeFunctionNode` maps to `richgraph-v1` node schema
- [ ] `NativeCallEdge` maps to `richgraph-v1` edge schema
- [ ] SymbolID uses `sym:binary:` prefix with canonical tuple
- [ ] CodeID uses `code:binary:` prefix for stripped symbols
- [ ] Graph hash uses BLAKE3-256 (`blake3:{hex}`)
- [ ] Symbol digest uses SHA-256 (`sha256:{hex}`)
- [ ] Init array roots use `phase: "init"`
- [ ] Missing build-id triggers U1 uncertainty
- [ ] DSSE envelope per layer with `stellaops:scanner:native:v1` key