# PURL-Resolved Callgraph Edges (Nov 2026)

This note captures the required behavior for joining binary callgraphs with SBOM components using **purl + symbol digest** annotations. It replaces any pointer to prior advisories; everything needed to ship the feature is here.
## 1. Goal

Annotate every call edge in `richgraph-v1` with:

- the `purl` of the component that defines the callee, and
- a stable `symbol_digest` (hash of the normalized signature plus an optional instruction fingerprint).

This lets graphs from multiple binaries merge naturally and line up with SBOM entries, so reachability analysis can answer “is the vulnerable function reachable in my deployment?” without re-identifying components.
## 2. Data model additions

- **Node**: `SymbolNode` gains `purl` and `symbol_digest` fields. The digest is the sha256 of the normalized signature, which includes the demangled name and parameter types; optionally append a block hash for stripped code.
- **Edge**: `CallEdge` gains `purl` (callee owner) and `symbol_digest`; keep the existing `kind`/`evidence` fields. When callee resolution is ambiguous, include `candidates[]` with ranked purls and set `confidence` accordingly.
- **Provenance**: store the analyzer fingerprint (`analyzer`, `version`, `toolchain_digest`) and the graph hash in CAS metadata.
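The additions above could be modeled roughly as follows. This is a minimal sketch, not the actual `richgraph-v1` DTOs: the `Candidate` type, its `score` field, and the `symbol_id`, `caller`, and `callee` fields are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Candidate:
    """One possible owner of an ambiguous callee (hypothetical shape)."""
    purl: str
    score: float  # assumed ranking score; higher means more likely


@dataclass
class SymbolNode:
    symbol_id: str       # hypothetical identifier for the node
    purl: str            # component that defines the symbol
    symbol_digest: str   # e.g. "sha256:<hex>"


@dataclass
class CallEdge:
    caller: str          # hypothetical: symbol_id of the calling node
    callee: str          # hypothetical: symbol_id of the called node
    kind: str            # existing field, kept as-is
    evidence: list[str]  # existing field, kept as-is
    purl: str            # owner of the callee
    symbol_digest: str
    confidence: float = 1.0
    candidates: list[Candidate] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Keep candidates ranked best-first; ties break on purl so the
        # order does not depend on how the analyzer happened to emit them.
        self.candidates.sort(key=lambda c: (-c.score, c.purl))
```

An ambiguous edge would then carry both candidates and a reduced confidence, for example `CallEdge(..., purl="pkg:unknown", confidence=0.6, candidates=[Candidate("pkg:a", 0.7), Candidate("pkg:b", 0.3)])`.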
## 3. Producer rules

1) **Map callee → file → SBOM component**. Use import tables (ELF DT_NEEDED + relocations, PE IAT, Mach-O stubs) or the resolved path. If there are multiple candidates, emit `candidates[]` and lower the confidence.
2) **Compute symbol digest**. Normalize the signature: demangle if possible, lowercase type names, and strip addresses; then take the sha256 of the canonical form. For stripped symbols, combine the synthetic name with a code block hash.
3) **Attach to edges**. For every `call` edge, set `purl` and `symbol_digest`. If the callee is external but unresolved, emit `purl:"pkg:unknown"` and also write an Unknowns entry (see the Signals Unknowns registry).
4) **Determinism**. Sort nodes and edges before hashing; keep evidence arrays sorted (`import`, `reloc`, `disasm`, `runtime`). The graph hash uses BLAKE3 over canonical JSON.
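Rule 2 can be sketched as below. This is an illustration under stated assumptions, not the analyzer's actual normalization: the canonical form `name(type,type,...)`, the `|` separator before the block hash, and the `0x...` address pattern are all choices made for this example, and the name is assumed to be demangled already.

```python
import hashlib
import re
from typing import Optional

_ADDRESS = re.compile(r"0x[0-9a-fA-F]+")  # assumed textual form of addresses
_WHITESPACE = re.compile(r"\s+")


def canonical_signature(name: str, param_types: list[str]) -> str:
    """Build the canonical text that gets hashed.

    `name` should already be demangled when a demangler is available.
    """
    # Strip embedded addresses, then collapse runs of whitespace.
    clean_name = _WHITESPACE.sub(" ", _ADDRESS.sub("", name)).strip()
    # Only type names are lowercased; the symbol name keeps its case.
    types = [_WHITESPACE.sub(" ", t).strip().lower() for t in param_types]
    return f"{clean_name}({','.join(types)})"


def symbol_digest(name: str, param_types: list[str],
                  block_hash: Optional[str] = None) -> str:
    """Return "sha256:<hex>" over the canonical signature.

    For stripped symbols, pass the synthetic name plus a code block hash so
    that two anonymous functions with different bodies get different digests.
    """
    canonical = canonical_signature(name, param_types)
    if block_hash is not None:
        canonical = f"{canonical}|{block_hash}"
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this normalization, `symbol_digest("SSL_read", ["SSL *", "int"])` and `symbol_digest("SSL_read", ["ssl  *", "INT"])` produce the same value, which is what lets edges from different binaries line up.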
## 4. Consumer rules

- **Signals**: merge edges from many binaries by `(purl, symbol_digest)`; keep multiple `site` entries. Store them in `call_edges` with `purl` as the join key for SBOM overlays.
- **Policy/VEX**: treat the vulnerability as `reachable` if any path from an entrypoint hits a `symbol_digest` that matches an affected function for the CVE's purl.
- **UI/CLI**: display `purl@version` plus the demangled name; show site offsets for debugging; show confidence when candidates were present.
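The Signals merge and the Policy/VEX reachability rule might look like the sketch below. The dictionary shapes (`binary`, `edges`, `site`, `caller`, `callee`) and both function names are assumptions made for this example; the real `call_edges` documents may differ.

```python
from collections import defaultdict, deque


def merge_call_edges(graphs):
    """Merge edges from several per-binary graphs by (purl, symbol_digest).

    Each graph is assumed to be {"binary": str, "edges": [...]}, and each
    edge to carry "purl", "symbol_digest" and a "site" offset.
    """
    merged = {}
    for graph in graphs:
        for edge in graph["edges"]:
            key = (edge["purl"], edge["symbol_digest"])
            entry = merged.setdefault(
                key, {"purl": key[0], "symbol_digest": key[1], "sites": []}
            )
            site = (graph["binary"], edge["site"])
            if site not in entry["sites"]:
                entry["sites"].append(site)  # keep every distinct call site
    for entry in merged.values():
        entry["sites"].sort()
    # Sort by key so the output does not depend on input order.
    return [merged[key] for key in sorted(merged)]


def is_reachable(edges, entrypoints, cve_purl, affected_digests):
    """Breadth-first search from the entrypoints over caller -> callee edges.

    Returns True as soon as a traversed edge targets an affected function,
    i.e. its purl equals the CVE purl and its digest is in the affected set.
    """
    adjacency = defaultdict(list)
    for edge in edges:
        adjacency[edge["caller"]].append(edge)
    seen = set(entrypoints)
    queue = deque(entrypoints)
    while queue:
        node = queue.popleft()
        for edge in adjacency[node]:
            if edge["purl"] == cve_purl and edge["symbol_digest"] in affected_digests:
                return True
            if edge["callee"] not in seen:
                seen.add(edge["callee"])
                queue.append(edge["callee"])
    return False
```

Matching on the pair rather than on the name alone is the point of the design: the same function linked into two binaries collapses to one row with two sites, and a policy check needs only the affected digests for the CVE's purl.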
## 5. SBOM join strategy

1) Use the `purl` from the component resolver; if absent, fall back to `build_id` plus hash match and emit `purl:"pkg:unknown"`.
2) When multiple SBOM components share a purl, keep all matches but prefer those whose file hash matches that of the binary under analysis.
3) For runtime traces, attach the same `symbol_digest` so runtime hits boost confidence on the correct edge.
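Rule 2 amounts to a stable sort that keeps every match but moves file-hash matches to the front. The sketch below assumes each SBOM component is a dictionary with a `purl` and an optional `file_hashes` list; both the field names and the function name are hypothetical.

```python
def rank_sbom_matches(components, purl, binary_sha256):
    """Return every SBOM component with `purl`, file-hash matches first."""
    matches = [c for c in components if c["purl"] == purl]
    # The key is False for components whose hashes include the binary's
    # hash, so they sort first. sorted() is stable, so the original SBOM
    # order is preserved inside each group.
    return sorted(
        matches,
        key=lambda c: binary_sha256 not in c.get("file_hashes", ()),
    )
```

Nothing is dropped: a consumer that only wants the preferred component takes the first element, while the full list remains available for display or audit.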
## 6. Acceptance tests

- Imports-only: an edge from binary main to `pkg:deb/ubuntu/openssl@3.0.2` with `symbol_digest=sha256:...` must appear without running disassembly.
- Disassembly: a direct `call` to an internal function carries the `purl` of the hosting binary’s SBOM entry.
- Ambiguity: when two candidate purls exist, the graph stores both in `candidates[]` (two entries) and sets `confidence < 1`.
- Graph hash stability: reordering analyzer flags does not change the BLAKE3 hash.
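The graph hash stability test can be approximated with the sketch below, which canonicalizes the graph before hashing so that emission order has no effect. It covers only one way analyzer flags could perturb the output (ordering of nodes and edges). The function names are hypothetical, evidence arrays are assumed to arrive already ordered, and BLAKE3 comes from the third-party `blake3` package; when that package is missing the code falls back to BLAKE2b purely as a stand-in so the example still runs.

```python
import hashlib
import json

try:
    from blake3 import blake3  # third-party "blake3" package, if installed

    def _digest(data: bytes) -> str:
        return "blake3:" + blake3(data).hexdigest()
except ImportError:
    def _digest(data: bytes) -> str:
        # Stand-in only: BLAKE2b is not BLAKE3 and yields different values.
        return "blake2b-standin:" + hashlib.blake2b(data, digest_size=32).hexdigest()


def _canonical_json(obj) -> str:
    # Sorted keys and fixed separators give one byte sequence per value.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def graph_hash(nodes, edges) -> str:
    """Hash a graph so that node and edge emission order does not matter."""
    canonical = {
        "nodes": sorted(nodes, key=_canonical_json),
        "edges": sorted(edges, key=_canonical_json),
    }
    return _digest(_canonical_json(canonical).encode("utf-8"))
```

A test built on this would hash the same graph twice, once with the nodes and edges shuffled, and assert that the two hashes are equal while any change to a `symbol_digest` produces a different hash.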
## 7. Deliverables

- Update `richgraph-v1` schema and DTOs (Scanner + Signals).
- Persist `purl`/`symbol_digest` in Mongo `call_edges` and CAS manifests.
- CLI: extend `stella reachability upload-callgraph` and `stella graph explain` to surface `purl` plus digest.
- Docs: reference this file from Scanner, Signals, and Reachability guides once implemented.