# PURL-Resolved Callgraph Edges (Nov 2026)

This note captures the required behavior for joining binary callgraphs with SBOM components using **purl + symbol digest** annotations. It replaces any pointer to prior advisories; everything needed to ship the feature is here.

## 1. Goal

Annotate every call edge in `richgraph-v1` with:

- `purl` of the component that defines the callee, and
- a stable `symbol_digest` (hash of normalized signature plus optional instruction fingerprint).

This lets graphs from multiple binaries merge naturally and line up with SBOM entries, so reachability answers “is the vulnerable function reachable in my deployment?” without re-identifying components.

## 2. Data model additions

- **Node**: `SymbolNode` gains `purl` and `symbol_digest` fields (sha256 of normalized signature; include demangled name and parameter types; optionally append block hash for stripped code).
- **Edge**: `CallEdge` gains `purl` (callee owner) and `symbol_digest`; keep existing `kind`/`evidence` fields. When callee resolution is ambiguous, include `candidates[]` with ranked purls and set `confidence` accordingly.
- **Provenance**: store analyzer fingerprint (`analyzer`, `version`, `toolchain_digest`) and graph hash in CAS metadata.

## 3. Producer rules

1) **Map callee → file → SBOM component**. Use import tables (ELF DT_NEEDED + reloc, PE IAT, Mach-O stubs) or resolved path. If multiple candidates, emit `candidates[]` and lower confidence.
2) **Compute symbol digest**. Normalize the signature, demangle if possible, lowercase type names, strip addresses, then sha256 the canonical form. For stripped symbols, combine synthetic name and code block hash.
3) **Attach to edges**. For every `call` edge, set `purl` and `symbol_digest`. If callee is external but unresolved, emit `purl:"pkg:unknown"` and also write an Unknowns entry (see signals unknowns registry).
4) **Determinism**. Sort nodes and edges before hashing; keep evidence arrays sorted (`import`, `reloc`, `disasm`, `runtime`). Graph hash uses BLAKE3 over canonical JSON.

## 4. Consumer rules

- **Signals**: merge edges from many binaries by `(purl, symbol_digest)`; keep multiple `site` entries. Store in `call_edges` with `purl` as the join key for SBOM overlays.
- **Policy/VEX**: treat `reachable` if any entrypoint path hits a `symbol_digest` that matches an affected function for the CVE purl.
- **UI/CLI**: display `purl@version` plus demangled name; show site offsets for debugging; show confidence when candidates were present.

## 5. SBOM join strategy

1) Use `purl` from component resolver; if absent, fall back to `build_id` plus hash match and emit `purl:"pkg:unknown"`.
2) When multiple SBOM components share a purl, keep all matches but prefer those whose file hash equals the binary under analysis.
3) For runtime traces, attach the same `symbol_digest` so runtime hits boost confidence on the correct edge.

## 6. Acceptance tests

- Imports-only: edge from binary main to `pkg:deb/ubuntu/openssl@3.0.2` `symbol_digest=sha256:...` must appear without running disassembly.
- Disassembly: direct `call` to internal function carries `purl` of the hosting binary’s SBOM entry.
- Ambiguity: when two candidate purls exist, graph stores `candidates[2]` and `confidence < 1`.
- Graph hash stability: reordering analyzer flags does not change BLAKE3 hash.

## 7. Deliverables

- Update `richgraph-v1` schema and DTOs (Scanner + Signals).
- Persist `purl`/`symbol_digest` in Mongo `call_edges` and CAS manifests.
- CLI: extend `stella reachability upload-callgraph` and `stella graph explain` to surface `purl` plus digest.
- Docs: reference this file from Scanner, Signals, and Reachability guides once implemented.
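
The symbol digest rule from section 3 (normalize, lowercase type names, strip addresses, then sha256) can be sketched roughly as follows. This is a minimal illustration, not the analyzer's actual code: the function names, the `name(type,type)` canonical form, the `|` separator before the block hash, and the address pattern are all assumptions made for the example, and demangling is assumed to have happened before the name is passed in.

```python
import hashlib
import re
from typing import Optional, Sequence

# Assumed pattern for embedded addresses such as "@0x401000" or "0x7ffd1234".
_ADDRESS = re.compile(r"@?0x[0-9a-fA-F]+")


def canonical_signature(name: str, param_types: Sequence[str]) -> str:
    """Build an assumed canonical form: name(type1,type2).

    Addresses are stripped, whitespace inside types is collapsed,
    and type names are lowercased.
    """
    clean_name = _ADDRESS.sub("", name).strip()
    clean_types = [
        re.sub(r"\s+", " ", _ADDRESS.sub("", t)).strip().lower()
        for t in param_types
    ]
    return f"{clean_name}({','.join(clean_types)})"


def symbol_digest(
    name: str,
    param_types: Sequence[str],
    block_hash: Optional[str] = None,
) -> str:
    """Return 'sha256:<hex>' over the canonical signature.

    For stripped symbols, pass a synthetic name plus a code block hash;
    the block hash is appended to the canonical form before hashing.
    """
    canonical = canonical_signature(name, param_types)
    if block_hash is not None:
        canonical = f"{canonical}|{block_hash}"
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under these assumptions, `symbol_digest("SSL_read@0x401000", ["SSL *", "INT"])` and `symbol_digest("SSL_read", ["ssl *", "int"])` produce the same digest, which is the property that lets edges from different binaries line up.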
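
The Signals merge rule from section 4 (merge by `(purl, symbol_digest)`, keep multiple `site` entries) might look like the sketch below. The edge dictionaries and the shape of a `site` (`binary` plus `offset`) are hypothetical; the note does not define them, so treat them as placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

EdgeKey = Tuple[str, str]  # (purl, symbol_digest)


def merge_edges(edges: Iterable[dict]) -> Dict[EdgeKey, List[dict]]:
    """Group call sites from many binaries under one (purl, symbol_digest) key."""
    merged: Dict[EdgeKey, List[dict]] = defaultdict(list)
    for edge in edges:
        key = (edge["purl"], edge["symbol_digest"])
        site = edge["site"]  # hypothetical shape: {"binary": str, "offset": int}
        if site not in merged[key]:  # drop exact duplicate sites
            merged[key].append(site)
    # Sort keys and sites so the merged result does not depend on input order.
    return {
        key: sorted(sites, key=lambda s: (s["binary"], s["offset"]))
        for key, sites in sorted(merged.items())
    }
```

Because `purl` is the first element of the key, the same structure can be filtered by purl when overlaying SBOM components.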
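
The determinism rule from section 3 (sort nodes and edges, keep evidence arrays sorted, hash canonical JSON) can be illustrated as below. Two things here are assumptions: the evidence order is read as the rank order in which the note lists the kinds (`import`, `reloc`, `disasm`, `runtime`), and SHA-256 from the standard library stands in for BLAKE3 so the sketch runs without third-party packages. A real implementation would pass a BLAKE3 hasher instead.

```python
import hashlib
import json
from typing import Callable

# Assumed rank order, taken from the order in which the note lists evidence kinds.
EVIDENCE_RANK = {"import": 0, "reloc": 1, "disasm": 2, "runtime": 3}


def _canonical(obj) -> str:
    """Compact JSON with sorted keys, used both for sorting and for hashing."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def canonical_graph_bytes(graph: dict) -> bytes:
    """Serialize a graph so that input ordering cannot affect the bytes."""
    edges = []
    for edge in graph["edges"]:
        edge = dict(edge)  # shallow copy so the caller's graph is not mutated
        edge["evidence"] = sorted(
            edge.get("evidence", []),
            key=lambda kind: (EVIDENCE_RANK.get(kind, len(EVIDENCE_RANK)), kind),
        )
        edges.append(edge)
    canonical = {
        "nodes": sorted(graph["nodes"], key=_canonical),
        "edges": sorted(edges, key=_canonical),
    }
    return _canonical(canonical).encode("utf-8")


def _sha256_hex(data: bytes) -> str:
    # Stand-in only: the note specifies BLAKE3, which is not in the standard library.
    return hashlib.sha256(data).hexdigest()


def graph_hash(graph: dict, hasher: Callable[[bytes], str] = _sha256_hex) -> str:
    """Hash the canonical serialization with the supplied hasher."""
    return hasher(canonical_graph_bytes(graph))
```

Sorting each node and edge by its own canonical JSON avoids depending on any particular field name, so two graphs that differ only in list order or key order hash identically.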