PURL-Resolved Callgraph Edges (Nov 2026)
This note captures the required behavior for joining binary callgraphs with SBOM components using purl + symbol digest annotations. It replaces any pointer to prior advisories; everything needed to ship the feature is here.
1. Goal
Annotate every call edge in `richgraph-v1` with:
- the `purl` of the component that defines the callee, and
- a stable `symbol_digest` (a hash of the normalized signature plus an optional instruction fingerprint).
This lets graphs from multiple binaries merge naturally and line up with SBOM entries, so reachability analysis can answer “is the vulnerable function reachable in my deployment?” without re-identifying components.
2. Data model additions
- Node: `SymbolNode` gains `purl` and `symbol_digest` fields (`symbol_digest` is the sha256 of the normalized signature; include the demangled name and parameter types; optionally append a block hash for stripped code).
- Edge: `CallEdge` gains `purl` (the callee's owner) and `symbol_digest`; keep the existing `kind`/`evidence` fields. When callee resolution is ambiguous, include `candidates[]` with ranked purls and set `confidence` accordingly.
- Provenance: store the analyzer fingerprint (`analyzer`, `version`, `toolchain_digest`) and the graph hash in CAS metadata.
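For orientation, the additions can be sketched as Python dataclasses. This is a hypothetical mirror, not the Scanner or Signals DTOs. Only `purl`, `symbol_digest`, `kind`, `evidence`, `candidates`, `confidence`, and the provenance fields come from this note. The names `id`, `caller`, `callee`, `demangled_name`, `Candidate.score`, and the `is_ambiguous` helper are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical Python mirror of the richgraph-v1 additions. Field names not
# listed in this note (id, caller, callee, demangled_name, score) are assumed.
@dataclass
class SymbolNode:
    id: str                  # assumed node identifier
    demangled_name: str      # assumed; feeds the normalized signature
    purl: str                # component that defines the symbol
    symbol_digest: str       # "sha256:<hex>" of the normalized signature


@dataclass
class Candidate:
    purl: str
    score: float             # assumed ranking score; higher ranks first


@dataclass
class CallEdge:
    caller: str              # assumed caller node id
    callee: str              # assumed callee node id
    kind: str                # existing field, e.g. "call"
    evidence: List[str]      # existing field, e.g. ["import", "reloc"]
    purl: str                # owner of the callee
    symbol_digest: str
    confidence: float = 1.0
    candidates: List[Candidate] = field(default_factory=list)

    def is_ambiguous(self) -> bool:
        # Hypothetical helper: more than one ranked candidate means ambiguity.
        return len(self.candidates) > 1


@dataclass
class Provenance:
    analyzer: str
    version: str
    toolchain_digest: str
    graph_hash: str          # stored alongside the graph in CAS metadata
```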
3. Producer rules
- Map callee → file → SBOM component. Use import tables (ELF `DT_NEEDED` plus relocations, PE IAT, Mach-O stubs) or the resolved path. If multiple candidates match, emit `candidates[]` and lower the `confidence`.
- Compute the symbol digest. Normalize the signature: demangle if possible, lowercase type names, strip addresses, then sha256 the canonical form. For stripped symbols, combine the synthetic name and the code block hash.
- Attach to edges. For every `call` edge, set `purl` and `symbol_digest`. If the callee is external but unresolved, emit `purl:"pkg:unknown"` and also write an Unknowns entry (see the Signals unknowns registry).
- Determinism. Sort nodes and edges before hashing; keep evidence arrays sorted (`import`, `reloc`, `disasm`, `runtime`). The graph hash uses BLAKE3 over canonical JSON.
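Here is a minimal sketch of the digest and edge-attachment rules. It assumes the symbol name has already been demangled; demangling needs an external tool and is not shown. The `|` separator used to append a block hash, the `annotate_edge` helper, and the shape of the Unknowns entry are all hypothetical.

```python
import hashlib
import re
from typing import Dict, List, Optional

# Hex addresses such as 0x401000 must never leak into the digest.
_ADDR = re.compile(r"0x[0-9a-fA-F]+")


def normalize_signature(name: str, param_types: List[str]) -> str:
    """Canonical form: address-free name plus lowercased, whitespace-collapsed types.

    Assumes `name` is already demangled; a real producer would demangle first.
    """
    types = [" ".join(_ADDR.sub("", t).lower().split()) for t in param_types]
    return f"{_ADDR.sub('', name).strip()}({','.join(types)})"


def symbol_digest(name: str, param_types: List[str], block_hash: Optional[str] = None) -> str:
    canonical = normalize_signature(name, param_types)
    if block_hash is not None:
        # Stripped code: combine the synthetic name with the code block hash.
        # The "|" separator is an assumption, not part of the spec.
        canonical += "|" + block_hash.lower()
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


UNKNOWN_PURL = "pkg:unknown"


def annotate_edge(edge: Dict, owner_purl: Optional[str], digest: str, unknowns: List[Dict]) -> Dict:
    """Hypothetical helper: set purl/symbol_digest on a call edge."""
    edge["symbol_digest"] = digest
    if owner_purl is None:
        # External callee that could not be resolved to an SBOM component.
        edge["purl"] = UNKNOWN_PURL
        # Assumed entry shape; the real one is defined by the Signals unknowns registry.
        unknowns.append({"callee": edge["callee"], "symbol_digest": digest})
    else:
        edge["purl"] = owner_purl
    return edge
```

Because type names are lowercased and whitespace is collapsed, `SSL_CTX *` and `ssl_ctx  *` produce the same digest. That is the property that lets edges from different binaries line up.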
4. Consumer rules
- Signals: merge edges from many binaries by `(purl, symbol_digest)`; keep multiple `site` entries. Store them in `call_edges` with `purl` as the join key for SBOM overlays.
- Policy/VEX: treat a finding as `reachable` if any path from an entrypoint hits a `symbol_digest` that matches an affected function for the CVE's purl.
- UI/CLI: display `purl@version` plus the demangled name; show site offsets for debugging; show confidence when candidates were present.
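The Signals merge and the policy check can be sketched together. The sketch assumes the caller end of each edge is keyed the same way as the callee, by `(purl, symbol_digest)`, so an edge is identified by a pair of keys. The dict field names (`caller_purl`, `caller_digest`, `site`) and both function names are illustrative only. Reachability here is a plain breadth-first search from the entrypoints.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

# A symbol is identified across binaries by (purl, symbol_digest).
Key = Tuple[str, str]


def merge_edges(edges: Iterable[dict]) -> Dict[Tuple[Key, Key], List[str]]:
    """Collapse edges from many binaries; a merged edge keeps every distinct site.

    Each edge dict is assumed to carry caller_purl, caller_digest,
    purl and symbol_digest (callee side), and site; these names are illustrative.
    """
    merged: Dict[Tuple[Key, Key], List[str]] = defaultdict(list)
    for e in edges:
        caller = (e["caller_purl"], e["caller_digest"])
        callee = (e["purl"], e["symbol_digest"])
        merged[(caller, callee)].append(e["site"])
    return {k: sorted(set(v)) for k, v in merged.items()}


def is_reachable(merged: Dict[Tuple[Key, Key], List[str]], entrypoints: Iterable[Key],
                 cve_purl: str, affected_digests: Set[str]) -> bool:
    """True if any path from an entrypoint hits an affected function of cve_purl."""
    graph: Dict[Key, List[Key]] = defaultdict(list)
    for caller, callee in merged:
        graph[caller].append(callee)
    targets = {(cve_purl, d) for d in affected_digests}
    queue = deque(entrypoints)
    seen = set(queue)
    while queue:
        node = queue.popleft()
        if node in targets:
            return True
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```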
5. SBOM join strategy
- Use the `purl` from the component resolver; if it is absent, fall back to `build_id` plus hash match and emit `purl:"pkg:unknown"`.
- When multiple SBOM components share a purl, keep all matches but prefer those whose file hash equals the hash of the binary under analysis.
- For runtime traces, attach the same `symbol_digest` so runtime hits boost confidence on the correct edge.
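The next sketch resolves the owner of one callee file under these assumptions:

- SBOM components are matched by an assumed `file_path` field.
- Components whose recorded hash equals the observed file hash are ranked first, and all other matches are kept.
- When more than one distinct purl survives, the helper returns them as candidates with a placeholder confidence of `1 / n`.

The `build_id` fallback is left out. `SbomComponent` and `resolve_owner` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SbomComponent:
    purl: str
    file_path: str      # assumed: path of the file the component ships
    file_sha256: str    # assumed: file hash recorded in the SBOM


def resolve_owner(path: str, observed_sha256: str,
                  sbom: List[SbomComponent]) -> Tuple[str, float, List[str]]:
    """Return (purl, confidence, candidates) for the file defining a callee."""
    matches = [c for c in sbom if c.file_path == path]
    if not matches:
        # The build_id + hash fallback is omitted from this sketch.
        return "pkg:unknown", 0.0, []
    # Keep every match, but rank hash-exact components first.
    ranked = sorted(matches, key=lambda c: (c.file_sha256 != observed_sha256, c.purl))
    purls: List[str] = []
    for c in ranked:
        if c.purl not in purls:
            purls.append(c.purl)
    if len(purls) == 1:
        return purls[0], 1.0, []
    # Placeholder scoring: split confidence evenly across distinct candidates.
    return purls[0], 1.0 / len(purls), purls
```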
6. Acceptance tests
- Imports-only: an edge from the binary's `main` to `pkg:deb/ubuntu/openssl@3.0.2` with `symbol_digest=sha256:...` must appear without running disassembly.
- Disassembly: a direct `call` to an internal function carries the `purl` of the hosting binary's SBOM entry.
- Ambiguity: when two candidate purls exist, the graph stores `candidates[2]` and `confidence < 1`.
- Graph hash stability: reordering analyzer flags does not change the BLAKE3 hash.
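The hash-stability test depends on canonical serialization. This sketch does three things:

- It sorts nodes and edges by their own canonical JSON.
- It orders evidence by the sequence listed in the producer rules; reading that list as the canonical order is an assumption.
- It hashes only `nodes` and `edges`, so analyzer flags cannot affect the result.

The Python standard library has no BLAKE3, so the hash function is injected as a parameter, and any bytes-to-string function can be plugged in.

```python
import json
from typing import Callable, Dict, List

# Reading the order listed in the producer rules as canonical is an assumption.
EVIDENCE_ORDER = {"import": 0, "reloc": 1, "disasm": 2, "runtime": 3}


def _canon(obj) -> str:
    # Canonical JSON: sorted keys, no insignificant whitespace.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def _sort_evidence(evidence: List[str]) -> List[str]:
    # Known kinds in the fixed order; anything unexpected goes last, alphabetically.
    return sorted(evidence, key=lambda k: (EVIDENCE_ORDER.get(k, len(EVIDENCE_ORDER)), k))


def canonical_graph_bytes(graph: Dict) -> bytes:
    """Serialize only nodes and edges; flags and other provenance are excluded."""
    nodes = sorted(graph["nodes"], key=_canon)
    edges = [dict(e, evidence=_sort_evidence(e.get("evidence", []))) for e in graph["edges"]]
    edges.sort(key=_canon)
    return _canon({"nodes": nodes, "edges": edges}).encode("utf-8")


def graph_hash(graph: Dict, digest: Callable[[bytes], str]) -> str:
    # BLAKE3 is not in the standard library, so the hash function is injected.
    return digest(canonical_graph_bytes(graph))
```

In production the injected function would be BLAKE3. With the third-party `blake3` package, that is `lambda b: blake3(b).hexdigest()`.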
7. Deliverables
- Update the `richgraph-v1` schema and DTOs (Scanner + Signals).
- Persist `purl`/`symbol_digest` in Mongo `call_edges` and CAS manifests.
- CLI: extend `stella reachability upload-callgraph` and `stella graph explain` to surface `purl` plus digest.
- Docs: reference this file from the Scanner, Signals, and Reachability guides once implemented.